WO2023273729A1 - Subtitle display method and related device - Google Patents

Subtitle display method and related device

Info

Publication number
WO2023273729A1
WO2023273729A1 PCT/CN2022/095325 CN2022095325W WO2023273729A1 WO 2023273729 A1 WO2023273729 A1 WO 2023273729A1 CN 2022095325 W CN2022095325 W CN 2022095325W WO 2023273729 A1 WO2023273729 A1 WO 2023273729A1
Authority
WO
WIPO (PCT)
Prior art keywords
subtitle
electronic device
video
mask
color
Prior art date
Application number
PCT/CN2022/095325
Other languages
English (en)
French (fr)
Inventor
罗绳礼
Original Assignee
花瓣云科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 花瓣云科技有限公司
Publication of WO2023273729A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/488Data services, e.g. news ticker
    • H04N21/4884Data services, e.g. news ticker for displaying subtitles
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M1/00Substation equipment, e.g. for use by subscribers
    • H04M1/72Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M1/7243User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
    • H04M1/72436User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for text messaging, e.g. short messaging services [SMS] or e-mails
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M1/00Substation equipment, e.g. for use by subscribers
    • H04M1/72Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M1/7243User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
    • H04M1/72439User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for image or video messaging
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4312Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/435Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/488Data services, e.g. news ticker
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/278Subtitling

Definitions

  • the present application relates to the technical field of terminals, and in particular to a subtitle display method and related equipment.
  • the window displaying subtitles related to the played video also has a wide range of application scenarios, for example, displaying subtitles synchronized with audio in the video playback window, or displaying user-input subtitles in the video playback window to increase the interactivity of the video.
  • Embodiments of the present application provide a subtitle display method and related equipment, which can solve the problem of low subtitle recognition when users watch videos, and improve user experience.
  • an embodiment of the present application provides a method for displaying subtitles, the method comprising: playing a first video on an electronic device; when the electronic device displays a first interface, the first interface includes a first picture and a first subtitle, the first subtitle is displayed floating over a first area of the first picture with a first mask as its background, and the first area is the area in the first picture corresponding to the display position of the first subtitle, wherein the difference between the color value of the first subtitle and the color value of the first area is a first value; when the electronic device displays a second interface, the second interface includes a second picture and the first subtitle, the first subtitle is displayed without a mask, the first subtitle is displayed floating over a second area of the second picture, and the second area is the area in the second picture corresponding to the display position of the first subtitle, wherein the difference between the color value of the first subtitle and the color value of the second area is a second value, and the second value is greater than the first value;
  • the first frame is a frame in the first video
  • the second frame is a frame in the first video
  • the electronic device can set a mask for the subtitle when the subtitle recognizability is low, and improve the subtitle recognizability without changing the subtitle color.
  • the method further includes: the electronic device acquiring a first video file and a first subtitle file, where the time information carried by the first video file is the same as the time information carried by the first subtitle file; the electronic device generating a first video frame based on the first video file, where the first video frame is used to generate the first picture; the electronic device generating a first subtitle frame based on the first subtitle file, and obtaining the color value and the display position of the first subtitle in the first subtitle frame, wherein the time information carried by the first subtitle frame is the same as the time information carried by the first video frame; the electronic device determining the first area based on the display position of the first subtitle; the electronic device generating the first mask based on the color value of the first subtitle or the color value of the first area; the electronic device superimposing the first subtitle on the first mask in the first subtitle frame to generate a second subtitle frame; and the second subtitle frame being composited with the first video frame.
  • specifically, the electronic device can obtain a video file to be played and a subtitle file to be displayed, then decode the video file to obtain video frames and decode the subtitle file to obtain subtitle frames. The electronic device can then extract the subtitle color gamut information, subtitle position information, etc. from the subtitle frame, extract the color gamut information at the subtitle display position in the video frame corresponding to the subtitle based on the subtitle position information, and calculate the subtitle recognizability based on the subtitle color gamut information and the color gamut information at the subtitle display position in the video frame corresponding to the subtitle. It can further calculate the color value of the mask corresponding to the subtitle based on the subtitle recognizability to generate a masked subtitle frame, and then synthesize and render the video frame and the masked subtitle frame.
  • before the electronic device generates the first mask based on the color value of the first subtitle or the color value of the first area, the method further includes: the electronic device determining that the first value is less than a first threshold. In this way, the electronic device may further determine that the recognizability of the subtitle is low by determining that the first value is smaller than the first threshold.
  • the electronic device determining that the first value is smaller than a first threshold specifically includes: the electronic device dividing the first area into N first sub-areas, wherein the N is a positive integer; the electronic device determines that the first value is smaller than the first threshold based on the color value of the first subtitle and the color values of the N first subregions. In this way, the electronic device may determine that the first numerical value is smaller than the first threshold based on the color value of the first subtitle and the color values of the N first subregions.
  • the electronic device generating the first mask based on the color value of the first subtitle or the color value of the first area specifically includes: the electronic device determining a color value of the first mask based on the color value of the first subtitle or the color values of the N first sub-areas; and the electronic device generating the first mask based on the color value of the first mask.
  • the electronic device may determine a color value of a first mask based on the color value of the first subtitle or the color values of the N first subregions, and further generate the first mask for the first subtitle.
  • the electronic device determining that the first value is smaller than a first threshold specifically includes: the electronic device dividing the first area into N first sub-areas, wherein N is a positive integer; the electronic device determining whether to merge adjacent first sub-areas into a second sub-area based on the color value difference between the adjacent first sub-areas; when the color value difference between adjacent first sub-areas is less than a second threshold, the electronic device merging the adjacent first sub-areas into the second sub-area; and the electronic device determining that the first value is smaller than the first threshold based on the color value of the first subtitle and the color value of the second sub-area.
  • in this way, the electronic device can combine first sub-areas with similar color values to generate a second sub-area, and further determine that the first value is smaller than the first threshold based on the color value of the first subtitle and the color value of the second sub-area.
  • the first area includes M second sub-areas, where M is a positive integer less than or equal to N, each second sub-area includes one or more first sub-areas, and different second sub-areas may include the same or different numbers of first sub-areas.
  • the electronic device can divide the first area into M second sub-areas.
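  • As an illustration only, the following sketch shows one way such merging could work, assuming each first sub-area is already summarized by an average (r, g, b) color value and using Euclidean color distance as the difference measure (the embodiment does not prescribe a specific metric); the helper names are hypothetical.

```python
import math

def color_diff(c1, c2):
    """Euclidean distance between two (r, g, b) color values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))

def merge_subregions(first_subregion_colors, second_threshold):
    """Merge adjacent first sub-areas whose color difference is below the
    second threshold; each merged group plays the role of a second sub-area."""
    second_subregions = []  # each entry is a list of adjacent first sub-area colors
    for color in first_subregion_colors:
        if second_subregions and color_diff(second_subregions[-1][-1], color) < second_threshold:
            second_subregions[-1].append(color)   # similar to its neighbour: merge
        else:
            second_subregions.append([color])     # start a new second sub-area
    return second_subregions
```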
  • the electronic device generating the first mask based on the color value of the first subtitle or the color value of the first area specifically includes: the electronic device sequentially calculating the color values of M first sub-masks based on the color value of the first subtitle or the color values of the M second sub-areas; and the electronic device generating the M first sub-masks based on the color values of the M first sub-masks, wherein the M first sub-masks are combined into the first mask.
  • the electronic device can generate M first sub-masks for the first subtitle.
  • the method further includes: when the electronic device displays a third interface, the third interface includes a third picture and the first subtitle, the first subtitle includes at least a first part and a second part, the first part displays a second sub-mask, the second part displays a third sub-mask or does not display the third sub-mask, and the color value of the second sub-mask is different from the color value of the third sub-mask.
  • subtitles corresponding to multiple sub-masks can be displayed on the electronic device.
  • the display position of the first mask is determined based on the display position of the first subtitle. In this way, the display position of the first mask can coincide with the display position of the first subtitle.
  • the difference between the color value of the first mask and the color value of the first subtitle is greater than the first value. In this way, subtitle recognizability can be improved.
  • the display position of the first subtitle may be fixed or not fixed relative to the display screen of the electronic device, and the first subtitle is a continuous piece of text or symbols.
  • the first subtitle can be a barrage (a bullet-screen comment) or a subtitle synchronized with the audio, and the first subtitle is one subtitle rather than all the subtitles displayed on the display screen.
  • the method before the electronic device displays the first interface, the method further includes: the electronic device setting the transparency of the first mask to be less than 100%. In this way, it can be ensured that the video frame corresponding to the area where the first mask is located still has a certain degree of visibility.
  • before the electronic device displays the second interface, the method further includes: the electronic device generating a second mask based on the color value of the first subtitle or the color value of the second area, and superimposing the first subtitle on the second mask, wherein the color value of the second mask is a preset color value and the transparency of the second mask is 100%; or, the electronic device not generating the second mask. In this way, for a highly recognizable subtitle, the electronic device can set a mask whose transparency is 100%, or not set a mask for it at all.
  • an embodiment of the present application provides an electronic device, the electronic device includes one or more processors and one or more memories, wherein the one or more memories are coupled to the one or more processors; the one or more memories are used to store computer program code, the computer program code includes computer instructions, and when the one or more processors execute the computer instructions, the electronic device executes the method described in any possible implementation manner of the first aspect.
  • an embodiment of the present application provides a computer storage medium, the computer storage medium stores a computer program, the computer program includes program instructions, and when the program instructions are run on an electronic device, the electronic device executes the method described in any possible implementation manner of the first aspect.
  • an embodiment of the present application provides a computer program product, which, when the computer program product is run on a computer, causes the computer to execute the method described in any possible implementation manner of the first aspect above.
  • FIG. 1 is a schematic flow chart of a subtitle display method provided in an embodiment of the present application
  • 2A-2C are schematic diagrams of a set of user interfaces provided by the embodiment of the present application.
  • FIG. 3 is a schematic flowchart of another subtitle display method provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a subtitle frame provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of the principle of generating a subtitle corresponding mask provided by an embodiment of the present application.
  • FIG. 6A is a schematic diagram of a masked subtitle frame provided by an embodiment of the present application.
  • FIG. 6B-FIG. 6C are schematic diagrams of a user interface for displaying a group of subtitles provided by the embodiment of the present application.
  • FIG. 7A is a schematic flowchart of a method for generating a subtitle corresponding mask provided by an embodiment of the present application
  • Fig. 7B is another schematic diagram of the principle of generating a subtitle corresponding mask provided by the embodiment of the present application.
  • FIG. 8A is a schematic diagram of another masked subtitle frame provided by the embodiment of the present application.
  • FIG. 8B-FIG. 8C are schematic diagrams of a user interface for displaying a group of subtitles provided by the embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a software structure of an electronic device provided by an embodiment of the present application.
  • Fig. 11 is a schematic structural diagram of another electronic device provided by an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of another electronic device provided by an embodiment of the present application.
  • video decoding: the process of interpreting the data of the image frames (also called video frames) of a video according to the compression algorithm of the video file.
  • the video playback window displays subtitles synchronized with the audio
  • Compositing means superimposing the subtitles on the corresponding video frames, and the position of the subtitles and the overlapping position of the video frames are relatively fixed.
  • the video playback window displays subtitles (that is, barrage) input by the user
  • the overlapping position of the subtitles with the video frames is relatively fluid, that is, not fixed.
  • in order to improve the fun of video playback, the video playback platform usually provides users with the ability to independently select the color of subtitles.
  • the subtitle color is usually the default color of the system. Users can choose their own subtitle color when playing a video, and the electronic device will display subtitles on the video playback window according to the color selected by the user.
  • the user who sends a barrage can independently choose the color of the barrage to be sent, and the color of the barrage seen by other users is consistent with the color chosen by the user who sent it. Therefore, when a user watches barrages, the color of each barrage displayed on the same video frame may be different.
  • the embodiment of this application provides a subtitle display method.
  • the electronic device can first obtain a video file to be played and a subtitle file to be displayed in the video playback window, then perform video decoding on the video file to obtain video frames, and perform subtitle decoding on the subtitle file to obtain subtitle frames. After that, the video frames and subtitle frames can be aligned and matched in chronological order, and the final video frames to be displayed are synthesized and stored in a video frame queue. The video frames to be displayed are then read and rendered in chronological order, and finally, the rendered video frames are displayed in the video playback window.
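  • For illustration only, the following minimal sketch outlines this basic flow; the decode_video_file, decode_subtitle_file, superimpose, render and display callables are hypothetical placeholders for the modules described above, not an actual player implementation.

```python
from collections import deque

def play_with_subtitles(video_file, subtitle_file, decode_video_file,
                        decode_subtitle_file, superimpose, render, display):
    """Decode, match video and subtitle frames by timestamp, composite,
    queue, then render and display in chronological order."""
    video_frames = decode_video_file(video_file)            # each frame carries a timestamp
    subtitle_frames = decode_subtitle_file(subtitle_file)   # each subtitle frame carries a timestamp
    subtitles_by_time = {sf.timestamp: sf for sf in subtitle_frames}

    frame_queue = deque()  # the video frame queue of frames to be displayed
    for vf in sorted(video_frames, key=lambda f: f.timestamp):
        sf = subtitles_by_time.get(vf.timestamp)
        frame_queue.append(superimpose(vf, sf) if sf is not None else vf)

    while frame_queue:  # read and render in chronological order
        display(render(frame_queue.popleft()))
```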
  • FIG. 1 exemplarily shows a method flow of a subtitle display method provided by an embodiment of the present application.
  • the method can be applied to an electronic device 100 capable of playing video.
  • the specific steps of this method are described in detail below:
  • the electronic device 100 detects the user's operation of playing a video on a video application program, and in response to the operation, the electronic device 100 may obtain a video information stream and a subtitle information stream.
  • a video application program may be installed on the electronic device 100. After detecting the user's operation of playing a video on the video application program, in response to the operation, the electronic device 100 may obtain the video information stream (also called a video file) and the subtitle information stream (also called a subtitle file) corresponding to the video that the user wants to play.
  • FIG. 2A is a user interface (user interface, UI) provided by the electronic device 100 for displaying applications installed on the electronic device 100 .
  • the electronic device 100 may detect the user's operation (such as a click operation) on the "video" application option 211 on the user interface 210, and in response to the operation, the electronic device 100 may display an exemplary user interface 220 as shown in FIG. 2B ,
  • the user interface 220 may be the main interface of the "Video" application program.
  • when the electronic device 100 detects the user's operation (such as a click operation) on the video playback option 221 on the user interface 220, in response to the operation, the electronic device 100 may obtain the video information stream and the subtitle information stream of the video.
  • the above-mentioned video information stream and subtitle information stream may be files downloaded by the electronic device 100 from the server of the above-mentioned video application program or files acquired in the electronic device 100 .
  • Both the video file and the subtitle file carry time information.
  • FIG. 2A and FIG. 2B only exemplarily show the user interface on the electronic device 100, and should not be construed as limiting the embodiment of the present application.
  • the video application program on the electronic device 100 sends a video information stream to the video decoding module on the electronic device 100 .
  • the video application may send the video information stream to the video decoding module.
  • the video decoding module on the electronic device 100 decodes the video information stream to generate a video frame, and sends the video frame to the video frame synthesis module on the electronic device 100 .
  • the video decoding module can decode the video information stream to generate a video frame, and the video frame can be all video frames in the video playing process, wherein one video A frame may also be referred to as an image frame, and each video frame may carry time information (that is, a time stamp) of the video frame.
  • the video decoding module may send the decoded and generated video frames to the video frame synthesis module for subsequent generation of video frames to be displayed.
  • the video decoding module may use a video decoding method in the prior art to decode the video information stream, which is not limited in this embodiment of the present application.
  • the video application program on the electronic device 100 sends the subtitle information stream to the subtitle decoding module on the electronic device 100 .
  • the video application may send the subtitle information stream to the subtitle decoding module.
  • the subtitle decoding module on the electronic device 100 decodes the subtitle information stream to generate a subtitle frame, and sends the subtitle frame to the video frame synthesis module on the electronic device 100 .
  • after the subtitle decoding module receives the subtitle information stream sent by the video application program, it can decode the subtitle information stream to generate subtitle frames, and the subtitle frames can be all subtitle frames in the video playback process, wherein each subtitle frame may include the subtitle text, the display position of the subtitle text, the font color of the subtitle text, the font format of the subtitle text, etc., and may also carry the time information (time stamp) of the subtitle frame. Afterwards, the subtitle decoding module may send the decoded subtitle frames to the video frame synthesis module for subsequent generation of video frames to be displayed.
  • the subtitle decoding module may use a subtitle decoding method in the prior art to decode the subtitle information stream, which is not limited in this embodiment of the present application.
  • for the specific implementation of the subtitle decoding method, reference may be made to technical materials related to subtitle decoding, which will not be repeated here.
  • it should be noted that the embodiment of the present application only takes executing the steps of the stage 2 video decoding stage first and then the steps of the stage 3 subtitle decoding stage as an example. The steps of the stage 3 subtitle decoding stage can also be executed first and then the steps of the stage 2 video decoding stage, or the steps of the stage 2 video decoding stage and the steps of the stage 3 subtitle decoding stage can be executed simultaneously, which is not limited in this embodiment of the present application.
  • Stage 4 Video frame synthesis, rendering and display stage
  • the video frame synthesis module on the electronic device 100 superimposes and merges the received video frame and subtitle frame to generate a video frame to be displayed, and sends the video frame to be displayed to the video frame queue on the electronic device 100 .
  • the video frame synthesis module can match the time information corresponding to the video frame with the time information corresponding to the subtitle frame. After the matching is completed, the subtitle frame is superimposed on the corresponding video frame and combined to generate a video frame to be displayed. Afterwards, the video frame synthesis module can send the video frame to be displayed to the video frame queue.
  • the video rendering module may read video frames to be displayed from the video frame queue in chronological order, and render the video frames to be displayed in chronological order to generate rendered video frames.
  • the video rendering module may acquire video frames to be displayed in the video frame queue in real time (or at intervals). After the video frame synthesis module sends the video frames to be displayed to the video frame queue, the video rendering module can read and render the video frames to be displayed from the video frame queue in chronological order, and generate rendered video frames. Afterwards, the video rendering module can send the rendered video frame to the video application program.
  • the video rendering module may use a video rendering method in the prior art to render the video frames to be displayed, which is not limited in this embodiment of the present application.
  • for the specific implementation of the video rendering method, reference may be made to technical documents related to video rendering, which will not be repeated here.
  • the electronic device 100 displays the rendered video frame.
  • the video application program on the electronic device 100 may display the rendered video frame on the display screen of the electronic device 100 (ie, the video playback window).
  • as shown in FIG. 2C, it may be a picture of a certain frame in the rendered video frames displayed after the electronic device 100 executes the subtitle display method shown in FIG. 1.
  • the subtitle “I am a subtitle spanning multiple color gamuts”, the subtitle “highly recognizable subtitles”, and the subtitle “indistinct color subtitles” are all barrages, and the display positions of the barrages are not fixed relative to the display screen of the electronic device 100.
  • the display position of the subtitle “subtitle synchronized with audio” is fixed with respect to the display screen of the electronic device 100 .
  • the embodiment of the present application provides another subtitle display method.
  • the electronic device can first obtain a video file to be played and a subtitle file to be displayed in the video playback window, then perform video decoding on the video file to obtain video frames, and perform subtitle decoding on the subtitle file to obtain subtitle frames. The electronic device can then extract the subtitle color gamut information, subtitle position information, etc. from the subtitle frame, extract the color gamut information at the subtitle display position in the video frame corresponding to the subtitle based on the subtitle position information, and calculate the subtitle recognizability based on the subtitle color gamut information and the color gamut information at the subtitle display position in the video frame corresponding to the subtitle.
  • if the subtitle recognizability is low, a mask can be added for the subtitle, and the color value and transparency of the mask are calculated based on the subtitle recognizability to generate a masked subtitle frame. After that, the video frames and the masked subtitle frames can be aligned and matched in chronological order to synthesize the final video frames to be displayed, which are buffered in the video frame queue. The video frames to be displayed are then read and rendered in chronological order, and finally, the rendered video frames are displayed in the video playback window.
  • in this way, the problem of low subtitle recognizability can be solved by adjusting the color and transparency of the subtitle mask without changing the subtitle color selected by the user. At the same time, this can reduce the occlusion of the video content by the subtitle, ensure a certain degree of visibility of the video content, and improve user experience.
  • FIG. 3 exemplarily shows the method flow of another subtitle display method provided by the embodiment of the present application.
  • the method can be applied to an electronic device 100 capable of playing video.
  • the specific steps of this method are described in detail below:
  • the electronic device 100 detects the user's operation of playing a video on a video application program, and in response to the operation, the electronic device 100 may obtain a video information stream and a subtitle information stream.
  • step S301-step S302 reference may be made to the relevant content in step S101-step S102 in the embodiment shown in FIG. 1 above, which will not be repeated here.
  • the video application program on the electronic device 100 sends a video information stream to the video decoding module on the electronic device 100 .
  • the video decoding module on the electronic device 100 decodes the video information stream to generate a video frame, and sends the video frame to the video frame synthesis module on the electronic device 100 .
  • for the specific execution process of step S303-step S305, reference may be made to the relevant content in step S103-step S105 in the embodiment shown in FIG. 1 above, which will not be repeated here.
  • the video application program on the electronic device 100 sends the subtitle information stream to the subtitle decoding module on the electronic device 100 .
  • the subtitle decoding module on the electronic device 100 decodes the subtitle information stream to generate a subtitle frame.
  • steps S306-S307 for the specific execution process of steps S306-S307, reference may be made to relevant content in steps S106-step S107 in the embodiment shown in FIG. 1 above, which will not be repeated here.
  • FIG. 4 exemplarily shows one subtitle frame generated by decoding the subtitle information stream by the subtitle decoding module.
  • the area inside the rectangular solid-line frame may represent a subtitle frame display area (or called a video playback window area), which may overlap with a video frame display area.
  • one or more subtitles can be displayed in this area, for example, “I am a subtitle spanning multiple color gamuts”, “highly recognizable subtitles”, “indistinct color subtitles”, “subtitles synchronized with audio”, and so on. Each of “I am a subtitle spanning multiple color gamuts”, “highly recognizable subtitles”, and so on can be called a subtitle, and all the subtitles displayed in this area can be called a subtitle group; for example, “I am a subtitle spanning multiple color gamuts”, “highly recognizable subtitles”, “indistinct color subtitles”, and “subtitles synchronized with audio” together can be called a subtitle group.
  • the rectangular dotted frame outside each subtitle shown in FIG. 4 is only an auxiliary element used to identify the position of each subtitle, and may not be displayed during video playback.
  • the subtitle decoding module on the electronic device 100 extracts subtitle position information, subtitle color gamut information, etc. of each subtitle in the subtitle frame, and generates subtitle group information.
  • after the subtitle decoding module generates the subtitle frame, it can extract the subtitle position information, subtitle color gamut information, etc. of each subtitle from the subtitle frame, thereby generating the subtitle group information.
  • the subtitle position information may be the display position of each subtitle in the subtitle frame display area
  • the subtitle color gamut information may include the color value of each subtitle.
  • the subtitle group information may include subtitle position information and subtitle color gamut information of all subtitles in the subtitle frame.
  • the subtitle color gamut information may also include information such as brightness of the subtitle.
  • the subtitle display position area may be the inner area of the rectangular dotted frame just able to cover the subtitle as shown in FIG. 4 , or any other inner area of any shape that can cover the subtitle, which is not limited in this embodiment of the present application.
  • the subtitle position information extraction process is introduced by taking the area inside the rectangular dotted frame as the display position area of the subtitle as an example:
  • the subtitle decoding module can first establish an X-O-Y plane Cartesian coordinate system in the subtitle frame display area, and then select a certain point in the subtitle frame display area (such as the vertex at the lower left corner of the rectangular solid-line frame) as the reference coordinate point O.
  • the coordinates of the reference coordinate point O can be set to (0, 0).
  • for example, the coordinates (x1, y1), (x2, y2), (x3, y3), (x4, y4) of the four vertices of the rectangular dotted frame outside the subtitle “I am a subtitle spanning multiple color gamuts” can be calculated, and the position information of the subtitle “I am a subtitle spanning multiple color gamuts” can then include the coordinates of these four vertices of the dotted rectangle. Alternatively, since the rectangle is a regular figure, the position area of the rectangle can be determined from the coordinates of only the two vertices on one diagonal of the dotted rectangle; therefore, the position information of the subtitle “I am a subtitle spanning multiple color gamuts” may also include only the coordinates of the two vertices on one diagonal of the dotted rectangle.
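  • As a small illustrative sketch (with a hypothetical helper name), the rectangular display area can be recovered from just the two vertices on one diagonal:

```python
def rect_from_diagonal(p1, p2):
    """Recover the rectangular subtitle area from the two vertices on one
    diagonal, given as (x, y) coordinates relative to the reference point O."""
    (x1, y1), (x2, y2) = p1, p2
    left, right = min(x1, x2), max(x1, x2)
    bottom, top = min(y1, y2), max(y1, y2)
    # the four vertices of the dotted rectangle, listed counter-clockwise
    return [(left, bottom), (right, bottom), (right, top), (left, top)]
```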
  • the subtitle position information of other subtitles shown in FIG. 4 can also be extracted by the above subtitle position extraction method, which will not be repeated here.
  • after the subtitle decoding module has determined the position information of all the subtitles in the subtitle frame, the subtitle decoding module has completed the extraction of the subtitle position information.
  • the subtitle position information extraction process described above is only one possible implementation of extracting subtitle position information; the subtitle position information can also be extracted by other implementation methods in the prior art, which is not limited in this embodiment of the present application.
  • the color value refers to the color value corresponding to a certain color in different color modes. Take the RGB color mode as an example. In the RGB color mode, a color is formed by mixing red, green, and blue.
  • the color value of each color can be represented by (r, g, b), where r, g and b represent the values of the three primary colors of red, green and blue respectively, and the value range is [0, 255].
  • the color value of red can be expressed as (255, 0, 0)
  • the color value of green can be expressed as (0, 255, 0)
  • the color value of blue can be expressed as (0, 0, 255)
  • the color value of black can be expressed as (0, 0, 0)
  • the color value of white can be expressed as (255, 255, 255).
  • the 2^24 different colors and the color value corresponding to each color can form a color value table, and the color value corresponding to each color can be found in the color value table.
  • after the subtitle decoding module completes the extraction of the subtitle position information, based on the font color of the subtitle at the location of the subtitle, the color value corresponding to the font color can be looked up in the color value table to determine the color value of the subtitle.
  • after the subtitle decoding module has determined the color values of all the subtitles in the subtitle frame, the subtitle decoding module has completed the extraction of the subtitle color gamut information.
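  • A tiny illustrative excerpt of such a lookup is sketched below; a real color value table would cover all 2^24 colors, and the entries shown are only the example colors listed above.

```python
# A small excerpt of a color value table mapping color names to (r, g, b) values.
COLOR_VALUE_TABLE = {
    "red":   (255, 0, 0),
    "green": (0, 255, 0),
    "blue":  (0, 0, 255),
    "black": (0, 0, 0),
    "white": (255, 255, 255),
}

def lookup_color_value(font_color_name):
    """Look up the color value corresponding to a subtitle's font color."""
    return COLOR_VALUE_TABLE[font_color_name]
```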
  • the subtitle decoding module on the electronic device 100 sends an instruction to acquire subtitle group mask parameters to the video frame color gamut interpretation module on the electronic device 100, and the instruction carries time information of the subtitle frame, subtitle group information, and the like.
  • after the subtitle decoding module generates the subtitle group information, it can send an instruction to obtain the subtitle group mask parameters to the video frame color gamut interpretation module, and the instruction is used to instruct the video frame color gamut interpretation module to send the mask parameters corresponding to the subtitle group to the subtitle decoding module.
  • the mask parameters include the color value and the transparency of the mask; a color value and a transparency together can be called a group of mask parameters.
  • this instruction can carry the time information of the subtitle frame, the subtitle group information, etc., wherein the time information of the subtitle frame can be used in subsequent steps to obtain the video frame corresponding to the subtitle group, and the subtitle group information can be used in subsequent steps for the superimposed subtitle recognizability analysis of the subtitle group.
  • the video frame color gamut interpretation module on the electronic device 100 sends an instruction to obtain a video frame corresponding to the subtitle group to the video decoding module on the electronic device 100, and the instruction carries time information of the subtitle frame and the like.
  • after the video frame color gamut interpretation module receives the instruction to obtain the subtitle group mask parameters sent by the subtitle decoding module, it can send an instruction to the video decoding module to obtain the video frame corresponding to the subtitle group, and the instruction is used to instruct the video decoding module to send the video frame corresponding to the subtitle group to the video frame color gamut interpretation module.
  • the instruction may carry time information of the subtitle frame, and the time information of the subtitle frame may be used by the video decoding module to find the video frame corresponding to the subtitle group.
  • the video decoding module on the electronic device 100 searches for the video frame corresponding to the subtitle group, and sends the video frame corresponding to the subtitle group to the video frame color gamut interpretation module on the electronic device 100 .
  • the video decoding module can find the video frame corresponding to the subtitle group based on the time information of the subtitle frame carried in the instruction. Because the video decoding module has decoded the time information of all video frames in the video decoding stage, the video decoding module can match the time information of all video frames with the time information of the subtitle frame; if the matching is successful (that is, the time information of a video frame is consistent with the time information of the subtitle frame), then that video frame is the video frame corresponding to the subtitle group. Afterwards, the video decoding module may send the video frame corresponding to the subtitle group to the video frame color gamut interpretation module.
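  • A minimal sketch of this timestamp matching, assuming each decoded frame is represented by an object with a timestamp attribute (a hypothetical structure used only for illustration):

```python
def find_video_frame_for_subtitle_group(video_frames, subtitle_timestamp):
    """Return the decoded video frame whose time information matches the time
    information of the subtitle frame, or None if no frame matches."""
    for frame in video_frames:
        if frame.timestamp == subtitle_timestamp:
            return frame  # matching succeeded: this frame corresponds to the subtitle group
    return None
```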
  • the video frame color gamut interpretation module on the electronic device 100 obtains the color gamut information at each subtitle position in the video frame corresponding to the subtitle group based on the subtitle position information in the subtitle group information.
  • after the video frame color gamut interpretation module acquires the video frame corresponding to the subtitle group, it can determine the video frame area corresponding to the position of each subtitle based on each piece of subtitle position information in the subtitle group information. Further, the video frame color gamut interpretation module can calculate the color gamut information of the video frame area corresponding to the location of each subtitle.
  • the following takes the video frame color gamut interpretation module calculating the color gamut information of the video frame area corresponding to subtitle 1 as an example for illustration.
  • the video frame area corresponding to the position of subtitle 1 can be the inner area of the rectangular solid-line frame at the top of FIG. 5. Since there may be pixel areas of different color gamuts in a video frame area, a video frame area can be divided into multiple sub-areas, and each sub-area may be called a video frame color gamut extraction unit. The sub-areas can be divided according to a preset width, or according to the width of each word in the subtitle. For example, subtitle 1 has 13 words in total, and the video frame area corresponding to the location of subtitle 1 is divided into 13 sub-areas according to the width of each word in subtitle 1 in FIG. 5, that is, 13 video frame color gamut extraction units.
  • the video frame color gamut interpretation module may sequentially calculate the color gamut information of each sub-region in a sequence from left to right (or from right to left). Taking the calculation of the color gamut information of a sub-region in the video frame area as an example, the video frame color gamut interpretation module can obtain the color values of all pixels in the sub-region, and then superimpose and average the color values of all pixels, so that The average value of the color values of all pixels in the sub-region can be obtained, the average value is the color value of the sub-region, and the color value of the sub-region is the color gamut information of the sub-region.
  • assuming a sub-region is m pixels wide and n pixels high, the sub-region has a total of m*n pixels, and the color value of each pixel can be represented by (r, g, b). The average color value of the i-th sub-region is then $(r_i, g_i, b_i)$, where $r_i = \frac{1}{m \times n}\sum_{j=1}^{m \times n} r_j$, $g_i = \frac{1}{m \times n}\sum_{j=1}^{m \times n} g_j$, and $b_i = \frac{1}{m \times n}\sum_{j=1}^{m \times n} b_j$, with $(r_j, g_j, b_j)$ the color value of the j-th pixel; that is, r_i is the average red color value of all the pixels in the sub-region, g_i is the average green color value of all the pixels in the sub-region, and b_i is the average blue color value of all the pixels in the sub-region.
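  • A minimal sketch of this per-sub-region averaging, assuming each sub-region is given as a list of (r, g, b) pixel values (the data layout is an assumption made for illustration):

```python
def subregion_average_color(pixels):
    """Average (r_i, g_i, b_i) of one video frame color gamut extraction unit,
    where `pixels` lists the (r, g, b) values of its m*n pixels."""
    count = len(pixels)
    r_avg = sum(p[0] for p in pixels) / count
    g_avg = sum(p[1] for p in pixels) / count
    b_avg = sum(p[2] for p in pixels) / count
    return (r_avg, g_avg, b_avg)

def video_frame_area_color_info(subregions):
    """Color gamut information of a subtitle's video frame area: the average
    color of each of its sub-regions, computed from left to right."""
    return [subregion_average_color(subregion) for subregion in subregions]
```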
  • the video frame color gamut interpretation module can calculate the color gamut information of all sub-regions of the video frame region corresponding to the location of each subtitle, that is, the color gamut information of the subtitle position in the video frame corresponding to the subtitle group.
  • the number of subregions for dividing the video frame region corresponding to the subtitle may be determined based on a preset division rule, which is not limited in this embodiment of the present application.
  • the color gamut information of the video frame area may also include information such as brightness of the video frame area.
  • the video frame color gamut interpretation module on the electronic device 100 generates a superimposed subtitle recognition degree analysis result based on each piece of subtitle color gamut information in the subtitle group information and the color gamut information at each subtitle position in the video frame corresponding to the subtitle group.
  • after the video frame color gamut interpretation module has calculated the color gamut information at the subtitle positions in the video frame corresponding to the subtitle group, it can analyze the recognizability of the superimposed subtitles based on the subtitle color gamut information in the subtitle group information and the color gamut information at the subtitle positions in the video frame corresponding to the subtitle group.
  • a superimposed subtitle recognizability analysis result can be generated through the superimposed subtitle recognizability analysis, and the result is used to indicate the recognizability of each subtitle in the subtitle group (that is, whether the recognizability is high or low).
  • specifically, after the subtitle group is superimposed on the subtitle positions in the corresponding video frame, the video frame color gamut interpretation module can judge the difference between the color of each subtitle and the color of the video frame area corresponding to that subtitle; if the difference is small, it means that the subtitle recognizability is low and the subtitle is not easy for the user to recognize.
  • the video frame color gamut interpretation module can determine the color difference value between the subtitle color and the color of the video frame area corresponding to the subtitle, and the color difference value is used to represent the difference between the subtitle color and the color of the video frame area corresponding to the subtitle.
  • the color difference value can be determined using a related algorithm in the prior art.
  • for example, the color difference value Diff can be calculated using the following formula: $Diff = \frac{1}{k}\sum_{i=1}^{k}\sqrt{(r_i - r_0)^2 + (g_i - g_0)^2 + (b_i - b_0)^2}$, where k is the number of all sub-regions of the video frame region corresponding to a subtitle, r_i is the average red color value of all pixels in the i-th sub-region, g_i is the average green color value of all pixels in the i-th sub-region, b_i is the average blue color value of all pixels in the i-th sub-region, r_0 is the red color value of the subtitle, g_0 is the green color value of the subtitle, and b_0 is the blue color value of the subtitle.
  • after the video frame color gamut interpretation module calculates the color difference value, it can determine whether the subtitle recognizability is high or low by judging whether the color difference value is smaller than a preset color difference threshold.
  • if the color difference value is smaller than the preset color difference threshold (also referred to as a first threshold), it indicates that the subtitle has a low degree of recognizability.
  • in some examples, the brightness of the video frame area corresponding to the subtitle can also be combined to further determine whether the subtitle is recognizable. For example, if the brightness of the video frame area corresponding to the subtitle is higher than a certain preset brightness threshold, it also indicates that the subtitle recognizability is low.
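  • A minimal sketch of this recognizability check, using the averaged Euclidean color distance given above as Diff; the threshold values and the optional brightness check are illustrative parameters, not values taken from the embodiment.

```python
import math

def color_difference(subtitle_color, subregion_colors):
    """Diff between the subtitle color (r0, g0, b0) and the k sub-region
    averages (r_i, g_i, b_i): the mean Euclidean color distance."""
    r0, g0, b0 = subtitle_color
    k = len(subregion_colors)
    return sum(math.sqrt((ri - r0) ** 2 + (gi - g0) ** 2 + (bi - b0) ** 2)
               for ri, gi, bi in subregion_colors) / k

def recognizability_is_low(subtitle_color, subregion_colors, diff_threshold,
                           area_brightness=None, brightness_threshold=None):
    """Low recognizability when Diff is below the preset color difference
    threshold, or (optionally) when the area brightness exceeds a preset
    brightness threshold."""
    if color_difference(subtitle_color, subregion_colors) < diff_threshold:
        return True
    if area_brightness is not None and brightness_threshold is not None:
        return area_brightness > brightness_threshold
    return False
```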
  • the extracted subtitle color gamut information may only include a parameter corresponding to a color value of the subtitle.
  • the extracted subtitle color gamut information may include multiple parameters.
  • the extracted subtitle color gamut information may include multiple parameters such as starting point color value, end point color value, gradient direction, etc.
  • the average value of the starting point color value and the ending point color value of the subtitle can be calculated first, and then the average value can be used as the corresponding color value of the subtitle to analyze the recognition degree of superimposed subtitles .
  • the video frame color gamut interpretation module on the electronic device 100 calculates the color value and transparency of the mask corresponding to each subtitle in the subtitle group based on the analysis result of the superimposed subtitle recognition degree.
  • after the video frame color gamut interpretation module generates the superimposed subtitle recognizability analysis result, it can calculate the color value and transparency of the mask corresponding to each subtitle in the subtitle frame based on the result.
  • for a subtitle with high recognizability, the color value of the mask corresponding to the subtitle can be a preset fixed value, and the transparency can be set to 100%.
  • for a subtitle with low recognizability, the color value and transparency of the mask corresponding to the subtitle are further determined based on the subtitle color gamut information or the color gamut information of the video frame area corresponding to the subtitle location.
  • for example, the color value of a color whose difference from the color value of the subtitle or from the color value of the video frame area corresponding to the subtitle is the largest can be determined as the color value of the mask corresponding to the subtitle; in this way, the user can see the subtitle more clearly. Alternatively, the color value of a color whose difference from the color value of the subtitle or from the color value of the video frame area corresponding to the subtitle is centered can be determined as the color value of the mask corresponding to the subtitle; in this way, while ensuring that the user can clearly see the subtitle, eye discomfort caused by excessive color differences can also be avoided, and so on.
  • specifically, the electronic device 100 can calculate the color difference value Diff between the color value corresponding to each color in the color value table and the color value of the subtitle, and then select the color value corresponding to the color with the largest/centered color difference value Diff as the color value of the mask.
  • for example, the following formula can be used to calculate the color difference value Diff between the color value corresponding to each color in the color value table and the color value of the subtitle: $Diff = \sqrt{(R_0 - r_0)^2 + (G_0 - g_0)^2 + (B_0 - b_0)^2}$, where the color value corresponding to a certain color in the color value table is (R_0, G_0, B_0), R_0 is the red color value corresponding to the color, G_0 is the green color value corresponding to the color, B_0 is the blue color value corresponding to the color, r_0 is the red color value of the subtitle, g_0 is the green color value of the subtitle, and b_0 is the blue color value of the subtitle.
  • alternatively, the electronic device 100 may calculate the color difference value Diff between the color value corresponding to each color in the color value table and the color value of the video frame area corresponding to the subtitle, and then select the color value corresponding to the color with the largest/centered color difference value Diff as the color value of the mask.
  • for example, the following formula can be used to calculate the color difference value Diff between the color value corresponding to each color in the color value table and the color value of the video frame area corresponding to the subtitle: $Diff = \frac{1}{k}\sum_{i=1}^{k}\sqrt{(R_0 - r_i)^2 + (G_0 - g_i)^2 + (B_0 - b_i)^2}$, where the color value corresponding to a certain color in the color value table is (R_0, G_0, B_0), R_0 is the red color value corresponding to the color, G_0 is the green color value corresponding to the color, B_0 is the blue color value corresponding to the color, k is the number of all sub-regions in the video frame region corresponding to the subtitle, r_i is the average red color value of all pixels in the i-th sub-region, g_i is the average green color value of all pixels in the i-th sub-region, and b_i is the average blue color value of all pixels in the i-th sub-region.
  • the transparency of the mask corresponding to the subtitle may be further determined based on the color value of the mask corresponding to the subtitle. For example, when the difference between the color value of the mask corresponding to the subtitle and the color value of the subtitle is large, the transparency of the mask corresponding to the subtitle can be appropriately set to a larger value (for example, a value greater than 50%). In this way, while ensuring that the user can clearly see the subtitle, the occlusion of the video picture by the subtitle superimposition area can also be reduced.
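  • The following sketch illustrates one way to pick the mask color from the color value table by its Diff to the subtitle color and to derive a transparency from that Diff; the strategy names, thresholds and the Diff-to-transparency mapping are assumptions made for illustration only.

```python
import math

def pick_mask_parameters(subtitle_color, color_value_table, strategy="largest"):
    """Choose a mask color value and transparency for one subtitle."""
    r0, g0, b0 = subtitle_color
    candidates = []
    for color_value in color_value_table.values():
        R0, G0, B0 = color_value
        diff = math.sqrt((R0 - r0) ** 2 + (G0 - g0) ** 2 + (B0 - b0) ** 2)
        candidates.append((diff, color_value))
    candidates.sort(key=lambda item: item[0])

    if strategy == "largest":
        mask_diff, mask_color = candidates[-1]                    # most distinguishable from the subtitle
    else:
        mask_diff, mask_color = candidates[len(candidates) // 2]  # "centered" color difference

    # A larger mask/subtitle color difference lets the mask be more transparent
    # (e.g. above 50%) while the subtitle stays readable.
    transparency = 0.75 if mask_diff > 128 else 0.40
    return mask_color, transparency
```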
  • the video frame color gamut interpretation module on the electronic device 100 sends the color value and transparency of the mask corresponding to each subtitle in the subtitle group to the subtitle decoding module on the electronic device 100 .
  • after the video frame color gamut interpretation module calculates the color value and the transparency of the mask corresponding to each subtitle in the subtitle group, it can send the color value and the transparency of the mask corresponding to each subtitle in the subtitle group to the subtitle decoding module. At the same time, the subtitle position information of the subtitle corresponding to each mask can also be carried, so that the subtitle decoding module can establish a one-to-one correspondence between subtitles and masks.
  • the subtitle decoding module on the electronic device 100 generates a corresponding mask based on the color value and transparency of the mask corresponding to each subtitle in the subtitle group, and superimposes each subtitle in the subtitle group and its corresponding mask to generate a masked subtitle frame.
  • after the subtitle decoding module receives the color value and transparency of the mask corresponding to each subtitle in the subtitle group sent by the video frame color gamut interpretation module, it can generate the mask corresponding to a subtitle (such as the mask corresponding to subtitle 1 shown in FIG. 5 ) based on the color value, transparency, and subtitle position information of the mask corresponding to that subtitle, wherein the shape of the mask can be a rectangle or any other shape that can cover the subtitle; this is not limited here.
  • the subtitle decoding module can generate a mask corresponding to each subtitle in the subtitle group.
  • for example, for a subtitle group containing four subtitles, the subtitle decoding module can generate four masks, with one subtitle corresponding to one mask.
  • the subtitle decoding module may superimpose the subtitle on the upper layer of the mask corresponding to the subtitle to generate a masked subtitle (for example, the masked subtitle 1 shown in FIG. 5 ).
  • the subtitle decoding module can superimpose each subtitle in the subtitle group and its corresponding mask to generate a masked subtitle frame.
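As an illustration of how a masked subtitle frame could be composed, the sketch below draws each subtitle's rectangular mask first and the subtitle text on top of it in a transparent layer. Pillow is an assumed choice of library (the patent does not name one), and the dictionary keys are illustrative.

```python
from PIL import Image, ImageDraw

def masked_subtitle_frame(frame_size, subtitles):
    """Build a transparent subtitle frame: for each subtitle, draw its mask
    (a rectangle with the chosen color and alpha) and superimpose the text on top.
    `subtitles` items use illustrative keys: text, box=(x0, y0, x1, y1),
    text_color=(R, G, B), mask_rgba=(R, G, B, A); alpha 0 means fully transparent."""
    layer = Image.new("RGBA", frame_size, (0, 0, 0, 0))
    draw = ImageDraw.Draw(layer)
    for sub in subtitles:
        draw.rectangle(sub["box"], fill=sub["mask_rgba"])    # mask below
        draw.text((sub["box"][0], sub["box"][1]), sub["text"],
                  fill=sub["text_color"])                     # subtitle on top
    return layer

# The masked subtitle frame can later be alpha-composited over a decoded video frame:
# Image.alpha_composite(video_frame.convert("RGBA"), masked_subtitle_frame(size, subs))
```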
  • Fig. 6A exemplarily shows a subtitle frame with masks. It can be seen that each subtitle is superimposed with a mask, wherein the subtitles with high recognizability (such as "highly recognizable subtitles" and "subtitles synchronized with audio") correspond to masks with a transparency of 100%, and the subtitles with low recognizability (such as "I am a subtitle that spans multiple color gamuts" and "colored subtitles that cannot be seen clearly") correspond to masks with a transparency of less than 100% and a certain color value.
  • the subtitle decoding module on the electronic device 100 sends the masked subtitle frame to the video frame synthesis module on the electronic device 100 .
  • the subtitle decoding module After the subtitle decoding module generates the masked subtitle frame, it can send the masked subtitle frame to the video frame synthesis module for subsequent generation of a video frame to be displayed.
  • Stage 4 Video frame synthesis, rendering and display stage
  • the video frame composition module on the electronic device 100 superimposes the received video frame and the masked subtitle frame to generate a video frame to be displayed, and sends the video frame to be displayed to the video frame queue on the electronic device 100 video frames.
  • the video rendering module can read video frames to be displayed from the video frame queue in chronological order, and render the video frames to be displayed in chronological order to generate rendered video frames.
  • the electronic device 100 displays the rendered video frame.
  • for step S319 to step S324, reference may be made to the relevant content of step S109 to step S114 in the embodiment shown in the foregoing figure, which is not repeated here.
  • the above-mentioned video decoding module, subtitle decoding module, video frame color gamut interpretation module, video frame synthesis module, video frame queue, and video rendering module may all be integrated in the above-mentioned video application program to implement the subtitle display method provided in the embodiment of the present application, which is not limited in the embodiment of the present application.
  • FIG. 6B may be a picture of a certain frame in the rendered video frames displayed after the electronic device 100 executes the subtitle display method shown in FIG. 3 (one subtitle may correspond to one mask).
  • FIG. 6B may be a schematic diagram of the first user interface with the video playback progress at 8:00
  • FIG. 6C may be a schematic diagram of the second user interface with the video playback progress at 8:02.
  • the video frame included in the first user interface is different from the video frame included in the second user interface.
  • the subtitles "I am a subtitle spanning multiple color gamuts”, the subtitles “Highly recognizable subtitles”, and the subtitles “Color subtitles that cannot be seen clearly” are all similar to those in Figure 6B.
  • the electronic device 100 will recalculate the color value and transparency of the mask corresponding to the subtitle based on the color value of the subtitle and the color value of the video frame area currently corresponding to the subtitle, and regenerate the mask corresponding to the subtitle. It is easy to see that, in the second user interface, the video background color of the video frame area currently corresponding to the subtitle "I am a subtitle spanning multiple color gamuts" has changed, and the recognizability of the subtitle has also become higher.
  • the mask corresponding to the subtitle "I am a subtitle spanning multiple color gamuts" has also changed compared with FIG. 6B. It can be seen that no mask is displayed for the subtitle; specifically, the transparency of the mask corresponding to the subtitle may become 100%, or the subtitle may have no mask.
  • the video playing screen shown in FIG. 6B and FIG. 6C may be displayed in full screen or in partial screen, which is not limited in this embodiment of the present application.
  • the mask corresponding to the subtitle shown in FIG. 6B above is a mask spanning the entire area where the subtitle is located, that is, each subtitle corresponds to only one mask.
  • a subtitle may span multiple areas with large color gamut differences, resulting in a part of the subtitle with high recognizability and another part with low recognizability.
  • in this case, multiple corresponding masks can be generated for one subtitle. For example, for the subtitle "I am a subtitle spanning multiple color gamuts" shown in FIG. 2C, the subtitle recognizability of the front part of the area where the subtitle is located is relatively low (that is, the words "I am a subtitle" are not easy for users to identify), the subtitle recognizability of the back part of the area where the subtitle is located is also low (that is, the words "subtitle of the domain" are not easy for users to see clearly), and the subtitle recognizability of the middle part of the area where the subtitle is located is high (that is, the words "across multiple colors" are easy for users to see clearly). Therefore, in this case, one mask can be generated for each of the front part, the middle part, and the back part of the area where the subtitle is located; that is, the subtitle can have three corresponding masks.
  • the embodiment of the present application can make some corresponding improvements to step S313 to step S317 on the basis of the method shown in FIG. 3 above, so as to realize that one subtitle corresponds to multiple masks. No other steps need to be changed.
  • the video frame color gamut interpretation module can calculate the color value of each sub-region sequentially from left to right (or from right to left). In the above application scenario where one subtitle needs to correspond to multiple masks, that is, in the application scenario where a subtitle spans multiple regions with large color gamut differences, the video frame color gamut interpretation module can compare the color values of adjacent sub-regions: if the color values of adjacent sub-regions are similar, they are merged into one region, and the merged region corresponds to one mask; if the color values of adjacent sub-regions differ greatly, they are not merged, and the two unmerged regions correspond to their respective masks. Therefore, one subtitle may correspond to multiple masks.
  • step S313 to step S317 can be specifically executed according to the following steps, taking the subtitle 1 shown in FIG. 7B, that is, the subtitle "I am a subtitle spanning multiple color gamuts" shown in FIG. 2C, as an example.
  • the video frame color gamut interpretation module sequentially calculates the color value of each sub-area of the video frame area corresponding to the position of the subtitle, and merges the sub-areas with similar color values to obtain M second sub-areas.
  • when the video frame color gamut interpretation module calculates the color value of each sub-region sequentially from left to right (or from right to left), it also needs to compare the color values of adjacent sub-regions and merge the sub-regions with similar color values to obtain M second sub-regions, where M is a positive integer.
  • the video frame color gamut interpretation module divides the video frame area corresponding to the position of the subtitle into three areas (that is, three second sub-regions): area A, area B, and area C, assuming that area A is formed by merging a sub-regions, area B is formed by merging b sub-regions, and area C is formed by merging c sub-regions.
  • "similar color values" may mean that the difference between the color values of two sub-regions is smaller than a second threshold, where the second threshold is preset.
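A minimal sketch of this merge step, scanning sub-regions left to right and merging neighbours whose mean colors differ by less than the second threshold; the threshold value and data layout are illustrative assumptions.

```python
import math

def merge_similar_subregions(subregion_colors, second_threshold=30.0):
    """Merge adjacent sub-regions with 'similar' color values (difference below the
    second threshold). Returns the M second sub-regions as
    (start_index, end_index, mean_color) tuples."""
    merged = []
    for i, color in enumerate(subregion_colors):
        if merged and math.dist(color, merged[-1][2]) < second_threshold:
            start, end, mean = merged[-1]
            n = end - start + 1
            new_mean = tuple((m * n + c) / (n + 1) for m, c in zip(mean, color))
            merged[-1] = (start, i, new_mean)
        else:
            merged.append((i, i, tuple(float(c) for c in color)))
    return merged

# Two dark sub-regions followed by two bright ones -> two second sub-regions
print(merge_similar_subregions([(10, 10, 10), (12, 11, 9), (240, 240, 240), (238, 241, 243)]))
```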
  • the video frame color gamut interpretation module analyzes the superimposed subtitle recognition degree for the M second subregions respectively, and generates superimposed subtitle recognition degree analysis results for the M second subregions.
  • the video frame color gamut interpretation module needs to analyze the recognition degree of superimposed subtitles for region A, region B, and region C respectively, instead of directly analyzing the recognition degree of superimposed subtitles for the entire video frame region.
  • the video frame color gamut interpretation module can also use the color difference value in step S314 to analyze the superimposed subtitle recognition degree of the area A, area B, and area C respectively, and the process is as follows:
  • a is the number of sub-regions included in area A
  • ri is the average red color value of all pixels in the i-th sub-region in area A
  • gi is the average green color value of all pixels in the i-th sub-region in area A
  • bi is the average blue color value of all pixels in the i-th sub-region in area A
  • r0 is the red color value of the subtitle in area A
  • g0 is the green color value of the subtitle in area A
  • b0 is the blue color value of the subtitle in area A
  • b is the number of sub-regions included in area B
  • ri is the average red color value of all pixels in the i-th sub-region in area B
  • gi is the average green color value of all pixels in the i-th sub-region in area B
  • bi is the average blue color value of all pixels in the i-th sub-region in area B
  • r0 is the red color value of the subtitle in area B
  • g0 is the green color value of the subtitle in area B
  • b0 is the blue color value of the subtitle in area B
  • c is the number of sub-regions included in area C
  • ri is the average red color value of all pixels in the i-th sub-region in area C
  • gi is the average green color value of all pixels in the i-th sub-region in area C
  • bi is the average blue color value of all pixels in the i-th sub-region in area C
  • r0 is the red color value of the subtitle in area C
  • g0 is the green color value of the subtitle in area C
  • b0 is the blue color value of the subtitle in area C
  • after the video frame color gamut interpretation module calculates the color difference values of area A, area B, and area C respectively, it can judge whether the color difference value of each of these three areas is less than a preset color difference threshold; if so, it indicates that the subtitle recognizability in that area is low.
  • the video frame color gamut interpretation module respectively determines the color value and transparency of the masks corresponding to the M second sub-regions based on the subtitle color gamut information and the superimposed subtitle recognition analysis results of the M second sub-regions.
  • that is, based on the subtitle color gamut information and the superimposed subtitle recognizability analysis results of area A, area B, and area C respectively, the video frame color gamut interpretation module needs to determine the color value and transparency of the mask corresponding to area A, the color value and transparency of the mask corresponding to area B, and the color value and transparency of the mask corresponding to area C.
  • the process of specifically determining the color value and transparency of the mask corresponding to each second sub-region is similar to the process of determining the color value and transparency of the mask corresponding to the entire video frame area corresponding to the position of the subtitle in step S315; reference may be made to the aforementioned related content, which is not repeated here.
  • the video frame color gamut interpretation module sends the color value, transparency, and position information of the masks corresponding to the M second subregions to the subtitle decoding module.
  • in addition to the color value and transparency of the mask corresponding to each subtitle in the subtitle group, the video frame color gamut interpretation module also needs to send the subtitle decoding module the position information of each mask (or the position information of each mask relative to its corresponding subtitle). The position information of each mask can be obtained based on the subtitle position information; specifically, if a subtitle corresponds to multiple masks, since the position information of the subtitle is known, the position information of all sub-regions of the video frame area where the subtitle is located can be deduced, and therefore the position information of the second sub-region corresponding to each mask can be deduced.
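A sketch of how the per-mask position information could be deduced from the subtitle position: the subtitle's bounding box is split evenly into its sub-regions, and each merged second sub-region maps to the rectangle covering its columns. The even split and the (start, end, mean) representation are assumptions carried over from the merge sketch above.

```python
def mask_boxes_from_subtitle(subtitle_box, merged_regions, num_subregions):
    """Derive the position (x0, y0, x1, y1) of each mask from the subtitle's box
    and the merged second sub-regions produced by merge_similar_subregions()."""
    x0, y0, x1, y1 = subtitle_box
    width = (x1 - x0) / num_subregions
    return [
        (x0 + start * width, y0, x0 + (end + 1) * width, y1)
        for (start, end, _mean) in merged_regions
    ]

# A subtitle box split into 10 sub-regions that merged into 3 second sub-regions
print(mask_boxes_from_subtitle((100, 900, 700, 940),
                               [(0, 2, None), (3, 6, None), (7, 9, None)], 10))
```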
  • the subtitle decoding module generates a mask corresponding to the subtitle based on the color value, transparency, and position information of the masks corresponding to the M second subregions, and superimposes the subtitle on the mask to generate a masked subtitle.
  • the subtitle decoding module may generate the three masks corresponding to the subtitle based on the color value and transparency of the mask of each second sub-region corresponding to the subtitle and the position information of each mask (such as the masks corresponding to subtitle 1 shown in FIG. 7B). After that, the subtitle decoding module can superimpose the subtitle on the upper layer of the masks corresponding to the subtitle to generate a masked subtitle (such as the masked subtitle 1 shown in FIG. 7B).
  • the subtitle decoding module can superimpose each subtitle in the subtitle group and its corresponding mask, so as to generate a masked subtitle frame.
  • Fig. 8A exemplarily shows a subtitle frame with masks. It can be seen that the subtitle "I am a subtitle spanning multiple color gamuts" is superimposed with three masks, where "I am a subtitle" and "subtitle of the domain" have a low degree of recognizability, so the transparency of the corresponding masks is less than 100% and they have a certain color value, while "across multiple colors" has a higher degree of recognizability, so the transparency of the corresponding mask is 100%. Each of the remaining three subtitles also has a mask superimposed on it.
  • the subtitle "Highly Recognizable Subtitle” and the subtitle “Subtitle Synchronized with Audio” are highly recognizable, so the transparency of the corresponding mask is 100%, and the subtitle “Unclear “Color subtitle” has a low degree of recognition, so the transparency of the corresponding mask is less than 100%, and it has a certain color value.
  • FIG. 8B may be an image of a certain frame in the rendered video frames displayed after the electronic device 100 executes the improved subtitle display method described above.
  • the mask corresponding to the subtitle has changed. It is easy to see that, since the middle part of the area where the subtitle is located (that is, the "across multiple colors" part) has a high degree of subtitle recognizability, the transparency of the mask corresponding to this part is set to 100% (that is, fully transparent), or no mask is set for it; however, because the subtitle recognizability of the front part (that is, the "I am a subtitle" part) and the back part (that is, the "subtitle of the domain" part) of the area where the subtitle is located is relatively low, the color value and transparency of the masks corresponding to these two parts are calculated based on the subtitle color gamut information and the color gamut information of the corresponding video frame sub-regions.
  • FIG. 8B may be a schematic diagram of the user interface of the video playback progress at 8:00, including the first video frame
  • FIG. 8C may be a schematic diagram of the user interface with the video playback progress at 8:01, including the second video frame, where the first video frame and the second video frame are the same.
  • the subtitles with low recognition are "I am a subtitle” and "Subtitle of Domain”. Therefore, the corresponding masks of these two parts have certain color values, and the transparency of the corresponding masks is less than 100%.
  • the part of the subtitle with higher recognizability is "across multiple colors", so no mask is displayed for this part; specifically, the transparency of the mask corresponding to this part may be 100%, or no mask is set.
  • the subtitles with low recognizability are changed to "I am a cross" and "subtitles", so the electronic device 100 will recalculate the color value and transparency of the mask corresponding to the subtitle based on the color value of the subtitle and the color value of the video frame area currently corresponding to the subtitle, and regenerate the mask corresponding to the subtitle.
  • the generation process of the mask corresponding to the subtitle in FIG. 8C is similar to the generation process of the mask corresponding to the subtitle in FIG. 8B , and will not be repeated here.
  • the video playing screen shown in FIG. 8B and FIG. 8C may be displayed in full screen or in partial screen, which is not limited in this embodiment of the present application.
  • in some embodiments, for a subtitle with high recognizability, the electronic device 100 will also generate a mask for the subtitle; the color value of the mask can be a preset color value, and the transparency of the mask is 100%.
  • the electronic device 100 may not generate a mask for the subtitle, that is, if the electronic device 100 determines that the subtitle is highly recognizable, the electronic device 100 may not further process the subtitle, Therefore, the subtitle has no corresponding mask, that is, the subtitle is not set with a mask.
  • that a subtitle corresponds to one mask may mean that the subtitle corresponds to one mask containing one color value and one transparency. That a subtitle corresponds to multiple masks (that is, one subtitle corresponds to multiple sets of mask parameters) may mean that the subtitle corresponds to multiple masks with different color values and different transparencies, or that the subtitle corresponds to one mask containing different color values and different transparencies (that is, multiple masks with different color values and different transparencies are combined into one mask containing different color values and different transparencies).
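One way to picture the two cases is a small data structure in which a subtitle carries either one set of mask parameters or several; the field names below are illustrative, not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class MaskParams:
    """One set of mask parameters (illustrative names)."""
    box: Tuple[float, float, float, float]   # position of this mask
    color: Tuple[int, int, int]              # mask color value
    transparency: float                      # 1.0 means 100% transparent

@dataclass
class Subtitle:
    text: str
    box: Tuple[float, float, float, float]
    masks: List[MaskParams]  # one element -> one mask; several -> multiple parameter sets
```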
  • the electronic device 100 in the embodiment of the present application takes a mobile phone as an example, and the electronic device 100 can also be a portable computer such as a tablet computer (Pad), a personal digital assistant (Personal Digital Assistant, PDA), or a laptop computer (Laptop).
  • the embodiment of the present application does not limit the type, physical form, and size of the electronic device 100 .
  • the first video may be the video played by the electronic device 100 after the user clicks on the video play option 221 shown in FIG. 2B
  • the first interface may be the user interface shown in FIG.
  • the first subtitle can be the subtitle "I am a subtitle spanning multiple color gamuts”
  • the first area is the area in the first picture corresponding to the display position of the first subtitle
  • the first numerical value can be the color difference value between the color of the first subtitle and the color of the first picture area corresponding to the display position of the first subtitle
  • the second interface can be the user interface shown in Figure 6C
  • the second picture can be the video frame picture shown in FIG. 6C, the second area is the area in the second picture corresponding to the display position of the first subtitle, and the second numerical value can be the color difference value between the color of the first subtitle and the color of the second picture area corresponding to the display position of the first subtitle
  • the first video file can be the video file corresponding to the first video
  • the first subtitle file can be the subtitle file corresponding to the first video
  • the second sub-mask can be the mask corresponding to "I am a subtitle" (that is, the area A mask shown in FIG. 7B), the third sub-mask can be the mask corresponding to "across multiple colors" (that is, the area B mask shown in FIG. 7B), and the second mask may be the mask corresponding to the subtitle "I am a subtitle spanning multiple color gamuts" shown in FIG. 6C.
  • FIG. 9 exemplarily shows the structure of an electronic device 100 provided in the embodiment of the present application.
  • the electronic device 100 may include: a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charging management module 140, a power management module 141, a battery 142, antenna 1, antenna 2, mobile communication module 150, wireless communication module 160, audio module 170, speaker 170A, receiver 170B, microphone 170C, earphone interface 170D, sensor module 180, button 190, motor 191, indicator 192, camera 193, a display screen 194, and a subscriber identification module (subscriber identification module, SIM) card interface 195, etc.
  • the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, bone conduction sensor 180M, etc.
  • the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 100 .
  • the electronic device 100 may include more or fewer components than shown in the figure, or combine certain components, or separate certain components, or arrange different components.
  • the illustrated components can be realized in hardware, software or a combination of software and hardware.
  • the processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processing unit (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a memory, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), and so on. Different processing units may be independent devices, or may be integrated in one or more processors.
  • the controller may be the nerve center and command center of the electronic device 100 .
  • the controller can generate an operation control signal according to the instruction opcode and timing signal, and complete the control of fetching and executing the instruction.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • the memory in processor 110 is a cache memory.
  • the memory may hold instructions or data that the processor 110 has just used or recycled. If the processor 110 needs to use the instruction or data again, it can be called directly from the memory. Repeated access is avoided, and the waiting time of the processor 110 is reduced, thus improving the efficiency of the system.
  • processor 110 may include one or more interfaces.
  • the interface may include an integrated circuit (inter-integrated circuit, I2C) interface, an integrated circuit built-in audio (inter-integrated circuit sound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, a universal asynchronous transmitter (universal asynchronous receiver/transmitter, UART) interface, mobile industry processor interface (mobile industry processor interface, MIPI), general-purpose input and output (general-purpose input/output, GPIO) interface, subscriber identity module (subscriber identity module, SIM) interface, and /or universal serial bus (universal serial bus, USB) interface, etc.
  • the I2C interface is a bidirectional synchronous serial bus, including a serial data line (serial data line, SDA) and a serial clock line (serial clock line, SCL).
  • processor 110 may include multiple sets of I2C buses.
  • the processor 110 can be respectively coupled to the touch sensor 180K, the charger, the flashlight, the camera 193 and the like through different I2C bus interfaces.
  • the processor 110 may be coupled to the touch sensor 180K through the I2C interface, so that the processor 110 and the touch sensor 180K communicate through the I2C bus interface to realize the touch function of the electronic device 100 .
  • the I2S interface can be used for audio communication.
  • processor 110 may include multiple sets of I2S buses.
  • the processor 110 may be coupled to the audio module 170 through an I2S bus to implement communication between the processor 110 and the audio module 170 .
  • the audio module 170 can transmit audio signals to the wireless communication module 160 through the I2S interface, so as to realize the function of answering calls through the Bluetooth headset.
  • the PCM interface can also be used for audio communication, sampling, quantizing and encoding the analog signal.
  • the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface.
  • the audio module 170 can also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to realize the function of answering calls through the Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
  • the UART interface is a universal serial data bus used for asynchronous communication.
  • the bus can be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication.
  • a UART interface is generally used to connect the processor 110 and the wireless communication module 160 .
  • the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to realize the Bluetooth function.
  • the audio module 170 can transmit audio signals to the wireless communication module 160 through the UART interface, so as to realize the function of playing music through the Bluetooth headset.
  • the MIPI interface can be used to connect the processor 110 with peripheral devices such as the display screen 194 and the camera 193 .
  • MIPI interface includes camera serial interface (camera serial interface, CSI), display serial interface (display serial interface, DSI), etc.
  • the processor 110 communicates with the camera 193 through the CSI interface to realize the shooting function of the electronic device 100 .
  • the processor 110 communicates with the display screen 194 through the DSI interface to realize the display function of the electronic device 100 .
  • the GPIO interface can be configured by software.
  • the GPIO interface can be configured as a control signal or as a data signal.
  • the GPIO interface can be used to connect the processor 110 with the camera 193 , the display screen 194 , the wireless communication module 160 , the audio module 170 , the sensor module 180 and so on.
  • the GPIO interface can also be configured as an I2C interface, I2S interface, UART interface, MIPI interface, etc.
  • the USB interface 130 is an interface conforming to the USB standard specification, specifically, it can be a Mini USB interface, a Micro USB interface, a USB Type C interface, and the like.
  • the USB interface 130 can be used to connect a charger to charge the electronic device 100 , and can also be used to transmit data between the electronic device 100 and peripheral devices. It can also be used to connect headphones and play audio through them. This interface can also be used to connect other terminal devices, such as AR devices.
  • the interface connection relationship between the modules shown in the embodiment of the present application is only a schematic illustration, and does not constitute a structural limitation of the electronic device 100 .
  • the electronic device 100 may also adopt different interface connection manners in the foregoing embodiments, or a combination of multiple interface connection manners.
  • the charging management module 140 is configured to receive a charging input from a charger.
  • the charger may be a wireless charger or a wired charger.
  • the charging management module 140 can receive charging input from the wired charger through the USB interface 130 .
  • the charging management module 140 may receive a wireless charging input through a wireless charging coil of the electronic device 100 . While the charging management module 140 is charging the battery 142 , it can also supply power to the electronic device 100 through the power management module 141 .
  • the power management module 141 is used for connecting the battery 142 , the charging management module 140 and the processor 110 .
  • the power management module 141 receives the input from the battery 142 and/or the charging management module 140 to provide power for the processor 110 , the internal memory 121 , the external memory, the display screen 194 , the camera 193 , and the wireless communication module 160 .
  • the power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle times, and battery health status (leakage, impedance).
  • the power management module 141 may also be disposed in the processor 110 .
  • the power management module 141 and the charging management module 140 may also be set in the same device.
  • the wireless communication function of the electronic device 100 can be realized by the antenna 1 , the antenna 2 , the mobile communication module 150 , the wireless communication module 160 , a modem processor, a baseband processor, and the like.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in electronic device 100 may be used to cover single or multiple communication frequency bands. Different antennas can also be multiplexed to improve the utilization of the antennas.
  • Antenna 1 can be multiplexed as a diversity antenna of a wireless local area network.
  • the antenna may be used in conjunction with a tuning switch.
  • the mobile communication module 150 can provide wireless communication solutions including 2G/3G/4G/5G applied on the electronic device 100 .
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA) and the like.
  • the mobile communication module 150 can receive electromagnetic waves through the antenna 1, filter and amplify the received electromagnetic waves, and send them to the modem processor for demodulation.
  • the mobile communication module 150 can also amplify the signals modulated by the modem processor, and convert them into electromagnetic waves through the antenna 1 for radiation.
  • at least part of the functional modules of the mobile communication module 150 may be set in the processor 110 .
  • at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be set in the same device.
  • a modem processor may include a modulator and a demodulator.
  • the modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low frequency baseband signal. Then the demodulator sends the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the low-frequency baseband signal is passed to the application processor after being processed by the baseband processor.
  • the application processor outputs sound signals through audio equipment (not limited to speaker 170A, receiver 170B, etc.), or displays images or videos through display screen 194 .
  • the modem processor may be a stand-alone device.
  • the modem processor may be independent from the processor 110, and be set in the same device as the mobile communication module 150 or other functional modules.
  • the wireless communication module 160 can provide wireless communication solutions applied on the electronic device 100, including wireless local area networks (wireless local area networks, WLAN) (such as wireless fidelity (Wireless Fidelity, Wi-Fi) networks), Bluetooth (bluetooth, BT), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field communication (near field communication, NFC), infrared (infrared, IR) technology, and so on.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2 , frequency-modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110 .
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110 , frequency-modulate it, amplify it, and convert it into electromagnetic waves through the antenna 2 for radiation.
  • the antenna 1 of the electronic device 100 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the electronic device 100 can communicate with the network and other devices through wireless communication technology.
  • the wireless communication technology may include global system for mobile communications (GSM), general packet radio service (general packet radio service, GPRS), code division multiple access (code division multiple access, CDMA), broadband Code division multiple access (wideband code division multiple access, WCDMA), time division code division multiple access (time-division code division multiple access, TD-SCDMA), long term evolution (long term evolution, LTE), BT, GNSS, WLAN, NFC , FM, and/or IR techniques, etc.
  • the GNSS may include a global positioning system (global positioning system, GPS), a global navigation satellite system (global navigation satellite system, GLONASS), a Beidou navigation satellite system (beidou navigation satellite system, BDS), a quasi-zenith satellite system (quasi -zenith satellite system (QZSS) and/or satellite based augmentation systems (SBAS).
  • the electronic device 100 realizes the display function through the GPU, the display screen 194 , and the application processor.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor. GPUs are used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
  • the display screen 194 is used to display images, videos and the like.
  • the display screen 194 includes a display panel.
  • the display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active matrix organic light emitting diode or an active matrix organic light emitting diode (active-matrix organic light emitting diode, AMOLED), flexible light-emitting diode (flex light-emitting diode, FLED), Miniled, MicroLed, Micro-oLed, quantum dot light emitting diodes (quantum dot light emitting diodes, QLED), etc.
  • the electronic device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
  • the electronic device 100 can realize the shooting function through the ISP, the camera 193 , the video codec, the GPU, the display screen 194 and the application processor.
  • the ISP is used for processing the data fed back by the camera 193 .
  • the light is transmitted to the photosensitive element of the camera through the lens, and the light signal is converted into an electrical signal, and the photosensitive element of the camera transmits the electrical signal to the ISP for processing, and converts it into an image visible to the naked eye.
  • ISP can also perform algorithm optimization on image noise, brightness, and skin color.
  • ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • the ISP may be located in the camera 193 .
  • Camera 193 is used to capture still images or video.
  • the object generates an optical image through the lens and projects it to the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the light signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other image signals.
  • the electronic device 100 may include 1 or N cameras 193 , where N is a positive integer greater than 1.
  • Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the energy of the frequency point.
  • Video codecs are used to compress or decompress digital video.
  • the electronic device 100 may support one or more video codecs.
  • the electronic device 100 can play or record videos in various encoding formats, for example: moving picture experts group (moving picture experts group, MPEG) 1, MPEG2, MPEG3, MPEG4 and so on.
  • the NPU is a neural-network (NN) computing processor.
  • Applications such as intelligent cognition of the electronic device 100 can be realized through the NPU, such as image recognition, face recognition, speech recognition, text understanding, and the like.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, so as to expand the storage capacity of the electronic device 100.
  • the external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. Such as saving music, video and other files in the external memory card.
  • the internal memory 121 may be used to store computer-executable program codes including instructions.
  • the processor 110 executes various functional applications and data processing of the electronic device 100 by executing instructions stored in the internal memory 121 .
  • the internal memory 121 may include an area for storing programs and an area for storing data.
  • the stored program area can store an operating system, at least one application program required by a function (such as a sound playing function, an image playing function, etc.) and the like.
  • the storage data area can store data created during the use of the electronic device 100 (such as audio data, phonebook, etc.) and the like.
  • the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (universal flash storage, UFS) and the like.
  • the electronic device 100 can implement audio functions through the audio module 170 , the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. Such as music playback, recording, etc.
  • the audio module 170 is used to convert digital audio information into analog audio signal output, and is also used to convert analog audio input into digital audio signal.
  • the audio module 170 may also be used to encode and decode audio signals.
  • the audio module 170 may be set in the processor 110 , or some functional modules of the audio module 170 may be set in the processor 110 .
  • Speaker 170A also referred to as a "horn" is used to convert audio electrical signals into sound signals.
  • Electronic device 100 can listen to music through speaker 170A, or listen to hands-free calls.
  • Receiver 170B also called “earpiece” is used to convert audio electrical signals into sound signals.
  • the receiver 170B can be placed close to the human ear to listen to the voice.
  • the microphone 170C, also called a "mic" or "sound transmitter", is used to convert sound signals into electrical signals. When making a phone call or sending a voice message, the user can put his or her mouth close to the microphone 170C to make a sound, and input the sound signal into the microphone 170C.
  • the electronic device 100 may be provided with at least one microphone 170C. In some other embodiments, the electronic device 100 may be provided with two microphones 170C, which may also implement a noise reduction function in addition to collecting sound signals. In some other embodiments, the electronic device 100 can also be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and realize directional recording functions, etc.
  • the earphone interface 170D is used for connecting wired earphones.
  • the earphone interface 170D may be a USB interface 130, or a 3.5mm open mobile terminal platform (open mobile terminal platform, OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the pressure sensor 180A is used to sense the pressure signal and convert the pressure signal into an electrical signal.
  • pressure sensor 180A may be disposed on display screen 194 .
  • pressure sensors 180A such as resistive pressure sensors, inductive pressure sensors, and capacitive pressure sensors.
  • a capacitive pressure sensor may be comprised of at least two parallel plates with conductive material.
  • the electronic device 100 determines the intensity of pressure according to the change in capacitance.
  • the electronic device 100 detects the intensity of the touch operation according to the pressure sensor 180A.
  • the electronic device 100 may also calculate the touched position according to the detection signal of the pressure sensor 180A.
  • touch operations acting on the same touch position but with different touch operation intensities may correspond to different operation instructions. For example: when a touch operation with a touch operation intensity less than the first pressure threshold acts on the short message application icon, an instruction to view short messages is executed. When a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the icon of the short message application, the instruction of creating a new short message is executed.
  • the gyro sensor 180B can be used to determine the motion posture of the electronic device 100 .
  • the angular velocity of the electronic device 100 around three axes may be determined by the gyro sensor 180B.
  • the gyro sensor 180B can be used for image stabilization. Exemplarily, when the shutter is pressed, the gyro sensor 180B detects the shaking angle of the electronic device 100, calculates the distance that the lens module needs to compensate according to the angle, and allows the lens to counteract the shaking of the electronic device 100 through reverse movement to achieve anti-shake.
  • the gyro sensor 180B can also be used for navigation and somatosensory game scenes.
  • the air pressure sensor 180C is used to measure air pressure.
  • the electronic device 100 calculates the altitude based on the air pressure value measured by the air pressure sensor 180C to assist positioning and navigation.
  • the magnetic sensor 180D includes a Hall sensor.
  • the electronic device 100 may use the magnetic sensor 180D to detect the opening and closing of the flip leather case.
  • when the electronic device 100 is a clamshell device, the electronic device 100 can detect the opening and closing of the clamshell according to the magnetic sensor 180D, and then features such as automatic unlocking of the flip cover can be set according to the detected opening and closing state of the leather case.
  • the acceleration sensor 180E can detect the acceleration of the electronic device 100 in various directions (generally three axes). When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. It can also be used to identify the posture of the electronic device 100, and can be applied to applications such as horizontal and vertical screen switching, pedometers, etc.
  • the distance sensor 180F is used to measure the distance.
  • the electronic device 100 may measure the distance by infrared or laser. In some embodiments, when shooting a scene, the electronic device 100 may use the distance sensor 180F for distance measurement to achieve fast focusing.
  • Proximity light sensor 180G may include, for example, light emitting diodes (LEDs) and light detectors, such as photodiodes.
  • the light emitting diodes may be infrared light emitting diodes.
  • the electronic device 100 emits infrared light through the light emitting diode.
  • Electronic device 100 uses photodiodes to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it may be determined that there is an object near the electronic device 100 . When insufficient reflected light is detected, the electronic device 100 may determine that there is no object near the electronic device 100 .
  • the electronic device 100 can use the proximity light sensor 180G to detect that the user holds the electronic device 100 close to the ear to make a call, so as to automatically turn off the screen to save power.
  • the proximity light sensor 180G can also be used in leather case mode, automatic unlock and lock screen in pocket mode.
  • the ambient light sensor 180L is used for sensing ambient light brightness.
  • the electronic device 100 can adaptively adjust the brightness of the display screen 194 according to the perceived ambient light brightness.
  • the ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in the pocket, so as to prevent accidental touch.
  • the fingerprint sensor 180H is used to collect fingerprints.
  • the electronic device 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, access to application locks, take pictures with fingerprints, answer incoming calls with fingerprints, and the like.
  • the temperature sensor 180J is used to detect temperature.
  • the electronic device 100 uses the temperature detected by the temperature sensor 180J to implement a temperature treatment strategy. For example, when the temperature reported by the temperature sensor 180J exceeds the threshold, the electronic device 100 may reduce the performance of the processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection.
  • in some other embodiments, when the temperature is lower than another threshold, the electronic device 100 heats the battery 142 to prevent the electronic device 100 from being shut down abnormally due to the low temperature.
  • the electronic device 100 boosts the output voltage of the battery 142 to avoid abnormal shutdown caused by low temperature.
  • Touch sensor 180K also known as "touch panel”.
  • the touch sensor 180K can be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, also called a “touch screen”.
  • the touch sensor 180K is used to detect a touch operation on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • Visual output related to the touch operation can be provided through the display screen 194 .
  • the touch sensor 180K may also be disposed on the surface of the electronic device 100 , which is different from the position of the display screen 194 .
  • the bone conduction sensor 180M can acquire vibration signals.
  • the bone conduction sensor 180M can acquire the vibration signal of the vibrating bone mass of the human voice.
  • the bone conduction sensor 180M can also contact the human pulse and receive the blood pressure beating signal.
  • the bone conduction sensor 180M can also be disposed in the earphone, combined into a bone conduction earphone.
  • the audio module 170 can analyze the voice signal based on the vibration signal of the vibrating bone mass of the vocal part acquired by the bone conduction sensor 180M, so as to realize the voice function.
  • the application processor may analyze the heart rate information based on the blood pressure beating signal acquired by the bone conduction sensor 180M. Realize the heart rate detection function.
  • the keys 190 include a power key, a volume key and the like.
  • the key 190 may be a mechanical key. It can also be a touch button.
  • the electronic device 100 can receive key input and generate key signal input related to user settings and function control of the electronic device 100 .
  • the motor 191 can generate a vibrating reminder.
  • the motor 191 can be used for incoming call vibration prompts, and can also be used for touch vibration feedback.
  • touch operations applied to different applications may correspond to different vibration feedback effects.
  • the motor 191 may also correspond to different vibration feedback effects for touch operations acting on different areas of the display screen 194 .
  • different application scenarios (for example: time reminders, receiving messages, alarm clocks, games, etc.) may also correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also support customization.
  • the indicator 192 can be an indicator light, and can be used to indicate charging status, power change, and can also be used to indicate messages, missed calls, notifications, and the like.
  • the SIM card interface 195 is used for connecting a SIM card.
  • the SIM card can be connected and separated from the electronic device 100 by inserting it into the SIM card interface 195 or pulling it out from the SIM card interface 195 .
  • the electronic device 100 may support 1 or N SIM card interfaces, where N is a positive integer greater than 1.
  • SIM card interface 195 can support Nano SIM card, Micro SIM card, SIM card etc. Multiple cards can be inserted into the same SIM card interface 195 at the same time. The types of the multiple cards may be the same or different.
  • the SIM card interface 195 is also compatible with different types of SIM cards.
  • the SIM card interface 195 is also compatible with external memory cards.
  • the electronic device 100 interacts with the network through the SIM card to implement functions such as calling and data communication.
  • the electronic device 100 adopts an eSIM, that is, an embedded SIM card.
  • the eSIM card can be embedded in the electronic device 100 and cannot be separated from the electronic device 100.
  • the electronic device 100 shown in FIG. 9 is only an example; the electronic device 100 may have more or fewer components than those shown in FIG. 9, two or more components may be combined, or different component configurations are possible.
  • the various components shown in Figure 9 may be implemented in hardware, software, or a combination of hardware and software including one or more signal processing and/or application specific integrated circuits.
  • FIG. 10 exemplarily shows a software structure of an electronic device 100 provided in the embodiment of the present application.
  • the software system of the electronic device 100 may adopt a layered architecture, an event-driven architecture, a micro-kernel architecture, a micro-service architecture, or a cloud architecture.
  • the software structure of the electronic device 100 is exemplarily described below.
  • the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. Layers communicate through software interfaces.
  • the software structure of the electronic device 100 is divided into three layers, which are respectively an application program layer, an application program framework layer, and a kernel layer from top to bottom.
  • the application layer can consist of a series of application packages.
  • the application package may include application programs such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, and short message.
  • the video may refer to the video application program mentioned in the embodiment of the present application.
  • the application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer.
  • the application framework layer includes some predefined functions.
  • the application framework layer may include a window manager, a content provider, a view system, a phone manager, a resource manager, a notification manager, a video processing system, and so on.
  • a window manager is used to manage window programs.
  • the window manager can get the size of the display screen, determine whether there is a status bar, lock the screen, capture the screen, etc.
  • Content providers are used to store and retrieve data and make it accessible to applications.
  • Said data may include video, images, audio, calls made and received, browsing history and bookmarks, phonebook, etc.
  • the view system includes visual controls, such as controls for displaying text, controls for displaying pictures, and so on.
  • the view system can be used to build applications.
  • a display interface can consist of one or more views.
  • a display interface including a text message notification icon may include a view for displaying text and a view for displaying pictures.
  • the phone manager is used to provide communication functions of the electronic device 100, for example, management of call status (including connected, hung up, and the like).
  • the resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and so on.
  • the notification manager enables the application to display notification information in the status bar, which can be used to convey notification-type messages, and can automatically disappear after a short stay without user interaction.
  • the notification manager is used to notify the download completion, message reminder, etc.
  • a notification can also appear in the top status bar of the system in the form of a graph or scroll-bar text, such as a notification of an application running in the background, or appear on the screen in the form of a dialog window.
  • for example, text information is prompted in the status bar, a prompt tone is played, the electronic device vibrates, or the indicator light flashes.
  • the video processing system may be used to execute the subtitle display method provided in the embodiment of the present application.
  • the video processing system may include a subtitle decoding module, a video frame color gamut interpretation module, a video frame synthesis module, a video frame queue, and a video rendering module, wherein for the specific functions of each module, reference may be made to the relevant content in the foregoing embodiments, and details are not repeated here.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer includes at least a display driver, a camera driver, a Bluetooth driver, and a sensor driver.
  • the workflow of the software and hardware of the electronic device 100 will be exemplarily described below in conjunction with capturing and photographing scenes.
  • when the touch sensor 180K receives a touch operation, a corresponding hardware interrupt is sent to the kernel layer.
  • the kernel layer processes the touch operation into an original input event (including information such as the touch coordinates and the timestamp of the touch operation). Original input events are stored in the kernel layer.
  • the application framework layer obtains the original input event from the kernel layer, and identifies the control corresponding to the input event. Taking the touch operation being a tap operation and the control corresponding to the tap operation being the camera application icon control as an example:
  • the camera application calls the interface of the application framework layer to start the camera application, and then starts the camera driver by calling the kernel layer.
  • Camera 193 captures still images or video.
  • FIG. 11 exemplarily shows the structure of another electronic device 100 provided in the embodiment of the present application.
  • the electronic device 100 may include: a video application program 1100 and a video processing system 1110 .
  • the video application program 1100 may be a system application program installed on the electronic device 100 (such as the "video" application program shown in FIG. 2A), or an application program with video playback capability provided by a third party and installed on the electronic device 100, and is mainly used to play videos.
  • the video processing system 1110 may include: a video decoding module 1111 , a subtitle decoding module 1112 , a video frame color gamut interpretation module 1113 , a video frame synthesis module 1114 , a video frame queue 1115 , and a video rendering module 1116 .
  • the video decoding module 1111 may receive the video information stream sent by the video application program 1100, and decode the video information stream to generate a video frame.
  • the subtitle decoding module 1112 can receive the subtitle information stream sent by the video application program 1100, decode the subtitle information stream to generate a subtitle frame, and generate a masked subtitle frame based on the mask parameters sent by the video frame color gamut interpretation module 1113, which can improve the recognizability of subtitles.
  • the video frame color gamut interpretation module 1113 can analyze subtitle recognizability, generate a subtitle recognizability analysis result, and calculate the mask parameters (color value and transparency) corresponding to the subtitle based on that result (a short sketch of this pipeline is given after this list).
  • the video frame composition module 1114 can superimpose and combine the video frame and the subtitle frame to generate a video frame to be displayed.
  • the video frame queue 1115 can store the video frames to be displayed sent by the video frame synthesis module 1114 .
  • the video rendering module 1116 may render the video frames to be displayed in chronological order, generate the rendered video frames, and send them to the video application 1100 for video playback.
  • the electronic device 100 shown in FIG. 11 is only an example; the electronic device 100 may have more or fewer components than those shown in FIG. 11, two or more components may be combined, or a different component configuration may be used.
  • Various components shown in FIG. 11 may be realized in hardware, software, or a combination of hardware and software.
  • FIG. 12 exemplarily shows the structure of another electronic device 100 provided in the embodiment of the present application.
  • the electronic device 100 may include: a video application program 1200, wherein the video application program 1200 may include: a video decoding module 1211, a subtitle decoding module 1212, a video frame color gamut interpretation module 1213, a video frame synthesis module 1214, a video frame queue 1215, and a video rendering module 1216.
  • the video application program 1200 may be a system application program installed on the electronic device 100 (such as the "video" application program shown in FIG. 2A), or an application program with video playback capability provided by a third party and installed on the electronic device 100, and is mainly used to play videos.
  • the acquiring and displaying module 1210 can acquire the video information stream and the subtitle information stream, and display the rendered video frame sent by the video rendering module 1216 and the like.
  • the video decoding module 1211 may receive the video information stream sent by the acquiring and displaying module 1210, and decode the video information stream to generate a video frame.
  • the subtitle decoding module 1212 can receive the subtitle information stream sent by the acquisition and display module 1210, decode the subtitle information stream to generate a subtitle frame, and generate a masked subtitle frame based on the mask parameters sent by the video frame color gamut interpretation module 1213, which can improve the recognizability of subtitles.
  • the video frame color gamut interpretation module 1213 can analyze subtitle recognizability, generate a subtitle recognizability analysis result, and calculate the mask parameters (color value and transparency) corresponding to the subtitle based on that result.
  • the video frame composition module 1214 may superimpose and merge the video frame and the subtitle frame to generate a video frame to be displayed.
  • the video frame queue 1215 can store the video frames to be displayed sent by the video frame synthesis module 1214 .
  • the video rendering module 1216 can render the video frames to be displayed in chronological order, generate the rendered video frames, and send them to the acquisition and display module 1210 for video playback.
  • the electronic device 100 shown in FIG. 12 is only an example; the electronic device 100 may have more or fewer components than those shown in FIG. 12, two or more components may be combined, or a different component configuration may be used.
  • the various components shown in FIG. 12 can be realized in hardware, software, or a combination of hardware and software.
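As an illustrative aid, the module pipeline described above for FIG. 11 and FIG. 12 can be outlined in a short sketch. This is only a sketch under stated assumptions, not code from the application: every name below (play, video_decoder, subtitle_decoder, gamut_interpreter, composer, renderer, display) is a hypothetical stand-in for the modules 1111-1116 / 1211-1216.

```python
# Illustrative sketch of the FIG. 11 / FIG. 12 pipeline; every name below is hypothetical.
from collections import deque

def play(video_stream, subtitle_stream, video_decoder, subtitle_decoder,
         gamut_interpreter, composer, renderer, display):
    # Video decoding module (1111 / 1211): decode the video information stream into video frames,
    # indexed here by the time information (timestamp) each frame carries.
    video_frames = {f.timestamp: f for f in video_decoder.decode(video_stream)}
    # Subtitle decoding module (1112 / 1212): decode the subtitle information stream into subtitle frames.
    subtitle_frames = subtitle_decoder.decode(subtitle_stream)
    frame_queue = deque()  # video frame queue (1115 / 1215)
    for sub_frame in subtitle_frames:
        video_frame = video_frames[sub_frame.timestamp]  # match by carried time information
        # Color gamut interpretation module (1113 / 1213): analyze recognizability, compute mask parameters.
        mask_params = gamut_interpreter.mask_parameters(sub_frame, video_frame)
        masked_sub = subtitle_decoder.apply_masks(sub_frame, mask_params)
        # Video frame synthesis module (1114 / 1214): superimpose the masked subtitle frame on the video frame.
        frame_queue.append(composer.compose(video_frame, masked_sub))
    # Video rendering module (1116 / 1216): render frames to be displayed in chronological order.
    for frame in sorted(frame_queue, key=lambda f: f.timestamp):
        display.show(renderer.render(frame))
```

The sketch mirrors the data flow described above: decode, match by timestamp, compute mask parameters, compose, queue, and render in chronological order.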

Abstract

本申请公开了一种字幕显示方法及相关设备,电子设备获取一个待播放的视频文件和待显示的字幕文件,然后对视频文件进行解码得到视频帧,对字幕文件进行解码得到字幕帧,之后,电子设备可以从字幕帧中提取字幕色域信息、字幕位置信息等,基于字幕位置信息提取字幕对应的视频帧中字幕显示位置处的色域信息,并基于字幕色域信息与字幕对应的视频帧中字幕显示位置处的色域信息计算字幕识别度,进一步基于字幕识别度计算字幕对应的蒙板的色值、透明度生成带蒙板的字幕帧,之后将视频帧与带蒙板的字幕帧合成、渲染并显示到视频播放窗口。这样,可以在不改变字幕颜色的基础上,提高字幕辨识度,同时也保证视频内容一定的可见性,提高用户体验。

Description

字幕显示方法及相关设备
本申请要求于2021年06月30日提交中国国家知识产权局、申请号为202110742392.9、申请名称为“字幕显示方法及相关设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及终端技术领域,尤其涉及一种字幕显示方法及相关设备。
背景技术
随着电子产品的迅速发展,手机、平板电脑、智能电视等电子设备已经广泛进入人们的生活,视频播放也成为了这些电子设备的一个重要应用功能,电子设备进行视频播放的同时,在视频播放窗口显示与所播放的视频相关的字幕的应用场景也较为广泛,例如,在视频播放窗口显示与音频同步的字幕,或者,为增加视频的互动性,在视频播放窗口显示用户输入的字幕。
但是,在上述视频播放同时也进行字幕显示的应用场景下,如果视频的颜色和亮度覆盖字幕的颜色,或者,字幕的颜色与字幕显示位置处视频的颜色和亮度重叠度比较高,例如,在高亮场景下显示一些浅色字幕,在雪地场景下显示一些白色字幕等情况下,则会导致字幕辨识度不足,难以被用户看清楚,用户体验差。
发明内容
本申请实施例提供了一种字幕显示方法及相关设备,可以解决用户在观看视频过程中字幕辨识度低的问题,提高用户体验。
第一方面,本申请实施例提供了一种字幕显示方法,该方法包括:电子设备播放第一视频;所述电子设备显示第一界面时,所述第一界面包括第一画面和第一字幕,所述第一字幕以第一蒙板为背景悬浮显示于所述第一画面的第一区域之上,所述第一区域是所述第一字幕的显示位置对应的所述第一画面中的区域,其中,所述第一字幕的色值与所述第一区域的色值的差异值为第一数值;所述电子设备显示第二界面时,所述第二界面包括第二画面和所述第一字幕,所述第一字幕不显示蒙板,所述第一字幕悬浮显示于所述第二画面的第二区域之上,所述第二区域是所述第一字幕的显示位置对应的所述第二画面中的区域,其中,所述第一字幕的色值与所述第二区域的色值的差异值为第二数值,所述第二数值大于所述第一数值;其中,所述第一画面是所述第一视频中的一个画面,所述第二画面是所述第一视频中的另一个画面。
本申请实施例通过实施上述字幕显示方法,电子设备可以在字幕辨识度低的情况下为字幕设置蒙板,在不改变字幕颜色的基础上,提高字幕辨识度。
在一种可能的实现方式中,在所述电子设备显示所述第一画面之前,该方法还包括:所述电子设备获取第一视频文件和第一字幕文件,其中,所述第一视频文件和所述第一字幕文件携带的时间信息相同;所述电子设备基于所述第一视频文件生成第一视频帧,所述第一视频帧用于生成所述第一画面;所述电子设备基于所述第一字幕文件生成第一字幕帧,并在所 述第一字幕帧中获取所述第一字幕的色值、显示位置,其中,所述第一字幕帧携带的时间信息与所述第一视频帧携带的时间信息相同;所述电子设备基于所述第一字幕的显示位置确定所述第一区域;所述电子设备基于所述第一字幕的色值或所述第一区域的色值生成所述第一蒙板;所述电子设备在所述第一字幕帧中将所述第一字幕叠加到所述第一蒙板之上生成第二字幕帧,并将所述第二字幕帧与所述第一视频帧进行合成。这样,电子设备可以获取一个待播放的视频文件和待显示的字幕文件,然后对视频文件进行解码得到视频帧,对字幕文件进行解码得到字幕帧,之后,电子设备可以从字幕帧中提取字幕色域信息、字幕位置信息等,基于字幕位置信息提取字幕对应的视频帧中字幕显示位置处的色域信息,并基于字幕色域信息与字幕对应的视频帧中字幕显示位置处的色域信息计算字幕识别度,进一步基于字幕识别度计算字幕对应的蒙板的色值生成带蒙板的字幕帧,之后将视频帧与带蒙板的字幕帧合成、渲染。
在一种可能的实现方式中,在所述电子设备基于所述第一字幕的色值或所述第一区域的色值生成所述第一蒙板之前,该方法还包括:所述电子设备确定所述第一数值小于第一阈值。这样,电子设备可以通过确定第一数值小于第一阈值来进一步确定字幕的辨识度低。
在一种可能的实现方式中,所述电子设备确定所述第一数值小于第一阈值,具体包括:所述电子设备将所述第一区域划分为N个第一子区域,其中,所述N为正整数;所述电子设备基于所述第一字幕的色值和所述N个第一子区域的色值确定所述第一数值小于所述第一阈值。这样,电子设备可以通过基于第一字幕的色值和所述N个第一子区域的色值确定所述第一数值小于所述第一阈值。
在一种可能的实现方式中,所述电子设备基于所述第一字幕的色值或所述第一区域的色值生成所述第一蒙板,具体包括:所述电子设备基于所述第一字幕的色值或所述N个第一子区域的色值确定出一个所述第一蒙板的色值;所述电子设备基于所述第一蒙板的色值生成所述第一蒙板。这样,电子设备可以基于第一字幕的色值或所述N个第一子区域的色值来确定出一个第一蒙板的色值,并进一步为第一字幕生成第一蒙板。
在一种可能的实现方式中,所述电子设备确定所述第一数值小于第一阈值,具体包括:所述电子设备将所述第一区域划分为N个第一子区域,其中,所述N为正整数;所述电子设备基于相邻的所述第一子区域之间的色值的差异值,确定是否将相邻的所述第一子区域合并为第二子区域;当相邻的所述第一子区域之间的色值的差异值小于第二阈值时,所述电子设备将相邻的所述第一子区域合并为所述第二子区域;所述电子设备基于所述第一字幕的色值和所述第二子区域的色值确定所述第一数值小于所述第一阈值。这样,电子设备可以将色值相近的第一子区域进行合并生成第二子区域,进一步基于第一字幕的色值和所述第二子区域的色值确定所述第一数值小于所述第一阈值。
在一种可能的实现方式中,所述第一区域包含M个所述第二子区域,所述M为正整数且小于等于所述N,所述第二子区域包括一个或多个所述第一子区域,每一个所述第二子区域包括的所述第一子区域的个数相同或不同。这样,电子设备可以把第一区域划分为M个第二子区域。
在一种可能的实现方式中,所述电子设备基于所述第一字幕的色值或所述第一区域的色值生成所述第一蒙板,具体包括:所述电子设备基于所述第一字幕的色值或M个所述第二子区域的色值依次计算M个第一子蒙板的色值;所述电子设备基于所述M个第一子蒙板的色值生成所述M个第一子蒙板,其中,所述M个第一子蒙板组合为所述第一蒙板。这样,电子设备可以为第一字幕生成M个第一子蒙板。
在一种可能的实现方式中,该方法还包括:所述电子设备显示第三界面时,所述第三界面包括第三画面和所述第一字幕,所述第一字幕至少包括第一部分和第二部分,所述第一部分显示第二子蒙板,所述第二部分显示第三子蒙板或不显示所述第三子蒙板,所述第二子蒙板的色值与所述第三子蒙板的色值不同。这样,电子设备上可以显示对应多条子蒙板的字幕。
在一种可能的实现方式中,所述第一蒙板的显示位置是基于所述第一字幕的显示位置确定的。这样,第一蒙板的显示位置可以与第一字幕的显示位置重合。
在一种可能的实现方式中,所述第一蒙板的色值与所述第一字幕的色值的差异值大于所述第一数值。这样,可以提高字幕辨识度。
在一种可能的实现方式中,在所述第一画面和所述第二画面中,所述第一字幕的显示位置相对于所述电子设备的显示屏是不固定的或固定的,所述第一字幕是连续显示的一段文字或符号。这样,第一字幕可以是弹幕或者是与音频同步的字幕,且第一字幕是一条字幕,而不是显示屏中显示的全部字幕。
在一种可能的实现方式中,在所述电子设备显示第一界面之前,该方法还包括:所述电子设备将所述第一蒙板的透明度设置为小于100%。这样,可以保证第一蒙板所在区域对应的视频帧仍然有一定的可见性。
在一种可能的实现方式中,在所述电子设备显示第二界面之前,该方法还包括:所述电子设备基于所述第一字幕的色值或所述第二区域的色值生成第二蒙板,并将所述第一字幕叠加到所述第二蒙板之上,其中,所述第二蒙板的色值为预设色值,所述第二蒙板的透明度为100%;或,所述电子设备不生成所述第二蒙板。这样,对于辨识度高的字幕,电子设备可以为其设置透明度为100%的蒙板,也可以为其设置蒙板。
第二方面,本申请实施例提供了一种电子设备,所述电子设备包括一个或多个处理器和一个或多个存储器;其中,所述一个或多个存储器与所述一个或多个处理器耦合,所述一个或多个存储器用于存储计算机程序代码,所述计算机程序代码包括计算机指令,当所述一个或多个处理器执行所述计算机指令时,使得所述电子设备执行上述第一方面任一项可能的实现方式中所述的方法。
第三方面,本申请实施例提供了一种计算机存储介质,所述计算机存储介质存储有计算机程序,所述计算机程序包括程序指令,当所述程序指令在电子设备上运行时,使得所述电子设备执行第一方面任一项可能的实现方式中所述的方法。
第四方面,本申请实施例提供了一种计算机程序产品,当计算机程序产品在计算机上运行时,使得计算机执行上述第一方面任一项可能的实现方式中所述的方法。
附图说明
图1是本申请实施例提供的一种字幕显示方法的流程示意图;
图2A-图2C是本申请实施例提供的一组用户界面示意图;
图3是本申请实施例提供的另一种字幕显示方法的流程示意图;
图4是本申请实施例提供的一个字幕帧示意图;
图5是本申请实施例提供的一个生成字幕对应蒙板的原理示意图;
图6A是本申请实施例提供的一个带蒙板的字幕帧示意图;
图6B-图6C是本申请实施例提供的一组字幕显示的用户界面示意图;
图7A是本申请实施例提供的一种生成字幕对应蒙板方法的流程示意图;
图7B是本申请实施例提供的另一个生成字幕对应蒙板的原理示意图;
图8A是本申请实施例提供的另一个带蒙板的字幕帧示意图;
图8B-图8C是本申请实施例提供的一组字幕显示的用户界面示意图;
图9是本申请实施例提供的一种电子设备的结构示意图;
图10是本申请实施例提供的一种电子设备的软件结构示意图;
图11是本申请实施例提供的另一种电子设备的结构示意图;
图12是本申请实施例提供的另一种电子设备的结构示意图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述。其中,在本申请实施例的描述中,除非另有说明,“/”表示或的意思,例如,A/B可以表示A或B;文本中的“和/或”仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况,另外,在本申请实施例的描述中,“多个”是指两个或多于两个。
应当理解,本申请的说明书和权利要求书及附图中的术语“第一”、“第二”等是用于区别不同对象,而不是用于描述特定顺序。此外,术语“包括”和“具有”以及它们任何变形,意图在于覆盖不排他的包含。例如包含了一系列步骤或单元的过程、方法、系统、产品或设备没有限定于已列出的步骤或单元,而是可选地还包括没有列出的步骤或单元,或可选地还包括对于这些过程、方法、产品或设备固有的其它步骤或单元。
在本申请中提及“实施例”意味着,结合实施例描述的特定特征、结构或特性可以包含在本申请的至少一个实施例中。在说明书中的各个位置出现该短语并不一定均是指相同的实施例,也不是与其它实施例互斥的独立的或备选的实施例。本领域技术人员显式地和隐式地理解的是,本申请所描述的实施例可以与其它实施例相结合。
为了便于理解,下面首先对本申请实施例中涉及的一些相关概念进行说明。
1、视频解码:
通过读取视频文件的二进制数据,根据视频文件的压缩算法解释出视频播放的图像帧(也可以称为视频帧)数据的过程。
2、字幕:
视频播放过程中在视频播放窗口中显示的独立于视频文件之外的文字、符号信息。
3、视频播放:
视频文件经过视频解码、视频渲染等操作之后,在视频播放窗口中按照时间顺序显示一组图像和对应的声音信息的过程。
4、弹幕:
在视频播放客户端(或者称为视频类应用程序)上由用户输入,并可以根据用户输入时间所对应的视频播放的图像帧位置显示到输入用户的视频播放窗口或其他用户在该视频播放客户端的视频播放窗口上的字幕。
随着电子产品的迅速发展,手机、平板电脑、智能电视等电子设备已经广泛进入人们的生活,视频播放也成为了这些电子设备的一个重要应用功能,电子设备进行视频播放的同时,在视频播放窗口显示与所播放的视频相关的字幕的应用场景也较为广泛,例如,在视频播放窗口显示与音频同步的字幕,或者,为增加视频的互动性,在视频播放窗口显示用户输入的字幕(即弹幕)。
在视频播放窗口显示与音频同步的字幕的应用场景下,通常是在视频播放窗口的下方按照字幕的时间戳与视频播放的图像帧的时间戳进行匹配,将字幕与对应的视频播放的图像帧合成,即将字幕叠加到对应的视频帧上面,字幕的位置与视频帧的重叠位置是相对固定的。
在视频播放窗口显示用户输入的字幕(即弹幕)的应用场景下,通常是在视频播放窗口有多条字幕在视频播放过程中由左至右或由右至左产生流动效果,字幕的位置与视频帧的重叠位置是相对不固定的。
在实际的一些应用场景中,为提升视频播放的趣味性,视频播放平台通常会提供给用户可以自主选择字幕颜色的能力。在视频播放窗口显示与音频同步的字幕的应用场景下,字幕颜色通常是系统默认的颜色,用户在进行视频播放的时候可以自主选择自己喜好的字幕颜色,电子设备则会按照用户选择的颜色在视频播放窗口上进行字幕显示。在视频播放窗口显示弹幕的应用场景下,发送弹幕的用户可以自主选择发送的弹幕的颜色,其他用户看到的弹幕颜色与发送弹幕的用户选择的弹幕的颜色保持一致,因此可能出现用户在观看弹幕的时候,同一视频帧上显示的每一条弹幕的颜色可能各不相同的情况。
为实现上述两个应用场景,本申请实施例提供了一种字幕显示方法,电子设备可以先获取一个待播放的视频文件和待显示到视频播放窗口的字幕文件,然后可以分别对视频文件进行视频解码得到视频帧,对字幕文件进行字幕解码得到字幕帧,之后,可以将视频帧与字幕帧按照时间顺序进行对齐匹配,合成最终待显示的视频帧,存储到视频帧队列,之后,按照时间顺序读取并渲染待显示的视频帧,最后,将渲染后的视频帧显示到视频播放窗口。
下面对上述字幕显示方法的方法流程进行详细介绍。
图1示例性示出了本申请实施例提供的一种字幕显示方法的方法流程。
如图1所示,该方法可以应用于具有视频播放能力的电子设备100。下面详细介绍该方法的具体步骤:
阶段一、视频信息流与字幕信息流获取阶段
S101-S102、电子设备100检测到用户在视频类应用程序上播放视频的操作,响应于该操作,电子设备100可以获取视频信息流和字幕信息流。
具体地,电子设备100上可以安装有视频类应用程序,检测到用户在视频类应用程序上播放视频的操作之后,响应于该操作,电子设备100可以获取用户想要播放的视频所对应的视频信息流(或者称为视频文件)和字幕信息流(或者称为字幕文件)。
示例性地,如图2A所示的是电子设备100提供的用于展示电子设备100安装的应用程序的用户界面(user interface,UI)。电子设备100可以检测到用户针对用户界面210上的“视频”应用程序选项211的操作(例如点击操作),响应于该操作,电子设备100可以显示如图2B所示的示例性用户界面220,用户界面220可以为“视频”应用程序的主界面,电子设备100在检测到用户针对用户界面220上的视频播放选项221的操作(例如点击操作),响 应于该操作,电子设备100可以获取该视频所对应的视频信息流和字幕信息流。
其中,上述视频信息流和字幕信息流可以是电子设备100从上述视频类应用程序的服务器下载的文件或在电子设备100中获取的文件。视频文件和字幕文件中都携带有时间信息。
可以理解的是,图2A和图2B仅仅示例性示出了电子设备100上的用户界面,不应构成对本申请实施例的限定。
阶段二、视频解码阶段
S103、电子设备100上的视频类应用程序向电子设备100上的视频解码模块发送视频信息流。
具体地,视频类应用程序在获取到视频信息流之后,可以向视频解码模块发送该视频信息流。
S104-S105、电子设备100上的视频解码模块解码视频信息流生成视频帧,并向电子设备100上的视频帧合成模块发送该视频帧。
具体地,视频解码模块在接收到视频类应用程序发送的视频信息流之后,可以对该视频信息流进行解码生成视频帧,该视频帧可以是视频播放过程中的全部视频帧,其中,一个视频帧也可以称为一个图像帧,每一个视频帧都可以携带有该视频帧的时间信息(即时间戳)。之后,视频解码模块可以将解码生成的视频帧发送给视频帧合成模块,用于后续生成待显示的视频帧。
其中,视频解码模块对视频信息流进行解码均可以使用现有技术中的视频解码方法,本申请实施例对此不作限定。视频解码方法的具体实现可以参照视频解码相关的技术资料,在此不作赘述。
阶段三、字幕解码阶段
S106、电子设备100上的视频类应用程序向电子设备100上的字幕解码模块发送字幕信息流。
具体地,视频类应用程序在获取到字幕信息流之后,可以向字幕解码模块发送该字幕信息流。
S107-S108、电子设备100上的字幕解码模块解码字幕信息流生成字幕帧,并向电子设备100上的视频帧合成模块发送该字幕帧。
具体地,字幕解码模块在接收到视频类应用程序发送的字幕信息流之后,可以对该字幕信息流进行解码生成字幕帧,该字幕帧可以为视频播放过程中的全部字幕帧,其中,每一个字幕帧中可以包括字幕文字、字幕文字的显示位置、字幕文字的字体颜色、字幕文字的字体格式等,还可以携带有该字幕帧的时间信息(即时间戳)。之后,字幕解码模块可以将解码生成的字幕帧发送给视频帧合成模块,用于后续生成待显示的视频帧。
其中,字幕解码模块对字幕信息流进行解码均可以使用现有技术中的字幕解码方法,本申请实施例对此不作限定。字幕解码方法的具体实现可以参照字幕解码相关的技术资料,在此不作赘述。
需要说明的是,本申请实施例仅仅以先执行阶段二视频解码阶段的步骤,再执行阶段三字幕解码阶段的步骤为例,在一些实施例中,也可以先执行阶段三字幕解码阶段的步骤再执 行阶段二视频解码阶段的步骤,或者,阶段二视频解码阶段的步骤与阶段三字幕解码阶段的步骤也可以同时执行,本申请实施例对此不作限定。
阶段四、视频帧合成、渲染及显示阶段
S109-S110、电子设备100上的视频帧合成模块将接收到的视频帧和字幕帧进行叠加合并生成待显示的视频帧,并向电子设备100上的视频帧队列发送该待显示的视频帧。
具体地,视频帧合成模块可以根据视频帧对应的时间信息与字幕帧对应的时间信息进行匹配,匹配完成之后将字幕帧叠加到对应的视频帧上面,并进行合并生成待显示的视频帧。之后,视频帧合成模块可以将该待显示的视频帧发送给视频帧队列。
S111-S113、视频渲染模块可以从视频帧队列中按照时间顺序读取待显示的视频帧,并按照时间顺序对待显示的视频帧进行渲染,生成渲染后的视频帧。
具体地,视频渲染模块可以实时(或每隔一段时间)获取视频帧队列中的待显示的视频帧。在视频帧合成模块将待显示的视频帧发送给视频帧队列之后,视频渲染模块可以从视频帧队列中按照时间顺序读取并渲染待显示的视频帧,生成渲染后的视频帧。之后,视频渲染模块可以把渲染后的视频帧发送给视频类应用程序。
其中,视频渲染模块待显示的视频帧进行渲染均可以使用现有技术中的视频渲染方法,本申请实施例对此不作限定。视频渲染方法的具体实现可以参照视频渲染相关的技术资料,在此不作赘述。
S114、电子设备100显示渲染后的视频帧。
具体地,电子设备100上的视频类应用程序在接收到视频渲染模块发送的渲染后的视频帧之后,可以在电子设备100的显示屏上(即视频播放窗口)显示渲染后的视频帧。
示例性地,如图2C所示的可以是电子设备100执行图1所示的字幕显示方法之后显示的渲染后的视频帧中的某一帧的画面。其中,字幕“我是一条跨了多个色域的字幕”、字幕“辨识度高的字幕”、字幕“看不清的彩色字幕”均为弹幕,弹幕的显示位置相对于电子设备100的显示屏是不固定的。字幕“与音频同步的字幕”的显示位置相对于电子设备100的显示屏是固定的。从图2C中容易看出,字幕“我是一条跨了多个色域的字幕”的前后两端与视频颜色的色差较小,从而导致字幕辨识度较低,用户无法清楚地看到该字幕;字幕“辨识度高的字幕”和字幕“与音频同步的字幕”与视频颜色的色差较大,字幕辨识度较高,用户可以清楚地看到该字幕;字幕“看不清的彩色字幕”的字幕颜色虽然与视频颜色色差并不是很小,但可能由于视频亮度较高,也会导致字幕辨识度较低,用户无法清楚地看到该字幕。
从图2C可以看出,使用图1所示的字幕显示方法,在视频播放同时也进行字幕显示的应用场景下,如果字幕的颜色与字幕显示位置处视频的颜色和亮度重叠度比较高,则会导致字幕辨识度低,难以被用户看清楚,用户体验差。
为解决上述问题,本申请实施例提供了另一种字幕显示方法,电子设备可以先获取一个待播放的视频文件和待显示到视频播放窗口的字幕文件,然后可以分别对视频文件进行视频解码得到视频帧,对字幕文件进行字幕解码得到字幕帧,之后,电子设备可以从字幕帧中提 取字幕色域信息、字幕位置信息等,并基于字幕位置信息提取字幕对应的视频帧中字幕显示位置处的色域信息,接着基于字幕色域信息与字幕对应的视频帧中字幕显示位置处的色域信息计算字幕识别度,若字幕识别度较低,则可以为字幕添加蒙板,基于字幕识别度计算蒙板的色值、透明度,从而生成带蒙板的字幕帧,之后,可以将视频帧与带蒙板的字幕帧按照时间顺序进行对齐匹配,合成最终待显示的视频帧,缓存到视频帧队列,之后,按照时间顺序读取并渲染待显示的视频帧,最后,将渲染后的视频帧显示到视频播放窗口。这样,可以在不改变用户选择的字幕颜色的基础上,通过调整字幕蒙板的颜色和透明度来解决字幕辨识度低的问题,同时可以减少字幕对视频内容的遮挡,保证视频内容一定的可见性,提高用户体验。
下面介绍本申请实施例提供的另一种字幕显示方法。
图3示例性示出了本申请实施例提供的另一种字幕显示方法的方法流程。
如图3所示,该方法可以应用于具有视频播放能力的电子设备100。下面详细介绍该方法的具体步骤:
阶段一、视频信息流与字幕信息流获取阶段
S301-S302、电子设备100检测到用户在视频类应用程序上播放视频的操作,响应于该操作,电子设备100可以获取视频信息流和字幕信息流。
其中,步骤S301-步骤S302的具体执行过程可以参照前述图1所示实施例中的步骤S101-步骤S102中的相关内容,在此不再赘述。
阶段二、视频解码阶段
S303、电子设备100上的视频类应用程序向电子设备100上的视频解码模块发送视频信息流。
S304-S305、电子设备100上的视频解码模块解码视频信息流生成视频帧,并向电子设备100上的视频帧合成模块发送该视频帧。
其中,步骤S303-步骤S305的具体执行过程可以参照前述图1所示实施例中的步骤S103-步骤S105中的相关内容,在此不再赘述。
阶段三、字幕解码阶段
S306、电子设备100上的视频类应用程序向电子设备100上的字幕解码模块发送字幕信息流。
S307、电子设备100上的字幕解码模块解码字幕信息流生成字幕帧。
其中,步骤S306-S307的具体执行过程可以参照前述图1所示实施例中的步骤S106-步骤S107中的相关内容,在此不再赘述。
图4示例性示出了字幕解码模块解码字幕信息流生成的其中一个字幕帧。
如图4所示,矩形实线框内部区域可以表示字幕帧显示区域(或者称为视频播放窗口区域),其可以与视频帧显示区域重合。该区域内可以显示一条或多条字幕,例如,“我是一条跨了多个色域的字幕”、“辨识度高的字幕”、“看不清的彩色字幕”、“与音频同步的字幕”等等,“我是一条跨了多个色域的字幕”、“辨识度高的字幕”等等均可以分别称为一条字幕,该区域内显示的全部字幕可以称为一个字幕组,例如,“我是一条跨了多个色域 的字幕”、“辨识度高的字幕”“看不清的彩色字幕”、“与音频同步的字幕”这一组字幕列表可以称为一个字幕组。
其中,图4所示的每一条字幕外的矩形虚线框仅仅是用于标识每一条字幕位置的辅助元素,在视频播放过程中可以不显示。
基于上述对字幕和字幕组的解释说明,如图2C所示,容易理解,图2C所示的画面中显示有四条字幕,分别为“我是一条跨了多个色域的字幕”、“辨识度高的字幕”、“看不清的彩色字幕”、“与音频同步的字幕”,这四条字幕组成了一个字幕组。
S308、电子设备100上的字幕解码模块提取字幕帧中的每一条字幕的字幕位置信息、字幕色域信息等,生成字幕组信息。
具体地,字幕解码模块在生成字幕帧之后,可以在字幕帧中提取出每一条字幕的字幕位置信息、字幕色域信息等,从而生成字幕组信息。其中,字幕位置信息可以为每一条字幕在字幕帧显示区域内的显示位置,字幕色域信息可以包括每一条字幕的色值。字幕组信息可以包括该字幕帧中全部字幕的字幕位置信息、字幕色域信息等。
可选的,字幕色域信息也可以包括字幕的亮度等信息。
下面分别详细介绍字幕位置信息和字幕色域信息的提取过程:
1、字幕位置信息提取过程:
字幕的显示位置区域可以是图4所示的刚好能够涵盖字幕的矩形虚线框的内部区域,或者其他能够涵盖字幕的任意形状的内部区域,本申请实施例对此不作限定。
在本申请实施例中,以矩形虚线框内部区域是字幕的显示位置区域为例对字幕位置信息提取过程进行介绍:
以提取图4所示的字幕“我是一条跨了多个色域的字幕”的字幕位置信息为例,字幕解码模块可以首先在字幕帧显示区域建立一个X-O-Y平面直角坐标系,然后选择字幕帧显示区域内的某一个点(例如矩形实线框左下角顶点)作为参考坐标点O,该参考坐标点O的坐标可以设置为(0,0),由数学知识可知,字幕“我是一条跨了多个色域的字幕”外的矩形虚线框的四个顶点处的坐标(x1,y1)、(x2,y2)、(x3,y3)、(x4,y4)均可以计算出来,则字幕“我是一条跨了多个色域的字幕”的位置信息可以包括该矩形虚线框的四个顶点处的坐标,或者,由于矩形是规则图形,只需要确定该矩形虚线框的某一条对角线上的两个顶点处的坐标即可确定该矩形所在的位置区域,因此,字幕“我是一条跨了多个色域的字幕”的位置信息也可以只包括该矩形虚线框的某一条对角线上的两个顶点处的坐标。
同理,图4所示的其他字幕的字幕位置信息也可以通过上述字幕位置提取方法提取出来,在此不再赘述。
字幕解码模块确定完字幕帧中全部字幕的位置信息,即表示字幕解码模块完成字幕位置信息提取。
需要说明的是,上述介绍的字幕位置信息提取过程仅仅是提取字幕位置信息的一种可能的实现方式,提取字幕位置信息的实现方式还可以是现有技术中的其他实现方式,本申请实施例对此不作限定。
2、字幕色域信息提取过程:
首先介绍字幕色域提取过程中涉及的相关概念:
色值:
色值是指某种颜色在不同的颜色模式中所对应的颜色值。以RGB颜色模式为例,在RGB颜色模式中,一种颜色由红色、绿色、蓝色混合而成,每一种颜色的色值均可以由(r,g,b)表示,其中,r,g,b分别表示红色、绿色、蓝色三原色的值,取值范围为[0,255]。例如,红色的色值可以表示为(255,0,0),绿色的色值可以表示为(0,255,0),蓝色的色值可以表示为(0,0,255),黑色的色值可以表示为(0,0,0),白色的色值可以表示为(255,255,255)。
色域:
色域是色值的集合，即在某种颜色模式中所能够产生的颜色的集合。容易理解，在RGB颜色模式中，最多可以产生256×256×256=16777216种不同的颜色，即2^24种不同的颜色，色域为[0, 2^24-1]。这2^24种不同的颜色及每一种颜色对应的色值可以组成一个色值表，每一种颜色对应的色值均可以在该色值表中查找到。
字幕解码模块在完成字幕位置信息提取之后,可以基于字幕所在位置处的字幕的字体颜色,在色值表中查找该字体颜色对应的色值,从而确定该字幕的色值。
字幕解码模块确定完字幕帧中全部字幕的色值,即表示字幕解码模块完成字幕色域信息提取。
S309、电子设备100上的字幕解码模块向电子设备100上的视频帧色域解释模块发送获取字幕组蒙板参数的指令,该指令携带字幕帧的时间信息、字幕组信息等。
具体地,字幕解码模块在生成字幕组信息之后,可以向视频帧色域解释模块发送获取该字幕组蒙板参数的指令,该指令用于指示视频帧色域解释模块向字幕解码模块发送该字幕组对应的蒙板参数(包括蒙板的色值和透明度),一个色值和一个透明度可以称为一组蒙板参数。该指令可以携带字幕帧的时间信息、字幕组信息等,其中,字幕帧的时间信息可以用于在后续步骤中获取到该字幕组对应的视频帧,字幕组信息可以用于在后续步骤中对字幕识别度进行分析。
S310、电子设备100上的视频帧色域解释模块向电子设备100上的视频解码模块发送获取字幕组对应的视频帧的指令,该指令携带字幕帧的时间信息等。
具体地,视频帧色域解释模块在接收到字幕解码模块发送的获取该字幕组蒙板参数的指令之后,可以向视频解码模块发送获取字幕组对应的视频帧的指令,该指令用于指示视频解码模块向视频帧色域解释模块发送给字幕组对应的视频帧。该指令可以携带字幕帧的时间信息,该字幕帧的时间信息可以用于视频解码模块查找到字幕组对应的视频帧。
S311-S312、电子设备100上的视频解码模块查找字幕组对应的视频帧,并向电子设备100上的视频帧色域解释模块发送该字幕组对应的视频帧。
具体地,视频解码模块接收到视频帧色域解释模块发送的获取字幕组对应的视频帧的指令之后,视频解码模块可以基于该指令中携带的字幕帧的时间信息查找到该字幕组对应的视频帧。由于视频解码模块在视频解码阶段已经解码得到全部视频帧的时间信息,因此,视频解码模块可以将全部视频帧的时间信息与字幕帧的时间信息进行匹配,若匹配成功(即视频 帧的时间信息与字幕帧的时间信息一致),则该视频帧即为该字幕组对应的视频帧。之后,视频解码模块可以向视频帧色域解释模块发送该字幕组对应的视频帧。
S313、电子设备100上的视频帧色域解释模块基于字幕组信息中的字幕位置信息得到字幕组对应的视频帧中每一条字幕位置处的色域信息。
具体地,视频帧色域解释模块在获取到字幕组对应的视频帧之后,可以基于字幕组信息中的每一条字幕位置信息确定出每一条字幕所在位置对应的视频帧区域,进一步地,视频帧色域解释模块可以计算每一条字幕所在位置对应的视频帧区域的色域信息。
下面详细介绍视频帧色域解释模块计算每一条字幕所在位置对应的视频帧区域的色域信息的过程:
假设图2C所示画面中的字幕“我是一条跨了多个色域的字幕”为字幕1,以视频帧色域解释模块计算字幕1对应的视频帧区域的色域信息为例进行说明。
如图5所示,字幕1所在位置对应的视频帧区域可以为图5最上方的矩形实线框内部区域,由于一个视频帧区域内可能存在不同色域的像素区域,因此,可以将一个视频帧区域划分成多个子区域,每一个子区域均可以称为一个视频帧色域提取单元。其中,子区域的划分可以根据预设宽度进行划分,也可以根据字幕中每个字的宽度进行划分。例如,字幕1共有13个字,则图5中根据字幕1中每个字的宽度将字幕1所在位置对应的视频帧区域分为了13个子区域,即13个视频帧色域提取单元。
进一步地,视频帧色域解释模块可以按照从左到右(或从右到左)的顺序依次计算每一个子区域的色域信息。以计算视频帧区域中的一个子区域的色域信息为例,视频帧色域解释模块可以获取到该子区域的全部像素点的色值,然后对全部像素点的色值进行叠加平均,从而可以得到该子区域的全部像素点的色值的平均值,该平均值即为该子区域的色值,该子区域的色值即为该子区域的色域信息。
示例性地，假设该子区域为m像素宽，n像素高，则该子区域共有m×n个像素点，每一个像素点的色值均可以由(r,g,b)表示，那么，该子区域（记为第i个子区域）的全部像素点的色值的平均值(r_i, g_i, b_i)则为

$$r_i=\frac{1}{m\times n}\sum_{j=1}^{m\times n}r^{(j)},\qquad g_i=\frac{1}{m\times n}\sum_{j=1}^{m\times n}g^{(j)},\qquad b_i=\frac{1}{m\times n}\sum_{j=1}^{m\times n}b^{(j)}$$

其中，r_i为该子区域全部像素点的平均红色色值，g_i为该子区域全部像素点的平均绿色色值，b_i为该子区域全部像素点的平均蓝色色值，r^{(j)}、g^{(j)}、b^{(j)}分别为该子区域中第j个像素点的红色、绿色、蓝色色值。
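为便于理解上述计算过程，下面给出一个示意性的Python代码草图（仅为在若干假设下的示例：假设视频帧区域以RGB像素二维数组表示、各子区域等宽划分，函数名均为假设，并非本申请限定的实现）：

```python
# 示意性示例：将字幕所在位置对应的视频帧区域划分为若干子区域（视频帧色域提取单元），
# 并计算每个子区域全部像素点色值的平均值（所有函数名、参数均为假设，仅用于说明原理）
def average_color(pixels):
    """pixels: [(r, g, b), ...]，返回该子区域全部像素点色值的平均值 (r_i, g_i, b_i)"""
    n = len(pixels)
    r = sum(p[0] for p in pixels) / n
    g = sum(p[1] for p in pixels) / n
    b = sum(p[2] for p in pixels) / n
    return (r, g, b)

def subregion_colors(region, num_units):
    """region: 字幕位置处的视频帧区域，二维列表 region[y][x] = (r, g, b)；
    num_units: 子区域个数（例如按字幕字数划分）；返回每个子区域的平均色值列表"""
    height = len(region)
    width = len(region[0])
    unit_width = width // num_units
    colors = []
    for k in range(num_units):
        x0 = k * unit_width
        x1 = width if k == num_units - 1 else (k + 1) * unit_width
        pixels = [region[y][x] for y in range(height) for x in range(x0, x1)]
        colors.append(average_color(pixels))
    return colors
```

例如，对于有13个字的字幕，可取num_units=13，对应图5中按每个字的宽度划分子区域的做法。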
同理,视频帧色域解释模块可以计算出每一条字幕所在位置对应的视频帧区域的全部子区域的色域信息,即字幕组对应的视频帧中字幕位置处的色域信息。
应当理解,字幕对应的视频帧区域划分多个子区域的个数可以基于预设的划分规则进行确定,本申请实施例对此不作限定。
可选的,视频帧区域的色域信息也可以包括视频帧区域的亮度等信息。
需要说明的是,上述介绍的计算每一条字幕所在位置对应的视频帧区域的色域信息的过程仅仅是一种可能的实现方式,还可以使用其他实现方式,本申请实施例对此不作限定。
S314、电子设备100上的视频帧色域解释模块基于字幕组信息中的每一条字幕色域信息和字幕组对应的视频帧中每一条字幕位置处的色域信息生成叠加字幕识别度分析结果。
具体地,视频帧色域解释模块在计算完字幕组对应的视频帧中字幕位置处的色域信息之后,可以基于字幕组信息中的字幕色域信息和字幕组对应的视频帧中字幕位置处的色域信息进行叠加字幕识别度分析,进一步地,可以通过叠加字幕识别度分析生成叠加字幕识别度分析结果,该结果用于表示字幕组中每一条字幕的识别度高低(也可以称为辨识度高低)。
也即是说,视频帧色域解释模块可以判断字幕组在叠加到该字幕组对应的视频帧中的字幕位置处之后,字幕颜色和字幕对应的视频帧区域的颜色的差异性大小,若差异性较小,则表示字幕识别度低,不容易被用户识别出来。
下面详细介绍视频帧色域解释模块进行叠加字幕识别度分析的过程:
视频帧色域解释模块可以确定字幕颜色和字幕对应的视频帧区域的颜色的颜色差异值,该颜色差异值用于表示字幕颜色和字幕对应的视频帧区域的颜色的差异性。该颜色差异值可以利用现有技术中的相关算法来确定。
在一种可能的实现方式中,颜色差异值Diff可以采用以下公式来计算:
$$\mathrm{Diff}=\frac{1}{k}\sum_{i=1}^{k}\left[(r_i-r_0)^2+(g_i-g_0)^2+(b_i-b_0)^2\right]$$

其中，k为一条字幕对应的视频帧区域的全部子区域的个数，r_i为第i个子区域全部像素点的平均红色色值，g_i为第i个子区域全部像素点的平均绿色色值，b_i为第i个子区域全部像素点的平均蓝色色值，r_0为字幕的红色色值，g_0为字幕的绿色色值，b_0为字幕的蓝色色值。
进一步地,视频帧色域解释模块计算得到颜色差异值之后,可以通过判断该颜色差异值是否小于某一预设颜色差异阈值来确定该字幕识别度高低。
若该颜色差异值小于某一预设颜色差异阈值(也可以称为第一阈值),则表示该字幕识别度低。
在一些实施例中,还可以结合字幕对应视频帧区域的亮度来进一步确定字幕识别度高低。
举例来说,图2C所示的字幕“看不清的彩色字幕”,虽然字幕颜色与字幕对应的视频帧区域的颜色差异值不是很小,但是由于该字幕对应视频帧区域的亮度过高,仍然存在字幕识别度低的问题,因此,针对这种情况,还可以进一步结合字幕对应视频帧区域的亮度来判断字幕识别度,若该字幕对应的视频帧区域的亮度高于某一预设亮度阈值,则表示该字幕识别度低。
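在上述识别度判断步骤的基础上，下面给出一个示意性的Python代码草图（仅为示例：其中按子区域个数取平均的方式、亮度的近似计算公式以及两个阈值的取值均为假设，并非本申请限定的实现）：

```python
# 示意性示例：基于字幕色值与各子区域平均色值计算颜色差异值Diff，并结合亮度判断字幕识别度
# （阈值与亮度公式均为假设取值，仅用于说明思路）
def color_diff(subtitle_color, subregion_colors):
    """subtitle_color: 字幕色值 (r0, g0, b0)；subregion_colors: 各子区域平均色值列表"""
    r0, g0, b0 = subtitle_color
    k = len(subregion_colors)
    return sum((r - r0) ** 2 + (g - g0) ** 2 + (b - b0) ** 2
               for r, g, b in subregion_colors) / k

def luminance(color):
    """一个常见的相对亮度近似公式（此处仅为假设，本申请并未限定具体算法）"""
    r, g, b = color
    return 0.299 * r + 0.587 * g + 0.114 * b

def is_low_recognizability(subtitle_color, subregion_colors,
                           diff_threshold=3000.0, luma_threshold=220.0):
    diff = color_diff(subtitle_color, subregion_colors)
    avg_luma = sum(luminance(c) for c in subregion_colors) / len(subregion_colors)
    # 颜色差异值小于预设颜色差异阈值，或字幕对应视频帧区域亮度过高，均视为字幕识别度低
    return diff < diff_threshold or avg_luma > luma_threshold
```

实际使用时，颜色差异阈值（即第一阈值）与亮度阈值可以根据显示效果和设备特性进行调整。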
对于纯色字幕来说,提取出来的字幕色域信息可以只包括该字幕对应的一个色值这一个参数。而对于非纯色字幕,提取出来的字幕色域信息可能包括多个参数,例如,对于渐变色字幕,提取出来的字幕色域信息可以包括起点色值、终点色值、渐变方向等多个参数,在这种情况下,在一种可能的实现方式中,可以先计算字幕的起点色值和终点色值的平均值,之后再将该平均值作为字幕对应的色值来进行叠加字幕识别度分析。
需要说明的是,上述介绍的视频帧色域解释模块进行叠加字幕识别度分析的过程仅仅是 一种可能的实现方式,还可以使用其他实现方式,本申请实施例对此不作限定。
S315、电子设备100上的视频帧色域解释模块基于叠加字幕识别度分析结果计算字幕组中每一条字幕对应蒙板的色值和透明度。
具体地,视频帧色域解释模块在生成叠加字幕识别度分析结果之后,可以基于该结果计算出字幕帧中每一条字幕对应蒙板的色值和透明度。
对于识别度较高的字幕(例如图2C中的字幕“辨识度高的字幕”和字幕“与音频同步的字幕”),该字幕对应蒙板的色值可以为一个预先设置好的固定值,透明度可以设置为100%。
对于识别度较低的字幕(例如图2C中的字幕“我是一条跨了多个色域的字幕”、字幕“看不清的彩色字幕”),该字幕对应蒙板的色值和透明度需要基于字幕色域信息或字幕所在位置对应的视频帧区域的色域信息来进一步确定该字幕对应蒙板的色值和透明度。
具体确定字幕对应蒙板的色值和透明度的方式可以有很多种,本申请实施例对此不作限定,本领域技术人员可以根据需要来选择。
在一种可能的实现方式中,可以将与字幕的色值或字幕对应的视频帧区域的色值的颜色差异值最大的一种颜色对应的色值确定为字幕对应蒙板的色值,这样,可以使得用户更清楚地看到字幕,也可以将与字幕的色值或字幕对应的视频帧区域的色值的颜色差异值居中的一种颜色对应的色值确定为字幕对应蒙板的色值,这样,在保证用户清楚地看到字幕的同时也能够避免颜色差异过大给用户带来的眼部不适感,等等。
例如,电子设备100可以计算色值表中每一种颜色对应的色值与字幕的色值之间的颜色差异值Diff,之后,可以选择颜色差异值Diff最大/居中的一种颜色对应的色值作为蒙板的色值。在一种可能的实现方式中,可以用以下公式计算色值表中每一种颜色对应的色值与该字幕的色值之间的颜色差异值Diff:
$$\mathrm{Diff}=(r_0-R_0)^2+(g_0-G_0)^2+(b_0-B_0)^2$$

其中，假设色值表中某一种颜色对应的色值为(R_0, G_0, B_0)，R_0则为该颜色对应的红色色值，G_0则为该颜色对应的绿色色值，B_0则为该颜色对应的蓝色色值；r_0为字幕的红色色值，g_0为字幕的绿色色值，b_0为字幕的蓝色色值。
又例如,电子设备100可以计算色值表中每一种颜色对应的色值与字幕对应的视频帧区域的色值之间的颜色差异值Diff,之后,可以选择颜色差异值Diff最大/居中的一种颜色对应的色值作为蒙板的色值。在一种可能的实现方式中,可以用以下公式计算色值表中每一种颜色对应的色值与字幕对应的视频帧区域的色值之间的颜色差异值Diff:
$$\mathrm{Diff}=\frac{1}{k}\sum_{i=1}^{k}\left[(r_i-R_0)^2+(g_i-G_0)^2+(b_i-B_0)^2\right]$$

其中，假设色值表中某一种颜色对应的色值为(R_0, G_0, B_0)，R_0则为该颜色对应的红色色值，G_0则为该颜色对应的绿色色值，B_0则为该颜色对应的蓝色色值；k为该字幕对应的视频帧区域的全部子区域的个数，r_i为第i个子区域全部像素点的平均红色色值，g_i为第i个子区域全部像素点的平均绿色色值，b_i为第i个子区域全部像素点的平均蓝色色值。
在一种可能的实现方式中，字幕对应蒙板的透明度可以基于字幕对应蒙板的色值来进一步确定。例如，在字幕对应蒙板的色值与字幕的色值的差异较大的情况下，字幕对应蒙板的透明度可以适当选择较大的值（例如大于50%的值），这样，在保证用户清楚地看到字幕的同时也可以减小字幕叠加区域对视频画面的遮挡。
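下面给出一个确定蒙板色值和透明度的示意性Python代码草图（仅为示例：为降低说明的复杂度，示例只在少量候选颜色中选取蒙板色值，透明度的取值规则也是假设的，并非本申请限定的实现）：

```python
# 示意性示例：为识别度低的字幕选取蒙板色值和透明度（候选颜色集合与透明度规则均为假设）
CANDIDATE_COLORS = [(0, 0, 0), (255, 255, 255), (255, 0, 0), (0, 255, 0),
                    (0, 0, 255), (128, 128, 128), (255, 255, 0), (0, 255, 255)]

def pick_mask_color(subtitle_color, mode="max"):
    """按与字幕色值的颜色差异值选取蒙板色值：mode="max"取差异最大的颜色，mode="mid"取差异居中的颜色"""
    r0, g0, b0 = subtitle_color
    scored = sorted(CANDIDATE_COLORS,
                    key=lambda c: (c[0] - r0) ** 2 + (c[1] - g0) ** 2 + (c[2] - b0) ** 2)
    return scored[-1] if mode == "max" else scored[len(scored) // 2]

def pick_mask_alpha(mask_color, subtitle_color):
    """蒙板色值与字幕色值差异越大，蒙板可以越透明（取值规则为假设），以减小对视频画面的遮挡"""
    diff = sum((m - s) ** 2 for m, s in zip(mask_color, subtitle_color))
    return 0.7 if diff > 60000 else 0.4   # 0.7 表示透明度为 70%

mask_color = pick_mask_color((255, 255, 255), mode="max")
mask_alpha = pick_mask_alpha(mask_color, (255, 255, 255))
```

按照上文的描述，也可以在完整的色值表中逐一计算颜色差异值后选取差异值最大或居中的颜色作为蒙板色值，示例中的候选颜色集合只是一个简化。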
S316、电子设备100上的视频帧色域解释模块向电子设备100上的字幕解码模块发送字幕组中每一条字幕对应的蒙板的色值和透明度。
具体地,视频帧色域解释模块在计算出字幕组中每一条字幕对应蒙板的色值和透明度之后,可以向字幕解码模块发送字幕组中每一条字幕对应的蒙板的色值和透明度,同时,还可以携带蒙板所对应的字幕的字幕位置信息,以便字幕解码模块可以将字幕与蒙板进行一一对应。
S317、电子设备100上的字幕解码模块基于字幕组中每一条字幕对应的蒙板的色值和透明度生成对应蒙板,并将字幕组中每一条字幕及其对应蒙板进行叠加生成带蒙板的字幕帧。
具体地,字幕解码模块在接收到视频帧色域解释模块发送的字幕组中每一条字幕对应的蒙板的色值和透明度之后,可以基于一条字幕对应的蒙板的色值和透明度与该字幕的字幕位置信息生成一条该字幕对应的蒙板(例如图5所示的字幕1对应的蒙板),其中,蒙板的形状可以是能够涵盖该字幕的矩形或者其他任意形状,本申请实施例对此不作限定。
同理,字幕解码模块可以为字幕组中的每一条字幕生成一条该字幕对应的蒙板。
示例性地,如图2C所示,容易看出,该画面中有四条字幕,因此,字幕解码模块可以生成四条蒙板,一条字幕对应一条蒙板。
进一步地,字幕解码模块可以将字幕叠加到该字幕所对应的蒙板上层生成带蒙板的字幕(例如图5所示的带蒙板的字幕1)。
同理,字幕解码模块可以将字幕组中的每一条字幕及其对应蒙板进行叠加,从而生成带蒙板的字幕帧。
图6A示例性示出了一个带蒙板的字幕帧,可以看出,每一条字幕均叠加有一条蒙板,其中,辨识度高的字幕(例如“辨识度高的字幕”和“与音频同步的字幕”)对应蒙板的透明度为100%,辨识度低的字幕(例如“我是一条跨了多个色域的字幕”和“看不清的彩色字幕”)对应蒙板的透明度小于100%,有一定的色值。
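下面给出一个将蒙板与字幕叠加到视频帧区域的示意性Python代码草图（仅为示例：假设画面以RGB像素二维数组表示，半透明叠加采用alpha混合方式实现，并非本申请限定的实现）：

```python
# 示意性示例：先按透明度把蒙板与视频帧区域做alpha混合，再把字幕像素覆盖到蒙板之上
def blend(mask_color, video_color, transparency):
    """transparency 为蒙板透明度（1.0 即 100%，完全透明，视频像素原样保留）"""
    opacity = 1.0 - transparency
    return tuple(round(opacity * m + transparency * v)
                 for m, v in zip(mask_color, video_color))

def compose_region(video_region, subtitle_pixels, mask_color, transparency):
    """video_region: 二维RGB数组；subtitle_pixels: {(x, y): 字幕色值}，只覆盖字幕文字像素"""
    out = [row[:] for row in video_region]
    for y, row in enumerate(out):
        for x, v in enumerate(row):
            row[x] = blend(mask_color, v, transparency)      # 蒙板覆盖整个字幕所在区域
    for (x, y), color in subtitle_pixels.items():
        out[y][x] = color                                    # 字幕叠加到蒙板上层，字幕颜色不变
    return out
```

当transparency为1.0（即透明度为100%）时，蒙板对视频画面完全不产生遮挡，与上文辨识度高的字幕的处理方式一致。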
S318、电子设备100上的字幕解码模块向电子设备100上的视频帧合成模块发送带蒙板的字幕帧。
具体地,字幕解码模块在生成带蒙板的字幕帧之后,可以将该带蒙板的字幕帧发送给视频帧合成模块,用于后续生成待显示的视频帧。
阶段四、视频帧合成、渲染及显示阶段
S319-S320、电子设备100上的视频帧合成模块将接收到的视频帧和带蒙板的字幕帧进行叠加合并生成待显示的视频帧,并向电子设备100上的视频帧队列发送该待显示的视频帧。
S321-S323、视频渲染模块可以从视频帧队列中按照时间顺序读取待显示的视频帧,并按照时间顺序对待显示的视频帧进行渲染,生成渲染后的视频帧。
S324、电子设备100显示渲染后的视频帧。
其中,步骤S319-步骤S324的具体执行过程可以参照前述图1所示实施例中的步骤S109-步骤S114中的相关内容,在此不再赘述。
需要说明的是,在一些实施例中,上述视频解码模块、字幕解码模块、视频帧色域解释模块、视频帧合成模块、视频帧队列、视频渲染模块也可以都集成在上述视频类应用程序中来执行本申请实施例提供的字幕显示方法,本申请实施例对此不作限定。
示例性地,如图6B所示的可以是电子设备100执行图3所示的字幕显示方法(一条字幕可对应一条蒙板)之后显示的渲染后的视频帧中的某一帧的画面。容易看出,与图2C所示的画面相比,在为字幕组添加对应的蒙板之后,字幕“我是一条跨了多个色域的字幕”和字幕“看不清的彩色字幕”这两条字幕的辨识度有了很大的提升,同时,由于字幕对应的蒙板有一定的透明度,因此,字幕叠加区域对视频画面也未完全遮挡,这样,综合考虑到了视频显示和字幕显示的效果,在不改变用户选择的字幕颜色的基础上,保证用户可以看清字幕的同时,也可以保证视频画面一定的可见性,提高用户体验。
进一步地,在整个视频播放过程中,字幕的位置、视频背景的颜色等均可能发生变化,因此上述字幕显示方法可以一直执行,从而实现在整个视频播放过程中,用户均可以清楚地看到字幕。示例性地,上述图6B可以为视频播放进度在8:00时刻的第一用户界面示意图,图6C可以为视频播放进度在8:02时刻的第二用户界面示意图,第一用户界面包括的视频帧与第二用户界面包括的视频帧不同。如图6C所示,可以看出,字幕“我是一条跨了多个色域的字幕”,字幕“辨识度高的字幕”,字幕“看不清的彩色字幕”相对于图6B来说均向显示屏的左侧发生了移动,电子设备100会基于字幕的色值和该字幕对应当前视频帧区域的色值重新计算字幕对应的蒙板的色值、透明度,生成字幕对应的蒙板。容易看出,在第二用户界面中,字幕“我是一条跨了多个色域的字幕”对应当前视频帧区域的视频背景颜色发生了变化,该字幕的辨识度也变高了,因此,字幕“我是一条跨了多个色域的字幕”对应的蒙板相对于图6B也发生了变化,可以看出,该字幕没有显示蒙板,具体地,可以是该字幕对应蒙板的透明度变为了100%,或,该字幕没有蒙板。
图6B和图6C所示的视频播放画面可以是全屏显示也可以是部分屏幕显示,本申请实施例对此不作限定。
上述图6B所示的字幕对应的蒙板都是一条跨越整个字幕所在区域的蒙板,即一条字幕均只对应一条蒙板。在实际的一些应用场景中,一条字幕可能跨越多个色域差别较大的区域,从而导致字幕的一部分辨识度较高,另一部分辨识度较低,在这种情况下,可以为一条字幕生成多条对应的蒙板。例如,图2C中所示的字幕“我是一条跨了多个色域的字幕”,该字幕所在区域前端部分的字幕辨识度较低(即“我是一条”这四个字是用户不容易看清楚的),该字幕所在区域的后端部分的字幕识别度也较低(即“域的字幕”这四个字是用户不容易看清楚的),而该字幕所在区域的中间部分的字幕辨识度较高(即“跨了多个色”这五个字是用户容易看清楚的),因此,在这种情况下,可以为字幕所在区域前端部分、中间部分、后端部分分别生成一条对应的蒙板,即该字幕可以有三条对应的蒙板。
针对上述一条字幕对应多条蒙板的应用场景,本申请实施例可以在前述图3所示的方法的基础上,对步骤S313-步骤S317进行一些相应的改进,从而实现一条字幕对应多条蒙板。其他步骤无需变化。
下面详细介绍实现一条字幕对应多条蒙板的过程:
在生成字幕组对应的视频帧中字幕位置处的色域信息过程中,视频帧色域解释模块可以按照从左到右(或从右到左)的顺序依次计算出每一个子区域的色值,在上述需要实现一条字幕对应多条蒙板的应用场景下,也即一条字幕跨越多个色域差别较大的区域的应用场景下,视频帧色域解释模块可以比较相邻子区域的色值,如果相邻子区域色值相近则合并成一个区域,合并后的区域对应一条蒙板,如果相邻子区域色值差异较大,则不进行合并,这两个未合并的区域则分别对应各自的蒙板,因此,一条字幕可能对应多条蒙板。
如图7A所示,在一条字幕可能对应多条蒙板的情况下,步骤S313-步骤S317可以按照以下步骤具体执行,下面以如图7B所示的字幕1是图2C所示的字幕“我是一条跨了多个色域的字幕”为例进行说明。
S701、视频帧色域解释模块依次计算出字幕所在位置对应的视频帧区域的每一个子区域的色值,合并色值相近的子区域,得到M个第二子区域。
具体地，在步骤S313的基础上，视频帧色域解释模块按照从左到右（或从右到左）的顺序依次计算出每一个子区域的色值之后，还需要比较相邻子区域的色值，合并色值相近的子区域，得到M个第二子区域，其中，M为正整数。如图7B所示，视频帧色域解释模块通过比较相邻子区域的色值，合并色值相近的子区域之后，将该字幕所在位置对应的视频帧区域分为了三个区域（即三个第二子区域）：区域A、区域B、区域C，假设区域A是由a个子区域合并而成的，区域B是由b个子区域合并而成的，区域C是由c个子区域合并而成的。
其中,色值相近可以是指两个子区域的色值的差异值小于第二阈值,第二阈值是预先设置的。
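下面给出一个合并色值相近的相邻子区域的示意性Python代码草图（仅为示例，其中第二阈值的取值为假设，并非本申请限定的实现）：

```python
# 示意性示例：比较相邻子区域（视频帧色域提取单元）的平均色值，色值相近则合并为一个第二子区域
def merge_similar_subregions(subregion_colors, second_threshold=1500.0):
    """subregion_colors: 按从左到右顺序排列的各子区域平均色值列表；
    返回若干合并后的区域，每个区域记录其包含的子区域下标范围及平均色值"""
    def diff(c1, c2):
        return sum((a - b) ** 2 for a, b in zip(c1, c2))

    regions = []
    start = 0
    for i in range(1, len(subregion_colors) + 1):
        # 到达末尾，或相邻子区域色值差异不小于第二阈值时，结束当前区域
        if i == len(subregion_colors) or diff(subregion_colors[i - 1], subregion_colors[i]) >= second_threshold:
            members = subregion_colors[start:i]
            avg = tuple(sum(c[j] for c in members) / len(members) for j in range(3))
            regions.append({"range": (start, i - 1), "color": avg})
            start = i
    return regions
```

例如，对图7B中的字幕1，若前4个子区域、中间5个子区域、后4个子区域的色值分别相近，则会得到区域A、区域B、区域C三个第二子区域。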
S702、视频帧色域解释模块针对M个第二子区域分别进行叠加字幕识别度分析,生成M个第二子区域的叠加字幕识别度分析结果。
具体地,视频帧色域解释模块需要针对区域A、区域B、区域C分别进行叠加字幕识别度分析,而不是直接对整个视频帧区域进行叠加字幕识别度分析。类似的,视频帧色域解释模块也可以利用步骤S314中的颜色差异值来对区域A、区域B、区域C分别进行叠加字幕识别度分析,过程如下:
区域A的颜色差异值Diff1:
$$\mathrm{Diff1}=\frac{1}{a}\sum_{i=1}^{a}\left[(r_i-r_0)^2+(g_i-g_0)^2+(b_i-b_0)^2\right]$$

其中，a为区域A包括的子区域的个数，r_i为区域A中第i个子区域全部像素点的平均红色色值，g_i为区域A中第i个子区域全部像素点的平均绿色色值，b_i为区域A中第i个子区域全部像素点的平均蓝色色值，r_0为区域A中的字幕的红色色值，g_0为区域A中的字幕的绿色色值，b_0为区域A中的字幕的蓝色色值。
区域B的颜色差异值Diff2:
$$\mathrm{Diff2}=\frac{1}{b}\sum_{i=1}^{b}\left[(r_i-r_0)^2+(g_i-g_0)^2+(b_i-b_0)^2\right]$$

其中，b为区域B包括的子区域的个数，r_i为区域B中第i个子区域全部像素点的平均红色色值，g_i为区域B中第i个子区域全部像素点的平均绿色色值，b_i为区域B中第i个子区域全部像素点的平均蓝色色值，r_0为区域B中的字幕的红色色值，g_0为区域B中的字幕的绿色色值，b_0为区域B中的字幕的蓝色色值。
区域C的颜色差异值Diff3:
$$\mathrm{Diff3}=\frac{1}{c}\sum_{i=1}^{c}\left[(r_i-r_0)^2+(g_i-g_0)^2+(b_i-b_0)^2\right]$$

其中，c为区域C包括的子区域的个数，r_i为区域C中第i个子区域全部像素点的平均红色色值，g_i为区域C中第i个子区域全部像素点的平均绿色色值，b_i为区域C中第i个子区域全部像素点的平均蓝色色值，r_0为区域C中的字幕的红色色值，g_0为区域C中的字幕的绿色色值，b_0为区域C中的字幕的蓝色色值。
视频帧色域解释模块分别计算得到区域A、区域B、区域C的颜色差异值之后,可以分别判断这三个区域的颜色差异值是否小于某一预设颜色差异阈值,若是,则表示该区域的字幕识别度低。
S703、视频帧色域解释模块基于字幕色域信息和M个第二子区域的叠加字幕识别度分析结果分别确定M个第二子区域对应蒙板的色值和透明度。
具体地,视频帧色域解释模块需要基于字幕色域信息和区域A、区域B、区域C的叠加字幕识别度分析结果分别确定区域A对应蒙板的色值和透明度、区域B对应蒙板的色值和透明度、区域C对应蒙板的色值和透明度。具体确定每一个第二子区域对应蒙板的色值和透明度的过程与步骤S315中确定字幕所在位置对应的整个视频帧区域对应蒙板的色值和透明度的过程类似,可以参照前述相关内容,在此不再赘述。
S704、视频帧色域解释模块向字幕解码模块发送M个第二子区域对应的蒙板的色值、透明度、位置信息。
具体地,由于一条字幕可能对应多条蒙板,因此,视频帧色域解释模块除了向字幕解码模块发送字幕组中每一条字幕对应的蒙板的色值和透明度之外,还需要向字幕解码模块发送每一条蒙板的位置信息(或者每一条蒙板相对其对应字幕的位置信息),其中,每一条蒙板的位置信息可以是基于字幕位置信息得到的,具体地,若一条字幕对应多条蒙板,由于字幕位置信息已知,从而可以推算出字幕所在位置的视频帧区域的全部子区域的位置信息,进一步可以推算出每个第二子区域对应蒙板的位置信息。
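下面给出一个由字幕位置信息推算每个第二子区域对应蒙板位置的示意性Python代码草图（仅为示例：假设字幕位置以矩形对角线上两个顶点的坐标表示、各子区域等宽，函数名均为假设，并非本申请限定的实现）：

```python
# 示意性示例：由字幕矩形区域的对角顶点坐标与子区域个数，推算每个第二子区域对应蒙板的矩形位置
def mask_rectangles(subtitle_box, num_units, merged_regions):
    """subtitle_box: ((x1, y1), (x2, y2)) 字幕矩形对角线上两个顶点的坐标；
    num_units: 字幕区域划分出的子区域个数；
    merged_regions: 上文代码草图中 merge_similar_subregions 的返回结果，其中 "range" 为子区域下标范围"""
    (x1, y1), (x2, y2) = subtitle_box
    left, right = min(x1, x2), max(x1, x2)
    top, bottom = min(y1, y2), max(y1, y2)
    unit_width = (right - left) / num_units
    rects = []
    for region in merged_regions:
        start, end = region["range"]
        rects.append({
            "left": left + start * unit_width,
            "right": left + (end + 1) * unit_width,
            "top": top,
            "bottom": bottom,
        })
    return rects
```

返回的每个矩形即该字幕对应的一条蒙板所在的位置区域。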
S705、字幕解码模块基于上述M个第二子区域对应蒙板的色值、透明度、位置信息生成字幕对应的蒙板,并将字幕叠加到上述蒙板生成带蒙板的字幕。
具体地,对于对应多条蒙板的字幕,字幕解码模块可以基于该条字幕对应的每一个第二子区域的蒙板的色值和透明度、蒙板的位置信息,生成三条该字幕对应的蒙板(例如图7B所示的字幕1对应的蒙板),之后,字幕解码模块可以将该字幕叠加到该字幕所对应的蒙板上层生成带蒙板的字幕(例如图7B所示的带蒙板的字幕1)。
如图2C所示,由于字幕“辨识度高的字幕”、字幕“看不清的彩色字幕”、字幕“与音频同步的字幕”这三条字幕没有跨越多个色域差别较大的区域,因此,这三条字幕还是均对应一条蒙板。
字幕解码模块可以将字幕组中的每一条字幕及其对应蒙板进行叠加,从而生成带蒙板的字幕帧。
图8A示例性示出了一个带蒙板的字幕帧,可以看出,字幕“我是一条跨了多个色域的字幕”叠加有三条蒙板,其中,“我是一条”和“域的字幕”由于辨识度较低,因此对应蒙板的透明度小于100%,有一定的色值,而“跨了多个色”由于辨识度较高,因此对应蒙板的透明度为100%。其余三条均各自叠加有一条蒙板,其中,字幕“辨识度高的字幕”和字幕“与音频同步的字幕”由于辨识度较高,因此对应蒙板的透明度为100%,字幕“看不清的彩色字幕”由于辨识度较低,因此对应蒙板的透明度小于100%,有一定的色值。
示例性地,如图8B所示的可以是电子设备100执行改进后的图3所示的字幕显示方法(跨越多个色域差别较大的区域的字幕可对应多条蒙板)之后显示的渲染后的视频帧中的某一帧的画面。与图6B所示的画面相比,由于字幕“我是一条跨了多个色域的字幕”跨越了多个色域差别较大的区域,因此,该字幕对应的蒙板发生了变化,容易看出,由于该字幕所在区域的中间部分(即“跨了多个色”部分)字幕辨识度较高,因此该部分对应的蒙板的透明度设置成了100%(即全透明),或者也可以不设置蒙板,而由于该字幕所在区域的前端部分(即“我是一条”部分)和后端部分(即“域的字幕”部分)字幕辨识度较低,因此,这两部分对应的蒙板的色值和透明度则是基于字幕色域信息和这两部分所在区域的色域信息分别计算出来的。这样,由于字幕“我是一条跨了多个色域的字幕”所在区域的中间部分对应的蒙板的透明度为100%,或者也可以不设置蒙板,因此,在达到了图6B所示的有益效果的基础上,进一步减少了蒙板对视频画面的遮挡,也进一步提高了用户体验。
进一步地,在整个视频播放过程中,字幕的位置、视频背景的颜色等均可能发生变化,因此上述字幕显示方法可以一直执行,从而实现在整个视频播放过程中,用户均可以清楚地看到字幕。示例性地,上述图8B可以为视频播放进度在8:00时刻的用户界面示意图,包括第一视频帧,图8C可以为视频播放进度在8:01时刻的用户界面示意图,包括第二视频帧,第一视频帧和第二视频帧相同。如图8C所示,可以看出,字幕“我是一条跨了多个色域的字幕”,字幕“辨识度高的字幕”,字幕“看不清的彩色字幕”相对于图8B来说均向显示屏的左侧发生了移动,电子设备100会基于字幕的色值和该字幕对应当前视频帧区域的色值重新计算字幕对应的蒙板的色值、透明度,生成字幕对应的蒙板。容易看出,图8C中的字幕“我是一条跨了多个色域的字幕”对应的蒙板相对于图8B发生了明显变化。在图8B中,该字幕辨识度较低的部分为“我是一条”和“域的字幕”,因此这两部分对应蒙板均有一定的色值,且对应蒙板的透明度小于100%,该字幕辨识度较高的部分为“跨了多个色”,因此这部分没有显示蒙板,具体地,可以是将该字幕对应蒙板的透明度为100%,或者不设置蒙板。而在图8C中,该字幕辨识度较低的部分变为了“我是一条跨”和“的字幕”,因此电子设备100会基于字幕的色值和该字幕对应当前视频帧区域的色值重新计算这两部分对应蒙板的色值、透明度,由于这两部分辨识度低,因此这两部分对应蒙板均有一定的色值,且对应蒙板的透明度小于100%。该字幕辨识度较高的部分变为了“了多个色域”,因此这部分没有显示蒙板,具体地,可以将该字幕对应蒙板的透明度设置为100%,或者不设置蒙板。其中,图 8C中字幕对应蒙板的生成过程与前述图8B中字幕对应蒙板的生成过程类似,在此不再赘述。
图8B和图8C所示的视频播放画面可以是全屏显示也可以是部分屏幕显示,本申请实施例对此不作限定。
在本申请实施例中,对于辨识度高的字幕,电子设备100也会为该字幕生成蒙板,其蒙板的色值可以为预设色值,其蒙板的透明度为100%,在一些实施例中,对于辨识度高的字幕,电子设备100也可以不为该字幕生成蒙板,即若电子设备100确定该字幕辨识度高,则电子设备100可以不再对该字幕做进一步处理,因此该字幕没有对应的蒙板,即该字幕不被设置有蒙板。
在本申请实施例中,一条字幕对应一条蒙板(即一条字幕对应一组蒙板参数)可以是指一条字幕对应一条包含一个色值和一个透明度的蒙板,一条字幕对应多条蒙板(即一条字幕对应多组蒙板参数)可以是指一条字幕对应多条不同色值和不同透明度的蒙板,或者,一条字幕对应一条包含不同色值和不同透明度的蒙板(即多条不同色值和不同透明度的蒙板组合成一条包含不同色值和不同透明度的蒙板)。
本申请的实施例中的电子设备100以手机(mobile phone)为例,电子设备100还可以是平板电脑(Pad)、个人数字助理(Personal DigitalAssistant,PDA)、膝上型电脑(Laptop)等便携式电子设备,本申请实施例对电子设备100的类型、物理形态、尺寸不作限定。
在本申请实施例中,第一视频可以是在用户点击图2B所示的视频播放选项221之后电子设备100所播放的视频,第一界面可以是图6B所示的用户界面,第一画面可以是图6B所示的视频帧画面,第一字幕可以是字幕“我是一条跨了多个色域的字幕”,第一区域是第一字幕的显示位置对应的第一画面中的区域,第一数值可以是第一字幕的颜色与第一字幕的显示位置对应的第一画面区域颜色的颜色差异值,第二界面可以是图6C所示的用户界面,第二画面可以是图6C所示的视频帧画面,第二区域是第一字幕的显示位置对应的第二画面中的区域,第二数值可以是第一字幕的颜色与第一字幕的显示位置对应的第二画面区域颜色的颜色差异值,第一视频文件可以是第一视频对应的视频文件,第一字幕文件可以是第一视频对应的字幕文件,第一视频帧是用于生成第一画面的视频帧,第一字幕帧是包含第一字幕,且与第一视频帧携带相同时间信息的字幕帧,第二字幕帧是第一字幕叠加第一蒙板之后生成的字幕帧(即带蒙板的字幕帧),第一子区域可以是视频帧色域提取单元,第二子区域可以是将色值相近的相邻第一子区域进行合并之后的区域(例如区域A、区域B、区域C),第一子蒙板可以是每个第二子区域对应的蒙板,第一蒙板可以是图6B所示的字幕“我是一条跨了多个色域的字幕”对应的蒙板,也可以是图8B所示的字幕“我是一条跨了多个色域的字幕”对应的蒙板,第三界面可以是图8B所示的用户界面,第三画面可以是图8B所示的视频帧画面,第一部分可以是字幕“我是一条跨了多个色域的字幕”中的“我是一条”,第二部分可以是字幕“我是一条跨了多个色域的字幕”中的“跨了多个色”,第二子蒙板可以是“我是一条”对应的蒙板(即图7B所示的区域A蒙板),第三子蒙板可以是“跨了多个色”对应的蒙板(即图7B所示的区域B蒙板),第二蒙板可以是图6C所示的字幕“我是一条跨了多个色域的字幕”对应的蒙板。
下面介绍本申请实施例提供的一种电子设备100的结构。
图9示例性示出了本申请实施例中提供的一种电子设备100的结构。
如图9所示,电子设备100可以包括:处理器110,外部存储器接口120,内部存储器121,通用串行总线(universal serial bus,USB)接口130,充电管理模块140,电源管理模块141,电池142,天线1,天线2,移动通信模块150,无线通信模块160,音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,传感器模块180,按键190,马达191,指示器192,摄像头193,显示屏194,以及用户标识模块(subscriber identification module,SIM)卡接口195等。其中传感器模块180可以包括压力传感器180A,陀螺仪传感器180B,气压传感器180C,磁传感器180D,加速度传感器180E,距离传感器180F,接近光传感器180G,指纹传感器180H,温度传感器180J,触摸传感器180K,环境光传感器180L,骨传导传感器180M等。
可以理解的是,本申请实施例示意的结构并不构成对电子设备100的具体限定。在本申请另一些实施例中,电子设备100可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。
处理器110可以包括一个或多个处理单元,例如:处理器110可以包括应用处理器(application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,存储器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。
其中,控制器可以是电子设备100的神经中枢和指挥中心。控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。
处理器110中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器110中的存储器为高速缓冲存储器。该存储器可以保存处理器110刚用过或循环使用的指令或数据。如果处理器110需要再次使用该指令或数据,可从所述存储器中直接调用。避免了重复存取,减少了处理器110的等待时间,因而提高了系统的效率。
在一些实施例中,处理器110可以包括一个或多个接口。接口可以包括集成电路(inter-integrated circuit,I2C)接口,集成电路内置音频(inter-integrated circuit sound,I2S)接口,脉冲编码调制(pulse code modulation,PCM)接口,通用异步收发传输器(universal asynchronous receiver/transmitter,UART)接口,移动产业处理器接口(mobile industry processor interface,MIPI),通用输入输出(general-purpose input/output,GPIO)接口,用户标识模块(subscriber identity module,SIM)接口,和/或通用串行总线(universal serial bus,USB)接口等。
I2C接口是一种双向同步串行总线,包括一根串行数据线(serial data line,SDA)和一根串行时钟线(derail clock line,SCL)。在一些实施例中,处理器110可以包含多组I2C总线。处理器110可以通过不同的I2C总线接口分别耦合触摸传感器180K,充电器,闪光灯,摄像头193等。例如:处理器110可以通过I2C接口耦合触摸传感器180K,使处理器110与触摸传感器180K通过I2C总线接口通信,实现电子设备100的触摸功能。
I2S接口可以用于音频通信。在一些实施例中,处理器110可以包含多组I2S总线。处理器110可以通过I2S总线与音频模块170耦合,实现处理器110与音频模块170之间的通信。在一些实施例中,音频模块170可以通过I2S接口向无线通信模块160传递音频信号,实现 通过蓝牙耳机接听电话的功能。
PCM接口也可以用于音频通信,将模拟信号抽样,量化和编码。在一些实施例中,音频模块170与无线通信模块160可以通过PCM总线接口耦合。在一些实施例中,音频模块170也可以通过PCM接口向无线通信模块160传递音频信号,实现通过蓝牙耳机接听电话的功能。所述I2S接口和所述PCM接口都可以用于音频通信。
UART接口是一种通用串行数据总线,用于异步通信。该总线可以为双向通信总线。它将要传输的数据在串行通信与并行通信之间转换。在一些实施例中,UART接口通常被用于连接处理器110与无线通信模块160。例如:处理器110通过UART接口与无线通信模块160中的蓝牙模块通信,实现蓝牙功能。在一些实施例中,音频模块170可以通过UART接口向无线通信模块160传递音频信号,实现通过蓝牙耳机播放音乐的功能。
MIPI接口可以被用于连接处理器110与显示屏194,摄像头193等外围器件。MIPI接口包括摄像头串行接口(camera serial interface,CSI),显示屏串行接口(display serial interface,DSI)等。在一些实施例中,处理器110和摄像头193通过CSI接口通信,实现电子设备100的拍摄功能。处理器110和显示屏194通过DSI接口通信,实现电子设备100的显示功能。
GPIO接口可以通过软件配置。GPIO接口可以被配置为控制信号,也可被配置为数据信号。在一些实施例中,GPIO接口可以用于连接处理器110与摄像头193,显示屏194,无线通信模块160,音频模块170,传感器模块180等。GPIO接口还可以被配置为I2C接口,I2S接口,UART接口,MIPI接口等。
USB接口130是符合USB标准规范的接口,具体可以是Mini USB接口,Micro USB接口,USB Type C接口等。USB接口130可以用于连接充电器为电子设备100充电,也可以用于电子设备100与外围设备之间传输数据。也可以用于连接耳机,通过耳机播放音频。该接口还可以用于连接其他终端设备,例如AR设备等。
可以理解的是,本申请实施例示意的各模块间的接口连接关系,只是示意性说明,并不构成对电子设备100的结构限定。在本申请另一些实施例中,电子设备100也可以采用上述实施例中不同的接口连接方式,或多种接口连接方式的组合。
充电管理模块140用于从充电器接收充电输入。其中,充电器可以是无线充电器,也可以是有线充电器。在一些有线充电的实施例中,充电管理模块140可以通过USB接口130接收有线充电器的充电输入。在一些无线充电的实施例中,充电管理模块140可以通过电子设备100的无线充电线圈接收无线充电输入。充电管理模块140为电池142充电的同时,还可以通过电源管理模块141为电子设备100供电。
电源管理模块141用于连接电池142,充电管理模块140与处理器110。电源管理模块141接收电池142和/或充电管理模块140的输入,为处理器110,内部存储器121,外部存储器,显示屏194,摄像头193,和无线通信模块160等供电。电源管理模块141还可以用于监测电池容量,电池循环次数,电池健康状态(漏电,阻抗)等参数。在其他一些实施例中,电源管理模块141也可以设置于处理器110中。在另一些实施例中,电源管理模块141和充电管理模块140也可以设置于同一个器件中。
电子设备100的无线通信功能可以通过天线1,天线2,移动通信模块150,无线通信模块160,调制解调处理器以及基带处理器等实现。
天线1和天线2用于发射和接收电磁波信号。电子设备100中的每个天线可用于覆盖单个或多个通信频带。不同的天线还可以复用,以提高天线的利用率。例如:可以将天线1复用为无线局域网的分集天线。在另外一些实施例中,天线可以和调谐开关结合使用。
移动通信模块150可以提供应用在电子设备100上的包括2G/3G/4G/5G等无线通信的解决方案。移动通信模块150可以包括至少一个滤波器,开关,功率放大器,低噪声放大器(low noise amplifier,LNA)等。移动通信模块150可以由天线1接收电磁波,并对接收的电磁波进行滤波,放大等处理,传送至调制解调处理器进行解调。移动通信模块150还可以对经调制解调处理器调制后的信号放大,经天线1转为电磁波辐射出去。在一些实施例中,移动通信模块150的至少部分功能模块可以被设置于处理器110中。在一些实施例中,移动通信模块150的至少部分功能模块可以与处理器110的至少部分模块被设置在同一个器件中。
调制解调处理器可以包括调制器和解调器。其中,调制器用于将待发送的低频基带信号调制成中高频信号。解调器用于将接收的电磁波信号解调为低频基带信号。随后解调器将解调得到的低频基带信号传送至基带处理器处理。低频基带信号经基带处理器处理后,被传递给应用处理器。应用处理器通过音频设备(不限于扬声器170A,受话器170B等)输出声音信号,或通过显示屏194显示图像或视频。在一些实施例中,调制解调处理器可以是独立的器件。在另一些实施例中,调制解调处理器可以独立于处理器110,与移动通信模块150或其他功能模块设置在同一个器件中。
无线通信模块160可以提供应用在电子设备100上的包括无线局域网(wireless local area networks,WLAN)(如无线保真(wireless fidelity,Wi-Fi)网络),蓝牙(bluetooth,BT),全球导航卫星系统(global navigation satellite system,GNSS),调频(frequency modulation,FM),近距离无线通信技术(near field communication,NFC),红外技术(infrared,IR)等无线通信的解决方案。无线通信模块160可以是集成至少一个通信处理模块的一个或多个器件。无线通信模块160经由天线2接收电磁波,将电磁波信号调频以及滤波处理,将处理后的信号发送到处理器110。无线通信模块160还可以从处理器110接收待发送的信号,对其进行调频,放大,经天线2转为电磁波辐射出去。
在一些实施例中,电子设备100的天线1和移动通信模块150耦合,天线2和无线通信模块160耦合,使得电子设备100可以通过无线通信技术与网络以及其他设备通信。所述无线通信技术可以包括全球移动通讯系统(global system for mobile communications,GSM),通用分组无线服务(general packet radio service,GPRS),码分多址接入(code division multiple access,CDMA),宽带码分多址(wideband code division multiple access,WCDMA),时分码分多址(time-division code division multiple access,TD-SCDMA),长期演进(long term evolution,LTE),BT,GNSS,WLAN,NFC,FM,和/或IR技术等。所述GNSS可以包括全球卫星定位系统(global positioning system,GPS),全球导航卫星系统(global navigation satellite system,GLONASS),北斗卫星导航系统(beidou navigation satellite system,BDS),准天顶卫星系统(quasi-zenith satellite system,QZSS)和/或星基增强系统(satellite based augmentation systems,SBAS)。
电子设备100通过GPU,显示屏194,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏194和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器110可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。
显示屏194用于显示图像,视频等。显示屏194包括显示面板。显示面板可以采用液晶显示屏(liquid crystal display,LCD),有机发光二极管(organic light-emitting diode,OLED),有源矩阵有机发光二极体或主动矩阵有机发光二极体(active-matrix organic light emitting diode的,AMOLED),柔性发光二极管(flex light-emitting diode,FLED),Miniled,MicroLed,Micro-oLed,量子点发光二极管(quantum dot light emitting diodes,QLED)等。在一些实施例中,电子设备 100可以包括1个或N个显示屏194,N为大于1的正整数。
电子设备100可以通过ISP,摄像头193,视频编解码器,GPU,显示屏194以及应用处理器等实现拍摄功能。
ISP用于处理摄像头193反馈的数据。例如,拍照时,打开快门,光线通过镜头被传递到摄像头感光元件上,光信号转换为电信号,摄像头感光元件将所述电信号传递给ISP处理,转化为肉眼可见的图像。ISP还可以对图像的噪点,亮度,肤色进行算法优化。ISP还可以对拍摄场景的曝光,色温等参数优化。在一些实施例中,ISP可以设置在摄像头193中。
摄像头193用于捕获静态图像或视频。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。DSP将数字图像信号转换成标准的RGB,YUV等格式的图像信号。在一些实施例中,电子设备100可以包括1个或N个摄像头193,N为大于1的正整数。
数字信号处理器用于处理数字信号,除了可以处理数字图像信号,还可以处理其他数字信号。例如,当电子设备100在频点选择时,数字信号处理器用于对频点能量进行傅里叶变换等。
视频编解码器用于对数字视频压缩或解压缩。电子设备100可以支持一种或多种视频编解码器。这样,电子设备100可以播放或录制多种编码格式的视频,例如:动态图像专家组(moving picture experts group,MPEG)1,MPEG2,MPEG3,MPEG4等。
NPU为神经网络(neural-network,NN)计算处理器,通过借鉴生物神经网络结构,例如借鉴人脑神经元之间传递模式,对输入信息快速处理,还可以不断的自学习。通过NPU可以实现电子设备100的智能认知等应用,例如:图像识别,人脸识别,语音识别,文本理解等。
外部存储器接口120可以用于连接外部存储卡,例如Micro SD卡,实现扩展电子设备100的存储能力。外部存储卡通过外部存储器接口120与处理器110通信,实现数据存储功能。例如将音乐,视频等文件保存在外部存储卡中。
内部存储器121可以用于存储计算机可执行程序代码,所述可执行程序代码包括指令。处理器110通过运行存储在内部存储器121的指令,从而执行电子设备100的各种功能应用以及数据处理。内部存储器121可以包括存储程序区和存储数据区。其中,存储程序区可存储操作系统,至少一个功能所需的应用程序(比如声音播放功能,图像播放功能等)等。存储数据区可存储电子设备100使用过程中所创建的数据(比如音频数据,电话本等)等。此外,内部存储器121可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件,闪存器件,通用闪存存储器(universal flash storage,UFS)等。
电子设备100可以通过音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,以及应用处理器等实现音频功能。例如音乐播放,录音等。
音频模块170用于将数字音频信息转换成模拟音频信号输出,也用于将模拟音频输入转换为数字音频信号。音频模块170还可以用于对音频信号编码和解码。在一些实施例中,音频模块170可以设置于处理器110中,或将音频模块170的部分功能模块设置于处理器110中。
扬声器170A,也称“喇叭”,用于将音频电信号转换为声音信号。电子设备100可以通过扬声器170A收听音乐,或收听免提通话。
受话器170B,也称“听筒”,用于将音频电信号转换成声音信号。当电子设备100接听电 话或语音信息时,可以通过将受话器170B靠近人耳接听语音。
麦克风170C,也称“话筒”,“传声器”,用于将声音信号转换为电信号。当拨打电话或发送语音信息时,用户可以通过人嘴靠近麦克风170C发声,将声音信号输入到麦克风170C。电子设备100可以设置至少一个麦克风170C。在另一些实施例中,电子设备100可以设置两个麦克风170C,除了采集声音信号,还可以实现降噪功能。在另一些实施例中,电子设备100还可以设置三个,四个或更多麦克风170C,实现采集声音信号,降噪,还可以识别声音来源,实现定向录音功能等。
耳机接口170D用于连接有线耳机。耳机接口170D可以是USB接口130,也可以是3.5mm的开放移动终端设备平台(open mobile terminal platform,OMTP)标准接口,美国蜂窝电信工业协会(cellular telecommunications industry association of the USA,CTIA)标准接口。
压力传感器180A用于感受压力信号,可以将压力信号转换成电信号。在一些实施例中,压力传感器180A可以设置于显示屏194。压力传感器180A的种类很多,如电阻式压力传感器,电感式压力传感器,电容式压力传感器等。电容式压力传感器可以是包括至少两个具有导电材料的平行板。当有力作用于压力传感器180A,电极之间的电容改变。电子设备100根据电容的变化确定压力的强度。当有触摸操作作用于显示屏194,电子设备100根据压力传感器180A检测所述触摸操作强度。电子设备100也可以根据压力传感器180A的检测信号计算触摸的位置。在一些实施例中,作用于相同触摸位置,但不同触摸操作强度的触摸操作,可以对应不同的操作指令。例如:当有触摸操作强度小于第一压力阈值的触摸操作作用于短消息应用图标时,执行查看短消息的指令。当有触摸操作强度大于或等于第一压力阈值的触摸操作作用于短消息应用图标时,执行新建短消息的指令。
陀螺仪传感器180B可以用于确定电子设备100的运动姿态。在一些实施例中,可以通过陀螺仪传感器180B确定电子设备100围绕三个轴(即,x,y和z轴)的角速度。陀螺仪传感器180B可以用于拍摄防抖。示例性的,当按下快门,陀螺仪传感器180B检测电子设备100抖动的角度,根据角度计算出镜头模组需要补偿的距离,让镜头通过反向运动抵消电子设备100的抖动,实现防抖。陀螺仪传感器180B还可以用于导航,体感游戏场景。
气压传感器180C用于测量气压。在一些实施例中,电子设备100通过气压传感器180C测得的气压值计算海拔高度,辅助定位和导航。
磁传感器180D包括霍尔传感器。电子设备100可以利用磁传感器180D检测翻盖皮套的开合。在一些实施例中,当电子设备100是翻盖机时,电子设备100可以根据磁传感器180D检测翻盖的开合。进而根据检测到的皮套的开合状态或翻盖的开合状态,设置翻盖自动解锁等特性。
加速度传感器180E可检测电子设备100在各个方向上(一般为三轴)加速度的大小。当电子设备100静止时可检测出重力的大小及方向。还可以用于识别电子设备100姿态,应用于横竖屏切换,计步器等应用。
距离传感器180F,用于测量距离。电子设备100可以通过红外或激光测量距离。在一些实施例中,拍摄场景,电子设备100可以利用距离传感器180F测距以实现快速对焦。
接近光传感器180G可以包括例如发光二极管(LED)和光检测器,例如光电二极管。发光二极管可以是红外发光二极管。电子设备100通过发光二极管向外发射红外光。电子设备100使用光电二极管检测来自附近物体的红外反射光。当检测到充分的反射光时,可以确定电子设备100附近有物体。当检测到不充分的反射光时,电子设备100可以确定电子设备100附近没有物体。电子设备100可以利用接近光传感器180G检测用户手持电子设备100贴近耳 朵通话,以便自动熄灭屏幕达到省电的目的。接近光传感器180G也可用于皮套模式,口袋模式自动解锁与锁屏。
环境光传感器180L用于感知环境光亮度。电子设备100可以根据感知的环境光亮度自适应调节显示屏194亮度。环境光传感器180L也可用于拍照时自动调节白平衡。环境光传感器180L还可以与接近光传感器180G配合,检测电子设备100是否在口袋里,以防误触。
指纹传感器180H用于采集指纹。电子设备100可以利用采集的指纹特性实现指纹解锁,访问应用锁,指纹拍照,指纹接听来电等。
温度传感器180J用于检测温度。在一些实施例中,电子设备100利用温度传感器180J检测的温度,执行温度处理策略。例如,当温度传感器180J上报的温度超过阈值,电子设备100执行降低位于温度传感器180J附近的处理器的性能,以便降低功耗实施热保护。在另一些实施例中,当温度低于另一阈值时,电子设备100对电池142加热,以避免低温导致电子设备100异常关机。在其他一些实施例中,当温度低于又一阈值时,电子设备100对电池142的输出电压执行升压,以避免低温导致的异常关机。
触摸传感器180K,也称“触控面板”。触摸传感器180K可以设置于显示屏194,由触摸传感器180K与显示屏194组成触摸屏,也称“触控屏”。触摸传感器180K用于检测作用于其上或附近的触摸操作。触摸传感器可以将检测到的触摸操作传递给应用处理器,以确定触摸事件类型。可以通过显示屏194提供与触摸操作相关的视觉输出。在另一些实施例中,触摸传感器180K也可以设置于电子设备100的表面,与显示屏194所处的位置不同。
骨传导传感器180M可以获取振动信号。在一些实施例中,骨传导传感器180M可以获取人体声部振动骨块的振动信号。骨传导传感器180M也可以接触人体脉搏,接收血压跳动信号。在一些实施例中,骨传导传感器180M也可以设置于耳机中,结合成骨传导耳机。音频模块170可以基于所述骨传导传感器180M获取的声部振动骨块的振动信号,解析出语音信号,实现语音功能。应用处理器可以基于所述骨传导传感器180M获取的血压跳动信号解析心率信息。实现心率检测功能。
按键190包括开机键,音量键等。按键190可以是机械按键。也可以是触摸式按键。电子设备100可以接收按键输入,产生与电子设备100的用户设置以及功能控制有关的键信号输入。
马达191可以产生振动提示。马达191可以用于来电振动提示,也可以用于触摸振动反馈。例如,作用于不同应用(例如拍照,音频播放等)的触摸操作,可以对应不同的振动反馈效果。作用于显示屏194不同区域的触摸操作,马达191也可对应不同的振动反馈效果。不同的应用场景(例如:时间提醒,接收信息,闹钟,游戏等)也可以对应不同的振动反馈效果。触摸振动反馈效果还可以支持自定义。
指示器192可以是指示灯,可以用于指示充电状态,电量变化,也可以用于指示消息,未接来电,通知等。
SIM卡接口195用于连接SIM卡。SIM卡可以通过插入SIM卡接口195,或从SIM卡接口195拔出,实现和电子设备100的接触和分离。电子设备100可以支持1个或N个SIM卡接口,N为大于1的正整数。SIM卡接口195可以支持Nano SIM卡,Micro SIM卡,SIM卡等。同一个SIM卡接口195可以同时插入多张卡。所述多张卡的类型可以相同,也可以不同。SIM卡接口195也可以兼容不同类型的SIM卡。SIM卡接口195也可以兼容外部存储卡。电子设备100通过SIM卡和网络交互,实现通话以及数据通信等功能。在一些实施例中,电子设备100采用eSIM,即:嵌入式SIM卡。eSIM卡可以嵌在电子设备100中,不能和电子设 备100分离。
应当理解的是,图9所示电子设备100仅是一个范例,并且电子设备100可以具有比图9中所示的更多的或者更少的部件,可以组合两个或多个的部件,或者可以具有不同的部件配置。图9中所示出的各种部件可以在包括一个或多个信号处理和/或专用集成电路在内的硬件、软件、或硬件和软件的组合中实现。
下面介绍本申请实施例提供的一种电子设备100的软件结构。
图10示例性示出了本申请实施例中提供的一种电子设备100的软件结构。
如图10所示,电子设备100的软件系统可以采用分层架构,事件驱动架构,微核架构,微服务架构,或云架构。下面示例性说明电子设备100的软件结构。
分层架构将软件分成若干个层,每一层都有清晰的角色和分工。层与层之间通过软件接口通信。在一些实施例中,电子设备100的软件结构分为三层,从上至下分别为应用程序层,应用程序框架层,内核层。
应用程序层可以包括一系列应用程序包。
如图10所示,应用程序包可以包括相机,图库,日历,通话,地图,导航,WLAN,蓝牙,音乐,视频,短信息等应用程序。其中,视频可以是指本申请实施例提及的视频类应用程序。
应用程序框架层为应用程序层的应用程序提供应用编程接口(application programming interface,API)和编程框架。应用程序框架层包括一些预先定义的函数。
如图10所示,应用程序框架层可以包括窗口管理器,内容提供器,视图系统,电话管理器,资源管理器,通知管理器、视频处理系统等。
窗口管理器用于管理窗口程序。窗口管理器可以获取显示屏大小,判断是否有状态栏,锁定屏幕,截取屏幕等。
内容提供器用来存放和获取数据,并使这些数据可以被应用程序访问。所述数据可以包括视频,图像,音频,拨打和接听的电话,浏览历史和书签,电话簿等。
视图系统包括可视控件,例如显示文字的控件,显示图片的控件等。视图系统可用于构建应用程序。显示界面可以由一个或多个视图组成的。例如,包括短信通知图标的显示界面,可以包括显示文字的视图以及显示图片的视图。
电话管理器用于提供电子设备100的通信功能。例如通话状态的管理(包括接通,挂断等)。
资源管理器为应用程序提供各种资源,比如本地化字符串,图标,图片,布局文件,视频文件等等。
通知管理器使应用程序可以在状态栏中显示通知信息,可以用于传达告知类型的消息,可以短暂停留后自动消失,无需用户交互。比如通知管理器被用于告知下载完成,消息提醒等。通知管理器还可以是以图表或者滚动条文本形式出现在系统顶部状态栏的通知,例如后台运行的应用程序的通知,还可以是以对话窗口形式出现在屏幕上的通知。例如在状态栏提示文本信息,发出提示音,电子设备振动,指示灯闪烁等。
视频处理系统可以用于执行本申请实施例提供的字幕显示方法。视频处理系统可以包括字幕解码模块、视频帧色域解释模块、视频帧合成模块、视频帧队列、视频渲染模块,其中,每一个模块的具体功能可以参照前述实施例中的相关内容,在此不再赘述。
内核层是硬件和软件之间的层。内核层至少包含显示驱动,摄像头驱动,蓝牙驱动,传感器驱动。
下面结合捕获拍照场景,示例性说明电子设备100软件以及硬件的工作流程。
当触摸传感器180K接收到触摸操作,相应的硬件中断被发给内核层。内核层将触摸操作加工成原始输入事件(包括触摸坐标,触摸操作的时间戳等信息)。原始输入事件被存储在内核层。应用程序框架层从内核层获取原始输入事件,识别该输入事件所对应的控件。以该触摸操作是触摸单击操作,该单击操作所对应的控件为相机应用图标的控件为例,相机应用调用应用框架层的接口,启动相机应用,进而通过调用内核层启动摄像头驱动,通过摄像头193捕获静态图像或视频。
下面介绍本申请实施例提供的另一种电子设备100的结构。
图11示例性示出了本申请实施例中提供的另一种电子设备100的结构。
如图11所示,电子设备100可以包括:视频类应用程序1100和视频处理系统1110。
视频类应用程序1100可以是电子设备100上安装的系统应用程序(例如图2A所示的“视频”应用程序),也可以是电子设备100上安装的来自第三方提供的具有视频播放能力的应用程序,主要用于播放视频。
视频处理系统1110可以包括:视频解码模块1111、字幕解码模块1112、视频帧色域解释模块1113、视频帧合成模块1114、视频帧队列1115、视频渲染模块1116。
视频解码模块1111可以接收视频类应用程序1100发送的视频信息流,并对该视频信息流进行解码生成视频帧。
字幕解码模块1112可以接收视频类应用程序1100发送的字幕信息流,并对该字幕信息流进行解码生成字幕帧,并可以基于视频帧色域解释模块1113发送的蒙板参数生成带蒙板的字幕帧,从而可以提高字幕的辨识度。
视频帧色域解释模块1113可以对字幕辨识度进行分析，生成字幕辨识度分析结果，并基于字幕辨识度分析结果计算字幕对应的蒙板参数（蒙板的色值、透明度）。
视频帧合成模块1114可以对视频帧和字幕帧进行叠加合并,生成待显示的视频帧。
视频帧队列1115可以对视频帧合成模块1114发送的待显示的视频帧进行存储。
视频渲染模块1116可以对待显示的视频帧按照时间顺序进行渲染,生成渲染后的视频帧,并发送给视频类应用程序1100进行视频播放。
关于上述电子设备100的功能和工作原理的更多细节,可以参照上述各个实施例中的相关内容,在此不再赘述。
应当理解的是,图11所示的电子设备100仅仅是一个示例,并且电子设备100可以具有比图11中所示的更多的或者更少的部件,可以组合两个或多个的部件,或者可以具有不同的部件配置。图11中所示出的各种部件可以在硬件、软件、或硬件和软件的组合中实现。
以上模块可以根据功能进行划分,在实际的产品中,可以为同一软件模块执行的不同功能。
下面介绍本申请实施例提供的另一种电子设备100的结构。
图12示例性示出了本申请实施例中提供的另一种电子设备100的结构。
如图12所示,电子设备100可以包括:视频类应用程序1200,其中,视频类应用程序1200可以包括:视频解码模块1211、字幕解码模块1212、视频帧色域解释模块1213、视频帧合成模块1214、视频帧队列1215、视频渲染模块1216。
视频类应用程序1200可以是电子设备100上安装的系统应用程序(例如图2A所示的“视频”应用程序),也可以是电子设备100上安装的来自第三方提供的具有视频播放能力的应用程序,主要用于播放视频。
获取与显示模块1210可以获取视频信息流和字幕信息流,显示视频渲染模块1216发送的渲染后的视频帧等。
视频解码模块1211可以接收获取与显示模块1210发送的视频信息流,并对该视频信息流进行解码生成视频帧。
字幕解码模块1212可以接收获取与显示模块1210发送的字幕信息流,并对该字幕信息流进行解码生成字幕帧,并可以基于视频帧色域解释模块1213发送的蒙板参数生成带蒙板的字幕帧,从而可以提高字幕的辨识度。
视频帧色域解释模块1213可以对字幕辨识度进行分析，生成字幕辨识度分析结果，并基于字幕辨识度分析结果计算字幕对应的蒙板参数（蒙板的色值、透明度）。
视频帧合成模块1214可以对视频帧和字幕帧进行叠加合并,生成待显示的视频帧。
视频帧队列1215可以对视频帧合成模块1214发送的待显示的视频帧进行存储。
视频渲染模块1216可以对待显示的视频帧按照时间顺序进行渲染,生成渲染后的视频帧,并发送给获取与显示模块1210进行视频播放。
关于上述电子设备100的功能和工作原理的更多细节,可以参照上述各个实施例中的相关内容,在此不再赘述。
应当理解的是,图12所示的电子设备100仅仅是一个示例,并且电子设备100可以具有比图12中所示的更多的或者更少的部件,可以组合两个或多个的部件,或者可以具有不同的部件配置。图12中所示出的各种部件可以在硬件、软件、或硬件和软件的组合中实现。
以上实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的范围。

Claims (16)

  1. 一种字幕显示方法,其特征在于,所述方法包括:
    电子设备播放第一视频;
    所述电子设备显示第一界面时,所述第一界面包括第一画面和第一字幕,所述第一字幕以第一蒙板为背景悬浮显示于所述第一画面的第一区域之上,所述第一区域是所述第一字幕的显示位置对应的所述第一画面中的区域,其中,所述第一字幕的色值与所述第一区域的色值的差异值为第一数值;
    所述电子设备显示第二界面时,所述第二界面包括第二画面和所述第一字幕,所述第一字幕不显示蒙板,所述第一字幕悬浮显示于所述第二画面的第二区域之上,所述第二区域是所述第一字幕的显示位置对应的所述第二画面中的区域,其中,所述第一字幕的色值与所述第二区域的色值的差异值为第二数值,所述第二数值大于所述第一数值;
    其中,所述第一画面是所述第一视频中的一个画面,所述第二画面是所述第一视频中的另一个画面。
  2. 根据权利要求1所述的方法,其特征在于,在所述电子设备显示第一界面之前,所述方法还包括:
    所述电子设备获取第一视频文件和第一字幕文件,其中,所述第一视频文件和所述第一字幕文件携带的时间信息相同;
    所述电子设备基于所述第一视频文件生成第一视频帧,所述第一视频帧用于生成所述第一画面;
    所述电子设备基于所述第一字幕文件生成第一字幕帧,并在所述第一字幕帧中获取所述第一字幕的色值、显示位置,其中,所述第一字幕帧携带的时间信息与所述第一视频帧携带的时间信息相同;
    所述电子设备基于所述第一字幕的显示位置确定所述第一区域;
    所述电子设备基于所述第一字幕的色值或所述第一区域的色值生成所述第一蒙板;
    所述电子设备在所述第一字幕帧中将所述第一字幕叠加到所述第一蒙板之上生成第二字幕帧,并将所述第二字幕帧与所述第一视频帧进行合成。
  3. 根据权利要求2所述的方法,其特征在于,在所述电子设备基于所述第一字幕的色值或所述第一区域的色值生成所述第一蒙板之前,所述方法还包括:
    所述电子设备确定所述第一数值小于第一阈值。
  4. 根据权利要求3所述的方法,其特征在于,所述电子设备确定所述第一数值小于第一阈值,具体包括:
    所述电子设备将所述第一区域划分为N个第一子区域,其中,所述N为正整数;
    所述电子设备基于所述第一字幕的色值和所述N个第一子区域的色值确定所述第一数值小于所述第一阈值。
  5. 根据权利要求4所述的方法,其特征在于,所述电子设备基于所述第一字幕的色值或所述第一区域的色值生成所述第一蒙板,具体包括:
    所述电子设备基于所述第一字幕的色值或所述N个第一子区域的色值确定出一个所述第一蒙板的色值;
    所述电子设备基于所述第一蒙板的色值生成所述第一蒙板。
  6. 根据权利要求3所述的方法,其特征在于,所述电子设备确定所述第一数值小于第一阈值,具体包括:
    所述电子设备将所述第一区域划分为N个第一子区域,其中,所述N为正整数;
    所述电子设备基于相邻的所述第一子区域之间的色值的差异值,确定是否将相邻的所述第一子区域合并为第二子区域;
    当相邻的所述第一子区域之间的色值的差异值小于第二阈值时,所述电子设备将相邻的所述第一子区域合并为所述第二子区域;
    所述电子设备基于所述第一字幕的色值和所述第二子区域的色值确定所述第一数值小于所述第一阈值。
  7. 根据权利要求6所述的方法,其特征在于,所述第一区域包含M个所述第二子区域,所述M为正整数且小于等于所述N,所述第二子区域包括一个或多个所述第一子区域,每一个所述第二子区域包括的所述第一子区域的个数相同或不同。
  8. 根据权利要求7所述的方法,其特征在于,所述电子设备基于所述第一字幕的色值或所述第一区域的色值生成所述第一蒙板,具体包括:
    所述电子设备基于所述第一字幕的色值或M个所述第二子区域的色值依次计算M个第一子蒙板的色值;
    所述电子设备基于所述M个第一子蒙板的色值生成所述M个第一子蒙板,其中,所述M个第一子蒙板组合为所述第一蒙板。
  9. 根据权利要求1-8任一项所述的方法,其特征在于,所述方法还包括:
    所述电子设备显示第三界面时,所述第三界面包括第三画面和所述第一字幕,所述第一字幕至少包括第一部分和第二部分,所述第一部分显示第二子蒙板,所述第二部分显示第三子蒙板或不显示所述第三子蒙板,所述第二子蒙板的色值与所述第三子蒙板的色值不同。
  10. 根据权利要求1-9任一项所述的方法,其特征在于,所述第一蒙板的显示位置是基于所述第一字幕的显示位置确定的。
  11. 根据权利要求1-10任一项所述的方法,其特征在于,所述第一蒙板的色值与所述第一字幕的色值的差异值大于所述第一数值。
  12. 根据权利要求1-11任一项所述的方法,其特征在于,在所述第一画面和所述第二画面中,所述第一字幕的显示位置相对于所述电子设备的显示屏是不固定的或固定的,所述第一字幕是连续显示的一段文字或符号。
  13. 根据权利要求1-12任一项所述的方法,其特征在于,在所述电子设备显示第一界面 之前,所述方法还包括:
    所述电子设备将所述第一蒙板的透明度设置为小于100%。
  14. 根据权利要求1-13任一项所述的方法,其特征在于,在所述电子设备显示第二界面之前,所述方法还包括:
    所述电子设备基于所述第一字幕的色值或所述第二区域的色值生成第二蒙板,并将所述第一字幕叠加到所述第二蒙板之上,其中,所述第二蒙板的色值为预设色值,所述第二蒙板的透明度为100%;
    或,
    所述电子设备不生成所述第二蒙板。
  15. 一种电子设备,其特征在于,所述电子设备包括一个或多个处理器和一个或多个存储器;其中,所述一个或多个存储器与所述一个或多个处理器耦合,所述一个或多个存储器用于存储计算机程序代码,所述计算机程序代码包括计算机指令,当所述一个或多个处理器执行所述计算机指令时,使得所述电子设备执行如权利要求1-14中任一项所述的方法。
  16. 一种计算机存储介质,其特征在于,所述计算机存储介质存储有计算机程序,所述计算机程序包括程序指令,当所述程序指令在电子设备上运行时,使得所述电子设备执行如权利要求1-14中任一项所述的方法。
PCT/CN2022/095325 2021-06-30 2022-05-26 字幕显示方法及相关设备 WO2023273729A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110742392.9A CN115550714A (zh) 2021-06-30 2021-06-30 字幕显示方法及相关设备
CN202110742392.9 2021-06-30

Publications (1)

Publication Number Publication Date
WO2023273729A1 true WO2023273729A1 (zh) 2023-01-05

Family

ID=84689986

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/095325 WO2023273729A1 (zh) 2021-06-30 2022-05-26 字幕显示方法及相关设备

Country Status (2)

Country Link
CN (1) CN115550714A (zh)
WO (1) WO2023273729A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108200361A (zh) * 2018-02-07 2018-06-22 中译语通科技股份有限公司 一种基于环境感知技术的字幕背景处理方法、显示器
CN110022499A (zh) * 2018-01-10 2019-07-16 武汉斗鱼网络科技有限公司 一种直播弹幕颜色设置方法及装置
CN111614993A (zh) * 2020-04-30 2020-09-01 腾讯科技(深圳)有限公司 弹幕展示方法、装置、计算机设备及存储介质
CN112511890A (zh) * 2020-11-23 2021-03-16 维沃移动通信有限公司 视频图像处理方法、装置及电子设备
US20210158586A1 (en) * 2019-11-25 2021-05-27 International Business Machines Corporation Dynamic subtitle enhancement

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106254933B (zh) * 2016-08-08 2020-02-18 腾讯科技(深圳)有限公司 字幕提取方法及装置
CN109214999B (zh) * 2018-09-21 2021-01-22 阿里巴巴(中国)有限公司 一种视频字幕的消除方法及装置

Also Published As

Publication number Publication date
CN115550714A (zh) 2022-12-30

Similar Documents

Publication Publication Date Title
EP4084450A1 (en) Display method for foldable screen, and related apparatus
WO2020259452A1 (zh) 一种移动终端的全屏显示方法及设备
US11669242B2 (en) Screenshot method and electronic device
CN111669459B (zh) 键盘显示方法、电子设备和计算机可读存储介质
CN114895861A (zh) 一种消息处理的方法、相关装置及系统
CN113448382B (zh) 多屏幕显示电子设备和电子设备的多屏幕显示方法
WO2022262313A1 (zh) 基于画中画的图像处理方法、设备、存储介质和程序产品
CN114327127B (zh) 滑动丢帧检测的方法和装置
CN114089932B (zh) 多屏显示方法、装置、终端设备及存储介质
CN110248037B (zh) 一种身份证件扫描方法及装置
CN113891009B (zh) 曝光调整方法及相关设备
US20230168802A1 (en) Application Window Management Method, Terminal Device, and Computer-Readable Storage Medium
CN110727380A (zh) 一种消息提醒方法及电子设备
CN112930533A (zh) 一种电子设备的控制方法及电子设备
CN113935898A (zh) 图像处理方法、系统、电子设备及计算机可读存储介质
US11816494B2 (en) Foreground element display method and electronic device
CN112449101A (zh) 一种拍摄方法及电子设备
CN114756184A (zh) 协同显示方法、终端设备及计算机可读存储介质
CN113438366A (zh) 信息通知的交互方法、电子设备和存储介质
CN113542574A (zh) 变焦下的拍摄预览方法、终端、存储介质及电子设备
EP4206865A1 (en) Brush effect picture generation method, image editing method and device, and storage medium
CN113923372B (zh) 曝光调整方法及相关设备
CN113497888B (zh) 照片预览方法、电子设备和存储介质
WO2023273729A1 (zh) 字幕显示方法及相关设备
CN112527220B (zh) 一种电子设备显示方法及电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22831552

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2023580652

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE