WO2020244553A1 - Method, apparatus and electronic device for processing out-of-bounds subtitles - Google Patents

Method, apparatus and electronic device for processing out-of-bounds subtitles

Info

Publication number
WO2020244553A1
WO2020244553A1 (PCT/CN2020/094191; CN2020094191W)
Authority
WO
WIPO (PCT)
Prior art keywords
size
subtitles
video image
frame
bounds
Prior art date
Application number
PCT/CN2020/094191
Other languages
English (en)
French (fr)
Inventor
卢永晨
Original Assignee
北京字节跳动网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字节跳动网络技术有限公司 filed Critical 北京字节跳动网络技术有限公司
Priority to US 17/616,954 (granted as US11924520B2)
Priority to JP2021571922A (granted as JP7331146B2)
Publication of WO2020244553A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/4884: Data services, e.g. news ticker, for displaying subtitles
    • H04N 21/25825: Management of client data involving client display capabilities, e.g. screen resolution of a mobile phone
    • H04N 21/4312: Generation of visual interfaces for content selection or interaction involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H04N 21/435: Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • H04N 21/4355: Processing of additional data involving reformatting operations of additional data, e.g. HTML pages on a television screen
    • H04N 21/44008: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N 21/4402: Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N 21/4518: Management of client data or end-user data involving characteristics of one or more peripherals, e.g. peripheral type, software version, amount of memory available or display capabilities
    • H04N 21/8133: Monomedia components involving additional data specifically related to the content, e.g. biography of the actors in a movie, detailed information about an article seen in a video program

Definitions

  • The present disclosure relates to the field of image processing, and in particular to a method, an apparatus, and an electronic device for processing out-of-bounds subtitles.
  • Today's terminal devices, such as smartphones and tablet computers, have entertainment capabilities and can play multimedia files such as video and audio.
  • Current videos often contain subtitles, and the position of the subtitles is not fixed: they can be located anywhere in the video.
  • For example, a user records a video and plays it on a terminal device, but the size of the video does not match the screen size of the terminal, causing some subtitles to cross the boundary and fall outside the video picture, degrading the viewing effect.
  • Figure 1 shows an example of such out-of-bounds subtitles.
  • The video includes the subtitle "I am Chinese", but because the size of the video is larger than the screen of the terminal device, part of the subtitle cannot be displayed.
  • A method for processing out-of-bounds subtitles, including: obtaining size information of the display device of a terminal, wherein the size information indicates the size of the display device; establishing a safe zone according to the size information, wherein the safe zone is smaller than or equal to the size of the display device; in response to playing a video image in the terminal, extracting a video frame from the video image; intercepting the part of the video frame that exceeds the size of the safe zone to generate a composite frame; detecting whether the composite frame contains text; and,
  • if the composite frame contains text, determining that the subtitle in the video image is out of bounds.
  • the method further includes:
  • when it is determined that the subtitle in the video image is out of bounds, reducing the size of the subtitles into the safe zone.
  • the acquiring size information of the display device of the terminal includes:
  • obtaining the display attributes of the terminal, where the display attributes include the height and width of the display device.
  • Establishing a safe zone according to the size information, wherein the safe zone is smaller than or equal to the size of the display device, includes:
  • calculating the width of the safe zone according to a first percentage, wherein the first percentage indicates the percentage of the width of the safe zone to the width of the display device; and/or calculating the height of the safe zone according to a second percentage, wherein the second percentage indicates the percentage of the height of the safe zone to the height of the display device.
  • Extracting video frames from the video image in response to playing the video image in the terminal includes: randomly extracting at least one video frame from the video image, or extracting a specific video frame from the video image.
  • Intercepting the part of the video frame that exceeds the size of the safe zone to generate a composite frame includes:
  • calculating an interception distance according to the size of the video frame and the size of the safe zone; intercepting frame segments in the width direction and/or height direction of the video frame according to the interception distance; and combining the frame segments in the width direction and/or in the height direction to generate a composite frame.
  • the detecting whether the composite frame contains text includes:
  • the text judgment model is obtained through convolutional neural network training, wherein a training set with classification labels is input to the convolutional neural network, and the network is trained into the text judgment model under the supervision of its output results.
  • determining that the subtitle in the video image is out of bounds includes:
  • if the composite frame contains text, determining that the subtitle in the video image is out of bounds in the width direction and/or the height direction of the video image.
  • reducing the size of the subtitles to the safe area includes:
  • when it is determined that the subtitles in the video image are out of bounds, zooming the subtitles so that all of the subtitles are located in the safe zone; or zooming the video image so that all of the subtitles are located in the safe zone.
  • A processing device for out-of-bounds subtitles, including:
  • a size obtaining module configured to obtain size information of a display device of a terminal, wherein the size information indicates the size of the display device;
  • a safe zone establishing module configured to establish a safe zone according to the size information, wherein the safe zone is smaller than or equal to the size of the display device;
  • the video frame extraction module is configured to extract video frames in the video image in response to playing the video image in the terminal;
  • a frame synthesis module configured to intercept the part of the video frame that exceeds the size of the safe zone to generate a composite frame;
  • a text detection module configured to detect whether the composite frame contains text;
  • an out-of-bounds judgment module configured to determine that the subtitles in the video image are out of bounds if the composite frame contains text.
  • the device further includes:
  • the zoom module is used to reduce the size of the subtitle to the safe area when it is determined that the subtitle in the video image is out of bounds.
  • the size obtaining module further includes:
  • the display attribute acquisition module is used to acquire the display attributes of the terminal, and the display attributes include the height and width of the display device.
  • In some embodiments, the safe zone establishing module further includes:
  • a safe zone width calculation module configured to calculate the width of the safe zone according to a first percentage, wherein the first percentage indicates the percentage of the width of the safe zone to the width of the display device; and/or
  • a safe zone height calculation module configured to calculate the height of the safe zone according to a second percentage, wherein the second percentage indicates the percentage of the height of the safe zone to the height of the display device.
  • In some embodiments, the video frame extraction module is further configured to: in response to playing a video image in the terminal, randomly extract at least one video frame from the video image, or extract a specific video frame from the video image.
  • the frame synthesis module further includes:
  • an interception distance calculation module configured to calculate an interception distance according to the size of the video frame and the size of the safe zone;
  • a frame fragment interception module configured to intercept frame fragments in the width direction and/or height direction of the video frame according to the interception distance;
  • a synthesis module configured to combine the frame segments in the width direction to generate a composite frame, and/or combine the frame segments in the height direction to generate a composite frame.
  • the text detection module further includes:
  • An input module for inputting the composite frame into a text judgment model
  • the judgment module is used for judging whether the composite frame contains text according to the output of the text judgment model.
  • the text judgment model is obtained through convolutional neural network training, wherein a training set with classification labels is input to the convolutional neural network, and the network is trained into the text judgment model under the supervision of its output results.
  • In some embodiments, the out-of-bounds judgment module further includes:
  • the out-of-bounds type determination module is configured to determine that the subtitles in the video image are out of bounds in the width direction and/or height direction of the video image if the composite frame contains text.
  • the zoom module is further configured to: when it is determined that the subtitles in the video image are out of bounds, zoom the subtitles so that all the subtitles are located in the safe zone; or zoom the video image so that the subtitles are all within the safe zone.
  • An electronic device, comprising: a memory for storing non-transitory computer-readable instructions; and a processor for running the computer-readable instructions, such that when the instructions are executed, the processor implements the steps of any of the above-mentioned methods for processing out-of-bounds subtitles.
  • A computer-readable storage medium for storing non-transitory computer-readable instructions which, when executed by a computer, cause the computer to execute the steps in any of the above methods.
  • the present disclosure discloses a method, device and electronic equipment for processing subtitles out of bounds.
  • The method for processing out-of-bounds subtitles includes: obtaining size information of the display device of the terminal, wherein the size information indicates the size of the display device; establishing a safe zone according to the size information, wherein the safe zone is smaller than or equal to the size of the display device; in response to playing a video image in the terminal, extracting a video frame from the video image; intercepting the part of the video frame that exceeds the size of the safe zone to generate a composite frame; detecting whether the composite frame contains text; and, if the composite frame contains text, determining that the subtitle in the video image is out of bounds.
  • By setting a safe zone and determining whether the frame segments that exceed the safe zone contain text, the method for processing out-of-bounds subtitles in the embodiments of the present disclosure solves the current technical problem that the user has to determine manually whether subtitles are out of bounds.
  • Fig. 1 is a schematic diagram of a subtitle crossing the boundary of a display screen in the prior art;
  • Fig. 2 is a schematic flowchart of a method for processing out-of-bounds subtitles according to an embodiment of the present disclosure;
  • Fig. 3 is a schematic diagram of calculating the interception distance of a frame segment according to an embodiment of the present disclosure;
  • Fig. 4 is a schematic diagram of a composite frame according to an embodiment of the present disclosure;
  • Fig. 5 is a schematic flowchart of a method for processing out-of-bounds subtitles according to an embodiment of the present disclosure;
  • Fig. 6 is a schematic structural diagram of a processing device for out-of-bounds subtitles according to an embodiment of the present disclosure;
  • Fig. 7 is a schematic structural diagram of an electronic device provided according to an embodiment of the present disclosure.
  • The embodiments of the present disclosure provide a method for processing out-of-bounds subtitles.
  • The method for processing out-of-bounds subtitles provided in this embodiment can be executed by a computing device, which can be implemented as software or as a combination of software and hardware, and the computing device can be integrated in a server, a terminal device, or the like.
  • The method for processing out-of-bounds subtitles mainly includes the following steps S201 to S206:
  • Step S201 Obtain size information of the display device of the terminal, where the size information indicates the size of the display device;
  • the acquiring size information of the display device of the terminal includes acquiring the display attributes of the terminal, and the display attributes include the height and width of the display device.
  • the system information generally includes screen object attributes, which include the height and width of the smart phone’s screen.
  • the units are pixels.
  • For other types of terminal devices, the display attributes likewise exist in the system information and can be read from it, so the details are not repeated here.
  • Denote the acquired size information of the display device as N×M, where N is the width of the display device and M is the height of the display device, with N ≥ 1 and M ≥ 1.
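The size acquisition of step S201 can be sketched as follows (a minimal Python sketch; the shape of the `system_info` dictionary and the function names are illustrative assumptions, not part of this disclosure):

```python
from dataclasses import dataclass

@dataclass
class DisplaySize:
    """The acquired size information N x M of the display device, in pixels."""
    width: int   # N, N >= 1
    height: int  # M, M >= 1

def get_display_size(system_info: dict) -> DisplaySize:
    """Read the screen object attributes (height and width) from the
    terminal's system information, as the embodiment describes."""
    screen = system_info["screen"]
    return DisplaySize(width=int(screen["width"]), height=int(screen["height"]))
```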
  • Step S202 Establish a safe area according to the size information, wherein the safe area is smaller than or equal to the size of the display device;
  • Establishing a safe zone according to the size information, wherein the safe zone is smaller than or equal to the size of the display device, includes: calculating the width of the safe zone according to a first percentage, wherein the first percentage indicates the percentage of the width of the safe zone to the width of the display device; and/or calculating the height of the safe zone according to a second percentage, wherein the second percentage indicates the percentage of the height of the safe zone to the height of the display device.
  • the safe area defines the display area of the subtitles, so that the subtitles will not cross the boundary of the display device during display.
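The percentage-based calculation of step S202 can be sketched as follows (the 90% defaults are illustrative assumptions; the disclosure only requires that the safe zone be no larger than the display):

```python
def make_safe_zone(display_width, display_height,
                   first_percentage=0.9, second_percentage=0.9):
    """Compute the safe zone size: its width is first_percentage of the
    display width, and its height is second_percentage of the display height."""
    if not (0 < first_percentage <= 1 and 0 < second_percentage <= 1):
        raise ValueError("percentages must be in (0, 1]")
    return (int(display_width * first_percentage),
            int(display_height * second_percentage))
```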
  • Step S203 In response to playing the video image in the terminal, extract video frames in the video image
  • In some embodiments, extracting a video frame from the video image includes: in response to playing a video image in the terminal, randomly extracting at least one video frame from the video image, or extracting a specific video frame from the video image, wherein the specific video frame is a video frame with specific characteristics extracted using a preset method.
  • the extraction method of extracting video frames in the video image includes random extraction.
  • The random extraction may be randomly extracting several consecutive frames, randomly extracting several frames at a fixed interval, or randomly extracting several frames in sequence.
  • the random method is not limited, and any random extraction method can be applied to the present disclosure.
  • a specific video frame may also be extracted.
  • The specific video frame may be a video frame with specific characteristics extracted using a preset method; for example, a text recognition model is used to identify video frames that contain text, and the video frames with text are extracted from the video image.
  • Step S204: Intercept the part of the video frame that exceeds the size of the safe zone to generate a composite frame
  • Intercepting the part of the video frame that exceeds the size of the safe zone to generate a composite frame includes: calculating an interception distance according to the size of the video frame and the size of the safe zone; intercepting frame segments in the width direction and/or height direction of the video frame according to the interception distance; combining the frame segments in the width direction to generate a composite frame; and/or combining the frame segments in the height direction to generate a composite frame.
  • The interception distance may be calculated directly, by subtracting the width of the safe zone from the width of the video frame and subtracting the height of the safe zone from the height of the video frame.
  • Alternatively, the interception distance may be calculated based on a proportion of the result of subtracting the size of the safe zone from the size of the video frame.
  • For example, taking 80 as the maximum interception distance in the width direction and 60 as the maximum interception distance in the height direction, a proportion of each maximum, such as 50%, may be taken as the interception distance, giving an interception distance of 40 in the width direction and 30 in the height direction.
  • After the interception distance is obtained through the above steps, frame fragments are intercepted in the width direction and/or height direction of the video frame according to the interception distance, and the frame fragments in the width direction are combined to generate a composite frame; and/or the frame fragments in the height direction are combined to generate a composite frame.
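The interception and composition steps above can be sketched in Python (a simplified sketch in which a frame is a list of pixel rows; the 50% ratio follows the example above, and the function names are assumptions):

```python
def interception_distance(frame_size, safe_size, ratio=0.5):
    """Interception distance = a ratio of the amount by which the frame
    exceeds the safe zone (50% in the example above: 80 -> 40, 60 -> 30)."""
    return int(max(frame_size - safe_size, 0) * ratio)

def composite_frame_width(frame, dist):
    """Cut a strip of width `dist` off the left and the right edge of the
    frame (a list of pixel rows) and join them side by side.
    The height-direction composite is analogous, using top and bottom rows."""
    return [row[:dist] + row[len(row) - dist:] for row in frame]
```

If the frame is 100 pixels wide and the safe zone 20, the distance is 40, matching the worked example in the text.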
  • Fig. 4 shows a composite frame in the width direction, where the left frame fragment includes part of the character "I" and the right frame fragment includes part of another character of the subtitle. It is understandable that Fig. 4 only shows the composite frame in the width direction; the composite frame in the height direction is similar, except that the upper and lower frame fragments are synthesized, which is not repeated here. It is also understandable that although the frame fragments of the composite frame shown in Fig. 4 include text, in practice the frame fragments that generate the composite frame may not include text; this situation corresponds to the case where the subtitle does not cross the boundary, and is not repeated here.
  • Step S205 Detect whether the composite frame contains text
  • the detecting whether the composite frame contains text includes: inputting the composite frame into a text judgment model; and judging whether the composite frame contains text according to the output of the text judgment model.
  • The text judgment model is obtained by convolutional neural network training, wherein a training set with classification labels is input to the convolutional neural network, and the network is trained into the text judgment model under the supervision of its output results.
  • the pre-trained convolutional neural network is used to determine whether the synthesized frame contains text.
  • The convolutional neural network can be any variant form of convolutional neural network; there is no restriction here.
  • multiple images as shown in FIG. 4 are marked as containing text.
  • The pictures in the training set are input to the convolutional neural network and the result is output through a sigmoid function; the output result is compared with the label. If the output is correct, the parameters of the current convolutional neural network are kept; if it is incorrect, the error is fed back to the convolutional neural network, which adjusts its parameters. Pictures continue to be input and the above steps are repeated until parameters that fit all the pictures in the training set are trained; then the training ends and the text judgment model is formed.
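The supervised loop described above can be illustrated with a toy stand-in (a single logistic unit in plain Python in place of the convolutional neural network; purely illustrative, not the model of this disclosure):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_text_judge(samples, labels, epochs=500, lr=0.5):
    """Input labeled samples, compare the sigmoid output with the
    classification label, and adjust the parameters on each error,
    repeating until the parameters fit the training set."""
    n = len(samples[0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the cross-entropy loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def judge(w, b, x):
    """Model output: 1 means the frame contains text, 0 means it does not."""
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0
```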
  • the composite frame generated in step S204 is input into the text judgment model, and it is determined whether the composite frame contains text according to the output of the model.
  • If the model output is 1, the composite frame is considered to contain text; if the model output is 0, the composite frame is considered not to contain text.
  • Step S206 If the composite frame contains text, it is determined that the subtitle in the video image is out of bounds.
  • In some embodiments, determining that the subtitle in the video image is out of bounds includes: if the composite frame contains text, determining that the subtitle in the video image is out of bounds in the width direction and/or the height direction of the video image. In this step, if the result obtained in step S205 is that the composite frame contains text, it is determined that the subtitle in the image is out of bounds. Furthermore, by checking whether the composite frame is a composite frame in the width direction or in the height direction, it can be determined whether the subtitles cross the boundary in the width direction or in the height direction of the video image.
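The direction decision in this step amounts to checking which composite frame triggered the text detector; a sketch (function and value names are assumptions):

```python
def out_of_bounds_directions(width_composite_has_text: bool,
                             height_composite_has_text: bool):
    """Map the two text-detection results to the direction(s) in which
    the subtitle crosses the boundary of the video image."""
    directions = []
    if width_composite_has_text:
        directions.append("width")
    if height_composite_has_text:
        directions.append("height")
    return directions  # empty list: no out-of-bounds subtitle detected
```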
  • the above-mentioned method for processing subtitles out of bounds further includes:
  • Step S501 When it is determined that the subtitles in the video image are out of bounds, reduce the size of the subtitles to the safe area.
  • In some embodiments, reducing the size of the subtitles to the safe zone includes: when it is determined that the subtitles in the video image are out of bounds, zooming the subtitles so that the subtitles are all located in the safe zone; or zooming the video image so that the subtitles are all located in the safe zone.
  • This step is an automatic processing step after judging that the subtitle is out of bounds.
  • the subtitle will be reduced until the subtitle is in the safe area.
  • In one case, the subtitles are separated from the video image, that is, the subtitles are plug-in (external) subtitles, and their display position, font size, color, and so on can be configured through a configuration file.
  • In this case, the subtitles can be zoomed into the safe zone; the other case is to directly zoom the video.
  • In the other case, the subtitles and the video are combined together.
  • Here, the subtitles are part of the video image and cannot be zoomed separately.
  • Instead, the video image can be zoomed to the size of the safe zone.
  • In this way, the subtitles are guaranteed to be located in the safe zone, which solves the problem of out-of-bounds subtitles.
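Both remedies reduce to choosing a scale factor that brings the out-of-bounds content inside the safe zone; a sketch assuming uniform scaling (the function name is an assumption):

```python
def fit_scale(content_width, content_height, safe_width, safe_height):
    """Uniform scale factor that makes the content (the plug-in subtitle,
    or the whole video image when subtitles are burned in) fit the safe
    zone. A factor of 1.0 means the content already fits; nothing is zoomed."""
    return min(safe_width / content_width, safe_height / content_height, 1.0)
```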
  • the device embodiments of the present disclosure can be used to perform the steps implemented by the method embodiments of the present disclosure.
  • The embodiment of the present disclosure provides a processing device for out-of-bounds subtitles.
  • The device can execute the steps described in the above-mentioned embodiments of the method for processing out-of-bounds subtitles.
  • The device 600 mainly includes: a size acquisition module 601, a safe zone establishment module 602, a video frame extraction module 603, a frame synthesis module 604, a text detection module 605, and an out-of-bounds judgment module 606, wherein:
  • the size obtaining module 601 is configured to obtain size information of the display device of the terminal, where the size information indicates the size of the display device;
  • the safe zone establishment module 602 is configured to establish a safe zone according to the size information, wherein the safe zone is smaller than or equal to the size of the display device;
  • the video frame extraction module 603 is configured to extract video frames in the video image in response to the video image being played in the terminal;
  • the frame synthesis module 604 is configured to intercept the part of the video frame that exceeds the size of the safe zone to generate a composite frame;
  • the text detection module 605 is configured to detect whether the composite frame contains text;
  • the out-of-bounds judgment module 606 is configured to determine that the subtitles in the video image are out of bounds if the composite frame contains text.
  • the device 600 further includes:
  • the zoom module is configured to reduce the size of the subtitles so that they fit within the safe zone when it is determined that the subtitles in the video image are out of bounds.
  • the size obtaining module 601 further includes:
  • the display attribute acquisition module is used to acquire the display attributes of the terminal, and the display attributes include the height and width of the display device.
  • the safe zone establishment module 602 further includes:
  • a safe zone width calculation module configured to calculate the width of the safe zone according to a first percentage, wherein the first percentage indicates the percentage of the width of the safe zone to the width of the display device; and/or ,
  • the safe zone height calculation module is configured to calculate the height of the safe zone according to a second percentage, wherein the second percentage indicates the percentage of the height of the display device occupied by the height of the safe zone.
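As a rough illustration of the percentage-based safe zone computation these two modules describe, here is a minimal sketch; the 1080×1920 display size and the 90% values are hypothetical, not taken from this disclosure:

```python
def compute_safe_zone(display_w, display_h, first_pct=None, second_pct=None):
    """Return (safe_w, safe_h) for a display of size display_w x display_h.

    first_pct / second_pct are the percentages of the display width/height
    that the safe zone occupies. When one percentage is omitted, that
    dimension of the safe zone defaults to the full display dimension,
    matching the "and/or" wording of the text.
    """
    safe_w = display_w * first_pct // 100 if first_pct is not None else display_w
    safe_h = display_h * second_pct // 100 if second_pct is not None else display_h
    return safe_w, safe_h

# Hypothetical 1080x1920 display with a% = b% = 90.
print(compute_safe_zone(1080, 1920, 90, 90))          # (972, 1728)
# Only the width percentage given: height stays at the display height.
print(compute_safe_zone(1080, 1920, first_pct=90))    # (972, 1920)
```

The integer division is an implementation choice here; a real implementation might round instead.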
  • the video frame extraction module 603 is further configured to: in response to a video image being played in the terminal, randomly extract at least one video frame from the video image, or extract a specific video frame from the video image, wherein the specific video frame is a video frame with specific features extracted using a preset method.
  • the frame synthesis module 604 further includes:
  • an interception distance calculation module, configured to calculate an interception distance according to the size of the video frame and the size of the safe zone;
  • a frame fragment interception module, configured to intercept frame fragments in the width direction and/or height direction of the video frame according to the interception distance;
  • the synthesis module is configured to combine the frame segments in the width direction to generate a composite frame; and/or combine the frame segments in the height direction to generate a composite frame.
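A minimal sketch of the interception logic these modules describe, using the frame and safe-zone sizes from the FIG. 3 example (a 700×1080 frame and a 540×960 safe zone); the list-of-rows frame representation is an assumption made for illustration only:

```python
def intercept_distances(frame_w, frame_h, safe_w, safe_h):
    """Half of the overhang on each side, as in the FIG. 3 example:
    (700-540)/2 = 80 in width, (1080-960)/2 = 60 in height."""
    return (frame_w - safe_w) // 2, (frame_h - safe_h) // 2

def width_composite(frame, dx):
    """frame is a list of rows (lists of pixels). Join the left strip and
    the right strip of width dx into one composite frame, so that a text
    detector only needs to examine the out-of-zone regions."""
    return [row[:dx] + row[-dx:] for row in frame]

dx, dy = intercept_distances(700, 1080, 540, 960)
print(dx, dy)  # 80 60

# Tiny 4x6 'frame' of labeled pixels, dx = 1: keep the first and last column.
frame = [[(r, c) for c in range(6)] for r in range(4)]
comp = width_composite(frame, 1)
print(len(comp[0]))  # 2
```

A height-direction composite would analogously join the top and bottom strips of height dy.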
  • the text detection module 605 further includes:
  • an input module, configured to input the composite frame into the text judgment model;
  • the judgment module is used for judging whether the composite frame contains text according to the output of the text judgment model.
  • the text judgment model is obtained through convolutional neural network training, wherein a training set with classification labels is input to the convolutional neural network, and the convolutional neural network is trained into the text judgment model by supervising its output results.
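Training the convolutional network itself requires a deep-learning framework, but the decision rule applied to the model's sigmoid output can be sketched with the standard library alone; the 0.5 threshold is an assumption, since the disclosure only states that an output of 1 means text and 0 means no text:

```python
import math

def sigmoid(x):
    """Standard logistic function used as the model's output activation."""
    return 1.0 / (1.0 + math.exp(-x))

def contains_text(logit, threshold=0.5):
    """Map a raw model score to the 0/1 decision described in the text:
    1 -> the composite frame is judged to contain text (out of bounds),
    0 -> no text detected."""
    return 1 if sigmoid(logit) >= threshold else 0

print(contains_text(3.2))   # 1
print(contains_text(-2.7))  # 0
```

In the described pipeline, a result of 1 would trigger the out-of-bounds judgment module.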
  • the out-of-bounds judgment module 606 further includes:
  • the out-of-bounds type determination module is configured to determine that the subtitles in the video image are out of bounds in the width direction and/or height direction of the video image if the composite frame contains text.
  • the zoom module is further configured to: when it is determined that the subtitles in the video image are out of bounds, scale the subtitles so that all the subtitles are located within the safe zone; or scale the video image so that all the subtitles are located within the safe zone.
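One plausible realization of the "scale the video image" branch is a uniform scale factor that fits the whole frame inside the safe zone; the aspect-preserving min() choice is an assumption, since the disclosure only requires that the result lie within the safe zone:

```python
def fit_scale(frame_w, frame_h, safe_w, safe_h):
    """Largest uniform scale that makes the frame fit inside the safe zone."""
    return min(safe_w / frame_w, safe_h / frame_h)

def scaled_size(frame_w, frame_h, safe_w, safe_h):
    """New frame dimensions after applying the fit scale."""
    s = fit_scale(frame_w, frame_h, safe_w, safe_h)
    return round(frame_w * s), round(frame_h * s)

# FIG. 3 sizes: a 700x1080 frame scaled into a 540x960 safe zone.
print(scaled_size(700, 1080, 540, 960))  # (540, 833)
```

Because the scaled frame lies entirely within the safe zone, any burned-in subtitles necessarily lie within it as well.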
  • the device shown in FIG. 6 can execute the methods of the embodiments shown in FIG. 1 and FIG. 5.
  • parts that are not described in detail in this embodiment please refer to the related descriptions of the embodiments shown in FIG. 1 and FIG. 5.
  • For the implementation process and technical effects of this technical solution, refer to the descriptions in the embodiments shown in FIG. 1 and FIG. 5, which will not be repeated here.
  • FIG. 7 shows a schematic structural diagram of an electronic device 700 suitable for implementing embodiments of the present disclosure.
  • Electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (for example, car navigation terminals), as well as fixed terminals such as digital TVs and desktop computers.
  • the electronic device shown in FIG. 7 is only an example, and should not bring any limitation to the function and scope of use of the embodiments of the present disclosure.
  • the electronic device 700 may include a processing device (such as a central processing unit, a graphics processor, etc.) 701, which may perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage device 708 into a random access memory (RAM) 703.
  • the RAM 703 also stores various programs and data required for the operation of the electronic device 700.
  • the processing device 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704.
  • An input/output (I/O) interface 705 is also connected to the bus 704.
  • the following devices can be connected to the I/O interface 705: including input devices 706 such as touch screen, touch pad, keyboard, mouse, image sensor, microphone, accelerometer, gyroscope, etc.; including, for example, liquid crystal display (LCD), speakers, An output device 707 such as a vibrator; a storage device 708 such as a magnetic tape, a hard disk, etc.; and a communication device 709.
  • the communication device 709 may allow the electronic device 700 to perform wireless or wired communication with other devices to exchange data.
  • although FIG. 7 shows an electronic device 700 having various devices, it should be understood that it is not required to implement or include all of the illustrated devices; more or fewer devices may alternatively be implemented or included.
  • the process described above with reference to the flowchart can be implemented as a computer software program.
  • the embodiments of the present disclosure include a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program contains program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from the network through the communication device 709, or installed from the storage device 708, or installed from the ROM 702.
  • when the computer program is executed by the processing device 701, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are executed.
  • the aforementioned computer-readable medium in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two.
  • the computer-readable storage medium may be, for example, but not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, and a computer-readable program code is carried therein. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above.
  • the computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium.
  • the computer-readable signal medium may send, propagate, or transmit the program for use by or in combination with the instruction execution system, apparatus, or device.
  • the program code contained on the computer-readable medium can be transmitted by any suitable medium, including but not limited to: wire, optical cable, RF (Radio Frequency), etc., or any suitable combination of the above.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; or it may exist alone without being assembled into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs.
  • the electronic device obtains size information of a display device of a terminal, wherein the size information indicates the size of the display device; establishes a safe zone according to the size information, wherein the safe zone is smaller than or equal to the size of the display device; in response to a video image being played in the terminal, extracts video frames from the video image; intercepts the parts of the video frames that exceed the size of the safe zone to generate a composite frame; detects whether the composite frame contains text; and, if the composite frame contains text, determines that the subtitles in the video image are out of bounds.
  • the computer program code used to perform the operations of the present disclosure may be written in one or more programming languages or a combination thereof.
  • the above-mentioned programming languages include object-oriented programming languages, such as Java, Smalltalk, and C++, as well as conventional procedural programming languages, such as the "C" language or similar programming languages.
  • the program code can be executed entirely on the user's computer, partly on the user's computer, executed as an independent software package, partly on the user's computer and partly executed on a remote computer, or entirely executed on the remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • each block in the flowchart or block diagram can represent a module, program segment, or part of code, and the module, program segment, or part of code contains one or more executable instructions for realizing the specified logical function.
  • the functions marked in the block may also occur in a different order from the order marked in the drawings. For example, two blocks shown in succession can actually be executed substantially in parallel, or they can sometimes be executed in the reverse order, depending on the functions involved.
  • each block in the block diagram and/or flowchart, and combinations of blocks in the block diagram and/or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present disclosure may be implemented in software or in hardware, and the name of a unit does not in some cases constitute a limitation on the unit itself.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Graphics (AREA)
  • Studio Circuits (AREA)
  • Transforming Electric Information Into Light Information (AREA)
  • Image Analysis (AREA)
  • Controls And Circuits For Display Device (AREA)

Abstract

The present disclosure discloses a method, an apparatus and an electronic device for processing out-of-bounds subtitles. The method includes: obtaining size information of a display device of a terminal, wherein the size information indicates the size of the display device; establishing a safe zone according to the size information, wherein the safe zone is smaller than or equal to the size of the display device; in response to a video image being played in the terminal, extracting video frames from the video image; intercepting the parts of the video frames that exceed the size of the safe zone to generate a composite frame; detecting whether the composite frame contains text; and, if the composite frame contains text, determining that the subtitles in the video image are out of bounds. By setting a safe zone and judging whether the frame fragments beyond the safe zone contain text, the method of the embodiments of the present disclosure solves the current technical problem that users have to judge manually whether any subtitles are out of bounds.

Description

Method, apparatus and electronic device for processing out-of-bounds subtitles

Cross-Reference to Related Applications

This application claims priority to Chinese Patent Application No. 201910493548.7, filed on June 6, 2019 and entitled "Method, Apparatus and Electronic Device for Processing Out-of-Bounds Subtitles", the entire contents of which are incorporated herein by reference.

Technical Field

The present disclosure relates to the field of image processing, and in particular to a method, an apparatus and an electronic device for processing out-of-bounds subtitles.

Background

With the development of communication technology, various terminal devices, such as smartphones, tablet computers and notebook computers, play an increasingly important role in people's lives.

Today's terminal devices have entertainment capabilities: smartphones, tablet computers and the like can all play multimedia files such as video and audio. Videos today often carry subtitles, and the position of the subtitles is not fixed; they can be located anywhere in the video. A scenario exists in which a user records a video and then plays it on a terminal device, but the size of the video does not match the screen size of the terminal, so that part of the subtitles crosses the boundary into a position outside the screen, degrading the viewing experience. FIG. 1 shows an example of such out-of-bounds subtitles. In this example, the video includes the subtitle "我是中国人" ("I am Chinese"), but because the size of the video is larger than the size of the terminal device, and the terminal device cannot detect that the subtitle exceeds the screen, only part of the character "我" is displayed, which degrades the viewing experience. In current technical solutions, the user generally has to judge whether any subtitle is out of bounds, and then solve the problem by adjusting the screen resolution, the subtitle size or the video size, which is very inconvenient.
Summary

According to one aspect of the present disclosure, the following technical solution is provided:

A method for processing out-of-bounds subtitles, comprising:

obtaining size information of a display device of a terminal, wherein the size information indicates the size of the display device;

establishing a safe zone according to the size information, wherein the safe zone is smaller than or equal to the size of the display device;

in response to a video image being played in the terminal, extracting video frames from the video image;

intercepting parts of the video frames that exceed the size of the safe zone to generate a composite frame;

detecting whether the composite frame contains text;

if the composite frame contains text, determining that the subtitles in the video image are out of bounds.

Further, the method further comprises:

when it is determined that the subtitles in the video image are out of bounds, reducing the size of the subtitles so that they fit within the safe zone.

Further, the obtaining of the size information of the display device of the terminal, wherein the size information indicates the size of the display device, comprises:

obtaining display attributes of the terminal, the display attributes including the height and the width of the display device.

Further, the establishing of the safe zone according to the size information, wherein the safe zone is smaller than or equal to the size of the display device, comprises:

calculating the width of the safe zone according to a first percentage, wherein the first percentage indicates the percentage of the width of the display device occupied by the width of the safe zone; and/or,

calculating the height of the safe zone according to a second percentage, wherein the second percentage indicates the percentage of the height of the display device occupied by the height of the safe zone.

Further, the extracting of video frames from the video image in response to the video image being played in the terminal comprises:

in response to the video image being played in the terminal, randomly extracting at least one video frame from the video image, or extracting a specific video frame from the video image, wherein the specific video frame is a video frame with specific features extracted using a preset method.

Further, the intercepting of parts of the video frames that exceed the size of the safe zone to generate a composite frame comprises:

calculating an interception distance according to the size of the video frame and the size of the safe zone;

intercepting frame fragments in the width direction and/or the height direction of the video frame according to the interception distance;

combining the frame fragments in the width direction to generate a composite frame; and/or,

combining the frame fragments in the height direction to generate a composite frame.

Further, the detecting of whether the composite frame contains text comprises:

inputting the composite frame into a text judgment model;

judging whether the composite frame contains text according to the output of the text judgment model.

Further, the text judgment model is obtained through convolutional neural network training, wherein a training set with classification labels is input to the convolutional neural network, and the convolutional neural network is trained into the text judgment model by supervising its output results.

Further, the determining that the subtitles in the video image are out of bounds if the composite frame contains text comprises:

if the composite frame contains text, determining that the subtitles in the video image are out of bounds in the width direction and/or the height direction of the video image.

Further, the reducing of the size of the subtitles into the safe zone when it is determined that the subtitles in the video image are out of bounds comprises:

when it is determined that the subtitles in the video image are out of bounds, scaling the subtitles so that all of the subtitles are located within the safe zone; or scaling the video image so that all of the subtitles are located within the safe zone.
According to another aspect of the present disclosure, the following technical solution is also provided:

An apparatus for processing out-of-bounds subtitles, comprising:

a size acquisition module, configured to obtain size information of a display device of a terminal, wherein the size information indicates the size of the display device;

a safe zone establishment module, configured to establish a safe zone according to the size information, wherein the safe zone is smaller than or equal to the size of the display device;

a video frame extraction module, configured to extract video frames from a video image in response to the video image being played in the terminal;

a frame synthesis module, configured to intercept parts of the video frames that exceed the size of the safe zone to generate a composite frame;

a text detection module, configured to detect whether the composite frame contains text;

an out-of-bounds judgment module, configured to determine that the subtitles in the video image are out of bounds if the composite frame contains text.

Further, the apparatus further comprises:

a zoom module, configured to reduce the size of the subtitles so that they fit within the safe zone when it is determined that the subtitles in the video image are out of bounds.

Further, the size acquisition module further comprises:

a display attribute acquisition module, configured to obtain display attributes of the terminal, the display attributes including the height and the width of the display device.

Further, the safe zone establishment module further comprises:

a safe zone width calculation module, configured to calculate the width of the safe zone according to a first percentage, wherein the first percentage indicates the percentage of the width of the display device occupied by the width of the safe zone; and/or,

a safe zone height calculation module, configured to calculate the height of the safe zone according to a second percentage, wherein the second percentage indicates the percentage of the height of the display device occupied by the height of the safe zone.

Further, the video frame extraction module is further configured to:

in response to the video image being played in the terminal, randomly extract at least one video frame from the video image, or extract a specific video frame from the video image, wherein the specific video frame is a video frame with specific features extracted using a preset method.

Further, the frame synthesis module further comprises:

an interception distance calculation module, configured to calculate an interception distance according to the size of the video frame and the size of the safe zone;

a frame fragment interception module, configured to intercept frame fragments in the width direction and/or the height direction of the video frame according to the interception distance;

a synthesis module, configured to combine the frame fragments in the width direction to generate a composite frame, and/or combine the frame fragments in the height direction to generate a composite frame.

Further, the text detection module further comprises:

an input module, configured to input the composite frame into a text judgment model;

a judgment module, configured to judge whether the composite frame contains text according to the output of the text judgment model.

Further, the text judgment model is obtained through convolutional neural network training, wherein a training set with classification labels is input to the convolutional neural network, and the convolutional neural network is trained into the text judgment model by supervising its output results.

Further, the out-of-bounds judgment module further comprises:

an out-of-bounds type judgment module, configured to determine, if the composite frame contains text, that the subtitles in the video image are out of bounds in the width direction and/or the height direction of the video image.

Further, the zoom module is further configured to: when it is determined that the subtitles in the video image are out of bounds, scale the subtitles so that all of the subtitles are located within the safe zone; or scale the video image so that all of the subtitles are located within the safe zone.

According to yet another aspect of the present disclosure, the following technical solution is also provided:

An electronic device, comprising: a memory for storing non-transitory computer-readable instructions; and a processor for running the computer-readable instructions such that, when executing them, the processor implements the steps of any of the above methods for processing out-of-bounds subtitles.

According to yet another aspect of the present disclosure, the following technical solution is also provided:

A computer-readable storage medium for storing non-transitory computer-readable instructions which, when executed by a computer, cause the computer to perform the steps of any of the above methods.

The present disclosure discloses a method, an apparatus and an electronic device for processing out-of-bounds subtitles. The method comprises: obtaining size information of a display device of a terminal, wherein the size information indicates the size of the display device; establishing a safe zone according to the size information, wherein the safe zone is smaller than or equal to the size of the display device; in response to a video image being played in the terminal, extracting video frames from the video image; intercepting parts of the video frames that exceed the size of the safe zone to generate a composite frame; detecting whether the composite frame contains text; and, if the composite frame contains text, determining that the subtitles in the video image are out of bounds. By setting a safe zone and judging whether the frame fragments beyond the safe zone contain text, the method of the embodiments of the present disclosure solves the current technical problem that users have to judge manually whether any subtitles are out of bounds.

The above description is only an overview of the technical solutions of the present disclosure. In order that the technical means of the present disclosure may be understood more clearly and implemented in accordance with the contents of the specification, and in order to make the above and other objects, features and advantages of the present disclosure more apparent and comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief Description of the Drawings

FIG. 1 is a schematic diagram of subtitles crossing the boundary of a display screen in the prior art;

FIG. 2 is a schematic flowchart of a method for processing out-of-bounds subtitles according to an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of calculating the interception distance of a frame fragment according to an embodiment of the present disclosure;

FIG. 4 is a schematic diagram of a composite frame according to an embodiment of the present disclosure;

FIG. 5 is a schematic flowchart of a method for processing out-of-bounds subtitles according to an embodiment of the present disclosure;

FIG. 6 is a schematic structural diagram of an apparatus for processing out-of-bounds subtitles according to an embodiment of the present disclosure;

FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description

The embodiments of the present disclosure are described below through specific examples, and those skilled in the art can easily understand other advantages and effects of the present disclosure from the contents disclosed in this specification. Obviously, the described embodiments are only some of the embodiments of the present disclosure, not all of them. The present disclosure can also be implemented or applied through other different specific embodiments, and various details in this specification can also be modified or changed based on different viewpoints and applications without departing from the spirit of the present disclosure. It should be noted that, as long as there is no conflict, the following embodiments and the features in the embodiments can be combined with each other. Based on the embodiments of the present disclosure, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present disclosure.

It should be noted that various aspects of the embodiments within the scope of the appended claims are described below. It should be apparent that the aspects described herein can be embodied in a wide variety of forms, and any specific structure and/or function described herein is only illustrative. Based on the present disclosure, those skilled in the art should understand that one aspect described herein can be implemented independently of any other aspect, and that two or more of these aspects can be combined in various ways. For example, any number of the aspects set forth herein can be used to implement a device and/or practice a method. In addition, such a device can be implemented and/or such a method can be practiced using other structures and/or functionality in addition to one or more of the aspects set forth herein.

It should also be noted that the illustrations provided in the following embodiments only illustrate the basic concept of the present disclosure in a schematic manner. The drawings only show components related to the present disclosure rather than being drawn according to the number, shape and size of components in actual implementation; the type, quantity and proportion of each component can be changed arbitrarily in actual implementation, and the component layout may also be more complex.

In addition, in the following description, specific details are provided to facilitate a thorough understanding of the examples. However, those skilled in the art will understand that the aspects can be practiced without these specific details.
An embodiment of the present disclosure provides a method for processing out-of-bounds subtitles. The method provided in this embodiment can be executed by a computing device, which can be implemented as software or as a combination of software and hardware, and which can be integrated into a server, a terminal device, or the like. As shown in FIG. 2, the method mainly includes the following steps S201 to S206:

Step S201: obtain size information of a display device of a terminal, wherein the size information indicates the size of the display device.

In the present disclosure, the obtaining of the size information of the display device of the terminal includes: obtaining display attributes of the terminal, the display attributes including the height and the width of the display device. Specifically, for a smartphone, the system information generally includes a screen object attribute, which includes the height and the width of the smartphone's screen, both in pixels. For terminals such as ordinary mobile phones or tablet computers whose screen resolution is fixed, the screen object attribute can be regarded as a constant; for terminals such as ordinary desktop computers whose screen resolution can be adjusted, the display attribute also exists in the system information and can be read from it, which will not be repeated here. Here, the obtained size information of the display device can be denoted as N×M, where N is the width of the display device, M is the height of the display device, N ≥ 1, and M ≥ 1.

Step S202: establish a safe zone according to the size information, wherein the safe zone is smaller than or equal to the size of the display device.

In the present disclosure, the establishing of the safe zone according to the size information includes: calculating the width of the safe zone according to a first percentage, wherein the first percentage indicates the percentage of the width of the display device occupied by the width of the safe zone; and/or calculating the height of the safe zone according to a second percentage, wherein the second percentage indicates the percentage of the height of the display device occupied by the height of the safe zone. Specifically, the first percentage and the second percentage can be preset at a fixed storage location, or can be set in real time by receiving a user's setting command through the human-machine interaction interface of the terminal. Let the first percentage be a% and the second percentage be b%, where 0 < a ≤ 100 and 0 < b ≤ 100; then the width of the safe zone is n = N × a% and the height of the safe zone is m = M × b%. In this step, only the width or only the height of the safe zone may be calculated: when only the width is calculated, the height of the safe zone can be set directly equal to the height in the size information, and when only the height is calculated, the width of the safe zone can be set directly equal to the width in the size information.

It can be understood that other methods can also be used in this step to establish the safe zone, such as directly setting the safe zone to the same size as the display device, or directly setting an offset of the safe zone relative to the size of the display device, and so on, which will not be repeated here. The safe zone defines the display area of the subtitles, so that the subtitles do not cross the boundary of the display device when displayed.

Step S203: in response to a video image being played in the terminal, extract video frames from the video image.

In the present disclosure, this includes: in response to the video image being played in the terminal, randomly extracting at least one video frame from the video image, or extracting a specific video frame from the video image, wherein the specific video frame is a video frame with specific features extracted using a preset method. In this step, the way of extracting video frames includes random extraction; the random extraction can be randomly extracting several consecutive frames, randomly extracting several frames at fixed intervals, or sequentially extracting several random frames. The random method is not limited, and any random extraction method can be applied to the present disclosure. Alternatively, a specific video frame can be extracted in this step; the specific video frame can be a video frame with specific features extracted using a preset method, for example, a video frame containing text identified by a text recognition model can be extracted from the video image.

Step S204: intercept the parts of the video frames that exceed the size of the safe zone to generate a composite frame.

In the present disclosure, this includes: calculating an interception distance according to the size of the video frame and the size of the safe zone; intercepting frame fragments in the width direction and/or the height direction of the video frame according to the interception distance; combining the frame fragments in the width direction to generate a composite frame; and/or combining the frame fragments in the height direction to generate a composite frame. In this step, the interception distance can be calculated directly by subtracting the width of the safe zone from the width of the video frame and subtracting the height of the safe zone from the height of the video frame. Specifically, as shown in FIG. 3, if the size of the video frame 301 is 700×1080 and the size of the safe zone 302 is 540×960, then the interception distance 303 in the width direction can be calculated as (700−540)/2 = 80, and the interception distance 304 in the height direction as (1080−960)/2 = 60. Alternatively, the interception distance can be calculated by using the difference between the width of the video frame and the width of the safe zone as a threshold. Taking the example of FIG. 3 again, with 80 as the maximum interception distance in the width direction and 60 as the maximum interception distance in the height direction, the interception distances can be calculated as, for example, 50% of the respective maxima, giving an interception distance of 40 in the width direction and 30 in the height direction. After the interception distance is obtained through the above steps, frame fragments are intercepted in the width direction and/or the height direction of the video frame according to the interception distance, and the frame fragments in the width direction are combined to generate a composite frame, and/or the frame fragments in the height direction are combined to generate a composite frame. That is, the two frame fragments intercepted in the width direction are combined into one composite frame, and the two frame fragments intercepted in the height direction are combined into one composite frame. FIG. 4 shows a composite frame in the width direction, in which the left frame fragment includes part of the character "我" and the right frame fragment includes part of the character "人". It can be understood that FIG. 4 only shows the composite frame in the width direction; the composite frame in the height direction is similar, except that it is composed of the upper and lower frame fragments, which will not be repeated here. It can also be understood that, although the frame fragments of the composite frame shown in FIG. 4 include text, the frame fragments generating the composite frame may in fact contain no text; this corresponds to the case where the subtitles are not out of bounds, which will not be repeated here.

Step S205: detect whether the composite frame contains text.

In the present disclosure, the detecting of whether the composite frame contains text includes: inputting the composite frame into a text judgment model; and judging whether the composite frame contains text according to the output of the text judgment model. The text judgment model is obtained through convolutional neural network training, wherein a training set with classification labels is input to the convolutional neural network, and the convolutional neural network is trained into the text judgment model by supervising its output results. In this step, a pre-trained convolutional neural network is used to judge whether the composite frame contains text; the convolutional neural network can be any variant of a convolutional neural network, which is not limited here. When training the model, a training set first needs to be formed; the training set consists of labeled composite frame pictures, for example, multiple images such as the one shown in FIG. 4, labeled as containing text. The pictures in the training set are input into the convolutional neural network, the output is produced through a sigmoid function, and the output result is compared with the label. If the output is correct, the current parameters of the convolutional neural network are saved; if not, the error is fed back to the convolutional neural network so that it adjusts its parameters, and pictures continue to be input and the above steps are repeated until parameters that fit all the pictures in the training set are obtained. Training then ends and the text judgment model is formed. In this step, the composite frame generated in step S204 is input into the text judgment model, and whether the composite frame contains text is judged according to the output of the model. Optionally, when the model outputs 1, the composite frame is considered to contain text; when the model outputs 0, the composite frame is considered not to contain text.

It can be understood that the above embodiment of detecting whether the composite frame contains text is only an example; in fact, any method that can detect whether a picture contains text can be applied to the technical solution of the present disclosure, which will not be repeated here.

Step S206: if the composite frame contains text, determine that the subtitles in the video image are out of bounds.

In the present disclosure, this includes: if the composite frame contains text, determining that the subtitles in the video image are out of bounds in the width direction and/or the height direction of the video image. In this step, if the result obtained in step S205 is that the composite frame contains text, it is determined that the subtitles in the video image are out of bounds. Further, whether the subtitles are out of bounds in the width direction or in the height direction of the video image can be determined according to whether the composite frame is a composite frame in the width direction or a composite frame in the height direction.

The present disclosure discloses a method, an apparatus and an electronic device for processing out-of-bounds subtitles. The method comprises: obtaining size information of a display device of a terminal, wherein the size information indicates the size of the display device; establishing a safe zone according to the size information, wherein the safe zone is smaller than or equal to the size of the display device; in response to a video image being played in the terminal, extracting video frames from the video image; intercepting parts of the video frames that exceed the size of the safe zone to generate a composite frame; detecting whether the composite frame contains text; and, if the composite frame contains text, determining that the subtitles in the video image are out of bounds. By setting a safe zone and judging whether the frame fragments beyond the safe zone contain text, the method of the embodiments of the present disclosure solves the current technical problem that users have to judge manually whether any subtitles are out of bounds.
As shown in FIG. 5, the above method for processing out-of-bounds subtitles further includes:

Step S501: when it is determined that the subtitles in the video image are out of bounds, reduce the size of the subtitles so that they fit within the safe zone.

Specifically, this includes: when it is determined that the subtitles in the video image are out of bounds, scaling the subtitles so that all of the subtitles are located within the safe zone; or scaling the video image so that all of the subtitles are located within the safe zone. This step is an automatic processing step performed after the subtitles are determined to be out of bounds: when the subtitles are determined to be out of bounds, the subtitles are reduced until they are located within the safe zone. There are two ways to reduce the size of the subtitles. One is to scale the subtitles themselves directly. With this approach, the subtitles are generally separate from the video image, that is, external subtitles whose display position, font size, color and so on can be configured through a configuration file. Since the width and the height of the safe zone are known values, the subtitles can be scaled into the safe zone simply by configuring the display position and/or font size in the subtitle file according to the width and the height of the safe zone. The other is to scale the video directly. Sometimes the subtitles and the video are combined, so the subtitles are part of the video image and cannot be scaled separately. In this case, the video image can be scaled to the size of the safe zone; the subtitles are then necessarily located within the safe zone, which solves the problem of out-of-bounds subtitles.

It can be understood that the above two ways of reducing the size of the subtitles into the safe zone are only examples; other methods that can directly or indirectly scale the subtitles can also be applied to the present disclosure, which will not be repeated here.

Although the steps in the above method embodiments are described in the above order, those skilled in the art should understand that the steps in the embodiments of the present disclosure are not necessarily executed in that order; they can also be executed in other orders, such as reversed, in parallel or interleaved. Moreover, on the basis of the above steps, those skilled in the art can also add other steps. These obvious variations or equivalent substitutions should also be included in the protection scope of the present disclosure, which will not be repeated here.

The following are apparatus embodiments of the present disclosure, which can be used to perform the steps implemented by the method embodiments of the present disclosure. For ease of description, only the parts related to the embodiments of the present disclosure are shown; for specific technical details not disclosed, please refer to the method embodiments of the present disclosure.
An embodiment of the present disclosure provides an apparatus for processing out-of-bounds subtitles. The apparatus can perform the steps described in the above embodiments of the method for processing out-of-bounds subtitles. As shown in FIG. 6, the apparatus 600 mainly comprises: a size acquisition module 601, a safe zone establishment module 602, a video frame extraction module 603, a frame synthesis module 604, a text detection module 605 and an out-of-bounds judgment module 606, wherein:

the size acquisition module 601 is configured to obtain size information of a display device of a terminal, wherein the size information indicates the size of the display device;

the safe zone establishment module 602 is configured to establish a safe zone according to the size information, wherein the safe zone is smaller than or equal to the size of the display device;

the video frame extraction module 603 is configured to extract video frames from a video image in response to the video image being played in the terminal;

the frame synthesis module 604 is configured to intercept parts of the video frames that exceed the size of the safe zone to generate a composite frame;

the text detection module 605 is configured to detect whether the composite frame contains text;

the out-of-bounds judgment module 606 is configured to determine that the subtitles in the video image are out of bounds if the composite frame contains text.

Further, the apparatus 600 further comprises:

a zoom module, configured to reduce the size of the subtitles so that they fit within the safe zone when it is determined that the subtitles in the video image are out of bounds.

Further, the size acquisition module 601 further comprises:

a display attribute acquisition module, configured to obtain display attributes of the terminal, the display attributes including the height and the width of the display device.

Further, the safe zone establishment module 602 further comprises:

a safe zone width calculation module, configured to calculate the width of the safe zone according to a first percentage, wherein the first percentage indicates the percentage of the width of the display device occupied by the width of the safe zone; and/or,

a safe zone height calculation module, configured to calculate the height of the safe zone according to a second percentage, wherein the second percentage indicates the percentage of the height of the display device occupied by the height of the safe zone.

Further, the video frame extraction module 603 is further configured to:

in response to the video image being played in the terminal, randomly extract at least one video frame from the video image, or extract a specific video frame from the video image, wherein the specific video frame is a video frame with specific features extracted using a preset method.

Further, the frame synthesis module 604 further comprises:

an interception distance calculation module, configured to calculate an interception distance according to the size of the video frame and the size of the safe zone;

a frame fragment interception module, configured to intercept frame fragments in the width direction and/or the height direction of the video frame according to the interception distance;

a synthesis module, configured to combine the frame fragments in the width direction to generate a composite frame, and/or combine the frame fragments in the height direction to generate a composite frame.

Further, the text detection module 605 further comprises:

an input module, configured to input the composite frame into a text judgment model;

a judgment module, configured to judge whether the composite frame contains text according to the output of the text judgment model.

Further, the text judgment model is obtained through convolutional neural network training, wherein a training set with classification labels is input to the convolutional neural network, and the convolutional neural network is trained into the text judgment model by supervising its output results.

Further, the out-of-bounds judgment module 606 further comprises:

an out-of-bounds type judgment module, configured to determine, if the composite frame contains text, that the subtitles in the video image are out of bounds in the width direction and/or the height direction of the video image.

Further, the zoom module is further configured to: when it is determined that the subtitles in the video image are out of bounds, scale the subtitles so that all of the subtitles are located within the safe zone; or scale the video image so that all of the subtitles are located within the safe zone.

The apparatus shown in FIG. 6 can perform the methods of the embodiments shown in FIG. 1 and FIG. 5. For parts not described in detail in this embodiment, reference may be made to the related descriptions of the embodiments shown in FIG. 1 and FIG. 5. For the implementation process and technical effects of this technical solution, refer to the descriptions in the embodiments shown in FIG. 1 and FIG. 5, which will not be repeated here.
Referring now to FIG. 7, a schematic structural diagram of an electronic device 700 suitable for implementing embodiments of the present disclosure is shown. Electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players) and vehicle-mounted terminals (for example, car navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 7 is only an example and should not impose any limitation on the function and scope of use of the embodiments of the present disclosure.

As shown in FIG. 7, the electronic device 700 may include a processing device (such as a central processing unit, a graphics processor, etc.) 701, which may perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage device 708 into a random access memory (RAM) 703. The RAM 703 also stores various programs and data required for the operation of the electronic device 700. The processing device 701, the ROM 702 and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.

Generally, the following devices can be connected to the I/O interface 705: input devices 706 including, for example, a touch screen, touch pad, keyboard, mouse, image sensor, microphone, accelerometer, gyroscope, etc.; output devices 707 including, for example, a liquid crystal display (LCD), speaker, vibrator, etc.; storage devices 708 including, for example, a magnetic tape, hard disk, etc.; and a communication device 709. The communication device 709 may allow the electronic device 700 to perform wireless or wired communication with other devices to exchange data. Although FIG. 7 shows an electronic device 700 having various devices, it should be understood that it is not required to implement or include all of the illustrated devices; more or fewer devices may alternatively be implemented or included.

In particular, according to embodiments of the present disclosure, the process described above with reference to the flowchart can be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product, which includes a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 709, or installed from the storage device 708, or installed from the ROM 702. When the computer program is executed by the processing device 701, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are executed.

It should be noted that the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal can take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium; the computer-readable signal medium may send, propagate or transmit the program for use by or in combination with the instruction execution system, apparatus or device. The program code contained on the computer-readable medium can be transmitted by any suitable medium, including but not limited to a wire, an optical cable, RF (radio frequency), etc., or any suitable combination of the above.

The above computer-readable medium may be included in the above electronic device, or it may exist alone without being assembled into the electronic device.

The above computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is caused to: obtain size information of a display device of a terminal, wherein the size information indicates the size of the display device; establish a safe zone according to the size information, wherein the safe zone is smaller than or equal to the size of the display device; in response to a video image being played in the terminal, extract video frames from the video image; intercept parts of the video frames that exceed the size of the safe zone to generate a composite frame; detect whether the composite frame contains text; and, if the composite frame contains text, determine that the subtitles in the video image are out of bounds.

The computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof; the above programming languages include object-oriented programming languages, such as Java, Smalltalk and C++, as well as conventional procedural programming languages, such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the drawings illustrate the architecture, functions and operations of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagram can represent a module, a program segment, or part of code, and the module, program segment, or part of code contains one or more executable instructions for realizing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in a different order from the order marked in the drawings. For example, two blocks shown in succession can actually be executed substantially in parallel, and they can sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagram and/or flowchart, and combinations of blocks in the block diagram and/or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The units involved in the embodiments described in the present disclosure may be implemented in software or in hardware, and the name of a unit does not in some cases constitute a limitation on the unit itself.

The above description is only a preferred embodiment of the present disclosure and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the disclosure involved in the present disclosure is not limited to technical solutions formed by specific combinations of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features with similar functions disclosed in the present disclosure.

Claims (13)

  1. A method for processing out-of-bounds subtitles, comprising:
    obtaining size information of a display device of a terminal, wherein the size information indicates the size of the display device;
    establishing a safe zone according to the size information, wherein the safe zone is smaller than or equal to the size of the display device;
    in response to a video image being played in the terminal, extracting video frames from the video image;
    intercepting parts of the video frames that exceed the size of the safe zone to generate a composite frame;
    detecting whether the composite frame contains text;
    if the composite frame contains text, determining that the subtitles in the video image are out of bounds.
  2. The method for processing out-of-bounds subtitles according to claim 1, further comprising:
    when it is determined that the subtitles in the video image are out of bounds, reducing the size of the subtitles so that they fit within the safe zone.
  3. The method for processing out-of-bounds subtitles according to claim 1, wherein the obtaining of the size information of the display device of the terminal, wherein the size information indicates the size of the display device, comprises:
    obtaining display attributes of the terminal, the display attributes including the height and the width of the display device.
  4. The method for processing out-of-bounds subtitles according to claim 2, wherein the establishing of the safe zone according to the size information, wherein the safe zone is smaller than or equal to the size of the display device, comprises:
    calculating the width of the safe zone according to a first percentage, wherein the first percentage indicates the percentage of the width of the display device occupied by the width of the safe zone; and/or,
    calculating the height of the safe zone according to a second percentage, wherein the second percentage indicates the percentage of the height of the display device occupied by the height of the safe zone.
  5. The method for processing out-of-bounds subtitles according to claim 1, wherein the extracting of video frames from the video image in response to the video image being played in the terminal comprises:
    in response to the video image being played in the terminal, randomly extracting at least one video frame from the video image, or extracting a specific video frame from the video image, wherein the specific video frame is a video frame with specific features extracted using a preset method.
  6. The method for processing out-of-bounds subtitles according to claim 1, wherein the intercepting of parts of the video frames that exceed the size of the safe zone to generate a composite frame comprises:
    calculating an interception distance according to the size of the video frame and the size of the safe zone;
    intercepting frame fragments in the width direction and/or the height direction of the video frame according to the interception distance;
    combining the frame fragments in the width direction to generate a composite frame; and/or,
    combining the frame fragments in the height direction to generate a composite frame.
  7. The method for processing out-of-bounds subtitles according to claim 1, wherein the detecting of whether the composite frame contains text comprises:
    inputting the composite frame into a text judgment model;
    judging whether the composite frame contains text according to the output of the text judgment model.
  8. The method for processing out-of-bounds subtitles according to claim 7, wherein
    the text judgment model is obtained through convolutional neural network training, wherein a training set with classification labels is input to the convolutional neural network, and the convolutional neural network is trained into the text judgment model by supervising its output results.
  9. The method for processing out-of-bounds subtitles according to claim 6, wherein the determining that the subtitles in the video image are out of bounds if the composite frame contains text comprises:
    if the composite frame contains text, determining that the subtitles in the video image are out of bounds in the width direction and/or the height direction of the video image.
  10. The method for processing out-of-bounds subtitles according to claim 2, wherein the reducing of the size of the subtitles into the safe zone when it is determined that the subtitles in the video image are out of bounds comprises:
    when it is determined that the subtitles in the video image are out of bounds, scaling the subtitles so that all of the subtitles are located within the safe zone; or scaling the video image so that all of the subtitles are located within the safe zone.
  11. An apparatus for processing out-of-bounds subtitles, comprising:
    a size acquisition module, configured to obtain size information of a display device of a terminal, wherein the size information indicates the size of the display device;
    a safe zone establishment module, configured to establish a safe zone according to the size information, wherein the safe zone is smaller than or equal to the size of the display device;
    a video frame extraction module, configured to extract video frames from a video image in response to the video image being played in the terminal;
    a frame synthesis module, configured to intercept parts of the video frames that exceed the size of the safe zone to generate a composite frame;
    a text detection module, configured to detect whether the composite frame contains text;
    an out-of-bounds judgment module, configured to determine that the subtitles in the video image are out of bounds if the composite frame contains text.
  12. An electronic device, comprising:
    a memory for storing computer-readable instructions; and
    a processor for running the computer-readable instructions such that, when running, the processor implements the method for processing out-of-bounds subtitles according to any one of claims 1-10.
  13. A non-transitory computer-readable storage medium for storing computer-readable instructions which, when executed by a computer, cause the computer to perform the method for processing out-of-bounds subtitles according to any one of claims 1-10.
PCT/CN2020/094191 2019-06-06 2020-06-03 Method, apparatus and electronic device for processing out-of-bounds subtitles WO2020244553A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US17/616,954 US11924520B2 (en) 2019-06-06 2020-06-03 Subtitle border-crossing processing method and apparatus, and electronic device
JP2021571922A JP7331146B2 (ja) 2019-06-06 2020-06-03 サブタイトルのクロスボーダーの処理方法、装置及び電子装置

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910493548.7 2019-06-06
CN201910493548.7A CN110177295B (zh) 2019-06-06 2019-08-27 Method, apparatus and electronic device for processing out-of-bounds subtitles

Publications (1)

Publication Number Publication Date
WO2020244553A1 true WO2020244553A1 (zh) 2020-12-10

Family

ID=67698044

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/094191 WO2020244553A1 (zh) 2019-06-06 2020-06-03 字幕越界的处理方法、装置和电子设备

Country Status (4)

Country Link
US (1) US11924520B2 (zh)
JP (1) JP7331146B2 (zh)
CN (1) CN110177295B (zh)
WO (1) WO2020244553A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110177295B (zh) * 2019-06-06 2021-06-22 北京字节跳动网络技术有限公司 Method, apparatus and electronic device for processing out-of-bounds subtitles
CN111225288A (zh) * 2020-01-21 2020-06-02 北京字节跳动网络技术有限公司 Method, apparatus and electronic device for displaying subtitle information
CN111414494A (zh) * 2020-02-17 2020-07-14 北京达佳互联信息技术有限公司 Method and apparatus for displaying a multimedia work, electronic device and storage medium
CN112738629B (zh) * 2020-12-29 2023-03-10 北京达佳互联信息技术有限公司 Video display method and apparatus, electronic device and storage medium
CN114302211B (zh) * 2021-12-29 2023-08-01 北京百度网讯科技有限公司 Video playing method and apparatus, and electronic device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102111601A (zh) * 2009-12-23 2011-06-29 大猩猩科技股份有限公司 Content-adaptive multimedia processing system and processing method
US20160029016A1 (en) * 2014-07-28 2016-01-28 Samsung Electronics Co., Ltd. Video display method and user terminal for generating subtitles based on ambient noise
CN106210838A (zh) * 2016-07-14 2016-12-07 腾讯科技(深圳)有限公司 Subtitle display method and device
CN109743613A (zh) * 2018-12-29 2019-05-10 腾讯音乐娱乐科技(深圳)有限公司 Subtitle processing method and apparatus, terminal and storage medium
CN110177295A (zh) * 2019-06-06 2019-08-27 北京字节跳动网络技术有限公司 Method, apparatus and electronic device for processing out-of-bounds subtitles

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4552426B2 (ja) 2003-11-28 2010-09-29 カシオ計算機株式会社 Display control device and program for display control processing
CN101064177A (zh) * 2006-04-26 2007-10-31 松下电器产业株式会社 Subtitle display control device
US8346049B2 (en) * 2007-05-21 2013-01-01 Casio Hitachi Mobile Communications Co., Ltd. Captioned video playback apparatus and recording medium
JP2009216815A (ja) 2008-03-07 2009-09-24 Sanyo Electric Co Ltd Projection type video display device
CN101668132A (zh) * 2008-09-02 2010-03-10 华为技术有限公司 Method and system for subtitle matching processing
CN102082930B (zh) * 2009-11-30 2015-09-30 新奥特(北京)视频技术有限公司 Method and device for subtitle text replacement
CN102082931A (zh) * 2009-11-30 2011-06-01 新奥特(北京)视频技术有限公司 Method and device for adaptively adjusting the subtitle area
CN102088571B (zh) * 2009-12-07 2012-11-21 联想(北京)有限公司 Subtitle display method and terminal device
JP2013040976A (ja) 2009-12-11 2013-02-28 Panasonic Corp Image display device and image display method
JP5930363B2 (ja) * 2011-11-21 2016-06-08 株式会社ソニー・インタラクティブエンタテインメント Portable information device and content display method
JP6089454B2 (ja) 2012-06-07 2017-03-08 株式会社リコー Image distribution device, display device and image distribution system
WO2015002442A1 (ko) * 2013-07-02 2015-01-08 엘지전자 주식회사 Method and apparatus for processing a 3D image including an additional object in a system providing a multi-view image
CN103700360A (zh) * 2013-12-09 2014-04-02 乐视致新电子科技(天津)有限公司 Screen display ratio adjustment method and electronic device
KR102227088B1 (ko) * 2014-08-11 2021-03-12 엘지전자 주식회사 Electronic device and control method thereof
EP3297274A4 (en) * 2015-05-14 2018-10-10 LG Electronics Inc. Display device and operation method therefor
US10019412B2 (en) * 2016-03-22 2018-07-10 Verizon Patent And Licensing Inc. Dissociative view of content types to improve user experience
CN106657965A (zh) * 2016-12-13 2017-05-10 奇酷互联网络科技(深圳)有限公司 Method, device and user terminal for identifying 3D-format video
JP6671613B2 (ja) 2017-03-15 2020-03-25 ソフネック株式会社 Character recognition method and computer program
CN108769821B (zh) * 2018-05-25 2019-03-29 广州虎牙信息科技有限公司 Game scene description method, apparatus, device and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102111601A (zh) * 2009-12-23 2011-06-29 大猩猩科技股份有限公司 Content-adaptive multimedia processing system and processing method
US20160029016A1 (en) * 2014-07-28 2016-01-28 Samsung Electronics Co., Ltd. Video display method and user terminal for generating subtitles based on ambient noise
CN106210838A (zh) * 2016-07-14 2016-12-07 腾讯科技(深圳)有限公司 Subtitle display method and device
CN109743613A (zh) * 2018-12-29 2019-05-10 腾讯音乐娱乐科技(深圳)有限公司 Subtitle processing method and apparatus, terminal and storage medium
CN110177295A (zh) * 2019-06-06 2019-08-27 北京字节跳动网络技术有限公司 Method, apparatus and electronic device for processing out-of-bounds subtitles

Also Published As

Publication number Publication date
US11924520B2 (en) 2024-03-05
US20220248102A1 (en) 2022-08-04
JP7331146B2 (ja) 2023-08-22
JP2022535549A (ja) 2022-08-09
CN110177295A (zh) 2019-08-27
CN110177295B (zh) 2021-06-22

Similar Documents

Publication Publication Date Title
WO2020244553A1 (zh) Method, apparatus and electronic device for processing out-of-bounds subtitles
US20190208230A1 (en) Live video broadcast method, live broadcast device and storage medium
CN108024079B (zh) Screen recording method, device, terminal and storage medium
WO2021008223A1 (zh) Information determination method and apparatus, and electronic device
WO2020253766A1 (zh) Picture generation method and apparatus, electronic device and storage medium
WO2020220809A1 (zh) Action recognition method and apparatus for a target object, and electronic device
WO2021160143A1 (zh) Method and apparatus for displaying video, electronic device and medium
WO2016192325A1 (zh) Method and device for processing the identification of a video file
CN104869305B (zh) Method and apparatus for processing image data
WO2021082639A1 (zh) Method and apparatus for operating a user interface, electronic device and storage medium
WO2020151491A1 (zh) Image deformation control method and device, and hardware device
WO2021147461A1 (zh) Method and apparatus for displaying subtitle information, electronic device and computer-readable medium
US12019669B2 (en) Method, apparatus, device, readable storage medium and product for media content processing
US20220159197A1 (en) Image special effect processing method and apparatus, and electronic device and computer readable storage medium
WO2021143273A1 (zh) Live stream sampling method and apparatus, and electronic device
US20230316529A1 (en) Image processing method and apparatus, device and storage medium
WO2021031847A1 (zh) Image processing method and apparatus, electronic device and computer-readable storage medium
WO2021027547A1 (zh) Image special effect processing method and apparatus, electronic device and computer-readable storage medium
US20170161871A1 (en) Method and electronic device for previewing picture on intelligent terminal
US20230185444A1 (en) Multimedia information playback and apparatus, electronic device, and computer storage medium
WO2021073204A1 (zh) Object display method and apparatus, electronic device and computer-readable storage medium
WO2021139634A1 (zh) Material display method and apparatus, terminal and storage medium
WO2021027632A1 (zh) Image special effect processing method and apparatus, electronic device and computer-readable storage medium
US12022162B2 (en) Voice processing method and apparatus, electronic device, and computer readable storage medium
CN117319736A (zh) Video processing method and apparatus, electronic device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20818250

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021571922

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20818250

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 24.03.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 20818250

Country of ref document: EP

Kind code of ref document: A1