WO2023071349A1 - 显示设备 - Google Patents

显示设备 Download PDF

Info

Publication number
WO2023071349A1
WO2023071349A1 · PCT/CN2022/109162 · CN2022109162W
Authority
WO
WIPO (PCT)
Prior art keywords
subtitle
content
target
information
display device
Prior art date
Application number
PCT/CN2022/109162
Other languages
English (en)
French (fr)
Inventor
陆世明
段宝山
Original Assignee
海信视像科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202111255290.0A external-priority patent/CN113992960B/zh
Priority claimed from CN202111280246.5A external-priority patent/CN114007145A/zh
Application filed by 海信视像科技股份有限公司 filed Critical 海信视像科技股份有限公司
Priority to CN202280063352.4A priority Critical patent/CN118104241A/zh
Publication of WO2023071349A1 publication Critical patent/WO2023071349A1/zh

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/435Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/485End-user interface for client configuration
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/488Data services, e.g. news ticker

Definitions

  • the present application relates to the field of display devices, in particular to a display device.
  • subtitle information is generally displayed on the video screen, and the subtitle information is synchronized with the human voice and dialogue of the characters in the video screen.
  • Displaying the subtitle information serves, on the one hand, as a language translation function, converting the spoken dialogue into the user's habitual language and text; on the other hand, it assists hearing-impaired users so that they can understand the content of the video playback.
  • Conventional subtitle information is displayed sentence by sentence at a fixed position (such as the bottom) of the video picture, but this subtitle display method is not friendly to users with hearing impairments. For example, in a video where multiple people speak at the same time, the hearing-impaired cannot tell which character the current subtitle corresponds to.
  • An embodiment of the present application provides a display device, including: a display for displaying video and its subtitle information; a communicator for communicating with a resource server; and a controller configured to: receive video data and subtitle data synchronously sent by the resource server, the subtitle data including subtitle information, time information, and target information for indicating the sounding object of the subtitle information; calculate, according to the video data and the target information, the relative position and size information of the sounding object in the video picture; and control, according to the time information and the relative position and size information of the sounding object in the video picture, the display to display the subtitle information on the video picture.
  • An embodiment of the present application also provides a method for a display device. The method includes: receiving video data and subtitle data synchronously sent by a resource server, where the subtitle data includes subtitle information, time information, and target information for indicating the sounding object of the subtitle information; calculating, according to the video data and the target information, the relative position and size information of the sounding object in the video picture; and displaying, according to the time information and the relative position and size information of the sounding object in the video picture, the subtitle information on the video picture.
  • Figure 1 shows the usage scenario of the display device
  • FIG. 2 is a hardware configuration block diagram of the control device 100
  • FIG. 3 is a block diagram of a hardware configuration of a display device 200
  • FIG. 4 is a software configuration diagram in the display device 200
  • Fig. 5(a) is a display effect diagram of subtitle information 1 and 2 when a man and a woman speak at the same time before improvement;
  • Figure 5(b) is a display effect diagram of subtitle information 3 when a woman speaks alone before improvement
  • Fig. 5(c) is a display effect diagram of subtitle information 4 when switching to a man speaking alone before improvement
  • Fig. 5(d) is a display effect diagram of the video screen when the man and the woman go out before the improvement
  • Fig. 6(a) is the display effect diagram of subtitle information 1 and 2 when men and women speak at the same time after improvement
  • Figure 6(b) is a display effect diagram of subtitle information 3 when the woman speaks alone after improvement;
  • Fig. 6(c) is a display effect diagram of subtitle information 4 when switching to a man speaking alone after improvement
  • Figure 6(d) is a display effect diagram of subtitle information 5 when men and women go out after improvement
  • Fig. 7 is a flow chart of a subtitle display method
  • FIG. 8 is a schematic diagram of a logical architecture of subtitle display
  • FIG. 9 is a schematic diagram of playing video content with subtitle content on a display device 200 according to some embodiments.
  • FIG. 10 is a schematic diagram of a menu page with a subtitle property page entry displayed on the user interface of the display device 200 according to some embodiments;
  • FIG. 11 is a schematic diagram of a subtitle property page displayed on a user interface of the display device 200 according to some embodiments.
  • FIG. 12 is a schematic diagram of a subtitle encoding page on a user interface of a display device 200 according to some embodiments.
  • FIG. 13 is a schematic diagram of a font size page on a user interface of the display device 200 according to some embodiments.
  • FIG. 14 is a schematic diagram of a content color page on a user interface of the display device 200 according to some embodiments.
  • FIG. 15 is a schematic diagram of a background color page on a user interface of the display device 200 according to some embodiments.
  • Fig. 16 is a schematic diagram of an interaction process between a display device 200 and a user according to some embodiments
  • FIG. 17 is a flow chart of a method for previewing subtitles on a display device according to some embodiments.
  • Fig. 18 is another schematic diagram of the interaction process between the display device 200 and the user according to some embodiments.
  • FIG. 19 is a schematic diagram of a user interface for pausing playing video content on a display device 200 according to some embodiments.
  • FIG. 20 is a schematic diagram of a position of a preset display area on a display device 200 according to some embodiments.
  • Fig. 21 is another schematic diagram of the position of the preset display area on the display device 200 according to some embodiments.
  • FIG. 22 is a schematic diagram of a transparency page on a user interface of the display device 200 according to some embodiments.
  • FIG. 23 is a schematic diagram of displaying subtitles with a background transparency of 80% on the display device 200 according to some embodiments.
  • Fig. 24 is a schematic diagram of subtitle content processing between modules on the display device 200 according to some embodiments.
  • Fig. 25 is another schematic diagram of subtitle content processing between modules on the display device 200 according to some embodiments.
  • Fig. 1 is a schematic diagram of a usage scenario of a display device according to an embodiment.
  • the display device 200 also performs data communication with the server 400 , and the user can operate the display device 200 through the smart device 300 or the control device 100 .
  • the control device 100 may be a remote controller, and the communication between the remote controller and the display device includes at least one of infrared protocol communication, Bluetooth protocol communication, and other short-distance communication methods, with the display device 200 controlled wirelessly or by wire.
  • the smart device 300 may include any one of a mobile terminal, a tablet computer, a computer, a notebook computer, an AR/VR device, and the like.
  • the smart device 300 can also be used to control the display device 200 .
  • the display device 200 is controlled using an application program running on the smart device.
  • the smart device 300 and the display device may also be used for data communication.
  • the display device 200 can also be controlled in manners other than through the control device 100 and the smart device 300.
  • For example, the module for obtaining voice commands configured inside the display device 200 can directly receive the user's voice command control.
  • Alternatively, the user's voice command control can be received through a voice control device provided outside the display device 200.
  • the display device 200 also performs data communication with the server 400 .
  • the display device 200 may be allowed to communicate via a local area network (LAN), a wireless local area network (WLAN), and other networks.
  • the server 400 may provide various contents and interactions to the display device 200 .
  • FIG. 2 is a configuration block diagram of the control device 100 according to the exemplary embodiment.
  • the control device 100 includes a controller 110 , a communication interface 130 , a user input/output interface 140 , a memory, and a power supply.
  • the control device 100 can receive the user's input operation instruction, and convert the operation instruction into an instruction that the display device 200 can recognize and respond to, and play an intermediary role between the user and the display device 200 .
  • the communication interface 130 is used for communicating with the outside, and includes at least one of a WIFI chip, a Bluetooth module, NFC or an alternative module.
  • the user input/output interface 140 includes at least one of a microphone, a touch pad, a sensor, a button or an alternative module.
  • FIG. 3 is a block diagram of a hardware configuration of a display device 200 according to an exemplary embodiment.
  • the display device 200 includes at least one of a tuner-demodulator 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a display 260, an audio output interface 270, a memory, a power supply, and a user interface.
  • the controller includes a CPU, a video processor, an audio processor, a graphics processor, a RAM, a ROM, a first interface to an nth interface for input/output.
  • the display 260 includes a display screen component for presenting images and a drive component for driving image display, and is used for receiving image signals output from the controller and displaying video content, image content, a menu manipulation interface, a user manipulation UI interface, and the like.
  • the display 260 may be at least one of a liquid crystal display, an OLED display, and a projection display, and may also be a projection device and a projection screen.
  • the tuner-demodulator 210 receives broadcast TV signals through wired or wireless reception, and demodulates audio/video signals, as well as EPG data signals, from multiple wireless or cable broadcast TV signals.
  • the communicator 220 is a component for communicating with external devices or servers according to various communication protocol types.
  • the communicator may include at least one of a Wifi module, a Bluetooth module, a wired Ethernet module and other network communication protocol chips or near field communication protocol chips, and an infrared receiver.
  • the display device 200 can establish transmission and reception of control signals and data signals with the control device 100 or the server 400 through the communicator 220 .
  • the detector 230 is used to collect signals of the external environment or interaction with the outside.
  • the detector 230 includes a light receiver, which is a sensor for collecting ambient light intensity; or, the detector 230 includes an image collector, such as a camera, which can be used to collect external environmental scenes, user attributes or user interaction gestures, or , the detector 230 includes a sound collector, such as a microphone, for receiving external sound.
  • the external device interface 240 may include, but is not limited to, any one or more of the following interfaces: a High Definition Multimedia Interface (HDMI), an analog or data high-definition component input interface (Component), a composite video input interface (CVBS), a USB input interface (USB), an RGB port, and the like. It may also be a composite input/output interface formed by the above-mentioned multiple interfaces.
  • the controller 250 and the tuner-demodulator 210 may be located in different split devices; that is, the tuner-demodulator 210 may also be located in a device external to the main device where the controller 250 is located, such as an external set-top box.
  • the controller 250 controls the operation of the display device and responds to user operations through various software control programs stored in the memory.
  • the controller 250 controls the overall operations of the display device 200 . For example, in response to receiving a user command for selecting a UI object to be displayed on the display 260, the controller 250 may perform an operation related to the object selected by the user command.
  • the controller includes at least one of a central processing unit (CPU), a video processor, an audio processor, a graphics processing unit (GPU), a random access memory (RAM), a read-only memory (ROM), a first interface to an nth interface for input/output, a communication bus, and the like.
  • The CPU processor is used to execute the operating system and application program instructions stored in the memory, and to execute various application programs, data and content according to various interactive instructions received from the outside, so as to finally display and play various audio and video content.
  • a CPU processor may include multiple processors. For example, including a main processor and one or more sub-processors.
  • the user can input user commands through a graphical user interface (GUI) displayed on the display 260, and the user input interface receives user input commands through the graphical user interface (GUI).
  • the user may input a user command by inputting a specific sound or gesture, and the user input interface recognizes the sound or gesture through a sensor to receive the user input command.
  • the user input interface 280 is an interface that can be used to receive control input (such as: physical buttons on the display device body, or others).
  • the system is divided into four layers, which from top to bottom are the applications layer (abbreviated as "application layer"), the application framework layer (abbreviated as "framework layer"), the Android runtime and system library layer (abbreviated as "system runtime layer"), and the kernel layer.
  • the application framework layer includes managers (Managers), content providers (Content Providers), etc.
  • the manager includes at least one of the following modules: an Activity Manager, which interacts with all activities running in the system; a Location Manager, which provides system services or applications with access to the system location service; a Package Manager, which retrieves various information related to the application packages currently installed on the device; a Notification Manager, which controls the display and clearing of notification messages; and a Window Manager, which manages the icons, windows, toolbars, wallpapers and desktop widgets on the user interface.
  • a video resource may be obtained from an external signal source (such as a set-top box) or a network, and the video resource may be loaded and played.
  • when the display device plays a video resource, in addition to playing the video data, it generally also displays subtitle information synchronously.
  • the subtitle information is the text converted from the voice content spoken by the speaking object.
  • For example, if the original sound of a film is in English but the film is aimed at users in mainland China, the original sound can be translated into Simplified Chinese, and the subtitle information can then be displayed in Simplified Chinese.
  • By converting audio content into visual subtitle information, it is also easier for hearing-impaired people to understand the plot and content conveyed by the video resource.
  • when providing video resources, operators can also provide the subtitle data of the video resources together.
  • the subtitle data includes several pieces of subtitle information, with corresponding time information configured for each piece of subtitle information.
  • the time information is used to indicate the time node at which the subtitle information is displayed. For example, if the total playback time of a certain video resource is 30 minutes, subtitle information 1 may be configured to be displayed at the time node where the video playback progress reaches the 50th second.
  • each video resource may be associated with a playback time axis whose length is equal to the total duration of the video; the display node of each piece of subtitle information included in the video resource is marked on the playback time axis, and the ID of the subtitle information to be displayed can be recorded at each marked node, so that the display device knows which piece or pieces of subtitle information should be displayed at that node.
  • Each marked node on the playback time axis can be mapped to one or more pieces of subtitle information.
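  • As an illustration only, a minimal sketch of such a playback time axis: marked nodes (playback progress, here in seconds) map to the IDs of the subtitle information to be displayed at those nodes. The node values, IDs and lookup function are assumptions for demonstration, not the patent's literal implementation:

```python
# Sketch: a playback time axis that maps marked time nodes (seconds of playback
# progress) to the IDs of the subtitle information to display there.
# A node may map to one or more pieces of subtitle information.
playback_time_axis = {
    50: ["subtitle_1"],                 # at the 50th second, show subtitle 1
    53: ["subtitle_2", "subtitle_3"],   # two objects speak at the same node
}

def subtitles_for_progress(progress_seconds, time_axis):
    """Return the subtitle IDs marked at the current playback progress, if any."""
    return time_axis.get(progress_seconds, [])

print(subtitles_for_progress(50, playback_time_axis))  # ['subtitle_1']
```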
  • when the display device receives the video data, it receives the subtitle data synchronously, and controls the subtitle display according to the current time and the time information preset by the operator.
  • For example, in Figure 5(a), the man and the woman in the video picture speak at the same time, so two parallel pieces of subtitle information, subtitle information 1 and subtitle information 2, are displayed: subtitle information 1 corresponds to the man's voice, and subtitle information 2 corresponds to the woman's voice.
  • the subtitle information 1 is "Let's go out for dinner”
  • the subtitle information 2 is "I'm a little hungry, it's time to have dinner”.
  • the time node corresponding to Figure 5(b) is 19:30:32. At this time node, only the woman in the video screen speaks alone, thus displaying a piece of subtitle information 3, for example, subtitle information 3 is "OK, I want to eat hot pot”.
  • the time node corresponding to Figure 5(c) is 19:30:33, when only the man in the video picture speaks, thus producing a piece of subtitle information 4 in response to subtitle information 3; for example, subtitle information 4 is "OK, I treat you, let's go." Afterwards, when the man and the woman go out, neither of them makes a sound, so the picture shown in Figure 5(d) is displayed without subtitles.
  • In the related art, the display position and format of the subtitle information are generally fixed. For example, the subtitle information in the drawings of this application is always displayed at the bottom of the video picture, and the font, font size, font shape and font color of the text in the subtitle information are uniform, so the subtitle display mode is relatively simple and neither rich nor vivid. For the example of Figure 5(a), users with normal hearing can accurately distinguish that subtitle information 1 comes from the man and subtitle information 2 comes from the woman based on the differences in timbre and pitch between male and female voices. The hearing-impaired, however, can only see that the mouths of both characters in the video are moving while speaking; the positions of the characters are randomly distributed, and the content expressed by subtitle information 1 and subtitle information 2 is similar, so hearing-impaired people cannot distinguish who said which of the two side-by-side pieces of subtitle information.
  • In addition, physical sounds generated by the environment in the scene are not subtitled. For example, in Figure 5(d), when the man and the woman go out, there is a "bang" of the door closing, but no onomatopoeic subtitle is shown for the closing sound, so hearing-impaired people cannot perceive sounds triggered by the environment either aurally or visually. It can be seen that the ordinary subtitle display method is not friendly to the hearing-impaired, which affects their viewing and understanding of the video.
  • When configuring the subtitle data, in addition to setting the subtitle information and time information, the operator also adds target information.
  • The target information is used to indicate the sounding object of the subtitle information. The sounding objects mentioned in this application include not only biological objects with the ability to produce sound, such as people and animals, but also non-biological objects that can produce physical sounds in the environment, such as thunderstorms, rivers, cars, doors, broadcast speakers, and the like.
  • the target information specifically includes image features or identity marks used to describe the utterance object, such as a man, a woman, a cat, an airplane, and the like.
  • Voice processing such as semantic recognition can be performed on sounding objects of the character type based on their actual voice content in the video resource, so as to convert it into corresponding text information.
  • For non-character sounding objects, the sound in the video resource can be rendered onomatopoeically to generate subtitle information.
  • For a cat, for example, the sound can be converted into a cry such as "Meow"; for thunder, it can be converted into a rumbling "Boom-Boom"; for a camera, it can be converted into a "click" shutter sound; and so on.
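  • A minimal sketch of an onomatopoeia lookup of this kind; the table entries simply mirror the examples above and are illustrative assumptions, not a defined mapping:

```python
# Sketch: onomatopoeic subtitle text for non-human sounding objects.
# Entries are illustrative assumptions based on the examples in this application.
ONOMATOPOEIA = {
    "cat": "Meow~",
    "thunder": "Boom-Boom",
    "camera": "Click",
    "door": "Bang",
}

def onomatopoeic_subtitle(sounding_object):
    # Fall back to a generic marker when no onomatopoeia is configured.
    return ONOMATOPOEIA.get(sounding_object, "[sound]")

print(onomatopoeic_subtitle("cat"))  # Meow~
```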
  • In this way, the improved subtitle data includes not only human-voice subtitle information and its display nodes, but also the subtitle information and display nodes of other non-human objects capable of producing sound, so that after the display device parses the subtitle data, it can display not only human subtitles but also non-human subtitles such as those for animals, insects and the environment, making the subtitle display richer and more vivid, and closer to the content and effect of the actual video sound.
  • the target information can also include a description of the location distribution of the sounding object.
  • In this way, the sounding object can be indicated and located more accurately. For example, the picture in Figure 5(a) includes three object elements, namely the man, the woman and the door. The three stand roughly side by side, and the position distribution of each object element can be set by sorting their positions.
  • If the order is from left to right, the position distribution of the door is "first from the left", the position distribution of the man is "second from the left", and the position distribution of the woman is "third from the left"; if the order is from right to left, the position distribution of the woman is "first from the right", the position distribution of the man is "second from the right", and the position distribution of the door is "third from the right".
  • The position distribution can also be expressed in the form of [row number, column number]; for example, the position distribution of the door is [1, 1], that of the man is [1, 2], and that of the woman is [1, 3].
  • the position distribution of the object element is added to the target information of the corresponding subtitle data.
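  • As an illustration only, a small sketch of deriving such "first from the left"-style position distributions by sorting object elements on their horizontal coordinate; the element names and coordinates are assumptions:

```python
# Sketch: derive left-to-right position labels ("first from the left", ...)
# for the object elements in a frame by sorting on their x coordinate.
elements = [
    {"name": "man",   "x": 750},
    {"name": "woman", "x": 975},
    {"name": "door",  "x": 150},
]

ordinals = ["first", "second", "third", "fourth"]
ordered = sorted(elements, key=lambda e: e["x"])          # left to right
position_distribution = {
    e["name"]: f"{ordinals[idx]} from the left" for idx, e in enumerate(ordered)
}
print(position_distribution)
# {'door': 'first from the left', 'man': 'second from the left', 'woman': 'third from the left'}
```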
  • In some embodiments, the target information may also include state description information of the sounding object, which is used to describe the state of the sounding object when it makes a sound. For example, for creatures such as people and animals, the state description information includes, but is not limited to, mental/emotional states such as calm, excited, happy and sad, physical states such as fatigue, pain and sleepiness, and biological activity states. This is because both the physical/mental state and the activity state may affect a creature's tone, intonation, volume and so on when vocalizing. People with normal hearing can naturally perceive the state of biological objects in the video through hearing, while hearing-impaired people mainly rely on subtitles to understand and feel the content of the video, and ordinary subtitles cannot convey to them the emotions and state fluctuations of the sounding object. Therefore, a state description of the subtitle's sounding object is added to the target information, so that after parsing the subtitle data, the display device can use the state description information as a reference to apply a display special effect matched to the subtitle information.
  • For example, the special effect implemented on the subtitle information can be that the subtitle text is shown in red with a relatively increased font size.
  • If the sounding object is a mouse and its activity state is "quietly gnawing", the special effect implemented can be to make the subtitle information appear faintly and intermittently, so as to convey the faint, slight quality of the sound.
  • For non-biological sounding objects, the state description information includes, but is not limited to, a running state, a working state, a natural state, and the like.
  • The display device can also use the state description information as a reference to adapt the special effects implemented for the subtitles. For example, if the sounding object is a door and its state is "being closed", a burst-like special effect can be applied to the subtitle information when the door closes, such as "bang", to convey the effect of a sudden, loud sound.
  • For a sounding object such as a broadcast speaker, an optional special effect is to embed the subtitle information in a small speaker icon, so as to present a "broadcasting" effect.
  • In this way, the subtitle display is more vivid and better conveys the state of the sounding object in the video to the user, improving the accuracy and richness of the subtitle display and thereby providing users, especially the hearing-impaired, with a better viewing experience of the video and its subtitles. It should be noted that the setting and implementation of subtitle special effects are not limited to the examples in this application.
  • On the one hand, the display device responds to play control operations, which include but are not limited to starting playback, pausing playback, seeking, double-speed playback, etc.; on the other hand, after the video starts playing, it controls the display of subtitles according to the video playback progress and the pre-marked time information.
  • the display device parses the subtitle data to obtain a sequence of subtitle information, the sequence of subtitle information includes all subtitle information of the current video resource, and time information and target information corresponding to each piece of subtitle information.
  • Subtitle 2 {Sounding object: woman; position: third from the left; state: calm; time node: 19:30:31; subtitle information 2: "I'm a little hungry, it's time to have dinner"};
  • Subtitle 3 {Sounding object: woman; position: third from the left; state: excited; time node: 19:30:32; subtitle information 3: "OK, I want to eat hot pot"};
  • Subtitle 4 {Sounding object: man; position: second from the left; state: excited; time node: 19:30:33; subtitle information 4: "OK, I treat you, let's go"};
  • Subtitle 5 {Sounding object: door; position: first from the right; state: being closed; time node: 19:31:15; subtitle information 5: "bang"}]
  • Subtitle 1 and subtitle 2 correspond to the same time node 19:30:31, that is, multiple people speak at the same time at the 19:30:31 node. Then, referring to the example in Figure 6(a), at the 19:30:31 time node, subtitle information 1 "Let's go out for dinner" is displayed in the local area at the second-from-the-left position where the man is, and subtitle information 2 "I'm a little hungry" is displayed in the local area at the third-from-the-left position where the woman is.
  • The display device traverses and displays the subtitle information included in the subtitle information sequence in time order. During this process, the display device can determine whether all the subtitle information in the sequence has been displayed. If it has all been displayed, then, according to the next video resource selected and ordered by the user, the display device continues to obtain the video data and subtitle data of the next video resource and controls the subtitle display according to the implementations of the above examples; if the sequence has not been fully displayed, the display device continues to control the display of the subtitle information in the sequence according to the video playback progress until all the subtitle information in the sequence has been displayed.
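  • A minimal sketch of such a subtitle information sequence and a time-node lookup; the field names mirror the example entries above, but the exact data structure is an assumption rather than the patent's format:

```python
# Sketch: a parsed subtitle information sequence, queried by time node.
subtitle_sequence = [
    {"object": "man",   "position": "second from the left", "state": "calm",
     "time": "19:30:31", "text": "Let's go out for dinner"},
    {"object": "woman", "position": "third from the left",  "state": "calm",
     "time": "19:30:31", "text": "I'm a little hungry, it's time to have dinner"},
    {"object": "door",  "position": "first from the right", "state": "being closed",
     "time": "19:31:15", "text": "bang"},
]

def subtitles_at(time_node, sequence):
    """All subtitles marked for the given time node (several objects may speak at once)."""
    return [s for s in sequence if s["time"] == time_node]

for entry in subtitles_at("19:30:31", subtitle_sequence):
    print(entry["object"], "->", entry["text"])
```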
  • the method is executed by the controller 250 at the display device end, and the method includes the following steps:
  • Step S01 receiving video data and subtitle data synchronously sent by the resource server.
  • the resource server is a generalization of video operators, which is equivalent to a signal source providing video resources.
  • the resource server can be a server of network resources, or a server of operators such as cable broadcast TV and TV boxes.
  • the subtitle data includes subtitle information expressing the audio content of the video in text form, time information used to indicate the subtitle display node, and target information used to indicate the sounding object corresponding to the subtitle information.
  • the target information includes image features/identities, location distribution and status description information of the speaking object.
  • the subtitle data can be embedded in the video data, or the subtitle data can also be associated and bound with the video data as independent data.
  • Step S02 according to the video data and the target information, calculate the relative position and size information of the sounding object in the video frame.
  • The size information of the sounding object in the video image is used to enable the display device to determine the font size of the subtitle information and the area it covers, in order to avoid making the text too small for users to read comfortably, and to avoid the subtitle information covering both sounding and non-sounding object elements at the same time because the font is too large, so as to ensure that hearing-impaired users can accurately identify the sounding object corresponding to the subtitle information.
  • After the display device receives the video data, it can extract frames of video image from it. Each frame of video image has a corresponding display time stamp indicating when that frame is displayed on the screen, and the subtitle display is related to the video playback progress. For example, at a certain time node, it is necessary to start displaying subtitle information 1 at the S-th frame of video image.
  • Since speaking object A consumes time when speaking a line, let the duration of this utterance be T (in seconds). When sounding object A finishes speaking the line corresponding to subtitle information 1, the display synchronously cancels the display of subtitle information 1, so subtitle information 1 is displayed for a duration of T. Within the start-to-end time segment of displaying subtitle information 1, the display refreshes T*f frames of video images, where f is the refresh rate (in Hz); that is to say, the video images from the S-th frame to the (S+T*f)-th frame played continuously in the video data all display the same subtitle information 1. Since the display position of subtitle information 1 in this application is associated with the position distribution of sounding object A, the fact that sounding object A may move must be taken into account.
  • For example, sounding object A is located on the left side of the video picture at the S-th frame, while at the (S+K)-th frame (K is less than or equal to T*f) the position of sounding object A has shifted to the center of the video picture. Therefore, the dynamic position change of sounding object A from the S-th frame to the (S+T*f)-th frame can be tracked, and subtitle information 1 can be linked with the movement of sounding object A, so as to ensure the accuracy of the subtitle information display.
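  • A small sketch of the frame-range relation described above (frames S through S+T*f display the same subtitle); the numbers in the example are illustrative only:

```python
# Sketch: the range of video frames over which one subtitle stays on screen,
# following the relation [S, S + T * f] described above.
def subtitle_frame_range(start_frame, duration_seconds, refresh_rate_hz):
    """Frames (inclusive) that display the subtitle: start_frame .. start_frame + T*f."""
    end_frame = start_frame + int(duration_seconds * refresh_rate_hz)
    return start_frame, end_frame

# Example: subtitle starts at frame 1200, speech takes 3 s, display refreshes at 60 Hz.
print(subtitle_frame_range(1200, 3, 60))  # (1200, 1380)
```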
  • In step S02, following the video playback progress, the target video image is first extracted from the video data. The target video image consists of the Si-th frame to the (Si+Ti*f)-th frame, where Si is the frame at which the i-th piece of subtitle information in the sequence starts to be displayed, Ti is the utterance duration corresponding to the i-th piece of subtitle information in the sequence, f is the refresh rate, 1 ≤ i ≤ M, and M is the number of pieces of subtitle information included in the subtitle information sequence.
  • In step S02, after the target video image is extracted, the sounding object is segmented and identified from the target video image with reference to the description of the sounding object in the target information.
  • An image coordinate system is constructed, and the coordinates (x, y) of the sounding object in the image coordinate system are calculated, and the height h and width w of the sounding object included in the size information are calculated.
  • a key point is selected from the sounding object, and the coordinates (x, y) of the key point are calculated.
  • the key point may be a midpoint or an edge point of the sounding object.
  • Here, ymax is the coordinate value, on the y-axis (height direction), of the highest point on the sounding object; ymin is the coordinate value of the lowest point on the sounding object on the y-axis; xmax is the coordinate value of the rightmost point on the sounding object on the x-axis (width direction); and xmin is the coordinate value of the leftmost point on the sounding object on the x-axis.
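  • As an illustration only (not the patent's literal formulas), a minimal sketch of deriving a midpoint key point and the size information from these extreme-point coordinates, assuming the key point is the midpoint of the sounding object's bounding box:

```python
# Sketch, under the assumption that the key point (x, y) is the midpoint of the
# sounding object's bounding box and that the size information is its extent:
#   x = (xmax + xmin) / 2,  y = (ymax + ymin) / 2
#   w = xmax - xmin,        h = ymax - ymin
def key_point_and_size(xmin, xmax, ymin, ymax):
    x = (xmax + xmin) / 2.0
    y = (ymax + ymin) / 2.0
    w = xmax - xmin            # width of the sounding object
    h = ymax - ymin            # height of the sounding object
    return (x, y), w, h

# Illustrative bounding-box extremes for one object element:
print(key_point_and_size(xmin=899, xmax=1051, ymin=153, ymax=808))
# ((975.0, 480.5), 152, 655)
```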
  • a processing model can be constructed and trained.
  • The processing model can use a deep learning model, such as a deep network model based on a convolutional neural network, and the display device can call the processing model to complete analytical processing such as image segmentation and target recognition.
  • One end of the processing model receives the input of the target video image, and the other end of the processing model provides an output result, which includes the object element Objectj segmented and recognized from the target video image, and the coordinates of each object element Objectj ( xj, yj), height hj and width wj, where j represents the serial number of the object element in the target video image, 1 ⁇ j ⁇ N, and N is the total number of object elements in the target video image.
  • the sounding object matching the target information is screened out from the object element Objectj, and the coordinates, height and width of the sounding object are obtained from the output result of the processing model.
  • The format of the output result of the processing model is, for example, [{Object1: door; x1: 150; y1: 450; w1: 300; h1: 900}, {Object2: man; x2: 750; y2: 536; w2: 203; h2: 714}, {Object3: woman; x3: 975; y3: 480; w3: 152; h3: 655}].
  • For example, the target information indicates {sounding object: woman; position: third from the left; state: excited}.
  • The controller 250 uses the target information and the output result of the processing model to perform screening and matching; the matching sounding object is Object3 among the object elements, so the coordinates, width and height of Object3 are taken as the relative position and size information of the sounding object.
  • Alternatively, the target video image and the target information can both be used as input items and fed into the processing model at the same time, with the processing model integrating image segmentation, target recognition and sounding-object matching.
  • the processing model can directly output the sounding object and its coordinates and dimensions, and there is no need for the controller to match the sounding object from the object elements.
  • In this case, the format of the model output result is, for example, {sounding object: woman; coordinates: (975, 480); width: 152; height: 655}.
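  • A minimal, non-authoritative sketch of the screening-and-matching step described above; the dictionary format and the identity-based matching rule are illustrative assumptions:

```python
# Sketch: screen the sounding object indicated by the target information out of
# the object elements returned by the processing model. Values mirror the
# examples above; matching by identity only is an assumption.
model_output = [
    {"object": "door",  "x": 150, "y": 450, "w": 300, "h": 900},
    {"object": "man",   "x": 750, "y": 536, "w": 203, "h": 714},
    {"object": "woman", "x": 975, "y": 480, "w": 152, "h": 655},
]
target_info = {"object": "woman", "position": "third from the left", "state": "excited"}

def match_sounding_object(output, target):
    """Return the object element whose identity matches the target information."""
    for element in output:
        if element["object"] == target["object"]:
            return element
    return None

print(match_sounding_object(model_output, target_info))
# {'object': 'woman', 'x': 975, 'y': 480, 'w': 152, 'h': 655}
```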
  • The processing model can evolve through long-term training and correction, so that the processing efficiency of the model and the accuracy of its output can be enhanced and better subtitle effects can be provided; that is, the processing model can be continuously updated. This update process could be executed by the display device itself, but maintaining the processing model would occupy the processing resources of the controller and increase memory overhead, which may affect the performance of the display device.
  • Therefore, a model server that communicates with the display device can be set up. The model server is used to build, train and update the processing model. In the evolution process after the processing model is constructed, the display device downloads the new version of the model from the model server to replace the old version, so that the processing model on the display device side can be updated. In this improved approach, the model server, instead of the display device, updates and maintains the processing model, thereby reducing the memory overhead and CPU processing resource consumption of the display device.
  • Each time the model server successfully updates the processing model, it pushes a model update message to the display device; when the display device receives the model update message, it requests to download the updated processing model from the model server. After the download is completed, the display device deletes the old version of the processing model and stores the updated processing model locally; the display device can then call the current latest version of the processing model to analyze and process the target video image.
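  • A simplified sketch of this update sequence; the file path, message fields and download callback below are illustrative assumptions rather than the patent's implementation:

```python
# Sketch of the update sequence: the model server pushes an update message,
# the display device downloads the new processing model, deletes the old one
# and stores the new version locally.
import os
import shutil

MODEL_PATH = "models/processing_model.bin"   # assumed local storage location

def on_model_update_message(message, download_fn):
    """Handle a model-update push message; download_fn fetches the new model file."""
    new_model_file = download_fn(message["model_url"])       # request the updated model
    if os.path.exists(MODEL_PATH):
        os.remove(MODEL_PATH)                                 # delete the old version
    os.makedirs(os.path.dirname(MODEL_PATH), exist_ok=True)
    shutil.move(new_model_file, MODEL_PATH)                   # store the updated model locally
```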
  • Before the target video image is input to the processing model, it can be preprocessed according to the requirements of the neural network.
  • The preprocessing includes, but is not limited to, scaling, binarization, grayscale processing, etc. of the target video image.
  • For example, if the neural network only accepts images with a resolution of 288*288 and the extracted target video image has a resolution of 1280*720, the target video image is scaled down in advance to 288*288.
  • If the neural network only accepts black-and-white images and the extracted target video image is a color image, the target video image can be binarized to convert the color image into a black-and-white image.
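  • A minimal preprocessing sketch using Pillow, assuming a 288*288 input resolution and a simple fixed-threshold binarization (both are illustrative choices, not requirements stated by this application):

```python
# Sketch: scale the target video image to the resolution accepted by the
# neural network (e.g. 288x288) and, if only black-and-white input is
# accepted, binarize it. The threshold value is an illustrative assumption.
from PIL import Image

def preprocess(frame_path, size=(288, 288), binarize=True, threshold=128):
    image = Image.open(frame_path).resize(size)          # e.g. 1280x720 -> 288x288
    if binarize:
        gray = image.convert("L")                        # grayscale first
        image = gray.point(lambda p: 255 if p >= threshold else 0, mode="1")
    return image
```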
  • Step S03 controlling a display to display the subtitle information on the video screen according to the time information, the relative position and size information of the utterance object in the video screen.
  • The display position of the subtitle information is determined according to the relative position (including coordinates) of the sounding object calculated in step S02, so that the subtitle is linked with the sounding object and the user can accurately identify which object element the current subtitle information comes from; and the font size, the area occupied by the subtitle information, etc. are determined according to the size information of the sounding object calculated in step S02.
  • the size information of the sounding object is not limited to the width and height included in the foregoing embodiments, for example, it may also be in the form of the area of the sounding object.
  • If the display device parses out the state description information of the sounding object configured in the target information, it applies a display special effect adapted to the state description information to the subtitle.
  • The display device can maintain a state-special-effect list, which records the preset special effects of the sounding object in different states and allows the user to add, delete or modify the state special effects. Merely as examples, the default special effect in the angry state is a large red bold font, the default special effect when the sounding object moves from far to near is an animation of gradual font enlargement, the default special effect in a weak state is subtitle flickering, and so on.
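  • A minimal sketch of such a state-special-effect list; the keys and effect descriptions simply mirror the examples above and are assumptions, not a defined format:

```python
# Sketch: a state-to-special-effect list with a plain fallback style.
STATE_EFFECTS = {
    "angry":       {"color": "red", "bold": True, "font_scale": 1.5},   # large red bold font
    "far_to_near": {"animation": "font-gradient-enlarge"},              # gradual enlargement
    "weak":        {"animation": "flicker"},                            # subtitle flickering
}
DEFAULT_EFFECT = {"color": "white", "bold": False, "font_scale": 1.0}

def effect_for_state(state):
    # Users may add, delete or modify entries; fall back to a plain style.
    return STATE_EFFECTS.get(state, DEFAULT_EFFECT)

print(effect_for_state("angry"))
```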
  • the display format of subtitle information is not limited to font format and special effects, but also includes line spacing, character spacing, language, etc.
  • In this way, the subtitle display format suited to the sounding object can be determined, and the current subtitle template is drawn according to that subtitle display format; that is to say, the font, special effects, language and other formats of the subtitle information are constrained by the subtitle template. When the time node indicated by the time information is reached, the subtitle information is loaded and displayed, according to the subtitle template, at the position where the sounding object is located on the video picture, thereby improving the diversity, accuracy and vividness of the subtitle display and providing users, especially the hearing-impaired, with a better viewing experience of videos and their subtitles.
  • FIG. 8 provides a subtitle display architecture, which generally includes a server end and a display device end, and the server end can be refined to include a resource server and a model server.
  • The resource server is used to provide the video data and subtitle data of video resources to the display device; on the resource server side, the operator adds and configures the target information in the subtitle data, so as to provide a reference for the subtitle display format on the display device side.
  • The model server allows the user to create, train and update the processing model so as to manage and maintain it, and when the processing model is successfully updated, it promptly notifies the display device to upgrade the model version.
  • Five modules can be configured on the display device side, namely a data receiving module, a picture capture module, a neural network processing module, a subtitle parsing module, and a drawing and rendering module. These functional modules can be configured in the controller 250, and the controller coordinates the logical operations among the modules.
  • the data receiving module can receive the video data and subtitle data sent by the resource server, and send the subtitle data to the subtitle analysis module, and send the video data to the decoder and the picture capture module respectively.
  • the data receiving module may first separate the subtitle data from the video data, and then send the subtitle data to the subtitle parsing module.
  • the decoder performs decoding processing, and sends the decoded data to the display to realize video playback, wherein the decoder includes a video decoder and an audio decoder.
  • the image capture module is used to extract the target video image, and store the target video image in the memory for the neural network processing module to process the target video image.
  • the image capture module may preprocess the extracted target video image according to the requirements of the neural network processing module for the image to be processed.
  • The neural network processing module is used to complete two functions. First, it loads the locally stored processing model, reads the target video image from the memory, inputs the target video image to the processing model, and finally sends the output result of the processing model to the drawing and rendering module. Second, the neural network processing module can download a new version of the processing model from the model server according to the model update message pushed by the model server, delete the old version of the processing model after the download succeeds, and store the new version locally, so as to upgrade the processing model on the display device.
  • The subtitle parsing module is used to parse the subtitle data to obtain the subtitle information, the time information, and the additionally configured target information, the target information including, but not limited to, the shape feature/identity, position distribution and state description information of the sounding object, and then sends the parsed information to the drawing and rendering module.
  • The drawing and rendering module is a front-end module associated with the display, and is used to determine the subtitle display format adapted to the sounding object according to the reference information sent by the neural network processing module and the subtitle parsing module, so as to draw the subtitle template and render the subtitle effect; then, when the time node indicated by the time information is reached, the display loads and displays the corresponding subtitle information at the position of the sounding object according to the subtitle template.
  • If the output of the processing model is the object elements and their positions and sizes, the drawing and rendering module further needs to match the sounding object and its position and size according to the model output result and the target information; if the output of the processing model is already the matched sounding object and its position and size information, the drawing and rendering module does not need to repeat the matching.
  • The display device in this application can capture the target video image, thereby locating the relative position of the sounding object in the video picture and calculating the size information of the sounding object, so that subtitle information matched to the size of the sounding object can be displayed at the location of the sounding object. In this way, the user can intuitively identify which target object the subtitle information corresponds to and know who the sounding object is, even if multiple objects make sounds at the same time node.
  • The hearing-impaired can still know how many objects are currently speaking and what each of them said through the subtitle information displayed locally at each sounding object, which improves the accuracy and richness of the subtitle display and provides users, especially the hearing-impaired, with a better viewing experience of the video and its subtitles.
  • This application also supports onomatopoeic subtitles for non-biological objects that emit physical sounds in the environment, so as to provide hearing-impaired people with more vivid subtitles that are closer to the sound effects of the video, no longer limited to conventional voice subtitles.
  • the display format and effect of the subtitles can be flexibly set.
  • For example, when configuring the target information in the subtitle data, the operator can consider, from multiple angles, the factors that may affect the subtitle display effect and add corresponding descriptions to the target information, so as to provide more references for the display device in determining the subtitle display effect; for another example, when configuring the subtitle effect, the display device can start from the subtitle information itself.
  • the processing model in this application can be based on high-precision image algorithms for neural network modeling and training.
  • the image algorithms include but are not limited to image segmentation, target recognition, edge detection, etc.
  • the processing model and its related training algorithms are not limited. Reference is made to known techniques.
  • Subtitles refer to a form of expression that uses text to describe video scenes and dialogues of characters in the video. Subtitles can help users understand the content of the video in the form of text reading in addition to sound and picture. The combination of subtitle text and audio can express the content of the video more clearly. In addition, subtitles also play a special role in helping users with hearing impairments or language differences.
  • subtitles will also provide users with some preference settings. For example, the user is allowed to set the size of the subtitle text, the color of the subtitle text, the background color of the subtitle display area, and so on. Especially for some subtitles, you need to set the correct subtitle encoding, otherwise the subtitles may be displayed as garbled characters and cannot be read normally.
  • During video playback, the picture is output continuously, but subtitles behave differently.
  • Subtitles need to display specific text content at specific time points and are updated at certain time intervals; taking movies as an example, each subtitle is updated in synchrony with the characters' dialogue. Because of this, users often cannot check the effect of a subtitle preference setting in real time: to check the effect, they must wait until a subtitle is displayed during video playback. However, the time points at which subtitles appear are uncertain, so the user has to wait for a subtitle to appear before seeing the effect of the setting, and the waiting time is indeterminate.
  • the embodiment of the present application also provides the following processing:
  • Content such as videos played on the display device 200, for example TV programs, variety shows and news programs, can all display subtitles synchronously. For example, in the TV series currently being played by the display device 200 shown in FIG. 9, character A in the TV series is speaking, and the corresponding subtitle reads "Life can't stand the toss".
  • As a result, when the user sets subtitles, too much time is wasted waiting for the effect to be displayed, which in turn causes a problem of poor interactive experience between the user and the display device 200.
  • the embodiment of the present application provides a display device 200, which can instantly display the user's setting effect on the subtitle content when the video content is played.
  • the display device 200 in the embodiment of the present application may provide the user with a setting page for setting subtitle attributes, and may also provide the user with a preset display area for displaying subtitle setting effects.
  • In the embodiments of the present application, the operation by which the user opens the subtitle property page may be moving the focus on the user interface with a control device 100 such as a remote control and selecting the subtitle property page entry; it may be pressing a function button, such as a menu button, on the control device 100, where such function buttons are associated with menu pages on the display device 200 that provide the entry to the subtitle property page; or it may be the user issuing a voice command to the display device 200 through a voice assistant or the like, the voice command controlling the display device 200 to enter the subtitle property page directly.
  • Fig. 10 is a schematic diagram of displaying a menu page with a subtitle property page entry on the user interface of the display device 200 according to some embodiments.
  • the menu page can be displayed on the right side of the user interface as shown in FIG. 9 , and includes some function entries related to video content, such as "subtitle”, "display”, "audio” and so on. After the user selects the "subtitle” entry, the display device 200 will display the subtitle property page.
  • Fig. 11 is a schematic diagram of displaying a subtitle property page on a user interface of the display device 200 according to some embodiments.
  • the subtitle property page can also be displayed on the right side of the user interface, and includes some subtitle property options, such as subtitle code, font size, content color, background color, and the like.
  • the user can control the display device 200 to display the corresponding property page by selecting any subtitle property on the subtitle property page.
  • While displaying the subtitle property page, the display device 200 also displays a preset display area 201, in which preset subtitle content serving as preview content can be shown, for example "I got everything I should get."
  • When the user selects the subtitle encoding option, the display device 200 can display a subtitle encoding page as shown in FIG. 12, which includes options for several different encoding methods, such as utf8 (Universal Character Set/Unicode Transformation Format, 8-bit), gb (GuoBiao, the Chinese national standard), big5 (Big5), ios, windows, and so on.
  • The encoding method indicates how subtitle content is stored in the display device 200. When the display device 200 needs to display the subtitle content, it must also take the stored content out and decode it with the corresponding decoding method; through the user's choice among the encoding options, the display device 200 can apply different decoding methods to the subtitle content. In the embodiments of the present application, one encoding method corresponds to a decoding method of the same name, so once the encoding method of the subtitle content is determined, the corresponding decoding method is determined as well.
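  • As a rough illustration of how an encoding option maps onto a decoding step, the following sketch shows one possible implementation on a Java-based device. It is an assumption made for this description rather than the device's actual code: the option names, the charset mapping, and the strict error reporting are all illustrative choices, and charset availability depends on the runtime.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.util.Map;

public class SubtitleDecoder {

    // Hypothetical mapping from options on the subtitle encoding page to Java charset
    // names; the real device may use different identifiers and more options.
    private static final Map<String, String> OPTION_TO_CHARSET = Map.of(
            "utf8", "UTF-8",
            "gb", "GB18030",
            "big5", "Big5",
            "windows", "windows-1252");

    /**
     * Decodes raw subtitle bytes with the charset chosen on the encoding page.
     * Strict error handling makes a mismatched choice fail fast instead of
     * silently producing unreadable text.
     */
    public static String decode(byte[] raw, String encodingOption) throws CharacterCodingException {
        String charsetName = OPTION_TO_CHARSET.getOrDefault(encodingOption, "UTF-8");
        return Charset.forName(charsetName)
                .newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(raw))
                .toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] stored = "该得到的我都得到了".getBytes("GB18030");
        try {
            System.out.println(decode(stored, "utf8"));   // wrong choice -> exception expected
        } catch (CharacterCodingException e) {
            System.out.println("Selected decoding does not match; try another option");
        }
        System.out.println(decode(stored, "gb"));          // matching choice decodes correctly
    }
}
```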
  • the display device 200 can display a font size page as shown in FIG. 13 , which includes several font size options, such as size four, size three, small three, and so on. Users can select the corresponding font size according to their preferences and viewing habits, and then the display device 200 will adjust the size of all the characters, letters or symbols in the subtitle content accordingly.
  • When the user selects the content color option, the display device 200 may display a content color page as shown in FIG. 14, which includes several color options, such as white, black, yellow, red, and blue. Users can select a font color according to their preferences and viewing habits, and the display device 200 will then adjust the color of all characters, letters, or symbols in the subtitle content accordingly.
  • the display device 200 may display a background color page as shown in FIG. 15 , which also includes several color options, such as white, black, yellow, red, blue, etc.
  • The user can select a background color according to his or her preferences and viewing habits, and the display device 200 will then set the background color of the subtitle content accordingly.
  • the background of the subtitle content mentioned in the embodiment of the present application refers to the area where no text, letter or symbol is displayed in the area specified on the user interface where the subtitle content is displayed.
  • the display device 200 normally plays video content, it usually has a fixed area for displaying subtitle content. In some cases, this area may be rectangular, and the text, letters or symbols displayed within the rectangular range cannot completely cover the entire area. In this case, the uncovered area can be regarded as the background of the subtitle content.
  • the color of the background cannot be the same as that of the subtitle content, so as to avoid the problem that the subtitle content cannot be displayed clearly.
  • Fig. 16 is a schematic diagram of an interaction process between the display device 200 and the user according to some embodiments. As shown in Figure 16, the interaction process specifically includes:
  • the user may issue a control command to the display device 200 through the control device 100 to control the display device 200 to display a subtitle property page.
  • After the display device 200 displays the subtitle property page, it immediately displays the preset display area 201 on the current user interface, where the user can preview the effect of the subtitle attribute settings. During this setting operation, if the user has not yet selected any subtitle attribute, the preset display area 201 directly shows the obtained preset subtitle content or the effect of the subtitle attributes the user set last time.
  • After the user selects a target attribute on the subtitle attribute page through the control device 100, the display device 200 sets the corresponding attribute of the preset subtitle content and shows the resulting display effect in the preset display area 201.
  • Throughout the foregoing, the preset display area 201 always displays the same text, letters, or symbols; only the attributes of that content change according to the user's selections.
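  • The interaction of S161 to S163 can be pictured with the minimal sketch below. The PreviewArea interface and the SubtitleStyle fields are invented for illustration and are not taken from the present application; the point is only that each attribute selection immediately re-renders the same preview text.

```java
/** Invented style holder for the preview; the real attribute set may differ. */
class SubtitleStyle {
    int fontSizePx = 36;
    String textColor = "#FFFFFF";
    String backgroundColor = "#000000";
}

/** Hypothetical rendering surface standing in for the preset display area 201. */
interface PreviewArea {
    void render(String text, SubtitleStyle style);
}

public class SubtitlePreviewController {
    private final PreviewArea area;
    private final String previewText;   // preset subtitle content, fixed during setting
    private final SubtitleStyle style;  // last saved style or defaults (S162)

    public SubtitlePreviewController(PreviewArea area, String previewText, SubtitleStyle lastSaved) {
        this.area = area;
        this.previewText = previewText;
        this.style = lastSaved != null ? lastSaved : new SubtitleStyle();
    }

    /** S162: show the preview immediately when the property page opens. */
    public void onPropertyPageOpened() {
        area.render(previewText, style);
    }

    /** S163: apply the selected target attribute and refresh the preview at once. */
    public void onFontSizeSelected(int px)     { style.fontSizePx = px;       area.render(previewText, style); }
    public void onTextColorSelected(String c)  { style.textColor = c;         area.render(previewText, style); }
    public void onBackgroundSelected(String c) { style.backgroundColor = c;   area.render(previewText, style); }

    public static void main(String[] args) {
        PreviewArea console = (t, s) ->
                System.out.println(t + "  [size=" + s.fontSizePx + ", color=" + s.textColor
                        + ", bg=" + s.backgroundColor + "]");
        SubtitlePreviewController c =
                new SubtitlePreviewController(console, "I got everything I should get.", null);
        c.onPropertyPageOpened();
        c.onFontSizeSelected(48);
        c.onTextColorSelected("#FFFF00");
    }
}
```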
  • As can be seen from the foregoing, the display device 200 in the embodiments of the present application can provide the subtitle setting function while playing video content, and the display effect after setting is shown immediately. This makes it convenient for the user to change subtitle attributes at any time, speeds up the display of the effect after subtitles are set, and improves the interactive experience between the user and the display device 200.
  • To solve the problem that the subtitle setting methods described above cannot show the setting effect in real time, embodiments of the present application also provide a subtitle preview method on a display device, which can be applied to the display device 200 of the foregoing embodiments and is executed by the controller 250 in the display device 200. As shown in FIG. 17, the method may specifically include the following steps:
  • Step S101 when the display device 200 is playing video content, in response to a user operation of selecting to open a subtitle property page on a user interface, obtain preset subtitle content from a subtitle file corresponding to the video content.
  • the user interface is displayed by the display 260 of the display device 200 .
  • the preset subtitle content is obtained according to the subtitle content extraction conditions preset in the display device 200 .
  • For example, the preset condition may require the display device 200 to extract the first subtitle entry corresponding to each video content as the preset subtitle content, or to extract the third subtitle entry, and so on. In addition, the preset subtitle content, target subtitle content, and so on in the embodiments of the present application refer to preview subtitle content used for previewing the effect, not to the subtitle content played in real time during video playback.
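  • If, for instance, the subtitle file happened to be in the common SRT text format, the extraction condition could be sketched as follows. This is only an assumed illustration: the present application does not fix the subtitle file format, and the parsing shown here is deliberately simplistic.

```java
import java.util.ArrayList;
import java.util.List;

public class PresetSubtitlePicker {

    /**
     * Very small SRT-style parser: an entry is an index line, a time line,
     * then one or more text lines, separated from the next entry by a blank line.
     * Returns the text of the n-th entry (1-based) as the preset subtitle content.
     */
    public static String pickEntry(String srt, int n) {
        List<String> entries = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : srt.split("\\R")) {
            if (line.isBlank()) {                       // entry boundary
                if (current.length() > 0) { entries.add(current.toString()); current.setLength(0); }
            } else if (!line.matches("\\d+") && !line.contains("-->")) {
                if (current.length() > 0) current.append('\n');
                current.append(line);                   // keep only the text lines
            }
        }
        if (current.length() > 0) entries.add(current.toString());
        if (n < 1 || n > entries.size()) return "";
        return entries.get(n - 1);
    }

    public static void main(String[] args) {
        String srt = "1\n00:00:01,000 --> 00:00:03,000\nLife can't stand the toss\n\n"
                   + "2\n00:00:05,000 --> 00:00:07,000\nI got everything I should get.\n";
        System.out.println(pickEntry(srt, 1));   // first-entry extraction condition
        System.out.println(pickEntry(srt, 2));   // alternative extraction condition
    }
}
```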
  • the user may want to set the subtitles to a display effect more in line with his requirements or preferences.
  • the user may operate on the user interface of the video content currently played by the display device 200 to control the display device 200 to display the subtitle property page.
  • the subtitle property page may be as shown in the aforementioned FIG. 11 .
  • In step S101, after displaying the subtitle property page, the display device 200 can immediately obtain the subtitle file corresponding to the currently played video content and extract the preset subtitle content from it.
  • After the display device 200 displays the subtitle property page, it also immediately displays the preset display area 201 on the current user interface, so that the preset subtitle content can be shown for the user to preview the effect.
  • Since the display device 200 stores the subtitle file using a specific encoding method when it obtains the file, it must also take each piece of subtitle content out of storage and decode it when displaying the specific content of the subtitle file. The decoding method must correspond to the encoding method; otherwise the decoded subtitle content will appear garbled when displayed.
  • In some cases, when the display device 200 decodes the subtitle content it does not know the corresponding encoding method, so it is difficult to determine the decoding method. On current display devices 200, the user usually selects decoding methods one by one on the subtitle encoding page; after the display device 200 decodes the subtitle content with the selected method, the result is shown only at the time the subtitle is actually played. If that playback time is far from the moment the decoding method was selected, the user must wait for a while to see the display effect. If the displayed subtitle is not garbled, the user knows the currently selected decoding method is correct; if it is garbled, the user must re-select a decoding method and then wait for the next subtitle to check the display effect.
  • It can be seen that this approach on current display devices 200, in which the setting effect is shown only when a subtitle appears, also delays the display of the subtitle decoding effect and affects the user experience. Therefore, in some embodiments, as shown in FIG. 18, the interaction process between the display device 200 and the user provided by another embodiment includes:
  • A preset decoding method may also be set in the display device 200. The preset subtitle content is obtained in step S101, and the subtitle property page is displayed on the user interface; after the preset subtitle content is obtained, it is first decoded with the preset decoding method to obtain decoded subtitle content, which is then displayed directly in the preset display area 201 so that the user can immediately see whether decoding with the preset method succeeded, that is, whether the decoded content is garbled.
  • If the decoded subtitle content shown on the user interface is garbled, the user can select another decoding method on the subtitle encoding page of the subtitle property page to decode the preset subtitle content again (that is, decode it with the target decoding method), and the newly decoded subtitle content is displayed in the preset display area 201.
  • If the decoded subtitle content shown on the user interface is not garbled, the user can go on to open other property pages from the subtitle property page and set other subtitle attributes. For example, the user selects the font size attribute, the display device 200 displays the font size page, and the user selects a font size option, whereupon the size of the characters, letters, or symbols in the decoded subtitle content shown in the preset display area 201 is adjusted.
  • In some cases, when subtitle content is stored on the display device 200, a certain flag bit in the encoded subtitle content indicates the encoding method. Therefore, in some embodiments, after the preset subtitle content is acquired in step S101, it may first be determined whether the preset subtitle content includes a flag bit indicating the encoding method. If it does, the encoding method, and the decoding method corresponding to it, can be determined from the flag bit; that decoding method is taken as the candidate decoding method and is then used to decode the preset subtitle content, yielding the decoded subtitle content.
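  • One concrete example of such a flag is a byte-order mark at the start of the stored bytes. The sketch below is an assumption rather than the logic of the present application; it checks a few well-known BOM patterns to pick a candidate decoding and otherwise signals that the preset decoding method should be used.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

public class EncodingFlagProbe {

    /**
     * Inspects the leading bytes of the stored subtitle content for a BOM,
     * which acts as a "flag bit" indicating the encoding method.
     * Returns the matching candidate charset, or empty if no flag is present.
     */
    public static Optional<Charset> candidateFromFlag(byte[] raw) {
        if (raw.length >= 3 && (raw[0] & 0xFF) == 0xEF && (raw[1] & 0xFF) == 0xBB && (raw[2] & 0xFF) == 0xBF) {
            return Optional.of(StandardCharsets.UTF_8);
        }
        if (raw.length >= 2 && (raw[0] & 0xFF) == 0xFE && (raw[1] & 0xFF) == 0xFF) {
            return Optional.of(StandardCharsets.UTF_16BE);
        }
        if (raw.length >= 2 && (raw[0] & 0xFF) == 0xFF && (raw[1] & 0xFF) == 0xFE) {
            return Optional.of(StandardCharsets.UTF_16LE);
        }
        return Optional.empty();    // no flag: fall back to the preset decoding method
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        byte[] withoutBom = {'h', 'i'};
        System.out.println(candidateFromFlag(withBom));     // Optional[UTF-8]
        System.out.println(candidateFromFlag(withoutBom));  // Optional.empty
    }
}
```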
  • If the preset subtitle content obtained in step S101 does not include such a flag bit, the preset subtitle content is decoded using the preset decoding method in the display device 200 to obtain the decoded subtitle content.
  • Step S102 in response to the user operation of selecting the target attribute on the subtitle attribute page, set the corresponding attribute of the preset subtitle content according to the content of the target attribute, and obtain the target subtitle content.
  • As described above, if the decoded subtitle content that the display device 200 shows in the preset display area 201 is garbled, the user can select another decoding method on the subtitle property page to decode the preset subtitle content again. Accordingly, in step S102 it can be determined whether the user selects a target decoding method on the subtitle encoding page of the subtitle property page; if so, the preset subtitle content can be decoded again using the target decoding method to obtain the target subtitle content.
  • Alternatively, if the display device 200 directly displays the preset subtitle content in the preset display area 201, in step S102 the user can also select the target decoding method on the subtitle encoding page of the subtitle property page, so that the display device 200 directly decodes the preset subtitle content with the target decoding method.
  • In some embodiments, in addition to the subtitle decoding method, the user can also set the font size of the subtitle. After the decoded subtitle content is displayed in the preset display area 201, it can be determined whether the user selects a target font size on the font size page of the subtitle property page; if so, the font size of the decoded subtitle content is set according to the target font size to obtain the target subtitle content.
  • Alternatively, if the display device 200 directly displays the preset subtitle content in the preset display area 201, in step S102 the user can also select the target font size on the font size page of the subtitle property page, so that the display device 200 directly adjusts the font size of the preset subtitle content to the target font size.
  • the user can also set the color of the subtitle content. Furthermore, after the decoded subtitle content is displayed in the preset display area 201, it can be determined whether the user selects the target color on the content color page in the subtitle property page. If the user selects the target color, the color of the decoded subtitle content can be set according to the target color, so as to obtain the target subtitle content.
  • Alternatively, if the display device 200 directly displays the preset subtitle content in the preset display area 201, in step S102 the user can also select the target color on the content color page of the subtitle property page, so that the display device 200 directly adjusts the color of the preset subtitle content to the target color.
  • the user can also set the background color of the subtitle content. Furthermore, after the decoded subtitle content is displayed in the preset display area 201, it can be determined whether the user selects a target background color on the background color page of the subtitle property page. If the user selects the target background color, the background color of the decoded subtitle content can be set according to the target background color, so as to obtain the target subtitle content.
  • Alternatively, if the display device 200 directly displays the preset subtitle content in the preset display area 201, in step S102 the user can also select the target background color on the background color page of the subtitle property page, so that the display device 200 directly adjusts the background color of the preset subtitle content to the target background color.
  • In the above, after previewing the decoded subtitle content shown in the preset display area 201, the user can set any one, any several, or all of the attributes such as subtitle encoding, font size, content color, and background color. The target attribute in the embodiments of the present application refers to the target decoding method selected on the subtitle encoding page, the target font size selected on the font size page, the target color selected on the content color page, the target background color selected on the background color page, and so on.
  • Step S103 displaying target subtitle content in a preset display area on the user interface.
  • the video content can be played normally on the user interface of the display device 200 .
  • a preset display area 201 is displayed, and then the display effect for setting the subtitle attribute is displayed immediately.
  • the subtitle content whose attributes are set is the target subtitle content mentioned in the embodiment of the present application. If the display effect of the target subtitle content cannot meet the needs or preferences of the user, the user can also reselect the target attribute on the corresponding attribute page.
  • In some cases, the video content normally played on the display device 200 displays its own subtitle content, and at some moment this normally displayed subtitle content and the target subtitle content in the preset display area 201 may appear on the user interface at the same time. The user then cannot tell which subtitle content is the preview of the setting effect, which harms the user experience.
  • the display device 200 may be controlled not to display normally played subtitle content during the process of setting subtitle attributes by the user.
  • In some embodiments, to prevent the subtitle content of the normally playing video from interfering with the target subtitle content in the preset display area 201, the video content being played can also be paused while the user sets subtitle attributes. For example, as shown in FIG. 19, when the user chooses to perform a subtitle attribute setting operation on the interface of the currently playing video content, the display device 200 pauses playback while displaying the subtitle attribute page, and the preset display area 201 is shown on the now-paused user interface. In addition, the actual subtitle content is not displayed on the paused user interface.
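  • A minimal sketch of this pause-and-preview behavior follows. The VideoPlayer interface and its method names are invented for illustration and do not correspond to any specific player API.

```java
/** Hypothetical player abstraction; real display devices expose different APIs. */
interface VideoPlayer {
    void pause();
    void resume();
    void setLiveSubtitlesVisible(boolean visible);
}

public class SubtitleSettingSession {
    private final VideoPlayer player;
    private boolean open;

    public SubtitleSettingSession(VideoPlayer player) {
        this.player = player;
    }

    /** Opening the subtitle property page: pause the video and hide the real subtitles. */
    public void open() {
        player.pause();
        player.setLiveSubtitlesVisible(false);
        open = true;
        // ...the subtitle property page and preset display area 201 would be shown here
    }

    /** Closing the page: resume playback; normal subtitles then use the chosen attributes. */
    public void close() {
        if (!open) return;
        player.setLiveSubtitlesVisible(true);
        player.resume();
        open = false;
    }

    public static void main(String[] args) {
        VideoPlayer fake = new VideoPlayer() {
            public void pause()  { System.out.println("video paused"); }
            public void resume() { System.out.println("video resumed"); }
            public void setLiveSubtitlesVisible(boolean v) { System.out.println("live subtitles visible: " + v); }
        };
        SubtitleSettingSession session = new SubtitleSettingSession(fake);
        session.open();
        session.close();
    }
}
```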
  • The position of the preset display area 201 on the user interface is not fixed; it can also be shown at other positions that do not block the subtitle property page, for example on the left side of the user interface as shown in FIG. 20, or at the top of the user interface as shown in FIG. 21.
  • In the foregoing embodiments, if the user sets a subtitle background color, the background-colored area will cover part of the video content whenever subtitles are displayed, which can affect the viewing experience. To avoid this, in some embodiments the display device 200 can also let the user set the transparency of the subtitle background, that is, a background transparency attribute and a transparency page are added to the subtitle property page. If the user wants to set the subtitle background color without having the background block the video content, the user can select a target value on the transparency page to set the transparency of the subtitle background.
  • FIG. 22 is a schematic diagram of a transparency page on the user interface of the display device 200 according to some embodiments. As shown in FIG. 22, the transparency page includes several transparency values, such as 0%, 10%, 30%, 50%, 80%, and 100%. When the user selects different transparency values, the display device 200 displays the subtitle background with different transparency. Taking a transparency of 80% as an example, the display effect of the subtitle background is shown in FIG. 23.
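  • The transparency percentages could map onto the alpha channel of the background color roughly as below; treating 0% as opaque and 100% as fully transparent is an assumption made for this sketch.

```java
public class SubtitleBackgroundAlpha {

    /**
     * Combines an opaque RGB background color with a transparency percentage
     * (0% = opaque, 100% = fully transparent) into a single ARGB value.
     */
    public static int withTransparency(int rgb, int transparencyPercent) {
        int clamped = Math.max(0, Math.min(100, transparencyPercent));
        int alpha = Math.round(255 * (100 - clamped) / 100f);
        return (alpha << 24) | (rgb & 0x00FFFFFF);
    }

    public static void main(String[] args) {
        int black = 0x000000;
        System.out.printf("80%% transparent black background: #%08X%n", withTransparency(black, 80));
        System.out.printf("Opaque black background:          #%08X%n", withTransparency(black, 0));
    }
}
```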
  • different processing modules may also be set in the display device 200 to realize acquisition or processing of different contents, such as a subtitle parsing module, a subtitle setting module, a subtitle display module, and the like.
  • Among them, the subtitle parsing module can parse the subtitle content, the subtitle setting module can set the attributes of the subtitle content according to the user's selections or preset content, and the subtitle display module can display the subtitle content.
  • the interaction process between the display device 200 and the user provided by another embodiment includes:
  • the subtitle setting module calls the subtitle parsing module, so as to obtain a piece of real subtitle content in advance.
  • The subtitle setting module sets the decoding method of the subtitle, and the subtitle parsing module decodes the subtitle content with this decoding method; the decoded subtitle content is then sent to the subtitle display module for display.
  • Through the above steps, the user can instantly preview the display effect of the subtitle decoding setting. If the decoding method is set correctly, the subtitle content is displayed correctly; otherwise garbled characters are shown, and the user needs to set another decoding method and preview again.
  • In addition, on the basis of the above subtitle parsing module, subtitle setting module, and subtitle display module, the display device 200 can also add an encoding recognition module, which identifies the encoding flag bit in the subtitle content and determines which encoding method the subtitle content uses.
  • the interaction process between the display device 200 and the user provided by another embodiment includes:
  • the subtitle setting module calls the subtitle parsing module, and then obtains a piece of real subtitle content in advance.
  • The subtitle parsing module sends the subtitle content to the encoding recognition module.
  • The encoding recognition module parses the encoding method of the subtitle content and returns the result to the subtitle setting module.
  • The subtitle setting module updates the user interface according to the parsed encoding method, highlights the corresponding encoding option on the subtitle encoding page, and sets the decoding method corresponding to that encoding method in the subtitle parsing module.
  • The subtitle parsing module decodes the subtitle content according to the set decoding method and then sends the decoded subtitle content to the subtitle display module for display.
  • the subtitle setting module may also send the corresponding attribute value or attribute option to the subtitle display module for display.
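  • The cooperation of the parsing, encoding recognition, setting, and display modules in FIG. 24 and FIG. 25 could be wired roughly as follows. The interfaces are invented for illustration; in the device they would all run under the control of the controller 250.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

/** Invented module interfaces mirroring the roles described for FIG. 24 and FIG. 25. */
interface SubtitleParsingModule {
    byte[] fetchOneRealSubtitle();                      // grab one real subtitle entry in advance
    String decode(byte[] raw, Charset charset);
}

interface EncodingRecognitionModule {
    Optional<Charset> recognize(byte[] raw);            // e.g. from a flag such as a BOM
}

interface SubtitleDisplayModule {
    void show(String text);
}

public class SubtitleSettingModule {
    private final SubtitleParsingModule parser;
    private final EncodingRecognitionModule recognizer;
    private final SubtitleDisplayModule display;
    private final Charset preset = StandardCharsets.UTF_8;

    public SubtitleSettingModule(SubtitleParsingModule p, EncodingRecognitionModule r, SubtitleDisplayModule d) {
        this.parser = p;
        this.recognizer = r;
        this.display = d;
    }

    /** FIG. 25 style flow: fetch an entry, let the recognizer choose the decoding, preview it. */
    public void previewWithRecognizedEncoding() {
        byte[] raw = parser.fetchOneRealSubtitle();
        Charset chosen = recognizer.recognize(raw).orElse(preset);
        // ...the corresponding option on the subtitle encoding page would be highlighted here
        display.show(parser.decode(raw, chosen));
    }

    /** FIG. 24 style flow: the user picks a decoding method manually and previews it at once. */
    public void previewWithUserEncoding(Charset userChoice) {
        byte[] raw = parser.fetchOneRealSubtitle();
        display.show(parser.decode(raw, userChoice));
    }

    public static void main(String[] args) {
        SubtitleParsingModule parser = new SubtitleParsingModule() {
            public byte[] fetchOneRealSubtitle() { return "sample subtitle".getBytes(StandardCharsets.UTF_8); }
            public String decode(byte[] raw, Charset cs) { return new String(raw, cs); }
        };
        EncodingRecognitionModule recognizer = raw -> Optional.of(StandardCharsets.UTF_8);
        SubtitleDisplayModule display = text -> System.out.println("preview: " + text);
        new SubtitleSettingModule(parser, recognizer, display).previewWithRecognizedEncoding();
    }
}
```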
  • Although the aforementioned subtitle parsing module, subtitle setting module, subtitle display module, and encoding recognition module can be added in the embodiments of the present application, these processing modules all operate under the control of the controller 250 in the display device 200 to realize their specific processing functions.
  • After the subtitle attributes are set, the user can continue to operate the display device 200 to close the current subtitle property page. Therefore, after step S103, if the display device 200 receives the user's operation of choosing to close the subtitle property page, it can close the preset display area 201 while closing the subtitle property page, and the display device 200 is controlled to set and display all subtitle content corresponding to the current video content according to the attributes selected by the user.
  • If the video content on the display device 200 was paused while the subtitle attributes were being set, then after the display device 200 closes the subtitle property page and the preset display area 201, it also controls the video content to resume playing and displays all subtitle content corresponding to the video content according to the attributes selected by the user.
  • It should be noted that the subtitle attributes mentioned in the embodiments of the present application include, but are not limited to, the attributes listed above. In actual use of the display device 200, any display requirement the user has for the subtitle content can serve as a subtitle attribute, and the way it is set can refer to the foregoing embodiments, which will not be repeated here.
  • From the above, embodiments of the present application provide a subtitle preview method on a display device and the display device itself. The user can set the subtitle attributes of video content while the display device 200 is playing that content, and the display effect after setting is shown directly and in real time on the display device 200. This makes it convenient for the user to adjust the settings promptly, avoids the subtitle setting effect being shown only when the video content happens to display subtitle content, saves the time the user would spend waiting for the effect to appear, and thus ensures the user's experience of using the display device 200.
  • In some embodiments, the present application also provides a computer-readable non-volatile storage medium that can store a program; when the program is executed, it may include the program steps involved in the subtitle display methods of the above embodiments.
  • the computer storage medium may be a magnetic disk, an optical disk, a read-only memory (English: Read-Only Memory, ROM for short), or a random access memory (English: Random Access Memory, RAM for short).

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Controls And Circuits For Display Device (AREA)

Abstract

The present application discloses a display device. The display device receives video data and subtitle data synchronously sent by a resource server, the subtitle data including subtitle information, time information, and target information for indicating the sounding object of the subtitle information; according to the video data and the target information, the relative position and size information of the sounding object in the video picture are calculated; and according to the time information and the relative position and size information of the sounding object in the video picture, the subtitle information is displayed on the video picture.

Description

显示设备
相关申请的交叉引用
本申请要求在2021年10月27日提交、申请号为202111255290.0;在2021年10月29日提交、申请号为202111280246.5的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及显示设备领域,尤其涉及一种显示设备。
背景技术
显示设备在播放视频资源时,视频画面中一般显示字幕信息,字幕信息与视频画面中人物的人声和对话同步,通过显示字幕信息,一方面提供语言翻译功能,将人声转化为用户习惯的语言文字,另一方面还能为听力障碍人士提供帮助,便于用户理解视频播放的内容。常规的字幕信息是在视频画面上的固定位置(例如底部)逐句显示,但这种字幕显示方式对于具有听力障碍的用户并不友好,比如视频中多人同时发声,听力障碍人士就无法获知当前字幕对应于哪一人物。
发明内容
本申请实施方式提供一种显示设备,包括:显示器,用于显示视频及其字幕信息;通信器,用于与资源服务器通信连接;控制器,被配置为执行:接收所述资源服务器同步发送的视频数据和字幕数据,所述字幕数据包括字幕信息、时间信息和用于指示所述字幕信息的发声对象的目标信息;根据所述视频数据和所述目标信息,计算所述发声对象在视频画面中的相对位置和尺寸信息;根据所述时间信息、所述发声对象在视频画面中的相对位置和尺寸信息,控制显示器在所述视频画面上显示所述字幕信息。
本申请实施方式还提供一种用于显示设备的方法,所述方法包括:接收资源服务器同步发送的视频数据和字幕数据,所述字幕数据包括字幕信息、时间信息和用于指示所述字幕信息的发声对象的目标信息;根据所述视频数据和所述目标信息,计算所述发声对象在视频画面中的相对位置和尺寸信息;根据所述时间信息、所述发声对象在视频画面中的相对位置和尺寸信息,在所述视频画面上显示所述字幕信息。
附图说明
图1为显示设备的使用场景;
图2为控制装置100的硬件配置框图;
图3为显示设备200的硬件配置框图;
图4为显示设备200中软件配置图;
图5(a)为改进前男人和女人同时发声时,字幕信息1、2的显示效果图;
图5(b)为改进前女人单独发声时,字幕信息3的显示效果图;
图5(c)为改进前切换到男人单独发声时,字幕信息4的显示效果图;
图5(d)为改进前男人和女人出门时的视频画面显示效果图;
图6(a)为改进后男人和女人同时发声时,字幕信息1、2的显示效果图;
图6(b)为改进后女人单独发声时,字幕信息3的显示效果图;
图6(c)为改进后切换到男人单独发声时,字幕信息4的显示效果图;
图6(d)为改进后男人和女人出门时,字幕信息5的显示效果图;
图7为一种字幕显示方法的流程图;
图8为字幕显示的逻辑架构示意图;
图9为根据一些实施例的显示设备200上播放带有字幕内容的视频内容的示意图;
图10为根据一些实施例的显示设备200用户界面上显示带有字幕属性页面入口的菜单页面的示意图;
图11为根据一些实施例的显示设备200用户界面上显示字幕属性页面的示意图;
图12为根据一些实施例的显示设备200用户界面上字幕编码页面的示意图;
图13为根据一些实施例的显示设备200用户界面上字号页面的示意图;
图14为根据一些实施例的显示设备200用户界面上内容颜色页面的示意图;
图15为根据一些实施例的显示设备200用户界面上背景颜色页面的示意图;
图16为根据一些实施例的显示设备200与用户交互过程的一种示意图;
图17为根据一些实施例的显示设备上字幕预览方法的流程图;
图18为根据一些实施例的显示设备200与用户交互过程的另一种示意图;
图19为根据一些实施例的显示设备200上暂停播放视频内容的用户界面示意图;
图20为根据一些实施例的显示设备200上预设显示区域位置的一种示意图;
图21为根据一些实施例的显示设备200上预设显示区域位置的另一种示意图;
图22为根据一些实施例的显示设备200用户界面上透明度页面的示意图;
图23为根据一些实施例的显示设备200上显示字幕背景透明为80%的示意图;
图24为根据一些实施例的显示设备200上各模块之间字幕内容处理的一种示意图;
图25为根据一些实施例的显示设备200上各模块之间字幕内容处理的另一种示意图。
具体实施方式
为使本申请的目的和实施方式更加清楚,下面将结合本申请示例性实施例中的附图,对本申请示例性实施方式进行清楚、完整地描述,显然,描述的示例性实施例仅是本申请一部分实施例,而不是全部的实施例。
需要说明的是,本申请中对于术语的简要说明,仅是为方便理解接下来描述的实施方式,而不是意图限定本申请的实施方式。除非另有说明,这些术语应当按照其普通和通常的含义理解。
图1为根据实施例的显示设备的使用场景的示意图。如图1所示,显示设备200还与服务器400进行数据通信,用户可通过智能设备300或控制装置100操作显示设备200。
在一些实施例中,控制装置100可以是遥控器,遥控器和显示设备的通信包括红外协议通信或蓝牙协议通信,及其他短距离通信方式中的至少一种,通过无线或有线方式来控制显示设备200。
在一些实施例中,智能设备300可以包括移动终端、平板电脑、计算机、笔记本电脑,AR/VR设备等中的任意一种。
在一些实施例中,也可以使用智能设备300以控制显示设备200。例如,使用在智能设备上运行的应用程序控制显示设备200。
在一些实施例中,也可以使用智能设备300和显示设备进行数据的通信。
在一些实施例中,显示设备200还可以采用除了控制装置100和智能设备300之外的方式进行控制,例如,可以通过显示设备200设备内部配置的获取语音指令的模块直接接收用户的语音指令控制,也可以通过显示设备200设备外部设置的语音控制装置来接收用户的语音指令控制。
在一些实施例中,显示设备200还与服务器400进行数据通信。可允许显示设备200通过局域网(LAN)、无线局域网(WLAN)和其他网络进行通信连接。服务器400可以向显示设备200提供各种内容和互动。
图2为根据示例性实施例的控制装置100的配置框图。如图2所示,控制装置100包括控制器110、通信接口130、用户输入/输出接口140、存储器、供电电源。控制装置100可接收用户的输入操作指令,且将操作指令转换为显示设备200可识别和响应的指令,起用用户与显示设备200之间交互中介作用。
在一些实施例中,通信接口130用于和外部通信,包含WIFI芯片,蓝牙模块,NFC或可替代模块中的至少一种。
在一些实施例中,用户输入/输出接口140包含麦克风,触摸板,传感器,按键或可替代模块中的至少一种。
图3为根据示例性实施例的显示设备200的硬件配置框图。
在一些实施例中,显示设备200包括调谐解调器210、通信器220、检测器230、外部装置接口240、控制器250、显示器260、音频输出接口270、存储器、供电电源、用户接口中的至少一种。
在一些实施例中控制器包括中央处理器,视频处理器,音频处理器,图形处理器,RAM,ROM,用于输入/输出的第一接口至第n接口。
在一些实施例中,显示器260包括用于呈现画面的显示屏组件,以及驱动图像显示的驱动组件,用于接收源自控制器输出的图像信号,进行显示视频内容、图像内容以及菜单操控界面的组件以及用户操控UI界面等。
在一些实施例中,显示器260可为液晶显示器、OLED显示器、以及投影显示器中的至少一种,还可以为一种投影装置和投影屏幕。
在一些实施例中,调谐解调器210通过有线或无线接收方式接收广播电视信号,以及从多个无线或有线广播电视信号中解调出音视频信号,如以及EPG数据信号。
在一些实施例中,通信器220是用于根据各种通信协议类型与外部设备或服务器进行通信的组件。例如:通信器可以包括Wifi模块,蓝牙模块,有线以太网模块等其他网络通信协议芯片或近场通信协议芯片,以及红外接收器中的至少一种。显示设备200可以通过通信器220与控制装置100或服务器400建立控制信号和数据信号的发送和接收。
在一些实施例中,检测器230用于采集外部环境或与外部交互的信号。例如,检测器230包括光接收器,用于采集环境光线强度的传感器;或者,检测器230包括图像采集器,如摄像头,可以用于采集外部环境场景、用户的属性或用户交互手势,再或者,检测器230包括声音采集器,如麦克风等,用于接收外部声音。
在一些实施例中,外部装置接口240可以包括但不限于如下:高清多媒体接口接口(HDMI)、模拟或数据高清分量输入接口(分量)、复合视频输入接口(CVBS)、USB输入接口(USB)、RGB端口等任一个或多个接口。也可以是上述多个接口形成的复合性的输入/输出接口。
在一些实施例中,控制器250和调谐解调器210可以位于不同的分体设备中,即调谐解调器210也可在控制器250所在的主体设备的外置设备中,如外置机顶盒等。
在一些实施例中,控制器250,通过存储在存储器上中各种软件控制程序,来控制显示设备的工作和响应用户的操作。控制器250控制显示设备200的整体操作。例如:响应于接收到用于选择在显示器260上显示UI对象的用户命令,控制器250便可以执行与由用户命令选择的对象有关的操作。
在一些实施例中控制器包括中央处理器(Central Processing Unit,CPU),视频处理器,音频处理器,图形处理器(Graphics Processing Unit,GPU),RAM Random Access Memory,RAM),ROM(Read-Only Memory,ROM),用于输入/输出的第一接口至第n接口,通 信总线(Bus)等中的至少一种。
CPU处理器。用于执行存储在存储器中操作系统和应用程序指令,以及根据接收外部输入的各种交互指令,来执行各种应用程序、数据和内容,以便最终显示和播放各种音视频内容。CPU处理器,可以包括多个处理器。如,包括一个主处理器以及一个或多个子处理器。
在一些实施例中,用户可在显示器260上显示的图形用户界面(GUI)输入用户命令,则用户输入接口通过图形用户界面(GUI)接收用户输入命令。或者,用户可通过输入特定的声音或手势进行输入用户命令,则用户输入接口通过传感器识别出声音或手势,来接收用户输入命令。
在一些实施例中,用户输入接口280,为可用于接收控制输入的接口(如:显示设备本体上的实体按键,或其他等)。
参见图4,在一些实施例中,将系统分为四层,从上至下分别为应用程序(Applications)层(简称“应用层”),应用程序框架(Application Framework)层(简称“框架层”),安卓运行时(Android runtime)和系统库层(简称“系统运行库层”),以及内核层。
如图4所示,本申请实施例中应用程序框架层包括管理器(Managers),内容提供者(Content Provider)等,其中管理器包括以下模块中的至少一个:活动管理器(Activity Manager)用与和系统中正在运行的所有活动进行交互;位置管理器(Location Manager)用于给系统服务或应用提供了系统位置服务的访问;文件包管理器(Package Manager)用于检索当前安装在设备上的应用程序包相关的各种信息;通知管理器(Notification Manager)用于控制通知消息的显示和清除;窗口管理器(Window Manager)用于管理用户界面上的括图标、窗口、工具栏、壁纸和桌面部件。
以上实施例介绍了显示设备的硬件/软件架构以及功能实现等内容。对于该显示设备,比如可从外部信号源(如机顶盒等)或网络获取视频资源,并加载播放该视频资源。显示设备在播放视频资源时,除播放视频数据,一般还同步播放字幕信息,字幕信息是发声对象说出的声音内容所转换成的文本,字幕信息可以根据用户语言习惯进行显示或翻译,例如某影片的原音为英文,面向中国大陆用户,可将原音翻译为简体中文语言的文字,进而以简体中文显示字幕信息。此外,通过将音频内容转化为可视化的字幕信息,也便于听力障碍人士理解视频资源所传达的情节和内容。
在一些实施例中,运营商除提供视频资源,还可一并提供该视频资源的字幕数据,字幕数据中包括若干条字幕信息,并为每条字幕信息配置对应的时间信息,所述时间信息用于指示字幕信息所显示的时间节点,例如某视频资源的总播放时长为30分钟,字幕信息1被配置在视频播放进度为第50秒的时间节点处显示。
在一些实施例中,每个视频资源可以关联有一个播放时间轴,该播放时间轴的长度等 于为视频总时长,在该播放时间轴上对视频资源所包括的各条字幕信息的显示节点进行标记,在每一标记的节点处可记录要显示的字幕信息ID,从而使显示设备获知在节点处应显示哪个或哪些字幕信息。播放时间轴上每个标记的节点可映射于一条或多条字幕信息,当节点与字幕信息是一对多的映射关系时,说明节点时刻存在多个对象同时发出声音,而同一条字幕信息不可映射于多个节点。
在一些实施例中,显示设备在接收视频数据时,同步接收字幕数据,并根据当前时间和运营商预设好的时间信息,来控制字幕显示。参照图5(a)~图5(d)示例,假设在视频的环境场景中包括两个人物间的对话,分别为男人和女人,其中图5(a)对应的时间节点为19:30:31,在该时间节点处,视频画面中男人和女人同时发声,由此显示两条并列的字幕信息1和字幕信息2,字幕信息1对应于男人的发声,字幕信息2对应于女人的发声,例如字幕信息1为“出去吃晚饭吧”,字幕信息2为“有点饿了,该吃晚饭啦”。
其中图5(b)对应的时间节点为19:30:32,在该时间节点处,视频画面中仅女人单独发声,由此显示一条字幕信息3,例如字幕信息3为“好呀,我想吃火锅”。图5(c)对应的时间节点为19:30:33,视频画面中仅男人单独发声,由此产生一条字幕信息4,字幕信息4是对字幕信息3的应答,例如字幕信息4为“OK,我请客,咱们走吧”。之后男人和女人出门时,两人均未发声,因此显示如图5(d)的效果图,无字幕显示。
由图5(a)~图5(d)的视频及其字幕的显示示例可以看出,常规来说,字幕信息的显示位置及格式一般是固定模式的,例如本申请附图中字幕信息始终显示于视频画面的底部,字幕信息中文本的字体、字号、字形和字体颜色等板式也统一不变,其字幕显示模式较为单一,不够丰富和形象;对于5(a)的示例,听力良好的用户自然能根据男女声音的音色、音调等差异,准确区分出字幕信息1是男人发出的,字幕信息2是女人发出的,而对于听力障碍人士来说,他们能看到视频画面中男人和女人的嘴部都是发声时的活动状态,并且人物位置随机分布,字幕信息1和字幕信息2所表达的内容相似,导致听力障碍人士根本无法分辨两条并列显示的字幕信息分别由谁说出;此外,场景中环境产生的物理声响是无字幕显示的,例如5(d)的示例中,男女二人出门过程中,会产生“砰”的关门声,但关门声无拟声的字幕,听力障碍人士无法通过听觉和视觉感知环境所触发的声音。由此可见,普通的字幕显示方式对听力障碍人士并不友好,影响其对视频的观看和理解。
为克服常规字幕显示方式所存在的缺陷,在一些实施方式中,运营商在配置字幕数据时,除设置字幕信息和时间信息外,还增设目标信息,目标信息用于指示该条字幕信息的发声对象,本申请中所述发声对象不仅为具备发声能力的生物对象,例如人物、动物等,还可以是环境中能够产生物理声响的非生物对象,例如雷雨、河流、汽车、门、广播喇叭等。目标信息中具体包括用于描述发声对象的形象特征或身份标识,例如男人、女人、猫、飞机等。
在一些实施方式中,对于人物类型的发声对象,可以基于其在视频资源中实际的声音内容,进行如语义识别等语音处理,从而转换成相应的文本信息;对于动物类型、非生物类型等发声对象,可以将视频资源中的声音进行拟声生成字幕信息,例如对于猫,可以转换为“喵~喵~喵”的叫声,对于雷,可以转换为“轰隆~轰隆”的雷声,对于相机,可以转换为“咔嚓”的拍照声,等等。也就是说,改进后的字幕数据中不仅包括人声的字幕信息及其显示节点,还包括其他能够产生声音的非人声对象的字幕信息及其显示节点,这样显示设备端解析字幕数据后,不仅能显示人声字幕,还能显示如动物、昆虫、环境等非人声字幕,使得字幕显示更加丰富和形象,更贴近实际视频声音的播放内容和效果。
在一些实施方式中,所述目标信息还可包括对发声对象的位置分布的描述,通过形象特征和位置分布,可相对更精准地指示和定位发声对象,例如在图5(a)的示例中,包括三个对象元素,即男人、女人和门,三者近似于同行站立,则可通过站位排序,设置每个对象元素的位置分布,例如若按照从左向右的顺序,则门的位置分布为“左一”,男人的位置分布为“左二”,女人的位置分布为“左三”;若按照从右向左的顺序,则女人的位置分布为“右一”,男人的位置分布为“右二”,门的位置分布为“右三”。
在一些实施方式中,也可根据视频图像中包括的对象元素的数量、各对象元素间的位置分布规律等方面,将视频图像的区域进行阵列划分,例如在图5(a)的示例中,包括三个近乎呈行分布的对象元素,则将视频图像划分为一行三列,则位置分布=[i′,j′],其中i′表示对象元素所处的行序号,j′表示对象元素所处的列序号,则门的位置分布为[1,1],男人的位置分布为[1,2],女人的位置分布为[1,3]。在视频图像中哪个或哪些对象元素发声,则将该对象元素的位置分布填加到对应的字幕数据的目标信息中,以图5(b)为例,仅有女人单独发声,则将女人的位置分布[1,3]填加到字幕信息3所对应的目标信息3中。需要说明的是,发声对象的位置分布的定义形式不限于本申请实施例的示例。
在一些实施方式中,所述目标信息还可包括发声对象的状态描述信息,所述状态描述信息用于描述发声对象在发声时所处的状态,例如对于人物和动物等生物,则状态描述信息包括但不限于平静、激动、开心、悲伤等心理/情绪状态,以及疲惫、疼痛、困乏等身体状态,以及生物的活动状态等,这是由于身心状态和活动状态都可能会影响生物发声时的语气、语调、音量等,听力良好人士自然能通过听觉直观感知视频中生物对象的状态,然而听力障碍人士主要依赖字幕来理解和感受视频内容,但常规字幕的显示效果固定且单一,无法向听力障碍人士传达发声对象的情绪和状态波动,因此在目标信息中填加对字幕发声对象的状态描述和定义,使得显示设备解析字幕数据后,能够以状态描述信息作为参考,对字幕信息实施相匹配的展示特效,例如发声对象为男人,其情绪状态为“激动”,则对其字幕信息实施的特效可以是字幕文字为红色,以及相对增大字号;又例如,发声对象为老鼠,其活动状态为“悄悄啃食”,则实施的特效可以是使字幕信息若隐若现,以传达声 音隐隐的、轻微的效果。
在一些实施方式中,对于如门、汽车、河流、雷等环境中的非生物元素,则状态描述信息包括但不限于运行状态、工作状态、自然状态等,则显示设备同样可以状态描述信息为参考,来适配为字幕实施的特效,例如发声对象为门,其状态为“关门”,则对“砰”这一关门时的字幕信息实施类似于爆炸的特效,以传达声音突变响亮的效果;又例如,发声对象为扩音器,其状态为“正在播放广播”,则可选择实施的特效是将字幕信息嵌入于小喇叭图标中,从而呈现出“广播中”的效果。通过为字幕信息实施与状态描述信息相适配的特效,从而使字幕显示更生动形象,更利于向用户传达视频中发声对象的状态,提升字幕显示的精准性和丰富性,从而为用户,尤其是听力障碍人士提供更好的视频及其字幕的观看体验。需要说明的是,字幕特效的设置及实施方式不限于本申请的示例。
在一些实施方式中,显示设备在获取视频数据和字幕数据后,一方面需要对视频数据进行解码及播放控制,所述播放控制包括但不限于根据用户操作执行的起播、暂停播放、seek、倍速播放等;另一方面则是在视频起播后,根据视频播放进程和预先已打点标记的时间信息,控制字幕显示。
在一些实施方式中,显示设备对字幕数据进行解析,获取到字幕信息序列,所述字幕信息序列包括当前视频资源所具有全部字幕信息,以及每条字幕信息所对应的时间信息和目标信息,可选地,按照显示字幕的时间顺序,对序列中的各条字幕信息进行排序。
为便于描述,仅以视频中某一简化的片段为例,示例的字幕信息序列为:
[字幕1:{发声对象:男人;位置:左二;状态:平静;时间节点:19:30:31;字幕信息1:“出去吃晚饭吧”};
字幕2:{发声对象:女人;位置:左三;状态:平静;时间节点:19:30:31;字幕信息2:“有点饿了,该吃晚饭啦”};
字幕3:{发声对象:女人;位置:左三;状态:兴奋;时间节点:19:30:32;字幕信息3:“好呀,我想吃火锅”};
字幕4:{发声对象:男人;位置:左二;状态:兴奋;时间节点:19:30:33;字幕信息4:“OK,我请客,咱们走吧”};
字幕5:{发声对象:门;位置:右一;状态:被关闭;时间节点:19:31:15;字幕信息5:“砰”}]
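As an illustrative sketch only (field names, types, and the use of Java records are assumptions, not part of the application), the subtitle information sequence above could be held on the device side in a structure such as the following:

```java
import java.time.LocalTime;
import java.util.List;

/** Illustrative record for one entry of the subtitle information sequence. */
record SubtitleEntry(
        String speaker,        // 发声对象, e.g. "男人", "女人", "门"
        String position,       // 位置分布, e.g. "左二" or a grid index such as [1,2]
        String state,          // 状态描述信息, e.g. "平静", "兴奋", "被关闭"
        LocalTime timeNode,    // 时间信息
        String text) {         // 字幕信息
}

public class SubtitleSequenceDemo {
    public static void main(String[] args) {
        List<SubtitleEntry> sequence = List.of(
                new SubtitleEntry("男人", "左二", "平静", LocalTime.of(19, 30, 31), "出去吃晚饭吧"),
                new SubtitleEntry("女人", "左三", "平静", LocalTime.of(19, 30, 31), "有点饿了，该吃晚饭啦"),
                new SubtitleEntry("女人", "左三", "兴奋", LocalTime.of(19, 30, 32), "好呀，我想吃火锅"),
                new SubtitleEntry("男人", "左二", "兴奋", LocalTime.of(19, 30, 33), "OK，我请客，咱们走吧"),
                new SubtitleEntry("门", "右一", "被关闭", LocalTime.of(19, 31, 15), "砰"));
        // Entries sharing a time node are shown simultaneously, each near its own speaker.
        sequence.stream()
                .filter(e -> e.timeNode().equals(LocalTime.of(19, 30, 31)))
                .forEach(e -> System.out.println(e.speaker() + ": " + e.text()));
    }
}
```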
根据上述视频片段以及字幕信息序列,提供的字幕显示效果如图6(a)~图6(d)的示例。其中,字幕1和字幕2对应于同一个时间节点19:30:31,即在19:30:31节点处存在多人同时发声,则参照图6(a)的示例,在19:30:31这一时间节点处,在男人所在的左二位置处的局部区域内显示字幕信息1“出去吃晚饭吧”,以及,在女人所在的左三位置处的局部区域内显示字幕信息2“有点饿了,该吃晚饭啦”,由于视频中男人和女人均处于平静 状态,因此对字幕信息1和字幕信息2同时实施与平静状态相匹配的特效,例如特效为字幕文字颜色为绿色,而字体和字形等可采用默认格式,字号则可根据发声对象的尺寸大小进行适配。由图6(a)可以看出,通过将每条字幕信息与发声对象的位置进行定位关联,实现在多对象同时发声的视频场景内,用户通过字幕信息的显示位置,即可快速锁定各条字幕所指向的发声对象,克服了听力障碍人士经常遇到的字幕与发声对象无法匹配的问题。
当视频播放进程达到19:30:32这一时间节点时,参照图6(b)的示例,停止显示字幕信息1和字幕信息2,并在女人所在的左三位置处的局部区域内显示字幕信息3“好呀,我想吃火锅”,由于此时视频中女人转变为兴奋状态,因此需要对字幕信息3实施与兴奋状态相匹配的特效,例如特效为字幕文字颜色为红色,字体相对放大。
当视频播放进程达到19:30:33这一时间节点时,参照图6(c)的示例,停止显示字幕信息3,并在男人所在的左二位置处的局部区域内显示字幕信息4“OK,我请客,咱们走吧”,由于此时视频中男人转变为兴奋状态,因此需要对字幕信息4实施与兴奋状态相匹配的特效,例如特效为字幕文字颜色为红色,字体相对放大。
当视频播放进程达到19:31:15这一时间节点时,视频场景跳转至男女二人出门后关门,参照图6(d)的示例,停止显示字幕信息4,并在门当前所在的右一位置处的局部区域内显示字幕信息5“砰”,由于此时视频呈现为用户进行关门动作,因此需要对字幕信息5实施与关门相匹配的特效,例如呈现类似于爆炸的效果,并且将字体相对放大。由图6(d)可以看出,本申请还可对环境中产生物理声响的非生物对象显示拟声字幕,使得听力障碍人士能够获知环境中的其他发声来源,提升用户体验。
在一些实施方式中,显示设备按时序、遍历显示字幕信息序列中包括的字幕信息,在此期间,显示设备可判断序列中的全部字幕信息是否都已显示完毕,若都已显示完毕,则根据用户对下一视频资源的选定和点播操作,继续获取下一视频资源的视频数据和字幕数据,并按照上述示例的各实现方式控制字幕显示;若字幕显示序列尚未显示完毕,则继续根据视频播放进程,控制序列中字幕信息的显示,直至序列中全部字幕信息都显示完毕。
本申请提供的UI附图仅是为便于描述而作出的示意,不代表实际产品设计,字幕格式及显示效果应以实际应用和设计为准。
在一些实施方式中,参照图7提供的字幕显示方法,所述方法由显示设备端的控制器250执行,所述方法包括如下步骤:
步骤S01,接收资源服务器同步发送的视频数据和字幕数据。
其中,所述资源服务器是对视频运营商的概括,相当于提供视频资源的信号源,所述资源服务器可以是网络资源的服务器,也可以是如有线广播电视、电视盒子等运营商的服务器。参照前述相关实施例的描述,所述字幕数据包括以文本形式表达视频声音内容的字幕信息、用于指示字幕显示节点的时间信息,以及用于指示字幕信息对应的发声对象的目 标信息。可选地,目标信息包括发声对象的形象特征/身份标识、位置分布和状态描述信息。字幕数据可内置于视频数据中,或者,字幕数据也可作为独立数据与视频数据进行关联绑定。
步骤S02,根据所述视频数据和所述目标信息,计算所述发声对象在视频画面中的相对位置和尺寸信息。
通过计算当前视频图像中发声对象的相对位置,从而为字幕信息的显示位置提供参照依据;视频图像中发声对象的尺寸信息,则用于使显示设备确定字幕信息的字体大小和所覆盖的区域大小,以避免因字体过小而导致用户浏览不便,也避免字体过大导致字幕信息同时覆盖到发声及未发声的对象元素上,保证听力障碍用户能够准确辨别出字幕信息所对应的发声对象。
在一些实施方式中,显示设备接收到视频数据后,可以从中提取出一帧的视频图像,每帧视频图像具有对应的显示时间戳,以指示该帧视频图像在什么时间节点处显示于屏幕上,而字幕显示与视频播放进程相关,例如在某时间节点处需要在第S帧视频图像上开始显示字幕信息1,由于发声对象A说完一句台词会产生时间消耗,设此发声耗时为T(单位为秒/s),当发声对象A说完字幕信息1对应的台词,同步地,显示器取消显示字幕信息1,则显示字幕信息1的持续时间为T,在字幕信息1显示的起止时间段内,显示器刷新了T*f帧视频图像,其中f为刷新频率(单位Hz),也就是说,视频数据中连续播放的第S帧~第(S+T*f)帧视频图像显示同一条字幕信息1。由于本申请中字幕信息1的显示位置与发声对象A的位置分布相关联,发声对象A可能发生移动,例如第S帧时发声对象A位于视频画面的左侧,当播放至第S+K帧(K小于或等于T*f)时,发声对象A的位置变换到了视频画面的中央,因此可追踪第S帧~第(S+T*f)帧中发声对象A的动态位置变化,并使字幕信息1随发声对象A的移动而联动,从而保证字幕信息显示的精准性。
在步骤S02示例性的实现方式中,跟随视频播放进程,首先从视频数据中提取目标视频图像,目标视频图像为第Si帧~第(Si+Ti*f)帧,其中Si为序列中第i个字幕信息对应的时间节点处应同步显示的视频图像的帧序号,Ti为序列中第i个字幕信息所对应的发声耗时,1≤i≤M,M为字幕信息序列中所包括的字幕信息的总条数。
在步骤S02示例性的实现方式中,在提取出目标视频图像后,参照目标信息对发声对象的描述,从目标视频图像中分割及识别出发声对象,在具体实现时,可在目标视频图像中构建图像坐标系,并计算发声对象在图像坐标系中的坐标(x,y),以及计算尺寸信息所包括的发声对象的高度h和宽度w。可选地,从发声对象上选取关键点,并计算该关键点的坐标(x,y),所述关键点可以是发声对象的中点,或者边缘点等。发声对象的高度h=∣ymax-ymin∣,发声对象的宽度w=∣xmax-xmin∣,假设图像坐标系以左下角为原点,x轴向右为正向,y轴向上为正向,则ymax为发声对象上的最高点在y轴(高度方向)的坐标 值,ymin为发声对象上的最低点在y轴的坐标值,xmax为发声对象上的最右点在x轴(宽度方向)的坐标值,xmin为发声对象上的最左点在x轴的坐标值。在获取到发声对象的高度h和宽度w后,即可确定发声对象大约占据的区域范围,从而为决策字幕信息的字体显示格式提供参考依据。
在一些实施方式中,可以构建并训练一处理模型,可选地,所述处理模型可采用深度学习模型,例如基于卷积神经网络的深度网络模型等,显示设备可调用所述处理模型来完成如图像分割、目标识别等解析处理。所述处理模型的一端接收目标视频图像的输入,处理模型的另一端给出输出结果,所述输出结果包括从目标视频图像中分割及识别出的对象元素Objectj,以及各对象元素Objectj的坐标(xj,yj)、高度hj和宽度wj,其中j表示目标视频图像中的对象元素的序号,1≤j≤N,N为目标视频图像中具有的对象元素总数。然后,从对象元素Objectj中筛选出与目标信息相匹配的发声对象,并由处理模型的输出结果一并获取到发声对象的坐标、高度和宽度。
以图6(b)中的视频图像为例进行说明,处理模型可分割及识别出N=3个对象元素,Object1为门,Object2为男人,Object3为女人,并计算及输出门、男人和女人的位置坐标及尺寸,处理模型输出结果的格式例如为[{Object1:门;x1:150;y1:450;w1:300;h1:900},{Object2:男人;x2:750;y2:536;w2:203;h2:714},{Object3:女人;x3:975;y3:480;w3:152;h3:655}]。例如目标信息指示为{发声对象:女人;位置:左三;状态:兴奋},控制器250利用该目标信息和处理模型的输出结果进行筛选匹配,匹配出发声对象为对象元素中的Object3,则发声对象的坐标为(975,480),发声对象的宽度*高度=152*655。
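As an illustrative sketch (not the model or matching code of the application), selecting the object element that matches the target information and reading off its coordinates and size could look like this, using the example output values given above:

```java
import java.util.List;
import java.util.Optional;

/** Illustrative shape of one detected object element from the processing model. */
record DetectedObject(String label, int x, int y, int width, int height) {}

public class SpeakerLocator {

    /** Picks the object element whose label matches the target information's speaker. */
    public static Optional<DetectedObject> matchSpeaker(List<DetectedObject> elements, String targetLabel) {
        return elements.stream().filter(e -> e.label().equals(targetLabel)).findFirst();
    }

    public static void main(String[] args) {
        // Values taken from the example model output in the description (door, man, woman).
        List<DetectedObject> output = List.of(
                new DetectedObject("门", 150, 450, 300, 900),
                new DetectedObject("男人", 750, 536, 203, 714),
                new DetectedObject("女人", 975, 480, 152, 655));
        matchSpeaker(output, "女人").ifPresent(o ->
                System.out.println("speaker at (" + o.x() + "," + o.y() + "), "
                        + o.width() + "x" + o.height()));
    }
}
```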
在另一些实施方式中,针对另一种训练模式的处理模型,可将目标视频图像和目标信息作为输入项,同时输入至处理模型中,由处理模型进行包括图像分割、目标识别和发声对象的筛选匹配等处理环节,则处理模型可直接输出发声对象及其坐标和尺寸,后续无需控制器在从对象元素中匹配发声对象。模型输出结果的格式例如为{发声对象:女人;坐标:(975,480);宽度:152;高度:655}。
在一些实施方式中,处理模型可以通过长期的训练和校正实现进化,从而使模型的处理效率及输出结果的准确性得以加强,提供更好的字幕效果,即处理模型是可不断更新的,这一过程可以由显示设备来执行,但维护处理模型会占用控制器的处理资源并且增大内存开销,可能会影响到显示设备的运行性能。对此,可以设置一模型服务器,模型服务器可与显示设备通信连接,所述模型服务器用于构建和训练处理模型,并对处理模型进行更新,因此处理模型构建后的进化过程中会衍生出多种版本,显示设备从模型服务器下载新版本模型,以替换旧版本模型,即可实现显示设备端处理模型的更新。这种改进方式由模型服务器代替显示设备对处理模型进行更新和维护,从而降低显示设备的内存开销和CPU处理资源的消耗。
在一些实施方式中,模型服务器每次更新成功处理模型时,向显示设备推送模型更新消息;显示设备接收到模型更新消息时,向模型服务器请求下载更新后的处理模型,下载完成后,显示设备删除旧版本的处理模型,并将更新后的处理模型存储于本地,之后显示设备即可调用当前最新版本的处理模型对目标视频图像进行解析处理。
在一些实施方式中,在将目标视频图像输入至处理模型之前,可根据神经网络的要求先对目标视频图像进行预处理,所述预处理包括但不限于对目标视频图像进行缩放、二值化处理、灰度处理等。例如,神经网络仅接受288*288分辨率的图像,而提取出的目标视频图像的分辨率大小为1280*720,则预先对目标视频图像做压缩处理,从而将目标视频图像缩小至288*288;又例如,神经网络仅接受黑白图像,而提取的目标视频图像是彩色图像,则可以对目标视频图像进行二值化处理,将彩色图像转换为黑白图像。
步骤S03,根据所述时间信息、所述发声对象在视频画面中的相对位置和尺寸信息,控制显示器在所述视频画面上显示所述字幕信息。
在视频播放进程达到时间信息所指示的时间节点时,需要在视频画面上显示对应的字幕信息,根据步骤S02中计算出的发声对象的相对位置(包括坐标),确定字幕信息的显示位置,实现字幕与发声对象联动,从而使用户能精准辨别当前字幕信息由哪个对象元素发出,以及,根据步骤S02中计算出的发声对象的尺寸信息,确定字幕信息的字体大小和所占据的区域范围等,以避免字幕过大或过小所产生的不利影响。其中,发声对象的尺寸信息不限于前述实施例中包括的宽度和高度,例如还可以是发声对象的面积等形式。
在一些实施方式中,若显示设备解析出目标信息中配置有发声对象的状态描述信息,则对字幕实施与状态描述信息相适配的展示特效。可选地,显示设备端可以维护一个状态-特效列表,该列表中记录有发声对象在不同状态下的预设特效,并支持用户新增、删减或修改状态特效,仅作为一种示例,例如愤怒状态下的预设特效为大号红色加粗字体,发声对象由远及近的活动状态下的预设特效为字体渐变放大的动画效果,虚弱状态下的预设特效为字幕闪烁,等等。需要说明的是,字幕信息的显示格式不限于字体格式和特效,还包括如行距、字符间距、语言等。
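A minimal sketch of such a state-to-effect list follows; the effect descriptions are invented placeholders, and the real list would be user-editable on the device:

```java
import java.util.HashMap;
import java.util.Map;

public class StateEffectTable {

    // Illustrative defaults; users could add, remove, or modify entries.
    private final Map<String, String> effects = new HashMap<>(Map.of(
            "愤怒", "large red bold text",
            "由远及近", "gradually enlarging animation",
            "虚弱", "blinking subtitle"));

    public String effectFor(String state) {
        return effects.getOrDefault(state, "default style");
    }

    public void put(String state, String effect) {
        effects.put(state, effect);
    }

    public static void main(String[] args) {
        StateEffectTable table = new StateEffectTable();
        System.out.println(table.effectFor("愤怒"));
        table.put("兴奋", "red text, enlarged font");
        System.out.println(table.effectFor("兴奋"));
    }
}
```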
在一些实施方式中,在根据状态-特效列表筛选出与状态描述信息相匹配的展示特效后,根据展示特效和发声对象的尺寸信息,可确定与发声对象相适配的字幕显示格式,由此根据该字幕显示格式绘制当前的字幕模板,也就是说,字幕模板里约束了字幕信息的字体、特效、语言等格式,并在达到时间信息指示的时间节点时,在视频画面上发声对象所在的位置处按照字幕模板,加载显示字幕信息,从而提升了字幕显示的多样性、精准性和生动性,为用户尤其是听力障碍人士提供更好的视频及其字幕的观看体验。
在一些实施方式中,图8提供一种字幕显示的架构,架构中整体上包括服务器端和显示设备端,服务器端可细化为包括资源服务器和模型服务器。其中,资源服务器用于向显 示设备提供视频资源的视频数据和字幕数据,并且在资源服务器端,运营商会在字幕数据中增加配置目标信息,从而为显示设备端提供字幕显示格式的参考依据;模型服务器则用户创建、训练及更新处理模型,实现对处理模型的管理和维护,并在处理模型更新成功时,及时通知显示设备端升级模型版本。
在一些实施方式中,参照图8的示例,显示设备端可配置五个模块,分别为数据接收模块、抓图模块、神经网络处理模块、字幕解析模块和绘制渲染模块,这些功能模块可配置在控制器250内,并由控制器协调控制模块间的逻辑运行。资源服务器与显示设备建立通信连接后,数据接收模块可接收资源服务器发送的视频数据和字幕数据,并将字幕数据发送给字幕解析模块,将视频数据分别发送给解码器和抓图模块。可选地,当字幕数据内置于视频数据内时,数据接收模块可以先从视频数据内分离出字幕数据,然后再将字幕数据发送给字幕解析模块。视频数据注入解码器后,由解码器进行解码处理,并将解码数据发送给显示器,实现视频播放,其中所述解码器包括视频解码器和音频解码器。
在一些实施方式中,抓图模块用于提取目标视频图像,并将目标视频图像存储于内存中,以供神经网络处理模块对目标视频图像进行处理。可选地,抓图模块可根据神经网络处理模块对待处理图像的要求,对提取出的目标视频图像进行预处理。
在一些实施方式中,神经网络处理模块用于完成两项功能,其一是加载本地存储的所述处理模型,然后从内存中读取目标视频图像,并将目标视频图像输入至所述处理模型中,最后将所述处理模型的输出结果发送给绘制渲染模块;其二是神经网络处理模块可以根据模型服务器推送的模型更新消息,从模型服务器下载新版本的处理模型,下载成功后,删除旧版本的处理模型,并将新版本的处理模型存储于本地,实现显示设备端处理模型的升级。
在一些实施方式中,字幕解析模块用于解析字幕数据,以获取字幕信息、时间信息以及额外配置的目标信息,所述目标信息包括但不限于发声对象的形状特征/身份标识、位置分布和状态描述信息等,然后将解析获取的这些信息发送给绘制渲染模块。
在一些实施方式中,绘制渲染模块属于与显示器关联的前端模块,用于根据神经网络处理模块和字幕解析模块发送的参考信息,确定与发声对象适配的字幕显示格式,从而绘制字幕模板以及对字幕效果进行渲染,之后在达到时间信息指示的时间节点时,由显示器在发声对象所在的位置处按照字幕模板,加载显示对应的字幕信息。
在一些实施方式中,针对神经网络处理模块发送的模型输出结果,若处理模型输出的是目标视频图像中全部对象元素及其位置和尺寸信息,则绘制渲染模块还需进一步根据模型输出结果和目标信息,匹配出发声对象及其位置和尺寸;若处理模型输出的是已匹配完成的发声对象及其位置和尺寸信息,则绘制渲染模块无需重复匹配。
由以上实施例可知,本申请中显示设备在获取视频数据后,可以抓取目标视频图像, 从而定位发声对象在视频画面中的相对位置,并计算发声对象的尺寸信息,这样即可在发声对象的位置处,显示与发声对象的尺寸大小相适配的字幕信息,使得用户能够肉眼直观辨别出字幕信息对应于哪个目标对象,获知谁是发声对象,即便同一时间节点处有多对象同时发声,但听力障碍人士仍可通过每个发声对象局部展示的字幕信息,获知当前有几处发声,以及每个发声对象分别说了什么内容,从而提升字幕显示的精准性和丰富性,从而为用户,尤其是听力障碍人士提供更好的视频及其字幕的观看体验。此外,本申请中还支持对环境中发出物理声响的非生物对象提供其拟声字幕,从而为听力障碍人士提供更贴近于视频声音效果的更为生动形象的字幕显示,而不再局限于常规人声字幕。
需要说明的是,在本申请提供的字幕与发声对象位置联动机制的基础上,可以对字幕显示格式和效果进行灵活设置,例如运营商端在配置字幕数据内的目标信息时,可以多角度考虑可能影响字幕显示效果的因素,并在目标信息中增加相应的描述,从而为显示设备端决策字幕显示效果提供更多的参考依据;又例如,显示设备端在配置字幕效果时,可以从字幕信息表达的内容、发声对象状态、字幕观赏性和生动性等多角度,适配字幕的格式和特效。本申请中的处理模型可基于高精度图像算法进行神经网络建模和训练,所述图像算法包括但不限于图像分割、目标识别、边缘检测等,处理模型及其相关训练算法不限定,具体可参照已知技术。
字幕是指用文字描述视频场景、视频中人物对话的一种表现形式。字幕可以帮助用户在声音、画面之外,以文本阅读的方式理解视频的内容。字幕文字和音频结合起来观看,能更加清晰地表达出视频的内容。另外,字幕对于有听力障碍或者具有语言差异的用户来说,也起到帮助理解的特殊意义。
对于常见的显示设备,例如电脑、电视、手机等,都会开发相应的播放器用来播放视频。字幕作为播放器的必备功能也会给用户提供一些喜好设置。比如允许用户设置字幕文字的大小、字幕文字的颜色、字幕显示区的背景颜色等等。尤其对于某些字幕,需要设置正确的字幕编码,否则字幕可能会显示为乱码,不能正常阅读。
对于一个视频文件来说,其画面是持续不断输出的,但是字幕却不同。字幕需要在特定的时间点显示特定的文字内容,字幕的更新具有一定的时间间隔。以电影为例,每条字幕是与人物对话时间同步更新的。正因如此,用户在字幕喜好设置时往往不能即时查看设置后的效果,要想查看字幕设置后的效果,必须等到视频播放到有字幕的时间点显示字幕的时候。而显示字幕的时间点是不确定的,用户需要等待字幕的出现才能观看到设置后的效果,等待的时间是不确定的。这就导致用户设置字幕时,会浪费过多的时间等待效果的显示,进而造成用户与显示设备之间的交互体验感差的问题。因此,为解决目前显示设备为用户提供的字幕设置方式中,不能实时显示设置效果的问题,本申请实施例中还提供了如下处理:
当前的显示设备200上播放的视频等内容,例如电视节目、综艺节目、新闻节目等,都是同步显示字幕的,例如图9中所示的显示设备200当前正在播放的电视剧的内容,此时电视剧中的人物A正在说话,对应的字幕内容是“生命经不起折腾”。
字幕是指用文字描述视频场景、视频中人物对话的一种表现形式。字幕可以帮助用户在声音、画面之外,以文本阅读的方式理解视频的内容。字幕文字和音频结合起来观看,能更加清晰地表达出视频的内容。另外,字幕对于有听力障碍或者具有语言差异的用户来说,也起到帮助理解的特殊意义。
对于常见的显示设备200,例如电脑、电视、手机等,都会开发相应的播放器用来播放视频。字幕作为播放器的必备功能也会给用户提供一些喜好设置。比如允许用户设置字幕文字的大小、字幕文字的颜色、字幕显示区的背景颜色等等。尤其对于某些字幕,需要设置正确的字幕编码,否则字幕可能会显示为乱码,不能正常阅读。
对于一个视频文件来说,其画面是持续不断输出的,但是字幕却不同。字幕需要在特定的时间点显示特定的文字内容,字幕的更新具有一定的时间间隔。以电影为例,每条字幕是与人物对话时间同步更新的。正因如此,用户在字幕喜好设置时往往不能即时查看设置后的效果,要想查看字幕设置后的效果,必须等到视频播放到有字幕的时间点显示字幕的时候。而显示字幕的时间点是不确定的,用户需要等待字幕的出现才能观看到设置后的效果,等待的时间是不确定的。这就导致用户设置字幕时,会浪费过多的时间等待效果的显示,进而造成用户与显示设备200之间的交互体验感差的问题。
基于此,本申请实施例提供了一种显示设备200,在播放视频内容的时,可以即时显示用户对于字幕内容的设置效果。
本申请实施例中的显示设备200,可以为用户提供设置字幕属性的设置页面,也可以为用户提供显示字幕设置效果的预设显示区域。
显示设备200正在播放视频内容时,如果用户想要对字幕进行设置,则可以在当前显示视频内容的用户界面上选择字幕属性页面打开。本申请实施例中,用户打开字幕属性页面的操作,可以是通过遥控器等控制装置100移动用户界面上的焦点,进而选择字幕属性页面入口;也可以是通过按下遥控器等控制装置100上的菜单按键等功能按键,这种功能按键与显示设备200上的一些菜单页面相关联,并且菜单页面上设置有字幕属性页面的入口;还可以是用户通过语音助手等方式向显示设备200输入语音指令,该语音指令可以控制显示设备200直接进入字幕属性页面。
图10为根据一些实施例的显示设备200用户界面上显示带有字幕属性页面入口的菜单页面的示意图。如图10所示,该菜单页面可以显示在如图9所示的用户界面的右侧,其上包括一些视频内容相关的功能入口,例如“字幕”、“显示”、“音频”等。当用户选择“字幕”入口后,显示设备200就会显示字幕属性页面。
图11为根据一些实施例的显示设备200用户界面上显示字幕属性页面的示意图。如图11所示,该字幕属性页面同样可是显示在用户界面的右侧,其上包括一些字幕属性的选项,例如字幕编码、字号、内容颜色、背景颜色等等。用户在字幕属性页面上选择任意字幕属性即可控制显示设备200显示对应的属性页面。同时,显示设备200在显示字幕属性页面的同时,还会显为预设显示区域201,在预设显示区域201可以显示作为预览内容的预设字幕内容,例如“该得到的我都得到了”。
用户选择字幕编码选项时,显示设备200上可以显示如图12所示的字幕编码页面,在此页面上包括若干种不同编码方式的选项,例如utf8(Universal Character Set/Unicode Transformation Format 8,8位元)、gb(GuoBiao,国标)、big5(大五码)、ios、windows等。编码方式表示显示设备200中存储字幕内容的方式,当显示设备200需要显示字幕内容时,还需要采用对应的解码方式对存储的字幕内容进行解码,通过用户对不同编码方式的选择,显示设备200也可以采用不同的解码方式对字幕内容进行解码。
本申请实施例中,一种编码方式可以对应统一名称的解码方式,确定了字幕内容的编码方式即可确定对应的解码方式。
用户选择文字字号选项时,显示设备200上可以显示如图13所示的字号页面,在此页面上包括若干字号选项,例如四号、三号、小三号等等。用户可根据自己的喜好和观看习惯等选择对应的字号,进而显示设备200会将字幕内容中全部文字、字母或者符号等的大小作对应的调整。
用户选择内容颜色选项时,显示设备200上可以显示如图14所示的内容颜色页面,在此页面上包括若干颜色选项,例如白色、黑色、黄色、红色、蓝色等。用户可根据自己的喜好和观看习惯等选择对应的字体颜色,进而显示设备200会将字幕内容的全部文字、字幕或者符号等的颜色作对应的调整。
用户选择背景颜色选项时,显示设备200上可以显示如图15所示的背景颜色页面,在此页面上也包括若干颜色选项,例如白色、黑色、黄色、红色、蓝色等。用户可根据自己的喜好和观看习惯等选择对应的背景颜色,进而显示设备200会将字幕内容的背景颜色最对应的设置。本申请实施例中所说的字幕内容的背景,就是指用户界面上规定的显示字幕内容的区域内不显示文字、字母或者符号的区域。例如,显示设备200正常播放视频内容时通常具有固定区域用来显示字幕内容,在一些情况下,此区域可能呈现矩形,在该矩形范围内显示的文字、字母或者符号等并不能完全布满全部的矩形区域,此时,未被覆盖区域即可看作是字幕内容的背景。
值得说明的是,在具有背景的字幕显示区域中,背景的颜色不能与字幕内容的颜色相同,从而避免无法清楚地显示字幕内容的问题。
图16为根据一些实施例的显示设备200与用户交互过程的一种示意图。如图16所示, 交互过程具体包括:
S161、用户在显示设备200播放视频内容时,可以通过控制装置100向显示设备200发出控制指令,控制显示设备200显示字幕属性页面。
S162、显示设备200显示字幕属性页面后,会立即在当前的用户界面上显示预设显示区域201,用户可以在此区域预览字幕属性设置的效果。在本次设置操作的过程中,如果用户未选择任何的字幕属性,那么预设显示区域201内会直接显示获取到的预设字幕内容或者显示用户上次设置好的字幕属性效果。
S163、用户通过控制装置100在字幕属性页面选择了目标属性后,显示设备200会根据将预设字幕内容的对应属性进行设置,并将设置后的显示效果显示在预设显示区域201内。
前述内容中的预设显示区域201中始终显示相同的文字、字母或者符号的内容,只是这些内容的属性会根据用户的选择而发生改变。
由前述内容可知,本申请实施例中的显示设备200,可以在播放视频内容的同时为用户提供字幕设置的功能,并且字幕设置后的显示效果可以即时显出来,方便用户对字幕设置的属性随时进行更改,加快字幕设置后效果的显示,也提高了用户与显示设备200的交互体验。
为解决上述显示设备200为用户提供的字幕设置方式中,不能实时显示设置效果的问题,本申请实施例中还提供了一种显示设备上字幕预览方法,此方法可以应用于前述实施例的显示设备200中,并由显示设备200中的控制器250执行。如图17所示,此方法具体可以包括如下步骤:
步骤S101,在显示设备200播放视频内容时,响应于在用户界面上选择打开字幕属性页面的用户操作,从视频内容对应的字幕文件中获取预设字幕内容。其中,用户界面是显示设备200的显示器260进行显示的。
预设字幕内容是根据显示设备200中预置的字幕内容提取条件获取的。例如,预置条件要求显示设备200提取每个视频内容对应的第一条字幕内容作为预设字幕内容,或者预置条件要求显示设备200提取每个视频内容对应的第三条字幕内容作为预设字幕内容等。另外,本申请实施例中的预设字幕内容、目标字幕内容等均是指作为效果预览使用的预览字幕内容,并不是指视频内容播放中实时播放的字幕内容。
在显示设备200播放视频内容的过程中,用户会想要将字幕设置为更加符合自己要求或者喜好的显示效果。此时,用户可以在显示设备200当前播放视频内容的用户界面上进行操作,控制显示设备200显示字幕属性页面。字幕属性页面可以如前述图11中所示。
在步骤S101中,显示设备200显示字幕属性页面后,可以立即获取到当前播放的视频内容对应的字幕文件,并从字幕文件中获取到预设字幕内容。另外,在显示设备200显示字幕属性页面后,还会立即在当前的用户界面上显为预设显示区域201,以便将预设字幕 内容显为来供用户进行效果预览。
由于显示设备200在获取到字幕文件时,需要将其采用特定的编码方式进行存储,那么在显示字幕文件中具体的内容时,也需要将每条字幕内容从存储空间中取出,并对其进行解码。解码的方式与编码的方式是相对应的,不然会出现解码后的字幕内容在显示时出现乱码的问题。
在一些情况下,显示设备200在对字幕内容解码时,并不知道对应的编码方式,进而难以确定解码方式。目前的显示设备200中,通常是用户在字幕编码页面上依次选择解码方式,显示设备200利用选择的解码方式对字幕内容解码后,在字幕真正的播放时间点上显示,如果当前选择解码方式的时间点与字幕真正播放的时间点距离较长,那么用户需要等待一段时间后才能观看到字幕显示的效果,如果显为的字幕不是乱码,那么用户可以了解到当前选择的解码方式是正确的;而如果显为的字幕是乱码,用户则需要重新选择解码方式,然后等待下一条字幕显示时查看显示效果。
可见,目前显示设备200上这种需要等到字幕出现时才能显示设置效果的方式,对于字幕解码的显示效果也存在一定的延时性,也会影响用户的使用体验。因此,在一些实施例中,如图18所示,另一实施例提供的显示设备200与用户交互过程包括:
S181、在显示设备200中还可以设置好预设解码方式,在步骤S101中获取预设字幕内容,并在用户界面显示字幕属性页面;
S182、在获取到预设字幕内容之后,先利用预设解码方式对预设字幕内容进行解码,获取解码字幕内容。
S183、解码后,直接将解码字幕内容显示在预设显示区域201,以便用户及时地了解到利用预设解码方式解码是否成功,即解码字幕内容是否乱码。
S184、如果用户界面上显示乱码的解码字幕内容,那么用户可以继续在字幕属性页面中的字幕编码页面上选择其他的解码方式继续对预设字幕内容解码(即利用目标解码方式解码预设字幕内容),并将解码字幕内容显示在预设显示区域201。
如果用户界面上显示的是未乱码的正常解码字幕内容,那么用户可以继续在字幕属性页面上选择其他的属性页面进行显示,进而选择字幕的其他属性进行设置。例如,用户选择字号属性,显示设备200则显示字号页面,用户在字号页面上选择某一个字号选项,进而调整预设显示区域201中的解码字幕内容中文字、字母或者符号的大小。
在一些情况下,显示设备200上存储字幕内容时,编码后字幕内容中的某一个标志位会表示编码方式。因此,在一些实施例中,在步骤S101中获取到预设字幕内容之后,可以先确定预设字幕内容中是否包括用于表示编码方式的标志位,如果包括,则可以根据标志位确定出编码方式和编码方式对应的解码方式,并将此解码方式确定为待选解码方式,而后再利用此待选解码方式对上述预设字幕内容进行解码,从而获解码字幕内容。
而如果步骤S101中获得的预设字幕内容中不包括标志位,则利用显示设备200中的预设解码方式对预设字幕内容进行解码,从而获得解码字幕内容。
步骤S102,响应于在字幕属性页面上选择目标属性的用户操作,根据目标属性的内容对预设字幕内容的相应属性进行设置,获得目标字幕内容。
如前述内容所述,如果显示设备200在预设显示区域201内显示的解码字幕内容为乱码,那么用户可以继续在字幕属性页面上选择其他的解码方式重新对预设字幕内容进行解码。进而在步骤S102中,可以确定用户是否在字幕属性页面中的字幕编码页面上选择目标解码方式,如果用户选择了目标解码方式,则可以利用此目标解码方式对预设字幕内容重新进行编码,从而获得目标字幕内容。
或者,如果显示设备200直接将预设字幕内容显示在预设显示区域201内,在步骤S102中,用户也可以在字幕属性页面中的字幕编码页面上选择目标解码方式,从而在显示设备200中直接利用目标解码方式对预设字幕内容进行解码。
在一些实施例中,用户除了可以对字幕解码方式进行设置,还可以对字幕的字号进行设置。进而在预设显示区域201内显示解码字幕内容之后,可以确定用户是否在字幕属性页面中的字号页面上选择目标字号。如果用户选择了目标字号,则可以根据目标字号,设置解码字幕内容的文字字号,从而获得目标字幕内容。
或者,如果显示设备200直接将预设字幕内容显示在预设显示区域201内,在步骤S102中,用户也可以在字幕属性页面的字号页面上选择目标字号,从而在显示设备200中直接将预设字幕内容的字号调整为目标字号。
在一些实施例中,用户还可以对字幕内容的颜色进行设置。进而在预设显示区域201内显示解码字幕内容之后,可以确定用户是否在字幕属性页面中的内容颜色页面上选择目标颜色。如果用户选择了目标颜色,则可以根据目标颜色,设置解码字幕内容的颜色,从而获得目标字幕内容。
或者,如果显示设备200直接将预设字幕内容显示在预设显示区域201内,在步骤S102中,用户也可以在字幕属性页面的内容颜色页面上选择目标颜色,从而在显示设备200中直接将预设字幕内容的颜色调整为目标颜色。
在一些实施例中,用户还可以对字幕内容的背景颜色进行设置。进而在预设显示区域201内显示解码字幕内容之后,可以确定用户是否在字幕属性页面中的背景颜色页面上选择目标背景色。如果用户选择了目标背景色,则可以根据目标背景色,设置解码字幕内容的背景颜色,从而获得目标字幕内容。
或者,如果显示设备200直接将预设字幕内容显示在预设显示区域201内,在步骤S102中,用户也可以在字幕属性页面的背景颜色页面上选择目标背景色,从而在显示设备200中直接将预设字幕内容的背景颜色调整为目标背景色。
在上述内容中,用户在预览预设显示区域201中显示的解码字幕内容后,可以设置字幕编码、字号、内容颜色和背景颜色等属性中的任一项、任几项或者全部。本申请实施例中的目标属性是指用户在字幕编码页面上选择的目标解码方式、用户在字号页面上选择的目标字号、用户在内容颜色页面上选择的目标颜色以及用户在背景颜色页面上选择的目标背景色等。
步骤S103,在用户界面上的预设显示区域内显示目标字幕内容。
在上述设置字幕属性的过程中,显示设备200的用户界面上可以正常播放视频内容。在播放视频内容的用户界面上,显示预设显示区域201,进而即时显示用于对于字幕属性设置后的显示效果。其中,设置好属性的字幕内容即为本申请实施例中所说的目标字幕内容。如果目标字幕内容的显示效果不能满足用户的需求或者用户的喜好,那么用户还可以在相应的属性页面上重新选择目标属性。
在一些情况下,显示设备200正常播放的视频内容中会正常显示字幕内容,在某个时间点,正常显为来的字幕内容与预设显示区域201中的目标字幕内容可能会同时显示在用户界面上,这会导致用户无法确定哪个字幕内容是预览设置效果的字幕内容,从而影响用户的使用体验。为避免这种情况,在一些实施例中,在用户设置字幕属性的过程中,可以控制显示设备200不显示正常播放的字幕内容。
在一些实施例中,为避免显示设备200上正常播放的视频内容中的字幕内容对预设显示区域201中的目标字幕内容造成影响,在用户设置字幕属性的过程中,还可以控制正在播放的视频内容暂停。例如,图19中所示,当用户在正在播放的视频内容界面上选择进行字幕属性设置操作时,显示设备200在显为字幕属性页面的同时,会暂停播放视频内容,并且在当前暂停的用户界面上显示预设显示区域201。另外,当前暂停的用户界面上也不会显示真实的字幕内容。
在一些实施例中,用户界面上的预设显示区域201的位置也不是固定的,还可以显示在用户界面上不遮挡字幕属性页面的其他位置,例如,位于如图20中所示的用户界面的左侧,或者位于如图21中所示的用户界面的顶部等。
前述实施例中,如果用户设置了字幕的背景颜色,那么在字幕正常显示时,背景颜色的区域就会遮挡一部分视频内容,这种情况会影响用户的观看体验。为避免这种情况的发生,在一些实施例中,显示设备200还可以为用户提供字幕背景的透明度的属性设置内容,即在字幕属性页面中增加背景透明度属性以及透明度页面。如果用户想要设置字幕背景颜色又不想字幕背景遮挡视频内容,那么用户可以在透明度页面上选择目标数值,进而设置字幕背景透明度。图22为根据一些实施例的显示设备200用户界面上透明度页面的示意图,如图23所示,此透明度页面上包括若干透明度数值,例如0%、10%、30%、50%、80%、100%等。用户选择不同的透明度数值,显示设备200上即可显示字幕背景不同的透明度。 以透明度为80%为例,字幕背景的透明度显示效果如图23所示。
在一些实施例中,显示设备200中还可以设置不同的处理模块来实现不同内容的获取或者处理,例如字幕解析模块、字幕设置模块、字幕显示模块等。其中,字幕解析模块可以解析字幕内容,字幕设置模块可以根据用户的选择或者预设内容等设置字幕内容的属性,字幕显示模块可以将字幕内容显为来。
如图24所示,另一实施例提供的显示设备200与用户交互过程包括:
S241、在进行字幕解码方式设置时,字幕设置模块调用字幕解析模块,进而提前获取一条真实的字幕内容。
S242、字幕设置模块设置字幕的解码方式,字幕解析模块利用此解码方式对字幕内容进行解码;
S243、将解码后的字幕内容发送给字幕显示模块进行显示。
经过以上步骤,用户就可以即时预览到字幕解码设置后的显示效果,如果字幕解码方式设置正确就能正确显为字幕内容,否则会显示乱码,需要用户再次设置其他解码方式进行预览。
另外,在上述字幕解析模块、字幕设置模块、字幕显示模块的基础上,显示设备200还可以增加编码识别模块,用来识别字幕内容中的编码标志位以及确定字幕内容采用哪种编码方式。
如图25所示,另一实施例提供的显示设备200与用户交互过程包括:
S251、在进行字幕解码方式设置时,字幕设置模块调用字幕解析模块,进而提前获取一条真实的字幕内容。
S252、字幕解析模块会将此字幕内容发送给编码识别模块。编码识别模块解析字幕内容的编码方式,再将解析结果回传给字幕设置模块。
S253、字幕设置模块根据解析出的编码方式更新用户界面,将字幕编码页面中对应的编码选项高亮表示,同时将此编码方式对应的解码方式设置给字幕解析模块。字幕解析模块根据设置的解码方式对该字幕内容进行解码,再将解码后的字幕内容发送给字幕显示模块进行显示。
并且,在用户进行其他属性设置时,字幕设置模块也可以将对应的属性值或者属性选项发送给字幕显示模块进行显示。
在本申请实施例中,虽然可以增加前述的字幕解析模块、字幕设置模块、字幕显示模块和编码识别模块等,但是这些处理模块也都是受到显示设备200中控制器250的控制才能实现具体处理功能的。
当字幕属性设置完成后,用户可以继续操作显示设备200控制器关闭当前的字幕属性页面。因此,在步骤S103之后,如果显示设备200接收到用户选择关闭字幕属性页面的操 作,那么可以控制显示设备200在关闭字幕属性页面的同时关闭预设显示区域201。以及,控制显示设备200上当前视频内容对应的全部内容根据用户选择的属性进行设置和显示。
如果进行字幕属性设置时,显示设备200上的视频内容处于暂停播放状态,那么在控制显示设备200广播字幕属性页面和预设显示区域201之后,还要控制视频内容继续播放,以及按照用户选择的属性显示视频内容对应的全部字幕内容。
值得说明的是,本申请实施例中所说的字幕属性包括但不限于前述内容中所列举的属性,在显示设备200实际使用的过程中,用户对于字幕内容的显示需求都可以作为字幕内容的属性,并且设置的方式可参见前述实施例中的内容,此处不再赘述。
由以上内容可知,本申请实施例中提供了一种显示设备上字幕预览方法及显示设备,用户可以在显示设备200播放视频内容的同时,对视频内容的字幕属性设置。并且,经过属性设置后,显示设备200上可以直接实时显示设置后的显示效果,方便用户对属性设置进行及时地设置,而避免只有在视频内容同步显示字幕内容时才显示字幕设置效果,节省用户等待效果显示的时间,进而保证用户使用显示设备200的体验感。
在一些实施方式中,本申请还提供一种计算机可读的非失性存储介质,该计算机存储介质可存储有程序,该程序执行时可包括前述各实施例中字幕显示方法所涉及的程序步骤。其中,计算机存储介质可为磁碟、光盘、只读存储记忆体(英文:Read-Only Memory,简称ROM)或随机存储记忆体(英文:Random Access Memory,简称RAM)等。
为方便解释,已经结合具体的实施方式进行了上述说明。但是,上述示例性的讨论不是意图穷尽或者将实施方式限定到上述公开的具体形式。根据上述的教导,可以得到多种修改和变形。上述实施方式的选择和描述是为更好的解释原理以及实际的应用,从而使得本领域技术人员更好的使用所述实施方式以及适于具体使用考虑的各种不同的变形的实施方式。

Claims (20)

  1. 一种显示设备,包括:
    显示器,用于显示视频及其字幕信息;
    通信器,用于与资源服务器通信连接;
    控制器,被配置为执行:
    接收所述资源服务器同步发送的视频数据和字幕数据,所述字幕数据包括字幕信息、时间信息和用于指示所述字幕信息的发声对象的目标信息;
    根据所述视频数据和所述目标信息,计算所述发声对象在视频画面中的相对位置和尺寸信息;
    根据所述时间信息、所述发声对象在视频画面中的相对位置和尺寸信息,控制显示器在所述视频画面上显示所述字幕信息。
  2. 根据权利要求1所述的显示设备,所述目标信息包括所述发声对象的形象特征和位置分布,则所述控制器被配置为按照如下方式计算所述发声对象在视频画面中的相对位置和尺寸信息:
    从所述视频数据中提取目标视频图像;
    根据所述目标信息,从所述目标视频图像中分割及识别出所述发声对象;
    计算所述发声对象在图像坐标系中的坐标、以及,计算所述发声对象的宽度和高度。
  3. 根据权利要求1或2所述的显示设备,所述目标信息中包括所述发声对象的状态描述信息,则所述控制器被配置为按照如下方式显示所述字幕信息:
    确定与所述状态描述信息相匹配的展示特效;
    根据所述展示特效和所述发声对象的尺寸信息,绘制当前的字幕模板;
    在达到所述时间信息指示的时间节点时,控制显示器在所述视频画面上发声对象所在的位置按照所述字幕模板,加载显示所述字幕信息。
  4. 根据权利要求2所述的显示设备,所述控制器被配置为按照如下方式计算所述发声对象在视频画面中的相对位置和尺寸信息:
    调用本地存储的处理模型;
    将所述目标视频图像输入至所述处理模型,控制所述处理模型对所述目标视频图像进行处理;
    获取所述处理模型的输出结果,所述输出结果包括从所述目标视频图像中分割及识别出的对象元素,以及各对象元素的坐标、宽度和高度;
    从所述对象元素中筛选出与所述目标信息相匹配的发声对象。
  5. 根据权利要求1所述的显示设备,所述发声对象为具备发声能力的生物对象或者环境中能够产生物理声响的非生物对象。
  6. 根据权利要求4所述的显示设备,所述通信器还用于与模型服务器通信连接,所述控制器还被配置为执行:
    在接收到所述模型服务器推送的模型更新消息时,向所述模型服务器请求下载更新后的处理模型;
    删除旧版本的处理模型,将更新后的处理模型存储于本地。
  7. 根据权利要求4所述的显示设备,在将所述目标视频图像输入至所述处理模型之前,所述控制器还配置为执行:
    对所述目标视频图像进行预处理,所述预处理包括对所述目标视频图像进行缩放,和/或,对所述目标视频图像进行二值化处理。
  8. 根据权利要求1所述的显示设备,所述控制器,还被配置为执行:
    在显示设备播放视频内容时,响应于在用户界面上选择打开字幕属性页面的用户操作,从所述视频内容对应的字幕文件中获取预设字幕内容;
    响应于在所述字幕属性页面上选择目标属性的用户操作,根据目标属性的内容对所述预设字幕内容的相应属性进行设置,获得目标字幕内容;
    在所述用户界面上的预设显示区域内显示所述目标字幕内容。
  9. 根据权利要求8所述的显示设备,所述控制器,还被配置为执行:
    在从所述视频内容对应的字幕文件中获取预设字幕内容之后,利用预设解码方式对所述预设字幕内容进行解码,获得解码字幕内容;
    在所述用户界面上的预设显示区域内显示所述解码字幕内容。
  10. 根据权利要求8所述的显示设备,所述控制器,还被配置为执行:
    在从所述视频内容对应的字幕文件中获取预设字幕内容之后,确定所述预设字幕内容中是否包括用于表示编码方式的标志位;
    如果所述预设字幕内容中包括所述标志位,则根据所述标志位的内容确定出待选解码方式;
    利用所述待选解码方式对所述预设字幕内容进行解码,获得解码字幕内容;
    在所述用户界面上的预设显示区域内显示所述解码字幕内容。
  11. 根据权利要求10所述的显示设备,所述控制器,还被配置为执行:
    如果所述预设字幕内容中不包括所述标志位,则利用预设解码方式对所述预设字幕内容进行解码,获得解码字幕内容。
  12. 根据权利要求9-11任一项所述的显示设备,所述控制器,还被配置为执行:
    在所述预设显示区域内显示所述解码字幕内容之后,确定用户是否在所述字幕属性页面上选择目标解码方式;
    如果用户在所述字幕属性页面上选择了目标解码方式,则利用所述目标解码方式对所 述预设字幕内容重新进行解码,获得目标字幕内容。
  13. 根据权利要求9-11任一项所述的显示设备,所述控制器,还被配置为执行:
    在所述预设显示区域内显示所述解码字幕内容之后,确定用户是否在所述字幕属性页面上选择目标字号;
    如果用户在所述字幕属性页面上选择了目标字号,则根据所述目标字号,设置所述解码字幕内容的文字字号,获得目标字幕内容。
  14. 根据权利要求9-11任一项所述的显示设备,所述控制器,还被配置为执行:
    在所述预设显示区域内显示所述解码字幕内容之后,确定用户是否在所述字幕属性页面上选择目标颜色;
    如果用户在所述字幕属性页面上选择了目标颜色,则根据所述目标颜色,设置所述解码字幕内容的颜色,获得目标字幕内容。
  15. 根据权利要求9-11任一项所述的显示设备,所述控制器,还被配置为执行:
    在所述预设显示区域内显示所述解码字幕内容之后,确定用户是否在所述字幕属性页面上选择目标背景色;
    如果用户在所述字幕属性页面上选择了目标背景色,则根据所述目标背景色,设置所述解码字幕内容的背景颜色,获得目标字幕内容。
  16. 根据权利要求9所述的显示设备,所述控制器,还被配置为执行:
    响应于在用户界面上选择关闭所述字幕属性页面的用户操作,关闭所述预设显示区域,控制所述视频内容对应的字幕文件中的全部字幕内容以所述目标属性进行显示。
  17. 一种字幕处理方法,包括:
    接收资源服务器同步发送的视频数据和字幕数据,所述字幕数据包括字幕信息、时间信息和用于指示所述字幕信息的发声对象的目标信息;
    根据所述视频数据和所述目标信息,计算所述发声对象在视频画面中的相对位置和尺寸信息;
    根据所述时间信息、所述发声对象在视频画面中的相对位置和尺寸信息,在所述视频画面上显示所述字幕信息。
  18. 根据权利要求17所述的方法,所述目标信息中包括所述发声对象的状态描述信息,则在所述视频画面上显示所述字幕信息,包括:
    确定与所述状态描述信息相匹配的展示特效;
    根据所述展示特效和所述发声对象的尺寸信息,绘制当前的字幕模板;
    在达到所述时间信息指示的时间节点时,控制显示器在所述视频画面上发声对象所在的位置按照所述字幕模板,加载显示所述字幕信息。
  19. 根据权利要求17或18所述的方法,所述发声对象为具备发声能力的生物对象或者 环境中能够产生物理声响的非生物对象。
  20. 根据权利要求17所述的方法,所述方法还包括:
    在显示设备播放视频内容时,响应于在用户界面上选择打开字幕属性页面的用户操作,从所述视频内容对应的字幕文件中获取预设字幕内容;
    响应于在所述字幕属性页面上选择目标属性的用户操作,根据目标属性的内容对所述预设字幕内容的相应属性进行设置,获得目标字幕内容;
    在所述用户界面上的预设显示区域内显示所述目标字幕内容。
PCT/CN2022/109162 2021-10-27 2022-07-29 显示设备 WO2023071349A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202280063352.4A CN118104241A (zh) 2021-10-27 2022-07-29 显示设备

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202111255290.0 2021-10-27
CN202111255290.0A CN113992960B (zh) 2021-10-27 显示设备上字幕预览方法及显示设备
CN202111280246.5 2021-10-29
CN202111280246.5A CN114007145A (zh) 2021-10-29 2021-10-29 一种字幕显示方法及显示设备

Publications (1)

Publication Number Publication Date
WO2023071349A1 true WO2023071349A1 (zh) 2023-05-04

Family

ID=86159016

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/109162 WO2023071349A1 (zh) 2021-10-27 2022-07-29 显示设备

Country Status (2)

Country Link
CN (1) CN118104241A (zh)
WO (1) WO2023071349A1 (zh)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103945140A (zh) * 2013-01-17 2014-07-23 联想(北京)有限公司 视频字幕的生成方法及系统
CN108419141A (zh) * 2018-02-01 2018-08-17 广州视源电子科技股份有限公司 一种字幕位置调整的方法、装置、存储介质及电子设备
CN108833992A (zh) * 2018-06-29 2018-11-16 北京优酷科技有限公司 字幕显示方法及装置
CN112383809A (zh) * 2020-11-03 2021-02-19 Tcl海外电子(惠州)有限公司 字幕显示方法、装置和存储介质
CN112580302A (zh) * 2020-12-11 2021-03-30 海信视像科技股份有限公司 一种字幕校正方法及显示设备
CN112601120A (zh) * 2020-12-15 2021-04-02 三星电子(中国)研发中心 字幕显示方法及装置
CN113992960A (zh) * 2021-10-27 2022-01-28 海信视像科技股份有限公司 显示设备上字幕预览方法及显示设备
CN114007145A (zh) * 2021-10-29 2022-02-01 青岛海信传媒网络技术有限公司 一种字幕显示方法及显示设备

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103945140A (zh) * 2013-01-17 2014-07-23 联想(北京)有限公司 视频字幕的生成方法及系统
CN108419141A (zh) * 2018-02-01 2018-08-17 广州视源电子科技股份有限公司 一种字幕位置调整的方法、装置、存储介质及电子设备
CN108833992A (zh) * 2018-06-29 2018-11-16 北京优酷科技有限公司 字幕显示方法及装置
CN112383809A (zh) * 2020-11-03 2021-02-19 Tcl海外电子(惠州)有限公司 字幕显示方法、装置和存储介质
CN112580302A (zh) * 2020-12-11 2021-03-30 海信视像科技股份有限公司 一种字幕校正方法及显示设备
CN112601120A (zh) * 2020-12-15 2021-04-02 三星电子(中国)研发中心 字幕显示方法及装置
CN113992960A (zh) * 2021-10-27 2022-01-28 海信视像科技股份有限公司 显示设备上字幕预览方法及显示设备
CN114007145A (zh) * 2021-10-29 2022-02-01 青岛海信传媒网络技术有限公司 一种字幕显示方法及显示设备

Also Published As

Publication number Publication date
CN118104241A (zh) 2024-05-28

Similar Documents

Publication Publication Date Title
CN112511882B (zh) 一种显示设备及语音唤起方法
CN112163086B (zh) 多意图的识别方法、显示设备
CN114118064A (zh) 显示设备、文本纠错方法及服务器
CN112182196A (zh) 应用于多轮对话的服务设备及多轮对话方法
CN116229311B (zh) 视频处理方法、装置及存储介质
CN114007145A (zh) 一种字幕显示方法及显示设备
CN112492390A (zh) 一种显示设备及内容推荐方法
CN117809680A (zh) 一种服务器、显示设备及数字人交互方法
CN113066491A (zh) 显示设备及语音交互方法
CN111464869B (zh) 一种运动位置检测方法、屏幕亮度调节方法及智能设备
WO2023071349A1 (zh) 显示设备
CN117809649A (zh) 显示设备和语义分析方法
CN117809679A (zh) 一种服务器、显示设备及数字人交互方法
CN111858856A (zh) 多轮检索式聊天方法及显示设备
CN115146652A (zh) 显示设备和语义理解方法
CN115273848A (zh) 一种显示设备及显示设备的控制方法
CN113079400A (zh) 显示设备、服务器及语音交互方法
CN111914114A (zh) 一种badcase挖掘方法及电子设备
CN111950288A (zh) 一种命名实体识别中的实体标注方法及智能设备
CN113038217A (zh) 一种显示设备、服务器及应答语生成方法
CN113703621A (zh) 语音交互方法、存储介质及设备
CN110764618A (zh) 一种仿生交互系统、方法及相应的生成系统和方法
CN113940049B (zh) 基于内容的语音播放方法及显示设备
CN115396717B (zh) 显示设备及显示画质调节方法
CN113794915B (zh) 服务器、显示设备、诗词歌赋生成方法及媒资播放方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22885249

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 202280063352.4

Country of ref document: CN

NENP Non-entry into the national phase

Ref country code: DE