WO2021217433A1 - Content-based voice playback method and display device - Google Patents

Content-based voice playback method and display device

Info

Publication number
WO2021217433A1
Authority
WO
WIPO (PCT)
Prior art keywords: broadcast, character string, voice, punctuation, user interface
Application number
PCT/CN2020/087544
Other languages: English (en), French (fr)
Inventor: 朱子鸣
Original Assignee: 青岛海信传媒网络技术有限公司
Application filed by 青岛海信传媒网络技术有限公司
Priority to CN202080000657.1A (patent CN113940049B)
Priority to PCT/CN2020/087544
Publication of WO2021217433A1

Classifications

    • H: Electricity
    • H04: Electric communication technique
    • H04M: Telephonic communication
    • H04M 1/00: Substation equipment, e.g. for use by subscribers
    • H04M 1/72: Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M 1/725: Cordless telephones

Definitions

  • This application relates to the field of display technology, and in particular to a content-based voice playback method and display device.
  • the voice playback function refers to inputting a paragraph of text and then outputting the text in a voice mode through algorithmic synthesis.
  • the significance of the voice playback function is to make it easier and more convenient for the blind or visually impaired to control the TV and better enjoy the multimedia services.
  • In an actual voice broadcast, the broadcast speed is fast and constant, and the system does not actively form sentence breaks.
  • In this broadcast scenario, if only a few words or a short sentence are broadcast at once, the meaning can be understood without sentence breaks. However, a relatively long sentence, or a long article composed of many paragraphs, may need to be broadcast at one time.
  • For example: an electronic manual in a UI menu, or a novel on a browser web page. Because such a long document consists of sentence after sentence, with no sentence breaks or changes of tone in between, the words are simply broadcast one after another at a fast speed, and the longer a user listens, the harder it becomes to understand the broadcast content. Even blind users with sensitive hearing will have doubts about the broadcast content when listening to a long broadcast without sentence breaks.
  • The present application provides a content-based voice playback method and display device, which give the broadcast content a sense of cadence and sentence breaks, prevent users from misunderstanding the broadcast content, and effectively improve the user experience.
  • a display device including:
  • the user interface is used to receive instructions input by the user, and the input instructions are used to instruct the sound playing module to play the voice content corresponding to the character string;
  • a sound playing module which is used to play the voice content corresponding to the character string
  • a speaker, used to output the voice content;
  • a controller, configured to execute: when it is detected that the length of the character string is greater than the unit playback length and punctuation exists in the broadcast content corresponding to the character string, dividing the broadcast content into several broadcast segments according to the punctuation;
  • adding, at the punctuation in the broadcast segments, an identifier of the pause time corresponding to the punctuation;
  • transmitting the character strings corresponding to the broadcast segments to the sound playback module in sequence, so that the sound playback module plays the voice content corresponding to the broadcast segments.
  • In some embodiments, the punctuation includes an end-of-sentence mark,
  • and the controller is configured to divide the broadcast content into several broadcast segments according to the punctuation by the following steps: identifying whole sentences according to the end-of-sentence marks, and dividing each whole sentence into one broadcast segment.
  • In some embodiments, a broadcast segment includes one or several whole sentences, and the string length of the broadcast segment is not greater than the unit playback length;
  • wherein the punctuation includes an end-of-sentence mark, and the whole sentences are identified according to the end-of-sentence marks.
  • In some embodiments, the controller is further configured to execute: in response to a modification instruction, modifying the voice broadcast speed; and modifying the pause time corresponding to the punctuation according to the modified voice broadcast speed.
  • a display device including:
  • Tuner demodulator used to receive and demodulate the program carried in the digital broadcast signal
  • Loudspeaker used to output sound
  • the voice corresponding to the character string included in the user interface is output from the speaker; wherein the voice is broadcast at an uneven speed.
  • the controller is further configured to execute: in response to user input, control the selector to move to the position of the character string to instruct to select the character string.
  • a display device including:
  • Tuner demodulator used to receive and demodulate the program carried in the digital broadcast signal
  • a display for displaying a user interface, the user interface including at least a character string of a preset length, and the character string containing punctuation marks;
  • Loudspeaker used to output sound
  • the voice corresponding to the character string included in the user interface is output from the speaker; wherein the voice is paused for a preset time at the corresponding punctuation mark and then continues to broadcast.
  • different punctuation marks correspond to different preset times of paused broadcasting.
  • the preset time of the paused broadcast corresponding to the same punctuation mark is the same.
  • a display device including:
  • Tuner demodulator used to receive and demodulate the program carried in the digital broadcast signal
  • Loudspeaker used to output sound
  • wherein, when it is determined that the total length of the character string is longer than a predetermined length, the character string is divided into a plurality of segments according to the predetermined length, and playback pauses for a preset duration between the broadcast voices corresponding to the different segments before continuing.
  • the controller is further configured to perform: determining that there is no punctuation in the character string.
  • a content-based playback method including:
  • the user interface including at least a character string of a preset length and punctuation marks
  • the voice corresponding to the character string included in the user interface is output from the speaker; wherein the voice is broadcast at an uneven speed.
  • a content-based playback method including:
  • the voice corresponding to the character string included in the user interface is output from the speaker; wherein the voice is suspended for a preset time corresponding to the punctuation mark, and then continues to broadcast.
  • a content-based playback method including:
  • wherein, when it is determined that the total length of the character string is longer than a predetermined length, the character string is divided into a plurality of segments according to the predetermined length, and playback pauses for a preset duration between the broadcast voices corresponding to the different segments before continuing.
  • a content-based playback method including:
  • when it is detected that the length of the character string is greater than the unit playback length and punctuation exists in the broadcast content corresponding to the character string, the broadcast content is divided into several broadcast segments according to the punctuation;
  • the character strings corresponding to the broadcast segment are sequentially transmitted to the sound playback module, so that the sound playback module plays the voice content corresponding to the broadcast segment.
  • FIG. 1A exemplarily shows a schematic diagram of an operation scene between a display device and a control device
  • FIG. 1B exemplarily shows a configuration block diagram of the control device 100 in FIG. 1A;
  • FIG. 1C exemplarily shows a configuration block diagram of the display device 200 in FIG. 1A;
  • FIG. 1D exemplarily shows a block diagram of the architecture configuration of the operating system in the memory of the display device 200
  • FIG. 2 exemplarily shows a schematic diagram of a language guide opening screen provided by the display device 200
  • FIGS. 3A-3B exemplarily show schematic diagrams of the voice playback speed modification screen provided by the display device 200;
  • FIG. 4 exemplarily shows a schematic diagram of a GUI provided by the display device 200 by operating the control device 100;
  • FIGS. 5A-5C exemplarily show schematic diagrams of another GUI provided by the display device 200 by operating the control device 100;
  • Fig. 6 exemplarily shows a flow chart of a content-based voice playback method
  • FIG. 7 exemplarily shows a schematic diagram of the broadcast content corresponding to the character string
  • Fig. 8 exemplarily shows another flow chart of a content-based voice playback method
  • FIG. 9 exemplarily shows a schematic diagram of a scenario of calculating pause time and unit playback length
  • FIG. 10 exemplarily shows a flowchart of a method for modifying the pause time corresponding to a punctuation mark.
  • user interface in this application refers to a medium interface for interaction and information exchange between an application or operating system and a user. It realizes the conversion between the internal form of information and the form acceptable to the user.
  • the commonly used form of user interface is a graphical user interface (graphic user interface, GUI), which refers to a user interface related to computer operations that is displayed in a graphical manner. It can be an icon, window, control and other interface elements displayed on the display of the display device.
  • Controls can include icons, buttons, menus, tabs, text boxes, dialog boxes, status bars, navigation bars, widgets, and other visual interface elements.
  • FIG. 1A exemplarily shows a schematic diagram of an operation scene between the display device and the control device.
  • the control device 100 and the display device 200 can communicate in a wired or wireless manner.
  • the control device 100 is configured to control the display device 200: it can receive operation instructions input by the user, convert the operation instructions into instructions that the display device 200 can recognize and respond to, and act as an intermediary for the interaction between the user and the display device 200.
  • the user operates the channel addition and subtraction keys on the control device 100, and the display device 200 responds to the channel addition and subtraction operations.
  • the control device 100 may be a remote controller 100A, including infrared protocol communication or Bluetooth protocol communication, and other short-distance communication methods, etc., to control the display device 200 in a wireless or other wired manner.
  • the user can control the display device 200 by inputting user instructions through keys on the remote control, voice input, control panel input, etc.
  • the user can control the functions of the display device 200 by inputting corresponding control commands through the volume plus and minus keys, channel control keys, up/down/left/right movement keys, voice input keys, menu keys, and power on/off keys on the remote control.
  • the control device 100 may also be a smart device, such as a mobile terminal 100B, a tablet computer, a computer, a notebook computer, and the like.
  • an application program running on a smart device is used to control the display device 200.
  • the application can be configured to provide users with various controls through an intuitive user interface (UI) on the screen associated with the smart device.
  • the mobile terminal 100B can install a software application with the display device 200, realize connection communication through a network communication protocol, and realize the purpose of one-to-one control operation and data communication.
  • the mobile terminal 100B can establish a control instruction protocol with the display device 200, and realize the functions of the physical keys arranged in the remote control 100A by operating various function keys or virtual buttons of the user interface provided on the mobile terminal 100B.
  • the audio and video content displayed on the mobile terminal 100B can also be transmitted to the display device 200 to realize the synchronous display function.
  • the display device 200 may provide a broadcast receiving function and a network TV function with computer support.
  • the display device can be implemented as digital TV, Internet TV, Internet Protocol TV (IPTV), and so on.
  • the display device 200 may be a liquid crystal display, an organic light emitting display, or a projection device.
  • the specific display device type, size and resolution are not limited.
  • the display device 200 also performs data communication with the server 300 through a variety of communication methods.
  • the display device 200 may be allowed to communicate through a local area network (LAN), a wireless local area network (WLAN), and other networks.
  • the server 300 may provide various contents and interactions to the display device 200.
  • the display device 200 can send and receive information, such as receiving electronic program guide (EPG) data, receiving software program updates, or accessing a remotely stored digital media library.
  • the server 300 can be one group or multiple groups, and can be one type or multiple types of servers.
  • the server 300 provides other network service content such as video-on-demand and advertising services.
  • FIG. 1B exemplarily shows a configuration block diagram of the control device 100.
  • the control device 100 includes a controller 110, a memory 120, a communicator 130, a user input interface 140, an output interface 150, and a power supply 160.
  • the controller 110 includes a random access memory (RAM) 111, a read only memory (ROM) 112, a processor 113, a communication interface, and a communication bus.
  • the controller 110 is used to control the operation and operation of the control device 100, as well as the communication and cooperation between internal components, and external and internal data processing functions.
  • the controller 110 may generate a signal corresponding to a detected interaction and send the signal to the display device 200.
  • the memory 120 is used to store various operating programs, data, and applications for driving and controlling the control device 100 under the control of the controller 110.
  • the memory 120 can store various control signal instructions input by the user.
  • the communicator 130 realizes the communication of control signals and data signals with the display device 200 under the control of the controller 110.
  • the control device 100 sends a control signal (such as a touch signal or a button signal) to the display device 200 via the communicator 130, and the control device 100 can receive the signal sent by the display device 200 via the communicator 130.
  • the communicator 130 may include an infrared signal interface 131 and a radio frequency signal interface 132.
  • When the infrared signal interface 131 is used, the user input instruction needs to be converted into an infrared control signal according to the infrared control protocol and sent to the display device 200 via the infrared sending module.
  • When the radio frequency signal interface 132 is used, the user input instruction needs to be converted into a digital signal, modulated according to the radio frequency control signal modulation protocol, and then sent to the display device 200 by the radio frequency sending terminal.
  • the user input interface 140 may include at least one of a microphone 141, a touch panel 142, a sensor 143, a button 144, etc., so that the user can input user instructions for controlling the display device 200 to the control device 100 through voice, touch, gesture, pressing, and the like.
  • the output interface 150 outputs a user instruction received by the user input interface 140 to the display device 200, or outputs an image or voice signal received by the display device 200.
  • the output interface 150 may include an LED interface 151, a vibration interface 152 that generates vibration, a sound output interface 153 that outputs a sound, a display 154 that outputs an image, and the like.
  • the remote controller 100A can receive output signals such as audio, video, or data from the output interface 150, and present the received output signals in the form of an image on the display 154, in the form of audio at the sound output interface 153, or in the form of vibration at the vibration interface 152.
  • the power supply 160 is used to provide operating power support for each element of the control device 100 under the control of the controller 110.
  • the form can be battery and related control circuit.
  • FIG. 1C exemplarily shows a block diagram of the hardware configuration of the display device 200.
  • the display device 200 may include a tuner and demodulator 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a memory 260, a user interface 265, a video processor 270, a display 275, Audio processor 280, audio output interface 285, and power supply 290.
  • the tuner and demodulator 210 receives broadcast television signals in a wired or wireless manner, can perform modulation and demodulation processing such as amplification, mixing, and resonance, and is used to demodulate, from multiple wireless or cable broadcast television signals, the audio and video signals carried in the frequency of the television channel selected by the user, as well as additional information (such as EPG data).
  • the tuner and demodulator 210 can be selected by the user and controlled by the controller 250 to respond to the frequency of the television channel selected by the user and the television signal carried by the frequency.
  • the tuner and demodulator 210 can receive signals in many ways according to the broadcasting format of the TV signal, such as terrestrial broadcasting, cable broadcasting, satellite broadcasting, or Internet broadcasting; according to the modulation type, it can use digital or analog modulation; and according to the type of TV signal received, it can demodulate both analog and digital signals.
  • the tuner demodulator 210 may also be in an external device, such as an external set-top box.
  • the set-top box outputs a TV signal after modulation and demodulation, which is input to the display device 200 through the external device interface 240.
  • the communicator 220 is a component used to communicate with external devices or external servers according to various types of communication protocols.
  • the display device 200 may transmit content data to an external device connected via the communicator 220, or browse and download content data from an external device connected via the communicator 220.
  • the communicator 220 may include a network communication protocol module such as a WIFI module 221, a Bluetooth communication protocol module 222, a wired Ethernet communication protocol module 223, or a near field communication protocol module, so that the communicator 220 can, under the control of the controller 250, receive control signals from the control device 100 implemented as WIFI signals, Bluetooth signals, radio frequency signals, and the like.
  • the detector 230 is a component of the display device 200 for collecting signals from the external environment or interacting with the outside.
  • the detector 230 may include a sound collector 231, such as a microphone, which may be used to receive the user's voice, for example a voice signal of a control instruction for the user to control the display device 200; or it may collect environmental sounds used to identify the type of environmental scene, so that the display device 200 can adapt to environmental noise.
  • the detector 230 may also include an image collector 232, such as a camera, which may be used to collect external environment scenes to adaptively change the display parameters of the display device 200, and to collect attributes of the user or gestures for interacting with the user, so as to realize interaction between the display device and the user.
  • the detector 230 may further include a light receiver, which is used to collect the ambient light intensity to adapt to changes in display parameters of the display device 200 and so on.
  • the detector 230 may also include a temperature sensor.
  • the display device 200 may adaptively adjust the display color temperature of the image. In some embodiments, when the temperature is relatively high, the color temperature of the displayed image of the display device 200 can be adjusted to be relatively cool; when the temperature is relatively low, the color temperature of the displayed image of the display device 200 can be adjusted to be relatively warm.
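  • A minimal sketch of this temperature-dependent adjustment is shown below; the thresholds, color-temperature labels, and helper callables are illustrative assumptions, not values defined by this application.

```python
# Illustrative sketch only: thresholds and helper names are assumptions, not defined by this application.
HIGH_TEMP_C = 35   # assumed threshold for a "relatively high" ambient temperature
LOW_TEMP_C = 10    # assumed threshold for a "relatively low" ambient temperature

def adjust_color_temperature(read_ambient_temperature, set_color_temperature):
    """Adapt the display color temperature of the image to the ambient temperature."""
    temp = read_ambient_temperature()      # e.g. a reading from the temperature sensor of detector 230
    if temp >= HIGH_TEMP_C:
        set_color_temperature("cool")      # relatively high temperature: cooler displayed image
    elif temp <= LOW_TEMP_C:
        set_color_temperature("warm")      # relatively low temperature: warmer displayed image
```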
  • the external device interface 240 is a component that provides the controller 250 to control data transmission between the display device 200 and external devices.
  • the external device interface 240 can be connected to external devices such as set-top boxes, game devices, and notebook computers in a wired/wireless manner, and can receive data from external devices, such as video signals (e.g. moving images), audio signals (e.g. music), and additional information (e.g. EPG).
  • the external device interface 240 may include any one or more of: a high-definition multimedia interface (HDMI) terminal 241, a composite video blanking synchronization (CVBS) terminal 242, an analog or digital component terminal 243, a universal serial bus (USB) terminal 244, a component terminal (not shown in the figure), a red, green and blue (RGB) terminal (not shown in the figure), and so on.
  • the controller 250 controls the work of the display device 200 and responds to user operations by running various software control programs (such as an operating system and various application programs) stored on the memory 260.
  • the controller can be implemented as a chip (System-on-a-Chip, SOC).
  • the controller 250 includes a random access memory (RAM) 251, a read only memory (ROM) 252, a graphics processor 253, a CPU processor 254, a communication interface 255, and a communication bus 256.
  • the RAM 251, the ROM 252, the graphics processor 253, the CPU processor 254, and the communication interface 255 are connected via the communication bus 256.
  • the ROM 252 is used to store various system startup instructions. For example, when a power-on signal is received, the display device 200 starts to power up, and the CPU processor 254 runs the system startup instructions in the ROM 252 to copy the operating system stored in the memory 260 to the RAM 251 so as to start running the operating system. After the operating system is started, the CPU processor 254 copies the various application programs in the memory 260 to the RAM 251 and then starts to run the various application programs.
  • the graphics processor 253 is used to generate various graphics objects, such as icons, operating menus, and user input instructions to display graphics.
  • the graphics processor 253 may include an arithmetic unit, which performs operations on the various interactive instructions input by the user and then displays the various objects according to their display attributes, and a renderer, which renders the various objects generated by the arithmetic unit; the rendered result is displayed on the display 275.
  • the CPU processor 254 is configured to execute the operating system and application program instructions stored in the memory 260, and, according to the received user input instructions, to execute various applications and process data and content, so as to finally display and play various audio and video content.
  • the CPU processor 254 may include multiple processors.
  • the multiple processors may include a main processor and multiple or one sub-processors.
  • the main processor is configured to perform some initialization operations of the display device 200 in the display device preloading mode and/or to perform display screen operations in the normal mode; the one or more sub-processors are used to perform operations in the standby mode of the display device.
  • the communication interface 255 may include the first interface to the nth interface. These interfaces may be network interfaces connected to external devices via a network.
  • the controller 250 may control the overall operation of the display device 200. For example, in response to receiving a user input command for selecting a GUI object displayed on the display 275, the controller 250 may perform an operation related to the object selected by the user input command.
  • the controller can be implemented as an SOC (System on Chip) or an MCU (Micro Control Unit).
  • the object can be any one of the selectable objects, such as a hyperlink or an icon.
  • the operation related to the selected object is, for example, the operation of displaying the page, document, or image connected to a hyperlink, or the operation of running the program corresponding to an icon.
  • the user input command for selecting the GUI object may be a command input through various input devices (for example, a mouse, a keyboard, a touch pad, etc.) connected to the display device 200 or a voice command corresponding to a voice spoken by the user.
  • the memory 260 is used to store various types of data, software programs or application programs for driving and controlling the operation of the display device 200.
  • the memory 260 may include volatile and/or non-volatile memory.
  • the term “memory” includes the memory 260, the RAM 251 and ROM 252 of the controller 250, or the memory card in the display device 200.
  • the memory 260 is specifically used to store the operating program that drives the controller 250 in the display device 200, to store the various application programs built into the display device 200 and those downloaded by the user from external devices, and to store data used to configure the various GUIs provided by the display 275, the various objects related to the GUIs, and the visual-effect images of the selector used to select GUI objects.
  • the memory 260 is specifically used to store drivers and related data of the tuner and demodulator 210, the communicator 220, the detector 230, the external device interface 240, the video processor 270, the display 275, the audio processor 280, etc., as well as external data (such as audio and video data) and user data (such as key information, voice information, and touch information).
  • the memory 260 specifically stores software and/or programs for representing an operating system (OS). These software and/or programs may include, for example, a kernel, middleware, application programming interface (API), and/or application.
  • the kernel can control or manage system resources and functions implemented by other programs (such as the middleware, APIs, or application programs); at the same time, the kernel can provide interfaces that allow the middleware, APIs, or application programs to access the controller so as to control or manage system resources.
  • FIG. 1D exemplarily shows a block diagram of the architecture configuration of the operating system in the memory of the display device 200.
  • the operating system architecture consists of the application layer, the middleware layer, and the kernel layer from top to bottom.
  • Application layer: built-in system applications and non-system-level applications belong to the application layer, which is responsible for direct interaction with users.
  • the application layer can include multiple applications, such as settings applications, e-post applications, media center applications, and so on. These applications can be implemented as Web applications, which are executed based on the WebKit engine, and specifically can be developed and executed based on HTML5, Cascading Style Sheets (CSS) and JavaScript.
  • HTML, the full name of which is HyperText Markup Language, uses tags to describe text, graphics, animations, sounds, tables, and links; the browser reads the HTML document, interprets the tags in the document, and displays the content as a web page.
  • CSS, the full name of which is Cascading Style Sheets, is a computer language used to express the style of HTML documents, and can be used to define style structures such as fonts, colors, and positions. CSS styles can be stored directly in HTML web pages or in separate style files to control the styles in web pages.
  • JavaScript is a language used in web page programming, which can be inserted into HTML pages and interpreted and executed by the browser.
  • the interaction logic of the web application is implemented through JavaScript.
  • JavaScript can encapsulate JavaScript extension interfaces through the browser to realize communication with the kernel layer.
  • the middleware layer can provide some standardized interfaces to support the operation of various environments and systems.
  • the middleware layer can be implemented as the Multimedia and Hypermedia Information Coding Expert Group (MHEG) middleware related to data broadcasting, as the DLNA middleware related to external-device communication, or as middleware providing the browser environment in which each application in the display device runs.
  • the kernel layer provides core system services, such as file management, memory management, process management, network management, system security authority management and other services.
  • the kernel layer can be implemented as a kernel based on various operating systems, for example, a kernel based on the Linux operating system.
  • the kernel layer also provides communication between system software and hardware, and provides device driver services for various hardware, for example: a display driver for the display, a camera driver for the camera, a button driver for the remote control, a WiFi driver for the WIFI module, an audio driver for the audio output interface, a power management driver for the power management (PM) module, and so on.
  • the user interface 265 receives various user interactions. Specifically, it is used to send the input signal of the user to the controller 250, or to transmit the output signal from the controller 250 to the user.
  • the remote control 100A may send input signals input by the user, such as a power switch signal, a channel selection signal, and a volume adjustment signal, to the user interface 265, which forwards them to the controller 250; alternatively, the remote control 100A may receive output signals such as audio, video, or data processed by the controller 250 and output through the user interface 265, and display the received output signals or output them in the form of audio or vibration.
  • the user may input a user command on a graphical user interface (GUI) displayed on the display 275, and the user interface 265 receives the user input command through the GUI.
  • the user interface 265 may receive user input commands for controlling the position of the selector in the GUI to select different objects or items.
  • the user may input a user command by inputting a specific sound or gesture, and the user interface 265 recognizes the sound or gesture through the sensor to receive the user input command.
  • the video processor 270 is used to receive an external video signal and, according to the standard codec protocol of the input signal, perform video data processing such as decompression, decoding, scaling, noise reduction, frame rate conversion, resolution conversion, and image synthesis, to obtain a video signal that can be displayed or played directly on the display 275.
  • the video processor 270 includes a demultiplexing module, a video decoding module, an image synthesis module, a frame rate conversion module, a display formatting module, and the like.
  • the demultiplexing module is used to demultiplex the input audio-video data stream; for example, if an MPEG-2 stream (a compression standard for digital storage media moving images and audio) is input, the demultiplexing module demultiplexes it into a video signal, an audio signal, and so on.
  • the video decoding module is used to process the demultiplexed video signal, including decoding and scaling.
  • An image synthesis module such as an image synthesizer, is used to superimpose and mix the GUI signal generated by the graphics generator with the zoomed video image according to user input or by itself, to generate a displayable image signal.
  • the frame rate conversion module is used to convert the frame rate of the input video, for example converting a 60 Hz input video to a frame rate of 120 Hz or 240 Hz, usually by means of frame interpolation.
  • the display formatting module is used to change the signal output by the frame rate conversion module into a signal conforming to the display format of the display, for example formatting the output of the frame rate conversion module into an RGB data signal.
  • the display 275 is used to receive the image signal input from the video processor 270 to display video content, images, and a menu control interface.
  • the displayed video content can be from the video content in the broadcast signal received by the tuner and demodulator 210, or from the video content input by the communicator 220 or the external device interface 240.
  • the display 275 simultaneously displays a user manipulation interface UI generated in the display device 200 and used to control the display device 200.
  • the display 275 may include a display screen component for presenting a picture and a driving component for driving image display.
  • the display 275 may also include a projection device and a projection screen.
  • the sound playback module (audio processor) 280 is used to receive an external audio signal and, according to the standard codec protocol of the input signal, perform decompression and decoding, as well as audio data processing such as noise reduction, digital-to-analog conversion, and amplification, to obtain an audio signal that can be played in the speaker.
  • the audio processor 280 may support various audio formats, such as MPEG-2, MPEG-4, Advanced Audio Coding (AAC), and High Efficiency AAC (HE-AAC).
  • the sound playing module 280 is also used to convert the character string into a sound in PCM format and play it in the speaker 286.
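  • A minimal sketch of the interface such a sound playing module might expose is shown below; the class, its method names, and the external TTS engine are assumptions for illustration, not the module defined by this application.

```python
# Hypothetical interface sketch; the class, method names, and the external TTS engine are assumptions.
class SoundPlayingModule:
    def __init__(self, tts_engine, speaker, unit_playback_length=25):
        self.tts_engine = tts_engine                      # any text-to-speech backend returning PCM bytes
        self.speaker = speaker                            # object exposing play_pcm(pcm_bytes)
        self.unit_playback_length = unit_playback_length  # maximum characters accepted per call

    def play(self, text: str) -> None:
        """Convert a character string into PCM-format sound and play it through the speaker."""
        # Callers are expected to pre-segment strings to at most unit_playback_length characters.
        pcm = self.tts_engine.synthesize_to_pcm(text)     # assumed TTS call producing raw PCM audio
        self.speaker.play_pcm(pcm)
```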
  • the audio output interface 285 is used to receive the audio signal output by the audio processor 280 under the control of the controller 250.
  • the audio output interface 285 may include a speaker 286, or an external audio output terminal 287 for outputting to a sound-generating device of an external device, such as a headphone output terminal.
  • the video processor 270 may include one or more chips.
  • the audio processor 280 may also include one or more chips.
  • the video processor 270 and the audio processor 280 may be separate chips, or may be integrated with the controller 250 in one or more chips.
  • the power supply 290 is used to provide power supply support for the display device 200 with power input from an external power supply under the control of the controller 250.
  • the power supply 290 may be a built-in power supply circuit installed inside the display device 200, or may be a power supply installed outside the display device 200.
  • FIG. 2 exemplarily shows a schematic diagram of a language guide opening screen provided by the display device 200.
  • the display device can provide, on the display, a setting screen for turning the language guide on or off. Blind or visually impaired users need to turn on the language guide function before using the display device, so as to enable the voice playback function.
  • Fig. 3 exemplarily shows a schematic diagram of a voice broadcast speed modification screen provided by the display device 200.
  • the display device can provide a voice broadcast speed modification setting screen to the display.
  • the voice broadcast speed is divided into 5 levels, "Very slow”, “Slow”, “Normal”, “Fast”, and “Quick”. If the user does not modify the speaking rate, the default is "normal” speaking rate.
  • the display device can also display the voice broadcast speed as a numerical value on the speed modification setting screen, and the user can input the voice broadcast speed he wants, for example 150 words/minute; a sketch of one possible representation of these settings is given below.
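  • The following sketch shows how the five speed levels and a numeric words-per-minute setting might be stored; the specific words-per-minute values are illustrative assumptions, except for the 150 words/minute figure mentioned above.

```python
# Illustrative mapping of speed levels to words per minute; the numbers are assumptions,
# except 150 words/minute, which is the example value mentioned above.
SPEED_LEVELS_WPM = {"Very slow": 90, "Slow": 120, "Normal": 150, "Fast": 180, "Quick": 210}

def resolve_broadcast_speed(level="Normal", custom_wpm=None):
    """Return the broadcast speed in words per minute, defaulting to the 'Normal' level."""
    if custom_wpm is not None:     # the user entered a numeric value, e.g. 150 words/minute
        return custom_wpm
    return SPEED_LEVELS_WPM.get(level, SPEED_LEVELS_WPM["Normal"])
```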
  • FIG. 4 exemplarily shows a schematic diagram of a GUI 400 provided by the display device 200 by operating the control device 100.
  • the display device may provide a GUI 400 to the display.
  • the GUI 400 includes one or more display areas providing different image content, and each display area includes one or more different items arranged. For example, items 411 to 417 are arranged in the display area 41.
  • the GUI also includes a selector 42 indicating that any item is selected. The position of the selector in the GUI or the position of each item in the GUI can be moved by the input of the user operating the control device to change the selection of different items. For example, the selector 42 indicates that the item 411 in the display area 41 is selected.
  • items refer to visual objects displayed in each display area of the GUI of the display device 200 to represent corresponding content such as icons, thumbnails, video clips, links, etc. These items can provide users with information received through data broadcasting.
  • the presentation form of an item is usually diverse.
  • the item may include text content and/or an image for displaying thumbnails related to the text content.
  • the item can be the text and/or icon of the application.
  • the display form of the selector can be the focus object.
  • the item can be selected or controlled by controlling the movement of the focus object displayed in the display device 200 according to the user's input through the control device 100.
  • the user can use the arrow keys on the control device 100 to control the movement of the focus object between items to select and control items.
  • the identification form of the focus object is not limited.
  • the position of the focus object can be realized or identified by setting the item background color, and the position of the focus object can also be identified by changing the border line, size, transparency and outline and/or font of the text or image of the focus item.
  • FIGS. 5A to 5C exemplarily show schematic diagrams of a GUI provided by the display device 200 by operating the control device 100.
  • the GUI can be implemented as the home page of the terminal device.
  • the display area 41 includes items 411 to 417 provided for users; items 411 to 416 are a novel, poetry, prose, a script, a drama, and a fable, respectively, and item 417 is an introduction to the novel.
  • the current selector 42 indicates that the novel is selected.
  • the user operates the control device and instructs the selector 42 to select the item 411.
  • when the user presses the arrow keys on the control device,
  • the display device responds to the key input instruction, instructs the selector 42 to move to item 412, and plays the voice content corresponding to item 412, namely "poetry".
  • the user operates the control device and instructs the selector 42 to select the item 411.
  • when the user presses the arrow keys on the control device,
  • the display device responds to the key input instruction and instructs the selector 42 to move to item 417,
  • and the voice content corresponding to item 417 is played, that is: "A novel, centered on portraying characters, is a literary genre that reflects social life through a complete storyline and environment description.
  • Characters, plot, and environment are the three elements of a novel.
  • the plot generally includes four parts: beginning, development, climax, and ending. Some include the prologue and the end.
  • the environment includes the natural environment and the social environment.”
  • Since the length of the character string of the content of item 417 is greater than the unit playback length and the content contains punctuation, whole sentences are identified according to the end-of-sentence marks.
  • the end-of-sentence marks include three types: the period, the exclamation mark, and the question mark.
  • each whole sentence is divided into one broadcast segment, thereby dividing the content of item 417 into several broadcast segments. In some embodiments, provided the string length of a broadcast segment is not greater than the unit playback length,
  • the broadcast segment can include one or several whole sentences. For example, if the sum of the string lengths of the first whole sentence and the second whole sentence is less than the unit playback length, the first whole sentence and the second whole sentence can be combined into one broadcast segment, as in the sketch below.
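  • A minimal sketch of this division is shown below, assuming end-of-sentence marks consisting of the Chinese and ASCII period, exclamation mark, and question mark, and a greedy grouping of whole sentences up to the unit playback length; the function names are illustrative.

```python
import re

# Assumed end-of-sentence marks: Chinese and ASCII period, exclamation mark, and question mark.
END_MARKS = "。！？.!?"

def split_whole_sentences(content):
    """Split broadcast content into whole sentences, keeping each end-of-sentence mark."""
    parts = re.split(f"(?<=[{END_MARKS}])", content)
    return [part for part in parts if part.strip()]

def divide_into_segments(content, unit_playback_length):
    """Greedily combine consecutive whole sentences while the segment fits the unit playback length."""
    segments, current = [], ""
    for sentence in split_whole_sentences(content):
        if current and len(current) + len(sentence) > unit_playback_length:
            segments.append(current)   # the current segment is full: start a new one
            current = sentence
        else:
            current += sentence        # the sentence still fits: append it to the current segment
    if current:
        segments.append(current)
    return segments
```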
  • at each punctuation mark in a broadcast segment, an identifier of the pause time corresponding to that punctuation is added; the content corresponding to the broadcast segments is transmitted to the sound playing module in sequence for playback.
  • Fig. 6 exemplarily shows a flow chart of a content-based voice broadcast method.
  • a content-based voice broadcast method includes the following steps S51-S59:
  • Step S51 Receive an instruction input by the user through the control device.
  • the user opens the language guide of the display device.
  • the user interface displays a UI menu or browser application, and the user interface includes at least a character string of a preset length.
  • the user moves the position of the selector in the user interface through the control device to select the character string.
  • the input instruction is used to instruct the sound playing module to play the voice content corresponding to the character string.
  • In some embodiments, the broadcast content corresponding to the character string is a large piece of content, for example an article. As shown in FIG. 7, an article can be divided into paragraphs, and each paragraph can be divided into individual sentences, with punctuation added in the sentences as needed. For example, an enumeration comma indicates a pause between words, a comma indicates a pause between clauses, and a period indicates the end of a sentence.
  • Step S52 In response to the input instruction, receive the broadcast content corresponding to the character string;
  • Step S53 Determine whether the length of the character string is greater than the unit playback length
  • If the length of the character string is not greater than the unit playback length, step S54 is executed.
  • Step S54 Transmit the character string to the sound playing module, so that the sound playing module plays the voice content corresponding to the character string;
  • For example, the broadcast content is the name of an application,
  • and the string length of the name is 5, which is less than the unit playback length of 20;
  • the name of the application is therefore transmitted directly to the sound playback module, and the sound playback module plays the name of the application.
  • If the length of the character string is greater than the unit playback length, step S55 is executed.
  • Step S55 Determine whether there are punctuations in the broadcast content
  • If there is no punctuation in the broadcast content, step S56 is executed.
  • Step S56 The broadcast content is intercepted with a unit playback length, and transmitted to the sound playback module in segments, so that the sound playback module plays the voice content corresponding to the character string.
  • the unit playback length is 25, that is, the sound player can receive and convert 25 characters at a time.
  • the broadcast content contains no punctuation, for example the novel description run together without punctuation marks: "The novel focuses on portraying characters through a complete story plot and environment description to reflect social life as a literary genre
  • the characters the plot and the environment are the three elements of the novel the plot generally includes the beginning the development the climax and the ending and some include the prologue and the epilogue the environment includes the natural environment and the social environment".
  • the segmentation result is obtained by cutting every 25 characters (the character counts refer to the original Chinese string), for example:
  • "...the plot generally includes the beginning the development the climax and the ending
  • and some include the prologue and the epilogue..." (25 characters)
  • "the environment includes the natural environment and the social environment" (14 characters), as in the sketch below.
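  • A minimal sketch of this fixed-length interception when no punctuation is present, assuming a unit playback length of 25 characters as in the example above:

```python
def intercept_by_unit_length(content, unit_playback_length=25):
    """Cut broadcast content with no punctuation into pieces of at most unit_playback_length characters."""
    return [content[i:i + unit_playback_length]
            for i in range(0, len(content), unit_playback_length)]

# Usage sketch: each piece is then transmitted to the sound playback module in turn;
# the last piece may be shorter than the unit playback length, e.g. 14 characters.
```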
  • If there is punctuation in the broadcast content, step S57 is executed.
  • Step S57 Divide the broadcast content into several broadcast segments according to the punctuation
  • the sound playback module needs to convert the character string into sound in PCM format before broadcasting, and the length of string it can receive and convert at one time depends on its capability. Based on the conversion capability of the sound playback module, the optimal number of characters to transmit at a time is determined.
  • the optimal broadcast length can be set as the unit playback length.
  • the unit playback length within the conversion capability of the sound playback module can also be set according to user requirements.
  • Punctuation marks are divided into dots and labels.
  • the dots within a sentence include four types: the enumeration comma, the comma, the semicolon, and the colon, which indicate pauses and structural relationships within a sentence.
  • Labels include quotation marks, brackets, dashes, ellipses, etc. A sketch of this classification is given below.
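  • The following sketch encodes this classification; the concrete character sets cover both Chinese and ASCII forms and are illustrative assumptions rather than an exhaustive list defined by this application.

```python
# Illustrative character sets; they are assumptions, not an exhaustive list from this application.
END_OF_SENTENCE_DOTS = set("。！？.!?")    # period, exclamation mark, question mark
IN_SENTENCE_DOTS = set("、，；：,;:")      # enumeration comma, comma, semicolon, colon
LABELS = set("“”‘’（）()《》…\"'")          # quotation marks, brackets, ellipses, etc.

def classify_punctuation(ch):
    """Return the class of a punctuation character, or None for an ordinary character."""
    if ch in END_OF_SENTENCE_DOTS:
        return "end_of_sentence_dot"
    if ch in IN_SENTENCE_DOTS:
        return "in_sentence_dot"
    if ch in LABELS:
        return "label"
    return None
```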
  • the broadcast content is divided into several broadcast segments according to the punctuation, which specifically includes:
  • the first end-of-sentence mark in the broadcast content, together with the content before it, constitutes a whole sentence;
  • each subsequent end-of-sentence mark, together with the content between it and the previous end-of-sentence mark, constitutes a whole sentence.
  • the end-of-sentence marks include three types: the period, the exclamation mark, and the question mark.
  • For example, the broadcast content is: "A novel, centered on portraying characters, is a literary genre that reflects social life through a complete storyline and environment description.
  • Characters, plot, and environment are the three elements of a novel.
  • The plot generally includes four parts: beginning, development, climax, and ending, and some include a prologue and an epilogue.
  • The environment includes the natural environment and the social environment. Novels can be divided into full-length novels, novellas, short stories, and mini novels according to their length and capacity."
  • the broadcast content is divided into the following broadcast segments:
  • segment 1: "A novel, centered on portraying characters, is a literary genre that reflects social life through a complete storyline and environment description." segment 2: "Characters, plot, and environment are the three elements of a novel."
  • segment 3: "The plot generally includes four parts: beginning, development, climax, and ending, and some include a prologue and an epilogue."
  • segment 4: "The environment includes the natural environment and the social environment."
  • segment 5: "Novels can be divided into full-length novels, novellas, short stories, and mini novels according to their length and capacity."
  • In some embodiments, a whole sentence may be too long, so that the string length of the whole sentence is greater than the unit playback length.
  • In this case, dividing the broadcast content into several broadcast segments according to the punctuation includes:
  • For example, the unit playback length is 25, that is, the sound player can receive and convert 25 characters at a time (the character counts refer to the original Chinese string).
  • the broadcast content is the novel description quoted above, in which some whole sentences are longer than 25 characters.
  • the broadcast content is divided into broadcast segments accordingly, for example:
  • segment 4: "The environment includes the natural environment and the social environment." (14 characters)
  • In some embodiments, a broadcast segment includes one or several whole sentences, and the string length of the broadcast segment is not greater than the unit playback length.
  • For example, if the sum of the string lengths of the first whole sentence and the second whole sentence is not greater than the unit playback length, the first and second whole sentences are combined into one broadcast segment; it is then judged whether the sum of the string lengths of the first to third whole sentences is greater than the unit playback length, then the first to fourth whole sentences, and so on, to divide the broadcast segments.
  • For example, the unit playback length is 42, that is, the sound player can receive and convert 42 characters at a time (the character counts refer to the original Chinese string).
  • the broadcast content is the same novel description as quoted above.
  • the broadcast segments then include, for example:
  • "The plot generally includes four parts: beginning, development, climax, and ending, and some include a prologue and an epilogue." (31 characters)
  • Step S58 Add a pause time identifier corresponding to the punctuation at the punctuation in the broadcast segment;
  • the punctuation in the broadcast segment may be replaced with a pause time identifier corresponding to the punctuation.
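  • A minimal sketch of replacing each punctuation mark in a broadcast segment with a pause-time identifier; the `<pause=...>` marker format and the concrete pause values are assumptions for illustration, since any identifier the sound playing module understands could be used.

```python
# Assumed pause times in seconds per punctuation class; the <pause=...> identifier format is an
# illustrative marker, not a format defined by this application.
PAUSE_SECONDS = {"end_of_sentence_dot": 1.0, "in_sentence_dot": 0.5, "label": 0.5}

def add_pause_identifiers(segment, classify_punctuation):
    """Replace each punctuation mark in a broadcast segment with the identifier of its pause time."""
    out = []
    for ch in segment:
        kind = classify_punctuation(ch)          # e.g. the classifier sketched earlier
        if kind is None:
            out.append(ch)                       # ordinary character: keep it unchanged
        else:
            out.append(f"<pause={PAUSE_SECONDS[kind]}>")
    return "".join(out)
```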
  • In some embodiments, the content can be divided into individual paragraphs, and each paragraph can be divided into individual sentences, with punctuation added as needed within the sentences. For example, an enumeration comma indicates a pause between words, a comma indicates a pause between clauses, and a period indicates the end of a sentence.
  • This application adds different pause times at different punctuations, so that the whole sentence can be broken during the broadcast, and the meaning of the sentence can be clear during the broadcast.
  • the period, question mark, and exclamation mark indicate the pause at the end of a sentence, while the comma, enumeration comma, semicolon, and colon express different kinds of pauses within a sentence.
  • the punctuation at the end of a sentence can be given a longer pause time, and the punctuation within a sentence can be given different degrees of pause on the basis that its pause is shorter than that of the end-of-sentence punctuation.
  • the pause time corresponding to a punctuation mark can be given in seconds, or it can be determined as a multiple of the pause time between words at the current voice broadcast speed. Different pause times correspond to different pause time identifiers.
  • For example, the pause time for an end-of-sentence dot can be set to 1 s, and the pause time for in-sentence dots and labels can be set to 0.5 s.
  • Alternatively, the pause time for an end-of-sentence dot can be set to 2 or 3 times the pause time between words.
  • the dots within a sentence can be given a shorter pause than the dot at the end of a sentence.
  • For example, an in-sentence dot can be set to 0.5 or 1 times the pause time between words, and a label can be set to 0.5 times the pause time between words, as in the sketch below.
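  • A minimal sketch of deriving the pause table from the inter-word pause using the multiples described above; the chosen multipliers fall within the ranges given here but are otherwise assumptions.

```python
def build_pause_table(word_gap_seconds):
    """Derive the pause time of each punctuation class as a multiple of the inter-word pause."""
    return {
        "end_of_sentence_dot": 2.0 * word_gap_seconds,  # 2 to 3 times the inter-word pause; 2x chosen here
        "in_sentence_dot": 1.0 * word_gap_seconds,      # 0.5 to 1 times; 1x chosen here
        "label": 0.5 * word_gap_seconds,                # 0.5 times
    }

# Example: an inter-word pause of 0.5 s gives 1.0 s, 0.5 s, and 0.25 s respectively.
```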
  • Step S59 The character strings corresponding to the broadcast segment are sequentially transmitted to the sound playback module, so that the sound playback module plays the voice content corresponding to the broadcast segment.
  • In some embodiments, the user selects the content to be played by browsing the UI menu or the browser application, and the platform middleware executes steps S51-S59 to transmit the broadcast strings to the sound player, which completes the text-to-sound conversion and broadcasts through the sound card driver; a sketch of such a pipeline is given below.
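  • The following sketch shows how platform middleware might chain steps S53-S59 together, reusing the helpers sketched above; all names are illustrative assumptions rather than the actual middleware interfaces.

```python
def broadcast_string(content,
                     sound_module,              # object exposing unit_playback_length and play(text)
                     has_punctuation,           # callable: does the content contain punctuation?
                     intercept_by_unit_length,  # helper sketched for step S56
                     divide_into_segments,      # helper sketched for step S57
                     add_pause_identifiers):    # helper for step S58, wrapped to take one argument
    """Steps S53-S59: decide how to segment the content, then feed it to the sound playing module."""
    unit = sound_module.unit_playback_length
    if len(content) <= unit:                               # S53/S54: short enough, play directly
        sound_module.play(content)
    elif not has_punctuation(content):                     # S55/S56: long content without punctuation
        for piece in intercept_by_unit_length(content, unit):
            sound_module.play(piece)
    else:                                                  # S55, S57-S59: divide on punctuation
        for segment in divide_into_segments(content, unit):
            sound_module.play(add_pause_identifiers(segment))
```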
  • To add pause times at the punctuation, the pause time between words at the current speaking rate is first determined. Since the broadcast rate is fixed on the TV platform, it can be hard-coded in the system in advance, or it can be obtained dynamically. For dynamic acquisition, refer to FIG. 9: the pause time needs to be calculated and acquired in the two scenarios of FIG. 9, scenario 1 being when the TV is turned on and scenario 2 being when the voice broadcast speed is modified. Specifically, the pause time is calculated according to the voice broadcast speed, and the unit playback length is set according to the capacity of the sound playback module.
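  • A minimal sketch of obtaining these two values in the scenarios of FIG. 9 (power-on and speed modification); the formula used for the inter-word pause and the capability query are assumptions for illustration.

```python
def compute_word_gap_seconds(words_per_minute):
    """Assumed model: take the inter-word pause as the duration of one word slot at the current rate."""
    return 60.0 / words_per_minute        # e.g. 150 words/minute gives 0.4 s per word slot

def configure_broadcast(sound_module, words_per_minute):
    """Scenario 1 (TV power-on) and scenario 2 (speed modified): refresh the two derived values."""
    word_gap = compute_word_gap_seconds(words_per_minute)
    unit_playback_length = sound_module.unit_playback_length   # set from the module's conversion capacity
    return word_gap, unit_playback_length
```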
  • the content-based voice playback method further includes:
  • Step S501 Receive a modification instruction input by the user through the control device.
  • the user selects the voice broadcast speed modification item by moving the selector of the control device, and moves the position of the selector in the user interface through the control device to select different voice broadcast speeds.
  • Step S502 In response to the modification instruction, modify the voice broadcast speed
  • the voice broadcast speed is divided into 5 levels: "Very slow", "Slow", "Normal", "Fast", and "Quick". If the user does not modify the speaking rate, the default is the "Normal" speaking rate.
  • the voice broadcast speed can be displayed as a numerical value, and the user can input the voice broadcast speed he wants within the allowable range of the speech rate.
  • Step S503 Modify the pause time corresponding to the punctuation according to the modified voice broadcast speed.
  • For example, at the normal speech rate the word-to-word pause time is calculated to be 0.5 s, so an in-sentence stop set to 1 times the word-to-word pause time pauses for 0.5 s. After the speech rate is changed to fast, the word-to-word pause time is calculated to be 0.3 s, and the same in-sentence stop, still 1 times the word-to-word pause time, now pauses for 0.3 s.
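  A minimal sketch of how steps S501-S503 could be wired together, reusing `word_gap_seconds`, `RATE_BY_LEVEL`, and `pause_seconds` from the sketches above; the class and attribute names are hypothetical.

```python
class BroadcastPauseSettings:
    """Sketch of steps S501-S503: keep the punctuation pauses derived from the
    current broadcast speed, and rederive them whenever the speed is modified."""

    def __init__(self, level: str = "normal"):
        self.apply_speed(level)

    def apply_speed(self, level: str) -> None:
        # Step S502: adopt the new speed; Step S503: recompute the word gap and,
        # from it, the pause attached to every punctuation mark.
        self.level = level
        self.word_gap_s = word_gap_seconds(RATE_BY_LEVEL[level])
        self.pause_by_punct = {p: pause_seconds(p, self.word_gap_s) for p in "。！？，、；："}

# Step S501 sketch: the handler for the speed-modification item simply
# re-applies the level chosen with the selector.
settings = BroadcastPauseSettings("normal")
settings.apply_speed("fast")   # all punctuation pauses shrink with the shorter word gap
```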
  • In the above embodiments, when long-form content is broadcast, the content to be broadcast is divided into several broadcast segments according to its punctuation, and an identifier of the pause time corresponding to each punctuation mark is added at that mark within the broadcast segment.
  • When the broadcast reaches a punctuation mark, playback pauses for the corresponding time, which gives the broadcast content a natural cadence with clear sentence breaks, keeps the meaning of each sentence clear, avoids misunderstanding of the broadcast content, and effectively improves the user experience.
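  Finally, a purely illustrative end-to-end run of the sketches above, using a trivial stand-in player and one of the example passages from this application; a real display device would route the tagged segments to its sound playback module and sound card driver instead.

```python
class PrintPlayer:
    """Trivial stand-in for the sound playback module: just prints each segment."""
    def play(self, segment: str) -> None:
        print(segment)

text = ("人物、情节、环境是小说的三要素。"
        "情节一般包括开端、发展、高潮、结局四部分，有的包括序幕、尾声。")
broadcast(text, PrintPlayer(), word_gap_s=word_gap_seconds(RATE_BY_LEVEL["normal"]))
```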

Abstract

The present application discloses a content-based voice playback method, comprising: displaying a user interface on a display, the user interface including at least a character string of a preset length and punctuation marks; and, when configured to enable a voice broadcast service, outputting from a speaker the voice corresponding to the character string included in the user interface, wherein the voice is broadcast at a non-uniform speed.

Description

基于内容的语音播放方法及显示设备 技术领域
本申请涉及显示技术领域,尤其涉及一种基于内容的语音播放方法及显示设备。
背景技术
语音播放功能是指输入一段文字经由演算法合成方式将文字以声音方式输出。语音播放功能的意义在于使盲人或者视障人士能更容易地,更方便地操控电视,更好地享受多媒体服务。
在实际语音播报过程中,播报语速快,而且是匀速播报的,不会主动形成断句。在此种播报的场景下,如果一次性播报几个单词或者是一句简短的语句,无须断句即可听懂具体的意思。但是如果一次需要播报比较长的句子,或者是由许多段落组成的大篇幅文章。例如:UI菜单的电子说明书,浏览器网页的小说。由于大篇幅的文档是句子连着句子,中间没有断句和语气顿挫,只是单词依次被播报,且播报速度快,很容易让用户听得越久越听不懂具体播报的内容。即便是听觉灵敏的盲人,没有断句,在听大篇幅播报时,也会对播报内容产生疑问。
发明内容
本申请提供一种基于内容的语音播放方法及显示设备,用以使播报内容有语气顿挫和断句的感觉,避免用户对播报内容产生误解,有效的提高用户体验。
第一方面,提供一种显示设备,包括:
显示器,用于显示用户界面,所述用户界面中至少包括预设长度的字符串;
用户接口,用于接收用户输入的指令,输入指令用于指示声音播放模块播放所述字符串对应的语音内容;
声音播放模块,用于播放所述字符串对应的语音内容;
扬声器,用于输出所述语音内容;
控制器,用于执行:
检测到所述字符串的长度大于单位播放长度且所述字符串对应的播报内容中存在标点,根据所述标点将所述播报内容划分为数个播报段;
在所述播报段中的标点处加入所述标点对应停顿时间的标识;
依次将所述播报段对应的字符串传输至所述声音播放模块,以使所述声音播放模块播放所述播报段对应的语音内容。
一些实施例中,所述标点包括句末点号,所述控制器用于按照下述步骤执行根据所述标点将所述播报内容划分为数个播报段:
根据所述句末点号识别出整句;
将每个整句划分为一个播报段。
一些实施例中,所述播报段包括一个或数个整句且所述播报段的字符串长度不大于所述单位播放长度;
其中,所述标点包括句末点号,所述整句根据所述句末点号识别得出。
一些实施例中,所述控制器,还用于执行:
响应于修改指令,修改语音播报速度;
根据修改后的语音播报速度,修改所述标点对应的停顿时间。
第二方面,提供一种显示设备,包括:
调谐解调器,用于接收和解调数字广播信号中携带的节目;
显示器,用于显示用户界面,所述用户界面中至少包括预设长度的字符串、标点符号;
扬声器,用于输出声音;
控制器,用于执行:
在被配置为启用语音播报服务时,从所述扬声器中输出所述用户界面中包括的字符串对应的语音;其中,所述语音按照不均匀的速度被播报。
一些实施例中,所述控制器,还用于执行:响应于用户输入,控制选择器移动至所述字符串的位置,以指示选择所述字符串。
第三方面,提供一种显示设备,包括:
调谐解调器,用于接收和解调数字广播信号中携带的节目;
显示器,用于显示用户界面,所述用户界面中至少包括预设长度的字符串,所述字符串中包含标点符号;
扬声器,用于输出声音;
控制器,用于执行:
在被配置为启用语音播报服务时,从所述扬声器中输出所述用户界面中包括的字符串对应的语音;其中,所述语音在对应所述标点符号处被暂停预设时间后、继续播报。
一些实施例中,不同标点符号处对应的被暂停播报的预设时间不同。
一些实施例中,相同标点符号处对应的被暂停播报的预设时间相同。
第四方面,提供一种显示设备,包括:
调谐解调器,用于接收和解调数字广播信号中携带的节目;
显示器,用于显示用户界面,所述用户界面中至少包括多个字符串;
扬声器,用于输出声音;
控制器,用于执行:
在被配置为启用语音播报服务时,从所述扬声器中输出所述用户界面中包括的字符串对应的语音;
其中,在确定所述字符串的总长度长于预定长度时,所述字符串按照所述预定长度被划分为多个分段,以及不同分段字符串对应的被播报的语音之间暂停预设时长后、继续播放。
一些实施例中,所述控制器,还用于执行:确定所述字符串内不存在标点。
第五方面,提供一种基于内容的播放方法,包括:
在显示器上显示用户界面,所述用户界面中至少包括预设长度的字符串、标点符号;
在被配置为启用语音播报服务时,从扬声器中输出所述用户界面中包括的字符串对应的语音;其中,所述语音按照不均匀的速度被播报。
第六方面,提供一种基于内容的播放方法,包括:
在显示器上显示用户界面,所述用户界面中至少包括预设长度的字符串,所述字符串中包含标点符号;
在被配置为启用语音播报服务时,从扬声器中输出所述用户界面中包括的字符串对应的语音;其中,所述语音在对应所述标点符号处被暂停预设时间后、继续播报。
第七方面,提供一种基于内容的播放方法,包括:
在显示器上显示用户界面,所述用户界面中至少包括多个字符串;
在被配置为启用语音播报服务时,从扬声器中输出所述用户界面中包括的字符串对应的语音;
其中,在确定所述字符串的总长度长于预定长度时,所述字符串按照所述预定长度被划分为多个分段,以及不同分段字符串对应的被播报的语音之间暂停预设时长后、继续播放。
第八方面,提供一种基于内容的播放方法,包括:
检测到字符串的长度大于单位播放长度且所述字符串对应的播报内容中存在标点,根据所述标点将所述播报内容划分为数个播报段;
在所述播报段中的标点处加入所述标点对应停顿时间的标识;
依次将所述播报段对应的字符串传输至所述声音播放模块,以使所述声音播放模块播放所述播报段对应的语音内容。
附图说明
为了更清楚地说明本申请实施例中的实施方式,下面将对实施例描述中所需要使用的附图作简要介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域的普通技术人员来讲,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。
图1A中示例性示出了显示设备与控制装置之间操作场景的示意图;
图1B中示例性示出了图1A中控制装置100的配置框图;
图1C中示例性示出了图1A中显示设备200的配置框图;
图1D中示例性示出了显示设备200存储器中操作系统的架构配置框图;
图2中示例性示出了显示设备200提供的语言指南开启画面的示意图;
图3A-3B中示例性示出了显示设备200提供的语音播放速度修改画面的示意图;
图4中示例性示出了通过操作控制装置100而使显示设备200提供的一个GUI的示意图;
图5A-5C中示例性示出了通过操作控制装置100而使显示设备200提供的另一个GUI的示意图;
图6中示例性示出了一种基于内容的语音播放方法的流程图;
图7中示例性示出了字符串对应播报内容的示意图;
图8中示例性示出了基于内容的语音播放方法的另一种流程图;
图9中示例性示出了计算停顿时间和单位播放长度场景的示意图;
图10中示例性示出了修改标点对应的停顿时间的方法流程图。
具体实施方式
为了使本申请的目的、实施方式和优点更加清楚,下面将结合附图对本申请作进一步地详细描述,显然,所描述的实施例仅仅是本申请一部份实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其它实施例,都属于本申请保护的范围。
本申请中的术语“用户界面”,是应用程序或操作系统与用户之间进行交互和信息交换的介质接口,它实现信息的内部形式与用户可以接受形式之间的转换。用户界面常用的表现形式是图形用户界面(graphicuserinterface,GUI),是指采用图形方式显示的与计算机操作相关的用户界面。它可以是在显示设备的 显示屏中显示的一个图标、窗口、控件等界面元素,其中控件可以包括图标、按钮、菜单、选项卡、文本框、对话框、状态栏、导航栏、Widget等可视的界面元素。
图1A中示例性示出了显示设备与控制装置之间操作场景的示意图。如图1A所示,控制装置100和显示设备200之间可以有线或无线方式进行通信。
其中,控制装置100被配置为控制显示设备200,其可接收用户输入的操作指令,且将操作指令转换为显示设备200可识别和响应的指令,起着用户与显示设备200之间交互的中介作用。如:用户通过操作控制装置100上频道加减键,显示设备200响应频道加减的操作。
控制装置100可以是遥控器100A,包括红外协议通信或蓝牙协议通信,及其他短距离通信方式等,通过无线或其他有线方式来控制显示设备200。用户可以通过遥控器上按键、语音输入、控制面板输入等输入用户指令,来控制显示设备200。如:用户可以通过遥控器上音量加减键、频道控制键、上/下/左/右的移动按键、语音输入按键、菜单键、开关机按键等输入相应控制指令,来实现控制显示设备200的功能。
控制装置100也可以是智能设备,如移动终端100B、平板电脑、计算机、笔记本电脑等。例如,使用在智能设备上运行的应用程序控制显示设备200。该应用程序通过配置可以在与智能设备关联的屏幕上,通过直观的用户界面(UI)为用户提供各种控制。
一些实施例中,移动终端100B可与显示设备200安装软件应用,通过网络通信协议实现连接通信,实现一对一控制操作的和数据通信的目的。如:可以使移动终端100B与显示设备200建立控制指令协议,通过操作移动终端 100B上提供的用户界面的各种功能键或虚拟按钮,来实现如遥控器100A布置的实体按键的功能。也可以将移动终端100B上显示的音视频内容传输到显示设备200上,实现同步显示功能。
显示设备200可提供广播接收功能和计算机支持功能的网络电视功能。显示设备可以实施为,数字电视、网络电视、互联网协议电视(IPTV)等。
显示设备200,可以是液晶显示器、有机发光显示器、投影设备。具体显示设备类型、尺寸大小和分辨率等不作限定。
显示设备200还与服务器300通过多种通信方式进行数据通信。这里可允许显示设备200通过局域网(LAN)、无线局域网(WLAN)和其他网络进行通信连接。服务器300可以向显示设备200提供各种内容和互动。示例的,显示设备200可以发送和接收信息,例如:接收电子节目指南(EPG)数据、接收软件程序更新、或访问远程储存的数字媒体库。服务器300可以一组,也可以多组,可以一类或多类服务器。通过服务器300提供视频点播和广告服务等其他网络服务内容。
图1B中示例性示出了控制装置100的配置框图。如图1B所示,控制装置100包括控制器110、存储器120、通信器130、用户输入接口140、输出接口150、供电电源160。
控制器110包括随机存取存储器(RAM)111、只读存储器(ROM)112、处理器113、通信接口以及通信总线。控制器110用于控制控制装置100的运行和操作,以及内部各部件之间的通信协作、外部和内部的数据处理功能。
一些实施例中,当检测到用户按压在遥控器100A上布置的按键的交互或触摸在遥控器100A上布置的触摸面板的交互时,控制器110可控制产生与检测 到的交互相应的信号,并将该信号发送到显示设备200。
存储器120,用于在控制器110的控制下存储驱动和控制控制装置100的各种运行程序、数据和应用。存储器120,可以存储用户输入的各类控制信号指令。
通信器130在控制器110的控制下,实现与显示设备200之间控制信号和数据信号的通信。如:控制装置100经由通信器130将控制信号(例如触摸信号或按钮信号)发送至显示设备200上,控制装置100可经由通信器130接收由显示设备200发送的信号。通信器130可以包括红外信号接口131和射频信号接口132。例如:红外信号接口时,需要将用户输入指令按照红外控制协议转化为红外控制信号,经红外发送模块进行发送至显示设备200。再如:射频信号接口时,需将用户输入指令转化为数字信号,然后按照射频控制信号调制协议进行调制后,由射频发送端子发送至显示设备200。
用户输入接口140,可包括麦克风141、触摸板142、传感器143、按键144等中至少一者,从而用户可以通过语音、触摸、手势、按压等将关于控制显示设备200的用户指令输入到控制装置100。
输出接口150,通过将用户输入接口140接收的用户指令输出至显示设备200,或者,输出由显示设备200接收的图像或语音信号。这里,输出接口150可以包括LED接口151、产生振动的振动接口152、输出声音的声音输出接口153和输出图像的显示器154等。例如,遥控器100A可从输出接口150接收音频、视频或数据等输出信号,并且将输出信号在显示器154上显示为图像形式、在声音输出接口153输出为音频形式或在振动接口152输出为振动形式。
供电电源160,用于在控制器110的控制下为控制装置100各元件提供运 行电力支持。形式可以为电池及相关控制电路。
图1C中示例性示出了显示设备200的硬件配置框图。如图1C所示,显示设备200中可以包括调谐解调器210、通信器220、检测器230、外部装置接口240、控制器250、存储器260、用户接口265、视频处理器270、显示器275、音频处理器280、音频输出接口285、供电电源290。
调谐解调器210,通过有线或无线方式接收广播电视信号,可以进行放大、混频和谐振等调制解调处理,用于从多个无线或有线广播电视信号中解调出用户所选择的电视频道的频率中所携带的音视频信号,以及附加信息(例如EPG数据)。
调谐解调器210,可根据用户选择,以及由控制器250控制,响应用户选择的电视频道的频率以及该频率所携带的电视信号。
调谐解调器210,根据电视信号的广播制式不同,可以接收信号的途径有很多种,诸如:地面广播、有线广播、卫星广播或互联网广播等;以及根据调制类型不同,可以数字调制方式或模拟调制方式;以及根据接收电视信号的种类不同,可以解调模拟信号和数字信号。
在其他一些示例性实施例中,调谐解调器210也可在外部设备中,如外部机顶盒等。这样,机顶盒通过调制解调后输出电视信号,经过外部装置接口240输入至显示设备200中。
通信器220,是用于根据各种通信协议类型与外部设备或外部服务器进行通信的组件。例如显示设备200可将内容数据发送至经由通信器220连接的外部设备,或者,从经由通信器220连接的外部设备浏览和下载内容数据。通信器220可以包括WIFI模块221、蓝牙通信协议模块222、有线以太网通信协议 模块223等网络通信协议模块或近场通信协议模块,从而通信器220可根据控制器250的控制接收控制装置100的控制信号,并将控制信号实现为WIFI信号、蓝牙信号、射频信号等。
检测器230,是显示设备200用于采集外部环境或与外部交互的信号的组件。检测器230可以包括声音采集器231,如麦克风,可以用于接收用户的声音,如用户控制显示设备200的控制指令的语音信号;或者,可以采集用于识别环境场景类型的环境声音,实现显示设备200可以自适应环境噪声。
在其他一些示例性实施例中,检测器230,还可以包括图像采集器232,如相机、摄像头等,可以用于采集外部环境场景,以自适应变化显示设备200的显示参数;以及用于采集用户的属性或与用户交互手势,以实现显示设备与用户之间互动的功能。
在其他一些示例性实施例中,检测器230,还可以包括光接收器,用于采集环境光线强度,以自适应显示设备200的显示参数变化等。
在其他一些示例性实施例中,检测器230,还可以包括温度传感器,如通过感测环境温度,显示设备200可自适应调整图像的显示色温。一些实施例中,当温度偏高的环境时,可调整显示设备200显示图像色温偏冷色调;当温度偏低的环境时,可以调整显示设备200显示图像色温偏暖色调。
外部装置接口240,是提供控制器250控制显示设备200与外部设备间数据传输的组件。外部装置接口240可按照有线/无线方式与诸如机顶盒、游戏装置、笔记本电脑等外部设备连接,可接收外部设备的诸如视频信号(例如运动图像)、音频信号(例如音乐)、附加信息(例如EPG)等数据。
其中,外部装置接口240可以包括:高清多媒体接口(HDMI)端子241、 复合视频消隐同步(CVBS)端子242、模拟或数字分量端子243、通用串行总线(USB)端子244、组件(Component)端子(图中未示出)、红绿蓝(RGB)端子(图中未示出)等任一个或多个。
控制器250,通过运行存储在存储器260上的各种软件控制程序(如操作系统和各种应用程序),来控制显示设备200的工作和响应用户的操作。例如,控制器可实现为芯片(System-on-a-Chip,SOC)。
如图1C所示,控制器250包括随机存取存储器(RAM)251、只读存储器(ROM)252、图形处理器253、CPU处理器254、通信接口255、以及通信总线256。其中,RAM251、ROM252以及图形处理器253、CPU处理器254通信接口255通过通信总线256相连接。
ROM252,用于存储各种系统启动指令。如在接收到开机信号时,显示设备200电源开始启动,CPU处理器254运行ROM252中的系统启动指令,将存储在存储器260的操作系统拷贝至RAM251中,以开始运行启动操作系统。当操作系统启动完成后,CPU处理器254再将存储器260中各种应用程序拷贝至RAM251中,然后,开始运行启动各种应用程序。
图形处理器253,用于产生各种图形对象,如图标、操作菜单、以及用户输入指令显示图形等。图形处理器253可以包括运算器,用于通过接收用户输入各种交互指令进行运算,进而根据显示属性显示各种对象;以及包括渲染器,用于产生基于运算器得到的各种对象,将进行渲染的结果显示在显示器275上。
CPU处理器254,用于执行存储在存储器260中的操作系统和应用程序指令。以及根据接收的用户输入指令,来执行各种应用程序、数据和内容的处理,以便最终显示和播放各种音视频内容。
在一些示例性实施例中,CPU处理器254,可以包括多个处理器。多个处理器可包括一个主处理器以及多个或一个子处理器。主处理器,用于在显示设备预加载模式中执行显示设备200的一些初始化操作,和/或,在正常模式下显示画面的操作。多个或一个子处理器,用于执行在显示设备待机模式等状态下的一种操作。
通信接口255,可包括第一接口到第n接口。这些接口可以是经由网络被连接到外部设备的网络接口。
控制器250可以控制显示设备200的整体操作。例如:响应于接收到用于选择在显示器275上显示的GUI对象的用户输入命令,控制器250便可以执行与由用户输入命令选择的对象有关的操作。例如,控制器可实现为SOC(System on Chip,,系统级芯片)或者MCU(Micro Control Unit,微控制单元)。
其中,该对象可以是可选对象中的任何一个,例如超链接或图标。该与所选择的对象有关的操作,例如显示连接到超链接页面、文档、图像等操作,或者执行与对象相对应的程序的操作。该用于选择GUI对象的用户输入命令,可以是通过连接到显示设备200的各种输入装置(例如,鼠标、键盘、触摸板等)输入命令或者与由用户说出语音相对应的语音命令。
存储器260,用于存储驱动和控制显示设备200运行的各种类型的数据、软件程序或应用程序。存储器260可以包括易失性和/或非易失性存储器。而术语“存储器”包括存储器260、控制器250的RAM251和ROM252、或显示设备200中的存储卡。
在一些实施例中,存储器260具体用于存储驱动显示设备200中控制器250的运行程序;存储显示设备200内置的和用户从外部设备下载的各种应用程序; 存储用于配置由显示器275提供的各种GUI、与GUI相关的各种对象及用于选择GUI对象的选择器的视觉效果图像等数据。
在一些实施例中,存储器260具体用于存储调谐解调器210、通信器220、检测器230、外部装置接口240、视频处理器270、显示器275、音频处理器280等的驱动程序和相关数据,例如从外部装置接口接收的外部数据(例如音视频数据)或用户接口接收的用户数据(例如按键信息、语音信息、触摸信息等)。
在一些实施例中,存储器260具体存储用于表示操作系统(OS)的软件和/或程序,这些软件和/或程序可包括,例如:内核、中间件、应用编程接口(API)和/或应用程序。一些实施例中,内核可控制或管理系统资源,以及其它程序所实施的功能(如所述中间件、API或应用程序);同时,内核可以提供接口,以允许中间件、API或应用程序访问控制器,以实现控制或管理系统资源。
图1D中示例性示出了显示设备200存储器中操作系统的架构配置框图。该操作系统架构从上到下依次是应用层、中间件层和内核层。
应用层,系统内置的应用程序以及非系统级的应用程序都是属于应用层。负责与用户进行直接交互。应用层可包括多个应用程序,如设置应用程序、电子帖应用程序、媒体中心应用程序等。这些应用程序可被实现为Web应用,其基于WebKit引擎来执行,具体可基于HTML5、层叠样式表(CSS)和JavaScript来开发并执行。
这里,HTML,全称为超文本标记语言(HyperText Markup Language),是一种用于创建网页的标准标记语言,通过标记标签来描述网页,HTML标签用以说明文字、图形、动画、声音、表格、链接等,浏览器会读取HTML文档, 解释文档内标签的内容,并以网页的形式显示出来。
CSS,全称为层叠样式表(Cascading Style Sheets),是一种用来表现HTML文件样式的计算机语言,可以用来定义样式结构,如字体、颜色、位置等的语言。CSS样式可以直接存储与HTML网页或者单独的样式文件中,实现对网页中样式的控制。
JavaScript,是一种应用于Web网页编程的语言,可以插入HTML页面并由浏览器解释执行。其中Web应用的交互逻辑都是通过JavaScript实现。JavaScript可以通过浏览器,封装JavaScript扩展接口,实现与内核层的通信,
中间件层,可以提供一些标准化的接口,以支持各种环境和系统的操作。例如,中间件层可以实现为与数据广播相关的中间件的多媒体和超媒体信息编码专家组(MHEG),还可以实现为与外部设备通信相关的中间件的DLNA中间件,还可以实现为提供显示设备内各应用程序所运行的浏览器环境的中间件等。
内核层,提供核心系统服务,例如:文件管理、内存管理、进程管理、网络管理、系统安全权限管理等服务。内核层可以被实现为基于各种操作系统的内核,例如,基于Linux操作系统的内核。
内核层也同时提供系统软件和硬件之间的通信,为各种硬件提供设备驱动服务,例如:为显示器提供显示驱动程序、为摄像头提供摄像头驱动程序、为遥控器提供按键驱动程序、为WIFI模块提供WiFi驱动程序、为音频输出接口提供音频驱动程序、为电源管理(PM)模块提供电源管理驱动等。
用户接口265,接收各种用户交互。具体的,用于将用户的输入信号发送给控制器250,或者,将从控制器250的输出信号传送给用户。一些实施例中,遥控器100A可将用户输入的诸如电源开关信号、频道选择信号、音量调节信号 等输入信号发送至用户接口265,再由用户接口265转送至控制器250;或者,遥控器100A可接收经控制器250处理从用户接口265输出的音频、视频或数据等输出信号,并且显示接收的输出信号或将接收的输出信号输出为音频或振动形式。
在一些实施例中,用户可在显示器275上显示的图形用户界面(GUI)输入用户命令,则用户接口265通过GUI接收用户输入命令。确切的说,用户接口265可接收用于控制选择器在GUI中的位置以选择不同的对象或项目的用户输入命令。
或者,用户可通过输入特定的声音或手势进行输入用户命令,则用户接口265通过传感器识别出声音或手势,来接收用户输入命令。
视频处理器270,用于接收外部的视频信号,根据输入信号的标准编解码协议,进行解压缩、解码、缩放、降噪、帧率转换、分辨率转换、图像合成等视频数据处理,可得到直接在显示器275上显示或播放的视频信号。
示例的,视频处理器270,包括解复用模块、视频解码模块、图像合成模块、帧率转换模块、显示格式化模块等。
其中,解复用模块,用于对输入音视频数据流进行解复用处理,如输入MPEG-2流(基于数字存储媒体运动图像和语音的压缩标准),则解复用模块将其进行解复用成视频信号和音频信号等。
视频解码模块,用于对解复用后的视频信号进行处理,包括解码和缩放处理等。
图像合成模块,如图像合成器,其用于将图形生成器根据用户输入或自身生成的GUI信号,与缩放处理后视频图像进行叠加混合处理,以生成可供显示 的图像信号。
帧率转换模块,用于对输入视频的帧率进行转换,如将输入的60Hz视频的帧率转换为120Hz或240Hz的帧率,通常的格式采用如插帧方式实现。
显示格式化模块,用于将帧率转换模块输出的信号,改变为符合诸如显示器显示格式的信号,如将帧率转换模块输出的信号进行格式转换以输出RGB数据信号。
显示器275,用于接收源自视频处理器270输入的图像信号,进行显示视频内容、图像以及菜单操控界面。显示视频内容,可以来自调谐解调器210接收的广播信号中的视频内容,也可以来自通信器220或外部装置接口240输入的视频内容。显示器275,同时显示显示设备200中产生且用于控制显示设备200的用户操控界面UI。
以及,显示器275可以包括用于呈现画面的显示屏组件以及驱动图像显示的驱动组件。或者,倘若显示器275为一种投影显示器,还可以包括一种投影装置和投影屏幕。
声音播放模块280,用于接收外部的音频信号,根据输入信号的标准编解码协议,进行解压缩和解码,以及降噪、数模转换、和放大处理等音频数据处理,得到可以在扬声器286中播放的音频信号。
一些实施例中,音频处理器280可以支持各种音频格式。例如MPEG-2、MPEG-4、高级音频编码(AAC)、高效AAC(HE-AAC)等格式。
声音播放模块280,还用于将字符串转化成PCM格式的声音并在扬声器286中播放。
音频输出接口285,用于在控制器250的控制下接收音频处理器280输出 的音频信号,音频输出接口285可包括扬声器286,或输出至外接设备的发生装置的外接音响输出端子287,如耳机输出端子。
在其他一些示例性实施例中,视频处理器270可以包括一个或多个芯片组成。音频处理器280,也可以包括一个或多个芯片组成。
以及,在其他一些示例性实施例中,视频处理器270和音频处理器280,可以为单独的芯片,也可以与控制器250一起集成在一个或多个芯片中。
供电电源290,用于在控制器250的控制下,将外部电源输入的电力为显示设备200提供电源供电支持。供电电源290可以是安装在显示设备200内部的内置电源电路,也可以是安装在显示设备200外部的电源。
图2中示例性示出了显示设备200提供的语言指南开启画面的示意图。
如图2所示,显示设备可向显示器提供语言指南开启或关闭设置画面。盲人或者视障人士在使用显示设备之前需将语言指南这一功能开启,从而开启语音播放功能。
图3中示例性示出了显示设备200提供的语音播报速度修改画面的示意图。
如图3A所示,显示设备可向显示器提供语音播报速度修改设置画面。语音播报速度分了5档,“很慢”,“慢速”,“正常”,“快速”,“很快”。如果用户不修改语速,默认的是“正常”语速。
如图3B所示,显示设备可向显示器提供语音播报速度修改设置画面。语音播报速度可以数值显示,用户可输入自己想要的语音播报速度150字/分钟。
图4中示例性示出了通过操作控制装置100而使显示设备200提供的一个GUI400的示意图。
在一些实施例中,如图4所示,显示设备可向显示器提供GUI400,该 GUI400包括提供不同图像内容的一个或多个展示区,各个展示区中包括布置的一个或多个不同项目。例如,展示区41内布置项目411~417。以及该GUI还包括指示任一项目被选择的选择器42,可通过用户操作控制装置的输入而移动选择器在GUI中的位置或移动各项目在GUI中的位置,以改变选择不同的项目。例如,选择器42指示展示区41内项目411被选择。
需要说明的是,项目是指在显示设备200中GUI的各展示区中显示以表示诸如图标、缩略图、视频剪辑、链接等对应内容的视觉对象,这些项目可以为用户提供通过数据广播接收的各种传统节目内容、以及由内容制造商设置的各种应用和服务内容等等。
项目的展示形式通常多样化。例如,项目可以包括文本内容和/或用于显示与文本内容相关的缩略图的图像。又如,项目可以是应用程序的文本和/或图标。
还需说明的是,选择器的显示形式可以为焦点对象。可根据用户通过控制装置100的输入,控制显示设备200中显示焦点对象的移动来选择或控制项目。如:用户可通过控制装置100上方向键控制焦点对象在项目之间的移动来选择和控制项目。焦点对象的标识形式不限。示例的,通过设置项目背景颜色来实现或标识焦点对象的位置,也可以通过改变聚焦项目的文本或图像的边框线、尺寸、透明度和轮廓和/或字体等标识焦点对象的位置。
图5A-5C中示例性示出了通过操作控制装置100而使显示设备200提供的一个GUI的示意图。
如图5A所示,GUI可实现为终端设备的首页。其中,展示区41包括为用户提供的项目411~417,项目411~416分别为小说、诗歌、散文、剧本、剧小说和寓言,项目417为小说介绍。当前选择器42指示小说被选择。
在图5A中,用户操作控制装置而指示选择器42选择了项目411,如用户按压控制装置上的方向键,如图5B所示,显示设备响应于该按键输入指令,指示选择器43选择了项目412时,播放项目412对应的语音内容,即“诗歌”。
在图5A中,用户操作控制装置而指示选择器42选择了项目411,如用户按压控制装置上的方向键,如图5C所示,显示设备响应于该按键输入指令,指示选择器43选择了项目417时,播放项目417对应的语音内容,即“小说,以刻画人物形象为中心,通过完整的故事情节和环境描写来反映社会生活的文学体裁。人物、情节、环境是小说的三要素。情节一般包括开端、发展、高潮、结局四部分,有的包括序幕、尾声。环境包括自然环境和社会环境”。
项目417的内容的字符串长度大于单位播放长度且该内容存在标点,根据句末点号识别出整句。其中,句末点号包括句号、感叹号和问号三种。
一些实施例中,将每个整句划分为一个播报段,从而将项目417的内容划分为数个播报段;一些实施例中,在播报段的字符串长度不大于所述单位播放长度的情况下,播报段可包括一个或数个整句。例如,第一个整句和第二个整句字符串之和小于单位播放长度,可将第一个整句和第二个整句划分为一个一个播报段。在播报段中的标点处加入标点对应停顿时间的标识;依次将所述播报段对应的内容传输至所述声音播放模块播放。
图6中示例性示出了一种基于内容的语音播报方法的流程图。
结合图6所示的方法来说,一种基于内容的语音播报方法包括以下步骤S51-S59:
步骤S51:接收用户通过控制装置而输入的指令。
用户打开显示设备的语言指南。用户界面显示UI菜单或浏览器应用,用户 界面中至少包括预设长度的字符串。用户通过控制装置移动选择器在用户界面中的位置,选择所述字符串。输入指令用于指示声音播放模块播放所述字符串对应的语音内容。
在一些实施例中,所述字符串对应的播报内容为大篇幅内容,例如:一篇文章,如图7所示,一篇文章都可以分为一个个的段落,每个段落也可以分为一个个的句子。句子里面根据需要加入了标点。像顿号,逗号,句号可以在表示词语间的停顿,逗号表示句子间的停顿,句号表示一句话的结束。
步骤S52:响应于输入指令,接收字符串对应的播报内容;
步骤S53:判断所述字符串长度是否大于单位播放长度;
如果所述字符串长度不大于单位播放长度,执行步骤S54。
步骤S54:将所述字符串传输至声音播放模块,以使声音播放模块播放所述字符串对应的语音内容;
一些实施例中,播报内容为某应用的名称,某应用的名称的字符串长度为5,小于单位播放长度20,则直接将某应用的名称传输至声音播放模块,声音播放模块播放某应用的名称。
如果所述播报内容的字符串长度大于单位播放长度,执行步骤S55。
步骤S55:判断所述播报内容中是否存在标点;
如果所述播报内容中不存在标点,执行步骤S56。
步骤S56:以单位播放长度截取所述播报内容,分段传输至所述声音播放模块,以使所述声音播放模块播放所述字符串对应的语音内容。
一些实施例中,单位播放长度为25,即声音播放器一次可以接收转化25个字符。播报内容为“小说以刻画人物形象为中心通过完整的故事情节和环境 描写来反映社会生活的文学体裁人物情节环境是小说的三要素情节一般包括开端发展高潮结局四部分有的包括序幕尾声环境包括自然环境和社会环境”。
分段结果为:
第一段:小说以刻画人物形象为中心通过完整的故事情节和环境描(25个字符)
第二段:写来反映社会生活的文学体裁人物情节环境是小说的三要(25个字符)
第三段:素情节一般包括开端发展高潮结局四部分有的包括序幕尾(25个字符)
第四段:声环境包括自然环境和社会环境(14个字符)
如果所述播报内容中存在标点,执行步骤S57。
步骤S57:根据所述标点将所述播报内容划分为数个播报段;
声音播放模块在播报之前需要将字符串转化成PCM格式的声音,根据声音播放模块能力决定一次可以接收转化多长的字符串。根据声音播放模块转化的能力,判断一次需要传多少的字符串为最佳。这个最佳的播报长度就可以设定为单位播放长度。
一些实施例中,还可以根据用户需求设置在声音播放模块转化的能力内的单位播放长度。
标点符号分为点号和标号。句中的点号包括顿号,逗号,分号和冒号四种,表示剧中的停顿和结构关系。句末点号包括句号,问号,叹号三种,表示一句好说完之后一个较大的停顿。标号包括引号,括号,破折号,省略号等。
一些实施例中,根据所述标点将所述播报内容划分为数个播报段,具体包 括:
1)根据句末点号识别出整句;
例如:播报内容的第一个句末点号与第一个句末点号之前的内容构成一个整句。当前句末点号,以及,当前句末点号与前一个句末点号之间的内容构成一个整句。其中,句末点号包括句号、感叹号和问号三种。
2)将每个整句划分为一个播报段。
一些实施例中,播报内容为“小说,以刻画人物形象为中心,通过完整的故事情节和环境描写来反映社会生活的文学体裁。人物、情节、环境是小说的三要素。情节一般包括开端、发展、高潮、结局四部分,有的包括序幕、尾声。环境包括自然环境和社会环境。小说按照篇幅及容量可分为长篇、中篇、短篇和微型小说。”
划分播报段为:
第一段:小说,以刻画人物形象为中心,通过完整的故事情节和环境描写来反映社会生活的文学体裁。
第二段:人物、情节、环境是小说的三要素。
第三段:情节一般包括开端、发展、高潮、结局四部分,有的包括序幕、尾声。
第四段:环境包括自然环境和社会环境。
第五段:小说按照篇幅及容量可分为长篇、中篇、短篇和微型小说。
一些实施例中,整句的字符串长度可能过长,导致整句的字符串长度有可能大于单位播放长度,针对这一情况,根据所述标点将所述播报内容划分为数个播报段,具体包括:
1)根据句末点号识别出整句;
2)如果所述整句的字符串长度不大于单位播放长度,将所述整句划分为一个播报段。
3)如果所述整句的字符串长度大于单位播放长度,以单位播放长度将截取所述整句,划分播报段。
一些实施例中,单位播放长度为25,即声音播放器一次可以接收转化25个字符。播报内容为“小说,以刻画人物形象为中心,通过完整的故事情节和环境描写来反映社会生活的文学体裁。人物、情节、环境是小说的三要素。环境包括自然环境和社会环境。小说按照篇幅及容量分为长篇、中篇、短篇和微型小说。
划分播报段为:
第一段:小说,以刻画人物形象为中心,通过完整的故事情节和环(25个字符)
第二段:境描写来反映社会生活的文学体裁。(16个字符)
第三段:人物、情节、环境是小说的三要素。(16个字符)
第四段:环境包括自然环境和社会环境。(14个字符)
第五段:小说按照篇幅及容量分为长篇、中篇、短篇和微型小说。(25个字符)
一些实施例中,所述播报段包括一个或数个整句且所述播报段的字符串长度不大于所述单位播放长度。
具体的步骤:如果第一个整句的字符串长度不大于单位播放长度,但第一个整句和第二个整句的字符串长度之和大于单位播放长度,则将第一个整句划 分为一个播报段;如果第一个整句和第二个整句的字符串长度之和不大于单位播放长度,则继续判断第一个整句,第二个整句和第三个整句的字符串长度之和是否大于单位播放长度。如果第一个整句,第二个整句和第三个整句的字符串长度之和大于单位播放长度,则将第一个整句和第二个整句划分为一个播报段;如果第一个整句,第二个整句和第三个整句的字符串长度之和不大于单位播放长度,则第一个整句至第四个整句之和是否大于单位播放长度,依次类推,划分出播报段。
一些实施例中,单位播放长度为42,即声音播放器一次可以接收转化42个字符。播报内容为“小说,以刻画人物形象为中心,通过完整的故事情节和环境描写来反映社会生活的文学体裁。人物、情节、环境是小说的三要素。情节一般包括开端、发展、高潮、结局四部分,有的包括序幕、尾声。环境包括自然环境和社会环境。小说按照篇幅及容量可分为长篇、中篇、短篇和微型小说。
划分播报段为:
第一段:小说,以刻画人物形象为中心,通过完整的故事情节和环境描写来反映社会生活的文学体裁。(41个字符)
第二段:人物、情节、环境是小说的三要素。(16个字符)
第三段:情节一般包括开端、发展、高潮、结局四部分,有的包括序幕、尾声。(31个字符串)
第四段:环境包括自然环境和社会环境。(14个字符串)小说按照篇幅及容量可分为长篇、中篇、短篇和微型小说。(26个字符串)
步骤S58:在所述播报段中的标点处加入所述标点对应的停顿时间标识;
一些实施例中,可将所述播报段中的标点处替换为所述标点对应的停顿时间标识。
当播报内容为大篇幅内容时,该内容可以分为一个个的段落,每个段落也可以分为一个个的句子。句子里面根据需要加入了标点。像顿号,逗号,句号可以在表示词语间的停顿,逗号表示句子间的停顿,句号表示一句话的结束。本申请在不同的标点处加入不同的停顿时间,让整句话在播报的过程中,可以断句,在播报的过程中,让句子意思清晰。
在句子中,句号,问号,叹号表示句末的停顿,逗号,顿号,分号,冒号表达的句内的不同性质的停顿。处于句末的标点可以拥有较大的停顿时间,处于句内的标点在少于句末标点的基础上,可以用于不同程度的停顿。其中,标点对应停顿时间可以为秒为单位,还可以通过当前语音播报速度下词与词的停顿时间的倍数来确定。不同停顿时间对应不同停顿时间标识。
例如:在正常语速下,句末点号的停顿时间可以设置为1s,句中点号和标号的停顿时间可以设置为0.5s。句末点号的停顿时间可以设置2倍或3倍于词与词的停顿时间。句中点号可以比句末点号少一些停顿,句中点号可以设置0.5倍或1倍于词与词的停顿时间,标号可以设置0.5倍于词与词的停顿时间。还可以将同为句中点号的顿号和逗号分别设置为0.5倍于词与词的停顿时间和1倍于词与词的停顿时间。
步骤S59:依次将所述播报段对应的字符串传输至所述声音播放模块,以使所述声音播放模块播放所述播报段对应的语音内容。
在一些实施例中,参阅图8,用户通过浏览UI菜单或浏览器应用,选择想要播放的内容,平台中间件可以执行步骤S51-S59,以使将播报的字符串传入 声音播放器,完成文字同声音的转换,并通过声卡驱动进行播报。
要在标点处加入停顿时间,首先要确认当前语速下,词与词之间的停顿时间是多少。由于播报语速在电视平台是定好的,可以提前获得硬写在系统中,也可以动态的去获取。如果是动态的获取,参阅图9,需要图9中两个场景下,去计算获取到停顿时间。场景1是开启电视时,场景2是修改语音播报速度时。具体为根据语音播报速度,计算停顿时间,根据声音播放模块的承受能力设定单位播放长度。
在一些实施例中,结合图10所示的方法来说,在步骤S51之前,所述基于内容的语音播放方法还包括:
步骤S501:接收用户通过控制装置而输入的修改指令。
用户通过控制装置移动选择器选择语音播报速度修改项目,通过控制装置移动选择器在用户界面中的位置,以选择不同语音播报速度。
步骤S502:响应于所述修改指令,修改语音播报速度;
一些实施例中,语音播报速度分为5档,“很慢”,“慢速”,“正常”,“快速”,“很快”。如果用户不修改语速,默认的是“正常”语速。
一些实施例中,语音播报速度可以数值显示,在语速允许范围内用户可输入自己想要的语音播报速度。
步骤S503:根据修改后的语音播报速度,修改所述标点对应的停顿时间。
在语音播报过程中,词与词之间有一定间隔时间。每种语速播报过程中,词与词之间的间隔时间是不一样的,语速慢一些,间隔时间会多一些,语速快一些,间隔时间会少一些。相应的,在语速改变的情况下,标点对应的停顿时间也需要进行修改,才能更好地起到断句的作用。
一些实施例中,正常语速下,计算得到词与词之间的停顿时间为0.5s。句中点号原本设置为1倍于词与词的停顿时间,即1s。在将语速修改为快速后,计算得到词与词之间的停顿时间为0.3s。句中点号设置为1倍于词与词的停顿时间,即0.6s。
在上述实施例中,在播报大篇幅内容时,根据大篇幅内容的标点将需要播报内容划分为数个播报段,在播报段中的标点处加入标点对应停顿时间的标识。当播报到标点处停顿对应的时间,使播报内容有语气顿挫,断句的感觉,让句子意思清晰,避免用户对播报内容产生误解,有效的提高用户体验。
出于说明和描述的目的,提供了前述实施例,而非旨在穷举或限制本申请。具体实施例的各个元件或特征通常不限于该具体实施例,而是在适用情况下即使未具体示出或描述也可在所选实施例中使用或互换。同样也可以进行许多形式变型,这种变型不被认为是脱离本申请所附的权利要求的范围,而且所有这样的修改被涵盖在本申请所附的权利要求的范围内。

Claims (20)

  1. 一种显示设备,其特征在于,包括:
    调谐解调器,用于接收和解调数字广播信号中携带的节目;
    显示器,用于显示用户界面,所述用户界面中至少包括预设长度的字符串、标点符号;
    扬声器,用于输出声音;
    控制器,用于执行:
    在被配置为启用语音播报服务时,从所述扬声器中输出所述用户界面中包括的字符串对应的语音;其中,所述语音按照不均匀的速度被播报。
  2. 如权利要求1所述的显示设备,其特征在于,所述控制器,还用于执行:
    响应于用户输入,控制选择器移动至所述字符串的位置,以指示选择所述字符串。
  3. 一种显示设备,其特征在于,包括:
    调谐解调器,用于接收和解调数字广播信号中携带的节目;
    显示器,用于显示用户界面,所述用户界面中至少包括预设长度的字符串,所述字符串中包含标点符号;
    扬声器,用于输出声音;
    控制器,用于执行:
    在被配置为启用语音播报服务时,从所述扬声器中输出所述用户界面中包括的字符串对应的语音;其中,所述语音在对应所述标点符号处被暂停预设时间后、继续播报。
  4. 如权利要求3所述的显示设备,其特征在于,不同标点符号处对应的被 暂停播报的预设时间不同。
  5. 如权利要求3所述的显示设备,其特征在于,相同标点符号处对应的被暂停播报的预设时间相同。
  6. 一种显示设备,其特征在于,包括:
    调谐解调器,用于接收和解调数字广播信号中携带的节目;
    显示器,用于显示用户界面,所述用户界面中至少包括多个字符串;
    扬声器,用于输出声音;
    控制器,用于执行:
    在被配置为启用语音播报服务时,从所述扬声器中输出所述用户界面中包括的字符串对应的语音;
    其中,在确定所述字符串的总长度长于预定长度时,所述字符串按照所述预定长度被划分为多个分段,以及不同分段字符串对应的被播报的语音之间暂停预设时长后、继续播放。
  7. 如权利要求6所述的显示设备,其特征在于,所述控制器,还用于执行:确定所述字符串内不存在标点。
  8. 一种显示设备,其特征在于,包括:
    显示器,用于显示用户界面,所述用户界面中至少包括预设长度的字符串;
    用户接口,用于接收用户输入的指令;
    声音播放模块,用于播放所述字符串对应的语音内容;
    扬声器,用于输出所述语音内容;
    控制器,用于执行:
    检测到所述字符串的长度大于单位播放长度且所述字符串内存在标点,根 据所述标点将所述播报内容划分为数个播报段,以及在所述标点处加入所述标点对应停顿时间的标识;
    依次将所述播报段对应的字符串传输至所述声音播放模块,以使所述声音播放模块播放所述播报段对应的语音内容。
  9. 如权利要求8所述的显示设备,其特征在于,所述标点包括句末点号,所述控制器用于按照下述步骤执行根据所述标点将所述播报内容划分为数个播报段:
    根据所述句末点号识别出整句;
    将每个整句划分为一个播报段。
  10. 如权利要求8所述的显示设备,其特征在于,所述播报段包括一个或数个整句且所述播报段的字符串长度不大于所述单位播放长度;
    其中,所述标点包括句末点号,所述整句根据所述句末点号识别得出。
  11. 如权利要8所述的显示设备,其特征在于,所述控制器,还用于执行:
    响应于用户输入的修改指令,修改声音播放模块播放语音内容的语音播报速度;
    根据修改后的语音播报速度,修改所述标点对应的停顿时间。
  12. 一种基于内容的播放方法,其特征在于,包括:
    在显示器上显示用户界面,所述用户界面中至少包括预设长度的字符串、标点符号;
    在被配置为启用语音播报服务时,从扬声器中输出所述用户界面中包括的字符串对应的语音;其中,所述语音按照不均匀的速度被播报。
  13. 如权利要求12所述的方法,其特征在于,所述方法还包括:
    响应于用户输入,控制选择器移动至所述字符串的位置,以指示选择所述字符串。
  14. 一种基于内容的播放方法,其特征在于,包括:
    在显示器上显示用户界面,所述用户界面中至少包括预设长度的字符串,所述字符串中包含标点符号;
    在被配置为启用语音播报服务时,从扬声器中输出所述用户界面中包括的字符串对应的语音;其中,所述语音在对应所述标点符号处被暂停预设时间后、继续播报。
  15. 如权利要求14所述的方法,其特征在于,不同标点符号处对应的被暂停播报的预设时间不同,和/或,相同标点符号处对应的被暂停播报的预设时间相同。
  16. 一种基于内容的播放方法,其特征在于,包括:
    在显示器上显示用户界面,所述用户界面中至少包括多个字符串;
    在被配置为启用语音播报服务时,从扬声器中输出所述用户界面中包括的字符串对应的语音;
    其中,在确定所述字符串的总长度长于预定长度时,所述字符串按照所述预定长度被划分为多个分段,以及不同分段字符串对应的被播报的语音之间暂停预设时长后、继续播放。
  17. 如权利要求16所述的方法,其特征在于,所述方法还包括:
    确定所述字符串内不存在标点。
  18. 一种基于内容的播放方法,其特征在于,包括:
    检测到字符串的长度大于单位播放长度且所述字符串中存在标点,根据所 述标点将所述播报内容划分为数个播报段,以及在所述标点处加入所述标点对应停顿时间的标识;
    依次将所述播报段对应的字符串传输至所述声音播放模块,以使所述声音播放模块播放所述播报段对应的语音内容。
  19. 如权利要求18所述的方法,其特征在于,所述标点包括句末点号,根据所述标点将所述播报内容划分为数个播报段,具体包括:
    根据所述句末点号识别出整句;
    将每个整句划分为一个播报段。
  20. 如权利要18所述的方法,其特征在于,还包括:
    响应于所述修改指令,修改语音播报速度;
    根据修改后的语音播报速度,修改所述标点对应的停顿时间。
PCT/CN2020/087544 2020-04-28 2020-04-28 基于内容的语音播放方法及显示设备 WO2021217433A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202080000657.1A CN113940049B (zh) 2020-04-28 2020-04-28 基于内容的语音播放方法及显示设备
PCT/CN2020/087544 WO2021217433A1 (zh) 2020-04-28 2020-04-28 基于内容的语音播放方法及显示设备

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/087544 WO2021217433A1 (zh) 2020-04-28 2020-04-28 基于内容的语音播放方法及显示设备

Publications (1)

Publication Number Publication Date
WO2021217433A1 true WO2021217433A1 (zh) 2021-11-04

Family

ID=78331558

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/087544 WO2021217433A1 (zh) 2020-04-28 2020-04-28 基于内容的语音播放方法及显示设备

Country Status (2)

Country Link
CN (1) CN113940049B (zh)
WO (1) WO2021217433A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110320206A1 (en) * 2010-06-29 2011-12-29 Hon Hai Precision Industry Co., Ltd. Electronic book reader and text to speech converting method
CN106648291A (zh) * 2016-09-28 2017-05-10 珠海市魅族科技有限公司 一种信息显示、信息播报的方法及装置
CN107516509A (zh) * 2017-08-29 2017-12-26 苏州奇梦者网络科技有限公司 用于新闻播报语音合成的语音库构建方法及系统
CN108831436A (zh) * 2018-06-12 2018-11-16 深圳市合言信息科技有限公司 一种模拟说话者情绪优化翻译后文本语音合成的方法
CN109995939A (zh) * 2019-03-25 2019-07-09 联想(北京)有限公司 信息处理方法和电子设备
CN110136688A (zh) * 2019-04-15 2019-08-16 平安科技(深圳)有限公司 一种基于语音合成的文字转语音方法及相关设备

Also Published As

Publication number Publication date
CN113940049B (zh) 2023-10-31
CN113940049A (zh) 2022-01-14

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 20933482
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 20933482
    Country of ref document: EP
    Kind code of ref document: A1