WO2018018882A1 - Voice broadcast method and apparatus - Google Patents


Publication number
WO2018018882A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2017/073946
Inventor
Cao Gang (曹刚)
Original Assignee
ZTE Corporation (中兴通讯股份有限公司)
Application filed by ZTE Corporation
Priority to US16/320,776 (US11074037B2)
Priority to EP17833197.1A (EP3489845A4)
Publication of WO2018018882A1 (in Chinese)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9538 Presentation of query results
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G06F16/9577 Optimising the visualization of content, e.g. distillation of HTML documents
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G06F3/16 Sound input; Sound output
    • G06F3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback

Definitions

  • The present disclosure relates to the field of communications, and in particular to a voice broadcast method and apparatus.
  • In the prior art, voice broadcast conflicts with key-based navigation on keypad phones: key presses can only move focus among the focusable elements of a web page (such as links, input boxes and buttons), so key-based navigation misses the voice announcement of the body text of many non-focusable elements. A voice broadcast solution is therefore needed that solves the prior-art problem of missing the body text of many non-focusable elements during voice broadcast.
  • Embodiments of the present disclosure provide a voice broadcast method and apparatus that at least solve the problem of missing the body text of many non-focusable elements during voice broadcast.
  • An embodiment of the present disclosure provides a voice broadcast method, including: generating a voice broadcast instruction when a voice broadcast operation is received; searching for a node, starting from the current focus node in the current web page interface, according to the voice broadcast instruction; and, when the node is a target node, broadcasting the text content of the target node, where the target node is a node that has text information, has no child nodes and does not respond to operation events.
  • In an embodiment, before the text content of the target node is broadcast, the method further includes: determining that the node is a non-focusable node when the node does not respond to operation events; determining that the node is a leaf node when the non-focusable node is a node element in the web page Document Object Model (DOM) that contains no child nodes; determining the text length of the leaf node; and determining that the node is a target node when the text length of the leaf node is greater than a preset length threshold.
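The target-node test above (non-focusable, leaf, text longer than a threshold) can be expressed as a single predicate. The following is a minimal Python sketch under stated assumptions, not the patent's implementation: `Node` is a hypothetical stand-in for a DOM element, `inner_text` stands in for the DOM `innerText` property, the focusable tag set is limited to the A, INPUT and BUTTON examples given later in the text, and the default threshold of 0 is one of the values the text allows.

```python
from dataclasses import dataclass, field
from typing import List

# Tags treated as focusable (the examples given in the text; an assumed, incomplete list)
FOCUSABLE_TAGS = {"A", "INPUT", "BUTTON"}


@dataclass
class Node:
    """Hypothetical stand-in for a DOM element node."""
    tag: str
    inner_text: str = ""
    children: List["Node"] = field(default_factory=list)


def is_target_node(node: Node, length_threshold: int = 0) -> bool:
    # 1. Non-focusable: the node must not be an element that responds to operation events
    if node.tag.upper() in FOCUSABLE_TAGS:
        return False
    # 2. Leaf node: the node must have no child element nodes
    if node.children:
        return False
    # 3. The displayed text must be longer than the preset length threshold
    return len(node.inner_text) > length_threshold
```

For example, `is_target_node(Node("P", "Hello"))` is true, while a link (`Node("A", "Home")`) or a container `DIV` with children is rejected.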
  • In an embodiment, before the text content of the target node is broadcast, the method further includes: detecting the size of the content to be broadcast by the target node; and performing corresponding node reconstruction processing on the target node according to that size.
  • Performing the corresponding node reconstruction processing on the target node according to the size of its content to be broadcast includes: when the content to be broadcast by the target node does not reach the first preset range, merging the target node and using the merged content as the text content; when the content to be broadcast falls within the first preset range, obtaining the content to be broadcast by the target node and using it as the text content; and when the content to be broadcast exceeds the first preset range, splitting the target node and using the split content to be broadcast as the text content.
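A minimal Python sketch of the three-way choice above follows. It assumes the first preset range is an open interval (lower, upper), as in the (aH/10, aH) example later in the text; the source does not say how a size exactly equal to a bound is treated, so the boundary handling here is an assumption. `Action` and `choose_reconstruction` are hypothetical names.

```python
from enum import Enum


class Action(Enum):
    MERGE = "merge"    # content too small: merge with sibling nodes
    DIRECT = "direct"  # content within range: broadcast as-is
    SPLIT = "split"    # content too large: split into several nodes


def choose_reconstruction(size: float, lower: float, upper: float) -> Action:
    """Pick the node reconstruction action for a given content size.

    The first preset range is treated as the open interval (lower, upper).
    """
    if size <= lower:
        return Action.MERGE
    if size < upper:
        return Action.DIRECT
    return Action.SPLIT
```

The size can be any of the measures the text allows (text length or layout height), as long as `lower` and `upper` use the same unit.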
  • Merging the target node includes: searching in order, starting from the target node, for nodes to be merged, where a node to be merged is a sibling node of the target node that has text information and has no child nodes; when a node to be merged has the same element tag as the target node, merging the target node with it to obtain a merged node; detecting whether the size of the merged node's content to be broadcast still does not reach the first preset range; and, if it does not, continuing to search for the next node to be merged with the same element tag as the target node, until the size of the merged node's content to be broadcast reaches the first preset range or no further node to be merged with the same element tag as the target node exists.
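The merge loop above might look like the following minimal Python sketch. Assumptions not stated in the source: size is measured as text length (the text also allows layout height), merged texts are joined with a space, `lower` is the lower bound of the first preset range, and the scan stops at the first following sibling that cannot be merged so that the merged text stays contiguous. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)  # identity comparison; avoids recursing through parent links
class Node:
    """Hypothetical stand-in for a DOM element node."""
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)


def attach(parent: Node, *kids: Node) -> Node:
    """Add kids as children of parent and set their parent links."""
    for kid in kids:
        kid.parent = parent
        parent.children.append(kid)
    return parent


def merge_small_node(target: Node, lower: int) -> Tuple[str, List[Node]]:
    """Merge target with following same-tag text siblings until len(text) > lower.

    Returns the merged text and the nodes that were merged.
    """
    merged_text = target.text
    merged = [target]
    siblings = target.parent.children if target.parent else [target]
    # Locate the target by identity, then scan the siblings that follow it
    start = next(i for i, n in enumerate(siblings) if n is target) + 1
    for sib in siblings[start:]:
        if len(merged_text) > lower:
            break  # merged content has reached the first preset range
        # A node to be merged: a sibling leaf with text and the same element tag
        if sib.children or not sib.text or sib.tag != target.tag:
            break  # assumption: stop at the first sibling that cannot be merged
        merged_text = f"{merged_text} {sib.text}"
        merged.append(sib)
    return merged_text, merged
```

With `<p>` siblings "Hi", "there", "friend" followed by an `<h3>`, merging from "Hi" with `lower=5` yields "Hi there"; with a large `lower` it yields "Hi there friend" and stops at the `<h3>`, whose tag differs.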
  • An embodiment of the present disclosure further provides a voice broadcast apparatus for implementing the voice broadcast method. The apparatus includes a receiving unit, a searching unit and a broadcast unit. The receiving unit is configured to generate a voice broadcast instruction when a voice broadcast operation is received; the searching unit is configured to search for a node, starting from the current focus node in the current web page interface, according to the voice broadcast instruction; and the broadcast unit is configured to broadcast the text content of the target node when the node is a target node, where the target node is a node that has text information, has no child nodes and does not respond to operation events.
  • In an embodiment, the apparatus further includes a determining unit configured to: determine that the node is a non-focusable node when the node does not respond to operation events; determine that the node is a leaf node when the non-focusable node is a node element in the web page Document Object Model (DOM) that contains no child nodes; determine the text length of the leaf node; and determine that the node is a target node when the text length of the leaf node is greater than a preset length threshold.
  • In an embodiment, the apparatus further includes a detecting unit and a reconstruction unit. The detecting unit is configured to detect the size of the content to be broadcast by the target node, and the reconstruction unit is configured to perform corresponding node reconstruction processing on the target node according to that size.
  • The reconstruction unit includes a merge module, a direct broadcast module and a split module. The merge module is configured to merge the target node when its content to be broadcast does not reach the first preset range, and to use the merged content as the text content; the direct broadcast module is configured to obtain the target node's content to be broadcast when it falls within the first preset range, and to use that content as the text content; and the split module is configured to split the target node when its content to be broadcast exceeds the first preset range, and to use the split content to be broadcast as the text content.
  • The merge module is specifically configured to, when the content to be broadcast by the target node does not reach the first preset range: search in order, starting from the target node, for nodes to be merged, where a node to be merged is a sibling node of the target node that has text information and has no child nodes; when a node to be merged has the same element tag as the target node, merge the target node with it to obtain a merged node; detect whether the size of the merged node's content to be broadcast still does not reach the first preset range; and, if it does not, continue to search for the next node to be merged with the same element tag as the target node, until the size of the merged node's content to be broadcast reaches the first preset range or no further node to be merged with the same element tag as the target node exists.
  • The voice broadcast method of the embodiments includes: generating a voice broadcast instruction when a voice broadcast operation is received; searching for a node, starting from the current focus node in the current web page interface, according to the voice broadcast instruction; and, when the node is a target node, broadcasting the text content of the target node, where the target node is a node that has text information and has no child nodes.
  • By searching the current web page interface, starting from the current focus node, for a node that has text information and no child nodes (the target node) and broadcasting the text content of that target node, the method provided by the embodiments of the present disclosure solves the prior-art problem of missing the body text of many non-focusable elements during voice broadcast, and ensures that all content to be broadcast is broadcast completely.
  • FIG. 1 is a schematic flowchart of a voice broadcast method according to Embodiment 1 of the present disclosure.
  • FIG. 2 is a schematic flowchart of a voice broadcast method according to Embodiment 2 of the present disclosure.
  • FIG. 3 is a schematic flowchart of a voice broadcast method according to Embodiment 3 of the present disclosure.
  • FIG. 4 is a schematic flowchart of a node reconstruction method according to Embodiment 4 of the present disclosure.
  • FIG. 5 is a schematic structural diagram of a voice broadcast apparatus according to Embodiment 5 of the present disclosure.
  • FIG. 6 is a schematic structural diagram of another voice broadcast apparatus according to Embodiment 5 of the present disclosure.
  • FIG. 7 is a schematic structural diagram of a reconstruction unit according to Embodiment 5 of the present disclosure.
  • FIG. 8 is a schematic structural diagram of a voice broadcast apparatus according to Embodiment 6 of the present disclosure.
  • Embodiment 1 of the present disclosure provides a voice broadcast method. As shown in FIG. 1, the method includes:
  • When the terminal receives a voice broadcast operation performed by the user on the terminal, it generates a voice broadcast instruction triggered by that operation.
  • The terminal may be one with a touch screen that receives the user's touch operations, one controlled through function keys, or one controlled by voice. The specific terminal type is not restricted, as long as it can receive user operations.
  • When the terminal receives a user operation, it determines whether the operation matches the preset voice broadcast operation; if it does, the terminal determines that a voice broadcast operation has been received.
  • The specific form of the voice broadcast operation is not limited; examples include pressing the up key on a keypad terminal, a two-finger slide on a touch-screen terminal, a preset trajectory drawn on a touch-screen terminal, or a preset voice command. It may be configured by the system or set by the user.
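Matching an incoming operation against the preset voice broadcast operations can be sketched as a set lookup. In this minimal Python sketch, the `Operation` type, its `kind`/`value` strings, the instruction dictionary and the default trigger set are hypothetical illustrations of the examples above (up key, two-finger slide); passing a different set models a system- or user-configured trigger.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Operation:
    """A user operation received by the terminal (hypothetical representation)."""
    kind: str   # e.g. "key", "gesture", "voice"
    value: str  # e.g. "UP", "two_finger_slide", or a spoken phrase


# Illustrative preset triggers based on the examples in the text; the real set
# may be configured by the system or set by the user.
DEFAULT_TRIGGERS: FrozenSet[Operation] = frozenset({
    Operation("key", "UP"),
    Operation("gesture", "two_finger_slide"),
})


def to_broadcast_instruction(op: Operation,
                             triggers: FrozenSet[Operation] = DEFAULT_TRIGGERS) -> Optional[dict]:
    """Return a voice broadcast instruction if op matches a preset operation, else None."""
    if op in triggers:
        return {"type": "voice_broadcast", "trigger": op}
    return None
```

A frozen dataclass is hashable, so operations can be compared by value; any other operation falls through and is handled normally.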
  • The web page interface may be that of a browser, such as UC Browser, or that of an application installed on the terminal, such as the NetEase News app.
  • The current focus node is the point of action in the current user interface. For example, when the voice broadcast operation is a touch-screen operation, the focus node is the point at the coordinates of that touch operation; when the voice broadcast operation is a key operation, the point of action of the last operation before it can be used as the current focus.
  • Alternatively, the first node of the web page interface may be used as the current focus when the voice broadcast operation is received, in which case the search for a node starts from the first node of the web page interface.
  • The current focus node can be set as needed.
  • Here, the current focus node is the node at the coordinates of the terminal's last operation before the voice broadcast operation was received.
  • Every component of a web page's HyperText Markup Language (HTML) document is a node:
  • the entire document is a document node;
  • each HTML element is an element node;
  • the text contained in an HTML element is a text node;
  • each HTML attribute is an attribute node;
  • each comment is a comment node.
  • Nodes are related to one another hierarchically, and all nodes in an HTML document together form a document tree (node tree).
  • Each element, attribute, text, etc. in an HTML document is a node in the tree. The tree starts at the document node and branches out until it reaches all the text nodes at the lowest level.
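Searching "from the current focus node" amounts to walking the node tree in document order (a pre-order, depth-first traversal) and testing each node that follows the focus. The minimal Python sketch below assumes a forward search only and takes the node test as a parameter; the `Node` class and both function names are hypothetical. Passing `focus=None` starts from the first node of the page, one of the options described above.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for a DOM node."""
    name: str
    children: List["Node"] = field(default_factory=list)


def document_order(root: Node) -> Iterator[Node]:
    """Yield nodes in document order (pre-order, depth-first)."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        # Push children in reverse so the first child is visited next
        stack.extend(reversed(node.children))


def next_candidate(root: Node, focus: Optional[Node],
                   accept: Callable[[Node], bool]) -> Optional[Node]:
    """Return the first node after `focus` in document order accepted by `accept`.

    With focus=None the search starts from the first node of the page.
    """
    started = focus is None
    for node in document_order(root):
        if not started:
            started = node is focus  # begin testing with the node after focus
            continue
        if accept(node):
            return node
    return None
```

For a body containing `p1`, a `div` wrapping `p2`, and `p3`, a leaf test (`lambda n: not n.children`) moves the focus from `p1` to `p2` to `p3`, skipping the structural `div`.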
  • When the node is a target node, the text content of the target node is broadcast, where the target node is a node that has text information, has no child nodes and does not respond to operation events.
  • Before the text content of the target node is broadcast, the method further includes: determining that the node is a non-focusable node when the node does not respond to operation events; determining that the node is a leaf node when the non-focusable node is a node element in the web page DOM that contains no child nodes; determining the text length of the leaf node; and determining that the node is a target node when the text length of the leaf node is greater than a preset length threshold.
  • The node elements in the DOM tree of the web page are searched; these include focusable node elements and non-focusable node elements.
  • The element tag of a node can be used to determine whether it is a focusable node or a non-focusable node. Element tags of focusable nodes include A, INPUT, BUTTON, etc.; the corresponding node elements, such as links, input boxes and buttons, must respond to operation events, where an operation event is a user event such as a click event.
  • For example, when a link node receives a user's click, the web page must navigate to the page the link points to in response to the click event.
  • When a node is not a focusable node, it can be determined to be a non-focusable node, which does not respond to user operation events. For example, if a node displays a passage of text, the displayed text stays unchanged when the user operates on it, and the user operation receives no response.
  • When a non-focusable node is identified in the DOM tree of the web page, it is determined whether the node has child nodes. If it has none, the node is a leaf node of the DOM tree, that is, a last-level node that has only a parent node or sibling nodes and no child nodes.
  • A node that is the parent of child nodes is a structural node and does not need to be broadcast.
  • When a non-focusable node is determined to be a leaf node, its text length is obtained. When the text length is greater than the length threshold, the node is determined to be a node that has text information, has no child nodes and does not respond to operation events, that is, a target node.
  • The innerText property value of the node can be obtained; it contains the text that the node displays.
  • The node qualifies when the text length of this property value is greater than the length threshold.
  • The length threshold may be zero, or another minimum length that defines which text is worth broadcasting; its value can be set according to actual needs.
  • A text length greater than zero indicates that the node to be broadcast displays text information.
  • A text length greater than some other value indicates that the text information displayed by the node to be broadcast is longer than that value.
  • The text content of the node is sent to the voice assist interface. This interface is a bridge between the browser or application to which the web page belongs and the voice assistance application of the phone's operating system (such as Google TalkBack); it passes the content of the current target node to the voice assistance application for real-time voice broadcast, thereby realizing voice broadcast of the target node's text content.
  • In an embodiment, before the text content of the target node is broadcast, the method further includes: detecting the size of the content to be broadcast by the target node; and performing corresponding node reconstruction processing on the target node according to that size. The processing differs with the size of the content to be broadcast:
  • When the content to be broadcast by the target node does not reach the first preset range, the target node is merged, and the merged content to be broadcast is used as the text content.
  • When the content to be broadcast by the target node falls within the first preset range, the content to be broadcast by the target node is obtained and output as the text content.
  • When the content to be broadcast by the target node exceeds the first preset range, the target node is split, and the split content to be broadcast is output as the text content.
  • The size of the content to be broadcast may be any parameter that measures the amount of content to be broadcast, such as the length of the text to be broadcast or the layout height of the node, where the layout height is the pixel height of the content occupied by the node in the actual web page after page layout is completed.
  • The node reconstruction processing is described below using the layout height as the size of the content to be broadcast.
  • The node reconstruction processing here covers three cases: no processing, in which the text content of the target node is output directly; splitting, in which the text content of the resulting split nodes is output; and merging, in which the combined text content of the merged node is output.
  • The first preset range is (aH/10, aH), where:
    • H is the screen pixel height of the mobile terminal;
    • a is a coefficient with a value from 1 to 1.5;
    • h is the layout height of the target node.
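As a worked example of the first preset range: for a screen height H = 1920 px and a = 1, the range is (192, 1920) px; with a = 1.5 it is (288, 2880) px. A minimal Python sketch of the range and the membership test follows; the function names are hypothetical and 1920 px is only an example screen height.

```python
from typing import Tuple


def first_preset_range(screen_height_px: int, a: float = 1.0) -> Tuple[float, float]:
    """Return the open interval (aH/10, aH) against which the layout height h is compared."""
    if not 1.0 <= a <= 1.5:
        raise ValueError("the text gives a in the range 1 to 1.5")
    upper = a * screen_height_px
    return upper / 10, upper


def in_first_range(h: float, screen_height_px: int, a: float = 1.0) -> bool:
    """True when the layout height h lies strictly inside (aH/10, aH)."""
    lower, upper = first_preset_range(screen_height_px, a)
    return lower < h < upper
```

On a 1920 px screen with a = 1, a 500 px node is broadcast as-is, a 100 px node is a merge candidate, and a 2000 px node is a split candidate; raising a to 1.5 lets the 2000 px node through unchanged.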
  • When the layout height h of the target node exceeds the first preset range, the content to be broadcast by the target node is split into multiple split nodes, each of whose content to be broadcast lies in the range (aH/10, aH). A node with a large amount of content is thus divided into several moderately sized nodes for voice broadcast, and the content to be broadcast of the split nodes is sent to the voice assist interface in turn; the split nodes can be broadcast one after another in the order in which they were split.
  • When h < aH/10, the layout height of the target node does not reach the first preset range, and its content to be broadcast is too small.
  • When the content is too small, the target node is merged: its content to be broadcast is combined with the content to be broadcast of the nodes to be merged, so that the merged node, and hence the size of its content to be broadcast, is moderate. The content to be broadcast of the merged node is then sent to the voice assist interface for broadcast.
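The split case described above can be sketched as follows, under an assumption the source does not make: that layout height is proportional to text length, so dividing the text into k nearly equal parts divides h into k nearly equal parts. Choosing k = floor(h / aH) + 1 makes each part's estimated height roughly h/k, which is below aH; for very short texts, character granularity makes this only approximate. A real implementation would more likely split at sentence or word boundaries. `split_content` is a hypothetical name.

```python
import math
from typing import List


def split_content(text: str, h: float, upper: float) -> List[str]:
    """Split text of layout height h into parts whose estimated height is below upper.

    Assumes height is proportional to text length, an approximation of real layout.
    """
    if h < upper or len(text) < 2:
        return [text]  # already within range (or too short to split)
    parts = min(math.floor(h / upper) + 1, len(text))
    base, extra = divmod(len(text), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks get one extra character so lengths differ by at most 1
        end = start + base + (1 if i < extra else 0)
        chunks.append(text[start:end])
        start = end
    return chunks
```

For a 10-character text with h = 2500 and aH = 1000, this yields three parts ("abcd", "efg", "hij"), each with an estimated height near 833, inside (100, 1000); the parts are kept in their original order so they can be broadcast one after another.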
  • Specifically, nodes to be merged are searched for in order starting from the target node, where a node to be merged is a sibling node of the target node that has text information and no child nodes. When a node to be merged has the same element tag as the target node, the two are merged to obtain a merged node, and it is detected whether the size of the merged node's content to be broadcast still does not reach the first preset range. If it does not, the search continues for the next node to be merged with the same element tag as the target node, until the size of the merged node's content to be broadcast reaches the first preset range or no further node to be merged with the same element tag as the target node exists.
  • The number of nodes to be merged is determined by the sizes of their content to be broadcast, so that the size of the merged node's content to be broadcast falls within a second preset range. The first and second preset ranges may be the same or different and can be adjusted according to actual needs.
  • A node to be merged is a sibling node that has the same parent node and the same element tag as the target node. Element tags may include <p>, <div>, <h3>, etc.; for example, when the element tag of the target node is <p>, the element tag of the node to be merged is also <p>.
  • By checking for leaf nodes and innerText property values, the method identifies non-focusable elements that support voice broadcast, i.e., virtual focusable nodes. This avoids omitting nodes that contain text information when focus is moved by keys on a keypad phone, and it also removes voice broadcast of unnecessary nodes to eliminate noise. It thus solves the prior-art problem of missing the text of many non-focusable elements during voice broadcast and ensures that all content to be broadcast is broadcast completely.
  • Nodes are dynamically split or merged according to whether their content to be broadcast is too large or too small. This avoids broadcasting too much content at once when the content is large, which makes replay difficult for blind users, and avoids frequent broadcast operations when the content is small. The voice broadcast method provided by the embodiments of the present disclosure can therefore support a good web navigation experience for blind users of voice assistance features on various types of terminals, and has high technical and commercial value.
  • In Embodiment 2, the voice broadcast method provided by the embodiments of the present disclosure is described using the example of a two-finger slide that triggers the search for the target node. As shown in FIG. 2, the method includes:
  • When the mobile terminal receives the user's two-finger slide operation, it determines that a voice broadcast request has been received. According to the two-finger slide event input by the user, it searches the web page DOM tree in order, starting from the current focus node, for the next candidate focusable node that supports voice broadcast. When a candidate node is found, it determines whether the candidate node is a regular focusable node (i.e., a focus node); if so, the process proceeds to S205, otherwise to S202.
  • Regular focusable nodes mainly include nodes whose element tags are A (link), INPUT (input box), BUTTON (button), etc.; these elements must be processed in response to user events, hence the name. A node that is not a regular focusable node is determined to be a non-focusable node.
  • It is determined whether the candidate node is a leaf node in the web page DOM tree; if so, the process proceeds to S203, otherwise it returns to S201 to search for the next candidate focusable node.
  • Here, a leaf node in the DOM tree is a node that has no child element nodes.
  • The innerText property value of the candidate node is obtained. If the text length of the property value is greater than zero, the candidate node is the target node of the search and the process proceeds to S204; otherwise it returns to S201 to process the next candidate focusable node.
  • A node's innerText property value corresponds to the text information the node can display.
  • S204: Perform node reconstruction processing according to the content to be broadcast by the candidate node, and perform voice broadcast.
  • The candidate node is set as a virtual focusable node (i.e., a target node); the size of its content to be broadcast is obtained, node reconstruction processing is performed on it, and the processed content to be broadcast is sent to the voice assist interface, which passes it to the voice assistance application for real-time voice broadcast.
  • For a regular focusable node, its content to be broadcast is output directly to the voice assist interface, which passes it to the voice assistance application for real-time voice broadcast.
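The S201 to S205 loop above can be condensed into a short Python sketch. Assumptions: `candidates` already yields the nodes after the current focus node in DOM order (the traversal itself is not shown), `inner_text` stands in for `innerText`, the regular focusable tags are the A, INPUT and BUTTON examples, and `reconstruct` is a hypothetical hook for the node reconstruction step that by default returns the text unchanged. The returned list stands for what would be sent to the voice assist interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional

REGULAR_FOCUSABLE = {"A", "INPUT", "BUTTON"}  # examples from the text; assumed list


@dataclass
class Node:
    """Hypothetical stand-in for a DOM element node."""
    tag: str
    inner_text: str = ""
    children: List["Node"] = field(default_factory=list)


def next_announcement(
    candidates: Iterable[Node],
    reconstruct: Callable[[Node], List[str]] = lambda n: [n.inner_text],
) -> Optional[List[str]]:
    """Return the text to send to the voice assist interface for the next node."""
    for node in candidates:                  # S201: next candidate after the focus
        if node.tag.upper() in REGULAR_FOCUSABLE:
            return [node.inner_text]         # S205: regular focusable, output directly
        if node.children:
            continue                         # S202: not a leaf, back to S201
        if len(node.inner_text) > 0:
            return reconstruct(node)         # S203/S204: target node, reconstruct
        # S203 failed (no text): back to S201 for the next candidate
    return None
```

Because only the trigger differs between embodiments, the same loop would equally describe the key-triggered steps S301 to S305.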
  • In Embodiment 3, the voice broadcast method provided by the embodiments of the present disclosure is described for a pure keypad terminal, in which the user presses a direction key to trigger the search for the target node; for example, the user presses the down arrow key and the terminal receives the arrow key event. The method includes:
  • When the mobile terminal receives the user's direction key operation, it determines that a voice broadcast request has been received. According to the direction key event pressed by the user, it searches the web page DOM tree in order, starting from the current focus node, for the next candidate focusable node that supports voice broadcast. When a candidate node is found, it determines whether the candidate node is a regular focusable node (i.e., a focus node); if so, the process proceeds to S305, otherwise to S302.
  • Regular focusable nodes mainly include nodes whose element tags are A (link), INPUT (input box), BUTTON (button), etc.; these elements must be processed in response to user events, hence the name. A node that is not a regular focusable node is determined to be a non-focusable node.
  • It is determined whether the candidate node is a leaf node in the web page DOM tree; if so, the process proceeds to S303, otherwise it returns to S301 to search for the next candidate focusable node.
  • Here, a leaf node in the DOM tree is a node that has no child element nodes.
  • The innerText property value of the candidate node is obtained. If the text length of the property value is greater than zero, the candidate node is the target node of the search and the process proceeds to S304; otherwise it returns to S301 to process the next candidate focusable node.
  • A node's innerText property value corresponds to the text information the node can display.
  • S304: Perform node reconstruction processing according to the content to be broadcast by the candidate node, and perform voice broadcast.
  • The candidate node is set as a virtual focusable node (i.e., a target node); the size of its content to be broadcast is obtained, node reconstruction processing is performed on it, and the processed content to be broadcast is sent to the voice assist interface, which passes it to the voice assistance application for real-time voice broadcast.
  • For a regular focusable node, its content to be broadcast is output directly to the voice assist interface, which passes it to the voice assistance application for real-time voice broadcast.
  • The target node search in the voice broadcast method provided by the embodiments of the present disclosure applies to various terminal devices. Only the input trigger differs, and all other steps are identical. The method is therefore not limited to triggering by physical key operations or touch-screen gestures. Other input actions, such as voice, may also be used.
  • The node reconstruction method within the voice broadcast method provided by the embodiments of the present disclosure is described in detail.
  • The method is illustrated by taking the layout height as the size of the to-be-broadcast content and (aH/10, aH) as the first preset range. As shown in FIG. 4, the node reconstruction method includes the following steps.
  • The layout height of the virtual focusable node is denoted h. The layout height is the pixel height of the content the node occupies in the actual web page after page layout is complete. The obtained layout height h is compared with the first preset range.
  • H is the screen pixel height of the terminal device, and a is a constant coefficient that may be 1 or 1.5.
  • The next node after the virtual focusable node may also be a virtual focusable node with the same element tag and the same parent node as the target node. In that case the next node is a node to be merged. It is merged into the current focus node in the DOM tree and the process returns to S401. Otherwise the process proceeds to S407.
  • S406: Perform segmentation processing on the target node.
  • S407: Voice-broadcast the to-be-broadcast content.
  • The to-be-broadcast content of the current virtual focusable node is sent to the voice assist interface, which passes it to the voice assist application for real-time voice broadcast.
  • Steps S401 to S404 compare the layout height with the first preset range one condition at a time. In practice, the corresponding node reconstruction processing can be entered directly according to the relationship between the layout height and the first preset range.
  • Embodiment 5 of the present disclosure provides a voice broadcast apparatus. As shown in FIG. 5, the apparatus includes a receiving unit 501, a searching unit 502, and a broadcast unit 503.
  • The receiving unit 501 is configured to generate a voice broadcast instruction when a voice broadcast operation is received.
  • The searching unit 502 is configured to search for nodes in the current web page interface, starting from the current focus node, according to the voice broadcast instruction.
  • The broadcast unit 503 is configured to broadcast the text content of the target node when the node is a target node. A target node is a node that has text information, has no child nodes, and does not respond to operation events.
  • The voice broadcast apparatus may further include a determining unit 504, configured to perform the following steps:
    • When the node is determined not to respond to operation events, determine that the node is a non-focusable node.
    • When the non-focusable node is a node element with no child nodes in the web page Document Object Model (DOM), determine that the node is a leaf node.
    • Determine the text length of the leaf node.
    • When the text length of the leaf node is greater than a preset length threshold, determine that the node is a target node.
  • The voice broadcast apparatus may further include a detecting unit 505 and a reconstruction unit 506. The detecting unit 505 is configured to detect the size of the target node's to-be-broadcast content. The reconstruction unit 506 is configured to perform the corresponding node reconstruction on the target node according to that size.
  • The reconstruction unit 506 may include a merging module 5061, a direct broadcast module 5062, and a segmentation module 5063.
  • The merging module 5061 is configured to merge the target node when its to-be-broadcast content does not reach the first preset range, and to use the merged to-be-broadcast content as the text content.
  • The direct broadcast module 5062 is configured to obtain the target node's to-be-broadcast content when it falls within the first preset range, and to use that content as the text content.
  • The segmentation module 5063 is configured to segment the target node when its to-be-broadcast content exceeds the first preset range, and to use the segmented to-be-broadcast content as the text content.
  • The merging module 5061 is specifically configured to perform the following steps when the target node's to-be-broadcast content does not reach the first preset range:
    • Search sequentially, starting from the target node, for nodes to be merged. A node to be merged is a sibling node of the target node that has text information and no child nodes.
    • When a node to be merged has the same element tag as the target node, merge the target node with the node to be merged to obtain a merged node.
    • Detect whether the size of the merged node's to-be-broadcast content still does not reach the first preset range. If it does not, continue searching for the next node to be merged with the same element tag as the target node. Stop when the merged content reaches the first preset range or when no such node to be merged remains.
    • Use the merged to-be-broadcast content as the text content.
  • Embodiment 6 of the present disclosure further describes the voice broadcast apparatus provided by the embodiments of the present disclosure. The application scenario is a web page navigation method in a browser that supports the voice assist function.
  • As shown in FIG. 8, the apparatus includes three units:
    • a focus node processing unit 801, corresponding to the receiving unit 501, the searching unit 502, and the determining unit 504;
    • a node dynamic reconstruction processing unit 802, corresponding to the detecting unit 505 and the reconstruction unit 506;
    • a voice assist interface unit 803, corresponding to the broadcast unit 503.
  • The focus node processing unit 801 searches the DOM tree for focusable node elements that support voice broadcast, according to the focus request event input by the user. It inputs focusable nodes directly to the voice assist interface unit 803 for processing. Other nodes that can be voice-broadcast are input to the node reconstruction processing unit 802 as virtual focusable nodes.
  • The node reconstruction processing unit 802 dynamically merges or splits nodes according to the layout height of the virtual focusable node to obtain the reconstructed target node. It then sends the to-be-broadcast content of the reconstructed node to the voice assist interface unit 803 for processing.
  • The voice assist interface unit 803 can be implemented through a voice assist interface. It works with a voice broadcast application to perform the voice broadcast.
  • Embodiments of the present disclosure can be provided as a method, a system, or a computer program product. Accordingly, the present disclosure may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media that contain computer-usable program code. Such media include but are not limited to disk storage and optical storage.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner. The instructions stored in the computer-readable memory then produce an article of manufacture that includes an instruction apparatus. This apparatus implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on it to produce computer-implemented processing. The instructions executed on the computer or other programmable device then provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • The voice broadcast method of the embodiments of the present disclosure applies to various terminal devices. It solves the prior-art problem that the body text of many non-focused elements is missed during voice broadcast. It also ensures that all content to be played is broadcast completely.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Transfer Between Computers (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

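The present disclosure relates to a voice broadcast method, comprising: generating a voice broadcast instruction when a voice broadcast operation is received; searching for nodes in the current web page interface, starting from the current focus node, according to the voice broadcast instruction; and when a node is a target node, broadcasting the text content of the target node. The target node is a node that has text information, has no child nodes, and does not respond to operation events. Embodiments of the present disclosure also relate to a voice broadcast apparatus.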

Description

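Voice broadcast method and apparatus
Technical Field
The present disclosure relates to the field of communications, and in particular to a voice broadcast method and apparatus.
Background
Laws in North American countries require terminal devices such as mobile phones to support voice assist functions for people with disabilities. One example is the TALKBACK voice broadcast assist function developed by GOOGLE. With this function, a blind user performs gestures on the phone's screen, and the phone automatically announces the names or contents of the controls the user touches. However, conventional mobile phone technology still has various problems in supporting these voice assist functions. For example, supporting voice-assisted web browsing in the browser of a keypad-only phone is very difficult. On a touch screen, a two-finger swipe automatically announces the content of each node in the web page Document Object Model (DOM) tree in order. On a keypad phone, by contrast, only the focused element can be announced, by navigating with the direction keys.
However, voice broadcast conflicts with direction-key navigation on keypad phones, because the direction keys can only focus the focusable elements of a web page (such as links, input boxes, and buttons). Key navigation therefore skips the voice broadcast of the body text of many non-focusable elements. A voice broadcast technical solution is therefore urgently needed to solve this prior-art problem of missing the body text of many non-focused elements during voice broadcast.
Summary
In view of this, embodiments of the present disclosure aim to provide a voice broadcast method and apparatus that at least solve the problem of missing the body text of many non-focused elements during voice broadcast.
The technical solutions of the embodiments of the present disclosure are implemented as follows: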
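An embodiment of the present disclosure provides a voice broadcast method, comprising: generating a voice broadcast instruction when a voice broadcast operation is received; searching for nodes in the current web page interface, starting from the current focus node, according to the voice broadcast instruction; and when a node is a target node, broadcasting the text content of the target node. The target node is a node that has text information, has no child nodes, and does not respond to operation events.
In the above solution, before broadcasting the text content of the target node, the method further comprises: when the node is determined to be a node that does not respond to operation events, determining that the node is a non-focusable node; when the non-focusable node is determined to be a node element that has no child nodes in the web page Document Object Model (DOM), determining that the node is a leaf node; determining the text length of the leaf node; and when the text length of the leaf node is greater than a preset length threshold, determining that the node is a target node.
In the above solution, before broadcasting the text content of the target node, the method further comprises: detecting the size of the target node's to-be-broadcast content; and performing the corresponding node reconstruction on the target node according to that size.
In the above solution, performing the corresponding node reconstruction on the target node according to the size of its to-be-broadcast content comprises three cases. When the target node's to-be-broadcast content does not reach a first preset range, the target node is merged, and the merged to-be-broadcast content is used as the text content. When the target node's to-be-broadcast content falls within the first preset range, the target node's to-be-broadcast content is obtained and used as the text content. When the target node's to-be-broadcast content exceeds the first preset range, the target node is segmented, and the segmented to-be-broadcast content is used as the text content.
In the above solution, merging the target node comprises: searching sequentially, starting from the target node, for nodes to be merged, where a node to be merged is a sibling node of the target node that has text information and no child nodes; when a node to be merged has the same element tag as the target node, merging the target node with the node to be merged to obtain a merged node; detecting whether the size of the merged node's to-be-broadcast content still does not reach the first preset range; and, if it does not, continuing to search for the next node to be merged with the same element tag as the target node. The search stops when the size of the merged node's to-be-broadcast content reaches the first preset range, or when no node to be merged with the same element tag as the target node remains.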
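An embodiment of the present disclosure further provides a voice broadcast apparatus that implements the above voice broadcast method. The apparatus comprises a receiving unit, a searching unit, and a broadcast unit. The receiving unit is configured to generate a voice broadcast instruction when a voice broadcast operation is received. The searching unit is configured to search for nodes in the current web page interface, starting from the current focus node, according to the voice broadcast instruction. The broadcast unit is configured to broadcast the text content of the target node when a node is a target node. The target node is a node that has text information, has no child nodes, and does not respond to operation events.
In the above solution, the apparatus further comprises a determining unit configured to: when the node is determined to be a node that does not respond to operation events, determine that the node is a non-focusable node; when the non-focusable node is determined to be a node element that has no child nodes in the web page Document Object Model (DOM), determine that the node is a leaf node; determine the text length of the leaf node; and when the text length of the leaf node is greater than a preset length threshold, determine that the node is a target node.
In the above solution, the apparatus further comprises a detecting unit and a reconstruction unit. The detecting unit is configured to detect the size of the target node's to-be-broadcast content. The reconstruction unit is configured to perform the corresponding node reconstruction on the target node according to that size.
In the above solution, the reconstruction unit comprises a merging module, a direct broadcast module, and a segmentation module. The merging module is configured to merge the target node when its to-be-broadcast content does not reach the first preset range, and to use the merged to-be-broadcast content as the text content. The direct broadcast module is configured to obtain the target node's to-be-broadcast content when it falls within the first preset range, and to use it as the text content. The segmentation module is configured to segment the target node when its to-be-broadcast content exceeds the first preset range, and to use the segmented to-be-broadcast content as the text content.
In the above solution, the merging module is specifically configured to perform the following steps when the target node's to-be-broadcast content does not reach the first preset range. It searches sequentially, starting from the target node, for nodes to be merged, where a node to be merged is a sibling node of the target node that has text information and no child nodes. When a node to be merged has the same element tag as the target node, it merges the target node with the node to be merged to obtain a merged node. It then detects whether the size of the merged node's to-be-broadcast content still does not reach the first preset range. If it does not, it continues searching for the next node to be merged with the same element tag as the target node. The search stops when the size of the merged node's to-be-broadcast content reaches the first preset range, or when no such node to be merged remains. Finally, it uses the merged to-be-broadcast content as the text content.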
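A voice broadcast method according to embodiments of the present disclosure comprises: generating a voice broadcast instruction when a voice broadcast operation is received; searching for nodes in the current web page interface, starting from the current focus node, according to the voice broadcast instruction; and when a node is a target node, broadcasting the text content of the target node. The target node is a node that has text information and has no child nodes. With the embodiments of the present disclosure, when a voice broadcast operation is received, the method searches the current web page interface, starting from the current focus node, for nodes that have text information and no child nodes. It finds the target node and broadcasts its text content. The method solves the prior-art problem of missing the body text of many non-focused elements during voice broadcast, and ensures that all content to be played is broadcast completely.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of the voice broadcast method provided by Embodiment 1 of the present disclosure;
FIG. 2 is a schematic flowchart of the voice broadcast method provided by Embodiment 2 of the present disclosure;
FIG. 3 is a schematic flowchart of the voice broadcast method provided by Embodiment 3 of the present disclosure;
FIG. 4 is a schematic flowchart of the node reconstruction method provided by Embodiment 4 of the present disclosure;
FIG. 5 is a schematic structural diagram of a voice broadcast apparatus provided by Embodiment 5 of the present disclosure;
FIG. 6 is a schematic structural diagram of another voice broadcast apparatus provided by Embodiment 5 of the present disclosure;
FIG. 7 is a schematic structural diagram of the reconstruction unit provided by Embodiment 5 of the present disclosure;
FIG. 8 is a schematic structural diagram of a voice broadcast apparatus provided by Embodiment 6 of the present disclosure.
Detailed Description
The implementation of the technical solutions is described in further detail below with reference to the accompanying drawings.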
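Embodiment 1
Embodiment 1 of the present disclosure provides a voice broadcast method. As shown in FIG. 1, the method comprises:
S101: Generate a voice broadcast instruction when a voice broadcast operation is received.
Specifically, when the terminal receives a voice broadcast operation performed by the user, it generates a voice broadcast instruction triggered by that operation. The terminal may be a terminal with a touch screen that receives the user's touch operations, a terminal controlled through function keys, or a terminal controlled by voice. The type of terminal is not limited, as long as it can receive user operations.
When the terminal receives a user operation, it determines whether the operation matches a preset voice broadcast operation. If it matches, the terminal determines that a voice broadcast operation has been received. The specific form of the voice broadcast operation is not limited. Examples include the up key among the direction keys of a keypad terminal, a two-finger swipe on a touch-screen terminal, an operation following a preset trajectory on a touch-screen terminal, or a preset voice phrase. These can be configured by the system or set by the user.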
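The matching step in S101 amounts to looking up each incoming operation in a configurable set of preset trigger operations. The following is a minimal Python sketch that assumes operations arrive as plain strings. The operation identifiers, `BroadcastInstruction`, and `on_user_operation` are hypothetical names, not part of the source or of any platform API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical operation identifiers. The source only gives examples and says
# the set may be configured by the system or set by the user.
DEFAULT_BROADCAST_OPERATIONS = frozenset({
    "key:up",                   # up key among a keypad terminal's direction keys
    "touch:two_finger_swipe",   # two-finger swipe on a touch screen
    "touch:preset_trajectory",  # a gesture that follows a preset trajectory
    "voice:preset_phrase",      # a preset voice command
})


@dataclass(frozen=True)
class BroadcastInstruction:
    operation: str  # the operation that triggered the broadcast


def on_user_operation(operation: str,
                      configured=DEFAULT_BROADCAST_OPERATIONS) -> Optional[BroadcastInstruction]:
    """S101: generate an instruction only if the operation matches a preset one."""
    if operation in configured:
        return BroadcastInstruction(operation)
    return None
```

Passing a user-defined set as `configured` models the case where the user, not the system, chooses the trigger operations.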
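S102: Search for nodes in the current web page interface, starting from the current focus node, according to the voice broadcast instruction.
When the terminal generates the voice broadcast instruction, it determines that the user has triggered the voice broadcast process. It then searches for nodes in its current web page interface, starting from the current focus node. The web page interface may belong to a browser, such as UC, or to an application installed on the terminal, such as the NetEase News app. The current focus node is the point of action on the current user interface. For a touch-screen operation, the focus node is the point at the coordinates where the touch was received. For a key operation, the point acted on by the operation before the current one may be taken as the current focus. Alternatively, the first node of the web page interface may be taken as the current focus, so that the search starts from that first node. The current focus node can be set as needed.
Here, when the terminal generates the voice broadcast instruction, the currently focused node is the node at the coordinates of the last operation the terminal received before the voice broadcast operation.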
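One way to implement the starting-point rules above is sketched below in Python. It works on a hypothetical in-memory tree (`Node` with a `rect` box in page pixels) rather than a real browser DOM, and `resolve_start_node` is an illustrative name. The rules are:

- For a touch trigger, it picks the deepest node whose box contains the touch point, assuming nested, non-overlapping boxes.
- For a key trigger, it reuses the node of the last operation.
- Otherwise, it falls back to the root as the page's first node.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass(eq=False)
class Node:
    tag: str
    rect: Tuple[int, int, int, int] = (0, 0, 0, 0)  # (left, top, right, bottom), hypothetical
    children: list = field(default_factory=list)


def preorder(node):
    """Yield nodes in document (pre-)order."""
    yield node
    for child in node.children:
        yield from preorder(child)


def resolve_start_node(root: Node, trigger: str,
                       touch_point: Optional[Tuple[int, int]] = None,
                       last_acted: Optional[Node] = None) -> Node:
    """Choose the node from which the S102 search starts."""
    if trigger == "touch" and touch_point is not None:
        x, y = touch_point
        hit = None
        for node in preorder(root):
            left, top, right, bottom = node.rect
            if left <= x < right and top <= y < bottom:
                hit = node  # with nested boxes, later containing nodes are deeper
        if hit is not None:
            return hit
    if trigger == "key" and last_acted is not None:
        return last_acted
    return root  # fallback: start from the page's first node
```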
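It should be noted that the structure of a web page can be parsed as a DOM node tree. Every component of the web page's Hypertext Markup Language (HTML) document is a node:
- The entire document is a document node.
- Each HTML tag is an element node.
- The text contained in HTML elements is a text node.
- Each HTML attribute is an attribute node.
- Comments are comment nodes.
Nodes have hierarchical relationships with one another, and all nodes in an HTML document form a document tree (or node tree). Every element, attribute, and piece of text in the HTML document represents a node in the tree. The tree starts at the document node and branches out from there down to the text nodes at the lowest level.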
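S103: When the node is a target node, broadcast the text content of the target node. The target node is a node that has text information, has no child nodes, and does not respond to operation events.
Here, before broadcasting the text content of the target node, the method further comprises: when the node is determined to be a node that does not respond to operation events, determining that the node is a non-focusable node; when the non-focusable node is determined to be a node element that has no child nodes in the web page Document Object Model (DOM), determining that the node is a leaf node; determining the text length of the leaf node; and when the text length of the leaf node is greater than a preset length threshold, determining that the node is a target node.
When the terminal receives the voice broadcast instruction, it searches the node elements in the web page DOM tree. These include focusable and non-focusable node elements, and a node's element tag determines which kind it is. The element tags of focusable nodes include A, INPUT, BUTTON, and the like. The corresponding node elements, such as links, input boxes, and buttons, must respond to operation events, which are user events such as click events. Take a link as an example: when the user clicks a link, the browser must navigate to the page the link points to, in response to the click event. A node that is not focusable may be determined to be non-focusable. Non-focusable nodes do not respond to user operation events. For example, a node that displays a piece of text keeps its displayed content unchanged when the user operates on it, and the operation gets no response.
When a non-focusable node is identified in the web page DOM tree, the terminal determines whether the node has child nodes. If it has none, the node is a leaf node in the DOM tree. A leaf node sits at the last level of the DOM tree: it has a parent node and possibly sibling nodes, but no child nodes. A parent node that has child nodes is a structural node and does not need to be broadcast.
When a non-focusable node is determined to be a leaf node, its text length is obtained. If the text length is greater than the length threshold, the node has text information, has no child nodes, and does not respond to operation events. It is therefore a target node.
Here, the node's innerText attribute value can be obtained. This value usually corresponds to the text the node can display. When the text length of this value is greater than the length threshold, the node's text length is greater than the length threshold. The threshold sets the minimum length of text to be voice-broadcast. It may be zero or another value set according to actual needs. A threshold of zero only requires the node to display some text; a larger threshold requires the displayed text to be longer than that value.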
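The three checks that make a node a target node (not focusable, leaf, innerText longer than the threshold) can be expressed as a small predicate. This Python sketch is a model under assumptions, not browser code:

- `Node.children` holds only child elements.
- `inner_text` stands in for the DOM `innerText` property.
- The focusable tag set contains only the three tags the source names explicitly.

```python
from dataclasses import dataclass, field

# Tags treated as regular focusable nodes; the source lists A, INPUT, BUTTON "and the like".
FOCUSABLE_TAGS = {"A", "INPUT", "BUTTON"}


@dataclass
class Node:
    tag: str
    inner_text: str = ""                           # stands in for innerText
    children: list = field(default_factory=list)  # child elements only


def is_focusable(node: Node) -> bool:
    """Regular focusable nodes must respond to user events such as clicks."""
    return node.tag.upper() in FOCUSABLE_TAGS


def is_leaf(node: Node) -> bool:
    """A leaf has no child elements; parents are structural and are not announced."""
    return not node.children


def is_target_node(node: Node, length_threshold: int = 0) -> bool:
    """Non-focusable leaf whose displayable text is longer than the threshold."""
    return (not is_focusable(node)
            and is_leaf(node)
            and len(node.inner_text) > length_threshold)
```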
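When a node is determined to be a target node, its text content is sent to the voice assist interface. This interface is the bridge between the browser or application that shows the web page and the phone system's voice assist application (such as GOOGLE's TALKBACK). It passes the content of the current target node to the voice assist application, which announces it by voice in real time.
In the embodiments of the present disclosure, before broadcasting the text content of the target node, the method further comprises: detecting the size of the target node's to-be-broadcast content, and performing the corresponding node reconstruction on the target node according to that size. The processing differs according to the size of the content to be broadcast:
When the target node's to-be-broadcast content does not reach the first preset range, the target node is merged, and the merged to-be-broadcast content is used as the text content;
When the target node's to-be-broadcast content falls within the first preset range, the target node's to-be-broadcast content is obtained and output as the text content;
When the target node's to-be-broadcast content exceeds the first preset range, the target node is segmented, and the segmented to-be-broadcast content is output as the text content.
Here, the size of the to-be-broadcast content may be measured by parameters such as the length of the text to be broadcast or the node's layout height. The layout height is the pixel height of the content the node occupies in the actual web page after page layout is complete. The description below uses layout height as the size measure. Node reconstruction covers three cases:
- outputting the target node's text content directly without processing;
- segmenting the node and outputting the text content of the resulting segment nodes;
- merging the node and outputting the text content of the resulting merged node.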
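Suppose the first preset range is (aH/10, aH). Here H is the screen pixel height of the mobile terminal, a is a coefficient from 1 to 1.5, and h is the layout height of the target node. There are three cases.
- When aH/10 < h < aH, the layout height is within the first preset range and the node's height is moderate. The target node's text content is used as the to-be-broadcast content and output directly to the voice assist interface.
- When h > aH, the layout height exceeds the first preset range and the to-be-broadcast content is too large. The target node's content is segmented into several segment nodes of moderate height, each with to-be-broadcast content within (aH/10, aH). The segment nodes' content is sent to the voice assist interface in turn, so the segments are voice-broadcast in segmentation order.
- When h < aH/10, the layout height does not reach the first preset range and the to-be-broadcast content is too small. The target node is merged: its content is combined with that of the nodes to be merged, so that the resulting merged node has a moderate height, i.e., moderately sized to-be-broadcast content. The merged node's to-be-broadcast content is then sent to the voice assist interface for broadcast.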
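The three-way decision above can be written as a single function of h, H, and a. This is a minimal Python sketch; the function name and the returned labels are hypothetical. The source uses open intervals, so this sketch's handling of the exact boundary values aH/10 and aH is an assumption, noted in the code.

```python
def reconstruction_action(h: float, H: float, a: float = 1.0) -> str:
    """Classify a target node's layout height h against the range (aH/10, aH).

    H is the screen pixel height and a the constant coefficient (1 or 1.5 in the source).
    """
    low, high = a * H / 10, a * H
    if h > high:
        return "split"   # too large: segment into several nodes
    if h < low:
        return "merge"   # too small: merge with following sibling nodes
    # aH/10 < h < aH: moderate, broadcast directly. The source does not assign
    # the exact boundary values; this sketch assumes they count as moderate.
    return "direct"
```

For example, with H = 800 and a = 1, the range is (80, 800). A 30-pixel caption would be merged, and a 1200-pixel article body would be split.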
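When the target node's to-be-broadcast content does not reach the first preset range, nodes to be merged are searched for sequentially, starting from the target node. A node to be merged is a sibling node of the target node that has text information and no child nodes. When a node to be merged has the same element tag as the target node, the target node is merged with the node to be merged to obtain a merged node. The method then detects whether the size of the merged node's to-be-broadcast content still does not reach the first preset range. If it does not, the search continues for the next node to be merged with the same element tag as the target node. The search stops when the merged node's content reaches the first preset range, or when no such node to be merged remains. The number of nodes merged depends on the size of the to-be-broadcast content of the nodes to be merged: the merged node's to-be-broadcast content should fall within a second preset range. The first and second preset ranges may be the same or different and can be adjusted to actual needs. A node to be merged is a sibling node with the same parent node and the same element tag as the target node. Element tags may include <p>, <div>, <h3>, and so on. For example, when the target node's element tag is <p>, the node to be merged must also have the element tag <p>.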
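The merge procedure can be sketched as a forward scan over the target's siblings. This Python sketch makes two assumptions the source does not state:

- The layout height of merged block elements is the sum of their heights.
- Following step S405 of Embodiment 4, the scan stops at the first following sibling that does not qualify.

`Node`, `is_mergeable`, and `merge_forward` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    text: str
    height: float                                  # layout height in pixels
    children: list = field(default_factory=list)


def is_mergeable(candidate: Node, target: Node) -> bool:
    """Sibling with text, no child nodes, and the same element tag as the target."""
    return bool(candidate.text) and not candidate.children and candidate.tag == target.tag


def merge_forward(siblings: list, index: int, low: float):
    """Merge siblings[index] with following siblings while its height is below `low`.

    Returns (merged_node, index_of_first_unconsumed_sibling); inputs are not modified.
    Assumes the heights of stacked block elements simply add up.
    """
    target = siblings[index]
    merged = Node(target.tag, target.text, target.height)
    i = index + 1
    while merged.height < low and i < len(siblings) and is_mergeable(siblings[i], target):
        merged.text += " " + siblings[i].text
        merged.height += siblings[i].height
        i += 1
    return merged, i
```

Nothing in the loop stops the merged height from overshooting aH. The source does not address that case, so a real implementation would need to decide whether to merge anyway or stop one node earlier.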
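The voice broadcast method provided by the embodiments of the present disclosure has two benefits.
First, checking for leaf nodes and innerText attribute values identifies the non-focusable elements that can support voice broadcast, i.e., virtual focusable nodes. This keeps key-based focusing on keypad phones from skipping nodes that contain body text to be broadcast. It also removes the voice broadcast of unnecessary nodes, which eliminates noise. The method thus solves the prior-art problem of missing the body text of many non-focused elements during voice broadcast, and ensures that all content to be played is broadcast completely.
Second, during dynamic node reconstruction of non-focusable nodes, the size of each node's to-be-broadcast content shows whether it is too large or too small, and the node is dynamically split or merged accordingly. Content that is too large would make each broadcast too long, which is inconvenient for blind users who want to replay it. Content that is too small would require frequent broadcast operations.
The voice broadcast method provided by the embodiments of the present disclosure can therefore give blind users a good voice-assisted web navigation experience on various types of terminals, and has high technical and commercial value.
Embodiment 2
In Embodiment 2 of the present disclosure, the voice broadcast method is described using a two-finger swipe that triggers the search for the target node as the application scenario. As shown in FIG. 2, the method comprises:
S201: When a two-finger swipe event is received, determine whether the candidate node is a focusable node.
When the mobile terminal receives the user's two-finger swipe operation, it determines that a voice broadcast request has been received. According to the two-finger swipe event input by the user, it then searches the web page DOM tree sequentially, starting from the current focus node, for the next candidate node that supports voice broadcast. When a candidate node is found, it determines whether the candidate node is a regular focusable node (i.e., a focusable node). If so, the process proceeds to S205; otherwise it proceeds to S202. Regular focusable nodes mainly include nodes whose element tags are A (link), INPUT (input box), BUTTON (button), and the like. These elements must respond to user events, which is why they are called regular focusable nodes. A node that is not a regular focusable node is determined to be a non-focusable node.
S202: Determine whether the candidate node is a leaf node.
Determine whether the candidate node is a leaf node in the web page DOM tree. If so, the process proceeds to S203; otherwise it returns to S201 to search for and process the next candidate node. A leaf node in the DOM tree here means a node that has no child elements.
S203: Determine whether the node's text length is greater than zero.
Obtain the candidate node's innerText attribute value. If the text length of the attribute value is greater than zero, the candidate node is the target node of the search and the process proceeds to S204. Otherwise it returns to S201 to search for and process the next candidate node. A node's innerText attribute value corresponds to the text the node can display.
S204: Perform node reconstruction according to the candidate node's to-be-broadcast content, and voice-broadcast it.
Here, the candidate node is set as a virtual focusable node (i.e., the target node). The size of its to-be-broadcast content is obtained and used to perform node reconstruction on it. The to-be-broadcast content of the processed node is sent to the voice assist interface, which passes it to the voice assist application for real-time voice broadcast.
S205: Broadcast the to-be-broadcast content of the focusable node.
The regular focusable node's to-be-broadcast content is output directly to the voice assist interface, which passes it to the voice assist application for real-time voice broadcast.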
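The S201 to S205 loop is essentially a document-order walk that stops at the first node worth announcing. This is a minimal Python sketch over a hypothetical node tree. `Node`, `document_order`, and `find_next` are illustrative names, not browser APIs. Node reconstruction (S204) is left to the caller; the returned label only says which branch applies.

```python
from dataclasses import dataclass, field

FOCUSABLE_TAGS = {"A", "INPUT", "BUTTON"}  # "and the like" per the source


@dataclass(eq=False)  # identity equality: distinct nodes never compare equal
class Node:
    tag: str
    inner_text: str = ""
    children: list = field(default_factory=list)  # child elements only


def document_order(root: Node):
    """Pre-order (document-order) traversal without recursion."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(reversed(node.children))


def find_next(root: Node, current: Node):
    """S201-S203: return (node, kind) for the next node to announce after `current`."""
    nodes = list(document_order(root))
    # Locate the current focus node by identity; start at the top if it is not in the tree.
    start = next((i + 1 for i, n in enumerate(nodes) if n is current), 0)
    for node in nodes[start:]:
        if node.tag.upper() in FOCUSABLE_TAGS:
            return node, "focusable"              # S205: announce directly
        if not node.children and len(node.inner_text) > 0:
            return node, "virtual_focusable"      # S204: reconstruct, then announce
    return None, None                             # reached the end of the page
```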
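Embodiment 3
In Embodiment 3 of the present disclosure, the voice broadcast method is described using a keypad-only terminal as the application scenario: the user triggers the search for the target node by pressing a direction key, and the terminal then receives a direction-key event. As shown in FIG. 3, the method comprises:
S301: When a direction-key event is received, determine whether the candidate node is a focusable node.
When the mobile terminal receives the user's direction-key operation, it determines that a voice broadcast request has been received. According to the direction-key event, it then searches the web page DOM tree sequentially, starting from the current focus node, for the next candidate node that supports voice broadcast. When a candidate node is found, it determines whether the candidate node is a regular focusable node (i.e., a focusable node). If so, the process proceeds to S305; otherwise it proceeds to S302. Regular focusable nodes mainly include nodes whose element tags are A (link), INPUT (input box), BUTTON (button), and the like. These elements must respond to user events, which is why they are called regular focusable nodes. A node that is not a focusable node is determined to be a non-focusable node.
S302: Determine whether the candidate node is a leaf node.
Determine whether the candidate node is a leaf node in the web page DOM tree. If so, the process proceeds to S303; otherwise it returns to S301 to search for and process the next candidate node. A leaf node in the DOM tree here means a node that has no child elements.
S303: Determine whether the node's text length is greater than zero.
Obtain the candidate node's innerText attribute value. If the text length of the attribute value is greater than zero, the candidate node is the target node of the search and the process proceeds to S304. Otherwise it returns to S301 to search for and process the next candidate node. A node's innerText attribute value corresponds to the text the node can display.
S304: Perform node reconstruction according to the candidate node's to-be-broadcast content, and voice-broadcast it.
Here, the candidate node is set as a virtual focusable node (i.e., the target node). The size of its to-be-broadcast content is obtained and used to perform node reconstruction on it. The to-be-broadcast content of the processed node is sent to the voice assist interface, which passes it to the voice assist application for real-time voice broadcast.
S305: Output the focusable node's to-be-broadcast content directly to the voice assist interface, which passes it to the voice assist application for real-time voice broadcast.
It should be noted that Embodiments 2 and 3 differ only in the input trigger; all other steps of the target node search are identical. The target node search therefore applies to various terminal devices. The method is not limited to triggering by physical key operations or touch-screen gestures; other input actions, such as voice, may also be used.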
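Embodiment 4
Embodiment 4 of the present disclosure describes in detail the node reconstruction method within the voice broadcast method. It takes the layout height as the size of the to-be-broadcast content and (aH/10, aH) as the first preset range. As shown in FIG. 4, the node reconstruction method comprises:
S401: Obtain the layout height of the target node.
Specifically, the layout height of the virtual focusable node (the target node) is obtained and denoted h. The layout height is the pixel height of the content the node occupies in the actual web page after page layout is complete. The obtained layout height h is then matched against the first preset range.
S402: Does the layout height fall within the first preset range?
Specifically, if the layout height h of the virtual focusable node satisfies: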
aH/10<h<aH       (1)
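the height is considered moderate and the process proceeds to S407; otherwise it proceeds to S403. Here H is the screen pixel height of the terminal device, and a is a constant coefficient that may be 1 or 1.5.
S403: Does the layout height exceed the first preset range?
Specifically, if the layout height h of the virtual focusable node satisfies: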
h>aH          (2)
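the height is considered too large and the process proceeds to S406; otherwise it proceeds to S404.
S404: Does the layout height fall below the first preset range?
Specifically, if the layout height h of the virtual focusable node satisfies: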
h<aH/10          (3)
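the height is considered too small and the process proceeds to the next step, S405.
S405: Merge the target node.
Check the node that follows the virtual focusable node. If it is also a virtual focusable node, with the same element tag and the same parent node as the target node, it is a node to be merged. It is merged into the current focus node in the DOM tree and the process returns to S401. Otherwise the process proceeds to S407.
S406: Segment the target node.
The virtual focusable node is split into new virtual focusable nodes, called segment nodes (or separated nodes), that each satisfy formula (1). The first segment node becomes the current virtual focusable node, and the process proceeds to S407.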
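Step S406 can be approximated by packing a node's rendered lines into segments that each stay below aH. This Python sketch simplifies by assuming the node's text has already been laid out into lines of equal pixel height. `split_lines` is a hypothetical helper.

```python
import math


def split_lines(lines, line_height, H, a=1.0):
    """Pack laid-out text lines into segments whose height stays below aH.

    Assumes every line has the same pixel height `line_height` (a simplification).
    """
    high = a * H
    # Largest n with n * line_height < aH, as formula (1) requires; at least one line.
    per_segment = max(1, math.ceil(high / line_height) - 1)
    return [lines[i:i + per_segment] for i in range(0, len(lines), per_segment)]
```

The final segment can still fall below aH/10, for example when only one line is left over. The source does not say how to handle that case, so a real implementation needs its own policy for it.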
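S407: Voice-broadcast the to-be-broadcast content.
The to-be-broadcast content of the current virtual focusable node is sent to the voice assist interface, which passes it to the voice assist application for real-time voice broadcast.
It should be noted that steps S401 to S404 compare the layout height with the first preset range one condition at a time. In practice, the corresponding node reconstruction can be entered directly according to the relationship between the layout height and the first preset range.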
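Embodiment 5
To implement the above voice broadcast method, Embodiment 5 of the present disclosure provides a voice broadcast apparatus. As shown in FIG. 5, the apparatus comprises a receiving unit 501, a searching unit 502, and a broadcast unit 503, wherein:
the receiving unit 501 is configured to generate a voice broadcast instruction when a voice broadcast operation is received;
the searching unit 502 is configured to search for nodes in the current web page interface, starting from the current focus node, according to the voice broadcast instruction;
the broadcast unit 503 is configured to broadcast the text content of the target node when the node is a target node, where the target node is a node that has text information, has no child nodes, and does not respond to operation events.
As shown in FIG. 6, the voice broadcast apparatus may further include a determining unit 504, configured to: when the node is determined to be a node that does not respond to operation events, determine that the node is a non-focusable node; when the non-focusable node is determined to be a node element that has no child nodes in the web page Document Object Model (DOM), determine that the node is a leaf node; determine the text length of the leaf node; and when the text length of the leaf node is greater than a preset length threshold, determine that the node is a target node.
As shown in FIG. 6, the voice broadcast apparatus may further include a detecting unit 505 and a reconstruction unit 506. The detecting unit 505 is configured to detect the size of the target node's to-be-broadcast content. The reconstruction unit 506 is configured to perform the corresponding node reconstruction on the target node according to that size.
As shown in FIG. 7, the reconstruction unit 506 may include a merging module 5061, a direct broadcast module 5062, and a segmentation module 5063, wherein:
the merging module 5061 is configured to merge the target node when its to-be-broadcast content does not reach the first preset range, and to use the merged to-be-broadcast content as the text content;
the direct broadcast module 5062 is configured to obtain the target node's to-be-broadcast content when it falls within the first preset range, and to use it as the text content;
the segmentation module 5063 is configured to segment the target node when its to-be-broadcast content exceeds the first preset range, and to use the segmented to-be-broadcast content as the text content.
The merging module 5061 is specifically configured to perform the following steps when the target node's to-be-broadcast content does not reach the first preset range. It searches sequentially, starting from the target node, for nodes to be merged, where a node to be merged is a sibling node of the target node that has text information and no child nodes. When a node to be merged has the same element tag as the target node, it merges the target node with the node to be merged to obtain a merged node. It then detects whether the size of the merged node's to-be-broadcast content still does not reach the first preset range. If it does not, it continues searching for the next node to be merged with the same element tag as the target node. The search stops when the merged node's content reaches the first preset range, or when no such node to be merged remains. Finally, it uses the merged to-be-broadcast content as the text content.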
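Embodiment 6
Embodiment 6 of the present disclosure further describes the voice broadcast apparatus provided by the embodiments of the present disclosure. The application scenario is a web page navigation method in a browser that supports the voice assist function. As shown in FIG. 8, the apparatus includes:
- a focus node processing unit 801, corresponding to the receiving unit 501, the searching unit 502, and the determining unit 504;
- a node dynamic reconstruction processing unit 802, corresponding to the detecting unit 505 and the reconstruction unit 506;
- a voice assist interface unit 803, corresponding to the broadcast unit 503.
When a voice broadcast operation is received, the focus node processing unit 801 searches the web page DOM tree for focusable node elements that support voice broadcast, according to the focus request event input by the user. It inputs focusable nodes directly to the voice assist interface unit 803 for processing. Other nodes that can be voice-broadcast go to the node reconstruction processing unit 802 as virtual focusable nodes.
The node reconstruction processing unit 802 dynamically merges or splits nodes according to the layout height of the virtual focusable node to obtain the reconstructed target node. It then sends the to-be-broadcast content of the reconstructed node to the voice assist interface unit 803 for processing.
It should be noted that the voice assist interface unit 803 can be implemented through a voice assist interface. It works together with a voice broadcast application to perform the voice broadcast.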
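The division of labour between units 801, 802, and 803 can be sketched as three small classes. This Python sketch is hypothetical throughout:

- The class names and the `speak` callback, which stands in for the voice assist application, are illustrative.
- Unit 802 here only splits, by text length, and does no merging. The source allows text length as a size measure, although its examples use layout height.

```python
from dataclasses import dataclass, field
from typing import Callable, List

FOCUSABLE_TAGS = {"A", "INPUT", "BUTTON"}


@dataclass(eq=False)
class Node:
    tag: str
    inner_text: str = ""
    children: list = field(default_factory=list)


class VoiceAssistInterfaceUnit:                 # cf. unit 803
    def __init__(self, speak: Callable[[str], None]):
        self._speak = speak                     # stands in for the voice assist application

    def broadcast(self, text: str) -> None:
        self._speak(text)


class NodeReconstructionUnit:                   # cf. unit 802 (split-only stand-in)
    def __init__(self, max_chars: int):
        self.max_chars = max_chars              # hypothetical size limit in characters

    def reconstruct(self, node: Node) -> List[str]:
        text = node.inner_text
        return [text[i:i + self.max_chars] for i in range(0, len(text), self.max_chars)]


class FocusNodeProcessingUnit:                  # cf. unit 801
    def __init__(self, reconstructor: NodeReconstructionUnit, voice: VoiceAssistInterfaceUnit):
        self.reconstructor = reconstructor
        self.voice = voice

    def handle(self, node: Node) -> None:
        if node.tag.upper() in FOCUSABLE_TAGS:
            self.voice.broadcast(node.inner_text)       # focusable: send directly
        elif not node.children and node.inner_text:
            for piece in self.reconstructor.reconstruct(node):
                self.voice.broadcast(piece)             # virtual focusable: reconstruct first
```

In use, `spoken = []` together with `FocusNodeProcessingUnit(NodeReconstructionUnit(5), VoiceAssistInterfaceUnit(spoken.append))` records every announced string in `spoken`.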
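Those skilled in the art should understand that embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Accordingly, the present disclosure may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media that contain computer-usable program code. Such media include but are not limited to disk storage and optical storage.
The present disclosure is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present disclosure. Computer program instructions can implement each flow and/or block in the flowcharts and/or block diagrams, as well as combinations of flows and/or blocks. These instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine. The instructions executed by that processor then produce an apparatus for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner. The instructions stored in the computer-readable memory then produce an article of manufacture that includes an instruction apparatus. This apparatus implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on it to produce computer-implemented processing. The instructions executed on the computer or other programmable device then provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
The above are merely preferred embodiments of the present disclosure and are not intended to limit the scope of protection of the present disclosure.
Industrial Applicability
The voice broadcast method of the embodiments of the present disclosure applies to various terminal devices. It solves the prior-art problem of missing the body text of many non-focused elements during voice broadcast, and ensures that all content to be played is broadcast completely.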

Claims (10)

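  1. A voice broadcast method, comprising:
    generating a voice broadcast instruction when a voice broadcast operation is received;
    searching for nodes in a current web page interface, starting from a current focus node, according to the voice broadcast instruction; and
    when a node is a target node, broadcasting text content of the target node, wherein the target node is a node that has text information, has no child nodes, and does not respond to operation events.
  2. The method according to claim 1, wherein before broadcasting the text content of the target node, the method further comprises:
    when the node is determined to be a node that does not respond to operation events, determining that the node is a non-focusable node;
    when the non-focusable node is determined to be a node element that has no child nodes in the web page Document Object Model (DOM), determining that the node is a leaf node;
    determining a text length of the leaf node; and
    when the text length of the leaf node is greater than a preset length threshold, determining that the node is the target node.
  3. The method according to claim 1, wherein before broadcasting the text content of the target node, the method further comprises:
    detecting a size of to-be-broadcast content of the target node; and
    performing corresponding node reconstruction on the target node according to the size of the to-be-broadcast content of the target node.
  4. The method according to claim 3, wherein performing corresponding node reconstruction on the target node according to the size of the to-be-broadcast content of the target node comprises:
    when the to-be-broadcast content of the target node does not reach a first preset range, merging the target node and using the merged to-be-broadcast content as the text content;
    when the to-be-broadcast content of the target node falls within the first preset range, obtaining the to-be-broadcast content of the target node and using it as the text content; and
    when the to-be-broadcast content of the target node exceeds the first preset range, segmenting the target node and using the segmented to-be-broadcast content as the text content.
  5. The method according to claim 4, wherein merging the target node comprises:
    searching sequentially, starting from the target node, for a node to be merged with the target node, wherein the node to be merged is a sibling node of the target node that has text information and has no child nodes;
    when the node to be merged has the same element tag as the target node, merging the target node with the node to be merged to obtain a merged node;
    detecting whether the size of the to-be-broadcast content of the merged node does not reach the first preset range; and
    when the size of the to-be-broadcast content of the merged node does not reach the first preset range, continuing to search for a next node to be merged that has the same element tag as the target node, until the size of the to-be-broadcast content of the merged node reaches the first preset range or no node to be merged having the same element tag as the target node remains.
  6. A voice broadcast apparatus, comprising a receiving unit, a searching unit, and a broadcast unit, wherein
    the receiving unit is configured to generate a voice broadcast instruction when a voice broadcast operation is received;
    the searching unit is configured to search for nodes in a current web page interface, starting from a current focus node, according to the voice broadcast instruction; and
    the broadcast unit is configured to, when a node is a target node, broadcast text content of the target node, wherein the target node is a node that has text information, has no child nodes, and does not respond to operation events.
  7. The apparatus according to claim 6, further comprising a determining unit configured to:
    when the node is determined to be a node that does not respond to operation events, determine that the node is a non-focusable node;
    when the non-focusable node is determined to be a node element that has no child nodes in the web page Document Object Model (DOM), determine that the node is a leaf node;
    determine a text length of the leaf node; and
    when the text length of the leaf node is greater than a preset length threshold, determine that the node is the target node.
  8. The apparatus according to claim 6, further comprising a detecting unit and a reconstruction unit, wherein
    the detecting unit is configured to detect a size of to-be-broadcast content of the target node; and
    the reconstruction unit is configured to perform corresponding node reconstruction on the target node according to the size of the to-be-broadcast content of the target node.
  9. The apparatus according to claim 8, wherein the reconstruction unit comprises a merging module, a direct broadcast module, and a segmentation module, wherein
    the merging module is configured to, when the to-be-broadcast content of the target node does not reach a first preset range, merge the target node and use the merged to-be-broadcast content as the text content;
    the direct broadcast module is configured to, when the to-be-broadcast content of the target node falls within the first preset range, obtain the to-be-broadcast content of the target node and use it as the text content; and
    the segmentation module is configured to, when the to-be-broadcast content of the target node exceeds the first preset range, segment the target node and use the segmented to-be-broadcast content as the text content.
  10. The apparatus according to claim 9, wherein the merging module is further configured to:
    when the to-be-broadcast content of the target node does not reach the first preset range, search sequentially, starting from the target node, for a node to be merged with the target node, wherein the node to be merged is a sibling node of the target node that has text information and has no child nodes;
    when the node to be merged has the same element tag as the target node, merge the target node with the node to be merged to obtain a merged node; and
    detect whether the size of the to-be-broadcast content of the merged node does not reach the first preset range; when the size of the to-be-broadcast content of the merged node does not reach the first preset range, continue to search for a next node to be merged that has the same element tag as the target node, until the size of the to-be-broadcast content of the merged node reaches the first preset range or no node to be merged having the same element tag as the target node remains; and use the merged to-be-broadcast content as the text content.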
PCT/CN2017/073946 2016-07-25 2017-02-17 Voice broadcast method and apparatus WO2018018882A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US16/320,776 US11074037B2 (en) 2016-07-25 2017-02-17 Voice broadcast method and apparatus
EP17833197.1A EP3489845A4 (en) 2016-07-25 2017-02-17 LANGUAGE RADIO PROCESS AND DEVICE

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610590146.5 2016-07-25
CN201610590146.5A CN107656933B (zh) 2016-07-25 2016-07-25 Voice broadcast method and apparatus

Publications (1)

Publication Number Publication Date
WO2018018882A1 true WO2018018882A1 (zh) 2018-02-01

Family

ID=61017525

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/073946 WO2018018882A1 (zh) 2016-07-25 2017-02-17 Voice broadcast method and apparatus

Country Status (4)

Country Link
US (1) US11074037B2 (zh)
EP (1) EP3489845A4 (zh)
CN (1) CN107656933B (zh)
WO (1) WO2018018882A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
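CN110737817A (zh) * 2018-07-02 2020-01-31 ZTE Corporation Browser information processing method and apparatus, smart device, and storage medium
CN110047214A (zh) * 2019-04-23 2019-07-23 Shenzhen Hive Box Technology Co., Ltd. Method, apparatus, device, and storage medium for configuring voice broadcast of an express delivery locker
CN110308887A (zh) * 2019-05-14 2019-10-08 Guangdong Kangyun Technology Co., Ltd. Browser-based AI robot implementation method, system, and storage medium
CN110703614B (zh) * 2019-09-11 2021-01-22 Gree Electric Appliances, Inc. of Zhuhai Voice control method and apparatus, and semantic network construction method and apparatus
CN115766933A (zh) * 2022-10-31 2023-03-07 Agricultural Bank of China Limited Accessibility-mode voice broadcast method, apparatus, device, and storage medium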

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102117317A (zh) * 2010-12-28 2011-07-06 Beihang University Speech-technology-based Internet system for blind people
US20120123781A1 (en) * 2010-11-11 2012-05-17 Park Kun Touch screen device for allowing blind people to operate objects displayed thereon and object operating method in the touch screen device
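CN103188316A (zh) * 2011-12-30 2013-07-03 Shanghai Pateo Yuezhen Electronic Equipment Manufacturing Co., Ltd. Vehicle-mounted terminal, and implementation system, adaptation apparatus, and startup method for vehicle-mounted voice broadcast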
US20140282002A1 (en) * 2013-03-15 2014-09-18 Verizon Patent And Licensing Inc. Method and Apparatus for Facilitating Use of Touchscreen Devices

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2002213279A1 (en) * 2000-10-16 2002-04-29 Text Analysis International, Inc. Method for analyzing text and method for builing text analyzers
US8065151B1 (en) * 2002-12-18 2011-11-22 At&T Intellectual Property Ii, L.P. System and method of automatically building dialog services by exploiting the content and structure of websites
US8311835B2 (en) * 2003-08-29 2012-11-13 Microsoft Corporation Assisted multi-modal dialogue
WO2006003714A1 (ja) * 2004-07-06 2006-01-12 Fujitsu Limited Browser program with read-aloud function, browser with read-aloud function, browsing processing method, and browser program recording medium
US7885390B2 (en) * 2005-07-01 2011-02-08 Soleo Communications, Inc. System and method for multi-modal personal communication services
US9318108B2 (en) * 2010-01-18 2016-04-19 Apple Inc. Intelligent automated assistant
CN101325768B (zh) * 2007-06-14 2012-05-30 Hong Fu Jin Precision Industry (Shenzhen) Co., Ltd. Mobile communication device and key input method therefor
US10276170B2 (en) * 2010-01-18 2019-04-30 Apple Inc. Intelligent automated assistant
US8984604B2 (en) * 2010-05-07 2015-03-17 Blackberry Limited Locally stored phishing countermeasure
WO2014062859A1 (en) * 2012-10-16 2014-04-24 Audiologicall, Ltd. Audio signal manipulation for speech enhancement before sound reproduction
CN102946469B (zh) * 2012-10-18 2014-12-10 Xiaomi Inc. Voice broadcast method and apparatus for a mobile terminal, and mobile terminal
US9356574B2 (en) * 2012-11-20 2016-05-31 Karl L. Denninghoff Search and navigation to specific document content
US20150314454A1 (en) * 2013-03-15 2015-11-05 JIBO, Inc. Apparatus and methods for providing a persistent companion device
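CN104572650A (zh) * 2013-10-11 2015-04-29 ZTE Corporation Method and apparatus for implementing intelligent reading in a browser, and terminal thereof
CN103853355A (zh) * 2014-03-17 2014-06-11 Lü Yuzhu Electronic device operation method and control device therefor
CN111427534B (zh) * 2014-12-11 2023-07-25 Microsoft Technology Licensing, LLC Virtual assistant system capable of actionable messaging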

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3489845A4 *

Also Published As

Publication number Publication date
EP3489845A4 (en) 2019-06-05
CN107656933B (zh) 2022-02-08
EP3489845A1 (en) 2019-05-29
US20190163439A1 (en) 2019-05-30
US11074037B2 (en) 2021-07-27
CN107656933A (zh) 2018-02-02

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17833197

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2017833197

Country of ref document: EP

Effective date: 20190225