CN115291826A - Display method and device and electronic equipment - Google Patents


Info

Publication number
CN115291826A
Authority
CN
China
Prior art keywords
text
service
audio
sub
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210927614.9A
Other languages
Chinese (zh)
Inventor
柯芝锦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd
Priority to CN202210927614.9A
Publication of CN115291826A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/14 Digital output to display device; Cooperation and interconnection of the display device with other functional units
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses a display method, a display device and electronic equipment, and belongs to the technical field of electronics. The method includes: when audio transmitted by an intelligent voice system is received, converting the audio into text to obtain a target text; extracting N text contents from the target text, where different text contents indicate different voice services and N is a positive integer; and displaying the N text contents corresponding to N controls, where different controls correspond to different text contents and each control allows a user to select the voice service indicated by its corresponding text content.

Description

Display method and device and electronic equipment
Technical Field
The application belongs to the technical field of electronics, and particularly relates to a display method and device and electronic equipment.
Background
An intelligent voice system is built on Natural Language Processing (NLP), Automatic Speech Recognition (ASR) and Speech Synthesis (Text to Speech, TTS) technologies. It places outbound voice calls and answers inbound ones, communicates with customers in natural, lifelike dialogue, and helps enterprises improve outbound-call efficiency.
At present, a user generally interacts with an intelligent voice system through an electronic device such as a mobile phone or a tablet computer, selecting the desired service according to the audio played by the intelligent voice system. However, because the intelligent voice system may speak too fast, the user often has to listen to the audio repeatedly, which makes the interaction between the user and the intelligent voice system through the electronic device inefficient.
Disclosure of Invention
The embodiments of the application aim to provide a display method, a display device and electronic equipment that can solve the problem of low efficiency when a user interacts with an intelligent voice system through electronic equipment.
In a first aspect, an embodiment of the present application provides a display method, including:
when audio transmitted by an intelligent voice system is received, converting the audio into text to obtain a target text;
extracting N text contents from the target text, wherein different text contents are used for indicating different voice services, and N is a positive integer;
and displaying the N text contents corresponding to N controls, wherein different controls correspond to different text contents, and the controls are used for a user to select the voice service indicated by the corresponding text contents.
In a second aspect, an embodiment of the present application provides a display device, including:
an audio conversion module, configured to convert the audio into text when audio transmitted by the intelligent voice system is received, to obtain a target text;
a text content extracting module, configured to extract N text contents from the target text, where different text contents are used to indicate different voice services, and N is a positive integer;
and the display module is used for displaying the N text contents corresponding to N controls, wherein different controls correspond to different text contents, and the controls are used for enabling a user to select the voice service indicated by the corresponding text contents.
In a third aspect, embodiments of the present application provide an electronic device, which includes a processor and a memory, where the memory stores a program or instructions executable on the processor, and the program or instructions, when executed by the processor, implement the steps of the method according to the first aspect.
In a fourth aspect, embodiments of the present application provide a readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the method according to the first aspect.
In a sixth aspect, embodiments of the present application provide a computer program product, stored on a storage medium, for execution by at least one processor to implement the method according to the first aspect.
In the embodiments of the application, when audio transmitted by the intelligent voice system is received, the audio is converted into text to obtain a target text; N text contents indicating different voice services are then extracted from the target text; and finally the N text contents are displayed corresponding to N controls. Therefore, while communicating with the intelligent voice system through the electronic device, the user can read the text content displayed for each control and operate a control to select the required voice service. This reduces how often the user has to listen repeatedly to the audio output by the intelligent voice system and improves the efficiency of interacting with the intelligent voice system through the electronic device.
Drawings
FIG. 1 is a schematic flow chart diagram of an embodiment of a display method provided herein;
FIG. 2 is a schematic diagram of a display interface in an embodiment of a display method provided herein;
FIG. 3 is another schematic diagram of a display interface in an embodiment of a display method provided herein;
FIG. 4 is another schematic diagram of a display interface in an embodiment of a display method provided herein;
FIG. 5 is another schematic view of a display interface in an embodiment of a display method provided herein;
FIG. 6 is a schematic structural diagram of an embodiment of a display device provided herein;
FIG. 7 is a schematic structural diagram of an embodiment of an electronic device provided herein;
FIG. 8 is a schematic structural diagram of another embodiment of an electronic device provided herein.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present disclosure.
The terms "first", "second" and the like in the description and claims of the present application are used to distinguish between similar objects and not necessarily to describe a particular sequence or chronological order. It should be understood that terms so used are interchangeable under appropriate circumstances, so that embodiments of the application can be practiced in orders other than those illustrated or described herein. Objects distinguished by "first", "second" and the like are usually of one class, and the terms do not limit the number of objects; for example, there may be one or more first objects. In addition, "and/or" in the specification and claims means at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the preceding and succeeding associated objects.
The display method provided by the embodiment of the present application is described in detail below with reference to the accompanying drawings through specific embodiments and application scenarios thereof.
Please refer to FIG. 1, which is a flowchart of a display method according to an embodiment of the present application. The display method is applied to an electronic device and, as shown in FIG. 1, includes the following steps:
Step 101: when audio transmitted by an intelligent voice system is received, convert the audio into text to obtain a target text;
Step 102: extract N text contents from the target text, where different text contents indicate different voice services and N is a positive integer;
Step 103: display the N text contents corresponding to N controls, where different controls correspond to different text contents and each control allows the user to select the voice service indicated by its corresponding text content.
In this embodiment of the application, when the audio transmitted by the intelligent voice system is received, the audio is converted into text to obtain a target text; N text contents indicating different voice services are then extracted from the target text; and finally the N text contents are displayed corresponding to N controls. Therefore, while making a voice call with the intelligent voice system through the electronic device, the user can read the text content displayed for each control and operate a control to select the required voice service. This reduces how often the user has to listen repeatedly to the audio output by the intelligent voice system and further improves the efficiency of interacting with the intelligent voice system through the electronic device.
In step 101, when the audio transmitted by the intelligent voice system is received, the audio is converted into text to obtain the target text.
To perform this conversion, an audio-to-text conversion tool may be preset in the electronic device; the tool converts the received audio in real time to obtain the target text.
For example, in the process of the above electronic device interacting with the intelligent voice system, the electronic device may convert the received audio into the following text 1 (i.e. target text) in real time through its audio-to-text tool:
If service A1 is required, please press 1; if service A2 is required, please press 2; if service A3 is required, please press 3; if service A4 is required, please press 4; if service A5 is required, please press 5; if service A6 is required, please press 6; if service A7 is required, please press 7; if service A8 is required, please press 8; if service A9 is required, please press 9; and so on.
In step 102, after the electronic device obtains the target text, the electronic device may extract N text contents from the target text.
The N text contents indicate different voice services. For example, the following 9 text contents can be extracted from text 1: "service A1 is required", "service A2 is required", "service A3 is required", "service A4 is required", "service A5 is required", "service A6 is required", "service A7 is required", "service A8 is required" and "service A9 is required".
Extracting the N text contents from the target text may be done as follows: a plurality of preset fields are stored in the electronic device in advance, and the target text is divided into a plurality of text contents at those preset fields; when no preset field occurs in the target text, the entire target text is taken as the text content.
For example, the field "please press X" is preset in the electronic device. When the target text is text 1, the electronic device may take the text before each "please press X" as a text content, thereby extracting the 9 text contents listed above, from "service A1 is required" to "service A9 is required".
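The field-based extraction can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: it assumes the preset field is the English phrase "please press" followed by a single key (a digit, "*" or "#"), and that each text content is the clause before that field with a leading "if" removed. The function name `extract_text_contents` is hypothetical.

```python
import re

# Hypothetical preset field: "please press" followed by one key (digit, * or #).
PRESET_FIELD = re.compile(r"please press\s*([0-9*#])", re.IGNORECASE)


def extract_text_contents(target_text: str) -> list[str]:
    """Split the target text at each preset field and keep the clause before it."""
    contents = []
    start = 0
    for match in PRESET_FIELD.finditer(target_text):
        clause = target_text[start:match.start()].strip(" ;,.")
        # Drop the conditional prefix "if" so only the service description remains.
        if clause.lower().startswith("if "):
            clause = clause[3:].strip()
        if clause:
            contents.append(clause)
        start = match.end()
    # No preset field in the target text: the whole text is one text content.
    return contents if contents else [target_text.strip()]


text_1 = ("If service A1 is required, please press 1; "
          "if service A2 is required, please press 2; "
          "if service A3 is required, please press 3")
print(extract_text_contents(text_1))
# ['service A1 is required', 'service A2 is required', 'service A3 is required']
```

Text after the last preset field (such as a trailing "and so on") is ignored in this sketch.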
In step 103, after extracting the N text contents, the electronic device may display controls corresponding to the N text contents, so that the user can operate a control, guided by its text content, to select the voice service indicated by that text content.
Displaying the N text contents corresponding to the N controls may mean that the electronic device displays the N controls in its display interface, with each control showing its corresponding text content.
For example, when the electronic device is displaying a desktop interface and has extracted the 9 text contents, it may replace the display contents of 9 application icons on the desktop with the 9 text contents; in this case the 9 application icons serve as the 9 controls. When the user clicks one of these icons based on its displayed text content, the electronic device sends an instruction to the intelligent voice system requesting the voice service indicated by the text content displayed on the clicked control. For example, if the user clicks control 3, which displays "service A3 is required", the electronic device requests the intelligent voice system to provide service A3.
In some embodiments, converting the audio into text to obtain the target text includes:
converting the audio into text and inserting identifiers at certain positions in the converted text to obtain a target text that includes a plurality of identifiers, where the identifiers mark the start and end of audio content and the breaks within it;
extracting the N text contents from the target text includes:
dividing the target text into a plurality of sub-texts according to the identifiers, where each sub-text consists of the characters between two identifiers;
and determining at least one of the plurality of sub-texts as the N text contents.
In this embodiment, identifiers are added while the target text is generated, the target text is divided into a plurality of sub-texts according to the identifiers, and the N text contents are determined from those sub-texts, which can further improve the efficiency and accuracy of extracting the N text contents from the target text.
The identifiers may be inserted into the converted text by the electronic device according to the content of the audio. The identifiers may include start-stop symbols and separators: a start-stop symbol marks the start or end of audio content, where the delimited content may be all or part of the audio content, and a separator marks a break in the audio content.
For example, when a pause in the received audio lasts at least a preset duration, the symbol "|" (a separator) is added after the text converted from the speech before the pause; when a received speech segment says "please press [number or #]", the symbol "@|" (a start-stop symbol) is added after the text converted from that segment; and the symbol "@" is added at the beginning of the speech, and so on.
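One way to insert these identifiers is sketched below. It assumes, hypothetically, that the speech recognizer returns a list of segments, each with its recognized text and the length of the silence that follows it; the 0.5-second threshold, the `Segment` type and `add_identifiers` are illustrative choices, not taken from the application. When a "please press" segment is also followed by a long pause, the sketch emits only "@|", which is likewise an assumption.

```python
import re
from dataclasses import dataclass

# A segment that consists only of "please press <digit or #>".
PROMPT = re.compile(r"please press\s*[0-9#]")


@dataclass
class Segment:
    text: str           # recognized text of one speech segment
    pause_after: float  # silence after the segment, in seconds


def add_identifiers(segments: list[Segment], min_pause: float = 0.5) -> str:
    parts = ["@"]  # "@" marks the beginning of the speech
    for seg in segments:
        parts.append(seg.text)
        if PROMPT.fullmatch(seg.text.strip()):
            parts.append("@|")  # start-stop symbol after a "please press X" segment
        elif seg.pause_after >= min_pause:
            parts.append("|")   # separator after a sufficiently long pause
    return "".join(parts)


segments = [
    Segment("a service", 0.6), Segment("c service", 0.6), Segment("d service", 0.6),
    Segment("please press 1", 0.8), Segment("b service", 0.7), Segment("please press 2", 0.8),
]
print(add_identifiers(segments))
# @a service|c service|d service|please press 1@|b service|please press 2@|
```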
The target text may be divided into a plurality of sub-texts based on the identifiers, with the characters between any two adjacent identifiers taken as a sub-text.
Alternatively, when the identifiers include separators and start-stop symbols, dividing the target text into a plurality of sub-texts may consist of taking the characters between two adjacent start-stop symbols as a sub-text.
For example, assume the target text with identifiers added is "@a service|c service|d service|please press 1@|b service|please press 2@|…@|X service|please press 9@|". The electronic device may then split the target text at the symbol "@|" to obtain the sub-texts "a service|c service|d service|please press 1", "b service|please press 2", …, and "X service|please press 9".
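The split at the start-stop symbol can be sketched as below, assuming the identifier conventions of the example ("@" at the start of speech, "@|" after each "please press X"). `split_sub_texts` is a hypothetical name, the elided middle sub-texts are left out of the sample input, and trimming stray separators at either end of a piece is an assumption about how to tidy the result.

```python
def split_sub_texts(target_text: str) -> list[str]:
    """Return the text between adjacent start-stop symbols "@|"."""
    text = target_text
    # Remove the "@" that marks the beginning of the speech (but not an "@|").
    if text.startswith("@") and not text.startswith("@|"):
        text = text[1:]
    sub_texts = []
    for piece in text.split("@|"):
        piece = piece.strip("|")  # trim separators left at either end
        if piece:
            sub_texts.append(piece)
    return sub_texts


target = ("@a service|c service|d service|please press 1"
          "@|b service|please press 2@|X service|please press 9@|")
print(split_sub_texts(target))
# ['a service|c service|d service|please press 1', 'b service|please press 2', 'X service|please press 9']
```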
Determining at least one of the sub-texts as the N text contents may include: acquiring a keyword list corresponding to the intelligent voice system; matching each of the sub-texts against the keywords in the keyword list; and determining the at least one sub-text that matches a keyword as the N text contents.
The keyword list corresponding to the intelligent voice system may be acquired as follows: when the electronic device connects to the intelligent voice system for a voice call, the intelligent voice system sends its stored keyword list to the electronic device.
For example, assume the keyword list contains the keywords a, b, c, d, …, and X. Given the sub-texts "a service|c service|d service|please press 1", "b service|please press 2", …, and "X service|please press 9", the electronic device can obtain the N text contents "a service, c service, d service", "b service", …, and "X service".
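The keyword matching can be sketched as follows. The application does not say exactly how a sub-text "matches" a keyword; this sketch assumes a keyword matches a separator-delimited piece when it appears in that piece as a whole word, and that the matched pieces are joined with ", " to form the text content. The function name is hypothetical.

```python
def match_text_contents(sub_texts: list[str], keywords: list[str]) -> list[str]:
    keyword_set = set(keywords)
    contents = []
    for sub_text in sub_texts:
        # Keep the "|"-delimited pieces that contain a keyword as a whole word,
        # so "a" matches "a service" but not the letter a inside "please".
        hits = [piece.strip() for piece in sub_text.split("|")
                if keyword_set.intersection(piece.split())]
        if hits:
            contents.append(", ".join(hits))
    return contents


subs = ["a service|c service|d service|please press 1",
        "b service|please press 2",
        "X service|please press 9"]
print(match_text_contents(subs, ["a", "b", "c", "d", "X"]))
# ['a service, c service, d service', 'b service', 'X service']
```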
In some embodiments, determining at least one of the sub-texts as the N text contents includes:
determining, among the plurality of sub-texts, at least one sub-text matching the service indication information of each service option in an intelligent voice interaction interface, where the intelligent voice interaction interface includes at least one service option and each service option displays service indication information;
and acquiring, according to the at least one sub-text matched with each service option, the text content matching that option's service indication information, to obtain N text contents matching the service indication information of N service options.
In the embodiment, the efficiency and the accuracy of extracting the N text contents from the target text can be further improved by extracting the N text contents matched with the service indication information of the N service options from the target text.
During a voice call between the electronic device and the intelligent voice system, the electronic device may display the intelligent voice interaction interface, which includes at least one service option, each displaying service indication information. The service indication information may be text corresponding to part of the audio and indicates a voice service that the intelligent voice system can provide.
For example, during the voice call, the electronic device may display the intelligent voice interaction interface shown in FIG. 2, which includes 10 numeric virtual keys (i.e., service options), each displaying its number; for example, the number "7" is displayed on virtual key 21.
Determining the at least one sub-text matching the service indication information of each service option may be done by matching each sub-text against the service indication information of each service option and taking the sub-texts that contain an option's service indication information as the sub-texts matching that option.
For example, when the electronic device displays the intelligent voice interaction interface shown in FIG. 2 and the sub-texts include "a service|c service|d service|please press 1", "b service|please press 2", …, and "X service|please press 9", the electronic device may determine "a service|c service|d service|please press 1" as the sub-text matching the virtual key displaying the number "1", "b service|please press 2" as the sub-text matching the virtual key displaying the number "2", …, and "X service|please press 9" as the sub-text matching the virtual key displaying the number "9".
Acquiring, from the at least one sub-text matched with each service option, the text content matching that option's service indication information, to obtain the N text contents, may include: constructing a mapping table from the at least one sub-text matched with each of the N service options, where the mapping table contains N mapping relationships, each mapping relationship associates one service option with at least one service content text, and each service content text is extracted from a sub-text matched with that service option; and acquiring a text content from the at least one service content text in each mapping relationship to obtain the N text contents.
For example, having determined that "a service|c service|d service|please press 1" matches the virtual key displaying the number "1", "b service|please press 2" matches the virtual key displaying "2", …, and "X service|please press 9" matches the virtual key displaying "9", the electronic device may split each sub-text at the separator "|" and add the resulting service content texts, together with the number of the matching virtual key, to a mapping table. In that table, "a service", "c service" and "d service" map to the virtual key displaying "1", "b service" maps to the virtual key displaying "2", …, and "X service" maps to the virtual key displaying "9".
Acquiring a text content from the at least one service content text in each mapping relationship may consist of taking all service content texts in that mapping relationship together as the text content. For example, for the mapping between "a service", "c service" and "d service" and the virtual key displaying "1", "a service, c service, d service" may be used as the text content.
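The mapping-table construction can be sketched as below. It assumes that the service indication information of a virtual key is the digit it displays, that a sub-text matches a key when one of its "|"-delimited pieces reads "please press <digit>", and that the remaining pieces are the service content texts. `build_mapping_table` and `text_contents` are hypothetical names.

```python
import re

# Hypothetical rule: the piece "please press <key>" ties a sub-text to a virtual key.
PROMPT = re.compile(r"please press\s*([0-9*#])")


def build_mapping_table(sub_texts: list[str], key_labels: set[str]) -> dict[str, list[str]]:
    table: dict[str, list[str]] = {}
    for sub_text in sub_texts:
        pieces = [p.strip() for p in sub_text.split("|") if p.strip()]
        prompts = [PROMPT.fullmatch(p) for p in pieces]
        keys = [m.group(1) for m in prompts if m and m.group(1) in key_labels]
        if not keys:
            continue  # the sub-text matches no service option
        # Every non-prompt piece is a service content text for the matched key.
        services = [p for p, m in zip(pieces, prompts) if m is None]
        table.setdefault(keys[0], []).extend(services)
    return table


def text_contents(table: dict[str, list[str]]) -> dict[str, str]:
    # Use all service content texts of a mapping relationship as its text content.
    return {key: ", ".join(services) for key, services in table.items()}


subs = ["a service|c service|d service|please press 1",
        "b service|please press 2",
        "X service|please press 9"]
table = build_mapping_table(subs, {str(d) for d in range(10)})
print(text_contents(table))
# {'1': 'a service, c service, d service', '2': 'b service', '9': 'X service'}
```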
In some embodiments, acquiring, according to the at least one sub-text matched with each service option, the text content matching the service indication information of that service option includes:
refining the content of the at least one sub-text matched with each service option to obtain the text content matching the service indication information of that service option.
In the embodiment, the content of the at least one sub-text matched with each service option is refined to obtain the text content matched with the service indication information of the service option, so that the displayed text content is simpler, and the display effect is improved.
The content may be refined as follows: the electronic device extracts the key content of the at least one sub-text matched with each service option using Named Entity Recognition (NER) to obtain the text content.
For example, for the mapping between "a service", "c service" and "d service" and the virtual key displaying the number "1", these service content texts may be refined into the text content "acd" matching the virtual key displaying "1"; for the mapping between "b service" and the virtual key displaying the number "2", "b service" may be refined into the text content "b" matching the virtual key displaying "2", and so on.
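A real NER model is beyond a short example. The sketch below substitutes a much simpler rule, which is plainly not NER: it drops a hypothetical list of generic words (here only "service") from each service content text and concatenates what remains, reproducing the "acd" and "b" results of the example.

```python
# Hypothetical list of generic words that do not distinguish one service from another.
GENERIC_WORDS = {"service"}


def refine(service_texts: list[str]) -> str:
    """Rule-based stand-in for NER: keep only the distinguishing words."""
    kept = []
    for text in service_texts:
        kept.extend(word for word in text.split() if word.lower() not in GENERIC_WORDS)
    # Concatenate without a separator to match the "acd" form used in the example.
    return "".join(kept)


print(refine(["a service", "c service", "d service"]))  # acd
print(refine(["b service"]))                            # b
```

Joining without a separator suits single-letter placeholders; multi-word service names would need a separator.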
In some embodiments, converting the audio into text to obtain the target text includes:
converting a first part of the audio into text to obtain a first sub-text;
when the first sub-text matches the service indication information of a preset target service option and the value of a preset flag variable is a first preset value, converting a second part of the audio into text to obtain a second sub-text and updating the value of the flag variable to a second preset value, where the target service option is at least one service option in the intelligent voice interaction interface, the second part of the audio is the part received after the first part, and the target text includes the first sub-text and the second sub-text;
and when the first sub-text matches the service indication information of the preset target service option and the value of the flag variable is the second preset value, stopping conversion of the second part of the audio into text and searching the electronic device for a target text that includes the first sub-text.
In this embodiment, while receiving the audio, the electronic device can decide, based on the first sub-text converted from the first part of the audio, the service indication information of the target service option and the value of the flag variable, whether to convert the remaining audio or to look up a previously stored target text, so that the same audio is not converted repeatedly.
The first part of the audio may be any part of the audio that is converted into the target text. For example, it may be the received portion whose content is "a service, c service, d service, please press 1", and so on.
The target service option may be any service option in the intelligent voice interaction interface; specifically, it may be the service option whose service indication information is played first, for example the virtual key displaying the number "1".
After the electronic device acquires the first sub-text, the electronic device may match the first sub-text with the service indication information of the target service option. For example, in a case where the first sub-text is "a service, c service, d service, please press 1", and the target service option is a virtual key with a number of "1", the electronic device may determine that the first sub-text matches the service indication information of the target service option.
The flag variable may be any variable defined as required, and a variable value of the flag variable may include a first preset value and a second preset value, where the first preset value is used to indicate that a successful matching of the first sub-text and the service indication information of the target service option has not occurred before; the second preset value is used for indicating that the condition that the matching of the first sub-text and the service indication information of the target service option is successful occurs before.
When the first sub-text matches the service indication information of the preset target service option and the value of the preset flag variable is the first preset value, the electronic device may convert the second part of the audio into text to obtain a second sub-text and update the value of the flag variable to the second preset value.
For example, assume the electronic device initializes a start flag variable StartFlag (the preset flag variable) to False (the first preset value). If the first sub-text is "a service, c service, d service, please press 1", the target service option is the virtual key displaying the number "1", and StartFlag is False, the electronic device may convert the remaining audio into text to obtain the second sub-text "b service, please press 2; …; X service, please press 9", update StartFlag to True, and save "a service, c service, d service, please press 1; b service, please press 2; …; X service, please press 9" as the target text.
When the first sub-text matches the service indication information of the preset target service option and the value of the flag variable is the second preset value, the electronic device stops converting the second part of the audio into text and searches the electronic device for the target text that includes the first sub-text.
For example, when the first sub-text is "a service, c service, d service, please press 1", the target service option is the virtual key displaying the number "1", and StartFlag is True, the electronic device determines that this audio has already been converted into a target text, so it looks up the stored target text of the audio according to "a service, c service, d service, please press 1".
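The flag-variable logic of this embodiment can be sketched as a small class. Assumptions: StartFlag is a Python bool (False as the first preset value, True as the second); the first sub-text matches the target option (the key "1") when it contains "please press 1"; `convert_rest` is a hypothetical callback that converts the second part of the audio; and saved target texts live in an in-memory list. The application does not cover a first sub-text that fails to match, so the sketch simply returns None in that case.

```python
import re
from typing import Callable, Optional


class PromptTranscriber:
    """Convert the menu audio once and reuse the saved target text afterwards."""

    def __init__(self, target_key: str) -> None:
        # Hypothetical matching rule for the target service option's indication info.
        self.target_pattern = re.compile(rf"please press\s*{re.escape(target_key)}(?![0-9])")
        self.start_flag = False           # False: first preset value; True: second
        self.saved_texts: list[str] = []  # previously saved target texts

    def on_first_sub_text(self, first_sub_text: str,
                          convert_rest: Callable[[], str]) -> Optional[str]:
        if not self.target_pattern.search(first_sub_text):
            return None  # not covered by this embodiment
        if not self.start_flag:
            # First match: convert the second part of the audio and save the result.
            second_sub_text = convert_rest()
            self.start_flag = True
            target_text = f"{first_sub_text}; {second_sub_text}"
            self.saved_texts.append(target_text)
            return target_text
        # Later match: skip conversion and search the saved target texts instead.
        for text in self.saved_texts:
            if first_sub_text in text:
                return text
        return None


calls = []


def convert_rest() -> str:
    calls.append("converted")  # records how often the second part is converted
    return "b service, please press 2; X service, please press 9"


transcriber = PromptTranscriber("1")
first = "a service, c service, d service, please press 1"
print(transcriber.on_first_sub_text(first, convert_rest))
print(transcriber.on_first_sub_text(first, convert_rest))
print(len(calls))  # 1
```

Both calls return the same target text, but the second part of the audio is converted only once.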
In some embodiments, the displaying N text contents corresponding to N controls includes:
displaying N text contents on the intelligent voice interaction interface corresponding to the N service options; or
and displaying a popup window in the display interface, wherein the popup window comprises N controls, and N text contents are correspondingly displayed on the N controls.
In this embodiment, N text contents may be displayed on the intelligent voice interaction interface corresponding to the N service options, or a pop-up window including N controls on which the N text contents are displayed may also be displayed in the display interface, so that the manner of displaying the N text contents is more flexible.
The display interface of the electronic device may be the intelligent voice interaction interface, or an interface other than the intelligent voice interaction interface, such as an application program interface.
For example, when the electronic device is in a call with the intelligent voice system and the electronic device displays an instant chat application program interface, the electronic device may display a pop-up window in the instant chat application program interface, and 9 controls may be displayed in the pop-up window, and the 9 controls respectively display the 9 text contents.
In some embodiments, the displaying N text contents corresponding to N service options on the intelligent voice interactive interface includes:
displaying N text contents in N service options of the intelligent voice interaction interface, wherein each service option comprises service indication information and text contents; or
and updating the service indication information of each service option in the N service options of the intelligent voice interaction interface into N text contents.
In this embodiment, the electronic device may display the text content alongside the service indication information on each service option, or may replace the service indication information of the service option with the text content, which makes the display mode more flexible.
The electronic device may display the N text contents in the N service options while keeping the service indication information of each service option.
For example, as shown in fig. 3, during a voice call between the electronic device and an intelligent voice system for travel services, the electronic device may display each text content alongside its service indication information: the numeral "1" alongside "passenger transport", the numeral "2" alongside "freight transport", …, and so on.
The electronic device may also update the service indication information of each of the N service options to the corresponding one of the N text contents, so that the N text contents can be displayed enlarged, improving the display effect.
For example, as shown in fig. 4, during a voice call between the electronic device and the intelligent voice system for travel services, the electronic device may update the number "1" to "passenger transport", the number "2" to "freight transport", …, and so on.
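The two label styles (parallel display and replacement) can be illustrated with a small helper. In this hypothetical sketch, `render_option_labels` and its parameters are illustrative names, and leaving an option's indication unchanged when no text content was extracted for it is an assumption that the description does not state.

```python
from typing import Dict, List


def render_option_labels(indications: List[str],
                         text_by_indication: Dict[str, str],
                         replace: bool = False) -> List[str]:
    """Build the label shown on each service option (hypothetical helper).

    indications: service indication information on each option, e.g. "1".
    text_by_indication: extracted text content per option.
    replace=False shows indication and text side by side (as in fig. 3);
    replace=True swaps the indication for the text content (as in fig. 4).
    Options without extracted text keep their original indication (assumption).
    """
    labels = []
    for ind in indications:
        text = text_by_indication.get(ind)
        if text is None:
            labels.append(ind)
        elif replace:
            labels.append(text)
        else:
            labels.append(f"{ind} {text}")
    return labels
```

For example, with `{"1": "passenger transport", "2": "freight transport"}` the parallel style yields `"1 passenger transport"` for the first key, while `replace=True` yields just `"passenger transport"`.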
In some embodiments, the displaying N text contents corresponding to N service options on the intelligent voice interactive interface further includes:
and updating the display positions of the N service options under the condition that the service indication information of each service option in the N service options of the intelligent voice interaction interface is updated into N text contents.
In this embodiment, when the service indication information of each of the N service options is updated to N text contents, the electronic device may further update the display positions of the N service options, so as to further improve the display effect of the electronic device.
For example, in the case that the N service options are distributed in a rectangular array in the smart voice interaction interface, the electronic device may update the N service options to be distributed in a circular ring shape, and so on.
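One way to compute the circular arrangement is to space the option centres evenly on a circle. The function below is a sketch only; its name, the ring radius, the starting angle, and the screen-coordinate convention (y grows downward) are assumptions rather than details from the description.

```python
import math
from typing import List, Tuple


def ring_positions(n: int, cx: float, cy: float, radius: float,
                   start_deg: float = -90.0) -> List[Tuple[float, float]]:
    """Return centre points for n service options evenly spaced on a ring.

    (cx, cy) is the ring centre. In screen coordinates (y grows downward),
    start_deg=-90 puts the first option at the top, and increasing angles
    proceed clockwise.
    """
    if n <= 0:
        return []
    step = 2 * math.pi / n
    start = math.radians(start_deg)
    return [(cx + radius * math.cos(start + i * step),
             cy + radius * math.sin(start + i * step))
            for i in range(n)]
```

Placing the nine keypad options this way would leave the ring centre free for a central control such as the one described next.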
In this embodiment of the application, when the intelligent voice interaction interface displays the N controls, the electronic device may send an instruction to the intelligent voice system when the electronic device receives a touch input (such as a click input or a press input) to any one of the controls, where the instruction is used to instruct the intelligent voice system to provide a voice service indicated by text content corresponding to the control targeted by the touch input.
In some embodiments, the intelligent voice interaction interface displays N controls and a first control, and the N controls are distributed around the first control.
After the displaying N text contents corresponding to the N controls, the method may further include:
and in a case that a target input to the first control and to a second control among the N controls is received, in response to the target input, sending a target instruction to the intelligent voice system, wherein the target instruction is used for indicating the voice service indicated by the text content corresponding to the second control.
In this embodiment, upon receiving the target input, the electronic device sends the target instruction to instruct the intelligent voice system to provide the voice service indicated by the text content corresponding to the second control. Because the input must involve both the first control and the second control, accidental operations by the user can be reduced.
The target input may be any input involving the first control and the second control. For example, the target input may be a slide operation between the first control and the second control; a click operation that clicks the first control and the second control in sequence; or a drag operation that drags the first control onto the second control, and so on.
For example, as shown in fig. 5, if the electronic device receives an operation that the user first presses the "press and hold" control (i.e., the first control) and slides to the "ship" control (i.e., the second control), the electronic device may send a target instruction to the intelligent voice system, where the target instruction is used to instruct the intelligent voice system to provide a ship voice service.
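The two-control slide gesture can be sketched as a hit test on where the slide begins and ends. This is an illustration under assumptions: controls are modelled as axis-aligned rectangles, the names `Rect` and `resolve_slide` are hypothetical, and the returned service indication (for example a key digit that could be sent as a DTMF tone) stands in for the target instruction, whose format the description does not specify.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Rect:
    """Axis-aligned bounds of a control (hypothetical model)."""
    x: float
    y: float
    w: float
    h: float

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px <= self.x + self.w and self.y <= py <= self.y + self.h


def resolve_slide(down: Tuple[float, float], up: Tuple[float, float],
                  first_control: Rect, options: Dict[str, Rect]) -> Optional[str]:
    """Return the service indication to send, or None.

    An instruction is produced only when the gesture starts on the first
    ("press and hold") control and ends on one of the N option controls,
    so a stray tap on a single option sends nothing.
    """
    if not first_control.contains(*down):
        return None
    for indication, rect in options.items():
        if rect.contains(*up):
            return indication
    return None
```

Requiring the gesture to start on the central control is what filters out accidental single touches on an option.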
In addition, the electronic device may also update the background of the intelligent voice interaction interface, and the like, which is not described herein again.
The display method provided in the embodiments of the present application may be executed by a display device. In the embodiments of the present application, the display device provided herein is described by taking a display device executing the display method as an example.
Fig. 6 is a schematic structural diagram of a display device according to an embodiment of the present application. As shown in fig. 6, the display device 600 includes:
the audio conversion module 601 is configured to convert an audio into characters to obtain a target text when the audio transmitted by the intelligent speech system is received;
a text content extracting module 602, configured to extract N text contents from the target text, where different text contents are used to indicate different voice services, and N is a positive integer;
the display module 603 is configured to display the N text contents corresponding to the N controls, where different controls correspond to different text contents, and the controls are used for a user to select a voice service indicated by the text contents corresponding to the controls.
In some embodiments, the audio conversion module 601 is specifically configured to:
converting the audio into text, and adding identifiers at some positions in the converted text to obtain a target text comprising a plurality of identifiers, wherein the identifiers are used for marking the start and end of the audio content and pauses in it.
The text content extracting module 602 may include:
a text dividing unit, configured to divide the target text into a plurality of sub-texts according to the identifiers, wherein each sub-text comprises the characters between two identifiers;
and the text content determining unit is used for determining at least one sub-text as N text contents in the plurality of sub-texts.
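The identifier-based segmentation can be sketched as follows, assuming the speech recognizer returns each word with its start and end times in seconds. The marker character `|`, the 0.5 s pause threshold, and both function names are hypothetical choices for illustration only.

```python
from typing import List, Tuple

MARK = "|"  # hypothetical identifier character


def insert_identifiers(words: List[Tuple[str, float, float]],
                       pause_s: float = 0.5) -> str:
    """Join timed words into a target text, adding MARK at the start, at the
    end, and wherever the silence between two words exceeds pause_s seconds."""
    if not words:
        return ""
    parts = [MARK, words[0][0]]
    for (_, _, prev_end), (word, start, _) in zip(words, words[1:]):
        parts.append(MARK if start - prev_end > pause_s else " ")
        parts.append(word)
    parts.append(MARK)
    return "".join(parts)


def split_sub_texts(target_text: str) -> List[str]:
    """Each sub-text is the non-empty text between two consecutive identifiers."""
    return [s.strip() for s in target_text.split(MARK) if s.strip()]
```

With a one-second silence after "press 1", the two prompts "For passenger transport, press 1" and "For freight, press 2" come out as separate sub-texts.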
In some embodiments, the text content determining unit includes:
the matching subunit is used for determining at least one sub text matched with the service indication information of each service option in the plurality of sub texts;
and the text content acquisition subunit is used for acquiring the text content matched with the service indication information of the service options according to the at least one sub-text matched with each service option to obtain N text contents matched with the service indication information of the N service options.
In some embodiments, the text content acquisition subunit is specifically configured to:
and refining the content of the at least one sub-text matched with each service option to obtain the text content matched with the service indication information of the service option.
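Matching and refining can be illustrated with a regular expression over English prompts of the form "[For] <service>, [please] press <key>". This is a sketch under assumptions: real prompts vary by language and system, so the pattern, the function name, and the rule of joining several matched phrases with " / " are illustrative rather than taken from the description.

```python
import re
from typing import Dict, List

# Hypothetical prompt shape: "[For] <service>, [please] press <key>".
PROMPT = re.compile(
    r"^(?:for\s+)?(?P<service>.+?)[,\s]*(?:please\s+)?press\s+(?P<key>[0-9*#])(?!\d)",
    re.IGNORECASE,
)


def match_and_refine(sub_texts: List[str], indications: List[str]) -> Dict[str, str]:
    """Map each service indication (key label) to a refined text content.

    A sub-text matches an option when the key it asks the caller to press
    equals that option's indication. Refining keeps only the service phrase;
    several phrases matched to one option are joined with " / ".
    """
    matched: Dict[str, List[str]] = {ind: [] for ind in indications}
    for sub in sub_texts:
        m = PROMPT.match(sub.strip())
        if m and m.group("key") in matched:
            phrase = m.group("service").strip(" ,.;")
            if phrase not in matched[m.group("key")]:
                matched[m.group("key")].append(phrase)
    return {ind: " / ".join(phrases) for ind, phrases in matched.items() if phrases}
```

Sub-texts whose key is not shown on any service option (for example "To return, press *" when only digit keys are displayed) are simply ignored.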
In some embodiments, the audio conversion module 601 includes:
the first conversion unit is used for converting a first part of audio of the audio into characters to obtain a first sub-text;
the second conversion unit is used for converting a second part of audio of the audio into words to obtain a second sub-text and updating the variable value of the mark variable to a second preset value under the condition that the first sub-text is matched with the service indication information of the preset target service option and the variable value of the preset mark variable is a first preset value, wherein the target service option is at least one service option in the intelligent voice interaction interface, the second part of audio is part of audio received after the first part of audio, and the target text comprises the first sub-text and the second sub-text;
and the searching unit is used for stopping converting the second part of audio into characters and searching the electronic equipment for the target text comprising the first sub-text under the condition that the first sub-text is matched with the preset service indication information of the target service option and the variable value of the mark variable is a second preset value.
In some embodiments, the display module 603 is specifically configured to:
displaying N text contents on the intelligent voice interaction interface corresponding to the N service options; or
and displaying a popup window in the display interface, wherein the popup window comprises N controls, and N text contents are correspondingly displayed on the N controls.
In some embodiments, the display module 603 is specifically configured to:
displaying N text contents in N service options of the intelligent voice interaction interface, wherein each service option comprises service indication information and text contents; or
and updating the service indication information of each service option in the N service options of the intelligent voice interaction interface into N text contents.
In some embodiments, the display module 603 is further configured to:
update the display positions of the N service options in a case that the service indication information of each of the N service options of the intelligent voice interaction interface is updated to the N text contents.
In some implementations, the intelligent voice interaction interface displays N controls and a first control, with the N controls distributed around the first control.
The apparatus 600, may further include:
and the sending module is used for responding to the target input and sending a target instruction to the intelligent voice system under the condition that the target input to the first control and a second control in the N controls is received, wherein the target instruction is used for indicating the voice service indicated by the text content corresponding to the second control.
The display device in the embodiments of the present application may be an electronic device, or a component in an electronic device, such as an integrated circuit or a chip. The electronic device may be a terminal or a device other than a terminal. For example, the electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a vehicle-mounted electronic device, a mobile Internet device (MID), an augmented reality (AR)/virtual reality (VR) device, a robot, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA); it may also be a server, a network attached storage (NAS) device, a personal computer (PC), a television (TV), a teller machine, a self-service machine, or the like. The embodiments of the present application impose no specific limitation.
The display device in the embodiments of the present application may be a device having an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system; the embodiments of the present application impose no specific limitation.
The display device provided in the embodiment of the present application can implement each process implemented in the method embodiment of fig. 1, and can achieve the same technical effect, and is not described here again to avoid repetition.
Optionally, as shown in fig. 7, an electronic device 700 is further provided in an embodiment of the present application, and includes a processor 701 and a memory 702, where the memory 702 stores a program or an instruction that can be executed on the processor 701, and when the program or the instruction is executed by the processor 701, the steps of the display method embodiment are implemented, and the same technical effect can be achieved, and are not described again here to avoid repetition.
It should be noted that the electronic devices in the embodiments of the present application include the mobile electronic device and the non-mobile electronic device described above.
Fig. 8 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 800 includes, but is not limited to: a radio frequency unit 801, a network module 802, an audio output unit 803, an input unit 804, a sensor 805, a display unit 806, a user input unit 807, an interface unit 808, a memory 809, and a processor 810.
Those skilled in the art will appreciate that the electronic device 800 may further comprise a power supply (e.g., a battery) for supplying power to the various components, and the power supply may be logically connected to the processor 810 via a power management system, so as to manage charging, discharging, and power consumption management functions via the power management system. The electronic device structure shown in fig. 8 does not constitute a limitation to the electronic device, and the electronic device may include more or less components than those shown, or combine some components, or arrange different components, and thus, the description is omitted here.
Wherein, the processor 810 is configured to:
under the condition of receiving audio transmitted by an intelligent voice system, converting the audio into characters to obtain a target text;
extracting N text contents from the target text, wherein different text contents are used for indicating different voice services, and N is a positive integer;
and displaying the N text contents corresponding to the N controls, wherein different controls correspond to different text contents, and the controls are used for a user to select the voice service indicated by the corresponding text contents.
In some embodiments, the processor 810 is further configured to:
converting the audio into words, and adding identifiers at the partial words obtained by conversion to obtain a target text comprising a plurality of identifiers, wherein the identifiers are used for identifying the start and stop and the interruption of the audio content;
dividing the target text into a plurality of sub-texts according to the identifiers, wherein each sub-text comprises characters between two identifiers;
and determining at least one of the plurality of sub texts as the N text contents.
In some embodiments, the processor 810 is further configured to:
determining at least one piece of sub-text matched with the service indication information of each service option in the intelligent voice interaction interface in the plurality of pieces of sub-text, wherein the intelligent voice interaction interface comprises at least one service option, and the service indication information is displayed on each service option;
and acquiring text contents matched with the service indication information of the service options according to at least one sub-text matched with each service option to obtain N text contents matched with the service indication information of the N service options.
In some embodiments, the processor 810 is further configured to:
and refining the content of the at least one sub-text matched with each service option to obtain the text content matched with the service indication information of the service option.
In some embodiments, the processor 810 is further configured to:
converting a first part of audio of the audio into characters to obtain a first sub-text;
under the condition that the first sub-text is matched with service indication information of a preset target service option and a variable value of a preset mark variable is a first preset value, converting a second part of audio of the audio into characters to obtain a second sub-text, and updating the variable value of the mark variable to be a second preset value, wherein the target service option is at least one service option in the intelligent voice interaction interface, the second part of audio is part of audio received after the first part of audio, and the target text comprises the first sub-text and the second sub-text;
and under the condition that the first sub text is matched with the service indication information of the preset target service option and the variable value of the mark variable is the second preset value, stopping converting the second part of audio into characters, and searching the electronic equipment for the target text comprising the first sub text.
In some embodiments, the processor 810 is further configured to:
displaying the N text contents on the intelligent voice interaction interface corresponding to the N service options; or
and displaying a popup window in a display interface, wherein the popup window comprises N controls, and the N text contents are correspondingly displayed on the N controls.
In some embodiments, the processor 810 is further configured to:
displaying N text contents in N service options of the intelligent voice interaction interface, wherein each service option comprises service indication information and text contents; or
and updating the service indication information of each service option in the N service options of the intelligent voice interaction interface into N text contents.
In some embodiments, the processor 810 is further configured to:
and updating the display positions of the N service options under the condition that the service indication information of each service option in the N service options is updated into N text contents.
In some embodiments, the intelligent voice interaction interface displays the N controls and the first control, and the N controls are distributed around the first control.
A processor 810, further configured to:
and in a case that a target input to the first control and to a second control among the N controls is received, in response to the target input, sending a target instruction to the intelligent voice system, wherein the target instruction is used for indicating the voice service indicated by the text content corresponding to the second control.
The display device provided in the embodiment of the present application can implement each process implemented in the method embodiment of fig. 1, and can achieve the same technical effect, and is not described here again to avoid repetition.
It should be understood that, in the embodiments of the present application, the input unit 804 may include a graphics processing unit (GPU) 8041 and a microphone 8042. The graphics processing unit 8041 processes image data of still pictures or videos obtained by an image capture device (such as a camera) in a video capture mode or an image capture mode. The display unit 806 may include a display panel 8061, which may be configured in the form of a liquid crystal display, an organic light-emitting diode display, or the like. The user input unit 807 includes at least one of a touch panel 8071 and other input devices 8072. The touch panel 8071, also referred to as a touch screen, may include two parts: a touch detection device and a touch controller. The other input devices 8072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and on/off keys), a trackball, a mouse, and a joystick, which are not described in detail here.
The memory 809 may be used to store software programs and various data. The memory 809 may mainly include a first storage area storing programs or instructions and a second storage area storing data, wherein the first storage area may store an operating system, and application programs or instructions required by at least one function (such as a sound playing function or an image playing function). Further, the memory 809 may include volatile memory or non-volatile memory, or both. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), a static RAM (SRAM), a dynamic RAM (DRAM), a synchronous DRAM (SDRAM), a double data rate SDRAM (DDR SDRAM), an enhanced SDRAM (ESDRAM), a synchlink DRAM (SLDRAM), or a direct Rambus RAM (DRRAM). The memory 809 in the embodiments of the present application includes, but is not limited to, these and any other suitable types of memory.
Processor 810 may include one or more processing units; optionally, the processor 810 integrates an application processor, which primarily handles operations related to the operating system, user interface, applications, etc., and a modem processor, which primarily handles wireless communication signals, such as a baseband processor. It will be appreciated that the modem processor described above may not be integrated into processor 810.
The embodiment of the present application further provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or the instruction is executed by a processor, the program or the instruction implements the processes of the display method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer readable storage medium, such as a computer read only memory ROM, a random access memory RAM, a magnetic or optical disk, and the like.
The embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to execute a program or an instruction to implement each process of the foregoing display method embodiment, and can achieve the same technical effect, and in order to avoid repetition, the description is omitted here.
It should be understood that the chip mentioned in the embodiments of the present application may also be referred to as a system-on-chip.
Embodiments of the present application provide a computer program product, where the program product is stored in a storage medium, and the program product is executed by at least one processor to implement the processes of the foregoing display method embodiments, and achieve the same technical effects, and in order to avoid repetition, details are not described here again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed, but may include performing the functions in a substantially simultaneous manner or in a reverse order based on the functions involved, e.g., the methods described may be performed in an order different than that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a computer software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (16)

1. A display method, comprising:
under the condition of receiving audio transmitted by an intelligent voice system, converting the audio into characters to obtain a target text;
extracting N text contents from the target text, wherein different text contents are used for indicating different voice services, and N is a positive integer;
and displaying the N text contents corresponding to N controls, wherein different controls correspond to different text contents, and the controls are used for a user to select the voice service indicated by the corresponding text contents.
2. The method of claim 1, wherein converting the audio into text to obtain target text comprises:
converting the audio into words, and adding identifiers at the converted partial words to obtain a target text comprising a plurality of identifiers, wherein the identifiers are used for identifying the start and the end and the interruption of the audio content;
the extracting N text contents from the target text includes:
according to the identifiers, the target text is divided into a plurality of sub texts, and each sub text comprises characters between two identifiers;
and determining at least one of the plurality of sub texts to be the N text contents.
3. The method according to claim 2, wherein the determining at least one of the plurality of sub-texts as the N text contents comprises:
determining at least one sub-text matched with the service indication information of each service option in the intelligent voice interaction interface in the plurality of sub-texts, wherein the intelligent voice interaction interface comprises at least one service option, and the service indication information is displayed on each service option;
and acquiring text contents matched with the service indication information of the service options according to at least one sub-text matched with each service option to obtain N text contents matched with the service indication information of the N service options.
4. The method of claim 1, wherein converting the audio into text to obtain target text comprises:
converting a first part of the audio into characters to obtain a first sub-text;
under the condition that the first sub text is matched with service indication information of a preset target service option and a variable value of a preset mark variable is a first preset value, converting a second part of audio of the audio into words to obtain a second sub text, and updating the variable value of the mark variable to a second preset value, wherein the target service option is at least one service option in an intelligent voice interaction interface, the second part of audio is part of audio received after the first part of audio, and the target text comprises the first sub text and the second sub text;
and when the first sub text is matched with the service indication information of the preset target service option and the variable value of the mark variable is the second preset value, stopping converting the second part of audio into characters, and searching the target text comprising the first sub text in the electronic equipment.
5. The method of claim 1, wherein displaying the N text contents corresponding to N controls comprises:
displaying the N text contents on the intelligent voice interaction interface corresponding to the N service options; or
and displaying a popup window in a display interface, wherein the popup window comprises N controls, and the N text contents are correspondingly displayed on the N controls.
6. The method of claim 5, wherein displaying the N text contents corresponding to N service options on the intelligent voice interactive interface comprises:
displaying the N text contents in N service options of an intelligent voice interaction interface, wherein each service option comprises the service indication information and the text contents; or
and updating the service indication information of the N service options of the intelligent voice interaction interface into the N text contents.
7. The method of claim 6, wherein displaying the N text contents on the intelligent voice interaction interface in correspondence with N service options further comprises:
updating the display positions of the N service options when the service indication information of the N service options of the intelligent voice interaction interface is updated to the N text contents.
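Claims 6 and 7 describe two display variants: showing each text content alongside the option's service indication information, or replacing the indication information outright and then updating the options' display positions. The claims do not say how positions change, so the Python sketch below assumes a hypothetical policy in which updated options move to the front while keeping their relative order. `ServiceOption`, `show_alongside` and `replace_and_reposition` are illustrative names, not from the source, and `contents` is assumed to map each indication string (for example a keypad digit) to its text content.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ServiceOption:
    indication: str  # service indication info, e.g. keypad digit "1"
    label: str       # text currently displayed on the option
    position: int    # display slot on the interface


def show_alongside(options: List[ServiceOption],
                   contents: Dict[str, str]) -> List[ServiceOption]:
    # Claim 6, first variant: the option shows both indication and text content.
    for opt in options:
        if opt.indication in contents:
            opt.label = f"{opt.indication}: {contents[opt.indication]}"
    return options


def replace_and_reposition(options: List[ServiceOption],
                           contents: Dict[str, str]) -> List[ServiceOption]:
    # Claim 6, second variant: the indication is replaced by the text content.
    updated = [o for o in options if o.indication in contents]
    others = [o for o in options if o.indication not in contents]
    for o in updated:
        o.label = contents[o.indication]
    # Claim 7 (hypothetical policy): updated options move to the front.
    for pos, o in enumerate(updated + others):
        o.position = pos
    return sorted(options, key=lambda o: o.position)
```

Both functions mutate the options in place, which mirrors a UI layer that redraws existing option views rather than creating new ones.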
8. A display device, comprising:
an audio conversion module, configured to, upon receiving audio transmitted by an intelligent voice system, convert the audio into text to obtain a target text;
a text content extraction module, configured to extract N text contents from the target text, wherein different text contents indicate different voice services, and N is a positive integer; and
a display module, configured to display the N text contents in correspondence with N controls, wherein different controls correspond to different text contents, and each control enables a user to select the voice service indicated by its corresponding text content.
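Claims 5 and 8 bind each control to one text content and state that the control lets the user select the voice service that text describes. The claims do not say how that selection reaches the intelligent voice system. The Python sketch below assumes it is done by sending the option's keypad digit (a DTMF key) through a caller-supplied callback; `Control`, `build_popup` and `send_key` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Control:
    text: str                          # text content shown on the control
    indication: str                    # service indication info, e.g. keypad digit "1"
    on_select: Callable[[str], None]   # hypothetical hook into the call layer

    def tap(self) -> None:
        # Selecting the control selects the voice service it describes.
        self.on_select(self.indication)


def build_popup(contents: Dict[str, str],
                send_key: Callable[[str], None]) -> List[Control]:
    # One control per text content: N text contents produce N controls,
    # in the order the contents were extracted (dicts preserve insertion order).
    return [Control(text=t, indication=k, on_select=send_key)
            for k, t in contents.items()]
```

Keeping the transport behind a callback lets the same popup drive a real dialer or a test stub such as `list.append`.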
9. The apparatus of claim 8, wherein the audio conversion module is specifically configured to:
convert the audio into text, and insert identifiers into the converted text to obtain a target text comprising a plurality of identifiers, wherein the identifiers mark the start, the end, and the breaks of the audio content;
the text content extraction module comprises:
a text segmentation unit, configured to segment the target text into a plurality of sub-texts according to the plurality of identifiers, wherein each sub-text comprises the words between two adjacent identifiers; and
a text content determining unit, configured to determine the N text contents from at least one of the plurality of sub-texts.
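Claim 9 inserts identifiers that mark the start, end and breaks of the audio, then splits the target text on them. The Python sketch below assumes the speech recognizer already delivers pause-delimited segments (for example from voice activity detection) and uses the ASCII unit-separator character as a hypothetical identifier, chosen because it cannot appear in ordinary transcribed speech. `IDENTIFIER`, `add_identifiers` and `split_by_identifiers` are illustrative names.

```python
from typing import List

# Hypothetical identifier: ASCII "unit separator" (0x1F).
IDENTIFIER = "\x1f"


def add_identifiers(segments: List[str]) -> str:
    """Join transcribed segments, marking the start, each break, and the end."""
    return IDENTIFIER + IDENTIFIER.join(segments) + IDENTIFIER


def split_by_identifiers(target_text: str) -> List[str]:
    """Return the words between each pair of adjacent identifiers, skipping empty spans."""
    return [part.strip() for part in target_text.split(IDENTIFIER) if part.strip()]
```

For three segments the target text carries four identifiers (start, two breaks, end), and splitting recovers the original three sub-texts.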
10. The apparatus of claim 9, wherein the text content determining unit comprises:
a matching subunit, configured to determine, among the plurality of sub-texts, at least one sub-text matching the service indication information of each service option in the intelligent voice interaction interface, wherein the intelligent voice interaction interface comprises at least one service option, and each service option displays its service indication information; and
a text content acquisition subunit, configured to obtain, from the at least one sub-text matched with each service option, the text content matching the service indication information of that service option, thereby obtaining N text contents matching the service indication information of N service options.
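Claim 10 matches sub-texts against each option's service indication information and keeps the matching text as that option's text content. The Python sketch below assumes English prompts of the form "For X, press 1" or "Press 1 for X", treats the keypad symbol as the service indication information, and strips the "press N" phrase plus a leading "for"/"to" to leave the service description. The regular expression and the name `match_contents` are assumptions for illustration; real prompts (including Chinese ones) would need their own patterns.

```python
import re
from typing import Dict, List


def match_contents(sub_texts: List[str], indications: List[str]) -> Dict[str, str]:
    """Map each matched indication (e.g. "1") to the text content found for it."""
    result: Dict[str, str] = {}
    for ind in indications:
        # "press 1" but not "press 10"; re.escape handles "*" and "#".
        pattern = re.compile(r"\bpress\s+" + re.escape(ind) + r"(?!\d)", re.IGNORECASE)
        for sub in sub_texts:
            m = pattern.search(sub)
            if m:
                # Drop the "press N" phrase and surrounding punctuation.
                content = (sub[:m.start()] + sub[m.end():]).strip(" ,.;")
                # Drop a leading "for"/"to" so only the service description remains.
                content = re.sub(r"^(?:for|to)\s+", "", content, flags=re.IGNORECASE)
                result[ind] = content
                break  # first matching sub-text wins
    return result
```

Indications with no matching sub-text are simply absent, so the number of returned entries plays the role of N.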
11. The apparatus of claim 8, wherein the audio conversion module comprises:
a first conversion unit, configured to convert a first part of the audio into text to obtain a first sub-text;
a second conversion unit, configured to, when the first sub-text matches the service indication information of a preset target service option and the value of a preset flag variable is a first preset value, convert a second part of the audio into text to obtain a second sub-text and update the value of the flag variable to a second preset value, wherein the target service option is at least one service option in an intelligent voice interaction interface, the second part of the audio is audio received after the first part of the audio, and the target text comprises the first sub-text and the second sub-text; and
a searching unit, configured to, when the first sub-text matches the service indication information of the preset target service option and the value of the flag variable is the second preset value, stop converting the second part of the audio into text and search the electronic equipment for the target text comprising the first sub-text.
12. The apparatus of claim 8, wherein the display module is specifically configured to:
display the N text contents on the intelligent voice interaction interface in correspondence with N service options; or
display a popup window in a display interface, wherein the popup window comprises N controls, and the N text contents are displayed on the N controls respectively.
13. The apparatus according to claim 12, wherein the display module is specifically configured to:
display the N text contents in N service options of the intelligent voice interaction interface, wherein each service option comprises the service indication information and the text content; or
update the service indication information of the N service options of the intelligent voice interaction interface to the N text contents.
14. The apparatus of claim 13, wherein the display module is further configured to:
update the display positions of the N service options when the service indication information of the N service options of the intelligent voice interaction interface is updated to the N text contents.
15. An electronic device, comprising a processor and a memory, the memory storing a program or instructions executable on the processor, the program or instructions, when executed by the processor, implementing the steps of the display method of any one of claims 1-7.
16. A readable storage medium, characterized in that it stores thereon a program or instructions which, when executed by a processor, implement the steps of the display method according to any one of claims 1 to 7.
CN202210927614.9A 2022-08-03 2022-08-03 Display method and device and electronic equipment Pending CN115291826A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210927614.9A CN115291826A (en) 2022-08-03 2022-08-03 Display method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210927614.9A CN115291826A (en) 2022-08-03 2022-08-03 Display method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN115291826A true CN115291826A (en) 2022-11-04

Family

ID=83825480

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210927614.9A Pending CN115291826A (en) 2022-08-03 2022-08-03 Display method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN115291826A (en)

Similar Documents

Publication Publication Date Title
US11178454B2 (en) Video playing method and device, electronic device, and readable storage medium
US20150199341A1 (en) Speech translation apparatus, method and program
US9043300B2 (en) Input method editor integration
US11011170B2 (en) Speech processing method and device
CN107885823B (en) Audio information playing method and device, storage medium and electronic equipment
CN107885826B (en) Multimedia file playing method and device, storage medium and electronic equipment
CN111831806A (en) Semantic integrity determination method and device, electronic equipment and storage medium
CN111796747B (en) Multi-open application processing method and device and electronic equipment
CN105283882B (en) Apparatus for text input and associated method
CN112181253A (en) Information display method and device and electronic equipment
CN113992972A (en) Subtitle display method and device, electronic equipment and readable storage medium
CN109951380B (en) Method, electronic device, and computer-readable medium for finding conversation messages
CN112306450A (en) Information processing method and device
CN113055529B (en) Recording control method and recording control device
CN115291826A (en) Display method and device and electronic equipment
CN115412634A (en) Message display method and device
CN114374663A (en) Message processing method and message processing device
CN114024929A (en) Voice message processing method and device, electronic equipment and medium
CN113593614A (en) Image processing method and device
CN112578965A (en) Processing method and device and electronic equipment
CN113126780A (en) Input method, input device, electronic equipment and readable storage medium
CN112764551A (en) Vocabulary display method and device and electronic equipment
CN112417095A (en) Voice message processing method and device
CN112863495A (en) Information processing method and device and electronic equipment
CN113660375B (en) Call method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination