CN108605074B - Method and equipment for triggering voice function - Google Patents

Method and equipment for triggering voice function

Info

Publication number
CN108605074B
Authority
CN
China
Prior art keywords
interface content
contact
terminal equipment
voice
interface
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201780004960.7A
Other languages
Chinese (zh)
Other versions
CN108605074A (en)
Inventor
王培�
何小文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of CN108605074A publication Critical patent/CN108605074A/en
Application granted granted Critical
Publication of CN108605074B publication Critical patent/CN108605074B/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/725 Cordless telephones

Abstract

Embodiments of the present invention relate to a method and a device for triggering a voice function. The method includes: a terminal device obtains interface content presented on an interface of the terminal device; and if it is detected that the working mode of the terminal device is switched to a first mode, the terminal device converts the interface content into voice information and outputs the voice, where the first mode is a working mode in which the terminal device is connected to an earphone or the earphone function of the terminal device is enabled. Embodiments of the present invention can improve operation efficiency, reduce the need for users to learn and memorize a large number of gesture operations, and improve the user experience.

Description

Method and equipment for triggering voice function
The present application claims priority to the Chinese patent application with application number 201710061841.7, filed with the Chinese Patent Office on January 26, 2017 and entitled "Method and apparatus for triggering an automatic voice function in headset mode", the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of communications, and in particular, to a method and an apparatus for triggering a voice function.
Background
At present, when a user wants to acquire content displayed on a User Interface (UI) of a terminal, the user can only browse the UI visually. This single acquisition mode is limited: when it is inconvenient for the user to operate the UI, for example to page up and down, the user cannot quickly browse the content presented by the UI and can only view the currently displayed screen.
In the prior art, the content displayed on the UI can be viewed and operated through multi-touch, keys, or gesture recognition. In the multi-touch mode, the user has to interact with the interface continuously to keep viewing the content. With keys and gestures, each gesture can complete only one specific operation, while browsing the UI content involves more than one such operation, so the user needs to learn and memorize a large number of gesture-to-operation mappings. In summary, these approaches have low operation efficiency, increase the user's learning and memory burden, and degrade the user experience.
Disclosure of Invention
Embodiments of the present invention relate to a method and a device for triggering a voice function, which solve the problems that the operation efficiency of a user browsing the UI is low and that the user has to learn and memorize the operations corresponding to a large number of gestures in order to browse the UI.
In a first aspect, an embodiment of the present invention provides a method for triggering a voice function. The method includes: a terminal device obtains interface content presented on an interface of the terminal device; and if it is detected that the working mode of the terminal device is switched to a first mode, the terminal device converts the interface content into voice information and outputs the voice, where the first mode is a working mode in which the terminal device is connected to an earphone or the earphone function of the terminal device is enabled.
According to the embodiment of the present invention, by detecting the working mode of the terminal device, the voice function of the terminal device can be triggered when the working mode is switched to the first mode, and the interface content of the terminal device is converted into voice information and output. This improves operation efficiency, reduces the need for the user to learn and memorize the operations corresponding to a large number of gestures, and improves the user experience.
In a possible embodiment, after the step of obtaining the interface content presented on the interface of the terminal device, the method further includes: the terminal device determines the type of the interface element in the interface content.
In one possible embodiment, the types of interface elements in the interface content include: one or more of text information, picture information, contact names, phone numbers, and contact avatars.
In one possible embodiment, the type of interface element in the interface content includes picture information including one or more of text information, contact name, phone number, and contact avatar.
In one possible embodiment, the interface content includes text information, and converting the interface content into voice information for output includes: converting the text information into voice information and outputting the voice.
In one possible embodiment, the interface content includes one or more of a contact name, a phone number, and a contact image, and converting the interface content into voice information for output includes: the terminal device dials the contact associated with one or more of the contact name, the phone number, and the contact image.
In a possible embodiment, converting the interface content into voice information for voice output further includes: detecting whether the interface content is in the system language of the terminal device; and if the interface content is not in the system language of the terminal device, converting the interface content into target language information, or converting the interface content into language information corresponding to the user's requirement.
In a second aspect, an embodiment of the present invention provides a terminal device. The terminal device includes: an acquisition unit, configured to acquire interface content presented on an interface of the terminal device; a detection unit, configured to detect the working mode of the terminal device; an execution unit, configured to convert the interface content of the terminal device into voice information if the detection unit detects that the working mode of the terminal device is switched to a first mode; and an output unit, configured to output the voice. The first mode is a working mode in which the terminal device is connected to an earphone or the earphone function of the terminal device is enabled.
According to the terminal device provided by the embodiment of the invention, when the detection unit detects that the working mode of the terminal device is switched to the first mode, the execution unit converts the interface content of the terminal device into the voice information, and then the output unit outputs the voice information. The embodiment of the invention can improve the operation efficiency, reduce the operation of learning and memorizing a large number of gestures by the user and improve the experience of the user.
In one possible embodiment, the terminal device further includes a processing unit, configured to determine the type of the interface elements in the interface content.
In one possible embodiment, the types of interface elements in the interface content include: one or more of text information, picture information, contact names, phone numbers, and contact avatars.
In one possible embodiment, the type of interface element in the interface content includes picture information including one or more of text information, contact name, phone number, and contact avatar.
In one possible embodiment, if the interface content includes text information, converting the interface content of the terminal device into voice information for output includes: converting the text information into voice information and outputting the voice.
In one possible embodiment, if the interface content includes one or more of a contact name, a phone number, and a contact image, converting the interface content into voice information for output includes: dialing the contact associated with one or more of the contact name, the phone number, and the contact image.
In a possible embodiment, converting the interface content into voice information for voice output further includes: detecting whether the interface content is in the system language of the terminal device; and if the interface content is not in the system language of the terminal device, converting the interface content of the terminal device into target language information, or converting the interface content into language information corresponding to the user's requirement.
In a possible embodiment, the terminal device further includes an input unit, and the input unit is configured to receive a voice input of the user during a voice interaction between the terminal device and the user.
In a third aspect, an embodiment of the present invention provides another terminal device. The terminal device includes: a memory for storing program instructions; and a processor for performing the following operations according to the program instructions stored in the memory: acquiring interface content presented on an interface of the terminal device; and if it is detected that the working mode of the terminal device is switched to a first mode, converting the interface content into voice information for voice output, where the first mode is a working mode in which the terminal device is connected to an earphone or the earphone function of the terminal device is enabled.
In one possible embodiment, the processor is further configured to perform the following operations according to program instructions stored in the memory: after the step of obtaining the interface content presented on the interface of the terminal device, determining the type of the interface element in the interface content.
In one possible embodiment, the types of interface elements in the interface content include: one or more of text information, picture information, contact names, phone numbers, and contact avatars.
In one possible embodiment, the types of interface elements in the interface content include: picture information including one or more of text information, contact names, phone numbers, and contact avatars.
In one possible embodiment, if the interface content includes text information, the processor is configured to perform the following operations according to the program instructions stored in the memory: converting the interface content into voice information for output, including converting the text information into voice information and outputting the voice.
In one possible embodiment, if the interface content includes one or more of a contact name, a phone number, and a contact image, the processor is configured to perform the following operations according to the program instructions stored in the memory: converting the interface content into voice information for output, including dialing the contact associated with one or more of the contact name, the phone number, and the contact image.
In one possible embodiment, the processor is further configured to perform the following operations according to the program instructions stored in the memory: before converting the interface content into voice information and outputting the voice, detecting whether the interface content is in the system language of the terminal device; and if the interface content is not in the system language of the terminal device, converting the interface content into target language information, or converting the interface content into language information corresponding to the user's requirement.
Based on the above technical solutions, in the method and device for triggering a voice function provided by the embodiments of the present invention, when the working mode of the terminal device is switched to the first mode, the interface content is converted into voice information and output, where the first mode is a working mode in which the terminal device is connected to an earphone or the earphone function of the terminal device is enabled. Embodiments of the present invention can improve operation efficiency, reduce the need for the user to learn and memorize a large number of gesture operations, and improve the user experience.
Drawings
Fig. 1 is a schematic structural diagram of a terminal device according to an embodiment of the present invention;
Fig. 2 is a flowchart of a method for triggering a voice function according to an embodiment of the present invention;
Fig. 3 shows a possible implementation of the method for triggering a voice function according to an embodiment of the present invention;
Figs. 4a-4d show another possible implementation of the method for triggering a voice function according to an embodiment of the present invention;
Figs. 5a-5c show yet another possible implementation of the method for triggering a voice function according to an embodiment of the present invention;
Figs. 6a-6e show yet another possible implementation of the method for triggering a voice function according to an embodiment of the present invention;
Fig. 7 shows a possible implementation of converting interface content into voice information according to an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of another terminal device according to an embodiment of the present invention.
Detailed Description
Fig. 1 is a schematic structural diagram of a terminal device according to an embodiment of the present invention. As shown in fig. 1, the terminal device may include: processor 180, memory 120, Radio Frequency (RF) circuitry 110, and peripheral systems 170. These components may communicate over one or more communication buses 210.
The peripheral system 170 is mainly used to implement the interactive functions between the terminal device and the user/external environment, and includes the input and output devices of the terminal device. In some embodiments, the peripheral system 170 may include: other device controllers 171, a sensor controller 172, and a display controller 173. Each controller may be coupled to a respective peripheral device (e.g., other input devices 130, sensors 150, display screen 140). It should be noted that the peripheral system 170 may also include other I/O peripherals.
The display screen 140 may be used to display information input by the user or to present information to the user, for example various menus of the terminal device and the interfaces of running applications, such as buttons (Button), text input boxes (Text), sliders (Scroll Bar), menus (Menu), and so on. The display screen 140 may include a display panel 141 and a touch panel 142. Optionally, the display panel 141 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED) display, or the like. Further, the touch panel 142 can cover the display panel 141; when the touch panel 142 detects a touch operation on or near it, the operation is transmitted to the processor 180 to determine the type of the touch event, and the processor 180 then provides a corresponding visual output on the display panel 141 according to the type of the touch event. In Fig. 1, the touch panel 142 and the display panel 141 are shown as two independent components implementing the input and output functions of the terminal device, but in some embodiments the touch panel 142 and the display panel 141 may be integrated to implement the input and output functions of the terminal device.
Radio Frequency (RF) circuitry 110 is used to receive and transmit radio frequency signals and mainly integrates the receiver and transmitter of the terminal device. The Radio Frequency (RF) circuitry 110 communicates with communication networks and other communication devices via radio frequency signals. In some embodiments, the Radio Frequency (RF) circuitry 110 may include, but is not limited to: an antenna system, an RF transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a CODEC chip, a SIM card, a storage medium, and the like. In some embodiments, the Radio Frequency (RF) circuitry 110 may be implemented on a separate chip. In general, wireless transmission, such as Bluetooth transmission, Wireless Fidelity (Wi-Fi) transmission, third-generation mobile communication technology (3G) transmission, fourth-generation mobile communication technology (4G) transmission, and the like, may be performed through the radio frequency circuitry 110.
The audio circuit 160 is used for one-way audio playback, such as MP3 audio streams, and for bi-directional voice transmission over a network. The audio circuitry 160 may include a speaker 161 and a microphone 162.
The memory 120 is coupled to the processor 180 and is used for storing various software programs and/or sets of instructions. In some embodiments, the memory 120 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. The memory 120 may store an operating system (hereinafter referred to simply as a system), such as an embedded operating system like ANDROID, IOS, WINDOWS, or LINUX. The memory 120 may also store a network communication program that may be used to communicate with one or more additional devices, one or more terminal devices, or one or more network devices. The memory 120 may further store a user interface program, which may vividly display the content of an application program through a graphical operation interface and receive control operations on the application program from the user through input controls such as menus, dialog boxes, and buttons.
It should be understood that the terminal device is only one example provided by the embodiments of the present invention, and the terminal device may have more or less components than those shown, may combine two or more components, or may have a different configuration implementation of the components.
The above is a schematic structural diagram of a typical terminal device. Of course, components may be added or removed for different device forms; for example, there may be no audio circuit, speaker, microphone, RF circuit, or other input devices, or a WIFI circuit, a Bluetooth circuit, an infrared circuit, and the like may be added.
Fig. 2 is a flowchart of a method for triggering a voice function according to an embodiment of the present invention. As shown in fig. 2, the method for triggering the voice function may include the steps of:
step 201: the terminal equipment acquires interface content presented on an interface of the terminal equipment.
Specifically, the terminal device may obtain the interface content presented on the interface of the terminal device through the system software of the terminal device, where the system software manages the various independent hardware components in the terminal device so that they can work in coordination. For ease of description, the interface content presented on the interface of the terminal device is hereinafter simply referred to as the interface content of the terminal device.
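The patent only states that this is done through the system software; as one hedged illustration of what such collection could look like on Android, an accessibility service can walk the node tree of the active window and gather its visible text. ContentCollector is an invented name for this sketch, and a real deployment would additionally need the service declared in the manifest and enabled by the user.

```java
import android.accessibilityservice.AccessibilityService;
import android.view.accessibility.AccessibilityEvent;
import android.view.accessibility.AccessibilityNodeInfo;

public class ContentCollector extends AccessibilityService {

    /** Concatenates the visible text of the interface currently presented on the screen. */
    public String currentInterfaceText() {
        StringBuilder out = new StringBuilder();
        collect(getRootInActiveWindow(), out);
        return out.toString();
    }

    private void collect(AccessibilityNodeInfo node, StringBuilder out) {
        if (node == null) {
            return;
        }
        if (node.getText() != null) {
            out.append(node.getText()).append(' ');   // keep the text of this interface element
        }
        for (int i = 0; i < node.getChildCount(); i++) {
            collect(node.getChild(i), out);           // recurse into child elements
        }
    }

    @Override
    public void onAccessibilityEvent(AccessibilityEvent event) { /* not needed for this sketch */ }

    @Override
    public void onInterrupt() { }
}
```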
In one possible embodiment, after step 201, the method for triggering the voice function further comprises: the type of the interface element in the interface content of the terminal equipment is determined.
Specifically, the terminal device determines a format of an interface element in the interface content, and further determines a type of the interface element. For example, the format of the interface element includes a format of a text (. TXT) and a format of a picture (. JPG), and after determining the format of the interface element in the interface content, the terminal device may determine that the type of the interface element in the interface content corresponds to text information and picture information.
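As a non-authoritative illustration of this format-based type determination, the sketch below maps raw element identifiers to types by their format suffix. ElementType, InterfaceElement, and ElementClassifier are names invented for this example only and do not come from the patent.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical element types corresponding to the interface element types named below.
enum ElementType { TEXT, PICTURE, CONTACT_NAME, PHONE_NUMBER, CONTACT_AVATAR }

final class InterfaceElement {
    final String rawName;    // e.g. "note.txt", "photo.jpg"
    final ElementType type;
    InterfaceElement(String rawName, ElementType type) {
        this.rawName = rawName;
        this.type = type;
    }
}

final class ElementClassifier {
    /** Determines an element's type from its format, as described above (.TXT -> text, .JPG -> picture). */
    static ElementType classify(String rawName) {
        String lower = rawName.toLowerCase();
        if (lower.endsWith(".txt")) {
            return ElementType.TEXT;
        } else if (lower.endsWith(".jpg") || lower.endsWith(".png")) {
            return ElementType.PICTURE;
        } else if (lower.matches("[+0-9\\- ]{5,}")) {
            return ElementType.PHONE_NUMBER;   // digit pattern treated as a phone number
        }
        return ElementType.CONTACT_NAME;       // fallback for this sketch only
    }

    static List<InterfaceElement> classifyAll(List<String> rawNames) {
        List<InterfaceElement> elements = new ArrayList<>();
        for (String name : rawNames) {
            elements.add(new InterfaceElement(name, classify(name)));
        }
        return elements;
    }
}
```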
In one possible embodiment, the types of interface elements in the interface content of the terminal device include one or more of text information, picture information, contact names, phone numbers, and contact avatars.
In one possible embodiment, the type of the interface element in the interface content of the terminal device includes picture information including one or more of a contact name, a phone number, and a contact avatar.
Step 202: if it is detected that the working mode of the terminal device is switched to a first mode, the terminal device converts the interface content of the terminal device into voice information and outputs the voice, where the first mode is a working mode in which the terminal device is connected to an earphone or the earphone function of the terminal device is enabled.
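The patent does not specify an implementation of this step; as a rough, non-authoritative illustration, the Android-flavored sketch below listens for the headset-plug broadcast (the switch to the first mode) and reads the current interface content aloud with the platform text-to-speech engine. InterfaceReader.currentText() is a hypothetical placeholder for obtaining the interface content of step 201, and initialization and error handling are omitted.

```java
import android.content.BroadcastReceiver;
import android.content.Context;
import android.content.Intent;
import android.content.IntentFilter;
import android.speech.tts.TextToSpeech;

public class HeadsetVoiceTrigger extends BroadcastReceiver {
    private final TextToSpeech tts;

    public HeadsetVoiceTrigger(Context context) {
        // Prepare the text-to-speech engine and listen for wired-headset plug events.
        tts = new TextToSpeech(context.getApplicationContext(), status -> { /* engine ready */ });
        context.registerReceiver(this, new IntentFilter(Intent.ACTION_HEADSET_PLUG));
    }

    @Override
    public void onReceive(Context context, Intent intent) {
        // "state" == 1 means a headset was plugged in, i.e. the device enters the first mode.
        if (intent.getIntExtra("state", 0) == 1) {
            String interfaceContent = InterfaceReader.currentText(context); // hypothetical helper
            tts.speak(interfaceContent, TextToSpeech.QUEUE_FLUSH, null, "interface-readout");
        }
    }
}
```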
In one possible embodiment, the interface content of the terminal device includes, but is not limited to, text information, picture information, contact names, phone numbers, and contact avatars.
In a possible embodiment, the interface content of the terminal device includes text information, and if it is detected that the operating mode of the terminal device is switched to the first mode, the terminal device converts the text information into voice information, and performs voice broadcast on the interface content of the terminal device through the terminal device. In fig. 3, text information is taken as an example. The text information may include, but is not limited to, a text interface, a picture interface displaying text.
In a possible embodiment, the interface content of the terminal device includes picture information, and if it is detected that the operating mode of the terminal device is switched to the first mode, the terminal device converts the picture information into voice information, and performs voice broadcast on the interface content of the terminal device through the terminal device.
In one possible embodiment, the interface content of the terminal device may include one or more of a contact name, a phone number, and a contact avatar.
In one possible embodiment, the interface content of the terminal device may include a contact name, or a phone number, or a contact avatar. If the terminal equipment is detected to be switched to the first mode, the terminal equipment carries out dialing operation according to a contact person name, a telephone number or a contact person associated with a contact person head portrait on the user interface. Contact 1 is only an example in fig. 4 a. And if the terminal equipment is detected to be switched to the first mode, the terminal equipment carries out dialing operation on the contact person associated with the contact person 1.
It should be noted that when the interface content of the terminal device includes a contact avatar, the terminal device can likewise dial the contact associated with the contact avatar in the above manner.
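For illustration only, the dialing operation itself could be issued on Android roughly as follows once a number has been resolved from the contact name, phone number, or contact avatar; the Dialer class is an assumed helper, and placing the call requires the CALL_PHONE permission.

```java
import android.content.Context;
import android.content.Intent;
import android.net.Uri;

final class Dialer {
    /** Places a call to the resolved number (requires the CALL_PHONE permission). */
    static void dial(Context context, String phoneNumber) {
        Intent call = new Intent(Intent.ACTION_CALL, Uri.parse("tel:" + phoneNumber));
        call.addFlags(Intent.FLAG_ACTIVITY_NEW_TASK);
        context.startActivity(call);
    }
}
```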
In one possible embodiment, the user interface of the terminal device comprises a contact name, the contact corresponding to at least two telephone numbers; or a contact photo, the contact photo corresponding to at least two phone numbers. For convenience of description, two phone numbers corresponding to one contact are taken as an example for illustration. And if the terminal equipment is detected to be switched to the first mode, the terminal equipment selects one of the two telephone numbers corresponding to one contact person to carry out dialing operation.
The terminal device selecting one of two telephone numbers corresponding to one contact person to perform dialing operation may include the following several ways:
The first way: the selection may be set by the manufacturer before the terminal device leaves the factory, or set by the user afterwards. For example, the first-ranked object is used by default for the dialing operation. In fig. 4b, the phone numbers corresponding to contact 1 are phone number 1 (xxxx-xxxxxxxx) and phone number 2 (xxx-xxxx-xxxx). Phone number 1 is ranked first, so the terminal device dials the contact associated with phone number 1. A default setting fits the normal operation of the terminal device and is relatively simple to operate.
The second way: through keys on the headset connected to the terminal device. For example, one of the at least two phone numbers corresponding to a contact name can be selected through a volume key on the earphone for the dialing operation: the volume-down key steps through the phone numbers in order from first to last, and the volume-up key steps through them in order from last to first. In fig. 4b, phone number 1 is selected for dialing via the volume-up key of the headset.
The third way: the terminal device selects one of the at least two phone numbers corresponding to a contact by asking the user. In fig. 4b, the terminal device learns by asking the user that the user wants to call phone number 1 corresponding to contact 1, and the terminal device dials phone number 1. Voice interaction with the user emphasizes the user's involvement and improves the user experience. The three ways are summarized in the sketch below.
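Purely as an illustrative sketch with invented helper types (SelectionMode, VoiceAgent, NumberSelector), the three selection ways can be expressed as follows; none of these names come from the patent.

```java
import java.util.List;

enum SelectionMode { DEFAULT_FIRST, VOLUME_KEY, VOICE_QUERY }

interface VoiceAgent {
    /** Asks the user over the headset and returns the index of the chosen number. */
    int askWhichNumber(List<String> numbers);
}

final class NumberSelector {
    static String select(List<String> numbers, SelectionMode mode,
                         int volumeKeySteps, VoiceAgent agent) {
        switch (mode) {
            case DEFAULT_FIRST:
                return numbers.get(0);                                 // first-ranked number by default
            case VOLUME_KEY:
                return numbers.get(volumeKeySteps % numbers.size());   // stepped through with the volume keys
            case VOICE_QUERY:
                return numbers.get(agent.askWhichNumber(numbers));     // chosen through a voice dialog
            default:
                throw new IllegalArgumentException("unknown selection mode");
        }
    }
}
```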
It should be noted that when the interface content of the terminal device includes a contact photo corresponding to at least two phone numbers, one of the phone numbers corresponding to the contact photo can likewise be dialed in the above manner.
In one possible embodiment, the interface content of the terminal device may include at least two contact names, each contact name corresponding to a telephone number; or, at least two telephone numbers; or at least two contact head portraits, wherein each contact head portrait corresponds to one telephone number. For convenience of description, three contact names are illustrated as an example. And if the terminal equipment is detected to be switched to the first mode, the terminal equipment selects a telephone number corresponding to one of the three contact names to carry out dialing operation.
The terminal device selecting a telephone number corresponding to one of the three contact names to perform dialing operation may include the following several ways:
The first way: the selection may be set by the manufacturer before the terminal device leaves the factory, or set by the user afterwards. For example, the contact associated with the first-ranked contact name is dialed by default. In fig. 4c, the first-ranked contact name is contact 1, so the terminal device dials the contact associated with contact 1. A default setting fits the normal operation of the terminal device and is relatively simple to operate.
The second way: through keys on the headset connected to the terminal device. For example, one of the three contacts may be selected for the dialing operation through a volume key on the headset: the volume-down key steps through contact 1, contact 2, and contact 3 in order from first to last, and the volume-up key steps through contact 3, contact 2, and contact 1 in order from last to first. Suppose the user selects contact 1 through the volume key; the terminal device then dials the contact associated with contact 1.
The third way: voice interaction is carried out between the earphone and the user, and the terminal device selects one of the three contacts to dial by asking the user. In fig. 4c, the terminal device receives, through voice interaction with the user, an instruction that the user wants to dial contact 1, and the terminal device dials the contact associated with contact 1. Voice interaction with the user emphasizes the user's involvement and improves the user experience.
It should be noted that when the interface content of the terminal device includes at least two contact avatars, the terminal device can likewise select the contact associated with one of the contact avatars for the dialing operation in the above manner.
In one possible embodiment, the interface content of the terminal device includes at least two contacts, each contact corresponding to at least two phone numbers; or at least two contact avatars, each contact avatar corresponding to at least two phone numbers. For convenience of description, two contacts, each corresponding to two phone numbers, are taken as an example. If it is detected that the terminal device is switched to the first mode, the terminal device selects one phone number corresponding to one of the two contacts for the dialing operation.
The terminal device selecting a phone number corresponding to one of the two contacts to perform dialing operation may include the following several ways:
The first way: the selection may be set by the manufacturer before the terminal device leaves the factory, or set by the user afterwards. For example, the first-ranked object is selected by default for the dialing operation. In fig. 4d, the phone numbers corresponding to contact 1 are phone number 1 (xxxx-xxxxxxxx) and phone number 2 (xxx-xxxx-xxxx), and the phone numbers corresponding to contact 2 are phone number 3 (xxxx-xxxxxxxx) and phone number 4 (xxx-xxxx-xxxx). Phone number 1 corresponding to contact 1 is ranked first, so the terminal device dials phone number 1. A default setting fits the normal operation of the terminal device and is relatively simple to operate.
The second way: through keys on the headset connected to the terminal device. For example, the object may be selected for the dialing operation through the volume keys on the headset: the volume-down key steps through the objects in order from first to last, and the volume-up key steps through them in order from last to first. In fig. 4d, suppose the user selects phone number 1 corresponding to contact 1 through the volume key; the terminal device then dials phone number 1.
The third way: the terminal device carries out voice interaction with the user through the earphone and selects an object to dial by asking the user. In fig. 4d, through interaction with the user, it is learned that the user wants to call phone number 1 corresponding to contact 1, and the terminal device dials phone number 1 after receiving the user's instruction. Voice interaction with the user emphasizes the user's involvement and improves the user experience.
It should be noted that when the interface content of the terminal device includes at least two contact avatars, each corresponding to at least two phone numbers, one phone number corresponding to one of the contact avatars can likewise be dialed in the above manner.
In one possible embodiment, the interface content of the terminal device comprises a contact name and a telephone number; or, a phone number and a contact avatar; or, contact name and contact avatar; or contact name, phone number, and contact avatar. And each contact person head portrait corresponds to one telephone number. For convenience of description, a contact name and a contact avatar are illustrated as an example. And if the terminal equipment is detected to be switched to the first mode, the terminal equipment selects the contact person associated with the contact person name or the contact person head portrait to carry out dialing operation.
The terminal device selecting the contact associated with the contact name or the contact avatar for the dialing operation may include the following several ways. The first way: the selection may be set by the manufacturer before the terminal device leaves the factory, or set by the user afterwards. For example, the first-ranked object is dialed by default. In fig. 5a, the contact name is ranked first, so the terminal device dials the contact associated with the contact name.
The second way: through keys on the headset connected to the terminal device. For example, the volume-down key steps through the objects in order from first to last, and the volume-up key steps through them in order from last to first. In fig. 5a, assuming the contact name is selected through a volume key on the earphone, the terminal device dials the contact associated with the contact name.
The third way: the terminal device carries out voice interaction with the user through the earphone and selects the object to dial by asking the user. In fig. 5b, by asking the user by voice, the terminal device learns that the user wants to talk to contact 1, and the terminal device dials the contact associated with contact 1. Voice interaction with the user emphasizes the user's involvement and improves the user experience.
In one possible embodiment, the interface content of the terminal device includes a contact name and a phone number; or a phone number and a contact avatar; or a contact name and a contact avatar; or a contact name, a phone number, and a contact avatar, where each contact name and each contact avatar corresponds to at least two phone numbers. For convenience of description, an example is described in which the interface content includes a contact name and a contact avatar, each corresponding to two phone numbers. If it is detected that the terminal device is switched to the first mode, the terminal device selects one phone number corresponding to the contact name or the contact avatar for the dialing operation.
The terminal device selecting a phone number corresponding to the contact name or the contact icon to perform dialing operation may include the following several ways:
The first way: the selection may be set by the manufacturer before the terminal device leaves the factory, or set by the user afterwards. For example, the first-ranked object is used by default for the dialing operation. In fig. 5c, the phone numbers corresponding to contact 1 are phone number 1 (xxxx-xxxxxxxx) and phone number 2 (xxx-xxxx-xxxx), and the phone numbers corresponding to the contact avatar are phone number 3 (xxxx-xxxxxxxx) and phone number 4 (xxx-xxxx-xxxx). Phone number 1 corresponding to contact 1 is ranked first, so the terminal device dials phone number 1. A default setting fits the normal operation of the terminal device and is relatively simple to operate.
The second way: through keys on the headset connected to the terminal device. For example, the object may be selected for the dialing operation through the volume keys on the headset: the volume-down key steps through the objects in order from first to last, and the volume-up key steps through them in order from last to first. In fig. 5c, suppose the user selects phone number 1 corresponding to contact 1 through the volume key; the terminal device then dials phone number 1.
The third way: the terminal device carries out voice interaction with the user through the earphone and selects an object to dial by asking the user. In fig. 5c, through interaction with the user, it is learned that the user wants to call phone number 1 corresponding to contact 1, and the terminal device dials phone number 1 after receiving the user's instruction. Voice interaction with the user emphasizes the user's involvement and improves the user experience.
In one possible embodiment, the interface content of the terminal device comprises text information, picture information, contact names or phone numbers or contact avatars.
In one possible embodiment, the interface content of the terminal device includes picture information and one or more of a contact name, a phone number, and a contact avatar.
In one possible embodiment, the interface content of the terminal device includes text information and one or more of a contact name, a phone number, and a contact avatar, as shown in fig. 6a. The text information is converted into speech by default. During the voice readout, the user can be queried by voice when a contact name is encountered, or after the interface content has been read the user can be asked by voice whether to perform a dialing operation; if the user wants to talk to the contact associated with the contact name, the terminal device dials the contact associated with that contact name.
In one possible embodiment, the interface content of the terminal device includes text information and contact names, and each contact name corresponds to a telephone number; or, text messages and telephone numbers; or, the text information and the contact head portrait, wherein each contact head portrait corresponds to one telephone number. In fig. 6b, text information and contact names are illustrated as an example. If the terminal equipment is detected to be switched into the first mode, the terminal equipment can carry out voice interaction with a user, and when the user gives an instruction of converting text information into voice, the terminal equipment converts the text information into voice; or, when the user gives an instruction for dialing, the terminal device performs dialing operation according to the contact associated with the contact name.
In one possible embodiment, the interface content of the terminal device comprises text information and contact names, wherein each contact name corresponds to at least two telephone numbers; or, text messages and telephone numbers; or the text information and the contact head portrait, wherein each contact head portrait corresponds to at least two telephone numbers. In fig. 6c, the text information and contact 1, contact 1 corresponding to phone number 1(xxxx-xxxxxxxx) and phone number 2(xxx-xxxx-xxxx) are illustrated as an example. If the terminal equipment is detected to be switched into the first mode, the terminal equipment can carry out voice interaction with a user, and when the user gives an instruction of converting text information into voice, the terminal equipment converts the text information into voice; or, when the user gives an instruction of dialing the telephone number 1 corresponding to the contact 1, the terminal device performs dialing operation according to the telephone number 1.
In one possible embodiment, the interface content of the terminal device includes text information, contact names and telephone numbers, wherein each contact name corresponds to one telephone number; or, the text information, the telephone number and the contact person head portrait, wherein each contact person head portrait corresponds to one telephone number; or, the text information, the contact names and the contact head portraits, wherein each contact name and each contact head portraits respectively correspond to one telephone number; or the text information, the contact names, the telephone numbers and the contact head images, wherein each contact name and each contact head image respectively correspond to one telephone number. In fig. 6d, text information, contact names, phone numbers and contact avatars are illustrated as an example. If the terminal equipment is detected to be switched into the first mode, the terminal equipment can carry out voice interaction with a user, and when the user gives an instruction of converting text information into voice, the terminal equipment converts the text information into voice; or, when the user gives an instruction of dialing the telephone number, the terminal equipment performs dialing operation according to the contact person associated with the telephone number.
In one possible embodiment, the interface content of the terminal device includes text information, contact names and telephone numbers, wherein each contact name corresponds to at least two telephone numbers; or, the text information, the telephone numbers and the contact person head portraits, wherein each contact person head portraits corresponds to at least two telephone numbers; or, the text information, the contact names and the contact head portraits, wherein each contact name and each contact head portraits respectively correspond to at least two telephone numbers; or the text information, the contact names, the telephone numbers and the contact head portraits, wherein each contact name and each contact head portraits respectively correspond to at least two telephone numbers. In fig. 6e, an example is illustrated of text information, contact 1, a phone number and a contact avatar, contact 1 corresponding to phone number 1(xxxx-xxxxxxxx) and phone number 2 (xxx-xxxx-xxxx). If the terminal equipment is detected to be switched into the first mode, the terminal equipment can carry out voice interaction with a user, and when the user gives an instruction of converting text information into voice, the terminal equipment converts the text information into voice; or, when the user gives an instruction of dialing the telephone number 1 corresponding to the contact 1, the terminal device performs dialing operation according to the telephone number 1.
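As a hedged sketch of this voice-driven branching between reading the text aloud and dialing, using invented abstractions (UserIntent, VoiceDialog, TtsReader, PhoneDialer) rather than any concrete API, the decision described in the embodiments above could look like this:

```java
enum UserIntent { READ_TEXT, DIAL }

interface VoiceDialog {
    /** Asks the user over the headset whether to read the text or dial the contact. */
    UserIntent askReadOrDial(String contactName);
}

interface TtsReader {
    void speak(String text);
}

interface PhoneDialer {
    void dial(String phoneNumber);
}

final class MixedContentHandler {
    static void handle(String text, String contactName, String phoneNumber,
                       VoiceDialog dialog, TtsReader reader, PhoneDialer dialer) {
        switch (dialog.askReadOrDial(contactName)) {
            case READ_TEXT:
                reader.speak(text);          // convert the text information into voice
                break;
            case DIAL:
                dialer.dial(phoneNumber);    // dial the number associated with the contact
                break;
        }
    }
}
```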
In one possible embodiment, the interface content of the terminal device includes picture information including one or more of text information, contact names, phone numbers, and contact avatars. And if the terminal equipment is detected to be switched to the first mode, the terminal equipment converts the text information in the picture information into voice information, and the terminal equipment carries out dialing operation according to the contact person related to the contact person name, the telephone number and the contact person head portrait.
In one possible embodiment, the picture information includes text information, and a contact name or phone number or contact avatar.
In one possible embodiment, the picture information includes a contact name, a phone number, or a contact avatar.
It should be noted that when one phone number must be selected for dialing from at least two contact names (each corresponding to one or more phone numbers), from at least two contact avatars (each corresponding to one or more phone numbers), or from at least two phone numbers, the selection can be made through the default setting of the terminal device system, through a volume key of the earphone, or through voice interaction. For the specific implementation process, refer to the corresponding embodiments above.
In the embodiment of the present invention, the contact name may include a contact stored in the terminal device, and may also include, but is not limited to, an account contact of an instant messaging service or the like that is registered with, or bound to, a phone number. The contact photo may include the avatar set by the user for a contact stored in the terminal device, and may also include, but is not limited to, the avatar of an account contact of an instant messaging service or the like that is registered with, or bound to, a phone number. Instant Messaging (IM) refers to a service capable of sending and receiving internet messages and the like.
In the embodiment of the present invention, the first mode refers to a working mode in which the terminal device is connected to an earphone or the earphone function inside the terminal device is enabled.
In a possible embodiment, before the terminal device converts the interface content of the terminal device into voice information for voice output, the method further includes: the terminal device detects whether the interface content is in the system language of the terminal device; and if the interface content is not in the system language of the terminal device, the terminal device converts the interface content into target language information, or converts the interface content into language information corresponding to the user's requirement.
Specifically, as shown in fig. 7, if the interface content of the terminal device is not in the system language of the terminal device, the terminal device converts the interface content into the system language of the terminal device according to the default setting of the terminal device; or the terminal device communicates with the user: if the user issues a translation instruction and gives a target language, the terminal device converts the interface content into the target language; if the user issues a translation instruction but does not give a target language, the terminal device converts the interface content into the system language of the terminal device; and if the user does not issue a translation instruction, the interface content of the terminal device is directly converted into voice information.
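A minimal sketch of the language check in fig. 7, assuming hypothetical LanguageDetector and Translator helpers rather than any specific library, might look as follows; the prepared text would then be handed to the text-to-speech step.

```java
interface LanguageDetector {
    String detect(String text);                 // e.g. returns "zh-CN" or "en-US"
}

interface Translator {
    String translate(String text, String targetLanguage);
}

final class SpeechPreparer {
    /**
     * Returns the text to be spoken: unchanged if it already matches the system language,
     * otherwise translated to the user's requested language or, failing that, to the system language.
     */
    static String prepare(String interfaceContent, String systemLanguage,
                          String requestedLanguage,   // null if the user gave no target language
                          LanguageDetector detector, Translator translator) {
        String contentLanguage = detector.detect(interfaceContent);
        if (contentLanguage.equals(systemLanguage)) {
            return interfaceContent;                               // no conversion needed
        }
        String target = (requestedLanguage != null) ? requestedLanguage : systemLanguage;
        return translator.translate(interfaceContent, target);    // convert before text-to-speech
    }
}
```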
It should be noted that if the language of the interface content of the terminal device is consistent with the system language of the terminal device, for example if the system language of the terminal device and the language of the text information are both Simplified Chinese, the terminal device does not need to perform the conversion operation and directly converts the interface content of the terminal device into voice information.
In the embodiment of the present invention, the language information may refer to a voice, to a type of language such as English or Chinese, or to a combination of both.
Step 203: and the terminal equipment executes corresponding operation according to the interface content of the terminal equipment.
In a possible embodiment, when it is detected that the working mode of the terminal device is switched to the first mode, the terminal device detects the interface content of the terminal device and converts the interface content of the terminal device into voice information for voice output, where the first mode is a working mode in which the terminal device is connected to an earphone or the earphone function of the terminal device is enabled. That is, the operation of detecting the interface of the terminal device is performed only when the switch to the first mode is detected, which, compared with the foregoing method, saves power consumption of the terminal device.
According to the embodiment of the present invention, by detecting the working mode of the terminal device, the voice function of the terminal device can be triggered when the working mode is switched to the first mode, and the interface content of the terminal device is converted into voice information and output. This improves operation efficiency, reduces the need for the user to learn and memorize the operations corresponding to a large number of gestures, and improves the user experience.
Fig. 8 is a schematic structural diagram of another terminal device according to an embodiment of the present invention. As shown in fig. 8, the terminal device includes: an acquisition unit 810, a detection unit 820, an execution unit 830, an output unit 840 and a processing unit 850.
It will be appreciated by those skilled in the art that fig. 8 merely shows a simplified design of the structure of the terminal device. The terminal structure shown in fig. 8 does not constitute a limitation of the terminal, and the terminal device may include more or less components than those shown in fig. 8, for example, the terminal device may further include a storage unit for storing instructions corresponding to a communication algorithm.
In fig. 8, the acquisition unit 810 is configured to acquire interface content presented on an interface of the terminal device; the detection unit 820 is configured to detect the working mode of the terminal device; if the detection unit 820 detects that the working mode of the terminal device is switched to the first mode, the execution unit 830 converts the interface content of the terminal device into voice information, and the output unit 840 then outputs the voice. The first mode is a working mode in which the terminal device is connected to an earphone or the earphone function of the terminal device is enabled.
In the terminal device provided by the embodiment of the present invention, when the detection unit 820 detects that the working mode of the terminal device is switched to the first mode, the execution unit 830 converts the interface content of the terminal device into the voice information, and the output unit 840 outputs the voice information. The embodiment of the invention can improve the operation efficiency, reduce the operation of learning and memorizing a large number of gestures by the user and improve the experience of the user.
In one possible embodiment, the processing unit 850 is configured to determine the type of the interface elements in the interface content.
In one possible embodiment, the types of interface elements in the interface content include: one or more of text information, picture information, contact names, phone numbers, and contact avatars.
In one possible embodiment, the type of interface element in the interface content includes picture information including one or more of text information, contact name, phone number, and contact avatar.
In one possible embodiment, if the interface content of the terminal device includes text information, converting the interface content of the terminal device into voice information for output includes: converting the text information into voice information and outputting the voice.
In one possible embodiment, if the interface content of the terminal device includes one or more of a contact name, a phone number, and a contact image, converting the interface content of the terminal device into voice information for output includes: dialing the contact associated with one or more of the contact name, the phone number, and the contact image.
In a possible embodiment, converting the interface content into voice information for voice output further includes: detecting whether the interface content is in the system language of the terminal device; and if the interface content is not in the system language of the terminal device, converting the interface content of the terminal device into target language information, or converting the interface content into language information corresponding to the user's requirement.
In a possible embodiment, the terminal device further comprises an input unit 860, said input unit 860 being configured to receive a voice input of the user during a voice interaction of the terminal device with the user.
According to the method and the device for triggering a voice function provided by the embodiment of the present invention, when the working mode of the terminal device is switched to the first mode, the interface content of the terminal device is converted into voice information for voice output, where the first mode is a working mode in which the terminal device is connected to an earphone or the earphone function of the terminal device is enabled. The embodiment of the present invention can improve operation efficiency, reduce the need for the user to learn and memorize a large number of gesture operations, and improve the user experience.
Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be understood by those of ordinary skill in the art that all or part of the steps in the method for implementing the above embodiments may be implemented by a program, and the program may be stored in a computer-readable storage medium, where the storage medium is a non-transitory (non-transitory) medium, such as a random access memory, a read-only memory, a flash memory, a hard disk, a solid state drive, a magnetic tape (magnetic tape), a floppy disk (floppy disk), an optical disk (optical disk) and any combination thereof.
The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (12)

1. A method of triggering a voice function, comprising:
acquiring interface content presented on an interface of terminal equipment;
if it is detected that the working mode of the terminal equipment is switched to a first mode, converting the interface content into voice information for voice output, and operating the interface content according to the voice information, wherein the first mode is a working mode in which the terminal equipment is connected to an earphone or the earphone function of the terminal equipment is enabled; the operating the interface content according to the voice information comprises: when a contact name or a contact avatar in the interface content corresponds to at least two telephone numbers, the terminal equipment interacts with a user through the earphone and selects one telephone number for a dialing operation; and the interacting with the user through the earphone comprises: interacting with the user through keys on the earphone, or interacting with the user through the earphone in a voice manner.
2. The method according to claim 1, wherein after the acquiring of the interface content presented on the interface of the terminal equipment, the method further comprises: determining a type of an interface element in the interface content.
3. The method of claim 2, wherein the type of the interface element in the interface content comprises: one or more of text information, picture information, a contact name, a phone number, and a contact avatar.
4. The method of claim 2, wherein the type of the interface element in the interface content comprises picture information, and the picture information comprises one or more of text information, a contact name, a phone number, and a contact avatar.
5. The method according to any one of claims 2 to 4, wherein if the interface content comprises text information,
converting the interface content into voice information for output comprises:
converting the text information into voice information and outputting the voice.
6. The method of claim 1, wherein the step of converting the interface content into voice information for voice output further comprises:
detecting whether the interface content comprises system language information of the terminal equipment;
if the interface content does not include the system language information of the terminal equipment, converting the interface content into target voice information; or,
converting the interface content into language information corresponding to a user requirement.
7. A terminal, comprising:
a memory for storing program instructions;
a processor for performing the following operations according to program instructions stored in the memory:
acquiring interface content presented on an interface of terminal equipment;
if it is detected that the working mode of the terminal equipment is switched to a first mode, converting the interface content into voice information for voice output, and operating the interface content according to the voice information, wherein the first mode is a working mode in which the terminal equipment is connected to an earphone or the earphone function of the terminal equipment is enabled; the operating the interface content according to the voice information comprises: when a contact name or a contact avatar in the interface content corresponds to at least two telephone numbers, the terminal equipment interacts with a user through the earphone and selects one telephone number for a dialing operation; and the interacting with the user through the earphone comprises: interacting with the user through keys on the earphone, or interacting with the user through the earphone in a voice manner.
8. The terminal of claim 7, wherein the processor is further configured to perform the following operations according to program instructions stored in the memory: after the acquiring of the interface content presented on the interface of the terminal equipment, determining a type of an interface element in the interface content.
9. The terminal of claim 8, wherein the types of interface elements in the interface content comprise: one or more of text information, picture information, contact names, phone numbers, and contact avatars.
10. The terminal of claim 8, wherein the type of interface element in the interface content comprises picture information, the picture information comprising one or more of text information, contact name, phone number, and contact avatar.
11. The terminal according to any one of claims 8 to 10, wherein if the interface content comprises text information, the processor is configured to perform the following operations according to program instructions stored in the memory: converting the interface content into voice information for output, comprising:
converting the text information into voice information and outputting the voice.
12. The terminal of claim 7, wherein the processor is further configured to perform the following operations according to program instructions stored in the memory: before converting the interface content into voice information for voice output, detecting whether the interface content comprises system language information of the terminal equipment; if the interface content does not comprise the system language information of the terminal equipment, converting the interface content into target voice information; or converting the interface content into language information corresponding to a user requirement.
CN201780004960.7A 2017-01-26 2017-06-12 Method and equipment for triggering voice function Active CN108605074B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN2017100618417 2017-01-26
CN201710061841 2017-01-26
PCT/CN2017/087984 WO2018137306A1 (en) 2017-01-26 2017-06-12 Method and device for triggering speech function

Publications (2)

Publication Number Publication Date
CN108605074A CN108605074A (en) 2018-09-28
CN108605074B true CN108605074B (en) 2021-01-05

Family

ID=62978868

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201780004960.7A Active CN108605074B (en) 2017-01-26 2017-06-12 Method and equipment for triggering voice function

Country Status (2)

Country Link
CN (1) CN108605074B (en)
WO (1) WO2018137306A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113835669B (en) * 2020-06-24 2024-03-29 青岛海信移动通信技术有限公司 Electronic equipment and voice broadcasting method thereof

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103959751A (en) * 2011-09-30 2014-07-30 苹果公司 Automatically adapting user interfaces for hands-free interaction
CN104142778A (en) * 2013-09-25 2014-11-12 腾讯科技(深圳)有限公司 Text processing method and device as well as mobile terminal
CN104469027A (en) * 2014-10-31 2015-03-25 百度在线网络技术(北京)有限公司 Call processing method and device
CN105208232A (en) * 2015-10-10 2015-12-30 网易(杭州)网络有限公司 Method and device for automatically making call
CN105791502A (en) * 2016-04-28 2016-07-20 北京小米移动软件有限公司 Contact person searching method and apparatus thereof
WO2016178984A1 (en) * 2015-05-01 2016-11-10 Ring-A-Ling, Inc. Methods and systems for management of video and ring tones among mobile devices
US9538226B2 (en) * 2013-12-06 2017-01-03 Samsung Electronics Co., Ltd. Method for operating moving pictures and electronic device thereof

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7650170B2 (en) * 2004-03-01 2010-01-19 Research In Motion Limited Communications system providing automatic text-to-speech conversion features and related methods
CN102857637B (en) * 2012-09-03 2016-03-23 小米科技有限责任公司 A kind of associated person information acquisition methods, system and device
CN104104767B (en) * 2013-04-07 2018-05-01 腾讯科技(深圳)有限公司 The treating method and apparatus of associated person information in portable intelligent terminal
CN104123114A (en) * 2013-04-27 2014-10-29 腾讯科技(深圳)有限公司 Method and device for playing voice
CN103747511B (en) * 2014-01-07 2018-03-09 加一联创电子科技有限公司 information broadcasting method and system
CN104184867A (en) * 2014-08-29 2014-12-03 广东欧珀移动通信有限公司 Intelligent mobile terminal incoming-call linkman message voice broadcasting method and system
CN104346038B (en) * 2014-09-24 2018-05-01 广东欧珀移动通信有限公司 End message read method and system
CN104461346B (en) * 2014-10-20 2017-10-31 天闻数媒科技(北京)有限公司 A kind of method of visually impaired people's Touch Screen, device and intelligent touch screen mobile terminal
CN104461545B (en) * 2014-12-12 2018-09-07 百度在线网络技术(北京)有限公司 Content in mobile terminal is provided to the method and device of user
CN105657174A (en) * 2016-01-26 2016-06-08 努比亚技术有限公司 Voice converting method and terminal
CN105955609A (en) * 2016-04-25 2016-09-21 乐视控股(北京)有限公司 Voice reading method and apparatus
CN106339160A (en) * 2016-08-26 2017-01-18 北京小米移动软件有限公司 Browsing interactive processing method and device

Also Published As

Publication number Publication date
CN108605074A (en) 2018-09-28
WO2018137306A1 (en) 2018-08-02

Similar Documents

Publication Publication Date Title
CN109375890B (en) Screen display method and multi-screen electronic equipment
US9298519B2 (en) Method for controlling display apparatus and mobile phone
US8552996B2 (en) Mobile terminal apparatus and method of starting application
US11237703B2 (en) Method for user-operation mode selection and terminals
CN108089891B (en) Application program starting method and mobile terminal
CN108958580B (en) Display control method and terminal equipment
CN107943374B (en) Method for starting application program in foldable terminal and foldable terminal
US10956025B2 (en) Gesture control method, gesture control device and gesture control system
CN111371949A (en) Application program switching method and device, storage medium and touch terminal
CN108446058B (en) Mobile terminal operation method and mobile terminal
CN108174103B (en) Shooting prompting method and mobile terminal
CN109491738B (en) Terminal device control method and terminal device
CN111104029B (en) Shortcut identifier generation method, electronic device and medium
JP2017527928A (en) Text input method, apparatus, program, and recording medium
CN111327458A (en) Configuration information sharing method, terminal device and computer readable storage medium
US11144422B2 (en) Apparatus and method for controlling external device
CN108476339B (en) Remote control method and terminal
EP3699743B1 (en) Image viewing method and mobile terminal
EP2843532A1 (en) Electronic device and method
CN109683768B (en) Application operation method and mobile terminal
CN110865745A (en) Screen capturing method and terminal equipment
CN109683802B (en) Icon moving method and terminal
CN108170329B (en) Display control method and terminal equipment
CN109491741B (en) Method and terminal for switching background skin
CN110769303A (en) Playing control method and device and mobile terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant