CN105744074A

CN105744074A - Voice operation method and apparatus in mobile terminal

Info

Publication number: CN105744074A
Application number: CN201610195293.2A
Authority: CN
Inventors: 刘涛
Original assignee: Hisense Mobile Communications Technology Co Ltd
Current assignee: Hisense Mobile Communications Technology Co Ltd
Priority date: 2016-03-30
Filing date: 2016-03-30
Publication date: 2016-07-06

Abstract

Embodiments of the present invention provide a voice operation method and apparatus in a mobile terminal. The method includes: when a voice service is triggered in an operating system in a sleep state, waking the operating system; acquiring audio data by the voice service; searching for a preset voice instruction matching the audio data; extracting an object identifier from the voice instruction; and invoking the application corresponding to the object identifier. The method and apparatus enable automatic voice operation from the sleep state, so that a series of complex operations such as pressing a power key to wake the system, unlocking, searching for the application and starting the application are avoided, which greatly improves operating convenience and reduces the time consumed.

Description

Method and device for carrying out voice operation in mobile terminal

Technical Field

The present invention relates to the field of voice processing technologies, and in particular, to a method and an apparatus for performing voice operation in a mobile terminal.

Background

With the development of mobile communication technology, intelligent devices such as mobile phones and intelligent wearable devices are more and more popular, and great convenience is brought to life, study and work of people.

Because of the limitations of the smart device itself, power, and network traffic, in order to save power and traffic as much as possible, the system of the smart device is usually dormant under certain conditions for a long time.

However, if the user needs to perform an operation, the user generally needs to press a power key to wake up the system, unlock the system, search for a corresponding application after unlocking the system, and start the application to perform the operation.

This operation is complicated and time consuming.

Disclosure of Invention

To address the complicated operation steps and long time consumption that arise when the system is in a sleep state, embodiments of the present invention provide a method for performing a voice operation in a mobile terminal and a corresponding apparatus for performing a voice operation in a mobile terminal.

In order to solve the above problem, an embodiment of the present invention discloses a method for performing voice operation in a mobile terminal, including:

when a voice service is triggered in an operating system in a dormant state, waking up the operating system;

obtaining, by the voice service, audio data;

searching a preset voice instruction matched with the audio data;

extracting an object identification from the voice instruction;

and calling the application corresponding to the object identifier.

Preferably, when the voice service is triggered in the operating system in the sleep state, the step of waking up the operating system includes:

when the voice service detects a designated broadcast in the operating system in the dormant state, waking up the operating system;

wherein the designated broadcast is triggered by a physical key of the mobile terminal when the operating system is in a sleep state.

Preferably, the step of searching for a preset voice command matching the audio data comprises:

extracting a voice instruction which is trained by adopting an object identifier in the operating system;

calculating the similarity between the audio data and the voice instruction;

and when the similarity exceeds a preset similarity threshold, confirming that the audio data is matched with the voice command.

Preferably, the step of extracting the object identifier from the voice instruction comprises:

searching a configuration file corresponding to the voice instruction;

and extracting object identification from the configuration file.

Preferably, the step of calling the application corresponding to the object identifier includes:

searching an operation parameter corresponding to the object identifier from a preset information table;

and calling the application according to the operating parameters.

Preferably, the object identifier is a contact name or an application name;

the operation parameter corresponding to the contact name is a contact number, and the operation parameter corresponding to the application name is a main activity name and/or a package name;

the step of calling the application according to the operating parameters comprises the following steps:

communicating with the contact number through an Intent mechanism;

or,

starting an application corresponding to the main activity name and/or the package name through an Intent mechanism.

Preferably, the method further comprises the following steps:

when the operating system is started, starting voice service;

and raising the priority of the voice service.

Preferably, the method further comprises the following steps:

extracting, by the voice service, object information for one or more data objects in the system;

storing the object information in an information table.

Preferably, the object information includes an object identifier and an operation parameter, and the method further includes:

packaging the object identification into a preset configuration file;

and training the voice instruction by the configuration file after the object identification is packaged.

The embodiment of the invention also discloses an apparatus for carrying out voice operation in the mobile terminal, wherein the apparatus is located in the voice service and comprises:

the operating system awakening module is used for awakening the operating system when the voice service is triggered in the operating system in the dormant state;

the audio data acquisition module is used for acquiring audio data;

the voice instruction matching module is used for searching a preset voice instruction matched with the audio data;

the object identification extracting module is used for extracting object identifications from the voice instructions;

and the application calling module is used for calling the application corresponding to the object identifier.

Preferably, the operating system wake-up module includes:

the broadcast triggering submodule is used for waking up the operating system when the specified broadcast is monitored in the operating system in the dormant state;

Preferably, the voice instruction matching module includes:

the sample voice instruction extraction submodule is used for extracting a voice instruction which is trained by adopting an object identifier in the operating system;

the sample voice instruction matching submodule is used for calculating the similarity between the audio data and the voice instruction;

and the voice instruction determining submodule is used for determining that the audio data is matched with the voice instruction when the similarity exceeds a preset similarity threshold.

Preferably, the object identification extracting module includes:

the configuration file searching submodule is used for searching a configuration file corresponding to the voice instruction;

and the object identifier extraction submodule is used for extracting the object identifier from the configuration file.

Preferably, the application calling module includes:

the operation parameter searching submodule is used for searching the operation parameter corresponding to the object identifier from a preset information table;

and the parameter operation submodule is used for calling the application according to the operation parameters.

Preferably, the object identifier is a contact name or an application name;

the operating parameter corresponding to the contact name can be a contact number, and the operating parameter corresponding to the application name can be a main activity name and/or a package name;

the parameter operation submodule comprises:

a communication unit for communicating with the contact number through an Intent mechanism;

or,

a starting unit for starting the application corresponding to the main activity name and/or the package name through an Intent mechanism.

Preferably, the apparatus further comprises:

the voice service starting module is used for starting the voice service when the system is started;

and the priority improving module is used for improving the priority of the voice service.

Preferably, the apparatus further comprises:

the object information extraction module is used for extracting the object information of one or more data objects in the operating system;

and the information table storage module is used for storing the object information in an information table.

Preferably, the object information includes an object identifier and an operation parameter, and the apparatus further includes:

the configuration file packaging module is used for packaging the object identifier into a preset configuration file;

and the voice instruction training module is used for training the voice instruction of the configuration file after the object identification is packaged.

The embodiment of the invention has the following advantages:

In the embodiment of the invention, the voice service stays resident while the operating system is dormant and listens for a designated broadcast; when the designated broadcast is detected, the operating system is woken up and audio data is collected. This avoids collecting audio data and matching voice instructions continuously, which greatly reduces the resulting waste of resources, and the operation that triggers the designated broadcast is simple. If the audio data matches a voice instruction, its semantics are quickly identified and the application is called. Automatic voice operation from the dormant state is thereby realized, and a series of complicated operations such as pressing a power key to wake the system, unlocking, searching for the application and starting the application is avoided, which greatly improves ease of operation and reduces time consumption.

In the embodiment of the invention, the voice instructions are trained by extracting the object information of the data objects in the operating system. Because the data on each user's mobile terminal is different, training the voice instructions from automatically extracted data objects enables personalized voice operation and spares the user from tediously recording voice samples to train the instructions before using them, which greatly improves training efficiency and ease of operation.

Drawings

FIG. 1 is a flow chart of the steps of one embodiment of a method of voice operation in a mobile terminal of the present invention;

FIG. 2 is a block diagram of an embodiment of the apparatus for voice operation in a mobile terminal according to the present invention.

Detailed Description

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.

Referring to fig. 1, a flowchart illustrating steps of an embodiment of a method for performing a voice operation in a mobile terminal according to the present invention is shown, which may specifically include the following steps:

step 101, when a voice service is triggered in an operating system in a dormant state, waking up the operating system;

it should be noted that the embodiments of the present invention can be applied to various mobile terminals, for example, a mobile phone, a tablet computer, a personal digital assistant, a wearable device (such as glasses, a watch, and the like), and the like.

The operating system of the mobile device may include Android, iOS, Windows Phone, Windows, and the like, and may support running a voice Service, i.e., a Service that handles voice-related operations, while the system is in a sleep state.

In order to enable those skilled in the art to better understand the embodiment of the present invention, in the present specification, Android is described as an example of an operating system.

When the operating system is started, the voice Service can be started, the priority of the voice Service is improved, and the probability that the voice Service is forcibly closed (Kill) by the system is reduced.

For example, in the Android system, if an Android.

The priority of the voice Service can be raised in two ways:

1. the priority is raised by calling the startForeground(1, new Notification()) method in the onStart() method;

2. in a manifest file corresponding to a voice Service process, for example, android manifest.

If the user does not operate the mobile device, the system enters the dormant state after a certain time; alternatively, the user presses the power key to put the system into the dormant state.

The sleep state generally refers to a state in which the system stops operations other than the designated task and is in a standby state.

In practical applications, the sleep state can be defined by those skilled in the art according to practical situations.

For example, in Android (Linux), there are three main steps for hibernation (suspend):

1. freezing user-space processes and kernel-space tasks;

2. calling the suspend callback functions of the registered devices;

3. suspending the core devices and putting the CPU (central processing unit) into a sleep state.

Freezing processes means that the kernel sets the state of all processes in the process list to stopped and saves the context of all processes.

In the embodiment of the invention, the voice Service stays resident in memory and keeps running while the system is in the dormant state, so that it can collect audio data.

In order to reduce resource consumption, the voice Service may register a broadcast receiver in advance for listening to a customized broadcast.

If a physical key (such as a direction key) of the mobile terminal is pressed (single click, double click, etc.), the state of the underlying driver register is read; if the system is determined to be in the dormant state, a custom broadcast is sent instead of the key's original broadcast (such as KEYCODE_VOLUME_DOWN).

When the voice service detects the designated broadcast in the operating system in the dormant state, where the designated broadcast is triggered by a physical key of the mobile terminal while the operating system is dormant, pressing the physical key is treated as the user actively requesting a wake-up, so the operating system can be woken up.
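The key routing described above can be sketched in plain Java as a small simulation. This is only an illustration under stated assumptions: the class name WakeFlow, the action string CUSTOM_BROADCAST and the two boolean flags are hypothetical and stand in for the driver, the broadcast receiver and the power service, none of which are specified in this form by the embodiment.

```java
/** Simulates routing a physical key press while the system may be dormant. */
class WakeFlow {
    /** Hypothetical action name for the custom broadcast; not taken from the source. */
    static final String CUSTOM_BROADCAST = "com.example.voice.action.WAKE";

    private boolean dormant = true;    // the simulated system starts out dormant
    private boolean recording = false; // audio capture starts only after wake-up

    /** Key driver side: picks the broadcast to send and delivers it. */
    String onPhysicalKey(String originalBroadcast) {
        String broadcast = dormant ? CUSTOM_BROADCAST : originalBroadcast;
        onBroadcast(broadcast);
        return broadcast;
    }

    /** Voice service side: reacts only to the custom broadcast. */
    void onBroadcast(String broadcast) {
        if (CUSTOM_BROADCAST.equals(broadcast) && dormant) {
            dormant = false;  // stands in for invoking the power service and lighting the screen
            recording = true; // stands in for starting audio capture
        }
    }

    boolean isDormant() { return dormant; }

    boolean isRecording() { return recording; }
}
```

In this sketch the first key press while dormant yields the custom broadcast and wakes the simulated system; later presses fall through to the key's original broadcast.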

In particular implementations, waking up the operating system may include operations to invoke a power service, illuminate a screen, and so on.

Step 102, obtaining audio data by the voice service;

In a specific implementation, the voice Service may call the recording service of the operating system to capture audio data, and then obtain the captured audio data.

In the Android system, recording of audio data is done through the MediaRecorder class. The implementation steps are generally as follows:

a. generating a MediaRecorder class object;

MediaRecorder recorder = new MediaRecorder();

b. setting an audio recording source;

recorder. // set the audio recording source (see MediaRecorder.AudioSource for the corresponding audio recording source constants)

Alternatively, recorder. // set the video recording source (see MediaRecorder.VideoSource for the corresponding video recording source constants)

c. setting the output file format for recording (this step comes before recorder.prepare());

recorder. // the corresponding file format is found in MediaRecorder.

d. setting the audio and video encoding mode;

recorder.setAudioEncoder(audio_encoder); // set the audio recording encoding mode (see MediaRecorder.AudioEncoder for the corresponding audio encoding constants)

Alternatively, recorder. // set the video recording encoding mode (see MediaRecorder.VideoEncoder for the corresponding video encoding constants)

e. setting the output file storage path (this step comes after recorder.setOutputFormat() and before recorder.prepare()): recorder.setOutputFile(strippath);

f. preparing the recorder to start capturing and encoding data: recorder.

g. formally starting to capture and encode data to the specified file: recorder.

Step 103, searching a preset voice instruction matched with the audio data;

the voice instruction may refer to voice data (i.e., a sound made by a user) as an operation instruction.

By applying the embodiment of the invention, when the voice Service is initially started or the data objects are changed (such as installation, upgrading and the like), the object information of one or more data objects in the operating system can be extracted and stored in the information table (such as a hash table).

The object information may refer to related information of one data object.

In one example, if the data object is a contact, a thread may be started to read information of the contact, such as a name, a number, and the like, and store the information of the contact in the hash table.

In another example, if the data object is an application, another thread may be started to obtain information of the application, and the information of the application is stored in the hash table.

For example, in the Android system, information about applications with the launcher (desktop) attribute, such as the name and the package name/main Activity name, may be obtained by means of the PackageManager and the like.

In the embodiment of the present invention, the object information may include an object identifier and an operation parameter.

The object identifier is information that can uniquely identify a data object, e.g. the name of a contact, the name of an application, etc.

The operation parameter is the information needed to operate on the data object, such as the number of a contact, or the package name/main Activity name of an application.
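A minimal sketch of such an information table is given below, assuming a hash table keyed by the object identifier. The class InfoTable, its Kind enum and the Entry type are hypothetical names introduced only for illustration; the embodiment requires only that object identifiers map to operation parameters. Several entries are allowed per identifier so that a contact with more than one number can later be offered to the user for selection.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory information table: object identifier -> operation parameters. */
class InfoTable {
    /** Kind of data object an entry describes. */
    enum Kind { CONTACT, APPLICATION }

    /** One operation parameter, e.g. a contact number or a package/main Activity name. */
    static final class Entry {
        final Kind kind;
        final String operationParameter;

        Entry(Kind kind, String operationParameter) {
            this.kind = kind;
            this.operationParameter = operationParameter;
        }
    }

    private final Map<String, List<Entry>> table = new HashMap<>();

    /** Stores one operation parameter under the given object identifier. */
    void put(String objectId, Kind kind, String operationParameter) {
        table.computeIfAbsent(objectId, k -> new ArrayList<>())
             .add(new Entry(kind, operationParameter));
    }

    /** Returns all entries for an identifier; an empty list means it is unknown. */
    List<Entry> lookup(String objectId) {
        return table.getOrDefault(objectId, new ArrayList<>());
    }
}
```

A lookup that returns more than one entry corresponds to the case in which the user has to choose among several numbers or contacts.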

A configuration file, such as an XML (extensible markup language) file, may be set in advance as a template of the voice instruction.

For example, "call XX," "open XX," "view XX," where XX is a reserved object identification.

The object identifier is packaged into the preset configuration file, and a voice instruction is then trained from the configuration file containing the object identifier, for example by using the EM (expectation maximization) algorithm to train a speaker-independent, content-independent global GMM (Gaussian mixture model) as the sample voice instruction, or by using SpeakRight to train the sample voice instruction.

SpeakRight is a Java framework for writing speech recognition applications; it is based on VoiceXML technology and uses the StringTemplate engine to automatically generate VoiceXML documents.

For example, the name of a contact, such as Xiaoming, is packaged into the preset configuration file "call XX" to obtain the training template "call Xiaoming", which is then trained into a sample voice instruction.

For another example, the name of an application, such as a browser, is packaged into the preset configuration file "open XX" or "view XX" to obtain the training template "open browser" or "view browser", which is then trained into a sample voice instruction.
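The packaging step can be illustrated with a short Java sketch that fills the reserved position XX in each template with an object identifier. The class InstructionTemplates and its fill method are hypothetical; the sketch covers only the text substitution and not the subsequent acoustic training.

```java
import java.util.ArrayList;
import java.util.List;

/** Fills voice instruction templates such as "call XX" with an object identifier. */
class InstructionTemplates {
    /** Placeholder reserved for the object identifier. */
    static final String PLACEHOLDER = "XX";

    /** Returns one training phrase per template, with the placeholder replaced. */
    static List<String> fill(List<String> templates, String objectId) {
        List<String> phrases = new ArrayList<>();
        for (String template : templates) {
            phrases.add(template.replace(PLACEHOLDER, objectId));
        }
        return phrases;
    }
}
```

For instance, filling the templates "open XX" and "view XX" with "browser" yields the phrases "open browser" and "view browser", which would then be passed on to training.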

It should be noted that the training of the sample voice instruction may be performed locally on the mobile device or may be performed on the network server, which is not limited in this embodiment of the present invention.

Therefore, when speech recognition is performed, a voice instruction trained in advance with an object identifier in the operating system can be extracted and the similarity between the audio data and the voice instruction calculated. When the similarity exceeds a preset similarity threshold, the audio data is considered highly similar to the voice instruction, and the two are confirmed to match.

In the embodiment of the invention, the similarity between the audio data and the voice instruction can be calculated by a distance measurement method and the like.

The feature distance is one way of measuring the similarity between audio samples (e.g., the audio data and a sample voice instruction) and generally reflects how far apart the samples are. Commonly used distance measures include the Minkowski distance, Mahalanobis distance, cosine distance, correlation distance, and so on.
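As an illustration only, the threshold test can be sketched with cosine similarity over feature vectors. The sketch assumes that the audio data and each sample voice instruction have already been reduced to equal-length numeric feature vectors, which the embodiment does not specify; the class SimilarityMatcher and its methods are hypothetical names.

```java
/** Matches an audio feature vector against sample voice instructions by cosine similarity. */
class SimilarityMatcher {
    /** Cosine similarity of two equal-length feature vectors; 0 if either vector is all zeros. */
    static double cosineSimilarity(double[] a, double[] b) {
        if (a.length != b.length || a.length == 0) {
            throw new IllegalArgumentException("vectors must be non-empty and of equal length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Index of the most similar sample whose similarity exceeds the threshold, or -1 if none does. */
    static int bestMatch(double[] audio, double[][] samples, double threshold) {
        int best = -1;
        double bestScore = threshold;
        for (int i = 0; i < samples.length; i++) {
            double score = cosineSimilarity(audio, samples[i]);
            if (score > bestScore) { // strictly exceeds the threshold or the current best
                bestScore = score;
                best = i;
            }
        }
        return best;
    }
}
```

Returning -1 when no sample exceeds the threshold corresponds to the case in which the audio data matches none of the preset voice instructions.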

Of course, the above matching manner is only an example. When implementing the embodiment of the present invention, other matching manners, such as an audio similarity measurement method based on a distance correlation diagram, may be set according to actual situations or adopted by those skilled in the art according to actual needs, and the embodiment of the present invention is not limited thereto.

It should be noted that the recognition of the voice command may be performed locally on the mobile device or may be performed on the network server, which is not limited in this embodiment of the present invention.

If the recognition is carried out locally on the mobile equipment, the voice command stored locally on the mobile equipment is directly extracted and matched with the audio data.

If the recognition is carried out on the network server, the audio data is sent to the network server, and the network server extracts the voice instructions it stores and matches them against the audio data.

Step 104, extracting object identification from the voice instruction;

In a specific implementation, the configuration file corresponding to the voice instruction, that is, the configuration file used to train the sample voice instruction that matched, may be looked up; the object identifier is encapsulated in this configuration file.

In a specific implementation, the object identifier may be extracted from the configuration file according to a preset syntax rule.

For example, the grammar rules of a configuration file are "call XX," "open XX," "view XX," where XX is a reserved object identifier.

In the Android system, the javax.xml.parsers.DocumentBuilder class is provided for parsing XML files, and the content can be parsed out by supplying a key field.
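The extraction can be sketched with the standard javax.xml.parsers API as follows. The XML layout used here, with the identifier held in an element named object, is an assumption made purely for illustration, as are the class ConfigParser and the method extractObjectId; the embodiment does not define the schema of the configuration file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Extracts the object identifier from a configuration file held as an XML string. */
class ConfigParser {
    /** Hypothetical key field: the element that carries the object identifier. */
    static final String OBJECT_TAG = "object";

    /** Returns the text of the first object element, or null if there is none. */
    static String extractObjectId(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName(OBJECT_TAG);
            if (nodes.getLength() == 0) {
                return null;
            }
            return nodes.item(0).getTextContent().trim();
        } catch (Exception e) {
            // Wrap parser and I/O failures so callers need not handle checked exceptions.
            throw new IllegalArgumentException("invalid configuration file", e);
        }
    }
}
```

The returned identifier is then the key used to look up the operation parameter in the information table.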

Step 105, calling the application corresponding to the object identifier.

In the embodiment of the present invention, once the object identifier has been parsed out, the corresponding operation may be performed on the data object that the object identifier refers to.

In a specific implementation, an operation parameter corresponding to the object identifier may be searched from a preset information table, and an application may be invoked according to the operation parameter.

In one example, if the object identifier is a contact name and the operation parameter corresponding to the contact name is a contact number, communication with the contact number, such as making a call, sending a short message or sending a multimedia message, may be carried out through the Intent mechanism.

If one contact has two or more numbers, or two or more contacts have names that are the same or pronounced the same, a selection dialog can be popped up for the user to choose.

Taking a call as an example of communication, the communication process is as follows:

(1) adding uses-permission in AndroidManifest.xml to declare the permission used:

<uses-permission android:name="android.permission.CALL_PHONE"/>

This is because making a call is an underlying service that involves the user's privacy, call charges and the like, so the voice Service generally needs to obtain the relevant permission.

(2) The keyword "ACTION_CALL" is passed in through the Intent object, and the number to dial is passed in through the Uri.

Note that for the Uri data passed in, the prefix of the phone number is "tel:".

(3) The call is completed by the voice Service using the startActivity() method (passing in the custom Intent).

In another example, if the object identifier is an application name, and the operation parameter corresponding to the application name is a main activity name and/or a package name, the application corresponding to the main activity name and/or the package name may be started through the Intent mechanism.

In Android, if one Activity is used to start another Activity, the startActivity(Intent) method can be used to pass an Intent object, which can either specify exactly the Activity to jump to or specify an action to be completed.
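The two cases above can be summarised in a plain Java sketch that builds a description of the request instead of a real Intent, so that it runs without the Android framework. The class LaunchRequest and its fields are hypothetical. The action strings are the values of Intent.ACTION_CALL and Intent.ACTION_MAIN; on a device they would be set on an android.content.Intent that is passed to startActivity().

```java
/** Framework-free description of what would be placed into an Android Intent. */
class LaunchRequest {
    final String action;    // e.g. the value of Intent.ACTION_CALL
    final String data;      // Uri string such as "tel:...", or null
    final String component; // "package/activity" for an explicit launch, or null

    private LaunchRequest(String action, String data, String component) {
        this.action = action;
        this.data = data;
        this.component = component;
    }

    /** Contact case: the operation parameter is a contact number, prefixed with "tel:". */
    static LaunchRequest forContact(String contactNumber) {
        return new LaunchRequest("android.intent.action.CALL", "tel:" + contactNumber, null);
    }

    /** Application case: the operation parameters are the package name and main activity name. */
    static LaunchRequest forApplication(String packageName, String mainActivityName) {
        return new LaunchRequest("android.intent.action.MAIN", null,
                packageName + "/" + mainActivityName);
    }
}
```

Keeping the request as plain data makes the choice between the two branches easy to check before any framework call is made.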

Of course, the above operation modes are only examples. When implementing the embodiment of the present invention, other operation modes may be set according to actual situations or adopted by those skilled in the art according to actual needs, and the embodiment of the present invention is not limited thereto.

It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.

Referring to fig. 2, a block diagram of an embodiment of an apparatus for performing voice operation in a mobile terminal according to the present invention is shown, where the apparatus is located in a voice service 200, and may specifically include the following modules:

an operating system wake-up module 201, configured to wake up an operating system when a voice service is triggered in the operating system in a dormant state;

the audio data acquisition module 202 is used for acquiring audio data;

the voice instruction matching module 203 is used for searching a preset voice instruction matched with the audio data;

an object identifier extracting module 204, configured to extract an object identifier from the voice instruction;

and the application calling module 205 is configured to call the application corresponding to the object identifier.

In an embodiment of the present invention, the os wakeup module 201 may include the following sub-modules:

In one embodiment of the present invention, the voice command matching module 203 may include the following sub-modules:

In an embodiment of the present invention, the object identifier extracting module 204 may include the following sub-modules:

In one embodiment of the present invention, the application calling module 205 may include the following sub-modules:

In one example of the embodiment of the present invention, the object identifier may be a contact name or an application name;

the parameter operation submodule may include the following units:

or,

In one embodiment of the present invention, the apparatus may further include the following modules:

a priority raising module, configured to raise the priority of the voice service 200.

In an embodiment of the present invention, the object information includes an object identifier and an operation parameter, and the apparatus may further include the following modules:

Since the device embodiment is basically similar to the method embodiment, it is described briefly; for relevant details, refer to the corresponding description of the method embodiment.

The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.

Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.

The method for performing voice operation in the mobile terminal and the device for performing voice operation in the mobile terminal provided by the present invention are described in detail above, and a specific example is applied in the text to explain the principle and the implementation of the present invention, and the description of the above embodiments is only used to help understanding the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims

1. A method for performing voice operation in a mobile terminal, comprising:

obtaining, by the voice service, audio data;

searching a preset voice instruction matched with the audio data;

extracting an object identification from the voice instruction;

2. The method of claim 1, wherein when the voice service is triggered in an operating system in a sleep state, the step of waking up the operating system comprises:

3. The method of claim 1, wherein the step of searching for preset voice commands matching the audio data comprises:

calculating the similarity between the audio data and the voice instruction;

4. The method of claim 1 or 3, wherein the step of extracting the object identifier from the voice instruction comprises:

searching a configuration file corresponding to the voice instruction;

and extracting object identification from the configuration file.

5. The method of claim 1, 2 or 3, wherein the step of invoking the application corresponding to the object identifier comprises:

and calling the application according to the operating parameters.

6. The method of claim 5, wherein the object identifier is a contact name or an application name;

communicating with the contact number through an Intent mechanism;

or,

7. The method of claim 1, further comprising:

when the operating system is started, starting voice service;

and raising the priority of the voice service.

8. The method of claim 1 or 7, further comprising:

storing the object information in an information table.

9. The method of claim 8, wherein the object information includes an object identification and an operational parameter, the method further comprising:

packaging the object identification into a preset configuration file;

10. An apparatus for performing voice operation in a mobile terminal, the apparatus being located in a voice service, the apparatus comprising:

the audio data acquisition module is used for acquiring audio data;