CN108521501B - Voice input method, mobile terminal and computer readable storage medium - Google Patents

Voice input method, mobile terminal and computer readable storage medium

Info

Publication number
CN108521501B
CN108521501B (application CN201810209073.XA)
Authority
CN
China
Prior art keywords
voice
mobile terminal
input
preset
quality
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810209073.XA
Other languages
Chinese (zh)
Other versions
CN108521501A (en)
Inventor
刘海兵
胡林涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd filed Critical Vivo Mobile Communication Co Ltd
Priority to CN201810209073.XA priority Critical patent/CN108521501B/en
Publication of CN108521501A publication Critical patent/CN108521501A/en
Application granted granted Critical
Publication of CN108521501B publication Critical patent/CN108521501B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M1/00Substation equipment, e.g. for use by subscribers
    • H04M1/72Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M1/7243User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
    • H04M1/72433User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for voice messaging, e.g. dictaphones
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/60Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04BTRANSMISSION
    • H04B1/00Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission
    • H04B1/38Transceivers, i.e. devices in which transmitter and receiver form a structural unit and in which at least one part is used for functions of transmitting and receiving
    • H04B1/3827Portable transceivers
    • H04B1/385Transceivers carried on the body, e.g. in helmets
    • H04B2001/3872Transceivers carried on the body, e.g. in helmets with extendable microphones or earphones

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Quality & Reliability (AREA)
  • General Business, Economics & Management (AREA)
  • Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephone Function (AREA)

Abstract

The invention discloses a voice input method and a mobile terminal. The method comprises the following steps: when the mobile terminal is connected with an earphone having a microphone and is in a voice input state, receiving voice input by a user through the microphone of the mobile terminal to obtain a first voice; receiving the voice input by the user through the microphone of the earphone to obtain a second voice; acquiring the voice quality of the first voice and the voice quality of the second voice; and when the voice quality of the first voice is higher than that of the second voice, using the first voice as the voice input. In the invention, the microphone of the mobile terminal and the microphone of the earphone can both be enabled to detect the voice input by the user, and whether the first voice or the second voice is used as the voice input is determined by comparing the voice quality of the voices detected by the two microphones. When the voice quality of the voice detected by the microphone of the mobile terminal is higher, that voice is used as the voice input, so that the recognizability of the input voice is improved.

Description

Voice input method, mobile terminal and computer readable storage medium
Technical Field
The embodiment of the invention relates to the technical field of mobile terminals, in particular to a voice input method and a mobile terminal.
Background
At present, a mobile terminal and an earphone can be connected in various ways, such as a wired connection, Bluetooth, or Wi-Fi. When the mobile terminal is connected with an earphone, the mobile terminal by default takes the sound detected by the microphone of the earphone as the voice input. In this state, although the user can perform voice input through the microphone of the earphone, the user's first reaction, owing to daily usage habits, is often to speak into the microphone of the mobile terminal. At this time, the volume of the sound detected by the microphone of the earphone is very small, which results in poor recognizability of the input voice.
Disclosure of Invention
Embodiments of the present invention provide a voice input method and a mobile terminal, so as to solve the technical problem in the prior art that, when an earphone is connected to the mobile terminal, the user may still speak into the microphone of the mobile terminal, so that the voice captured by the microphone of the earphone has poor recognizability.
To solve the above technical problem, the embodiment of the present invention is implemented as follows:
In a first aspect, an embodiment of the present invention provides a voice input method applied to a mobile terminal, the method comprising:
when the mobile terminal is connected with an earphone having a microphone and is in a voice input state, receiving voice input by a user through a microphone of the mobile terminal to obtain a first voice, and receiving the voice input by the user through the microphone of the earphone to obtain a second voice;
acquiring the voice quality of the first voice and the voice quality of the second voice;
and when the voice quality of the first voice is higher than that of the second voice, taking the first voice as voice input.
In a second aspect, an embodiment of the present invention further provides a mobile terminal, where the mobile terminal includes:
a first voice receiving unit, configured to receive voice input by a user through a microphone of the mobile terminal when the mobile terminal is connected with an earphone having a microphone and is in a voice input state, so as to obtain a first voice;
a second voice receiving unit, configured to receive the voice input by the user through the microphone of the earphone, so as to obtain a second voice;
an obtaining unit configured to obtain a voice quality of the first voice and a voice quality of the second voice;
an input unit, configured to use the first voice as the voice input in a case where the voice quality of the first voice is higher than the voice quality of the second voice.
In a third aspect, an embodiment of the present invention further provides a mobile terminal, including a processor, a memory, and a voice input program stored on the memory and operable on the processor, where the voice input program, when executed by the processor, implements the steps of the voice input method.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, where a voice input program is stored on the computer-readable storage medium, and when being executed by a processor, the voice input program implements the steps of the voice input method.
In the embodiments of the present invention, when the mobile terminal is connected with the earphone and is in a voice input state, the mobile terminal can enable both its own microphone and the microphone of the earphone to detect the voice input by the user, and determines whether the first voice or the second voice is used as the voice input by comparing the voice quality of the voices detected by the two microphones. When the voice quality of the voice detected by the microphone of the mobile terminal is higher, that voice is used as the voice input, so that the recognizability of voice input while the mobile terminal is connected with the earphone is improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a flowchart of a voice input method according to an embodiment of the present invention;
FIG. 2 is an architecture diagram of a voice input system according to an embodiment of the present invention;
FIG. 3 is a flowchart of a voice input method according to another embodiment of the present invention;
fig. 4 is a schematic structural diagram of a mobile terminal of an embodiment of the present invention;
fig. 5 is a schematic hardware structure diagram of a mobile terminal implementing various embodiments of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the specific embodiments of the present invention and the accompanying drawings. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Although the user can perform voice input through the microphone of the headset when the mobile terminal is connected to the headset, the user's first reaction is generally to speak into the microphone of the mobile terminal because of daily usage habits. For example, a user who normally moves the microphone of the mobile terminal toward the mouth for voice input tends to do the same when the mobile terminal is in earphone mode; at this time, the volume of the sound detected by the microphone of the earphone is very small, which results in poor recognizability of the input voice. To solve the technical problem in the prior art that, when an earphone is connected with a mobile terminal, the user may still speak into the microphone of the mobile terminal so that the voice captured by the microphone of the earphone has poor recognizability, embodiments of the present invention provide a voice input method and a mobile terminal.
First, a speech input method provided by an embodiment of the present invention will be described below.
It should be noted that the method provided by the embodiment of the present invention is applicable to a mobile terminal, and in practical application, the mobile terminal may include: smart phones, tablet computers, personal digital assistants, and the like, which are not limited in this embodiment of the present invention.
Fig. 1 is a flowchart of a voice input method according to an embodiment of the present invention. As shown in Fig. 1, the method may include step 101, step 102, and step 103, wherein:
In step 101, when the mobile terminal is connected with an earphone having a microphone and is in a voice input state, voice input by a user is received through the microphone of the mobile terminal to obtain a first voice, and the voice input by the user is received through the microphone of the earphone to obtain a second voice.
In the embodiment of the present invention, the mobile terminal and the earphone with a microphone may be connected in at least one of the following ways: wired, Bluetooth, ZigBee, or Wi-Fi. In practical applications, any detection means in the prior art may be used to detect whether the mobile terminal is connected with an earphone having a microphone.
In the embodiment of the present invention, whether the mobile terminal is in a voice input state can be determined by detecting whether a voice input instruction is received; when a voice input instruction is detected, the mobile terminal is determined to be in a voice input state. The voice input instruction is used for triggering the mobile terminal to enter the voice input state.
In the embodiment of the present invention, the voice input instruction may be triggered by the user in a third-party application installed on the mobile terminal; for example, when the user long-presses a voice button in a chat application, the voice input instruction may be considered triggered. The voice input instruction may also be triggered by the user in a system application of the mobile terminal; for example, when the user taps a recording button in a recording tool, or long-presses the home key of the mobile terminal, the voice input instruction may be considered triggered. This is not limited in the embodiment of the present invention.
In the embodiment of the present invention, when the voice input instruction is received, the mobile terminal can enable its own microphone and the microphone of the earphone at the same time, and receive the voice input by the user through both microphones simultaneously, as sketched below.
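As an illustration only (not part of the patent), the following Python sketch shows how the two capture streams might be read concurrently once the voice input instruction arrives; read_microphone is a hypothetical stand-in for a platform audio-capture call and here simply returns silence so that the sketch stays self-contained and runnable.

```python
import threading

def read_microphone(device, seconds=3, rate=16000):
    # Hypothetical stand-in for a platform audio-capture call; it returns
    # silence so the sketch is self-contained and runnable.
    return [0.0] * (seconds * rate)

def capture_both_microphones():
    """On a voice input instruction, read the terminal microphone and the headset
    microphone concurrently and return (first voice, second voice)."""
    captured = {}

    def worker(device):
        captured[device] = read_microphone(device)

    threads = [threading.Thread(target=worker, args=(dev,))
               for dev in ("terminal_mic", "headset_mic")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return captured["terminal_mic"], captured["headset_mic"]

first_voice, second_voice = capture_both_microphones()
```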
In the embodiment of the present invention, the microphone of the mobile terminal may be controlled by software so that it still receives voice when the mobile terminal is in earphone mode. Specifically, when the mobile terminal is in earphone mode, power may continue to be supplied to the microphone of the mobile terminal, or the voice signal received by that microphone may be left unmuted, so as to ensure that the microphone of the mobile terminal can still receive voice in earphone mode. Here, earphone mode refers to the processing mode in which the mobile terminal is connected with an earphone having a microphone.
In step 102, the voice quality of the first voice and the voice quality of the second voice are obtained.
In the embodiment of the present invention, the voice quality may be the sound intensity, which represents the volume; it may also be the signal-to-noise ratio, or the degree of fluctuation of the sound intensity within a preset second duration. The voice quality may also be a weighted sum of the sound intensity, the signal-to-noise ratio, and the degree of fluctuation of the sound intensity within the preset second duration.
In the embodiment of the present invention, when the voice quality is the sound intensity, the sound intensity of the first voice and the sound intensity of the second voice are obtained; when the voice quality is the signal-to-noise ratio, the signal-to-noise ratio of the first voice and the signal-to-noise ratio of the second voice are obtained; and when the voice quality is the degree of fluctuation of the sound intensity within the preset second duration, the degree of fluctuation of the sound intensity of the first voice and that of the second voice within the preset second duration are obtained. A simple illustration of these measures is sketched below.
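The description does not fix concrete formulas for these measures, so the Python sketch below only illustrates one plausible set of definitions: per-frame RMS level in dB as the sound intensity, the level gap between the loudest and quietest frames as a rough signal-to-noise ratio, and the variance of the per-frame levels as the fluctuation degree. The frame length, noise-floor fraction, and weights are assumptions.

```python
import numpy as np

FRAME_LEN = 400  # 25 ms at 16 kHz; an assumed framing, not specified by the patent

def frame_intensities(samples):
    """Split the signal into frames and return one RMS level (dB) per frame."""
    samples = np.asarray(samples, dtype=float)
    n_frames = max(1, len(samples) // FRAME_LEN)
    frames = samples[:n_frames * FRAME_LEN].reshape(n_frames, -1)
    rms = np.sqrt(np.mean(frames ** 2, axis=1)) + 1e-12
    return 20.0 * np.log10(rms)

def sound_intensity(samples):
    """Sound intensity used to represent the volume: the mean frame level in dB."""
    return float(np.mean(frame_intensities(samples)))

def snr_estimate(samples):
    """Rough signal-to-noise ratio: mean level of the loudest 20% of frames minus
    the mean level of the quietest 20% (treated as the noise floor)."""
    levels = np.sort(frame_intensities(samples))
    k = max(1, len(levels) // 5)
    return float(levels[-k:].mean() - levels[:k].mean())

def intensity_fluctuation(samples):
    """Degree of fluctuation of the sound intensity: variance of the frame levels."""
    return float(np.var(frame_intensities(samples)))

def weighted_quality(samples, weights=(0.4, 0.4, 0.2)):
    """Optional weighted sum of the three measures; the weights are illustrative."""
    return (weights[0] * sound_intensity(samples)
            + weights[1] * snr_estimate(samples)
            + weights[2] * intensity_fluctuation(samples))
```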
In the embodiment of the present invention, when acquiring the degree of fluctuation of the sound intensity of the first voice and of the second voice within the preset second duration, the following processing may be performed on each of the two voices:
the voice is split into frames and the sound intensity of each frame is obtained. If the per-frame sound intensity varies little within the preset second duration, the useful sound is considered to differ little from the background noise, i.e. the voice is not a sound intentionally input by the user, and the degree of fluctuation of the sound intensity within the preset second duration is small. If the per-frame sound intensity varies greatly within the preset second duration, the useful sound is considered to differ clearly from the background noise, i.e. the voice is a sound intentionally input by the user, and the degree of fluctuation of the sound intensity within the preset second duration is large.
In the embodiment of the present invention, the fluctuation of the sound intensity may be calculated by the variance method: the sound intensity of each frame within a period of time is determined, and the variance of all the frame intensities is computed; a large variance indicates a large fluctuation, and a small variance indicates a small fluctuation. Other calculation methods in the prior art may also be used to calculate the fluctuation of the sound intensity, which is not limited in the embodiment of the present invention.
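Continuing the sketch above (intensity_fluctuation from the previous block), a tiny numeric example shows how the variance separates a flat background-noise signal from one with an obvious useful component; the sample values are made up for illustration.

```python
flat = [0.01] * 16000                               # near-constant background noise
speech_like = ([0.01] * 4000 + [0.5] * 4000) * 2    # alternating quiet/loud segments

print(intensity_fluctuation(flat))         # ~0: small variance, low fluctuation degree
print(intensity_fluctuation(speech_like))  # large variance, high fluctuation degree
```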
In step 103, when the voice quality of the first voice is higher than the voice quality of the second voice, the first voice is used as the voice input.
In the embodiment of the present invention, when the voice quality of the first voice is higher than that of the second voice, the first voice is the one finally used, and the second voice is discarded.
When the voice quality is the sound intensity, if the sound intensity of the first voice is higher than that of the second voice, the first voice is used as the voice input, that is, the voice detected by the microphone of the mobile terminal is used as the finally input voice.
When the voice quality is the signal-to-noise ratio, if the signal-to-noise ratio of the first voice is higher than that of the second voice, the first voice is used as the voice input.
When the voice quality is the degree of fluctuation of the sound intensity within the preset second duration, if the fluctuation degree of the first voice is higher than that of the second voice, the first voice is used as the voice input.
In this embodiment of the present invention, in order to ensure that the finally input voice can be recognized by the mobile terminal, step 103 may specifically include: when the voice quality of the first voice is higher than that of the second voice and the difference between the two voice qualities is greater than a preset voice quality threshold, using the first voice as the voice input.
In this case, when the voice quality is the sound intensity, if the sound intensity of the first voice is higher than that of the second voice and the difference between the two exceeds a preset sound intensity threshold, it may be assumed that the user has not noticed that the earphone is connected and is still speaking into the microphone of the mobile terminal; the first voice is then used as the voice input, that is, the voice detected by the microphone of the mobile terminal is used as the final input voice. For example, the terminal detects whether the difference between the intensity of the sound received by its own microphone and that received by the microphone of the earphone is greater than 30 dB, and if so, uses the voice received by its own microphone as the final input voice.
When the voice quality is the signal-to-noise ratio, if the signal-to-noise ratio of the first voice is higher than that of the second voice and the difference between the two exceeds a preset signal-to-noise ratio threshold, it may likewise be assumed that the user is still speaking into the microphone of the mobile terminal, and the first voice is used as the voice input.
When the voice quality is the degree of fluctuation of the sound intensity within the preset second duration, if the fluctuation degree of the first voice is higher than that of the second voice and the difference between the two exceeds a preset fluctuation threshold, the first voice is used as the voice input for the same reason. A simple decision sketch covering these three cases is given below.
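A minimal sketch of this thresholded comparison, reusing the quality measures defined earlier; only the 30 dB intensity margin comes from the example above, the other threshold values are assumptions.

```python
THRESHOLDS = {
    "intensity": 30.0,    # the 30 dB margin from the example above
    "snr": 10.0,          # assumed value
    "fluctuation": 5.0,   # assumed value
}

QUALITY_MEASURES = {
    "intensity": sound_intensity,
    "snr": snr_estimate,
    "fluctuation": intensity_fluctuation,
}

def choose_voice(first_voice, second_voice, metric="intensity"):
    """Use the terminal-microphone voice only when its quality is higher than the
    headset voice by more than the preset threshold; otherwise keep the headset voice."""
    measure = QUALITY_MEASURES[metric]
    q_first, q_second = measure(first_voice), measure(second_voice)
    if q_first > q_second and (q_first - q_second) > THRESHOLDS[metric]:
        return first_voice   # voice detected by the mobile terminal's microphone
    return second_voice      # voice detected by the earphone's microphone
```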
In the embodiment of the present invention, when the voice quality of the first voice is lower than the voice quality of the second voice, the second voice is used as a voice input, that is, a sound detected by a microphone of the earphone is used as a final input voice.
For ease of understanding, the technical solution of the present invention is described with reference to the architecture diagram shown in Fig. 2. Fig. 2 shows the architecture of a voice input system in the case where the mobile terminal is connected to an earphone with a microphone. The voice input system includes the microphone of the earphone, the microphone of the mobile terminal, a central processing unit, and an audio processing module. After the central processing unit receives a voice input instruction, the two microphones are enabled and receive the voice input by the user almost simultaneously; the voices they receive are transmitted to the audio processing module, and the central processing unit controls the audio processing module to select one of the received voices as the final input voice.
As can be seen from the above, in this embodiment, when the mobile terminal is connected to the headset and is in a voice input state, the mobile terminal can enable its own microphone and the microphone of the headset simultaneously to detect the voice input by the user, and determines whether the first voice or the second voice is finally used as the voice input by comparing the voice qualities of the voices detected by the two microphones. When the voice quality of the voice detected by the microphone of the mobile terminal is higher, that voice is used as the voice input, so that the recognizability of voice input while the mobile terminal is connected to the headset is improved.
Fig. 3 is a flowchart of a voice input method according to another embodiment of the present invention. In this embodiment, whether the user has moved the microphone of the mobile terminal close to the mouth is inferred by determining whether the user moved the mobile terminal within a certain time period before and/or after the voice input instruction was triggered. As shown in Fig. 3, the method may include step 301, step 302, step 303, step 304, and step 305, wherein:
in step 301, a voice input instruction is received while the mobile terminal is connected to an earphone with a microphone. The voice input instruction is used for triggering the mobile terminal to enter a voice input state.
Step 301 in the embodiment of the present invention is similar to step 101 in the embodiment shown in fig. 1, and details thereof are not repeated here, please refer to the contents in the embodiment shown in fig. 1.
In step 302, it is determined whether the mobile terminal moves within a preset first time period before and/or after receiving the voice input command.
In the embodiment of the invention, whether the mobile terminal moves within a preset first time length before the voice input instruction is received can be judged; or, it may be determined whether the mobile terminal moves within a preset first time period after receiving the voice input instruction; or, it may be determined whether the mobile terminal moves within a preset first time period before and after the voice input instruction is received, which is not limited in the embodiment of the present invention.
Since a gyroscope has high sensitivity, in order to improve detection sensitivity, in the embodiment of the present invention, whether the mobile terminal has moved may be determined from the change in the data of the gyroscope in the mobile terminal. That is, the gyroscope data change indicates whether the user has suddenly moved the phone, from which it can be inferred whether the user is using the microphone of the mobile terminal or the microphone of the headset for voice input: if the mobile terminal has moved abruptly, it is inferred that the user is using the microphone of the mobile terminal; if not, it is inferred that the user is using the microphone of the headset.
In this case, step 302 may specifically include: judging whether the change in the gyroscope data of the mobile terminal is greater than a preset change threshold within a preset first time period before and/or after the voice input instruction is received; if so, it is determined that the mobile terminal has moved.
In the embodiment of the present invention, whether the mobile terminal moves may also be detected by another sensor in the mobile terminal, for example, an infrared sensor, which is not limited in the embodiment of the present invention.
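Purely as an illustration of the gyroscope-based judgment of step 302, the sketch below buffers recent gyroscope readings and reports movement when their spread within the preset first time period exceeds a change threshold; the window length, threshold value, and the use of the max-min spread as the "data change" are assumptions, since the description only requires that the data change exceed a preset threshold.

```python
from collections import deque
import time

WINDOW_SECONDS = 1.0    # assumed "preset first time period"
CHANGE_THRESHOLD = 0.8  # assumed threshold on the change in angular rate (rad/s)

class MovementDetector:
    """Buffers recent gyroscope readings and reports abrupt movement of the terminal."""

    def __init__(self):
        self.history = deque()  # (timestamp, |angular rate|) pairs

    def on_gyro_sample(self, gx, gy, gz, timestamp=None):
        """Feed one gyroscope sample (angular rate about the three axes)."""
        t = time.monotonic() if timestamp is None else timestamp
        self.history.append((t, (gx * gx + gy * gy + gz * gz) ** 0.5))
        # drop samples that fall outside the preset first time period
        while self.history and t - self.history[0][0] > WINDOW_SECONDS:
            self.history.popleft()

    def terminal_moved(self):
        """True if the gyroscope data change within the window exceeds the threshold."""
        if len(self.history) < 2:
            return False
        values = [m for _, m in self.history]
        return (max(values) - min(values)) > CHANGE_THRESHOLD
```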
In step 303, voice input by the user is received through the microphone of the mobile terminal to obtain a first voice, and the voice input by the user is received through the microphone of the earphone to obtain a second voice.
Step 303 in the embodiment of the present invention is similar to step 101 in the embodiment shown in fig. 1, and details are not repeated here, please refer to the contents in the embodiment shown in fig. 1.
In step 304, the voice quality of the first voice and the voice quality of the second voice are obtained.
Step 304 in the embodiment of the present invention is similar to step 102 in the embodiment shown in fig. 1, and is not repeated here, please refer to the contents in the embodiment shown in fig. 1 for details.
In step 305, when the voice quality of the first voice is higher than that of the second voice and the mobile terminal has moved, the first voice is used as the voice input.
Preferably, in the embodiment of the present invention, the first voice is used as the voice input when the voice quality of the first voice is higher than that of the second voice, the difference between the two voice qualities is greater than the preset voice quality threshold, and the mobile terminal has moved within a preset first time period before and/or after receiving the voice input instruction, as in the combined sketch below.
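Putting the assumed helpers together (again only a sketch, built on the functions and detector defined above), the step-305 decision might look like this:

```python
def decide_final_voice(first_voice, second_voice, detector, metric="intensity"):
    """Step 305: keep the terminal-microphone voice only when its quality beats the
    headset voice by more than the preset threshold AND the terminal has moved within
    the preset first time period around the voice input instruction."""
    measure = QUALITY_MEASURES[metric]
    q_first, q_second = measure(first_voice), measure(second_voice)
    quality_ok = q_first > q_second and (q_first - q_second) > THRESHOLDS[metric]
    return first_voice if quality_ok and detector.terminal_moved() else second_voice
```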
As can be seen from the above, in this embodiment, when the mobile terminal is connected to the headset, the mobile terminal can enable its own microphone and the microphone of the headset simultaneously to detect the sound input by the user, and determines the final input by comparing the voice qualities of the sounds detected by the two microphones. In addition, the fact that the user suddenly moves the mobile terminal in the time period before and/or after the voice input instruction is triggered largely reflects the user's intention to input voice through the microphone of the mobile terminal. Therefore, when the voice quality of the voice detected by the microphone of the mobile terminal is higher and the mobile terminal has moved, the voice detected by the microphone of the mobile terminal is used as the voice input, which further improves the recognizability of voice input while the mobile terminal is connected with the earphone.
Fig. 4 is a schematic structural diagram of a mobile terminal according to an embodiment of the present invention, and as shown in fig. 4, the mobile terminal 400 may include: a first voice receiving unit 401, a second voice receiving unit 402, an acquiring unit 403, and an input unit 404, wherein,
a first voice receiving unit 401, configured to receive a voice input by a user through a microphone of the mobile terminal when the mobile terminal is connected to an earphone with the microphone and in a voice input state, so as to obtain a first voice;
a second voice receiving unit 402, configured to receive a voice input by the user through a microphone of the headset, so as to obtain a second voice;
an obtaining unit 403, configured to obtain a voice quality of the first voice and a voice quality of the second voice;
an input unit 404, configured to use the first voice as the voice input if the voice quality of the first voice is higher than the voice quality of the second voice.
As can be seen from the foregoing, in this embodiment, when the mobile terminal is connected to the headset and is in a voice input state, the mobile terminal can enable its own microphone and the microphone of the headset simultaneously to detect the voice input by the user, and determines whether the first voice or the second voice is finally used as the voice input by comparing the voice qualities of the voices detected by the two microphones. When the voice quality of the voice detected by the microphone of the mobile terminal is higher, that voice is used as the voice input, so that the recognizability of voice input while the mobile terminal is connected to the headset is improved.
In another embodiment of the present invention, the input unit 404 may include:
a first voice input subunit, configured to use the first voice as the voice input in a case where the difference between the voice quality of the first voice and the voice quality of the second voice is greater than a preset voice quality threshold.
In another embodiment provided by the present invention, the mobile terminal 400 may further include:
a judging unit, configured to judge whether the mobile terminal moves within a preset first time period before and/or after a voice input instruction is received, wherein the voice input instruction is used for triggering the mobile terminal to enter a voice input state;
the input unit 404 may include:
a second voice input subunit, configured to use the first voice as the voice input in a case where the voice quality of the first voice is higher than that of the second voice and the mobile terminal has moved.
In another embodiment of the present invention, the determining unit may include:
a judging subunit, configured to judge whether the change in the gyroscope data of the mobile terminal is greater than a preset change threshold within a preset first time period before and/or after the voice input instruction is received; if the change is greater than the preset change threshold, it is determined that the mobile terminal has moved.
In another embodiment provided by the present invention, the voice quality may include at least one of:
the sound intensity, the signal-to-noise ratio and the fluctuation degree of the sound intensity within the preset second time length.
Fig. 5 is a schematic diagram of a hardware structure of a mobile terminal for implementing various embodiments of the present invention, and as shown in fig. 5, the mobile terminal 500 includes, but is not limited to: a radio frequency unit 501, a network module 502, an audio output unit 503, an input unit 504, a sensor 505, a display unit 506, a user input unit 507, an interface unit 508, a memory 509, a processor 510, and a power supply 511. Those skilled in the art will appreciate that the mobile terminal architecture shown in fig. 5 is not intended to be limiting of mobile terminals, and that a mobile terminal may include more or fewer components than shown, or some components may be combined, or a different arrangement of components. In the embodiment of the present invention, the mobile terminal includes, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted terminal, a wearable device, a pedometer, and the like.
The processor 510 is configured to: when the mobile terminal is connected to an earphone having a microphone and is in a voice input state, receive voice input by a user through the microphone of the mobile terminal to obtain a first voice; receive the voice input by the user through the microphone of the earphone to obtain a second voice; acquire the voice quality of the first voice and the voice quality of the second voice; and when the voice quality of the first voice is higher than that of the second voice, use the first voice as the voice input.
In the embodiment of the present invention, when the mobile terminal is connected with the earphone and is in a voice input state, the mobile terminal can enable both its own microphone and the microphone of the earphone to detect the voice input by the user, and determines whether the first voice or the second voice is used as the voice input by comparing the voice quality of the voices detected by the two microphones. When the voice quality of the voice detected by the microphone of the mobile terminal is higher, that voice is used as the voice input, so that the recognizability of voice input while the mobile terminal is connected with the earphone is improved.
Optionally, as an embodiment, when the voice quality of the first voice is higher than the voice quality of the second voice, using the first voice as a voice input includes:
and when the difference value between the voice quality of the first voice and the voice quality of the second voice is larger than a preset voice quality threshold value, taking the first voice as voice input.
Optionally, as an embodiment, the method further includes:
judging whether the mobile terminal moves within a preset first time period before and/or after a voice input instruction is received, wherein the voice input instruction is used for triggering the mobile terminal to enter a voice input state;
when the voice quality of the first voice is higher than the voice quality of the second voice, the step of taking the first voice as voice input comprises the following steps:
and when the voice quality of the first voice is higher than that of the second voice and the mobile terminal moves, taking the first voice as voice input.
Optionally, as an embodiment, the determining whether the mobile terminal moves within a preset first time period before and/or after receiving the voice input instruction includes:
judging whether the change in the gyroscope data of the mobile terminal is greater than a preset change threshold within a preset first time period before and/or after the voice input instruction is received; if the change is greater than the preset change threshold, it is determined that the mobile terminal moves.
Optionally, as an embodiment, the voice quality includes at least one of:
the sound intensity, the signal-to-noise ratio and the fluctuation degree of the sound intensity within the preset second time length.
It should be understood that, in the embodiment of the present invention, the radio frequency unit 501 may be used for receiving and sending signals during a message transceiving process or a call process; specifically, it receives downlink data from a base station and forwards the downlink data to the processor 510 for processing, and it also sends uplink data to the base station. In general, the radio frequency unit 501 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 501 can also communicate with a network and other devices through a wireless communication system.
The mobile terminal provides the user with wireless broadband internet access through the network module 502, such as helping the user send and receive e-mails, browse webpages, access streaming media, and the like.
The audio output unit 503 may convert audio data received by the radio frequency unit 501 or the network module 502 or stored in the memory 509 into an audio signal and output as sound. Also, the audio output unit 503 may also provide audio output related to a specific function performed by the mobile terminal 500 (e.g., a call signal reception sound, a message reception sound, etc.). The audio output unit 503 includes a speaker, a buzzer, a receiver, and the like.
The input unit 504 is used to receive an audio or video signal. The input unit 504 may include a Graphics Processing Unit (GPU) 5041 and a microphone 5042; the graphics processor 5041 processes image data of still pictures or video obtained by an image capture device (e.g., a camera) in a video capture mode or an image capture mode. The processed image frames may be displayed on the display unit 506. The image frames processed by the graphics processor 5041 may be stored in the memory 509 (or other storage medium) or transmitted via the radio frequency unit 501 or the network module 502. The microphone 5042 can receive sound and process it into audio data; in a phone call mode, the processed audio data may be converted into a format that can be transmitted to a mobile communication base station via the radio frequency unit 501.
The mobile terminal 500 also includes at least one sensor 505, such as a light sensor, motion sensor, and other sensors. Specifically, the light sensor includes an ambient light sensor that adjusts the brightness of the display panel 5061 according to the brightness of ambient light, and a proximity sensor that turns off the display panel 5061 and/or a backlight when the mobile terminal 500 is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally three axes), detect the magnitude and direction of gravity when stationary, and can be used to identify the posture of the mobile terminal (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), and vibration identification related functions (such as pedometer, tapping); the sensors 505 may also include fingerprint sensors, pressure sensors, iris sensors, molecular sensors, gyroscopes, barometers, hygrometers, thermometers, infrared sensors, etc., which are not described in detail herein.
The display unit 506 is used to display information input by the user or information provided to the user. The Display unit 506 may include a Display panel 5061, and the Display panel 5061 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like.
The user input unit 507 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the mobile terminal. Specifically, the user input unit 507 includes a touch panel 5071 and other input devices 5072. The touch panel 5071, also referred to as a touch screen, can collect touch operations by a user on or near it (for example, operations performed on or near the touch panel 5071 with a finger, a stylus, or any other suitable object or attachment). The touch panel 5071 may include two parts: a touch detection device and a touch controller. The touch detection device detects the position touched by the user, detects the signal generated by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, sends the coordinates to the processor 510, and receives and executes commands sent by the processor 510. In addition, the touch panel 5071 may be implemented as a resistive, capacitive, infrared, or surface acoustic wave panel, among other types. Besides the touch panel 5071, the user input unit 507 may include other input devices 5072. In particular, the other input devices 5072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys and switch keys), a trackball, a mouse, and a joystick, which are not described in detail herein.
Further, the touch panel 5071 may be overlaid on the display panel 5061, and when the touch panel 5071 detects a touch operation thereon or nearby, the touch operation is transmitted to the processor 510 to determine the type of the touch event, and then the processor 510 provides a corresponding visual output on the display panel 5061 according to the type of the touch event. Although in fig. 5, the touch panel 5071 and the display panel 5061 are two independent components to implement the input and output functions of the mobile terminal, in some embodiments, the touch panel 5071 and the display panel 5061 may be integrated to implement the input and output functions of the mobile terminal, and is not limited herein.
The interface unit 508 is an interface through which an external device is connected to the mobile terminal 500. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 508 may be used to receive input (e.g., data information, power, etc.) from external devices and transmit the received input to one or more elements within the mobile terminal 500 or may be used to transmit data between the mobile terminal 500 and external devices.
The memory 509 may be used to store software programs as well as various data. The memory 509 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the cellular phone, and the like. Further, the memory 509 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.
The processor 510 is a control center of the mobile terminal, connects various parts of the entire mobile terminal using various interfaces and lines, and performs various functions of the mobile terminal and processes data by operating or executing software programs and/or modules stored in the memory 509 and calling data stored in the memory 509, thereby performing overall monitoring of the mobile terminal. Processor 510 may include one or more processing units; preferably, the processor 510 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into processor 510.
The mobile terminal 500 may further include a power supply 511 (e.g., a battery) for supplying power to various components, and preferably, the power supply 511 may be logically connected to the processor 510 via a power management system, so that functions of managing charging, discharging, and power consumption are performed via the power management system.
In addition, the mobile terminal 500 includes some functional modules that are not shown, and thus, are not described in detail herein.
Preferably, an embodiment of the present invention further provides a mobile terminal, which includes a processor 510, a memory 509, and a voice input program stored in the memory 509 and capable of running on the processor 510, where the voice input program, when executed by the processor 510, implements each process of the above voice input method embodiment and can achieve the same technical effects; to avoid repetition, details are not described here again.
An embodiment of the present invention further provides a computer-readable storage medium, where a voice input program is stored on the computer-readable storage medium, and when the voice input program is executed by a processor, the voice input program implements the processes of the voice input method, and can achieve the same technical effects, and in order to avoid repetition, the details are not repeated here. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. A voice input method applied to a mobile terminal, characterized by comprising the following steps:
when the mobile terminal is connected with an earphone having a microphone and is in a voice input state, receiving voice input by a user through a microphone of the mobile terminal to obtain a first voice, and receiving the voice input by the user through the microphone of the earphone to obtain a second voice;
acquiring the voice quality of the first voice and the voice quality of the second voice;
when the voice quality of the first voice is higher than that of the second voice, taking the first voice as voice input;
the method further comprises the following steps:
judging whether the mobile terminal moves within a preset first time period before and/or after a voice input instruction is received, wherein the voice input instruction is used for triggering the mobile terminal to enter a voice input state;
when the voice quality of the first voice is higher than the voice quality of the second voice, the step of taking the first voice as voice input comprises the following steps:
when the voice quality of the first voice is higher than that of the second voice and the mobile terminal moves, taking the first voice as voice input;
when the voice quality is the degree of fluctuation of the sound intensity within a preset second duration, acquiring the degree of fluctuation of the sound intensity of the first voice within the preset second duration and the degree of fluctuation of the sound intensity of the second voice within the preset second duration, wherein processing the first voice and the second voice comprises:
splitting the voice into frames and acquiring the sound intensity of each frame, wherein if the per-frame sound intensity fluctuates little within the preset second duration, the degree of fluctuation of the sound intensity of the voice within the preset second duration is small; and if the per-frame sound intensity fluctuates greatly within the preset second duration, the degree of fluctuation of the sound intensity of the voice within the preset second duration is large.
2. The method of claim 1, wherein using the first voice as the voice input when the voice quality of the first voice is higher than the voice quality of the second voice comprises:
and when the difference value between the voice quality of the first voice and the voice quality of the second voice is larger than a preset voice quality threshold value, taking the first voice as voice input.
3. The method according to claim 1, wherein the determining whether the mobile terminal moves within a preset first time period before and/or after receiving the voice input instruction comprises:
judging whether the change in the gyroscope data of the mobile terminal is greater than a preset change threshold within a preset first time period before and/or after the voice input instruction is received, wherein if the change in the gyroscope data is greater than the preset change threshold, it is determined that the mobile terminal moves.
4. The method of any of claims 1 to 3, wherein the speech quality comprises at least one of:
the sound intensity, the signal-to-noise ratio and the fluctuation degree of the sound intensity within the preset second time length.
5. A mobile terminal, characterized in that the mobile terminal comprises:
a first voice receiving unit, configured to receive voice input by a user through a microphone of the mobile terminal when the mobile terminal is connected with an earphone having a microphone and is in a voice input state, so as to obtain a first voice;
a second voice receiving unit, configured to receive the voice input by the user through the microphone of the earphone, so as to obtain a second voice;
an obtaining unit configured to obtain a voice quality of the first voice and a voice quality of the second voice;
an input unit, configured to use the first voice as the voice input in a case where the voice quality of the first voice is higher than the voice quality of the second voice;
the mobile terminal further includes:
a judging unit, configured to judge whether the mobile terminal moves within a preset first time period before and/or after a voice input instruction is received, wherein the voice input instruction is used for triggering the mobile terminal to enter a voice input state;
the input unit includes:
the second voice input subunit is used for taking the first voice as voice input under the condition that the voice quality of the first voice is higher than that of the second voice and the mobile terminal moves;
the obtaining unit is further configured to: when the voice quality is the degree of fluctuation of the sound intensity within a preset second duration, acquire the degree of fluctuation of the sound intensity of the first voice within the preset second duration and the degree of fluctuation of the sound intensity of the second voice within the preset second duration, wherein processing the first voice and the second voice comprises:
splitting the voice into frames and acquiring the sound intensity of each frame, wherein if the per-frame sound intensity fluctuates little within the preset second duration, the degree of fluctuation of the sound intensity of the voice within the preset second duration is small; and if the per-frame sound intensity fluctuates greatly within the preset second duration, the degree of fluctuation of the sound intensity of the voice within the preset second duration is large.
6. The mobile terminal according to claim 5, wherein the input unit comprises:
and the first voice input subunit is used for taking the first voice as voice input under the condition that the difference value between the voice quality of the first voice and the voice quality of the second voice is greater than a preset voice quality threshold value.
7. The mobile terminal according to claim 5, wherein the determining unit comprises:
a judging subunit, configured to judge whether the change in the gyroscope data of the mobile terminal is greater than a preset change threshold within a preset first time period before and/or after the voice input instruction is received, wherein if the change is greater than the preset change threshold, it is determined that the mobile terminal has moved.
8. The mobile terminal according to any of claims 5 to 7, wherein the speech quality comprises at least one of:
the sound intensity, the signal-to-noise ratio and the fluctuation degree of the sound intensity within the preset second time length.
9. A mobile terminal comprising a processor, a memory and a speech input program stored on the memory and operable on the processor, the speech input program when executed by the processor implementing the steps of the speech input method according to any one of claims 1 to 4.
10. A computer-readable storage medium, on which a speech input program is stored, which when executed by a processor implements the steps of the speech input method according to any one of claims 1 to 4.
CN201810209073.XA 2018-03-14 2018-03-14 Voice input method, mobile terminal and computer readable storage medium Active CN108521501B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810209073.XA CN108521501B (en) 2018-03-14 2018-03-14 Voice input method, mobile terminal and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810209073.XA CN108521501B (en) 2018-03-14 2018-03-14 Voice input method, mobile terminal and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN108521501A CN108521501A (en) 2018-09-11
CN108521501B true CN108521501B (en) 2021-01-08

Family

ID=63433704

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810209073.XA Active CN108521501B (en) 2018-03-14 2018-03-14 Voice input method, mobile terminal and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN108521501B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109743454A (en) * 2018-12-29 2019-05-10 维沃移动通信有限公司 A kind of call handling method and mobile terminal
CN109961802B (en) * 2019-03-26 2021-05-18 北京达佳互联信息技术有限公司 Sound quality comparison method, device, electronic equipment and storage medium
CN110751949A (en) * 2019-10-18 2020-02-04 北京声智科技有限公司 Voice recognition method and device and computer readable storage medium
CN111107225B (en) * 2019-12-26 2022-08-02 维沃移动通信有限公司 Interactive communication method and electronic equipment
CN114979359B (en) * 2021-02-20 2024-05-03 深圳市万普拉斯科技有限公司 Microphone switching method, mobile terminal and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105812585A (en) * 2016-06-01 2016-07-27 珠海市魅族科技有限公司 Method for switching conversation modes and mobile terminal
CN106255000A (en) * 2016-07-29 2016-12-21 维沃移动通信有限公司 A kind of audio signal sample method and mobile terminal
CN106412259A (en) * 2016-09-14 2017-02-15 广东欧珀移动通信有限公司 Mobile terminal call control method and apparatus, and mobile terminal

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090023479A1 (en) * 2007-07-17 2009-01-22 Broadcom Corporation Method and system for routing phone call audio through handset or headset

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105812585A (en) * 2016-06-01 2016-07-27 珠海市魅族科技有限公司 Method for switching conversation modes and mobile terminal
CN106255000A (en) * 2016-07-29 2016-12-21 维沃移动通信有限公司 A kind of audio signal sample method and mobile terminal
CN106412259A (en) * 2016-09-14 2017-02-15 广东欧珀移动通信有限公司 Mobile terminal call control method and apparatus, and mobile terminal

Also Published As

Publication number Publication date
CN108521501A (en) 2018-09-11

Similar Documents

Publication Publication Date Title
CN108521501B (en) Voice input method, mobile terminal and computer readable storage medium
CN108391008B (en) Message reminding method and mobile terminal
CN109215683B (en) Prompting method and terminal
CN108196815B (en) Method for adjusting call sound and mobile terminal
CN108681413B (en) Control method of display module and mobile terminal
CN109600468B (en) Control method of foldable terminal and foldable terminal
CN109710130B (en) Display method and terminal
CN108008892B (en) Function starting method and terminal
CN111093137B (en) Volume control method, volume control equipment and computer readable storage medium
CN109982273B (en) Information reply method and mobile terminal
CN109639738B (en) Voice data transmission method and terminal equipment
CN110012151B (en) Information display method and terminal equipment
CN109451158B (en) Reminding method and device
CN108307075B (en) Incoming call processing method and mobile terminal
CN108270928B (en) Voice recognition method and mobile terminal
CN109005297B (en) Display method of navigation application and mobile terminal
CN108540642B (en) Mobile terminal and operation method thereof
CN108235084B (en) Video playing method and mobile terminal
CN108307048B (en) Message output method and device and mobile terminal
CN108093119B (en) Strange incoming call number marking method and mobile terminal
CN107908342B (en) Method for displaying message and mobile terminal
CN110213439B (en) Message processing method and terminal
CN110427149B (en) Terminal operation method and terminal
CN109660657B (en) Application program control method and device
CN108089799B (en) Control method of screen edge control and mobile terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant