CN112951232A - Voice input method, device, equipment and computer readable storage medium - Google Patents

Publication number
CN112951232A
Authority
CN
China
Prior art keywords
information
voice
instruction
input
text information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110232353.4A
Other languages
Chinese (zh)
Inventor
张学明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Skyworth RGB Electronics Co Ltd
Original Assignee
Shenzhen Skyworth RGB Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Skyworth RGB Electronics Co Ltd
Priority to CN202110232353.4A
Publication of CN112951232A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42204 User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Abstract

The invention discloses a voice input method comprising the following steps: when it is detected that the information input function is started, starting a voice input function and acquiring first voice information; converting the first voice information into text information, and determining a focus position corresponding to the information input function; and outputting the text information to the focus position. The invention also discloses a voice input device, voice input equipment and a computer readable storage medium. The invention enables quick input of text information by voice and improves information input efficiency.

Description

Voice input method, device, equipment and computer readable storage medium
Technical Field
The present invention relates to the field of speech recognition technologies, and in particular, to a speech input method, apparatus, device, and computer-readable storage medium.
Background
At present, characters are usually entered into a text box of an intelligent device such as a television in a traditional manner: the user operates a virtual keyboard generated by the device with a remote controller or by touch. Some intelligent devices also provide an interaction function, so that a user can connect a mobile terminal such as a mobile phone to the device, complete the character input on the mobile terminal, and have the characters synchronized to the device. However, these input modes are inefficient. On a device such as a television in particular, input with a remote controller is cumbersome; a program search usually requires entering the full spelling of the program name, which limits what can be input; and when search content is entered repeatedly, the previous search content has to be cleared manually. The result is low input efficiency.
Disclosure of Invention
The invention mainly aims to provide a voice input method, a voice input device, voice input equipment and a computer readable storage medium, and aims to solve the technical problem that the traditional information input mode is low in input efficiency at present.
To achieve the above object, the present invention provides a voice input method, including the steps of:
when detecting that the information input function is started, starting a voice input function and acquiring first voice information;
converting the first voice information into text information, and determining a focus position corresponding to the information input function;
outputting the text information to the focus position.
Optionally, the step of starting the voice input function when the start of the information input function is detected includes:
when detecting that the information input function is started, outputting prompt information for starting the voice input function and acquiring a starting instruction;
and starting a voice input function according to the starting instruction.
Optionally, the step of converting the first voice information into text information includes:
analyzing the first voice information to obtain a conversion instruction corresponding to the first voice information;
and converting the first voice information into text information according to the conversion instruction.
Optionally, the step of analyzing the first voice information to obtain a conversion instruction corresponding to the first voice information includes:
extracting keyword information from the first voice information;
analyzing the keyword information to determine whether the first voice information needs to be optimized when the first voice information is converted into text information;
and if the first voice information does not need to be optimized, generating a conversion instruction for converting the first voice information into corresponding text information.
Optionally, after the step of analyzing the keyword information to determine whether the first speech information needs to be optimized when the first speech information is converted into text information, the method includes:
if the first voice information needs to be optimized, generating an information optimization instruction for optimizing the first voice information according to the keyword information;
and generating a conversion instruction for converting the first voice information into text information based on the information optimization instruction.
Optionally, after the step of outputting the text information to the focus position, the method further includes:
acquiring a change instruction of the text information;
changing the text information according to the change instruction to obtain target text information;
and outputting the target text information to the focus position.
Optionally, the change instruction includes a re-input instruction and a modification instruction, and the step of changing the text information according to the change instruction to obtain the target text information includes:
if the change instruction is a re-input instruction, acquiring second voice information and converting the second voice information to obtain target text information;
and if the change instruction is a modification instruction, modifying the text information according to the modification instruction to obtain target text information.
Further, to achieve the above object, the present invention also provides a voice input device including:
the voice input module is used for starting a voice input function and acquiring first voice information when detecting that the equipment starts the information input function;
the voice recognition module is used for converting the first voice information into text information and determining a focus position corresponding to the information input function;
and the text output module is used for outputting the text information to the focus position.
Further, to achieve the above object, the present invention also provides voice input equipment including: a memory, a processor, and a voice input program that is stored in the memory and executable on the processor, wherein the voice input program, when executed by the processor, implements the steps of the voice input method described above.
In addition, to achieve the above object, the present invention also provides a computer readable storage medium having a voice input program stored thereon, which when executed by a processor, implements the steps of the voice input method as described above.
The embodiment of the invention provides a voice input method, a voice input device, voice input equipment and a computer readable storage medium. In the prior art, information is input through a virtual keyboard using a remote controller, touch or the like, so information input efficiency is low. In the embodiment of the invention, when it is detected that the information input function is started, the voice input function is started and first voice information is acquired; the first voice information is converted into text information, and a focus position corresponding to the information input function is determined; and the text information is output to the focus position. Text information is thus input quickly by voice, which solves the problem of slow information input in the traditional input mode and improves information input efficiency.
Drawings
Fig. 1 is a schematic hardware structure diagram of an implementation manner of a voice input device according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a first exemplary embodiment of a speech input method according to the present invention;
FIG. 3 is a schematic diagram of a prompt message in a first embodiment of a voice input method according to the invention;
FIG. 4 is a schematic diagram of another prompt message in the first embodiment of the voice input method according to the present invention;
FIG. 5 is a functional block diagram of a voice input device according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements serve only to facilitate the explanation of the present invention and have no specific meaning in themselves. Thus, "module", "component" and "unit" may be used interchangeably.
The voice input device (also called terminal, device or terminal device) in the embodiment of the invention can be a PC, and can also be a mobile terminal device with display and voice functions, such as a smart phone, a smart television, a tablet computer, a portable computer and the like.
As shown in fig. 1, the terminal may include: a processor 1001 such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). Alternatively, the memory 1005 may be a storage device separate from the processor 1001.
Optionally, the terminal may further include a camera, a Radio Frequency (RF) circuit, sensors, an audio circuit, a WiFi module, and the like. The sensors may include light sensors, motion sensors, and other sensors. Specifically, the light sensors may include an ambient light sensor, which can adjust the brightness of the display screen according to the brightness of ambient light, and a proximity sensor, which can turn off the display screen and/or the backlight when the mobile terminal is moved to the ear. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally three axes) and the magnitude and direction of gravity when the mobile terminal is stationary; it can be used for applications that recognize the attitude of the mobile terminal (such as switching between landscape and portrait, related games, and magnetometer attitude calibration) and for vibration-recognition functions (such as a pedometer and tap detection). The mobile terminal may of course also be configured with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which are not described here.
Those skilled in the art will appreciate that the terminal structure shown in fig. 1 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
As shown in fig. 1, a memory 1005, which is a kind of computer-readable storage medium, may include therein an operating system, a network communication module, a user interface module, and a voice input program.
In the terminal shown in fig. 1, the network interface 1004 is mainly used for connecting to a backend server and performing data communication with the backend server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be configured to invoke a voice input program stored in the memory 1005, which when executed by the processor, implements operations in the voice input method provided by the embodiments described below.
Based on the hardware structure of the equipment, the embodiment of the voice input method is provided.
Referring to fig. 2, in a first embodiment of the voice input method of the present invention, the voice input method includes:
step S10, when detecting the start of the information input function, starting the voice input function and acquiring the first voice information;
The voice input method of the present invention is applied to an intelligent terminal device having voice and display functions, such as a television; the following description takes an intelligent television (television for short) as an example. The input function of the television is monitored, and when it is detected that the user has started the information input function, the voice input function is started and the voice information input by the user is obtained. In this embodiment, the information input function started by the user may be one preset on the television or one provided by an application installed on the television, which is not specifically limited here.
To detect the input function of the television, the focus of the television cursor is tracked and monitored; when the focus falls into an area with an input function, such as an input text box, the voice input function is started. The voice input function may be started in the background of the television, so that the user can choose whether to use it as needed; when the user chooses to use it, the voice information input by the user is acquired.
Step S10 is refined into steps A1-A2:
step A1, when detecting that the information input function is started, outputting prompt information for starting the voice input function and obtaining a starting instruction;
and step A2, starting the voice input function according to the starting instruction.
Further, when it is detected that the television has started the information input function, the voice input function is started in the background of the television, and prompt information for starting the voice input function is output and displayed on the display screen of the television. The prompt information may take the form of a pop-up dialog box with content similar to "The voice input function has been started. Use it?", as shown in FIG. 3, which is a schematic diagram of the prompt information dialog box in this embodiment. The prompt information may include selection buttons for the user; for example, as shown in FIG. 3, "use / not use" buttons are set below the prompt content. A start instruction input by the user is then obtained, the voice input function is brought to the foreground of the television according to the start instruction, and the voice input function interacts with the user to obtain the voice information input by the user. The start instruction may be triggered by the user selecting the "use" button in the prompt information with a remote controller, or it may be a voice instruction of the user; for example, when voice information such as "use" or "start" input by the user is obtained, the voice input function is started in the foreground of the television and the voice information input by the user is obtained.
If the start instruction obtained from the user indicates that the voice input function is not to be used, the voice input function keeps running in the background of the television and the focus of the television continues to be monitored; when the focus is detected to fall into an area with the information input function again, the prompt information for starting the voice input function is output and displayed again. When it is detected that the user has chosen not to use the voice input function several times in succession, an option for closing the prompt information, similar to "do not prompt again during this power-on session", may be added to the next prompt information, as shown in FIG. 4; that is, an option for no longer displaying the prompt information is added to the prompt information shown in FIG. 3. If an instruction corresponding to this option is obtained from the user, the focus of the television can still be monitored, but when the focus subsequently falls into an area with the input function, the prompt information for starting the voice input function is no longer output or displayed. Because the option for closing the prompt information appears only after the prompt information has been displayed several times in succession, the user can customize this setting.
Further, after the prompt information for starting the voice input function has been turned off, when the focus of the television is detected to fall into an area with the information input function, a button for starting the voice input function is displayed in the input area, for example a "voice input" start button on the input text box. When the user wants to bring the voice input function to the foreground of the television, the start instruction can be triggered through this start button or through a voice instruction, which calls up the voice input function running in the background. It should be noted that the background start works as follows: the focus of the cursor is monitored, and when the focus is in an area with the information input function, for example in an input text box, the voice input function is started in the background; it is brought to the foreground according to an instruction triggered by the user. The voice start instruction used for this may be preset by the television or defined by the user.
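The prompting behaviour described for FIG. 3 and FIG. 4 can be sketched as a small state holder. This is only an illustrative sketch: the class name `PromptPolicy`, the choice strings, the returned UI labels, and the threshold of three consecutive refusals are all assumptions, since the description only says "a plurality of times".

```python
from dataclasses import dataclass


@dataclass
class PromptPolicy:
    """Decides what to show when the focus enters an input area (hypothetical sketch)."""
    decline_threshold: int = 3      # assumed value; the description gives no number
    consecutive_declines: int = 0
    prompts_disabled: bool = False

    def on_focus_enter(self) -> str:
        """Return which UI element to show for this focus event."""
        if self.prompts_disabled:
            return "voice_button"           # start button inside the input area
        if self.consecutive_declines >= self.decline_threshold:
            return "prompt_with_opt_out"    # dialog with the extra option (FIG. 4)
        return "prompt"                     # plain use / not use dialog (FIG. 3)

    def on_choice(self, choice: str) -> bool:
        """Record the user's answer; return True if voice input moves to the foreground."""
        if choice == "use":
            self.consecutive_declines = 0
            return True
        if choice == "do_not_prompt_again":
            self.prompts_disabled = True
            return False
        if choice == "not_use":
            self.consecutive_declines += 1
            return False
        raise ValueError(f"unknown choice: {choice!r}")
```

In this sketch the voice input function is assumed to stay alive in the background regardless of the answer; only the visible prompt changes.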
Step S20, converting the first voice information into text information, and determining a focus position corresponding to the information input function;
When it is detected that the user has started the voice input function, the voice information input by the user is acquired and converted into text information. Meanwhile, the focus of the television is monitored to determine the focus position corresponding to the information input function currently started on the television. In this embodiment, the focus position refers to the position where the cursor of the television is focused. For example, when the user searches on the television, the search function provides an input text box; when the user moves the cursor into the input text box, the voice input function is triggered to start in the background. Once the acquired voice information has been converted into text information, the position where the cursor is focused is determined again in order to determine the output position of the text information.
Further, when the voice information input by the user is converted into text information, the voice information can be recognized through a voice recognition technology preset on the television to obtain the corresponding text information. To support this, when the television is started for the first time, voice information of the user can be collected, for example by having the user read aloud specific keywords or sentences displayed on the television, so as to extract the acoustic features of the user and establish an acoustic model of the user. This improves the accuracy of voice recognition and therefore the accuracy of the conversion from voice information to text information.
Step S30, outputting the text information to the focus position.
After the position of the cursor focus, namely the focus position, is determined, the converted text information is output to the focus position and displayed to the user so that the user can carry out the next operation.
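Steps S10 to S30 can be summarized in a minimal sketch. Everything named here is hypothetical: `TextBox`, `Screen`, and `voice_input` are illustrative, and `listen` and `recognize` stand in for whatever microphone capture and speech recognition the device actually uses. The sketch also assumes that the output replaces the previous content of the text box, which the description does not state explicitly.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TextBox:
    """Hypothetical input area; `text` holds whatever has been output to it."""
    name: str
    text: str = ""


@dataclass
class Screen:
    """Hypothetical UI model: the cursor focus is on a TextBox or on nothing."""
    focused: Optional[TextBox] = None


def voice_input(screen: Screen,
                listen: Callable[[], bytes],
                recognize: Callable[[bytes], str]) -> Optional[str]:
    # S10: proceed only when the focus sits on an area with an input function.
    if screen.focused is None:
        return None
    first_voice = listen()
    # S20: convert the speech to text, then determine the focus position again.
    text = recognize(first_voice)
    target = screen.focused
    if target is None:
        return None
    # S30: output the text to the focus position (assumed to replace old content).
    target.text = text
    return text
```

Passing the capture and recognition functions in as parameters keeps the sketch independent of any particular speech engine.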
After step S30, the method further includes steps B1-B3:
step B1, acquiring a change instruction of the text information;
step B2, changing the text information according to the change instruction to obtain target text information;
and step B3, outputting the target text information to the focus position.
Further, after the text information is output to the focus position of the cursor and displayed to the user, a change instruction of the user for the text information is obtained, and the text information is changed according to that instruction. Chinese characters are complex and diverse, homophones exist, and program names often use homophonic puns, such as 'hip-hop' and 'unappreciable', all of which make voice recognition more difficult. Therefore, after the text information converted from the voice information is displayed to the user, the change instruction of the user is obtained, the text information is changed according to the change instruction to obtain target text information, and the target text information is output to the focus position of the television cursor and displayed to the user.
Step B2 is refined into steps B21-B22:
step B21, if the change instruction is a re-input instruction, acquiring second voice information and converting the second voice information to obtain target text information;
and step B22, if the change instruction is a modification instruction, modifying the text information according to the modification instruction to obtain target text information.
Furthermore, the change instruction of the user for the text information is either a re-input instruction or a modification instruction. When the change instruction triggered by the user is a re-input instruction, the voice information of the user is obtained again and converted to obtain the target text information. When the change instruction triggered by the user is a modification instruction, the output text information is modified according to that instruction to obtain the target text information.
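The two branches of step B2 can be illustrated as a simple dispatch. The dictionary layout of the instruction, the function name `apply_change`, and the use of a single find-and-replace as the form of a modification are assumptions made for this sketch; the description does not say how a modification instruction is encoded.

```python
from typing import Callable


def apply_change(text: str, instruction: dict,
                 listen_and_convert: Callable[[], str]) -> str:
    """Return the target text information for a change instruction (B21 / B22)."""
    kind = instruction["kind"]
    if kind == "re_input":
        # B21: drop the old text and convert newly acquired second voice information.
        return listen_and_convert()
    if kind == "modify":
        # B22: edit the existing text; one find-and-replace is an assumed edit form.
        return text.replace(instruction["old"], instruction["new"], 1)
    raise ValueError(f"unknown change instruction: {kind!r}")
```

For instance, `apply_change("helo", {"kind": "modify", "old": "helo", "new": "hello"}, lambda: "")` yields `"hello"`, and the result would then be output to the focus position as in step B3.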
In this embodiment, when it is detected that the information input function is started, the voice input function is started and first voice information is acquired; the first voice information is converted into text information, and a focus position corresponding to the information input function is determined; and the text information is output to the focus position. Text information is thus input quickly by voice, which solves the problem of slow information input in the traditional input mode and improves information input efficiency.
Further, on the basis of the above-described embodiments of the present invention, a second embodiment of the voice input method of the present invention is proposed.
This embodiment refines step S20 of the first embodiment and includes steps C1-C2:
step C1, analyzing the first voice information to obtain a conversion instruction corresponding to the first voice information;
and step C2, converting the first voice information into text information according to the conversion instruction.
In this embodiment, taking the television in the above embodiment as an example, when voice information input by a user is converted into text information, the voice information is first analyzed to obtain a conversion instruction corresponding to it. Because language expressions are diverse, the finally output text information does not necessarily correspond word for word to the voice information input by the user. While analyzing the voice information, the sentence pattern and grammar information of the corresponding sentence can be extracted, so that the input intention of the user is predicted. For example, if the voice information input by the user is "input 123456", the sentence is analyzed and the keyword "input" is recognized as an action command that does not need to be converted into text information. The text information in the generated conversion instruction is therefore "123456", and the finally generated and output text information is "123456".
Step C1 is refined into steps C11-C13:
step C11, extracting keyword information from the first voice information;
step C12, analyzing the keyword information to determine whether the first voice information needs to be optimized when converting the first voice information into text information;
step C13, if the first voice information does not need to be optimized, a conversion instruction for converting the first voice information into corresponding text information is generated.
Specifically, when the voice information input by the user is analyzed, keyword information is first extracted from it; taking the above voice information "input 123456" as an example, the keyword information is "input" and "text information 123456". The extracted keyword information is then analyzed to determine whether the voice information needs to be optimized when it is converted into text information. If no optimization is needed, a conversion instruction corresponding to the voice information is generated directly.
After step C12, the method further includes steps C14-C15:
step C14, if the first voice information needs to be optimized, generating an information optimization instruction for optimizing the first voice information according to the keyword information;
and step C15, generating a conversion instruction for converting the first voice information into text information based on the information optimization instruction.
Further, when inputting voice information, the user may use a simplified expression or an expression with a similar meaning. For example, the voice information "input 123456" may instead be spoken as "input 1 to 6". In this case, the keyword information extracted from the voice information is "input" and "text information 1 to 6", from which it can be inferred that the content the user really wants to input is "123456", so the voice information needs to be optimized. An information optimization instruction is therefore generated first, and the simplified expression of the user is converted into the corresponding complete text content according to it. A conversion instruction is then generated based on the information optimization instruction; the text information contained in this conversion instruction is the optimized version of the text information corresponding to the voice information, that is, the text information the user wants to input. Optimizing the voice information thus includes expanding, supplementing and changing the corresponding text information: when the user uses a simplified expression, the voice information needs to be expanded and supplemented, and when the user uses an approximate expression, it needs to be changed, so as to obtain the text information the user actually wants to input.
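The "input 123456" and "input 1 to 6" examples can be traced with a small sketch of steps C11 to C15. It works on an already transcribed utterance, which is a simplification of the described analysis of voice information. The keyword list, the digit-range pattern, and the names `ConversionInstruction` and `parse_utterance` are assumptions chosen for illustration; a real system would need a much richer notion of "simplified expression".

```python
import re
from dataclasses import dataclass

ACTION_KEYWORDS = ("input",)                    # assumed vocabulary of action commands
RANGE_PATTERN = re.compile(r"^(\d) to (\d)$")   # assumed form of a simplified expression


@dataclass
class ConversionInstruction:
    text: str          # text information that will be output
    optimized: bool    # True if an information optimization step was applied


def parse_utterance(utterance: str) -> ConversionInstruction:
    """Apply C11-C15 to a transcribed utterance (hypothetical sketch)."""
    words = utterance.strip()
    # C11: split off the action keyword so it is not output as text.
    for keyword in ACTION_KEYWORDS:
        if words.lower().startswith(keyword + " "):
            words = words[len(keyword) + 1:].strip()
            break
    # C12: decide whether the remaining content is a simplified expression.
    match = RANGE_PATTERN.match(words)
    if match is None:
        # C13: no optimization needed, convert directly.
        return ConversionInstruction(text=words, optimized=False)
    # C14-C15: expand the simplified expression, e.g. "1 to 6" becomes "123456".
    start, end = int(match.group(1)), int(match.group(2))
    if start > end:
        return ConversionInstruction(text=words, optimized=False)
    expanded = "".join(str(d) for d in range(start, end + 1))
    return ConversionInstruction(text=expanded, optimized=True)
```

With these assumptions, both `parse_utterance("input 123456")` and `parse_utterance("input 1 to 6")` carry the text "123456", and only the second is marked as optimized.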
In this embodiment, the first voice information input by the user is analyzed to obtain a corresponding conversion instruction, and the first voice information is converted according to that instruction to obtain the corresponding text information. Specifically, keyword information is extracted from the first voice information and analyzed to determine whether the text information corresponding to the first voice information needs to be optimized. If so, an information optimization instruction is generated, a conversion instruction is generated based on it, and the text information is optimized and converted according to the information optimization instruction and the conversion instruction to obtain the text information corresponding to the first voice information. This improves the accuracy of conversion between voice information and text information.
In addition, referring to fig. 5, an embodiment of the present invention further provides a voice input device, where the voice input device includes:
the voice input module 10 is used for starting a voice input function and acquiring first voice information when detecting that the equipment starts the information input function;
the voice recognition module 20 is configured to convert the first voice information into text information, and determine a focus position corresponding to the information input function;
the text output module 30 is configured to output the text information to the focus position.
Optionally, the voice input module 10 includes:
the detection unit is used for outputting prompt information for starting the voice input function and acquiring a starting instruction when detecting that the information input function is started;
and the starting unit is used for starting the voice input function according to the starting instruction.
Optionally, the speech recognition module 20 includes:
the voice analysis unit is used for analyzing the first voice information to obtain a conversion instruction corresponding to the first voice information;
and the information conversion unit is used for converting the first voice information into text information according to the conversion instruction.
Optionally, the voice parsing unit includes:
an information extraction subunit, configured to extract keyword information from the first voice information;
the analysis subunit is used for analyzing the keyword information to determine whether the first voice information needs to be optimized when the first voice information is converted into text information;
and the first instruction subunit is used for generating a conversion instruction for converting the first voice information into corresponding text information if the first voice information does not need to be optimized.
Optionally, the voice parsing unit further includes:
the supplementary instruction subunit is used for generating an information optimization instruction for optimizing the first voice information according to the keyword information if the first voice information needs to be optimized;
and the second instruction subunit is used for generating a conversion instruction for converting the first voice information into text information based on the information optimization instruction.
Optionally, the voice input device further includes:
the change instruction unit is used for acquiring a change instruction of the text information;
the information changing unit is used for changing the text information according to the changing instruction to obtain target text information;
and the text output unit is used for outputting the target text information to the focus position.
Optionally, the information changing unit includes:
the first changing subunit is used for acquiring second voice information and converting the second voice information to obtain target text information if the change instruction is a re-input instruction;
and the second changing subunit is used for modifying the text information according to the modification instruction to obtain the target text information if the change instruction is a modification instruction.
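For illustration, the cooperation of the modules listed above (recognition, output to the focus position, and handling of a change instruction) might be sketched as follows. The class names `FocusedForm` and `VoiceInputDevice`, the field name `search_box`, the instruction kinds "re-input" and "modify", and the injected `recognize` callable are all assumptions for this sketch; the actual speech recognizer and user interface are not specified here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class FocusedForm:
    """Stand-in for a user interface with input fields and a focus position."""
    fields: Dict[str, str] = field(default_factory=dict)
    focus: str = "search_box"  # hypothetical name of the focused field


class VoiceInputDevice:
    def __init__(self, recognize: Callable[[bytes], str], form: FocusedForm):
        self.recognize = recognize  # placeholder for a real recognizer
        self.form = form

    def on_voice(self, audio: bytes) -> str:
        """Convert voice information to text and output it at the focus."""
        text = self.recognize(audio)
        self.form.fields[self.form.focus] = text
        return text

    def on_change(self, kind: str, payload) -> str:
        """Apply a change instruction and output the target text."""
        current = self.form.fields.get(self.form.focus, "")
        if kind == "re-input":
            # Re-input instruction: payload is second voice information.
            target = self.recognize(payload)
        elif kind == "modify":
            # Modification instruction: payload is an (old, new) pair.
            old, new = payload
            target = current.replace(old, new)
        else:
            raise ValueError(f"unknown change instruction: {kind}")
        self.form.fields[self.form.focus] = target
        return target
```

In this sketch the recognizer is injected so that the dispatch between a re-input instruction and a modification instruction can be exercised without real audio.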
In addition, an embodiment of the present invention further provides a computer-readable storage medium, where a voice input program is stored on the computer-readable storage medium, and when the voice input program is executed by a processor, the voice input program implements operations in the voice input method provided in the foregoing embodiment.
The method executed by each program module can refer to each embodiment of the method of the present invention, and is not described herein again.
It is noted that, herein, relational terms such as first and second are used solely to distinguish one entity/action/object from another entity/action/object, without necessarily requiring or implying any actual relationship or order between such entities/actions/objects. The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or system that comprises the element.
For the apparatus embodiment, since it is substantially similar to the method embodiment, it is described relatively simply, and reference may be made to some descriptions of the method embodiment for relevant points. The above-described apparatus embodiments are merely illustrative, in that elements described as separate components may or may not be physically separate. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the invention. One of ordinary skill in the art can understand and implement it without inventive effort.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be substantially or partially embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the voice input method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A voice input method, characterized by comprising the steps of:
when detecting that the information input function is started, starting a voice input function and acquiring first voice information;
converting the first voice information into text information, and determining a focus position corresponding to the information input function;
and outputting the text information to the focus position.
2. The voice input method of claim 1, wherein the step of activating the voice input function when activation of the information input function is detected comprises:
when detecting that the information input function is started, outputting prompt information for starting the voice input function and acquiring a starting instruction;
and starting a voice input function according to the starting instruction.
3. The voice input method of claim 1, wherein the step of converting the first voice information into text information comprises:
analyzing the first voice information to obtain a conversion instruction corresponding to the first voice information;
and converting the first voice information into text information according to the conversion instruction.
4. The voice input method according to claim 3, wherein the step of analyzing the first voice information to obtain a conversion instruction corresponding to the first voice information comprises:
extracting keyword information from the first voice information;
analyzing the keyword information to determine whether the first voice information needs to be optimized when the first voice information is converted into text information;
and if the first voice information does not need to be optimized, generating a conversion instruction for converting the first voice information into corresponding text information.
5. The voice input method of claim 4, wherein the step of analyzing the keyword information to determine whether the first voice information needs to be optimized when converting the first voice information into text information comprises:
if the first voice information needs to be optimized, generating an information optimization instruction for optimizing the first voice information according to the keyword information;
and generating a conversion instruction for converting the first voice information into text information based on the information optimization instruction.
6. The voice input method of claim 1, wherein the step of outputting the text information to the focus position is followed by:
acquiring a change instruction of the text information;
changing the text information according to the change instruction to obtain target text information;
and outputting the target text information to the focus position.
7. The voice input method of claim 6, wherein the change instruction includes a re-input instruction and a modification instruction, and the step of changing the text information according to the change instruction to obtain the target text information includes:
if the change instruction is a re-input instruction, acquiring second voice information and converting the second voice information to obtain target text information;
and if the change instruction is a modification instruction, modifying the text information according to the modification instruction to obtain target text information.
8. A voice input apparatus, characterized in that the voice input apparatus comprises:
the voice input module is used for starting a voice input function and acquiring first voice information when detecting that the equipment starts the information input function;
the voice recognition module is used for converting the first voice information into text information and determining a focus position corresponding to the information input function;
and the text output module is used for outputting the text information to the focus position.
9. A voice input device, characterized by comprising: a memory, a processor and a voice input program stored on the memory and executable on the processor, the voice input program, when executed by the processor, implementing the steps of the voice input method according to any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon a voice input program which, when executed by a processor, implements the steps of the voice input method of any one of claims 1 to 7.
CN202110232353.4A 2021-03-02 2021-03-02 Voice input method, device, equipment and computer readable storage medium Pending CN112951232A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110232353.4A CN112951232A (en) 2021-03-02 2021-03-02 Voice input method, device, equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN112951232A true CN112951232A (en) 2021-06-11

Family

ID=76247240

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110232353.4A Pending CN112951232A (en) 2021-03-02 2021-03-02 Voice input method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112951232A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060224383A1 (en) * 2005-03-29 2006-10-05 Samsung Electronics Co., Ltd. Speech processing apparatus, medium, and method recognizing and responding to speech
US20160260433A1 (en) * 2015-03-06 2016-09-08 Apple Inc. Structured dictation using intelligent automated assistants
WO2019024692A1 (en) * 2017-08-02 2019-02-07 深圳壹账通智能科技有限公司 Speech input method and device, computer equipment and storage medium
CN109917982A (en) * 2019-03-21 2019-06-21 科大讯飞股份有限公司 A kind of pronunciation inputting method, device, equipment and readable storage medium storing program for executing
CN112230811A (en) * 2020-10-15 2021-01-15 科大讯飞股份有限公司 Input method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination