CN111883134A - Voice input method and device, electronic equipment and storage medium

Info

Publication number: CN111883134A
Application number: CN202010723238.2A
Authority: CN
Inventors: 郭毓伟
Original assignee: Beijing Fotoable Technology Ltd
Current assignee: Beijing Fotoable Technology Ltd
Priority date: 2020-07-24
Filing date: 2020-07-24
Publication date: 2020-11-03
Anticipated expiration: 2040-07-24
Also published as: CN111883134B

Abstract

The application provides a voice input method, a voice input device, electronic equipment and a storage medium, applied to voice input of a voice input system. The method comprises the following steps: acquiring voice information to be recognized, wherein the voice information to be recognized comprises at least one piece of character string information; inputting the voice information to be recognized into a preset voice recognition model for voice recognition to obtain a character string text corresponding to the voice information to be recognized, wherein an algorithm for constructing the preset voice recognition model comprises a convolutional neural network and a long short-term memory (LSTM) network; matching the character string text with character strings in a preset database; and if the matching is successful, inputting the character string text into the voice input system to realize voice input. In this application, voice recognition is performed within a mobile game: the player's voice input is recognized as an answer, replacing manual entry of the answer, so that the game level can be completed. This assists the game player, improves the user experience, and meets higher user requirements.

Description

Voice input method and device, electronic equipment and storage medium

Technical Field

The present application relates to the field of speech recognition technologies, and in particular, to a speech input method and apparatus, an electronic device, and a storage medium.

Background

With the progress of data processing technology and the rapid popularization of the mobile internet, computer technology is widely applied in all areas of society, and speech recognition technology is expected to enter fields such as industry, household appliances, communications, automotive electronics, medical treatment, home services and consumer electronics. For example, in the currently popular mobile game market, in word games played by the general public, a game player working through the levels needs to input the corresponding answer in each game level in order to complete it.

However, when answers are input manually, the game player can complete a game level only by typing the corresponding answer by hand. The level cannot be completed when both of the player's hands are occupied or when the player cannot spell certain words, which degrades the user experience and fails to meet higher user requirements.

Disclosure of Invention

The application provides a voice input method, a voice input device, electronic equipment and a storage medium, which are used for improving the user experience of game players and meeting higher user requirements.

A voice input method is applied to voice input of a voice input system, and comprises the following steps:

acquiring voice information to be recognized, wherein the voice information to be recognized comprises at least one piece of character string information;

inputting the voice information to be recognized into a preset voice recognition model for voice recognition to obtain a character string text corresponding to the voice information to be recognized, wherein an algorithm for constructing the preset voice recognition model comprises a convolutional neural network and a long short-term memory (LSTM) network;

matching the character string text with character strings in a preset database;

and if the matching is successful, inputting the character string text into the voice input system to realize voice input.

Further, the method also comprises the following steps:

and if the matching is unsuccessful, sending message reminding information corresponding to the unsuccessful matching to the voice input system, and deleting the character string text.

Further, after inputting the character string text into the voice input system, the method further comprises:

and sending message reminding information corresponding to successful matching to the voice input system, and deleting the character string text.

Further, before acquiring the voice information to be recognized, the method further includes: selecting position information corresponding to the character string text to be input;

and inputting the character string text into the voice input system specifically comprises:

inputting the character string text at the position indicated by the position information corresponding to the character string text to be input.

Further, the process of constructing the preset speech recognition model specifically includes:

acquiring voice information sample data, wherein the voice information sample data comprises voice information of various character string information;

and training a voice recognition model on the voice information sample data using a convolutional neural network algorithm and a long short-term memory (LSTM) network algorithm, to obtain the preset voice recognition model.

A voice input device applied to voice input of a voice input system comprises:

an acquisition unit, configured to acquire voice information to be recognized, wherein the voice information to be recognized comprises at least one piece of character string information;

a recognition unit, configured to input the voice information to be recognized into a preset voice recognition model for voice recognition so as to obtain a character string text corresponding to the voice information to be recognized, wherein an algorithm for constructing the preset voice recognition model comprises a convolutional neural network and a long short-term memory (LSTM) network;

a matching unit, configured to match the character string text with character strings in a preset database;

and an input unit, configured to input the character string text into the voice input system to realize voice input if the matching is successful.

Further, the device also comprises:

a first sending unit, configured to send message reminding information corresponding to unsuccessful matching to the voice input system and to delete the character string text if the matching is unsuccessful.

Further, the device also comprises:

a second sending unit, configured to send message reminding information corresponding to successful matching to the voice input system and to delete the character string text.

An electronic device, comprising:

a processor; and

a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the speech input method as described above.

A non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform a speech input method as described above.

Compared with the prior art, the present application provides a voice input method, a voice input device, electronic equipment and a storage medium, applied to voice input of a voice input system. The method comprises the following steps: acquiring voice information to be recognized, wherein the voice information to be recognized comprises at least one piece of character string information; inputting the voice information to be recognized into a preset voice recognition model for voice recognition to obtain a character string text corresponding to the voice information to be recognized, wherein an algorithm for constructing the preset voice recognition model comprises a convolutional neural network and a long short-term memory (LSTM) network; matching the character string text with character strings in a preset database; and if the matching is successful, inputting the character string text into the voice input system to realize voice input. In this application, voice recognition is performed within a mobile game: the player's voice input is recognized as an answer, replacing manual entry of the answer, so that the game level can be completed. This assists the game player, improves the user experience, and meets higher user requirements.

Drawings

In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic structural diagram of a speech input system according to an embodiment of the present application;

Fig. 2 is a schematic flow chart of a voice input method according to an embodiment of the present application;

Figs. 3 to 8 are schematic display diagrams of a game interface provided in an embodiment of the present application in various states;

Fig. 9 is a schematic structural diagram of a voice input device according to an embodiment of the present application;

Fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The present application provides a voice input method, apparatus, electronic device and storage medium, which are applied to the voice input system shown in Fig. 1 (for example, in word games played by the general public, in English or Chinese, such as fill-in games based on words, idioms or two-part allegorical sayings). The voice input system includes a voice input device 10, a voice recognition device 20 and a voice output device 30. The voice input device 10 receives voice information input by a user; the voice recognition device 20 performs voice recognition on that voice information to obtain a voice recognition result; and the voice output device 30 sends the voice recognition result to the word game page, so that the game level is completed.
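The three-part structure of Fig. 1 can be pictured as a simple pipeline. The following is a minimal sketch only: the class name `VoiceInputSystem`, the `recognize` callable and the `game_page` list are hypothetical stand-ins chosen for illustration and are not named in the application.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class VoiceInputSystem:
    """Hypothetical wiring of devices 10, 20 and 30 from Fig. 1."""

    # Stand-in for voice recognition device 20: audio bytes -> recognized text.
    recognize: Callable[[bytes], str]
    # Stand-in for the word game page that voice output device 30 writes to.
    game_page: List[str] = field(default_factory=list)

    def handle_utterance(self, audio: bytes) -> str:
        # Voice input device 10: the user's audio arrives as the argument.
        text = self.recognize(audio)  # device 20: speech recognition
        self.game_page.append(text)   # device 30: send result to the game page
        return text


# A fake recognizer is used here because no concrete model is specified.
system = VoiceInputSystem(recognize=lambda audio: audio.decode("utf-8"))
system.handle_utterance(b"apple")
```

Passing the recognizer in as a callable keeps the sketch independent of any particular speech recognition engine.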

The present application aims to improve the user experience of game players and to meet higher user requirements.

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Referring to fig. 2, a schematic flow chart of a voice input method according to an embodiment of the present application is shown. As shown in fig. 2, a voice input method provided in the embodiment of the present application is applied to voice input of a voice input system, and specifically includes the following steps:

S200: Selecting the position information corresponding to the character string text to be input.

In practical applications, such as a fill-in-the-word game among mobile games, the game player first selects the position information corresponding to the character string text to be input. The position information identifies where on the game interface the character string is to be entered, as shown in Fig. 3.

S201: Acquiring voice information to be recognized, wherein the voice information to be recognized comprises at least one piece of character string information.

In practical applications, such as a fill-in-the-word game among mobile games, the game player enters the voice input mode through a mode switch key on the game interface; as shown in Fig. 3, the mode switch key is located in the lower left corner. After the game player taps the mode switch key, the game interface appears as shown in Fig. 4, which includes a button for returning to the keyboard interface, a microphone button with its prompt text, and a countdown display (60 s). In the embodiment of the present application, when the game player presses the microphone button in Fig. 4, a voice input prompt tone is played and a microphone volume animation is shown. Further, when the game player releases the microphone button, a release prompt tone may be played.

It should be noted that the voice information to be recognized includes at least one piece of character string information; that is, the voice information may contain multiple character strings, and the game player may speak whichever character strings his or her vocabulary allows.
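The press-and-hold interaction with a countdown can be sketched as a small session object. This is an illustration under assumptions: the class `VoiceSession` and its methods are hypothetical, times are passed in explicitly rather than read from a clock, and the sketch assumes the 60 s countdown bounds the window in which voice input may start, which the text does not state outright.

```python
from dataclasses import dataclass
from typing import Optional

COUNTDOWN_SECONDS = 60  # countdown shown on the interface in Fig. 4


@dataclass
class VoiceSession:
    """Hypothetical press-and-hold session; times are supplied by the caller."""

    started_at: float
    pressed_at: Optional[float] = None

    def remaining(self, now: float) -> float:
        # Seconds left on the countdown, never below zero.
        return max(0.0, COUNTDOWN_SECONDS - (now - self.started_at))

    def press(self, now: float) -> bool:
        # Pressing the microphone button starts capture only before timeout.
        if self.remaining(now) <= 0:
            return False
        self.pressed_at = now
        return True

    def release(self, now: float) -> Optional[float]:
        # Releasing ends capture and returns how long the button was held.
        if self.pressed_at is None:
            return None
        held = now - self.pressed_at
        self.pressed_at = None
        return held


session = VoiceSession(started_at=0.0)
session.press(now=5.0)
held = session.release(now=7.5)
```

In a real interface the prompt tones and the volume animation would be triggered from `press` and `release`; they are omitted here.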

S202: Inputting the voice information to be recognized into a preset voice recognition model for voice recognition so as to obtain a character string text corresponding to the voice information to be recognized, wherein an algorithm for constructing the preset voice recognition model comprises a convolutional neural network and a long short-term memory (LSTM) network.

It should be noted that speech recognition technology, also known as Automatic Speech Recognition (ASR), aims to convert the vocabulary content in human speech into computer-readable input, such as keystrokes, binary codes or character sequences. In the embodiment of the application, the speech recognition technology belongs to technologies known to those skilled in the art, and therefore, details of the speech recognition are not described again, and specific contents may refer to related technologies.

In an embodiment of the present application, the process of constructing the preset speech recognition model specifically includes the following steps:

acquiring voice information sample data, wherein the voice information sample data comprises voice information for various character strings; and

training a voice recognition model on the voice information sample data using a convolutional neural network algorithm and a long short-term memory (LSTM) network algorithm, to obtain the preset voice recognition model.
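The application does not disclose the layer layout, the input features or the training procedure of the model. Purely to illustrate how a convolutional stage can feed an LSTM, the following sketch runs a forward pass in plain Python with random, untrained weights. All names (`conv1d`, `lstm_step`, `encode`) and all sizes are assumptions, and the step that would turn the final state into text is omitted.

```python
import math
import random


def conv1d(frames, kernels):
    """Valid 1-D convolution over time; frames is [T][F], kernels is [C][K][F]."""
    k = len(kernels[0])
    out = []
    for t in range(len(frames) - k + 1):
        row = []
        for kernel in kernels:
            s = sum(kernel[i][f] * frames[t + i][f]
                    for i in range(k) for f in range(len(frames[0])))
            row.append(max(0.0, s))  # ReLU activation
        out.append(row)
    return out


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h, c, w):
    """One LSTM step; w maps a gate name to a weight matrix [H][len(x) + H]."""
    z = x + h  # concatenate the input with the previous hidden state
    gate = lambda name, j: sum(wi * zi for wi, zi in zip(w[name][j], z))
    new_h, new_c = [], []
    for j in range(len(h)):
        i_g = sigmoid(gate("i", j))   # input gate
        f_g = sigmoid(gate("f", j))   # forget gate
        o_g = sigmoid(gate("o", j))   # output gate
        g = math.tanh(gate("g", j))   # candidate cell value
        cj = f_g * c[j] + i_g * g
        new_c.append(cj)
        new_h.append(o_g * math.tanh(cj))
    return new_h, new_c


def encode(frames, kernels, w, hidden):
    # Convolve the feature frames, then run the LSTM over the result.
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in conv1d(frames, kernels):
        h, c = lstm_step(x, h, c, w)
    return h


# Illustrative sizes: 10 frames, 4 features, 3 kernels of width 3, 5 hidden units.
rng = random.Random(0)
T, F, C, K, H = 10, 4, 3, 3, 5
frames = [[rng.uniform(-1, 1) for _ in range(F)] for _ in range(T)]
kernels = [[[rng.uniform(-1, 1) for _ in range(F)] for _ in range(K)]
           for _ in range(C)]
w = {g: [[rng.uniform(-1, 1) for _ in range(C + H)] for _ in range(H)]
     for g in "ifog"}
summary = encode(frames, kernels, w, H)
```

A practical system would use a tensor library and learned weights; the plain-Python form is chosen here only so that each arithmetic step is visible.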

In the embodiment of the present application, because the recognition efficiency of voice recognition technology keeps improving, a game player can complete game levels by voice: the voice recognition system recognizes the player's voice input and thereby helps the player play. Here the technology is applied mainly to mobile games, in which game levels are completed through voice input.

S203: Matching the character string text against the character strings in a preset database.

It should be noted that the character strings corresponding to the correct answers in the word game are pre-stored in the preset database. After voice recognition is performed on the voice information input by the game player, the resulting character string text is matched against the character strings in the preset database to determine whether the recognized character string is a correct answer.
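The matching step can be sketched as a set lookup. The sketch rests on assumptions: the function names are hypothetical, the preset database is modelled as an in-memory set, comparison ignores case and surrounding whitespace, and multi-string utterances are split on spaces, which suits English answers but not unsegmented Chinese text.

```python
from typing import Iterable, Optional, Set


def normalize(text: str) -> str:
    # Assumed normalization: trim and case-fold so "Apple " equals "apple".
    return text.strip().casefold()


def build_answer_index(answers: Iterable[str]) -> Set[str]:
    # Stand-in for the preset database of correct answers.
    return {normalize(a) for a in answers}


def match_answer(recognized: str, index: Set[str]) -> Optional[str]:
    # Try the whole utterance first, then each space-separated string in it.
    for candidate in [recognized] + recognized.split():
        key = normalize(candidate)
        if key in index:
            return key
    return None


index = build_answer_index(["Apple", "river"])
```

Returning `None` on failure lets the caller branch directly into the success or failure handling that follows the match.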

S204: Judging whether the matching is successful; if so, executing step S205; otherwise, executing step S207.

S205: Inputting the character string text at the position indicated by the position information corresponding to the character string text to be input, so as to realize voice input.

S206: Sending message reminding information corresponding to successful matching to the voice input system, and deleting the character string text.

S207: Sending message reminding information corresponding to unsuccessful matching to the voice input system, and deleting the character string text.

In the embodiment of the present application, the recognized character strings are matched one by one against the character strings in the preset database. When a successfully matched character string exists, the character string text is input at the position indicated by the position information corresponding to the character string text to be input, so as to realize voice input. Fig. 5 shows the process from the game player pressing the microphone button, through voice recognition, to the word being input successfully.

It should be noted that, after the game player inputs a word by voice, if the matching is successful, the character string text is input at the position indicated by the position information, as shown in Fig. 5. After the input succeeds, message reminding information corresponding to successful matching is sent to the voice input system; that is, when the game player submits a correct answer, a prompt to play the "correct answer" animation is sent to the voice input system, and the recognized character string text is deleted. If the matching is unsuccessful, message reminding information corresponding to unsuccessful matching is sent to the voice input system; that is, when the answer submitted by the game player is incorrect, a prompt to play the "incorrect answer" animation is sent to the voice input system, and the character string text is deleted.
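The branch in steps S204 to S207 can be sketched as a single function. The names used here (`submit_voice_answer`, `Position`, the notification strings) are hypothetical, the game board is modelled as a dictionary keyed by slot position, and the preset database is modelled as a set.

```python
from typing import Dict, List, Set, Tuple

Position = Tuple[int, int]  # hypothetical (row, column) of the selected slot


def submit_voice_answer(
    text: str,
    position: Position,
    answers: Set[str],
    board: Dict[Position, str],
    notifications: List[str],
) -> bool:
    """Sketch of steps S204 to S207 for one recognized character string."""
    matched = text in answers                   # S204: did the match succeed?
    if matched:
        board[position] = text                  # S205: fill the selected slot
        notifications.append("match_success")   # S206: success reminder
    else:
        notifications.append("match_failure")   # S207: failure reminder
    # Both branches end by deleting the recognized text; in this sketch the
    # local variable simply goes out of scope and no other copy is kept.
    return matched


board: Dict[Position, str] = {}
notes: List[str] = []
submit_voice_answer("apple", (0, 2), {"apple"}, board, notes)
submit_voice_answer("pear", (1, 0), {"apple"}, board, notes)
```

Only the matched answer ever reaches the board, so an incorrect recognition leaves the selected slot untouched.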

Furthermore, when the game player answers correctly, a prompt tone may be played; after a correct answer, once the button is released, a correct-answer animation is shown.

To further illustrate the application scenario of the embodiment of the present application, a Party mode may be provided. In Party mode, questions switch automatically and voice input starts automatically; when the game player answers correctly, the voice prompt "CORRECT" is played; when voice input times out, a countdown prompt tone is played, or the next round of voice input starts directly. In addition, when Party-mode speech recognition is tapped or the next question starts automatically, the question is read aloud to the user: X letters, … … (query).

It should be noted that, in the embodiment of the present application, the game interface in the no-network state is displayed as shown in Fig. 6: whenever a network connection failure is detected, the original microphone interface is replaced with the no-network prompt interface shown in Fig. 6.

Regarding the version of the game, in the Beta version voice recognition cannot continue once the limit on the number of uses is reached, and the game interface is then displayed as shown in Fig. 7.
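A usage limit of this kind can be sketched as a simple counter. The class name `BetaUsageLimiter` and the limit of 2 used in the example are hypothetical; the text does not state the actual limit or how uses are counted.

```python
from dataclasses import dataclass


@dataclass
class BetaUsageLimiter:
    """Hypothetical counter for the Beta version's usage limit."""

    max_uses: int  # assumed to be configured; the text gives no number
    used: int = 0

    def try_use(self) -> bool:
        # Consumes one use and returns True, or returns False once the limit
        # is reached, at which point the interface of Fig. 7 would be shown.
        if self.used >= self.max_uses:
            return False
        self.used += 1
        return True


limiter = BetaUsageLimiter(max_uses=2)
results = [limiter.try_use() for _ in range(3)]
```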

It should be further noted that the game includes a novice guide flow. For a new user, the guide starts at level 2; for an existing user who has already passed level 2, the guide starts after level 6. Before the guide starts, the game player cannot see the entrance to the voice mode. Each user receives the novice guide only once, and the level at which the guide starts can be configured locally as required. As shown in Fig. 8, when the player taps speech recognition for the first time, two permission requests pop up; if either request fails, a screen (picture C) appears with a diagram showing how to grant the permission. If the player does not grant the permission, tapping anywhere ends the guide.
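One possible reading of the guide trigger can be sketched as follows. The names `GuideConfig` and `should_start_guide` are hypothetical, and the interpretation is an assumption: a new user is guided on reaching level 2, an existing user only once past level 6, and nobody is guided twice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuideConfig:
    """Locally configurable trigger levels (defaults taken from the text)."""

    new_user_level: int = 2  # new users are guided at this level
    old_user_level: int = 6  # existing users are guided after this level


def should_start_guide(
    is_new_user: bool,
    current_level: int,
    guide_shown: bool,
    cfg: GuideConfig = GuideConfig(),
) -> bool:
    if guide_shown:
        return False  # each user receives the guide only once
    if is_new_user:
        return current_level >= cfg.new_user_level
    return current_level > cfg.old_user_level


# The voice-mode entrance stays hidden until this first returns True.
visible_for_new_player = should_start_guide(True, 2, False)
```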

In the embodiment of the present application, voice recognition data need to be logged promptly as tracking events ("dotting"), mainly comprising BQ tracking and BI tracking. BQ tracking uses a newly created voice input tracking table, with an event recorded on each successful detection, as shown in the following table. For BI tracking, one column is added to each of the LevelPass table and the FlashCrazePass table, so that the number of questions answered using voice input can be compared.

BQ tracking table

The voice input method provided by the embodiment of the present application is applied to voice input of a voice input system, and comprises the following steps: acquiring voice information to be recognized, wherein the voice information to be recognized comprises at least one piece of character string information; inputting the voice information to be recognized into a preset voice recognition model for voice recognition to obtain a character string text corresponding to the voice information to be recognized, wherein an algorithm for constructing the preset voice recognition model comprises a convolutional neural network and a long short-term memory (LSTM) network; matching the character string text with character strings in a preset database; and if the matching is successful, inputting the character string text into the voice input system to realize voice input. In the embodiment of the present application, voice recognition is performed within a mobile game: the player's voice input is recognized as an answer, replacing manual entry of the answer, so that the game level can be completed. This assists the game player, improves the user experience, and meets higher user requirements.

Referring to Fig. 9, based on the voice input method disclosed in the foregoing embodiment, this embodiment correspondingly discloses a voice input device, applied to voice input of a voice input system. The device specifically includes:

an acquisition unit 901, configured to acquire voice information to be recognized, where the voice information to be recognized includes at least one piece of character string information;

a recognition unit 902, configured to input the voice information to be recognized into a preset voice recognition model for voice recognition, so as to obtain a character string text corresponding to the voice information to be recognized, where an algorithm for constructing the preset voice recognition model includes a convolutional neural network and a long short-term memory (LSTM) network;

a matching unit 903, configured to match the character string text with character strings in a preset database;

an input unit 904, configured to input the character string text into the voice input system if the matching is successful, so as to realize voice input;

a first sending unit 905, configured to send message reminding information corresponding to unsuccessful matching to the voice input system and to delete the character string text if the matching is unsuccessful; and

a second sending unit 906, configured to send message reminding information corresponding to successful matching to the voice input system and to delete the character string text.

The device comprises a processor and a memory. The acquisition unit, the recognition unit, the matching unit, the input unit, the first sending unit, the second sending unit and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to realize the corresponding functions.

The processor comprises a kernel, and the kernel calls the corresponding program unit from the memory. One or more kernels may be provided. By performing voice recognition in a mobile game and recognizing the voice input as an answer, the device replaces manual entry of answers by the player, so that game levels can be completed; this assists the game player, improves the user experience, and meets higher user requirements.

An embodiment of the present invention provides a storage medium on which a program is stored, the program implementing the voice input method when executed by a processor.

The embodiment of the invention provides a processor, which is used for running a program, wherein the voice input method is executed when the program runs.

An embodiment of the present invention provides an electronic device, as shown in fig. 10, the electronic device 100 includes at least one processor 1001, and at least one memory 1002 and a bus 1003 connected to the processor; the processor 1001 and the memory 1002 complete communication with each other through the bus 1003; the processor 1001 is used for calling the program instructions in the memory 1002 to execute the voice input method described above.

The electronic device herein may be a server, a PC, a PAD, a mobile phone, etc.

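The present application further provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps:

acquiring voice information to be recognized, wherein the voice information to be recognized comprises at least one piece of character string information;

inputting the voice information to be recognized into a preset voice recognition model for voice recognition to obtain a character string text corresponding to the voice information to be recognized, wherein an algorithm for constructing the preset voice recognition model comprises a convolutional neural network and a long short-term memory (LSTM) network;

matching the character string text with character strings in a preset database;

and if the matching is successful, inputting the character string text into the voice input system to realize voice input.

Preferably, the method further comprises:

if the matching is unsuccessful, sending message reminding information corresponding to the unsuccessful matching to the voice input system, and deleting the character string text.

Preferably, after the character string text is input into the voice input system, the method further includes:

sending message reminding information corresponding to successful matching to the voice input system, and deleting the character string text.

Preferably, before acquiring the voice information to be recognized, the method further includes: selecting position information corresponding to the character string text to be input;

and inputting the character string text into the voice input system specifically comprises:

inputting the character string text at the position indicated by the position information corresponding to the character string text to be input.

Preferably, the process of constructing the preset speech recognition model specifically includes:

acquiring voice information sample data, wherein the voice information sample data comprises voice information of various character string information;

and training a voice recognition model on the voice information sample data using a convolutional neural network algorithm and a long short-term memory (LSTM) network algorithm, to obtain the preset voice recognition model.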

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In a typical configuration, a device includes one or more processors (CPUs), memory, and a bus. The device may also include input/output interfaces, network interfaces, and the like.

The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip. The memory is an example of a computer-readable medium.

Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer-readable medium does not include transitory computer-readable media such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Claims

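1. A speech input method applied to speech input of a speech input system, the method comprising:

acquiring voice information to be recognized, wherein the voice information to be recognized comprises at least one piece of character string information;

inputting the voice information to be recognized into a preset voice recognition model for voice recognition to obtain a character string text corresponding to the voice information to be recognized, wherein an algorithm for constructing the preset voice recognition model comprises a convolutional neural network and a long short-term memory (LSTM) network;

matching the character string text with character strings in a preset database;

and if the matching is successful, inputting the character string text into the voice input system to realize voice input.

2. The method of claim 1, further comprising:

if the matching is unsuccessful, sending message reminding information corresponding to the unsuccessful matching to the voice input system, and deleting the character string text.

3. The method of claim 1, further comprising, after inputting the character string text into the speech input system:

sending message reminding information corresponding to successful matching to the voice input system, and deleting the character string text.

4. The method according to any one of claims 1-3, further comprising, before acquiring the voice information to be recognized: selecting position information corresponding to the character string text to be input;

wherein inputting the character string text into the voice input system specifically comprises:

inputting the character string text at the position indicated by the position information corresponding to the character string text to be input.

5. The method according to claim 1, wherein the process of constructing the preset speech recognition model specifically comprises:

acquiring voice information sample data, wherein the voice information sample data comprises voice information of various character string information;

and training a voice recognition model on the voice information sample data using a convolutional neural network algorithm and a long short-term memory (LSTM) network algorithm, to obtain the preset voice recognition model.

6. A speech input device for speech input in a speech input system, the device comprising:

an acquisition unit, configured to acquire voice information to be recognized, wherein the voice information to be recognized comprises at least one piece of character string information;

a recognition unit, configured to input the voice information to be recognized into a preset voice recognition model for voice recognition so as to obtain a character string text corresponding to the voice information to be recognized, wherein an algorithm for constructing the preset voice recognition model comprises a convolutional neural network and a long short-term memory (LSTM) network;

a matching unit, configured to match the character string text with character strings in a preset database;

and an input unit, configured to input the character string text into the voice input system to realize voice input if the matching is successful.

7. The apparatus of claim 6, further comprising:

a first sending unit, configured to send message reminding information corresponding to unsuccessful matching to the voice input system and to delete the character string text if the matching is unsuccessful.

8. The apparatus of claim 6, further comprising:

a second sending unit, configured to send message reminding information corresponding to successful matching to the voice input system and to delete the character string text.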

9. An electronic device, comprising:

a processor; and

a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the speech input method of any of claims 1-5.

10. A non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the speech input method of any of claims 1-5.