CN111326145B - Speech model training method, system and computer readable storage medium - Google Patents

Speech model training method, system and computer readable storage medium

Info

Publication number
CN111326145B
Authority
CN
China
Prior art keywords
message
voice
result message
operation unit
identification result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010074272.1A
Other languages
Chinese (zh)
Other versions
CN111326145A (en)
Inventor
塞力克·斯兰穆
陈乙银
郑斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Grey Shark Technology Co ltd
Original Assignee
Shenzhen Grey Shark Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Grey Shark Technology Co ltd filed Critical Shenzhen Grey Shark Technology Co ltd
Priority to CN202010074272.1A priority Critical patent/CN111326145B/en
Publication of CN111326145A publication Critical patent/CN111326145A/en
Application granted granted Critical
Publication of CN111326145B publication Critical patent/CN111326145B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 - Training
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/26 - Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention provides a voice model training method, a system, and a computer readable storage medium. The voice model training method comprises the following steps: forming a prompt interface to display information for activating the voice message receiving function; receiving at least one externally formed voice message; recognizing each voice message to form at least one recognition result message, and displaying the recognition result message on a mapping interface; further displaying the operation units of a target application program on the mapping interface; and associating each recognition result message with one or more operation units to form a configuration relation, which is then stored. With this technical scheme, training the voice model reduces the working time spent on speech semantic recognition and lowers the power consumption of voice operation.

Description

Speech model training method, system and computer readable storage medium
Technical Field
The present invention relates to the field of training model generation, and in particular, to a method, a system, and a computer readable storage medium for training a speech model.
Background
With the rapid popularization of intelligent terminals, tablet computers, and notebook computers, people increasingly depend on these devices. To use such a device, a user typically inputs commands through its touch screen, for example a single click, a double click, or a long press on an operation button displayed on the touch screen, thereby issuing an operation instruction to the device.
To enrich the ways users can input instructions, many device manufacturers have developed voice-operated functions: the voice the user sends to the equipment is recognized, parsed into a device operation, and the corresponding operation is then executed.
In the prior art, voice input is converted into a voice command through speech recognition, and the voice command is then mapped onto a game command inside the game. In concrete implementations, the voice acquisition and recognition module and the voice control command set must be packaged into an SDK and deeply integrated into the game module, or the input driver in the terminal device must be modified at high cost; either way, deep cooperative development between the game manufacturer and the device manufacturer is required. This approach also has poor compatibility, needs separate adaptation for each game instruction, and does not consider the power consumption of speech recognition. In addition, if the speech recognition process is slow or stalls, the user's instruction input is affected.
Therefore, a novel voice model training method is needed, by which a model suitable for low-power-consumption scene control can be obtained through training, improving the battery endurance of the intelligent terminal.
Disclosure of Invention
In order to overcome the above technical defects, the object of the present invention is to provide a voice model training method, a system, and a computer readable storage medium that, through training of a voice model, reduce the working time spent on speech semantic recognition and lower the power consumption of voice operation.
The invention discloses a voice model training method, which comprises the following steps:
forming a prompt interface to display information for activating the voice message receiving function;
receiving at least one externally formed voice message;
recognizing each voice message to form at least one recognition result message, and displaying the recognition result message on a mapping interface;
further displaying the operation units of a target application program on the mapping interface;
and associating each recognition result message with one or more operation units to form a configuration relation, which is then stored.
Preferably, the step of recognizing each voice message to form at least one recognition result message and displaying the recognition result message on a mapping interface comprises:
analyzing the voice message and converting the voice message into a text message;
extracting keywords in the text message;
the keyword is saved as at least one recognition result message, and the recognition result message is sent to a server side to generate a voice model at the server side.
Preferably, the step of extracting keywords in the text message includes:
acquiring a target application program and the common expressions of the target application program;
comparing the text message with the common expressions, and extracting the content of the text message that matches a common expression or whose similarity to a common expression is higher than a preset threshold;
and storing the content as the keyword, or modifying the content to the common expression with the closest similarity and storing that expression as the keyword.
Preferably, the step of further displaying the operation unit of the target application on the mapping interface includes:
acquiring the type and key frame of a target application program;
and extracting part or all of the operation units that operate the target application program in the key frame.
Preferably, the step of associating each recognition result message with one or more operation units and storing after forming the configuration relation comprises:
receiving external operation executed on the mapping interface, and moving the position of the operation unit on the mapping interface according to the external operation;
when any operation unit moves to a position corresponding to a recognition result message, the recognition result message is associated with the operation unit;
and storing the association relation between each operation unit and the recognition result message as the configuration relation of the voice model.
Preferably, the method further comprises the following steps:
naming the configuration relation and downloading the voice model from the server;
modifying the name of the voice model to the name of the configuration relation, and storing the configuration relation into the voice model;
the speech model is saved to a database.
The invention also discloses a voice model training system, which comprises:
the prompt module forms a prompt interface and displays information for activating the voice message receiving function;
a receiving module for receiving at least one voice message formed externally;
the recognition module recognizes each voice message to form at least one recognition result message, and displays the recognition result message on a mapping interface;
the interaction module forms a mapping interface, and an operation unit of the target application program is further displayed on the mapping interface;
and the association module associates each recognition result message with one or more operation units to form a configuration relation and then stores the configuration relation.
Preferably, the association module comprises:
the mobile unit is connected with the interaction module, receives external operation executed on the mapping interface, and moves the position of the operation unit on the mapping interface according to the external operation;
a display unit highlighting the moving operation unit;
the association unit associates the identification result message with the operation unit when any operation unit moves to a position corresponding to the identification result message;
and the storage unit is used for storing the association relation between each operation unit and the recognition result message as the configuration relation of the voice model.
The invention also discloses a computer readable storage medium, on which a computer program is stored, which, when executed by a processor, realizes the following steps:
forming a prompt interface to display information for activating the voice message receiving function;
receiving at least one externally formed voice message;
recognizing each voice message to form at least one recognition result message, and displaying the recognition result message on a mapping interface;
further displaying the operation units of a target application program on the mapping interface;
and associating each recognition result message with one or more operation units to form a configuration relation, which is then stored.
After the technical scheme is adopted, compared with the prior art, the method has the following beneficial effects:
1. the trained model supports a plurality of applications in the same scene or a plurality of applications in different scenes;
2. the mapping mode is more direct, so that a user can correlate the trained voice model with an operation instruction conveniently;
3. when the voice model is used, the recognition power consumption and time are reduced, and the process of converting voice into operation is effectively accelerated.
Drawings
FIG. 1 is a flow chart of a method for training a speech model according to a preferred embodiment of the invention;
FIG. 2 is a flow chart of a method for training a speech model according to a further preferred embodiment of the present invention;
FIG. 3 is a flow chart of a method for training a speech model according to yet a further preferred embodiment of the present invention;
FIG. 4 is a schematic diagram of a speech model training system according to a preferred embodiment of the present invention.
Detailed Description
Advantages of the invention are further illustrated in the following description, taken in conjunction with the accompanying drawings and detailed description.
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure, as detailed in the appended claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in this disclosure to describe various information, this information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure. The word "if" as used herein may be interpreted as "when", "upon", or "in response to determining", depending on the context.
In the description of the present invention, it should be understood that the terms "longitudinal," "transverse," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like indicate orientations or positional relationships based on the orientation or positional relationships shown in the drawings, merely to facilitate describing the present invention and simplify the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and therefore should not be construed as limiting the present invention.
In the description of the present invention, unless otherwise specified and defined, the terms "mounted," "connected," and "coupled" are to be construed broadly: a connection may, for example, be mechanical or electrical, may be direct or indirect through an intermediary, or may be an internal communication between two elements. The specific meaning of these terms in the present invention can be understood by those skilled in the art according to the specific situation.
In the following description, suffixes such as "module", "component", or "unit" used to represent elements serve only to facilitate the description of the present invention and carry no specific significance by themselves. Thus, "module", "component", and "unit" may be used interchangeably.
Referring to fig. 1, a flowchart of a method for training a speech model according to a preferred embodiment of the present invention is shown, in which the method for training a speech model includes the following steps:
s100: forming a prompt interface to display the information activating the voice message receiving function
The voice model training method can be completed in a server end and an intelligent terminal, and when the voice model is trained, the voice model is displayed outwards through interaction media such as a display screen carried by the server end, a display screen connected with the server end, a display screen of the intelligent terminal and the like. After the display screens are arranged, a prompt interface is formed when the voice model training method is started, the prompt interface is displayed on the display screens, and information of the activated voice message receiving function is displayed, so that a user needing to form a voice model can be informed of sending voice messages to a server side, an intelligent terminal or a device which is connected with the server side and the intelligent terminal and can receive voice, such as a microphone, so as to start voice recognition and model establishment.
S200: receiving externally formed at least one voice message
And prompting the user to send the information of the voice message to the equipment according to the displayed prompting interface and after entering the model training interface. After receiving the model training interface, the user may send at least one voice message to a device (e.g., a server, an intelligent terminal, or a device connected to the server, an intelligent terminal and capable of receiving voice) according to the guidance of the model training interface, for example, an operation instruction message including pure Chinese, such as "attack", "defense", "city return", "set", "retreat", etc., or an operation instruction message including foreign language, such as "attack", "security", "back", "done", etc., or an operation instruction message including digital, such as "666", "333", "886", etc.
S300: recognizing each voice message to form at least one recognition result message, and displaying the recognition result message on a mapping interface
After the voice messages are received, each voice message undergoes speech recognition to form at least one recognition result message. The recognition result message may correspond to the received voice message in full, e.g., the user's voice message "all attack" yields the recognition result message "all attack"; or it may correspond to only part of the voice message, e.g., the recognition result message "attack". The recognition result message formed by recognition is displayed on the display screen of the device, specifically on a mapping interface, so that the user can check the recognition accuracy. When the accuracy of the recognition result message is high enough (greater than a set threshold, or confirmed by the user), the next step can be executed; when the accuracy is insufficient (less than the set threshold, or not confirmed by the user), the user may be asked to re-input the voice message, or the voice message may be re-recognized, until the accuracy is sufficiently high.
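The flow of this step can be sketched as follows. This is a minimal illustration only: the names recognize_with_retry and RecognitionResult, the callable parameters, and the threshold value 0.85 are assumptions made for the sketch, not details taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Optional

CONFIDENCE_THRESHOLD = 0.85  # assumed stand-in for the "set threshold"

@dataclass
class RecognitionResult:
    text: str          # the recognition result message shown to the user
    confidence: float  # the recognizer's own accuracy estimate in [0, 1]

def recognize_with_retry(
    record_voice: Callable[[], bytes],                # captures one voice message
    recognize: Callable[[bytes], RecognitionResult],  # any speech recognizer
    user_confirms: Callable[[str], bool],             # shows the result, asks the user
    max_attempts: int = 3,
) -> Optional[RecognitionResult]:
    """Recognize a voice message; if accuracy is below the threshold and the
    user does not confirm the result, ask for the voice message again."""
    for _ in range(max_attempts):
        result = recognize(record_voice())
        if result.confidence >= CONFIDENCE_THRESHOLD or user_confirms(result.text):
            return result  # accurate enough: proceed to the mapping step
    return None            # still inaccurate: the caller may restart the flow
```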
S400: displaying the operation units of the target application program on the mapping interface
In addition to the recognition result message, at least one target application program is displayed on the mapping interface. The target application programs are applications that can use a voice model and execute corresponding operations according to it, such as a game application that executes operations according to voice messages, or a media application that performs streaming-media control according to voice messages. On the mapping interface, each target application program may be represented by a unique, easily identifiable operation unit, such as a name or an icon. That is, the mapping interface displays both the recognition result messages and the operation units corresponding to the target application programs, so that the user can easily see which usage scenarios a recognition result message can correspond to.
S500: associating each recognition result message with one or more operation units to form a configuration relation and storing it
The user can input control instructions on the mapping interface to associate each recognition result message with one or more operation units. This forms a mapping relation between the recognition result message and the operation units, which extends to a mapping relation between the recognition result message and the target application program, and further to a mapping relation between the voice message and specific operations inside the target application program; this mapping relation is stored as the configuration relation. For example, if the recognition result message is "attack", then according to the user's mapping operation it is associated with game applications such as "Honor of Kings", "Call of Duty", and "Onmyoji", so that within the formed voice model the recognized "attack" corresponds to a specific operation of each target application program. Concretely, the recognition result message may be associated with a specific icon of the application program on the mapping interface, so that the initial "attack" voice message is converted into a press of the attack icons of these game applications.
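A minimal sketch of the configuration relation formed in this step, assuming a simple in-memory mapping; the application names and the attack_icon identifier are illustrative assumptions.

```python
from collections import defaultdict

# recognition result message -> list of (target application, operation unit)
configuration: dict[str, list[tuple[str, str]]] = defaultdict(list)

def associate(result_message: str, app: str, operation_unit: str) -> None:
    """Extend the mapping from a recognition result message to one or more
    operation units of target application programs (step S500)."""
    configuration[result_message].append((app, operation_unit))

# One voice message can drive the same action in several games:
associate("attack", "Honor of Kings", "attack_icon")
associate("attack", "Call of Duty", "attack_icon")
associate("attack", "Onmyoji", "attack_icon")
print(configuration["attack"])  # three (application, unit) pairs
```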
Through this configuration, the trained model supports multiple applications in the same scene, or multiple applications in different scenes, so that one voice message can be used across several application programs, saving the storage space occupied by the voice model; in addition, the user maps voice messages to operation units in a more direct way.
Referring to fig. 2, in a preferred embodiment, the step S300 of recognizing each voice message to form at least one recognition result message and displaying the recognition result message on a mapping interface includes:
s310: parsing voice message and converting voice message into text message
After receiving the voice message, the voice message in the form of voice signal can be converted into text message by the voice recognition module. The speech recognition module used in this embodiment may be a conventional APK or the like that converts speech into text.
S320: extracting keywords in the text message;
Keywords are extracted from the converted text message. The extraction may keep the whole text message (for example, when the text message contains few words), keep the text message with noise removed, or strip out the words unrelated to the operation instruction, as described below.
S330: storing the keyword as at least one recognition result message, and sending the recognition result message to a server side to generate a voice model at the server side
The obtained keywords are stored as at least one recognition result message. When the voice message is received by the intelligent terminal, the intelligent terminal can send the recognition result message to the server side; after being stored at the server side, the recognition result messages are converted into a common voice model.
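Steps S310 to S330 together form a small pipeline, which might look as follows. The endpoint URL and the JSON payload shape are assumptions, and the speech-to-text and keyword-extraction stages are passed in as callables because the patent leaves their implementations open.

```python
import json
import urllib.request
from typing import Callable

SERVER_URL = "http://example.invalid/voice-model"  # placeholder endpoint

def process_voice_message(
    audio: bytes,
    speech_to_text: Callable[[bytes], str],        # S310: e.g. an existing APK
    extract_keywords: Callable[[str], list[str]],  # S320: keyword extraction
) -> list[str]:
    """S310-S330: parse the voice message into text, extract keywords, and
    send the recognition result messages to the server side, where the
    voice model is generated."""
    text = speech_to_text(audio)
    keywords = extract_keywords(text)
    payload = json.dumps({"recognition_results": keywords}).encode("utf-8")
    request = urllib.request.Request(
        SERVER_URL, data=payload,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request)  # the server stores the messages
    return keywords
```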
More specifically, the step S320 of extracting keywords in the text message includes:
s321: acquiring common expressions of a target application program and the target application program;
and selecting part or all of the applications installed in the intelligent terminal of the user as target applications according to the selection operation of the user. After the target application program is selected, the commonly used expressions in the target application program are acquired. Taking the target application program of "the owner glowing" as an example, after the target application program is determined to include that the common information of "the owner glowing" can be called from the network as common words, such as "one wave", "wild", "back city", "withdrawing", and the like, the common information special for the user can be customized according to the configuration of the user, such as "shoot with me", "get back without back" lamp, and the interfaces of "the owner glowing" can be identified, so that the characters displayed in the interfaces are converted into common words, such as characters directly displayed in the interfaces of the target application program, such as "mall", "setting", "hero", and the like; taking the target application program of "Tengmao video" as an example, after the target application program is determined to include, common information of "Tengmao video" can be called from the network as common words, such as "exit", "recommend", "increase volume", etc., common information dedicated to the user can be customized according to the configuration of the user, such as "fast forward 15 seconds", "fast reverse 30 seconds", "next head", etc., each interface of "Tengmao video" can be identified, and characters displayed in the interface can be converted into common words, such as characters directly displayed in the interface of the target application program, such as "daily recommendation", "movie", "synthetic skill", "sports", etc.
S322: comparing the text message with the common term, and extracting the content of the text message, which is matched with the common term or has the similarity higher than a preset threshold value;
After the common expressions are obtained, the text message produced by recognition is compared with them. The comparison may fall into the following cases:
1) The text message completely matches a common expression
Taking the common expressions "attack" and "return to city" as an example: when the text message converted from the voice message is "attack" or "return to city", this indicates, on the one hand, that the user's voice input to the terminal was "attack" or "return to city"; on the other hand, since the text message completely matches a common expression, the text message is retained in full.
2) The text message contains a common expression
Taking the common expressions "attack" and "return to city" as an example: when the text message converted from the voice message is "I want to attack", "attack the opponent", "I want to return to the city", or "hurry back to the city", this indicates, on the one hand, the user's corresponding voice input to the terminal; on the other hand, the text message contains a complete common expression, so instead of retaining the whole text message, the common expression included in it is extracted as the recognition result message, saving the storage space occupied by the voice model.
3) The text message matches part of a common expression
Taking the common expressions "fast forward 15 seconds", "fast backward 30 seconds", and "play music to adjust the atmosphere" as an example: when the text message converted from the voice message is "fast backward", "fast forward", or "some music", this indicates, on the one hand, the user's corresponding voice input to the terminal; on the other hand, the text message forms part of a common expression. The text message can either be retained in full, e.g. keeping only "fast backward", "fast forward", or "some music", or be automatically mapped onto a common expression according to the degree of inclusion, e.g. when the text message is "fast backward", it is extracted as "fast backward 30 seconds", the common expression closest to "fast backward".
4) The text message partially overlaps a common expression
Taking the common expressions "fast forward 15 seconds", "fast backward 30 seconds", and "play music to adjust the atmosphere" as an example: when the text message converted from the voice message is "I want to fast forward", "I want to fast backward", or "I want to play music", this indicates, on the one hand, the user's corresponding voice input to the terminal; on the other hand, part of the text message overlaps part of a common expression. In this case only the overlapping portion is retained: "fast backward", "fast forward", or "play music".
5) The similarity between the text message and a common expression is higher than a threshold
Taking the common expressions "fast forward 15 seconds", "fast backward 30 seconds", and "play music to adjust the atmosphere" as an example: when the text message converted from the voice message is "I want to go forward", "I want to look back", or "I want to play songs", this indicates, on the one hand, the user's corresponding voice input to the terminal; on the other hand, the text message barely overlaps or does not overlap any common expression literally, yet the control instruction it expresses is actually the same as the one expressed by a common expression. In this case, besides recognizing the words, step S322 also recognizes the meaning of the text message and compares that meaning with each common expression. When the meanings agree, the text message is considered to have a certain degree of similarity to the common expression, and when the similarity exceeds the preset threshold, either the whole text message or the whole common expression can be taken as the keyword.
S323: storing the content as the keyword, or modifying the content to the common expression with the closest similarity as the keyword
In each of the above cases, the extracted content is finally saved as a keyword, or the content is modified with the common expression as the standard. For example, in cases 4) and 5) above, it is preferable to use the common expression as the standard, which simplifies the extraction and meaning-understanding procedure for text messages: based on the existing common expressions, analysis results whose meanings have been worked out in advance can be reused, simplifying the formation flow of the voice model.
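The matching logic of the cases above might be sketched as follows. The threshold value is an assumption, and difflib's character-level ratio is only a crude stand-in for case 5, which the patent describes as a comparison of meanings rather than of characters.

```python
import difflib

SIMILARITY_THRESHOLD = 0.5  # assumed value for the "preset threshold"

def extract_keyword(text: str, common_expressions: list[str]) -> str | None:
    """Map a recognized text message to a keyword following the cases above:
    exact match, containment in either direction, then closest similarity."""
    for expression in common_expressions:
        if text == expression or expression in text:  # cases 1) and 2)
            return expression
        if text in expression:                        # case 3)
            return expression
    # cases 4) and 5): fall back to the most similar common expression
    scored = [(difflib.SequenceMatcher(None, text, e).ratio(), e)
              for e in common_expressions]
    best_ratio, best_expression = max(scored, default=(0.0, None))
    if best_expression is not None and best_ratio >= SIMILARITY_THRESHOLD:
        return best_expression  # modify the content to the closest expression
    return None                 # nothing matched: no keyword is extracted

print(extract_keyword("I want to attack", ["attack", "return to city"]))
# -> "attack" (case 2: the text message contains the common expression)
```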
In a preferred embodiment, the step S400 of further displaying the operation units of the target application program on the mapping interface includes:
s410: acquiring the type and key frame of a target application program;
and acquiring an application program list installed in the intelligent terminal, and determining the types of the application programs, such as games, media, social, reading, news and the like, according to the application programs or all the application programs which are set by a user and can be used as target application programs. For these target applications, at least one key frame under their activation and running will also be acquired, e.g. a display frame under the target application launch interface, a display frame under the entry operation interface, a display frame under the most commonly used interface, etc.
S420: extracting part or all operation units operating on target application program in key frame
After the key frames are acquired, part or all of the operation units corresponding to operations of the target application program are extracted. For example, in a certain key frame the operation units may include an attack key, a defense key, and a skill key that are always displayed in the foreground, or a direction key, an instruction key, a guide key, and the like that are displayed only after the user touches the display screen.
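A possible data shape for key frames and their operation units is sketched below; the class names and the always_visible flag are assumptions introduced for illustration.

```python
from dataclasses import dataclass

@dataclass
class OperationUnit:
    name: str             # e.g. "attack key", "direction key"
    always_visible: bool  # shown at the front end vs. only after a touch

@dataclass
class KeyFrame:
    description: str      # e.g. "launch interface", "battle interface"
    units: list[OperationUnit]

def extract_operation_units(frame: KeyFrame,
                            include_touch_only: bool = True) -> list[OperationUnit]:
    """S420: extract part or all of the operable units found in a key frame."""
    return [u for u in frame.units if u.always_visible or include_touch_only]

frame = KeyFrame("battle interface", [
    OperationUnit("attack key", True),
    OperationUnit("skill key", True),
    OperationUnit("direction key", False),  # appears only after a touch
])
print([u.name for u in extract_operation_units(frame, include_touch_only=False)])
# -> ['attack key', 'skill key']
```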
In another preferred embodiment, the step S500 of associating each recognition result message with one or more operation units to form a configuration relationship and then storing the configuration relationship includes:
s510: receiving external operation executed on the mapping interface, and moving the position of the operation unit on the mapping interface according to the external operation;
the operation unit is displayed on the mapping interface to inform the user which operations within the target application will be mapped to the speech model. After the user recognizes these operation units, external operations such as long pressing, clicking, double clicking, etc. of the operation units are applied to the display screen, and according to these external operations, when a contact portion of the user to the display screen, such as a finger, a touch pen, etc., moves on the display screen, the operation units also move with the movement of the contact portion, thereby changing the position of the operation units within the mapping interface.
S520: when any operation unit moves to a position corresponding to a recognition result message, the recognition result message is associated with the operation unit;
the mapping interface also displays the identification result information, and a blank area can be arranged beside the identification result information to be used for establishing the mapping relation. For example, if one or more operation units are moved into the blank area and maintained for a certain time, it indicates that the operation unit is associated with the recognition result message. Thus, when any one or more of the operation units is moved to the position corresponding to the recognition result message based on the operation of the user, and the contact portion of the user is moved out of the display screen, the final position of the operation unit is indicated, and when the final position corresponds to the recognition result message, the recognition result message is associated with the operation unit.
S530: and storing the association relation between each operation unit and the recognition result message as the configuration relation of the voice model.
After the recognition result message is associated with the operation units, the association relation between each operation unit and the recognition result message is saved. If there is a further keyword, or a recognition result message corresponding to a further keyword, the operation units can be configured again in the same way.
Referring to fig. 3, in a preferred embodiment, the method for training a speech model further comprises the steps of:
s600: naming the configuration relation and downloading the voice model from the server;
according to the operation of the user, naming each stored configuration relation can be the application of the target application program and the voice model, such as the glowing of an owner, the summoning of a user, the blood return and the like, or the plurality of configurations are stored after being packaged, the naming mode is only the target application program, and the original voice model is downloaded from a server side.
S700: modifying the name of the voice model into the name of the configuration relation, and storing the configuration relation into the voice model;
after receiving the native voice model, the voice model can be modified into the name of the configuration relation, and the configuration relation is saved into the voice model. Finally, step S800 is executed to save the speech model to a database, and then the configuration interface or the mapping interface is finished.
Referring to fig. 4, a voice model training system is shown, comprising: the prompt module, which forms a prompt interface and displays the information for activating the voice message receiving function; the receiving module, which receives at least one externally formed voice message; the recognition module, which recognizes each voice message to form at least one recognition result message and displays the recognition result message on a mapping interface; the interaction module, which forms the mapping interface, on which the operation units of the target application program are further displayed; and the association module, which associates each recognition result message with one or more operation units to form a configuration relation and then stores the configuration relation.
In a preferred embodiment, the association module comprises: the mobile unit, which is connected with the interaction module, receives external operations executed on the mapping interface, and moves the position of the operation unit on the mapping interface according to the external operations; the display unit, which highlights the moving operation unit; the association unit, which associates the recognition result message with an operation unit when that operation unit moves to the position corresponding to the recognition result message; and the storage unit, which stores the association relation between each operation unit and the recognition result message as the configuration relation of the voice model.
In one embodiment, a computer readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of: forming a prompt interface to display information for activating the voice message receiving function; receiving at least one externally formed voice message; recognizing each voice message to form at least one recognition result message, and displaying the recognition result message on a mapping interface; further displaying the operation units of the target application program on the mapping interface; and associating each recognition result message with one or more operation units to form a configuration relation, which is then stored.
The intelligent terminal may be implemented in various forms. For example, the terminals described in the present invention may include smart terminals such as mobile phones, smartphones, notebook computers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and navigation devices, as well as fixed terminals such as digital TVs and desktop computers. In the following, the terminal is assumed to be an intelligent terminal; however, it will be understood by those skilled in the art that, apart from elements used specifically for mobile purposes, the configuration according to the embodiments of the present invention can also be applied to fixed-type terminals.
It should be noted that the embodiments of the present invention are preferred embodiments and are not limiting in any way. Any person skilled in the art may use the technical content disclosed above to change or modify it into an equivalent effective embodiment; any modification or equivalent change of the above embodiments made according to the technical substance of the present invention, without departing from its technical scope, still falls within the scope of the present invention.

Claims (7)

1. A method for training a speech model, comprising the steps of:
forming a prompt interface to display information for activating the voice message receiving function;
receiving at least one externally formed voice message;
recognizing each voice message to form at least one recognition result message, and displaying the recognition result message on a mapping interface;
further displaying the operation units of the target application program on the mapping interface;
associating each recognition result message with one or more operation units to form a configuration relation and then storing the configuration relation, wherein
the step of associating each recognition result message with one or more operation units and storing after forming the configuration relation comprises: receiving external operations executed on the mapping interface, and moving the position of the operation unit on the mapping interface according to the external operations;
when any operation unit moves to a position corresponding to a recognition result message, the recognition result message is associated with the operation unit;
and storing the association relation between each operation unit and the recognition result message as the configuration relation of the voice model.
2. The speech model training method of claim 1,
the step of recognizing each voice message to form at least one recognition result message and displaying the recognition result message on a mapping interface comprises the following steps:
analyzing the voice message and converting the voice message into a text message;
extracting keywords in the text message;
and storing the keyword as at least one recognition result message, and sending the recognition result message to a server side to generate a voice model at the server side.
3. The speech model training method of claim 2,
the step of extracting the keywords in the text message comprises the following steps:
acquiring a target application program and the common expressions of the target application program;
comparing the text message with the common term, and extracting the content of the text message, which is matched with the common term or has similarity higher than a preset threshold;
and storing the content as a keyword or modifying the content to a common term with closest similarity as the keyword.
4. The speech model training method of claim 1,
the step of further displaying the operation units of the target application program on the mapping interface comprises the following steps:
acquiring the type and key frame of a target application program;
and extracting part or all of the operation units that operate the target application program in the key frame.
5. The speech model training method of claim 2, further comprising the steps of:
naming the configuration relation and downloading the voice model from the server;
modifying the name of the voice model to the name of the configuration relation, and storing the configuration relation into the voice model;
and saving the voice model to a database.
6. A speech model training system, the speech model training system comprising:
the prompt module forms a prompt interface and displays information for activating the voice message receiving function;
a receiving module for receiving at least one voice message formed externally;
the recognition module recognizes each voice message to form at least one recognition result message, and displays the recognition result message on a mapping interface;
the interaction module forms a mapping interface, and an operation unit of the target application program is further displayed on the mapping interface;
the association module associates each recognition result message with one or more operation units to form a configuration relation and then stores the configuration relation, and the association module comprises:
the mobile unit, which is connected with the interaction module, receives external operations executed on the mapping interface, and moves the position of the operation unit on the mapping interface according to the external operations;
a display unit highlighting the moving operation unit;
the association unit associates the identification result message with the operation unit when any operation unit moves to a position corresponding to the identification result message;
and the storage unit is used for storing the association relation between each operation unit and the recognition result message as the configuration relation of the voice model.
7. A computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor realizes the steps of:
forming a prompt interface to display information for activating the voice message receiving function;
receiving at least one externally formed voice message;
recognizing each voice message to form at least one recognition result message, and displaying the recognition result message on a mapping interface;
further displaying the operation units of the target application program on the mapping interface;
associating each recognition result message with one or more operation units to form a configuration relation and then storing the configuration relation, wherein
the step of associating each recognition result message with one or more operation units and storing after forming the configuration relation comprises:
receiving external operations executed on the mapping interface, and moving the position of the operation unit on the mapping interface according to the external operations;
when any operation unit moves to a position corresponding to a recognition result message, the recognition result message is associated with the operation unit;
and storing the association relation between each operation unit and the recognition result message as the configuration relation of the voice model.
CN202010074272.1A 2020-01-22 2020-01-22 Speech model training method, system and computer readable storage medium Active CN111326145B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010074272.1A CN111326145B (en) 2020-01-22 2020-01-22 Speech model training method, system and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010074272.1A CN111326145B (en) 2020-01-22 2020-01-22 Speech model training method, system and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111326145A CN111326145A (en) 2020-06-23
CN111326145B true CN111326145B (en) 2023-04-28

Family

ID=71172813

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010074272.1A Active CN111326145B (en) 2020-01-22 2020-01-22 Speech model training method, system and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111326145B (en)

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7774202B2 (en) * 2006-06-12 2010-08-10 Lockheed Martin Corporation Speech activated control system and related methods
CN102750125A (en) * 2011-04-19 2012-10-24 无锡天堂软件技术有限公司 Voice-based control method and control system
CN103366743A (en) * 2012-03-30 2013-10-23 北京千橡网景科技发展有限公司 Voice-command operation method and device
CN105204838A (en) * 2014-06-26 2015-12-30 金德奎 Method for concretely controlling on application program by means of mobile phone voice control software
CN105957530B (en) * 2016-04-28 2020-01-03 海信集团有限公司 Voice control method and device and terminal equipment
CN106512393A (en) * 2016-10-14 2017-03-22 上海异界信息科技有限公司 Application voice control method and system suitable for virtual reality environment
CN108172223A (en) * 2017-12-14 2018-06-15 深圳市欧瑞博科技有限公司 Voice instruction recognition method, device and server and computer readable storage medium

Also Published As

Publication number Publication date
CN111326145A (en) 2020-06-23


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230310

Address after: 518055 1501, Building 1, Chongwen Park, Nanshan Zhiyuan, No. 3370, Liuxian Avenue, Fuguang Community, Taoyuan Street, Nanshan District, Shenzhen, Guangdong Province

Applicant after: Shenzhen Grey Shark Technology Co.,Ltd.

Address before: 210022 Room 601, block a, Chuangzhi building, 17 Xinghuo Road, Jiangbei new district, Nanjing City, Jiangsu Province

Applicant before: Nanjing Thunder Shark Information Technology Co.,Ltd.

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant