CN106710598A

CN106710598A - Voice recognition method and device

Info

Publication number: CN106710598A
Application number: CN201710182776.3A
Authority: CN
Inventors: 洪帆; 罗绿梅
Original assignee: Shanghai Yude Technology Co Ltd
Current assignee: Shanghai Yude Technology Co Ltd
Priority date: 2017-03-24
Filing date: 2017-03-24
Publication date: 2017-05-24

Abstract

The invention relates to the field of communication and discloses a voice recognition method and device. The voice recognition method comprises the following steps: extracting the text identifier of each switch in a display interface; when voice information is received, converting the voice information into text by a voice recognition technology; matching the converted text against the extracted text identifiers; and when the matching is successful, directly calling the switch click event corresponding to the successfully matched text identifier. The invention also discloses a voice recognition device. Compared with the prior art, the switch click event corresponding to a text identifier can be called directly for the text identifier of each switch, so that complete remote voice operation of the terminal device can be realized without building operation-type interfaces or building a voice adaptation interface for each apk, which gives the method a wider range of application.

Description

Voice recognition method and device

Technical field

The present invention relates to the field of communications, and more particularly to a voice recognition method and device.

Background technology

In recent years, with the continuous development of communication technology and the continuous progress of science and technology, mobile terminals such as mobile phones, notebook computers, and tablet computers have become essential tools in people's daily lives. Because they are easy to carry and simple to use, they bring great convenience to people's lives.

At present, intelligent voice systems are used more and more widely in terminals, for example to convert voice into text, to control third-party applications by voice, or to determine the control that matches the voice information and generate a control instruction for that control, so as to control the terminal device.

However, in the course of realizing the present invention, the inventors found that the prior art still has the following technical defects. First, current intelligent voice systems can at most provide operation-type interfaces, for example for relatively fixed operations such as opening a certain APP (APP refers to an application program) or opening the notepad; they cannot achieve complete voice control of a mobile phone, so their application is rather limited. Second, converting voice into text requires a voice adaptation interface to be built for each apk (apk refers to an installation package), which is almost impossible for third-party apks. Third, when determining the control that matches the voice information and generating a control instruction for that control in order to control the terminal device, the voice text must match the text description of each control. Many controls are of the picture type and have no corresponding text information, so even if a picture-type control is matched, the corresponding operation is not necessarily performed.

Summary of the invention

The purpose of the embodiments of the present invention is to provide a voice recognition method and device such that, for the text identifier of each switch, the switch click event corresponding to that text identifier can be called directly. Complete remote voice operation of the terminal device can thus be achieved without building operation-type interfaces or a voice adaptation interface for each apk, which gives the method a wider range of application.

To solve the above technical problem, an embodiment of the present invention provides a voice recognition method, comprising: extracting the text identifier of each switch in a display interface; when voice information is received, converting the voice information into text by a voice recognition technology; matching the converted text against each extracted text identifier; and when the matching is successful, directly calling the switch click event corresponding to the successfully matched text identifier.

An embodiment of the present invention further provides a voice recognition device, comprising: a first extraction module, configured to extract the text identifier of each switch in a display interface; a first judging module, configured to judge whether voice information is received; a conversion module, configured to convert the voice information into text by a voice recognition technology when the voice information is received; a matching module, configured to match the converted text against each extracted text identifier; a second judging module, configured to judge whether the matching is successful; and a calling module, configured to directly call, when the matching is successful, the switch click event corresponding to the successfully matched text identifier.

Compared with the prior art, the embodiments of the present invention extract the text identifier of each switch in the display interface, convert voice information into text by a voice recognition technology when the voice information is received, match the converted text against each extracted text identifier, and, when the matching is successful, directly call the switch click event corresponding to the successfully matched text identifier. The switch click event corresponding to a text identifier can therefore be called directly for the text identifier of each switch, and complete remote voice operation of the terminal device can be achieved without building operation-type interfaces or a voice adaptation interface for each apk, which gives the method a wider range of application.

In addition, after the extracting of the text identifier of each switch in the display interface, and before the converting of the voice information into text by a voice recognition technology when the voice information is received, the voice recognition method further comprises: displaying each extracted text identifier in the form of a data block. The matching of the converted text against each extracted text identifier is then specifically: matching the converted text against the text identifier in each data block. This provides a specific way of matching the converted text against each extracted text identifier, which further helps to ensure the feasibility of the invention.

In addition, before the extracting of the text identifier of each switch in the display interface, the voice recognition method further comprises: identifying the attribute of each switch in the display interface. When the attribute of the switch is a text switch (button), the extracting of the text identifier of each switch in the display interface is specifically: obtaining the text identifier corresponding to the button according to the text attribute of the button. This provides a specific way of obtaining the text identifier of a button, and because the text identifier is obtained from the text attribute of the button, the obtained text identifier is relatively accurate.

In addition, before the extracting of the text identifier of each switch in the display interface, the voice recognition method further comprises: identifying the attribute of each switch in the display interface; and, when the attribute of the switch is a picture switch (imagebutton), judging whether the imagebutton is a unitary imagebutton. When the imagebutton is a unitary imagebutton, the extracting of the text identifier of each switch in the display interface is specifically: finding the corresponding picture according to the src (path) attribute of the picture switch, recognizing the text in the picture by OCR picture recognition technology, and using the recognized text as the text identifier of the unitary imagebutton. This provides a specific way of obtaining the text identifier of a unitary imagebutton, and a text identifier obtained in this way is relatively accurate. When the imagebutton is a non-unitary imagebutton, the extracting of the text identifier of each switch in the display interface is specifically: obtaining the upper-layer encapsulation of the non-unitary imagebutton, obtaining the text attribute from the layout of the upper-layer encapsulation, and obtaining the text identifier corresponding to the non-unitary imagebutton according to the obtained text attribute. This provides a specific way of obtaining the text identifier of a non-unitary imagebutton, and a text identifier obtained in this way is relatively accurate.

In addition, whether the imagebutton is a unitary imagebutton is judged in the following manner: when the upper-layer encapsulation of the imagebutton contains at least two pictures, the imagebutton is a unitary imagebutton; when the upper-layer encapsulation of the imagebutton contains one picture, the imagebutton is a non-unitary imagebutton. This provides a specific way of judging whether an imagebutton is a unitary imagebutton, and judging by the number of pictures in the upper-layer encapsulation makes the result of the judgment relatively accurate.

In addition, the voice recognition device further comprises: a display module, configured to display each extracted text identifier in the form of a data block after the text identifier of each switch in the display interface is extracted and before the voice information is converted into text by a voice recognition technology when the voice information is received. When matching the converted text against each extracted text identifier, the matching module specifically matches the converted text against the text identifier in each data block.

In addition, the voice recognition device further comprises: an identification module, configured to identify the attribute of each switch in the display interface before the text identifier of each switch in the display interface is extracted; and a second extraction module, configured to extract the attribute of the switch. The first extraction module is configured to, when the attribute of the switch is a text switch (button), extract the text identifier of each switch in the display interface, specifically by obtaining the text identifier corresponding to the button according to the text attribute of the button.

In addition, the voice recognition device further comprises: an identification module, configured to identify the attribute of each switch in the display interface before the text identifier of each switch in the display interface is extracted; a second extraction module, configured to extract the attribute of the switch; and a third judging module, configured to judge, when the attribute of the switch is a picture switch (imagebutton), whether the imagebutton is a unitary imagebutton. The first extraction module is configured to, when the imagebutton is a unitary imagebutton, extract the text identifier of each switch in the display interface, specifically by finding the corresponding picture according to the src (path) attribute of the picture switch, recognizing the text in the picture by OCR picture recognition technology, and using the recognized text as the text identifier of the unitary imagebutton. The first extraction module is further configured to, when the imagebutton is a non-unitary imagebutton, extract the text identifier of each switch in the display interface, specifically by obtaining the upper-layer encapsulation of the non-unitary imagebutton, obtaining the text attribute from the layout of the upper-layer encapsulation, and obtaining the text identifier corresponding to the non-unitary imagebutton according to the obtained text attribute.

In addition, the third judging module comprises: a judging submodule, configured to judge whether the upper-layer encapsulation of the imagebutton contains at least two pictures; and a determination submodule, configured to determine that the imagebutton is a unitary imagebutton when the upper-layer encapsulation of the imagebutton contains at least two pictures. The judging submodule is further configured to judge whether the upper-layer encapsulation of the imagebutton contains one picture, and the determination submodule is further configured to determine that the imagebutton is a non-unitary imagebutton when the upper-layer encapsulation of the imagebutton contains one picture.

Brief description of the drawings

One or more embodiments are illustrated by way of example by the figures in the corresponding drawings. These illustrations do not limit the embodiments. Elements with the same reference numerals in the drawings denote similar elements, and unless otherwise stated, the figures in the drawings are not to be taken as limiting in scale.

Fig. 1 is a flowchart of the voice recognition method according to the first embodiment of the present invention;

Fig. 2 is a flowchart of the voice recognition method according to the second embodiment of the present invention;

Fig. 3 is a block diagram of the voice recognition device according to the third embodiment of the present invention;

Fig. 4 is a block diagram of the voice recognition device according to the fourth embodiment of the present invention.

Detailed description of the embodiments

To make the objects, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are explained in detail below with reference to the drawings. Those of ordinary skill in the art will understand, however, that many technical details are given in the embodiments so that the reader can better understand the present application. The technical solutions claimed in the present application can still be realized without these technical details, and with various changes and modifications based on the following embodiments.

The first embodiment of the present invention relates to a voice recognition method. The specific flow is shown in Fig. 1. The voice recognition method comprises the following steps.

Step 101: the terminal device extracts the text identifier of each switch in the display interface.

It is worth noting that each interface of a terminal device contains multiple icons, such as the calendar, clock, mail, news, and camera icons, and the icons in an interface are essentially switches. Operating an icon on the terminal device is essentially done by clicking a switch. In practice, the switches on an interface fall into two classes: text switches (button) and picture switches (imagebutton). Both button and imagebutton have a corresponding text identifier that describes the purpose of the switch. When this step of the voice recognition method is performed, the terminal device extracts the text identifier of each button or imagebutton in the interface.
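
As a minimal sketch of step 101, the snippet below models an interface as a list of switches and collects their text identifiers. The `Switch` class, its fields, and `extract_identifiers` are hypothetical names chosen for illustration; the source does not prescribe any data structure or platform API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Switch:
    """Hypothetical model of one switch on the display interface."""
    kind: str                    # "button" (text switch) or "imagebutton" (picture switch)
    identifier: str              # text identifier describing the switch's purpose
    on_click: Callable[[], str]  # click event bound to the switch


def extract_identifiers(interface: List[Switch]) -> Dict[str, Switch]:
    """Step 101: map each text identifier to the switch it describes."""
    return {switch.identifier: switch for switch in interface}


interface = [
    Switch("button", "calendar", lambda: "calendar opened"),
    Switch("imagebutton", "camera", lambda: "camera opened"),
]
identifiers = extract_identifiers(interface)
```

Keeping the identifiers in a mapping means that a later match can reach the click event of the switch directly, which is the property the method relies on.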

Step 102: the terminal device judges whether voice information is received. If so, proceed to step 103; otherwise, return to step 102.

It is worth noting that the terminal device has its own microphone, which can be used to receive voice information. When the user speaks into the microphone, the terminal device receives the user's voice through the microphone.

Step 103: the terminal device converts the voice information into text by a voice recognition technology.

Step 104: the terminal device matches the converted text against each extracted text identifier.

Step 105: the terminal device judges whether the matching is successful. If so, proceed to step 106; otherwise, end.

In addition, it is worth noting that when the terminal device judges that the matching is unsuccessful, it may issue a prompt message telling the user that the voice does not match and asking the user to re-enter it. This is because, in actual operation, the converted text may fail to match any of the extracted text identifiers for reasons such as inaccurate pronunciation by the user.

Step 106: the terminal device judges whether the number of successfully matched text identifiers is one. If so, proceed to step 107; otherwise, proceed to step 108.

Step 107: the terminal device directly calls the switch click event corresponding to the successfully matched text identifier.

It is worth noting that when we operate a terminal device, the operations are essentially no more than the following: clicking (including single click, double click, or long press), going back, swiping up, and swiping down. Therefore, when the switch click event corresponding to the successfully matched text identifier is called, operations such as a single click, double click, or long press on the corresponding button or imagebutton can be completed.
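
The idea of calling a click-type event directly can be sketched as follows. This is an illustrative model only: the `Gesture` enum, the `Switch` class, and the `invoke` method are hypothetical, and the source does not say how a spoken command selects between a single click, a double click, and a long press, so the gesture is passed as an explicit parameter here.

```python
from enum import Enum
from typing import Callable, Dict


class Gesture(Enum):
    CLICK = "click"
    DOUBLE_CLICK = "double click"
    LONG_PRESS = "long press"


class Switch:
    """Hypothetical switch that exposes one handler per click-type gesture."""

    def __init__(self, identifier: str) -> None:
        self.identifier = identifier
        # Each handler just reports what it did; a real handler would act on the UI.
        self.handlers: Dict[Gesture, Callable[[], str]] = {
            gesture: (lambda g=gesture: f"{identifier}: {g.value}")
            for gesture in Gesture
        }

    def invoke(self, gesture: Gesture = Gesture.CLICK) -> str:
        """Call the handler for the requested gesture directly, without touch input."""
        return self.handlers[gesture]()


camera = Switch("camera")
```

For example, `camera.invoke()` runs the single-click handler, and `camera.invoke(Gesture.LONG_PRESS)` runs the long-press handler.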

Step 108: the terminal device issues a prompt message.

It is worth noting that the prompt message issued by the terminal device prompts the user to lengthen the voice information and then re-enter the lengthened voice information, so as to further constrain the voice information and make it more precise.

In addition, it should be noted that after step 108 the method may return to step 102 and continue to judge whether voice information is received, or may end directly.

As can be seen from the above, in this embodiment the switch click event corresponding to a text identifier can be called directly for the text identifier of each switch, and complete remote voice operation of the terminal device can be achieved without building operation-type interfaces or a voice adaptation interface for each apk, which gives the method a wider range of application.
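
Steps 103 to 108 of the first embodiment can be sketched as a single function. This is a minimal sketch under stated assumptions, not the patented implementation: `handle_voice`, `find_matches`, and `fake_recognize` are hypothetical names, the speech recognizer is replaced by a stand-in that decodes bytes, and the matching rule (an identifier matches when it contains the converted text) is an assumption, since the source does not define how matching is performed.

```python
from typing import Callable, Dict, List


def find_matches(text: str, identifiers: List[str]) -> List[str]:
    """Step 104. Assumed rule: an identifier matches when it contains the text."""
    return [identifier for identifier in identifiers if text and text in identifier]


def handle_voice(voice: bytes,
                 recognize: Callable[[bytes], str],
                 click_events: Dict[str, Callable[[], str]]) -> str:
    text = recognize(voice).strip()                    # step 103: voice to text
    matches = find_matches(text, list(click_events))   # step 104: match identifiers
    if not matches:                                    # step 105: matching failed
        return "prompt: voice does not match, please re-enter"
    if len(matches) > 1:                               # step 106 -> step 108
        return "prompt: please lengthen the voice input"
    return click_events[matches[0]]()                  # step 107: call click event


def fake_recognize(voice: bytes) -> str:
    """Stand-in recognizer: pretends the audio bytes are already UTF-8 text."""
    return voice.decode("utf-8")


click_events = {
    "open camera": lambda: "camera opened",
    "open calendar": lambda: "calendar opened",
}
```

With these stand-ins, the input `b"open"` matches two identifiers and yields the lengthening prompt of step 108, while the longer input `b"open camera"` matches exactly one identifier and its click event is called.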

The second embodiment of the present invention relates to a voice recognition method. The second embodiment is an improvement on the basis of the first embodiment.

In this embodiment, the voice recognition method specifically comprises the following steps.

Step 201: the terminal device identifies the attribute of each switch in the display interface.

Step 202: the terminal device judges whether the attribute of the switch is button. If so, proceed to step 203; otherwise, the attribute of the switch is imagebutton, and the method proceeds to step 204.

Step 203: the terminal device obtains the text identifier corresponding to the button according to the text attribute of the button.

It is worth noting that the text attribute contains a text description, so the text identifier corresponding to the button can be obtained according to the text attribute of the button. This provides a specific way of obtaining the text identifier of a button, and because the text identifier is obtained from the text attribute of the button, the obtained text identifier is relatively accurate.

Step 204: judge whether the imagebutton is a unitary imagebutton. If so, proceed to step 205; otherwise, the imagebutton is a non-unitary imagebutton, and the method proceeds to step 207.

Specifically, whether the imagebutton is a unitary imagebutton is judged in the following manner: when the upper-layer encapsulation of the imagebutton contains at least two pictures, the imagebutton is a unitary imagebutton; when the upper-layer encapsulation of the imagebutton contains one picture, the imagebutton is a non-unitary imagebutton. This provides a specific way of judging whether an imagebutton is a unitary imagebutton, and judging by the number of pictures in the upper-layer encapsulation makes the result of the judgment relatively accurate.
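
The judgment in step 204 reduces to a comparison on the picture count, as the sketch below shows. The function name `is_unitary` is hypothetical, and the handling of a count of zero is an assumption, because the text only covers the cases of one picture and of at least two pictures.

```python
def is_unitary(picture_count: int) -> bool:
    """Judge an imagebutton by the number of pictures in its upper-layer encapsulation.

    Follows the rule as stated in the text: at least two pictures means a
    unitary imagebutton, exactly one picture means a non-unitary imagebutton.
    """
    if picture_count >= 2:
        return True
    if picture_count == 1:
        return False
    # Assumption: the source does not cover a count below one, so reject it.
    raise ValueError("upper-layer encapsulation must contain at least one picture")
```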

Step 205: find the corresponding picture according to the src attribute of the picture switch.

Step 206: recognize the text in the picture by OCR, and use the recognized text as the text identifier of the unitary imagebutton. Here, src denotes the path of the picture, and OCR denotes the picture recognition technology. This provides a specific way of obtaining the text identifier of a unitary imagebutton, and a text identifier obtained in this way is relatively accurate.

Step 207: obtain the upper-layer encapsulation of the non-unitary imagebutton.

Step 208: obtain the text attribute from the layout of the upper-layer encapsulation.

Step 209: obtain the text identifier corresponding to the non-unitary imagebutton according to the obtained text attribute. This provides a specific way of obtaining the text identifier of a non-unitary imagebutton, and a text identifier obtained in this way is relatively accurate.

It is worth noting that step 210 is performed after each of step 203, step 206, and step 209.
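
The three extraction paths of steps 202 to 209 can be sketched as one dispatch function. The dictionary keys (`kind`, `text`, `src`, `parent`, `picture_count`, `layout_text`) and the function names are hypothetical, and `fake_ocr` is a stand-in lookup table rather than a real OCR engine; the source does not name a specific OCR library or view API.

```python
from typing import Any, Callable, Dict


def extract_identifier(switch: Dict[str, Any], ocr: Callable[[str], str]) -> str:
    """Return the text identifier of one switch (steps 202 to 209)."""
    if switch["kind"] == "button":
        return switch["text"]              # step 203: use the button's text attribute
    parent = switch["parent"]              # upper-layer encapsulation of the imagebutton
    if parent["picture_count"] >= 2:       # step 204: unitary imagebutton
        return ocr(switch["src"])          # steps 205-206: OCR on the picture at src
    return parent["layout_text"]           # steps 207-209: text from the parent layout


# Stand-in for a real OCR engine: looks up pre-recognized text by picture path.
FAKE_OCR_RESULTS = {"res/camera.png": "camera"}


def fake_ocr(path: str) -> str:
    return FAKE_OCR_RESULTS[path]


switches = [
    {"kind": "button", "text": "settings"},
    {"kind": "imagebutton", "src": "res/camera.png", "parent": {"picture_count": 2}},
    {"kind": "imagebutton", "src": "res/icon.png",
     "parent": {"picture_count": 1, "layout_text": "mail"}},
]
identifiers = [extract_identifier(switch, fake_ocr) for switch in switches]
```

Passing the OCR function as a parameter keeps the sketch independent of any particular recognition engine.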

Step 210: the terminal device displays each obtained text identifier in the form of a data block.

Step 211: the terminal device judges whether voice information is received. If so, proceed to step 212; otherwise, return to step 211.

Step 212: the terminal device converts the voice information into text by a voice recognition technology.

Step 213: the terminal device matches the converted text against the text identifier in each data block. This provides a specific way of matching the converted text against each extracted text identifier, which further helps to ensure the feasibility of the invention.
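
Steps 210 and 213 can be sketched as follows. The source does not define the structure of a data block, so the `DataBlock` class (an index plus one text identifier) is an assumption, as are the function names and the matching rule, which here ignores case and surrounding whitespace and treats an identifier as matched when it contains the converted text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataBlock:
    """Hypothetical data block holding one extracted text identifier."""
    index: int
    identifier: str


def to_data_blocks(identifiers: List[str]) -> List[DataBlock]:
    """Step 210: wrap each extracted text identifier in its own data block."""
    return [DataBlock(index, text) for index, text in enumerate(identifiers)]


def match_blocks(text: str, blocks: List[DataBlock]) -> List[DataBlock]:
    """Step 213: compare the converted text with the identifier in each block."""
    wanted = text.strip().casefold()
    return [block for block in blocks
            if wanted and wanted in block.identifier.casefold()]


blocks = to_data_blocks(["Settings", "Send mail", "Send message"])
```

With these blocks, the text "settings" matches one block, so step 216 would apply, while "send" matches two blocks, which corresponds to the prompt in step 217.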

Step 214: the terminal device judges whether the matching is successful. If so, proceed to step 215; otherwise, end.

Step 215: the terminal device judges whether the number of successfully matched text identifiers is one. If so, proceed to step 216; otherwise, proceed to step 217.

Step 216: the terminal device directly calls the switch click event corresponding to the successfully matched text identifier.

Step 217: the terminal device issues a prompt message.

In addition, it should be noted that after step 217 the method may return to step 211 and continue to judge whether voice information is received, or may end directly.

The division of the above methods into steps is only for clarity of description. In implementation, steps may be merged into one step, or a step may be split into multiple steps; as long as the same logical relationship is included, they fall within the protection scope of this patent. Adding insignificant modifications to, or introducing insignificant designs into, an algorithm or flow without changing the core design of the algorithm and flow also falls within the protection scope of this patent.

The third embodiment of the present invention relates to a voice recognition device which, as shown in Fig. 3, comprises: a first extraction module 31, configured to extract the text identifier of each switch in a display interface; a first judging module 32, configured to judge whether voice information is received; a conversion module 33, configured to convert the voice information into text by a voice recognition technology when the voice information is received; a matching module 34, configured to match the converted text against each extracted text identifier; a second judging module 35, configured to judge whether the matching is successful; and a calling module 36, configured to directly call, when the matching is successful, the switch click event corresponding to the successfully matched text identifier.

This embodiment is the device embodiment corresponding to the first embodiment, and the two can be implemented in cooperation with each other. The relevant technical details mentioned in the first embodiment remain valid in this embodiment and are not repeated here, to reduce repetition. Correspondingly, the relevant technical details mentioned in this embodiment also apply to the first embodiment.

It is worth noting that each module involved in this embodiment is a logical module. In practice, a logical unit may be a physical unit, a part of a physical unit, or a combination of multiple physical units. In addition, to highlight the innovative part of the present invention, units less closely related to solving the technical problem raised by the present invention are not introduced in this embodiment, but this does not mean that no other units exist in this embodiment.
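
Because the modules of the third embodiment are logical modules, they can be sketched as methods of a single class. The class name, method names, and the `run` method that chains the modules are hypothetical, the recognizer is a stand-in passed in by the caller, and exact equality is assumed as the matching rule to keep the sketch short.

```python
from typing import Callable, Dict, List, Optional


class VoiceRecognitionDevice:
    """Logical modules 31 to 36 grouped in one class; all names are illustrative."""

    def __init__(self, recognize: Callable[[bytes], str],
                 click_events: Dict[str, Callable[[], str]]) -> None:
        self.recognize = recognize
        self.click_events = click_events

    def first_extraction_module(self) -> List[str]:                  # module 31
        return list(self.click_events)

    def first_judging_module(self, voice: Optional[bytes]) -> bool:  # module 32
        return bool(voice)

    def conversion_module(self, voice: bytes) -> str:                # module 33
        return self.recognize(voice)

    def matching_module(self, text: str, identifiers: List[str]) -> List[str]:  # module 34
        return [identifier for identifier in identifiers if identifier == text]

    def second_judging_module(self, matches: List[str]) -> bool:     # module 35
        return len(matches) > 0

    def calling_module(self, identifier: str) -> str:                # module 36
        return self.click_events[identifier]()

    def run(self, voice: Optional[bytes]) -> Optional[str]:
        """Chain the modules; returns None when nothing is called."""
        identifiers = self.first_extraction_module()
        if not self.first_judging_module(voice):
            return None
        matches = self.matching_module(self.conversion_module(voice), identifiers)
        if not self.second_judging_module(matches):
            return None
        return self.calling_module(matches[0])


device = VoiceRecognitionDevice(lambda voice: voice.decode("utf-8"),
                                {"camera": lambda: "camera opened"})
```

Calling `device.run(b"camera")` passes through all six modules and invokes the click event bound to the "camera" identifier.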

The fourth embodiment of the present invention relates to a voice recognition device. The fourth embodiment is an improvement on the basis of the third embodiment.

Specifically, as shown in Fig. 4, the voice recognition device comprises: a first extraction module 31, configured to extract the text identifier of each switch in a display interface; a first judging module 32, configured to judge whether voice information is received; a conversion module 33, configured to convert the voice information into text by a voice recognition technology when the voice information is received; a matching module 34, configured to match the converted text against each extracted text identifier; a second judging module 35, configured to judge whether the matching is successful; and a calling module 36, configured to directly call, when the matching is successful, the switch click event corresponding to the successfully matched text identifier.

In addition, the voice recognition device further comprises: a display module 37, configured to display each extracted text identifier in the form of a data block after the text identifier of each switch in the display interface is extracted and before the voice information is converted into text by a voice recognition technology when the voice information is received. When matching the converted text against each extracted text identifier, the matching module 34 specifically matches the converted text against the text identifier in each data block.

In addition, the voice recognition device further comprises: an identification module 38, configured to identify the attribute of each switch in the display interface before the text identifier of each switch in the display interface is extracted; and a second extraction module 39, configured to extract the attribute of the switch. The first extraction module 31 is configured to, when the attribute of the switch is a text switch (button), extract the text identifier of each switch in the display interface, specifically by obtaining the text identifier corresponding to the button according to the text attribute of the button.

In addition, the voice recognition device further comprises: an identification module 38, configured to identify the attribute of each switch in the display interface before the text identifier of each switch in the display interface is extracted; a second extraction module 39, configured to extract the attribute of the switch; and a third judging module 310, configured to judge, when the attribute of the switch is a picture switch (imagebutton), whether the imagebutton is a unitary imagebutton. The first extraction module 31 is configured to, when the imagebutton is a unitary imagebutton, extract the text identifier of each switch in the display interface, specifically by finding the corresponding picture according to the src (path) attribute of the picture switch, recognizing the text in the picture by OCR picture recognition technology, and using the recognized text as the text identifier of the unitary imagebutton. The first extraction module 31 is further configured to, when the imagebutton is a non-unitary imagebutton, extract the text identifier of each switch in the display interface, specifically by obtaining the upper-layer encapsulation of the non-unitary imagebutton, obtaining the text attribute from the layout of the upper-layer encapsulation, and obtaining the text identifier corresponding to the non-unitary imagebutton according to the obtained text attribute.

In addition, the third judging module 310 comprises: a judging submodule, configured to judge whether the upper-layer encapsulation of the imagebutton contains at least two pictures; and a determination submodule, configured to determine that the imagebutton is a unitary imagebutton when the upper-layer encapsulation of the imagebutton contains at least two pictures. The judging submodule is further configured to judge whether the upper-layer encapsulation of the imagebutton contains one picture, and the determination submodule is further configured to determine that the imagebutton is a non-unitary imagebutton when the upper-layer encapsulation of the imagebutton contains one picture.

Because the second embodiment and this embodiment correspond to each other, this embodiment can be implemented in cooperation with the second embodiment. The relevant technical details mentioned in the second embodiment remain valid in this embodiment, and the technical effects achieved in the second embodiment can likewise be achieved in this embodiment; to reduce repetition, they are not repeated here. Correspondingly, the relevant technical details mentioned in this embodiment also apply to the second embodiment.

Those skilled in the art will understand that all or part of the steps of the methods in the above embodiments may be completed by a program instructing the relevant hardware. The program is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Those of ordinary skill in the art will understand that the above embodiments are specific examples of realizing the present invention, and that in practice various changes in form and detail may be made to them without departing from the spirit and scope of the present invention.

Claims

1. a kind of audio recognition method, it is characterised in that including：

Extract the words identification of each switch in display interface；

When voice messaging is received, word is converted speech information into by speech recognition technology；

The word changed is matched with each described words identification for extracting；

When the match is successful, directly invoke the switch corresponding with the words identification that the match is successful and click on event.

2. The voice recognition method according to claim 1, characterized in that, after extracting the character identifier of each switch in the display interface and before converting the voice information into text by the voice recognition technology when the voice information is received, the voice recognition method further comprises:

displaying each extracted character identifier in the form of a data block;

wherein matching the converted text against each extracted character identifier is specifically: matching the converted text against the character identifier in each data block.
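
One possible reading of the data-block step is sketched below. The `DataBlock` type, the numbered text rendering, and the case-insensitive comparison are assumptions for illustration; the claim only requires that the identifiers be displayed as data blocks and that matching be performed against the identifier in each block.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DataBlock:
    """Hypothetical data block holding one extracted character identifier."""
    index: int
    identifier: str


def build_blocks(identifiers: List[str]) -> List[DataBlock]:
    """Wrap each extracted identifier in its own numbered data block."""
    return [DataBlock(i, text) for i, text in enumerate(identifiers, start=1)]


def render(blocks: List[DataBlock]) -> str:
    """Produce a plain-text listing of the blocks for display."""
    return "\n".join(f"[{b.index}] {b.identifier}" for b in blocks)


def match_block(spoken: str, blocks: List[DataBlock]) -> Optional[DataBlock]:
    """Compare the converted text with the identifier in each data block."""
    wanted = spoken.strip().lower()
    for block in blocks:
        if block.identifier.lower() == wanted:
            return block
    return None


blocks = build_blocks(["Play", "Pause"])
print(render(blocks))
```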

3. audio recognition method according to claim 1 and 2, it is characterised in that each is opened in the extraction display interface Before the words identification of pass, the audio recognition method also includes：

Attribute to each switch in display interface is identified；

When the attribute of the switch switchs button for text, the words identification of each switch in the extraction display interface, Specially：Text text attributes according to the button obtain the corresponding words identifications of the button.

4. The voice recognition method according to claim 1 or 2, characterized in that, before extracting the character identifier of each switch in the display interface, the voice recognition method further comprises:

identifying an attribute of each switch in the display interface;

when the attribute of the switch is a picture switch (imagebutton), judging whether the imagebutton is a unitary imagebutton;

when the imagebutton is a unitary imagebutton, extracting the character identifier of each switch in the display interface is specifically: finding the corresponding picture according to the path (src) attribute of the picture switch, recognizing the text in the picture by an optical character recognition (OCR) picture recognition technology, and using the recognized text as the character identifier of the unitary imagebutton;

when the imagebutton is a non-unitary imagebutton, extracting the character identifier of each switch in the display interface is specifically: obtaining the upper-layer encapsulation of the non-unitary imagebutton, obtaining a text attribute from the layout of the upper-layer encapsulation, and obtaining the character identifier corresponding to the non-unitary imagebutton according to the obtained text attribute.
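
The attribute-dependent extraction described in claims 3 and 4 might be sketched as below. Switches and their upper-layer encapsulation are modeled as plain dictionaries, and the OCR step is passed in as a callable; `extract_identifier`, `fake_ocr`, the dictionary keys, and the `is_unitary` flag are all hypothetical names for this sketch, not an actual platform API.

```python
from typing import Callable, Optional


def extract_identifier(switch: dict, parent: dict, is_unitary: bool,
                       ocr: Callable[[str], str]) -> Optional[str]:
    """Return the character identifier for one switch, or None if unsupported."""
    kind = switch.get("type")
    if kind == "button":
        # Text switch: read the identifier from its text attribute.
        return switch.get("text")
    if kind == "imagebutton":
        if is_unitary:
            # Unitary imagebutton: locate the picture via src, then OCR it.
            return ocr(switch["src"])
        # Non-unitary imagebutton: take the text attribute from the layout
        # of the upper-layer encapsulation.
        return parent.get("layout", {}).get("text")
    return None


def fake_ocr(path: str) -> str:
    """Stub OCR for demonstration only; a real system would call an OCR engine."""
    return {"img/play.png": "Play"}.get(path, "")


print(extract_identifier({"type": "button", "text": "OK"}, {}, False, fake_ocr))
```

Passing the OCR function as a parameter keeps the sketch independent of any particular recognition engine.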

5. The voice recognition method according to claim 4, characterized in that whether the imagebutton is a unitary imagebutton is judged in the following manner:

when there are at least two pictures in the upper-layer encapsulation of the imagebutton, the imagebutton is a unitary imagebutton;

when there is one picture in the upper-layer encapsulation of the imagebutton, the imagebutton is a non-unitary imagebutton.
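
The judgment rule of claim 5 can be written as a short function. This is a sketch under assumptions: the children of the upper-layer encapsulation are modeled as dictionaries, any child with a `src` key is counted as a picture, and the case of zero pictures, which the claim does not address, raises an error.

```python
from typing import List


def is_unitary_imagebutton(encapsulation_children: List[dict]) -> bool:
    """Apply the claim 5 rule to the children of the upper-layer encapsulation."""
    # Assumption: a child counts as a picture if it carries a src attribute.
    pictures = sum(1 for child in encapsulation_children if "src" in child)
    if pictures >= 2:
        return True      # at least two pictures: unitary imagebutton
    if pictures == 1:
        return False     # exactly one picture: non-unitary imagebutton
    raise ValueError("no picture found in the upper-layer encapsulation")


print(is_unitary_imagebutton([{"src": "a.png"}, {"src": "b.png"}]))
```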

6. A voice recognition device, characterized by comprising:

a first extraction module, configured to extract a character identifier of each switch in a display interface;

a first judging module, configured to judge whether voice information is received;

a conversion module, configured to convert the voice information into text by a voice recognition technology when the voice information is received;

a matching module, configured to match the converted text against each extracted character identifier;

a second judging module, configured to judge whether the matching succeeds; and

a calling module, configured to directly invoke, when the matching succeeds, the switch click event corresponding to the successfully matched character identifier.

7. The voice recognition device according to claim 6, characterized in that the voice recognition device further comprises:

a display module, configured to display each extracted character identifier in the form of a data block, after the character identifier of each switch in the display interface is extracted and before the voice information is converted into text by the voice recognition technology when the voice information is received;

wherein the matching module, when matching the converted text against each extracted character identifier, is specifically configured to match the converted text against the character identifier in each data block.

8. The voice recognition device according to claim 6 or 7, characterized in that the voice recognition device further comprises:

an identification module, configured to identify an attribute of each switch in the display interface before the character identifier of each switch in the display interface is extracted;

a second extraction module, configured to extract the attribute of the switch;

wherein the first extraction module, when the attribute of the switch is a text switch (button), extracts the character identifier of each switch in the display interface specifically by: obtaining the character identifier corresponding to the button according to the text attribute of the button.

9. The voice recognition device according to claim 6 or 7, characterized in that the voice recognition device further comprises:

a second extraction module, configured to extract the attribute of the switch;

a third judging module, configured to judge, when the attribute of the switch is a picture switch (imagebutton), whether the imagebutton is a unitary imagebutton;

wherein the first extraction module, when the imagebutton is a unitary imagebutton, extracts the character identifier of each switch in the display interface specifically by: finding the corresponding picture according to the path (src) attribute of the picture switch, recognizing the text in the picture by an optical character recognition (OCR) picture recognition technology, and using the recognized text as the character identifier of the unitary imagebutton;

and the first extraction module, when the imagebutton is a non-unitary imagebutton, extracts the character identifier of each switch in the display interface specifically by: obtaining the upper-layer encapsulation of the non-unitary imagebutton, obtaining a text attribute from the layout of the upper-layer encapsulation, and obtaining the character identifier corresponding to the non-unitary imagebutton according to the obtained text attribute.

10. The voice recognition device according to claim 9, characterized in that the third judging module comprises:

a judging submodule, configured to judge whether there are at least two pictures in the upper-layer encapsulation of the imagebutton;

a decision submodule, configured to determine that the imagebutton is a unitary imagebutton when there are at least two pictures in the upper-layer encapsulation of the imagebutton;

the judging submodule being further configured to judge whether there is one picture in the upper-layer encapsulation of the imagebutton;

the decision submodule being further configured to determine that the imagebutton is a non-unitary imagebutton when there is one picture in the upper-layer encapsulation of the imagebutton.