CN110047481A - Method for voice recognition and device - Google Patents
- Publication number: CN110047481A
- Application number: CN201910329635.9A
- Authority
- CN
- China
- Prior art keywords
- voice
- phonetic order
- keyword
- speech recognition
- segment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G10L15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L13/02 — Methods for producing synthetic speech; speech synthesisers
- G10L15/08 — Speech classification or search
- G10L15/26 — Speech-to-text systems
- G10L2015/088 — Word spotting
- G10L2015/223 — Execution procedure of a spoken command
Abstract
Embodiments of the disclosure provide a method and device for voice recognition. In one embodiment, the method includes: in response to receiving a first speech segment, matching the first speech segment against a predetermined keyword set; if the match succeeds, continuing to receive a second speech segment, merging the two segments, and performing speech recognition on the merged audio to obtain a recognition-result text; performing semantic analysis on the recognition-result text to obtain a voice instruction; and, if the voice instruction and the matched keyword belong to the same semantic domain, executing the voice instruction. This embodiment reduces an action that traditional voice interaction completes in two exchanges to a single one, and because the recognition process incorporates semantics, false triggering is effectively suppressed.
Description
Technical field
Embodiments of the disclosure relate to the field of computer technology, and in particular to a method and device for voice recognition.
Background technique
In a traditional speech-recognition human-computer-interaction scheme, the user must first say a wake-up keyword; only after the system determines that the user has a specific intention does it open the second stage of interaction, general speech recognition. By front-loading an offline keyword-recognition step, this approach effectively avoids the high CPU usage and data-traffic cost of always-on general speech recognition. It also brings a problem, however: every interaction must begin with a separate wake-up, which makes a supposedly intelligent product feel rigid and dull. A truly intelligent voice assistant should understand the user at any time and take the user straight to what is wanted.
Summary of the invention
Embodiments of the disclosure propose a method and device for voice recognition.
In a first aspect, embodiments of the disclosure provide a method for voice recognition, comprising: in response to receiving a first speech segment, matching the first speech segment against a predetermined keyword set; if the match succeeds, continuing to receive a second speech segment, merging the first and second speech segments, and performing speech recognition on the merged audio to obtain a recognition-result text; performing semantic analysis on the recognition-result text to obtain a voice instruction; and, if the voice instruction and the matched keyword belong to the same semantic domain, executing the voice instruction.
In some embodiments, matching the first speech segment against the predetermined keyword set comprises: converting the first speech segment into text; and matching the text against a predetermined keyword set in text form.
In some embodiments, the method further comprises: if the voice instruction and the matched keyword do not belong to the same semantic domain, discarding the voice instruction.
In some embodiments, the predetermined keyword set is obtained by extracting the common prefix words of voice instructions that occur more than a predetermined number of times.
In some embodiments, the length of each keyword in the keyword set is less than four.
In a second aspect, embodiments of the disclosure provide a device for speech recognition, comprising: a matching unit configured to, in response to receiving a first speech segment, match the first speech segment against a predetermined keyword set; a recognition unit configured to, if the match succeeds, continue to receive a second speech segment, merge the first and second speech segments, and perform speech recognition to obtain a recognition-result text; an analysis unit configured to perform semantic analysis on the recognition-result text to obtain a voice instruction; and an execution unit configured to execute the voice instruction if it belongs to the same semantic domain as the matched keyword.
In some embodiments, the matching unit is further configured to: convert the first speech segment into text; and match the text against a predetermined keyword set in text form.
In some embodiments, the execution unit is further configured to: discard the voice instruction if it does not belong to the same semantic domain as the matched keyword.
In some embodiments, the predetermined keyword set is obtained by extracting the common prefix words of voice instructions that occur more than a predetermined number of times.
In some embodiments, the length of each keyword in the keyword set is less than four.
In a third aspect, embodiments of the disclosure provide an electronic device, comprising: one or more processors; and a storage device storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement any of the methods of the first aspect.
In a fourth aspect, embodiments of the disclosure provide a computer-readable medium storing a computer program which, when executed by a processor, implements any of the methods of the first aspect.
The method and device for voice recognition provided by embodiments of the disclosure extract the common key information of a batch of high-frequency user instructions (for example, the common prefix words of navigation commands) and, for instructions in the designated semantic domains, reduce an action that traditional voice interaction completes in two exchanges to a single one. This overcomes the rigidity of the traditional interaction mode and makes the voice system more intelligent. By combining recognition with semantics, false triggering is effectively suppressed and can be reduced from about 10 times per hour to about 0.5 times per hour.
Brief description of the drawings
Other features, objects and advantages of the disclosure will become more apparent from the following detailed description of non-restrictive embodiments, read with reference to the accompanying drawings:
Fig. 1 is an exemplary system architecture to which an embodiment of the disclosure may be applied;
Fig. 2 is a flowchart of one embodiment of the method for voice recognition according to the disclosure;
Fig. 3 is a schematic diagram of an application scenario of the method for voice recognition according to the disclosure;
Fig. 4 is a flowchart of another embodiment of the method for voice recognition according to the disclosure;
Fig. 5 is a structural schematic diagram of one embodiment of the device for speech recognition according to the disclosure;
Fig. 6 is a structural schematic diagram of a computer system suitable for implementing an electronic device of an embodiment of the disclosure.
Detailed description of embodiments
The disclosure is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the related invention and do not limit it. It should also be noted that, for convenience of description, the drawings show only the parts relevant to the related invention.
It should be noted that, where there is no conflict, the embodiments of the disclosure and the features of the embodiments may be combined with one another. The disclosure is described in detail below with reference to the drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 in which embodiments of the method for voice recognition, or of the device for speech recognition, of the disclosure may be applied.
As shown in Fig. 1, the system architecture 100 may include a microphone 101, a controller 102, a speech-recognition server 103 and a semantic-understanding server 104. A network provides the medium of the communication links between the controller 102, the speech-recognition server 103 and the semantic-understanding server 104, and may include various connection types, such as wired or wireless communication links or fiber-optic cables.
A user may input voice to the controller 102 through the microphone 101. The controller 102 then interacts with the speech-recognition server 103 and the semantic-understanding server 104 over the network to receive or send messages. The microphone 101 may be a voice-input device mounted in a mobile apparatus such as a driverless vehicle, or the built-in microphone of a mobile phone, a computer or the like. The controller may likewise be a vehicle-mounted device or the built-in controller of a mobile phone, a computer or the like, and has the function of sending and receiving information.
The speech-recognition server 103 receives the voice sent by the controller 102 and converts the vocabulary content of the voice into computer-readable input, such as key presses, binary codes or character strings. This differs from speaker identification and speaker verification, which attempt to recognize or confirm the speaker rather than the vocabulary content of the speech. A speech-recognition system is installed on the speech-recognition server 103 and generally works in two stages, training and decoding. Training builds an acoustic model from a large amount of annotated voice data; decoding uses the acoustic model and a language model to transcribe voice data outside the training set into text. The quality of the trained acoustic model directly affects recognition accuracy.
The semantic-understanding server 104 receives the text result sent by the controller 102 and performs semantic analysis on it. Semantic analysis refers to the various methods of learning and understanding the semantic content of a piece of text; any understanding of language can be classed under semantic analysis. A piece of text is usually composed of words, sentences and paragraphs, and according to the linguistic unit being understood, semantic analysis can be further divided into word-level, sentence-level and discourse-level analysis. In general, word-level analysis is concerned with obtaining or distinguishing the meanings of words; sentence-level analysis tries to analyze the meaning expressed by an entire sentence; and discourse-level analysis studies the internal structure of a natural-language text and the semantic relations between its units (which may be clauses or paragraphs). Put simply, the goal of semantic analysis is to build effective models and systems that automatically analyze each linguistic unit (words, sentences, discourse and so on), so as to understand the true meaning expressed by the whole text.
It should be noted that the speech-recognition server 103 and the semantic-understanding server 104 may each be hardware or software. As hardware, each may be implemented as a distributed cluster of multiple servers or as a single server. As software, each may be implemented as multiple pieces of software or software modules (for example, to provide a distributed service) or as a single piece of software or a single module. No specific limitation is made here.
It should be noted that the method for voice recognition provided by embodiments of the disclosure is generally executed by the controller 102; correspondingly, the device for speech recognition is generally arranged in the controller 102.
It should be understood that the numbers of microphones, controllers, speech-recognition servers and semantic-understanding servers in Fig. 1 are merely illustrative; there may be any number of each, as the implementation requires.
With continued reference to Fig. 2, a flow 200 of one embodiment of the method for voice recognition according to the disclosure is shown. The method comprises the following steps:
Step 201: in response to receiving a first speech segment, match the first speech segment against a predetermined keyword set.
In this embodiment, the executing body of the method for voice recognition (for example the controller shown in Fig. 1) may obtain a continuous stream of speech frames from the microphone in real time over a wired or wireless connection, and may use existing voice wake-up technology to match the first speech segment against the predetermined keyword set.
Here the first speech segment means the speech frames following the voice start point. There may be a pause between the first and second speech segments: the user can be asked in advance to pause briefly after saying the keyword, so that the first segment can be detected as a whole and then matched. Alternatively, each speech frame can be matched against the keywords in real time until a complete keyword is matched, the frames used so far forming the first segment. This involves speech endpoint detection, i.e. detecting, in a noisy environment, where a person starts and stops speaking, namely the start point and the tail point of an utterance. In every round of speech recognition, the voice data must be segmented by endpoint detection before the recognition engine starts processing. Each time a speech frame is received, its average energy is computed and compared with a preset start-point threshold; if the average energy exceeds the threshold, the frame is taken as the start frame of the voice to be recognized. From the start frame on, speech frames are streamed to the recognition engine in real time, producing intermediate recognition results, rather than waiting for the tail point and sending the whole start-to-tail segment at once. The recognition engine, which may be local or in the cloud, performs the recognition and returns a text result. The speech-recognition process includes: voice input, endpoint detection, acoustic-feature extraction, signal processing, recognition-network matching, decoding, confidence judgment, and output of the recognized text.
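The start-point rule described above (compare each frame's average energy with a preset threshold) can be sketched as follows; the frame size and threshold value are illustrative assumptions, not values from the patent:

```python
import numpy as np

def find_start_frame(frames, start_threshold):
    """Return the index of the first frame whose average energy exceeds
    the preset start-point threshold, or None if speech has not begun."""
    for i, frame in enumerate(frames):
        avg_energy = float(np.mean(np.square(np.asarray(frame, dtype=np.float64))))
        if avg_energy > start_threshold:
            return i  # start frame of the voice to be recognized
    return None
```

In a streaming setting this check would run once per incoming frame, so recognition can begin the moment the start frame is found.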
The predetermined keyword set may be a set of keywords recorded in advance in speech form (for example, "I want to go / I want to listen / call"), or a keyword set in text form. For a speech-form set, the voiceprint feature of each keyword can be extracted in advance, and the voiceprint feature of the first speech segment is then compared with that of each keyword by similarity calculation. If the similarity between the first segment's voiceprint feature and some keyword's voiceprint feature exceeds a predetermined similarity threshold, the first segment is considered to match that keyword. If no keyword in the set has a similarity with the first segment above the threshold, matching fails: steps 202-204 are not executed, and the method keeps detecting voice, waiting for speech that matches a keyword.
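The voiceprint-similarity matching could look like the following sketch; using cosine similarity over fixed-length feature vectors and a 0.85 threshold are assumptions for illustration (the patent does not specify the feature type or the metric):

```python
import numpy as np

def match_keyword(segment_feature, keyword_features, sim_threshold=0.85):
    """Compare the first segment's voiceprint feature against each keyword's
    pre-extracted feature; return the best match above the threshold, or
    None, meaning the match failed and listening continues."""
    best_keyword, best_sim = None, sim_threshold
    for keyword, feature in keyword_features.items():
        sim = float(np.dot(segment_feature, feature) /
                    (np.linalg.norm(segment_feature) * np.linalg.norm(feature)))
        if sim > best_sim:
            best_keyword, best_sim = keyword, sim
    return best_keyword
```

Returning `None` corresponds to the failed-match branch: steps 202-204 are skipped and detection continues.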
In some optional implementations of this embodiment, matching the first speech segment against the predetermined keyword set comprises: converting the first speech segment into text, locally or in the cloud, and then matching the text against a predetermined keyword set in text form by text-similarity calculation. If the similarity between the converted text of the first segment and the text of some keyword exceeds a predetermined similarity threshold, the first segment is considered to match that keyword.
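A minimal sketch of the text-form variant, assuming the similarity test reduces to a prefix comparison of the converted text against the keyword strings (the patent only requires a similarity above a threshold):

```python
def match_keyword_text(recognized_text, keyword_set):
    """Match the ASR text of the first segment against the text-form keyword
    set; longer keywords are tried first so a short entry cannot shadow a
    longer one."""
    for keyword in sorted(keyword_set, key=len, reverse=True):
        if recognized_text.startswith(keyword):
            return keyword
    return None
```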
Step 202: if the match succeeds, continue to receive a second speech segment, merge the first and second speech segments, and perform speech recognition on the merged audio to obtain a recognition-result text.
In this embodiment, the second speech segment consists of the speech frames after the keyword is detected: its start frame follows the tail of the first segment, and its tail point is a speech frame whose average energy falls below a preset tail-point threshold. After the first speech frame, each received frame's average energy is compared with the tail-point threshold; a frame below the threshold may be a tail-point frame. The segment cannot be considered finished at that moment, however, since the user may only have paused briefly. A tail-point timeout is therefore set: if no frame with average energy above the tail-point threshold appears within the timeout, the utterance is taken to have ended and the second segment is complete. The first and second segments can then be merged and recognized to obtain the recognition-result text. For example, if the first segment is "I want to go" and the second segment is "the Forbidden City", recognition of the merged audio yields the text "I want to go to the Forbidden City".
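The tail-point rule with a timeout might be sketched like this; representing frames as plain sample lists and measuring the timeout in frames rather than milliseconds are assumptions for illustration:

```python
def find_tail_frame(frames, start_index, tail_threshold, timeout_frames):
    """Declare the utterance finished only after `timeout_frames` consecutive
    frames stay below the tail-point threshold, so a brief pause does not end
    the segment; return the index of the first frame of the final silence,
    or None if speech has not ended."""
    quiet = 0
    for i in range(start_index, len(frames)):
        avg_energy = sum(s * s for s in frames[i]) / len(frames[i])
        if avg_energy < tail_threshold:
            quiet += 1
            if quiet >= timeout_frames:
                return i - timeout_frames + 1
        else:
            quiet = 0  # speech resumed: the pause was only momentary
    return None
```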
Step 203: perform semantic analysis on the recognition-result text to obtain a voice instruction.
In this embodiment, the recognition-result text may be sent to the semantic-understanding server for semantic analysis to obtain the voice instruction. For example, analyzing "I want to go to the Forbidden City" yields the instruction "start navigation, destination: the Forbidden City".
Step 204: if the voice instruction and the matched keyword belong to the same semantic domain, execute the voice instruction.
In this embodiment, a semantic domain has been assigned in advance to each keyword in the keyword set. For example, the semantic domain of "I want to go" is navigation, that of "I want to listen" is music playback, and that of "call" is phoning. After the above steps, a single interaction can take the user straight to what is wanted. Doing only that is not enough, however. The keywords of traditional voice-interaction systems must be at least four words long, with large acoustic differences between the words; this ensures a sufficiently high wake-up rate while keeping the false-wake-up rate well suppressed. The present application breaks through that limitation: for common high-frequency instructions we extract the prefix words ("I want to go / I want to listen / call") and use this batch of prefix words as trigger conditions. Keywords of fewer than four words are easier to trigger falsely, and these factors increase the risk of false triggering. To suppress it, the application adds a semantic judgment: once a trigger fires, the recognized text is sent to the natural-language-understanding module, and after the semantics are parsed we can determine whether they fall into the keyword's preset semantic domain. If so, the human-machine-interface layer is notified to display the result or broadcast it by voice; if not, the result is discarded in the background and keyword monitoring restarts. To the user these actions are entirely imperceptible. Experience shows this judgment is necessary: without it, false triggering occurs about 10 times per hour; with it, about 0.5 times per hour.
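The semantic-domain check can be sketched as a simple lookup; the domain names and the keyword-to-domain table below are illustrative, built from the patent's own examples:

```python
# Illustrative keyword-to-domain table, following the examples in the text:
# "I want to go" -> navigation, "I want to listen" -> music, "call" -> phone.
KEYWORD_DOMAINS = {
    "I want to go": "navigation",
    "I want to listen": "music",
    "call": "phone",
}

def handle_instruction(instruction_domain, matched_keyword):
    """Execute only when the parsed instruction's semantic domain equals the
    triggering keyword's domain; otherwise discard it as a false trigger."""
    if KEYWORD_DOMAINS.get(matched_keyword) == instruction_domain:
        return "execute"
    return "discard"
```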
Optionally, if the voice instruction and the matched keyword do not belong to the same semantic domain, the instruction is discarded. For example, if the matched keyword is "call" but the recognized instruction is "navigate to Xizhimen", they do not belong to the same semantic domain; a false trigger has occurred, so the instruction is given up and not executed.
Continuing with Fig. 3, a schematic diagram of an application scenario of the method for voice recognition of this embodiment: the controller detects voice in real time, and when a voice start point is detected, it continuously matches the received speech frames against the keyword set until the keyword "I want to go" is matched. After the keyword triggers, traditional voice interaction would start general speech recognition and begin recording anew. The disclosure also starts general speech recognition at this moment, but with a difference: the recording from the keyword-recognition stage is reused, and recording continues. The recording is reused by tracing the pointer of the voice packet back to the start point of keyword recognition, which ensures that the keyword audio ("I want to go / I want to listen / call") is delivered in full to the general speech-recognition engine for processing. Semantic analysis of the resulting recognition text then yields a meaningful semantic result, i.e. the voice instruction. If the voice instruction and the matched keyword belong to the same semantic domain, the instruction is executed through the human-machine interface.
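The recording reuse could be sketched with a bounded frame buffer; the class name and buffer size are hypothetical, and "tracing the pointer back" is modeled as slicing the buffered frames from the keyword's start index:

```python
from collections import deque

class RecordingBuffer:
    """Hypothetical bounded buffer of recent audio frames: once a keyword
    triggers, everything from the keyword's start frame onward is replayed
    to the general ASR engine instead of starting a new recording."""

    def __init__(self, max_frames):
        self.frames = deque(maxlen=max_frames)

    def append(self, frame):
        self.frames.append(frame)

    def replay_from(self, keyword_start_index):
        # Roll back to the keyword start and hand over the buffered keyword
        # audio plus whatever frames followed it.
        return list(self.frames)[keyword_start_index:]
```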
By handling instructions of the designated semantic domains in this way, the method provided by the above embodiment of the disclosure reduces an action that traditional voice interaction completes in two exchanges to a single one, overcomes the rigidity of the traditional interaction mode, and makes the voice system more intelligent. Combined with semantics, it effectively suppresses false triggering.
With further reference to Fig. 4, a flow 400 of another embodiment of the method for voice recognition is shown. The flow 400 comprises the following steps:
Step 401: extract the common prefix words of voice instructions that occur more than a predetermined number of times to generate the predetermined keyword set.
In this embodiment, the executing body of the method (for example the controller shown in Fig. 1) may extract, from the voice instructions already executed, the common prefix words of those instructions occurring more than the predetermined number of times, and generate the predetermined keyword set from them. For example, if the common prefix "I want to go" occurs 1300 times and the predetermined threshold is 10, "I want to go" can be added to the keyword set. Traditional voice interaction uses a fixed keyword as the trigger condition, which makes human-computer interaction very stiff: every interaction must begin with a wake-up word such as "Xiaodu Xiaodu". In the disclosure, by presetting high-frequency semantic domains, the common prefix words of a batch of high-frequency instructions such as "I want to go / I want to listen / call XXX" are extracted directly and used as keywords for voice monitoring.
The keywords extracted as common prefixes are usually shorter than four words, and no requirement is placed on the acoustic distinctiveness of the individual words; false triggering is instead suppressed by the subsequent semantic judgment.
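Keyword generation in step 401 might be sketched as follows, assuming instructions are logged as strings and a fixed prefix length stands in for the "common prefix word" (in practice the prefix would be found by word- or character-level alignment of the logged instructions):

```python
from collections import Counter

def extract_prefix_keywords(instructions, min_count=10, prefix_len=3):
    """Count fixed-length prefixes of logged voice instructions and keep
    those occurring more than `min_count` times as trigger keywords."""
    counts = Counter(instruction[:prefix_len] for instruction in instructions)
    return {prefix for prefix, n in counts.items() if n > min_count}
```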
Step 402: in response to receiving a first speech segment, match the first speech segment against the predetermined keyword set.
Step 403: if the match succeeds, continue to receive a second speech segment, merge the first and second speech segments, and perform speech recognition to obtain a recognition-result text.
Step 404: perform semantic analysis on the recognition-result text to obtain a voice instruction.
Step 405: if the voice instruction and the matched keyword belong to the same semantic domain, execute the voice instruction.
Steps 402-405 are essentially identical to steps 201-204 and are not described again.
As can be seen from Fig. 4, compared with the embodiment corresponding to Fig. 2, the flow 400 of this embodiment adds the step of generating the keywords. The scheme described in this embodiment can therefore draw on more keyword-related data, achieving more comprehensive voice-instruction detection while reducing the number of voice interactions.
With further reference to Fig. 5, as an implementation of the methods shown in the figures above, the disclosure provides an embodiment of a device for speech recognition. This device embodiment corresponds to the method embodiment shown in Fig. 2, and the device may be applied in various electronic apparatuses.
As shown in Fig. 5, the device 500 for speech recognition of this embodiment includes a matching unit 501, a recognition unit 502, an analysis unit 503 and an execution unit 504. The matching unit 501 is configured to, in response to receiving a first speech segment, match the first speech segment against a predetermined keyword set. The recognition unit 502 is configured to, if the match succeeds, continue to receive a second speech segment, merge the first and second speech segments, and perform speech recognition to obtain a recognition-result text. The analysis unit 503 is configured to perform semantic analysis on the recognition-result text to obtain a voice instruction. The execution unit 504 is configured to execute the voice instruction if it belongs to the same semantic domain as the matched keyword.
In this embodiment, the specific processing of the matching unit 501, recognition unit 502, analysis unit 503 and execution unit 504 of the device 500 may refer to steps 201, 202, 203 and 204 in the embodiment corresponding to Fig. 2.
In some optional implementations of the present embodiment, the matching unit 501 is further configured to: convert the first voice segment into text information; and match the text information against the predetermined keyword set in text form.
In some optional implementations of the present embodiment, the execution unit 504 is further configured to: discard the voice instruction if the voice instruction and the matched keyword do not belong to the same semantic domain.
In some optional implementations of the present embodiment, the predetermined keyword set is obtained by extracting the common prefix words of voice instructions that occur more than a predetermined number of times.
In some optional implementations of the present embodiment, the length of each keyword in the keyword set is less than 4.
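The keyword-set construction in the optional implementations above can be sketched as follows. The instruction log and the thresholds are illustrative assumptions; the claim's "length less than 4" refers to keywords of fewer than four characters (written with Chinese characters in the original), so the length bound is loosened here for the English toy data.

```python
from collections import Counter

def build_keyword_set(instructions, min_count, max_len):
    """Sketch of building the predetermined keyword set: count the prefix
    (first) word of each historical voice instruction, then keep prefixes
    that occur more than `min_count` times and are shorter than `max_len`
    characters. The log and thresholds are hypothetical examples."""
    prefix_counts = Counter(
        inst.split()[0] for inst in instructions if inst.strip()
    )
    return {p for p, c in prefix_counts.items() if c > min_count and len(p) < max_len}

# Hypothetical instruction log: "play" occurs 4 times, "open" twice.
log = ["play music", "play news", "play radio", "play a song",
       "open maps", "open maps", "navigate home"]
print(build_keyword_set(log, min_count=2, max_len=5))  # -> {'play'}
```

Keeping only short, frequent prefixes keeps the first-segment match cheap, which matters because the matching unit runs on every incoming voice segment before full recognition starts.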
Referring now to Fig. 6, it shows a schematic structural diagram of an electronic device 600 (for example, the controller shown in Fig. 1) suitable for implementing embodiments of the present disclosure. The controller shown in Fig. 6 is only an example and should not impose any limitation on the functions or scope of use of embodiments of the present disclosure.
As shown in Fig. 6, the electronic device 600 may include a processing apparatus 601 (such as a central processing unit or a graphics processor), which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage apparatus 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the electronic device 600. The processing apparatus 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
In general, the following apparatuses may be connected to the I/O interface 605: an input apparatus 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, or gyroscope; an output apparatus 607 including, for example, a liquid crystal display (LCD), speaker, or vibrator; a storage apparatus 608 including, for example, a magnetic tape or hard disk; and a communication apparatus 609. The communication apparatus 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. Although Fig. 6 shows the electronic device 600 with various apparatuses, it should be understood that it is not required to implement or possess all of the apparatuses shown; more or fewer apparatuses may alternatively be implemented or possessed. Each block shown in Fig. 6 may represent one apparatus, or may represent multiple apparatuses as needed.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication apparatus 609, installed from the storage apparatus 608, or installed from the ROM 602. When the computer program is executed by the processing apparatus 601, the above-described functions defined in the method of the embodiments of the present disclosure are performed. It should be noted that the computer-readable medium described in embodiments of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In embodiments of the present disclosure, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in connection with an instruction execution system, apparatus, or device. In embodiments of the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; the computer-readable signal medium can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any suitable medium, including but not limited to: an electric wire, an optical cable, RF (radio frequency), or any suitable combination of the above.
The above computer-readable medium may be included in the above electronic device, or it may exist independently without being assembled into the electronic device. The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: in response to receiving a first voice segment, match the first voice segment against a predetermined keyword set; if the match succeeds, continue to receive a second voice segment, merge the first voice segment and the second voice segment, and perform speech recognition to obtain a speech recognition result text; perform semantic analysis on the speech recognition result text to obtain a voice instruction; and if the voice instruction and the matched keyword belong to the same semantic domain, execute the voice instruction.
The computer program code for performing the operations of embodiments of the present disclosure may be written in one or more programming languages or a combination thereof. The programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in embodiments of the present disclosure may be implemented by software or by hardware. The described units may also be provided in a processor, which may, for example, be described as: a processor including a matching unit, a recognition unit, an analysis unit, and an execution unit. The names of these units do not, under certain circumstances, constitute a limitation on the units themselves; for example, the matching unit may also be described as "a unit that, in response to receiving a first voice segment, matches the first voice segment against a predetermined keyword set".
The above description is merely a preferred embodiment of the present disclosure and an explanation of the applied technical principles. Those skilled in the art should understand that the scope of the invention involved in the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and, without departing from the inventive concept, should also cover other technical solutions formed by any combination of the above technical features or their equivalent features, for example, technical solutions formed by mutually replacing the above features with (but not limited to) technical features with similar functions disclosed in the present disclosure.
Claims (12)
1. A method for speech recognition, comprising:
in response to receiving a first voice segment, matching the first voice segment against a predetermined keyword set;
if the match succeeds, continuing to receive a second voice segment, merging the first voice segment and the second voice segment, and performing speech recognition on the merged voice to obtain a speech recognition result text;
performing semantic analysis on the speech recognition result text to obtain a voice instruction; and
if the voice instruction and the matched keyword belong to the same semantic domain, executing the voice instruction.
2. The method according to claim 1, wherein the matching the first voice segment against a predetermined keyword set comprises:
converting the first voice segment into text information; and
matching the text information against the predetermined keyword set in text form.
3. The method according to claim 1, wherein the method further comprises:
if the voice instruction and the matched keyword do not belong to the same semantic domain, discarding the voice instruction.
4. The method according to claim 1, wherein the predetermined keyword set is obtained by extracting common prefix words of voice instructions that occur more than a predetermined number of times.
5. The method according to any one of claims 1-4, wherein the length of each keyword in the keyword set is less than 4.
6. An apparatus for speech recognition, comprising:
a matching unit configured to, in response to receiving a first voice segment, match the first voice segment against a predetermined keyword set;
a recognition unit configured to, if the match succeeds, continue to receive a second voice segment, merge the first voice segment and the second voice segment, and perform speech recognition on the merged voice to obtain a speech recognition result text;
an analysis unit configured to perform semantic analysis on the speech recognition result text to obtain a voice instruction; and
an execution unit configured to execute the voice instruction if the voice instruction and the matched keyword belong to the same semantic domain.
7. The apparatus according to claim 6, wherein the matching unit is further configured to:
convert the first voice segment into text information; and
match the text information against the predetermined keyword set in text form.
8. The apparatus according to claim 6, wherein the execution unit is further configured to:
discard the voice instruction if the voice instruction and the matched keyword do not belong to the same semantic domain.
9. The apparatus according to claim 6, wherein the predetermined keyword set is obtained by extracting common prefix words of voice instructions that occur more than a predetermined number of times.
10. The apparatus according to any one of claims 6-9, wherein the length of each keyword in the keyword set is less than 4.
11. An electronic device, comprising:
one or more processors; and
a storage apparatus on which one or more programs are stored,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-5.
12. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1-5.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910329635.9A CN110047481B (en) | 2019-04-23 | 2019-04-23 | Method and apparatus for speech recognition |
CN202110684737.XA CN113327609B (en) | 2019-04-23 | 2019-04-23 | Method and apparatus for speech recognition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910329635.9A CN110047481B (en) | 2019-04-23 | 2019-04-23 | Method and apparatus for speech recognition |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110684737.XA Division CN113327609B (en) | 2019-04-23 | 2019-04-23 | Method and apparatus for speech recognition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110047481A true CN110047481A (en) | 2019-07-23 |
CN110047481B CN110047481B (en) | 2021-07-09 |
Family
ID=67278748
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110684737.XA Active CN113327609B (en) | 2019-04-23 | 2019-04-23 | Method and apparatus for speech recognition |
CN201910329635.9A Active CN110047481B (en) | 2019-04-23 | 2019-04-23 | Method and apparatus for speech recognition |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110684737.XA Active CN113327609B (en) | 2019-04-23 | 2019-04-23 | Method and apparatus for speech recognition |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN113327609B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114399992B (en) * | 2021-12-03 | 2022-12-06 | 北京百度网讯科技有限公司 | Voice instruction response method, device and storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102118886A (en) * | 2010-01-04 | 2011-07-06 | 中国移动通信集团公司 | Recognition method of voice information and equipment |
CN103219005A (en) * | 2013-04-28 | 2013-07-24 | 北京云知声信息技术有限公司 | Speech recognition method and device |
CN103593230A (en) * | 2012-08-13 | 2014-02-19 | 百度在线网络技术(北京)有限公司 | Background task control method of mobile terminal and mobile terminal |
US20160086606A1 (en) * | 2011-01-05 | 2016-03-24 | Interactions Llc | Automated Speech Recognition Proxy System for Natural Language Understanding |
CN107146618A (en) * | 2017-06-16 | 2017-09-08 | 北京云知声信息技术有限公司 | Method of speech processing and device |
CN107195303A (en) * | 2017-06-16 | 2017-09-22 | 北京云知声信息技术有限公司 | Method of speech processing and device |
CN108881466A (en) * | 2018-07-04 | 2018-11-23 | 百度在线网络技术(北京)有限公司 | Exchange method and device |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103559289B (en) * | 2013-11-08 | 2017-01-18 | 科大讯飞股份有限公司 | Language-irrelevant keyword search method and system |
CN104901926A (en) * | 2014-03-06 | 2015-09-09 | 武汉元宝创意科技有限公司 | Voiceprint feature based remote authentication payment system and method |
CN104110884A (en) * | 2014-03-14 | 2014-10-22 | 芜湖美的厨卫电器制造有限公司 | Water heater and control method thereof |
US20170116994A1 (en) * | 2015-10-26 | 2017-04-27 | Le Holdings(Beijing)Co., Ltd. | Voice-awaking method, electronic device and storage medium |
CN106250474B (en) * | 2016-07-29 | 2020-06-23 | Tcl科技集团股份有限公司 | Voice control processing method and system |
CN108962235B (en) * | 2017-12-27 | 2021-09-17 | 北京猎户星空科技有限公司 | Voice interaction method and device |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110706696A (en) * | 2019-09-25 | 2020-01-17 | 珠海格力电器股份有限公司 | Voice control method and device |
CN110808031A (en) * | 2019-11-22 | 2020-02-18 | 大众问问(北京)信息科技有限公司 | Voice recognition method and device and computer equipment |
CN111640434A (en) * | 2020-06-05 | 2020-09-08 | 三星电子(中国)研发中心 | Method and apparatus for controlling voice device |
CN112017647A (en) * | 2020-09-04 | 2020-12-01 | 北京蓦然认知科技有限公司 | Semantic-combined speech recognition method, device and system |
CN112017647B (en) * | 2020-09-04 | 2024-05-03 | 深圳海冰科技有限公司 | Semantic-combined voice recognition method, device and system |
CN112463939A (en) * | 2020-11-12 | 2021-03-09 | 深圳市欢太科技有限公司 | Man-machine conversation method, system, service device and computer storage medium |
CN112463939B (en) * | 2020-11-12 | 2024-05-24 | 深圳市欢太科技有限公司 | Man-machine conversation method, system, service equipment and computer storage medium |
CN112201246A (en) * | 2020-11-19 | 2021-01-08 | 深圳市欧瑞博科技股份有限公司 | Intelligent control method and device based on voice, electronic equipment and storage medium |
CN112201246B (en) * | 2020-11-19 | 2023-11-28 | 深圳市欧瑞博科技股份有限公司 | Intelligent control method and device based on voice, electronic equipment and storage medium |
CN112466304B (en) * | 2020-12-03 | 2023-09-08 | 北京百度网讯科技有限公司 | Offline voice interaction method, device, system, equipment and storage medium |
CN112466304A (en) * | 2020-12-03 | 2021-03-09 | 北京百度网讯科技有限公司 | Offline voice interaction method, device, system, equipment and storage medium |
CN112466289A (en) * | 2020-12-21 | 2021-03-09 | 北京百度网讯科技有限公司 | Voice instruction recognition method and device, voice equipment and storage medium |
CN113611294A (en) * | 2021-06-30 | 2021-11-05 | 展讯通信(上海)有限公司 | Voice wake-up method, apparatus, device and medium |
Also Published As
Publication number | Publication date |
---|---|
CN110047481B (en) | 2021-07-09 |
CN113327609B (en) | 2022-06-28 |
CN113327609A (en) | 2021-08-31 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | |
Effective date of registration: 20211009
Address after: 100176 101, floor 1, building 1, yard 7, Ruihe West 2nd Road, Economic and Technological Development Zone, Daxing District, Beijing
Patentee after: Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd.
Address before: 100085 third floor, Baidu building, No. 10, Shangdi 10th Street, Haidian District, Beijing
Patentee before: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) Co., Ltd.