CN110047481A - Method for voice recognition and device - Google Patents
- Publication number: CN110047481A
- Application number: CN201910329635.9A
- Authority
- CN
- China
- Prior art keywords
- voice
- phonetic order
- keyword
- speech recognition
- segment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G10L15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L13/02 — Methods for producing synthetic speech; speech synthesisers
- G10L15/08 — Speech classification or search
- G10L15/26 — Speech-to-text systems
- G10L2015/088 — Word spotting
- G10L2015/223 — Execution procedure of a spoken command
Abstract
Embodiments of the disclosure provide a method and device for voice recognition. In one embodiment, the method includes: in response to receiving a first speech segment, matching the first speech segment against a predetermined keyword set; if the match succeeds, continuing to receive a second speech segment, merging the two segments, and performing speech recognition on the merged audio to obtain a recognition-result text; performing semantic analysis on the recognition-result text to obtain a voice instruction; and, if the voice instruction and the matched keyword belong to the same semantic domain, executing the voice instruction. This embodiment reduces an action that traditional voice interaction completes in two exchanges to a single one, and because the recognition process incorporates semantics, false triggering is effectively suppressed.
Description
Technical field
Embodiments of the disclosure relate to the field of computer technology, and in particular to a method and device for voice recognition.
Background technique
In a traditional speech-recognition human-computer-interaction scheme, the user must first say a wake-up keyword; only after the system determines that the user has a specific intention does it open the second stage of interaction, general speech recognition. By front-loading an offline keyword-recognition step, this approach effectively avoids the high CPU usage and data-traffic cost of always-on general speech recognition. It also brings a problem, however: every interaction must begin with a separate wake-up, which makes a supposedly intelligent product feel rigid and dull. A truly intelligent voice assistant should understand the user at any time and take the user straight to what is wanted.
Summary of the invention
Embodiments of the disclosure propose a method and device for voice recognition.
In a first aspect, embodiments of the disclosure provide a method for voice recognition, comprising: in response to receiving a first speech segment, matching the first speech segment against a predetermined keyword set; if the match succeeds, continuing to receive a second speech segment, merging the first and second speech segments, and performing speech recognition on the merged audio to obtain a recognition-result text; performing semantic analysis on the recognition-result text to obtain a voice instruction; and, if the voice instruction and the matched keyword belong to the same semantic domain, executing the voice instruction.
In some embodiments, matching the first speech segment against the predetermined keyword set comprises: converting the first speech segment into text; and matching the text against a predetermined keyword set in text form.
In some embodiments, the method further comprises: if the voice instruction and the matched keyword do not belong to the same semantic domain, discarding the voice instruction.
In some embodiments, the predetermined keyword set is obtained by extracting the common prefix words of voice instructions that occur more than a predetermined number of times.
In some embodiments, the length of each keyword in the keyword set is less than four.
In a second aspect, embodiments of the disclosure provide a device for speech recognition, comprising: a matching unit configured to, in response to receiving a first speech segment, match the first speech segment against a predetermined keyword set; a recognition unit configured to, if the match succeeds, continue to receive a second speech segment, merge the first and second speech segments, and perform speech recognition to obtain a recognition-result text; an analysis unit configured to perform semantic analysis on the recognition-result text to obtain a voice instruction; and an execution unit configured to execute the voice instruction if it belongs to the same semantic domain as the matched keyword.
In some embodiments, the matching unit is further configured to: convert the first speech segment into text; and match the text against a predetermined keyword set in text form.
In some embodiments, the execution unit is further configured to: discard the voice instruction if it does not belong to the same semantic domain as the matched keyword.
In some embodiments, the predetermined keyword set is obtained by extracting the common prefix words of voice instructions that occur more than a predetermined number of times.
In some embodiments, the length of each keyword in the keyword set is less than four.
In a third aspect, embodiments of the disclosure provide an electronic device, comprising: one or more processors; and a storage device storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement any of the methods of the first aspect.
In a fourth aspect, embodiments of the disclosure provide a computer-readable medium storing a computer program which, when executed by a processor, implements any of the methods of the first aspect.
The method and device for voice recognition provided by embodiments of the disclosure extract the common key information of a batch of high-frequency user instructions (for example, the common prefix words of navigation commands) and, for instructions in the designated semantic domains, reduce an action that traditional voice interaction completes in two exchanges to a single one. This overcomes the rigidity of the traditional interaction mode and makes the voice system more intelligent. By combining recognition with semantics, false triggering is effectively suppressed and can be reduced from about 10 times per hour to about 0.5 times per hour.
Brief description of the drawings
Other features, objects and advantages of the disclosure will become more apparent from the following detailed description of non-restrictive embodiments, read with reference to the accompanying drawings:
Fig. 1 is an exemplary system architecture to which an embodiment of the disclosure may be applied;
Fig. 2 is a flowchart of one embodiment of the method for voice recognition according to the disclosure;
Fig. 3 is a schematic diagram of an application scenario of the method for voice recognition according to the disclosure;
Fig. 4 is a flowchart of another embodiment of the method for voice recognition according to the disclosure;
Fig. 5 is a structural schematic diagram of one embodiment of the device for speech recognition according to the disclosure;
Fig. 6 is a structural schematic diagram of a computer system suitable for implementing an electronic device of an embodiment of the disclosure.
Detailed description of embodiments
The disclosure is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the related invention and do not limit it. It should also be noted that, for convenience of description, the drawings show only the parts relevant to the related invention.
It should be noted that, where there is no conflict, the embodiments of the disclosure and the features of the embodiments may be combined with one another. The disclosure is described in detail below with reference to the drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 in which embodiments of the method for voice recognition, or of the device for speech recognition, of the disclosure may be applied.
As shown in Fig. 1, the system architecture 100 may include a microphone 101, a controller 102, a speech-recognition server 103 and a semantic-understanding server 104. A network provides the medium of the communication links between the controller 102, the speech-recognition server 103 and the semantic-understanding server 104, and may include various connection types, such as wired or wireless communication links or fiber-optic cables.
A user may input voice to the controller 102 through the microphone 101. The controller 102 then interacts with the speech-recognition server 103 and the semantic-understanding server 104 over the network to receive or send messages. The microphone 101 may be a voice-input device mounted in a mobile apparatus such as a driverless vehicle, or the built-in microphone of a mobile phone, a computer or the like. The controller may likewise be a vehicle-mounted device or the built-in controller of a mobile phone, a computer or the like, and has the function of sending and receiving information.
The speech-recognition server 103 receives the voice sent by the controller 102 and converts the vocabulary content of the voice into computer-readable input, such as key presses, binary codes or character strings. This differs from speaker identification and speaker verification, which attempt to recognize or confirm the speaker rather than the vocabulary content of the speech. A speech-recognition system is installed on the speech-recognition server 103 and generally works in two stages, training and decoding. Training builds an acoustic model from a large amount of annotated voice data; decoding uses the acoustic model and a language model to transcribe voice data outside the training set into text. The quality of the trained acoustic model directly affects recognition accuracy.
The semantic-understanding server 104 receives the text result sent by the controller 102 and performs semantic analysis on it. Semantic analysis refers to the various methods of learning and understanding the semantic content of a piece of text; any understanding of language can be classed under semantic analysis. A piece of text is usually composed of words, sentences and paragraphs, and according to the linguistic unit being understood, semantic analysis can be further divided into word-level, sentence-level and discourse-level analysis. In general, word-level analysis is concerned with obtaining or distinguishing the meanings of words; sentence-level analysis tries to analyze the meaning expressed by an entire sentence; and discourse-level analysis studies the internal structure of a natural-language text and the semantic relations between its units (which may be clauses or paragraphs). Put simply, the goal of semantic analysis is to build effective models and systems that automatically analyze each linguistic unit (words, sentences, discourse and so on), so as to understand the true meaning expressed by the whole text.
It should be noted that the speech-recognition server 103 and the semantic-understanding server 104 may each be hardware or software. As hardware, each may be implemented as a distributed cluster of multiple servers or as a single server. As software, each may be implemented as multiple pieces of software or software modules (for example, to provide a distributed service) or as a single piece of software or a single module. No specific limitation is made here.
It should be noted that the method for voice recognition provided by embodiments of the disclosure is generally executed by the controller 102; correspondingly, the device for speech recognition is generally arranged in the controller 102.
It should be understood that the numbers of microphones, controllers, speech-recognition servers and semantic-understanding servers in Fig. 1 are merely illustrative; there may be any number of each, as the implementation requires.
With continued reference to Fig. 2, a flow 200 of one embodiment of the method for voice recognition according to the disclosure is shown. The method comprises the following steps:
Step 201: in response to receiving a first speech segment, match the first speech segment against a predetermined keyword set.
In this embodiment, the executing body of the method for voice recognition (for example the controller shown in Fig. 1) may obtain a continuous stream of speech frames from the microphone in real time over a wired or wireless connection, and may use existing voice wake-up technology to match the first speech segment against the predetermined keyword set.
Here the first speech segment means the speech frames following the voice start point. There may be a pause between the first and second speech segments: the user can be asked in advance to pause briefly after saying the keyword, so that the first segment can be detected as a whole and then matched. Alternatively, each speech frame can be matched against the keywords in real time until a complete keyword is matched, the frames used so far forming the first segment. This involves speech endpoint detection, i.e. detecting, in a noisy environment, where a person starts and stops speaking, namely the start point and the tail point of an utterance. In every round of speech recognition, the voice data must be segmented by endpoint detection before the recognition engine starts processing. Each time a speech frame is received, its average energy is computed and compared with a preset start-point threshold; if the average energy exceeds the threshold, the frame is taken as the start frame of the voice to be recognized. From the start frame on, speech frames are streamed to the recognition engine in real time, producing intermediate recognition results, rather than waiting for the tail point and sending the whole start-to-tail segment at once. The recognition engine, which may be local or in the cloud, performs the recognition and returns a text result. The speech-recognition process includes: voice input, endpoint detection, acoustic-feature extraction, signal processing, recognition-network matching, decoding, confidence judgment, and output of the recognized text.
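The start-point rule described above (compare each frame's average energy with a preset threshold) can be sketched as follows; the frame size and threshold value are illustrative assumptions, not values from the patent:

```python
import numpy as np

def find_start_frame(frames, start_threshold):
    """Return the index of the first frame whose average energy exceeds
    the preset start-point threshold, or None if speech has not begun."""
    for i, frame in enumerate(frames):
        avg_energy = float(np.mean(np.square(np.asarray(frame, dtype=np.float64))))
        if avg_energy > start_threshold:
            return i  # start frame of the voice to be recognized
    return None
```

In a streaming setting this check would run once per incoming frame, so recognition can begin the moment the start frame is found.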
The predetermined keyword set may be a set of keywords recorded in advance in speech form (for example, "I want to go / I want to listen / call"), or a keyword set in text form. For a speech-form set, the voiceprint feature of each keyword can be extracted in advance, and the voiceprint feature of the first speech segment is then compared with that of each keyword by similarity calculation. If the similarity between the first segment's voiceprint feature and some keyword's voiceprint feature exceeds a predetermined similarity threshold, the first segment is considered to match that keyword. If no keyword in the set has a similarity with the first segment above the threshold, matching fails: steps 202-204 are not executed, and the method keeps detecting voice, waiting for speech that matches a keyword.
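The voiceprint-similarity matching could look like the following sketch; using cosine similarity over fixed-length feature vectors and a 0.85 threshold are assumptions for illustration (the patent does not specify the feature type or the metric):

```python
import numpy as np

def match_keyword(segment_feature, keyword_features, sim_threshold=0.85):
    """Compare the first segment's voiceprint feature against each keyword's
    pre-extracted feature; return the best match above the threshold, or
    None, meaning the match failed and listening continues."""
    best_keyword, best_sim = None, sim_threshold
    for keyword, feature in keyword_features.items():
        sim = float(np.dot(segment_feature, feature) /
                    (np.linalg.norm(segment_feature) * np.linalg.norm(feature)))
        if sim > best_sim:
            best_keyword, best_sim = keyword, sim
    return best_keyword
```

Returning `None` corresponds to the failed-match branch: steps 202-204 are skipped and detection continues.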
In some optional implementations of this embodiment, matching the first speech segment against the predetermined keyword set comprises: converting the first speech segment into text, locally or in the cloud, and then matching the text against a predetermined keyword set in text form by text-similarity calculation. If the similarity between the converted text of the first segment and the text of some keyword exceeds a predetermined similarity threshold, the first segment is considered to match that keyword.
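A minimal sketch of the text-form variant, assuming the similarity test reduces to a prefix comparison of the converted text against the keyword strings (the patent only requires a similarity above a threshold):

```python
def match_keyword_text(recognized_text, keyword_set):
    """Match the ASR text of the first segment against the text-form keyword
    set; longer keywords are tried first so a short entry cannot shadow a
    longer one."""
    for keyword in sorted(keyword_set, key=len, reverse=True):
        if recognized_text.startswith(keyword):
            return keyword
    return None
```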
Step 202: if the match succeeds, continue to receive a second speech segment, merge the first and second speech segments, and perform speech recognition on the merged audio to obtain a recognition-result text.
In this embodiment, the second speech segment consists of the speech frames after the keyword is detected: its start frame follows the tail of the first segment, and its tail point is a speech frame whose average energy falls below a preset tail-point threshold. After the first speech frame, each received frame's average energy is compared with the tail-point threshold; a frame below the threshold may be a tail-point frame. The segment cannot be considered finished at that moment, however, since the user may only have paused briefly. A tail-point timeout is therefore set: if no frame with average energy above the tail-point threshold appears within the timeout, the utterance is taken to have ended and the second segment is complete. The first and second segments can then be merged and recognized to obtain the recognition-result text. For example, if the first segment is "I want to go" and the second segment is "the Forbidden City", recognition of the merged audio yields the text "I want to go to the Forbidden City".
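The tail-point rule with a timeout might be sketched like this; representing frames as plain sample lists and measuring the timeout in frames rather than milliseconds are assumptions for illustration:

```python
def find_tail_frame(frames, start_index, tail_threshold, timeout_frames):
    """Declare the utterance finished only after `timeout_frames` consecutive
    frames stay below the tail-point threshold, so a brief pause does not end
    the segment; return the index of the first frame of the final silence,
    or None if speech has not ended."""
    quiet = 0
    for i in range(start_index, len(frames)):
        avg_energy = sum(s * s for s in frames[i]) / len(frames[i])
        if avg_energy < tail_threshold:
            quiet += 1
            if quiet >= timeout_frames:
                return i - timeout_frames + 1
        else:
            quiet = 0  # speech resumed: the pause was only momentary
    return None
```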
Step 203: perform semantic analysis on the recognition-result text to obtain a voice instruction.
In this embodiment, the recognition-result text may be sent to the semantic-understanding server for semantic analysis to obtain the voice instruction. For example, analyzing "I want to go to the Forbidden City" yields the instruction "start navigation, destination: the Forbidden City".
Step 204: if the voice instruction and the matched keyword belong to the same semantic domain, execute the voice instruction.
In this embodiment, a semantic domain has been assigned in advance to each keyword in the keyword set. For example, the semantic domain of "I want to go" is navigation, that of "I want to listen" is music playback, and that of "call" is phoning. After the above steps, a single interaction can take the user straight to what is wanted. Doing only that is not enough, however. The keywords of traditional voice-interaction systems must be at least four words long, with large acoustic differences between the words; this ensures a sufficiently high wake-up rate while keeping the false-wake-up rate well suppressed. The present application breaks through that limitation: for common high-frequency instructions we extract the prefix words ("I want to go / I want to listen / call") and use this batch of prefix words as trigger conditions. Keywords of fewer than four words are easier to trigger falsely, and these factors increase the risk of false triggering. To suppress it, the application adds a semantic judgment: once a trigger fires, the recognized text is sent to the natural-language-understanding module, and after the semantics are parsed we can determine whether they fall into the keyword's preset semantic domain. If so, the human-machine-interface layer is notified to display the result or broadcast it by voice; if not, the result is discarded in the background and keyword monitoring restarts. To the user these actions are entirely imperceptible. Experience shows this judgment is necessary: without it, false triggering occurs about 10 times per hour; with it, about 0.5 times per hour.
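The semantic-domain check can be sketched as a simple lookup; the domain names and the keyword-to-domain table below are illustrative, built from the patent's own examples:

```python
# Illustrative keyword-to-domain table, following the examples in the text:
# "I want to go" -> navigation, "I want to listen" -> music, "call" -> phone.
KEYWORD_DOMAINS = {
    "I want to go": "navigation",
    "I want to listen": "music",
    "call": "phone",
}

def handle_instruction(instruction_domain, matched_keyword):
    """Execute only when the parsed instruction's semantic domain equals the
    triggering keyword's domain; otherwise discard it as a false trigger."""
    if KEYWORD_DOMAINS.get(matched_keyword) == instruction_domain:
        return "execute"
    return "discard"
```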
Optionally, if the voice instruction and the matched keyword do not belong to the same semantic domain, the instruction is discarded. For example, if the matched keyword is "call" but the recognized instruction is "navigate to Xizhimen", they do not belong to the same semantic domain; a false trigger has occurred, so the instruction is given up and not executed.
Continuing with Fig. 3, a schematic diagram of an application scenario of the method for voice recognition of this embodiment: the controller detects voice in real time, and when a voice start point is detected, it continuously matches the received speech frames against the keyword set until the keyword "I want to go" is matched. After the keyword triggers, traditional voice interaction would start general speech recognition and begin recording anew. The disclosure also starts general speech recognition at this moment, but with a difference: the recording from the keyword-recognition stage is reused, and recording continues. The recording is reused by tracing the pointer of the voice packet back to the start point of keyword recognition, which ensures that the keyword audio ("I want to go / I want to listen / call") is delivered in full to the general speech-recognition engine for processing. Semantic analysis of the resulting recognition text then yields a meaningful semantic result, i.e. the voice instruction. If the voice instruction and the matched keyword belong to the same semantic domain, the instruction is executed through the human-machine interface.
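The recording reuse could be sketched with a bounded frame buffer; the class name and buffer size are hypothetical, and "tracing the pointer back" is modeled as slicing the buffered frames from the keyword's start index:

```python
from collections import deque

class RecordingBuffer:
    """Hypothetical bounded buffer of recent audio frames: once a keyword
    triggers, everything from the keyword's start frame onward is replayed
    to the general ASR engine instead of starting a new recording."""

    def __init__(self, max_frames):
        self.frames = deque(maxlen=max_frames)

    def append(self, frame):
        self.frames.append(frame)

    def replay_from(self, keyword_start_index):
        # Roll back to the keyword start and hand over the buffered keyword
        # audio plus whatever frames followed it.
        return list(self.frames)[keyword_start_index:]
```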
By handling instructions of the designated semantic domains in this way, the method provided by the above embodiment of the disclosure reduces an action that traditional voice interaction completes in two exchanges to a single one, overcomes the rigidity of the traditional interaction mode, and makes the voice system more intelligent. Combined with semantics, it effectively suppresses false triggering.
With further reference to Fig. 4, a flow 400 of another embodiment of the method for voice recognition is shown. The flow 400 comprises the following steps:
Step 401: extract the common prefix words of voice instructions that occur more than a predetermined number of times to generate the predetermined keyword set.
In this embodiment, the executing body of the method (for example the controller shown in Fig. 1) may extract, from the voice instructions already executed, the common prefix words of those instructions occurring more than the predetermined number of times, and generate the predetermined keyword set from them. For example, if the common prefix "I want to go" occurs 1300 times and the predetermined threshold is 10, "I want to go" can be added to the keyword set. Traditional voice interaction uses a fixed keyword as the trigger condition, which makes human-computer interaction very stiff: every interaction must begin with a wake-up word such as "Xiaodu Xiaodu". In the disclosure, by presetting high-frequency semantic domains, the common prefix words of a batch of high-frequency instructions such as "I want to go / I want to listen / call XXX" are extracted directly and used as keywords for voice monitoring.
The keywords extracted as common prefixes are usually shorter than four words, and no requirement is placed on the acoustic distinctiveness of the individual words; false triggering is instead suppressed by the subsequent semantic judgment.
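Keyword generation in step 401 might be sketched as follows, assuming instructions are logged as strings and a fixed prefix length stands in for the "common prefix word" (in practice the prefix would be found by word- or character-level alignment of the logged instructions):

```python
from collections import Counter

def extract_prefix_keywords(instructions, min_count=10, prefix_len=3):
    """Count fixed-length prefixes of logged voice instructions and keep
    those occurring more than `min_count` times as trigger keywords."""
    counts = Counter(instruction[:prefix_len] for instruction in instructions)
    return {prefix for prefix, n in counts.items() if n > min_count}
```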
Step 402: in response to receiving a first speech segment, match the first speech segment against the predetermined keyword set.
Step 403: if the match succeeds, continue to receive a second speech segment, merge the first and second speech segments, and perform speech recognition to obtain a recognition-result text.
Step 404: perform semantic analysis on the recognition-result text to obtain a voice instruction.
Step 405: if the voice instruction and the matched keyword belong to the same semantic domain, execute the voice instruction.
Steps 402-405 are essentially identical to steps 201-204 and are not described again.
As can be seen from Fig. 4, compared with the embodiment corresponding to Fig. 2, the flow 400 of this embodiment adds the step of generating the keywords. The scheme described in this embodiment can therefore draw on more keyword-related data, achieving more comprehensive voice-instruction detection while reducing the number of voice interactions.
With further reference to Fig. 5, as an implementation of the methods shown in the figures above, the disclosure provides an embodiment of a device for speech recognition. This device embodiment corresponds to the method embodiment shown in Fig. 2, and the device may be applied in various electronic apparatuses.
As shown in Fig. 5, the device 500 for speech recognition of this embodiment includes a matching unit 501, a recognition unit 502, an analysis unit 503 and an execution unit 504. The matching unit 501 is configured to, in response to receiving a first speech segment, match the first speech segment against a predetermined keyword set. The recognition unit 502 is configured to, if the match succeeds, continue to receive a second speech segment, merge the first and second speech segments, and perform speech recognition to obtain a recognition-result text. The analysis unit 503 is configured to perform semantic analysis on the recognition-result text to obtain a voice instruction. The execution unit 504 is configured to execute the voice instruction if it belongs to the same semantic domain as the matched keyword.
In this embodiment, the specific processing of the matching unit 501, recognition unit 502, analysis unit 503 and execution unit 504 of the device 500 may refer to steps 201, 202, 203 and 204 in the embodiment corresponding to Fig. 2.
In some optional implementations of the present embodiment, the matching unit 501 is further configured to: convert the first voice segment into text information; and match the text information against the predetermined keyword set in text form.
In some optional implementations of the present embodiment, the execution unit 504 is further configured to: discard the voice instruction if the voice instruction and the matched keyword do not belong to the same semantic domain.
In some optional implementations of the present embodiment, the predetermined keyword set is obtained by extracting the common prefix words of voice instructions that occur more than a predetermined number of times.
In some optional implementations of the present embodiment, the length of each keyword in the keyword set is less than 4.
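The keyword-set construction in the optional implementations above can be sketched as follows. The instruction log and the thresholds are illustrative assumptions; the claim's "length less than 4" refers to keywords of fewer than four characters (written with Chinese characters in the original), so the length bound is loosened here for the English toy data.

```python
from collections import Counter

def build_keyword_set(instructions, min_count, max_len):
    """Sketch of building the predetermined keyword set: count the prefix
    (first) word of each historical voice instruction, then keep prefixes
    that occur more than `min_count` times and are shorter than `max_len`
    characters. The log and thresholds are hypothetical examples."""
    prefix_counts = Counter(
        inst.split()[0] for inst in instructions if inst.strip()
    )
    return {p for p, c in prefix_counts.items() if c > min_count and len(p) < max_len}

# Hypothetical instruction log: "play" occurs 4 times, "open" twice.
log = ["play music", "play news", "play radio", "play a song",
       "open maps", "open maps", "navigate home"]
print(build_keyword_set(log, min_count=2, max_len=5))  # -> {'play'}
```

Keeping only short, frequent prefixes keeps the first-segment match cheap, which matters because the matching unit runs on every incoming voice segment before full recognition starts.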
Referring now to Fig. 6, it shows a schematic structural diagram of an electronic device 600 (for example, the controller shown in Fig. 1) suitable for implementing embodiments of the present disclosure. The controller shown in Fig. 6 is only an example and should not impose any limitation on the functions or scope of use of embodiments of the present disclosure.
As shown in Fig. 6, the electronic device 600 may include a processing apparatus 601 (such as a central processing unit or a graphics processor), which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage apparatus 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the electronic device 600. The processing apparatus 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
In general, the following apparatuses may be connected to the I/O interface 605: an input apparatus 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, or gyroscope; an output apparatus 607 including, for example, a liquid crystal display (LCD), speaker, or vibrator; a storage apparatus 608 including, for example, a magnetic tape or hard disk; and a communication apparatus 609. The communication apparatus 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. Although Fig. 6 shows the electronic device 600 with various apparatuses, it should be understood that it is not required to implement or possess all of the apparatuses shown; more or fewer apparatuses may alternatively be implemented or possessed. Each block shown in Fig. 6 may represent one apparatus, or may represent multiple apparatuses as needed.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication apparatus 609, installed from the storage apparatus 608, or installed from the ROM 602. When the computer program is executed by the processing apparatus 601, the above-described functions defined in the method of the embodiments of the present disclosure are performed. It should be noted that the computer-readable medium described in embodiments of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In embodiments of the present disclosure, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in connection with an instruction execution system, apparatus, or device. In embodiments of the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; the computer-readable signal medium can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any suitable medium, including but not limited to: an electric wire, an optical cable, RF (radio frequency), or any suitable combination of the above.
The above computer-readable medium may be included in the above electronic device, or it may exist independently without being assembled into the electronic device. The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: in response to receiving a first voice segment, match the first voice segment against a predetermined keyword set; if the match succeeds, continue to receive a second voice segment, merge the first voice segment and the second voice segment, and perform speech recognition to obtain a speech recognition result text; perform semantic analysis on the speech recognition result text to obtain a voice instruction; and if the voice instruction and the matched keyword belong to the same semantic domain, execute the voice instruction.
The computer program code for performing the operations of embodiments of the present disclosure may be written in one or more programming languages or a combination thereof. The programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in embodiments of the present disclosure may be implemented by software or by hardware. The described units may also be provided in a processor, which may, for example, be described as: a processor including a matching unit, a recognition unit, an analysis unit, and an execution unit. The names of these units do not, under certain circumstances, constitute a limitation on the units themselves; for example, the matching unit may also be described as "a unit that, in response to receiving a first voice segment, matches the first voice segment against a predetermined keyword set".
The above description is merely a preferred embodiment of the present disclosure and an explanation of the applied technical principles. Those skilled in the art should understand that the scope of the invention involved in the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and, without departing from the inventive concept, should also cover other technical solutions formed by any combination of the above technical features or their equivalent features, for example, technical solutions formed by mutually replacing the above features with (but not limited to) technical features with similar functions disclosed in the present disclosure.
Claims (12)
1. A method for speech recognition, comprising:
in response to receiving a first voice segment, matching the first voice segment against a predetermined keyword set;
if the match succeeds, continuing to receive a second voice segment, merging the first voice segment and the second voice segment, and performing speech recognition on the merged voice to obtain a speech recognition result text;
performing semantic analysis on the speech recognition result text to obtain a voice instruction; and
if the voice instruction and the matched keyword belong to the same semantic domain, executing the voice instruction.
2. The method according to claim 1, wherein the matching the first voice segment against a predetermined keyword set comprises:
converting the first voice segment into text information; and
matching the text information against the predetermined keyword set in text form.
3. The method according to claim 1, wherein the method further comprises:
if the voice instruction and the matched keyword do not belong to the same semantic domain, discarding the voice instruction.
4. The method according to claim 1, wherein the predetermined keyword set is obtained by extracting common prefix words of voice instructions that occur more than a predetermined number of times.
5. The method according to any one of claims 1-4, wherein the length of each keyword in the keyword set is less than 4.
6. An apparatus for speech recognition, comprising:
a matching unit configured to, in response to receiving a first voice segment, match the first voice segment against a predetermined keyword set;
a recognition unit configured to, if the match succeeds, continue to receive a second voice segment, merge the first voice segment and the second voice segment, and perform speech recognition on the merged voice to obtain a speech recognition result text;
an analysis unit configured to perform semantic analysis on the speech recognition result text to obtain a voice instruction; and
an execution unit configured to execute the voice instruction if the voice instruction and the matched keyword belong to the same semantic domain.
7. The apparatus according to claim 6, wherein the matching unit is further configured to:
convert the first voice segment into text information; and
match the text information against the predetermined keyword set in text form.
8. The apparatus according to claim 6, wherein the execution unit is further configured to:
discard the voice instruction if the voice instruction and the matched keyword do not belong to the same semantic domain.
9. The apparatus according to claim 6, wherein the predetermined keyword set is obtained by extracting common prefix words of voice instructions that occur more than a predetermined number of times.
10. The apparatus according to any one of claims 6-9, wherein the length of each keyword in the keyword set is less than 4.
11. An electronic device, comprising:
one or more processors; and
a storage apparatus on which one or more programs are stored,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-5.
12. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1-5.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910329635.9A CN110047481B (en) | 2019-04-23 | 2019-04-23 | Method and apparatus for speech recognition |
CN202110684737.XA CN113327609B (en) | 2019-04-23 | 2019-04-23 | Method and apparatus for speech recognition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910329635.9A CN110047481B (en) | 2019-04-23 | 2019-04-23 | Method and apparatus for speech recognition |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110684737.XA Division CN113327609B (en) | 2019-04-23 | 2019-04-23 | Method and apparatus for speech recognition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110047481A true CN110047481A (en) | 2019-07-23 |
CN110047481B CN110047481B (en) | 2021-07-09 |
Family
ID=67278748
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110684737.XA Active CN113327609B (en) | 2019-04-23 | 2019-04-23 | Method and apparatus for speech recognition |
CN201910329635.9A Active CN110047481B (en) | 2019-04-23 | 2019-04-23 | Method and apparatus for speech recognition |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110684737.XA Active CN113327609B (en) | 2019-04-23 | 2019-04-23 | Method and apparatus for speech recognition |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN113327609B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114399992B (en) * | 2021-12-03 | 2022-12-06 | 北京百度网讯科技有限公司 | Voice instruction response method, device and storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102118886A (en) * | 2010-01-04 | 2011-07-06 | 中国移动通信集团公司 | Recognition method of voice information and equipment |
CN103219005A (en) * | 2013-04-28 | 2013-07-24 | 北京云知声信息技术有限公司 | Speech recognition method and device |
CN103593230A (en) * | 2012-08-13 | 2014-02-19 | 百度在线网络技术(北京)有限公司 | Background task control method of mobile terminal and mobile terminal |
US20160086606A1 (en) * | 2011-01-05 | 2016-03-24 | Interactions Llc | Automated Speech Recognition Proxy System for Natural Language Understanding |
CN107146618A (en) * | 2017-06-16 | 2017-09-08 | 北京云知声信息技术有限公司 | Method of speech processing and device |
CN107195303A (en) * | 2017-06-16 | 2017-09-22 | 北京云知声信息技术有限公司 | Method of speech processing and device |
CN108881466A (en) * | 2018-07-04 | 2018-11-23 | 百度在线网络技术(北京)有限公司 | Exchange method and device |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103559289B (en) * | 2013-11-08 | 2017-01-18 | 科大讯飞股份有限公司 | Language-irrelevant keyword search method and system |
CN104901926A (en) * | 2014-03-06 | 2015-09-09 | 武汉元宝创意科技有限公司 | Voiceprint feature based remote authentication payment system and method |
CN104110884A (en) * | 2014-03-14 | 2014-10-22 | 芜湖美的厨卫电器制造有限公司 | Water heater and control method thereof |
US20170116994A1 (en) * | 2015-10-26 | 2017-04-27 | Le Holdings(Beijing)Co., Ltd. | Voice-awaking method, electronic device and storage medium |
CN106250474B (en) * | 2016-07-29 | 2020-06-23 | Tcl科技集团股份有限公司 | Voice control processing method and system |
CN108962235B (en) * | 2017-12-27 | 2021-09-17 | 北京猎户星空科技有限公司 | Voice interaction method and device |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110706696A (en) * | 2019-09-25 | 2020-01-17 | 珠海格力电器股份有限公司 | Voice control method and device |
CN110808031A (en) * | 2019-11-22 | 2020-02-18 | 大众问问(北京)信息科技有限公司 | Voice recognition method and device and computer equipment |
CN111640434A (en) * | 2020-06-05 | 2020-09-08 | 三星电子(中国)研发中心 | Method and apparatus for controlling voice device |
CN112017647A (en) * | 2020-09-04 | 2020-12-01 | 北京蓦然认知科技有限公司 | Semantic-combined speech recognition method, device and system |
CN112017647B (en) * | 2020-09-04 | 2024-05-03 | 深圳海冰科技有限公司 | Semantic-combined voice recognition method, device and system |
CN112463939A (en) * | 2020-11-12 | 2021-03-09 | 深圳市欢太科技有限公司 | Man-machine conversation method, system, service device and computer storage medium |
CN112463939B (en) * | 2020-11-12 | 2024-05-24 | 深圳市欢太科技有限公司 | Man-machine conversation method, system, service equipment and computer storage medium |
CN112201246A (en) * | 2020-11-19 | 2021-01-08 | 深圳市欧瑞博科技股份有限公司 | Intelligent control method and device based on voice, electronic equipment and storage medium |
CN112201246B (en) * | 2020-11-19 | 2023-11-28 | 深圳市欧瑞博科技股份有限公司 | Intelligent control method and device based on voice, electronic equipment and storage medium |
CN112466304B (en) * | 2020-12-03 | 2023-09-08 | 北京百度网讯科技有限公司 | Offline voice interaction method, device, system, equipment and storage medium |
CN112466304A (en) * | 2020-12-03 | 2021-03-09 | 北京百度网讯科技有限公司 | Offline voice interaction method, device, system, equipment and storage medium |
CN112466289A (en) * | 2020-12-21 | 2021-03-09 | 北京百度网讯科技有限公司 | Voice instruction recognition method and device, voice equipment and storage medium |
CN113611294A (en) * | 2021-06-30 | 2021-11-05 | 展讯通信(上海)有限公司 | Voice wake-up method, apparatus, device and medium |
Also Published As
Publication number | Publication date |
---|---|
CN110047481B (en) | 2021-07-09 |
CN113327609B (en) | 2022-06-28 |
CN113327609A (en) | 2021-08-31 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | |
Effective date of registration: 20211009
Address after: 100176 101, floor 1, building 1, yard 7, Ruihe West 2nd Road, Economic and Technological Development Zone, Daxing District, Beijing
Patentee after: Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd.
Address before: 100085 third floor, Baidu building, No. 10, Shangdi 10th Street, Haidian District, Beijing
Patentee before: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) Co., Ltd.