CN105869641A - Speech recognition device and speech recognition method - Google Patents
- Publication number
- CN105869641A CN105869641A CN201510032839.8A CN201510032839A CN105869641A CN 105869641 A CN105869641 A CN 105869641A CN 201510032839 A CN201510032839 A CN 201510032839A CN 105869641 A CN105869641 A CN 105869641A
- Authority
- CN
- China
- Prior art keywords
- voice command
- acoustic model
- command word
- speech recognition
- voice
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a speech recognition device and a speech recognition method. The speech recognition device comprises: a unit configured to obtain speech input by a current user; a unit configured to split the obtained speech and output at least two voice command segments; a unit configured to recognize a predefined first voice command from the voice command segments by using a speaker-independent acoustic model; a unit configured to calculate a transformation matrix for the current user based on the voice command segment recognized as the predefined first voice command; a unit configured to select an acoustic model for the current user from the acoustic models registered in the speech recognition device based on the calculated transformation matrix; and a unit configured to recognize a second voice command from the voice command segments by using the selected acoustic model. With the speech recognition device and the speech recognition method of the invention, speech recognition performance can be improved by using the selected acoustic model (AM).
Description
Technical field
The present invention relates to a speech recognition device and a speech recognition method.
Background art
A speech recognition system provides users with a convenient interface through which they can interact with any electronic device that has a speech recognition function. For operating electronic devices such as multi-function printers (MFPs), cameras, personal digital assistants (PDAs), and mobile phones, voice command recognition is the most convenient approach. A user can input his or her own voice directly into the electronic device via a voice input device such as a microphone; voice command recognition technology then converts the user's speech into a voice command for operating the electronic device.
In general, an acoustic model (AM) for voice command recognition should be registered or trained in advance. However, registering or training an AM is time-consuming, and many users are unwilling to perform these operations. As a countermeasure against this problem, the following technique is available: selecting a set of AMs from the existing AMs of other users and/or other electronic devices. For example, U.S. Patent No. 7,103,549 discloses a method of selecting an AM from existing AMs based on the personal attributes of the user and/or the attributes of the communication channel, and performing speech recognition with the selected AM so as to improve the user's speech recognition performance. The personal attributes of the user include gender, native language, age, race, locality, and so on. The channel attributes include connection type, telephone model, network identifier, network attributes, background noise level, and so on.
However, in the above U.S. Patent No. 7,103,549, the personal attributes and channel attributes of the user are obtained when the user initially sets up an account on the electronic device, or when the user holds an initial session with the electronic device. Therefore, the personal attributes and channel attributes cannot reflect the user's instantaneous speech attributes and instantaneous environment attributes, such as a voice change caused by coughing or a car suddenly passing by; as a result, an AM selected based on these attributes can degrade the user's speech recognition performance.
Summary of the invention
Therefore, in view of the statements above, the problem to be solved by the present invention is to select a set of AMs for the current user from the existing AMs of other users and/or other electronic devices, where the selected AMs match well with the instantaneous speech attributes and instantaneous environment attributes of the current user, so that speech recognition performance for the current user can be improved by using the selected AMs.
According to one aspect of the present invention, there is provided a speech recognition device comprising: a voice input unit configured to obtain speech input by a current user; a voice splitting unit configured to split the obtained speech and output at least two voice command segments; a predefined-first-voice-command recognition unit configured to recognize a predefined first voice command from the voice command segments by using a speaker-independent acoustic model; a transformation matrix calculation unit configured to calculate a transformation matrix for the current user based on the voice command segment recognized as the predefined first voice command, wherein the calculated transformation matrix enables the speaker-independent acoustic model to match the voice command segment recognized as the predefined first voice command; a model selection unit configured to select, based on the calculated transformation matrix, an acoustic model for the current user from the acoustic models registered in the speech recognition device; and a second-voice-command recognition unit configured to recognize a second voice command from the voice command segments by using the selected acoustic model.
According to another aspect of the present invention, there is provided a speech recognition method comprising: a voice input step of obtaining speech input into a speech recognition device by a current user; a voice splitting step of splitting the obtained speech and outputting at least two voice command segments; a predefined-first-voice-command recognition step of recognizing a predefined first voice command from the voice command segments by using a speaker-independent acoustic model; a transformation matrix calculation step of calculating a transformation matrix for the current user based on the voice command segment recognized as the predefined first voice command, wherein the calculated transformation matrix enables the speaker-independent acoustic model to match the voice command segment recognized as the predefined first voice command; a model selection step of selecting, based on the calculated transformation matrix, an acoustic model for the current user from the acoustic models registered in the speech recognition device; and a second-voice-command recognition step of recognizing a second voice command from the voice command segments by using the selected acoustic model.
As described above, since the transformation matrix is calculated based on a part of the voice command segments split from the speech instantaneously input by the current user (i.e., the above voice command segment recognized as the predefined first voice command), the calculated transformation matrix can represent the attributes of the current user and of the current environment, where the attributes of the current user may be his or her pronunciation attributes, voice-change attributes, and the like, and the attributes of the current environment may be the noise attributes and communication channel attributes of the current speech. Thus, an AM selected based on the transformation matrix can match well with the instantaneous speech attributes and instantaneous environment attributes of the current user, and the speech recognition performance of the current user can be improved by using the selected AM.
Further aspects and advantages of the present invention will become apparent from the following description with reference to the accompanying drawings.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.
Fig. 1 is a block diagram illustrating the overall structure of a speech recognition system according to the present invention, which includes several electronic devices with a speech recognition function.
Fig. 2 is a first block diagram illustrating an example of the internal structure of an electronic device with a speech recognition function, according to an exemplary embodiment of the present invention.
Fig. 3 is a block diagram of the internal functions of speech recognition related to the voice input unit of the electronic device, according to the first exemplary embodiment of the present invention.
Fig. 4 is a block diagram of the internal functions of the model selection unit shown in Fig. 3, according to the first exemplary embodiment of the present invention.
Fig. 5 is a block diagram of the internal functions of speech recognition related to the voice input unit of the electronic device, according to the second exemplary embodiment of the present invention.
Fig. 6 is a second block diagram illustrating another example of the internal structure of an electronic device with a speech recognition function, according to an exemplary embodiment of the present invention.
Fig. 7 is a flowchart of the speech recognition operation related to the voice input unit of the electronic device, according to the first exemplary embodiment of the present invention.
Fig. 8 schematically shows a flowchart of the steps for selecting a phoneme-based AM, according to the first exemplary embodiment of the present invention.
Fig. 9 schematically shows a flowchart of the steps for selecting a command-word-based AM, according to the first exemplary embodiment of the present invention.
Fig. 10 is a flowchart of the speech recognition operation related to the voice input unit of the electronic device, according to the second exemplary embodiment of the present invention.
Detailed description of the invention
Exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that the following description is merely illustrative and exemplary in nature, and is in no way intended to limit the present invention or its applications or uses. The relative arrangement of elements and steps, the numerical expressions, and the numerical values set forth in the embodiments do not limit the scope of the invention unless otherwise stated. In addition, techniques, methods, and devices known to those skilled in the art may not be discussed in detail, but are regarded as part of this specification where appropriate.
Note that similar reference numerals and letters refer to similar items in the figures; thus, once an item is defined in one figure, it need not be discussed again for subsequent figures.
Fig. 1 is a block diagram illustrating the overall structure of a speech recognition system according to the present invention, which includes several electronic devices with a speech recognition function.
As shown in Fig. 1, the speech recognition system 100 is equipped with electronic devices of any kind that have a speech recognition function, such as an MFP 1, a camera 2, a PDA 3, a mobile phone 4, a personal computer (PC) 5, and any other kind of electronic device 6, and these electronic devices are communicatively coupled to each other via a network 7. The type and number of electronic devices to be connected to the network 7 are not limited to the situation shown in Fig. 1. Any electronic device in the speech recognition system 100 is configured to receive a user's speech and, based on the speech recognition function, recognize from this speech the corresponding voice command for operating the electronic device.
The above speech recognition function can be realized by hardware and/or software. In one implementation, a functional module or functional device capable of performing speech recognition can be incorporated into the electronic device, so that the electronic device has the corresponding speech recognition function. In another implementation, a software program capable of performing speech recognition can be stored in the storage device of the electronic device, so that the electronic device likewise has the corresponding speech recognition function. These two implementations will be described in detail below with reference to the accompanying drawings.
(Speech recognition device incorporated in an electronic device)
Fig. 2 is a first block diagram illustrating an example of the internal structure of an electronic device 1 with a speech recognition function, such as the MFP 1 in Fig. 1, according to an exemplary embodiment of the present invention, in which the electronic device 1 (i.e., the MFP 1) incorporates the speech recognition device that will be described in detail below with reference to Figs. 3 to 5. The electronic device 1 can include a central processing unit (CPU) 101, a random access memory (RAM) 102, a read-only memory (ROM) 103, a hard disk 104, an input device 105, a speech recognition device 106, an operating unit 107, an output device 108, and a network interface 109, and these components are communicatively coupled to each other via a system bus 110.
The CPU 101 can be any suitable programmable control device, and can perform the various functions described later by executing various application programs stored in the ROM 103 or the hard disk 104. The RAM 102 is used to temporarily store programs or data loaded from the ROM 103 or the hard disk 104, and also serves as the workspace in which the CPU 101 executes various programs. The hard disk 104 can store various kinds of information, such as an operating system (OS), various applications, control programs, data, a speaker-independent acoustic model (SI-AM), and the AMs registered or trained by users. In addition, AMs registered or trained in advance by the manufacturer can be stored in the ROM 103 or the hard disk 104.
The input device 105 can include an operation input device 115 and a voice input unit 125, and enables the user to interact with the electronic device 1 based on operations input through the operation input device 115 or speech input through the voice input unit 125. The operation input device 115 can take various forms, such as buttons, a keypad, a dial, a touch wheel, or a touch screen. Further, the voice input unit 125 can be a microphone.
According to the embodiments of the present invention described in detail below, the speech recognition device 106 receives the user's speech from the voice input unit 125 and recognizes, from the received speech, the corresponding voice command or the corresponding speech content.
The operating unit 107 performs the operation corresponding to the recognized voice command. The output device 108 can include a display device 118 and a voice output unit 128, and can display or output the recognized voice command and/or the recognized speech content.
For example, when the speech recognition device 106 recognizes a voice command from the received speech, the operating unit 107 performs the corresponding operation of the electronic device 1, such as printing, copying, scanning, or sending an e-mail. Further, as an optional operation, before the operating unit 107 performs the corresponding operation, the display device 118 can display the recognized voice command to obtain the user's confirmation, and/or the voice output unit 128 can output the recognized voice command to obtain the user's confirmation.
The display device 118 can include a cathode ray tube (CRT) or a liquid crystal display, and the voice output unit 128 can be equipped with an audio output device such as a speaker. In addition, the operation input device 115 and the display device 118 can be integrated together or incorporated separately.
The network interface 109 provides an interface for connecting the electronic device 1 to the network 7 shown in Fig. 1. Via the network interface 109, the electronic device 1 performs data communication (e.g., sharing AMs) with other electronic devices (such as the camera 2 and the PDA 3) connected via the network 7. Alternatively, a wireless interface can be provided for the electronic device 1 to perform radio frequency data communication.
The system bus 110 can provide a data transfer path for mutually transmitting data between components such as the CPU 101, the RAM 102, the ROM 103, the hard disk 104, the input device 105, the speech recognition device 106, the operating unit 107, the output device 108, and the network interface 109. Although referred to as a bus, the system bus 110 is not limited to any specific data transfer technology.
As for the speech recognition device 106, a first example of its internal functional units is shown in Fig. 3. Fig. 3 is a block diagram of the internal functions of speech recognition related to the voice input unit 125 in Fig. 2 of the electronic device 1, according to the first exemplary embodiment of the present invention. When the CPU 101 executes the programs stored in the ROM 103 and/or the hard disk 104, the following functional units are realized.
Before operating the electronic device 1, the user needs to log in to the electronic device 1 using any kind of login method, such as logging in by an IC card or by fingerprint recognition. Then, the CPU 101 loads an AM set and a command word list from the ROM 103 and/or the hard disk 104 into the RAM 102, where the command word list can be set automatically by the electronic device 1 based on its own operations, or can be set by the user or the manufacturer based on the operations of the electronic device 1. For example, if the electronic device 1 can perform printing, copying, and scanning, the command word list may contain command words such as "printing", "copying", "scanning", "two copies", and "double-sided". If the user has used the electronic device 1 before and has registered some AMs stored in advance in the hard disk 104, the CPU 101 loads the user's registered AMs as the above loaded AM set; otherwise, the loaded AM set is empty.
As is apparent to those skilled in the art, there are two types of AM: one type is the phoneme-based AM, and the other type is the command-word-based AM. On the one hand, when the AM used in the speech recognition device 106 is a phoneme-based AM, the user or the CPU 101 verifies whether the phoneme-based AMs in the loaded AM set cover all the phonemes of the command words in the command word list. If the phoneme-based AMs in the loaded AM set cover all the phonemes of the command words in the command word list, then after the user inputs speech containing the predefined first voice command and thereby activates the speech recognition device 106, the speech recognition device 106 performs speech recognition using the loaded AM set. Otherwise, if the phoneme-based AMs in the loaded AM set do not cover all the phonemes of the command words in the command word list, the speech recognition device 106 selects, according to the present invention, phoneme-based AMs of a phone set that covers all the phonemes of the command words in the command word list, adds the selected phoneme-based AMs of the phone set to the loaded AM set, and then performs speech recognition using the phoneme-based AMs of the selected phone set.
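The coverage check described above can be sketched as a simple set comparison. The lexicon, command words, and phoneme inventory below are illustrative assumptions, not part of the patent:

```python
# Hypothetical sketch of the phoneme-coverage check: does the loaded AM set
# cover every phoneme needed by the command word list?

def phonemes_needed(command_words, lexicon):
    """Collect every phoneme used by any command word in the list."""
    needed = set()
    for word in command_words:
        needed.update(lexicon[word])
    return needed

def covers_all_phonemes(loaded_am_phonemes, command_words, lexicon):
    """True if the loaded phoneme-based AM set covers all command-word phonemes."""
    return phonemes_needed(command_words, lexicon) <= set(loaded_am_phonemes)

# Toy lexicon mapping command words to phoneme sequences (assumed for illustration).
lexicon = {
    "print": ["p", "r", "ih", "n", "t"],
    "copy":  ["k", "aa", "p", "iy"],
    "scan":  ["s", "k", "ae", "n"],
}
command_words = ["print", "copy", "scan"]

loaded = ["p", "r", "ih", "n", "t", "k", "aa", "iy"]  # AM set missing "s", "ae"
print(covers_all_phonemes(loaded, command_words, lexicon))            # False
print(sorted(phonemes_needed(command_words, lexicon) - set(loaded)))  # ['ae', 's']
```

When the check fails, the missing phonemes tell the device which phoneme-based AMs of a phone set must be selected and added to the loaded AM set.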
On the other hand, when the AM used in the speech recognition device 106 is a command-word-based AM, the user or the CPU 101 verifies whether the command-word-based AMs in the loaded AM set cover all the command words in the command word list. If the command-word-based AMs in the loaded AM set cover all the command words in the command word list, then after the user activates the speech recognition device 106, the speech recognition device 106 performs speech recognition using the loaded AM set. Otherwise, if the command-word-based AMs in the loaded AM set do not cover all the command words in the command word list, the speech recognition device 106 selects, according to the present invention, a command-word-based AM for each of, or some of, the command words in the command word list, adds the selected command-word-based AMs to the loaded AM set, and then performs speech recognition using the selected command-word-based AMs. As can be understood by those skilled in the art, the speech recognition device 106 performs speech recognition only after the predefined first voice command appears in the speech input by the user. The predefined first voice command can be predefined automatically by the electronic device 1, or can be set by the user. For example, the predefined first voice command can be a predefined introducer composed of any word or any group of words, such as "start" or "start ... end".
A first example of the internal functional units of the speech recognition device 106 capable of carrying out the present invention will now be described. As shown in Fig. 3, the speech recognition device 106 includes a voice splitting unit 302, a predefined-first-voice-command recognition unit 303, a transformation matrix calculation unit 304, a model selection unit 305, and a second-voice-command recognition unit 306.
Specifically, the voice input unit 125 of the electronic device 1 obtains the speech input by the current user.
The voice splitting unit 302 receives the obtained speech from the voice input unit 125, and then splits the obtained speech and outputs at least two voice command segments, using any kind of voice activity detection (VAD) technique well known in the art, such as a time-domain method based on short-time energy or a transform-domain method based on frequency-domain parameters. As described above, the speech recognition device 106 performs speech recognition only if the obtained speech contains the predefined first voice command; therefore, in order to make the electronic device 1 perform a corresponding operation such as printing or copying, the obtained speech must contain at least two voice commands, one of which is used to identify the predefined first voice command and the other of which is used to operate the electronic device 1. For example, when the current user wants to print a document with the electronic device 1, the obtained speech can be "start print".
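The time-domain short-time-energy VAD mentioned above can be sketched as follows. The frame size, energy threshold, and synthetic waveform are illustrative assumptions; a real implementation would tune these and add smoothing:

```python
import numpy as np

def split_by_energy(samples, rate=16000, frame_ms=20, threshold=0.01):
    """Split a waveform into voiced segments using short-time energy.

    Frames whose mean energy exceeds `threshold` are treated as speech;
    consecutive speech frames are merged into one (start, end) segment in
    sample indices. Frame size and threshold are illustrative only.
    """
    frame_len = int(rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    segments, start = [], None
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        voiced = np.mean(frame ** 2) > threshold
        if voiced and start is None:
            start = i * frame_len
        elif not voiced and start is not None:
            segments.append((start, i * frame_len))
            start = None
    if start is not None:
        segments.append((start, n_frames * frame_len))
    return segments

# Synthetic input: tone - silence - tone, standing in for "start" ... "print".
rate = 16000
t = np.arange(rate // 2) / rate
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
silence = np.zeros(rate // 2)
speech = np.concatenate([tone, silence, tone])
print(len(split_by_energy(speech, rate)))  # 2 voiced segments
```

The two voiced segments would correspond to the two voice command segments, while low-energy stretches between them can be kept as background sound segments.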
As is apparent to those skilled in the art, the current environment of the speech is unlikely to be absolutely quiet; in other words, the input speech may contain sounds from the surroundings of the current environment in which the current user inputs the speech. Therefore, in addition to the above at least two voice command segments, the output of the voice splitting unit 302 will also include at least one background sound segment, where the background sound segment can reflect the current environment of the input speech, such as the sounds around an office, around a kindergarten, or around a street.
The predefined-first-voice-command recognition unit 303 recognizes the predefined first voice command from the voice command segments output from the voice splitting unit 302 by using a speaker-independent acoustic model (SI-AM) 310, where the SI-AM can be stored, for example, in the hard disk 104 of the electronic device 1 in Fig. 2.
The transformation matrix calculation unit 304 calculates a transformation matrix for the current user based on the voice command segment recognized as the predefined first voice command by the predefined-first-voice-command recognition unit 303. In addition, since the current environment of the speech is unlikely to be absolutely quiet, as described above, the transformation matrix calculation unit 304 can calculate the transformation matrix for the current user based on both the background sound segment and the voice command segment recognized as the predefined first voice command.
As described above, since the transformation matrix is calculated based on a part of the voice command segments split from the speech instantaneously input by the current user (i.e., the above voice command segment recognized as the predefined first voice command), and may also be calculated based on that part of the voice command segments together with the background sound segment, the calculated transformation matrix can represent the attributes of the current user and of the current environment, where the attributes of the current user may be his or her pronunciation attributes, voice-change attributes, and the like, and the attributes of the current environment may be the noise attributes and communication channel attributes of the current speech.
For example, in one implementation, the transformation matrix can be calculated by using the maximum likelihood linear regression (MLLR) method well known in the art, and can be represented, for example, by the following formula:

W* = argmax_W P(O | W, λ)

where O represents the observations, such as the above background sound segment and the above voice command segment recognized as the predefined first voice command; W represents the above transformation matrix; and λ represents the parameters of the above SI-AM.

The meaning of the above formula is that the W maximizing P(O | W, λ) is taken as the output. In other words, the MLLR method can adjust the parameters of the SI-AM by using the transformation matrix, so that the SI-AM can match the observations, such as the above background sound segment and the above voice command segment recognized as the predefined first voice command. In addition, as an optional solution, the calculated transformation matrix of the current user can be stored in the hard disk 104 of the electronic device 1 for follow-up work, such as selecting AMs for other users who use the present invention.
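In MLLR, the transformation matrix acts on extended Gaussian mean vectors of the acoustic model, μ' = W·[1, μ]. The sketch below estimates W by least squares, which coincides with the ML solution only under simplifying assumptions (identity covariances, equal occupancy counts); the toy means and the bias shift are invented for illustration:

```python
import numpy as np

def extend(mu):
    """Extended mean vector xi = [1, mu] used by MLLR."""
    return np.concatenate(([1.0], mu))

def estimate_mllr_transform(means, adapted_means):
    """Least-squares estimate of W such that W @ xi_i ~ adapted_means[i].

    A simplification of the full ML estimate: with identity covariances and
    equal occupancy counts, the MLLR solution reduces to this regression.
    """
    Xi = np.stack([extend(m) for m in means])   # (n, d+1)
    Y = np.asarray(adapted_means)               # (n, d)
    W, *_ = np.linalg.lstsq(Xi, Y, rcond=None)
    return W.T                                  # (d, d+1)

def apply_mllr(W, mu):
    """Adapt one SI-AM Gaussian mean: mu' = W @ [1, mu]."""
    return W @ extend(mu)

# Toy SI-AM means and "observed" user means shifted by a bias (assumed data).
si_means = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])]
user_means = [m + np.array([0.5, -0.2]) for m in si_means]

W = estimate_mllr_transform(si_means, user_means)
print(np.allclose(apply_mllr(W, si_means[0]), user_means[0]))  # True
```

Here the estimated W recovers the bias exactly, illustrating how the transformation makes the SI-AM match the current user's observations.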
The model selection unit 305 selects an AM for the current user from the AMs 320 registered in the speech recognition device 106, based on the calculated transformation matrix output from the transformation matrix calculation unit 304, where the AMs 320 can be registered or trained in advance by the manufacturer or the user based on speech samples of the above command word list, and can be stored in the ROM 103 or the hard disk 104 of the electronic device 1. In addition, the electronic device 1 can be communicatively connected with other electronic devices via the network 7, as shown in Fig. 1; therefore, the model selection unit 305 in the electronic device 1 can also select, based on the calculated transformation matrix, an AM for the current user from the AMs registered in the other electronic devices (i.e., other speech recognition devices). As described above, since there are phoneme-based AMs and command-word-based AMs, the model selection unit 305 will be described in further detail below with reference to Fig. 4.
The second-voice-command recognition unit 306 recognizes the second voice command from the voice command segments by using the AM selected and output from the model selection unit 305, where those voice command segments do not include the voice command segment recognized as the predefined first voice command by the above predefined-first-voice-command recognition unit 303. Then, as described above, the operating unit 107 performs the operation corresponding to the second voice command output from the second-voice-command recognition unit 306, or the output device 108 can display or output the recognized second voice command.
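The flow through the five units of Fig. 3 can be summarized in a high-level sketch. The callables below are illustrative stand-ins for the units, not the patent's implementation; "segments" are pre-labeled strings rather than audio:

```python
# A high-level sketch of the Fig. 3 recognition flow, with the five units
# reduced to placeholder callables; all names and behaviors are assumed.

def recognize(speech, split, recognize_first, calc_transform, select_am,
              recognize_second, predefined_first="start"):
    segments = split(speech)                              # voice splitting unit 302
    first = [s for s in segments
             if recognize_first(s) == predefined_first]   # unit 303 (SI-AM)
    if not first:
        return None                  # no predefined first command: do nothing
    W = calc_transform(first[0])     # unit 304 (e.g. MLLR)
    am = select_am(W)                # unit 305
    rest = [s for s in segments if s not in first]
    return [recognize_second(am, s) for s in rest]        # unit 306

# Toy stand-ins: the "speech" is a list of already-labeled segments.
result = recognize(
    ["start", "print"],
    split=lambda sp: list(sp),
    recognize_first=lambda seg: seg,          # SI-AM stand-in
    calc_transform=lambda seg: "W",
    select_am=lambda W: "selected-AM",
    recognize_second=lambda am, seg: seg.upper(),
)
print(result)  # ['PRINT']
```

Note how the segment recognized as the predefined first command is consumed by the adaptation step and excluded from second-command recognition, mirroring the description above.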
An example of the internal functional units of the model selection unit 305 in the speech recognition device 106 will now be described. Fig. 4 is a block diagram of the internal functions of the model selection unit 305 shown in Fig. 3, according to an exemplary embodiment of the present invention. As shown in Fig. 4, the model selection unit 305 includes a phoneme-based acoustic model selection unit 315 and/or a command-word-based acoustic model selection unit 325, so that the model selection unit 305 can handle any type of AM.
As described above, if, when the current user logs in to the electronic device 1, the phoneme-based AMs in the loaded AM set cannot cover all the phonemes of the command words in the command word list, the speech recognition device 106 selects, according to the present invention, phoneme-based AMs of a phone set that covers all the phonemes of the command words in the command word list. In other words, the phoneme-based acoustic model selection unit 315 is configured to select, based on the calculated transformation matrix, phoneme-based AMs of a phone set for the current user from the phoneme-based AMs of phone sets (i.e., the AMs 320 shown in Fig. 3) registered in the speech recognition device 106 or in other speech recognition devices (i.e., other electronic devices) interconnected via the network. In one implementation, the phoneme-based acoustic model selection unit 315 includes a first transformation matrix acquisition unit 3151, a first distance calculation unit 3152, and a phoneme-based acoustic model determination unit 3153.
Specifically, the first transformation matrix acquisition unit 3151 acquires a transformation matrix for each phoneme-based AM of a phone set registered in the speech recognition device 106 or in other speech recognition devices (i.e., other electronic devices) interconnected via the network. As is apparent to those skilled in the art, phoneme-based AMs of different phone sets can be registered or trained for different accents, genders, age groups, and the like. Further, for the phoneme-based AM of a phone set, the transformation matrix for that AM can be calculated by using the above MLLR method when the phoneme-based AM of the phone set is registered or trained, and can be stored in the hard disk of the electronic device together with the phoneme-based AM of the phone set.
The first distance calculation unit 3152 calculates the distance between the following two transformation matrices: one is the transformation matrix for the current user calculated by the transformation matrix calculation unit 304 in Fig. 3, and the other is a transformation matrix, acquired by the first transformation matrix acquiring unit 3151, for the phoneme-based AM of a phone set. As those skilled in the art will understand, the above distance can be any known distance, such as the Euclidean distance, the K-L distance, or the Mahalanobis distance, and the detailed calculation methods will not be repeated here.
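As an illustration of the distances mentioned above, a minimal sketch follows. It assumes the transformation matrices are compared element-wise (one plausible reading of the text; the patent does not fix this detail), with the Euclidean distance computed as the Frobenius norm of the difference:

```python
import numpy as np

def euclidean_distance(w_user, w_model):
    """Euclidean (Frobenius) distance between two transformation matrices."""
    return float(np.linalg.norm(w_user - w_model))

def mahalanobis_distance(w_user, w_model, cov):
    """Mahalanobis distance between flattened matrices, given a covariance
    matrix estimated over a population of transformation matrices."""
    d = (w_user - w_model).ravel()
    return float(np.sqrt(d @ np.linalg.inv(cov) @ d))

# Example: a 2x3 MLLR-style transform [A | b] for 2-dimensional features.
w_user = np.array([[1.0, 0.1, 0.2],
                   [0.0, 0.9, -0.1]])
w_model = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])
print(euclidean_distance(w_user, w_model))
```

With an identity covariance, the Mahalanobis distance reduces to the Euclidean distance, which is a quick sanity check on both functions.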
The phoneme-based acoustic model determination unit 3153 determines the phoneme-based AM of the phone set with the smallest distance, as the selected phoneme-based AM for the current user.
In addition, as described above, if the command-word-based AMs in the AM set loaded when the current user logs into the electronic device 1 cannot cover all of the command words in the command word list, the speech recognition device 106 will, according to the present invention, select a command-word-based AM for each command word, or for some command words, in the command word list. In other words, the command-word-based acoustic model selection unit 325 is configured to select, based on the calculated transformation matrix, the command-word-based AM for the current user from among the command-word-based AMs (i.e., the AMs 320 shown in Fig. 3) registered in the speech recognition device 106 or in other speech recognition devices (i.e., other electronic devices) interconnected via the network. In one implementation, the command-word-based acoustic model selection unit 325 includes a second transformation matrix acquiring unit 3251, a second distance calculation unit 3252, and a command-word-based acoustic model determination unit 3253.
Specifically, for each command word in the above-mentioned command word list, the second transformation matrix acquiring unit 3251 acquires a transformation matrix for each command-word-based AM corresponding to that command word registered in the speech recognition device 106 or in other speech recognition devices (i.e., other electronic devices) interconnected via the network. As will be apparent to those skilled in the art, for one command word, one or more command-word-based AMs can be registered or trained for different accents, genders, age groups, and the like. Furthermore, for a command-word-based AM, a transformation matrix can be calculated by using the above-mentioned MLLR method when that AM is registered or trained, and can be stored together with that AM in the hard disk of the electronic device.
For each command word in the above-mentioned command word list, the second distance calculation unit 3252 calculates the distance between the following two transformation matrices: one is the transformation matrix for the current user calculated by the transformation matrix calculation unit 304 in Fig. 3, and the other is a transformation matrix, acquired by the second transformation matrix acquiring unit 3251, for a command-word-based AM corresponding to that command word. As described above, the calculated distance can likewise be any known distance, such as the Euclidean distance, the K-L distance, or the Mahalanobis distance.
For each command word in the above-mentioned command word list, the command-word-based acoustic model determination unit 3253 determines the command-word-based AM corresponding to that command word with the smallest distance, as the selected command-word-based AM for the current user.
In addition, the command-word-based acoustic model selection unit 325 can also include a recommendation unit 3254. When the command-word-based acoustic model determination unit 3253 cannot determine a corresponding command-word-based AM for some command word in the command word list, the recommendation unit 3254 recommends that the current user register a command-word-based AM for that command word, as the selected command-word-based AM.
As described above, in one implementation, if the command-word-based AMs in the loaded AM set cannot cover all of the command words in the command word list, the command-word-based acoustic model selection unit 325 can select a command-word-based AM for each command word in the command word list. In another implementation, the command-word-based acoustic model selection unit 325 can select command-word-based AMs only for those command words in the command word list for which the current user has not registered a corresponding command-word-based AM in the speech recognition device 106.
As for the speech recognition device 106, Fig. 5 illustrates a second example of its internal functional units. Fig. 5 is a block diagram of the internal functions, related to speech recognition, of the electronic device 1 associated with the voice input unit 125 in Fig. 2, according to the second exemplary embodiment of the present invention. In the second embodiment, the speech recognition device 106 will first recognize the second voice command by using the SI-AM 310. The functional units below are realized when the CPU 101 executes the programs stored in the ROM 103 and/or the hard disk 104.
Comparing Fig. 5 with Fig. 3, the speech recognition device 106 shown in Fig. 5 differs in the following respects:
First, the speech recognition device 106 further includes a third voice command recognition unit 501. The third voice command recognition unit 501 recognizes, by using the SI-AM 310, the second voice command from the voice command segments output from the model selection unit 305, wherein the voice command segments do not include the voice command segment recognized by the predefined first voice command recognition unit 303 as the predefined first voice command.
Second, the third voice command recognition unit 501 also judges whether the recognition confidence of the recognized second voice command is less than a predefined threshold, wherein the predefined threshold can, for example, be predefined by the user or the manufacturer according to the actual application. If the recognition confidence of the recognized second voice command is greater than or equal to the predefined threshold, the operation unit 107 will directly perform the operation corresponding to the recognized second voice command output from the third voice command recognition unit 501, or the output device 108 can display or output the recognized second voice command. Otherwise, if the recognition confidence of the recognized second voice command is less than the predefined threshold, the second voice command recognition unit 306 will recognize, by using the selected AM, the second voice command from the voice command segments output from the third voice command recognition unit 501, wherein the voice command segments do not include the voice command segment recognized by the predefined first voice command recognition unit 303 as the predefined first voice command. The other detailed descriptions of the voice input unit 125, the voice segmentation unit 302, the predefined first voice command recognition unit 303, the transformation matrix calculation unit 304, the model selection unit 305, the SI-AM 310, and the AMs 320 shown in Fig. 5 are similar to those of the corresponding units shown in Fig. 3, and thus their detailed explanations will not be repeated here.
(Speech recognition method stored in the storage device of the electronic device)
Fig. 6 is a second block diagram illustrating another example of the internal structure of an electronic device 1 with a speech recognition function, such as the MFP 1 in Fig. 1, according to an exemplary embodiment of the present invention, wherein the speech recognition methods described in detail below with reference to Figs. 7 to 10 are stored in the storage device of the electronic device 1 (i.e., the MFP 1). The electronic device 1 can include a central processing unit (CPU) 101, a random access memory (RAM) 102, a read-only memory (ROM) 103, a hard disk 104, an input device 105, an operation unit 107, an output device 108, and a network interface 109, and these components are communicably coupled to each other via a system bus 110.
As shown in Fig. 6, except for the speech recognition device 106, the internal structure of the electronic device 1 is substantially identical to that of the electronic device 1 shown in Fig. 2; thus, the detailed descriptions of the CPU 101, the RAM 102, the ROM 103, the hard disk 104, the input device 105, the operation unit 107, the output device 108, the network interface 109, and the system bus 110 will not be repeated here. In addition, the hard disk 104 of the electronic device 1 shown in Fig. 6 stores a speech recognition method capable of realizing the same functions as the speech recognition device 106 shown in Fig. 2.
Fig. 7 is a flowchart of the speech recognition operation, related to the voice input unit 125 of the electronic device 1 in Fig. 6, according to the above-mentioned first exemplary embodiment of the present invention. The operations of the corresponding steps below are realized when the CPU 101 loads the programs stored in the ROM 103 and/or the hard disk 104 into the RAM 102 and executes the corresponding programs.
As shown in Fig. 7, in the voice input step S701, the voice input unit 125 of the electronic device 1 shown in Fig. 6 obtains the voice input by the current user (corresponding to the voice input unit 125 in Fig. 3).
In the voice segmentation step S702, the CPU 101 of the electronic device 1 shown in Fig. 6 receives the obtained voice from the voice input unit 125, splits the obtained voice, and outputs at least two voice command segments (corresponding to the voice segmentation unit 302 in Fig. 3).
As described above, the current environment of the voice is unlikely to be absolutely quiet; therefore, in the voice segmentation step S702, in addition to the above-mentioned at least two voice command segments, the CPU 101 will also output at least one background sound segment.
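The patent leaves the segmentation criterion of step S702 unspecified. One common technique that would yield both voice command segments and background segments is a per-frame energy threshold; the sketch below is a simplified illustration under that assumption, not the patent's method:

```python
import numpy as np

def split_voice(samples, rate, frame_ms=30, threshold=0.01):
    """Split audio into voice command segments and background segments
    using a simple per-frame energy threshold (hypothetical criterion;
    the patent leaves the segmentation method unspecified)."""
    frame_len = int(rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    flags = [np.mean(samples[i*frame_len:(i+1)*frame_len] ** 2) > threshold
             for i in range(n_frames)]
    segments, start = [], 0
    for i in range(1, n_frames + 1):
        # Close a segment at the end of the signal or where the flag flips.
        if i == n_frames or flags[i] != flags[i-1]:
            kind = "voice" if flags[start] else "background"
            segments.append((kind, start * frame_len, i * frame_len))
            start = i
    return segments

# Synthetic input: silence, a loud burst, silence, another burst.
rate = 8000
quiet = np.zeros(rate // 2)
loud = 0.5 * np.sin(2 * np.pi * 440 * np.arange(rate // 2) / rate)
audio = np.concatenate([quiet, loud, quiet, loud])
for kind, s, e in split_voice(audio, rate):
    print(kind, s, e)
```

On this synthetic input the function returns alternating background and voice segments, matching the step's requirement of at least two voice command segments plus at least one background sound segment.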
In the predefined first voice command recognition step S703, the CPU 101 of the electronic device 1 shown in Fig. 6 recognizes, by using the SI-AM stored in the electronic device 1 shown in Fig. 6, the predefined first voice command from the voice command segments output from the voice segmentation step S702 (corresponding to the predefined first voice command recognition unit 303 in Fig. 3). As described above, the predefined first voice command can, for example, be a predefined introductory word.
In the transformation matrix calculation step S704, the CPU 101 of the electronic device 1 shown in Fig. 6 calculates, based on the voice command segment recognized as the predefined first voice command, the transformation matrix for the current user, wherein the calculated transformation matrix can make the SI-AM match the voice command segment recognized as the predefined first voice command (corresponding to the transformation matrix calculation unit 304 in Fig. 3).
In addition, as described above, since the current environment of the voice is unlikely to be absolutely quiet, in the transformation matrix calculation step S704 the CPU 101 can calculate the transformation matrix for the current user based on both the background sound segment and the voice command segment recognized as the predefined first voice command. As a preferred solution, the transformation matrix can be calculated by using the above-mentioned MLLR method. In addition, as an optional solution, the CPU 101 can store the calculated transformation matrix for the current user in the hard disk 104 of the electronic device 1 shown in Fig. 6 for subsequent work, such as selecting AMs for other users using the present invention.
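MLLR adapts the Gaussian means of the SI-AM via an affine transform, mu' = A*mu + b. In the simplest single-regression-class case with identity covariances, estimating W = [A | b] from adaptation data reduces to a least-squares fit between the adaptation frames and the means of the Gaussians they align to. The sketch below shows that simplified case only (the frame-to-Gaussian alignment is assumed given; production MLLR weights terms by occupancies and covariances):

```python
import numpy as np

def estimate_mllr_transform(frames, means):
    """Least-squares estimate of a global MLLR mean transform W = [A | b]
    such that frames[t] ~= A @ means[t] + b. This is the single-
    regression-class, identity-covariance simplification of MLLR.

    frames: (T, d) adaptation feature vectors
    means:  (T, d) mean of the SI-AM Gaussian each frame aligns to
    """
    t, d = means.shape
    ext = np.hstack([means, np.ones((t, 1))])   # extended means [mu; 1]
    # Solve ext @ W.T ~= frames, so W has shape (d, d+1).
    w, *_ = np.linalg.lstsq(ext, frames, rcond=None)
    return w.T                                  # rows of [A | b]

# Toy check: frames generated by a known affine transform are recovered.
rng = np.random.default_rng(0)
means = rng.normal(size=(50, 2))
a_true = np.array([[1.1, 0.0], [0.2, 0.9]])
b_true = np.array([0.5, -0.3])
frames = means @ a_true.T + b_true
w = estimate_mllr_transform(frames, means)
print(np.round(w, 3))
```

Because the toy frames are generated by an exact affine map, the least-squares solution recovers A and b, which makes the sketch easy to verify.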
In the model selection step S705, the CPU 101 of the electronic device 1 shown in Fig. 6 selects, based on the calculated transformation matrix, the AM for the current user from among the AMs registered in the electronic device 1 shown in Fig. 6 (corresponding to the model selection unit 305 in Fig. 3).
In addition, the electronic device 1 shown in Fig. 6 can be communicably connected with other electronic devices via the network 7, as shown in Fig. 1; therefore, in the model selection step S705, the CPU 101 of the electronic device 1 shown in Fig. 6 can also select, based on the calculated transformation matrix, the AM for the current user from among the AMs registered in those other electronic devices.
As described above, since there are phoneme-based AMs and command-word-based AMs, the model selection step S705 also includes: a phoneme-based acoustic model selection step of selecting, based on the calculated transformation matrix, the phoneme-based AM of the phone set for the current user from among the phoneme-based AMs of phone sets registered in the electronic device 1 shown in Fig. 6 or in other electronic devices interconnected via the network; and/or a command-word-based acoustic model selection step of selecting, based on the calculated transformation matrix, the command-word-based AM for the current user from among the command-word-based AMs registered in the electronic device 1 shown in Fig. 6 or in other electronic devices interconnected via the network.
Fig. 8 shows an exemplary method for selecting a phoneme-based AM. Fig. 8 is a flowchart schematically showing the steps, performed in the model selection step S705 illustrated in Fig. 7, for selecting a phoneme-based AM.
As shown in Fig. 8, in the first transformation matrix acquiring step S7051, the CPU 101 of the electronic device 1 shown in Fig. 6 acquires a transformation matrix for each phoneme-based AM of a phone set registered in the electronic device 1 shown in Fig. 6 or in other electronic devices interconnected via the network (corresponding to the first transformation matrix acquiring unit 3151 in Fig. 4).
In the first distance calculation step S7052, the CPU 101 of the electronic device 1 shown in Fig. 6 calculates the distance between the following two transformation matrices: one is the calculated transformation matrix for the current user output from the transformation matrix calculation step S704 in Fig. 7, and the other is an acquired transformation matrix, output from the first transformation matrix acquiring step S7051, for the phoneme-based AM of a phone set (corresponding to the first distance calculation unit 3152 in Fig. 4).
In the phoneme-based acoustic model determination step S7053, the CPU 101 of the electronic device 1 shown in Fig. 6 determines the phoneme-based AM of the phone set with the smallest distance, as the selected phoneme-based AM for the current user (corresponding to the phoneme-based acoustic model determination unit 3153 in Fig. 4).
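Taken together, steps S7051 to S7053 amount to a nearest-neighbour search over the transforms stored with the registered AMs. A minimal sketch (the model names and the choice of Frobenius distance are illustrative assumptions, not details fixed by the patent):

```python
import numpy as np

def select_phoneme_am(user_transform, registered_ams):
    """Return the name of the registered phoneme-based AM whose stored
    transformation matrix is closest to the current user's transform
    (steps S7051-S7053). Distance metric here: Euclidean/Frobenius."""
    best_name, best_dist = None, float("inf")
    for name, am_transform in registered_ams.items():        # S7051: acquire
        dist = np.linalg.norm(user_transform - am_transform)  # S7052: distance
        if dist < best_dist:                                  # S7053: minimum
            best_name, best_dist = name, dist
    return best_name

registered = {
    "phone_set_accent_a": np.array([[1.0, 0.0], [0.0, 1.0]]),
    "phone_set_accent_b": np.array([[1.2, 0.1], [0.1, 0.8]]),
}
user = np.array([[1.15, 0.1], [0.05, 0.85]])
print(select_phoneme_am(user, registered))
```

Here the user's transform lies closer to the second registered transform, so that phone set's AM would be selected for the current user.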
Fig. 9 shows an exemplary method for selecting a command-word-based AM for one command word in the above-mentioned command word list. Fig. 9 is a flowchart schematically showing the steps, performed in the model selection step S705 illustrated in Fig. 7, for selecting a command-word-based AM.
As shown in Fig. 9, in the second transformation matrix acquiring step S7151, the CPU 101 of the electronic device 1 shown in Fig. 6 acquires a transformation matrix for each command-word-based AM corresponding to the command word registered in the electronic device 1 shown in Fig. 6 or in other electronic devices interconnected via the network (corresponding to the second transformation matrix acquiring unit 3251 in Fig. 4).
In the second distance calculation step S7152, the CPU 101 of the electronic device 1 shown in Fig. 6 calculates the distance between the following two transformation matrices: one is the calculated transformation matrix for the current user output from the transformation matrix calculation step S704 in Fig. 7, and the other is an acquired transformation matrix, output from the second transformation matrix acquiring step S7151, for a command-word-based AM corresponding to the command word (corresponding to the second distance calculation unit 3252 in Fig. 4).
In the command-word-based acoustic model determination step S7153, the CPU 101 of the electronic device 1 shown in Fig. 6 determines the command-word-based AM corresponding to the command word with the smallest distance, as the selected command-word-based AM for the current user (corresponding to the command-word-based acoustic model determination unit 3253 in Fig. 4).
In step S7154, the CPU 101 of the electronic device 1 shown in Fig. 6 judges whether it can obtain a minimum distance for each command word in the command word list. If the CPU 101 cannot obtain a minimum distance for some command word in the command word list, then in the recommendation step S7155 the CPU 101 recommends that the current user register a command-word-based AM for each command word for which no corresponding command-word-based AM could be determined in the command-word-based acoustic model determination step S7153, as the selected command-word-based AM (corresponding to the recommendation unit 3254 in Fig. 4).
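Steps S7151 to S7155 can be sketched as one nearest-transform lookup per command word, with a registration recommendation whenever no candidate AM exists for a word. The data layout below (a dict of command word to candidate AMs) is an illustrative assumption:

```python
import numpy as np

def select_command_word_ams(user_transform, command_words, registered):
    """For each command word, pick the registered command-word-based AM
    whose transform is closest to the user's (S7151-S7153); if none is
    registered, recommend that the user register one (S7154-S7155).

    registered: dict mapping command word -> list of (am_name, transform)
    """
    selected, to_recommend = {}, []
    for word in command_words:
        candidates = registered.get(word, [])
        if not candidates:                        # S7154: no minimum exists
            to_recommend.append(word)             # S7155: recommend registering
            continue
        selected[word] = min(
            candidates,
            key=lambda c: np.linalg.norm(user_transform - c[1]),
        )[0]
    return selected, to_recommend

user = np.array([[1.0, 0.1], [0.0, 0.9]])
registered = {
    "print": [("print_male", np.eye(2)),
              ("print_female", np.array([[1.0, 0.1], [0.0, 0.95]]))],
}
sel, rec = select_command_word_ams(user, ["print", "scan"], registered)
print(sel, rec)
```

In this toy run, "print" gets the closest of its two candidate AMs, while "scan" has no registered AM and ends up on the recommendation list.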
Although in the above-mentioned command-word-based acoustic model selection step the CPU 101 of the electronic device 1 shown in Fig. 6 selects a command-word-based AM for each command word in the command word list, the CPU 101 can also select command-word-based AMs only for those command words in the command word list for which the current user has not registered a corresponding command-word-based AM in the electronic device 1 shown in Fig. 6.
Now, returning to Fig. 7: in the second voice command recognition step S706, the CPU 101 of the electronic device 1 shown in Fig. 6 recognizes, by using the selected AM, the second voice command from the voice command segments, wherein the voice command segments do not include the voice command segment recognized as the predefined first voice command in the predefined first voice command recognition step S703 (corresponding to the second voice command recognition unit 306 in Fig. 3).
It should be pointed out that the units of the speech recognition device 106 shown in Figs. 3 and 4 can be constructed to perform the respective steps of the speech recognition methods shown in the flowcharts in Figs. 7 to 9.
Fig. 10 is a flowchart of the speech recognition operation, related to the voice input unit 125 of the electronic device 1 in Fig. 6, according to the above-mentioned second exemplary embodiment of the present invention. The operations of the corresponding steps below are realized when the CPU 101 loads the programs stored in the ROM 103 and/or the hard disk 104 into the RAM 102 and executes the corresponding programs.
Comparing Fig. 10 with Fig. 7, the speech recognition method shown in Fig. 10 differs in the following respects:
First, the speech recognition method further includes a third voice command recognition step S1001. In the third voice command recognition step S1001, the CPU 101 of the electronic device 1 shown in Fig. 6 recognizes, by using the SI-AM stored in the electronic device 1 shown in Fig. 6, the second voice command from the voice command segments output from the model selection step S705, wherein the voice command segments do not include the voice command segment recognized as the predefined first voice command in the predefined first voice command recognition step S703.
Second, in step S1002, the CPU 101 of the electronic device 1 shown in Fig. 6 also judges whether the recognition confidence of the recognized second voice command output from the third voice command recognition step S1001 is less than a predefined threshold, wherein the predefined threshold can, for example, be predefined by the user or the manufacturer according to the actual application. If the recognition confidence of the recognized second voice command is less than the predefined threshold, then in the second voice command recognition step S706 the CPU 101 of the electronic device 1 shown in Fig. 6 recognizes, by using the selected AM, the second voice command from the voice command segments output from step S1002, wherein the voice command segments do not include the voice command segment recognized as the predefined first voice command in the predefined first voice command recognition step S703.
Otherwise, if the recognition confidence of the recognized second voice command is greater than or equal to the predefined threshold, the CPU 101 of the electronic device 1 shown in Fig. 6 outputs the recognized second voice command to the operation unit 107 or the output device 108 of the electronic device 1 shown in Fig. 6.
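The control flow of steps S1001, S1002, and S706 is a two-pass scheme: a speaker-independent pass first, then a fall-back pass with the user-selected AM only when confidence is low. A minimal sketch, in which the recognizer interface (a callable returning a command and a confidence) and the threshold value are illustrative assumptions:

```python
def recognize_second_command(segments, si_am, selected_am, threshold=0.8):
    """Two-pass recognition per Fig. 10: first decode with the
    speaker-independent AM (S1001); if the confidence is below the
    predefined threshold (S1002), re-decode with the AM selected for
    the current user (S706). Each AM is modelled as a callable
    returning (command, confidence) -- a placeholder interface."""
    command, confidence = si_am(segments)            # S1001
    if confidence < threshold:                       # S1002
        command, confidence = selected_am(segments)  # S706
    return command, confidence

# Stub AMs: the SI-AM is unsure, the user-adapted AM is confident.
si_am = lambda segs: ("copy", 0.55)
selected_am = lambda segs: ("copy A4", 0.92)
print(recognize_second_command(["seg1"], si_am, selected_am))
```

The design point is that the second, adapted-model pass only runs when needed, so a confident speaker-independent result is returned without the extra decoding cost.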
The other detailed descriptions of the voice input step S701, the voice segmentation step S702, the predefined first voice command recognition step S703, the transformation matrix calculation step S704, and the model selection step S705 shown in Fig. 10 are similar to those of the corresponding steps shown in Fig. 7, and thus the detailed descriptions will not be repeated here.
It should be pointed out that the units of the speech recognition device 106 shown in Fig. 5 can be constructed to perform the respective steps of the speech recognition method shown in the flowchart in Fig. 10.
With the above-mentioned exemplary speech recognition devices and speech recognition methods, the transformation matrix is calculated based on the voice instantaneously input by the current user, namely a part of the split voice command segments (i.e., the voice command segment recognized as the predefined first voice command), and can also be calculated based on that part of the voice command segments together with the background sound segment. Therefore, the AM selected based on the calculated transformation matrix can well match the current user's instantaneous speech characteristics and instantaneous environment characteristics, and the speech recognition performance for the current user can be improved by using the selected AM.
All of the above-mentioned units are exemplary and/or preferred modules for realizing the processes described in the present disclosure. These units can be hardware units (such as field-programmable gate arrays (FPGAs), digital signal processors, or application-specific integrated circuits) and/or software modules (such as computer-readable programs). The units for realizing the various steps are not described exhaustively above. However, where there is a step of performing a certain process, there can be a corresponding functional module or unit (realized by hardware and/or software) for realizing the same process. Technical solutions formed by all combinations of the described steps and the units corresponding to these steps are included in the disclosure of the present application, as long as the technical solutions they constitute are complete and applicable.
The methods and devices of the present invention can be implemented in many ways. For example, the methods and devices of the present invention can be implemented by software, hardware, firmware, or any combination thereof. The above-described order of the steps of the methods is intended to be illustrative only, and the steps of the methods of the present invention are not limited to the order specifically described above, unless otherwise specified. In addition, in some embodiments, the present invention can also be embodied as programs recorded in a recording medium, including machine-readable instructions for realizing the methods according to the present invention. Therefore, the present invention also covers a recording medium storing programs for realizing the methods according to the present invention.
Although some specific embodiments of the present invention have been described in detail above with examples, those skilled in the art should understand that the above examples are intended to be illustrative only and do not limit the scope of the present invention. Those skilled in the art will appreciate that the above embodiments can be modified without departing from the scope and spirit of the present invention. The scope of the present invention is defined by the appended claims.
Claims (16)
1. A speech recognition device, the speech recognition device comprising:
a voice input unit configured to obtain voice input by a current user;
a voice segmentation unit configured to split the obtained voice and output at least two voice command segments;
a predefined first voice command recognition unit configured to recognize a predefined first voice command from the voice command segments by using a speaker-independent acoustic model;
a transformation matrix calculation unit configured to calculate a transformation matrix for the current user based on the voice command segment recognized as the predefined first voice command, wherein the calculated transformation matrix can make the speaker-independent acoustic model match the voice command segment recognized as the predefined first voice command;
a model selection unit configured to select, based on the calculated transformation matrix, an acoustic model for the current user from among acoustic models registered in the speech recognition device; and
a second voice command recognition unit configured to recognize a second voice command from the voice command segments by using the selected acoustic model.
2. The speech recognition device according to claim 1, wherein
the output of the voice segmentation unit further includes at least one background sound segment,
the transformation matrix calculation unit calculates the transformation matrix based on the background sound segment and the voice command segment recognized as the predefined first voice command, and
the calculated transformation matrix can make the speaker-independent acoustic model match the background sound segment and the voice command segment recognized as the predefined first voice command.
3. The speech recognition device according to claim 1 or claim 2, wherein the model selection unit includes:
a phoneme-based acoustic model selection unit configured to select, based on the calculated transformation matrix, a phoneme-based acoustic model of a phone set for the current user from among phoneme-based acoustic models of phone sets registered in the speech recognition device; and/or
a command-word-based acoustic model selection unit configured to select, based on the calculated transformation matrix, a command-word-based acoustic model for the current user from among command-word-based acoustic models registered in the speech recognition device.
4. The speech recognition device according to claim 3, wherein the phoneme-based acoustic model selection unit includes:
a first transformation matrix acquiring unit configured to acquire transformation matrices for the phoneme-based acoustic models of phone sets registered in the speech recognition device;
a first distance calculation unit configured to calculate the distance between the calculated transformation matrix for the current user and an acquired transformation matrix for a phoneme-based acoustic model of a phone set; and
a phoneme-based acoustic model determination unit configured to determine the phoneme-based acoustic model of the phone set with the smallest distance, as the selected phoneme-based acoustic model for the current user.
5. The speech recognition device according to claim 3, wherein the command-word-based acoustic model selection unit includes:
a second transformation matrix acquiring unit configured to acquire, for each command word in a predefined command word list, transformation matrices for the command-word-based acoustic models corresponding to that command word registered in the speech recognition device;
a second distance calculation unit configured to calculate, for each command word in the predefined command word list, the distance between the calculated transformation matrix for the current user and an acquired transformation matrix for a command-word-based acoustic model corresponding to that command word; and
a command-word-based acoustic model determination unit configured to determine, for each command word in the predefined command word list, the command-word-based acoustic model corresponding to that command word with the smallest distance, as the selected command-word-based acoustic model for the current user.
6. The speech recognition device according to claim 5, wherein the command-word-based acoustic model selection unit further includes:
a recommendation unit configured to recommend that the current user register a command-word-based acoustic model for each command word for which the command-word-based acoustic model determination unit cannot determine a corresponding command-word-based acoustic model, as the selected command-word-based acoustic model.
7. The speech recognition device according to claim 5, wherein the command-word-based acoustic model selection unit selects command-word-based acoustic models only for those command words in the predefined command word list for which the current user has not registered a corresponding command-word-based acoustic model in the speech recognition device.
8. The speech recognition device according to claim 1 or claim 2, wherein the speech recognition device verifies whether the acoustic models in an acoustic model set cover the complete command words in a predefined command word list,
if the acoustic models in the acoustic model set cover the complete command words in the predefined command word list, the speech recognition device recognizes the second voice command from the voice command segments by using the acoustic models in the acoustic model set; otherwise, the speech recognition device recognizes the second voice command from the voice command segments by using the selected acoustic model, and
wherein the acoustic model set and the predefined command word list are loaded when the current user logs into the speech recognition device.
9. The speech recognition device according to claim 1 or claim 2, wherein the model selection unit selects the acoustic model for the current user from among acoustic models registered in the speech recognition device and/or in other speech recognition devices interconnected via a network.
10. The speech recognition device according to claim 1 or claim 2, the speech recognition device further comprising:
a third voice command recognition unit configured to recognize the second voice command from the voice command segments by using the speaker-independent acoustic model, wherein
when the recognition confidence output from the third voice command recognition unit is less than a predefined threshold, the second voice command recognition unit recognizes the second voice command from the voice command segments by using the selected acoustic model.
11. A speech recognition method, comprising:
a voice input step of obtaining voice input to a speech recognition device by a current user;
a voice segmentation step of splitting the obtained voice and outputting at least two voice command segments;
a predefined first voice command recognition step of identifying a predefined first voice command from the voice command segments by using a speaker-independent acoustic model;
a transformation matrix calculation step of calculating a transformation matrix for the current user based on the voice command segment identified as the predefined first voice command, wherein the calculated transformation matrix makes the speaker-independent acoustic model match the voice command segment identified as the predefined first voice command;
a model selection step of selecting, based on the calculated transformation matrix, an acoustic model for the current user from acoustic models registered in the speech recognition device; and
a second voice command recognition step of identifying a second voice command from the voice command segments by using the selected acoustic model.
12. The speech recognition method according to claim 11, wherein:
the output of the voice segmentation step further includes at least one background sound segment;
the transformation matrix calculation step calculates the transformation matrix based on the background sound segment and the voice command segment identified as the predefined first voice command; and
the calculated transformation matrix makes the speaker-independent acoustic model match the background sound segment and the voice command segment identified as the predefined first voice command.
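Claim 12 pools the background sound segment with the matched command segment when estimating the transform. As a drastically simplified, one-dimensional stand-in for an MLLR-style estimate, one can fit a single affine map y ≈ a·x + b between model-side and observed feature values pooled from both segment types. The least-squares formulation here is an assumption; the claims do not specify the estimation method:

```python
def fit_affine(model_frames, observed_frames):
    # Least-squares fit of observed ≈ a * model + b over frames pooled
    # from the background segment and the first-command segment.
    n = len(model_frames)
    mx = sum(model_frames) / n
    my = sum(observed_frames) / n
    sxx = sum((x - mx) ** 2 for x in model_frames)
    sxy = sum((x - mx) * (y - my)
              for x, y in zip(model_frames, observed_frames))
    a = sxy / sxx
    b = my - a * mx
    return a, b
```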
13. The speech recognition method according to claim 11 or claim 12, wherein the model selection step includes:
a phoneme-based acoustic model selection step of selecting, based on the calculated transformation matrix, a phoneme-based acoustic model of a phone set for the current user from phoneme-based acoustic models of phone sets registered in the speech recognition device; and/or
a command-word-based acoustic model selection step of selecting, based on the calculated transformation matrix, a command-word-based acoustic model for the current user from command-word-based acoustic models registered in the speech recognition device.
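One way to make selection "based on the calculated transformation matrix" concrete, for either the phoneme-based or the command-word-based model pool, is a nearest-neighbour search over transforms stored when each model was registered. The Frobenius-distance criterion below is an assumption, not something the claims state:

```python
def frobenius_distance(a, b):
    # Root of summed squared element-wise differences between two
    # matrices given as nested lists.
    return sum((x - y) ** 2
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b)) ** 0.5

def select_acoustic_model(user_transform, registered_transforms):
    # Pick the registered model (keyed by id) whose stored transform is
    # closest to the transform just estimated for the current user.
    return min(registered_transforms,
               key=lambda model_id: frobenius_distance(
                   user_transform, registered_transforms[model_id]))
```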
14. The speech recognition method according to claim 11 or claim 12, wherein the speech recognition method verifies whether an acoustic model in an acoustic model set covers the complete command words in a predefined command word list;
if an acoustic model in the acoustic model set covers the complete command words in the predefined command word list, the second voice command is identified from the voice command segments by using that acoustic model in the acoustic model set; otherwise, the second voice command is identified from the voice command segments by using the selected acoustic model; and
wherein the acoustic model set and the predefined command word list are loaded when the current user logs into the speech recognition device.
15. The speech recognition method according to claim 11 or claim 12, wherein the model selection step selects the acoustic model for the current user from acoustic models registered in the speech recognition device and/or in other speech recognition devices interconnected via a network.
16. The speech recognition method according to claim 11 or claim 12, further comprising:
a third voice command recognition step of identifying the second voice command from the voice command segments by using the speaker-independent acoustic model,
wherein, when the recognition confidence output by the third voice command recognition step is lower than a predefined threshold, the second voice command recognition step identifies the second voice command from the voice command segments by using the selected acoustic model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510032839.8A CN105869641A (en) | 2015-01-22 | 2015-01-22 | Speech recognition device and speech recognition method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510032839.8A CN105869641A (en) | 2015-01-22 | 2015-01-22 | Speech recognition device and speech recognition method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105869641A true CN105869641A (en) | 2016-08-17 |
Family
ID=56623369
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510032839.8A Pending CN105869641A (en) | 2015-01-22 | 2015-01-22 | Speech recognition device and speech recognition method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105869641A (en) |
Cited By (66)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109257942A (en) * | 2017-05-12 | 2019-01-22 | 苹果公司 | The specific acoustic model of user |
US10410637B2 (en) | 2017-05-12 | 2019-09-10 | Apple Inc. | User-specific acoustic models |
CN110706710A (en) * | 2018-06-25 | 2020-01-17 | 普天信息技术有限公司 | Voice recognition method and device, electronic equipment and storage medium |
US10720160B2 (en) | 2018-06-01 | 2020-07-21 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US10741181B2 (en) | 2017-05-09 | 2020-08-11 | Apple Inc. | User interface for correcting recognition errors |
US10878809B2 (en) | 2014-05-30 | 2020-12-29 | Apple Inc. | Multi-command single utterance input method |
US10930282B2 (en) | 2015-03-08 | 2021-02-23 | Apple Inc. | Competing devices responding to voice triggers |
US10978090B2 (en) | 2013-02-07 | 2021-04-13 | Apple Inc. | Voice trigger for a digital assistant |
US11009970B2 (en) | 2018-06-01 | 2021-05-18 | Apple Inc. | Attention aware virtual assistant dismissal |
US11037565B2 (en) | 2016-06-10 | 2021-06-15 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US11070949B2 (en) | 2015-05-27 | 2021-07-20 | Apple Inc. | Systems and methods for proactively identifying and surfacing relevant content on an electronic device with a touch-sensitive display |
US11087759B2 (en) | 2015-03-08 | 2021-08-10 | Apple Inc. | Virtual assistant activation |
US11120372B2 (en) | 2011-06-03 | 2021-09-14 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US11126400B2 (en) | 2015-09-08 | 2021-09-21 | Apple Inc. | Zero latency digital assistant |
US11133008B2 (en) | 2014-05-30 | 2021-09-28 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US11140099B2 (en) | 2019-05-21 | 2021-10-05 | Apple Inc. | Providing message response suggestions |
US11152002B2 (en) | 2016-06-11 | 2021-10-19 | Apple Inc. | Application integration with a digital assistant |
US11169616B2 (en) | 2018-05-07 | 2021-11-09 | Apple Inc. | Raise to speak |
US11217251B2 (en) | 2019-05-06 | 2022-01-04 | Apple Inc. | Spoken notifications |
US11237797B2 (en) | 2019-05-31 | 2022-02-01 | Apple Inc. | User activity shortcut suggestions |
US11257504B2 (en) | 2014-05-30 | 2022-02-22 | Apple Inc. | Intelligent assistant for home automation |
US11269678B2 (en) | 2012-05-15 | 2022-03-08 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US11289073B2 (en) | 2019-05-31 | 2022-03-29 | Apple Inc. | Device text to speech |
US11307752B2 (en) | 2019-05-06 | 2022-04-19 | Apple Inc. | User configurable task triggers |
US11348582B2 (en) | 2008-10-02 | 2022-05-31 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US11348573B2 (en) | 2019-03-18 | 2022-05-31 | Apple Inc. | Multimodality in digital assistant systems |
US11360641B2 (en) | 2019-06-01 | 2022-06-14 | Apple Inc. | Increasing the relevance of new available information |
US11380310B2 (en) | 2017-05-12 | 2022-07-05 | Apple Inc. | Low-latency intelligent automated assistant |
US11388291B2 (en) | 2013-03-14 | 2022-07-12 | Apple Inc. | System and method for processing voicemail |
US11405466B2 (en) | 2017-05-12 | 2022-08-02 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US11423886B2 (en) | 2010-01-18 | 2022-08-23 | Apple Inc. | Task flow identification based on user intent |
US11423908B2 (en) | 2019-05-06 | 2022-08-23 | Apple Inc. | Interpreting spoken requests |
US11431642B2 (en) | 2018-06-01 | 2022-08-30 | Apple Inc. | Variable latency device coordination |
US11467802B2 (en) | 2017-05-11 | 2022-10-11 | Apple Inc. | Maintaining privacy of personal information |
US11468282B2 (en) | 2015-05-15 | 2022-10-11 | Apple Inc. | Virtual assistant in a communication session |
US11475898B2 (en) | 2018-10-26 | 2022-10-18 | Apple Inc. | Low-latency multi-speaker speech recognition |
US11475884B2 (en) | 2019-05-06 | 2022-10-18 | Apple Inc. | Reducing digital assistant latency when a language is incorrectly determined |
US11488406B2 (en) | 2019-09-25 | 2022-11-01 | Apple Inc. | Text detection using global geometry estimators |
US11496600B2 (en) | 2019-05-31 | 2022-11-08 | Apple Inc. | Remote execution of machine-learned models |
US11500672B2 (en) | 2015-09-08 | 2022-11-15 | Apple Inc. | Distributed personal assistant |
US11516537B2 (en) | 2014-06-30 | 2022-11-29 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US11526368B2 (en) | 2015-11-06 | 2022-12-13 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US11532306B2 (en) | 2017-05-16 | 2022-12-20 | Apple Inc. | Detecting a trigger of a digital assistant |
US11599331B2 (en) | 2017-05-11 | 2023-03-07 | Apple Inc. | Maintaining privacy of personal information |
US11656884B2 (en) | 2017-01-09 | 2023-05-23 | Apple Inc. | Application integration with a digital assistant |
US11657813B2 (en) | 2019-05-31 | 2023-05-23 | Apple Inc. | Voice identification in digital assistant systems |
US11671920B2 (en) | 2007-04-03 | 2023-06-06 | Apple Inc. | Method and system for operating a multifunction portable electronic device using voice-activation |
US11675829B2 (en) | 2017-05-16 | 2023-06-13 | Apple Inc. | Intelligent automated assistant for media exploration |
US11696060B2 (en) | 2020-07-21 | 2023-07-04 | Apple Inc. | User identification using headphones |
US11710482B2 (en) | 2018-03-26 | 2023-07-25 | Apple Inc. | Natural assistant interaction |
US11727219B2 (en) | 2013-06-09 | 2023-08-15 | Apple Inc. | System and method for inferring user intent from speech inputs |
US11755276B2 (en) | 2020-05-12 | 2023-09-12 | Apple Inc. | Reducing description length based on confidence |
US11765209B2 (en) | 2020-05-11 | 2023-09-19 | Apple Inc. | Digital assistant hardware abstraction |
US11790914B2 (en) | 2019-06-01 | 2023-10-17 | Apple Inc. | Methods and user interfaces for voice-based control of electronic devices |
US11798547B2 (en) | 2013-03-15 | 2023-10-24 | Apple Inc. | Voice activated device for use with a voice-based digital assistant |
US11809483B2 (en) | 2015-09-08 | 2023-11-07 | Apple Inc. | Intelligent automated assistant for media search and playback |
US11810578B2 (en) | 2020-05-11 | 2023-11-07 | Apple Inc. | Device arbitration for digital assistant-based intercom systems |
US11809783B2 (en) | 2016-06-11 | 2023-11-07 | Apple Inc. | Intelligent device arbitration and control |
US11838734B2 (en) | 2020-07-20 | 2023-12-05 | Apple Inc. | Multi-device audio adjustment coordination |
US11853647B2 (en) | 2015-12-23 | 2023-12-26 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US11854539B2 (en) | 2018-05-07 | 2023-12-26 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US11853536B2 (en) | 2015-09-08 | 2023-12-26 | Apple Inc. | Intelligent automated assistant in a media environment |
US11886805B2 (en) | 2015-11-09 | 2024-01-30 | Apple Inc. | Unconventional virtual assistant interactions |
US11893992B2 (en) | 2018-09-28 | 2024-02-06 | Apple Inc. | Multi-modal inputs for voice commands |
US11914848B2 (en) | 2020-05-11 | 2024-02-27 | Apple Inc. | Providing relevant data items based on context |
US11947873B2 (en) | 2015-06-29 | 2024-04-02 | Apple Inc. | Virtual assistant for media playback |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101622660A (en) * | 2007-02-28 | 2010-01-06 | 日本电气株式会社 | Audio recognition device, audio recognition method, and audio recognition program |
CN103221996A (en) * | 2010-12-10 | 2013-07-24 | 松下电器产业株式会社 | Device and method for pass-hrase modeling for speaker verification, and verification system |
CN103236260A (en) * | 2013-03-29 | 2013-08-07 | 京东方科技集团股份有限公司 | Voice recognition system |
CN103578462A (en) * | 2012-07-18 | 2014-02-12 | 株式会社东芝 | Speech processing system |
CN104137178A (en) * | 2011-12-19 | 2014-11-05 | 斯班逊有限公司 | Acoustic processing unit interface |
CN104143332A (en) * | 2013-05-08 | 2014-11-12 | 卡西欧计算机株式会社 | VOICE PROCESSING DEVICE, and VOICE PROCESSING METHOD |
Cited By (99)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11671920B2 (en) | 2007-04-03 | 2023-06-06 | Apple Inc. | Method and system for operating a multifunction portable electronic device using voice-activation |
US11348582B2 (en) | 2008-10-02 | 2022-05-31 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US11900936B2 (en) | 2008-10-02 | 2024-02-13 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US11423886B2 (en) | 2010-01-18 | 2022-08-23 | Apple Inc. | Task flow identification based on user intent |
US11120372B2 (en) | 2011-06-03 | 2021-09-14 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US11269678B2 (en) | 2012-05-15 | 2022-03-08 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US11321116B2 (en) | 2012-05-15 | 2022-05-03 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US11862186B2 (en) | 2013-02-07 | 2024-01-02 | Apple Inc. | Voice trigger for a digital assistant |
US10978090B2 (en) | 2013-02-07 | 2021-04-13 | Apple Inc. | Voice trigger for a digital assistant |
US11557310B2 (en) | 2013-02-07 | 2023-01-17 | Apple Inc. | Voice trigger for a digital assistant |
US11636869B2 (en) | 2013-02-07 | 2023-04-25 | Apple Inc. | Voice trigger for a digital assistant |
US11388291B2 (en) | 2013-03-14 | 2022-07-12 | Apple Inc. | System and method for processing voicemail |
US11798547B2 (en) | 2013-03-15 | 2023-10-24 | Apple Inc. | Voice activated device for use with a voice-based digital assistant |
US11727219B2 (en) | 2013-06-09 | 2023-08-15 | Apple Inc. | System and method for inferring user intent from speech inputs |
US10878809B2 (en) | 2014-05-30 | 2020-12-29 | Apple Inc. | Multi-command single utterance input method |
US11133008B2 (en) | 2014-05-30 | 2021-09-28 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US11257504B2 (en) | 2014-05-30 | 2022-02-22 | Apple Inc. | Intelligent assistant for home automation |
US11810562B2 (en) | 2014-05-30 | 2023-11-07 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US11699448B2 (en) | 2014-05-30 | 2023-07-11 | Apple Inc. | Intelligent assistant for home automation |
US11670289B2 (en) | 2014-05-30 | 2023-06-06 | Apple Inc. | Multi-command single utterance input method |
US11838579B2 (en) | 2014-06-30 | 2023-12-05 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US11516537B2 (en) | 2014-06-30 | 2022-11-29 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US11842734B2 (en) | 2015-03-08 | 2023-12-12 | Apple Inc. | Virtual assistant activation |
US10930282B2 (en) | 2015-03-08 | 2021-02-23 | Apple Inc. | Competing devices responding to voice triggers |
US11087759B2 (en) | 2015-03-08 | 2021-08-10 | Apple Inc. | Virtual assistant activation |
US11468282B2 (en) | 2015-05-15 | 2022-10-11 | Apple Inc. | Virtual assistant in a communication session |
US11070949B2 (en) | 2015-05-27 | 2021-07-20 | Apple Inc. | Systems and methods for proactively identifying and surfacing relevant content on an electronic device with a touch-sensitive display |
US11947873B2 (en) | 2015-06-29 | 2024-04-02 | Apple Inc. | Virtual assistant for media playback |
US11550542B2 (en) | 2015-09-08 | 2023-01-10 | Apple Inc. | Zero latency digital assistant |
US11126400B2 (en) | 2015-09-08 | 2021-09-21 | Apple Inc. | Zero latency digital assistant |
US11954405B2 (en) | 2015-09-08 | 2024-04-09 | Apple Inc. | Zero latency digital assistant |
US11809483B2 (en) | 2015-09-08 | 2023-11-07 | Apple Inc. | Intelligent automated assistant for media search and playback |
US11500672B2 (en) | 2015-09-08 | 2022-11-15 | Apple Inc. | Distributed personal assistant |
US11853536B2 (en) | 2015-09-08 | 2023-12-26 | Apple Inc. | Intelligent automated assistant in a media environment |
US11526368B2 (en) | 2015-11-06 | 2022-12-13 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US11809886B2 (en) | 2015-11-06 | 2023-11-07 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US11886805B2 (en) | 2015-11-09 | 2024-01-30 | Apple Inc. | Unconventional virtual assistant interactions |
US11853647B2 (en) | 2015-12-23 | 2023-12-26 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US11657820B2 (en) | 2016-06-10 | 2023-05-23 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US11037565B2 (en) | 2016-06-10 | 2021-06-15 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US11152002B2 (en) | 2016-06-11 | 2021-10-19 | Apple Inc. | Application integration with a digital assistant |
US11809783B2 (en) | 2016-06-11 | 2023-11-07 | Apple Inc. | Intelligent device arbitration and control |
US11749275B2 (en) | 2016-06-11 | 2023-09-05 | Apple Inc. | Application integration with a digital assistant |
US11656884B2 (en) | 2017-01-09 | 2023-05-23 | Apple Inc. | Application integration with a digital assistant |
US10741181B2 (en) | 2017-05-09 | 2020-08-11 | Apple Inc. | User interface for correcting recognition errors |
US11599331B2 (en) | 2017-05-11 | 2023-03-07 | Apple Inc. | Maintaining privacy of personal information |
US11467802B2 (en) | 2017-05-11 | 2022-10-11 | Apple Inc. | Maintaining privacy of personal information |
US11538469B2 (en) | 2017-05-12 | 2022-12-27 | Apple Inc. | Low-latency intelligent automated assistant |
CN109257942A (en) * | 2017-05-12 | 2019-01-22 | 苹果公司 | The specific acoustic model of user |
US10410637B2 (en) | 2017-05-12 | 2019-09-10 | Apple Inc. | User-specific acoustic models |
CN109257942B (en) * | 2017-05-12 | 2020-01-14 | 苹果公司 | User-specific acoustic models |
US11862151B2 (en) | 2017-05-12 | 2024-01-02 | Apple Inc. | Low-latency intelligent automated assistant |
US11580990B2 (en) | 2017-05-12 | 2023-02-14 | Apple Inc. | User-specific acoustic models |
US11380310B2 (en) | 2017-05-12 | 2022-07-05 | Apple Inc. | Low-latency intelligent automated assistant |
US11837237B2 (en) | 2017-05-12 | 2023-12-05 | Apple Inc. | User-specific acoustic models |
US11405466B2 (en) | 2017-05-12 | 2022-08-02 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US11675829B2 (en) | 2017-05-16 | 2023-06-13 | Apple Inc. | Intelligent automated assistant for media exploration |
US11532306B2 (en) | 2017-05-16 | 2022-12-20 | Apple Inc. | Detecting a trigger of a digital assistant |
US11710482B2 (en) | 2018-03-26 | 2023-07-25 | Apple Inc. | Natural assistant interaction |
US11907436B2 (en) | 2018-05-07 | 2024-02-20 | Apple Inc. | Raise to speak |
US11854539B2 (en) | 2018-05-07 | 2023-12-26 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US11900923B2 (en) | 2018-05-07 | 2024-02-13 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US11169616B2 (en) | 2018-05-07 | 2021-11-09 | Apple Inc. | Raise to speak |
US11487364B2 (en) | 2018-05-07 | 2022-11-01 | Apple Inc. | Raise to speak |
US11360577B2 (en) | 2018-06-01 | 2022-06-14 | Apple Inc. | Attention aware virtual assistant dismissal |
US11431642B2 (en) | 2018-06-01 | 2022-08-30 | Apple Inc. | Variable latency device coordination |
US10984798B2 (en) | 2018-06-01 | 2021-04-20 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US10720160B2 (en) | 2018-06-01 | 2020-07-21 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US11009970B2 (en) | 2018-06-01 | 2021-05-18 | Apple Inc. | Attention aware virtual assistant dismissal |
US11630525B2 (en) | 2018-06-01 | 2023-04-18 | Apple Inc. | Attention aware virtual assistant dismissal |
CN110706710A (en) * | 2018-06-25 | 2020-01-17 | 普天信息技术有限公司 | Voice recognition method and device, electronic equipment and storage medium |
US11893992B2 (en) | 2018-09-28 | 2024-02-06 | Apple Inc. | Multi-modal inputs for voice commands |
US11475898B2 (en) | 2018-10-26 | 2022-10-18 | Apple Inc. | Low-latency multi-speaker speech recognition |
US11348573B2 (en) | 2019-03-18 | 2022-05-31 | Apple Inc. | Multimodality in digital assistant systems |
US11783815B2 (en) | 2019-03-18 | 2023-10-10 | Apple Inc. | Multimodality in digital assistant systems |
US11307752B2 (en) | 2019-05-06 | 2022-04-19 | Apple Inc. | User configurable task triggers |
US11475884B2 (en) | 2019-05-06 | 2022-10-18 | Apple Inc. | Reducing digital assistant latency when a language is incorrectly determined |
US11217251B2 (en) | 2019-05-06 | 2022-01-04 | Apple Inc. | Spoken notifications |
US11705130B2 (en) | 2019-05-06 | 2023-07-18 | Apple Inc. | Spoken notifications |
US11675491B2 (en) | 2019-05-06 | 2023-06-13 | Apple Inc. | User configurable task triggers |
US11423908B2 (en) | 2019-05-06 | 2022-08-23 | Apple Inc. | Interpreting spoken requests |
US11140099B2 (en) | 2019-05-21 | 2021-10-05 | Apple Inc. | Providing message response suggestions |
US11888791B2 (en) | 2019-05-21 | 2024-01-30 | Apple Inc. | Providing message response suggestions |
US11237797B2 (en) | 2019-05-31 | 2022-02-01 | Apple Inc. | User activity shortcut suggestions |
US11289073B2 (en) | 2019-05-31 | 2022-03-29 | Apple Inc. | Device text to speech |
US11360739B2 (en) | 2019-05-31 | 2022-06-14 | Apple Inc. | User activity shortcut suggestions |
US11657813B2 (en) | 2019-05-31 | 2023-05-23 | Apple Inc. | Voice identification in digital assistant systems |
US11496600B2 (en) | 2019-05-31 | 2022-11-08 | Apple Inc. | Remote execution of machine-learned models |
US11360641B2 (en) | 2019-06-01 | 2022-06-14 | Apple Inc. | Increasing the relevance of new available information |
US11790914B2 (en) | 2019-06-01 | 2023-10-17 | Apple Inc. | Methods and user interfaces for voice-based control of electronic devices |
US11488406B2 (en) | 2019-09-25 | 2022-11-01 | Apple Inc. | Text detection using global geometry estimators |
US11914848B2 (en) | 2020-05-11 | 2024-02-27 | Apple Inc. | Providing relevant data items based on context |
US11924254B2 (en) | 2020-05-11 | 2024-03-05 | Apple Inc. | Digital assistant hardware abstraction |
US11765209B2 (en) | 2020-05-11 | 2023-09-19 | Apple Inc. | Digital assistant hardware abstraction |
US11810578B2 (en) | 2020-05-11 | 2023-11-07 | Apple Inc. | Device arbitration for digital assistant-based intercom systems |
US11755276B2 (en) | 2020-05-12 | 2023-09-12 | Apple Inc. | Reducing description length based on confidence |
US11838734B2 (en) | 2020-07-20 | 2023-12-05 | Apple Inc. | Multi-device audio adjustment coordination |
US11696060B2 (en) | 2020-07-21 | 2023-07-04 | Apple Inc. | User identification using headphones |
US11750962B2 (en) | 2020-07-21 | 2023-09-05 | Apple Inc. | User identification using headphones |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105869641A (en) | Speech recognition device and speech recognition method | |
CN103680497B (en) | Video-based speech recognition system and method | |
CN103635962B (en) | Speech recognition system, recognition dictionary registration system, and acoustic model identifier sequence generation device | |
US20210397797A1 (en) | Method and apparatus for training dialog generation model, dialog generation method and apparatus, and medium | |
CN110489538A (en) | Artificial-intelligence-based sentence answering method, device, and electronic equipment | |
CN103280216B (en) | Context-dependent speech recognition device with improved robustness to environmental change | |
CN110534099A (en) | Voice wake-up processing method, device, storage medium, and electronic equipment | |
CN110265040A (en) | Voiceprint model training method, device, storage medium, and electronic equipment | |
CN105940407A (en) | Systems and methods for evaluating strength of an audio password | |
CN108447471A (en) | Speech recognition method and speech recognition device | |
CN102404278A (en) | Voiceprint-recognition-based song request system and application method thereof | |
JP2018522303A (en) | Account addition method, terminal, server, and computer storage medium | |
US20230252019A1 (en) | Assigning a single new entigen to a word set | |
CN110136721A (en) | Score generation method, device, storage medium, and electronic equipment | |
JP6927318B2 (en) | Information processing equipment, information processing methods, and programs | |
US9613616B2 (en) | Synthesizing an aggregate voice | |
KR20190104280A (en) | Intelligent voice recognizing method, apparatus, and intelligent computing device | |
CN110136689 (en) | Transfer-learning-based song synthesis method, device, and storage medium | |
CN110147936 (en) | Emotion-recognition-based service evaluation method, device, and storage medium | |
KR20210016829 (en) | Intelligent voice recognizing method, apparatus, and intelligent computing device | |
CN109947971 (en) | Image retrieval method, device, electronic equipment, and storage medium | |
CN108322770 (en) | Video program identification method, related device, equipment, and system | |
CN113051384 (en) | Dialogue-based user portrait extraction method and related device | |
CN111833907 (en) | Human-computer interaction method, terminal, and computer-readable storage medium | |
CN113763925B (en) | Speech recognition method, device, computer equipment, and storage medium
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
AD01 | Patent right deemed abandoned | ||
AD01 | Patent right deemed abandoned |
Effective date of abandoning: 20200310 |