CN108206020A - Speech recognition method, apparatus, and terminal device - Google Patents
Speech recognition method, apparatus, and terminal device Download PDF Info
- Publication number
- CN108206020A CN108206020A CN201611166106.4A CN201611166106A CN108206020A CN 108206020 A CN108206020 A CN 108206020A CN 201611166106 A CN201611166106 A CN 201611166106A CN 108206020 A CN108206020 A CN 108206020A
- Authority
- CN
- China
- Prior art keywords
- recognition result
- topic word
- voice information
- vector
- word vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/10—Speech classification or search using distance or distortion measures between unknown speech and reference templates
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Abstract
The embodiment of the invention discloses a speech recognition method, apparatus, and terminal device. The method includes: receiving voice information input by a user, and determining a topic word and a topic word vector corresponding to the voice information; recognizing the voice information through a neural network model to determine recognition results; vectorizing the recognition results to obtain recognition result vectors; calculating distance values between the recognition result vectors and the topic word vector, and normalizing the distance values to obtain distance weight values between the recognition results and the topic word; and finally determining the recognition content corresponding to the voice information according to the distance weight values. This scheme makes the recognized speech content match the user's intent significantly better and significantly increases recognition efficiency.
Description
Technical field
The embodiments of the present invention relate to speech recognition technology, and in particular to a speech recognition method, apparatus, and terminal device.
Background technology
Talking to a machine and having it understand what you say has long been a human aspiration; speech recognition has been vividly described as "the auditory system of the machine". Speech recognition technology is an intelligent technology that enables a machine to convert a voice signal into corresponding text or commands through a process of recognition and understanding.
In the prior art, cloud-based speech recognition is generally used to recognize a user's speech: the user uploads voice information through a terminal device, and after the cloud speech recognition server recognizes the file containing the voice information, it returns the corresponding text information to the terminal device.
In the above scheme, the speech recognition server recognizes only the acoustic dimension of the voice information uploaded by the user, so recognition of homophones or near-homophones is very poor. For example, if the user's speech means "which poems does he have", the finally recognized content may be "which things does he have" (the two phrases are near-homophones in Chinese).
Summary of the invention
The present invention provides a speech recognition method, apparatus, and terminal device, so that the recognized speech content matches the user's intent significantly better and recognition efficiency is significantly increased.
In a first aspect, an embodiment of the present invention provides a speech recognition method, including:
receiving voice information input by a user, and determining a topic word and a topic word vector corresponding to the voice information;
recognizing the voice information through a neural network model to determine recognition results;
vectorizing the recognition results to obtain recognition result vectors;
calculating distance values between the recognition result vectors and the topic word vector, normalizing the distance values to obtain distance weight values between the recognition results and the topic word, and determining the final recognition content corresponding to the voice information according to the distance weight values.
In a second aspect, an embodiment of the present invention further provides a speech recognition apparatus, including:
a topic determining module, configured to receive voice information input by a user and determine a topic word and a topic word vector corresponding to the voice information;
a recognition result determining module, configured to recognize the voice information through a neural network model to determine recognition results;
a recognition result vector determining module, configured to vectorize the recognition results to obtain recognition result vectors;
a recognition content determining module, configured to calculate distance values between the recognition result vectors and the topic word vector, normalize the distance values to obtain distance weight values between the recognition results and the topic word, and determine the final recognition content corresponding to the voice information according to the distance weight values.
In a third aspect, an embodiment of the present invention further provides a terminal device integrating the apparatus described above.
In the technical solution provided by the embodiments of the present invention, the voice information input by the user is associated with the topic word corresponding to that voice information, and the final speech recognition content is determined from both the voice information and the corresponding topic word. This solves the problem that recognition based only on the acoustic dimension of the uploaded voice information performs very poorly on homophones or near-homophones, so that the recognized content matches the user's intent significantly better and recognition efficiency is significantly increased.
Description of the drawings
Other features, objects, and advantages of the present invention will become more apparent upon reading the detailed description of the non-limiting embodiments made with reference to the following drawings:
Fig. 1 is a flowchart of the speech recognition method provided by Embodiment 1 of the present invention;
Fig. 2 is a flowchart of the speech recognition method provided by Embodiment 2 of the present invention;
Fig. 3 is a flowchart of the speech recognition method provided by Embodiment 3 of the present invention;
Fig. 4 is a flowchart of the speech recognition method provided by Embodiment 4 of the present invention;
Fig. 5 is a flowchart of the speech recognition method provided by Embodiment 5 of the present invention;
Fig. 6 is a structural diagram of the speech recognition apparatus provided by Embodiment 6 of the present invention;
Fig. 7 is a structural diagram of the terminal device provided by Embodiment 7 of the present invention.
Specific embodiments
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Embodiment one
Fig. 1 is a flowchart of the speech recognition method provided by Embodiment 1 of the present invention. This embodiment is applicable to the situation in which voice information input by a user is recognized, and the method can be performed by a cloud server. As shown in Fig. 1, the specific scheme provided by this embodiment is as follows:
S101. Receive voice information input by a user, and determine a topic word and a topic word vector corresponding to the voice information.
In this embodiment, the user can enter voice information through a terminal device such as a smartphone, tablet computer, or laptop, and the terminal device collects the voice information entered by the user through a microphone and uploads it to a cloud server. The cloud server may be one that separately provides speech recognition services to other suppliers; for example, when a user enters voice through the speech recognition function of ctrip.com, the ctrip.com server sends the collected user voice information to the cloud server for speech recognition.
In this embodiment, after the voice information input by the user is received, the topic word of the voice information is determined, where the topic word characterizes the application scenario, scope, field, etc. of the voice information. Illustratively, for a voice query about reflex zones of the palm, waist, and legs, the topic word may be "medical care"; for the voice information "which poems does Li Bai have", the corresponding topic words may be "poet" and "ancient poetry".
In this embodiment, the topic word vector is a parameter that characterizes the topic word. It can be a multidimensional array, i.e., each distinct multidimensional array uniquely determines a corresponding topic word.
S102. Recognize the voice information through a neural network model to determine recognition results.
In this embodiment, the neural network model can be obtained by training a neural network on a large number of samples: input audio sample files and their correctly labeled recognition content are used, in combination with acoustic features and language features, to determine the model.
In this embodiment, the voice information is recognized by the pre-trained neural network model to obtain recognition results, where there may be one or more recognition results. Illustratively, if the voice information input by the user is "which poems does Li Bai have", recognizing the voice through the neural network model may yield two recognition results: "which things does Li Bai have" and "which poems does Li Bai have".
S103. Vectorize the recognition results to obtain recognition result vectors.
In this embodiment, a recognition result vector can be a multidimensional array whose dimension is the same as that of the topic word vector, so that the recognition result vector and the topic word vector can be compared with each other. It should be noted that the recognition result vector uniquely characterizes a recognition result; the recognition result may also be characterized by multi-bit binary encoding, and this embodiment does not limit the specific manner of characterization.
S104. Calculate the distance values between the recognition result vectors and the topic word vector, normalize the distance values to obtain distance weight values between the recognition results and the topic word, and determine the final recognition content corresponding to the voice information according to the distance weight values.
In this embodiment, the distance weight value between a recognition result and the topic word characterizes how close the recognition result is to the topic word; for instance, the higher the weight value, the greater the closeness. The distance weight values are calculated by computing the distance between each recognition result vector and the topic word vector and normalizing the distance values. That is, the degree of association between each recognition result and the topic word is calculated, and the recognition result with the strongest association is fed back to the user as the final recognition content.
Illustratively, suppose the topic word is "poet" and the recognition results of the voice information are "which poems does Li Bai have" and "which things does Li Bai have". The calculation in S104 determines that the distance weight value between "which poems does Li Bai have" and the topic word "poet" is higher than that between "which things does Li Bai have" and "poet", so the recognition result "which poems does Li Bai have" is fed back to the user as the final recognition content corresponding to the voice information. Because the topic word is introduced as an additional recognition dimension in the speech recognition process, speech recognition accuracy is greatly improved and better matches the user's real intent; the correct content can be recognized in one pass, avoiding the inefficiency caused by a second search.
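As a rough illustration of the selection step in S104 — a minimal sketch with hypothetical toy vectors, not the patented implementation — the candidate result closest to the topic word vector could be chosen like this:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def pick_final_result(candidates, topic_vector):
    # candidates: list of (recognition_result_text, recognition_result_vector).
    # Returns the text whose vector is closest (most similar) to the topic word vector.
    return max(candidates, key=lambda c: cosine(c[1], topic_vector))[0]

# Hypothetical 3-dimensional vectors for the two candidates from the example above.
topic = [1.0, 0.0, 0.2]                       # topic word "poet"
candidates = [
    ("which things does Li Bai have", [0.1, 0.9, 0.0]),
    ("which poems does Li Bai have",  [0.9, 0.1, 0.3]),
]
print(pick_final_result(candidates, topic))   # -> which poems does Li Bai have
```

The toy vectors are placeholders; in the scheme described above they would come from the vectorization in S103 and the topic word determination in S101.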
This embodiment provides a speech recognition method: the voice information input by the user is associated with the topic word corresponding to that voice information, and the final speech recognition content is determined from both the voice information and the corresponding topic word. This solves the problem that recognition based only on the acoustic dimension of the uploaded voice information performs very poorly on homophones or near-homophones, so that the recognized content matches the user's intent significantly better and recognition efficiency is significantly increased.
Embodiment two
Fig. 2 is a flowchart of the speech recognition method provided by Embodiment 2 of the present invention. On the basis of Embodiment 1 above, optionally, vectorizing the recognition results to obtain the recognition result vectors includes:
determining the recognition result vector according to the recognition word vectors in the recognition result and the recognition word weight values corresponding to those recognition word vectors.
As a result, a recognition result vector can be determined for a sentence-type recognition result, so that the distance weight value calculated from the recognition result vector and the topic word vector is more accurate, and speech recognition accuracy and efficiency are further improved.
Based on the above optimization, as shown in Fig. 2, the technical solution provided by this embodiment is as follows:
S201. Receive voice information input by a user, and determine a topic word and a topic word vector corresponding to the voice information.
S202. Recognize the voice information through a neural network model to determine recognition results.
S203. Determine the recognition result vector according to the recognition word vectors in the recognition result and the recognition word weight values corresponding to those recognition word vectors.
In this embodiment, vectorizing a recognition result is done by splitting the recognition result into multiple words through semantic and grammatical analysis and then obtaining the corresponding recognition result vector on the basis of word vectors. Illustratively, a speech recognition result sentence sentence_i is decomposed through semantic and grammatical analysis into words word_i_1, word_i_2, …, word_i_n, whose corresponding weights are set to Rank_i_1, Rank_i_2, …, Rank_i_n in turn. The vector of sentence sentence_i can then be defined as:
Vector(sentence_i) = Vector(word_i_1) * Rank_i_1 + Vector(word_i_2) * Rank_i_2 + … + Vector(word_i_n) * Rank_i_n.
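The weighted sum above can be sketched as follows — a minimal illustration with hypothetical word vectors and weights, not the trained vectors an actual deployment would use:

```python
def sentence_vector(word_vectors, weights):
    # Vector(sentence) = sum_k Vector(word_k) * Rank_k, computed element-wise.
    dim = len(word_vectors[0])
    result = [0.0] * dim
    for vec, rank in zip(word_vectors, weights):
        for d in range(dim):
            result[d] += vec[d] * rank
    return result

# Hypothetical 3-dimensional word vectors for a 2-word sentence.
word_vectors = [[1.0, 0.0, 2.0],   # Vector(word_i_1)
                [0.0, 1.0, 1.0]]   # Vector(word_i_2)
weights = [0.7, 0.3]               # Rank_i_1, Rank_i_2
print(sentence_vector(word_vectors, weights))  # -> approximately [0.7, 0.3, 1.7]
```

How the per-word weights Rank_i_k are assigned is left open by this embodiment; they are placeholders here.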
S204. Calculate the distance values between the recognition result vectors and the topic word vector, normalize the distance values to obtain the distance weight values between the recognition results and the topic word, and determine the final recognition content corresponding to the voice information according to the distance weight values.
This embodiment provides a speech recognition method in which the preliminary recognition results are split on a word-vector basis before the recognition result vectors are determined, and the association between the recognition result vectors and the topic word vector is compared to finally determine the recognition content corresponding to the voice information, so that the recognized content matches the user's intent significantly better and recognition efficiency is significantly increased.
On the basis of the above technical solution, S203 may also be: build a syntax tree from the recognition result, and determine the recognition result vector according to the nodes in the syntax tree and the node weight values corresponding to those nodes. A syntax tree is a graphical representation of sentence structure: it represents the derivation of the sentence and is obtained when the sentence is derived according to semantic and grammatical rules. In this scheme, the recognition result vector is determined by reasonably decomposing the sentences in the recognition result in the form of a syntax tree, so that the distance weight value calculated from the recognition result vector and the topic word vector is more accurate, and speech recognition accuracy and efficiency are further improved.
Embodiment three
Fig. 3 is a flowchart of the speech recognition method provided by Embodiment 3 of the present invention. On the basis of the above embodiments, optionally, calculating the distance values between the recognition result vectors and the topic word vector and normalizing the distance values to obtain the distance weight values between the recognition results and the topic word includes:
calculating the cosine distance values between the recognition result vectors and the topic word vector, and determining the distance weight values between the recognition results and the topic word according to the cosine distance values.
As a result, the degree of association between each recognition result and the topic word can be obtained, and the recognition result with the highest degree of association is then taken as the final recognition content corresponding to the voice information.
Based on the above optimization, as shown in Fig. 3, the technical solution provided by this embodiment is as follows:
S301. Receive voice information input by a user, and determine a topic word and a topic word vector corresponding to the voice information.
S302. Recognize the voice information through a neural network model to determine recognition results.
S303. Vectorize the recognition results to obtain recognition result vectors.
S304. Calculate the cosine distance values between the recognition result vectors and the topic word vector, determine the distance weight values between the recognition results and the topic word according to the cosine distance values, and determine the final recognition content corresponding to the voice information according to the distance weight values.
Illustratively, suppose there are n recognition result sentences. The cosine distance value between the vector of the i-th sentence and the topic word vector can be expressed as:
Distance(sentence_i, topic) = cos(Vector(sentence_i), Vector(topic)).
The normalization process can take the reciprocal of the cosine distance between each recognition result vector and the topic word vector, as a proportion of the sum of the reciprocals for all recognition result vectors, as the distance weight value of that recognition result sentence, i.e.:
Rank(sentence_i, topic) = 1 / Distance(sentence_i, topic),
Rank(sentence_i, topic) = Rank(sentence_i, topic) / (Rank(sentence_1, topic) + Rank(sentence_2, topic) + … + Rank(sentence_n, topic)).
According to the calculation results, the distance weight values of the recognition result vectors relative to the topic word vector are compared, and the one with the largest weight value is taken as the final recognition content.
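The two Rank formulas above can be sketched directly — a minimal illustration with hypothetical distance values, following the patent's reciprocal-and-normalize scheme:

```python
def distance_weights(distances):
    # Rank_i = (1 / Distance_i) / sum_j (1 / Distance_j)
    reciprocals = [1.0 / d for d in distances]
    total = sum(reciprocals)
    return [r / total for r in reciprocals]

# Hypothetical distance values for three candidate sentences.
weights = distance_weights([0.5, 0.25, 0.125])
print(weights)  # the weights sum to 1; the smallest distance gets the largest weight
```

Note that the weights sum to 1 by construction, so each candidate's weight is directly comparable and the argmax can be taken as the final recognition content.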
It should be noted that S301–S304 together illustratively form one embodiment of the speech recognition method performed by the present invention, but they are only an example; in other embodiments of the invention, S301, S302, S203, and S304 may be combined to form a new embodiment.
This embodiment provides a speech recognition method in which the cosine distance values between the recognition result vectors and the topic word vector are calculated, the distance weight values between the recognition results and the topic word are determined from them, and the final recognition content is determined according to the distance weight values, so that the recognized content matches the user's intent significantly better and recognition efficiency is significantly increased.
Embodiment four
Fig. 4 is a flowchart of the speech recognition method provided by Embodiment 4 of the present invention. On the basis of the above embodiments, optionally, recognizing the voice information through a neural network model to determine recognition results includes:
recognizing the voice information through a neural network model to determine recognition results and the recognition result weight values corresponding to those recognition results.
Further, determining the final recognition content corresponding to the voice information according to the distance weight values includes:
determining the final recognition content corresponding to the voice information according to the distance weight values and the recognition result weight values.
As a result, when the degree of association between the recognition results and the topic word is calculated, the recognition result weight values also participate in determining the final recognition content, so that the recognized content matches the user's intent even better.
Based on the above optimization, as shown in Fig. 4, the technical solution provided by this embodiment is as follows:
S401. Receive voice information input by a user, and determine a topic word and a topic word vector corresponding to the voice information.
S402. Recognize the voice information through a neural network model to determine recognition results and the recognition result weight values corresponding to those recognition results.
In this embodiment, there are one or more recognition results, and a recognition result weight value can be obtained as the product of the language-model-level probability of the word sequence itself and the acoustic-model-level probability of the voice signal. The language-model-level word sequence probability can be determined with an n-gram model, such as a 2-gram model; the acoustic-model-level probability of the voice signal can be determined by means such as the Viterbi algorithm or a neural network combined with the Bayesian formula, and this embodiment does not limit the choice.
S403. Vectorize the recognition results to obtain recognition result vectors.
S404. Calculate the cosine distance values between the recognition result vectors and the topic word vector, determine the distance weight values between the recognition results and the topic word according to the cosine distance values, and determine the final recognition content corresponding to the voice information according to the distance weight values and the recognition result weight values.
In this embodiment, determining the final recognition content corresponding to the voice information according to the distance weight values and the recognition result weight values can be done by linearly adding the distance weight value and the recognition result weight value of each candidate, and taking the recognition result corresponding to the largest summed value as the final recognition content corresponding to the voice information.
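The linear addition just described can be sketched as follows — a minimal illustration with hypothetical weight values; the relative scaling of the two terms is not specified by this embodiment:

```python
def pick_by_combined_score(candidates):
    # candidates: list of (text, distance_weight, recognition_weight).
    # Score = distance_weight + recognition_weight (linear addition);
    # the candidate with the largest score becomes the final recognition content.
    return max(candidates, key=lambda c: c[1] + c[2])[0]

# Hypothetical weight values for two candidate recognition results.
candidates = [
    ("which things does Li Bai have", 0.30, 0.45),   # score 0.75
    ("which poems does Li Bai have",  0.70, 0.40),   # score 1.10
]
print(pick_by_combined_score(candidates))  # -> which poems does Li Bai have
```

Here the distance weights would come from S404's normalization and the recognition weights from the language-model and acoustic-model probabilities in S402; the numbers above are placeholders.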
It should be noted that S401–S404 together illustratively form one embodiment of the speech recognition method performed by the present invention, but they are only an example; in other embodiments of the invention, S401, S402, S203, and S404 may be combined to form a new embodiment.
This embodiment provides a speech recognition method in which the final recognition content is determined according to both the degree of association between the recognition results and the topic word and the weight of each recognition result itself, so that the recognized content matches the user's intent even better and recognition efficiency is significantly increased.
Embodiment five
Fig. 5 is a flowchart of the speech recognition method provided by Embodiment 5 of the present invention. On the basis of the above embodiments, optionally, determining the topic word and topic word vector corresponding to the voice information includes:
taking a pre-stored topic word and topic word vector as the topic word and topic word vector of the voice information.
As a result, during speech recognition, the recognition result with the highest degree of association can be determined according to the pre-stored topic word and topic word vector and taken as the final recognition content.
Based on the above optimization, as shown in Fig. 5, the technical solution provided by this embodiment is as follows:
S501. Receive voice information input by a user, and take a pre-stored topic word and topic word vector as the topic word and topic word vector of the voice information.
In this embodiment, the pre-stored topic word can be obtained from the recognition content determined in the previous round of speech recognition. After the topic word is obtained in the previous round, the topic word vector corresponding to it can be acquired accordingly, and the topic word and its associated topic word vector are stored together. When voice information is received again and needs to be recognized, recognition can be based on this pre-stored topic word and topic word vector. Illustratively, the pre-stored topic word and its corresponding topic word vector can be re-determined dynamically from the recognition content of each round; when the newly determined topic word differs from that of the previous round, the pre-stored topic word can be updated to ensure the accuracy of the next round of speech recognition.
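The per-round update described above could be kept in a small cache — a minimal sketch assuming a hypothetical vectorizing function supplied by the caller:

```python
class TopicCache:
    # Stores the topic word and its vector from the previous recognition round.
    def __init__(self):
        self.topic_word = None
        self.topic_vector = None

    def update(self, topic_word, vectorize):
        # Re-vectorize and store only when the topic word actually changed,
        # mirroring the "update only when it differs from the previous round" rule.
        if topic_word != self.topic_word:
            self.topic_word = topic_word
            self.topic_vector = vectorize(topic_word)

cache = TopicCache()
cache.update("poet", lambda w: [float(len(w)), 0.0])  # hypothetical vectorizer
cache.update("poet", lambda w: [0.0, 0.0])            # same topic: no re-vectorize
print(cache.topic_word, cache.topic_vector)           # -> poet [4.0, 0.0]
```

The lambda vectorizers are placeholders; in the scheme above the vector would be acquired the same way as in S101.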
S502. Recognize the voice information through a neural network model to determine recognition results and the recognition result weight values corresponding to those recognition results.
S503. Vectorize the recognition results to obtain recognition result vectors.
S504. Calculate the cosine distance values between the recognition result vectors and the topic word vector, determine the distance weight values between the recognition results and the topic word according to the cosine distance values, and determine the final recognition content corresponding to the voice information according to the distance weight values and the recognition result weight values.
It should be noted that S501–S504 together illustratively form one embodiment of the speech recognition method performed by the present invention, but they are only an example; in other embodiments of the invention, S501, S502, S203, and S504 may be combined to form a new embodiment.
This embodiment provides a speech recognition method in which the recognition results are compared against a pre-stored topic word to determine the final recognition content, so that the recognized content matches the user's intent even better and recognition efficiency is significantly enhanced.
On the basis of the above technical solution, S501 may also be: receive voice information input by a user; receive a topic word input by the user or sent by a supplier; determine the topic word vector from that topic word; and take the received topic word and the determined topic word vector as the topic word and topic word vector corresponding to the voice information. Illustratively, if the supplier is ctrip.com, then when ctrip.com uses the speech recognition service it can automatically send the theme of its website to the cloud server as the topic word, so as to obtain more accurate recognition content. Of course, to obtain more accurate recognition content the user can also confirm the topic word personally when sending the voice information, and send the topic word to the cloud server at the same time.
Embodiment six
Fig. 6 is a structural diagram of the speech recognition apparatus provided by Embodiment 6 of the present invention. The apparatus is used to perform the speech recognition method provided by the above embodiments, and has the corresponding functional modules for performing the method and its beneficial effects. As shown in Fig. 6, the apparatus includes a topic determining module 601, a recognition result determining module 602, a recognition result vector determining module 603, and a recognition content determining module 604.
The topic determining module 601 is configured to receive voice information input by a user and determine a topic word and a topic word vector corresponding to the voice information;
the recognition result determining module 602 is configured to recognize the voice information through a neural network model to determine recognition results;
the recognition result vector determining module 603 is configured to vectorize the recognition results to obtain recognition result vectors;
the recognition content determining module 604 is configured to calculate the distance values between the recognition result vectors and the topic word vector, normalize the distance values to obtain the distance weight values between the recognition results and the topic word, and determine the final recognition content corresponding to the voice information according to the distance weight values.
In the speech recognition apparatus provided by this embodiment, the voice information input by the user is associated with the topic word corresponding to that voice information, and the final speech recognition content is determined from both the voice information and the corresponding topic word. This solves the problem that recognition based only on the acoustic dimension of the uploaded voice information performs very poorly on homophones or near-homophones, so that the recognized content matches the user's intent significantly better and recognition efficiency is significantly increased.
Based on the above technical solution, the recognition result vector determining module 603 is specifically used for:
It is true according to the identification term vector in the recognition result and the corresponding identification word weighted value of the identification term vector
The fixed recognition result vector;Or
Syntax tree is built according to the recognition result, according to the node in the syntax tree and the corresponding knot of the node
Point weighted value determines the recognition result vector.
Based on the above technical solution, the identification content determination module 604 is specifically used for:
The COS distance value of theme term vector described in the recognition result vector sum is calculated, it is true according to the COS distance value
The distance weighting value of the fixed recognition result and the descriptor.
On the basis of the above technical solution, the recognition result determining module 602 is specifically configured to:
recognize the voice information through a neural network model to determine a recognition result and a recognition result weight value corresponding to the recognition result.
The identification content determining module 604 is specifically configured to:
determine the identification content finally corresponding to the voice information according to the distance weight value and the recognition result weight value.
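The patent does not state how the two weights are combined; a simple product is used below as one plausible fusion rule, purely for illustration.

```python
# Hedged sketch: fuse the topic-distance weight with the neural network's
# recognition weight and pick the highest-scoring candidate. The product
# rule is an assumption; the patent leaves the combination unspecified.
def pick_final_content(candidates, distance_weights, recognition_weights):
    scores = [d * r for d, r in zip(distance_weights, recognition_weights)]
    best_index = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best_index]
```

For a homophone pair the topic weight can overturn the acoustic ranking, which is exactly the homophone problem the patent targets.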
On the basis of the above technical solution, the topic determining module is specifically configured to:
use a prestored topic word and topic word vector as the topic word and topic word vector of the voice information; or
receive a topic word input by the user or sent by a provider, determine a topic word vector according to the topic word, and use the received topic word and the determined topic word vector as the topic word and topic word vector corresponding to the voice information.
Embodiment Seven
On the basis of the above embodiments, this embodiment provides a terminal device, which may include the speech recognition apparatus provided in Embodiment Six of the present invention. Fig. 7 is a structural diagram of the terminal device provided in Embodiment Seven of the present invention. As shown in Fig. 7, the terminal device includes a memory 701, a central processing unit (Central Processing Unit, CPU) 702, a peripheral interface 703, an audio circuit 707, a loudspeaker 711, a power management chip 708, an input/output (I/O) subsystem 709, a touch screen 712, other input/control devices 710, and an external port 704; these components communicate via one or more communication buses or signal lines.
It should be understood that the illustrated terminal device 700 is merely one example of a terminal device, and the terminal device 700 may have more or fewer components than shown in the figure, may combine two or more components, or may have a different configuration of components. The various components shown in the figure may be implemented in hardware, software, or a combination of hardware and software, including one or more signal-processing and/or application-specific integrated circuits.
The terminal device for intelligent interaction provided in this embodiment is described in detail below, taking a story machine as an example of the terminal device.
The memory 701 may be accessed by the CPU 702, the peripheral interface 703, and the like. The memory 701 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state storage devices.
The peripheral interface 703 may connect the input and output peripherals of the device to the CPU 702 and the memory 701.
The I/O subsystem 709 may connect the input/output peripherals of the device, such as the touch screen 712 and the other input/control devices 710, to the peripheral interface 703. The I/O subsystem 709 may include a display controller 7091 and one or more input controllers 7092 for controlling the other input/control devices 710. The one or more input controllers 7092 receive electrical signals from, or send electrical signals to, the other input/control devices 710, where the other input/control devices 710 may include physical buttons, a joystick, or a click wheel.
The power management chip 708 is configured to supply power to, and perform power management for, the hardware connected through the CPU 702, the I/O subsystem, and the peripheral interface.
The CPU 702 provided in this embodiment of the present invention can perform the following operations:
receiving voice information input by a user, and determining a topic word and a topic word vector corresponding to the voice information;
recognizing the voice information through a neural network model to determine a recognition result;
vectorizing the recognition result to obtain a recognition result vector; and
calculating a distance value between the recognition result vector and the topic word vector, normalizing the distance value to obtain a distance weight value of the recognition result and the topic word, and determining, according to the distance weight value, the identification content finally corresponding to the voice information.
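The sequence of operations above can be sketched end to end. This is a hypothetical illustration only: the candidate texts and their vectors would in practice come from the neural-network recognizer and a word-vector model, neither of which is shown here.

```python
import math

def cosine(u, v):
    # Cosine similarity between two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def recognize_with_topic(candidates, candidate_vectors, topic_vector):
    # Score each candidate recognition result against the topic word vector,
    # normalize the scores into distance weights, and return the best match.
    scores = [cosine(vec, topic_vector) for vec in candidate_vectors]
    total = sum(scores)
    weights = [s / total for s in scores]
    best = max(range(len(candidates)), key=weights.__getitem__)
    return candidates[best]
```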
Note that the above are merely preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will appreciate that the present invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments, and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, it is not limited to the above embodiments, and may include other equivalent embodiments without departing from the inventive concept; the scope of the present invention is determined by the scope of the appended claims.
Claims (10)
1. A speech recognition method, comprising:
receiving voice information input by a user, and determining a topic word and a topic word vector corresponding to the voice information;
recognizing the voice information through a neural network model to determine a recognition result;
vectorizing the recognition result to obtain a recognition result vector; and
calculating a distance value between the recognition result vector and the topic word vector, normalizing the distance value to obtain a distance weight value of the recognition result and the topic word, and determining, according to the distance weight value, identification content finally corresponding to the voice information.
2. The method according to claim 1, wherein vectorizing the recognition result to obtain the recognition result vector comprises:
determining the recognition result vector according to recognized word vectors in the recognition result and a word weight value corresponding to each recognized word vector; or
building a syntax tree from the recognition result, and determining the recognition result vector according to nodes in the syntax tree and a node weight value corresponding to each node.
3. The method according to claim 1 or 2, wherein calculating the distance value between the recognition result vector and the topic word vector, and normalizing the distance value to obtain the distance weight value of the recognition result and the topic word, comprises:
calculating a cosine distance value between the recognition result vector and the topic word vector, and determining the distance weight value of the recognition result and the topic word according to the cosine distance value.
4. The method according to claim 3, wherein recognizing the voice information through the neural network model to determine the recognition result comprises:
recognizing the voice information through the neural network model to determine the recognition result and a recognition result weight value corresponding to the recognition result; and correspondingly,
determining, according to the distance weight value, the identification content finally corresponding to the voice information comprises:
determining the identification content finally corresponding to the voice information according to the distance weight value and the recognition result weight value.
5. The method according to claim 4, wherein determining the topic word and the topic word vector corresponding to the voice information comprises:
using a prestored topic word and topic word vector as the topic word and topic word vector of the voice information; or
receiving a topic word input by the user or sent by a provider, determining a topic word vector according to the topic word, and using the received topic word and the determined topic word vector as the topic word and topic word vector corresponding to the voice information.
6. A speech recognition apparatus, comprising:
a topic determining module, configured to receive voice information input by a user and determine a topic word and a topic word vector corresponding to the voice information;
a recognition result determining module, configured to recognize the voice information through a neural network model to determine a recognition result;
a recognition result vector determining module, configured to vectorize the recognition result to obtain a recognition result vector; and
an identification content determining module, configured to calculate a distance value between the recognition result vector and the topic word vector, normalize the distance value to obtain a distance weight value of the recognition result and the topic word, and determine, according to the distance weight value, identification content finally corresponding to the voice information.
7. The apparatus according to claim 6, wherein the recognition result vector determining module is specifically configured to:
determine the recognition result vector according to recognized word vectors in the recognition result and a word weight value corresponding to each recognized word vector; or
build a syntax tree from the recognition result, and determine the recognition result vector according to nodes in the syntax tree and a node weight value corresponding to each node.
8. The apparatus according to claim 6 or 7, wherein the identification content determining module is specifically configured to:
calculate a cosine distance value between the recognition result vector and the topic word vector, and determine the distance weight value of the recognition result and the topic word according to the cosine distance value.
9. The apparatus according to claim 8, wherein the recognition result determining module is specifically configured to:
recognize the voice information through the neural network model to determine the recognition result and a recognition result weight value corresponding to the recognition result;
the identification content determining module is specifically configured to:
determine the identification content finally corresponding to the voice information according to the distance weight value and the recognition result weight value; and
the topic determining module is specifically configured to:
use a prestored topic word and topic word vector as the topic word and topic word vector of the voice information; or
receive a topic word input by the user or sent by a provider, determine a topic word vector according to the topic word, and use the received topic word and the determined topic word vector as the topic word and topic word vector corresponding to the voice information.
10. A terminal device, wherein the terminal device integrates the apparatus according to any one of claims 6 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611166106.4A CN108206020A (en) | 2016-12-16 | 2016-12-16 | A kind of audio recognition method, device and terminal device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108206020A true CN108206020A (en) | 2018-06-26 |
Family
ID=62601495
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611166106.4A Pending CN108206020A (en) | 2016-12-16 | 2016-12-16 | A kind of audio recognition method, device and terminal device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108206020A (en) |
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1573926A (en) * | 2003-06-03 | 2005-02-02 | 微软公司 | Discriminative training of language models for text and speech classification |
US20150222450A1 (en) * | 2012-09-10 | 2015-08-06 | Samsung Electronics Co., Ltd. | System and method of controlling external apparatus connected with device |
CN104424296A (en) * | 2013-09-02 | 2015-03-18 | 阿里巴巴集团控股有限公司 | Query word classifying method and query word classifying device |
CN103474065A (en) * | 2013-09-24 | 2013-12-25 | 贵阳世纪恒通科技有限公司 | Method for determining and recognizing voice intentions based on automatic classification technology |
CN105808590A (en) * | 2014-12-31 | 2016-07-27 | 中国电信股份有限公司 | Search engine realization method as well as search method and apparatus |
CN105869642A (en) * | 2016-03-25 | 2016-08-17 | 海信集团有限公司 | Voice text error correction method and device |
CN105931642A (en) * | 2016-05-31 | 2016-09-07 | 北京灵隆科技有限公司 | Speech recognition method, apparatus and system |
CN106095737A (en) * | 2016-06-07 | 2016-11-09 | 杭州凡闻科技有限公司 | Documents Similarity computational methods and similar document the whole network retrieval tracking |
CN106205622A (en) * | 2016-06-29 | 2016-12-07 | 联想(北京)有限公司 | Information processing method and electronic equipment |
CN106202068A (en) * | 2016-07-25 | 2016-12-07 | 哈尔滨工业大学 | The machine translation method of semantic vector based on multi-lingual parallel corpora |
CN106228983A (en) * | 2016-08-23 | 2016-12-14 | 北京谛听机器人科技有限公司 | Scene process method and system during a kind of man-machine natural language is mutual |
Non-Patent Citations (2)
Title |
---|
He Yanxiang, "Compiler Principles" (《编译原理》), 31 October 2000 *
Cheng Yingying, "Web Mining Technology and Its Application in Mail Systems", Wanfang Data Knowledge Service Platform *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108986797A (en) * | 2018-08-06 | 2018-12-11 | 中国科学技术大学 | A kind of voice subject identifying method and system |
CN112015872A (en) * | 2019-05-29 | 2020-12-01 | 华为技术有限公司 | Question recognition method and device |
CN112466327A (en) * | 2020-10-23 | 2021-03-09 | 北京百度网讯科技有限公司 | Voice processing method and device and electronic equipment |
CN112466327B (en) * | 2020-10-23 | 2022-02-22 | 北京百度网讯科技有限公司 | Voice processing method and device and electronic equipment |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20180626 |