CN108206020A - Speech recognition method, apparatus and terminal device - Google Patents

Speech recognition method, apparatus and terminal device

Info

Publication number
CN108206020A
CN108206020A (application CN201611166106.4A)
Authority
CN
China
Prior art keywords
recognition result
topic word
speech information
vector
word vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611166106.4A
Other languages
Chinese (zh)
Inventor
李黄海 (Li Huanghai)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Intelligent Housekeeper Technology Co Ltd
Original Assignee
Beijing Intelligent Housekeeper Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Intelligent Housekeeper Technology Co Ltd
Priority to CN201611166106.4A
Publication of CN108206020A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/08 - Speech classification or search
    • G10L 15/16 - Speech classification or search using artificial neural networks
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/08 - Speech classification or search
    • G10L 15/10 - Speech classification or search using distance or distortion measures between unknown speech and reference templates
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/26 - Speech to text systems

Abstract

The embodiments of the present invention disclose a speech recognition method, apparatus and terminal device. The method includes: receiving speech information input by a user and determining a topic word and a topic word vector corresponding to the speech information; recognizing the speech information through a neural network model to determine recognition results; vectorizing the recognition results to obtain recognition result vectors; calculating the distance value between each recognition result vector and the topic word vector, normalizing the distance values to obtain a distance weight value of each recognition result with respect to the topic word, and determining, according to the distance weight values, the identification content finally corresponding to the speech information. With this scheme, the recognized speech content matches the user's need much more closely and recognition efficiency is significantly enhanced.

Description

Speech recognition method, apparatus and terminal device
Technical field
Embodiments of the present invention relate to speech recognition technology, and in particular to a speech recognition method, apparatus and terminal device.
Background technology
Carrying on a spoken conversation with a machine, and having the machine understand what you say, is something people have long dreamed of. The China IoT School-Enterprise Alliance has vividly described speech recognition as "the auditory system of the machine". Speech recognition technology is the intelligent technology that lets a machine convert a speech signal into the corresponding text or command through a process of recognition and understanding.
In the prior art, user speech is generally recognized through a cloud speech-recognition service: the user uploads speech information through a terminal device, and a cloud speech-recognition server recognizes the file containing the speech information and returns the corresponding text information to the terminal device.
In the above scheme, the speech-recognition server recognizes only the acoustic dimension of the speech information uploaded by the user, so the recognition effect is very poor for homophones or near-homophones. For example, if the user says "which poems does he have", the finally recognized content may be "which things does he have".
Summary of the invention
The present invention provides a speech recognition method, apparatus and terminal device, so that the recognized speech content matches the user's need much more closely and recognition efficiency is significantly enhanced.
In a first aspect, an embodiment of the present invention provides a speech recognition method, including:
receiving speech information input by a user, and determining a topic word and a topic word vector corresponding to the speech information;
recognizing the speech information through a neural network model to determine recognition results;
vectorizing the recognition results to obtain recognition result vectors;
calculating the distance value between each recognition result vector and the topic word vector, normalizing the distance values to obtain a distance weight value of each recognition result with respect to the topic word, and determining, according to the distance weight values, the identification content finally corresponding to the speech information.
In a second aspect, an embodiment of the present invention further provides a speech recognition apparatus, including:
a topic determining module, configured to receive speech information input by a user and determine a topic word and a topic word vector corresponding to the speech information;
a recognition result determining module, configured to recognize the speech information through a neural network model to determine recognition results;
a recognition result vector determining module, configured to vectorize the recognition results to obtain recognition result vectors;
an identification content determining module, configured to calculate the distance value between each recognition result vector and the topic word vector, normalize the distance values to obtain a distance weight value of each recognition result with respect to the topic word, and determine, according to the distance weight values, the identification content finally corresponding to the speech information.
In a third aspect, an embodiment of the present invention further provides a terminal device integrated with the apparatus described above.
In the technical solution provided by the embodiments of the present invention, the speech information input by the user is associated with the topic word corresponding to that speech information, and the final speech recognition content is determined from the speech information together with its corresponding topic word. This solves the problem that recognizing only the acoustic dimension of the uploaded speech information gives very poor results for homophones or near-homophones, so that the recognized content matches the user's need much more closely and recognition efficiency is significantly enhanced.
Description of the drawings
Other features, objects and advantages of the present invention will become more apparent from the detailed description of non-limiting embodiments made with reference to the following drawings:
Fig. 1 is a flowchart of the speech recognition method provided by Embodiment 1 of the present invention;
Fig. 2 is a flowchart of the speech recognition method provided by Embodiment 2 of the present invention;
Fig. 3 is a flowchart of the speech recognition method provided by Embodiment 3 of the present invention;
Fig. 4 is a flowchart of the speech recognition method provided by Embodiment 4 of the present invention;
Fig. 5 is a flowchart of the speech recognition method provided by Embodiment 5 of the present invention;
Fig. 6 is a structural diagram of the speech recognition apparatus provided by Embodiment 6 of the present invention;
Fig. 7 is a structural diagram of the terminal device provided by Embodiment 7 of the present invention.
Specific embodiments
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Embodiment 1
Fig. 1 is a flowchart of the speech recognition method provided by Embodiment 1 of the present invention. This embodiment is applicable to situations in which speech information input by a user is recognized, and the method can be performed by a cloud server. As shown in Fig. 1, the specific scheme provided by this embodiment is as follows:
S101. Receive speech information input by a user, and determine a topic word and a topic word vector corresponding to the speech information.
In this embodiment, the user can enter speech information through a terminal device such as a smartphone, tablet computer or laptop, and the terminal device collects the entered speech information through a microphone and uploads it to the cloud server. The cloud server may be a speech-recognition service provided separately by another supplier: for example, when the user enters speech through the speech-recognition function of ctrip.com, the ctrip.com server sends the collected user speech information to that cloud server for speech recognition.
In this embodiment, after the speech information input by the user is received, the topic word of the speech information is determined, where the topic word characterizes the application scenario, scope, field and so on of the speech information. For example, in speech information about the reaction of the palm and waist-leg areas, the topic word may be "medical"; for the speech information "which poems does Li Bai have", the corresponding topic word may be "poet" or "ancient poetry".
In this embodiment, the topic word vector is a parameter that characterizes the topic word. It can be a multidimensional array, i.e. each distinct multidimensional array uniquely identifies the corresponding topic word.
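As an illustrative sketch in Python, and assuming a pretrained word-embedding table is available (this embodiment does not prescribe how topic word vectors are produced, so the table, dimensionality and values below are hypothetical), the topic word vector could be obtained by a simple lookup:

```python
import numpy as np

# Hypothetical pretrained embedding table; in practice it could come from a
# word2vec-style model trained offline. The 4-dimensional values are toy numbers.
EMBEDDINGS = {
    "poet":           np.array([0.9, 0.1, 0.0, 0.2]),
    "ancient poetry": np.array([0.8, 0.2, 0.1, 0.1]),
    "medical":        np.array([0.0, 0.1, 0.9, 0.6]),
}

def topic_word_vector(topic_word: str) -> np.ndarray:
    """Map a topic word to the multidimensional array that uniquely characterizes it."""
    return EMBEDDINGS[topic_word]

print(topic_word_vector("poet"))   # -> [0.9 0.1 0.  0.2]
```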
S102. Recognize the speech information through a neural network model to determine recognition results.
In this embodiment, the neural network model can be obtained by training a neural network on a large number of samples; it can be determined by inputting audio sample files together with the correspondingly labeled correct identification content, combining acoustic features and language features.
In this embodiment, the speech information is recognized by the pre-trained neural network model to obtain recognition results, and there may be one or more recognition results. For example, if the speech information input by the user is "which poems does Li Bai have", recognizing this speech through the neural network model may yield two recognition results, "which things does Li Bai have" and "which poems does Li Bai have".
S103. Vectorize the recognition results to obtain recognition result vectors.
In this embodiment, a recognition result vector can be a multidimensional array whose dimensionality is the same as that of the topic word vector, so that recognition result vectors and the topic word vector can be compared with each other. It should be noted that the recognition result vector uniquely characterizes a recognition result; a recognition result could also be characterized by multi-bit binary encoding, and this embodiment does not limit the specific manner of characterization.
S104. Calculate the distance value between each recognition result vector and the topic word vector, normalize the distance values to obtain a distance weight value of each recognition result with respect to the topic word, and determine, according to the distance weight values, the identification content finally corresponding to the speech information.
In this embodiment, the distance weight value of a recognition result with respect to the topic word characterizes how close the recognition result is to the topic word; for example, the higher the weight value, the closer they are. The distance weight values are computed by calculating the distance value between each recognition result vector and the topic word vector and then normalizing the distance values. In other words, the degree of association between each recognition result and the topic word is calculated, and the recognition result with the strongest association is fed back to the user as the final identification content.
For example, suppose the topic word is "poet" and the recognition results of the speech information are "which poems does Li Bai have" and "which things does Li Bai have". Through the calculation of S104 it is determined that the distance weight value of "which poems does Li Bai have" with respect to the topic word "poet" is higher than that of "which things does Li Bai have", so the recognition result "which poems does Li Bai have" is fed back to the user as the identification content finally corresponding to the speech information. Because the additional dimension of the topic word is introduced into the speech recognition process, recognition accuracy is greatly improved, the result better matches the user's real need, and the correct content can be recognized in a single pass, avoiding the inefficiency caused by a second search.
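As a highly simplified Python sketch of the S101-S104 flow, assuming the candidate recognition results have already been vectorized and using cosine similarity as a toy closeness measure (the exact distance and normalization used by this scheme are detailed in Embodiment 3), the candidate closest to the topic word could be selected as follows; all names and values are illustrative:

```python
import numpy as np

def closeness(a: np.ndarray, b: np.ndarray) -> float:
    # Toy closeness measure between two vectors (cosine similarity).
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def pick_identification_content(candidate_vectors: dict, topic_vec: np.ndarray) -> str:
    """Normalize per-candidate closeness values into distance weight values and
    return the candidate sentence with the largest weight."""
    values = {s: closeness(v, topic_vec) for s, v in candidate_vectors.items()}
    total = sum(values.values())
    weights = {s: v / total for s, v in values.items()}
    return max(weights, key=weights.get)

candidates = {
    "which poems does Li Bai have":  np.array([0.9, 0.1, 0.1]),
    "which things does Li Bai have": np.array([0.2, 0.8, 0.3]),
}
topic = np.array([1.0, 0.0, 0.1])   # toy vector for the topic word "poet"
print(pick_identification_content(candidates, topic))
```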
This embodiment provides a speech recognition method in which the speech information input by the user is associated with the topic word corresponding to that speech information, and the final speech recognition content is determined from the speech information together with its corresponding topic word. This solves the problem that recognizing only the acoustic dimension of the uploaded speech information gives very poor results for homophones or near-homophones, so that the recognized content matches the user's need much more closely and recognition efficiency is significantly enhanced.
Embodiment 2
Fig. 2 is a flowchart of the speech recognition method provided by Embodiment 2 of the present invention. On the basis of Embodiment 1 above, optionally, vectorizing the recognition result to obtain the recognition result vector includes:
determining the recognition result vector according to the word vectors of the words in the recognition result and the word weight values corresponding to those word vectors.
In this way a recognition result vector can be determined for a sentence-type recognition result, so that the distance weight value calculated from the recognition result vector and the topic word vector is more accurate, and speech recognition accuracy and efficiency are further improved.
Based on the above optimization, as shown in Fig. 2, the technical solution provided by this embodiment is as follows:
S201. Receive speech information input by a user, and determine a topic word and a topic word vector corresponding to the speech information.
S202. Recognize the speech information through a neural network model to determine recognition results.
S203. Determine the recognition result vector according to the word vectors of the words in the recognition result and the word weight values corresponding to those word vectors.
In this embodiment, a recognition result is vectorized by splitting it into multiple words through semantic and grammatical analysis and then obtaining the corresponding recognition result vector on the basis of word vectors. For example, the speech recognition result is a sentence sentence_i; through semantic and grammatical analysis, sentence_i is decomposed into words word_i_1, word_i_2, ..., word_i_n, whose corresponding weights are set to Rank_i_1, Rank_i_2, ..., Rank_i_n in turn. The vector of sentence sentence_i can then be defined as follows:
Vector(sentence_i) = Vector(word_i_1) * Rank_i_1 + Vector(word_i_2) * Rank_i_2 + … + Vector(word_i_n) * Rank_i_n.
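A minimal Python sketch of this definition, assuming the word vectors Vector(word_i_k) and the weights Rank_i_k have already been obtained (the word segmentation and weight assignment themselves are not specified here), could look as follows:

```python
import numpy as np

def sentence_vector(word_vectors, word_weights):
    """Vector(sentence) = sum_k Vector(word_k) * Rank_k."""
    return sum(w * v for v, w in zip(word_vectors, word_weights))

# Toy example: a recognition result split into three words with weights 0.5, 0.3, 0.2.
vecs  = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.5, 0.5])]
ranks = [0.5, 0.3, 0.2]
print(sentence_vector(vecs, ranks))   # -> [0.6 0.4]
```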
S204. Calculate the distance value between the recognition result vector and the topic word vector, normalize the distance value to obtain the distance weight value of the recognition result with respect to the topic word, and determine, according to the distance weight values, the identification content finally corresponding to the speech information.
This embodiment provides a speech recognition method in which the preliminary recognition result is split on the basis of word vectors and the recognition result vector is then determined; the final identification content corresponding to the speech information is determined by comparing the association between the recognition result vector and the topic word vector, so that the recognized content matches the user's need much more closely and recognition efficiency is significantly enhanced.
On the basis of the above technical solution, S203 may also be: build a syntax tree from the recognition result, and determine the recognition result vector according to the nodes in the syntax tree and the node weight values corresponding to those nodes. A syntax tree is a graphical representation of sentence structure: it represents the derivation result of a sentence and is obtained when the sentence is derived according to semantic and grammatical rules. In this scheme the sentence in the recognition result is reasonably decomposed in the form of a syntax tree in order to determine the recognition result vector, so that the distance weight value calculated from the recognition result vector and the topic word vector is more accurate and speech recognition accuracy and efficiency are further improved.
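A minimal sketch of this syntax-tree variant, assuming the tree has already been built by some parser and that each node carries a word vector and a node weight value (the node representation and the weights below are illustrative assumptions):

```python
import numpy as np

class Node:
    """A syntax-tree node carrying a word vector, a node weight value and children."""
    def __init__(self, vector, weight, children=()):
        self.vector = vector
        self.weight = weight
        self.children = list(children)

def tree_vector(node: Node) -> np.ndarray:
    """Accumulate weight * vector over all nodes of the syntax tree."""
    total = node.weight * node.vector
    for child in node.children:
        total = total + tree_vector(child)
    return total

# Toy tree: a root (predicate) node with two child nodes (subject and object).
root = Node(np.array([0.2, 0.1]), 0.5, [
    Node(np.array([1.0, 0.0]), 0.3),
    Node(np.array([0.0, 1.0]), 0.2),
])
print(tree_vector(root))   # -> [0.4  0.25]
```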
Embodiment 3
Fig. 3 is a flowchart of the speech recognition method provided by Embodiment 3 of the present invention. On the basis of the above embodiments, optionally, calculating the distance value between the recognition result vector and the topic word vector and normalizing the distance value to obtain the distance weight value of the recognition result with respect to the topic word includes:
calculating the cosine distance value between the recognition result vector and the topic word vector, and determining the distance weight value of the recognition result with respect to the topic word according to the cosine distance value.
In this way the degree of association between each recognition result and the topic word can be obtained, and the recognition result with the highest degree of association is then taken as the final identification content corresponding to the speech information.
Based on the above optimization, as shown in Fig. 3, the technical solution provided by this embodiment is as follows:
S301. Receive speech information input by a user, and determine a topic word and a topic word vector corresponding to the speech information.
S302. Recognize the speech information through a neural network model to determine recognition results.
S303. Vectorize the recognition results to obtain recognition result vectors.
S304. Calculate the cosine distance value between each recognition result vector and the topic word vector, determine the distance weight value of each recognition result with respect to the topic word according to the cosine distance values, and determine, according to the distance weight values, the identification content finally corresponding to the speech information.
For example, suppose there are n recognition result sentences, where the cosine distance value between the vector of the i-th sentence and the topic word vector can be expressed as Distance(sentence_i, topic) = cos(Vector(sentence_i), Vector(topic)). The normalization process can take, as the distance weight value of a recognition result sentence, the proportion that the reciprocal of its cosine distance to the topic word vector represents within the sum of the reciprocals of the cosine distances of all recognition result vectors to the topic word vector, namely:
Rank(sentence_i, topic) = 1 / Distance(sentence_i, topic),
Rank(sentence_i, topic) = Rank(sentence_i, topic) / (Rank(sentence_1, topic) + Rank(sentence_2, topic) + … + Rank(sentence_n, topic)).
According to the calculation results, the distance weight values of the recognition result vectors with respect to the topic word vector are compared, and the recognition result with the maximum weight value is taken as the final identification content.
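A minimal Python sketch of the two formulas above, assuming the recognition result vectors are already available (the candidate sentences and vectors below are toy values):

```python
import numpy as np

def cosine(a, b):
    # Distance(sentence_i, topic) = cos(Vector(sentence_i), Vector(topic))
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def distance_weight_values(candidate_vectors, topic_vec):
    """Rank_i = (1 / Distance_i) / sum_j (1 / Distance_j)."""
    distances = {s: cosine(v, topic_vec) for s, v in candidate_vectors.items()}
    reciprocals = {s: 1.0 / d for s, d in distances.items()}
    total = sum(reciprocals.values())
    return {s: r / total for s, r in reciprocals.items()}

weights = distance_weight_values(
    {"sentence_1": np.array([0.9, 0.1]), "sentence_2": np.array([0.3, 0.7])},
    np.array([1.0, 0.0]),
)
final = max(weights, key=weights.get)   # recognition result with the maximum weight value
```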
It should be noted that S301-S304 illustratively constitute one embodiment in which the present invention performs a kind of intelligent interaction method, but this is only one example of the present invention; in other embodiments of the invention, S301, S302, S203 and S304 may form a new embodiment.
This embodiment provides a speech recognition method in which the cosine distance value between each recognition result vector and the topic word vector is calculated, the distance weight value of each recognition result with respect to the topic word is then determined, and the final identification content is determined according to the distance weight values, so that the recognized content matches the user's need much more closely and recognition efficiency is significantly enhanced.
Embodiment 4
Fig. 4 is a flowchart of the speech recognition method provided by Embodiment 4 of the present invention. On the basis of the above embodiments, optionally, recognizing the speech information through a neural network model to determine recognition results includes:
recognizing the speech information through a neural network model to determine recognition results and the recognition result weight value corresponding to each recognition result.
Further, determining the identification content finally corresponding to the speech information according to the distance weight values includes:
determining the identification content finally corresponding to the speech information according to the distance weight values and the recognition result weight values.
In this way, when the degree of association between a recognition result and the topic word is calculated, the recognition result weight value also participates in determining the final identification content, so that the recognized content matches the user's need even more closely.
Based on the above optimization, as shown in Fig. 4, the technical solution provided by this embodiment is as follows:
S401. Receive speech information input by a user, and determine a topic word and a topic word vector corresponding to the speech information.
S402. Recognize the speech information through a neural network model to determine recognition results and the recognition result weight value corresponding to each recognition result.
In this embodiment there are one or more recognition results, and the recognition result weight value can be obtained as the product of the word-sequence probability at the language-model level and the probability of the speech signal at the acoustic-model level. The word-sequence probability at the language-model level can be determined with an n-gram model, for example a 2-gram model; the probability of the speech signal at the acoustic-model level can be determined by means such as the Viterbi algorithm or a neural network combined with the Bayesian formula, which this embodiment does not limit.
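As an illustrative sketch of the recognition result weight value described above, using a toy 2-gram language model and treating the acoustic-model probability as a given number (the counts and probability below are hypothetical; real systems would use large corpora, smoothing and a full acoustic decoder):

```python
from collections import Counter

# Toy corpus statistics; real systems estimate these from large training corpora.
bigram_counts  = Counter({("li bai", "poem"): 8, ("li bai", "thing"): 2})
unigram_counts = Counter({"li bai": 10})

def bigram_probability(words):
    """Word-sequence probability under a 2-gram model (smoothing omitted)."""
    p = 1.0
    for prev, cur in zip(words, words[1:]):
        p *= bigram_counts[(prev, cur)] / unigram_counts[prev]
    return p

def recognition_result_weight(words, acoustic_probability):
    # Weight = language-model sequence probability x acoustic-model probability.
    return bigram_probability(words) * acoustic_probability

print(recognition_result_weight(["li bai", "poem"], acoustic_probability=0.6))  # 0.8 * 0.6
```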
S403. Vectorize the recognition results to obtain recognition result vectors.
S404. Calculate the cosine distance value between each recognition result vector and the topic word vector, determine the distance weight value of each recognition result with respect to the topic word according to the cosine distance values, and determine, according to the distance weight values and the recognition result weight values, the identification content finally corresponding to the speech information.
In this embodiment, determining the identification content finally corresponding to the speech information according to the distance weight values and the recognition result weight values can be done by linearly adding the distance weight value and the recognition result weight value of each recognition result and taking the recognition result corresponding to the maximum summed value as the identification content finally corresponding to the speech information.
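A minimal sketch of this linear addition, assuming both weight dictionaries are keyed by candidate sentence; the coefficients alpha and beta are illustrative, since only a linear addition is described here:

```python
def final_identification(distance_weights, recognition_weights, alpha=1.0, beta=1.0):
    """Pick the candidate maximizing alpha * distance_weight + beta * recognition_weight."""
    combined = {
        s: alpha * distance_weights[s] + beta * recognition_weights.get(s, 0.0)
        for s in distance_weights
    }
    return max(combined, key=combined.get)

best = final_identification(
    {"which poems does Li Bai have": 0.7, "which things does Li Bai have": 0.3},
    {"which poems does Li Bai have": 0.4, "which things does Li Bai have": 0.6},
)
print(best)   # -> "which poems does Li Bai have" (0.7 + 0.4 > 0.3 + 0.6)
```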
It should be noted that S401-S404 illustratively constitute one embodiment in which the present invention performs a kind of intelligent interaction method, but this is only one example of the present invention; in other embodiments of the invention, S401, S402, S203 and S404 may form a new embodiment.
This embodiment provides a speech recognition method in which the final identification content is determined according to both the degree of association between each recognition result and the topic word and the weight of the recognition result itself, so that the recognized content matches the user's need even more closely and recognition efficiency is significantly enhanced.
Embodiment 5
Fig. 5 is a flowchart of the speech recognition method provided by Embodiment 5 of the present invention. On the basis of the above embodiments, optionally, determining the topic word and the topic word vector corresponding to the speech information includes:
taking a pre-stored topic word and topic word vector as the topic word and topic word vector of the speech information.
In this way, when speech recognition is performed, the recognition result with the highest degree of association can be determined as the final identification content according to the pre-stored topic word and topic word vector.
Based on the above optimization, as shown in Fig. 5, the technical solution provided by this embodiment is as follows:
S501. Receive speech information input by a user, and take a pre-stored topic word and topic word vector as the topic word and topic word vector of the speech information.
In this embodiment, the topic word can be obtained from the identification content determined in the previous round of speech recognition. After the topic word is obtained in the previous round, the corresponding topic word vector can be acquired accordingly, and the topic word is stored together with its associated topic word vector; when speech information is received again and needs to be recognized, recognition can be based on this pre-stored topic word and topic word vector. For example, the pre-stored topic word and corresponding topic word vector can be dynamically re-determined from the identification content of every round: when the newly determined topic word differs from that of the previous round, the pre-stored topic word can be updated to ensure the accuracy of the next round of speech recognition.
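As an illustrative sketch of such a per-session cache, holding the topic word and topic word vector from the previous round and refreshing them when the newly determined topic word differs (the class, names and embedding function below are assumptions):

```python
import numpy as np

class TopicCache:
    """Pre-stored topic word and topic word vector from the previous recognition round."""

    def __init__(self):
        self.topic_word = None
        self.topic_vector = None

    def get(self):
        return self.topic_word, self.topic_vector

    def update(self, new_topic_word, embed):
        # Refresh the pre-stored topic only when it changed, so that the next
        # round of recognition uses the topic of the latest identified content.
        if new_topic_word != self.topic_word:
            self.topic_word = new_topic_word
            self.topic_vector = embed(new_topic_word)

cache = TopicCache()
cache.update("poet", embed=lambda w: np.array([0.9, 0.1, 0.0]))  # toy embedding function
topic_word, topic_vec = cache.get()
```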
S502. Recognize the speech information through a neural network model to determine recognition results and the recognition result weight value corresponding to each recognition result.
S503. Vectorize the recognition results to obtain recognition result vectors.
S504. Calculate the cosine distance value between each recognition result vector and the topic word vector, determine the distance weight value of each recognition result with respect to the topic word according to the cosine distance values, and determine, according to the distance weight values and the recognition result weight values, the identification content finally corresponding to the speech information.
It should be noted that S501-S504 illustratively constitute one embodiment in which the present invention performs a kind of intelligent interaction method, but this is only one example of the present invention; in other embodiments of the invention, S501, S502, S203 and S504 may form a new embodiment.
This embodiment provides a speech recognition method in which the final identification content is determined by comparing the recognition results with the pre-stored topic word, so that the recognized content matches the user's need even more closely and recognition efficiency is significantly enhanced.
On the basis of the above technical solution, S501 may also be: receive speech information input by a user, receive a topic word input by the user or sent by a supplier, determine a topic word vector according to the topic word, and take the received topic word and the determined topic word vector as the topic word and topic word vector corresponding to the speech information. For example, if the supplier is ctrip.com, then when ctrip.com uses the speech-recognition service it can automatically send the theme of its website to the cloud server as the topic word so as to obtain more accurate identification content; of course, in order to obtain more accurate identification content, the user can also confirm the topic word personally when sending speech information and send that topic word to the cloud server at the same time.
Embodiment 6
Fig. 6 is a structural diagram of the speech recognition apparatus provided by Embodiment 6 of the present invention. The apparatus is used for performing the speech recognition method provided by the above embodiments, and has the functional modules and beneficial effects corresponding to that method. As shown in Fig. 6, the apparatus includes a topic determining module 601, a recognition result determining module 602, a recognition result vector determining module 603 and an identification content determining module 604.
The topic determining module 601 is configured to receive speech information input by a user and determine a topic word and a topic word vector corresponding to the speech information;
the recognition result determining module 602 is configured to recognize the speech information through a neural network model to determine recognition results;
the recognition result vector determining module 603 is configured to vectorize the recognition results to obtain recognition result vectors;
the identification content determining module 604 is configured to calculate the distance value between each recognition result vector and the topic word vector, normalize the distance values to obtain a distance weight value of each recognition result with respect to the topic word, and determine, according to the distance weight values, the identification content finally corresponding to the speech information.
In the intelligent interaction apparatus provided by this embodiment, the speech information input by the user is associated with the topic word corresponding to that speech information, and the final speech recognition content is determined from the speech information together with its corresponding topic word. This solves the problem that recognizing only the acoustic dimension of the uploaded speech information gives very poor results for homophones or near-homophones, so that the recognized content matches the user's need much more closely and recognition efficiency is significantly enhanced.
On the basis of the above technical solution, the recognition result vector determining module 603 is specifically configured to:
determine the recognition result vector according to the word vectors of the words in the recognition result and the word weight values corresponding to those word vectors; or
build a syntax tree from the recognition result, and determine the recognition result vector according to the nodes in the syntax tree and the node weight values corresponding to those nodes.
On the basis of the above technical solution, the identification content determining module 604 is specifically configured to:
calculate the cosine distance value between the recognition result vector and the topic word vector, and determine the distance weight value of the recognition result with respect to the topic word according to the cosine distance value.
On the basis of the above technical solution, the recognition result determining module 602 is specifically configured to:
recognize the speech information through a neural network model to determine recognition results and the recognition result weight value corresponding to each recognition result;
and the identification content determining module 604 is specifically configured to:
determine the identification content finally corresponding to the speech information according to the distance weight values and the recognition result weight values.
On the basis of the above technical solution, the topic determining module is specifically configured to:
take a pre-stored topic word and topic word vector as the topic word and topic word vector of the speech information; or
receive a topic word input by the user or sent by a supplier, determine a topic word vector according to the topic word, and take the received topic word and the determined topic word vector as the topic word and topic word vector corresponding to the speech information.
Embodiment 7
On the basis of the above embodiments, this embodiment provides a terminal device, which can include the speech recognition apparatus provided by Embodiment 6 of the present invention. Fig. 7 is a structural diagram of the terminal device provided by Embodiment 7 of the present invention. As shown in Fig. 7, the terminal device includes a memory 701, a central processing unit (CPU) 702, a peripheral interface 703, an audio circuit 707, a loudspeaker 711, a power management chip 708, an input/output (I/O) subsystem 709, a touch screen 712, other input/control devices 710 and an external port 704; these components communicate through one or more communication buses or signal lines 707.
It should be understood that the terminal device 700 shown in the figure is only one example of a terminal device; the terminal device 700 may have more or fewer components than shown in the figure, may combine two or more components, or may have a different configuration of components. The various components shown in the figure may be implemented in hardware, software, or a combination of hardware and software, including one or more signal-processing and/or application-specific integrated circuits.
The terminal device for intelligent interaction provided by this embodiment is described in detail below, taking a story machine as an example of the terminal device.
Memory 701: the memory 701 can be accessed by the CPU 702, the peripheral interface 703 and so on; it can include high-speed random access memory and can also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices or other volatile solid-state storage components.
Peripheral interface 703: the peripheral interface 703 can connect the input and output peripherals of the device to the CPU 702 and the memory 701.
I/O subsystem 709: the I/O subsystem 709 can connect the input/output peripherals of the device, such as the touch screen 712 and the other input/control devices 710, to the peripheral interface 703. The I/O subsystem 709 can include a display controller 7091 and one or more input controllers 7092 for controlling the other input/control devices 710. The one or more input controllers 7092 receive electrical signals from, or send electrical signals to, the other input/control devices 710, and the other input/control devices 710 can include physical buttons, a joystick and a click wheel.
Power management chip 708: used to supply power to, and manage the power of, the hardware connected through the CPU 702, the I/O subsystem and the peripheral interface.
The CPU 702 provided by the embodiment of the present invention can perform the following operations:
receiving speech information input by a user, and determining a topic word and a topic word vector corresponding to the speech information;
recognizing the speech information through a neural network model to determine recognition results;
vectorizing the recognition results to obtain recognition result vectors;
calculating the distance value between each recognition result vector and the topic word vector, normalizing the distance values to obtain a distance weight value of each recognition result with respect to the topic word, and determining, according to the distance weight values, the identification content finally corresponding to the speech information.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will appreciate that the present invention is not limited to the specific embodiments described here; various obvious changes, readjustments and substitutions can be made by those skilled in the art without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, it is not limited to the above embodiments, and without departing from the inventive concept it can also include other, more equivalent embodiments; the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A speech recognition method, characterized by comprising:
receiving speech information input by a user, and determining a topic word and a topic word vector corresponding to the speech information;
recognizing the speech information through a neural network model to determine a recognition result;
vectorizing the recognition result to obtain a recognition result vector;
calculating the distance value between the recognition result vector and the topic word vector, normalizing the distance value to obtain a distance weight value of the recognition result with respect to the topic word, and determining, according to the distance weight value, the identification content finally corresponding to the speech information.

2. The method according to claim 1, characterized in that vectorizing the recognition result to obtain the recognition result vector comprises:
determining the recognition result vector according to the word vectors of the words in the recognition result and the word weight values corresponding to those word vectors; or
building a syntax tree from the recognition result, and determining the recognition result vector according to the nodes in the syntax tree and the node weight values corresponding to those nodes.

3. The method according to claim 1 or 2, characterized in that calculating the distance value between the recognition result vector and the topic word vector and normalizing the distance value to obtain the distance weight value of the recognition result with respect to the topic word comprises:
calculating the cosine distance value between the recognition result vector and the topic word vector, and determining the distance weight value of the recognition result with respect to the topic word according to the cosine distance value.

4. The method according to claim 3, characterized in that recognizing the speech information through a neural network model to determine the recognition result comprises:
recognizing the speech information through a neural network model to determine the recognition result and the recognition result weight value corresponding to the recognition result; and correspondingly,
determining the identification content finally corresponding to the speech information according to the distance weight value comprises:
determining the identification content finally corresponding to the speech information according to the distance weight value and the recognition result weight value.

5. The method according to claim 4, characterized in that determining the topic word and the topic word vector corresponding to the speech information comprises:
taking a pre-stored topic word and topic word vector as the topic word and topic word vector of the speech information; or
receiving a topic word input by the user or sent by a supplier, determining a topic word vector according to the topic word, and taking the received topic word and the determined topic word vector as the topic word and topic word vector corresponding to the speech information.

6. A speech recognition apparatus, characterized by comprising:
a topic determining module, configured to receive speech information input by a user and determine a topic word and a topic word vector corresponding to the speech information;
a recognition result determining module, configured to recognize the speech information through a neural network model to determine a recognition result;
a recognition result vector determining module, configured to vectorize the recognition result to obtain a recognition result vector;
an identification content determining module, configured to calculate the distance value between the recognition result vector and the topic word vector, normalize the distance value to obtain a distance weight value of the recognition result with respect to the topic word, and determine, according to the distance weight value, the identification content finally corresponding to the speech information.

7. The apparatus according to claim 6, characterized in that the recognition result vector determining module is specifically configured to:
determine the recognition result vector according to the word vectors of the words in the recognition result and the word weight values corresponding to those word vectors; or
build a syntax tree from the recognition result, and determine the recognition result vector according to the nodes in the syntax tree and the node weight values corresponding to those nodes.

8. The apparatus according to claim 6 or 7, characterized in that the identification content determining module is specifically configured to:
calculate the cosine distance value between the recognition result vector and the topic word vector, and determine the distance weight value of the recognition result with respect to the topic word according to the cosine distance value.

9. The apparatus according to claim 8, characterized in that the recognition result determining module is specifically configured to:
recognize the speech information through a neural network model to determine the recognition result and the recognition result weight value corresponding to the recognition result;
the identification content determining module is specifically configured to:
determine the identification content finally corresponding to the speech information according to the distance weight value and the recognition result weight value; and
the topic determining module is specifically configured to:
take a pre-stored topic word and topic word vector as the topic word and topic word vector of the speech information; or
receive a topic word input by the user or sent by a supplier, determine a topic word vector according to the topic word, and take the received topic word and the determined topic word vector as the topic word and topic word vector corresponding to the speech information.

10. A terminal device, characterized in that the terminal device is integrated with the apparatus according to any one of claims 6 to 9.
CN201611166106.4A 2016-12-16 2016-12-16 Speech recognition method, apparatus and terminal device Pending CN108206020A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611166106.4A CN108206020A (en) 2016-12-16 2016-12-16 Speech recognition method, apparatus and terminal device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611166106.4A CN108206020A (en) 2016-12-16 2016-12-16 Speech recognition method, apparatus and terminal device

Publications (1)

Publication Number Publication Date
CN108206020A true CN108206020A (en) 2018-06-26

Family

ID=62601495

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611166106.4A Pending CN108206020A (en) Speech recognition method, apparatus and terminal device

Country Status (1)

Country Link
CN (1) CN108206020A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108986797A (en) * 2018-08-06 2018-12-11 中国科学技术大学 A kind of voice subject identifying method and system
CN112015872A (en) * 2019-05-29 2020-12-01 华为技术有限公司 Question recognition method and device
CN112466327A (en) * 2020-10-23 2021-03-09 北京百度网讯科技有限公司 Voice processing method and device and electronic equipment

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1573926A (en) * 2003-06-03 2005-02-02 微软公司 Discriminative training of language models for text and speech classification
CN103474065A (en) * 2013-09-24 2013-12-25 贵阳世纪恒通科技有限公司 Method for determining and recognizing voice intentions based on automatic classification technology
CN104424296A (en) * 2013-09-02 2015-03-18 阿里巴巴集团控股有限公司 Query word classifying method and query word classifying device
US20150222450A1 (en) * 2012-09-10 2015-08-06 Samsung Electronics Co., Ltd. System and method of controlling external apparatus connected with device
CN105808590A (en) * 2014-12-31 2016-07-27 中国电信股份有限公司 Search engine realization method as well as search method and apparatus
CN105869642A (en) * 2016-03-25 2016-08-17 海信集团有限公司 Voice text error correction method and device
CN105931642A (en) * 2016-05-31 2016-09-07 北京灵隆科技有限公司 Speech recognition method, apparatus and system
CN106095737A (en) * 2016-06-07 2016-11-09 杭州凡闻科技有限公司 Documents Similarity computational methods and similar document the whole network retrieval tracking
CN106202068A (en) * 2016-07-25 2016-12-07 哈尔滨工业大学 The machine translation method of semantic vector based on multi-lingual parallel corpora
CN106205622A (en) * 2016-06-29 2016-12-07 联想(北京)有限公司 Information processing method and electronic equipment
CN106228983A (en) * 2016-08-23 2016-12-14 北京谛听机器人科技有限公司 Scene process method and system during a kind of man-machine natural language is mutual

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1573926A (en) * 2003-06-03 2005-02-02 微软公司 Discriminative training of language models for text and speech classification
US20150222450A1 (en) * 2012-09-10 2015-08-06 Samsung Electronics Co., Ltd. System and method of controlling external apparatus connected with device
CN104424296A (en) * 2013-09-02 2015-03-18 阿里巴巴集团控股有限公司 Query word classifying method and query word classifying device
CN103474065A (en) * 2013-09-24 2013-12-25 贵阳世纪恒通科技有限公司 Method for determining and recognizing voice intentions based on automatic classification technology
CN105808590A (en) * 2014-12-31 2016-07-27 中国电信股份有限公司 Search engine realization method as well as search method and apparatus
CN105869642A (en) * 2016-03-25 2016-08-17 海信集团有限公司 Voice text error correction method and device
CN105931642A (en) * 2016-05-31 2016-09-07 北京灵隆科技有限公司 Speech recognition method, apparatus and system
CN106095737A (en) * 2016-06-07 2016-11-09 杭州凡闻科技有限公司 Documents Similarity computational methods and similar document the whole network retrieval tracking
CN106205622A (en) * 2016-06-29 2016-12-07 联想(北京)有限公司 Information processing method and electronic equipment
CN106202068A (en) * 2016-07-25 2016-12-07 哈尔滨工业大学 The machine translation method of semantic vector based on multi-lingual parallel corpora
CN106228983A (en) * 2016-08-23 2016-12-14 北京谛听机器人科技有限公司 Scene process method and system during a kind of man-machine natural language is mutual

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
He Yanxiang: "Principles of Compilation" (《编译原理》), 31 October 2000 *
Cheng Yingying: "Web Mining Technology and Its Application in an E-mail System" (《Web挖掘技术及其在邮件系统中的应用》), Wanfang Data Knowledge Service Platform *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108986797A (en) * 2018-08-06 2018-12-11 中国科学技术大学 A kind of voice subject identifying method and system
CN112015872A (en) * 2019-05-29 2020-12-01 华为技术有限公司 Question recognition method and device
CN112466327A (en) * 2020-10-23 2021-03-09 北京百度网讯科技有限公司 Voice processing method and device and electronic equipment
CN112466327B (en) * 2020-10-23 2022-02-22 北京百度网讯科技有限公司 Voice processing method and device and electronic equipment

Similar Documents

Publication Publication Date Title
CN108319599B (en) Man-machine conversation method and device
CN110287461B (en) Text conversion method, device and storage medium
CN110444199B (en) Voice keyword recognition method and device, terminal and server
CN108694940B (en) Voice recognition method and device and electronic equipment
WO2019076286A1 (en) User intent recognition method and device for a statement
WO2021022992A1 (en) Dialog generation model training method and device, and dialog generation method and device, and medium
CN110517664B (en) Multi-party identification method, device, equipment and readable storage medium
CN106875940B (en) Machine self-learning construction knowledge graph training method based on neural network
CN110275939B (en) Method and device for determining conversation generation model, storage medium and electronic equipment
CN107978311A (en) A kind of voice data processing method, device and interactive voice equipment
US20190221208A1 (en) Method, user interface, and device for audio-based emoji input
CN111402861B (en) Voice recognition method, device, equipment and storage medium
CN111309883A (en) Man-machine conversation method based on artificial intelligence, model training method and device
WO2021135457A1 (en) Recurrent neural network-based emotion recognition method, apparatus, and storage medium
CN111694940A (en) User report generation method and terminal equipment
CN113314119B (en) Voice recognition intelligent household control method and device
CN108206020A (en) Speech recognition method, apparatus and terminal device
CN112836521A (en) Question-answer matching method and device, computer equipment and storage medium
CN108053826A (en) For the method, apparatus of human-computer interaction, electronic equipment and storage medium
US10269349B2 (en) Voice interactive device and voice interaction method
JP2021081713A (en) Method, device, apparatus, and media for processing voice signal
CN114023309A (en) Speech recognition system, related method, device and equipment
WO2020199590A1 (en) Mood detection analysis method and related device
WO2023040545A1 (en) Data processing method and apparatus, device, storage medium, and program product
CN111680514A (en) Information processing and model training method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180626