CN107464566A - Audio recognition method and device - Google Patents
Audio recognition method and device
- Publication number
- CN107464566A CN107464566A CN201710861589.8A CN201710861589A CN107464566A CN 107464566 A CN107464566 A CN 107464566A CN 201710861589 A CN201710861589 A CN 201710861589A CN 107464566 A CN107464566 A CN 107464566A
- Authority
- CN
- China
- Prior art keywords
- information
- voice information
- entity
- user
- analysis result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Hospice & Palliative Care (AREA)
- Psychiatry (AREA)
- General Health & Medical Sciences (AREA)
- Signal Processing (AREA)
- Child & Adolescent Psychology (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a speech recognition method and device. The method includes: obtaining voice information input by a user; identifying entity information in the voice information based on a named entity recognition system; obtaining speech-rate information and volume information captured while the user inputs the voice information; and identifying, based on a deep learning sentiment analysis model, the emotion information corresponding to the voice information according to the entity information, the speech-rate information and the volume information. By treating emotion information as an important factor in speech recognition, the speech recognition method of the embodiments of the present invention helps improve the accuracy of speech recognition and better matches the real needs of the user.
Description
Technical field
The present invention relates to the technical field of information processing, and more particularly to a speech recognition method and device.
Background art
With the continuous progress of science and technology, speech recognition technology has made significant advances. With the arrival of the intelligent era, speech recognition technology will enter fields such as industry, household appliances, communications, automotive electronics, medical care, home services and consumer electronics. At present, speech recognition mainly parses the content of the user's speech to understand the user's intention, enabling simple interactions with the user, such as receiving the user's voice instructions, performing some simple operations, or holding a simple conversation. However, when the user inputs an utterance such as "play a song for me", a current speech recognition system generally can only parse the spoken content and recommend a song at random according to the parsing result; the additional information the user's speech carries is not fully used, so the deeper behavioral intention of the user cannot be understood.
Summary of the invention
The present invention provides a speech recognition method and device to solve at least one of the above technical problems.
An embodiment of the present invention provides a speech recognition method, including: obtaining voice information input by a user; identifying entity information in the voice information based on a named entity recognition system; obtaining speech-rate information and volume information captured while the user inputs the voice information; and identifying, based on a deep learning sentiment analysis model, the emotion information corresponding to the voice information according to the entity information, the speech-rate information and the volume information.
Optionally, identifying the entity information in the voice information based on the named entity recognition system includes: analyzing the voice information to obtain an analysis result; and identifying the entity information according to the analysis result.
Optionally, analyzing the voice information to obtain the analysis result includes: preprocessing, segmenting and part-of-speech tagging the voice information to obtain the analysis result.
Optionally, identifying, based on the deep learning sentiment analysis model, the emotion information corresponding to the voice information according to the entity information, the speech-rate information and the volume information includes: inputting the entity information, the speech-rate information and the volume information into the deep learning sentiment analysis model as feature information; and identifying the emotion information through the deep learning sentiment analysis model based on a preset sentiment dictionary.
Optionally, after the voice information input by the user is obtained, the method further includes: semantically parsing the voice information and generating a semantic analysis result; and feeding back to the user, according to the semantic analysis result and the emotion information, result information corresponding to the voice information.
Another embodiment of the present invention provides a speech recognition device, including: a first acquisition module for obtaining the voice information input by a user; an identification module for identifying the entity information in the voice information based on a named entity recognition system; a second acquisition module for obtaining the speech-rate information and volume information captured while the user inputs the voice information; and a sentiment analysis module for identifying, based on a deep learning sentiment analysis model, the emotion information corresponding to the voice information according to the entity information, the speech-rate information and the volume information.
Optionally, the identification module is configured to: analyze the voice information to obtain an analysis result; and identify the entity information according to the analysis result.
Optionally, the identification module is specifically configured to preprocess, segment and part-of-speech tag the voice information to obtain the analysis result.
Optionally, the sentiment analysis module is configured to: input the entity information, the speech-rate information and the volume information into the deep learning sentiment analysis model as feature information; and identify the emotion information through the deep learning sentiment analysis model based on a preset sentiment dictionary.
Optionally, the device further includes: a semantic analysis module for semantically parsing the voice information after the voice information input by the user is obtained, and generating a semantic analysis result; and a feedback module for feeding back to the user, according to the semantic analysis result and the emotion information, result information corresponding to the voice information.
A further embodiment of the present invention provides a non-transitory computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the speech recognition method of the first-aspect embodiment of the present invention is implemented.
Yet another embodiment of the present invention provides a terminal device, including a processor, a memory, and a computer program stored on the memory and runnable on the processor; the processor is used to perform the speech recognition method of the first-aspect embodiment of the present invention.
The technical solutions provided by the embodiments of the present invention can have the following beneficial effects: the voice information input by the user is obtained; the entity information in the voice information is identified based on a named entity recognition system; the speech-rate information and volume information captured while the user inputs the voice information are obtained; and the emotion information corresponding to the voice information is identified based on a deep learning sentiment analysis model according to the entity information, the speech-rate information and the volume information. By treating emotion information as an important factor in speech recognition, this helps improve the accuracy of speech recognition and better matches the real needs of the user.
Additional aspects and advantages of the present invention will be set forth in part in the following description, and will in part become apparent from the following description or be learned through practice of the present invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments with reference to the accompanying drawings, in which:
Fig. 1 is a flowchart of a speech recognition method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a speech recognition method according to another embodiment of the present invention;
Fig. 3 is a schematic diagram of a sentiment analysis system framework based on deep learning;
Fig. 4 is a structural block diagram of a speech recognition device according to an embodiment of the present invention;
Fig. 5 is a structural block diagram of a speech recognition device according to another embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar labels throughout represent the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the accompanying drawings are exemplary; they are intended to explain the present invention and are not to be construed as limiting it.
The speech recognition method and device of the embodiments of the present invention are described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a speech recognition method according to an embodiment of the present invention.
As shown in Fig. 1, the speech recognition method includes:
S101, obtaining the voice information input by a user.
At present, speech recognition mainly parses the content of the user's speech to understand the user's intention, enabling simple interactions with the user, such as receiving the user's voice instructions, performing some simple operations, or holding a simple conversation. However, when the user inputs an utterance such as "play a song for me", a current speech recognition system generally can only parse the spoken content and recommend a song at random according to the parsing result; the additional information the user's speech carries, such as emotion information, is not fully used, so the deeper behavioral intention of the user cannot be understood.
Therefore, the present invention proposes a speech recognition method that fuses emotion information as a feature into speech recognition, so as to understand the user's intention more accurately. Sentiment analysis is the process of analyzing, processing, summarizing and reasoning over subjective text with emotional color. The present invention therefore builds on deep learning technology and comprehensively analyzes features such as speech rate and volume in the speech recognition system, so as to obtain the user's emotion information accurately.
In one embodiment of the present invention, the voice information input by the user can be obtained.
S102, identifying the entity information in the voice information based on a named entity recognition system.
Named entity recognition (NER) is mainly used to identify entities with specific meaning in text, such as person names, place names, organization names and proper nouns.
In one embodiment of the present invention, the voice information can be analyzed to obtain an analysis result, and the entity information can then be identified according to the analysis result. Specifically, the voice information can be preprocessed, segmented and part-of-speech tagged to obtain the analysis result. A minimal sketch of this step is given below.
S103, obtaining the speech-rate information and volume information captured while the user inputs the voice information.
While the voice information input by the user is obtained, the speech-rate information and volume information corresponding to the voice information can also be obtained. The speech-rate information and volume information may be defined as double-type values, and their value range can be represented numerically, for example 0-15. It should be understood that the present invention does not limit the order in which step S101 and step S103 are executed.
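As an illustration only, the sketch below derives the two double-type features described above from raw audio and maps each into the 0-15 range. The scaling constants are assumptions chosen merely to land typical values in that range; the embodiment does not specify how the features are computed.

```python
import numpy as np

def speech_rate(num_words: int, duration_s: float, max_wps: float = 8.0) -> float:
    """Words per second, linearly mapped onto 0-15 (max_wps is an assumed ceiling)."""
    wps = num_words / max(duration_s, 1e-6)
    return float(np.clip(wps / max_wps * 15.0, 0.0, 15.0))

def volume(samples: np.ndarray) -> float:
    """RMS energy of a waveform normalized to [-1, 1], mapped onto 0-15."""
    rms = float(np.sqrt(np.mean(np.square(samples))))
    return float(np.clip(rms * 15.0, 0.0, 15.0))

# One second of a quiet 440 Hz test tone stands in for user speech.
audio = 0.3 * np.sin(np.linspace(0.0, 2 * np.pi * 440, 16000))
print(speech_rate(num_words=5, duration_s=1.0), volume(audio))
```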
S104, identifying, based on a deep learning sentiment analysis model, the emotion information corresponding to the voice information according to the entity information, the speech-rate information and the volume information.
After the above information is obtained, the entity information, the speech-rate information and the volume information can be input into the deep learning sentiment analysis model as feature information; the emotion information can then be identified by the deep learning sentiment analysis model based on a preset sentiment dictionary, for example identifying that the emotion information corresponding to the voice information is "happy".
The deep learning sentiment analysis model can be a convolutional neural network (CNN) model. The preset sentiment dictionary is generated by fusing and organizing the following sentiment dictionaries: (1) commendatory and derogatory words and their near-synonyms; (2) a Chinese emotion word extreme-value table; (3) Li Jun's Chinese commendatory/derogatory dictionary from Tsinghua University; (4) sentiment dictionaries and their classifications; (5) an emotion vocabulary ontology; (6) the National Taiwan University NTUSD simplified-Chinese sentiment dictionary; and (7) the HowNet sentiment dictionary.
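As an illustration only, the following PyTorch sketch shows one plausible shape for such a CNN model: embedded entity tokens pass through a one-dimensional convolution, and the speech-rate and volume values are concatenated to the pooled text features before classification. The vocabulary size, layer sizes and six-way emotion output are assumptions; the embodiment states only that a CNN model can be used, and the sentiment-dictionary lookup is omitted here.

```python
import torch
import torch.nn as nn

class SentimentCNN(nn.Module):
    """Illustrative CNN: embedded entity tokens + speech rate + volume -> emotion logits."""
    def __init__(self, vocab_size=5000, embed_dim=64, num_emotions=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # 1-D convolution over the token sequence, then global max pooling.
        self.conv = nn.Conv1d(embed_dim, 128, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveMaxPool1d(1)
        # +2 for the speech-rate and volume scalars appended to the text features.
        self.classifier = nn.Linear(128 + 2, num_emotions)

    def forward(self, token_ids, rate, vol):
        x = self.embed(token_ids).transpose(1, 2)            # (B, embed_dim, seq_len)
        x = self.pool(torch.relu(self.conv(x))).squeeze(-1)  # (B, 128)
        x = torch.cat([x, rate.unsqueeze(1), vol.unsqueeze(1)], dim=1)
        return self.classifier(x)                            # (B, num_emotions)

model = SentimentCNN()
tokens = torch.randint(0, 5000, (1, 8))                      # one 8-token utterance
logits = model(tokens, torch.tensor([12.0]), torch.tensor([9.0]))
print(logits.argmax(dim=1))                                  # predicted emotion index
```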
In another embodiment of the present invention, as shown in Fig. 2, the speech recognition method can further include the following steps:
S105, semantically parsing the voice information and generating a semantic analysis result.
S106, feeding back to the user, according to the semantic analysis result and the emotion information, the result information corresponding to the voice information.
In this embodiment, natural language understanding (NLU) technology can be used to semantically parse the voice information. The semantic analysis result is then combined with the emotion information obtained by the analysis, so that the user's intention can be parsed in depth and the corresponding result information can finally be fed back to the user.
The method is described in detail below with a specific example.
As shown in Fig. 3, Fig. 3 is a schematic diagram of a sentiment analysis system framework based on deep learning.
First, the user performs the voice input "play a song for me"; the corresponding entity information "song" can now be identified by named entity recognition (NER). Specifically, the speech can first be converted into text, the text can then be preprocessed, segmented and part-of-speech tagged, and the output can be fed into the NER for identification. Afterwards, speech recognition technology is used to obtain the speech-rate information and volume information. The speech-rate information and volume information may be defined as double-type values whose range can be represented numerically, for example 0-15; the larger the number, the faster the speech rate or the louder the volume. After this, the entity information, the speech-rate information and the volume information can be arranged as feature information and input into the deep learning sentiment analysis model, and, using the fused sentiment dictionary, the emotion information corresponding to the voice information is analyzed to be "happy". Combined with the semantic parsing, it can be analyzed that the user's intention is to obtain a happy song, so a cheerful song can be intelligently pushed to the user for playback, which better matches the user's needs. A stub sketch of this end-to-end flow is given below.
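As an illustration only, the stub sketch below traces the Fig. 3 flow end to end. Every function is a hypothetical placeholder standing in for the corresponding step S101-S106, not the actual implementation of the embodiment.

```python
def transcribe(audio) -> str:                      # S101: stub for speech-to-text
    return "play a song for me"

def extract_entities(text: str) -> list:           # S102: stub for NER
    return ["song"] if "song" in text else []

def prosodic_features(audio):                      # S103: stub speech-rate / volume (0-15 scale)
    return 12.0, 9.0

def classify_emotion(entities, rate, vol) -> str:  # S104: stub for the CNN sentiment model
    return "happy" if rate > 8.0 and vol > 6.0 else "neutral"

def respond(audio) -> str:                         # S105 + S106: parse intent, feed back a result
    text = transcribe(audio)
    rate, vol = prosodic_features(audio)
    emotion = classify_emotion(extract_entities(text), rate, vol)
    style = "cheerful" if emotion == "happy" else "calm"
    return f"pushing a {style} song to the user"

print(respond(audio=None))                         # -> pushing a cheerful song to the user
```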
With the speech recognition method of the embodiments of the present invention, the voice information input by the user is obtained; the entity information in the voice information is identified based on a named entity recognition system; the speech-rate information and volume information captured while the user inputs the voice information are then obtained; and the emotion information corresponding to the voice information is identified based on a deep learning sentiment analysis model according to the entity information, the speech-rate information and the volume information. By treating emotion information as an important factor in speech recognition, this helps improve the accuracy of speech recognition and better matches the real needs of the user.
To realize the above embodiments, the present invention also provides a speech recognition device. Fig. 4 is a structural block diagram of the speech recognition device according to an embodiment of the present invention; as shown in Fig. 4, the device includes a first acquisition module 410, an identification module 420, a second acquisition module 430 and a sentiment analysis module 440.
The first acquisition module 410 is used for obtaining the voice information input by the user.
The identification module 420 is used for identifying the entity information in the voice information based on a named entity recognition system.
The second acquisition module 430 is used for obtaining the speech-rate information and volume information captured while the user inputs the voice information.
The sentiment analysis module 440 is used for identifying, based on a deep learning sentiment analysis model, the emotion information corresponding to the voice information according to the entity information, the speech-rate information and the volume information.
In addition, as shown in Fig. 5, the device may also include a semantic analysis module 450 and a feedback module 460.
The semantic analysis module 450 is used for semantically parsing the voice information after the voice information input by the user is obtained, and generating a semantic analysis result.
The feedback module 460 is used for feeding back to the user, according to the semantic analysis result and the emotion information, the result information corresponding to the voice information.
It should be noted that the foregoing explanation of the speech recognition method also applies to the speech recognition device of the embodiment of the present invention; details not disclosed in the embodiment of the present invention will not be repeated here.
With the speech recognition device of the embodiments of the present invention, the voice information input by the user is obtained; the entity information in the voice information is identified based on a named entity recognition system; the speech-rate information and volume information captured while the user inputs the voice information are then obtained; and the emotion information corresponding to the voice information is identified based on a deep learning sentiment analysis model according to the entity information, the speech-rate information and the volume information. By treating emotion information as an important factor in speech recognition, this helps improve the accuracy of speech recognition and better matches the real needs of the user.
To realize the above embodiments, the present invention also provides a non-transitory computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the speech recognition method of the first-aspect embodiment of the present invention is implemented.
To realize the above embodiments, the present invention also provides a terminal device, including a processor, a memory, and a computer program stored on the memory and runnable on the processor; the processor is used to perform the speech recognition method of the first-aspect embodiment of the present invention.
For example, the computer program can be executed by the processor to complete a speech recognition method including the following steps:
S101', obtaining the voice information input by the user.
S102', identifying the entity information in the voice information based on a named entity recognition system.
S103', obtaining the speech-rate information and volume information captured while the user inputs the voice information.
S104', identifying, based on a deep learning sentiment analysis model, the emotion information corresponding to the voice information according to the entity information, the speech-rate information and the volume information.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that the specific features, structures, materials or characteristics described in connection with the embodiment or example are included in at least one embodiment or example of the present invention. In this specification, schematic references to the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in an appropriate manner in any one or more embodiments or examples. In addition, in the absence of conflict, those skilled in the art may combine the different embodiments or examples described in this specification and the features of the different embodiments or examples.
In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the quantity of the indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "multiple" means at least two, such as two or three, unless otherwise specifically defined.
Any process or method description in a flowchart or otherwise described herein may be understood to represent a module, segment or portion of code including one or more executable instructions for implementing the steps of a specific logical function or process. The scope of the preferred embodiments of the present invention includes other implementations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order according to the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
The logic and/or steps represented in a flowchart or otherwise described herein, for example a sequenced list of executable instructions that may be considered to implement logical functions, may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, device or equipment (such as a computer-based system, a system including a processor, or another system that can fetch and execute instructions from an instruction execution system, device or equipment). For the purposes of this specification, a "computer-readable medium" can be any apparatus that can contain, store, communicate, propagate or transmit a program for use by, or in connection with, an instruction execution system, device or equipment. More specific examples (a non-exhaustive list) of the computer-readable medium include: an electrical connection (electronic device) with one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a fiber-optic device, and a portable compact disc read-only memory (CD-ROM). The computer-readable medium may even be paper or another suitable medium on which the program can be printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or, if necessary, otherwise suitably processing it, and then stored in a computer memory.
It should be appreciated that each part of the present invention can be realized with hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods can be realized with software or firmware stored in memory and executed by a suitable instruction execution system. For example, if realized with hardware, as in another embodiment, they can be realized with any one or a combination of the following techniques well known in the art: a discrete logic circuit with logic gate circuits for realizing logic functions on data signals, an application-specific integrated circuit with suitable combinational logic gate circuits, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those skilled in the art will appreciate that all or part of the steps carried by the above embodiment methods can be completed by instructing related hardware through a program; the program can be stored in a computer-readable storage medium and, when executed, includes one or a combination of the steps of the method embodiments.
In addition, each functional unit in each embodiment of the present invention can be integrated in one processing module; alternatively, each unit may exist physically alone, or two or more units may be integrated in one module. The above integrated module can be realized either in the form of hardware or in the form of a software function module. If the integrated module is realized in the form of a software function module and sold or used as an independent product, it can also be stored in a computer-readable storage medium.
The storage medium mentioned above can be a read-only memory, a magnetic disk, an optical disc, or the like. Although embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are exemplary and cannot be construed as limiting the present invention; those of ordinary skill in the art may make changes, modifications, replacements and variations to the above embodiments within the scope of the present invention.
Claims (12)
- 1. A speech recognition method, characterized by comprising: obtaining voice information input by a user; identifying entity information in the voice information based on a named entity recognition system; obtaining speech-rate information and volume information captured while the user inputs the voice information; and identifying, based on a deep learning sentiment analysis model, emotion information corresponding to the voice information according to the entity information, the speech-rate information and the volume information.
- 2. The method according to claim 1, characterized in that identifying the entity information in the voice information based on the named entity recognition system comprises: analyzing the voice information to obtain an analysis result; and identifying the entity information according to the analysis result.
- 3. The method according to claim 2, characterized in that analyzing the voice information to obtain the analysis result comprises: preprocessing, segmenting and part-of-speech tagging the voice information to obtain the analysis result.
- 4. The method according to claim 1, characterized in that identifying, based on the deep learning sentiment analysis model, the emotion information corresponding to the voice information according to the entity information, the speech-rate information and the volume information comprises: inputting the entity information, the speech-rate information and the volume information into the deep learning sentiment analysis model as feature information; and identifying the emotion information through the deep learning sentiment analysis model based on a preset sentiment dictionary.
- 5. The method according to claim 1, characterized in that, after the voice information input by the user is obtained, the method further comprises: semantically parsing the voice information and generating a semantic analysis result; and feeding back to the user, according to the semantic analysis result and the emotion information, result information corresponding to the voice information.
- 6. A speech recognition device, characterized by comprising: a first acquisition module for obtaining voice information input by a user; an identification module for identifying entity information in the voice information based on a named entity recognition system; a second acquisition module for obtaining speech-rate information and volume information captured while the user inputs the voice information; and a sentiment analysis module for identifying, based on a deep learning sentiment analysis model, emotion information corresponding to the voice information according to the entity information, the speech-rate information and the volume information.
- 7. The device according to claim 6, characterized in that the identification module is configured to: analyze the voice information to obtain an analysis result; and identify the entity information according to the analysis result.
- 8. The device according to claim 7, characterized in that the identification module is specifically configured to preprocess, segment and part-of-speech tag the voice information to obtain the analysis result.
- 9. The device according to claim 6, characterized in that the sentiment analysis module is configured to: input the entity information, the speech-rate information and the volume information into the deep learning sentiment analysis model as feature information; and identify the emotion information through the deep learning sentiment analysis model based on a preset sentiment dictionary.
- 10. The device according to claim 6, characterized in that the device further comprises: a semantic analysis module for semantically parsing the voice information after the voice information input by the user is obtained, and generating a semantic analysis result; and a feedback module for feeding back to the user, according to the semantic analysis result and the emotion information, result information corresponding to the voice information.
- 11. A non-transitory computer-readable storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the speech recognition method according to any one of claims 1-5 is implemented.
- 12. A terminal device, comprising a processor, a memory, and a computer program stored on the memory and runnable on the processor, characterized in that the processor is used to perform the speech recognition method according to any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710861589.8A CN107464566A (en) | 2017-09-21 | 2017-09-21 | Audio recognition method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710861589.8A CN107464566A (en) | 2017-09-21 | 2017-09-21 | Audio recognition method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107464566A (en) | 2017-12-12 |
Family
ID=60552962
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710861589.8A Pending CN107464566A (en) | 2017-09-21 | 2017-09-21 | Audio recognition method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107464566A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150255087A1 (en) * | 2014-03-07 | 2015-09-10 | Fujitsu Limited | Voice processing device, voice processing method, and computer-readable recording medium storing voice processing program |
CN104572625A (en) * | 2015-01-21 | 2015-04-29 | 北京云知声信息技术有限公司 | Recognition method of named entity |
CN106683672A (en) * | 2016-12-21 | 2017-05-17 | 竹间智能科技(上海)有限公司 | Intelligent dialogue method and system based on emotion and semantics |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110164427A (en) * | 2018-02-13 | 2019-08-23 | 阿里巴巴集团控股有限公司 | Voice interactive method, device, equipment and storage medium |
CN108521500A (en) * | 2018-03-13 | 2018-09-11 | 努比亚技术有限公司 | A kind of voice scenery control method, equipment and computer readable storage medium |
CN108806671A (en) * | 2018-05-29 | 2018-11-13 | 杭州认识科技有限公司 | Semantic analysis, device and electronic equipment |
CN108806671B (en) * | 2018-05-29 | 2019-06-28 | 杭州认识科技有限公司 | Semantic analysis, device and electronic equipment |
CN112437956A (en) * | 2018-07-25 | 2021-03-02 | Lg 电子株式会社 | Speech recognition system |
CN112437956B (en) * | 2018-07-25 | 2024-03-26 | Lg 电子株式会社 | Speech recognition system |
CN108920129A (en) * | 2018-07-27 | 2018-11-30 | 联想(北京)有限公司 | Information processing method and information processing system |
US11062708B2 (en) | 2018-08-06 | 2021-07-13 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and apparatus for dialoguing based on a mood of a user |
CN109215679A (en) * | 2018-08-06 | 2019-01-15 | 百度在线网络技术(北京)有限公司 | Dialogue method and device based on user emotion |
CN110895658A (en) * | 2018-09-13 | 2020-03-20 | 珠海格力电器股份有限公司 | Information processing method and device and robot |
CN111354361A (en) * | 2018-12-21 | 2020-06-30 | 深圳市优必选科技有限公司 | Emotion communication method and system and robot |
RU2720359C1 (en) * | 2019-04-16 | 2020-04-29 | Huawei Technologies Co., Ltd. | Method and equipment for recognizing emotions in speech |
CN111091810A (en) * | 2019-12-19 | 2020-05-01 | 佛山科学技术学院 | VR game character expression control method and storage medium based on voice information |
CN113409790A (en) * | 2020-03-17 | 2021-09-17 | Oppo广东移动通信有限公司 | Voice conversion method, device, terminal and storage medium |
CN111370030A (en) * | 2020-04-03 | 2020-07-03 | 龙马智芯(珠海横琴)科技有限公司 | Voice emotion detection method and device, storage medium and electronic equipment |
US20220084525A1 (en) * | 2020-09-17 | 2022-03-17 | Zhejiang Tonghuashun Intelligent Technology Co., Ltd. | Systems and methods for voice audio data processing |
US12119004B2 (en) * | 2020-09-17 | 2024-10-15 | Zhejiang Tonghuashun Intelligent Technology Co., Ltd. | Systems and methods for voice audio data processing |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107464566A (en) | Audio recognition method and device | |
Ghosh et al. | Fracking sarcasm using neural network | |
Schuller et al. | Cross-corpus acoustic emotion recognition: Variances and strategies | |
Gibbon et al. | Handbook of multimodal and spoken dialogue systems: Resources, terminology and product evaluation | |
CN110297907B (en) | Method for generating interview report, computer-readable storage medium and terminal device | |
Cole et al. | New methods for prosodic transcription: Capturing variability as a source of information | |
US20220245354A1 (en) | Automated classification of emotio-cogniton | |
Bertero et al. | Deep learning of audio and language features for humor prediction | |
CN110728997A (en) | A multimodal depression detection method and system based on situational awareness | |
Johar | Emotion, affect and personality in speech: The Bias of language and paralanguage | |
CN108647219A (en) | A kind of convolutional neural networks text emotion analysis method of combination sentiment dictionary | |
KR101971582B1 (en) | Method of providing health care guide using chat-bot having user intension analysis function and apparatus for the same | |
CN110297906B (en) | Method for generating interview report, computer-readable storage medium and terminal device | |
Blache et al. | Creating and exploiting multimodal annotated corpora: the ToMA project | |
Singh et al. | An efficient language-independent acoustic emotion classification system | |
CN112860871B (en) | Natural language understanding model training method, natural language understanding method and device | |
CN107526826A (en) | Phonetic search processing method, device and server | |
Campbell | Developments in corpus-based speech synthesis: Approaching natural conversational speech | |
CN110457424A (en) | Generate method, computer readable storage medium and the terminal device of interview report | |
CN116092472A (en) | Speech synthesis method and synthesis system | |
CN114120985A (en) | Pacifying interaction method, system and equipment of intelligent voice terminal and storage medium | |
Christodoulides et al. | Automatic detection and annotation of disfluencies in spoken French corpora. | |
CN116612541A (en) | A multi-modal emotion recognition method, device and storage medium | |
Alm | The role of affect in the computational modeling of natural language | |
Zhang | An automatic assessment method for spoken English based on multimodal feature fusion |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20171212 |