CN109994102A - Intelligent outbound call system based on emotion recognition - Google Patents
Intelligent outbound call system based on emotion recognition
- Publication number
- CN109994102A CN109994102A CN201910303368.8A CN201910303368A CN109994102A CN 109994102 A CN109994102 A CN 109994102A CN 201910303368 A CN201910303368 A CN 201910303368A CN 109994102 A CN109994102 A CN 109994102A
- Authority
- CN
- China
- Prior art keywords
- module
- signal
- connect
- voice
- paging system
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
- G10L15/26—Speech to text systems
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/50—Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers; Centralised arrangements for recording messages
- H04M3/51—Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing
- H04M3/5166—Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing, in combination with interactive voice response systems or voice portals, e.g. as front-ends
Abstract
The invention discloses an intelligent outbound call system based on emotion recognition, comprising a voice communication module, a voice acquisition module, an audio dimension analysis module, a text transcription module, a scene model generation module, a prompt generation module, a text semantic analysis module, a database, a state comparison module, a real-time alert module, an agent video recording module, a user video recording module and a display screen. The voice communication module is connected to the voice acquisition module by a signal connection; the voice acquisition module is connected to the audio dimension analysis module; the audio dimension analysis module and the text semantic analysis module are connected to the text transcription module; and the agent video recording module and the user video recording module are connected to the text semantic analysis module. The system adds voice- and text-based artificial intelligence analysis to ordinary agent outbound calls and uses it to supervise and guide the emotions of both parties, so that the whole call is more standardized and more humanized, improving the user experience.
Description
Technical field
The present invention relates to the field of speech emotion processing, and specifically to an intelligent outbound call system based on emotion recognition.
Background art
In pattern recognition, researchers around the world have applied almost every available technique to speech emotion processing, and new methods and comparisons continue to emerge: neural network classifiers, Bayes classifiers, K-nearest-neighbor classifiers, SVM, GMM and HMM classifiers have all been used. Although a great deal of research on speech emotion recognition has been carried out, the field of speech emotion information processing as a whole remains at a rather low level. First, the effective features that can be extracted are limited: almost all researchers use prosodic features, or combinations or derivatives of these features, as analysis parameters. Second, as to the means of pattern recognition, although many different methods have been applied, the data used by different research projects differ, which makes comparison between these studies nearly impossible. The research objects in the literature vary widely and so do the results; recognition rates alone range from 53% to 90%, yet a method with a high recognition rate cannot simply be declared better than one with a lower rate, since the figures are not comparable.
In summary, speech emotion recognition is still at an exploratory research stage, and many problems and difficulties remain to be solved. Where speech emotion technology is currently applied to voice information inquiry systems, the correct recognition rate of emotion is generally low, and a breakthrough in this field will require the joint efforts of all researchers.
Summary of the invention
The purpose of the present invention is to provide an intelligent outbound call system based on emotion recognition, so as to solve the problems raised in the background art above.
In order to solve the above technical problems, the present invention provides the following technical solution: an intelligent outbound call system based on emotion recognition, comprising a voice communication module, a voice acquisition module, an audio dimension analysis module, a text transcription module, a scene model generation module, a prompt generation module, a text semantic analysis module, a database, a state comparison module, a real-time alert module, an agent video recording module, a user video recording module and a display screen. The voice communication module is connected to the voice acquisition module by a signal connection; the voice acquisition module is connected to the audio dimension analysis module; the audio dimension analysis module and the text semantic analysis module are connected to the text transcription module; the agent video recording module and the user video recording module are connected to the text semantic analysis module; the text transcription module is connected to the scene model generation module; the scene model generation module is connected to the prompt generation module and to the database; the database is connected to the state comparison module; the state comparison module is connected to the prompt generation module; and the prompt generation module is connected to the display screen.
According to the above technical solution, the state comparison module is connected to the real-time alert module by a signal connection.
According to the above technical solution, the agent video recording module and the user video recording module are connected to the database by signal connections.
According to the above technical solution, the database and the scene model generation module are bidirectionally connected.
According to the above technical solution, the audio dimension analysis module comprises a speech-rate speech-signal feature analysis unit, an amplitude speech-signal feature analysis unit and a fundamental-frequency speech-signal feature analysis unit.
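As a rough illustration of the three analysis units named above, the sketch below computes one feature per dimension from a single audio frame: RMS energy for amplitude, zero-crossing rate as a crude speech-rate proxy, and an autocorrelation estimate of fundamental frequency. The specific estimators are our own assumptions for illustration; the patent does not disclose its feature algorithms.

```python
import numpy as np

def frame_features(frame, sr=8000):
    """One illustrative feature per analysis unit: RMS energy (amplitude),
    zero-crossing rate (a crude speech-rate proxy), and an autocorrelation
    estimate of fundamental frequency in a 60-400 Hz search band.
    Assumes len(frame) > sr // 60 so the pitch-lag search stays in range."""
    rms = float(np.sqrt(np.mean(frame ** 2)))              # amplitude
    zcr = float(np.mean(np.abs(np.diff(np.sign(frame)))) / 2)  # rate proxy
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = sr // 400, sr // 60                           # candidate pitch lags
    lag = lo + int(np.argmax(ac[lo:hi]))
    f0 = sr / lag                                          # fundamental frequency
    return rms, zcr, f0
```

In a full system, these per-frame values would be aggregated over an utterance before being passed on for emotion classification.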
According to the above technical solution, the audio dimension analysis module is based on a Parzen probabilistic neural network.
According to the above technical solution, the state comparison module comprises a historical-baseline comparison unit and an average-reference-value comparison unit.
Compared with the prior art, the beneficial effects of the present invention are as follows. This intelligent outbound call system based on emotion recognition applies text-dependent, speaker-independent speech emotion recognition to a voice information inquiry system. It uses Bayes minimum-error-rate decision theory to determine an optimal threshold and proposes a new speech-signal endpoint detection algorithm. It studies three classes of speech-signal features, namely speech rate, amplitude and fundamental frequency, analyzes the effectiveness of these features for emotion classification using fuzzy entropy theory, and then selects an optimal combination of feature parameters for speech emotion recognition. It studies classifiers suitable for speech emotion recognition and uses a Parzen probabilistic neural network to recognize the speech emotional state, substantially improving the overall recognition rate of the system.
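A Parzen probabilistic neural network of the kind referred to above centres one Gaussian kernel on every training pattern and scores each emotion class by its average kernel response. The sketch below is a minimal, hypothetical illustration; the `sigma` smoothing parameter and the toy labels are ours, not values disclosed in the patent.

```python
import numpy as np

def pnn_classify(train_X, train_y, x, sigma=0.5):
    """Parzen probabilistic neural network: a Gaussian kernel is centred on
    every training pattern, each class is scored by its mean kernel response,
    and the class with the largest score is returned."""
    scores = {}
    for label in np.unique(train_y):
        patterns = train_X[train_y == label]
        d2 = np.sum((patterns - x) ** 2, axis=1)       # squared distances to x
        scores[label] = float(np.mean(np.exp(-d2 / (2.0 * sigma ** 2))))
    return max(scores, key=scores.get)
```

In this system, `train_X` would hold aggregated speech-rate/amplitude/fundamental-frequency feature vectors and `train_y` the emotion labels; `sigma` controls how much the kernel density estimate is smoothed.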
Description of the drawings
The accompanying drawings are provided for further understanding of the present invention and constitute a part of the specification. Together with the embodiments of the invention, they serve to explain the invention and are not to be construed as limiting it. In the accompanying drawings:
Fig. 1 is a system flow chart of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will be described clearly and completely below in combination with the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
Referring to Fig. 1, the present invention provides a technical solution: an intelligent outbound call system based on emotion recognition, comprising a voice communication module, a voice acquisition module, an audio dimension analysis module, a text transcription module, a scene model generation module, a prompt generation module, a text semantic analysis module, a database, a state comparison module, a real-time alert module, an agent video recording module, a user video recording module and a display screen. The voice communication module is connected to the voice acquisition module by a signal connection, and the voice acquisition module is connected to the audio dimension analysis module. The audio dimension analysis module comprises a speech-rate speech-signal feature analysis unit, an amplitude speech-signal feature analysis unit and a fundamental-frequency speech-signal feature analysis unit, and is based on a Parzen probabilistic neural network. The audio dimension analysis module and the text semantic analysis module are connected to the text transcription module. The agent video recording module and the user video recording module are connected to the text semantic analysis module and to the database. The text transcription module is connected to the scene model generation module; the scene model generation module is connected to the prompt generation module and bidirectionally connected to the database; the database is connected to the state comparison module; the state comparison module, which comprises a historical-baseline comparison unit and an average-reference-value comparison unit, is connected to the real-time alert module and to the prompt generation module; and the prompt generation module is connected to the display screen.
The user and the agent communicate normally by voice through the voice communication module. Meanwhile, the voice acquisition module obtains the audio data streams of the user and the agent; the audio dimension analysis module performs audio dimension analysis on the voice, and the text transcription module transcribes the voice into text for text semantic analysis. The text semantic analysis module combines the user and agent portraits provided by the agent video recording module and the user video recording module with the above analysis results, and the scene model generation module generates a model of the current scene. According to the model result, the prompt generation module and the display screen prompt the agent with the agent's own emotion, the user's emotion and suggestions. Meanwhile, through real-time analysis of the agent's calls, the intonation and speech rate of the agent's voice are compared against the agent's historical baseline values, and the real-time alert module gives real-time alerts on abnormal emotional behavior of the agent. In addition, big-data analysis across a large number of agents yields the relevant voice data of the agents with the best marketing effectiveness, and this standard is used to supervise and guide the marketing of the other agents.
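The baseline comparison just described might be sketched as follows: a live intonation or speech-rate measurement is compared against the agent's own history and against a team average, and an alert flag is raised when the live value drifts beyond `k` standard deviations from the historical mean. All names and the threshold rule are illustrative assumptions; the patent does not specify a comparison formula.

```python
import numpy as np

def check_alert(live_value, history, team_mean, k=2.0):
    """Compare one live measurement (e.g. speech rate or pitch) with the
    agent's historical baseline; flag an alert when it drifts more than
    `k` standard deviations from the historical mean. The rule and the
    default k=2.0 are illustrative assumptions, not the patent's."""
    mu, sd = float(np.mean(history)), float(np.std(history))
    drift = abs(live_value - mu)
    return {
        "baseline_dev": drift,                    # deviation from own history
        "team_dev": abs(live_value - team_mean),  # deviation from team average
        "alert": sd > 0 and drift > k * sd,
    }
```

A real-time alert module could call this once per analysis window and forward any `alert=True` result to the agent's supervisor.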
It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device.
Finally, it should be noted that the foregoing is only a preferred embodiment of the present invention and is not intended to limit the invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of the technical features. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
Claims (7)
1. An intelligent outbound call system based on emotion recognition, comprising a voice communication module, a voice acquisition module, an audio dimension analysis module, a text transcription module, a scene model generation module, a prompt generation module, a text semantic analysis module, a database, a state comparison module, a real-time alert module, an agent video recording module, a user video recording module and a display screen, characterized in that: the voice communication module is connected to the voice acquisition module by a signal connection; the voice acquisition module is connected to the audio dimension analysis module by a signal connection; the audio dimension analysis module and the text semantic analysis module are connected to the text transcription module by signal connections; the agent video recording module and the user video recording module are connected to the text semantic analysis module by signal connections; the text transcription module is connected to the scene model generation module by a signal connection; the scene model generation module is connected to the prompt generation module and to the database by signal connections; the database is connected to the state comparison module by a signal connection; the state comparison module is connected to the prompt generation module by a signal connection; and the prompt generation module is connected to the display screen by a signal connection.
2. The intelligent outbound call system based on emotion recognition according to claim 1, characterized in that: the state comparison module is connected to the real-time alert module by a signal connection.
3. The intelligent outbound call system based on emotion recognition according to claim 1, characterized in that: the agent video recording module and the user video recording module are connected to the database by signal connections.
4. The intelligent outbound call system based on emotion recognition according to claim 1, characterized in that: the database and the scene model generation module are bidirectionally connected.
5. The intelligent outbound call system based on emotion recognition according to claim 1, characterized in that: the audio dimension analysis module comprises a speech-rate speech-signal feature analysis unit, an amplitude speech-signal feature analysis unit and a fundamental-frequency speech-signal feature analysis unit.
6. The intelligent outbound call system based on emotion recognition according to claim 1, characterized in that: the audio dimension analysis module is based on a Parzen probabilistic neural network.
7. The intelligent outbound call system based on emotion recognition according to claim 1, characterized in that: the state comparison module comprises a historical-baseline comparison unit and an average-reference-value comparison unit.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201910303368.8A (published as CN109994102A) | 2019-04-16 | 2019-04-16 | Intelligent outbound call system based on emotion recognition
Publications (1)
Publication Number | Publication Date
---|---
CN109994102A | 2019-07-09
Family ID: 67133635
Family Applications (1)
Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN201910303368.8A (pending) | Intelligent outbound call system based on emotion recognition | 2019-04-16 | 2019-04-16
Country Status (1)
Country | Link
---|---
CN | CN109994102A (en)
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN112215927A (granted as CN112215927B, 2023-06-23) | 2020-09-18 | 2021-01-12 | 腾讯科技(深圳)有限公司 | Method, device, equipment and medium for synthesizing face video
CN112651237A (granted as CN112651237B, 2024-03-19) | 2019-10-11 | 2021-04-13 | 武汉渔见晚科技有限责任公司 | User portrait establishing method and device based on user emotion standpoint and user portrait visualization method
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
US9299343B1 | 2014-03-31 | 2016-03-29 | Noble Systems Corporation | Contact center speech analytics system having multiple speech analytics engines
CN105700682A | 2016-01-08 | 2016-06-22 | 北京乐驾科技有限公司 | Intelligent gender and emotion recognition detection system and method based on vision and voice
CN107256392A | 2017-06-05 | 2017-10-17 | 南京邮电大学 | Comprehensive emotion recognition method combining image and voice
CN107480270A | 2017-08-18 | 2017-12-15 | 北京点易通科技有限公司 | Real-time personalized recommendation method and system based on user feedback data streams
CN108174046A | 2017-11-10 | 2018-06-15 | 大连金慧融智科技股份有限公司 | Personnel monitoring system and method for a call center
CN108764010A | 2018-03-23 | 2018-11-06 | 姜涵予 | Emotional state determination method and device
Worldwide Applications (1)
- 2019-04-16: CN application CN201910303368.8A (published as CN109994102A), status: Pending
Similar Documents
- Zhou et al.: Modality attention for end-to-end audio-visual speech recognition
- CN103700370B: Radio and television speech recognition method and system
- CN105700682A: Intelligent gender and emotion recognition detection system and method based on vision and voice
- US20040122675A1: Visual feature extraction procedure useful for audiovisual continuous speech recognition
- CN105446146A: Intelligent terminal control method, system and intelligent terminal based on semantic analysis
- CN109994102A: Intelligent outbound call system based on emotion recognition
- Ntalampiras et al.: Acoustic detection of human activities in natural environments
- Dov et al.: Kernel-based sensor fusion with application to audio-visual voice activity detection
- JP5302505B2: Dialog status separation estimation method, dialog status estimation method, dialog status estimation system, and dialog status estimation program
- CN112165599A: Automatic conference summary generation method for video conferences
- US8954327B2: Voice data analyzing device, voice data analyzing method, and voice data analyzing program
- Karanasou et al.: Speaker diarisation and longitudinal linking in multi-genre broadcast data
- US11194303B2: Method and system for anomaly detection and notification through profiled context
- US8335332B2: Fully learning classification system and method for hearing aids
- KR100308028B1: Method and apparatus for adaptive speech detection and computer-readable medium using the method
- CN109192197A: Internet-based big data speech recognition system
- CN113436618A: Signal accuracy adjusting system for voice instruction capture
- Ferras et al.: System fusion and speaker linking for longitudinal diarization of TV shows
- Krishnakumar et al.: A comparison of boosted deep neural networks for voice activity detection
- Imoto et al.: Acoustic scene classification based on generative model of acoustic spatial words for distributed microphone array
- Chen et al.: VB-HMM speaker diarization with enhanced and refined segment representation
- Zhang et al.: A novel speaker clustering algorithm via supervised affinity propagation
- US20130295973A1: Method and apparatus for managing interruptions from different modes of communication
- KR20050058161A: Speech recognition method and device by integrating audio, visual and contextual features based on neural networks
- Han et al.: Robust speaker clustering strategies to data source variation for improved speaker diarization
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190709