CN109994102A - An intelligent outbound call system based on emotion recognition - Google Patents

An intelligent outbound call system based on emotion recognition

Info

Publication number
CN109994102A
Authority
CN
China
Prior art keywords
module
signal
connect
voice
paging system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910303368.8A
Other languages
Chinese (zh)
Inventor
朱宇光 (Zhu Yuguang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Upper Hainan Airlines Move Science And Technology Ltd
Original Assignee
Upper Hainan Airlines Move Science And Technology Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Upper Hainan Airlines Move Science And Technology Ltd filed Critical Upper Hainan Airlines Move Science And Technology Ltd
Priority to CN201910303368.8A, Critical, CN109994102A (en)
Publication of CN109994102A, Critical, CN109994102A (en)
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G10L15/26 Speech to text systems
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/50 Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers; Centralised arrangements for recording messages
    • H04M3/51 Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing
    • H04M3/5166 Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing, in combination with interactive voice response systems or voice portals, e.g. as front-ends

Abstract

The invention discloses an intelligent outbound call system based on emotion recognition, comprising a voice communication module, a voice acquisition module, an audio dimension analysis module, a text transcription module, a scene model generation module, a prompt generation module, a text semantic analysis module, a database, a state comparison module, a real-time alert module, an agent video recording module, a user video recording module, and a display screen. The voice communication module is connected to the voice acquisition module by a signal link; the voice acquisition module is connected to the audio dimension analysis module; the audio dimension analysis module and the text semantic analysis module are connected to the text transcription module; and the agent video recording module and the user video recording module are connected to the text semantic analysis module. By adding voice- and text-based artificial-intelligence analysis to ordinary agent outbound calls, the system monitors and guides the emotions of both parties, making the whole call more standardized and humanized and improving the user experience.

Description

An intelligent outbound call system based on emotion recognition
Technical field
The present invention relates to the field of speech emotion processing technology, and in particular to an intelligent outbound call system based on emotion recognition.
Background technique
In pattern recognition, researchers around the world have applied almost every available technique to speech emotion processing; new methods and comparisons emerge constantly, and neural network classifiers, Bayes classifiers, K-nearest-neighbor classifiers, SVMs, GMMs, and HMM classifiers have all been used. Although a great deal of research has been carried out on speech emotion recognition, the field of speech emotion information processing as a whole remains at a relatively low level. First, the effective features that can be extracted are limited: almost all researchers use prosodic features, combinations of them, or derived features as analysis parameters. Second, although many different pattern-recognition methods have been applied, the data used differ from project to project, so there is little possibility of direct comparison between studies. The research objects in the literature vary widely, and so do the results; recognition rates alone range from 53% to 90%, yet a method with a higher recognition rate cannot simply be declared better than one with a lower rate, because the results are not comparable.
In summary, speech emotion recognition is still at an exploratory research stage, with many problems and difficulties left to solve. When speech emotion technology is applied to voice information query systems today, emotion recognition accuracy is generally low, and a breakthrough in this field will require the joint efforts of all researchers.
Summary of the invention
The purpose of the present invention is to provide an intelligent outbound call system based on emotion recognition, so as to solve the problems raised in the background above.
To solve the above technical problems, the present invention provides the following technical solution: an intelligent outbound call system based on emotion recognition, comprising a voice communication module, a voice acquisition module, an audio dimension analysis module, a text transcription module, a scene model generation module, a prompt generation module, a text semantic analysis module, a database, a state comparison module, a real-time alert module, an agent video recording module, a user video recording module, and a display screen. The voice communication module is connected to the voice acquisition module by a signal link; the voice acquisition module is connected to the audio dimension analysis module; the audio dimension analysis module and the text semantic analysis module are connected to the text transcription module; the agent video recording module and the user video recording module are connected to the text semantic analysis module; the text transcription module is connected to the scene model generation module; the scene model generation module is connected to the prompt generation module and the database respectively; the database is connected to the state comparison module; the state comparison module is connected to the prompt generation module; and the prompt generation module is connected to the display screen.
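The signal connections enumerated above can be sketched as a small directed graph. The module names below follow the specification; the dictionary representation and the `reachable` helper are purely illustrative and are not part of the patent.

```python
# Illustrative sketch of the module topology described in the summary.
# Edges follow the "A is connected to B" statements in the specification.
CONNECTIONS = {
    "voice_communication": ["voice_acquisition"],
    "voice_acquisition": ["audio_dimension_analysis"],
    "audio_dimension_analysis": ["text_transcription"],
    "text_semantic_analysis": ["text_transcription"],
    "agent_video_recording": ["text_semantic_analysis", "database"],
    "user_video_recording": ["text_semantic_analysis", "database"],
    "text_transcription": ["scene_model_generation"],
    "scene_model_generation": ["prompt_generation", "database"],
    "database": ["state_comparison", "scene_model_generation"],
    "state_comparison": ["prompt_generation", "realtime_alert"],
    "prompt_generation": ["display_screen"],
}

def reachable(src, dst, graph):
    """Depth-first search: is there a signal path from src to dst?"""
    seen, stack = set(), [src]
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return False

# Speech captured from the call should flow all the way to the display
# screen that shows emotion prompts to the agent.
assert reachable("voice_communication", "display_screen", CONNECTIONS)
```

A check like this makes the pipeline direction explicit: audio flows from the call toward the prompt display, while alerts branch off at the state comparison module.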
In the above technical solution, the state comparison module is connected to the real-time alert module by a signal link.
In the above technical solution, the agent video recording module and the user video recording module are connected to the database by signal links.
In the above technical solution, the database and the scene model generation module are bidirectionally connected.
In the above technical solution, the audio dimension analysis module comprises a speech-rate signal feature analysis unit, an amplitude signal feature analysis unit, and a fundamental-frequency signal feature analysis unit.
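As a rough illustration of the three feature classes named above (speech rate, amplitude, and fundamental frequency), the sketch below computes simple textbook estimates of each. The short-time amplitude statistics, the autocorrelation-based F0 estimate, and all function names are assumptions for illustration, not the patent's actual algorithms.

```python
# Toy estimators for the three audio-dimension feature classes.
import math

def amplitude_features(frame):
    """Mean absolute amplitude and peak amplitude of one frame."""
    mean_abs = sum(abs(x) for x in frame) / len(frame)
    peak = max(abs(x) for x in frame)
    return mean_abs, peak

def fundamental_frequency(frame, sample_rate):
    """Crude F0 estimate: lag of the autocorrelation maximum."""
    best_lag, best_corr = 0, 0.0
    for lag in range(20, len(frame) // 2):   # skip tiny lags (noise)
        corr = sum(frame[i] * frame[i - lag] for i in range(lag, len(frame)))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag if best_lag else 0.0

def speech_rate(voiced_flags, frame_dur_s):
    """Proxy for speech rate: voiced bursts (rising edges) per second."""
    bursts = sum(1 for prev, cur in zip(voiced_flags, voiced_flags[1:])
                 if cur and not prev)
    return bursts / (len(voiced_flags) * frame_dur_s)

# A 200 Hz sine sampled at 8 kHz: the F0 estimate should land near 200 Hz.
sr = 8000
frame = [math.sin(2 * math.pi * 200 * n / sr) for n in range(400)]
f0 = fundamental_frequency(frame, sr)
```

In a real system these would run per analysis frame over each party's audio stream before classification.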
In the above technical solution, the audio dimension analysis module is based on a Parzen probabilistic neural network.
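A Parzen probabilistic neural network (PNN) scores each emotion class with a kernel density estimate built from that class's training samples and picks the class with the highest density. The sketch below is a generic textbook PNN, not the patent's trained model; the smoothing parameter `sigma` and the toy two-dimensional features are illustrative assumptions.

```python
# Minimal Parzen probabilistic neural network classifier.
import math

def _gaussian_kernel(x, center, sigma):
    """Parzen window: isotropic Gaussian kernel around one training sample."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def pnn_classify(x, train, sigma=0.5):
    """train: {class_label: [feature_vectors]}. Returns the class whose
    Parzen density estimate at x is largest (Bayes decision, equal priors)."""
    scores = {
        label: sum(_gaussian_kernel(x, s, sigma) for s in samples) / len(samples)
        for label, samples in train.items()
    }
    return max(scores, key=scores.get)

# Toy 2-D features (e.g. normalized speech rate and normalized F0):
train = {
    "calm":  [(0.2, 0.3), (0.25, 0.35), (0.3, 0.3)],
    "angry": [(0.8, 0.9), (0.85, 0.8), (0.9, 0.85)],
}
label = pnn_classify((0.82, 0.88), train)   # lies in the "angry" cluster
```

A PNN of this kind needs no iterative training, which is one reason it is popular for small emotion-labeled speech datasets.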
In the above technical solution, the state comparison module comprises a historical reference value comparison unit and an average reference value comparison unit.
Compared with the prior art, the beneficial effects of the present invention are as follows. This intelligent outbound call system based on emotion recognition applies text-dependent, speaker-independent speech emotion recognition to a voice information query system. It uses Bayes minimum-error-rate decision theory to determine an optimal threshold and proposes a new speech signal endpoint detection algorithm. It studies three classes of speech signal features (speech rate, amplitude, and fundamental frequency), uses fuzzy entropy theory to analyze the effectiveness of these features for emotion classification, and then selects the optimal combination of feature parameters for speech emotion recognition. It also studies classifiers suitable for speech emotion recognition and completes the recognition of speech emotional states with a Parzen probabilistic neural network, greatly improving the overall recognition rate of the system.
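The Bayes minimum-error-rate threshold idea behind the endpoint detection above can be illustrated for a one-dimensional frame-energy feature: model "silence" and "speech" energies as Gaussians and place the decision boundary where the two class densities cross. The Gaussian model, the equal-priors assumption, and the numbers below are simplifications for illustration, not the patent's actual endpoint detector.

```python
# Bayes minimum-error-rate threshold for two 1-D Gaussian classes.
import math

def bayes_threshold(mu0, sd0, mu1, sd1):
    """Energy threshold where N(mu0, sd0) and N(mu1, sd1) densities cross
    (equal priors). Solves the quadratic from equating the log densities."""
    if abs(sd0 - sd1) < 1e-12:            # equal variances: midpoint
        return (mu0 + mu1) / 2.0
    a = 1.0 / (2 * sd0 ** 2) - 1.0 / (2 * sd1 ** 2)
    b = mu1 / sd1 ** 2 - mu0 / sd0 ** 2
    c = (mu0 ** 2 / (2 * sd0 ** 2) - mu1 ** 2 / (2 * sd1 ** 2)
         + math.log(sd0 / sd1))
    disc = math.sqrt(b * b - 4 * a * c)
    roots = [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]
    # keep the crossing that lies between the two class means
    return next(r for r in roots if min(mu0, mu1) <= r <= max(mu0, mu1))

def is_speech(frame_energy, threshold):
    """Endpoint decision for one frame: energy above the Bayes threshold."""
    return frame_energy > threshold

# Hypothetical class statistics: quiet silence vs. louder, more variable speech.
t = bayes_threshold(mu0=0.1, sd0=0.05, mu1=0.8, sd1=0.2)
```

Frames are then labeled speech/silence by comparing their energy to `t`, and the utterance endpoints are read off the label sequence.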
Detailed description of the invention
The accompanying drawings are provided for further understanding of the present invention and constitute part of the specification; together with the embodiments they serve to explain the invention and are not to be construed as limiting it. In the drawings:
Fig. 1 is a system flow chart of the invention.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
Referring to Fig. 1, the present invention provides a technical solution: an intelligent outbound call system based on emotion recognition, comprising a voice communication module, a voice acquisition module, an audio dimension analysis module, a text transcription module, a scene model generation module, a prompt generation module, a text semantic analysis module, a database, a state comparison module, a real-time alert module, an agent video recording module, a user video recording module, and a display screen. The voice communication module is connected to the voice acquisition module by a signal link, and the voice acquisition module is connected to the audio dimension analysis module. The audio dimension analysis module comprises a speech-rate signal feature analysis unit, an amplitude signal feature analysis unit, and a fundamental-frequency signal feature analysis unit, and is based on a Parzen probabilistic neural network. The audio dimension analysis module and the text semantic analysis module are connected to the text transcription module. The agent video recording module and the user video recording module are connected to the text semantic analysis module and to the database. The text transcription module is connected to the scene model generation module; the scene model generation module is connected to the prompt generation module and the database respectively, with the database and the scene model generation module bidirectionally connected. The database is connected to the state comparison module; the state comparison module, which comprises a historical reference value comparison unit and an average reference value comparison unit, is connected to the real-time alert module and to the prompt generation module; the scene model generation module is connected to the prompt generation module; and the prompt generation module is connected to the display screen.
In operation, the user and the agent communicate normally by voice through the voice communication module. Meanwhile, the voice acquisition module obtains the audio data streams of the user and the agent, the audio dimension analysis module performs audio dimension analysis on the speech, and the text transcription module transcribes the speech to text for text semantic analysis. The text semantic analysis module combines the user and agent portraits provided by the agent video recording module and the user video recording module with the above analysis results, and the scene model generation module generates a model of the current scene. Based on the model result, the prompt generation module and the display screen prompt the agent with the agent's own emotion, the user's emotion, and suggestions. Meanwhile, through real-time analysis of the agent's call, the agent's intonation and speech-rate values are compared with that agent's historical reference values, and the real-time alert module gives real-time alerts on abnormal emotional behavior by the agent. In addition, big-data analysis over a large number of agents yields the characteristic voice data of the agents with the best marketing performance, which is used as a standard to supervise and guide the marketing of other agents.
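The comparison of a live call against an agent's historical reference values and against an average reference value, with real-time alerts on abnormal deviation, can be sketched as follows. The z-score test, the alert thresholds, and the feature names are assumptions for illustration, not values taken from the patent.

```python
# Illustrative sketch of the state comparison module's two units:
# compare live features against the agent's own history and against
# an average reference value, raising alerts on large deviations.
from statistics import mean, stdev

def z_score(value, history):
    """How many standard deviations the live value sits from history."""
    sd = stdev(history)
    return 0.0 if sd == 0 else (value - mean(history)) / sd

def check_agent(live, agent_history, team_average, z_limit=2.0):
    """Return alert messages for features that deviate abnormally."""
    alerts = []
    for feature, value in live.items():
        if abs(z_score(value, agent_history[feature])) > z_limit:
            alerts.append(f"{feature}: abnormal vs. agent's own history")
        elif abs(value - team_average[feature]) > 0.5 * abs(team_average[feature]):
            alerts.append(f"{feature}: far from team average")
    return alerts

# Hypothetical per-call feature summaries for one agent:
history = {"speech_rate": [4.1, 4.3, 4.0, 4.2, 4.1],   # syllables/sec
           "pitch_mean": [180, 175, 182, 178, 181]}     # Hz
team_avg = {"speech_rate": 4.2, "pitch_mean": 180}
alerts = check_agent({"speech_rate": 6.5, "pitch_mean": 179},
                     history, team_avg)
```

Here a sudden jump in speech rate (often a sign of agitation) triggers an alert while a normal pitch does not, which mirrors the real-time alert behavior described above.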
It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device.
Finally, it should be noted that the foregoing are only preferred embodiments of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of the technical features. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (7)

1. An intelligent outbound call system based on emotion recognition, comprising a voice communication module, a voice acquisition module, an audio dimension analysis module, a text transcription module, a scene model generation module, a prompt generation module, a text semantic analysis module, a database, a state comparison module, a real-time alert module, an agent video recording module, a user video recording module, and a display screen, characterized in that: the voice communication module is connected to the voice acquisition module by a signal link; the voice acquisition module is connected to the audio dimension analysis module; the audio dimension analysis module and the text semantic analysis module are connected to the text transcription module; the agent video recording module and the user video recording module are connected to the text semantic analysis module; the text transcription module is connected to the scene model generation module; the scene model generation module is connected to the prompt generation module and the database respectively; the database is connected to the state comparison module; the state comparison module is connected to the prompt generation module; and the prompt generation module is connected to the display screen.
2. The intelligent outbound call system based on emotion recognition according to claim 1, characterized in that: the state comparison module is connected to the real-time alert module by a signal link.
3. The intelligent outbound call system based on emotion recognition according to claim 1, characterized in that: the agent video recording module and the user video recording module are connected to the database by signal links.
4. The intelligent outbound call system based on emotion recognition according to claim 1, characterized in that: the database and the scene model generation module are bidirectionally connected.
5. The intelligent outbound call system based on emotion recognition according to claim 1, characterized in that: the audio dimension analysis module comprises a speech-rate signal feature analysis unit, an amplitude signal feature analysis unit, and a fundamental-frequency signal feature analysis unit.
6. The intelligent outbound call system based on emotion recognition according to claim 1, characterized in that: the audio dimension analysis module is based on a Parzen probabilistic neural network.
7. The intelligent outbound call system based on emotion recognition according to claim 1, characterized in that: the state comparison module comprises a historical reference value comparison unit and an average reference value comparison unit.
CN201910303368.8A 2019-04-16 2019-04-16 An intelligent outbound call system based on emotion recognition Pending CN109994102A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910303368.8A CN109994102A (en) 2019-04-16 2019-04-16 An intelligent outbound call system based on emotion recognition


Publications (1)

Publication Number Publication Date
CN109994102A true CN109994102A (en) 2019-07-09

Family

ID=67133635

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910303368.8A Pending CN109994102A (en) 2019-04-16 2019-04-16 An intelligent outbound call system based on emotion recognition

Country Status (1)

Country Link
CN (1) CN109994102A (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9299343B1 (en) * 2014-03-31 2016-03-29 Noble Systems Corporation Contact center speech analytics system having multiple speech analytics engines
CN105700682A (en) * 2016-01-08 2016-06-22 北京乐驾科技有限公司 Intelligent gender and emotion recognition detection system and method based on vision and voice
CN107256392A (en) * 2017-06-05 2017-10-17 南京邮电大学 A kind of comprehensive Emotion identification method of joint image, voice
CN107480270A (en) * 2017-08-18 2017-12-15 北京点易通科技有限公司 A kind of real time individual based on user feedback data stream recommends method and system
CN108174046A (en) * 2017-11-10 2018-06-15 大连金慧融智科技股份有限公司 A kind of personnel monitoring system and method for call center
CN108764010A (en) * 2018-03-23 2018-11-06 姜涵予 Emotional state determines method and device


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112651237A (en) * 2019-10-11 2021-04-13 武汉渔见晚科技有限责任公司 User portrait establishing method and device based on user emotion standpoint and user portrait visualization method
CN112651237B (en) * 2019-10-11 2024-03-19 武汉渔见晚科技有限责任公司 User portrait establishing method and device based on user emotion standpoint and user portrait visualization method
CN112215927A (en) * 2020-09-18 2021-01-12 腾讯科技(深圳)有限公司 Method, device, equipment and medium for synthesizing face video
CN112215927B (en) * 2020-09-18 2023-06-23 腾讯科技(深圳)有限公司 Face video synthesis method, device, equipment and medium

Similar Documents

Publication Publication Date Title
Zhou et al. Modality attention for end-to-end audio-visual speech recognition
CN103700370B (en) A kind of radio and television speech recognition system method and system
CN105700682A (en) Intelligent gender and emotion recognition detection system and method based on vision and voice
US20040122675A1 (en) Visual feature extraction procedure useful for audiovisual continuous speech recognition
CN105446146A (en) Intelligent terminal control method based on semantic analysis, system and intelligent terminal
CN109994102A (en) An intelligent outbound call system based on emotion recognition
Ntalampiras et al. Acoustic detection of human activities in natural environments
Dov et al. Kernel-based sensor fusion with application to audio-visual voice activity detection
JP5302505B2 (en) Dialog status separation estimation method, dialog status estimation method, dialog status estimation system, and dialog status estimation program
CN112165599A (en) Automatic conference summary generation method for video conference
US8954327B2 (en) Voice data analyzing device, voice data analyzing method, and voice data analyzing program
Karanasou et al. Speaker diarisation and longitudinal linking in multi-genre broadcast data
US11194303B2 (en) Method and system for anomaly detection and notification through profiled context
US8335332B2 (en) Fully learning classification system and method for hearing aids
KR100308028B1 (en) method and apparatus for adaptive speech detection and computer-readable medium using the method
CN109192197A (en) Big data speech recognition system Internet-based
CN113436618A (en) Signal accuracy adjusting system for voice instruction capture
Ferras et al. System fusion and speaker linking for longitudinal diarization of TV shows
Krishnakumar et al. A comparison of boosted deep neural networks for voice activity detection
Imoto et al. Acoustic scene classification based on generative model of acoustic spatial words for distributed microphone array
Chen et al. VB-HMM Speaker Diarization with Enhanced and Refined Segment Representation.
Zhang et al. A novel speaker clustering algorithm via supervised affinity propagation
US20130295973A1 (en) Method and apparatus for managing interruptions from different modes of communication
KR20050058161A (en) Speech recognition method and device by integrating audio, visual and contextual features based on neural networks
Han et al. Robust speaker clustering strategies to data source variation for improved speaker diarization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190709