KR20170098445A - Situation judgment apparatus based on voice/sound analysis - Google Patents

Situation judgment apparatus based on voice/sound analysis

Info

Publication number
KR20170098445A
Authority
KR
South Korea
Prior art keywords
voice
module
speaker
ambient sound
analyzing
Prior art date
Application number
KR1020160020350A
Other languages
Korean (ko)
Inventor
최종석
임윤섭
서형호
최규태
장준혁
Original Assignee
한국과학기술연구원
(주)트리포스
한양대학교 산학협력단
Priority date
Filing date
Publication date
Application filed by 한국과학기술연구원, (주)트리포스, 한양대학교 산학협력단
Priority to KR1020160020350A
Publication of KR20170098445A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/04 Segmentation; Word boundary detection
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/26 Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
  • Telephone Function (AREA)

Abstract

Disclosed is an apparatus for judging a situation based on voice/sound analysis. The apparatus comprises: a call receiving module which receives the voice and ambient sound of a speaker from the speaker's mobile terminal; an age information inference module which infers the speaker's age by analyzing the voice and ambient sound received by the call receiving module; a gender information inference module which infers the speaker's gender by analyzing the received voice and ambient sound; a psychological state inference module which infers the speaker's psychological state by analyzing the received voice and ambient sound; and a truth/false inference module which infers whether the speaker is telling the truth by analyzing the received voice and ambient sound. Because the apparatus is configured to infer the speaker's age, gender, psychological state, truthfulness, and surrounding situation from the speaker's voice and the ambient sound, it can quickly and accurately identify a false report within a short time and judge the exact situation of a reported incident.

Description

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a situation determination apparatus and, more particularly, to a situation determination apparatus based on voice/sound analysis.

A large number of emergency and criminal incident reports are made every year.

However, false reports account for a significant proportion of these reports.

Emergency calls arrive every few seconds, so the call taker must determine within a short time whether a report is false and make a quick decision early on.

The call taker must not miss a genuine report while being held up for a long time by a false one.

On the other hand, when a report is genuine, the speaker is often flustered and frequently fails to convey the situation properly. In this case, it is necessary not only to judge quickly and accurately whether the report is genuine and how urgent it is from the speaker's voice and the surrounding sound, but also to grasp and infer as much information as possible about the accident scene and the speaker within a short time.

However, it is not easy to grasp a large amount of information accurately and quickly within a short time, and the accuracy and objectivity of such judgments may not be consistent.

It is an object of the present invention to provide a voice/sound analysis based situation determination apparatus.

According to an aspect of the present invention, there is provided an apparatus for determining a situation based on voice/sound analysis, the apparatus comprising: a call receiving module for receiving the voice of a speaker and the ambient sound from the speaker's mobile terminal; an age information inference module for analyzing the voice and the ambient sound received by the call receiving module to infer the age of the speaker; a gender information inference module for analyzing the received voice and ambient sound to infer the gender of the speaker; a psychological state inference module for analyzing the received voice and ambient sound to infer the psychological state of the speaker; and a truth/false inference module for analyzing the received voice and ambient sound to infer whether the speaker is telling the truth.

In this case, the apparatus may further comprise a GPS remote control module which, when the call receiving module receives the voice of the speaker and the ambient sound from the speaker's mobile terminal, turns on the GPS (Global Positioning System) function of the speaker's mobile terminal through the corresponding carrier's server.

According to the voice/sound analysis based situation determination apparatus described above, since the age, gender, psychological state, truthfulness, and surrounding situation of the speaker are inferred from the speaker's voice and the ambient sound, a false report can be identified quickly and accurately within a short time, and the exact situation of a reported incident can be judged.

FIG. 1 is a block diagram of a voice/sound analysis based situation determination apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating a method of determining a context based on speech / sound analysis according to an exemplary embodiment of the present invention.

While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will be described herein in detail. It should be understood, however, that the invention is not limited to the particular embodiments disclosed, but covers all modifications, equivalents, and alternatives falling within the spirit and scope of the invention. Like reference numerals are used for like elements throughout the description of the drawings.

The terms first, second, A, B, etc. may be used to describe various elements, but the elements should not be limited by these terms. The terms are used only to distinguish one element from another. For example, without departing from the scope of the present invention, a first element may be referred to as a second element, and similarly, a second element may be referred to as a first element. The term "and/or" includes any combination of a plurality of related listed items or any item of the plurality of related listed items.

It should be understood that when an element is referred to as being "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intervening elements may be present. In contrast, when an element is referred to as being "directly connected" or "directly coupled" to another element, there are no intervening elements present.

The terminology used in this application is used only to describe specific embodiments and is not intended to limit the invention. The singular expressions include plural expressions unless the context clearly dictates otherwise. In the present application, terms such as "comprises" or "having" are used to specify the presence of stated features, integers, steps, operations, elements, components, or combinations thereof, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Terms such as those defined in commonly used dictionaries are to be interpreted as having meanings consistent with their contextual meanings in the relevant art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in the present application.

Hereinafter, preferred embodiments according to the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of a voice/sound analysis based situation determination apparatus according to an embodiment of the present invention.

Referring to FIG. 1, the voice/sound analysis based situation determination apparatus 100 according to an embodiment of the present invention includes a call receiving module 110, an age information inference module 120, a gender information inference module 130, a psychological state inference module 140, a truth/false inference module 150, an ambient sound inference module 160, an ambient sound database 170, and a GPS remote control module 180.

Hereinafter, the detailed configuration will be described.

The call receiving module 110 may be configured to receive the voice of the speaker and the ambient sound from the speaker's mobile terminal 10. The voice and ambient sound received by the call receiving module 110 may be automatically backed up to a voice call database, and the backed-up voice and ambient sound can be used for real-time analysis.

The call receiving module 110 separates the voice from the ambient sound and extracts each of them. The extracted voice is used for the inferences of the age information inference module 120, the gender information inference module 130, the psychological state inference module 140, and the truth/false inference module 150.
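
The disclosure does not specify how the voice is separated from the ambient sound. Purely as an illustration, a crude frame-energy split can be sketched as follows (the function name, frame size, and threshold ratio are assumptions for illustration; a real system would use a trained voice activity detector):

```python
import numpy as np

def split_voice_ambient(signal, sr=8000, frame_ms=20, threshold_ratio=0.5):
    """Crude energy-based split of a mono signal into voice-like and
    ambient frames. A hypothetical stand-in for the separation logic
    of the call receiving module (110)."""
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).mean(axis=1)
    thresh = threshold_ratio * energy.max()
    voice_mask = energy >= thresh           # high-energy frames -> voice
    voice = frames[voice_mask].ravel()
    ambient = frames[~voice_mask].ravel()
    return voice, ambient

# Synthetic call audio: a loud tone (voice-like) followed by quiet noise.
sr = 8000
t = np.arange(sr) / sr
loud = 0.8 * np.sin(2 * np.pi * 220 * t)
quiet = 0.05 * np.random.default_rng(0).standard_normal(sr)
voice, ambient = split_voice_ambient(np.concatenate([loud, quiet]), sr=sr)
```

The two output streams would then feed the inference modules and the ambient sound database matching, respectively.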

The age information inference module 120 may be configured to infer the age of the speaker by analyzing the voice and the ambient sound received from the call receiving module 110.

Specifically, the age information can be inferred according to the following inference criteria.

In general, several factors cause marked differences between the speech of elderly people and that of young people. In the elderly, the speaking rate is generally slower than in the young, and the rate at which syllables are produced is not constant. In addition, pauses are inserted at inappropriate positions, and there is a tendency toward abnormal pronunciation and articulation.

On the other hand, younger adults show a longer maximum phonation time (MPT) than older adults, which means that the ability to sustain vowels tends to decrease with age. The alternating motion rate (AMR) and sequential motion rate (SMR), which measure the repetition rate and regularity of syllables, are also faster in younger people than in older people.

In addition, the cognitive and motor functions that contribute to speech output decline in the elderly, so the overall speech rate and articulation rate slow down.

The elderly also show a high incidence of voice disorders in both subjective and objective measures, and elderly women show a significantly higher voice disorder index than adult women.

For men, vocal pitch falls until around age 40 to 50 and then rises again, while for women pitch tends to fall with age.

Measurements of jitter and shimmer show that both increase in elderly men, while in elderly women only jitter tends to increase. Here, jitter is the cycle-to-cycle variation in the vocal fold vibration period, and shimmer is the cycle-to-cycle variation in the amplitude of the voice waveform. This tendency indicates a decline in laryngeal function or degenerative change in the laryngeal tissue. The noise-to-harmonic ratio, another indicator of phonation stability, increases significantly in elderly women, which supports the instability of phonation with increasing age.

Among the voice indices, changes caused by degenerative change of the larynx tend to appear most strongly in the jitter of the vocal fold vibration.
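
Jitter and shimmer as defined here can be computed from cycle-by-cycle measurements. The following is a minimal illustrative sketch (the local-percent formulas are the common textbook definitions, and the sample values are hypothetical, not data from this disclosure):

```python
import numpy as np

def jitter_percent(periods):
    """Local jitter: mean absolute difference between consecutive
    glottal cycle periods, relative to the mean period (percent)."""
    periods = np.asarray(periods, dtype=float)
    return 100.0 * np.abs(np.diff(periods)).mean() / periods.mean()

def shimmer_percent(amplitudes):
    """Local shimmer: mean absolute difference between consecutive
    cycle peak amplitudes, relative to the mean amplitude (percent)."""
    amps = np.asarray(amplitudes, dtype=float)
    return 100.0 * np.abs(np.diff(amps)).mean() / amps.mean()

# Hypothetical cycle-by-cycle periods (ms): a stable voice vs a perturbed one.
steady = [8.00, 8.01, 7.99, 8.00, 8.01]
unsteady = [8.0, 8.6, 7.5, 8.4, 7.6]
assert jitter_percent(unsteady) > jitter_percent(steady)
```

Higher jitter/shimmer values, per the criteria above, would weigh toward an "elderly" inference in the age information inference module.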

The gender information inference module 130 may be configured to infer the gender of the speaker by analyzing the voice and the ambient sound received from the call receiving module 110.

The gender information inference module 130 may be configured to deduce gender based on the following criteria.

The gender information inference module 130 can analyze differences in maximum phonation time, fundamental frequency, frequency perturbation (jitter), amplitude perturbation (shimmer), noise-to-harmonic ratio, and the average, maximum, and minimum fundamental frequencies.

There are significant gender differences in the fundamental frequency, frequency perturbation, amplitude perturbation, and maximum fundamental frequency, while the noise-to-harmonic ratio, average fundamental frequency, and minimum fundamental frequency show no significant gender difference. The fundamental frequency also differs significantly between running speech and sustained phonation.
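
The fundamental-frequency criterion above can be illustrated with a toy sketch (the autocorrelation pitch estimator and the 165 Hz decision boundary are illustrative assumptions, not values given in this disclosure):

```python
import numpy as np

def estimate_f0(signal, sr, fmin=60.0, fmax=400.0):
    """Estimate fundamental frequency from the autocorrelation peak,
    searched within a plausible voice pitch range."""
    sig = signal - signal.mean()
    ac = np.correlate(sig, sig, mode="full")[len(sig) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

def infer_gender(f0, boundary_hz=165.0):
    """Toy decision rule on mean F0 alone; the boundary is a common
    rough figure, not a value from the patent."""
    return "male" if f0 < boundary_hz else "female"

# Synthetic stand-ins for voiced speech at typical male/female F0 regions.
sr = 8000
t = np.arange(sr) / sr
low_voice = np.sin(2 * np.pi * 120 * t)
high_voice = np.sin(2 * np.pi * 220 * t)
```

A real module would combine F0 with the perturbation and phonation-time measures listed above rather than thresholding a single feature.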

The psychological state inference module 140 may be configured to infer the psychological state of the speaker by analyzing the voice and the ambient sound received from the call receiving module 110.

The psychological state and intention can be inferred by the following criteria.

First, the personality of the speaker can be inferred from speaking behavior. The extroversion or introversion of the speaker can be judged on the basis of speaking rate, pause length, pause frequency, and relative variation of pitch.

In addition, an emotion inference engine that judges emotional states such as pleasant/unpleasant/stable from EEG/pulse wave sensing information of the speaker can grasp the emotion, personality, psychological state, and intention of the speaker from various aspects.

The truth / false inference module 150 may be configured to infer the truth / falsehood of the speaker by analyzing the voice and ambient sounds received at the call receiving module 110.

The truth / falsehood of a speaker can be inferred by the following criteria.

First, the speaker's answer to the call taker's question can be recorded for about five seconds and analyzed to judge its truth or falsehood.

Here, the call taker can be configured to ask questions of the same pattern, with some expected answers to those questions set in advance, and truth/falsehood can be judged through them.
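
As an illustration of this preset question/answer scheme, a toy keyword-consistency check might look as follows (the questions, keywords, and threshold are hypothetical; the disclosure does not specify the actual analysis):

```python
# Hypothetical preset questions and expected answer keywords, standing in
# for the fixed question patterns the call taker is configured to use.
EXPECTED = {
    "what happened?": ["fire", "accident", "injured"],
    "where are you?": ["street", "building", "near"],
}

def consistency_score(question, answer):
    """Fraction of expected keywords present in the recorded answer.
    A toy stand-in for the unspecified truth/false analysis."""
    keywords = EXPECTED.get(question.lower(), [])
    if not keywords:
        return 0.0
    hits = sum(1 for kw in keywords if kw in answer.lower())
    return hits / len(keywords)

def judge(question, answer, threshold=0.34):
    """Label an answer plausible if enough expected keywords appear."""
    score = consistency_score(question, answer)
    return "plausible" if score >= threshold else "suspect"
```

In practice the module would also draw on the acoustic stress cues discussed above, not keyword matching alone.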

The ambient sound inference module 160 can be configured to use the ambient sound extracted by the call receiving module 110 to grasp the surrounding situation.

The ambient sound inference module 160 can be configured to infer what the ambient sound is by comparing it against the sounds previously stored in the ambient sound database 170.

For example, sounds such as car sounds, human voices, and rain can be stored in advance in the ambient sound database 170, and the surrounding situation can be determined by matching against them.
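
A minimal sketch of such database matching follows (coarse band-energy features with nearest-neighbor lookup; the feature choice and the synthetic reference sounds are assumptions for illustration, not part of the disclosure):

```python
import numpy as np

def band_energy_features(signal, sr, n_bands=8):
    """Coarse spectral signature: energy in n_bands equal-width
    frequency bands, normalized to sum to 1."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    bands = np.array_split(spectrum, n_bands)
    feats = np.array([b.sum() for b in bands])
    return feats / feats.sum()

def classify_ambient(signal, sr, database):
    """Nearest-neighbor match against pre-stored reference features,
    standing in for the ambient sound database (170)."""
    feats = band_energy_features(signal, sr)
    return min(database, key=lambda name: np.linalg.norm(feats - database[name]))

# Synthetic references: a low-frequency hum (car-like) and broadband noise (rain-like).
sr = 8000
t = np.arange(sr) / sr
rng = np.random.default_rng(1)
db = {
    "car sound": band_energy_features(np.sin(2 * np.pi * 100 * t), sr),
    "rain sound": band_energy_features(rng.standard_normal(sr), sr),
}
```

A production system would use richer features (e.g. MFCCs) and many recorded references per class, but the lookup structure would be the same.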

When the call receiving module 110 receives the voice of the speaker and the ambient sound from the speaker's mobile terminal 10, the GPS remote control module 180 turns on the GPS (Global Positioning System) function of the speaker's mobile terminal 10 through the corresponding carrier's server.

It is preferable that the GPS remote control module 180 turns on the GPS function of the speaker's mobile terminal 10 and monitors its GPS coordinates in real time, because location tracking via base stations or Wi-Fi has significant error.

The user terminal 20 may display, in real time, the age information inferred by the age information inference module 120, the gender information inferred by the gender information inference module 130, the psychological state inferred by the psychological state inference module 140, the truth/falsehood inferred by the truth/false inference module 150, and the surrounding situation inferred by the ambient sound inference module 160.

Also, the user terminal 20 can be configured to display in real time the GPS coordinates of the speaker's mobile terminal 10 received by the GPS remote control module 180.

FIG. 2 is a flowchart illustrating a method of determining a context based on speech / sound analysis according to an exemplary embodiment of the present invention.

Referring to FIG. 2, the speaker's mobile terminal 10 transmits the voice of the speaker and the ambient sound (S101).

Next, the call reception module 110 receives the voice of the speaking party and the ambient sound from the speaking terminal 10 (S102).

Next, the age information reasoning module 120 analyzes the received voice and the ambient sound to infer the age information of the speaker (S103).

Next, the gender information reasoning module 130 analyzes the received voice and the surrounding sound to infer the gender information of the speaker (S104).

Next, the psychological state inference module 140 analyzes the received voice and the ambient sound to infer the psychological state of the speaker (S105).

Next, the truth / false inference module 150 analyzes the received voice and ambient sound to infer the truth / falsehood of the speaker (S106).

Next, when the call receiving module 110 receives the voice of the speaker and the ambient sound from the speaker's mobile terminal 10, the GPS remote control module 180 turns on the GPS (Global Positioning System) function of the speaker's mobile terminal 10 through the corresponding carrier's server and receives GPS coordinates from the terminal 10 in real time (S107).
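
The S101 to S107 flow can be summarized as a pipeline sketch (all stand-in functions, feature names, and thresholds below are hypothetical; the real modules would implement the analyses described above):

```python
from dataclasses import dataclass

@dataclass
class SituationReport:
    age_group: str
    gender: str
    psychological_state: str
    truthfulness: str
    surroundings: str
    gps_on: bool

def handle_call(voice, ambient, terminal):
    """Sketch of the S101-S107 flow. `voice` and `ambient` are dicts of
    hypothetical pre-extracted features; `terminal` is a dict standing
    in for the speaker's mobile terminal (10)."""
    terminal["gps"] = True  # S107: GPS function turned on remotely
    return SituationReport(
        age_group="elderly" if voice.get("speech_rate", 1.0) < 0.8 else "adult",   # S103
        gender="male" if voice.get("f0", 200) < 165 else "female",                 # S104
        psychological_state="agitated" if voice.get("pitch_var", 0) > 0.3 else "stable",  # S105
        truthfulness="plausible",                                                  # S106 (stubbed)
        surroundings=ambient.get("label", "unknown"),                              # ambient match
        gps_on=terminal["gps"],
    )

# Example call with hypothetical feature values.
report = handle_call(
    {"speech_rate": 0.6, "f0": 120, "pitch_var": 0.5},
    {"label": "car sound"},
    {"gps": False},
)
```

The `SituationReport` fields correspond to what the user terminal 20 would display in real time.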

It will be apparent to those skilled in the art that various modifications and variations can be made in the present invention without departing from the spirit or scope of the invention as defined in the following claims.

110: call receiving module
120: age information inference module
130: gender information inference module
140: psychological state inference module
150: truth/false inference module
160: ambient sound inference module
170: ambient sound database
180: GPS remote control module

Claims (2)

A call receiving module for receiving the voice of a speaker and the ambient sound from the speaker's mobile terminal;
an age information inference module for analyzing the voice and the ambient sound received from the call receiving module to infer the age of the speaker;
a gender information inference module for analyzing the voice and the ambient sound received from the call receiving module to infer the gender of the speaker;
a psychological state inference module for analyzing the voice and the ambient sound received from the call receiving module to infer the psychological state of the speaker; and
a truth/false inference module for analyzing the voice and the ambient sound received from the call receiving module to infer the truth/falsehood of the speaker.
The apparatus according to claim 1, further comprising:
a GPS remote control module which, when the call receiving module receives the voice of the speaker and the ambient sound from the speaker's mobile terminal, turns on the GPS (Global Positioning System) function of the speaker's mobile terminal through the corresponding carrier's server.
KR1020160020350A 2016-02-22 2016-02-22 Situation judgment apparatus based on voice/sound analysis KR20170098445A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020160020350A KR20170098445A (en) 2016-02-22 2016-02-22 Situation judgment apparatus based on voice/sound analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020160020350A KR20170098445A (en) 2016-02-22 2016-02-22 Situation judgment apparatus based on voice/sound analysis

Publications (1)

Publication Number Publication Date
KR20170098445A true KR20170098445A (en) 2017-08-30

Family

ID=59760571

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020160020350A KR20170098445A (en) 2016-02-22 2016-02-22 Situation judgment apparatus based on voice/sound analysis

Country Status (1)

Country Link
KR (1) KR20170098445A (en)

Similar Documents

Publication Publication Date Title
ES2242634T3 (en) TELEPHONE EMOTION DETECTOR WITH OPERATOR FEEDBACK.
US6427137B2 (en) System, method and article of manufacture for a voice analysis system that detects nervousness for preventing fraud
JP5834449B2 (en) Utterance state detection device, utterance state detection program, and utterance state detection method
CN109460752B (en) Emotion analysis method and device, electronic equipment and storage medium
US8784311B2 (en) Systems and methods of screening for medical states using speech and other vocal behaviors
ES2261706T3 (en) METHOD AND APPARATUS FOR CONVERSATION ANALYSIS.
WO2019084214A1 (en) Separating and recombining audio for intelligibility and comfort
JP6268717B2 (en) State estimation device, state estimation method, and computer program for state estimation
JP2017100221A (en) Communication robot
WO2017085992A1 (en) Information processing apparatus
EP4020467A1 (en) Voice coaching system and related methods
KR101799874B1 (en) Situation judgment system and method based on voice/sound analysis
Frank et al. Nonverbal elements of the voice
KR20220048381A (en) Device, method and program for speech impairment evaluation
JP6258172B2 (en) Sound information processing apparatus and system
JP4631464B2 (en) Physical condition determination device and program thereof
Brutten Behaviour assessment and the strategy of therapy
JP2017196115A (en) Cognitive function evaluation device, cognitive function evaluation method, and program
JP2006230446A (en) Health-condition estimating equipment
KR20170098445A (en) Situation judgment apparatus based on voice/sound analysis
JP6598227B1 (en) Cat-type conversation robot
KR20170098446A Situation judgment method based on voice/sound analysis
KR20180052909A (en) Interface system and method for database based on voice/sound analysis and legacy
KR20180052907A (en) System and method of supplying graphic statistics using database based on voice/sound analysis
Sheeder et al. Say it like you mean it: Priming for structure in caller responses to a spoken dialog system