KR20170099004A - Situation judgment system and method based on voice/sound analysis - Google Patents

Situation judgment system and method based on voice/sound analysis

Info

Publication number
KR20170099004A
KR20170099004A
Authority
KR
South Korea
Prior art keywords
voice
module
speaker
analyzing
ambient sound
Prior art date
Application number
KR1020160020348A
Other languages
Korean (ko)
Other versions
KR101799874B1 (en)
Inventor
최종석
임윤섭
김래현
김재관
서형호
최규태
박호진
이현우
권순일
백성욱
전석봉
정희석
진세훈
장준혁
Original Assignee
한국과학기술연구원
(주)파워보이스
한양대학교 산학협력단
세종대학교산학협력단
주식회사 와이즈넛
(주)트리포스
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 한국과학기술연구원, (주)파워보이스, 한양대학교 산학협력단, 세종대학교산학협력단, 주식회사 와이즈넛, (주)트리포스 filed Critical 한국과학기술연구원
Priority to KR1020160020348A priority Critical patent/KR101799874B1/en
Publication of KR20170099004A publication Critical patent/KR20170099004A/en
Application granted granted Critical
Publication of KR101799874B1 publication Critical patent/KR101799874B1/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/04 Segmentation; Word boundary detection
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

Disclosed are a situation judgment system based on voice/sound analysis and a method thereof. The situation judgment system based on voice/sound analysis comprises: a speaker mobile terminal, which transmits the speaker's voice and surrounding sound; a situation judgment server comprising a call receiving module for receiving the speaker's voice and the surrounding sound from the speaker mobile terminal, an age information inferring module for inferring the speaker's age information by analyzing the voice and the surrounding sound received from the call receiving module, a gender information inferring module for inferring the speaker's gender information by analyzing the voice and the surrounding sound received from the call receiving module, a mental state inferring module for inferring the speaker's mental state by analyzing the voice and the surrounding sound received from the call receiving module, and a truth/untruth inferring module for inferring the speaker's truth/untruth by analyzing the voice and the surrounding sound received from the call receiving module; and a user terminal, which displays the age information inferred by the age information inferring module, the gender information inferred by the gender information inferring module, the mental state inferred by the mental state inferring module, and the truth/untruth inferred by the truth/untruth inferring module.

Description

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a situation determination system and method, and more particularly, to a system and method for determining a situation based on voice / sound analysis.

A large number of emergency and criminal incidents are reported by telephone every year.

However, false reports account for a significant proportion of these calls.

Reported calls arrive every few seconds, so the call taker must judge within a short time whether a report is false and make a quick decision early on.

The call taker must not miss a genuine report while being tied up with a false call for a long time.

On the other hand, when a report is genuine, the speaker is often flustered and fails to convey the situation properly. In this case, it is necessary to judge quickly and accurately not only the authenticity of the report but also its urgency from the speaker's voice and the surrounding sound, and to grasp and infer as much information as possible about the accident scene and the speaker in a short time.

However, it is not easy for a human operator to grasp this much information accurately within such a short time, and the accuracy and objectivity of such judgments are not consistent.

It is an object of the present invention to provide a voice / sound analysis based situation determination system.

It is another object of the present invention to provide a method for determining a situation based on voice / sound analysis.

According to an aspect of the present invention, there is provided a voice/sound analysis based situation determination system including: a speaker mobile terminal for transmitting a voice of a speaker and a surrounding sound; a situation determination server including a call receiving module for receiving the voice and the ambient sound from the speaker mobile terminal, an age information inference module for inferring age information of the speaker by analyzing the voice and the ambient sound received from the call receiving module, a gender information inference module for inferring the gender of the speaker by analyzing the voice and the ambient sound received from the call receiving module, a psychological state inference module for inferring the psychological state of the speaker by analyzing the voice and the ambient sound received from the call receiving module, and a truth/falsehood inference module for inferring the truth or falsehood of the speaker by analyzing the voice and the ambient sound received from the call receiving module; and a user terminal for displaying the age information inferred by the age information inference module, the gender information inferred by the gender information inference module, the psychological state inferred by the psychological state inference module, and the truth/falsehood inferred by the truth/falsehood inference module.

In this case, the system may further include a global positioning system (GPS) remote control module which, when the call receiving module receives the voice of the speaker and the ambient sound from the speaker mobile terminal, turns on the GPS function of the speaker mobile terminal through the corresponding carrier server.

According to another aspect of the present invention, there is provided a voice/sound analysis based situation determination method including: transmitting, by a speaker mobile terminal, a voice of the speaker and ambient sound; receiving, by a situation determination server, the voice of the speaker and the ambient sound from the speaker mobile terminal; analyzing the received voice and ambient sound to infer the age information of the speaker; analyzing the received voice and ambient sound to infer the gender information of the speaker; analyzing the received voice and ambient sound to infer the psychological state of the speaker; analyzing the received voice and ambient sound to infer the truth or falsehood of the speaker; and displaying, on a user terminal, the age information, gender information, psychological state, and truth/falsehood inferred by the situation determination server.

Here, the method may further include, when the situation determination server receives the voice of the speaker and the ambient sound from the speaker mobile terminal, turning on the global positioning system (GPS) function of the speaker mobile terminal through the corresponding carrier server, and receiving and displaying GPS coordinates from the speaker mobile terminal in real time.

According to the voice/sound analysis based situation determination system and method described above, the age, gender, psychological state, truthfulness, and surrounding situation of a speaker can be inferred from the speaker's voice and the surrounding sound, which makes it possible to judge the exact situation of an incident report quickly and accurately.

FIG. 1 is a block diagram of a voice/sound analysis based situation determination system according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating a voice/sound analysis based situation determination method according to an embodiment of the present invention.

While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will be described herein in detail. It should be understood, however, that the invention is not limited to the particular embodiments disclosed, but covers all modifications, equivalents, and alternatives falling within the spirit and scope of the invention. Like reference numerals are used for like elements in describing each drawing.

The terms first, second, A, B, etc. may be used to describe various elements, but the elements should not be limited by these terms. The terms are used only to distinguish one element from another. For example, without departing from the scope of the present invention, a first element may be referred to as a second element, and similarly, a second element may be referred to as a first element. The term "and/or" includes any combination of a plurality of related listed items or any one of a plurality of related listed items.

When an element is referred to as being "connected" or "coupled" to another element, it may be directly connected or coupled to that element, or intervening elements may be present. In contrast, when an element is referred to as being "directly connected" or "directly coupled" to another element, it should be understood that there are no intervening elements.

The terminology used in this application is used only to describe specific embodiments and is not intended to limit the invention. Singular expressions include plural expressions unless the context clearly dictates otherwise. In this application, terms such as "comprises" or "having" are intended to specify the presence of stated features, numbers, steps, operations, elements, components, or combinations thereof, but do not preclude the presence or addition of one or more other features, numbers, steps, operations, elements, components, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Terms such as those defined in commonly used dictionaries are to be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in this application.

Hereinafter, preferred embodiments according to the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of a voice/sound analysis based situation determination system according to an embodiment of the present invention.

Referring to FIG. 1, a voice/sound analysis based situation determination system 100 according to an embodiment of the present invention includes a speaker mobile terminal 110, a situation determination server 120, and a user terminal 130.

Hereinafter, the detailed configuration will be described.

The speaker mobile terminal 110 may be configured to transmit the voice of the speaker and the ambient sound. Most reported calls in recent years tend to be made through mobile terminals, and the speaker terminal 110 may accordingly be configured as a mobile terminal.

The situation determination server 120 includes a call receiving module 121, an age information inference module 122, a gender information inference module 123, a psychological state inference module 124, a truth/falsehood inference module 125, an ambient acoustic inference module 126, an ambient acoustic database 127, and a GPS remote control module 128.

Hereinafter, the detailed configuration will be described.

The call receiving module 121 may be configured to receive the voice of the speaker and the ambient sound from the speaker mobile terminal 110. The voice and ambient sound received at the call receiving module 121 may be automatically backed up to a voice call database, and the automatically backed-up voice and ambient sound can be used for real-time analysis.

The call receiving module 121 separates the received audio into voice and ambient sound and extracts each of them; the extracted voice is used for the inferences of the age information inference module 122, the gender information inference module 123, the psychological state inference module 124, and the truth/falsehood inference module 125.
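
The patent does not specify how this separation is performed; the following is a minimal illustrative sketch, assuming a simple short-time-energy voice activity detector over 16 kHz PCM samples (the frame size, threshold, and function name are all assumptions, not the patent's method):

```python
import numpy as np

FRAME = 320  # 20 ms frames at an assumed 16 kHz sample rate

def split_voice_ambient(signal, energy_thresh=1e-4):
    """Label each frame as voice or ambient by short-time energy.

    Stands in for the (unspecified) separation step of the call
    receiving module; the energy threshold is illustrative only.
    """
    voice, ambient = [], []
    for start in range(0, len(signal) - FRAME + 1, FRAME):
        frame = signal[start:start + FRAME]
        (voice if np.mean(frame ** 2) > energy_thresh else ambient).append(frame)
    to_arr = lambda chunks: np.concatenate(chunks) if chunks else np.array([])
    return to_arr(voice), to_arr(ambient)
```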

The age information inference module 122 may be configured to infer the age information of the speaker by analyzing the voice and the ambient sound received from the call receiving module 121.

Specifically, the age information can be inferred according to the following inference criteria.

Generally speaking, several factors cause significant differences in the speaking behavior of elderly people compared to young people. In the elderly, the speech rate is generally slower than in young people, and the speed of syllable production is not constant. In addition, silences are inserted at inappropriate positions, and there is a tendency to exhibit abnormal behaviors in pronunciation and phonation.

By contrast, younger adults show a longer maximum phonation time (MPT) than older adults, which means that the ability to prolong vowels tends to decrease with age. The alternating motion rate (AMR) and sequential motion rate (SMR), which measure the repetition rate and regularity of syllables, are also found to be faster in younger people than in older people.

In addition, the elderly have reduced cognitive, sensory, and motor functions, all of which contribute to speech output, compared with younger adults. As a result, their overall speech rate and articulation rate are slower.

The elderly also exhibit a high incidence of voice disorders in both subjective and objective measures, and elderly women exhibit a significantly higher voice disorder index than younger adult women.

For men, vocal pitch falls until around age 40 to 50 and then rises again, while women's pitch tends to fall as they age.

Measurements of jitter and shimmer show that, in elderly males, both the rate of change of vibration and the irregularity of the waveform increase, while in elderly females only the rate of change of vibration tends to increase. Here, jitter refers to cycle-to-cycle variation in the frequency of vocal fold vibration, and shimmer refers to cycle-to-cycle variation in the amplitude of the voice waveform. These tendencies indicate a decline in laryngeal function or degenerative change in the laryngeal tissue. The noise-to-harmonics ratio, another indicator of phonation stability, increases significantly in elderly women, which further supports the instability of phonation with increasing age.

Changes in voice indices due to degenerative change of the larynx tend to appear most strongly as larger jitter values in vocal fold vibration.
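
These jitter/shimmer criteria reduce to simple cycle-to-cycle perturbation measures. Below is a minimal sketch, assuming glottal cycle periods and per-cycle peak amplitudes have already been extracted from the voiced speech; the relative-perturbation formulas are standard textbook definitions, not values or methods given in the patent:

```python
import numpy as np

def jitter_shimmer(periods, peak_amps):
    """Return (jitter %, shimmer %) from successive vocal-fold cycles.

    periods   : cycle lengths in seconds, one per glottal cycle
    peak_amps : peak amplitude of each cycle
    Jitter is the mean absolute cycle-to-cycle period change relative
    to the mean period; shimmer is the analogous amplitude measure.
    """
    p = np.asarray(periods, dtype=float)
    a = np.asarray(peak_amps, dtype=float)
    jitter = np.mean(np.abs(np.diff(p))) / np.mean(p) * 100.0
    shimmer = np.mean(np.abs(np.diff(a))) / np.mean(a) * 100.0
    return jitter, shimmer
```

Elevated jitter and shimmer, combined with a slower speech rate and a shorter MPT, would push the inferred age upward under the criteria described above.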

The gender information inference module 123 may be configured to infer the gender of the speaker by analyzing the voice and the ambient sound received from the call receiving module 121.

The gender information inference module 123 may be configured to deduce gender based on the following criteria.

The gender information inference module 123 can analyze differences in maximum phonation time, fundamental frequency, frequency perturbation (jitter), amplitude perturbation (shimmer), noise-to-harmonics ratio, average fundamental frequency, maximum fundamental frequency, and minimum fundamental frequency.

By gender, there are significant differences in fundamental frequency, frequency perturbation, amplitude perturbation, and maximum fundamental frequency, while the noise-to-harmonics ratio, average fundamental frequency, and minimum fundamental frequency show no significant gender difference. In addition, the fundamental frequency differs significantly between continuous speech and sustained vowel phonation.
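
Since the fundamental frequency carries most of the significant gender difference, the core of the module's decision could look as follows. This is a sketch only: the autocorrelation pitch tracker and the 165 Hz decision boundary are common conventions assumed for illustration, not figures from the patent:

```python
import numpy as np

def estimate_f0(frame, sr=16000, fmin=60.0, fmax=400.0):
    """Autocorrelation-based F0 estimate for a single voiced frame."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)  # lag search band
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

def infer_gender(f0_values):
    """Typical adult male F0 is roughly 85-180 Hz and female 165-255 Hz;
    165 Hz is a rule-of-thumb boundary, not a value from the patent."""
    return "male" if np.median(f0_values) < 165.0 else "female"
```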

The psychological state inference module 124 may be configured to infer the psychological state of the speaker by analyzing the voice and the ambient sound received at the call receiving module 121.

The psychological state and intention can be inferred by the following criteria.

First, the personality of the speaker can be inferred from speaking behavior. The extroversion or introversion of the speaker can be judged on the basis of speaking rate, silence length, silence frequency, and relative variation of pitch, as sketched below.

In addition, an emotion inference engine that judges emotional states such as pleasant/unpleasant/stable from the speaker's EEG/pulse wave sensing information can grasp the emotion, personality, psychological state, and intention of the speaker from multiple aspects.
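
A sketch of the prosodic cues mentioned above (speaking rate, silence length and frequency, relative pitch variation), computed from per-frame voicing flags and an F0 track; the feature names and the 20 ms frame size are assumptions:

```python
import numpy as np

def prosodic_features(voiced_flags, f0_track, frame_sec=0.02):
    """Return the cues tied above to extroversion/introversion judgments."""
    voiced = [bool(v) for v in voiced_flags]
    silences, run = [], 0
    for v in voiced:                      # collect lengths of silent runs
        if not v:
            run += 1
        elif run:
            silences.append(run * frame_sec)
            run = 0
    if run:
        silences.append(run * frame_sec)
    f0 = np.asarray([f for f in f0_track if f > 0], dtype=float)
    return {
        "speech_ratio": sum(voiced) / len(voiced),   # speaking-rate proxy
        "silence_count": len(silences),              # silence frequency
        "mean_silence_s": float(np.mean(silences)) if silences else 0.0,
        "pitch_cv": float(np.std(f0) / np.mean(f0)) if f0.size else 0.0,
    }
```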

The truth / false inference module 125 may be configured to infer the truth / falsehood of the speaker by analyzing the voice and ambient sounds received at the call receiving module 121.

The truth / falsehood of a speaker can be inferred by the following criteria.

First, the speaker's answer to the call taker's question can be recorded for five seconds and analyzed to judge its truth or falsehood.

Here, the call taker can be configured to ask questions of the same pattern, with some expected answers to these questions set in advance, and truth or falsehood is judged by comparing the answers against them.
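
A minimal sketch of this scripted-question scheme: each fixed-pattern question is paired with preset answer keywords, and the recorded five-second answer is scored against them. The example question, keywords, and scoring rule are invented for illustration:

```python
# Preset answer patterns per scripted question (illustrative content only).
EXPECTED_KEYWORDS = {
    "where are you now?": ["street", "building", "apartment", "road"],
}

def truth_score(question: str, answer: str) -> float:
    """Fraction of preset keywords found in the answer; a low score
    flags the report as possibly false for the call taker to review."""
    keywords = EXPECTED_KEYWORDS.get(question.lower(), [])
    if not keywords:
        return 0.5  # no preset pattern for this question: undecided
    hits = sum(1 for kw in keywords if kw in answer.lower())
    return hits / len(keywords)
```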

The ambient acoustic inference module 126 may be configured to use the ambient sound extracted by the call receiving module 121 to grasp the surrounding situation.

The ambient acoustic inference module 126 may be configured to infer the surrounding situation by comparing the received ambient sound against the sounds previously stored in the ambient acoustic database 127.

For example, sounds such as car noise, human voices, and rain can be stored in advance in the ambient acoustic database 127, and the surrounding situation can be determined by matching against them.
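
One plausible realization of this database lookup is a nearest-neighbor match on coarse spectral signatures; the feature (log band energies) and the Euclidean distance are assumptions for illustration, not the patent's method:

```python
import numpy as np

def spectral_signature(clip, bands=32):
    """Coarse log band-energy signature of an audio clip."""
    spec = np.abs(np.fft.rfft(clip * np.hanning(len(clip))))
    edges = np.linspace(0, len(spec), bands + 1, dtype=int)
    return np.log1p(np.array([spec[a:b].sum()
                              for a, b in zip(edges[:-1], edges[1:])]))

def classify_ambient(clip, database):
    """database maps labels ('car', 'rain', ...) to stored signatures;
    the closest stored sound determines the inferred surrounding."""
    sig = spectral_signature(clip)
    return min(database, key=lambda label: np.linalg.norm(sig - database[label]))
```

For example, the database could be built once as database = {"car": spectral_signature(car_clip), "rain": spectral_signature(rain_clip)} and then queried per call.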

When the call receiving module 121 receives the voice of the speaker and the ambient sound from the speaker mobile terminal 110, the GPS remote control module 128 turns on the global positioning system (GPS) function of the speaker mobile terminal 110 through the corresponding carrier server.

Since positioning by base station or Wi-Fi involves significant error, the GPS remote control module 128 preferably turns on the GPS function of the speaker mobile terminal 110 and monitors its GPS coordinates in real time.
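
The carrier-side interface is not specified in the patent; the following sketch only illustrates the control flow, with carrier_api and display as hypothetical interfaces:

```python
import time

def monitor_gps(carrier_api, terminal_id, display, interval_s=1.0):
    """Turn on the terminal's GPS via the carrier server, then stream
    coordinates to the call taker's screen until the session ends."""
    carrier_api.enable_gps(terminal_id)            # hypothetical carrier call
    while display.session_active(terminal_id):     # hypothetical UI query
        lat, lon = carrier_api.read_gps(terminal_id)
        display.show_position(terminal_id, lat, lon)
        time.sleep(interval_s)                     # poll at a fixed interval
```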

The user terminal 130 may display, in real time, the age information inferred by the age information inference module 122, the gender information inferred by the gender information inference module 123, the psychological state inferred by the psychological state inference module 124, the truth/falsehood inferred by the truth/falsehood inference module 125, and the surrounding situation inferred by the ambient acoustic inference module 126.

Also, the user terminal 130 may be configured to display the GPS coordinates of the speaker terminal 110 in real time by the GPS remote control module 128.

FIG. 2 is a flowchart illustrating a voice/sound analysis based situation determination method according to an embodiment of the present invention.

Referring to FIG. 2, the speaker mobile terminal 110 transmits the voice of the speaker and the ambient sound (S101).

Next, the situation determination server 120 receives the voice of the speaker and the ambient sound from the speaker mobile terminal 110 (S102).

Next, the situation determination server 120 analyzes the received voice and the ambient sound to infer the age information of the speaker (S103).

Next, the situation determination server 120 analyzes the received voice and the ambient sound to infer the gender information of the speaker (S104).

Next, the situation determination server 120 analyzes the received voice and the ambient sound to infer the psychological state of the speaker (S105).

Next, the situation determination server 120 analyzes the received voice and the ambient sound to infer the truth / falsehood of the speaker (S106).

Next, the user terminal 130 displays the age information, gender information, psychological state, and truth/falsehood information inferred by the situation determination server 120 (S107).

Next, when the situation determination server 120 receives the voice of the speaker and the ambient sound from the speaker mobile terminal 110, it turns on the global positioning system (GPS) function of the speaker mobile terminal 110 through the corresponding carrier server 10, and receives and displays the GPS coordinates from the speaker mobile terminal 110 in real time (S108).

It will be apparent to those skilled in the art that various modifications and variations can be made in the present invention without departing from the spirit or scope of the invention as defined in the following claims.

110: Speaker mobile terminal
120: Situation determination server
121: Call receiving module
122: Age information inference module
123: Gender information inference module
124: Psychological state inference module
125: Truth/falsehood inference module
126: Ambient acoustic inference module
127: Ambient acoustic database
128: GPS remote control module
130: User terminal

Claims (4)

A voice/sound analysis based situation determination system comprising:
a speaker mobile terminal for transmitting a voice of a speaker and a surrounding sound;
a situation determination server including: a call receiving module for receiving the voice and the ambient sound from the speaker mobile terminal; an age information inference module for inferring age information of the speaker by analyzing the voice and the ambient sound received from the call receiving module; a gender information inference module for inferring the gender of the speaker by analyzing the voice and the ambient sound received from the call receiving module; a psychological state inference module for inferring the psychological state of the speaker by analyzing the voice and the ambient sound received from the call receiving module; and a truth/falsehood inference module for inferring the truth or falsehood of the speaker by analyzing the voice and the ambient sound received from the call receiving module; and
a user terminal for displaying the age information inferred by the age information inference module, the gender information inferred by the gender information inference module, the psychological state inferred by the psychological state inference module, and the truth/falsehood inferred by the truth/falsehood inference module.
The system according to claim 1,
wherein the system further comprises a global positioning system (GPS) remote control module for turning on the GPS function of the speaker mobile terminal through the corresponding carrier server when the call receiving module receives the voice of the speaker and the ambient sound from the speaker mobile terminal.
A voice/sound analysis based situation determination method comprising:
Transmitting, by a speaker mobile terminal, a voice of a speaker and ambient sound;
Receiving a voice of the speaker and ambient sound from the speaker terminal;
Analyzing the received voice and the ambient sound to infer the age information of the speaker;
Analyzing the received voice and ambient sounds to infer the gender information of the speaker;
Analyzing the received voice and the ambient sound to infer the psychological state of the speaker;
Analyzing the received voice and ambient sound to infer the truth / falsehood of the speaker;
And displaying, on a user terminal, the age information, the gender information, the psychological state, and the truth/falsehood inferred by the situation determination server.
The method of claim 3,
wherein, when the situation determination server receives the voice of the speaker and the ambient sound from the speaker mobile terminal, the method further comprises turning on the global positioning system (GPS) function of the speaker mobile terminal through the corresponding carrier server, and receiving and displaying GPS coordinates from the speaker mobile terminal in real time.
KR1020160020348A 2016-02-22 2016-02-22 Situation judgment system and method based on voice/sound analysis KR101799874B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020160020348A KR101799874B1 (en) 2016-02-22 2016-02-22 Situation judgment system and method based on voice/sound analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020160020348A KR101799874B1 (en) 2016-02-22 2016-02-22 Situation judgment system and method based on voice/sound analysis

Publications (2)

Publication Number Publication Date
KR20170099004A (en) 2017-08-31
KR101799874B1 KR101799874B1 (en) 2017-12-21

Family

ID=59761369

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020160020348A KR101799874B1 (en) 2016-02-22 2016-02-22 Situation judgment system and method based on voice/sound analysis

Country Status (1)

Country Link
KR (1) KR101799874B1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10423773B1 (en) 2019-04-12 2019-09-24 Coupang, Corp. Computerized systems and methods for determining authenticity using micro expressions

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101998650B1 (en) * 2019-02-12 2019-07-10 한방유비스 주식회사 Collecting information management system of report of disaster

Also Published As

Publication number Publication date
KR101799874B1 (en) 2017-12-21

Similar Documents

Publication Publication Date Title
JP5834449B2 (en) Utterance state detection device, utterance state detection program, and utterance state detection method
EP1222448B1 (en) System, method, and article of manufacture for detecting emotion in voice signals by utilizing statistics for voice signal parameters
ES2242634T3 (en) TELEPHONE EMOTION DETECTOR WITH OPERATOR FEEDBACK.
US8160210B2 (en) Conversation outcome enhancement method and apparatus
US20140314212A1 (en) Providing advisory information associated with detected auditory and visual signs in a psap environment
WO2017085992A1 (en) Information processing apparatus
EP4020467A1 (en) Voice coaching system and related methods
JP2017100221A (en) Communication robot
US11699043B2 (en) Determination of transcription accuracy
KR101799874B1 (en) Situation judgment system and method based on voice/sound analysis
JP6695057B2 (en) Cognitive function evaluation device, cognitive function evaluation method, and program
JP2020000713A (en) Analysis apparatus, analysis method, and computer program
JP6598227B1 (en) Cat-type conversation robot
JP2006230446A (en) Health-condition estimating equipment
KR20180052907A (en) System and method of supplying graphic statistics using database based on voice/sound analysis
KR20180052909A (en) Interface system and method for database based on voice/sound analysis and legacy
JP6718623B2 (en) Cat conversation robot
KR20170098445A (en) Situation judgment apparatus based on voice/sound analysis
KR102571549B1 (en) Interactive elderly neglect prevention device
KR20170098446A (en) Situation judgment ethod based on voice/sound analysis
US20130143543A1 (en) Method and device for automatically switching a profile of a mobile phone
KR20190085272A (en) Open api system and method of json format support by mqtt protocol
KR102000282B1 (en) Conversation support device for performing auditory function assistance
KR20180019375A (en) Condition check and management system and the method for emotional laborer
KR101329175B1 (en) Sound analyzing and recognizing method and system for hearing-impaired people

Legal Events

Date Code Title Description
A201 Request for examination
E902 Notification of reason for refusal
E701 Decision to grant or registration of patent right