KR20170099004A - Situation judgment system and method based on voice/sound analysis - Google Patents

Situation judgment system and method based on voice/sound analysis

Info

Publication number
KR20170099004A
KR20170099004A
Authority
KR
South Korea
Prior art keywords
voice
module
speaker
analyzing
ambient sound
Prior art date
Application number
KR1020160020348A
Other languages
Korean (ko)
Other versions
KR101799874B1 (en)
Inventor
최종석
임윤섭
김래현
김재관
서형호
최규태
박호진
이현우
권순일
백성욱
전석봉
정희석
진세훈
장준혁
Original Assignee
한국과학기술연구원
(주)파워보이스
한양대학교 산학협력단
세종대학교산학협력단
주식회사 와이즈넛
(주)트리포스
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 한국과학기술연구원, (주)파워보이스, 한양대학교 산학협력단, 세종대학교산학협력단, 주식회사 와이즈넛, (주)트리포스 filed Critical 한국과학기술연구원
Priority to KR1020160020348A priority Critical patent/KR101799874B1/en
Publication of KR20170099004A publication Critical patent/KR20170099004A/en
Application granted granted Critical
Publication of KR101799874B1 publication Critical patent/KR101799874B1/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/04 Segmentation; Word boundary detection
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

Disclosed are a situation judgment system based on voice/sound analysis and a method thereof. The situation judgment system based on voice/sound analysis comprises: a speaker mobile terminal, which transmits the speaker's voice and surrounding sound; a situation judgment server comprising a call receiving module for receiving the speaker's voice and the surrounding sound from the speaker mobile terminal, an age information inferring module for inferring the speaker's age information by analyzing the voice and the surrounding sound received from the call receiving module, a gender information inferring module for inferring the speaker's gender information by analyzing the voice and the surrounding sound received from the call receiving module, a mental state inferring module for inferring the speaker's mental state by analyzing the voice and the surrounding sound received from the call receiving module, and a truth/untruth inferring module for inferring the speaker's truth/untruth by analyzing the voice and the surrounding sound received from the call receiving module; and a user terminal, which displays the age information inferred by the age information inferring module, the gender information inferred by the gender information inferring module, the mental state inferred by the mental state inferring module, and the truth/untruth inferred by the truth/untruth inferring module.

Description

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a situation determination system and method, and more particularly, to a system and method for determining a situation based on voice / sound analysis.

A large number of emergency and criminal incidents are reported by telephone every year.

However, false reports account for a significant proportion of these calls.

Reported calls arrive every few seconds, so the call taker must judge within a short time whether a report is false and make a quick decision early on.

The call taker must not miss a genuine report while being tied up with a false call for a long time.

On the other hand, when a report is genuine, the speaker is often flustered and fails to convey the situation properly. In this case, it is necessary to judge quickly and accurately not only the authenticity of the report but also its urgency from the speaker's voice and the surrounding sound, and to grasp and infer as much information as possible about the accident scene and the speaker in a short time.

However, it is not easy for a human operator to grasp this much information accurately within such a short time, and the accuracy and objectivity of such judgments are not consistent.

It is an object of the present invention to provide a voice / sound analysis based situation determination system.

It is another object of the present invention to provide a method for determining a situation based on voice / sound analysis.

According to an aspect of the present invention, there is provided a voice/sound analysis based situation determination system including: a speaker mobile terminal for transmitting a voice of a speaker and a surrounding sound; a situation determination server including a call receiving module for receiving the voice and the ambient sound from the speaker mobile terminal, an age information inference module for inferring age information of the speaker by analyzing the voice and the ambient sound received from the call receiving module, a gender information inference module for inferring the gender of the speaker by analyzing the voice and the ambient sound received from the call receiving module, a psychological state inference module for inferring the psychological state of the speaker by analyzing the voice and the ambient sound received from the call receiving module, and a truth/falsehood inference module for inferring the truth or falsehood of the speaker by analyzing the voice and the ambient sound received from the call receiving module; and a user terminal for displaying the age information inferred by the age information inference module, the gender information inferred by the gender information inference module, the psychological state inferred by the psychological state inference module, and the truth/falsehood inferred by the truth/falsehood inference module.

In this case, the system may further include a global positioning system (GPS) remote control module which, when the call receiving module receives the voice of the speaker and the ambient sound from the speaker mobile terminal, turns on the GPS function of the speaker mobile terminal through the corresponding carrier server.

According to another aspect of the present invention, there is provided a voice/sound analysis based situation determination method including: transmitting, by a speaker mobile terminal, a voice of the speaker and ambient sound; receiving, by a situation determination server, the voice of the speaker and the ambient sound from the speaker mobile terminal; analyzing the received voice and ambient sound to infer the age information of the speaker; analyzing the received voice and ambient sound to infer the gender information of the speaker; analyzing the received voice and ambient sound to infer the psychological state of the speaker; analyzing the received voice and ambient sound to infer the truth or falsehood of the speaker; and displaying, on a user terminal, the age information, gender information, psychological state, and truth/falsehood inferred by the situation determination server.

Here, the method may further include, when the situation determination server receives the voice of the speaker and the ambient sound from the speaker mobile terminal, turning on the global positioning system (GPS) function of the speaker mobile terminal through the corresponding carrier server, and receiving and displaying GPS coordinates from the speaker mobile terminal in real time.

According to the voice/sound analysis based situation determination system and method described above, the age, gender, psychological state, truthfulness, and surrounding situation of a speaker can be inferred from the speaker's voice and the surrounding sound, which makes it possible to judge the exact situation of an incident report quickly and accurately.

FIG. 1 is a block diagram of a voice/sound analysis based situation determination system according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating a voice/sound analysis based situation determination method according to an embodiment of the present invention.

While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will be described herein in detail. It should be understood, however, that the invention is not limited to the particular embodiments disclosed, but covers all modifications, equivalents, and alternatives falling within the spirit and scope of the invention. Like reference numerals are used for like elements in describing each drawing.

The terms first, second, A, B, etc. may be used to describe various elements, but the elements should not be limited by these terms. The terms are used only to distinguish one element from another. For example, without departing from the scope of the present invention, a first element may be referred to as a second element, and similarly, a second element may be referred to as a first element. The term "and/or" includes any combination of a plurality of related listed items or any one of a plurality of related listed items.

When an element is referred to as being "connected" or "coupled" to another element, it may be directly connected or coupled to that element, or intervening elements may be present. In contrast, when an element is referred to as being "directly connected" or "directly coupled" to another element, it should be understood that there are no intervening elements.

The terminology used in this application is used only to describe specific embodiments and is not intended to limit the invention. Singular expressions include plural expressions unless the context clearly dictates otherwise. In this application, terms such as "comprises" or "having" are intended to specify the presence of stated features, numbers, steps, operations, elements, components, or combinations thereof, but do not preclude the presence or addition of one or more other features, numbers, steps, operations, elements, components, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Terms such as those defined in commonly used dictionaries are to be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in this application.

Hereinafter, preferred embodiments according to the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of a voice/sound analysis based situation determination system according to an embodiment of the present invention.

Referring to FIG. 1, a voice/sound analysis based situation determination system 100 according to an embodiment of the present invention includes a speaker mobile terminal 110, a situation determination server 120, and a user terminal 130.

Hereinafter, the detailed configuration will be described.

The speaker mobile terminal 110 may be configured to transmit the voice of the speaker and the ambient sound. Most reported calls in recent years tend to be made through mobile terminals, and the speaker terminal 110 may accordingly be configured as a mobile terminal.

The situation determination server 120 includes a call receiving module 121, an age information inference module 122, a gender information inference module 123, a psychological state inference module 124, a truth/falsehood inference module 125, an ambient acoustic inference module 126, an ambient acoustic database 127, and a GPS remote control module 128.

Hereinafter, the detailed configuration will be described.

The call receiving module 121 may be configured to receive the voice of the speaker and the ambient sound from the speaker mobile terminal 110. The voice and ambient sound received at the call receiving module 121 may be automatically backed up to a voice call database, and the automatically backed-up voice and ambient sound can be used for real-time analysis.

The call receiving module 121 separates the received audio into voice and ambient sound and extracts each of them; the extracted voice is used for the inferences of the age information inference module 122, the gender information inference module 123, the psychological state inference module 124, and the truth/falsehood inference module 125.
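
The patent does not specify how this separation is performed; the following is a minimal illustrative sketch, assuming a simple short-time-energy voice activity detector over 16 kHz PCM samples (the frame size, threshold, and function name are all assumptions, not the patent's method):

```python
import numpy as np

FRAME = 320  # 20 ms frames at an assumed 16 kHz sample rate

def split_voice_ambient(signal, energy_thresh=1e-4):
    """Label each frame as voice or ambient by short-time energy.

    Stands in for the (unspecified) separation step of the call
    receiving module; the energy threshold is illustrative only.
    """
    voice, ambient = [], []
    for start in range(0, len(signal) - FRAME + 1, FRAME):
        frame = signal[start:start + FRAME]
        (voice if np.mean(frame ** 2) > energy_thresh else ambient).append(frame)
    to_arr = lambda chunks: np.concatenate(chunks) if chunks else np.array([])
    return to_arr(voice), to_arr(ambient)
```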

The age information inference module 122 may be configured to infer the age information of the speaker by analyzing the voice and the ambient sound received from the call receiving module 121.

Specifically, the age information can be inferred according to the following inference criteria.

Generally speaking, several factors cause significant differences in the speaking behavior of elderly people compared to young people. In the elderly, the speech rate is generally slower than in young people, and the speed of syllable production is not constant. In addition, silences are inserted at inappropriate positions, and there is a tendency to exhibit abnormal behaviors in pronunciation and phonation.

By contrast, younger adults show a longer maximum phonation time (MPT) than older adults, which means that the ability to prolong vowels tends to decrease with age. The alternating motion rate (AMR) and sequential motion rate (SMR), which measure the repetition rate and regularity of syllables, are also found to be faster in younger people than in older people.

In addition, the elderly have reduced cognitive, sensory, and motor functions, all of which contribute to speech output, compared with younger adults. As a result, their overall speech rate and articulation rate are slower.

The elderly also exhibit a high incidence of voice disorders in both subjective and objective measures, and elderly women exhibit a significantly higher voice disorder index than younger adult women.

For men, vocal pitch falls until around age 40 to 50 and then rises again, while women's pitch tends to fall as they age.

Measurements of jitter and shimmer show that, in elderly males, both the rate of change of vibration and the irregularity of the waveform increase, while in elderly females only the rate of change of vibration tends to increase. Here, jitter refers to cycle-to-cycle variation in the frequency of vocal fold vibration, and shimmer refers to cycle-to-cycle variation in the amplitude of the voice waveform. These tendencies indicate a decline in laryngeal function or degenerative change in the laryngeal tissue. The noise-to-harmonics ratio, another indicator of phonation stability, increases significantly in elderly women, which further supports the instability of phonation with increasing age.

Changes in voice indices due to degenerative change of the larynx tend to appear most strongly as larger jitter values in vocal fold vibration.
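
These jitter/shimmer criteria reduce to simple cycle-to-cycle perturbation measures. Below is a minimal sketch, assuming glottal cycle periods and per-cycle peak amplitudes have already been extracted from the voiced speech; the relative-perturbation formulas are standard textbook definitions, not values or methods given in the patent:

```python
import numpy as np

def jitter_shimmer(periods, peak_amps):
    """Return (jitter %, shimmer %) from successive vocal-fold cycles.

    periods   : cycle lengths in seconds, one per glottal cycle
    peak_amps : peak amplitude of each cycle
    Jitter is the mean absolute cycle-to-cycle period change relative
    to the mean period; shimmer is the analogous amplitude measure.
    """
    p = np.asarray(periods, dtype=float)
    a = np.asarray(peak_amps, dtype=float)
    jitter = np.mean(np.abs(np.diff(p))) / np.mean(p) * 100.0
    shimmer = np.mean(np.abs(np.diff(a))) / np.mean(a) * 100.0
    return jitter, shimmer
```

Elevated jitter and shimmer, combined with a slower speech rate and a shorter MPT, would push the inferred age upward under the criteria described above.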

The gender information inference module 123 may be configured to infer the gender of the speaker by analyzing the voice and the ambient sound received from the call receiving module 121.

The gender information inference module 123 may be configured to deduce gender based on the following criteria.

The gender information inference module 123 can analyze differences in maximum phonation time, fundamental frequency, frequency perturbation (jitter), amplitude perturbation (shimmer), noise-to-harmonics ratio, average fundamental frequency, maximum fundamental frequency, and minimum fundamental frequency.

By gender, there are significant differences in fundamental frequency, frequency perturbation, amplitude perturbation, and maximum fundamental frequency, while the noise-to-harmonics ratio, average fundamental frequency, and minimum fundamental frequency show no significant gender difference. In addition, the fundamental frequency differs significantly between continuous speech and sustained vowel phonation.
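
Since the fundamental frequency carries most of the significant gender difference, the core of the module's decision could look as follows. This is a sketch only: the autocorrelation pitch tracker and the 165 Hz decision boundary are common conventions assumed for illustration, not figures from the patent:

```python
import numpy as np

def estimate_f0(frame, sr=16000, fmin=60.0, fmax=400.0):
    """Autocorrelation-based F0 estimate for a single voiced frame."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)  # lag search band
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

def infer_gender(f0_values):
    """Typical adult male F0 is roughly 85-180 Hz and female 165-255 Hz;
    165 Hz is a rule-of-thumb boundary, not a value from the patent."""
    return "male" if np.median(f0_values) < 165.0 else "female"
```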

The psychological state inference module 124 may be configured to infer the psychological state of the speaker by analyzing the voice and the ambient sound received at the call receiving module 121.

The psychological state and intention can be inferred by the following criteria.

First, the personality of the speaker can be inferred from speaking behavior. The extroversion or introversion of the speaker can be judged on the basis of speaking rate, silence length, silence frequency, and relative variation of pitch, as sketched below.

In addition, an emotion inference engine that judges emotional states such as pleasant/unpleasant/stable from the speaker's EEG/pulse wave sensing information can grasp the emotion, personality, psychological state, and intention of the speaker from multiple aspects.
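
A sketch of the prosodic cues mentioned above (speaking rate, silence length and frequency, relative pitch variation), computed from per-frame voicing flags and an F0 track; the feature names and the 20 ms frame size are assumptions:

```python
import numpy as np

def prosodic_features(voiced_flags, f0_track, frame_sec=0.02):
    """Return the cues tied above to extroversion/introversion judgments."""
    voiced = [bool(v) for v in voiced_flags]
    silences, run = [], 0
    for v in voiced:                      # collect lengths of silent runs
        if not v:
            run += 1
        elif run:
            silences.append(run * frame_sec)
            run = 0
    if run:
        silences.append(run * frame_sec)
    f0 = np.asarray([f for f in f0_track if f > 0], dtype=float)
    return {
        "speech_ratio": sum(voiced) / len(voiced),   # speaking-rate proxy
        "silence_count": len(silences),              # silence frequency
        "mean_silence_s": float(np.mean(silences)) if silences else 0.0,
        "pitch_cv": float(np.std(f0) / np.mean(f0)) if f0.size else 0.0,
    }
```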

The truth / false inference module 125 may be configured to infer the truth / falsehood of the speaker by analyzing the voice and ambient sounds received at the call receiving module 121.

The truth / falsehood of a speaker can be inferred by the following criteria.

First, the speaker's answer to the call taker's question can be recorded for five seconds and analyzed to judge its truth or falsehood.

Here, the call taker can be configured to ask questions of the same pattern, with some expected answers to these questions set in advance, and truth or falsehood is judged by comparing the answers against them.
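
A minimal sketch of this scripted-question scheme: each fixed-pattern question is paired with preset answer keywords, and the recorded five-second answer is scored against them. The example question, keywords, and scoring rule are invented for illustration:

```python
# Preset answer patterns per scripted question (illustrative content only).
EXPECTED_KEYWORDS = {
    "where are you now?": ["street", "building", "apartment", "road"],
}

def truth_score(question: str, answer: str) -> float:
    """Fraction of preset keywords found in the answer; a low score
    flags the report as possibly false for the call taker to review."""
    keywords = EXPECTED_KEYWORDS.get(question.lower(), [])
    if not keywords:
        return 0.5  # no preset pattern for this question: undecided
    hits = sum(1 for kw in keywords if kw in answer.lower())
    return hits / len(keywords)
```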

The ambient acoustic inference module 126 may be configured to use the ambient sound extracted by the call receiving module 121 to grasp the surrounding situation.

The ambient acoustic inference module 126 may be configured to infer the surrounding situation by comparing the received ambient sound against the sounds previously stored in the ambient acoustic database 127.

For example, sounds such as car noise, human voices, and rain can be stored in advance in the ambient acoustic database 127, and the surrounding situation can be determined by matching against them.
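
One plausible realization of this database lookup is a nearest-neighbor match on coarse spectral signatures; the feature (log band energies) and the Euclidean distance are assumptions for illustration, not the patent's method:

```python
import numpy as np

def spectral_signature(clip, bands=32):
    """Coarse log band-energy signature of an audio clip."""
    spec = np.abs(np.fft.rfft(clip * np.hanning(len(clip))))
    edges = np.linspace(0, len(spec), bands + 1, dtype=int)
    return np.log1p(np.array([spec[a:b].sum()
                              for a, b in zip(edges[:-1], edges[1:])]))

def classify_ambient(clip, database):
    """database maps labels ('car', 'rain', ...) to stored signatures;
    the closest stored sound determines the inferred surrounding."""
    sig = spectral_signature(clip)
    return min(database, key=lambda label: np.linalg.norm(sig - database[label]))
```

For example, the database could be built once as database = {"car": spectral_signature(car_clip), "rain": spectral_signature(rain_clip)} and then queried per call.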

When the call receiving module 121 receives the voice of the speaker and the ambient sound from the speaker mobile terminal 110, the GPS remote control module 128 turns on the global positioning system (GPS) function of the speaker mobile terminal 110 through the corresponding carrier server.

Since positioning by base station or Wi-Fi involves significant error, the GPS remote control module 128 preferably turns on the GPS function of the speaker mobile terminal 110 and monitors its GPS coordinates in real time.
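
The carrier-side interface is not specified in the patent; the following sketch only illustrates the control flow, with carrier_api and display as hypothetical interfaces:

```python
import time

def monitor_gps(carrier_api, terminal_id, display, interval_s=1.0):
    """Turn on the terminal's GPS via the carrier server, then stream
    coordinates to the call taker's screen until the session ends."""
    carrier_api.enable_gps(terminal_id)            # hypothetical carrier call
    while display.session_active(terminal_id):     # hypothetical UI query
        lat, lon = carrier_api.read_gps(terminal_id)
        display.show_position(terminal_id, lat, lon)
        time.sleep(interval_s)                     # poll at a fixed interval
```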

The user terminal 130 may display, in real time, the age information inferred by the age information inference module 122, the gender information inferred by the gender information inference module 123, the psychological state inferred by the psychological state inference module 124, the truth/falsehood inferred by the truth/falsehood inference module 125, and the surrounding situation inferred by the ambient acoustic inference module 126.

Also, the user terminal 130 may be configured to display the GPS coordinates of the speaker terminal 110 in real time by the GPS remote control module 128.

FIG. 2 is a flowchart illustrating a voice/sound analysis based situation determination method according to an embodiment of the present invention.

Referring to FIG. 2, the speaker mobile terminal 110 transmits the voice of the speaker and the ambient sound (S101).

Next, the situation determination server 120 receives the voice of the speaker and the ambient sound from the speaker mobile terminal 110 (S102).

Next, the situation determination server 120 analyzes the received voice and the ambient sound to infer the age information of the speaker (S103).

Next, the situation determination server 120 analyzes the received voice and the ambient sound to infer the gender information of the speaker (S104).

Next, the situation determination server 120 analyzes the received voice and the ambient sound to infer the psychological state of the speaker (S105).

Next, the situation determination server 120 analyzes the received voice and the ambient sound to infer the truth / falsehood of the speaker (S106).

Next, the user terminal 130 displays the age information, gender information, psychological state, and truth/falsehood information inferred by the situation determination server 120 (S107).

Next, when the situation determination server 120 receives the voice of the speaker and the ambient sound from the speaker mobile terminal 110, it turns on the global positioning system (GPS) function of the speaker mobile terminal 110 through the corresponding carrier server 10, and receives and displays the GPS coordinates from the speaker mobile terminal 110 in real time (S108).

It will be apparent to those skilled in the art that various modifications and variations can be made in the present invention without departing from the spirit or scope of the invention as defined in the following claims.

110: Speaker mobile terminal
120: Situation determination server
121: Call receiving module
122: Age information inference module
123: Gender information inference module
124: Psychological state inference module
125: Truth/falsehood inference module
126: Ambient acoustic inference module
127: Ambient acoustic database
128: GPS remote control module
130: User terminal

Claims (4)

A voice/sound analysis based situation determination system comprising:
a speaker mobile terminal for transmitting a voice of a speaker and a surrounding sound;
a situation determination server including: a call receiving module for receiving the voice and the ambient sound from the speaker mobile terminal; an age information inference module for inferring age information of the speaker by analyzing the voice and the ambient sound received from the call receiving module; a gender information inference module for inferring the gender of the speaker by analyzing the voice and the ambient sound received from the call receiving module; a psychological state inference module for inferring the psychological state of the speaker by analyzing the voice and the ambient sound received from the call receiving module; and a truth/falsehood inference module for inferring the truth or falsehood of the speaker by analyzing the voice and the ambient sound received from the call receiving module; and
a user terminal for displaying the age information inferred by the age information inference module, the gender information inferred by the gender information inference module, the psychological state inferred by the psychological state inference module, and the truth/falsehood inferred by the truth/falsehood inference module.
The system according to claim 1,
wherein the system further comprises a global positioning system (GPS) remote control module for turning on the GPS function of the speaker mobile terminal through the corresponding carrier server when the call receiving module receives the voice of the speaker and the ambient sound from the speaker mobile terminal.
A voice/sound analysis based situation determination method comprising:
Transmitting, by a speaker mobile terminal, a voice of a speaker and ambient sound;
Receiving a voice of the speaker and ambient sound from the speaker terminal;
Analyzing the received voice and the ambient sound to infer the age information of the speaker;
Analyzing the received voice and ambient sounds to infer the gender information of the speaker;
Analyzing the received voice and the ambient sound to infer the psychological state of the speaker;
Analyzing the received voice and ambient sound to infer the truth / falsehood of the speaker;
And displaying, on a user terminal, the age information, the gender information, the psychological state, and the truth/falsehood inferred by the situation determination server.
The method of claim 3,
wherein, when the situation determination server receives the voice of the speaker and the ambient sound from the speaker mobile terminal, the method further comprises turning on the global positioning system (GPS) function of the speaker mobile terminal through the corresponding carrier server, and receiving and displaying GPS coordinates from the speaker mobile terminal in real time.
KR1020160020348A 2016-02-22 2016-02-22 Situation judgment system and method based on voice/sound analysis KR101799874B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020160020348A KR101799874B1 (en) 2016-02-22 2016-02-22 Situation judgment system and method based on voice/sound analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020160020348A KR101799874B1 (en) 2016-02-22 2016-02-22 Situation judgment system and method based on voice/sound analysis

Publications (2)

Publication Number Publication Date
KR20170099004A (en) 2017-08-31
KR101799874B1 KR101799874B1 (en) 2017-12-21

Family

ID=59761369

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020160020348A KR101799874B1 (en) 2016-02-22 2016-02-22 Situation judgment system and method based on voice/sound analysis

Country Status (1)

Country Link
KR (1) KR101799874B1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10423773B1 (en) 2019-04-12 2019-09-24 Coupang, Corp. Computerized systems and methods for determining authenticity using micro expressions

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101998650B1 (en) * 2019-02-12 2019-07-10 한방유비스 주식회사 Collecting information management system of report of disaster

Also Published As

Publication number Publication date
KR101799874B1 (en) 2017-12-21

Similar Documents

Publication Publication Date Title
JP5834449B2 (en) Utterance state detection device, utterance state detection program, and utterance state detection method
EP1222448B1 (en) System, method, and article of manufacture for detecting emotion in voice signals by utilizing statistics for voice signal parameters
ES2242634T3 (en) TELEPHONE EMOTION DETECTOR WITH OPERATOR FEEDBACK.
US8160210B2 (en) Conversation outcome enhancement method and apparatus
US20140314212A1 (en) Providing advisory information associated with detected auditory and visual signs in a psap environment
WO2017085992A1 (en) Information processing apparatus
EP4020467A1 (en) Voice coaching system and related methods
JP2017100221A (en) Communication robot
US11699043B2 (en) Determination of transcription accuracy
KR101799874B1 (en) Situation judgment system and method based on voice/sound analysis
JP6695057B2 (en) Cognitive function evaluation device, cognitive function evaluation method, and program
JP2020000713A (en) Analysis apparatus, analysis method, and computer program
JP6598227B1 (en) Cat-type conversation robot
JP2006230446A (en) Health-condition estimating equipment
KR20180052907A (en) System and method of supplying graphic statistics using database based on voice/sound analysis
KR20180052909A (en) Interface system and method for database based on voice/sound analysis and legacy
JP6718623B2 (en) Cat conversation robot
KR20170098445A (en) Situation judgment apparatus based on voice/sound analysis
KR102571549B1 (en) Interactive elderly neglect prevention device
KR20170098446A (en) Situation judgment ethod based on voice/sound analysis
US20130143543A1 (en) Method and device for automatically switching a profile of a mobile phone
KR20190085272A (en) Open api system and method of json format support by mqtt protocol
KR102000282B1 (en) Conversation support device for performing auditory function assistance
KR20180019375A (en) Condition check and management system and the method for emotional laborer
KR101329175B1 (en) Sound analyzing and recognizing method and system for hearing-impaired people

Legal Events

Date Code Title Description
A201 Request for examination
E902 Notification of reason for refusal
E701 Decision to grant or registration of patent right