KR20220110884A

KR20220110884A - Sound Visualization Device using Deep Learning and Its Control Method

Info

Publication number: KR20220110884A
Application number: KR1020210013840A
Authority: KR
Inventors: 남영호; 이동욱; 권영민; 조재준
Original assignee: 경상국립대학교산학협력단
Priority date: 2021-02-01
Filing date: 2021-02-01
Publication date: 2022-08-09

Abstract

The present invention relates to a sound visualization device using deep learning and a control method therefor. Specifically, the device comprises a sound information recognition unit that recognizes sound information input through a microphone provided in a user's terminal, a sound information classification unit that classifies the sound information according to the type of situation using an artificial neural network model, and a visualization information generation unit that generates visualization information visualizing the type of situation into which the sound information is classified. The device is communicably connected to the terminal and transmits the visualization information to the terminal. As a result, the situation indicated by a sound can be recognized more accurately.

Description

Sound Visualization Device using Deep Learning and Its Control Method

The present invention relates to a sound visualization device using deep learning and a control method therefor. More specifically, to prevent hearing-impaired people who cannot hear sounds from being exposed to danger, the device recognizes sound information input through a microphone provided in a terminal and then visualizes the sound information as classified by an artificial neural network model.

In general, hearing impairment refers to a disability in which hearing is difficult because of damage to the auditory pathway, which runs from the outer ear to the point at which the cerebrum interprets sound. Hearing impairment is caused by congenital genetic factors or by acquired accidents.

Hearing-impaired people communicate in daily life through sign language, written conversation, and lip reading, all of which convey sound in visual form. However, most hearing people are not familiar with sign language, which makes communication with hearing-impaired people difficult. Written conversation is slow, which is a problem in urgent situations. Lip reading lets a hearing-impaired person understand what a hearing person is saying from the shape of the mouth, but it remains an incomplete means of communication, because the hearing-impaired person still needs a way to reply that the hearing person can understand.

To address this, Related Document 1 discloses a deep-learning-based wearable device for the hearing impaired that translates sign language and speech. It converts external speech into text, auto-completes text corresponding to sign language, and outputs that text as speech through a speaker, which facilitates communication between hearing and hearing-impaired people. However, because content is delivered to the hearing-impaired user as text, the device cannot immediately alert the user to, or make the user understand, an urgent situation occurring nearby.

KR 10-2019-0069786

The present invention is intended to solve the above problems. One object is to provide a sound visualization device using deep learning, and a control method therefor, that include a sound information classification unit which uses an artificial neural network model to classify sound information according to the type of situation, so that the situation can be recognized from sounds that frequently occur around hearing-impaired people or in cities.

Another object is to provide a sound visualization device using deep learning, and a control method therefor, that include a visualization information generation unit which generates visualization information visualizing the type of situation into which the sound information is classified, so that a hearing-impaired person can be alerted to, or made to understand, an urgent situation nearby as soon as it occurs.

To achieve the above objects, the sound visualization device using deep learning of the present invention comprises: a sound information recognition unit that recognizes sound information input through a microphone provided in a user's terminal; a sound information classification unit that classifies the sound information according to the type of situation using an artificial neural network model; and a visualization information generation unit that generates visualization information visualizing the type of situation into which the sound information is classified, wherein the device is communicably connected to the terminal and transmits the visualization information to the terminal.

To achieve the above objects, the control method of the sound visualization device using deep learning of the present invention comprises: a sound information recognition step in which sound information input through a microphone provided in a user's terminal is recognized by a sound information recognition unit; a sound information classification step in which the sound information is classified according to the type of situation by a sound information classification unit using an artificial neural network model; and a visualization information generation step in which visualization information visualizing the type of situation into which the sound information is classified is generated by a visualization information generation unit.

As described above, because the present invention includes a sound information classification unit that classifies sound information according to the type of situation using an artificial neural network model, it can more accurately recognize what situation is occurring from sounds that frequently occur around hearing-impaired people or in cities.

In addition, because the present invention includes a visualization information generation unit that generates visualization information visualizing the type of situation into which the sound information is classified, it can alert a hearing-impaired person to, or make the person understand, an urgent situation nearby as soon as it occurs, and can thereby prevent accidents that hearing-impaired people might otherwise suffer.

FIG. 1 is a view showing a terminal, a sound visualization device using deep learning, and a wearable device according to an embodiment of the present invention.
FIG. 2 is a configuration diagram of the sound visualization device using deep learning of the present invention.
FIG. 3 is a flowchart of the control method of the sound visualization device using deep learning of the present invention.

The terms used in this specification are, as far as possible, general terms in wide current use, selected with their function in the present invention in mind; they may vary with the intent of those skilled in the art, with precedent, or with the emergence of new technology. In certain cases a term has been chosen arbitrarily by the applicant, and its meaning is then described in detail in the relevant part of the description. Terms used in the present invention should therefore be defined by their meaning and by the overall content of the present invention, not simply by their names.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and should not be interpreted in an idealized or excessively formal sense unless explicitly so defined in this application.

Sound visualization device using deep learning

Hereinafter, an embodiment of the present invention will be described in detail with reference to the accompanying drawings. FIG. 1 is a view showing a terminal, a sound visualization device using deep learning, and a wearable device according to an embodiment of the present invention. FIG. 2 is a configuration diagram of the sound visualization device using deep learning of the present invention.

First, referring to FIG. 1, the sound visualization device 200 using deep learning of the present invention may be connected to the user's terminal 100 in a wired or wireless manner so that the two can communicate. The terminal 100 may in turn be connected, in a wired or wireless manner, to a wearable device 300 worn by the user.

However, it is most preferable that the terminal 100, the sound visualization device 200 using deep learning, and the wearable device 300 do not overlap with one another in communication. In addition, to prevent erroneous output or leakage of personal information caused by external hacking, an access-prevention security function is activated once the terminal 100 and the sound visualization device 200 are connected, so that no other device can gain access.

Most preferably, the user is a hearing-impaired person who has difficulty hearing external sounds. The terminal 100 is portable enough for the user to carry at all times, has a microphone that can pick up surrounding sounds, and can have an application installed for remotely controlling the sound visualization device 200; it may be, for example, a smartphone or a tablet PC. The wearable device 300 lets the user check the visualization information immediately without taking the terminal 100 out of a pocket or bag; it may be, for example, a smartwatch.

Next, referring to FIG. 2, the sound visualization device 200 using deep learning of the present invention may include a sound information recognition unit 210, a sound information classification unit 220, and a visualization information generation unit 230.

More specifically, the sound information recognition unit 210 recognizes sound information input through the microphone provided in the user's terminal 100. The microphone can convert the many sounds generated around the terminal 100, which are analog signals, into digital signals. That is, the sound information is a digital signal.

When sound information is input, the sound information recognition unit 210 may recognize that a sound has occurred externally and may output the sound information as an amplified electrical signal. To remove noise, it may discard any signal in the sound waveform that falls below a preset threshold.
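The thresholding and amplification described above can be illustrated with a minimal Python sketch. The function names, the threshold of 0.02, and the gain of 2.0 are assumptions chosen for illustration; the specification does not give concrete values or an implementation.

```python
def preprocess(samples, threshold=0.02, gain=2.0):
    """Zero out samples whose magnitude is below `threshold`, then amplify.

    `samples` is a sequence of floats in [-1.0, 1.0]. The threshold and gain
    are illustrative assumptions, not values from the specification.
    """
    processed = []
    for s in samples:
        if abs(s) < threshold:
            processed.append(0.0)  # treated as noise and removed
        else:
            amplified = s * gain
            # Clip so the amplified signal stays within the valid range.
            processed.append(max(-1.0, min(1.0, amplified)))
    return processed


def sound_detected(samples, threshold=0.02):
    """Return True if any sample reaches the threshold, i.e. a sound occurred."""
    return any(abs(s) >= threshold for s in samples)
```

A simple amplitude gate like this is only one way to read "removing signals below a preset threshold"; a real device might instead gate on frame energy or a frequency band.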

Next, the sound information classification unit 220 may classify the sound information according to the type of situation using an artificial neural network model. The sound information classification unit 220 may include a learning unit 221 for building the artificial neural network model. The learning unit 221 may extract features of each sound from a dataset of sounds audible in an urban environment and then train the artificial neural network model on them.

For example, the learning unit 221 may use the URBANSOUND8K dataset as the dataset of sounds audible in an urban environment. The URBANSOUND8K dataset is a database of labeled files containing excerpts, each 4 seconds or shorter, of sounds that can occur in a city, such as air conditioners, car horns, children playing, dogs barking, drilling, engines, gunshots, jackhammers, sirens, and street music. Anyone can access it and, after paying a set fee, download a file in csv format.

That is, because training on sounds for this many situations imposes a heavy load and can slow the learning unit 221 down, it is most preferable that the learning unit 221 be preset with, and trained on, only the sounds for situations that are essential to ensuring the safety of hearing-impaired people and to responding to external sounds.

For example, the learning unit 221 may use a dataset of pre-extracted sounds for five situations (car horn, dog barking, construction, siren, and doorbell), extract the sound features for each situation, and train the artificial neural network model on them. The sound information classification unit 220 may then use the artificial neural network model to classify the sound information, which has been denoised and amplified by the sound information recognition unit 210, according to which of the five situations it contains.
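Restricting training to a preset group of situations can be sketched as a filter over labeled clips. The record layout (a dict with `file` and `label` keys) and the label strings are hypothetical; they are not the schema of any particular dataset, and feature extraction and model training are left out.

```python
# Hypothetical label names for the five situations in the example embodiment.
TARGET_CLASSES = {"car_horn", "dog_bark", "construction", "siren", "doorbell"}


def select_training_records(records, target_classes=TARGET_CLASSES):
    """Keep only labeled clips whose label is one of the target situations."""
    return [r for r in records if r["label"] in target_classes]


def count_per_class(records):
    """Count clips per label, e.g. to check class balance before training."""
    counts = {}
    for r in records:
        counts[r["label"]] = counts.get(r["label"], 0) + 1
    return counts


# Usage with a few made-up records.
records = [
    {"file": "a.wav", "label": "siren"},
    {"file": "b.wav", "label": "street_music"},
    {"file": "c.wav", "label": "dog_bark"},
    {"file": "d.wav", "label": "siren"},
]
selected = select_training_records(records)
print(count_per_class(selected))
```

Dropping the non-essential classes before feature extraction is what reduces the training load; the classifier then only has to separate the remaining labels.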

Meanwhile, once the types of situation to be classified are set, the sound information classification unit 220 may set a risk level for each situation. The more dangerous the situation, in the sense that the user must act immediately, the higher the risk level assigned. For example, among the five situations (car horn, dog barking, construction, siren, and doorbell), the sound information classification unit 220 may assign a high risk level to the siren and the car horn, and may assign weights according to the risk level.
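One way to picture risk weighting is a lookup table that orders detected situations by urgency. The numeric weights below are assumptions: the specification states only that the siren and the car horn receive a high risk level, so the other values and the label strings are purely illustrative.

```python
# Illustrative weights; only "siren" and "car_horn" being high comes from the text.
RISK_WEIGHTS = {
    "siren": 1.0,
    "car_horn": 0.9,
    "construction": 0.5,
    "dog_bark": 0.4,
    "doorbell": 0.3,
}


def rank_by_risk(detected):
    """Order detected situation labels from highest to lowest risk weight.

    Labels without a configured weight are treated as risk 0.0.
    """
    return sorted(detected, key=lambda s: RISK_WEIGHTS.get(s, 0.0), reverse=True)
```

For instance, `rank_by_risk(["doorbell", "siren", "dog_bark"])` puts the siren first, so a display could show the most urgent situation before the others.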

Next, the visualization information generation unit 230 generates visualization information visualizing the type of situation into which the sound information is classified. Once the types of situation to be classified are set by the sound information classification unit 220, the corresponding images or videos may be stored in advance in the visualization information generation unit 230.

Meanwhile, if the sound information classification unit 220 classifies the sound information into two or more types of situation, the visualization information generation unit 230 may generate visualization information visualizing an inferred situation that can be deduced from those two or more types. As many inferred situations may be prepared as there are possible combinations of the situation types, and if the classified sound information does not correspond to any inferred situation, the visualization information for each situation may be displayed individually.

For example, as mentioned above, the learning unit 221 may train the artificial neural network model on sounds for the five situations (car horn, dog barking, construction, siren, and doorbell), and the sound information classification unit 220 may classify the sound information as a dog barking and a doorbell. In this case, the visualization information generation unit 230 may have images or videos for the five situations stored in advance, and may also have images or videos for the inferred situations stored in advance. For sound information classified as a dog barking and a doorbell, the visualization information generation unit 230 may then generate, as the visualization information, the pre-stored image or video for a situation in which a dog barking and a doorbell can occur together.

Meanwhile, the visualization information generation unit 230 may generate the visualization information in conjunction with the scheduler, GPS, and messages in the terminal 100. The visualization information generation unit 230 may store in advance visualization information for a courier visit and an acquaintance's visit as inferred situations in which a dog barking and a doorbell can occur at the same time. If an acquaintance's visit is recorded in the scheduler of the terminal 100 for the current date and time, an image of an acquaintance's visit may be generated as the visualization information; if the received messages in the terminal 100 include a scheduled-delivery message, an image of a courier visit may be generated; and if no additional information can be confirmed from the terminal 100, the pre-stored images or videos for the dog barking and the doorbell may each be generated individually as the visualization information.
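The selection logic in this paragraph can be sketched as a lookup keyed on the set of classified situations, refined by context from the terminal. All names below are hypothetical, the two boolean flags stand in for whatever the scheduler and message integration would actually report, and checking the scheduler before the messages is an assumption the text does not specify.

```python
# Inferred situations keyed by the combination of classified situation types.
INFERRED = {
    frozenset({"dog_bark", "doorbell"}): ("acquaintance_visit", "courier_visit"),
}


def choose_visualization(detected, scheduled_visit=False, delivery_message=False):
    """Return the visualization keys to display for the detected situations.

    scheduled_visit: the scheduler holds an acquaintance's visit for now.
    delivery_message: a scheduled-delivery message was received.
    """
    key = frozenset(detected)
    candidates = INFERRED.get(key, ())
    if "acquaintance_visit" in candidates and scheduled_visit:
        return ["acquaintance_visit"]
    if "courier_visit" in candidates and delivery_message:
        return ["courier_visit"]
    # No inferred situation applies: show each situation individually.
    return sorted(key)
```

Using a `frozenset` as the key makes the lookup independent of the order in which the classifier reports the situations.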

Next, the sound visualization device 200 using deep learning may transmit the visualization information to the terminal 100, and the terminal 100 may forward it to the wearable device 300.

For example, if a user who keeps a dog is at home and the wearable device 300 simultaneously displays an image for a dog barking and an image for a doorbell as the visualization information, the user can conclude that someone has come to the house and can act immediately, for example by using a controller to open the front door. In this way, the user, that is, a hearing-impaired person who has difficulty hearing external sounds, can grasp the external situation immediately and act on it.

Control method of the sound visualization device using deep learning

Hereinafter, an embodiment of the present invention will be described in detail with reference to the accompanying drawings. FIG. 3 is a flowchart of the control method of the sound visualization device using deep learning of the present invention.

Referring to FIG. 3, the control method of the sound visualization device using deep learning of the present invention includes a sound information recognition step (S100), a sound information classification step (S200), and a visualization information generation step (S300).
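The three steps can be pictured as a small pipeline. This is a sketch under stated assumptions: the function names, the 0.02 threshold, the image file names, and `toy_model` (a fixed rule standing in for the trained artificial neural network) are all hypothetical and are not taken from the specification.

```python
def recognize(samples, threshold=0.02):
    """S100: drop sub-threshold samples as noise; return None if nothing remains."""
    cleaned = [s if abs(s) >= threshold else 0.0 for s in samples]
    return cleaned if any(cleaned) else None


def classify(cleaned, model):
    """S200: delegate to a classifier that returns a list of situation labels."""
    return model(cleaned)


def generate_visualization(situations, image_store):
    """S300: look up the pre-stored image for each classified situation."""
    return [image_store[s] for s in situations if s in image_store]


def run_control_method(samples, model, image_store):
    """Run S100, S200, and S300 in order for one buffer of samples."""
    cleaned = recognize(samples)
    if cleaned is None:
        return []  # no sound detected, nothing to display
    return generate_visualization(classify(cleaned, model), image_store)


# Hypothetical stand-in for the trained model: a fixed rule on the peak level.
def toy_model(cleaned):
    return ["siren"] if max(abs(s) for s in cleaned) > 0.5 else ["doorbell"]


IMAGES = {"siren": "siren.png", "doorbell": "doorbell.png"}
print(run_control_method([0.0, 0.9, 0.01], toy_model, IMAGES))
```

Passing the model in as a parameter keeps the step structure visible while leaving the actual network, which the specification does not detail, outside the sketch.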

More specifically, in the sound information recognition step (S100), sound information input through the microphone provided in the user's terminal 100 is recognized by the sound information recognition unit 210. The microphone can convert the many sounds generated around the terminal 100, which are analog signals, into digital signals. That is, the sound information is a digital signal.

In the sound information recognition step (S100), when sound information is input, it may be recognized that a sound has occurred externally, and the sound information may be output as an amplified electrical signal. To remove noise, any signal in the sound waveform that falls below a preset threshold may be discarded.

Next, in the sound information classification step (S200), the sound information is classified according to the type of situation by the sound information classification unit 220 using the artificial neural network model. The sound information classification step (S200) may include a learning step (S210) for building the artificial neural network model. In the learning step (S210), the learning unit 221 within the sound information classification unit 220 may extract features of each sound from a dataset of sounds audible in an urban environment, and the artificial neural network model may then be trained on them.

For example, in the learning step (S210), the URBANSOUND8K dataset may be used as the dataset of sounds audible in an urban environment. The URBANSOUND8K dataset is a database of labeled files containing excerpts, each 4 seconds or shorter, of sounds that can occur in a city, such as air conditioners, car horns, children playing, dogs barking, drilling, engines, gunshots, jackhammers, sirens, and street music. Anyone can access it and, after paying a set fee, download a file in csv format.

That is, because training on sounds for this many situations imposes a heavy load and can slow the learning step (S210) down, it is most preferable that only the sounds for situations essential to ensuring the safety of hearing-impaired people and to responding to external sounds be preset and then learned.

For example, in the learning step (S210), a dataset of pre-extracted sounds for five situations (car horn, dog barking, construction, siren, and doorbell) may be used to extract the sound features for each situation, and the artificial neural network model may then be trained on them. In the sound information classification step (S200), the sound information that was denoised and amplified in the sound information recognition step (S100) may then be classified, using the artificial neural network model, according to which of the five situations it contains.

Meanwhile, in the sound information classification step (S200), once the types of situation to be classified are set, a risk level may be set for each situation. The more dangerous the situation, in the sense that the user must act immediately, the higher the risk level assigned. For example, among the five situations (car horn, dog barking, construction, siren, and doorbell), a high risk level may be assigned to the siren and the car horn in the sound information classification step (S200), and weights may be assigned according to the risk level.

Next, in the visualization information generation step (S300), visualization information visualizing the type of situation into which the sound information is classified is generated by the visualization information generation unit 230. Once the types of situation to be classified are set in the sound information classification step (S200), the corresponding images or videos may be stored in advance for use in the visualization information generation step (S300).

Meanwhile, if the sound information is classified into two or more types of situation in the sound information classification step (S200), visualization information visualizing an inferred situation that can be deduced from those two or more types may be generated in the visualization information generation step (S300). As many inferred situations may be prepared as there are possible combinations of the situation types, and if the classified sound information does not correspond to any inferred situation, the visualization information for each situation may be displayed individually.

For example, in the sound information classification step (S200), the sound information may be classified as a dog barking sound and a doorbell sound. As mentioned above, in the learning step (S210), the artificial neural network model may be trained on sounds for five situations including a car horn sound, a dog barking sound, a construction sound, a siren sound, and a doorbell sound.

In this case, in the visualization information generation step (S300), images or videos for the five situations including a car horn sound, a dog barking sound, a construction sound, a siren sound, and a doorbell sound may be stored in advance, and images or videos for the analogous situations may also be stored in advance. Then, for sound information classified as a dog barking sound and a doorbell sound, a pre-stored image or video of a situation in which a dog barking sound and a doorbell sound may both occur may be generated as the visualization information.
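As a non-limiting sketch, the lookup of pre-stored images for single situations and for analogous situations may be expressed as follows. The file names, the dictionary names, and the function `generate_visualization` are hypothetical, and only one analogous situation is registered here for brevity.

```python
SITUATIONS = ["car_horn", "dog_bark", "construction", "siren", "doorbell"]

# Pre-stored image for each single situation (file names are hypothetical).
SINGLE_IMAGES = {name: f"{name}.png" for name in SITUATIONS}

# Pre-stored images for analogous situations, keyed by the set of situation
# types from which the situation is inferred. Only one entry is shown.
ANALOGOUS_IMAGES = {
    frozenset({"dog_bark", "doorbell"}): "visitor_at_door.png",
}


def generate_visualization(classified):
    """Return the pre-stored image(s) for the classified situation types.

    If two or more types match a registered analogous situation, its image is
    returned; otherwise the image of each situation is returned individually.
    """
    key = frozenset(classified)
    if len(key) >= 2 and key in ANALOGOUS_IMAGES:
        return [ANALOGOUS_IMAGES[key]]
    return [SINGLE_IMAGES[name] for name in classified]
```

Keying the table by an unordered set means the result does not depend on the order in which the classifier reports the situation types.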

Meanwhile, in the visualization information generation step (S300), the visualization information may be generated in conjunction with the scheduler, GPS, and messages in the terminal (100). Visualization information for a courier visit and for an acquaintance's visit may be stored in advance as analogous situations in which a dog barking sound and a doorbell sound may occur at the same time. If a visit by an acquaintance is stored in the scheduler of the terminal (100) for the corresponding date and time, an image of the acquaintance's visit or of the acquaintance's face may be generated as the visualization information; if the received messages in the terminal (100) include a scheduled-delivery message, an image of a courier visit may be generated as the visualization information; and if no additional information is found in the terminal (100), the pre-stored images or videos for the situations in which a dog barking sound and a doorbell sound may occur may each be generated individually as the visualization information.
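The selection among these cases may, purely as an example, be sketched as follows. The data shapes for the scheduler entries and messages, the keyword matching on "visit" and "delivery", the file names, and the function name `select_visualization` are all assumptions of this sketch and are not specified in the description; GPS information is omitted.

```python
from datetime import datetime


def select_visualization(now, schedule, messages):
    """Choose visualization images for a dog bark plus doorbell detection.

    now      -- datetime at which the sounds were classified
    schedule -- list of (start, end, title) scheduler entries (assumed shape)
    messages -- list of received message texts (assumed shape)
    """
    # A scheduler entry for a visit covering the current time takes priority.
    for start, end, title in schedule:
        if start <= now <= end and "visit" in title.lower():
            return ["acquaintance_visit.png"]
    # Otherwise, look for a scheduled-delivery message.
    if any("delivery" in text.lower() for text in messages):
        return ["courier_visit.png"]
    # No additional information: show each situation individually.
    return ["dog_bark.png", "doorbell.png"]
```

Checking the scheduler before the messages mirrors the order in which the cases are described above; the description does not state which source takes precedence when both apply.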

As described above, according to the present invention, a situation can be recognized more accurately from sounds that frequently occur around a hearing-impaired person or in a city, and when an urgent situation arises nearby, the hearing-impaired person can be immediately notified of the situation or helped to understand it. Accordingly, accidents that the hearing-impaired person might otherwise experience can be prevented in advance.

Although the embodiments have been described above with reference to limited embodiments and drawings, those skilled in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the described components such as systems, structures, devices, and circuits are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents to the claims also fall within the scope of the claims that follow.

100: Terminal
200: Sound visualization device using deep learning
210: Sound information recognition unit
220: Sound information classification unit
221: Learning unit
230: Visualization information generation unit
300: Wearable device

Claims

A sound visualization device using deep learning, comprising:
a sound information recognition unit for recognizing sound information input through a microphone provided in a user's terminal;
a sound information classification unit for classifying the sound information according to a type of situation using an artificial neural network model; and
a visualization information generation unit for generating visualization information that visualizes the type of situation into which the sound information is classified,
wherein the sound visualization device is communicably connected to the terminal and transmits the visualization information to the terminal.

The sound visualization device of claim 1,
wherein the sound risk output unit
comprises a learning unit that extracts features of each sound using a data set of sounds that can be heard in an urban environment and then trains the artificial neural network model.

The sound visualization device of claim 1,
wherein the visualization information generation unit
stores in advance an image or video corresponding to the type of situation.

The sound visualization device of claim 1,
wherein the terminal
transmits the visualization information to a wearable device worn by the user.

A method of controlling a sound visualization device using deep learning, the method comprising:
a sound information recognition step of recognizing, by a sound information recognition unit, sound information input through a microphone provided in a user's terminal;
a sound information classification step of classifying, by a sound information classification unit, the sound information according to a type of situation using an artificial neural network model; and
a visualization information generation step of generating, by a visualization information generation unit, visualization information that visualizes the type of situation into which the sound information is classified.