KR20170006286A

KR20170006286A - Apparatus and method for determining voice phishing using distance between voice phishing keyword

Info

Publication number: KR20170006286A
Application number: KR1020160086154A
Authority: KR
Inventors: 이길수; 박인영
Original assignee: (주)티아이스퀘어
Priority date: 2015-07-07
Filing date: 2016-07-07
Publication date: 2017-01-17
Also published as: KR101792203B1

Abstract

The present invention relates to an apparatus and a method for determining voice phishing by calculating the distance between voice phishing keywords. The apparatus includes: a call recording unit which records a voice signal; an STT conversion unit which receives the voice signal from the call recording unit and converts it into text data; a keyword extraction unit which receives the text data from the STT conversion unit, detects whether risk words are present in the text data, and generates a risk word list containing distance information for the detected risk words; and a voice phishing determination unit which determines whether voice phishing is occurring based on the risk word list generated by the keyword extraction unit.

Description

APPARATUS AND METHOD FOR DETERMINING VOICE PHISHING USING DISTANCE BETWEEN VOICE PHISHING KEYWORDS

The present invention relates to an apparatus and method for determining voice phishing, and more particularly, to an apparatus and method that detect risk words related to voice phishing during a call, calculate distance information between the risk words, and determine whether the call is voice phishing in consideration of the distance information.

"Voice phishing" refers to a fraud technique in which a caller impersonates a relevant institution over the telephone to extract personal information or induce a money transfer. As voice phishing has become a widely known social problem, people are aware of its dangers to some degree; however, voice phishing techniques continue to grow more sophisticated, and cases of damage are still being reported.

As a conventional technique for preventing voice phishing, a service is known in which telephone numbers used for voice phishing are stored in a carrier network and calls from those numbers are blocked at the source or reported to the receiving terminal. In most cases, however, this requires that a large database backed by user ratings be built beforehand, and securing the initial data takes considerable time and effort. In addition, such information can be checked only on the incoming-call screen before the call is answered; once the call has been answered, that is, during the call, the user cannot be informed that he or she is exposed to the risk of voice phishing.

In addition, new voice phishing techniques keep appearing as the techniques evolve, and it is difficult for a user to learn in advance how to respond to every type of voice phishing case. Because damage results from a momentary lapse of attention, a further limitation is that the user cannot be informed in real time during a call whether the call is voice phishing.

A method of storing suspected voice phishing telephone numbers in the user's terminal has also been proposed, but it has the same problems. Moreover, a call that the called party actually needs may be marked as spam and blocked automatically, or accurate information may not be provided.

Because techniques that store telephone numbers to determine voice phishing have many such limitations, the present applicant has filed an application for a technique that analyzes the voice signal by speech recognition in real time during a call and detects keywords related to voice phishing, thereby determining whether the call is voice phishing (Patent Application No. 10-2016-0015700). However, that technique determines only whether keywords related to voice phishing appear during the call and does not consider the correlation between keywords, so its accuracy may be somewhat reduced.

The present invention has been made to overcome the limitations described above, and an object of the present invention is to provide an apparatus and method that improve the accuracy and reliability of voice phishing determination by analyzing the voice signal in real time during a call to detect risk words related to voice phishing and determining whether the call is voice phishing while also considering distance information between the detected risk words.

To achieve the above object, the present invention provides a voice phishing determination apparatus using distance calculation of voice phishing risk words, the apparatus comprising: a call recording unit that records a voice signal constituting the call content; an STT conversion unit that receives the voice signal from the call recording unit and converts the received voice signal into text data; a keyword extraction unit that receives the text data from the STT conversion unit, detects whether risk words are present in the text data, and generates a risk word list including distance information on the detected risk words; and a voice phishing determination unit that determines whether voice phishing is occurring based on the risk word list generated by the keyword extraction unit.

Here, the distance information between the risk words is preferably at least one of the number of characters present between the risk words, the number of words present between the risk words, and the time interval between the risk words.

The voice phishing determination unit may be configured to determine whether voice phishing is occurring based on the risk words included in the risk word list and the distance information between the risk words.

Preferably, the voice phishing determination unit refers to a voice phishing pattern database that stores voice phishing patterns each composed of a set of risk words, identifies a voice phishing pattern from the frequency with which the risk words included in the risk word list appear in a specific voice phishing pattern, and determines whether voice phishing is occurring by applying a weight to the distance information of the risk words corresponding to a risk word group set in advance for that voice phishing pattern.

A higher weight may be assigned as the distance between the risk words becomes shorter.

The risk words corresponding to the risk word group may also be treated as risk words only when their distance information is equal to or less than a preset value.

According to another aspect of the present invention, there is provided a voice phishing determination method performed by the voice phishing determination apparatus described above, the method comprising: a first step of recording a voice signal constituting the call content; a second step of converting the recorded voice signal into text data; a third step of detecting whether risk words are present in the text data and generating a risk word list including distance information on the detected risk words; and a fourth step of determining whether voice phishing is occurring based on the generated risk word list.

Here, the fourth step may be configured to determine whether voice phishing is occurring based on the risk words included in the risk word list and the distance information between the risk words.

According to the present invention, it is possible to provide an apparatus and method that improve the accuracy and reliability of voice phishing determination by analyzing the voice signal in real time during a call to detect risk words related to voice phishing and determining whether the call is voice phishing while also considering distance information between the detected risk words.

In addition, to cope with voice phishing techniques that are becoming more elaborate and varied, the present invention builds a database in advance of keywords predictable from voice phishing cases, identifies in real time whether such keywords are detected in a voice call, and considers the distance information between the words used in voice phishing. As a result, voice phishing carried out during a call can be handled accurately in real time, and a call that is not actually voice phishing can be prevented from being treated as voice phishing merely because a keyword is present.

FIG. 1 is a diagram showing the internal configuration of a voice phishing determination apparatus 100 using distance calculation of voice phishing risk words according to the present invention.
FIG. 2 is a diagram showing an example of a risk word list.
FIG. 3 is a flowchart showing a method performed by the voice phishing determination apparatus 100 described with reference to FIGS. 1 and 2.

Hereinafter, preferred embodiments of the present invention will be described with reference to the accompanying drawings.

FIG. 1 is a diagram showing the internal configuration of the voice phishing determination apparatus 100 using distance calculation of voice phishing risk words according to the present invention.

Referring to FIG. 1, the voice phishing determination apparatus using distance calculation of voice phishing risk words (hereinafter simply referred to as the "voice phishing determination apparatus 100") includes a call recording unit 10, an STT conversion unit 20, a keyword extraction unit 30, and a voice phishing determination unit 40.

The voice phishing determination apparatus 100 may be implemented in hardware or software in a mobile communication terminal. In particular, it is preferably included in a smartphone in the form of an application. Alternatively, it may be configured as a server among the resources of a communication network.

The call recording unit 10 is a means for recording the voice signal constituting the call content; it divides the recorded voice signal into units of a predetermined length and delivers them to the STT conversion unit 20.
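The division into fixed-length units can be sketched as follows. This is only an illustration under stated assumptions: the function name `split_into_chunks`, the representation of audio as a sequence of integer samples, and the parameters `sample_rate` and `chunk_seconds` are hypothetical and do not appear in the specification.

```python
from typing import Iterator, Sequence


def split_into_chunks(
    samples: Sequence[int], sample_rate: int, chunk_seconds: float
) -> Iterator[Sequence[int]]:
    """Yield consecutive fixed-length slices of a recorded signal.

    Assumption: audio is a flat sequence of samples; the last slice may be shorter.
    """
    chunk_size = int(sample_rate * chunk_seconds)
    if chunk_size <= 0:
        raise ValueError("chunk size must be at least one sample")
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


# Ten samples at 2 samples per second, split into 2-second units (4 samples each).
chunks = list(split_into_chunks(list(range(10)), sample_rate=2, chunk_seconds=2.0))
```

Each yielded slice would then be handed to the STT conversion unit 20 in turn.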

The STT (Speech to Text) conversion unit 20 converts each unit of recorded voice signal delivered from the call recording unit 10 into text data. The conversion of the voice signal into text data may be performed by means such as a speech recognition engine, either within the apparatus itself or through a separate external speech recognition server. Any technique known in the prior art may be used to convert the voice signal into text data; since this technique itself is not a direct object of the present invention, a detailed description is omitted.

The keyword extraction unit 30 receives the text data from the STT conversion unit 20, performs necessary processing such as parsing, detects whether risk words are present in the text data, and generates a risk word (keyword) list for the detected risk words.

Here, the risk words are stored in advance in a risk word database (not shown). The keyword extraction unit 30 refers to the risk word database to determine whether risk words are present in the text data received from the STT conversion unit 20 and, if so, generates a list of those risk words.

The risk word list includes distance information between the detected risk words.

Here, the distance information may consist of any one of pieces of information such as the number of characters or words present between risk words and the time interval between risk words, or of a combination of these.

FIG. 2 is a diagram showing an example of a risk word list.

Referring to FIG. 2, the keyword extraction unit 30 has detected risk words from the text data sequentially in the order "prosecution, bank, account, ..., password". The distance information of the respective risk words is shown as "3, 8, ..., 4", where each number is the number of words between risk words. That is, three words are present between "prosecution" and "bank", and eight words are present between "bank" and "account".

In this way, the risk word list is configured to include the risk words and the distance information between the respective risk words.
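A risk word list of this kind might be built as in the sketch below, which uses the word-count form of distance. The names `RiskWordEntry` and `build_risk_word_list`, the whitespace-and-punctuation tokenization, and the English sample sentence are assumptions made for illustration; the specification does not prescribe a data layout or a tokenizer.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class RiskWordEntry:
    word: str
    # Number of words between this risk word and the previous one;
    # None for the first detected risk word.
    distance: Optional[int]


def build_risk_word_list(text: str, risk_words: Set[str]) -> List[RiskWordEntry]:
    """Scan the text in order and record each risk word with its distance."""
    tokens = re.findall(r"\w+", text.lower())  # assumed tokenizer
    entries: List[RiskWordEntry] = []
    last_index: Optional[int] = None
    for index, token in enumerate(tokens):
        if token in risk_words:
            distance = None if last_index is None else index - last_index - 1
            entries.append(RiskWordEntry(token, distance))
            last_index = index
    return entries


sample = "this is the prosecution we checked your bank and the account"
entries = build_risk_word_list(sample, {"prosecution", "bank", "account"})
```

In the sample sentence, three words ("we checked your") lie between "prosecution" and "bank", mirroring the first distance value shown for FIG. 2.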

The voice phishing determination unit 40 determines whether voice phishing is occurring based on the risk word list generated by the keyword extraction unit 30, specifically based on the risk words included in the risk word list and the distance information between the risk words.

The voice phishing determination unit 40 can determine whether voice phishing is occurring by analyzing the voice phishing pattern from the risk words included in the risk word list and the distance information between the respective risk words.

Here, the voice phishing patterns are stored in advance in a voice phishing pattern database (not shown), and the voice phishing determination unit 40 analyzes the voice phishing pattern with reference to the voice phishing pattern database.

A voice phishing pattern may be constructed by analyzing the types of frequently occurring voice phishing, such as the "prosecution impersonation type" and the "bank impersonation type", and collecting the set of risk words (keywords) that frequently appear in each pattern. Whether a call corresponds to a given voice phishing pattern may be evaluated probabilistically by scoring the frequency with which the risk words generated by the keyword extraction unit 30 appear in the set of risk words constituting that pattern.

For example, as a simple method, when the risk word set of the "prosecution impersonation type" consists of the six words "prosecution, bank, account, seal, password, visit" and the risk words included in the risk word list are the three words "prosecution, bank, password", the voice phishing determination unit 40 may determine the probability of the "prosecution impersonation type" voice phishing pattern to be 50% (= (3/6) * 100).
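The simple calculation in this example can be written out as a short sketch. The function name `pattern_match_percent` is hypothetical, and treating repeated detections of the same word as a single match is an assumption; the specification gives only the ratio itself.

```python
from typing import Iterable, Set


def pattern_match_percent(detected: Iterable[str], pattern_words: Set[str]) -> float:
    """Percentage of a pattern's risk words that were detected in the call."""
    if not pattern_words:
        return 0.0
    matched = set(detected) & pattern_words  # duplicates count once (assumption)
    return 100.0 * len(matched) / len(pattern_words)


PROSECUTION_PATTERN = {"prosecution", "bank", "account", "seal", "password", "visit"}
percent = pattern_match_percent(["prosecution", "bank", "password"], PROSECUTION_PATTERN)
```

With three of the six pattern words detected, `percent` evaluates to 50.0, matching the figure in the text.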

In this case, a specific risk word group may be set in advance for the voice phishing pattern, and, in consideration of the distance information between the risk words belonging to the group, a weight may be applied to the probability value for that voice phishing pattern such that the shorter the distance, the greater the weight. For example, "bank, password" may be set as a risk word group, and by referring to the distance information between "bank" and "password" and assigning a higher weight as the distance becomes shorter, the probability of that voice phishing pattern can be raised.

The weight may be applied by a linear or nonlinear combination of various operations such as multiplication and addition, and is not limited to any particular operation.
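Since the weighting operation is left open, the following is only one possible sketch. The multiplicative form `1 + boost / (1 + distance)`, the default `boost` of 0.5, the cap at 100, and the name `apply_distance_weight` are all assumptions chosen for illustration, not the method of the specification.

```python
def apply_distance_weight(base_percent: float, distance: int, boost: float = 0.5) -> float:
    """Raise a pattern probability more strongly when group words are closer.

    Assumed formula: weight = 1 + boost / (1 + distance), so the weight
    shrinks toward 1 as the distance grows. The result is capped at 100.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    weight = 1.0 + boost / (1 + distance)
    return min(100.0, base_percent * weight)


# "bank" and "password" adjacent versus twenty words apart, base probability 50%.
adjacent = apply_distance_weight(50.0, distance=0)
far_apart = apply_distance_weight(50.0, distance=20)
```

Any monotonically decreasing function of distance would serve the same purpose; this form was picked only because it is easy to read.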

Meanwhile, the words of a specific risk word group may be treated as risk words only when the distance between them is equal to or less than a preset value. This means that when the distance information between the specific risk words is greater than the preset value, those risk words are not considered for the voice phishing pattern.
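The threshold rule might be sketched as follows. The function name `filter_group_by_distance`, the use of token positions as input, and the choice to leave the list unchanged when only one word of the group was detected are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def filter_group_by_distance(
    positions: Dict[str, int], group: Tuple[str, str], max_distance: int
) -> List[str]:
    """Return the detected words that remain eligible for pattern scoring.

    positions maps each detected risk word to its token index in the transcript.
    If both words of the group were detected but lie more than max_distance
    words apart, both are dropped; otherwise every detected word is kept.
    """
    first, second = group
    if first in positions and second in positions:
        distance = abs(positions[first] - positions[second]) - 1
        if distance > max_distance:
            return [word for word in positions if word not in group]
    return list(positions)


detected_positions = {"prosecution": 3, "bank": 7, "password": 40}
kept_strict = filter_group_by_distance(detected_positions, ("bank", "password"), 10)
kept_loose = filter_group_by_distance(detected_positions, ("bank", "password"), 50)
```

Here "bank" and "password" are 32 words apart, so they are dropped under a limit of 10 and kept under a limit of 50.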

FIG. 3 is a flowchart showing a method performed by the voice phishing determination apparatus 100 described with reference to FIGS. 1 and 2.

Referring to FIG. 3, first, the call recording unit 10 records the voice signal constituting the call content (S100).

The call recording unit 10 divides the recorded voice signal into units of a predetermined time or size and transmits the divided voice signal to the STT conversion unit 20 (S110, S120).

The STT conversion unit 20 receives the voice signal from the call recording unit 10 and converts the received voice signal into text data (S130).

When the conversion into text data is complete, the STT conversion unit 20 transmits the text data to the keyword extraction unit 30 (S140).

Upon receiving the text data from the STT conversion unit 20, the keyword extraction unit 30 detects whether risk words are present in the received text data and generates a risk word list including distance information on the detected risk words (S150).

Here, as described above, the distance information between the risk words may be at least one of the number of characters present between the risk words, the number of words present between the risk words, and the time interval between the risk words, and the risk word list may be configured in the form shown in FIG. 2.

The keyword extraction unit 30 delivers the risk word list to the voice phishing determination unit 40 (S160), and the voice phishing determination unit 40 determines whether voice phishing is occurring based on the received risk word list (S170).

Here, the determination of whether voice phishing is occurring may be made based on the risk words included in the risk word list and the distance information between the risk words.

As described above, the voice phishing determination unit 40 may refer to the voice phishing pattern database, which stores voice phishing patterns each composed of a set of risk words, identify a voice phishing pattern from the frequency with which the risk words included in the risk word list appear in a specific voice phishing pattern, and determine whether voice phishing is occurring by applying a weight to the distance information of the risk words corresponding to the risk word group set in advance for that voice phishing pattern. For example, a higher weight may be assigned as the distance between the risk words in a specific risk word group becomes shorter, or the risk words corresponding to a specific risk word group may be treated as risk words only when their distance information is equal to or less than a preset value.
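The flow of steps S110 to S170 can be tied together in a compact sketch. Everything named here is hypothetical: `determine_voice_phishing`, the injected `speech_to_text` callable standing in for a real recognition engine, the `alert_percent` decision threshold, the weighting formula, and the simplification of measuring distance between the first occurrences of the group words. The specification does not state a decision threshold or a concrete scoring rule.

```python
import re
from typing import Callable, Dict, Iterable, List, Set, Tuple


def determine_voice_phishing(
    audio_chunks: Iterable[bytes],
    speech_to_text: Callable[[bytes], str],
    pattern_words: Set[str],
    group: Tuple[str, str],
    alert_percent: float = 60.0,
) -> bool:
    """Return True as soon as the running score reaches alert_percent."""
    if not pattern_words:
        return False
    tokens: List[str] = []
    for chunk in audio_chunks:                    # S110-S120: a divided unit arrives
        text = speech_to_text(chunk)              # S130-S140: STT conversion
        tokens.extend(re.findall(r"\w+", text.lower()))

        # S150: record where each risk word first appears in the transcript.
        positions: Dict[str, int] = {}
        for index, token in enumerate(tokens):
            if token in pattern_words and token not in positions:
                positions[token] = index

        # S170: share of the pattern's risk words seen so far, as a percentage.
        score = 100.0 * len(positions) / len(pattern_words)

        # Assumed weighting: closer group words raise the score more.
        first, second = group
        if first in positions and second in positions:
            distance = abs(positions[first] - positions[second]) - 1
            score = min(100.0, score * (1.0 + 0.5 / (1 + distance)))

        if score >= alert_percent:
            return True
    return False


def fake_stt(chunk: bytes) -> str:
    """Stand-in for a speech recognition engine: the bytes already hold text."""
    return chunk.decode("utf-8")


PATTERN = {"prosecution", "bank", "account", "seal", "password", "visit"}
suspicious = determine_voice_phishing(
    [b"this is the prosecution office", b"tell me your bank password now"],
    fake_stt, PATTERN, ("bank", "password"),
)
benign = determine_voice_phishing(
    [b"hello how are you"], fake_stt, PATTERN, ("bank", "password"),
)
```

Evaluating the score after every chunk reflects the stated aim of determining voice phishing in real time during the call rather than after it ends.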

Although preferred embodiments of the present invention have been described above, the present invention is of course not limited to these embodiments.

100 ... Voice phishing determination apparatus using distance calculation of voice phishing risk words
10 ... Call recording unit
20 ... STT conversion unit
30 ... Keyword extraction unit
40 ... Voice phishing determination unit

Claims

A voice phishing determination apparatus using distance calculation of voice phishing risk words, the apparatus comprising:
a call recording unit that records a voice signal constituting call content;
an STT conversion unit that receives the voice signal from the call recording unit and converts the received voice signal into text data;
a keyword extraction unit that receives the text data from the STT conversion unit, detects whether risk words are present in the text data, and generates a risk word list including distance information on the detected risk words; and
a voice phishing determination unit that determines whether voice phishing is occurring based on the risk word list generated by the keyword extraction unit.

The apparatus according to claim 1,
wherein the distance information between the risk words is at least one of the number of characters present between the risk words, the number of words present between the risk words, and the time interval between the risk words.

The apparatus according to claim 1,
wherein the voice phishing determination unit determines whether voice phishing is occurring based on the risk words included in the risk word list and the distance information between the risk words.

The apparatus according to claim 3,
wherein the voice phishing determination unit
refers to a voice phishing pattern database that stores voice phishing patterns each composed of a set of risk words, identifies a voice phishing pattern from the frequency with which the risk words included in the risk word list appear in a specific voice phishing pattern,
and determines whether voice phishing is occurring by applying a weight to the distance information of the risk words corresponding to a risk word group set in advance for that voice phishing pattern.

The apparatus according to claim 4,
wherein a higher weight is assigned as the distance between the risk words becomes shorter.

The apparatus according to claim 4,
wherein the risk words corresponding to the risk word group are treated as risk words only when their distance information is equal to or less than a preset value.

A voice phishing determination method performed by the voice phishing determination apparatus according to any one of claims 1 to 6, the method comprising:
a first step of recording a voice signal constituting call content;
a second step of converting the recorded voice signal into text data;
a third step of detecting whether risk words are present in the text data and generating a risk word list including distance information on the detected risk words; and
a fourth step of determining whether voice phishing is occurring based on the generated risk word list.

The method according to claim 7,
wherein the fourth step
determines whether voice phishing is occurring based on the risk words included in the risk word list and the distance information between the risk words.