WO2018212584A2 - Method and apparatus for classifying class, to which sentence belongs, using deep neural network - Google Patents

Method and apparatus for classifying class, to which sentence belongs, using deep neural network

Info

Publication number
WO2018212584A2
Authority
WO
WIPO (PCT)
Prior art keywords
sentence
class
neural network
feature vector
vector
Prior art date
Application number
PCT/KR2018/005598
Other languages
French (fr)
Korean (ko)
Other versions
WO2018212584A3 (en)
Inventor
송희준
쿨카르니니레시
Original Assignee
삼성전자 주식회사
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from KR1020180055651A external-priority patent/KR102071582B1/en
Application filed by 삼성전자 주식회사 filed Critical 삼성전자 주식회사
Priority to US16/613,317 priority Critical patent/US11568240B2/en
Publication of WO2018212584A2 publication Critical patent/WO2018212584A2/en
Publication of WO2018212584A3 publication Critical patent/WO2018212584A3/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks

Definitions

  • the present disclosure relates to a method and apparatus for classifying a class to which a sentence belongs, by structurally analyzing a question sentence using a deep neural network.
  • AI technology is composed of machine learning and elementary technologies that utilize machine learning.
  • Machine learning is an algorithm technology that classifies / learns characteristics of input data by itself.
  • Element technology is a technology that simulates functions of human brain cognition and judgment by using machine learning algorithms such as neural networks. It consists of technical areas such as linguistic understanding, visual understanding, reasoning / prediction, knowledge representation, and motion control.
  • AI technology can recognize, apply and process human language / characters, and is also used for natural language processing, machine translation, dialogue system, question and answer, speech recognition / synthesis, and so on.
  • In a question-and-answer system using artificial intelligence technology, the structure of the user's question sentence is analyzed, the answer type, intent, and subject / verb are determined, and a related answer is found in a database.
  • In a question-and-answer system that executes a user's command, the user's input speech is classified, the intent is analyzed, and an entity is identified to process the command.
  • Recently, customer care chatbots that use artificial intelligence to analyze user problems and provide appropriate answers have come into use.
  • In customer support chatbots, it is important to analyze the user's speech and determine the category in which the user wants to receive an answer. If the amount of questions already stored is not large, the user's speech may be misclassified into a category different from the user's intention. In this case, the user may not receive the desired answer.
  • the present disclosure provides a method and apparatus for increasing the classification accuracy of a first sentence by additionally using a separate second neural network model in classifying the class to which the first sentence belongs using a first neural network model.
  • the present disclosure not only trains a first neural network model to classify the first class to which a first sentence belongs, but also additionally uses a second neural network model, and introduces a contrastive loss based on the first feature vector learned through the first neural network model, the second feature vector learned through the second neural network model, and whether the first class and the second class are identical, so that even the degree of representational similarity between the first sentence and the second sentence can be distinguished. Accordingly, the method and apparatus according to an embodiment of the present disclosure may improve the accuracy of sentence classification by using not only the label of a sentence or speech but also semantic similarity.
  • FIG. 1 is a conceptual diagram illustrating an embodiment of obtaining a classification prediction value of a class to which a sentence belongs by training by inputting a sentence vector and a class into a neural network model according to an embodiment of the present disclosure.
  • FIG. 2 is a block diagram illustrating components of an electronic device according to an embodiment of the present disclosure.
  • FIG. 3 is a flowchart illustrating a method of classifying a class to which a sentence belongs, according to an embodiment of the present disclosure.
  • FIG. 4 is a diagram for describing a method of classifying a class to which a sentence belongs, using a convolutional neural network, according to an embodiment of the present disclosure.
  • FIG. 5 is a flowchart illustrating a method of obtaining, by an electronic device, a classification prediction value that is a probability value classified into a class to which a first sentence belongs.
  • FIG. 6 is a flowchart illustrating a learning method of adjusting, by an electronic device, a weight applied to a neural network model based on a loss value obtained through the neural network model, according to an embodiment of the present disclosure.
  • an embodiment of the present disclosure provides a method of classifying, by using a deep neural network, the class to which a sentence belongs, the method including: training a first feature vector through a first neural network by using, as input data, a first sentence including at least one word and a first class to which the first sentence belongs; training a second feature vector through a second neural network by using, as input data, a second sentence and a second class to which the second sentence belongs; obtaining a contrastive loss value that quantifies the degree of representational similarity between the first sentence and the second sentence, based on the first feature vector, the second feature vector, and whether the first class and the second class are identical; and repeating the training using the first neural network and the second neural network so that the contrastive loss value is maximized.
  • the method may include receiving a speech from a user; Recognizing the received speech as a sentence; And extracting at least one word included in the recognized sentence, and converting the at least one word into at least one word vector, wherein learning the first feature vector comprises: Generating a sentence vector by arranging the word vectors in a matrix form; And learning the first feature vector by inputting the sentence vector as input data to the first neural network.
  • a plurality of sentences and a plurality of classes to which each of the plurality of sentences belong are stored in a database, and the second sentence and the second class may be extracted randomly on the database.
  • the contrastive loss value may be calculated using a formula that represents, as numbers, the dot product of the first feature vector and the second feature vector and whether the first class and the second class are identical.
  • the formula may output 1 when the first class and the second class are the same, and output 0 when the first class and the second class are not the same.
  • learning the first feature vector may include converting the first sentence into a matrix form including at least one word vector; Inputting the transformed matrix into the convolutional neural network as input data and generating a feature map by applying a plurality of filters; And extracting the first feature vector by passing the feature map through a max pooling layer.
  • the method includes inputting a first feature vector into a fully connected layer and converting it into a one-dimensional vector value; And inputting a one-dimensional vector value to a softmax classifier to obtain a first classification prediction value representing a probability distribution classified into a first class.
  • the method may further include: obtaining a first classification loss value, which is the difference between the first classification prediction value and the first class; obtaining a second classification prediction value representing the probability distribution with which the second sentence is classified into the second class through the second neural network, and obtaining a second classification loss value, which is the difference between the second classification prediction value and the second class; calculating a final loss value by summing the first classification loss value, the second classification loss value, and the contrastive loss value; and adjusting a weight applied to the first neural network and the second neural network based on the calculated final loss value.
  • the learning through the first neural network and the learning through the second neural network may be performed at the same time.
  • an embodiment of the present disclosure may provide an electronic device that classifies a class to which a sentence belongs, using a deep neural network.
  • the electronic device includes a processor that performs training by using a neural network. The processor learns a first feature vector through a first neural network by using, as input data, a first sentence including at least one word and a first class to which the first sentence belongs, learns a second feature vector through a second neural network by using, as input data, a second sentence and a second class to which the second sentence belongs, obtains a contrastive loss value that quantifies the degree of representational similarity between the first sentence and the second sentence based on the first feature vector, the second feature vector, and whether the first class and the second class are identical, and repeats the learning using the first neural network and the second neural network so that the contrastive loss value is maximized.
  • the electronic device may further include a speech input unit configured to receive a speech from a user, and the processor may recognize the received speech as a sentence, extract at least one word included in the recognized sentence, and At least one word may be converted into at least one word vector, respectively.
  • the processor may generate the sentence vector by arranging the at least one word vector in a matrix form, and may learn the first feature vector by inputting the sentence vector to the first neural network as input data.
  • the electronic device further includes a database storing a plurality of sentences and a plurality of classes to which the plurality of sentences respectively belong, and the processor may randomly extract the second sentence and the second class from the database and input them to the second neural network as input data.
  • the processor may calculate the contrastive loss value through a numerical expression representing the dot product of the first feature vector and the second feature vector and whether the first class and the second class are identical.
  • the formula may output 1 when the first class and the second class are the same, and output 0 when the first class and the second class are not the same.
  • the processor may convert the first sentence into a matrix including at least one word vector, input the converted matrix to a convolutional neural network as input data, generate a feature map by applying a plurality of filters, and extract the first feature vector by passing the feature map through a max pooling layer.
  • the processor may input the first feature vector into a fully connected layer to convert it into a one-dimensional vector value, and input the one-dimensional vector value to a softmax classifier to obtain a first classification prediction value representing the probability distribution with which the first sentence is classified into the first class.
  • the processor may obtain a first classification loss value, which is the difference between the first classification prediction value and the first class, obtain a second classification prediction value representing the probability distribution with which the second sentence is classified into the second class through the second neural network, obtain a second classification loss value, which is the difference between the second classification prediction value and the second class, calculate a final loss value by summing the first classification loss value, the second classification loss value, and the contrastive loss value, and adjust the weight applied to the first neural network and the second neural network based on the calculated final loss value.
  • the processor may simultaneously perform learning through the first neural network and learning through the second neural network.
  • an embodiment of the present disclosure also provides a computer program product including a computer-readable storage medium, wherein the storage medium stores instructions for performing the above-described method by using, as input data, a first sentence including at least one word and the first class to which the first sentence belongs.
  • a "... unit" means a unit that processes at least one function or operation, and may be implemented in hardware, software, or a combination of hardware and software.
  • FIG. 1 is a conceptual diagram for explaining an embodiment of obtaining classification prediction values (y1', y2') of the classes to which sentences belong, by inputting sentence vectors (Si, Sj) and classes (y1, y2) into neural network models 100 and 110 and training the models, according to an embodiment of the present disclosure.
  • Artificial intelligence (AI) algorithms, including deep neural networks, input data into an artificial neural network (ANN) and learn output data through operations such as convolution.
  • Artificial neural networks can refer to a computer scientific architecture that models the biological brain.
  • nodes corresponding to neurons in the brain are connected to each other and operate collectively to process input data.
  • neurons in the neural network have links with other neurons. Such connections may extend in one direction, for example in a forward direction, via a neural network.
  • First, a first sentence vector (Si) and a first class (y1) are input to the first neural network model 100 as input data, and a first classification prediction value (y1') may be output through learning by the first neural network model 100.
  • Likewise, a second sentence vector (Sj) and a second class (y2) are input to the second neural network model 110 as input data, and a second classification prediction value (y2') may be output through learning by the second neural network model 110.
  • the first neural network model 100 and the second neural network model 110 shown in FIG. 1 may be implemented as a convolutional neural network (CNN), but are not limited thereto.
  • the first neural network model 100 and the second neural network model 110 may also be implemented as an artificial neural network model such as a recurrent neural network (RNN), a deep belief network (DBN), or a restricted Boltzmann machine (RBM), or as a machine learning model such as a support vector machine (SVM).
  • the first sentence vector (Si) and the second sentence vector (Sj) may be generated by parsing at least one word included in a sentence or utterance input by a user through a natural language processing technique and converting the extracted words into vectors.
  • the first sentence vector (Si) and the second sentence vector (Sj) may be generated through a machine learning model for embedding words as vectors, such as word2vec, GloVe, or onehot, but are not limited thereto.
  • the first sentence vector Si and the second sentence vector S j may be generated by arranging at least one word vector in a matrix form.
  • the first class (y1) and the second class (y2) may be vector values that define the classes to which the first sentence vector (Si) and the second sentence vector (Sj) respectively belong.
  • a class does not mean a hierarchy, but may mean a category classification to which a sentence belongs, for example, politics, society, economy, culture, entertainment, IT, and the like.
  • First classification prediction value (y1') is a result value output by training the first sentence vector (Si) through the first neural network model 100, and may mean the probability with which the first sentence vector (Si) is classified into the first class (y1).
  • For example, the first sentence corresponding to the first sentence vector (Si) may relate to the category "politics."
  • The second classification prediction value (y2') is a result value output through the second neural network model 110, and may mean the probability with which the second sentence vector (Sj) is classified into the second class (y2) through training by the second neural network model 110.
  • a first classification loss value may be obtained by calculating a difference value between the first classification prediction value y 1 ′ and the first class y 1 .
  • a second classification loss value may be obtained by calculating a difference value between the second classification prediction value y 2 ′ and the second class y 2 .
  • the first neural network model 100 and the second neural network model 110 may be configured as a convolutional neural network (CNN).
  • the first sentence vector (Si) and the second sentence vector (Sj) are each passed through a plurality of filters having different widths in the first neural network model 100 and the second neural network model 110, so that a first feature vector and a second feature vector may be learned, respectively.
  • Contrastive loss value (L1) may be obtained based on the first feature vector, the second feature vector, and whether the first class and the second class are identical.
  • the contrastive loss value (L1) may be calculated through a numerical expression representing the dot product of the first feature vector and the second feature vector and whether the first class and the second class are identical.
  • the contrastive loss value (L1) will be described in detail in the description of FIG. 4.
  • the contrastive loss value (L1) may have a value in the range of -1 to 1, inclusive.
  • learning with the first neural network model 100 and the second neural network model 110 may be repeated in a direction in which the contrastive loss value (L1) is maximized.
  • the repetition of learning may mean adjusting a weight applied to the first neural network model 100 and the second neural network model 110.
  • in a general classification approach, classification is performed by training a neural network model with a loss function constructed from the label of the sentence to be learned.
  • many misclassifications can occur if the utterance to be classified does not fall into any class of the classification model.
  • when classes are classified based only on labels, utterances with similar expressions may be misclassified. For example, the user input speech "Send 'KakaoTalk' to 'XXX'" may be misclassified as "Send 'Text' to 'XXX'."
  • An embodiment of the present disclosure not only trains the first neural network model 100 to classify the first class to which the first sentence belongs, but also trains a second neural network model 110 on a second sentence belonging to a second class (y2), and calculates the contrastive loss value (L1) based on the first feature vector, the second feature vector, and whether the two classes are identical, thereby providing a method and apparatus that can distinguish the degree of representational similarity between the first sentence and the second sentence.
  • the method and apparatus according to an embodiment of the present disclosure may use not only the label of a sentence or speech but also semantic similarity together to improve classification accuracy for sentences that are similar but belong to different classes.
  • the electronic device 200 may be a device that performs training for classifying a class to which a sentence belongs by using a neural network model.
  • the electronic device 200 may be a fixed terminal implemented as a computer device or a mobile terminal.
  • the electronic device 200 may be, for example, at least one of a smart phone, a mobile phone, a navigation device, a computer, a notebook computer, a digital broadcasting terminal, a personal digital assistant (PDA), a portable multimedia player (PMP), and a tablet PC. But it is not limited thereto.
  • the electronic device 200 may communicate with other electronic devices and / or servers through a network by using a wireless or wired communication scheme.
  • the electronic device 200 may include a processor 210, a memory 220, and a speech input unit 230.
  • the processor 210 may be configured to process instructions of a computer program by performing arithmetic, logic, and input / output operations, such as convolution operations. Instructions may be provided to the processor 210 by the memory 220. In one embodiment, processor 210 may be configured to execute a command received according to a program code stored in a recording device, such as memory 220. The processor 210 may be configured, for example, with at least one of a central processing unit, a microprocessor, and a graphic processing unit, but is not limited thereto. In an embodiment, when the electronic device 200 is a mobile device such as a smart phone, a tablet PC, or the like, the processor 210 may be an application processor (AP) for executing an application.
  • the processor 210 may perform training through a general artificial intelligence algorithm based on a deep neural network such as a neural network model.
  • the processor 210 may perform natural language processing (NLP), such as extracting words from a user's speech or question sentence and converting the extracted words into word vectors to generate a sentence vector.
  • the processor 210 may parse word objects by tokenizing a sentence, remove stop words (such as articles), normalize tokens (tense and singular/plural unification, etc.), extract highly related keywords based on their frequency of occurrence, and manage them as independent entities.
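  • As an illustration of this kind of preprocessing, the minimal Python sketch below tokenizes a sentence, drops stop words, applies a crude singular/plural unification, and picks keywords by frequency; the stop-word list, the suffix rule, and the example sentence are hypothetical placeholders for a real NLP pipeline.

```python
import re
from collections import Counter

STOP_WORDS = {"a", "an", "the", "to", "is", "of", "please"}  # hypothetical stop-word list

def extract_keywords(sentence: str, top_n: int = 3):
    tokens = re.findall(r"[a-z0-9']+", sentence.lower())        # objectize the sentence into word tokens
    tokens = [t for t in tokens if t not in STOP_WORDS]          # drop stop words such as articles
    tokens = [t[:-1] if t.endswith("s") else t for t in tokens]  # crude singular/plural unification
    counts = Counter(tokens)                                     # frequency of occurrence
    return [word for word, _ in counts.most_common(top_n)]       # highly related keywords

print(extract_keywords("Please send the messages to the managers"))  # ['send', 'message', 'manager']
```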
  • the processor 210 learns a first feature vector through a first neural network using as input data a first sentence including at least one word and a first class to which the first sentence belongs.
  • the second feature vector may be learned through the second neural network by using, as input data, the second sentence and the second class to which the second sentence belongs.
  • the first sentence may be a sentence or speech input by a user
  • the second sentence may be a sentence extracted randomly among a plurality of sentences stored in a server or a database.
  • the processor 210 may obtain a contrastive loss value that quantifies the degree of similarity in expression between the first sentence and the second sentence, based on the first feature vector, the second feature vector, and whether the first class and the second class are identical.
  • the processor 210 may calculate the contrastive loss value through a numerical expression representing the dot product of the first feature vector and the second feature vector and whether the first class and the second class are identical. The method of calculating the contrastive loss value will be described in detail with reference to FIG. 4.
  • the contrastive loss value has a value in the range of -1 to 1, inclusive, and the processor 210 may repeat the learning using the first neural network and the second neural network so that the obtained contrastive loss value is maximized.
  • the processor 210 may simultaneously perform learning through the first neural network and learning through the second neural network.
  • the electronic device 200 may further include a speech input unit 230 that receives a speech or sentence from a user.
  • the speech input unit 230 may include a voice recognition module for recognizing a user's voice, but is not limited thereto.
  • the speech input unit 230 may include, for example, a hardware module capable of receiving a user's sentence such as a keypad, a mouse, a touch pad, a touch screen, a jog switch, and the like.
  • the processor 210 may recognize an utterance input through the utterance input unit 230 as a sentence, parse and extract at least one word included in the recognized sentence, and convert each extracted word into a word vector.
  • the processor 210 may embed a word into a vector using a machine learning model such as word2vec, GloVe, onehot, etc., but is not limited thereto.
  • the processor 210 may convert the word representation into a vector value that can be represented in a vector space using the machine learning model.
  • the processor 210 may generate a sentence vector by arranging at least one word vector in a matrix form, and input the sentence vector as input data to the first neural network to learn the first feature vector.
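  • A minimal sketch of this step is shown below, assuming a toy vocabulary and a randomly initialized embedding table standing in for a pretrained model such as word2vec or GloVe; the words, vocabulary, and dimension k are illustrative only.

```python
import torch
import torch.nn as nn

vocab = {"send": 0, "a": 1, "text": 2, "message": 3}     # hypothetical vocabulary
k = 8                                                     # word-vector dimension
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=k)

words = ["send", "a", "text", "message"]                  # words extracted from the utterance
indices = torch.tensor([vocab[w] for w in words])
sentence_vector = embedding(indices)                      # word vectors arranged as an n x k matrix
print(sentence_vector.shape)                              # torch.Size([4, 8])
```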
  • the processor 210 converts the first sentence into a matrix form including at least one word vector, inputs the converted matrix to the convolutional neural network as input data, generates a feature map by applying a plurality of filters, and may extract the first feature vector by passing the feature map through a max pooling layer.
  • the processor 210 may input the first feature vector into a fully connected layer to convert it into a one-dimensional vector value, and input the one-dimensional vector value to a softmax classifier to obtain a first classification prediction value representing the probability distribution with which the first sentence is classified into the first class.
  • likewise, the processor 210 learns and extracts a second feature vector, inputs the second feature vector into a fully connected layer to convert it into a one-dimensional vector value, and inputs the one-dimensional vector value to a softmax classifier to obtain a second classification prediction value representing the probability distribution with which the second sentence is classified into the second class. A detailed description thereof is provided with reference to FIG. 4.
  • the processor 210 may obtain a first classification loss value that is a difference between the first classification prediction value and the first class, and obtain a second classification loss value that is a difference between the second classification prediction value and the second class.
  • the processor 210 may calculate a final loss value by summing the first classification loss value, the second classification loss value, and the contrastive loss value, and may repeat learning to adjust the weights applied to the first neural network and the second neural network based on the calculated final loss value.
  • the memory 220 is a computer-readable recording medium and may include random access memory (RAM), read only memory (ROM), and a permanent mass storage device such as a disk drive.
  • the memory 220 may store an operating system (OS) or at least one piece of computer program code (for example, code for a learning program through a neural network performed by the processor 210).
  • Such computer program code may be loaded into the memory 220 from a separate computer-readable recording medium or a computer program product.
  • the separate computer-readable recording medium may include, for example, a floppy disk drive, a disk, a tape, a DVD/CD-ROM drive, and a memory card.
  • alternatively, the computer program code may be installed in the electronic device 200 based on files provided from a server through a network and loaded into the memory 220.
  • the electronic device 200 may include a database.
  • the database may store a plurality of sentences and a plurality of classes to which each of the plurality of sentences belongs.
  • the database may be included as a component in the electronic device 200, but is not limited thereto.
  • the database may be configured in the form of a server disposed outside the electronic device 200.
  • the processor 210 may randomly extract the second sentence and the second class from the database and input them to the second neural network as input data for learning.
  • FIG. 3 is a flowchart illustrating a method of classifying a class to which a sentence belongs, according to an embodiment of the present disclosure.
  • the electronic device trains a first feature vector through a first neural network using the first sentence and the first class to which the first sentence belongs as input data.
  • the electronic device may receive an utterance or question from a user and recognize the received utterance or question as a sentence.
  • the electronic device may parse at least one word included in a recognized sentence by using natural language processing (NLP) technology, and may convert at least one word into at least one word vector.
  • the electronic device may embed at least one word into at least one word vector using a machine learning model such as word2vec, GloVe, onehot, and the like, but is not limited thereto.
  • the electronic device may convert the word representation into a vector value that can be represented in a vector space using the machine learning model.
  • the electronic device generates a sentence vector by arranging the embedded at least one word vector in a matrix form, inputs the generated sentence vector to the first neural network as input data, and learns the probability distribution with which the sentence vector is classified into the first class.
  • the electronic device learns the second feature vector through the second neural network using the second sentence and the second class to which the second sentence belongs as input data.
  • the second sentence and the second class to which the second sentence belongs may be stored in a database form.
  • the electronic device may randomly extract the second sentence and the second class from the database and input the second sentence and the second class as input data into the second neural network to learn.
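  • A minimal sketch of this random extraction is shown below; the stored sentences and class labels are hypothetical.

```python
import random

# Hypothetical database of (sentence, class) pairs.
database = [
    ("How do I reset my password?", "account"),
    ("The screen flickers after the update", "display"),
    ("My battery drains too quickly", "battery"),
]

second_sentence, second_class = random.choice(database)   # randomly extracted training pair
print(second_sentence, "->", second_class)
```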
  • in FIG. 3, step S320 is shown as being performed after step S310, but the order is not limited thereto.
  • the electronic device may simultaneously perform the step of learning the first feature vector (S310) and the step of learning the second feature vector (S320).
  • the electronic device obtains a contrast loss based on the first feature vector, the second feature vector, and whether the first class and the second class are identical.
  • the contrast loss value may be calculated through a numerical expression representing the dot product of the first feature vector and the second feature vector and whether the first class and the second class are equal. The expression may output 1 when the first class and the second class are the same, and output 0 when the first class and the second class are not the same.
  • the electronic device repeats the learning using the first neural network and the second neural network so that the contrast loss value is maximum.
  • the contrastive loss value may have a value ranging from -1 to 1, inclusive.
  • the electronic device may repeat the learning using the first neural network model and the second neural network model in a direction in which the contrast loss value is maximized.
  • repetition of learning may mean adjusting a weight applied to the first neural network model and the second neural network model.
  • FIG. 4 is a diagram for describing a method of classifying a class to which a sentence belongs, using a convolutional neural network, according to an embodiment of the present disclosure.
  • the electronic device inputs a first sentence (Si) and a first class (y1) to a first neural network 401 as input data, and learns the probability distribution with which the first sentence (Si) is classified into the first class (y1). Further, the electronic device inputs a second sentence (Sj) and a second class (y2) to a second neural network 402 as input data, and learns the probability distribution with which the second sentence (Sj) is classified into the second class (y2).
  • the first class (y1) and the second class (y2) may be vector values that define the classes to which the first sentence (Si) and the second sentence (Sj) respectively belong.
  • the first neural network 401 and the second neural network 402 may be configured as a convolutional neural network model (CNN), but are not limited thereto.
  • the first neural network 401 and the second neural network 402 may also be implemented as an artificial neural network model such as a recurrent neural network (RNN), a deep belief network (DBN), or a restricted Boltzmann machine (RBM), or as a machine learning model such as a support vector machine (SVM).
  • the electronic device parses the first sentence (Si) using natural language processing technology and may extract a plurality of words (word 1-1 to word 1-6) included in the first sentence (Si).
  • although a total of six words (word 1-1 to word 1-6) are shown in FIG. 4, this is exemplary, and the number of words belonging to the first sentence (Si) is not limited to six.
  • the electronic device may convert the plurality of words (word 1-1 to word 1-6) into a plurality of word vectors (wv1-1 to wv1-6), respectively.
  • the electronic device may embed the plurality of words (word 1-1 to word 1-6) into the plurality of word vectors (wv1-1 to wv1-6) using a machine learning model such as word2vec, GloVe, or onehot, and thereby generate the first sentence vector (Si) and the second sentence vector (Sj).
  • the sentence vector 411 may be an n × k matrix, where n is the number of words and k is the dimension of each word vector.
  • the electronic device may apply a plurality of filters 421 having different widths to the sentence vector 411 to perform a convolution operation, thereby generating a feature map 431.
  • the plurality of filters 421 are vectors having different weights, and the weight value may change as learning progresses.
  • the electronic device may generate the feature map 431 by multiplying and adding vector values of the sentence vector 411 and weight values of the plurality of filters 421.
  • the plurality of filters 421 are illustrated as having a width of 2, 3, and 4, but are not limited thereto.
  • the dimension k in the plurality of filters 421 may be the same as the dimension k of the sentence vector.
  • the electronic device may subsample the feature map 431 by passing the feature map 431 through a max pooling layer, and generate a first feature vector 441.
  • the first feature vector 441 is a single feature vector generated by extracting only the maximum vector value from the feature map 431 through the max pooling layer, and may be defined as a representation vector of the first sentence (Si).
  • however, the subsampling layer is not limited to the max pooling layer shown in FIG. 4; the electronic device may also generate the first feature vector 441 through average pooling or L2-norm pooling.
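  • A minimal PyTorch sketch of this convolution and pooling branch is shown below, assuming the filter widths 2, 3, and 4 illustrated in FIG. 4; the embedding dimension, filter count, and input are placeholders, and the sentence must contain at least as many words as the widest filter.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceEncoder(nn.Module):
    def __init__(self, k: int = 8, num_filters: int = 4, widths=(2, 3, 4)):
        super().__init__()
        # each filter slides a (width x k) window over the n x k sentence matrix
        self.convs = nn.ModuleList(
            nn.Conv2d(1, num_filters, kernel_size=(w, k)) for w in widths)

    def forward(self, sentence_matrix: torch.Tensor) -> torch.Tensor:
        x = sentence_matrix.unsqueeze(1)                            # (batch, 1, n, k)
        maps = [F.relu(conv(x)).squeeze(3) for conv in self.convs]  # one feature map per filter width
        # max pooling keeps only the maximum value of each feature map
        pooled = [F.max_pool1d(m, m.size(2)).squeeze(2) for m in maps]
        return torch.cat(pooled, dim=1)                             # concatenated feature vector

encoder = SentenceEncoder()
feature_vector = encoder(torch.randn(1, 6, 8))                      # a 6-word sentence matrix
print(feature_vector.shape)                                         # torch.Size([1, 12])
```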
  • the electronic device may input and concatenate the first feature vector 441 to a fully connected layer, thereby generating the one-dimensional vector 451.
  • the electronic device may generate the first classification predicted value vector 461 by inputting the one-dimensional vector 451 to the softmax classifier.
  • the first classification prediction value vector 461 may represent the probability distribution with which the first sentence (Si) is classified into the first class (y1).
  • the electronic device may perform a dropout operation in order to prevent the occurrence of overfitting generated in the process of adjusting the weight.
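  • The fully connected layer, dropout, and softmax classifier described above can be sketched as follows; the feature-vector length of 12 and the six example classes are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 6                                  # e.g. politics, society, economy, culture, entertainment, IT
fc = nn.Linear(12, NUM_CLASSES)                  # fully connected layer over the feature vector
dropout = nn.Dropout(p=0.5)                      # dropout to reduce overfitting while adjusting weights

feature_vector = torch.randn(1, 12)              # stands in for the first feature vector 441
logits = fc(dropout(feature_vector))             # one-dimensional vector of class scores
prediction = F.softmax(logits, dim=1)            # first classification prediction value vector
print(prediction)
```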
  • likewise, the electronic device learns a second feature vector 442 by inputting the second sentence (Sj) and the second class (y2) to the second neural network 402 as input data, and thereby generates a second classification prediction value vector 462.
  • the learning method through the second neural network 402 is the same as the learning method through the first neural network 401 except for input data and learning results, and thus redundant description thereof will be omitted.
  • the electronic device may obtain a contrastive loss value (L1) that quantifies the degree of similarity in expression between the first sentence (Si) and the second sentence (Sj), based on the first feature vector 441, the second feature vector 442, and whether the first class (y1) and the second class (y2) are identical. If the first feature vector 441 is defined as F(Si) and the second feature vector 442 is defined as F(Sj), the contrastive loss value (L1) may be calculated based on the following equation.
  • that is, the contrastive loss value (L1) may be calculated through the absolute value of the dot product of the first feature vector F(Si) and the second feature vector F(Sj), together with Y.
  • Y is a notation that converts whether the first class (y1) and the second class (y2) are identical into a number: 1 may be output when the first class (y1) and the second class (y2) are identical, and 0 may be output when they are not.
  • depending on the similarity of the two feature vectors and whether the two classes are identical, the contrastive loss value (L1) may be calculated as -1, 0, or 1.
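  • Since the equation itself is not reproduced in this text, the sketch below shows one plausible reading of the description (an assumption, not necessarily the exact equation of the disclosure): the absolute value of the dot product of the normalized feature vectors, signed by Y, which yields values between -1 and 1 with the behavior described above.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(f_i: torch.Tensor, f_j: torch.Tensor, y_i: int, y_j: int) -> torch.Tensor:
    Y = 1.0 if y_i == y_j else 0.0                         # 1 if the two classes are identical, else 0
    sim = torch.abs(torch.dot(F.normalize(f_i, dim=0),
                              F.normalize(f_j, dim=0)))    # expression similarity in [0, 1]
    return (2.0 * Y - 1.0) * sim                           # L1 in [-1, 1]

f_a, f_b = torch.tensor([1.0, 0.0]), torch.tensor([0.9, 0.1])
print(contrastive_loss(f_a, f_b, 0, 1))   # similar expressions, different classes -> close to -1
print(contrastive_loss(f_a, f_b, 3, 3))   # similar expressions, same class      -> close to +1
```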
  • that is, the contrastive loss value (L1) reflects not only whether the classes (y1, y2) into which the first and second feature vectors are classified are identical, but also the degree of similarity between the first feature vector F(Si) and the second feature vector F(Sj).
  • the electronic device may learn in a direction in which the contrast loss value L 1 is maximized.
  • the contrast loss value L 1 has a value of -1 or more and 1 or less.
  • when the first sentence (Si) and the second sentence (Sj) have similar expressions and belong to the same class, the electronic device may relatively reduce the number of learning iterations through the first neural network 401 and the second neural network 402. In contrast, when the first sentence (Si) and the second sentence (Sj) have similar expressions even though they belong to different classes, the electronic device may increase the number of learning iterations so as to distinguish them from each other.
  • the electronic device may obtain a first classification loss value (L2), which is the difference between the first classification prediction value vector 461 output as a learning result of the first neural network 401 and the vector of the first class (y1).
  • likewise, the electronic device may obtain a second classification loss value (L3), which is the difference between the second classification prediction value vector 462 output as a learning result of the second neural network 402 and the vector of the second class (y2).
  • the first classification loss value (L2) and the second classification loss value (L3) quantify how accurately the first sentence (Si) and the second sentence (Sj) are classified into the first class (y1) and the second class (y2), respectively.
  • the electronic device may calculate a final loss value (total loss, L) by summing the contrastive loss value (L1), the first classification loss value (L2), and the second classification loss value (L3), as shown in Equation 2 below.
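  • The image of Equation 2 is not reproduced here; read literally as the sum of the three loss values named above, it may be written as below (the exact form, including any sign or weighting applied to L1, is an assumption).

```latex
L = L_{1} + L_{2} + L_{3}
```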
  • the electronic device may learn to adjust weights applied to the first neural network 401 and the second neural network 402 based on the calculated final loss value L.
  • in Equation 2, L2 and L3 denote the classification losses with which the first sentence (Si) and the second sentence (Sj) are classified into the first class (y1) and the second class (y2), respectively; y1 denotes the first class; y2 denotes the second class; and L1 denotes the contrastive loss value.
  • for example, when the electronic device executes an interactive assistant program such as Bixby, the first sentence (Si), which is the speech input by the user, may be classified into a different class even though it belongs to the first class (y1), because its expression differs. In that case, the user may receive an incorrectly classified answer that does not correspond to the first sentence (Si). The electronic device may increase the accuracy of classifying the class to which the user's question belongs by learning in consideration of the contrastive loss value (L1).
  • there may also be a case where the first sentence (Si) does not belong to any of the classes previously stored in the electronic device. In this case, the electronic device may reject classifying the first sentence (Si) into any class, which can reduce the likelihood of the user receiving an unwanted answer.
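  • The rejection mechanism is not specified in this text; one common realization, sketched below under that assumption, is to reject when no class reaches a confidence threshold on the softmax output.

```python
import torch

def classify_or_reject(prediction: torch.Tensor, threshold: float = 0.5):
    best_prob, best_class = prediction.max(dim=-1)
    if best_prob.item() < threshold:
        return None                         # reject: the sentence matches no stored class well enough
    return int(best_class.item())

print(classify_or_reject(torch.tensor([0.2, 0.3, 0.25, 0.25])))   # None (rejected)
print(classify_or_reject(torch.tensor([0.05, 0.85, 0.05, 0.05]))) # 1
```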
  • FIG. 5 is a flowchart illustrating a method of obtaining, by an electronic device, a classification prediction value that is a probability value classified into a class to which a first sentence belongs.
  • the electronic device converts the first sentence into a matrix form including at least one word vector.
  • the first sentence may be a speech or sentence input by a user.
  • the electronic device may extract at least one word included in the first sentence and convert the at least one word into at least one word vector.
  • the electronic device may embed at least one word into at least one word vector using a machine learning model such as word2vec, GloVe, onehot.
  • the electronic device may generate the first sentence vector by arranging at least one word vector in a matrix form.
  • the electronic device inputs the converted matrix as input data to a convolutional neural network and generates a feature map by applying a plurality of filters.
  • the electronic device may perform a convolution operation by applying a plurality of filters having different widths.
  • the plurality of filters are vectors having different weights, and the weight values may change as learning progresses.
  • the plurality of filters may have the same dimension k as the sentence vector generated in step S510.
  • the electronic device extracts the first feature vector by passing the feature map through a max pooling layer.
  • the electronic device may extract a first feature vector that is a single feature vector generated by extracting only a vector value having a maximum value from the feature map through the max pooling layer.
  • the layer used for subsampling is not limited to the max pooling layer.
  • the electronic device may extract the first feature vector through average pooling or L 2 -norm pooling.
  • the electronic device inputs the first feature vector into a fully connected layer and converts the first feature vector into a one-dimensional vector value.
  • the electronic device may concatenate the first feature vector, which is composed of the plurality of feature maps generated by using filters of different widths, into one vector, thereby converting it into a one-dimensional vector value.
  • a dropout operation may be used to resolve overfitting occurring while the first feature vector is learned and to increase accuracy.
  • the electronic device obtains a first classification prediction value by inputting a one-dimensional vector value to a softmax classifier.
  • the first classification prediction value refers to the probability with which the first sentence may be classified into the first class, and may be generated by passing the one-dimensional vector through the softmax classifier.
  • each vector value included in the one-dimensional vector is converted by the softmax classifier into a probability value such that the total sum of the converted values is 1.
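  • For instance, with hypothetical score values the conversion looks as follows; the resulting probabilities always sum to 1.

```python
import torch
import torch.nn.functional as F

one_dim_vector = torch.tensor([2.0, 0.5, -1.0])   # hypothetical values of the one-dimensional vector
probs = F.softmax(one_dim_vector, dim=0)          # approximately tensor([0.7856, 0.1753, 0.0391])
print(probs.sum())                                # 1.0
```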
  • although FIG. 5 illustrates the process of obtaining the first classification prediction value by inputting the first sentence to the convolutional neural network, the illustrated steps may be applied equally to the second sentence.
  • the electronic device may obtain a second classification prediction value by inputting the second sentence to the convolutional neural network according to steps S510 to S550.
  • the electronic device may simultaneously perform a first learning process of obtaining a first classification prediction value and a second learning process of obtaining a second classification prediction value.
  • FIG. 6 is a flowchart illustrating a learning method of adjusting, by an electronic device, a weight applied to a neural network model based on a loss value obtained through the neural network model, according to an embodiment of the present disclosure.
  • the electronic device obtains a first classification loss value that is a difference between the first classification prediction value and the first class.
  • the first classification loss value may mean the difference between the first classification prediction value, which is the probability with which the first sentence is classified into the first class, and the first class vector.
  • the electronic device obtains a second classification loss value that is a difference between the second classification prediction value and the second class.
  • the second classification loss value may mean the difference between the second classification prediction value, which is the probability with which the second sentence is classified into the second class, and the second class vector.
  • step S610 and step S620 may be performed simultaneously.
  • the electronic device obtains a final loss value by summing the first classification loss value, the second classification loss value, and the contrastive loss value.
  • a detailed method of calculating the contrastive loss value has been described with reference to FIG. 4, and a redundant description is omitted.
  • the electronic device adjusts a weight applied to the first neural network and the second neural network based on the final loss value.
  • the first neural network and the second neural network are configured as convolutional neural networks that generate feature maps by applying a plurality of filters, and the electronic device may adjust the weight values of the plurality of filters applied to the convolutional neural networks according to the magnitude of the final loss value.
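  • One weight-update step consistent with FIG. 6 can be sketched as follows; the linear stand-ins for the two branches and the classifier, the dimensions, the class indices, and the sign convention for the contrastive term (minimizing -L1 so that L1 is maximized) are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

K, NUM_CLASSES = 8, 6
encoder_1, encoder_2 = nn.Linear(K, 12), nn.Linear(K, 12)   # stand-ins for the two CNN branches
classifier = nn.Linear(12, NUM_CLASSES)                     # softmax classifier (assumed shared)

optimizer = torch.optim.Adam(
    list(encoder_1.parameters()) + list(encoder_2.parameters()) + list(classifier.parameters()),
    lr=1e-3)

s_i, s_j = torch.randn(1, K), torch.randn(1, K)             # stand-ins for the two sentence vectors
y_i, y_j = torch.tensor([2]), torch.tensor([4])             # first class / second class indices

f_i, f_j = encoder_1(s_i), encoder_2(s_j)                   # first / second feature vectors
Y = float(y_i.item() == y_j.item())                         # 1 if the classes are identical, else 0
L1 = (2 * Y - 1) * torch.abs(F.cosine_similarity(f_i, f_j)).mean()   # contrastive loss value
L2 = F.cross_entropy(classifier(f_i), y_i)                  # first classification loss value
L3 = F.cross_entropy(classifier(f_j), y_j)                  # second classification loss value

final_loss = -L1 + L2 + L3                                  # summed losses; -L1 so that L1 is maximized
optimizer.zero_grad()
final_loss.backward()
optimizer.step()                                            # adjusts the weights of both networks
```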
  • the electronic device described herein may be implemented as a hardware component, a software component, and / or a combination of hardware components and software components.
  • the electronic device described in the disclosed embodiments may include a processor, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), It may be implemented using one or more general purpose or special purpose computers, such as a microprocessor or any other device capable of executing and responding to instructions.
  • the software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may instruct the processing device independently or collectively.
  • the software may be implemented as a computer program including instructions stored in a computer-readable storage media.
  • Computer-readable recording media include, for example, magnetic storage media (e.g., read-only memory (ROM), random-access memory (RAM), floppy disks, and hard disks) and optical recording media (e.g., CD-ROM and DVD (Digital Versatile Disc)).
  • the computer readable recording medium can be distributed over network coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.
  • the medium may be read by a computer, stored in a memory, and executed by a processor.
  • the computer is a device capable of calling stored instructions from a storage medium and operating according to the disclosed embodiments according to the called instructions, and may include an electronic device according to the disclosed embodiments.
  • the computer readable storage medium may be provided in the form of a non-transitory storage medium.
  • 'non-transitory' means that the storage medium does not include a signal and is tangible, and does not distinguish whether data is stored semi-permanently or temporarily on the storage medium.
  • an electronic device or method according to the disclosed embodiments may be provided included in a computer program product.
  • the computer program product may be traded between the seller and the buyer as a product.
  • the computer program product may include a software program and a computer-readable storage medium on which the software program is stored.
  • for example, the computer program product may include a product in the form of a software program (e.g., a downloadable application) distributed electronically through the manufacturer of the electronic device or through an electronic market (e.g., Google Play Store or App Store).
  • the storage medium may be a server of a manufacturer, a server of an electronic market, or a storage medium of a relay server that temporarily stores a software program.
  • the computer program product may include a storage medium of a server or a storage medium of a terminal in a system consisting of a server and a terminal (for example, an ultrasound diagnostic apparatus).
  • when there is a third device (e.g., a smartphone) in communication with the server or the terminal, the computer program product may include a storage medium of the third device.
  • the computer program product may include a software program itself transmitted from the server to the terminal or the third device, or transmitted from the third device to the terminal.
  • one of the server, the terminal and the third device may execute the computer program product to perform the method according to the disclosed embodiments.
  • two or more of the server, the terminal and the third device may execute a computer program product to distribute and implement the method according to the disclosed embodiments.
  • for example, a server (e.g., a cloud server or an artificial intelligence server) may execute a computer program product stored in the server to control a terminal connected to the server to perform the method according to the disclosed embodiments.
  • a third device may execute a computer program product to control a terminal in communication with the third device to perform the method according to the disclosed embodiment.
  • the third device may download the computer program product from the server and execute the downloaded computer program product.
  • the third apparatus may execute the provided computer program product in a preloaded state to perform the method according to the disclosed embodiments.

Abstract

Provided are a method and an apparatus for classifying a class, to which a sentence belongs, using a deep neural network. One embodiment of the present disclosure provides a method and an apparatus for: learning a first sentence and a second sentence through a first neural network and a second neural network, respectively; acquiring a contrastive loss value on the basis of a first feature vector and a second feature vector, which are generated as output data of the learning, and of whether the classes to which the first and second sentences belong are identical; and repeating the learning so as to maximize the contrastive loss value.

Description

딥 뉴럴 네트워크를 이용하여 문장이 속하는 클래스를 분류하는 방법 및 장치Method and apparatus for classifying a class to which a sentence belongs by using a deep neural network
본 개시는 딥 뉴럴 네트워크(Deep Neural Network)를 이용하여 질문 문장을 구조 분석하여, 문장이 속하는 클래스(class)를 분류하는 방법 및 장치에 관한 것이다. The present disclosure relates to a method and apparatus for classifying a class to which a sentence belongs, by structurally analyzing a question sentence using a deep neural network.
인공 지능(Artificial Intelligence, AI) 시스템은 인간 수준의 지능을 구현하는 컴퓨터 시스템이며, 기존 Rule 기반 스마트 시스템과 달리 기계가 스스로 학습하고 판단하며 똑똑해지는 시스템이다. 인공지능 시스템은 사용할수록 인식률이 향상되고 사용자 취향을 보다 정확하게 이해할 수 있게 되어, 기존 Rule 기반 스마트 시스템은 점차 딥 뉴럴 네트워크 기반 인공지능 시스템으로 대체되고 있다.Artificial Intelligence (AI) system is a computer system that implements human-level intelligence, and unlike conventional rule-based smart systems, machines learn and judge themselves and become smart. As the AI system is used, the recognition rate is improved and the user's taste can be understood more accurately, and the existing Rule-based smart system is gradually replaced by the AI system based on the deep neural network.
인공지능 기술은 기계학습 및 기계학습을 활용한 요소 기술들로 구성된다.AI technology is composed of machine learning and elementary technologies that utilize machine learning.
기계학습은 입력 데이터들의 특징을 스스로 분류/학습하는 알고리즘 기술이며, 요소기술은 뉴럴 네트워크 등의 기계학습 알고리즘을 활용하여 인간 두뇌의 인지, 판단 등의 기능을 모사하는 기술로서, 언어적 이해, 시각적 이해, 추론/예측, 지식 표현, 동작 제어 등의 기술 분야로 구성된다.Machine learning is an algorithm technology that classifies / learns characteristics of input data by itself. Element technology is a technology that simulates the functions of human brain cognition and judgment by using machine learning algorithms such as neural networks. It consists of technical areas such as understanding, reasoning / prediction, knowledge representation, and motion control.
AI technology can recognize, apply, and process human language and text, and is also used for natural language processing, machine translation, dialogue systems, question answering, and speech recognition and synthesis. A question answering system using AI technology structurally analyzes the user's question sentence, performs answer-type analysis, intent analysis, and subject/verb analysis, and uses the results to find a relevant answer in a database. A question answering system that carries out user commands classifies the user's input utterance to analyze its intent, and detects independent entities to process the command.
Recently, customer care chatbots that use AI technology to analyze a user's problem and provide an appropriate answer have come into use. In a customer care chatbot, it is important to analyze the user's utterance and determine the category in which the user wants to receive an answer. When the number of previously stored questions is small, the user's utterance may be misclassified into a category other than the one the user intended. In this case, the user does not receive the desired answer.
The present disclosure provides a method and apparatus for increasing the classification accuracy of a first sentence by additionally using a separate second neural network model when classifying the class to which the first sentence belongs by using a first neural network model.
The present disclosure provides a method and apparatus that, in classifying a first class to which a first sentence belongs, not only trains a first neural network model but also additionally uses a second neural network model, introducing a contrastive loss based on a first feature vector trained through the first neural network model, a second feature vector trained through the second neural network model, and whether the first class and the second class are identical, so that even the degree of similarity in expression between the first sentence and the second sentence can be distinguished. Accordingly, the method and apparatus according to embodiments of the present disclosure can improve the accuracy of sentence classification by using semantic similarity together with the label of a sentence or utterance.
The present disclosure may be readily understood from the following detailed description in combination with the accompanying drawings, in which reference numerals denote structural elements.
FIG. 1 is a conceptual diagram for describing an embodiment of obtaining a classification prediction value of the class to which a sentence belongs by inputting a sentence vector and a class to a neural network model and performing training, according to an embodiment of the present disclosure.
FIG. 2 is a block diagram illustrating components of an electronic device according to an embodiment of the present disclosure.
FIG. 3 is a flowchart illustrating a method, performed by an electronic device, of classifying the class to which a sentence belongs, according to an embodiment of the present disclosure.
FIG. 4 is a diagram for describing a method, performed by an electronic device, of classifying the class to which a sentence belongs by using a convolutional neural network, according to an embodiment of the present disclosure.
FIG. 5 is a flowchart illustrating a method, performed by an electronic device, of obtaining a classification prediction value, which is a probability value of being classified into the class to which a first sentence belongs, according to an embodiment of the present disclosure.
FIG. 6 is a flowchart illustrating a training method, performed by an electronic device, of adjusting a weight applied to a neural network model based on a loss value obtained through the neural network model, according to an embodiment of the present disclosure.
In order to solve the above technical problem, an embodiment of the present disclosure provides a method of classifying the class to which a sentence belongs by using a deep neural network, the method including: training a first feature vector through a first neural network by using, as input data, a first sentence including at least one word and a first class to which the first sentence belongs; training a second feature vector through a second neural network by using, as input data, a second sentence and a second class to which the second sentence belongs; obtaining a contrastive loss that quantifies the degree of similarity in expression between the first sentence and the second sentence, based on the first feature vector, the second feature vector, and whether the first class and the second class are identical; and repeating the training using the first neural network and the second neural network so that the contrastive loss is maximized.
For example, the method may further include: receiving an utterance from a user; recognizing the received utterance as a sentence; and extracting at least one word included in the recognized sentence and converting the at least one word into at least one word vector, respectively. The training of the first feature vector may include: generating a sentence vector by arranging the at least one word vector in a matrix form; and training the first feature vector by inputting the sentence vector to the first neural network as input data.
For example, a plurality of sentences and a plurality of classes to which the plurality of sentences respectively belong may be stored in a database, and the second sentence and the second class may be randomly extracted from the database.
For example, the obtaining of the contrastive loss may include calculating the contrastive loss through an equation that represents, as numbers, a dot product of the first feature vector and the second feature vector and whether the first class and the second class are identical.
For example, the equation may output 1 when the first class and the second class are identical, and may output 0 when the first class and the second class are not identical.
For example, the training of the first feature vector may include: converting the first sentence into a matrix form including at least one word vector; inputting the converted matrix to a convolutional neural network as input data and generating a feature map by applying a plurality of filters; and extracting the first feature vector by passing the feature map through a max pooling layer.
For example, the method may further include: converting the first feature vector into a one-dimensional vector value by inputting the first feature vector to a fully connected layer; and obtaining a first classification prediction value representing a probability distribution of being classified into the first class by inputting the one-dimensional vector value to a softmax classifier.
For example, the method may further include: obtaining a first classification loss that is a difference between the first classification prediction value and the first class; obtaining, through the second neural network, a second classification prediction value representing a probability distribution in which the second sentence is classified into the second class, and obtaining a second classification loss that is a difference between the second classification prediction value and the second class; and calculating a final loss by summing the first classification loss, the second classification loss, and the contrastive loss, and adjusting a weight applied to the first neural network and the second neural network based on the calculated final loss.
For example, the training through the first neural network and the training through the second neural network may be performed simultaneously.
In order to solve the above technical problem, an embodiment of the present disclosure may provide an electronic device that classifies the class to which a sentence belongs by using a deep neural network. The electronic device includes a processor that performs training by using a neural network, and the processor trains a first feature vector through a first neural network by using, as input data, a first sentence including at least one word and a first class to which the first sentence belongs, trains a second feature vector through a second neural network by using, as input data, a second sentence and a second class to which the second sentence belongs, obtains a contrastive loss that quantifies the degree of similarity in expression between the first sentence and the second sentence based on the first feature vector, the second feature vector, and whether the first class and the second class are identical, and repeatedly performs the training using the first neural network and the second neural network so that the contrastive loss is maximized.
For example, the electronic device may further include an utterance input unit that receives an utterance from a user, and the processor may recognize the received utterance as a sentence, extract at least one word included in the recognized sentence, and convert the at least one word into at least one word vector, respectively.
For example, the processor may generate a sentence vector by arranging the at least one word vector in a matrix form, and may train the first feature vector by inputting the sentence vector to the first neural network as input data.
For example, the electronic device may further include a database that stores a plurality of sentences and a plurality of classes to which the plurality of sentences respectively belong, and the processor may randomly extract the second sentence and the second class from the database and input them to the second neural network as input data.
For example, the processor may calculate the contrastive loss through an equation that represents, as numbers, a dot product of the first feature vector and the second feature vector and whether the first class and the second class are identical.
For example, the equation may output 1 when the first class and the second class are identical, and may output 0 when the first class and the second class are not identical.
For example, the processor may convert the first sentence into a matrix form including at least one word vector, input the converted matrix to a convolutional neural network as input data, generate a feature map by applying a plurality of filters, and extract the first feature vector by passing the feature map through a max pooling layer.
For example, the processor may convert the first feature vector into a one-dimensional vector value by inputting the first feature vector to a fully connected layer, and may obtain a first classification prediction value representing a probability distribution of being classified into the first class by inputting the one-dimensional vector value to a softmax classifier.
For example, the processor may obtain a first classification loss that is a difference between the first classification prediction value and the first class, obtain, through the second neural network, a second classification prediction value representing a probability distribution in which the second sentence is classified into the second class, obtain a second classification loss that is a difference between the second classification prediction value and the second class, calculate a final loss by summing the first classification loss, the second classification loss, and the contrastive loss, and adjust a weight applied to the first neural network and the second neural network based on the calculated final loss.
For example, the processor may perform the training through the first neural network and the training through the second neural network simultaneously.
In order to solve the above technical problem, an embodiment of the present disclosure provides a computer program product including a computer-readable storage medium, the storage medium including instructions for performing: training a first feature vector through a first neural network by using, as input data, a first sentence including at least one word and a first class to which the first sentence belongs; training a second feature vector through a second neural network by using, as input data, a second sentence and a second class to which the second sentence belongs; obtaining a contrastive loss that quantifies the degree of similarity in expression between the first sentence and the second sentence based on the first feature vector, the second feature vector, and whether the first class and the second class are identical; and repeating the training using the first neural network and the second neural network so that the contrastive loss is maximized.
This application claims priority from U.S. Provisional Application No. 62/506,724, filed on May 16, 2017, and Korean Patent Application No. 10-2018-0055651, filed on May 15, 2018, with the Korean Intellectual Property Office.
The terms used in the embodiments of this specification are general terms currently in wide use, selected in consideration of the functions of the present disclosure, but they may vary according to the intention of those skilled in the art, precedents, or the emergence of new technologies. In certain cases, some terms have been arbitrarily chosen by the applicant, and in such cases their meanings will be described in detail in the description of the corresponding embodiments. Accordingly, the terms used in the present disclosure should be defined based on their meanings and the overall content of the present disclosure, not simply on their names.
Throughout the specification, when a part is described as "including" a component, this means that the part may further include other components rather than excluding them, unless specifically stated otherwise. In addition, terms such as "...unit" and "...module" used in the specification refer to a unit that processes at least one function or operation, which may be implemented as hardware, software, or a combination of hardware and software.
Hereinafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present disclosure pertains may easily carry them out. However, the present disclosure may be implemented in many different forms and is not limited to the embodiments described herein.
Hereinafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings.
FIG. 1 is a conceptual diagram for describing an embodiment of obtaining classification prediction values (y1', y2') of the classes to which sentences belong by inputting sentence vectors (Si, Sj) and classes (y1, y2) to neural network models 100 and 110 and performing training, according to an embodiment of the present disclosure.
An artificial intelligence (AI) algorithm including a deep neural network is characterized in that input data is fed into an artificial neural network (ANN) and output data is learned through operations such as convolution. An artificial neural network may refer to a computational architecture that models a biological brain. Within an artificial neural network, nodes corresponding to the neurons of the brain are connected to one another and operate collectively to process input data. In a feed-forward neural network, the neurons of the neural network have links to other neurons. Such links may extend through the neural network in one direction, for example, in a forward direction.
Referring to FIG. 1, a first sentence vector (Si) and a first class (y1) are input to the first neural network model 100 as input data, and a first classification prediction value (y1') may be output through training with the first neural network model 100. In addition, a second sentence vector (Sj) and a second class (y2) are input to the second neural network model 110 as input data, and a second classification prediction value (y2') may be output through training with the second neural network model 110.
The first neural network model 100 and the second neural network model 110 shown in FIG. 1 may be implemented as convolutional neural networks (CNNs), but are not limited thereto. In an embodiment, the first neural network model 100 and the second neural network model 110 may be implemented as artificial neural network models such as a recurrent neural network (RNN), a deep belief network (DBN), or a restricted Boltzmann machine (RBM), or may be implemented as a machine learning model such as a support vector machine (SVM).
The first sentence vector (Si) and the second sentence vector (Sj) may be generated by parsing a sentence or utterance input by a user through natural language processing to extract at least one word and converting the extracted words into vectors. In an embodiment, the first sentence vector (Si) and the second sentence vector (Sj) may be generated through a machine learning model that embeds words into vectors, such as word2vec, GloVe, or one-hot encoding, but are not limited thereto. The first sentence vector (Si) and the second sentence vector (Sj) may be generated by arranging at least one word vector in a matrix form.
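By way of a non-limiting illustration, the following Python sketch shows how word vectors may be arranged into a sentence matrix as described above; the embedding dictionary, its 4-dimensional values, and the example words are hypothetical placeholders rather than values used in the embodiment.

```python
import numpy as np

# Hypothetical pre-trained word embeddings (k = 4 dimensions for brevity);
# in practice these could come from a model such as word2vec or GloVe.
pretrained = {
    "send":    np.array([0.2, -0.1, 0.5, 0.0]),
    "a":       np.array([0.0,  0.1, 0.0, 0.1]),
    "message": np.array([0.4,  0.3, -0.2, 0.6]),
}

def sentence_to_matrix(words, embeddings, k=4):
    """Arrange the word vectors of a sentence into an n x k matrix (sentence vector)."""
    rows = [embeddings.get(w, np.zeros(k)) for w in words]  # unknown words -> zero vector
    return np.stack(rows)  # shape: (n, k)

S_i = sentence_to_matrix(["send", "a", "message"], pretrained)
print(S_i.shape)  # (3, 4): n words by k embedding dimensions
```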
The first class (y1) and the second class (y2) may be vector values defining the classes to which the first sentence vector (Si) and the second sentence vector (Sj) belong, respectively. Here, a class does not mean a hierarchy; it may mean the category to which a sentence belongs, for example, politics, society, economy, culture, entertainment, or IT.
The first classification prediction value (y1') is result data output through training with the first neural network model 100 and may mean the probability value with which the first sentence vector (Si), trained through the first neural network model 100, is classified into the first class (y1). For example, when the first class (y1) corresponding to the 'politics' category has the value (1, 0, 0) and the first classification prediction value (y1') is (0.9, 0.05, 0.05), the first sentence corresponding to the first sentence vector (Si) may be classified as text related to 'politics'. The second classification prediction value (y2') is a result value output through the second neural network model 110 and may mean the probability value with which the second sentence vector (Sj), trained through the second neural network model 110, is classified into the second class (y2). A first classification loss may be obtained by calculating the difference between the first classification prediction value (y1') and the first class (y1). Likewise, a second classification loss may be obtained by calculating the difference between the second classification prediction value (y2') and the second class (y2).
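As a rough numerical illustration of the example above, the sketch below compares a one-hot first class (1, 0, 0) with a first classification prediction value (0.9, 0.05, 0.05); the text only describes the classification loss as the difference between the prediction and the class, so the cross-entropy used here is an assumption about one common choice.

```python
import numpy as np

y1 = np.array([1.0, 0.0, 0.0])          # first class as a one-hot vector (e.g., the 'politics' category)
y1_pred = np.array([0.9, 0.05, 0.05])   # first classification prediction value (probability distribution)

# One common way to measure the "difference" between the prediction and the class;
# the exact loss function is not specified in the text, so cross-entropy is an assumption.
classification_loss = -np.sum(y1 * np.log(y1_pred))
print(classification_loss)  # ~0.105; small because the prediction agrees with the class
```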
In an embodiment, the first neural network model 100 and the second neural network model 110 may be configured as convolutional neural networks (CNNs). The first sentence vector (Si) and the second sentence vector (Sj) are convolved with a plurality of filters having different widths through the first neural network model 100 and the second neural network model 110, respectively, and a first feature vector and a second feature vector may be trained accordingly. A contrastive loss (L1) may be obtained based on the first feature vector trained through the first neural network model 100, the second feature vector trained through the second neural network model 110, and whether the first class (y1) and the second class (y2) are identical. In an embodiment, the contrastive loss (L1) may be calculated through an equation that represents, as numbers, the dot product of the first feature vector and the second feature vector and whether the first class and the second class are identical. The contrastive loss (L1) will be described in detail in the description of FIG. 4.
In an embodiment, the contrastive loss (L1) may have a value in the range of -1 to 1. In an embodiment of the present disclosure, training with the first neural network model 100 and the second neural network model 110 may be repeated in a direction that maximizes the contrastive loss (L1). Here, repeating the training may mean adjusting the weights applied to the first neural network model 100 and the second neural network model 110.
In general deep neural network-based text classification, classification is performed by training a single neural network model based on the labels of the sentences to be learned and forming a single loss function. Accordingly, when an utterance to be classified does not correspond to any class of the classification model, many misclassifications may occur. In addition, because classes are classified mainly by label, utterances with similar expressions are often misclassified even when they belong to different classes. For example, when the user's input utterance is "Send a 'KakaoTalk' message to 'XXX'", it may end up being classified as "Send a 'text message' to 'XXX'".
An embodiment of the present disclosure provides a method and apparatus that, in classifying the first class to which the first sentence belongs, not only trains the first neural network model 100 but also trains a second sentence belonging to a second class through the second neural network model 110, and calculates the contrastive loss (L1) based on the first feature vector trained by the first neural network model 100, the second feature vector trained by the second neural network model 110, and the identity of the first class (y1) and the second class (y2), so that even the degree of similarity in expression between the first sentence and the second sentence can be distinguished. Accordingly, the method and apparatus according to an embodiment of the present disclosure can use semantic similarity together with the labels of sentences or utterances to improve classification accuracy for sentences that are similar but belong to different classes.
FIG. 2 is a block diagram illustrating components of an electronic device 200 according to an embodiment of the present disclosure. The electronic device 200 may be a device that performs training for classifying the class to which a sentence belongs by using a neural network model. The electronic device 200 may be a fixed terminal implemented as a computer device, or a mobile terminal. The electronic device 200 may be, for example, at least one of a smartphone, a mobile phone, a navigation device, a computer, a laptop computer, a digital broadcasting terminal, a personal digital assistant (PDA), a portable multimedia player (PMP), and a tablet PC, but is not limited thereto. The electronic device 200 may communicate with other electronic devices and/or a server over a network by using a wireless or wired communication scheme.
Referring to FIG. 2, the electronic device 200 may include a processor 210, a memory 220, and an utterance input unit 230.
The processor 210 may be configured to process instructions of a computer program by performing arithmetic, logic, and input/output operations such as convolution operations. The instructions may be provided to the processor 210 by the memory 220. In an embodiment, the processor 210 may be configured to execute instructions received according to program code stored in a recording device such as the memory 220. The processor 210 may include, for example, at least one of a central processing unit (CPU), a microprocessor, and a graphics processing unit (GPU), but is not limited thereto. In an embodiment, when the electronic device 200 is a mobile device such as a smartphone or a tablet PC, the processor 210 may be an application processor (AP) that executes applications.
The processor 210 may perform training through a general-purpose artificial intelligence algorithm based on a deep neural network, such as a neural network model.
The processor 210 may also perform natural language processing (NLP), such as extracting words from a user's utterance or question sentence and converting the extracted words into word vectors to generate a sentence vector. The processor 210 may parse word objects by objectifying the sentence, perform stop-word processing (filtering out articles and the like) and token generation (unifying tense, plural forms, and the like), and then extract highly relevant keywords based on their frequency of occurrence and manage them as independent entities.
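A minimal sketch of the preprocessing described above is shown below, assuming English input; the stop-word list, the crude plural normalization, and the keyword count are placeholders for whatever stop-word processing and token generation the processor 210 actually applies.

```python
from collections import Counter

STOPWORDS = {"a", "an", "the", "to", "please"}  # placeholder stop-word list (articles, etc.)

def preprocess(sentence):
    """Parse a sentence into tokens, drop stop words, and apply a trivial normalization."""
    tokens = sentence.lower().split()                             # word-object parsing
    tokens = [t for t in tokens if t not in STOPWORDS]            # stop-word processing
    tokens = [t[:-1] if t.endswith("s") else t for t in tokens]   # crude plural unification
    return tokens

def extract_keywords(sentences, top_n=3):
    """Pick the most frequent tokens across utterances as candidate independent entities."""
    counts = Counter(tok for s in sentences for tok in preprocess(s))
    return [w for w, _ in counts.most_common(top_n)]

print(extract_keywords(["Please send a message to XXX", "Send the messages now"]))
```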
In an embodiment, the processor 210 may train a first feature vector through a first neural network by using, as input data, a first sentence including at least one word and a first class to which the first sentence belongs, and may train a second feature vector through a second neural network by using, as input data, a second sentence and a second class to which the second sentence belongs. Here, the first sentence is a sentence or utterance input by the user, and the second sentence may be a sentence randomly extracted from a plurality of sentences stored in a server or database.
The processor 210 may obtain a contrastive loss that quantifies the degree of similarity in expression between the first sentence and the second sentence based on the first feature vector, the second feature vector, and whether the first class and the second class are identical. In an embodiment, the processor 210 may calculate the contrastive loss through an equation that represents, as numbers, the dot product of the first feature vector and the second feature vector and whether the first class and the second class are identical. A method of calculating the contrastive loss will be described in detail in the description of FIG. 4.
In an embodiment, the contrastive loss has a value in the range of -1 to 1, and the processor 210 may repeatedly perform training using the first neural network and the second neural network so that the obtained contrastive loss is maximized.
In an embodiment, the processor 210 may perform the training through the first neural network and the training through the second neural network simultaneously.
The electronic device 200 may further include an utterance input unit 230 that receives an utterance or sentence from a user. The utterance input unit 230 may include a speech recognition module capable of recognizing the user's voice, but is not limited thereto. The utterance input unit 230 may be configured as a hardware module capable of receiving a user's sentence, for example, a keypad, a mouse, a touch pad, a touch screen, or a jog switch. The processor 210 may recognize the utterance received through the utterance input unit 230 as a sentence, parse and extract at least one word included in the recognized sentence, and convert the extracted at least one word into at least one word vector, respectively. In an embodiment, the processor 210 may embed words into vectors by using a machine learning model such as word2vec, GloVe, or one-hot encoding, but is not limited thereto. Using such a machine learning model, the processor 210 may convert a word representation into a vector value that can be represented in a vector space.
The processor 210 may generate a sentence vector by arranging the at least one word vector in a matrix form, and may train the first feature vector by inputting the sentence vector to the first neural network as input data. In an embodiment, the processor 210 may convert the first sentence into a matrix form including at least one word vector, input the converted matrix to a convolutional neural network as input data, generate a feature map by applying a plurality of filters, and extract the first feature vector by passing the feature map through a max pooling layer.
In an embodiment, the processor 210 may convert the first feature vector into a one-dimensional vector value by inputting it to a fully connected layer, and may obtain a first classification prediction value representing the probability distribution of being classified into the first class by inputting the one-dimensional vector value to a softmax classifier. Likewise, the processor 210 may train and extract the second feature vector, convert the second feature vector into a one-dimensional vector value through the fully connected layer, and obtain a second classification prediction value representing the probability distribution of being classified into the second class by inputting the one-dimensional vector value to the softmax classifier. A detailed description thereof is provided in the description of FIG. 4.
The processor 210 may obtain a first classification loss that is the difference between the first classification prediction value and the first class, and a second classification loss that is the difference between the second classification prediction value and the second class. The processor 210 may calculate a final loss by summing the first classification loss, the second classification loss, and the contrastive loss, and may repeat the training of adjusting the weights applied to the first neural network and the second neural network based on the calculated final loss.
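A schematic training step reflecting this paragraph is sketched below, assuming PyTorch and assuming each model returns its feature vector together with class scores; because the contrastive loss is to be maximized while the classification losses are minimized, the contrastive term is subtracted from the objective here, which is a sign convention assumed for gradient descent rather than a detail stated in the text.

```python
import torch

def training_step(model1, model2, optimizer, s_i, y1, s_j, y2, contrastive_loss_fn, criterion):
    """One weight update combining the two classification losses and the contrastive loss.

    model1/model2: the first and second neural networks (assumed to return (features, scores)).
    y1/y2: target classes in the format expected by `criterion` (e.g., class indices).
    criterion: a classification loss such as torch.nn.CrossEntropyLoss().
    """
    feat1, scores1 = model1(s_i)   # first feature vector and class scores
    feat2, scores2 = model2(s_j)   # second feature vector and class scores

    cls_loss1 = criterion(scores1, y1)                      # first classification loss
    cls_loss2 = criterion(scores2, y2)                      # second classification loss
    l1 = contrastive_loss_fn(feat1, feat2, y1, y2)          # contrastive loss in [-1, 1]

    # The text sums the three losses into a final loss used to adjust the weights while
    # requiring the contrastive term to be maximized; under gradient descent this is
    # expressed here by subtracting that term (an assumed sign convention).
    final_loss = cls_loss1 + cls_loss2 - l1

    optimizer.zero_grad()
    final_loss.backward()
    optimizer.step()
    return final_loss.item()
```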
The memory 220 is a computer-readable recording medium and may include a permanent mass storage device such as random access memory (RAM), read-only memory (ROM), or a disk drive. The memory 220 may also store an operating system (OS) or at least one piece of computer program code (for example, code for the training program executed by the processor 210 through the neural networks). Such computer program code may be stored in the memory 220, or may be loaded from a separate computer-readable recording medium or computer program product. The computer-readable recording medium may include media readable by a computer, such as a floppy drive, a disk, a tape, a DVD/CD-ROM drive, and a memory card. In an embodiment, the computer program code may be installed on the electronic device 200 by files provided from a server over a network and loaded from the memory 220.
Although not shown in the drawing, the electronic device 200 may include a database. The database may store a plurality of sentences and a plurality of classes to which the plurality of sentences respectively belong. In an embodiment, the database may be included as a component within the electronic device 200, but is not limited thereto. The database may also be configured in the form of a server disposed outside the electronic device 200. In an embodiment, the processor 210 may randomly extract a second sentence and a second class from the database and input them to the second neural network as input data for training.
FIG. 3 is a flowchart illustrating a method, performed by an electronic device, of classifying the class to which a sentence belongs, according to an embodiment of the present disclosure.
In operation S310, the electronic device trains a first feature vector through a first neural network by using, as input data, a first sentence and a first class to which the first sentence belongs. In an embodiment, the electronic device may receive an utterance or question from a user and recognize the received utterance or question as a sentence. The electronic device may parse at least one word included in the recognized sentence by using natural language processing (NLP) and convert the at least one word into at least one word vector, respectively. In an embodiment, the electronic device may embed the at least one word into at least one word vector by using a machine learning model such as word2vec, GloVe, or one-hot encoding, but is not limited thereto. Using such a machine learning model, the electronic device may convert a word representation into a vector value that can be represented in a vector space.
In an embodiment, the electronic device may generate a sentence vector by arranging the embedded at least one word vector in a matrix form, and may learn the probability distribution of being classified into the first class by inputting the generated sentence vector to the first neural network as input data.
In operation S320, the electronic device trains a second feature vector through a second neural network by using, as input data, a second sentence and a second class to which the second sentence belongs. In an embodiment, the second sentence and the second class to which the second sentence belongs may be stored in the form of a database. The electronic device may randomly extract the second sentence and the second class from the database and input them to the second neural network as input data for training. Although FIG. 3 shows operation S320 as being performed after operation S310, the disclosure is not limited thereto. The electronic device may perform the operation of training the first feature vector (S310) and the operation of training the second feature vector (S320) simultaneously.
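For illustration only, drawing the second sentence and its class at random from stored data may look like the following sketch; the list-of-pairs storage format and the example sentences and class labels are assumptions standing in for the database.

```python
import random

# Hypothetical stored pairs of (sentence, class label) standing in for the database.
database = [
    ("send a text message to XXX", "message"),
    ("what is the weather tomorrow", "weather"),
    ("play some music", "music"),
]

def sample_second_example(db):
    """Randomly extract a (second sentence, second class) pair for the second neural network."""
    return random.choice(db)

s_j, y2 = sample_second_example(database)
```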
In operation S330, the electronic device obtains a contrastive loss based on the first feature vector, the second feature vector, and whether the first class and the second class are identical. In an embodiment, the contrastive loss may be calculated through an equation that represents, as numbers, the dot product of the first feature vector and the second feature vector and whether the first class and the second class are identical. The equation may output 1 when the first class and the second class are identical, and may output 0 when the first class and the second class are not identical. A specific example of calculating the contrastive loss will be described in detail in the description of FIG. 4.
In operation S340, the electronic device repeats the training using the first neural network and the second neural network so that the contrastive loss is maximized. In an embodiment, the contrastive loss may have a value in the range of -1 to 1. The electronic device may repeat the training using the first neural network model and the second neural network model in a direction that maximizes the contrastive loss. Here, repeating the training may mean adjusting the weights applied to the first neural network model and the second neural network model.
FIG. 4 is a diagram for describing a method, performed by an electronic device, of classifying the class to which a sentence belongs by using a convolutional neural network, according to an embodiment of the present disclosure.
Referring to FIG. 4, the electronic device may learn the probability distribution with which a first sentence (Si) is classified into a first class (y1) by inputting the first sentence (Si) and the first class (y1) to a first neural network 401 as input data. In addition, the electronic device may learn the probability distribution with which a second sentence (Sj) is classified into a second class (y2) by inputting the second sentence (Sj) and the second class (y2) to a second neural network 402 as input data. Here, the first class (y1) and the second class (y2) may be vector values defining the classes to which the first sentence vector (Si) and the second sentence vector (Sj) belong, respectively.
In FIG. 4, the first neural network 401 and the second neural network 402 may be configured as convolutional neural network (CNN) models, but are not limited thereto. In an embodiment, the first neural network 401 and the second neural network 402 may be implemented as artificial neural network models such as a recurrent neural network (RNN), a deep belief network (DBN), or a restricted Boltzmann machine (RBM), or may be implemented as a machine learning model such as a support vector machine (SVM).
The electronic device may parse the first sentence (Si) into a plurality of words (word 1-1 to word 1-6) through natural language processing and extract the plurality of words (word 1-1 to word 1-6). Although six words, 1-1 to 1-6, are shown in FIG. 4, this is merely an example, and the number of words belonging to the first sentence (Si) is not limited to six.
The electronic device may convert the plurality of words (word 1-1 to word 1-6) into a plurality of word vectors (wv1-1 to wv1-6), respectively. In an embodiment, the electronic device may embed the plurality of words (word 1-1 to word 1-6) into the plurality of word vectors (wv1-1 to wv1-6) by using a machine learning model such as word2vec, GloVe, or one-hot encoding.
The electronic device may generate a sentence vector 411 by arranging the converted plurality of word vectors (wv1-1 to wv1-6) in a matrix form, and may input the sentence vector 411 to the first neural network 401. The sentence vector 411 may be an n×k matrix with n words and an embedding dimension of k.
The electronic device may perform a convolution operation by applying a plurality of filters 421 having different widths to the sentence vector 411, thereby generating a feature map 431. The plurality of filters 421 are vectors having different weights, and the weight values may change as training progresses. The electronic device may generate the feature map 431 through an operation of multiplying and summing the vector values of the sentence vector 411 and the weight values of the plurality of filters 421. Although the plurality of filters 421 are shown in FIG. 4 as having widths of 2, 3, and 4, the widths are not limited thereto. The dimension (k) of the plurality of filters 421 may be the same as the dimension (k) of the sentence vector.
The electronic device may subsample the feature map 431 by passing the feature map 431 through a max pooling layer, thereby generating a first feature vector 441. The first feature vector 441 is a single feature vector generated by extracting only the maximum vector value from the feature map 431 through the max pooling layer, and may be defined as a representation vector expressing the representation of the first sentence (Si). Although FIG. 4 shows the first feature vector 441 being generated through the max pooling layer, another subsampling layer may be substituted. For example, the electronic device may generate the first feature vector 441 through average pooling or L2-norm pooling.
The electronic device may input the first feature vector 441 to a fully connected layer for concatenation, thereby generating a one-dimensional vector 451. The electronic device may generate a first classification prediction vector 461 by inputting the one-dimensional vector 451 to a softmax classifier. The first classification prediction vector 461 may represent the probability distribution with which the first sentence (Si) is classified into the first class (y1). Here, the electronic device may also perform a dropout operation to prevent overfitting that may occur while the weights are adjusted.
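The pipeline of FIG. 4 can be sketched roughly in PyTorch as follows; the embedding dimension, the number of filters per width, the number of classes, and the dropout rate are illustrative assumptions, and the sketch is one possible realization of the described structure (filters of widths 2, 3, and 4 over an n×k sentence matrix, max pooling, a fully connected layer with dropout, and a softmax output) rather than the exact network of the embodiment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceCNN(nn.Module):
    """Sentence classifier: convolution filters of widths 2/3/4 over an n x k sentence matrix."""

    def __init__(self, k=128, num_filters=100, widths=(2, 3, 4), num_classes=6, dropout=0.5):
        super().__init__()
        # Each filter spans `width` words and the full embedding dimension k.
        self.convs = nn.ModuleList([nn.Conv2d(1, num_filters, (w, k)) for w in widths])
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(num_filters * len(widths), num_classes)

    def forward(self, sentence_matrix):
        # sentence_matrix: (batch, n, k) -> add a channel dimension for Conv2d.
        x = sentence_matrix.unsqueeze(1)                                  # (batch, 1, n, k)
        maps = [F.relu(conv(x)).squeeze(3) for conv in self.convs]        # feature maps
        pooled = [F.max_pool1d(m, m.size(2)).squeeze(2) for m in maps]    # max pooling per map
        features = torch.cat(pooled, dim=1)                               # single feature vector
        logits = self.fc(self.dropout(features))                          # fully connected layer
        return features, logits

# Example: a batch of one sentence with n = 7 words and k = 128 dimensions.
model = SentenceCNN()
features, logits = model(torch.randn(1, 7, 128))
probs = F.softmax(logits, dim=1)      # classification prediction value (probability distribution)
print(features.shape, probs.shape)    # torch.Size([1, 300]) torch.Size([1, 6])
```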
전자 장치는 제2 뉴럴 네트워크(402)에 제2 문장(Sj) 및 제2 클래스(y2)를 입력 데이터로 입력시켜 제2 특징 벡터(442)를 생성하고, 이를 통해 제2 분류 예측값 벡터(462)를 생성할 수 있다, 제2 뉴럴 네트워크(402)를 통한 학습 방법은 입력 데이터와 학습 결과를 제외하면 제1 뉴럴 네트워크(401)를 통한 학습 방법과 동일하므로 중복되는 설명은 생략한다.The electronic device generates a second feature vector 442 by inputting the second sentence S j and the second class y 2 as input data to the second neural network 402, thereby generating a second classification predicted vector. In operation 462, the learning method through the second neural network 402 is the same as the learning method through the first neural network 401 except for input data and learning results, and thus redundant description thereof will be omitted.
The electronic device may obtain a contrastive loss L1, which quantifies the degree of similarity between the representations of the first sentence Si and the second sentence Sj, based on the first feature vector 441, the second feature vector 442, and whether the first class y1 and the second class y2 are identical. When the first feature vector 441 is defined as F(Si) and the second feature vector 442 is defined as F(Sj), the contrastive loss L1 may be calculated based on the following equation.
[Equation 1]

$$L_1 = Y \cdot \lvert F(S_i) \cdot F(S_j) \rvert - (1 - Y) \cdot \lvert F(S_i) \cdot F(S_j) \rvert$$
Referring to Equation 1, the contrastive loss L1 may be calculated from the absolute value of the dot product of the first feature vector F(Si) and the second feature vector F(Sj), together with Y. In Equation 1, Y is a notation that converts whether the first class y1 and the second class y2 are identical into a number: it outputs 1 when the first class y1 and the second class y2 are identical, and 0 when they are not.
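As a rough sketch under stated assumptions, Equation 1 could be computed as follows; normalizing the feature vectors first is an assumption made so that their dot product behaves like the cosine value referred to below, and none of the names come from the disclosure.

```python
# Illustrative computation of the contrastive loss of Equation 1.
import numpy as np

def contrastive_loss(f_si, f_sj, y1, y2):
    f_si = f_si / np.linalg.norm(f_si)        # normalize so the dot product acts as a cosine
    f_sj = f_sj / np.linalg.norm(f_sj)
    Y = 1.0 if y1 == y2 else 0.0              # class-identity notation Y
    sim = abs(np.dot(f_si, f_sj))             # |F(S_i) . F(S_j)|
    return Y * sim - (1.0 - Y) * sim          # Equation 1
```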
When the first sentence Si and the second sentence Sj belong to different classes but their sentence representations are similar, the contrastive loss L1 may be -1. That is, substituting into Equation 1, Y = 0 because the first class y1 and the second class y2 are different classes, and because the first feature vector F(Si) and the second feature vector F(Sj) are similar, the absolute value of the dot product of F(Si) and F(Sj) is close to 1; the contrastive loss L1 may therefore be calculated as 0 × |~1| - (1-0) × |~1| = -1.
When the first sentence Si and the second sentence Sj belong to different classes and their sentence representations are also distinct from each other, the contrastive loss L1 may be 0. That is, Y = 0 because the first class y1 and the second class y2 are different classes, and because the first feature vector F(Si) and the second feature vector F(Sj) are distinct, the cosine value is close to 0 and the absolute value of the dot product approximates 0. Therefore, substituting into Equation 1, the contrastive loss L1 may be calculated as 0 × |~0| - (1-0) × |~0| = 0.
When the first sentence Si and the second sentence Sj belong to the same class but their sentence representations are distinct from each other, the contrastive loss L1 may be 0. That is, Y = 1 because the first class y1 and the second class y2 are the same class, and because the first feature vector F(Si) and the second feature vector F(Sj) are distinct, the cosine value is close to 0 and the absolute value of the dot product approximates 0. Therefore, substituting into Equation 1, the contrastive loss L1 may be calculated as 1 × |~0| - (1-1) × |~0| = 0.
When the first sentence Si and the second sentence Sj belong to the same class and their sentence representations are similar, the contrastive loss L1 may be 1. That is, Y = 1 because the first class y1 and the second class y2 are the same class, and because the first feature vector F(Si) and the second feature vector F(Sj) are similar, the cosine value is close to 1 and the absolute value of the dot product approximates 1. Therefore, substituting into Equation 1, the contrastive loss L1 may be calculated as 1 × |~1| - (1-1) × |~1| = 1.
Referring to Equation 1, the contrastive loss L1 reflects not only whether the classes y1 and y2 into which the first feature vector and the second feature vector are classified are identical, but also the degree of similarity between the first feature vector F(Si) and the second feature vector F(Sj).
The electronic device may train in the direction that maximizes the contrastive loss L1. Referring to Equation 1, the contrastive loss L1 has a value between -1 and 1; when it is -1, the electronic device may change the weights of the first neural network 401 and the second neural network 402 and increase the number of training iterations. Conversely, when the contrastive loss L1 is 1, the electronic device may reduce the number of training iterations through the first neural network 401 and the second neural network 402. In other words, when the first sentence Si and the second sentence Sj have similar representations even though they belong to different classes, the electronic device may increase the number of training iterations so that the two become distinguishable from each other; when the first sentence Si and the second sentence Sj belong to the same class and have similar representations, the electronic device may refrain from increasing the number of training iterations.
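A minimal sketch of that iteration policy is given below; the threshold values and the iteration counts are illustrative assumptions that do not appear in the disclosure.

```python
# Illustrative policy for adjusting the number of training iterations from L1.
def extra_iterations(contrastive_loss_value):
    if contrastive_loss_value <= -0.5:   # near -1: different classes, similar representations
        return 2                         # train more so the representations separate
    if contrastive_loss_value >= 0.5:    # near 1: same class, already similar
        return 0                         # no extra iterations needed
    return 1
```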
In an embodiment, the electronic device may obtain a first classification loss L2, which is the difference between the first classification prediction value vector 461, i.e., the output data of training through the first neural network 401, and the vector of the first class y1. Likewise, the electronic device may obtain a second classification loss L3, which is the difference between the second classification prediction value vector 462, i.e., the output data of training through the second neural network 402, and the vector of the second class y2. The first classification loss L2 and the second classification loss L3 respectively indicate the degree to which the first sentence Si is classified into the first class y1 and the degree to which the second sentence Sj is classified into the second class y2; the smaller the value, the higher the classification accuracy may be.
In an embodiment, the electronic device may calculate a total loss L by summing the contrastive loss L1, the first classification loss L2, and the second classification loss L3, as in Equation 2 below.
[Equation 2]

$$L = L_1 + L_2 + L_3$$
The electronic device may perform training that adjusts the weights applied to the first neural network 401 and the second neural network 402 based on the calculated total loss L.
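Putting the pieces together, one training step over a sentence pair could look roughly like the sketch below, which reuses the SentenceCNN and normalization choices sketched earlier. The use of cross-entropy for the classification losses, the Adam optimizer, the learning rate, and whether the two branches share weights are all assumptions not stated in the disclosure; how the sign of the contrastive term is handled when minimizing the total objective is likewise left as an implementation choice following the training policy described with Equation 1.

```python
# Illustrative training step for the paired networks of FIG. 4 (a sketch under
# assumptions, not the patent's code).
import torch
import torch.nn.functional as F

net1, net2 = SentenceCNN(), SentenceCNN()       # first / second neural network
optimizer = torch.optim.Adam(
    list(net1.parameters()) + list(net2.parameters()), lr=1e-3)

def train_step(sent_i, y1, sent_j, y2):
    feat_i, probs_i = net1(sent_i)              # first feature vector / prediction vector
    feat_j, probs_j = net2(sent_j)              # second feature vector / prediction vector

    # Classification losses L2 and L3: gap between prediction and class vector,
    # expressed here as cross-entropy over the softmax outputs (an assumption).
    loss_cls_1 = F.nll_loss(torch.log(probs_i), y1)
    loss_cls_2 = F.nll_loss(torch.log(probs_j), y2)

    # Contrastive loss L1 of Equation 1, with normalized feature vectors.
    fi, fj = F.normalize(feat_i, dim=1), F.normalize(feat_j, dim=1)
    Y = (y1 == y2).float()
    sim = (fi * fj).sum(dim=1).abs()
    loss_contrastive = (Y * sim - (1 - Y) * sim).mean()

    # Equation 2: total loss is the sum of the three loss values.
    total_loss = loss_contrastive + loss_cls_1 + loss_cls_2
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
    return total_loss.item()
```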
According to the embodiment of the present disclosure illustrated in FIG. 4, by simultaneously considering not only the classification losses L2 and L3 with which the first sentence Si and the second sentence Sj are classified into the first class y1 and the second class y2, respectively, but also the contrastive loss L1 which takes the sentence representations into account, sentences that belong to different classes but have similar word expressions are made to have representations that differ from each other, which prevents misclassification. That is, the electronic device according to an embodiment of the present disclosure can learn representations that effectively separate similar utterances or sentences belonging to different classes, and can therefore increase the accuracy of utterance or sentence classification.
In particular, when the electronic device executes a chatbot program such as Bixby, the first sentence Si, i.e., the utterance input by the user, may be classified into another class because its sentence expression differs, even though it belongs to the first class y1. In that case the user may receive a misclassified answer rather than the desired answer to the first sentence Si. In such a case, the electronic device can increase the accuracy of classifying the user's question into the class to which it belongs by training with the contrastive loss L1 taken into account. In addition, the first sentence Si may not belong to any of the classes previously stored in the electronic device; in that case the electronic device may reject the first sentence Si, that is, not classify it into any class, thereby reducing the possibility that the user receives an unwanted answer.
FIG. 5 is a flowchart illustrating a method by which an electronic device according to an embodiment of the present disclosure obtains a classification prediction value, which is the probability that a first sentence is classified into the class to which it belongs.
In operation S510, the electronic device converts the first sentence into a matrix form including at least one word vector. In an embodiment, the first sentence may be an utterance or a sentence input by a user. The electronic device may extract at least one word included in the first sentence and convert each of the at least one word into a word vector. In an embodiment, the electronic device may embed the at least one word into at least one word vector by using a machine learning model such as word2vec, GloVe, or one-hot encoding. The electronic device may generate a first sentence vector by arranging the at least one word vector in a matrix form.
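As a rough illustration of operation S510, a sentence can be turned into a matrix of k-dimensional word vectors with an embedding lookup; the helper name and the dictionary-style embeddings object below are hypothetical stand-ins for any word2vec-, GloVe-, or one-hot-style table.

```python
# Illustrative sketch of operation S510 (assumed helper, not the patent's code).
import numpy as np

def sentence_to_matrix(sentence, embeddings, k=300):
    words = sentence.split()                                   # extract the words
    rows = [embeddings.get(w, np.zeros(k)) for w in words]     # unknown word -> zero vector
    return np.stack(rows)                                      # (sentence_length, k) sentence vector
```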
In operation S520, the electronic device inputs the converted matrix to a convolutional neural network as input data and generates a feature map by applying a plurality of filters. In an embodiment, the electronic device may perform the convolution operation by applying multiple filters having different widths. The multiple filters are vectors having different weights, and the weight values may change as training progresses. The multiple filters may have the same dimension as the dimension of the sentence vector generated in operation S510.
In operation S530, the electronic device extracts a first feature vector by passing the feature map through a max pooling layer. The electronic device may extract the first feature vector, which is a single feature vector generated by extracting only the maximum value from the feature map through the max pooling layer. However, the layer used for subsampling is not limited to a max pooling layer; in an embodiment, the electronic device may extract the first feature vector through average pooling or L2-norm pooling.
In operation S540, the electronic device inputs the first feature vector to a fully connected layer and converts it into a one-dimensional vector value. In an embodiment, the electronic device may concatenate the first feature vector, which consists of the plurality of feature maps generated with filters of different widths, into one vector and convert it into a one-dimensional vector value. In operation S540, a dropout operation may also be used to address the overfitting that occurs while the first feature vector is being trained and to increase the accuracy of the training data.
In operation S550, the electronic device obtains a first classification prediction value by inputting the one-dimensional vector value to a softmax classifier. In an embodiment, the first classification prediction value means the probability that the first sentence is classified into the first class, and may be generated by passing through the softmax classifier. The vector values included in the one-dimensional vector are converted, by passing through the softmax classifier, into probability values whose total sum is 1.
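For example, a generic softmax (not code from the disclosure) converts an arbitrary one-dimensional vector into values that sum to 1, which is the probability interpretation used in operation S550.

```python
# Generic softmax used for illustration.
import numpy as np

def softmax(v):
    e = np.exp(v - np.max(v))        # subtract the max for numerical stability
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))   # [0.659 0.242 0.099] -> sums to 1
```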
Although FIG. 5 illustrates the process of obtaining the first classification prediction value by inputting the first sentence to a convolutional neural network, the illustrated operations may equally be applied to the second sentence. In an embodiment, the electronic device may obtain a second classification prediction value by inputting the second sentence to a convolutional neural network according to operations S510 to S550. In an embodiment, the electronic device may simultaneously perform the first training process of obtaining the first classification prediction value and the second training process of obtaining the second classification prediction value.
FIG. 6 is a flowchart illustrating a training method by which an electronic device according to an embodiment of the present disclosure adjusts the weights applied to a neural network model based on loss values obtained through the neural network model.
In operation S610, the electronic device obtains a first classification loss, which is the difference between the first classification prediction value and the first class. The first classification loss may mean the difference between the first classification prediction value, i.e., the probability that the first sentence is classified into the first class, and the first class vector.
In operation S620, the electronic device obtains a second classification loss, which is the difference between the second classification prediction value and the second class. The second classification loss may mean the difference between the second classification prediction value, i.e., the probability that the second sentence is classified into the second class, and the second class vector. In an embodiment, operations S610 and S620 may be performed simultaneously.
In operation S630, the electronic device obtains a final loss by summing the first classification loss, the second classification loss, and the contrastive loss. A detailed method of calculating the contrastive loss has been described with reference to FIG. 4, so a redundant description is omitted.
In operation S640, the electronic device adjusts the weights applied to the first neural network and the second neural network based on the final loss. In an embodiment, the first neural network and the second neural network are configured as convolutional neural networks that generate feature maps by applying a plurality of filters, and the electronic device may adjust the weight values of the plurality of filters applied to the convolutional neural networks according to the magnitude of the final loss.
The electronic device described herein may be implemented as a hardware component, a software component, and/or a combination of hardware and software components. For example, the electronic device described in the disclosed embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively.

The software may be implemented as a computer program including instructions stored in a computer-readable storage medium. Examples of the computer-readable recording medium include magnetic storage media (e.g., read-only memory (ROM), random-access memory (RAM), floppy disks, and hard disks) and optical reading media (e.g., CD-ROM and digital versatile disc (DVD)). The computer-readable recording medium may be distributed over network-coupled computer systems so that the computer-readable code is stored and executed in a distributed fashion. The medium is readable by a computer, may be stored in a memory, and may be executed by a processor.

A computer is a device capable of calling instructions stored in a storage medium and operating according to the disclosed embodiments in accordance with the called instructions, and may include the electronic device according to the disclosed embodiments.

The computer-readable storage medium may be provided in the form of a non-transitory storage medium. Here, 'non-transitory' only means that the storage medium does not include a signal and is tangible; it does not distinguish whether data is stored in the storage medium semi-permanently or temporarily.
In addition, the electronic device or the method according to the disclosed embodiments may be provided included in a computer program product. The computer program product may be traded between a seller and a buyer as a commodity.

The computer program product may include a software program and a computer-readable storage medium in which the software program is stored. For example, the computer program product may include a product in the form of a software program distributed electronically through the manufacturer of the electronic device or through an electronic market (e.g., Google Play Store or App Store), such as a downloadable application. For electronic distribution, at least a part of the software program may be stored in a storage medium or may be generated temporarily. In this case, the storage medium may be a storage medium of a server of the manufacturer, a server of the electronic market, or a relay server that temporarily stores the software program.

In a system consisting of a server and a terminal (e.g., an ultrasound diagnostic apparatus), the computer program product may include a storage medium of the server or a storage medium of the terminal. Alternatively, when there is a third device (e.g., a smartphone) communicatively connected to the server or the terminal, the computer program product may include a storage medium of the third device. Alternatively, the computer program product may include the software program itself, transmitted from the server to the terminal or the third device, or from the third device to the terminal.

In this case, one of the server, the terminal, and the third device may execute the computer program product to perform the method according to the disclosed embodiments. Alternatively, two or more of the server, the terminal, and the third device may execute the computer program product to perform the method according to the disclosed embodiments in a distributed manner.

For example, a server (e.g., a cloud server or an artificial intelligence server) may execute the computer program product stored in the server to control a terminal communicatively connected to the server to perform the method according to the disclosed embodiments.

As another example, the third device may execute the computer program product to control a terminal communicatively connected to the third device to perform the method according to the disclosed embodiments.

When the third device executes the computer program product, the third device may download the computer program product from the server and execute the downloaded computer program product. Alternatively, the third device may execute a computer program product provided in a preloaded state to perform the method according to the disclosed embodiments.
While the embodiments of the present disclosure have been illustrated and described above, the present disclosure is not limited to the specific embodiments described above. Various modifications may be made by those of ordinary skill in the art to which the invention pertains without departing from the gist of the invention as claimed in the claims, and such modifications should not be understood separately from the technical spirit or outlook of the present disclosure.

Although the embodiments have been described with reference to the limited embodiments and the drawings as above, those of ordinary skill in the art may make various modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the components of the described electronic devices, structures, circuits, and the like are combined or assembled in a form different from the described method, or are replaced or substituted by other components or equivalents.

Claims (15)

  1. A method of classifying a class to which a sentence belongs by using a deep neural network, the method comprising:
    training a first feature vector through a first neural network by using, as input data, a first sentence including at least one word and a first class to which the first sentence belongs;
    training a second feature vector through a second neural network by using, as input data, a second sentence and a second class to which the second sentence belongs;
    obtaining a contrastive loss that quantifies a degree of similarity between representations of the first sentence and the second sentence, based on the first feature vector, the second feature vector, and whether the first class and the second class are identical; and
    repeating training using the first neural network and the second neural network such that the contrastive loss is maximized.
  2. The method of claim 1, further comprising:
    receiving an utterance from a user;
    recognizing the received utterance as a sentence; and
    extracting at least one word included in the recognized sentence and converting the at least one word into at least one word vector, respectively,
    wherein the training of the first feature vector comprises:
    generating a sentence vector by arranging the at least one word vector in a matrix form; and
    training the first feature vector by inputting the sentence vector to the first neural network as input data.
  3. The method of claim 1, wherein a plurality of sentences and a plurality of classes to which the plurality of sentences respectively belong are stored in a database, and
    the second sentence and the second class are extracted at random from the database.
  4. The method of claim 1, wherein the obtaining of the contrastive loss comprises calculating the contrastive loss through an equation based on a dot product of the first feature vector and the second feature vector and a number representing whether the first class and the second class are identical.
  5. The method of claim 4, wherein the equation outputs 1 when the first class and the second class are identical, and outputs 0 when the first class and the second class are not identical.
  6. The method of claim 1, wherein the training through the first neural network and the training through the second neural network are performed simultaneously.
  7. An electronic device for classifying a class to which a sentence belongs by using a deep neural network, the electronic device comprising:
    a processor configured to perform training by using a neural network,
    wherein the processor is configured to train a first feature vector through a first neural network by using, as input data, a first sentence including at least one word and a first class to which the first sentence belongs, train a second feature vector through a second neural network by using, as input data, a second sentence and a second class to which the second sentence belongs, obtain a contrastive loss that quantifies a degree of similarity between representations of the first sentence and the second sentence based on the first feature vector, the second feature vector, and whether the first class and the second class are identical, and repeatedly perform training using the first neural network and the second neural network such that the contrastive loss is maximized.
  8. The electronic device of claim 7, further comprising:
    an utterance input unit configured to receive an utterance from a user,
    wherein the processor is further configured to recognize the received utterance as a sentence, extract at least one word included in the recognized sentence, and convert the at least one word into at least one word vector, respectively.
  9. The electronic device of claim 8, wherein the processor is further configured to generate a sentence vector by arranging the at least one word vector in a matrix form, and to train the first feature vector by inputting the sentence vector to the first neural network as input data.
  10. The electronic device of claim 7, further comprising:
    a database configured to store a plurality of sentences and a plurality of classes to which the plurality of sentences respectively belong,
    wherein the processor is further configured to extract the second sentence and the second class at random from the database and to input them to the second neural network as input data.
  11. The electronic device of claim 7, wherein the processor is further configured to calculate the contrastive loss through an equation based on a dot product of the first feature vector and the second feature vector and a number representing whether the first class and the second class are identical.
  12. The electronic device of claim 11, wherein the equation outputs 1 when the first class and the second class are identical, and outputs 0 when the first class and the second class are not identical.
  13. The electronic device of claim 7, wherein the processor is further configured to convert the first sentence into a matrix form including at least one word vector, input the converted matrix to a convolutional neural network as input data, generate a feature map by applying a plurality of filters, and extract the first feature vector by passing the feature map through a max pooling layer.
  14. The electronic device of claim 7, wherein the processor is configured to simultaneously perform the training through the first neural network and the training through the second neural network.
  15. A computer program product comprising a computer-readable storage medium, wherein the storage medium comprises instructions for performing:
    training a first feature vector through a first neural network by using, as input data, a first sentence including at least one word and a first class to which the first sentence belongs;
    training a second feature vector through a second neural network by using, as input data, a second sentence and a second class to which the second sentence belongs;
    obtaining a contrastive loss that quantifies a degree of similarity between representations of the first sentence and the second sentence, based on the first feature vector, the second feature vector, and whether the first class and the second class are identical; and
    repeating training using the first neural network and the second neural network such that the contrastive loss is maximized.
PCT/KR2018/005598 2017-05-16 2018-05-16 Method and apparatus for classifying class, to which sentence belongs, using deep neural network WO2018212584A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/613,317 US11568240B2 (en) 2017-05-16 2018-05-16 Method and apparatus for classifying class, to which sentence belongs, using deep neural network

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201762506724P 2017-05-16 2017-05-16
US62/506,724 2017-05-16
KR10-2018-0055651 2018-05-15
KR1020180055651A KR102071582B1 (en) 2017-05-16 2018-05-15 Method and apparatus for classifying a class to which a sentence belongs by using deep neural network

Publications (2)

Publication Number Publication Date
WO2018212584A2 true WO2018212584A2 (en) 2018-11-22
WO2018212584A3 WO2018212584A3 (en) 2019-01-10

Family

ID=64274189

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2018/005598 WO2018212584A2 (en) 2017-05-16 2018-05-16 Method and apparatus for classifying class, to which sentence belongs, using deep neural network

Country Status (1)

Country Link
WO (1) WO2018212584A2 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110210027A (en) * 2019-05-30 2019-09-06 杭州远传新业科技有限公司 Fine granularity sentiment analysis method, apparatus, equipment and medium based on integrated study
CN111310823A (en) * 2020-02-12 2020-06-19 北京迈格威科技有限公司 Object classification method, device and electronic system
WO2021143018A1 (en) * 2020-01-16 2021-07-22 平安科技(深圳)有限公司 Intention recognition method, apparatus, and device, and computer readable storage medium
EP4014232A4 (en) * 2020-01-23 2022-10-19 Samsung Electronics Co., Ltd. Electronic device and control method thereof

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2505400B (en) * 2012-07-18 2015-01-07 Toshiba Res Europ Ltd A speech processing system
WO2016039751A1 (en) * 2014-09-11 2016-03-17 Nuance Communications, Inc. Method for scoring in an automatic speech recognition system
US9646634B2 (en) * 2014-09-30 2017-05-09 Google Inc. Low-rank hidden input layer for speech recognition neural network
KR102167719B1 (en) * 2014-12-08 2020-10-19 삼성전자주식회사 Method and apparatus for training language model, method and apparatus for recognizing speech
KR102413693B1 (en) * 2015-07-23 2022-06-27 삼성전자주식회사 Speech recognition apparatus and method, Model generation apparatus and method for Speech recognition apparatus

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110210027A (en) * 2019-05-30 2019-09-06 杭州远传新业科技有限公司 Fine granularity sentiment analysis method, apparatus, equipment and medium based on integrated study
WO2021143018A1 (en) * 2020-01-16 2021-07-22 平安科技(深圳)有限公司 Intention recognition method, apparatus, and device, and computer readable storage medium
EP4014232A4 (en) * 2020-01-23 2022-10-19 Samsung Electronics Co., Ltd. Electronic device and control method thereof
CN111310823A (en) * 2020-02-12 2020-06-19 北京迈格威科技有限公司 Object classification method, device and electronic system
CN111310823B (en) * 2020-02-12 2024-03-29 北京迈格威科技有限公司 Target classification method, device and electronic system

Also Published As

Publication number Publication date
WO2018212584A3 (en) 2019-01-10

Similar Documents

Publication Publication Date Title
KR102071582B1 (en) Method and apparatus for classifying a class to which a sentence belongs by using deep neural network
WO2018212494A1 (en) Method and device for identifying object
WO2018212584A2 (en) Method and apparatus for classifying class, to which sentence belongs, using deep neural network
WO2021132797A1 (en) Method for classifying emotions of speech in conversation by using semi-supervised learning-based word-by-word emotion embedding and long short-term memory model
CN112861945B (en) Multi-mode fusion lie detection method
CN112233698A (en) Character emotion recognition method and device, terminal device and storage medium
WO2019098418A1 (en) Neural network training method and device
CN109559576B (en) Child accompanying learning robot and early education system self-learning method thereof
Jha et al. A novel approach on visual question answering by parameter prediction using faster region based convolutional neural network
Goswami et al. CNN model for american sign language recognition
WO2020231005A1 (en) Image processing device and operation method thereof
CN110867225A (en) Character-level clinical concept extraction named entity recognition method and system
CN113849653A (en) Text classification method and device
Herasymova et al. Development of Intelligent Information Technology of Computer Processing of Pedagogical Tests Open Tasks Based on Machine Learning Approach.
Dabwan et al. Arabic Sign Language Recognition Using EfficientnetB1 and Transfer Learning Technique
Goyal Indian sign language recognition using mediapipe holistic
Atif et al. Emojis pictogram classification for semantic recognition of emotional context
Montefalcon et al. Filipino sign language recognition using long short-term memory and residual network architecture
Rungta et al. A deep learning based approach to measure confidence for virtual interviews
Chu et al. Sign Language Action Recognition System Based on Deep Learning
Zahid et al. A Computer Vision-Based System for Recognition and Classification of Urdu Sign Language Dataset for Differently Abled People Using Artificial Intelligence
Gadge et al. Recognition of Indian Sign Language Characters Using Convolutional Neural Network
Katti et al. Character and Word Level Gesture Recognition of Indian Sign Language
Chen et al. Static correlative filter based convolutional neural network for visual question answering
Al-Obaidi et al. Interpreting arabic sign alphabet by using the deep learning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18802102

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18802102

Country of ref document: EP

Kind code of ref document: A2