WO2018212584A2

WO2018212584A2 - Method and apparatus for classifying class, to which sentence belongs, using deep neural network

Info

Publication number: WO2018212584A2
Application number: PCT/KR2018/005598
Authority: WO
Inventors: 송희준; 쿨카르니니레시
Original assignee: 삼성전자 주식회사
Priority date: 2017-05-16
Filing date: 2018-05-16
Publication date: 2018-11-22
Also published as: WO2018212584A3

Abstract

Provided are a method and an apparatus for classifying a class, to which a sentence belongs, using a deep neural network. One embodiment of the present disclosure provides a method and an apparatus for: learning a first sentence and a second sentence through a first neural network and a second neural network, respectively; acquiring a contrastive loss value on the basis of whether a first feature vector and a second feature vector, which are generated as output data of learning, are identical to classes to which the first and second sentences belong; and repeating learning so as to maximize the contrastive loss value.

Description

Method and apparatus for classifying a class to which a sentence belongs by using a deep neural network

The present disclosure relates to a method and apparatus for classifying a class to which a sentence belongs, by structurally analyzing a question sentence using a deep neural network.

Artificial Intelligence (AI) system is a computer system that implements human-level intelligence, and unlike conventional rule-based smart systems, machines learn and judge themselves and become smart. As the AI system is used, the recognition rate is improved and the user's taste can be understood more accurately, and the existing Rule-based smart system is gradually replaced by the AI system based on the deep neural network.

AI technology is composed of machine learning and elementary technologies that utilize machine learning.

Machine learning is an algorithm technology that classifies / learns characteristics of input data by itself. Element technology is a technology that simulates the functions of human brain cognition and judgment by using machine learning algorithms such as neural networks. It consists of technical areas such as understanding, reasoning / prediction, knowledge representation, and motion control.

AI technology can recognize, apply and process human language / characters, and is also used for natural language processing, machine translation, dialogue system, question and answer, speech recognition / synthesis, and so on. In the question-and-answer system using artificial intelligence technology, the structure of the user's question sentence is analyzed, and the answer type, intent, subject / verb analysis is performed, and the related answer is found in the database. In the case of a question-and-answer system that executes a user's command, the user's input speech is classified, the intent is analyzed, and an independent entity is found to process the command.

Recently, customer care chatbots are being utilized that use artificial intelligence to analyze user problems and provide appropriate answers. In customer support chatbots, it is important to analyze the user's speech and analyze the category in which the user wants to receive an answer. If the amount of questions already stored is not large, the user's speech may be misclassified into a different category, unlike the user's intention. In this case, the user may not receive the desired answer.

The present disclosure provides a method and apparatus for increasing the classification accuracy of a first sentence by additionally using a separate second neural network model in classifying a class to which the first sentence belongs using the first neural network model. .

The present disclosure not only learns using a first neural network model in classifying a first class to which a first sentence belongs, but also uses a first neural network model to further learn a first feature. Vector, a second feature vector learned through a second neural network model, and a contrast loss based on the identity of the first class and the second class, thereby introducing a representational image between the first and second sentences. It provides a method and apparatus that can distinguish the degree of similarity. Accordingly, the method and apparatus according to the embodiment of the present disclosure may improve the accuracy of sentence classification by using not only the label of a sentence or speech but also semantic similarity.

The present disclosure can be easily understood by the following detailed description and the accompanying drawings in which reference numerals refer to structural elements.

1 is a conceptual diagram illustrating an embodiment of obtaining a classification prediction value of a class to which a sentence belongs by training by inputting a sentence vector and a class into a neural network model according to an embodiment of the present disclosure.

2 is a block diagram illustrating components of an electronic device according to an embodiment of the present disclosure.

3 is a flowchart illustrating a method of classifying a class to which a sentence belongs, according to an embodiment of the present disclosure.

4 is a diagram for describing a method of classifying a class to which a sentence belongs, using a convolutional neural network, according to an embodiment of the present disclosure.

5 is a flowchart illustrating a method of obtaining, by an electronic device, a classification prediction value that is a probability value classified into a class to which a first sentence belongs.

6 is a flowchart illustrating a learning method of adjusting, by an electronic device, a weight applied to a neural network model based on a loss value obtained through the neural network model, according to an embodiment of the present disclosure.

In order to solve the above technical problem, an embodiment of the present disclosure provides a first neural network using a first sentence including at least one word and a first class to which the first sentence belongs, as input data. Training a first feature vector through; Learning a second feature vector through a second neural network using as input data a second sentence and a second class to which the second sentence belongs; Contrastive loss that quantifies the degree of similarity in expression of the first sentence and the second sentence based on whether the first feature vector, the second feature vector, and the first class and the second class are identical. Obtaining; And repeating the learning using the first neural network and the second neural network so that the contrast loss value is maximum, using a deep neural network. Provides a method of classification.

For example, the method may include receiving a speech from a user; Recognizing the received speech as a sentence; And extracting at least one word included in the recognized sentence, and converting the at least one word into at least one word vector, wherein learning the first feature vector comprises: Generating a sentence vector by arranging the word vectors in a matrix form; And learning the first feature vector by inputting the sentence vector as input data to the first neural network.

For example, a plurality of sentences and a plurality of classes to which each of the plurality of sentences belong are stored in a database, and the second sentence and the second class may be extracted randomly on the database.

For example, the obtaining of the contrast loss value may be performed using a formula representing a dot product of a first feature vector and a second feature vector and whether the first class and the second class are equal to each other by a numerical formula. Can be calculated.

For example, the formula may output 1 when the first class and the second class are the same, and output 0 when the first class and the second class are not the same.

For example, learning the first feature vector may include converting the first sentence into a matrix form including at least one word vector; Inputting the transformed matrix into the convolutional neural network as input data and generating a feature map by applying a plurality of filters; And extracting the first feature vector by passing the feature map through a max pooling layer.

For example, the method includes inputting a first feature vector into a fully connected layer and converting it into a one-dimensional vector value; And inputting a one-dimensional vector value to a softmax classifier to obtain a first classification prediction value representing a probability distribution classified into a first class.

For example, the method includes obtaining a first classification loss value that is a difference between the first classification prediction value and the first class; Acquire a second classification prediction value representing a probability distribution in which a second sentence is classified into a second class through the second neural network, and obtain a second classification loss value, which is a difference between the second classification prediction value and the second class. Doing; And calculating the final loss value by summing the first classification loss value, the second classification loss value, and the control loss value, and calculating the final loss value to the first neural network and the second neural network based on the calculated final loss value. The method may further include adjusting a weight applied.

For example, the learning through the first neural network and the learning through the second neural network may be performed at the same time.

In order to solve the above technical problem, an embodiment of the present disclosure may provide an electronic device that classifies a class to which a sentence belongs, using a deep neural network. The electronic device includes a processor that performs training by using a neural network, and the processor inputs a first sentence including at least one word and a first class to which the first sentence belongs. A first feature vector is learned through a first neural network as data, and a second feature vector is obtained through a second neural network using as input data a second sentence and a second class to which the second sentence belongs. Contrast loss value obtained by learning and quantifying the degree of similarity in expression of the first sentence and the second sentence based on whether the first feature vector, the second feature vector, and the first class and the second class are identical. (contrastive loss) can be obtained, and the learning using the first neural network and the second neural network can be repeated to maximize the contrast loss value.

For example, the electronic device may further include a speech input unit configured to receive a speech from a user, and the processor may recognize the received speech as a sentence, extract at least one word included in the recognized sentence, and At least one word may be converted into at least one word vector, respectively.

For example, the processor generates the sentence vector by arranging the at least one word vector in a matrix form, and inputs the sentence vector as input data to the first neural network to learn the first feature vector. can do.

For example, the electronic device further includes a database storing a plurality of sentences and a plurality of classes to which each of the plurality of sentences belongs, and wherein the processor is configured to store the second sentence and the second class on the database. The data may be extracted randomly and input to the second neural network as input data.

For example, the processor may calculate the contrast loss value through a numerical expression representing a dot product of the first feature vector and the second feature vector and whether the first class and the second class are equal. have.

For example, the processor converts the first sentence into a matrix including at least one word vector, inputs the converted matrix into input data into a convolutional neural network, and applies a plurality of filters. A feature map may be generated, and the first feature vector may be extracted by passing the feature map through a max pooling layer.

For example, the processor may input the first feature vector into a fully connected layer and convert the first feature vector into a one-dimensional vector value, and input the one-dimensional vector value into a softmax classifier to generate the first feature vector. A first classification prediction value representing a probability distribution classified into a class may be obtained.

For example, the processor obtains a first classification loss value, which is a difference value between the first classification prediction value and the first class, and calculates a probability distribution in which a second sentence is classified into a second class through the second neural network. Obtain a second classification prediction value indicating a second classification loss value that is a difference between the second classification prediction value and the second class, and obtain the first classification loss value, the second classification loss value, and the control loss value. The final loss value may be calculated by summing, and the weight applied to the first neural network and the second neural network may be adjusted based on the calculated final loss value.

For example, the processor may simultaneously perform learning through the first neural network and learning through the second neural network.

In order to solve the above technical problem, an embodiment of the present disclosure provides a computer program product including a computer-readable storage medium, wherein the storage medium includes a first sentence comprising at least one word and the first sentence. Training a first feature vector through a first neural network using the first class to which one sentence belongs as input data; Learning a second feature vector through a second neural network using as input data a second sentence and a second class to which the second sentence belongs; Contrastive loss that quantifies the degree of similarity in expression of the first sentence and the second sentence based on whether the first feature vector, the second feature vector, and the first class and the second class are identical. Obtaining; And repeating the learning using the first neural network and the second neural network such that the contrast loss value is maximized.

This application is a priority claim application based on US Provisional Application USPA 62 / 506,724, filed May 16, 2017 and Application No. 10-2018-0055651, filed May 15, 2018, with the Korean Patent Office.

The terms used in the embodiments of the present disclosure have been selected as widely used general terms as possible in consideration of the functions of the present disclosure, but may vary according to the intention or precedent of the person skilled in the art, the emergence of new technologies, etc. . In addition, in certain cases, there is a term arbitrarily selected by the applicant, and in this case, the meaning will be described in detail in the description of the corresponding embodiment. Therefore, the terms used in the present disclosure should be defined based on the meanings of the terms and the contents throughout the present disclosure, rather than simply the names of the terms.

When any part of the specification is to "include" any component, this means that it may further include other components, except to exclude other components unless otherwise stated. In addition, the terms "... unit", "... module" described in the specification means a unit for processing at least one function or operation, which is implemented in hardware or software or a combination of hardware and software. Can be.

DETAILED DESCRIPTION Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art may easily implement the present disclosure. As those skilled in the art would realize, the described embodiments may be modified in various different ways, all without departing from the spirit or scope of the present invention.

Hereinafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings.

1 illustrates training by inputting a sentence vector (S _i , S _j ) and a class (y ₁ , y ₂ ) into a neural network model (100, 110) according to an embodiment of the present disclosure. It is a conceptual diagram for explaining an embodiment of obtaining classification prediction values y ₁ ′ and y ₂ ′ of a class.

Artificial intelligence (AI) algorithms, including deep neural networks, include input data into an artificial neural network (ANN), and learn output data through operations such as convolution. do. Artificial neural networks can refer to a computer scientific architecture that models the biological brain. In an artificial neural network, nodes corresponding to neurons in the brain are connected to each other and operate collectively to process input data. In a feed-forward neural network, neurons in the neural network have links with other neurons. Such connections may extend in one direction, for example in a forward direction, via a neural network.

1, the first, the first sentence vector (S _i) and a first class (y ₁₎ to the neural network model 100 is input as the input data, the first learning by the neural network model 100 ( The first classification predicted value y ₁ ′ may be output through training. In addition, a second sentence vector S _j and a second class y ₂ are input to the second neural network model 110 as input data, and the second classification is performed through learning through the second neural network model 110. The predicted value y ₂ ′ may be output.

The first neural network model 100 and the second neural network model 110 shown in FIG. 1 may be implemented as a convolutional neural network (CNN), but are not limited thereto. In one embodiment, the first neural network model 100 and the second neural network model 110 comprise a recurrent neural network (RNN), a deep belief network (DBN), a limited Boltzmann machine ( It can be implemented as an artificial neural network model such as a Restricted Boltzman Machine (RBM) method, or a machine learning model such as a support vector machine (SVM).

The first sentence vector S _i and the second sentence vector S _j are extracted by parsing at least one word included in a sentence or utterance input by a user through a natural language processing technique, and extracts the extracted word. Can be generated by converting to a vector. In one embodiment, the first sentence vector (S _i) and the second sentence vectors (S _j), but may be generated through a machine learning model for embedding (embedding) the word, such as word2vec, GloVe, onehot as vectors, whereby It is not limited. The first sentence vector _Si and the second sentence vector S _j may be generated by arranging at least one word vector in a matrix form.

The first class (y ₁₎ and a second class (y ₂₎ is a vector may be a value that defines the class to which it belongs each of the first sentence vector (S _i) and the second sentence vectors (S _j). Here, a class does not mean a hierarchy, but may mean a category classification to which a sentence belongs, for example, politics, society, economy, culture, entertainment, IT, and the like.

First classification predicted value (y ₁ ') are learned through the first neural network model 100, the first sentence vector (S _i) of the first neural network model 100 as a result of the data output due to learning by It may mean a probability value that may be classified as the first class y ₁ . For example, if the corresponding first class (y ₁ ) value of the 'politics' category is (1, 0, 0), and the first classification prediction value (y ₁ ') is (0.9, 0.05, 0.05), the first first sentence corresponding to the sentence vector (S _i) can be related to the global category "politics." The second classification predicted value y ₂ ′ is a result value output through the second neural network model 110, and the second sentence vector S _j is trained through the second neural network model 110 to generate a second class. It may mean a probability value that may be classified as (y ₂ ). A first classification loss value may be obtained by calculating a difference value between the first classification prediction value y ₁ ′ and the first class y ₁ . Similarly, a second classification loss value may be obtained by calculating a difference value between the second classification prediction value y ₂ ′ and the second class y ₂ .

In one embodiment, the first neural network model 100 and the second neural network model 110 may be configured as a convolutional neural network (CNN). The first sentence vector _Si and the second sentence vector S _j are each configured to filter a plurality of filters having different widths through the first neural network model 100 and the second neural network model 110. Through convolution operation, the first and second feature vectors may be learned, respectively. A first feature vector learned through the first neural network model 100, a second feature vector trained through the second neural network model 110, and a first class y ₁ and a second class y ₂ . Contrastive loss L ₁ may be obtained based on the identity of. In one embodiment, the contrast loss value L ₁ may be calculated through a numerical expression representing the dot product of the first feature vector and the second feature vector and whether the first class and the second class are the same. . The control loss value L ₁ will be described in detail in the description of FIG. 4.

In one embodiment, the control loss value L ₁ may have a value in the range of −1 or more and 1 or less. In one embodiment of the present disclosure, learning with the first neural network model 100 and the second neural network model 110 may be repeated in a direction in which the control loss value L ₁ is maximized. Here, the repetition of learning may mean adjusting a weight applied to the first neural network model 100 and the second neural network model 110.

In general deep neural network-based text classification, a classification is made by creating a loss function by learning through a neural network model based on the label of the sentence to be learned. Perform. Thus, many misclassifications can occur if the utterance to be classified does not fall into any class of the classification model. In addition, since classes are classified based on labels, many expressions may be misclassified even if the expressions are similar. For example, when the user input speech is "Send 'KakaoTalk' to 'XXX'", a case may be classified as "Send 'Text' to 'XXX'."

An embodiment of the present disclosure not only learns using the first neural network model 100 in classifying a first class to which a first sentence belongs, but also a second neural network model to learn a second sentence belonging to a second class. A first feature vector learned through 110 and learned from the first neural network model 100, a second feature vector learned from the second neural network model 110, and a first class y ₁ and By calculating the contrast loss value (L ₁ ) based on the identity of the two classes (y ₂ ), there is provided a method and apparatus that can distinguish the degree of representational similarity between the first sentence and the second sentence. Accordingly, the method and apparatus according to an embodiment of the present disclosure may use not only the label of a sentence or speech but also semantic similarity together to improve classification accuracy for sentences that are similar but belong to different classes.

2 is a block diagram illustrating components of the electronic device 200 according to an embodiment of the present disclosure. The electronic device 200 may be a device that performs training for classifying a class to which a sentence belongs by using a neural network model. The electronic device 200 may be a fixed terminal implemented as a computer device or a mobile terminal. The electronic device 200 may be, for example, at least one of a smart phone, a mobile phone, a navigation device, a computer, a notebook computer, a digital broadcasting terminal, a personal digital assistant (PDA), a portable multimedia player (PMP), and a tablet PC. But it is not limited thereto. The electronic device 200 may communicate with other electronic devices and / or servers through a network by using a wireless or wired communication scheme.

Referring to FIG. 2, the electronic device 200 may include a processor 210, a memory 220, and a speech input unit 230.

The processor 210 may be configured to process instructions of a computer program by performing arithmetic, logic, and input / output operations, such as convolution operations. Instructions may be provided to the processor 210 by the memory 220. In one embodiment, processor 210 may be configured to execute a command received according to a program code stored in a recording device, such as memory 220. The processor 210 may be configured, for example, with at least one of a central processing unit, a microprocessor, and a graphic processing unit, but is not limited thereto. In an embodiment, when the electronic device 200 is a mobile device such as a smart phone, a tablet PC, or the like, the processor 210 may be an application processor (AP) for executing an application.

The processor 210 may perform training through a general artificial intelligence algorithm based on a deep neural network such as a neural network model.

The processor 210 may perform a natural language processing (NLP) such as extracting a word from a user's speech, a question sentence, and the like, converting the extracted word into a word vector to generate a sentence vector. The processor 210 parses a word object through objectization of a sentence, processes a still word (filters for articles, etc.) and generates a token (tense, plural unification, etc.), and then extracts highly related keywords based on the frequency of occurrence. You can manage this as an independent entity.

In an embodiment, the processor 210 learns a first feature vector through a first neural network using as input data a first sentence including at least one word and a first class to which the first sentence belongs. The second feature vector may be learned through the second neural network using the second class and the second class to which the second sentence belongs. Here, the first sentence may be a sentence or speech input by a user, and the second sentence may be a sentence extracted randomly among a plurality of sentences stored in a server or a database.

The processor 210 may quantify the degree of similarity in the expression of the first sentence and the second sentence based on the first feature vector, the second feature vector, and whether the first class and the second class are identical. ) Can be obtained. In one embodiment, the processor 210 may calculate the contrast loss value through a numerical expression representing the dot product of the first feature vector and the second feature vector and whether the first class and the second class are equal. . The method of calculating the contrast loss value will be described in detail in the description with reference to FIG. 4.

In one embodiment, the contrast loss value has a value in the range of -1 or more and 1 or less, and the processor 210 repeats the learning using the first neural network and the second neural network so that the obtained contrast loss value is maximum. Can be done.

In one embodiment, the processor 210 may simultaneously perform learning through the first neural network and learning through the second neural network.

The electronic device 200 may further include a speech input unit 230 that receives a speech or sentence from a user. The speech input unit 230 may include a voice recognition module for recognizing a user's voice, but is not limited thereto. The speech input unit 230 may include, for example, a hardware module capable of receiving a user's sentence such as a keypad, a mouse, a touch pad, a touch screen, a jog switch, and the like. The processor 210 recognizes a utterance input through the utterance input unit 230 as a sentence, parses and extracts at least one word included in the recognized sentence, and extracts at least one extracted word. Each can be converted to a word vector. In one embodiment, the processor 210 may embed a word into a vector using a machine learning model such as word2vec, GloVe, onehot, etc., but is not limited thereto. The processor 210 may convert the word representation into a vector value that can be represented in a vector space using the machine learning model.

The processor 210 may generate a sentence vector by arranging at least one word vector in a matrix form, and input the sentence vector as input data to the first neural network to learn the first feature vector. In one embodiment, the processor 210 converts the first sentence into a matrix form comprising at least one word vector, inputs the transformed matrix into the convolutional neural network as input data, and inputs a plurality of filters. A feature map may be generated and the first feature vector may be extracted by passing the feature map through a max pooling layer.

In one embodiment, the processor 210 inputs the first feature vector into a fully connected layer and converts it into a one-dimensional vector value, and inputs the one-dimensional vector value to a softmax classifier. A first classification prediction value representing a probability distribution classified into one class may be obtained. Similarly, the processor 210 learns and extracts a second feature vector, inputs the second feature vector into a fully connected layer, converts it into a one-dimensional vector value, and inputs the one-dimensional vector value to a soft max classifier to give the second class. A second classification prediction value representing a probability distribution classified as may be obtained. Detailed description thereof will be described later in the description of FIG. 4.

The processor 210 may obtain a first classification loss value that is a difference between the first classification prediction value and the first class, and obtain a second classification loss value that is a difference between the second classification prediction value and the second class. The processor 210 calculates a final loss value by summing the first classification loss value, the second classification loss value, and the control loss value, and is applied to the first neural network and the second neural network based on the calculated final loss value. Learning to adjust the weight can be repeated.

The memory 220 may be a computer-readable recording medium, and may include a permanent mass storage device such as random access memory (RAM), read only memory (ROM), and a disk drive. The memory 220 may store an operating system (OS) or at least one computer program code (for example, a code for a learning program through a neural network performed by the processor 210). . Such computer program code may be stored in the memory 220 but may be loaded from a separate computer readable recording medium or a computer program product. The computer-readable recording medium may include a computer-readable recording medium such as a floppy drive, a disk, a tape, a DVD / CD-ROM drive, a memory card, and the like. In one embodiment, the computer program code may be installed in the electronic device 200 and loaded from the memory 220 by files provided from a server through a network.

Although not illustrated, the electronic device 200 may include a database. The database may store a plurality of sentences and a plurality of classes to which each of the plurality of sentences belongs. In an embodiment, the database may be included as a component in the electronic device 200, but is not limited thereto. The database may be configured in the form of a server disposed outside the electronic device 200. In an embodiment, the processor 210 may randomly extract the second sentence and the second class from the database and input the input data into the second neural network as input data to learn.

In operation S310, the electronic device trains a first feature vector through a first neural network using the first sentence and the first class to which the first sentence belongs as input data. In an embodiment, the electronic device may receive a utterance or question from a user and recognize the received utterance or question as a sentence. The electronic device may parse at least one word included in a recognized sentence by using natural language processing (NLP) technology, and may convert at least one word into at least one word vector. In an embodiment, the electronic device may embed at least one word into at least one word vector using a machine learning model such as word2vec, GloVe, onehot, and the like, but is not limited thereto. The electronic device may convert the word representation into a vector value that can be represented in a vector space using the machine learning model.

In an embodiment, the electronic device generates a sentence vector by arranging the embedded at least one word vector in a matrix form, and inputs the generated sentence vector as input data into the first neural network to classify it as a first class. Learn probability distributions.

In operation S320, the electronic device learns the second feature vector through the second neural network using the second sentence and the second class to which the second sentence belongs as input data. In an embodiment, the second sentence and the second class to which the second sentence belongs may be stored in a database form. The electronic device may randomly extract the second sentence and the second class from the database and input the second sentence and the second class as input data into the second neural network to learn. 3 illustrates that step S320 is performed after step S310, but is not limited thereto. The electronic device may simultaneously perform the step of learning the first feature vector (S310) and the step of learning the second feature vector (S320).

In operation S330, the electronic device obtains a contrast loss based on the first feature vector, the second feature vector, and whether the first class and the second class are identical. In one embodiment, the contrast loss value may be calculated through a numerical expression representing the dot product of the first feature vector and the second feature vector and whether the first class and the second class are equal. The expression may output 1 when the first class and the second class are the same, and output 0 when the first class and the second class are not the same. A detailed example of calculating the contrast loss value will be described in detail with reference to FIG. 4.

In operation S340, the electronic device repeats the learning using the first neural network and the second neural network so that the contrast loss value is maximum. In one embodiment, the control loss value may have a value ranging from -1 to 1, inclusive. The electronic device may repeat the learning using the first neural network model and the second neural network model in a direction in which the contrast loss value is maximized. Here, repetition of learning may mean adjusting a weight applied to the first neural network model and the second neural network model.

Referring to Figure 4, an electronic apparatus includes a first neural network (401), a first sentence (S _i) of the first class (y _1), a first sentence (S _i) to the input to the input data is the first class in the ( We can learn the probability distribution classified as y ₁ ). Further, the electronic device is a second neural network (402), a second sentence (S _j) and a second class (y ₂₎ of this second class second sentence (S _j) by entering the input data (y ₂₎ in the Learn the probability distributions that are classified. The first class (y ₁₎ and a second class (y ₂₎ is a vector may be a value that defines the class to which it belongs each of the first sentence vector (S _i) and the second sentence vectors (S _j).

In FIG. 4, the first neural network 401 and the second neural network 402 may be configured as a convolutional neural network model (CNN), but are not limited thereto. In one embodiment, the first neural network 401 and the second neural network 402 are a recurrent neural network (RNN), a deep belief network (DBN), a restricted Boltzman machine (Restricted Boltzman). It may be implemented as an artificial neural network model such as a machine (RBM) scheme or a machine learning model such as a support vector machine (SVM).

Electronic device extracts a plurality of words (words 1-1 to 1-6 words) in the first sentence (S _i) with a natural language processing technology, and the first sentence (S _i) a plurality of words (words 1-1 To words 1-6), and a plurality of words (words 1-1 to 1-6) can be extracted. Figure 4 but in a total of six words shown in 1-1 to 1-6, which is exemplary and is not a word belonging to the first sentence (S _i) is not limited to six.

The electronic device is a plurality of words (words 1-1 to 1-6 words), a plurality of word vectors, _respectively can be converted to (wv wv _1-1 to ₁ _6). In one embodiment, the electronic device is the first sentence vector (S _i) and the second sentence vectors (S _j) is word2vec, GloVe, a plurality of words using a machine learning model, such as onehot (1-1 words to the words 1 -6 may be embedded into a plurality of word vectors wv ₁ _-1 to wv ₁ _-6 .

Electronic device of the converted multi-word vector (wv _-1 to ₁ wv ₁ _- ₆₎ for the arranged in a matrix (matrix) form of generating a sentence vector 411, and the sentence vector 411, a first neural network (401 ) Can be entered. The sentence vector 411 may be an n × k matrix having n words and a dimension of k.

The electronic device may apply a plurality of filters 421 having different widths to the sentence vector 411 to perform a convolution operation, thereby generating a feature map 431. The plurality of filters 421 are vectors having different weights, and the weight value may change as learning progresses. The electronic device may generate the feature map 431 by multiplying and adding vector values of the sentence vector 411 and weight values of the plurality of filters 421. In FIG. 4, the plurality of filters 421 are illustrated as having a width of 2, 3, and 4, but are not limited thereto. The dimension k in the plurality of filters 421 may be the same as the dimension k of the sentence vector.

The electronic device may subsample the feature map 431 by passing the feature map 431 through a max pooling layer, and generate a first feature vector 441. The first feature vector 441 is a single feature vector generated by extracting only the vector value having the maximum value from the feature map 431 through the max pooling layer, and is a representation of the first sentence Si. ) Can be defined as an expression vector. Although shown in FIG. 4 as generating the first feature vector 441 through the max pooling layer, other sub-sampling layers may be substituted. For example, the electronic device may generate the first feature vector 441 through average pooling or L ₂ -norm pooling.

The electronic device may input and concatenate the first feature vector 441 to a fully connected layer, thereby generating the one-dimensional vector 451. The electronic device may generate the first classification predicted value vector 461 by inputting the one-dimensional vector 451 to the softmax classifier. A first classification vector prediction value 461 may represent a probability distribution, which can be classified as a first sentence (S _i) a first class (y _1). Here, the electronic device may perform a dropout operation in order to prevent the occurrence of overfitting generated in the process of adjusting the weight.

The electronic device generates a second feature vector 442 by inputting the second sentence S _j and the second class y ₂ as input data to the second neural network 402, thereby generating a second classification predicted vector. In operation 462, the learning method through the second neural network 402 is the same as the learning method through the first neural network 401 except for input data and learning results, and thus redundant description thereof will be omitted.

The electronic device includes a first feature vector 441, a second feature vector 442, and the first class (y ₁₎ and the second class (y _2), a first sentence (S _i) based on the identity if the and the Contrastive loss L ₁ obtained by quantifying the degree of similarity in expression of two sentences S _j may be obtained. If the first feature vector 441 is defined as F (S _i ) and the second feature vector 442 is defined as F (S _j ), the contrast loss value L ₁ is calculated based on the following equation: Can be.

Referring to Equation 1, the contrast loss value L ₁ may be calculated through the absolute value of the dot product and Y of the first feature vector F (S _i ) and the second feature vector F (S _j ). have. In Equation 1, Y is a notation for converting the identity of the first class (y ₁ ) and the second class (y ₂ ) to a number, and the first class (y ₁ ) and the second class (y ₂ ) 1 may be output when 0 is the same, and 0 may be output when first class y ₁ and second class y ₂ are not identical.

When the first sentence _Si and the second sentence S _j belong to different classes and the sentence expressions are similar, the contrast loss value L ₁ may be −1. That is, when the equation 1 is substituted, Y = 0 because the first class y ₁ and the second class y ₂ are different classes, and the first feature vector F (S _i ) and the second feature vector F are different. Since (S _j ) is similar so that the internal absolute value of F (S _i ) and F (S _j ) is close to 1, the contrast loss value (L ₁ ) is 0 × | ~ 1 |-(1-0) X | -1 | = -1 can be calculated.

When the first sentence S _i and the second sentence S _j belong to different classes and the sentence expressions are also different from each other, the contrast loss value L ₁ may be zero. That is, Y = 0 because the first class y ₁ and the second class y ₂ are different classes, and the first feature vector F (S _i ) and the second feature vector F (S _j ) are also distinguished from each other. Since the cosine value is close to zero, the inner product absolute value can be close to zero. Thus, by substituting Equation 1, the contrast loss value L ₁ can be calculated as 0 × | ~ 0 |-(1-0) × | ~ 0 | = 0.

When the first sentence _Si and the second sentence S _j belong to the same class and the sentence expressions are different from each other, the contrast loss value L ₁ may be zero. That is, since the first class y ₁ and the second class y ₂ are the same class, Y = 1, and the first feature vector F (S _i ) and the second feature vector F (S _j ) are distinguished from each other. Since the cosine value is close to zero, the internal absolute value can be close to zero. Therefore, by substituting Equation 1, the contrast loss value L ₁ can be calculated as 1 × | ~ 0 |-(1-1) × | ~ 0 | = 0.

When the first sentence _Si and the second sentence S _j belong to the same class and the sentence expressions are similar, the contrast loss value L ₁ may be one. That is, since the first class y ₁ and the second class y ₂ are the same class, Y = 1, and the first feature vector F (S _i ) and the second feature vector F (S _j ) are similar classes. In other words, the cosine value is close to 1, so the internal absolute value can be close to 1. Therefore, by substituting Equation 1, the control loss value L ₁ can be calculated as 1 × | ~ 1 |-(1-1) × | ~ 1 | = 1.

Referring to Equation 1, the contrast loss value L ₁ is not only identical to the classes y ₁ and y _{2 in} which the _first and second feature vectors are classified, but also the first feature vector F (S _i. ) And the similarity degree of the second feature vector F (S _j ) may be determined.

The electronic device may learn in a direction in which the contrast loss value L ₁ is maximized. Referring to Equation 1, the contrast loss value L ₁ has a value of -1 or more and 1 or less. When -1, the weight loss of the first neural network 401 and the second neural network 402 is determined. ) To increase the number of lessons. In contrast, when the contrast loss value L ₁ is 1, the electronic device may reduce the number of times of learning through the first neural network 401 and the second neural network 402. That is, when the first sentence _Si and the second sentence S _j have similar expressions even though they belong to different classes, the electronic device may increase the number of learning to distinguish each other. In addition, the electronic device may not have increased the number of times of learning with relative case with similar expression belonging to the same class the first sentence (S _i) and the second sentence (S _j).

According to an embodiment, the electronic device may determine a first classification loss value, which is a difference value between a first classification prediction value vector 461 that is output data of learning through the first neural network 401 and a vector of the first class y ₁ . classification loss (L ₂ ) can be obtained. Similarly, the electronic device determines a second classification loss value L ₃ , which is a difference between a second classification prediction value vector 462, which is output data of learning through the second neural network 402, and a vector of the second class y ₂ . Can be obtained. The first classification loss value L ₂ and the second classification loss value L ₃ are each classified into the first class S _i as the first class y ₁ , and the second sentence S _j is the first classification loss value L ₃ . A value representing the degree of classification into two classes (y ₂ ). The smaller the value, the higher the accuracy of classification.

In an embodiment, the electronic device may add a final loss value (total) by adding a control loss value L ₁ , a first classification loss value L ₂ , and a second classification loss value L ₃ as shown in Equation 2 below. loss) (L) can be calculated.

The electronic device may learn to adjust weights applied to the first neural network 401 and the second neural network 402 based on the calculated final loss value L. FIG.

According to an embodiment of the present disclosure shown in Figure 4, a first sentence (S _i) and the second sentence (S _j) are each of the first class (y ₁₎ and a second class (y ₂₎ classification classified as loss By simultaneously considering not only the values (L- ₂ , L ₃ ) but also the contrast loss value (L ₁ ) that takes into account sentence representations, so that the representations are different if they belong to different classes but have similar word representations. This can prevent misclassification. That is, the electronic device according to an embodiment of the present disclosure can learn expressions that can be effectively separated for similar speech or sentences belonging to different classes, and thus can increase the accuracy of speech or sentence classification.

In particular, when the electronic device executes an interactive robot program such as Bixby or the like, even if the first sentence S _i , which is the speech input by the user, belongs to the first class y ₁ , the sentence expression is different and thus is different. Can be classified as In this case the user is not the answer to your question in accordance with the first sentence of unwanted (S _i) may be the answer incorrectly classified. In this case, the electronic device may increase the accuracy of classifying the class to which the user's question belongs by learning by considering the contrast loss value L ₁ . In addition, the may be a case where one sentence (S _i) does not belong to any class of previously stored class to the electronic device, in which case the electronic device can not be classified as a either a class of the first sentence (S _i) (reject), which can reduce the likelihood of users receiving unwanted answers.

In operation S510, the electronic device converts the first sentence into a matrix form including at least one word vector. In one embodiment, the first sentence may be a speech or sentence input by a user. The electronic device may extract at least one word included in the first sentence and convert the at least one word into at least one word vector. In one embodiment, the electronic device may embed at least one word into at least one word vector using a machine learning model such as word2vec, GloVe, onehot. The electronic device may generate the first sentence vector by arranging at least one word vector in a matrix form.

In operation S520, the electronic device inputs the converted matrix as input data to a convolutional neural network and generates a feature map by applying a plurality of filters. In an embodiment, the electronic device may apply a convolution operation by applying multiple filters having different widths. The multiple filter is a vector having different weights, and the weight value may change as learning progresses. The multiple filter may have the same dimension as the dimension of the sentence vector generated in step S510.

In operation S530, the electronic device extracts the first feature vector by passing the feature map through a max pooling layer. The electronic device may extract a first feature vector that is a single feature vector generated by extracting only a vector value having a maximum value from the feature map through the max pooling layer. However, the layer used for subsampling is not limited to the max pooling layer. In an embodiment, the electronic device may extract the first feature vector through average pooling or L ₂ -norm pooling.

In operation S540, the electronic device inputs the first feature vector into a fully connected layer and converts the first feature vector into a one-dimensional vector value. According to an embodiment of the present disclosure, the electronic device may concatenate a first feature vector composed of a plurality of feature maps generated by using filters having different widths into one to convert a one-dimensional vector value. In operation S540, a dropout operation may be used to solve overfitting occurring while learning the first feature vector and to increase the accuracy of the training data.

In operation S550, the electronic device obtains a first classification prediction value by inputting a one-dimensional vector value to a softmax classifier. In one embodiment, the first classification prediction value refers to a probability value in which the first sentence may be classified into the first class, and may be generated by passing through a soft max classifier. The vector value included in the one-dimensional vector may be converted into a probability value through which the total sum of the vector values passes through the soft max classifier.

Although FIG. 5 illustrates a process of obtaining a first classification prediction value by inputting a first sentence to a convolutional neural network, the illustrated steps may be equally applied to the second sentence. According to an embodiment, the electronic device may obtain a second classification prediction value by inputting the second sentence to the convolutional neural network according to steps S510 to S550. In an embodiment, the electronic device may simultaneously perform a first learning process of obtaining a first classification prediction value and a second learning process of obtaining a second classification prediction value.

In operation S610, the electronic device obtains a first classification loss value that is a difference between the first classification prediction value and the first class. The first classification loss value may mean a difference value between the first classification prediction value and the first class vector, which is a probability value in which the first sentence may be classified into the first class.

In operation S620, the electronic device obtains a second classification loss value that is a difference between the second classification prediction value and the second class. The second classification loss value may mean a difference value between the second classification prediction value and the second class vector, which is a probability value in which the second sentence may be classified into the second class. In one embodiment, step S610 and step S620 may be performed simultaneously.

In operation S630, the electronic device obtains a final loss value by summing the first classification loss value, the second classification loss value, and the control loss value. A detailed method of calculating the control loss value has been described in the description of FIG. 4, and thus redundant description will be omitted.

In operation S640, the electronic device adjusts a weight applied to the first neural network and the second neural network based on the final loss value. In one embodiment, the first neural network and the second neural network are composed of a convolutional neural network that generates a feature map by applying a plurality of filters, and the electronic device is applied to the convolutional neural network according to the magnitude of the final loss value. The weight values of the plurality of filters may be adjusted.

The electronic device described herein may be implemented as a hardware component, a software component, and / or a combination of hardware components and software components. For example, the electronic device described in the disclosed embodiments may include a processor, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), It may be implemented using one or more general purpose or special purpose computers, such as a microprocessor or any other device capable of executing and responding to instructions.

The software may include a computer program, code, instructions, or a combination of one or more of the above, and configure the processing device to operate as desired, or process it independently or collectively. You can command the device.

The software may be implemented as a computer program including instructions stored in a computer-readable storage media. Computer-readable recording media include, for example, magnetic storage media (eg, read-only memory (ROM), random-access memory (RAM), floppy disks, hard disks, etc.) and optical read media (eg, CD-ROMs). (CD-ROM) and DVD (Digital Versatile Disc). The computer readable recording medium can be distributed over network coupled computer systems so that the computer readable code is stored and executed in a distributed fashion. The medium may be read by a computer, stored in a memory, and executed by a processor.

The computer is a device capable of calling stored instructions from a storage medium and operating according to the disclosed embodiments according to the called instructions, and may include an electronic device according to the disclosed embodiments.

The computer readable storage medium may be provided in the form of a non-transitory storage medium. Here, 'non-temporary' means that the storage medium does not include a signal and is tangible, but does not distinguish that the data is stored semi-permanently or temporarily on the storage medium.

In addition, an electronic device or method according to the disclosed embodiments may be provided included in a computer program product. The computer program product may be traded between the seller and the buyer as a product.

The computer program product may include a software program, a computer-readable storage medium on which the software program is stored. For example, a computer program product may be a product (eg, a downloadable application) in the form of a software program distributed electronically through a manufacturer of an electronic device or an electronic market (eg, Google Play Store, App Store). ) May be included. For electronic distribution, at least a portion of the software program may be stored on a storage medium or temporarily created. In this case, the storage medium may be a server of a manufacturer, a server of an electronic market, or a storage medium of a relay server that temporarily stores a software program.

The computer program product may include a storage medium of a server or a storage medium of a terminal in a system consisting of a server and a terminal (for example, an ultrasound diagnostic apparatus). Alternatively, when there is a third device (eg, a smartphone) that is in communication with the server or the terminal, the computer program product may include a storage medium of the third device. Alternatively, the computer program product may include a software program itself transmitted from the server to the terminal or the third device, or transmitted from the third device to the terminal.

In this case, one of the server, the terminal and the third device may execute the computer program product to perform the method according to the disclosed embodiments. Alternatively, two or more of the server, the terminal and the third device may execute a computer program product to distribute and implement the method according to the disclosed embodiments.

For example, a server (eg, a cloud server or an artificial intelligence server, etc.) may execute a computer program product stored in the server to control a terminal connected to the server to perform the method according to the disclosed embodiments.

As another example, a third device may execute a computer program product to control a terminal in communication with the third device to perform the method according to the disclosed embodiment.

When the third device executes the computer program product, the third device may download the computer program product from the server and execute the downloaded computer program product. Alternatively, the third apparatus may execute the provided computer program product in a preloaded state to perform the method according to the disclosed embodiments.

In addition, although the embodiments of the present disclosure have been illustrated and described above, the present disclosure is not limited to the above-described specific embodiments, and the present disclosure belongs to the present invention without departing from the gist of the present invention as claimed in the claims. Various modifications may be made by those skilled in the art, and these modifications should not be individually understood from the technical spirit or the prospect of the present disclosure.

Although the embodiments have been described by the limited embodiments and the drawings as described above, various modifications and variations are possible to those skilled in the art from the above description. For example, the described techniques may be performed in a different order than the described method, and / or components of the described electronic devices, structures, circuits, etc. may be combined or combined in a different form than the described method, or other components or Appropriate results can be achieved even if replaced or replaced by equivalents.

Claims

In a method of classifying a class to which a sentence belongs by using a deep neural network,

Training a first feature vector through a first neural network using a first sentence including at least one word and a first class to which the first sentence belongs as input data; ;

Learning a second feature vector through a second neural network using as input data a second sentence and a second class to which the second sentence belongs;

Contrastive loss that quantifies the degree of similarity in expression of the first sentence and the second sentence based on whether the first feature vector, the second feature vector, and the first class and the second class are identical. Obtaining; And

Repeating learning with the first neural network and the second neural network such that the control loss value is maximum;

Including, the method.
According to claim 1,

Receiving a speech from a user;

Recognizing the received speech as a sentence; And

Extracting at least one word included in the recognized sentence and converting the at least one word into at least one word vector, respectively; More,

Learning the first feature vector may include:

Generating a sentence vector by arranging the at least one word vector in a matrix form; And

Learning the first feature vector by inputting the sentence vector to the first neural network as input data.
According to claim 1,

A plurality of sentences and a plurality of classes to which each of the plurality of sentences belong are stored in a database.

Wherein the second sentence and the second class are extracted randomly on the database,
According to claim 1,

The obtaining of the control loss value may include obtaining the control loss value through a formula representing a dot product of the first feature vector and the second feature vector and whether the first class and the second class are equal to each other. To calculate, how.
The method of claim 4, wherein

And the formula outputs 1 when the first class and the second class are the same and outputs 0 when the first class and the second class are not the same.
According to claim 1,

Learning through the first neural network and learning through the second neural network are performed simultaneously.
In an electronic device that classifies a class to which a sentence belongs using a deep neural network,

It includes a processor that performs the training (training) using a neural network,

The processor learns a first feature vector through a first neural network using a first sentence including at least one word and a first class to which the first sentence belongs as input data, and a second sentence. And a second feature vector is learned through a second neural network using the second class to which the second sentence belongs, and the first feature vector, the second feature vector, and the first and second classes. A contrast loss is obtained by quantifying the degree of similarity in the expressions of the first sentence and the second sentence based on the identity of the first sentence, and the first neural network and the first order such that the contrast loss is maximized. 2 An electronic device repeatedly performing learning using a neural network.
The method of claim 7, wherein

A speech input unit configured to receive a speech from a user; More,

The processor recognizes the received speech as a sentence, extracts at least one word included in the recognized sentence, and converts the at least one word into at least one word vector, respectively.
The method of claim 8,

The processor generates the sentence vector by arranging the at least one word vector in a matrix form, and inputs the sentence vector as input data to the first neural network to learn the first feature vector. .
The method of claim 7, wherein

A database for storing a plurality of sentences and a plurality of classes to which each of the plurality of sentences belongs;

More,

And the processor extracts the second sentence and the second class from the database randomly on the database and inputs the input data to the second neural network as input data.
The method of claim 7, wherein

The processor is configured to calculate the contrast loss value through a numerical expression representing a dot product of the first feature vector and the second feature vector and whether the first class and the second class are equal to each other. .
The method of claim 11, wherein

The formula outputs 1 when the first class and the second class are the same, and outputs 0 when the first class and the second class are not the same.
The method of claim 7, wherein

The processor converts the first sentence into a matrix including at least one word vector, inputs the converted matrix as input data to a convolutional neural network, and applies a plurality of filters to apply a feature map ( and generate a feature map and extract the first feature vector by passing the feature map through a max pooling layer.
The method of claim 7, wherein

The processor is configured to simultaneously perform learning through the first neural network and learning through the second neural network.
A computer program product comprising a computer readable storage medium,

The storage medium,

Training a first feature vector through a first neural network using a first sentence including at least one word and a first class to which the first sentence belongs as input data; ;

Learning a second feature vector through a second neural network using as input data a second sentence and a second class to which the second sentence belongs;

Contrastive loss that quantifies the degree of similarity in expression of the first sentence and the second sentence based on whether the first feature vector, the second feature vector, and the first class and the second class are identical. Obtaining; And

Repeating learning with the first neural network and the second neural network such that the control loss value is maximum;

A computer program product comprising instructions for performing a.