WO2020005049A1 - Procédé d'apprentissage pour réseau neuronal artificiel (Training method for artificial neural network) - Google Patents

Procédé d'apprentissage pour réseau neuronal artificiel (Training method for artificial neural network)

Info

Publication number
WO2020005049A1
WO2020005049A1 (PCT/KR2019/095019, KR2019095019W)
Authority
WO
WIPO (PCT)
Prior art keywords
data
processor
sub
information processing
processing method
Prior art date
Application number
PCT/KR2019/095019
Other languages
English (en)
Korean (ko)
Inventor
송기영
강형신
Original Assignee
주식회사 수아랩
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 주식회사 수아랩 filed Critical 주식회사 수아랩
Publication of WO2020005049A1 publication Critical patent/WO2020005049A1/fr

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Definitions

  • TECHNICAL FIELD This disclosure relates to artificial neural networks, and more particularly, to a method of training artificial neural networks and data classification through trained neural networks.
  • Pattern recognition is a field of machine learning, and refers to the science of recognizing the regularity of patterns and data.
  • Pattern recognition techniques include supervised learning and unsupervised learning methods.
  • Supervised learning means a method in which an algorithm learns pattern recognition using data for which the result of pattern recognition has already been determined (called "training" data).
  • The present disclosure has been devised in response to the above-described background and aims to provide a method for training an artificial neural network and for classifying data using the trained neural network.
  • An information processing method performed by one or more processors may include: receiving, by a processor, first data; generating, by a processor, second data based on the first data using a first computational model including a plurality of submodels connected in series, wherein each submodel includes at least one layer for changing dimensions; and generating, by a processor, a cost function based on at least two of the first data, the second data, and a plurality of sub data generated by the plurality of submodels.
  • Generating the second data may include: reducing, by a processor, the dimension of the first data using a first reduction layer included in a first submodel and generating first sub data by recovering the reduced dimension of the first data using a first recovery layer included in the first submodel; and reducing, by a processor, the dimension of the first sub data using a second reduction layer included in a second submodel and generating the second data by recovering the reduced dimension of the first sub data using a second recovery layer included in the second submodel.
  • The information processing method may further include training, by a processor, at least one of the plurality of submodels based on the cost function.
  • The cost function may include a term based on the relationship between the first data and the second data and a term based on the relationship between the first data and the first sub data.
  • the first data may comprise: training data consisting of normal data.
  • the first data may comprise: image data.
  • At least one of the plurality of submodels may comprise neural networks.
  • The neural network may include an auto encoder or a generative adversarial network (GAN).
  • The information processing method may further include generating, by a processor, a second computational model that shares at least a portion of the first computational model.
  • The second computational model may share at least a portion of the first reduction layer and the second recovery layer.
  • An information processing method performed by one or more processors may include: receiving, by a processor, first data; generating, by a processor, second data based on the first data using a first computational model including a plurality of submodels connected in series, wherein each submodel includes at least one layer that changes dimensions; and generating, by a processor, test result information based on the first data and the second data.
  • An information processing method performed by one or more processors may include: receiving, by a processor, first data; generating, by a processor, third data based on the first data using a second computational model that shares at least a portion of a first computational model; and generating, by a processor, test result information based on the first data and the third data.
  • A computing device may include: one or more processors; and a memory storing instructions executable by the one or more processors. The one or more processors may perform: receiving first data; generating second data based on the first data using a first computational model including a plurality of submodels connected in series, each submodel including at least one layer that changes dimensions; and generating a cost function based at least in part on the first data, the second data, and the plurality of sub data generated by the plurality of submodels.
  • A computing device may include: one or more processors; and a memory storing instructions executable by the one or more processors. The one or more processors may perform: receiving first data; generating second data based on the first data using a first computational model including a plurality of submodels connected in series, each submodel including at least one layer that changes dimensions; and generating test result information based on the first data and the second data.
  • A computing device may include: one or more processors; and a memory storing instructions executable by the one or more processors. The one or more processors may perform: receiving first data; generating third data based on the first data using a second computational model that shares at least a portion of a first computational model; and generating test result information based on the first data and the third data.
  • The present disclosure, devised in response to the above-described background art, can provide a method for training an artificial neural network and for classifying data using the trained neural network.
  • FIG. 1 illustrates a block diagram of a computing device for performing an information processing method according to an embodiment of the present disclosure.
  • FIG. 2 is a schematic diagram illustrating a portion of an artificial neural network according to one embodiment of the present disclosure.
  • FIG. 3 is a schematic diagram of a computational model for implementing the information processing method of the present invention in accordance with one embodiment of the present disclosure.
  • FIG. 4 is a flowchart for describing steps of an information processing method for training a first connection model according to an embodiment of the present disclosure.
  • FIG. 5 is a flowchart for describing steps of an information processing method for performing a test using a first connection model according to an embodiment of the present disclosure.
  • FIG. 6 is a flowchart for explaining steps of an information processing method for performing a test using a second connection model according to an embodiment of the present disclosure.
  • FIG. 7 shows a brief, general schematic diagram of an example computing environment in which embodiments of the present disclosure may be implemented.
  • a component may be, but is not limited to being, a process running on a processor, a processor, an object, a thread of execution, a program, and / or a computer.
  • an application running on a computing device and the computing device can be a component.
  • One or more components can reside within a processor and / or thread of execution, and a component can be localized within one computer or distributed between two or more computers.
  • these components can execute from various computer readable media having various data structures stored thereon.
  • The components may communicate via local and/or remote processes, for example, in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system or a distributed system, and/or data transmitted over a network such as the Internet to other systems by way of the signal).
  • FIG. 1 illustrates a block diagram of a computing device in accordance with one embodiment of the present disclosure.
  • Computing device 100 may include a processor 110 and a memory 120.
  • The block diagram shown in FIG. 1 illustrates a simplified configuration of the computing device; the present disclosure is not limited thereto, and the computing device may include additional components required for its operation.
  • The processor 110 may include one or more cores and may include a processor for data analysis and machine learning, such as a central processing unit (CPU), a general purpose graphics processing unit (GPGPU), or a tensor processing unit (TPU) of the computing device; the above-described types of processors are merely examples, and the present disclosure is not limited thereto.
  • the processor 110 may read a computer program stored in the memory 120 to perform an information processing method according to an embodiment of the present disclosure. According to an embodiment of the present disclosure, the processor 110 may perform calculation for learning a neural network. In this specification, learning or training may be used in the same sense.
  • The processor 110 may perform calculations for training a neural network, such as processing input data for learning, extracting features from the input data, calculating errors, and updating the weights of the neural network using backpropagation. At least one of the CPU, the GPGPU, and the TPU of the processor 110 may process the training of a computational model. For example, the CPU and the GPGPU may work together to train network functions and classify data using computational models. In addition, in an embodiment of the present disclosure, the processors of a plurality of computing devices may be used together to process the training of a computational model and data classification using the computational model.
  • the computer program executed in the computing device 100 may be a program executable by a CPU, a GPGPU, or a TPU.
  • The computing device 100 may process a computational model in a distributed manner using at least one of a CPU, a GPGPU, and a TPU. In addition, in an embodiment of the present disclosure, the computing device 100 may process a computational model in a distributed manner together with other computing devices.
  • the memory 120 may store a computer program for performing an information processing method according to an embodiment of the present disclosure, and the stored computer program may be read and driven by the processor 110.
  • FIG. 2 is a schematic diagram illustrating a portion of an artificial neural network according to an embodiment of the present disclosure.
  • a neural network may consist of a set of interconnected computing units, which may generally be referred to as "nodes.” Such “nodes” may be referred to as “neurons”.
  • the neural network comprises at least one node. Nodes (or neurons) that make up neural networks may be interconnected by one or more “links”.
  • one or more nodes connected via a link may form a relationship of input node and output node relatively.
  • the concept of an input node and an output node is relative; any node in an output node relationship to one node may be in an input node relationship in relation to another node, and vice versa.
  • the input node to output node relationship can be created around the link.
  • One or more output nodes can be connected to a single input node via a link, and vice versa.
  • the output node may be determined based on data input to the input node.
  • A link interconnecting an input node and an output node may have a weight.
  • The weight may be variable and may be varied by a user or an algorithm in order for the neural network to perform a desired function. For example, when one or more input nodes are connected to one output node by respective links, the value of the output node may be determined based on the values input to the input nodes connected to the output node and the weights set on the links corresponding to the respective input nodes.
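  • As a minimal numeric illustration of the weighted connection described above (the node count, weights, and activation below are hypothetical and not taken from the disclosure), the value of an output node can be computed from the values of its input nodes and the weights set on the corresponding links:
```python
import torch

# Hypothetical example: three input nodes connected to one output node.
inputs = torch.tensor([0.5, -1.0, 2.0])   # values input to the input nodes
weights = torch.tensor([0.8, 0.1, -0.3])  # weights set on the corresponding links

# The output node value is determined by the input values and the link weights.
output = torch.sigmoid(torch.dot(weights, inputs))
print(output)  # a single scalar between 0 and 1
```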
  • a neural network is formed by interconnecting one or more nodes through one or more links to form an input node and an output node relationship within the neural network.
  • the characteristics of the neural network may be determined according to the number of nodes and links in the neural network, the relationship between the nodes and the links, and the value of the weight assigned to each of the links. For example, if there are the same number of nodes and links, and there are two neural networks with different weight values between the links, the two neural networks may be recognized as different from each other.
  • The neural network may include one or more nodes. A subset of the nodes constituting the neural network may form one layer based on their distance from the initial input node; for example, a set of nodes whose distance from the initial input node is n may constitute the n-th layer.
  • The distance from the initial input node may be defined by the minimum number of links that must be traversed to reach a given node from the initial input node.
  • the definition of such a layer is arbitrary for explanation, and the order of the layer in the neural network may be defined in a manner different from that described above. For example, a layer of nodes may be defined by distance from the final output node.
  • the initial input node may refer to one or more nodes into which data is directly input without going through a link in relation to other nodes of the neural network.
  • In other words, in the relationship among nodes based on links within the neural network, the initial input node may mean a node that has no other input node connected to it by a link.
  • the final output node may refer to one or more nodes that do not have an output node in relation to other nodes of the nodes in the neural network.
  • A hidden node may refer to a node constituting the neural network other than the initial input node and the final output node.
  • The neural network according to an embodiment of the present disclosure may be a neural network in which the number of nodes in the input layer is the same as the number of nodes in the output layer, and the number of nodes decreases and then increases again as the layers progress from the input layer to the hidden layer.
  • The neural network according to another embodiment of the present disclosure may be a neural network in which the number of nodes in the input layer is smaller than the number of nodes in the output layer, and the number of nodes decreases as the layers progress from the input layer to the hidden layer.
  • The neural network according to yet another embodiment of the present disclosure may be a neural network in which the number of nodes in the input layer is larger than the number of nodes in the output layer, and the number of nodes increases as the layers progress from the input layer to the hidden layer.
  • the neural network according to another embodiment of the present disclosure may be a neural network in a combined form of the neural networks described above.
  • a deep neural network may refer to a neural network including a plurality of hidden layers in addition to an input layer and an output layer.
  • Deep neural networks can be used to identify latent structures in data, that is, the latent structure of photos, text, video, voice, and music (e.g., what objects are in a photo, what the content and sentiment of a text are, what the content and sentiment of a voice are, and so on).
  • Deep neural networks include convolutional neural networks (CNN), recurrent neural networks (RNN), auto encoders, generative adversarial networks (GAN), restricted Boltzmann machines (RBM), deep belief networks (DBN), Q networks, U networks, Siamese networks, and the like.
  • the neural network may be trained in at least one of supervised learning, unsupervised learning, and semi supervised learning. Training of neural networks is intended to minimize errors in the output.
  • A neural network is trained by repeatedly inputting the training data into the neural network, calculating the error between the neural network's output and the target for the training data, and propagating the error backward from the output layer of the neural network in the direction that reduces the error.
  • In the case of supervised learning, training data in which the correct answer is labeled on each piece of training data (i.e., labeled training data) is used; in the case of unsupervised learning, the correct answer may not be labeled on each piece of training data.
  • For example, in the case of supervised learning for data classification, the training data may be data in which a category is labeled on each piece of training data.
  • the labeled training data is input to the neural network, and an error can be calculated by comparing the label of the training data with the output (category) of the neural network.
  • In the case of unsupervised learning, an error may be calculated by comparing the training data, which is the input, with the neural network output. The calculated error is propagated backward through the neural network (i.e., in the direction from the output layer to the input layer), and the connection weight of each node of each layer of the neural network may be updated according to the backpropagation.
  • the connection weight of each node to be updated may be determined according to a learning rate.
  • the computation of the neural network for the input data and the backpropagation of the errors can constitute a learning cycle.
  • The learning rate may be applied differently according to the number of repetitions of the learning cycle of the neural network. For example, in the early stage of training, a high learning rate may be used so that the neural network quickly reaches a certain level of performance, increasing efficiency, and a lower learning rate may be used in the later stage of training to increase accuracy.
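  • A sketch of the variable learning-rate idea above, assuming a PyTorch optimizer; the model, data, and schedule values below are illustrative only and are not part of the disclosure:
```python
import torch

model = torch.nn.Linear(10, 1)                            # hypothetical network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)   # high initial learning rate
# Decay the learning rate by 10x every 30 learning cycles (epochs) -- illustrative values.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

x, y = torch.randn(64, 10), torch.randn(64, 1)            # dummy training data
for epoch in range(100):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()      # backpropagate the error
    optimizer.step()     # update the connection weights
    scheduler.step()     # later cycles use a lower learning rate
```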
  • Training data can generally be a subset of the actual data (i.e., the data to be processed using the trained neural network); therefore, there may be a learning cycle in which the error on the training data decreases while the error on the actual data increases.
  • Overfitting is a phenomenon in which the error on the actual data increases because the neural network has learned the training data excessively. For example, a neural network that has learned to recognize cats only from images of yellow cats may fail to recognize a cat that is not yellow. Overfitting can be a cause of increased error in machine learning algorithms.
  • Various optimization methods can be used to prevent such overfitting, for example, increasing the training data, applying regularization, or applying dropout in which a portion of the nodes of the network is omitted during training.
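  • The countermeasures named above (regularization and dropping out a portion of the nodes) can be sketched as follows; the layer sizes, dropout rate, and weight-decay value are assumptions for illustration:
```python
import torch
import torch.nn as nn

# Dropout randomly omits a portion of the nodes during training.
net = nn.Sequential(
    nn.Linear(32, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # drop 50% of the activations while training
    nn.Linear(64, 32),
)

# Weight decay is one common form of regularization.
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3, weight_decay=1e-4)
```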
  • FIG. 3 is a schematic diagram of a computational model for implementing the information processing method of the present invention in accordance with one embodiment of the present disclosure.
  • the information processing method of the present disclosure may be performed by one or more processors 110 of the computing device 100.
  • One or more processors 110 of computing device 100 of the present disclosure may perform a computational process of the computational model of the present disclosure.
  • All computational processes of the present disclosure (i.e., training of computational models, feature extraction, feature comparison, and the like) may be performed by the one or more processors 110 of the computing device 100.
  • the expression processing data in the computational model may mean a process in which the processor 110 of the computing device 100 processes the data by executing the computational model.
  • the computational models 200, 300 of the present disclosure can be used for classification of data.
  • the computational models 200 and 300 of the present disclosure can be used for anomaly detection.
  • the anomaly data may refer to abnormal data that deviates from a normal pattern of data.
  • Data may have a typical pattern, and anomaly data may refer to data that deviates from this typical pattern.
  • For example, data regarding an image of a product may have a typical pattern corresponding to a normal product,
  • and anomaly data may be data that deviates from the typical pattern of the normal product (e.g., an image of a defective product).
  • the description of the normal data, the atypical pattern, and the anomaly data of the present disclosure is merely an example and the present disclosure is not limited thereto.
  • the processor 110 may receive the first data.
  • the processor 110 may receive first data stored in the memory 120 or another storage medium.
  • the first computational model 200 may use the first data as an input.
  • The processor 110 may process the first data using the first computational model.
  • The processor 110 may generate, as an output, second data based on the first data through the first computational model.
  • the first data may be training data of the computational models 200 and 300.
  • the first data may be data to be classified through the calculation models 200 and 300.
  • the first data may be data that is subject to anomaly determination.
  • the first data of the present disclosure may include training data consisting of normal data.
  • the processor 110 of the present disclosure may train a computational model using training data consisting of normal data.
  • the submodel included in the first computational model of the present disclosure may include at least one layer (eg, a reduction layer and a recovery layer) for changing the dimension.
  • an error may occur while the input data is changed in dimension and recovered again.
  • The processor 110 may train the first computational model using the errors generated by performing the above-described process a plurality of times on one piece of input data.
  • That is, the processor 110 may train the first computational model using the first sub data, which is obtained by reducing and then recovering the dimension of the first data (the input data), and the second data, which is obtained by reducing and then recovering the dimension of the first sub data.
  • the information processing method of the present disclosure can generate a plurality of output data using one input data.
  • The generated plurality of output data may be used as training data, and thus the information processing method of the present disclosure enables efficient training of a computational model even when the amount of training data is small.
  • the first computational model 200 of the present disclosure may include a plurality of submodels connected in series.
  • The submodel included in the first computational model may include at least one layer for changing the dimension. Accordingly, the submodel of the present disclosure may generate an error in the process of changing and reconstructing the dimension of the input data, and this error may be used for training and testing.
  • the submodel included in the first computational model 200 may include various neural networks.
  • The submodels may include convolutional neural networks (CNNs), recurrent neural networks (RNNs), auto encoders, generative adversarial networks (GANs), restricted Boltzmann machines (RBMs), deep belief networks (DBNs), Q networks, U networks, and Siamese networks.
  • the submodel may be a model for extracting a feature from an input by the processor 110 and then again generating an output identical to the input from the feature.
  • the submodel may be an auto encoder including an encoder for reducing the dimension of the input data and a decoder for restoring the dimension of the data again.
  • The submodel may be a generative adversarial network (GAN) including a generator and a discriminator.
  • In this case, the generator may play the role of the encoder of the auto encoder, and the discriminator may play the role of the decoder of the auto encoder.
  • However, the present disclosure is not limited thereto, and the first computational model 200 may include various neural networks.
  • the plurality of submodels may be connected in series with each other.
  • Here, serial connection may mean that at least a part of the output of one submodel is used as an input of another submodel, or that a part of one submodel overlaps with another submodel.
  • For example, the processor 110 may generate the first sub data 600 by processing the first data 100 as an input of the first submodel 210, 220.
  • the second sub-models 230 and 240 may output the second sub-data by processing the first sub-data 600 that is the output of the first sub-models 210 and 220 as an input.
  • Since the second submodel is the last stage of the serial structure of the first computational model, the second sub data may be the second data, that is, the output of the first computational model 200.
  • However, the present disclosure is not limited thereto, and the first computational model 200 may include various numbers of submodels connected in various ways.
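  • A minimal sketch of the serial structure described above, under the assumption that each submodel is an auto encoder whose reduction layer lowers the dimension and whose recovery layer restores it; the class names, layer widths, and batch size are hypothetical and are not the disclosure's reference numerals:
```python
import torch
import torch.nn as nn

class SubModel(nn.Module):
    """One submodel: a reduction (encoder) layer followed by a recovery (decoder) layer."""
    def __init__(self, dim: int, reduced_dim: int):
        super().__init__()
        self.reduce = nn.Sequential(nn.Linear(dim, reduced_dim), nn.ReLU())  # reduction layer(s)
        self.recover = nn.Linear(reduced_dim, dim)                           # recovery layer(s)

    def forward(self, x):
        return self.recover(self.reduce(x))  # reduce the dimension, then recover it

class FirstComputationalModel(nn.Module):
    """Two submodels connected in series: the output of the first is the input of the second."""
    def __init__(self, dim: int = 128, reduced_dim: int = 32):
        super().__init__()
        self.sub1 = SubModel(dim, reduced_dim)
        self.sub2 = SubModel(dim, reduced_dim)

    def forward(self, first_data):
        first_sub_data = self.sub1(first_data)   # reconstruction produced by the first submodel
        second_data = self.sub2(first_sub_data)  # reconstruction of that reconstruction
        return second_data, first_sub_data

model = FirstComputationalModel()
x = torch.randn(8, 128)                          # stand-in for a batch of first data
second_data, first_sub_data = model(x)
```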
  • the submodel of the present disclosure may include at least one layer for changing the dimension.
  • the submodel may include at least one of a reduction layer and a recovery layer.
  • the processor 110 of the present disclosure may reduce the dimension of the input data by using reduction layers.
  • the processor 110 may reduce the dimension of the first data by using the first reduction layer included in the first computational model 200.
  • the processor 110 may reduce the dimension of the first sub data by using the second reduction layer included in the second computational model 300.
  • the reduction layer may consist of a single layer or may consist of a plurality of layers.
  • The processor 110 of the present disclosure may recover the reduced dimension of the input data using recovery layers. For example, referring to FIG. 3, the processor 110 may recover the reduced dimension of the first data using the first recovery layer included in the first computational model 200. In addition, the processor 110 may recover the reduced dimension of the first sub data using the second recovery layer included in the second computational model 300.
  • the recovery layer may consist of a single layer or may consist of a plurality of layers.
  • the processor 110 of the present disclosure may generate a cost function based at least in part on the first data, the second data, and the plurality of sub data generated by the plurality of sub models.
  • the processor 110 may generate a cost function to train the first computational model 200 in a direction in which the value of the cost function decreases.
  • a cost function may be used in the same sense as a loss function.
  • the processor 110 may use various kinds of cost functions.
  • the processor 110 may use at least one of mean squared error (MSE), cross entropy error (CEE), and intersection over union (IOU) as a cost function.
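  • For reference, the three cost functions named above are conventionally defined as follows (standard textbook definitions, not formulas taken from the disclosure), where y is the target, ŷ is the model output, and A and B are two regions being compared:
```latex
\begin{aligned}
\mathrm{MSE}(y,\hat{y}) &= \frac{1}{N}\sum_{i=1}^{N}\left(y_i-\hat{y}_i\right)^2 \\
\mathrm{CEE}(y,\hat{y}) &= -\sum_{i=1}^{N} y_i \log \hat{y}_i \\
\mathrm{IoU}(A,B) &= \frac{\lvert A \cap B\rvert}{\lvert A \cup B\rvert}
\end{aligned}
```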
  • the present invention is not limited thereto, and the processor 110 may use various types of cost functions.
  • the processor 110 may use the average value of the errors as a cost function, or use the sum of the errors as the cost function.
  • the processor 110 may generate a cost function using a relationship between various data.
  • the relationship between the data may mean an error (or similarity) between the data.
  • the processor 110 may compare the features of the plurality of data (eg, images) to generate an error value (or similar value).
  • For example, the processor 110 may use a function that compares a mathematical distance between two features, or may use a neural network configured to determine the similarity between the two features.
  • the processor 110 may generate an error value by comparing the features extracted from the data with any data comparison algorithm, and the present disclosure is not limited to the comparison algorithm.
  • the processor 110 may use a deep neural network structure, and among them, may use a deconvolutional neural network structure.
  • the present invention is not limited thereto, and the relationship between data may include various meanings such as a difference, an average, and a sum between data.
  • For example, the processor 110 may generate a cost function using the relationship between the first data 100, which is the input data, and the second data 400, which is the output data of the first computational model 200.
  • In addition, the processor 110 may generate a cost function using the first data and the first sub data, which is the output data of the first submodel 210, 220.
  • the present invention is not limited thereto.
  • the processor 110 may use the sum of the cost functions as the cost function.
  • For example, the processor 110 may use, as the cost function, a function including a term that is based on the relationship between the first data and the second data and a term that is based on the relationship between the first data and the first sub data.
  • As the number of submodels increases, the types of terms that may be included in the cost function may also increase.
  • For example, a term based on the relationship between at least two of the first data, the second data, and the first through fourth sub data may be included in the cost function.
  • the present invention is not limited thereto.
  • The processor 110 of the present disclosure may set different weights for the terms included in the cost function. For example, the processor 110 may generate a cost function in which the term based on the relationship between the first data and the second data is multiplied by a weight of 2 and the term based on the relationship between the first data and the first sub data is multiplied by a weight of 1. In this case, the term based on the relationship between the first data and the second data has a larger effect on the value of the cost function.
  • the present invention is not limited thereto, and the weight may be variously set.
  • The processor 110 may variably set the weights according to the training time point. For example, the processor 110 may first train the first computational model with the weights of all terms set equally, and then further train the first computational model using a cost function in which the term based on the relationship between the first data and the second data is multiplied by a weight of 2.
  • the present invention is not limited thereto, and the processor 110 may variously set weights at various time points.
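  • Continuing the FirstComputationalModel sketch above, a cost function that combines a term on the relationship between the first data and the second data with a term on the relationship between the first data and the first sub data, using the illustrative weights 2 and 1 mentioned above, might look like this (MSE as the per-term error and the random stand-in data are assumptions):
```python
import torch
import torch.nn.functional as F

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
w_second, w_sub = 2.0, 1.0   # per-term weights; may be varied according to the training time point

for step in range(1000):
    first_data = torch.randn(8, 128)   # stand-in for a batch of normal training data
    second_data, first_sub_data = model(first_data)
    cost = (w_second * F.mse_loss(second_data, first_data)     # term: first data vs. second data
            + w_sub * F.mse_loss(first_sub_data, first_data))  # term: first data vs. first sub data
    optimizer.zero_grad()
    cost.backward()        # train in the direction in which the cost function decreases
    optimizer.step()
```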
  • the processor 110 of the present disclosure may train at least one of the plurality of submodels based on the cost function. For example, the processor 110 may train the first submodel and the second submodel based on the cost function. In addition, the processor 110 may train only the first sub-model or the second sub-model based on the cost function.
  • The processor 110 may variably set which submodels are trained according to the training time point. For example, the processor 110 may train the first submodel and the second submodel, and then additionally train only the second submodel, excluding the first submodel. In addition, the processor 110 may train only a portion of a submodel; for example, the processor 110 may train only the first reduction layer of the first submodel and the second recovery layer of the second submodel.
  • the present invention is not limited thereto, and the processor 110 may train the plurality of submodels in various ways based on the cost function.
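  • Training only some of the submodels, as described above, can be sketched by freezing the parameters that should not be updated (a common PyTorch idiom; the attribute names follow the earlier sketch and are assumptions):
```python
import torch

# Example: additionally train only the second submodel; freeze the first submodel's parameters.
for p in model.sub1.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3)
```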
  • the processor 110 of the present disclosure may generate a second computational model 300 that shares at least a portion of the first computational model 200.
  • Here, sharing may mean using the same (or similar) neural network structure (as non-limiting examples, the number of layers, the number of nodes, etc.), weights, and the like.
  • For example, the processor 110 may create the second computational model 300 by sharing the structure and weights of the first reduction layer 210 and the second recovery layer 240 of the first computational model 200.
  • In this case, the second computational model 300 may have a structure in which the output of the third reduction layer 310, which shares the first reduction layer 210, is applied as the input of the third recovery layer 320, which shares the second recovery layer 240.
  • the present disclosure is not limited thereto, and the processor 110 may generate a second computational model that shares various parts of the first computational model 200.
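  • One way to realize a second computational model that shares the first reduction layer and the second recovery layer is to reuse the very same modules, so that structure and weights are shared by reference; the names below follow the earlier sketch and are not the disclosure's reference numerals 310 and 320:
```python
import torch.nn as nn

class SecondComputationalModel(nn.Module):
    """Shares the first submodel's reduction layer and the second submodel's recovery layer."""
    def __init__(self, first_model):
        super().__init__()
        self.reduce = first_model.sub1.reduce    # shared first reduction layer (same weights)
        self.recover = first_model.sub2.recover  # shared second recovery layer (same weights)

    def forward(self, first_data):
        return self.recover(self.reduce(first_data))  # third data

model2 = SecondComputationalModel(model)
third_data = model2(x)   # fewer layers than the first computational model, so inference is faster
```
  • Because the layers are reused by reference in this sketch, training the first computational model also updates the weights used by the second computational model.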
  • The processor 110 of the present disclosure may generate the third data 500 based on the first data using the second computational model.
  • the processor 110 may generate test result information based on the first data and the third data using any algorithm.
  • For example, the processor 110 may perform a test by comparing and scoring the first data, which is the input data of the second computational model, and the third data 500, which is the output data.
  • the test result information may be information for data classification, and specifically, the test result information may be information for determining whether the first data is anomaly.
  • the test result information may be information for evaluating the compression efficiency of the input data.
  • The information processing method of the present disclosure may improve test execution speed by using a second computational model that shares at least a portion of the trained first computational model. That is, because the second computational model includes fewer submodels (or layers) than the first computational model, computation time is reduced and test execution speed can be improved while test accuracy is maintained.
  • the processor 110 of the present disclosure may generate test result information based on the first data and the second data of the first calculation model.
  • For example, the processor 110 may perform a test using the first data, which is the input data of the first computational model, and the second data, which is the output data.
  • the test result information may be information for data classification, and specifically, the test result information may be information for determining whether the first data is anomaly.
  • the test result information may be information for evaluating the compression efficiency of the input data.
  • the present invention is not limited thereto, and the test result information may include various information.
  • the processor 110 of the present disclosure may generate test result information based on at least two of the first data, the second data, the third data, and the plurality of sub data.
  • the computational model and submodel of the present disclosure may each be trained to produce output data identical to the input data.
  • the output data of the computational model of the present disclosure and the output data of the submodel can be used to perform the test.
  • the processor 110 may generate a test result based on the first data and output data (eg, the first sub data) of the sub model.
  • the processor 110 may generate a test result based on output data of at least two submodels.
  • For example, the processor 110 may generate test result information based on the first sub data of the first submodel and the second sub data of the second submodel.
  • The processor 110 of the present disclosure may perform a test by generating a plurality of pieces of test result information based on at least two of the first data, the second data, the third data, and the plurality of sub data.
  • Since the submodel of the present disclosure can operate as an independent prediction model, the sub data generated by the submodel can be used for testing on the same footing as the data generated by the computational model.
  • For example, the processor 110 may determine whether the first data is anomalous using test result information generated by comparing the first data with the first sub data generated by the first submodel and test result information generated by comparing the first data with the second data generated by the first computational model.
  • The first data may be determined to be anomalous when the proportion of test result information indicating an anomaly is equal to or greater than a predetermined ratio (for example, 75%, that is, when three of four pieces of test result information indicate an anomaly).
  • any one of the plurality of test result information may be weighted. For example, a test result generated by comparing the first data and the second data may be weighted twice as much as a test result generated by comparing the first data and the sub data.
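  • The multi-test decision described above (several comparisons, possibly weighted, against a predetermined ratio such as 75%) can be sketched as follows, continuing the earlier sketches; the error threshold and the weights are hypothetical values used only for illustration:
```python
import torch
import torch.nn.functional as F

def is_anomalous(first_data, reconstructions, weights, error_threshold=0.1, ratio=0.75):
    """Weighted vote over per-reconstruction test results (threshold values are illustrative)."""
    votes, total = 0.0, sum(weights)
    for recon, w in zip(reconstructions, weights):
        error = F.mse_loss(recon, first_data).item()  # relationship (error) between the data
        if error > error_threshold:                   # this test result indicates an anomaly
            votes += w
    return votes / total >= ratio

second_data, first_sub_data = model(x)
third_data = model2(x)
# Weight the first-data/second-data comparison twice as much as the other comparisons.
print(is_anomalous(x, [second_data, first_sub_data, third_data], weights=[2.0, 1.0, 1.0]))
```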
  • the computational model of the present disclosure may have an effect of constructing an ensemble model by combining sub-models included in the computational model into a prediction model.
  • the present invention is not limited thereto, and the processor 110 of the present disclosure may perform a test in various ways.
  • the processor 110 may generate test result information by using a relationship between various data.
  • the relationship between the data may mean an error (or similarity) between the data.
  • the processor 110 may generate an error value (or similarity value) by comparing features of a plurality of data (eg, images).
  • For example, the processor 110 of the present disclosure may generate test result information indicating that the first data is anomalous when the error value (or similarity value) does not satisfy a predetermined criterion (e.g., 90%).
  • The processor 110 may use a function that compares a mathematical distance between two features, or may use a neural network configured to determine the similarity of the two features.
  • the processor 110 may generate an error value (or similarity value) by comparing the features extracted from the data with each other using an arbitrary data comparison algorithm, and the present disclosure is not limited to the comparison algorithm.
  • The processor 110 may use a deep neural network structure for this purpose, for example, a deconvolutional neural network structure; however, the present disclosure is not limited thereto.
  • FIG. 4 is a flowchart for describing steps of an information processing method for training a first connection model according to an embodiment of the present disclosure.
  • an information processing method for training a first connection model may be implemented by the following steps.
  • An information processing method performed by one or more processors may include: receiving, by a processor, first data (s410); generating, by a processor, second data based on the first data using a first computational model including a plurality of submodels connected in series (s420), wherein each submodel includes at least one layer for changing dimensions; generating, by a processor, a cost function based on at least two of the first data, the second data, and the plurality of sub data generated by the plurality of submodels (s430); and training, by a processor, at least one of the plurality of submodels based on the cost function (s440).
  • Generating the second data may include: reducing, by a processor, the dimension of the first data using a first reduction layer included in a first submodel and generating first sub data by recovering the reduced dimension of the first data using a first recovery layer included in the first submodel; and reducing, by a processor, the dimension of the first sub data using a second reduction layer included in a second submodel and generating the second data by recovering the reduced dimension of the first sub data using a second recovery layer included in the second submodel.
  • The cost function may include a term based on the relationship between the first data and the second data and a term based on the relationship between the first data and the first sub data.
  • the first data may comprise: training data consisting of normal data.
  • the first data may comprise: image data.
  • At least one of the plurality of submodels may comprise neural networks.
  • The neural network may include an auto encoder or a generative adversarial network (GAN).
  • The information processing method may further include generating, by a processor, a second computational model that shares at least a portion of the first computational model.
  • The second computational model may share at least a portion of the first reduction layer and the second recovery layer.
  • FIG. 5 is a flowchart for describing steps of an information processing method for performing a test using a first connection model according to an embodiment of the present disclosure.
  • an information processing method for performing a test using a first connection model may be implemented by the following steps.
  • An information processing method performed by one or more processors may include: receiving, by a processor, first data (s510); generating, by a processor, second data based on the first data using a first computational model including a plurality of submodels connected in series (s520), wherein each submodel includes at least one layer for changing dimensions; and generating, by a processor, test result information based on the first data and the second data (s530).
  • FIG. 6 is a flowchart for explaining steps of an information processing method for performing a test using a second connection model according to an embodiment of the present disclosure.
  • an information processing method for performing a test using a second connection model may be implemented by the following steps.
  • An information processing method performed by one or more processors may include: receiving, by a processor, first data (s610); generating, by a processor, third data based on the first data using a second computational model that shares at least a portion of a first computational model (s620); and generating, by a processor, test result information based on the first data and the third data (s630).
  • FIG. 7 shows a brief, general schematic diagram of an example computing environment in which embodiments of the present disclosure may be implemented.
  • program modules include routines, programs, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
  • Those skilled in the art will appreciate that the methods of the present disclosure may be practiced with other computer system configurations, including uniprocessor or multiprocessor computer systems, minicomputers, and mainframe computers, as well as personal computers, handheld computing devices, microprocessor-based or programmable consumer electronics, and the like, each of which may operate in conjunction with one or more associated devices.
  • Computers typically include a variety of computer readable media. Any medium that can be accessed by a computer can be a computer readable medium, including volatile and nonvolatile media, transitory and non-transitory media, and removable and non-removable media.
  • computer readable media may comprise computer readable storage media and computer readable transmission media.
  • Computer readable storage media include volatile and nonvolatile, transitory and non-transitory, removable and non-removable media implemented in any method or technology for storing information such as computer readable instructions, data structures, program modules, or other data.
  • Computer storage media may include RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital video disks (DVD) or other optical disk storage devices, magnetic cassettes, magnetic tape, magnetic disk storage devices or other magnetic storage devices, or any other medium that can be accessed by a computer and used to store the desired information.
  • Computer readable transmission media typically embody computer readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include all information delivery media.
  • modulated data signal means a signal that has one or more of its characteristics set or changed to encode information in the signal.
  • computer readable transmission media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, or other wireless media. Combinations of any of the above should also be included within the scope of computer readable transmission media.
  • System bus 1008 couples system components, including but not limited to system memory 1006, to processing device 1004.
  • Processing apparatus 1004 may be any of a variety of commercial processors. Dual processor and other multiprocessor architectures may also be used as the processing unit 1004.
  • System bus 1008 may be any of several types of bus structures that may be further interconnected to a memory bus, a peripheral bus, and a local bus using any of a variety of commercial bus architectures.
  • System memory 1006 includes read only memory (ROM) 1010 and random access memory (RAM) 1012.
  • A basic input/output system (BIOS) may be stored in nonvolatile memory such as the ROM 1010, and the BIOS contains the basic routines that help to transfer information between elements within the computer 1002, such as during startup.
  • RAM 1012 may also include high speed RAM, such as static RAM for caching data.
  • Computer 1002 also includes an internal hard disk drive (HDD) 1014 (e.g., EIDE, SATA), which may also be configured for external use within a suitable chassis (not shown), a magnetic floppy disk drive (FDD) 1016 (e.g., for reading from or writing to a removable diskette 1018), and an optical disk drive 1020 (e.g., for reading from a CD-ROM disk 1022 or reading from or writing to other high capacity optical media such as a DVD).
  • the hard disk drive 1014, the magnetic disk drive 1016, and the optical disk drive 1020 are connected to the system bus 1008 by the hard disk drive interface 1024, the magnetic disk drive interface 1026, and the optical drive interface 1028, respectively.
  • the interface 1024 for external drive implementation includes at least one or both of Universal Serial Bus (USB) and IEEE 1394 interface technologies.
  • drives and their associated computer readable media provide nonvolatile storage of data, data structures, computer executable instructions, and the like.
  • The drives and media accommodate the storage of any data in a suitable digital format.
  • Although the description of computer readable media above refers to HDDs, removable magnetic disks, and removable optical media such as CDs or DVDs, those skilled in the art will appreciate that other types of computer readable media, such as zip drives, magnetic cassettes, flash memory cards, and cartridges, may also be used in the exemplary operating environment, and that any such media may contain computer executable instructions for performing the methods of the present disclosure.
  • Program modules may be stored in the drive and RAM 1012, including operating system 1030, one or more application programs 1032, other program modules 1034, and program data 1036. All or a portion of the operating system, applications, modules and / or data may also be cached in RAM 1012. It will be appreciated that the present disclosure may be implemented in various commercially available operating systems or combinations of operating systems.
  • a user may enter commands and information into the computer 1002 via one or more wired / wireless input devices, such as a keyboard 1038 and a mouse 1040.
  • Other input devices may include a microphone, IR remote control, joystick, game pad, stylus pen, touch screen, and the like.
  • These and other input devices are often connected to the processing unit 1004 through an input device interface 1042 that is coupled to the system bus 1008, but may be connected by other interfaces, such as a parallel port, an IEEE 1394 serial port, a game port, a USB port, an IR interface, and the like.
  • a monitor 1044 or other type of display device is also connected to the system bus 1008 via an interface such as a video adapter 1046.
  • In addition to the monitor 1044, the computer generally includes other peripheral output devices (not shown) such as speakers and printers.
  • Computer 1002 may operate in a networked environment using logical connections to one or more remote computers, such as remote computer (s) 1048, via wired and / or wireless communications.
  • Remote computer(s) 1048 may be a workstation, a computing device, a router, a personal computer, a portable computer, a microprocessor-based entertainment device, a peer device, or another common network node, and typically includes many or all of the components described above with respect to the computer 1002, although, for simplicity, only memory storage device 1050 is shown.
  • the logical connections shown include wired / wireless connections to a local area network (LAN) 1052 and / or a larger network, such as a wide area network (WAN) 1054.
  • LAN and WAN networking environments are commonplace in offices and businesses, facilitating enterprise-wide computer networks such as intranets, all of which may be connected to worldwide computer networks, such as the Internet.
  • When used in a LAN networking environment, the computer 1002 is connected to the local network 1052 via a wired and/or wireless communication network interface or adapter 1056.
  • The adapter 1056 may facilitate wired or wireless communication to the LAN 1052, which may also include a wireless access point installed therein for communicating with the wireless adapter 1056.
  • When used in a WAN networking environment, the computer 1002 may include a modem 1058, may be connected to a communications computing device on the WAN 1054, or may have other means of establishing communications over the WAN 1054, such as via the Internet.
  • the modem 1058 which may be an internal or external and wired or wireless device, is connected to the system bus 1008 through the serial port interface 1042.
  • program modules or portions thereof described with respect to computer 1002 may be stored in remote memory / storage device 1050. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers can be used.
  • The computer 1002 may communicate with any wireless device or entity disposed and operating in wireless communication, such as a printer, a scanner, a desktop and/or portable computer, a portable data assistant, a communications satellite, any equipment or location associated with a wirelessly detectable tag, and a telephone. This includes at least Wi-Fi and Bluetooth wireless technologies. Thus, the communication may have a predefined structure as in a conventional network, or may simply be an ad hoc communication between at least two devices.
  • Wi-Fi (Wireless Fidelity) is a wireless technology, like a cell phone, that allows a device, for example a computer, to transmit and receive data indoors and outdoors, that is, anywhere within the range of a base station.
  • Wi-Fi networks use a wireless technology called IEEE 802.11 (a, b, g, etc.) to provide secure, reliable, high-speed wireless connections.
  • Wi-Fi may be used to connect computers to each other, to the Internet, and to a wired network (using IEEE 802.3 or Ethernet).
  • Wi-Fi networks can operate in the unlicensed 2.4 and 5 GHz radio bands, for example, at an 11 Mbps (802.11b) or 54 Mbps (802.11a) data rate, or in products that include both bands (dual band).
  • Data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
  • the various embodiments presented herein may be embodied in a method, apparatus, or article of manufacture using standard programming and / or engineering techniques.
  • The term "article of manufacture" includes a computer program, a carrier, or media accessible from any computer-readable device.
  • For example, computer-readable media may include, but are not limited to, magnetic storage devices (e.g., hard disks, floppy disks, magnetic strips, etc.), optical discs (e.g., CDs, DVDs, etc.), smart cards, and flash memory devices (e.g., EEPROM, cards, sticks, key drives, etc.).
  • various storage media presented herein include one or more devices and / or other machine-readable media for storing information.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the present disclosure for solving the technical problem relates to an information processing method performed by at least one processor. The information processing method may comprise the steps of: receiving first data by a processor; generating, by the processor, second data based on the first data by means of a first computational model comprising multiple submodels connected in series, the submodels comprising at least one dimension-changing layer; and generating, by the processor, a cost function based on at least two data elements among the first data, the second data, and multiple sub-data elements generated by the multiple submodels.
PCT/KR2019/095019 2018-06-25 2019-05-23 Procédé d'apprentissage pour réseau neuronal artificiel WO2020005049A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020180072497A KR20200000541A (ko) 2018-06-25 2018-06-25 인공 신경망의 학습 방법
KR10-2018-0072497 2018-06-25

Publications (1)

Publication Number Publication Date
WO2020005049A1 true WO2020005049A1 (fr) 2020-01-02

Family

ID=68987272

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2019/095019 WO2020005049A1 (fr) 2018-06-25 2019-05-23 Procédé d'apprentissage pour réseau neuronal artificiel

Country Status (2)

Country Link
KR (1) KR20200000541A (fr)
WO (1) WO2020005049A1 (fr)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102207489B1 (ko) * 2020-01-14 2021-01-26 엘지전자 주식회사 데이터 생성방법
KR102349810B1 (ko) * 2020-08-07 2022-01-10 금오공과대학교 산학협력단 오토 디코더를 활용한 고속의 rm 부호 디코딩 방법
KR102653642B1 (ko) * 2020-10-27 2024-04-01 주식회사 이너아워 상품 추천을 위한 방법, 컴퓨터 프로그램 및 서버
WO2022164133A1 (fr) * 2021-01-26 2022-08-04 주식회사 뷰노 Méthode d'évaluation de lésion dans une image médicale
KR102476300B1 (ko) 2021-08-11 2022-12-13 기초과학연구원 기계 학습을 이용한 이상 탐지를 위한 장치 및 방법
KR102476499B1 (ko) * 2021-11-18 2022-12-12 주식회사 티맥스에이아이 유사 컨텐츠 제공 기법
KR102406287B1 (ko) * 2021-12-31 2022-06-08 주식회사 에스아이에이 협력 학습을 이용한 초해상도 이미징 방법

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8595204B2 (en) 2007-03-05 2013-11-26 Microsoft Corporation Spam score propagation for web spam detection
US9141916B1 (en) 2012-06-29 2015-09-22 Google Inc. Using embedding functions with a deep network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20180054709A (ko) * 2015-09-17 2018-05-24 퀄컴 인코포레이티드 무선 네트워크에서의 크라우드 소싱된 사진의 관리
KR20180068292A (ko) * 2016-12-13 2018-06-21 엑시스 에이비 뉴럴 네트워크 트레이닝을 위한 방법, 컴퓨터 제품 및 디바이스
KR101819857B1 (ko) * 2017-04-27 2018-01-17 건국대학교 산학협력단 인공 신경망의 학습 성능을 향상시키기 위한 구형화 패널티 방법 및 장치
KR20180004046A (ko) * 2017-12-21 2018-01-10 동우 화인켐 주식회사 검사 장치 및 방법

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Autoencoder", 30 July 2016 (2016-07-30), Retrieved from the Internet <URL:https://untitledtblog.tistory.com/92> *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112949824A (zh) * 2021-02-07 2021-06-11 北京淇瑀信息科技有限公司 一种基于神经网络的多输出多任务特征评估方法、装置和电子设备

Also Published As

Publication number Publication date
KR20200000541A (ko) 2020-01-03

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19827478

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19827478

Country of ref document: EP

Kind code of ref document: A1