US20100169256A1 - Separate Learning System and Method Using Two-Layered Neural Network Having Target Values for Hidden Nodes - Google Patents


Publication number
US20100169256A1
Authority
US
United States
Prior art keywords
learning
hidden
node
connection weight
hidden node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US12/722,861
Inventor
Ju Hong Lee
Bum Ghi Choi
Tae Su Park
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inha Industry Partnership Institute
Original Assignee
Inha Industry Partnership Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inha Industry Partnership Institute
Priority to US12/722,861
Assigned to INHA-INDUSTRY PARTNERSHIP INSTITUTE. Assignment of assignors interest (see document for details). Assignors: CHOI, BUM GHI; LEE, JU HONG; PARK, TAE SU
Publication of US20100169256A1
Legal status: Abandoned

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/18Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/20Education

Definitions

  • the present invention relates, in general, to a separate learning system and method using a two-layered neural network having target values for hidden nodes and, more particularly, to a separate learning system and method using a two-layered neural network having target values for hidden nodes, which set the target values for hidden nodes during separate learning, so that a computational process is separated into an upper connection and a lower connection without changing a network structure and a weight updating rule, thus reducing computational work.
  • a neural network system has various uses and application fields.
  • a neural network system can be applied and utilized in various fields such as customer management and electronic commerce in data mining, network management, speech recognition, and financial services.
  • Amazon.com and NCOF use a neural network system to manage customers who purchase books, and to support searches for products on electronic commerce sites.
  • a neural network system is used to analyze the shape of charts, and to predict tendencies of the price index of stocks.
  • Visa International and Mellon Bank in the United States use a neural network system in a general system for detecting the risk of transactions and in a method of picking out persons who are a high credit risk.
  • a neural network system is used to determine conditions such as optimal temperature, pressure, or chemical materials, in a process of manufacturing fluorescent lamps, and is also utilized to detect inverse functions occurring during a manufacturing process in MIT and a simulation process in productivity laboratories.
  • Learning in a neural network is a process of setting weights to obtain a desired value at an output node that outputs results corresponding to some input.
  • a representative learning method used in a neural network is a backpropagation learning method.
  • a backpropagation learning method which is a learning method used in multi-layer and feedforward neural networks, denotes a supervised learning technique.
  • input data and desired output data are required.
  • a backpropagation algorithm has convergence problems, such as local minima or plateaus.
  • the plateaus result in the problem of very slow convergence, and the local minima result in a problem in which gradients in all directions equal zero, thus causing the learning process unexpectedly to stop.
  • Quick-propagation (QP)
  • Resilient propagation (RPROP)
  • a backpropagation learning method is problematic in that it concentrates only on resolving the imbalance between convergence speed and convergence stability, that is, on the problems that convergence is slow and that learning stalls at a local minimum so that convergence fails. As a result, the backpropagation learning method is not flexible for arbitrary initial weights, cannot guarantee convergence over a wide range of parameters, and cannot solve the problem of local minima and plateaus.
  • an object of the present invention is to provide a separate learning system and method, which set the target values for hidden nodes during separate learning without changing a network structure and a weight updating rule.
  • Another object of the present invention is to provide a separate learning system and method, which separate a calculation process into an upper connection and a lower connection, thus reducing computational work.
  • a further object of the present invention is to provide a separate learning system and method, which require only a small storage space, realize high convergence speed, and guarantee convergence stability somewhat, thus solving a convergence problem.
  • Yet another object of the present invention is to provide a separate learning system and method, which can more rapidly and stably escape from local minima and plateaus.
  • the present invention provides a separate learning system using a two-layered neural network having target values for hidden nodes, comprising an input layer for receiving training data from a user, and including at least one input node; a hidden layer including at least one hidden node; a first connection weight unit for connecting the input layer to the hidden layer, and changing a weight between the input node and the hidden node, thus performing learning; an output layer for outputting training data; a second connection weight unit for connecting the hidden layer to the output layer, changing a weight between the output and the hidden node, and calculating a target value for the hidden node, based on a current error for the output node, thus performing learning; and a control unit for stopping learning, fixing the second connection weight unit, turning a learning direction to the first connection weight unit, and causing learning to be repeatedly performed between the input node and the hidden node if a learning speed decreases or a cost function increases due to local minima or plateaus when the first connection weight unit is fixed and learning is performed using only the
  • the first connection weight unit may comprise a reception module for receiving the target value for the hidden node and an error value for the hidden node from the second connection weight unit; a weight change module for changing the weight between the input node and the hidden node; and a first comparison determination module for comparing the target value with the current value for the hidden node, received through the reception module, thus determining whether learning has reached the target value for the hidden node.
  • the weight change module may adjust the weight using a gradient descent method.
  • the determination module may select a single hidden node when learning is performed.
  • control unit may turn the learning direction of the first connection weight unit, maintain the learning direction until learning has reached the target value for the hidden node, and thereafter return a learning direction to the second connection weight unit, thus repeatedly performing learning until learning reaches a global minimum.
  • the present invention provides a separate learning method using a two-layered neural network having target values for hidden nodes, comprising the steps of (a) performing learning in a second connection weight unit using training data; (b) determining whether learning has converged when a learning speed decreases due to local minima and plateaus, and stopping the learning if it is determined that learning has converged, otherwise turning a learning direction to a first connection weight unit and allowing learning to be performed between all of the input nodes and at least one hidden node; (c) determining whether learning has reached a target value for the hidden node set by the first connection weight unit; (d) turning a learning direction to the second connection weight unit and performing learning between the hidden node and at least one output node if it is determined that learning has not reached the target value for the hidden node as a result of the determination; and (e) causing learning, performed in the second connection weight unit, to reach a global minimum.
  • the separate learning method may further comprise the step of (a-1) receiving training data through the input layer to train a neural network before step (a).
  • step (b) may further comprise the steps of (b-1) selecting an output node having a largest error value with respect to the hidden node if it is determined that learning has not converged; (b-2) calculating the target value for the hidden node so that learning can reach a global minimum; and (b-3) transmitting the error value for the hidden node and the target value for the hidden node to the first connection weight unit.
  • FIG. 1A is a conceptual view of a two-layered neural network according to an embodiment of the present invention.
  • FIG. 1B is a diagram showing the construction of a separate learning system using a two-layered neural network having target values for hidden nodes according to an embodiment of the present invention
  • FIG. 2 is a diagram showing a method of predicting a gradient relative to a target value for a hidden node according to an embodiment of the present invention
  • FIG. 3 is a diagram showing a method of detouring around obstacles, such as local minima and plateaus, according to an embodiment of the present invention
  • FIG. 4A is a flowchart of a separate learning method using a two-layered neural network having target values for hidden nodes according to an embodiment of the present invention
  • FIG. 4B is a detailed flowchart showing the step of generating a target value for a hidden node according to an embodiment of the present invention
  • FIGS. 5A to 5C are graphs showing the comparison of the performance of separate learning and backpropagation learning with respect to an increase in the number of hidden nodes according to a first experimental example of the present invention
  • FIGS. 6A to 6C are graphs showing the comparison of the performance of separate learning and backpropagation learning with respect to an increase in learning rate according to the first experimental example of the present invention
  • FIGS. 7A to 7C are graphs showing the comparison of the performance of separate learning and backpropagation learning with respect to an increase in momentum according to the first experimental example of the present invention
  • FIGS. 8A to 8C are graphs showing the comparison of the performance of separate learning and backpropagation learning with respect to an increase in learning rate according to a second experimental example of the present invention.
  • FIGS. 9A to 9C are graphs showing the comparison of the performance of separate learning and backpropagation learning with respect to an increase in momentum according to the second experimental example of the present invention.
  • FIGS. 11A to 11C are graphs showing the comparison of the performance of separate learning and backpropagation learning with respect to an increase in momentum according to the third experimental example of the present invention.
  • a learning system 100 performs a learning function by learning weights through training data and making generalizations about the characteristics of training data, as shown in FIG. 1A, and includes an input layer 110, a first connection weight unit 120, a hidden layer 130, a second connection weight unit 140, an output layer 150, and a control unit 160.
  • the input layer 110 functions to receive a plurality of pieces of training data from a user, and includes input nodes X_n (x_1, x_2, ..., x_n).
  • the first connection weight unit 120 functions to connect the input layer 110 to the hidden layer 130 through input-to-hidden connections, and to change weights between the input nodes and hidden nodes, included in the hidden layer 130 , thus performing learning.
  • the first connection weight unit 120 includes a reception module 121 , a weight change module 122 , and a first “comparison-determination” module 123 .
  • the reception module 121 functions to receive a target value and an error value for a corresponding hidden node from the second connection weight unit 140 .
  • the weight change module 122 functions to change the weights between the input nodes and the hidden nodes.
  • the weight change module 122 can perform learning by adjusting the weights using a gradient descent method.
  • the weights of the first connection weight unit 120 are adjusted so as to minimize the sum of squares of errors between actual output values, obtained from all input nodes for a network in which input/output functions are constructed using linear units, and target output values.
  • a cost function thereof is expressed by the following Equation [1],
  • d_j is a target value for a j-th output node
  • S is an activation function
  • x_i is an i-th input
  • w_ih* is a weight directed from an i-th input node to an h-th hidden node
  • z_h is the output value of the h-th hidden node
  • w_*hj is a weight directed from the h-th hidden node to the j-th output node
  • y_j is the output value of the j-th output node.
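Equation [1] itself did not survive in this text. A form consistent with the symbol definitions above and with the stated goal of minimizing the sum of squared errors would be the following. The factor 1/2, the omission of the asterisk marks on the weights, and the single-pattern form (no sum over training patterns) are assumptions, not taken from the source.

```latex
E = \frac{1}{2}\sum_{j}\left(d_j - y_j\right)^2,\qquad
y_j = S\!\left(\sum_{h} w_{hj}\, z_h\right),\qquad
z_h = S\!\left(\sum_{i} w_{ih}\, x_i\right)
```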
  • the cost function has different function values because of the values for hidden nodes.
  • When the cost function increases, learning between the hidden nodes and the output nodes is stopped, and learning between the input nodes and the hidden nodes is performed.
  • For reference, a gradient descent rule for the connection of the hidden layer to the output layer is expressed by the following Equation [2].
  • For the first connection weight of the first connection weight unit 120, corresponding to the connection from the input layer to the hidden layer, partial differentiation with respect to w_ih* using Equation [2] is expressed by the following Equation [3]
  • the first “comparison-determination” module 123 functions to compare the actual output value of the hidden node with the target value and error value for the hidden node, received through the reception module 121 , thus determining whether learning reaches the target value for the hidden node.
  • the first connection weight in this embodiment is indicated by w_ih*, and denotes the connection from the input layer to the hidden layer.
  • the second connection weight unit 140 functions to connect the hidden layer 130 to the output layer 150 through hidden-to-output connections, process outputs on the output nodes through respective hidden nodes, and calculate the target value for the hidden node, based on the current error of the output nodes, thus allowing learning to be performed.
  • the second connection weight unit 140 includes a second comparison determination module 141 , an error calculation module 142 , a hidden node target value calculation module 143 , a transmission module 144 , a selection module 145 , and a determination module 146 .
  • the second “comparison-determination” module 141 determines whether traffic congestion, such as a delay in learning time or convergence failure, has occurred in a learning process, and turns the learning direction to the first connection weight unit 120, thus performing learning between the input nodes and the hidden nodes until learning reaches the set target value for the hidden node.
  • an expected error associated with the error of z_i for an output node y_i is expressed by the following Equation [4].
  • the expected error ⁇ h ⁇ z h is obtained by multiplying the function
  • the hidden node target value calculation module 143 functions to calculate the target value for the hidden node so that learning can reach a global minimum.
  • the hidden node target value calculation module 143 functions to calculate the target value for the hidden node ⁇ h , based on the current error value for the output node. That is, the error for the hidden node is calculated using a gradient corresponding to the direction of the hidden node and a selected output error, so that the target value for the hidden node is calculated.
  • a target value for a corresponding hidden node denotes the value of a hidden node which causes a selected output to approximate its ideal value as closely as possible.
  • a suitable approximate value corresponding to the target value for the hidden node is set.
  • the cost function of the hidden node can be given by the following Equation [6] using the target value for the hidden node ⁇ h calculated in Equation [4].
  • the selection module 145 functions to select the output node having the largest error with respect to a hidden node.
  • the determination module 146 functions to determine which hidden node is to be selected so as to perform learning in the first connection weight unit 120 .
  • the output layer 150 functions to output training data that has been completely learned, and includes output nodes.
  • the control unit 160 forcibly stops learning, fixes the second connection weight unit 140, and turns the learning direction to the first connection weight unit 120, thus repeatedly performing learning between the input nodes and the hidden nodes. Accordingly, learning is repeatedly performed until the learning process converges to the set target value for the hidden node.
  • After the learning method turns the learning direction to the first connection weight unit 120 and maintains that path until learning reaches the set target value for the hidden node, the learning method returns to the second connection weight unit 140, thus repeatedly performing learning until learning reaches a global minimum.
  • the separate learning method travels a longer distance than does a backpropagation learning method, but can also travel at higher speed, and furthermore, convergence speed is also high.
  • control unit 160 receives training data through the input layer 110 to train the neural network at step S 2 .
  • control unit initializes the input layer, the hidden layer and the output layer, thus improving convergence speed using the target value for the hidden node.
  • control unit 160 performs learning in the second connection weight unit 140 using the received training data at step S 4 .
  • control unit determines whether learning has converged when learning speed decreases due to local minima and plateaus at step S 6 .
  • control unit 160 turns the learning direction to the first connection weight unit 120 at step S 8 , thus allowing learning to be performed between the input nodes and the hidden nodes.
  • the second connection weight unit is fixed and the learning direction turns to the first connection weight unit, so that learning is repeatedly performed.
  • the control unit 160 determines whether learning has reached the set target value for the hidden node at step S 10 . If it is determined that the learning has reached the target value for the hidden node, the control unit turns the learning direction to the second connection weight unit 140 , and then continuously performs learning between the hidden nodes and the output nodes at step S 12 .
  • control unit 160 determines whether learning performed in the second connection weight unit 140 has reached a global minimum at step S 14 . If it is determined that learning has reached a global minimum, learning stops.
  • control unit 160 returns to step S 4 .
  • control unit 160 generates a target value for the hidden node, thus causing learning to converge at step S 16 .
  • step S 16 is described.
  • the control unit 160 selects the output node having the largest error value.
  • control unit 160 calculates the target value for the hidden node using Equation [5] so that learning can reach a global minimum at step S 16 b.
  • control unit 160 transmits the generated error value for the hidden node and the generated target value for the hidden node to the first connection weight unit 120 at step S 16 c.
  • control unit causes learning to reach the global minimum using the error value and the target value for the hidden node, received from the second connection weight unit.
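The flow of steps S4 to S16 can be sketched as follows. This is a minimal illustration, not the patented implementation: the sigmoid activation, the omission of bias terms, the tolerance-based plateau test, and in particular the hidden-target rule (a step from the current hidden value against the gradient of the selected output error) are assumptions, because Equations [2] to [6] are not reproduced in this text. All function names are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(X, W1, W2):
    Z = sigmoid(X @ W1)   # hidden node outputs z_h
    Y = sigmoid(Z @ W2)   # output node outputs y_j
    return Z, Y

def cost(D, Y):
    return 0.5 * np.sum((D - Y) ** 2)

def train_upper(X, D, W1, W2, lr, tol, max_steps):
    """Hidden-to-output learning with the input-to-hidden weights fixed.
    Stops when the cost increases or its decrease falls below tol."""
    prev = cost(D, forward(X, W1, W2)[1])
    for _ in range(max_steps):
        Z, Y = forward(X, W1, W2)
        W2_new = W2 + lr * Z.T @ ((D - Y) * Y * (1 - Y))
        cur = cost(D, forward(X, W1, W2_new)[1])
        if cur > prev:          # cost increased: keep the old weights
            break
        W2 = W2_new
        if prev - cur < tol:    # learning speed dropped (assumed plateau test)
            break
        prev = cur
    return W2

def hidden_targets(X, D, W1, W2, step):
    """Assumed target rule: pick the output node with the largest error and
    move each hidden value against the gradient of that node's error."""
    Z, Y = forward(X, W1, W2)
    j = int(np.argmax(np.sum((D - Y) ** 2, axis=0)))
    delta_j = (D[:, j] - Y[:, j]) * Y[:, j] * (1 - Y[:, j])
    return np.clip(Z + step * np.outer(delta_j, W2[:, j]), 0.0, 1.0)

def train_lower(X, T, W1, lr, tol, max_steps):
    """Input-to-hidden learning toward hidden targets T, with W2 fixed."""
    for _ in range(max_steps):
        Z = sigmoid(X @ W1)
        if np.mean((T - Z) ** 2) < tol:   # hidden targets reached
            break
        W1 = W1 + lr * X.T @ ((T - Z) * Z * (1 - Z))
    return W1

def separate_learning(X, D, n_hidden, lr=0.3, rounds=50, tol=1e-5, seed=0):
    rng = np.random.default_rng(seed)
    W1 = rng.uniform(-0.5, 0.5, (X.shape[1], n_hidden))
    W2 = rng.uniform(-0.5, 0.5, (n_hidden, D.shape[1]))
    history = []
    for _ in range(rounds):
        W2 = train_upper(X, D, W1, W2, lr, tol, max_steps=100)
        T = hidden_targets(X, D, W1, W2, step=1.0)
        W1 = train_lower(X, T, W1, lr, tol, max_steps=100)
        history.append(cost(D, forward(X, W1, W2)[1]))
    return W1, W2, history
```

Each phase touches only one weight matrix, which is the sense in which the computation is separated into an upper and a lower connection. Calling `separate_learning(X, D, n_hidden=3)` with an input matrix `X` (one pattern per row) and a target matrix `D` returns both weight matrices and the cost after each round.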
  • an input vector X, the number of input nodes n, and a probability variable a were input, each input pattern value was set to an arbitrary value between -1 and 1, the number of input patterns was set to 10 to 20, and the number of classes was set to 3 to 10.
  • the probability variable a was assigned a value equal to or greater than 3.0 depending on the number of input nodes, so that data was generated to cause a region of overlapping classes to be relatively large.
  • the measure of evaluating performance used the following equations,
  • C i is the closest class
  • ⁇ k i is the k-th dimensional value of the center ⁇ i of C i
  • C j is the next closest class
  • the experimental examples compare and evaluate the convergence rates, learning rates, learning times and mean square errors according to an increase in the number of hidden nodes, an increase in learning rate, and an increase in momentum, with respect to a separate learning method and a backpropagation learning method.
  • a limit time of about 50 seconds and a convergence error limit of 0.01 were set according to an experiment, so that only the cases where an error less than the limit is obtained within the limit time were included in the case of successful convergence rate.
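The success criterion above can be written as a small check. This is a sketch under stated assumptions: the function names are hypothetical, the time limit is treated as exactly 50 seconds although the text says "about 50 seconds", and the sample runs are made-up values used only to show the calculation.

```python
def converged(final_error, elapsed_seconds, error_limit=0.01, time_limit=50.0):
    """A run counts as successful only if the error is below the limit
    and the limit time has not been exceeded."""
    return final_error < error_limit and elapsed_seconds <= time_limit

def convergence_rate(runs):
    """Fraction of (final_error, elapsed_seconds) runs that converged."""
    return sum(converged(err, secs) for err, secs in runs) / len(runs)

# Hypothetical runs, not data from the experiments described here.
example_runs = [(0.005, 12.0), (0.02, 50.0), (0.009, 61.0), (0.001, 49.9)]
rate = convergence_rate(example_runs)
```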
  • the mean square error was set to indicate the mean value of minimum errors.
  • a first experimental example was conducted to compare the performance of backpropagation learning and separate learning with each other when the learning rate was fixed at 0.3, and the number of hidden nodes was increased from 3 to 20.
  • the first experimental example was conducted in such a way that the numbers of iterations for 30 data samples, arbitrarily selected for both separate learning and backpropagation learning, were summed, and the total learning time was divided by the total number of iterations, in order to determine the learning time per iteration (epoch).
  • for separate learning, the total iteration number was 58641 and the total learning time was 1476 seconds
  • for backpropagation learning, the total iteration number was 18205, and the total learning time was 1510 seconds.
  • the learning time per iteration for separate learning was 0.025 seconds, and the learning time per iteration for backpropagation learning was 0.083 seconds. Accordingly, it could be seen that the learning time per iteration for separate learning was about one third of that for backpropagation learning.
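The per-iteration figures can be checked from the totals reported above. The sketch assumes that the first pair of totals (58641 iterations, 1476 seconds) belongs to separate learning and the second pair to backpropagation learning, which is the assignment that reproduces the stated 0.025 and 0.083 seconds. The function name is hypothetical.

```python
def seconds_per_iteration(total_seconds, total_iterations):
    # Learning time per iteration = total learning time / total iterations.
    return total_seconds / total_iterations

separate = seconds_per_iteration(1476, 58641)   # assumed: separate learning
backprop = seconds_per_iteration(1510, 18205)   # assumed: backpropagation
ratio = backprop / separate

print(round(separate, 3), round(backprop, 3), round(ratio, 1))
# prints: 0.025 0.083 3.3
```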
  • a second experimental example is an experiment for determining whether a breast tumor is a benign tumor or a malignant tumor using Wisconsin breast cancer data and 9 variables.
  • the number of data samples was 457, and tumors were classified into two classes of benignancy and malignancy. Accordingly, an increase in the number of hidden nodes may decrease overall performance.
  • iris data is composed of four variables, that is, sepal length, sepal width, petal length, and petal width.
  • the total number of data samples was 150, and 50 data samples were provided for each class, the classes being set as setosa, versicolor and virginica, which are three types of iris.
  • backpropagation learning exhibited a smaller error than did separate learning. Further, the performance of backpropagation learning and separate learning, obtained when the learning rate was fixed at 0.1 and momentum was increased from 0.1 to 0.9, are described with reference to FIGS. 11A to 11C . In the case of convergence rate, separate learning exhibited better performance than did backpropagation learning regardless of an increase in momentum.
  • the proposed separate learning exhibited better performance than did backpropagation learning with respect to convergence rate and learning time, regardless of an increase in the number of hidden nodes, an increase in learning rate, and an increase in momentum.
  • the present invention provides a separate learning system and method, which set target values for hidden nodes in separate learning, so that a network structure and a weight updating rule are not changed.
  • the present invention is advantageous in that it divides a calculation process into upper and lower layers to perform learning, thus reducing computational work and consequently improving reliability.
  • the present invention is advantageous in that it requires storage space having only a small capacity, realizes fast convergence, and guarantees stability somewhat, thus increasing the probability of convergence.
  • the present invention is advantageous in that it sets target values for hidden nodes, thus realizing faster and more stable escape from local minima and plateaus.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Business, Economics & Management (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computational Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Educational Administration (AREA)
  • Primary Health Care (AREA)
  • Human Resources & Organizations (AREA)
  • Educational Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Business, Economics & Management (AREA)
  • Evolutionary Biology (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Strategic Management (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed herein is a separate learning system and method using a two-layered neural network having target values for hidden nodes. The separate learning system of the present invention includes an input layer for receiving training data from a user, and including at least one input node. A hidden layer includes at least one hidden node. A first connection weight unit connects the input layer to the hidden layer, and changes a weight between the input node and the hidden node. An output layer outputs training data that has been completely learned. The second connection weight unit connects the hidden layer to the output layer, changing a weight between the output and the hidden node, and calculates a target value for the hidden node, based on a current error for the output node. A control unit stops learning, fixes the second connection weight unit, turns a learning direction to the first connection weight unit, and causes learning to be repeatedly performed between the input node and the hidden node if a learning speed decreases or a cost function increases due to local minima or plateaus when the first connection weight unit is fixed and learning is performed using only the second connection weight unit, thus allowing learning to be repeatedly performed until learning converges to the target value for the hidden node.

Description

    REFERENCE TO RELATED APPLICATIONS
  • This is a divisional patent application of copending application Ser. No. 11/457,601, filed Jul. 14, 2006, entitled “SEPARATE LEARNING SYSTEM AND METHOD USING TWO-LAYERED NEURAL NETWORK HAVING TARGET VALUES FOR HIDDEN NODES”, which claims an invention which was disclosed in Korean (Republic of) application number 10-2006-0045193, filed May 19, 2006, entitled “Separately Trained System and Method Using Two-Layered Neural Network with Target Values of Hidden Nodes”. The aforementioned applications are hereby incorporated herein by reference.
  • BACKGROUND OF THE INVENTION
  • 1. Field of the Invention
  • The present invention relates, in general, to a separate learning system and method using a two-layered neural network having target values for hidden nodes and, more particularly, to a separate learning system and method using a two-layered neural network having target values for hidden nodes, which set the target values for hidden nodes during separate learning, so that a computational process is separated into an upper connection and a lower connection without changing a network structure and a weight updating rule, thus reducing computational work.
  • 2. Description of the Related Art
  • Generally, a neural network system has various uses and application fields. For example, a neural network system can be applied and utilized in various fields such as customer management and electronic commerce in data mining, network management, speech recognition, and financial services.
  • In detail, in data mining fields, Amazon.com and NCOF use a neural network system to manage customers who purchase books, and to support searches for products on electronic commerce sites. In financial service fields, a neural network system is used to analyze the shape of charts, and to predict tendencies of the price index of stocks. Visa International and Mellon Bank in the United States use a neural network system in a general system for detecting the risk of transactions and in a method of picking out persons who are a high credit risk. Further, in the modeling and scientific theory development fields, a neural network system is used to determine conditions such as optimal temperature, pressure, or chemical materials, in a process of manufacturing fluorescent lamps, and is also utilized to detect inverse functions occurring during a manufacturing process in MIT and a simulation process in productivity laboratories.
  • Learning in a neural network is a process of setting weights to obtain a desired value at an output node that outputs results corresponding to some input. A representative learning method used in a neural network is a backpropagation learning method.
  • That is, a backpropagation learning method, which is a learning method used in multi-layer and feedforward neural networks, denotes a supervised learning technique. In order to perform learning, input data and desired output data are required.
  • However, a backpropagation algorithm has convergence problems, such as local minima or plateaus. The plateaus result in the problem of very slow convergence, and the local minima result in a problem in which gradients in all directions equal zero, thus causing the learning process unexpectedly to stop.
  • Therefore, an arbitrary set of initial weights is problematic in that it cannot guarantee the convergence of network training. In order to solve the above problems, there are methods such as 1) dynamic change of learning rate and momentum, and 2) the selection of a better function for activation or error evaluation based on a new weight updating rule.
  • Meanwhile, Quick-propagation (QP) and resilient propagation (RPROP) can provide a fast convergence rate, but cannot guarantee convergence to a global minimum.
  • Further, a genetic algorithm, conjugate gradient and second-order methods, such as Newton's method, require a greater storage space than backpropagation (BP). Therefore, there is a problem in that imbalance exists between convergence stability, required to avoid learning traps in a wide range of parameters, and a convergence speed, or between overall performance and the requirement of a storage space.
  • In other words, a backpropagation learning method is problematic in that it is not flexible for arbitrary initial weights, cannot guarantee convergence in a wide range of parameters, and cannot solve the problem of local minima and plateaus. The reason for this is that it concentrates only on resolving the imbalance between convergence speed and convergence stability, that is, on the problem in which convergence speed is low and the learning process stalls at a local minimum so that convergence fails.
  • SUMMARY OF THE INVENTION
  • Accordingly, the present invention has been made keeping in mind the above problems occurring in the prior art, and an object of the present invention is to provide a separate learning system and method, which set the target values for hidden nodes during separate learning, without changing the network structure or the weight updating rule.
  • Another object of the present invention is to provide a separate learning system and method, which separate a calculation process into an upper connection and a lower connection, thus reducing computational work.
  • A further object of the present invention is to provide a separate learning system and method, which require only a small storage space, realize high convergence speed, and guarantee convergence stability to some degree, thus solving a convergence problem.
  • Yet another object of the present invention is to provide a separate learning system and method, which can more rapidly and stably escape from local minima and plateaus.
  • In order to accomplish the above objects, the present invention provides a separate learning system using a two-layered neural network having target values for hidden nodes, comprising an input layer for receiving training data from a user, and including at least one input node; a hidden layer including at least one hidden node; a first connection weight unit for connecting the input layer to the hidden layer, and changing a weight between the input node and the hidden node, thus performing learning; an output layer for outputting training data; a second connection weight unit for connecting the hidden layer to the output layer, changing a weight between the output node and the hidden node, and calculating a target value for the hidden node, based on a current error for the output node, thus performing learning; and a control unit for stopping learning, fixing the second connection weight unit, turning a learning direction to the first connection weight unit, and causing learning to be repeatedly performed between the input node and the hidden node if a learning speed decreases or a cost function increases due to local minima or plateaus when the first connection weight unit is fixed and learning is performed using only the second connection weight unit, thus allowing learning to be repeatedly performed until learning converges to the target value for the hidden node.
  • Preferably, the first connection weight unit may comprise a reception module for receiving the target value for the hidden node and an error value for the hidden node from the second connection weight unit; a weight change module for changing the weight between the input node and the hidden node; and a first comparison determination module for comparing the target value with the current value for the hidden node, received through the reception module, thus determining whether learning has reached the target value for the hidden node.
  • Preferably, the weight change module may adjust the weight using a gradient descent method.
  • Preferably, the second connection weight unit may comprise a second “comparison-determination” module for determining whether traffic congestion, such as a delay in learning time or a convergence failure, has occurred, and turning a learning direction to the first connection weight unit, thus allowing learning to be performed between the input node and the hidden node until learning has reached the target value for the hidden node; an error generation module for generating an error value for the hidden node according to the output node; a hidden node target value calculation module for calculating the target value for the hidden node; a transmission module for transmitting the error value for the hidden node and the target value for the hidden node to the first connection weight unit; a selection module for selecting an output node having a largest error value with respect to the hidden node; and a determination module for determining a number of hidden nodes to allow learning to be performed in the first connection weight unit.
  • Preferably, the determination module may select a single hidden node when learning is performed.
  • Preferably, the control unit may turn the learning direction to the first connection weight unit, maintain the learning direction until learning has reached the target value for the hidden node, and thereafter return the learning direction to the second connection weight unit, thus repeatedly performing learning until learning reaches a global minimum.
  • Further, the present invention provides a separate learning method using a two-layered neural network having target values for hidden nodes, comprising the steps of (a) performing learning in a second connection weight unit using training data; (b) determining whether learning has converged when a learning speed decreases due to local minima and plateaus, and stopping the learning if it is determined that learning has converged, otherwise turning a learning direction to a first connection weight unit and allowing learning to be performed between the input node and at least one hidden node; (c) determining whether learning has reached a target value for the hidden node set by the first connection weight unit; (d) turning a learning direction to the second connection weight unit and performing learning between the hidden node and at least one output node if it is determined that learning has reached the target value for the hidden node as a result of the determination; and (e) causing learning, performed in the second connection weight unit, to reach a global minimum.
  • Preferably, the separate learning method may further comprise the step of (a-1) receiving training data through the input layer to train a neural network before step (a).
  • Preferably, step (b) may further comprise the steps of (b-1) selecting an output node having a largest error value with respect to the hidden node if it is determined that learning has not converged; (b-2) calculating the target value for the hidden node so that learning can reach a global minimum; and (b-3) transmitting the error value for the hidden node and the target value for the hidden node to the first connection weight unit.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The above and other objects, features and other advantages of the present invention will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings, in which:
  • FIG. 1A is a conceptual view of a two-layered neural network according to an embodiment of the present invention;
  • FIG. 1B is a diagram showing the construction of a separate learning system using a two-layered neural network having target values for hidden nodes according to an embodiment of the present invention;
  • FIG. 2 is a diagram showing a method of predicting a gradient relative to a target value for a hidden node according to an embodiment of the present invention;
  • FIG. 3 is a diagram showing a method of detouring around obstacles, such as local minima and plateaus, according to an embodiment of the present invention;
  • FIG. 4A is a flowchart of a separate learning method using a two-layered neural network having target values for hidden nodes according to an embodiment of the present invention;
  • FIG. 4B is a detailed flowchart showing the step of generating a target value for a hidden node according to an embodiment of the present invention;
  • FIGS. 5A to 5C are graphs showing the comparison of the performance of separate learning and backpropagation learning with respect to an increase in the number of hidden nodes according to a first experimental example of the present invention;
  • FIGS. 6A to 6C are graphs showing the comparison of the performance of separate learning and backpropagation learning with respect to an increase in learning rate according to the first experimental example of the present invention;
  • FIGS. 7A to 7C are graphs showing the comparison of the performance of separate learning and backpropagation learning with respect to an increase in momentum according to the first experimental example of the present invention;
  • FIGS. 8A to 8C are graphs showing the comparison of the performance of separate learning and backpropagation learning with respect to an increase in learning rate according to a second experimental example of the present invention;
  • FIGS. 9A to 9C are graphs showing the comparison of the performance of separate learning and backpropagation learning with respect to an increase in momentum according to the second experimental example of the present invention;
  • FIGS. 10A to 10C are graphs showing the comparison of the performance of separate learning and backpropagation learning with respect to an increase in learning rate according to a third experimental example of the present invention; and
  • FIGS. 11A to 11C are graphs showing the comparison of the performance of separate learning and backpropagation learning with respect to an increase in momentum according to the third experimental example of the present invention.
  • DESCRIPTION OF THE PREFERRED EMBODIMENTS
  • Before description is made, it is to be noted that the terms or words used in the present specification and claims should be interpreted to have meaning and concepts suitable for the technical spirit of the present invention, based on the principle that an inventor can suitably define terms to optimally describe his or her invention. In the following description of the present invention, detailed descriptions may be omitted if it is determined that the detailed descriptions of related well-known functions and construction may make the gist of the present invention unclear.
  • Hereinafter, embodiments of the present invention will be described in detail with reference to the attached drawings.
  • With reference to FIGS. 1A and 1B to FIG. 3, a separate learning system using a two-layered neural network having target values for hidden nodes according to an embodiment of the present invention is described below.
  • FIG. 1A is a conceptual view of a two-layered neural network according to an embodiment of the present invention, FIG. 1B is a diagram showing the construction of a separate learning system using a two-layered neural network having target values for hidden nodes according to an embodiment of the present invention, FIG. 2 is a diagram showing a method of predicting a gradient relative to a target value for a hidden node according to an embodiment of the present invention, and FIG. 3 is a diagram showing a method of detouring around obstacles, such as local minima and plateaus, according to an embodiment of the present invention.
  • In a separate learning system using a two-layered neural network having target values for hidden nodes, a learning system 100 performs a learning function by learning weights through training data and making generalizations about the characteristics of training data, as shown in FIG. 1A, and includes an input layer 110, a first connection weight unit 120, a hidden layer 130, a second connection weight unit 140, an output layer 150, and a control unit 160.
  • First, the input layer 110 functions to receive a plurality of pieces of training data from a user, and includes input nodes Xn(x1, x2, . . . , xn).
  • Further, as shown in FIG. 1B, the first connection weight unit 120 functions to connect the input layer 110 to the hidden layer 130 through input-to-hidden connections, and to change weights between the input nodes and hidden nodes, included in the hidden layer 130, thus performing learning. The first connection weight unit 120 includes a reception module 121, a weight change module 122, and a first “comparison-determination” module 123.
  • First, the reception module 121 functions to receive a target value and an error value for a corresponding hidden node from the second connection weight unit 140.
  • Further, the weight change module 122 functions to change the weights between the input nodes and the hidden nodes.
  • In detail, the weight change module 122 can perform learning by adjusting the weights using a gradient descent method. In other words, the weights of the first connection weight unit 120 are adjusted so as to minimize the sum of squares of errors between actual output values, obtained from all input nodes for a network in which input/output functions are constructed using linear units, and target output values. A cost function thereof is expressed by the following Equation [1],
  • $$E[w] = \frac{1}{2}\sum_{j}\left[d_j - y_j\right]^{2},\qquad y_j = S(u_j),\qquad u_j = \sum_{h} w^{*}_{hj}\, z_h,\qquad z_h = S(v_h),\qquad v_h = \sum_{i} w^{*}_{ih}\, x_i \qquad [1]$$
  • where dj is a target value for a j-th output node, S is an activation function, xi is an i-th input, wih* is a weight directed from an i-th input node to an h-th hidden node, zh is the output value of the h-th hidden node, w*hj is a weight directed from the h-th hidden node to the j-th output node, and yj is the output value of the j-th output node.
  • In this case, the cost function has different function values because of the values for hidden nodes. When the cost function increases, learning between the hidden nodes and the output nodes is stopped, and learning between the input nodes and the hidden nodes is performed.
  • For reference, a gradient descent rule for the connection of the hidden layer to the output layer is expressed by the following Equation [2].
  • $$\Delta w^{*}_{hj} = -\eta\,\frac{\partial E}{\partial w^{*}_{hj}} = \eta\,(d_j - y_j)\,S'(u_j)\,z_h \qquad [2]$$
  • The update for the first connection weight of the first connection weight unit 120, corresponding to the connection from the input layer to the hidden layer, is obtained by partially differentiating the cost function with respect to wih* and applying the chain rule through the terms of Equation [2], which is expressed by the following Equation [3].
  • $$\Delta w^{*}_{ih} = -\eta\,\frac{\partial E}{\partial w^{*}_{ih}} = \eta \sum_{j}\left\{(d_j - y_j)\,S'(u_j)\,w^{*}_{hj}\right\} S'(v_h)\,x_i \qquad [3]$$
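  • For illustration only, Equations [1] to [3] can be sketched in Python as follows. The sketch assumes a logistic sigmoid for the activation function S (the text does not fix a particular S) and a single training pattern; the names sigmoid, forward, cost and gradient_step are hypothetical and are not taken from the specification.

```python
import math

def sigmoid(u):
    # Assumed activation function S; the specification does not name one.
    return 1.0 / (1.0 + math.exp(-u))

def sigmoid_prime(u):
    s = sigmoid(u)
    return s * (1.0 - s)

def forward(x, w_ih, w_hj):
    """w_ih[i][h]: weight from input i to hidden h; w_hj[h][j]: hidden h to output j."""
    n_hidden, n_out = len(w_hj), len(w_hj[0])
    v = [sum(w_ih[i][h] * x[i] for i in range(len(x))) for h in range(n_hidden)]
    z = [sigmoid(vh) for vh in v]
    u = [sum(w_hj[h][j] * z[h] for h in range(n_hidden)) for j in range(n_out)]
    y = [sigmoid(uj) for uj in u]
    return v, z, u, y

def cost(d, y):
    # Equation [1]: half the sum of squared output errors.
    return 0.5 * sum((dj - yj) ** 2 for dj, yj in zip(d, y))

def gradient_step(x, d, w_ih, w_hj, eta):
    """Return updated copies of both weight sets after one gradient descent step."""
    v, z, u, y = forward(x, w_ih, w_hj)
    delta = [(d[j] - y[j]) * sigmoid_prime(u[j]) for j in range(len(y))]
    # Equation [2]: hidden-to-output update.
    new_w_hj = [[w_hj[h][j] + eta * delta[j] * z[h] for j in range(len(y))]
                for h in range(len(z))]
    # Equation [3]: input-to-hidden update, using the weights before the update.
    back = [sum(delta[j] * w_hj[h][j] for j in range(len(y))) for h in range(len(z))]
    new_w_ih = [[w_ih[i][h] + eta * back[h] * sigmoid_prime(v[h]) * x[i]
                 for h in range(len(z))] for i in range(len(x))]
    return new_w_ih, new_w_hj
```

  • In separate learning only one of the two returned weight sets would be applied at a time, because the other connection weight unit is held fixed.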
  • Further, the first “comparison-determination” module 123 functions to compare the actual output value of the hidden node with the target value and error value for the hidden node, received through the reception module 121, thus determining whether learning reaches the target value for the hidden node.
  • In this case, if learning converges to the target value for the hidden node, learning stops, otherwise the learning direction turns to the second connection weight unit 140, thus enabling learning to be performed between the hidden nodes and the output nodes.
  • For reference, the first connection weight in this embodiment is indicated by wih*, and denotes the connection from the input layer to the hidden layer.
  • Further, the second connection weight unit 140 functions to connect the hidden layer 130 to the output layer 150 through hidden-to-output connections, process outputs on the output nodes through respective hidden nodes, and calculate the target value for the hidden node, based on the current error of the output nodes, thus allowing learning to be performed. The second connection weight unit 140 includes a second “comparison-determination” module 141, an error generation module 142, a hidden node target value calculation module 143, a transmission module 144, a selection module 145, and a determination module 146.
  • First, the second “comparison-determination” module 141 determines whether traffic congestion, such as a delay in learning time or convergence failure, has occurred in a learning process, and turns the learning direction to the first connection weight unit 120, thus performing learning between the input nodes and the hidden nodes until learning reaches the set target value for the hidden node.
  • Further, the error generation module 142 functions to generate an error value for the hidden node with respect to a corresponding output node.
  • In this case, the expected error of the hidden node zh with respect to an output node yj is expressed by the following Equation [4].
  • $$\gamma_h - z_h = \lvert \tilde{z} - z \rvert \cos\theta = \frac{(d_j - y_j)\,w^{*}_{hj}\,S'(u_j)}{\lvert \nabla y_j(z) \rvert^{2}} = \frac{(d_j - y_j)\,w^{*}_{hj}\,S'(u_j)}{\sum_{i}\left(\partial y_j / \partial z_i\right)^{2}} = \frac{(d_j - y_j)\,w^{*}_{hj}\,S'(u_j)}{\sum_{i}\left(w^{*}_{ij}\,S'(u_j)\right)^{2}} = \frac{(d_j - y_j)\,w^{*}_{hj}}{S'(u_j)\sum_{i}\left(w^{*}_{ij}\right)^{2}} \qquad [4]$$
  • If the absolute value of the weight vector from the hidden node to the output node is relatively large, there is a great influence on an error for the hidden node compared to other cases. Therefore, if the absolute value of the weight vector is multiplied by the expected error for the hidden node, and Equation [4] is expressed again, the following Equation [5] is obtained.
  • $$\gamma_h - z_h = \frac{(d_j - y_j)\,w^{*}_{hj}}{S'(u_j)\,\lvert w^{*}_{j} \rvert} \qquad [5]$$
  • In this embodiment, the expected error γh−zh is obtained by multiplying the magnitude of the function
  • $$\tilde{z} - z = \frac{(d_j - y_j)}{\lvert \nabla y_j(z) \rvert}\,\frac{\nabla y_j(z)}{\lvert \nabla y_j(z) \rvert},$$
  • associated with the error for the hidden node zh to the output node yj, by
  • $$\cos\theta = \operatorname{sign}(d_j - y_j)\,\frac{\partial y_j / \partial z_h}{\lvert \nabla y_j(z) \rvert} = \operatorname{sign}(d_j - y_j)\,\frac{w^{*}_{hj}\,S'(u_j)}{\lvert \nabla y_j(z) \rvert}$$
  • when the angle between z̃−z and zh is assumed to be θ.
  • In this case, when dj−yj≥0, sign(dj−yj)=1 is obtained, w*j=(w*1j, w*2j, . . . , w*nj) is obtained, and n is the number of hidden nodes.
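  • As a hedged illustration, the expected error of Equations [4] and [5] for one hidden node and one selected output node can be computed as sketched below. The sketch assumes a logistic sigmoid for S and reads the garbled term s(uj) as the derivative S'(uj); the function names expected_hidden_error and hidden_target are hypothetical.

```python
import math

def sigmoid_prime(u):
    # Derivative of the assumed logistic sigmoid activation.
    s = 1.0 / (1.0 + math.exp(-u))
    return s * (1.0 - s)

def expected_hidden_error(d_j, y_j, u_j, w_col, h, scale_by_norm=False):
    """Expected error gamma_h - z_h of hidden node h for output node j.

    w_col holds the weights (w*_1j, ..., w*_nj) from every hidden node to
    output node j. With scale_by_norm=False this follows Equation [4];
    with scale_by_norm=True the result is multiplied by |w*_j| (Equation [5]).
    """
    norm_sq = sum(w * w for w in w_col)
    err = (d_j - y_j) * w_col[h] / (sigmoid_prime(u_j) * norm_sq)
    if scale_by_norm:
        err *= math.sqrt(norm_sq)
    return err

def hidden_target(z_h, d_j, y_j, u_j, w_col, h, scale_by_norm=True):
    # Target value gamma_h = current hidden output + expected error.
    return z_h + expected_hidden_error(d_j, y_j, u_j, w_col, h, scale_by_norm)
```

  • Under the Equation [4] form, summing S'(uj)·w*hj·(γh−zh) over all hidden nodes gives dj−yj, so to first order the output error would be removed if every hidden node moved by its expected error. A target outside the range of S cannot be reached exactly; the text does not say how such cases are handled, and the sketch leaves them as-is.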
  • The above embodiment is described to estimate the target value for the hidden node zh in consideration of all hidden nodes to the output nodes at the time of generation of the error for the hidden node zh, but the present invention is not limited to the above embodiment.
  • Further, the hidden node target value calculation module 143 functions to calculate the target value for the hidden node so that learning can reach a global minimum.
  • In detail, referring to FIG. 2, the hidden node target value calculation module 143 functions to calculate the target value for the hidden node γh, based on the current error value for the output node. That is, the error for the hidden node is calculated using a gradient corresponding to the direction of the hidden node and a selected output error, so that the target value for the hidden node is calculated.
  • In this case, a target value for a corresponding hidden node denotes the value of a hidden node which causes a selected output to approximate its ideal value as closely as possible. A suitable approximate value corresponding to the target value for the hidden node is set.
  • That is, the cost function of the hidden node can be given by the following Equation [6] using the target value for the hidden node γh calculated in Equation [4].
  • $$E(W) = \frac{1}{2}\left(\gamma_h - z_h\right)^{2} \qquad [6]$$
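  • A minimal sketch of how the hidden-node cost of Equation [6] might be minimized by gradient descent on the weights into a single hidden node is given below. The update Δwih = η(γh−zh)S'(vh)xi is derived here from Equation [6] with zh = S(vh), and is not quoted from the specification; a logistic sigmoid, a single input pattern, and the name train_hidden_node are assumptions.

```python
import math

def sigmoid(v):
    # Assumed activation function S.
    return 1.0 / (1.0 + math.exp(-v))

def train_hidden_node(x, w_h, gamma_h, eta=1.0, tol=1e-3, max_iter=10000):
    """Adjust the weights w_h into one hidden node until its output z_h
    is within tol of the target gamma_h, or max_iter steps have run.

    Returns the final weights and the number of update steps taken.
    """
    for step in range(max_iter):
        v_h = sum(w * xi for w, xi in zip(w_h, x))
        z_h = sigmoid(v_h)
        err = gamma_h - z_h
        if abs(err) < tol:
            return w_h, step
        # Gradient descent on E = 0.5 * (gamma_h - z_h)^2, with S'(v) = z(1 - z).
        scale = eta * err * z_h * (1.0 - z_h)
        w_h = [w + scale * xi for w, xi in zip(w_h, x)]
    return w_h, max_iter
```

  • The stopping test plays the role of the first “comparison-determination” module 123, which checks whether the hidden node has reached its target value.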
  • Further, the transmission module 144 functions to transmit the error value for the hidden node and the target value for the hidden node to the first connection weight unit 120.
  • Further, the selection module 145 functions to select the output node having the largest error with respect to a hidden node.
  • Further, the determination module 146 functions to determine which hidden node is to be selected so as to perform learning in the first connection weight unit 120.
  • This embodiment is set to select only a single hidden node at each time that learning is performed in the first connection weight unit 120.
  • That is, only one is selected from among a plurality of hidden nodes to perform learning, thus improving convergence speed.
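  • The selection performed by the selection module 145 can be pictured with the short sketch below. It assumes that "largest error" means the largest absolute difference between the target dj and the output yj; the text does not state the exact measure, nor the rule by which the determination module 146 picks the single hidden node, so only the output-node selection is shown. The name select_output_node is hypothetical.

```python
def select_output_node(d, y):
    """Return the index j of the output node with the largest |d_j - y_j|."""
    errors = [abs(dj - yj) for dj, yj in zip(d, y)]
    return max(range(len(errors)), key=errors.__getitem__)
```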
  • Further, the output layer 150 functions to output training data that has been completely learned, and includes output nodes.
  • Further, referring to FIG. 3, if a learning speed decreases or a cost function increases due to local minima or plateaus when the first connection weight unit 120 is fixed and learning is performed by the second connection weight unit 140, the control unit 160 compulsorily stops learning, fixes the second connection weight unit 140, and turns the learning direction to the first connection weight unit 120, thus repeatedly performing learning between the input nodes and the hidden nodes. Accordingly, learning is repeatedly performed until the learning process converges to the set target value for the hidden node.
  • That is, after the learning method turns the learning direction to the first connection weight unit 120, and maintains the path until learning reaches the set target value for the hidden node, the learning method returns to the second connection weight unit 140, thus repeatedly performing learning until learning reaches a global minimum.
  • Therefore, the separate learning method travels a longer distance than does a backpropagation learning method, but can also travel at higher speed, and furthermore, convergence speed is also high.
  • Hereinafter, a separate learning method using a two-layered neural network having target values for hidden nodes is described using application software having the above configuration, with reference to FIGS. 4A and 4B.
  • FIG. 4A is a flowchart of a separate learning method using a two-layered neural network having target values for hidden nodes according to an embodiment of the present invention, and FIG. 4B is a detailed flowchart showing the step of generating a target value for a hidden node according to an embodiment of the present invention.
  • As shown in FIG. 4A, the control unit 160 receives training data through the input layer 110 to train the neural network at step S2.
  • In this case, the control unit initializes the input layer, the hidden layer and the output layer, thus improving convergence speed using the target value for the hidden node.
  • Next, the control unit 160 performs learning in the second connection weight unit 140 using the received training data at step S4.
  • In this case, learning is performed using only the second connection weight unit after the first connection weight unit is fixed.
  • Next, the control unit determines whether learning has converged when learning speed decreases due to local minima and plateaus at step S6.
  • As a result of the determination at step S6, if learning is determined to have converged, the control unit 160 turns the learning direction to the first connection weight unit 120 at step S8, thus allowing learning to be performed between the input nodes and the hidden nodes.
  • In this case, the second connection weight unit is fixed and the learning direction turns to the first connection weight unit, so that learning is repeatedly performed.
  • When learning is performed in the first connection weight unit 120, the control unit 160 determines whether learning has reached the set target value for the hidden node at step S10. If it is determined that the learning has reached the target value for the hidden node, the control unit turns the learning direction to the second connection weight unit 140, and then continuously performs learning between the hidden nodes and the output nodes at step S12.
  • Next, the control unit 160 determines whether learning performed in the second connection weight unit 140 has reached a global minimum at step S14. If it is determined that learning has reached a global minimum, learning stops.
  • Meanwhile, if it is determined that learning has not reached a global minimum as a result of the determination at step S14, the control unit 160 returns to step S4.
  • Further, as a result of the determination at step S10, if it is determined that learning has not reached the target value for the hidden node, the control unit 160 returns to step S4.
  • Meanwhile, if it is determined that the learning of the second connection weight unit 140 has not converged at step S6, the control unit 160 generates a target value for the hidden node, thus causing learning to converge at step S16.
  • In detail, with reference to FIG. 4B, step S16 is described. First, the control unit 160 selects the output node having the largest error value.
  • Next, the control unit 160 calculates the target value for the hidden node using Equation [5] so that learning can reach a global minimum at step S16 b.
  • Finally, the control unit 160 transmits the generated error value for the hidden node and the generated target value for the hidden node to the first connection weight unit 120 at step S16 c.
  • In this case, the control unit causes learning to reach the global minimum using the error value and the target value for the hidden node, received from the second connection weight unit.
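  • The alternation between the two connection weight units can be sketched as a control loop. This is a simplified reading of FIG. 4A, not the claimed procedure: the three callables are hypothetical placeholders for the upper-connection learning, the target generation of step S16, and the lower-connection learning; a stall is assumed to mean that the cost fell by less than stall_tol or rose; and the convergence error limit of 0.01 used in the experimental examples stands in for the global-minimum test.

```python
def separate_learning(train_upper, train_lower, make_target,
                      error_limit=0.01, stall_tol=1e-4, max_rounds=100):
    """Alternate learning between the upper and lower connections.

    train_upper() -> cost after one pass of hidden-to-output learning
                     with the first connection weights fixed.
    make_target() -> (error, target) for a selected hidden node.
    train_lower(error, target) -> True once the hidden node reached
                     its target, with the second connection weights fixed.
    Returns (converged, log), where log records each phase in order.
    """
    log = []
    prev_cost = float("inf")
    for _ in range(max_rounds):
        cost = train_upper()
        log.append(("upper", cost))
        if cost <= error_limit:
            return True, log
        # Stall: the cost decreased too slowly or increased.
        stalled = prev_cost - cost < stall_tol
        prev_cost = cost
        if stalled:
            error, target = make_target()
            reached = train_lower(error, target)
            log.append(("lower", reached))
            # Restart stall detection when returning to the upper connection.
            prev_cost = float("inf")
    return False, log
```

  • Passing the learning phases in as callables keeps the sketch independent of any particular weight update rule, in line with the statement that the network structure and the weight updating rule are left unchanged.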
  • Experimental Examples
  • In these experimental examples, in order to verify the performance of the separate learning method proposed in the present invention, experiments were conducted using a terminal having an AMD XP 2600+2.0 GB CPU and 512 MB Random Access Memory (RAM), using three types of data including 1) synthetic data, 2) Wisconsin breast cancer data, and 3) iris data.
  • First, after distances d between respective input vectors and center vectors in all classes had been calculated to find the closest class and the next closest class, a desired class was determined using a given probability value. Then, experiments were conducted 270 times for each of the cases where the number of hidden nodes increased, and the cases where learning rate and momentum increased from 0.1 to 0.9.
  • In this case, an input vector X, the number of input nodes n, and a probability variable α were input, each input pattern value was set to an arbitrary value between −1 and 1, the number of input patterns was set to 10 to 20, and the number of classes was set to 3 to 10. The probability variable α was assigned a value equal to or greater than 3.0 depending on the number of input nodes, so that data was generated to cause a region of overlapping classes to be relatively large. The measure of evaluating performance used the following equations,
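  • $$d_i = \lVert X - C_i \rVert = \left(\sum_{k=1}^{n}\left(x_k - \mu_k^{i}\right)^{2}\right)^{1/2},\qquad t = \frac{d_j - d_i}{d_j + d_i},\quad d_j > d_i,\quad 0 \le t \le 1,$$
  • $$P_\alpha(C_i \mid X) = \frac{1}{1 + e^{-\alpha t}},\qquad P_\alpha(C_j \mid X) = 1 - P_\alpha(C_i \mid X)$$
  • where Ci is the closest class, μ_k^i is the k-th dimensional value of the center μ^i of Ci, and Cj is the next closest class.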
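  • The class-assignment probabilities defined above can be sketched as follows. The sketch assumes Euclidean distance to each class center and at least two classes; the function name class_probabilities is hypothetical, and the case in which both distances are zero is not handled.

```python
import math

def class_probabilities(x, centers, alpha):
    """Return (i, p_i, j, p_j) for the closest class i and next closest class j.

    x is the input vector, centers is a list of class center vectors, and
    alpha is the probability variable controlling how much classes overlap.
    """
    ranked = sorted((math.dist(x, c), idx) for idx, c in enumerate(centers))
    (d_i, i), (d_j, j) = ranked[0], ranked[1]
    t = (d_j - d_i) / (d_j + d_i)
    p_i = 1.0 / (1.0 + math.exp(-alpha * t))
    return i, p_i, j, 1.0 - p_i
```

  • When the two nearest centers are equally distant, t is 0 and both classes receive probability 0.5; as the closest class becomes much nearer than the next one, t approaches 1 and the probability of the closest class approaches 1/(1+e^(−α)).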
  • Therefore, the experimental examples compare and evaluate the convergence rates, learning rates, learning times and mean square errors according to an increase in the number of hidden nodes, an increase in learning rate, and an increase in momentum, with respect to a separate learning method and a backpropagation learning method.
  • In the experimental examples, a limit time of about 50 seconds and a convergence error limit of 0.01 were set according to an experiment, so that only the cases where an error less than the limit is obtained within the limit time were included in the case of successful convergence rate. The mean square error was set to indicate the mean value of minimum errors.
  • First Experimental Example: Synthetic Data
  • A first experimental example was conducted to compare the performance of backpropagation learning and separate learning with each other when the learning rate was fixed at 0.3, and the number of hidden nodes was increased from 3 to 20.
  • First, the experimental results of backpropagation learning and separate learning according to an increase in the number of hidden nodes are described. As shown in FIGS. 5A to 5C, when the number of hidden nodes was increased to 10 or above, backpropagation learning did not converge, and the mean square error did not decrease below 0.5. The reason for this is that an increase in the number of hidden nodes increases the complexity of a network, thus generating a large number of local minima.
  • Meanwhile, separate learning using synthetic data exhibited a high convergence rate regardless of an increase in the number of hidden nodes, so that separate learning was relatively free from the problem of local minima. In the case of learning time, it could be seen that backpropagation learning remained at the convergence limit time because it did not converge, whereas separate learning exhibited uniform and short learning time regardless of the number of hidden nodes.
  • Further, the experimental results of backpropagation learning and separate learning, obtained when hidden nodes were arbitrarily selected and the learning rate was increased from 0.1 to 0.9, are described. As shown in FIGS. 6A to 6C, it could be seen that, in the case of convergence rate and learning time, separate learning was superior to backpropagation learning. In detail, backpropagation learning failed in convergence for all learning rates except for a learning rate of 0.1, and the mean square error thereof did not decrease below 10. Further, for separate learning, as a learning rate increased, the number of convergences decreased.
  • Further, experimental results comparing the performance of backpropagation learning and separate learning when the learning rate was fixed at 0.3 and the value of momentum was increased from 0.1 to 0.9, while hidden nodes were arbitrarily selected, are described with reference to FIGS. 7A to 7C. In the case of the number of convergences, separate learning was generally superior to backpropagation learning. In the case of learning time, separate learning was about twice as fast as backpropagation learning.
  • That is, an increase in momentum was observed to be of little help to either separate learning or backpropagation learning, so it is determined that momentum does not especially help eliminate obstacles such as local minima or plateaus.
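  • For reference, the conventional gradient-descent weight update with momentum can be sketched as follows. The exact update rule used in the experiments is not restated in this section, so the form below (negative gradient scaled by the learning rate, plus the previous weight change scaled by the momentum) is an assumption, and the function and parameter names are hypothetical.

```python
def momentum_step(weight, grad, prev_delta, learning_rate=0.3, momentum=0.1):
    """One gradient step with momentum (conventional form, assumed here).

    weight      current value of a single connection weight
    grad        derivative of the error with respect to that weight
    prev_delta  weight change applied in the previous step
    Returns (new_weight, delta) so the caller can pass delta back in next time.
    """
    delta = -learning_rate * grad + momentum * prev_delta
    return weight + delta, delta
```

  • With momentum set to 0 this reduces to plain gradient descent. Larger momentum values carry more of the previous step forward, which is the quantity swept from 0.1 to 0.9 in the experiments.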
  • To determine the learning time per iteration (epoch) in the first experimental example, the iteration counts for 30 arbitrarily selected data samples were summed for both separate learning and backpropagation learning, and the total learning time was divided by the total number of iterations. For separate learning, the total number of iterations was 58641 and the total learning time was 1476 seconds, whereas for backpropagation learning the total number of iterations was 18205 and the total learning time was 1510 seconds.
  • Therefore, the learning time per iteration was about 0.025 seconds for separate learning and about 0.083 seconds for backpropagation learning. Accordingly, the learning time per iteration for separate learning was about one third of that for backpropagation learning.
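  • The per-iteration figures can be checked directly from the totals reported above. The short calculation below uses only those four numbers; the function and variable names are illustrative.

```python
def seconds_per_iteration(total_seconds, total_iterations):
    """Learning time per iteration = total learning time / total iterations."""
    return total_seconds / total_iterations


separate = seconds_per_iteration(1476, 58641)   # separate learning
backprop = seconds_per_iteration(1510, 18205)   # backpropagation learning
ratio = backprop / separate

print(round(separate, 3), round(backprop, 3), round(ratio, 1))
# prints: 0.025 0.083 3.3
```

  • The ratio of about 3.3 corresponds to the statement that separate learning needs roughly one third of the time per iteration.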
  • Second Experimental Example: Wisconsin Breast Data
  • A second experimental example is an experiment for determining whether a breast tumor is benign or malignant using the Wisconsin breast cancer data and 9 variables. The number of data samples was 457, and tumors were classified into two classes, benign and malignant. For such a two-class problem, an increase in the number of hidden nodes may decrease overall performance.
  • That is, when experiments were conducted with two and with three hidden nodes, better performance was obtained with two hidden nodes. Accordingly, the remaining experiments were conducted with the number of hidden nodes fixed at two.
  • The experimental results are described below. As shown in FIGS. 8A to 8C, when momentum was fixed at 0.1 and a learning rate was increased from 0.1 to 0.9, separate learning was superior in both convergence rate and learning time to backpropagation learning at a low learning rate. That is, as the learning rate increased, the convergence rate decreased. In the case of a mean square error, separate learning and backpropagation learning exhibited almost the same results.
  • Further, the performances of backpropagation learning and separate learning, obtained when the learning rate was fixed at 0.1 and momentum was increased from 0.1 to 0.9, are described with reference to FIGS. 9A to 9C. In the case of convergence rate, the two methods exhibited almost the same results, with separate learning performing slightly better. In the case of learning time, backpropagation learning was faster at small momentum values, but separate learning became much faster as momentum increased.
  • Third Experimental Example: Iris Data
  • In a third experimental example, iris data is composed of four variables, that is, sepal length, sepal width, petal length, and petal width.
  • In this case, the total number of data samples was 150, with 50 data samples for each of three classes, setosa, versicolor and virginica, which are three types of iris.
  • As a result of experiments, the performances of backpropagation learning and separate learning, obtained when momentum was fixed at 0.1 and the learning rate was increased from 0.1 to 0.9, are described with reference to FIGS. 10A to 10C. In the case of convergence rate, backpropagation learning did not converge, whereas separate learning exhibited high convergence rate. In the case of learning time, separate learning got better results than did backpropagation learning at a low learning rate. Further, it could be seen that, as the learning rates of separate learning and backpropagation learning increased, the number of convergences decreased.
  • In the case of mean square error, however, backpropagation learning exhibited a smaller error than did separate learning. Further, the performances of backpropagation learning and separate learning, obtained when the learning rate was fixed at 0.1 and momentum was increased from 0.1 to 0.9, are described with reference to FIGS. 11A to 11C. In the case of convergence rate, separate learning exhibited better performance than did backpropagation learning regardless of an increase in momentum.
  • In the case of learning time, separate learning exhibited better performance than did backpropagation learning. That is, it could be seen that backpropagation learning did not converge within a limited learning time with respect to overall learning, regardless of an increase in momentum.
  • As shown in the experimental results, the proposed separate learning exhibited better performance than did backpropagation learning with respect to convergence rate and learning time, regardless of an increase in the number of hidden nodes, an increase in learning rate, and an increase in momentum.
  • These results are obtained because the proposed method can solve the problem of convergence by providing different states to a weight updating rule, an unchanged network structure, target values and error values for hidden nodes, and a learning process. That is, computational advantages could be obtained through the fact that computational time per iteration of separate learning was less than that of backpropagation learning, and improved performance could be obtained through the application of various weight updating rules.
  • As described above, the present invention provides a separate learning system and method, which set target values for hidden nodes in separate learning, so that a network structure and a weight updating rule are not changed.
  • Further, the present invention is advantageous in that it divides a calculation process into upper and lower layers to perform learning, thus reducing computational work and consequently improving reliability.
  • Further, the present invention is advantageous in that it requires only a small amount of storage space, realizes fast convergence, and provides a degree of stability, thus increasing the probability of convergence.
  • Further, the present invention is advantageous in that it sets target values for hidden nodes, thus realizing faster and more stable escape from local minima and plateaus.
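  • The alternation between the upper (hidden-to-output) and lower (input-to-hidden) connection weights summarized above can be sketched as follows. This is a rough illustration under stated assumptions, not the patented procedure: the hidden-node target values are computed here by a plain gradient step on the hidden activations, which is a substitute chosen for brevity because the target-value formula is not restated in this section. All function names, the stall threshold, and the single lower-layer pass per stall are likewise hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer(weights, inputs):
    """Sigmoid layer; each weight row is [w_1, ..., w_n, bias]."""
    return [sigmoid(sum(w * v for w, v in zip(row, inputs)) + row[-1])
            for row in weights]


def train_upper(W2, h, t, lr):
    """Delta-rule step on hidden-to-output weights; hidden values h stay fixed."""
    y = layer(W2, h)
    for k, row in enumerate(W2):
        d = (t[k] - y[k]) * y[k] * (1.0 - y[k])
        for j, hj in enumerate(h):
            row[j] += lr * d * hj
        row[-1] += lr * d


def hidden_targets(W2, h, t, step=1.0):
    """ASSUMPTION: targets are hidden values moved down the output-error gradient."""
    y = layer(W2, h)
    d = [(y[k] - t[k]) * y[k] * (1.0 - y[k]) for k in range(len(y))]
    grad = [sum(d[k] * W2[k][j] for k in range(len(y))) for j in range(len(h))]
    # Keep targets inside the open range of a sigmoid hidden node.
    return [min(0.99, max(0.01, hj - step * g)) for hj, g in zip(h, grad)]


def train_lower(W1, x, h_target, lr):
    """Delta-rule step on input-to-hidden weights toward the hidden targets."""
    h = layer(W1, x)
    for j, row in enumerate(W1):
        d = (h_target[j] - h[j]) * h[j] * (1.0 - h[j])
        for i, xi in enumerate(x):
            row[i] += lr * d * xi
        row[-1] += lr * d


def mse(W1, W2, data):
    return sum((tk - yk) ** 2
               for x, t in data
               for tk, yk in zip(t, layer(W2, layer(W1, x)))) / len(data)


def separate_learning(data, n_hidden, lr=0.3, epochs=2000,
                      tol=0.01, stall=1e-6, seed=0):
    rng = random.Random(seed)
    n_in, n_out = len(data[0][0]), len(data[0][1])
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    W2 = [[rng.uniform(-1, 1) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    prev = err = mse(W1, W2, data)
    for _ in range(epochs):
        for x, t in data:                  # upper-layer phase
            train_upper(W2, layer(W1, x), t, lr)
        err = mse(W1, W2, data)
        if err < tol:                      # converged: stop learning
            break
        if prev - err < stall:             # stalled: turn to the lower layer
            for x, t in data:
                train_lower(W1, x, hidden_targets(W2, layer(W1, x), t), lr)
            err = mse(W1, W2, data)
        prev = err
    return W1, W2, err
```

  • Each phase updates only one weight layer, so neither phase propagates an error signal through both layers in a single update. Whether this particular sketch converges on a given data set depends on the assumed target rule and is not claimed here.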
  • Although the preferred embodiments of the present invention have been disclosed for illustrative purposes, those skilled in the art will appreciate that various modifications, additions and substitutions are possible, without departing from the scope and spirit of the invention as disclosed in the accompanying claims. Therefore, it should be understood that such modifications, additions and substitutions, and equivalents thereto, belong to the scope of the present invention.

Claims (9)

1. (canceled)
2. (canceled)
3. (canceled)
4. (canceled)
5. (canceled)
6. (canceled)
7. A separate learning method using a two-layered neural network having target values for hidden nodes, comprising the steps of:
(a) performing learning in a second connection weight unit using training data;
(b) determining whether learning has converged when a learning speed decreases due to local minima and plateaus, and stopping the learning if it is determined that learning has converged, otherwise turning a learning direction to a first connection weight unit and allowing learning to be performed between all of the input nodes and at least one hidden node;
(c) determining whether learning has reached a target value for the hidden node set by the first connection weight unit;
(d) turning a learning direction to the second connection weight unit and performing learning between the hidden node and at least one output node if it is determined that learning has not reached the target value for the hidden node as a result of the determination; and
(e) causing learning, performed in the second connection weight unit, to reach a global minimum.
8. The separate learning method according to claim 7, further comprising the step of (a-1) receiving training data through the input layer to train a neural network before step (a).
9. The separate learning method according to claim 7, wherein step (b) comprises the steps of:
(b-1) selecting an output node having a largest error value with respect to the hidden node if it is determined that learning has not converged;
(b-2) calculating the target value for the hidden node so that learning can reach a global minimum; and
(b-3) transmitting the error value for the hidden node and the target value for the hidden node to the first connection weight unit.
US12/722,861 2006-05-19 2010-03-12 Separate Learning System and Method Using Two-Layered Neural Network Having Target Values for Hidden Nodes Abandoned US20100169256A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US12/722,861 US20100169256A1 (en) 2006-05-19 2010-03-12 Separate Learning System and Method Using Two-Layered Neural Network Having Target Values for Hidden Nodes

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR1020060045193A KR100820723B1 (en) 2006-05-19 2006-05-19 Separately trained system and method using two-layered neural network with target values of hidden nodes
KR10-2006-0045193 2006-05-19
US11/457,601 US7734555B2 (en) 2006-05-19 2006-07-14 Separate learning system and method using two-layered neural network having target values for hidden nodes
US12/722,861 US20100169256A1 (en) 2006-05-19 2010-03-12 Separate Learning System and Method Using Two-Layered Neural Network Having Target Values for Hidden Nodes

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US11/457,601 Division US7734555B2 (en) 2006-05-19 2006-07-14 Separate learning system and method using two-layered neural network having target values for hidden nodes

Publications (1)

Publication Number Publication Date
US20100169256A1 true US20100169256A1 (en) 2010-07-01

Family

ID=38791536

Family Applications (2)

Application Number Title Priority Date Filing Date
US11/457,601 Expired - Fee Related US7734555B2 (en) 2006-05-19 2006-07-14 Separate learning system and method using two-layered neural network having target values for hidden nodes
US12/722,861 Abandoned US20100169256A1 (en) 2006-05-19 2010-03-12 Separate Learning System and Method Using Two-Layered Neural Network Having Target Values for Hidden Nodes

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US11/457,601 Expired - Fee Related US7734555B2 (en) 2006-05-19 2006-07-14 Separate learning system and method using two-layered neural network having target values for hidden nodes

Country Status (2)

Country Link
US (2) US7734555B2 (en)
KR (1) KR100820723B1 (en)


Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100249551A1 (en) * 2009-03-31 2010-09-30 Nelicor Puritan Bennett LLC System And Method For Generating Corrective Actions Correlated To Medical Sensor Errors
US8918352B2 (en) * 2011-05-23 2014-12-23 Microsoft Corporation Learning processes for single hidden layer neural networks with linear output units
US8639680B1 (en) * 2012-05-07 2014-01-28 Google Inc. Hidden text detection for search result scoring
US10552734B2 (en) * 2014-02-21 2020-02-04 Qualcomm Incorporated Dynamic spatial target selection
US20150242742A1 (en) * 2014-02-21 2015-08-27 Qualcomm Incorporated Imbalanced cross-inhibitory mechanism for spatial target selection
KR102239714B1 (en) 2014-07-24 2021-04-13 삼성전자주식회사 Neural network training method and apparatus, data processing apparatus
KR101821494B1 (en) * 2016-08-10 2018-01-24 중앙대학교 산학협력단 Adaptive traffic signal control method and apparatus
CN107563588A (en) * 2017-07-25 2018-01-09 北京拓明科技有限公司 A kind of acquisition methods of personal credit and acquisition system
KR20190051391A (en) 2017-11-06 2019-05-15 울산대학교 산학협력단 Underwater transient signal classification apparatus and method
CN108520155B (en) * 2018-04-11 2020-04-28 大连理工大学 Vehicle behavior simulation method based on neural network
CN109521270A (en) * 2018-10-11 2019-03-26 湖南工业大学 Harmonic detecting method based on modified wavelet neural network
KR102533235B1 (en) 2018-11-01 2023-05-17 서강대학교산학협력단 Convolution neural network-based input classification apparatus and method
CN109934375B (en) * 2018-11-27 2020-05-01 电子科技大学中山学院 Power load prediction method
KR102592585B1 (en) * 2019-02-01 2023-10-23 한국전자통신연구원 Method and apparatus for building a translation model
CN114579730A (en) * 2020-11-30 2022-06-03 伊姆西Ip控股有限责任公司 Information processing method, electronic device, and computer program product
KR102613367B1 (en) 2020-12-29 2023-12-13 국민대학교산학협력단 Method and apparatus for automatically reducing model weight for deep learning model serving optimization, and a method for providing cloud inference services usin the same

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR970008532B1 (en) * 1993-08-10 1997-05-24 재단법인 한국전자통신연구소 Neural metwork
KR0119902B1 (en) * 1994-04-13 1997-10-29 양승택 On line learning algorithm of an estimation network
KR20040028408A (en) * 2002-09-30 2004-04-03 주식회사 케이티 Method for measuring the importance of a variable using neural network

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103646188A (en) * 2013-12-27 2014-03-19 长春工业大学 Non-invasive diagnostic method of coronary heart disease based on hybrid intelligent algorithm
US10963791B2 (en) 2015-03-27 2021-03-30 Equifax Inc. Optimizing neural networks for risk assessment
US11049019B2 (en) * 2015-03-27 2021-06-29 Equifax Inc. Optimizing neural networks for generating analytical or predictive outputs
US10977556B2 (en) 2015-03-27 2021-04-13 Equifax Inc. Optimizing neural networks for risk assessment
CN105487009A (en) * 2015-11-19 2016-04-13 上海电机学院 Motor fault diagnosis method based on k-means RBF neural network algorithm
US9875737B2 (en) 2016-03-18 2018-01-23 Electronics And Telecommunications Research Institute Pre-training apparatus and method for speech recognition
CN106203622A (en) * 2016-07-14 2016-12-07 杭州华为数字技术有限公司 Neural network computing device
CN106326677A (en) * 2016-09-12 2017-01-11 北京化工大学 Soft measurement method of acetic acid consumption in PTA device
US11238355B2 (en) 2016-11-07 2022-02-01 Equifax Inc. Optimizing automated modeling algorithms for risk assessment and generation of explanatory data
US11734591B2 (en) 2016-11-07 2023-08-22 Equifax Inc. Optimizing automated modeling algorithms for risk assessment and generation of explanatory data
US10997511B2 (en) 2016-11-07 2021-05-04 Equifax Inc. Optimizing automated modeling algorithms for risk assessment and generation of explanatory data
WO2019039758A1 (en) * 2017-08-25 2019-02-28 주식회사 수아랩 Method for generating and learning improved neural network
CN108364068A (en) * 2018-01-05 2018-08-03 华南师范大学 Deep learning neural network construction method based on digraph and robot system
US11010669B2 (en) 2018-10-24 2021-05-18 Equifax Inc. Machine-learning techniques for monotonic neural networks
US11468315B2 (en) 2018-10-24 2022-10-11 Equifax Inc. Machine-learning techniques for monotonic neural networks
US11868891B2 (en) 2018-10-24 2024-01-09 Equifax Inc. Machine-learning techniques for monotonic neural networks
CN109840492A (en) * 2019-01-25 2019-06-04 厦门商集网络科技有限责任公司 Document recognition methods and terminal based on deep learning network
CN111210003A (en) * 2019-12-30 2020-05-29 深圳前海微众银行股份有限公司 Longitudinal federated learning system optimization method, device, equipment and readable storage medium

Also Published As

Publication number Publication date
US20070282772A1 (en) 2007-12-06
US7734555B2 (en) 2010-06-08
KR100820723B1 (en) 2008-04-10
KR20070111853A (en) 2007-11-22

Legal Events

Date Code Title Description
AS Assignment

Owner name: INHA-INDUSTRY PARTNERSHIP INSTITUTE,KOREA, DEMOCRA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LEE, JU HONG;CHOI, BUM GHI;PARK, TAE SU;REEL/FRAME:024095/0135

Effective date: 20060630

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION