BACKGROUND OF THE INVENTION
This application relates generally to communications networks, and more particularly, to testing communication lines.
Recently, there has been an increased demand for plain old telephone systems (POTS's) to carry high-speed digital signals. The demand has been stimulated by home access to both the Internet and distant office computers. Both types of access typically employ a POTS line as part of the path for carrying digital signals.
POTS's lines were built to carry voice signals at audible frequencies and can also carry digital signals as tones in the near audible frequency range. Modern digital services such as ISDN and ADSL transmit data at frequencies well above the audible range. At these higher frequencies, POTS's lines that transmit voice signals well may transmit digital signals poorly. Nevertheless, many telephone operating companies (TELCO's) would like to offer ISDN and/or ADSL data services to their subscribers.
Telephone lines between a TELCO switch and a subscriber's premises are frequent sources of poor performance at the high frequencies characteristic of ISDN and ADSL transmissions. Nevertheless, high cost has made widespread replacement of these subscriber lines an undesirable solution for providing subscribers with lines capable of supporting ISDN and ADSL. A less expensive alternative would be to replace only those subscriber lines that are inadequate for transmitting high-speed digital data.
To enable limited replacement of inadequate lines, TELCO's have placed some emphasis on developing methods for predicting which subscriber lines will support data services, such as ISDN and ADSL. Some emphasis has been also placed on predicting frequency ranges at which such data services will be supported. Some methods have also been developed for finding faults in subscriber lines already supporting data services so that such faults can be repaired.
Current methods for predicting the ability of subscriber lines to support high-speed digital transmissions are typically not automated and are labor intensive. Often, these methods entail skilled interpretation of high-frequency measurements of line parameters to determine data transmission abilities. At a network scale, such tests are very expensive to implement.
The present invention is directed to overcoming or, at least, reducing the effects of one or more of the problems set forth above.
SUMMARY OF THE INVENTION
In a first aspect, the invention provides a method of testing a subscriber line. The method includes determining values of electrical line features from electrical measurements on the subscriber line and processing a portion of the values of the electrical features with a neural network. The neural network predicts whether the line qualifies to support one or more preselected data services from the portion of the values.
In a second aspect, the invention provides a method of constructing a test for qualifying subscriber lines for data transmissions. The method includes obtaining electrical feature and qualification data for sample lines of a training set and determining parameters defining a neural network from the data of the training set. The network is configured to use values of electrical properties of a subscriber line to predict whether the subscriber line qualifies for one or more preselected data services.
In a third aspect, the invention provides a method of testing a subscriber line. The method includes determining values of electrical features of the line from electrical measurements on the subscriber line and forming a vector having the values as components. The method also includes determining whether the vector is a member of a cluster of feature vectors and predicting whether the line qualifies for a data service based in part on the determination of cluster membership. Feature vectors of the cluster are associated with sample lines of a training set.
In a fourth aspect, the invention provides a data storage medium storing a computer executable program of instructions for performing one or more of the above-described methods.
BRIEF DESCRIPTION OF THE DRAWINGS
Other features and advantages of the invention will be apparent from the following description taken together with the drawings in which:
FIG. 1 shows a portion of a POTS network having a system for testing subscriber telephone lines;
FIG. 2 illustrates a device for performing one-ended electrical measurements on a subscriber line;
FIG. 3A is a flow chart showing a method of qualifying subscriber lines that uses a feed-forward neural network;
FIG. 3B is a flow chart showing a method of smoothing electrical measurements prior to processing by the neural network of FIG. 3A;
FIG. 4A is a flow chart showing a method of processing feature vectors through the neural network of FIG. 3A;
FIG. 4B is a block diagram showing the flow of data through the layers of the neural network of FIG. 4A;
FIG. 5 is a flow chart showing a method of constructing the neural network of FIGS. 4A-4B;
FIG. 6 is a flow chart showing a method of determining the compression layer of the neural network of FIGS. 4A-4B;
FIG. 7 is a flow chart showing a method of determining the basis layer of the neural network of FIGS. 4A-4B; and
FIG. 8 is a flow chart showing a method of determining the weights of the output layer of the neural network of FIGS. 4A-4B.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
Measurement and Test Apparatus
FIG. 1 shows a portion of a POTS network 10 that includes a system 11 for testing subscriber lines 12-14. The subscriber lines 12-14 connect subscriber units 16-18, i.e., modems and/or telephones, to a telephony switch 15. The switch 15 connects the subscriber lines 12-14 to the remainder of the telephone network 10. The switch 15 may be a POTS switch or another device, e.g., a digital subscriber loop access multiplexer (DSLAM). The switch 15 and testing system 11 may be located in one or more switching stations of a TELCO.
Each subscriber line 12-14 consists of a standard twisted two-wire telephone line adapted to voice transmissions. The two wires are generally referred to as the ring R and tip T wires.
A large portion of each subscriber line 12-14 is housed in one or more standard telephone cables 22. The cable 22 carries many subscriber lines 12-14, e.g., more than a dozen, in a closely packed configuration. The close packing creates an electrical environment that changes transmission properties of the individual subscriber lines 12-14.
A measurement unit 40 performs the electrical measurements used in tests of the lines 12-14. The measurement unit 40 includes a device 43 that performs one-ended electrical admittance measurements on tip and ring wires of the lines 12-14. The measurement unit 40 may also house other devices (not shown) for performing other types of one-ended electrical measurements, e.g., to test the line for selected line faults. The measurement unit 40 couples to the switch 15 via a test bus 42.
The device 43 connects to the switch 15 through the test bus 42 and a standard voice test access 44. The voice test access 44 electrically connects the device 43 to the subscriber lines 12-14 selected for testing. The voice test access 44 can transmit electrical signals having frequencies between about 100 hertz (Hz) and 20 kilohertz (kHz), i.e., low compared to frequencies used by ISDN and ADSL data services.
The measurement unit 40 is controlled by computer 46, which selects the types of measurements to perform and the subscriber lines 12-14 to test. The computer 46 sends control signals to the measurement unit 40 and receives measurement results from the measurement unit 40 via the same line 48.
The computer 46 executes a software program to control line testing by the measurement unit 40. The program also processes and interprets the results from the measurement unit 40 to determine whether to qualify or disqualify the lines 12-14 for preselected high-speed data services. The software program is stored, in executable form, in a data storage device 49, e.g., a hard drive or random access memory (RAM). The program may also be encoded on a readable storage medium 50, such as an optical or magnetic disk, from which the program can be executed.
To perform a test, the measurement unit 40 signals the voice test access 44 to disconnect a selected line 12-14 from the network 10 and to reconnect the line 12-14 to wires of the bus 42 connecting to the internal device 43. Then, the internal device 43 performs one-ended electrical measurements on the selected line 12-14. After the measurements are completed, the measurement unit 40 signals the switch 15 to reconnect the line 12-14 to the remainder of the POTS network 10.
The computer 46 can qualify or disqualify selected subscriber lines 12-14 for data services prior to fully connecting the lines 12-14 to the subscriber units 16-18. Qualification is based on determining, with high certainty, that a selected line 12-14 will support a specified data service. Disqualification is based on determining, with high certainty, that the selected line 12-14 will not support the specified data service.
FIG. 2 illustrates the device 43 that performs one-ended electrical measurements on the subscriber lines 12-14 of FIG. 1. The measurements may be used to speed qualify each line 12-14 at high frequencies as is described below. The measurements can also be used to detect line faults, such as bridged taps, gauge changes, split pairs, resistive imbalances, load coils, metallic faults such as shorts and open lines, and capacitive imbalances. Methods for detecting such faults with the device 43 have been described in U.S. application Ser. No. 09/294,563 ('563), filed Apr. 20, 1999 and U.S. application Ser. No. 09/285,954 ('954), filed Apr. 2, 1999. These applications are incorporated by reference, in their entirety, in the present application.
The device 43 is adapted to measure admittances between the tip wire T, ring wire R, and ground G for the subscriber line 12-14 being tested. The tip and ring wires T, R of the line 12-14 being tested couple to driving voltage sources V1 and V2 through known conductances Gt and Gr. The tip and ring wires T, R also connect to voltmeters Vt and Vr. The Vt and Vr voltmeters read the voltage between the tip wire T and ground G and between the ring wire R and ground G, respectively. The readings from the voltmeters Vt and Vr enable the computer 46 to determine three admittances Ytg, Ytr, and Yrg between the pairs tip-to-ground, tip-to-ring, and ring-to-ground, respectively. The device 43 can measure the admittances Ytg, Ytr, and Yrg at preselected frequencies of the range supported by the voice test access 44. The '563 application has described the steps for performing such measurements in detail.
The capacity of a subscriber line to support high-speed data transmissions is defined by values of the signal attenuation at high frequencies. For ISDN and ADSL services, the qualification classes for data transmission are defined as follows:
ISDN qualified if the attenuation is above −47 dB at 100 kHz;
ADSL qualified if the attenuation is above −40 dB at 300 kHz; and
Disqualified if the attenuation is in neither range.
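The thresholds above can be applied directly once the high-frequency attenuation is known or predicted. The following is a minimal sketch; the function and argument names are illustrative and not taken from the specification:

```python
def qualify(attenuation_100khz_db, attenuation_300khz_db):
    """Apply the ISDN/ADSL qualification thresholds listed above."""
    services = []
    if attenuation_100khz_db > -47.0:  # ISDN: above -47 dB at 100 kHz
        services.append("ISDN")
    if attenuation_300khz_db > -40.0:  # ADSL: above -40 dB at 300 kHz
        services.append("ADSL")
    return services or ["disqualified"]
```

For example, a line with −45 dB attenuation at both 100 kHz and 300 kHz would qualify for ISDN only.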
Other data services may entail different constraints on the signal attenuation at high frequencies. Nevertheless, qualification for any high-speed data service entails constraints at frequencies higher than those at which the device 43 performs electrical measurements.
Measuring line properties, e.g., the attenuation, directly at the frequencies that define the qualification classes is inconvenient, because the voice test access 44 only supports low frequency measurements. Thus, methods have been developed for extrapolating low frequency measurements to predict the high frequency attenuation. Those extrapolations are prone to inaccuracies and have resulted in mispredictions of line speed qualification status.
Qualification with Neural Networks
As used herein, a neural network is a process that generates, from an input feature vector, a set of confidence values that indicate probabilities of class memberships. The input feature vector has properties of a subscriber line as components, e.g., one-ended electrical measurements and electrical properties derived therefrom. The various embodiments use neural networks in which the classes are qualification classes for high-speed data services or classes of lines with or without selected types of conditions or faults.
FIG. 3A illustrates a method 100 that uses low frequency measurements and a feed-forward neural network to speed qualify or disqualify the subscriber lines 12-14 for high-speed data services, such as ISDN and ADSL. The neural network can increase the accuracy of qualification tests based on low frequency measurements. For example, one embodiment that uses a first-order neural network has correctly predicted the capacity of unknown subscriber lines to support ADSL and ISDN with an accuracy of more than 90% in test situations.
To test a selected subscriber line 12-14, the system 11 evaluates a set of preselected electrical properties for the selected line 12-14 (step 101). The preselected properties include one-ended electrical measurements made with the measurement unit 40 and other properties directly derived from such measurements. The measured properties include the one-ended admittances Ytg, Ytr, and Yrg, and the derived properties include ratios of admittances and capacitances at preselected frequencies. From the selected electrical properties, the computer 46 creates an electrical feature vector whose ordered components are the values of the preselected electrical properties (step 102). After creating the feature vector, the computer 46 processes the vector with a feed-forward neural network, which has a multi-layered structure (step 103). From an input feature vector, the neural network predicts whether the associated line 12-14 qualifies or disqualifies for a preselected set of high-speed data services, e.g., ISDN and/or ADSL. The neural network is encoded in a portion of the software program, which is stored in the data storage device 49 and/or the readable medium 50 in a computer-executable form.
Some embodiments combine the method 100, based on a neural network, with methods for detecting faults to predict whether a subscriber line qualifies for high-speed data services. In such embodiments, the prediction of whether a line is qualified is based both on the predictions of the neural network and on whether the line has a line fault. For example, the existence of a fault may be a quality factor that can modify the confidence levels for class membership, which are predicted by the neural network.
FIG. 3B is a flow chart showing a method 104 of obtaining smoothed electrical properties from one-ended measurements on a subscriber line 12-14. The measurement unit 40 measures admittances Ytg, Ytr, and/or Yrg of the line 12-14 at a preselected set of frequencies “f” (step 105). One embodiment uses a set of 45 frequencies starting at 150 Hz and having an increment of 450 Hz between adjacent frequencies.
The measurements of the admittances Ytg, Ytr, and/or Yrg are susceptible to noise. The noise can corrupt the measurements enough to interfere with results obtained from the neural network. To reduce noise interference, the computer 46 smoothes the measured admittances (step 106).
To implement smoothing, the computer 46 expands each measured admittance “YS” with a polynomial BS(f) of the form:
B_S(f) = Σ_{j=0}^{D} δ_{S,j} P_j(f).
In the expansion, the P_j(f)'s form an orthogonal polynomial basis, e.g., Legendre polynomials. For measurements at the above-described forty-five frequencies, an expansion with the eight lowest P_j(f)'s can produce adequate smoothing, i.e., D=8. The computer 46 determines the coefficients, δ_{S,j}, from the measured Y_S(f)'s using the formula:
δ_{S,j} = <Y_S(f), P_j(f)>.
Here, <,> is the inner product for which the P_j(f)'s are orthonormal. The values of the polynomial B_S(f) at a preselected set of frequencies “f” define the smoothed admittances of the line 12-14, which are inputted into the neural network. In the polynomial B_S(f), noise effects are reduced by averaging.
The computer 46 calculates derived electrical properties from the polynomial Bs(f) instead of directly from measurements (step 107). The derived properties may include capacitances, ratios of admittances and derived properties at fixed frequencies, ratios of admittances to frequency, and peaks and valleys of the frequency-dependent admittances. Again, the use of the smoothing polynomial, BS(f), to calculate these properties reduces noise effects. In step 102 of FIG. 3A, the feature vector for the subscriber line 12-14 is a concatenation of measured and derived electrical properties that have been determined from BS(f).
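The smoothing of steps 105-107 can be sketched as a discrete least-squares projection onto a Legendre basis, the discrete analog of the inner-product formula above. In this sketch the helper name and the synthetic noise model are illustrative assumptions; numpy's `Legendre.fit` rescales the frequency interval internally:

```python
import numpy as np
from numpy.polynomial.legendre import Legendre

def smooth_admittance(freqs, y_measured, degree=8):
    """Project noisy admittance samples onto the lowest-order Legendre
    polynomials and return the smoothed values B_S(f) at the same
    frequencies; truncating the expansion averages out measurement noise."""
    series = Legendre.fit(freqs, y_measured, deg=degree)
    return series(freqs)

# Example: 45 frequencies starting at 150 Hz with 450 Hz increments
freqs = 150.0 + 450.0 * np.arange(45)
rng = np.random.default_rng(0)
noisy = 1e-6 * freqs + rng.normal(0.0, 1e-5, freqs.size)  # hypothetical data
smoothed = smooth_admittance(freqs, noisy)
```

Derived properties (capacitances, admittance ratios, peaks and valleys) would then be computed from the smoothed curve rather than the raw samples.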
FIG. 4A is a flow chart 110 illustrating step 103 of FIG. 3A in which the computer 46 processes the feature vector to determine the qualification status of the associated line 12-14. The processing occurs sequentially in three layers of the neural network 115 shown in FIG. 4B.
In the first or compression layer 116 of the neural network 115, the computer 46 compresses an N-dimensional feature vector Za by performing a projection into a preselected M-dimensional subspace of the entire N-dimensional feature space (step 112). To perform the projection, the computer 46 evaluates scalar products between the feature vector Za and a special set of orthonormal basis vectors spanning the subspace. Since the subspace has a lower dimension (M<N), the projection produces a compressed feature vector Xa with fewer components than the input feature vector Za, thereby making subsequent processing more rapid. The projection removes components of the feature vectors that do not vary substantially over sample lines of a training set. Such components are less indicative of the qualification status of the associated lines, because they have similar values for both qualified and disqualified lines.
In the second or basis layer 117 of the neural network 115, the computer 46 determines whether the compressed vector Xa is a member of one or more clusters of feature vectors located in the projected subspace (step 113). To determine cluster memberships, the computer 46 evaluates fuzzy variables whose values represent probabilities of belonging to the clusters associated with the variables.
The clusters are localized groups of compressed feature vectors associated with the sample lines of the training set. Each cluster is located in the projected subspace of the entire feature space and can be modeled as a hyper-ellipsoidal region having a fuzzy boundary. Fuzzy functions on the clusters form the basis layer 117 of the neural network 115.
For a cluster K, the associated fuzzy variable yK is a fuzzy Horn clause. The value yK of the fuzzy Horn clause for a projected feature vector X is given by:
y_K(X) = exp[−(X − V_K)^t D_K^2 (X − V_K)].
V_K is the location of the center of the cluster K in the projected feature vector space. D_K^2 is a diagonal matrix whose entries, σ_{Ki}^{-2}, are the inverse squared widths of the hyper-ellipsoidal region along the directions “i”. The σ_{Ki}'s express the fuzziness of the cluster K. The value y_K(X) is indicative of the probability that the line 12-14 with projected feature vector X is a member of the cluster K.
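The fuzzy Horn clause above is a Gaussian radial basis function with a separate width per axis. A minimal numpy sketch, with illustrative names:

```python
import numpy as np

def cluster_membership(x, center, widths):
    """y_K(X) = exp[-(X - V_K)^t D_K^2 (X - V_K)], where D_K^2 is
    diagonal with entries 1/widths**2; returns a value in (0, 1]."""
    z = (x - center) / widths  # per-axis distance in units of cluster width
    return float(np.exp(-np.dot(z, z)))
```

At the cluster center the membership is exactly 1, and it falls to exp(−1) when the vector moves one width σ_{Ki} away from the center along a single axis.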
In the last or output layer 118 of the neural network 115, the computer 46 predicts the selected line's membership to each qualification class (step 114). The class membership predictions produce confidence values C_Q for membership to each qualification class, Q, recognized by the neural network. The confidence value C_Q is defined by a class membership function whose form depends on an r×C matrix of weight parameters, W_Q. C is the number of clusters in the class Q, and r is a fixed non-negative integer that defines how results are biased, e.g., r=0 or 1. For class Q, the confidence value C_Q is given by:
C_Q(X) = {1 + exp[−(X_r, 1)^t W_Q Y(X)]}^{−1}.
Here, Y(X) is the vector [y_1(X), …, y_C(X), 1]^t of the fuzzy variables representing the probabilities that the projected feature vector X is a member of the C clusters of the class Q. X_r is a dimensionally reduced version of the projected feature vector X in which only the “r” most significant terms are retained.
The confidence values CQ's express a relationship between cluster membership and class membership. The relationship quantifies the fact that projected feature vectors of a single cluster have similar qualification class memberships.
For each feature vector, the neural network 115 generates a set of confidence values that indicate the probability of membership to each qualification class at the output 119. The computer 46 reports that the associated subscriber line 12-14 belongs to the qualification class for which the confidence value is highest.
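The output layer's class membership function can be sketched as a bilinear form squashed by a logistic sigmoid. The sketch below assumes W_Q is stored with one extra row and column so that the trailing 1's of (X_r, 1) and [y_1(X), …, y_C(X), 1] act as bias terms; this shape convention is an assumption, since the specification only calls W_Q an r×C matrix:

```python
import numpy as np

def confidence(x_r, y_clusters, W):
    """C_Q(X) = {1 + exp[-(X_r, 1)^t W_Q Y(X)]}^-1."""
    xb = np.append(x_r, 1.0)         # (X_r, 1)
    yb = np.append(y_clusters, 1.0)  # [y_1(X), ..., y_C(X), 1]
    return float(1.0 / (1.0 + np.exp(-(xb @ W @ yb))))
```

With all weights zero the sigmoid sits at 0.5; positive weights push the confidence toward 1 as the fuzzy cluster memberships grow.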
The form of the neural network 115 is fixed by the parameters defining each of the three layers 116-118. The parameters define the feature space projection of the compression layer 116, the cluster geometry of the basis layer 117, and the weights of the output layer 118, which relate memberships to clusters and qualification classes.
FIG. 5 illustrates a method 120 for constructing the feed-forward neural network 115 used in FIGS. 3A, 4A, and 4B. To construct the neural network 115, data for a training set of sample lines is acquired through direct measurements and/or calculations (step 122). The data includes both an electrical feature vector and a qualification classification for each sample line. The properties of the sample lines belonging to the training set may correspond to the expected line properties for the unknown subscriber lines to be tested with the neural network 115. When such a correspondence exists, the neural network is generally expected to predict line qualification status with a higher accuracy. From the training set, the computer 46 determines the parameters that define the neural network 115 by a learning process (step 124).
Locating the capacities both to create and to use the neural network 115 on the same computer 46 has advantages. The system 11 is then more adaptable to changes in the general population of subscriber lines 12-14. In response to changes, the computer 46 re-adapts the neural network 115 by recalculating the network parameters with a training set whose properties are more aligned with the changed population of subscriber lines 12-14.
FIG. 6 is a flow chart illustrating a method 125 of determining the parameters that define the compression layer 116 of the neural network 115. The projection, which produces the compression, employs a Karhunen-Loeve (KL) transformation. The KL transformation provides an expansion of feature vectors with a special orthonormal basis of N vectors, φi. In this basis, an arbitrary feature vector, X, is written as:
X = Σ_{i=1}^{N} k_i φ_i + <X>.
Here, <> denotes an average over the sample lines in the training set, i.e., <X> is the average of the feature vector.
Each basis vector, φi, is an eigenvector of a covariance matrix R, which is defined as an average over the feature vectors of the training set. The matrix R is:
R = <(X − <X>)(X − <X>)^t>.
The basis vectors φi satisfy the equations:
R φ_i = λ_i φ_i,
in which the λi are the eigenvalues.
To determine the parameters of the compression layer 116, the computer 46 evaluates the covariance matrix R on the feature vectors of the training set (step 126). Then, the computer 46 determines the eigenvalues and eigenvectors of the covariance matrix R (step 127).
In terms of the expansion over the special basis {φi}, the compression layer 116 compresses a feature vector X by simply deleting components of vector X to produce a truncated vector, XP. The truncated vector XP is given by:
X_P = Σ_{i=1}^{M} k_i φ_i.
The truncation deletes the terms of the expansion of X over φi's corresponding to small eigenvalues λi of the covariance matrix R. The retained terms correspond to φi's whose eigenvalues λi are greater than a preselected threshold, e.g., max(λi)/250 (step 128). The coefficients (k1, k2, . . . , kM) for the retained terms are given by:
(k_1, k_2, …, k_M) = (φ_1, φ_2, …, φ_M)^t (X − <X>).
By only retaining M eigenvectors associated with the M largest eigenvalues, the dimension of the feature vectors is reduced by compression from N to M. The eigenvectors φi, for i=1, . . . , M, define the compression layer 116.
The truncation of the feature vector X generates an error vector “e”, which is given by:
e = Σ_{i=M+1}^{N} k_i φ_i.
The error vector “e”, corresponds to a sum of squares error, ε, which has the form:
ε = Σ_{i=M+1}^{N} k_i^2 = Σ_{i=M+1}^{N} φ_i^t (X − <X>)(X − <X>)^t φ_i.
Thus, the average of the error ε over the training set is:
<ε> = Σ_{i=M+1}^{N} φ_i^t R φ_i = Σ_{i=M+1}^{N} λ_i.
Since the λi appearing in <ε> are the smallest eigenvalues of R, the feature space projection of step 112 is optimal. The feature space projection reduces the size of feature vectors and generates minimal “average” errors. For example, one embodiment that truncates eigenvectors having eigenvalues smaller than max(λi)/250 removes about two thirds of the components of feature vectors.
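The compression layer can thus be sketched as principal-component truncation: diagonalize R over the training set and keep the eigenvectors whose eigenvalues exceed the threshold. Function names and the synthetic data below are illustrative:

```python
import numpy as np

def kl_compress(X_train, threshold_ratio=250.0):
    """Karhunen-Loeve compression (steps 126-128): keep eigenvectors of
    the training-set covariance R whose eigenvalues exceed
    max(lambda)/threshold_ratio, and return a projector onto them."""
    mean = X_train.mean(axis=0)
    R = np.cov(X_train, rowvar=False, bias=True)  # R = <(X-<X>)(X-<X>)^t>
    lam, phi = np.linalg.eigh(R)                  # eigenvalues, ascending
    keep = lam > lam.max() / threshold_ratio
    basis = phi[:, keep]                          # retained phi_i as columns
    def project(x):
        return basis.T @ (x - mean)               # coefficients k_1 .. k_M
    return project, basis, mean

# Example: 5-dimensional features with variance concentrated in 2 directions
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5)) * np.array([10.0, 5.0, 0.01, 0.01, 0.01])
project, basis, mean = kl_compress(X)
```

Here the three nearly constant components are discarded, so compressed vectors have two components instead of five.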
FIG. 7 is a flow chart illustrating a method 130 of determining the parameters that define the basis layer 117 of clusters in projected feature space. The method uses fuzzy C-means clustering to recursively determine the basis layer from the data in the training set. At each recursion, both the center location Va of each cluster “a” and the membership values μma are updated by minimizing a fuzzy cost function, E, over the data in the training set. Minimizing E reduces inter-cluster overlaps by grouping together projected feature vectors based on both geometry and a relation. The relation assigns each feature vector of the training set to the clusters of a single qualification class, i.e., to the class to which the associated sample line belongs. Thus, each feature vector of the training set is used to construct the clusters of one qualification class.
For each class, the fuzzy cost function E provides a fuzzy measure of clustering. Each fuzzy cost function E is defined by:
E = Σ_{a=1}^{C} Σ_m (μ_ma)^{fuzz} |X_m − V_a|^2,
where the sum runs over the “C” clusters of the associated qualification class. For each projected feature vector X_m, the fuzzy variables μ_ma are indicative of membership to cluster “a”. When summed over the C clusters of the class to which the associated sample line belongs, the μ_ma add up to one, i.e., Σ_{a=1}^{C} μ_ma = 1.
To find the parameters defining the functions of the basis layer 117, the computer 46 searches for clusters in the distribution of projected feature vectors for the training set. Before starting a search, the computer 46 receives a value for the number C of clusters in a qualification class, e.g., a guessed value of C provided by an operator (step 132). After receiving the number of clusters, the computer 46 guesses the center location Va for each cluster “a” of the class (step 134). From the guessed center locations, the computer 46 calculates values for cluster membership variables μma associated with each feature vector Xm of a sample line in the same class (step 136). The cluster membership variables are found from the center locations, by solving the equations:
(μ_ma)^{−1} = Σ_{a′=1}^{C} [ |X_m − V_a| / |X_m − V_a′| ]^{2/(fuzz−1)}.
Here, “fuzz” is a preselected fuzziness parameter greater than 1; e.g., fuzz may be between 1.6 and 1.7.
Using the cluster membership variables μma's, the computer 46 updates the center locations, i.e., the Va's, of the clusters (step 138). The updated center locations are determined from the equation:
V_a = [Σ_m (μ_ma)^{fuzz} X_m] / [Σ_{m′} (μ_{m′a})^{fuzz}].
The sums are over projected feature vectors Xm and Xm′ of the training set. Next, the computer 46 determines whether distances between the present and last values of the center locations of the clusters are small enough, i.e., less than a preselected threshold value (step 140). If any differences exceed the threshold value, the computer 46 executes loop 142 to start another cycle of updating the geometric variables Va and μma, which define the clusters.
If all the distances between the present and last center locations are below the threshold value, the computer 46 calculates the widths of each cluster (step 142). Since each cluster “a” is a hyper-ellipsoidal region, the width σab of cluster “a” along direction “b” is related to the range of projected feature vectors belonging to the cluster “a” along the direction “b”.
One definition of the width σab is given by:
σ_ab = [Σ_m μ_ma |X_{m,b} − V_{a,b}|^2]^{1/2}.
Here, the component of a vector Z along direction “b” is denoted Z_b.
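The recursion of steps 134-142 is fuzzy C-means clustering. A compact numpy sketch for a single qualification class follows; the deterministic center initialization and parameter values are illustrative assumptions, since the specification starts from an operator-supplied guess:

```python
import numpy as np

def fuzzy_c_means(X, C, fuzz=1.65, tol=1e-5, max_iter=200):
    """Alternate the membership update and the center update until the
    centers move less than tol (the convergence test of step 140)."""
    idx = np.linspace(0, len(X) - 1, C).astype(int)  # spread initial guesses
    V = X[idx].copy()
    for _ in range(max_iter):
        # (mu_ma)^-1 = sum_a' (|X_m - V_a| / |X_m - V_a'|)^(2/(fuzz-1))
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
        mu = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (fuzz - 1.0))).sum(axis=2)
        # V_a = sum_m mu_ma^fuzz X_m / sum_m mu_ma^fuzz
        w = mu ** fuzz
        V_new = (w.T @ X) / w.sum(axis=0)[:, None]
        converged = np.linalg.norm(V_new - V) < tol
        V = V_new
        if converged:
            break
    return V, mu
```

The fuzzy widths σ_ab of step 142 can then be computed from the converged memberships μ_ma and centers V_a.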
FIG. 8 is a flow chart illustrating an iterative method 150 of determining the weights W_Q in the class membership functions C_Q(X). The method is based on a Gaussian noise model for the learning process in which the Nth recursive estimate for the confidence value Z_1 is given by:
Z_1(N) = f_1(N) + δ_1.
The model's noise δ_1 is Gaussian with zero mean, i.e., <δ_1> = 0, and variance <δ_1 δ_1> = Ω_1^2.
The weight parameters WQ are determined by using an extended Kalman estimator. The extended Kalman estimator approximates the nonlinear function f1(N) of the weights WQ by linearizing the dependence on WQ to obtain the form:
Z_1(N) = Σ_a H_{1,a}(N) W_a(N) + δ_1, with H_{1,a}(N) = ∂f_1(N)/∂W_a(N).
Henceforth, the index “a” runs over all matrix indices of the weight matrix W_a(N). W_a(N) is the Nth recursive estimate of the optimal weight matrix W_a(∞).
To determine the weight matrix, the computer 46 employs a recursive algorithm. The computer 46 initializes the weights and the covariance, P(N), of the error, Er(N), in the weights over the training set (step 152). For the Nth estimate, the error in the weights is Er(N) = W_a(N) − W_a(∞). The initial values, i.e., the 0th estimates, can be chosen to be P(0) = 10^3 I and W_a(0) = 0, where I is the unit matrix on the space of weight matrices W_a.
The computer 46 calculates new or Nth estimates of Ka(N), P1(N), Wa(N), f1(N), and H1,a(N) using the last or (N−1)th estimates to these quantities (step 154). At the Nth iteration, the calculation entails solving the following equations sequentially:
K_a(N) = Σ_b P_{1,ab}(N−1) H_{1,b}(N−1)^t / (Ω_1^2 + H_1(N−1) P_1(N−1) H_1(N−1)^t),
P_1(N) = [I − Σ_a K_a(N) H_{1,a}(N)] P_1(N−1),
W_a(N) = W_a(N−1) + K_a(N)[M_1(X) − f_1(N−1)],
f_1(N) = f_1(W_a(N)),
and
H_{1,a}(N) = ∂f_1(N)/∂W_a(N).
The update is performed for each vector X in the training set. M_1(X) is the actual class membership value of X in class “1”, i.e., 1 or 0. After calculating the Nth approximation for each feature vector, the computer calculates the squared error in the weight vectors averaged over the training set, i.e., <tr[P_1(N)]> (step 156). The computer 46 determines whether the error, i.e., <tr[P_1(N)]>, is less than a preselected value (step 158). If the error is below the preselected value, the computer 46 records the value of the weight matrix as the form to be used in the neural network 115 (step 160). Otherwise, the computer 46 performs a loop 162 to calculate the (N+1)th estimates, i.e., K_a(N+1), P_1(N+1), W_a(N+1), f_1(N+1), and H_{1,a}(N+1).
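A simplified sketch of this recursion follows, for a single logistic output f(W) = sigmoid(φ^t W) over a flattened weight vector. Flattening the weight matrix to a vector, the synthetic data, and the fixed epoch count standing in for the <tr[P_1(N)]> stopping test of steps 156-158 are all illustrative assumptions:

```python
import numpy as np

def ekf_train_weights(Phi, targets, omega2=1.0, p0=1e3, max_epochs=50):
    """Extended Kalman estimation of logistic weights: linearize
    f(W) = sigmoid(phi @ W) around the current estimate, then apply the
    gain, covariance, and weight updates described above."""
    n = Phi.shape[1]
    W = np.zeros(n)          # W(0) = 0
    P = p0 * np.eye(n)       # P(0) = p0 * I
    for _ in range(max_epochs):
        for phi, m in zip(Phi, targets):
            f = 1.0 / (1.0 + np.exp(-phi @ W))
            H = f * (1.0 - f) * phi               # H = df/dW (linearization)
            K = P @ H / (omega2 + H @ P @ H)      # Kalman gain
            W = W + K * (m - f)                   # pull f toward M(X)
            P = (np.eye(n) - np.outer(K, H)) @ P  # covariance update
    return W

# Hypothetical separable training data: sign of the first feature decides class
Phi = np.array([[-2.0, 1.0], [-1.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
targets = np.array([0.0, 0.0, 1.0, 1.0])
W = ekf_train_weights(Phi, targets)
```

After training, the learned weights separate the two classes of the toy data.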
Other embodiments are within the scope of the following claims.