EP2174267A2

EP2174267A2 - Method and apparatus for automatic pattern recognition

Info

Publication number: EP2174267A2
Application number: EP08801094A
Authority: EP
Inventors: Clemens GÜHMANN; Steffen KÜHN
Original assignee: Technische Universitaet Berlin
Current assignee: Technische Universitaet Berlin
Priority date: 2007-07-31
Filing date: 2008-07-31
Publication date: 2010-04-14
Also published as: DE102007036277A1; US20100217572A1; WO2009015655A2; WO2009015655A3

Abstract

The invention relates to a method for automatic pattern recognition in a sequence of electronic data by means of electronic data processing in a data processing system, in which in an analysis the sequence of electronic data is compared with parametrized model data which represent at least one pattern sequence, and in which the at least one pattern sequence is recognized if it is ascertained during the analysis that model data, which the parametrized model data comprise and which are associated with the at least one pattern sequence, occur with a similarity measure which exceeds a similarity measure threshold, wherein during the formation of the parametrized model data, training data are processed by means of a dynamic time warping method to form a set of feature vectors of the same length and with the same information content as the training data from which the parametrized model data are derived. Furthermore, the invention relates to an apparatus for automatic pattern recognition in a sequence of electronic data by means of electronic data processing with a data processing system.

Description

Method and device for automatic pattern recognition

The invention relates to a method and a device for automatic pattern recognition in a sequence of electronic data by means of electronic data processing in a data processing system.

Background of the invention

In general, the goal of such pattern recognition is to detect the occurrence of sequences, or of sequences of properties, in sequentially structured electronic data. In many practical applications the patterns to be found cannot be defined exactly, because they may vary in shape and extent. One example is machine speech recognition, the task in whose context the essential standard methods of the prior art were developed. Another application is finding fault patterns in machine signals, for example the detection of knocking burns in gasoline engines by means of structure-borne noise signals, which poses a very similar problem (Lachmann et al.: Detecting knocking burns from disturbed knock sensor signals by means of signal separation, Sensory in the motor vehicle, Expert Verlag, 114-123). Such methods are also required, for example, for searching motor vehicle CAN bus data for error patterns (Isernhagen et al.: Intelligent Signal Processing in an Automated Measurement Data Analysis System. In Proceedings of the 2007 IEEE Symposium on Computational Intelligence in Image and Signal Processing (CIISP 2007), pp. 83-87, 2007) or for comparing actual and setpoint curves during specification checking (Rebeschieß et al.: Automated closed-loop software test of embedded motor control functions, 11. Software & Systems Quality Conferences 2006, 7th ICS Test, 2006).

In the field of speech recognition, hidden Markov models (HMMs) have become established as the solution to the problem of sequence classification and represent the state of the art (Gernot: pattern recognition with Markov models, Teubner, 2003). The basic idea is to describe a sequence as the result of a chain of probability density distributions. In addition, the transition from one distribution to the subsequent distributions is modeled statistically. In the context of pattern recognition, HMMs are therefore also referred to as two-stage stochastic processes. They are quite powerful, but also have disadvantages.

The classification and recognition of sequences appears to be fundamentally different from conventional pattern recognition tasks, in which feature vectors of fixed dimension are analyzed. Methods and devices for such conventional pattern recognition are known, for example, from the documents DE 694 25 166 T2, DE 697 04 201 T2 and DE 10 2006 045 218 A1, and are treated comprehensively in the specialist literature (cf., for example, Duda et al.: Pattern Classification, John Wiley & Sons, 2000). They all have in common that they rely on the estimation of a probability distribution per class, or at least on the estimation of class boundaries. HMMs differ significantly from this because of the different structure of the data being analyzed. HMMs analyze sequences, i.e. successions of features, numerical values, symbols or vectors. One problem is that the pattern sequences tend to vary in length, so that two pattern sequences of different length may belong to the same class. Sequences are thus not vectors; that is, there is no feature space and no probability distribution can be determined. This rules out the use of feature-vector-based classifiers.

The approach taken by HMMs is to regard an observed sequence $O = \{x_1, \ldots, x_n\}$, called the observation sequence in HMM terminology, as the realization of a sequence of random variables $S_1, S_2, \ldots, S_m$. This implies an additional hidden stage, because a deterministic assignment of a concrete observation $x_t$ with $t \in [1, n]$ to a random variable $S_\tau$ with $\tau \in [1, m]$ is not possible. The assignment is therefore described by a stochastic process that models the transition from one state variable to the next by means of transition probabilities. In this way the special form of the data is taken into account. However, this architecture also has disadvantages, since the two-stage structure significantly increases the complexity compared to feature-vector-based classifiers. The model parameters therefore have to be optimized numerically, which does not always lead to good parameter values and is also computationally expensive.

Another limitation of HMMs is that they are parametric models, that is, they impose a restrictive framework that does not always fit. Parametric models are therefore often affected simultaneously by under- and overfitting to the example data. For example, HMMs fundamentally require the Markov property to be satisfied. Another example is the assumption of temporal invariance within a state. As a rule, neither assumption is ever fully satisfied, which results in a fundamental structural underfitting.

A pattern recognition method that deals with the recognition of feature sequences, specifically with the recognition of speech, is described in DE 697 11 392 T2. Another field of application for the pattern recognition of feature sequences is knock detection in engines. This is discussed in more detail below.

Knocking burns are unwanted deviations from normal combustion. Normal burns are triggered by the spark of the spark plug and are associated with a moderate pressure increase in the cylinder. Knocking burns, however, generate high pressure peaks and can thus damage the engine. They often occur when ignition takes place too early. Later ignition can remedy this, but it reduces engine performance and thus increases fuel consumption. It therefore makes sense to choose the ignition timing such that knocking just barely does not occur. Since the knock tendency of an engine depends on external influences, a knock-dependent control of the ignition timing is required. Reliable detection of knocking burns is indispensable for this.

In principle, a knocking combustion can be determined on the basis of the pressure curve in the interior of the cylinder. However, sensors for measuring this quantity are expensive and wear out quickly, so that other means must be used for series production. Structure-borne noise sensors attached to the engine block are inexpensive and provide indirect information about the combustion taking place inside the engine. In particular, knocking burns can be detected via sound peaks. The advantages of using structure-borne noise instead of pressure come at the cost of a more complicated and error-prone evaluation, because other effects can also be noticeable in the structure-borne noise. For the detection of knocking burns on the basis of structure-borne noise signals, digital filters for recognizing knock-typical frequencies (see DE 101 38 110 A1) are known, as are simple feature-vector-based classifiers (see DE 103 52 860 A1) that operate on specific characteristic values, features or the like obtained from the signal (see EP 1 309 841 B1 or EP 1 184 651 A2). Such methods are error-prone in principle, since much relevant information, especially about temporal dependencies, is usually lost when the features are formed. In document DE 103 00 204 A1, this disadvantage is alleviated by the formation of time windows. The resulting structure can be interpreted as a simple state machine.

Other methods attempt to generate a virtual pressure signal from the structure-borne sound signal. In the document DE 197 41 884 C2, for example, a neural network is used for this purpose. However, neural networks are difficult to use and do not always lead to reproducible results, since many parameters (network structure, transfer functions) must be specified a priori. The weights of the network must be optimized numerically, and often only suboptimal solutions are found.

HMMs are an alternative approach. Here, the temporal and spectral variability of the signals is described in the form of a stochastic automaton on the basis of a given example or training data set. For this purpose, the actual structure-borne sound signals are converted into time sequences of spectral vectors using the STFT (short-time Fourier transform). The temporal pattern of the spectral vectors, i.e. the feature sequences, can be modeled by an HMM.

Despite the general suitability of HMMs, these can only be used to a limited extent for knock detection, since HMMs can only model relatively short sequences, preferably short, non-stochastic sequences, due to the averaging properties of the states. Furthermore, they have similar disadvantages as neural networks.

Summary of the invention

The object of the invention is to specify a method and a device for automatic pattern recognition in a sequence of electronic data by means of electronic data processing in a data processing system, with which patterns in the sequence of electronic data can be detected reliably and in a simplified manner.

According to the invention the object is achieved by a method for automatic pattern recognition according to independent claim 1 and an apparatus for automatic pattern recognition according to independent claim 5.

The invention encompasses the idea of a method for automatic pattern recognition in a sequence of electronic data by means of electronic data processing in a data processing system, in which, in an analysis, the sequence of electronic data is compared with parameterized model data representing at least one pattern sequence, and in which the at least one pattern sequence is recognized when it is determined in the analysis that model data which are comprised by the parameterized model data and are associated with the at least one pattern sequence occur with a similarity measure exceeding a similarity threshold, wherein, in forming the parameterized model data, training data are processed by means of a dynamic time-warping method into a set of feature vectors of equal length and with the same information content as the training data, from which set the parameterized model data are derived.

According to another aspect of the invention, there is provided an apparatus for automatic pattern recognition in a sequence of electronic data by means of electronic data processing with a data processing system, comprising:

pattern recognition means configured to compare, in an analysis, the sequence of electronic data with parameterized model data representing at least one pattern sequence, and to recognize the at least one pattern sequence if it is determined in the analysis that model data which are comprised by the parameterized model data and are associated with the at least one pattern sequence occur with a similarity measure exceeding a similarity threshold,

model data generating means configured to generate the parameterized model data from training data and, in doing so, to process the training data by means of a dynamic time-warping method into a set of feature vectors of equal length and with the same information content as the training data, from which set the parameterized model data are derived, and

providing means configured to provide electronically evaluable recognition information about the recognition of the at least one pattern sequence for output.

By converting the training or example data with a dynamic time-warping method (Myers et al.: A Comparative Study of Several Dynamic Time-Warping Algorithms for Connected Word Recognition, The Bell System Technical Journal, 60(7): 1389-1409, September 1981) into the set of feature vectors of equal length and with the same information content as the training data, a component-by-component comparison becomes possible in pattern recognition. Sequences that vary in length do not allow this. From the training data, feature vectors of fixed dimension are generated that have the same information content as the training or example data. Conversion into feature vectors of equal information content means that the training data can be reconstructed from the set of feature vectors without additional information. In particular, the temporal distortion information inherent in the training data is retained. The result is a set of feature vectors that can be evaluated by any classical feature-vector-based classifier. The problem of pattern recognition is thus reduced to a classification task that is known as such. No two-stage stochastic processes are needed, as is the case with HMMs.

A preferred embodiment of the invention provides that the parameterized model data are derived from the set of feature vectors by parameterizing a feature vector-based classifier.

In an expedient embodiment of the invention it can be provided that a Bayes classifier with kernel window density estimation is used as the feature vector-based classifier. An expedient development of the invention provides that the similarity measure for a subsequence of electronic data examined at the time j of the analysis is determined from the sequence of electronic data as follows:

$$L(i, j) := \max_{1 \le a \le a_m} \left\{ L(i-1,\, j-a) + \log\left(p_{t,i}(a)\right) \right\} + c \log\left(p_{e,i}(x_j)\right)$$

where $x_j$ are the elements of the sequence of electronic data, $p_{t,i}(\cdot)$ and $p_{e,i}(\cdot)$ are the $i$-th elements of a total of $N$ elements of the parameterized model data, and $c$ and $a_m$ are constants to be chosen empirically. The sought similarity measure at time $j$ is $L(N, j)$.

The method may be used in conjunction with various automatic pattern recognition technologies including, but not limited to, machine signal analysis such as engine knock analysis, ECG signal analysis, speech recognition, gene sequence analysis, image analysis, and thermal image data evaluation, for example for the quality control of machine-forged components. In each case, the data to be analyzed and the example or training data are available in electronic form, together with the corresponding measurement or analysis variables.

Description of preferred embodiments of the invention

In the following, the invention is explained in more detail by means of embodiments with reference to the figures of a drawing. The figures show:

Fig. 1 a schematic representation of the structure of a knock control for an engine,

Fig. 2 an example of the data to be processed in the knock control, and

Fig. 3 a schematic representation describing the relationship between measured structure-borne sound signals and sequentially arranged electronic data.

The pattern recognition method comprises three sub-aspects that can be considered separately, namely (i) a data set transformation, (ii) a parameter determination of a model, and (iii) the application of the parameterized model for recognizing sequences in sequentially arranged electronic data, which in turn can represent a wide variety of information content. In a first step, an example or training data set is transformed into feature vectors, which makes hidden random variables accessible and direct comparison possible. It is assumed that there are three training or example sequences for the parameter determination:

$$\begin{aligned}
S_1 &= \{a, a, b, b, b, d, d, d, e, f, g\} \\
S_2 &= \{a, a, a, b, b, c, c, d, d, e, e, f, f, f, g, g\} \\
S_3 &= \{a, b, b, b, c, d, d, e, f, f, g, g\}.
\end{aligned} \tag{1}$$

To keep the explanation simple, symbol sequences were used. Instead of symbols, real numbers or vectors can also be used. For these, only a comparison criterion is needed: For real numbers, for example, the absolute value of the difference and for vectors a distance measure, such as the Euclidean distance. For symbols, the comparison criterion is degenerate in that the distance is zero if two symbols are equal, otherwise the distance is one.
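The comparison criteria just described are exactly what a dynamic time-warping alignment, as referred to in the summary, needs as input. The following is a minimal sketch of a textbook pairwise alignment, not the specific multi-sequence equalization used in this document; the function names `symbol_distance`, `scalar_distance`, `euclidean_distance` and `dtw` are illustrative assumptions.

```python
import math


def symbol_distance(a, b):
    """Degenerate criterion for symbols: 0 if equal, otherwise 1."""
    return 0.0 if a == b else 1.0


def scalar_distance(a, b):
    """Absolute value of the difference of two real numbers."""
    return abs(a - b)


def euclidean_distance(u, v):
    """Euclidean distance between two vectors of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def dtw(seq_a, seq_b, dist):
    """Return (total cost, warping path) of a classic pairwise DTW alignment."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i and first j elements.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end to recover which elements were paired.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


# Two of the example sequences from (1), aligned with the symbol criterion.
total_cost, warping_path = dtw(list("aabbbdddefg"), list("abbbcddeffgg"), symbol_distance)
```

An index that appears several times in `warping_path` marks an element that had to be repeated, which corresponds to the idea of stretching a sequence so that it can be compared position by position.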

In the various applications, the example or training data set each represents electronically evaluable information about one or more patterns of measurable size to be later recognized.

It can be seen that the three sequences (1) contain nonlinear distortions. These can be compensated. An equalization yields:

$$\begin{aligned}
S_1 &= \{a, a, *, b, b, b, *, *, d, d, d, e, *, f, *, *, g, *\} \\
S_2 &= \{a, a, a, b, b, *, c, c, d, d, *, e, e, f, f, f, g, g\} \\
S_3 &= \{a, *, *, b, b, b, c, *, d, d, *, e, *, f, f, *, g, g\}.
\end{aligned} \tag{2}$$

In order to make the sequences equal, stars have been inserted, each of which indicates a required repetition of the predecessor symbol. For sequences of real numbers or vectors, complete equality cannot be achieved by equalization. However, an equalization that minimizes the distance between the sequences can always be found. One method that does this is dynamic time warping. The stretching required for each example sequence can be described using binary vectors

$$\begin{aligned}
\delta_1 &= \{1,1,0,1,1,1,0,0,1,1,1,1,0,1,0,0,1,0\} \\
\delta_2 &= \{1,1,1,1,1,0,1,1,1,1,0,1,1,1,1,1,1,1\} \\
\delta_3 &= \{1,0,0,1,1,1,1,0,1,1,0,1,0,1,1,0,1,1\}
\end{aligned} \tag{3}$$

which contain a one wherever there was a symbol of the original sequence at that position. Otherwise the entry is zero. The equalized sequences (2) and the distortion vectors (3) are combined into

$$\begin{aligned}
\mathbf{m}'_1 &= \{a, a, *, b, b, b, *, *, d, d, d, e, *, f, *, *, g, *,\; 1,1,0,1,1,1,0,0,1,1,1,1,0,1,0,0,1,0\} \\
\mathbf{m}'_2 &= \{a, a, a, b, b, *, c, c, d, d, *, e, e, f, f, f, g, g,\; 1,1,1,1,1,0,1,1,1,1,0,1,1,1,1,1,1,1\} \\
\mathbf{m}'_3 &= \{a, *, *, b, b, b, c, *, d, d, *, e, *, f, f, *, g, g,\; 1,0,0,1,1,1,1,0,1,1,0,1,0,1,1,0,1,1\}.
\end{aligned}$$

The star symbols can be replaced by their predecessor symbols without loss of information, since the appended binary vectors always allow a back-transformation. This yields the feature vectors

$$\begin{aligned}
\mathbf{m}_1 &= \{a, a, a, b, b, b, b, b, d, d, d, e, e, f, f, f, g, g,\; 1,1,0,1,1,1,0,0,1,1,1,1,0,1,0,0,1,0\} \\
\mathbf{m}_2 &= \{a, a, a, b, b, b, c, c, d, d, d, e, e, f, f, f, g, g,\; 1,1,1,1,1,0,1,1,1,1,0,1,1,1,1,1,1,1\} \\
\mathbf{m}_3 &= \{a, a, a, b, b, b, c, c, d, d, d, e, e, f, f, f, g, g,\; 1,0,0,1,1,1,1,0,1,1,0,1,0,1,1,0,1,1\}.
\end{aligned} \tag{4}$$

It is noted that the front halves of the vectors are nearly equal. This effect occurs only with symbol sequences, however; for sequences of real numbers or vectors, the entries would merely be similar. The decisive advantage of this data set transformation is that the distortions inherent in the training data become explicit and that feature vectors are obtained. The distortion information itself is the same in the original training data and in the generated feature vectors. As a result of this transformation, a component-wise comparison is now possible, which sequences of varying length do not allow.
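The coding step from the equalized sequences to the feature vectors, and the back-transformation that shows no information is lost, can be sketched as follows. This is an illustrative sketch only: the helper names `encode` and `decode` and the use of the character `*` as the star marker are assumptions, and the equalized sequence is taken as given rather than computed.

```python
STAR = "*"


def encode(equalized):
    """Build a feature vector: stars replaced by their predecessor, binary vector appended."""
    symbols, delta = [], []
    for item in equalized:
        if item == STAR:
            if not symbols:
                raise ValueError("a sequence must not start with a star")
            symbols.append(symbols[-1])  # repeat the predecessor symbol
            delta.append(0)              # 0 marks an inserted repetition
        else:
            symbols.append(item)
            delta.append(1)              # 1 marks a symbol of the original sequence
    return symbols + delta


def decode(feature_vector):
    """Recover the original sequence from a feature vector built by encode()."""
    half = len(feature_vector) // 2
    symbols, delta = feature_vector[:half], feature_vector[half:]
    return [s for s, keep in zip(symbols, delta) if keep == 1]


# First equalized example sequence from (2).
s1_equalized = list("aa*bbb**ddde*f**g*")
m1 = encode(s1_equalized)
s1_original = decode(m1)  # the unequalized sequence S_1 from (1)
```

All three example sequences yield vectors of the same length (18 symbols plus 18 binary entries), which is what makes the later component-wise comparison possible.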

In the following sub-aspect, the parameters of the model are determined. Using the example or training data set (4), a probability density $p(\mathbf{m})$ can be estimated. This describes the structure and randomness of the data both in time and in amplitude. For modeling the probability density, a kernel approach, for example a Parzen approach, can be used (Parzen: On estimation of a probability density function and mode, Annals of Mathematical Statistics, Vol. 33: 1065-1076, 1962):

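$$p(\mathbf{m}) \approx \frac{1}{n} \sum_{k=1}^{n} \varphi(\mathbf{m} - \mathbf{m}_k, \mathbf{s}) \quad \text{with} \quad \varphi(\mathbf{x}, \mathbf{s}) = \prod_{i=1}^{d} \frac{1}{\sqrt{2\pi}\, s_i} \exp\left(-\frac{x_i^2}{2 s_i^2}\right) \tag{5}$$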

Here $n$ is the number of feature vectors, $d$ the dimension of the feature vectors, $\mathbf{s} = (s_1, \ldots, s_d)^T$ a smoothing parameter to be estimated, and $\mathbf{m}_k = (m_{k1}, \ldots, m_{kd})^T$ the $k$-th feature vector. The only open parameter $\mathbf{s}$ can be determined by means of a fixed-point iteration such that the predictive capability of the density estimate $p(\mathbf{m})$ is maximized (Duin: On the choice of smoothing parameters for Parzen estimators of probability density functions, IEEE Transactions on Computers, Vol. C-25, No. 11: 1175-1179, 1976).
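A kernel density estimate of this kind can be sketched in a few lines. The sketch assumes numeric feature vectors (symbols would first need a numeric coding), assumes a kernel formed as a product of scalar Gaussians with per-component widths, and takes the smoothing parameter as given instead of estimating it by the fixed-point iteration cited above. The function names are illustrative.

```python
import math


def gaussian_product_kernel(x, s):
    """Product of scalar Gaussians with per-component widths s (assumed kernel form)."""
    value = 1.0
    for xi, si in zip(x, s):
        value *= math.exp(-0.5 * (xi / si) ** 2) / (math.sqrt(2.0 * math.pi) * si)
    return value


def parzen_density(m, training_vectors, s):
    """Average of kernels centred on the n training feature vectors."""
    n = len(training_vectors)
    total = 0.0
    for m_k in training_vectors:
        diff = [a - b for a, b in zip(m, m_k)]
        total += gaussian_product_kernel(diff, s)
    return total / n


# Toy data: two 2-dimensional feature vectors and a hand-picked smoothing parameter.
training = [[0.0, 0.0], [2.0, 2.0]]
smoothing = [1.0, 1.0]
density_near = parzen_density([0.1, 0.0], training, smoothing)
density_far = parzen_density([5.0, 5.0], training, smoothing)
```

Points close to the training vectors receive a higher density than points far away, which is the property the later similarity measure relies on.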

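To reduce the amount of data, Gaussian functions $\varphi(\mathbf{m} - \mathbf{m}_i, \mathbf{s}_i)$ and $\varphi(\mathbf{m} - \mathbf{m}_j, \mathbf{s}_j)$ with $i \neq j$ whose similarity is large enough are merged into a single Gaussian function $\alpha' \varphi(\mathbf{m} - \mathbf{m}', \mathbf{s}')$. The merging produces the new parameters $\alpha'$, $\mathbf{s}'$ and $\mathbf{m}'$. After merging, the resulting model of the distribution is

$$p(\mathbf{m}) \approx \frac{1}{n} \sum_{k=1}^{q} \alpha'_k\, \varphi(\mathbf{m} - \mathbf{m}'_k, \mathbf{s}'_k) \tag{6}$$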

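where $q$ can be much smaller than $n$. The formulas for the parameters $\alpha'$, $\mathbf{s}'$ and $\mathbf{m}'$ are

[formula (7) not legible in the source text]

The expression $(\mathbf{m}_i - \mathbf{m}_j)^2$ is to be understood component by component, i.e. each component of the vector $\mathbf{m}_i - \mathbf{m}_j$ is squared individually. Before merging, $\mathbf{s}_i = \mathbf{s}$ and $\alpha_i = 1$ for all $i = 1, \ldots, n$. A suitable criterion for the similarity of two Gaussian functions $\varphi(\mathbf{m} - \mathbf{m}_i, \mathbf{s}_i)$ and $\varphi(\mathbf{m} - \mathbf{m}_j, \mathbf{s}_j)$ is

[formula (8) not legible in the source text]

The model $p(\mathbf{m})$ of the probability distribution after compression consists of a sum of $q$ Gaussian distributions $\varphi(\mathbf{m} - \mathbf{m}'_k, \mathbf{s}'_k)$ weighted by the factors $\alpha'_k$ with $k = 1, \ldots, q$. The vector dimension $d$ can then be reduced in the same way.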

Each of the resulting $q$ Gaussian functions $\varphi(\mathbf{m} - \mathbf{m}'_k, \mathbf{s}'_k)$ is a specialist for a subset of the data and consists of a product of scalar Gaussian functions. Depending on the component of the feature vector $\mathbf{m}$, which consists of a sequence $S$ and a binary distortion vector $\delta$, each scalar Gaussian function models a local probability density either in time or in amplitude. Each of the $q$ Gaussian functions

$$\varphi(\mathbf{m} - \mathbf{m}'_k, \mathbf{s}'_k) = \prod_{i=1}^{d} \frac{1}{\sqrt{2\pi}\, s'_{k,i}} \exp\left(-\frac{(m_i - m'_{k,i})^2}{2\, s'^{\,2}_{k,i}}\right) \tag{9}$$

can, by undoing the feature vector coding, be interpreted as

$$\varphi(\mathbf{m} - \mathbf{m}'_k, \mathbf{s}'_k) = \prod_{i=1}^{N} p_{e,i}(x_i)\, p_{t,i}(\delta_i). \tag{10}$$

The components of $\mathbf{s}'_k$ and $\mathbf{m}'_k$ that originate from the distortion vectors $\delta$ determine the parameters of the transition densities $p_{t,i}(\delta)$, and the components that derive directly from the sequences $S$ determine the parameters of the emission densities $p_{e,i}(x)$. The emission densities and transition densities are merely the factors of the product (9) in recoded form. This concludes the parameterization phase.

The following part, the sub-aspect concerning the application of the actual pattern recognition model, describes how the model can be applied efficiently.

During the application phase, a sequence S is examined as to whether patterns similar to the sequences of the example data set occur anywhere. In this case, the transformation that was carried out during the parameterization phase must implicitly also take place for the observed sequence S. The method given by the following formula (11) is capable of doing so efficiently.

In principle, the method works like a digital filter, i.e. for each element of the sequence $S$ to be examined, a measure is output that indicates the current similarity. If this similarity measure exceeds a given threshold, an appropriate reaction can be triggered. The sequence $S$ can also be evaluated synchronously with a measurement, since only the current measured value is needed.

Internally, the filter works as follows: For each of the $q$ models (see formula (6)), a matrix $L$ is created and initialized with $-\infty$. At each time step $j$ it is updated for all $i = 1, \ldots, N$ using the formula

$$L(i, j) := \max_{1 \le a \le a_m} \left\{ L(i-1,\, j-a) + \log\left(p_{t,i}(a)\right) \right\} + c \log\left(p_{e,i}(x_j)\right) \tag{11}$$

The probability distributions $p_{t,i}(\cdot)$ and $p_{e,i}(\cdot)$ result from relation (10). The parameter $a_m$ must be chosen at least large enough that $p_{t,i}(a_m) \approx 0$ holds for all $i$. The parameter $c$ serves as a weighting and must be determined empirically. In the simplest case, $c = 1$ can be chosen. The value $L(N, j)$ is the sought similarity measure at time $j$, which indicates how closely the currently observed sequence resembles one of the sequences from the parameterization phase. In total there are $q$ of these values. The largest of them is relevant and is compared with the detection threshold in order to signal a detection event when the threshold is exceeded. An implementation of $L(i, j)$ in the form of a ring buffer is possible. The method described above sets out in general terms the proposed pattern recognition process as it can be used in various applications. In the following, application examples for the use of the pattern recognition method are described in more detail.
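Before turning to the examples, the filter-like update of relation (11) with a ring buffer can be made concrete with a small sketch. Several points are assumptions made for illustration and are not stated in the text: the offset $a$ is taken to run from 1 to $a_m$, a base row $L(0, j) = 0$ is used so that a pattern may start at any time, and the log densities are passed in as plain callables `log_pt(i, a)` and `log_pe(i, x)`. The class name `SimilarityFilter` is hypothetical.

```python
from collections import deque

NEG_INF = float("-inf")


class SimilarityFilter:
    """Streaming evaluation of the recursion (11) for one model with N elements."""

    def __init__(self, n_states, a_max, log_pt, log_pe, c=1.0):
        self.n_states = n_states
        self.a_max = a_max
        self.log_pt = log_pt  # log transition density, called as log_pt(i, a)
        self.log_pe = log_pe  # log emission density, called as log_pe(i, x)
        self.c = c
        # Ring buffer holding the last a_max columns L(0..N, j).
        # Assumption: row 0 is 0 (free start), all other entries start at -inf.
        start = [0.0] + [NEG_INF] * n_states
        self.history = deque((list(start) for _ in range(a_max)), maxlen=a_max)

    def step(self, x):
        """Consume one observation x_j and return the similarity L(N, j)."""
        column = [0.0] + [NEG_INF] * self.n_states
        for i in range(1, self.n_states + 1):
            best = NEG_INF
            for a in range(1, self.a_max + 1):
                # history[-a] is the column computed a time steps ago.
                best = max(best, self.history[-a][i - 1] + self.log_pt(i, a))
            column[i] = best + self.c * self.log_pe(i, x)
        self.history.append(column)  # the oldest column drops out automatically
        return column[self.n_states]


# Toy model for the symbol pattern "abc" with hand-made log densities.
pattern = "abc"
toy_filter = SimilarityFilter(
    n_states=len(pattern),
    a_max=2,
    log_pt=lambda i, a: 0.0 if a == 1 else -1.0,
    log_pe=lambda i, x: 0.0 if x == pattern[i - 1] else -5.0,
)
scores = [toy_filter.step(x) for x in "xxabcxx"]
```

In this toy run the score is highest at the time step where the last symbol of the pattern is observed. With $q$ models, one such filter would be run per model and the largest of the returned values compared with the detection threshold.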

Example 1

One application of the pattern recognition method is knock detection in engines, which is discussed in more detail below. Fig. 1 shows a schematic representation of the structure of a knock control for an engine.

It is assumed that a structure-borne sound signal is continuously recorded with a suitable sensor and digitized by analog-to-digital conversion at a sufficiently high sampling rate. The time signal thus becomes a sequence of scalars. In the next step, this sequence is converted by means of an STFT into a sequence of spectral vectors (spectrogram: amplitude spectrum or power density spectrum), which describe the strength of certain frequency components over time. The spectral vectors can then be logarithmized and converted into cepstral vectors by means of a discrete cosine transformation. However, this step is not mandatory. The vector sequences are referred to hereinafter as feature vector sequences in order to abstract from the specific type of preprocessing performed. The actual recognition takes place exclusively on the basis of these feature vector sequences, as generally explained above.
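The preprocessing chain just described (windowing, amplitude spectrum, optional logarithm and discrete cosine transformation) can be sketched with the standard library alone. The Hann window, the window length, the hop size and the function names are assumptions chosen for illustration; a real implementation would normally use an FFT library instead of the naive transform shown here.

```python
import cmath
import math


def stft_magnitudes(signal, window_len, hop):
    """Amplitude spectra of overlapping, Hann-windowed frames (naive DFT)."""
    hann = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (window_len - 1))
            for n in range(window_len)]
    frames = []
    for start in range(0, len(signal) - window_len + 1, hop):
        frame = [signal[start + n] * hann[n] for n in range(window_len)]
        spectrum = []
        for k in range(window_len // 2 + 1):  # non-negative frequencies only
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / window_len)
                      for n in range(window_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


def cepstral_vector(spectrum, eps=1e-12):
    """Logarithm of a spectrum followed by a DCT-II (the optional step)."""
    log_spec = [math.log(v + eps) for v in spectrum]  # eps avoids log(0)
    size = len(log_spec)
    return [sum(log_spec[n] * math.cos(math.pi * k * (n + 0.5) / size)
                for n in range(size))
            for k in range(size)]


# Synthetic test tone with 4 cycles per 32 samples instead of a real sensor signal.
tone = [math.sin(2.0 * math.pi * 4 * n / 32) for n in range(128)]
spectral_sequence = stft_magnitudes(tone, window_len=32, hop=16)
cepstral_sequence = [cepstral_vector(vec) for vec in spectral_sequence]
```

Each entry of `spectral_sequence` is one feature vector of fixed dimension, so a signal fragment of arbitrary length becomes a feature vector sequence of corresponding length.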

Before the knock detection can be used, it must be parameterized. For this purpose, example or training data must be recorded on an engine test stand. The engine type to be controlled is operated at different speeds and, for each cylinder, in the knocking and the non-knocking range. In addition to the structure-borne sound signals, the cylinder internal pressure is measured with appropriate sensors. These data are required in order to be able to judge unambiguously whether a concretely measured structure-borne noise signal corresponds to a knocking or a non-knocking combustion (see Fig. 2). Subsequently, the recorded structure-borne noise data are prepared by cutting out all regions in which the simultaneously measured pressure signal shows an overpressure. In addition, the knock strength of each structure-borne sound fragment is determined on the basis of the pressure signal and attached to the fragment as a label. To this end, the pressure signals are bandpass filtered and rectified; the remaining maximum amplitude is a measure of the current knock strength. After this step, a data set of structure-borne sound fragments is available with which the knock detection can be parameterized. The pressure signals are then no longer needed.

Two models are parameterized for knock detection. The first model is for the detection of knocking burns, the second for the detection of non-knocking burns. In this way the task can be reduced to a simple classification problem. The starting point for the parameterization are the structure-borne sound fragments cut out of the continuous structure-borne noise signal and labeled with the knocking strength.

The model for non-knocking burns is parameterized only with those structure-borne sound fragments whose knock strength lies below a previously defined threshold $\varepsilon_1$. Accordingly, the model for knocking burns is parameterized with clearly knocking structure-borne sound fragments, whose knock strength must exceed a threshold $\varepsilon_2$. The two thresholds $\varepsilon_1$ and $\varepsilon_2$ may be equal. However, it makes sense to choose $\varepsilon_2$ slightly larger than $\varepsilon_1$. Apart from the data used, the two models are completely identical. Likewise, the parameterization phases do not differ, so it is sufficient to describe them using a single model.
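The labeling and the split into two training sets can be sketched as follows. The sketch assumes that the pressure signal has already been bandpass filtered, and the function names, the data layout (pairs of fragment and knock strength) and the threshold values are illustrative assumptions.

```python
def knock_strength(bandpassed_pressure):
    """Maximum amplitude of the rectified, already bandpass-filtered pressure signal."""
    return max(abs(value) for value in bandpassed_pressure)


def split_training_fragments(labeled_fragments, eps_1, eps_2):
    """Split (fragment, knock strength) pairs into the two training sets.

    Fragments with a strength between eps_1 and eps_2 are used for neither model.
    """
    if eps_2 < eps_1:
        raise ValueError("eps_2 must not be smaller than eps_1")
    non_knocking = [frag for frag, strength in labeled_fragments if strength < eps_1]
    knocking = [frag for frag, strength in labeled_fragments if strength > eps_2]
    return non_knocking, knocking


# Hypothetical fragments, labeled with made-up knock strengths.
labeled = [("fragment_1", 0.1), ("fragment_2", 0.5), ("fragment_3", 0.9)]
quiet_set, knock_set = split_training_fragments(labeled, eps_1=0.3, eps_2=0.6)
```

Choosing `eps_2` larger than `eps_1` leaves out ambiguous fragments such as `fragment_2`, so that each model is trained only on clear cases.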

As a rule, it is more favorable for the pattern recognition not to analyze the structure-borne sound signals directly, but rather feature vector sequences derived from them, i.e. sequences of feature vectors. In this specific example it is useful, as already described, to first subdivide the structure-borne sound signals into short, overlapping time windows of equal length and to calculate the amplitude or power density spectrum of each window. Each of these spectra can be regarded as a feature vector of fixed dimension. A structure-borne sound fragment thus becomes a feature vector sequence (see Fig. 3). Since the structure-borne sound fragments differ in length, the feature vector sequences generated by the preprocessing also differ in length. A direct comparison is not possible. Likewise, the classification problem cannot be handled with classical feature-vector-based pattern recognition methods, because these require a self-contained feature space, which is what allows the probability distribution of the example data set to be estimated implicitly.

According to the method described above, feature vectors are then formed and used to parameterize the model. The model can subsequently be used for pattern recognition in the manner explained above. Since two models were generated during the parameterization phase, one for knocking and one for non-knocking burns, two similarity values exist at each point in time. Depending on which of these values is greater, either a knocking or a non-knocking combustion is present. If both values are low, there is either no combustion at the moment or the sensor is damaged. The engine control unit is thus able to detect a failure of the knock detection, which is important in order to avoid damage to the engine.
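The decision logic described here fits into a few lines. The lower bound `min_score`, below which both similarity values count as low, is a hypothetical parameter introduced for this sketch, as are the function name and the returned labels.

```python
def classify_combustion(score_knocking, score_non_knocking, min_score):
    """Compare the similarity values of the two models for the current time step."""
    if max(score_knocking, score_non_knocking) < min_score:
        # Neither model matches: no combustion at the moment, or a damaged sensor.
        return "no combustion or sensor fault"
    if score_knocking > score_non_knocking:
        return "knocking"
    return "non-knocking"


# Made-up log-similarity values for three situations.
result_a = classify_combustion(-3.0, -40.0, min_score=-100.0)
result_b = classify_combustion(-55.0, -7.0, min_score=-100.0)
result_c = classify_combustion(-500.0, -450.0, min_score=-100.0)
```

A persistent third outcome over many engine cycles would be the signal an engine control unit could use to recognize a failed knock detection.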

The method described allows a continuous search for knocking burns. This means that the method, like a digital filter, can provide a measure of the instantaneous knock strength at each sampling instant. Moreover, no a priori specifications are required and the determination of the parameters is largely constructive, i.e. it proceeds without numerical optimization.

As discussed above in connection with knock detection, other pattern recognition problems can also be reduced to a sequence detection problem, as discussed in more detail below.

Example 2

Some of the applications are based on time signals. In these applications it is relatively obvious at which point the sequence recognition method can usefully be employed. For example, in the signal analysis of ECG signals (ECG: electrocardiogram), the time signal can be used directly. This constitutes a use of the above-described method for automatic pattern recognition in a signal analysis of ECG signals. In this way, sequences that may indicate arrhythmias can be detected in the ECG signals.

Example 3

The use of automatic pattern recognition in connection with speech recognition is also based on time signals. When recognizing speech, however, it makes sense to preprocess the time signals, which in this specific case are audio signals. For this purpose, the audio signals are converted into sequences of spectral vectors, analogously to the procedure described above for knock detection. The advantage of this transformation is that the phases, which are irrelevant for physical reasons, can easily be removed from the signals. Fig. 3 therefore also applies to the case of machine speech recognition.

The simplest use of machine speech recognition is to recognize individual predefined command words. This requires at least a microphone and a microprocessor, which must additionally be able to digitize the analog audio signals. In order to use the method described above for command word recognition, it is first necessary to record an example data set with this measuring setup. For each command word, at least a few examples must be recorded. These are then prepared and labeled, i.e. it is noted in machine-readable form which command word each concrete example represents.

For each command word a model is created. For this purpose, the corresponding examples are preprocessed and converted into spectral vector sequences. These are the actual sequences from which feature vectors of the same length are then generated in the manner already described (formulas (1) to (4)). With the aid of the described parameterization (formulas (5) to (10)), the models are subsequently generated. The relationship (11) then allows the generated models to be used to analyze a continuous audio signal. If the similarity measure, which is calculated continuously for each model, exceeds the predefined threshold at a certain time, it can be assumed that the continuously examined audio signal has just contained an utterance similar to the command words used in the parameterization of the corresponding model. Reporting the associated label appears to the user of the system as recognition of the spoken utterance and can be used to trigger certain useful actions.
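The thresholding step can be sketched as follows. The sketch assumes that the per-model similarity values have already been computed with relationship (11), which is not reproduced here. The function name, the data layout and the choice to report only the moment at which a value first rises above the threshold are assumptions made for illustration.

```python
def detect_commands(similarity_streams, threshold):
    """Return (time_index, label) events for every moment at which a
    model's similarity measure rises above the threshold.

    similarity_streams maps each command-word label to its list of
    similarity values, one value per time step (hypothetical layout).
    """
    events = []
    above = {label: False for label in similarity_streams}
    n_steps = min(len(values) for values in similarity_streams.values())
    for j in range(n_steps):
        for label, values in similarity_streams.items():
            is_above = values[j] > threshold
            if is_above and not above[label]:
                # Report only the rising edge, not every step above threshold.
                events.append((j, label))
            above[label] = is_above
    return events


streams = {
    "start": [-9.0, -8.0, -2.0, -1.0, -7.0],
    "stop": [-9.0, -9.0, -9.0, -8.0, -1.5],
}
print(detect_commands(streams, threshold=-3.0))  # prints: [(2, 'start'), (4, 'stop')]
```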

Example 4

In a virus scanner, the patterns to be searched for consist of certain significant code fragments, i.e. sequences of bytes describing the behavior of the code. To make viruses harder to find, variations are often added to certain parts of the code that do not modify the actual behavior but result in a changed sequence of bytes. For example, NOP machine instructions (no operation) can be inserted anywhere in the code. Other code sequences that ultimately have no effect can be inserted as well.

The procedure for locating malicious program code using the method described above is to describe the byte sequences of different modified versions by a common model and to use this model to search for occurrences of the virus. For this purpose, the byte sequences are transformed into feature vectors of fixed length in accordance with formulas (1) to (4). Subsequently, the model is parameterized. This constitutes a use of the above-described method for automatic pattern recognition in virus scanning.

Example 5

A very similar problem area is the search for genes or similar genes in DNA sequences. Here, amino acid sequences are searched instead of byte sequences. This constitutes a use of the above-described method for automatic pattern recognition in gene sequence analysis, wherein the sequence of electronic data represents a gene sequence.

Example 6

Less obvious is the application in image analysis, since the data structures there are two-dimensional. Some of these problems can nevertheless be reduced to a sequence analysis problem. For example, a handwritten text can be interpreted as a sequence of XY coordinates. Due to variable writing speeds, however, these sequences cannot be compared directly. The invention provides a direct way of processing such data. For example, the task could be to check the signature of a person, e.g. for authentication on a laptop. The necessary hardware, a touchpad and a computer for the evaluation, is already included in such devices.

Each sequence begins when a touch is registered on the touchpad and ends when the touchpad has not been touched for a while. So that the position at which the signature is written has no effect, the first coordinate of the sequence may be subtracted from all coordinates of the sequence. This ensures that each coordinate sequence starts at the origin (0,0).
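The position normalization amounts to a single subtraction per coordinate. The following sketch assumes that a stroke is given as a list of (x, y) tuples; the function name is hypothetical.

```python
def shift_to_origin(coords):
    """Subtract the first coordinate from every coordinate of the sequence
    so that the sequence starts at the origin (0, 0)."""
    if not coords:
        return []
    x0, y0 = coords[0]
    return [(x - x0, y - y0) for x, y in coords]


# The same stroke written at two different touchpad positions.
stroke_a = [(10, 20), (12, 25), (15, 23)]
stroke_b = [(110, 70), (112, 75), (115, 73)]
print(shift_to_origin(stroke_a))  # prints: [(0, 0), (2, 5), (5, 3)]
print(shift_to_origin(stroke_a) == shift_to_origin(stroke_b))  # prints: True
```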

In order to be able to recognize the signature of a person, some examples are needed, from which the feature vectors of fixed length are generated according to formulas (1) to (4). Based on these, the model is then parameterized (formulas (5) to (10)). After the model has been fully parameterized, it can be used to compare all received coordinate sequences with the stored model, either continuously or only on demand. For this purpose, formula (11) can be used.

Example 7

In machine signal analysis, time signals that can be interpreted directly as sequences are often used, namely current or voltage curves. Other sensor data that are distorted by transfer functions can be examined in the form of spectrograms (see knock detection above). Many applications exist in mechanical and plant engineering in which the described sequence recognition can sensibly be used. Typically, however, these are almost always detail problems, for example part of a controller, part of a process monitoring system or the like. This constitutes a use of the above-described method for automatic pattern recognition in the control or process monitoring of a machine or plant, wherein the sequence of electronic data represents data acquired for the control or the process monitoring, for which associated sample or training data were acquired beforehand.

Example 8

Another application of the pattern recognition method is the evaluation of thermal image data for quality control of machined forged components. Forged components occasionally show cracks. Visually, the cracks are usually not easy to recognize. However, the cooling behavior of areas with cracks differs from that of areas without cracks. In order to detect such deviations, images of the forged components are recorded for a short time by means of a thermal imaging camera. The cooling of a component corresponds to a change in a gray value image G(x, y, t) formed by the thermal imaging camera over a time t. Since the position of the component with respect to the thermal imaging camera does not change during the recording, the image coordinates x and y (pixels) are each assigned to a respective area of the component surface. The temporal behavior of the gray value can be approximately described here by means of a decaying exponential function:

$$G(x, y, t) \approx G(x, y, 0) \cdot \exp(-\lambda(x, y) \cdot t)$$

The parameter $\lambda(x, y)$ can preferably be estimated by means of linear regression. Further parameters describing the cooling are possible. For each image coordinate x and y, a parameter vector V(x, y) is thus formed which, in the simplest case, is only one-dimensional: $V(x, y) = \lambda(x, y)$.
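One plausible way to estimate the decay parameter by linear regression is to take the logarithm of the gray values, which turns the exponential model into a straight line whose negative slope is the decay parameter. The following sketch does this for a single pixel. It assumes strictly positive gray values, and the function name and the log-linearization itself are assumptions, since the text does not specify how the regression is set up.

```python
import math


def estimate_decay_rate(times, gray_values):
    """Least-squares fit of ln G(t) = ln G(0) - rate * t for one pixel.

    Returns the estimated decay rate. Gray values must be > 0.
    """
    logs = [math.log(g) for g in gray_values]
    n = len(times)
    mean_t = sum(times) / n
    mean_log = sum(logs) / n
    cov = sum((t - mean_t) * (l - mean_log) for t, l in zip(times, logs))
    var = sum((t - mean_t) ** 2 for t in times)
    slope = cov / var
    return -slope


# Synthetic cooling curve of one pixel with a known decay rate of 0.3.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
gray = [200.0 * math.exp(-0.3 * t) for t in times]
print(round(estimate_decay_rate(times, gray), 6))  # prints: 0.3
```

Applying the function to every pixel yields the one-dimensional parameter image V(x, y).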

Since each image coordinate x and y is assigned exactly one parameter vector V(x, y), the result of this preprocessing can be represented as a gray value image (one-dimensional parameter vector) or as a false color image (multi-dimensional parameter vector). A different cooling behavior is immediately recognizable visually in such secondary images V(x, y) as an unusual discoloration. For a machine evaluation, however, it is disturbing that the position and orientation of the components in the secondary image vary from case to case. This variation has procedural reasons and manifests itself essentially as a horizontal shift or distortion. A simple comparison with a reference image is therefore not possible.

On the other hand, it is possible to interpret each column Sp(x) = (V(x, 1), V(x, 2), V(x, 3), ...) of the secondary image V(x, y) as a vector. The columns Sp(x), taken from left to right, then form a sequence of vectors S = Sp(1), Sp(2), Sp(3), .... The task of finding the position of the component and comparing it with a reference is thus reduced to a sequence detection problem, which can be solved with the pattern recognition method according to the invention. The reference image (reference) is formed, for example, by means of the method according to the invention from several example sequences of defect-free components.
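The reinterpretation of the secondary image as a sequence of column vectors can be sketched in a few lines. The row-major storage of the image (a list of rows, so that `secondary_image[y][x]` holds V(x, y)) and the function name are assumptions made for illustration.

```python
def columns_as_sequence(secondary_image):
    """Return the columns of the image from left to right, each as one
    vector, i.e. the sequence Sp(1), Sp(2), Sp(3), ...

    secondary_image is assumed to be a list of rows of equal length.
    """
    return [list(column) for column in zip(*secondary_image)]


# A secondary image with 2 rows and 3 columns.
image = [
    [1, 2, 3],
    [4, 5, 6],
]
print(columns_as_sequence(image))  # prints: [[1, 4], [2, 5], [3, 6]]
```

A horizontal shift of the component then appears as a shift along the sequence, and a horizontal distortion as a local stretching or compression of it.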

Overall, a method for automatic pattern recognition is described above which can be used in a variety of applications by analyzing, in the manner explained above, electronic data that comprise information associated with the respective application. The starting point of the method is the generation of a set of feature vectors of equal length or dimension from training or example data by means of a dynamic time warping method. The feature vectors generated in this way can then in principle be examined with the aid of any classifier for pattern recognition. For example, a neural network (e.g. a multilayer perceptron) could also be used (Bishop: Neural Networks for Pattern Recognition, Clarendon Press, Oxford, 1995). Numerous other classifiers such as support vector machines, polynomial classifiers or decision tree methods are also possible (Niemann: Klassifikation von Muster, 1995). However, all classifiers must solve the problem of efficiently performing the necessary equalization of the observed sequences during the application phase. In its basic form, none of the above methods is capable of doing so.

Regardless of the subsequent execution of the classification method, the generation of the set of feature vectors constitutes an independent aspect of the invention, which develops its advantages independently of the subsequent choice of the classifier and thus in connection with various classifiers.
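To illustrate the general idea of obtaining equal-length vectors from sequences of different length, the following sketch uses textbook dynamic time warping against a reference sequence. It is not a reproduction of formulas (1) to (4), which are not repeated in this passage. The scalar sequences, the averaging of aligned samples and all function names are assumptions, and unlike the method described, averaging does not in general preserve the full information content of the training data.

```python
def dtw_path(ref, seq):
    """Classic DTW between two scalar sequences. Returns the optimal
    alignment as a list of (ref_index, seq_index) pairs."""
    n, m = len(ref), len(seq)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(ref[i - 1] - seq[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    # Backtrack from the end to recover the alignment path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return path


def warp_to_reference(ref, seq):
    """Map seq onto the time axis of ref: one value per reference index,
    namely the mean of all samples of seq aligned to that index."""
    buckets = [[] for _ in ref]
    for i, j in dtw_path(ref, seq):
        buckets[i].append(seq[j])
    return [sum(bucket) / len(bucket) for bucket in buckets]


reference = [0, 1, 2, 1, 0]
slow = [0, 0, 1, 2, 2, 1, 0]        # same shape, 7 samples
uneven = [0, 1, 1, 2, 1, 0, 0, 0]   # same shape, 8 samples
print(warp_to_reference(reference, slow))    # prints: [0.0, 1.0, 2.0, 1.0, 0.0]
print(warp_to_reference(reference, uneven))  # prints: [0.0, 1.0, 2.0, 1.0, 0.0]
```

Both example sequences end up as vectors of length five, which any feature-vector-based classifier could then process.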

The described method for automatic pattern recognition can advantageously be used in particular in connection with the following applications: machine speech recognition, handwriting recognition, gene sequence analysis, search for malicious program code (virus scanner), medical technology applications such as cardiac pacemakers or electrocardiograms and mechanical diagnostic applications such as knock detection.

The features of the invention disclosed in the above description, the claims and the drawing may be of significance both individually and in any combination for the realization of the invention in its various embodiments.

Claims

1. A method for automatic pattern recognition in a sequence of electronic data by means of electronic data processing in a data processing system, in which the sequence of electronic data is compared in an analysis with parameterized model data representing at least one pattern sequence, and in which the at least one pattern sequence is recognized if it is determined in the analysis that model data which the parameterized model data comprise and which are associated with the at least one pattern sequence occur with a similarity measure exceeding a similarity measure threshold, wherein, in forming the parameterized model data, training data are processed by means of a dynamic time warping method to form a set of feature vectors of the same length and with the same information content as the training data, from which the parameterized model data are derived.

2. Method according to claim 1, characterized in that the parameterized model data are derived from the set of feature vectors by parameterizing a feature-vector-based classifier.

3. Method according to claim 2, characterized in that the feature-vector-based classifier is a Bayes classifier with Parzen window density estimation.

4. Method according to at least one of the preceding claims, characterized in that the similarity measure L(N, j) for a subsequence of electronic data examined at the time j of the analysis is determined from the sequence of electronic data as follows:

$$L(i, j) := \max_{\alpha \le \alpha_m} \left( L(i-1, j-\alpha) + \log p_{t,i}(\alpha) \right) + c \cdot \log p_{e,i}(X_j)$$

where $X_j$ are the elements of the sequence of electronic data, $p_{t,i}(\cdot)$ and $p_{e,i}(\cdot)$ are the i-th elements of a total of N elements of the parameterized model data, and $c$ and $\alpha_m$ are constants to be chosen empirically.

5. Apparatus for automatic pattern recognition in a sequence of electronic data by means of electronic data processing, comprising a data processing system having the following features:

- pattern recognition means configured to compare, in an analysis, the sequence of electronic data with parameterized model data representing at least one pattern sequence, and to recognize the at least one pattern sequence if the analysis determines that model data which the parameterized model data comprise and which are assigned to the at least one pattern sequence occur with a similarity measure exceeding a similarity measure threshold,
- model data generating means configured to generate the parameterized model data using training data and, in doing so, to process the training data by means of a dynamic time warping method to form a set of feature vectors of equal length and with the same information content as the training data, from which the parameterized model data are derived, and
- providing means configured to provide electronically evaluable recognition information about the recognition of the at least one pattern sequence for an output.