CN102004549A - Automatic lip language identification system suitable for Chinese language - Google Patents

Automatic lip language identification system suitable for Chinese language

Info

Publication number
CN102004549A
CN102004549A (application CN201010558253A)
Authority
CN
China
Prior art keywords
lip
chinese character
module
matrix
image sequence
Prior art date
Legal status
Granted
Application number
CN 201010558253
Other languages
Chinese (zh)
Other versions
CN102004549B (en)
Inventor
吕坤
贾云得
张欣
Current Assignee
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT
Priority to CN2010105582532A
Publication of CN102004549A
Application granted
Publication of CN102004549B
Expired - Fee Related


Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to an automatic lip language recognition system suitable for Chinese, comprising a head-mounted camera, a human-computer interaction module, a lip contour positioning module, a geometric vector acquisition module, a motion vector acquisition module, a feature matrix construction module, a transformation matrix T acquisition module, a transformed feature matrix acquisition module, a memory A, a memory B and a canonical correlation discriminant analysis module. The head-mounted camera records Chinese character pronunciation image sequences and transmits them through the human-computer interaction module to the lip contour positioning module, which detects and tracks the lip contour with a convolutional virtual electrostatic field Snake model. The geometric vector acquisition module and the motion vector acquisition module extract geometric and motion features, respectively, from the lip contour; these are joined into the input feature matrix of the canonical correlation discriminant analysis module, which computes the similarity between feature matrices and, after further processing, yields the recognition result. Compared with traditional lip language recognition systems, the system has higher recognition accuracy.

Description

Automatic lip language recognition system suitable for Chinese
Technical Field
The invention relates to an automatic lip language recognition system, in particular to an automatic lip language recognition system suitable for Chinese, and belongs to the technical field of automatic lip language recognition.
Background
Lip language recognition, or lip reading, is an attractive field in Human-Computer Interaction (HCI) and plays an important role in Automatic Speech Recognition (ASR) systems. Human language perception is a naturally multimodal process: people with hearing impairment make full use of lip cues, and even people with normal hearing use visual information to enhance language understanding, particularly in noisy environments. Exploiting the information of the visual channel can therefore effectively improve the performance and robustness of modern automatic speech recognition systems.
The lip language recognition task generally comprises three main steps: first, detecting the face and lip regions in a pronunciation image sequence; second, extracting features suitable for classification from the lip region; and third, performing lip language recognition using the lip region features.
For the first step, existing methods mainly use image processing algorithms to locate the face and lip regions; such methods are easily affected by illumination, viewing angle, rotation, occlusion and the like, and introduce certain errors.
The lip language features mentioned in the second step fall into three categories in the existing literature: (1) low-level texture-based features; (2) high-level contour-based features; (3) a combination of the two. Among these, the lip geometry (e.g., the height, width and angles of the lips) and the lip motion characteristics, both contour-based, are considered the most useful visual information. Much recent work on lip contour segmentation uses deformable models; one effective approach is the Snake model and its variants, such as the Gradient Vector Flow (GVF) Snake model, the Virtual Electrostatic Field (VEF) Snake model and the convolutional VEF Snake model. Among these, the convolutional VEF Snake model locates the lip contour more quickly and accurately by using a virtual electrostatic field as the external force together with a convolution mechanism.
In the third step, lip language recognition using the lip region features, a widely used classification method is the Hidden Markov Model (HMM). Hidden Markov models are useful in speech recognition because they naturally model the temporal characteristics of language. Considering the essential nature of language, however, the piecewise-stationarity and independence assumptions of hidden Markov models are two limitations of the model.
An important prior art used in the present invention is a lip tracking algorithm based on the convolutional virtual electrostatic field Snake model.
A detailed design of this algorithm is given by Lü Kun et al. in the document "Lip tracking algorithm based on a convolutional virtual electrostatic field Snake model" (The Sixth Joint Academic Conference on Harmonious Human-Computer Environment, 2010).
Another important prior art used in the present invention is the canonical correlation discriminant analysis method, Discriminant-analysis of Canonical Correlations (DCC).
T.-K. Kim et al. proposed this method in the document "Discriminative Learning and Recognition of Image Set Classes Using Canonical Correlations" (IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 29, No. 6, 2007). By introducing a transformation matrix T, the method maximizes the similarity (represented by canonical correlation coefficients) of within-class sets and minimizes the similarity of between-class sets, thereby achieving a better recognition effect.
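For clarity, the canonical correlation coefficients on which DCC relies admit a compact standard formulation (textbook linear algebra, not quoted from the patent): if two feature sets are represented by orthonormal basis matrices Q_1 and Q_2 of the subspaces their columns span, the canonical correlations are the singular values of Q_1^T Q_2, and the similarity of the two sets is their sum:

```latex
Q_1^{\top} Q_2 = U \,\Sigma\, V^{\top}, \qquad
\Sigma = \operatorname{diag}(\cos\theta_1, \ldots, \cos\theta_r), \qquad
\operatorname{sim} = \sum_{k=1}^{r} \cos\theta_k ,
\quad 1 \ge \cos\theta_1 \ge \cdots \ge \cos\theta_r \ge 0 .
```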
In recent years, the canonical correlation discriminant analysis method has been successfully applied to image set matching, face and object recognition, and related fields, so it is in theory a simple and effective approach to the lip language recognition problem. To date, however, no literature or practical application of canonical correlation discriminant analysis for automatic lip language recognition has been found.
Disclosure of Invention
The invention aims to overcome the above defects of the prior art and to provide an automatic lip language recognition system suitable for Chinese.
The purpose of the invention is realized by the following technical scheme.
An automatic lip language recognition system suitable for Chinese comprises: a head-mounted camera, a human-computer interaction module, a lip contour positioning module, a geometric vector acquisition module, a motion vector acquisition module, a feature matrix construction module, a transformation matrix T acquisition module, a transformed feature matrix acquisition module, a memory A, a memory B and a canonical correlation discriminant analysis module.
The connection relationship is as follows: the output of the head-mounted camera is connected to the input of the human-computer interaction module; the output of the human-computer interaction module is connected to the input of the lip contour positioning module; the output of the lip contour positioning module is connected to the input of the geometric vector acquisition module; the output of the geometric vector acquisition module is connected to the inputs of the motion vector acquisition module and the feature matrix construction module; the output of the motion vector acquisition module is connected to the input of the feature matrix construction module; the output of the feature matrix construction module is connected to the inputs of the transformation matrix T acquisition module and the transformed feature matrix acquisition module; the transformation matrix T acquisition module is connected to memory A; the transformed feature matrix acquisition module is connected to memory A and memory B; memory A and memory B are also connected to the input of the canonical correlation discriminant analysis module; and the output of the canonical correlation discriminant analysis module is connected to the input of the human-computer interaction module.
The main functions of each module and equipment are as follows:
the main functions of the head-mounted camera are: acquiring a Chinese character pronunciation image sequence sent by a testee.
The main functions of the man-machine interaction module are as follows: providing a closed contour curve for a testee to adjust the position of the head-mounted camera, so that the lip region of the testee acquired by the head-mounted camera is contained in the closed contour curve. Acquiring a Chinese character pronunciation image sequence shot by the head-mounted camera; thirdly, outputting the result of the typical relevant discriminant analysis module.
The main functions of the lip profile positioning module are: lip contour curves are obtained by sequentially positioning the lip contour on each frame of image in a Chinese character pronunciation image sequence by using a lip tracking algorithm proposed in a document 'lip tracking algorithm based on a convolution virtual electrostatic field Snake model' by Lukun et al, and the lip contour curves are output to a geometric vector acquisition module.
The main functions of the geometric vector acquisition module are: lip geometric characteristic vectors are obtained from lip contour curves of each frame of image in the Chinese character pronunciation image sequence output by the lip contour positioning module; and in order to compensate lip difference and image scaling difference between different testees, lip geometric feature vectors are subjected to normalization operation to obtain normalized lip geometric feature vectors, and the normalized lip geometric feature vectors are output to a motion vector acquisition module and a feature matrix construction module.
The main functions of the motion vector acquisition module are: and constructing lip motion characteristic vectors of each frame of image on the basis of the lip geometric characteristic vectors subjected to normalization operation, and then outputting the lip motion characteristic vectors to a characteristic matrix construction module.
The main functions of the feature matrix construction module are: and constructing a characteristic matrix of the Chinese character pronunciation image sequence, and then outputting the characteristic matrix of the Chinese character pronunciation image sequence to a transformation matrix T acquisition module and a conversion characteristic matrix acquisition module.
The main functions of the transformation matrix T acquisition module are: a feature matrix of a Chinese character pronunciation Image sequence of training data is processed by a typical correlation discriminant Analysis method provided by T. -K.Kim et al in the document "characterization Learning And Recognition of Image Set Classes Using Canonica Correlations" (IEEE Transactions On Pattern Analysis And Machine understanding, Vo1.29, No.6(2007)), so as to obtain a transformation matrix T, And the transformation matrix T is stored in a memory A.
The main functions of the conversion characteristic matrix acquisition module are as follows: and converting the feature matrix of the Chinese character pronunciation image sequence of the training data by using the transformation matrix T in sequence to obtain a conversion feature matrix, and storing the conversion feature matrix of the Chinese character pronunciation image sequence of the training data in the memory A.
A memory A: and storing the transformation matrix T and a conversion characteristic matrix of the Chinese character pronunciation image sequence of the training data.
A memory B: and storing the conversion characteristic matrix of the Chinese character pronunciation image sequence of the test data.
A typical correlation discriminant analysis module: and acquiring the typical correlation coefficient sum of the conversion feature matrix of the current test data and the conversion feature matrix of each training data in the memory A from the memory B, further processing the typical correlation coefficient sums to obtain the identification result of the current test data, and outputting the identification result to the human-computer interaction module.
The working process of the automatic lip language recognition system comprises a system training process and a system testing process:
the working flow of the system training process is as follows:
step 1.1: select m Chinese characters as training data, where m ≥ 5 and m is a positive integer;
step 1.2: the human-computer interaction module displays a closed contour curve.
Step 1.3: the subject fixes the head-mounted camera on the head and adjusts its position so that the camera directly captures the lower half of the subject's face; the captured image is sent to the human-computer interaction module for display; the subject then adjusts the camera position again so that the lip region falls inside the closed contour curve described in step 1.2.
Step 1.4: the subject pronounces the m Chinese characters of step 1.1 at a rate of 1 Chinese character per second while the head-mounted camera shoots at n frames per second, where n ≥ 25 and n is a positive integer; the video stream of each Chinese character pronunciation therefore consists of an n-frame image sequence, and the n-frame image sequence of one Chinese character is called a Chinese character pronunciation image sequence; the head-mounted camera sends the captured Chinese character pronunciation image sequences to the human-computer interaction module.
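As a concrete illustration of this step, the sketch below grabs one n-frame pronunciation sequence with OpenCV; the library choice, the device index 0 and the helper name are assumptions of this sketch, since the patent specifies no implementation.

```python
import cv2

def record_character_sequence(cap, n_frames=30):
    """Grab one n-frame Chinese character pronunciation image sequence
    (hypothetical helper; assumes the head-mounted camera is already
    positioned so the lips fall inside the closed contour curve)."""
    frames = []
    while len(frames) < n_frames:
        ok, frame = cap.read()
        if not ok:
            raise RuntimeError("camera read failed")
        frames.append(frame)
    return frames

# cap = cv2.VideoCapture(0)  # device index 0 is an assumption
# seq = record_character_sequence(cap, n_frames=30)  # 30 fps, 1 character/s
```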
Step 1.5: the human-computer interaction module sends the closed contour curve of step 1.2 and the Chinese character pronunciation image sequences captured by the head-mounted camera in step 1.4 to the lip contour positioning module.
Step 1.6: the lip contour positioning module locates the lip contour on each frame of the Chinese character pronunciation image sequence in turn, using the lip tracking algorithm proposed by Lü Kun et al. in the document "Lip tracking algorithm based on a convolutional virtual electrostatic field Snake model", obtains the lip contour curves, and outputs them to the geometric vector acquisition module. When locating the lip contour of the first image of each Chinese character pronunciation image sequence, the initial curve of the convolutional virtual electrostatic field Snake model is the closed contour curve provided by the human-computer interaction module; when locating the lip contours of the remaining images, the initial curve is the lip positioning result curve of the preceding image.
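The per-frame initialization scheme of step 1.6 can be written as a short loop. In the sketch below, `snake_fit(image, init_curve) -> contour` is a hypothetical stand-in for the convolutional virtual electrostatic field Snake fit of Lü Kun et al.; its internals are in the cited document, not here.

```python
def track_lip_contours(frames, init_curve, snake_fit):
    """Step 1.6 in outline: locate the lip contour on every frame of one
    Chinese character pronunciation image sequence. The first frame is
    initialised with the closed contour curve from the human-computer
    interaction module, every later frame with the previous result."""
    contours = []
    curve = init_curve
    for frame in frames:
        curve = snake_fit(frame, curve)  # result seeds the next frame
        contours.append(curve)
    return contours
```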
Step 1.7: the geometric vector acquisition module sequentially gets the data fromObtaining lip geometric characteristic vector from lip contour curve of each frame image in Chinese character pronunciation image sequence, and using giI represents the sequence number of each frame image in a Chinese character pronunciation image sequence, i is more than or equal to 1 and less than or equal to n, and i is a positive integer; and in order to compensate lip shape difference and image scaling difference among different testees, lip geometric characteristic vector g is subjected toiCarrying out normalization operation to obtain lip geometric feature vector after normalization operation, and using gi' represents; and then outputting the lip geometric feature vector after the normalization operation to a motion vector acquisition module and a feature matrix construction module. The specific operation steps for obtaining the lip geometric feature vector after the normalization operation are as follows:
step 1.7.1: compute the extreme points of the lip contour curve in the horizontal direction to obtain the coordinates of the left and right mouth corners.
Step 1.7.2: connecting the left and right nozzle corner points by a straight line, taking the midpoint of the left and right nozzle corner points as the center of a circle, and taking the center of the circle as a point O, and rotating the straight line clockwise for 5 times, wherein the rotation is 30 degrees each time; two line segments of which the straight line intersects the lip-shaped curve are obtained every time the lip-shaped curve rotates once, and 12 line segments are obtained in total, and L is respectively used in the clockwise sequence from the left mouth corner1~L12The length of the 12 line segments is expressed, and the length L of the 12 line segments is called1~L12Is a radiation vector; when a straight line between the two points at the left and right mouth corners is rotated by 90 degrees, an upper intersection point and a lower intersection point intersecting the lip-shaped curve become a point a and a point B, respectively.
Step 1.7.3: selecting one point from two points of the left and right mouth corners, namely the point Q, and respectively connecting the point Q with a point A and a point B by straight lines; angle AQO is theta1Indicating that angle BQO is theta2Is represented by L1~L12To obtain theta1And theta2To thereby obtain theta1And theta2Cosine value of (d);
step 1.7.4: L_1 to L_12 and the cosines of θ_1 and θ_2 form the lip geometric feature vector of one frame. Since L_1 and L_7 are each half the length of the line connecting the left and right mouth corners, their values are equal, so L_7 is removed from the lip geometric feature vector; that is, the lip geometric feature vector of one frame is g_i = [L_1, …, L_6, L_8, …, L_12, cos θ_1, cos θ_2];
Step 1.7.5: to compensate lip shape difference and image scaling difference between different testees, lip geometric feature vector g is subjected toiCarrying out normalization operation to obtain lip geometric feature vector after normalization operation, and using gi' represents; gi' is a 13-dimensional transverse vector, gi′=[L1′,…,L6′,L8′,…L12′,cosθ1,cosθ2](ii) a Wherein,
Figure BSA00000359669100061
j=1,2,…6,8,…,12,
Figure BSA00000359669100062
is the distance between the left and right corners of the mouth in the first frame of image of a sequence of Chinese character pronunciation images.
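A sketch of steps 1.7.1 to 1.7.5 follows, assuming the lip contour is an (N, 2) NumPy array of (x, y) points. Approximating each ray/contour intersection by the contour point nearest in polar angle around O, and taking the 4th and 10th radial directions as the 90-degree segments through A and B, are assumptions of this sketch rather than prescriptions of the patent.

```python
import numpy as np

def lip_geometric_vector(contour, d1):
    """Return the normalised 13-dimensional vector g_i' of one frame."""
    contour = np.asarray(contour, dtype=float)
    left = contour[np.argmin(contour[:, 0])]     # left mouth corner (step 1.7.1)
    right = contour[np.argmax(contour[:, 0])]    # right mouth corner
    O = (left + right) / 2.0                     # rotation centre (step 1.7.2)

    rel = contour - O
    ang = np.arctan2(rel[:, 1], rel[:, 0])       # polar angle of each point
    dist = np.hypot(rel[:, 0], rel[:, 1])        # distance of each point to O

    base = np.arctan2(left[1] - O[1], left[0] - O[0])
    L = []
    for k in range(12):                          # 6 line positions x 2 ends
        theta = base + np.pi * k / 6.0
        diff = np.angle(np.exp(1j * (ang - theta)))  # wrapped angle difference
        L.append(dist[np.argmin(np.abs(diff))])  # nearest-angle approximation
    L = np.asarray(L)                            # radial lengths L_1..L_12

    # A and B: intersections of the 90-degree line (indices are assumptions).
    A_len, B_len = L[3], L[9]
    half = L[0]                                  # |QO|, half corner distance
    cos1 = half / np.hypot(half, A_len)          # cos of angle AQO (step 1.7.3)
    cos2 = half / np.hypot(half, B_len)          # cos of angle BQO

    lengths = np.r_[L[:6], L[7:]] / d1           # drop L_7, normalise (1.7.5)
    return np.concatenate([lengths, [cos1, cos2]])
```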
Step 1.8: the motion vector acquisition module constructs lip motion characteristic vectors (p is used) of each frame of image on the basis of the lip geometric characteristic vectors subjected to normalization operationiRepresents) p)iIs a 13-dimensional transverse vector, pi=(gi′-gi-1')/Δ t, wherein g0′=g1', Δ t is the time interval of two consecutive frames; then the lip movement characteristic vector piOutputting the data to a feature matrix construction module;
step 1.9: the feature matrix construction module constructs the feature matrix of each Chinese character pronunciation image sequence of the training data, denoted Z_f, where f is the index of the training sequence, 1 ≤ f ≤ m and f is a positive integer, and outputs the feature matrices Z_f to the transformation matrix T acquisition module and the transformed feature matrix acquisition module. The specific steps for constructing the feature matrix of a Chinese character pronunciation image sequence are as follows:
step 1.9.1: for each frame of the Chinese character pronunciation image sequence in turn, concatenate the lip geometric feature vector and the lip motion feature vector into a joint feature vector, denoted v_i; v_i is a 26-dimensional column vector, v_i = [g_i′, p_i]^T.
step 1.9.2: the feature matrix of the Chinese character pronunciation image sequence is formed by combining the joint feature vectors v_i of all frames, so that the feature matrix of a training sequence is Z_f = {v_1, v_2, …, v_n} ∈ R^{26×n}.
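Steps 1.8 and 1.9 amount to a few lines of array manipulation; the sketch below stacks the normalized geometric vectors and the motion vectors derived from them into the 26 × n feature matrix Z (NumPy; the helper name is illustrative).

```python
import numpy as np

def sequence_feature_matrix(G, dt):
    """Build the 26 x n feature matrix Z of one pronunciation sequence.

    G is an (n, 13) array whose rows are the normalised geometric vectors
    g_i'; motion vectors follow step 1.8, p_i = (g_i' - g_{i-1}')/dt with
    g_0' = g_1', and each column of Z is the joint vector v_i = [g_i', p_i]^T."""
    G = np.asarray(G, dtype=float)
    prev = np.vstack([G[:1], G[:-1]])        # shift by one frame, g_0' = g_1'
    P = (G - prev) / dt                      # lip motion vectors p_i
    return np.hstack([G, P]).T               # shape (26, n)

# Z_f = sequence_feature_matrix(G, dt=1.0 / 30)  # 30 fps in the embodiment
```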
Step 1.10: the transformation matrix T acquisition module processes the feature matrices Z_f of the m Chinese character pronunciation image sequences of the training data with the canonical correlation discriminant analysis method proposed by T.-K. Kim et al. to obtain the transformation matrix T ∈ R^{26×r}, where r < 26, r is a positive integer and R denotes the set of real numbers; the transformation matrix T is stored in memory A.
Step 1.11: the transformed feature matrix acquisition module reads the transformation matrix T from memory A and uses it to transform the feature matrices Z_f of the Chinese character pronunciation image sequences of the training data in turn, obtaining the transformed feature matrices Z_f′ = T^T Z_f, which are stored in memory A.
Through the operation of the steps, the training of the automatic lip language recognition system can be completed.
The working flow of the system testing process is as follows:
step 2.1: select m′ Chinese characters from the m Chinese characters of the training data as test data, where m′ ≤ m and m′ is a positive integer.
Step 2.2: the human-computer interaction module displays a closed contour curve.
Step 2.3: the subject fixes the head-mounted camera on the head and adjusts its position so that the camera directly captures the lower half of the subject's face; the captured image is sent to the human-computer interaction module for display; the subject then adjusts the camera position again so that the lip region falls inside the closed contour curve described in step 2.2.
Step 2.4: the subject pronounces the m′ Chinese characters of step 2.1 at a rate of 1 Chinese character per second while the head-mounted camera shoots at n frames per second; the video stream of each Chinese character pronunciation therefore consists of an n-frame image sequence, called a Chinese character pronunciation image sequence; the head-mounted camera sends the captured Chinese character pronunciation image sequences to the human-computer interaction module.
Step 2.5: the human-computer interaction module sends the closed contour curve of step 2.2 and the Chinese character pronunciation image sequences of step 2.4 to the lip contour positioning module.
Step 2.6: the same as step 1.6 in the system training process.
Step 2.7: the same as step 1.7 in the system training process.
Step 2.8: the same as step 1.8 in the system training process.
Step 2.9: feature matrix construction module constructs feature matrix (using Z) of Chinese character pronunciation image sequence of test dataeRepresenting, wherein e represents the sequence number of the Chinese character pronunciation image sequence of the test data, e is more than or equal to 1 and less than or equal to m' and e is a positive integer), and then testing the characteristic matrix Z of the Chinese character pronunciation image sequence of the test dataeAnd outputting the data to a conversion feature matrix acquisition module. Specific method for constructing feature matrix of Chinese character pronunciation image sequenceThe operation steps are as follows:
step 2.9.1: the following operations are sequentially carried out on each frame image in the Chinese character pronunciation image sequence: connecting the lip geometric characteristic vector with the lip movement characteristic vector to form a combined characteristic vector vi,viIs a 26-dimensional column vector and,
Figure BSA00000359669100081
step 2.9.2: the feature matrix of the Chinese character pronunciation image sequence is composed of the joint feature vector v of each frame image in the Chinese character pronunciation image sequenceiCombined so that the feature matrix Z of the phonetic image sequence of Chinese characters of the test datae={v1,v2,...,vn}∈R26×n
Step 2.10: the conversion characteristic matrix acquisition module reads the transformation matrix T from the memory A and uses the transformation matrix T to test the characteristic matrix Z of the Chinese character pronunciation image sequence of the dataeConverting to obtain a conversion characteristic matrix Ze′=TTZeAnd converting the character feature matrix Z of the Chinese character pronunciation image sequence of the test datae' store to memory B.
Step 2.11: the typical correlation discriminant analysis module reads a conversion feature matrix Z of all training data from a memory Af' reading the conversion characteristic matrix Z of the Chinese character pronunciation image sequence of the current test data from the memory BeKim et al then calculates the transformation feature matrix Z of the test data Using a typical correlation discriminant Analysis method Set forth in the document "characterization Learning And correlation of Image Set Classes Using Canonica Correlations" (IEEE Transactions On Pattern Analysis And Machine Analysis, Vol.29, No.6(2007))e' conversion feature matrix Z with each training datafThe sum of typical correlation coefficients of'; because repeated Chinese characters may exist in the training data, the sum of typical correlation coefficients corresponding to the same Chinese character is 1 or more than 1, so thatAnd further calculating the average value of the typical correlation coefficient sum corresponding to each Chinese character in the training data, taking out the maximum value from the average values, and outputting the Chinese character corresponding to the maximum value in the training data to the man-machine interaction module.
Step 2.12: the man-machine interaction module displays the Chinese characters transmitted by the typical relevant discriminant analysis module.
Through the steps, the automatic identification of the test data can be completed.
Advantageous effects
Compared with traditional Chinese automatic lip language recognition systems, the invention has the following advantages:
First, the invention uses a head-mounted camera to acquire the lip image sequence directly; the camera position is adjusted interactively when the experiment begins, and the relative position of camera and face stays fixed during the experiment, so the subject can pronounce the Chinese characters naturally without deliberately holding the head pose and position. Compared with previous methods, this acquires the lip image sequence accurately, greatly reduces the early-stage computation, lessens the constraints on the subject and makes the experimental process more natural.
Second, the invention uses the convolutional virtual electrostatic field Snake model to locate the lip contour, which is faster and more accurate.
Third, the lip language features extracted by the invention combine lip geometric features with lip motion features, making the analysis more accurate.
Fourth, the invention applies the canonical correlation discriminant analysis method to automatic lip language recognition for the first time, overcoming the limitations of the hidden Markov model in language recognition.
Drawings
Fig. 1 is a schematic structural diagram of the automatic lip language recognition system for Chinese according to an embodiment of the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments.
An automatic lip language recognition system suitable for Chinese, whose structure is shown in Fig. 1, comprises: a head-mounted camera, a human-computer interaction module, a lip contour positioning module, a geometric vector acquisition module, a motion vector acquisition module, a feature matrix construction module, a transformation matrix T acquisition module, a transformed feature matrix acquisition module, a memory A, a memory B and a canonical correlation discriminant analysis module.
The connection relationship is as follows: the output of the head-mounted camera is connected to the input of the human-computer interaction module; the output of the human-computer interaction module is connected to the input of the lip contour positioning module; the output of the lip contour positioning module is connected to the input of the geometric vector acquisition module; the output of the geometric vector acquisition module is connected to the inputs of the motion vector acquisition module and the feature matrix construction module; the output of the motion vector acquisition module is connected to the input of the feature matrix construction module; the output of the feature matrix construction module is connected to the inputs of the transformation matrix T acquisition module and the transformed feature matrix acquisition module; the transformation matrix T acquisition module is connected to memory A; the transformed feature matrix acquisition module is connected to memory A and memory B; memory A and memory B are also connected to the input of the canonical correlation discriminant analysis module; and the output of the canonical correlation discriminant analysis module is connected to the input of the human-computer interaction module.
The main functions of each module and device are as follows:
The main functions of the head-mounted camera are: acquiring the Chinese character pronunciation image sequences produced by the subject.
The main functions of the human-computer interaction module are: first, providing a closed contour curve with which the subject adjusts the position of the head-mounted camera so that the subject's lip region captured by the camera falls inside the closed contour curve; second, acquiring the Chinese character pronunciation image sequences shot by the head-mounted camera; third, outputting the result of the canonical correlation discriminant analysis module.
The main functions of the lip contour positioning module are: locating the lip contour on each frame of a Chinese character pronunciation image sequence in turn, using the lip tracking algorithm proposed by Lü Kun et al. in the document "Lip tracking algorithm based on a convolutional virtual electrostatic field Snake model", to obtain lip contour curves, and outputting the lip contour curves to the geometric vector acquisition module.
The main functions of the geometric vector acquisition module are: obtaining a lip geometric feature vector from the lip contour curve of each frame of the Chinese character pronunciation image sequence output by the lip contour positioning module; and, to compensate for lip-shape differences between subjects and image scaling differences, normalizing the lip geometric feature vectors and outputting the normalized lip geometric feature vectors to the motion vector acquisition module and the feature matrix construction module.
The main functions of the motion vector acquisition module are: constructing the lip motion feature vector of each frame on the basis of the normalized lip geometric feature vectors, and then outputting the lip motion feature vectors to the feature matrix construction module.
The main functions of the feature matrix construction module are: constructing the feature matrix of a Chinese character pronunciation image sequence, and then outputting it to the transformation matrix T acquisition module and the transformed feature matrix acquisition module.
The main functions of the transformation matrix T acquisition module are: processing the feature matrices of the Chinese character pronunciation image sequences of the training data with the canonical correlation discriminant analysis method proposed by T.-K. Kim et al. in the document "Discriminative Learning and Recognition of Image Set Classes Using Canonical Correlations" (IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 29, No. 6, 2007) to obtain the transformation matrix T, and storing T in memory A.
The main functions of the transformed feature matrix acquisition module are: transforming the feature matrices of the Chinese character pronunciation image sequences of the training data in turn with the transformation matrix T to obtain transformed feature matrices, and storing the transformed feature matrices of the training data in memory A.
Memory A: stores the transformation matrix T and the transformed feature matrices of the Chinese character pronunciation image sequences of the training data.
Memory B: stores the transformed feature matrices of the Chinese character pronunciation image sequences of the test data.
The canonical correlation discriminant analysis module: obtains from memory B the transformed feature matrix of the current test data, computes the sum of canonical correlation coefficients between it and the transformed feature matrix of each training datum in memory A, further processes these sums to obtain the recognition result for the current test data, and outputs the result to the human-computer interaction module.
The system was used to carry out experiments. 10 subjects (4 male, 6 female) were selected; each pronounced the 10 Chinese characters "zero, one, two, three, four, five, I, love, Bei, Jing" 20 times each, so 200 Chinese character pronunciation image sequences were obtained per character. For each Chinese character, 80% (160) of its 200 sequences were then randomly selected as training data and the remaining 20% (40) served as test data, giving 1600 training data and 400 test data in total.
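The 80/20 split described above is straightforward to reproduce; the sketch below assumes the 200 sequences of each Chinese character are grouped in a dictionary keyed by character, and the fixed seed is only for repeatability.

```python
import random

def split_per_character(sequences_by_char, n_train=160, seed=0):
    """For each Chinese character, randomly keep n_train of its 200
    sequences for training and the rest for testing (seed is an
    assumption of this sketch, for repeatability)."""
    rng = random.Random(seed)
    train, test = [], []
    for ch, seqs in sequences_by_char.items():
        seqs = list(seqs)
        rng.shuffle(seqs)
        train += [(ch, s) for s in seqs[:n_train]]
        test += [(ch, s) for s in seqs[n_train:]]
    return train, test   # 1600 training and 400 test items for 10 x 200
```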
The steps for obtaining 2000 Chinese character pronunciation image sequences are as follows:
step 1: the human-computer interaction module displays a closed contour curve.
Step 2: the 10 subjects in turn fix the head-mounted camera on the head; each subject adjusts the camera position so that it directly captures the lower half of the subject's face, the captured image is sent to the human-computer interaction module for display, and the subject adjusts the camera position again so that the lip region falls inside the closed contour curve described in step 1.
Step 3: each subject pronounces the 10 Chinese characters "zero, one, two, three, four, five, I, love, Bei, Jing" at a rate of 1 Chinese character per second, pronouncing each character 20 times, while the head-mounted camera shoots at 30 frames per second, so the video stream of each Chinese character pronunciation consists of a 30-frame image sequence; a 30-frame image sequence of one Chinese character is called a Chinese character pronunciation image sequence.
Through the above steps, 2000 Chinese character pronunciation image sequences of the 10 Chinese characters are obtained.
Then the experimenter trains the system with the 1600 randomly selected Chinese character pronunciation image sequences as training data; the process is as follows:
step 1: and sending the closed contour curve appearing in the man-machine interaction module and 1600 Chinese character pronunciation image sequences to the lip contour positioning module.
Step 2: the lip contour positioning module uses a lip tracking algorithm proposed by Lukun et al in the literature 'lip tracking algorithm based on convolution virtual electrostatic field Snake model' to sequentially position the lip contour on each frame of image in the Chinese character pronunciation image sequence to obtain a lip contour curve, and outputs the lip contour curve to the geometric vector acquisition module. When the lip outline of the first image in each Chinese character pronunciation image sequence is positioned, the initial curve of the convolution virtual electrostatic field Snake model adopts a closed outline curve provided by a man-machine interaction module; when the lip contours of other images in the Chinese character pronunciation image sequence are positioned, the initial curve of the convolution virtual electrostatic field Snake model adopts the lip positioning result curve of the previous image of the images.
Step 3: the geometric vector acquisition module obtains the lip geometric feature vectors, denoted g_1 to g_30, from the lip contour curve of each frame of the Chinese character pronunciation image sequence in turn; to compensate for lip-shape differences between subjects and image scaling differences, g_1 to g_30 are normalized to give the normalized lip geometric feature vectors g_1′ to g_30′; the normalized lip geometric feature vectors are then output to the motion vector acquisition module and the feature matrix construction module. The specific steps for obtaining the normalized lip geometric feature vectors are as follows:
Step 3.1: compute the extreme points of the lip contour curve in the horizontal direction to obtain the coordinates of the left and right mouth corners.
Step 3.2: connect the left and right mouth corner points with a straight line and take the midpoint of this line, denoted point O, as the center of rotation; rotate the line clockwise 5 times, by 30 degrees each time. The initial line and each rotated position yield the two line segments from point O to the line's intersection points with the lip contour curve, giving 12 segments in total; their lengths are denoted L_1 to L_12 in clockwise order starting from the left mouth corner, and the lengths L_1 to L_12 are called the radial vector. When the line through the two mouth corners has been rotated by 90 degrees, its upper and lower intersection points with the lip contour curve are denoted point A and point B, respectively.
Step 3.3: the left mouth corner is denoted point Q, and Q is connected to point A and to point B with straight lines; angle AQO is denoted θ_1 and angle BQO is denoted θ_2; θ_1 and θ_2 are obtained from L_1 to L_12, and thus the cosines of θ_1 and θ_2 are obtained;
Step 3.4: L_1 to L_12 and the cosines of θ_1 and θ_2 form the lip geometric feature vector of one frame. Since L_1 and L_7 are each half the length of the line connecting the left and right mouth corners, their values are equal, so L_7 is removed from the lip geometric feature vector; that is, the lip geometric feature vector of one frame is g_i = [L_1, …, L_6, L_8, …, L_12, cos θ_1, cos θ_2], i = 1, 2, …, 30;
Step 3.5: to compensate for lip-shape differences between subjects and image scaling differences, normalize g_i to obtain the normalized lip geometric feature vector g_i′; g_i′ is a 13-dimensional row vector, g_i′ = [L_1′, …, L_6′, L_8′, …, L_12′, cos θ_1, cos θ_2], where L_j′ = L_j / d_1 for j = 1, 2, …, 6, 8, …, 12, and d_1 denotes the distance between the left and right mouth corners in the first frame of the Chinese character pronunciation image sequence.
Step 4: the motion vector acquisition module constructs the lip motion feature vector p_i of each frame on the basis of the normalized lip geometric feature vectors; p_i is a 13-dimensional row vector, p_i = (g_i′ − g_{i−1}′)/Δt, where g_0′ = g_1′ and Δt is the time interval between two consecutive frames; the lip motion feature vectors p_i are then output to the feature matrix construction module.
and 5: feature matrix construction module constructs feature matrix Z of Chinese character pronunciation image sequence of training datafF is 1, 2, …, 1600, and then training the feature matrix Z of the Chinese character pronunciation image sequence of the datafAnd respectively outputting the data to a transformation matrix T acquisition module and a conversion characteristic matrix acquisition module. The specific operation steps for constructing the feature matrix of the Chinese character pronunciation image sequence are as follows:
step 5.1: in turn, theThe following operations are carried out on each frame image in the Chinese character pronunciation image sequence: connecting the lip geometric characteristic vector with the lip movement characteristic vector to form a combined characteristic vector vi,viIs a 26-dimensional column vector and,
Figure BSA00000359669100133
step 5.2: the feature matrix of the Chinese character pronunciation image sequence is composed of the joint feature vector v of each frame image in the Chinese character pronunciation image sequenceiCombined so that the feature matrix Z of the image sequence of the pronunciation of Chinese characters of the training dataf={v1,v2,...,vn}∈R26×30
Step 1.6: transformation matrix T obtains the characteristic matrix Z of the Chinese character pronunciation image sequence of the module to 1600 training datafAnd processing by adopting a typical correlation discriminant analysis method proposed by T.
Step 1.7: the conversion characteristic matrix acquisition module reads the transformation matrix T from the memory A and uses the transformation matrix T to sequentially compare the characteristic matrix Z of the Chinese character pronunciation image sequence of the training datafConverting to obtain a conversion characteristic matrix Zf′=TTZfAnd training the conversion characteristic matrix Z of the Chinese character pronunciation image sequence of the dataf' store to memory a.
Through the operation of the steps, the training of the automatic lip language recognition system can be completed.
After the automatic lip language recognition system is trained, an experimenter uses 400 pieces of test data to test the system, and the process is as follows:
step 1: and sending the closed contour curve appearing in the human-computer interaction module and the 400 Chinese character pronunciation image sequences to the lip contour positioning module.
Step 2: the lip contour positioning module uses a lip tracking algorithm proposed by Lukun et al in the literature 'lip tracking algorithm based on convolution virtual electrostatic field Snake model' to sequentially position the lip contour on each frame of image in the Chinese character pronunciation image sequence to obtain a lip contour curve, and outputs the lip contour curve to the geometric vector acquisition module. When the lip outline of the first image in each Chinese character pronunciation image sequence is positioned, the initial curve of the convolution virtual electrostatic field Snake model adopts a closed outline curve provided by a man-machine interaction module; when the lip contours of other images in the Chinese character pronunciation image sequence are positioned, the initial curve of the convolution virtual electrostatic field Snake model adopts the lip positioning result curve of the previous image of the images.
Step 3: the geometric vector acquisition module obtains the lip geometric feature vectors g_1 to g_30 from the lip contour curve of each frame of the Chinese character pronunciation image sequence in turn; to compensate for lip-shape differences between subjects and image scaling differences, g_1 to g_30 are normalized to give the normalized lip geometric feature vectors g_1′ to g_30′; the normalized lip geometric feature vectors are then output to the motion vector acquisition module and the feature matrix construction module. The specific steps for obtaining the normalized lip geometric feature vectors are as follows:
Step 3.1: compute the extreme points of the lip contour curve in the horizontal direction to obtain the coordinates of the left and right mouth corners.
Step 3.2: connect the left and right mouth corner points with a straight line and take the midpoint of this line, denoted point O, as the center of rotation; rotate the line clockwise 5 times, by 30 degrees each time. The initial line and each rotated position yield the two line segments from point O to the line's intersection points with the lip contour curve, giving 12 segments in total; their lengths are denoted L_1 to L_12 in clockwise order starting from the left mouth corner, and the lengths L_1 to L_12 are called the radial vector. When the line through the two mouth corners has been rotated by 90 degrees, its upper and lower intersection points with the lip contour curve are denoted point A and point B, respectively.
Step 3.3: the left mouth corner is denoted point Q, and Q is connected to point A and to point B with straight lines; angle AQO is denoted θ_1 and angle BQO is denoted θ_2; θ_1 and θ_2 are obtained from L_1 to L_12, and thus the cosines of θ_1 and θ_2 are obtained;
Step 3.4: L_1 to L_12 and the cosines of θ_1 and θ_2 form the lip geometric feature vector of one frame. Since L_1 and L_7 are each half the length of the line connecting the left and right mouth corners, their values are equal, so L_7 is removed from the lip geometric feature vector; that is, the lip geometric feature vector of one frame is g_i = [L_1, …, L_6, L_8, …, L_12, cos θ_1, cos θ_2], i = 1, 2, …, 30;
Step 3.5: to compensate for lip-shape differences between subjects and image scaling differences, normalize g_i to obtain the normalized lip geometric feature vector g_i′; g_i′ is a 13-dimensional row vector, g_i′ = [L_1′, …, L_6′, L_8′, …, L_12′, cos θ_1, cos θ_2], where L_j′ = L_j / d_1 for j = 1, 2, …, 6, 8, …, 12, and d_1 denotes the distance between the left and right mouth corners in the first frame of the Chinese character pronunciation image sequence.
Step 4: the motion vector acquisition module constructs the lip motion feature vector p_i of each frame on the basis of the normalized lip geometric feature vectors; p_i is a 13-dimensional row vector, p_i = (g_i′ − g_{i−1}′)/Δt, where g_0′ = g_1′ and Δt is the time interval between two consecutive frames; the lip motion feature vectors p_i are then output to the feature matrix construction module.
Step 5: the feature matrix construction module constructs the feature matrix Z_e of each Chinese character pronunciation image sequence of the test data, e = 1, 2, …, 400, and outputs the feature matrices Z_e to the transformed feature matrix acquisition module. The specific steps for constructing the feature matrix of a Chinese character pronunciation image sequence are as follows:
Step 5.1: for each frame of the Chinese character pronunciation image sequence in turn, concatenate the lip geometric feature vector and the lip motion feature vector into a joint feature vector v_i; v_i is a 26-dimensional column vector, v_i = [g_i′, p_i]^T.
Step 5.2: the feature matrix of the Chinese character pronunciation image sequence is formed by combining the joint feature vectors v_i of all frames, so that the feature matrix of a test sequence is Z_e = {v_1, v_2, …, v_30} ∈ R^{26×30}.
Step 6: the transformed feature matrix acquisition module reads the transformation matrix T from memory A and uses it to transform the feature matrix Z_e of each Chinese character pronunciation image sequence of the test data, obtaining the transformed feature matrices Z_e′ = T^T Z_e, which are stored in memory B.
Step 7: the canonical correlation discriminant analysis module reads the transformed feature matrices Z_f′ of all training data from memory A and the transformed feature matrix Z_e′ of the current test data from memory B, then computes, with the canonical correlation discriminant analysis method proposed by T.-K. Kim et al., the sum of canonical correlation coefficients between Z_e′ and the transformed feature matrix Z_f′ of each training datum. Because the training data contain repeated Chinese characters, one or more such sums correspond to the same character; the module therefore computes the average of the sums corresponding to each Chinese character in the training data, takes the maximum of these averages, and outputs the Chinese character corresponding to that maximum to the human-computer interaction module.
Step 8: the human-computer interaction module displays the Chinese character transmitted by the canonical correlation discriminant analysis module.
Through the above steps, automatic recognition of the test data is completed; the recognition accuracy of the system is shown in the second column of Table 1. To illustrate the effect of the invention, 2 further experiments were performed:
1. With the same experimental environment, training data and test data, the convolutional virtual electrostatic field Snake model used in the invention was replaced by the traditional Snake model, with all other functions unchanged; the resulting recognition accuracy is shown in the third column of Table 1.
2. With the same experimental environment, training data and test data, the canonical correlation discriminant analysis method used in the invention was replaced by a Continuous Hidden Markov Model (CHMM), with all other functions unchanged; the resulting recognition accuracy is shown in the fourth column of Table 1.
TABLE 1. Comparison of recognition accuracy (%) of the different methods

Chinese character   Invention   Traditional Snake   CHMM
"zero"              90.0        73.5                88.5
"one"               92.0        75.0                90.5
"two"               86.5        76.0                83.0
"three"             93.0        81.5                92.5
"four"              95.0        83.0                95.5
"five"              89.5        73.0                91.0
"I"                 96.0        82.0                95.0
"love"              97.0        82.5                95.5
"Bei"               93.5        81.5                94.0
"Jing"              90.0        75.5                88.0
The experiments show that the system provided by the invention has a higher recognition accuracy.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art may make several modifications, or substitute equivalents for some of its features, without departing from the scope of the present invention, and such modifications and substitutions shall also fall within the protection scope of the present invention.

Claims (1)

1. An automatic lip language recognition system suitable for Chinese, comprising: a head-mounted camera, a human-computer interaction module, a lip contour positioning module, a geometric vector acquisition module, a motion vector acquisition module, a feature matrix construction module, a transformation matrix T acquisition module, a transformed feature matrix acquisition module, a memory A, a memory B and a canonical correlation discriminant analysis module;
the connection relationship is as follows: the output of the head-mounted camera is connected to the input of the human-computer interaction module; the output of the human-computer interaction module is connected to the input of the lip contour positioning module; the output of the lip contour positioning module is connected to the input of the geometric vector acquisition module; the output of the geometric vector acquisition module is connected to the inputs of the motion vector acquisition module and the feature matrix construction module; the output of the motion vector acquisition module is connected to the input of the feature matrix construction module; the output of the feature matrix construction module is connected to the inputs of the transformation matrix T acquisition module and the transformed feature matrix acquisition module; the transformation matrix T acquisition module is connected to memory A; the transformed feature matrix acquisition module is connected to memory A and memory B; memory A and memory B are also connected to the input of the canonical correlation discriminant analysis module; and the output of the canonical correlation discriminant analysis module is connected to the input of the human-computer interaction module;
the main functions of each module and device are as follows:
the main functions of the head-mounted camera are: acquiring the Chinese character pronunciation image sequences produced by the subject;
the main functions of the human-computer interaction module are: first, providing a closed contour curve with which the subject adjusts the position of the head-mounted camera so that the subject's lip region captured by the camera falls inside the closed contour curve; second, acquiring the Chinese character pronunciation image sequences shot by the head-mounted camera; third, outputting the result of the canonical correlation discriminant analysis module;
the main functions of the lip contour positioning module are: locating the lip contour on each frame of a Chinese character pronunciation image sequence in turn, using the lip tracking algorithm proposed by Lü Kun et al. in the document "Lip tracking algorithm based on a convolutional virtual electrostatic field Snake model", to obtain lip contour curves, and outputting the lip contour curves to the geometric vector acquisition module;
the main functions of the geometric vector acquisition module are: obtaining a lip geometric feature vector from the lip contour curve of each frame of the Chinese character pronunciation image sequence output by the lip contour positioning module; and, to compensate for lip-shape differences between subjects and image scaling differences, normalizing the lip geometric feature vectors and outputting the normalized lip geometric feature vectors to the motion vector acquisition module and the feature matrix construction module;
the main functions of the motion vector acquisition module are: constructing the lip motion feature vector of each frame on the basis of the normalized lip geometric feature vectors, and then outputting the lip motion feature vectors to the feature matrix construction module;
the main functions of the feature matrix construction module are: constructing the feature matrix of a Chinese character pronunciation image sequence, and then outputting it to the transformation matrix T acquisition module and the transformed feature matrix acquisition module;
the main functions of the transformation matrix T acquisition module are: processing the feature matrices of the Chinese character pronunciation image sequences of the training data with the canonical correlation discriminant analysis method proposed by T.-K. Kim et al. in the document "Discriminative Learning and Recognition of Image Set Classes Using Canonical Correlations" to obtain the transformation matrix T, and storing the transformation matrix T in memory A;
the main functions of the transformed feature matrix acquisition module are: transforming the feature matrices of the Chinese character pronunciation image sequences of the training data in turn with the transformation matrix T to obtain transformed feature matrices, and storing the transformed feature matrices of the training data in memory A;
memory A: stores the transformation matrix T and the transformed feature matrices of the Chinese character pronunciation image sequences of the training data;
memory B: stores the transformed feature matrices of the Chinese character pronunciation image sequences of the test data;
the canonical correlation discriminant analysis module: obtains from memory B the transformed feature matrix of the current test data, computes the sum of canonical correlation coefficients between it and the transformed feature matrix of each training datum in memory A, further processes these sums to obtain the recognition result for the current test data, and outputs the recognition result to the human-computer interaction module;
the working process of the automatic lip language recognition system comprises a system training process and a system testing process:
the working flow of the system training process is as follows:
step 1.1: selecting m Chinese characters as training data, where m ≥ 5 and m is a positive integer;
step 1.2: the man-machine interaction module displays a closed contour curve;
step 1.3: the subject fixes the head-mounted camera on the head and adjusts its position so that the camera directly captures the lower half of the subject's face; the captured image is sent to the man-machine interaction module for display; the subject then adjusts the position of the head-mounted camera again so that the subject's lip region is contained in the closed contour curve described in step 1.2;
step 1.4: the subject pronounces the m Chinese characters of step 1.1 at a rate of 1 Chinese character per second, while the head-mounted camera shoots at n frames per second, where n ≥ 25 and n is a positive integer; the video stream of each Chinese character pronunciation therefore consists of an n-frame image sequence, and the n-frame image sequence of one Chinese character is called a Chinese character pronunciation image sequence; the head-mounted camera sends the captured Chinese character pronunciation image sequences to the man-machine interaction module;
step 1.5: the man-machine interaction module sends the closed contour curve of step 1.2 and the Chinese character pronunciation image sequences shot by the head-mounted camera in step 1.4 to the lip contour positioning module;
step 1.6: the lip contour positioning module uses the lip tracking algorithm proposed by Lü Kun et al. in the document "Lip tracking algorithm based on convolution virtual electrostatic field Snake model" to sequentially locate the lip contour on each frame image of the Chinese character pronunciation image sequence, obtaining a lip contour curve, and outputs the lip contour curve to the geometric vector acquisition module; when locating the lip contour of the first image of each Chinese character pronunciation image sequence, the initial curve of the convolution virtual electrostatic field Snake model is the closed contour curve provided by the man-machine interaction module; when locating the lip contours of the other images of the sequence, the initial curve of the convolution virtual electrostatic field Snake model is the lip positioning result curve of the preceding image;
step 1.7: the geometric vector acquisition module sequentially obtains a lip geometric feature vector, denoted g_i, from the lip contour curve of each frame image of a Chinese character pronunciation image sequence, where i denotes the sequence number of the frame image within the sequence, 1 ≤ i ≤ n, and i is a positive integer; to compensate for lip-shape differences and image-scaling differences between different subjects, the lip geometric feature vector g_i is normalized to obtain the normalized lip geometric feature vector, denoted g_i′; the normalized lip geometric feature vectors are then output to the motion vector acquisition module and the feature matrix construction module; the specific operations for obtaining the normalized lip geometric feature vector are as follows:
step 1.7.1: computing the extreme values of the lip contour curve in the horizontal direction to obtain the coordinates of the left and right mouth corner points;
step 1.7.2: connecting the left and right mouth corner points with a straight line and denoting the midpoint of the two corner points as point O; rotating the straight line clockwise about O 5 times, by 30 degrees each rotation; at each position the line intersects the lip contour curve at two points, yielding two line segments from O, so that together with the two segments along the original corner-to-corner line 12 line segments are obtained in total; in clockwise order starting from the left mouth corner their lengths are denoted L1~L12, and the lengths L1~L12 are called the radial vector; when the line connecting the two mouth corner points has been rotated by 90 degrees, its upper and lower intersection points with the lip contour curve are denoted point A and point B respectively;
step 1.7.3: selecting one of the two mouth corner points, denoted point Q, and connecting Q to point A and to point B with straight lines; denoting angle AQO by θ1 and angle BQO by θ2; θ1 and θ2 are obtained from L1~L12, and the cosine values of θ1 and θ2 are then computed;
step 1.7.4: l is1~L12And theta1And theta2The cosine value of the image forms a lip geometric feature vector in a frame of image; due to L1And L7Is half the length of the line connecting the left and right mouth corners, so that their values are equal, thus removing L from the geometric feature vector of the lips7I.e. geometric feature vector of lips in a frame of imagegi=[L1,…,L6,L8,…L12,cosθ1,cosθ2]t
step 1.7.5: to compensate for lip-shape differences and image-scaling differences between different subjects, the lip geometric feature vector g_i is normalized to obtain the normalized lip geometric feature vector, denoted g_i′; g_i′ is the 13-dimensional row vector g_i′ = [L1′, …, L6′, L8′, …, L12′, cosθ1, cosθ2], where L_j′ = L_j / d_0, j = 1, 2, …, 6, 8, …, 12, and d_0 denotes the distance between the left and right mouth corners in the first frame image of the Chinese character pronunciation image sequence;
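As an illustration only, the following Python sketch shows one way the radial-vector extraction of steps 1.7.1 through 1.7.5 could be realized; the function name, the ray-to-contour-point approximation, and the argument d0 (the mouth-corner distance of the first frame, assumed precomputed) are assumptions of this sketch, not part of the claim:

    import numpy as np

    def lip_geometric_vector(contour, d0):
        """Illustrative reading of steps 1.7.1-1.7.5: 12 radial lengths from
        the mouth-corner midpoint O (L7 dropped), plus cos(theta1) and
        cos(theta2), normalized by d0, the mouth-corner distance measured in
        the first frame. `contour` is an (N, 2) float array of lip points."""
        # Step 1.7.1: mouth corners are the horizontal extremes of the contour.
        left = contour[np.argmin(contour[:, 0])]
        right = contour[np.argmax(contour[:, 0])]
        O = (left + right) / 2.0

        # Step 1.7.2: 12 rays from O, 30 degrees apart, clockwise from the
        # left-corner direction; each radial length is approximated by the
        # distance to the contour point whose bearing about O is closest to
        # the ray direction.
        rel = contour - O
        bearing = np.arctan2(rel[:, 1], rel[:, 0])
        dist = np.linalg.norm(rel, axis=1)
        L = np.empty(12)
        for k in range(12):
            target = np.pi - k * np.pi / 6.0          # 180, 150, ..., -150 deg
            wrapped = np.angle(np.exp(1j * (bearing - target)))
            L[k] = dist[np.argmin(np.abs(wrapped))]

        # Step 1.7.3: with Q = left corner, OA (= L4) and OB (= L10) are
        # perpendicular to QO (= L1), so the angles follow by trigonometry.
        theta1 = np.arctan2(L[3], L[0])               # angle AQO
        theta2 = np.arctan2(L[9], L[0])               # angle BQO

        # Steps 1.7.4-1.7.5: drop L7 (it equals L1), scale the lengths by
        # 1/d0, and append the two cosines -> 13-dimensional g_i'.
        lengths = np.delete(L, 6) / d0
        return np.concatenate([lengths, [np.cos(theta1), np.cos(theta2)]])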
step 1.8: the motion vector acquisition module constructs the lip motion feature vector of each frame image, denoted p_i, from the normalized lip geometric feature vectors; p_i is a 13-dimensional row vector, p_i = (g_i′ − g_{i−1}′) / Δt, where g_0′ = g_1′ and Δt is the time interval between two consecutive frames; the lip motion feature vectors p_i are then output to the feature matrix construction module;
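A minimal sketch of the finite difference in step 1.8, assuming the normalized geometric vectors are stacked row-wise in an array G and the frame rate is known (function and argument names are hypothetical):

    import numpy as np

    def lip_motion_vectors(G, fps):
        """Step 1.8 sketch: p_i = (g_i' - g_{i-1}') / dt with g_0' = g_1',
        so the first frame's motion vector is zero. G is an (n, 13) float
        array whose i-th row is g_i'; fps is the camera frame rate n,
        giving dt = 1 / fps."""
        dt = 1.0 / fps
        P = np.zeros_like(G)             # row 0: p_1 = (g_1' - g_0')/dt = 0
        P[1:] = (G[1:] - G[:-1]) / dt
        return P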
step 1.9: the feature matrix construction module constructs the feature matrix of each Chinese character pronunciation image sequence of the training data, denoted Z_f, where f denotes the sequence number of the Chinese character pronunciation image sequence within the training data, 1 ≤ f ≤ m, and f is a positive integer; the feature matrices Z_f of the Chinese character pronunciation image sequences of the training data are then output to the transformation matrix T acquisition module and the converted feature matrix acquisition module; the specific operations for constructing the feature matrix of a Chinese character pronunciation image sequence are as follows:
step 1.9.1: for each frame image of the Chinese character pronunciation image sequence in turn, concatenating the lip geometric feature vector and the lip motion feature vector into a joint feature vector, denoted v_i; v_i is the 26-dimensional column vector v_i = [g_i′, p_i]^T;
step 1.9.2: the feature matrix of the Chinese character pronunciation image sequence is formed by combining the joint feature vectors v_i of all frame images of the sequence, so that the feature matrix of a Chinese character pronunciation image sequence of the training data is Z_f = {v_1, v_2, …, v_n} ∈ R^(26×n);
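The assembly of Z_f in step 1.9 is a plain column-wise stacking; a one-function sketch under the same array conventions as above (names are hypothetical):

    import numpy as np

    def feature_matrix(G, P):
        """Step 1.9 sketch: column i of Z is the joint vector
        v_i = [g_i', p_i]^T, giving Z in R^(26 x n). G and P are (n, 13)
        arrays of geometric and motion feature vectors."""
        return np.concatenate([G, P], axis=1).T      # shape (26, n)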
step 1.10: the transformation matrix T acquisition module processes the feature matrices Z_f of the Chinese character pronunciation image sequences of the m training data with the canonical correlation discriminant analysis method proposed by T.-K. Kim et al. to obtain the transformation matrix T ∈ R^(26×r), r < 26, where r is a positive integer and R denotes the set of real numbers, and stores the transformation matrix T in the memory A;
step 1.11: the converted feature matrix acquisition module reads the transformation matrix T from the memory A and uses it to convert the feature matrices Z_f of the Chinese character pronunciation image sequences of the training data in turn, obtaining the converted feature matrices Z_f′ = T^T Z_f, and stores the converted feature matrices Z_f′ of the Chinese character pronunciation image sequences of the training data in the memory A;
through the above steps, the training of the automatic lip language recognition system is completed;
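Learning the transformation matrix T itself follows the canonical correlation discriminant analysis of T.-K. Kim et al. and is not reproduced here; the sketch below only shows the conversion of step 1.11, with T assumed given as a 26×r array (names are hypothetical):

    import numpy as np

    def convert_feature_matrix(T, Z):
        """Step 1.11 (and step 2.10) sketch: Z' = T^T Z. T is assumed
        given as a (26, r) array, r < 26, learned beforehand with the
        method of T.-K. Kim et al.; Z is a (26, n) feature matrix."""
        return T.T @ Z                               # shape (r, n)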
the working flow of the system testing process is as follows:
step 2.1: selecting m′ Chinese characters from the m Chinese characters of the training data as test data, where m′ ≤ m and m′ is a positive integer;
step 2.2: the man-machine interaction module displays a closed contour curve;
step 2.3: the subject fixes the head-mounted camera on the head and adjusts its position so that the camera directly captures the lower half of the subject's face; the captured image is sent to the man-machine interaction module for display; the subject then adjusts the position of the head-mounted camera again so that the subject's lip region is contained in the closed contour curve described in step 2.2;
step 2.4: the subject pronounces the m′ Chinese characters of step 2.1 at a rate of 1 Chinese character per second, while the head-mounted camera shoots at n frames per second; the video stream of each Chinese character pronunciation therefore consists of an n-frame image sequence, and the n-frame image sequence of one Chinese character is called a Chinese character pronunciation image sequence; the head-mounted camera sends the captured Chinese character pronunciation image sequences to the man-machine interaction module;
step 2.5: the man-machine interaction module sends the closed contour curve of step 2.2 and the Chinese character pronunciation image sequences of step 2.4 to the lip contour positioning module;
step 2.6: the same as the operation of step 1.6 in the system training process;
step 2.7: the same as the operation of step 1.7 in the system training process;
step 2.8: the same as the operation of step 1.8 in the system training process;
step 2.9: the feature matrix construction module constructs the feature matrix of each Chinese character pronunciation image sequence of the test data, denoted Z_e, where e denotes the sequence number of the Chinese character pronunciation image sequence within the test data, 1 ≤ e ≤ m′, and e is a positive integer; the feature matrix Z_e of the Chinese character pronunciation image sequence of the test data is then output to the converted feature matrix acquisition module; the specific operations for constructing the feature matrix of a Chinese character pronunciation image sequence are as follows:
step 2.9.1: for each frame image of the Chinese character pronunciation image sequence in turn, concatenating the lip geometric feature vector and the lip motion feature vector into a joint feature vector v_i; v_i is the 26-dimensional column vector v_i = [g_i′, p_i]^T;
step 2.9.2: the feature matrix of the Chinese character pronunciation image sequence is formed by combining the joint feature vectors v_i of all frame images of the sequence, so that the feature matrix of a Chinese character pronunciation image sequence of the test data is Z_e = {v_1, v_2, …, v_n} ∈ R^(26×n);
step 2.10: the converted feature matrix acquisition module reads the transformation matrix T from the memory A and uses it to convert the feature matrix Z_e of the Chinese character pronunciation image sequence of the test data, obtaining the converted feature matrix Z_e′ = T^T Z_e, and stores the converted feature matrix Z_e′ of the Chinese character pronunciation image sequence of the test data in the memory B;
step 2.11: the canonical correlation discriminant analysis module reads the converted feature matrices Z_f′ of all training data from the memory A and the converted feature matrix Z_e′ of the Chinese character pronunciation image sequence of the current test data from the memory B; it then computes, with the canonical correlation discriminant analysis method proposed by T.-K. Kim et al. in the document "Discriminative Learning and Recognition of Image Set Classes Using Canonical Correlations", the sum of canonical correlation coefficients between Z_e′ and the converted feature matrix Z_f′ of each training datum; because the training data may contain repeated Chinese characters, one or more such sums may correspond to the same Chinese character, so the module further computes the mean of the sums corresponding to each Chinese character of the training data, takes the maximum over these means, and outputs the Chinese character of the training data corresponding to this maximum to the man-machine interaction module;
step 2.12: the man-machine interaction module displays the Chinese character delivered by the canonical correlation discriminant analysis module;
through the above steps, automatic classification and recognition of the test data is completed.
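For illustration, the following sketch implements one standard way to obtain the canonical-correlation score of step 2.11: the canonical correlations between two linear subspaces are the singular values of Qa^T Qb, where Qa and Qb are orthonormal bases of the subspaces. The truncation to d leading components per sequence and all names are assumptions of this sketch; the claim itself defers the exact computation to the cited method of T.-K. Kim et al.:

    import numpy as np

    def subspace_basis(Z, d):
        """Orthonormal basis of the d leading left singular directions of
        a converted feature matrix Z' (the truncation to d is an
        assumption of this sketch)."""
        U, _, _ = np.linalg.svd(Z, full_matrices=False)
        return U[:, :d]

    def canonical_correlation_sum(Za, Zb, d=5):
        """Sum of canonical correlations between the subspaces spanned by
        two converted feature matrices: the canonical correlations are
        the singular values of Qa^T Qb for orthonormal bases Qa, Qb."""
        Qa, Qb = subspace_basis(Za, d), subspace_basis(Zb, d)
        return np.linalg.svd(Qa.T @ Qb, compute_uv=False).sum()

    def recognize(Ze, train):
        """Step 2.11 decision rule: `train` is a list of (character, Zf')
        pairs; average the correlation sums per character (the training
        data may repeat characters) and return the character with the
        largest average."""
        scores = {}
        for char, Zf in train:
            scores.setdefault(char, []).append(canonical_correlation_sum(Ze, Zf))
        return max(scores, key=lambda c: float(np.mean(scores[c])))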
CN2010105582532A 2010-11-22 2010-11-22 Automatic lip language identification system suitable for Chinese language Expired - Fee Related CN102004549B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010105582532A CN102004549B (en) 2010-11-22 2010-11-22 Automatic lip language identification system suitable for Chinese language

Publications (2)

Publication Number Publication Date
CN102004549A true CN102004549A (en) 2011-04-06
CN102004549B CN102004549B (en) 2012-05-09

Family

ID=43811953

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010105582532A Expired - Fee Related CN102004549B (en) 2010-11-22 2010-11-22 Automatic lip language identification system suitable for Chinese language

Country Status (1)

Country Link
CN (1) CN102004549B (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101101752A (en) * 2007-07-19 2008-01-09 华中科技大学 Monosyllabic language lip-reading recognition system based on vision character

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Tae-Kyun Kim et al., "Discriminative learning and recognition of image set classes using canonical correlations", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 6, June 2007. *
Lü Kun et al., "Lip tracking algorithm based on convolution virtual electrostatic field Snake model" (基于卷积虚拟静电场Snake模型的唇形跟踪算法), 6th Joint Conference on Harmonious Human-Machine Environment (第六届和谐人机环境联合学术会议), October 24, 2010. *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10275685B2 (en) 2014-12-22 2019-04-30 Dolby Laboratories Licensing Corporation Projection-based audio object extraction from audio content
CN104808794A (en) * 2015-04-24 2015-07-29 北京旷视科技有限公司 Method and system for inputting lip language
CN104808794B (en) * 2015-04-24 2019-12-10 北京旷视科技有限公司 lip language input method and system
CN105787428A (en) * 2016-01-08 2016-07-20 上海交通大学 Method for lip feature-based identity authentication based on sparse coding
CN106250829A (en) * 2016-07-22 2016-12-21 中国科学院自动化研究所 Digit recognition method based on lip texture structure
CN107025439A (en) * 2017-03-22 2017-08-08 天津大学 Lip-region feature extraction and normalization method based on depth data
CN107025439B (en) * 2017-03-22 2020-04-24 天津大学 Lip region feature extraction and normalization method based on depth data
CN107122646A (en) * 2017-04-26 2017-09-01 大连理工大学 A kind of method for realizing lip reading unblock
CN107992812A (en) * 2017-11-27 2018-05-04 北京搜狗科技发展有限公司 A kind of lip reading recognition methods and device
CN108596107A (en) * 2018-04-26 2018-09-28 京东方科技集团股份有限公司 Lip reading recognition methods and its device, AR equipment based on AR equipment
US11527242B2 (en) 2018-04-26 2022-12-13 Beijing Boe Technology Development Co., Ltd. Lip-language identification method and apparatus, and augmented reality (AR) device and storage medium which identifies an object based on an azimuth angle associated with the AR field of view
CN110580336A (en) * 2018-06-08 2019-12-17 北京得意音通技术有限责任公司 Lip language word segmentation method and device, storage medium and electronic equipment
CN109389098A (en) * 2018-11-01 2019-02-26 重庆中科云丛科技有限公司 A kind of verification method and system based on lip reading identification
CN111326152A (en) * 2018-12-17 2020-06-23 南京人工智能高等研究院有限公司 Voice control method and device
CN109682676A (en) * 2018-12-29 2019-04-26 上海工程技术大学 A kind of feature extracting method of the acoustic emission signal of fiber tension failure
WO2021051603A1 (en) * 2019-09-19 2021-03-25 平安科技(深圳)有限公司 Coordinate transformation-based lip cutting method and apparatus, device, and storage medium
CN111898420A (en) * 2020-06-17 2020-11-06 北方工业大学 Lip language recognition system
CN112053160A (en) * 2020-09-03 2020-12-08 中国银行股份有限公司 Intelligent bracelet for lip language recognition, lip language recognition system and method
CN112053160B (en) * 2020-09-03 2024-04-23 中国银行股份有限公司 Intelligent bracelet for lip language identification, lip language identification system and method

Also Published As

Publication number Publication date
CN102004549B (en) 2012-05-09

Similar Documents

Publication Publication Date Title
CN102004549A (en) Automatic lip language identification system suitable for Chinese language
CN110866953B (en) Map construction method and device, and positioning method and device
Ko et al. Sign language recognition with recurrent neural network using human keypoint detection
Luettin et al. Speechreading using probabilistic models
Papandreou et al. Adaptive multimodal fusion by uncertainty compensation with application to audiovisual speech recognition
Schuckers et al. On techniques for angle compensation in nonideal iris recognition
Potamianos et al. Recent advances in the automatic recognition of audiovisual speech
Brown et al. Comparative study of coarse head pose estimation
Youssif et al. Arabic sign language (arsl) recognition system using hmm
Feng et al. Depth-projection-map-based bag of contour fragments for robust hand gesture recognition
Geetha et al. A vision based dynamic gesture recognition of indian sign language on kinect based depth images
Bao et al. Dynamic hand gesture recognition based on SURF tracking
Cappelletta et al. Viseme definitions comparison for visual-only speech recognition
Jiang et al. Improved face and feature finding for audio-visual speech recognition in visually challenging environments
CN104821010A (en) Binocular-vision-based real-time extraction method and system for three-dimensional hand information
Haq et al. Using lip reading recognition to predict daily Mandarin conversation
Lu et al. Review on automatic lip reading techniques
Watanabe et al. Lip reading from multi view facial images using 3D-AAM
Chiţu et al. Comparison between different feature extraction techniques for audio-visual speech recognition
Gao et al. Learning and synthesizing MPEG-4 compatible 3-D face animation from video sequence
Zheng et al. Review of lip-reading recognition
Shiraishi et al. Optical flow based lip reading using non rectangular ROI and head motion reduction
KR101621304B1 (en) Active shape model-based lip shape estimation method and system using mouth map
Reveret et al. Visual coding and tracking of speech related facial motion
Aharon et al. Representation analysis and synthesis of lip images using dimensionality reduction

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120509

Termination date: 20171122