CN115565540B - Invasive brain-computer interface Chinese pronunciation decoding method - Google Patents


Info

Publication number
CN115565540B
CN115565540B (application CN202211545924.0A)
Authority
CN
China
Prior art keywords
hyperbolic
data
Chinese pronunciation
space
computer interface
Prior art date
Legal status
Active
Application number
CN202211545924.0A
Other languages
Chinese (zh)
Other versions
CN115565540A (en)
Inventor
祁玉
谭显瀚
王跃明
张建民
朱君明
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202211545924.0A
Publication of CN115565540A
Application granted
Publication of CN115565540B
Legal status: Active

Classifications

    • G: PHYSICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 19/04: Speech or audio analysis-synthesis techniques for redundancy reduction, using predictive techniques
    • G10L 19/16: Vocoder architecture
    • G10L 15/02: Feature extraction for speech recognition; selection of recognition unit
    • G10L 15/063: Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/08: Speech classification or search
    • G10L 15/16: Speech classification or search using artificial neural networks
    • G10L 2015/025: Phonemes, fenemes or fenones being the recognition units
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Signal Processing (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an invasive brain-computer interface Chinese pronunciation decoding method, which comprises the following steps: screening effective neurons from the electroencephalogram data, removing neurons with high similarity, and, after standardization, labeling the electroencephalogram data with the synchronized audio data; projecting the electroencephalogram data into a hyperbolic space according to the characteristics of Chinese pronunciation electroencephalogram data; constructing a hyperbolic neural network and a hyperbolic multiple logistic regression classifier to classify the Chinese phonemes in the electroencephalogram data; during training, extracting a certain number of triples from the training data, calculating a hierarchical clustering loss on these triples from the output features of the network, and adding it to the total loss function to be optimized with a certain weight; and decoding with the trained hyperbolic neural network and hyperbolic multiple logistic regression classifier. By introducing the hyperbolic space and the hyperbolic decoding method, the method better exploits the structural characteristics of Chinese pronunciation electroencephalogram data and effectively improves their classification and decoding performance.

Description

Invasive brain-computer interface Chinese pronunciation decoding method
Technical Field
The invention relates to the field of electroencephalogram data decoding, in particular to an invasive brain-computer interface Chinese pronunciation decoding method.
Background
The invasive brain-computer interface uses high-resolution intracortical electrical signals recorded by implanted electrodes to identify the state and intent of the brain, thereby assisting clinical patients in performing a variety of tasks. In recent years, the application of invasive brain-computer interfaces to speech has developed rapidly. Advanced speech brain-computer interfaces can already synthesize speech directly, or decode phonemes, words and sentences, from brain electrical signals, which means that invasive speech brain-computer interfaces have great potential for restoring the communication ability of aphasic patients.
In general, a speech brain-computer interface treats pronunciation as a motor process and decodes neural signals into speech with oral articulation kinematics as an intermediate link. One approach converts the electroencephalogram signals recorded from the motor cortex into the oral articulation movements made during speaking, and then converts those movements into speech. With the help of machine learning methods such as deep networks, some speech brain-computer interfaces instead learn decoders end to end, generating speech waveforms directly from brain electrical signals.
For example, Chinese patent publication No. CN111681636A discloses a brain-computer-interface-based speech generation method, which collects electroencephalogram signals reflecting brain activity, external audio signals and video image signals, extracts features, performs nonlinear computation and learning through several neural networks, adds external context information and feedback input, directly decodes the intention and language content expressed by the brain from the brain signals, and finally completes speech generation through a neural network, thereby realizing brain-computer interface speech generation.
However, decoding speech directly from neural signals faces the problem of a limited vocabulary: before the speech brain-computer interface can be constructed, the subject must repeatedly speak the words of the vocabulary to train the decoder, which is time-consuming. On the other hand, phonemes are the basic sound units of pronunciation, and their number is typically much smaller than the number of words. With accurate phoneme recognition, words could in principle be decoded freely by combination. But it is difficult to decode speech phonemes accurately from neural signals. From a kinematic perspective, the kinematics of speech is a combination of orofacial movements involving the lips, tongue, jaw and other articulators. Phonemes with similar kinematics are therefore often confused and difficult to distinguish, which lowers the overall phoneme classification performance. How to accurately decode speech phonemes from neural signals remains a challenging problem.
More importantly, there has so far been no application of, or research on, brain-computer interfaces for Chinese pronunciation. How to design an algorithm tailored to the characteristics of Chinese pronunciation, achieve good classification and decoding performance, and thereby construct an efficient speech brain-computer interface, remains an open question.
Disclosure of Invention
The invention provides an invasive brain-computer interface Chinese pronunciation decoding method, which can effectively improve the classification decoding performance of Chinese pronunciation electroencephalogram data.
An invasive brain-computer interface Chinese pronunciation decoding method comprises the following steps:
(1) Collecting electroencephalogram data of Chinese pronunciation and synchronized audio data; screening effective neurons from the electroencephalogram data, removing neurons with high similarity, and standardizing the electroencephalogram data; marking the utterance onsets on the electroencephalogram data using the synchronized audio data, and intercepting data segments of a fixed window length, each data segment corresponding to one Chinese phoneme;
(2) Projecting the electroencephalogram data processed in the step (1) into a hyperbolic space, and forming training data by the electroencephalogram data and corresponding Chinese phonemes in the hyperbolic space;
(3) Constructing a hyperbolic neural network and a hyperbolic multiple logistic regression classifier; the hyperbolic neural network is used for extracting the features of the electroencephalogram data in the hyperbolic space, and the hyperbolic multiple logistic regression classifier is used for classifying Chinese phonemes for the features of the electroencephalogram data;
(4) Training a hyperbolic neural network and a hyperbolic multiple logistic regression classifier;
during training, a certain number of triples are extracted from the training data, a hierarchical clustering loss is calculated on the triples from the output features of the hyperbolic neural network, and this loss is added to the total loss function to be optimized with a certain weight;
(5) And projecting the electroencephalogram data to be decoded to a hyperbolic space, and then sequentially inputting the data to the trained hyperbolic neural network and hyperbolic multiple logistic regression classifier to obtain the decoded Chinese phoneme classification.
According to the hierarchical classification structure of phonemes in Chinese pronunciation, and the hierarchy of articulation position and articulation manner present in Chinese pronunciation electroencephalogram signals, the hyperbolic neural network is constructed to better learn the characteristics of the Chinese pronunciation electroencephalogram, and the logit vectors are obtained through the hyperbolic multiple logistic regression classifier. Meanwhile, a hierarchical clustering constraint is applied to the logit vectors, encouraging the model to better mine the hierarchical structure of the data and thus learn a better representation, which effectively improves the classification and decoding performance on Chinese pronunciation electroencephalogram data.
Preferably, in step (1), the neurons are screened off-line. Screening effective neurons from the electroencephalogram data and removing neurons with high similarity specifically comprises:
performing spike sorting, extracting the firing of all neurons in the electroencephalogram signal, and plotting the waveforms; visually inspecting the firing waveform of each neuron, and retaining neurons with a clear waveform and a total firing count greater than 100; and calculating the cosine similarity between the firing of different neurons, retaining only one neuron when the similarity between several neurons is greater than 0.7, so as to reduce the influence of crosstalk on data quality.
When the data are standardized, the mean is subtracted from each original value and the result is divided by the standard deviation, so that the obtained data follow a normal distribution with mean 0 and standard deviation 1.
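As an illustrative sketch only (not part of the claimed method), the screening and standardization above can be written as follows; the (units x time-bins) spike-count layout and the helper name `screen_and_normalize` are assumptions for illustration:

```python
import numpy as np

def screen_and_normalize(firing, min_spikes=100, sim_thresh=0.7):
    """Illustrative sketch of step (1): drop weakly firing units, drop
    near-duplicate units by cosine similarity, then z-score each unit.
    `firing` is assumed to be a (units, time_bins) spike-count matrix."""
    # Keep units whose total firing count exceeds the threshold.
    keep = [i for i in range(firing.shape[0]) if firing[i].sum() > min_spikes]
    firing = firing[keep]

    # Greedily drop near-duplicates: if a unit's cosine similarity with an
    # already-kept unit exceeds the threshold, discard it (reduces crosstalk).
    norms = np.clip(np.linalg.norm(firing, axis=1, keepdims=True), 1e-12, None)
    sim = (firing / norms) @ (firing / norms).T
    selected = []
    for i in range(firing.shape[0]):
        if all(sim[i, j] <= sim_thresh for j in selected):
            selected.append(i)
    firing = firing[selected]

    # Z-score: subtract the mean and divide by the standard deviation,
    # giving each unit mean 0 and standard deviation 1.
    mu = firing.mean(axis=1, keepdims=True)
    sd = np.clip(firing.std(axis=1, keepdims=True), 1e-12, None)
    return (firing - mu) / sd
```

In this sketch a duplicated unit is removed by the cosine test, and a silent unit by the firing-count test, before the z-scoring.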
Preferably, the utterance onsets are marked using the synchronized audio data, and data segments with a [-500 ms, +1500 ms] window centered on each utterance onset are intercepted for subsequent training and validation.
In the step (2), the Poincaré disc model $\mathbb{D}_c^d$ is adopted to project the electroencephalogram data into the hyperbolic space:

$$\mathbb{D}_c^d = \{\, x \in \mathbb{R}^d : c\,\|x\|^2 < 1 \,\}$$

$$g_x^{\mathbb{D}} = (\lambda_x^c)^2\, g^E$$

$$\lambda_x^c = \frac{2}{1 - c\,\|x\|^2}$$

wherein $\mathbb{D}_c^d$ represents a hyperbolic space with curvature $c$ and dimension $d$; $x$ represents a data point; $\mathbb{R}^d$ represents the Euclidean real space of dimension $d$; $\|x\|$ represents the norm of $x$; $g^E$ and $g^{\mathbb{D}}$ represent the Euclidean metric and the hyperbolic metric, respectively; and $\lambda_x^c$ represents the conformal factor relating the two metrics.
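The projection into the Poincaré disc is commonly realized with the exponential map at the origin; the following minimal sketch assumes this choice (the projection formula itself is an assumption consistent with the model above, not quoted from the patent):

```python
import numpy as np

def exp0(v, c=1.0):
    """Exponential map at the origin of the Poincaré ball with curvature c:
    exp_0^c(v) = tanh(sqrt(c) * ||v||) * v / (sqrt(c) * ||v||).
    Maps a Euclidean feature vector into the ball {x : c * ||x||^2 < 1}."""
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return v
    sc = np.sqrt(c)
    return np.tanh(sc * norm) * v / (sc * norm)
```

The map preserves the direction of the vector and compresses its length with tanh, so every Euclidean point lands strictly inside the disc.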
In the step (3), the hyperbolic neural network is expressed as:

$$f^{\otimes_c}(x) = \exp_0^c\!\left( f\left( \log_0^c(x) \right) \right)$$

$$\exp_0^c(v) = \tanh\!\left( \sqrt{c}\,\|v\| \right) \frac{v}{\sqrt{c}\,\|v\|}$$

$$\log_0^c(y) = \tanh^{-1}\!\left( \sqrt{c}\,\|y\| \right) \frac{y}{\sqrt{c}\,\|y\|}$$

wherein $f^{\otimes_c}$ and $f$ represent the hyperbolic neural network function and the Euclidean neural network function, respectively; $\exp_0^c$ and $\log_0^c$ represent the exponential map and the logarithmic map at the origin, respectively; $c$ represents the curvature of the hyperbolic space; $x$ represents a data point; and $\|x\|$ represents the norm of $x$.
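A minimal numeric sketch of one hyperbolic layer $f^{\otimes_c}$, using a fixed Euclidean affine map with a tanh nonlinearity as the inner function $f$ (an illustrative assumption; a trainable version would use a Riemannian optimization library):

```python
import numpy as np

def exp0(v, c=1.0):
    # Exponential map at the origin of the Poincaré ball.
    n = np.linalg.norm(v)
    if n == 0.0:
        return v
    sc = np.sqrt(c)
    return np.tanh(sc * n) * v / (sc * n)

def log0(y, c=1.0):
    # Logarithmic map at the origin (inverse of exp0).
    n = np.linalg.norm(y)
    if n == 0.0:
        return y
    sc = np.sqrt(c)
    return np.arctanh(sc * n) * y / (sc * n)

def hyperbolic_layer(x, W, b, c=1.0):
    """f^{otimes_c}(x) = exp_0^c(f(log_0^c(x))) with a Euclidean affine-plus-tanh f.
    Sketch only, not the patent's exact architecture."""
    v = log0(x, c)            # pull the point into the tangent space at 0
    h = np.tanh(W @ v + b)    # ordinary Euclidean computation in tangent space
    return exp0(h, c)         # push the result back into the hyperbolic space
```

The round trip exp0(log0(x)) recovers x, and the layer output always remains inside the ball.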
In step (3), when the hyperbolic multiple logistic regression classifier classifies the Chinese phonemes, given the classes $z$, the probability of each class is calculated as:

$$p(y = z \mid x) \propto \exp\left( \frac{\lambda_{p_z}^c \|a_z\|}{\sqrt{c}} \sinh^{-1}\!\left( \frac{2\sqrt{c}\,\langle -p_z \oplus_c x,\, a_z \rangle}{\left(1 - c\,\|{-p_z} \oplus_c x\|^2\right) \|a_z\|} \right) \right)$$

wherein $p_z$ and $a_z$ are the parameters of the hyperbolic multiple logistic regression; $\lambda_{p_z}^c$ represents the conformal factor at the classification boundary of the class $z$; $\sinh^{-1}$ represents the inverse hyperbolic sine function; $\exp$ represents the exponential function with the natural constant $e$ as base; $\|a_z\|$ represents the norm of $a_z$; $\oplus_c$ represents the Möbius addition operation; $c$ represents the curvature of the hyperbolic space; and $\langle \cdot, \cdot \rangle$ represents the inner product operation.
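A sketch of the class-$z$ logit computation for a single point (a Ganea-style hyperbolic multinomial logistic regression; the exact parameterization is an assumption consistent with the symbols $p_z$, $a_z$ and $\lambda_{p_z}^c$, not quoted from the patent):

```python
import numpy as np

def mobius_add(x, y, c=1.0):
    # Möbius addition x (+)_c y in the Poincaré ball.
    xy = float(x @ y); x2 = float(x @ x); y2 = float(y @ y)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    return num / (1 + 2 * c * xy + c**2 * x2 * y2)

def hyperbolic_mlr_logit(x, p_z, a_z, c=1.0):
    """Logit of class z for a point x in the Poincaré ball:
    (lambda_{p_z}^c * ||a_z|| / sqrt(c)) *
      arcsinh(2*sqrt(c)*<(-p_z)(+)x, a_z> / ((1 - c*||(-p_z)(+)x||^2) * ||a_z||))."""
    sc = np.sqrt(c)
    q = mobius_add(-p_z, x, c)                   # (-p_z) (+)_c x
    lam = 2.0 / (1.0 - c * float(p_z @ p_z))     # conformal factor at p_z
    na = np.linalg.norm(a_z)
    arg = 2 * sc * float(q @ a_z) / ((1 - c * float(q @ q)) * na)
    return lam * na / sc * np.arcsinh(arg)
```

The logit changes sign with the side of the classification boundary on which the point lies.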
In the step (4), the formula of the total loss function is:

$$L = \alpha L_{cls} + \beta L_{HC}$$

wherein $L_{cls}$ represents the classification loss, $L_{HC}$ represents the hierarchical clustering loss, and $\alpha$ and $\beta$ are coefficients balancing the two parts of the loss function.

The classification loss is calculated as follows:

$$L_{cls} = -\frac{1}{N} \sum_{i=1}^{N} \log \hat{y}_{i,\, y_i}$$

wherein $y_i$ is the category label of sample $x_i$, $\hat{y}_i$ is the post-softmax probability vector of $x_i$ (so that $\log \hat{y}_{i,\, y_i}$ is the log probability of the true class), and $N$ is the amount of data in the mini-batch.
The hierarchical clustering loss is calculated as follows:

$$L_{HC} = \sum_{(i,j,k)} \left( w_{ij} + w_{ik} + w_{jk} - \left[ w_{ij}, w_{ik}, w_{jk} \right] \sigma\!\left( \left[ d_o(z_{ij}), d_o(z_{ik}), d_o(z_{jk}) \right] \right)^{\top} \right)$$

$$w_{uv} = 1 - \frac{d(x_u, x_v)}{d(x_i, x_j) + d(x_i, x_k) + d(x_j, x_k)}, \qquad uv \in \{ij, ik, jk\}$$

wherein $\sigma$ represents the normalized softmax function; $(i, j, k)$ represents a triple extracted from the training data; $z_{ij}$, $z_{ik}$ and $z_{jk}$ represent the lowest common ancestor nodes of the pairs $(i, j)$, $(i, k)$ and $(j, k)$ in the triple, respectively; $d_o$ represents the hyperbolic distance to the center of the hyperbolic space; $w_{ij}$, $w_{ik}$ and $w_{jk}$ represent the hyperbolic similarities between the corresponding pairs in the triple; and $\top$ represents matrix transposition.

When the hyperbolic similarities are calculated, a certain number of triples $(i, j, k)$ are sampled by a random sampling method; the pairwise hyperbolic distances $d_{ij}$, $d_{ik}$ and $d_{jk}$ are calculated, and each is divided by the sum of the three for normalization, giving $\tilde{d}$; the similarity is then expressed as $w = 1 - \tilde{d}$. When the hierarchical clustering loss is calculated, the logit layer of the hyperbolic multiple logistic regression classifier is selected for triple sampling and hierarchical clustering.
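The similarity normalization and one triple's loss term can be sketched as follows (illustrative only; the softmax over lowest-common-ancestor depths follows the HypHC-style formulation described above, and the $w = 1 - \tilde{d}$ similarity is the normalization just stated):

```python
import numpy as np

def mobius_add(x, y, c=1.0):
    # Möbius addition in the Poincaré ball with curvature c.
    xy = float(x @ y); x2 = float(x @ x); y2 = float(y @ y)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    return num / (1 + 2 * c * xy + c**2 * x2 * y2)

def poincare_dist(u, v, c=1.0):
    # d_c(u, v) = (2/sqrt(c)) * artanh(sqrt(c) * ||(-u) (+)_c v||)
    sc = np.sqrt(c)
    return 2.0 / sc * np.arctanh(sc * np.linalg.norm(mobius_add(-u, v, c)))

def triple_similarities(xi, xj, xk, c=1.0):
    # Pairwise hyperbolic distances normalized by their sum; similarity w = 1 - d~.
    d = np.array([poincare_dist(xi, xj, c),
                  poincare_dist(xi, xk, c),
                  poincare_dist(xj, xk, c)])
    return 1.0 - d / d.sum()          # (w_ij, w_ik, w_jk)

def hc_loss_term(w, d_lca):
    # One triple's loss: sum(w) - [w] . softmax([d_o(z_ij), d_o(z_ik), d_o(z_jk)])
    s = np.exp(d_lca - d_lca.max())
    s /= s.sum()                      # normalized softmax over LCA depths
    return float(w.sum() - w @ s)
```

Because the three normalized distances sum to one, the three similarities of any triple always sum to two.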
Compared with the prior art, the invention has the following beneficial effects:
the hyperbolic neural network is applied to the classification and decoding of Chinese pronunciation electroencephalogram signals; the neural representation of Chinese pronunciation is classified in a hyperbolic space, the hierarchical characteristics of Chinese pronunciation and of the signal representation are taken into account, and the hierarchical structure of the neural representation of the phonemes is constrained by the hierarchical clustering loss. The results show that the model learns an interpretable hierarchical phoneme embedding from the electroencephalogram signal, and the phoneme decoding performance is significantly improved.
Drawings
FIG. 1 is a timing diagram illustrating an experimental paradigm of a data set in accordance with an embodiment of the present invention.
FIG. 2 is a visualization of spike firing after the different Chinese initials in the data set are grouped by articulation position, in the embodiment of the present invention.
FIG. 3 is a graph comparing the classification accuracy with and without the method of the present invention.
FIG. 4 is a comparison of the distributions obtained after visualizing the two-dimensional multiple logistic regression classification boundaries learned with and without the method of the present invention.
Detailed Description
The invention will be described in further detail below with reference to the drawings and examples, which are intended to facilitate the understanding of the invention without limiting it in any way.
In the data collection phase, neural signals were recorded from the left primary motor cortex of a paralyzed patient through two implanted 96-channel Utah intracortical microelectrode arrays (Blackrock Microsystems, Salt Lake City, UT, USA). The neural signals were sampled at 30 kHz using the NeuroPort system (NSP, Blackrock Microsystems). During the experiment, the audio signal was recorded simultaneously with a microphone placed in front of the patient and digitized by the NeuroPort system through an analog input port at 30 kHz. The embodiment designs three tasks for Chinese pronunciation: a pronunciation task with 21 different initials, a pronunciation task with 24 different finals, and a pronunciation task with 20 different Chinese characters. The experimental paradigm for data acquisition is shown in FIG. 1. Specifically, in each trial, the subject was asked to view a red phoneme prompt on a computer screen one meter in front of him and to listen to an audio prompt of that phoneme. After one second, the phoneme on the screen turned green, indicating the start of the "start" phase, and the subject then spoke the prompted phoneme. To ensure that the subject had sufficient reaction time to complete the trial, the "start" phase lasted 3 seconds. After the "start" phase ended, the recording of one trial was complete, and the recording of the next trial then began.
The invention provides an invasive brain-computer interface Chinese pronunciation decoding method, which specifically comprises the following steps:
step 1, preprocessing electroencephalogram data.
An experimental paradigm for Chinese pronunciation is designed, and electroencephalogram data and synchronized audio data of Chinese pronunciation are collected; effective neurons are screened from the electroencephalogram data, neurons with high similarity are removed, the data are standardized, the electroencephalogram data are labeled using the synchronized audio data, and data segments of a suitable window length are then intercepted to obtain the preprocessed electroencephalogram data.
Specifically, spike sorting (spike sorting) is first performed to extract the firing of all neurons in the electroencephalogram signal and to plot the waveforms. The firing waveform of each neuron is visually inspected, and neurons with a clear waveform and a total firing count greater than 100 are retained. The cosine similarity between the firing of different neurons is then calculated, and when the similarity between several neurons is greater than 0.7, only one of them is retained.
When the data are standardized, the mean is subtracted from each original value and the result is divided by the standard deviation, so that the obtained data follow a normal distribution with mean 0 and standard deviation 1.
The utterance onsets are then marked using the synchronized audio data, and a data segment with a [-500 ms, +1500 ms] window centered on each utterance onset is intercepted for subsequent training and validation.
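The [-500 ms, +1500 ms] epoching can be sketched as follows; the 1 ms binning and the (units, time_bins) array layout are illustrative assumptions, not stated in the patent:

```python
import numpy as np

def extract_epochs(spikes, onsets_ms, pre_ms=500, post_ms=1500, bin_ms=1):
    """Cut fixed [-500 ms, +1500 ms] windows around each utterance onset.
    `spikes` is assumed to be a (units, time_bins) matrix binned at `bin_ms`,
    with onset times given in milliseconds."""
    pre, post = pre_ms // bin_ms, post_ms // bin_ms
    epochs = []
    for t in onsets_ms:
        i = t // bin_ms
        if i - pre >= 0 and i + post <= spikes.shape[1]:  # skip truncated windows
            epochs.append(spikes[:, i - pre:i + post])
    return np.stack(epochs)  # (trials, units, window_bins)
```

Each retained trial yields a window of 2000 bins at 1 ms resolution.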
And 2, projecting the electroencephalogram data into a hyperbolic space.
A hyperbolic space is a non-Euclidean space with negative curvature everywhere. In a hyperbolic space, volume grows exponentially with the distance from the center of the space. This means that hyperbolic spaces are well suited to modeling data with a tree structure or hierarchy, since the number of nodes of a tree also grows exponentially with its depth. Visual analysis of the Chinese pronunciation electroencephalogram signals, as shown in FIG. 2, reveals that they have a certain hierarchical structure, related to the articulation manner and the articulation position. This means that a hyperbolic space can be used to model Chinese pronunciation electroencephalogram signals.
This example uses the currently most common and best-performing hyperbolic space model, the Poincaré disc model $\mathbb{D}_c^d$, to project the electroencephalogram data into the hyperbolic space:

$$\mathbb{D}_c^d = \{\, x \in \mathbb{R}^d : c\,\|x\|^2 < 1 \,\}$$

$$g_x^{\mathbb{D}} = (\lambda_x^c)^2\, g^E$$

$$\lambda_x^c = \frac{2}{1 - c\,\|x\|^2}$$

wherein $c$ represents the curvature of the hyperbolic space, $d$ represents its dimension, $g^E$ and $g^{\mathbb{D}}$ represent the Euclidean metric and the hyperbolic metric, respectively, and $\lambda_x^c$ represents the conformal factor relating the two metrics.
And 3, constructing a hyperbolic neural network to extract features, and classifying the Chinese pronunciation by using a hyperbolic multiple logistic regression classifier.
A reasonable network structure is constructed according to the characteristics of Chinese pronunciation electroencephalogram data, namely a small data volume and a large dimensionality. The hyperbolic neural network is a version of the Euclidean neural network whose vector and matrix operations are executed in the hyperbolic space. Since vector and matrix operations are too complex to perform directly in a non-Euclidean space, the tangent space of the hyperbolic space is used for an approximate computation. The tangent space of a hyperbolic space has the properties of a Euclidean space, so the data only need to be projected onto the tangent space; after the vector and matrix operations are executed there, the data are projected back into the hyperbolic space. The conversion between the tangent space and the original space is accomplished using the exponential and logarithmic maps of the gyrovector space. In this way, the representation of the hyperbolic neural network is obtained:

$$f^{\otimes_c}(x) = \exp_0^c\!\left( f\left( \log_0^c(x) \right) \right)$$

$$\exp_0^c(v) = \tanh\!\left( \sqrt{c}\,\|v\| \right) \frac{v}{\sqrt{c}\,\|v\|}$$

$$\log_0^c(y) = \tanh^{-1}\!\left( \sqrt{c}\,\|y\| \right) \frac{y}{\sqrt{c}\,\|y\|}$$

wherein $f^{\otimes_c}$ and $f$ represent the hyperbolic neural network function and the Euclidean neural network function, respectively; $\exp_0^c$ and $\log_0^c$ represent the exponential map and the logarithmic map at the origin, respectively; and $c$ represents the curvature of the hyperbolic space.
Considering that the data volume of the Chinese pronunciation electroencephalogram signals is small, a 2-layer hyperbolic neural network structure is selected in the actual construction, with 256 and 128 neurons in the two layers, respectively.
Similar to the hyperbolic neural network, hyperbolic multiple logistic regression is a version of Euclidean multiple logistic regression that performs its operations in the hyperbolic space.
In particular, given a sample $x \in \mathbb{D}_c^d$, the logits of the different classes are obtained by the hyperbolic multiple logistic regression method, calculated as:

$$p(y = z \mid x) \propto \exp\left( \frac{\lambda_{p_z}^c \|a_z\|}{\sqrt{c}} \sinh^{-1}\!\left( \frac{2\sqrt{c}\,\langle -p_z \oplus_c x,\, a_z \rangle}{\left(1 - c\,\|{-p_z} \oplus_c x\|^2\right) \|a_z\|} \right) \right)$$

wherein $p_z$ and $a_z$ are the parameters of the hyperbolic multiple logistic regression; $\lambda_{p_z}^c$ represents the conformal factor at the classification boundary of the class $z$; $\sinh^{-1}$ represents the inverse hyperbolic sine function; $\exp$ represents the exponential function with the natural constant $e$ as base; $\|a_z\|$ represents the norm of $a_z$; $\oplus_c$ represents the Möbius addition operation; $c$ represents the curvature of the hyperbolic space; and $\langle \cdot, \cdot \rangle$ represents the inner product operation.
The Möbius addition is an operation of the gyrovector space, derived through the exponential and logarithmic maps, and is calculated as:

$$x \oplus_c y = \frac{\left(1 + 2c\langle x, y\rangle + c\|y\|^2\right) x + \left(1 - c\|x\|^2\right) y}{1 + 2c\langle x, y\rangle + c^2\|x\|^2\|y\|^2}$$
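As an illustrative sketch, the Möbius addition can be implemented directly from the formula; the checks below verify two gyrogroup identities (x plus the origin gives x, and (-x) plus x gives the origin) that follow from it:

```python
import numpy as np

def mobius_add(x, y, c=1.0):
    """Möbius addition x (+)_c y in the Poincaré ball with curvature c,
    computed term by term from the closed-form expression."""
    xy = float(x @ y); x2 = float(x @ x); y2 = float(y @ y)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    return num / (1 + 2 * c * xy + c**2 * x2 * y2)
```

The operation also keeps points inside the ball, which is what makes it usable as the "addition" of the hyperbolic classifier.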
and 4, training the hyperbolic neural network and the hyperbolic multiple logistic regression classifier.
In the optimization process, the hyperbolic RSGD (Riemannian stochastic gradient descent) method is used for parameter optimization and updating. Considering the small data volume of the electroencephalogram signals, the model is trained and tested with the leave-one-out method: each time, a single trial is used as the test set and all remaining trials are used as the training set. In the training process, a hierarchical clustering constraint on the feature representation, based on triple similarity, is added to the hyperbolic neural network.
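The leave-one-out protocol described above can be sketched as a simple index generator (an illustrative helper, not quoted from the patent):

```python
def leave_one_out_splits(n_trials):
    """Leave-one-out protocol of step (4): each trial serves once as the
    test set while all the other trials form the training set."""
    for test_idx in range(n_trials):
        yield [i for i in range(n_trials) if i != test_idx], test_idx
```

With N trials this yields N train/test splits, so every trial is tested exactly once.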
(4-1) Selecting a suitable similarity calculation method: since the hyperbolic model is used to extract the features, the hyperbolic distance can be adopted directly to calculate the similarity.
Given two points $u$ and $v$ on the Poincaré disc, the hyperbolic distance between them is calculated as:

$$d_c(u, v) = \frac{2}{\sqrt{c}}\, \tanh^{-1}\!\left( \sqrt{c}\, \|{-u} \oplus_c v\| \right)$$

wherein $\|\cdot\|$ represents the Euclidean norm of a vector.
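A sketch of the hyperbolic distance, built on the Möbius addition (`mobius_add` is repeated here so the snippet is self-contained):

```python
import numpy as np

def mobius_add(x, y, c=1.0):
    # Möbius addition in the Poincaré ball with curvature c.
    xy = float(x @ y); x2 = float(x @ x); y2 = float(y @ y)
    return ((1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y) / \
           (1 + 2 * c * xy + c**2 * x2 * y2)

def poincare_dist(u, v, c=1.0):
    """Hyperbolic distance on the Poincaré disc:
    d_c(u, v) = (2 / sqrt(c)) * artanh(sqrt(c) * ||(-u) (+)_c v||)."""
    sc = np.sqrt(c)
    return 2.0 / sc * np.arctanh(sc * np.linalg.norm(mobius_add(-u, v, c)))
```

For a point at the origin this reduces to 2 artanh(||v||) when c = 1, and the distance is symmetric in its arguments.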
Triples $(i, j, k)$, 20 to 50 in number, are sampled by a random sampling method; the pairwise hyperbolic distances $d_{ij}$, $d_{ik}$ and $d_{jk}$ are calculated, and each is divided by the sum of the three for normalization, giving $\tilde{d}_{ij}$, $\tilde{d}_{ik}$ and $\tilde{d}_{jk}$; the similarity can then be expressed as $w = 1 - \tilde{d}$.
(4-2) Selecting a suitable clustering position: the hierarchical clustering loss is calculated directly on the logit vectors and added, with a certain weight, to the overall loss to be optimized, so that the classification and clustering objectives are optimized simultaneously. The following overall loss function is obtained:

$$L = \alpha L_{cls} + \beta L_{HC}$$

wherein $L_{cls}$ represents the classification loss, $L_{HC}$ represents the hierarchical clustering loss, and $\alpha$ and $\beta$ are coefficients balancing the two parts of the loss function.
For the multi-class classification task, given $N$ samples $x_i$, each belonging to one of $K$ categories with corresponding label $y_i$, where $y_i \in \{1, \dots, K\}$, the classification loss $\mathcal{L}_{cls}$ can be expressed as:

$$\mathcal{L}_{cls} = -\frac{1}{N}\sum_{i=1}^{N} \log p(y_i \mid x_i)$$

wherein $y_i$ is the category label of $x_i$ and $\log p(y_i \mid x_i)$ is the log-probability of $y_i$ after softmax.
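This is the usual cross-entropy over softmax probabilities; a self-contained sketch (logits of shape `(N, K)`, integer labels):

```python
import numpy as np

def classification_loss(logits, labels):
    """Mean negative log-probability of the true class after softmax,
    matching the L_cls term above."""
    logits = np.asarray(logits, dtype=float)
    # numerically stable log-softmax
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    n = len(labels)
    return -log_probs[np.arange(n), labels].mean()
```

With uniform logits over two classes the loss is $\log 2$; a confidently correct prediction drives it toward zero.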
For the hierarchical clustering loss, a certain number of triples are randomly sampled from the data and the loss is computed on them; its goal is to make nodes with higher similarity merge earlier in the hierarchical clustering tree. It is computed as follows:

$$\mathcal{L}_{hc} = \sum_{(i,j,k)} \left[ w_{ij} + w_{ik} + w_{jk} - w_{(i,j,k)} \right]$$

$$w_{(i,j,k)} = (w_{ij}, w_{ik}, w_{jk}) \cdot \sigma_\tau\left(d_o(z_{i\vee j}),\ d_o(z_{i\vee k}),\ d_o(z_{j\vee k})\right)^{\top}$$

wherein $\sigma_\tau$ denotes a normalized softmax function, $(i, j, k)$ denotes a triple extracted from the data, $z_{i\vee j}$ denotes the smallest common ancestor node of $i$ and $j$ in the clustering tree (and analogously for $z_{i\vee k}$ and $z_{j\vee k}$), $d_o$ denotes the hyperbolic distance to the center of the hyperbolic space, and $w_{ij}$ denotes the hyperbolic similarity between $i$ and $j$.
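A sketch of the per-triple objective, assuming the three similarities and the three ancestor-to-origin distances are already computed (deriving the ancestor embeddings themselves is omitted). The temperature `tau` is an assumption; the scaled softmax softly selects the pair whose common ancestor lies farthest from the origin, i.e. the pair that merges earliest, so the loss is minimized when that pair is also the most similar one:

```python
import numpy as np

def softmax_scaled(v, tau=0.1):
    """Temperature-scaled softmax sigma_tau over a small vector."""
    v = np.asarray(v, dtype=float) / tau
    e = np.exp(v - v.max())
    return e / e.sum()

def triple_hc_loss(w, d_o, tau=0.1):
    """Hierarchical clustering loss for one triple (i, j, k).

    w   = (w_ij, w_ik, w_jk): hyperbolic similarities within the triple.
    d_o = distances of the three smallest-common-ancestor embeddings
          (z_{i v j}, z_{i v k}, z_{j v k}) to the origin.
    """
    w = np.asarray(w, dtype=float)
    w_triple = float(np.dot(w, softmax_scaled(d_o, tau)))
    return float(w.sum() - w_triple)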
To optimize clustering and classification simultaneously, triple sampling and hierarchical clustering are performed at the logit layer.
Step 5: testing and applying the hyperbolic neural network and the hyperbolic multiple logistic regression classifier.
After training, whether each test sample is classified correctly is recorded. When all tests are finished, the number of correctly classified samples is divided by the total amount of data to obtain the classification accuracy.
For comparison, experiments were conducted on the same data set with the same network structure under three different spatial metrics; the results, shown in fig. 3, confirm that the proposed feature learning framework performs best in hyperbolic space. The three sub-graphs show the classification results for 21 Chinese initial pronunciations, 24 Chinese final pronunciations and 20 Chinese character pronunciations, respectively; in each case the performance of the framework in hyperbolic space is clearly superior to that in Euclidean space and spherical space.
To show that the learning framework can mine the latent hierarchy of the data and learn features with stronger phonetic characteristics, the multivariate logistic regression classification boundaries learned by the network were analyzed visually, as shown in fig. 4. The left sub-graph shows the classification boundaries learned with hierarchical clustering optimization and the right sub-graph those learned without it; different colors denote different classes of Chinese initials. After hierarchical clustering optimization is added, the learned classification boundaries are more dispersed overall, while the boundaries of initials sharing the same place of articulation cluster together.
The embodiments described above are intended to illustrate the technical solutions and advantages of the present invention, and it should be understood that the above-mentioned embodiments are only specific embodiments of the present invention, and are not intended to limit the present invention, and any modifications, additions and equivalents made within the scope of the principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. An invasive brain-computer interface Chinese pronunciation decoding method is characterized by comprising the following steps:
(1) Collecting electroencephalogram data of Chinese pronunciation and synchronous audio data, screening effective neurons from the electroencephalogram data, removing the neurons with high similarity, and standardizing the electroencephalogram data; marking the time node of the sound production on the electroencephalogram data by using the synchronous audio data, and intercepting data segments with fixed window length, wherein each data segment corresponds to a Chinese phoneme;
(2) Projecting the electroencephalogram data processed in the step (1) into a hyperbolic space, and forming training data by the electroencephalogram data and corresponding Chinese phonemes in the hyperbolic space;
(3) Constructing a hyperbolic neural network and a hyperbolic multiple logistic regression classifier; the hyperbolic neural network is used for extracting the features of the electroencephalogram data in the hyperbolic space, and the hyperbolic multiple logistic regression classifier is used for classifying Chinese phonemes for the features of the electroencephalogram data;
(4) Training a hyperbolic neural network and a hyperbolic multiple logistic regression classifier;
in the training process, a certain number of triples are extracted from the training data, the hierarchical clustering loss is computed on these triples based on the output features of the hyperbolic neural network, and this loss is added, with a certain weight, to the total loss function to be optimized;
(5) And projecting the electroencephalogram data to be decoded to a hyperbolic space, and then sequentially inputting the data to the trained hyperbolic neural network and hyperbolic multiple logistic regression classifier to obtain the decoded Chinese phoneme classification.
2. The invasive brain-computer interface Chinese pronunciation decoding method of claim 1, wherein in step (1), screening effective neurons from the electroencephalogram data and removing highly similar neurons specifically comprises:
spike sorting is performed first, the firing of every neuron in the electroencephalogram signals is extracted, and the waveforms are plotted; the firing waveform of each neuron is inspected, and neurons with a distinct waveform and a total firing count greater than 100 are kept;
cosine similarity is calculated between the firings of different neurons, and when the similarity of several neurons is greater than 0.7, only one of them is retained, so as to reduce the influence of crosstalk on data quality.
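The deduplication step can be sketched as a greedy pass over per-neuron firing vectors; the `(n_neurons, n_bins)` layout and the keep-the-first strategy are assumptions of this sketch:

```python
import numpy as np

def deduplicate_neurons(firing, threshold=0.7):
    """Keep only one neuron from any group whose firing vectors have
    cosine similarity above `threshold` (greedy, first-kept wins)."""
    firing = np.asarray(firing, dtype=float)
    norms = np.linalg.norm(firing, axis=1, keepdims=True)
    unit = firing / np.where(norms == 0, 1, norms)  # unit vectors
    kept = []
    for i in range(len(firing)):
        # keep neuron i only if it is not too similar to any kept neuron
        if all(float(np.dot(unit[i], unit[j])) <= threshold for j in kept):
            kept.append(i)
    return kept
```

Two near-identical firing vectors collapse to one kept index, while an orthogonal neuron survives.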
3. The invasive brain-computer interface Chinese pronunciation decoding method of claim 1, wherein in step (2), a Poincare disc model $\mathbb{D}_c^d$ is used to project the electroencephalogram data into the hyperbolic space:

$$\mathbb{D}_c^d = \{\, x \in \mathbb{R}^d : c\|x\|^2 < 1 \,\}$$

$$g_x^{\mathbb{D}} = (\lambda_x^c)^2\, g^E$$

$$\lambda_x^c = \frac{2}{1 - c\|x\|^2}$$

wherein $\mathbb{D}_c^d$ denotes the hyperbolic space with curvature $c$ and dimension $d$; $x$ denotes a data point; $\mathbb{R}^d$ denotes the Euclidean real space of dimension $d$; $\lambda_x^c$ denotes the conformal factor of $x$; $g^E$ and $g^{\mathbb{D}}$ denote the Euclidean metric and the hyperbolic metric, respectively; and $\lambda_x^c$ expresses the conformality between the two metrics.
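A minimal sketch of the conformal factor from claim 3, valid only for points inside the ball ($c\|x\|^2 < 1$); the default curvature is an assumption:

```python
import numpy as np

def conformal_factor(x, c=1.0):
    """lambda_x^c = 2 / (1 - c * ||x||^2): the factor relating the
    Euclidean and hyperbolic metrics at point x of the Poincare ball."""
    x = np.asarray(x, dtype=float)
    sq = c * float(np.dot(x, x))
    if sq >= 1.0:
        raise ValueError("point lies outside the Poincare ball")
    return 2.0 / (1.0 - sq)
```

At the origin the factor is exactly 2, and it grows without bound as the point approaches the boundary of the disc.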
4. The invasive brain-computer interface Chinese pronunciation decoding method according to claim 1, wherein in step (3), the hyperbolic neural network is represented as:

$$f^{\otimes_c}(x) = \exp_0^c\left(f\left(\log_0^c(x)\right)\right)$$

$$\exp_0^c(v) = \tanh\left(\sqrt{c}\,\|v\|\right)\frac{v}{\sqrt{c}\,\|v\|}$$

$$\log_0^c(y) = \operatorname{artanh}\left(\sqrt{c}\,\|y\|\right)\frac{y}{\sqrt{c}\,\|y\|}$$

wherein $f^{\otimes_c}$ and $f$ denote the hyperbolic neural network function and the Euclidean neural network function, respectively; $\exp_0^c$ and $\log_0^c$ denote the exponential map and the logarithmic map at the origin; $c$ denotes the curvature of the hyperbolic space; $x$ denotes a data point; and $\|x\|$ denotes the norm of $x$.
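A sketch of the two maps and the lifting construction from claim 4; curvature handling and the zero-vector guard are assumptions of this sketch:

```python
import numpy as np

def exp0(v, c=1.0):
    """Exponential map at the origin: tangent vector -> Poincare ball."""
    v = np.asarray(v, dtype=float)
    n = np.linalg.norm(v)
    if n == 0:
        return v
    sc = np.sqrt(c)
    return np.tanh(sc * n) * v / (sc * n)

def log0(y, c=1.0):
    """Logarithmic map at the origin: Poincare ball -> tangent space."""
    y = np.asarray(y, dtype=float)
    n = np.linalg.norm(y)
    if n == 0:
        return y
    sc = np.sqrt(c)
    return np.arctanh(sc * n) * y / (sc * n)

def hyperbolic_layer(f, x, c=1.0):
    """Lift a Euclidean function f to the ball: exp0(f(log0(x)))."""
    return exp0(f(log0(x, c)), c)
```

The two maps are mutual inverses at the origin, the exponential map always lands inside the unit ball for $c = 1$, and lifting the identity function leaves ball points unchanged.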
5. The invasive brain-computer interface Chinese pronunciation decoding method of claim 1, wherein in step (3), when the hyperbolic multiple logistic regression classifier performs Chinese phoneme classification, given class $z$, the probability of each class is computed as:

$$p(y = z \mid x) \propto \exp\left(\frac{\lambda_{p_z}^c \|a_z\|}{\sqrt{c}} \operatorname{arcsinh}\left(\frac{2\sqrt{c}\,\langle -p_z \oplus_c x,\ a_z\rangle}{\left(1 - c\|{-p_z \oplus_c x}\|^2\right)\|a_z\|}\right)\right)$$

wherein $p_z$ and $a_z$ are the parameters of the hyperbolic multivariate logistic regression; $\lambda_{p_z}^c$ denotes the conformal factor of the classification boundary of class $z$; $\operatorname{arcsinh}$ denotes the inverse hyperbolic sine function; $\exp$ denotes the exponential function with base $e$; $\|\cdot\|$ denotes the norm; $\oplus_c$ denotes the Mobius addition operation; $c$ denotes the curvature of the hyperbolic space; and $\langle\cdot,\cdot\rangle$ denotes the inner product operation.
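A sketch of the per-class score inside the softmax of claim 5; the parameter names `p_z` (ball point) and `a_z` (direction) follow the claim's symbols, while the self-contained Mobius helper and default curvature are assumptions:

```python
import numpy as np

def mobius_add(x, y, c=1.0):
    """Mobius addition x (+)_c y on the Poincare ball of curvature c."""
    xy = float(np.dot(x, y))
    x2 = float(np.dot(x, x))
    y2 = float(np.dot(y, y))
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c ** 2 * x2 * y2
    return num / den

def hyperbolic_mlr_logit(x, p_z, a_z, c=1.0):
    """Unnormalized score of class z for a point x in the Poincare ball,
    following the claim-5 formula; softmax over classes gives p(y=z|x)."""
    x = np.asarray(x, dtype=float)
    p_z = np.asarray(p_z, dtype=float)
    a_z = np.asarray(a_z, dtype=float)
    d = mobius_add(-p_z, x, c)                         # -p_z (+)_c x
    lam = 2.0 / (1.0 - c * float(np.dot(p_z, p_z)))    # conformal factor at p_z
    na = np.linalg.norm(a_z)
    sc = np.sqrt(c)
    arg = 2 * sc * float(np.dot(d, a_z)) / ((1 - c * float(np.dot(d, d))) * na)
    return (lam * na / sc) * np.arcsinh(arg)
```

With $p_z$ at the origin the score vanishes for $x = 0$ and is positive for points on the same side of the boundary as $a_z$.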
6. The invasive brain-computer interface Chinese pronunciation decoding method of claim 1, wherein in step (4), the formula of the total loss function is:

$$\mathcal{L} = \alpha\,\mathcal{L}_{cls} + \beta\,\mathcal{L}_{hc}$$

wherein $\mathcal{L}_{cls}$ denotes the classification loss; $\mathcal{L}_{hc}$ denotes the hierarchical clustering loss; and $\alpha$ and $\beta$ are coefficients that balance the two parts of the loss function.
7. The invasive brain-computer interface Chinese pronunciation decoding method of claim 6, wherein the classification loss is computed as follows:

$$\mathcal{L}_{cls} = -\frac{1}{N}\sum_{i=1}^{N} \log p(y_i \mid x_i)$$

wherein $y_i$ is the category label of sample $x_i$; $\log p(y_i \mid x_i)$ is the log-probability of $y_i$ after softmax; and $N$ denotes the amount of data in a mini-batch.
8. The invasive brain-computer interface Chinese pronunciation decoding method of claim 6, wherein the hierarchical clustering loss is computed as follows:

$$\mathcal{L}_{hc} = \sum_{(i,j,k)} \left[ w_{ij} + w_{ik} + w_{jk} - w_{(i,j,k)} \right]$$

$$w_{(i,j,k)} = (w_{ij}, w_{ik}, w_{jk}) \cdot \sigma_\tau\left(d_o(z_{i\vee j}),\ d_o(z_{i\vee k}),\ d_o(z_{j\vee k})\right)^{\top}$$

wherein $\sigma_\tau$ denotes a normalized softmax function; $(i, j, k)$ denotes a triple extracted from the training data; $z_{i\vee j}$ denotes the smallest common ancestor node of $i$ and $j$ in the cluster tree, $z_{i\vee k}$ that of $i$ and $k$, and $z_{j\vee k}$ that of $j$ and $k$; $d_o$ denotes the hyperbolic distance to the center of the hyperbolic space; $w_{ij}$, $w_{ik}$ and $w_{jk}$ denote the hyperbolic similarities between the corresponding pairs in the triple; and $\top$ denotes matrix transposition.
9. The invasive brain-computer interface Chinese pronunciation decoding method of claim 8, wherein, in performing the hyperbolic similarity calculation, a random sampling method is used to sample a certain number of triples $(x_i, x_j, x_k)$; the pairwise hyperbolic distances $d_{ij}$, $d_{ik}$, $d_{jk}$ are computed and normalized by dividing each by the sum of the three, giving $\tilde d_{ij}$, $\tilde d_{ik}$, $\tilde d_{jk}$; the similarity is then expressed as $w_{ij} = 1 - \tilde d_{ij}$ (and analogously for $w_{ik}$ and $w_{jk}$).
10. The invasive brain-computer interface chinese pronunciation decoding method of claim 8, wherein during hierarchical clustering loss calculation, triple sampling and hierarchical clustering are performed on a logit layer of a hyperbolic multiple logistic regression classifier.
CN202211545924.0A 2022-12-05 2022-12-05 Invasive brain-computer interface Chinese pronunciation decoding method Active CN115565540B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211545924.0A CN115565540B (en) 2022-12-05 2022-12-05 Invasive brain-computer interface Chinese pronunciation decoding method


Publications (2)

Publication Number Publication Date
CN115565540A CN115565540A (en) 2023-01-03
CN115565540B (en) 2023-04-07

Family

ID=84770115

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211545924.0A Active CN115565540B (en) 2022-12-05 2022-12-05 Invasive brain-computer interface Chinese pronunciation decoding method

Country Status (1)

Country Link
CN (1) CN115565540B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117851769B (en) * 2023-11-30 2024-06-21 浙江大学 Chinese character writing decoding method for invasive brain-computer interface
CN117958765B (en) * 2024-04-01 2024-06-21 华南理工大学 Multi-mode voice viscera organ recognition method based on hyperbolic space alignment

Citations (2)

Publication number Priority date Publication date Assignee Title
CN113031766A (en) * 2021-03-15 2021-06-25 哈尔滨工业大学 Method for decoding Chinese pronunciation through electroencephalogram
CN113589937A (en) * 2021-08-04 2021-11-02 浙江大学 Invasive brain-computer interface decoding method based on twin network kernel regression

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
JPH0993135A (en) * 1995-09-26 1997-04-04 Victor Co Of Japan Ltd Coder and decoder for sound data
CN102789594B (en) * 2012-06-28 2014-08-13 南京邮电大学 Voice generation method based on DIVA neural network model
CN111681636B (en) * 2020-06-16 2022-02-18 深圳市华创技术有限公司 Technical term sound generation method based on brain-computer interface, medical system and terminal

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN113031766A (en) * 2021-03-15 2021-06-25 哈尔滨工业大学 Method for decoding Chinese pronunciation through electroencephalogram
CN113589937A (en) * 2021-08-04 2021-11-02 浙江大学 Invasive brain-computer interface decoding method based on twin network kernel regression

Also Published As

Publication number Publication date
CN115565540A (en) 2023-01-03

Similar Documents

Publication Publication Date Title
Jahangir et al. Deep learning approaches for speech emotion recognition: State of the art and research challenges
CN110516696B (en) Self-adaptive weight bimodal fusion emotion recognition method based on voice and expression
CN115565540B (en) Invasive brain-computer interface Chinese pronunciation decoding method
US20170358306A1 (en) Neural network-based voiceprint information extraction method and apparatus
CN109559736B (en) Automatic dubbing method for movie actors based on confrontation network
CN103996155A (en) Intelligent interaction and psychological comfort robot service system
CN115762536A (en) Small sample optimization bird sound recognition method based on bridge transform
Ocquaye et al. Dual exclusive attentive transfer for unsupervised deep convolutional domain adaptation in speech emotion recognition
CN109979436A (en) A kind of BP neural network speech recognition system and method based on frequency spectrum adaptive method
Sahu et al. Modeling feature representations for affective speech using generative adversarial networks
Ling An acoustic model for English speech recognition based on deep learning
Rybicka et al. End-to-End Neural Speaker Diarization with an Iterative Refinement of Non-Autoregressive Attention-based Attractors.
CN112466284B (en) Mask voice identification method
Anjos et al. Detection of voicing and place of articulation of fricatives with deep learning in a virtual speech and language therapy tutor
Wu et al. Speech synthesis with face embeddings
CN110348482A (en) A kind of speech emotion recognition system based on depth model integrated architecture
CN111462762B (en) Speaker vector regularization method and device, electronic equipment and storage medium
Yang et al. Speech emotion analysis of netizens based on bidirectional lstm and pgcdbn
CN113282718B (en) Language identification method and system based on self-adaptive center anchor
Shome et al. Speaker Recognition through Deep Learning Techniques: A Comprehensive Review and Research Challenges
CN114882888A (en) Voiceprint recognition method and system based on variational self-coding and countermeasure generation network
CN115145402A (en) Intelligent toy system with network interaction function and control method
CN115472182A (en) Attention feature fusion-based voice emotion recognition method and device of multi-channel self-encoder
CN112951270B (en) Voice fluency detection method and device and electronic equipment
Singh Speaker emotion Recognition System using Artificial neural network classification method for brain-inspired application

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant