CN106502394B - Term vector calculation method and device based on EEG signals

Info

Publication number: CN106502394B
Application number: CN201610907518.2A
Authority: CN
Inventors: 徐睿峰; 杜嘉晨; 桂林; 黄锦辉
Original assignee: Shenzhen Graduate School Harbin Institute of Technology
Current assignee: Shenzhen Graduate School Harbin Institute of Technology
Priority date: 2016-10-18
Filing date: 2016-10-18
Publication date: 2019-06-25
Anticipated expiration: 2036-10-18
Also published as: CN106502394A

Abstract

The present invention provides a term vector calculation method and device based on EEG signals. The method includes: step S1, collecting a text corpus and processing the corpus entries in it to obtain a corpus in continuous phrase format, with the phrase as the unit; step S2, presenting the corpus in continuous phrase format to a labeler for reading, and acquiring the EEG signals produced as the labeler reads each phrase; step S3, taking the EEG signals corresponding to the collected phrases as the prediction target and training term vectors, using the current phrase as the feature to predict the EEG signals of its context, thereby constructing a term vector representation model based on EEG signals. Through this scheme, the present invention improves the accuracy of term vector calculation.

Description

Term vector calculation method and device based on EEG signals

Technical field

The invention belongs to the technical field of natural language processing, and in particular relates to a term vector calculation method and device based on EEG signals.

Background technique

In natural language processing tasks, term vectors (word embeddings) are commonly used to represent the words of the original text, so that numerical machine learning algorithms can be applied to text data. The basic idea of a term vector model is as follows: through extensive training, each word in a language is mapped to a vector of fixed length. This length is generally much smaller than the size of the language's dictionary, typically from tens to a few hundred dimensions. Together these vectors form a term vector space, in which each vector can be regarded as a point. Once a measure of "distance" is introduced on this space, the syntactic and semantic similarity between words can be judged from the distance between their term vectors. Traditional term vector calculation methods optimize the representation of the current text by using its vector to predict the vectors of its context as accurately as possible.

In traditional term vector calculation, predicting the context from the current text is the primary training objective. This approach has three main drawbacks:

1. It considers only the syntactic-level attributes of words and not their semantic-level attributes, so the trained term vectors can usually express only relatively shallow relationships between words;

2. It does not model the human language cognition process, and so ignores important features from cognitive neuroscience and psychology;

3. Owing to the complexity of human language cognition, term vectors obtained by simply predicting context cannot reflect the characteristics of different natural language processing tasks, and therefore generalize poorly.

Summary of the invention

The purpose of the present invention is to provide a term vector calculation method and device based on EEG signals, with the aim of improving the accuracy of term vector calculation.

The invention is realized as follows: a term vector calculation method based on EEG signals, the method comprising the following steps:

Step S1, collecting a text corpus and processing the corpus entries in it to obtain a corpus in continuous phrase format, with the phrase as the unit;

Step S2, presenting the corpus in continuous phrase format to a labeler for reading, and acquiring the EEG signals produced as the labeler reads each phrase;

Step S3, taking the EEG signals corresponding to the collected phrases as the prediction target and training term vectors, using the term vector representation of the current phrase as the feature to predict the EEG signals of its context, thereby constructing a term vector representation model based on EEG signals.

In a further technical solution of the invention, step S1 includes the following sub-steps:

Step S11, collecting a text corpus, the corpus entries in which are at the sentence or document level;

Step S12, removing from the text corpus the entries whose length exceeds a first preset value or is less than a second preset value, to obtain a pre-processed corpus;

Step S13, performing word segmentation on the pre-processed corpus to obtain words;

Step S14, converting the words into phrases using chunk parsing, to obtain a corpus in continuous phrase format.

In a further technical solution of the invention, step S3 includes the following sub-steps:

Step S31, performing noise reduction on the collected EEG signals to obtain denoised EEG signals;

Step S32, performing spatial projection and dimensionality reduction on the denoised EEG signals;

Step S33, initializing all phrases in the pre-processed corpus as term vector representations;

Step S34, traversing all phrases in the pre-processed corpus; taking the term vector representation of the current phrase as the feature, predicting the EEG signals of its context with a neural network regression model; comparing the predicted EEG signals of the context with the actual EEG signals to obtain a prediction error; and adjusting the term vector representation of the current phrase according to the prediction error, where the actual EEG signals are the EEG signals recorded when the labeler read the context; this step is repeated until the prediction error is less than a preset threshold.

In a further technical solution of the invention, step S31 includes:

processing the collected EEG signals to obtain EEG signals whose signal-to-noise ratio is higher than a third preset value;

Step S32 includes:

performing spatial projection and dimensionality reduction on the EEG signals whose signal-to-noise ratio is higher than the third preset value, using the common spatial pattern (CSP) algorithm, to obtain EEG signals whose dimension is lower than a fourth preset value.

In a further technical solution of the invention, the noise reduction of the collected EEG signals uses the FastICA algorithm.

The present invention also provides a term vector computing device based on EEG signals, the device comprising:

a collection module, configured to collect a text corpus and process the corpus entries in it to obtain a corpus in continuous phrase format, with the phrase as the unit;

an acquisition module, configured to present the corpus in continuous phrase format to a labeler for reading, and to acquire the EEG signals produced as the labeler reads each phrase;

a construction module, configured to take the EEG signals corresponding to the collected phrases as the prediction target and train term vectors, using the term vector representation of the current phrase as the feature to predict the EEG signals of its context, thereby constructing a term vector representation model based on EEG signals.

In a further technical solution of the invention, the collection module includes:

a collection unit, configured to collect a text corpus, the corpus entries in which are at the sentence or document level;

a pre-processing unit, configured to remove from the text corpus the entries whose length exceeds a first preset value or is less than a second preset value, to obtain a pre-processed corpus;

a word segmentation unit, configured to perform word segmentation on the pre-processed corpus to obtain words;

a conversion unit, configured to convert the words into phrases using chunk parsing, to obtain a corpus in continuous phrase format.

In a further technical solution of the invention, the construction module includes:

a noise reduction unit, configured to perform noise reduction on the collected EEG signals to obtain denoised EEG signals;

a dimensionality reduction unit, configured to perform spatial projection and dimensionality reduction on the denoised EEG signals;

an initialization unit, configured to initialize all phrases in the pre-processed corpus as term vector representations;

a construction unit, configured to traverse all phrases in the pre-processed corpus; to take the term vector representation of the current phrase as the feature and predict the EEG signals of its context with a neural network regression model; to compare the predicted EEG signals of the context with the actual EEG signals to obtain a prediction error; and to adjust the term vector representation of the current phrase according to the prediction error, where the actual EEG signals are the EEG signals recorded when the labeler read the context; this step is repeated until the prediction error is less than a preset threshold.

In a further technical solution of the invention, the noise reduction unit is further configured to process the collected EEG signals to obtain EEG signals whose signal-to-noise ratio is higher than a third preset value;

the dimensionality reduction unit is further configured to perform spatial projection and dimensionality reduction on the EEG signals whose signal-to-noise ratio is higher than the third preset value, using the common spatial pattern (CSP) algorithm, to obtain EEG signals whose dimension is lower than a fourth preset value.

In a further technical solution of the invention, the noise reduction unit is further configured to perform noise reduction on the collected EEG signals using the FastICA algorithm.

The beneficial effects of the present invention are as follows. The term vector calculation method and device based on EEG signals provided by the invention collect a text corpus and process the corpus entries in it to obtain a corpus in continuous phrase format, with the phrase as the unit; present the corpus in continuous phrase format to a labeler for reading and acquire the EEG signals produced as the labeler reads each phrase; and take the EEG signals corresponding to the collected phrases as the prediction target and train term vectors, using the current phrase as the feature to predict the EEG signals of its context, thereby constructing a term vector representation model based on EEG signals. This improves the accuracy of term vector calculation.

Brief description of the drawings

Fig. 1 is a flow diagram of the first embodiment of the term vector calculation method based on EEG signals according to the present invention;

Fig. 2 is a detailed flow diagram of step S1 in the second embodiment of the term vector calculation method based on EEG signals according to the present invention;

Fig. 3 is a detailed flow diagram of step S3 in the third embodiment of the term vector calculation method based on EEG signals according to the present invention;

Fig. 4 is a functional block diagram of the first embodiment of the term vector computing device based on EEG signals according to the present invention;

Fig. 5 is a detailed functional block diagram of the collection module in the second embodiment of the term vector computing device based on EEG signals according to the present invention;

Fig. 6 is a detailed functional block diagram of the construction module in the third embodiment of the term vector computing device based on EEG signals according to the present invention.

Reference numerals:

Collection module 10: collection unit 101; pre-processing unit 102; word segmentation unit 103; conversion unit 104;

Acquisition module 20;

Construction module 30: noise reduction unit 301; dimensionality reduction unit 302; initialization unit 303; construction unit 304.

Specific embodiment

The solution of the embodiments of the present invention is mainly as follows: a text corpus is collected and the corpus entries in it are processed to obtain a corpus in continuous phrase format, with the phrase as the unit; the corpus in continuous phrase format is presented to a labeler for reading, and the EEG signals produced as the labeler reads each phrase are acquired; the EEG signals corresponding to the collected phrases are taken as the prediction target and term vectors are trained, using the current phrase as the feature to predict the EEG signals of its context, thereby constructing a term vector representation model based on EEG signals.

Referring to Fig. 1, Fig. 1 is a flow diagram of the first embodiment of the term vector calculation method based on EEG signals according to the present invention. As shown in Fig. 1, the first embodiment of the method comprises the following steps:

Step S1, collecting a text corpus and processing the corpus entries in it to obtain a corpus in continuous phrase format, with the phrase as the unit;

Specifically, corpus material is language material that has actually occurred in real language use. It is usually stored in a corpus, which is a database that holds such material in electronic form. Raw material generally has to be analyzed and processed before it becomes a useful resource.

At present, Chinese corpora mainly include the General Corpus of Modern Chinese, the People's Daily annotated corpus, Modern Chinese corpora for language teaching and research, and Modern Chinese corpora oriented to speech signal analysis. When corpus material is needed, it can be obtained directly from these existing corpora. The invention can of course also be implemented with material from other sources, for example material obtained from Internet web pages.

Because term vectors are trained with phrases as the training data, whereas the entries in a corpus are usually sentences or articles, the corpus must be processed to obtain a corpus in continuous phrase format, with the phrase as the unit. For example, the sentence "I love Beijing, Beijing is the capital of China" is processed into the continuous phrase sequence "I/love/Beijing/Beijing/is/China/'s/capital".

Step S2, presenting the corpus in continuous phrase format to a labeler for reading, and acquiring the EEG signals produced as the labeler reads each phrase; here, the labeler is the user who reads the corpus presented in continuous phrase format.

Specifically, the invention represents term vectors by means of EEG signals. While reading the corpus presented in continuous phrase format, the labeler wears an EEG acquisition device so that the EEG signals produced while reading each phrase can be recorded. Once the EEG signals for each phrase have been obtained, each collected EEG signal is stored as a pair with its corresponding phrase.
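The pairing of phrases with their EEG recordings can be sketched as follows. This is only an illustration under stated assumptions: the `PhraseRecord` type, the `pair_phrases_with_eeg` function and the idea that each phrase is shown for a fixed number of samples (`samples_per_phrase`) are hypothetical and are not specified in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PhraseRecord:
    """One phrase together with the EEG epoch recorded while it was read."""
    phrase: str
    eeg: List[List[float]]  # rows are time samples, columns are channels


def pair_phrases_with_eeg(phrases: List[str],
                          recording: List[List[float]],
                          samples_per_phrase: int) -> List[PhraseRecord]:
    """Cut a continuous recording into one epoch per phrase and store the pairs.

    Assumes (hypothetically) that phrases are presented back to back and that
    each one occupies exactly samples_per_phrase samples of the recording.
    """
    records = []
    for i, phrase in enumerate(phrases):
        start = i * samples_per_phrase
        epoch = recording[start:start + samples_per_phrase]
        if len(epoch) < samples_per_phrase:
            raise ValueError("recording is shorter than the presented phrase sequence")
        records.append(PhraseRecord(phrase, epoch))
    return records


# Toy single-channel recording: six samples for three phrases.
phrases = ["I", "love", "Beijing"]
recording = [[float(t)] for t in range(6)]
records = pair_phrases_with_eeg(phrases, recording, samples_per_phrase=2)
print(records[1])  # the pair stored for "love"
```

In a real setup the epoch boundaries would come from the presentation software's event markers rather than from a fixed sample count.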

Step S3, taking the EEG signals corresponding to the collected phrases as the prediction target and training term vectors, using the term vector representation of the current phrase as the feature to predict the EEG signals of its context, thereby constructing a term vector representation model based on EEG signals.

Specifically, all phrases in the pre-processed corpus can first be initialized as term vector representations. All phrases in the pre-processed corpus are then traversed: taking the term vector representation of the current phrase as the feature, a neural network regression model predicts the EEG signals of its context; the predicted EEG signals of the context are compared with the actual EEG signals to obtain a prediction error; and the term vector representation of the current phrase is adjusted according to the prediction error, where the actual EEG signals are the EEG signals recorded when the labeler read the context. This step is repeated until the prediction error is less than a preset threshold.

In this embodiment the context window may be three: taking the term vector representation of the current phrase as the feature, a neural network regression model predicts the EEG signals of the three preceding phrases and the three following phrases; the predicted EEG signals of the context are compared with the actual EEG signals to obtain a prediction error, and each error so produced is back-propagated to adjust the vector representation of the current phrase.
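The objective implied by this window can be written compactly. The following is a sketch under assumptions, not a formula taken from the patent: the symbols $v_i$ (term vector of the $i$-th phrase), $s_j$ (processed EEG feature vector recorded while the labeler read the $j$-th phrase), $f_\theta$ (the neural network regression model, assumed here to be shared across all context offsets), $c$ (window size), $N$ (number of phrases) and $\eta$ (learning rate) are introduced for illustration, and the squared-error form is assumed because the text speaks only of a "prediction error".

```latex
E(\theta, v) \;=\; \sum_{i=1}^{N} \;
  \sum_{\substack{-c \,\le\, k \,\le\, c \\ k \neq 0,\; 1 \le i+k \le N}}
  \bigl\lVert f_\theta(v_i) - s_{i+k} \bigr\rVert_2^{2},
\qquad c = 3,
\qquad
v_i \;\leftarrow\; v_i - \eta \, \frac{\partial E}{\partial v_i}
```

Under these assumptions, back-propagating each error term through $f_\theta$ to $v_i$ is what adjusts the vector representation of the current phrase.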

Through the above scheme, this embodiment collects a text corpus and processes the corpus entries in it to obtain a corpus in continuous phrase format, with the phrase as the unit; presents the corpus in continuous phrase format to a labeler for reading and acquires the EEG signals produced as the labeler reads each phrase; and takes the EEG signals corresponding to the collected phrases as the prediction target and trains term vectors, using the current phrase as the feature to predict the EEG signals of its context, thereby constructing a term vector representation model based on EEG signals. This improves the accuracy of term vector calculation.

As the second embodiment of the present invention, referring to Fig. 2, Fig. 2 is a detailed flow diagram of step S1 in the term vector calculation method based on EEG signals described with reference to Fig. 1. Step S1, collecting a text corpus and processing the corpus entries in it to obtain a corpus in continuous phrase format with the phrase as the unit, may include:

Step S11, collecting a text corpus, the corpus entries in which are at the sentence or document level;

Step S12, removing from the text corpus the entries whose length exceeds a first preset value or is less than a second preset value, to obtain a pre-processed corpus;

Step S13, performing word segmentation on the pre-processed corpus to obtain words;

Step S14, converting the words into phrases using chunk parsing, to obtain a corpus in continuous phrase format.

Specifically, the entries in the collected text corpus are usually sentences or articles. Because a sentence may be too long or too short, a range of acceptable sentence lengths can be preset empirically, and the entries whose length exceeds the first preset value or is less than the second preset value are removed to obtain the pre-processed corpus. The first preset value and the second preset value can be set empirically.

In this embodiment, word segmentation can first be performed on the pre-processed corpus to obtain words; chunk parsing is then used to convert the words into phrases, yielding a corpus in continuous phrase format.

Word segmentation relies mainly on a segmentation dictionary, and the quality of that dictionary directly determines the quality of the segmentation. The segmentation dictionaries in common use today are built from the "Xinhua Dictionary" or similar published works; this embodiment may also rely on other segmentation dictionaries.

Chunk parsing is a common technique in shallow syntactic analysis. It decomposes a sentence into components according to a predetermined model; these components are mainly phrases and longer phrases, so that the computer's understanding of a sentence can rise from the level of single characters and words to the more informative level of phrases, which is closer to natural language.
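The length filter of step S12 and the conversion to continuous phrase format can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the use of token count as the length measure, and the whitespace split that stands in for dictionary-based word segmentation and chunk parsing are all assumptions made for the example.

```python
from typing import List


def filter_by_length(entries: List[str], first_preset: int, second_preset: int) -> List[str]:
    """Step S12 (sketch): drop entries longer than first_preset or shorter than second_preset.

    Length is measured in whitespace-separated tokens, which is an assumption.
    """
    return [e for e in entries if second_preset <= len(e.split()) <= first_preset]


def to_continuous_phrase_format(entry: str) -> str:
    """Steps S13-S14 (placeholder): a whitespace split stands in for a real
    word segmenter followed by a chunk parser."""
    return "/".join(entry.split())


corpus = [
    "I love Beijing",                                           # 3 tokens, kept
    "Hi",                                                       # too short
    "one two three four five six seven eight nine ten eleven",  # too long
]
preprocessed = filter_by_length(corpus, first_preset=10, second_preset=2)
phrase_corpus = [to_continuous_phrase_format(e) for e in preprocessed]
print(phrase_corpus)  # ['I/love/Beijing']
```

For Chinese text the placeholder split would have to be replaced by an actual segmenter and chunker; only the filtering logic carries over unchanged.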

As the third embodiment of the present invention, referring to Fig. 3, Fig. 3 is a detailed flow diagram of step S3 in the term vector calculation method based on EEG signals described with reference to Fig. 1. Step S3, taking the EEG signals corresponding to the collected phrases as the prediction target and training term vectors, using the current phrase as the feature to predict the EEG signals of its context, thereby constructing a term vector representation model based on EEG signals, may include:

Step S31, performing noise reduction on the collected EEG signals to obtain denoised EEG signals;

When the EEG signals are acquired while the labeler reads the corpus presented in continuous phrase format, they are easily affected by factors such as equipment noise, electromyographic (EMG) signals and electro-oculographic (EOG) signals. The EEG signals recorded while the labeler reads the corpus therefore need to be denoised, to obtain denoised EEG signals with a high signal-to-noise ratio.

Signal-to-noise ratio (SNR or S/N) is the ratio of signal to noise in an electronic device or electronic system. Here, the signal is the electronic signal from outside the device that the device is required to process, and the noise is any irregular additional signal (or information) that appears after the signal has passed through the device and was not present in the original signal; this additional signal does not vary with the original signal. Signal-to-noise ratio is measured in dB and is computed as 10 lg(PS/PN), where PS and PN are the effective power of the signal and of the noise respectively. The higher the signal-to-noise ratio, the smaller the noise.
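As a quick worked example of the formula 10 lg(PS/PN), the sketch below computes the signal-to-noise ratio in dB from two power values. The function names and the `threshold_db` parameter are hypothetical; its default of 15 dB mirrors the preferred value given for this embodiment.

```python
import math


def snr_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in dB: 10 * log10(PS / PN)."""
    if signal_power <= 0 or noise_power <= 0:
        raise ValueError("powers must be positive")
    return 10.0 * math.log10(signal_power / noise_power)


def meets_threshold(signal_power: float, noise_power: float,
                    threshold_db: float = 15.0) -> bool:
    """True when the SNR is strictly higher than threshold_db."""
    return snr_db(signal_power, noise_power) > threshold_db


# A signal 100 times more powerful than the noise has an SNR of 20 dB.
print(snr_db(100.0, 1.0))
print(meets_threshold(100.0, 1.0))  # True: 20 dB is above 15 dB
print(meets_threshold(10.0, 1.0))   # False: 10 dB is below 15 dB
```

A 15 dB threshold corresponds to a power ratio of about 31.6, since 10^(15/10) ≈ 31.6.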

In this embodiment, the FastICA algorithm is used to decompose the EEG signals collected while the labeler reads the corpus presented in continuous phrase format into multiple independent components. Noise components are then identified using spectral features, higher-order crossing features or the like, and removed from the collected EEG signals, yielding denoised EEG signals with a high signal-to-noise ratio. In this embodiment the denoised EEG signals preferably have a signal-to-noise ratio higher than 15 dB.

Independent component analysis (ICA) is a very effective data analysis tool, used mainly to extract the original independent signals from mixed data. It has received wide attention as an effective method of signal separation. Among the many ICA algorithms, the fixed-point algorithm (FastICA) is widely used in signal processing because of its fast convergence and good separation performance. The algorithm can estimate, from the observed signals, original signals that are statistically independent of one another and have been mixed by unknown factors.

Step S32, performing spatial projection and dimensionality reduction on the denoised EEG signals;

Specifically, in this embodiment, the common spatial pattern (CSP) algorithm is used to project the denoised, high signal-to-noise-ratio EEG signals of the different channels according to their spatial positions and to reduce their dimension, yielding dimension-reduced EEG signals. In this embodiment the dimension-reduced EEG signals preferably have fewer than 300 dimensions.

Step S33, initializing all phrases in the pre-processed corpus as term vector representations;

Step S34, traversing all phrases in the pre-processed corpus; taking the term vector representation of the current phrase as the feature, predicting the EEG signals of its context with a neural network regression model; comparing the predicted EEG signals of the context with the actual EEG signals to obtain a prediction error; and adjusting the term vector representation of the current phrase according to the prediction error, where the actual EEG signals are the EEG signals recorded when the labeler read the context; this step is repeated until the overall prediction error is less than a preset threshold.

In this embodiment the context window may be three: taking the term vector representation of the current phrase as the feature, a neural network regression model predicts the EEG signals of the three preceding phrases and the three following phrases; the predicted EEG signals of the context are compared with the actual EEG signals to obtain a prediction error, and each error so produced is back-propagated to adjust the vector representation of the current phrase. This continues until the error falls below the preset error threshold, which can be set empirically to 10^-5.
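The training loop of steps S33 and S34 can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the patented model: a single linear layer `W`, shared across all context positions, stands in for the neural network regression model; the loss is half the squared error; and all names and default values other than the window of three and the 10^-5 threshold (`dim`, `lr`, `max_epochs`, `seed`) are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def train_embeddings(phrases: List[str], eeg: List[List[float]], dim: int = 4,
                     window: int = 3, lr: float = 0.05, threshold: float = 1e-5,
                     max_epochs: int = 200, seed: int = 0
                     ) -> Tuple[Dict[str, List[float]], List[float]]:
    """eeg[j] is the (already denoised, reduced) EEG feature vector for phrases[j]."""
    rng = random.Random(seed)
    k = len(eeg[0])
    # Step S33: initialise one term vector per distinct phrase.
    emb = {p: [rng.uniform(-0.1, 0.1) for _ in range(dim)]
           for p in dict.fromkeys(phrases)}
    # Linear stand-in for the regression network: prediction = W @ embedding.
    W = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(k)]
    history = []
    for _ in range(max_epochs):
        total, count = 0.0, 0
        # Step S34: traverse all phrases and predict the EEG of each context phrase.
        for i, p in enumerate(phrases):
            lo, hi = max(0, i - window), min(len(phrases), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                e = emb[p]
                pred = [sum(w * x for w, x in zip(row, e)) for row in W]
                err = [y - t for y, t in zip(pred, eeg[j])]
                total += 0.5 * sum(d * d for d in err)
                count += 1
                # Back-propagate: gradient w.r.t. the embedding uses W before its update.
                grad_e = [sum(err[r] * W[r][c] for r in range(k)) for c in range(dim)]
                for r in range(k):
                    for c in range(dim):
                        W[r][c] -= lr * err[r] * e[c]
                for c in range(dim):
                    e[c] -= lr * grad_e[c]
        mean_error = total / count if count else 0.0
        history.append(mean_error)
        if mean_error < threshold:  # stop once the mean error is below the preset threshold
            break
    return emb, history


phrases = ["I", "love", "Beijing", "Beijing", "is", "China", "capital"]
eeg = [[1.0, -0.5] for _ in phrases]  # toy targets, identical for every phrase
emb, history = train_embeddings(phrases, eeg)
print(len(emb), history[0] > history[-1])
```

With real recordings the targets for a given phrase differ from one context position to the next, so the error will generally not reach zero; the loop then stops at `max_epochs` rather than at the threshold.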

In summary, through the above scheme the present invention collects a text corpus and processes the corpus entries in it to obtain a corpus in continuous phrase format, with the phrase as the unit; presents the corpus in continuous phrase format to a labeler for reading and acquires the EEG signals produced as the labeler reads each phrase; and takes the EEG signals corresponding to the collected phrases as the prediction target and trains term vectors, using the current phrase as the feature to predict the EEG signals of its context, thereby constructing a term vector representation model based on EEG signals. This improves the accuracy of term vector calculation.

Corresponding to the term vector calculation method based on EEG signals described above, the present invention also provides a term vector computing device based on EEG signals.

Referring to Fig. 4, Fig. 4 is a functional block diagram of the first embodiment of the term vector computing device based on EEG signals according to the present invention. As shown in Fig. 4, the first embodiment of the device includes a collection module 10, an acquisition module 20 and a construction module 30.

The collection module 10 is configured to collect a text corpus and process the corpus entries in it to obtain a corpus in continuous phrase format, with the phrase as the unit;

The acquisition module 20 is configured to present the corpus in continuous phrase format to a labeler for reading, and to acquire the EEG signals produced as the labeler reads each phrase.

The construction module 30 is configured to take the EEG signals corresponding to the collected phrases as the prediction target and train term vectors, using the current phrase as the feature to predict the EEG signals of its context, thereby constructing a term vector representation model based on EEG signals.

In this embodiment the context window may be three: taking the term vector representation of the current phrase as the feature, a neural network regression model predicts the EEG signals of the three preceding phrases and the three following phrases; the predicted EEG signals of the context are compared with the actual EEG signals to obtain a prediction error, and each error so produced is back-propagated to adjust the vector representation of the current phrase. This continues until the overall error falls below the preset error threshold, which can be set empirically to 10^-5.

Through the above scheme, in this embodiment the collection module 10 collects a text corpus and processes the corpus entries in it to obtain a corpus in continuous phrase format, with the phrase as the unit; the acquisition module 20 presents the corpus in continuous phrase format to a labeler for reading and acquires the EEG signals produced as the labeler reads each phrase; and the construction module 30 takes the EEG signals corresponding to the collected phrases as the prediction target and trains term vectors, using the current phrase as the feature to predict the EEG signals of its context, thereby constructing a term vector representation model based on EEG signals. This improves the accuracy of term vector calculation.

As the second embodiment of the present invention, referring to Fig. 5, Fig. 5 is a detailed functional block diagram of the collection module 10 in the term vector computing device based on EEG signals described with reference to Fig. 4. In this embodiment, the collection module 10 may include a collection unit 101, a pre-processing unit 102, a word segmentation unit 103 and a conversion unit 104.

The collection unit 101 is configured to collect a text corpus, the corpus entries in which are at the sentence or document level;

The pre-processing unit 102 is configured to remove from the text corpus the entries whose length exceeds a first preset value or is less than a second preset value, to obtain a pre-processed corpus, where the first preset value and the second preset value can be set empirically.

The word segmentation unit 103 is configured to perform word segmentation on the pre-processed corpus to obtain words;

The conversion unit 104 is configured to convert the words into phrases using chunk parsing, to obtain a corpus in continuous phrase format.

Specifically, the entries in the text corpus collected by the collection module 10 are usually sentences or articles. Because a sentence may be too long or too short, a range of acceptable sentence lengths can be preset empirically, and the entries whose length exceeds the first preset value or is less than the second preset value are removed to obtain the pre-processed corpus.

In this embodiment, the word segmentation unit 103 can first perform word segmentation on the pre-processed corpus to obtain words; the conversion unit 104 then uses chunk parsing to convert the words into phrases, yielding a corpus in continuous phrase format.

As the third embodiment of the present invention, referring to Fig. 6, Fig. 6 is a detailed functional block diagram of the construction module 30 in the term vector computing device based on EEG signals described with reference to Fig. 4. In this embodiment, the construction module 30 may include a noise reduction unit 301, a dimensionality reduction unit 302, an initialization unit 303 and a construction unit 304.

The noise reduction unit 301 is configured to perform noise reduction on the collected EEG signals to obtain denoised EEG signals;

In this embodiment, the noise reduction unit 301 uses the FastICA algorithm to decompose the EEG signals collected while the labeler reads the corpus presented in continuous phrase format into multiple independent components. Noise components are then identified using spectral features, higher-order crossing features or the like, and removed from the collected EEG signals, yielding denoised EEG signals with a high signal-to-noise ratio. In this embodiment the denoised EEG signals preferably have a signal-to-noise ratio higher than 15 dB.

The dimensionality reduction unit 302 is configured to perform spatial projection and dimensionality reduction on the denoised EEG signals;

Specifically, in this embodiment, the dimensionality reduction unit 302 uses the common spatial pattern (CSP) algorithm to project the denoised, high signal-to-noise-ratio EEG signals of the different channels according to their spatial positions and to reduce their dimension, yielding dimension-reduced EEG signals. In this embodiment the dimension-reduced EEG signals preferably have fewer than 300 dimensions.

The initialization unit 303 is configured to initialize all phrases in the pre-processed corpus as term vector representations;

The construction unit 304 is configured to traverse all phrases in the pre-processed corpus; to take the term vector representation of the current phrase as the feature and predict the EEG signals of its context with a neural network regression model; to compare the predicted EEG signals of the context with the actual EEG signals to obtain a prediction error; and to adjust the term vector representation of the current phrase according to the prediction error, where the actual EEG signals are the EEG signals recorded when the labeler read the context; this step is repeated until the prediction error is less than a preset threshold.

In summary, through the above scheme of the present invention, the collection module 10 collects a text corpus and processes the corpus entries in it to obtain a corpus in continuous phrase format, with the phrase as the unit; the acquisition module 20 presents the corpus in continuous phrase format to a labeler for reading and acquires the EEG signals produced as the labeler reads each phrase; and the construction module 30 takes the EEG signals corresponding to the collected phrases as the prediction target and trains term vectors, using the current phrase as the feature to predict the EEG signals of its context, thereby constructing a term vector representation model based on EEG signals. This improves the accuracy of term vector calculation.

The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A term vector calculation method based on EEG signals, characterized in that the method comprises the following steps:

Step S1: collecting a text corpus and processing the corpus material in the text corpus to obtain a corpus in continuous phrase format, with the phrase as the unit;

Step S2: presenting the corpus in continuous phrase format to a labeler for reading, and acquiring the EEG signals produced as the labeler reads each phrase;

Step S3: taking the acquired EEG signals corresponding to each phrase as the prediction target, training term vectors, using the term vector representation of the current phrase as the feature to predict the EEG signals of its context, and constructing a term vector representation model based on EEG signals.

2. The term vector calculation method based on EEG signals according to claim 1, characterized in that step S1 comprises the following sub-steps:

Step S12: removing from the text corpus the corpus material whose length exceeds a first preset value or is less than a second preset value, to obtain a preprocessed corpus;

3. The term vector calculation method based on EEG signals according to claim 2, characterized in that step S3 comprises the following sub-steps:

Step S34: traversing all phrases in the preprocessed corpus and, for each phrase, using the term vector representation of the current phrase as the feature and predicting the EEG signals of its context with a neural network regression model; comparing the predicted EEG signals of the context with the actual EEG signals to obtain a prediction error; and adjusting the term vector representation of the current phrase according to the prediction error, wherein the actual EEG signals are the EEG signals recorded while the labeler reads the context; and repeating this step until the prediction error is less than a preset threshold.

4. The term vector calculation method based on EEG signals according to claim 3, characterized in that step S31 comprises:

Step S32 comprises:

performing spatial projection and dimensionality reduction, using the common spatial pattern algorithm, on the EEG signals whose signal-to-noise ratio is higher than the third preset value, to obtain EEG signals whose dimension is lower than a fourth preset value.

5. The term vector calculation method based on EEG signals according to claim 3, characterized in that the noise reduction processing of the acquired EEG signals uses the FastICA algorithm.

6. A term vector computing device based on EEG signals, characterized in that the device comprises:

a collection module, configured to collect a text corpus and process the corpus material in the text corpus to obtain a corpus in continuous phrase format, with the phrase as the unit;

an acquisition module, configured to present the corpus in continuous phrase format to a labeler for reading, and to acquire the EEG signals produced as the labeler reads each phrase;

a construction module, configured to take the acquired EEG signals corresponding to each phrase as the prediction target, train term vectors, use the term vector representation of the current phrase as the feature to predict the EEG signals of its context, and construct a term vector representation model based on EEG signals.

7. The term vector computing device based on EEG signals according to claim 6, characterized in that the collection module comprises:

a preprocessing unit, configured to remove from the text corpus the corpus material whose length exceeds a first preset value or is less than a second preset value, to obtain a preprocessed corpus;

a conversion unit, configured to convert the words into phrases using a chunk parsing technique, to obtain a corpus in continuous phrase format.

8. The term vector computing device based on EEG signals according to claim 7, characterized in that the construction module comprises:

a construction unit, configured to traverse all phrases in the preprocessed corpus and, for each phrase, to use the term vector representation of the current phrase as the feature and predict the EEG signals of its context with a neural network regression model; to compare the predicted EEG signals of the context with the actual EEG signals to obtain a prediction error; and to adjust the term vector representation of the current phrase according to the prediction error, wherein the actual EEG signals are the EEG signals recorded while the labeler reads the context; this step being repeated until the prediction error is less than a preset threshold.

9. The term vector computing device based on EEG signals according to claim 8, characterized in that:

the noise reduction unit is further configured to process the acquired EEG signals to obtain EEG signals whose signal-to-noise ratio is higher than a third preset value;

the dimensionality reduction unit is further configured to perform spatial projection and dimensionality reduction, using the common spatial pattern algorithm, on the EEG signals whose signal-to-noise ratio is higher than the third preset value, to obtain EEG signals whose dimension is lower than a fourth preset value.

10. The term vector computing device based on EEG signals according to claim 8, characterized in that the noise reduction unit is further configured to perform noise reduction processing on the acquired EEG signals using the FastICA algorithm.