CN106502394A - Term vector computational methods and device based on EEG signals - Google Patents

Publication number
CN106502394A
Authority
CN
China
Prior art keywords
eeg signals
phrase
term vector
language material
collecting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610907518.2A
Other languages
Chinese (zh)
Other versions
CN106502394B (en
Inventor
徐睿峰
杜嘉晨
桂林
黄锦辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Graduate School Harbin Institute of Technology filed Critical Shenzhen Graduate School Harbin Institute of Technology
Priority to CN201610907518.2A priority Critical patent/CN106502394B/en
Publication of CN106502394A publication Critical patent/CN106502394A/en
Application granted granted Critical
Publication of CN106502394B publication Critical patent/CN106502394B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/015 Input arrangements based on nervous system activity detection, e.g. brain waves [EEG] detection, electromyograms [EMG] detection, electrodermal response detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2203/00 Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/01 Indexing scheme relating to G06F3/01
    • G06F2203/011 Emotion or mood input determined on the basis of sensed human body parameters such as pulse, heart rate or beat, temperature of skin, facial expressions, iris, voice pitch, brain activity patterns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/02 Preprocessing

Abstract

The invention provides a word vector computation method and device based on EEG signals. The method includes: Step S1, collecting a text corpus and processing the text in it to obtain text in continuous phrase form, with the phrase as the unit; Step S2, presenting the text in continuous phrase form to an annotator to read, and recording the annotator's EEG signals while each phrase is read; Step S3, taking the EEG signals recorded for the phrases as the prediction target and training word vectors, with the current phrase as the feature from which the EEG signals of its context are predicted, thereby building an EEG-based word vector representation model. This scheme improves the accuracy of word vector computation.

Description

Word vector computation method and device based on EEG signals
Technical field
The invention belongs to the technical field of natural language processing, and in particular relates to a word vector computation method and device based on EEG signals.
Background
In natural language processing tasks, word vectors are commonly used to represent the words of the original text so that numerical machine learning algorithms can be applied to text data. The basic idea of a word vector model is to map, through extensive training, each word of a language to a vector of fixed length. This length is generally much smaller than the size of the language's vocabulary, typically tens to a few hundred dimensions. Together these vectors form a word vector space in which each vector is a point. Once a distance measure is introduced on this space, the syntactic and semantic similarity between words can be judged from the distance between their vectors. Traditional word vector methods optimize the representation by using the vector of the current text to predict the vectors of its context as accurately as possible.
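The idea of judging similarity from a distance measure on the vector space can be illustrated with a minimal sketch. Cosine similarity is used here as one common choice of measure; the patent does not name a specific one, and the three-dimensional vectors below are hand-picked hypothetical values, not trained word vectors.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 3-dimensional word vectors, chosen by hand for illustration.
vectors = {
    "Beijing": [0.9, 0.1, 0.3],
    "capital": [0.8, 0.2, 0.4],
    "love": [0.1, 0.9, 0.2],
}

near = cosine_similarity(vectors["Beijing"], vectors["capital"])
far = cosine_similarity(vectors["Beijing"], vectors["love"])
print(near > far)
```

With these toy values, "Beijing" lies closer to "capital" than to "love", which is the kind of relation a trained word vector space is meant to encode.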
In traditional word vector computation, predicting the context from the current text is the primary training objective. This approach has three main drawbacks:
1. It considers only the syntactic-level attributes of words and not their semantic-level attributes, so the resulting word vectors generally capture only fairly shallow relations between words;
2. It does not model the human language cognition process and ignores important features from cognitive neuroscience and psychology;
3. Because human language cognition is complex, word vectors obtained by simply predicting context cannot reflect the characteristics of different natural language processing tasks and generalize poorly.
Summary of the invention
The object of the invention is to provide a word vector computation method and device based on EEG signals, with the aim of improving the accuracy of word vector computation.
The invention is realized as a word vector computation method based on EEG signals, comprising the following steps:
Step S1, collecting a text corpus and processing the text in it to obtain text in continuous phrase form, with the phrase as the unit;
Step S2, presenting the text in continuous phrase form to an annotator to read, and recording the annotator's EEG signals while each phrase is read;
Step S3, taking the EEG signals recorded for the phrases as the prediction target and training word vectors, with the word vector of the current phrase as the feature from which the EEG signals of its context are predicted, thereby building an EEG-based word vector representation model.
In a further technical scheme of the invention, step S1 includes the following sub-steps:
Step S11, collecting a text corpus in which the text is at sentence or document level;
Step S12, removing from the text corpus any text whose length is greater than a first preset value or less than a second preset value, to obtain preprocessed text;
Step S13, performing word segmentation on the preprocessed text to obtain words;
Step S14, converting the words into phrases by chunking, to obtain text in continuous phrase form.
In a further technical scheme of the invention, step S3 includes the following sub-steps:
Step S31, denoising the recorded EEG signals to obtain denoised EEG signals;
Step S32, applying spatial projection and dimensionality reduction to the denoised EEG signals;
Step S33, initializing every phrase in the preprocessed text as a word vector;
Step S34, traversing all phrases in the preprocessed text: with the word vector of the current phrase as the feature, a neural network regression model predicts the EEG signals of its context; the predicted EEG signals of the context are compared with the actual EEG signals to obtain a prediction error, and the word vector of the current phrase is adjusted according to that error, where the actual EEG signals are those recorded while the annotator read the context. This step is repeated until the prediction error is less than a preset threshold.
In a further technical scheme of the invention, step S31 includes:
processing the recorded EEG signals to obtain EEG signals whose signal-to-noise ratio is higher than a third preset value;
Step S32 includes:
applying the common spatial pattern (CSP) algorithm to the EEG signals whose signal-to-noise ratio is higher than the third preset value, for spatial projection and dimensionality reduction, to obtain EEG signals whose dimension is less than a fourth preset value.
In a further technical scheme of the invention, the recorded EEG signals are denoised using the FastICA algorithm.
The invention also provides a word vector computation device based on EEG signals, the device including:
a collection module, for collecting a text corpus and processing the text in it to obtain text in continuous phrase form, with the phrase as the unit;
an acquisition module, for presenting the text in continuous phrase form to an annotator to read, and recording the annotator's EEG signals while each phrase is read;
a building module, for taking the EEG signals recorded for the phrases as the prediction target and training word vectors, with the word vector of the current phrase as the feature from which the EEG signals of its context are predicted, thereby building an EEG-based word vector representation model.
In a further technical scheme of the invention, the collection module includes:
a collector unit, for collecting a text corpus in which the text is at sentence or document level;
a preprocessing unit, for removing from the text corpus any text whose length is greater than a first preset value or less than a second preset value, to obtain preprocessed text;
a word segmentation unit, for performing word segmentation on the preprocessed text to obtain words;
a conversion unit, for converting the words into phrases by chunking, to obtain text in continuous phrase form.
In a further technical scheme of the invention, the building module includes:
a noise reduction unit, for denoising the recorded EEG signals to obtain denoised EEG signals;
a dimensionality reduction unit, for applying spatial projection and dimensionality reduction to the denoised EEG signals;
an initialization unit, for initializing every phrase in the preprocessed text as a word vector;
a construction unit, for traversing all phrases in the preprocessed text: with the word vector of the current phrase as the feature, a neural network regression model predicts the EEG signals of its context; the predicted EEG signals of the context are compared with the actual EEG signals to obtain a prediction error, and the word vector of the current phrase is adjusted according to that error, where the actual EEG signals are those recorded while the annotator read the context. This step is repeated until the prediction error is less than a preset threshold.
In a further technical scheme of the invention, the noise reduction unit is further used to process the recorded EEG signals to obtain EEG signals whose signal-to-noise ratio is higher than a third preset value;
the dimensionality reduction unit is further used to apply the common spatial pattern (CSP) algorithm to the EEG signals whose signal-to-noise ratio is higher than the third preset value, for spatial projection and dimensionality reduction, to obtain EEG signals whose dimension is less than a fourth preset value.
In a further technical scheme of the invention, the noise reduction unit is further used to denoise the recorded EEG signals using the FastICA algorithm.
The beneficial effects of the invention are as follows. With the word vector computation method and device based on EEG signals provided by the invention, a text corpus is collected and the text in it is processed into continuous phrase form, with the phrase as the unit; the text in continuous phrase form is presented to an annotator to read, and the annotator's EEG signals are recorded while each phrase is read; the EEG signals recorded for the phrases are taken as the prediction target and word vectors are trained, with the current phrase as the feature from which the EEG signals of its context are predicted, building an EEG-based word vector representation model. This improves the accuracy of word vector computation.
Description of the drawings
Fig. 1 is a flow chart of the first embodiment of the word vector computation method based on EEG signals of the invention;
Fig. 2 is a detailed flow chart of step S1 in the second embodiment of the word vector computation method based on EEG signals of the invention;
Fig. 3 is a detailed flow chart of step S3 in the third embodiment of the word vector computation method based on EEG signals of the invention;
Fig. 4 is a functional block diagram of the first embodiment of the word vector computation device based on EEG signals of the invention;
Fig. 5 is a detailed functional block diagram of the collection module in the second embodiment of the word vector computation device based on EEG signals of the invention;
Fig. 6 is a detailed functional block diagram of the building module in the third embodiment of the word vector computation device based on EEG signals of the invention.
Reference numerals:
Collection module - 10: collector unit - 101; preprocessing unit - 102; word segmentation unit - 103; conversion unit - 104;
Acquisition module - 20;
Building module - 30: noise reduction unit - 301; dimensionality reduction unit - 302; initialization unit - 303; construction unit - 304.
Detailed description of the embodiments
The solution of the embodiments of the invention is mainly as follows: a text corpus is collected and the text in it is processed into continuous phrase form, with the phrase as the unit; the text in continuous phrase form is presented to an annotator to read, and the annotator's EEG signals are recorded while each phrase is read; the EEG signals recorded for the phrases are taken as the prediction target and word vectors are trained, with the current phrase as the feature from which the EEG signals of its context are predicted, building an EEG-based word vector representation model.
Referring to Fig. 1, which is a flow chart of the first embodiment of the word vector computation method based on EEG signals of the invention, the first embodiment comprises the following steps:
Step S1, collecting a text corpus and processing the text in it to obtain text in continuous phrase form, with the phrase as the unit;
Specifically, corpus text is linguistic data that has actually occurred in real language use. It is usually stored in a corpus, which is a database of such text carried on electronic computers; raw text generally needs to be analyzed and processed before it becomes a useful resource.
At present, Chinese corpora mainly include the general Modern Chinese corpus, the People's Daily annotated corpus, Modern Chinese corpora for language teaching and research, and Modern Chinese corpora for speech signal analysis. When text is needed, it can be taken directly from these existing corpora. The invention can of course also be implemented with text obtained from other sources, such as text from Internet web pages.
Because word vectors are trained with phrases as the training data, whereas the text in a corpus usually consists of sentences or articles, the text must be processed into continuous phrase form with the phrase as the unit. For example, the sentence "I love Beijing, and Beijing is the capital of China" is processed into the phrase sequence "I/love/Beijing/Beijing/is/China/'s/capital".
Step S2, presenting the text in continuous phrase form to an annotator to read, and recording the annotator's EEG signals while each phrase is read; here the annotator is the user who reads the text presented in continuous phrase form.
Specifically, the invention represents word vectors by means of EEG signals. While reading the text presented in continuous phrase form, the annotator wears an EEG acquisition device so that the EEG signals for each phrase can be recorded. Once recorded, each EEG signal is stored paired with its corresponding phrase.
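The paired storage of phrases and EEG signals described above could be organized as in the following minimal sketch. The `PhraseRecording` type, the `pair_recordings` helper and the channels-by-samples layout are all assumptions made for illustration; the patent does not specify a storage format.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PhraseRecording:
    """One phrase together with the EEG segment recorded while it was read."""
    phrase: str
    eeg: List[List[float]]  # assumed layout: channels x samples

def pair_recordings(phrases, eeg_segments):
    """Pair each phrase with its EEG segment, in reading order."""
    if len(phrases) != len(eeg_segments):
        raise ValueError("each phrase needs exactly one EEG segment")
    return [PhraseRecording(p, s) for p, s in zip(phrases, eeg_segments)]

# Hypothetical data: two phrases, one channel, two samples each.
records = pair_recordings(["I", "love"], [[[0.1, 0.2]], [[0.3, 0.4]]])
print(records[1].phrase, records[1].eeg)
```

Keeping the pairs in reading order matters for the later training step, which looks up the EEG signals of neighboring phrases.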
Step S3, taking the EEG signals recorded for the phrases as the prediction target and training word vectors, with the word vector of the current phrase as the feature from which the EEG signals of its context are predicted, thereby building an EEG-based word vector representation model.
Specifically, every phrase in the preprocessed text can first be initialized as a word vector. All phrases in the preprocessed text are then traversed: with the word vector of the current phrase as the feature, a neural network regression model predicts the EEG signals of its context; the predicted EEG signals of the context are compared with the actual EEG signals to obtain a prediction error, and the word vector of the current phrase is adjusted according to that error, where the actual EEG signals are those recorded while the annotator read the context. This step is repeated until the prediction error is less than a preset threshold.
In this embodiment, the context window can be three: with the word vector of the current phrase as the feature, the neural network regression model predicts the EEG signals of the three preceding and the three following phrases. The predicted EEG signals of the context are compared with the actual EEG signals to obtain a prediction error, and each error is back-propagated to adjust the vector representation of the current phrase.
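The context window can be made concrete with a small helper that lists the positions whose EEG signals are to be predicted. The function name `context_indices` and the choice to clip the window at the ends of the sequence are assumptions for illustration; the patent does not say how boundary phrases are handled.

```python
def context_indices(position, length, window=3):
    """Positions of up to `window` phrases before and after `position`."""
    start = max(0, position - window)
    end = min(length, position + window + 1)
    return [i for i in range(start, end) if i != position]

# For an 8-phrase sequence, the phrase at index 4 has six context phrases,
# while the first phrase only has the three that follow it.
print(context_indices(4, 8))
print(context_indices(0, 8))
```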
In this embodiment, a text corpus is collected and the text in it is processed into continuous phrase form, with the phrase as the unit; the text in continuous phrase form is presented to an annotator to read, and the annotator's EEG signals are recorded while each phrase is read; the EEG signals recorded for the phrases are taken as the prediction target and word vectors are trained, with the current phrase as the feature from which the EEG signals of its context are predicted, building an EEG-based word vector representation model. This improves the accuracy of word vector computation.
As the second embodiment of the invention, refer to Fig. 2, which is a detailed flow chart of step S1 in the word vector computation method based on EEG signals described with reference to Fig. 1. Step S1, collecting a text corpus and processing the text in it to obtain text in continuous phrase form with the phrase as the unit, can include:
Step S11, collecting a text corpus in which the text is at sentence or document level;
Step S12, removing from the text corpus any text whose length is greater than a first preset value or less than a second preset value, to obtain preprocessed text;
Step S13, performing word segmentation on the preprocessed text to obtain words;
Step S14, converting the words into phrases by chunking, to obtain text in continuous phrase form.
Specifically, the text in the collected corpus is usually sentences or articles. Because a sentence may be too long or too short, a range of sentence lengths can be preset from experience, and any text whose length is greater than the first preset value or less than the second preset value is removed from the corpus to obtain the preprocessed text. The first and second preset values can be set empirically.
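The length filter of step S12 can be sketched in a few lines. Measuring length in characters and the parameter names `min_len` (the second preset value) and `max_len` (the first preset value) are assumptions for illustration; the patent leaves both the unit of length and the values to experience.

```python
def filter_by_length(sentences, min_len, max_len):
    """Keep sentences whose length lies within [min_len, max_len]."""
    return [s for s in sentences if min_len <= len(s) <= max_len]

# Hypothetical corpus and limits, chosen only to show the behavior.
corpus = ["a", "abcd", "abcdefghij"]
print(filter_by_length(corpus, min_len=2, max_len=5))
```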
Because word vectors are trained with phrases as the training data, whereas the text in a corpus usually consists of sentences or articles, the text must be processed into continuous phrase form with the phrase as the unit. For example, the sentence "I love Beijing, and Beijing is the capital of China" is processed into the phrase sequence "I/love/Beijing/Beijing/is/China/'s/capital".
In this embodiment, the preprocessed text can first be segmented into words, and the words then converted into phrases by chunking, to obtain text in continuous phrase form.
Word segmentation depends mainly on the segmentation dictionary, whose quality directly determines the quality of the segmentation. The segmentation dictionaries in common use today are built from the Xinhua Dictionary or similar published reference works; in this embodiment, other segmentation dictionaries may also be used.
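To show how a segmentation dictionary drives the result, here is a minimal sketch of forward maximum matching, one common dictionary-based technique. The patent does not say which segmentation algorithm is used, so this is an illustrative choice only. The toy dictionary is hypothetical, and the Chinese sentence is an assumed back-translation of the example sentence with its punctuation already stripped.

```python
def forward_max_match(text, dictionary, max_word_len=4):
    """Greedy left-to-right segmentation: take the longest dictionary word
    starting at the current position, falling back to a single character."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words

# Hypothetical toy dictionary; a real one would hold many thousands of entries.
toy_dictionary = {"我", "爱", "北京", "是", "中国", "的", "首都"}
sentence = "我爱北京北京是中国的首都"  # assumed original of the example, no punctuation
segmented = forward_max_match(sentence, toy_dictionary)
print("/".join(segmented))
```

A word missing from the dictionary is split into single characters by this method, which is why the quality of the dictionary directly limits the quality of the segmentation.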
Chunking is a technique commonly used in shallow syntactic analysis. It decomposes a sentence into components according to a predetermined model; these components are mainly phrases and longer phrases, so that the computer's understanding of a sentence rises from the level of single characters and words to phrases, which carry more information and are closer to natural language.
As the third embodiment of the invention, refer to Fig. 3, which is a detailed flow chart of step S3 in the word vector computation method based on EEG signals described with reference to Fig. 1. Step S3, taking the EEG signals recorded for the phrases as the prediction target and training word vectors, with the current phrase as the feature from which the EEG signals of its context are predicted, to build an EEG-based word vector representation model, can include:
Step S31, denoising the recorded EEG signals to obtain denoised EEG signals;
While the EEG signals are being recorded as the annotator reads the text presented in continuous phrase form, they are easily affected by factors such as equipment noise, electromyographic (EMG) signals and electro-oculographic (EOG) signals. The recorded EEG signals therefore need to be denoised to obtain EEG signals with a high signal-to-noise ratio.
The signal-to-noise ratio (SNR, or S/N) is the ratio of signal to noise in an electronic device or system. Here the signal is the electronic signal from outside the device that the device is to process, and the noise is the random extra signal (or information) that is absent from the original signal and produced as it passes through the device; this noise does not change with the original signal. SNR is measured in dB and computed as 10 lg(PS/PN), where PS and PN are the effective power of the signal and of the noise respectively. The higher the SNR, the less noise.
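The SNR formula above, 10 lg(PS/PN), translates directly into code. The function name `snr_db` is chosen here for illustration, and the power values are hypothetical.

```python
import math

def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels: 10 * log10(PS / PN)."""
    return 10.0 * math.log10(signal_power / noise_power)

# A signal 100 times more powerful than the noise has an SNR of 20 dB.
print(snr_db(100.0, 1.0))
```

By the same formula, an SNR of 15 dB corresponds to a signal power roughly 31.6 times the noise power.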
In this embodiment, the FastICA algorithm projects the EEG signals recorded while the annotator reads the text presented in continuous phrase form into multiple independent components. Noise components are then identified using spectral features, higher-order cross features or similar, and removed from the recorded EEG signals, yielding denoised EEG signals with a high SNR. In this embodiment the denoised EEG signals preferably have an SNR above 15 dB.
Independent component analysis (ICA) is a very effective data analysis tool, used mainly to extract the original independent signals from mixed data. It has received wide attention as an effective method of signal separation. Among the many ICA algorithms, the fixed-point algorithm (FastICA) is widely used in signal processing for its fast convergence and good separation. The algorithm can estimate well, from the observed signals, the mutually statistically independent source signals that were mixed by unknown factors.
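The removal step that follows the decomposition can be sketched as below. The sketch assumes that an ICA routine such as FastICA has already produced a mixing matrix (channels by components) and the source components (components by samples), and that the noise components have already been flagged; the decomposition and the noise detection themselves are not shown. The function name and the toy matrices are hypothetical.

```python
def reconstruct_without(mixing, sources, noise_components):
    """Rebuild channel signals as mixing @ sources, leaving out flagged components."""
    kept = [k for k in range(len(sources)) if k not in noise_components]
    n_samples = len(sources[0])
    return [
        [sum(row[k] * sources[k][t] for k in kept) for t in range(n_samples)]
        for row in mixing
    ]

# Toy example: 2 channels, 2 components, 2 samples.
# Component 1 is treated as an artifact (for instance an eye blink).
mixing = [[1.0, 0.5], [0.0, 2.0]]
sources = [[1.0, 2.0], [10.0, 10.0]]
clean = reconstruct_without(mixing, sources, noise_components={1})
print(clean)
```

Leaving out a component in this way is equivalent to zeroing its row in the source matrix before mixing the components back into channel space.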
Step S32, applying spatial projection and dimensionality reduction to the denoised EEG signals;
Specifically, in this embodiment the common spatial pattern (CSP) algorithm projects the high-SNR denoised EEG signals of the different channels according to their spatial positions and reduces their dimensionality, yielding dimension-reduced EEG signals. In this embodiment the dimension-reduced EEG signals preferably have fewer than 300 dimensions.
Step S33, initializing every phrase in the preprocessed text as a word vector;
Step S34, traversing all phrases in the preprocessed text: with the word vector of the current phrase as the feature, a neural network regression model predicts the EEG signals of its context; the predicted EEG signals of the context are compared with the actual EEG signals to obtain a prediction error, and the word vector of the current phrase is adjusted according to that error, where the actual EEG signals are those recorded while the annotator read the context. This step is repeated until the overall prediction error is less than a preset threshold.
In this embodiment, the context window can be three: with the word vector of the current phrase as the feature, the neural network regression model predicts the EEG signals of the three preceding and the three following phrases. The predicted EEG signals of the context are compared with the actual EEG signals to obtain a prediction error, and each error is back-propagated to adjust the vector representation of the current phrase. The preset error threshold can be set empirically to 10^-5.
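Steps S33 and S34 can be sketched as a small training loop. This is a simplified illustration under stated assumptions, not the patented implementation: a single linear layer stands in for the neural network regression model, the EEG features are hypothetical two-dimensional values rather than CSP-reduced recordings, and the function name `train`, the learning rate and the epoch cap are all chosen here for illustration.

```python
import random

def train(phrases, eeg, dim=4, window=3, lr=0.05,
          threshold=1e-5, max_epochs=200, seed=0):
    """Learn one vector per phrase by predicting the EEG features of its context."""
    rng = random.Random(seed)
    m = len(eeg[0])  # size of each EEG feature vector
    # Step S33: random initial word vector for every distinct phrase.
    emb = {p: [rng.uniform(-0.5, 0.5) for _ in range(dim)]
           for p in sorted(set(phrases))}
    # Shared linear map (m x dim), a stand-in for the regression network.
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(m)]
    history = []
    for _ in range(max_epochs):
        total, count = 0.0, 0
        # Step S34: traverse all phrases and predict each context position.
        for i, phrase in enumerate(phrases):
            lo, hi = max(0, i - window), min(len(phrases), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                e = emb[phrase]
                pred = [sum(w * x for w, x in zip(row, e)) for row in weights]
                err = [p - t for p, t in zip(pred, eeg[j])]
                total += sum(d * d for d in err) / m
                count += 1
                # Gradient for the word vector, taken before the weights change.
                grad_e = [sum(weights[r][c] * err[r] for r in range(m))
                          for c in range(dim)]
                for r in range(m):
                    for c in range(dim):
                        weights[r][c] -= lr * err[r] * e[c]
                for c in range(dim):
                    e[c] -= lr * grad_e[c]
        history.append(total / count)  # mean squared error over this pass
        if history[-1] < threshold:
            break
    return emb, history

# Hypothetical toy data: six phrases with made-up 2-dimensional EEG features.
phrases = ["I", "love", "Beijing", "Beijing", "is", "capital"]
eeg = [[0.2, 0.1], [0.4, 0.3], [0.9, 0.8], [0.8, 0.9], [0.5, 0.4], [0.7, 0.6]]
embeddings, history = train(phrases, eeg)
print(len(history), history[-1] < history[0])
```

On toy data like this the error falls but is unlikely to reach a threshold as small as 10^-5, since one vector must predict several different context signals; the epoch cap is there so the loop still terminates.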
In summary, with the above scheme the invention collects a text corpus and processes the text in it into continuous phrase form, with the phrase as the unit; presents the text in continuous phrase form to an annotator to read, and records the annotator's EEG signals while each phrase is read; and takes the EEG signals recorded for the phrases as the prediction target and trains word vectors, with the current phrase as the feature from which the EEG signals of its context are predicted, building an EEG-based word vector representation model. This improves the accuracy of word vector computation.
Corresponding to the above word vector computation method based on EEG signals, the invention also provides a word vector computation device based on EEG signals.
Referring to Fig. 4, which is a functional block diagram of the first embodiment of the word vector computation device based on EEG signals of the invention, the first embodiment of the device includes a collection module 10, an acquisition module 20 and a building module 30.
The collection module 10 is used to collect a text corpus and process the text in it to obtain text in continuous phrase form, with the phrase as the unit;
Specifically, corpus text is linguistic data that has actually occurred in real language use. It is usually stored in a corpus, which is a database of such text carried on electronic computers; raw text generally needs to be analyzed and processed before it becomes a useful resource.
At present, Chinese corpora mainly include the general Modern Chinese corpus, the People's Daily annotated corpus, Modern Chinese corpora for language teaching and research, and Modern Chinese corpora for speech signal analysis. When text is needed, it can be taken directly from these existing corpora. The invention can of course also be implemented with text obtained from other sources, such as text from Internet web pages.
Because word vectors are trained with phrases as the training data, whereas the text in a corpus usually consists of sentences or articles, the text must be processed into continuous phrase form with the phrase as the unit. For example, the sentence "I love Beijing, and Beijing is the capital of China" is processed into the phrase sequence "I/love/Beijing/Beijing/is/China/'s/capital".
The acquisition module 20 is used to present the text in continuous phrase form to an annotator to read, and to record the annotator's EEG signals while each phrase is read.
Specifically, the invention represents word vectors by means of EEG signals. While reading the text presented in continuous phrase form, the annotator wears an EEG acquisition device so that the EEG signals for each phrase can be recorded. Once recorded, each EEG signal is stored paired with its corresponding phrase.
Building module 30 is used for taking the EEG signals corresponding to the collected phrases as the prediction target and training term vectors: the current phrase is used as the feature to predict the EEG signals of its context, thereby building a term vector representation model based on EEG signals.
In the present embodiment, the context window may be three. The term vector representation of the current phrase is used as the feature, and a neural network regression model predicts the EEG signals of the three preceding phrases and the three following phrases. The predicted context EEG signals are compared with the actual EEG signals to obtain the prediction error. Each error is back-propagated to adjust the vector representation of the current phrase until the overall error falls below a preset error threshold, which can be set empirically, for example to 10^-5.
Through the above scheme, the present embodiment works as follows: collection module 10 collects a text corpus and processes the corpus text in it to obtain corpus text in continuous phrase form, with the phrase as the unit; acquisition module 20 presents the corpus text in continuous phrase form to an annotator for reading and collects the EEG signals produced while the annotator reads each phrase; the EEG signals corresponding to the collected phrases are taken as the prediction target and term vectors are trained, with the current phrase used as the feature to predict the EEG signals of its context, so that a term vector representation model based on EEG signals is built. This improves the accuracy of term vector calculation.
As the second embodiment of the present invention, refer to Fig. 5, which is a detailed functional block diagram of collection module 10 in the EEG-based term vector computing device described with reference to Fig. 4. In the present embodiment, collection module 10 may include: collecting unit 101, preprocessing unit 102, word segmentation unit 103 and conversion unit 104.
Collecting unit 101 is used for collecting a text corpus, in which the corpus text is at sentence level or document level;
Preprocessing unit 102 is used for removing from the text corpus any corpus text whose length is greater than a first preset value or less than a second preset value, to obtain the preprocessed corpus text; the first preset value and the second preset value can be set empirically.
Word segmentation unit 103 is used for performing word segmentation on the preprocessed corpus text to obtain words;
Conversion unit 104 is used for converting the words into phrases using chunk parsing, to obtain corpus text in continuous phrase form.
Specifically, the corpus text in the text corpus collected by collection module 10 is typically sentences or articles. Because a sentence may be too long or too short, a range of acceptable sentence lengths can be preset empirically: corpus text whose length is greater than the first preset value or less than the second preset value is removed from the corpus, which yields the preprocessed corpus text.
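The length filter described above amounts to a single pass over the corpus. The following is a minimal sketch; the function name, the parameter names and the choice of measuring length in characters are assumptions made for illustration.

```python
from typing import Iterable, List


def filter_by_length(
    sentences: Iterable[str], first_preset: int, second_preset: int
) -> List[str]:
    """Drop sentences longer than first_preset or shorter than second_preset.

    Length is measured in characters here, which is an assumption;
    the text does not say which unit is used.
    """
    return [s for s in sentences if second_preset <= len(s) <= first_preset]
```

For example, with first_preset=10 and second_preset=3, a 2-character sentence and a 12-character sentence are both removed, while a 6-character sentence is kept.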
Term vectors are trained on phrases, whereas the text in a corpus usually takes the form of sentences or articles. The corpus text therefore has to be processed into a sequence of consecutive phrases. For example, the sentence "I love Beijing, and Beijing is the capital of China" is processed into the phrase sequence "I / love / Beijing / Beijing / is / China / [possessive particle] / capital".
In the present embodiment, word segmentation unit 103 may first perform word segmentation on the preprocessed corpus text to obtain words; conversion unit 104 then converts the words into phrases using chunk parsing, to obtain corpus text in continuous phrase form.
Word segmentation depends mainly on a segmentation dictionary, and the quality of the dictionary directly determines the quality of the segmentation. The segmentation dictionaries in common use at present are built from the Xinhua Dictionary or other similar published reference books. In the present embodiment, other segmentation dictionaries may also be used.
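The text does not say which dictionary-based segmentation algorithm is used. As one common possibility, the sketch below uses forward maximum matching; the function name, the tiny dictionary and the Chinese rendering of the example sentence (with punctuation omitted) are all assumptions made for illustration.

```python
from typing import List, Set


def forward_max_match(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Greedy dictionary segmentation: at each position take the longest
    dictionary entry that matches, falling back to a single character."""
    words: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words
```

Called as forward_max_match("我爱北京北京是中国的首都", {"北京", "中国", "首都"}), it splits the sentence into eight units that line up with the phrase sequence in the example above. The result depends entirely on the dictionary contents, which is why dictionary quality matters so much.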
Chunk parsing is a technique commonly used in shallow syntactic analysis. It decomposes a sentence into components according to a predetermined model; these components are mainly word groups and longer phrases. The computer's understanding of a sentence is thereby raised from the level of single characters and words to phrases, which carry more information and are closer to natural language.
As the third embodiment of the present invention, refer to Fig. 6, which is a detailed functional block diagram of building module 30 in the EEG-based term vector computing device described with reference to Fig. 4. In the present embodiment, building module 30 may include: noise reduction unit 301, dimensionality reduction unit 302, initialization unit 303 and construction unit 304.
Noise reduction unit 301 is used for performing noise reduction on the collected EEG signals to obtain denoised EEG signals;
While EEG signals are collected as the annotator reads the corpus text presented in continuous phrase form, they are easily affected by factors such as device noise, electromyographic (EMG) signals and electrooculographic (EOG) signals. The EEG signals recorded during reading therefore need to be denoised to obtain EEG signals with a high signal-to-noise ratio.
The signal-to-noise ratio, abbreviated SNR or S/N, is the ratio of signal to noise in an electronic device or electronic system. Here, the signal is the electronic signal from outside the device that the device is required to process, and the noise is any irregular additional signal (or information) that is not present in the original signal and is produced as the signal passes through the device; this noise does not change as the original signal changes. The SNR is measured in dB and computed as 10 lg(PS/PN), where PS and PN are the effective power of the signal and of the noise respectively. The higher the SNR, the smaller the noise.
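The decibel formula above can be checked with a few lines of code. This is a sketch under the assumption that the signal and the noise are available as separate sample sequences and that power is estimated as the mean of the squared samples; the function name snr_db is hypothetical.

```python
import math
from typing import Sequence


def snr_db(signal: Sequence[float], noise: Sequence[float]) -> float:
    """Return 10 * log10(PS / PN), with power taken as the mean square."""
    ps = sum(x * x for x in signal) / len(signal)
    pn = sum(x * x for x in noise) / len(noise)
    return 10.0 * math.log10(ps / pn)
```

A signal with ten times the amplitude of the noise has one hundred times its power, which corresponds to about 20 dB.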
In the present embodiment, noise reduction unit 301 uses the FastICA algorithm to project the EEG signals collected while the annotator reads the corpus text presented in continuous phrase form onto multiple independent components, then identifies noise using spectral features, higher-order crossing features or the like, and then removes the noise components from the collected EEG signals to obtain denoised EEG signals with a high signal-to-noise ratio. In the present embodiment, the denoised EEG signals preferably have a signal-to-noise ratio higher than 15 dB.
Independent component analysis (ICA) is a very effective data analysis tool, used mainly to extract the original independent signals from mixed data. It has attracted wide attention as an effective method of signal separation. Among the many ICA algorithms, the fixed-point algorithm (FastICA) is widely used in signal processing because it converges quickly and separates well. It can estimate, from the observed signals, the mutually statistically independent original signals that were mixed by unknown factors.
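To make the decompose, identify and remove sequence concrete, the following is a small NumPy sketch of symmetric FastICA with a tanh nonlinearity, followed by removal of spiky components. It is not the implementation used in the patent: the use of excess kurtosis to flag noise components (in place of the spectral or higher-order crossing features mentioned above), the threshold of 5.0 and all function names are assumptions.

```python
import numpy as np


def fastica(X, n_iter=200, tol=1e-6, seed=0):
    """Symmetric FastICA with a tanh nonlinearity.

    X has shape (channels, samples) and is assumed to have full rank.
    Returns (S, unmix) with S = unmix @ (X - X.mean(axis=1, keepdims=True)).
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=1, keepdims=True)
    # Whiten: decorrelate the channels and scale them to unit variance.
    d, E = np.linalg.eigh(Xc @ Xc.T / Xc.shape[1])
    K = (E / np.sqrt(d)).T
    Z = K @ Xc
    n = Z.shape[0]
    W = np.linalg.qr(rng.standard_normal((n, n)))[0]
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        # Fixed-point update for each row w: E[z g(w.z)] - E[g'(w.z)] w
        W_new = G @ Z.T / Z.shape[1] - (1.0 - G ** 2).mean(axis=1)[:, None] * W
        # Symmetric decorrelation keeps the rows orthonormal.
        U, _, Vt = np.linalg.svd(W_new)
        W_new = U @ Vt
        done = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0)) < tol
        W = W_new
        if done:
            break
    return W @ Z, W @ K


def remove_spiky_components(X, kurtosis_limit=5.0):
    """Drop independent components whose excess kurtosis is above the limit.

    Returns the centered, cleaned channels and a boolean mask of the
    components that were removed.
    """
    S, unmix = fastica(X)
    kurt = (S ** 4).mean(axis=1) / (S ** 2).mean(axis=1) ** 2 - 3.0
    keep = kurt <= kurtosis_limit
    mixing = np.linalg.pinv(unmix)
    # Rebuild the channels from the retained components only.
    return mixing[:, keep] @ S[keep], ~keep
```

Kurtosis is a convenient stand-in here because blink-like artifacts are sparse and spiky; a real pipeline would choose the rejection criterion to suit the recording.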
Dimensionality reduction unit 302 is used for performing spatial projection and dimensionality reduction on the denoised EEG signals;
Specifically, in the present embodiment, dimensionality reduction unit 302 uses the common spatial pattern (CSP) algorithm to project the denoised, high-SNR EEG signals of the different channels according to their spatial positions and to reduce their dimensionality, yielding dimension-reduced EEG signals. In the present embodiment, the dimension-reduced EEG signals preferably have fewer than 300 dimensions.
Initialization unit 303 is used for initializing all phrases in the preprocessed corpus text as term vector representations;
Construction unit 304 is used for traversing all phrases in the preprocessed corpus text: the term vector representation of the current phrase is used as the feature, a neural network regression model predicts the EEG signals of its context, the predicted context EEG signals are compared with the actual EEG signals to obtain the prediction error, and the term vector representation of the current phrase is adjusted according to the prediction error. The actual EEG signals are the EEG signals recorded while the annotator reads the context. This step is repeated until the prediction error is less than a preset threshold.
In the present embodiment, the context window may be three. The term vector representation of the current phrase is used as the feature, and a neural network regression model predicts the EEG signals of the three preceding phrases and the three following phrases. The predicted context EEG signals are compared with the actual EEG signals to obtain the prediction error. Each error is back-propagated to adjust the vector representation of the current phrase until the overall error falls below a preset error threshold, which can be set empirically, for example to 10^-5.
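The training procedure can be sketched as follows. This is a minimal illustration rather than the patented model: a single linear layer stands in for the neural network regression model, each phrase is assumed to have one fixed-length EEG feature vector (for example after dimensionality reduction), positions near the ends of the sequence are zero-padded, and the function name, hyperparameters and cap on the number of epochs are all hypothetical.

```python
import numpy as np


def train_eeg_term_vectors(phrases, eeg, dim=8, window=3, lr=0.5,
                           threshold=1e-5, max_epochs=200, seed=0):
    """Learn one vector per distinct phrase by predicting context EEG.

    phrases: list of str, in reading order.
    eeg: array of shape (len(phrases), m), one feature vector per phrase.
    Returns (vocab, vectors, history of mean squared error per epoch).
    """
    rng = np.random.default_rng(seed)
    vocab = {p: i for i, p in enumerate(dict.fromkeys(phrases))}
    n, m = eeg.shape
    vectors = rng.normal(scale=0.1, size=(len(vocab), dim))
    W = rng.normal(scale=0.1, size=(dim, 2 * window * m))
    b = np.zeros(2 * window * m)
    # Zero-pad so that every position has a full context window.
    pad = np.zeros((window, m))
    padded = np.vstack([pad, eeg, pad])
    history = []
    for _ in range(max_epochs):
        total = 0.0
        for t, phrase in enumerate(phrases):
            # EEG of the `window` phrases before and after position t.
            before = padded[t:t + window]
            after = padded[t + window + 1:t + 2 * window + 1]
            target = np.concatenate([before, after]).ravel()
            v = vectors[vocab[phrase]]
            err = v @ W + b - target
            total += float(np.mean(err ** 2))
            # Back-propagate the mean squared error.
            grad_out = 2.0 * err / err.size
            grad_v = W @ grad_out
            W -= lr * np.outer(v, grad_out)
            b -= lr * grad_out
            vectors[vocab[phrase]] -= lr * grad_v  # adjust the term vector
        history.append(total / n)
        if history[-1] < threshold:
            break
    return vocab, vectors, history
```

On real recordings the error would rarely reach 10^-5, so the epoch cap acts as a practical stopping rule alongside the threshold.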
In summary, through the above scheme of the present invention, collection module 10 collects a text corpus and processes the corpus text in it to obtain corpus text in continuous phrase form, with the phrase as the unit; acquisition module 20 presents the corpus text in continuous phrase form to an annotator for reading and collects the EEG signals produced while the annotator reads each phrase; building module 30 takes the EEG signals corresponding to the collected phrases as the prediction target and trains term vectors, with the current phrase used as the feature to predict the EEG signals of its context, thereby building a term vector representation model based on EEG signals. This improves the accuracy of term vector calculation.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A term vector calculation method based on EEG signals, characterized in that the method comprises the following steps:
Step S1: collecting a text corpus, and processing the corpus text in the text corpus to obtain corpus text in continuous phrase form, with the phrase as the unit;
Step S2: presenting the corpus text in continuous phrase form to an annotator for reading, and collecting the EEG signals produced while the annotator reads each phrase;
Step S3: taking the EEG signals corresponding to the collected phrases as the prediction target and training term vectors, with the term vector representation of the current phrase used as the feature to predict the EEG signals of its context, thereby building a term vector representation model based on EEG signals.
2. The term vector calculation method based on EEG signals according to claim 1, characterized in that step S1 comprises the following sub-steps:
Step S11: collecting a text corpus, in which the corpus text is at sentence level or document level;
Step S12: removing from the text corpus any corpus text whose length is greater than a first preset value or less than a second preset value, to obtain preprocessed corpus text;
Step S13: performing word segmentation on the preprocessed corpus text to obtain words;
Step S14: converting the words into phrases using chunk parsing, to obtain corpus text in continuous phrase form.
3. The term vector calculation method based on EEG signals according to claim 2, characterized in that step S3 comprises the following sub-steps:
Step S31: performing noise reduction on the collected EEG signals to obtain denoised EEG signals;
Step S32: performing spatial projection and dimensionality reduction on the denoised EEG signals;
Step S33: initializing all phrases in the preprocessed corpus text as term vector representations;
Step S34: traversing all phrases in the preprocessed corpus text, using the term vector representation of the current phrase as the feature, predicting the EEG signals of its context with a neural network regression model, comparing the predicted context EEG signals with the actual EEG signals to obtain a prediction error, and adjusting the term vector representation of the current phrase according to the prediction error, wherein the actual EEG signals are the EEG signals recorded while the annotator reads the context; and repeating this step until the prediction error is less than a preset threshold.
4. The term vector calculation method based on EEG signals according to claim 3, characterized in that step S31 comprises:
processing the collected EEG signals to obtain EEG signals whose signal-to-noise ratio is higher than a third preset value;
and step S32 comprises:
performing spatial projection and dimensionality reduction, using the common spatial pattern algorithm, on the EEG signals whose signal-to-noise ratio is higher than the third preset value, to obtain EEG signals whose dimensionality is less than a fourth preset value.
5. The term vector calculation method based on EEG signals according to claim 3, characterized in that the noise reduction is performed on the collected EEG signals using the FastICA algorithm.
6. A term vector computing device based on EEG signals, characterized in that the device comprises:
a collection module, used for collecting a text corpus and processing the corpus text in the text corpus to obtain corpus text in continuous phrase form, with the phrase as the unit;
an acquisition module, used for presenting the corpus text in continuous phrase form to an annotator for reading, and collecting the EEG signals produced while the annotator reads each phrase;
a building module, used for taking the EEG signals corresponding to the collected phrases as the prediction target and training term vectors, with the term vector representation of the current phrase used as the feature to predict the EEG signals of its context, thereby building a term vector representation model based on EEG signals.
7. The term vector computing device based on EEG signals according to claim 6, characterized in that the collection module comprises:
a collecting unit, used for collecting a text corpus, in which the corpus text is at sentence level or document level;
a preprocessing unit, used for removing from the text corpus any corpus text whose length is greater than a first preset value or less than a second preset value, to obtain preprocessed corpus text;
a word segmentation unit, used for performing word segmentation on the preprocessed corpus text to obtain words;
a conversion unit, used for converting the words into phrases using chunk parsing, to obtain corpus text in continuous phrase form.
8. The term vector computing device based on EEG signals according to claim 7, characterized in that the building module comprises:
a noise reduction unit, used for performing noise reduction on the collected EEG signals to obtain denoised EEG signals;
a dimensionality reduction unit, used for performing spatial projection and dimensionality reduction on the denoised EEG signals;
an initialization unit, used for initializing all phrases in the preprocessed corpus text as term vector representations;
a construction unit, used for traversing all phrases in the preprocessed corpus text, using the term vector representation of the current phrase as the feature, predicting the EEG signals of its context with a neural network regression model, comparing the predicted context EEG signals with the actual EEG signals to obtain a prediction error, and adjusting the term vector representation of the current phrase according to the prediction error, wherein the actual EEG signals are the EEG signals recorded while the annotator reads the context; and repeating this step until the prediction error is less than a preset threshold.
9. The term vector computing device based on EEG signals according to claim 8, characterized in that:
the noise reduction unit is further used for processing the collected EEG signals to obtain EEG signals whose signal-to-noise ratio is higher than a third preset value;
the dimensionality reduction unit is further used for performing spatial projection and dimensionality reduction, using the common spatial pattern algorithm, on the EEG signals whose signal-to-noise ratio is higher than the third preset value, to obtain EEG signals whose dimensionality is less than a fourth preset value.
10. The term vector computing device based on EEG signals according to claim 8, characterized in that the noise reduction module is further used for performing noise reduction on the collected EEG signals using the FastICA algorithm.
CN201610907518.2A 2016-10-18 2016-10-18 Term vector calculation method and device based on EEG signals Active CN106502394B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610907518.2A CN106502394B (en) 2016-10-18 2016-10-18 Term vector calculation method and device based on EEG signals

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610907518.2A CN106502394B (en) 2016-10-18 2016-10-18 Term vector calculation method and device based on EEG signals

Publications (2)

Publication Number Publication Date
CN106502394A true CN106502394A (en) 2017-03-15
CN106502394B CN106502394B (en) 2019-06-25

Family

ID=58295164

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610907518.2A Active CN106502394B (en) 2016-10-18 2016-10-18 Term vector calculation method and device based on EEG signals

Country Status (1)

Country Link
CN (1) CN106502394B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104881401A (en) * 2015-05-27 2015-09-02 大连理工大学 Patent literature clustering method
CN104965822A (en) * 2015-07-29 2015-10-07 中南大学 Emotion analysis method for Chinese texts based on computer information processing technology
CN105701084A (en) * 2015-12-28 2016-06-22 广东顺德中山大学卡内基梅隆大学国际联合研究院 Characteristic extraction method of text classification on the basis of mutual information

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111881665A (en) * 2020-09-27 2020-11-03 华南师范大学 Word embedding representation method, device and equipment
CN111881665B (en) * 2020-09-27 2021-01-05 华南师范大学 Word embedding representation method, device and equipment

Also Published As

Publication number Publication date
CN106502394B (en) 2019-06-25

Similar Documents

Publication Publication Date Title
WO2019080863A1 (en) Text sentiment classification method, storage medium and computer
CN108733653A (en) A kind of sentiment analysis method of the Skip-gram models based on fusion part of speech and semantic information
Deng et al. Speech-based diagnosis of autism spectrum condition by generative adversarial network representations
CN108388554B (en) Text emotion recognition system based on collaborative filtering attention mechanism
CN107480136B (en) Method applied to emotional curve analysis in movie script
CN109448851A (en) A kind of cognition appraisal procedure and device
CN102122297A (en) Semantic-based Chinese network text emotion extracting method
CN106502979A (en) A kind of data processing method of natural language information and device
CN108090099B (en) Text processing method and device
CN107943786A (en) A kind of Chinese name entity recognition method and system
CN106446147A (en) Emotion analysis method based on structuring features
CN111191463A (en) Emotion analysis method and device, electronic equipment and storage medium
CN111339772B (en) Russian text emotion analysis method, electronic device and storage medium
Sonawane et al. Speech to Indian sign language (ISL) translation system
Calderone et al. A computational approach to morphonotactics: Evidence from German
Treviso et al. Evaluating word embeddings for sentence boundary detection in speech transcripts
CN110096696A (en) A kind of Chinese long text sentiment analysis method
CN111832302A (en) Named entity identification method and device
CN106502394A (en) Term vector computational methods and device based on EEG signals
CN112287667A (en) Text generation method and equipment
CN103019924B (en) The intelligent evaluating system of input method and method
CN111639189A (en) Text graph construction method based on text content features
CN105678325A (en) Textual emotion marking method, device and system
Osathanunkul et al. Semantic similarity measures for the development of thai dialog system
Kane et al. Towards establishing a mute communication: An Indian sign language perspective

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant