CN109684637A - A kind of integrated use method of text feature - Google Patents
- Publication number
- CN109684637A (application CN201811571221.9A)
- Authority
- CN
- China
- Prior art keywords
- word
- vector
- model
- text
- word2vec
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses a method for the combined use of text features, belonging to the field of artificial intelligence. The method processes a corpus with one fully consistent text-pretreatment procedure, then separately trains a TFIDF feature-engineering model and a Word2vec feature-engineering model, obtaining two different vector-matrix representations of the same corpus. The two vector matrices are then simply spliced into a single vector matrix of higher dimension, which is used to train the classification-task model. By combining the complementary advantages and characteristics of TFIDF and word2vec, the invention describes a word's salience within a document and its relevance to its context more comprehensively and accurately, improving the accuracy of the subsequently trained classification model.
Description
Technical field
The present invention relates to the field of artificial intelligence, and specifically to a method for the combined use of text features.
Background art
In the practice of intelligent medicine, large amounts of data are generated continuously, such as patients' health status, prescriptions, doctors' advice, progress notes, and consultation notes. For intelligent medicine to keep flourishing, collecting, storing, and then classifying these data is of profound significance: it not only helps manage the data for later analysis and retrieval, but also reveals the distribution and inherent regularities of the data through classification. Faced with rapidly accumulating data, manual classification can guarantee high accuracy but is far less efficient than machine-learning methods; therefore, in today's era of ever-developing big data, classifying medical-industry data with machine learning is imperative.
Summary of the invention
The technical task of the invention is to address the above deficiencies by providing a method for the combined use of text features that describes both a word's salience within a document and its relevance to its context more comprehensively and accurately, improving the accuracy of the subsequently trained classification model.
The technical solution adopted by the present invention to solve the technical problems is:
In the method for the combined use of text features, the corpus is processed with a fully consistent text-pretreatment procedure; a TFIDF feature-engineering model and a Word2vec feature-engineering model are then trained separately, yielding two different vector-matrix representations of the same corpus. The two vector matrices each have a different emphasis, such as the salience of vocabulary versus the correlation of context.
The two vector matrices obtained are then simply spliced into a vector matrix of higher dimension, which increases the comprehensiveness and accuracy of the word-vector description; training the classification-task model on this vector matrix improves the effect of the subsequent supervised learning.
Here, TFIDF is used to compute word weights, combining the raw term-frequency algorithm with the inverse document frequency value; Word2vec is used alongside TFIDF to capture the relevance of a word within its context.
TFIDF considers both a word's frequency within a document and its frequency across the whole corpus, so it performs well and stably on most tasks. The TF-IDF algorithm is simple and fast and its results match real conditions well, but measuring a word's importance purely by frequency is clearly not comprehensive enough: important words sometimes occur rarely, and the algorithm cannot reflect a word's positional information. Combining it with the Word2vec algorithm effectively captures the relevance of words within their context.
Text generated in the medical field not only contains highly indicative medical vocabulary but also strong causal relationships, i.e., close connections within context. Therefore the features produced by TFIDF and word2vec are combined into a new feature that simultaneously considers the particularity of a single word and its connection to context. The merged feature can be used to optimize subsequent model training.
Preferably, the text pretreatment includes word segmentation and stop-word removal.
Preferably, TFIDF and Word2vec are trained separately to obtain two different vector-matrix representations of the same corpus, and the two matrices are then simply spliced into a vector matrix of higher dimension, as follows:
TFIDF yields a document vector matrix of dimension N*K; Word2vec yields one of dimension N*L; the two vector matrices are spliced into a vector matrix of higher dimension N*(K+L).
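The dimension bookkeeping above can be sketched with NumPy; the sizes N, K, and L below are hypothetical stand-ins for the document count and the two feature widths:

```python
import numpy as np

# Hypothetical sizes: N documents, TFIDF width K, Word2vec width L
N, K, L = 100, 5000, 300

tfidf_matrix = np.zeros((N, K))   # stand-in for the TFIDF document vectors (N*K)
w2v_matrix = np.zeros((N, L))     # stand-in for the Word2vec document vectors (N*L)

# Simple column-wise splicing yields the higher-dimensional matrix N*(K+L)
combined = np.hstack([tfidf_matrix, w2v_matrix])
print(combined.shape)  # (100, 5300)
```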
Specifically, the method is implemented in the following steps:
1) Text pretreatment, including word segmentation and stop-word removal; many tools are available, such as jieba and thulac. The stop-word dictionary can be built by hand or taken from an open-source dictionary published by an institution; either can be manually expanded according to actual needs and the characteristics of medical text.
2) Train the two feature-engineering models with TFIDF and Word2vec respectively, and save them as vector matrices.
3) Splice the two feature vectors along the dimension direction (i.e., extending the number of columns) into a new feature vector that contains both the TFIDF and the Word2vec description of the same corpus.
4) Use the synthesized vector to train the text classification model.
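The four steps above can be sketched end to end as follows. This is a minimal illustration, not the patented implementation: the toy documents and labels are invented, and a small random embedding table stands in for a trained Word2vec model (a real run would train one, e.g. with gensim) so that the flow stays self-contained and runnable:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# 1) Pretreated corpus: already segmented, stop words removed (toy data)
docs = ["fever cough headache", "prescription antibiotic dose", "fever prescription"]
labels = [0, 1, 0]

# 2a) TFIDF feature-engineering model -> N*K matrix
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(docs).toarray()

# 2b) Word2vec stand-in: map each token to an L-dim vector, average per document
rng = np.random.default_rng(0)
vocab = {w for d in docs for w in d.split()}
emb = {w: rng.normal(size=8) for w in vocab}  # L = 8 (assumed)
X_w2v = np.array([np.mean([emb[w] for w in d.split()], axis=0) for d in docs])

# 3) Splice the two matrices along the column direction -> N*(K+L)
X = np.hstack([X_tfidf, X_w2v])

# 4) Train a classification model on the synthesized features
clf = LogisticRegression().fit(X, labels)
print(X.shape)
```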
Specifically, the classification model includes a linear SVM classifier, logistic regression, or a Bayes classifier, etc.
Specifically, the TFIDF model is trained as follows:
First compute the TF value of a word using tf_i = n_i / N, where n_i is the number of times the word occurs in a given document and N is the total number of words in that document.
Then compute the IDF value of the word using idf_i = log(D / D_i), i.e., the logarithm of the total number of articles D divided by the number of articles D_i that contain the word.
The product of TF and IDF gives the TFIDF weight of the word.
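The computation can be transcribed directly into a few lines of Python; the three segmented documents below are made up purely to exercise the formulas:

```python
import math

# Direct transcription of the formulas: tf_i = n_i / N and idf_i = log(D / D_i),
# computed on a toy corpus of three segmented documents
docs = [["fever", "cough", "fever"], ["cough", "rash"], ["fever", "rash", "dose"]]

def tfidf_weight(word, doc, corpus):
    tf = doc.count(word) / len(doc)              # n_i / N
    d_i = sum(1 for d in corpus if word in d)    # number of articles containing the word
    idf = math.log(len(corpus) / d_i)            # log(D / D_i)
    return tf * idf

# "fever" occurs 2 of 3 times in the first document and appears in 2 of 3 documents:
# (2/3) * log(3/2) ≈ 0.2703
print(round(tfidf_weight("fever", docs[0], docs), 4))  # 0.2703
```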
TFIDF (term frequency-inverse document frequency) is a common weighting technique in information retrieval and data mining, and plays a significant role in assessing how important a word is within a document. Intuitively, the high-frequency words in an article can represent the document's features; this is the raw term-frequency (TF) algorithm. But some words, such as "we" and "everybody", occur very frequently in most articles and cannot represent any particular document, so the inverse document frequency (IDF) of a word is also considered. Combining the two gives the TF-IDF algorithm.
Here, the TF value of a word is first computed as tf_i = n_i / N, where n_i is the number of times the word occurs in a given document and N is the total number of words in that document. The IDF value is idf_i = log(D / D_i), the logarithm of the total number of articles divided by the number of articles containing the word. The product tf_i * idf_i then gives the TF-IDF weight of the word. A word that occurs frequently in a particular document but rarely across the whole collection produces a high TF-IDF weight.
TF-IDF can therefore filter out frequently occurring everyday words and retain important, representative ones.
The word2vec algorithm converts words into vector form. The traditional word-vector method represents a word with a vector of length N whose components are all 0 except for a single 1, also called a sparse vector. In large-scale text applications, sparse vectors suffer from the curse of dimensionality and cannot express associations between words. The word2vec algorithm instead maps each word, through training, to a vector of fixed length; a document vector can then be formed from all these vectors by summation or averaging. Each vector can be regarded as a point in space; the vector length can be chosen freely, independent of the article's scale, and associations between words can be expressed.
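The contrast between sparse and dense vectors can be illustrated numerically. The vector values below are invented for illustration only; in practice a trained word2vec model supplies the dense vectors:

```python
import numpy as np

# One-hot "sparse" vectors: every pair of distinct words looks equally unrelated
onehot = {"fever": np.array([1.0, 0.0, 0.0]),
          "pyrexia": np.array([0.0, 1.0, 0.0]),
          "dose": np.array([0.0, 0.0, 1.0])}

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cos(onehot["fever"], onehot["pyrexia"]))  # 0.0 -- no association expressible

# Dense fixed-length vectors (as word2vec produces) can express association;
# these numbers are made up purely for illustration
dense = {"fever": np.array([0.9, 0.1]),
         "pyrexia": np.array([0.85, 0.2]),
         "dose": np.array([-0.1, 0.95])}
print(cos(dense["fever"], dense["pyrexia"]) > cos(dense["fever"], dense["dose"]))  # True

# A document vector can be formed by averaging the document's word vectors
doc_vec = np.mean([dense["fever"], dense["dose"]], axis=0)
print(doc_vec.shape)  # (2,)
```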
Specifically, the Word2vec model training includes the CBOW model and the Skip-gram model: the CBOW model predicts a word from the n-1 surrounding words given as input, while the Skip-gram model predicts the context from the word itself. Based on skip-gram or CBOW, word2vec considers a single word and its context set jointly.
Further, the CBOW model and the Skip-gram model are three-layer neural networks consisting of an input layer, a middle layer, and an output layer.
The input layer of the CBOW model holds the word vectors of the words in the window around the current word; the middle layer sums (or averages) these context vectors to obtain an intermediate vector; the output layer is a Huffman tree whose leaf nodes represent all the words in the corpus. Each leaf node has a global code, such as "01001", and each node of the middle layer is related to a non-leaf node of the Huffman tree; each non-leaf node is in effect a two-class softmax. The intermediate vector is routed through the subtrees of the Huffman tree, and the parameters of the non-leaf nodes are updated by continuous iteration, yielding a vector representation for each word.
The input layer of the Skip-gram model is the current word alone; since there is no longer a set of vectors, the hidden layer performs no accumulation. The output layer is again a Huffman tree; based on the codes of the context words and the current word, gradient descent updates the parameters of the non-leaf nodes on the path and the current word's vector until convergence, yielding a vector representation for each word. Associations between the word vectors then reflect associations between the words.
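A toy CBOW training loop makes the mechanics concrete. Note one deliberate simplification: the patent's output layer is a Huffman tree (hierarchical softmax), while this sketch uses a plain softmax over the whole vocabulary for brevity; the corpus and all sizes are invented:

```python
import numpy as np

# Toy CBOW: predict the center word from the average of its context vectors.
rng = np.random.default_rng(1)
corpus = "the patient has a fever the patient takes a dose".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 4  # vocabulary size and embedding dimension (toy values)

W_in = rng.normal(scale=0.1, size=(V, D))   # input-layer word vectors
W_out = rng.normal(scale=0.1, size=(D, V))  # output-layer parameters

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

lr, window = 0.3, 1
losses = []
for epoch in range(60):
    total = 0.0
    for t in range(window, len(corpus) - window):
        context = [idx[corpus[t - 1]], idx[corpus[t + 1]]]
        center = idx[corpus[t]]
        h = W_in[context].mean(axis=0)   # middle layer: averaged context vectors
        p = softmax(h @ W_out)
        total += -np.log(p[center])
        dz = p.copy()
        dz[center] -= 1.0                # softmax cross-entropy gradient
        dh = W_out @ dz
        W_out -= lr * np.outer(h, dz)
        for c in context:
            W_in[c] -= lr * dh / len(context)
    losses.append(total)
print(losses[0] > losses[-1])  # True: the loss falls as word vectors are learned
```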
Further, the middle layer is a hidden layer.
Compared with the prior art, the method for the combined use of text features of the invention has the following advantages:
Splitting a passage into independent words and representing them as vectors helps a machine better understand and learn text features, and the effect of document classification increases accordingly. TFIDF considers both a word's frequency within a document and its frequency across the whole corpus, so it performs well and stably on most tasks; word2vec, based on skip-gram or CBOW, considers a single word and its context set jointly. The features produced by TFIDF and word2vec are combined into a new feature that simultaneously considers the particularity of a single word and its connection to context; the merged feature can be used to optimize subsequent model training. The method can be applied to the medical field, bringing big data to the classification and analysis of medical-industry data and improving the efficiency of machine-learning classification.
The method lets the respective advantages and characteristics of TFIDF and word2vec complement each other, building a more comprehensive and accurate word vector that describes both a word's salience within a document and its relevance to its context, thereby improving the accuracy of the subsequently trained classification model. It improves on the current practice of vectorizing a corpus with a single feature-engineering method: by synthesizing two feature vectors, it gives a more comprehensive description of the corpus, helps improve the training of subsequent medical-text classification models, optimizes hospital workflows, and reduces the workload doctors spend organizing and classifying data.
Intelligent medicine is one of the current fields of artificial-intelligence application, and classifying medical text is an important link within it. Using currently available open-source natural-language-processing toolkits and mainstream programming languages, this method helps optimize hospital workflows, reduce the workload doctors spend on data classification, and improve classification efficiency.
Brief description of the drawings
Fig. 1 is a flow chart of the method for the combined use of text features of the invention;
Fig. 2 is a flow chart of the construction of the comprehensive feature-engineering model of the invention.
Specific embodiment
The present invention is further explained below with reference to specific embodiments.
In the method for the combined use of text features, the corpus is processed with a fully consistent text-pretreatment procedure; a TFIDF feature-engineering model and a Word2vec feature-engineering model are then trained separately, yielding two different vector-matrix representations of the same corpus. The two vector matrices each have a different emphasis, such as the salience of vocabulary versus the correlation of context.
Here, TFIDF is used to compute word weights, combining the raw term-frequency algorithm with the inverse document frequency value; Word2vec is used alongside TFIDF to capture the relevance of a word within its context.
In one embodiment of the invention, the text pretreatment includes word segmentation and stop-word removal.
The two vector matrices obtained are then simply spliced into a vector matrix of higher dimension, which increases the comprehensiveness and accuracy of the word-vector description; training the classification-task model on this vector matrix improves the effect of the subsequent supervised learning.
TFIDF and Word2vec are trained separately to obtain two different vector-matrix representations of the same corpus, and the two matrices are then simply spliced into a vector matrix of higher dimension, as follows:
TFIDF yields a document vector matrix of dimension N*K; Word2vec yields one of dimension N*L; the two vector matrices are spliced into a vector matrix of higher dimension N*(K+L).
TFIDF considers both a word's frequency within a document and its frequency across the whole corpus, so it performs well and stably on most tasks. The TF-IDF algorithm is simple and fast and its results match real conditions well, but measuring a word's importance purely by frequency is clearly not comprehensive enough: important words sometimes occur rarely, and the algorithm cannot reflect a word's positional information. Combining it with the Word2vec algorithm effectively captures the relevance of words within their context.
Text generated in the medical field not only contains highly indicative medical vocabulary but also strong causal relationships, i.e., close connections within context. Therefore the features produced by TFIDF and word2vec are combined into a new feature that simultaneously considers the particularity of a single word and its connection to context. The merged feature can be used to optimize subsequent model training.
In one embodiment of the invention, the method for the combined use of text features is implemented in the following steps:
1) Text pretreatment, including word segmentation and stop-word removal; many tools are available, such as jieba and thulac. The stop-word dictionary can be built by hand or taken from an open-source dictionary published by an institution; either can be manually expanded according to actual needs and the characteristics of medical text.
2) Train the two feature-engineering models with TFIDF and Word2vec respectively, and save them as vector matrices.
3) Splice the two feature vectors along the dimension direction (i.e., extending the number of columns) into a new feature vector that contains both the TFIDF and the Word2vec description of the same corpus.
4) Use the synthesized vector to train the text classification model, which includes a linear SVM classifier, logistic regression, or a Bayes classifier, etc.
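The three classifier families named in step 4) can all be tried on the synthesized features with scikit-learn. The feature matrix below is a random stand-in for the spliced N*(K+L) matrix, and GaussianNB is used as one concrete instance of a Bayes classifier — both are assumptions for illustration:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 10))      # stand-in for the spliced N*(K+L) feature matrix
y = (X[:, 0] > 0).astype(int)      # toy labels determined by the first feature

scores = {}
for clf in (LinearSVC(), LogisticRegression(), GaussianNB()):
    clf.fit(X, y)
    scores[type(clf).__name__] = clf.score(X, y)
print(scores)
```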
The TFIDF model is trained as follows:
First compute the TF value of a word using tf_i = n_i / N, where n_i is the number of times the word occurs in a given document and N is the total number of words in that document.
Then compute the IDF value of the word using idf_i = log(D / D_i), i.e., the logarithm of the total number of articles divided by the number of articles containing the word.
The product tf_i * idf_i gives the TFIDF weight of the word.
The Word2vec model training includes the CBOW model and the Skip-gram model: the CBOW model predicts a word from the n-1 surrounding words given as input, while the Skip-gram model predicts the context from the word itself. Based on skip-gram or CBOW, word2vec considers a single word and its context set jointly.
The CBOW model and the Skip-gram model are three-layer neural networks consisting of an input layer, a middle layer (hidden layer), and an output layer.
The input layer of the CBOW model holds the word vectors of the words in the window around the current word; the middle layer sums (or averages) these context vectors to obtain an intermediate vector; the output layer is a Huffman tree whose leaf nodes represent all the words in the corpus. Each leaf node has a global code, such as "01001", and each node of the middle layer is related to a non-leaf node of the Huffman tree; each non-leaf node is in effect a two-class softmax. The intermediate vector is routed through the subtrees of the Huffman tree, and the parameters of the non-leaf nodes are updated by continuous iteration, yielding a vector representation for each word.
The input layer of the Skip-gram model is the current word alone; since there is no longer a set of vectors, the hidden layer performs no accumulation. The output layer is again a Huffman tree; based on the codes of the context words and the current word, gradient descent updates the parameters of the non-leaf nodes on the path and the current word's vector until convergence, yielding a vector representation for each word. Associations between the word vectors then reflect associations between the words.
Finally, the synthesized vector is used to train the text classification model, optimizing subsequent model training.
Those skilled in the art can readily implement the present invention from the above specific embodiments. It should be understood, however, that the present invention is not limited to the specific embodiments described above; on the basis of the disclosed embodiments, those skilled in the art can combine different technical features arbitrarily to realize different technical solutions. Apart from the technical features described in the specification, everything else is known to those skilled in the art.
Claims (9)
1. A method for the combined use of text features, characterized in that a corpus is processed with a fully consistent text-pretreatment procedure; a TFIDF feature-engineering model and a Word2vec feature-engineering model are then trained separately, yielding two different vector-matrix representations of the same corpus; the two vector matrices obtained are then simply spliced into a vector matrix of higher dimension, and this vector matrix is used to train the classification-task model.
2. The method for the combined use of text features according to claim 1, characterized in that the text pretreatment includes word segmentation and stop-word removal.
3. The method for the combined use of text features according to claim 1 or 2, characterized in that TFIDF yields a document vector matrix of dimension N*K, Word2vec yields one of dimension N*L, and the two vector matrices are spliced into a vector matrix of higher dimension N*(K+L).
4. The method for the combined use of text features according to claim 3, characterized in that the method is implemented in the following steps:
1) text pretreatment, including word segmentation and stop-word removal;
2) training the two feature-engineering models with TFIDF and Word2vec respectively, and saving them as vector matrices;
3) splicing the two feature vectors along the dimension direction into a new feature vector that contains both the TFIDF and the Word2vec description of the same corpus;
4) using the synthesized vector to train the text classification model.
5. The method for the combined use of text features according to claim 4, characterized in that the classification model includes a linear SVM classifier, logistic regression, or a Bayes classifier.
6. The method for the combined use of text features according to claim 4, characterized in that the TFIDF model is trained as follows:
first compute the TF value of a word using tf_i = n_i / N, where n_i is the number of times the word occurs in a given document and N is the total number of words in that document;
then compute the IDF value of the word using idf_i = log(D / D_i), i.e., the logarithm of the total number of articles divided by the number of articles containing the word;
the product of TF and IDF gives the TFIDF weight of the word.
7. The method for the combined use of text features according to claim 4, characterized in that the Word2vec model training includes a CBOW model and a Skip-gram model, the CBOW model predicting a word from the n-1 surrounding words given as input, and the Skip-gram model predicting the context from the word itself.
8. The method for the combined use of text features according to claim 7, characterized in that the CBOW model and the Skip-gram model are three-layer neural networks consisting of an input layer, a middle layer, and an output layer;
the input layer of the CBOW model holds the word vectors of the words in the window around the current word, the middle layer sums the context vectors to obtain an intermediate vector, and the output layer is a Huffman tree whose leaf nodes represent all the words in the corpus, each leaf node having a global code and each node of the middle layer being related to a non-leaf node of the Huffman tree; the intermediate vector is routed through the subtrees of the Huffman tree and the parameters of the non-leaf nodes are updated by continuous iteration, yielding a vector representation for each word;
the input layer of the Skip-gram model is the current word, the output layer is a Huffman tree, and based on the codes of the context words and the current word, gradient descent updates the parameters of the non-leaf nodes on the path and the current word's vector until convergence, yielding a vector representation for each word.
9. The method for the combined use of text features according to claim 8, characterized in that the middle layer is a hidden layer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811571221.9A CN109684637A (en) | 2018-12-21 | 2018-12-21 | A kind of integrated use method of text feature |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811571221.9A CN109684637A (en) | 2018-12-21 | 2018-12-21 | A kind of integrated use method of text feature |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109684637A true CN109684637A (en) | 2019-04-26 |
Family
ID=66188180
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811571221.9A Pending CN109684637A (en) | 2018-12-21 | 2018-12-21 | A kind of integrated use method of text feature |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109684637A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105930318A (en) * | 2016-04-11 | 2016-09-07 | 深圳大学 | Word vector training method and system |
US9984682B1 (en) * | 2016-03-30 | 2018-05-29 | Educational Testing Service | Computer-implemented systems and methods for automatically generating an assessment of oral recitations of assessment items |
CN108763477A (en) * | 2018-05-29 | 2018-11-06 | 厦门快商通信息技术有限公司 | A kind of short text classification method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190426 |