CN106649662A - Construction method of domain dictionary - Google Patents
- Publication number
- CN106649662A (application CN201611149314.3A)
- Authority
- CN
- China
- Prior art keywords
- text
- word
- domain
- dictionary
- distance
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/374—Thesaurus
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/247—Thesauruses; Synonyms
Abstract
The invention relates to the field of natural language processing, and in particular to a method for constructing a domain dictionary. The method comprises the following steps: on the basis of automatically extracted text keywords, clustering the texts to be processed into different topic text sets; selecting, with manual review, a subset of seed words in the text set of the field of the dictionary to be built; on this basis, analyzing how closely each clustered topic text set is related to the selected domain seed words, and retaining only the closely related text sets as the source for expanding the domain dictionary; and, within the related field, expanding the domain dictionary automatically with an association algorithm to obtain the corresponding dictionary. With this method, the domain dictionary to be built can be expanded automatically from a small subset of seed words once the topic fields of the texts have been distinguished automatically; dictionary construction is efficient and accurate, and the resulting dictionary is strongly targeted to its field. The method has broad application prospects in text analysis and natural language processing.
Description
Technical field
The present invention relates to the field of natural language processing, and in particular to a domain dictionary construction method.
Background technology
With the rapid development of the internet, large amounts of public web data have been generated, which in turn has fostered new industries based on big data technology, such as internet healthcare, internet education, and enterprise or personal credit reference. The rise of these internet industries depends on the analysis of massive amounts of data. Natural language processing occupies an important position in big data analysis: faced with massive network text resources, natural language processing methods can automatically and intelligently judge the sentiment orientation contained in a text or expressed by its publisher, which is of great practical significance for both public opinion analysis and business research. Using these analysis results, the evolution of events can be correctly anticipated and corresponding measures taken in advance to achieve greater positive effect.
Sentiment analysis methods fall into two broad classes: machine learning based methods and dictionary based methods. Machine learning methods first build a classifier and then feed the texts to be analyzed into it. Their limitation is that building the classifier requires a large-scale corpus for training, and the selection of classification features is very challenging; the quality of the feature selection directly affects classifier performance. Dictionary based methods use the words in a dictionary as features, extract the corresponding feature vocabulary by dictionary matching, and then judge the orientation or properties of the text by combining the extracted features with a model or algorithm, which greatly increases the reliability of the analysis.

To support targeted analysis and mining, the dictionaries used by dictionary based sentiment analysis differ greatly between fields. Domain dictionaries already exist, but they lack applicability to specific problems and are not well targeted. When analyzing a specific field or a concrete topic, an existing large and broad domain dictionary cannot achieve good analysis results, so building a targeted domain dictionary is necessary; however, constructing a dictionary manually is very time-consuming and labor-intensive and cannot meet the demands of massive-scale text analysis.
The content of the invention
The object of the present invention is to overcome the above deficiencies of the prior art and to provide a domain dictionary construction method. On the basis of automatically extracted text keywords, the texts to be processed are clustered to form text sets for different fields or topics. According to the needs of the analysis, a small number of seed words for the target field are selected; on this basis, the closeness of the relation between each clustered field or topic text set and the selected seed words is analyzed, and only the closely related text sets are retained as the source for dictionary expansion. A word-association analysis algorithm is then applied to these text sets to expand the dictionary automatically and obtain the corresponding domain dictionary.
In order to achieve the above object, the invention provides the following technical scheme. A domain dictionary construction method comprises the following steps:
(1) extracting the keywords of each text in the text set to be processed;
(2) clustering the texts to be processed to generate N topic text sets, where N is an integer and N >= 2;
(3) selecting seed words for the target field;
(4) counting the frequency with which the seed words occur in each topic text set, and retaining the topic text sets whose frequency exceeds a threshold as the source text sets for dictionary expansion;
(5) calculating the degree of association between the seed words and each candidate word in the texts of the source text sets, and storing the candidate words whose degree of association reaches a set threshold into the dictionary to be expanded as domain words.
Specifically, the method of the invention includes the pre-processing steps of word segmentation, removal of high-frequency words, and removal of stop words.
Further, in step (1) the keywords of a text are extracted using the following algorithm, whose formula is:

TR(v_i) = (1 - d)/N + d · Σ_{v_j ∈ relat{v_i}} TR(v_j) / N(p_j)

where TR(v_i) is the importance of word v_i in the text; d is a damping coefficient, usually set to 0.85; N is the number of all words in the undirected graph; relat{v_i} is the set of words that have a co-occurrence relation with v_i; v_j is any word in relat{v_i}; TR(v_j) is the importance of v_j; and N(p_j) is the number of words that have a co-occurrence relation with v_j.
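Assuming the update is the standard unweighted TextRank/PageRank iteration, TR(v_i) = (1 - d)/N + d · Σ_{v_j ∈ relat{v_i}} TR(v_j)/N(p_j), a minimal sketch of the keyword scoring (identifiers are illustrative) is:

```python
def textrank_scores(cooccur, d=0.85, iters=50):
    """Iterate the TR update over a word co-occurrence graph.

    cooccur: dict mapping each word to the set of words it
             co-occurs with (an undirected graph).
    Returns a dict of importance scores TR(v_i).
    """
    n = len(cooccur)
    tr = {w: 1.0 / n for w in cooccur}  # uniform start
    for _ in range(iters):
        new = {}
        for w in cooccur:
            # Each neighbour v_j spreads its score over its degree N(p_j).
            s = sum(tr[v] / len(cooccur[v]) for v in cooccur[w])
            new[w] = (1 - d) / n + d * s
        tr = new
    return tr

def keywords(cooccur, threshold, **kw):
    """Words whose TR score exceeds the threshold (step 1)."""
    tr = textrank_scores(cooccur, **kw)
    return {w for w, score in tr.items() if score > threshold}
```

The iteration is a contraction, so the scores converge quickly; words above the threshold become the text's keywords.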
Further, in step (2) the clustering of the texts to be processed comprises the following procedure:
(2-1) initially, each text to be processed forms a class of its own. The between-class distance is defined as the maximum of the pairwise distances between the texts of the two classes; the between-text distance C(t1, t2) between text 1 and text 2 is computed from t1 ∩ t2, the number of keywords the two texts share, and mid(t1, t2), the mean number of keywords in text 1 and text 2. The between-class distance formula is:

Dist(c_a, c_b) = max{ C(t_a, t_b) : t_a ∈ c_a, t_b ∈ c_b }

where Dist(c_a, c_b) is the distance between any two clusters, c_a and c_b are the two classes, C(t_a, t_b) is the distance between two texts, and t_a and t_b are texts with t_a ∈ c_a and t_b ∈ c_b.
(2-2) The pairwise distances between all classes are calculated, and the two classes with the minimum distance are merged and named cnew.
(2-3) The merged initial clusters are deleted from the text set to be processed, and the new cluster cnew is added to the clustering result.
(2-4) Steps (2-1) to (2-3) are repeated; the clustering stops when the text set to be processed contains only N clusters. At this point the text set to be processed contains the N topics formed by the clustering, where the concrete value of N is set according to the actual application.
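The procedure (2-1) to (2-4) is complete-linkage agglomerative clustering. The patent does not reproduce the between-text formula, so the sketch below assumes C(t1, t2) = 1 - |t1 ∩ t2| / mid(t1, t2), built from the shared-keyword count and mean keyword number defined above; all names are illustrative:

```python
def text_distance(kw1, kw2):
    """Assumed form of C(t1, t2): the shared-keyword count |t1 ∩ t2|,
    normalised by the mean keyword count mid(t1, t2) and turned into a
    distance. The patent's exact formula is not shown in the source."""
    shared = len(set(kw1) & set(kw2))
    mid = (len(kw1) + len(kw2)) / 2
    return 1 - shared / mid

def cluster(keyword_sets, n):
    """Complete-linkage agglomerative clustering down to n clusters.

    keyword_sets: list of keyword lists, one per text.
    Returns clusters as lists of text indices.
    """
    clusters = [[i] for i in range(len(keyword_sets))]  # each text its own class
    while len(clusters) > n:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Dist(ca, cb): maximum over all text pairs (complete linkage).
                dist = max(text_distance(keyword_sets[a], keyword_sets[b])
                           for a in clusters[i] for b in clusters[j])
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        cnew = clusters[i] + clusters[j]  # merge the closest pair of classes
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(cnew)
    return clusters
```

Stopping at n clusters mirrors condition (2-4), where clustering halts once only N clusters remain.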
As a preference, the degree of association between a candidate word and a seed word in step (5) is calculated as:

MI(word1, word2) = log( p(word1, word2) / (p(word1) · p(word2)) )

where p(word1, word2) is the probability that word word1 and word word2 occur together, and p(word1) and p(word2) are the probabilities that word1 and word2 occur, respectively.
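Assuming the association measure is the standard pointwise mutual information, MI(word1, word2) = log(p(word1, word2) / (p(word1) · p(word2))), which is consistent with the threshold MI(word1, word2) = 0.2 quoted later, a sketch that takes a text as the co-occurrence unit could be:

```python
import math

def pmi(texts, w1, w2):
    """Pointwise mutual information of two words, with a text as the
    co-occurrence unit: p(w) is the fraction of texts containing w."""
    n = len(texts)
    p1 = sum(w1 in t for t in texts) / n
    p2 = sum(w2 in t for t in texts) / n
    p12 = sum(w1 in t and w2 in t for t in texts) / n
    if p12 == 0:
        return float("-inf")  # the words never co-occur
    return math.log(p12 / (p1 * p2))
```

The choice of co-occurrence unit (text, sentence, or window) is an assumption here; the patent only defines the probabilities.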
As a preference, in step (2), N = 3.
As a preference, in step (3), the number of selected seed words is 50-200.
Further, step (3) may be moved earlier in the sequence, before step (1) or before step (2).
As a preference, in step (4), only the topic text set in which the seed words occur most frequently is retained as the source text set for dictionary expansion.
As a preference, in step (5) the threshold for the association between a candidate word and the seed words is set to MI(word1, word2) = 0.2; when the degree of association between a word in the text set and the seed words is >= 0.2, the word is added as an expansion word to the dictionary being built.
Compared with the prior art, the beneficial effects of the present invention are as follows. The invention provides a domain dictionary construction method that, on the basis of automatically extracted text keywords, clusters the texts to be processed into different topic text sets; selects a certain number of field seed words; uses the seed words to discover automatically how closely each clustered text set is related to the field to be expanded; and, having automatically identified the domain type of the clustered texts, retains only the closely related topic text sets for dictionary expansion. Dictionary construction is therefore more accurate and more efficient.

In the method of the invention, a subset of seed words is chosen, and this choice can follow the concrete direction of the analysis, making the method more targeted. On the basis of the seed-word selection and the automatic discovery of the field, the closeness of the relation between the seed words and the words in the texts of the source text set is calculated, and the closely related words are retained as expansion words of the domain dictionary. Compared with a common domain dictionary, a dictionary constructed by this method is more flexible and more practical, and is well suited to text analysis for a specific problem or topic.
Description of the drawings:
Fig. 1 is a block diagram of the domain dictionary construction method of the invention.
Fig. 2 shows the implementation process of step (5) of the domain dictionary construction method.
Specific embodiment
The present invention is described in further detail below with reference to test examples and specific embodiments. This should not be understood as limiting the scope of the above subject matter of the invention to the following examples; all techniques realized on the basis of the content of the present invention belong to the scope of the invention.
A domain dictionary construction method is provided. On the basis of automatically extracted text keywords, the texts to be processed are clustered to form different topic text sets. From the text set of the field of the dictionary to be built, a subset of seed words is chosen by manual review. On this basis, the closeness of the relation between each clustered topic text set and the selected field seed words is analyzed, and only the closely related topic text sets are retained for dictionary expansion. An association algorithm is then applied to expand the dictionary automatically and obtain the corresponding domain dictionary. With this method, the domain dictionary to be built is expanded automatically from a small subset of seed words once the topic fields of the texts have been distinguished automatically; dictionary construction is efficient and accurate, the dictionary is strongly targeted to its field, and the method has broad application prospects in text analysis and natural language processing.
In order to achieve the above object, the invention provides the following technical scheme. A domain dictionary construction method comprises the following steps, as shown in Fig. 1:
(1) extracting the keywords of each text in the text set to be processed;
(2) clustering the texts to be processed to form N topic text sets, where N is an integer and N >= 2;
(3) selecting a small number of field seed words: vocabulary with obvious domain features is chosen. Manual selection of the seed words makes the method more targeted to the specific field or problem, and the constructed dictionary more flexibly applicable.
(4) counting the frequency with which the seed words occur in each topic text set, and retaining the topic text sets in which the seed-word frequency exceeds a threshold as the source text sets for dictionary expansion. Clustering classifies the texts to be processed and delimits text collections of different topics, within which the texts are more strongly correlated; this prepares and screens the corpus for the subsequent dictionary expansion.

After the different topic text sets have been formed by clustering, the occurrence frequency of the seed words among the topic text keywords is calculated, and from it the closeness of the relation between each topic and the field of the dictionary under construction is analyzed; distantly related text sets are discarded. Dictionary expansion is thus carried out only in the topics close to the field, which greatly improves the quality of the source corpus and markedly improves the accuracy of the expansion. At the same time, because the expansion is performed only in the text sets closest to the target field, the scope of the calculation is reduced, the amount of computation decreases, and the efficiency of dictionary expansion improves.

(5) calculating the degree of association between the seed words and each word of the source text sets, and storing the words whose degree of association reaches the set threshold into the dictionary to be expanded as domain words.
Specifically, the method of the invention includes the pre-processing steps of word segmentation, removal of high-frequency words, and removal of stop words.
Further, in step (1) the keywords of a text are extracted using the following algorithm, whose formula is:

TR(v_i) = (1 - d)/N + d · Σ_{v_j ∈ relat{v_i}} TR(v_j) / N(p_j)

where TR(v_i) is the importance of word v_i in the text; d is a damping coefficient, usually set to 0.85; N is the number of all words in the undirected graph (after word segmentation, the text is abstracted into an undirected graph in which each word is a node); relat{v_i} is the set of words that have a co-occurrence relation with v_i; v_j is any word in relat{v_i}; TR(v_j) is the importance of v_j; and N(p_j) is the number of words that have a co-occurrence relation with v_j.

The formula is computed iteratively, and the words whose TR(v_i) exceeds a threshold are extracted as the keywords of the text; the automatic extraction of keywords prepares for the text clustering.
Further, in step (2) the clustering of the texts to be processed comprises the following procedure:
(2-1) initially, each text to be processed forms a class of its own. The between-class distance is defined as the maximum of the pairwise distances between the texts of the two classes; the between-text distance C(t1, t2) between text 1 and text 2 is computed from t1 ∩ t2, the number of keywords the two texts share, and mid(t1, t2), the mean number of keywords in text 1 and text 2. The between-class distance formula is:

Dist(c_a, c_b) = max{ C(t_a, t_b) : t_a ∈ c_a, t_b ∈ c_b }

where Dist(c_a, c_b) is the distance between any two clusters, c_a and c_b are the two classes, C(t_a, t_b) is the distance between two texts, and t_a and t_b are texts with t_a ∈ c_a and t_b ∈ c_b.
(2-2) The pairwise distances between all classes are calculated, and the two classes with the minimum distance are merged and named cnew.
(2-3) The merged initial clusters are deleted from the text set to be processed, and the new cluster cnew is added to the clustering result.
(2-4) Steps (2-1) to (2-3) are repeated; the clustering stops when the text set to be processed contains only N clusters. At this point the text set to be processed contains the N topics formed by the clustering, where the concrete value of N is set according to the actual application.

As a preference, in step (2-4) N = 3, so that the text set to be processed is divided into only three topics, which is convenient for the subsequent calculation.
As a preference, in step (3) the number of field seed words extracted is 50-200. If too few seed words are chosen, the accuracy of the dictionary expansion suffers; if too many, the manpower and time cost of the selection increases.

As a preference, in step (4) only the topic text set in which the seed words occur most frequently is retained as the source text set for dictionary expansion. This step selects, from the individual topic text sets, the text set most closely related to the seed words, so that the corpus used for expansion better matches the characteristics of the field, and the quality and targeting of the expansion are higher.
As a preference, in step (5) the degree of association between a word and the seed words is calculated using the idea of mutual information, with the formula:

MI(word1, word2) = log( p(word1, word2) / (p(word1) · p(word2)) )

where p(word1, word2) is the probability that word word1 and word word2 occur together, and p(word1) and p(word2) are the probabilities that word1 and word2 occur, respectively. The mutual information algorithm is used to analyze the association between words; it is simple to implement and computationally efficient. Mutual information is an analysis method from computational linguistics that measures the mutual dependence between two objects, and in filtering problems it is used to measure the discriminative power of a feature for a topic. When constructing the domain dictionary, on the basis of the selected seed words, the correlation between each candidate word and the existing seed words is calculated by the mutual information method; the higher the correlation, the stronger the relevance of the word to the seed words.
As a preference, the threshold of step (5) is set to MI(word1, word2) = 0.2; when the degree of association between a candidate word in the text set and the seed words is >= 0.2, the word is added as an expansion word to the dictionary being built. The calculation process of step (5) is shown in Fig. 2.
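The step (5) process, scoring every candidate word against the seed words and keeping those at or above the 0.2 threshold, could be sketched as follows; the MI-style scorer and all names here are illustrative assumptions, not the patent's exact implementation:

```python
import math

def association(texts, w1, w2):
    """Illustrative MI-style association, with a text as the unit."""
    n = len(texts)
    p1 = sum(w1 in t for t in texts) / n
    p2 = sum(w2 in t for t in texts) / n
    p12 = sum(w1 in t and w2 in t for t in texts) / n
    return math.log(p12 / (p1 * p2)) if p12 else float("-inf")

def expand_dictionary(source_texts, seed_words, threshold=0.2):
    """Add every candidate word whose association with any seed word
    reaches the threshold to the dictionary being built (step 5).

    source_texts: list of sets of words, one set per retained source text.
    """
    candidates = set().union(*source_texts) - set(seed_words)
    dictionary = set(seed_words)
    for cand in candidates:
        if any(association(source_texts, seed, cand) >= threshold
               for seed in seed_words):
            dictionary.add(cand)
    return dictionary
```

The seed words are kept in the result, and only candidates that clear the 0.2 threshold against at least one seed word are added.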
Claims (9)
1. A domain dictionary construction method, characterised in that it comprises the following steps:
(1) extracting the keywords of each text in the text set to be processed;
(2) clustering the texts to be processed to generate N topic text sets, where N is an integer and N >= 2;
(3) selecting seed words for the target field;
(4) counting the frequency with which the seed words occur in each topic text set, and retaining the topic text sets whose frequency exceeds a threshold as the source text sets for dictionary expansion;
(5) calculating the degree of association between the seed words and each candidate word in the texts of the source text sets, and storing the candidate words whose degree of association reaches the threshold into the dictionary to be expanded as domain words.
2. the method for claim 1, it is characterised in that include before the step (1):Participle, go high frequency words, go to stop
The pre-treatment step of word.
3. the method for claim 1, in the step (1) keyword, the public affairs are extracted using following computing formula
Formula is:
TR(vi) it is word v in textiImportance, d is damped coefficient, and it is all words in non-directed graph to be traditionally arranged to be 0.85, N
Number, relat { viBe and word viThere are the set of words of cooccurrence relation, vjIt is relat { viIn any one word, TR (vj) it is vj
Importance, N (pj) be and vjThere is the number of the word of cooccurrence relation.
4. The method as claimed in claim 3, characterised in that in step (2) the clustering of the texts to be processed comprises the following process:
(2-1) initially, each text to be processed forms a class of its own; the between-class distance is defined as the maximum of the pairwise distances between the texts of the two classes, where the between-text distance C(t1, t2) between text 1 and text 2 is computed from t1 ∩ t2, the number of keywords shared by text 1 and text 2, and mid(t1, t2), the mean number of keywords in text 1 and text 2; the between-class distance formula is
Dist(c_a, c_b) = max{ C(t_a, t_b) : t_a ∈ c_a, t_b ∈ c_b }
where Dist(c_a, c_b) is the distance between any two clusters, c_a and c_b are the two classes, C(t_a, t_b) is the distance between two texts, and t_a and t_b are texts with t_a ∈ c_a and t_b ∈ c_b;
(2-2) the pairwise distances between all classes are calculated, and the two classes with the minimum distance are merged and named cnew;
(2-3) the merged clusters are deleted from the text set to be processed, and the new cluster cnew is added to the clustering result;
(2-4) steps (2-1) to (2-3) are repeated, and the clustering stops when the text set to be processed contains only N clusters.
5. The method as claimed in claim 4, characterised in that the degree of association between a candidate word and a seed word in step (5) is calculated as
MI(word1, word2) = log( p(word1, word2) / (p(word1) · p(word2)) )
where p(word1, word2) is the probability that word1 and word2 occur together, and p(word1) and p(word2) are the probabilities that word1 and word2 occur, respectively.
6. The method as claimed in claim 5, characterised in that in step (2), N = 3.
7. The method as claimed in claim 6, characterised in that in step (3) the number of selected seed words is 50-200.
8. The method as claimed in claim 7, characterised in that in step (4) only the topic text set in which the seed words occur most frequently is retained as the source text set for dictionary expansion.
9. The method as claimed in claim 8, characterised in that in step (5) the threshold for the degree of association between a candidate word and the seed words is set to 0.2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611149314.3A CN106649662A (en) | 2016-12-13 | 2016-12-13 | Construction method of domain dictionary |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106649662A true CN106649662A (en) | 2017-05-10 |
Family
ID=58825933
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611149314.3A Pending CN106649662A (en) | 2016-12-13 | 2016-12-13 | Construction method of domain dictionary |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106649662A (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103559174A (en) * | 2013-09-30 | 2014-02-05 | 东软集团股份有限公司 | Semantic emotion classification characteristic value extraction method and system |
CN105893444A (en) * | 2015-12-15 | 2016-08-24 | 乐视网信息技术(北京)股份有限公司 | Sentiment classification method and apparatus |
Non-Patent Citations (6)
Title |
---|
LIZHOU ZHENG: "Multi-dimensional Sentiment Analysis for Large-Scale E-commerce Reviews", International Conference on Database and Expert Systems Applications |
TANG Haohao et al.: "A word-affinity-based algorithm for identifying the semantic orientation of microblog words", Journal of Data Acquisition and Processing |
DONG Lili et al.: "Orientation analysis of product reviews based on domain ontology and sentiment lexicon", Computer Applications and Software |
JIANG Shengyi et al.: "Research on social emotion lexicon construction and emotion analysis methods for microblogs", Journal of Chinese Information Processing |
ZHAO Jun et al.: "An improved microblog orientation analysis method fusing an association lexicon", Journal of Data Acquisition and Processing |
GU Yijun: "Research on keyword extraction fusing LDA and TextRank", New Technology of Library and Information Service |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107402909A (en) * | 2017-06-16 | 2017-11-28 | 合肥龙图腾信息技术有限公司 | A kind of encyclopaedia content input method and system |
CN107992509A (en) * | 2017-10-12 | 2018-05-04 | 如是科技(大连)有限公司 | method and device for generating job dictionary information |
CN107992509B (en) * | 2017-10-12 | 2022-05-13 | 如是人力科技集团股份有限公司 | Method and device for generating job dictionary information |
CN108038101A (en) * | 2017-12-07 | 2018-05-15 | 杭州迪普科技股份有限公司 | A kind of recognition methods for distorting text and device |
CN108038101B (en) * | 2017-12-07 | 2021-04-27 | 杭州迪普科技股份有限公司 | Method and device for identifying tampered text |
CN110704638A (en) * | 2019-09-30 | 2020-01-17 | 南京邮电大学 | Clustering algorithm-based electric power text dictionary construction method |
CN115859948A (en) * | 2022-06-14 | 2023-03-28 | 北京中关村科金技术有限公司 | Method, device and storage medium for mining domain vocabulary based on correlation analysis algorithm |
CN115080752A (en) * | 2022-08-18 | 2022-09-20 | 湖南大学 | Numerical value feature discovery method and system based on automatic acquisition of feature field knowledge |
CN115080752B (en) * | 2022-08-18 | 2022-12-02 | 湖南大学 | Numerical value feature discovery method and system based on automatic acquisition of feature field knowledge |
CN115270774A (en) * | 2022-09-27 | 2022-11-01 | 吉奥时空信息技术股份有限公司 | Big data keyword dictionary construction method for semi-supervised learning |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20170510 |