CN105335351B - Method for automatically mining synonyms based on user behavior in patent search logs - Google Patents
- Publication number
- CN105335351B (application number CN201510701365.1A)
- Authority
- CN
- China
- Prior art keywords
- log
- synonym
- patent search
- word
- search log
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/247—Thesauruses; Synonyms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a method for automatically mining synonyms based on user behavior in patent search logs, comprising the following steps: step 1) preprocessing the patent search log and obtaining a candidate synonym set using structural templates of synonym sets in the patent search log; step 2) extracting the literal features, pronunciation features and query features of the candidate synonyms in the candidate synonym set. By selecting literal, pronunciation and query features, the method can effectively improve the accuracy of synonym recognition in the patent search log domain and can well meet the needs of practical application.
Description
Technical field
The invention belongs to the technical field of Chinese information retrieval, and in particular relates to a method for automatically mining synonyms based on user behavior in patent search logs.
Background art
With the rapid development of science and technology, new high-tech products are entering the market in ever greater numbers, and patent information, a special information resource that combines legal, technical and economic content, has attracted wide attention. Patent search engines are widely used as a basic means of querying patent information. Whether a user can retrieve satisfactory information depends closely on the search engine's thesaurus, and synonyms are one component of that thesaurus. Research on synonym mining is therefore particularly important for enabling users to retrieve more comprehensive and detailed patent information.
Patent search logs contain a large number of misspelled words. Some misspellings are widely used, and such a word and its correct counterpart are also treated as synonyms, for example carbon nanotube and carbon nanotube, or Yoga and yoga. In addition, patent search logs contain many out-of-vocabulary words, so existing synonym resources such as "HowNet" and the "Chinese Thesaurus" cannot be used for synonym mining in patent search logs. Traditionally, synonyms are defined as different expressions for the same thing. Based on an analysis of the vocabulary in patent search logs, synonyms in the patent domain can be roughly divided into the following eight categories:
1) Chinese-English: two expressions of the same concept in different languages, e.g. Zinc-Zn, Email-email;
2) scientific name-common name: the formal term and the everyday term for the same thing, e.g. ethyl alcohol-alcohol;
3) full name-abbreviation: the original name and a shortened name of the same thing, e.g. Peking University-Beijing University, short message-short message, timestamp-time stamp;
4) homophone synonyms: mainly caused by frequently used misspellings, e.g. Yoga-yoga, gamma-gamma, Bezafibrate-bezafibrate, automobile-Gas vehicle;
5) new name-former name: two names used for the same concept in different periods, e.g. bicycle-bicycle;
6) traditional synonyms: words that denote the same concept and do not belong to the above categories, e.g. chitin-chitin, threshold value-thresholding;
7) antonyms: words with diametrically opposite concepts, e.g. go out-enter, increase-reduction, turn left-turn right;
8) translation-induced synonyms: renderings of English words whose pronunciations are roughly the same, e.g. Epcos AG-Epcos AG, Rosemount Inc-Rosemount Company.
Synonym resources are now widely used in many fields, such as information retrieval, word sense disambiguation, query expansion, keyword extraction and machine translation. As these applications have spread, many methods for automatically mining synonyms have emerged. At present there are two main approaches: corpus-based synonym mining and dictionary-based synonym mining. Both have shortcomings: corpus-based methods are prone to matrix sparsity, and dictionary-based methods are easily restricted by domain and therefore do not perform well.
Summary of the invention
In view of the above problems in the prior art, the object of the present invention is to provide a method for automatically mining synonyms based on user behavior in patent search logs that avoids the above technical defects.
To achieve the above object, the invention adopts the following technical solution:
A method for automatically mining synonyms based on user behavior in patent search logs comprises the following steps:
Step 1) preprocessing the patent search log and obtaining a candidate synonym set using structural templates of synonym sets in the patent search log;
Step 2) extracting the literal features, pronunciation features and query features of the candidate synonyms in the candidate synonym set.
Further, step 1) specifically comprises:
Step A: filtering useless query strings, using regular expressions to remove from the patent search log the queries made by application number, publication number or classification number;
Step B: converting full-width characters to half-width and traditional Chinese characters to simplified in the patent search log;
Step C: extracting the synonym structures in the patent search log according to the structural templates of the candidate synonym set;
Step D: filtering person-name information according to person-name identifier rules to obtain the candidate synonym set.
Further, the literal features comprise five features: maximum similarity, minimum similarity, center-of-gravity backward-shift similarity, whether the words have the same prefix, and whether the words have the same suffix, wherein:
The maximum similarity is calculated as follows:
The minimum similarity is calculated as follows:
The center-of-gravity backward-shift similarity is calculated as follows:
wherein Sim_zimian_max(w1, w2) denotes the maximum similarity of the word pair (w1, w2); Sim_zimian_min(w1, w2) denotes the minimum similarity of the word pair (w1, w2); Sim_zimian_zhongxin(w1, w2) denotes the center-of-gravity backward-shift similarity of the word pair (w1, w2); same(w1, w2) denotes the number of characters shared by the word pair (w1, w2); min(|w1|, |w2|) denotes the length of the shorter word in the pair; max(|w1|, |w2|) denotes the length of the longer word in the pair; |w1| denotes the length of w1; |w2| denotes the length of w2; the position-weight term denotes the sum of the weights of the shared characters at their positions in a word; k denotes the number of characters in the word; same(w1, m) denotes the position of a shared character; and α = 0.6, β = 0.4, γ = 1.
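One reading of the first two measures that is consistent with the symbol definitions above is sketched below. It is an assumption inferred from those definitions, not the patent's own wording. The center-of-gravity backward-shift measure additionally combines position weights with α, β and γ and is not reconstructed here.

```latex
\[
\mathrm{Sim\_zimian}_{\max}(w_1, w_2) = \frac{\mathrm{same}(w_1, w_2)}{\min(|w_1|, |w_2|)},
\qquad
\mathrm{Sim\_zimian}_{\min}(w_1, w_2) = \frac{\mathrm{same}(w_1, w_2)}{\max(|w_1|, |w_2|)}
\]
```

Under this reading, dividing by the shorter length gives the larger value, which matches the names "maximum" and "minimum" similarity.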
Further, the pronunciation similarity of the pronunciation feature is calculated as follows:
wherein the terms of the formula denote, respectively, the pronunciation of w1, the minimum edit distance between the pronunciations of the word pair (w1, w2), the maximum pronunciation length in the word pair (w1, w2), and the pronunciation similarity of the word pair (w1, w2).
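A common way to turn an edit distance into a similarity from exactly these three quantities is shown below. The symbol names p_w, ED and Sim_pinyin are hypothetical, and the form is an assumption rather than the patent's stated formula.

```latex
\[
\mathrm{Sim}_{\mathrm{pinyin}}(w_1, w_2)
  = 1 - \frac{\mathrm{ED}\bigl(p_{w_1}, p_{w_2}\bigr)}{\max\bigl(|p_{w_1}|, |p_{w_2}|\bigr)}
\]
```

Here p_w stands for the pronunciation string of word w and ED for the minimum edit distance, so identical pronunciations give a similarity of 1.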
Further, the words that appear in the same row of the patent search log are used as a query feature, and the query feature value is calculated using the following formula:
(w1, w2) ∈ row denotes that the word pair (w1, w2) appears in the same row of the patent search log, and (w1, w2) ∉ row denotes that the word pair (w1, w2) does not appear in the same row of the patent search log.
By selecting literal features, pronunciation features and query features, the method for automatically mining synonyms based on user behavior in patent search logs provided by the invention can effectively improve the accuracy of synonym recognition in the patent search log domain and can well meet the needs of practical application.
Description of the drawings
Fig. 1 is the flow chart of the present invention;
Fig. 2 is the specific steps flow chart of step 1);
Fig. 3 shows the linearly separable samples obtained by transforming linearly inseparable data with a Gaussian kernel function, where the circled points are support vectors.
Detailed description
To make the object, technical solution and advantages of the invention clearer, the invention is further described below with reference to the drawings and specific embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
As shown in Fig. 1, a method for automatically mining synonyms based on user behavior in patent search logs comprises the following steps:
Step 1) preprocessing the patent search log and obtaining a candidate synonym set using structural templates of synonym sets in the patent search log;
Step 2) extracting the literal features, pronunciation features and query features of the candidate synonyms in the candidate synonym set.
Most query strings in a patent search log contain several ways of describing the same thing. These descriptions are connected by logical operators such as "or", "and" and "not", and some of the words joined by these operators stand in a coordinate relationship. The way synonyms are distributed in the patent search log was obtained by analysis and is shown in Table 1.
Table 1: Processed patent search log corpus
Five synonym-set structural templates were constructed:
1. Template 1
"'words1' OR 'words2' OR 'words3'", where words1, words2 and words3 form a candidate synonym set. The template joins terms with "OR" or "or" and is the simplest synonym-set structural template, as shown in row 18 of Table 1;
2. Template 2
"('words1' pre/2 'words2') OR 'words3'", where "'words1' pre/2 'words2'" indicates that the phrase formed by words1 and words2 and the words3 joined to it by "OR" are candidate synonyms, i.e. words1+words2 and words3 are candidate synonyms, as shown in rows 19, 24 and 26 of Table 1;
3. Template 3
"'words1' OR ('words2' AND 'words3' AND 'words4')", where the query statement "'words2' AND 'words3' AND 'words4'" indicates that words2, words3 and words4 form a phrase which, together with the words1 joined to it by "OR", constitutes candidate synonyms, i.e. words1 and words2+words3+words4 are candidate synonyms, as shown in row 27 of Table 1;
4. Template 4
"'words1' OR ('words2' and/sen 'words3')", where words1 and words2+words3 are candidate synonyms, as shown in row 29 of Table 1;
5. Template 5
"identifier='words1' OR identifier='words2' OR identifier='words3'", where the identifier is usually "DESC1", "KWRF", "TICN", "ABS", "TI, KW+" or the like and refers to a particular property of the query word. Here words1, words2 and words3 form a candidate synonym set whose members have the same property, as shown in rows 17, 20, 22, 23 and 28 of Table 1.
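As a rough illustration of how Template 1 could be matched, the sketch below uses two regular expressions. The function name `extract_template1` and both patterns are hypothetical; the source does not give the expressions it uses, and real log rows may need looser quoting rules.

```python
import re

# Hypothetical patterns (not from the source): a quoted term, and a whole
# query consisting only of quoted terms joined by OR / or.
QUOTED_TERM = re.compile(r"'([^']+)'")
TEMPLATE_1 = re.compile(r"^\s*'[^']+'(\s*OR\s*'[^']+')+\s*$", re.IGNORECASE)


def extract_template1(query: str) -> list[str]:
    """Return the quoted terms if the query matches Template 1, else []."""
    if not TEMPLATE_1.match(query):
        return []
    return QUOTED_TERM.findall(query)


# A Template 1 query yields its terms as one candidate synonym set.
print(extract_template1("'words1' OR 'words2' or 'words3'"))
# A Template 2 query (parenthesised phrase) is left for another matcher.
print(extract_template1("('words1' pre/2 'words2') OR 'words3'"))
```

Templates 2 to 5 would need their own patterns; keeping one matcher per template makes it easy to record which template produced each candidate set.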
The candidate synonym set is obtained using the structural templates of synonym sets in the patent search log. First, regular expressions are used to remove from the patent search log the queries made by application number, publication number or classification number. Because the query records were entered with inconsistent input methods and fonts, the log is converted from full-width to half-width characters and from traditional to simplified Chinese. The patent search log is then split according to synonym-set structural templates 1 to 5. The flow chart for mining the candidate synonym set in step 1) is shown in Fig. 2. Step 1) specifically comprises:
Step A: filtering useless query strings, using regular expressions to remove from the patent search log the queries made by application number, publication number or classification number;
Step B: converting full-width characters to half-width and traditional Chinese characters to simplified in the patent search log;
Step C: extracting the synonym structures in the patent search log according to the structural templates of the candidate synonym set;
Step D: filtering person-name information according to person-name identifier rules to obtain the candidate synonym set.
When useless query strings are filtered, queries made by title, address, applicant, inventor, patent agency and the like are retained. Analysis of the patent search log showed that the synonym templates alone cannot filter out person-name information. To improve the accuracy and recall of the synonym table, identifier rules that indicate person names, such as "INV", "ATCN" and "GK_IN", are extracted, and interfering person-name information, such as the content of rows 17 and 23 in Table 1, is filtered out according to these rules. The processed patent search log contains nearly 30,000 query words, including Chinese, English and Japanese vocabulary.
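Steps A and B can be sketched as follows. The `NUMBER_QUERY` pattern is an assumed shape for application, publication and classification numbers, not the expression used in the source. `unicodedata.normalize("NFKC", ...)` is used for width folding, although it also folds other compatibility characters. Traditional-to-simplified conversion is left out because it needs a character mapping table or a third-party library.

```python
import re
import unicodedata

# Assumed shapes of number-only queries, e.g. "CN201510701365.1",
# "CN105335351B" or an IPC-style classification such as "G06F 17/30".
NUMBER_QUERY = re.compile(
    r"^\s*(CN\s*\d{6,}(\.\d|[A-Z]\d?)?|[A-H]\d{2}[A-Z]\s*\d+/\d+)\s*$",
    re.IGNORECASE,
)


def to_half_width(text: str) -> str:
    """Fold full-width letters, digits and punctuation to half-width forms."""
    return unicodedata.normalize("NFKC", text)


def preprocess(lines):
    """Yield cleaned query strings, skipping empty and number-only queries."""
    for line in lines:
        query = to_half_width(line).strip()
        if not query or NUMBER_QUERY.match(query):
            continue
        yield query


raw = ["\uff23\uff2e105335351\uff22", "'\uff41' \uff2f\uff32 '\uff42'", "G06F 17/30"]
print(list(preprocess(raw)))
```

Normalizing before filtering means a number typed in full-width characters is still recognized and dropped.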
In table 1 shown in 18 rows, candidate synonym collection is:Chitin chitin chitosan, then candidate synonym is to just having 3
It is right, i.e.,:Chitin chitin;Chitin chitosan;Chitin chitosan.Synonym in patent search daily record is made full use of to be distributed
The characteristics of, the accuracy rate of the candidate synonym collection of acquisition is also relatively high.
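Expanding a candidate synonym set into word pairs amounts to taking every unordered pair of its members, so a three-term set yields three pairs. A minimal sketch follows; the function name and the placeholder terms are illustrative only.

```python
from itertools import combinations


def candidate_pairs(candidate_set):
    """Return every unordered pair of distinct terms in a candidate synonym set."""
    unique_terms = list(dict.fromkeys(candidate_set))  # drop repeats, keep order
    return list(combinations(unique_terms, 2))


pairs = candidate_pairs(["term_a", "term_b", "term_c"])
print(pairs)
print(len(pairs))  # a set of n distinct terms gives n * (n - 1) / 2 pairs
```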
The basic idea of the support vector machine (SVM) model is to define an optimal hyperplane and to reduce the search for that hyperplane to solving a convex programming problem. Then, based on the expansion theorem for Mercer kernels, a nonlinear mapping maps the sample space into a high-dimensional or even infinite-dimensional feature space (a Hilbert space), so that linear learning methods in the feature space can solve highly nonlinear classification, regression and density estimation problems in the sample space. The model performs particularly well on text classification, where the recall and accuracy obtained with this method are better than those of other methods.
A classification problem, also called a pattern recognition problem, consists of finding the classification relationship inherent in existing observed data and then using the resulting classification model y = M(x) to test the data to be predicted. Synonym recognition is a binary classification: it consists of finding a suitable function y = f(x) that assigns each x_i with f(x_i) ≥ 0 to the positive class and each x_i with f(x_i) < 0 to the negative class.
Commonly used kernel functions for the support vector machine model include the following:
1. Polynomial kernel: K(x, x_i) = (a (x · x_i) + c)^d    (1)
Fig. 3 shows the linearly separable samples obtained by transforming linearly inseparable data with a Gaussian kernel function, where the circled points are support vectors.
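The two kernels mentioned here can be written out directly. The sketch below takes the polynomial kernel in the form of formula (1) and the Gaussian kernel in its usual textbook form. The default parameter values and the name `sigma` are assumptions chosen for illustration, not settings reported in the source.

```python
import math


def polynomial_kernel(x, y, a=1.0, c=1.0, d=2):
    """K(x, y) = (a * <x, y> + c) ** d, following formula (1)."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    return (a * dot + c) ** d


def gaussian_kernel(x, y, sigma=1.0):
    """K(x, y) = exp(-||x - y||^2 / (2 * sigma^2)), the usual Gaussian (RBF) form."""
    squared_distance = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


print(polynomial_kernel([1.0, 2.0], [3.0, 4.0]))  # (1*11 + 1)^2
print(gaussian_kernel([0.0, 0.0], [1.0, 1.0]))    # exp(-1)
```

Both functions take plain sequences of numbers, so a word pair's feature vector can be passed in without any extra library.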
In machine learning methods, feature selection is very important for classification. The candidate synonyms selected by the invention are all word pairs with similar meanings, so they are difficult to classify using simple features alone. The invention therefore considers not only literal features but also pronunciation similarity and user query behavior.
An obvious characteristic of synonyms is that they usually share morphemes, for example Peking University and Beijing University, running shoes and running shoe, timestamp value and time stamp. Literal similarity features are therefore considered when using the support vector machine. The literal features mainly comprise five features: maximum similarity, minimum similarity, center-of-gravity backward-shift similarity, whether the words have the same prefix, and whether the words have the same suffix. The similarity formulas for the first three features are as follows:
The maximum similarity is calculated as
The minimum similarity is calculated as
The center-of-gravity backward-shift similarity is calculated as
wherein Sim_zimian_max(w1, w2) denotes the maximum similarity of the word pair (w1, w2); Sim_zimian_min(w1, w2) denotes the minimum similarity of the word pair (w1, w2); Sim_zimian_zhongxin(w1, w2) denotes the center-of-gravity backward-shift similarity of the word pair (w1, w2); same(w1, w2) denotes the number of characters shared by the word pair (w1, w2); min(|w1|, |w2|) denotes the length of the shorter word in the pair; max(|w1|, |w2|) denotes the length of the longer word in the pair; |w1| denotes the length of w1; |w2| denotes the length of w2; the position-weight term denotes the sum of the weights of the shared characters at their positions in a word; k denotes the number of characters in the word; same(w1, m) denotes the position of a shared character; and α = 0.6, β = 0.4, γ = 1. Table 2 below lists an example of the literal features:
Table 2: Literal features
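The sketch below computes four of the five literal features under stated assumptions: same(w1, w2) is taken as a multiset intersection of characters, maximum and minimum similarity divide it by the shorter and the longer word length respectively, and "same prefix" and "same suffix" are read as equal first and last characters. None of these choices is confirmed by the text above. The center-of-gravity feature is omitted because its formula cannot be recovered from the definitions alone.

```python
from collections import Counter


def shared_char_count(w1: str, w2: str) -> int:
    """Count characters common to both words (multiset intersection; an assumption)."""
    return sum((Counter(w1) & Counter(w2)).values())


def literal_features(w1: str, w2: str) -> dict:
    """Four literal features of a word pair; the centre-of-gravity one is omitted."""
    if not w1 or not w2:
        raise ValueError("both words must be non-empty")
    same = shared_char_count(w1, w2)
    return {
        "sim_max": same / min(len(w1), len(w2)),  # assumed: same / shorter length
        "sim_min": same / max(len(w1), len(w2)),  # assumed: same / longer length
        "same_prefix": w1[0] == w2[0],            # assumed: first characters match
        "same_suffix": w1[-1] == w2[-1],          # assumed: last characters match
    }


# Illustrative full name / abbreviation pair (not taken from Table 2).
print(literal_features("北京大学", "北大"))
```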
The logs contain many misspelled words, some of which are used heavily, so such word pairs are treated as synonyms. These word pairs have one thing in common: their pronunciations are similar, for example Fourier and Fourier, saltcake and mirabilite, Yoga and yoga. The pronunciation of each word is obtained by parsing Sogou cell lexicons. The pronunciation similarity of the pronunciation feature in step 2) is calculated as follows:
wherein the terms of the formula denote, respectively, the pronunciation of w1, the minimum edit distance between the pronunciations of the word pair (w1, w2), the maximum pronunciation length in the word pair (w1, w2), and the pronunciation similarity of the word pair (w1, w2). Table 3 below lists an example of the pronunciation features:
Table 3: Pronunciation features
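A minimal sketch of a pronunciation similarity built from the quantities named above (minimum edit distance and maximum pronunciation length) follows. The normalization 1 - distance / max length is an assumption, and the pinyin strings are supplied by hand, whereas the source obtains pronunciations from a lexicon.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def pronunciation_similarity(pinyin1: str, pinyin2: str) -> float:
    """Assumed form: 1 - edit distance / length of the longer pronunciation."""
    longest = max(len(pinyin1), len(pinyin2))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(pinyin1, pinyin2) / longest


# Homophones have identical pinyin, so their similarity is 1.0.
print(pronunciation_similarity("qiche", "qiche"))
print(pronunciation_similarity("yujia", "yuja"))
```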
Query words that appear in the same row of the patent search log are similar or related words, because they are all different descriptions of the same patent.
The words that appear in the same row of the patent search log are used as a query feature. Part of the processed patent search log query information is shown in Table 4.
Table 4: Part of the processed patent search log query strings
Table 4 shows that query words in the same row are more likely to be synonyms. The query feature in step 2) is calculated as follows:
(w1, w2) ∈ row denotes that the word pair (w1, w2) appears in the same row of the patent search log, and (w1, w2) ∉ row denotes that the word pair (w1, w2) does not appear in the same row of the patent search log. Table 5 shows the query feature values of some word pairs from Table 4:
Table 5: Query features
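The query feature is an indicator of whether two words share a log row. The sketch below assumes the indicator takes the values 1 and 0, which the text above does not state, and represents each row as a list of already extracted terms. Both function names are hypothetical.

```python
def build_cooccurrence(rows):
    """Collect every unordered pair of terms that appear in the same log row."""
    seen = set()
    for terms in rows:
        unique = sorted(set(terms))
        for i, first in enumerate(unique):
            for second in unique[i + 1:]:
                seen.add((first, second))
    return seen


def query_feature(w1, w2, cooccurring_pairs):
    """Return 1 if the pair shares a row, else 0 (the 1/0 values are an assumption)."""
    return 1 if tuple(sorted((w1, w2))) in cooccurring_pairs else 0


rows = [["term_a", "term_b", "term_c"], ["term_d", "term_e"]]
pairs = build_cooccurrence(rows)
print(query_feature("term_b", "term_a", pairs))  # same row
print(query_feature("term_a", "term_d", pairs))  # different rows
```

Storing pairs in sorted order makes the lookup independent of the order in which the two words are given.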
The following embodiment uses a patent search log provided by a patent search system, with a total size of 10 GB. First, the patent search log is preprocessed and the candidate synonym set is extracted according to how synonyms occur in the log. Then the literal features, pronunciation features and query features of the word pairs in the processed log are extracted. A manually labeled set of 4,741 word pairs is used as the training corpus, comprising 2,108 synonym pairs and 2,633 non-synonym pairs, with synonym pairs labeled "1" and non-synonym pairs labeled "-1".
Literal, pronunciation and query features are added in turn for the experiments. The change in the feature weight factors for each feature combination is shown in Table 6:
Table 6: Changes in feature weight factors
Here feature combination 1 is the literal features; feature combination 2 is the literal features plus the pronunciation feature; feature combination 3 is the literal features plus the pronunciation feature plus the query feature. The results for each feature combination are shown in Table 7:
Table 7: SVM model experimental results
Table 7 shows that accuracy, recall and F value all improve with feature combination 3, so the method adopts feature combination 3. The method of the invention was compared with methods commonly used in the prior art; the experimental results are shown in Tables 8 and 9.
Table 8: Experimental statistics
Table 9: Comparative experimental results
Here the number of identified word pairs refers to the number of word pairs in the mined synonym table.
Tables 8 and 9 show that, as each feature is added, the accuracy, recall and F value of synonym recognition with the method of the invention are higher than those of the prior art. The invention can therefore effectively improve the accuracy of synonym recognition in the patent search log domain by selecting literal features, pronunciation features and query features.
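The three reported measures can be computed from a set of predicted synonym pairs and a gold-standard set as sketched below. This assumes that the "accuracy" reported alongside recall and F value corresponds to what is usually called precision, and that the F value is the balanced F1. The source does not define either explicitly.

```python
def precision_recall_f1(predicted: set, gold: set):
    """Return (precision, recall, F1) for predicted vs. gold synonym pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example with integer ids standing in for word pairs.
print(precision_recall_f1({1, 2, 3, 4}, {3, 4, 5}))
```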
The embodiments described above express only some implementations of the invention. Their description is specific and detailed, but it should not be understood as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, and these all fall within the scope of protection of the invention. The scope of protection of this patent is therefore determined by the appended claims.
Claims (3)
1. A method for automatically mining synonyms based on user behavior in patent search logs, characterized in that it comprises the following steps:
Step 1) preprocessing the patent search log and obtaining a candidate synonym set using structural templates of synonym sets in the patent search log;
Step 2) extracting the literal features, pronunciation features and query features of the candidate synonyms in the candidate synonym set;
Step 3) using a number of manually labeled word pairs as the training corpus, with synonym pairs labeled "1" and non-synonym pairs labeled "-1";
Step 4) feeding the literal features, pronunciation features and query features into an SVM model to recognize word pairs;
wherein the words that appear in the same row of the patent search log are used as a query feature, and the query feature value is calculated using the following formula:
wherein
(w1, w2) ∈ row denotes that the word pair (w1, w2) appears in the same row of the patent search log, and (w1, w2) ∉ row denotes that the word pair (w1, w2) does not appear in the same row of the patent search log.
2. The method for automatically mining synonyms based on user behavior in patent search logs according to claim 1, characterized in that step 1) specifically comprises:
Step A: filtering useless query strings, using regular expressions to remove from the patent search log the queries made by application number, publication number or classification number;
Step B: converting full-width characters to half-width and traditional Chinese characters to simplified in the patent search log;
Step C: extracting the synonym structures in the patent search log according to the structural templates of the candidate synonym set;
Step D: filtering person-name information according to person-name identifier rules to obtain the candidate synonym set.
3. The method for automatically mining synonyms based on user behavior in patent search logs according to claim 1, characterized in that the pronunciation similarity of the pronunciation feature is calculated as follows:
wherein the terms of the formula denote, respectively, the pronunciation of w1, the minimum edit distance between the pronunciations of the word pair (w1, w2), the maximum pronunciation length in the word pair (w1, w2), and the pronunciation similarity of the word pair (w1, w2).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510701365.1A CN105335351B (en) | 2015-10-27 | 2015-10-27 | A kind of synonym automatic mining method based on patent search daily record user behavior |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510701365.1A CN105335351B (en) | 2015-10-27 | 2015-10-27 | A kind of synonym automatic mining method based on patent search daily record user behavior |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105335351A CN105335351A (en) | 2016-02-17 |
CN105335351B true CN105335351B (en) | 2018-08-28 |
Family
ID=55285896
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510701365.1A Active CN105335351B (en) | 2015-10-27 | 2015-10-27 | A kind of synonym automatic mining method based on patent search daily record user behavior |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105335351B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110111778B (en) * | 2019-04-30 | 2021-11-12 | 北京大米科技有限公司 | Voice processing method and device, storage medium and electronic equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102622338A (en) * | 2012-02-24 | 2012-08-01 | 北京工业大学 | Computer-assisted computing method of semantic distance between short texts |
CN102760134A (en) * | 2011-04-28 | 2012-10-31 | 北京百度网讯科技有限公司 | Method and device for mining synonyms |
CN102955774A (en) * | 2012-05-30 | 2013-03-06 | 华东师范大学 | Control method and device for calculating Chinese word semantic similarity |
CN103870489A (en) * | 2012-12-13 | 2014-06-18 | 北京信息科技大学 | Chinese name self-extension recognition method based on search logs |
CN103942339A (en) * | 2014-05-08 | 2014-07-23 | 深圳市宜搜科技发展有限公司 | Synonym mining method and device |
-
2015
- 2015-10-27 CN CN201510701365.1A patent/CN105335351B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102760134A (en) * | 2011-04-28 | 2012-10-31 | 北京百度网讯科技有限公司 | Method and device for mining synonyms |
CN102622338A (en) * | 2012-02-24 | 2012-08-01 | 北京工业大学 | Computer-assisted computing method of semantic distance between short texts |
CN102955774A (en) * | 2012-05-30 | 2013-03-06 | 华东师范大学 | Control method and device for calculating Chinese word semantic similarity |
CN103870489A (en) * | 2012-12-13 | 2014-06-18 | 北京信息科技大学 | Chinese name self-extension recognition method based on search logs |
CN103942339A (en) * | 2014-05-08 | 2014-07-23 | 深圳市宜搜科技发展有限公司 | Synonym mining method and device |
Non-Patent Citations (4)
Title |
---|
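Synonym recognition in the patent domain; Li Junfeng et al.; Journal of Chinese Computer Systems (《小型微型计算机系统》); 20150430; Vol. 36 (No. 4); Sections 2-3 of the paper * |
Experiments on recognizing Chinese synonyms using literal similarity; Hou Hanqing et al.; Proceedings of the 15th National Symposium on Computer Information Management (《第15届全国计算机信息管理学术研讨会论文集》); 20011231; full text * |
Synonym mining based on patent search logs; Wang Ying et al.; Computer Engineering and Design (《计算机工程与设计》); 20130331; Vol. 34 (No. 3); Sections 2-3 of the paper * |
A method for computing Chinese word-sense similarity based on a large-scale corpus; Shi Jing et al.; Journal of Chinese Information Processing (《中文信息学报》); 20130131; Vol. 27 (No. 1); full text * |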
Also Published As
Publication number | Publication date |
---|---|
CN105335351A (en) | 2016-02-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107133213B (en) | Method and system for automatically extracting text abstract based on algorithm | |
CN104199972B (en) | A kind of name entity relation extraction and construction method based on deep learning | |
CN107193801B (en) | Short text feature optimization and emotion analysis method based on deep belief network | |
CN107463607B (en) | Method for acquiring and organizing upper and lower relations of domain entities by combining word vectors and bootstrap learning | |
CN109635297B (en) | Entity disambiguation method and device, computer device and computer storage medium | |
CN108536677A (en) | A kind of patent text similarity calculating method | |
CN108920482B (en) | Microblog short text classification method based on lexical chain feature extension and LDA (latent Dirichlet Allocation) model | |
CN107992542A (en) | A kind of similar article based on topic model recommends method | |
CN105701084A (en) | Characteristic extraction method of text classification on the basis of mutual information | |
CN106569993A (en) | Method and device for mining hypernym-hyponym relation between domain-specific terms | |
CN103309852A (en) | Method for discovering compound words in specific field based on statistics and rules | |
CN104281653A (en) | Viewpoint mining method for ten million microblog texts | |
CN103678412A (en) | Document retrieval method and device | |
CN113962293B (en) | LightGBM classification and representation learning-based name disambiguation method and system | |
CN102043808A (en) | Method and equipment for extracting bilingual terms using webpage structure | |
CN101702167A (en) | Method for extracting attribution and comment word with template based on internet | |
CN108763348A (en) | A kind of classification improved method of extension short text word feature vector | |
CN101477518A (en) | Tour field named entity recognition method based on condition random field | |
CN106126502A (en) | A kind of emotional semantic classification system and method based on support vector machine | |
CN113033183B (en) | Network new word discovery method and system based on statistics and similarity | |
CN108108482B (en) | Method for realizing scene reality enhancement in scene conversion | |
CN110188359B (en) | Text entity extraction method | |
CN109062904A (en) | Logical predicate extracting method and device | |
CN104657375A (en) | Image-text theme description method, device and system | |
CN110705292A (en) | Entity name extraction method based on knowledge base and deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |