CN105335351B - Method for automatically mining synonyms based on user behavior in patent search logs - Google Patents

Method for automatically mining synonyms based on user behavior in patent search logs

Info

Publication number
CN105335351B
Authority
CN
China
Prior art keywords
log
synonym
patent search
word
search log
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510701365.1A
Other languages
Chinese (zh)
Other versions
CN105335351A (en)
Inventor
吕学强
周建设
董志安
李雪伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Capital Normal University
Beijing Information Science and Technology University
Original Assignee
Capital Normal University
Beijing Information Science and Technology University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Capital Normal University and Beijing Information Science and Technology University
Priority to CN201510701365.1A
Publication of CN105335351A
Application granted
Publication of CN105335351B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a method for automatically mining synonyms based on user behavior in patent search logs, comprising the following steps: Step 1) preprocess the patent search log and obtain a candidate synonym set using the structural templates for synonym sets in the patent search log; Step 2) extract the literal features, pronunciation features, and query features of the candidate synonyms in the candidate synonym set. By selecting literal, pronunciation, and query features, the method effectively improves the accuracy of synonym recognition in patent search logs and meets the needs of practical applications.

Description

Method for automatically mining synonyms based on user behavior in patent search logs
Technical field
The invention belongs to the technical field of Chinese information retrieval, and in particular relates to a method for automatically mining synonyms based on user behavior in patent search logs.
Background
With the rapid development of science and technology, emerging high-tech products are increasingly flooding the market, and patent information, as a special information resource that combines legal, technical, and economic content, has received considerable attention. Patent search engines are widely used as a basic means of querying patent information. Whether a user can retrieve satisfactory information depends closely on the search engine's thesaurus, and synonyms are one component of that thesaurus. To let users retrieve more comprehensive and detailed patent information, research on synonym mining is therefore particularly important.
Patent search logs contain a large number of misspelled words. Some misspellings are widely used, and such a word and its correct form are also regarded as synonyms, for example variant written forms of "carbon nanotube" or of "yoga". In addition, patent search logs contain many out-of-vocabulary words, so existing synonym resources such as HowNet and the Chinese Thesaurus cannot be used for synonym mining in patent search logs. Synonyms are traditionally defined as different expressions of the same thing. Based on an analysis of the vocabulary in patent search logs, synonyms in the patent field can be roughly divided into the following eight categories:
1) Chinese-English: two expressions in different languages for the same concept, for example zinc and Zn, or e-mail and email;
2) Scientific name and common name: the formal term and the everyday term for the same thing, for example ethanol and alcohol;
3) Full name and abbreviation: the original name and the shortened name of the same thing, for example the full and short forms of "Peking University", "short message", and "timestamp";
4) Homophone synonyms: pairs that arise mainly from frequently used misspellings, for example variant written forms of "yoga", "gamma", and "bezafibrate", or "automobile" written with a homophonous character meaning "gas";
5) New name and former name: two ways of referring to the same concept in different periods, for example a current and a former name for "bicycle";
6) Traditional synonyms: words with the same concept that do not belong to the categories above, for example two names for chitin, or "threshold value" and "threshold";
7) Antonyms: words with diametrically opposed concepts, for example exit and enter, increase and decrease, left turn and right turn;
8) Synonyms arising from translation: renderings of English words with roughly the same pronunciation, for example different renderings of company names such as Epcos AG and Rosemount Inc.
Synonym resources are now widely used in many fields, such as information retrieval, word-sense disambiguation, query expansion, keyword extraction, and machine translation. As applications have spread, methods for automatically mining synonyms have multiplied. At present there are two main approaches: corpus-based and dictionary-based synonym mining. Both have shortcomings: corpus-based methods are prone to the sparse-matrix problem, and dictionary-based methods are easily limited by domain and do not perform well.
Summary of the invention
In view of the problems in the prior art described above, the purpose of the present invention is to provide a method for automatically mining synonyms based on user behavior in patent search logs that avoids those technical shortcomings.
To achieve this purpose, the invention adopts the following technical solution:
A method for automatically mining synonyms based on user behavior in patent search logs, comprising the following steps:
Step 1) preprocess the patent search log and obtain a candidate synonym set using the structural templates for synonym sets in the patent search log;
Step 2) extract the literal features, pronunciation features, and query features of the candidate synonyms in the candidate synonym set.
Further, step 1) is specifically:
Step A: Filter out useless query strings: use regular expressions to remove from the patent search log the queries made by application number, publication number, or classification number;
Step B: Convert full-width characters in the patent search log to half-width and traditional Chinese characters to simplified;
Step C: Extract the synonym structures from the patent search log according to the structural templates of the candidate synonym set;
Step D: Filter out personal-name information according to the name-identifier rules to obtain the candidate synonym set.
Further, the literal features comprise five features: maximum similarity, minimum similarity, center-of-gravity backward-shift similarity, whether the two words share a prefix, and whether the two words share a suffix. The calculation formula of the maximum similarity is as follows:
The calculation formula of the minimum similarity is as follows:
The calculation formula of the center-of-gravity backward-shift similarity is as follows:
Wherein $\mathrm{Sim\_zimian}_{\max}(w_1, w_2)$ denotes the maximum similarity of the word pair $(w_1, w_2)$; $\mathrm{Sim\_zimian}_{\min}(w_1, w_2)$ denotes the minimum similarity of the pair; $\mathrm{Sim\_zimian}_{zhongxin}(w_1, w_2)$ denotes the center-of-gravity backward-shift similarity of the pair; $same(w_1, w_2)$ denotes the number of identical characters in the pair; $\min(|w_1|, |w_2|)$ denotes the length of the shorter word in the pair; $\max(|w_1|, |w_2|)$ denotes the length of the longer word in the pair; $|w_1|$ denotes the length of $w_1$; $|w_2|$ denotes the length of $w_2$; the position-weight term denotes the sum of the weights of the identical characters at their positions in the word; $k$ denotes the number of characters in the word; $same(w_1, m)$ denotes the position of an identical character; and $\alpha = 0.6$, $\beta = 0.4$, $\gamma = 1$.
Further, the pronunciation similarity of the pronunciation feature is calculated as follows:
Wherein the terms of the formula denote, in order, the pronunciation of $w_1$, the minimum edit distance between the pronunciations of the word pair $(w_1, w_2)$, the longer of the two pronunciation lengths in the pair, and the pronunciation similarity of the pair.
Further, whether two words appear in the same row of the patent search log is taken as a query feature, and the query feature value is calculated with the following formula:
$(w_1, w_2) \in row$ means that the word pair $(w_1, w_2)$ occurs in the same row of the patent search log; $(w_1, w_2) \notin row$ means that the word pair $(w_1, w_2)$ does not occur in the same row of the patent search log.
By selecting literal, pronunciation, and query features, the method provided by the invention effectively improves the accuracy of synonym recognition in patent search logs and meets the needs of practical applications.
Brief description of the drawings
Fig. 1 is a flow chart of the present invention;
Fig. 2 is a detailed flow chart of step 1);
Fig. 3 shows linearly inseparable data transformed by a Gaussian kernel function into a linearly separable sample, in which the circled points are support vectors.
Detailed description
To make the purpose, technical solution, and advantages of the present invention clearer, the invention is further described below with reference to the drawings and specific embodiments. It should be understood that the specific embodiments described here only explain the invention and are not intended to limit it.
As shown in Fig. 1, a method for automatically mining synonyms based on user behavior in patent search logs comprises the following steps:
Step 1) preprocess the patent search log and obtain a candidate synonym set using the structural templates for synonym sets in the patent search log;
Step 2) extract the literal features, pronunciation features, and query features of the candidate synonyms in the candidate synonym set.
Most query strings in the patent search log contain several descriptions of one thing, connected by logical operators such as "or", "and", and "not"; some of the words joined by these operators stand in a coordinate relationship. An analysis of how synonyms are distributed in the patent search log is shown in Table 1.
Table 1: Processed patent search log corpus
Five main structural templates for synonym sets were constructed:
1. Template 1
"'words1' OR 'words2' OR 'words3'", where words1, words2, and words3 form a candidate synonym set. The terms are connected by "OR" or "or"; this is the simplest synonym-set structural template, as shown in row 18 of Table 1;
2. Template 2
"('words1' pre/2 'words2') OR 'words3'", where "'words1' pre/2 'words2'" indicates that words1 and words2 form a phrase, and that phrase and the words3 joined to it by "OR" are candidate synonyms, i.e. words1+words2 and words3 are candidate synonyms, as shown in rows 19, 24, and 26 of Table 1;
3. Template 3
"'words1' OR ('words2' AND 'words3' AND 'words4')", where the query clause "'words2' AND 'words3' AND 'words4'" indicates that words2, words3, and words4 form a phrase, which together with the words1 joined by "OR" forms candidate synonyms, i.e. words1 and words2+words3+words4 are candidate synonyms, as shown in row 27 of Table 1;
4. Template 4
"'words1' OR ('words2' and/sen 'words3')", where words1 and words2+words3 are candidate synonyms, as shown in row 29 of Table 1;
5. Template 5
"identifier='words1' OR identifier='words2' OR identifier='words3'", where the identifier is typically "DESC1", "KWRF", "TICN", "ABS", "TI, KW+", etc., each denoting a different query field. Here words1, words2, and words3 form a candidate synonym set whose members share the same field, as shown in rows 17, 20, 22, 23, and 28 of Table 1.
The candidate synonym set is obtained with the synonym-set structural templates of the patent search log. First, regular expressions are used to remove from the log the queries made by application number, publication number, or classification number. Because the input methods and fonts used in query records are inconsistent, full-width characters in the log are converted to half-width and traditional Chinese characters to simplified. The log is then split according to synonym-set structural templates 1 to 5. The flow chart for mining the candidate synonym set in step 1) is shown in Fig. 2. Step 1) is specifically:
Step A: Filter out useless query strings: use regular expressions to remove from the patent search log the queries made by application number, publication number, or classification number;
Step B: Convert full-width characters in the patent search log to half-width and traditional Chinese characters to simplified;
Step C: Extract the synonym structures from the patent search log according to the structural templates of the candidate synonym set;
Step D: Filter out personal-name information according to the name-identifier rules to obtain the candidate synonym set.
When useless query strings are filtered out, queries made by title, address, applicant, inventor, patent agency, and the like are retained. Analysis of the patent search log showed that personal-name information cannot be filtered out with the synonym templates alone. To improve the accuracy and recall of the synonym table, rules based on identifiers that indicate personal names, such as "INV", "ATCN", and "GK_IN", are extracted, and name information, such as the content of rows 17 and 23 in Table 1, is filtered out according to these rules. After processing, the patent search log contains nearly 30,000 query words, including Chinese, English, and Japanese vocabulary.
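Steps A and B can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the `NUMBER_QUERY` pattern is a hypothetical stand-in for the regular expressions that recognise application, publication, and classification numbers (the text does not give them), and traditional-to-simplified conversion is left out because it needs a character mapping table.

```python
import re

# Hypothetical pattern (assumption): a query that is nothing but a
# number-like identifier such as "CN201510701365.1" is treated as useless.
NUMBER_QUERY = re.compile(r"^\s*(?:CN)?\d{6,}(?:\.\d|[A-Z]\d?)?\s*$", re.IGNORECASE)


def to_halfwidth(text):
    """Map full-width ASCII variants (U+FF01..U+FF5E) and the ideographic
    space (U+3000) to their half-width counterparts."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:
            out.append(" ")
        elif 0xFF01 <= code <= 0xFF5E:
            out.append(chr(code - 0xFEE0))
        else:
            out.append(ch)
    return "".join(out)


def preprocess(lines):
    """Normalise character width, then drop empty and number-only queries."""
    cleaned = []
    for line in lines:
        line = to_halfwidth(line).strip()
        if not line or NUMBER_QUERY.match(line):
            continue
        cleaned.append(line)
    return cleaned
```

Width normalisation runs before filtering so that a number typed in full-width digits is still recognised by the pattern.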
For row 18 of Table 1, the candidate synonym set contains three terms (two names for chitin, and chitosan), so there are three candidate synonym pairs, one for each pair of terms. Because the method makes full use of how synonyms are distributed in the patent search log, the candidate synonym set it obtains is also relatively accurate.
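The step from a query string to candidate pairs can be sketched for the two simplest templates (a plain OR-list, with or without a field identifier). This is only an illustration: the regular expression is an assumption about how such queries look, the function names are hypothetical, and templates 2 to 4 would need their own patterns.

```python
import re
from itertools import combinations

# One quoted term, optionally preceded by a field identifier such as "TI=".
TERM = r"(?:[A-Za-z_][A-Za-z0-9_,+ ]*=\s*)?'([^']+)'"
# Two or more such terms joined by OR / or.
OR_LIST = re.compile(rf"\s*{TERM}(?:\s*OR\s*{TERM})+\s*", re.IGNORECASE)


def extract_or_list(query):
    """Return the quoted terms of a pure OR-list query, or [] if the query
    does not have that shape."""
    if OR_LIST.fullmatch(query) is None:
        return []
    return re.findall(r"'([^']+)'", query)


def candidate_pairs(candidate_set):
    """Enumerate every unordered pair of distinct terms in a candidate set."""
    unique = list(dict.fromkeys(candidate_set))  # drop repeats, keep order
    return list(combinations(unique, 2))
```

A set of three terms yields three pairs, matching the count given for row 18; in general a set of n distinct terms yields n(n-1)/2 pairs.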
The basic idea of the support vector machine (SVM) model is to define an optimal hyperplane and to reduce the search for that hyperplane to solving a convex programming problem. Then, based on the expansion theorem for Mercer kernels, a nonlinear mapping maps the sample space into a high-dimensional or even infinite-dimensional feature space (a Hilbert space), so that linear learning methods in the feature space can solve highly nonlinear classification, regression, and density-estimation problems in the sample space. The model is particularly strong at text classification, where the recall and accuracy it achieves are reported to be better than those of other methods.
Classification, also known as pattern recognition, means finding the classification relationship inherent in existing observed data and then applying the resulting classification model y = M(x) to the data to be predicted. Synonym recognition is a two-class problem: find a suitable function y = f(x), assign each x_i with f(x_i) ≥ 0 to the positive class, and assign each x_i with f(x_i) < 0 to the negative class.
Commonly used kernel functions for the support vector machine model include the following:
1. Polynomial kernel: $K(x, x_i) = (a\,x \cdot x_i + c)^d$ (1)
Fig. 3 shows linearly inseparable data transformed by a Gaussian kernel function into a linearly separable sample, in which the circled points are support vectors.
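The two kernels mentioned here can be written out in a few lines. The polynomial kernel follows equation (1); the Gaussian kernel is given in its common textbook form, which is an assumption because the text names the kernel without stating its parameterisation. The default parameter values are illustrative only.

```python
import math


def polynomial_kernel(x, y, a=1.0, c=1.0, d=2):
    """Polynomial kernel (a * <x, y> + c) ** d, as in equation (1)."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    return (a * dot + c) ** d


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel exp(-||x - y||^2 / (2 * sigma^2)).
    Assumed standard form; sigma is a hypothetical width parameter."""
    squared_distance = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))
```

The Gaussian kernel equals 1 for identical vectors and decays toward 0 as they move apart, which is what lets it separate data that no straight line in the original space can.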
In machine learning, feature selection is very important for classification. The candidate synonyms selected by the invention are all word pairs with similar meanings, so the classes are hard to separate with a single simple feature. The invention therefore considers not only literal features but also pronunciation similarity and user query behavior.
Synonyms usually have one obvious characteristic: they share morphemes, as with "Peking University" and its abbreviation, or "timestamp value" and "timestamp". Literal similarity is therefore used as a feature for the support vector machine. The literal features mainly comprise five features: maximum similarity, minimum similarity, center-of-gravity backward-shift similarity, whether the two words share a prefix, and whether the two words share a suffix. The similarity formulas for the first three features are as follows:
The calculation formula of the maximum similarity is
The calculation formula of the minimum similarity is
The calculation formula of the center-of-gravity backward-shift similarity is
Wherein $\mathrm{Sim\_zimian}_{\max}(w_1, w_2)$ denotes the maximum similarity of the word pair $(w_1, w_2)$; $\mathrm{Sim\_zimian}_{\min}(w_1, w_2)$ denotes the minimum similarity of the pair; $\mathrm{Sim\_zimian}_{zhongxin}(w_1, w_2)$ denotes the center-of-gravity backward-shift similarity of the pair; $same(w_1, w_2)$ denotes the number of identical characters in the pair; $\min(|w_1|, |w_2|)$ denotes the length of the shorter word in the pair; $\max(|w_1|, |w_2|)$ denotes the length of the longer word in the pair; $|w_1|$ denotes the length of $w_1$; $|w_2|$ denotes the length of $w_2$; the position-weight term denotes the sum of the weights of the identical characters at their positions in the word; $k$ denotes the number of characters in the word; $same(w_1, m)$ denotes the position of an identical character; and $\alpha = 0.6$, $\beta = 0.4$, $\gamma = 1$. Table 2 below lists an example of the literal features:
Table 2:Literal feature
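The formulas themselves are not reproduced in this text, so the following is a sketch of one plausible reading of the symbol definitions, not the patent's formulas: maximum similarity is assumed to be the shared-character count divided by the shorter length, and minimum similarity the same count divided by the longer length. "Prefix" and "suffix" are assumed to mean the first and last character. The center-of-gravity feature is omitted because its formula cannot be recovered from the definitions alone.

```python
from collections import Counter


def shared_chars(w1, w2):
    """Number of characters common to both words, counting repeats."""
    return sum((Counter(w1) & Counter(w2)).values())


def literal_features(w1, w2):
    """Four of the five literal features, under the assumptions above."""
    if not w1 or not w2:
        raise ValueError("both words must be non-empty")
    same = shared_chars(w1, w2)
    return {
        "sim_max": same / min(len(w1), len(w2)),  # assumed: same / shorter length
        "sim_min": same / max(len(w1), len(w2)),  # assumed: same / longer length
        "same_prefix": 1 if w1[0] == w2[0] else 0,
        "same_suffix": 1 if w1[-1] == w2[-1] else 0,
    }
```

With the illustrative pair 北京大学 and 北大, two characters are shared, so this reading gives a maximum similarity of 1.0 and a minimum similarity of 0.5.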
Logs contain many misspelled words, some of which are used heavily, so such word pairs are treated as synonyms. These pairs have one thing in common: similar pronunciation, as in variant written forms of "Fourier", "mirabilite", and "yoga". The pronunciation of a word is obtained by parsing Sogou cell lexicons. The pronunciation similarity of the pronunciation feature in step 2) is calculated as follows:
Wherein the terms of the formula denote, in order, the pronunciation of $w_1$, the minimum edit distance between the pronunciations of the word pair $(w_1, w_2)$, the longer of the two pronunciation lengths in the pair, and the pronunciation similarity of the pair. Table 3 below lists an example of the pronunciation feature:
Table 3:Pronunciation feature
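The definitions mention a minimum edit distance and a maximum pronunciation length, which suggests a normalised edit-distance similarity. The sketch below assumes the form 1 - distance / longer length; that form is an assumption, since the formula is not reproduced in this text. Pronunciations are passed in directly (for example as pinyin strings or lists of syllables); looking them up in a lexicon is outside the sketch.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or lists)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, 1):
        current = [i]
        for j, item_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (item_a != item_b),  # substitution
            ))
        previous = current
    return previous[-1]


def pronunciation_similarity(p1, p2):
    """Assumed form: 1 - edit_distance / length of the longer pronunciation."""
    longest = max(len(p1), len(p2))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(p1, p2) / longest
```

Under this reading, a misspelling that keeps the pronunciation unchanged scores 1.0, which is what makes the feature useful for the homophone category.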
Query words that appear in the same row of the patent search log are similar or related words, because they are different descriptions of the same patent.
Whether two words appear in the same row of the patent search log is taken as a query feature. Part of the processed patent search log query information is shown in Table 4.
Table 4: Part of the processed patent search log query strings
As Table 4 shows, query words in the same row are more likely to be synonyms. The query feature in step 2) is calculated as follows:
$(w_1, w_2) \in row$ means that the word pair $(w_1, w_2)$ occurs in the same row of the patent search log; $(w_1, w_2) \notin row$ means that the word pair $(w_1, w_2)$ does not occur in the same row of the patent search log. Table 5 shows the query feature values of some word pairs from Table 4:
Table 5:Query characteristics
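The definitions describe a same-row test, so the query feature is read here as a binary indicator: 1 if the two words occur together in at least one row of the log, 0 otherwise. The values 1 and 0 are an assumption, as is the representation of the log as a list of word lists; the function names are hypothetical.

```python
def build_row_index(rows):
    """Map each word to the set of row numbers in which it appears."""
    index = {}
    for row_id, words in enumerate(rows):
        for word in set(words):
            index.setdefault(word, set()).add(row_id)
    return index


def query_feature(w1, w2, index):
    """Assumed indicator: 1 if w1 and w2 share at least one row, else 0."""
    return 1 if index.get(w1, set()) & index.get(w2, set()) else 0
```

Building the index once keeps each lookup cheap, which matters when the feature has to be computed for every candidate pair in a large log.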
The following embodiment uses a patent search log provided by a patent search system, with a total size of 10 GB. First, the log is preprocessed and the candidate synonym set is extracted according to how synonyms occur in the log. Then the literal, pronunciation, and query features of the word pairs in the processed log are extracted. A set of 4741 manually annotated word pairs is used as the training corpus, of which 2108 are synonym pairs and 2633 are non-synonym pairs, labeled "1" and "-1" respectively.
The literal, pronunciation, and query features are added in turn and tested; the change in the feature weight factors for each feature combination is shown in Table 6:
Table 6: Changes in the feature weight factors
Wherein feature combination 1 is the literal features; feature combination 2 is the literal features plus the pronunciation feature; feature combination 3 is the literal features plus the pronunciation feature plus the query feature. The results for each feature combination are shown in Table 7:
Table 7:SVM model experiment results
As Table 7 shows, the accuracy, recall, and F value all improve with feature combination 3, so the method adopts feature combination 3. The method of the invention was compared with methods commonly used in the prior art; the experimental results are shown in Tables 8 and 9.
Table 8:Experiment statistics result
Table 9:Contrast and experiment
Wherein the number of recognized word pairs is the number of word pairs in the mined synonym table.
Tables 8 and 9 show that, as each feature is added, the accuracy, recall, and F value of synonym recognition with the method of the invention are higher than those of the prior art. By selecting literal, pronunciation, and query features, the invention thus effectively improves the accuracy of synonym recognition in patent search logs.
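The three reported measures can be computed from gold and predicted labels as sketched below. Two assumptions are made: the "accuracy" reported alongside recall is read as precision, and the F value is read as the balanced F1 score. Labels follow the 1 / -1 convention of the training corpus.

```python
def precision_recall_f1(gold, predicted):
    """Precision, recall, and F1 for the positive (synonym) class,
    with labels 1 for synonym pairs and -1 for non-synonym pairs."""
    pairs = list(zip(gold, predicted))
    true_pos = sum(1 for g, p in pairs if g == 1 and p == 1)
    false_pos = sum(1 for g, p in pairs if g == -1 and p == 1)
    false_neg = sum(1 for g, p in pairs if g == 1 and p == -1)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

Because F1 is a harmonic mean, it only rises when precision and recall rise together, which is why it is a useful single number for comparing feature combinations.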
The embodiments above express only some implementations of the invention. Their description is relatively specific and detailed, but it should not be understood as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, and these all fall within the protection scope of the invention. The protection scope of this patent is therefore determined by the appended claims.

Claims (3)

1. A method for automatically mining synonyms based on user behavior in patent search logs, characterized by comprising the following steps:
Step 1) preprocess the patent search log and obtain a candidate synonym set using the structural templates for synonym sets in the patent search log;
Step 2) extract the literal features, pronunciation features, and query features of the candidate synonyms in the candidate synonym set;
Step 3) manually annotate a number of word pairs as the training corpus, labeling synonym pairs and non-synonym pairs "1" and "-1" respectively;
Step 4) feed the literal features, pronunciation features, and query features into an SVM model to classify the word pairs;
Wherein whether two words appear in the same row of the patent search log is taken as a query feature, and the query feature value is calculated with the following formula:
Wherein,
$(w_1, w_2) \in row$ means that the word pair $(w_1, w_2)$ occurs in the same row of the patent search log; $(w_1, w_2) \notin row$ means that the word pair $(w_1, w_2)$ does not occur in the same row of the patent search log.
2. The method for automatically mining synonyms based on user behavior in patent search logs according to claim 1, characterized in that step 1) is specifically:
Step A: Filter out useless query strings: use regular expressions to remove from the patent search log the queries made by application number, publication number, or classification number;
Step B: Convert full-width characters in the patent search log to half-width and traditional Chinese characters to simplified;
Step C: Extract the synonym structures from the patent search log according to the structural templates of the candidate synonym set;
Step D: Filter out personal-name information according to the name-identifier rules to obtain the candidate synonym set.
3. The method for automatically mining synonyms based on user behavior in patent search logs according to claim 1, characterized in that the pronunciation similarity of the pronunciation feature is calculated as follows:
Wherein the terms of the formula denote, in order, the pronunciation of $w_1$, the minimum edit distance between the pronunciations of the word pair $(w_1, w_2)$, the longer of the two pronunciation lengths in the pair, and the pronunciation similarity of the pair.
CN201510701365.1A 2015-10-27 2015-10-27 Method for automatically mining synonyms based on user behavior in patent search logs Active CN105335351B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510701365.1A CN105335351B (en) 2015-10-27 2015-10-27 A kind of synonym automatic mining method based on patent search daily record user behavior

Publications (2)

Publication Number Publication Date
CN105335351A CN105335351A (en) 2016-02-17
CN105335351B true CN105335351B (en) 2018-08-28

Family

ID=55285896

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510701365.1A Active CN105335351B (en) 2015-10-27 2015-10-27 Method for automatically mining synonyms based on user behavior in patent search logs

Country Status (1)

Country Link
CN (1) CN105335351B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110111778B (en) * 2019-04-30 2021-11-12 北京大米科技有限公司 Voice processing method and device, storage medium and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102622338A (en) * 2012-02-24 2012-08-01 北京工业大学 Computer-assisted computing method of semantic distance between short texts
CN102760134A (en) * 2011-04-28 2012-10-31 北京百度网讯科技有限公司 Method and device for mining synonyms
CN102955774A (en) * 2012-05-30 2013-03-06 华东师范大学 Control method and device for calculating Chinese word semantic similarity
CN103870489A (en) * 2012-12-13 2014-06-18 北京信息科技大学 Chinese name self-extension recognition method based on search logs
CN103942339A (en) * 2014-05-08 2014-07-23 深圳市宜搜科技发展有限公司 Synonym mining method and device

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
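Synonym recognition in the patent domain; Li Junfeng et al.; Journal of Chinese Computer Systems; 2015-04-30; Vol. 36, No. 4; Sections 2-3 *
An experiment on recognizing Chinese synonyms using literal similarity; Hou Hanqing et al.; Proceedings of the 15th National Symposium on Computer Information Management; 2001-12-31; full text *
Synonym mining based on patent search logs; Wang Ying et al.; Computer Engineering and Design; 2013-03-31; Vol. 34, No. 3; Sections 2-3 *
A method for computing Chinese word-sense similarity based on a large-scale corpus; Shi Jing et al.; Journal of Chinese Information Processing; 2013-01-31; Vol. 27, No. 1; full text *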

Also Published As

Publication number Publication date
CN105335351A (en) 2016-02-17

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant