CN111552806B - Method for unsupervised construction of entity set in building field - Google Patents
- Publication number
- CN111552806B CN202010302187.6A CN202010302187A
- Authority
- CN
- China
- Prior art keywords
- word
- score
- words
- probability
- calculated
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Probability & Statistics with Applications (AREA)
- Machine Translation (AREA)
Abstract
The invention provides a method for unsupervised construction of an entity set in the building field, comprising the following steps: S1, acquiring the text to be screened, and dividing each sentence in the acquired text into M characters or/and words, wherein M is a positive integer greater than or equal to 1; S2, selecting the first K words with the highest overall score as the candidate word set D, wherein K is a positive integer greater than or equal to 1; and S3, screening out the words in the candidate word set D whose semantic features are similar to those of the building field vocabulary as field words. The invention can classify the words in the acquired text to be screened and filter out words that do not belong to the building field.
Description
Technical Field
The invention relates to the technical field of knowledge maps, in particular to a method for unsupervised construction of an entity set in the building field.
Background
As a semantic network, the knowledge graph can solve many practical problems when empowered by big data. It can be envisaged that much knowledge has still not broken through the scale bottleneck, and that other types of knowledge representation will likewise solve more practical problems when empowered by big data. The knowledge required by more and more application fields goes beyond the scope of the knowledge graph and calls for other forms of knowledge (such as production rules, Bayesian networks, decision trees and the like). Natural language is exceptionally complex: it is ambiguous and diverse, and semantic understanding is uncertain and depends on context. The root cause of the difficulty machines have in understanding natural language is that human language understanding rests on human cognitive ability, and the background knowledge formed by human cognitive experience is the fundamental support of human language understanding.
Disclosure of Invention
The invention aims to solve at least the above technical problems in the prior art, and in particular provides a method for unsupervised construction of an entity set in the building field.
In order to achieve the above object, the present invention provides a method for unsupervised construction of a building domain entity set, comprising the following steps:
S1, acquiring the text to be screened, and dividing each sentence in the acquired text into M characters or/and words, wherein M is a positive integer greater than or equal to 1;
S2, selecting the first K words with the highest overall score as the candidate word set D, wherein K is a positive integer greater than or equal to 1;
S3, screening out the words in the candidate word set D whose semantic features are similar to those of the building field word set as field words.
In a preferred embodiment of the present invention, step S1 further includes: mapping all the characters or/and words obtained by segmentation into a word vector space.
In a preferred embodiment of the present invention, step S2 includes the following steps:
S21, judging whether the calculated cohesion score is greater than or equal to a preset cohesion score threshold:
if the calculated cohesion score is greater than or equal to the preset threshold, executing step S22;
if the calculated cohesion score is smaller than the preset threshold, discarding the candidate word;
S22, judging whether the calculated left adjacency score is greater than or equal to a preset left adjacency score threshold:
if the calculated left adjacency score is greater than or equal to the preset threshold, executing step S23;
if the calculated left adjacency score is smaller than the preset threshold, discarding the candidate word;
S23, judging whether the calculated right adjacency score is greater than or equal to a preset right adjacency score threshold:
if the calculated right adjacency score is greater than or equal to the preset threshold, calculating the overall score of the candidate word;
if the calculated right adjacency score is smaller than the preset threshold, discarding the candidate word.
In a preferred embodiment of the present invention, the cohesion score is calculated in step S21 by:
MI(X,Y) = log( P(X,Y) / (P(X)·P(Y)) ),
wherein P(X,Y) represents the joint probability of character X and character Y appearing together in the text to be screened;
P(X) represents the probability of character X appearing in the text to be screened;
P(Y) represents the probability of character Y appearing in the text to be screened;
MI(X,Y) represents the cohesion score between word X and word Y;
the left adjacency score is calculated in step S22 by:
E_L(W) = −Σ_{a∈A} P(aW|W)·log P(aW|W),
wherein A represents the set of all characters appearing on the left side of word W in the text to be screened;
a represents a character in the set A;
aW represents the word formed by placing character a to the left of word W;
P(aW|W) represents the conditional probability that the word aW occurs given that the word W occurs;
E_L(W) represents the left adjacency score;
the right adjacency score is calculated in step S23 by:
E_R(W) = −Σ_{b∈B} P(Wb|W)·log P(Wb|W),
wherein B represents the set of all characters appearing on the right side of word W in the text to be screened;
b represents a character in the set B;
Wb represents the word formed by placing character b to the right of word W;
P(Wb|W) represents the conditional probability that the word Wb occurs given that the word W occurs;
E_R(W) represents the right adjacency score.
The overall score is calculated in step S23 by:
Score = λ_1·MI(X,Y) + λ_2·E_L(W) + λ_3·E_R(W),
wherein λ_1 is the adjustment coefficient for the cohesion score;
MI(X,Y) represents the cohesion score between word X and word Y;
λ_2 is the adjustment coefficient for the left adjacency score;
E_L(W) represents the left adjacency score;
λ_3 is the adjustment coefficient for the right adjacency score;
E_R(W) represents the right adjacency score.
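The cohesion, adjacency-entropy and overall scoring described above can be sketched in Python. The function below is a simplified illustration, not the patent's implementation: the choice of the weakest split point for MI, natural-log entropies, and substring counting over a raw character string are assumptions for the sketch.

```python
import math
from collections import Counter

def candidate_scores(text, word, lam=(1.0, 1.0, 1.0)):
    """Overall score of a candidate word: lam[0]*MI + lam[1]*E_L + lam[2]*E_R."""
    n = len(text)
    # Cohesion (MI): pointwise mutual information over a split of the word
    # into two halves, taking the weakest split (minimum over split points).
    p_word = text.count(word) / n
    mi = min(
        math.log(p_word / ((text.count(word[:k]) / n) * (text.count(word[k:]) / n)))
        for k in range(1, len(word))
    )

    def entropy(counter):
        total = sum(counter.values())
        return -sum((v / total) * math.log(v / total)
                    for v in counter.values()) if total else 0.0

    # Left/right adjacency entropy: how varied the neighbouring characters are.
    lefts = Counter(text[i - 1] for i in range(1, n) if text.startswith(word, i))
    rights = Counter(text[i + len(word)] for i in range(n - len(word))
                     if text.startswith(word, i))
    return lam[0] * mi + lam[1] * entropy(lefts) + lam[2] * entropy(rights)
```

A candidate that appears often with diverse neighbours (high adjacency entropy) and whose parts rarely occur apart (high MI) receives a high overall score and survives the S21–S23 thresholds.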
In a preferred embodiment of the present invention, in step S3, the method for screening out the words in the candidate word set D whose semantic features are similar to those of the building field word set is:
P(Z) = ∫ P(Z|θ)P(θ) dθ,
wherein Z represents a word in the candidate word set D;
θ represents the model parameter;
P(Z|θ) represents the conditional probability that word Z occurs given the parameter θ;
P(θ) represents the prior probability density function;
P(Z) represents the probability that word Z occurs under the parameter θ;
P(D_C) = ∫ ∏_i P(Z_i|θ) P(θ) dθ,
wherein D_C represents the set formed by the words in the candidate word set D that belong to the building field word set;
Z_i represents a word in the word set D_C;
P(Z_i|θ) represents the conditional probability that word Z_i occurs given the parameter θ;
P(θ) represents the prior probability density function;
P(D_C) represents the probability that the word set D_C occurs;
P(Z|D_C) = ∫ P(Z|θ) P(θ|D_C) dθ,
wherein P(Z|θ) represents the conditional probability that word Z occurs given the parameter θ;
P(θ|D_C) represents the posterior probability density function;
P(Z|D_C) represents the probability that word Z belongs to the word set D_C;
P(θ|D_C) = P(D_C|θ)P(θ) / P(D_C),
wherein P(D_C|θ) represents the likelihood function;
P(θ) represents the prior probability density function;
P(D_C) represents the probability that the word set D_C occurs;
P(θ|D_C) represents the posterior probability density function;
Score(Z) = P(Z|D_C) / P(Z),
wherein P(Z|D_C) represents the probability that word Z belongs to the word set D_C;
P(Z) represents the probability that word Z occurs under the parameter θ;
Score(Z) represents the final score of word Z belonging to the building field word set D_C;
if the final score of word Z belonging to the building field word set D_C is greater than or equal to a preset score value, the word Z in the candidate word set belongs to the building field word set D_C.
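The ratio of posterior-predictive to prior-predictive probability used for this screening has a closed form under a Beta–Bernoulli model of independent binary features. That model choice is an assumption of this sketch (the patent does not fix the form of P(θ)); the function returns the log of the score for numerical stability.

```python
import numpy as np

def bayesian_set_score(z, seed_set, alpha=2.0, beta=2.0):
    """log Score(Z) = log P(Z|D_C) - log P(Z) for a binary feature vector z,
    with an independent Beta(alpha, beta)-Bernoulli model per feature."""
    seed = np.asarray(seed_set, dtype=float)   # rows: words in D_C
    z = np.asarray(z, dtype=float)
    n = seed.shape[0]
    s = seed.sum(axis=0)                        # per-feature 1-counts in D_C
    a_post, b_post = alpha + s, beta + (n - s)  # Beta posterior after seeing D_C
    # Posterior predictive P(Z|D_C) and prior predictive P(Z), feature-wise.
    log_post = (z * np.log(a_post / (a_post + b_post))
                + (1 - z) * np.log(b_post / (a_post + b_post)))
    log_prior = (z * np.log(alpha / (alpha + beta))
                 + (1 - z) * np.log(beta / (alpha + beta)))
    return float((log_post - log_prior).sum())
```

A candidate whose features match those common in the seed set D_C scores above zero (Score(Z) > 1), while a mismatched candidate scores below zero, which mirrors the preset-score threshold test in S3.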
In a preferred embodiment of the present invention, step S1 includes the following steps:
S = {s_1, s_2, s_3, …, s_n}, wherein S represents the text to be screened, s_i represents the i-th sentence in the text to be screened, and i is a positive integer less than or equal to n;
s_i = V_1V_2V_3…V_M, wherein V_j represents the j-th character in the i-th sentence of the text to be screened, and j is a positive integer less than or equal to M;
s_i′ = {V_1, V_2, V_3, …, V_M}, wherein s_i′ indicates that the i-th sentence has been divided into M words.
In a preferred embodiment of the present invention, the word vector space is expressed as:
e_i = W^wrd · v_i,
wherein e_i represents the low-dimensional vectorized representation of a word;
W^wrd is the parameter matrix of the words, obtained through training;
v_i represents the high-dimensional vector input into the computer.
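A minimal sketch of the mapping e_i = W^wrd · v_i, assuming v_i is a one-hot vector over the vocabulary; the toy vocabulary, the embedding dimension and the random initialization stand in for a trained parameter matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"混凝土": 0, "梁": 1, "柱": 2}   # toy vocabulary: concrete, beam, column
V, d = len(vocab), 4                       # vocabulary size, embedding dimension
W_wrd = rng.standard_normal((d, V))        # parameter matrix, learned in training

def embed(word):
    """e_i = W_wrd @ v_i, with v_i the high-dimensional one-hot vector."""
    v = np.zeros(V)
    v[vocab[word]] = 1.0
    return W_wrd @ v                       # equals column vocab[word] of W_wrd
```

Multiplying by a one-hot vector simply selects one column of W^wrd, which is why word embeddings are usually implemented as a table lookup rather than a matrix product.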
In summary, by adopting the above technical scheme, the invention can classify the words in the acquired text to be screened and filter out words that do not belong to the building field.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic block diagram of the process of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
The invention provides a method for unsupervised construction of an entity set in the building field, which comprises the following steps:
S1, acquiring the text to be screened, and dividing each sentence in the acquired text into M characters or/and words, wherein M is a positive integer greater than or equal to 1;
S2, selecting the first K words with the highest overall score as the candidate word set D, wherein K is a positive integer greater than or equal to 1;
S3, extracting terms from normative documents in the building field, expressing the semantic features of building-field vocabulary with word vectors, taking the mean of the feature vectors of words with the same attributes as the building field word set, screening out the words in the candidate word set D whose semantic features are similar to those of the building field word set as field words, and filtering out non-field words irrelevant to the building field.
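The S3 screening against the mean feature vector of the building-field vocabulary can be sketched as below; the cosine-similarity measure and the 0.5 threshold are assumptions for illustration, and the patent's final decision additionally uses the Bayesian score described later.

```python
import numpy as np

def filter_domain_words(candidates, emb, domain_words, threshold=0.5):
    """Keep candidates whose embedding lies close (cosine similarity) to the
    mean embedding of the building-field vocabulary set."""
    centroid = np.mean([emb[w] for w in domain_words], axis=0)
    centroid = centroid / np.linalg.norm(centroid)
    kept = []
    for w in candidates:
        v = emb[w] / np.linalg.norm(emb[w])
        if float(v @ centroid) >= threshold:    # similar to the field centroid
            kept.append(w)
    return kept
```

Averaging the embeddings of words that share an attribute yields a single centroid per attribute, so membership in the field reduces to a cheap vector comparison.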
In a preferred embodiment of the present invention, step S1 further includes: mapping all the characters or/and words obtained by segmentation into a word vector space.
In a preferred embodiment of the present invention, step S2 includes the following steps:
S21, judging whether the calculated cohesion score is greater than or equal to a preset cohesion score threshold:
if the calculated cohesion score is greater than or equal to the preset threshold, executing step S22;
if the calculated cohesion score is smaller than the preset threshold, discarding the candidate word;
S22, judging whether the calculated left adjacency score is greater than or equal to a preset left adjacency score threshold:
if the calculated left adjacency score is greater than or equal to the preset threshold, executing step S23;
if the calculated left adjacency score is smaller than the preset threshold, discarding the candidate word;
S23, judging whether the calculated right adjacency score is greater than or equal to a preset right adjacency score threshold:
if the calculated right adjacency score is greater than or equal to the preset threshold, calculating the overall score of the candidate word;
if the calculated right adjacency score is smaller than the preset threshold, discarding the candidate word.
In a preferred embodiment of the present invention, the cohesion score is calculated in step S21 by:
MI(X,Y) = log( P(X,Y) / (P(X)·P(Y)) ),
wherein P(X,Y) represents the joint probability of character X and character Y appearing together in the text to be screened;
P(X) represents the probability of character X appearing in the text to be screened;
P(Y) represents the probability of character Y appearing in the text to be screened;
MI(X,Y) represents the cohesion score between word X and word Y;
the left adjacency score is calculated in step S22 by:
E_L(W) = −Σ_{a∈A} P(aW|W)·log P(aW|W),
wherein A represents the set of all characters appearing on the left side of word W in the text to be screened;
a represents a character in the set A;
aW represents the word formed by placing character a to the left of word W;
P(aW|W) represents the conditional probability that the word aW occurs given that the word W occurs;
E_L(W) represents the left adjacency score;
the right adjacency score is calculated in step S23 by:
E_R(W) = −Σ_{b∈B} P(Wb|W)·log P(Wb|W),
wherein B represents the set of all characters appearing on the right side of word W in the text to be screened;
b represents a character in the set B;
Wb represents the word formed by placing character b to the right of word W;
P(Wb|W) represents the conditional probability that the word Wb occurs given that the word W occurs;
E_R(W) represents the right adjacency score.
The overall score is calculated in step S23 by:
Score = λ_1·MI(X,Y) + λ_2·E_L(W) + λ_3·E_R(W),
wherein λ_1 is the adjustment coefficient for the cohesion score;
MI(X,Y) represents the cohesion score between word X and word Y;
λ_2 is the adjustment coefficient for the left adjacency score;
E_L(W) represents the left adjacency score;
λ_3 is the adjustment coefficient for the right adjacency score;
E_R(W) represents the right adjacency score.
In a preferred embodiment of the present invention, in step S3, the method for screening out the words in the candidate word set D whose semantic features are similar to those of the building field word set is:
P(Z) = ∫ P(Z|θ)P(θ) dθ,
wherein Z represents a word in the candidate word set D (i.e., a word in D−D_C); D_C represents the set formed by the words in the candidate word set D that belong to the building field word set;
θ represents the model parameter;
P(Z|θ) represents the conditional probability that word Z occurs given the parameter θ;
P(θ) represents the prior probability density function;
P(Z) represents the probability that word Z occurs under the parameter θ;
P(D_C) = ∫ ∏_i P(Z_i|θ) P(θ) dθ,
wherein Z_i represents a word in the word set D_C;
P(Z_i|θ) represents the conditional probability that word Z_i occurs given the parameter θ;
P(θ) represents the prior probability density function;
P(D_C) represents the probability that the word set D_C occurs;
P(Z|D_C) = ∫ P(Z|θ) P(θ|D_C) dθ,
wherein P(Z|θ) represents the conditional probability that word Z occurs given the parameter θ;
P(θ|D_C) represents the posterior probability density function;
P(Z|D_C) represents the probability that word Z belongs to the word set D_C;
P(θ|D_C) = P(D_C|θ)P(θ) / P(D_C),
wherein P(D_C|θ) represents the likelihood function;
P(θ) represents the prior probability density function;
P(D_C) represents the probability that the word set D_C occurs;
P(θ|D_C) represents the posterior probability density function;
Score(Z) = P(Z|D_C) / P(Z),
wherein P(Z|D_C) represents the probability that word Z belongs to the word set D_C;
P(Z) represents the probability that word Z occurs under the parameter θ;
Score(Z) represents the final score of word Z belonging to the building field word set D_C;
if the final score of word Z belonging to the building field word set D_C is greater than or equal to a preset score value, the word Z in the candidate word set belongs to the building field word set D_C.
In a preferred embodiment of the present invention, step S1 includes the following steps:
S = {s_1, s_2, s_3, …, s_n}, wherein S represents the text to be screened, s_i represents the i-th sentence in the text to be screened, and i is a positive integer less than or equal to n;
s_i = V_1V_2V_3…V_M, wherein V_j represents the j-th character in the i-th sentence of the text to be screened, and j is a positive integer less than or equal to M;
s_i′ = {V_1, V_2, V_3, …, V_M}, wherein s_i′ indicates that the i-th sentence has been divided into M words.
In a preferred embodiment of the present invention, the word vector space is expressed as:
e_i = W^wrd · v_i,
wherein e_i represents the low-dimensional vectorized representation of a word;
W^wrd is the parameter matrix of the words, obtained through training;
v_i represents the high-dimensional vector input into the computer.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
Claims (6)
1. A method for unsupervised construction of a building field entity set is characterized by comprising the following steps:
S1, acquiring the text to be screened, and dividing each sentence in the acquired text into M characters or/and words, wherein M is a positive integer greater than or equal to 1;
S2, selecting the first K words with the highest overall score as the candidate word set D, wherein K is a positive integer greater than or equal to 1;
S3, screening out the words in the candidate word set D whose semantic features are similar to those of the building field word set as field words; the method for screening out the words in the candidate word set D whose semantic features are similar to those of the building field word set is:
P(Z) = ∫ P(Z|θ)P(θ) dθ,
wherein Z represents a word in the candidate word set D;
θ represents the model parameter;
P(Z|θ) represents the conditional probability that word Z occurs given the parameter θ;
P(θ) represents the prior probability density function;
P(Z) represents the probability that word Z occurs under the parameter θ;
P(D_C) = ∫ ∏_i P(Z_i|θ) P(θ) dθ,
wherein D_C represents the set formed by the words in the candidate word set D that belong to the building field word set;
Z_i represents a word in the word set D_C;
P(Z_i|θ) represents the conditional probability that word Z_i occurs given the parameter θ;
P(θ) represents the prior probability density function;
P(D_C) represents the probability that the word set D_C occurs;
P(Z|D_C) = ∫ P(Z|θ) P(θ|D_C) dθ,
wherein P(Z|θ) represents the conditional probability that word Z occurs given the parameter θ;
P(θ|D_C) represents the posterior probability density function;
P(Z|D_C) represents the probability that word Z belongs to the word set D_C;
P(θ|D_C) = P(D_C|θ)P(θ) / P(D_C),
wherein P(D_C|θ) represents the likelihood function;
P(θ) represents the prior probability density function;
P(D_C) represents the probability that the word set D_C occurs;
P(θ|D_C) represents the posterior probability density function;
Score(Z) = P(Z|D_C) / P(Z),
wherein P(Z|D_C) represents the probability that word Z belongs to the word set D_C;
P(Z) represents the probability that word Z occurs under the parameter θ;
Score(Z) represents the final score of word Z belonging to the building field word set D_C;
if the final score of word Z belonging to the building field word set D_C is greater than or equal to a preset score value, the word Z in the candidate word set belongs to the building field word set D_C.
2. The method for unsupervised construction of a building field entity set according to claim 1, wherein step S1 further includes: mapping all the characters or/and words obtained by segmentation into a word vector space.
3. The unsupervised construction method of a set of building domain entities according to claim 1, characterized in that in step S2, it comprises the following steps:
S21, judging whether the calculated cohesion score is greater than or equal to a preset cohesion score threshold:
if the calculated cohesion score is greater than or equal to the preset threshold, executing step S22;
if the calculated cohesion score is smaller than the preset threshold, discarding the candidate word;
S22, judging whether the calculated left adjacency score is greater than or equal to a preset left adjacency score threshold:
if the calculated left adjacency score is greater than or equal to the preset threshold, executing step S23;
if the calculated left adjacency score is smaller than the preset threshold, discarding the candidate word;
S23, judging whether the calculated right adjacency score is greater than or equal to a preset right adjacency score threshold:
if the calculated right adjacency score is greater than or equal to the preset threshold, calculating the overall score of the candidate word;
if the calculated right adjacency score is smaller than the preset threshold, discarding the candidate word.
4. The method for unsupervised construction of a building field entity set according to claim 3, wherein the cohesion score is calculated in step S21 by:
MI(X,Y) = log( P(X,Y) / (P(X)·P(Y)) ),
wherein P(X,Y) represents the joint probability of character X and character Y appearing together in the text to be screened;
P(X) represents the probability of character X appearing in the text to be screened;
P(Y) represents the probability of character Y appearing in the text to be screened;
MI(X,Y) represents the cohesion score between word X and word Y;
the left adjacency score is calculated in step S22 by:
E_L(W) = −Σ_{a∈A} P(aW|W)·log P(aW|W),
wherein A represents the set of all characters appearing on the left side of word W in the text to be screened;
a represents a character in the set A;
aW represents the word formed by placing character a to the left of word W;
P(aW|W) represents the conditional probability that the word aW occurs given that the word W occurs;
E_L(W) represents the left adjacency score;
the right adjacency score is calculated in step S23 by:
E_R(W) = −Σ_{b∈B} P(Wb|W)·log P(Wb|W),
wherein B represents the set of all characters appearing on the right side of word W in the text to be screened;
b represents a character in the set B;
Wb represents the word formed by placing character b to the right of word W;
P(Wb|W) represents the conditional probability that the word Wb occurs given that the word W occurs;
E_R(W) represents the right adjacency score;
the overall score is calculated in step S23 by:
Score = λ_1·MI(X,Y) + λ_2·E_L(W) + λ_3·E_R(W),
wherein λ_1 is the adjustment coefficient for the cohesion score;
MI(X,Y) represents the cohesion score between word X and word Y;
λ_2 is the adjustment coefficient for the left adjacency score;
E_L(W) represents the left adjacency score;
λ_3 is the adjustment coefficient for the right adjacency score;
E_R(W) represents the right adjacency score.
5. The unsupervised construction method of a set of building domain entities according to claim 1, characterized in that in step S1, it comprises the following steps:
S = {s_1, s_2, s_3, …, s_n}, wherein S represents the text to be screened, s_i represents the i-th sentence in the text to be screened, and i is a positive integer less than or equal to n;
s_i = V_1V_2V_3…V_M, wherein V_j represents the j-th character in the i-th sentence of the text to be screened, and j is a positive integer less than or equal to M;
s_i′ = {V_1, V_2, V_3, …, V_M}, wherein s_i′ indicates that the i-th sentence has been divided into M words.
6. The method for unsupervised construction of a set of building realm entities according to claim 2, wherein the representation of the word vector space is:
e_i = W^wrd · v_i,
wherein e_i represents the low-dimensional vectorized representation of a word;
W^wrd is the parameter matrix of the words, obtained through training;
v_i represents the high-dimensional vector input into the computer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010302187.6A CN111552806B (en) | 2020-04-16 | 2020-04-16 | Method for unsupervised construction of entity set in building field |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010302187.6A CN111552806B (en) | 2020-04-16 | 2020-04-16 | Method for unsupervised construction of entity set in building field |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111552806A CN111552806A (en) | 2020-08-18 |
CN111552806B true CN111552806B (en) | 2021-11-02 |
Family
ID=72007475
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010302187.6A Active CN111552806B (en) | 2020-04-16 | 2020-04-16 | Method for unsupervised construction of entity set in building field |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111552806B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106649849A (en) * | 2016-12-30 | 2017-05-10 | 上海智臻智能网络科技股份有限公司 | Text information base building method and device and searching method, device and system |
CN108846033A (en) * | 2018-05-28 | 2018-11-20 | 北京邮电大学 | The discovery and classifier training method and apparatus of specific area vocabulary |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102646100B (en) * | 2011-02-21 | 2016-02-24 | 腾讯科技(深圳)有限公司 | Domain term acquisition methods and system |
CN103092966A (en) * | 2013-01-23 | 2013-05-08 | 盘古文化传播有限公司 | Vocabulary mining method and device |
CN107577739B (en) * | 2017-08-28 | 2020-04-10 | 广东惠禾科技发展有限公司 | Semi-supervised domain word mining and classifying method and equipment |
CN110162681B (en) * | 2018-10-08 | 2023-04-18 | 腾讯科技(深圳)有限公司 | Text recognition method, text processing method, text recognition device, text processing device, computer equipment and storage medium |
-
2020
- 2020-04-16 CN CN202010302187.6A patent/CN111552806B/en active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106649849A (en) * | 2016-12-30 | 2017-05-10 | 上海智臻智能网络科技股份有限公司 | Text information base building method and device and searching method, device and system |
CN108846033A (en) * | 2018-05-28 | 2018-11-20 | 北京邮电大学 | The discovery and classifier training method and apparatus of specific area vocabulary |
Also Published As
Publication number | Publication date |
---|---|
CN111552806A (en) | 2020-08-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109977223B (en) | Method for classifying papers by using capsule mechanism-fused graph convolution network | |
CN111104513B (en) | Short text classification method for question and answer service of game platform user | |
CN107392919B (en) | Adaptive genetic algorithm-based gray threshold acquisition method and image segmentation method | |
CN111125434B (en) | Relation extraction method and system based on ensemble learning | |
CN109299668B (en) | Hyperspectral image classification method based on active learning and cluster analysis | |
CN111967258B (en) | Method for constructing coreference resolution model, coreference resolution method and medium | |
CN111177386B (en) | Proposal classification method and system | |
CN110493612B (en) | Barrage information processing method, server and computer readable storage medium | |
Chen et al. | An improved SOM algorithm and its application to color feature extraction | |
CN114998602A (en) | Domain adaptive learning method and system based on low confidence sample contrast loss | |
Zhuang et al. | A handwritten Chinese character recognition based on convolutional neural network and median filtering | |
CN111552806B (en) | Method for unsupervised construction of entity set in building field | |
Duh et al. | Beyond log-linear models: Boosted minimum error rate training for n-best re-ranking | |
CN109192197A (en) | Big data speech recognition system Internet-based | |
CN116304063B (en) | Simple emotion knowledge enhancement prompt tuning aspect-level emotion classification method | |
CN110032642B (en) | Modeling method of manifold topic model based on word embedding | |
CN117033961A (en) | Multi-mode image-text classification method for context awareness | |
CN109871448B (en) | Short text classification method and system | |
CN111639189A (en) | Text graph construction method based on text content features | |
CN115345158A (en) | New word discovery method, device, equipment and storage medium based on unsupervised learning | |
JPH11143875A (en) | Device and method for automatic word classification | |
CN114091469B (en) | Network public opinion analysis method based on sample expansion | |
CN116051924A (en) | Divide-and-conquer defense method for image countermeasure sample | |
CN113705647B (en) | Dual semantic feature extraction method based on dynamic interval | |
CN112529637B (en) | Service demand dynamic prediction method and system based on context awareness |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |