CN111552806B - Method for unsupervised construction of entity set in building field - Google Patents

Method for unsupervised construction of entity set in building field

Info

Publication number
CN111552806B
CN111552806B CN202010302187.6A
Authority
CN
China
Prior art keywords
word
score
words
probability
calculated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010302187.6A
Other languages
Chinese (zh)
Other versions
CN111552806A (en)
Inventor
万里
秦梦瑶
丁玉杨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN202010302187.6A priority Critical patent/CN111552806B/en
Publication of CN111552806A publication Critical patent/CN111552806A/en
Application granted granted Critical
Publication of CN111552806B publication Critical patent/CN111552806B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/237Lexical tools
    • G06F40/242Dictionaries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a method for unsupervised construction of an entity set in the building field, which comprises the following steps: S1, acquiring texts to be screened, and dividing each sentence in the acquired texts into M characters and/or words, wherein M is a positive integer greater than or equal to 1; S2, selecting the first K words with the highest overall score as a candidate word set D, wherein K is a positive integer greater than or equal to 1; and S3, screening out, from the candidate word set D, words whose semantic features are similar to those of the building-field word set as domain words. The invention can classify the words in the acquired text to be screened, retaining domain words and filtering out words that do not belong to the building field.

Description

Method for unsupervised construction of entity set in building field
Technical Field
The invention relates to the technical field of knowledge graphs, and in particular to a method for unsupervised construction of an entity set in the building field.
Background
As a semantic network, the knowledge graph can solve many practical problems when empowered by big data. It can be envisaged, however, that much knowledge has still not broken through the scale bottleneck, and that other types of knowledge representation will likewise be able to solve more practical problems when empowered by big data. The knowledge required by a growing number of application fields exceeds the scope of the knowledge graph and calls for other forms of knowledge (such as production rules, Bayesian networks, and decision trees). Natural language is exceptionally complex: it is ambiguous and diverse, and its semantic understanding is uncertain and context-dependent. The root cause of the difficulty machines have in understanding natural language is that human language understanding rests on human cognitive ability, and the background knowledge formed by human cognitive experience is the fundamental support for human language understanding.
Disclosure of Invention
The invention aims to at least solve the technical problems in the prior art, and particularly creatively provides a method for unsupervised construction of an entity set in the building field.
In order to achieve the above object, the present invention provides a method for unsupervised construction of a building domain entity set, comprising the following steps:
S1, acquiring texts to be screened, and dividing each sentence in the acquired texts into M characters and/or words, wherein M is a positive integer greater than or equal to 1;
S2, selecting the first K words with the highest overall score as a candidate word set D, wherein K is a positive integer greater than or equal to 1;
and S3, screening out, from the candidate word set D, words whose semantic features are similar to those of the building-field word set as domain words.
In a preferred embodiment of the present invention, step S1 further includes: mapping all the characters and/or words obtained by segmentation to a word vector space.
In a preferred embodiment of the present invention, step S2 includes the following steps:
S21, judging whether the calculated cohesion score is greater than or equal to a preset cohesion score threshold:
if the calculated cohesion score is greater than or equal to the preset threshold, executing step S22;
if the calculated cohesion score is smaller than the preset threshold, discarding the word;
S22, judging whether the calculated left adjacency score is greater than or equal to a preset left adjacency score threshold:
if the calculated left adjacency score is greater than or equal to the preset threshold, executing step S23;
if the calculated left adjacency score is smaller than the preset threshold, discarding the word;
S23, judging whether the calculated right adjacency score is greater than or equal to a preset right adjacency score threshold:
if the calculated right adjacency score is greater than or equal to the preset threshold, calculating the overall score of the word;
if the calculated right adjacency score is smaller than the preset threshold, discarding the word.
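The three judgments of steps S21–S23 form a short-circuiting cascade: a word is dropped as soon as one score misses its preset threshold, and only survivors reach the overall-score computation. A minimal Python sketch (the dictionary keys and threshold names are illustrative, not from the patent):

```python
def passes_cascade(stats, t_cohesion, t_left, t_right):
    """Apply the S21-S23 checks in order; a word is discarded as soon as
    any of its scores falls below the corresponding preset threshold."""
    if stats['cohesion'] < t_cohesion:    # S21: cohesion score check
        return False
    if stats['left'] < t_left:            # S22: left adjacency score check
        return False
    if stats['right'] < t_right:          # S23: right adjacency score check
        return False
    return True                           # survivor: compute overall score next
```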
In a preferred embodiment of the present invention, the cohesion score is calculated in step S21 by:
MI(X, Y) = log2( P(X, Y) / (P(X) · P(Y)) ),
wherein P(X, Y) represents the joint probability of the words X and Y appearing together in the text to be screened;
P(X) represents the probability of the word X appearing in the text to be screened;
P(Y) represents the probability of the word Y appearing in the text to be screened;
MI(X, Y) represents the cohesion score of the words X and Y;
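Given the definitions above, the cohesion score is the pointwise mutual information of the two components; a minimal sketch operating on pre-estimated probabilities:

```python
import math

def cohesion_score(p_xy, p_x, p_y):
    """MI(X, Y) = log2(P(X, Y) / (P(X) * P(Y))).

    p_xy: joint probability of X and Y occurring together;
    p_x, p_y: marginal probabilities of X and Y in the corpus.
    """
    return math.log2(p_xy / (p_x * p_y))
```

If X and Y co-occur exactly as often as independence predicts (P(X, Y) = P(X)·P(Y)), the score is 0; stronger-than-chance co-occurrence pushes it above 0, which is why a high value signals a cohesive candidate word.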
the left adjacency score is calculated in step S22 by:
E_L(W) = −∑_{a∈A} P(aW|W) · log2 P(aW|W),
wherein A represents the set formed by all characters appearing on the left side of the word W in the text to be screened;
a represents a character in the set A;
aW represents the word formed by placing character a to the left of word W;
P(aW|W) represents the conditional probability that aW occurs given that W occurs;
E_L(W) represents the left adjacency score;
the method of calculating the right adjacency score in step S23 is:
Figure BDA0002454412600000031
b represents a set formed by all characters appearing on the right side of the character W in the text to be screened;
b represents a certain word in the vocabulary set B;
wb represents the word with word b to the right of word W;
p (Wb | W) represents a conditional probability of the word Wb occurring under the condition of the word W;
ER(W) represents a right adjacency score.
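Both adjacency scores are the entropy of a candidate word's neighbor distribution, so one helper covers E_L(W) and E_R(W); a sketch (collecting the neighbor characters from the corpus is assumed to happen elsewhere):

```python
import math
from collections import Counter

def adjacency_entropy(neighbor_chars):
    """Entropy of the distribution of characters adjacent to a word W.

    Pass the characters observed immediately to the left of W to get
    E_L(W), or those to the right to get E_R(W)."""
    counts = Counter(neighbor_chars)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A word whose neighbors are varied (high entropy) has free boundaries and is a plausible standalone word; a word always preceded or followed by the same character (entropy near 0) is probably a fragment.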
The overall score is calculated in step S23 by:
Score = λ_1 · MI(X, Y) + λ_2 · E_L(W) + λ_3 · E_R(W),
wherein λ_1 is the adjustment coefficient for the cohesion score;
MI(X, Y) represents the cohesion score of the words X and Y;
λ_2 is the adjustment coefficient for the left adjacency score;
E_L(W) represents the left adjacency score;
λ_3 is the adjustment coefficient for the right adjacency score;
E_R(W) represents the right adjacency score.
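The overall score is a plain weighted sum of the three statistics; a sketch with the adjustment coefficients exposed as parameters (the equal default weights are illustrative, the patent leaves their values open):

```python
def overall_score(mi, e_left, e_right, lam1=1.0, lam2=1.0, lam3=1.0):
    """Score = lam1*MI(X,Y) + lam2*E_L(W) + lam3*E_R(W)."""
    return lam1 * mi + lam2 * e_left + lam3 * e_right
```

The top-K words by this score then form the candidate word set D of step S2.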
In a preferred embodiment of the present invention, in step S3, the method for screening out the words in the candidate word set D whose semantic features are similar to those of the building-field word set is:
P(Z) = ∫ P(Z|θ) P(θ) dθ,
wherein Z represents a word in the candidate word set D;
θ represents a model parameter;
P(Z|θ) represents the conditional probability of the word Z occurring given the parameter θ;
P(θ) represents the prior probability density function;
P(Z) represents the probability of the word Z occurring under the parameter θ;
P(D_C) = ∫ ∏_i P(Z_i|θ) P(θ) dθ,
wherein D_C represents the set formed by the words in the candidate word set D that belong to the building-field word set;
Z_i represents a word in the word set D_C;
P(Z_i|θ) represents the conditional probability of the word Z_i occurring given the parameter θ;
P(θ) represents the prior probability density function;
P(D_C) represents the probability of the word set D_C occurring;
P(Z|D_C) = ∫ P(Z|θ) P(θ|D_C) dθ,
wherein P(Z|θ) represents the conditional probability of the word Z occurring given the parameter θ;
P(θ|D_C) represents the posterior probability density function;
P(Z|D_C) represents the probability that the word Z belongs to the word set D_C;
P(θ|D_C) = P(D_C|θ) P(θ) / P(D_C),
wherein P(D_C|θ) represents the likelihood function;
P(θ) represents the prior probability density function;
P(D_C) represents the probability of the word set D_C occurring;
P(θ|D_C) represents the posterior probability density function;
Score(Z) = log( P(Z|D_C) / P(Z) ),
wherein P(Z|D_C) represents the probability that the word Z belongs to the word set D_C;
P(Z) represents the probability of the word Z occurring under the parameter θ;
Score(Z) represents the final score that the word Z belongs to the building-field word set D_C;
if the final score of the word Z belonging to the building-field word set D_C is greater than or equal to a preset score value, the word Z in the candidate word set belongs to the building-field word set D_C.
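Once P(Z|D_C) and P(Z) have been estimated (in practice the integrals above would be evaluated with a concrete parametric model for θ), the final decision can be sketched as a log-ratio test. This assumes Score(Z) compares those two probabilities, as the surrounding definitions suggest, and the default threshold is purely illustrative:

```python
import math

def final_score(p_z_given_dc, p_z):
    """Score(Z) = log(P(Z | D_C) / P(Z)): positive when the word Z is more
    likely under the building-field word set D_C than in the background."""
    return math.log(p_z_given_dc / p_z)

def is_domain_word(p_z_given_dc, p_z, preset=0.0):
    """Accept Z into D_C when its final score reaches the preset value."""
    return final_score(p_z_given_dc, p_z) >= preset
```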
In a preferred embodiment of the present invention, step S1 includes the following steps:
S = {s_1, s_2, s_3, …, s_n}, wherein S represents the text to be screened and s_i represents the i-th sentence in the text to be screened, i being a positive integer less than or equal to n;
s_i = V_1 V_2 V_3 … V_M, wherein V_j represents the j-th character in the i-th sentence of the text to be screened, j being a positive integer less than or equal to M;
s_i′ = {V_1, V_2, V_3, …, V_M}, wherein s_i′ means that the i-th sentence is divided into M words.
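The segmentation of step S1 can be sketched as splitting S into sentences on end punctuation and each sentence into its individual characters (a production system would use a proper word segmenter; character-level splitting keeps the sketch dependency-free):

```python
import re

def split_to_sentences_and_chars(text):
    """S -> [s_1, ..., s_n]; each sentence s_i -> [V_1, ..., V_M]."""
    sentences = [s for s in re.split(r'[。！？.!?]+', text) if s]
    return [list(s) for s in sentences]
```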
In a preferred embodiment of the present invention, the word vector space is expressed as:
e_i = W^wrd · v_i,
wherein e_i represents the low-dimensional vectorized representation of a word;
W^wrd is the word parameter matrix obtained through training;
v_i represents the high-dimensional vector input into the computer.
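If v_i is taken to be a one-hot vocabulary-index vector (an assumption; the patent only calls it a high-dimensional input vector), then e_i = W^wrd · v_i reduces to a column lookup in the trained matrix; a sketch:

```python
import numpy as np

def embed(W_wrd, word_index, vocab_size):
    """e_i = W_wrd @ v_i with v_i one-hot at word_index (assumed encoding)."""
    v = np.zeros(vocab_size)
    v[word_index] = 1.0
    return W_wrd @ v          # equivalent to the column W_wrd[:, word_index]
```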
In summary, by adopting the above technical scheme, the invention can classify the words in the acquired text to be screened, retaining domain words and filtering out words that do not belong to the building field.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic block diagram of the process of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
The invention provides a method for unsupervised construction of an entity set in the building field, which comprises the following steps:
S1, acquiring texts to be screened, and dividing each sentence in the acquired texts into M characters and/or words, wherein M is a positive integer greater than or equal to 1;
S2, selecting the first K words with the highest overall score as a candidate word set D, wherein K is a positive integer greater than or equal to 1;
S3, extracting terms from normative documents of the building field, expressing the semantic features of building-field words with word vectors, taking the mean of the feature vectors of words with the same attribute as the building-field word set, screening out, from the candidate word set D, words whose semantic features are similar to those of the building-field word set as domain words, and filtering out non-domain words irrelevant to the building field.
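One way to realize the "mean of feature vectors as the building-field word set" idea in S3 is to average the term vectors and keep candidates close to that mean; a sketch using cosine similarity (the similarity measure and the 0.5 threshold are illustrative choices, not specified by the patent):

```python
import numpy as np

def domain_mean(term_vectors):
    """Mean feature vector of terms extracted from building-field
    normative documents."""
    return np.mean(np.asarray(term_vectors, dtype=float), axis=0)

def screen(candidates, mean_vec, threshold=0.5):
    """candidates: dict mapping word -> vector. Keep words whose cosine
    similarity to the building-field mean vector is at least threshold."""
    def cos(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    return [w for w, vec in candidates.items()
            if cos(np.asarray(vec, dtype=float), mean_vec) >= threshold]
```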
In a preferred embodiment of the present invention, step S1 further includes: mapping all the characters and/or words obtained by segmentation to a word vector space.
In a preferred embodiment of the present invention, step S2 includes the following steps:
S21, judging whether the calculated cohesion score is greater than or equal to a preset cohesion score threshold:
if the calculated cohesion score is greater than or equal to the preset threshold, executing step S22;
if the calculated cohesion score is smaller than the preset threshold, discarding the word;
S22, judging whether the calculated left adjacency score is greater than or equal to a preset left adjacency score threshold:
if the calculated left adjacency score is greater than or equal to the preset threshold, executing step S23;
if the calculated left adjacency score is smaller than the preset threshold, discarding the word;
S23, judging whether the calculated right adjacency score is greater than or equal to a preset right adjacency score threshold:
if the calculated right adjacency score is greater than or equal to the preset threshold, calculating the overall score of the word;
if the calculated right adjacency score is smaller than the preset threshold, discarding the word.
In a preferred embodiment of the present invention, the cohesion score is calculated in step S21 by:
MI(X, Y) = log2( P(X, Y) / (P(X) · P(Y)) ),
wherein P(X, Y) represents the joint probability of the words X and Y appearing together in the text to be screened;
P(X) represents the probability of the word X appearing in the text to be screened;
P(Y) represents the probability of the word Y appearing in the text to be screened;
MI(X, Y) represents the cohesion score of the words X and Y;
the left adjacency score is calculated in step S22 by:
E_L(W) = −∑_{a∈A} P(aW|W) · log2 P(aW|W),
wherein A represents the set formed by all characters appearing on the left side of the word W in the text to be screened;
a represents a character in the set A;
aW represents the word formed by placing character a to the left of word W;
P(aW|W) represents the conditional probability that aW occurs given that W occurs;
E_L(W) represents the left adjacency score;
the method of calculating the right adjacency score in step S23 is:
Figure BDA0002454412600000073
b represents a set formed by all characters appearing on the right side of the character W in the text to be screened;
b represents a certain word in the vocabulary set B;
wb represents the word with word b to the right of word W;
p (Wb | W) represents a conditional probability of the word Wb occurring under the condition of the word W;
ER(W) represents a right adjacency score.
The overall score is calculated in step S23 by:
Score = λ_1 · MI(X, Y) + λ_2 · E_L(W) + λ_3 · E_R(W),
wherein λ_1 is the adjustment coefficient for the cohesion score;
MI(X, Y) represents the cohesion score of the words X and Y;
λ_2 is the adjustment coefficient for the left adjacency score;
E_L(W) represents the left adjacency score;
λ_3 is the adjustment coefficient for the right adjacency score;
E_R(W) represents the right adjacency score.
In a preferred embodiment of the present invention, in step S3, the method for screening out the words in the candidate word set D whose semantic features are similar to those of the building-field word set is:
P(Z) = ∫ P(Z|θ) P(θ) dθ,
wherein Z represents a word in the candidate word set D, i.e., a word in (D − D_C); D_C represents the set formed by the words in the candidate word set D that belong to the building-field word set;
θ represents a model parameter;
P(Z|θ) represents the conditional probability of the word Z occurring given the parameter θ;
P(θ) represents the prior probability density function;
P(Z) represents the probability of the word Z occurring under the parameter θ;
P(D_C) = ∫ ∏_i P(Z_i|θ) P(θ) dθ,
wherein D_C represents the set formed by the words in the candidate word set D that belong to the building-field word set;
Z_i represents a word in the word set D_C;
P(Z_i|θ) represents the conditional probability of the word Z_i occurring given the parameter θ;
P(θ) represents the prior probability density function;
P(D_C) represents the probability of the word set D_C occurring;
P(Z|D_C) = ∫ P(Z|θ) P(θ|D_C) dθ,
wherein P(Z|θ) represents the conditional probability of the word Z occurring given the parameter θ;
P(θ|D_C) represents the posterior probability density function;
P(Z|D_C) represents the probability that the word Z belongs to the word set D_C;
P(θ|D_C) = P(D_C|θ) P(θ) / P(D_C),
wherein P(D_C|θ) represents the likelihood function;
P(θ) represents the prior probability density function;
P(D_C) represents the probability of the word set D_C occurring;
P(θ|D_C) represents the posterior probability density function;
Score(Z) = log( P(Z|D_C) / P(Z) ),
wherein P(Z|D_C) represents the probability that the word Z belongs to the word set D_C;
P(Z) represents the probability of the word Z occurring under the parameter θ;
Score(Z) represents the final score that the word Z belongs to the building-field word set D_C;
if the final score of the word Z belonging to the building-field word set D_C is greater than or equal to a preset score value, the word Z in the candidate word set belongs to the building-field word set D_C.
In a preferred embodiment of the present invention, step S1 includes the following steps:
S = {s_1, s_2, s_3, …, s_n}, wherein S represents the text to be screened and s_i represents the i-th sentence in the text to be screened, i being a positive integer less than or equal to n;
s_i = V_1 V_2 V_3 … V_M, wherein V_j represents the j-th character in the i-th sentence of the text to be screened, j being a positive integer less than or equal to M;
s_i′ = {V_1, V_2, V_3, …, V_M}, wherein s_i′ means that the i-th sentence is divided into M words.
In a preferred embodiment of the present invention, the word vector space is expressed as:
e_i = W^wrd · v_i,
wherein e_i represents the low-dimensional vectorized representation of a word;
W^wrd is the word parameter matrix obtained through training;
v_i represents the high-dimensional vector input into the computer.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims (6)

1. A method for unsupervised construction of a building field entity set is characterized by comprising the following steps:
S1, acquiring texts to be screened, and dividing each sentence in the acquired texts into M characters and/or words, wherein M is a positive integer greater than or equal to 1;
S2, selecting the first K words with the highest overall score as a candidate word set D, wherein K is a positive integer greater than or equal to 1;
S3, screening out, from the candidate word set D, words whose semantic features are similar to those of the building-field word set as domain words; the method for screening out the words in the candidate word set D whose semantic features are similar to those of the building-field word set comprises:
P(Z) = ∫ P(Z|θ) P(θ) dθ,
wherein Z represents a word in the candidate word set D;
θ represents a model parameter;
P(Z|θ) represents the conditional probability of the word Z occurring given the parameter θ;
P(θ) represents the prior probability density function;
P(Z) represents the probability of the word Z occurring under the parameter θ;
P(D_C) = ∫ ∏_i P(Z_i|θ) P(θ) dθ,
wherein D_C represents the set formed by the words in the candidate word set D that belong to the building-field word set;
Z_i represents a word in the word set D_C;
P(Z_i|θ) represents the conditional probability of the word Z_i occurring given the parameter θ;
P(θ) represents the prior probability density function;
P(D_C) represents the probability of the word set D_C occurring;
P(Z|D_C) = ∫ P(Z|θ) P(θ|D_C) dθ,
wherein P(Z|θ) represents the conditional probability of the word Z occurring given the parameter θ;
P(θ|D_C) represents the posterior probability density function;
P(Z|D_C) represents the probability that the word Z belongs to the word set D_C;
P(θ|D_C) = P(D_C|θ) P(θ) / P(D_C),
wherein P(D_C|θ) represents the likelihood function;
P(θ) represents the prior probability density function;
P(D_C) represents the probability of the word set D_C occurring;
P(θ|D_C) represents the posterior probability density function;
Score(Z) = log( P(Z|D_C) / P(Z) ),
wherein P(Z|D_C) represents the probability that the word Z belongs to the word set D_C;
P(Z) represents the probability of the word Z occurring under the parameter θ;
Score(Z) represents the final score that the word Z belongs to the building-field word set D_C;
if the final score of the word Z belonging to the building-field word set D_C is greater than or equal to a preset score value, the word Z in the candidate word set belongs to the building-field word set D_C.
2. The unsupervised construction method of building domain entity set of claim 1, further comprising, in step S1: mapping all the characters and/or words obtained by segmentation to a word vector space.
3. The unsupervised construction method of a set of building domain entities according to claim 1, characterized in that in step S2, it comprises the following steps:
S21, judging whether the calculated cohesion score is greater than or equal to a preset cohesion score threshold:
if the calculated cohesion score is greater than or equal to the preset threshold, executing step S22;
if the calculated cohesion score is smaller than the preset threshold, discarding the word;
S22, judging whether the calculated left adjacency score is greater than or equal to a preset left adjacency score threshold:
if the calculated left adjacency score is greater than or equal to the preset threshold, executing step S23;
if the calculated left adjacency score is smaller than the preset threshold, discarding the word;
S23, judging whether the calculated right adjacency score is greater than or equal to a preset right adjacency score threshold:
if the calculated right adjacency score is greater than or equal to the preset threshold, calculating the overall score of the word;
if the calculated right adjacency score is smaller than the preset threshold, discarding the word.
4. The method for unsupervised construction of a set of building field entities according to claim 3, wherein the cohesion score is calculated in step S21 by:
MI(X, Y) = log2( P(X, Y) / (P(X) · P(Y)) ),
wherein P(X, Y) represents the joint probability of the words X and Y appearing together in the text to be screened;
P(X) represents the probability of the word X appearing in the text to be screened;
P(Y) represents the probability of the word Y appearing in the text to be screened;
MI(X, Y) represents the cohesion score of the words X and Y;
the left adjacency score is calculated in step S22 by:
E_L(W) = −∑_{a∈A} P(aW|W) · log2 P(aW|W),
wherein A represents the set formed by all characters appearing on the left side of the word W in the text to be screened;
a represents a character in the set A;
aW represents the word formed by placing character a to the left of word W;
P(aW|W) represents the conditional probability that aW occurs given that W occurs;
E_L(W) represents the left adjacency score;
the right adjacency score is calculated in step S23 by:
E_R(W) = −∑_{b∈B} P(Wb|W) · log2 P(Wb|W),
wherein B represents the set formed by all characters appearing on the right side of the word W in the text to be screened;
b represents a character in the set B;
Wb represents the word formed by placing character b to the right of word W;
P(Wb|W) represents the conditional probability that Wb occurs given that W occurs;
E_R(W) represents the right adjacency score;
the overall score is calculated in step S23 by:
Score = λ_1 · MI(X, Y) + λ_2 · E_L(W) + λ_3 · E_R(W),
wherein λ_1 is the adjustment coefficient for the cohesion score;
MI(X, Y) represents the cohesion score of the words X and Y;
λ_2 is the adjustment coefficient for the left adjacency score;
E_L(W) represents the left adjacency score;
λ_3 is the adjustment coefficient for the right adjacency score;
E_R(W) represents the right adjacency score.
5. The unsupervised construction method of a set of building domain entities according to claim 1, characterized in that in step S1, it comprises the following steps:
S = {s_1, s_2, s_3, …, s_n}, wherein S represents the text to be screened and s_i represents the i-th sentence in the text to be screened, i being a positive integer less than or equal to n;
s_i = V_1 V_2 V_3 … V_M, wherein V_j represents the j-th character in the i-th sentence of the text to be screened, j being a positive integer less than or equal to M;
s_i′ = {V_1, V_2, V_3, …, V_M}, wherein s_i′ means that the i-th sentence is divided into M words.
6. The method for unsupervised construction of a set of building field entities according to claim 2, wherein the word vector space is expressed as:
e_i = W^wrd · v_i,
wherein e_i represents the low-dimensional vectorized representation of a word;
W^wrd is the word parameter matrix obtained through training;
v_i represents the high-dimensional vector input into the computer.
CN202010302187.6A 2020-04-16 2020-04-16 Method for unsupervised construction of entity set in building field Active CN111552806B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010302187.6A CN111552806B (en) 2020-04-16 2020-04-16 Method for unsupervised construction of entity set in building field

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010302187.6A CN111552806B (en) 2020-04-16 2020-04-16 Method for unsupervised construction of entity set in building field

Publications (2)

Publication Number Publication Date
CN111552806A CN111552806A (en) 2020-08-18
CN111552806B (en) 2021-11-02

Family

ID=72007475

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010302187.6A Active CN111552806B (en) 2020-04-16 2020-04-16 Method for unsupervised construction of entity set in building field

Country Status (1)

Country Link
CN (1) CN111552806B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649849A (en) * 2016-12-30 2017-05-10 上海智臻智能网络科技股份有限公司 Text information base building method and device and searching method, device and system
CN108846033A (en) * 2018-05-28 2018-11-20 北京邮电大学 The discovery and classifier training method and apparatus of specific area vocabulary

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102646100B (en) * 2011-02-21 2016-02-24 腾讯科技(深圳)有限公司 Domain term acquisition methods and system
CN103092966A (en) * 2013-01-23 2013-05-08 盘古文化传播有限公司 Vocabulary mining method and device
CN107577739B (en) * 2017-08-28 2020-04-10 广东惠禾科技发展有限公司 Semi-supervised domain word mining and classifying method and equipment
CN110162681B (en) * 2018-10-08 2023-04-18 腾讯科技(深圳)有限公司 Text recognition method, text processing method, text recognition device, text processing device, computer equipment and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649849A (en) * 2016-12-30 2017-05-10 上海智臻智能网络科技股份有限公司 Text information base building method and device and searching method, device and system
CN108846033A (en) * 2018-05-28 2018-11-20 北京邮电大学 The discovery and classifier training method and apparatus of specific area vocabulary

Also Published As

Publication number Publication date
CN111552806A (en) 2020-08-18

Similar Documents

Publication Publication Date Title
CN109977223B (en) Method for classifying papers by using capsule mechanism-fused graph convolution network
CN111104513B (en) Short text classification method for question and answer service of game platform user
CN107392919B (en) Adaptive genetic algorithm-based gray threshold acquisition method and image segmentation method
CN111125434B (en) Relation extraction method and system based on ensemble learning
CN109299668B (en) Hyperspectral image classification method based on active learning and cluster analysis
CN111967258B (en) Method for constructing coreference resolution model, coreference resolution method and medium
CN111177386B (en) Proposal classification method and system
CN110493612B (en) Barrage information processing method, server and computer readable storage medium
Chen et al. An improved SOM algorithm and its application to color feature extraction
CN114998602A (en) Domain adaptive learning method and system based on low confidence sample contrast loss
Zhuang et al. A handwritten Chinese character recognition based on convolutional neural network and median filtering
CN111552806B (en) Method for unsupervised construction of entity set in building field
Duh et al. Beyond log-linear models: Boosted minimum error rate training for n-best re-ranking
CN109192197A (en) Big data speech recognition system Internet-based
CN116304063B (en) Simple emotion knowledge enhancement prompt tuning aspect-level emotion classification method
CN110032642B (en) Modeling method of manifold topic model based on word embedding
CN117033961A (en) Multi-mode image-text classification method for context awareness
CN109871448B (en) Short text classification method and system
CN111639189A (en) Text graph construction method based on text content features
CN115345158A (en) New word discovery method, device, equipment and storage medium based on unsupervised learning
JPH11143875A (en) Device and method for automatic word classification
CN114091469B (en) Network public opinion analysis method based on sample expansion
CN116051924A (en) Divide-and-conquer defense method for image countermeasure sample
CN113705647B (en) Dual semantic feature extraction method based on dynamic interval
CN112529637B (en) Service demand dynamic prediction method and system based on context awareness

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant