CN109657242B - Automatic eliminating system for Chinese redundancy meaning items - Google Patents


Info

Publication number
CN109657242B
Authority
CN
China
Prior art keywords
syn
fat
items
term
sense
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811542048.XA
Other languages
Chinese (zh)
Other versions
CN109657242A (en)
Inventor
符建辉
Current Assignee
Zhongke Guoli Zhenjiang Intelligent Technology Co ltd
Original Assignee
Zhongke Guoli Zhenjiang Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Zhongke Guoli Zhenjiang Intelligent Technology Co ltd filed Critical Zhongke Guoli Zhenjiang Intelligent Technology Co ltd
Priority to CN201811542048.XA priority Critical patent/CN109657242B/en
Publication of CN109657242A publication Critical patent/CN109657242A/en
Application granted granted Critical
Publication of CN109657242B publication Critical patent/CN109657242B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an automatic elimination system for Chinese redundant sense items, comprising module A: labeling the sense items of the segmented training corpus TΓ and analyzing sense-item correlation; module B: eliminating redundant sense items by automatically detecting business-independent sense items; module C: eliminating redundant sense items by comparative analysis of multiple term near-classes; module D: eliminating redundant sense items by comparing term near-classes with term parent-classes. By means of artificial-intelligence techniques such as association analysis and statistical analysis, the invention provides an efficient system and method for automatically eliminating Chinese redundant sense items, thereby improving the accuracy and efficiency of Chinese sentence analysis.

Description

Automatic eliminating system for Chinese redundancy meaning items
Technical Field
The invention relates to the fields of Chinese language understanding, automatic text analysis, Chinese machine learning and the like, and in particular to a system for automatically eliminating Chinese redundant sense items.
Background
With the rapid development of artificial intelligence technology, industry demand for applications centered on natural language keeps growing stronger. In analyzing a natural language sentence there are two basic and important tasks: cutting the sentence into words, and labeling each resulting word with its sense items. The former task is called word segmentation for short, and the latter is called sense-item labeling.
In labeling the sense items of a natural language sentence (sentence for short) S, a common difficulty is how to label the sense items of the words in S accurately. The problem is more serious in a specific industry application, because in a specific industry most words carry several possible senses, compiled by different business personnel. For lack of a unified standard, it is quite common for one word in a sentence to be labeled with multiple sense items, some of which are irrelevant and thus become redundant sense items.
For example, the sentence S = "how to get a mobile phone card" yields two segmentation results: TS1, in which "mobile phone card" is segmented as a single word, and TS2, in which it is split into "mobile phone" and "card". Labeling their sense items may give: TS1 = mobile phone card{card near-class}{element parent-class}/how{how near-class}{query-word parent-class}/handle{handle near-class, office near-class}{}/; TS2 = mobile phone{mobile-phone near-class}{product parent-class, device parent-class, movie parent-class}/card{card near-class, cartoon near-class}{element parent-class}/how{how near-class}{query-word parent-class}/handle{handle near-class, office near-class}{}/, where "handle{handle near-class, office near-class}{}" means that the word "handle" is labeled with two sense items, the handle near-class and the office near-class. However, it is easy to determine that the sense of "handle" in TS1 and TS2 does not include the office near-class; the correct label should contain only the handle near-class. Redundant sense items not only reduce the analysis precision of Chinese sentences but also slow down their processing.
Although Chinese sense-item labeling and redundant-sense-item elimination have been studied for many years, existing methods still suffer from two closely related problems:
(1) Low precision of sense-item labeling: correct understanding of a Chinese sentence depends on its sense-item labels; if those labels are wrong, the sentence will be misunderstood.
(2) Redundant sense-item labeling: when labeling a sentence, irrelevant sense items are often attached to its words, leaving some words with redundant sense items; the cause of this problem is that these words appear in different semantic types.
Disclosure of Invention
The technical problem the invention aims to solve is as follows: aiming at the low precision of sense-item labeling of Chinese sentences, redundant sense-item labeling, and similar problems, the invention provides an efficient system for automatically eliminating Chinese redundant sense items by means of artificial-intelligence techniques such as association analysis and statistical analysis, thereby improving the precision and efficiency of Chinese sentence analysis.
To solve the above problems, the invention adopts the following technical scheme. The automatic Chinese redundant-sense-item elimination system comprises the following modules:
module A: labeling the sense items of the segmented training corpus TΓ and analyzing sense-item correlation;
module B: eliminating redundant sense items by automatically detecting business-independent sense items;
module C: eliminating redundant sense items by comparative analysis of multiple term near-classes;
module D: eliminating redundant sense items by comparing term near-classes with term parent-classes.
The implementation steps of module A are as follows: let the segmented training corpus be TΓ = {TS_1, TS_2, ..., TS_n}, where each TS_i (1 ≤ i ≤ n) has the form TS_i = t_i1{}{}/t_i2{}{}/.../t_ij{}{}/.../t_ik{}{}/; introduce a sense set sense_set, which is a set, initially empty; for each TS_i in TΓ and each t_ij{}{} in TS_i, perform the following steps:
Step A-1: look up in the near-class dictionary the term near-classes to which t_ij belongs, store them in the set t_ij_syn, and insert t_ij_syn into the first braces of t_ij{}{}, forming t_ij{t_ij_syn}{};
Step A-2: sense_set = sense_set ∪ t_ij_syn;
Step A-3: look up in the parent-class dictionary the term parent-classes to which t_ij belongs, store them in the set t_ij_fat, and insert t_ij_fat into the second braces of t_ij{t_ij_syn}{}, forming t_ij{t_ij_syn}{t_ij_fat};
Step A-4: sense_set = sense_set ∪ t_ij_fat.
The implementation steps of module B are as follows:
Step B-1: for any term near-class or term parent-class sf in sense_set, the support set of sf in Γ is computed and denoted supp_set(Γ, sf), i.e. supp_set(Γ, sf) = {S | S ∈ Γ and S contains at least one term of sf};
Step B-2: for each TS_i in TΓ, for any t_ij{t_ij_syn}{t_ij_fat} in TS_i, and for any two elements t_ij_syn_1 and t_ij_syn_2 of t_ij_syn: if [first support-set condition, given only as a formula image in the original] and [second support-set condition, also given as a formula image], then delete t_ij_syn_2 from t_ij_syn in t_ij{t_ij_syn}{t_ij_fat}, i.e. t_ij_syn_2 is a redundant sense item of t_ij;
Step B-3: for each TS_i in TΓ, for any t_ij{t_ij_syn}{t_ij_fat} in TS_i, and for any two elements t_ij_fat_1 and t_ij_fat_2 of t_ij_fat: if [the corresponding support-set conditions, given as formula images], then delete t_ij_fat_2 from t_ij_fat in t_ij{t_ij_syn}{t_ij_fat}, i.e. t_ij_fat_2 is a redundant sense item of t_ij.
The implementation steps of module C are as follows:
Step C-1: for each TS_i in TΓ, for any t_ij{t_ij_syn}{t_ij_fat} in TS_i, and for any two elements t_ij_syn_1 and t_ij_syn_2 of t_ij_syn: if [support-set condition, given only as a formula image in the original], then delete t_ij_syn_1 from t_ij_syn in t_ij{t_ij_syn}{t_ij_fat}, i.e. t_ij_syn_1 is a redundant sense item of t_ij;
Step C-2: for each TS_i in TΓ, for any t_ij{t_ij_syn}{t_ij_fat} in TS_i, and for any two elements t_ij_syn_1 and t_ij_syn_2 of t_ij_syn: if [first condition, given as a formula image], then the following is performed: if [second condition, given as a formula image], then delete t_ij_syn_2 from t_ij_syn in t_ij{t_ij_syn}{t_ij_fat} (i.e. t_ij_syn_2 is a redundant sense item of t_ij).
The implementation method of module D is as follows:
for each TS_i in TΓ, for any t_ij{t_ij_syn}{t_ij_fat} in TS_i, for any element t_ij_syn_1 of t_ij_syn and any element t_ij_fat_1 of t_ij_fat: if [containment condition, given only as a formula image in the original] and [support-set condition, also given as a formula image], then delete t_ij_fat_1 from t_ij_fat in t_ij{t_ij_syn}{t_ij_fat}.
Beneficial effects: the invention provides a system and method for automatically eliminating Chinese redundant sense items. Five Chinese application scenarios were selected, including package consultation, activity consultation, fault consultation, weather consultation and flight consultation, and 10000 Chinese sentences were collected for a test of automatic elimination of Chinese redundant sense items. Then 1000 test results were checked to examine the effect of the invention on automatic elimination of redundant sense items. The results show that up to 88.1% of redundant sense items were accurately eliminated, i.e. the automatic elimination precision for redundant sense items reaches 88.1%.
Drawings
FIG. 1 is a workflow diagram of a system and method for automatic elimination of Chinese redundancy meaning items.
Detailed Description
To state the invention more clearly, several important terms are introduced and explained below:
(1) Term near-class, term parent-class, term sense item: generally, any term has synonyms or near-synonyms. In the invention a set is used to store the synonymous or near-sense terms of a term; for example, for the word "handle", the handle near-class = {handle, transact, ...} represents the terms synonymous with or close to "handle". As a concept, most terms also have lower (more specific) terms. For example, "product" is a term that denotes a concept, and its lower terms include mobile phone, refrigerator, washing machine, etc. For this reason, the invention uses a set to store the lower terms of a term: for "product", the product parent-class = {mobile phone, refrigerator, washing machine, ...} represents the lower terms of "product", and the local-bank parent-class = {Beijing Bank, Beijing City, Nanjing Bank, Nanjing, Ningbo Bank, Ningbo, ...} represents the lower terms of "local bank". "Term sense item" is the generic name covering term near-classes and term parent-classes; for example, the handle near-class is a term sense item, and the product parent-class is a term sense item.
(2) Near-class dictionary, parent-class dictionary: a near-class dictionary is a set of tuples (term, term near-class). For example, near-class dictionary = {(handle, {handle, transact, ...}), ...}. Similarly, a parent-class dictionary is a set of tuples (term, term parent-class). For example, parent-class dictionary = {(product, {mobile phone, refrigerator, washing machine, ...}), (local bank, {Beijing Bank, Beijing City, Nanjing Bank, Nanjing, Ningbo Bank, Ningbo, ...})}.
(3) Word-segmentation dictionary, word segmentation: a word-segmentation dictionary is the set of terms formed by the words appearing in the near-class dictionary and the parent-class dictionary. For the two dictionaries in (2) above, the word-segmentation dictionary they form = {handle, transact, product, mobile phone, refrigerator, washing machine, Beijing Bank, Beijing City, Nanjing Bank, Nanjing, Ningbo Bank, Ningbo, ...}. Word segmentation is the process of cutting a sentence S into words using the words of the word-segmentation dictionary. For example, S = "how to get a mobile phone card" is segmented into TS = "mobile phone card{}{}/how{}{}/handle{}{}/", where "{}" indicates that the sense items of the term are still undetermined and therefore an empty set, and "/" is the separator between segmented terms.
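The patent does not specify the matching strategy used for segmentation; a greedy longest-match scan over a word-segmentation dictionary is one common choice, sketched below with a toy dictionary (the sentence and entries are illustrative, not the patent's actual data):

```python
def segment(sentence, lexicon):
    """Greedy longest-match word segmentation: at each position, take the
    longest dictionary term that matches, then emit every token followed by
    two empty brace slots, as in the patent's t{}{}/ notation.
    Single characters not in the lexicon are emitted as-is (fallback)."""
    max_len = max(len(t) for t in lexicon)
    tokens, i = [], 0
    while i < len(sentence):
        for L in range(min(max_len, len(sentence) - i), 0, -1):
            cand = sentence[i:i + L]
            if cand in lexicon or L == 1:
                tokens.append(cand)
                i += L
                break
    return "/".join(f"{t}{{}}{{}}" for t in tokens) + "/"

# toy word-segmentation dictionary for "how to get a mobile phone card"
lexicon = {"手机卡", "怎么", "办理"}
print(segment("手机卡怎么办理", lexicon))  # 手机卡{}{}/怎么{}{}/办理{}{}/
```

Longest-match is used here so that "mobile phone card" wins over "mobile phone" + "card" when both are in the dictionary; the patent itself keeps both segmentations as separate candidates.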
(4) Training corpus, segmented training corpus: the training corpus Γ = {S_1, S_2, ..., S_n} is a set of Chinese sentences, where each S_i (1 ≤ i ≤ n) is a Chinese sentence. The segmented training corpus TΓ = {TS_1, TS_2, ..., TS_n} is the set obtained by word segmentation, in which TS_i (1 ≤ i ≤ n) is the segmented string of S_i.
(5) Set intersection, union, difference and cardinality: given two sets S_1 and S_2, the intersection of S_1 and S_2, denoted S_1 ∩ S_2, is the set of elements occurring in both S_1 and S_2; the union, denoted S_1 ∪ S_2, is the set of elements occurring in S_1 or in S_2; the difference, denoted S_1 \ S_2, is the set of elements occurring in S_1 but not in S_2. For a set S, |S| is its cardinality, whose value is the number of elements in S.
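In Python, the four operations defined above map directly onto the built-in set type (the bank names are illustrative values only):

```python
S1 = {"北京银行", "南京银行", "宁波银行"}
S2 = {"北京银行", "招商银行"}

inter = S1 & S2   # intersection S1 ∩ S2
union = S1 | S2   # union S1 ∪ S2
diff = S1 - S2    # difference S1 \ S2
card = len(S1)    # cardinality |S1|

print(inter)  # {'北京银行'}
```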
The invention is described in further detail below with reference to FIG. 1 and the detailed description. The system for automatically eliminating Chinese redundant sense items is divided into four modules, each realized through several specific method steps. The functions and core methods of the respective modules are explained in detail below.
Module A: sense-item labeling and sense-item correlation analysis for the segmented training corpus TΓ
Without loss of generality, assume the segmented training corpus TΓ = {TS_1, TS_2, ..., TS_n}, where each TS_i (1 ≤ i ≤ n) has the form TS_i = t_i1{}{}/t_i2{}{}/.../t_ik{}{}/.
A sense set sense_set is introduced; it is a set, initially empty, for storing the term near-classes and term parent-classes involved in TΓ.
For each TS_i in TΓ and each t_ij{}{} in TS_i, perform the following steps:
Step A-1: look up in the near-class dictionary the term near-classes to which t_ij belongs, store them in the set t_ij_syn, and insert t_ij_syn into the first braces of t_ij{}{}, forming t_ij{t_ij_syn}{}.
Step A-2: sense_set = sense_set ∪ t_ij_syn.
Step A-3: look up in the parent-class dictionary the term parent-classes to which t_ij belongs, store them in the set t_ij_fat, and insert t_ij_fat into the second braces of t_ij{t_ij_syn}{}, forming t_ij{t_ij_syn}{t_ij_fat}.
Step A-4: sense_set = sense_set ∪ t_ij_fat.
For example, for TS = "mobile phone card{}{}/how{}{}/handle{}{}/", after step A-1, TS = "mobile phone card{card near-class}{}/how{how near-class}{}/handle{handle near-class, office near-class}{}/"; after step A-3, TS = "mobile phone card{card near-class}{element parent-class}/how{how near-class}{query-word parent-class}/handle{handle near-class, office near-class}{}/". Note that {} means there is no corresponding term near-class or term parent-class; e.g. "handle{handle near-class, office near-class}{}" means "handle" belongs to two term near-classes, the handle near-class and the office near-class, but to no term parent-class.
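Steps A-1 through A-4 can be sketched directly in Python; the dictionary contents below are hypothetical stand-ins for a real near-class and parent-class dictionary, not data from the patent:

```python
def label_senses(tokens, syn_dict, fat_dict):
    """Module A sketch: attach to each segmented token its term near-classes
    (first brace slot) and term parent-classes (second slot), accumulating
    every sense item into sense_set (steps A-1 through A-4)."""
    sense_set = set()
    labeled = []
    for t in tokens:
        t_syn = frozenset(syn_dict.get(t, ()))  # A-1: near-classes of t
        sense_set |= t_syn                      # A-2: accumulate into sense_set
        t_fat = frozenset(fat_dict.get(t, ()))  # A-3: parent-classes of t
        sense_set |= t_fat                      # A-4: accumulate into sense_set
        labeled.append((t, t_syn, t_fat))
    return labeled, sense_set

# hypothetical dictionary fragments for the running example
syn_dict = {"办理": {"办理近类", "办公近类"}, "怎么": {"怎么近类"}, "手机卡": {"卡近类"}}
fat_dict = {"手机卡": {"元素父类"}, "怎么": {"疑问词父类"}}

labeled, sense_set = label_senses(["手机卡", "怎么", "办理"], syn_dict, fat_dict)
```

After this call, the token "办理" carries two near-classes and an empty parent-class slot, matching the handle{handle near-class, office near-class}{} pattern above.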
Module B: eliminating redundant sense items by automatically detecting business independent sense items
Denote the training corpus after processing by module A as TΓ = {TS_1, TS_2, ..., TS_n}, where TS_i = t_i1{t_i1_syn}{t_i1_fat}/t_i2{t_i2_syn}{t_i2_fat}/.../t_ij{t_ij_syn}{t_ij_fat}/.../t_ik{t_ik_syn}{t_ik_fat}/.
Since the corpus Γ is drawn from one or more related specific businesses, the specific business to which Γ belongs, such as mobile customer service, aviation customer service or financial service, can be determined from Γ. However, after module A generates TΓ, some business-independent sense items enter TΓ. For example, in TS = "mobile phone card{card near-class}{element parent-class}/how{how near-class}{query-word parent-class}/handle{handle near-class, office near-class}{}/" given by module A, the element parent-class and the office near-class are business-independent sense items and need to be deleted from TS.
After module A, sense_set contains the term near-classes and term parent-classes involved in TΓ; note that each term near-class or term parent-class is itself a set of terms.
The specific implementation method of module B is as follows:
Step B-1: for any term near-class or term parent-class sf in sense_set, compute supp_set(Γ, sf) = {S | S ∈ Γ and S contains at least one term of sf}.
Step B-2: for each TS_i in TΓ, for any t_ij{t_ij_syn}{t_ij_fat} in TS_i, and for any two elements t_ij_syn_1 and t_ij_syn_2 of t_ij_syn: if [first support-set condition, given only as a formula image in the original] (β is a parameter; experiments found that β = 0.001 achieves the best effect, so β = 0.001 is adopted in the invention) and [second support-set condition, also given as a formula image] (α is a parameter; experiments found that α = 0.3 achieves the best effect, so α = 0.3 is adopted in the invention), then delete t_ij_syn_2 from t_ij_syn in t_ij{t_ij_syn}{t_ij_fat} (i.e. t_ij_syn_2 is a redundant sense item of t_ij).
Step B-3: for each TS_i in TΓ, for any t_ij{t_ij_syn}{t_ij_fat} in TS_i, and for any two elements t_ij_fat_1 and t_ij_fat_2 of t_ij_fat: if [the corresponding support-set conditions, given as formula images], then delete t_ij_fat_2 from t_ij_fat in t_ij{t_ij_syn}{t_ij_fat} (i.e. t_ij_fat_2 is a redundant sense item of t_ij).
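Step B-1's support set, and a filter in the spirit of step B-2, can be sketched in Python. The exact inequalities of the patent survive only as formula images, so the rule below is an assumed reading (drop a sense item whose support is both rare relative to the corpus and much weaker than a competitor's); all data and thresholds in the example are hypothetical:

```python
def supp_set(corpus, sf):
    """Step B-1: the sentences of corpus containing at least one term of the
    sense item sf (a set of terms); substring matching, suitable for Chinese."""
    return {s for s in corpus if any(term in s for term in sf)}

def prune_rare_senses(sense_items, corpus, alpha=0.3, beta=0.001):
    """Assumed reading of step B-2: a sense item s2 is dropped when its
    relative support frequency is below beta AND its support is below alpha
    times that of a competing sense item s1. Not the patent's exact test."""
    kept = list(sense_items)  # each item: frozenset of terms
    for s1 in sense_items:
        for s2 in sense_items:
            if s1 == s2 or s2 not in kept or s1 not in kept:
                continue
            sup1 = len(supp_set(corpus, s1))
            sup2 = len(supp_set(corpus, s2))
            if sup1 > 0 and sup2 / len(corpus) < beta and sup2 / sup1 < alpha:
                kept.remove(s2)
    return kept

# toy corpus and competing near-classes for the word 办 (handle vs. office)
corpus = ["办理手机卡", "怎么办理", "办理套餐", "办公地点"]
handle_syn = frozenset(["办理"])
office_syn = frozenset(["办公"])
kept = prune_rare_senses([handle_syn, office_syn], corpus, alpha=0.6, beta=0.5)
```

With these toy thresholds the office near-class (support 1 of 4 sentences) is pruned while the handle near-class (support 3) survives; on a realistic 100000-sentence corpus the much smaller β = 0.001 would play the same role.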
Module C: eliminating redundant sense items by comparative analysis of multiple term proximity
Although module B can quickly delete some redundant sense items, some redundancy remains implicit between pairs of term near-classes. For this purpose, multiple term near-classes need to be compared and analyzed to eliminate redundant sense items.
The specific implementation method of module C is as follows:
Step C-1: for each TS_i in TΓ, for any t_ij{t_ij_syn}{t_ij_fat} in TS_i, and for any two elements t_ij_syn_1 and t_ij_syn_2 of t_ij_syn: if [support-set condition, given only as a formula image in the original] (δ is a parameter; experiments found that δ = 0.3 achieves the best effect, so δ = 0.3 is adopted in the invention), then delete t_ij_syn_1 from t_ij_syn in t_ij{t_ij_syn}{t_ij_fat} (i.e. t_ij_syn_1 is a redundant sense item of t_ij).
Step C-2: for each TS_i in TΓ, for any t_ij{t_ij_syn}{t_ij_fat} in TS_i, and for any two elements t_ij_syn_1 and t_ij_syn_2 of t_ij_syn: if [first condition, given as a formula image] (γ is a parameter; experiments found that γ = 0.7 achieves the best effect, so γ = 0.7 is adopted in the invention), then the following is performed: if [second condition, given as a formula image] (α is a parameter; α = 0.3 is adopted as in module B), then delete t_ij_syn_2 from t_ij_syn in t_ij{t_ij_syn}{t_ij_fat} (i.e. t_ij_syn_2 is a redundant sense item of t_ij).
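Module C's pairwise comparison can be illustrated with a small helper over precomputed support-set sizes. Since the patent's inequalities exist only as images, both the direction of each test and the single δ threshold below are assumptions, not the patent's actual conditions:

```python
def redundant_in_pair(sup1, sup2, delta=0.3):
    """Assumed reading of steps C-1/C-2: between two competing near-classes
    with support-set sizes sup1 and sup2, the one whose support is dominated
    by the other's (ratio below delta) is flagged as the redundant item."""
    if sup2 > 0 and sup1 / sup2 < delta:
        return "syn1"  # C-1 analogue: first near-class is redundant
    if sup1 > 0 and sup2 / sup1 < delta:
        return "syn2"  # C-2 analogue: second near-class is redundant
    return None        # comparable support: keep both, let module D decide

print(redundant_in_pair(2, 40))   # first item dominated
print(redundant_in_pair(40, 2))   # second item dominated
print(redundant_in_pair(30, 25))  # neither dominates
```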
Module D: eliminating redundant sense items by comparing term proximity to term parent
In practice it may happen that a term near-class is contained in a term parent-class, or that most members (e.g. 90%) of a term near-class belong to a term parent-class. For example, Beijing near-class = {Beijing, Beijing City} and local-bank parent-class = {Beijing Bank, Beijing City, Nanjing Bank, Nanjing, Ningbo Bank, Ningbo, ...}, where the city names Beijing, Nanjing, Ningbo, etc. are short for the corresponding Beijing Bank, Nanjing Bank and Ningbo Bank. In this case the Beijing near-class is largely contained in the local-bank parent-class (the precise relation is given as a formula image in the original).
For the above reasons, TS = "Beijing phone card" becomes, after word segmentation and processing by module A, TS = "Beijing{Beijing near-class}{local-bank parent-class}/phone card{card near-class}{element parent-class}/", but the local-bank parent-class in "Beijing{Beijing near-class}{local-bank parent-class}" is not a sense item of "Beijing".
The specific implementation method of module D is as follows: for each TS_i in TΓ, for any t_ij{t_ij_syn}{t_ij_fat} in TS_i, for any element t_ij_syn_1 of t_ij_syn and any element t_ij_fat_1 of t_ij_fat: if [containment condition, given only as a formula image in the original] (γ is a parameter; experiments found that γ = 0.7 achieves the best effect, so γ = 0.7 is adopted in the invention) and [support-set condition, also given as a formula image] (α is a parameter; α = 0.3 is adopted as in module B), then delete t_ij_fat_1 from t_ij_fat in t_ij{t_ij_syn}{t_ij_fat}.
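The containment test behind module D can be sketched as a membership ratio. The γ-threshold deletion rule below is an assumed reading of the image-only condition, and the sets are illustrative reconstructions of the Beijing example:

```python
def containment_ratio(syn, fat):
    """Fraction of the near-class syn whose members also occur in the
    parent-class fat; 1.0 means the near-class is fully contained."""
    return len(set(syn) & set(fat)) / len(syn) if syn else 0.0

beijing_syn = {"北京", "北京市"}
local_bank_fat = {"北京银行", "北京市", "北京", "南京银行", "南京", "宁波银行", "宁波"}

ratio = containment_ratio(beijing_syn, local_bank_fat)
# Assumed module-D rule: when the ratio exceeds gamma (0.7 in the patent),
# the parent-class is deleted from the token's label as a redundant sense item.
is_redundant = ratio >= 0.7
```

Here the Beijing near-class is fully contained in the local-bank parent-class (ratio 1.0), so the local-bank parent-class would be dropped from the label of "Beijing", matching the example in the description.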
Experimental effect
The invention provides a system and method for automatically eliminating Chinese redundant sense items. Five Chinese application scenarios were selected, including package consultation, activity consultation, fault consultation, weather consultation and flight consultation, and 100000 Chinese sentences were collected for the automatic-elimination test. The parameters α, γ and δ in modules B, C and D were tested in groups over the range 0.1 to 1, and the parameter β over the range 0.1 to 0.01 with a step of 0.00005. Then 5000 test results were inspected and analyzed manually. The results show that with α = 0.3, β = 0.001, γ = 0.7 and δ = 0.3 the invention accurately eliminates 88.1% of redundant sense items, i.e. the automatic elimination precision for redundant sense items reaches 88.1%. The invention therefore not only has important theoretical value but also plays an important role in practical Chinese sentence processing applications.

Claims (1)

1. An automatic elimination system for Chinese redundant sense items, characterized by comprising the following modules:
module A: labeling the sense items of the segmented training corpus TΓ and analyzing sense-item correlation;
module B: eliminating redundant sense items by automatically detecting business-independent sense items;
module C: eliminating redundant sense items by comparative analysis of multiple term near-classes;
module D: eliminating redundant sense items by comparing term near-classes with term parent-classes;
the implementation steps of module A are as follows: the segmented training corpus TΓ = {TS_1, TS_2, ..., TS_n}, where each TS_i (1 ≤ i ≤ n) has the form TS_i = t_i1{}{}/t_i2{}{}/.../t_ij{}{}/.../t_ik{}{}/ (1 ≤ j ≤ n); a sense set sense_set is introduced, which is a set, initially empty; for each TS_i in TΓ and for each t_ij{}{} in TS_i, the following steps are performed:
step A-1: look up in the near-class dictionary the term near-classes to which t_ij belongs, store them in the set t_ij_syn, and insert t_ij_syn into the first braces of t_ij{}{}, forming t_ij{t_ij_syn}{};
step A-2: sense_set = sense_set ∪ t_ij_syn;
step A-3: look up in the parent-class dictionary the term parent-classes to which t_ij belongs, store them in the set t_ij_fat, and insert t_ij_fat into the second braces of t_ij{t_ij_syn}{}, forming t_ij{t_ij_syn}{t_ij_fat};
step A-4: sense_set = sense_set ∪ t_ij_fat;
the implementation steps of module B are as follows:
step B-1: for any term near-class or term parent-class sf in sense_set, the support set of sf in Γ is computed and denoted supp_set(Γ, sf), i.e. supp_set(Γ, sf) = {S | S ∈ Γ and S contains at least one term of sf};
step B-2: for each TS_i in TΓ, for any t_ij{t_ij_syn}{t_ij_fat} in TS_i, and for any two elements t_ij_syn_1 and t_ij_syn_2 of t_ij_syn: if [first support-set condition, given only as a formula image in the original] and [second support-set condition, also given as a formula image], then t_ij_syn_2 is deleted from t_ij_syn in t_ij{t_ij_syn}{t_ij_fat}, i.e. t_ij_syn_2 is a redundant sense item of t_ij;
step B-3: for each TS_i in TΓ, for any t_ij{t_ij_syn}{t_ij_fat} in TS_i, and for any two elements t_ij_fat_1 and t_ij_fat_2 of t_ij_fat: if [the corresponding support-set conditions, given as formula images], then t_ij_fat_2 is deleted from t_ij_fat in t_ij{t_ij_syn}{t_ij_fat}, i.e. t_ij_fat_2 is a redundant sense item of t_ij;
the implementation steps of module C are as follows:
step C-1: for each TS_i in TΓ, for any t_ij{t_ij_syn}{t_ij_fat} in TS_i, and for any two elements t_ij_syn_1 and t_ij_syn_2 of t_ij_syn: if [support-set condition, given as a formula image], then t_ij_syn_1 is deleted from t_ij_syn in t_ij{t_ij_syn}{t_ij_fat}, i.e. t_ij_syn_1 is a redundant sense item of t_ij;
step C-2: for each TS_i in TΓ, for any t_ij{t_ij_syn}{t_ij_fat} in TS_i, and for any two elements t_ij_syn_1 and t_ij_syn_2 of t_ij_syn: if [first condition, given as a formula image], then the following is performed: if [second condition, given as a formula image], then t_ij_syn_2 is deleted from t_ij_syn in t_ij{t_ij_syn}{t_ij_fat};
the implementation method of module D is as follows:
for each TS_i in TΓ, for any t_ij{t_ij_syn}{t_ij_fat} in TS_i, for any element t_ij_syn_1 of t_ij_syn and any element t_ij_fat_1 of t_ij_fat: if [containment condition, given as a formula image] and [support-set condition, given as a formula image], then t_ij_fat_1 is deleted from t_ij_fat in t_ij{t_ij_syn}{t_ij_fat}.
CN201811542048.XA 2018-12-17 2018-12-17 Automatic eliminating system for Chinese redundancy meaning items Active CN109657242B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811542048.XA CN109657242B (en) 2018-12-17 2018-12-17 Automatic eliminating system for Chinese redundancy meaning items


Publications (2)

Publication Number Publication Date
CN109657242A CN109657242A (en) 2019-04-19
CN109657242B true CN109657242B (en) 2023-05-05

Family

ID=66113768

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811542048.XA Active CN109657242B (en) 2018-12-17 2018-12-17 Automatic eliminating system for Chinese redundancy meaning items

Country Status (1)

Country Link
CN (1) CN109657242B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101295294A (en) * 2008-06-12 2008-10-29 昆明理工大学 Improved Bayes acceptation disambiguation method based on information gain
CN108073570A (en) * 2018-01-04 2018-05-25 焦点科技股份有限公司 A kind of Word sense disambiguation method based on hidden Markov model
CN108256030A (en) * 2017-12-29 2018-07-06 北京理工大学 A kind of degree adaptive Concept Semantic Similarity computational methods based on ontology
CN108446269A (en) * 2018-03-05 2018-08-24 昆明理工大学 A kind of Word sense disambiguation method and device based on term vector
CN108874772A (en) * 2018-05-25 2018-11-23 太原理工大学 A kind of polysemant term vector disambiguation method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070106657A1 (en) * 2005-11-10 2007-05-10 Brzeski Vadim V Word sense disambiguation


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Word vs. Class-Based Word Sense Disambiguation"; Ruben Izquierdo et al.; Journal of Artificial Intelligence Research; 2015-09-15; pp. 83-122 *
"Research on Semantic Disambiguation in Natural Language Processing" (自然语言处理中的语义消歧研究); Jia Yuanyuan (贾媛媛); Journal of Huainan Normal University (淮南师范学院学报); 2013-09-15; pp. 108-110 *

Also Published As

Publication number Publication date
CN109657242A (en) 2019-04-19

Similar Documents

Publication Publication Date Title
US10657325B2 (en) Method for parsing query based on artificial intelligence and computer device
CN107766371B (en) Text information classification method and device
CN108228825B (en) A kind of station address data cleaning method based on participle
EP3051432A1 (en) Semantic information acquisition method, keyword expansion method thereof, and search method and system
CN110427487B (en) Data labeling method and device and storage medium
CN107194617B (en) App software engineer soft skill classification system and method
CN101446942A (en) Semantic character labeling method of natural language sentence
CN109471793A (en) A kind of webpage automatic test defect positioning method based on deep learning
CN109740159B (en) Processing method and device for named entity recognition
CN110717040A (en) Dictionary expansion method and device, electronic equipment and storage medium
CN105608113B (en) Judge the method and device of POI data in text
CN111274814A (en) Novel semi-supervised text entity information extraction method
CN114297987B (en) Document information extraction method and system based on text classification and reading understanding
CN110175585A (en) It is a kind of letter answer correct system and method automatically
CN109145071B (en) Automatic construction method and system for geophysical field knowledge graph
CN110909123A (en) Data extraction method and device, terminal equipment and storage medium
CN111159356A (en) Knowledge graph construction method based on teaching content
CN110263331A (en) A kind of English-Chinese semanteme of word similarity automatic testing method of Knowledge driving
CN113407644A (en) Enterprise industry secondary industry multi-label classifier based on deep learning algorithm
CN110377695A (en) A kind of public sentiment subject data clustering method, device and storage medium
CN111079384B (en) Identification method and system for forbidden language of intelligent quality inspection service
CN111177401A (en) Power grid free text knowledge extraction method
CN113010593B (en) Event extraction method, system and device for unstructured text
CN109657242B (en) Automatic eliminating system for Chinese redundancy meaning items
CN101071421A (en) Chinese word cutting method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant