CN107357918B - Text representation method based on graph - Google Patents


Info

Publication number
CN107357918B
CN107357918B (application CN201710599697.2A)
Authority
CN
China
Prior art keywords
document
graph
entries
characteristic
documents
Prior art date
Legal status
Active
Application number
CN201710599697.2A
Other languages
Chinese (zh)
Other versions
CN107357918A (en
Inventor
周法国
Current Assignee
China University of Mining and Technology Beijing CUMTB
Original Assignee
China University of Mining and Technology Beijing CUMTB
Priority date
Filing date
Publication date
Application filed by China University of Mining and Technology Beijing CUMTB filed Critical China University of Mining and Technology Beijing CUMTB
Priority to CN201710599697.2A priority Critical patent/CN107357918B/en
Publication of CN107357918A publication Critical patent/CN107357918A/en
Application granted granted Critical
Publication of CN107357918B publication Critical patent/CN107357918B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35: Clustering; Classification
    • G06F 16/358: Browsing; Visualisation therefor
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/279: Recognition of textual entities
    • G06F 40/284: Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention relates to the technical field of text representation, and in particular to a graph-based text representation method comprising the following steps: determine the maximum number of vertices n of the graph model corresponding to each document; perform word segmentation, part-of-speech tagging and preprocessing on the documents, and compute word-frequency statistics; select the feature terms that best represent the document, with their number not exceeding n, and record the order of all feature words in the document; and, for a document D, use all of its feature terms as vertices of the graph model, with the occurrence frequency of each feature term forming the weight of its vertex. The beneficial effects of the invention are as follows: the word-sense space is a network graph formed by the constraint relations between words; semantic distance is expressed by the strength of the constraint relation between words; the similarity of graphs is measured by the basic elements of the graph, yielding a good clustering effect; and the semantic information of the text is reflected through its external features, namely the feature terms, their frequencies, and their positional relations.

Description

Text representation method based on graph
Technical Field
The invention relates to the technical field of text representation, in particular to a text representation method based on a graph.
Background
In natural language processing and related fields, classical text representation models rarely consider the effect of the order relation of terms in text on semantic expression, and assume that terms are independent of each other. In fact, the order relationship between terms affects the semantics of the text, and changes in Chinese word order often alter the relationship between terms and cause semantic changes. A simple example is "A likes B" versus "B likes A": the terms used in the two sentences are the same, but the difference in word order results in a difference in semantics. The currently most popular text representation model, the VSM model, ignores order relationships in its model assumptions.
The most common text representation method is the vector space model, a bag-of-words based method that has remained essentially unchanged; this representation loses much of the information in the original text, such as the order of words in the text and the boundaries of sentences and paragraphs.
To address the shortcomings of the vector space representation model, many scholars at home and abroad have proposed document representation methods based on graph models. Svetlana proposed in her paper a document conceptual-graph representation model based on the auxiliary dictionaries VerbNet and WordNet; Bhopesh and Pushpak proposed in their paper constructing feature vectors representing documents from UNL graphs and clustering texts with SOM technology; and Inderjeet and Eric proposed in their paper a document graph-model representation method for multi-document summarization. Although these graph models capture the semantic information of documents well, they are too complex to provide a similarity metric, and some require additional auxiliary information. More recently, Adam Schenker et al. proposed in their papers a simpler document representation method based on a graph model, but their models are mainly built on Boolean positional associations of the text's feature terms and do not consider the influence of feature-term frequency on the main content of the text.
Therefore, it is necessary to propose a graph-based text representation method for the above-described problem.
Disclosure of Invention
In view of the above-mentioned shortcomings in the prior art, an object of the present invention is to provide a text representation method based on a graph, which can better represent a text and improve the effects of information retrieval and text classification applications.
The graph-based text representation method comprises the following steps. Step one: input a text document D. Step two: output a text class graph G(V, E, W1, W2). Step three: determine the maximum number of vertices n of the graph model corresponding to each document. Step four: perform word segmentation, part-of-speech tagging and preprocessing on the document, and compute word-frequency statistics. Step five: select the feature terms that best represent the document, with the number of feature terms not exceeding n, and record the order of all feature words in the document. Step six: for the document D, use all of its feature terms as vertices of the graph model, with the occurrence frequency of each feature term forming the weight of its vertex. Step seven: if two feature words appear one after the other in a certain paragraph of the document, place a directed edge between them, directed from the feature word that appears first to the one that appears later, and count the number of times the two feature terms co-occur in the document. Step eight: determine the incidence matrices M and U of the feature terms according to formula (1). Step nine: normalize the matrix U according to the formula

w_ij = u_ij / max_{k,l}{ u_kl }

and determine the normalized incidence matrix W.
Preferably, formula (1) is given by Definition 1 as the weight of the edge between two feature terms, i.e. the semantic measure, defined as:

w_AB = 1 / (num(B) - num(A))    (1)

where num(B) denotes the sequence number of the feature term B in the document, and num(A) denotes the sequence number of the feature term A in the document.
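A minimal sketch of this semantic measure (the function name and the validation are illustrative, not part of the patent):

```python
def semantic_measure(num_a: int, num_b: int) -> float:
    """Weight of the directed edge A -> B per formula (1):
    w_AB = 1 / (num(B) - num(A)), where num(.) is a feature term's
    sequence number in the document and B appears after A."""
    if num_b <= num_a:
        raise ValueError("term B must appear after term A")
    return 1.0 / (num_b - num_a)

# Adjacent terms are maximally constrained; distant terms weakly so.
print(semantic_measure(3, 4))  # 1.0
print(semantic_measure(3, 7))  # 0.25
```

The measure is largest (1) for adjacent terms and decays with the gap between sequence numbers, matching the constraint-strength intuition of Definition 1.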
Preferably, Definition 1 is as follows: a document D corresponds to a class graph G in the word-sense space, where G is a quadruple G(V, E, W1, W2), a class-weighted directed graph composed of a weighted vertex set V(G) and a weighted edge set E(G). The vertex set V(G) consists of all feature terms appearing in the document D; the weight set W1 of the vertices consists of the word frequencies of the vertices in V(G). If the feature terms corresponding to two vertices appear one after the other, there is a directed edge between the two vertices, directed from the vertex that appears first to the vertex that appears later; the weight w of an edge represents the degree of constraint between the two feature terms incident to the edge; the set of all edges is called the edge set E(G); and the set formed by the edge weights w is called the weight set W2 of the edges.
Preferably, the document expression form of Definition 1 is:

T = [t1, t2, ..., tn]    (2)

M = (a_ij)_{n×n}    (3)

where T is the feature term set; t_i is a feature term, i = 1, 2, ..., n; M is the incidence matrix of the feature terms; and a_ij is the correlation strength between feature terms t_i and t_j (1 <= i <= j <= n).

If a word A constrains another word B several times in the same paragraph, only the nearest constraint relation between them is counted; from Definition 1 the maximum constraint value is 1, which gives the matrix U:

U = (u_ij)_{n×n}    (4)

In general, the matrix U needs to be normalized.
Let

w_ij = u_ij / max_{k,l}{ u_kl }    (5)

where i, j, k, l = 1, 2, ..., n, which yields the normalized matrix W:

W = (w_ij)_{n×n}    (6)
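The normalization of U into W, w_ij = u_ij / max_{k,l}{u_kl}, can be sketched as follows (the nested-list encoding of U is an illustrative assumption):

```python
def normalize(U):
    """Divide every entry of U by its largest entry, yielding W
    with values in [0, 1] (w_ij = u_ij / max_{k,l} u_kl)."""
    m = max(max(row) for row in U)  # largest constraint strength in U
    if m == 0:
        return [[0.0 for _ in row] for row in U]
    return [[u / m for u in row] for row in U]

# Constraint strengths lie in (0, 1] by formula (1); the largest here is 0.5.
U = [[0.0, 0.5, 0.25],
     [0.0, 0.0, 0.1],
     [0.0, 0.0, 0.0]]
W = normalize(U)
print(W[0][1])  # 1.0
```

After normalization the strongest constraint in the document maps to 1 and all other edge weights scale proportionally.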
preferably, two documents D1And D2The closer the semantics are, the more similar their corresponding document maps are, and conversely, the more similar the two document maps are, the closer they are semantically, the two documents D1And D2The closer the semantics are, the more identical vertices and edges there are on the graph features, and the closer the weights on the edges are.
Preferably, suppose two documents D1 and D2 have corresponding weighted directed graphs G1 and G2, and let C be the maximum common subgraph of G1 and G2. The similarity of documents D1 and D2 is then defined as:

S(D1, D2) = β·|V(C)|/n + (1 - β)·Sim_E(G1, G2)    (7)

where |V(C)| denotes the number of vertices of the maximum common subgraph of G1 and G2, n = Max{|V(G1)|, |V(G2)|}, Sim_E(G1, G2) denotes the edge measure comparing the common edges of the two graphs and their weights (the original equation image is not preserved in this text), and the constant factor β is a fraction between 0 and 1.

The document similarity reflects the degree of similarity between the two documents. In general its value lies between 0 and 1: 0 means dissimilar, 1 means completely similar, and the larger the value, the more similar the two documents.

The closer the semantics of the two documents, the more identical vertices and edges the two graphs share and the closer the weights on those edges, as reflected in formula (7). The term |V(C)|/n is a measure over the vertices of the two graphs: the closer the semantics of the two documents, the more similar the corresponding graphs, and the larger this value, the closer it is to 1. The edge measure Sim_E(G1, G2) plays the same role for the edges of the two graphs: the closer the semantics, the more similar the corresponding graphs, and the larger its value, the closer it is to 1. The linear combination in formula (7) therefore represents a similarity measure for the graphs corresponding to the two documents, and S(D1, D2) takes a value between 0 and 1.

Accordingly, the distance between two documents D1 and D2 is Dis(D1, D2) = 1 - S(D1, D2).
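A sketch of the similarity and distance computation. The vertex term β·|V(C)|/n follows the text; the patent's exact edge measure in formula (7) survives only as an image in this copy, so the shared-edge ratio below is an illustrative stand-in, not the patent's definition:

```python
def similarity(g1, g2, beta=0.5):
    """Graph similarity in the spirit of formula (7).

    g1, g2: pairs (vertex set, edge set of (u, v) tuples).
    Vertex term: beta * |V(C)|/n with n = max(|V(G1)|, |V(G2)|).
    Edge term: shared edges over the larger edge count, an assumed
    stand-in for the patent's weighted edge measure."""
    v1, e1 = g1
    v2, e2 = g2
    n = max(len(v1), len(v2))
    m = max(len(e1), len(e2))
    vertex_term = len(v1 & v2) / n if n else 0.0
    edge_term = len(e1 & e2) / m if m else 0.0
    return beta * vertex_term + (1 - beta) * edge_term

def distance(g1, g2, beta=0.5):
    """Dis(D1, D2) = 1 - S(D1, D2)."""
    return 1.0 - similarity(g1, g2, beta)

g1 = ({"a", "b", "c"}, {("a", "b"), ("b", "c")})
g2 = ({"a", "b", "d"}, {("a", "b")})
print(round(similarity(g1, g2), 4))  # 0.5833
```

Both measures lie in [0, 1], so S(D1, D2) stays in [0, 1] and the distance is its complement, as the text requires.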
Due to the adoption of the above technical scheme, the invention has the following beneficial effects. The word-sense space is a network graph formed by the constraint relations between words; semantic distance is expressed by the strength of the constraint relation between words; the similarity of graphs is measured by the basic elements of the graph (vertices and edges), yielding a good clustering effect; and a new document representation model based on the word-sense space is established, in which the semantic information of the text is reflected through its external features, namely the feature terms, their frequencies, and their positional relations. The model can successfully capture the following information: (1) part of speech, (2) word order, (3) word frequency, (4) co-occurrence of words, and (5) context information of words in the text.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The embodiments of the invention will be described in detail below with reference to the drawings, but the invention can be implemented in many different ways as defined and covered by the claims.
As shown in FIG. 1, the graph-based text representation method comprises the following steps. Step one: input a text document D. Step two: output a text class graph G(V, E, W1, W2). Step three: determine the maximum number of vertices n of the graph model corresponding to each document. Step four: perform word segmentation, part-of-speech tagging and preprocessing on the document, and compute word-frequency statistics. Step five: select the feature terms that best represent the document, with the number of feature terms not exceeding n, and record the order of all feature words in the document. Step six: for the document D, use all of its feature terms as vertices of the graph model, with the occurrence frequency of each feature term forming the weight of its vertex. Step seven: if two feature words appear one after the other in a certain paragraph of the document, place a directed edge between them, directed from the feature word that appears first to the one that appears later, and count the number of times the two feature terms co-occur in the document. Step eight: determine the incidence matrices M and U of the feature terms according to formula (1). Step nine: normalize the matrix U according to the formula

w_ij = u_ij / max_{k,l}{ u_kl }

and determine the normalized incidence matrix W.
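Steps four through seven can be sketched end to end. The input is assumed to be already-segmented paragraphs, and taking the n most frequent words as feature terms is an illustrative selection criterion; the patent leaves segmentation, part-of-speech tagging, and the exact feature-selection rule to standard tools:

```python
from collections import Counter

def build_class_graph(paragraphs, n):
    """Sketch of steps four-seven: from tokenized paragraphs to the
    vertex weights W1 and weighted directed edges of the class graph.

    paragraphs: list of word lists (segmentation and tagging assumed done).
    n: maximum number of vertices of the graph model.
    """
    freq = Counter(w for p in paragraphs for w in p)  # word-frequency statistics
    # Step five: keep at most n feature terms (most frequent, as an assumption).
    features = {w for w, _ in freq.most_common(n)}
    # Step six: vertices weighted by occurrence frequency (weight set W1).
    w1 = {w: freq[w] for w in features}
    # Step seven: a directed edge from the earlier to the later feature word
    # for successive feature-word occurrences within a paragraph, counting
    # co-occurrences across the document.
    edges = Counter()
    for p in paragraphs:
        feats = [w for w in p if w in features]
        for a, b in zip(feats, feats[1:]):
            if a != b:
                edges[(a, b)] += 1
    return w1, dict(edges)

w1, e = build_class_graph([["graph", "text", "graph"], ["text", "model"]], n=3)
print(w1["graph"])             # 2
print(("graph", "text") in e)  # True
```

The returned edge counts correspond to the matrix U before normalization; steps eight and nine then convert them into the normalized incidence matrix W.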
Further, formula (1) is given by Definition 1 as the weight of the edge between two feature terms, i.e. the semantic measure, defined as:

w_AB = 1 / (num(B) - num(A))    (1)

where num(B) denotes the sequence number of the feature term B in the document, and num(A) denotes the sequence number of the feature term A in the document.

Here Definition 1 is as follows: a document D corresponds to a class graph G in the word-sense space, where G is a quadruple G(V, E, W1, W2), a class-weighted directed graph composed of a weighted vertex set V(G) and a weighted edge set E(G). The vertex set V(G) consists of all feature terms appearing in the document D; the weight set W1 of the vertices consists of the word frequencies of the vertices in V(G). If the feature terms corresponding to two vertices appear one after the other, there is a directed edge between the two vertices, directed from the vertex that appears first to the vertex that appears later; the weight w of an edge represents the degree of constraint between the two feature terms incident to the edge; the set of all edges is called the edge set E(G); and the set formed by the edge weights w is called the weight set W2 of the edges.
Preferably, the document expression form of Definition 1 is:

T = [t1, t2, ..., tn]    (2)

M = (a_ij)_{n×n}    (3)

where T is the feature term set; t_i is a feature term, i = 1, 2, ..., n; M is the incidence matrix of the feature terms; and a_ij is the correlation strength between feature terms t_i and t_j (1 <= i <= j <= n).

If a word A constrains another word B several times in the same paragraph, only the nearest constraint relation between them is counted; from Definition 1 the maximum constraint value is 1, which gives the matrix U:

U = (u_ij)_{n×n}    (4)

In general, the matrix U needs to be normalized. Let

w_ij = u_ij / max_{k,l}{ u_kl }    (5)

where i, j, k, l = 1, 2, ..., n, which yields the normalized matrix W:

W = (w_ij)_{n×n}    (6)
further, two documents D1And D2The closer the semantics are, the more similar their corresponding document maps are, and conversely, the more similar the two document maps are, the closer they are semantically, the two documents D1And D2The closer the semantics are, the more identical vertices and edges there are on the graph features, and the closer the weights on the edges are.
Suppose two documents D1 and D2 have corresponding weighted directed graphs G1 and G2, and let C be the maximum common subgraph of G1 and G2. The similarity of documents D1 and D2 is then defined as:

S(D1, D2) = β·|V(C)|/n + (1 - β)·Sim_E(G1, G2)    (7)

where |V(C)| denotes the number of vertices of the maximum common subgraph of G1 and G2, n = Max{|V(G1)|, |V(G2)|}, Sim_E(G1, G2) denotes the edge measure comparing the common edges of the two graphs and their weights (the original equation image is not preserved in this text), and the constant factor β is a fraction between 0 and 1.

The document similarity reflects the degree of similarity between the two documents. In general its value lies between 0 and 1: 0 means dissimilar, 1 means completely similar, and the larger the value, the more similar the two documents.

The closer the semantics of the two documents, the more identical vertices and edges the two graphs share and the closer the weights on those edges, as reflected in formula (7). The term |V(C)|/n is a measure over the vertices of the two graphs: the closer the semantics of the two documents, the more similar the corresponding graphs, and the larger this value, the closer it is to 1. The edge measure Sim_E(G1, G2) plays the same role for the edges of the two graphs: the closer the semantics, the more similar the corresponding graphs, and the larger its value, the closer it is to 1. Thus, the linear combination in formula (7) represents a similarity measure for the graphs corresponding to the two documents, and S(D1, D2) takes a value between 0 and 1. Correspondingly, the distance between two documents D1 and D2 is Dis(D1, D2) = 1 - S(D1, D2).
In addition, the weak equivalence relation is defined as follows. Let R be a binary relation on a set A. If R satisfies the conditions:

Reflexivity: for any element x in the set A, <x, x> ∈ R;

Symmetry: for any two elements x and y of the set A, if <x, y> ∈ R, then <y, x> ∈ R;

Weak transitivity: for any three elements x, y and z of the set A, if <x, y> ∈ R and <y, z> ∈ R, then <x, z> ∈ LR, where LR denotes the weak binary relation of R;

then R is a weak equivalence relation defined on the set A. The similarity relation S of documents is a binary relation on the document set Dset, and the similarity relation S of documents is a weak equivalence relation.
The word-sense space is the term space plus the semantic space, and its formalization is described as follows:

S = <T, R, W1, W2>, where T = {t1, t2, ..., ti, ..., tn} is the feature term set and t_i is a feature term, i = 1, 2, ..., n; R is the semantic constraint relation on the set T, and elements t_i and t_j of T satisfy the relation R if and only if t_i constrains t_j, written t_i R t_j or <t_i, t_j> ∈ R, i, j = 1, 2, ..., n; W1 is the set of weights of the feature terms, here the word frequencies of the t_i, i = 1, 2, ..., n; and W2 is the set of constraint strengths between the elements of T.

Obviously, the semantic constraint relation on the set T is a binary relation on T. From set theory, a binary relation can be represented by a graph G, where the vertices of G consist of all elements of T; if <t_i, t_j> ∈ R, then there is a directed edge from vertex t_i to vertex t_j, i, j = 1, 2, ..., n. Because a relation is a set of ordered pairs and the order of the elements in an ordered pair cannot be reversed, directed edges are used in the graph representation of the relation.
The word-sense space of the invention is a network graph formed by the constraint relations between words; semantic distance is expressed by the strength of the constraint relation between words; and the similarity of graphs is measured by the basic elements of the graph (vertices and edges), yielding a good clustering effect. By reflecting the semantic information of the text through its external features, namely the feature terms, their frequencies, and their positional relations, a new document representation model based on the word-sense space is established, which can successfully capture the following information: (1) part of speech, (2) word order, (3) word frequency, (4) co-occurrence of words, and (5) context information of words in the text.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (4)

1. The text representation method based on the graph is characterized in that: the method comprises the following steps:
the method comprises the following steps: inputting a text document D;
step two: outputting a text graph G(V, E, W1, W2);
Step three: determining the maximum vertex number n of the graph model corresponding to each document;
step four: performing word segmentation, part-of-speech tagging and preprocessing on the document, and performing word frequency statistics on the document;
step five: selecting the feature entries which can represent the document most, wherein the number of the feature entries is not more than n, and recording the sequence of all feature words in the document;
step six: for the document D, all the feature terms of the document D are used as vertices of the graph model, and the occurrence frequency of each feature term forms the weight of its vertex, thereby forming the weight set W1 of the vertices;
Step seven: if two characteristic words appear in a certain paragraph of the document successively, a directed edge is arranged between the two characteristic words, and the direction of the edge is directed by the characteristic word appearing first to the characteristic word appearing later;
step eight: determining the incidence matrices M and U of the feature terms according to formula (1);

step nine: normalizing the matrix U according to the formula

w_ij = u_ij / max_{k,l}{ u_kl }

and determining the normalized incidence matrix W;

wherein, in step eight, formula (1) is given by Definition 1 as the semantic measure of the edge between two feature terms, defined as:

w_AB = 1 / (num(B) - num(A))    (1)

wherein num(B) denotes the sequence number of the feature term B in the document, and num(A) denotes the sequence number of the feature term A in the document;

wherein Definition 1 is as follows: a document D corresponds to a graph G in the word-sense space, where G is a quadruple G(V, E, W1, W2), a weighted directed graph consisting of a weighted vertex set V(G) and a weighted edge set E(G); the vertex set V(G) consists of all the feature terms appearing in the document D; the weight w on an edge represents the degree of constraint between the two associated feature terms; the set of all edges is called the edge set E(G); and the set formed by the weights on the edges is called the weight set W2 of the edges.
2. The graph-based text representation method of claim 1, wherein the document expression form of Definition 1 is:

T = [t1, t2, ..., tn]    (2)

M = (a_ij)_{n×n}    (3)

wherein T is the feature term set; t_i is a feature term, i = 1, 2, ..., n; M is the incidence matrix of the feature terms; and a_ij is the correlation strength between feature terms t_i and t_j (1 <= i <= j <= n);

if a word A constrains another word B several times in the same paragraph, only the nearest constraint relation between them is counted; from Definition 1 the maximum constraint value is 1, which gives the matrix U:

U = (u_ij)_{n×n}    (4)

in general, the matrix U needs to be normalized; let

w_ij = u_ij / max_{k,l}{ u_kl }    (5)

where i, j, k, l = 1, 2, ..., n, which yields the normalized matrix W:

W = (w_ij)_{n×n}    (6)
3. The graph-based text representation method of claim 1, wherein the closer two documents D1 and D2 are semantically, the more similar their corresponding document graphs are; conversely, the more similar the two document graphs are, the closer the documents are semantically; in terms of graph features, the closer the semantics of D1 and D2, the more identical vertices and edges their graphs share, and the closer the weights on those edges are.
4. The graph-based text representation method of claim 1, wherein, supposing two documents D1 and D2 have corresponding weighted directed graphs G1 and G2 whose maximum common subgraph is C, the similarity of the documents D1 and D2 is defined as:

S(D1, D2) = β·|V(C)|/n + (1 - β)·Sim_E(G1, G2)    (7)

wherein |V(C)| denotes the number of vertices of the maximum common subgraph of G1 and G2, n = Max{|V(G1)|, |V(G2)|}, Sim_E(G1, G2) denotes the edge measure comparing the common edges of the two graphs and their weights (the original equation image is not preserved in this text), and the constant factor β is a fraction between 0 and 1;

the document similarity reflects the degree of similarity between the two documents, usually taking a value between 0 and 1, where 0 indicates dissimilarity, 1 indicates complete similarity, and a larger value indicates more similar documents;

the closer the semantics of the two documents, the more identical vertices and edges the two graphs share and the closer the weights on those edges, as reflected in formula (7): the term |V(C)|/n is a measure over the vertices of the two graphs, and the closer the semantics of the two documents, the more similar the corresponding graphs and the closer this value is to 1; the edge measure Sim_E(G1, G2) plays the same role for the edges of the two graphs; the linear combination in formula (7) therefore represents a similarity measure for the graphs corresponding to the two documents, and S(D1, D2) takes a value between 0 and 1; correspondingly, the distance between the two documents D1 and D2 is Dis(D1, D2) = 1 - S(D1, D2).
CN201710599697.2A 2017-07-21 2017-07-21 Text representation method based on graph Active CN107357918B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710599697.2A CN107357918B (en) 2017-07-21 2017-07-21 Text representation method based on graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710599697.2A CN107357918B (en) 2017-07-21 2017-07-21 Text representation method based on graph

Publications (2)

Publication Number Publication Date
CN107357918A CN107357918A (en) 2017-11-17
CN107357918B true CN107357918B (en) 2022-01-25

Family

ID=60284884

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710599697.2A Active CN107357918B (en) 2017-07-21 2017-07-21 Text representation method based on graph

Country Status (1)

Country Link
CN (1) CN107357918B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107992480B (en) * 2017-12-25 2021-09-14 东软集团股份有限公司 Method, device, storage medium and program product for realizing entity disambiguation
CN109326327B (en) * 2018-08-28 2021-11-12 福建师范大学 Biological sequence clustering method based on SeqRank graph algorithm
CN110188349A (en) * 2019-05-21 2019-08-30 清华大学深圳研究生院 A kind of automation writing method based on extraction-type multiple file summarization method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090024385A1 (en) * 2007-07-16 2009-01-22 Semgine, Gmbh Semantic parser

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research on weighted maximum frequent subgraph mining algorithm; Wang Yinglong et al.; Computer Engineering and Applications; Nov. 2009; Vol. 45, No. 20; pp. 31-34, 38 *
Research on Chinese opinion sentence recognition based on a term co-occurrence graph model; Wang Mingwen et al.; Journal of Chinese Information Processing; Nov. 2015; Vol. 29, No. 6; p. 187, p. 191 column 1 *

Also Published As

Publication number Publication date
CN107357918A (en) 2017-11-17

Similar Documents

Publication Publication Date Title
US20240202446A1 (en) Method for training keyword extraction model, keyword extraction method, and computer device
CN106446148B (en) A kind of text duplicate checking method based on cluster
CN107180045B (en) Method for extracting geographic entity relation contained in internet text
RU2628436C1 (en) Classification of texts on natural language based on semantic signs
RU2628431C1 (en) Selection of text classifier parameter based on semantic characteristics
US9183274B1 (en) System, methods, and data structure for representing object and properties associations
Bansal et al. Hybrid attribute based sentiment classification of online reviews for consumer intelligence
WO2022126810A1 (en) Text clustering method
Phan et al. Aspect-level sentiment analysis: A survey of graph convolutional network methods
CN107357918B (en) Text representation method based on graph
KR101717230B1 (en) Document summarization method using recursive autoencoder based sentence vector modeling and document summarization system
CN111813955B (en) Service clustering method based on knowledge graph representation learning
CN112686025B (en) Chinese choice question interference item generation method based on free text
WO2021113104A1 (en) Methods and systems for predicting a price of any subtractively manufactured part utilizing artificial intelligence at a computing device
CN114997288A (en) Design resource association method
CN114265936A (en) Method for realizing text mining of science and technology project
Cheng et al. Domain-specific ontology mapping by corpus-based semantic similarity
CN111767724A (en) Text similarity calculation method and system
CN111581984A (en) Statement representation method based on task contribution degree
CN108427769B (en) Character interest tag extraction method based on social network
Xie et al. Construction of unsupervised sentiment classifier on idioms resources
Wang Research on the art value and application of art creation based on the emotion analysis of art
CN109117436A (en) Synonym automatic discovering method and its system based on topic model
Zhou et al. Satirical news detection with semantic feature extraction and game-theoretic rough sets
CN113869038A (en) Attention point similarity analysis method for Baidu stick bar based on feature word analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant