CN113536761A - Method for calculating sentence similarity based on frame importance - Google Patents
- Publication number
- CN113536761A
- Authority
- CN
- China
- Prior art keywords
- frame
- importance
- frames
- sentence
- information set
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a method for calculating sentence similarity based on frame importance, comprising the following steps. Step 1: all frames in the English sentence S form a frame semantic information set E. Step 2: the core frame elements of each frame in the set E are extracted. Step 3: the importance of each frame is calculated from the number of core frame elements in each frame in the set E. Step 4: all frames in the English sentence S' form a frame semantic information set E', and the importance of each frame in the set E' is calculated. Step 5: each frame that appears in both E and E' is taken as a frame group; the smaller of the two frame importances in each group is selected as the importance of that group; the importances of all frame groups are accumulated, and the similarity of the English sentences S and S' is calculated from the accumulated value. The method can be applied to natural language processing tasks such as textual entailment recognition and text summarization.
Description
Technical Field
The invention belongs to the technical field of natural language processing.
Background
The frame semantic library FrameNet is a semantic knowledge base built on frame semantics and is used in research fields such as linguistics, computational linguistics, and natural language processing. Frame semantics makes it possible to uncover the conceptual structures and semantic scenes behind words.
A frame in FrameNet is the semantic structure of a sentence expressing a specific scene; it consists of lexical units (LUs) and the frame elements (FEs) associated with them. The participants, external conditions, and so on involved in a frame are called frame elements. Frame elements are divided by importance into core frame elements (Core FEs) and non-core frame elements (Peripheral, Extra-Thematic). Core frame elements are the conceptually necessary components of a frame; their number and type differ from frame to frame and give each frame its distinguishing character. Non-core frame elements express general semantic components such as time and place.
When a sentence contains multiple frames, the frames are not necessarily equally important. To measure the similarity between sentences accurately, the importance of each frame must be considered along with the frames themselves. Measuring frame importance is not easy, however, because the result depends on the measurement standard chosen; selecting that standard is therefore the key to measuring frame importance. Existing similarity calculation methods based on word-level features do not consider the structural information of sentences, and methods based on sentence structure features do not fully consider sentence semantics. Conventional sentence similarity calculation methods focus mainly on sentence keywords and sentence structure; because they capture sentence semantics incompletely and are poorly interpretable, their similarity results are not accurate enough.
Disclosure of Invention
The purpose of the invention is as follows: the invention provides a method for calculating sentence similarity based on frame importance, which aims to solve the problems in the prior art.
The technical scheme is as follows: the invention provides a method for calculating sentence similarity based on frame importance, which comprises the following steps:
Step 1: extracting all frames in the English sentence S, and forming a frame semantic information set E from all the frames;
Step 2: constructing GIFN, a visualization tool for the frame semantic library FrameNet, and extracting the core frame elements of each frame in the frame semantic information set E through GIFN;
Step 3: calculating a frame influence factor for each frame based on the number of core frame elements in the frame; establishing a frame importance function from the frame influence factors to obtain the importance $w(f_{E,i})$ of the i-th frame in the frame semantic information set E, where $f_{E,i}$ denotes the i-th frame in E, i = 1, 2, ..., frame_S, and frame_S is the total number of frames in E;
Step 4: forming a frame semantic information set E' from all frames in the English sentence S' according to steps 1 to 3, and calculating the importance of each frame in E';
Step 5: taking each frame that appears in both E and E' as a frame group, obtaining frame_same frame groups; comparing the importance of the two frames in the j-th frame group and selecting the smaller frame importance as the frame importance $min_j$ of the j-th frame group, j = 1, 2, ..., frame_same; accumulating the frame importances of the frame_same frame groups, and calculating the similarity of the English sentences S and S' from the accumulated value.
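The grouping and accumulation in step 5 can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: the function name and the importance values are hypothetical, and the sketch stops at the accumulated value because the final similarity formula is not reproduced in this text.

```python
def accumulate_shared_importance(importance_s, importance_s_prime):
    """Return (per-group minima, accumulated value) for frames found in both sentences."""
    # A frame group is a frame that occurs in both E and E'.
    shared = sorted(set(importance_s) & set(importance_s_prime))
    # min_j: the smaller of the two importances of the j-th frame group.
    minima = {frame: min(importance_s[frame], importance_s_prime[frame]) for frame in shared}
    return minima, sum(minima.values())


# Hypothetical importance values for two sentences (not taken from the patent).
w_s = {"Statement": 0.5, "Possession": 0.3, "Compliance": 0.2}
w_s_prime = {"Statement": 0.4, "Possession": 0.6}

group_minima, accumulated = accumulate_shared_importance(w_s, w_s_prime)
print(group_minima)
print(round(accumulated, 6))
```

Taking the minimum means a shared frame contributes only as much as it matters in the sentence where it matters least, so a frame that is central to S but marginal in S' adds little to the accumulated value.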
Further, in step 1, the English sentence S is input into the open-source semantic frame extraction tool SEMAFOR, which analyzes the sentence according to the structure of the frame semantic library FrameNet and thereby extracts the frames in S.
Further, GIFN, the visualization tool for the frame semantic library FrameNet, is constructed in step 2 as follows: all frames in FrameNet are taken as nodes, the semantic relations among frames and the semantic relations between lexical units and frames are taken as edges, and the nodes and edges are stored in the graph database Neo4j.
Further, the similarity of the English sentences S and S' is calculated as follows:
where Similarity_score is the similarity between the English sentences S and S'; frame_S' is the total number of frames in the frame semantic information set E'; Maximum(·) takes the maximum value; and Path_score is expressed as follows:
where frame_rel is the number of shortest-path frame pairs, obtained as follows: the frames that also occur in the frame semantic information set E' are removed from the frame semantic information set E to obtain a set E1; the frames that also occur in E are removed from E' to obtain a set E'1; the number of edges required for each frame in E1 to reach any frame in E'1 is obtained through the visualization tool GIFN; and the two frames requiring the fewest edges are taken as a shortest-path frame pair. $Path\_value_{i'}$ is expressed as follows:
where CountPath is the number of edges required for one frame of the i'-th shortest-path frame pair to reach the other, and $weight_t$ is the weight of the t-th edge.
Further, the frame influence factor in step 3 is:
where $c_i$ is the total number of core frame elements in $f_{E,i}$, $n_i$ is the total number of frame elements in $f_{E,i}$, and $\beta_i$ is the frame influence factor of $f_{E,i}$.
Further, the frame importance function in step 3 is:
Beneficial effects: the invention considers the importance of each frame along with the frames themselves, and can therefore measure the similarity between sentences more accurately. The method can be applied to natural language processing tasks such as textual entailment recognition and text summarization.
Drawings
FIG. 1 is a schematic flow chart of a method for calculating the importance of a frame according to the present invention;
FIG. 2 is a flow chart of extracting core frame elements according to a frame semantic library FrameNet;
FIG. 3 is a flow chart of computing the frame importance function;
FIG. 4 is a diagram of the semantic relationships between partial frames in the GIFN.
Detailed Description
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate an embodiment of the invention and, together with the description, serve to explain the invention and not to limit the invention.
In the method for calculating sentence similarity based on frame importance, the core frame elements of each frame are extracted according to the frame semantic library FrameNet, and the importance of the frames is distinguished by the number of core frame elements each frame contains. The method is readily applied to natural language processing tasks such as textual entailment recognition and text summarization.
The following detailed description of the embodiments of the present invention will be provided with reference to the drawings and examples, so as to fully understand how to implement the technical solution of the present invention and achieve the technical effects. It should be noted that, as long as there is no conflict, the embodiments and the features of the embodiments of the present invention may be combined with each other, and the technical solutions formed are within the scope of the present invention.
The FrameNet described in this embodiment is a semantic knowledge base built on frame semantics at the University of California, Berkeley, USA, and is used in research fields such as linguistics, computational linguistics, and natural language processing. Frame semantics makes it possible to uncover the conceptual structures and semantic scenes behind words. In FrameNet, a frame is the semantic structure of a sentence expressing a particular scene, made up of a lexical unit and its associated frame elements. The participants, external conditions, and so on involved in a frame are called frame elements; in a real corpus they correspond to the words that describe the event or the modality of the event in context. Frame elements are divided by importance into core frame elements and non-core frame elements. Core frame elements are the conceptually necessary components of a frame; their number and type differ from frame to frame and give each frame its distinguishing character. Non-core frame elements express general semantic components such as time and place.
SEMAFOR is an open-source frame semantic parser. It automatically analyzes English sentences according to the FrameNet structure and returns the frames evoked by the sentence, their frame elements, and the specific content each frame element refers to. In the key steps of this embodiment, frame semantic information is obtained with the open-source tool SEMAFOR according to the semantic knowledge base FrameNet.
Neo4j is a high-performance NoSQL graph database that stores structured data as a network (graph) rather than in tables. It is an embedded, disk-based Java persistence engine with full transactional features. Neo4j scales to large data sets: billions of nodes, relationships, and attributes can be processed on one machine, and it can be extended to multiple machines running in parallel. Graph databases are good at handling large amounts of complex, interconnected, loosely structured data that changes rapidly and is queried frequently. In a relational database, such queries require a large number of table joins and therefore cause performance problems. Neo4j focuses on the performance degradation that a traditional RDBMS suffers on join-heavy queries. Because the data is modeled as a graph, Neo4j traverses nodes and edges at a speed that does not depend on the total amount of data in the graph.
As shown in fig. 1, the present embodiment provides a method for calculating sentence similarity based on frame importance, which includes:
Step one: identifying all frame semantic information in the English sentence S. The sentence S is parsed with the open-source frame semantic parser SEMAFOR according to the FrameNet structure to obtain the frames evoked by the content of S; each frame comprises a lexical unit and the frame elements connected to it, and all the frames form the frame semantic information set E. The input to SEMAFOR is the English sentence S, and the output is the parse result produced by SEMAFOR.
Using the development tool Eclipse, FrameNet is mapped into Neo4j to build the FrameNet visualization tool GIFN ("graphic Interpretation for FrameNet"): all frames, frame elements, and lexical units in the frame semantic library FrameNet are taken as nodes (a frame is treated as a node because it is the semantic structure, formed by lexical units and the frame elements connected to them, that expresses a specific scene); the relations between frames and the relations between lexical units and frames are taken as edges; and the nodes and edges are stored in the graph database Neo4j.
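The node-and-edge layout described above can be illustrated with a small in-memory graph. This is only a sketch of the data model, not the Neo4j-backed GIFN tool: the `FrameGraph` class and its methods are hypothetical, and the example nodes and edges are illustrative rather than an excerpt of FrameNet.

```python
from collections import defaultdict


class FrameGraph:
    """Toy stand-in for GIFN: nodes are frames or lexical units, edges are labelled relations."""

    def __init__(self):
        # node name -> list of (relation label, neighbouring node)
        self.edges = defaultdict(list)

    def add_relation(self, source, relation, target):
        """Store one directed, labelled edge."""
        self.edges[source].append((relation, target))

    def neighbours(self, node):
        """Return the outgoing (relation, neighbour) pairs of a node."""
        return list(self.edges.get(node, []))


graph = FrameGraph()
# Frame-to-frame relations (illustrative).
graph.add_relation("Statement", "Inherits from", "Communication")
graph.add_relation("Communication", "Is Inherited by", "Statement")
# Lexical-unit-to-frame relation (illustrative label).
graph.add_relation("say.v", "evokes", "Statement")

print(graph.neighbours("Statement"))
```

Storing each relation as a labelled edge keeps the relation type available during traversal, which is what later allows a weight to be attached to every edge on a path.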
Step two: extracting the frames of each line from the SEMAFOR parse result. A FrameExtraction class is defined for this purpose, with a searchFrame method that extracts the frames of each line; calling searchFrame outputs all the frames contained in each line.
A FrameFEExtraction class is defined to extract the frame elements contained in each frame in the SEMAFOR parse result, with a searchFE method that extracts the frame elements of each frame; calling searchFE outputs all the frame elements contained under each frame.
Part of key codes of the second step are as follows:
Part of the results obtained in step two are collated in Table 1:
TABLE 1
Frame | FEs (frame elements) |
---|---|
Statement | {Message, Speaker} |
Sign_agreement | {Agreement, Signatory} |
Ordinal_numbers | {Type} |
Possession | {Possession} |
Compliance | {Protagonist} |
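The extraction in step two can be sketched as follows. This is a hedged illustration, not the code of the embodiment: the function names `search_frames` and `search_fes` merely echo the searchFrame and searchFE methods, and the JSON layout (`frames`, `target`, `annotationSets`, `frameElements`) is an assumption about SEMAFOR's line-per-sentence JSON output that should be checked against the actual parser output.

```python
import json


def search_frames(parse_line):
    """Return the frame names found in one line of parser output."""
    parsed = json.loads(parse_line)
    return [frame["target"]["name"] for frame in parsed.get("frames", [])]


def search_fes(parse_line):
    """Return a mapping from frame name to the sorted names of its frame elements."""
    parsed = json.loads(parse_line)
    elements_by_frame = {}
    for frame in parsed.get("frames", []):
        annotation_sets = frame.get("annotationSets", [])
        # Assumption: the first annotation set is the parser's preferred analysis.
        elements = annotation_sets[0].get("frameElements", []) if annotation_sets else []
        elements_by_frame[frame["target"]["name"]] = sorted({fe["name"] for fe in elements})
    return elements_by_frame


# Hand-written sample line in the assumed layout (not real parser output).
sample_line = json.dumps({
    "frames": [
        {"target": {"name": "Statement"},
         "annotationSets": [{"frameElements": [{"name": "Speaker"}, {"name": "Message"}]}]},
        {"target": {"name": "Ordinal_numbers"},
         "annotationSets": [{"frameElements": [{"name": "Type"}]}]},
    ]
})

print(search_frames(sample_line))
print(search_fes(sample_line))
```

The resulting mapping has the same shape as Table 1: one entry per frame, each holding the set of frame elements found for it.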
Step three: the FrameNet visualization tool GIFN displays the frame semantic library FrameNet as a graph topology. In GIFN, frames and LUs (lexical units) carry an annoset, and FEs (frame elements) carry an FEcoreSet (core frame element set), which gives the core frame elements of the frames in the set E; the specific flow is shown in FIG. 2. A FEcoreExtraction class is defined to extract the core frame elements contained in each frame in the SEMAFOR parse result; within this class the core frame elements of a frame are looked up through FEcoreSet, and the output is the core frame elements contained in each frame.
Part of the results obtained in step three are summarized in table 2:
TABLE 2
Frame | CoreFEs (core frame elements) |
---|---|
Statement | {Message, Speaker} |
Sign_agreement | {Agreement, Signatory} |
Ordinal_numbers | {Type} |
Possession | {Possession} |
Compliance | {Protagonist} |
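The lookup in step three amounts to intersecting the frame elements found in the sentence with the core set of each frame. The sketch below assumes a plain dictionary in place of the FEcoreSet stored in GIFN; the dictionary contents are limited to entries shown in Table 2, and the `Time` element in the sample input is a hypothetical non-core element added to show the filtering.

```python
# Hypothetical stand-in for FEcoreSet, restricted to two frames from Table 2.
FE_CORE_SET = {
    "Statement": {"Message", "Speaker"},
    "Ordinal_numbers": {"Type"},
}


def core_elements(elements_by_frame, core_sets=FE_CORE_SET):
    """Keep only the frame elements that belong to the core set of their frame."""
    return {
        frame: sorted(set(elements) & core_sets.get(frame, set()))
        for frame, elements in elements_by_frame.items()
    }


sentence_elements = {
    "Statement": ["Message", "Speaker", "Time"],  # "Time" is a non-core element here
    "Ordinal_numbers": ["Type"],
}
print(core_elements(sentence_elements))
```

A frame that is missing from the lookup yields an empty list, so an incomplete core-set table lowers the count for that frame instead of raising an error.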
Step four: calculating the frame influence factor of each frame based on the number of core frame elements in the frame. The proportion of the number of core frame elements covered by each frame to the number of frame elements covered by all frames in the whole sentence S is calculated and defined as the frame influence factor in the frame semantic information set E. The calculation formula is as follows:
where $c_i$ is the total number of core frame elements in $f_{E,i}$; $n_i$ is the total number of frame elements in $f_{E,i}$; $f_{E,i}$ denotes the i-th frame in the set E, i = 1, 2, ..., frame_S; and frame_S is the total number of frames covered in the sentence S. The more core frame elements a frame covers, the higher its importance and the larger its influence factor.
When the number of core frame elements covered by the two frames is the same, the semantic importance of the two frames is considered to be the same, and the influence factor values are the same.
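Formula (1) itself does not survive in this text. One reading consistent with the surrounding prose, offered as an assumption rather than as the patent's exact expression, divides the core-element count of a frame by the frame-element count of the whole sentence:

```latex
\beta_i = \frac{c_i}{\sum_{k=1}^{\mathit{frame\_S}} n_k},
\qquad i = 1, 2, \ldots, \mathit{frame\_S}
```

Under this reading the denominator is shared by all frames of S, so two frames with the same $c_i$ receive the same $\beta_i$, which agrees with the statement that frames covering the same number of core frame elements have the same influence factor.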
In this embodiment, a FEsCalculation class is defined to calculate the total number of frame elements contained in the English sentence S: FrameNum calculates the number of frames covered by each sentence, FEsNum calculates the number of frame elements covered by each frame, and the fes null method accumulates the frame element counts, so that the output is the total number of frame elements contained in the whole sentence. A CoreFEsCalculation class is defined to calculate the proportion of the number of core frame elements covered by each frame to the number of frame elements covered by all frames in the whole sentence S: a CoreFEsNum method calculates the number of core frame elements covered by each frame, and a CoreFEsPeer method calculates the proportion using formula (1).
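The counting described above can be sketched in Python. The sketch assumes the influence factor is the core-element count of a frame divided by the total frame-element count of the sentence, which is an interpretation of the prose and not a reproduction of formula (1); the function name is hypothetical, and the five frames of Tables 1 and 2 are treated as the whole sentence purely for illustration.

```python
def influence_factors(core_counts, element_counts):
    """Return beta for each frame: core-element count over the sentence-wide element count."""
    total_elements = sum(element_counts.values())
    if total_elements == 0:
        raise ValueError("the sentence has no frame elements")
    return {frame: core_counts[frame] / total_elements for frame in core_counts}


# Counts read off Tables 1 and 2 (every listed element there is also a core element).
core_counts = {
    "Statement": 2,
    "Sign_agreement": 2,
    "Ordinal_numbers": 1,
    "Possession": 1,
    "Compliance": 1,
}
element_counts = dict(core_counts)

beta = influence_factors(core_counts, element_counts)
for frame, value in beta.items():
    print(frame, round(value, 4))
```

With seven frame elements in total, frames with two core elements get 2/7 and frames with one get 1/7, so frames with equal core counts receive equal factors.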
The partial calculation results obtained in step four are shown in table 3:
TABLE 3
Step five: constructing the frame influence factor matrix, which is:
$M = (\beta_i)_{frame\_S \times 1}$
Step six: measuring the importance of the frames according to the influence factors of the frames in the set E, defining an importance function for each frame in the sentence S, and calculating the importance of each frame in the frame information set E. A weight is assigned according to the number of core frame elements covered by the frame, and the importance of each frame of the sentence S to the sentence is calculated as follows.
First, the importance of each frame in the frame information set E is initialized. The initialization formula is:
Then the importance of the frames in the English sentence S is normalized. The normalized importance of each frame to the sentence is calculated as:
where the exponential term is the exponential score of the corresponding element $\beta_i$ of the frame influence factor matrix, and $0 < w(f_{E,i}) \le 1$.
According to one embodiment of the invention, a FrameWeight class is defined to calculate frame importance: a CoreFEsPerall method accumulates the frame influence factors, and a FrameWeight method calculates the frame importance using formula (2). The output is the importance of each frame of a sentence to that sentence. A flow chart of defining the frame importance function is shown in FIG. 3.
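The description mentions an exponential score of each influence factor, a normalization step, and a result in the interval (0, 1]. A softmax over the influence factors, $w(f_{E,i}) = e^{\beta_i} / \sum_k e^{\beta_k}$, satisfies all three, and the sketch below uses it. This is an assumption made for illustration: formula (2) is not reproduced in this text, and the function name and input values are hypothetical.

```python
import math


def frame_importance(beta):
    """Normalize exponential scores of the influence factors so that they sum to 1."""
    # Initialization: one exponential score per frame.
    exp_scores = {frame: math.exp(value) for frame, value in beta.items()}
    # Normalization: divide by the accumulated score of the sentence.
    total = sum(exp_scores.values())
    return {frame: score / total for frame, score in exp_scores.items()}


# Hypothetical influence factors for a three-frame sentence.
beta = {"Statement": 2 / 7, "Sign_agreement": 2 / 7, "Ordinal_numbers": 1 / 7}
weights = frame_importance(beta)
for frame, weight in weights.items():
    print(frame, round(weight, 4))
```

Because the exponential is monotonic, a frame with a larger influence factor always receives a larger importance, and a sentence with a single frame gives that frame an importance of exactly 1.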
Step seven: forming a frame semantic information set E' from all frames in the English sentence S' according to steps one to six, and calculating the importance of each frame in E'.
Step eight: taking each frame that appears in both the frame semantic information set E and the frame semantic information set E' as a frame group, obtaining frame_same frame groups; comparing the importance of the two frames in the j-th frame group and selecting the smaller frame importance as the importance $min_j$ of the j-th frame group, j = 1, 2, ..., frame_same; accumulating the importances of the frame_same frame groups, and calculating the similarity of the English sentences S and S' based on the following formula:
where Similarity_score is the similarity between the English sentences S and S'; frame_S' is the total number of frames in the frame semantic information set E'; Maximum(·) takes the maximum value; and Path_score is calculated as follows:
where frame_rel is the number of shortest-path frame pairs, obtained as follows: the frames that also occur in E' are removed from the frame semantic information set E to obtain a set E1; the frames that also occur in E are removed from the frame semantic information set E' to obtain a set E'1; the number of edges required for each frame in E1 to reach any frame in E'1 is obtained through the visualization tool GIFN, where the semantic relations among some of the frames are shown in FIG. 4; and the two frames requiring the fewest edges are taken as a shortest-path frame pair. $Path\_value_{i'}$ is expressed as follows:
where CountPath is the number of edges required for one frame of the i'-th shortest-path frame pair to reach the other, and $weight_t$ is the weight of the t-th edge. The weight of each path in FIG. 4 is shown in Table 4:
TABLE 4
Inter-frame semantic relation (each relation is an edge in GIFN) | Path weight |
---|---|
Inherits from | 0.55 |
Is Inherited by | 0.55 |
Perspective on | 0.45 |
Is Perspectivized in | 0.45 |
Uses | 0.3 |
Is Used by | 0.3 |
Subframe of | 0.35 |
Has Subframe(s) | 0.35 |
Precedes | 0.2 |
Is Preceded by | 0.2 |
Is Inchoative of | 0.3 |
Is Causative of | 0.3 |
See also | 0.4 |
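The search for shortest-path frame pairs can be sketched with a breadth-first search over an in-memory graph. Several points are assumptions: the graph is a plain dictionary rather than the Neo4j-backed GIFN, the frame names are placeholders, each frame in E1 is paired with its nearest frame in E'1 (one reading of the description), and only a subset of the Table 4 weights is included. The sketch collects CountPath and the edge weights for each pair but does not combine them, because the Path_value and Path_score formulas are not reproduced in this text.

```python
from collections import deque

# Subset of the Table 4 weights, keyed by relation label.
EDGE_WEIGHTS = {
    "Inherits from": 0.55,
    "Is Inherited by": 0.55,
    "Is Used by": 0.3,
    "Subframe of": 0.35,
    "Precedes": 0.2,
}


def nearest_frame(graph, start, goals):
    """Breadth-first search: return (goal, relation labels on a fewest-edge path) or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, relations = queue.popleft()
        if node in goals:
            return node, relations
        for relation, neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, relations + [relation]))
    return None


def shortest_path_pairs(graph, frames_s, frames_s_prime):
    """Pair each frame unique to S with its nearest frame unique to S'."""
    e1 = set(frames_s) - set(frames_s_prime)        # E1: frames only in E
    e1_prime = set(frames_s_prime) - set(frames_s)  # E'1: frames only in E'
    pairs = []
    for frame in sorted(e1):
        found = nearest_frame(graph, frame, e1_prime)
        if found is not None:
            target, relations = found
            weights = [EDGE_WEIGHTS[relation] for relation in relations]
            pairs.append((frame, target, len(relations), weights))  # len(relations) is CountPath
    return pairs


# Placeholder graph: Frame_A -> Frame_B -> Frame_C.
toy_graph = {
    "Frame_A": [("Inherits from", "Frame_B")],
    "Frame_B": [("Is Used by", "Frame_C")],
}
pairs = shortest_path_pairs(toy_graph, ["Shared", "Frame_A"], ["Shared", "Frame_C"])
print(pairs)
```

Breadth-first search visits nodes in order of edge count, so the first frame of E'1 it reaches is one that requires the fewest edges; frames with no path to E'1 simply produce no pair.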
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that the features described in the above embodiments may be combined in any suitable manner without departing from the scope of the invention. To avoid unnecessary repetition, the possible combinations are not described separately.
Claims (6)
1. A method for calculating sentence similarity based on frame importance is characterized by comprising the following steps:
step 1: extracting all frames in the English sentence S, and forming a frame semantic information set E from all the frames;
step 2: constructing GIFN, a visualization tool for the frame semantic library FrameNet, and extracting the core frame elements of each frame in the frame semantic information set E through GIFN;
step 3: calculating a frame influence factor for each frame based on the number of core frame elements in the frame; establishing a frame importance function from the frame influence factors to obtain the importance $w(f_{E,i})$ of the i-th frame in the frame semantic information set E, where $f_{E,i}$ denotes the i-th frame in E, i = 1, 2, ..., frame_S, and frame_S is the total number of frames in E;
step 4: forming a frame semantic information set E' from all frames in the English sentence S' according to steps 1 to 3, and calculating the importance of each frame in E';
step 5: taking each frame that appears in both E and E' as a frame group, obtaining frame_same frame groups; comparing the importance of the two frames in the j-th frame group and selecting the smaller frame importance as the frame importance $min_j$ of the j-th frame group, j = 1, 2, ..., frame_same; accumulating the frame importances of the frame_same frame groups, and calculating the similarity of the English sentences S and S' from the accumulated value.
2. The method for calculating sentence similarity based on frame importance according to claim 1, wherein in step 1 the English sentence S is input into the open-source semantic frame extraction tool SEMAFOR, and SEMAFOR analyzes the input English sentence S according to the structure of the frame semantic library FrameNet, thereby extracting the frames in the English sentence S.
3. The method for calculating sentence similarity based on frame importance according to claim 1, wherein GIFN, the visualization tool for the frame semantic library FrameNet, is constructed in step 2 as follows: all frames in FrameNet are taken as nodes, the semantic relations among frames and the semantic relations between lexical units and frames are taken as edges, and the nodes and edges are stored in the graph database Neo4j.
4. The method for calculating sentence similarity based on frame importance according to claim 3, wherein the similarity of the English sentences S and S' is calculated as follows:
where Similarity_score is the similarity between the English sentences S and S'; frame_S' is the total number of frames in the frame semantic information set E'; Maximum(·) takes the maximum value; and Path_score is expressed as follows:
where frame_rel is the number of shortest-path frame pairs, obtained as follows: the frames that also occur in the frame semantic information set E' are removed from the frame semantic information set E to obtain a set E1; the frames that also occur in E are removed from E' to obtain a set E'1; the number of edges required for each frame in E1 to reach any frame in E'1 is obtained through the visualization tool GIFN; and the two frames requiring the fewest edges are taken as a shortest-path frame pair. $Path\_value_{i'}$ is expressed as follows:
where CountPath is the number of edges required for one frame of the i'-th shortest-path frame pair to reach the other, and $weight_t$ is the weight of the t-th edge.
5. The method for calculating sentence similarity based on frame importance according to claim 1, wherein the frame influence factor in step 3 is:
where $c_i$ is the total number of core frame elements in $f_{E,i}$, $n_i$ is the total number of frame elements in $f_{E,i}$, and $\beta_i$ is the frame influence factor of $f_{E,i}$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110776700.XA CN113536761B (en) | 2021-07-09 | 2021-07-09 | Method for calculating sentence similarity based on frame importance |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113536761A (en) | 2021-10-22 |
CN113536761B (en) | 2024-01-30 |
Family
ID=78127260
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110776700.XA Active CN113536761B (en) | 2021-07-09 | 2021-07-09 | Method for calculating sentence similarity based on frame importance |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113536761B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2012113422A (en) * | 2010-11-22 | 2012-06-14 | Nippon Telegr & Teleph Corp <Ntt> | Document processing apparatus, method and program |
US20130054612A1 (en) * | 2006-10-10 | 2013-02-28 | Abbyy Software Ltd. | Universal Document Similarity |
CN110889292A (en) * | 2019-11-29 | 2020-03-17 | 福州大学 | Text data viewpoint abstract generating method and system based on sentence meaning structure model |
CN111324690A (en) * | 2020-03-04 | 2020-06-23 | 南京航空航天大学 | FrameNet-based graphical semantic database processing method |
Non-Patent Citations (2)
Title |
---|
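TIEXIN WANG et al.: "A joint FrameNet and element focusing Sentence-BERT method of sentence similarity computation", EXPERT SYSTEMS WITH APPLICATIONS, vol. 200, no. 117084, pages 1 - 11 * |
WANG Tiexin et al.: "Frame semantic extension and similarity calculation for English sentences" (面向英文句子的框架语义扩展及相似度计算), Journal of Chinese Computer Systems (小型微型计算机系统), pages 1 - 8 * |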
Also Published As
Publication number | Publication date |
---|---|
CN113536761B (en) | 2024-01-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11520812B2 (en) | Method, apparatus, device and medium for determining text relevance | |
JP7223785B2 (en) | TIME-SERIES KNOWLEDGE GRAPH GENERATION METHOD, APPARATUS, DEVICE AND MEDIUM | |
CN111104794B (en) | Text similarity matching method based on subject term | |
TWI662425B (en) | A method of automatically generating semantic similar sentence samples | |
CN108197117B (en) | Chinese text keyword extraction method based on document theme structure and semantics | |
CN106649260B (en) | Product characteristic structure tree construction method based on comment text mining | |
US20190005049A1 (en) | Corpus search systems and methods | |
CN110502642B (en) | Entity relation extraction method based on dependency syntactic analysis and rules | |
CN111190900B (en) | JSON data visualization optimization method in cloud computing mode | |
US20110302168A1 (en) | Graphical models for representing text documents for computer analysis | |
US20220277005A1 (en) | Semantic parsing of natural language query | |
US11514034B2 (en) | Conversion of natural language query | |
CN103646112A (en) | Dependency parsing field self-adaption method based on web search | |
CN109522396B (en) | Knowledge processing method and system for national defense science and technology field | |
CN110909126A (en) | Information query method and device | |
CN109840255A (en) | Reply document creation method, device, equipment and storage medium | |
CN112633000A (en) | Method and device for associating entities in text, electronic equipment and storage medium | |
JP2018005690A (en) | Information processing apparatus and program | |
JPH0816620A (en) | Data sorting device/method, data sorting tree generation device/method, derivative extraction device/method, thesaurus construction device/method, and data processing system | |
JP7197542B2 (en) | Method, Apparatus, Device and Medium for Text Word Segmentation | |
CN111444713A (en) | Method and device for extracting entity relationship in news event | |
Korobkin et al. | Patent data analysis system for information extraction tasks | |
CN110929509B (en) | Domain event trigger word clustering method based on louvain community discovery algorithm | |
CN106294689B (en) | A kind of method and apparatus for selecting to carry out dimensionality reduction based on text category feature | |
CN112632272A (en) | Microblog emotion classification method and system based on syntactic analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||