CN113536761B - Method for calculating sentence similarity based on frame importance - Google Patents

Method for calculating sentence similarity based on frame importance

Publication number
CN113536761B
CN113536761B CN202110776700.XA CN202110776700A CN113536761B CN 113536761 B CN113536761 B CN 113536761B CN 202110776700 A CN202110776700 A CN 202110776700A CN 113536761 B CN113536761 B CN 113536761B
Authority
CN
China
Prior art keywords
frame
frames
importance
information set
semantic information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110776700.XA
Other languages
Chinese (zh)
Other versions
CN113536761A (en)
Inventor
王铁鑫
史荟
刘文静
严欣华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN202110776700.XA
Publication of CN113536761A
Application granted
Publication of CN113536761B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method for calculating sentence similarity based on frame importance, comprising the following steps. Step 1: all frames in an English sentence S form a frame semantic information set E. Step 2: the core frame elements of each frame in the set E are extracted. Step 3: the importance of each frame is calculated from the number of core frame elements in each frame in the set E. Step 4: all frames in an English sentence S' form a frame semantic information set E', and the importance of each frame in the set E' is calculated. Step 5: each frame that appears in both E and E' forms a frame group; the smaller of the two frame importances in each group is taken as the importance of that group; the importances of all frame groups are accumulated, and the similarity of the English sentences S and S' is calculated from the accumulated value. The method can be applied to natural language processing tasks such as textual entailment recognition and text summarization.

Description

Method for calculating sentence similarity based on frame importance
Technical Field
The invention belongs to the technical field of natural language processing.
Background
FrameNet is a semantic knowledge base built on frame semantics and is used for research in linguistics, computational linguistics, natural language processing and related fields. Frame semantics makes it possible to mine the conceptual structures and semantic scenes hidden behind words.
A frame in FrameNet is a sentence-level semantic structure that expresses a specific scene and consists of lexical units (LUs) and their associated frame elements (FEs). The participants, external conditions and so on involved in a frame are called frame elements. Frame elements are divided by importance into core frame elements (Core FEs) and non-core frame elements (Peripheral, Extra-Thematic). Core frame elements are the components essential to understanding a frame conceptually; their number and type differ from frame to frame, which gives each frame its individuality. Non-core frame elements express general semantic components such as time and place.
When a sentence contains several frames, those frames are not necessarily equally important. To measure the similarity between sentences accurately, the importance of the frames must be considered together with the frames themselves. Measuring frame importance in a sentence is not easy, however, because the result depends on the importance criterion used; choosing that criterion is therefore the key to measuring frame importance. Existing similarity methods based on word-level features ignore the structural information of sentences, while methods based on sentence-structure features fail to take sentence semantics fully into account. Traditional sentence similarity methods focus mainly on sentence keywords and structure, so their results are inaccurate and hard to interpret because semantics are only partially considered.
Disclosure of Invention
Object of the invention: to solve the problems existing in the prior art, the invention provides a method for calculating sentence similarity based on frame importance.
Technical solution: the invention provides a method for calculating sentence similarity based on frame importance, which comprises the following steps:
step 1: extracting all frames in the English sentence S, and forming a frame semantic information set E from all the frames;
step 2: constructing GIFN, a visualization tool for the frame semantic library FrameNet, and extracting the core frame elements of each frame in the frame semantic information set E through GIFN;
step 3: calculating a frame influence factor for each frame based on the number of core frame elements in that frame; establishing a frame importance function from the frame influence factors to obtain the importance $w(f_{E,i})$ of the i-th frame in the frame semantic information set E, where $f_{E,i}$ denotes the i-th frame in E, i = 1, 2, ..., frame_S, and frame_S is the total number of frames in E;
step 4: forming all frames in the English sentence S' into a frame semantic information set E' according to steps 1-3, and calculating the importance of each frame in E';
step 5: taking each frame that appears in both E and E' as a frame group to obtain frame_same frame groups; comparing the importance of the two frames in the j-th frame group and selecting the smaller one as the frame importance $min_j$ of the j-th frame group, j = 1, 2, ..., frame_same; accumulating the frame importance of the frame_same frame groups, and calculating the similarity of the English sentences S and S' from the accumulated value.
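The accumulation in step 5 can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: it assumes each sentence's frame importances are already available as a dictionary keyed by frame name (so a frame is counted once per sentence), and the function name and the sample values are hypothetical. It stops at the accumulated value and does not compute the final similarity score.

```python
from typing import Dict


def accumulate_shared_importance(w_e: Dict[str, float], w_e2: Dict[str, float]) -> float:
    """Sum, over the frames present in both sentences, the smaller of the two importances."""
    shared = w_e.keys() & w_e2.keys()  # the frame_same frame groups
    return sum(min(w_e[frame], w_e2[frame]) for frame in shared)


# Hypothetical importances for two sentences S and S'.
w_s = {"Statement": 0.4, "Sign_agreement": 0.4, "Possession": 0.2}
w_s2 = {"Statement": 0.5, "Compliance": 0.25, "Possession": 0.25}

# Shared frames are Statement and Possession, so the sum is min(0.4, 0.5) + min(0.2, 0.25).
accumulated = accumulate_shared_importance(w_s, w_s2)
```

Taking the minimum means a shared frame contributes only as much as it matters in the sentence where it matters least.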
Further, in step 1, the English sentence S is input into the open-source semantic frame extraction tool SEMAFOR, and SEMAFOR parses the input sentence according to the structure of the frame semantic library FrameNet, thereby extracting the frames in the English sentence S.
Further, the GIFN visualization tool for FrameNet in step 2 is constructed as follows: all frames in FrameNet are used as nodes, the semantic relations among frames and the semantic relations between lexical units and frames are used as edges, and the nodes and edges are stored in the graph database Neo4j.
Further, the similarity between the English sentences S and S' is calculated by the following formula:
where Similarity_score is the similarity between the English sentences S and S'; frame_S' is the total number of frames in the frame semantic information set E'; max() is the maximum function; and Path_score is expressed as follows:
where frame_rel is the number of shortest-path frame pairs. The shortest-path frame pairs are obtained as follows: the frames that also appear in the frame semantic information set E' are removed from the frame semantic information set E to obtain a set E1; the frames that also appear in E are removed from E' to obtain a set E'1; the number of edges each frame in E1 needs to reach any frame in E'1 is obtained through the visualization tool GIFN; the two frames that need the fewest edges are taken as a shortest-path frame pair. $path\_value_{i'}$ is expressed as follows:
where CountPath is the number of edges required for one frame to reach the other in the i'-th shortest-path frame pair, and $weight_t$ is the weight of the t-th edge.
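The search for shortest-path frame pairs can be sketched with a breadth-first search over an in-memory adjacency map standing in for GIFN. Several things here are assumptions rather than statements of the patent: edges are treated as undirected, each frame in E1 is paired with its nearest reachable frame in E'1 (one reading of "the two frames that need the fewest edges"), and the graph, frame names and function names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]


def hop_counts(graph: Graph, source: str) -> Dict[str, int]:
    """Breadth-first search: number of edges from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def shortest_path_pairs(graph: Graph, e: Iterable[str], e2: Iterable[str]) -> List[Tuple[str, str, int]]:
    """Pair each frame unique to E with its nearest frame unique to E'."""
    e1 = set(e) - set(e2)      # E1: frames of E not shared with E'
    e2_1 = set(e2) - set(e)    # E'1: frames of E' not shared with E
    pairs = []
    for frame in sorted(e1):
        dist = hop_counts(graph, frame)
        reachable = [(dist[other], other) for other in sorted(e2_1) if other in dist]
        if reachable:
            count_path, other = min(reachable)  # fewest edges wins
            pairs.append((frame, other, count_path))
    return pairs


# Hypothetical toy graph: A - B - C, with D isolated.
toy_graph: Graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "D": set()}
pairs = shortest_path_pairs(toy_graph, ["A", "X"], ["C", "X"])
frame_rel = len(pairs)  # number of shortest-path frame pairs
```

In the toy graph, the shared frame X is dropped from both sets, and A reaches C in two edges, so a single pair with CountPath 2 is returned.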
Further, in step 3, the frame influence factor is:
where $c_i$ is the total number of core frame elements in $f_{E,i}$; $n_i$ is the total number of frame elements in $f_{E,i}$; and $\beta_i$ is the frame influence factor of $f_{E,i}$.
Further, in step 3, the frame importance function is:
where $e^{\beta_i}$ is the exponential score of $\beta_i$.
Beneficial effects: the invention considers the importance of the frames together with the frames themselves, and can therefore measure the similarity between sentences more accurately. The method can be applied to natural language processing tasks such as textual entailment recognition and text summarization.
Drawings
FIG. 1 is a schematic flow chart of a frame importance calculating method according to the present invention;
FIG. 2 is a flow diagram of extracting core frame elements from a frame semantic library FrameNet;
FIG. 3 is a flow chart for calculating a frame importance function;
FIG. 4 is a diagram of the semantic relations between some of the frames in GIFN.
Detailed Description
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
In the method for calculating sentence similarity based on frame importance, the core frame elements of each frame are extracted according to the frame semantic library FrameNet, and frames are ranked in importance by the number of core frame elements they contain. The method is readily applied to natural language processing tasks such as textual entailment recognition and text summarization.
The following describes embodiments of the present invention in detail with reference to the drawings and examples, thereby fully understanding and implementing the implementation process of how the technical means are applied to solve the technical problems and achieve the technical effects. It should be noted that, as long as no conflict is formed, each embodiment of the present invention and each feature of each embodiment may be combined with each other, and the formed technical solutions are all within the protection scope of the present invention.
The FrameNet described in this embodiment is a semantic knowledge base based on frame semantics, constructed at the University of California, Berkeley, and used for research in linguistics, computational linguistics, natural language processing and related fields. Frame semantics makes it possible to mine the conceptual structures and semantic scenes hidden behind words. In FrameNet, a frame is a sentence-level semantic structure that expresses a particular scene and consists of lexical units and the frame elements associated with them. The participants, external conditions and so on involved in a frame are called frame elements; in a real corpus they correspond to the words in the context that describe the event or its circumstances. Frame elements are divided by importance into core frame elements and non-core frame elements. Core frame elements are the components essential to understanding a frame conceptually; their number and type differ from frame to frame, which gives each frame its individuality. Non-core frame elements express general semantic components such as time and place.
SEMAFOR is an open-source frame-semantic parser. It automatically analyzes English sentences according to the FrameNet structure and returns the frames evoked by the sentence content, their frame elements, and the specific content that each frame element refers to. In the key steps of this embodiment, the frame semantic information is obtained with the SEMAFOR open-source tool on the basis of the semantic knowledge base FrameNet.
Neo4j is a high-performance NoSQL graph database that stores structured data as a network (graph) rather than in tables. It is an embedded, disk-based Java persistence engine with full transactional properties. Neo4j scales to large graphs: it can handle billions of nodes, relationships and properties on one machine and can be extended to multiple machines running in parallel. Graph databases are well suited to large volumes of complex, interconnected, loosely structured data that change rapidly and are queried frequently; in a relational database such queries require many table joins and therefore cause performance problems. Neo4j focuses on the performance degradation that a conventional RDBMS suffers on join-heavy queries. Because the data are modeled as a graph, Neo4j traverses nodes and edges at a speed that does not depend on the amount of data in the graph.
As shown in fig. 1, the present embodiment provides a method for calculating sentence similarity based on frame importance, which includes:
Step one: all frame semantic information is identified in the English sentence S. The open-source frame-semantic parser SEMAFOR analyzes the sentence S according to the FrameNet structure to obtain the frames evoked by the content of S; each frame comprises a lexical unit and the frame elements associated with it, and all the frames form a frame semantic information set E. The input to SEMAFOR is the English sentence S, and the output is the parse result produced by SEMAFOR.
Using the development tool Eclipse, FrameNet is mapped into Neo4j to obtain the FrameNet visualization tool GIFN. All frames, frame elements and lexical units in the frame semantic library FrameNet are taken as nodes (frames are nodes because a frame is a sentence-level semantic structure, formed by lexical units and their associated frame elements, that expresses a specific scene). The relations among frames and the relations between lexical units and frames are taken as edges. Nodes and edges are stored in the graph database Neo4j, giving the FrameNet visualization tool "Graphical Interpretation for FrameNet" (GIFN).
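The node-and-edge layout described above can be illustrated with a tiny in-memory graph in Python. This is only a stand-in for the Neo4j-backed GIFN, written to show the data model; the class, the "Frame"/"LU" labels, the relation name "EVOKES" and the sample entries are all assumptions made for illustration and are not taken from the patent or copied from FrameNet.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class FrameGraph:
    """Tiny in-memory stand-in for a graph of frames and lexical units."""
    labels: Dict[str, str] = field(default_factory=dict)              # node name -> node label
    edges: List[Tuple[str, str, str]] = field(default_factory=list)   # (source, relation, target)

    def add_node(self, name: str, label: str) -> None:
        self.labels[name] = label

    def add_edge(self, source: str, relation: str, target: str) -> None:
        # Refuse edges whose endpoints were never registered as nodes.
        if source not in self.labels or target not in self.labels:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.append((source, relation, target))

    def neighbours(self, name: str) -> Set[str]:
        """Nodes one edge away from name, ignoring edge direction."""
        out = {t for s, _, t in self.edges if s == name}
        out |= {s for s, _, t in self.edges if t == name}
        return out


# Illustrative content only.
gifn = FrameGraph()
gifn.add_node("Statement", "Frame")
gifn.add_node("Communication", "Frame")
gifn.add_node("say.v", "LU")
gifn.add_edge("Statement", "Inherits from", "Communication")  # frame-to-frame relation
gifn.add_edge("say.v", "EVOKES", "Statement")                 # lexical-unit-to-frame relation
```

A real deployment would persist the same triples in Neo4j; keeping the relation name on each edge is what later allows a weight to be attached per relation type.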
Step two: the frames in each row of the SEMAFOR parse result are extracted. A frame extraction class is defined for this purpose, and a searchFrame method is defined in it to extract the frames of each row; the searchFrame method is called, and its output is all the frames contained in each row.
A frame element extraction class is defined to extract the frame elements contained in each frame of the SEMAFOR parse result; the searchFE method is called, and its output is all the frame elements contained in each frame.
Part of the key code of step two is as follows:
Part of the results obtained in step two is shown in Table 1:
TABLE 1
Frame              FEs (frame elements)
Statement          {Message, Speaker}
Sign_agreement     {Agreement, Signatory}
Ordinal_numbers    {Type}
Possession         {Possession}
Compliance         {Protagonist}
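A minimal Python sketch of the extraction in step two follows. It assumes SEMAFOR's output is one JSON object per sentence with a `frames` list, in which each entry carries the frame name under `target.name` and its frame elements under `annotationSets[0].frameElements`; this layout should be checked against the SEMAFOR version in use. The function name and the sample record are hypothetical and hand-written, not real parser output.

```python
import json
from typing import Dict, List


def frames_and_elements(semafor_line: str) -> Dict[str, List[str]]:
    """Map each frame in one parsed sentence to the names of its frame elements."""
    parsed = json.loads(semafor_line)
    result: Dict[str, List[str]] = {}
    for frame in parsed.get("frames", []):
        name = frame["target"]["name"]
        annotation_sets = frame.get("annotationSets", [])
        # Use the first (top-ranked) annotation set when one is present.
        elements = annotation_sets[0].get("frameElements", []) if annotation_sets else []
        result[name] = sorted(fe["name"] for fe in elements)
    return result


# Hand-written sample in the assumed layout (not real SEMAFOR output).
sample_line = json.dumps({
    "frames": [
        {"target": {"name": "Statement"},
         "annotationSets": [{"frameElements": [{"name": "Speaker"}, {"name": "Message"}]}]},
        {"target": {"name": "Possession"},
         "annotationSets": [{"frameElements": [{"name": "Possession"}]}]},
    ]
})

table = frames_and_elements(sample_line)
```

Keying the result by frame name keeps the sketch short but merges repeated occurrences of the same frame; a list of (frame, elements) pairs would preserve them.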
Step three: the FrameNet visualization tool GIFN displays the frame semantic library FrameNet in graphical form. In GIFN, frames and LUs (lexical units) contain an anonset, and FEs (frame elements) contain an FEcoreSet (core frame element set), where FEcoreSet represents the core frame elements of the frames in the set E; the specific flow is shown in FIG. 2. An FEcoreextraction class is defined to extract the core frame elements of each frame in the SEMAFOR parse result; the core frame elements are looked up through FEcoreSet in the FEcoreextraction class, and the output is the core frame elements contained in each frame.
Part of the results obtained in step three is shown in Table 2:
TABLE 2
Frame              Core FEs (core frame elements)
Statement          {Message, Speaker}
Sign_agreement     {Agreement, Signatory}
Ordinal_numbers    {Type}
Possession         {Possession}
Compliance         {Protagonist}
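The core-element lookup of step three can be sketched as a filter over the frame elements found in the sentence. The sketch below assumes that each frame element carries a FrameNet-style core type ("Core", "Peripheral", "Extra-Thematic") and uses a small hand-written dictionary in place of a query against GIFN; the dictionary contents and the function name are illustrative assumptions.

```python
from typing import Dict, List

# Hypothetical excerpt: frame -> {frame element -> core type}.
FE_CORE_TYPES: Dict[str, Dict[str, str]] = {
    "Statement": {"Speaker": "Core", "Message": "Core", "Time": "Peripheral"},
    "Possession": {"Possession": "Core", "Manner": "Peripheral"},
}


def core_elements(frame: str, found: List[str],
                  core_types: Dict[str, Dict[str, str]] = FE_CORE_TYPES) -> List[str]:
    """Keep only the frame elements of `frame` whose core type is "Core"."""
    types = core_types.get(frame, {})
    return [fe for fe in found if types.get(fe) == "Core"]


statement_core = core_elements("Statement", ["Message", "Speaker", "Time"])
```

Unknown frames or elements simply yield no core elements, which keeps the count used in the next step conservative.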
Step four: the frame influence factor of each frame is calculated from the number of core frame elements in that frame. The proportion of the number of core frame elements covered by each frame to the number of frame elements covered by the frames in the whole sentence S is calculated; this proportion is defined as the frame influence factor in the frame semantic information set E and is calculated as follows:
where $c_i$ is the total number of core frame elements in $f_{E,i}$; $n_i$ is the total number of frame elements in $f_{E,i}$; $f_{E,i}$ is the i-th frame in the set E, i = 1, 2, ..., frame_S; and frame_S is the total number of frames covered in sentence S. The more core frame elements a frame has, the more important it is and the larger its influence factor.
When two frames cover the same number of core frame elements, their semantic importance is considered equal and their influence factors are equal.
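The influence factor can be sketched in Python under one reading of the description: each frame's core-element count is divided by the total number of frame elements found in the whole sentence. That choice of denominator is an assumption; the definition of n_i can also be read as a per-frame count. The function name is hypothetical, and the counts are taken from Tables 1 and 2.

```python
from typing import Dict


def influence_factors(core_counts: Dict[str, int], total_elements: int) -> Dict[str, float]:
    """beta_i = c_i / n, with n assumed to be the sentence-wide number of frame elements."""
    if total_elements <= 0:
        raise ValueError("the sentence must contain at least one frame element")
    return {frame: count / total_elements for frame, count in core_counts.items()}


# Core-element counts per frame (Table 2); Table 1 lists seven frame elements in total.
core_counts = {
    "Statement": 2,
    "Sign_agreement": 2,
    "Ordinal_numbers": 1,
    "Possession": 1,
    "Compliance": 1,
}
betas = influence_factors(core_counts, total_elements=7)
```

With a shared denominator, frames with equal core-element counts receive equal factors and frames with more core elements receive larger ones, which matches the two properties stated above.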
In this embodiment, an FEs calculation class is defined to calculate the total number of frame elements in the English sentence S: the frameNum method calculates the number of frames in each sentence, and the FEsNum method calculates the number of frame elements in each frame and accumulates these counts, so that the output is the total number of frame elements in the whole sentence. A CoreFEs calculation class is defined to calculate the proportion of the number of core frame elements covered by each frame to the number of frame elements covered by the frames in the whole sentence S: the CoreFEsNum method calculates the number of core frame elements covered by each frame, and the CoreFEsPer method calculates the proportion using formula (1).
Part of the results calculated in step four is shown in Table 3:
TABLE 3
Step five: a frame influence factor matrix is constructed as follows:
$M = (\beta_i)_{frame\_S \times 1}$
Step six: frame importance is measured from the frame influence factors in the set E. An importance function is defined for each frame in sentence S, and the importance of each frame in the frame information set E is calculated. Each frame is weighted according to the number of core frame elements it covers, and the importance of each frame of sentence S to the sentence is calculated as follows.
The importance of each frame in the frame information set E is initialized. The initialization formula is:
The importance of the frames in the English sentence S is then normalized. The normalized importance of each frame to the sentence is calculated as:
where $e^{\beta_i}$ is the exponential score of $\beta_i$, $\beta_i$ being an element of the frame influence factor matrix; $0 < w(f_{E,i}) \le 1$.
In one embodiment of the invention, a frame weight class is defined to calculate frame importance: the CoreFEsPerall method accumulates the frame influence factors, and the frameWeight method calculates frame importance using formula (2); the output is the importance of each frame to its sentence. The flow for defining the frame importance function is shown in FIG. 3.
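One normalization consistent with the description (an exponential score per influence factor, normalized so that each importance lies in (0, 1]) is a softmax over the influence factors. The Python sketch below implements that reading; it is an assumption, not a reproduction of formula (2), and the function name is hypothetical.

```python
import math
from typing import List


def frame_importance(betas: List[float]) -> List[float]:
    """Softmax over influence factors: w_i = exp(beta_i) / sum_k exp(beta_k)."""
    scores = [math.exp(beta) for beta in betas]  # initialization: exponential score
    total = sum(scores)
    return [score / total for score in scores]   # normalization over the sentence


# Influence factors for five frames, e.g. core-element counts 2, 2, 1, 1, 1 over 7 elements.
weights = frame_importance([2 / 7, 2 / 7, 1 / 7, 1 / 7, 1 / 7])
```

Under this reading the importances of one sentence sum to 1, a single-frame sentence gets importance exactly 1, and frames with equal influence factors get equal importance.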
Step seven: following steps one to six, all frames in the English sentence S' are formed into a frame semantic information set E', and the importance of each frame in E' is calculated.
Step eight: each frame that appears in both the frame semantic information set E and the frame semantic information set E' is taken as a frame group, giving frame_same frame groups; the importance of the two frames in the j-th frame group is compared, and the smaller one is selected as the frame importance $min_j$ of the j-th frame group, j = 1, 2, ..., frame_same; the frame importance of the frame_same frame groups is accumulated, and the similarity of the English sentences S and S' is calculated by the following formula:
where Similarity_score is the similarity between the English sentences S and S'; frame_S' is the total number of frames in the frame semantic information set E'; max() is the maximum function; and Path_score is calculated as follows:
where frame_rel is the number of shortest-path frame pairs. The shortest-path frame pairs are obtained as follows: the frames that also appear in the frame semantic information set E' are removed from the frame semantic information set E to obtain a set E1; the frames that also appear in E are removed from E' to obtain a set E'1; the number of edges each frame in E1 needs to reach any frame in E'1 is obtained through the visualization tool GIFN, in which the semantic relations among some of the frames are as shown in FIG. 4; the two frames that need the fewest edges are taken as a shortest-path frame pair. $path\_value_{i'}$ is expressed as follows:
where CountPath is the number of edges required for one frame to reach the other in the i'-th shortest-path frame pair, and $weight_t$ is the weight of the t-th edge. The weight of each path type in FIG. 4 is shown in Table 4:
TABLE 4
Inter-frame semantic relation (an edge in GIFN)    Path weight
Inherits from                                      0.55
Is Inherited by                                    0.55
Perspective on                                     0.45
Is Perspectivized in                               0.45
Uses                                               0.3
Is Used by                                         0.3
Subframe of                                        0.35
Has Subframe(s)                                    0.35
Precedes                                           0.2
Is Preceded by                                     0.2
Is Inchoative of                                   0.3
Is Causative of                                    0.3
See also                                           0.4
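The weights of Table 4 can be held in a lookup table and applied to the relation types along a shortest path. In the Python sketch below, the relation names follow FrameNet's spelling and the weights are those of Table 4, but the way the edge weights are combined is a placeholder: the mean over the CountPath edges is used purely as an assumption, and both `path_value` and its `aggregate` parameter are hypothetical names.

```python
from typing import Callable, List, Sequence

# Edge weights per inter-frame relation, as listed in Table 4.
RELATION_WEIGHT = {
    "Inherits from": 0.55, "Is Inherited by": 0.55,
    "Perspective on": 0.45, "Is Perspectivized in": 0.45,
    "Uses": 0.3, "Is Used by": 0.3,
    "Subframe of": 0.35, "Has Subframe(s)": 0.35,
    "Precedes": 0.2, "Is Preceded by": 0.2,
    "Is Inchoative of": 0.3, "Is Causative of": 0.3,
    "See also": 0.4,
}


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def path_value(relations: Sequence[str],
               aggregate: Callable[[Sequence[float]], float] = mean) -> float:
    """Combine the weights of the edges on one path; the default (mean) is an assumption."""
    if not relations:
        raise ValueError("a path needs at least one edge")
    weights: List[float] = [RELATION_WEIGHT[r] for r in relations]  # one weight per edge
    return aggregate(weights)


# A hypothetical two-edge path, so CountPath = 2.
value = path_value(["Inherits from", "Uses"])
```

Passing a different `aggregate` (for example a product) changes only how the per-edge weights are combined, so the lookup table stays reusable whichever combination is adopted.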
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
In addition, the specific features described in the above embodiments may be combined in any suitable manner without contradiction. The various possible combinations of the invention are not described in detail in order to avoid unnecessary repetition.

Claims (3)

1. A method for calculating sentence similarity based on frame importance, characterized by comprising the following steps:
step 1: extracting all frames in the English sentence S, and forming a frame semantic information set E from all the frames;
step 2: constructing GIFN, a visualization tool for the frame semantic library FrameNet, and extracting the core frame elements of each frame in the frame semantic information set E through GIFN;
step 3: calculating a frame influence factor for each frame based on the number of core frame elements in that frame; establishing a frame importance function from the frame influence factors to obtain the importance $w(f_{E,i})$ of the i-th frame in the frame semantic information set E, where $f_{E,i}$ denotes the i-th frame in E, i = 1, 2, ..., frame_S, and frame_S is the total number of frames in E;
step 4: forming all frames in the English sentence S' into a frame semantic information set E' according to steps 1-3, and calculating the importance of each frame in E';
step 5: taking each frame that appears in both E and E' as a frame group to obtain frame_same frame groups; comparing the importance of the two frames in the j-th frame group and selecting the smaller one as the frame importance $min_j$ of the j-th frame group, j = 1, 2, ..., frame_same; accumulating the frame importance of the frame_same frame groups, and calculating the similarity of the English sentences S and S' from the accumulated value;
the similarity between the English sentences S and S' is calculated by the following formula:
where Similarity_score is the similarity between the English sentences S and S'; frame_S' is the total number of frames in the frame semantic information set E'; max() is the maximum function; and Path_score is expressed as follows:
where frame_rel is the number of shortest-path frame pairs, and the shortest-path frame pairs are obtained as follows: the frames that also appear in the frame semantic information set E' are removed from the frame semantic information set E to obtain a set E1; the frames that also appear in E are removed from E' to obtain a set E'1; the number of edges each frame in E1 needs to reach any frame in E'1 is obtained through the visualization tool GIFN; the two frames that need the fewest edges are taken as a shortest-path frame pair; $path\_value_{i'}$ is expressed as follows:
where CountPath is the number of edges required for one frame to reach the other in the i'-th shortest-path frame pair, and $weight_t$ is the weight of the t-th edge;
the frame influence factor in step 3 is:
where $c_i$ is the total number of core frame elements in $f_{E,i}$; $n_i$ is the total number of frame elements in $f_{E,i}$; and $\beta_i$ is the frame influence factor of $f_{E,i}$;
the frame importance function in step 3 is:
where $e^{\beta_i}$ is the exponential score of $\beta_i$.
2. The method for calculating sentence similarity based on frame importance according to claim 1, wherein in step 1 the English sentence S is input into the open-source semantic frame extraction tool SEMAFOR, and SEMAFOR parses the input sentence according to the structure of the frame semantic library FrameNet, thereby extracting the frames in the English sentence S.
3. The method for calculating sentence similarity based on frame importance according to claim 1, wherein the GIFN visualization tool for FrameNet in step 2 is constructed as follows: all frames in FrameNet are used as nodes, the semantic relations among frames and the semantic relations between lexical units and frames are used as edges, and the nodes and edges are stored in the graph database Neo4j.
CN202110776700.XA 2021-07-09 2021-07-09 Method for calculating sentence similarity based on frame importance Active CN113536761B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110776700.XA CN113536761B (en) 2021-07-09 2021-07-09 Method for calculating sentence similarity based on frame importance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110776700.XA CN113536761B (en) 2021-07-09 2021-07-09 Method for calculating sentence similarity based on frame importance

Publications (2)

Publication Number Publication Date
CN113536761A (en) 2021-10-22
CN113536761B (en) 2024-01-30

Family

ID=78127260

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110776700.XA Active CN113536761B (en) 2021-07-09 2021-07-09 Method for calculating sentence similarity based on frame importance

Country Status (1)

Country Link
CN (1) CN113536761B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012113422A (en) * 2010-11-22 2012-06-14 Nippon Telegr & Teleph Corp <Ntt> Document processing apparatus, method and program
CN110889292A (en) * 2019-11-29 2020-03-17 福州大学 Text data viewpoint abstract generating method and system based on sentence meaning structure model
CN111324690A (en) * 2020-03-04 2020-06-23 南京航空航天大学 FrameNet-based graphical semantic database processing method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9892111B2 (en) * 2006-10-10 2018-02-13 Abbyy Production Llc Method and device to estimate similarity between documents having multiple segments

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A joint FrameNet and element focusing Sentence-BERT method of sentence similarity computation;Tiexin Wang等;Expert Systems With Applications;第200卷(第117084期);1-11 *
面向英文句子的框架语义扩展及相似度计算;王铁鑫等;小型微型计算机系统;1-8 *

Also Published As

Publication number Publication date
CN113536761A (en) 2021-10-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant