CN114417811A - Similarity calculation method and device based on semantics and storage medium - Google Patents

Info

Publication number
CN114417811A
Authority
CN
China
Prior art keywords
document
matching
calculation
matched
matching degree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111660511.2A
Other languages
Chinese (zh)
Inventor
胡成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiesi Security Technology Co ltd
Original Assignee
Beijing Jiesi Security Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiesi Security Technology Co ltd filed Critical Beijing Jiesi Security Technology Co ltd
Priority to CN202111660511.2A priority Critical patent/CN114417811A/en
Publication of CN114417811A publication Critical patent/CN114417811A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/194Calculation of difference between files
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a semantic-based similarity calculation method, device and storage medium. The method comprises: processing a provided business document to generate a template, where the processing includes segmenting the business document into words and constructing a space vector from the segmented words; setting keywords and key sentences associated with the document semantics for the generated template; processing a document to be matched in the same way as the template was generated, and then performing a matching calculation between the document to be matched and the template to obtain a matching similarity, where the matching calculation includes calculating the word frequency similarity, the weighted keyword matching degree and the weighted key sentence matching degree; and, if the matching similarity reaches a set threshold, determining that the document to be matched is a document requiring specific protection. The beneficial effect is that, in addition to the conventional word frequency similarity calculation, the scheme weights keywords and key sentences associated with the document semantics, which makes the matching result more accurate and reduces misjudgments.

Description

Similarity calculation method and device based on semantics and storage medium
Technical Field
The invention relates to the technical field of text similarity, in particular to a similarity calculation method and device based on semantics and a storage medium.
Background
In the endpoint security industry, it is often necessary to detect whether a user's specific business document has been quoted in other texts. A common approach is to define sensitive words in advance and search the document by string comparison; a document in which a specific sensitive word is matched is considered a sensitive document that needs to be protected.
Although the prior art includes schemes that judge similarity by means of word segmentation vectors, these schemes do not base the judgment on document content and semantics, so the matching result is prone to misjudgment.
Disclosure of Invention
In view of the technical deficiencies of the prior art, embodiments of the present invention provide a semantic-based similarity calculation method, apparatus and storage medium that make the matching result more accurate and thereby reduce misjudgments.
To achieve the above object, in a first aspect, an embodiment of the present invention provides a semantic-based similarity calculation method, the method comprising:
processing a provided business document to generate a template, where the processing includes segmenting the business document into words and constructing a space vector from the segmented words;
setting keywords and key sentences associated with the document semantics for the generated template;
processing a document to be matched in the same way as the template was generated, and then performing a matching calculation between the document to be matched and the template to obtain a matching similarity, where the matching calculation includes a word frequency similarity calculation, a weighted keyword matching degree calculation and a weighted key sentence matching degree calculation;
and if the matching similarity reaches a set threshold, determining that the document to be matched is a document requiring specific protection.
Preferably, during the matching calculation, it is first determined whether the document to be matched is a subset of the business document; if so, the document to be matched is directly determined to be a document requiring specific protection, without further calculation.
Preferably, the weighted keyword matching degree is obtained by the following steps:
first, acquiring the word segmentation list of the business document and the word segmentation list of the document to be matched;
then, taking the length of the longer word segmentation list as the denominator and the number of segmented words in the longest identical run shared by the business document and the document to be matched as the numerator, to obtain the keyword matching degree;
and finally, combining the keyword matching degree with a preset keyword weight value to obtain the weighted keyword matching degree.
Preferably, the weighted key sentence matching degree is obtained by the following steps:
extracting key sentences from the business document and from the document to be matched to form their respective key sentence lists;
taking the length of the longer key sentence list as the denominator and the number of similar key sentences across the two lists as the numerator, to obtain the key sentence matching degree;
and finally, combining the key sentence matching degree with a preset key sentence weight value to obtain the weighted key sentence matching degree.
In a second aspect, an embodiment of the present invention further provides a semantic-based similarity calculation apparatus, comprising:
a template generation module, configured to process a provided business document to generate a template, where the processing includes segmenting the business document into words and constructing a space vector from the segmented words;
a setting module, configured to set keywords and key sentences associated with the document semantics for the generated template;
a to-be-matched document generation module, configured to process a document to be matched in the same way as the template was generated;
a similarity calculation module, configured to:
after the document to be matched has been processed, perform a matching calculation between the document to be matched and the template to obtain a matching similarity, where the matching calculation includes a word frequency similarity calculation, a weighted keyword matching degree calculation and a weighted key sentence matching degree calculation;
if the matching similarity reaches a set threshold, determine that the document to be matched is a document requiring specific protection;
and a return module, configured to display the matching calculation result obtained by the similarity calculation module.
Preferably, during the matching calculation, it is first determined whether the document to be matched is a subset of the business document; if so, the document to be matched is directly determined to be a document requiring specific protection, without further calculation.
Preferably, the weighted keyword matching degree is obtained by the following steps:
first, acquiring the word segmentation list of the business document and the word segmentation list of the document to be matched;
then, taking the length of the longer word segmentation list as the denominator and the number of segmented words in the longest identical run shared by the business document and the document to be matched as the numerator, to obtain the keyword matching degree;
and finally, combining the keyword matching degree with a preset keyword weight value to obtain the weighted keyword matching degree.
Preferably, the weighted key sentence matching degree is obtained by the following steps:
extracting key sentences from the business document and from the document to be matched to form their respective key sentence lists;
taking the length of the longer key sentence list as the denominator and the number of similar key sentences across the two lists as the numerator, to obtain the key sentence matching degree;
and finally, combining the key sentence matching degree with a preset key sentence weight value to obtain the weighted key sentence matching degree.
In a third aspect, the present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method provided in the first aspect.
In the embodiments of the invention, a business document provided as material is processed to generate a template, and the associated keywords and key sentences are set; the document to be matched is then processed in the same way as the template, after which the word frequency similarity, the weighted keyword matching degree and the weighted key sentence matching degree are calculated against the template. In addition to the conventional word frequency similarity calculation, the scheme weights keywords and key sentences associated with the document semantics, which makes the matching result more accurate and reduces misjudgments.
Drawings
In order to illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed for the detailed description or the prior art are briefly described below.
Fig. 1 is a flowchart of a semantic-based similarity calculation method according to an embodiment of the present invention;
Fig. 2 is a block diagram of a semantic-based similarity calculation apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to Fig. 1, a semantic-based similarity calculation method according to an embodiment of the present invention includes:
S101, processing the provided business document to generate a template; the processing includes segmenting the business document into words and constructing a space vector from the segmented words.
Specifically, the business document is a material document provided for a business scenario in the user's actual application, for example, a marketing strategy or planning report within an enterprise that involves confidential content.
The material document is learned by extracting the document content, performing word segmentation, removing stop words, calculating word frequencies, constructing the word frequency vector and so on, thereby generating the template.
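The template generation of S101 can be sketched roughly as follows. This is only an illustration under stated assumptions: the regular-expression tokenizer stands in for a real word segmenter (which the patent does not name), and `STOP_WORDS`, `segment` and `build_template` are hypothetical names introduced here.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real deployment would load a complete one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for"}

def segment(text):
    """Split text into lowercase word tokens (stand-in for a real word segmenter)."""
    return re.findall(r"\w+", text.lower())

def build_template(text):
    """Return the token list with stop words removed and its word frequency vector."""
    tokens = [t for t in segment(text) if t not in STOP_WORDS]
    return {"tokens": tokens, "tf": Counter(tokens)}

template = build_template(
    "The marketing plan for the new product and the marketing budget"
)
print(template["tokens"])
print(template["tf"])
```

The same `build_template` function would be applied to the document to be matched, so that both sides are represented in the same way.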
S102, setting keywords and key sentences associated with the document semantics for the generated template.
Specifically, the keywords can be set in two ways: some are obtained according to word frequency, and the others are pre-marked according to the type of the business document. A key sentence is composed of a plurality of keywords.
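One possible reading of the two setting modes is sketched below. The cut-off `top_n`, the rule that a key sentence must contain at least `min_keywords` keywords, and the function names are all assumptions made for illustration; the patent does not fix these details.

```python
import re
from collections import Counter

def select_keywords(tf, top_n=3, pre_marked=()):
    """Union of the top_n most frequent words and any pre-marked keywords."""
    by_frequency = [word for word, _ in tf.most_common(top_n)]
    return set(by_frequency) | set(pre_marked)

def select_key_sentences(sentences, keywords, min_keywords=2):
    """Keep sentences that contain at least min_keywords distinct keywords."""
    selected = []
    for sentence in sentences:
        words = set(re.findall(r"\w+", sentence.lower()))
        if len(words & keywords) >= min_keywords:
            selected.append(sentence)
    return selected

tf = Counter({"budget": 4, "marketing": 3, "plan": 1})
keywords = select_keywords(tf, top_n=2, pre_marked=["confidential"])
sentences = ["The marketing budget is confidential.", "See the plan."]
print(sorted(keywords))
print(select_key_sentences(sentences, keywords))
```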
S103, processing the document to be matched in the same way as the template was generated, and then performing a matching calculation between the document to be matched and the template to obtain a matching similarity; the matching calculation includes a word frequency similarity calculation, a weighted keyword matching degree calculation and a weighted key sentence matching degree calculation.
Specifically, matching similarity = word frequency space vector similarity × word frequency weight + content keyword matching degree × keyword weight + content key sentence matching degree × key sentence weight. The weight values may be set together with the keywords and key sentences during the setting process.
The calculation of the word frequency similarity is mature prior art and is not described in detail here.
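The patent does not say which word frequency similarity measure is used. Cosine similarity between word frequency vectors is a common choice, so the sketch below assumes it; `cosine_similarity` is a name introduced here, not one taken from the patent.

```python
import math
from collections import Counter

def cosine_similarity(tf_a, tf_b):
    """Cosine of the angle between two word frequency vectors (dict-like)."""
    dot = sum(count * tf_b.get(word, 0) for word, count in tf_a.items())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as sharing nothing
    return dot / (norm_a * norm_b)

template_tf = Counter(["marketing", "budget", "marketing", "plan"])
candidate_tf = Counter(["marketing", "budget", "forecast"])
print(cosine_similarity(template_tf, candidate_tf))
```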
The weighted keyword matching degree is obtained through the following steps:
first, acquiring the word segmentation list of the business document and the word segmentation list of the document to be matched;
then, taking the length of the longer word segmentation list as the denominator and the number of segmented words in the longest identical run shared by the business document and the document to be matched as the numerator, to obtain the keyword matching degree; that is, a matching run may occur at any position in the document, but the order of the segmented words within it is constrained, which reduces the cases in which a conventional similarity measure reports a match even though the content differs greatly from the original business document;
finally, combining the keyword matching degree with a preset keyword weight value to obtain the weighted keyword matching degree.
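The keyword matching degree can be sketched as a longest-common-contiguous-run computation over the two word segmentation lists. The dynamic-programming approach and the function names are assumptions for illustration; the sample data mirrors the 15-word and 21-word lists used in the worked example later in this description.

```python
def longest_common_run(a, b):
    """Length of the longest contiguous run of tokens shared by lists a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for token_a in a:
        curr = [0] * (len(b) + 1)
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                curr[j] = prev[j - 1] + 1  # extend the run ending at the previous pair
                best = max(best, curr[j])
        prev = curr
    return best

def keyword_match_degree(doc_tokens, cand_tokens):
    """Longest shared run divided by the length of the longer list."""
    longer = max(len(doc_tokens), len(cand_tokens))
    if longer == 0:
        return 0.0
    return longest_common_run(doc_tokens, cand_tokens) / longer

# A has 15 tokens, B has 21; A6..A10 are made identical to B5..B9.
a = [f"a{i}" for i in range(1, 16)]
b = [f"b{i}" for i in range(1, 22)]
b[4:9] = a[5:10]
keyword_weight = 0.3  # weight value taken from the worked example
print(keyword_match_degree(a, b) * keyword_weight)
```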
Similarly, the weighted key sentence matching degree is obtained through the following steps:
extracting key sentences from the business document and from the document to be matched to form their respective key sentence lists;
taking the length of the longer key sentence list as the denominator and the number of similar key sentences across the two lists as the numerator, to obtain the key sentence matching degree;
finally, combining the key sentence matching degree with a preset key sentence weight value to obtain the weighted key sentence matching degree.
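The key sentence steps can be sketched in the same spirit. The patent does not define when two key sentences count as "similar"; the sketch below assumes a character-level ratio from Python's standard `difflib.SequenceMatcher` with a hypothetical cut-off `min_ratio`, and counts each business-document sentence that has at least one similar counterpart.

```python
from difflib import SequenceMatcher

def is_similar(s1, s2, min_ratio=0.8):
    """Assumed similarity test: difflib ratio at or above min_ratio."""
    return SequenceMatcher(None, s1, s2).ratio() >= min_ratio

def key_sentence_match_degree(doc_sentences, cand_sentences, min_ratio=0.8):
    """Number of similar key sentences divided by the length of the longer list."""
    longer = max(len(doc_sentences), len(cand_sentences))
    if longer == 0:
        return 0.0
    similar = sum(
        1
        for s in doc_sentences
        if any(is_similar(s, c, min_ratio) for c in cand_sentences)
    )
    return similar / longer

doc = ["the budget is confidential", "launch in march"]
cand = ["the budget is confidential", "unrelated text", "zzz"]
key_sentence_weight = 0.2  # weight value taken from the worked example
print(key_sentence_match_degree(doc, cand) * key_sentence_weight)
```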
It should be noted that, in this embodiment, "keyword matching degree" and "content keyword matching degree" have the same meaning, as do "key sentence matching degree" and "content key sentence matching degree".
S104, if the matching similarity reaches a set threshold, the document to be matched is determined to be a document requiring specific protection.
Specifically, the threshold may be adjusted according to the business type and is not limited here.
To facilitate a better understanding of the invention, a specific example is described.
Set the word frequency weight to 0.5, the keyword weight to 0.3 and the key sentence weight to 0.2.
1. Content keyword matching degree calculation
Business document A and comparison article B are each segmented into words, with the following results:
Word segmentation list of business document A: A1, A2 … … A15
Word segmentation list of comparison article B: B1, B2 … … B21
The length of the longer word segmentation list (here, the list of B, with 21 entries) is taken as the denominator. The longest identical run of segmented words shared by business document A and comparison article B is then found (assume that A6, A7, A8, A9 and A10 are identical to B5, B6, B7, B8 and B9), so the position-matching similarity is 5/21.
2. Content key sentence matching degree calculation
Key sentences are extracted from business document A and comparison article B respectively, with the following results:
Key sentence list of business document A: W1, W2
Key sentence list of comparison article B: C1, C2.
The length of the longer key sentence list is taken as the denominator, and the numerator is incremented by 1 for each similar key sentence; assuming that 5 sentences are similar, the key sentence matching degree is 5/9, where 9 is the length of the longer list.
The matching similarity is then (word frequency similarity) × 0.5 + (5/21) × 0.3 + (5/9) × 0.2.
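The weighted combination and the threshold decision of S104 can be sketched as below. The weights and the two matching degrees come from the example above; the word frequency similarity of 0.8 and the threshold of 0.5 are made-up values chosen only so the calculation can run, since the example does not state them.

```python
def matching_similarity(tf_similarity, keyword_degree, key_sentence_degree,
                        w_tf=0.5, w_kw=0.3, w_ks=0.2):
    """Weighted sum of the three partial scores."""
    return (tf_similarity * w_tf
            + keyword_degree * w_kw
            + key_sentence_degree * w_ks)

def needs_protection(score, threshold):
    """S104: the document is protected when the score reaches the threshold."""
    return score >= threshold

tf_similarity = 0.8   # hypothetical value, not given in the example
threshold = 0.5       # hypothetical value, adjustable per business type
score = matching_similarity(tf_similarity, 5 / 21, 5 / 9)
print(score, needs_protection(score, threshold))
```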
In another embodiment, to further improve processing efficiency, the method further comprises:
when performing the matching calculation on the document to be matched, first determining whether the document to be matched is a subset of the business document; if so, no calculation is needed, and the document to be matched is directly determined to be a document requiring specific protection.
This makes it convenient to handle cases in which the business document selected as material and the article to be compared differ greatly; the subset check avoids the corresponding matching calculation and thereby further improves efficiency.
For example, if A is completely contained in B, or B is completely contained in A, the content keyword matching degree and the content key sentence matching degree are not calculated.
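A minimal sketch of this shortcut follows. It assumes that "completely contained" means one word segmentation list appears as a contiguous run inside the other; the patent does not spell out the containment test, and the function names are hypothetical.

```python
def contains_run(haystack, needle):
    """True if needle occurs as a contiguous run of tokens inside haystack."""
    n = len(needle)
    if n == 0:
        return False  # an empty document is not treated as contained
    return any(haystack[i:i + n] == needle
               for i in range(len(haystack) - n + 1))

def is_contained(doc_tokens, cand_tokens):
    """True if either token list is completely contained in the other."""
    return (contains_run(doc_tokens, cand_tokens)
            or contains_run(cand_tokens, doc_tokens))

business_doc = ["q3", "marketing", "budget", "is", "confidential"]
candidate = ["marketing", "budget"]
if is_contained(business_doc, candidate):
    print("protected without further matching calculation")
```

Running this check before the three partial scores keeps the more expensive longest-run and sentence comparisons off the common path where one text is simply an excerpt of the other.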
In this technical scheme, a business document provided as material is processed to generate a template, and the associated keywords and key sentences are set; the document to be matched is then processed in the same way as the template, after which the word frequency similarity, the weighted keyword matching degree and the weighted key sentence matching degree are calculated against the template. In addition to the conventional word frequency similarity calculation, the scheme weights keywords and key sentences associated with the document semantics, which makes the matching result more accurate and reduces misjudgments.
Based on the same inventive concept, an embodiment of the present invention provides a semantic-based similarity calculation apparatus which, as shown in Fig. 2, includes a template generation module 1, a setting module 2, a to-be-matched document generation module 3, a similarity calculation module 4 and a return module 5.
The template generation module 1 is configured to process a provided business document to generate a template; the processing includes segmenting the business document into words and constructing a space vector from the segmented words;
the setting module 2 is configured to set keywords and key sentences associated with the document semantics for the generated template;
the to-be-matched document generation module 3 is configured to process the document to be matched in the same way as the template was generated;
the similarity calculation module 4 is configured to:
after the document to be matched has been processed, perform a matching calculation between the document to be matched and the template to obtain a matching similarity, where the matching calculation includes a word frequency similarity calculation, a weighted keyword matching degree calculation and a weighted key sentence matching degree calculation;
if the matching similarity reaches a set threshold, determine that the document to be matched is a document requiring specific protection;
and the return module 5 is configured to display the matching calculation result obtained by the similarity calculation module.
In application, the weighted keyword matching degree is obtained through the following steps:
first, acquiring the word segmentation list of the business document and the word segmentation list of the document to be matched;
then, taking the length of the longer word segmentation list as the denominator and the number of segmented words in the longest identical run shared by the business document and the document to be matched as the numerator, to obtain the keyword matching degree;
finally, combining the keyword matching degree with a preset keyword weight value to obtain the weighted keyword matching degree.
Similarly, the weighted key sentence matching degree is obtained through the following steps:
extracting key sentences from the business document and from the document to be matched to form their respective key sentence lists;
taking the length of the longer key sentence list as the denominator and the number of similar key sentences across the two lists as the numerator, to obtain the key sentence matching degree;
and finally, combining the key sentence matching degree with a preset key sentence weight value to obtain the weighted key sentence matching degree.
Further, to improve processing efficiency, during the matching calculation it is first determined whether the document to be matched is a subset of the business document; if so, the document to be matched is directly determined to be a document requiring specific protection, without further calculation.
It should be noted that, for a more specific workflow of the similarity calculation apparatus, please refer to the foregoing method embodiment, which is not described herein again.
This scheme overcomes the defect that existing similarity matching algorithms mainly judge similarity by means of word segmentation vectors and do not base the judgment on document content and semantics. In addition to the conventional word frequency space vector similarity, the scheme adds weighting of content keywords and content key sentences, so that the matching result is more accurate and misjudgments are reduced.
In this embodiment, a computer-readable storage medium is further provided, where a computer program is stored, and when executed by a processor, the computer program causes the processor to execute the steps of the embodiment of the semantic-based similarity calculation method.
Specifically, the computer-readable storage medium may include cache, high-speed random access memory (RAM) such as the common double data rate synchronous dynamic random access memory (DDR SDRAM), and may also include non-volatile memory (NVRAM), such as one or more read-only memories (ROM), disk storage devices, flash memory devices or other non-volatile solid-state memory devices, as well as optical discs (CD-ROM, DVD-ROM), floppy disks, data tapes and so on.
Those of ordinary skill in the art will appreciate that the various illustrative modules and steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention.

Claims (9)

1. A semantic-based similarity calculation method, comprising:
processing a provided business document to generate a template, where the processing includes segmenting the business document into words and constructing a space vector from the segmented words;
setting keywords and key sentences associated with the document semantics for the generated template;
processing a document to be matched in the same way as the template was generated, and then performing a matching calculation between the document to be matched and the template to obtain a matching similarity, where the matching calculation includes a word frequency similarity calculation, a weighted keyword matching degree calculation and a weighted key sentence matching degree calculation;
and if the matching similarity reaches a set threshold, determining that the document to be matched is a document requiring specific protection.
2. The semantic-based similarity calculation method according to claim 1, wherein during the matching calculation it is first determined whether the document to be matched is a subset of the business document, and if so, the document to be matched is directly determined to be a document requiring specific protection, without further calculation.
3. The semantic-based similarity calculation method according to claim 1 or 2, wherein the weighted keyword matching degree is obtained by the following steps:
first, acquiring the word segmentation list of the business document and the word segmentation list of the document to be matched;
then, taking the length of the longer word segmentation list as the denominator and the number of segmented words in the longest identical run shared by the business document and the document to be matched as the numerator, to obtain the keyword matching degree;
and finally, combining the keyword matching degree with a preset keyword weight value to obtain the weighted keyword matching degree.
4. The semantic-based similarity calculation method according to claim 3, wherein the weighted key sentence matching degree is obtained by the following steps:
extracting key sentences from the business document and from the document to be matched to form their respective key sentence lists;
taking the length of the longer key sentence list as the denominator and the number of similar key sentences across the two lists as the numerator, to obtain the key sentence matching degree;
and finally, combining the key sentence matching degree with a preset key sentence weight value to obtain the weighted key sentence matching degree.
5. A semantic-based similarity calculation apparatus, comprising:
a template generation module, configured to process a provided business document to generate a template, where the processing includes segmenting the business document into words and constructing a space vector from the segmented words;
a setting module, configured to set keywords and key sentences associated with the document semantics for the generated template;
a to-be-matched document generation module, configured to process a document to be matched in the same way as the template was generated;
a similarity calculation module, configured to:
after the document to be matched has been processed, perform a matching calculation between the document to be matched and the template to obtain a matching similarity, where the matching calculation includes a word frequency similarity calculation, a weighted keyword matching degree calculation and a weighted key sentence matching degree calculation;
if the matching similarity reaches a set threshold, determine that the document to be matched is a document requiring specific protection;
and a return module, configured to display the matching calculation result obtained by the similarity calculation module.
6. The semantic-based similarity calculation apparatus according to claim 5, wherein during the matching calculation it is first determined whether the document to be matched is a subset of the business document, and if so, the document to be matched is directly determined to be a document requiring specific protection, without further calculation.
7. The semantic-based similarity calculation apparatus according to claim 5 or 6, wherein the weighted keyword matching degree is obtained by the following steps:
first, acquiring the word segmentation list of the business document and the word segmentation list of the document to be matched;
then, taking the length of the longer word segmentation list as the denominator and the number of segmented words in the longest identical run shared by the business document and the document to be matched as the numerator, to obtain the keyword matching degree;
and finally, combining the keyword matching degree with a preset keyword weight value to obtain the weighted keyword matching degree.
8. The semantic-based similarity calculation apparatus according to claim 7, wherein the weighted key sentence matching degree is obtained by the following steps:
extracting key sentences from the business document and from the document to be matched to form their respective key sentence lists;
taking the length of the longer key sentence list as the denominator and the number of similar key sentences across the two lists as the numerator, to obtain the key sentence matching degree;
and finally, combining the key sentence matching degree with a preset key sentence weight value to obtain the weighted key sentence matching degree.
9. A computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, carries out the steps of a semantic-based similarity calculation method according to any one of claims 1 to 4.
CN202111660511.2A 2021-12-30 2021-12-30 Similarity calculation method and device based on semantics and storage medium Pending CN114417811A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111660511.2A CN114417811A (en) 2021-12-30 2021-12-30 Similarity calculation method and device based on semantics and storage medium

Publications (1)

Publication Number Publication Date
CN114417811A true CN114417811A (en) 2022-04-29

Family

ID=81271546

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111660511.2A Pending CN114417811A (en) 2021-12-30 2021-12-30 Similarity calculation method and device based on semantics and storage medium

Country Status (1)

Country Link
CN (1) CN114417811A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20170090678A (en) * 2016-01-29 2017-08-08 (주) 다이퀘스트 Apparatus for extracting scene keywords from video contents and keyword weighting factor calculation apparatus
CN108090077A (en) * 2016-11-23 2018-05-29 中国科学院沈阳计算技术研究所有限公司 A kind of comprehensive similarity computational methods based on natural language searching
CN110929022A (en) * 2018-09-18 2020-03-27 阿基米德(上海)传媒有限公司 Text abstract generation method and system
CN113377927A (en) * 2021-06-28 2021-09-10 成都卫士通信息产业股份有限公司 Similar document detection method and device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Xu Chi; Chen Lirong: "Automatic text summarization algorithm based on TextRank and GloVe", China New Telecommunications (中国新通信), no. 09, 5 May 2019 (2019-05-05) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117829140A (en) * 2024-03-04 2024-04-05 证通股份有限公司 Automatic comparison method and system for regulations and regulations
CN117829140B (en) * 2024-03-04 2024-05-31 证通股份有限公司 Automatic comparison method and system for regulations and regulations
CN118468321A (en) * 2024-07-11 2024-08-09 山东圣剑医学研究有限公司 Basic research data encryption storage method based on block chain technology

Similar Documents

Publication Publication Date Title
US9460117B2 (en) Image searching
US8000504B2 (en) Multimodal classification of adult content
CN109635082B (en) Policy influence analysis method, device, computer equipment and storage medium
CN114417811A (en) Similarity calculation method and device based on semantics and storage medium
Laber et al. Shallow decision trees for explainable k-means clustering
CN102799647A (en) Method and device for webpage reduplication deletion
WO2011035210A2 (en) Method and system for scoring texts
US20130339369A1 (en) Search Method and Apparatus
CN110298024B (en) Method and device for detecting confidential documents and storage medium
WO2022116419A1 (en) Automatic determination method and apparatus for domain name infringement, electronic device, and storage medium
CN113807073B (en) Text content anomaly detection method, device and storage medium
CN110909540A (en) Method and device for identifying new words of short message spam and electronic equipment
CN114547257B (en) Class matching method and device, computer equipment and storage medium
CN112434158A (en) Enterprise label acquisition method and device, storage medium and computer equipment
CN111177372A (en) Scientific and technological achievement classification method, device, equipment and medium
Oliveira et al. A concept-based integer linear programming approach for single-document summarization
CN109918661B (en) Synonym acquisition method and device
CN111061924A (en) Phrase extraction method, device, equipment and storage medium
CN113515627B (en) Document detection method, device, equipment and storage medium
CN116166814A (en) Event detection method, device, equipment and storage medium
CN113704398B (en) Keyword extraction method and equipment
CN110795537B (en) Method, device, equipment and medium for determining improvement strategy of target commodity
CN107066623A (en) A kind of article merging method and device
CN113191777A (en) Risk identification method and device
McKelvey et al. Aligning entity names with online aliases on twitter

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination