CN114398534A - Event cluster text retrieval system - Google Patents

Event cluster text retrieval system

Info

Publication number
CN114398534A
CN114398534A CN202210001964.2A CN202210001964A CN114398534A CN 114398534 A CN114398534 A CN 114398534A CN 202210001964 A CN202210001964 A CN 202210001964A CN 114398534 A CN114398534 A CN 114398534A
Authority
CN
China
Prior art keywords
event
text
calculation method
text vector
similarity calculation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210001964.2A
Other languages
Chinese (zh)
Other versions
CN114398534B (en)
Inventor
刘彬
王玉娟
王锐
王震
倪晔玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Posts & Telecommunications Designing Consulting Institute Co ltd
Original Assignee
Shanghai Posts & Telecommunications Designing Consulting Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-01-05
Filing date
2022-01-04
Publication date
2022-04-26

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an event clustering text retrieval system comprising a processor, a memory storing a computer program, a crawling database, and a display interface. The crawling database stores event text vectors and corresponding event word segmentation weights obtained by performing word segmentation on event texts, and associated text vectors and corresponding associated word segmentation weights obtained by performing word segmentation on the associated texts related to each event text. The processor calculates the similarity between any event text vector and each corresponding associated text vector, choosing among different similarity calculation formulas, and presents the corresponding associated texts on the display interface in descending order of similarity. The invention can improve acquisition efficiency and the pertinence of the presented texts.

Description

Event cluster text retrieval system
Technical Field
The invention relates to the field of physics, in particular to an information processing technology, and specifically relates to an event clustering text retrieval system.
Background
On the internet, an event is typically reported in one or more texts, and a user interested in the event would like to read all the texts relating to it. At present, a user who wants to learn about an event searches the internet by keyword, and the texts related to the event are presented in chronological order. Texts ordered by time, however, are neither unified in topic nor well targeted, which makes them of limited use to the user. There is therefore a need to cluster such texts so that the texts presented share a unified topic and are highly pertinent.
Disclosure of Invention
To address the above technical problem, the invention provides an event clustering text retrieval system that can provide, for a given event, related texts that share a unified topic and are highly pertinent.
The technical scheme adopted by the invention is as follows:
The invention provides an event clustering text retrieval system that is deployed in the cloud and processes a plurality of event texts concurrently. The system comprises a processor, a memory storing a computer program, a crawling database, and a display interface;
the crawling database stores event text vectors obtained by performing word segmentation on event texts, together with the corresponding event participle weights, and associated text vectors obtained by performing word segmentation on the M associated texts related to each event text, together with the corresponding associated participle weights; wherein any event text vector is $E = (e_1, e_2, \ldots, e_m)$ with corresponding event participle weights $WE = (we_1, we_2, \ldots, we_m)$, where $e_i$ is the i-th participle in the event text vector $E$, $we_i$ is the weight of participle $e_i$, $m$ is the number of participles in $E$, and $i$ ranges from 1 to $m$; any associated text vector is $P_j = (p_{j1}, p_{j2}, \ldots, p_{jn})$ with corresponding associated participle weights $WP_j = (wp_{j1}, wp_{j2}, \ldots, wp_{jn})$, where $p_{jt}$ is the t-th participle in the associated text vector $P_j$, $wp_{jt}$ is the weight of participle $p_{jt}$, $jn$ is the number of participles in $P_j$, $j$ ranges from 1 to $M$, and $t$ ranges from 1 to $n$;
for the associated text vector $P_j$ corresponding to any event text vector $E$, the processor executes the computer program to implement the following steps:
obtaining the intersection $E \cap P_j = (b_1, b_2, \ldots, b_V)$, where $V$ is the number of participles in $E \cap P_j$;
comparing the number $m$ of participles in the event text vector $E$ with a preset first threshold D1;
selecting, based on the comparison result, a preset similarity calculation method to calculate the similarity between the event text vector $E$ and the associated text vector $P_j$; the preset similarity calculation methods include a first similarity calculation method
Figure BDA0003455000700000021
And second similarity calculation method
Figure BDA0003455000700000022
where $W1_k$ is the weight of participle $b_k$ in the event text vector $E$, and $W2_k$ is the weight of participle $b_k$ in the associated text vector $P_j$;
and traversing the M associated text vectors, and presenting the associated texts on the display interface in descending order of similarity.
In the event clustering text retrieval system provided by the embodiment of the invention, the similarity calculation method used for an event text and an associated text is selected according to the number of participles in the event text vector. This improves acquisition efficiency and saves server computing resources while preserving the accuracy of the similarity obtained. In addition, the associated texts are presented in descending order of similarity, so the presented texts are pertinent and the user experience can be improved.
Detailed Description
In order to make the technical problems, technical solutions and advantages to be solved by the present invention clearer, the following detailed description is given with reference to specific embodiments.
Some flows described in the specification and claims of this invention include a number of operations that appear in a particular order. It should be clearly understood that these operations may be performed out of the order in which they appear herein, or in parallel; the operation numbers, such as 101 and 102, serve merely to distinguish the operations and do not by themselves represent any order of execution. In addition, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel.
The technical solutions in the embodiments of the present invention are clearly and completely described below, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides an event clustering text retrieval system that is deployed in the cloud, specifically on a cloud server, and processes a plurality of event texts concurrently. The system comprises a processor, a memory storing a computer program, a crawling database, and a display interface.
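As an illustrative sketch only (the specification does not state how concurrency is achieved), concurrent processing of several event texts could be arranged with a thread pool. The names `process_event` and `process_all`, the event identifiers, and the worker count are assumptions introduced for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def process_event(event_id):
    """Placeholder for the per-event work (similarity calculation and ranking)."""
    return event_id, f"ranked results for {event_id}"


def process_all(event_ids, max_workers=4):
    """Process several event texts concurrently and collect results by event id."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(process_event, event_ids))


results = process_all(["event-1", "event-2", "event-3"])
```

A thread pool is only one possible arrangement; a process pool or a distributed task queue would serve the same purpose on a cloud server.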
In the embodiment of the invention, the crawling database stores the event text vectors obtained by performing word segmentation on event texts together with the corresponding event participle weights, and the associated text vectors obtained by performing word segmentation on the M associated texts related to each event text together with the corresponding associated participle weights. Any event text vector is $E = (e_1, e_2, \ldots, e_m)$ with corresponding event participle weights $WE = (we_1, we_2, \ldots, we_m)$, where $e_i$ is the i-th participle in the event text vector $E$, $we_i$ is the weight of participle $e_i$, $m$ is the number of participles in $E$, and $i$ ranges from 1 to $m$. Existing word segmentation techniques can be used to segment the event text, and the weight of each participle may be determined based on the prior art. Preferably, $we_i$ is the number of occurrences of participle $e_i$ in the event text.
In addition, any associated text vector is $P_j = (p_{j1}, p_{j2}, \ldots, p_{jn})$ with corresponding associated participle weights $WP_j = (wp_{j1}, wp_{j2}, \ldots, wp_{jn})$, where $p_{jt}$ is the t-th participle in the associated text vector $P_j$, $wp_{jt}$ is the weight of participle $p_{jt}$, $jn$ is the number of participles in $P_j$, $j$ ranges from 1 to $M$, and $t$ ranges from 1 to $n$. In an embodiment of the present invention, the associated texts are texts obtained according to the event text; they may be obtained in any manner known in the prior art, for example the method used in the Royal Flush software to gather reports related to a listed company, or any other prior-art clustering method. Existing word segmentation techniques can be used to segment the associated texts, and the weight of each participle may be determined based on the prior art. Preferably, $wp_{jt}$ is the number of occurrences of participle $p_{jt}$ in the associated text.
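The preferred weighting, in which each weight is the number of occurrences of the participle in its text, can be sketched as follows. This is a minimal sketch under two assumptions: the text has already been segmented into a list of tokens (the segmenter itself is outside the example), and the vector holds each distinct participle once. The function name `build_vector` and the sample tokens are hypothetical.

```python
from collections import Counter


def build_vector(tokens):
    """Return (vector, weights) for a pre-segmented text.

    vector:  distinct participles in first-seen order
    weights: number of occurrences of each participle in the text
    """
    counts = Counter(tokens)
    vector = list(dict.fromkeys(tokens))  # de-duplicate, keep first-seen order
    weights = [counts[token] for token in vector]
    return vector, weights


# Hypothetical pre-segmented event text
E, WE = build_vector(["flood", "city", "flood", "rescue"])
```

The same function would produce $P_j$ and $WP_j$ from a segmented associated text.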
In the embodiment of the invention, the processor calculates the similarity between any event text vector and each corresponding associated text vector, and presents the corresponding associated texts on the display interface in descending order of similarity. The specific functions executed by the processor are described below by way of embodiments 1 to 5.
(Example 1)
In this embodiment, for the associated text vector $P_j$ corresponding to any event text vector $E$, the processor executes the computer program to implement the following steps:
S101, obtaining the intersection of the event text vector $E$ and the associated text vector $P_j$, $E \cap P_j = (b_1, b_2, \ldots, b_V)$, where $V$ is the number of participles in $E \cap P_j$.
S102, obtaining
Figure BDA0003455000700000041
where $W1_k$ is the weight of participle $b_k$ in the event text vector $E$, and $W2_k$ is the weight of participle $b_k$ in the associated text vector $P_j$;
S103, traversing the M associated text vectors, and presenting the associated texts on the display interface in descending order of similarity.
In this embodiment, the associated texts are presented in descending order of similarity, so the presented associated texts are more pertinent than in the prior art.
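The flow of steps S101 to S103 can be sketched as below. Because the formula of step S102 is available here only as an image reference, the score used in the sketch, the sum of $W1_k \cdot W2_k$ over the intersection, is a placeholder chosen for illustration and is not the formula of the invention. The function names and sample data are likewise hypothetical.

```python
def intersection(E, P):
    """Participles common to E and P, in the order they appear in E (S101)."""
    in_P = set(P)
    return [e for e in E if e in in_P]


def placeholder_similarity(E, WE, P, WP):
    """Stand-in score, NOT the patent's formula: sum of W1_k * W2_k over b_k."""
    w1 = dict(zip(E, WE))  # weight of each participle in the event text
    w2 = dict(zip(P, WP))  # weight of each participle in the associated text
    return sum(w1[b] * w2[b] for b in intersection(E, P))


def rank_associated(E, WE, associated, similarity=placeholder_similarity):
    """associated: list of (text_id, P_j, WP_j) tuples.

    Returns (text_id, score) pairs in descending order of similarity (S103).
    """
    scored = [(tid, similarity(E, WE, P, WP)) for tid, P, WP in associated]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


E, WE = ["flood", "city", "rescue"], [2, 1, 1]
associated = [
    ("a", ["flood", "rain"], [1, 3]),
    ("b", ["flood", "rescue", "city"], [2, 1, 1]),
    ("c", ["election"], [4]),
]
ranking = rank_associated(E, WE, associated)
```

Passing the similarity as a parameter keeps the ranking step independent of whichever formula is actually used.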
(Example 2)
In this embodiment, for the associated text vector $P_j$ corresponding to any event text vector $E$, the processor executes the computer program to implement the following steps:
S201, obtaining the intersection $E \cap P_j = (b_1, b_2, \ldots, b_V)$, where $V$ is the number of participles in $E \cap P_j$.
S202, comparing the number $m$ of participles in the event text vector $E$ with a preset first threshold D1.
S203, selecting, based on the comparison result, a preset similarity calculation method to calculate the similarity between the event text vector $E$ and the associated text vector $P_j$; the preset similarity calculation methods include a first similarity calculation method
Figure BDA0003455000700000042
And second similarity calculation method
Figure BDA0003455000700000043
where $W1_k$ is the weight of participle $b_k$ in the event text vector $E$, and $W2_k$ is the weight of participle $b_k$ in the associated text vector $P_j$; step S203 specifically comprises:
S2031, if $m > D1$, selecting the first similarity calculation method to calculate
Figure BDA0003455000700000044
S2032, if m is less than or equal to D1, selecting the second similarity calculation method to calculate
Figure BDA0003455000700000045
S204, traversing the M associated text vectors, and presenting the associated texts on the display interface in a similarity descending manner.
Compared with Example 1, Example 2 selects the similarity calculation method according to the number of participles in the event text. When that number is greater than the preset first threshold, the similarity is calculated from the weights of the participles in the intersection of the event text vector and the associated text vector. When it does not exceed the preset first threshold, the similarity is calculated directly from three counts: the number of participles in the intersection, the number of participles in the event text vector, and the number of participles in the associated text vector. This improves acquisition efficiency and saves server computing resources while preserving the accuracy of the similarity obtained.
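The selection in steps S2031 and S2032 amounts to a threshold dispatch, sketched below. The two similarity formulas are available here only as image references, so they are passed in as callables; the parameter names `first` and `second`, the function name, and the stand-in callables are assumptions for this example. The default `d1=50` follows the preferred value of D1 given in the description.

```python
def similarity_example2(E, WE, P, WP, first, second, d1=50):
    """Dispatch per S2031/S2032: first method if m > D1, otherwise second."""
    m = len(E)  # number of participles in the event text vector
    method = first if m > d1 else second
    return method(E, WE, P, WP)


# Hypothetical stand-ins that only report which branch was taken
first = lambda E, WE, P, WP: "first"
second = lambda E, WE, P, WP: "second"

short_E = ["flood", "city"]
long_E = [f"w{i}" for i in range(51)]
branch_short = similarity_example2(short_E, [1, 1], ["flood"], [1], first, second)
branch_long = similarity_example2(long_E, [1] * 51, ["w0"], [1], first, second)
```

Note that the boundary case $m = D1$ falls to the second method, matching "less than or equal to D1" in S2032.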
(Example 3)
In this embodiment, for the associated text vector $P_j$ corresponding to any event text vector $E$, the processor executes the computer program to implement the following steps:
S301, obtaining the intersection $E \cap P_j = (b_1, b_2, \ldots, b_V)$, where $V$ is the number of participles in $E \cap P_j$.
S302, obtaining the union of the event text vector $E$ and the associated text vector $P_j$, $E \cup P_j = (b_1, b_2, \ldots, b_U)$, where $U$ is the number of participles in $E \cup P_j$.
S303, comparing the number $m$ of participles in the event text vector $E$ with a preset first threshold D1;
S304, selecting, based on the comparison result, a preset similarity calculation method to calculate the similarity between the event text vector $E$ and the associated text vector $P_j$; the preset similarity calculation methods include a first similarity calculation method
Figure BDA0003455000700000051
Second similarity calculation method
Figure BDA0003455000700000052
And third similarity calculation method
Figure BDA0003455000700000053
where $W1_k$ is the weight of participle $b_k$ in the event text vector $E$, and $W2_k$ is the weight of participle $b_k$ in the associated text vector $P_j$; step S304 specifically comprises:
S3041, if $m > D1$, selecting the third similarity calculation method to calculate
Figure BDA0003455000700000054
S3042, if m is less than or equal to D1, selecting a second similarity calculation method to calculate
Figure BDA0003455000700000055
S305, traversing the M associated text vectors, and presenting the associated texts on the display interface in a similarity descending manner.
Like Example 2, and in contrast to Example 1, Example 3 selects the similarity calculation method according to the number of participles in the event text. When that number is greater than the preset first threshold, the similarity is calculated from the weights of the participles in the union of the event text vector and the associated text vector. When it does not exceed the preset first threshold, the similarity is calculated directly from the number of participles in the intersection of the two vectors, the number of participles in the event text vector, and the number of participles in the associated text vector. This improves acquisition efficiency and saves server computing resources while preserving the accuracy of the similarity obtained.
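Steps S302 and S304 can be sketched in the same style. The union is built with order preserved, and the third and second methods are again supplied as callables because their formulas are available here only as image references. The function names, the extra `union` argument handed to `third`, and the small demonstration threshold `d1=2` are assumptions for this example.

```python
def union(E, P):
    """Distinct participles of E followed by those of P not already in E (S302)."""
    return list(dict.fromkeys([*E, *P]))


def similarity_example3(E, WE, P, WP, third, second, d1=50):
    """Dispatch per S3041/S3042: third method if m > D1, otherwise second."""
    if len(E) > d1:
        return third(E, WE, P, WP, union(E, P))
    return second(E, WE, P, WP)


# Hypothetical stand-ins that report the branch taken and the union size
third = lambda E, WE, P, WP, U: ("third", len(U))
second = lambda E, WE, P, WP: ("second", None)

E, WE = ["flood", "city", "rescue"], [2, 1, 1]
P, WP = ["flood", "rain"], [1, 3]
result = similarity_example3(E, WE, P, WP, third, second, d1=2)
```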
(Example 4)
In this embodiment, for the associated text vector $P_j$ corresponding to any event text vector $E$, the processor executes the computer program to implement the following steps:
S401, obtaining the intersection $E \cap P_j = (b_1, b_2, \ldots, b_V)$, where $V$ is the number of participles in $E \cap P_j$.
S402, comparing the number $m$ of participles in the event text vector $E$ with a preset first threshold D1.
S403, comparing the number $jn$ of participles in the associated text vector $P_j$ with a preset second threshold D2.
S404, selecting, based on the comparison results, a preset similarity calculation method to calculate the similarity between the event text vector $E$ and the associated text vector $P_j$; the preset similarity calculation methods include a first similarity calculation method
Figure BDA0003455000700000061
And second similarity calculation method
Figure BDA0003455000700000062
where $W1_k$ is the weight of participle $b_k$ in the event text vector $E$, and $W2_k$ is the weight of participle $b_k$ in the associated text vector $P_j$; step S404 specifically comprises:
S4041, if $m > D1$, selecting the first similarity calculation method to calculate
Figure BDA0003455000700000063
S4042, if m is less than or equal to D1 and jn is less than or equal to D2, selecting a second similarity calculation method to calculate
Figure BDA0003455000700000064
S405, traversing the M associated text vectors, and presenting the associated texts on the display interface in a similarity descending manner.
Further, in the embodiment of the present invention, the preset similarity calculation method further includes a fourth similarity calculation method
Figure BDA0003455000700000071
Step S404 further includes:
S4043, if $m \le D1$ and $jn > D2$, selecting the fourth similarity calculation method to calculate
Figure BDA0003455000700000072
Compared with Example 1, Example 4 selects the similarity calculation method according to the numbers of participles in both the event text and the associated text. When the number of event text participles is greater than the preset first threshold, the similarity is calculated from the weights of the participles in the intersection of the event text vector and the associated text vector. When the number of event text participles does not exceed the preset first threshold and the number of associated text participles does not exceed the preset second threshold, the similarity is calculated directly from the number of participles in the intersection, the number of participles in the event text vector, and the number of participles in the associated text vector. When the number of event text participles does not exceed the preset first threshold and the number of associated text participles is greater than the preset second threshold, the similarity is calculated from the weights of the participles in the intersection together with the number of intersection participles in the associated text vector. This improves acquisition efficiency and saves server computing resources while preserving the accuracy of the similarity obtained.
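The three branches of steps S4041 to S4043 can be summarized as a small selection function. This is a sketch of the branching only; it returns the name of the selected method rather than computing a similarity, since the formulas are available here only as image references. The function name is hypothetical, and defaulting `d2` to `d1` is an assumption modeled on the preferred setting in which the second threshold equals the first.

```python
def select_method_example4(m, jn, d1=50, d2=None):
    """Return which similarity method Example 4 selects.

    m:  number of participles in the event text vector E
    jn: number of participles in the associated text vector P_j
    """
    if d2 is None:
        d2 = d1  # assumed default: second threshold equal to the first
    if m > d1:
        return "first"   # S4041
    if jn <= d2:
        return "second"  # S4042: m <= D1 and jn <= D2
    return "fourth"      # S4043: m <= D1 and jn > D2


choice_long_event = select_method_example4(m=60, jn=10)
choice_both_short = select_method_example4(m=50, jn=50)
choice_long_assoc = select_method_example4(m=50, jn=51)
```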
In the above embodiments of the present invention, the preset first threshold D1 may take a value in the range of, for example, 20 to 100; preferably, D1 = 50. Preferably, the preset second threshold is equal to the preset first threshold, i.e., D2 = D1.
In the embodiment of the present invention, the memory and the processor may be a general-purpose memory and processor and are not specifically limited herein. When the processor runs the computer program stored in the memory, the problems in the related art of inefficient retrieval of associated texts and of presented texts that lack a unified topic and pertinence can be solved.
To sum up, in the event clustering text retrieval system provided by the embodiment of the present invention, the method used to calculate the similarity between an event text and an associated text is selected based on the number of participles in the event text vector, or on both the number of participles in the event text vector and the number of participles in the associated text vector. When the number of participles in the event text vector is greater than the preset first threshold, the similarity is calculated from the weights of the participles in the intersection or the union of the event text vector and the associated text vector. When the number of participles in the event text vector does not exceed the preset first threshold, the similarity is calculated from the number of participles in the intersection or the union, the number of participles in the event text vector, and the number of participles in the associated text vector. This improves acquisition efficiency and saves server computing resources while preserving the accuracy of the similarity obtained. In addition, the associated texts are presented in descending order of similarity, so the presented texts are pertinent and the user experience can be improved.
The above embodiments are only specific embodiments of the present invention. They illustrate the technical solutions of the present invention and do not limit them, and the protection scope of the present invention is not limited to them. Although the present invention is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that, within the technical scope disclosed herein, any person skilled in the art can still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or substitute equivalents for some of their technical features. Such modifications, changes, or substitutions do not depart from the spirit and scope of the embodiments of the present invention and should be construed as being included therein. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. An event clustering text retrieval system, characterized in that the system is deployed in the cloud and used for concurrently processing a plurality of event texts, and comprises: a processor, a memory storing a computer program, a crawling database, and a display interface;
the crawling database stores event text vectors obtained by performing word segmentation on event texts, together with the corresponding event participle weights, and associated text vectors obtained by performing word segmentation on the M associated texts related to each event text, together with the corresponding associated participle weights; wherein any event text vector is $E = (e_1, e_2, \ldots, e_m)$ with corresponding event participle weights $WE = (we_1, we_2, \ldots, we_m)$, where $e_i$ is the i-th participle in the event text vector $E$, $we_i$ is the weight of participle $e_i$, $m$ is the number of participles in $E$, and $i$ ranges from 1 to $m$; any associated text vector is $P_j = (p_{j1}, p_{j2}, \ldots, p_{jn})$ with corresponding associated participle weights $WP_j = (wp_{j1}, wp_{j2}, \ldots, wp_{jn})$, where $p_{jt}$ is the t-th participle in the associated text vector $P_j$, $wp_{jt}$ is the weight of participle $p_{jt}$, $jn$ is the number of participles in $P_j$, $j$ ranges from 1 to $M$, and $t$ ranges from 1 to $n$;
for the associated text vector $P_j$ corresponding to any event text vector $E$, the processor implements the following steps by executing the computer program:
obtaining the intersection $E \cap P_j = (b_1, b_2, \ldots, b_V)$, where $V$ is the number of participles in $E \cap P_j$;
comparing the number $m$ of participles in the event text vector $E$ with a preset first threshold D1;
selecting, based on the comparison result, a preset similarity calculation method to calculate the similarity between the event text vector $E$ and the associated text vector $P_j$; the preset similarity calculation methods include a first similarity calculation method
Figure FDA0003455000690000011
And second similarity calculation method
Figure FDA0003455000690000012
where $W1_k$ is the weight of participle $b_k$ in the event text vector $E$, and $W2_k$ is the weight of participle $b_k$ in the associated text vector $P_j$;
and traversing the M associated text vectors, and presenting the associated texts on the display interface in descending order of similarity.
2. The event clustering text retrieval system of claim 1, wherein selecting, based on the comparison result, a preset similarity calculation method to calculate the similarity between the event text vector $E$ and the associated text vector $P_j$ comprises:
if $m > D1$, selecting the first similarity calculation method to calculate
Figure FDA0003455000690000021
Otherwise, selecting a second similarity calculation method to calculate
Figure FDA0003455000690000022
3. The event clustering text retrieval system of claim 1, further comprising, before comparing the number $m$ of participles in the event text vector $E$ with the preset first threshold D1:
obtaining the union $E \cup P_j = (b_1, b_2, \ldots, b_U)$, where $U$ is the number of participles in $E \cup P_j$;
the preset similarity calculation methods further include a third similarity calculation method
Figure FDA0003455000690000023
4. The event clustering text retrieval system of claim 3, wherein selecting, based on the comparison result, a preset similarity calculation method to calculate the similarity between the event text vector $E$ and the associated text vector $P_j$ comprises:
if $m > D1$, selecting the third similarity calculation method to calculate
Figure FDA0003455000690000024
Otherwise, selecting a second similarity calculation method to calculate
Figure FDA0003455000690000025
5. The event clustering text retrieval system of claim 3, further comprising, after comparing the number $m$ of participles in the event text vector $E$ with the preset first threshold D1:
comparing the number $jn$ of participles in the associated text vector $P_j$ with a preset second threshold D2.
6. The event clustering text retrieval system of claim 5, wherein selecting, based on the comparison result, a preset similarity calculation method to calculate the similarity between the event text vector $E$ and the associated text vector $P_j$ comprises:
if $m > D1$, selecting the first similarity calculation method to calculate
Figure FDA0003455000690000031
if $m \le D1$ and $jn \le D2$, selecting the second similarity calculation method to calculate
Figure FDA0003455000690000032
7. The event cluster text retrieval system of claim 6, wherein the predetermined similarity calculation method further comprises a fourth similarity calculation method
Figure FDA0003455000690000033
and the selecting further comprises:
if $m \le D1$ and $jn > D2$, selecting the fourth similarity calculation method to calculate
Figure FDA0003455000690000034
8. The event clustering text retrieval system of claim 5, wherein selecting, based on the comparison result, a preset similarity calculation method to calculate the similarity between the event text vector $E$ and the associated text vector $P_j$ comprises:
if $m > D1$, selecting the third similarity calculation method to calculate
Figure FDA0003455000690000035
if $m \le D1$ and $jn \le D2$, selecting the second similarity calculation method to calculate
Figure FDA0003455000690000036
9. The event cluster text retrieval system according to claim 1, wherein the preset first threshold value ranges from 20 to 100.
10. The event cluster text retrieval system of claim 5, wherein the second predetermined threshold is equal to the first predetermined threshold.
CN202210001964.2A 2021-01-05 2022-01-04 Event clustering text retrieval system Active CN114398534B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2021100052580 2021-01-05
CN202110005258 2021-01-05

Publications (2)

Publication Number Publication Date
CN114398534A (en) 2022-04-26
CN114398534B (en) 2023-09-12

Family

ID=81228430

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210001964.2A Active CN114398534B (en) 2021-01-05 2022-01-04 Event clustering text retrieval system

Country Status (1)

Country Link
CN (1) CN114398534B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6137911A (en) * 1997-06-16 2000-10-24 The Dialog Corporation Plc Test classification system and method
CN107239574A (en) * 2017-06-29 2017-10-10 北京神州泰岳软件股份有限公司 A kind of method and device of intelligent Answer System knowledge problem matching
CN109635077A (en) * 2018-12-18 2019-04-16 武汉斗鱼网络科技有限公司 Calculation method, device, electronic equipment and the storage medium of text similarity
CN110347795A (en) * 2019-07-05 2019-10-18 腾讯科技(深圳)有限公司 Search for relatedness computation method, apparatus, equipment and the medium of text and library file
CN111046271A (en) * 2018-10-15 2020-04-21 阿里巴巴集团控股有限公司 Mining method and device for search, storage medium and electronic equipment
CN111274808A (en) * 2020-02-11 2020-06-12 支付宝(杭州)信息技术有限公司 Text retrieval method, model training method, text retrieval device, and storage medium
CN111708879A (en) * 2020-05-11 2020-09-25 北京明略软件系统有限公司 Text aggregation method and device for event and computer-readable storage medium
US20200349183A1 (en) * 2019-05-03 2020-11-05 Servicenow, Inc. Clustering and dynamic re-clustering of similar textual documents

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
G. SURYA NARAYANA et al.: "Clustering for high dimensional categorical data based on text similarity", 《ICCIP '16: PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON COMMUNICATION AND INFORMATION PROCESSING》 *
金春霞 et al.: "Chinese short text clustering with dynamic vectors", 《Computer Engineering and Applications》 *

Also Published As

Publication number Publication date
CN114398534B (en) 2023-09-12

Similar Documents

Publication Publication Date Title
CN108804641B (en) Text similarity calculation method, device, equipment and storage medium
US20220188708A1 (en) Systems and methods for predictive coding
JP4011906B2 (en) Profile information search method, program, recording medium, and apparatus
CN109325108B (en) Query processing method, device, server and storage medium
US10565253B2 (en) Model generation method, word weighting method, device, apparatus, and computer storage medium
TWI643076B (en) Financial analysis system and method for unstructured text data
JP7430820B2 (en) Sorting model training method and device, electronic equipment, computer readable storage medium, computer program
CN109885180B (en) Error correction method and apparatus, computer readable medium
CN111008272A (en) Knowledge graph-based question and answer method and device, computer equipment and storage medium
CN110032650B (en) Training sample data generation method and device and electronic equipment
CN113177154A (en) Search term recommendation method and device, electronic equipment and storage medium
KR20190128246A (en) Searching methods and apparatus and non-transitory computer-readable storage media
WO2020052547A1 (en) Method and apparatus for identifying new words in spam message, and electronic device
CN114116997A (en) Knowledge question answering method, knowledge question answering device, electronic equipment and storage medium
CN110008396B (en) Object information pushing method, device, equipment and computer readable storage medium
CN113326420A (en) Question retrieval method, device, electronic equipment and medium
CN115239214B (en) Enterprise evaluation processing method and device and electronic equipment
CN113988157A (en) Semantic retrieval network training method and device, electronic equipment and storage medium
US20220107949A1 (en) Method of optimizing search system
JP7172187B2 (en) INFORMATION DISPLAY METHOD, INFORMATION DISPLAY PROGRAM AND INFORMATION DISPLAY DEVICE
CN111597336A (en) Processing method and device of training text, electronic equipment and readable storage medium
CN111737461A (en) Text processing method and device, electronic equipment and computer readable storage medium
CN104572820B (en) The generation method and device of model, importance acquisition methods and device
CN114398534A (en) Event cluster text retrieval system
CN111708862B (en) Text matching method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant