CN114398534A - Event cluster text retrieval system - Google Patents
Event cluster text retrieval system
- Publication number
- CN114398534A CN202210001964.2A CN202210001964A
- Authority
- CN
- China
- Prior art keywords
- event
- text
- calculation method
- text vector
- similarity calculation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/338—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides an event clustering text retrieval system comprising a processor, a memory for storing a computer program, a crawling database and a display interface. The crawling database stores event text vectors and corresponding event participle weights obtained by performing word segmentation on event texts, together with associated text vectors and corresponding associated participle weights obtained by performing word segmentation on the associated texts related to each event text. The processor calculates the similarity between any event text vector and its corresponding associated text vectors using different similarity calculation formulas, and presents the corresponding associated texts on the display interface in descending order of similarity. The invention improves acquisition efficiency and the pertinence of the presented texts.
Description
Technical Field
The invention relates to the field of physics, in particular to information processing technology, and more specifically to an event clustering text retrieval system.
Background
On the Internet, an event is presented in the form of one or more texts, and a user often wishes to read all the texts relating to that event. At present, when a user wants to learn about an event, keywords are entered for retrieval, and the relevant texts are presented in chronological order. However, results presented in chronological order are not unified in topic and are poorly targeted, which is inconvenient for the user. There is therefore a need to cluster such texts so as to provide texts that are unified in topic and strongly targeted.
Disclosure of Invention
To address this technical problem, the invention provides an event clustering text retrieval system that, for a given event, can provide related texts that are unified in topic and strongly targeted.
The technical solution adopted by the invention is as follows:
the invention provides an event clustering text retrieval system that is arranged at a cloud end and is used for concurrently processing a plurality of event texts, the system comprising: a processor, a memory for storing a computer program, a crawling database and a display interface;
the crawling database stores event text vectors obtained by performing word segmentation on event texts and the corresponding event participle weights, as well as associated text vectors obtained by performing word segmentation on the M associated texts related to each event text and the corresponding associated participle weights; any event text vector E = (e_1, e_2, ..., e_m) has a corresponding event participle weight vector WE = (we_1, we_2, ..., we_m), where e_i is the i-th participle in the event text vector E, we_i is the participle weight of e_i, m is the number of participles in E, and i ranges from 1 to m; any associated text vector P_j = (p_j1, p_j2, ..., p_jn) has a corresponding associated participle weight vector WP_j = (wp_j1, wp_j2, ..., wp_jn), where p_jt is the t-th participle in P_j, wp_jt is the participle weight of p_jt, n is the number of participles in P_j, j ranges from 1 to M, and t ranges from 1 to n;
for any event text vector E and its corresponding associated text vectors P_j, the processor is used for executing the computer program to implement the following steps:
obtaining E ∩ P_j = (b_1, b_2, ..., b_V), where V is the number of participles in E ∩ P_j;
comparing the number m of participles in the event text vector E with a preset first threshold D1;
selecting a preset similarity calculation method based on the comparison result to calculate the similarity Sim_j between the event text vector E and the associated text vector P_j; the preset similarity calculation methods include a first similarity calculation method and a second similarity calculation method, where W1_k is the participle weight of the corresponding participle b_k in the event text vector E and W2_k is the participle weight of the corresponding participle b_k in the associated text vector P_j;
and traversing the M associated text vectors and presenting the associated texts on the display interface in descending order of similarity.
According to the event clustering text retrieval system provided by the embodiments of the invention, a similarity calculation method is selected according to the number of participles in the event text vector, so that acquisition efficiency is improved and server computing resources are saved while the accuracy of the obtained similarity is guaranteed. In addition, the associated texts are presented in descending order of similarity, so the presented associated texts are well targeted, which improves the user experience.
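A minimal Python sketch of this control flow is given below. The concrete first and second similarity formulas are not reproduced in this text (they appear only as images in the publication), so `sim_first` and `sim_second` are placeholders supplied by the caller; the function names and signatures are illustrative assumptions, not the claimed implementation.

```python
from typing import Callable, Dict, List, Tuple

WeightedVector = Dict[str, float]  # participle -> participle weight
Similarity = Callable[[WeightedVector, WeightedVector], float]

def rank_associated_texts(
    E: WeightedVector,                             # event text vector with weights WE
    associated: List[Tuple[str, WeightedVector]],  # (associated text, vector P_j with weights WP_j)
    D1: int,                                       # preset first threshold, e.g. 50
    sim_first: Similarity,                         # first similarity calculation method (assumed)
    sim_second: Similarity,                        # second similarity calculation method (assumed)
) -> List[Tuple[str, float]]:
    m = len(E)                                     # number of participles in E
    sim = sim_first if m > D1 else sim_second      # select the method from the comparison result
    scored = [(text, sim(E, P_j)) for text, P_j in associated]
    # traverse the M associated text vectors and sort by descending similarity for display
    return sorted(scored, key=lambda item: item[1], reverse=True)
```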
Detailed Description
In order to make the technical problems to be solved, the technical solutions and the advantages of the present invention clearer, a detailed description is given below with reference to specific embodiments.
In some flows described in the specification and claims of this invention, a number of operations appear in a particular order, but it should be clearly understood that these operations may be executed out of the order in which they appear herein or in parallel; the numbers of the operations, such as 101 and 102, merely distinguish the different operations and do not by themselves represent any order of execution. In addition, the flows may include more or fewer operations, and these operations may be executed sequentially or in parallel.
The technical solutions in the embodiments of the present invention are described below clearly and completely; obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
The embodiment of the invention provides an event clustering text retrieval system that is arranged at the cloud end, specifically deployed on a cloud server, and used for concurrently processing a plurality of event texts. The system comprises: a processor, a memory for storing a computer program, a crawling database and a display interface.
In the embodiment of the invention, the crawling database stores the event text vectors obtained by performing word segmentation on the event texts and the corresponding event participle weights, as well as the associated text vectors obtained by performing word segmentation on the M associated texts related to each event text and the corresponding associated participle weights. Any event text vector E = (e_1, e_2, ..., e_m) has a corresponding event participle weight vector WE = (we_1, we_2, ..., we_m), where e_i is the i-th participle in the event text vector E, we_i is the participle weight of e_i, m is the number of participles in E, and i ranges from 1 to m. The invention can use existing word segmentation techniques to segment the event text, and the weight of each participle may be determined with existing techniques; preferably, we_i is the number of occurrences of the participle e_i in the event text.
In addition, any associated text vector P_j = (p_j1, p_j2, ..., p_jn) has a corresponding associated participle weight vector WP_j = (wp_j1, wp_j2, ..., wp_jn), where p_jt is the t-th participle in P_j, wp_jt is the participle weight of p_jt, n is the number of participles in P_j, j ranges from 1 to M, and t ranges from 1 to n. In the embodiment of the invention, an associated text is a text obtained from the event text; it may be obtained in any existing manner, for example the way the Royal Flush software gathers reports on related listed companies, or any other existing clustering approach. The invention can use existing word segmentation techniques to segment the associated text, and the weight of each participle may be determined with existing techniques; preferably, wp_jt is the number of occurrences of the participle p_jt in the associated text.
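A minimal Python sketch of how such database entries could be built is given below; it assumes already-segmented token lists (any existing segmenter may produce them) and uses occurrence counts as the participle weights, as preferred above. The sample tokens and all names are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List

def build_weighted_vector(participles: List[str]) -> Dict[str, int]:
    """Map each distinct participle to its occurrence count, i.e. E/P_j with weights WE/WP_j."""
    return dict(Counter(participles))

# Hypothetical pre-segmented event text and one associated text.
event_tokens = ["公司", "发布", "年报", "公司", "利润"]
assoc_tokens = ["公司", "年报", "利润", "增长"]

E = build_weighted_vector(event_tokens)    # {'公司': 2, '发布': 1, '年报': 1, '利润': 1}
P_1 = build_weighted_vector(assoc_tokens)  # {'公司': 1, '年报': 1, '利润': 1, '增长': 1}
print(E, P_1)
```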
In the embodiment of the invention, the processor is used for calculating the similarity between any event text vector and its corresponding associated text vectors, and for presenting the corresponding associated texts on the display interface in descending order of similarity. The specific functions performed by the processor are described below through Embodiments 1 to 5.
(Embodiment 1)
In this embodiment, for any event text vector E and its corresponding associated text vector P_j, the processor executes the computer program to implement the following steps:
S101, obtaining the intersection of the event text vector E and the associated text vector P_j, E ∩ P_j = (b_1, b_2, ..., b_V), where V is the number of participles in E ∩ P_j.
S102, obtaining the similarity Sim_j from the participle weights of the intersecting participles, where W1_k is the participle weight of the corresponding participle b_k in the event text vector E and W2_k is the participle weight of the corresponding participle b_k in the associated text vector P_j.
S103, traversing the M associated text vectors and presenting the associated texts on the display interface in descending order of similarity.
In this embodiment, the associated texts are presented in descending order of similarity, so that, compared with the prior art, the presented associated texts are better targeted.
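A minimal Python sketch of steps S101 and S102 for a single associated text vector is given below. The published first similarity formula is not reproduced in this text, so the aggregation used here (the sum of the products of the two weights over the intersecting participles) is only one plausible reading of "using the weights of the intersecting participles", not the claimed formula.

```python
from typing import Dict

def similarity_first(E: Dict[str, float], P_j: Dict[str, float]) -> float:
    """Assumed first similarity calculation method over the intersection E ∩ P_j."""
    intersection = E.keys() & P_j.keys()                 # b_1, ..., b_V
    # W1_k = weight of b_k in E, W2_k = weight of b_k in P_j (the combination is an assumption)
    return float(sum(E[b_k] * P_j[b_k] for b_k in intersection))

E = {"公司": 2, "发布": 1, "年报": 1, "利润": 1}
P_1 = {"公司": 1, "年报": 1, "利润": 1, "增长": 1}
print(similarity_first(E, P_1))  # 2*1 + 1*1 + 1*1 = 4.0
```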
(Embodiment 2)
In this embodiment, for any event text vector E and its corresponding associated text vector P_j, the processor executes the computer program to implement the following steps:
S201, obtaining E ∩ P_j = (b_1, b_2, ..., b_V), where V is the number of participles in E ∩ P_j.
S202, comparing the number m of participles in the event text vector E with a preset first threshold D1.
S203, selecting a preset similarity calculation method based on the comparison result to calculate the similarity Sim_j between the event text vector E and the associated text vector P_j; the preset similarity calculation methods include a first similarity calculation method and a second similarity calculation method, where W1_k is the participle weight of the corresponding participle b_k in the event text vector E and W2_k is the participle weight of the corresponding participle b_k in the associated text vector P_j. Specifically:
S2031, if m > D1, selecting the first similarity calculation method to calculate Sim_j;
S2032, if m is less than or equal to D1, selecting the second similarity calculation method to calculate Sim_j.
S204, traversing the M associated text vectors and presenting the associated texts on the display interface in descending order of similarity.
Compared with Embodiment 1, Embodiment 2 selects different similarity calculation methods according to the number of participles in the event text. When the number of event text participles exceeds the preset first threshold, the similarity is calculated from the weights of the participles in the intersection of the event text vector and the associated text vector; when it does not exceed the preset first threshold, the similarity is calculated directly from the number of participles in the intersection, the number of participles in the event text vector and the number of participles in the associated text vector. In this way, acquisition efficiency is improved and server computing resources are saved while the accuracy of the obtained similarity is guaranteed.
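A sketch of the count-only branch (m ≤ D1) is given below. The published second similarity formula is not reproduced in this text; per the description it uses only V, m and n, so the Dice-style ratio 2V/(m+n) is an assumption. The weight-based branch for m > D1 can reuse the similarity_first sketch from Embodiment 1.

```python
from typing import Dict

def similarity_second(E: Dict[str, float], P_j: Dict[str, float]) -> float:
    """Assumed second similarity calculation method using only participle counts."""
    V = len(E.keys() & P_j.keys())   # number of participles in E ∩ P_j
    m, n = len(E), len(P_j)          # participle counts of E and P_j
    return 2.0 * V / (m + n) if (m + n) else 0.0

E = {"公司": 2, "发布": 1, "年报": 1, "利润": 1}
P_1 = {"公司": 1, "年报": 1, "利润": 1, "增长": 1}
print(similarity_second(E, P_1))  # V=3, m=4, n=4 -> 0.75
```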
(Embodiment 3)
In this embodiment, for any event text vector E and its corresponding associated text vector P_j, the processor executes the computer program to implement the following steps:
S301, obtaining E ∩ P_j = (b_1, b_2, ..., b_V), where V is the number of participles in E ∩ P_j.
S302, obtaining the union of the event text vector E and the associated text vector P_j, E ∪ P_j = (b_1, b_2, ..., b_U), where U is the number of participles in E ∪ P_j.
S303, comparing the number m of participles in the event text vector E with a preset first threshold D1.
S304, selecting a preset similarity calculation method based on the comparison result to calculate the similarity Sim_j between the event text vector E and the associated text vector P_j; the preset similarity calculation methods include a first similarity calculation method, a second similarity calculation method and a third similarity calculation method, where W1_k is the participle weight of the corresponding participle b_k in the event text vector E and W2_k is the participle weight of the corresponding participle b_k in the associated text vector P_j. Specifically:
S3041, if m > D1, selecting the third similarity calculation method to calculate Sim_j;
S3042, if m is less than or equal to D1, selecting the second similarity calculation method to calculate Sim_j.
S305, traversing the M associated text vectors and presenting the associated texts on the display interface in descending order of similarity.
Similar to Embodiment 2, and compared with Embodiment 1, Embodiment 3 selects different similarity calculation methods according to the number of event text participles. When the number of event text participles exceeds the preset first threshold, the similarity is calculated from the weights of the participles in the union of the event text vector and the associated text vector; when it does not exceed the preset first threshold, the similarity is calculated directly from the number of participles in the intersection, the number of participles in the event text vector and the number of participles in the associated text vector. This improves acquisition efficiency and saves server computing resources while guaranteeing the accuracy of the obtained similarity.
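The following sketch shows one possible form of the union-based third similarity calculation method. The published formula is not reproduced in this text, so the weighted-Jaccard ratio below (overlapping weight mass over the total weight mass of E ∪ P_j) is purely an assumed reading of "using the weights of the participles of the union".

```python
from typing import Dict

def similarity_third(E: Dict[str, float], P_j: Dict[str, float]) -> float:
    """Assumed third similarity calculation method over the union E ∪ P_j."""
    union = E.keys() | P_j.keys()                                     # b_1, ..., b_U
    shared = sum(min(E.get(b, 0.0), P_j.get(b, 0.0)) for b in union)  # overlapping weight mass
    total = sum(max(E.get(b, 0.0), P_j.get(b, 0.0)) for b in union)   # weight mass of the union
    return shared / total if total else 0.0

E = {"公司": 2, "发布": 1, "年报": 1, "利润": 1}
P_1 = {"公司": 1, "年报": 1, "利润": 1, "增长": 1}
print(similarity_third(E, P_1))  # (1+1+1) / (2+1+1+1+1) = 0.5
```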
(Embodiment 4)
In this embodiment, for any event text vector E and its corresponding associated text vector P_j, the processor executes the computer program to implement the following steps:
S401, obtaining E ∩ P_j = (b_1, b_2, ..., b_V), where V is the number of participles in E ∩ P_j.
S402, comparing the number m of participles in the event text vector E with a preset first threshold D1.
S403, comparing the number n of participles in the associated text vector P_j with a preset second threshold D2.
S404, selecting a preset similarity calculation method based on the comparison results to calculate the similarity Sim_j between the event text vector E and the associated text vector P_j; the preset similarity calculation methods include a first similarity calculation method and a second similarity calculation method, where W1_k is the participle weight of the corresponding participle b_k in the event text vector E and W2_k is the participle weight of the corresponding participle b_k in the associated text vector P_j. Specifically:
S4041, if m > D1, selecting the first similarity calculation method to calculate Sim_j;
S4042, if m is less than or equal to D1 and n is less than or equal to D2, selecting the second similarity calculation method to calculate Sim_j.
S405, traversing the M associated text vectors and presenting the associated texts on the display interface in descending order of similarity.
Further, in the embodiment of the invention, the preset similarity calculation methods further include a fourth similarity calculation method, and step S404 further includes:
S4043, if m is less than or equal to D1 and n > D2, selecting the fourth similarity calculation method to calculate Sim_j.
Compared with Embodiment 1, Embodiment 4 selects different similarity calculation methods according to the numbers of participles in both the event text and the associated text. When the number of event text participles exceeds the preset first threshold, the similarity is calculated from the weights of the participles in the intersection of the event text vector and the associated text vector. When the number of event text participles does not exceed the preset first threshold and the number of associated text participles does not exceed the preset second threshold, the similarity is calculated directly from the number of participles in the intersection and the numbers of participles in the event text vector and the associated text vector. When the number of event text participles does not exceed the preset first threshold but the number of associated text participles exceeds the preset second threshold, the similarity is calculated from the weights of the participles in the intersection together with the number of participles in the associated text vector. Thus, acquisition efficiency is improved and server computing resources are saved while the accuracy of the obtained similarity is guaranteed.
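A sketch of the fourth similarity calculation method, i.e. the branch for m ≤ D1 and n > D2, is given below. The published formula is not reproduced in this text, so normalising the weighted overlap of the intersection by the participle count of the associated text vector is an assumption based on the description above.

```python
from typing import Dict

def similarity_fourth(E: Dict[str, float], P_j: Dict[str, float]) -> float:
    """Assumed fourth similarity calculation method for long associated texts."""
    intersection = E.keys() & P_j.keys()
    weighted_overlap = sum(E[b_k] * P_j[b_k] for b_k in intersection)  # W1_k, W2_k over E ∩ P_j
    n = len(P_j)                                                       # participle count of P_j
    return weighted_overlap / n if n else 0.0

E = {"公司": 2, "发布": 1, "年报": 1, "利润": 1}
P_1 = {"公司": 1, "年报": 1, "利润": 1, "增长": 1}
print(similarity_fourth(E, P_1))  # 4 / 4 = 1.0
```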
In the above embodiments of the invention, the preset first threshold may range, for example, from 20 to 100; preferably, D1 = 50. Preferably, the preset second threshold may be equal to the preset first threshold, i.e. D2 = D1.
In the embodiment of the invention, the memory and the processor may be a general-purpose memory and a general-purpose processor, which are not specifically limited here. When the processor runs the computer program stored in the memory, the problems in the related art of low efficiency of associated text retrieval and insufficient topic unity and pertinence of the presented texts can be solved.
To sum up, in the event cluster text retrieval system provided by the embodiments of the invention, when calculating the similarity between the event text and an associated text, a similarity calculation method is selected based on the number of participles in the event text vector, or based on both the number of participles in the event text vector and the number of participles in the associated text vector. When the number of participles in the event text vector exceeds the preset first threshold, the similarity is calculated from the weights of the participles in the intersection or union of the event text vector and the associated text vector; when it does not exceed the preset first threshold, the similarity is calculated from the number of participles in the intersection or union together with the numbers of participles in the event text vector and the associated text vector. In this way, acquisition efficiency is improved and server computing resources are saved while the accuracy of the obtained similarity is guaranteed. In addition, the associated texts are presented in descending order of similarity, so the presented associated texts are well targeted, which improves the user experience.
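To tie the pieces together, a short end-to-end usage sketch is given below, reusing the hypothetical similarity functions from the embodiment sketches above; the texts, tokens and threshold values are illustrative only and do not form part of the claimed system.

```python
E = {"公司": 2, "发布": 1, "年报": 1, "利润": 1}            # event text vector with weights
associated = {
    "年报点评": {"公司": 1, "年报": 1, "利润": 1, "增长": 1},
    "行业新闻": {"行业": 1, "政策": 1, "公司": 1},
}
D1, D2 = 50, 50                      # preferred threshold values from the description above
m = len(E)

ranked = []
for title, P_j in associated.items():
    if m > D1:
        sim = similarity_first(E, P_j)    # weight-based method for long event texts
    elif len(P_j) > D2:
        sim = similarity_fourth(E, P_j)   # Embodiment 4's branch for long associated texts
    else:
        sim = similarity_second(E, P_j)   # count-based method otherwise
    ranked.append((title, sim))

ranked.sort(key=lambda item: item[1], reverse=True)   # present in descending order of similarity
print(ranked)  # [('年报点评', 0.75), ('行业新闻', 0.2857...)]
```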
The above-mentioned embodiments are only specific embodiments of the present invention, used to illustrate the technical solutions of the present invention rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art may still modify the technical solutions described in the foregoing embodiments, easily conceive of changes, or make equivalent substitutions of some technical features within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present invention and shall be covered by the protection scope thereof. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. An event clustering text retrieval system, characterized in that the system is arranged at a cloud end and is used for concurrently processing a plurality of event texts, and comprises: a processor, a memory for storing a computer program, a crawling database and a display interface;
the crawling database stores event text vectors obtained by performing word segmentation on event texts and the corresponding event participle weights, as well as associated text vectors obtained by performing word segmentation on the M associated texts related to each event text and the corresponding associated participle weights; any event text vector E = (e_1, e_2, ..., e_m) has a corresponding event participle weight vector WE = (we_1, we_2, ..., we_m), where e_i is the i-th participle in the event text vector E, we_i is the participle weight of e_i, m is the number of participles in E, and i ranges from 1 to m; any associated text vector P_j = (p_j1, p_j2, ..., p_jn) has a corresponding associated participle weight vector WP_j = (wp_j1, wp_j2, ..., wp_jn), where p_jt is the t-th participle in P_j, wp_jt is the participle weight of p_jt, n is the number of participles in P_j, j ranges from 1 to M, and t ranges from 1 to n;
for any event text vector E and its corresponding associated text vector P_j, the processor implements the following steps by executing the computer program:
obtaining E ∩ P_j = (b_1, b_2, ..., b_V), where V is the number of participles in E ∩ P_j;
comparing the number m of participles in the event text vector E with a preset first threshold D1;
selecting a preset similarity calculation method based on the comparison result to calculate the similarity Sim_j between the event text vector E and the associated text vector P_j; the preset similarity calculation methods include a first similarity calculation method and a second similarity calculation method, where W1_k is the participle weight of the corresponding participle b_k in the event text vector E and W2_k is the participle weight of the corresponding participle b_k in the associated text vector P_j;
and traversing the M associated text vectors and presenting the associated texts on the display interface in descending order of similarity.
2. The event clustering text retrieval system of claim 1, wherein selecting a preset similarity calculation method based on the comparison result to calculate the similarity Sim_j between the event text vector E and the associated text vector P_j comprises the following steps:
3. The event clustering text retrieval system of claim 1, further comprising, before comparing the number m of participles in the event text vector E with the preset first threshold D1:
obtaining E ∪ P_j = (b_1, b_2, ..., b_U), where U is the number of participles in E ∪ P_j;
4. The event clustering text retrieval system of claim 3, wherein selecting a preset similarity calculation method based on the comparison result to calculate the similarity Sim_j between the event text vector E and the associated text vector P_j comprises the following steps:
5. The event clustering text retrieval system of claim 3, further comprising, after comparing the number m of participles in the event text vector E with a preset first threshold D1:
comparing the number n of participles in the associated text vector P_j with a preset second threshold D2.
6. The event clustering text retrieval system of claim 5, wherein selecting a preset similarity calculation method based on the comparison results to calculate the similarity Sim_j between the event text vector E and the associated text vector P_j comprises the following steps:
8. The event clustering text retrieval system of claim 5, wherein selecting a preset similarity calculation method based on the comparison results to calculate the similarity Sim_j between the event text vector E and the associated text vector P_j comprises the following steps:
9. The event cluster text retrieval system according to claim 1, wherein the preset first threshold value ranges from 20 to 100.
10. The event cluster text retrieval system of claim 5, wherein the preset second threshold is equal to the preset first threshold.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2021100052580 | 2021-01-05 | ||
CN202110005258 | 2021-01-05 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114398534A true CN114398534A (en) | 2022-04-26 |
CN114398534B CN114398534B (en) | 2023-09-12 |
Family
ID=81228430
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210001964.2A Active CN114398534B (en) | 2021-01-05 | 2022-01-04 | Event clustering text retrieval system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114398534B (en) |
- 2022-01-04: CN application CN202210001964.2A filed; granted as patent CN114398534B (status: Active)
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6137911A (en) * | 1997-06-16 | 2000-10-24 | The Dialog Corporation Plc | Test classification system and method |
CN107239574A (en) * | 2017-06-29 | 2017-10-10 | 北京神州泰岳软件股份有限公司 | A kind of method and device of intelligent Answer System knowledge problem matching |
CN111046271A (en) * | 2018-10-15 | 2020-04-21 | 阿里巴巴集团控股有限公司 | Mining method and device for search, storage medium and electronic equipment |
CN109635077A (en) * | 2018-12-18 | 2019-04-16 | 武汉斗鱼网络科技有限公司 | Calculation method, device, electronic equipment and the storage medium of text similarity |
US20200349183A1 (en) * | 2019-05-03 | 2020-11-05 | Servicenow, Inc. | Clustering and dynamic re-clustering of similar textual documents |
CN110347795A (en) * | 2019-07-05 | 2019-10-18 | 腾讯科技(深圳)有限公司 | Search for relatedness computation method, apparatus, equipment and the medium of text and library file |
CN111274808A (en) * | 2020-02-11 | 2020-06-12 | 支付宝(杭州)信息技术有限公司 | Text retrieval method, model training method, text retrieval device, and storage medium |
CN111708879A (en) * | 2020-05-11 | 2020-09-25 | 北京明略软件系统有限公司 | Text aggregation method and device for event and computer-readable storage medium |
Non-Patent Citations (2)
Title |
---|
G. Surya Narayana et al.: "Clustering for high dimensional categorical data based on text similarity", ICCIP '16: Proceedings of the 2nd International Conference on Communication and Information Processing *
Jin Chunxia et al.: "Chinese short text clustering based on dynamic vectors", Computer Engineering and Applications *
Also Published As
Publication number | Publication date |
---|---|
CN114398534B (en) | 2023-09-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108804641B (en) | Text similarity calculation method, device, equipment and storage medium | |
US20220188708A1 (en) | Systems and methods for predictive coding | |
JP4011906B2 (en) | Profile information search method, program, recording medium, and apparatus | |
CN109325108B (en) | Query processing method, device, server and storage medium | |
US10565253B2 (en) | Model generation method, word weighting method, device, apparatus, and computer storage medium | |
TWI643076B (en) | Financial analysis system and method for unstructured text data | |
JP7430820B2 (en) | Sorting model training method and device, electronic equipment, computer readable storage medium, computer program | |
CN109885180B (en) | Error correction method and apparatus, computer readable medium | |
CN111008272A (en) | Knowledge graph-based question and answer method and device, computer equipment and storage medium | |
CN110032650B (en) | Training sample data generation method and device and electronic equipment | |
CN113177154A (en) | Search term recommendation method and device, electronic equipment and storage medium | |
KR20190128246A (en) | Searching methods and apparatus and non-transitory computer-readable storage media | |
WO2020052547A1 (en) | Method and apparatus for identifying new words in spam message, and electronic device | |
CN114116997A (en) | Knowledge question answering method, knowledge question answering device, electronic equipment and storage medium | |
CN110008396B (en) | Object information pushing method, device, equipment and computer readable storage medium | |
CN113326420A (en) | Question retrieval method, device, electronic equipment and medium | |
CN115239214B (en) | Enterprise evaluation processing method and device and electronic equipment | |
CN113988157A (en) | Semantic retrieval network training method and device, electronic equipment and storage medium | |
US20220107949A1 (en) | Method of optimizing search system | |
JP7172187B2 (en) | INFORMATION DISPLAY METHOD, INFORMATION DISPLAY PROGRAM AND INFORMATION DISPLAY DEVICE | |
CN111597336A (en) | Processing method and device of training text, electronic equipment and readable storage medium | |
CN111737461A (en) | Text processing method and device, electronic equipment and computer readable storage medium | |
CN104572820B (en) | The generation method and device of model, importance acquisition methods and device | |
CN114398534A (en) | Event cluster text retrieval system | |
CN111708862B (en) | Text matching method and device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |