CN115730589A - News propagation path generation method based on word vector and related device - Google Patents


Info

Publication number
CN115730589A
Authority
CN
China
Prior art keywords
news
vectors
vector
headline
title
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211377457.5A
Other languages
Chinese (zh)
Inventor
丁洪鑫
胥月
粟郡
曹扬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC Big Data Research Institute Co Ltd
Original Assignee
CETC Big Data Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC Big Data Research Institute Co Ltd
Priority to CN202211377457.5A
Publication of CN115730589A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a news propagation path generation method based on word vectors and a related device, which can improve the efficiency of searching for similar news headlines and thereby the efficiency of generating news propagation paths. The method comprises the following steps: acquiring a plurality of news headlines; vectorizing and mapping any news headline into a news headline vector through a Bert model; clustering the news headline vectors with a K-means clustering algorithm to obtain a preset number K of classification labels; performing pairwise similarity calculation on the news headline vectors within any classification label by using a vector similarity function, and determining the similar news headline vectors within that classification label, wherein the similar news headline vectors are the news headline vectors whose similarity is greater than or equal to a preset similarity threshold; and generating a propagation path from the similar news headline vectors within any classification label, arranged in ascending order of generation time.

Description

News propagation path generation method based on word vector and related device
Technical Field
The present application relates to the field of news propagation path technology, and in particular, to a method and a related apparatus for generating a news propagation path based on word vectors.
Background
Network news is spread through internet platforms and is an extension of the traditional news service. As it spreads, network news can be quoted repeatedly, and related background material can be appended at the end, which increases the depth and breadth of the news. The news propagation path referred to here is the path obtained by tracking, in chronological order from the time an original news item is released, how that item is reprinted by other provincial, city and county level news media. Constructing a news propagation path is difficult for three reasons: first, a huge volume of news is published every day, so the amount of data to process is extremely large; second, news media may adjust the title or the body text when reprinting, which raises the technical bar for judging whether two news items share the same source; and third, front-end applications need near-real-time feedback, which places high demands on performance.
In the prior art, news propagation paths are generally constructed by directly performing news title matching or word vector matching, searching news which is completely consistent with current news titles but has later release time, and sequencing according to time sequence to obtain the news propagation paths.
However, when performing word vector matching on a title or content, the efficiency of searching for similar news titles may be low due to a large number of news titles or contents, thereby affecting the generation efficiency of news propagation paths.
Disclosure of Invention
In order to solve the above technical problem, the present application provides a news propagation path generation method based on word vectors and a related apparatus, which can improve the efficiency of searching for similar news headlines and thereby the efficiency of generating news propagation paths. See the following embodiments for details.
The application provides a news propagation path generation method based on word vectors in a first aspect, which includes:
acquiring a plurality of news headlines;
vectorizing and mapping any news title into a news title vector through a Bert model;
clustering the news headline vectors by adopting a K-means clustering algorithm to obtain a preset number K of classification labels, wherein any one news headline vector corresponds to one classification label, and any one classification label comprises a plurality of news headline vectors;
performing pairwise similarity calculation on the news headline vectors within any one of the classification labels by using a vector similarity function, and determining similar news headline vectors within that classification label, wherein the similar news headline vectors are the news headline vectors whose similarity is greater than or equal to a preset similarity threshold;
and generating a propagation path from the similar news headline vectors within any classification label, arranged in ascending order of generation time, wherein the news headlines corresponding to the similar news headline vectors have different generation times stored in advance.
Optionally, after the obtaining the plurality of news headlines, the method for generating a news propagation path based on a word vector further includes:
acquiring a newly added news title;
the vectorizing mapping of any of the news headlines into a news headline vector through the Bert model includes:
vectorizing and mapping any one of the newly-added news headlines into a news headline vector through a Bert model.
Optionally, the news headline vector is a fixed-length 768-dimensional vector.
Optionally, after the obtaining the plurality of news headlines, the method for generating a news propagation path based on a word vector further includes:
and filtering the news headlines.
Optionally, the preset similarity threshold is 0.96.
Optionally, the preset number K of the category labels is 32.
A second aspect of the present application provides a news propagation path generation apparatus based on word vectors, including:
a first acquisition unit configured to acquire a plurality of news headlines;
the vectorization unit is used for vectorizing and mapping any news title into a news title vector through a Bert model;
the clustering unit is used for clustering the news headline vectors by adopting a K-means clustering algorithm to obtain a preset number of K classification labels, wherein any one news headline vector corresponds to one classification label, and any one classification label comprises a plurality of news headline vectors;
the determining unit is used for performing similarity calculation on every two of the news title vectors in any one of the classification labels by adopting a vector similarity function so as to determine similar news title vectors in any one of the classification labels, wherein the similar news title vectors are news title vectors which are greater than or equal to a preset similarity threshold value in the news title vectors;
and the generating unit is used for generating a propagation path according to the ascending sequence of the similar news headline vectors and the generating time in any classification label, and the news headline corresponding to any similar news headline vector stores different generating time in advance.
Optionally, the apparatus for generating a news propagation path based on a word vector further includes:
the second acquisition unit is used for acquiring a newly added news title;
the vectorization unit is specifically configured to vectorize and map any one of the newly added news headlines into a news headline vector through a Bert model.
Optionally, the device for generating a news propagation path based on a word vector further includes:
and the filtering unit is used for filtering the news headlines.
A third aspect of the present application provides a news propagation path generation apparatus based on word vectors, including:
the system comprises a central processing unit, a memory, an input/output interface, a wired or wireless network interface and a power supply;
the memory is a transient storage memory or a persistent storage memory;
the central processor is configured to communicate with the memory and execute the instructions in the memory to perform the method of the first aspect and any of its alternatives.
A fourth aspect of the present application provides a computer-readable storage medium comprising instructions which, when executed on a computer, cause the computer to carry out the method of the first aspect and any one of the alternatives of the first aspect.
According to the technical scheme, the method has the following advantages:
The method obtains a plurality of news headlines, vectorizes and maps any news headline into a news headline vector through a Bert model, clusters the news headline vectors by using a K-means clustering algorithm, calculates the pairwise similarity of the news headline vectors within any classification label by using a vector similarity function, determines the similar news headline vectors within that classification label, and generates a propagation path from the similar news headline vectors arranged in ascending order of generation time. In this way, the efficiency of searching for similar news headlines is improved, and with it the efficiency of generating news propagation paths.
Drawings
In order to more clearly illustrate the technical solutions in the present application, the drawings required for the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings may be obtained according to these drawings without creative efforts.
Fig. 1 is a schematic flowchart of an embodiment of a news propagation path generation method based on word vectors in the present application;
fig. 2 is a schematic flowchart of another embodiment of a news propagation path generation method based on word vectors according to the present application;
fig. 3 is a schematic structural diagram of a news propagation path generation apparatus based on word vectors in the present application;
fig. 4 is another schematic structural diagram of a news propagation path generation apparatus based on word vectors in the present application;
fig. 5 is a schematic structural diagram of a news propagation path generation apparatus based on word vectors in the present application.
Detailed Description
The application provides a news propagation path generation method based on word vectors and a related device, which can improve the efficiency of searching similar news titles and further improve the generation efficiency of news propagation paths.
The news propagation path generation method based on the word vector is suitable for a server or a system. The method is specifically described below as applied to a server.
Referring to fig. 1, fig. 1 is a schematic flowchart of an embodiment of a news propagation path generation method based on word vectors according to the present application, where the news propagation path generation method based on word vectors includes:
101. Acquiring a plurality of news headlines.
The server downloads a plurality of news headlines or contents from third-party network elements through a crawler program and stores them in a MySQL database. They may also be stored in a MariaDB database or another relational database; the specific type of database used to store the news headlines or contents is not limited here.
102. Vectorizing and mapping any news headline into a news headline vector through a Bert model.
Vectorizing and mapping any news headline into a news headline vector through the Bert model specifically means mapping a piece of text into a one-dimensional vector of fixed length through the Bert service of Google's pre-trained language model.
The Bert model converts news headlines and main content into vectors at the semantic level, which improves matching accuracy.
The length of the mapped news headline vector is not prescribed; it is a parameter tuned according to the actual news samples and the precision the application requires. Specifically, the mapped news headline vector may be a fixed-length 768-dimensional vector. The following is example script code for mapping a news headline into a fixed-length 768-dimensional news headline vector:
[Example script code published as image figures BDA0003927307220000041 to BDA0003927307220000061; not reproduced in this text.]
103. Clustering the news headline vectors by using a K-means clustering algorithm to obtain a preset number K of classification labels, wherein any news headline vector corresponds to one classification label, and any classification label comprises a plurality of news headline vectors.
The specific function of the K-means clustering algorithm is to partition the news headline vectors into several classes, so as to reduce the complexity of the subsequent KNN (nearest-neighbor) matching.
The preset number K is not prescribed; it is tuned according to the actual news samples and the timeliness required of the computation. Specifically, K may be 32; K may also be another value larger than 2, such as 33 or 34, which is not specifically limited here. This embodiment uses the sklearn algorithm package for Python; with K set to 32, specific example script code is as follows:
[Example script code published as image figures BDA0003927307220000062 and BDA0003927307220000071; not reproduced in this text.]
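Because the original script is available only as an image, the clustering step can be illustrated with a minimal sketch. It is not the applicant's sklearn-based code: it is a plain standard-library implementation of Lloyd's K-means, and the function name `kmeans`, the evenly spaced initialization, and the 2-dimensional toy vectors are all assumptions made for illustration (real headline vectors would be 768-dimensional, and K would be 32).

```python
def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's K-means; returns one classification label per vector."""
    # Assumption of this sketch: deterministic start from evenly spaced inputs.
    centroids = [list(vectors[i * len(vectors) // k]) for i in range(k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        for i, v in enumerate(vectors):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy 2-dimensional stand-ins for headline vectors, forming two groups.
toy_vectors = [
    [0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
    [5.0, 5.1], [5.1, 5.0], [4.9, 5.0],
]
toy_labels = kmeans(toy_vectors, k=2)
```

Each vector receives exactly one label, which matches the requirement that any news headline vector corresponds to one classification label.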
104. Performing pairwise similarity calculation on the news headline vectors within any classification label by using a vector similarity function, so as to determine the similar news headline vectors within that classification label, wherein the similar news headline vectors are the news headline vectors whose similarity is greater than or equal to a preset similarity threshold.
It can be understood that the vector similarity function may be a cosine similarity function or another vector similarity function, which is not specifically limited here. Cosine similarity evaluates the similarity of two vectors by calculating the cosine of the angle between them. The preset similarity threshold is not prescribed; it is tuned according to the actual news samples and the precision the application requires, and its value range is (0, 1]. For example, the preset similarity threshold may be set to 0.96: when the cosine of the angle between two news headline vectors is greater than or equal to 0.96, the two news headline vectors are considered similar and can be assigned to the same group.
The aforementioned K-means clustering reduces the amount of computation needed when applying the vector similarity function. For example, matching 10,000 news items pairwise without clustering requires 10,000 × 10,000 = 100 million comparisons. If K-means clustering divides the news into 100 classes of 100 items each, the number of comparisons becomes 100 × (100 × 100) = 1 million, a 100-fold reduction.
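The arithmetic in this example can be checked with a short sketch. The function names are hypothetical, and the count follows the text's convention of n × n comparisons for n items (counting only unordered distinct pairs would give n(n-1)/2, with a similar reduction ratio).

```python
def comparisons_without_clustering(n):
    """Every item is matched against every item: n * n comparisons."""
    return n * n


def comparisons_with_clustering(cluster_sizes):
    """Matching happens only inside each cluster: sum of size * size."""
    return sum(size * size for size in cluster_sizes)


full = comparisons_without_clustering(10_000)          # 10,000 x 10,000
clustered = comparisons_with_clustering([100] * 100)   # 100 clusters of 100
reduction = full // clustered
```

The reduction holds exactly only when the clusters are of equal size; uneven clusters would reduce the count by less.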
Optionally, after the first news headline vector in the classification label is scanned and matched with other news headline vectors to determine a similar news headline vector, the first news headline vector is eliminated in the process of the similarity calculation, so that the first news headline vector does not participate in subsequent traversal query and scanning matching, and the program execution efficiency can be improved.
Example script program code is specifically as follows:
[Example script code published as image figures BDA0003927307220000081 to BDA0003927307220000101; not reproduced in this text.]
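Since the original script survives only as an image, the matching step can be sketched as follows. This is an illustration under stated assumptions, not the applicant's code: the names `cosine_similarity` and `group_similar` are hypothetical, and the sketch removes both the scanned headline and its matches from further scanning, whereas the description only states that the scanned headline itself is eliminated.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def group_similar(cluster, threshold=0.96):
    """cluster: list of (headline_id, vector) sharing one classification label.

    Returns groups of headline ids whose similarity to the scanned headline
    is greater than or equal to the threshold.
    """
    remaining = list(cluster)
    groups = []
    while remaining:
        # Scan the first headline against the rest, then drop it for good.
        anchor_id, anchor_vec = remaining.pop(0)
        group, unmatched = [anchor_id], []
        for other_id, other_vec in remaining:
            if cosine_similarity(anchor_vec, other_vec) >= threshold:
                group.append(other_id)
            else:
                unmatched.append((other_id, other_vec))
        remaining = unmatched  # assumption: matched headlines are dropped too
        if len(group) > 1:
            groups.append(group)
    return groups


toy_cluster = [("a", [1.0, 0.0]), ("b", [0.99, 0.05]), ("c", [0.0, 1.0])]
toy_groups = group_similar(toy_cluster)
```

Shrinking the list after each scan means later headlines are compared against fewer candidates, which is the efficiency gain the description refers to.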
105. Generating a propagation path from the similar news headline vectors within any classification label, arranged in ascending order of generation time, wherein the news headlines corresponding to the similar news headline vectors have different generation times stored in advance.
It can be understood that every news headline obtained by the server has a preset generation time, create_time. The similar news headline vectors within a classification label represent news items on the same topic that belong to the same propagation path, so the propagation path can be generated by arranging them in ascending order of create_time.
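The ordering step can be sketched in a few lines. Only the `create_time` field comes from the description; the record layout, the `source` field, and the sample data are hypothetical.

```python
from datetime import datetime


def build_propagation_path(similar_news):
    """Order similar news items by ascending create_time."""
    return sorted(similar_news, key=lambda item: item["create_time"])


# Hypothetical records for one group of similar headlines.
similar_news = [
    {"source": "county site", "create_time": datetime(2022, 11, 3, 9, 0)},
    {"source": "original publisher", "create_time": datetime(2022, 11, 1, 8, 0)},
    {"source": "provincial site", "create_time": datetime(2022, 11, 2, 14, 30)},
]
path = [item["source"] for item in build_propagation_path(similar_news)]
```

The earliest item is treated as the origin of the path, and each later item as a subsequent reprint.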
In this embodiment, a server obtains a plurality of news titles, vectorizes and maps any news title into a news title vector through a Bert model, clusters the news title vector by using a K-means clustering algorithm, calculates the similarity of each two of the news title vectors in any classification label by using a vector similarity function, determines the similar news title vector in any classification label, and generates a propagation path according to the ascending order arrangement of the similar news title vector and the generation time in any classification label.
Referring to fig. 2, fig. 2 is a schematic flowchart illustrating another embodiment of a method for generating a news propagation path based on word vectors, where the method for generating a news propagation path based on word vectors includes:
201. Acquiring a plurality of news headlines.
Step 201 in this embodiment is similar to step 101 in the embodiment shown in fig. 1, and is not described here again.
202. The news headlines are filtered.
A rule-based filter may be added after the plurality of news headlines are obtained, to remove the large amount of invalid news among them, for example news with an overlong title, no news subject, a garbled title, or outdated content, thereby improving matching efficiency.
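A rule-based filter of this kind might look like the following sketch. The description does not give concrete rules, so the length limit, the age limit, and the use of the Unicode replacement character as a sign of a garbled title are all assumptions; the "no news subject" rule is omitted because the description does not say how it would be detected.

```python
from datetime import datetime, timedelta

MAX_TITLE_LENGTH = 60          # hypothetical limit for an "overlong" title
MAX_AGE = timedelta(days=30)   # hypothetical cutoff for "outdated" news


def is_valid_headline(title, create_time, now):
    """Return True if the headline passes every rule."""
    if not title.strip():
        return False                       # empty title
    if len(title) > MAX_TITLE_LENGTH:
        return False                       # overlong title
    if "\ufffd" in title:
        return False                       # garbled title (decoding failure)
    if now - create_time > MAX_AGE:
        return False                       # outdated news
    return True


now = datetime(2022, 11, 4)
candidates = [
    ("City opens new library", datetime(2022, 11, 3)),
    ("x" * 61, datetime(2022, 11, 3)),
    ("\ufffd\ufffd news", datetime(2022, 11, 3)),
    ("Old story", datetime(2022, 1, 1)),
]
kept = [title for title, created in candidates
        if is_valid_headline(title, created, now)]
```

Running the cheap rules before vectorization keeps invalid headlines out of the more expensive Bert and clustering steps.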
203. And acquiring a new news title.
The news headlines obtained from the third-party network elements on a second fetch may partly repeat the headlines obtained on the first fetch. A repeated headline, such as news headline A, has already gone through vectorization, clustering and similarity matching and has been assigned to a specific news propagation path; when a later fetch still contains news headline A, these steps need not be repeated for it. The headlines obtained on each later fetch are therefore compared with those already processed, and only the newly added news headlines are extracted.
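The comparison against already-processed headlines can be sketched as a set lookup. The description does not specify the comparison key, so matching on the exact title string is an assumption, as is the function name `extract_new_headlines`.

```python
def extract_new_headlines(fetched, processed):
    """Return headlines from the latest fetch that were not processed before.

    Order is preserved and duplicates within the fetch are dropped.
    """
    seen = set(processed)
    new_headlines = []
    for title in fetched:
        if title not in seen:
            seen.add(title)
            new_headlines.append(title)
    return new_headlines


already_processed = {"Headline A"}
latest_fetch = ["Headline A", "Headline B", "Headline C", "Headline B"]
new_only = extract_new_headlines(latest_fetch, already_processed)
```

Only the headlines in `new_only` would then be passed on to vectorization in step 204.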
204. And vectorizing and mapping any newly-added news headline into a news headline vector through a Bert model.
Step 204 in this embodiment is similar to step 102 in the embodiment shown in fig. 1, and is not repeated herein.
205. And clustering the news headline vectors by adopting a K-means clustering algorithm to obtain a preset number of K classification labels, wherein any news headline vector corresponds to one classification label, and any classification label comprises a plurality of news headline vectors.
Step 205 in this embodiment is similar to step 103 in the embodiment shown in fig. 1, and is not described here again.
206. And performing similarity calculation on every two of the plurality of news title vectors in any classification label by adopting a vector similarity function so as to determine similar news title vectors in any classification label, wherein the similar news title vectors are news title vectors which are greater than or equal to a preset similarity threshold value in the news title vectors.
Step 206 in this embodiment is similar to step 104 in the embodiment shown in fig. 1, and is not repeated herein.
207. And generating a propagation path according to the ascending sequence of the similar news headline vectors and the generation time in any classification label, wherein the news headlines corresponding to any similar news headline vector store different generation times in advance.
Step 207 in this embodiment is similar to step 105 in the embodiment shown in fig. 1, and is not described again here.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a news propagation path generating device based on word vectors, where the news propagation path generating device based on word vectors includes:
a first acquisition unit 301 configured to acquire a plurality of news headlines;
a vectorization unit 302, configured to vectorize and map any news title into a news title vector through a Bert model;
the clustering unit 303 is configured to perform clustering processing on the news headline vectors by using a K-means clustering algorithm to obtain a preset number K of classification tags, where any news headline vector corresponds to one classification tag, and any classification tag includes a plurality of news headline vectors;
a determining unit 304, configured to perform similarity calculation on every two of the multiple news headline vectors in any classification tag by using a vector similarity function, and determine similar news headline vectors in any classification tag, where the similar news headline vectors are news headline vectors that are greater than or equal to a preset similarity threshold value among the news headline vectors;
the generating unit 305 is configured to generate a propagation path according to the ascending order of the similar news headline vectors and the generation time in any classification label, where the news headline corresponding to any similar news headline vector stores different generation times in advance.
In the device of this embodiment, the functions of each unit correspond to the steps in the method embodiment shown in fig. 1, and are not described herein again.
In this embodiment, the first obtaining unit 301 obtains a plurality of news headlines; the vectorization unit 302 vectorizes and maps any news headline into a news headline vector through a Bert model; the clustering unit 303 clusters the news headline vectors by using a K-means clustering algorithm; the determination unit 304 calculates the pairwise similarity of the news headline vectors within any classification label by using a vector similarity function and determines the similar news headline vectors within that classification label; and the generation unit 305 generates a propagation path from the similar news headline vectors within any classification label, arranged in ascending order of generation time. Through these units, the efficiency of searching for similar news headlines is improved, and with it the efficiency of generating news propagation paths.
Referring to fig. 4, fig. 4 is another schematic structural diagram of a news propagation path generating device based on word vectors, the news propagation path generating device based on word vectors includes:
a first acquisition unit 401 configured to acquire a plurality of news titles;
a vectorization unit 402, configured to vectorize and map any news title into a news title vector through a Bert model;
a clustering unit 403, configured to perform clustering processing on the news headline vectors by using a K-means clustering algorithm to obtain a preset number K of classification tags, where any news headline vector corresponds to one classification tag, and any classification tag includes multiple news headline vectors;
a determining unit 404, configured to perform similarity calculation on every two of the multiple news headline vectors in any classification tag by using a vector similarity function, and determine a similar news headline vector in any classification tag, where the similar news headline vector is a news headline vector that is greater than or equal to a preset similarity threshold in the news headline vectors;
the generating unit 405 is configured to generate a propagation path according to the ascending order of the similar news headline vectors and the generation time in any classification tag, where the news headlines corresponding to any similar news headline vector store different generation times in advance.
Optionally, the apparatus for generating a news propagation path based on word vectors further includes:
a second obtaining unit 406, configured to obtain a new news title;
the vectorization unit 402 is specifically configured to vectorize and map any newly added news headline into a news headline vector through the Bert model.
Optionally, the news propagation path generating apparatus based on word vectors further includes:
and a filtering unit 407, configured to perform filtering processing on the news headlines.
In the device of this embodiment, the functions of each unit correspond to the steps in the method embodiment shown in fig. 2, and are not described herein again.
Referring to fig. 5, the news propagation path generating apparatus based on word vectors provided in the present application includes: a central processing unit 502, a memory 501, an input/output interface 503, a wired or wireless network interface 504 and a power supply 505;
the memory 501 is a transient storage memory or a persistent storage memory;
the central processor 502 is configured to communicate with the memory 501 and execute the instruction operations in the memory 501 to perform the steps in the embodiments shown in fig. 1-2.
The present application also provides a computer-readable storage medium comprising instructions which, when executed on a computer, cause the computer to perform the steps of the aforementioned embodiments of fig. 1-2.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one type of logical functional division, and other divisions may be realized in practice, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.

Claims (9)

1. A news propagation path generation method based on word vectors is characterized by comprising the following steps:
acquiring a plurality of news headlines;
vectorizing and mapping any news title into a news title vector through a Bert model;
clustering the news headline vectors by adopting a K-means clustering algorithm to obtain a preset number K of classification labels, wherein any one news headline vector corresponds to one classification label, and any one classification label comprises a plurality of news headline vectors;
performing similarity calculation on every two of the plurality of news title vectors in any classification label by adopting a vector similarity function, and determining similar news title vectors in any classification label, wherein the similar news title vectors are news title vectors which are larger than or equal to a preset similarity threshold value in the news title vectors;
and generating a propagation path according to the ascending sequence of the similar news title vectors and the generation time in any classification label, wherein the news titles corresponding to any similar news title vector store different generation times in advance.
2. The method of claim 1, wherein after the obtaining of the plurality of news headlines, the method further comprises:
acquiring a newly added news title;
the vectorizing mapping of any of the news headlines into a news headline vector through the Bert model includes:
vectorizing and mapping any one of the newly-added news headlines into a news headline vector through a Bert model.
3. The method of generating news propagation paths based on word vectors as claimed in claim 1, wherein the news headline vectors are fixed length 768-dimensional vectors.
4. The method of claim 1, wherein after the obtaining of the plurality of news headlines, the method further comprises:
and filtering the news headlines.
5. The news propagation path generation method based on word vectors according to any one of claims 1 to 4, wherein the preset similarity threshold is 0.96.
6. The news propagation path generation method based on word vectors according to any one of claims 1 to 5, wherein the preset number K of classification labels is 32.
7. A news propagation path generation apparatus based on word vectors, comprising:
a first acquisition unit configured to acquire a plurality of news headlines;
a vectorization unit configured to vectorize and map any news headline into a news headline vector through a BERT model;
a clustering unit configured to cluster the news headline vectors with a K-means clustering algorithm to obtain a preset number K of classification labels, wherein each news headline vector corresponds to one classification label, and each classification label comprises a plurality of news headline vectors;
a determining unit configured to perform pairwise similarity calculation on the plurality of news headline vectors in any classification label using a vector similarity function, so as to determine similar news headline vectors in that classification label, wherein the similar news headline vectors are news headline vectors whose similarity is greater than or equal to a preset similarity threshold;
and a generating unit configured to generate a propagation path from the similar news headline vectors in any classification label in ascending order of generation time, wherein the news headline corresponding to each similar news headline vector is stored in advance with a different generation time.
8. A news propagation path generation apparatus based on word vectors, comprising:
a central processing unit, a memory, an input/output interface, a wired or wireless network interface, and a power supply;
wherein the memory is a transient memory or a persistent memory;
and the central processing unit is configured to communicate with the memory and execute the instructions in the memory to perform the method of any one of claims 1 to 6.
9. A computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the method of any of claims 1 to 6.
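
The similarity and path-generation steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. It assumes that the headline vectors and their K-means classification labels have already been computed upstream, and it uses cosine similarity as one possible "vector similarity function" (the claims do not name a specific function). The record keys "title", "vector", "label" and "time" are hypothetical, and the sketch reads the claim as placing every headline in a label that has at least one partner at or above the threshold on a single path for that label.

```python
import math
from collections import defaultdict
from itertools import combinations

SIMILARITY_THRESHOLD = 0.96  # value recited in claim 5


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def propagation_paths(headlines, threshold=SIMILARITY_THRESHOLD):
    """Return {label: [titles in ascending order of generation time]}.

    Each headline is a dict with the hypothetical keys
    "title", "vector", "label" (cluster id) and "time" (sortable timestamp).
    """
    # Group headlines by their precomputed classification label.
    by_label = defaultdict(list)
    for h in headlines:
        by_label[h["label"]].append(h)

    paths = {}
    for label, group in by_label.items():
        similar = set()  # indices of headlines with at least one similar partner
        # Pairwise comparison is restricted to headlines sharing a label.
        for i, j in combinations(range(len(group)), 2):
            if cosine_similarity(group[i]["vector"], group[j]["vector"]) >= threshold:
                similar.update((i, j))
        if similar:
            ordered = sorted((group[i] for i in similar), key=lambda h: h["time"])
            paths[label] = [h["title"] for h in ordered]
    return paths


# Toy 3-dimensional vectors stand in for 768-dimensional BERT embeddings.
example = [
    {"title": "A", "vector": [1.0, 0.0, 0.0], "label": 0, "time": "2022-11-03"},
    {"title": "B", "vector": [0.99, 0.05, 0.0], "label": 0, "time": "2022-11-01"},
    {"title": "C", "vector": [0.0, 1.0, 0.0], "label": 0, "time": "2022-11-02"},
    {"title": "D", "vector": [0.0, 0.0, 1.0], "label": 1, "time": "2022-11-02"},
]
print(propagation_paths(example))  # {0: ['B', 'A']}
```

Because comparisons are made only between headlines that share a label, the number of pairwise similarity calculations is reduced relative to comparing every headline with every other headline, which is the efficiency gain the abstract describes.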
CN202211377457.5A 2022-11-04 2022-11-04 News propagation path generation method based on word vector and related device Pending CN115730589A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211377457.5A CN115730589A (en) 2022-11-04 2022-11-04 News propagation path generation method based on word vector and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211377457.5A CN115730589A (en) 2022-11-04 2022-11-04 News propagation path generation method based on word vector and related device

Publications (1)

Publication Number Publication Date
CN115730589A true CN115730589A (en) 2023-03-03

Family

ID=85294580

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211377457.5A Pending CN115730589A (en) 2022-11-04 2022-11-04 News propagation path generation method based on word vector and related device

Country Status (1)

Country Link
CN (1) CN115730589A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117391071A (en) * 2023-12-04 2024-01-12 中电科大数据研究院有限公司 News topic data mining method, device and storage medium
CN117391071B (en) * 2023-12-04 2024-02-27 中电科大数据研究院有限公司 News topic data mining method, device and storage medium

Similar Documents

Publication Publication Date Title
CN111444320A (en) Text retrieval method and device, computer equipment and storage medium
US11609748B2 (en) Semantic code search based on augmented programming language corpus
CN112115232A (en) Data error correction method and device and server
CN112883165B (en) Intelligent full-text retrieval method and system based on semantic understanding
CN111506621A (en) Data statistical method and device
CN111274822A (en) Semantic matching method, device, equipment and storage medium
CN115730589A (en) News propagation path generation method based on word vector and related device
CN115905630A (en) Graph database query method, device, equipment and storage medium
CN115858773A (en) Keyword mining method, device and medium suitable for long document
CN116610304B (en) Page code generation method, device, equipment and storage medium
CN113254649A (en) Sensitive content recognition model training method, text recognition method and related device
CN110209895B (en) Vector retrieval method, device and equipment
CN112765976A (en) Text similarity calculation method, device and equipment and storage medium
CN111831624A (en) Data table creating method and device, computer equipment and storage medium
CN111597336A (en) Processing method and device of training text, electronic equipment and readable storage medium
CN112287005B (en) Data processing method, device, server and medium
CN116822491A (en) Log analysis method and device, equipment and storage medium
CN113869408A (en) Classification method and computer equipment
CN113886520A (en) Code retrieval method and system based on graph neural network and computer readable storage medium
CN112732743A (en) Data analysis method and device based on Chinese natural language
CN112988778A (en) Method and device for processing database query script
CN111460088A (en) Similar text retrieval method, device and system
CN112148751A (en) Method and device for querying data
CN116383655B (en) Sample generation method, model training method, text processing method and device
US20230394021A1 (en) Computing similarity of tree data structures using metric functions defined on sets

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination