CN112580355B - News information topic detection and real-time aggregation method - Google Patents
- Publication number
- CN112580355B (application CN202011613849.8A)
- Authority
- CN
- China
- Prior art keywords
- real
- task
- text
- time
- news information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention belongs to the technical field of natural language processing, and particularly relates to a news information topic detection and real-time aggregation method. The method achieves real-time pushing of news information through data acquisition, data processing, text fusion model construction and real-time aggregation. On the basis of a text feature model constructed with a multi-feature fusion method, a distributed real-time streaming computation method distributes topic clustering tasks to different computing nodes, which improves the accuracy and real-time performance of news information aggregation and resolves the performance problem of a single node. Finally, the news information aggregation result can be pushed to end users through a terminal device, which makes the method convenient and practical.
Description
Technical Field
The invention belongs to the technical field of natural language processing, and particularly relates to a multi-data news information topic detection and real-time aggregation method.
Background
The continuous innovation and rapid development of information technology have profoundly influenced news dissemination. Internet-based media platforms keep multiplying, the speed and volume of news dissemination grow day by day, and online news information has become disorderly: different media platforms forward and copy the same news, and the homogenization problem is serious. Therefore, how to use computer technology to automatically mine and analyze the hot topics currently worth attention in a vast sea of information, and to display the aggregated hot news to users comprehensively and in real time, is a research hotspot in current online news. In addition, as the scale of online news data increases rapidly, the original serialized topic discovery and tracking methods often cannot execute effectively on massive news data sets because of limits such as memory capacity, and they struggle to meet requirements such as timeliness.
Disclosure of Invention
To address the problem that news data is growing sharply, and that conventional serialized topic discovery and tracking methods often cannot execute effectively on massive news data sets because of limits such as memory capacity and struggle to meet requirements such as timeliness, the invention provides a topic detection and aggregation method for multi-data news information. On the basis of a text feature model constructed with a multi-feature fusion method, the method distributes topic clustering tasks to different computing nodes, which improves the accuracy and real-time performance of real-time news aggregation and resolves the performance problem of a single node.
The technical scheme adopted by the invention for solving the technical problems is as follows: a news information topic detection and real-time aggregation method comprises the following steps:
step one, distributed data acquisition: collecting news information from an internet news media website in real time through a distributed collection program to serve as original data;
step two, data preprocessing: performing text denoising, Chinese word segmentation, stop-word filtering, part-of-speech tagging, keyword extraction and named entity recognition on the original data to obtain a document set D to be processed;
step three, constructing a text feature model: the text feature model is constructed by utilizing a multi-feature fusion method, and the model construction method comprises the following steps:
(1) obtaining topic features of the text by combining named entity recognition with an LDA model, receiving the document set D as input, and calculating the topic similarity $\mathrm{sim}(p,q)_{lda}$ of texts p and q,
In the formula: p and q are the probability vectors of the two texts, and $D_{KL}$ is the vector distance calculated using relative entropy;
(2) obtaining semantic features of the text by using a Word2Vec model, and calculating the semantic similarity $\mathrm{sim}(p,q)_{v2q}$ of text p and text q by cosine similarity,
(3) fusing the topic features and the semantic features with weighting factors to obtain a text fusion model,
$\mathrm{sim}(p,q)=\alpha\cdot\mathrm{sim}(p,q)_{lda}+\beta\cdot\mathrm{sim}(p,q)_{v2q}$
in the formula: α and β are weighting factors, and α + β = 1;
(4) adding a time attenuation factor to the text fusion model to update the model, the updated text similarity being calculated as,
$\mathrm{sim}(p,q)=e^{-k(t_2-t_1)}\cdot\alpha\cdot\mathrm{sim}(p,q)_{lda}+e^{-k(t_2-t_1)}\cdot\beta\cdot\mathrm{sim}(p,q)_{v2q}$
in the formula: k is the attenuation factor, and $t_2$ and $t_1$ are the update times of the two articles;
step four, distributed real-time clustering: the method for clustering news information in real time by adopting a distributed real-time clustering algorithm comprises the following steps:
(1) vectorizing the collected and preprocessed text, transferring the vector data to task scheduling nodes of a distributed real-time aggregation algorithm according to an input sequence, uniformly numbering the tasks by the task scheduling nodes, and then issuing the tasks to task execution nodes;
(2) traversing the feature vectors of the text by the task execution nodes, and calculating the similarity of each vector and other vectors of the calculation nodes according to the updated text fusion model to obtain a candidate similarity set;
(3) selecting the maximum similarity from the similarity candidate set and recording the feature vector corresponding to the maximum similarity to form a feature vector similarity set;
(4) filtering combinations with the similarity smaller than a specified threshold value from the feature vector similarity set to obtain a filtering set, and outputting the result to the message middleware;
(5) taking out the filtering sets from the message middleware, merging the sets that share the same text, and outputting the result until no cluster is updated any further, thereby obtaining the real-time clustered news information.
Step five, real-time pushing: pushing the real-time clustered news information to the user in real time through a visualization tool.
In the above news information topic detection and real-time aggregation method, the internet news media data is various news information from various media platforms.
According to the news information topic detection and real-time aggregation method, in step one, data acquisition adopts a distributed architecture design in which a task generation module generates the acquisition tasks and a task execution module executes them.
According to the news information topic detection and real-time aggregation method, message middleware can be arranged between the task generation module and the task execution module, and the two modules are respectively in communication connection with the message middleware to finish data transmission.
According to the news information topic detection and real-time aggregation method, the distributed acquisition program comprises a task scheduling center and task acquisition nodes, wherein the task scheduling center acquires tasks from the task list and issues them to specific task acquisition nodes through the message middleware to generate the corresponding acquisition tasks to be executed; the task acquisition node is used for executing an acquisition task and downloading the news data of the page.
The invention has the beneficial effects that:
The invention constructs a text feature model using a multi-feature fusion method, obtains the topic features of the text using named entity recognition combined with an LDA model, and fully considers named entity factors and time factors to build a framework for text similarity calculation.
The invention adopts a distributed real-time clustering algorithm to distribute topic clustering tasks to different computing nodes, improves the accuracy and real-time performance of news information real-time aggregation, and solves the performance problem under a single node.
According to the topic detection and real-time aggregation method for multi-data news information, real-time pushing of news information is achieved through data acquisition, data processing, text fusion model construction and real-time aggregation. During data acquisition, a task generation module generates the acquisition tasks and a task execution module executes them; a scheduling program can dynamically expand or reduce the resources of the two modules according to the task volume without affecting the normal operation of the system, which ensures acquisition efficiency.
Drawings
FIG. 1 is a schematic view of the overall process of the present invention.
Fig. 2 is a schematic diagram of data acquisition processing according to the present invention.
Detailed Description
The invention provides a topic detection and aggregation method for multi-data news information on the basis of constructing a text feature model by using a multi-feature fusion method, and topic clustering tasks are distributed to different computing nodes, so that the accuracy and the real-time performance of news information real-time aggregation are improved, and the performance problem under a single node is solved. The invention is further illustrated with reference to the following figures and examples.
As shown in fig. 1, the news information topic detection and real-time aggregation method of the present invention includes the following steps.
Step one, distributed data acquisition: the news information from the internet news media website is collected in real time through a distributed collection program to serve as original data.
Data acquisition comprises the following steps:
(1) generating acquisition tasks: corresponding acquisition tasks are generated according to the data volume of the data source and transmitted to the message middleware;
(2) receiving and executing acquisition tasks: the acquisition tasks are received from the message middleware, and data is collected according to them to obtain first data.
The internet news media data is news information of various kinds from various media platforms (including but not limited to the websites of traditional news media in various cities and internet news sites). In the specific implementation, a script can be used as the framework of the acquisition program: a task acquisition module extracts tasks according to the initialized data sources and task extraction rules and writes the parsed tasks to Kafka as acquisition tasks; the acquisition module reads the tasks from Kafka, performs data acquisition, and completes preprocessing and storage. During implementation, a scheduling program can dynamically start and suspend some of the task acquisition or execution nodes according to the number of tasks pending in Kafka.
Alternatively, a distributed architecture design is adopted in which the task generation module generates the acquisition tasks and the task execution module executes them. A message middleware can be arranged between the task generation module and the task execution module, with both modules communicatively connected to the message middleware to complete data transmission. The scheduling program can dynamically expand or reduce the resources of the task generation module and the task execution module according to the task volume without affecting the normal operation of the system, which ensures acquisition efficiency.
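The split between task generation and task execution described above can be sketched as follows. This is a minimal single-process illustration under stated assumptions: an in-memory `queue.Queue` stands in for the message middleware (Kafka in the embodiment), threads stand in for execution nodes, and `generate_tasks`, `run_worker`, `fake_fetch` and `collect` are hypothetical names, not part of the described system.

```python
import queue
import threading


def generate_tasks(sources, task_queue):
    """Task generation module: one acquisition task per page of each source."""
    for source in sources:
        for page in range(1, source["pages"] + 1):
            task_queue.put({"site": source["site"], "page": page})


def fake_fetch(task):
    """Placeholder for a real page download (assumption for illustration)."""
    return f"{task['site']}/list?page={task['page']}"


def run_worker(task_queue, results, fetch):
    """Task execution module: pull tasks until the queue is drained."""
    while True:
        try:
            task = task_queue.get_nowait()
        except queue.Empty:
            return
        results.append(fetch(task))
        task_queue.task_done()


def collect(sources, worker_count=3, fetch=fake_fetch):
    task_queue = queue.Queue()  # stand-in for the message middleware
    results = []
    generate_tasks(sources, task_queue)
    workers = [
        threading.Thread(target=run_worker, args=(task_queue, results, fetch))
        for _ in range(worker_count)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results
```

Because the generator and the workers only share the queue, the number of workers (`worker_count`) can be changed without touching the generation side, which mirrors the scaling behaviour the paragraph describes.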
Step two, data preprocessing: performing text denoising, Chinese word segmentation, stop-word filtering, part-of-speech tagging, keyword extraction and named entity recognition on the original data to obtain a document set D to be processed.
For Chinese word segmentation, because current Chinese word segmentation technology is relatively mature and the mainstream segmentation tools give similar results, an open-source Chinese word segmentation tool such as jieba can be used directly in the implementation.
For stop-word filtering, on the basis of a common natural language processing stop-word set, new stop words can be added to the stop-word library in light of the characteristics of news aggregation.
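The stop-word step can be sketched as below, assuming the text has already been segmented into tokens (in the embodiment this would be done by a Chinese segmenter such as jieba). The two word sets and the function names are hypothetical, and English tokens are used only to keep the example readable.

```python
# Hypothetical general-purpose stop-word set.
BASE_STOP_WORDS = {"the", "of", "and", "a", "in"}
# Hypothetical news-specific additions to the stop-word library.
NEWS_STOP_WORDS = {"reporter", "reported", "according", "source"}


def build_stop_words(base, extra):
    """Extend a common stop-word set with domain-specific entries."""
    return set(base) | set(extra)


def filter_tokens(tokens, stop_words):
    """Drop empty tokens and tokens found in the stop-word library."""
    return [t for t in tokens if t.strip() and t.lower() not in stop_words]
```

For example, filtering `["The", "reporter", "said", "floods", "hit", "the", "city"]` with the combined set keeps only `said`, `floods`, `hit` and `city`.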
Step three, constructing a text feature model: constructing a text feature model by using a multi-feature fusion method, wherein the model construction method comprises the following steps:
(1) obtaining topic features of the text by combining named entity recognition with an LDA model, receiving the document set D as input, and calculating the topic similarity $\mathrm{sim}(p,q)_{lda}$ of texts p and q,
In the formula: p and q are the probability vectors of the texts, and $D_{KL}$ is the vector distance calculated using relative entropy.
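A minimal sketch of a relative-entropy-based topic similarity follows. It assumes that the two LDA topic distributions are compared with the Jensen-Shannon divergence (a symmetric combination of two $D_{KL}$ terms) and that the similarity is one minus that divergence. Both choices are assumptions made for illustration, and `kl_divergence` and `topic_similarity` are hypothetical names.

```python
import math


def kl_divergence(p, q):
    """Relative entropy D_KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def topic_similarity(p, q):
    """Assumed form: 1 - Jensen-Shannon divergence of two topic distributions."""
    # The midpoint m is positive wherever p or q is positive,
    # so neither KL term divides by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    js = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return 1.0 - js
```

With base-2 logarithms the divergence lies in [0, 1], so identical distributions score 1 and distributions with no shared topic score 0.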
(2) Obtaining semantic features of the text by using a Word2Vec model, and calculating the semantic similarity $\mathrm{sim}(p,q)_{v2q}$ of text p and text q by cosine similarity,
$\mathrm{sim}(p,q)_{v2q}=\dfrac{\sum_{i} p_i q_i}{\sqrt{\sum_{i} p_i^{2}}\,\sqrt{\sum_{i} q_i^{2}}}$
In the formula: $p_i$ and $q_i$ are the i-th components of the vectors representing text p and text q, respectively.
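The cosine similarity in step (2) can be sketched as follows. How word vectors are combined into a text vector is not specified here, so averaging them in `average_vector` is an assumption for illustration, and both function names are hypothetical.

```python
import math


def average_vector(word_vectors):
    """Assumed text representation: component-wise mean of the word vectors."""
    count = len(word_vectors)
    return [sum(column) / count for column in zip(*word_vectors)]


def cosine_similarity(p, q):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(pi * qi for pi, qi in zip(p, q))
    norm_p = math.sqrt(sum(pi * pi for pi in p))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_p * norm_q)
```

Cosine similarity depends only on direction, so a vector and any positive multiple of it score 1, while orthogonal vectors score 0.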
The significance of the Word2Vec model (word vectors) is that it converts natural language into vectors that a computer can process. Word2Vec is a word vector computation model proposed by Google. The Word2Vec tool mainly comprises two models: the continuous bag-of-words model (CBOW) and the skip-gram model. CBOW obtains word vectors by training to predict a target word from its context, whereas skip-gram obtains word vectors by training to predict the surrounding words from the target word. In the specific implementation, because skip-gram performs well on large-scale corpora, skip-gram is adopted to construct the word vectors, the news2016zh corpus is used as the training corpus, and the trained model is used to represent the texts.
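The difference between the two training objectives can be made concrete by listing the training examples each one draws from a sentence. The sketch below only builds those examples and does not train a model; the window size and the function names are assumptions for illustration.

```python
def skipgram_pairs(tokens, window=1):
    """Skip-gram: (target word, one surrounding word) pairs."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


def cbow_pairs(tokens, window=1):
    """CBOW: (all context words, target word) pairs."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
        if context:
            pairs.append((tuple(context), target))
    return pairs
```

For the tokens `flood`, `hits`, `city` with a window of 1, skip-gram yields four (target, neighbour) pairs, while CBOW yields three (context, target) pairs, one per position.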
(3) Fusing the topic features and the semantic features with weighting factors to obtain a text fusion model,
$\mathrm{sim}(p,q)=\alpha\cdot\mathrm{sim}(p,q)_{lda}+\beta\cdot\mathrm{sim}(p,q)_{v2q}$
in the formula: α and β are weighting factors, and α + β = 1.
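The weighted fusion in step (3) can be sketched as a one-line convex combination. The default weight and the function name are hypothetical; the constraint α + β = 1 is enforced by deriving β from α.

```python
def fused_similarity(sim_lda, sim_v2q, alpha=0.5):
    """Weighted fusion of topic and semantic similarity with alpha + beta = 1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    beta = 1.0 - alpha
    return alpha * sim_lda + beta * sim_v2q
```

With α = 0.7, a topic similarity of 0.8 and a semantic similarity of 0.6, the fused value is 0.7 × 0.8 + 0.3 × 0.6 = 0.74.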
(4) Adding a time attenuation factor to the text fusion model to update the model, wherein the similarity of the updated text is calculated as follows:
$\mathrm{sim}(p,q)=e^{-k(t_2-t_1)}\cdot\alpha\cdot\mathrm{sim}(p,q)_{lda}+e^{-k(t_2-t_1)}\cdot\beta\cdot\mathrm{sim}(p,q)_{v2q}$
in the formula: k is the attenuation factor, and $t_2$ and $t_1$ are the update times of the two articles.
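Since both terms of the time-attenuated similarity carry the same exponential factor, it can be computed as a single decay multiplied by the fused score. The sketch below assumes times are given in hours and takes the absolute time difference so that argument order does not matter; both are assumptions for illustration, as are the default values and the function name.

```python
import math


def time_decayed_similarity(sim_lda, sim_v2q, t1, t2, alpha=0.5, k=0.01):
    """Fused similarity scaled by exp(-k * |t2 - t1|)."""
    beta = 1.0 - alpha
    # Assumption: use |t2 - t1| so the result is the same for either order.
    decay = math.exp(-k * abs(t2 - t1))
    return decay * (alpha * sim_lda + beta * sim_v2q)
```

Choosing k = ln 2 / 24 makes the similarity of two articles halve for every 24 hours between their update times, which is one way to reason about a suitable attenuation factor.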
Step four, distributed real-time clustering: the method for clustering news information in real time by adopting a distributed real-time clustering algorithm comprises the following steps:
(1) vectorizing the collected and preprocessed text, transferring the vector data to the task scheduling nodes of the distributed real-time aggregation algorithm in input order, the task scheduling nodes numbering the tasks uniformly and then issuing them to the task execution nodes;
(2) Traversing the feature vectors of the text by the task execution nodes, and calculating the similarity of each vector and other vectors of the calculation nodes according to the updated text fusion model to obtain a candidate similarity set;
(3) selecting the maximum similarity from the similarity candidate set and recording the feature vector corresponding to the maximum similarity to form a feature vector similarity set;
(4) filtering combinations with the similarity smaller than a specified threshold value from the feature vector similarity set to obtain a filtering set, and outputting the result to the message middleware;
(5) taking out the filtering sets from the message middleware, merging the sets that share the same text, and outputting the result until no cluster is updated any further, thereby obtaining the real-time clustered news information.
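Steps (2) to (5) can be sketched on a single node as follows. This is an illustration under stated assumptions rather than the distributed implementation: the message middleware and task scheduling are omitted, the similarity function is passed in by the caller, merging is done with a union-find structure (a technique chosen here, not named in the text), and all function names are hypothetical.

```python
def best_matches(vectors, similarity):
    """Steps (2)-(3): for each text, keep its most similar other text."""
    matches = []
    ids = list(vectors)
    for a in ids:
        candidates = [(similarity(vectors[a], vectors[b]), b) for b in ids if b != a]
        if candidates:
            score, b = max(candidates)
            matches.append((a, b, score))
    return matches


def filter_matches(matches, threshold):
    """Step (4): drop pairs whose similarity is below the threshold."""
    return [(a, b) for a, b, score in matches if score >= threshold]


def merge_pairs(pairs, ids):
    """Step (5): merge pairs that share a text until no cluster changes."""
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), set()).add(i)
    return sorted(sorted(cluster) for cluster in clusters.values())
```

Texts that never pass the threshold stay in singleton clusters, so every input text appears in exactly one output cluster.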
Step five, real-time pushing: pushing the real-time clustered news information to the user in real time through a visualization tool.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and scope of the present invention are intended to be covered thereby.
Claims (5)
1. A news information topic detection and real-time aggregation method is characterized in that: the method comprises the following steps:
step one, distributed data acquisition: collecting news information from an internet news media website in real time through a distributed collection program to serve as original data;
step two, data preprocessing: performing text denoising, Chinese word segmentation, stop-word filtering, part-of-speech tagging, keyword extraction and named entity recognition on the original data to obtain a document set D to be processed;
step three, constructing a text feature model: the text feature model is constructed by utilizing a multi-feature fusion method, and the model construction method comprises the following steps:
(1) obtaining topic features of the text by combining named entity recognition with an LDA model, receiving the document set D as input, and calculating the topic similarity $\mathrm{sim}(p,q)_{lda}$ of texts p and q,
In the formula: p and q are the probability vectors of the texts, and $D_{KL}$ is the vector distance calculated using relative entropy;
(2) obtaining semantic features of the text by using a Word2Vec model, and calculating the semantic similarity $\mathrm{sim}(p,q)_{v2q}$ of text p and text q by cosine similarity,
(3) fusing the topic features and the semantic features with weighting factors to obtain a text fusion model,
$\mathrm{sim}(p,q)=\alpha\cdot\mathrm{sim}(p,q)_{lda}+\beta\cdot\mathrm{sim}(p,q)_{v2q}$
in the formula: α and β are weighting factors, and α + β = 1;
(4) adding a time attenuation factor to the text fusion model to update the model, the updated text similarity being calculated as,
$\mathrm{sim}(p,q)=e^{-k(t_2-t_1)}\cdot\alpha\cdot\mathrm{sim}(p,q)_{lda}+e^{-k(t_2-t_1)}\cdot\beta\cdot\mathrm{sim}(p,q)_{v2q}$
in the formula: k is the attenuation factor, and $t_2$ and $t_1$ are the update times of the two articles;
step four, distributed real-time clustering: the method for clustering news information in real time by adopting a distributed real-time clustering algorithm comprises the following steps:
(1) vectorizing the collected and preprocessed text, transferring the vector data to task scheduling nodes of a distributed real-time aggregation algorithm according to an input sequence, uniformly numbering the tasks by the task scheduling nodes, and then issuing the tasks to task execution nodes;
(2) traversing the feature vectors of the text by the task execution nodes, and calculating the similarity of each vector and other vectors of the calculation nodes according to the updated text fusion model to obtain a candidate similarity set;
(3) selecting the maximum similarity from the similarity candidate set and recording the feature vector corresponding to the maximum similarity to form a feature vector similarity set;
(4) filtering combinations with the similarity smaller than a specified threshold value from the feature vector similarity set to obtain a filtering set, and outputting the result to the message middleware;
(5) taking out a filtering set from the message middleware, merging and outputting the sets with the same text until all clusters are not updated any more, and obtaining real-time clustered news information;
step five, pushing in real time: and pushing the news information clustered in real time to the user in real time through a visualization tool.
2. The news information topic detection and real-time aggregation method as claimed in claim 1, wherein: the internet news media website data is various news information from various media platforms.
3. The news information topic detection and real-time aggregation method as claimed in claim 1, wherein: in step one, data acquisition adopts a distributed architecture design in which a task generation module generates the acquisition tasks and a task execution module executes them.
4. The news information topic detection and real-time aggregation method as claimed in claim 3, wherein: and a message middleware can be arranged between the task generating module and the task executing module, and the two modules are respectively in communication connection with the message middleware to finish data transmission.
5. The news information topic detection and real-time aggregation method as claimed in claim 1, wherein: the distributed acquisition program comprises a task scheduling center and task acquisition nodes, wherein the task scheduling center acquires tasks from a task list and issues the acquisition tasks to the specific task acquisition nodes to generate corresponding acquisition tasks to be executed; the task acquisition node is used for executing an acquisition task and downloading and acquiring page news data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011613849.8A CN112580355B (en) | 2020-12-30 | 2020-12-30 | News information topic detection and real-time aggregation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112580355A (en) | 2021-03-30 |
CN112580355B (en) | 2021-08-31 |
Family
ID=75145101
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011613849.8A Active CN112580355B (en) | 2020-12-30 | 2020-12-30 | News information topic detection and real-time aggregation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112580355B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||