CN111695020A - Hadoop platform-based information recommendation method and system - Google Patents

Publication number
CN111695020A
Authority
CN
China
Prior art keywords
information
text information
text
publisher
key value
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010542277.2A
Other languages
Chinese (zh)
Inventor
张梓光
肖明
张小芳
许宋硕
周敏
鲁虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9532 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

The application relates to an information recommendation method and system based on a Hadoop platform. The method mainly comprises the following steps: (1) acquiring text information and publisher information, denoising the text information, and storing it in an HDFS system; (2) using MapReduce to generate key-value pair lists from the text information and publisher information stored in the HDFS system; (3) performing topic modeling on the key-value pair lists with an LDA topic model; (4) clustering the text information and completing information recommendation according to the clustering result. The text information to be recommended is preliminarily filtered by exploiting Hadoop's distributed information storage, and a mapping relation between information publishers and text information is established. Combining the Hadoop platform with the LDA topic model then filters the text information a second time, so that the topics of the text information can be extracted at a fine granularity, the rate at which the recommendation system accesses text information and the accuracy of text information queries before recommendation are improved, and the effectiveness and accuracy of information recommendation are further ensured.

Description

Hadoop platform-based information recommendation method and system
Technical Field
The invention belongs to the field of data mining, and particularly relates to an information recommendation method and system based on a Hadoop platform.
Background
As internet technology has developed, more and more users browse news online or on mobile devices, and news applications have become one of the most popular internet applications, second only to online music. However, the huge volume of online news causes information overload, so helping users filter news or recommending useful news to them is an important research topic. A large user base involves tens of millions of follow relationships and published articles, and the interaction and reading behaviors among users can reach the billion level. As the number of users, the volume of published articles, and similar data grow sharply, conventional recommendation models and processing methods show the following defects: the accuracy of text data processing declines; the performance of topic mining and information recommendation is insufficient; and the problem of sparse user data is not well solved. Because of these defects, existing recommendation models and processing methods cannot meet users' recommendation needs, which hinders the adoption of news application platforms and in turn lowers user satisfaction.
Disclosure of Invention
Based on this, the invention provides an information recommendation method and system based on a Hadoop platform, which uses the distributed data processing capability of the Hadoop platform to preliminarily filter information before classified recommendation, thereby improving recommendation accuracy and overcoming the defects of the prior art.
The invention provides an information recommendation method based on a Hadoop platform, comprising the following steps:
acquiring text information and corresponding publisher information, denoising the text information, and storing the denoised text information and the publisher information in the HDFS (Hadoop Distributed File System) of a Hadoop platform;
splitting and sorting the text information and publisher information stored in the HDFS by using the MapReduce computing framework to generate a plurality of key-value pairs, each mapping publisher information to the corresponding text information, and merging the key-value pairs of the same publisher to generate a plurality of key-value pair lists;
performing topic modeling on the key-value pair lists by using an LDA topic model to obtain the topic features of each piece of text information, and clustering the text information according to the modeling result of the LDA topic model;
and recommending information to the user according to the clustering result of the text information.
Preferably, the denoising of the text information includes:
converting the text information into a single uniform language.
Preferably, the denoising of the text information further includes:
converting special symbols carried in the text information into words so as to retain the emotional features of the text information.
Preferably, the denoising of the text information further includes:
performing word segmentation on the text information by using the ICTCLAS word segmentation system.
Preferably, the denoising of the text information further includes:
removing stop words from the text information to reduce the storage space the text information occupies in the HDFS.
Preferably, clustering the text information includes:
calculating the similarity of the text information by using cosine similarity, and clustering the text information according to the calculated similarity.
Preferably, calculating the similarity of the text information includes:
reducing each piece of text information to a space vector by using the vector space model (VSM), and calculating the cosine similarity of two pieces of text information by the following formula:
$$\cos(A, B) = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$
where $A_i$ and $B_i$ denote the $i$-th components of the VSM-based space vectors of the two pieces of text information participating in the similarity calculation, and $n$ is the dimension of the vectors.
Preferably, recommending information to the user according to the clustering result of the text information includes:
calculating, according to the clustering result of the text information, the similarity between the candidate text information and the user's reading history and/or the score of the text information, generating a list to be recommended, and indexing the list to be recommended to complete information recommendation for the user.
Preferably, acquiring the text information and the corresponding publisher information includes:
simulating a user login, downloading the page at any page URL, parsing the page to obtain publisher information, and obtaining the text information published by that publisher according to the publisher information.
In another aspect, the invention provides an information recommendation system based on a Hadoop platform, including:
an information acquisition module, used for acquiring text information and the corresponding publisher information;
an information storage module, which runs the HDFS (Hadoop Distributed File System) of the Hadoop computing framework to store the denoised text information and the publisher information;
a key-value pair generation module, which runs the MapReduce computing framework to split and sort the text information and publisher information stored in the HDFS, generates a plurality of key-value pairs each mapping publisher information to the corresponding text information, and merges the key-value pairs of the same publisher to generate a plurality of key-value pair lists;
a text information topic modeling module, used for performing topic modeling on the key-value pair lists obtained by the key-value pair generation module by using an LDA topic model to obtain the topic features of each piece of text information;
a text information clustering module, used for clustering the text information according to the modeling result of the LDA topic model and recommending information to the user;
and a recommendation module, used for recommending information to the user according to the clustering result of the text information.
According to the above technical solution, the invention has the following beneficial effects:
In the information recommendation method and system based on the Hadoop platform, the text information to be recommended is preliminarily filtered by exploiting Hadoop's distributed information storage, and a mapping relation between information publishers and text information is established, which gives higher accuracy than the prior-art approach of applying an LDA topic model directly to the text information for text mining. Filtering the text information a second time by combining the Hadoop platform with the LDA topic model allows fine-grained extraction of the topics of the text information, improves the rate at which the recommendation system accesses text information and the accuracy of text information queries before recommendation, and further guarantees the effectiveness and accuracy of information recommendation.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description are only embodiments of the present invention, and those skilled in the art can obtain other drawings from the provided drawings without creative effort.
FIG. 1 is a block diagram of an information recommendation system based on a Hadoop platform according to an embodiment of the present invention.
FIG. 2 is a flowchart of an information recommendation method based on a Hadoop platform according to an embodiment of the present invention.
FIG. 3 is a flowchart of an implementation of a microblog news recommendation method based on a Hadoop platform according to another embodiment of the present invention.
FIG. 4 is a schematic diagram of a MapReduce engine according to another embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1 and fig. 2, the present embodiment provides an information recommendation system based on a Hadoop platform, including:
an information acquisition module, used for acquiring text information and the corresponding publisher information;
an information storage module, which runs the HDFS (Hadoop Distributed File System) of the Hadoop computing framework to store the denoised text information and the publisher information;
a key-value pair generation module, which runs the MapReduce computing framework to split and sort the text information and publisher information stored in the HDFS, generates a plurality of key-value pairs each mapping publisher information to the corresponding text information, and merges the key-value pairs of the same publisher to generate a plurality of key-value pair lists;
a text information topic modeling module, used for performing topic modeling on the key-value pair lists obtained by the key-value pair generation module by using an LDA topic model to obtain the topic features of each piece of text information;
a text information clustering module, used for clustering the text information according to the modeling result of the LDA topic model and recommending information to the user;
and a recommendation module, used for recommending information to the user according to the clustering result of the text information.
When the recommendation system of this embodiment recommends information to a user, it acquires text information and the corresponding publisher information, denoises the text information, and stores the denoised text information and the publisher information in the HDFS of the Hadoop platform; splits and sorts the text information and publisher information stored in the HDFS by using the MapReduce computing framework to generate a plurality of key-value pairs, each mapping publisher information to the corresponding text information, and merges the key-value pairs of the same publisher to generate a plurality of key-value pair lists; performs topic modeling on the key-value pair lists by using an LDA topic model to obtain the topic features of each piece of text information, and clusters the text information according to the modeling result of the LDA topic model; and recommends information to the user according to the clustering result of the text information.
In a further embodiment, the recommendation system may further include a text information denoising module (not shown in the figures) for denoising the text information. The denoising module may also be integrated in the information acquisition module, in which case the information acquisition module processes the text information and stores it in the information storage module that runs the HDFS of the Hadoop computing framework.
In a further embodiment, the information storage module and the key-value pair generation module may be integrated in the same processing unit, in which a complete Hadoop platform including the HDFS and a MapReduce engine has been built. The processing unit performs distributed storage of the text information and establishes the mapping relation, finally obtaining a plurality of key-value pair lists merged according to publisher information.
The modules may be implemented in software code, in which case they may be stored in a memory provided at a control end such as a control computer. The modules may also be implemented in hardware, such as an integrated circuit chip.
As shown in fig. 3, another embodiment of the present invention is described below. This embodiment is a personalized microblog news recommendation method based on a Hadoop platform, which uses the Hadoop platform to mine topics more precisely from massive microblog text data and thereby provides more personalized and precise news recommendation for users.
Microblog text information is acquired differently from an ordinary web page. Because of the microblog platform's anti-crawler mechanism, this embodiment adopts a Python-based crawler that simulates a user login to obtain authorization from the microblog platform and thus permission to acquire information, which comprises the following steps:
performing a pre-login with the user name encoded in base64 to obtain the parameters servertime, nonce, pubkey and rsakv, wherein servertime is the server time, nonce is a random string generated by the server, pubkey is the public key used for RSA encryption on the client side, and rsakv is a value in the headers used for login;
encrypting the password with RSA and constructing form data to imitate a login request;
and acquiring the login redirect link and the cookie information.
After a successful login, a URL is selected from the page URLs that have not yet been acquired, the page is entered and the web page is downloaded, and the acquired-URL list is updated once the download completes. After the page is stored, page parsing is performed on the downloaded web page, that is, the microblog information in it is extracted. First, a microblog page is used as the seed URL, from which the comment relations and news comment content (including user IDs) of a news microblog can be obtained; the personal information, microblog text information and other content on each user's personal homepage are then crawled according to the user ID.
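The crawl loop described above (pick a URL that has not been acquired, download it, update the acquired list, parse out further URLs) can be sketched as a breadth-first traversal. This is only a minimal illustration under stated assumptions: `crawl`, `fetch_links` and the in-memory `PAGES` dictionary are hypothetical names standing in for the real download and page-parsing steps, and no network access or login is performed.

```python
from collections import deque

def crawl(seed_url, fetch_links, max_pages=100):
    """Visit each URL once, starting from the seed, in breadth-first order."""
    pending = deque([seed_url])   # URLs that have not been acquired yet
    seen = {seed_url}             # every URL ever queued, to avoid repeats
    acquired = []                 # the acquired-URL list, in download order
    while pending and len(acquired) < max_pages:
        url = pending.popleft()
        links = fetch_links(url)  # stand-in for "download page + parse page"
        acquired.append(url)      # update the acquired list after the download
        for link in links:
            if link not in seen:
                seen.add(link)
                pending.append(link)
    return acquired

# Hypothetical toy "web": a seed microblog page linking to user homepages.
PAGES = {
    "seed": ["user/1", "user/2"],
    "user/1": ["user/2", "user/3"],
    "user/2": [],
    "user/3": ["seed"],
}

order = crawl("seed", lambda url: PAGES.get(url, []))
print(order)  # ['seed', 'user/1', 'user/2', 'user/3']
```

Keeping a separate `seen` set means a page that links back to the seed does not cause it to be downloaded twice.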
The microblog text information needs to be cleaned to retain the effective text content, which comprises the following steps:
unlike ordinary text information, microblog text information contains many special text elements. To improve the data access rate and the accuracy of text mining, special text elements that are irrelevant to the meaning of the text information, such as '@' characters, topic labels '###', and URL links displayed in the text information, need to be removed, which also reduces the workload of subsequent computation;
to facilitate text mining, this embodiment uniformly converts the language of the text information into Simplified Chinese;
in addition, microblog text information carries many emoticons, which reflect the emotional features of the text information to some extent, so the emoticons are converted into words and retained as part of the text information to restore its content more accurately;
this embodiment also performs word segmentation and stop-word removal on the text information. The word segmentation system is ICTCLAS, which provides Chinese word segmentation, part-of-speech tagging and unknown-word recognition for the text information. Stop words are words that occur frequently in the text but carry no actual meaning, such as modal particles; removing them effectively frees storage space, improves the mining capability and clustering efficiency of the subsequent LDA topic model, and further improves recommendation accuracy.
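The cleaning steps above can be illustrated with a small sketch. It is not the embodiment's implementation: the regular expressions, the `EMOTICONS` table and the `STOP_WORDS` set are hypothetical examples, whole `@name` mentions and `#topic#` labels are dropped as one possible reading of the text, and whitespace splitting stands in for the ICTCLAS word segmentation system, which is not called here.

```python
import re

URL = re.compile(r"https?://\S+")     # URL links shown in the text
HASHTAG = re.compile(r"#[^#]*#")      # microblog-style topic label: #topic#
MENTION = re.compile(r"@\w+")         # "@user" mentions

# Hypothetical lookup tables, for illustration only.
EMOTICONS = {":)": " happy ", ":(": " sad "}
STOP_WORDS = {"the", "a", "is", "oh"}

def clean(text):
    """Remove special elements, keep emoticons as words, drop stop words."""
    text = URL.sub(" ", text)         # URLs first, since they may contain # or @
    text = HASHTAG.sub(" ", text)
    text = MENTION.sub(" ", text)
    for symbol, word in EMOTICONS.items():
        text = text.replace(symbol, word)   # keep the emotional feature as a word
    tokens = text.lower().split()     # stand-in for real word segmentation
    return [t for t in tokens if t not in STOP_WORDS]

print(clean("@bob the match is great :) #sports# http://t.cn/abc"))
# ['match', 'great', 'happy']
```

Converting emoticons before tokenizing keeps the sentiment cue ("happy") in the token list that later stages would see.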
The text information that has gone through this series of data cleaning steps is stored, together with its publisher information, in the HDFS. The HDFS has one NameNode and a plurality of DataNodes. The NameNode is responsible for locating where text information is stored, naming incoming text information and assigning it to the DataNodes, where it is finally stored. The main responsibility of a DataNode is to respond in real time to data access commands from the NameNode and to store or retrieve text information in real time; the NameNode and the DataNodes keep exchanging information in real time through a heartbeat mechanism. While text information is written, the HDFS also keeps multiple backup copies of it and stores the microblog information in blocks whose default size is 64 MB, which improves the safety, accuracy and access efficiency of the data and facilitates subsequent MapReduce data processing.
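As a rough worked example of block storage, the following sketch computes how many 64 MB blocks a file would be split into and how many bytes the cluster holds once copies are made. The replication factor of 3 is an assumption (it is the usual HDFS default, but the text only says that multiple backups are kept), and the function names are hypothetical.

```python
MB = 1024 * 1024
BLOCK_SIZE = 64 * MB   # block size stated in the text

def block_count(file_size_bytes, block_size=BLOCK_SIZE):
    """Number of blocks needed; the last block may be only partly filled."""
    return -(-file_size_bytes // block_size)   # integer ceiling division

def stored_bytes(file_size_bytes, replication=3):
    """Total bytes held across the cluster (replication=3 is an assumption)."""
    return file_size_bytes * replication

print(block_count(200 * MB))         # 4 blocks: 64 + 64 + 64 + 8 MB
print(stored_bytes(200 * MB) // MB)  # 600 MB across all replicas
```

The last block holds only the remaining 8 MB, so replication multiplies the actual file size rather than the block-rounded size.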
As shown in fig. 4, MapReduce mainly performs the following work:
A text information processing request is received and a processing instruction is sent to the JobClient node; according to the processing instruction, the application configuration parameters are packaged into a jar file and stored in the HDFS, and the storage path of the text information is submitted to the JobTracker node. The JobTracker node creates the tasks, namely MapTasks and ReduceTasks, and distributes them to the TaskTracker services for execution; the JobTracker monitors every task and re-runs any task found to have failed. The TaskTracker subdivides the text information preprocessing task and invokes a plurality of Map tasks. At this point the unordered text information in the HDFS is split and sorted, and a plurality of key-value pairs <user u, information v> are generated, each representing the mapping between a microblog user and one piece of text information published by that user. When the Map component has finished splitting and serializing the data, the Shuffle component merges the split key-value pairs <user u, information v> by the user name of the microblog publisher, so that the key-value pairs of microblog information from the same user are merged into one large key-value pair list. The output of the Map process then becomes the input of Reduce, which further aggregates and optimizes the microblog information key-value pair lists and finally outputs the processed text information key-value pairs.
Throughout this stage, MapReduce processes the microblog text information on the basis of dynamic real-time interaction between the NameNode and the DataNodes in the HDFS.
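The Map, Shuffle and Reduce stages described above can be imitated in a single process to show the data flow. This is only a sketch: it does not use Hadoop, the tab-separated record format and the function names are assumptions, and the reduce step simply keeps each merged list where a real job would apply further aggregation.

```python
from collections import defaultdict

def map_phase(records):
    """Emit one <user, text> pair per raw record of the form user<TAB>text."""
    for line in records:
        user, text = line.split("\t", 1)
        yield user, text

def shuffle_phase(pairs):
    """Merge pairs that share a user name and sort the groups by key."""
    groups = defaultdict(list)
    for user, text in pairs:
        groups[user].append(text)
    return sorted(groups.items())

def reduce_phase(grouped):
    """Produce one key-value pair list per publisher."""
    return {user: texts for user, texts in grouped}

records = ["u2\tpost c", "u1\tpost a", "u1\tpost b"]
result = reduce_phase(shuffle_phase(map_phase(records)))
print(result)  # {'u1': ['post a', 'post b'], 'u2': ['post c']}
```

The unordered input ends up grouped by publisher, which is the shape the later topic-modeling stage consumes.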
The LDA topic model is used to extract the topic distribution of the text information contained in the key-value pair lists, yielding the topic features of each text. The posterior distributions of the topic distribution and the word distribution in the LDA topic model are estimated with the Gibbs sampling algorithm, which gives estimates of the two parameters: the topic distribution θ and the word distribution φ.
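To make the Gibbs sampling step concrete, here is a minimal collapsed Gibbs sampler for LDA in plain Python. It is a sketch under stated assumptions, not the embodiment's code: documents are lists of integer word IDs, the hyperparameters `alpha` and `beta`, the iteration count and the function name `lda_gibbs` are illustrative choices, and `theta` and `phi` follow the usual LDA notation for the per-document topic distribution and the per-topic word distribution.

```python
import random

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Estimate theta (doc-topic) and phi (topic-word) by collapsed Gibbs sampling."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # words in doc d with topic k
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # times word w has topic k
    topic_total = [0] * n_topics                              # words assigned to topic k
    assign = []
    for d, doc in enumerate(docs):                            # random initial assignment
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]                              # remove current assignment
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [                                   # conditional p(z = t | rest)
                    (doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k                              # record the new assignment
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    theta = [[(c + alpha) / (len(doc) + n_topics * alpha) for c in doc_topic[d]]
             for d, doc in enumerate(docs)]
    phi = [[(c + beta) / (topic_total[k] + vocab_size * beta) for c in topic_word[k]]
           for k in range(n_topics)]
    return theta, phi

# Hypothetical corpus: four short documents over a six-word vocabulary.
docs = [[0, 1, 0, 1, 2], [3, 4, 3, 4, 5], [0, 1, 2, 0], [4, 5, 3, 3]]
theta, phi = lda_gibbs(docs, n_topics=2, vocab_size=6)
for row in theta:
    print([round(p, 2) for p in row])
```

Each row of `theta` is one document's topic distribution and can serve as its topic feature vector in the clustering step.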
After the LDA topic model has traversed all the text information, the text information is clustered by calculating text similarity with cosine similarity. The vector space model (VSM) reduces the semantic similarity of text information to operations on space vectors: each keyword in a piece of text information is compared against a bag of words and given a positive real value on that basis, so that each piece of text information forms a multidimensional space vector. The cosine similarity of two pieces of text information is calculated as follows:
$$\cos(A, B) = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$
where $A_i$ and $B_i$ denote the $i$-th components of the VSM-based space vectors of the two pieces of text information participating in the similarity calculation, and $n$ is the dimension of the vectors.
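The cosine similarity of two bag-of-words vectors can be computed as in the sketch below. Using raw term counts as the positive real values is an assumption (the text does not say how the weights are chosen), and `to_vector` and `cosine_similarity` are hypothetical helper names.

```python
import math

def to_vector(tokens, vocabulary):
    """Map a token list onto the bag of words, using term counts as weights."""
    return [tokens.count(term) for term in vocabulary]

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0   # convention chosen here for an all-zero vector
    return dot / (norm_a * norm_b)

vocabulary = ["match", "goal", "stock"]
doc_a = to_vector(["match", "goal", "match"], vocabulary)  # [2, 1, 0]
doc_b = to_vector(["stock"], vocabulary)                   # [0, 0, 1]
print(cosine_similarity(doc_a, doc_a))  # close to 1.0 (identical direction)
print(cosine_similarity(doc_a, doc_b))  # 0.0 (no shared terms)
```

Because the measure depends only on direction, a long post and a short post with the same word proportions score as highly similar.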
According to the text clustering result, the final similarity between the candidate microblog posts and the user's preferences and/or the scores of the text information is calculated, a Top-K microblog post recommendation list is generated in descending order of similarity, and microblog information is finally recommended to the microblog user in a personalized manner, achieving accurate microblog news recommendation.
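A Top-K list of this kind can be produced by scoring every candidate against a user preference vector and sorting in descending order. In the sketch below the vectors, the post IDs and the function `recommend_top_k` are hypothetical, and cosine similarity alone is used as the score; how scores of the text information would be combined with it is left open, as in the text.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recommend_top_k(user_vector, candidates, k):
    """Return the IDs of the k candidates most similar to the user vector."""
    ranked = sorted(
        candidates,
        key=lambda post_id: cosine(user_vector, candidates[post_id]),
        reverse=True,            # descending order of similarity
    )
    return ranked[:k]

# Hypothetical preference vector (e.g. averaged topic features of read posts).
user = [1.0, 1.0, 0.0]
candidates = {
    "p1": [1.0, 1.0, 0.0],   # same direction as the user vector
    "p2": [1.0, 0.0, 0.0],   # partial overlap
    "p3": [0.0, 0.0, 1.0],   # no overlap
}
print(recommend_top_k(user, candidates, 2))  # ['p1', 'p2']
```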
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. An information recommendation method based on a Hadoop platform, characterized by comprising the following steps:
acquiring text information and corresponding publisher information, denoising the text information, and storing the denoised text information and the publisher information in the HDFS (Hadoop Distributed File System) of a Hadoop platform;
splitting and sorting the text information and publisher information stored in the HDFS by using the MapReduce computing framework to generate a plurality of key-value pairs, each mapping publisher information to the corresponding text information, and merging the key-value pairs of the same publisher to generate a plurality of key-value pair lists;
performing topic modeling on the key-value pair lists by using an LDA topic model to obtain the topic features of each piece of text information, and clustering the text information according to the modeling result of the LDA topic model;
and recommending information to the user according to the clustering result of the text information.
2. The Hadoop platform-based information recommendation method according to claim 1, wherein the denoising of the text information comprises:
converting the text information into a single uniform language.
3. The Hadoop platform-based information recommendation method according to claim 2, wherein the denoising of the text information further comprises:
converting special symbols carried in the text information into words so as to retain the emotional features of the text information.
4. The Hadoop platform-based information recommendation method according to claim 3, wherein the denoising of the text information further comprises:
performing word segmentation on the text information by using the ICTCLAS word segmentation system.
5. The Hadoop platform-based information recommendation method according to claim 4, wherein the denoising of the text information further comprises:
removing stop words from the text information to reduce the storage space the text information occupies in the HDFS.
6. The Hadoop platform-based information recommendation method according to claim 1, wherein the clustering of the text information comprises:
calculating the similarity of the text information by using cosine similarity, and clustering the text information according to the calculated similarity.
7. The Hadoop platform-based information recommendation method according to claim 6, wherein the calculating of the similarity of the text information by using cosine similarity comprises:
reducing each piece of text information to a space vector by using the vector space model (VSM), and calculating the cosine similarity of two pieces of text information by the following formula:
$$\cos(A, B) = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$
where $A_i$ and $B_i$ denote the $i$-th components of the VSM-based space vectors of the two pieces of text information participating in the similarity calculation, and $n$ is the dimension of the vectors.
8. The Hadoop platform-based information recommendation method according to claim 1, wherein the recommending of information to the user according to the clustering result of the text information comprises:
calculating, according to the clustering result of the text information, the similarity between the candidate text information and the user's reading history and/or the score of the text information, generating a list to be recommended, and indexing the list to be recommended to complete information recommendation for the user.
9. The Hadoop platform-based information recommendation method according to claim 1, wherein the acquiring of text information and corresponding publisher information comprises:
simulating a user login, downloading the page at any page URL, parsing the page to obtain publisher information, and obtaining the text information published by that publisher according to the publisher information.
10. An information recommendation system based on a Hadoop platform, characterized by comprising:
an information acquisition module, used for acquiring text information and the corresponding publisher information;
an information storage module, which runs the HDFS (Hadoop Distributed File System) of the Hadoop computing framework to store the denoised text information and the publisher information;
a key-value pair generation module, which runs the MapReduce computing framework to split and sort the text information and publisher information stored in the HDFS, generates a plurality of key-value pairs each mapping publisher information to the corresponding text information, and merges the key-value pairs of the same publisher to generate a plurality of key-value pair lists;
a text information topic modeling module, used for performing topic modeling on the key-value pair lists obtained by the key-value pair generation module by using an LDA topic model to obtain the topic features of each piece of text information;
a text information clustering module, used for clustering the text information according to the modeling result of the LDA topic model and recommending information to the user;
and a recommendation module, used for recommending information to the user according to the clustering result of the text information.
CN202010542277.2A 2020-06-15 2020-06-15 Hadoop platform-based information recommendation method and system Pending CN111695020A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010542277.2A CN111695020A (en) 2020-06-15 2020-06-15 Hadoop platform-based information recommendation method and system

Publications (1)

Publication Number Publication Date
CN111695020A (en) 2020-09-22

Family

ID=72481004

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010542277.2A Pending CN111695020A (en) 2020-06-15 2020-06-15 Hadoop platform-based information recommendation method and system

Country Status (1)

Country Link
CN (1) CN111695020A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107562947A (en) * 2017-09-26 2018-01-09 宿州学院 A kind of Mobile Space-time perceives the lower dynamic method for establishing model of recommendation service immediately
CN108733824A (en) * 2018-05-22 2018-11-02 合肥工业大学 Consider the interactive theme modeling method and device of expertise
CN108920508A (en) * 2018-05-29 2018-11-30 福建新大陆软件工程有限公司 Textual classification model training method and system based on LDA algorithm
CN110555106A (en) * 2018-03-28 2019-12-10 蓝盾信息安全技术有限公司 Semi-supervised LDA model based on seed words

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
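李钊; 李晓; 王春梅; 李诚; 杨春: "Research on a MapReduce-based text clustering method", no. 01 *
段庆伟; 铁木巴干: "Research on a clustering analysis algorithm for Sina Weibo data based on the Hadoop cloud computing platform", no. 04 *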

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination