CN112861016A - Data high-concurrency processing method of Feed stream system of healthy social platform - Google Patents

Info

Publication number
CN112861016A
Authority
CN
China
Prior art keywords
data
feed
cache
feed stream
redis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011439825.5A
Other languages
Chinese (zh)
Inventor
吕小健
况红波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Pantaoshu Technology Co ltd
Original Assignee
Shenzhen Pantaoshu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Pantaoshu Technology Co ltd filed Critical Shenzhen Pantaoshu Technology Co ltd
Priority to CN202011439825.5A priority Critical patent/CN112861016A/en
Publication of CN112861016A publication Critical patent/CN112861016A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9536Search customisation based on social or collaborative filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/957Browsing optimisation, e.g. caching or content distillation
    • G06F16/9574Browsing optimisation, e.g. caching or content distillation of access to content, e.g. by caching

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention belongs to the technical field of high-concurrency data processing methods, and in particular relates to a method in which tweets are stored in a database and tweet metadata (tweetmeta) is stored in timelines. S1: when a user publishes a tweet, the tweetmeta is written into the timeline lists by fan-out according to the social graph; because only metadata is stored, Redis can support pushing massive amounts of metadata. S2: when a user views their own timeline, the tweetmeta is read directly from that timeline and the corresponding tweet data is then fetched from the DB. By upgrading the SNS social system, a Feed stream service system handling data at the million level can be designed and developed, with better system performance. The processing method optimizes and strengthens the social recommendation algorithm, improves the precision of data matching, and solves the problem of duplicate data in social relationships. The method encourages users to interact more frequently and be more active, and improves the user experience.

Description

Data high-concurrency processing method of Feed stream system of healthy social platform
Technical Field
The invention belongs to the technical field of data high-concurrency processing methods, and particularly relates to a data high-concurrency processing method of a Feed stream system of a healthy social platform.
Background
Big data acquisition starts from a defined user target, collects all structured, semi-structured and unstructured data within that scope, processes the collected data, and analyzes and mines valuable information from it. Big data collection faces two main challenges. One is the diversity of communication protocols and data protocols among heterogeneous Internet of Things devices. The other is that, as massive numbers of devices connect, highly concurrent data acquisition creates performance bottlenecks, causing data backlog, connection timeouts and similar problems.
A Feed stream system has two critical cores: storage and push. The content a Feed stream system must store falls into two parts: social relationships (such as friends, community members and follow lists) and Feed message content. A Feed push system needs two functions: publishing Feeds and reading Feed streams.
Existing systems cannot propagate data to subscribers dynamically and in real time, and SNS-based social network content push is difficult to implement. Given the problems exposed by current data high-concurrency processing methods in use, the method needs structural improvement and optimization.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a data high concurrency processing method for a Feed stream system of a healthy social platform, which has the characteristics of facilitating dynamic real-time propagation to subscribers through Feed streams and realizing the content push of a social network based on SNS.
To achieve the above purpose, the invention provides the following technical solution: tweets are stored in a database and tweet metadata (tweetmeta) is stored in timelines, comprising the following steps:
S1, when a user publishes a tweet, the tweetmeta is written into the timeline lists by fan-out according to the social graph; because only metadata is stored, Redis can support pushing massive amounts of metadata;
S2, when a user views their own timeline, the tweetmeta is read directly from that timeline, and the corresponding tweet data is then fetched from the DB.
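Steps S1 and S2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: plain Python dictionaries stand in for the database and for the Redis timelines, and the names `tweet_db`, `timelines`, `followers`, `publish` and `view_timeline` are hypothetical.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins: `tweet_db` plays the role of the database,
# `timelines` plays the role of the per-user timeline lists kept in Redis.
tweet_db = {}                   # tweet_id -> full tweet
timelines = defaultdict(list)   # user_id -> list of tweetmeta dicts, newest first
followers = {"alice": ["bob", "carol"], "bob": ["alice"]}  # assumed social graph


def publish(author, tweet_id, text, ts):
    """S1: store the tweet once, then fan out only its metadata."""
    tweet_db[tweet_id] = {"author": author, "text": text, "ts": ts}
    meta = {"tweet_id": tweet_id, "author": author, "ts": ts}
    # The author's own timeline is included here as a design choice of this sketch.
    for user in followers.get(author, []) + [author]:
        timelines[user].insert(0, meta)   # newest entry at the head


def view_timeline(user, limit=10):
    """S2: read metadata from the user's own timeline, then fetch tweets from the DB."""
    metas = timelines[user][:limit]
    return [tweet_db[m["tweet_id"]] for m in metas]


publish("alice", "t1", "morning run done", ts=100)
publish("bob", "t2", "new recipe", ts=101)
print([t["text"] for t in view_timeline("alice")])  # ['new recipe', 'morning run done']
```

Each timeline entry carries only the small metadata record, so the tweet body is stored once no matter how many timelines reference it.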
As a preferred technical solution of the data high-concurrency processing method of the Feed stream system of the healthy social platform, when a Feed message is published:
1) the Feed message first enters a queue service, and metadata (tweetMeta) such as the publisher, publishing domain, publishing time, content type and content id is extracted from the Feed message;
2) the Feed message is stored in a MySQL database, and after it is stored successfully the metadata publishing service is called asynchronously to publish it;
3) the metadata publishing service extracts the publisher, the publishing domain and the ID of the Feed message from the metadata, and calls the social relationship service to determine the list of queues to which the Feed must be pushed;
4) multiple rows of data are written into multiple Feed streams at once using the batch write interface of the metadata publishing service.
As a preferred technical solution of the data high-concurrency processing method of the Feed stream system of the healthy social platform, when the Feed stream is read:
1) the Feed IDs of the latest N Feed messages are read from the Feed stream;
2) once the Feed ID list is obtained, the Feed content storage interface (which has a caching function) is called asynchronously, and the corresponding Feed content is read directly by ID;
3) the results of step 2) are combined and returned to the user, which ends the process of reading the Feed stream. When the user requests more data, step 1) may be repeated with the Score of the last Feed currently held as the upper bound of the range.
As a preferred technical solution of the data high-concurrency processing method of the Feed stream system of the healthy social platform, triadic closure theory together with a common-friend and time-series recommendation algorithm is adopted; a time dimension is added on top of common friends, based on the assumption that a user is more interested in recently added friends.
As a preferred technical solution of the data high-concurrency processing method of the Feed stream system of the healthy social platform, an empirical formula is adopted:
[Formula image DEST_PATH_IMAGE001: not reproduced in this text]
The larger the time difference, the smaller the weight; δ_{u,fi} is the time at which u established a friend relationship with fi, δ_{fi,fof} is the time at which fi established a friend relationship with fof, and -0.3 is a penalty factor.
As a preferred technical solution of the data high-concurrency processing method of the Feed stream system of the healthy social platform, a two-level cache consisting of the local cache Ehcache and the centralized cache Redis is adopted to solve the overload of the social data cache.
a) With an in-memory cache, cached data is lost whenever the application restarts; the resulting cache avalanche puts heavy pressure on the database and stalls the application;
b) with an in-memory cache, multiple application nodes cannot share cached data;
c) with a centralized cache, the large volume of data fetched through the cache makes the throughput of the cache service too high and saturates the bandwidth. The symptom is that the load on the Redis service is not high, yet reads are very slow because the bandwidth of the machine's network card is saturated.
Problems a) and b) are addressed by caching data in Redis, which in turn makes problem c) hard to avoid.
When problem c) occurs, a Redis cluster is adopted to reduce the pressure on the cache service.
As a preferred technical solution of the data high-concurrency processing method of the Feed stream system of the healthy social platform, an existing in-memory cache framework is used as the first-level cache, and Redis is used as the second-level cache.
As a preferred technical solution of the data high-concurrency processing method of the Feed stream system of the healthy social platform, all data is read from the first-level cache first and is read from the second-level cache only when it is absent there, which reduces the number of accesses to the second-level cache Redis.
As a preferred technical solution of the data high-concurrency processing method of the Feed stream system of the healthy social platform, the first-level cache framework can limit, through configuration, the amount of data held in memory, which avoids memory overflow.
Compared with the prior art, the invention has the beneficial effects that:
(1) by upgrading the SNS social system, a Feed stream service system handling data at the million level can be designed and developed, with better system performance; the processing method optimizes and strengthens the social recommendation algorithm, improves the precision of data matching, and solves the problem of duplicate data in social relationships; the method encourages users to interact more frequently and be more active, and improves the user experience;
(2) the method meets the service requirements and design goals of the Feed stream system described above, and effectively solves the problem of distributing content to subscribers dynamically and in real time under social relationships; separating Feed content from metadata effectively reduces the memory requirement of the Feed stream system; the method combines the application layer with commonly used, stable open-source storage solutions, which lowers implementation difficulty and greatly improves scalability.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a flow chart of a Feed stream distribution of the present invention;
FIG. 2 is a general block diagram of a Feed stream system map structure in the present invention;
FIG. 3 is a block diagram of the logic structure of a Feed stream system according to the present invention;
FIG. 4 is a flow chart of Feed stream reading in the present invention;
FIG. 5 is a flowchart of a time series recommendation algorithm;
FIG. 6 is a diagram of empirical equations.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Examples
Referring to FIGS. 1-6, the present invention provides the following technical solution: tweets are stored in a database and tweet metadata (tweetmeta) is stored in timelines, comprising the following steps:
S1, when a user publishes a tweet, the tweetmeta is written into the timeline lists by fan-out according to the social graph; because only metadata is stored, Redis can support pushing massive amounts of metadata;
S2, when a user views their own timeline, the tweetmeta is read directly from that timeline, and the corresponding tweet data is then fetched from the DB. The test environment in this embodiment is a single CentOS 7.5 server with 4 cores, 8 GB of memory and a gigabit network card, with Redis 4.2, MySQL 5.7 and the Feed stream system deployed in Docker. The load-test client is Apache JMeter 5.3, running a thread group of 100 threads for 10 min.
Specifically, when a Feed message is published:
1) the Feed message first enters a queue service, and metadata (tweetMeta) such as the publisher, publishing domain, publishing time, content type and content id is extracted from the Feed message;
2) the Feed message is stored in a MySQL database, and after it is stored successfully the metadata publishing service is called asynchronously to publish it;
3) the metadata publishing service extracts the publisher, the publishing domain and the ID of the Feed message from the metadata, and calls the social relationship service to determine the list of queues to which the Feed must be pushed;
if the publishing domain is friends, the publisher's friend list is obtained through the social relationship service; if the publishing domain is a community, the community's member list is obtained through the social relationship service; in other cases, the Feed queues to push to are obtained according to the corresponding social relationship;
4) multiple rows of data are written into multiple Feed streams at once using the batch write interface of the metadata publishing service, which ends the process of publishing the Feed.
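The publishing steps above can be sketched as a small pipeline. This is an illustration under assumptions rather than the actual services: a `deque` stands in for the queue service, dictionaries stand in for MySQL, the social relationship service and the Feed streams, and the domain strings `"friend"` and `"community:<name>"` are a hypothetical encoding of the publishing domain.

```python
from collections import deque

# Hypothetical stand-ins for the social relationship service.
FRIENDS = {"u1": ["u2", "u3"]}
COMMUNITY_MEMBERS = {"runners": ["u1", "u4", "u5"]}

queue = deque()      # stand-in for the queue service
feed_db = {}         # stand-in for the MySQL Feed table: content_id -> message
feed_streams = {}    # user_id -> {content_id: score}; stand-in for the Feed streams


def extract_meta(message):
    """Step 1: keep only the metadata fields named in the text."""
    keys = ("publisher", "domain", "publish_time", "content_type", "content_id")
    return {k: message[k] for k in keys}


def resolve_targets(meta):
    """Step 3: pick the Feed queues to push to, based on the publishing domain."""
    kind, _, target = meta["domain"].partition(":")
    if kind == "friend":
        return list(FRIENDS.get(meta["publisher"], []))
    if kind == "community":
        return list(COMMUNITY_MEMBERS.get(target, []))
    return []  # other domains are left out of this sketch


def batch_write(targets, meta):
    """Step 4: write one metadata row into every target Feed stream."""
    for uid in targets:
        feed_streams.setdefault(uid, {})[meta["content_id"]] = meta["publish_time"]


def process_queue():
    while queue:
        message = queue.popleft()
        meta = extract_meta(message)
        feed_db[meta["content_id"]] = message     # step 2: persist the full message first
        batch_write(resolve_targets(meta), meta)  # then publish the metadata


queue.append({"publisher": "u1", "domain": "friend", "publish_time": 10,
              "content_type": "text", "content_id": "c1", "body": "hello"})
queue.append({"publisher": "u1", "domain": "community:runners", "publish_time": 11,
              "content_type": "text", "content_id": "c2", "body": "5 km today"})
process_queue()
```

In this sketch the metadata fan-out runs in the same loop as the database write; the text describes it as an asynchronous call made after the write succeeds.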
Specifically, when the Feed stream is read:
1) the Feed IDs of the latest N Feed messages are read from the Feed stream (the read may be performed over a range, where the start of the range is the ID of the latest Feed read last time and the end may be the current time or MAX);
2) once the Feed ID list is obtained, the Feed content storage interface (which has a caching function) is called asynchronously, and the corresponding Feed content is read directly by ID;
3) the results of step 2) are combined and returned to the user, which ends the process of reading the Feed stream. When the user requests more data, step 1) may be repeated with the Score of the last Feed currently held as the upper bound of the range.
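The read path and its Score-based paging can be sketched as below. The sketch assumes that the score is the publish time and that scores are distinct; `STREAM`, `CONTENT`, `latest_ids`, `fetch_content` and `read_feed` are hypothetical names, and a dictionary stands in for the content storage interface.

```python
import asyncio

# Hypothetical data: score = publish time, one entry per Feed.
STREAM = {f"f{i}": float(i) for i in range(1, 6)}     # feed_id -> score
CONTENT = {f"f{i}": f"content {i}" for i in range(1, 6)}


def latest_ids(stream, n, max_score=None):
    """Step 1: newest N (feed_id, score) pairs, strictly below max_score if given."""
    items = sorted(stream.items(), key=lambda kv: kv[1], reverse=True)
    if max_score is not None:
        items = [kv for kv in items if kv[1] < max_score]
    return items[:n]


async def fetch_content(feed_id):
    """Step 2: stand-in for the asynchronous content storage call."""
    await asyncio.sleep(0)
    return CONTENT[feed_id]


async def read_feed(n, max_score=None):
    page = latest_ids(STREAM, n, max_score)
    contents = await asyncio.gather(*(fetch_content(fid) for fid, _ in page))
    # Step 3: return the combined result plus the Score of the last Feed,
    # which the caller passes back as max_score to load the next page.
    next_cursor = page[-1][1] if page else None
    return list(contents), next_cursor


first_page, cursor = asyncio.run(read_feed(2))
second_page, cursor = asyncio.run(read_feed(2, cursor))
print(first_page, second_page)
```

Because the upper bound is exclusive, Feeds that share exactly the same score could be skipped across pages; a real system would need a tie-breaking rule.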
The processing method uses a Redis sorted set (zset) as the underlying structure for the feed push function.
1) Like a set, a Redis sorted set is a collection of string-type elements and does not allow duplicate members.
2) The difference is that each element is associated with a score of type double; Redis uses the score to order the members of the set from smallest to largest.
3) Members of a sorted set are unique, but scores may repeat.
4) A sorted set is implemented with a hash table together with a skip list, so looking up a member's score is O(1), while adding or deleting a member is O(log N). The maximum number of members in a set is 2^32 - 1 (4294967295, so each set can hold more than 4 billion members).
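The sorted-set properties listed above can be illustrated with a small pure-Python stand-in. `MiniZSet` is a hypothetical teaching class, not the Redis implementation: it re-sorts on every range read, which a real sorted set does not need to do.

```python
class MiniZSet:
    """Toy model of sorted-set semantics: unique members, one float score each."""

    def __init__(self):
        self._scores = {}   # member -> score

    def zadd(self, member, score):
        """Add or update a member; returns 1 if the member is new, else 0."""
        is_new = member not in self._scores
        self._scores[member] = float(score)
        return int(is_new)

    def zscore(self, member):
        return self._scores.get(member)

    def zrange(self):
        """Members ordered by ascending score; equal scores are ordered by member."""
        ordered = sorted(self._scores.items(), key=lambda kv: (kv[1], kv[0]))
        return [member for member, _ in ordered]


z = MiniZSet()
z.zadd("a", 2)
z.zadd("b", 1)
z.zadd("c", 2)          # same score as "a": allowed
z.zadd("a", 3)          # same member again: the score is updated, no duplicate
print(z.zrange())       # ['b', 'c', 'a']
```

For a Feed stream, the member would be the Feed ID and the score the publish time, so adding the same Feed twice is harmless.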
Specifically, triadic closure theory together with a common-friend and time-series recommendation algorithm is adopted; a time dimension is added on top of common friends. This embodiment is based on the assumption that a user is more interested in recently added friends.
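The common-friend part of this idea (triadic closure: two people who share a friend are likely to become friends) can be sketched as a friend-of-friend count. The graph below and the function name `common_friend_candidates` are hypothetical, and the time dimension is deliberately left out.

```python
from collections import Counter


def common_friend_candidates(graph, user):
    """Rank friends-of-friends of `user` by the number of friends they share."""
    direct = graph.get(user, set())
    counts = Counter()
    for friend in direct:
        for fof in graph.get(friend, set()):
            # Skip the user and people who are already direct friends.
            if fof != user and fof not in direct:
                counts[fof] += 1
    return counts.most_common()


graph = {
    "u": {"a", "b"},
    "a": {"u", "x", "y"},
    "b": {"u", "x"},
    "x": {"a", "b"},
    "y": {"a"},
}
print(common_friend_candidates(graph, "u"))   # [('x', 2), ('y', 1)]
```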
Specifically, an empirical formula is adopted:
[Formula image 178077DEST_PATH_IMAGE001: not reproduced in this text]
The larger the time difference, the smaller the weight; δ_{u,fi} is the time at which u established a friend relationship with fi, δ_{fi,fof} is the time at which fi established a friend relationship with fof, and -0.3 is a penalty factor. The penalty factor is an empirical parameter and must be tuned to the specific situation. The friend recommendation score can be computed directly from the empirical formula, or it can be used as one feature dimension in a regression together with other features.
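The formula itself is an image that is not reproduced in this text, so the sketch below does not claim to be it. It assumes one plausible form that matches the description (older friendships contribute less, with -0.3 acting as an exponent): each path u → fi → fof contributes (δ_{u,fi} · δ_{fi,fof})^(-0.3), where δ is taken here to be the elapsed time in days since the friendship was established. The function name, the data layout and this form are all assumptions.

```python
PENALTY = -0.3   # penalty factor from the text; used as an exponent by assumption


def time_weighted_scores(friend_age, user):
    """Score friends-of-friends; friend_age[a][b] = days since a and b became friends (> 0)."""
    direct = friend_age.get(user, {})
    scores = {}
    for fi, age_u_fi in direct.items():
        for fof, age_fi_fof in friend_age.get(fi, {}).items():
            if fof == user or fof in direct:
                continue
            # Assumed form: recent friendships on both hops give a larger weight.
            weight = (age_u_fi * age_fi_fof) ** PENALTY
            scores[fof] = scores.get(fof, 0.0) + weight
    return scores


friend_age = {
    "u": {"a": 1.0, "b": 100.0},
    "a": {"u": 1.0, "x": 1.0},     # u-a and a-x are both recent
    "b": {"u": 100.0, "y": 1.0},   # u-b is an old friendship
}
scores = time_weighted_scores(friend_age, "u")
print(sorted(scores, key=scores.get, reverse=True))   # ['x', 'y']
```

Under this assumed form, the candidate reached through the recently added friend ranks first, which is the behaviour the text describes.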
Specifically, a two-level cache consisting of the local cache Ehcache and the centralized cache Redis is adopted to solve the overload of the social data cache.
a) With an in-memory cache, cached data is lost whenever the application restarts; the resulting cache avalanche puts heavy pressure on the database and stalls the application;
b) with an in-memory cache, multiple application nodes cannot share cached data;
c) with a centralized cache, the large volume of data fetched through the cache makes the throughput of the cache service too high and saturates the bandwidth. The symptom is that the load on the Redis service is not high, yet reads are very slow because the bandwidth of the machine's network card is saturated.
Problems a) and b) are addressed by caching data in Redis, which in turn makes problem c) hard to avoid.
When problem c) occurs, a Redis cluster is adopted to reduce the pressure on the cache service.
Specifically, an existing in-memory cache framework is used as the first-level cache, and Redis is used as the second-level cache.
Specifically, all data is read from the first-level cache first and is read from the second-level cache only when it is absent there, which reduces the number of accesses to the second-level cache Redis.
Specifically, the first-level cache framework can be configured to limit the amount of data held in memory, which avoids memory overflow.
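The lookup order of the two-level cache can be sketched as follows. This is a simplified illustration under assumptions: a bounded LRU built on `OrderedDict` stands in for the local cache (Ehcache in the text), a plain dictionary stands in for Redis, and `TwoLevelCache`, `l1_capacity` and `load_from_db` are hypothetical names.

```python
from collections import OrderedDict


class TwoLevelCache:
    def __init__(self, l1_capacity, l2_store, load_from_db):
        self.l1 = OrderedDict()          # first level: local, bounded, LRU eviction
        self.l1_capacity = l1_capacity   # configured limit that prevents memory overflow
        self.l2 = l2_store               # second level: shared store (dict stand-in for Redis)
        self.load_from_db = load_from_db
        self.l2_reads = 0                # counts how often the second level is consulted

    def get(self, key):
        if key in self.l1:               # 1) first-level hit: no network access at all
            self.l1.move_to_end(key)
            return self.l1[key]
        self.l2_reads += 1               # 2) first-level miss: ask the second level
        value = self.l2.get(key)
        if value is None:                # 3) second-level miss: load from the database
            value = self.load_from_db(key)
            self.l2[key] = value
        self._put_l1(key, value)
        return value

    def _put_l1(self, key, value):
        self.l1[key] = value
        self.l1.move_to_end(key)
        while len(self.l1) > self.l1_capacity:
            self.l1.popitem(last=False)  # evict the least recently used entry


db_calls = []


def load(key):
    db_calls.append(key)
    return f"value-{key}"


cache = TwoLevelCache(l1_capacity=2, l2_store={}, load_from_db=load)
for key in ["a", "a", "b", "c", "a"]:
    cache.get(key)
print(cache.l2_reads, db_calls)   # 4 ['a', 'b', 'c']
```

The second read of `"a"` is served locally, and the last read of `"a"`, after it was evicted from the first level, is served by the second level without touching the database. The sketch ignores invalidation across application nodes, which a real deployment would have to handle.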
Because social Feeds are time-sensitive, the Feed stream does not need to hold the complete Feed list; only the latest Feeds need to be stored.
The number of Feeds in each Feed stream is scanned periodically, and when it exceeds MAX_SIZE, part of the Feeds are cleared. This keeps the total amount of data stored in Redis relatively stable, which reduces server pressure and improves system stability.
When the requested time range extends beyond the data held in the Feed stream, the Feed IDs for that period are filtered directly from the stored Feed data, and the corresponding Feed content can then be returned through step 3 of the Feed reading process.
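The trimming rule and the fallback for older time ranges can be sketched as below. The sketch assumes the score is the publish time, uses a dictionary for the Feed stream and another (`feed_table`) for the persistent Feed storage, and sets `MAX_SIZE` to 3 purely for illustration; all names are hypothetical.

```python
MAX_SIZE = 3   # illustrative limit; the text does not give a value


def trim_stream(stream, max_size=MAX_SIZE):
    """Drop the oldest entries so that at most max_size Feeds remain."""
    excess = len(stream) - max_size
    if excess <= 0:
        return []
    oldest = sorted(stream, key=stream.get)[:excess]   # lowest scores first
    for feed_id in oldest:
        del stream[feed_id]
    return oldest


def ids_in_range(stream, feed_table, start, end):
    """Feed IDs published in [start, end], newest first.

    Falls back to the persistent table when the range starts before
    the oldest Feed still held in the stream.
    """
    hits = {fid for fid, score in stream.items() if start <= score <= end}
    oldest_cached = min(stream.values(), default=None)
    if oldest_cached is None or start < oldest_cached:
        hits |= {fid for fid, ts in feed_table.items() if start <= ts <= end}
    return sorted(hits, key=lambda fid: feed_table[fid], reverse=True)


feed_table = {f"f{i}": i for i in range(1, 6)}   # persistent storage: feed_id -> publish time
stream = dict(feed_table)                        # Feed stream before trimming
removed = trim_stream(stream)
print(removed, ids_in_range(stream, feed_table, 2, 4))   # ['f1', 'f2'] ['f4', 'f3', 'f2']
```

With a Redis sorted set, a rank-based removal such as `ZREMRANGEBYRANK` could play the role of `trim_stream`; that mapping is a suggestion, not something the text specifies.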
During the load test the Feed stream system had an average response time of 92 ms and a throughput of 945 TPS. The response-time curve averaged about 100 ms and rose to 200 ms for part of the run; the TPS curve averaged 945 and dropped to 600 for part of the run. JVM monitoring data for the Feed stream system shows that frequent write requests trigger short garbage collections (GC) in the JVM, which lowers TPS and increases response time. The number of GCs can be reduced by increasing the JVM memory and tuning the GC parameters.
Feed: each status or message in a Feed stream is a Feed; for example, one status in a friend circle is a Feed, and one post on a microblog is a Feed.
Feed stream: a continuously updated stream of information that presents content to the user. Each person's friend circle, a microblog follow page and the like are Feed streams.
Timeline: a type of Feed stream. Microblogs and friend circles are both Timeline-type Feed streams; because the Timeline type appeared earliest and is the most widely used and best known, Timeline is sometimes used to mean Feed stream in general.
Personal page Timeline: a page that displays the Feed messages sent by the user, such as the album in WeChat or the personal page of a microblog.
MetaData: a data format that records the meta information (publisher, publishing domain, publishing time, content type, content id, and so on) of each status or message in the Feed stream.
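One possible shape for such a metadata record is sketched below. The field list comes from the definition above; the class name `FeedMeta`, the field types and the JSON encoding are assumptions made for illustration.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class FeedMeta:
    publisher: str        # who published the Feed
    publish_domain: str   # e.g. friends or a community
    publish_time: int     # assumed here to be a Unix timestamp
    content_type: str
    content_id: str       # key of the full Feed content in the database

    def to_json(self):
        """Compact, stable encoding suitable for storing in a cache entry."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw):
        return cls(**json.loads(raw))


meta = FeedMeta("u1", "friend", 1607644800, "text", "c1")
restored = FeedMeta.from_json(meta.to_json())
```

The record stays small because the Feed content itself is referenced by `content_id` rather than embedded.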
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. A data high-concurrency processing method of a Feed stream system of a healthy social platform, characterized in that tweets are stored in a database and tweet metadata (tweetmeta) is stored in timelines, the method comprising the following steps:
S1, when a user publishes a tweet, the tweetmeta is written into the timeline lists by fan-out according to the social graph; because only metadata is stored, Redis can support pushing massive amounts of metadata;
S2, when a user views their own timeline, the tweetmeta is read directly from that timeline, and the corresponding tweet data is then fetched from the DB.
2. The data high-concurrency processing method of the Feed stream system of the healthy social platform according to claim 1, wherein, when a Feed message is published:
1) the Feed message first enters a queue service;
metadata (tweetMeta) such as the publisher, publishing domain, publishing time, content type and content id is extracted from the Feed message;
2) the Feed message is stored in a MySQL database, and after it is stored successfully the metadata publishing service is called asynchronously to publish it;
3) the metadata publishing service extracts the publisher, the publishing domain and the ID of the Feed message from the metadata, and calls the social relationship service to determine the list of queues to which the Feed must be pushed;
4) multiple rows of data are written into multiple Feed streams at once using the batch write interface of the metadata publishing service.
3. The data high-concurrency processing method of the Feed stream system of the healthy social platform according to claim 2, wherein, when the Feed stream is read:
1) the Feed IDs of the latest N Feed messages are read from the Feed stream;
2) once the Feed ID list is obtained, the Feed content storage interface (which has a caching function) is called asynchronously, and the corresponding Feed content is read directly by ID;
3) the results of step 2) are combined and returned to the user;
the process of reading the Feed stream then ends;
when the user requests more data, step 1) may be repeated with the Score of the last Feed currently held as the upper bound of the range.
4. The data high-concurrency processing method of the Feed stream system of the healthy social platform according to claim 1, wherein triadic closure theory together with a common-friend and time-series recommendation algorithm is adopted; a time dimension is added on top of common friends, based on the assumption that a user is more interested in recently added friends.
5. The data high-concurrency processing method of the Feed stream system of the healthy social platform according to claim 4, wherein an empirical formula is adopted:
[Formula image DEST_PATH_IMAGE002: not reproduced in this text]
The larger the time difference, the smaller the weight; δ_{u,fi} is the time at which u established a friend relationship with fi, δ_{fi,fof} is the time at which fi established a friend relationship with fof, and -0.3 is a penalty factor.
6. The data high-concurrency processing method of the Feed stream system of the healthy social platform according to claim 5, wherein a two-level cache consisting of the local cache Ehcache and the centralized cache Redis is adopted to solve the overload of the social data cache;
a) with an in-memory cache, cached data is lost whenever the application restarts; the resulting cache avalanche puts heavy pressure on the database and stalls the application;
b) with an in-memory cache, multiple application nodes cannot share cached data;
c) with a centralized cache, the large volume of data fetched through the cache makes the throughput of the cache service too high and saturates the bandwidth;
the symptom is that the load on the Redis service is not high, yet reads are very slow because the bandwidth of the machine's network card is saturated;
problems a) and b) are addressed by caching data in Redis, which in turn makes problem c) hard to avoid;
when problem c) occurs, a Redis cluster is adopted to reduce the pressure on the cache service.
7. The data high-concurrency processing method of the Feed stream system of the healthy social platform according to claim 6, wherein an existing in-memory cache framework is used as the first-level cache, and Redis is used as the second-level cache.
8. The data high-concurrency processing method of the Feed stream system of the healthy social platform according to claim 7, wherein all data is read from the first-level cache first and is read from the second-level cache only when it is absent there, which reduces the number of accesses to the second-level cache Redis.
9. The data high-concurrency processing method of the Feed stream system of the healthy social platform according to claim 8, wherein the first-level cache framework can limit, through configuration, the amount of data held in memory, which avoids memory overflow.
CN202011439825.5A 2020-12-11 2020-12-11 Data high-concurrency processing method of Feed stream system of healthy social platform Pending CN112861016A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011439825.5A CN112861016A (en) 2020-12-11 2020-12-11 Data high-concurrency processing method of Feed stream system of healthy social platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011439825.5A CN112861016A (en) 2020-12-11 2020-12-11 Data high-concurrency processing method of Feed stream system of healthy social platform

Publications (1)

Publication Number Publication Date
CN112861016A true CN112861016A (en) 2021-05-28

Family

ID=75997104

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011439825.5A Pending CN112861016A (en) 2020-12-11 2020-12-11 Data high-concurrency processing method of Feed stream system of healthy social platform

Country Status (1)

Country Link
CN (1) CN112861016A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113961829A (en) * 2021-10-15 2022-01-21 上海一谈网络科技有限公司 Improved friend list generation method, device and equipment based on push-pull model
CN115052040A (en) * 2022-04-26 2022-09-13 浪潮通信技术有限公司 Feed stream implementation method, system, electronic device and storage medium
CN116132393A (en) * 2023-02-02 2023-05-16 网易(杭州)网络有限公司 Method, device, electronic equipment and computer medium for publishing and querying message

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103177027A (en) * 2011-12-23 2013-06-26 北京新媒传信科技有限公司 Method and system for obtaining dynamic feed index
CN103516765A (en) * 2012-06-30 2014-01-15 北京新媒传信科技有限公司 Storage method and system of social networking service background data
US20140173451A1 (en) * 2012-12-14 2014-06-19 Microsoft Corporation Creating tasks based on newsfeed user entries
CN104516915A (en) * 2013-09-30 2015-04-15 腾讯科技(北京)有限公司 Media data publishing method and device based on tweet timeline
WO2020028308A1 (en) * 2018-07-31 2020-02-06 Facebook, Inc. Dynamic location monitoring for targeted updates

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113961829A (en) * 2021-10-15 2022-01-21 上海一谈网络科技有限公司 Improved friend list generation method, device and equipment based on push-pull model
CN115052040A (en) * 2022-04-26 2022-09-13 浪潮通信技术有限公司 Feed stream implementation method, system, electronic device and storage medium
CN115052040B (en) * 2022-04-26 2024-04-19 浪潮通信技术有限公司 Feed stream implementation method, system, electronic device and storage medium
CN116132393A (en) * 2023-02-02 2023-05-16 网易(杭州)网络有限公司 Method, device, electronic equipment and computer medium for publishing and querying message

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210528