CN112861016A

CN112861016A - Data high-concurrency processing method of Feed stream system of healthy social platform

Info

Publication number: CN112861016A
Application number: CN202011439825.5A
Authority: CN
Inventors: 吕小健; 况红波
Original assignee: Shenzhen Pantaoshu Technology Co ltd
Current assignee: Shenzhen Pantaoshu Technology Co ltd
Priority date: 2020-12-11
Filing date: 2020-12-11
Publication date: 2021-05-28

Abstract

The invention belongs to the technical field of high-concurrency data processing methods, and particularly relates to a method in which a tweet is stored in a database and its tweetmeta is stored in a timeline, comprising: S1, when a user publishes a tweet, writing the tweetmeta into the timeline lists through fanout according to the social graph; because only the metadata is stored, pushing massive amounts of metadata can be well supported by means of Redis; S2, when a user views their own timeline, taking the tweetmeta directly from that timeline and then obtaining the corresponding tweet data from the DB. By upgrading the SNS social system, a Feed stream service system handling data at the million level can be designed and developed with better system performance; the processing method optimizes and enhances the social recommendation algorithm, improves the accuracy of data matching, and solves the problem of duplicate data in social relations; the method effectively encourages users to interact more frequently and be more active, and improves the user experience.

Description

Data high-concurrency processing method of Feed stream system of healthy social platform

Technical Field

The invention belongs to the technical field of data high-concurrency processing methods, and particularly relates to a data high-concurrency processing method of a Feed stream system of a healthy social platform.

Background

Big data acquisition starts from determining the user's target, collects all structured, semi-structured and unstructured data within that range, processes the collected data, and analyzes and mines valuable information from it. Big data collection faces two main challenges: first, the communication protocols and data protocols of heterogeneous Internet of Things devices are diverse; second, as massive numbers of devices are connected, highly concurrent data acquisition creates performance bottlenecks, causing problems such as data backlog and connection timeouts.

A Feed stream system has two critical cores: storage and push. The content to be stored in a Feed stream system falls into two parts: social relationships (such as friends, community members and follow lists) and Feed message content. The Feed push system needs two functions: publishing Feeds and reading Feed streams.

Existing systems cannot propagate data to subscribers dynamically and in real time, and SNS-based social network content push is not easy to implement. Given the problems exposed in the use of current high-concurrency data processing methods, structural improvement and optimization of these methods are needed.

Disclosure of Invention

In order to solve the problems in the prior art, the invention provides a data high concurrency processing method for a Feed stream system of a healthy social platform, which has the characteristics of facilitating dynamic real-time propagation to subscribers through Feed streams and realizing the content push of a social network based on SNS.

In order to achieve the purpose, the invention provides the following technical scheme: the tweet is stored in a database and the tweetmeta is stored in a timeline, and the method comprises the following steps:

S1, when a user publishes a tweet, the tweetmeta is written into the timeline lists through fanout according to the social graph; because only the metadata is stored, pushing massive amounts of metadata can be well supported by means of Redis;

S2, when a user views their own timeline, the tweetmeta is taken directly from that timeline, and the corresponding tweet data is then obtained from the DB.

As a preferred technical scheme of the data high-concurrency processing method of the Feed stream system of the healthy social platform, when a Feed message is published:

1) the Feed message first enters a queue service, and metadata (tweetMeta) such as the publisher, publishing domain, publishing time, content type and content id is extracted from the Feed message;

2) the Feed message is stored in a MySQL database, and after it is stored successfully, the metadata publishing service is called asynchronously to publish it;

3) the metadata publishing service extracts the IDs of the publisher, the publishing domain and the Feed message from the metadata, and calls the social relationship service to determine the list of queues to which the Feed must be pushed;

4) multiple rows of data are written into multiple Feed streams at once using the batch write interface of the metadata publishing service.
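The four publishing steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: plain Python dictionaries stand in for the MySQL content table and the Redis timelines, the social graph is a hard-coded friend list, and the names `TweetMeta`, `content_db`, `timelines`, `friends` and `publish` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TweetMeta:
    publisher: str
    domain: str          # publishing domain, e.g. "friend"
    publish_time: float
    content_type: str
    content_id: int


# Hypothetical stand-ins: a dict for the MySQL content table, a dict of
# lists for the per-user timelines, and a static social graph.
content_db = {}
timelines = defaultdict(list)
friends = {"alice": ["bob", "carol"]}


def publish(feed: dict) -> TweetMeta:
    # 1) extract the metadata from the Feed message
    meta = TweetMeta(feed["publisher"], feed["domain"], feed["time"],
                     feed["type"], feed["id"])
    # 2) persist the full content before anything is pushed
    content_db[meta.content_id] = feed["body"]
    # 3) ask the (stand-in) social graph which timelines to push to
    targets = friends.get(meta.publisher, [])
    # 4) fan out: write only the metadata into every target timeline
    for user in targets:
        timelines[user].append(meta)
    return meta
```

Only the small `TweetMeta` record is copied once per recipient; the message body is stored once, which is the memory saving the scheme relies on.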

As a preferred technical scheme of the data high-concurrency processing method of the Feed stream system of the healthy social platform, when the Feed stream is read:

1) the Feed IDs of the latest N Feed messages are read from the Feed stream;

2) after the Feed ID list is obtained, the Feed content storage interface (which has a caching function) is called asynchronously, and the corresponding Feed content is read directly by ID;

3) the results of step 2) are combined and returned to the user, which ends the process of reading the Feed stream. When the user requests more data, step 1) can be repeated with the Score of the last Feed currently held as the upper bound of the range.
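The read flow and its score-based paging can be sketched as below. This is an illustrative sketch only: the timeline is modelled as a list of `(score, feed_id)` pairs, the content store as a dictionary, and the function name `read_feed` is hypothetical. Treating the upper bound as exclusive is an assumption made here to avoid returning the same Feed twice.

```python
def read_feed(timeline, content_store, n, max_score=float("inf")):
    """Return up to n newest items with score < max_score, plus a cursor."""
    # 1) pick the latest N feed ids below the upper bound
    page = sorted((e for e in timeline if e[0] < max_score), reverse=True)[:n]
    # 2) look up the content for each id
    # 3) combine the results in timeline order
    items = [(score, content_store[feed_id]) for score, feed_id in page]
    # The score of the last item returned becomes the bound for the next page.
    next_max = page[-1][0] if page else None
    return items, next_max
```

A caller keeps passing the returned cursor back as `max_score` until an empty page comes back.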

As a preferred technical scheme of the data high-concurrency processing method of the Feed stream system of the healthy social platform, a common-friend, time-series recommendation algorithm based on triadic closure theory is adopted; a time dimension is added on top of common friends, based on the assumption that users are more interested in newly added friends.

As a preferred technical scheme of the data high-concurrency processing method of the Feed stream system of the healthy social platform, an empirical formula (see FIG. 6) is adopted,

in which the larger the time difference, the smaller the weight; δ_{u,fi} is the time at which u established a friend relationship with fi, δ_{fi,fof} is the time at which fi established a friend relationship with fof, and -0.3 is a penalty factor.

As a preferred technical scheme of the data high-concurrency processing method of the Feed stream system of the healthy social platform, a two-level cache consisting of the local cache Ehcache and the centralized cache Redis is adopted to solve the overload of the social data cache.

a) when an in-memory cache is used, the cached data is lost once the application restarts, and the resulting cache avalanche puts huge pressure on the database and blocks the application;

b) when an in-memory cache is used, multiple application nodes cannot share the cached data;

c) when a centralized cache is used, large amounts of data are fetched through the cache, so the data throughput of the cache service becomes too high and the bandwidth is saturated. The symptom is that the load on the Redis service is not high, but reads are very slow because the bandwidth of the machine's network card is saturated;

when problems a and b are encountered, Redis is used to cache data, which in turn makes problem c difficult to avoid.

When problem c occurs, a Redis cluster is adopted to reduce the pressure on the cache service.

As a preferred technical scheme of the data high-concurrency processing method of the Feed stream system of the healthy social platform, an existing in-memory cache framework is used as the first-level cache, and Redis is used as the second-level cache.

As a preferred technical scheme of the data high-concurrency processing method of the Feed stream system of the healthy social platform, all data is read from the first-level cache first and is read from the second-level cache only when it is not found there, which reduces the number of accesses to the second-level cache Redis.

As a preferred technical scheme of the data high-concurrency processing method of the Feed stream system of the healthy social platform, the first-level cache framework can limit, through configuration, the amount of data held in memory, which avoids memory overflow.

Compared with the prior art, the invention has the beneficial effects that:

(1) by upgrading the SNS social system, a Feed stream service system handling data at the million level can be designed and developed with better system performance; the processing method optimizes and enhances the social recommendation algorithm, improves the accuracy of data matching, and solves the problem of duplicate data in social relations; the method effectively encourages users to interact more frequently and be more active, and improves the user experience;

(2) the method meets the service requirements and design goals of existing Feed stream systems, and effectively solves the problem of distributing content to subscribers dynamically and in real time over social relationships; by separating Feed content from metadata, the memory requirement of the Feed stream system is effectively reduced; by combining the application layer with common, stable open-source storage solutions, the method lowers the difficulty of implementation and greatly improves scalability.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:

FIG. 1 is a flow chart of a Feed stream distribution of the present invention;

FIG. 2 is a general block diagram of a Feed stream system map structure in the present invention;

FIG. 3 is a block diagram of the logic structure of a Feed stream system according to the present invention;

FIG. 4 is a flow chart of Feed stream reading in the present invention;

FIG. 5 is a flowchart of a time series recommendation algorithm;

FIG. 6 is a diagram of empirical equations.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Examples

Referring to FIGS. 1-6, the present invention provides the following technical solution: the tweet is stored in a database and the tweetmeta is stored in a timeline, and the method comprises the following steps:

S2, when a user views their own timeline, the tweetmeta is taken directly from that timeline, and the corresponding tweet data is then obtained from the DB. The test environment in this embodiment is a single CentOS 7.5 server with 4 cores, 8 GB of memory and a gigabit network card, with Redis 4.2, MySQL 5.7 and the feed stream system deployed in Docker. The stress-test client is Apache JMeter 5.3, running a thread group of 100 threads for 10 min.

Specifically, when a Feed message is published:

if the publishing domain is friends, the publisher's friend list is obtained through the social relationship service; if the publishing domain is a community, the community's member list is obtained through the social relationship service; in other cases, the corresponding Feed queues to push to are obtained according to the social relationship;

4) multiple rows of data are written into multiple Feed streams at once using the batch write interface of the metadata publishing service, which ends the process of publishing the Feed.
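The choice of push targets by publishing domain can be illustrated with a small dispatcher. This is a sketch under assumptions: the domain is encoded as a string such as `"friend"` or `"community:<id>"`, the social relationship service is replaced by a dictionary, and follower lists are used as the fallback for "other cases"; none of these encodings come from the source.

```python
def resolve_targets(publisher, domain, social):
    """Return the users whose Feed streams should receive the new Feed."""
    kind, _, target_id = domain.partition(":")
    if kind == "friend":
        # friend domain: push to the publisher's friend list
        return sorted(social["friends"].get(publisher, ()))
    if kind == "community":
        # community domain: push to the member list of that community
        return sorted(social["communities"].get(target_id, ()))
    # other cases: fall back to another relationship (followers assumed here)
    return sorted(social["followers"].get(publisher, ()))
```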

Specifically, when the Feed stream is read:

1) the Feed IDs of the latest N Feed messages are read from the Feed stream (the read can be done over a range, where the start position of the range is the ID of the latest Feed read last time, and the end position can be the current time or MAX);

The processing method builds the feed push function on top of the Redis sorted set (zset).

1) Like sets, Redis sorted sets are collections of string elements and do not allow duplicate members.

2) The difference is that each element is associated with a score of type double, and Redis uses the score to order the members of the set from smallest to largest.

3) Members of a sorted set are unique, but scores can repeat.

4) Membership is tracked with a hash table, so looking up a member's score is O(1); the ordering of a sorted set is additionally kept in a skip list, so adding or removing a member is O(log N). The maximum number of members in a set is 2^32-1 (4294967295, that is, each set can store more than 4 billion members).
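The sorted-set properties listed above (unique members, repeatable scores, ascending order by score) can be demonstrated with a tiny in-memory stand-in. The class `MiniSortedSet` is hypothetical and is not a Redis client; its method names merely echo the Redis commands ZADD and ZRANGE, and it sorts on every read instead of maintaining a skip list.

```python
class MiniSortedSet:
    """Toy model of sorted-set semantics; not a Redis client."""

    def __init__(self):
        self._scores = {}  # member -> score; dict keys keep members unique

    def zadd(self, member, score):
        # Adding an existing member only updates its score.
        self._scores[member] = float(score)

    def zrange(self):
        # Ascending by score; equal scores fall back to member order.
        ordered = sorted(self._scores.items(), key=lambda kv: (kv[1], kv[0]))
        return [member for member, _ in ordered]

    def __len__(self):
        return len(self._scores)
```

In a feed timeline the member would typically be the Feed ID and the score a publish timestamp, so re-pushing the same Feed cannot create a duplicate entry.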

Specifically, a common-friend, time-series recommendation algorithm based on triadic closure theory is adopted; a time dimension is added on top of common friends; this embodiment is based on the assumption that users are more interested in newly added friends.

Specifically, an empirical formula (see FIG. 6) is used,

in which the larger the time difference, the smaller the weight; δ_{u,fi} is the time at which u established a friend relationship with fi, δ_{fi,fof} is the time at which fi established a friend relationship with fof, and -0.3 is a penalty factor. The penalty factor is an empirical parameter and needs to be tuned to the specific situation. The friend recommendation score can be calculated directly from the empirical formula, or it can be used as one feature dimension in a regression together with other features.
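The formula itself is not reproduced in the text, so the following is only a sketch of one form that is consistent with the description: each common friend fi contributes (δ_{u,fi} · δ_{fi,fof})^(-0.3), and the contributions are summed. Both this form and the reading of δ as a positive elapsed time since the friendship was established (for example, in days) are assumptions; the function names are hypothetical.

```python
def fof_score(deltas, penalty=-0.3):
    """Score one friend-of-friend candidate.

    deltas: one (delta_u_fi, delta_fi_fof) pair per common friend fi,
    each a positive elapsed time. Assumed form, not taken from the source.
    """
    return sum((d_u_fi * d_fi_fof) ** penalty for d_u_fi, d_fi_fof in deltas)


def rank_candidates(candidates):
    """Order candidates by descending score; candidates maps fof -> deltas."""
    scores = {fof: fof_score(pairs) for fof, pairs in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

With a negative exponent, older relationships (larger δ) contribute less, which matches the stated behaviour that a larger time difference gives a smaller weight. More common friends add more terms and raise the score.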

Specifically, a two-level cache consisting of the local cache Ehcache and the centralized cache Redis is adopted to solve the overload of the social data cache.

Specifically, an existing in-memory cache framework is used as the first-level cache, and Redis is used as the second-level cache.

Specifically, all data is read from the first-level cache first and is read from the second-level cache only when it is not found there, which reduces the number of accesses to the second-level cache Redis.

Specifically, the first-level cache framework can be configured to limit the amount of data held in memory, so as to avoid memory overflow.
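The lookup order of the two-level cache can be sketched as follows. This is a simplified model, not the Ehcache or Redis API: a size-bounded `OrderedDict` with least-recently-used eviction stands in for the first-level cache, a plain dictionary stands in for Redis, and `loader` is a hypothetical callback that fetches from the database on a full miss.

```python
from collections import OrderedDict


class TwoLevelCache:
    def __init__(self, l2, loader, l1_max_entries=2):
        self.l1 = OrderedDict()               # first level: local, bounded
        self.l2 = l2                          # second level: shared store
        self.loader = loader                  # hypothetical DB fallback
        self.l1_max_entries = l1_max_entries  # configured memory bound
        self.l2_reads = 0                     # counts trips to the second level

    def _put_l1(self, key, value):
        self.l1[key] = value
        self.l1.move_to_end(key)
        # Evict least recently used entries so memory stays bounded.
        while len(self.l1) > self.l1_max_entries:
            self.l1.popitem(last=False)

    def get(self, key):
        if key in self.l1:                    # first-level hit: no network
            self.l1.move_to_end(key)
            return self.l1[key]
        self.l2_reads += 1                    # first-level miss: go to level 2
        if key in self.l2:
            value = self.l2[key]
        else:
            value = self.loader(key)          # miss on both levels
            self.l2[key] = value
        self._put_l1(key, value)
        return value
```

Repeated reads of a hot key are served locally, so they add no traffic on the network card of the centralized cache, while the second level still survives application restarts and is shared between nodes.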

Because social Feeds are time-sensitive, the Feed stream does not need to store the complete Feed list; only the latest Feeds need to be stored;

the number of Feeds in each Feed stream is scanned periodically, and when it exceeds MAX_SIZE, part of the Feeds are cleared. This keeps the total amount of data stored in Redis relatively stable, which reduces the pressure on the server and improves the stability of the system;

when the requested time range goes beyond the data held in the Feed stream, the Feed IDs for the corresponding time period are selected directly from the stored Feed data, and the corresponding Feed content can then be returned through step 3) of the Feed reading process.
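The periodic trimming described above can be sketched on an in-memory timeline. The list of `(score, feed_id)` pairs, the value of `MAX_SIZE` and the function name `trim_timeline` are illustrative assumptions; on a Redis sorted set, a comparable effect could be obtained with a rank-based removal such as ZREMRANGEBYRANK, which is not shown here.

```python
MAX_SIZE = 3  # illustrative limit; the source does not give a value


def trim_timeline(timeline, max_size=MAX_SIZE):
    """Drop the oldest entries in place; return how many were removed."""
    if len(timeline) <= max_size:
        return 0
    timeline.sort()                    # ascending score: oldest first
    removed = len(timeline) - max_size
    del timeline[:removed]             # keep only the newest max_size entries
    return removed
```

Because only metadata is dropped from the stream, older Feeds remain readable from the persistent store.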

During the stress test, the average response time of the feed stream system was 92 ms and the TPS was 945; the mean response time was about 100 ms, rising to 200 ms in some periods; the mean TPS was 945, dropping to 600 in some periods. JVM monitoring data for the feed stream system shows that frequent write requests trigger short GCs in the JVM, which lower the TPS and lengthen the response time. The number of GCs can be reduced by increasing the JVM memory and tuning the GC parameters.

Feed: each status or message in a Feed stream is a Feed; for example, one status in a friend circle is a Feed, and one post on a microblog is a Feed.

Feed stream: a continuously updated stream of content presented to the user. Each person's friend circle, a microblog follow page and the like are Feed streams.

Timeline: a type of Feed stream; microblogs and friend circles are both Timeline-type Feed streams. Because the Timeline type appeared earliest and is the most widely used and best known, Timeline is sometimes used to mean Feed stream in general.

Personal page Timeline: the page that displays the Feed messages sent by the user, such as the album in WeChat or the personal page of a microblog.

The MetaData is a data format that records meta information (such as publisher, domain of publication, time of publication, type of content, id of content, etc.) of each piece of status or message in the Feed stream.

Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A data high-concurrency processing method of a Feed stream system of a healthy social platform is characterized in that a tweet is stored in a database, and a tweetmeta is stored in a timeline, and the method comprises the following steps:

2. The method for processing data of a healthy social platform Feed stream system according to claim 1, wherein the method comprises the following steps: when a Feed message is issued;

1) the Feed message first enters a queue service;

extracting metadata (tweetMeta) such as a publisher, a publishing domain, publishing time, a content type and a content id from the Feed message;

3. The method for processing data of a healthy social platform Feed stream system according to claim 2, wherein the method comprises the following steps: reading the Feed stream;

1) reading Feed IDs of the latest N Feed messages from the Feed stream;

3) combining the results in the step 2) and returning the results to the user;

the flow of reading the Feed stream is finished;

when the user requests more data, step 1) can be repeated with the Score of the last Feed currently held as the upper bound of the range.

4. The method for processing data of a healthy social platform Feed stream system according to claim 1, wherein the method comprises the following steps: adopting a common-friend, time-series recommendation algorithm based on triadic closure theory; a time dimension is added on top of common friends, based on the assumption that users are more interested in newly added friends.

5. The method for processing data of a healthy social platform Feed stream system according to claim 4, wherein the method comprises the following steps: using empirical formulas

6. The method for processing data of a healthy social platform Feed stream system according to claim 5, wherein the method comprises the following steps: a two-level cache consisting of the local cache Ehcache and the centralized cache Redis is adopted to solve the problem of overload of the social data cache;

c) when a centralized cache is used, large amounts of data are fetched through the cache, so the data throughput of the cache service becomes too high and the bandwidth is saturated;

the symptom is that the load on the Redis service is not high, but reads are very slow because the bandwidth of the machine's network card is saturated;

when problems a and b are encountered, Redis is used to cache data, which in turn makes problem c difficult to avoid;

7. The method for processing data of a healthy social platform Feed stream system according to claim 6, wherein the method comprises the following steps: an existing in-memory cache framework is used as the first-level cache, and Redis is used as the second-level cache.

8. The method for processing data of a healthy social platform Feed stream system according to claim 7, wherein the method comprises the following steps: all data is read from the first-level cache first and is read from the second-level cache only when it is not found there, which reduces the number of accesses to the second-level cache Redis.

9. The method for processing data of a healthy social platform Feed stream system according to claim 8, wherein the method comprises the following steps: the first-level cache framework can limit, through configuration, the amount of data held in memory, which avoids memory overflow.