CN112418941A - Resource popularity calculation method, system and storage medium based on real-time streaming

Info

Publication number: CN112418941A
Application number: CN202011349407.7A
Authority: CN
Inventors: 万仕龙; 顾永兴; 仲跻炜; 朱彭生; 冯若寅; 梁东梅
Original assignee: Ouye Yunshang Co ltd
Current assignee: Ouye Yunshang Co ltd
Priority date: 2020-11-26
Filing date: 2020-11-26
Publication date: 2021-02-26

Abstract

The invention relates to a resource popularity calculation method, system and storage medium based on real-time streaming. The method comprises: a behavior data acquisition step, in which user behavior data for interaction events between each client and the server are collected into a message middleware by server-side event tracking (buried points); a transaction data acquisition step, in which transaction information is synchronized to the message middleware in real time by log synchronization; and a data layering step, in which a source data layer, a public data layer and an application data layer are constructed in the message middleware by a data layering method, and aggregation is performed by a real-time calculation engine to obtain resource popularity index values. Compared with the prior art, the method avoids information loss, ensures that data are processed in real time, and offers high reliability.

Description

Resource popularity calculation method, system and storage medium based on real-time streaming

Technical Field

The invention relates to the technical field of big data processing, and in particular to a resource popularity calculation method, system and storage medium based on real-time streaming.

Background

With the progress of network communication technology and the spread of broadband networks, network retail platforms are increasingly widely developed and used. For a seller on such a platform, it is very important to know in time how the shop's resources are being accessed. However, as the number of online buyers grows, the original approach of analyzing the popularity of store resources from search and transaction information obtained offline can no longer meet sellers' need to adjust store resources in time. Meanwhile, user behavior data are collected by embedding tracking code in the front end; this approach can only capture user behavior on the PC end, cannot cover more complex channels such as mobile APPs and applets, and therefore loses information.

Disclosure of Invention

The invention aims to overcome the defects of the prior art by providing a resource popularity calculation method, system and storage medium based on real-time streaming that are suitable for a network retail platform, avoid information loss, ensure that data are processed in real time, and improve the reliability of data processing.

The purpose of the invention can be realized by the following technical scheme:

A resource popularity calculation method based on real-time streaming performs calculation on the user transaction data and user behavior data of a network retail platform according to the display dimension of the front end, and specifically comprises the following steps:

Behavior data acquisition step:

User behavior data for interaction events between each client of the network retail platform and the server are collected into the message middleware by server-side event tracking (buried points).

Transaction data acquisition step:

Transaction information is synchronized to the message middleware in real time by log synchronization. The transaction information includes order data and resource number information.

Data layering step:

A source data layer, a public data layer and an application data layer are constructed in the message middleware by a data layering method, and aggregation is performed by a real-time calculation engine to obtain resource popularity index values. The source data layer comprises a click behavior unit, a browse-details behavior unit, a view-warranty behavior unit, a bid behavior unit and an add-to-cart behavior unit.

In the data layering step, the user behavior data pushed by the server-side tracking points are stored, by behavior type, in the click behavior unit, the browse-details behavior unit, the view-warranty behavior unit, the bid behavior unit and the add-to-cart behavior unit of the source data layer of the message middleware.

Further, the message middleware is the distributed publish-subscribe messaging system Kafka.

In the data layering step, stream processing is used to parse the log information of the source data layer, records with an empty resource number are filtered out of the user behavior data, and a transaction wide table and a user behavior wide table matching the display dimension of the front end are generated as public data layer data. Specifically, the real-time calculation engine parses the user behavior data and the order data and checks whether the resource number in each user behavior record is empty. If it is empty, the record is excluded from the statistics; otherwise, the transaction wide table and the user behavior wide table of the message middleware are constructed in the transaction domain and the behavior domain respectively, according to these two data domains.
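The filtering and domain split described above can be sketched in plain Python. This is only an illustration: the field names `resource_no`, `behavior` and `order_id` and both function names are hypothetical, and in the patent this step runs inside the real-time calculation engine rather than over in-memory lists.

```python
def has_resource_number(record):
    """Return True only if the record carries a non-blank resource number."""
    value = record.get("resource_no")  # hypothetical field name
    return value is not None and str(value).strip() != ""


def split_into_domains(behavior_records, order_records):
    """Drop behavior records with an empty resource number, then group by domain."""
    return {
        "behavior": [r for r in behavior_records if has_resource_number(r)],
        "transaction": list(order_records),
    }


behaviors = [
    {"behavior": "click", "resource_no": "R001"},
    {"behavior": "click", "resource_no": ""},    # dirty record: empty number
    {"behavior": "bid", "resource_no": None},    # dirty record: missing number
]
orders = [{"order_id": "O1", "resource_no": "R001"}]

domains = split_into_domains(behaviors, orders)
```

Only the first behavior record survives the filter; the order records pass through unchanged because the text applies the empty-number check to behavior data only.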

The transaction wide table is constructed as follows:

The real-time calculation engine first joins the records of the order main table and the order sub-table on the order ID, then aggregates the data according to the display dimension of the front end, marks the source, and assigns the corresponding resource popularity score, generating a public data layer data table.

The user behavior wide table is constructed as follows:

The real-time calculation engine parses the detail data of each behavior (click, browse details, view warranty, bid and add to cart), groups the data according to the display dimension of the front end, and aggregates the resource popularity. The resource popularity is aggregated as follows:

A scoring rule is set according to the depth of the behavior, and each user behavior is given a resource popularity score, with the add-to-cart behavior scoring highest. The resource popularity scores of all listed user behaviors are then combined along the same dimensions to calculate the resource popularity and generate the behavior wide table.

Further, the real-time calculation engine is the open-source stream processing framework Flink.

The data layering step further comprises a calculation result output step: real-time aggregation is performed according to different dimensions, the result is output to the distributed storage system HBase, and data services are provided to the front end by querying HBase. Intermediate results of the aggregation are stored in HBase or in the message middleware, and the final result is provided to the front end for display through a generated data service API.

Further, the clients include, but are not limited to, a PC end, a mobile end and an applet.

In another aspect, the present invention provides a resource popularity calculation system based on real-time streaming, including:

a data acquisition module, which collects user behavior data for interaction events between the client and the server by server-side event tracking (buried points);

a data layering construction module, which classifies and layers the collected data from sources with different attributes by a data layering method, the layers comprising a source data layer, a public data layer and an application data layer;

a data processing module, which parses the source data layer logs by stream processing, filters out records with an empty resource number, and generates a transaction wide table and a user behavior wide table of the corresponding dimensions;

a resource popularity calculation module, which parses the detail data of each behavior in the source data layer through the real-time calculation engine and calculates a resource popularity value from the parsed data;

the distributed storage system HBase, which stores the data processing result and provides a data query service;

and a front end, which sets the display dimension for resource popularity and displays the data served by the distributed storage system HBase.

Further, the distributed storage system HBase provides a data service API exposed to the front end.

Another aspect of the present invention provides a computer-readable storage medium having stored therein a computer program executable by at least one processor to implement the steps of the real-time streaming based resource popularity calculation method as described above.

Compared with the prior art, the resource popularity calculation method, system and storage medium based on real-time streaming have at least the following beneficial effects:

1) The collection of user behavior data on the network retail platform is moved from front-end tracking points to server-side tracking points, which improves data accuracy and also allows behavior data from complex mobile clients to be collected, thereby avoiding information loss.

2) Real-time data storage is built on the message middleware, which is the stable message queue Kafka; it therefore offers high throughput, low latency, high concurrency, high fault tolerance and high scalability.

3) Processing is performed by a real-time calculation engine, Flink. Flink is a unified stream and batch engine with high throughput, low latency, highly flexible streaming windows and a lightweight fault-tolerance mechanism, which ensures that data are processed in real time.

4) A data layering method is used to classify and layer data from sources with different attributes, which increases data reuse and gives good extensibility. When new behaviors or transaction types are to be analyzed, the existing public data layer data can be reused; when new behaviors or dimensions appear, only the calculation task of the public data layer needs to be restructured, without creating a new task, which reduces computing resource overhead.

5) Result data are stored in the column-oriented distributed storage system HBase, which also provides the query service. This supports massive data volumes and high concurrency, with lower latency and higher reliability for the data service.

Drawings

Fig. 1 is a schematic flow chart of a resource popularity calculation method based on real-time streaming in an embodiment.

Detailed Description

The invention is described in detail below with reference to the figures and specific embodiments. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, shall fall within the scope of protection of the present invention.

Examples

The invention relates to a resource popularity calculation method based on real-time streaming. It uses multiple kinds of user behavior data and transaction data collected in real time from a network retail platform, calculates a resource popularity value by real-time statistical processing on a message middleware, and finally supplies the resource popularity value to the platform as a real-time data service. As shown in Fig. 1, the method comprises the following steps:

Step one, data acquisition:

Behavior data acquisition:

User behavior data for interaction events between the clients of the network retail platform (PC end, mobile end and applet) and the associated server are collected into a message middleware by server-side event tracking (buried points). The user behavior data include click behavior data, browse-details behavior data, view-warranty behavior data, bid behavior data and add-to-cart behavior data.

Transaction data acquisition:

Transaction information is synchronized to the message middleware in real time by log synchronization. The transaction information includes order data, resource numbers and the like.

Real-time data storage is built on the message middleware, which is the stable message queue Kafka; it therefore offers high throughput, low latency, high concurrency, high fault tolerance and high scalability.

Step two, data layering processing:

After data acquisition, a data layering method is used to construct a source data layer, a public data layer and an application data layer in the message middleware, and aggregation is then performed by a real-time calculation engine to obtain resource popularity index values. In other words, the real-time data warehouse as a whole is divided into three layers: a source data layer, a public layer (the DWD and DWS layers) and an application layer.

The calculation engine is Flink. The main reasons for choosing Flink as the stream computing engine are: high throughput, low latency and high performance; highly flexible streaming windows; exactly-once semantics for stateful computation; a lightweight fault-tolerance mechanism; support for event time and out-of-order events; and a unified stream and batch engine. Online traffic data flow directly into Kafka or another message storage system, and Flink consumes the data in real time for calculation.

Specifically, the method comprises the following steps:

2.1) Source data layer:

The user behavior data pushed by the tracking points are stored in TPXHB01 (click behavior), TPXHB02 (browse-details behavior), TPXHB03 (view-warranty behavior), TPXHB04 (bid behavior) and TPXHB05 (add-to-cart behavior) of the source data layer of the message middleware. At the same time, the transaction-side order main table TPXH_ORDER_M, order sub-table TPXH_ORDER_D and bidding process table XH TPBID_RECORD are synchronized to the source data layer of the message middleware by log synchronization.
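As a rough illustration of how server-side tracking could route events into the five source-layer units, the sketch below uses an in-memory dictionary as a stand-in for the message middleware. The unit names come from the embodiment; the behavior keys, the message fields and the `track` function are assumptions made for this example, and a real deployment would publish to Kafka instead of appending to a list.

```python
import json
from collections import defaultdict

# Unit names are taken from the embodiment; the behavior keys are hypothetical.
BEHAVIOR_UNITS = {
    "click": "TPXHB01",
    "browse_detail": "TPXHB02",
    "view_warranty": "TPXHB03",
    "bid": "TPXHB04",
    "add_to_cart": "TPXHB05",
}

# In-memory stand-in for the message middleware: unit name -> JSON messages.
source_layer = defaultdict(list)


def track(behavior, client, buyer, resource_no, event_time):
    """Record one interaction event on the server side and push it to its unit."""
    unit = BEHAVIOR_UNITS[behavior]  # raises KeyError for an unknown behavior
    message = json.dumps({
        "behavior": behavior,
        "client": client,            # e.g. "pc", "mobile" or "applet"
        "buyer": buyer,
        "resource_no": resource_no,
        "event_time": event_time,
    })
    source_layer[unit].append(message)
    return unit


track("click", "applet", "buyer-1", "R001", "2020-11-26T10:00:00")
track("add_to_cart", "pc", "buyer-1", "R001", "2020-11-26T10:05:00")
```

Because the event is recorded where the server handles the request, the same code path serves every client type, which is the property the server-side approach relies on.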

2.2) Public data layer, divided by data domain:

The source layer logs are parsed by stream processing, dirty records with an empty resource number are filtered out, and a transaction wide table and a user behavior wide table of the corresponding dimensions are generated as public data layer data. Specifically, the real-time calculation engine parses the behavior data and the order data. Both the transaction information and the behavior data carry a resource number, but the resource number in the behavior data may be empty, so the engine first checks whether it is empty. If it is empty, the record is excluded from the statistics; otherwise, the public layer transaction wide table and user behavior wide table of the message middleware are constructed in the transaction domain and the behavior domain respectively. Specifically:

2.3) Constructing the transaction wide table:

The real-time calculation engine first joins the records of the order main table TPXH_ORDER_M and the order sub-table TPXH_ORDER_D on the order ID. In this embodiment, the join uses the existing Flink stream join technique; the detail records are obtained directly from the joined streams, aggregated along certain dimensions, marked with their source, and assigned the corresponding score. For example, from the detail records, the transaction volume, transaction amount and resource popularity are aggregated by "deal time + seller + buyer + resource number + bundle number + variety + brand + specification + quality grade + region + source" (the source is marked as ORDER deal, and each transaction contributes a resource popularity score of 6 points), generating the public layer transaction data table DW_JY_ORDER.
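The join and aggregation in 2.3) can be sketched with plain Python dictionaries. This is not Flink code: it is a batch-style illustration of the same logic, the field names are hypothetical, the grouping key is shortened to a few of the listed dimensions, and it assumes that each joined sub-table row counts as one transaction worth 6 points.

```python
from collections import defaultdict

TRANSACTION_SCORE = 6  # popularity points per transaction, as in the embodiment


def build_transaction_table(order_main, order_detail):
    """Join main and sub-table rows on order_id, then aggregate per dimension key."""
    main_by_id = {m["order_id"]: m for m in order_main}
    table = defaultdict(lambda: {"volume": 0, "amount": 0, "popularity": 0})
    for detail in order_detail:
        main = main_by_id.get(detail["order_id"])
        if main is None:
            continue  # inner join: skip sub-table rows without a main record
        key = (main["deal_date"], main["seller"], main["buyer"],
               detail["resource_no"], "order")  # last element marks the source
        row = table[key]
        row["volume"] += detail["weight"]
        row["amount"] += detail["amount"]
        row["popularity"] += TRANSACTION_SCORE
    return dict(table)


order_main = [
    {"order_id": "O1", "deal_date": "2020-11-26", "seller": "S1", "buyer": "B1"},
]
order_detail = [
    {"order_id": "O1", "resource_no": "R001", "weight": 10, "amount": 500},
    {"order_id": "O1", "resource_no": "R001", "weight": 5, "amount": 250},
    {"order_id": "O9", "resource_no": "R777", "weight": 1, "amount": 1},
]

transaction_table = build_transaction_table(order_main, order_detail)
```

In a streaming engine the join would be bounded by a window or state retention policy, since the two streams do not arrive together; the sketch ignores timing entirely.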

2.4) Constructing the behavior wide table:

The real-time calculation engine parses the behavior detail data in TPXHB01, TPXHB02, TPXHB03, TPXHB04 and TPXHB05. In this embodiment, the Flink engine parses the JSON-format records directly, using existing techniques, to extract the key information required. The records are then grouped along certain dimensions according to the actual transaction rules, for example by "behavior time + seller + buyer + resource number + bundle number + variety + brand + specification + quality grade + region + behavior name", and the resource popularity is aggregated.

Since each behavior reflects how deeply the user wants to learn about the product, the invention defines the scoring rule according to the depth of the behavior and scores the behaviors in increasing order, with the add-to-cart behavior scoring highest. For example, by depth of behavior, TPXHB01 (click behavior) scores 1 resource popularity point, TPXHB02 (browse-details behavior) scores 2 points, TPXHB03 (view-warranty behavior) scores 3 points, TPXHB04 (bid behavior) scores 4 points, and TPXHB05 (add-to-cart behavior) scores 5 points. The resource popularity scores of the listed user behaviors are combined along the same dimensions according to this rule, and the behavior wide table DW_XW_BEHAVIOR is generated.
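The scoring rule above amounts to a lookup table followed by a grouped sum. The sketch below shows that logic in plain Python; the scores and unit names are those of the embodiment, while the event fields and the shortened grouping key are hypothetical.

```python
from collections import Counter

# Scores per source-layer unit, as given in the embodiment.
BEHAVIOR_SCORES = {
    "TPXHB01": 1,  # click
    "TPXHB02": 2,  # browse details
    "TPXHB03": 3,  # view warranty
    "TPXHB04": 4,  # bid
    "TPXHB05": 5,  # add to cart
}


def build_behavior_table(events):
    """Sum popularity scores per (date, seller, resource number, unit) key."""
    table = Counter()
    for event in events:
        key = (event["date"], event["seller"], event["resource_no"], event["unit"])
        table[key] += BEHAVIOR_SCORES[event["unit"]]
    return dict(table)


events = [
    {"date": "2020-11-26", "seller": "S1", "resource_no": "R001", "unit": "TPXHB01"},
    {"date": "2020-11-26", "seller": "S1", "resource_no": "R001", "unit": "TPXHB01"},
    {"date": "2020-11-26", "seller": "S1", "resource_no": "R001", "unit": "TPXHB05"},
]

behavior_table = build_behavior_table(events)
```

Two clicks contribute 2 points and one add-to-cart contributes 5, so a single deep interaction outweighs several shallow ones.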

2.5) Merging to generate the application data layer resource popularity table:

Using stream processing, the transaction wide table DW_JY_ORDER and the behavior wide table DW_XW_BEHAVIOR are merged along the resource popularity detail dimensions, namely time, seller, buyer, resource number, bundle number, variety, brand, specification, quality grade, region and source (behavior name), to generate the application layer resource popularity table DM_ZY_POP.
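One plausible reading of this merge is a union of the two wide tables over their shared dimensions, with the source column distinguishing transactions from individual behaviors. The sketch below follows that reading; the function name, the row fields and the reduced dimension set are assumptions for illustration, not the patent's schema.

```python
from collections import Counter


def merge_popularity(transaction_rows, behavior_rows):
    """Union both wide tables and sum popularity per (dimensions, source) key."""
    merged = Counter()
    for row in list(transaction_rows) + list(behavior_rows):
        key = (row["date"], row["seller"], row["resource_no"], row["source"])
        merged[key] += row["popularity"]
    return [
        {"date": d, "seller": s, "resource_no": r, "source": src, "popularity": p}
        for (d, s, r, src), p in sorted(merged.items())
    ]


transaction_rows = [
    {"date": "2020-11-26", "seller": "S1", "resource_no": "R001",
     "source": "order", "popularity": 12},
]
behavior_rows = [
    {"date": "2020-11-26", "seller": "S1", "resource_no": "R001",
     "source": "click", "popularity": 2},
    {"date": "2020-11-26", "seller": "S1", "resource_no": "R001",
     "source": "add_to_cart", "popularity": 5},
]

popularity_table = merge_popularity(transaction_rows, behavior_rows)
```

Keeping the source as a column means later aggregations can either break popularity down by behavior or sum across all sources for a single figure per resource.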

Step three, outputting the calculation result:

The calculation follows the dimension displayed by the front end. For example, if the front end displays the "variety + brand" dimension, the data are aggregated by variety and brand. In this embodiment the resource popularity is aggregated at the detail dimension, namely "date + seller + buyer + resource number + bundle number + variety + brand + specification + quality grade + region". After real-time aggregation along the different dimensions, the results of the application layer resource popularity table are output to HBase, a scalable column-oriented distributed storage system, and data services are finally provided to the front end for display by querying HBase.
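Re-aggregating the detail rows to a front-end dimension such as "variety + brand" can be sketched as follows. The row values are made up, and the row-key layout in `to_row_keys` is purely an assumption to show how aggregated results might be keyed in a key-value store; the patent does not specify an HBase row-key design.

```python
from collections import Counter


def aggregate(rows, dimensions):
    """Sum popularity over the chosen display dimensions."""
    totals = Counter()
    for row in rows:
        totals[tuple(row[d] for d in dimensions)] += row["popularity"]
    return dict(totals)


def to_row_keys(totals, dimensions):
    """Flatten aggregated results into key -> value pairs (hypothetical key layout)."""
    prefix = "+".join(dimensions)
    return {prefix + "|" + "|".join(key): value for key, value in totals.items()}


rows = [
    {"variety": "V1", "brand": "A", "region": "East", "popularity": 12},
    {"variety": "V1", "brand": "A", "region": "North", "popularity": 5},
    {"variety": "V2", "brand": "B", "region": "East", "popularity": 3},
]

display_dims = ("variety", "brand")
totals = aggregate(rows, display_dims)
store = to_row_keys(totals, display_dims)
```

Because the detail table keeps every dimension, any coarser display dimension can be derived from it by changing only the `dimensions` argument.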

In this embodiment, as a preferred scheme, the calculation result is stored in HBase, and a data service API is then generated and provided to the front end for display. That is, a unified interface service layer (such as OneService) provides a Dubbo interface through which service consumers obtain the index data, and the front end displays it.

In addition, the invention provides a resource popularity calculation system based on real-time streaming, which comprises a data acquisition module, a data layering construction module, a data processing module, a resource popularity calculation module, the distributed storage system HBase and a front end, wherein:

The data acquisition module collects user behavior data for interaction events between the client and the server by server-side event tracking (buried points), and synchronizes transaction information to the message middleware in real time by log synchronization.

The data layering construction module classifies and layers the collected data from sources with different attributes by a data layering method; the layers are a source data layer, a public data layer and an application data layer.

The data processing module parses the source data layer logs by stream processing, filters out dirty records with an empty resource number, and generates a transaction wide table and a user behavior wide table of the corresponding dimensions. For the transaction wide table, the real-time calculation engine first joins the records of the order main table and the order sub-table on the order ID, aggregates the data according to the display dimension of the front end, marks the source, and assigns the corresponding resource popularity score. For the user behavior wide table, a scoring rule is set according to the depth of the behavior, each user behavior is given a resource popularity score, and the scores of all listed user behaviors are combined along the same dimensions to calculate the resource popularity.

The resource popularity calculation module parses the detail data of each behavior in the source data layer through the real-time calculation engine and calculates a resource popularity value from the parsed data.

The distributed storage system HBase stores the data processing result and provides a data query service.

The front end sets the display dimension for resource popularity and displays the data served by the distributed storage system HBase.

The present invention further provides a computer-readable storage medium, which is a non-volatile readable storage medium and stores a computer program, where the computer program is executable by at least one processor to implement the operation of the resource popularity calculation method or system based on real-time streaming.

While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and those skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A resource popularity calculation method based on real-time streaming, characterized in that the method performs calculation on user transaction data and user behavior data of a network retail platform according to the display dimension of a front end, and comprises the following steps:

a behavior data acquisition step: collecting user behavior data for interaction events between each client of the network retail platform and a server into a message middleware by server-side event tracking (buried points);

a transaction data acquisition step: synchronizing transaction information to the message middleware in real time by log synchronization;

a data layering step: constructing a source data layer, a public data layer and an application data layer in the message middleware by a data layering method, and performing aggregation through a real-time calculation engine to obtain resource popularity index values.

2. The resource popularity calculation method based on real-time streaming according to claim 1, wherein the transaction information includes order data and resource number information.

3. The resource popularity calculation method based on real-time streaming according to claim 1, wherein the source data layer includes a click behavior unit, a browse-details behavior unit, a view-warranty behavior unit, a bid behavior unit and an add-to-cart behavior unit.

4. The resource popularity calculation method based on real-time streaming according to claim 3, wherein in the data layering step, user behavior data pushed by the server-side tracking points are stored, by behavior type, in the click behavior unit, the browse-details behavior unit, the view-warranty behavior unit, the bid behavior unit and the add-to-cart behavior unit of the source data layer of the message middleware.

5. The resource popularity calculation method based on real-time streaming according to claim 4, wherein the message middleware is the distributed publish-subscribe messaging system Kafka.

6. The resource popularity calculation method based on real-time streaming according to claim 1, wherein in the data layering step, stream processing is used to parse log information of the source data layer, records with an empty resource number are filtered out of the user behavior data, and a transaction wide table and a user behavior wide table matching the display dimension of the front end are generated as public data layer data.

7. The resource popularity calculation method based on real-time streaming according to claim 6, wherein the real-time calculation engine parses the user behavior data and the order data and checks whether the resource number in the user behavior data is empty; if the resource number is empty, the record is excluded from the statistics; otherwise, according to the two data domains of transaction and behavior, a transaction wide table and a user behavior wide table of the message middleware are constructed in the transaction domain and the behavior domain respectively.

8. The resource popularity calculation method based on real-time streaming according to claim 7, wherein the specific contents for constructing the transaction wide table are as follows:

9. The resource popularity calculation method based on real-time streams as claimed in claim 6, wherein the specific contents for constructing the user behavior broad table are as follows:

parsing, by the real-time calculation engine, the detail data of each of the click behavior, browse-details behavior, view-warranty behavior, bid behavior and add-to-cart behavior, grouping the data according to the display dimension of the front end, and aggregating the resource popularity.

10. The resource popularity computation method based on real-time streaming according to claim 9, wherein the specific contents of the aggregated computation resource popularity are as follows:

11. The resource popularity calculation method based on real-time streaming according to claim 8 or 10, wherein the real-time calculation engine is the open-source stream processing framework Flink.

12. The resource popularity calculation method based on real-time streaming according to claim 1, wherein the data layering step further includes a calculation result output step, in which real-time aggregation is performed according to different dimensions, the result is output to a distributed storage system HBase, and data services are provided to the front end by querying HBase.

13. The resource popularity calculation method based on real-time streaming according to claim 12, wherein intermediate results of the aggregation are stored in the distributed storage system HBase or in the message middleware, and the final calculation result is provided to the front end for display through a generated data service API.

14. The resource popularity calculation method based on real-time streaming according to claim 1, wherein the clients comprise a PC end, a mobile end and an applet.

15. A real-time streaming based resource popularity computation system, the system comprising:

16. The resource popularity calculation system based on real-time streaming according to claim 15, wherein the distributed storage system HBase provides a data service API exposed to the front end.

17. A computer-readable storage medium, having stored thereon a computer program executable by at least one processor to perform the steps of the real-time streaming based resource popularity calculation method according to any one of claims 1-14.