CN112286895A

CN112286895A - Log real-time attribution processing method, device and platform

Info

Publication number: CN112286895A
Application number: CN202011194112.7A
Authority: CN
Inventors: 赵宏伟
Original assignee: Beijing Shenyan Intelligent Technology Co ltd
Current assignee: Beijing Shenyan Intelligent Technology Co ltd
Priority date: 2020-10-30
Filing date: 2020-10-30
Publication date: 2021-01-29
Anticipated expiration: 2040-10-30
Also published as: CN112286895B

Abstract

The invention discloses a log real-time attribution processing method, device and platform. The method includes: when the unique key of the data exists, updating the data through upsert; associating the upstream and downstream logs according to the upstream and downstream log relationship specified in the dependency attribution path; and complementarily updating the fields of the upstream and downstream logs and recording the association result in a first log and a second log, wherein the first log indicates whether the upstream log is successfully associated, and the second log indicates whether the downstream log is associated. The invention solves the technical problem in the prior art that data processing services are inefficient during the maintenance of customer data.

Description

Log real-time attribution processing method, device and platform

Technical Field

The invention relates to the technical field of computers, in particular to a log real-time attribution processing method, device and platform.

Background

In existing advertisement delivery systems, delivery data is generally managed by operation and maintenance personnel according to historical experience. However, the advertisement delivery systems used in the prior art have a single function and cannot meet the requirements of increasingly complex business for managing large amounts of data.

In view of the above-mentioned problem of low efficiency of data processing service in the maintenance process of client data in the prior art, no effective solution has been proposed at present.

Disclosure of Invention

The embodiment of the invention provides a log real-time attribution processing method, a log real-time attribution processing device and a log real-time attribution processing platform, which are used for at least solving the technical problem of low efficiency of data processing service in the maintenance process of client data in the prior art.

According to an aspect of an embodiment of the present invention, there is provided a log real-time attribution processing method, including: when the unique key of the data exists, updating the data through upsert (update-or-insert); associating the upstream log and the downstream log according to the upstream and downstream log relationship specified in the dependency attribution path; complementarily updating the fields of the upstream log and the downstream log, and recording the association result in a first log and a second log, wherein the first log indicates whether the upstream log is successfully associated, and the second log indicates whether the downstream log is associated.

Optionally, associating the upstream log and the downstream log according to the upstream and downstream log relationship specified in the dependency attribution path includes: judging whether the current node is a top node or the attribution path is empty; determining, according to the judgment result, whether to execute the logic by which a child node finds its parent node; and submitting the execution result to a log set.

Further, optionally, the method further includes: for the deletion, renaming and extraction relations used when a parent node finds a child node and when a child node finds a parent node: when none of the deletion, renaming and extraction relations exists, the data completion operation is not executed; when only the deletion relation exists, the specified fields are removed and the remaining fields are passed down; when only the renaming relation exists, the specified fields are renamed and passed down, and the remaining fields are discarded; when only the extraction relation exists, the specified fields are extracted and passed down, and the remaining fields are discarded; when renaming coexists with deletion or with extraction, the fields to be renamed are extracted and renamed, the deletion or extraction operation is executed on the remaining fields, and the two results are combined; the deletion and extraction relations cannot exist at the same time: when the number of fields to be retained is greater than a first preset value, the deletion operation is executed, and when the number of fields to be discarded is greater than a second preset value, the extraction operation is executed; if all fields need to be retained, deletion is set to an empty array so that no field is removed; the first parameter of each key-value pair inside renaming represents the new name, and the second parameter represents the original name.

Optionally, the method further includes: the first parameter of each key-value pair in the calls represents the field name of the parent node, and the second parameter represents the field name of the child node; when the mntprjId and the userId participate in the query, they are explicitly specified in the calls; the first log flag of the current node is true only when all traversed parent nodes have been attributed; when the first log flag is false, the exchange of fields between the parent node and the child node is not affected.

Optionally, the method further includes: the cases in which a node is to be submitted include: the current node has no parent node, that is, the current node is a top node; all ancestor nodes are attributed successfully, that is, the first log flag of each ancestor node is true, the child node contains the mpId of the ancestor node, and the option to submit the child node when attribution succeeds is set to true; attribution of the parent node fails and the option to submit the child node when attribution fails is set; attribution of the parent node fails, the node finds its own child node, and the option to submit the node itself is set; and if there are no fields to be exchanged, the value of the field is set to a third preset value.

Optionally, the method further includes: generating the attribution attributes defined for the data processing engine from the path of a specified project and the attribution attributes of that project.

According to another aspect of the embodiments of the present invention, there is also provided a monitoring platform, which is applied to the method described above, and includes: the system comprises a management webpage end, a management data processing engine and a data processing engine, wherein the management webpage end is used for providing a user use interface and authority verification; the management data processing engine is used for starting and stopping the appointed workflow and finishing the pre-release and release work of the project; and the data processing engine is used for operating the specified workflow and finishing the processing work of the data.

Optionally, the monitoring platform further includes: the database is used for searching or creating users for logs of different sources according to related user fields in the logs; and generating an association path at a session or user level for the event with the association relationship.

According to an aspect of an embodiment of the present invention, there is provided a log real-time attribution processing apparatus, including: an updating module, used for updating the data through upsert when the unique key of the data exists; an association module, used for associating the upstream log and the downstream log according to the upstream and downstream log relationship specified in the dependency attribution path; and a log generation module, used for complementarily updating the fields of the upstream log and the downstream log and recording the association result in a first log and a second log, wherein the first log indicates whether the upstream log is successfully associated, and the second log indicates whether the downstream log is associated.

Optionally, the association module includes: the judging unit is used for judging whether the top node or the attribution path is empty or not; the execution unit is used for determining whether to execute the execution logic of the child node for finding the parent node according to the judgment result; and the submitting unit is used for submitting the execution result to the log set.

In the embodiment of the invention, when the unique key of the data exists, the data is updated through upsert; the upstream log and the downstream log are associated according to the upstream and downstream log relationship specified in the dependency attribution path; the fields of the upstream log and the downstream log are complementarily updated, and the association result is recorded in a first log and a second log, wherein the first log indicates whether the upstream log is successfully associated, and the second log indicates whether the downstream log is associated. This achieves the purpose of managing data effectively, thereby improving the efficiency of the data processing service and solving the technical problem in the prior art that data processing services are inefficient during the maintenance of client data.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

FIG. 1 is a flow diagram of a log real-time attribution processing method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of an attribution flow in a log real-time attribution processing method according to an embodiment of the present invention;

FIG. 3 is a diagram illustrating an item data structure in a log real-time attribution processing method according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of an associated flow in a log real-time attribution processing method according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of a monitoring platform according to an embodiment of the invention;

FIG. 6 is a schematic diagram of monitoring the status of items as published in a platform, according to an embodiment of the invention;

FIG. 7 is a schematic diagram of a monitor platform being started in a different handler instance process in accordance with an embodiment of the present invention;

FIG. 8 is a schematic diagram of a timing diagram for publication in a monitoring platform according to an embodiment of the invention;

FIG. 9 is a schematic diagram of a split stream processing workflow in a monitoring platform according to an embodiment of the invention;

FIG. 10 is a schematic diagram of a data processing flow in a monitoring platform according to an embodiment of the invention;

FIG. 11 is a schematic diagram of a log real-time attribution processing device according to an embodiment of the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Example 1

In accordance with an embodiment of the present invention, there is provided a method embodiment of a log real-time attribution processing method. It is noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different from the one shown here.

Fig. 1 is a schematic flow chart of a log real-time attribution processing method according to an embodiment of the present invention. As shown in fig. 1, the method includes the following steps:

Step S102, when the unique key of the data exists, updating the data through upsert;

Specifically, whether the key exists is determined according to uniqueKey (the unique key in the embodiment of the present application), and existing data is updated by using upsert (update-or-insert).
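As an illustration of step S102, the following minimal sketch performs an upsert on an in-memory dictionary. The store, the function name upsert, and the choice of request_id as the unique key are assumptions made for this example only; the embodiment does not name the storage backend or the key field.

```python
def upsert(store, record, unique_key="request_id"):
    """Update the stored record if its unique key exists, otherwise insert it.

    `store` is a plain dict standing in for the real storage (an assumption).
    """
    key = record[unique_key]
    if key in store:
        # Key already present: merge the new fields into the existing data.
        store[key].update(record)
        return "updated"
    # Key not present: insert a copy of the record.
    store[key] = dict(record)
    return "inserted"


store = {}
first = upsert(store, {"request_id": "parentA123", "mpId": "nodeA"})
second = upsert(store, {"request_id": "parentA123", "nameA": "zhangsan"})
```

After the two calls, the store holds a single record for parentA123 that carries the fields of both logs.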

Step S104, associating the upstream log and the downstream log according to the upstream and downstream log relationship specified in the dependency attribution path;

Optionally, in step S104, associating the upstream and downstream logs according to the upstream and downstream log relationship specified in the dependency attribution path includes: judging whether the current node is a top node or the attribution path is empty; determining, according to the judgment result, whether to execute the logic by which a child node finds its parent node; and submitting the execution result to a log set.

Specifically, fig. 2 is a schematic diagram of the attribution flow in the log real-time attribution processing method according to an embodiment of the present invention. As shown in fig. 2, the attribution flow in the embodiment of the present invention includes:

The first step: judging whether the current node is a top node or the attribution path is empty;

The second step: if so, the child-finds-parent logic does not need to be run; if not, the child-finds-parent logic is run;

The third step: the parent-finds-child logic is run;

Return: the set of logs that satisfy the submission conditions and need to be submitted to the producer.
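The three steps and the return value above can be sketched as follows. This is only one reading of the flow in fig. 2: the function name run_attribution and the three callbacks are hypothetical placeholders for the child-finds-parent logic, the parent-finds-child logic, and the submission check, none of which are specified in code by the embodiment.

```python
def run_attribution(node, is_top_node, attribution_path,
                    child_finds_parent, parent_finds_child, should_submit):
    """Return the logs that must be submitted to the producer."""
    touched = [node]
    # Steps 1 and 2: skip child-finds-parent for a top node or an empty path.
    if not is_top_node and attribution_path:
        touched.extend(child_finds_parent(node))
    # Step 3: parent-finds-child always runs.
    touched.extend(parent_finds_child(node))
    # Return: only the logs that satisfy the submission condition.
    return [log for log in touched if should_submit(log)]


parent = {"mpId": "nodeA", "post": True}
child = {"mpId": "nodeB1", "post": False}
submitted = run_attribution(
    child, False, ["nodeA", "nodeB1"],
    child_finds_parent=lambda n: [parent],
    parent_finds_child=lambda n: [],
    should_submit=lambda n: n["post"],
)
```

Passing the lookups as callbacks keeps the control flow visible without committing to how parents and children are actually queried.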

Step S106, complementarily updating the fields of the upstream log and the downstream log, and recording the association result in a first log and a second log, wherein the first log indicates whether the upstream log is successfully associated, and the second log indicates whether the downstream log is associated.

Specifically, according to the upstream and downstream log relationship specified in the dependency attribution path, the upstream and downstream logs are associated, complementary update of the upstream and downstream log fields is realized, and the association result is recorded in isAttr (whether the upstream log is successfully associated) (the first log in the embodiment of the present application) and hasChild (whether the downstream log is successfully associated) (the second log in the embodiment of the present application).

Further, optionally, the log real-time attribution processing method provided in the embodiment of the present application further includes: for the deletion, renaming and extraction relations used when a parent node finds a child node and when a child node finds a parent node: when none of the deletion, renaming and extraction relations exists, the data completion operation is not executed; when only the deletion relation exists, the specified fields are removed and the remaining fields are passed down; when only the renaming relation exists, the specified fields are renamed and passed down, and the remaining fields are discarded; when only the extraction relation exists, the specified fields are extracted and passed down, and the remaining fields are discarded; when renaming coexists with deletion or with extraction, the fields to be renamed are extracted and renamed, the deletion or extraction operation is executed on the remaining fields, and the two results are combined; the deletion and extraction relations cannot exist at the same time: when the number of fields to be retained is greater than a first preset value, the deletion operation is executed, and when the number of fields to be discarded is greater than a second preset value, the extraction operation is executed; if all fields need to be retained, deletion is set to an empty array so that no field is removed; the first parameter of each key-value pair inside renaming represents the new name, and the second parameter represents the original name.

Specifically, the relations remove (deletion), rename (renaming), and extract (extraction) in parentToChild (parent node finding child node in the embodiment of the present application) and childToParent (child node finding parent node in the embodiment of the present application) behave as follows:

(1) when none of remove, rename and extract exists, the data completion operation is not executed;

(2) when only remove exists, the specified fields are removed and the remaining fields are passed down;

(3) when only rename exists, the specified fields are renamed and passed down, and the remaining fields are discarded;

(4) when only extract exists, the specified fields are extracted and passed down, and the remaining fields are discarded;

(5) when rename + remove or rename + extract exists, the fields that need rename are first extracted and renamed; the remove or extract operation is then executed on the remaining fields, excluding those already renamed; finally, the results of the two parts are combined;

(6) remove and extract cannot exist at the same time, since the two are naturally mutually exclusive; when many fields need to be retained (the number of fields to be retained is greater than a first preset value in the embodiment of the present application), the remove operation is simpler; when many fields need to be discarded (the number of fields to be discarded is greater than a second preset value in the embodiment of the present application), the extract operation is simpler;

(7) a useful technique: if all fields need to be retained, remove is set to an empty array so that no field is removed; for example, rename + an empty remove array renames some fields and retains the rest;

(8) the first parameter of each key-value pair inside rename represents the new name, and the second represents the original name.
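Rules (1) to (8) can be sketched as a single function over dictionaries. This is an illustrative reading, not the embodiment's implementation: the function name carry_fields and the representation of a rule as a dict with optional keys remove, rename and extract are assumptions, and a key that is absent is treated as "the relation does not exist".

```python
def carry_fields(source, rule):
    """Return the fields of `source` that are passed to the other node."""
    remove = rule.get("remove")
    rename = rule.get("rename")
    extract = rule.get("extract")
    if remove is not None and extract is not None:
        # Rule (6): remove and extract are mutually exclusive.
        raise ValueError("remove and extract cannot be used together")
    if remove is None and rename is None and extract is None:
        # Rule (1): nothing configured, no data completion.
        return {}
    renamed, originals = {}, set()
    # Rule (8): each rename pair is (new name, original name).
    for new_name, old_name in (rename or {}).items():
        originals.add(old_name)
        if old_name in source:
            renamed[new_name] = source[old_name]
    # Rule (5): remove/extract only see the fields that were not renamed.
    rest = {k: v for k, v in source.items() if k not in originals}
    if remove is not None:
        kept = {k: v for k, v in rest.items() if k not in remove}  # rules (2), (7)
    elif extract is not None:
        kept = {k: v for k, v in rest.items() if k in extract}     # rule (4)
    else:
        kept = {}                                                  # rule (3)
    return {**kept, **renamed}


log = {"nameA": "zhangsan", "ageA": "18", "weightA": "120"}
renamed_and_kept = carry_fields(log, {"rename": {"name": "nameA"}, "remove": []})
```

The last call shows rule (7): with an empty remove array, nameA is renamed to name and the other two fields are retained unchanged.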

Optionally, the log real-time attribution processing method provided in the embodiment of the present application further includes: the first parameter of each key-value pair in the calls represents the field name of the parent node, and the second parameter represents the field name of the child node; when the mntprjId and the userId participate in the query, they are explicitly specified in the calls; the first log flag of the current node is true only when all traversed parent nodes have been attributed; when the first log flag is false, the exchange of fields between the parent node and the child node is not affected.

Specifically, the first parameter of each key-value pair in the pair represents the field name of the parent node, and the second parameter represents the field name of the child node;

when the mntprjId and the userId need to participate in the query, they need to be explicitly specified in calls;

the isAttr (i.e., the first log in the embodiment of the present application) of the current node is updated to true only if all preceding parent nodes have been attributed;

the exchange of fields between the parent node and the child node is not affected when isAttr is false;

parent-finds-child can only attribute child nodes whose isAttr is false;

fields that have default values, such as parentPost (parent node), childPost (child node), selfPost1 (self node 1), selfPost2 (self node 2), parenttatttime. timeformat, etc., need not be configured if the default logic is acceptable.

Optionally, the log real-time attribution processing method provided in the embodiment of the present application further includes: the cases in which a node is to be submitted include: the current node has no parent node, that is, the current node is a top node; all ancestor nodes are attributed successfully, that is, the first log flag of each ancestor node is true, the child node contains the mpId of the ancestor node, and the option to submit the child node when attribution succeeds is set to true; attribution of the parent node fails and the option to submit the child node when attribution fails is set; attribution of the parent node fails, the node finds its own child node, and the option to submit the node itself is set; and if there are no fields to be exchanged, the value of the field is set to a third preset value.

Specifically, a node may be submitted in the following 4 cases:

(1) there is no parent node, i.e., the node is the top-level node;

(2) all ancestor nodes are attributed successfully (isAttr of each ancestor is true), the child node contains their mpId, and the option to submit the child node when attribution succeeds is set (childPost is true);

(3) attribution of the parent node fails, and the option to submit the child node when attribution fails is set (selfPost1 is true; the default is false when the field is not filled in);

(4) attribution of the parent node fails, but the node finds its own child node (i.e., hasChild is true), and the option to submit the node itself is set (selfPost2 is true);

parentToChild (parent node finding child node in the embodiment of the present application) and childToParent (child node finding parent node in the embodiment of the present application) are mandatory fields; if there are no fields to be exchanged, the value is set to { } (the third preset value in the embodiment of the present application).
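The four submission cases can be condensed into one predicate, as in the sketch below. The function name should_submit and the shapes of node, ancestors and config are assumptions; the flag names isAttr, hasChild, childPost, selfPost1 and selfPost2 follow the description above. The sketch treats every unset option as false (the text states this default only for selfPost1) and omits the check that the child contains the ancestors' mpId.

```python
def should_submit(node, ancestors, config):
    """Decide whether `node` is submitted, following cases (1) to (4)."""
    # Case (1): no parent node, so the node is the top-level node.
    if not ancestors:
        return True
    # Case (2): every ancestor was attributed and childPost is set.
    if all(a.get("isAttr", False) for a in ancestors):
        return bool(config.get("childPost", False))
    # Case (3): attribution failed, but selfPost1 asks to submit anyway.
    if config.get("selfPost1", False):
        return True
    # Case (4): attribution failed, the node found its own child,
    # and selfPost2 asks to submit the node itself.
    return bool(node.get("hasChild", False) and config.get("selfPost2", False))


top_level = should_submit({"mpId": "nodeA"}, [], {})
attributed = should_submit({"mpId": "nodeB1"}, [{"isAttr": True}], {"childPost": True})
orphan = should_submit({"mpId": "nodeB1"}, [{"isAttr": False}], {})
```

With an empty configuration, a node whose parent failed attribution is not submitted, which matches the stated default of false for selfPost1.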

In summary, the attribution fields and their descriptions are listed in Table 1.

TABLE 1

Examples are as follows:

Test data

When the configuration is read from the database, the information in the database is: {"name": "mongo_attribute", "dependencyPath": [copy the dependencyPath information from the above case here]}

Top-level node (remember to change the ts date to one within the last 2 days): {"request_id": "parentA123", "mpId": "nodeA", "nameA": "zhangsan", "agenA": "18", "hobbyA": ["run", "football"], "weightA": "120", "ts": "2019-12-12 12:08:09"}

Second-layer node: {"request_id": "childB123", "mpId": "nodeB1", "nameB": "zhangsan", "ageB": "18", "phoneB": "15536888673", "heightB": "180", "musicB": "WelComme to joining", "addressB": "joining"}

Third-layer node 1: {"request_id": "childC123", "mpId": "nodeC1", "nameC": "zhangsan", "agenC": "18", "teacherC": "lilei", "joba": "doctor", "school": "Harvard University"}

Third-layer node 2: {"request_id": "childC456", "mpId": "nodeC2", "nameC": "zhangsan", "ageC": "18", "father": "Jack", "sun": "her"}

Optionally, the log real-time attribution processing method provided in the embodiment of the present application further includes: generating the attribution attributes defined for the data processing engine from the path of a specified project and the attribution attributes of that project.

Specifically, the attribution attribute defined by handleattenbute is generated from the Path (a series of directed line segments, each composed of a pair of mpIds) of the specified project (mntprjId) and its attribution attributes (MonitorPointAttr).

Following the project path generation design, the embodiment of the present application describes the process of generating dependency info (the attribution attribute of the processing engine) from the project path, as part of the processing performed when a project is saved.
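Since a Path is described as a series of directed line segments made of mpId pairs, one plausible first step toward a dependency path is to chain those segments into complete top-to-leaf paths. The sketch below does only that; the function name build_paths, the tuple representation of a segment, and the assumption that the segments form an acyclic tree are all illustrative and not taken from the embodiment.

```python
def build_paths(segments):
    """Chain (parent mpId, child mpId) segments into full top-to-leaf paths."""
    children = {}
    has_parent = set()
    for parent, child in segments:
        children.setdefault(parent, []).append(child)
        has_parent.add(child)
    # Top nodes are parents that never appear as a child.
    tops = [p for p in children if p not in has_parent]
    paths = []

    def walk(node, prefix):
        prefix = prefix + [node]
        if node not in children:
            # Leaf reached: one complete path.
            paths.append(prefix)
            return
        for nxt in children[node]:
            walk(nxt, prefix)

    for top in tops:
        walk(top, [])
    return paths


segments = [("nodeA", "nodeB1"), ("nodeB1", "nodeC1"), ("nodeB1", "nodeC2")]
full_paths = build_paths(segments)
```

For the three-layer test data, this yields one path through nodeC1 and one through nodeC2, both starting at the top-level node nodeA.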

The prerequisite flow is summarized as follows:

Fig. 3 is a schematic diagram of the project data structure in the log real-time attribution processing method according to an embodiment of the present invention. As shown in fig. 3, the main tables on which attribution depends are:

the project table (Project), the monitoring point table (MonitorPoint), and the path table (Path).

At present, the monitoring points and paths of the global project are not stored in the database (they are written directly into the workflow). To generate attribution path attributes, the management platform needs the following enhancements:

1. Generate the complete path when the project is saved (or maintain it manually as a fallback);

2. Maintain the monitoring points of the global project (currently only custom monitoring points are maintained), through the interface or manually;

3. Maintain the attribution attributes (new table MonitorPointAttr), through the interface or manually;

4. Every project needs an explicit operation that triggers attribution creation (attribution is created when the project is saved).

In addition, the association process is shown in fig. 4, and fig. 4 is a schematic diagram of the association process in the log real-time attribution processing method according to the embodiment of the invention.

As shown in fig. 4, fig. 4 respectively shows three flows, wherein the first flow is a flow for saving a project, which includes the project, nodes and attributions among the nodes, and obtains attribution attributes by generating a saving path; the second process is a compiling process; the third flow is a global workflow writing flow.

1. The generated attribution attributes are stored as shown in Table 2, the attribution information table:

TABLE 2

Field	Type	Description
_id		
mntprjId	string	Project id (pk)
name	string	Project name (optional, for readability only)
dependencyPath	[]	Each element is a dependency node defined by an attribute actor

2. Attribution attributes of monitoring points: if no attribution attribute is defined for a MonitorPoint, the default values are used; see Table 3.

TABLE 3

The eventKey has at most 5 fields (at most 4 fields are used at present, and one field is reserved).

3. Generating a DependencyNode (MonitorPointAttr is mapped to the DependencyNode), see table 4;

TABLE 4

4. The data structure for field pass-through (downstream/upstream) is as follows:

{

stopPropagation: boolean, // default false; if true, upstream fields are not passed through

remove: string[], // fields to remove (valid only at this level, not passed through)

rename: map<string, string>, // key: source field name, value: target field name

extract: string[], // shorthand for rename

}

The calculation method is as follows:

1. remove is applied as received (its use is not recommended).

2. extract is converted into rename semantics (the result contains only rename, no extract).

3. rename is accumulated along the path; at each hop, the source name of the next level is replaced with the target name.

4. When stopPropagation is encountered, rename is reset (it restarts from the rename at the start of that line segment).

5. This continues until the end of the full path.

downFields and upFields are computed with the same algorithm, starting from opposite ends of the path.
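The accumulation of rename along a path can be sketched as follows. This is one interpretation of the calculation steps, under stated assumptions: each line segment is a dict that may carry rename (source name to target name), an extraction list (called extract here) and stopPropagation; remove is ignored; and only fields that originate at the start of the current run are tracked. The function name accumulate_renames is hypothetical.

```python
def accumulate_renames(segments):
    """Return, per segment, the map from original field name to current name."""
    results = []
    acc = None
    for seg in segments:
        rename = dict(seg.get("rename", {}))
        # The extraction list becomes rename semantics: the name is kept as is.
        for name in seg.get("extract", []):
            rename.setdefault(name, name)
        if acc is None or seg.get("stopPropagation", False):
            # First segment, or reset: restart from this segment's rename.
            acc = rename
        else:
            # Compose: follow each tracked field through this segment's rename.
            acc = {orig: rename[cur] for orig, cur in acc.items() if cur in rename}
        results.append(acc)
    return results


path = [
    {"rename": {"nameA": "nameB"}, "extract": ["ts"]},    # A -> B
    {"rename": {"nameB": "nameC"}},                       # B -> C
    {"stopPropagation": True, "rename": {"school": "schoolD"}},  # C -> D
]
per_segment = accumulate_renames(path)
```

In this example nameA reaches C as nameC, ts is dropped at the second hop because that segment does not pass it on, and the third segment starts over because stopPropagation is set. Running the same function over the reversed path would correspond to computing from the opposite starting point.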

Suppose the path is A → B → C → D; the calculation is shown in Table 5:

TABLE 5

Example 2

According to another aspect of the embodiments of the present invention, there is further provided a monitoring platform, which is applied to the log real-time attribution processing method in embodiment 1, and fig. 5 is a schematic diagram of the monitoring platform according to the embodiments of the present invention, as shown in fig. 5, including: the system comprises a management webpage end, a management data processing engine and a data processing engine, wherein the management webpage end is used for providing a user use interface and authority verification; the management data processing engine is used for starting and stopping the appointed workflow and finishing the pre-release and release work of the project; and the data processing engine is used for operating the specified workflow and finishing the processing work of the data.

Optionally, the monitoring platform provided in the embodiment of the present application further includes: the database is used for searching or creating users for logs of different sources according to related user fields in the logs; and generating an association path at a session or user level for the event with the association relationship.

In summary, the data processing architecture of the monitoring platform is shown in fig. 5. The management Web end provides the user interface and authority verification. Its functions are mainly divided into IT management, service management, reports, and monitoring. It also compiles monitoring points, manages versions, and provides the release, pre-release and monitoring interfaces.

The Monitor manages the data processing engines: it starts and stops specified workflows and completes the pre-release and release of projects. It provides release, pre-release, and restart instructions for clusters, instances and projects through an HTTP interface. It obtains the current instance list through service discovery and sends Handler start instructions (containing the template) and stop instructions through Kafka.

The Handler is the data processing engine: it runs the specified workflow and completes the data processing work. It reads instructions from Kafka, loads, starts and stops workflow templates accordingly, and periodically saves its running state to MongoDB.

The Monitor and the Handler run as stateless background processes in a Docker environment, and can be deployed centrally in Shenzhen through parameters set when Docker is started. Dependent resources such as Kafka, MongoDB and Redis can be accessed directly locally.
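The division of work between the Monitor and the Handler can be illustrated with a small in-memory sketch. Nothing here reflects the actual message format: the JSON fields action, workflow and template are hypothetical, plain strings stand in for Kafka messages, and the state method stands in for the snapshot that would be saved to MongoDB.

```python
import json


class Handler:
    """Hypothetical data processing engine reacting to Monitor instructions."""

    def __init__(self):
        self.running = {}  # workflow name -> loaded template

    def handle(self, message):
        """Apply one start or stop instruction (a JSON string)."""
        instruction = json.loads(message)
        if instruction["action"] == "start":
            # Load the template carried by the start instruction.
            self.running[instruction["workflow"]] = instruction["template"]
        elif instruction["action"] == "stop":
            self.running.pop(instruction["workflow"], None)

    def state(self):
        """Running state that would be stored periodically."""
        return {"workflows": sorted(self.running)}


handler = Handler()
handler.handle(json.dumps({"action": "start", "workflow": "wf1", "template": {"steps": []}}))
handler.handle(json.dumps({"action": "start", "workflow": "wf2", "template": {"steps": []}}))
handler.handle(json.dumps({"action": "stop", "workflow": "wf1"}))
```

Because the Handler keeps no state beyond what the instructions carry, a restarted instance can be brought back by replaying the start instructions, which is consistent with running it as a stateless process.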

Based on the above, the load balancing in the production environment in the embodiment of the present application is as follows:

the production environment uses a distributed architecture to manage and run the data processing engines, and in order to ensure that the efficiency of the data processing service is maximized, load balancing on the computing resources needs to be considered.

The load value for each Instance is planned using the following formula:

workflow instance load = cumulative sum of the processing time of each log;

container instance load = cumulative sum of the load of each workflow instance on the container;

The load needs to be adjusted in the following scenarios:

scenario 1: when a project is released or pre-released;

scenario 2: when a restart-platform instruction is issued through the interface;

scenario 3: when the occupancy rate of the computing resources exceeds a threshold value;

scenario 4: when computing resources are newly added;

scenario 5: when the partitions of a topic are adjusted;

Load balancing is based on the load of each container instance: at each adjustment, the workflows are sorted and recombined according to the load history of the last 1 day, so that the load of each instance approaches the average load.
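As an illustrative, non-limiting sketch of the adjustment described above, the following Python code computes the two load values and redistributes workflows so that container loads approach the average. The greedy "heaviest workflow to lightest container" rule, the function names and the input shapes are assumptions made for illustration; the embodiment only states that a sorted combination of the last day's load history is used.

```python
import heapq


def workflow_load(processing_times):
    """Workflow instance load: cumulative sum of per-log processing times."""
    return sum(processing_times)


def rebalance(workflow_loads, containers):
    """Greedy sketch: place the heaviest workflow on the lightest container.

    workflow_loads: {workflow_id: load over the last day} (hypothetical input)
    containers:     list of container instance names (hypothetical input)
    """
    heap = [(0.0, name) for name in containers]
    heapq.heapify(heap)
    assignment = {name: [] for name in containers}
    # Sort by load history, heaviest first; ties broken by id for determinism.
    for wf, load in sorted(workflow_loads.items(), key=lambda kv: (-kv[1], kv[0])):
        total, name = heapq.heappop(heap)
        assignment[name].append(wf)
        heapq.heappush(heap, (total + load, name))
    # Container instance load: cumulative sum of its workflow instance loads.
    container_load = {
        name: sum(workflow_loads[wf] for wf in wfs)
        for name, wfs in assignment.items()
    }
    return assignment, container_load
```

This sketch ignores placement constraints, such as keeping workflow instances of the same template on different Handler instances, which a real adjustment would also have to respect.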

In the embodiment of the application, the state of a project on the platform is divided by environment, comprising a development environment (for configuration), a verification environment (for pre-run viewing), and a production environment (for run viewing and management). Projects can be edited and built in the development environment. The states of a project when released are shown in fig. 6, which is a schematic diagram of the state of the project when released in the monitoring platform according to the embodiment of the present invention.

A project can be released only after its pre-release test has passed; the monitoring points used in pre-release and in release have the same effect on the processing settings of the split Topic; the split processing is not affected by a suspension; when a project is stopped, it is necessary to check whether any other project consumes the split processing of the associated Topic, and if not, the split processing of that Topic is stopped.

In the embodiment of the application, pre-publishing and publishing adopt the same flow, but their log consumption and data generation do not affect each other.

During publishing and pre-publishing, the Monitor dynamically sets the relevant environment variables in the template, including:

(1) the groupId used to consume the Topic;

(2) the Event and User tables used for attribution;

(3) the name of the produced Topic;

(4) when the source Topic is a split Topic, the consumed Topic is set to the project target Topic generated by the split.

During pre-publishing, the pre-published workflows under the same project are stopped and their resources are cleaned up periodically;

during publishing, the pre-published workflows under the same project are stopped, their resources are cleaned up periodically, and the previously published workflows are stopped.

When a template is released, as many workflow instances are created as there are partitions in the topic used by the template, one per consumed partition, and each workflow instance needs to be started in a different Handler instance process, as shown in fig. 7, which is a schematic diagram of workflow instances being started in different Handler instance processes in the monitoring platform according to an embodiment of the present invention.

The timing diagram for publication is shown in fig. 8, which is a schematic diagram of the publication timing in the monitoring platform according to an embodiment of the present invention.

Split processing at release in the embodiment of the present application: when a project monitoring point contains a split Topic, the monitoring point information needs to be written into a configuration table, and the workflow instance that performs the splitting for that Topic is restarted.

The workflow template is generated dynamically when the workflow instance of the split processing is started.

The main fields of the configuration table are shown in table 6:
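A minimal sketch of this one-instance-per-partition rule is given below in Python. The function name, the naming scheme of the workflow instances and the decision to raise an error when there are fewer Handler instances than partitions are assumptions for illustration only.

```python
def plan_workflow_instances(template_id, partition_count, handlers):
    """Plan one workflow instance per topic partition, each on its own Handler.

    handlers: list of Handler instance names obtained elsewhere, for example
    from service discovery (hypothetical input).
    """
    if partition_count > len(handlers):
        # Assumption: without enough Handlers the "different process" rule
        # cannot be met, so the sketch refuses to plan.
        raise ValueError("not enough Handler instances for one workflow per partition")
    return [
        {
            "workflow": f"{template_id}-p{partition}",  # hypothetical naming
            "partition": partition,
            "handler": handlers[partition],
        }
        for partition in range(partition_count)
    ]
```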

TABLE 6

Based on the above, the flow splitting processing workflow in the embodiment of the present application is as shown in fig. 9, and fig. 9 is a schematic diagram of the flow splitting processing workflow in the monitoring platform according to the embodiment of the present invention.

Distribute Actor

1. Synchronizes the configuration table information periodically, according to the configured interval

2. Filters each log and, for each log meeting a condition, adds the corresponding topic to the Producer array in the log

MultiProducer Actor

Sends the current log to each target topic according to the content of the specified Producer array.
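The two actors above can be sketched in Python as two plain functions. Because the fields of the configuration table (table 6) are not reproduced here, the row keys `targetTopic` and `accept`, the `producer` field name and the `send` callback are hypothetical; the periodic synchronization of the configuration table and the actual Kafka producer are left out.

```python
def distribute(log, config_rows):
    """Distribute Actor, step 2: collect the target topic of every matching rule.

    config_rows: already-synchronized configuration rows, each with a
    hypothetical "accept" predicate and "targetTopic" name.
    """
    producers = [row["targetTopic"] for row in config_rows if row["accept"](log)]
    return {**log, "producer": producers}


def multi_produce(log, send):
    """MultiProducer Actor: send the log to each topic in its producer array.

    send: callable (topic, payload) standing in for a real message producer.
    """
    # The routing field itself is not forwarded (an assumption of this sketch).
    payload = {key: value for key, value in log.items() if key != "producer"}
    for topic in log.get("producer", []):
        send(topic, payload)
```

Keeping the filtering and the sending in separate stages means that a log matching several monitoring points is evaluated once and then fanned out.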

The role of MongoDB in the embodiments of the present application is as follows:

During data processing, the following tasks need to be completed:

(1) unifying user identifications:

For logs from different sources, users are looked up or created according to the relevant user fields in the logs. Internal and external logs need to be associated with each other by Session ID or the like. After users are associated, the user attributes in different logs can be collected and saved to the related logs.

This places the following requirements on the selection of the database:

real-time processing requires high-frequency reading and writing, covering both creation and query;

the query conditions must support retrieval by a plurality of user identification fields;

user attributes need to be stored at creation;

user attributes need to be read at lookup;

since temporary identifiers such as Session ID are used when the user is not logged in, the number of users is very large.

(2) Event path attribution:

For events with an association relationship, such as impressions, clicks and payments, the association path needs to be established at the session or user level, so that conversions are counted truthfully and converge. This places the following requirements on the selection of the database:

high-frequency reading and writing, storage of the full text of the associated events, and support for various retrieval modes, so that data from different sources can be associated, for example by IP + UA, device ID, session ID, utmid, various user IDs and the like.

The retention time depends on the attribution time window, which is commonly 15 days in the advertising industry.

In the event association process, the attributes of earlier events, such as channel and source, can be carried into the events that follow.

A unique event identifier is designated according to the log content, so that data complementing is not repeated.

For the above reasons, the requirements on the database can be summarized as follows:

1. high-frequency read-write updating, where the number of logs processed can reach billions per day;

2. multi-identifier retrieval, that is, retrieval and association according to the different user and session identifiers in the logs;

3. the log format is JSON without a fixed schema, and the field content to be used by subsequent events must be guaranteed to exist when the log is stored;

4. dynamic index creation is supported for specified fields;

5. with data on the order of billions of records, the database supports distributed scaling and provides sufficient storage space and processing capability.

KV-type databases such as Redis have insufficient support for requirements 2, 3 and 5.

Big-table-type databases such as HBase have insufficient support for requirements 3 and 4.

ES has insufficient support for requirement 1.

TABLE 7

Requirement	MongoDB	Redis	Big table	SQL on Hadoop	ES
High-frequency read-write update	V	V	V	X	X
Multi-identifier retrieval	V	X	V	V	V
Log format is JSON without fixed schema	V	X	X	X	V
Dynamic index creation	V	X	X	V	V
Distributed	V	V	V	V	V

The interaction design based on the Monitor-Handler uses the following terms:

Monitor: the management module that operates the workflow engines.

Handler: the module that runs a workflow engine.

Instance: in this section, refers specifically to a Handler instance.

Management Web end: the display layer of the monitoring platform, which has its own front end, back end and independent storage, and interacts with the Monitor to start and stop workflows.

Workflow: performs business logic processing on a specific data stream and outputs the result. A workflow is distributed by the Monitor to a workflow engine (Handler) for execution.

Discovery Server: the service discovery server. When a Handler starts, it registers itself with the DiscoveryServer; the Monitor is configured with the DiscoveryServer address to obtain the Handler list.

The Handler is configured with the DiscoveryServer address and registers with the DiscoveryServer (the Handler is the DiscoveryClient in the monitoring platform).

deployType: refers to the manner of publication (pre-publish | publish). To distinguish it from the dev/prd environments, it is named deployType rather than environment. Pre-release (trial run): pre-release is not a separate environment independent of dev/prd; it means that the workflow itself is pre-released (that is, the processing logic of the workflow is the same as in the formal release, and only the input and output data sources differ) in order to observe whether the output meets expectations. In short, a pre-released workflow runs in the same environment (typically prd) as the formally released one, in a slightly different form. (dev/prd is the environment in which the monitoring platform operates, and pre-release/publish is the mode in which a workflow runs within dev/prd.)

The monitoring platform is composed of modules such as a management platform (Web end), a compiler, a Monitor and a Handler.

Management Web end: defines monitoring items and publishes them.

Compiler: compiles the defined monitoring items and the configuration information into workflows and packages them.

Monitor: coordinates the Handlers to run the workflows of published monitoring items, runs the global workflows, and handles splitting correctly.

Handler: runs data workflows (monitoring item workflows, global workflows).

The data processing flow in the embodiment of the present application is shown in fig. 10, which is a schematic diagram of a data processing flow in a monitoring platform according to an embodiment of the present invention.

Compilation by the compiler in the embodiment of the present application:

For different deliveryTypes (pre-publish/publish), the placeholders in the workflows are replaced with different parameters. The compiler and the processing engine agree on the placeholder syntax used in the workflows.

At pre-release/release, the parameters are obtained from the build output.

Although the compile -> runtime concept suggests that runtime parameters could be supplied at run time (that is, at publish/pre-publish), the content and behavior of a project build are required to be fully determined at build time by the definition of the build version of the monitoring project.

Thus, the parameters belong to the content of the monitoring project workspace and are snapshotted into the build at build time.

The Build construct (db, collection) should comprise:

The deploy_params table structure in the embodiment of the present application is shown in table 8:

TABLE 8

Field	Type	Description
deployType	string	Release type (rc | ga)
name	string	Parameter name
value	string	Parameter value
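For illustration only, the replacement of workflow placeholders with the rows of table 8 can be sketched in Python as follows. The actual placeholder syntax agreed between the compiler and the processing engine is not given in this description; the `${name}` form of Python's `string.Template` and the parameter name `groupId` used in the usage note are assumptions.

```python
from string import Template


def render_workflow(template_text, deploy_params, deploy_type):
    """Fill workflow placeholders with the parameters of one release type.

    deploy_params: rows shaped like table 8 (deployType, name, value).
    deploy_type:   "rc" (pre-release) or "ga" (release).
    """
    params = {
        row["name"]: row["value"]
        for row in deploy_params
        if row["deployType"] == deploy_type
    }
    # substitute() raises KeyError for a placeholder with no configured value,
    # which surfaces a parameter that was not configured for this release type.
    return Template(template_text).substitute(params)
```

With rows configuring `groupId` for both rc and ga, rendering the same workflow text twice yields a pre-release and a release variant that differ only in the substituted values.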

The list of parameter names in deploy_params (both rc and ga must be configured) is shown in table 9:

TABLE 9

In the embodiment of the present application, the split mapping table (read and written by the Monitor; db = dmmp, collection = workflow_map) is shown in table 10:

TABLE 10

In the embodiment of the present application, the process of configuring the split (adding a mapping item at build or at release) is as follows:

1. srcTopicKey = Build.Artifact.workflows[i].topic.topicKey;

filters = Build.Artifact.workflows[i].topic.filters;

mntprjId = Build.mntprjId;

2. Find the Mapper (or create a new one) by srcTopicKey and mntprjId:

Mapper.srcTopicKey = srcTopicKey;

Mapper.mntprjId = mntprjId;

Mapper.filters += filters; // if the deletion of a monitoring point is considered, traverse all builds of mntprjId and recompute the filters set in its entirety.

Mapper.targetCluster = srcTopicKey.cluster;

Mapper.targetTopic = srcTopicKey.topic + mntprjId;

Insert or update the Mapper

3. Restart the global split workflow (or load it dynamically)
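The configuration steps above can be sketched in Python as an insert-or-update on an in-memory mapping. The dictionary standing in for the workflow_map collection, the `TopicKey` tuple, the lowercase key `artifact` and the de-duplication of filters are assumptions of this sketch; restarting the global split workflow (step 3) is not modeled.

```python
from typing import NamedTuple


class TopicKey(NamedTuple):
    """Hypothetical shape of a topicKey: a cluster name plus a topic name."""
    cluster: str
    topic: str


def configure_split(mappers, build):
    """Insert or update one Mapper per (srcTopicKey, mntprjId) pair.

    mappers: dict standing in for the mapping collection (assumption).
    build:   dict shaped like the Build construct used in the steps above.
    """
    mntprj_id = build["mntprjId"]
    for workflow in build["artifact"]["workflows"]:
        src = workflow["topic"]["topicKey"]
        mapper = mappers.setdefault((src, mntprj_id), {
            "srcTopicKey": src,
            "mntprjId": mntprj_id,
            "filters": [],
            "targetCluster": src.cluster,
            "targetTopic": src.topic + mntprj_id,
        })
        # Mapper.filters += filters, skipping filters already present so that
        # repeating a release does not duplicate them (assumption).
        for item in workflow["topic"]["filters"]:
            if item not in mapper["filters"]:
                mapper["filters"].append(item)
    return mappers
```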

The processing of the global split workflow in the embodiment of the application is as follows:

Look up the mappers table for each srcTopic;

1. Log = consume(Mapper.srcTopic);

2. FOR each mapper (of this srcTopic):

IF mapper.filters[i].accept(Log) THEN write Log to mapper.targetTopicKey // the filters are the monitoring-point filters (the split workflow should understand this definition).

The split mapping process of a monitoring project in the embodiment of the application:

1. srcTopicKey = Build.Artifact.workflows[i].consumerTopic.topicKey;

mntprjId = Build.mntprjId;

2. Obtain the target topicKey and use it to replace the consumer topic of the corresponding workflow.

The release process in the embodiment of the application is as follows:

1. split mapping processing (if there is no compilation notification) -> configure the split process;

2. replace the corresponding parameter placeholders in the workflows with the corresponding delivery parameters (in the build construct);

3. split mapping processing -> monitoring project split mapping;

4. detect the partitions of the consumer topic;

5. store the processed workflows;

6. start the workflows.

Workflow load balancing in the embodiment of the present application:

Minimum workflow number:

The resource consumption of each workflow is regarded as the same, and a request to run a workflow is always assigned to the instance currently running the fewest workflows.

Minimum average time:

the average time per data record processed on the instance over a past period of time: Ti = time / records (time is the sum of the actual processing time);

Ta = avg(Ti);

The workflow is assigned to the node with the smallest Ta. (Starting multiple workflows within a short period of time may result in an unreasonable allocation.)

Minimum processing time:

the processing capacity of each container instance is treated as the same;

net processing time per data record: tn = (time the record leaves the pipeline - time the record enters the pipeline);

workflow instance processing time: twi = sum(tn);

average processing time of all workflows (per instance): twa = avg(twi);

container processing time: tc = sum(twi);

The workflow is assigned to the container with the minimum tc, and the tc of that instance is then increased by twa (tc + twa).
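The three strategies can be compared with the following Python sketch. The input shapes and function names are hypothetical, Ti is read as a per-workflow value averaged per node, and the step of adding twa to the chosen container's tc after each assignment is one interpretation of the last rule above rather than a confirmed detail.

```python
from statistics import mean


def pick_min_workflow_count(running):
    """Minimum workflow number: running maps instance -> list of workflows."""
    return min(running, key=lambda name: (len(running[name]), name))


def pick_min_average_time(stats):
    """Minimum average time: stats maps node -> [(time, records), ...].

    Ti = time / records per workflow; Ta = avg(Ti) per node.
    """
    def ta(name):
        ratios = [time / records for time, records in stats[name] if records]
        return mean(ratios) if ratios else 0.0
    return min(stats, key=lambda name: (ta(name), name))


def assign_min_processing_time(container_twi, new_workflows):
    """Minimum processing time: container_twi maps container -> list of twi."""
    all_twi = [twi for twis in container_twi.values() for twi in twis]
    twa = mean(all_twi) if all_twi else 0.0       # twa = avg(twi)
    tc = {name: sum(twis) for name, twis in container_twi.items()}  # tc = sum(twi)
    placement = {}
    for workflow in new_workflows:
        target = min(tc, key=lambda name: (tc[name], name))
        placement[workflow] = target
        tc[target] += twa  # estimate the new workflow's cost as the average
    return placement
```

Adding twa after each placement is what keeps several workflows started in quick succession from all landing on the same container, the weakness noted for the minimum average time strategy.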

In summary, in the embodiment of the present application, the publication records (db, collection: delivery) are as shown in table 11:

TABLE 11

name	comment	values
_id
mntprjId	Monitoring item id
scheduledOn	(Pre-)publication time
status	Status	scheduled | stop
deliveryType	Pre-release | release	rc | ga
deliveryParams	Publishing parameters
workflows	Overridden workflows

In the embodiment of the present application, the workflow collection (db = dmmp, collection = workflow) is shown in table 12:

TABLE 12

name	comment	values
_id
mntprjId	Monitoring item id
status	Status	waitToRun
name
tasks		Embedded or sub-tables
calls
topic	Consumer topic
partition	Number of the detected consumer partition

In the present embodiment, the workflow running-state count (last 1 minute) is stored in db = dmmp, collection = workflow_ins_state;

this is the state count of a workflow instance on a Docker container, and statistics along the remaining dimensions need to be derived from this detail.

Example 3

According to an aspect of the embodiment of the present invention, there is provided a log real-time attribution processing apparatus. Fig. 11 is a schematic diagram of the log real-time attribution processing apparatus according to the embodiment of the present invention. As shown in fig. 11, the apparatus includes: an update module 112, configured to update the data through upsert when a unique key of the data exists; an association module 114, configured to associate the upstream log and the downstream log according to the upstream and downstream log relationship specified in the dependency attribution path; and a log generating module 116, configured to perform complementary updating on the fields of the upstream log and the downstream log, and record the association result in a first log and a second log, where the first log is used to express whether the upstream log is successfully associated, and the second log is used to express whether the downstream log is associated.

Optionally, the association module 114 includes: a judging unit, configured to judge whether the top node or the attribution path is empty; an execution unit, configured to determine, according to the judgment result, whether to execute the logic by which a child node finds its parent node; and a submitting unit, configured to submit the execution result to the log set.
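As a non-limiting illustration of how the three modules cooperate, the following Python sketch uses an in-memory dictionary in place of the database. The flag names `upstreamAssociated` and `downstreamAssociated`, the key format and the `carry_fields` parameter are hypothetical and are only one possible reading of the first log and the second log.

```python
def upsert(store, unique_key, record):
    """Update module: update the record if the key exists, else insert it."""
    merged = {**store.get(unique_key, {}), **record}
    store[unique_key] = merged
    return merged


def attribute(store, upstream_key, downstream_key, downstream, carry_fields):
    """Association and log generation for one downstream log (sketch).

    carry_fields: upstream field names to copy into the downstream log.
    """
    upstream = store.get(upstream_key)
    found = upstream is not None
    downstream = dict(downstream)
    if found:
        # Complementary update: carry the listed upstream fields down ...
        for field in carry_fields:
            if field in upstream and field not in downstream:
                downstream[field] = upstream[field]
        # ... and mark the upstream record as having a downstream log.
        upsert(store, upstream_key, {"downstreamAssociated": True})
    # Record whether the upstream log was found for this downstream log.
    downstream["upstreamAssociated"] = found
    return upsert(store, downstream_key, downstream)
```

For example, after an impression is stored under a key such as `show:1`, a click attributed to it receives the impression's channel and an `upstreamAssociated` value of true, while a click whose impression is missing is stored with the flag set to false.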

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.

The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims

1. A log real-time attribution processing method is characterized by comprising the following steps:

when a unique key of the data exists, updating the data through upsert;

associating the upstream log and the downstream log according to the upstream and downstream log relationship specified in the dependency attribution path;

performing complementary updating on the fields of the upstream log and the downstream log, and recording an association result in a first log and a second log, wherein the first log is used for expressing whether the upstream log is successfully associated or not; the second log is used for expressing whether the downstream log is associated or not.

2. The method of claim 1, wherein associating upstream and downstream logs in accordance with upstream and downstream log relationships specified in dependency attribution paths comprises:

judging whether a top node or an attribution path is empty or not;

determining whether to execute the execution logic of the child node to find the parent node according to the judgment result;

and submitting the execution result to a log set.

3. The method of claim 2, further comprising:

the deletion, renaming and extraction relationships used when a parent node finds a child node or a child node finds a parent node include:

when none of the deletion, renaming and extraction relationships exists, the data complementing operation is not executed;

when only the deletion relationship exists, removing the specified fields and carrying the remaining fields down;

when only the renaming relationship exists, renaming the specified fields and carrying them down, and discarding the remaining fields;

when only the extraction relationship exists, extracting the specified fields and carrying them down, and discarding the remaining fields;

when renaming and deletion relationships both exist, or renaming and extraction relationships both exist, extracting the fields to be renamed and renaming them; applying deletion or extraction to the remaining fields other than the renamed ones; and combining the results;

the deletion and extraction relationships cannot exist at the same time; when the number of fields to be retained is larger than a first preset value, the deletion operation is executed; when the number of fields to be discarded is larger than a second preset value, the extraction operation is executed;

if all fields need to be retained, the deletion is set to an empty array, so that no field is removed;

the first parameter of each key-value pair in the renaming represents the new name, and the second parameter represents the original name.

4. The method of claim 2, further comprising:

representing the first parameter of the key value pair in the calls as the field name of a parent node, and representing the second parameter as the field name of a child node;

when the mntprjId and the userId participate in the query, the mntprjId and the userId are explicitly specified in the calls;

when the traversed parent nodes are all attributed, the first log flag of the current node is true; when the first log mark is false, the exchange field of the parent node and the child node is not influenced.

5. The method of claim 2, further comprising:

the cases in which a node is to be submitted include:

a current node has no parent node, wherein the current node is a top-level node;

all ancestor nodes are attributed successfully, wherein the first log mark of the ancestor node is true, the child node contains the mpId of the ancestor node, and a state of whether the child node is submitted to be true or not when the attribution is successful is set;

when attribution of the father node fails and attribution failure is set, whether to submit the state of the child node is set;

when the father node fails to be attributed and finds a child node, setting whether to submit the self-node state or not;

and if no fields needing to be exchanged exist, setting the values of the fields as a third preset value.

6. The method of claim 2, further comprising:

attribution attributes of attribute definitions of a data processing engine are generated from a path of a specified item and attribution attributes of the specified item.

7. A monitoring platform, which is applied to the method of any one of claims 1 to 6, comprising:

a management web page end, a management data processing engine and a data processing engine, wherein,

the management webpage end is used for providing a user use interface and authority verification;

the management data processing engine is used for starting and stopping the appointed workflow and finishing the pre-release and release work of the project;

and the data processing engine is used for operating the specified workflow and finishing the data processing work.

8. The monitoring platform of claim 7, further comprising: the database is used for searching or creating users for logs of different sources according to related user fields in the logs; and generating an association path at a session or user level for the event with the association relationship.

9. A log real-time attribution processing apparatus, comprising:

the updating module is used for updating the data through upsert when a unique key of the data exists;

the association module is used for associating the upstream log and the downstream log according to the upstream and downstream log relationship specified in the dependency attribution path;

the log generation module is used for performing complementary updating on the fields of the upstream log and the downstream log and recording the association result in a first log and a second log, wherein the first log is used for expressing whether the upstream log is successfully associated or not; the second log is used for expressing whether the downstream log is associated or not.

10. The apparatus of claim 9, wherein the associating module comprises:

the judging unit is used for judging whether the top node or the attribution path is empty or not;

the execution unit is used for determining whether to execute the execution logic of the child node for finding the parent node according to the judgment result;

and the submitting unit is used for submitting the execution result to the log set.