CN113282393A

CN113282393A - Method for solving task scheduling facing to multiple portrait label jobs

Info

Publication number: CN113282393A
Application number: CN202110624290.7A
Authority: CN
Inventors: 刘跃红; 余丽玲
Original assignee: Yinsheng Payment Service Co Ltd
Current assignee: Yinsheng Payment Service Co Ltd
Priority date: 2021-06-04
Filing date: 2021-06-04
Publication date: 2021-08-20

Abstract

The invention discloses a method for scheduling jobs for multiple user-portrait tags, which belongs to the technical field of system development and comprises the following steps: step one, acquiring the data required by user portraits and using it as the source end of offline data scheduling; step two, synchronizing the required fields into a Hive data warehouse; step three, checking the number of source-end records against the number of records synchronized to the portrait intermediate table: if they are consistent, starting the scheduling job for offline tag data and calculating the offline tag data of the user portraits, and if they are inconsistent, deleting the data synchronized that day and rescheduling the synchronization; and step four, checking whether the calculated offline tag data is normal: if not, sending an email and an Enterprise WeChat message to notify the relevant personnel, and if so, starting the scheduling task for aggregating the user tag data. The invention can generate the required tags automatically and, when a task fails, reschedule it and send a mail notification, ensuring the reliability and stability of the data.

Description

Method for solving task scheduling facing to multiple portrait label jobs

Technical Field

The invention relates to the technical field of system development, and in particular to a method for scheduling jobs for multiple user-portrait tags.

Background

Because of the demands of the user portrait system, hundreds of tag scripts need to be developed. To reduce coupling between the data of different tags, each tag submits its own Spark task, and a timed job must run every day to refresh the new tags generated on the previous day. This raises several problems:

1. tag generation requires dedicated staff to write calculation scripts, and as the number of tags grows, the scheduling workload grows with it and labor costs keep rising;

2. the dependencies between tasks are disorganized;

3. it is hard to see which task is currently being executed;

4. when a problem occurs, it cannot be located quickly;

5. it is inconvenient to record the execution history of scheduled tasks.

Disclosure of Invention

To overcome the defects of the prior art, the invention provides a method for scheduling jobs for multiple user-portrait tags that solves the above technical problems.

The technical scheme adopted by the invention for solving the technical problems is as follows:

A method for scheduling jobs for multiple user-portrait tags comprises the following steps:

step one: collecting the data required by user portraits, including business data, log data, buried point data and third-party data, storing it in a Hive data warehouse, and using it as the source end of offline data scheduling;

step two: synchronizing the required fields into the Hive data warehouse;

step three: checking the number of source-end records against the number of records synchronized to the portrait intermediate table; if the two are consistent, starting the scheduling job for offline tag data and calculating the offline tag data of the user portraits; if they are inconsistent, deleting the data synchronized that day and rescheduling the synchronization to the portrait intermediate table;

step four: checking whether the calculated offline tag data is normal; if not, sending an email and an Enterprise WeChat message to notify the relevant personnel; if so, starting the scheduling task for aggregating the user tag data and, once aggregation is finished, synchronizing the tag data and the aggregated tag data into several databases of different types.
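The control flow of steps three and four can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: each callable argument (`sync`, `source_count`, `synced_count`, `delete_day`, `compute_tags`, `is_normal`, `aggregate`, `notify`) is a hypothetical stand-in for a Hive query, a scheduled job, or an email/Enterprise WeChat alert, and the bounded `max_attempts` retry limit is an assumption, since the text does not say how often rescheduling is repeated.

```python
from typing import Callable, Optional


def run_daily_pipeline(
    day: str,
    sync: Callable[[str], None],          # hypothetical: copy fields into the portrait intermediate table
    source_count: Callable[[str], int],   # hypothetical: row count at the source end
    synced_count: Callable[[str], int],   # hypothetical: row count in the intermediate table
    delete_day: Callable[[str], None],    # hypothetical: delete the day's synchronized data
    compute_tags: Callable[[str], dict],  # hypothetical: run the offline tag jobs
    is_normal: Callable[[dict], bool],    # hypothetical: sanity check on computed tags
    aggregate: Callable[[dict], dict],    # hypothetical: aggregate user tag data
    notify: Callable[[str], None],        # hypothetical: email + Enterprise WeChat alert
    max_attempts: int = 3,                # assumed retry limit (not specified in the text)
) -> Optional[dict]:
    # Step three: synchronize, compare counts, and on mismatch delete the day and retry.
    for _ in range(max_attempts):
        sync(day)
        if source_count(day) == synced_count(day):
            break
        delete_day(day)
    else:
        notify(f"{day}: source and intermediate-table counts still differ")
        return None

    # Step four: validate computed tags; alert and recompute when they are abnormal.
    for _ in range(max_attempts):
        tags = compute_tags(day)
        if is_normal(tags):
            return aggregate(tags)
        notify(f"{day}: offline tag data abnormal, rescheduling")
    return None
```

Passing the stages in as functions keeps the reconciliation and validation logic independent of the storage and alerting backends, so it can be exercised with in-memory fakes.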

As an improvement of the technical scheme, the method further comprises processing real-time tag data through Kafka: the user portrait tag data that requires real-time processing is calculated and stored in a portrait database.
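The real-time path can be illustrated with a minimal sketch that assumes a hypothetical event schema (`user_id`, `event`) and a hypothetical tag rule (a running payment count and an `active_payer` flag). To stay self-contained, the Kafka consumer is replaced by an iterable of already-decoded messages and the portrait database by a dictionary; neither the rule nor the store layout comes from the patent.

```python
from typing import Any, Iterable, Mapping


def process_realtime_events(
    events: Iterable[Mapping[str, Any]],
    portrait_db: dict[str, dict[str, Any]],
    active_threshold: int = 3,  # hypothetical rule parameter
) -> dict[str, dict[str, Any]]:
    """Update real-time portrait tags from a stream of decoded events.

    In production, `events` would come from a Kafka consumer and `portrait_db`
    would be the portrait database; both are in-memory stand-ins here.
    """
    for event in events:
        if event.get("event") != "payment":
            continue  # this sketch derives tags from payment events only
        tags = portrait_db.setdefault(event["user_id"], {})
        tags["payment_count"] = tags.get("payment_count", 0) + 1
        tags["active_payer"] = tags["payment_count"] >= active_threshold
    return portrait_db
```

Because the counts are read back from the store on every event, the same function can be called repeatedly on successive message batches.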

As an improvement of the technical scheme, in step one, the business data, log data, buried point data and third-party data are stored into the corresponding DW, ODS and DM libraries of the Hive data warehouse by means of programs, scripts and OGG (Oracle GoldenGate).

As an improvement of the technical scheme, after the business data, log data, buried point data and third-party data are stored in the DW, ODS and DM libraries, the Hive data warehouse initiates a table-field requirement.

As an improvement of the technical scheme, the required table fields in the Hive data warehouse are selected according to the requirements and synchronized into the portrait intermediate table of the Hive data warehouse.
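Selecting the required fields and synchronizing them into the portrait intermediate table, together with the counts compared in step three, might be expressed as HiveQL generated from Python. The table names and the assumption that source and intermediate tables are both partitioned by a string column `dt` are hypothetical; the identifier check is a simple safeguard added for the sketch.

```python
import re
from datetime import date

# Plain identifier, optionally qualified by a database name (e.g. dw.user_info).
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def build_sync_statements(
    source_table: str, target_table: str, fields: list[str], day: str
) -> dict[str, str]:
    # Reject anything that is not a plain identifier before splicing it into SQL.
    for name in [source_table, target_table, *fields]:
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    date.fromisoformat(day)  # raises ValueError unless day is YYYY-MM-DD
    cols = ", ".join(fields)
    part = f"dt='{day}'"  # assumed partition column
    return {
        # Copy only the selected fields into the day's intermediate-table partition.
        "sync": f"INSERT OVERWRITE TABLE {target_table} PARTITION ({part}) "
                f"SELECT {cols} FROM {source_table} WHERE {part}",
        # Row counts compared in step three.
        "count_source": f"SELECT COUNT(*) FROM {source_table} WHERE {part}",
        "count_target": f"SELECT COUNT(*) FROM {target_table} WHERE {part}",
        # Deletes the day's synchronized data before rescheduling.
        "drop_day": f"ALTER TABLE {target_table} DROP IF EXISTS PARTITION ({part})",
    }
```

Using `INSERT OVERWRITE` on a daily partition makes a rescheduled synchronization idempotent: rerunning it replaces the day's rows instead of appending duplicates.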

As an improvement of the above technical solution, in step three, the offline data is scheduled on the DolphinScheduler platform according to the dependencies between the tags.
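DolphinScheduler models dependent tasks as a workflow DAG. The ordering logic itself can be illustrated in plain Python with the standard library's `graphlib.TopologicalSorter` (Python 3.9+); this is not DolphinScheduler's API, and the tag names are hypothetical. Each returned batch contains tags whose dependencies have all completed, so the tags within a batch could run in parallel.

```python
from graphlib import TopologicalSorter


def schedule_batches(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group tags into batches that respect their dependencies.

    `dependencies` maps each tag to the set of tags it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tags whose dependencies are all done
        batches.append(ready)
        sorter.done(*ready)  # a real scheduler would mark done only after the jobs succeed
    return batches
```

For example, `schedule_batches({"consumption_level": {"monthly_spend"}, "monthly_spend": {"user_base"}, "user_base": set()})` yields three single-tag batches in dependency order, and a circular dependency is rejected at `prepare()` instead of producing a schedule.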

As an improvement of the above technical solution, in step four, the synchronized data includes tag status data, the generated tag data, and the aggregated data;

the tag status data is recorded in a MySQL database, and if the current day's synchronization is abnormal, the previous day's normal data is displayed instead;

the generated tag data is recorded in a ClickHouse database;

and the aggregated data is recorded in an Elasticsearch database and an HBase database.
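The routing in step four and the fallback display rule can be sketched as follows. The store names mirror the text; the per-day status values (`"normal"`/`"abnormal"`) are hypothetical, and looking back more than one day (`max_lookback`) extends the described previous-day rule and is an assumption.

```python
from datetime import date, timedelta
from typing import Optional

# Target stores for each kind of synchronized data, as listed in step four.
STORE_ROUTING = {
    "tag_status": ["mysql"],
    "tag_data": ["clickhouse"],
    "aggregated_tags": ["elasticsearch", "hbase"],
}


def display_date(
    status_by_day: dict[date, str], today: date, max_lookback: int = 7
) -> Optional[date]:
    """Pick the day whose tag data should be displayed.

    Returns today if today's synchronization is normal, otherwise the most
    recent earlier day marked normal, or None if none is found in the window.
    """
    for offset in range(max_lookback + 1):
        day = today - timedelta(days=offset)
        if status_by_day.get(day) == "normal":
            return day
    return None
```

With `max_lookback=1` the function reduces exactly to the previous-day rule described in the text.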

As an improvement of the technical scheme, the synchronization of tag data is scheduled through the DolphinScheduler platform.

As an improvement of the technical scheme, the method further comprises: when there is a new tag requirement, directly matching an existing tag template according to the content of the requirement to generate a scheduling task, and directly calculating the tag of the user portrait.
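Template matching for a new tag requirement can be sketched as a registry of parameterized SQL templates. The template names, placeholders, and SQL text below are hypothetical examples, not templates from the patent; a requirement with no matching template raises `LookupError`, which in practice would mean a calculation script still has to be written by hand.

```python
# Hypothetical tag templates; placeholders are filled from the requirement's params.
# Values are spliced in verbatim, so params must come from trusted configuration.
TAG_TEMPLATES = {
    "event_count": (
        "SELECT user_id, COUNT(*) AS {tag_name} FROM {source_table} "
        "WHERE event = '{event}' AND dt = '{day}' GROUP BY user_id"
    ),
    "threshold_flag": (
        "SELECT user_id, IF({column} >= {threshold}, 1, 0) AS {tag_name} "
        "FROM {source_table} WHERE dt = '{day}'"
    ),
}


def build_tag_task(requirement: dict) -> dict:
    """Turn a new tag requirement into a schedulable task by matching a template."""
    template = TAG_TEMPLATES.get(requirement.get("template", ""))
    if template is None:
        # No reusable template: this tag still needs a hand-written script.
        raise LookupError(f"no template named {requirement.get('template')!r}")
    params = requirement["params"]
    # str.format raises KeyError if the requirement omits a placeholder value.
    return {"task_name": f"tag_{params['tag_name']}", "sql": template.format(**params)}
```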

The beneficial effects of the invention are as follows:

(1) general-purpose tags can be generated automatically according to the requirements of business personnel;

(2) the relationships among all tag tasks are clear and easy to check, so the tasks can be executed in an orderly manner;

(3) the currently executing task can be seen clearly in a visual interface;

(4) problems can be located quickly from the logs and handled in time;

(5) the execution time of each task is easy to check, which helps optimize run times later;

(6) when a task fails, it is rescheduled and a mail notification is sent, ensuring the reliability and stability of the data.

Drawings

The invention is further illustrated with reference to the following figures and examples.

FIG. 1 is a schematic structural diagram of the present invention.

Detailed Description

The concept, specific structure, and technical effects of the present invention are described clearly and completely below with reference to the embodiments and the accompanying drawing, so that the objects, features, and effects of the invention can be fully understood. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention; other embodiments obtained by those skilled in the art from these embodiments without inventive effort all fall within the protection scope of the present invention. In addition, the connection relationships referred to in this patent do not necessarily mean that components are directly connected; a better connection structure may be formed by adding or removing auxiliary connecting components according to the specific implementation. The technical features of the invention can be combined with one another provided they do not conflict.

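Referring to FIG. 1, the present invention discloses a method for scheduling jobs for multiple user-portrait tags, comprising the following steps:

step one: collecting the data required by user portraits, including business data, log data, buried point data and third-party data, storing it in a Hive data warehouse, and using it as the source end of offline data scheduling;

step two: synchronizing the required fields into the Hive data warehouse;

step three: checking the number of source-end records against the number of records synchronized to the portrait intermediate table, starting the scheduling job for offline tag data if they are consistent, and deleting the data synchronized that day and rescheduling if they are not;

step four: checking whether the calculated offline tag data is normal, notifying the relevant personnel by email and Enterprise WeChat if it is not, and starting the scheduling task for aggregating the user tag data if it is, after which the tag data and the aggregated tag data are synchronized into several databases of different types.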

In the above embodiment, after the business data, log data, buried point data and third-party data are stored in the DW, ODS and DM libraries of the Hive data warehouse, the Hive data warehouse initiates a table-field requirement, and the required table fields are selected and synchronized into the portrait intermediate table of the Hive data warehouse. Verification is then initiated. If the number of source-end records matches the number of records synchronized to the portrait intermediate table, the scheduling job for offline tag data is started and the offline tag data of the user portraits is calculated; if not, the data synchronized that day is deleted and rescheduled, and verification is initiated again for each tag type.

After the offline tag scheduling task has produced the user portrait tag data, the calculated offline tag data is checked. If it is abnormal, an email and an Enterprise WeChat message are sent to the relevant personnel and scheduling is initiated again, i.e., the verification and calculation are repeated. If the tag data is normal, the scheduling task for aggregating the user tag data is started, and the tag data and the aggregated tag data are synchronized into several databases of different types.

In this process, general-purpose tags can be generated automatically according to the requirements of business personnel, the relationships among all tag tasks are clear and easy to check, tasks are executed in an orderly manner, and when a task fails the tag data is rescheduled and a mail notification is sent, ensuring the reliability and stability of the data.

The method further comprises processing real-time tag data through Kafka: the user portrait tag data that requires real-time processing is calculated and stored in a portrait database.

Furthermore, the business data, log data, buried point data and third-party data are stored into the corresponding DW, ODS and DM libraries of the Hive data warehouse by means of programs, scripts and OGG. After these data are stored, the Hive data warehouse initiates a table-field requirement, and the required table fields are selected and synchronized into the portrait intermediate table of the Hive data warehouse.

In the above embodiment, the real-time data and the offline data come from the same sources: business data, log data, buried point data, third-party data, and external data. "Buried point" is a term from data acquisition, particularly user behavior data acquisition, and refers to the techniques and implementation for capturing, processing and transmitting specific user behaviors or events. Offline data is ingested through the program, script and OGG channels, while real-time data is ingested through the Kafka channel.

Further, if there is a new tag requirement, an existing tag template is matched directly according to the content of the requirement to generate a scheduling task, and the tag data of the user portrait is calculated directly.

In step three of the scheme, the offline data is scheduled on the DolphinScheduler platform according to the dependencies between the tags. DolphinScheduler is a scheduling system in which the currently executing tasks can be seen clearly in a visual interface, problems can be located quickly from the logs and handled in time, and the execution time of each task can be checked so that run times can be optimized later.
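Reviewing execution times amounts to aggregating start and end times per task. The sketch assumes a hypothetical run-history format of `(task_name, start, end)` tuples with ISO-8601 timestamps; it does not read DolphinScheduler's own metadata, which would have to be exported into this shape first.

```python
from collections import defaultdict
from datetime import datetime


def average_durations(runs: list[tuple[str, str, str]]) -> list[tuple[str, float]]:
    """Average run time per task in seconds, slowest first.

    `runs` holds (task_name, start, end) tuples with ISO-8601 timestamps
    (a hypothetical run-history format).
    """
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for name, start, end in runs:
        elapsed = datetime.fromisoformat(end) - datetime.fromisoformat(start)
        totals[name] += elapsed.total_seconds()
        counts[name] += 1
    averages = {name: totals[name] / counts[name] for name in totals}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)
```

Sorting slowest first puts the tasks most worth optimizing at the top of the list.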

In step four of the scheme, the synchronized data includes tag status data, the generated tag data, and the aggregated data, and the synchronization of tag data is scheduled through the DolphinScheduler platform. The tag status data is recorded in a MySQL database; if the current day's synchronization is abnormal, the previous day's normal data is displayed instead. The generated tag data is recorded in a ClickHouse database, and the aggregated data is recorded in an Elasticsearch database and an HBase database. MySQL displays the tag metadata; ClickHouse offers very fast aggregation over large volumes of data, so tag aggregation data can be read from it; Elasticsearch is responsible for crowd calculation and analysis; and HBase serves the data collected by the user portrait system for personalized, real-time display.
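The aggregation step can be pictured as pivoting one row per (user, tag) into one document per user, the shape suited to an Elasticsearch document or an HBase row keyed by user ID. The `(user_id, tag_name, value)` row layout and the exact-match crowd filter are assumptions made for illustration; real crowd selection in Elasticsearch would be expressed as a query rather than a Python loop.

```python
from collections import defaultdict
from typing import Any, Iterable


def aggregate_user_tags(rows: Iterable[tuple[str, str, Any]]) -> dict[str, dict[str, Any]]:
    """Pivot (user_id, tag_name, value) rows into one tag document per user."""
    docs: dict[str, dict[str, Any]] = defaultdict(dict)
    for user_id, tag_name, value in rows:
        docs[user_id][tag_name] = value  # a later row for the same tag overwrites an earlier one
    return dict(docs)


def select_crowd(docs: dict[str, dict[str, Any]], **conditions: Any) -> list[str]:
    """Return the IDs of users whose tags equal every given condition."""
    return sorted(
        user_id
        for user_id, tags in docs.items()
        if all(tags.get(name) == value for name, value in conditions.items())
    )
```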


While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

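1. A method for scheduling jobs for multiple user-portrait tags, characterized by comprising the following steps:

step one: collecting the data required by user portraits, including business data, log data, buried point data and third-party data, storing it in a Hive data warehouse, and using it as the source end of offline data scheduling;

step two: synchronizing the required fields into the Hive data warehouse;

step three: checking the number of source-end records against the number of records synchronized to the portrait intermediate table; if they are consistent, starting the scheduling job for offline tag data and calculating the offline tag data of the user portraits; if they are inconsistent, deleting the data synchronized that day and rescheduling the synchronization to the portrait intermediate table;

step four: checking whether the calculated offline tag data is normal; if not, sending an email and an Enterprise WeChat message to notify the relevant personnel; if so, starting the scheduling task for aggregating the user tag data and, once aggregation is finished, synchronizing the tag data and the aggregated tag data into several databases of different types.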

2. The method according to claim 1, further comprising processing real-time tag data through Kafka: calculating the user portrait tag data that requires real-time processing and storing it in a portrait database.

3. The method according to claim 1, wherein in step one, the business data, log data, buried point data and third-party data are stored in the corresponding DW, ODS and DM libraries of the Hive data warehouse by means of programs, scripts and OGG.

4. The method according to claim 3, wherein after the business data, log data, buried point data and third-party data are stored in the DW, ODS and DM libraries, the Hive data warehouse initiates a table-field requirement.

5. The method according to claim 4, wherein the required table fields in the Hive data warehouse are selected according to the requirements and synchronized into the portrait intermediate table of the Hive data warehouse.

6. The method according to claim 1, wherein in step three, the offline data is scheduled on the DolphinScheduler platform according to the dependencies between tags.

7. The method according to claim 1, wherein in step four, the synchronized data includes tag status data, the generated tag data, and the aggregated data;

the tag status data is recorded in a MySQL database, and if the current day's synchronization is abnormal, the previous day's normal data is displayed instead;

the generated tag data is recorded in a ClickHouse database;

and the aggregated data is recorded in an Elasticsearch database and an HBase database.

8. The method according to claim 7, wherein the synchronization of tag data is scheduled through the DolphinScheduler platform.

9. The method according to claim 1, further comprising: when there is a new tag requirement, directly matching an existing tag template according to the content of the requirement to generate a scheduling task, and directly calculating the tag of the user portrait.