CN113282393A - Method for solving task scheduling facing to multiple portrait label jobs - Google Patents

Publication number
CN113282393A
Authority
CN
China
Prior art keywords
data
portrait
label
scheduling
tag
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110624290.7A
Other languages
Chinese (zh)
Inventor
刘跃红
余丽玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yinsheng Payment Service Co Ltd
Original Assignee
Yinsheng Payment Service Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yinsheng Payment Service Co Ltd filed Critical Yinsheng Payment Service Co Ltd
Priority to CN202110624290.7A priority Critical patent/CN113282393A/en
Publication of CN113282393A publication Critical patent/CN113282393A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24553Query execution of query operations
    • G06F16/24554Unary operations; Data partitioning operations
    • G06F16/24556Aggregation; Duplicate elimination
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26Visual data mining; Browsing structured data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/283Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Abstract

The invention discloses a method for scheduling jobs for multiple user-portrait labels, belonging to the technical field of system development. The method comprises the following steps: step one, acquiring the data required by the user portrait and using it as the source end of offline data scheduling; step two, synchronizing the required fields into a Hive data warehouse; step three, checking the number of source-end records against the number of records synchronized to the portrait intermediate table and, if the two are consistent, starting the scheduling job for the offline label data and calculating the offline label data of the user portrait, or, if they are inconsistent, deleting the data synchronized that day and rescheduling the synchronization; and step four, checking whether the calculated offline label data is normal and, if it is abnormal, notifying the relevant personnel by email and Enterprise WeChat, or, if it is normal, starting a scheduling task that aggregates the user label data. The invention can automatically generate the required labels and, when a task has a problem, reschedule it and send an email notification, ensuring the reliability and stability of the data.

Description

Method for solving task scheduling facing to multiple portrait label jobs
Technical Field
The invention relates to the technical field of system development, and in particular to a method for scheduling jobs for multiple user-portrait labels.
Background
A user portrait system can require hundreds of label scripts. To reduce coupling between the data of different labels, each label is submitted as a separate Spark task, and a timed job must run every day to refresh the new labels generated the previous day. This gives rise to a number of problems:
1. generating labels requires dedicated personnel to write calculation scripts, and as the number of labels grows, the scale of the scheduling tasks and the labor cost keep increasing;
2. the dependency relationships between tasks become disordered;
3. it is hard to see which task is currently executing;
4. when a problem occurs, it cannot be located quickly;
5. it is inconvenient to record the execution history of scheduled tasks.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides a method for scheduling jobs for multiple user-portrait labels that solves the technical problems described above.
The technical solution adopted by the invention is as follows:
a method for scheduling jobs for multiple user-portrait labels comprises the following steps:
step one: collecting the data required by the user portrait, including business data, log data, buried-point (event tracking) data and third-party data, storing the data in a Hive data warehouse, and using it as the source end of offline data scheduling;
step two: synchronizing the required fields into the Hive data warehouse;
step three: checking the number of source-end records against the number of records synchronized to the portrait intermediate table; if the two numbers are consistent, starting the scheduling job for the offline label data and calculating the offline label data of the user portrait; if they are inconsistent, deleting the data synchronized that day and rescheduling the synchronization to the portrait intermediate table;
step four: checking whether the calculated offline label data is normal; if it is abnormal, notifying the relevant personnel by email and Enterprise WeChat; if it is normal, starting a scheduling task that aggregates the user label data and, after the aggregation is finished, synchronizing the label data and the aggregated label data and storing them in a plurality of databases of different types.
As an improvement of the above technical solution, the method further comprises processing real-time label data through Kafka: the user portrait label data that requires real-time processing is calculated and stored in a portrait database.
As an improvement of the above technical solution, in step one, the business data, log data, buried-point data and third-party data are stored in the corresponding DW, ODS and DM libraries of the Hive data warehouse by means of programs, scripts and OGG.
As an improvement of the above technical solution, after the business data, log data, buried-point data and third-party data are stored in the DW, ODS and DM libraries, the Hive data warehouse initiates a table field requirement.
As an improvement of the above technical solution, the required table fields in the Hive data warehouse are selected according to the requirement and synchronized into the portrait intermediate table of the Hive data warehouse.
As an improvement of the above technical solution, in step three, the offline data is scheduled through the DolphinScheduler platform according to the dependency relationships between the labels.
As an improvement of the above technical solution, in step four, the synchronized data comprises label status data, the generated label data and the aggregated data;
the label status data is recorded in a MySQL database, and if that day's synchronized data is abnormal, the normal data of the previous day is displayed;
the generated label data is recorded in a ClickHouse database;
and the aggregated data is recorded in an Elasticsearch database and an HBase database.
As an improvement of the above technical solution, the label data synchronization is scheduled through the DolphinScheduler platform.
As an improvement of the above technical solution, the method further comprises, when there is a new label requirement, directly matching an existing label template according to the content of the requirement to generate a scheduling task, and directly calculating the label of the user portrait.
The beneficial effects of the invention are as follows:
(1) general-purpose labels can be generated automatically according to the requirements of business personnel;
(2) the relationships among all label tasks are clear and easy to inspect, so the tasks execute in an orderly manner;
(3) the currently executing task can be seen clearly in a visual interface;
(4) problems can be located quickly from the logs and handled promptly;
(5) the execution time of each task is easy to check, so that run times can be optimized later;
(6) when a task has a problem, it is rescheduled and an email notification is sent, ensuring the reliability and stability of the data.
Drawings
The invention is further illustrated with reference to the following figures and examples.
FIG. 1 is a schematic structural diagram of the present invention.
Detailed Description
The concept, specific structure and technical effects of the invention are described clearly and completely below with reference to the embodiments and the accompanying drawing, so that the objects, features and effects of the invention can be fully understood. The described embodiments are only some, not all, of the embodiments of the invention; other embodiments obtained by those skilled in the art without inventive effort on the basis of these embodiments fall within the protection scope of the invention. In addition, the connection relationships referred to in this patent do not mean that components are directly connected, but that a better connection structure can be formed by adding or removing auxiliary connecting components according to the specific implementation. The technical features of the invention can be combined with one another provided they do not conflict.
Referring to FIG. 1, the invention discloses a method for scheduling jobs for multiple user-portrait labels, comprising the following steps:
step one: collecting the data required by the user portrait, including business data, log data, buried-point (event tracking) data and third-party data, storing the data in a Hive data warehouse, and using it as the source end of offline data scheduling;
step two: synchronizing the required fields into the Hive data warehouse;
step three: checking the number of source-end records against the number of records synchronized to the portrait intermediate table; if the two numbers are consistent, starting the scheduling job for the offline label data and calculating the offline label data of the user portrait; if they are inconsistent, deleting the data synchronized that day and rescheduling the synchronization to the portrait intermediate table;
step four: checking whether the calculated offline label data is normal; if it is abnormal, notifying the relevant personnel by email and Enterprise WeChat; if it is normal, starting a scheduling task that aggregates the user label data and, after the aggregation is finished, synchronizing the label data and the aggregated label data and storing them in a plurality of databases of different types.
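The row-count check of step three can be illustrated with a minimal Python sketch. It is not the patented implementation: the four callables, the in-memory lists standing in for the source table and the portrait intermediate table, and the `max_attempts` limit are all assumptions made for illustration, since the patent does not say how counts are obtained or how many times a synchronization is retried.

```python
from typing import Callable, List


def sync_with_count_check(
    count_source: Callable[[], int],
    count_intermediate: Callable[[], int],
    delete_todays_sync: Callable[[], None],
    run_sync: Callable[[], None],
    max_attempts: int = 3,  # assumed limit; the patent does not specify one
) -> bool:
    """Return True once the two counts match, False if attempts run out."""
    for _ in range(max_attempts):
        run_sync()
        if count_source() == count_intermediate():
            return True  # consistent: offline label scheduling may start
        delete_todays_sync()  # inconsistent: drop the day's rows, then retry
    return False


# Hypothetical in-memory stand-ins for the two tables.
source_rows = ["u1", "u2", "u3"]
intermediate_rows: List[str] = []
attempts = {"n": 0}


def flaky_sync() -> None:
    """Lose one row on the first attempt to simulate a failed synchronization."""
    attempts["n"] += 1
    copied = source_rows[:-1] if attempts["n"] == 1 else source_rows
    intermediate_rows.extend(copied)


ok = sync_with_count_check(
    count_source=lambda: len(source_rows),
    count_intermediate=lambda: len(intermediate_rows),
    delete_todays_sync=intermediate_rows.clear,
    run_sync=flaky_sync,
)
```

In this run the first synchronization is rejected and its rows are deleted, and the second one passes the check, so the offline label jobs would only start against a complete intermediate table.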
In the above embodiment, after the business data, log data, buried-point data and third-party data are stored in the DW, ODS and DM libraries of the Hive data warehouse, the Hive data warehouse initiates a table field requirement, and the required table fields are selected according to the requirement and synchronized into the portrait intermediate table of the Hive data warehouse. A check is then initiated. If the number of source-end records is consistent with the number of records synchronized to the portrait intermediate table, the scheduling job for the offline label data is started and the offline label data of the user portrait is calculated. If the numbers are inconsistent, the data synchronized that day is deleted and the synchronization is rescheduled, that is, the check is initiated again according to the different label types. After the scheduling task for the offline label data has started and the user portrait label data has been generated, the calculated offline label data is checked. If it is abnormal, the relevant personnel are notified by email and Enterprise WeChat and scheduling is initiated again, that is, the calculation and the check are repeated. If it is normal, a scheduling task that aggregates the user label data is started, and the label data and the aggregated label data are synchronized and stored in a plurality of databases of different types. In this process, general-purpose labels can be generated automatically according to the requirements of business personnel, the relationships among the label tasks are clear and easy to inspect, the tasks execute in an orderly manner, and when a task has a problem it is rescheduled and an email notification is sent, ensuring the reliability and stability of the data.
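The step-four branch (notify and recompute when the label data is abnormal, aggregate when it is normal) can be sketched in the same spirit. Everything named here is hypothetical: the patent does not define what "normal" means, so `is_normal` is an assumed rule (non-empty, no negative counts), `notify` stands in for the email and Enterprise WeChat messages, and `max_attempts` is an assumed retry limit.

```python
from typing import Callable, Dict, List, Optional


def run_offline_labels(
    compute_labels: Callable[[], Dict[str, int]],
    is_normal: Callable[[Dict[str, int]], bool],
    notify: Callable[[str], None],
    aggregate: Callable[[Dict[str, int]], int],
    max_attempts: int = 2,  # assumed limit; not specified in the patent
) -> Optional[int]:
    """Compute labels, validate them, and aggregate only validated data."""
    for attempt in range(1, max_attempts + 1):
        labels = compute_labels()
        if is_normal(labels):
            return aggregate(labels)  # normal: start the aggregation task
        # abnormal: tell the responsible people, then schedule again
        notify(f"offline label data abnormal (attempt {attempt})")
    return None


# Hypothetical run: the first computation yields nothing, the second succeeds.
batches = iter([{}, {"high_value": 2, "new_user": 5}])
messages: List[str] = []

total = run_offline_labels(
    compute_labels=lambda: next(batches),
    is_normal=lambda labels: bool(labels) and all(n >= 0 for n in labels.values()),
    notify=messages.append,
    aggregate=lambda labels: sum(labels.values()),
)
```

Passing the notifier in as a callable keeps the control flow independent of any particular mail or messaging client.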
The method further comprises processing real-time label data through Kafka: the user portrait label data that requires real-time processing is calculated and stored in a portrait database.
Furthermore, the business data, log data, buried-point data and third-party data are stored in the corresponding DW, ODS and DM libraries of the Hive data warehouse by means of programs, scripts and OGG. After the data is stored in the DW, ODS and DM libraries, the Hive data warehouse initiates a table field requirement, and the required table fields are selected according to the requirement and synchronized into the portrait intermediate table of the Hive data warehouse.
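One way to picture the field synchronization is as a generated HiveQL statement that copies only the requested columns into the intermediate table. The sketch below is an assumption-laden illustration: the table names `dw.user_orders` and `portrait.user_mid`, the column names, and the daily partition column `dt` are hypothetical and do not come from the patent.

```python
from typing import List


def build_sync_statement(
    source_table: str,
    target_table: str,
    fields: List[str],
    partition_day: str,
) -> str:
    """Build a HiveQL statement that copies selected fields for one day."""
    if not fields:
        raise ValueError("at least one field is required")
    for name in fields:
        if not name.isidentifier():
            raise ValueError(f"unexpected field name: {name!r}")
    columns = ", ".join(fields)
    return (
        f"INSERT OVERWRITE TABLE {target_table} "
        f"PARTITION (dt='{partition_day}') "
        f"SELECT {columns} FROM {source_table} "
        f"WHERE dt='{partition_day}'"
    )


sql = build_sync_statement(
    source_table="dw.user_orders",      # hypothetical source table
    target_table="portrait.user_mid",   # hypothetical intermediate table
    fields=["user_id", "order_amount"],
    partition_day="2021-06-04",
)
```

Writing each day into its own partition is a design choice of this sketch; it makes "deleting the data synchronized that day" a matter of overwriting or dropping a single partition.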
In the above embodiment, the real-time data and the offline data are of the same kinds and include business data, log data, buried-point data, third-party data and external data. "Buried point" (event tracking) is a term from the field of data acquisition, especially user behavior data acquisition, and refers to the technology and implementation process for capturing, processing and transmitting specific user behaviors or events. The offline data is processed through the program, script and OGG channels, and the real-time data is processed through the Kafka channel.
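The split between offline and real-time channels can be expressed as a small routing rule. This is only a sketch: the `TrackingEvent` record, its fields and the default offline channel are hypothetical, and no actual Kafka client is involved.

```python
from dataclasses import dataclass
from enum import Enum


class Channel(Enum):
    PROGRAM = "program"
    SCRIPT = "script"
    OGG = "ogg"
    KAFKA = "kafka"


@dataclass(frozen=True)
class TrackingEvent:
    """A hypothetical buried-point record: one captured user action."""
    user_id: str
    event: str
    realtime: bool  # True if the label built from it must update in real time


def route(event: TrackingEvent, offline_channel: Channel = Channel.SCRIPT) -> Channel:
    """Real-time data goes through Kafka; offline data uses a batch channel."""
    if event.realtime:
        return Channel.KAFKA
    if offline_channel is Channel.KAFKA:
        raise ValueError("Kafka is reserved for real-time data in this sketch")
    return offline_channel


click = TrackingEvent(user_id="u1", event="click", realtime=True)
login = TrackingEvent(user_id="u1", event="login", realtime=False)
```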
Furthermore, when there is a new label requirement, an existing label template is matched directly according to the content of the requirement to generate a scheduling task, and the label data of the user portrait is calculated directly.
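Template matching of this kind might look like the following sketch, which fills a stored query template from a requirement and wraps it as a task definition. The template names, the SQL they contain, the requirement layout and the task dictionary are all hypothetical; the patent does not describe the template format.

```python
from string import Template
from typing import Any, Dict

# Hypothetical label templates keyed by requirement type.
TEMPLATES: Dict[str, Template] = {
    "count_threshold": Template(
        "SELECT user_id FROM $table GROUP BY user_id "
        "HAVING COUNT(*) >= $threshold"
    ),
    "sum_threshold": Template(
        "SELECT user_id FROM $table GROUP BY user_id "
        "HAVING SUM($column) >= $threshold"
    ),
}


def build_label_task(requirement: Dict[str, Any]) -> Dict[str, str]:
    """Match a requirement to a template and return a task definition."""
    template = TEMPLATES.get(requirement["type"])
    if template is None:
        raise KeyError(f"no template for type {requirement['type']!r}")
    # substitute() raises KeyError if a placeholder has no parameter
    sql = template.substitute(requirement["params"])
    return {"task_name": f"label_{requirement['name']}", "sql": sql}


task = build_label_task(
    {
        "name": "frequent_buyer",
        "type": "count_threshold",
        "params": {"table": "portrait.user_mid", "threshold": 10},
    }
)
```

A requirement that matches no template raises an error here rather than silently producing an empty task, so a person would be asked to write a new template only in that case.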
In step three of the solution, the offline data is scheduled through the DolphinScheduler platform according to the dependency relationships among the labels. DolphinScheduler is a scheduling system in which the currently executing tasks can be seen clearly in a visual interface, problems can be located quickly from the logs and handled promptly, and the execution time of each task is easy to check so that subsequent run times can be optimized.
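Scheduling by dependency amounts to running the label tasks in a topological order of their dependency graph. The sketch below uses Python's standard `graphlib` module to show the idea; it does not use or imitate the DolphinScheduler API, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each task maps to the set of tasks it depends on (hypothetical names).
dependencies: Dict[str, Set[str]] = {
    "label_total_spend": {"sync_intermediate"},
    "label_new_user": {"sync_intermediate"},
    "label_high_value": {"label_total_spend"},
    "aggregate_labels": {"label_high_value", "label_new_user"},
}


def execution_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into waves; tasks in the same wave can run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(dependencies)
```

For the graph above, the synchronization runs first, the two independent labels run together, and the aggregation runs last, which is the kind of ordering a workflow scheduler derives from declared dependencies.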
In step four of the solution, the synchronized data comprises label status data, the generated label data and the aggregated data, and the label data synchronization is scheduled through the DolphinScheduler platform. The label status data is recorded in a MySQL database; if that day's synchronized data is abnormal, the normal data of the previous day is displayed instead. The generated label data is recorded in a ClickHouse database, and the aggregated data is recorded in an Elasticsearch database and an HBase database. The MySQL database serves the label metadata for display; the ClickHouse database aggregates large batches of data very quickly, so label aggregation data can be read from it; the Elasticsearch database is responsible for crowd (audience) calculation and analysis; and the HBase database allows the data collected by the user portrait system to be displayed in a personalized, real-time manner.
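The mapping from data kind to target database, and the fallback to the previous day's status, can be sketched as follows. The dictionary keys, the `sync_ok` flag and the record layout are assumptions for illustration; no database client is used.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional

# Which stores receive which kind of synchronized data, per the description.
SINKS: Dict[str, List[str]] = {
    "label_status": ["mysql"],
    "label_data": ["clickhouse"],
    "aggregated": ["elasticsearch", "hbase"],
}


def status_to_display(
    status_by_day: Dict[date, dict], today: date
) -> Optional[dict]:
    """Show today's status if its sync was normal, else yesterday's normal one."""
    current = status_by_day.get(today)
    if current is not None and current["sync_ok"]:
        return current
    previous = status_by_day.get(today - timedelta(days=1))
    if previous is not None and previous["sync_ok"]:
        return previous
    return None  # nothing trustworthy to display


# Hypothetical history: today's synchronization failed.
today = date(2021, 6, 4)
history = {
    date(2021, 6, 3): {"sync_ok": True, "labels": 120},
    date(2021, 6, 4): {"sync_ok": False, "labels": 0},
}
shown = status_to_display(history, today)
```

The sketch looks back a single day only, matching the wording of the description; how far back a real system should search is left open.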
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (9)

1. A method for scheduling jobs for multiple user-portrait labels, characterized by comprising the following steps:
step one: collecting the data required by the user portrait, including business data, log data, buried-point (event tracking) data and third-party data, storing the data in a Hive data warehouse, and using it as the source end of offline data scheduling;
step two: synchronizing the required fields into the Hive data warehouse;
step three: checking the number of source-end records against the number of records synchronized to the portrait intermediate table; if the two numbers are consistent, starting the scheduling job for the offline label data and calculating the offline label data of the user portrait; if they are inconsistent, deleting the data synchronized that day and rescheduling the synchronization to the portrait intermediate table;
step four: checking whether the calculated offline label data is normal; if it is abnormal, notifying the relevant personnel by email and Enterprise WeChat; if it is normal, starting a scheduling task that aggregates the user label data and, after the aggregation is finished, synchronizing the label data and the aggregated label data and storing them in a plurality of databases of different types.
2. The method according to claim 1, further comprising processing real-time label data through Kafka, wherein the user portrait label data that requires real-time processing is calculated and stored in a portrait database.
3. The method according to claim 1, wherein in step one, the business data, log data, buried-point data and third-party data are stored in the corresponding DW, ODS and DM libraries of the Hive data warehouse by means of programs, scripts and OGG.
4. The method according to claim 3, wherein the Hive data warehouse initiates a table field requirement after the business data, log data, buried-point data and third-party data are stored in the DW, ODS and DM libraries.
5. The method according to claim 4, wherein the required table fields in the Hive data warehouse are selected according to the requirement and synchronized into the portrait intermediate table of the Hive data warehouse.
6. The method according to claim 1, wherein in step three, the offline data is scheduled through the DolphinScheduler platform according to the dependency relationships between the labels.
7. The method according to claim 1, wherein in step four, the synchronized data comprises label status data, the generated label data and the aggregated data;
the label status data is recorded in a MySQL database, and if that day's synchronized data is abnormal, the normal data of the previous day is displayed;
the generated label data is recorded in a ClickHouse database;
and the aggregated data is recorded in an Elasticsearch database and an HBase database.
8. The method according to claim 7, wherein the label data synchronization is scheduled through the DolphinScheduler platform.
9. The method according to claim 1, further comprising, when there is a new label requirement, directly matching an existing label template according to the content of the requirement to generate a scheduling task, and directly calculating the label of the user portrait.
CN202110624290.7A 2021-06-04 2021-06-04 Method for solving task scheduling facing to multiple portrait label jobs Pending CN113282393A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110624290.7A CN113282393A (en) 2021-06-04 2021-06-04 Method for solving task scheduling facing to multiple portrait label jobs

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110624290.7A CN113282393A (en) 2021-06-04 2021-06-04 Method for solving task scheduling facing to multiple portrait label jobs

Publications (1)

Publication Number Publication Date
CN113282393A true CN113282393A (en) 2021-08-20

Family

ID=77283343

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110624290.7A Pending CN113282393A (en) 2021-06-04 2021-06-04 Method for solving task scheduling facing to multiple portrait label jobs

Country Status (1)

Country Link
CN (1) CN113282393A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114064660A (en) * 2021-11-29 2022-02-18 重庆允成互联网科技有限公司 Data structured analysis method based on ElasticSearch

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107391752A (en) * 2017-08-16 2017-11-24 四川长虹电器股份有限公司 A kind of method based on hadoop platform construction user tag information
CN110427422A (en) * 2019-05-23 2019-11-08 武汉达梦数据库有限公司 Data consistency verification method, equipment and storage medium when data synchronous abnormality
CN111475509A (en) * 2020-04-03 2020-07-31 李俊宏 Big data-based user portrait and multidimensional analysis system
CN111737230A (en) * 2020-06-23 2020-10-02 北京奇艺世纪科技有限公司 Data verification method and device, electronic equipment and readable storage medium
CN111881221A (en) * 2020-07-07 2020-11-03 上海中通吉网络技术有限公司 Method, device and equipment for customer portrait in logistics service

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈捷,: "基于大数据技术的用户画像系统设计与实现", 《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑(月刊)》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114064660A (en) * 2021-11-29 2022-02-18 重庆允成互联网科技有限公司 Data structured analysis method based on ElasticSearch
CN114064660B (en) * 2021-11-29 2022-06-21 重庆允成互联网科技有限公司 Data structured analysis method based on ElasticSearch

Similar Documents

Publication Publication Date Title
CN109408337B (en) Interface operation and maintenance method and device
CN101719149A (en) Data synchronization method and device
CN111125444A (en) Big data task scheduling management method, device, equipment and storage medium
CN111400011B (en) Real-time task scheduling method, system, equipment and readable storage medium
CN108197155A (en) Information data synchronous method, device and computer readable storage medium
US20090083221A1 (en) System and Method for Estimating and Storing Skills for Reuse
CN111061696B (en) Method and device for analyzing transaction message log
CN112486701A (en) Message asynchronous processing method and equipment thereof
CN114741375A (en) Rapid and automatic data migration system and method for multi-source heterogeneous database
CN114356692A (en) Visual processing method and device for application monitoring link and storage medium
CN113282393A (en) Method for solving task scheduling facing to multiple portrait label jobs
CN105867895A (en) Method for realizing code management and need management information synchronization and device thereof
CN114398359A (en) Order data automatic reconciliation method, device and storage medium
CN111753015A (en) Data query method and device of payment clearing system
CN116627609A (en) Hive batch processing-based scheduling method and device
CN107451056B (en) Method and device for monitoring interface test result
CN110322313A (en) The method transferred items based on SAP system batch creation sales order and delivery order
CN115481116A (en) Data quality inspection method and device
CN114492861A (en) Test data acquisition and analysis method
CN114020819A (en) Multi-system parameter synchronization method and device
US8631391B2 (en) Method and a system for process discovery
CN112527497A (en) Serialized multithreading data processing system
CN112925697B (en) Method, device, equipment and medium for monitoring job difference
CN109710688A (en) A kind of real-time Inspection method of data and message-oriented middleware
CN112965793B (en) Identification analysis data-oriented data warehouse task scheduling method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210820