CN113282393A - Method for solving task scheduling facing to multiple portrait label jobs - Google Patents
Method for solving task scheduling facing to multiple portrait label jobs
- Publication number: CN113282393A
- Application number: CN202110624290.7A
- Authority: CN (China)
- Prior art keywords: data, portrait, label, scheduling, tag
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24553—Query execution of query operations
- G06F16/24554—Unary operations; Data partitioning operations
- G06F16/24556—Aggregation; Duplicate elimination
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/26—Visual data mining; Browsing structured data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/283—Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
Abstract
The invention discloses a method for scheduling jobs for a plurality of portrait labels, which belongs to the technical field of system development and comprises the following steps: step one, acquiring the data required by the user portrait and using it as the source end of offline data scheduling; step two, synchronizing the fields into a hive data warehouse; step three, checking the number of source-end data records against the number of records synchronized to the portrait intermediate table, and, if the two are consistent, starting the scheduling job for the offline label data and calculating the offline label data of the user portrait, or, if they are inconsistent, deleting the data synchronized that day and rescheduling; and step four, checking whether the calculated offline label data is normal, and, if it is abnormal, notifying the relevant personnel by mail and enterprise WeChat, or, if it is normal, starting a scheduling task for aggregating the user label data. The invention can automatically generate the required labels and, when a task has a problem, reschedule it and send a mail notification, ensuring the reliability and stability of the data.
Description
Technical Field
The invention relates to the technical field of system development, in particular to a method for solving a task of job scheduling for a plurality of portrait labels.
Background
Due to the demands of the user portrait system, hundreds of tag scripts need to be developed. To reduce the coupling between the data of different tags, each tag submits its own spark task, and a timed job is needed every day to refresh the tags generated the previous day. However, a number of problems arise:
1. generating the labels requires dedicated personnel to write calculation scripts; as the number of labels grows, the scale of the scheduling tasks grows and the labor cost keeps rising;
2. the dependency relationships between tasks are disordered;
3. it is inconvenient to see which task is currently executing;
4. when a problem occurs, it cannot be located quickly;
5. it is inconvenient to record the execution of historically scheduled tasks.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a method for solving a task of job scheduling for a plurality of portrait labels, so as to solve the technical problems.
The technical scheme adopted by the invention for solving the technical problems is as follows:
a method for solving task scheduling for a plurality of portrait label jobs comprises the following steps:
step one: collecting the data required by the user portrait, including service data, log data, buried point data and third-party data, storing the data into a hive data warehouse, and using the data as the source end of offline data scheduling;
step two: synchronizing fields into the hive data warehouse;
step three: checking the number of source-end data records against the number of records synchronized to the portrait intermediate table; if the two numbers are consistent, starting the scheduling job for the offline label data and calculating the offline label data of the user portrait; if they are inconsistent, deleting the data synchronized that day and rescheduling the synchronization to the portrait intermediate table;
step four: checking whether the calculated offline tag data is normal; if it is abnormal, sending a mail and an enterprise WeChat message to notify the relevant personnel; if it is normal, starting a scheduling task for aggregating the user tag data and, after the aggregation is finished, synchronizing the tag data and the aggregated tag data and storing them into a plurality of databases of different types.
As an improvement of the technical scheme, the method further comprises processing real-time label data through Kafka: the user portrait label data that needs real-time processing is calculated and stored in a portrait database.
As an improvement of the technical scheme, in step one, the business data, the log data, the buried point data and the third-party data are stored into the corresponding DW library, ODS library and DM library of the hive data warehouse by means of programs, scripts and ogg.
As an improvement of the technical scheme, after business data, log data, buried point data and third-party data are stored in a DW library, an ODS library and a DM library, a hive data warehouse initiates table field requirements.
As an improvement of the technical scheme, the required table fields in the hive data warehouse are selected according to requirements and synchronized into the portrait intermediate table of the hive data warehouse.
As an improvement of the above technical solution, in step three, the offline data is scheduled through the dolphin scheduler platform according to the dependency relationship between the tags.
As an improvement of the above technical solution, in step four, the synchronized data includes tag status data, the synchronously generated tag data, and the synchronously aggregated data;
the tag status data is recorded into a mysql database, and if that day's synchronization is abnormal, the previous day's normal data is displayed;
the synchronously generated tag data is recorded into a clickhouse database;
and the synchronously aggregated data is recorded into an elasticsearch database and an hbase database.
As an improvement of the technical scheme, the synchronous label data is scheduled through a dolphin scheduler platform.
As an improvement of the technical scheme, the method further comprises, if there is a new label requirement, directly matching an existing label template according to the content of the requirement to generate a scheduling task, and directly calculating the label of the user portrait.
The invention has the beneficial effects that:
(1) general-purpose labels can be generated automatically according to the requirements of business personnel;
(2) the relationships among all label tasks are clear and easy to check, so the tasks execute in an orderly way;
(3) the currently executing task can be seen clearly in visual form;
(4) problems can be located quickly from the logs, so that they can be handled in time;
(5) the execution time of each task is easy to check, so that job timing can be optimized later;
(6) when a task has a problem, it is rescheduled and a mail notification is sent, ensuring the reliability and stability of the data.
Drawings
The invention is further illustrated with reference to the following figures and examples.
FIG. 1 is a schematic structural diagram of the present invention.
Detailed Description
The conception, the specific structure, and the technical effects produced by the present invention will be clearly and completely described below in conjunction with the embodiments and the accompanying drawings to fully understand the objects, the features, and the effects of the present invention. It is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments, and those skilled in the art can obtain other embodiments without inventive effort based on the embodiments of the present invention, and all embodiments are within the protection scope of the present invention. In addition, all the connection/connection relations referred to in the patent do not mean that the components are directly connected, but mean that a better connection structure can be formed by adding or reducing connection auxiliary components according to specific implementation conditions. All technical characteristics in the invention can be interactively combined on the premise of not conflicting with each other.
Referring to FIG. 1, the present invention discloses a method for solving job scheduling task facing multiple portrait tags, comprising the following steps:
step one: collecting the data required by the user portrait, including service data, log data, buried point data and third-party data, storing the data into a hive data warehouse, and using the data as the source end of offline data scheduling;
step two: synchronizing fields into the hive data warehouse;
step three: checking the number of source-end data records against the number of records synchronized to the portrait intermediate table; if the two numbers are consistent, starting the scheduling job for the offline label data and calculating the offline label data of the user portrait; if they are inconsistent, deleting the data synchronized that day and rescheduling the synchronization to the portrait intermediate table;
step four: checking whether the calculated offline tag data is normal; if it is abnormal, sending a mail and an enterprise WeChat message to notify the relevant personnel; if it is normal, starting a scheduling task for aggregating the user tag data and, after the aggregation is finished, synchronizing the tag data and the aggregated tag data and storing them into a plurality of databases of different types.
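Steps three and four amount to a small decision procedure. The following is a minimal sketch of that control flow only, assuming the two checks have already produced their results; the function name and the action names are hypothetical and do not come from the patent.

```python
from typing import List


def plan_daily_run(source_rows: int, synced_rows: int, tags_normal: bool) -> List[str]:
    """Return the ordered actions for one daily run, following steps three and four.

    All action names are illustrative placeholders, not real scheduler task names.
    """
    actions: List[str] = []
    # Step three: compare the source-end count with the portrait intermediate table count.
    if source_rows != synced_rows:
        actions.append("delete_todays_synced_data")
        actions.append("reschedule_sync")
        return actions
    actions.append("compute_offline_tags")
    # Step four: abnormal tag data triggers a notification, normal data goes on to aggregation.
    if not tags_normal:
        actions.append("notify_by_mail_and_enterprise_wechat")
        return actions
    actions.append("aggregate_user_tags")
    actions.append("sync_to_databases")
    return actions
```

For example, `plan_daily_run(100, 90, True)` stops after the delete and reschedule actions, because the counts differ and no tag calculation is attempted.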
In the above embodiment, after the business data, the log data, the buried point data and the third-party data are stored in the DW library, the ODS library and the DM library of the hive data warehouse, the hive data warehouse initiates a table field requirement. The required table fields in the hive data warehouse are selected according to that requirement and synchronized into the portrait intermediate table of the hive data warehouse. Verification is then initiated. If the number of source-end data records is consistent with the number of records synchronized to the portrait intermediate table, the scheduling job for the offline label data is started and the offline label data of the user portrait is calculated. If the numbers are inconsistent, the data synchronized that day is deleted and the synchronization is rescheduled, that is, verification is initiated again according to the different label types. After the scheduling task for the offline label data has started and the user portrait label data has been generated, the calculated offline label data is checked. If it is abnormal, the relevant personnel are notified by mail and enterprise WeChat and scheduling is initiated again, that is, the calculation and its verification are repeated. If it is normal, the scheduling task for aggregating the user label data is started, and the synchronized label data and the aggregated label data are stored in a plurality of databases of different types. In this process, general-purpose labels can be generated automatically according to the requirements of business personnel, the relationships among the label tasks are clear and easy to check, the tasks execute in an orderly way, and when a task has a problem it is rescheduled and a mail notification is sent, ensuring the reliability and stability of the data.
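The delete-and-reschedule loop in the verification step can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the four callables stand in for whatever queries and jobs a deployment actually uses, and the `max_attempts` limit is an assumption, since the text does not say how many times rescheduling may repeat.

```python
from typing import Callable


def sync_until_consistent(
    count_source: Callable[[], int],
    count_synced: Callable[[], int],
    delete_todays_sync: Callable[[], None],
    run_sync: Callable[[], None],
    max_attempts: int = 3,  # assumed retry limit; not specified in the source
) -> bool:
    """Run the sync, verify row counts, and retry after deleting the day's data on mismatch.

    Returns True once the counts agree, False if they still differ after max_attempts.
    """
    for _ in range(max_attempts):
        run_sync()
        if count_source() == count_synced():
            return True
        # Counts differ: discard the day's synchronized data before rescheduling.
        delete_todays_sync()
    return False
```

Bounding the retries keeps a permanently failing source from looping forever; a caller would typically treat a `False` result as a reason to notify someone.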
The method further comprises processing real-time label data through Kafka: the user portrait label data that needs real-time processing is calculated and stored in a portrait database.
Furthermore, the business data, the log data, the buried point data and the third-party data are stored into the corresponding DW library, ODS library and DM library of the hive data warehouse by means of programs, scripts and ogg. After the business data, the log data, the buried point data and the third-party data are stored in the DW library, the ODS library and the DM library, the hive data warehouse initiates a table field requirement. The required table fields in the hive data warehouse are selected according to that requirement and synchronized into the portrait intermediate table of the hive data warehouse.
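Selecting the required fields and synchronizing them into the portrait intermediate table could be expressed as a generated HiveQL statement. The sketch below only builds the statement text. It assumes, hypothetically, that both tables are partitioned by a `dt` day column and that table and field names come from trusted configuration; none of these names appear in the patent.

```python
from typing import Sequence


def build_sync_sql(source_table: str, target_table: str, fields: Sequence[str], day: str) -> str:
    """Build a HiveQL statement that copies the selected fields for one day's partition.

    Assumes a `dt` partition column on both tables (hypothetical) and trusted identifiers.
    """
    if not fields:
        raise ValueError("at least one field is required")
    columns = ", ".join(fields)
    return (
        f"INSERT OVERWRITE TABLE {target_table} PARTITION (dt='{day}') "
        f"SELECT {columns} FROM {source_table} WHERE dt='{day}'"
    )
```

Overwriting a single day's partition makes the sync repeatable, which fits the later rule of deleting the day's data and rescheduling when the counts do not match.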
In the above embodiment, the real-time data and the offline data are of the same kinds and include service data, log data, buried point data, third-party data, and external data. "Buried point" (event tracking) is a term from the field of data acquisition, especially user behavior data acquisition; it refers to the techniques and implementation process for capturing, processing and transmitting specific user behaviors or events. The offline data is processed through the program, script and ogg channels, and the real-time data is processed through the Kafka channel.
Furthermore, if there is a new label requirement, an existing label template is matched directly according to the content of the requirement to generate a scheduling task, and the label data of the user portrait is calculated directly.
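One way to picture template matching is a lookup from a requirement's kind to a parameterized statement. The patent does not describe the template format, so everything below is an assumption made for illustration: the `TagTemplate` class, the `count_in_window` template, and the requirement keys.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TagTemplate:
    kind: str         # hypothetical template category
    sql_pattern: str  # statement with named placeholders


# A single illustrative template; a real system would hold many.
TEMPLATES: Dict[str, TagTemplate] = {
    "count_in_window": TagTemplate(
        kind="count_in_window",
        sql_pattern=(
            "SELECT user_id, COUNT(*) AS {tag} FROM {table} "
            "WHERE dt >= '{start}' GROUP BY user_id"
        ),
    ),
}


def generate_task(requirement: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Match a requirement to a template and return a task description, or None if no template fits."""
    template = TEMPLATES.get(requirement.get("kind", ""))
    if template is None:
        return None
    params = {key: value for key, value in requirement.items() if key != "kind"}
    return {
        "name": f"tag_{requirement['tag']}",
        "sql": template.sql_pattern.format(**params),
    }
```

A requirement such as `{"kind": "count_in_window", "tag": "orders_30d", "table": "dm.portrait_mid", "start": "2021-05-05"}` yields a task named `tag_orders_30d`; a requirement with an unknown kind returns `None`, which is where a person would need to write a new script.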
In step three of the scheme, the offline data is scheduled through the dolphin scheduler platform according to the dependency relationships among the labels. The dolphin scheduler platform is a scheduling system in which the currently executing tasks can be seen clearly in visual form, problems can be located quickly from the logs and handled in time, and the execution time of each task can be checked so that job timing can be optimized later.
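Scheduling by dependency relationship means that a label task runs only after the tasks it depends on. The sketch below shows that ordering idea with Python's standard `graphlib` module (Python 3.9 or later). It does not use or imitate the dolphin scheduler platform's own interface, and the tag names in the example are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def execution_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return tag tasks in an order where every task follows the tasks it depends on.

    `dependencies` maps each task to the set of tasks that must finish first.
    A circular dependency raises graphlib.CycleError (a ValueError subclass).
    """
    return list(TopologicalSorter(dependencies).static_order())
```

For a chain such as `{"vip_level": {"total_spend"}, "total_spend": {"orders"}, "orders": set()}`, the only valid order is `orders`, then `total_spend`, then `vip_level`.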
In step four of the scheme, the synchronized data includes tag status data, the synchronously generated tag data and the synchronously aggregated data, and the synchronization of the tag data is scheduled through the dolphin scheduler platform. The tag status data is recorded in a mysql database; if that day's synchronization is abnormal, the previous day's normal data is displayed instead. The synchronously generated tag data is recorded in a clickhouse database, and the synchronously aggregated data is recorded in an elasticsearch database and an hbase database. The mysql database can display tag metadata; the clickhouse database aggregates large batches of data very quickly, so tag aggregation data can be taken from it; the elasticsearch database is responsible for crowd calculation and analysis; and the hbase database can display the data collected by the user portrait system in a personalized, real-time way.
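The routing of each data category to its store, and the fallback to the previous day's tag status, can be sketched as plain lookups. This is a minimal illustration with no database clients; the category keys and the `sync_ok` field are hypothetical names, while the store assignments follow the paragraph above.

```python
from typing import Dict, List, Optional

# Destination stores per data category, as listed in step four.
DESTINATIONS: Dict[str, List[str]] = {
    "tag_status": ["mysql"],
    "generated_tags": ["clickhouse"],
    "aggregated_tags": ["elasticsearch", "hbase"],
}


def destinations_for(category: str) -> List[str]:
    """Return the stores that receive the given category of synchronized data."""
    return DESTINATIONS[category]


def status_to_display(today: Optional[dict], previous_day: Optional[dict]) -> Optional[dict]:
    """Show today's tag status unless today's sync was abnormal, then fall back to the previous day."""
    if today is not None and today.get("sync_ok"):
        return today
    return previous_day
```

Keeping the fallback in one function makes the display rule explicit: a missing or abnormal record for the day never reaches the user, who sees the last normal data instead.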
The invention has the beneficial effects that:
(1) general-purpose labels can be generated automatically according to the requirements of business personnel;
(2) the relationships among all label tasks are clear and easy to check, so the tasks execute in an orderly way;
(3) the currently executing task can be seen clearly in visual form;
(4) problems can be located quickly from the logs, so that they can be handled in time;
(5) the execution time of each task is easy to check, so that job timing can be optimized later;
(6) when a task has a problem, it is rescheduled and a mail notification is sent, ensuring the reliability and stability of the data.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (9)
1. A method for solving task scheduling facing to a plurality of portrait label jobs is characterized by comprising the following steps:
step one: collecting the data required by the user portrait, including service data, log data, buried point data and third-party data, storing the data into a hive data warehouse, and using the data as the source end of offline data scheduling;
step two: synchronizing fields into the hive data warehouse;
step three: checking the number of source-end data records against the number of records synchronized to the portrait intermediate table; if the two numbers are consistent, starting the scheduling job for the offline label data and calculating the offline label data of the user portrait; if they are inconsistent, deleting the data synchronized that day and rescheduling the synchronization to the portrait intermediate table;
step four: checking whether the calculated offline tag data is normal; if it is abnormal, sending a mail and an enterprise WeChat message to notify the relevant personnel; if it is normal, starting a scheduling task for aggregating the user tag data and, after the aggregation is finished, synchronizing the tag data and the aggregated tag data and storing them into a plurality of databases of different types.
2. The method of claim 1, further comprising Kafka performing real-time tag data processing, calculating user portrait tag data to be processed in real-time, and storing the user portrait tag data in a portrait database.
3. The method as claimed in claim 1, wherein in step one, the service data, the log data, the buried point data and the third-party data are stored in the corresponding DW library, ODS library and DM library of the hive data warehouse by means of programs, scripts and ogg.
4. The method for solving task scheduling for multiple portrait label jobs as recited in claim 3, wherein the hive data warehouse initiates a table field requirement after the business data, log data, buried point data and third-party data are stored in the DW library, the ODS library and the DM library.
5. The method for solving the task of scheduling jobs for multiple portrait tags according to claim 4, wherein required table fields in the hive data warehouse are selected according to requirements and synchronized into the portrait intermediate table of the hive data warehouse.
6. The method as claimed in claim 1, wherein in step three, the offline data is scheduled by dolphin scheduler platform according to the dependency relationship between tags.
7. The method for solving task scheduling oriented to multiple portrait label jobs according to claim 1, wherein in step four, the synchronized data includes tag status data, the synchronously generated label data and the synchronously aggregated data;
the tag status data is recorded into a mysql database, and if that day's synchronization is abnormal, the previous day's normal data is displayed;
the synchronously generated label data is recorded into a clickhouse database;
and the synchronously aggregated data is recorded into an elasticsearch database and an hbase database.
8. The method of claim 7, wherein the synchronized tag data is scheduled by a dolphin scheduler platform.
9. The method of claim 1, further comprising matching existing label templates directly according to the content of the request to generate a scheduling task and calculating the label of the user portrait directly if there is a new label request.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110624290.7A CN113282393A (en) | 2021-06-04 | 2021-06-04 | Method for solving task scheduling facing to multiple portrait label jobs |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113282393A (en) | 2021-08-20 |
Family
ID=77283343
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110624290.7A Pending CN113282393A (en) | 2021-06-04 | 2021-06-04 | Method for solving task scheduling facing to multiple portrait label jobs |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113282393A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114064660A (en) * | 2021-11-29 | 2022-02-18 | 重庆允成互联网科技有限公司 | Data structured analysis method based on ElasticSearch |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107391752A (en) * | 2017-08-16 | 2017-11-24 | 四川长虹电器股份有限公司 | A kind of method based on hadoop platform construction user tag information |
CN110427422A (en) * | 2019-05-23 | 2019-11-08 | 武汉达梦数据库有限公司 | Data consistency verification method, equipment and storage medium when data synchronous abnormality |
CN111475509A (en) * | 2020-04-03 | 2020-07-31 | 李俊宏 | Big data-based user portrait and multidimensional analysis system |
CN111737230A (en) * | 2020-06-23 | 2020-10-02 | 北京奇艺世纪科技有限公司 | Data verification method and device, electronic equipment and readable storage medium |
CN111881221A (en) * | 2020-07-07 | 2020-11-03 | 上海中通吉网络技术有限公司 | Method, device and equipment for customer portrait in logistics service |
Non-Patent Citations (1)
Title |
---|
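Chen Jie: "Design and Implementation of a User Portrait System Based on Big Data Technology" (基于大数据技术的用户画像系统设计与实现), China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology (monthly) * |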
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114064660A (en) * | 2021-11-29 | 2022-02-18 | 重庆允成互联网科技有限公司 | Data structured analysis method based on ElasticSearch |
CN114064660B (en) * | 2021-11-29 | 2022-06-21 | 重庆允成互联网科技有限公司 | Data structured analysis method based on ElasticSearch |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20210820 |