CN112148810B - User portrait analysis system supporting custom labels - Google Patents
- Publication number
- CN112148810B (application CN202011243959.XA)
- Authority
- CN
- China
- Prior art keywords
- data
- user
- analysis
- task
- execution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/283—Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/211—Schema design and management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/252—Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a user portrait analysis system supporting custom labels. The system comprises a source data layer, which is the storage layer for original business data; a data analysis layer, which analyzes and processes the original business data according to the labels/indexes custom-configured by the user and loads the analysis results into a data warehouse and a data application module for data display and application; and a data product end, which is built on the data analysis layer and developed by the user to implement data statistics and display analysis. The invention deeply integrates existing big data technical frameworks and exposes a standard integration API, which reduces the framework selection and integration work during system research and development and makes the system easy to integrate into existing systems. It also provides a variety of data processing components and a flexibly configurable label-based data analysis scheme, so it can be applied quickly to related big data analysis scenarios and can respond quickly to continuously changing data analysis indexes.
Description
Technical Field
The invention relates to a user portrait analysis system supporting custom labels, and belongs to the technical field of data analysis.
Background
Since the internet entered the big data era, user behavior has changed and reshaped the products and services of enterprises. The biggest change is that all user behavior is now traceable and analyzable by the enterprise. Enterprises store large amounts of raw data and business data of all kinds, which are a true record of their business operations; how to use this data more effectively for analysis and evaluation against a background of ever larger data volumes is a problem they face. As big data technology is researched and applied in greater depth, enterprises focus increasingly on how to use big data for fine-grained operation and precision marketing, and fine-grained operation first requires building user portraits.
A user portrait, that is, the labeling of user information, is built by collecting data on dimensions such as users' social attributes, consumption habits and preference characteristics, analyzing and aggregating these characteristics, mining potentially valuable information, and abstracting an overall picture of the user from it. The user portrait can be regarded as the foundation of an enterprise's fine-grained operation: it is a precondition for targeted advertising and personalized recommendation, and it lays the groundwork for data-driven operation. Compared with traditional enterprise reports, user portraits provide more flexible user behavior analysis and more accurate personalized service, and they are an important direction for the practical application of big data.
At present, technologies for big data storage, processing and analysis emerge one after another, and the frameworks in each field are developing in diverse directions, each with its own advantages and characteristics. When an enterprise needs to build services such as user portrait analysis, system developers often face the following problems:
1. many similar frameworks or technologies exist for the same function, and developers must spend time on research, comparison and trial and error;
2. how to integrate multiple frameworks organically into an efficient and accurate system-level overall solution;
3. when other business departments in the enterprise need to build similar platforms, how to reuse and integrate the existing platform quickly.
Disclosure of Invention
To address the problems in the prior art, the invention provides a user portrait analysis system supporting custom labels. Based on the service requirements of user portrait applications, it deeply integrates existing big data technical frameworks and exposes a standard integration API, which reduces the framework selection and integration work during system research and development and makes the system easy to integrate into existing systems. It also provides a variety of data processing components and a flexibly configurable label-based data analysis scheme, so it can be applied quickly to related big data analysis scenarios and can respond quickly to continuously changing data analysis indexes.
To achieve the above purpose, the invention adopts the following technical scheme: a user portrait analysis system supporting custom labels, in which the whole system is divided into three layers:
the source data layer, which is the storage layer for original business data;
the data analysis layer, which analyzes and processes the original business data according to the labels/indexes custom-configured by the user and loads the analysis results into the data warehouse and the data application module for data display and application;
the data product end, which is built on the data analysis layer and developed by the user to implement data statistics and display analysis;
the data analysis layer comprises an ODS storage layer, a data processing module, a data warehouse and a data application module, wherein the ODS storage layer is an isolation layer between the business systems and the data warehouse; it ingests and stores the original data of multiple business systems and provides the data foundation and support for the upper-layer data analysis engine;
the data processing module comprises a tag metadata management module, a task scheduling engine module and an operation state monitoring module, wherein the tag metadata management module manages the definition data that describes each tag in a user portrait; the task scheduling engine module coordinates and schedules the tasks according to a time plan, executes the deployed data analysis programs and provides a Web monitoring and management page;
the data warehouse is implemented on Hive, stores the processed data and classifies it by subject;
the data application module provides data application support for the data product end;
the data in the data warehouse is periodically synchronized to the data application module.
The ODS storage layer is implemented with Hive external partitioned tables, and the partitions are designed along different dimensions according to the specific service requirements.
The tag metadata is a data index system established according to actual service requirements. It comprises statistical indexes, rule indexes, algorithm indexes and machine learning mining indexes. The tag metadata is stored in MySQL, and basic query, add, modify and delete interfaces are exposed externally.
The task scheduling engine module comprises script management, workflow, a scheduler, script plug-ins, a UI and an API. The specific execution flow comprises the following steps:
step one, upload the program packages/files to a designated directory on the server: the user uploads the packaged Jar, sh and SQL files;
step two, enter the execution script: the user enters the script information through the UI, including name, type, version, program package path, resource settings, execution environment parameters and dynamic parameters;
step three, create a workflow: the user draws a DAG workflow through the UI and configures the basic information and execution order of each node;
step four, create a scheduling task: the system automatically generates the scheduling task information from the workflow, including the Job, Trigger and Scheduler;
step five, schedule and execute: the system schedules and executes the tasks according to the generated scheduling information;
step six, store the result: the task scheduling records and the task execution status are saved.
The data warehouse comprises a user attribute subject database, a user behavior subject database, a user consumption subject database, a user preference subject database and a user value subject database. The user attribute subject database covers user gender, age, education level, income level, marital status and family member status; the user behavior subject database covers recent travel frequency and recent shopping frequency; the user consumption subject database covers the recent number of purchases, the recent amount spent and spending power; the user preference subject database covers frequently purchased product categories and eating habits; the user value subject database holds user value information calculated according to an RFM model.
The data application layer uses Elasticsearch, Redis, HBase and relational databases as data stores.
Compared with the prior art, the invention adopts a highly encapsulated, integrated data analysis layer that implements the necessary functions of data acquisition, metadata definition, data processing and data application, and supports extension. The data processing layer uses an independently developed execution engine that supports mainstream data processing frameworks and scripts, such as shell, Spark, MR, Hive, SQL, Java and Python scripts, and allows rapid deployment and go-live through online editing and configuration. Finally, by deeply integrating existing mainstream big data processing technologies and exposing a standard integration API, the invention greatly reduces the framework selection and integration work during system research and development and is easy to integrate into existing systems. The system as a whole provides a variety of data processing components and a flexibly configurable label-based data analysis scheme, so it can be applied quickly to related big data analysis scenarios and can respond quickly to continuously changing data analysis indexes.
Drawings
FIG. 1 is a general architecture diagram of the present invention;
FIG. 2 is a workflow of the task scheduling engine module of the present invention;
FIG. 3 is a schematic representation of the workflow of the present invention;
FIG. 4 is a schematic diagram of the scheduler of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of protection of the invention.
As shown in FIG. 1 to FIG. 4, the user portrait analysis system supporting custom labels provided by the present invention is divided into three layers:
the source data layer is the storage layer for original business data and is managed and maintained by the business systems; examples include member information, order information, commodity information, member access logs, business operation logs and event-tracking data;
the data analysis layer, namely the general data analysis component to which this patent relates, analyzes and processes the original business data according to the labels/indexes custom-configured by the user and loads the analysis results into the data warehouse and the data application module for data display and application;
the data product end is built on the data analysis layer and developed by the user to implement data statistics and display analysis, for example crowd analysis, user portraits, marketing delivery, statistics, and early warning and alarms;
the data analysis layer comprises an ODS storage layer, a data processing module, a data warehouse and a data application module, wherein the ODS storage layer is an isolation layer between the business systems and the data warehouse; it ingests and stores the original data of multiple business systems, such as user information, order details and commodity information, and provides the data foundation and support for the upper-layer data analysis engine;
the ODS storage layer can be implemented with Hive external partitioned tables, with partitions designed along different dimensions according to the specific service requirements, such as date, data source and application; system developers need to create the Hive tables corresponding to the source data according to the table structure and data characteristics of the original data;
the data processing module is the core module of the data analysis layer and comprises a tag metadata management module, a task scheduling engine module and an operation state monitoring module, wherein the tag metadata management module manages the definition data that describes each tag in a user portrait; the task scheduling engine module coordinates and schedules the tasks according to a time plan, executes the deployed data analysis programs and provides a Web monitoring and management page;
the data warehouse is implemented on Hive, stores the processed data and classifies it by subject;
the data application module provides data application support for the data product end;
the data in the data warehouse is periodically synchronized to the data application module.
Using Hive as the ODS storage layer has the following advantages:
1) metadata is managed in a unified way and can be passed directly to frameworks such as Spark and Impala for processing;
2) the SQL-like query language HQL is provided, which is easy to learn and use;
3) storage is based on Hadoop HDFS, which gives strong scalability and computing capability.
The user can load the original data into the Hive tables by existing mature means, such as ETL tools, Sqoop or the Hive LOAD command.
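As an illustration of the ODS design described above, the following minimal sketch generates HiveQL for an external table partitioned by date and data source, plus a statement that registers one partition. The table name, column names, partition keys and HDFS paths are hypothetical examples, not values taken from the patent, and the helper functions are illustrative only.

```python
def build_ods_table_ddl(table, columns, partitions, location):
    """Return HiveQL that creates an external, partitioned ODS table.

    columns and partitions are lists of (name, hive_type) pairs.
    """
    column_sql = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    partition_sql = ", ".join(f"`{name}` {hive_type}" for name, hive_type in partitions)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {column_sql}\n)\n"
        f"PARTITIONED BY ({partition_sql})\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )


def build_add_partition(table, spec, location):
    """Return HiveQL that registers one partition of an external table."""
    spec_sql = ", ".join(f"{key}='{value}'" for key, value in spec.items())
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({spec_sql}) LOCATION '{location}'"
    )


if __name__ == "__main__":
    # Hypothetical order-detail table, partitioned by date and data source.
    print(build_ods_table_ddl(
        "ods.order_detail",
        [("order_id", "STRING"), ("user_id", "STRING"), ("amount", "DOUBLE")],
        [("dt", "STRING"), ("source", "STRING")],
        "/warehouse/ods/order_detail",
    ))
    print(build_add_partition(
        "ods.order_detail",
        {"dt": "2020-11-10", "source": "crm"},
        "/warehouse/ods/order_detail/dt=2020-11-10/source=crm",
    ))
```

Because the table is external, dropping it in Hive leaves the underlying files in place, which fits the role of the ODS layer as an isolation layer over business-system data.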
The tag metadata is a data index system established according to actual service requirements. It comprises statistical indexes, rule indexes, algorithm indexes and machine learning mining indexes, and this index system is collectively referred to as tag metadata. By index composition and application scenario, it can be divided into dimensions such as the user attribute dimension, the user behavior dimension, the user consumption dimension and the user social dimension. The tag metadata is stored in MySQL, and basic query, add, modify and delete interfaces are exposed externally so that developers can build the corresponding management interfaces.
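The query, add, modify and delete interfaces mentioned above could look roughly like the sketch below. It uses Python's built-in sqlite3 as a stand-in for MySQL so that it runs without a server. The table name, the column names and the `TagMetadataStore` class are hypothetical and are not taken from the patent's own table definition; only the four index types come from the text.

```python
import sqlite3

# Hypothetical schema; the four tag_type values mirror the index types in the text.
SCHEMA = """
CREATE TABLE tag_metadata (
    tag_id     INTEGER PRIMARY KEY AUTOINCREMENT,
    tag_name   TEXT NOT NULL UNIQUE,
    tag_type   TEXT NOT NULL
               CHECK (tag_type IN ('statistics', 'rule', 'algorithm', 'ml_mining')),
    dimension  TEXT NOT NULL,
    definition TEXT NOT NULL
)
"""


class TagMetadataStore:
    """Minimal add/query/modify/delete wrapper around the tag table."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(SCHEMA)

    def add(self, name, tag_type, dimension, definition):
        cur = self.conn.execute(
            "INSERT INTO tag_metadata (tag_name, tag_type, dimension, definition) "
            "VALUES (?, ?, ?, ?)",
            (name, tag_type, dimension, definition),
        )
        self.conn.commit()
        return cur.lastrowid

    def get(self, tag_id):
        return self.conn.execute(
            "SELECT tag_name, tag_type, dimension, definition "
            "FROM tag_metadata WHERE tag_id = ?",
            (tag_id,),
        ).fetchone()

    def update_definition(self, tag_id, definition):
        cur = self.conn.execute(
            "UPDATE tag_metadata SET definition = ? WHERE tag_id = ?",
            (definition, tag_id),
        )
        self.conn.commit()
        return cur.rowcount == 1

    def delete(self, tag_id):
        cur = self.conn.execute("DELETE FROM tag_metadata WHERE tag_id = ?", (tag_id,))
        self.conn.commit()
        return cur.rowcount == 1


if __name__ == "__main__":
    store = TagMetadataStore(sqlite3.connect(":memory:"))
    tag_id = store.add("recent_purchase_count", "statistics", "user_consumption",
                       "number of orders in the last 30 days")
    print(store.get(tag_id))
```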
The tag metadata definition table structure is designed as follows:
As shown in FIG. 2, the task scheduling engine module comprises script management, workflow, a scheduler, script plug-ins, a UI and an API. The specific execution flow comprises the following steps:
step one, upload the program packages/files to a designated directory on the server: the user uploads the packaged Jar, sh and SQL files;
step two, enter the execution script: the user enters the script information through the UI, including name, type, version, program package path, resource settings, execution environment parameters and dynamic parameters;
step three, create a workflow: the user draws a DAG workflow through the UI and configures the basic information and execution order of each node;
step four, create a scheduling task: the system automatically generates the scheduling task information from the workflow, including the Job, Trigger and Scheduler;
step five, schedule and execute: the system schedules and executes the tasks according to the generated scheduling information;
step six, store the result: the task scheduling records and the task execution status are saved.
Script management comprises script entry management, a parser and an executor. Script attributes include name, type, version, program package path, resource settings, execution environment parameters, dynamic parameters, and so on. Different types of scripts take different execution parameters; after the user selects the type, the page automatically loads the corresponding execution parameter settings window. The script parser parses the entered script according to its type, merges it, replaces the dynamic parameters, and generates executable data analysis code. The script executor is called by the workflow and the scheduler, and invokes the underlying operating system shell to run the executable code provided by the parser.
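The parser and executor roles described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `${name}` placeholder syntax, the `COMMAND_PREFIX` mapping and the script dictionary layout are hypothetical, and the patent does not specify how dynamic parameters are written or which commands launch each script type.

```python
import subprocess
import sys
from string import Template

# Hypothetical mapping from script type to the command prefix that runs it.
COMMAND_PREFIX = {
    "shell": ["sh", "-c"],
    "hive": ["hive", "-e"],
    "python": [sys.executable, "-c"],
}


def parse_script(script, dynamic_params):
    """Replace dynamic parameters in the script body and build the command line.

    Raises KeyError if the body references a parameter that was not supplied.
    """
    body = Template(script["body"]).substitute(dynamic_params)
    return COMMAND_PREFIX[script["type"]] + [body]


def execute(command):
    """Run the parsed command as an operating-system process."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode, result.stdout


if __name__ == "__main__":
    script = {"name": "daily_stats", "type": "python",
              "body": "print('computing tags for ${dt}')"}
    command = parse_script(script, {"dt": "2020-11-10"})
    print(execute(command))
```

Keeping parsing separate from execution means the same executor can be reused by both the workflow and the scheduler, as the text describes.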
The workflow organizes and controls the execution order of the tasks and provides a retry mechanism for failed tasks. The workflow takes the form of a DAG (Directed Acyclic Graph): the tasks in the workflow are assembled as a directed acyclic graph, and a topological traversal starts from the nodes with zero in-degree and continues until no successor nodes remain. The running state of the tasks can be monitored in real time, and operations such as retry, resuming from a designated node after failure, pause and kill are supported.
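The traversal described above can be sketched with Kahn's algorithm, one standard way to walk a DAG from its zero in-degree nodes; the patent does not name a specific algorithm. The function name `run_workflow`, the `run_task` callback and the `max_retries` parameter are hypothetical, and tasks run sequentially here for simplicity.

```python
from collections import deque


def run_workflow(nodes, edges, run_task, max_retries=1):
    """Execute tasks in topological order, retrying failed tasks.

    nodes: task names; edges: (upstream, downstream) pairs;
    run_task: callable returning True on success.
    Returns the order in which tasks completed.
    """
    nodes = list(nodes)
    indegree = {node: 0 for node in nodes}
    successors = {node: [] for node in nodes}
    for upstream, downstream in edges:
        successors[upstream].append(downstream)
        indegree[downstream] += 1

    # Start from the nodes that have no upstream dependency.
    ready = deque(node for node in nodes if indegree[node] == 0)
    completed = []
    while ready:
        node = ready.popleft()
        for _attempt in range(max_retries + 1):
            if run_task(node):
                break
        else:
            raise RuntimeError(f"task {node!r} failed after {max_retries + 1} attempts")
        completed.append(node)
        # A successor becomes ready once all of its upstream tasks are done.
        for successor in successors[node]:
            indegree[successor] -= 1
            if indegree[successor] == 0:
                ready.append(successor)

    if len(completed) != len(nodes):
        raise ValueError("workflow contains a cycle")
    return completed


if __name__ == "__main__":
    order = run_workflow(
        ["load", "stats", "rules", "export"],
        [("load", "stats"), ("load", "rules"), ("stats", "export"), ("rules", "export")],
        run_task=lambda name: True,
    )
    print(order)
```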
An execution node is the basic component of a workflow. As the carrier of task execution, it provides an abstract definition of a family of executable programs/scripts. Each execution node corresponds to one execution script, and nodes are distinguished by program type, mainly Linux shell, Spark, Hive, SQL, Java, Python, and so on.
The node controller determines the execution order and priority of the nodes and provides node operations such as execute, pause, stop and retry.
The monitor provides basic monitoring data for each execution node, such as execution start time, end time, elapsed time, execution state and error log.
A complete workflow orchestration diagram is shown in FIG. 3.
After the workflow has been arranged, the system automatically generates the corresponding task instances, each of which corresponds to a set of scheduling configuration information. The scheduler mainly comprises the following entities: Scheduler, Job and Trigger; the relationship between them is shown in FIG. 4. The Scheduler is the task scheduling controller: it receives and stores Job and Trigger information and is responsible for firing and executing Triggers. The Scheduler contains two important components, ThreadPool and JobStore: ThreadPool provides the threads that execute Job programs, and JobStore stores Job and Trigger information. A Job is a scheduled task; it is the abstract definition of a task and holds the task's execution logic. One Job may be fired by multiple Triggers. A Trigger fires the corresponding task program according to a time rule, and specifies the firing time and period with a Cron expression. For example: cronscheduled ("0 0/3 9-15.
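The relationship between Scheduler, Job, Trigger, ThreadPool and JobStore can be sketched as below. This is a simplified illustration, not the patent's engine: the `Trigger` here uses a hypothetical "every N minutes within an hour range" rule as a stand-in for a full Cron expression, the `tick` method is called explicitly instead of running on a clock, and all class and field names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from datetime import datetime
from typing import Callable


@dataclass
class Job:
    """Abstract definition of a task: a name plus its execution logic."""
    name: str
    run: Callable[[], object]


@dataclass
class Trigger:
    """Simplified time rule (not a Cron parser): fire every `minute_step`
    minutes while the hour is inside `hours`."""
    job_name: str
    minute_step: int
    hours: range

    def is_due(self, now: datetime) -> bool:
        return now.hour in self.hours and now.minute % self.minute_step == 0


class Scheduler:
    """Holds a job store and a thread pool, and fires due triggers."""

    def __init__(self, workers=4):
        self.job_store = {"jobs": {}, "triggers": []}
        self.thread_pool = ThreadPoolExecutor(max_workers=workers)

    def add_job(self, job):
        self.job_store["jobs"][job.name] = job

    def add_trigger(self, trigger):
        self.job_store["triggers"].append(trigger)

    def tick(self, now):
        """Run every job whose trigger is due at `now`; return the results."""
        futures = [
            self.thread_pool.submit(self.job_store["jobs"][trigger.job_name].run)
            for trigger in self.job_store["triggers"]
            if trigger.is_due(now)
        ]
        return [future.result() for future in futures]

    def close(self):
        self.thread_pool.shutdown()


if __name__ == "__main__":
    scheduler = Scheduler()
    scheduler.add_job(Job("refresh_tags", lambda: "tags refreshed"))
    # Two triggers for the same job: one Job may be fired by multiple Triggers.
    scheduler.add_trigger(Trigger("refresh_tags", minute_step=5, hours=range(8, 18)))
    scheduler.add_trigger(Trigger("refresh_tags", minute_step=30, hours=range(0, 24)))
    print(scheduler.tick(datetime(2020, 11, 10, 9, 30)))
    scheduler.close()
```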
The script plug-ins cover script types such as shell, Spark, MR, Hive, SQL, Java and Python. Each type is integrated into the engine as a plug-in, the engine adapts to every supported script type, and new script types can be added later as extensions.
The UI provides the end operator with visual management of the data processing engine, mainly including tag metadata management, script management, workflow configuration, task scheduling management, running state monitoring and display of data processing results.
The API interface layer exposes a unified RESTful API for external requests. The interfaces cover workflow creation, definition, query, modification, publishing, taking offline, manual start, stop, pause, resume, execution from a designated node, and so on.
Operation state monitoring tracks the execution state of each component and the results and efficiency of data processing while data is being processed, mainly including task scheduling monitoring, data processing process monitoring and data processing engine execution state monitoring.
The data warehouse comprises a user attribute subject database, a user behavior subject database, a user consumption subject database, a user preference subject database and a user value subject database. The user attribute subject database covers user gender, age, education level, income level, marital status and family member status; the user behavior subject database covers recent travel frequency and recent shopping frequency; the user consumption subject database covers the recent number of purchases, the recent amount spent and spending power; the user preference subject database covers frequently purchased product categories and eating habits; the user value subject database holds user value information calculated according to an RFM model.
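As an illustration of the user value subject database, the sketch below computes RFM metrics in the conventional sense of the term (recency, frequency and monetary value) and maps each to a score from 1 to 3. The patent only names the RFM model; the order tuple layout, the function names and every threshold below are hypothetical choices made for the example.

```python
from collections import defaultdict
from datetime import date


def rfm_metrics(orders, today):
    """orders: iterable of (user_id, order_date, amount).

    Returns {user_id: (recency_days, frequency, monetary)}.
    """
    last_order = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for user, day, amount in orders:
        last_order[user] = max(last_order.get(user, day), day)
        frequency[user] += 1
        monetary[user] += amount
    return {
        user: ((today - last_order[user]).days, frequency[user], monetary[user])
        for user in last_order
    }


def score(value, thresholds, reverse=False):
    """Map a value to 1..len(thresholds)+1; reverse when smaller is better."""
    raw = 1 + sum(value >= threshold for threshold in thresholds)
    return len(thresholds) + 2 - raw if reverse else raw


def rfm_scores(metrics):
    """Turn (recency, frequency, monetary) into three 1-3 scores.

    The thresholds are hypothetical and would be tuned per business.
    """
    recency, frequency, monetary = metrics
    return (
        score(recency, [30, 90], reverse=True),  # fewer days since last order is better
        score(frequency, [2, 5]),
        score(monetary, [100.0, 1000.0]),
    )


if __name__ == "__main__":
    orders = [
        ("u1", date(2020, 11, 1), 600.0),
        ("u1", date(2020, 10, 1), 500.0),
        ("u2", date(2020, 1, 1), 20.0),
    ]
    for user, metrics in rfm_metrics(orders, date(2020, 11, 10)).items():
        print(user, metrics, rfm_scores(metrics))
```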
The data warehouse already stores the subject data by category, but it cannot provide convenient and efficient query and analysis capabilities, so a data application layer is needed on top of the data warehouse to provide data application support for the product end. The data application layer uses Elasticsearch, Redis, HBase, relational databases and the like as data stores, and provides the product end with convenient query, analysis and display functions.
In summary, the invention adopts a highly encapsulated, integrated data analysis layer that implements the necessary functions of data acquisition, metadata definition, data processing and data application, and supports extension. The data processing layer uses an independently developed execution engine that supports mainstream data processing frameworks and scripts, such as shell, Spark, MR, Hive, SQL, Java and Python scripts, and allows rapid deployment and go-live through online editing and configuration. Finally, by deeply integrating existing mainstream big data processing technologies and exposing a standard integration API, the invention greatly reduces the framework selection and integration work during system research and development and is easy to integrate into existing systems. The system as a whole provides a variety of data processing components and a flexibly configurable label-based data analysis scheme, so it can be applied quickly to related big data analysis scenarios and can respond quickly to continuously changing data analysis indexes.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the invention may be embodied in other specific forms without departing from its spirit or essential characteristics. Furthermore, it should be understood that although this specification is organized by embodiment, not every embodiment contains only one independent technical solution; the description is written this way for clarity only. Those skilled in the art should treat the specification as a whole, and the technical solutions in the embodiments may be combined as appropriate to form other embodiments that those skilled in the art can understand.
Claims (4)
1. A user portrait analysis system supporting custom labels is characterized in that the whole system is divided into three layers, and comprises:
the source data layer is used for storing a storage layer of original business data;
the data analysis layer is used for analyzing and processing the original business data according to the labels/indexes configured by the user in a self-defining way, and loading analysis results into the data warehouse and the data application module for data display and application;
the data product end is based on a data analysis layer, and is automatically developed by a user to realize data statistics and display analysis;
the data analysis layer comprises an ODS storage layer, a data processing module, a data warehouse and a data application module, wherein the ODS storage layer is an isolation layer formed between a service system and the data warehouse and is used for accessing and storing the original data of a plurality of service systems and providing a data foundation and support for a data analysis engine of an upper layer; the data application layer uses ElasticSearch, redis, hbase and a relational database as data storage;
the data processing module comprises a tag metadata management module, a task scheduling engine module and an operation state monitoring module, wherein the tag metadata management module is used for managing definition data describing each tag in a user portrait; the task scheduling engine module coordinates and schedules each task according to a time plan, executes a deployed data analysis program and provides a Web monitoring and management page;
the task scheduling engine module comprises script management, workflow, a scheduler, a script plug-in, a UI and an API, and the specific execution flow comprises the following steps:
step one, uploading program packages/files to a designated directory on the server: the user uploads the packaged Jar package, sh file and SQL file;
step two, entering an execution script: the user enters script information through the UI, the script information comprising name, type, version, program package path, resource settings, execution environment parameters and dynamic parameters;
step three, creating a workflow: the user draws a DAG workflow through the UI and configures the basic information and execution order of each node;
the workflow is used for organizing and controlling the execution order of the tasks and provides a retry mechanism for failed tasks; the workflow is of the DAG type, in which the tasks are assembled as a directed acyclic graph, and topological traversal starts from the nodes with an in-degree of zero and continues until no successor node remains;
the execution nodes are the basic components of the workflow and serve as the carriers of task execution; they provide an abstract definition for a series of executable programs/scripts, each node corresponds to one execution script, and execution scripts are distinguished by program type;
the node controller determines the execution order and priority of the nodes and provides operations for executing, suspending, stopping and retrying nodes;
step four, creating a scheduling task: the system automatically generates scheduling task information according to the workflow, wherein the scheduling task information comprises Job, trigger and Scheduler;
step five, the system performs task scheduling and execution: the system executes task scheduling according to the generated scheduling information;
step six, storing the result: storing task scheduling records and task execution conditions;
the data warehouse is realized based on Hive, is used for storing processed data and classifies the processed data according to topics;
the data application module is used for providing data application support for the data product end;
the data of the data warehouse are periodically synchronized to the data application module.
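The DAG traversal and retry behaviour recited in claim 1 can be illustrated with a short Python sketch. It uses Kahn's algorithm, which starts from the nodes that have no predecessors and releases each successor once all of its predecessors are done. The function names, the `max_retries` parameter and the node names in the usage note are assumptions for illustration, not the patented implementation.

```python
from collections import deque
from typing import Callable, Dict, List


def topological_order(edges: Dict[str, List[str]]) -> List[str]:
    """Kahn's algorithm: begin with in-degree-zero nodes, then release successors."""
    nodes = set(edges)
    for successors in edges.values():
        nodes.update(successors)

    in_degree = {node: 0 for node in nodes}
    for successors in edges.values():
        for succ in successors:
            in_degree[succ] += 1

    # Sorting only makes the starting order deterministic.
    ready = deque(sorted(n for n in nodes if in_degree[n] == 0))
    order: List[str] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for succ in edges.get(node, []):
            in_degree[succ] -= 1
            if in_degree[succ] == 0:
                ready.append(succ)

    if len(order) != len(nodes):
        raise ValueError("workflow contains a cycle; it is not a DAG")
    return order


def run_workflow(
    edges: Dict[str, List[str]],
    tasks: Dict[str, Callable[[], None]],
    max_retries: int = 2,
) -> List[str]:
    """Run tasks in topological order, retrying a failed task up to max_retries times."""
    completed: List[str] = []
    for node in topological_order(edges):
        for attempt in range(max_retries + 1):
            try:
                tasks[node]()
                break
            except Exception:
                if attempt == max_retries:
                    raise  # retries exhausted: propagate the failure
        completed.append(node)
    return completed
```

For example, `topological_order({"extract": ["clean"], "clean": ["tag", "stats"], "tag": ["load"], "stats": ["load"]})` places `load` last, because it is released only after both `tag` and `stats` have been visited.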
2. The user portrait analysis system supporting custom labels according to claim 1, wherein the ODS storage layer is implemented using Hive external partitioned tables, and the partitions are designed along different dimensions according to specific business requirements.
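To make the notion of a Hive external partitioned table concrete, the sketch below builds a HiveQL `CREATE EXTERNAL TABLE` statement in Python. The function name, the table and column names, the ORC storage format and the location path are all illustrative assumptions; the patent specifies only that external partitioned tables are used.

```python
from typing import Dict


def ods_external_table_ddl(
    table: str,
    columns: Dict[str, str],
    partitions: Dict[str, str],
    location: str,
) -> str:
    """Build HiveQL DDL for an external table partitioned by the given columns."""
    if not partitions:
        raise ValueError("at least one partition column is required")
    cols = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in columns.items())
    parts = ", ".join(f"`{name}` {dtype}" for name, dtype in partitions.items())
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        f"PARTITIONED BY ({parts})\n"
        "STORED AS ORC\n"  # storage format is an assumption
        f"LOCATION '{location}'"
    )


if __name__ == "__main__":
    print(
        ods_external_table_ddl(
            "ods_orders",                                   # hypothetical table
            {"order_id": "BIGINT", "amount": "DOUBLE"},
            {"dt": "STRING"},                               # partition by load date
            "/warehouse/ods/orders",                        # hypothetical path
        )
    )
```

Because the table is external, dropping it in Hive removes only the metadata and leaves the files at the location in place, which suits a layer that holds raw copies of business data.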
3. The user portrait analysis system supporting custom labels according to claim 1, wherein the label metadata is a data index system established according to actual business requirements, comprising statistical indexes, rule-based indexes, algorithm-based indexes and machine learning mining indexes, and the label metadata is stored in MySQL and exposes basic query, addition, modification and deletion interfaces.
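A label metadata store with query, add, modify and delete operations might look like the following Python sketch. It deliberately uses the standard-library `sqlite3` module as a stand-in for MySQL so that the example is self-contained; the table layout, the `TagMetadataStore` class and the class names in `TAG_CLASSES` are assumptions, not details from the patent.

```python
import sqlite3
from typing import Optional, Tuple

# Assumed identifiers for the four index classes named in the claim.
TAG_CLASSES = ("statistics", "rule", "algorithm", "ml_mining")


class TagMetadataStore:
    """Minimal CRUD wrapper; sqlite3 stands in for the MySQL store."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS tag_metadata ("
            "name TEXT PRIMARY KEY, tag_class TEXT NOT NULL, definition TEXT NOT NULL)"
        )

    def add(self, name: str, tag_class: str, definition: str) -> None:
        if tag_class not in TAG_CLASSES:
            raise ValueError(f"unknown tag class: {tag_class}")
        self.conn.execute(
            "INSERT INTO tag_metadata (name, tag_class, definition) VALUES (?, ?, ?)",
            (name, tag_class, definition),
        )
        self.conn.commit()

    def get(self, name: str) -> Optional[Tuple[str, str, str]]:
        cur = self.conn.execute(
            "SELECT name, tag_class, definition FROM tag_metadata WHERE name = ?",
            (name,),
        )
        return cur.fetchone()

    def update(self, name: str, definition: str) -> bool:
        cur = self.conn.execute(
            "UPDATE tag_metadata SET definition = ? WHERE name = ?",
            (definition, name),
        )
        self.conn.commit()
        return cur.rowcount > 0

    def delete(self, name: str) -> bool:
        cur = self.conn.execute("DELETE FROM tag_metadata WHERE name = ?", (name,))
        self.conn.commit()
        return cur.rowcount > 0
```

A caller could create the store with `TagMetadataStore(sqlite3.connect(":memory:"))` and register a label such as a 30-day purchase count under the `"statistics"` class.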
4. The user portrait analysis system supporting custom labels according to claim 1, wherein the data warehouse comprises a user attribute topic library, a user behavior topic library, a user consumption topic library, a user preference topic library and a user value topic library; the user attribute topic library comprises user gender, age, education level, income level, marital status and family member status; the user behavior topic library comprises recent travel frequency and recent shopping frequency; the user consumption topic library comprises the recent number of purchases, the recent consumption amount and the consumption capacity; the user preference topic library comprises the categories of commonly purchased goods and eating habits; and the user value topic library comprises user value information calculated according to an RFM model.
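An RFM model rates a user on recency (how recently they purchased), frequency (how often) and monetary value (how much they spent). The Python sketch below shows one simple way such scores could be computed; the `Order` class, the 1 to 4 scale and every threshold value are assumptions chosen for illustration, since the patent does not state how its RFM values are derived.

```python
from dataclasses import dataclass
from datetime import date
from typing import Sequence, Tuple


@dataclass
class Order:
    """Hypothetical purchase record."""
    day: date
    amount: float


def _score(value: float, thresholds: Sequence[float], higher_is_better: bool = True) -> int:
    """Map a value to a score in 1..len(thresholds)+1 using ascending thresholds."""
    rank = 1 + sum(value >= t for t in thresholds)
    return rank if higher_is_better else len(thresholds) + 2 - rank


def rfm_scores(orders: Sequence[Order], today: date) -> Tuple[int, int, int]:
    """Return (recency, frequency, monetary) scores, each from 1 (low) to 4 (high)."""
    if not orders:
        raise ValueError("at least one order is required")
    recency_days = (today - max(o.day for o in orders)).days
    frequency = len(orders)
    monetary = sum(o.amount for o in orders)
    # All thresholds below are illustrative assumptions.
    r = _score(recency_days, (30, 90, 180), higher_is_better=False)
    f = _score(frequency, (2, 5, 10))
    m = _score(monetary, (100, 500, 2000))
    return r, f, m
```

Recency is scored in reverse because fewer days since the last purchase indicates a more valuable user, whereas higher frequency and spend score higher directly.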
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011243959.XA CN112148810B (en) | 2020-11-10 | 2020-11-10 | User portrait analysis system supporting custom labels |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011243959.XA CN112148810B (en) | 2020-11-10 | 2020-11-10 | User portrait analysis system supporting custom labels |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112148810A (en) | 2020-12-29 |
CN112148810B (en) | 2023-11-28 |
Family
ID=73887287
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011243959.XA Active CN112148810B (en) | 2020-11-10 | 2020-11-10 | User portrait analysis system supporting custom labels |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112148810B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112906907B (en) * | 2021-03-24 | 2024-02-23 | 成都工业学院 | Method and system for layering management and distribution of machine learning pipeline model |
CN113313344B (en) * | 2021-04-13 | 2023-03-31 | 武汉烽火众智数字技术有限责任公司 | Label system construction method and system fusing multiple modes |
CN113239270A (en) * | 2021-05-11 | 2021-08-10 | 浪潮软件股份有限公司 | Method for flexibly configuring and generating PC (personal computer) client label |
CN113139750B (en) * | 2021-05-14 | 2024-04-09 | 中国平安人寿保险股份有限公司 | Course recommendation method and device, server and storage medium |
CN113449017A (en) * | 2021-07-15 | 2021-09-28 | 中数智科技(东莞)有限公司 | Historical behavior data processing method and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016054908A1 (en) * | 2014-10-10 | 2016-04-14 | 中兴通讯股份有限公司 | Internet of things big data platform-based intelligent user profiling method and apparatus |
CN111159276A (en) * | 2018-11-08 | 2020-05-15 | 北京航天长峰科技工业集团有限公司 | Holographic image system construction method based on hybrid storage mode |
Also Published As
Publication number | Publication date |
---|---|
CN112148810A (en) | 2020-12-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||