CN115544144A - Method and device for processing label data - Google Patents

Publication number
CN115544144A
Authority
CN
China
Prior art keywords
real
label
data
time data
tag
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211259401.XA
Other languages
Chinese (zh)
Other versions
CN115544144B (en)
Inventor
黄景华
叶田田
王文鉴
宋依兰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongdian Jinxin Software Co Ltd
Original Assignee
Zhongdian Jinxin Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongdian Jinxin Software Co Ltd
Priority to CN202211259401.XA (patent CN115544144B)
Priority claimed from CN202211259401.XA (external priority, patent CN115544144B)
Publication of CN115544144A
Application granted
Publication of CN115544144B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/252 Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/211 Schema design and management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24568 Data stream processing; Continuous queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/546 Message passing systems or structures, e.g. queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/54 Indexing scheme relating to G06F9/54
    • G06F2209/548 Queue

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a tag data processing method and device. The method comprises the following steps: receiving real-time data sent by a third-party system through a preset interface; determining the tag cube associated with the real-time data; judging whether the real-time data conforms to the data information indicated by the tag cube associated with the real-time data; if the real-time data conforms to that data information, pushing the real-time data to a streaming component; and, when the real-time data in the streaming component meets a preset trigger condition, reading the real-time data from the streaming component by using an automatic construction engine, constructing the real-time data into the tag cube associated with the real-time data, and generating corresponding real-time tag data. Because the real-time data is received through the preset interface and associated with a preset tag cube, the timeliness of tag generation is improved.

Description

Method and device for processing label data
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method and an apparatus for processing tag data.
Background
In the prior art, a banking system settles and consolidates the business data generated during the day in a batch run, which reduces the workload of staff. When the batch run aggregates the data, tags often need to be constructed on the aggregated data, for example "total transaction amount on a given day".
To construct a tag, a traditional tag portrait platform must complete a complex process of metadata definition, logical model design, physical materialization of the model, tag data loading, aggregation and the like. Tag construction usually requires a T+1 batch run, so construction efficiency is low and timeliness cannot be guaranteed.
Disclosure of Invention
In view of this, an object of the present application is to provide at least a method and an apparatus for processing tag data, which receive real-time data through a preset interface and associate the real-time data with a preset tag cube, so as to improve timeliness of tag generation.
The application mainly comprises the following aspects:
In a first aspect, an embodiment of the present application provides a method for processing tag data, where the method includes: receiving real-time data sent by a third-party system through a preset interface, wherein the real-time data includes a plurality of tag parameters; judging whether the plurality of tag parameters include a tag name, a tag report period, a tag threshold and a customer number; if the plurality of tag parameters include a tag name, a tag report period, a tag threshold and a customer number, determining the tag cube associated with the real-time data; judging whether the real-time data conforms to the data information indicated by the tag cube associated with the real-time data; if the real-time data conforms to that data information, pushing the real-time data to a streaming component; and, when the real-time data in the streaming component meets a preset trigger condition, reading the real-time data from the streaming component by using an automatic construction engine, constructing the real-time data into the tag cube associated with the real-time data, and generating corresponding real-time tag data.
In one possible embodiment, the tag cube with which the real-time data is associated is determined by: judging whether a target original tag corresponding to the tag name indicated by the real-time data exists in the original tag data set; if the target original tag exists in the original tag data set, directly determining the preset tag cube associated with the target original tag as the tag cube associated with the real-time data; and, if the target original tag does not exist in the original tag data set, determining a preset new tag cube as the tag cube associated with the real-time data, wherein the preset new tag cube is used for constructing real-time tag data that does not exist in the original tag data set.
In one possible implementation, the data information indicated by the tag cube includes a target data format and a target data type, and the step of determining whether the real-time data conforms to the data information indicated by the tag cube associated with the real-time data includes: sequentially judging whether the parameter value corresponding to each tag parameter conforms to the target data format indicated by the tag cube associated with the real-time data; if the parameter value corresponding to each tag parameter conforms to the target data format, judging whether the parameter value corresponding to the tag threshold in the real-time data conforms to the target data type indicated by the threshold column in the tag cube associated with the real-time data; if the parameter value corresponding to the tag threshold conforms to the target data type indicated by the threshold column in the tag cube associated with the real-time data, determining that the real-time data conforms to the data information indicated by the tag cube associated with the real-time data; and, if the parameter value corresponding to the tag threshold does not conform to the target data type, determining that the real-time data does not conform to the data information indicated by the tag cube associated with the real-time data.
In one possible implementation, the real-time data is pushed to the streaming component by: pushing the real-time data to a Redis message queue; traversing the Redis message queue at timed intervals, triggered by a timing scheduler preset for the Redis message queue, to acquire the real-time data of a first data set; and pushing the acquired real-time data from the Redis message queue to the streaming component.
In one possible embodiment, the preset trigger condition includes: the Redis message queue has not received real-time data with the same tag name within a preset time range, or the quantity of real-time data with the same tag name pushed to the streaming component reaches a preset threshold.
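The two alternative trigger conditions can be illustrated with a minimal Python sketch. It is not the patented implementation: the TagBatchState structure, the should_trigger function, and the default values of 60 seconds and 1000 records are all assumptions made for illustration, since the source does not specify the time range or the threshold.

```python
from dataclasses import dataclass


@dataclass
class TagBatchState:
    """Hypothetical bookkeeping kept per tag name."""
    last_received_at: float  # when the queue last received data for this tag name (seconds)
    pushed_count: int        # records with this tag name already pushed to the streaming component


def should_trigger(state: TagBatchState, now: float,
                   idle_seconds: float = 60.0,
                   count_threshold: int = 1000) -> bool:
    """Return True when either trigger condition holds.

    Condition 1: no data with the same tag name arrived within the time range.
    Condition 2: the number of pushed records with the same tag name reached the threshold.
    Both default limits are illustrative assumptions.
    """
    idle_long_enough = (now - state.last_received_at) >= idle_seconds
    enough_records = state.pushed_count >= count_threshold
    return idle_long_enough or enough_records
```

Keeping the decision in a pure function of the state and the current time means the same check can be called from a timing scheduler or after each push.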
In a second aspect, the present application further provides a tag data processing method, where the method includes: receiving a batch data table sent by a user; parsing the batch data table to obtain a plurality of groups of non-real-time data, wherein each group of non-real-time data includes a plurality of non-real-time tag parameters; for each group of non-real-time data, judging whether the plurality of non-real-time tag parameters include a tag name, a tag report period, a tag threshold and a customer number; if so, determining the tag cube associated with the non-real-time data and judging whether the non-real-time data conforms to the data information indicated by that tag cube; if the non-real-time data conforms to the data information indicated by the tag cube associated with the non-real-time data, pushing the plurality of groups of non-real-time data to a streaming component; and, for each group of non-real-time data, when the group of non-real-time data in the streaming component meets a preset trigger condition, reading the group of non-real-time data from the streaming component by using an automatic construction engine, constructing the non-real-time data into the tag cube associated with the non-real-time data, and generating corresponding non-real-time tag data.
In one possible embodiment, the batch data table is created by: acquiring a batch data table template in a preset mode, wherein the batch data table template includes a plurality of rows to be edited, and each row to be edited includes a plurality of tag parameters to be edited; for each row to be edited, in response to input operations performed by a user in sequence on each tag parameter to be edited in the row, generating a parameter value corresponding to each tag parameter to be edited, the parameter values corresponding to the plurality of tag parameters to be edited forming the non-real-time data corresponding to that row; and forming the batch data table from the plurality of groups of non-real-time data.
In a third aspect, an embodiment of the present application further provides a device for processing tag data, where the device includes:
a receiving module, used for receiving real-time data sent by a third-party system through a preset interface, wherein the real-time data includes a plurality of tag parameters; a first judging module, used for judging whether the plurality of tag parameters include a tag name, a tag report period, a tag threshold and a customer number; a determining module, used for determining the tag cube associated with the real-time data if the plurality of tag parameters include a tag name, a tag report period, a tag threshold and a customer number; a second judging module, used for judging whether the real-time data conforms to the data information indicated by the tag cube associated with the real-time data; a pushing module, used for pushing the real-time data to a streaming component if the real-time data conforms to the data information indicated by the tag cube associated with the real-time data; and a construction module, used for reading the real-time data from the streaming component by using an automatic construction engine when the real-time data in the streaming component meets a preset trigger condition, constructing the real-time data into the tag cube associated with the real-time data, and generating corresponding real-time tag data.
In a fourth aspect, the present application further provides an electronic device, including a processor, a memory and a bus, wherein the memory stores machine-readable instructions executable by the processor, the processor and the memory communicate through the bus when the electronic device runs, and the machine-readable instructions, when executed by the processor, perform the steps of the tag data processing method according to any one of the above embodiments.
In a fifth aspect, the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to execute the steps of the tag data processing method according to any one of the above embodiments.
The application provides a tag data processing method and device. The method comprises the following steps: receiving real-time data sent by a third-party system through a preset interface; determining the tag cube associated with the real-time data; judging whether the real-time data conforms to the data information indicated by the tag cube associated with the real-time data; if the real-time data conforms to that data information, pushing the real-time data to a streaming component; and, when the real-time data in the streaming component meets a preset trigger condition, reading the real-time data from the streaming component by using an automatic construction engine, constructing the real-time data into the tag cube associated with the real-time data, and generating corresponding real-time tag data. Because the real-time data is received through the preset interface and associated with a preset tag cube, the timeliness of tag generation is improved.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered as limiting its scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.
Fig. 1 is a flowchart illustrating steps of a method for processing tag data according to an embodiment of the present application;
Fig. 2 is a flowchart illustrating steps of a method for determining a tag cube with which real-time data is associated according to an embodiment of the present application;
Fig. 3 is a flowchart illustrating steps of another tag data processing method provided in an embodiment of the present application;
Fig. 4 is a schematic structural diagram illustrating a tag data processing apparatus according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
To make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. It should be understood that the drawings in the present application are for illustrative and descriptive purposes only and are not used to limit the scope of protection of the present application. Further, it should be understood that the schematic drawings are not drawn to scale. The flowcharts used in this application illustrate operations implemented according to some embodiments of the present application. It should be understood that the operations of the flowcharts need not be performed in the order shown, and that steps with no logical dependency on one another may be performed in reverse order or concurrently. One skilled in the art, under the guidance of this application, may add one or more other operations to, or remove one or more operations from, a flowchart.
In addition, the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, as generally described and illustrated in the figures herein, could be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be obtained by a person skilled in the art without making any inventive step based on the embodiments of the present application, fall within the scope of protection of the present application.
A customer tag portrait tags customer information: by mining and analyzing the relevant data of a customer, the customer's behavior and preferences are understood and the customer's characteristics are depicted in multiple dimensions, which helps a bank to target and market to the customer accurately.
In the prior art, a pre-created tag system holds a large number of pre-created original tags that can be called directly by an external system. In actual operation, however, real-time or temporary tags that do not exist in the tag system are often generated. If such a tag has to be created in the system from scratch, complex processes such as metadata definition, logical model design, physical materialization of the model, tag data loading and aggregation must be completed, which greatly reduces the processing efficiency of tag data and affects later calls by subsequent computations.
Based on this, the embodiments of the present application provide a method and a device for processing tag data, which receive real-time data through a preset interface and associate the real-time data with a preset tag cube, so as to improve the timeliness of tag generation. The details are as follows.
Referring to Fig. 1, Fig. 1 is a flowchart illustrating steps of a method for processing tag data according to an embodiment of the present application. As shown in Fig. 1, the method provided in an embodiment of the present application includes the following steps:
S100, receiving real-time data sent by a third-party system through a preset interface.
Specifically, the method of the present application can be applied to an intelligent portrait platform. The third-party system is a system that generates real-time data. The preset interface can be a preset API interface, or the Kafka Client API that Kafka provides for users to operate.
In a specific implementation, each time a real-time service record is generated, it is stored in a corresponding data table. The data table includes a plurality of fields, and each real-time service record includes a field value for each field. Real-time data, which includes a plurality of tag parameters, can be determined from the real-time service records: the third-party system processes a plurality of real-time service records a second time according to preset rules to form the real-time data corresponding to those records. For example, suppose a data table records users' transactions with the 5 fields {transaction type, transaction amount, transaction customer number, transaction channel, transaction date}. The real-time service records in this table can be processed according to preset rules to form a required tag, for example the tag "transaction amount of a certain channel in the last 1 day". The real-time data generated after processing can be {tag name: "transaction amount of a certain channel in the last 1 day", transaction amount: 1 million, transaction date: "10/7/2022", transaction customer number: 2015EX7}. That is, the tag parameters included in the finally generated real-time data are all key-value pairs. In the pair tag name: "transaction amount of a certain channel in the last 1 day", for example, the tag name is the key and "transaction amount of a certain channel in the last 1 day" is the corresponding value.
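The secondary processing described above can be sketched in Python. The source does not state what the preset rule is, so this sketch assumes one plausible rule: sum the transaction amounts per customer for one channel and one date. The function name build_real_time_data and the English field names are hypothetical.

```python
def build_real_time_data(records, channel, date, tag_name):
    """Aggregate real-time service records into real-time data (key-value pairs).

    Assumed rule: total transaction amount per customer for the given
    channel and date. Each record is a dict with the five fields of the
    example data table.
    """
    totals = {}
    for record in records:
        if (record["transaction_channel"] == channel
                and record["transaction_date"] == date):
            customer = record["customer_number"]
            totals[customer] = totals.get(customer, 0) + record["transaction_amount"]

    # One piece of real-time data per customer, every parameter a key-value pair.
    return [
        {
            "tag_name": tag_name,
            "transaction_amount": amount,
            "transaction_date": date,
            "customer_number": customer,
        }
        for customer, amount in totals.items()
    ]
```

With two matching records of 600,000 and 400,000 for customer 2015EX7, the sketch yields a single piece of real-time data with a transaction amount of 1,000,000, which mirrors the example in the text.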
Real-time data refers to data that depends on the real-time behavior of users. It places high demands on system capability and is common in real-time scenario marketing. For example, if a tag is "users who have not logged in for 90 days", the real-time service records that match this tag are screened from the corresponding data table according to users' real-time login behavior, and those records together with the tag name form the corresponding real-time data.
The real-time data is converted into the common data format for tag cube data specified by the intelligent portrait platform, such as JSON, and is then transmitted to the intelligent portrait platform through the preset interface.
In a preferred embodiment, the preset API interface may be defined according to the Open API specification. The preset API interface indicates parameters such as the URL address and the interface name of the intelligent portrait platform, so that a third-party system can transmit real-time data to the intelligent portrait platform by calling the preset API interface.
In another possible implementation, the intelligent portrait platform may obtain real-time data through the Kafka Client API that Kafka provides for users to operate. Kafka is a high-throughput distributed publish-subscribe messaging system: the third-party system sends data to the Kafka messaging middleware through the Kafka Client API, and the intelligent portrait platform reads the real-time data from the Kafka messaging middleware.
S110, judging whether the plurality of tag parameters include a tag name, a tag report period, a tag threshold and a customer number.
After the real-time data is received, it is parsed and split into a plurality of tag parameters, and it is then judged whether the plurality of tag parameters include a tag name, a tag report period, a tag threshold and a customer number. The tag report period is the generation date of the tag, and the customer number indicates the identity of the customer.
S120, if the plurality of tag parameters include a tag name, a tag report period, a tag threshold and a customer number, determining the tag cube associated with the real-time data.
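The check in S110 can be sketched as follows in Python, assuming the real-time data arrives as a JSON object. The English key names and the helper functions parse_tag_parameters and missing_tag_parameters are hypothetical; the source names the four required parameters but not their keys.

```python
import json

# Assumed key names for the four required tag parameters.
REQUIRED_TAG_PARAMETERS = (
    "tag_name",
    "tag_report_period",
    "tag_threshold",
    "customer_number",
)


def parse_tag_parameters(payload: str) -> dict:
    """Parse the received real-time data into its tag parameters."""
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise ValueError("real-time data must be a JSON object of key-value pairs")
    return data


def missing_tag_parameters(tag_parameters: dict) -> list:
    """Return the required tag parameters that are absent, in a fixed order."""
    return [name for name in REQUIRED_TAG_PARAMETERS if name not in tag_parameters]
```

An empty result means the data may proceed to S120; a non-empty result corresponds to the case in which no subsequent steps are performed.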
In a preferred embodiment, referring to fig. 2, fig. 2 is a flowchart illustrating steps of a method for determining a tag cube associated with real-time data according to an embodiment of the present application. As shown in fig. 2, the tag cube with which the real-time data is associated is determined by:
S1201, judging whether a target original tag corresponding to the tag name indicated by the real-time data exists in the original tag data set.
Specifically, the intelligent portrait platform creates a plurality of original tags in advance and places them in the original tag data set. After receiving the real-time data, the intelligent portrait platform determines whether the tag indicated by the real-time data has already been created; that is, it searches the original tag data set for a target original tag corresponding to the tag name in the real-time data.
S1202, if the target original tag exists in the original tag data set, directly determining the preset tag cube associated with the target original tag as the tag cube associated with the real-time data.
In a specific embodiment, the intelligent portrait platform uses a Kylin system to create a plurality of preset tag cubes in advance according to different tag categories. Each preset tag cube is bound to a different tag category in advance, and each original tag is likewise bound or associated to the corresponding preset tag cube according to the tag category to which it belongs.
A data cube is built to meet users' need to query and analyze data from multiple angles and at multiple levels, and is based on a database model of facts and dimensions. Its basic application is OLAP (online analytical processing), a technology commonly used for data analysis and indexing. OLAP builds a multi-dimensional index over the data, so that analyzing data through the data cube greatly speeds up queries.
In the intelligent portrait platform, a tag cube is established based on tag data, and metadata corresponding to the tag cube is generated. The metadata includes the tag cube name, dimension columns, measure columns, a threshold column, the tag category, and the tag information associated with the tag cube. The tag cube is then used to form a mapping between SQL statements and the tag cube metadata and data dictionary. The mapping is stored in a relational database to provide a logical mapping from the application layer to the database layer and to serve as a template for parsing business rules.
S1203, if the target original tag does not exist in the original tag data set, determining a preset new tag cube as the tag cube associated with the real-time data.
In a specific embodiment, the preset new tag cube is used to construct real-time tag data that does not exist in the original tag data set. Only tags that exist in the intelligent portrait platform can be further extended and called later. Therefore, if the original tag data set does not include a target original tag corresponding to the tag name, the intelligent portrait platform has not yet created a tag for that tag name, and the tag must be created on the platform. In the present application, construction of the real-time tag data corresponding to the real-time data can be triggered automatically based on the preset new tag cube.
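Steps S1201 to S1203 amount to a lookup with a fallback, which the following Python sketch illustrates. The dictionaries, the cube names and the function determine_tag_cube are hypothetical stand-ins; the source only states that original tags are bound to preset tag cubes through their tag category.

```python
# Assumed name for the preset new tag cube.
NEW_TAG_CUBE = "preset_new_tag_cube"


def determine_tag_cube(tag_name, original_tags, cube_by_category):
    """Return the name of the tag cube associated with the real-time data.

    original_tags maps each pre-created original tag name to its tag category.
    cube_by_category maps each tag category to its preset tag cube.
    """
    category = original_tags.get(tag_name)      # S1201: does the target original tag exist?
    if category is not None:
        return cube_by_category[category]       # S1202: use the cube bound to its category
    return NEW_TAG_CUBE                         # S1203: fall back to the preset new tag cube
```

For example, with original_tags = {"users not logged in for 90 days": "behavior"} and cube_by_category = {"behavior": "behavior_tag_cube"}, a known tag name resolves to "behavior_tag_cube" and any unknown tag name resolves to the preset new tag cube.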
Returning to Fig. 1, in S121, if any one of the tag name, the tag report period, the tag threshold and the customer number is not included in the plurality of tag parameters, the subsequent steps are not performed on the plurality of tag parameters.
S130, judging whether the real-time data conforms to the data information indicated by the tag cube associated with the real-time data.
The data information indicated by the tag cube includes the target data format of tag cube data specified by the intelligent portrait platform and the target data type of the threshold column.
Step S130 includes:
sequentially judging whether the parameter value corresponding to each label parameter accords with a target data format indicated by a label cube associated with the real-time data;
if the parameter value corresponding to each label parameter accords with the target data format, judging whether the parameter value corresponding to the label threshold in the real-time data accords with the target data type indicated by the threshold column in the label cube associated with the real-time data;
if the parameter value corresponding to the tag threshold value in the real-time data conforms to the target data type indicated by the threshold value column in the tag cube associated with the real-time data, determining that the real-time data conforms to the data information indicated by the tag cube associated with the real-time data;
and if the parameter value corresponding to the tag threshold value in the real-time data does not accord with the target data type, determining that the real-time data does not accord with the data information indicated by the tag cube associated with the real-time data.
Specifically, each preset tag cube and each preset new tag cube defines the data type of its threshold column, so the data type of the tag threshold in any tag data associated with a tag cube must be consistent with the data type defined by that threshold column. If the two are inconsistent, the association cannot be formed. For inconsistent real-time data, the corresponding tag threshold therefore needs to be converted into the target data type before any subsequent processing can be performed.
In a preferred embodiment, the target data format is a generic data format of tag cube data specified by the intelligent tag portrait platform, such as JSON or XML. The generic data format specifies the concrete format of the tag name, the tag report period, the tag threshold, and the customer number. For example, JSON requires that the attribute names of an object be enclosed in double quotation marks, and that an attribute value also be enclosed in double quotation marks if it is a string.
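The JSON quoting rule mentioned above can be checked with Python's standard `json` module, which rejects single-quoted or unquoted attribute names. The helper name `is_valid_json_object` is an assumption for illustration and is not part of the application.

```python
import json


def is_valid_json_object(text: str) -> bool:
    """Return True if text parses as a JSON object under strict JSON rules."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Single-quoted or unquoted attribute names are rejected here.
        return False
    return isinstance(parsed, dict)
```

Under this check, `{"tag_name": "vip"}` is accepted, whereas `{'tag_name': 'vip'}` and `{tag_name: "vip"}` are rejected.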
Target data types include, but are not limited to, integer, numeric, and string.
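The two-stage judgment of S130 (format first, then the data type of the threshold) can be sketched as follows. The type labels, the `tag_threshold` key, and the caller-supplied `format_ok` predicate are all assumptions; the application does not define how the target data format is tested.

```python
from typing import Any, Callable, Dict, Mapping

# Hypothetical mapping from a cube's threshold-column type to a Python check.
# bool is excluded because it is a subclass of int in Python.
TYPE_CHECKS: Dict[str, Callable[[Any], bool]] = {
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "numeric": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
}


def conforms_to_cube(
    params: Mapping[str, Any],
    threshold_type: str,
    format_ok: Callable[[Any], bool],
) -> bool:
    """Apply the format check to every value, then type-check the threshold."""
    # Stage 1: every parameter value must satisfy the target data format.
    if not all(format_ok(value) for value in params.values()):
        return False
    # Stage 2: the threshold must match the cube's threshold-column type.
    return TYPE_CHECKS[threshold_type](params["tag_threshold"])
```

The second stage runs only when the first succeeds, mirroring the order of the sub-steps listed for S130.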
S140, if the real-time data conforms to the data information indicated by the tag cube associated with the real-time data, pushing the real-time data to the streaming component.
The method further includes S141: if the real-time data does not conform to the data information indicated by the tag cube associated with the real-time data, no subsequent processing is performed on the real-time data.
In the present application, the tag cube is created and managed by the kylin system, so that real-time data needs to be pushed to the kylin system for further processing.
In another preferred embodiment, the real-time data is pushed to the streaming component by:
pushing the real-time data to a redis message queue; periodically triggering, by a preset timing scheduler for the redis message queue, a traversal of the redis message queue to obtain the real-time data; and pushing the obtained real-time data to the streaming component through the redis message queue.
In a specific embodiment, the redis message queue can deduplicate real-time data through the timing-triggered traversal. Because of possible problems such as network errors, the same real-time data may be sent to the redis message queue repeatedly. The redis message queue performs deduplication by combining the timing-triggered traversal with the preset trigger condition, so setting the preset trigger condition avoids the problem of repeated data transmission.
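The timed traversal with deduplication can be illustrated with an in-memory stand-in. This sketch does not use a real redis client; `QueueStandIn`, `scheduler_tick`, and the choice to treat two records as duplicates when all their key-value pairs match are assumptions made for illustration.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple

Record = Dict[str, str]


class QueueStandIn:
    """In-memory stand-in for the redis message queue (an assumption)."""

    def __init__(self) -> None:
        self._items: Deque[Record] = deque()

    def push(self, record: Record) -> None:
        self._items.append(record)

    def drain_deduplicated(self) -> List[Record]:
        """Traverse the queue once, dropping records repeated in this pass."""
        seen: Set[Tuple[Tuple[str, str], ...]] = set()
        unique: List[Record] = []
        while self._items:
            record = self._items.popleft()
            key = tuple(sorted(record.items()))
            if key not in seen:
                seen.add(key)
                unique.append(record)
        return unique


def scheduler_tick(queue: QueueStandIn, streaming_component: List[Record]) -> int:
    """One firing of the timing scheduler: forward the unique records."""
    batch = queue.drain_deduplicated()
    streaming_component.extend(batch)
    return len(batch)
```

In this sketch, deduplication only covers records that arrive between two scheduler firings; a real deployment would need to decide how long duplicates should be remembered.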
S150, when the real-time data in the streaming component meets the preset trigger condition, reading the real-time data from the streaming component by using the automatic construction engine, constructing the real-time data into the tag cube associated with the real-time data, and generating corresponding real-time tag data.
The streaming component may be provided by Kafka.
In a preferred embodiment, the preset trigger condition includes:
the redis message queue does not receive the real-time data with the same label name in a preset time range, or the quantity of the real-time data with the same label name pushed to the streaming component meets a preset threshold value.
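The two alternative trigger conditions can be expressed as a single predicate. The function name, the parameter names, and the reading of "meets a preset threshold" as "greater than or equal to" are assumptions; the application does not state the comparison used.

```python
def should_trigger(
    now: float,
    last_received_at: float,
    pushed_count: int,
    quiet_seconds: float,
    count_threshold: int,
) -> bool:
    """Return True if either preset trigger condition holds for a tag name.

    now and last_received_at are timestamps in seconds; last_received_at is
    when data with this tag name last reached the queue.
    """
    # Condition 1: no data with this tag name for the preset time range.
    quiet = (now - last_received_at) >= quiet_seconds
    # Condition 2: enough data with this tag name already pushed downstream.
    enough = pushed_count >= count_threshold
    return quiet or enough
```

Passing timestamps in explicitly, rather than reading the clock inside the function, keeps the predicate easy to test.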
In another preferred embodiment, a plurality of tag cubes are created in advance by means of the kylin system, and each tag cube corresponds to one HIVE. The real-time data is encapsulated into JSON in the form of key-value pairs and transmitted to the streaming component; each piece of real-time data is composed of a plurality of tag parameters, and each tag parameter takes the form of a key-value pair. After the real-time data is read from the streaming component, the key-value pairs in it are identified by automatic parsing, a corresponding virtual table is generated automatically, and the virtual table is stored in the corresponding HIVE; that is, the real-time data read from the streaming component is written into the HDFS.
A join operation is then performed in the HIVE on the virtual table and the corresponding dimension table by using HiveQL, so that the currently received real-time data is constructed into the tag cube associated with the real-time data, forming the real-time tag data under the corresponding tag cube.
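The parse-then-join flow can be mimicked in memory to show its shape. This is not HiveQL and does not touch HIVE or HDFS; the functions `build_virtual_table` and `join_with_dimension`, the `customer_number` join key, and the inner-join semantics are assumptions chosen for illustration.

```python
import json
from typing import Any, Dict, List

Row = Dict[str, Any]


def build_virtual_table(messages: List[str]) -> List[Row]:
    """Parse each JSON message into one row of key-value pairs."""
    return [json.loads(message) for message in messages]


def join_with_dimension(
    virtual_table: List[Row], dimension_table: List[Row], key: str
) -> List[Row]:
    """Inner-join the virtual table with a dimension table on key."""
    index = {row[key]: row for row in dimension_table}
    joined: List[Row] = []
    for row in virtual_table:
        match = index.get(row[key])
        if match is not None:
            # Columns from the real-time row override dimension columns.
            joined.append({**match, **row})
    return joined
```

Rows whose key has no counterpart in the dimension table are dropped here; whether the actual join keeps such rows is not stated in the application.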
Referring to fig. 3, fig. 3 is a flowchart illustrating steps of another tag data processing method according to an embodiment of the present application. As shown in fig. 3, the method includes:
S210, receiving the batch data table sent by the user.
In a preferred embodiment, the batch data table is created by:
obtaining a batch data table template in a preset manner, where the batch data table template includes a plurality of rows to be edited and each row to be edited includes a plurality of tag parameters to be edited; for each row to be edited, in response to input operations performed by the user in sequence on each tag parameter to be edited in that row, generating a parameter value corresponding to each tag parameter to be edited, the parameter values of the plurality of tag parameters to be edited forming the non-real-time data corresponding to that row; and forming the batch data table from the plurality of groups of non-real-time data.
The non-real-time data refers to data whose resulting tag is a non-real-time tag. For example, a non-real-time tag may be age, gender, previously purchased products, or the account-opening channel; such tags belong to the basic attribute information of the user and do not depend on the user's real-time behavior.
Specifically, an input rule for each tag parameter in the batch data table template can be formulated according to the unified generic format of the tag cube, so that the user performs input operations according to the corresponding rule. The batch data table is therefore well suited to the batch transmission of non-real-time data, meets user needs, and offers high flexibility.
S220, parsing the batch data table to obtain multiple groups of non-real-time data.
Each group of non-real-time data includes a plurality of non-real-time tag parameters.
S230, for each group of non-real-time data, judging whether the plurality of non-real-time tag parameters include a tag name, a tag report period, a tag threshold, and a customer number.
S240, for each group of non-real-time data, if the plurality of non-real-time tag parameters include a tag name, a tag report period, a tag threshold, and a customer number, determining the tag cube associated with the non-real-time data.
S241, for each group of non-real-time data, if any one of the tag name, the tag report period, the tag threshold, and the customer number is not included in the plurality of non-real-time tag parameters, no subsequent steps are performed on the plurality of non-real-time tag parameters.
S250, for each group of non-real-time data, judging whether the non-real-time tag data conforms to the data information indicated by the tag cube associated with the non-real-time data.
S260, for each group of non-real-time data, if the non-real-time tag data conforms to the data information indicated by the tag cube associated with the non-real-time tag data, pushing the plurality of groups of non-real-time data to the streaming component.
S261, for each group of non-real-time data, if the non-real-time tag data does not conform to the data information indicated by the tag cube associated with the non-real-time tag data, no subsequent steps are performed on the non-real-time tag data.
S270, for each group of non-real-time data, when the group of non-real-time data in the streaming component meets a preset trigger condition, reading the group of non-real-time data from the streaming component by using the automatic construction engine, constructing the non-real-time data into the tag cube associated with the non-real-time data, and generating corresponding non-real-time tag data.
The processing in steps S230 to S270 is similar to the processing of real-time data and is not described again here.
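Parsing the batch data table into groups of non-real-time data (step S220) might be sketched as below. The application does not state the file format of the batch data table; CSV with a header row, the function name `parse_batch_table`, and the column names are assumptions made only for this illustration.

```python
import csv
import io
from typing import Dict, List


def parse_batch_table(table_text: str) -> List[Dict[str, str]]:
    """Turn each data row into one group of non-real-time tag parameters.

    Assumes CSV text whose first row holds the tag parameter names.
    """
    reader = csv.DictReader(io.StringIO(table_text))
    return [dict(row) for row in reader]
```

Each returned dictionary can then go through the same completeness and conformity checks as a piece of real-time data.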
Based on the same application concept, an embodiment of the present application also provides a real-time tag processing apparatus corresponding to the real-time tag processing method provided in the foregoing embodiments. Because the apparatus solves the problem on a principle similar to that of the method, the implementation of the apparatus may refer to the implementation of the method, and repeated details are omitted.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a tag data processing apparatus according to an embodiment of the present application, and as shown in fig. 4, the apparatus includes:
the receiving module 300 is configured to receive real-time data sent by a third-party system through a preset interface, where the real-time data includes a plurality of tag parameters;
the first judging module 310 is configured to judge whether the plurality of tag parameters include a tag name, a tag report period, a tag threshold, and a customer number;
the determining module 320 is configured to determine the tag cube associated with the real-time data if the plurality of tag parameters include a tag name, a tag report period, a tag threshold, and a customer number;
the second judging module 330 is configured to judge whether the real-time data conforms to the data information indicated by the tag cube associated with the real-time data;
the pushing module 340 is configured to push the real-time data to the streaming component if the real-time data conforms to the data information indicated by the tag cube associated with the real-time data;
the constructing module 350 is configured to, when the real-time data in the streaming component meets a preset trigger condition, read the real-time data from the streaming component by using the automatic construction engine, construct the real-time data into the tag cube associated with the real-time data, and generate corresponding real-time tag data.
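How modules 310 to 340 hand a record to one another can be sketched as a small class with early exits. The class name `TagDataProcessor` and the injected callables are hypothetical stand-ins; the application describes the modules functionally and does not prescribe this structure. The constructing module 350 is left out because it acts later, on data already in the streaming component.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


class TagDataProcessor:
    """Chains stand-ins for modules 310 to 340 (all callables hypothetical)."""

    def __init__(
        self,
        has_required: Callable[[Record], bool],
        determine_cube: Callable[[Record], str],
        conforms: Callable[[Record, str], bool],
        stream: List[Record],
    ) -> None:
        self.has_required = has_required
        self.determine_cube = determine_cube
        self.conforms = conforms
        self.stream = stream

    def receive(self, record: Record) -> bool:
        """Return True if the record was pushed to the streaming component."""
        if not self.has_required(record):    # first judging module 310
            return False
        cube = self.determine_cube(record)   # determining module 320
        if not self.conforms(record, cube):  # second judging module 330
            return False
        self.stream.append(record)           # pushing module 340
        return True
```

Injecting the checks as callables keeps each module replaceable, which matches the statement that the division into units is only one logical division.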
Based on the same application concept, please refer to fig. 5, fig. 5 illustrates a schematic structural diagram of an electronic device according to an embodiment of the present application, where the electronic device 400 includes: a processor 410, a memory 420 and a bus 430, wherein the memory 420 stores machine-readable instructions executable by the processor 410, the processor 410 and the memory 420 communicate via the bus 430 when the electronic device 400 is running, and the machine-readable instructions are executed by the processor 410 to perform the steps of the method for processing tag data according to any of the above embodiments.
Based on the same application concept, embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the steps of the method for processing tag data provided in the foregoing embodiments are performed.
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working process of the system and the apparatus described above may refer to the corresponding process in the foregoing method embodiment, and details are not described herein again. In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above are only specific embodiments of the present application, but the scope of protection of the present application is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed in the present application shall fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the scope of protection of the claims.

Claims (10)

1. A method for processing tag data, the method comprising:
receiving real-time data sent by a third-party system through a preset interface, wherein the real-time data comprises a plurality of label parameters;
judging whether the label parameters comprise a label name, a label report period, a label threshold value and a customer number;
if the label parameters comprise a label name, a label report period, a label threshold value and a customer number, determining a label cube associated with the real-time data;
judging whether the real-time data conforms to data information indicated by a tag cube associated with the real-time data;
if the real-time data accords with the data information indicated by the label cube associated with the real-time data, pushing the real-time data to the streaming component;
and when the real-time data in the streaming assembly meets a preset triggering condition, reading the real-time data from the streaming assembly by using an automatic construction engine, constructing the real-time data into a tag cube associated with the real-time data, and generating corresponding real-time tag data.
2. The method of claim 1, wherein the tag cube with which the real-time data is associated is determined by:
judging whether a target original label corresponding to the label name indicated by the real-time data exists in the original label data set;
if the target original label exists in the original label data set, directly determining a preset label cube associated with the target original label as a label cube associated with real-time data;
and if the target original label does not exist in the original label data set, determining a preset new label cube as a label cube associated with the real-time data, wherein the preset new label cube is used for constructing the real-time label data which does not exist in the original label data set.
3. The method of claim 1, wherein the data information indicated by the tag cube includes a target data format and a target data type,
the step of judging whether the real-time data conforms to the data information indicated by the tag cube associated with the real-time data comprises the following steps:
sequentially judging whether the parameter values corresponding to each label parameter all accord with the target data format associated with the real-time data;
if the parameter value corresponding to each label parameter accords with the target data format, judging whether the parameter value corresponding to the label threshold value in the real-time data accords with the target data type indicated by the threshold value column in the label cube associated with the real-time data;
if the parameter value corresponding to the label threshold value in the real-time data conforms to the target data type, the real-time data conforms to the data information indicated by the label cube associated with the real-time data;
and if the parameter value corresponding to the tag threshold value in the real-time data does not accord with the target data type, determining that the real-time data does not accord with the data information indicated by the tag cube associated with the real-time data.
4. The method of claim 1, wherein the real-time data is pushed to the streaming component by:
pushing the real-time data to a redis message queue;
through a preset timing scheduler aiming at a redis message queue, traversing the redis message queue by timing trigger to acquire real-time data of a first data set;
and pushing the acquired real-time data to the streaming component through the redis message queue.
5. The method of claim 4, wherein the preset trigger condition comprises:
the redis message queue does not receive real-time data having the same tag name within a preset time range,
or the quantity of the real-time data with the same label name pushed to the streaming component meets a preset threshold value.
6. A method for processing tag data, the method comprising:
receiving a batch data table sent by a user;
analyzing the batch data table to obtain a plurality of groups of non-real-time data, wherein each group of non-real-time data comprises a plurality of non-real-time label parameters;
judging whether the plurality of non-real-time label parameters contain label names, label report periods, label threshold values and customer numbers or not for each group of non-real-time data, if the plurality of non-real-time label parameters contain the label names, the label report periods, the label threshold values and the customer numbers, determining label cubes associated with the non-real-time data, judging whether the non-real-time label data accord with data information indicated by the label cubes associated with the non-real-time data or not, and if the non-real-time label data accord with the data information indicated by the label cubes associated with the non-real-time data, pushing the plurality of groups of non-real-time data to a streaming component;
and for each group of non-real-time data, when the group of non-real-time data in the streaming assembly meets a preset trigger condition, reading the group of non-real-time data from the streaming assembly by using an automatic construction engine, constructing the non-real-time data into a tag cube associated with the non-real-time data, and generating corresponding non-real-time tag data.
7. The method of claim 6, wherein the bulk data table is created by:
acquiring a batch data table template in a preset mode, wherein the batch data table template comprises a plurality of lines to be edited, and each line to be edited comprises a plurality of label parameters to be edited;
for each row to be edited, responding to the input operation sequentially executed by a user for each label parameter to be edited in the row to be edited, generating a parameter value corresponding to each label parameter to be edited, and forming non-real-time data corresponding to the row to be edited by the parameter values corresponding to the plurality of label parameters to be edited;
forming the bulk data table from the plurality of sets of non-real time data.
8. An apparatus for processing tag data, the apparatus comprising:
the receiving module is used for receiving real-time data sent by a third-party system through a preset interface, and the real-time data comprises a plurality of label parameters;
the first judging module is used for judging whether the label parameters comprise a label name, a label report period, a label threshold value and a customer number;
the determining module is used for determining a label cube associated with the real-time data if the label parameters comprise a label name, a label report period, a label threshold value and a customer number;
the second judgment module is used for judging whether the real-time data accords with the data information indicated by the label cube associated with the real-time data;
the pushing module is used for pushing the real-time data to the streaming component if the real-time data conforms to the data information indicated by the label cube associated with the real-time data;
and the construction module is used for reading the real-time data from the streaming assembly by using the automatic construction engine when the real-time data in the streaming assembly meets the preset triggering condition, constructing the real-time data into a tag cube associated with the real-time data, and generating the corresponding real-time tag data.
9. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is operated, the machine-readable instructions being executable by the processor to perform the steps of the method for processing tag data according to any one of claims 1 to 5 or the method for processing tag data according to any one of claims 6 to 7.
10. A computer-readable storage medium, characterized in that a computer program is stored thereon, which, when being executed by a processor, performs the steps of the method for processing tag data according to any one of claims 1 to 5 or the method for processing tag data according to any one of claims 6 to 7.
CN202211259401.XA 2022-10-14 Label data processing method and device Active CN115544144B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211259401.XA CN115544144B (en) 2022-10-14 Label data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211259401.XA CN115544144B (en) 2022-10-14 Label data processing method and device

Publications (2)

Publication Number Publication Date
CN115544144A true CN115544144A (en) 2022-12-30
CN115544144B CN115544144B (en) 2024-05-31

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019237541A1 (en) * 2018-06-11 2019-12-19 平安科技(深圳)有限公司 Method and apparatus for determining contact label, and terminal device and medium
CN111126880A (en) * 2020-01-02 2020-05-08 浙江吉利新能源商用车集团有限公司 User portrait generation method, device and equipment
CN112269805A (en) * 2020-11-18 2021-01-26 杭州米雅信息科技有限公司 Data processing method, device, equipment and medium
CN112685448A (en) * 2020-12-25 2021-04-20 中国平安人寿保险股份有限公司 Real-time label automatic generation method and device and storage medium
CN114595943A (en) * 2022-02-14 2022-06-07 烟台杰瑞石油服务集团股份有限公司 Mechanical equipment portrait generation method
CA3148075A1 (en) * 2021-02-08 2022-08-08 10353744 Canada Ltd. Real-time stream data processing method, device, computer apparatus, and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
姜红玉 等: ""基于流式计算的实时用户画像系统研究"", 《计算机技术与发展》, vol. 30, no. 7, 10 July 2020 (2020-07-10), pages 193 - 200 *
朱振华;于晓昀;李超;: "基于公安大数据的人员背景标签应用分析与研究", 电脑知识与技术, no. 21, 25 July 2018 (2018-07-25), pages 34 - 36 *
韦智勇;: "面向推荐系统的用户行为记录数据实时预处理研究与实现", 企业科技与发展, no. 08, 10 August 2018 (2018-08-10), pages 94 - 97 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant