CN110968596A - Data processing method based on label system

Info

Publication number: CN110968596A
Application number: CN201911211040.XA
Authority: CN
Inventors: 王勇庆
Original assignee: Chaozhou Zhuoshu Big Data Industry Development Co Ltd
Current assignee: Chaozhou Zhuoshu Big Data Industry Development Co Ltd
Priority date: 2019-12-02
Filing date: 2019-12-02
Publication date: 2020-04-07

Abstract

The invention relates to a data processing method based on a label system. The method first establishes a correspondence between the label system and the processing rules, and uses the label system to identify data characteristics and analyze the data in a structured way, so that the processing rules become standardized and general-purpose. Data is then extracted by label item, which enables multi-level analysis and reuse of the data and greatly increases its value.

Description

Data processing method based on label system

Technical Field

The invention relates to the technical field of big data, in particular to a data processing method based on a label system.

Background

Data is a form of expression for facts, concepts, or instructions that can be processed by humans or by automated means. Once data is interpreted and given meaning, it becomes information. Data processing is the collection, storage, retrieval, processing, transformation, and transmission of data.

The basic purpose of data processing is to extract and derive data that is valuable and meaningful to particular users from large volumes of disordered, hard-to-interpret data.

Data processing is a basic link in systems engineering and automatic control, and it runs through every field of social production and daily life. The development of data processing technology, and the breadth and depth of its application, have greatly influenced the progress of human society.

In the big data era, however, with the rapid rise and spread of internet technology, the volume of data collected in every field has grown to an unprecedented scale.

The industry summarizes the characteristics of big data as the four Vs: Volume, Variety, Value, and Velocity.

First, the data volume is huge, having grown from the TB level to the PB level;

second, the data types are varied, including weblogs, videos, pictures, geographical location information, and so on;

third, the value density is low: in continuous video surveillance, for example, the potentially useful data may amount to only one or two seconds;

fourth, the processing speed is fast, which is a fundamental difference from conventional data mining technology.

In summary, the arrival of the big data era has revolutionized the way data is generated, stored, and processed. Nearly every aspect of people's work and life can now be represented digitally, so processing data effectively is becoming more and more important.

In the traditional data processing mode, a data processing engineer carries out a series of processing steps according to business requirements. Because these steps are tightly coupled to the business, the processing rules cannot be standardized and the data results cannot be unified; in other words, the data cannot be reused. The complex, repetitive work makes data processing inefficient, and because the data is not standardized it cannot be shared, which reduces its value.

With the advent of the big data era, data has grown in volume, variety, and value, and data processing has become both more important and more complex. Finding an effective way of processing data that brings out its value and makes it more convenient to use has become an urgent problem.

In view of the above, the present invention provides a data processing method based on a label system.

Disclosure of Invention

To overcome the shortcomings of the prior art, the invention provides a simple and efficient data processing method based on a label system.

The invention is realized by the following technical scheme:

A data processing method based on a label system, characterized in that: first, a correspondence between the label system and the processing rules is established, and the label system is used to identify data characteristics and analyze the data in a structured way, so that the processing rules are standardized and generalized; data is then extracted by label item to achieve multi-level analysis and reuse of the data.

In this data processing method, the original data is first analyzed according to business requirements, and a label system with a label tree structure is created.
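As an illustration only, a label system with a tree structure could be represented as in the following Python sketch. The class name `LabelNode`, its fields, and the sample labels (`customer`, `age_group`, and so on) are hypothetical and are not taken from the invention.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class LabelNode:
    """One label in the tree. Names and fields are illustrative assumptions."""
    name: str
    children: List["LabelNode"] = field(default_factory=list)
    parent: Optional["LabelNode"] = field(default=None, repr=False)

    def add(self, name: str) -> "LabelNode":
        """Create a child label under this label and return it."""
        child = LabelNode(name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> str:
        """Full path from the root, e.g. 'customer/demographics/age_group'."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))

    def walk(self) -> Iterator["LabelNode"]:
        """Depth-first traversal of this label and all of its descendants."""
        yield self
        for child in self.children:
            yield from child.walk()


# Hypothetical label tree for a customer data set.
root = LabelNode("customer")
demographics = root.add("demographics")
age_group = demographics.add("age_group")
behavior = root.add("behavior")
purchase_level = behavior.add("purchase_level")

# Leaf labels are the label items that would carry values.
leaf_paths = [node.path() for node in root.walk() if not node.children]
```

In this sketch a business group would look up the data it needs by label path rather than by re-analyzing the raw data.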

If personnel of a specific business group perform this analysis alone, the result tends to be tied too closely to that business, and the processed data cannot be used generally by other business departments. To avoid this, personnel of the basic data processing group and personnel of the specific business group create the label system together.

Within the label system, a correspondence is established between the data processing rules and the labels. The rules are fine-grained: each label has its own data processing rule, and data processed by a rule is assigned the corresponding label value. When the data is processed further, a later step can rely on the label values produced by an earlier step and use them in its own calculation. Each processing step is independent, and its output can serve as input for further analysis under different rules, so both the processing rules and the data are reused. The data processing rules are kept as simple as possible, and what each rule does matches the meaning expressed by its label.
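The relationship between labels and rules described above can be sketched as follows. This is a minimal Python illustration under assumed names: the labels `age_group` and `discount_eligible`, the thresholds, and the function `apply_rules` are all hypothetical. It shows one small rule per label, with a later rule reusing the label value written by an earlier one.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
Rule = Callable[[Record], Any]


def age_group_rule(record: Record) -> str:
    """Rule for the hypothetical label 'age_group'; thresholds are assumptions."""
    age = record["age"]
    if age < 18:
        return "minor"
    if age < 60:
        return "adult"
    return "senior"


def discount_rule(record: Record) -> bool:
    """Rule for 'discount_eligible'; it reuses the earlier 'age_group' value."""
    return record["age_group"] in ("minor", "senior")


# One rule per label, listed in the order in which they are executed.
RULES: List[Tuple[str, Rule]] = [
    ("age_group", age_group_rule),
    ("discount_eligible", discount_rule),
]


def apply_rules(record: Record, rules: List[Tuple[str, Rule]]) -> Record:
    """Run each rule once and store its result under the rule's label."""
    labeled = dict(record)  # leave the input record unchanged
    for label, rule in rules:
        labeled[label] = rule(labeled)
    return labeled


result = apply_rules({"name": "A", "age": 70}, RULES)
```

Because each rule does one thing and writes one label value, the `age_group` value could be reused by any number of later rules without recomputing it from the raw `age` field.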

After the label system is established, a computation and storage scheme is constructed: the data processing rules are executed, the processing results are stored, and the results are displayed.

So that all executed rules can be monitored and handled uniformly and the workflow is clearer, all processing modes are integrated behind a unified interface. The processing mode is specified when a rule is defined, and the rule is executed through the unified interface.

When the data is regular and the data volume is small, a traditional relational database is used to store the processing results.

When the data volume is large and the processing results must be queried quickly, a non-relational database such as Elasticsearch is used. It can store large volumes of data and query them quickly, and its built-in data processing functions make it easier for operators to explore the data further.
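The two storage cases above can be expressed as a simple selection function. The following Python sketch is an assumption-laden illustration: the text gives no numeric threshold, so `SMALL_VOLUME_ROWS` is a hypothetical value, and the function returns `None` for combinations that the text does not address.

```python
from enum import Enum
from typing import Optional


class Backend(Enum):
    RELATIONAL = "relational"          # traditional relational database
    NON_RELATIONAL = "non_relational"  # e.g. Elasticsearch


# Hypothetical boundary between "small" and "large"; the source gives no number.
SMALL_VOLUME_ROWS = 1_000_000


def choose_backend(row_count: int, is_regular: bool,
                   needs_fast_query: bool) -> Optional[Backend]:
    """Pick a result store for the two cases described in the text."""
    small = row_count <= SMALL_VOLUME_ROWS
    if is_regular and small:
        return Backend.RELATIONAL
    if not small and needs_fast_query:
        return Backend.NON_RELATIONAL
    return None  # case not covered by the text; left to the implementer


small_case = choose_backend(10_000, is_regular=True, needs_fast_query=False)
large_case = choose_backend(50_000_000, is_regular=False, needs_fast_query=True)
```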

The beneficial effects of the invention are as follows: by establishing a correspondence between the label system and the processing rules, the method standardizes and generalizes the processing rules; by extracting data by label item, it achieves multi-level analysis and reuse of the data and greatly increases the value of the data.

Drawings

FIG. 1 is a schematic diagram of a data processing method based on a label system according to the present invention.

Detailed Description

In order to make those skilled in the art better understand the technical solution of the present invention, the technical solution in the embodiment of the present invention will be clearly and completely described below with reference to the embodiment of the present invention. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The data processing method based on the label system first establishes a correspondence between the label system and the processing rules, and uses the label system to identify data characteristics and analyze the data in a structured way, so that the processing rules are standardized and generalized. Data is then extracted by label item to achieve multi-level analysis and reuse of the data.
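Extraction by label item and multi-level analysis might look like the following Python sketch. The records, the label names (`region`, `age_group`, `purchase_level`), and the helper functions `extract` and `count_by` are hypothetical; the sketch assumes the label values have already been assigned by earlier processing rules.

```python
from collections import Counter
from typing import Any, Dict, List

Record = Dict[str, Any]

# Hypothetical records that already carry label values.
labeled: List[Record] = [
    {"id": 1, "region": "north", "age_group": "adult", "purchase_level": "high"},
    {"id": 2, "region": "north", "age_group": "senior", "purchase_level": "low"},
    {"id": 3, "region": "south", "age_group": "adult", "purchase_level": "high"},
    {"id": 4, "region": "south", "age_group": "adult", "purchase_level": "low"},
]


def extract(records: List[Record], **label_values: Any) -> List[Record]:
    """Return the records whose labels match every requested value."""
    return [
        r for r in records
        if all(r.get(label) == value for label, value in label_values.items())
    ]


def count_by(records: List[Record], *labels: str) -> Counter:
    """Count records per combination of the given labels (one level per label)."""
    return Counter(tuple(r[label] for label in labels) for r in records)


# The same extracted subset is reused for analyses at two levels of detail.
adults = extract(labeled, age_group="adult")
by_region = count_by(adults, "region")
by_region_and_level = count_by(adults, "region", "purchase_level")
```

The extracted subset `adults` is computed once and then analyzed at two levels, which is the kind of data reuse the method aims for.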

A well-built label system lets the members of each business group see clearly where the data they need is located, so that they can quickly build their own business processes on top of it. If personnel of a specific business group perform the analysis alone, the result tends to be tied too closely to that business, and the processed data cannot be used generally by other business departments. To avoid this, personnel of the basic data processing group and personnel of the specific business group create the label system together.

A well-constructed label system makes data processing structured and streamlined, and makes it clear and reliable. It reduces the data processing workload and extracts as much value from the data as possible.

Within the label system, a correspondence is established between the data processing rules and the labels. The rules are fine-grained: each label has its own data processing rule, each processing step accomplishes one thing, and data processed by a rule is assigned the corresponding label value. When the data is processed further, the label values from an earlier step can take part in the next calculation. Each processing step is independent, and its output can serve as input for further analysis under different rules, so both the processing rules and the data are reused. This reuse greatly improves the efficiency of data processing.

In addition, the data processing rules should be as simple as possible, and what each rule does should match the meaning expressed by its label. Because the creation of data processing rules is closely tied to the label hierarchy, rule authors should not pack several steps into one rule for convenience; doing so makes the whole label hierarchy confusing.

Because data sources differ, many processing modes are used: for example, traditional data processing, Excel processing, relational database processing, and currently popular big data processing frameworks such as Spark. So that all executed rules can be monitored and handled uniformly and the workflow is clearer, all processing modes are integrated behind a unified interface. The processing mode is specified when a rule is defined, and the rule is executed through the unified interface.
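A unified interface of this kind could be sketched in Python as follows. All names here (`RuleSpec`, `Processor`, `execute`, the mode strings `python` and `sql`) are hypothetical. An in-memory SQLite database stands in for relational database processing; Excel and Spark processors would be further subclasses and are not shown.

```python
import sqlite3
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class RuleSpec:
    label: str       # label that receives the result
    mode: str        # processing mode, declared when the rule is defined
    expression: str  # mode-specific rule definition


class Processor(ABC):
    """Common interface that every processing mode implements."""

    @abstractmethod
    def run(self, rule: RuleSpec, rows: List[Dict[str, Any]]) -> List[Any]:
        ...


class PythonProcessor(Processor):
    """Runs a rule whose expression names a registered Python function."""

    def __init__(self, functions: Dict[str, Callable[[Dict[str, Any]], Any]]):
        self.functions = functions

    def run(self, rule, rows):
        fn = self.functions[rule.expression]
        return [fn(row) for row in rows]


class SqliteProcessor(Processor):
    """Runs a rule whose expression is a SQL expression over the sample columns."""

    def run(self, rule, rows):
        conn = sqlite3.connect(":memory:")
        try:
            conn.execute("CREATE TABLE data (amount REAL, region TEXT)")
            conn.executemany("INSERT INTO data VALUES (:amount, :region)", rows)
            # The expression comes from a trusted rule definition, not user input.
            cursor = conn.execute(
                f"SELECT {rule.expression} FROM data ORDER BY rowid")
            return [r[0] for r in cursor]
        finally:
            conn.close()


PROCESSORS: Dict[str, Processor] = {
    "python": PythonProcessor({"is_large": lambda row: row["amount"] >= 100}),
    "sql": SqliteProcessor(),
}


def execute(rule: RuleSpec, rows: List[Dict[str, Any]]) -> List[Any]:
    """Single entry point: dispatch on the mode recorded in the rule."""
    return PROCESSORS[rule.mode].run(rule, rows)


rows = [{"amount": 50.0, "region": "north"}, {"amount": 250.0, "region": "south"}]
large = execute(RuleSpec("is_large", "python", "is_large"), rows)
codes = execute(RuleSpec("region_code", "sql", "upper(substr(region, 1, 1))"), rows)
```

Because every rule passes through `execute`, logging, timing, or error handling could be added in one place, which is the monitoring benefit the text attributes to the unified interface.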

The storage scheme for processing results can be chosen according to the specific scenario. When the data is regular and the data volume is small, a traditional relational database is used to store the processing results.

Compared with the prior art, the data processing method based on the label system has the following characteristics:

First, with a label system in place, data processing personnel can quickly and clearly locate the data they need and quickly build their own services; it also avoids repeating the analysis and calculation over a large amount of data every time the same batch of data is used.

Second, the data processing rules support diversity: rules that perform mathematical operations, enumeration, regular expression matching, or text analysis on the data can all be applied, and this flexibility widens the range of application of the labels.

Third, the unified rule calculation interface helps technicians monitor and manage the whole processing flow more effectively.
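The four kinds of rule named in the second point can each be written as a small function. The following Python sketch uses hypothetical field names, label names, code tables, and keyword lists; the text analysis rule is a deliberately simple keyword match, standing in for whatever analysis an implementer would choose.

```python
import re
from typing import Any, Callable, Dict

Record = Dict[str, Any]


def order_total(record: Record) -> int:
    """Mathematical operation: unit price times quantity."""
    return record["unit_price"] * record["quantity"]


GENDER_CODES = {"M": "male", "F": "female"}  # hypothetical code table


def gender(record: Record) -> str:
    """Enumeration: map a raw code to one of a fixed set of label values."""
    return GENDER_CODES.get(record["gender_code"], "unknown")


PHONE_PATTERN = re.compile(r"1\d{10}")  # assumed format: 11 digits starting with 1


def valid_phone(record: Record) -> bool:
    """Regular expression: check whether the phone field has the assumed format."""
    return PHONE_PATTERN.fullmatch(record["phone"]) is not None


NEGATIVE_WORDS = {"late", "damaged", "broken"}  # hypothetical keyword list


def sentiment(record: Record) -> str:
    """Text analysis (keyword match only): flag comments with negative words."""
    words = re.findall(r"[a-z]+", record["comment"].lower())
    return "complaint" if any(w in NEGATIVE_WORDS for w in words) else "neutral"


RULES: Dict[str, Callable[[Record], Any]] = {
    "order_total": order_total,
    "gender": gender,
    "valid_phone": valid_phone,
    "sentiment": sentiment,
}

record = {
    "unit_price": 20,
    "quantity": 3,
    "gender_code": "F",
    "phone": "13812345678",
    "comment": "Delivery was late and the box was damaged",
}
labels = {label: rule(record) for label, rule in RULES.items()}
```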

The data processing method based on a label system in the embodiment of the present invention has been described in detail above. The invention has been described with reference to specific examples, which are provided to aid understanding of its core concepts; all other embodiments that those skilled in the art can obtain without departing from the spirit of the present invention shall fall within its scope of protection.

Claims

1. A data processing method based on a label system, characterized in that: first, a correspondence between the label system and the processing rules is established, and the label system is used to identify data characteristics and analyze the data in a structured way, so that the processing rules are standardized and generalized; data is then extracted by label item to achieve multi-level analysis and reuse of the data.

2. The data processing method based on a label system according to claim 1, characterized in that: the original data is first analyzed according to business requirements, and a label system with a label tree structure is created.

3. The data processing method based on a label system according to claim 2, characterized in that: to avoid an analysis that is tied too closely to a specific business, such that the processed data cannot be used generally by other business departments, personnel of the basic data processing group and personnel of the specific business group create the label system together.

4. The data processing method based on a label system according to claim 2 or 3, characterized in that: within the label system, a correspondence is established between the data processing rules and the labels; the rules are fine-grained, each label has its own data processing rule, and data processed by a rule is assigned the corresponding label value; when the data is processed further, a later step can rely on the label values produced by an earlier step and use them in its own calculation; each processing step is independent, and its output can serve as input for further analysis under different rules, so that both the processing rules and the data are reused; and the data processing rules are kept as simple as possible, with what each rule does matching the meaning expressed by its label.

5. The data processing method based on a label system according to claim 4, characterized in that: after the label system is established, a computation and storage scheme is constructed, the data processing rules are executed, the processing results are stored, and the processing results are displayed.

6. The data processing method based on a label system according to claim 5, characterized in that: so that all executed rules can be monitored and handled uniformly and the workflow is clearer, all processing modes are integrated behind a unified interface, the processing mode is specified when a rule is defined, and the rule is executed through the unified interface.

7. The data processing method based on a label system according to claim 5, characterized in that: when the data is regular and the data volume is small, a traditional relational database is used to store the processing results.

8. The data processing method based on a label system according to claim 5, characterized in that: when the data volume is large and the processing results must be queried quickly, a non-relational database is used, which can store large volumes of data and query them quickly, and whose built-in data processing functions make it easier for operators to explore the data further.