CN116414827A

CN116414827A - Terminal data acquisition, analysis and management method based on intelligent library

Info

Publication number: CN116414827A
Application number: CN202111673102.6A
Authority: CN
Inventors: 杨海花
Original assignee: Youjia Nanjing Software Technology Co ltd
Current assignee: Youjia Nanjing Software Technology Co ltd
Priority date: 2021-12-31
Filing date: 2021-12-31
Publication date: 2023-07-11

Abstract

The invention provides a terminal data acquisition, analysis and management method based on an intelligent library. Data are acquired by both physical terminals and virtual terminals, so that a wider variety of data can be captured. Data from multiple devices in the same domain are merged by combining duplicate and approximately duplicate data, and obviously erroneous data are removed. The remaining data are marked with keywords to form unit management databases, which are then analyzed and managed according to later requirements.

Description

Terminal data acquisition, analysis and management method based on intelligent library

Technical Field

The invention relates to the technical field of computer science, in particular to a terminal data acquisition, analysis and management method based on an intelligent library.

Background

At present, computer science and technology are developing rapidly. Databases are at the core of this field, and the information reflected by data is key to human development and progress: different data are processed and analyzed by computer systems, the required results are output, and those results are applied to production and daily life, continuously advancing science and technology. To make full use of data, a computer must collect different kinds of data information, analyze and compare them, and, through continuous understanding, comparison, analysis and learning, extract the useful information people need from massive amounts of data. Throughout this process of understanding, collecting, comparing and analyzing massive data, data collection is the most important part, because only full and comprehensive collection allows people to understand more of the available information.

Many data collection technologies already exist on the market, but their collection modes are simple. The prior art only collects data without processing it, so the collected data remain raw, the collection does not reflect whether the collected data are associated with each other, and problems such as duplicate, outdated and redundant data arise. As a result, when massive data are compared and analyzed, a great deal of time must be spent removing such data, and efficiency is low. There is therefore a need for a data collection method that improves on the prior art, raising collection efficiency and making the collected data easier to manage and to analyze comprehensively.

Disclosure of Invention

To solve the above technical problems, the invention provides a terminal data acquisition, analysis and management method based on an intelligent library. Data are acquired by physical terminals and virtual terminals; data from multiple devices in the same domain are merged by combining duplicate and approximately duplicate data; and the data are then analyzed and managed according to later requirements, which greatly improves the efficiency of analysis and management.

A terminal data acquisition, analysis and management method based on an intelligent library is characterized in that it comprises the following steps:

1) Data are acquired with physical data acquisition terminals and virtual data acquisition terminals, and the acquired data are consolidated in the intelligent library;

the virtual data acquisition covers stored data and real-time network data;

the physical data acquisition terminals include communication access terminals, remote acquisition devices and handheld data acquisition devices;
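Step 1 amounts to funnelling records from heterogeneous terminals into one store. A minimal Python sketch, assuming every record is tagged with the kind of terminal that produced it and with a domain label used later for grouping, might look as follows. `TerminalKind`, `Record` and `IntelligentLibrary` are hypothetical names, and an in-memory list stands in for whatever storage the intelligent library actually uses.

```python
from dataclasses import dataclass, field
from enum import Enum


class TerminalKind(Enum):
    # Communication access terminals, remote acquisition devices, handheld devices.
    PHYSICAL = "physical"
    # Stored data and real-time network data.
    VIRTUAL = "virtual"


@dataclass
class Record:
    source: str          # hypothetical identifier of the acquiring terminal
    kind: TerminalKind
    domain: str          # application domain, used later for grouping
    payload: dict


@dataclass
class IntelligentLibrary:
    """In-memory stand-in for the central store that all terminals feed."""
    records: list = field(default_factory=list)

    def ingest(self, batch):
        # Concentrate records from any terminal into the same store.
        self.records.extend(batch)
        return len(self.records)

    def by_kind(self, kind):
        return [r for r in self.records if r.kind is kind]


library = IntelligentLibrary()
library.ingest([
    Record("handheld-01", TerminalKind.PHYSICAL, "temperature", {"value": 21.5}),
    Record("archive-db", TerminalKind.VIRTUAL, "temperature", {"value": 21.7}),
])
print([r.source for r in library.by_kind(TerminalKind.VIRTUAL)])  # ['archive-db']
```

Tagging at ingestion time keeps the origin of every record available to the later grouping and redundancy-removal steps.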

2) Preprocessing the acquired data;

data of the same or similar domains are grouped together according to the requirements of the application scenario;

3) Removing redundancy from the grouped data;

Let the n signal samples in a group of data be the observation objects, each of which has m types of physical data, so that a sample sequence $D = [d_1, d_2, \dots, d_m]^T$ is obtained, where $d_i$ contains all sample points of the $i$-th type of physical data ($i = 1, 2, \dots, m$). Starting-point zero-imaging is applied to the samples of the m types of physical data: the starting-point zero image of each sequence is obtained, and together these images form the initialized sample matrix.

For every pair $i \le j$ ($i, j = 1, 2, \dots, m$), the association coefficients $|s_i|$ and $|s_j|$ of the $i$-th and $j$-th physical data are calculated, and the grey absolute degree of incidence between the two physical data is obtained from them.

Pairs of physical data whose grey absolute degree of incidence exceeds 0.8 are regarded as strongly correlated, and one of the two physical data in each such pair is removed at random;
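The formulas referred to in this step are not reproduced in the text. Assuming they follow the standard grey absolute degree of incidence from grey system theory, whose terminology (starting-point zero image, $|s_i|$, $|s_j|$) this step uses, they would read as below for sequences $d_i = (x_i(1), x_i(2), \dots, x_i(n))$ of equal length sampled at equal intervals. This is a reference reconstruction under that assumption, not the patent's verbatim notation.

```latex
\begin{aligned}
d_i^0 &= \bigl(x_i(1)-x_i(1),\; x_i(2)-x_i(1),\; \dots,\; x_i(n)-x_i(1)\bigr)
       = \bigl(x_i^0(1), x_i^0(2), \dots, x_i^0(n)\bigr),\\
|s_i| &= \Bigl|\sum_{k=2}^{n-1} x_i^0(k) + \tfrac{1}{2}\,x_i^0(n)\Bigr|,\qquad
|s_j| = \Bigl|\sum_{k=2}^{n-1} x_j^0(k) + \tfrac{1}{2}\,x_j^0(n)\Bigr|,\\
|s_j - s_i| &= \Bigl|\sum_{k=2}^{n-1}\bigl(x_j^0(k)-x_i^0(k)\bigr) + \tfrac{1}{2}\bigl(x_j^0(n)-x_i^0(n)\bigr)\Bigr|,\\
\varepsilon_{ij} &= \frac{1+|s_i|+|s_j|}{1+|s_i|+|s_j|+|s_j-s_i|}.
\end{aligned}
```

Under this formulation $\varepsilon_{ij}$ lies in $(0, 1]$ and equals 1 exactly when $s_i = s_j$; the case $i = j$ is therefore trivial, and the 0.8 threshold only matters for $i < j$.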

4) Marking the data with keywords according to the classification mode to form unit management databases;

5) Analyzing the corresponding unit management database according to actual needs to obtain the analysis result.
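Steps 2, 4 and 5 can be illustrated with a small in-memory sketch. It assumes each record is a dictionary with an `id`, a `domain` label and whatever attributes the classification modes use; `group_by_domain`, `build_keyword_index` and `query` are hypothetical helper names, and a production system would keep the unit management databases in real database tables rather than Python dictionaries.

```python
from collections import defaultdict


def group_by_domain(records):
    """Step 2: put records from the same domain into one group."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["domain"]].append(rec)
    return dict(groups)


def build_keyword_index(groups, classify):
    """Step 4: mark records with keywords, yielding keyword-addressable units.

    `classify` is a caller-supplied function returning a set of keywords for
    a record (for example its field type or update mode); the domain name is
    always added as a keyword as well.
    """
    index = defaultdict(list)
    for domain, recs in groups.items():
        for rec in recs:
            for keyword in classify(rec) | {domain}:
                index[keyword].append(rec)
    return dict(index)


def query(index, first, *others):
    """Step 5: return the records carrying every requested keyword."""
    wanted = [{r["id"] for r in index.get(k, [])} for k in others]
    return [r for r in index.get(first, []) if all(r["id"] in w for w in wanted)]


records = [
    {"id": 1, "domain": "temperature", "mode": "real-time"},
    {"id": 2, "domain": "temperature", "mode": "batch"},
    {"id": 3, "domain": "humidity", "mode": "batch"},
]
index = build_keyword_index(group_by_domain(records), lambda r: {r["mode"]})
print([r["id"] for r in query(index, "temperature", "batch")])  # [2]
```

Keying the index by keyword means an analysis request in step 5 only touches the units it names instead of scanning the whole library.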

As a further improvement of the invention, the classification modes of the application scenario include classification by field type, by data structure, from the perspective of describing things, from the perspective of data processing, by data granularity and by update mode.

Compared with the prior art, the invention has the beneficial effects that:

The invention provides a terminal data acquisition, analysis and management method based on an intelligent library. Because data are acquired by both physical terminals and virtual terminals, a wider variety of data can be captured. Data from multiple devices in the same domain are merged by combining duplicate and approximately duplicate data, obviously erroneous data are removed, and the remaining data are marked with keywords to form unit management databases, which are then analyzed and managed according to later requirements.

Drawings

Fig. 1 is a schematic diagram of the principle and structure of the terminal data acquisition, analysis and management method of the present invention.

Detailed Description

The invention is described in further detail below with reference to the attached drawings and detailed description:

As a specific embodiment, the invention provides a terminal data acquisition, analysis and management method based on an intelligent library, which comprises the following steps:

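1) Data are acquired with physical data acquisition terminals and virtual data acquisition terminals, and the acquired data are consolidated in the intelligent library; the virtual data acquisition covers stored data and real-time network data;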

2) Preprocessing the acquired data;

the application scenario can be classified by field type, by data structure, from the perspective of describing things, from the perspective of data processing, by data granularity and by update mode;

Classification by field type distinguishes text data, numeric data and temporal data;

Text data are typically used for descriptive fields such as names, addresses and transaction summaries. Such data are not quantitative and cannot be used directly in the four arithmetic operations. In use, a field can first be standardized (for example, address standardization) and then matched character by character, or it can be fuzzy-matched directly.

Numeric data describe quantitative attributes or serve as codes. Transaction amounts, commodity quantities, product quantities, customer scores and the like are quantitative attributes: they can be used directly in the four arithmetic operations and are the core fields for calculating daily indicators. Postal codes, ID card numbers, card numbers and the like are codes, i.e., enumerated values encoded according to a rule. Arithmetic operations can be performed on them but have no substantive business meaning, and such codes mostly serve as dimensions.

Temporal data describe only when an event occurred. Time is a very important dimension in business statistics and analysis;
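Classifying fields by type can be partly automated with a simple inference heuristic. The sketch below assumes values arrive as strings and, as a hypothetical choice, treats only two timestamp layouts as temporal; `infer_field_type` and `TIME_FORMATS` are illustrative names.

```python
from datetime import datetime

# Hypothetical set of timestamp layouts accepted as temporal data.
TIME_FORMATS = ("%Y-%m-%d", "%Y-%m-%d %H:%M:%S")


def infer_field_type(value: str) -> str:
    """Classify a raw field value as 'temporal', 'numeric' or 'text'."""
    for fmt in TIME_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return "temporal"
        except ValueError:
            pass
    try:
        float(value)
        return "numeric"
    except ValueError:
        return "text"


print(infer_field_type("2021-12-31"))  # temporal
print(infer_field_type("128.50"))      # numeric
print(infer_field_type("210000"))      # numeric: a postal code still parses as a number
print(infer_field_type("Nanjing"))     # text
```

As the third call shows, parsing alone cannot tell a quantity from a code: a postal code looks numeric even though arithmetic on it has no business meaning, so code-like fields still need a schema or naming rule.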

Classification by data structure distinguishes structured data, semi-structured data and unstructured data;

Structured data generally refers to data recorded in the manner of a relational database: the data are stored in tables and fields, and the fields are independent of one another.

Semi-structured data are recorded as self-describing text. Self-describing data are convenient to use because they need not conform to the strict structures and relationships of a relational database. Many website and application access logs use this format, as do web pages themselves.

Unstructured data generally refers to data in formats such as audio, images and video. Such data are typically encoded in application-specific formats, are very large in volume, and cannot simply be converted into structured data.
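The practical difference between semi-structured and structured data shows up when a self-describing record has to be loaded into a fixed table. The sketch assumes a JSON access-log line (JSON is one common self-describing format; the text does not name a specific one) and a hypothetical four-column schema; `SCHEMA` and `to_structured_row` are illustrative names.

```python
import json

# Hypothetical fixed schema of a structured access-log table.
SCHEMA = ("time", "ip", "path", "status")


def to_structured_row(log_line: str) -> tuple:
    """Project a self-describing JSON log line onto the fixed table schema.

    Keys missing from the line become None; keys outside the schema are
    dropped, which is exactly the flexibility that structured storage lacks.
    """
    record = json.loads(log_line)
    return tuple(record.get(column) for column in SCHEMA)


line = ('{"time": "2021-12-31 08:00:00", "ip": "10.0.0.7", '
        '"path": "/search", "status": 200, "agent": "handheld"}')
print(to_structured_row(line))
# ('2021-12-31 08:00:00', '10.0.0.7', '/search', 200)
```

The extra `agent` key is silently lost in the structured row, which illustrates why semi-structured formats are preferred for logs whose fields vary.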

Classification from the perspective of describing things distinguishes state data, event data and mixed data;

The description of the objective world with data can generally be viewed from two aspects. The first aspect describes the entities of the objective world, i.e., individual objects such as people, tables and accounts. Each object has characteristics, and different kinds of objects have different characteristics: a person's characteristics include name, gender and age, while a table's include color and texture. Different individuals of the same kind have different characteristic values; for example, Zhang San is male, whereas Li Si is female and 24 years old. Some characteristics are stable, while others change constantly: gender generally does not change, but a person's account balance and position may change at any time. Each object can therefore be described by a set of characteristic data that may change over time (whether the data change depends both on changes in the object itself and on the time lag with which those changes are reflected in the data). The data at each point in time reflect the state of the object at that point in time and are therefore called state data.

The second aspect describes the relationships between objects in the objective world: how they interact and how they respond to one another. Recording such an interaction or response yields what is called event data. For example, when a customer buys an item of clothing from a store, three objects (the customer, the store and the clothing) are linked by a transaction relationship.

Mixed data theoretically also belong to the category of event data. The difference is that the event described by mixed data lasts longer: it has not finished when the data are first recorded and continues to change afterwards. For example, the process from order creation to order closure lasts for some time; the order data are first recorded when the order is created, and the order status and order amount may change several times afterwards.
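The three kinds of data map naturally onto three record shapes. The sketch below is illustrative only: the class and field names are hypothetical, and keeping an explicit `history` list is just one way of making the changes to mixed data traceable.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StateSnapshot:
    """State data: the features of one object at one point in time."""
    obj: str
    at: str
    features: tuple  # (name, value) pairs


@dataclass(frozen=True)
class Event:
    """Event data: a completed interaction among several objects."""
    at: str
    kind: str
    participants: tuple


@dataclass
class Order:
    """Mixed data: an event still in progress, so its record keeps changing."""
    order_id: str
    amount: float
    status: str = "created"
    history: list = field(default_factory=list)

    def update(self, at, status=None, amount=None):
        # Keep the previous values so every change to the event stays traceable.
        self.history.append((at, self.status, self.amount))
        if status is not None:
            self.status = status
        if amount is not None:
            self.amount = amount


li_si = StateSnapshot("Li Si", "2021-12-31", (("gender", "female"), ("age", 24)))
purchase = Event("2021-12-31", "purchase", ("customer", "store", "clothing"))
order = Order("O-1", amount=299.0)
order.update("2022-01-02", amount=259.0)
order.update("2022-01-05", status="closed")
print(order.status, order.amount, len(order.history))  # closed 259.0 2
```

State and event records are frozen because, once captured, they describe a fixed moment; only the mixed record is mutable.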

Classification from the data processing perspective distinguishes raw data and derived data;

Raw data refers to unprocessed data from an upstream system. Although a large amount of derived data is generated from the raw data, a copy of the raw data is kept without any modification, so that if a problem arises in the derived data, they can be recalculated from the raw data at any time.

Derived data refers to data generated by processing raw data, including data marts, summary layers, wide tables, and data analysis and mining results. By purpose, derived data fall into two cases: one is to improve data delivery efficiency, which covers data marts, summary layers and wide tables; the other is to solve business problems, which covers data analysis and mining results.
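The reason for keeping raw data untouched is that any derived table can be rebuilt from it. A minimal sketch, with hypothetical data and function names, treats the raw records as an immutable tuple and a per-customer total as the derived data.

```python
from collections import defaultdict

# Raw data: kept as an immutable tuple so it is never modified in place.
RAW_TRANSACTIONS = (
    ("C1", "2021-12-30", 120.0),
    ("C2", "2021-12-30", 80.0),
    ("C1", "2021-12-31", 45.5),
)


def derive_customer_totals(raw):
    """Derived data: a summary table rebuilt entirely from the raw records."""
    totals = defaultdict(float)
    for customer, _day, amount in raw:
        totals[customer] += amount
    return dict(totals)


totals = derive_customer_totals(RAW_TRANSACTIONS)
totals["C1"] = -1.0                                # simulate a corrupted derived table
totals = derive_customer_totals(RAW_TRANSACTIONS)  # recompute from the untouched raw data
print(totals)  # {'C1': 165.5, 'C2': 80.0}
```

Because the derivation is a pure function of the raw data, repairing any derived table is just a matter of running it again.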

Classification by data granularity distinguishes detail data and summary data;

Raw data obtained from business systems are typically fine-grained and contain a large amount of business detail. For example, the customer table contains the gender, age, name and so on of each customer, and the transaction table contains the time, place, amount and so on of each transaction. Such data are called detail data. Although detail data contain the richest business detail, analysis and mining on them often require a great deal of computation, so efficiency is low.

To improve the efficiency of data analysis, the data need to be preprocessed, generally by summarizing them along common dimensions such as time, region and product. During analysis, summary data are used first, and detail data are used only when the summary data cannot meet the requirement, which improves the efficiency of data use.
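The summary-first rule can be sketched in a few lines. The data, the choice of "day x region" as the precomputed summary, and the names `summarize` and `total` are hypothetical; a real warehouse would hold these as summary-layer tables.

```python
from collections import defaultdict

# Detail data: one row per transaction (day, region, product, amount).
DETAIL = [
    ("2021-12-30", "East", "P1", 100.0),
    ("2021-12-30", "East", "P2", 50.0),
    ("2021-12-30", "West", "P1", 70.0),
    ("2021-12-31", "East", "P1", 30.0),
]
DIMENSIONS = ("day", "region", "product")


def summarize(detail, dims):
    """Pre-aggregate the amount column along the given dimensions."""
    positions = [DIMENSIONS.index(d) for d in dims]
    summary = defaultdict(float)
    for row in detail:
        summary[tuple(row[p] for p in positions)] += row[3]
    return dict(summary)


# Summary prepared in advance for the most common query shape: day x region.
DAY_REGION = summarize(DETAIL, ("day", "region"))


def total(day, region, product=None):
    """Answer from the summary when possible, otherwise fall back to detail."""
    if product is None:
        return DAY_REGION.get((day, region), 0.0)
    return sum(row[3] for row in DETAIL if row[:3] == (day, region, product))


print(total("2021-12-30", "East"))        # 150.0 (from the summary)
print(total("2021-12-30", "East", "P2"))  # 50.0 (needs the detail rows)
```

The first query is a single dictionary lookup, while the second must scan the detail rows, which is the efficiency gap the summary layer is meant to close.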

Classification by update mode distinguishes batch data and real-time data;

Source systems provide data in different ways, which fall mainly into two modes. One is batch mode: data are provided at intervals, each delivery containing all changes made during that period. Batch mode has lower timeliness; most traditional systems use a T+1 mode, so business users can at best analyze the previous day's data and see the previous day's reports.

The other mode is real time: whenever data change or new data are generated, they are provided immediately. This mode offers high timeliness and effectively serves businesses with strict timeliness requirements, such as scenario-based marketing. However, it is technically more demanding: the system must be sufficiently stable, because any data error can easily cause serious business impact.
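The two update modes differ in when data leave the source system. A minimal in-memory sketch, with hypothetical names (`t_plus_1_batch`, `RealTimeFeed`), contrasts a T+1 batch extraction with a push-style real-time feed; real deployments would typically rely on scheduled jobs and a message queue instead.

```python
from datetime import date, timedelta


def t_plus_1_batch(changes, run_day: date):
    """Batch mode: on day T, deliver every change made on day T-1."""
    previous = run_day - timedelta(days=1)
    return [c for c in changes if c["day"] == previous]


class RealTimeFeed:
    """Real-time mode: push each change to subscribers as soon as it occurs."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, change):
        for callback in self._subscribers:
            callback(change)


changes = [
    {"day": date(2021, 12, 30), "key": "A", "value": 1},
    {"day": date(2021, 12, 31), "key": "B", "value": 2},
]
print(len(t_plus_1_batch(changes, date(2021, 12, 31))))  # 1 (only the change from 12-30)

received = []
feed = RealTimeFeed()
feed.subscribe(received.append)
feed.publish(changes[1])  # delivered immediately, without waiting for the next batch
```

The batch run on 2021-12-31 cannot see the change made that same day, which is the timeliness limit of T+1; the feed delivers it at once but, as noted above, any bad change is propagated just as quickly.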

3) Removing redundancy from the grouped data;

Pairs of physical data whose grey absolute degree of incidence exceeds 0.8 are regarded as strongly correlated, and one of the two physical data in each such pair is removed at random.
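A minimal Python sketch of this redundancy-removal step follows. It assumes the standard grey absolute degree of incidence from grey system theory (the patent's own formulas are not reproduced in this text), sequences of equal length sampled at equal intervals, and a dictionary mapping hypothetical physical-data names to their samples. Following the text, one member of each strongly correlated pair is dropped at random; passing a seeded `random.Random` makes the choice reproducible.

```python
import random


def zero_start_image(seq):
    """Starting-point zero image: subtract the first sample from every sample."""
    return [x - seq[0] for x in seq]


def signed_area(zero_seq):
    """s = sum_{k=2}^{n-1} x0(k) + x0(n)/2, the trapezoid area under the zero image."""
    return sum(zero_seq[1:-1]) + zero_seq[-1] / 2


def grey_absolute_degree(a, b):
    """Standard grey absolute degree of incidence of two equal-length sequences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("sequences must have the same length, at least 2")
    s_i = signed_area(zero_start_image(a))
    s_j = signed_area(zero_start_image(b))
    base = 1 + abs(s_i) + abs(s_j)
    # signed_area is linear, so s_j - s_i is the area of the difference sequence.
    return base / (base + abs(s_j - s_i))


def remove_redundant(series, threshold=0.8, rng=None):
    """Drop one member of every pair whose degree exceeds `threshold`.

    `series` maps a physical-data name to its samples; which member of a
    strongly correlated pair is dropped is chosen at random, as in the text.
    """
    rng = rng if rng is not None else random.Random()
    names = list(series)
    kept = set(names)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if a in kept and b in kept and grey_absolute_degree(series[a], series[b]) > threshold:
                kept.discard(rng.choice((a, b)))
    return {name: series[name] for name in names if name in kept}


samples = {
    "temp_a": [20.0, 21.0, 22.0, 23.0],
    "temp_b": [25.0, 26.1, 27.0, 28.0],  # same trend as temp_a, offset by about 5
    "load": [5.0, 1.0, 9.0, 2.0],
}
print(round(grey_absolute_degree(samples["temp_a"], samples["load"]), 3))  # 0.538
reduced = remove_redundant(samples, rng=random.Random(0))
```

Because the degree compares the zero-started curves, a constant offset between two sensors does not lower it: `temp_a` and `temp_b` score about 0.99, so one of them is removed, while `load` is kept.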

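4) Marking the data with keywords according to the classification mode to form unit management databases;

5) Analyzing the corresponding unit management database according to actual needs.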

The above description is only a preferred embodiment of the present invention and does not limit the invention in any other way; any modification or equivalent variation made according to the technical spirit of the invention falls within the scope of the invention as defined by the appended claims.

Claims

1. A terminal data acquisition, analysis and management method based on an intelligent library, characterized in that the method comprises the following steps:

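1) Data are acquired with physical data acquisition terminals and virtual data acquisition terminals, and the acquired data are consolidated in the intelligent library; the virtual data acquisition covers stored data and real-time network data;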

2) Preprocessing the obtained data;

3) Removing redundancy from the grouped data;

Let the n signal samples in a group of data be the observation objects, each of which has m types of physical data, so that a sample sequence $D = [d_1, d_2, \dots, d_m]^T$ is obtained, where $d_i$ contains all sample points of the $i$-th type of physical data ($i = 1, 2, \dots, m$); starting-point zero-imaging is performed on the samples of the m types of physical data, the starting-point zero image of each sequence forming the initialized sample matrix; and for all $i \le j$ ($i, j = 1, 2, \dots, m$), the association coefficients $|s_i|$ and $|s_j|$ of the $i$-th and $j$-th physical data are obtained:

2. The terminal data acquisition, analysis and management method based on an intelligent library according to claim 1, characterized in that the classification modes of the application scenario include classification by field type, by data structure, from the perspective of describing things, from the perspective of data processing, by data granularity and by update mode.