CN116414827A - Terminal data acquisition, analysis and management method based on intelligent library - Google Patents

Info

Publication number
CN116414827A
Authority
CN
China
Prior art keywords
data
classification
physical data
terminal
physical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202111673102.6A
Other languages
Chinese (zh)
Inventor
杨海花
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Youjia Nanjing Software Technology Co ltd
Original Assignee
Youjia Nanjing Software Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-12-31
Filing date
2021-12-31
Publication date
2023-07-11
Application filed by Youjia Nanjing Software Technology Co ltd
Priority to CN202111673102.6A
Publication of CN116414827A
Withdrawn (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/252 Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a terminal data acquisition, analysis and management method based on an intelligent library. Data are acquired with both physical terminals and virtual terminals, so that a wider variety of data can be collected. Duplicate and near-duplicate data collected by multiple devices in the same domain are merged, obviously erroneous data are removed, and the data are labeled with keywords to form unit management databases, which are then analyzed and managed according to later requirements.

Description

Terminal data acquisition, analysis and management method based on intelligent library
Technical Field
The invention relates to the technical field of computer science, in particular to a terminal data acquisition, analysis and management method based on an intelligent library.
Background
Computer science and technology are developing rapidly, and databases are at their core. The information reflected by data is key to human development and progress: computer systems process and analyze different data, output the required results, and those results are applied in daily life and production, continuously advancing science and technology. To extract the useful information people need from massive amounts of data, a computer must fully understand, collect, analyze and compare different data, and the extracted information is then applied to daily life and production. Of these activities, data collection is the most important, because only full and comprehensive collection lets people understand more of the data. Many data collection technologies exist on the market, but their collection methods are simple: they only collect data and perform no further processing on it. The collected data remain raw, their mutual associations are not reflected, and duplicate, outdated and redundant data are collected. As a result, when people compare and analyze massive data, a great deal of time must be spent removing such data, and efficiency is low. A data collection method is therefore needed that improves on the prior art, raises collection efficiency, and makes the collected data easier to manage and analyze comprehensively.
Disclosure of Invention
To solve these technical problems, the invention provides a terminal data acquisition, analysis and management method based on an intelligent library. The method acquires data with physical terminals and virtual terminals, merges duplicate and near-duplicate data collected by multiple devices in the same domain, and performs analysis and management according to later requirements, thereby greatly improving the efficiency of analysis and management.
The terminal data acquisition, analysis and management method based on an intelligent library comprises the following specific steps:
1) Acquiring data with physical data acquisition terminals and virtual data acquisition terminals, and gathering the acquired data into the intelligent library;
the virtual data acquisition covers stored data and real-time network data;
the physical data acquisition terminal comprises communication access terminals, remote acquisition devices and handheld data acquisition devices;
2) Preprocessing the acquired data;
grouping data of the same or similar domains according to the requirements of the application scenario;
3) Removing redundancy from the grouped data;
let the n signal samples in a group of data be the observation objects, each object having m physical data, so that the sample sequence $D = [d_1, d_2, \ldots, d_m]^T$ is obtained, where $d_i = (d_i(1), d_i(2), \ldots, d_i(n))$ contains all sample points of the i-th physical data, $i = 1, 2, \ldots, m$; starting-point zeroing is performed on the samples of the m physical data:
$$d_i^0(k) = d_i(k) - d_i(1), \quad k = 1, 2, \ldots, n$$
where $d_i^0 = (d_i^0(1), d_i^0(2), \ldots, d_i^0(n))$ is the starting-point zeroed image of the i-th physical data; these images form the initialized sample matrix
$$D^0 = [d_1^0, d_2^0, \ldots, d_m^0]^T$$
for every pair $i < j$, $i, j = 1, 2, \ldots, m$, the association coefficients $|s_i|$ and $|s_j|$ of the i-th and j-th physical data are obtained respectively:
$$|s_i| = \left| \sum_{k=2}^{n-1} d_i^0(k) + \frac{1}{2} d_i^0(n) \right|$$
$$|s_j| = \left| \sum_{k=2}^{n-1} d_j^0(k) + \frac{1}{2} d_j^0(n) \right|$$
and the gray absolute correlation degree between the two physical data is obtained by
$$\varepsilon_{ij} = \frac{1 + |s_i| + |s_j|}{1 + |s_i| + |s_j| + |s_i - s_j|}, \quad |s_i - s_j| = \left| \sum_{k=2}^{n-1} \left(d_i^0(k) - d_j^0(k)\right) + \frac{1}{2}\left(d_i^0(n) - d_j^0(n)\right) \right|$$
pairs whose gray absolute correlation degree is greater than 0.8 are regarded as strongly correlated physical data, and one of the two physical data in each such pair is removed at random;
4) Labeling the data with keywords according to the classification mode to form unit management databases;
5) Analyzing the corresponding unit management database according to actual needs to obtain an analysis result.
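Steps 2, 4 and 5 above can be illustrated with a short Python sketch. It is a minimal illustration under assumptions, not the patented implementation: the `Record` shape, the hypothetical helpers `group_by_domain`, `build_unit_databases` and `analyze`, and the in-memory dict standing in for each unit management database are all invented for this example, and the redundancy removal of step 3 is left out.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical record shape; the method does not prescribe one."""
    source: str                 # e.g. "handheld" or "network_realtime"
    domain: str                 # domain used for grouping in step 2
    payload: dict = field(default_factory=dict)


def group_by_domain(records):
    """Step 2: put records from the same domain into one group."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec.domain].append(rec)
    return dict(groups)


def build_unit_databases(groups, keyword_for):
    """Step 4: label each group with a keyword; a dict stands in for each database."""
    return {keyword_for(domain): recs for domain, recs in groups.items()}


def analyze(unit_dbs, keyword, metric):
    """Step 5: apply an analysis function to the unit database chosen by keyword."""
    return metric(unit_dbs.get(keyword, []))


records = [
    Record("handheld", "library", {"visits": 3}),
    Record("stored", "library", {"visits": 5}),
    Record("network_realtime", "weather", {"temp_c": 21}),
]
groups = group_by_domain(records)
unit_dbs = build_unit_databases(groups, keyword_for=lambda d: d.upper())
total_visits = analyze(unit_dbs, "LIBRARY", lambda rs: sum(r.payload["visits"] for r in rs))
```

Here the keyword function simply upper-cases the domain name; a real deployment would derive keywords from the chosen classification mode.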
As a further improvement of the invention, the classification modes of the application scenario include classification by field type, by data structure, by the perspective of the things described, by the perspective of data processing, by data granularity and by update mode.
Compared with the prior art, the invention has the following beneficial effects:
The invention provides a terminal data acquisition, analysis and management method based on an intelligent library. Using both physical terminals and virtual terminals to acquire data allows a wider variety of data to be collected; duplicate and near-duplicate data collected by multiple devices in the same domain are merged, obviously erroneous data are removed, and the data are labeled with keywords to form unit management databases, which are then analyzed and managed according to later requirements.
Drawings
Fig. 1 is a schematic diagram of the principle and structure of the terminal data acquisition, analysis and management method of the present invention.
Detailed Description
The invention is described in further detail below with reference to the attached drawings and detailed description:
As a specific embodiment, the invention provides a terminal data acquisition, analysis and management method based on an intelligent library, comprising the following specific steps:
1) Acquiring data with physical data acquisition terminals and virtual data acquisition terminals, and gathering the acquired data into the intelligent library;
the virtual data acquisition covers stored data and real-time network data;
the physical data acquisition terminal comprises communication access terminals, remote acquisition devices and handheld data acquisition devices;
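As a hedged sketch of step 1, the acquisition sources listed above can be modeled as a small registry that tags each record with its terminal kind before it enters the intelligent library. The source identifiers, the `IntelligentLibrary` class and its `ingest` and `by_kind` methods are hypothetical names chosen for illustration; a real library would persist records rather than keep them in a list.

```python
from dataclasses import dataclass
from enum import Enum


class TerminalKind(Enum):
    PHYSICAL = "physical"
    VIRTUAL = "virtual"


# Source types named in step 1; the string keys are hypothetical identifiers.
SOURCES = {
    "communication_access_terminal": TerminalKind.PHYSICAL,
    "remote_acquisition_device": TerminalKind.PHYSICAL,
    "handheld_device": TerminalKind.PHYSICAL,
    "stored_data": TerminalKind.VIRTUAL,
    "network_realtime": TerminalKind.VIRTUAL,
}


@dataclass(frozen=True)
class Acquired:
    source: str
    kind: TerminalKind
    value: object


class IntelligentLibrary:
    """Minimal in-memory stand-in for the 'intelligent library'."""

    def __init__(self):
        self.items = []

    def ingest(self, source, value):
        # Reject sources that are not registered acquisition terminals.
        if source not in SOURCES:
            raise ValueError(f"unknown acquisition source: {source}")
        self.items.append(Acquired(source, SOURCES[source], value))

    def by_kind(self, kind):
        return [a for a in self.items if a.kind is kind]


lib = IntelligentLibrary()
lib.ingest("handheld_device", {"barcode": "A001"})
lib.ingest("network_realtime", {"visitors": 42})
lib.ingest("stored_data", {"catalog_rows": 1000})
```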
2) Preprocessing the acquired data;
grouping data of the same or similar domains according to the requirements of the application scenario;
the classification modes of the application scenario comprise classification by field type, by data structure, by the perspective of the things described, by the perspective of data processing, by data granularity and by update mode;
classification by field type distinguishes text data, numeric data and time data;
Text data are typically used for descriptive fields such as names, addresses and transaction summaries. Such data are not quantitative and cannot be used directly in arithmetic operations. In use, a field can first be standardized (for example, address standardization) and then matched character by character, or it can be fuzzy-matched directly.
Numeric data describe quantitative attributes or serve as codes. Transaction amounts, commodity quantities, product quantities, customer scores and the like are quantitative attributes: they can be used directly in arithmetic operations and are the core fields for computing everyday metrics. Postal codes, ID card numbers, card numbers and the like are codes, which assign regular encodings to enumerated values; arithmetic can be performed on them but has no real business meaning, and many codes serve as dimensions.
Time data describe only when an event occurred. Time is a very important dimension in business statistics and analysis;
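A minimal sketch of the field-type classification, assuming a caller-maintained registry of code fields (the names in `CODE_FIELDS` are hypothetical) and using the standard library's `difflib.SequenceMatcher` as one possible fuzzy matcher. The 0.85 threshold and the toy address normalization are assumptions, not part of the method.

```python
import difflib
import re
from datetime import datetime

# Hypothetical registry of numeric fields that are codes, not quantities.
CODE_FIELDS = {"postal_code", "id_card_number", "card_number"}


def field_class(name, value):
    """Classify a field as 'time', 'numeric-code', 'numeric-quantity' or 'text'."""
    if isinstance(value, datetime):
        return "time"
    if name in CODE_FIELDS:
        return "numeric-code"        # arithmetic is possible but meaningless
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return "numeric-quantity"    # e.g. amounts, usable in sums and averages
    return "text"


def normalize_address(text):
    """Toy standardization: lowercase, replace punctuation with spaces, collapse spaces."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def fuzzy_match(a, b, threshold=0.85):
    """Fuzzy-match two text values after normalization; threshold is an assumption."""
    ratio = difflib.SequenceMatcher(None, normalize_address(a), normalize_address(b)).ratio()
    return ratio >= threshold
```

Standardizing first and then matching, as the text describes, lets small formatting differences such as trailing punctuation or case disappear before the similarity ratio is computed.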
classification by data structure distinguishes structured data, semi-structured data and unstructured data;
Structured data generally refer to data recorded in the manner of a relational database: the data are stored in tables and fields, and the fields are independent of one another.
Semi-structured data are recorded as self-describing text. Because self-describing data need not satisfy the strict structures and relationships of a relational database, they are very convenient to use. Many website and application access logs use this format, as do web pages themselves.
Unstructured data generally refer to data in formats such as voice, pictures and video. Such data are usually encoded in an application-specific format, are very large in volume, and cannot simply be converted into structured data.
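The three structural classes can be told apart with a simple heuristic. In this sketch, which is an assumption rather than a rule from the method, a tuple stands for a relational row, JSON text that parses to an object or array counts as self-describing semi-structured data, and raw bytes (audio, image, video) count as unstructured; treating non-JSON free text as unstructured is an additional assumption. The function name `structure_class` is hypothetical.

```python
import json


def structure_class(payload):
    """Heuristic sketch: classify a payload as structured, semi-structured or unstructured."""
    if isinstance(payload, tuple):
        return "structured"            # a fixed-column row, as in a relational table
    if isinstance(payload, (bytes, bytearray)):
        return "unstructured"          # encoded media such as audio, images or video
    if isinstance(payload, str):
        try:
            parsed = json.loads(payload)
        except json.JSONDecodeError:
            return "unstructured"      # assumption: free text is treated as unstructured
        # Self-describing text whose keys travel with the values, e.g. a JSON log line.
        return "semi-structured" if isinstance(parsed, (dict, list)) else "unstructured"
    raise TypeError(f"unsupported payload type: {type(payload).__name__}")
```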
classification by the perspective of the things described distinguishes state data, event data and mixed data;
Describing the objective world with data can generally be approached from two aspects. The first aspect describes the entities of the objective world, that is, individual objects such as people, tables and accounts. Each object has characteristics, and different kinds of objects have different characteristics: a person's characteristics include name, gender and age, while a table's include color and texture. Different individuals of the same kind have different characteristic values; for example, Zhang San is male, while Li Si is female and 24 years old. Some characteristics are stable, while others change constantly: gender generally does not change, but a person's account balance and location may change at any time. Each object can therefore be described by a set of characteristic data that change over time (a change in the data depends on a change in the object, and also on the delay with which that change is reflected in the data). The data at each point in time reflect the state of the object at that moment, so they are called state data.
The second aspect describes the relationships between objects in the objective world: how they interact and how they respond to one another. Recording these interactions or responses produces what is called event data. For example, when a customer buys a garment from a store, three objects are involved (the customer, the store and the garment), and a transaction relationship exists among them.
Mixed data theoretically also belong to event data. The difference is that the event described by mixed data lasts longer: it has not finished when the data are recorded and continues to change. For example, an order takes some time to go from creation to closing; the order record is first written when the order is created, and the order status and amount may change several times afterwards.
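The three descriptive classes map naturally onto three record shapes. The class names below are hypothetical; the sketch only illustrates that state data are point-in-time snapshots, event data are immutable once recorded, and mixed data (such as the order example) are mutable records whose changes are worth tracking.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class StateSnapshot:
    """State data: an object's characteristics at one point in time."""
    entity_id: str
    as_of: datetime
    attributes: dict


@dataclass(frozen=True)
class Event:
    """Event data: a finished interaction between objects, recorded once."""
    at: datetime
    actors: tuple        # e.g. ("customer:1", "store:7", "garment:42")
    kind: str


@dataclass
class Order:
    """Mixed data: an event still in progress whose record keeps changing."""
    order_id: str
    status: str
    amount: float
    history: list = field(default_factory=list)

    def update(self, at, status=None, amount=None):
        # Save the previous values so every change stays traceable.
        self.history.append((at, self.status, self.amount))
        if status is not None:
            self.status = status
        if amount is not None:
            self.amount = amount


snapshot = StateSnapshot("person:zhang-san", datetime(2021, 12, 31), {"gender": "male"})
purchase = Event(datetime(2021, 12, 1), ("customer:1", "store:7", "garment:42"), "purchase")
order = Order("o-1", "created", 100.0)
order.update(datetime(2021, 12, 2), amount=90.0)
order.update(datetime(2021, 12, 5), status="closed")
```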
classification by the perspective of data processing distinguishes raw data and derived data;
Raw data are data from upstream systems that have not been processed. Although large amounts of derived data are generated from raw data, a copy of the raw data is kept without any modification, so that if derived data turn out to be wrong, they can be recalculated from the raw data at any time.
Derived data are data generated by processing raw data, including data marts, summary layers, wide tables, and data analysis and mining results. By purpose, derived data fall into two cases: improving data delivery efficiency, which covers data marts, summary layers and wide tables; and solving business problems, which covers data analysis and mining results.
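One way to honor the rule that raw data stay unmodified while derived data can be recomputed at any time is to expose raw rows read-only and rebuild summaries from them on demand. `RawStore` and `derive_totals` are hypothetical names; `types.MappingProxyType` is the standard-library read-only view of a dict.

```python
from types import MappingProxyType


class RawStore:
    """Holds raw rows as read-only mappings so they are never modified in place."""

    def __init__(self, rows):
        # Copy each row, then wrap it in a read-only view.
        self._rows = tuple(MappingProxyType(dict(r)) for r in rows)

    @property
    def rows(self):
        return self._rows


def derive_totals(raw, key, value):
    """Derived data: a summary table computed from the raw rows on demand."""
    totals = {}
    for row in raw.rows:
        totals[row[key]] = totals.get(row[key], 0) + row[value]
    return totals


raw = RawStore([
    {"branch": "north", "loans": 3},
    {"branch": "south", "loans": 2},
    {"branch": "north", "loans": 4},
])
summary = derive_totals(raw, "branch", "loans")
summary["north"] = -1                              # simulate a corrupted derived table
summary = derive_totals(raw, "branch", "loans")    # recompute it from the raw data
```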
classification by data granularity distinguishes detail data and summary data;
Raw data obtained from business systems usually have fine granularity and contain many business details. For example, a customer table contains the gender, age, name and so on of each customer, and a transaction table contains the time, place, amount and so on of each transaction. Such data are called detail data. Although detail data contain the richest business details, analysis and mining on them often require a great deal of computation, so efficiency is low.
To improve analysis efficiency, the data need to be preprocessed, generally by summarizing them along common dimensions such as time, region and product. During analysis, summary data are used first, and detail data are used only when the summary data cannot meet the requirement, which improves the efficiency of data use.
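The summary-first, detail-fallback policy can be sketched as follows, assuming dict-shaped detail rows and a summary keyed by tuples of dimension values. The helpers `summarize` and `query_total` and the sample fields are hypothetical.

```python
from collections import defaultdict


def summarize(details, dims, measure):
    """Pre-aggregate detail rows along common dimensions (e.g. month, region)."""
    out = defaultdict(float)
    for row in details:
        out[tuple(row[d] for d in dims)] += row[measure]
    return dict(out)


def query_total(details, summary, summary_dims, measure, **filters):
    """Prefer the summary; fall back to detail rows when it cannot answer."""
    if set(filters) <= set(summary_dims):
        # Every filter is a summary dimension, so the small summary suffices.
        return sum(v for k, v in summary.items()
                   if all(k[summary_dims.index(f)] == x for f, x in filters.items()))
    # Some filter (e.g. customer) was aggregated away: scan the detail rows.
    return sum(r[measure] for r in details
               if all(r[f] == x for f, x in filters.items()))


details = [
    {"month": "2021-11", "region": "east", "customer": "c1", "amount": 10.0},
    {"month": "2021-11", "region": "west", "customer": "c2", "amount": 5.0},
    {"month": "2021-12", "region": "east", "customer": "c1", "amount": 7.0},
]
dims = ("month", "region")
summary = summarize(details, dims, "amount")
total_east = query_total(details, summary, dims, "amount", region="east")
total_c2 = query_total(details, summary, dims, "amount", customer="c2")
```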
classification by update mode distinguishes batch data and real-time data;
Different source systems provide data in different ways, mainly in one of two modes. The first is batch mode, in which data are provided at intervals and each delivery contains all changes in that period. Batch mode has lower timeliness; most traditional systems use a T+1 schedule, so business users can at best analyze the previous day's data and see the previous day's report.
The second is real-time mode, in which data are provided immediately whenever they change or new data are generated. This mode is highly timely and can effectively support businesses with strict timeliness requirements, such as scenario-based marketing. However, it places higher technical demands on the system, which must be sufficiently stable, because a data error can easily cause serious business impact.
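The two update modes can be contrasted in a short sketch: a T+1 batch job that selects the previous day's changes, and a minimal publish/subscribe feed that pushes each change immediately. `t_plus_1_batch` and `RealTimeFeed` are hypothetical names, and both are illustrative assumptions rather than a specified interface.

```python
from datetime import date, datetime, timedelta


def t_plus_1_batch(changes, run_day):
    """Batch mode: on run_day, deliver every change made on the previous day."""
    prev = run_day - timedelta(days=1)
    return [c for c in changes if c["ts"].date() == prev]


class RealTimeFeed:
    """Real-time mode: push each change to subscribers as soon as it occurs."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, change):
        for callback in self.subscribers:
            callback(change)


changes = [
    {"id": 1, "ts": datetime(2021, 12, 30, 9)},
    {"id": 2, "ts": datetime(2021, 12, 30, 18)},
    {"id": 3, "ts": datetime(2021, 12, 31, 8)},
]
batch = t_plus_1_batch(changes, date(2021, 12, 31))   # yesterday's changes only

feed = RealTimeFeed()
seen = []
feed.subscribe(seen.append)
feed.publish(changes[2])                              # delivered immediately
```

The batch job cannot see change 3 until the next day's run, which is exactly the T+1 delay described above.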
3) Removing redundancy from the grouped data;
let the n signal samples in a group of data be the observation objects, each object having m physical data, so that the sample sequence $D = [d_1, d_2, \ldots, d_m]^T$ is obtained, where $d_i = (d_i(1), d_i(2), \ldots, d_i(n))$ contains all sample points of the i-th physical data, $i = 1, 2, \ldots, m$; starting-point zeroing is performed on the samples of the m physical data:
$$d_i^0(k) = d_i(k) - d_i(1), \quad k = 1, 2, \ldots, n$$
where $d_i^0 = (d_i^0(1), d_i^0(2), \ldots, d_i^0(n))$ is the starting-point zeroed image of the i-th physical data; these images form the initialized sample matrix
$$D^0 = [d_1^0, d_2^0, \ldots, d_m^0]^T$$
for every pair $i < j$, $i, j = 1, 2, \ldots, m$, the association coefficients $|s_i|$ and $|s_j|$ of the i-th and j-th physical data are obtained respectively:
$$|s_i| = \left| \sum_{k=2}^{n-1} d_i^0(k) + \frac{1}{2} d_i^0(n) \right|$$
$$|s_j| = \left| \sum_{k=2}^{n-1} d_j^0(k) + \frac{1}{2} d_j^0(n) \right|$$
and the gray absolute correlation degree between the two physical data is obtained by
$$\varepsilon_{ij} = \frac{1 + |s_i| + |s_j|}{1 + |s_i| + |s_j| + |s_i - s_j|}, \quad |s_i - s_j| = \left| \sum_{k=2}^{n-1} \left(d_i^0(k) - d_j^0(k)\right) + \frac{1}{2}\left(d_i^0(n) - d_j^0(n)\right) \right|$$
pairs whose gray absolute correlation degree is greater than 0.8 are regarded as strongly correlated physical data, and one of the two physical data in each such pair is removed at random.
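Step 3 can be sketched in Python, assuming the standard form of the gray absolute correlation degree: starting-point zeroing $d_i^0(k) = d_i(k) - d_i(1)$, $|s_i| = |\sum_{k=2}^{n-1} d_i^0(k) + \tfrac{1}{2} d_i^0(n)|$, and $\varepsilon_{ij} = (1+|s_i|+|s_j|)/(1+|s_i|+|s_j|+|s_i-s_j|)$. The function names are hypothetical, the columns are assumed to have equal length n ≥ 2, and skipping pairs whose member has already been removed is a design choice the method does not specify.

```python
import random
from itertools import combinations


def zero_start(series):
    """Starting-point zeroed image: d0(k) = d(k) - d(1)."""
    return [x - series[0] for x in series]


def s_abs(x0):
    """|s| = |sum_{k=2}^{n-1} x0(k) + x0(n)/2| for a zeroed sequence x0."""
    return abs(sum(x0[1:-1]) + 0.5 * x0[-1])


def gray_absolute_degree(a, b):
    """Gray absolute correlation degree between two equal-length sequences."""
    a0, b0 = zero_start(a), zero_start(b)
    si, sj = s_abs(a0), s_abs(b0)
    diff = s_abs([y - x for x, y in zip(a0, b0)])   # equals |s_i - s_j|
    return (1 + si + sj) / (1 + si + sj + diff)


def remove_redundant(columns, threshold=0.8, rng=random):
    """Drop one randomly chosen member of each strongly correlated pair."""
    removed = set()
    for i, j in combinations(list(columns), 2):
        if i in removed or j in removed:
            continue
        if gray_absolute_degree(columns[i], columns[j]) > threshold:
            removed.add(rng.choice([i, j]))
    return {k: v for k, v in columns.items() if k not in removed}


columns = {
    "temp_a": [20, 21, 22, 23],
    "temp_b": [30, 31, 32, 33],
    "load": [5, 1, 9, 2],
}
kept = remove_redundant(columns, rng=random.Random(0))
```

In this example the two temperature columns differ only by a constant offset, so their zeroed images coincide and the degree is 1, while the load column scores 7/13 (about 0.54) against either and is kept.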
4) Labeling the data with keywords according to the classification mode to form unit management databases;
5) Analyzing the corresponding unit management database according to actual needs.
The above describes only a preferred embodiment of the present invention and does not limit the invention in any form. Any modification or equivalent variation made according to the technical essence of the invention falls within the scope of the invention as defined by the appended claims.

Claims (2)

1. A terminal data acquisition, analysis and management method based on an intelligent library, characterized in that the method comprises the following specific steps:
1) Acquiring data with physical data acquisition terminals and virtual data acquisition terminals, and gathering the acquired data into the intelligent library;
the virtual data acquisition covers stored data and real-time network data;
the physical data acquisition terminal comprises communication access terminals, remote acquisition devices and handheld data acquisition devices;
2) Preprocessing the acquired data;
grouping data of the same or similar domains according to the requirements of the application scenario;
3) Removing redundancy from the grouped data;
setting the n signal samples in a group of data as observation objects, each object having m physical data, so that the sample sequence $D = [d_1, d_2, \ldots, d_m]^T$ is obtained, where $d_i = (d_i(1), d_i(2), \ldots, d_i(n))$ contains all sample points of the i-th physical data, $i = 1, 2, \ldots, m$, and performing starting-point zeroing on the samples of the m physical data:
$$d_i^0(k) = d_i(k) - d_i(1), \quad k = 1, 2, \ldots, n$$
where $d_i^0 = (d_i^0(1), d_i^0(2), \ldots, d_i^0(n))$ is the starting-point zeroed image of the i-th physical data, the images forming the initialized sample matrix
$$D^0 = [d_1^0, d_2^0, \ldots, d_m^0]^T$$
and, for every pair $i < j$, $i, j = 1, 2, \ldots, m$, respectively obtaining the association coefficients $|s_i|$ and $|s_j|$ of the i-th and j-th physical data:
$$|s_i| = \left| \sum_{k=2}^{n-1} d_i^0(k) + \frac{1}{2} d_i^0(n) \right|$$
$$|s_j| = \left| \sum_{k=2}^{n-1} d_j^0(k) + \frac{1}{2} d_j^0(n) \right|$$
and obtaining the gray absolute correlation degree between the two physical data by
$$\varepsilon_{ij} = \frac{1 + |s_i| + |s_j|}{1 + |s_i| + |s_j| + |s_i - s_j|}, \quad |s_i - s_j| = \left| \sum_{k=2}^{n-1} \left(d_i^0(k) - d_j^0(k)\right) + \frac{1}{2}\left(d_i^0(n) - d_j^0(n)\right) \right|$$
regarding pairs whose gray absolute correlation degree is greater than 0.8 as strongly correlated physical data, and randomly removing one of the two physical data in each such pair;
4) Labeling the data with keywords according to the classification mode to form unit management databases;
5) Analyzing the corresponding unit management database according to actual needs to obtain an analysis result.
2. The terminal data acquisition, analysis and management method based on an intelligent library according to claim 1, characterized in that the classification modes of the application scenario include classification by field type, by data structure, by the perspective of the things described, by the perspective of data processing, by data granularity, and by update mode.
CN202111673102.6A 2021-12-31 2021-12-31 Terminal data acquisition, analysis and management method based on intelligent library Withdrawn CN116414827A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111673102.6A CN116414827A (en) 2021-12-31 2021-12-31 Terminal data acquisition, analysis and management method based on intelligent library

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111673102.6A CN116414827A (en) 2021-12-31 2021-12-31 Terminal data acquisition, analysis and management method based on intelligent library

Publications (1)

Publication Number Publication Date
CN116414827A true CN116414827A (en) 2023-07-11

Family

ID=87055142

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111673102.6A Withdrawn CN116414827A (en) 2021-12-31 2021-12-31 Terminal data acquisition, analysis and management method based on intelligent library

Country Status (1)

Country Link
CN (1) CN116414827A (en)

Similar Documents

Publication Publication Date Title
Guevara et al. diverse: an R Package to Analyze Diversity in Complex Systems.
CN111831636B (en) Data processing method, device, computer system and readable storage medium
CN108363821A (en) A kind of information-pushing method, device, terminal device and storage medium
CN103605651A (en) Data processing showing method based on on-line analytical processing (OLAP) multi-dimensional analysis
CN113836131B (en) Big data cleaning method and device, computer equipment and storage medium
CN102693299A (en) System and method for parallel video copy detection
CN112307232A (en) Intelligent classification storage processing method for big data content
Caruso et al. Deprivation and the dimensionality of welfare: a variable‐selection cluster‐analysis approach
CN116644184B (en) Human resource information management system based on data clustering
CN111859070A (en) Mass internet news cleaning system
CN114119057A (en) User portrait model construction system
CN116842142B (en) Intelligent retrieval system for medical instrument
CN115204436A (en) Method, device, equipment and medium for detecting abnormal reasons of business indexes
JP3185167B2 (en) Data processing system
CN116862434A (en) Material data management system and method based on big data
CN116414827A (en) Terminal data acquisition, analysis and management method based on intelligent library
CN116340387A (en) Statistical analysis method and system for personal information disclosure condition of data table
CN115034762A (en) Post recommendation method and device, storage medium, electronic equipment and product
CN113779110A (en) Family relation network extraction method and device, computer equipment and storage medium
CN113408207A (en) Data mining method based on social network analysis technology
CN113538011A (en) Method for associating non-registered contact information with registered user in power system
CN111666378A (en) Chinese yearbook title classification method based on word vectors
Kettenring et al. Cluster analysis applied to the validation of course objectives
CN112215627B (en) Customer information data processing system
CN113392203B (en) Intelligent question-answering method, intelligent question-answering device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
WW01 Invention patent application withdrawn after publication

Application publication date: 20230711