CN113127449A - Method for constructing aluminum/copper plate strip production full-flow data warehouse - Google Patents

Info

Publication number
CN113127449A
CN113127449A CN202110450036.XA CN202110450036A CN113127449A CN 113127449 A CN113127449 A CN 113127449A CN 202110450036 A CN202110450036 A CN 202110450036A CN 113127449 A CN113127449 A CN 113127449A
Authority
CN
China
Prior art keywords
data
aluminum
copper plate
plate strip
cleaning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110450036.XA
Other languages
Chinese (zh)
Inventor
刘士新
姚明昊
陈大力
温睿
赵梓焱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeastern University China
Original Assignee
Northeastern University China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeastern University China
Priority to CN202110450036.XA
Publication of CN113127449A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1744 Redundancy elimination performed by the file system using compression, e.g. sparse files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/214 Database migration support
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2358 Change logging, detection, and notification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for constructing a full-process data warehouse for aluminum/copper plate strip production, in the technical field of data warehouses. The method comprises the following steps: acquiring existing aluminum/copper plate strip data files, identifying their data types, analyzing the file types to set an import mode, and importing the data according to that mode to obtain migrated data; storing the migrated data in partitions, identifying the non-standard data in the migrated data, determining a cleaning scheme and cleaning rules according to the type of non-standard data, and cleaning the migrated data according to the cleaning rules to obtain cleaned data; and constructing a data feature table, placing the cleaned data into it, and mapping the data in the data feature table to the data warehouse. The method provides data integration across the full aluminum/copper plate strip production process and forms a standardized scheme for the integrated processing of industrial big data.

Description

Method for constructing aluminum/copper plate strip production full-flow data warehouse
Technical Field
The invention relates to the technical field of data warehouses, in particular to a method for constructing a full-flow data warehouse for aluminum/copper plate strip production.
Background
The main function of a data warehouse is to take the large volume of data that an organization has accumulated over the years through the online transaction processing (OLTP) of its information systems and to organize and analyze it using the storage structures specific to data warehouse theory. This facilitates analysis methods such as online analytical processing (OLAP) and data mining, supports the creation of decision support systems (DSS) and executive information systems (EIS), helps decision makers quickly and effectively extract valuable information from large volumes of data so that they can make decisions and respond rapidly to changes in the external environment, and helps build business intelligence (BI). With the convergence of, and breakthroughs in, new-generation big data technologies such as distributed storage, distributed computing, data warehousing, and data mining, the data warehouse plays an important role in the overall big data ecosystem.
Using a data warehouse for integrated data modeling across the full aluminum/copper plate strip production process makes it possible to store valuable historical information efficiently, to mine the factors that influence indicators such as product quality, production cost, and production yield and organize them into chart-structured data, to analyze product quality and equipment operating state statistically, and to evaluate trends in product quality and equipment degradation. Equipment operating conditions can then be determined and countermeasures taken, improving both the return on equipment investment and production efficiency. Data warehouse modeling for the industrial equipment of the full aluminum/copper plate strip production process is therefore of great and far-reaching significance.
Existing methods for processing aluminum/copper plate strip production data have the following problems: 1. aluminum/copper plate strip plants accumulate large amounts of production data, but the correlations among data from different processes and different equipment cannot be used effectively; 2. production data is stored in databases such as SQL Server, but the data a plant stores serves only daily business management and cannot be extracted and put to further use; 3. historical data contains many missing values, abnormal values, and duplicate values, so its information value is low; 4. as data volume grows, a simple relational database can no longer meet production needs or integrate and store the data effectively. In summary, there is a need for a method of processing aluminum/copper plate strip production data that makes effective use of that data, extracts accumulated data, and increases the information value of the data.
Disclosure of Invention
The invention provides a method for constructing a full-process data warehouse for aluminum/copper plate strip production, which solves the problem that existing methods for processing aluminum/copper plate strip production data cannot make effective use of that data.
To achieve the above purpose, the invention adopts the following technical scheme:
A method for constructing a full-process data warehouse for aluminum/copper plate strip production comprises the following steps:
acquiring existing aluminum/copper plate strip data files, identifying their data types, analyzing the file types to set an import mode, and importing the data according to that mode to obtain migrated data;
storing the migrated data in partitions, identifying the non-standard data in the migrated data, determining a cleaning scheme and cleaning rules according to the type of non-standard data, and cleaning the migrated data according to the cleaning rules to obtain cleaned data;
and constructing a data feature table, placing the cleaned data into the data feature table, and mapping the data in the data feature table to the data warehouse.
Preferably, analyzing the data file type and setting the import mode comprises the following steps:
determining the type of each file to be imported, and importing it with the strategy that corresponds to its file type;
determining the attributes of the data in each file to be imported, and importing the data with the strategy that corresponds to its attributes;
and setting a scheduled intelligent import script that, following the periodicity of data production time, imports the files into the distributed file system for partitioned storage.
Preferably, cleaning the migrated data according to the cleaning rules comprises the following specific steps:
extracting the source data that needs cleaning from a designated partition of the distributed file system;
preprocessing the source data before cleaning and removing useless data content to obtain preprocessed data;
cleaning the preprocessed data according to the cleaning rules;
checking whether the data meets the requirements of the cleaning rules and, if so, storing the cleaned data in a designated partition of the distributed file system; otherwise, returning to the previous step and continuing to clean until the rules are met.
Preferably, constructing the data feature table comprises the following steps:
setting up a predefined table structure in an open-source data warehouse tool, with one table for each process of the aluminum/copper plate strip production flow, each process partitioned by equipment, and each piece of equipment partitioned by time period;
and mapping the data that has been extracted, cleaned, and stored on the distributed file system into the corresponding partition tables in the open-source data warehouse tool by process, equipment, and time, forming an external table structure.
Preferably, the non-standard data includes missing values, duplicate values, and abnormal values.
The beneficial effects of the invention are as follows:
the method uses data migration, which is well suited to the correlated use of multi-source heterogeneous aluminum/copper plate strip data. It collects data from different equipment in a unified way and can also import historical data into the distributed file system, giving the aluminum/copper plate strip plant data that is unified across space and time. This solves the problems of the traditional aluminum/copper plate strip data storage approach, in which correlations among data from different processes and equipment cannot be used effectively and data accumulated over past years cannot be extracted for use;
the invention uses data cleaning, which solves the problem of non-standard data in the traditional aluminum/copper plate strip data storage approach.
Drawings
To illustrate the embodiments of the present invention or the prior-art technical solutions more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a flow chart of the present invention for building a data warehouse.
FIG. 3 is a flow chart of data compression according to the present invention.
FIG. 4 is a flow chart of intelligent data migration in accordance with the present invention.
FIG. 5 is a diagram of a multi-source multi-modal data migration scheme in accordance with the present invention.
FIG. 6 is a block diagram of a distributed data cleansing system according to the present invention.
FIG. 7 is a flow chart of data cleansing according to the present invention.
FIG. 8 shows the data cleansing scheme of the present invention.
FIG. 9 is a flow chart of data integration according to the present invention.
FIG. 10 is an example of a table structure of a data warehouse of the present invention.
Detailed Description
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict. The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should further be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
The relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise. Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description. Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate. In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
In the description of the present invention, it is to be understood that the orientation or positional relationship indicated by the directional terms such as "front, rear, upper, lower, left, right", "lateral, vertical, horizontal" and "top, bottom", etc., are generally based on the orientation or positional relationship shown in the drawings, and are used for convenience of description and simplicity of description only, and in the absence of any contrary indication, these directional terms are not intended to indicate and imply that the device or element so referred to must have a particular orientation or be constructed and operated in a particular orientation, and therefore should not be considered as limiting the scope of the present invention: the terms "inner and outer" refer to the inner and outer relative to the profile of the respective component itself.
Spatially relative terms, such as "over", "above", "on the upper surface of", "upper", and the like, may be used herein for ease of description to describe the spatial relationship of one device or feature to another device or feature as illustrated in the figures. It will be understood that the spatially relative terms are intended to encompass different orientations of the device in use or operation in addition to the orientation depicted in the figures. For example, if a device in the figures is turned over, devices described as "above" or "on" other devices or configurations would then be oriented "below" or "under" the other devices or configurations. Thus, the exemplary term "above" can include both an orientation of "above" and an orientation of "below". The device may be otherwise oriented (rotated 90 degrees or at other orientations) and the spatially relative descriptors used herein interpreted accordingly.
It should be noted that the terms "first", "second", and the like are used to define the components, and are only used for convenience of distinguishing the corresponding components, and the terms have no special meanings unless otherwise stated, and therefore, the scope of the present invention should not be construed as being limited.
The invention provides the following technical scheme: a method for constructing a full-process data warehouse for aluminum/copper plate strip production. A flow chart of the method is shown in FIG. 1, and the complete process of constructing the data warehouse, shown in FIG. 2, comprises the following steps:
An intelligent extraction and migration method for multi-source heterogeneous data. Data in the full aluminum/copper plate strip production process comes from different sources. Data of various origins and formats is extracted into unified storage, different migration schemes are set according to the attributes of each data source, and the data is stored in dedicated partitions of the Hadoop file system to simplify later operations. After comparing the performance of existing data migration tools, as shown in Table 1, Sqoop (an open-source data transfer tool) was found to best suit the characteristics of aluminum/copper plate strip production data: very large data volume, a need for high migration speed, and a need for simple operation.
Table 1. Comparison of data migration tools
[Table provided as an image in the original publication: Figure BDA0003038373720000061]
Constructing a distributed data cleaning system suited to aluminum/copper plate strip. Data from an aluminum/copper plate strip plant inevitably contains many missing, abnormal, and duplicate values and has low value density; cleaning the migrated raw data improves how efficiently the data can be used. After comparing the performance of existing data cleaning tools, as shown in Table 2, a purpose-built MapReduce data cleaning system was found to suit the aluminum/copper plate strip production environment: it occupies few resources, can load user-defined settings, and can handle complex cleaning tasks.
Table 2. Comparison of data cleaning tools
[Table provided as an image in the original publication: Figure BDA0003038373720000062]
Completing the data integration method for the full aluminum/copper plate strip production process. A data warehouse is subject-oriented, integrated, stable, and time-variant. Compared with storage in a traditional relational database, it is better suited to using large volumes of historical data for decision making and management. A table structure is designed according to the data characteristics and production process characteristics of an aluminum/copper plate strip plant, and user-defined functions (UDFs) with specific functions are written as required to meet the statistical analysis needs of production. After comparing the performance of various data storage tools, as shown in Table 3, Hive was chosen to construct the data warehouse because it suits the characteristics of aluminum/copper plate strip production data: the data volume is very large, the data must be in a format suitable for calculation and statistics, and SQL statements are supported for analysis.
Table 3. Comparison of data storage tools
[Table provided as an image in the original publication: Figure BDA0003038373720000071]
Different compression modes are applied when data is imported, which improves the efficiency of the ETL process and saves a large amount of disk space. The compression scheme for the aluminum/copper plate strip data warehouse was determined by comparing the compression modes shown in Table 4:
Table 4. Comparison of compression modes
[Table provided as an image in the original publication: Figure BDA0003038373720000072]
Further, the intelligent data migration flow shown in FIG. 4 comprises the following steps and functions:
Identifying and importing by file type, as shown in FIG. 5. Because aluminum/copper plate strip production involves complex processes and numerous flows, it generates data files of many different types. An automatic identification function is therefore designed that handles each data file according to its type. The program automatically identifies the type of the data file to be imported and proceeds as follows:
If the data to be imported is relational database data, such as data in an SQL Server database: this kind of data is large in volume and strongly interrelated, so the driver in Sqoop is used for large-scale import.
If the data to be imported is structured stored data, such as Excel files of quality inspection information: this data is generally in key-value (K-V) format and well normalized, so it is imported directly by a Java program.
If the data to be imported is unstructured data, such as text, pictures, or videos: the data is split, stored, and imported according to actual production requirements.
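The three-way dispatch described above can be sketched as follows. This is only an illustration in Python, not the program of the invention: the names `SourceKind`, `classify_source`, and `choose_strategy`, the file-extension lists, and the convention that a relational source is given as a JDBC URL are all assumptions made for the example.

```python
from enum import Enum
from pathlib import PurePath


class SourceKind(Enum):
    RELATIONAL = "relational"        # e.g. SQL Server tables
    STRUCTURED_FILE = "structured"   # e.g. Excel quality-inspection files
    UNSTRUCTURED = "unstructured"    # e.g. text, pictures, videos


# Hypothetical extension lists; a real plant would maintain its own.
STRUCTURED_EXTENSIONS = {".xls", ".xlsx", ".csv"}
UNSTRUCTURED_EXTENSIONS = {".txt", ".log", ".jpg", ".png", ".mp4", ".avi"}

# One import strategy per source kind, following the three cases in the text.
IMPORT_STRATEGY = {
    SourceKind.RELATIONAL: "bulk import through Sqoop",
    SourceKind.STRUCTURED_FILE: "direct import by a loader program",
    SourceKind.UNSTRUCTURED: "split storage import",
}


def classify_source(source: str) -> SourceKind:
    """Classify a source descriptor (JDBC URL or file name) by kind."""
    if source.lower().startswith("jdbc:"):
        return SourceKind.RELATIONAL
    suffix = PurePath(source).suffix.lower()
    if suffix in STRUCTURED_EXTENSIONS:
        return SourceKind.STRUCTURED_FILE
    if suffix in UNSTRUCTURED_EXTENSIONS:
        return SourceKind.UNSTRUCTURED
    raise ValueError(f"unrecognised data source: {source!r}")


def choose_strategy(source: str) -> str:
    """Return the import strategy that matches the source kind."""
    return IMPORT_STRATEGY[classify_source(source)]
```

Unknown sources raise an error rather than falling back to a default, on the assumption that an unrecognised file type should be reviewed by a person before it is imported.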
Analyzing file attributes and setting the import mode. In aluminum/copper plate strip production, much of the data is objective and essentially does not change over a given period, while a larger share is real-time production data that is continuously updated. Data is compressed at import time.
Importing all data in full on every import is disorderly and extremely inefficient, whereas handling the data appropriately at import time saves a large amount of import time and database storage space. For this reason, partitions are allocated in advance in the distributed file system on Hadoop, and data is imported into them in an orderly, structured way.
If the amount of data to be imported is small and essentially does not change, a full import is performed. For example, equipment information in the SQL Server database is essentially static: once imported in full it is not imported again, and if it does need to be updated, the changed data is re-imported in full. If the amount of data to be imported is large and a large amount of new data is generated in each time period, the data for each new period is imported incrementally, on its own, into the file area allocated in advance. This keeps the data well organized and the storage efficient.
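The choice between full and incremental import can be sketched as below. The two helper functions and their parameters are hypothetical; the Sqoop options used (`--connect`, `--table`, `--target-dir`, `--incremental append`, `--check-column`, `--last-value`) are standard `sqoop import` options, but credentials, compression, and mapper settings are deliberately left out, so this is a sketch rather than a complete command.

```python
from typing import List, Optional


def choose_import_mode(is_small: bool, grows_each_period: bool) -> str:
    """Small, static data is imported in full; large, growing data incrementally."""
    if grows_each_period and not is_small:
        return "incremental"
    return "full"


def build_sqoop_args(jdbc_url: str, table: str, target_dir: str, mode: str,
                     check_column: Optional[str] = None,
                     last_value: Optional[str] = None) -> List[str]:
    """Build the argument list for a `sqoop import` call (not executed here)."""
    args = ["sqoop", "import",
            "--connect", jdbc_url,
            "--table", table,
            "--target-dir", target_dir]
    if mode == "incremental":
        if check_column is None or last_value is None:
            raise ValueError("incremental import needs check_column and last_value")
        # Append only the rows whose check column exceeds the last imported value.
        args += ["--incremental", "append",
                 "--check-column", check_column,
                 "--last-value", last_value]
    return args
```

The returned list could be passed to `subprocess.run` on a machine where Sqoop is installed; building the list separately keeps the decision logic testable without a Hadoop cluster.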
Setting a run time for intelligent import. Analysis of aluminum/copper plate strip data shows that production runs essentially around the clock, so assigning dedicated staff to import data would waste considerable human resources. An intelligent import function is therefore designed: the program automatically runs the import at a fixed time each day and sends the run information to the relevant personnel on completion. If the import fails for unexpected reasons, a warning is sent, and the program is re-run or the data is imported manually.
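A minimal sketch of the scheduled import is given below, assuming a hypothetical `import_job` callable that returns the number of imported rows and a hypothetical `notify` callable that delivers a message to the responsible personnel. How the job is actually triggered (cron, a workflow scheduler, or a long-running process) is not specified in the text and is left open here.

```python
from datetime import datetime, time, timedelta
from typing import Callable


def next_run(now: datetime, run_at: time) -> datetime:
    """Return the next occurrence of the fixed daily run time after `now`."""
    candidate = datetime.combine(now.date(), run_at)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def run_import(import_job: Callable[[], int],
               notify: Callable[[str], None]) -> bool:
    """Run one import; report the row count on success, a warning on failure."""
    try:
        rows = import_job()
    except Exception as exc:  # any unexpected failure triggers a warning
        notify(f"WARNING: import failed: {exc}")
        return False
    notify(f"import finished: {rows} rows")
    return True
```

A caller that receives `False` can re-run the job or hand the import over for manual handling, matching the two recovery paths named in the text.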
Further, constructing the distributed data cleaning system suited to aluminum/copper plate strip specifically comprises the following:
Data from the full aluminum/copper plate strip production process is large in volume and low in value density, and contains many missing, abnormal, and duplicate values. By analyzing the characteristics of the production data, the method constructs a distributed data cleaning scheme for aluminum/copper plate strip. The cleaning method for full-process production data is designed as follows:
A distributed data cleaning system suited to aluminum/copper plate strip is constructed as shown in FIG. 6. It consists of the following parts:
Extracted source data: the data obtained through the intelligent extraction process is placed in a designated file partition and contains a large amount of non-standard data, such as missing values, duplicate values, and abnormal values.
Distributed file system: provides partitions for the intelligently extracted source data and extracts the relevant portion of the source data at the user's request.
Cleaning scheme: users can define their own data cleaning rules and set cleaning thresholds as required; the processing method is shown in FIG. 8. For example, when handling missing values, a column is deleted outright if more than 90% of its values are missing; if fewer than 30% are missing, the gaps are filled from context; otherwise they are filled with the mean. If the user has no special requirements, the system provides a default cleaning scheme based on the characteristics of the cleaning method and of aluminum/copper plate strip production data.
Data cleaning engine: the core of the data cleaning system, which uses the MapReduce framework in Hadoop to load the rules of the cleaning scheme and clean the source data.
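The missing-value thresholds given in the cleaning scheme example can be sketched for a single numeric column as follows. This is a single-process illustration, not the MapReduce engine itself. Interpreting "filled from context" as copying the nearest preceding value (or the nearest following value at the start of the column) is an assumption, as are all function names.

```python
from typing import List, Optional

Column = List[Optional[float]]


def missing_ratio(column: Column) -> float:
    """Fraction of entries in the column that are missing (None)."""
    return sum(v is None for v in column) / len(column) if column else 0.0


def fill_from_context(column: Column) -> Column:
    """Fill each gap with the nearest preceding value, else the nearest following one."""
    filled = list(column)
    last = None
    for i, value in enumerate(filled):          # forward pass
        if value is None:
            filled[i] = last
        else:
            last = value
    following = None
    for i in range(len(filled) - 1, -1, -1):    # backward pass for leading gaps
        if filled[i] is None:
            filled[i] = following
        else:
            following = filled[i]
    return filled


def fill_with_mean(column: Column) -> Column:
    """Replace every gap with the mean of the values that are present."""
    present = [v for v in column if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in column]


def clean_missing(column: Column, drop_above: float = 0.9,
                  context_below: float = 0.3) -> Optional[Column]:
    """Apply the thresholds: drop the column, fill from context, or fill with the mean."""
    ratio = missing_ratio(column)
    if ratio > drop_above:
        return None  # the column is deleted
    if ratio < context_below:
        return fill_from_context(column)
    return fill_with_mean(column)
```

The two thresholds are parameters with the 90% and 30% values from the text as defaults, reflecting the statement that users may set cleaning thresholds themselves.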
After the distributed data cleaning system is constructed, data cleaning is carried out through the steps shown in FIG. 7:
Data loading: the source data that needs cleaning is extracted from a designated partition of the distributed file system.
Data preprocessing: the data is preprocessed before cleaning to remove useless content and provide well-structured data for the cleaning step. For example, records lacking a casting number are removed when cleaning casting-process data, and records lacking a coil number are removed when cleaning cold-rolling data.
Data cleaning: the data cleaning engine is started, loads the user-defined or default data cleaning scheme, and processes the source data.
Data checking: check whether the data meets the requirements of the cleaning rules. If it does, the cleaned data is stored in the designated partition of the distributed file system; otherwise, return to the previous step and continue cleaning until the rules are met.
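The load, preprocess, clean, and check steps can be sketched as a simple loop. The sketch runs in a single process on in-memory records; the MapReduce execution described in the text is not reproduced. The record layout, the function names, and the `max_rounds` limit (added so that the loop always terminates) are assumptions.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Rule = Callable[[List[Record]], List[Record]]
Check = Callable[[List[Record]], bool]


def preprocess(records: List[Record], key_field: str) -> List[Record]:
    """Drop records that lack the process key (e.g. casting or coil number)."""
    return [r for r in records if r.get(key_field) not in (None, "")]


def drop_duplicates(records: List[Record]) -> List[Record]:
    """Keep the first occurrence of each identical record."""
    seen = set()
    unique = []
    for record in records:
        fingerprint = tuple(sorted(record.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(record)
    return unique


def run_cleaning(records: List[Record], key_field: str, rules: List[Rule],
                 checks: List[Check], max_rounds: int = 5) -> List[Record]:
    """Preprocess once, then clean and check repeatedly until every check passes."""
    data = preprocess(records, key_field)
    for _ in range(max_rounds):
        for rule in rules:
            data = rule(data)
        if all(check(data) for check in checks):
            return data  # ready to be stored in the designated partition
    raise RuntimeError("cleaning rules not satisfied after max_rounds")
```

Rules and checks are passed in as plain functions so that a user-defined scheme and the default scheme can share the same loop.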
Further, the data integration facing the whole aluminum/copper plate strip production process comprises the following steps:
the production data of the aluminum/copper plate strip has the characteristic of strong correlation, the generation and application of the data of the aluminum/copper plate strip production process surround the whole life cycle of the plate strip, the correlation among the data is strong, and the big data is used for giving a decision and giving a decision basis. An analysis-oriented data warehouse is constructed according to the characteristic, and the data stored on the HDFS (distributed file system) is extracted and cleaned through the steps and stored, and the specific steps are as follows as shown in FIG. 9:
a table structure is designed. Setting a pre-stored table structure in a Hive (open source data warehouse tool), respectively setting a table for the processes of the aluminum/copper plate strip production flow, partitioning each process according to equipment, and partitioning each equipment according to time periods, for example, from 00: 00: 00 to 23: 59: the data in the 59 time period is considered as a partition. The user can find data in a product ID life cycle in a microscopic manner in a process flow, and also can perform statistics on data of three dimensions of time, equipment and process through partitioning, so as to provide a basis for decision making, as shown in fig. 10, for example, a casting flow is taken as an example, data is partitioned according to equipment, and each equipment is partitioned according to time, for example, partitioned data generated by a casting machine in the casting flow from 1/2021 to 1/4/2021.
Loading the data. The data that have been extracted, cleaned, and stored on the HDFS are mapped to the corresponding partitioned tables in Hive by process, equipment, and time, forming an external table structure; deleting a table in Hive therefore does not affect the data on the HDFS.
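The table design and loading steps can be illustrated with a short Python sketch that generates HiveQL statements. It is a hedged example only: the HDFS root `/warehouse/strip`, the partition column names `equipment` and `dt`, the data columns, the comma-delimited text format, and the helper names are assumptions, not details given in the patent.

```python
from datetime import date
from typing import List, Tuple

WAREHOUSE_ROOT = "/warehouse/strip"  # hypothetical HDFS root directory


def create_table_sql(process: str, columns: List[Tuple[str, str]]) -> str:
    """Build the DDL for one external table per process, partitioned by equipment and day."""
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {process} (\n  {cols}\n)\n"
        "PARTITIONED BY (equipment STRING, dt STRING)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{WAREHOUSE_ROOT}/{process}'"
    )


def add_partition_sql(process: str, equipment: str, day: date) -> str:
    """Map one cleaned HDFS directory to the matching process-equipment-time partition."""
    dt = day.isoformat()
    location = f"{WAREHOUSE_ROOT}/{process}/equipment={equipment}/dt={dt}"
    return (
        f"ALTER TABLE {process} ADD IF NOT EXISTS "
        f"PARTITION (equipment='{equipment}', dt='{dt}') "
        f"LOCATION '{location}'"
    )


# Illustrative values: one casting machine, four daily partitions.
ddl = create_table_sql("casting", [("product_id", "STRING"), ("temperature_c", "DOUBLE")])
statements = [add_partition_sql("casting", "caster_1", date(2021, 1, d)) for d in range(1, 5)]
```

Because the table is declared `EXTERNAL`, dropping it in Hive removes only the table metadata and leaves the files under the `LOCATION` directory in place, which matches the behavior described above.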
Adding custom functions. User-defined functions (UDFs) are written to suit the particular characteristics of aluminum/copper plate strip production data and to handle more complex tasks. For example, when counting the energy consumed each day, a user-defined function is set so that, when a record whose start time and end time do not fall on the same day is found, only the consumption within the time period required by the user is taken.
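The logic of such an energy function might look like the following Python sketch. The patent does not say how the consumption is split across the day boundary; this example assumes energy is consumed at a constant rate over the record's interval, and the function name `energy_in_window` and its parameters are hypothetical.

```python
from datetime import datetime


def energy_in_window(
    start: datetime,
    end: datetime,
    energy_kwh: float,
    window_start: datetime,
    window_end: datetime,
) -> float:
    """Return the part of energy_kwh that falls inside [window_start, window_end).

    Assumption: energy is consumed at a constant rate between start and end.
    """
    if end <= start:
        raise ValueError("end must be later than start")
    overlap_start = max(start, window_start)
    overlap_end = min(end, window_end)
    if overlap_end <= overlap_start:
        return 0.0  # the record lies entirely outside the requested window
    fraction = (overlap_end - overlap_start) / (end - start)
    return energy_kwh * fraction


# A record that runs from 22:00 on one day to 02:00 on the next and uses 400 kWh.
first_day = energy_in_window(
    datetime(2021, 1, 1, 22, 0),
    datetime(2021, 1, 2, 2, 0),
    400.0,
    datetime(2021, 1, 1, 0, 0),
    datetime(2021, 1, 2, 0, 0),
)
```

Under the constant-rate assumption, two of the four hours fall on the first day, so half of the energy is attributed to it.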
Performing statistical analysis. Users query and aggregate the data in the data warehouse, which provides a basis for analysis and decision-making.
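As a rough illustration of statistics along the process, equipment, and time dimensions, the following Python sketch totals a value over any combination of those dimensions. The function `total_by`, the field names, and the sample rows are invented for illustration; in Hive the same result would normally come from a GROUP BY query over the partition columns.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def total_by(
    records: List[dict],
    dims: Sequence[str],
    value_field: str = "energy_kwh",
) -> Dict[Tuple, float]:
    """Sum value_field over the records, grouped by the given dimension fields."""
    totals = defaultdict(float)
    for record in records:
        key = tuple(record[d] for d in dims)
        totals[key] += record[value_field]
    return dict(totals)


# Invented sample rows, one per process-equipment-day partition.
rows = [
    {"process": "casting", "equipment": "caster_1", "dt": "2021-01-01", "energy_kwh": 100.0},
    {"process": "casting", "equipment": "caster_1", "dt": "2021-01-02", "energy_kwh": 150.0},
    {"process": "casting", "equipment": "caster_2", "dt": "2021-01-01", "energy_kwh": 50.0},
    {"process": "rolling", "equipment": "mill_1", "dt": "2021-01-01", "energy_kwh": 200.0},
]
per_process = total_by(rows, ["process"])
per_equipment_day = total_by(rows, ["equipment", "dt"])
```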
The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solutions and inventive concept, shall fall within the scope of protection of the present invention.

Claims (5)

1. A method for constructing an aluminum/copper plate strip production full-flow data warehouse, characterized by comprising the following steps:
acquiring an existing aluminum/copper plate strip data file, identifying the data type of the aluminum/copper plate strip data file, analyzing the data file type, setting an importing mode, and importing data according to the importing mode to obtain migrated data;
the migrated data is stored in a partition mode, the non-standard data in the migrated data is inquired, a cleaning scheme and a cleaning rule are determined according to the type of the non-standard data, and the migrated data is cleaned according to the cleaning rule to obtain cleaned data;
and constructing a data characteristic table, placing the cleaned data into the data characteristic table, and mapping the data in the data characteristic table to a data warehouse.
2. The method for constructing the aluminum/copper plate strip production full-flow data warehouse as claimed in claim 1, wherein analyzing the data file type and setting the importing mode comprises the following steps:
judging the type of the pre-imported file, and importing the pre-imported file by adopting a corresponding strategy according to different file types;
judging the attribute of the data in the pre-imported file, and importing the data by adopting corresponding strategies according to different data attributes;
and setting a timed intelligent import script to import the file into the distributed file system for partition storage according to the periodicity of the data production time.
3. The aluminum/copper plate strip production full-flow data warehouse construction method as claimed in claim 2, wherein the concrete steps of cleaning the migrated data according to the cleaning rules are as follows:
extracting source data needing cleaning from a specified partition of the distributed file system;
performing data preprocessing on source data before data cleaning, and removing useless data content to obtain preprocessed data;
cleaning the migrated data according to a cleaning rule;
checking whether the data meet the requirements of the cleaning rules, and if so, storing the cleaned data into an appointed partition of the distributed file system; otherwise, returning to the previous step to continue cleaning until the rule is met.
4. The aluminum/copper strip production full flow data warehouse construction method according to claim 1, wherein the construction of the data feature table comprises the following steps:
setting a pre-stored table structure in an open source data warehouse tool, respectively setting a table for the processes of the aluminum/copper plate strip production flow, partitioning each process according to equipment, and partitioning each equipment according to time periods;
and mapping the data which is extracted, cleaned and stored on the distributed file system into a corresponding partition table in the open-source data warehouse tool according to the process-equipment-time to form an external table structure.
5. The aluminum/copper plate strip production full-flow data warehouse construction method according to claim 1, characterized in that: the non-standard data include data containing missing values, repeated values, and abnormal values.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110450036.XA CN113127449A (en) 2021-04-25 2021-04-25 Method for constructing aluminum/copper plate strip production full-flow data warehouse

Publications (1)

Publication Number Publication Date
CN113127449A true CN113127449A (en) 2021-07-16

Family

ID=76779868

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110450036.XA Pending CN113127449A (en) 2021-04-25 2021-04-25 Method for constructing aluminum/copper plate strip production full-flow data warehouse

Country Status (1)

Country Link
CN (1) CN113127449A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6178418B1 (en) * 1998-07-28 2001-01-23 Noetix Corporation Distributed data warehouse query and resource management system
US20100174720A1 (en) * 2006-04-26 2010-07-08 Robert Mack Coherent data identification method and apparatus for database table development
US20110110515A1 (en) * 2009-11-11 2011-05-12 Justin Tidwell Methods and apparatus for audience data collection and analysis in a content delivery network
CN102880123A (en) * 2012-08-28 2013-01-16 浙江大学 System and method for controlling production process of petrochemical enterprise on basis of manufacturing execution system (MES) workflow
CN107766541A (en) * 2017-10-30 2018-03-06 北京国电通网络技术有限公司 With electricity consumption overall situation full dose data transfer and storage method, device, electronic equipment
CN110209650A (en) * 2019-05-05 2019-09-06 苏宁易购集团股份有限公司 The regular moving method of data, device, computer equipment and storage medium
CN111708773A (en) * 2020-08-13 2020-09-25 江苏宝和数据股份有限公司 Multi-source scientific and creative resource data fusion method
CN111767267A (en) * 2020-06-18 2020-10-13 杭州数梦工场科技有限公司 Metadata processing method and device and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ASHISH THUSOO等: "Hive- a petabyte scale data warehouse using hadoop", 《2010 IEEE 26TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING》, pages 1 - 9 *
温国强: "数据仓库和数据挖掘在合金生产控制中的应用", 《中国优秀硕士学位论文全文数据库 信息科技辑》, pages 138 - 865 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned

Effective date of abandoning: 20240322