CN116864020B

CN116864020B - Data management system applied to EGDA generation process

Info

Publication number: CN116864020B
Application number: CN202311132545.3A
Authority: CN
Inventors: 王刚; 李星
Original assignee: Shandong Luyang New Materials Technology Co ltd
Current assignee: Shandong Luyang New Materials Technology Co ltd
Priority date: 2023-09-05
Filing date: 2023-09-05
Publication date: 2023-11-03
Anticipated expiration: 2043-09-05
Also published as: CN116864020A

Abstract

The invention relates to the field of data processing, and in particular to a data management system applied to an EGDA generation process. The system comprises: a data acquisition and preprocessing module, which acquires temperature time-series data and residual time-series data; a data analysis module, which obtains the aggregation degree from the differences between the coordinates of the residual values, acquires a target point and the extreme points around it, obtains the small sequence segments and their trend features, obtains the characteristic trend distribution degree from the trend features of the small sequence segments, obtains the suspected esterification abnormality degree from the aggregation degree and the characteristic trend distribution degree, and obtains marked data points from the suspected esterification abnormality degree; and a data management module, which acquires the abnormal points, eliminates data from the residual time-series data according to the abnormal points and the marked data points, and stores and manages the residual time-series data after elimination. By using this data processing method, the invention makes the abnormality detection result more accurate in the EGDA production scenario.

Description

Data management system applied to EGDA generation process

Technical Field

The invention relates to the technical field of data processing, in particular to a data management system applied to an EGDA generation process.

Background

Ethylene glycol diacetate (EGDA), also known as ethylene diacetate, is an organic ester compound prepared by the reaction of ethylene glycol and acetic acid. It is commonly used as an organic solvent and has wide application in paints, inks, adhesives, cleaning agents and similar fields. It has good solubility and volatility and can dissolve many organic and inorganic substances. In addition, because of its lower toxicity and volatility, ethylene glycol diacetate is considered a more environmentally friendly alternative in many industrial applications.

During the production of EGDA, a large amount of data is generated and needs to be collected, such as reaction time, reaction temperature, reactant concentration and catalyst concentration. Control of the reaction temperature is especially critical in the preparation process: an unsuitable reaction temperature not only slows the reaction but can also lead to an incomplete reaction, reduced product quality, and even a safety risk caused by temperature runaway. It is therefore important to ensure that the reaction temperature stays within the proper range during preparation. Traditional temperature anomaly detection in EGDA production generally applies STL time-series decomposition and then 3-sigma segmentation to the residual obtained by the decomposition, so that data with a large degree of outlying are identified as anomalous.

In the industrial production of EGDA, the ambient temperature is controlled at 110-150 ℃, so the process temperature fluctuates within a certain range. The traditional way of detecting temperature abnormality in the production process generally applies STL time-series decomposition to obtain a residual term and regards points with an excessively large fluctuation amplitude, such as a sudden drop or sudden rise in temperature, as abnormal points. The direct esterification of ethylene glycol and acetic acid is the most common synthetic method for EGDA. The esterification reaction is exothermic: bond breaking absorbs heat at the beginning, and a large amount of heat is then released as the ester forms. The generation of EGDA therefore involves a small heat absorption followed by a large heat release, so the temperature may rise or fall. The conventional STL approach captures this heat change at the moment of EGDA generation as an anomaly, which limits the conventional method for temperature anomaly detection in this scenario. The present method builds a characteristic model of the EGDA temperature change from the temperature change in the current scenario, so as to obtain a relatively accurate abnormal point detection result and realize accurate monitoring and management of the temperature data.

Disclosure of Invention

The invention provides a data management system applied to an EGDA generation process, which aims to solve the existing problems.

The data management system applied to the EGDA generation process adopts the following technical scheme:

one embodiment of the present invention provides a data management system applied to an EGDA generation process, the system including the following modules:

the data acquisition and preprocessing module acquires temperature time sequence data in the EGDA generation process, and decomposes and acquires residual time sequence data, wherein the residual time sequence data comprises a plurality of data points, and each data point represents a residual value at each moment;

the data analysis module acquires a time sequence change curve of the residual according to the residual time sequence data, acquires coordinates of each residual value in the curve, and acquires the aggregation degree of each data point in the residual time sequence data according to the difference between the coordinates of each residual value;

acquiring a data point closest to the maximum value according to the time sequence change curve of the residual error, and marking the data point as a target point;

acquiring a plurality of extreme points nearest to a target point, acquiring a plurality of small sequence fragments according to the target point and each extreme point, acquiring trend characteristics of each small sequence fragment, and acquiring characteristic trend distribution degrees of data points in residual time sequence data according to the trend characteristics of all the small sequence fragments;

obtaining suspected esterification abnormality degree of each data point in the residual sequence data according to aggregation degree of each data point in the residual sequence data and characteristic trend distribution degree of the data points in the residual sequence data;

obtaining a marked data point according to the suspected esterification abnormality degree of each data point in the residual sequence data;

and the data management module is used for obtaining abnormal points according to the residual time sequence data, alarming when the data points are both abnormal points and marked data points, and realizing the safety management of the EGDA production process.

Further, the specific acquisition steps of the residual time sequence data are as follows:

STL time-series decomposition is applied to the temperature time-series data to obtain the residual time-series data.

Further, the specific step of obtaining the aggregation degree of each data point in the residual time sequence data is as follows:

the formula for the aggregation level of each data point in the residual time series data is as follows:

where $(x_i, y_i)$ are the abscissa and ordinate of the $i$-th data point in the residual time-series data, $(x_{i+1}, y_{i+1})$ are the abscissa and ordinate of the $(i+1)$-th data point, $n$ is the number of elements in the residual time-series data, $\tanh$ is the hyperbolic tangent function, $D_i$ is the aggregation degree of the $i$-th data point in the residual time-series data, and the counting term is the number of data points in the neighborhood of radius $r$ centered on the $i$-th data point in the residual sequence.

Further, the acquiring the plurality of extreme points closest to the target point includes the following specific steps:

the extreme points comprise two minimum value points and two maximum value points;

acquiring a minimum point which is left of the target point and closest to the target point, and marking the minimum point as a first minimum point; acquiring a maximum point which is left of the target point and closest to the target point, and marking the maximum point as a first maximum point; acquiring a minimum point which is right of the target point and closest to the target point, and marking the minimum point as a second minimum point; and acquiring a maximum point which is right to the target point and closest to the target point, and recording the maximum point as a second maximum point.

Further, the obtaining a plurality of small sequence segments according to the target point and each extreme point comprises the following specific steps:

the residual data points from the target point to the first minimum point are acquired and recorded as the first small sequence segment; the residual data points from the target point to the second minimum point are acquired and recorded as the second small sequence segment; the residual data points from the first minimum point to the first maximum point are acquired and recorded as the third small sequence segment; and the residual data points from the second minimum point to the second maximum point are acquired and recorded as the fourth small sequence segment.

Further, the specific acquisition steps of the trend characteristic of each small sequence segment are as follows:

for any small sequence segment, firstly acquiring slopes between adjacent data points in the small sequence segment, forming a slope sequence by slopes between all adjacent data points, acquiring differences between adjacent slopes in the slope sequence, marking the differences as first differences, and adding all the first differences to obtain trend characteristics of each small sequence segment; wherein the difference represents the absolute value of the difference.

Further, the specific step of obtaining the characteristic trend distribution degree of the data points in the residual time sequence data is as follows:

the formula of the characteristic trend distribution degree of the data points in the residual time sequence data is as follows:

where the formula uses the number of data in each of the four small sequence segments and the trend feature of each of the four small sequence segments, and $T$ denotes the characteristic trend distribution degree of the data points in the residual time-series data.

Further, the specific acquisition steps of the suspected esterification abnormality degree of each data point in the residual sequence data are as follows:

the formula of the suspected esterification abnormality degree of each data point in the residual sequence data is as follows:

where $D_i$ denotes the aggregation degree of the $i$-th data point in the residual time-series data, $T$ denotes the characteristic trend distribution degree of the data points in the residual time-series data, the two weighting parameters are preset thresholds, $\exp(\cdot)$ denotes the exponential function with the natural constant as its base, and the result is the suspected esterification abnormality degree of the $i$-th data point in the residual time-series data.

Further, the specific acquisition steps of the marker data points are as follows:

and marking residual data points with suspected esterification abnormality degrees higher than the classification threshold Y as marker data points.

Further, the specific acquisition steps of the abnormal points are as follows:

the mean of the residual time-series data is obtained from all the data in the residual time-series data and recorded as $\mu$; the standard deviation of the residual time-series data is obtained from all the data and recorded as $\sigma$; and the data in the residual time-series data whose values lie in the ranges $(-\infty, \mu - 3\sigma)$ and $(\mu + 3\sigma, +\infty)$ are acquired and recorded as abnormal points.

The technical scheme of the invention has the beneficial effects that: the characteristic of the esterification reaction temperature change is utilized to obtain change data points caused by the esterification reaction temperature change, the change data points are reflected as deviation points in residual errors, and the temperature outlier points caused by real anomaly reasons are obtained by eliminating the data points.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a block flow diagram of a data management system of the present invention as applied to an EGDA generation process.

Detailed Description

In order to further describe the technical means and effects adopted by the present invention to achieve the preset purposes, the following detailed description refers to the specific implementation, structure, characteristics and effects of the data management system for the EGDA generating process according to the present invention with reference to the accompanying drawings and the preferred embodiments. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

The following specifically describes a specific scheme of the data management system applied to the EGDA generating process provided by the present invention with reference to the accompanying drawings.

Referring to fig. 1, a block flow diagram of a data management system applied to an EGDA generation process according to one embodiment of the present invention is shown, where the system includes the following blocks:

Module 101: data acquisition and preprocessing module.

It should be noted that, since the temperature is controlled within a certain range during the production of EGDA and temperature control is critical, this embodiment analyzes the change of temperature during the production of EGDA.

Specifically, temperature time-series data over one day are collected by the temperature monitor of the industrial EGDA process. A coordinate system is constructed with time in seconds as the horizontal axis and temperature, starting from 100, as the vertical axis, and the time-series curve of the temperature data is obtained in this coordinate system from the temperature time-series data. STL time-series decomposition is applied to the time-series curve of the temperature data, which decomposes it into three terms: trend-term time-series data, seasonal-term time-series data and residual time-series data. The residual time-series data comprises a plurality of data points, each representing the residual value at one moment.

So far, residual time sequence data are obtained.

Module 102: data analysis module.

It should be noted that the ambient temperature data in the EGDA generation process is time-series data that fluctuates up and down. When the residual obtained by STL time-series decomposition is plotted on coordinate axes, most residual values cluster near the central axis, and the closer a data point is to the central axis, the smaller its degree of abnormality usually is. However, this separates outliers only by their distance from the axis and does not consider data points that become outliers because of the esterification reaction in the current scenario. Therefore, abnormal points and suspected esterification data points are distinguished by their characteristics: a suspected esterification abnormality degree evaluation model is constructed by combining the spatial aggregation feature formed by the point density of the temperature residuals of the esterification reaction with the trend feature of the point arrangement. The residual data points are classified with this model, the abnormal residual points caused by the esterification reaction are obtained, and they are removed from the final analysis of residual abnormal points.

(1) Acquiring the aggregation degree of each data point in the residual time-series data.

It should be further noted that the time-series curve of the residual can be obtained from the residual time-series data. The aggregation degree of the residual data points is the aggregation feature of the data in the plane of the curve image, so the density of the data is analyzed. Borrowing part of the idea of DBSCAN density clustering, a search is performed in the circular region centered on each data point with radius $r$, where $r$ is generally taken as the average spacing between data points, i.e. the average density. Within the same unit of time, abnormal data points mostly change abruptly and have a large degree of outlying, so there is only one point, or even no point, within the radius around an abnormal point. The esterification reaction is a chemical reaction process in which the data points change steadily and uniformly within the same unit of time, so two to three points should exist within the radius around such a point. An inflection point of the time-series curve is more likely to have many adjacent points within the radius of its circular region and therefore has a higher aggregation degree. Because the data points misjudged as abnormal points are usually located at inflection points, i.e. at the highest or lowest points, a higher aggregation degree increases the likelihood that the point is a suspected esterification abnormality.

Specifically, a time sequence change curve of the residual is obtained by fitting the residual time sequence data, coordinates of each residual value in the curve are obtained, and the aggregation degree of data points in the residual time sequence is obtained according to the difference between the coordinates of each residual value. The formulation is as follows:

where $\sqrt{(x_{i+1}-x_i)^2+(y_{i+1}-y_i)^2}$ is the Euclidean distance between the $(i+1)$-th data point and the $i$-th data point, i.e. the straight-line distance between the two coordinate points; the sum of these terms is the total distance between the $n$ data points; and their average is the mean distance between adjacent data points, which is used as the radius $r$ of the circular neighborhood. The counting term denotes a search in the circular neighborhood of radius $r$ centered on the $i$-th data point, and its result is the number of adjacent data points present in the neighborhood. The leading coefficient 0.5 keeps this count within a suitable range before $\tanh$ is applied, so that a count of about 1 corresponds to a normalized value of about 0.5 and the final normalization behaves better. The aggregation degree of each data point in the residual is thus obtained.

Note that the time-series curve in this embodiment is obtained by least-squares fitting of a fifth-degree polynomial.

Thus, the aggregation degree of each data point in the residual time sequence data is obtained.
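The neighborhood search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented formula, which is not reproduced in this text: the function name `aggregation_degrees` is hypothetical, the radius is taken as the mean distance over the consecutive point pairs, the point itself is excluded from its own neighbor count, and the final scaling `tanh(0.5 * count)` is an assumed reading of the description of the 0.5 coefficient and the hyperbolic tangent.

```python
import math


def aggregation_degrees(points):
    """Return (radius, neighbor_counts, degrees) for a list of (x, y) residual points."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    # Radius r: mean Euclidean distance between consecutive points.
    gaps = [math.dist(points[i], points[i + 1]) for i in range(len(points) - 1)]
    radius = sum(gaps) / len(gaps)
    counts = []
    for i, p in enumerate(points):
        # Count the other points within distance r of p.
        # ASSUMPTION: the point itself is not counted as its own neighbor.
        c = sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        counts.append(c)
    # ASSUMPTION: tanh(0.5 * count) stands in for the normalization, whose
    # exact form is not preserved in the text.
    degrees = [math.tanh(0.5 * c) for c in counts]
    return radius, counts, degrees


pts = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 10)]
radius, counts, degrees = aggregation_degrees(pts)
print(counts)   # the isolated last point has no neighbors within the radius
```

Under this assumed scaling, a point with at most one neighbor stays below 0.5 and a point with two or more neighbors exceeds 0.5, which agrees with the description of isolated abnormal points versus densely surrounded esterification points.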

(2) Obtaining the characteristic trend distribution degree of the data points in the residual time-series data.

The trend distribution characteristics are then analyzed, again using the time-series curve of the residual. Each cycle of the seasonal term separated from the original temperature data sequence appears in the graph with a short period and a small amplitude, whereas the exothermic process of the esterification reaction in mass production is smooth and uniform and appears in the temperature data sequence with a long period and a large amplitude. It can therefore be inferred that the residual obtained by subtracting the seasonal term from the esterification part of the original temperature sequence has the following form: as time progresses, it gradually moves away from the central axis and, after reaching the farthest point, gradually returns to the central axis following the same pattern.

Further, since the esterification points that are detected as abnormal points are usually located at the top of the temperature-change curve, the trend feature is detected starting from the vertex. First, all vertices in the residual term are found, a vertex being a maximum on the time-series curve of the residual. Starting from the vertex, the slopes of the data on the left and right sides change little; after the valley point is reached and the first large slope change occurs, the slope changes between the following data points are again small. The total number of such regular trend sections is uncertain, so only the two curve sections on each side of the peak are taken to indicate that the peak is the peak of an esterification temperature-change curve. The overall degree of trend change of a curve is reflected by the accumulated sum and the average of the slope differences between successive points of the curve: the smaller the average, the smaller the change of the curve, and the smoother its trend distribution. A trend distribution degree that reflects the trend feature of the esterification reaction temperature-change data is thereby obtained.

Specifically, the data point closest to the maximum value is acquired from the time-series curve of the residual and recorded as the target point. When this target point lies at the far left or far right of the curve, the data point closest to the second-largest value may be selected as the target point instead, so that two extreme points exist on each side as required later; if two extreme points still do not exist on both sides, the data point closest to the third-largest value is selected, and so on, until two extreme points exist on each side of the target point, at which point the target point selection is complete.

And obtaining the characteristic trend distribution degree of the data points in the residual time sequence according to the difference value of the slopes of the data points in the residual time sequence.

Firstly, all maximum points and minimum points are obtained according to a time sequence change curve of a residual, and because a target point is one maximum value in the time sequence change curve of the residual, one minimum point which is left of the target point and is closest to the target point is obtained and is recorded as a first minimum point; acquiring a maximum point which is left of the target point and closest to the target point, and marking the maximum point as a first maximum point; acquiring a minimum point which is right of the target point and closest to the target point, and marking the minimum point as a second minimum point; and acquiring a maximum point which is right to the target point and closest to the target point, and recording the maximum point as a second maximum point.

The residual data points from the target point to the first minimum point (including both the target point and the first minimum point) are acquired and recorded as the first small sequence segment, and the number of data in it is recorded. The residual data points from the target point to the second minimum point (including both the target point and the second minimum point) are acquired and recorded as the second small sequence segment, and the number of data in it is recorded. The residual data points from the first minimum point to the first maximum point (including the first maximum point but not the first minimum point) are acquired and recorded as the third small sequence segment, and the number of data in it is recorded. The residual data points from the second minimum point to the second maximum point (including the second maximum point but not the second minimum point) are acquired and recorded as the fourth small sequence segment, and the number of data in it is recorded.
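As an illustration of the target point selection and the four small sequence segments, the following Python sketch works on a plain list of residual values. The function names `local_extrema` and `pick_segments` are hypothetical, and the sketch assumes strict local extrema without flat runs, so that maxima and minima alternate; it is one possible reading of the steps above rather than the embodiment's own implementation.

```python
def local_extrema(values):
    """Return (maxima, minima) as index lists of strict interior local extrema."""
    maxima, minima = [], []
    for i in range(1, len(values) - 1):
        if values[i] > values[i - 1] and values[i] > values[i + 1]:
            maxima.append(i)
        elif values[i] < values[i - 1] and values[i] < values[i + 1]:
            minima.append(i)
    return maxima, minima


def pick_segments(values):
    """Choose the target point and return (target_index, [seg1, seg2, seg3, seg4])."""
    maxima, minima = local_extrema(values)
    # Try the largest maximum first, then the second largest, and so on,
    # until a minimum and a maximum exist on each side of the candidate.
    for t in sorted(maxima, key=lambda i: values[i], reverse=True):
        left_min = [i for i in minima if i < t]
        left_max = [i for i in maxima if i < t]
        right_min = [i for i in minima if i > t]
        right_max = [i for i in maxima if i > t]
        if left_min and left_max and right_min and right_max:
            m1, x1 = left_min[-1], left_max[-1]   # nearest extrema on the left
            m2, x2 = right_min[0], right_max[0]   # nearest extrema on the right
            seg1 = list(range(m1, t + 1))         # first minimum .. target, both included
            seg2 = list(range(t, m2 + 1))         # target .. second minimum, both included
            seg3 = list(range(x1, m1))            # first maximum included, first minimum excluded
            seg4 = list(range(m2 + 1, x2 + 1))    # second minimum excluded, second maximum included
            return t, [seg1, seg2, seg3, seg4]
    return None, []


target, segments = pick_segments([0, 2, 1, 5, 1, 3, 0])
print(target, segments)
```

The segments are returned as lists of indices in time order, so the corresponding residual values and time stamps can be looked up afterwards. If no maximum has the required extrema on both sides, the sketch returns `None` and an empty list.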

Then the slopes between adjacent data points of the first small sequence segment are calculated and the trend feature of the first small sequence segment is obtained; the trend features of the second, third and fourth small sequence segments are obtained in the same way. For a small sequence segment containing $m$ data points, the trend feature is expressed as:

$$\sum_{i=1}^{m-2} \left| k_{i+1} - k_i \right|$$

where $k_i$ denotes the $i$-th slope between adjacent data points of the segment, $k_{i+1}$ denotes the $(i+1)$-th slope between adjacent data points, and $m$ denotes the number of data in the segment, which is recorded separately for each of the four small sequence segments; evaluating the expression for each segment gives the trend feature of the first, second, third and fourth small sequence segment respectively.
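The trend feature of one small sequence segment, i.e. the sum of the absolute differences between adjacent slopes, can be sketched as follows. The function name `trend_feature` and the argument names are illustrative assumptions; the time stamps are assumed to be strictly increasing so that every slope is defined.

```python
def trend_feature(xs, ys):
    """Sum of absolute differences between consecutive slopes of one segment.

    xs: time stamps of the segment (assumed strictly increasing)
    ys: residual values at those time stamps
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    # Slope between each pair of adjacent data points.
    slopes = [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]
    # Accumulate the absolute change from one slope to the next.
    return sum(abs(slopes[i + 1] - slopes[i]) for i in range(len(slopes) - 1))


straight = trend_feature([0, 1, 2, 3], [0, 1, 2, 3])   # constant slope
zigzag = trend_feature([0, 1, 2, 3], [0, 1, 0, 1])     # slope flips sign twice
print(straight, zigzag)
```

A straight or smoothly bending segment gives a small value, while a jagged segment gives a large one, which matches the idea that the esterification temperature curve is the smoother of the two. A segment with fewer than three points has no pair of slopes to compare and yields 0.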

Based on the trend features of the first, second, third and fourth small sequence segments, the characteristic trend distribution degree of the data points in the residual time-series data is obtained, which is expressed by the following formula:

where the formula uses the number of data in each of the four small sequence segments and the trend feature of each of the four small sequence segments, and $T$ denotes the characteristic trend distribution degree of the data points in the residual time-series data. The slope differences obtained in each segment are accumulated and averaged; the averaging reduces the result as the number of data points following the same trend increases, thereby accentuating the characteristics of the esterification reaction temperature-change curve.

And thus, obtaining the characteristic trend distribution degree of the data points in the residual time sequence data.

(3) Constructing a suspected esterification abnormality degree evaluation model of the data according to the aggregation degree and the characteristic trend distribution degree of the data points in the residual time series.

Specifically, two thresholds are preset. This embodiment is described using particular example values of the two thresholds, which are not limiting; the two thresholds may be chosen according to the particular implementation. The aggregation degree and the characteristic trend distribution degree are weighted, and the suspected esterification abnormality degree evaluation model of the data is constructed.

Thus, the suspected esterification abnormality degree of each data point in the residual sequence is obtained.

(4) Classifying the residual time-series data with the suspected esterification abnormality degree evaluation model.

After normalization, an aggregation degree $D \geq 0.5$ means that the aggregation density around the data point is large, and the suspected esterification abnormality degree of the data is correspondingly higher. For the characteristic trend distribution degree, the reference level $T = 1$ basically reflects a curve of esterification reaction temperature change. Combining the two through the weighting, the final classification threshold $Y$ is 0.44. Residual data points with a suspected esterification abnormality degree higher than 0.44 are recorded as marker data points; residual data points with a suspected esterification abnormality degree less than or equal to 0.44 are recorded as non-marker data points.

Module 103: data management module.

The mean of the residual time-series data is obtained from all the data in the residual time-series data and recorded as $\mu$, and the standard deviation of the residual time-series data is obtained from all the data and recorded as $\sigma$. The data in the residual time-series data whose values lie in the ranges $(-\infty, \mu - 3\sigma)$ and $(\mu + 3\sigma, +\infty)$ are acquired and recorded as abnormal points, and residual data points with a suspected esterification abnormality degree higher than 0.44 are eliminated from all abnormal points. When a data point is both a marker data point and an abnormal point, the temperature corresponding to the data point is abnormal, which indicates that an abnormal condition exists in the EGDA generation process; a warning is then issued, thereby realizing safety management of the EGDA production process.
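The 3-sigma step on the residual values can be sketched as below. This is a minimal illustration, assuming the population standard deviation is intended (the text does not say whether the population or sample form is used); the function name `three_sigma_outliers` is hypothetical, and the combination with the marker data points is left out.

```python
import statistics


def three_sigma_outliers(residuals):
    """Return the indices of residuals lying outside [mu - 3*sigma, mu + 3*sigma]."""
    mu = statistics.fmean(residuals)
    # ASSUMPTION: population standard deviation over all residual values.
    sigma = statistics.pstdev(residuals)
    lower, upper = mu - 3 * sigma, mu + 3 * sigma
    return [i for i, r in enumerate(residuals) if r < lower or r > upper]


residuals = [0.0] * 20 + [10.0]
print(three_sigma_outliers(residuals))   # only the last value lies beyond three sigma
```

Because the mean and standard deviation are computed from all residuals, a single extreme value also inflates $\sigma$; with very short series no point may exceed the 3-sigma band at all.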

This embodiment is completed.

The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims

1. A data management system for use in an EGDA generation process, the system comprising:

2. The data management system applied to the EGDA generating process according to claim 1, wherein the specific acquisition steps of the residual time series data are as follows:

3. The data management system applied to the EGDA generating process according to claim 1, wherein the specific obtaining step of the aggregation degree of each data point in the residual time series data is as follows:

where $(x_i, y_i)$ are the abscissa and ordinate of the $i$-th data point in the residual time-series data, $(x_{i+1}, y_{i+1})$ are the abscissa and ordinate of the $(i+1)$-th data point, $n$ is the number of elements in the residual time-series data, $\tanh$ is the hyperbolic tangent function, $D_i$ is the aggregation degree of the $i$-th data point in the residual time-series data, and the counting term is the number of data points in the neighborhood of radius $r$ centered on the $i$-th data point in the residual sequence.

4. The data management system applied to the EGDA generating process according to claim 1, wherein the acquiring the plurality of extreme points closest to the target point comprises the following specific steps:

5. The data management system applied to the EGDA generating process according to claim 4, wherein the obtaining a plurality of small sequence segments according to the target point and each extreme point comprises the following specific steps:

6. The data management system for use in an EGDA generation process according to claim 1, wherein the specific acquisition steps of the trend feature of each small sequence segment are as follows:

7. The data management system applied to the EGDA generating process according to claim 5, wherein the specific obtaining step of the characteristic trend distribution degree of the data points in the residual time series data is as follows:

where the formula uses the number of data in each of the four small sequence segments and the trend feature of each of the four small sequence segments, and $T$ denotes the characteristic trend distribution degree of the data points in the residual time-series data.

8. The data management system applied to the EGDA generating process according to claim 1, wherein the specific obtaining step of the suspected esterification abnormality degree of each data point in the residual sequence data is as follows:

9. The data management system for use in an EGDA generation process according to claim 1, wherein the specific acquisition steps of the marker data points are as follows:

10. The data management system applied to the EGDA generation process according to claim 1, wherein the specific acquisition steps of the outlier are as follows: