
CN111935226A - Method and system for realizing streaming computing by supporting industrial data

Info

Publication number: CN111935226A
Application number: CN202010651990.0A
Authority: CN
Inventors: 高明明; 高响
Original assignee: Shanghai Weiyi Intelligent Manufacturing Technology Co ltd
Current assignee: Shanghai Weiyi Intelligent Manufacturing Technology Co ltd
Priority date: 2020-07-08
Filing date: 2020-07-08
Publication date: 2020-11-13
Anticipated expiration: 2040-07-08
Also published as: CN111935226B

Abstract

The invention provides a method and a system for stream-processing computation of industrial data. Equipment data is acquired as a bit array generated by the equipment; the equipment fields are identified, converted into a JSON string, and sent to a message middleware in real time. A Flink computing engine parses the data in the message middleware and asynchronously matches the corresponding dimension information. The data is then windowed by dimension and the first calculation result is written to the analysis layer of the message middleware; index calculation follows, and the second calculation result is packaged and stored in a columnar storage database. Computation is performed by the big-data stream computing engine Flink, storage is handled by the big-data distributed file storage system Hadoop, and data access is secured with the Kerberos network authentication protocol. This addresses the current pain points of industrial enterprises: timely data computation, automated analysis reports, and secure data storage.

Description

Method and system for realizing streaming computing by supporting industrial data

Technical Field

The invention relates to the technical field of data analysis, in particular to a method and a system for realizing streaming computing by supporting industrial data.

Background

With the vigorous development of new infrastructure construction in China, providing intelligent production for traditional industrial enterprises with big data as the core technology has become a new trend. In traditional enterprises such as industrial enterprises, data is either not retained locally or is retained locally but scattered across different enterprise devices or systems. Industrial enterprises place great demands on production quality and cost, and production quality directly affects production cost.

Industrial enterprises today remain in a "primitive" state that relies heavily on manual labor. In the past, most enterprises set or adjusted their product quality controls according to the experience of veteran workers, which wastes a great deal of labor and time and requires operators with extensive experience. Moreover, such experience is often not reproducible: a person acquires it only through long-term, large-scale accumulation.

Patent document CN110879820A provides an industrial data processing method and device that collects industrial data generated in at least part of the production processes, associates the collected industrial data according to process time, and performs anomaly detection on the time-associated data. It stitches together discrete production flows through data processing and achieves rapid localization by analyzing the full production link.

Disclosure of Invention

In view of the defects in the prior art, the object of the invention is to provide a method and a system for realizing streaming computing by supporting industrial data.

The stream-processing calculation method for industrial data provided by the invention comprises the following steps:

a data acquisition step: acquiring data of equipment through a data gateway, acquiring a bit array generated by the equipment, identifying the equipment fields through a configured OPC (OLE for Process Control) protocol, converting the equipment fields into a JSON (JavaScript Object Notation) string, and sending the JSON string to a message middleware in real time;

a data conversion step: analyzing data information of the message middleware by using a Flink calculation engine, and asynchronously matching corresponding dimension information;

an index calculation step: windowing the data information according to different dimensions to obtain a first calculation result, writing the first calculation result to an analysis layer of the message middleware, performing index calculation to obtain a second calculation result, packaging the second calculation result, and storing it in a columnar storage database.

Preferably, the data acquisition step comprises:

a data acquisition step: acquiring a bit array generated by equipment in the operation process, wherein the bit array comprises equipment state information and service data information;

an array conversion step: identifying, through the OPC protocol, any one or more device fields among device ID, device state, device running time, feed opening, defect ID, and product ID, and converting them into a word array; parsing device information from the word array, the device information comprising either or both of the device state and the monitoring state; then converting the word array into strings, concatenating the strings into a JSON string, and sending the JSON string to the message middleware.

Preferably, the data conversion step includes:

a message consumption step: connecting to the message middleware through a user-defined connector, parsing the data in the data stream, and applying different processing logic to different events;

an asynchronous matching step: distinguishing the data structure type according to the event and parsing out the enterprise ID and the equipment ID; using the enterprise ID and the equipment ID to query the cache for the corresponding dimension information; if the query returns nothing, asynchronously querying the dimension information in the dimension database using the Vert.x framework and caching the query result; and sending the result to different topics in the message middleware according to the event.

Preferably, the index calculating step includes:

a windowing calculation step: performing windowing calculation on the equipment state data and production data according to different dimensions to obtain a first calculation result, the first calculation result comprising any one or more of yield, defective rate, and defect rate;

a result packaging step: feeding the first calculation result into Apache Flink for index calculation; filtering invalid data; splitting the data by event type; performing windowing calculation; converting the streaming data into Table streaming data; grouping the data by multiple dimensions using Flink SQL to generate the different industrial indexes that form the second calculation result; and packaging the second calculation result and storing it in a columnar storage database.

The stream-processing computing system for industrial data provided by the invention comprises:

a data acquisition module: acquiring data of equipment through a data gateway, acquiring a bit array generated by the equipment, identifying the equipment fields through a configured OPC (OLE for Process Control) protocol, converting the equipment fields into a JSON (JavaScript Object Notation) string, and sending the JSON string to a message middleware in real time;

the data conversion module: analyzing data information of the message middleware by using a Flink calculation engine, and asynchronously matching corresponding dimension information;

an index calculation module: windowing the data information according to different dimensions to obtain a first calculation result, writing the first calculation result to an analysis layer of the message middleware, performing index calculation to obtain a second calculation result, packaging the second calculation result, and storing it in a columnar storage database.

Preferably, the data acquisition module comprises:

a data acquisition module: acquiring a bit array generated by equipment in the operation process, wherein the bit array comprises equipment state information and service data information;

an array conversion module: identifying, through the OPC protocol, any one or more device fields among device ID, device state, device running time, feed opening, defect ID, and product ID, and converting them into a word array; parsing device information from the word array, the device information comprising either or both of the device state and the monitoring state; then converting the word array into strings, concatenating the strings into a JSON string, and sending the JSON string to the message middleware.

Preferably, the data conversion module includes:

a message consumption module: connecting to the message middleware through a user-defined connector, parsing the data in the data stream, and applying different processing logic to different events;

an asynchronous matching module: distinguishing the data structure type according to the event and parsing out the enterprise ID and the equipment ID; using the enterprise ID and the equipment ID to query the cache for the corresponding dimension information; if the query returns nothing, asynchronously querying the dimension information in the dimension database using the Vert.x framework and caching the query result; and sending the result to different topics in the message middleware according to the event.

Preferably, the index calculation module includes:

a windowing calculation module: performing windowing calculation on the equipment state data and production data according to different dimensions to obtain a first calculation result, the first calculation result comprising any one or more of yield, defective rate, and defect rate;

a result packaging module: feeding the first calculation result into Apache Flink for index calculation; filtering invalid data; splitting the data by event type; performing windowing calculation; converting the streaming data into Table streaming data; grouping the data by multiple dimensions using Flink SQL to generate the different industrial indexes that form the second calculation result; and packaging the second calculation result and storing it in a columnar storage database.

Compared with the prior art, the invention has the following beneficial effects:

1. Data is acquired from the equipment through the data gateway, and any equipment model can upload its own data format simply by adding a configuration file, which solves the problem of adapting data acquisition to different equipment.

2. The Flink real-time computing engine processes the data and performs windowed computation of the industrial indexes, so that industrial enterprise users obtain real-time feedback on product quality and production conditions. This is more timely than conventional offline batch processing, allows data to be delivered to a BI analysis platform quickly, and solves the problem of analysis timeliness.

Drawings

Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:

FIG. 1 is a schematic structural framework of the present invention;

FIG. 2 is a flow chart of the present invention.

Detailed Description

The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, all of which fall within the scope of protection of the present invention.

Example 1

The invention implements stream-processing computation of industrial data on a big-data technology stack, delivers a one-stop industrial data application service, and provides reliable analysis reports. Data computation uses the big-data stream computing engine Flink, data storage uses the big-data distributed file storage system Hadoop, and data access security uses the Kerberos network authentication protocol. Together these provide the robustness, usability, and security of the system and address the current pain points of industrial enterprises: timely data computation, automated analysis reports, and secure data storage. Unified management of industrial enterprise data is achieved, industrial data silos are broken down, and a closed loop of industrial data is formed.

As shown in FIG. 1, the method comprises a collecting step, a data conversion step, an industrial index calculation step and an industrial production quality analysis step.

The collecting step comprises the following steps:

Step A1: acquiring the bit array generated by the industrial equipment, specifically:

The bit array is generated while the industrial equipment is operating and comprises equipment system state, equipment system monitoring, equipment feeding system, equipment discharging system, equipment transferring system, equipment taking system, equipment performance index, and service data.

Step B1: the data gateway receives a bit array generated by the equipment, identifies the relevant fields of the equipment through an OPC protocol, converts the relevant fields into JSON character strings and sends the JSON character strings to the message middleware in real time;

Converting the bit array into a string comprises the following steps:

Step B1S1: receiving the bit array and converting it into a word array according to the protocol;

Step B1S2: parsing the specified equipment information from the word array, namely:

System state (word 0), wherein:

bit 0 represents an unknown error condition, typically a network failure or power loss;

bit 1 represents automatic running;

bit 2 represents standby;

bit 3 represents a fault;

bit 4 represents maintenance;

bit 5 represents maintenance;

System monitoring (word 1), wherein:

bit 0 represents air supply instability;

bit 1 represents that the algorithm machine has stopped;

bit 2 represents that the detection result exceeds the limit;

bit 3 represents that the emergency stop has been pressed;

bit 4 represents that the emergency door is open;

Each element of the word array is then processed in turn for the equipment feeding system, the equipment discharging system, the equipment transferring system, the equipment taking system, and the equipment performance index; the elements are converted into strings according to the protocol convention, concatenated into a JSON (JavaScript Object Notation) string, and sent to the specified message middleware.
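
As an illustration of Steps B1S1 and B1S2, the following Python sketch packs a bit array into words, decodes the two status words, and emits a JSON string. The 16-bit word size, the least-significant-bit-first ordering, the flag names, and the `device_id` and message field names are assumptions made for the example; the description does not specify them.

```python
import json

# Flag names are illustrative labels for the bit positions listed above.
# The description lists both bit 4 and bit 5 of word 0 as "maintenance".
SYSTEM_STATE = ["unknown_error", "auto_run", "standby", "fault",
                "maintenance_bit4", "maintenance_bit5"]
SYSTEM_MONITORING = ["air_instability", "algorithm_machine_stopped",
                     "detection_result_exceeded", "emergency_stop_pressed",
                     "emergency_door_open"]


def bits_to_words(bits, word_size=16):
    """Pack a flat list of 0/1 values into integers (assumed LSB first)."""
    if len(bits) % word_size:
        raise ValueError("bit array length must be a multiple of the word size")
    words = []
    for start in range(0, len(bits), word_size):
        word = 0
        for offset, bit in enumerate(bits[start:start + word_size]):
            word |= (bit & 1) << offset
        words.append(word)
    return words


def decode_flags(word, names):
    """Return the names whose bit position is set in the word."""
    return [name for position, name in enumerate(names) if (word >> position) & 1]


def to_json_message(device_id, bits):
    """Decode word 0 and word 1 and serialize the result as a JSON string."""
    words = bits_to_words(bits)
    payload = {
        "device_id": device_id,
        "system_state": decode_flags(words[0], SYSTEM_STATE),
        "system_monitoring": decode_flags(words[1], SYSTEM_MONITORING),
    }
    return json.dumps(payload)
```

In a real gateway the remaining words (feeding, discharging, transferring, and so on) would be decoded the same way before the string is published to the message middleware.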

The data conversion step comprises the following steps:

Step A2: consuming the middleware data with Flink, specifically:

A user-defined connector connects to the message middleware, parses the data in the data stream, and applies different processing logic to different events, comprising the following steps:

Step A2S1: converting the byte array into a string using UTF-8;

Step A2S2: obtaining the key-value pairs that correspond to the value of the event field in the string, and converting them into a JSON string;
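
A minimal Python sketch of Steps A2S1 and A2S2 follows. It assumes the incoming payload is the JSON string produced by the gateway; the event names and the per-event field lists in `EVENT_FIELDS` are hypothetical, since the description does not enumerate them.

```python
import json

# Hypothetical mapping from event type to the keys kept for that event.
EVENT_FIELDS = {
    "system_state": ("device_id", "state", "timestamp"),
    "production": ("device_id", "product_id", "defect_id", "timestamp"),
}


def parse_message(raw: bytes) -> str:
    """Decode a middleware message and keep the key-value pairs for its event."""
    text = raw.decode("utf-8")           # Step A2S1: byte array -> string
    record = json.loads(text)
    event = record.get("event")
    fields = EVENT_FIELDS.get(event)
    if fields is None:
        raise ValueError(f"unsupported event type: {event!r}")
    # Step A2S2: select the key-value pairs that belong to this event type.
    pairs = {key: record[key] for key in fields if key in record}
    pairs["event"] = event
    return json.dumps(pairs, sort_keys=True)
```

Keeping the per-event field lists in one table means a new event type can be supported by adding an entry rather than a new code path.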

Step B2: matching dimension information using Flink's asynchronous I/O, comprising the following steps:

Step B2S1: distinguishing the data structure type according to the event, and parsing out the enterprise ID and the equipment ID.

Step B2S2: combining the Vert.x framework with a cache: first querying the cache for the corresponding dimension information; if the result is empty, asynchronously querying the dimension information in the dimension database using Vert.x and caching it.

Namely:

the equipment system state data is: equipment ID + batch number + product ID + product name + state identification + attribute value + timestamp;

the equipment system monitoring data is: equipment ID + batch number + product ID + product name + monitoring index + attribute value + timestamp;

the data of the equipment feeding system, the equipment discharging system, the equipment transferring system, the equipment photographing system, and the equipment performance index is matched with the corresponding dimension information in the same way.

Step B2S3: sending the events to different topics of the message middleware according to the event type.
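
The cache-then-asynchronous-lookup pattern of Steps B2S2 and B2S3 can be sketched in Python as follows. Here `asyncio` stands in for Flink asynchronous I/O with Vert.x, which the description actually uses; the `query_dimension` callable, the record field names, and the `dwd_` topic prefix are hypothetical.

```python
import asyncio


class DimensionEnricher:
    """Attach dimension fields to records, consulting a cache before the database."""

    def __init__(self, query_dimension):
        # query_dimension: async callable (enterprise_id, device_id) -> dict
        self._query = query_dimension
        self._cache = {}

    async def enrich(self, record):
        key = (record["enterprise_id"], record["device_id"])
        dims = self._cache.get(key)
        if dims is None:
            # Cache miss: query the dimension store without blocking, then cache.
            dims = await self._query(*key)
            self._cache[key] = dims
        return {**record, **dims}


def topic_for(record):
    """Choose the destination topic from the event type (naming is assumed)."""
    return f"dwd_{record['event']}"


async def fake_dimension_db(enterprise_id, device_id):
    """Stand-in for the dimension database used only to demonstrate the flow."""
    await asyncio.sleep(0)
    return {"batch_no": "b-1", "product_id": "p-1", "product_name": "widget"}
```

A call such as `asyncio.run(DimensionEnricher(fake_dimension_db).enrich(record))` returns the record merged with its dimension fields; repeated calls for the same enterprise and equipment are served from the cache.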

The industrial index calculating step comprises the following steps:

Step A3: industrial index calculation, specifically:

Windowing calculation is performed on the different equipment system data and production data according to different dimensions to compute the yield, the defective rate, the defect rate, and so on; the results are then written to the analysis-layer topic of the message middleware, providing a data source for industrial production quality analysis.
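
To make the windowing calculation concrete, the sketch below computes yield and defective rate per equipment over fixed-size (tumbling) windows in plain Python. The description does not state the window type or size, so a tumbling window is assumed, and the field names `device_id`, `timestamp`, and `result` are hypothetical.

```python
from collections import defaultdict


def tumbling_window_rates(events, window_ms):
    """Compute yield and defective rate per (device_id, window start)."""
    counts = defaultdict(lambda: {"good": 0, "total": 0})
    for event in events:
        # Align the timestamp to the start of its window.
        window_start = event["timestamp"] - event["timestamp"] % window_ms
        bucket = counts[(event["device_id"], window_start)]
        bucket["total"] += 1
        if event["result"] == "good":
            bucket["good"] += 1
    return {
        key: {
            "yield_rate": c["good"] / c["total"],
            "defective_rate": (c["total"] - c["good"]) / c["total"],
        }
        for key, c in counts.items()
    }
```

The same grouping key could be extended with product, batch, or customer to cover the other dimensions mentioned above.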

Step B3: index calculation, specifically:

step B3S 1: and (4) calculating indexes such as real-time consumption number, designated duration average consumption rate and the like by data access Flink, filtering invalid data, and dividing the data according to the event type.

Step B3S 2: designating the Flink Job as event-driven according to the event _ time, performing assignment TimestampAndWatermarks processing on DataStream, marking the qualified _ date _ time as an event, namely, designating the event _ time field in the data by using the assignment TimestampAndWatermarks, setting the data waiting time, marking the delayed data, performing windowing calculation, and converting the streaming data into the Table streaming data. Carrying out groupBy on data according to dimensions of clients, equipment, products, batches and the like by using the Flink Sql to respectively generate different industrial indexes;

namely:

FirstValidFilterFunction filters invalid event data;

SeconddValidFunction filters invalid classfied_date_time values;

AssignTimestampAndWatermarks converts classfied_date_time to a Timestamp and specifies classfied_date_time as the event_time;

the Flink StreamExecutionEnvironment is converted to a StreamTableEnvironment, and SQL calculation is performed on the DataStream.
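
The event-time handling described in this step can be illustrated without Flink. The Python sketch below converts the classfied_date_time field (spelled as in the description) to an epoch timestamp and marks records that arrive later than an allowed delay behind the largest timestamp seen so far. The time format, the UTC assumption, and the simplified watermark rule (largest timestamp minus the allowed delay) are assumptions for the example and do not reproduce Flink's exact watermark semantics.

```python
from datetime import datetime, timezone

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed format of classfied_date_time


def to_event_time_ms(value):
    """Parse the time string as UTC and return epoch milliseconds."""
    parsed = datetime.strptime(value, TIME_FORMAT).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)


def mark_late(records, max_delay_ms):
    """Yield records with an event_time and a flag for late arrivals."""
    max_seen = None
    for record in records:
        event_time = to_event_time_ms(record["classfied_date_time"])
        # Simplified watermark: largest event time so far minus the allowed delay.
        watermark = None if max_seen is None else max_seen - max_delay_ms
        late = watermark is not None and event_time < watermark
        if max_seen is None or event_time > max_seen:
            max_seen = event_time
        yield {**record, "event_time": event_time, "late": late}
```

Records flagged as late can then be dropped or routed separately before the windowing calculation, depending on how strictly the indexes must reflect event order.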

The SQL covers four types of calculation: screening product data whose data type is good product, then calculating the yield grouped by product, production line, customer ID, and process; screening product data whose data type is defective product, then calculating the defective rate grouped by product, production line, customer ID, and process; screening product data whose data type is defect, then calculating the product defect rate grouped by product, production line, customer ID, and process; and screening product data whose data type is image, then calculating the image detection accuracy grouped by product, production line, customer ID, and process.
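
The shape of such a grouped query can be sketched with Python's built-in `sqlite3` module standing in for Flink SQL. The table name, column names, data type values, and the choice of denominator (good plus defective records) are hypothetical; the description does not give the actual statements.

```python
import sqlite3

# Yield per product, production line, customer, and process (assumed schema).
YIELD_SQL = """
SELECT product_id, line_id, customer_id, process_id,
       SUM(CASE WHEN data_type = 'good' THEN 1 ELSE 0 END) * 1.0 / COUNT(*) AS yield_rate
FROM product_events
WHERE data_type IN ('good', 'defective')
GROUP BY product_id, line_id, customer_id, process_id
ORDER BY product_id, line_id, customer_id, process_id
"""


def yield_by_group(rows):
    """Load rows into an in-memory table and return the grouped yield."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE product_events ("
            "product_id TEXT, line_id TEXT, customer_id TEXT, "
            "process_id TEXT, data_type TEXT)"
        )
        conn.executemany("INSERT INTO product_events VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(YIELD_SQL).fetchall()
    finally:
        conn.close()
```

The defective rate, defect rate, and image detection accuracy would follow the same pattern with a different filter and numerator.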

Step B3S3: the calculation result is packaged and persisted to a columnar storage database for analysis with a BI self-service analysis engine; the columnar storage database also serves real-time queries from the application side. The results computed in real time by Flink are classified by date and by enterprise customer ID, and the data of different enterprises is stored in separate data directories or data tables, physically isolating customer data to improve security. The data is stored in HDFS (Hadoop Distributed File System) or ClickHouse in a columnar storage data format. In the BI (business intelligence) self-service analysis tool, a data source (the address of the data storage medium) is added, a data set and a dashboard are created, and the data analysis result is finally displayed.
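
One way to picture the per-enterprise, per-date isolation is as a directory layout. The Python sketch below derives a storage path from the enterprise ID and the result timestamp; the `enterprise_id=` and `dt=` directory naming, the UTC date, and the `window_start_ms` field are assumptions for the example, not the layout used by the invention.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def partition_path(root, enterprise_id, event_time_ms):
    """Build a storage directory that separates data by enterprise and date."""
    if not enterprise_id or "/" in enterprise_id or enterprise_id in (".", ".."):
        # Reject IDs that could escape the enterprise's own directory.
        raise ValueError(f"invalid enterprise ID: {enterprise_id!r}")
    day = datetime.fromtimestamp(event_time_ms / 1000, tz=timezone.utc).strftime("%Y-%m-%d")
    return PurePosixPath(root) / f"enterprise_id={enterprise_id}" / f"dt={day}"


def group_by_partition(root, results):
    """Group calculation results by the directory they would be written to."""
    grouped = {}
    for result in results:
        path = partition_path(root, result["enterprise_id"], result["window_start_ms"])
        grouped.setdefault(str(path), []).append(result)
    return grouped
```

Each group can then be written as one columnar file under its own directory, so that access control can be applied per enterprise.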

Example 2

Example 2 can be regarded as a preferred variant of Example 1. The stream-processing computing system for industrial data described in Example 2 uses the steps of the stream-processing calculation method for industrial data described in Example 1.

The invention provides a stream-processing computing system for industrial data. Equipment data is acquired as a bit array generated by the equipment; the equipment fields are identified, converted into a JSON string, and sent to a message middleware in real time. A Flink computing engine parses the data in the message middleware and asynchronously matches the corresponding dimension information. The data is then windowed by dimension and the first calculation result is written to the analysis layer of the message middleware; index calculation follows, and the second calculation result is packaged and stored in a columnar storage database. Computation is performed by the big-data stream computing engine Flink, storage is handled by the big-data distributed file storage system Hadoop, and data access is secured with the Kerberos network authentication protocol. This addresses the current pain points of industrial enterprises: timely data computation, automated analysis reports, and secure data storage.

The method specifically comprises the following modules:

The data acquisition module comprises:

The data conversion module includes:

The index calculation module includes:

Those skilled in the art will appreciate that, in addition to implementing the systems, apparatus, and various modules thereof provided by the present invention in purely computer readable program code, the same procedures can be implemented entirely by logically programming method steps such that the systems, apparatus, and various modules thereof are provided in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system, the device and the modules thereof provided by the present invention can be considered as a hardware component, and the modules included in the system, the device and the modules thereof for implementing various programs can also be considered as structures in the hardware component; modules for performing various functions may also be considered to be both software programs for performing the methods and structures within hardware components.

The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims

1. A streaming processing calculation method for realizing industrial data is characterized by comprising the following steps:

2. The method of claim 1, wherein the data collection step comprises:

3. The method of claim 1, wherein the data conversion step comprises:

4. The streaming processing calculation method for industrial data according to claim 1, wherein the index calculation step comprises:

5. A streaming computing system that implements industrial data, comprising:

6. The streaming computing system of industrial data as recited in claim 5, wherein the data collection module comprises:

7. The streaming computing system of industrial data as recited in claim 5, wherein the data conversion module comprises:

8. The streaming processing computing system of industrial data as recited in claim 5, wherein the metric calculation module comprises: