CN111970374A - Data node grouping method, system and medium based on machine learning

Info

Publication number: CN111970374A
Application number: CN202010878186.6A
Authority: CN
Inventors: 古欣; 邵慧; 房玉飞; 刁志峰; 黄大伟; 迟昊
Original assignee: Shandong Youren Information Technology Co ltd
Current assignee: Shandong Youren Information Technology Co ltd
Priority date: 2020-08-27
Filing date: 2020-08-27
Publication date: 2020-11-20
Anticipated expiration: 2040-08-27
Also published as: CN111970374B

Abstract

The invention discloses a data node grouping method, system and medium based on machine learning. The method comprises the following steps: sorting the data node set by starting address; splitting the data node set into a plurality of subsets according to a set rule based on the address difference between adjacent data nodes; for each subset, screening valid groups and determining all possible group acquisition strategies based on the screened valid groups; determining the acquisition time of each group acquisition strategy with a machine learning method, and determining the optimal group acquisition strategy; and combining the optimal group acquisition strategies of all subsets to obtain the optimal group acquisition strategy of the whole data node set. With the optimized grouping strategy, edge acquisition takes less time and is more efficient, which greatly reduces the latency of edge acquisition and improves its real-time performance.

Description

Data node grouping method, system and medium based on machine learning

Technical Field

The invention relates to the technical field of edge acquisition, in particular to a data node grouping method, a data node grouping system and a data node grouping medium based on machine learning.

Background

The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.

In an application scenario of edge acquisition and edge computing, a conventional edge acquisition method generally obtains the information of all data nodes that need edge acquisition, and then generates an acquisition instruction for each data node in turn according to the node information. Edge acquisition is performed on each data node using its acquisition instruction, and the node state and the acquired data are obtained according to the rules of the protocol. This method acquires data node by node, in series. Edge acquisition is therefore very inefficient, because the ratio of payload data to protocol data is too small and executing a large number of instructions is time-consuming.

The prior art optimizes the conventional edge acquisition method by logically dividing the address space into a plurality of groups of a fixed address length and assigning the nodes to be acquired to the corresponding groups according to their addresses. The data is then acquired group by group, so that serial acquisition among nodes becomes serial acquisition among groups. Because the data nodes within a group are acquired in parallel, acquisition efficiency can be improved by several times to tens of times.

However, in practical applications, the inventor found that this grouping method uses fixed address ranges: logical address ranges are defined, and all nodes within a range are placed in one group. This approach often fails to find a reasonable grouping, and in some cases an unreasonable or non-optimal grouping actually reduces the benefit of acquisition optimization. For example:

Consider six data nodes (nodes or data points for short) that use the Modbus-RTU protocol, whose register type is holding register and whose addresses are 0, 31, 32, 63, 64 and 65. Grouping with 32 addresses per logical group yields [0, 31], [32, 63] and [64, 65]. The group [0, 31] collects the data at every address from 0 to 31; the data payload then contains 32 data points, of which 2 are valid and 30 are invalid. For this group, the percentage of valid data points is 6.25%. In addition, the communication protocol also carries leading and trailing protocol data, so the overall percentage of valid data is only 5.7%, which is too low. The reply contains a large amount of redundant, invalid payload data. When a serial port is used at a low baud rate, or in similar conditions, transmitting this redundant data is very time-consuming and reduces acquisition efficiency.
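The arithmetic in this example can be checked with a short sketch. The function names and the `width` parameter are illustrative assumptions rather than part of the prior-art method; only the node addresses and the 32-address group length come from the example above.

```python
def fixed_length_groups(addresses, width=32):
    """Assign each node address to the fixed-length group that covers it.

    Keys are group indexes (address // width); values are the node
    addresses that fall into that group.
    """
    groups = {}
    for address in sorted(addresses):
        groups.setdefault(address // width, []).append(address)
    return groups


def valid_point_ratio(nodes_in_group, width=32):
    """Share of the requested data points that belong to real nodes,
    assuming the whole fixed-length range of the group is read."""
    return len(nodes_in_group) / width


groups = fixed_length_groups([0, 31, 32, 63, 64, 65])
first_group_ratio = valid_point_ratio(groups[0])  # 2 valid points out of 32
print(groups, f"{first_group_ratio:.2%}")
```

For the group covering addresses 0 to 31, the ratio is 2/32, which matches the 6.25% stated above. The protocol overhead that lowers the figure further is not modelled here.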

In addition, the prior art discloses determining the acquisition time of each group by direct detection. This requires actually measuring the total time from sending the data acquisition command to receiving and parsing the data returned by the lower-level device, and is therefore inefficient.

Some embodiments of the prior art further disclose determining the acquisition time of each group by predictive analysis, which predicts the data transfer time from the data length without any actual detection and so computes the acquisition time more efficiently. However, because the time is only estimated, it is less accurate than the direct detection method.

Disclosure of Invention

In view of this, the invention provides a data node grouping method, system and medium based on machine learning, which adopt an optimized grouping acquisition strategy to improve the efficiency of edge acquisition; and meanwhile, the acquisition time of each group acquisition strategy is determined by adopting a machine learning method so as to quickly determine the optimal group acquisition strategy.

In order to achieve the above purpose, in some embodiments, the following technical solutions are adopted:

a data node grouping method based on machine learning comprises the following steps:

sorting the data node set according to the size of the initial address;

dividing a data node set into a plurality of subsets according to a set rule according to an address difference value between adjacent data nodes;

screening effective groups for each subset, and determining all possible group acquisition strategies based on the screened effective groups; determining the acquisition time of each group acquisition strategy based on a machine learning method, and determining an optimal group acquisition strategy;

and combining the optimal grouping acquisition strategies of all the subsets to obtain the optimal grouping acquisition strategy of the whole data node set.

In other embodiments, the following technical solutions are adopted:

an optimized grouping system for improving edge acquisition efficiency, comprising:

means for sorting the set of data nodes according to starting address size;

means for splitting the data node set into a plurality of subsets according to a set rule based on the address difference between adjacent data nodes;

means for screening valid groups for each subset and determining all possible group acquisition strategies based on the screened valid groups; means for determining the acquisition time of each group acquisition strategy based on a machine learning method and determining the optimal group acquisition strategy;

and means for combining the optimal group acquisition strategies of all subsets to obtain the optimal group acquisition strategy of the whole data node set.

In other embodiments, the following technical solutions are adopted:

a terminal device comprising a processor and a computer-readable storage medium, the processor being configured to implement instructions; the computer readable storage medium is used for storing a plurality of instructions adapted to be loaded by a processor and to perform the above-mentioned machine learning-based data node grouping method.

In other embodiments, the following technical solutions are adopted:

a computer readable storage medium having stored therein a plurality of instructions adapted to be loaded by a processor of a terminal device and to execute the above-mentioned machine learning-based data node grouping method.

Compared with the prior art, the invention has the beneficial effects that:

compared with a grouping strategy with fixed address length, the grouping optimization strategy of the method has shorter time for edge acquisition and higher efficiency, can greatly reduce the time delay of the edge acquisition and improve the real-time property of the edge acquisition.

The invention divides the whole data node set into a plurality of subsets to be processed respectively, thereby greatly reducing the data volume needing to be processed at a time and reducing the requirement on the performance of the data processing equipment.

According to the invention, through setting multiple effective grouping screening strategies, the grouping which does not meet the requirements can be directly filtered, and the grouping acquisition strategy is determined based on the screened effective grouping, so that the complexity of data processing is reduced, and the data processing efficiency is improved.

The method is based on a machine learning method to determine the acquisition time of each group, and then the acquisition time required by each group acquisition strategy is obtained; the method can improve the calculation efficiency of the acquisition time and can ensure the accuracy of the calculation result.

Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Drawings

Fig. 1 is a flowchart of a data node grouping method based on machine learning according to an embodiment of the present invention.

Detailed Description

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.

The embodiments and features of the embodiments of the present invention may be combined with each other without conflict.

Example one

In one or more embodiments, a machine learning-based data node grouping method is disclosed, referring to fig. 1, comprising the steps of:

step S101: sorting the data node set according to the size of the initial address;

step S102: dividing a data node set into a plurality of subsets according to a set rule according to an address difference value between adjacent data nodes;

specifically, the address difference between every two adjacent data nodes is determined first, and the differences are sorted from largest to smallest; the splitting is then carried out according to the following process:

(1) splitting two non-split adjacent data nodes with the maximum address difference value;

(2) judging whether a subset meets a set condition after the data nodes are split;

(3) if not, entering the step (4); if yes, intercepting the subset, and entering the step (5);

(4) returning to the step (1) to continue splitting;

(5) judging whether the data node set is completely split or not, and if so, finishing; otherwise, returning to the step (1) to continue splitting the residual data nodes.

Wherein the subset satisfying the set condition includes:

Condition 1: the number of nodes in the subset does not exceed the limit X (X is the subset node-count limit parameter).

Condition 2: the address range of the subset does not exceed the limit Y (Y is the subset address-range limit parameter).

When both conditions are satisfied, the subset is determined to meet the set condition. The values of X and Y can be configured flexibly as required: the smaller the values of X and Y, the more subsets there are, the lower the degree of optimization and the lower the time complexity; the larger the values of X and Y, the fewer subsets there are, the higher the degree of optimization and the higher the time complexity.

In this embodiment, the maximum number of the subsets is set as needed, and if the number of the split subsets reaches the maximum number, it is determined that the data node set is split completely.

It should be noted that consecutive data nodes, that is, adjacent data nodes whose address difference is 1, do not need to be split.

For example, the data node set obtained by sorting the starting address size is as follows:

x represents that node data exists under the current address, O represents that the node data does not exist under the current address, and 1-10 are addresses of data nodes.

When splitting is performed, the address difference between the two data nodes at addresses 3 and 6 is the largest, so the data set is split between them into two subsets: one ending at address 3 and one starting at address 6;

and judging whether the two subsets meet the condition or not, and then splitting the subsets which do not meet the condition.
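The splitting procedure of step S102 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it uses a recursive formulation (each subset that fails the conditions is split at its own largest gap), it omits the optional limit on the maximum number of subsets, and the names `split_into_subsets`, `max_nodes` (X) and `max_range` (Y) are hypothetical.

```python
def split_into_subsets(addresses, max_nodes, max_range):
    """Split sorted node addresses into subsets at the largest address gaps.

    A subset is accepted when it has at most `max_nodes` nodes (X) and
    spans at most `max_range` addresses (Y). Consecutive nodes (gap of 1)
    are never separated.
    """
    def meets_conditions(subset):
        return len(subset) <= max_nodes and subset[-1] - subset[0] <= max_range

    def split(subset):
        if len(subset) < 2 or meets_conditions(subset):
            return [subset]
        # Address gaps between adjacent nodes, paired with the split position.
        gaps = [(subset[i + 1] - subset[i], i) for i in range(len(subset) - 1)]
        largest_gap, index = max(gaps, key=lambda gap: gap[0])
        if largest_gap <= 1:
            return [subset]  # only consecutive nodes remain: do not split
        return split(subset[:index + 1]) + split(subset[index + 1:])

    return split(sorted(addresses))


print(split_into_subsets([1, 2, 3, 6, 7, 10], max_nodes=3, max_range=3))
```

With these illustrative inputs the first split falls between addresses 3 and 6, as in the example above, and the second half is split again because its address range exceeds the assumed Y.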

Step S103: screening effective groups for each subset, and determining all possible group acquisition strategies based on the screened effective groups; calculating the acquisition time of each group acquisition strategy, and determining the optimal group acquisition strategy;

specifically, the specific process of screening valid packets includes:

(1) first, all possible grouping modes of the data nodes in the subset are traversed; if there are n data nodes, the number of all groups is n(n+1)/2.

For example, a subset has 5 points [0, 31, 32, 63, 64], denoted [A, B, C, D, E] for convenience of description. All possible grouping modes are the following 15: A, AB, ABC, ABCD, ABCDE, B, BC, BCD, BCDE, C, CD, CDE, D, DE, E.

(2) except for the groups consisting of a single data node (such as A, B, C, D and E above), the remaining grouping modes are screened; the group screening methods include:

① Adhesion group screening:

when the address difference between any two adjacent data nodes is smaller than a set value Z, all groups that contain only one of the two data nodes are rejected; two data nodes whose address difference is smaller than Z are called a bonded node pair. The larger the value of Z, the more groups are filtered out and the lower the degree of optimization.

② Exclusion group screening:

when the address difference between any two adjacent data nodes is larger than a set value L, all groups that contain both nodes are rejected; these two nodes are called an exclusive node pair. The smaller the value of L, the more groups are filtered out and the lower the degree of optimization.

③ Address range group screening:

if the address difference between the first data node and the last data node in a group is larger than a set address range M, the group is rejected; the smaller the value of M, the more groups are filtered out and the lower the degree of optimization.

④ Group efficiency screening:

an acquisition instruction is generated for each group according to the rules of the protocol, the acquisition time of each group is determined, and groups with low acquisition efficiency are rejected based on the acquisition time. For example: the acquisition time of group A is 5 seconds, that of group B is 3 seconds, and that of group AB is 10 seconds. Since group AB takes longer than acquiring A and B separately (5 + 3 = 8 seconds), group AB is considered inefficient and is rejected.

And the remaining groups after layer-by-layer screening are effective groups.
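The three address-based screening rules can be sketched as below. This is an illustrative sketch only: the function names and the parameter names `bond_limit` (Z), `exclude_limit` (L) and `max_span` (M) are assumptions, the threshold values in the example are made up, and the efficiency screening is left out because it needs acquisition times.

```python
def contiguous_groups(addresses):
    """All groups of adjacent nodes: n(n+1)/2 groups for n nodes."""
    n = len(addresses)
    return [tuple(addresses[i:j]) for i in range(n) for j in range(i + 1, n + 1)]


def screen_groups(addresses, bond_limit, exclude_limit, max_span):
    """Apply adhesion (Z), exclusion (L) and address-range (M) screening."""
    pairs = list(zip(addresses, addresses[1:]))
    bonded = [(a, b) for a, b in pairs if b - a < bond_limit]
    exclusive = [(a, b) for a, b in pairs if b - a > exclude_limit]

    kept = []
    for group in contiguous_groups(addresses):
        if len(group) == 1:
            kept.append(group)  # single-node groups are never screened out
            continue
        if any((a in group) != (b in group) for a, b in bonded):
            continue  # contains only one node of a bonded pair
        if any(a in group and b in group for a, b in exclusive):
            continue  # contains both nodes of an exclusive pair
        if group[-1] - group[0] > max_span:
            continue  # address range too wide
        kept.append(group)
    return kept


valid = screen_groups([0, 31, 32, 63, 64], bond_limit=2, exclude_limit=40, max_span=40)
print(valid)
```

With these assumed thresholds, 31/32 and 63/64 are bonded pairs, so groups such as (0, 31) are rejected, and the five-node group is rejected because its address range of 64 exceeds the assumed M of 40.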

In this embodiment, the machine learning-based method obtains the acquisition time of each group, and the specific process includes:

during data acquisition, if the environmental parameters are unchanged and only the start address changes while the length of the queried register units stays the same, the flow of each edge acquisition is identical and the time taken is the same.

In this embodiment, the environmental parameters include: protocol type (for example: PLC protocol such as modbus-RTU, modbus-TCP, modbus-ASCII, PPI, etc.), protocol communication medium, register type (for example, register type such as Siemens PPI protocol having I type, Q type, M type, D type, etc.), data type and communication parameters, wherein the communication parameters further comprise: serial port, baud rate, data bit, stop bit, check bit and start bit. The environmental parameters are combined together to form a scene; any change in the environmental parameters corresponds to a new scene, one sub-classifier for each scene.

Of course, this does not constitute a limitation to the technical solution of the present invention, and those skilled in the art may determine other combinations of environmental parameters as scenes according to actual needs.

For example:

① Collect holding registers with start address 0x00 0x64 and register unit length 0x00 0x01

Acquisition command: 01 03 00 64 00 01 C5 D5

Reply data: 01 03 02 00 00 B8 44 (reply data length: 2 bytes)

② Collect holding registers with start address 0x00 0xC8 and register unit length 0x00 0x01

Acquisition command: 01 03 00 C8 00 01 05 F4

Reply data: 01 03 02 00 00 B8 44 (reply data length: 2 bytes)

Similarly, if only the register unit length is changed, the flow of the acquisition command and the reply data and the format of the protocol data are controllable.

For example:

① Collect holding registers with start address 0x00 0x64 and register unit length 0x00 0x02

Acquisition command: 01 03 00 64 00 02 85 D4

Reply data: 01 03 04 00 00 00 00 FA 33 (reply data length: 4 bytes)

② Collect holding registers with start address 0x00 0xC8 and register unit length 0x00 0x02

Acquisition command: 01 03 00 C8 00 02 45 F5

Reply data: 01 03 04 00 00 00 00 FA 33 (reply data length: 4 bytes)

Therefore, the register unit length and the amount of exchanged data satisfy a linear relationship; that is, the register unit length and the overall execution time T satisfy a linear regression relationship: T = Lx + b, where T is the execution time, L is the register unit length, x is the time per register unit, and b is a fixed time.

If the linear regression equation of the scene is determined, the execution time T can be directly calculated from the linear regression equation if only the register unit length L is adjusted under the condition that the scene is not changed.
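The frames in the examples above can be reproduced with a short sketch of Modbus-RTU framing for function code 0x03 (read holding registers). The helper names are hypothetical. The sketch shows that the request has a fixed size while the reply grows by 2 bytes per register, which is the linear relationship the text relies on.

```python
import struct


def crc16_modbus(data: bytes) -> int:
    """CRC-16 as used by Modbus-RTU (initial value 0xFFFF, polynomial 0xA001)."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc


def build_read_request(slave: int, start: int, count: int) -> bytes:
    """Request frame: slave id, function 0x03, start address, register count, CRC."""
    body = struct.pack(">BBHH", slave, 0x03, start, count)
    return body + struct.pack("<H", crc16_modbus(body))  # CRC is sent low byte first


def reply_length(count: int) -> int:
    """Reply frame size in bytes: id + function + byte count + 2 per register + CRC."""
    return 1 + 1 + 1 + 2 * count + 2


print(build_read_request(1, 0x64, 1).hex(" "))  # compare with the first example command
print(reply_length(1), reply_length(2))
```

Changing only the start address alters the request bytes but not any frame length, whereas each additional register adds exactly 2 bytes to the reply.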

For each scene, supervised learning is used to train on the characteristics of that scene, so that a model is obtained for each scene.

The specific process of machine learning is as follows:

data acquisition --> data preprocessing --> model training --> model verification --> model confirmation

① Data acquisition

Build the model scene and create a crawler script that crawls the process data of communication with the PLC, including the query time and the length of the queried register units.

The process is as follows:

Create the environment of the scene; for example, communicate over the modbus-RTU protocol and build the edge acquisition environment with parameters such as an RS-485 bus, a baud rate of 9600, 8 data bits, 1 stop bit and no parity bit.

Create the crawler script, which generates query instructions, sends them to the PLC, receives and parses the reply data from the PLC, and stores the crawled process data: the register unit length and the time from sending an instruction to receiving its reply.

② Data preprocessing

Because the crawler collects a large amount of data, some of it may be abnormal, invalid or out of range. These data are preprocessed by eliminating records of failed acquisitions, timed-out acquisitions, abnormal PLC replies, and register unit lengths that are out of range.

③ Model training

The preprocessed data contains many records that share the same register unit length. All data is classified by register unit length.

For each register unit length, the arithmetic mean of the edge acquisition times in that class is calculated and used as the mean edge acquisition time for that register unit length.
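The preprocessing and averaging steps might look like the following sketch. The record layout (a register unit length paired with an elapsed time, with `None` marking a failed or timed-out acquisition) and the `max_length` bound are assumptions made for illustration; 125 is the register-count limit of a Modbus read-holding-registers request, and other protocols would need a different bound.

```python
from collections import defaultdict
from statistics import mean


def mean_time_by_length(samples, max_length=125):
    """Drop unusable records, then average the elapsed time per register unit length.

    `samples` is an iterable of (register_unit_length, elapsed_time) pairs;
    elapsed_time is None when the acquisition failed or timed out.
    """
    buckets = defaultdict(list)
    for length, elapsed in samples:
        if elapsed is None:
            continue  # failed or timed-out acquisition
        if not 1 <= length <= max_length:
            continue  # register unit length out of range
        buckets[length].append(elapsed)
    return {length: mean(times) for length, times in sorted(buckets.items())}


samples = [(2, 16.8), (2, 17.0), (3, 22.1), (3, None), (200, 50.0)]
print(mean_time_by_length(samples))
```

The result is one mean acquisition time per register unit length, which is the input to the equation fitting described next.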

The specific process of fitting a linear regression equation for the register cell length is described below by way of an example.

No.	Register unit length	Mean edge acquisition time	Equation
1	2	16.9	16.9 = 2x + b
2	3	22.1	22.1 = 3x + b
3	4	27.2	27.2 = 4x + b
4	5	31.9	31.9 = 5x + b

1) Solving for x

Subtract the equation numbered n-1 from the equation numbered n:

No.	Equation	Solving for x
2-1	22.1 - 16.9 = (3x + b) - (2x + b)	x = 5.2
3-2	27.2 - 22.1 = (4x + b) - (3x + b)	x = 5.1
4-3	31.9 - 27.2 = (5x + b) - (4x + b)	x = 4.7

The arithmetic mean of all values of x is calculated: (5.2 + 5.1 + 4.7)/3 = 15/3 = 5, yielding x = 5.

2) Solving for b

Substitute the obtained x = 5 into each equation and solve for b in each equation:

No.	Register unit length	Mean edge acquisition time	Equation	b
1	2	16.9	16.9 = 2*5 + b	b = 6.9
2	3	22.1	22.1 = 3*5 + b	b = 7.1
3	4	27.2	27.2 = 4*5 + b	b = 7.2
4	5	31.9	31.9 = 5*5 + b	b = 6.9

The arithmetic mean of all values of b is calculated: (6.9 + 7.1 + 7.2 + 6.9)/4 = 28.1/4 = 7.025, yielding b = 7.025.

3) Substituting x = 5 and b = 7.025 into T = Lx + b yields the linear regression equation:

T = 5L + 7.025
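The fitting procedure worked through above can be written as a short sketch. It mirrors the averaging of successive differences described in the text rather than an ordinary least-squares fit, which would generally give slightly different coefficients; the function name is hypothetical.

```python
def fit_by_differences(points):
    """Fit T = L*x + b from (register_unit_length, mean_time) pairs.

    x is the mean slope between neighbouring rows; b is the mean of
    T - L*x over all rows, as in the worked example.
    """
    points = sorted(points)
    slopes = [
        (t2 - t1) / (l2 - l1)
        for (l1, t1), (l2, t2) in zip(points, points[1:])
    ]
    x = sum(slopes) / len(slopes)
    intercepts = [t - length * x for length, t in points]
    b = sum(intercepts) / len(intercepts)
    return x, b


x, b = fit_by_differences([(2, 16.9), (3, 22.1), (4, 27.2), (5, 31.9)])
print(round(x, 3), round(b, 3))
```

On the four rows of the table this reproduces x = 5 and b = 7.025 up to floating-point rounding.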

④ Model verification

No.	Register unit length	Mean edge acquisition time	Predicted time
1	2	16.9	2*5 + 7.025 = 17.025
2	3	22.1	3*5 + 7.025 = 22.025
3	4	27.2	4*5 + 7.025 = 27.025
4	5	31.9	5*5 + 7.025 = 32.025

The register unit length is substituted into the linear regression equation, and the resulting edge acquisition time is compared with the measured mean edge acquisition time. The performance of the model is evaluated by checking whether the difference is within a reasonable error.
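A verification pass over the example data could be sketched as follows. The text does not quantify "reasonable error", so the `tolerance` values here are purely illustrative assumptions, as is the function name.

```python
def verify_model(points, slope, intercept, tolerance):
    """Compare predicted and measured mean times for each register unit length.

    Returns rows of (length, measured, predicted, within_tolerance).
    """
    report = []
    for length, measured in points:
        predicted = slope * length + intercept
        report.append((length, measured, predicted, abs(predicted - measured) <= tolerance))
    return report


measured = [(2, 16.9), (3, 22.1), (4, 27.2), (5, 31.9)]
for row in verify_model(measured, slope=5.0, intercept=7.025, tolerance=0.2):
    print(row)
```

For the table above, the largest deviation is about 0.175 (at register unit length 4), so all rows pass with the assumed tolerance of 0.2 and some would fail with a stricter tolerance of 0.1.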

⑤ Model use

The trained model is used to predict the acquisition time for new data.

This completes the training of the model.

The sub-classifier corresponding to each register type under each communication protocol is trained with the above method to obtain the linear regression model of each sub-classifier, and thus the overall classifier model.

The environmental parameters of each group in a group acquisition strategy are obtained and input into the overall classifier model; the corresponding relation equation T = f(L) is found for each group according to the environmental parameters of the current group, and the acquisition time T of each group is obtained from the register unit length L of the group, so as to obtain the acquisition time of the group acquisition strategy.
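One way to picture the overall classifier model is as a lookup from a scene to that scene's linear model. The sketch below is an assumption about structure, not the disclosed implementation: the `Scene` fields are a reduced subset of the environmental parameters listed earlier, and the single registered model reuses the coefficients of the worked example.

```python
from typing import Dict, Iterable, NamedTuple


class Scene(NamedTuple):
    """A combination of environmental parameters (reduced for illustration)."""
    protocol: str
    medium: str
    register_type: str
    baud_rate: int


class LinearModel(NamedTuple):
    """Per-scene relation T = f(L) = slope * L + intercept."""
    slope: float
    intercept: float

    def predict(self, register_length: int) -> float:
        return self.slope * register_length + self.intercept


def strategy_time(scene: Scene, group_lengths: Iterable[int],
                  models: Dict[Scene, LinearModel]) -> float:
    """Sum the predicted acquisition times of all groups in one strategy."""
    model = models[scene]  # any change of environmental parameter is a different key
    return sum(model.predict(length) for length in group_lengths)


rtu_scene = Scene("modbus-RTU", "RS-485", "holding register", 9600)
models = {rtu_scene: LinearModel(slope=5.0, intercept=7.025)}
print(strategy_time(rtu_scene, [2, 3], models))
```

Because a scene is an immutable tuple of parameters, changing any one parameter produces a different key, which corresponds to a separate sub-classifier.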

In this embodiment, a combination of valid groups that together contain all nodes, with each node contained exactly once, is determined as one group acquisition strategy.

Determining all possible grouping collection strategies based on the screened effective groups, wherein the specific process comprises the following steps:

The valid groups are placed into rows so that all valid groups in a row share the same starting node, and the rows are ordered by the address of the starting data node; for example:

Fifth row: E

Fourth row: D DE

Third row: C CD CDE

Second row: B BC BCD BCDE

First row: A AB ABC ABCD ABCDE

First, the first group of a group acquisition strategy is selected: the first row is polled in order and one of its groups is chosen as the first group. Once every group in the first row has been tried, all acquisition strategies have been traversed.

Next, the tail node of the group most recently added to the strategy is determined, and the node that follows it is taken as the starting node of the next group.

The row for that starting node is found and each of its groups is selected in turn; this continues until the last node E has been included.

For example, ABC in the first row is selected as the first group of the acquisition strategy and added to the strategy set; node D is then the head node of the next group, and a group is selected from the fourth row, where either D or DE can be chosen. If group D is selected, group E is then selected from the fifth row, giving the group acquisition strategy ABC/D/E; if group DE is selected, the group acquisition strategy ABC/DE is obtained.

The sum of the acquisition times of all groups in each group acquisition strategy is calculated, and the strategy with the shortest acquisition time is selected as the optimal group acquisition strategy of the subset.
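The row-by-row traversal and the final selection can be sketched as a recursive enumeration. Everything here is illustrative: the function names are hypothetical, and `linear_time` assumes one register per node address and reuses the coefficients of the worked regression example to stand in for the trained model.

```python
def enumerate_strategies(nodes, valid_groups):
    """List every way to cover `nodes` once, in order, using only valid groups."""
    valid = set(valid_groups)

    def extend(start):
        if start == len(nodes):
            return [[]]  # the last node has been covered
        strategies = []
        for end in range(start + 1, len(nodes) + 1):
            group = tuple(nodes[start:end])  # a group from the row starting at this node
            if group in valid:
                strategies.extend([group] + rest for rest in extend(end))
        return strategies

    return extend(0)


def linear_time(group, slope=5.0, intercept=7.025):
    """Assumed time model: T = slope * L + intercept, with L = address span + 1."""
    return slope * (group[-1] - group[0] + 1) + intercept


def best_strategy(strategies, group_time=linear_time):
    """Pick the strategy whose groups have the smallest total acquisition time."""
    return min(strategies, key=lambda strategy: sum(group_time(g) for g in strategy))


letters = list("ABCDE")
all_groups = [tuple(letters[i:j]) for i in range(5) for j in range(i + 1, 6)]
print(len(enumerate_strategies(letters, all_groups)))
```

When every one of the 15 groups is valid there are 16 strategies, including ABC/D/E and ABC/DE from the example; screening reduces this number before the times are summed.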

Step S104: and combining the optimal grouping acquisition strategies of all the subsets to obtain the optimal grouping acquisition strategy of the whole data node set.

Compared with the traditional edge acquisition method or the fixed grouping edge acquisition method, the method can find the optimized grouping acquisition strategy by using the shortest time, has shorter time for edge acquisition and higher efficiency, can greatly reduce the time delay of edge acquisition and improve the real-time property of edge acquisition.

Example two

In one or more embodiments, a machine learning based data node grouping system is disclosed, comprising:

means for sorting the set of data nodes according to starting address size;

It should be noted that the specific working manner of the apparatus is implemented by using the method disclosed in step S101 to step S104 in the first embodiment, and details are not described again.

Example three

In one or more implementations, a terminal device is disclosed that includes a server including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the machine learning-based data node grouping method of the first embodiment when executing the program. For brevity, no further description is provided herein.

It should be understood that in this embodiment the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor or any conventional processor.

The memory may include both read-only memory and random access memory, and may provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.

In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or instructions in the form of software.

The data node grouping method based on machine learning in the first embodiment can be implemented directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules may be located in storage media well known in the art, such as RAM, flash memory, ROM, PROM, EPROM or registers. The storage medium is located in a memory; the processor reads the information in the memory and completes the steps of the method in combination with its hardware. To avoid repetition, this is not described in detail here.

Those of ordinary skill in the art will appreciate that the various illustrative elements, i.e., algorithm steps, described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

Although the embodiments of the present invention have been described with reference to the accompanying drawings, they are not intended to limit the scope of the present invention, and it should be understood that those skilled in the art can make various modifications and variations, without inventive effort, on the basis of the technical solution of the present invention.

Claims

1. A data node grouping method based on machine learning is characterized by comprising the following steps:

sorting the data node set according to the size of the initial address;

2. The machine learning-based data node grouping method according to claim 1, wherein a data node set is split into a plurality of subsets according to a set rule according to an address difference between adjacent data nodes, and the specific process includes:

(4) returning to the step (1) to continue splitting;

3. The machine learning-based data node grouping method according to claim 2, wherein the setting of the condition in step (2) specifically includes:

the number of nodes in the subset does not exceed a set value X;

the address range of the subset does not exceed the set value Y.

4. The machine learning-based data node grouping method according to claim 1, wherein for each subset, the specific process of screening valid packets includes:

acquiring all possible grouping modes of the data nodes in the subset;

and screening the remaining grouping modes except the groups consisting of a single data node, wherein the group screening adopts at least one of the following modes:

① when the address difference between any two adjacent data nodes is smaller than a set value Z, rejecting all groups that contain only one of the two data nodes;

② when the address difference between any two adjacent data nodes is larger than a set value L, rejecting all groups that contain both nodes;

③ if the difference between the address of the first data node and the address of the last data node in a group is larger than the set address range M, rejecting the group;

④ determining the acquisition time of each group, and rejecting groups with low acquisition efficiency based on the acquisition time.

5. The machine learning-based data node grouping method according to claim 1, wherein a combination of valid groups that together contain all data nodes, with each data node contained exactly once, is determined as one group acquisition strategy.

6. The machine learning-based data node grouping method of claim 1, wherein the collection time of each grouping collection strategy is calculated, and the one with the shortest collection time is selected as the optimal grouping collection strategy.

7. The machine learning-based data node grouping method according to claim 1, wherein the machine learning-based method determines the collection time of each group collection strategy, and the specific process includes:

determining environmental parameters for combining to form scenes, wherein the change of each environmental parameter corresponds to a new scene, and each scene corresponds to a sub-classifier;

in a certain scene, crawling grouping acquisition time data corresponding to different register unit lengths; training the sub-classifiers in the scene based on the crawled data to obtain linear regression models of the sub-classifiers;

respectively training the sub-classifiers corresponding to each scene by adopting the method to obtain a linear regression model of each sub-classifier, and further obtaining an integral classifier model;

acquiring the environmental parameter information of each group in the group acquisition strategy, inputting the information into the integral classifier model, and outputting the acquisition time of each group so as to obtain the acquisition time of the group acquisition strategy.

8. An optimized grouping system for improving edge acquisition efficiency, comprising:

means for sorting the set of data nodes according to starting address size;

9. A terminal device comprising a processor and a computer-readable storage medium, the processor being configured to implement instructions; a computer readable storage medium for storing a plurality of instructions adapted to be loaded by a processor and to perform the machine learning based data node grouping method of any of claims 1-7.

10. A computer-readable storage medium having stored therein a plurality of instructions adapted to be loaded by a processor of a terminal device and to perform the machine learning based data node grouping method of any one of claims 1-7.