CN113093682A

CN113093682A - Non-centralized recursive dynamic load balancing calculation framework

Info

Publication number: CN113093682A
Application number: CN202110382691.6A
Authority: CN
Inventors: 谷晓英; 赵春海; 韩建枫; 马云鹏; 吴玉霄
Original assignee: Tianjin University of Commerce
Current assignee: Tianjin University of Commerce
Priority date: 2021-04-09
Filing date: 2021-04-09
Publication date: 2021-07-09
Anticipated expiration: 2041-04-09
Also published as: CN113093682B

Abstract

The invention discloses a non-centralized recursive dynamic load balancing calculation framework comprising a KVM switcher, a switch and a plurality of industrial personal computers. The KVM switcher, the switch and the plurality of industrial personal computers are arranged in a hollow cluster cabinet, and a plurality of industrial cameras are arranged outside the cluster cabinet. The KVM switcher is connected to each of the industrial personal computers and is used for inputting a user's control instructions and forwarding them to the industrial personal computers for execution. The switch is connected to each of the industrial personal computers and each of the industrial cameras to realize data communication. Each industrial camera shoots and collects product images and sends them through the switch to an industrial personal computer for image analysis and processing, and each industrial personal computer serves as an industrial personal computer node on the local area network. The invention allows calculation tasks to overflow sequentially among the industrial personal computers, distributes the calculation tasks flexibly, and forms a clear resource requirement boundary while ensuring that the calculation tasks are completed concurrently and efficiently.

Description

Non-centralized recursive dynamic load balancing calculation framework

Technical Field

The invention relates to the technical fields of cloud computing, computer software, cluster computing, image processing and the like, in particular to a non-centralized recursive dynamic load balancing computing architecture.

Background

Computer image processing algorithms, data analysis, feature extraction and classification are of great significance in the field of machine vision. With the rapid progress of industrial production technology, large-scale assembly line production has become very common.

At present, in the edging process on a flat glass production line, products are easily abraded and damaged because glass is hard and brittle and because the abrasive wheel installed on the edging machine wears down. Because the glass products on the production line come in many types, the types of damage are also varied. Therefore, developing machine vision detection equipment, installing it on the flat glass production line, and using image processing algorithms for quality supervision and detection of the glass processing line can greatly improve the level of product quality control.

Referring to fig. 1, for a product on a flat glass production line, a specific product detection process is as follows:

first, photoelectric identification is used to determine whether a product on the flat glass production line has been conveyed to the detection position;

then comes the image capture stage: the main control computer notifies the camera to capture images, and images are captured into the computer continuously and at high speed to form an image analysis task pool;

then comes the task allocation stage: the tasks in the task pool are distributed to the computing nodes for calculation (namely image analysis processing);

finally, the execution results of the groups of calculation tasks on the multiple computing nodes are collected to obtain a final analysis result, which is output to the blanking equipment, and the blanking equipment removes the products that do not meet the quality requirement.

In this product detection process, the calculation task is mainly the image analysis processing task. However, because the production speed of the flat glass production line changes frequently, the parallel computing cluster architectures adopted in the prior art often leave computing resources either insufficient or wasted, which is the problem of unbalanced dynamic load.

In addition, in existing parallel cluster architectures built on the bulk synchronous parallel (BSP) computing model, all computing nodes take part in the work, so the minimum amount of architecture that the current computing load actually requires cannot be clearly shown.

In addition, existing computing task processing adopts a shared storage structure built on the parallel random access machine (PRAM) model, so the efficiency of the computing model is directly determined by the shared storage space and its read-write design. This is not suitable for a computing architecture that handles large volumes of data and requires frequent IO on the task pool.

Disclosure of Invention

The invention aims to provide a non-centralized recursive dynamic load balancing computing framework aiming at the technical defects in the prior art.

Therefore, the invention provides a non-centralized recursive dynamic load balancing calculation framework which is characterized by comprising a KVM switcher, a switch and a plurality of industrial personal computers;

the KVM switcher, the switch and the plurality of industrial personal computers are arranged in the hollow cluster cabinet;

a plurality of industrial cameras are arranged outside the cluster cabinet;

the KVM switcher is respectively connected with the plurality of industrial personal computers, and is used for inputting control instructions of users and forwarding the control instructions to the industrial personal computers for execution;

the switch is respectively connected with a plurality of industrial personal computers and a plurality of industrial cameras to realize data communication;

each industrial camera is used for shooting and collecting product images and sending the product images to an industrial personal computer through a switch for image analysis and processing;

each industrial personal computer is respectively used as an industrial personal computer node on the local area network;

the local area network comprises a switch and a plurality of industrial personal computers connected with the switch.

Preferably, a power strip is further installed below the switch;

the power strip is connected to the power line plugs of the KVM switcher, the switch and the industrial personal computers, and supplies working power to these devices by connecting to an external alternating current power supply.

Preferably, the plurality of IPCs are located at a position between the KVM switch and the switch.

Preferably, for each industrial personal computer node, the number of remaining calculation tasks in the node and the number of calculation results collected after calculation tasks are completed are counted in real time;

and when the number of remaining calculation tasks in the industrial personal computer node has dropped to zero and the number of calculation results collected after the calculation tasks are completed equals the total number of calculation tasks, the overall calculation result of the industrial personal computer node is synthesized.

Preferably, when the non-centralized recursive dynamic load balancing computing architecture is used for product detection, the method specifically includes the following detection steps:

in the first step, the industrial cameras in the image capturing mechanism continuously capture images of the products on the production line and transmit the images to the head industrial personal computer node, and the head industrial personal computer node, acting as the management node, constructs an initial image analysis task pool;

the head industrial personal computer node is the industrial personal computer connected to all the industrial cameras in the image capturing mechanism;

in the second step, the head industrial personal computer node, acting as a management node, progressively distributes calculation tasks to the next industrial personal computer node and, acting as a computing node, performs image analysis processing on the images in the initial image analysis task pool, that is, executes its own local calculation tasks;

in the third step, for any industrial personal computer node in the computing architecture, whenever the task pool of the node is not empty, whether the calculation task load of the node is too high is judged in real time; if so, calculation tasks are progressively distributed to the next industrial personal computer node, the remaining calculation tasks that were not distributed are executed locally, and after the calculation tasks have been executed the node's calculation result is sent to the previous industrial personal computer node or to preset equipment on the external production line.

Preferably, in the third step, whether the calculation task load of the industrial personal computer node is too high is judged by the following algorithm:

(RCost > 75%) && (n × T > t);

wherein && is a logical AND;

RCost is the calculation task load of the industrial personal computer node;

n is the number of remaining calculation tasks, T is the preset calculation period, and t is the real-time requirement time;

wherein the real-time requirement time t is less than or equal to Tmax;

Tmax is obtained as follows: timing starts when a product to be detected has just entered the detection station and ends when the next product to be detected enters the detection station, and the measured time is the time Tmax available for detecting one product.

Preferably, the real-time requirement time t is Tmax/1.2.
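
The judgment condition above can be sketched as a small Python function. This is only a minimal illustration: it assumes that RCost is expressed as a percentage, that the second clause compares n × T with t, and that all times are in milliseconds. The function and parameter names are hypothetical and do not come from the patent.

```python
def real_time_requirement(t_max_ms: float, margin: float = 1.2) -> float:
    """Real-time requirement time t = Tmax / 1.2 (the preferred value in the text)."""
    return t_max_ms / margin


def is_overloaded(rcost_percent: float, remaining_tasks: int,
                  period_ms: float, t_ms: float) -> bool:
    """Return True when (RCost > 75%) && (n * T > t)."""
    return rcost_percent > 75 and remaining_tasks * period_ms > t_ms


# Hypothetical numbers: a new product reaches the station every 1200 ms.
t = real_time_requirement(1200.0)          # about 1000 ms
print(is_overloaded(80.0, 12, 100.0, t))   # True: load above 75% and 12 * 100 ms exceeds t
print(is_overloaded(80.0, 8, 100.0, t))    # False: 8 * 100 ms fits within t
print(is_overloaded(60.0, 12, 100.0, t))   # False: load is not above 75%
```

Both clauses must hold: a busy node that can still finish its remaining tasks within t keeps its work, and so does a node with a long queue but a low measured load.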

Preferably, the industrial camera is an area-array camera with horizontal and vertical reference resolutions of 1920 pixels and 1200 pixels respectively;

for the area-array camera with horizontal and vertical reference resolutions of 1920 pixels and 1200 pixels respectively, the preset calculation period T_Initial is set to 100 ms;

firstly, when the target width or the target height of the area-array camera is increased by X times (X is the magnification, X > 0) and the detection accuracy is kept unchanged, the image capture resolution of the industrial camera is adjusted by the same magnification as the image capture width or height; in this case, if X ≥ 1, the preset calculation period corresponding to each industrial personal computer node is Tx = T_Initial × X; if 0 < X < 1, Tx is taken as 100 ms;

correspondingly, Tx is then adopted as the value of the preset calculation period T in the third-step algorithm for judging whether the calculation task load of the industrial personal computer node is too high;

secondly, when the defect radius of the area-array camera is reduced by a factor of Y, that is, the detection precision is improved by Y times (Y is the magnification, Y > 0), then if Y ≥ 1, the preset calculation period corresponding to each industrial personal computer node is Ty = T_Initial × Y²; if 0 < Y < 1, Ty is taken as 100 ms;

correspondingly, Ty is then adopted as the value of the preset calculation period T in the third-step algorithm for judging whether the calculation task load of the industrial personal computer node is too high.
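
The two scaling rules can be sketched as follows. This is a minimal illustration that reads the rules as Tx = T_Initial × X and Ty = T_Initial × Y², with a fixed 100 ms when the magnification is below 1. The function names are hypothetical.

```python
T_INITIAL_MS = 100.0   # preset period for the 1920 x 1200 reference camera
FLOOR_MS = 100.0       # value used when the magnification is below 1


def period_for_field_scale(x: float, t_initial_ms: float = T_INITIAL_MS) -> float:
    """Tx when the target width or height grows X times at unchanged accuracy."""
    if x <= 0:
        raise ValueError("X must be greater than 0")
    return t_initial_ms * x if x >= 1 else FLOOR_MS


def period_for_precision_scale(y: float, t_initial_ms: float = T_INITIAL_MS) -> float:
    """Ty when the defect radius shrinks by a factor of Y (precision up Y times)."""
    if y <= 0:
        raise ValueError("Y must be greater than 0")
    return t_initial_ms * y ** 2 if y >= 1 else FLOOR_MS


print(period_for_field_scale(2))       # 200.0 ms: target height doubled
print(period_for_precision_scale(2))   # 400.0 ms: defect radius halved
print(period_for_field_scale(0.5))     # 100.0 ms: the period is not reduced below 100 ms
```

The period grows linearly with the field of view but quadratically with precision, which matches the pixel count growing in one dimension in the first case and in both dimensions in the second.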

Preferably, in the third step, when the calculation task load is too high, one half of the calculation tasks of the industrial personal computer node are progressively distributed to the next industrial personal computer node.

Preferably, in the third step, each industrial personal computer node other than the tail industrial personal computer node and the head industrial personal computer node sends the calculation result of its own calculation tasks, together with the calculation result received from the next industrial personal computer node, to the previous industrial personal computer node;

in the third step, the tail industrial personal computer node only receives and executes the calculation tasks progressively distributed by the previous industrial personal computer node, and returns the calculation result to the previous industrial personal computer node after the calculation tasks are completed;

the tail industrial personal computer node is an industrial personal computer node whose calculation task load is not excessively high, so that it does not distribute calculation tasks any further;

in the third step, the head industrial personal computer node is used for receiving the calculation result of the next industrial personal computer node, synthesizing the calculation result with the calculation task of the local computer to form a final detection result, and then sending the final detection result to the preset equipment on an external production line.
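
The progressive distribution and the return of results along the chain can be sketched as a recursive function. This is a simplified illustration under several assumptions: only the n × T > t clause of the overload check is modelled, because RCost needs a live measurement; each node splits its pool once and passes on the second half; and a synchronous call stands in for the network transfer, so local and downstream work run one after the other here. `run_chain` and `analyze` are hypothetical names.

```python
from collections import Counter
from typing import Callable, List


def run_chain(tasks: List[str], node: int, node_count: int,
              period_ms: float, t_ms: float,
              analyze: Callable[[str], str]) -> List[str]:
    """Process tasks at `node`, overflowing half of them to node + 1 if overloaded."""
    is_last_machine = node == node_count - 1
    forwarded: List[str] = []
    if not is_last_machine and len(tasks) * period_ms > t_ms:
        keep = len(tasks) - len(tasks) // 2          # this node keeps one half
        tasks, forwarded = tasks[:keep], tasks[keep:]
    # Results of the next node come back to this node ...
    downstream = (run_chain(forwarded, node + 1, node_count, period_ms, t_ms, analyze)
                  if forwarded else [])
    # ... and are merged with the local results before being passed upward.
    local = [f"node{node}:{analyze(task)}" for task in tasks]
    return local + downstream


pics = [f"PIC{i}" for i in range(16)]
results = run_chain(pics, 0, 4, 100.0, 500.0, lambda pic: pic)
per_node = Counter(r.split(":")[0] for r in results)
print(dict(per_node))   # {'node0': 8, 'node1': 4, 'node2': 4}
```

In this run the fourth machine receives nothing: the chain stops at the first node whose share fits within t, which is how the architecture exposes the minimum number of machines the current load requires.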

Compared with the prior art, the technical scheme provided by the invention has the advantages that the non-centralized recursive dynamic load balancing calculation framework is scientific in design, calculation tasks can be sequentially overflowed among a plurality of industrial personal computers, the calculation tasks are flexibly distributed, a clear resource requirement boundary is formed on the basis of guaranteeing the high-efficiency completion of the calculation tasks, and the method has great practical significance.

Drawings

FIG. 1 is a schematic diagram of a conventional product inspection process, in which image capture, task assignment, image analysis, and analysis result collection are part of the process that can be performed using the computing architecture of the present invention;

FIG. 2 is a schematic diagram illustrating a distribution of a non-centralized recursive dynamic load balancing computing architecture in a cluster enclosure according to the present invention;

fig. 3 is a network connection diagram of a product inspection apparatus provided in the present invention, wherein the product inspection apparatus adopts the cluster cabinet structure shown in fig. 2; the product detection equipment comprises m (m is any natural number greater than 1) industrial cameras and n (n is any natural number greater than 1) industrial control computer nodes; the number of the industrial cameras is determined by the structure of a product production line, and the number of the nodes of the industrial personal computer is dynamically determined by operation.

Fig. 4 is a flowchart of the work of performing the quality detection of the product based on the non-centralized recursive dynamic load balancing calculation framework provided by the present invention.

Detailed Description

In order to make the technical means for realizing the invention easier to understand, the following detailed description of the present application is made in conjunction with the accompanying drawings and embodiments. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant application and are not limiting of the application. It should be noted that, for convenience of description, only the portions related to the present application are shown in the drawings.

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.

Referring to fig. 1 to 4, the present invention provides a non-centralized recursive dynamic load balancing computing architecture, which includes a KVM switch 2, a switch 3 and a plurality of industrial computers 4;

the KVM switcher 2, the switch 3 and the plurality of industrial personal computers 4 are arranged in the hollow cluster cabinet 1;

a plurality of industrial cameras are arranged outside the cluster cabinet 1;

the KVM switcher 2 is connected to the plurality of IPCs 4, and is used for inputting a control command of a user and forwarding the control command to the IPCs 4 for execution.

The switch 3 is respectively connected with a plurality of industrial personal computers 4 and a plurality of industrial cameras to realize data communication;

each industrial camera sends the product images it shoots and collects through the switch 3 to an industrial personal computer 4 for image analysis processing (specifically, the head industrial personal computer node is connected to all the industrial cameras; the product images collected by all the industrial cameras are received by the head industrial personal computer node and are then passed down in sequence to the following industrial personal computer nodes for processing);

the local area network comprises a switch 3 and a plurality of industrial personal computers 4 connected with the switch 3.

It should be noted that the image analysis processing technology is an existing conventional product appearance image detection and processing technology, which is not an improvement point of the present invention, and is not described herein again.

In the present invention, in a specific implementation, a power strip 5 is further installed below the switch 3, and the power strip 5 is respectively connected to power plugs of the KVM switch 2, the switch 3, and the industrial personal computer 4, and is used for providing power for these devices by connecting to an external ac power supply.

In the present invention, a plurality of industrial personal computers 4 are located between the KVM switch 2 and the switch 3.

It should be noted that KVM is an abbreviation of Keyboard, Video (display) and Mouse. A KVM switch allows a system administrator to control multiple servers or computer hosts through a single set of keyboard, display and mouse.

In the present invention, as shown in fig. 2, the cluster cabinet adopts the non-centralized recursive dynamic load balancing calculation architecture of the present invention. If several cluster cabinets are located close to one another, a machine room can be built in which the image processing industrial personal computers are deployed centrally, and the external industrial cameras transmit data signals to the industrial personal computers in the machine room through network devices such as switches and hubs.

To make the product detection equipment easy for maintenance personnel without professional computer knowledge to adjust and use, each piece of product detection equipment that adopts the non-centralized recursive dynamic load balancing computing architecture of the invention is designed with at least one KVM switcher and one switch installed. Each set of equipment uses the switch in the cluster cabinet as the core node for data exchange and uses the KVM switcher as the main UI (deployment interface).

In the present invention, as shown in fig. 3, for a cluster cabinet serving as product detection equipment, 4 industrial personal computers are deployed when the equipment is first installed, and these industrial personal computers are connected by network cable to the KVM switcher and the switch in the cabinet; according to the needs of the production line, a plurality of externally mounted industrial cameras are connected to the switch in the cabinet.

It should be noted that a customer who needs to trace product quality defects may deploy a disk array in the cluster cabinet to store the defect pictures from the image processing results, and may use RAID 0 hard disks for persistent storage management of the defect data.

In a specific implementation of the present invention, each industrial personal computer serves as an industrial personal computer node (e.g., a computing node or a management node), and images are processed by an image processing parallel computing architecture composed of the existing OpenMP multithreading framework and the OpenCV open-source function library. The image processing in each single industrial personal computer includes conventional image preprocessing, image foreground/background segmentation, pattern recognition (also called target recognition) and the like.

It should be noted that the OpenMP multithreading framework and the OpenCV open-source function library are not improvements of the present invention; they are conventional processing technologies and are not described herein again.

For each industrial personal computer, image preprocessing involves a large number of traversal operations over two-dimensional image data, and these operations generally process, one by one, each point together with the adjacent points within a limited range around it. Because the points are processed independently of one another, the work is well suited to the OpenMP computing architecture, and using a multi-core industrial personal computer can significantly increase the computing speed.
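
The independence of the per-point work can be illustrated with a small Python sketch. It is not the OpenMP/OpenCV implementation the text refers to: a thread pool stands in for an OpenMP parallel loop, a 3×3 neighbourhood mean stands in for the unspecified preprocessing operation, and all names are hypothetical. In CPython the sketch shows only that rows can be processed independently; it does not demonstrate a multi-core speed-up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

Image = List[List[int]]


def smooth_row(image: Image, y: int) -> List[int]:
    """3x3 neighbourhood mean for each pixel of row y (window truncated at borders)."""
    height, width = len(image), len(image[0])
    out = []
    for x in range(width):
        total = count = 0
        for ny in range(max(0, y - 1), min(height, y + 2)):
            for nx in range(max(0, x - 1), min(width, x + 2)):
                total += image[ny][nx]
                count += 1
        out.append(total // count)
    return out


def smooth(image: Image, workers: int = 4) -> Image:
    """Every output row reads only the unchanged input, so rows can run in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda y: smooth_row(image, y), range(len(image))))


print(smooth([[0, 0, 0], [0, 9, 0], [0, 0, 0]]))   # [[2, 1, 2], [1, 1, 1], [2, 1, 2]]
```

No row writes to data that another row reads, so no locking is needed; the same property is what makes such a loop a candidate for an OpenMP parallel loop in a C or C++ implementation.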

In the pattern recognition process of each industrial personal computer, whether a template library is used for template matching or feature descriptors are used for texture pattern recognition of region points, a large amount of calculation and comparison work is needed. This work is likewise suited to the OpenMP computing architecture.

In summary, each industrial personal computer serves as an industrial personal computer node, and OpenMP and OpenCV are used within each node to form an image processing parallel computing architecture, so that the image processing work the local machine is responsible for can be completed quickly and the local processing results can be synthesized into a local task pool processing result mRetValue.

It should be noted that when multiple computing objects call the OpenMP computing architecture concurrently, the RCost (computing resource overhead) of the industrial personal computer rises rapidly. The short-term overload that a large number of concurrently started calculation tasks causes during node task processing can be resolved by reasonably controlling the task execution frequency.

In the invention, to control the task execution frequency reasonably, the task frequency is established through real-time analysis and automatic adjustment according to the occupation of system resources. Tasks may be started with an initial period T of 100 ms; this is an empirical value determined by several factors, such as the resolution of the image to be processed and the accuracy of defect recognition. In general, the more complex the image processing task, the longer the initial period.

In a specific implementation of the present invention, Table 1 below is an example of task execution statistics for a running computing architecture in which four industrial personal computers are installed. In the table, calculation tasks are distributed to only three industrial personal computer nodes, 0 to 2; industrial personal computer 3 is idle, and maintenance personnel may consider switching its network cable to another cluster cabinet to provide services there.

As shown in table 1 below, for each industrial computer node, the overhead conditions of the computing resources of the industrial computer node (i.e., of all the multiple computing tasks) are continuously collected according to a preset computing period T in the industrial computer node, so that the overhead of the computing resources of the industrial computer node is stabilized to be 40-75% as much as possible, and stable service is provided on the premise of ensuring high utilization of the computing resources.

It should be noted that, with respect to the present invention, the preset calculation period T refers to a stable execution period of the calculation task, and is a calculation time length required for performing calculation processing on one calculation task (for example, performing image detection processing on one image). The determination of the execution frequency (i.e. the determination of the preset calculation period T) is mainly determined by the requirements of the industrial camera on the image capturing resolution, the detection accuracy, the target width and the target height.

1. The image capture resolution is determined by the horizontal/vertical resolution N (px) × M (px) in the camera parameters; for example, a 1920 px × 1200 px area-array camera is an area-array camera with horizontal and vertical reference resolutions of 1920 pixels and 1200 pixels, respectively.

2. The defect radius Er (in mm) indicates the detection accuracy. Er is determined by the requirements of the user's inspection department, such as bubbles with a diameter of less than 0.5 mm, bubbles with a diameter of more than 1 mm and the like.

3. The target width W is the width of the product, in mm, covered by each image capture. Given the image capture resolution, there are N/W pixels per mm, and each pixel represents W/N mm of the image.

4. The target height H is the height of the product, in mm, covered by each image capture. Given the image capture resolution, there are M/H pixels per mm, and each pixel represents H/M mm of the image.

5. According to the Nyquist sampling theorem, the minimum identification point occupies at least 2 pixels (rows or columns). Since the horizontal and vertical imaging precision of a conventional area-array camera is the same, the radius of the minimum identification point is 2H/M mm, which equals 2W/N mm. The minimum identification point may represent either a normal point or a defect point.

6. The number of pixels corresponding to the defect radius Er (in mm) is Er × N/W = Er × M/H.

7. The detection step length during image traversal is l = Er × N/(2W) = Er × M/(2H), in pixels.

For example, under theoretical conditions, an area-array camera with a resolution of 1920 px × 1200 px is used to image a product surface area of 160 mm × 100 mm, which corresponds to approximately 10 × 10 pixels per square millimeter of the product surface; the minimum Er is then 2 pixels (rows or columns), which covers about 0.2 mm of the product surface horizontally or vertically.
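
The relations in items 3 to 7 can be checked numerically with a short sketch. It assumes that N is the horizontal resolution paired with the target width W, that M is the vertical resolution paired with the target height H, and that the imaged area in the example is 160 mm × 100 mm. The function names are hypothetical.

```python
def pixels_per_mm(resolution_px: int, target_mm: float) -> float:
    """N/W (or M/H): pixels covering one millimetre of the product."""
    return resolution_px / target_mm


def min_identification_radius_mm(resolution_px: int, target_mm: float) -> float:
    """2W/N (or 2H/M): two pixels expressed in millimetres."""
    return 2 * target_mm / resolution_px


def defect_radius_px(er_mm: float, resolution_px: int, target_mm: float) -> float:
    """Er * N / W: the defect radius expressed in pixels."""
    return er_mm * resolution_px / target_mm


def detection_step_px(er_mm: float, resolution_px: int, target_mm: float) -> float:
    """l = Er * N / (2W): traversal step length in pixels."""
    return er_mm * resolution_px / (2 * target_mm)


N, M = 1920, 1200      # horizontal and vertical resolution in pixels
W, H = 160.0, 100.0    # target width and height in mm (assumed units)

print(pixels_per_mm(N, W), pixels_per_mm(M, H))        # 12.0 12.0
print(round(min_identification_radius_mm(N, W), 3))    # 0.167
print(defect_radius_px(1.0, N, W))                     # 12.0
print(detection_step_px(1.0, N, W))                    # 6.0
```

With these inputs the exact density is 12 pixels per millimetre in both directions, and two pixels cover about 0.17 mm, in line with the rounded figures of roughly 10 × 10 pixels per square millimetre and 0.2 mm given in the text.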

Installation and field testing in a factory environment show that, in an industrial setting, the equipment is inevitably affected by factors such as environmental vibration, changes in light source intensity and product vibration, so an area-array camera with at least 4 times the resolution is required for image capture in order to obtain the same Er value.

In order to determine the value of the preset calculation period T, the industrial camera adopted in the calculation framework provided by the present invention is a standard product, specifically a 1920 px × 1200 px area-array camera with a detection precision of 1 mm. Field tests in the factory environment show that the time overhead of taking one calculation task out of the task pool and completing the single-point processing is 80-100 ms. For ease of calculation, in view of the actual time overhead, and to facilitate implementation and popularization, the preset calculation period T adopted in the calculation framework is set to 100 ms.

That is, the industrial camera used in the present invention is preferably a 1920 px × 1200 px area-array camera (i.e., an area-array camera with horizontal and vertical reference resolutions of 1920 pixels and 1200 pixels, respectively);

for the 1920 px × 1200 px area-array camera, the preset calculation period T_Initial of the invention is preferably set to 100 ms.

In a specific implementation, when a subsequent product is deployed and the target width or the target height of the area-array camera is increased by X times (X is the magnification, X > 0) while the detection accuracy is kept unchanged, the image capture resolution of the industrial camera is adjusted by the same magnification as the image capture width or height; in this case, if X ≥ 1, the preset calculation period corresponding to each industrial personal computer node is Tx = T_Initial × X; if 0 < X < 1, Tx is taken as 100 ms;

correspondingly, Tx is then adopted as the value of the preset calculation period T in the third-step algorithm for judging whether the calculation task load of the industrial personal computer node is too high; for example, if in the above example the image is taken over a range of 160 mm × 200 mm with a 1920 px × 2400 px area-array camera, the preset calculation period is set to 200 ms.

When the defect radius of the area-array camera is reduced by a factor of Y, that is, the detection precision is improved by Y times (Y is the magnification, Y > 0), then if Y ≥ 1, the preset calculation period corresponding to each industrial personal computer node is Ty = T_Initial × Y²; if 0 < Y < 1, Ty is taken as 100 ms;

correspondingly, Ty is then adopted as the value of the preset calculation period T in the third-step algorithm for judging whether the calculation task load of the industrial personal computer node is too high. For example, if the defect radius for the industrial camera in the above example (i.e., the 1920 px × 1200 px area-array camera) is 0.5 mm, the preset calculation period corresponding to each industrial personal computer node is adjusted to Ty = 400 ms.

Table 1: Task execution statistics for a running computing architecture with four industrial personal computers.

In a specific implementation, for each industrial personal computer node, the number of remaining calculation tasks in the node and the number of calculation results collected after calculation tasks are completed are counted in real time. When the number of remaining calculation tasks in the node has dropped to zero and the number of calculation results collected after the calculation tasks are completed equals the total number of calculation tasks, the task pool processing result mRetValue of the node (namely the overall result of its calculation tasks) is synthesized.

In table 1, the total number of calculation tasks is equal to the sum of the number of remaining calculation tasks and the number of calculation results collected after the completion of the calculation tasks.
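
The bookkeeping described above can be sketched with a small class. The class and method names are hypothetical, and representing mRetValue as a list of per-task results is an assumption, since the text does not specify its type.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NodeTaskStats:
    """Per-node counters: total = remaining + collected results at all times."""
    total_tasks: int
    remaining_tasks: int
    results: List[str] = field(default_factory=list)

    def record_result(self, result: str) -> None:
        """One task finished: one fewer remaining, one more collected result."""
        self.remaining_tasks -= 1
        self.results.append(result)

    def is_complete(self) -> bool:
        """No tasks remain and every task has produced a collected result."""
        return self.remaining_tasks == 0 and len(self.results) == self.total_tasks

    def synthesize(self) -> List[str]:
        """Return the node-level result (mRetValue in the text) once complete."""
        if not self.is_complete():
            raise RuntimeError("task pool has not been fully processed")
        return list(self.results)


stats = NodeTaskStats(total_tasks=3, remaining_tasks=3)
for pic in ("PIC0", "PIC1", "PIC2"):
    assert stats.total_tasks == stats.remaining_tasks + len(stats.results)
    stats.record_result(f"{pic}:ok")
print(stats.is_complete())   # True
print(stats.synthesize())    # ['PIC0:ok', 'PIC1:ok', 'PIC2:ok']
```

Checking both counters guards against synthesizing a result while a task has been removed from the pool but its result has not yet been collected.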

Fig. 4 is a flowchart of the work flow of performing the quality inspection of the product (image inspection of the product appearance defects) based on the non-centralized recursive dynamic load balancing calculation framework provided by the present invention.

In fig. 4, each set of equipment contains multiple industrial cameras, which are connected to the switch over the network. Each industrial personal computer node (as a computing node or a management node) is also connected to the switch. In fig. 4, PIC0, PIC1, and so on are image data that are continuously transmitted from the switch to the head industrial personal computer node.

Referring to fig. 4, for the non-centralized recursive dynamic load balancing computation architecture provided by the present invention, when used for product detection, the computation architecture specifically includes the following detection steps:

First step: an industrial camera in the image capturing mechanism continuously captures images (specifically, high-speed continuous capture) of products on the production line and transmits the images to the head industrial personal computer node, which constructs the initial image analysis task pool;

each industrial personal computer is respectively used as an industrial personal computer node;

the head industrial personal computer node is an industrial personal computer used for connecting all industrial cameras in the image capturing mechanism, and is shown in fig. 4;

Second step: the head industrial personal computer node progressively distributes computation tasks to the next industrial personal computer node (that is, it acts as a management node and initiates the transfer of computation tasks) and performs image analysis processing on the images in the initial image analysis task pool (that is, it acts as a computing node);

It should be noted that the head industrial personal computer node is logically a management node first: it acquires images through the image capturing mechanism to form the initial task pool and handles the equipment-facing work of driving the equipment and of identifying, positioning and detecting the product. When tasks are distributed, every node except the tail (that is, the last) industrial personal computer node, which serves only as a computing node, serves as both a management node and a computing node.

After the initial task pool is constructed, the head industrial personal computer node initiates the distribution of computation tasks.

It should be further noted that the head industrial personal computer node executes both management tasks and computation tasks (that is, it serves as a management node and a computing node at the same time). The management tasks executed by the head industrial personal computer node include: progressively distributing computation tasks to the next industrial personal computer node, installing the industrial camera driver, initializing the task pool and the node set, driving the camera to take pictures, and obtaining images from the camera. The computation tasks executed by the head industrial personal computer node include: performing image analysis processing on the images acquired by the industrial camera. The head industrial personal computer node serves mainly as a management node and also as a computing node.

The tail industrial personal computer node is used only to receive and execute the computation tasks (that is, forwarded computation tasks, for example, image analysis processing tasks) progressively distributed by the previous industrial personal computer node, and to return the computation results to the previous industrial personal computer node after the tasks are completed. This node does not distribute computation tasks to a next industrial personal computer node (because there is none) and serves only as a computing node.

The remaining industrial personal computer nodes, other than the head and tail industrial personal computer nodes, execute both management tasks and computation tasks (that is, they serve as management nodes and computing nodes at the same time). The management tasks executed by these nodes include: progressively distributing computation tasks to the next industrial personal computer node. The computation tasks executed by these nodes include: performing image analysis processing on the images acquired by the industrial camera. That is, each of these nodes acts as a computing node relative to the previous node and as a management node relative to the next node.

It should be noted that, for the present invention, the head industrial personal computer node is the industrial personal computer that connects the image capturing devices (for example, all of the industrial cameras) in the computing architecture of the present invention. It plays the leading role in the computing architecture and is an essential component of it. When the architecture is built, the head node is determined through configuration files that define its identity, the parameters of the connected cameras, and the PLC connection parameters.

For the present invention, the tail industrial personal computer node is the node that triggers the recursion termination condition when more than one industrial personal computer in the computing architecture forms a cluster; this node no longer issues computation tasks outward. The termination condition is that the node stably does not trigger the criterion for an excessive computation task load. In other words, the tail industrial personal computer node is a node whose computation task load is not too high; it is the end node of the chain of industrial personal computer nodes and issues no further computation tasks, as shown in fig. 4. For the criterion for an excessive computation task load, refer to the third step below.

For the present invention, referring to fig. 4, the remaining industrial personal computer nodes, other than the head and tail industrial personal computer nodes, are the nodes that non-centralized recursive dynamic load balancing generates and dynamically links into the computing architecture. Starting from the head industrial personal computer node, nodes are discovered one after another in progressive order. Whether discovery continues to the next node is decided by the criterion for an excessive computation task load, and discovery stops when the tail industrial personal computer node is found. The order (that is, the front-to-back sequence) of these industrial personal computer nodes can be sent over the network to the head industrial personal computer node for persistent storage, and each node records its previous node and next node in the form of an IP linked list (that is, a linked list of IP addresses).
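The IP linked list can be pictured as a doubly linked chain in which each node knows only its predecessor and successor. The sketch below is illustrative: `NodeLink`, `build_chain`, and the IP addresses used with it are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class NodeLink:
    ip: str
    prev_ip: Optional[str] = None  # previous node; None for the head node
    next_ip: Optional[str] = None  # next node; None for the tail node


def build_chain(ips: List[str]) -> Dict[str, NodeLink]:
    """Link nodes in discovery order, head first and tail last."""
    chain: Dict[str, NodeLink] = {}
    for i, ip in enumerate(ips):
        chain[ip] = NodeLink(
            ip=ip,
            prev_ip=ips[i - 1] if i > 0 else None,
            next_ip=ips[i + 1] if i + 1 < len(ips) else None,
        )
    return chain
```

For example, `build_chain(["192.168.0.10", "192.168.0.11", "192.168.0.12"])` yields a head node with no predecessor and a tail node with no successor; results travel along `prev_ip` and tasks along `next_ip`.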

Third step: for any industrial personal computer node in the computing architecture, while the task pool of the node is not empty, it is determined in real time whether the computation task load of the node is too high (that is, the computing resource overhead is too high and many computation tasks remain). If so, the node progressively distributes computation tasks to the next industrial personal computer node, executes the remaining computation tasks that were not distributed, and, after the computation tasks are executed, sends its computation result to the previous industrial personal computer node or to a preset device (for example, operation control equipment on an external production line).

In the third step, the calculation task load of the nodes of the industrial personal computer is judged to be too high, and a specific judgment algorithm is as follows:

(RCost > 75) && (n * T > t), where n is the number of remaining computation tasks, T is the preset calculation period and t is the real-time requirement time;

wherein && is a logical AND;

RCost is the computation task load of the industrial personal computer node (that is, the computing resource overhead of the image analysis processing tasks) and may be obtained through an Application Programming Interface (API) of the operating system installed on the industrial personal computer.
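The criterion can be written as a small predicate. This is a minimal sketch that reads the second condition as n × T > t (remaining tasks multiplied by the preset calculation period, compared with the real-time requirement time); the function and parameter names are assumptions, and the RCost value is supplied by the caller rather than read from any particular operating system API.

```python
def is_overloaded(rcost: float, n_remaining: int,
                  period_ms: float, t_ms: float) -> bool:
    """Return True when the node's computation task load is too high.

    rcost       -- RCost, the node's computing resource overhead
    n_remaining -- n, the number of remaining computation tasks
    period_ms   -- T, the preset calculation period in milliseconds
    t_ms        -- t, the real-time requirement time in milliseconds
    """
    # Both conditions must hold: the resource overhead exceeds 75 AND
    # the backlog cannot be finished within the real-time requirement.
    return rcost > 75 and n_remaining * period_ms > t_ms
```

Because the two conditions are combined with a logical AND, a busy node with a short backlog, or an idle node with a long backlog, does not trigger distribution.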

The real-time requirement time t is calculated from the actual conditions of the production line. Specifically, the upper limit Tmax, in milliseconds, of the time (that is, the computation time) that the computing architecture of the present invention needs to complete detection (for example, image detection) of one product is measured, so that the actual detection time is less than or equal to Tmax;

the real-time requirement time t must be less than or equal to Tmax.

In a specific implementation, when the computation time that an industrial personal computer node needs to complete detection (for example, image detection) of one product exceeds t, computation tasks back up; when it is less than t, computing resources sit idle.

In particular, to ensure reliable operation of the computing architecture of the present invention, the real-time requirement time is preferably taken as t = Tmax/1.2.
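As a worked illustration of this margin, the helper below derives t from a measured Tmax. The name `real_time_requirement_ms` and the example value of 120 ms are assumptions; only the divisor 1.2 comes from the text.

```python
def real_time_requirement_ms(tmax_ms: float, margin: float = 1.2) -> float:
    """Return the real-time requirement time t = Tmax / margin.

    tmax_ms -- measured upper limit Tmax of the per-product detection time
    margin  -- divisor applied to Tmax; 1.2 is the preferred value in the text
    """
    if tmax_ms <= 0:
        raise ValueError("Tmax must be positive")
    if margin < 1:
        # A divisor below 1 would make t exceed Tmax, which the text forbids.
        raise ValueError("margin must be at least 1 so that t <= Tmax")
    return tmax_ms / margin
```

For a hypothetical Tmax of 120 ms this gives t of about 100 ms, leaving roughly one sixth of the measured upper limit as headroom.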

In the third step, when the load of the calculation task is too high (namely, the judgment result is true), one half of the calculation tasks of the nodes of the industrial personal computer are distributed to the next industrial personal computer node in a progressive manner.

It should be noted that, when the determination result is true, the industrial personal computer node also takes on the identity of a management node: it uses the network through the switch to find the computing node according to the addressing map (for example, the IP address linked list) and distributes one half of its computation tasks to the next industrial personal computer node (as a computing node) for computation. While receiving the computation tasks, the next industrial personal computer node (as a computing node) starts a concurrent task processing procedure and evaluates its own computation task load. If that load is too high, it in turn takes on the identity of a management node and progressively distributes computation tasks to the following industrial personal computer node (for example, by again distributing one half of the computation tasks of the current node).

In this process, whether each industrial personal computer node serving as a computing node also has the identity of a management node is determined by the computation task load within that node, and each distribution from the task pool records the internal number of the target industrial personal computer node and the number of tasks distributed.
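The progressive halving described above can be simulated in a few lines. The sketch below is a simplification under stated assumptions: each node's RCost is passed in as a fixed number (a real node would measure it through the operating system and it would change as tasks move), each node checks the criterion once on receipt, and the names `distribute`, `visit`, and `RCOST_LIMIT` are hypothetical.

```python
from typing import List, Tuple

RCOST_LIMIT = 75  # threshold from the overload criterion in the text


def distribute(tasks: int, rcosts: List[float], period_ms: float,
               t_ms: float) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Simulate progressive distribution along the node chain.

    Returns (tasks kept per node, log of (target node index, tasks sent)).
    """
    kept = [0] * len(rcosts)
    log: List[Tuple[int, int]] = []

    def visit(node: int, incoming: int) -> None:
        is_last = node == len(rcosts) - 1
        overloaded = (rcosts[node] > RCOST_LIMIT
                      and incoming * period_ms > t_ms)
        if overloaded and not is_last:
            passed = incoming // 2          # one half goes to the next node
            log.append((node + 1, passed))  # record target and task count
            visit(node + 1, passed)         # the next node repeats the check
            kept[node] = incoming - passed
        else:
            kept[node] = incoming           # recursion ends at this node

    if rcosts:
        visit(0, tasks)
    return kept, log
```

With 16 tasks, a 100 ms period, a 400 ms requirement, and simulated RCost values of 90, 90, 90 and 50, the first two nodes each pass half of what they hold, the third node holds 4 tasks that fit the requirement and becomes the end of the chain, and the fourth node is never contacted.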

In the invention, in the third step, for the tail industrial personal computer node (namely the tail end node), the task allocation logic cannot be triggered, and the tail industrial personal computer node only exists as a computing node. This is the end of the progressive distribution of computing tasks.

In the third step, after the tail industrial personal computer node (that is, the end node) finishes processing its computation tasks, it returns the computation result to the previous industrial personal computer node according to the task source. The previous industrial personal computer node, which has both the management node identity and the computing node identity, processes its own computation tasks to synthesize the processing result mRetValue of its task pool, combines it with the computation result cache number SubRetValue received from its child node (that is, the next industrial personal computer node) to obtain the final result cache number RetValue of the local machine (that is, the previous industrial personal computer node), and continues to send the result upward.

In the third step, each industrial personal computer node other than the tail industrial personal computer node (that is, the end node) and the head industrial personal computer node must complete its own computation tasks, combine its own computation result with the computation result received from its child computing node (that is, the next industrial personal computer node) to obtain the final result cache number RetValue, and then return the computation result. That is, each such node sends its own computation result, together with the computation result received from the next industrial personal computer node, to the previous industrial personal computer node.

In the third step, the head industrial personal computer node (that is, the head end node) receives the computation result cache number subRetValue of its child node (that is, the computation result of the next industrial personal computer node), combines it with the computation result of the local machine to form the final detection result, and then sends a signal to the production line for product classification, which ends the detection period of a single product.
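The return path mirrors the distribution path: each node adds its own result to what its child returns. The sketch below assumes, for illustration only, that results are plain counts (one result per completed task); the function name `regress_results` is hypothetical, while the local variables are named after mRetValue and SubRetValue in the text.

```python
from typing import List


def regress_results(kept: List[int], node: int = 0) -> int:
    """Return RetValue for `node`: its own result plus its child's.

    kept[i] is the number of tasks node i processed locally.
    """
    m_ret_value = kept[node]  # mRetValue: result of the local task pool
    if node == len(kept) - 1:
        return m_ret_value    # tail node: no child result to merge
    sub_ret_value = regress_results(kept, node + 1)  # SubRetValue from child
    return m_ret_value + sub_ret_value               # RetValue sent upward
```

Calling `regress_results([8, 4, 4, 0])` on the head node returns 16, so every distributed task is accounted for in the final result.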

In the present invention, on the basis of this computing architecture, the head industrial personal computer node (that is, the head end node) also monitors the distribution of computation tasks across the industrial personal computer nodes in the local area network. When the computation task load of all the industrial personal computer nodes in the local area network is judged to be too high, maintenance personnel are informed of the number of industrial personal computers that need to be added; when the computation load of some nodes is low, maintenance personnel are prompted to reassign that equipment to serve other clusters.
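The monitoring rule can be sketched as a simple decision function. Everything specific here is an assumption made for illustration: the name `capacity_advice`, the `low_rcost` threshold of 25, and the message strings. The text does not say how a low load is detected or how the number of machines to add is computed, so the sketch does not attempt either.

```python
from typing import List


def capacity_advice(rcosts: List[float], overloaded: List[bool],
                    low_rcost: float = 25.0) -> str:
    """Suggest a maintenance action from per-node load observations.

    rcosts     -- RCost of each node, head first
    overloaded -- result of the overload criterion for each node
    low_rcost  -- hypothetical threshold below which a node counts as idle
    """
    if overloaded and all(overloaded):
        # Every node is overloaded: the cluster needs more machines.
        return "add industrial personal computers"
    idle = [i for i, r in enumerate(rcosts) if r < low_rcost]
    if idle:
        # Some nodes are lightly loaded and could serve another cluster.
        return f"nodes {idle} are lightly loaded; consider reassigning them"
    return "no change"
```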

In a specific implementation, each step of the above flow for constructing the non-centralized recursive dynamic load balancing computing architecture adds delay, and the delay grows with the number of nodes. The solution provided is as follows. Once the production line is producing stably, the logical relationship among the nodes is stored, and the progressive task distribution path and the result collection and return path are kept unchanged. For task distribution over the network, the topology through which the first-level management node distributes tasks is optimized to reduce the number of network transmission paths. The returned result data packets are extremely small, and they are still aggregated according to the original logic to obtain the final detection result.

In the concrete implementation, when the production line changes or the computing architecture is adjusted, the maintenance personnel inform the equipment to perform non-centralized recursive dynamic load balancing adjustment, dynamically construct the computing architecture and complete the adaptation work. Therefore, on the basis of ensuring a non-centralized recursive dynamic load balancing calculation architecture, the time overhead of multi-level network transmission is reduced.

In the concrete implementation, when the specification of a product produced on a production line is larger or the requirement on calculation precision is high, the head management node constructs a plurality of recursive dynamic load balancing task pools, and constructs a non-centralized recursive dynamic load balancing calculation framework controlled by a multi-cycle pipeline in a parallel mode of the pipeline.

In the present invention, it should be noted that the present invention uses a dynamic load balancing method to schedule tasks in the task pool. Generally, dynamic load balancing has better performance than static load balancing. The method for realizing dynamic load balancing comprises the following steps: centralized dynamic load balancing and decentralized dynamic load balancing.

In centralized dynamic load balancing, a designated main process generally manages the tasks. A subprocess that is idle or has completed its computing task applies to the main process for a new one, and the main process assigns it a computing task from the queue of unfinished tasks.

Non-centralized dynamic load balancing means that every thread can distribute tasks: a thread can both receive tasks and distribute tasks. There are two implementation methods. In the first, a main process divides the task pool into several sub-task pools and distributes them to the subprocesses for computation. In the second, a group of processes jointly allocates the task pool, and the tasks in the task pool are assigned directly to the processes for computation.

The non-centralized recursive dynamic load balancing computing architecture provided by the invention can provide a computing server architecture which can be directly adjusted by a user and is elastically expandable for machines (such as visual detection equipment) needing to adopt a cluster structure. The flexible architecture not only supports the expansion of new computing nodes when computing resources are insufficient, but also supports the reduction of the number of computing nodes when the computing resources are idle.

It should be noted that the computing architecture of the present invention mainly covers the stages from image capture through collection of the analysis results.

It should be noted that the present invention originates from the product quality inspection requirements of a flat glass production line, and the computing architecture of the present invention is suitable for any application scenario in which fragmented computing tasks are generated in real time and the computation results must be produced in real time. In the application described here, the computation tasks are mainly image analysis processing tasks, but the architecture is not limited to image analysis processing tasks. The invention is a high-performance data processing computing architecture, namely an elastic parallel computing cluster architecture.

Compared with the prior art, the non-centralized recursive dynamic load balancing computing framework provided by the invention has the following beneficial effects:

1. The invention constructs a non-centralized dynamic load balancing computing architecture through recursive task progression and result reduction.

2. The establishment of the computing architecture at run time is divided into two stages: a construction period and a stable optimization period. During the construction period, the system automatically divides the computation tasks layer by layer and collects the results in reverse order, so the network path is longer. After stabilization, the transmission path of data in the network is optimized, forming a stable and high-speed dynamic load balancing computing architecture.

3. After the recursive dynamic load balancing computing architecture provided by the invention is packaged, a pipeline parallel method is used for further parallelization promotion, and the concurrency of the architecture on a fine and complex processing process is effectively optimized.

4. The dynamic load balancing computing architecture is constructed in a recursive manner, so that sequential overflow of computing tasks among a plurality of industrial personal computers in the system is realized, and the computing tasks are flexibly distributed. And forming a clear resource demand boundary on the basis of ensuring that the calculation task can be efficiently completed. After the system runs stably, maintenance personnel can be clearly informed of the number of the industrial personal computers required for calculation, and high-quality adaptation of calculation resources and calculation tasks is achieved.

5. For the invention, when computing resources are redundant, maintenance personnel can put the redundant industrial personal computers into storage or record them, and when some equipment is short of computing capacity, the redundant industrial personal computers can be connected directly into that system for use, thereby realizing flexible management of computing resources among multiple sets of equipment.

6. The invention applies the non-centralized recursive dynamic load balancing computing architecture on top of OpenMP parallelism inside each computing node (industrial personal computer) and a computing cluster formed from the nodes, thereby greatly improving tolerance to adjustments of the production line.

In summary, compared with the prior art, the non-centralized recursive dynamic load balancing computing architecture provided by the invention has a scientific design, can realize sequential overflow of computing tasks among multiple industrial personal computers, flexibly distributes the computing tasks, forms a clear resource requirement boundary on the basis of ensuring efficient and concurrent completion of the computing tasks, and has great practical significance.

The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and these modifications and refinements should also be regarded as falling within the protection scope of the present invention.

Claims

1. A non-centralized recursive dynamic load balancing computing architecture is characterized by comprising a KVM switcher (2), a switch (3) and a plurality of industrial computers (4);

the KVM switcher (2), the switch (3) and the plurality of industrial personal computers (4) are arranged in the hollow cluster cabinet (1);

a plurality of industrial cameras are arranged outside the cluster cabinet (1);

the KVM switcher (2) is respectively connected with the plurality of industrial personal computers (4), and is used for inputting control instructions of users and forwarding the control instructions to the industrial personal computers (4) for execution;

the switch (3) is respectively connected with a plurality of industrial personal computers (4) and a plurality of industrial cameras to realize data communication;

each industrial camera is used for shooting and collecting product images and sending the product images to an industrial personal computer (4) through a switch (3) for image analysis and processing;

the local area network comprises a switch (3) and a plurality of industrial personal computers (4) connected with the switch (3).

2. The non-centralized recursive dynamic load balancing computing architecture according to claim 1, wherein a power strip (5) is further installed below the switch (3);

the power strip (5) is respectively connected with the power line plugs of the KVM switcher (2), the switch (3) and the industrial personal computers (4), and is connected to an external alternating current power supply to provide working power for these devices.

3. The non-centralized recursive dynamic load balancing computing architecture according to claim 1, characterized in that the plurality of industrial personal computers (4) are located between the KVM switcher (2) and the switch (3).

4. The non-centralized recursive dynamic load balancing computation architecture according to claim 1, wherein for each industrial computer node, the number of remaining computation tasks in the industrial computer node and the number of computation results collected after the computation tasks are completed are counted in real time;

5. The non-centralized recursive dynamic load balancing computing architecture according to any one of claims 1 to 4, when used for product detection, comprising in particular the following detection steps:

6. The non-centralized recursive dynamic load balancing computation architecture according to claim 5, wherein in the third step, it is determined that the computation task load of the node of the industrial personal computer is too high, and a specific determination algorithm is as follows:

wherein & & is a logical AND;

wherein the real-time requirement time t is less than or equal to Tmax;

7. The non-centralized recursive dynamic load balancing computation architecture according to claim 6, wherein the real-time requirement time t is Tmax/1.2.

8. The non-centralized recursive dynamic load balancing computing architecture according to claim 6, wherein the industrial cameras are area-array cameras with horizontal and vertical reference resolutions of 1920 pixels and 1200 pixels, respectively;

firstly, when the target width or the target height of the area-array camera is increased by a factor of X, where X > 0 is the multiplying factor, and the detection accuracy is kept unchanged, the image capturing resolution of the industrial camera is adjusted by the same factor as the image capturing width or height; in this case, if X ≥ 1, the preset calculation period corresponding to each industrial personal computer node is Tx = T_Initial * X; if 0 < X < 1, Tx is taken as 100 ms;

9. The non-centralized recursive dynamic load balancing computation architecture according to claim 5, wherein in the third step, when the computation task load is too high, one-half of the computation tasks of the current IPC node are progressively distributed to the next IPC node.

10. The non-centralized recursive dynamic load balancing computation architecture according to claim 5, wherein in the third step, for each of the IPC nodes except the tail IPC node and the head IPC node, the computation result after the computation task of the current IPC node and the received computation result of the next IPC node are sent to the previous IPC node together;