CN109245979B - CANopen master-slave station reliability control method and overall management device thereof - Google Patents


Info

Publication number
CN109245979B
CN109245979B (application CN201811303119.0A)
Authority
CN
China
Prior art keywords
slave station
station
slave
master
heartbeat
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811303119.0A
Other languages
Chinese (zh)
Other versions
CN109245979A (en
Inventor
文长明
文可
卢昌虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Baode Industrial Automation Co ltd
Original Assignee
Hefei Baode Industrial Automation Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Baode Industrial Automation Co ltd filed Critical Hefei Baode Industrial Automation Co ltd
Priority to CN201811303119.0A priority Critical patent/CN109245979B/en
Publication of CN109245979A publication Critical patent/CN109245979A/en
Application granted granted Critical
Publication of CN109245979B publication Critical patent/CN109245979B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 12/00 Data switching networks
    • H04L 12/28 Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L 12/40 Bus networks
    • H04L 12/403 Bus networks with centralised control, e.g. polling
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 43/00 Arrangements for monitoring or testing data switching networks
    • H04L 43/10 Active monitoring, e.g. heartbeat, ping or trace-route
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 12/00 Data switching networks
    • H04L 12/28 Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L 12/40 Bus networks
    • H04L 2012/40208 Bus networks characterized by the use of a particular bus standard
    • H04L 2012/40215 Controller Area Network CAN

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Cardiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Small-Scale Networks (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention discloses a CANopen master-slave station reliability control method and an orchestration manager therefor, applied to a plurality of devices on a CAN bus, where one device is the master station and the remaining devices are slave stations; when the master station starts a slave station, the master station configures the slave station's heartbeat or life cycle. The reliability control method comprises the following steps: having each slave station send heartbeat packets to the master station, or having the master station send life monitoring messages to the slave stations; arranging all slave stations in node order to form a start chain; removing from the start chain any slave station that started successfully or does not need to be restarted, and moving any slave station that failed to start and needs to be restarted to the tail of the start chain; and, when the master station receives no heartbeat packet from a slave station within a preset time, or has sent the life monitoring message several times without receiving a reply from the slave station, performing the corresponding error handling according to the type of that slave station.

Description

CANopen master-slave station reliability control method and overall management device thereof
Technical Field
The invention relates to a reliability control method and a manager therefor in the technical field of communication, and in particular to a CANopen master-slave station reliability control method and an orchestration manager therefor.
Background
CAN is the abbreviation of Controller Area Network, a serial communication protocol standardized internationally by ISO. CAN belongs to the field-bus category and is a serial communication network that effectively supports distributed or real-time control. Its high performance and reliability are widely recognized, and it is widely used in industrial automation, ships, medical equipment, industrial instruments, and the like.
A network based on a CAN bus carries a plurality of devices, and unexpected faults can occur in these devices during operation; when some part of a distributed system stops working correctly, the normal operation of the whole system may be compromised. When a device fails, a reliability control mechanism is needed to detect which device has failed and to lock out the failed device.
Disclosure of Invention
Aimed at the failure of a single module during operation that can occur in a distributed system combining multiple devices, the invention provides a CANopen master-slave station reliability control method and an orchestration manager therefor, which detect failures of the devices and perform selective recovery according to the device types and the system's pre-configuration; the recovery may stop the whole network, reset all devices, or reset a single device, followed by the restart and configuration process after the reset.
The invention is realized by the following technical scheme: a CANopen master-slave station reliability control method, applied to a plurality of devices on a CAN bus, where one device is the master station and the remaining devices are slave stations; when the master station starts a slave station, the master station configures the slave station's heartbeat or life cycle.
The reliability control method comprises the following steps:
having each slave station send heartbeat packets to the master station, or having the master station send life monitoring messages to the slave stations;
arranging all slave stations in node order to form a start chain;
removing from the start chain any slave station that started successfully or does not need to be restarted, and moving any slave station that failed to start and needs to be restarted to the tail of the start chain;
and, when the master station receives no heartbeat packet from a slave station within a preset time, or has sent the life monitoring message several times without receiving a reply from the slave station, performing the corresponding configured error handling according to the type of that slave station.
As a further improvement of the above solution, the master station comprises a heartbeat consumer (HBC) module, and each slave station comprises a heartbeat producer (HBP) module;
the heartbeat consumer module contains an HBC table with a plurality of entries, and each heartbeat producer module corresponds to one entry of the communication object (COB) table;
each entry of the HBC table corresponds to one entry of the COB table and holds the heartbeat information of one slave station;
each heartbeat producer module checks whether the interval between the previous transmission time and the current time exceeds the preset heartbeat time, and sends a heartbeat packet to the master station when it does;
the heartbeat consumer module polls all entries of the HBC table and checks whether the interval between the last heartbeat reception time and the current time exceeds a second preset time; if so, error handling is performed.
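The producer/consumer timing logic above can be sketched as follows. This is a minimal illustrative model, not the patent's implementation: the class names (`HbProducer`, `HbConsumer`) and the concrete millisecond values for the heartbeat time and the second preset time are assumptions.

```python
HEARTBEAT_MS = 500   # producer-side "preset heartbeat time" (assumed value)
TIMEOUT_MS = 1500    # consumer-side "second preset time" (assumed value)

class HbProducer:
    """Slave side: send a heartbeat when the interval since the last
    transmission exceeds the configured heartbeat time."""
    def __init__(self, node_id, heartbeat_ms=HEARTBEAT_MS):
        self.node_id = node_id
        self.heartbeat_ms = heartbeat_ms
        self.last_sent_ms = 0

    def poll(self, now_ms, bus):
        if now_ms - self.last_sent_ms > self.heartbeat_ms:
            bus.append(self.node_id)       # put a heartbeat frame on the bus
            self.last_sent_ms = now_ms
            return True
        return False

class HbConsumer:
    """Master side: one HBC table entry per slave; polling flags every
    entry whose last heartbeat is older than the timeout."""
    def __init__(self, node_ids, timeout_ms=TIMEOUT_MS):
        self.timeout_ms = timeout_ms
        self.hbc = {nid: 0 for nid in node_ids}  # node -> last receive time

    def on_heartbeat(self, node_id, now_ms):
        self.hbc[node_id] = now_ms

    def poll(self, now_ms):
        # Nodes whose heartbeat is overdue and need error handling.
        return [nid for nid, t in self.hbc.items()
                if now_ms - t > self.timeout_ms]
```

For example, a consumer tracking nodes 5 and 6 that has only heard from node 5 will report node 6 as overdue on its next poll.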
As a further improvement of the above scheme, the master station includes an NMTM module with an NMTM module table; the table holds a plurality of NMTM entries, each corresponding to one slave station;
each NMTM entry corresponds to two COB entries of the master's communication object table, and those two COB entries correspond to entries of the slave station's communication object table.
Furthermore, each slave station comprises an NMTS module, which corresponds to one entry of the communication object table;
the NMTM module polls and sends a life monitoring message to each slave station according to the life monitoring time of its NMTM entry; the NMTS module receives the callback and replies with a message to the master station; during the NMTM module's polling, the NMTM module performs the error handling when it has waited for a slave station's reply longer than a third preset time.
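The NMTM polling loop just described — send a life monitoring message when an entry's guard period elapses, report an error when a reply is overdue — can be sketched as below. The entry fields, timing values, and callback shape are illustrative assumptions, not the patent's code.

```python
class NmtmEntry:
    """One NMTM table entry: life-monitoring state for one slave."""
    def __init__(self, node_id, guard_ms, reply_timeout_ms):
        self.node_id = node_id
        self.guard_ms = guard_ms                   # life monitoring period
        self.reply_timeout_ms = reply_timeout_ms   # "third preset time"
        self.last_request_ms = 0
        self.awaiting_reply = False

def nmtm_poll(entries, now_ms, send_guard):
    """One master-side polling pass: send a life monitoring message when
    an entry's guard time elapses; report an error for every entry still
    waiting for a reply past the timeout."""
    errors = []
    for e in entries:
        if e.awaiting_reply and now_ms - e.last_request_ms > e.reply_timeout_ms:
            errors.append(e.node_id)     # slave did not reply in time
            e.awaiting_reply = False
        elif not e.awaiting_reply and now_ms - e.last_request_ms > e.guard_ms:
            send_guard(e.node_id)        # life monitoring request on the bus
            e.last_request_ms = now_ms
            e.awaiting_reply = True
    return errors

def nmts_reply(entries, node_id):
    """Reply callback on the master: the slave answered, clear the flag."""
    for e in entries:
        if e.node_id == node_id:
            e.awaiting_reply = False
```

In a run where node 3 replies and node 4 stays silent, the next polling pass after the timeout reports only node 4 for error handling.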
As a further improvement of the above scheme, the method of arranging all slave stations in node order to form the start chain is as follows:
all slave stations are started in sequence, and each subsequently started slave station checks whether the preceding slave station started successfully and moves forward in the chain once a successfully started predecessor exists.
As a further improvement of the above scheme, the method of moving a slave station that failed to start and needs to be restarted to the tail of the start chain is as follows:
a copy of the failed slave station is appended at the tail of the chain, the original is treated as a node to be removed, and the original is removed when the polling pass over the start chain ends.
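The tail-move mechanism — append a copy, mark the original, remove it when the pass ends — can be sketched as a single pass over a Python list. The `start_fn` callback and its result labels are hypothetical names for the per-node start outcome.

```python
def poll_start_chain(chain, start_fn):
    """One polling pass over the start chain.
    start_fn(node) -> 'ok' (started), 'drop' (no restart needed) or
    'retry' (failed, restart needed).  Retried nodes get a copy appended
    at the tail; the original is marked and removed when the pass ends."""
    to_remove = []
    for node in list(chain):          # snapshot: copies appended during the
        result = start_fn(node)       # pass are handled in the next pass
        if result in ('ok', 'drop'):
            to_remove.append(node)    # leaves the chain
        elif result == 'retry':
            chain.append(node)        # copy at the tail of the chain
            to_remove.append(node)    # original becomes a node to remove
    for node in to_remove:
        chain.remove(node)            # removes the first (original) copy
    return chain
```

After one pass over the chain [1, 2, 3] in which node 2 fails and must be retried, only node 2 remains, now sitting at the tail position for the next pass.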
As a further improvement of the above, the reliability control method further includes:
starting a slave station multiple times;
after a start fails, resetting the slave station and starting it again, and removing any slave station that has failed to start at least twice.
As a further improvement of the above scheme, the error handling method comprises:
judging whether the node is in the control list;
when the node is in the control list, judging whether stopping all nodes is allowed; if so, stopping all nodes and ending the error handling, otherwise judging whether resetting all nodes is allowed;
when the node is not in the control list, ending the error handling;
when resetting all nodes is allowed, resetting all nodes; when resetting all nodes is not allowed, resetting the single node, starting that node's device, and ending the error handling.
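This decision tree reduces to a small function. The parameter names and the returned action labels are illustrative assumptions, not identifiers from the patent.

```python
def handle_node_error(node_id, control_list, allow_stop_all, allow_reset_all):
    """Error-handling decision tree sketched from the steps above."""
    if node_id not in control_list:
        return 'none'            # node not monitored: end error handling
    if allow_stop_all:
        return 'stop_all'        # stop all nodes and finish
    if allow_reset_all:
        return 'reset_all'       # reset all nodes
    return 'reset_node'          # reset and restart the single node
```

The ordering matters: stopping the whole network is checked before any reset, so a configuration that permits both will always stop rather than reset.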
As a further improvement of the above scheme, when a slave station sends heartbeat packets to the master station, the corresponding error handling is performed each time a heartbeat packet from that slave station is missed;
when the master station sends life monitoring messages to a slave station, the corresponding error handling is performed each time a reply to a life monitoring message is missed; and after the error handling has been performed once, the checking of that slave station's life monitoring messages is abandoned.
The invention also provides an orchestration manager for managing a plurality of modules provided in a plurality of devices on the CAN bus; when the orchestration manager manages the modules, it implements the above CANopen master-slave station reliability control method.
The CANopen master-slave station reliability control method and the orchestration manager can resolve stalled connections between the master station and the slave stations, improve the reliability of a distributed system, realize the start-up of a plurality of devices on a CAN bus, and improve the flexibility of the connections between devices.
Drawings
Fig. 1 is a flowchart of the start commands, events, protocols and states of a plurality of devices in embodiment 1 of the present invention;
fig. 2 is a flowchart of the start-up process of the master and slave stations in fig. 1;
fig. 3 is a correspondence diagram of a master station and a plurality of slave stations under the heartbeat mechanism in embodiment 1 of the present invention;
fig. 4 is a correspondence diagram of a master station and a plurality of slave stations under the life-cycle mechanism in embodiment 1 of the present invention;
FIG. 5 is an organizational chart of an NMTM table in embodiment 1 of the present invention;
FIG. 6 is a block diagram of an orchestration manager according to embodiment 3 of the present invention;
FIG. 7 is a flowchart of the first part of the process in which the orchestration manager of embodiment 3 starts the master station;
FIG. 8 is a flowchart of the second part of the process in which the orchestration manager of embodiment 3 starts the master station;
FIG. 9 is a module polling diagram of the orchestration manager of FIG. 6;
FIG. 10 is a functional block diagram of the first part of the state machine of the orchestration manager of FIG. 6;
FIG. 11 is a functional block diagram of the second part of the state machine of the orchestration manager of FIG. 6;
fig. 12 is a flowchart of the orchestration manager of fig. 6 initiating the start of an NMT slave station;
fig. 13 is a flowchart of the first part of a method for starting a CANopen slave station according to embodiment 3 of the present invention;
fig. 14 is a flowchart of the second part of the method for starting a CANopen slave station according to embodiment 3 of the present invention;
fig. 15 is a flowchart of the third part of the method for starting a CANopen slave station according to embodiment 3 of the present invention;
FIG. 16 is a flowchart of the orchestration manager of FIG. 6 checking the configuration;
FIG. 17 is a flowchart of the orchestration manager of FIG. 6 checking the NMT status;
FIG. 18 is a flowchart of the orchestration manager of FIG. 6 initiating error control;
FIG. 19 is a flowchart of the first part of the slave-start state machine of the orchestration manager of FIG. 6;
FIG. 20 is a flowchart of the second part of the slave-start state machine of the orchestration manager of FIG. 6;
FIG. 21 is a flowchart of the third part of the slave-start state machine of the orchestration manager of FIG. 6;
FIG. 22 is a flowchart of the fourth part of the slave-start state machine of the orchestration manager of FIG. 6;
fig. 23 is a schematic block diagram of the AutoHb module configuration state machine according to embodiment 3 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example 1
Referring to fig. 1 to 5, the present embodiment provides a CANopen master-slave station reliability control method, applied to a plurality of devices on a CAN bus, where one device is the master station and the remaining devices are slave stations. When the master station starts a slave station, it configures the slave station's heartbeat or life cycle.
The master station comprises a heartbeat consumer module and each slave station comprises a heartbeat producer module. The heartbeat consumer module contains an HBC table with a plurality of entries; each heartbeat producer module corresponds to one entry of the communication object (COB) table. Each entry of the HBC table corresponds to one entry of the COB table and holds the heartbeat information of one slave station.
The master station also comprises an NMTM module with an NMTM module table; the table holds a plurality of NMTM entries, each corresponding to one slave station. Each NMTM entry corresponds to two COB entries of the master's communication object table, and those two COB entries correspond to entries of the slave station's communication object table. Each slave station comprises an NMTS module, which corresponds to one entry of the communication object table.
The reliability control method comprises the following steps:
(1) Having each slave station send heartbeat packets to the master station, or having the master station send life monitoring messages to the slave stations. Specifically, each heartbeat producer module checks whether the interval between the previous transmission time and the current time exceeds the preset heartbeat time, and sends a heartbeat packet to the master station when it does.
(2) Arranging all slave stations in node order to form a start chain. The arrangement method is as follows: all slave stations are started in sequence, and each subsequently started slave station checks whether the preceding slave station started successfully and moves forward in the chain once a successfully started predecessor exists.
(3) Removing from the start chain any slave station that started successfully or does not need to be restarted, and moving any slave station that failed to start and needs to be restarted to the tail of the start chain. Here, a copy of the failed slave station is appended at the tail of the chain, the original is treated as a node to be removed, and it is removed when the polling pass over the start chain ends.
(4) Starting a slave station multiple times: after a start fails, the slave station is reset and started again, and any slave station that has failed to start at least twice is removed.
(5) When the master station receives no heartbeat packet from a slave station within a preset time, or has sent the life monitoring message several times without receiving a reply from the slave station, performing the configured error handling according to the type of that slave station.
The heartbeat consumer module polls all entries of the HBC table and checks whether the interval between the last heartbeat reception time and the current time exceeds a second preset time; if so, error handling is performed. The NMTM module polls and sends a life monitoring message to each slave station according to the life monitoring time of its NMTM entry. The NMTS module receives the callback and replies with a message to the master station. During the NMTM module's polling, the NMTM module performs error handling when it has waited for a slave station's reply longer than a third preset time.
In this embodiment, the error handling method comprises:
(a) judging whether the node is in the control list;
(b) when the node is in the control list, judging whether stopping all nodes is allowed; if so, stopping all nodes and ending the error handling, otherwise judging whether resetting all nodes is allowed;
(c) when the node is not in the control list, ending the error handling;
(d) when resetting all nodes is allowed, resetting all nodes; when resetting all nodes is not allowed, resetting the single node, starting that node's device, and ending the error handling.
When a slave station sends heartbeat packets to the master station and the master station fails to receive a heartbeat packet from that slave station, the corresponding error handling is performed; when the master station sends a life monitoring message to a slave station and receives no reply to it, the corresponding error handling is performed, and after the error handling has been performed once, the checking of that slave station's life monitoring messages is abandoned.
The CANopen master-slave station reliability control method and the orchestration manager can resolve stalled connections between the master station and the slave stations, improve the reliability of a distributed system, realize the start-up of a plurality of devices on a CAN bus, and improve the flexibility of the connections between devices.
Example 2
The present embodiment provides an orchestration manager for managing a plurality of modules provided in a plurality of devices on a CAN bus. When the orchestration manager manages the modules, it carries out the steps of the CANopen master-slave station reliability control method of embodiment 1.
Example 3
Referring to fig. 6, the present embodiment provides an orchestration manager for managing a plurality of modules disposed in a plurality of devices on a CAN bus.
To make the device modular and configurable, the different functions must be modularized, each module living in a single file and being individually configurable. In the following description the orchestration manager is denoted CopMgr; it orchestrates all the modules.
At the very bottom is the communication object table (COB), the layer that interfaces directly with the underlying CAN driver.
On top of the COB sit the modules. The PDO module handles process data and is responsible for data synchronization.
The SDOS and SDOC are the service data object server and client, respectively; they are mainly used to transmit SDO messages for remotely reading and writing the object dictionary. Whereas the PDO module handles cyclic data transfer, the SDO modules serve remote parameter setting and transmit data only when a setting is required.
NMT, NMTM and NMTS are mainly responsible for two parts: life-cycle monitoring on the one hand, and device events at start-up on the other.
HBP and HBC handle the heartbeat mechanism, and EMCP and EMCC handle emergency events.
Above these, SRD and SDOM are responsible for dynamic service requests and build on SDOS and SDOC; AutoHb is a module that automatically configures heartbeat and life cycle. CfgMa is a configuration manager that can configure the functions of a slave station using its device description file.
The orchestration manager manages all the modules and implements start-up and reliability control.
Start-up covers the starting and configuration of all slave stations and any errors occurring in that process; reliability control handles potential errors in the running state after start-up. Master start-up comprises starting the master station itself and the master's parallel handling of the start of all slave stations, and it must wait until every slave station has started successfully, or perform the corresponding handling when one has not.
Referring to fig. 7 and fig. 8, the present embodiment further provides a method for starting a CANopen master station, which is applied to a plurality of devices on a CAN bus, where one of the devices is the master station and the other devices are slave stations.
The starting method comprises the following steps:
Step S1: after the device is powered on, determine whether the device is configured as an NMT master station.
When the device is not configured as an NMT master station, it checks whether it is allowed to start by itself; if so, the device jumps automatically to the NMT Operational state and then enters NMT slave mode, otherwise it enters NMT slave mode directly.
When the device is configured as an NMT master station, step S2 is executed: the NMT flying master process. In this embodiment, the flying master process works as follows: if a plurality of devices are configured as masters, arbitration is performed; the device with the higher priority wins and serves as the master station, and the devices with lower priority become slave stations.
A master station that loses the NMT flying-master arbitration enters NMT slave mode.
After the master station wins the NMT flying-master arbitration, step S3 is executed to determine whether the master station requires the LSS service.
When the master station requires the LSS service, step S4 executes the LSS master processing. LSS is the layer setting service, used to set a slave station's identification information, which comprises the device type, identity, coding, version number and serial number.
After the master station does not require the LSS service, or has completed the LSS master processing, step S5 is executed: determine whether any slave station's active bit is set. If so, reset only the slave stations whose active bit is not set; if not, reset all slave stations. A slave station whose active bit is set is not restarted remotely, and if no slave station has the bit set, all slave stations are restarted at once. This configuration ensures that certain special slave stations are not restarted by the master station, preserving their special role; it is useful in applications where a slave device performing a special task must not be restarted by the master at an arbitrary time.
Step S6: determine whether all mandatory (forced-start) slave stations have started successfully.
When a mandatory slave station fails to start, the start of the master station is halted.
When all mandatory slave stations have started successfully, step S7 is executed to determine whether the master station is allowed to enter the Operational state automatically.
When the master station is not allowed to enter the Operational state automatically, it waits for the application to bring it into the Operational state.
Step S8: determine whether the master station is allowed to bring all nodes into the Operational state remotely.
When the master station is not allowed to bring all nodes into the Operational state remotely, it jumps to normal operation and the start is finished.
When the master station is allowed to bring all nodes into the Operational state remotely, step S9 is executed to determine whether the optional slave stations started successfully. Completing the start procedure does not mean that every slave station started successfully: some slave stations may be missing or encounter other errors and thus fail to start. For most slave stations the procedure simply continues after a failed start, but a slave station whose task is critical can be configured as a mandatory slave station; every mandatory slave station must start successfully, otherwise the master station enters a start-suspended state.
When all slave stations have started successfully, all slave stations are brought into the Operational state and the master jumps to normal operation, finishing the start. When some slave station has not started successfully, the successfully started slave stations are first brought into the Operational state, and the master then jumps to normal operation, finishing the start.
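Steps S6 to S9 boil down to a small decision over the set of slave start results. The sketch below assumes each slave is summarized as a (mandatory, started_ok) pair, which is an illustrative simplification; the function name and action labels are invented.

```python
def master_boot_outcome(slaves):
    """Decision of steps S6-S9, sketched: `slaves` is a list of
    (mandatory, started_ok) pairs, one per slave station.
    - any mandatory slave failed        -> master start halts
    - every slave started successfully  -> bring all slaves to Operational
    - otherwise                         -> bring only the started ones up"""
    if any(mandatory and not ok for mandatory, ok in slaves):
        return 'halt'
    if all(ok for _, ok in slaves):
        return 'start_all'
    return 'start_partial'
```

A failed optional slave thus degrades the outcome to a partial start, while a failed mandatory slave stops the master's start-up entirely.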
In this embodiment, the determination in step S1 is made as follows:
whether the device is configured as an NMT master is determined by examining bit 0 of object 0x1F80 in the object dictionary; after power-on, whether the device acts as master or slave is thus fixed by configuration. Object 0x1F80 can also be configured with other attributes, as can many other objects; each such object corresponds to a decision, i.e. a configuration or parameter setting. Accordingly, in steps S2 to S9:
whether the master station is allowed to start all nodes remotely is judged by checking bits 1 and 3 of object 0x1F80 in the object dictionary; whether the master station is allowed to enter the Operational state automatically is judged by checking bit 2 of object 0x1F80; whether the NMT flying-master process is performed is judged by checking bit 5 of object 0x1F80; whether a slave station's active bit is set is judged by checking bit 4 of object 0x1F84; and whether a slave station is forcibly started is judged by checking bits 0 and 3 of object 0x1F81.
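The bit checks on object 0x1F80 listed above can be expressed as a tiny decoder. Only the bit positions stated in the text are used; the dictionary key names are invented for illustration and carry no normative meaning.

```python
def nmt_startup_bits(value_1f80):
    """Decode the configuration bits of object 0x1F80 used in steps S1-S9.
    Key names are illustrative; bit positions follow the text above."""
    return {
        'nmt_master':     (value_1f80 >> 0) & 1,  # bit 0: device is NMT master
        'start_all':      (value_1f80 >> 1) & 1,  # bit 1: with bit 3, remote start policy
        'auto_op':        (value_1f80 >> 2) & 1,  # bit 2: automatic entry into Operational
        'start_explicit': (value_1f80 >> 3) & 1,  # bit 3: see bit 1
        'flying_master':  (value_1f80 >> 5) & 1,  # bit 5: NMT flying-master process
    }

# Example: a device configured as NMT master with the flying-master
# process enabled (bits 0 and 5 set).
flags = nmt_startup_bits(0b100001)
```

A real stack would read the value through an SDO access to index 0x1F80 sub-index 0; the decoder only illustrates which bits drive which branch of the start-up flow.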
Therefore, this embodiment can start a plurality of devices on the CAN bus and process them differently according to their configured attributes, realizing flexible device connection and improving the adaptability of the distributed system to different applications.
Referring to FIG. 9, to implement the boot process of the Master station, in CopMgr, CopMgr defines a plurality of states and events of the device, the events are used for synchronization with other modules, and the states are used for different processing in different situations. The CopMgr is a module at the upper layer, and will utilize all other modules, and the CopMgr polls the process processing of all other modules and maintains a state machine of its own, in which:
a) referring to fig. 10, when a device is booted, there is a boot event, the CopMgr jumps to a boot state, checks whether it is configured as a master or a slave in the boot state, where it is configured as a master, then registers an SDO dynamic request, registers its SRD to the SDOM, which is also on the master, and the registration of the SDO dynamic service request goes through multiple steps.
When the registration of the dynamic service request succeeds, the SRD registration success event is activated, the state jumps to the configuration state, and the configuration event is activated. Several events are possible in each state, and each event triggers a different function.
b) In the configuration event of the configuration state, the initial start-up information of the slave stations is set and the reset-sending event is activated; in that event all slave stations are remotely reset. The remote reset is carried out by sending a CAN message using the COB of the NMTM module; the NMTS has a corresponding COB for reception and hooks a callback to handle the receive event, in which the slave station performs the reset operation.
If the COB ID of the reset message is 0, all slave stations receive it, but whether a receiving slave station actually executes the reset depends on the node ID carried in the data segment of the message. If that node ID is 0, all slave stations execute the command; which command is executed depends on the command type in the data segment of the message sent by the master station. In the reset command, if the node ID in the data segment (not the COB ID, which sits in the CAN identifier field in front of the data segment) is not 0, each slave station checks whether it matches its own ID number; if it matches, the slave station responds, otherwise it does not.
c) After all slave stations are reset, the master station activates polling of the slave-station node queues, which mainly implements the start-up of all slave stations. In a broad sense the slave stations start in parallel; microscopically, each slave station executes the first step state one by one, then the second state one by one, and so on. A recording variable counts how many slave stations have finished the start-up process and is decremented to 0 when all of them have finished. The master station then checks whether any slave station is configured as the forced-start type, meaning that its start-up must succeed; completing the start-up process does not imply that the start-up succeeded, since errors can occur along the way. Slave stations with special tasks may be set to the forced-start type because their start-up must succeed, whereas a slave station whose start-up failure does not affect the whole application need not be. If all forced-start slave stations start successfully and the other stations have completed the start-up process, the master station enters either the waiting-application state or the running state directly, according to its configuration; here it is configured to enter the waiting-application state. Whether it waits for the application or enters the suspended state, the application is notified by a callback. After a successful start, the application is informed of the start-success event and activates the running event.
d) Referring to fig. 11, in the waiting-application state, once the application activates the running event, the master station determines from the configuration whether each slave station enters the running state by itself or whether the master station puts it into the running state by a remote command. Here it is configured so that the master sends an NMTM message telling the slaves to enter the running state and then enters the running state itself, which completes the activation of the master and of all slaves.
Referring to fig. 12, in some embodiments, the CopMgr also manages the activation of slave stations.
a) When a master station starts, all slave stations need to be started. "Starting a slave station" refers to starting a single slave station; "parallel processing of slave-station start-up" refers to starting a plurality of slave stations in parallel. Starting involves multiple communication interactions with the slave station, and "starting the slave stations" denotes this whole process.
b) Starting a slave station does not guarantee that it starts successfully. If the slave station does not reply, the master checks whether it is of the forced-start type and whether the operation has timed out. If it is forced and has timed out, the application is notified, the start of that slave station ends, and signal 2 is sent to the main flow that launched the slave start-up process. If the slave station replies, the master checks whether the start succeeded; if not, the application is notified and the start of that slave station ends.
c) When starting a slave station, if it does not reply, a command to remotely reset it is sent, and the start-up process then waits for 1 second. If the slave station is of the forced-start type, it must be checked for a reply within a limited time; once that time expires it is not reset endlessly, and the process then waits.
d) The start of the slaves is handled as part of the master start-up process, in which the master must wait for all slaves to finish starting. This waiting uses the signals depicted in the figure: signal 1 indicates that every slave has gone through a start-up process (a failed or timed-out start still counts as completed), and signal 2 indicates that every slave of the forced-start type has started successfully; the master waits for both signals when starting. In fact, the two signals are implemented with two related variables: variable 1 holds the number of all slave stations and variable 2 the number of forced-start slave stations. Whenever a slave goes through a start-up procedure, variable 1 is decremented by 1; whenever a forced-start slave starts successfully, variable 2 is decremented by 1. When all slave stations have gone through a start-up procedure, variable 1 eventually equals 0, and when all forced-start slave stations have started successfully, variable 2 eventually equals 0. The master station waits on these two variables to judge whether all slaves have started and whether all forced-start slaves have started successfully, and then determines whether to enter the start-suspended state or the waiting-application state.
Signal 1 is thus triggered by decrementing variable 1 at the end of each slave start-up process, while signal 2 must wait until every forced-start slave has started successfully and decremented variable 2.
e) When all slave-station start-up processes have finished, the start of the slaves is complete. The start-up process of a single slave station is described in detail next.
Referring to fig. 13, 14 and 15, the single slave station activation process includes:
1) When a slave station is started, the master first checks whether it is contained in the network list. The node numbers of all slave stations managed by the master station are set in advance: the whole network is configured through the upper computer, and the node numbers of all slave stations are stored in the master station's network list. This list is represented by bit 0 of object 0x1F81, an array with one entry per slave station holding that slave's configuration; if bit 0 of an entry is set, the corresponding slave station is in the network list.
If the node number of the slave station is in the network list, the start continues; otherwise the start of that slave station ends, where "ends" means jumping to the last step of the start-up procedure and checking the start result.
2) After the master station resets the slave station with the remote message, the slave station performs its initial self-start. The start is divided into several steps, each corresponding to one state. When the slave station enters the pre-operational state, it sends a boot-up notification message to the master station. On receiving it, the master station checks whether the slave station is allowed to start, i.e. bit 2 of 0x1F81, and if so begins communication interaction with the slave station.
No data synchronization is performed while the slave station is not in the running state; the slave station must enter the running state after a successful start before data synchronization takes place.
If the slave station is not allowed to start, the process proceeds to error control, which is described later and set aside for now.
3) If the slave station is allowed to start, the master requests its 0x1000 object. Requesting the object is a dynamic SDO request: a dynamic connection request related to SDO is made, and the master's SRD enters the waiting-default-connection state. The SDOM scans and allocates COB IDs for the slave's SDOS and the master's SDOC. Because a default connection is requested, a fixedly allocated COB ID is used rather than a dynamically allocated one; when the SDOM finds that a default connection is requested, it releases the allocated COB ID and does not set the slave's SDOS, since the slave's default SDOS is fixed: the first SDOS table entry is set when the slave's SDOS module is initialized, with a fixed corresponding COB entry whose COB ID is tied to the slave. The default connection request is thus a fixed SDO channel, and once the dynamic SDO connection request succeeds, one SDO transfer is performed to read the slave's 0x1000 object.
4) If the slave station does not reply, the start ends. If it replies, the master checks whether the reply matches what is stored in the master-side 0x1F84 object. If the element of the 0x1F84 array corresponding to the slave station is non-zero and does not match, the start ends and the error is identified: each step records its error type when an error occurs, pending the final error-control processing. If the 0x1F84 element is 0 or the reply matches, the slave's 0x1018 object is requested.
5) The 0x1018 object has four sub-indexes containing four objects. Each is requested in turn, and the master waits for a reply; if there is a reply and the master-side 0x1F85-0x1F88 values are non-zero, it checks whether they match. If they match, the flow continues; if not, the start ends and the reason is recorded.
In fact, objects 0x1000 and 0x1018 correspond to five parameters of the slave station: the device type, manufacturer identifier, product code, version number and serial number. Objects 0x1000 and 0x1018 are remotely accessed through SDO to check whether these five parameters match the elements of the corresponding slave station in the master-side 0x1F84-0x1F88 object arrays.
6) Entry B is an optional flow and C is the normal flow; "optional" means that this part of the functionality can be skipped or retained, but the flow eventually reaches C in either case, as described further below.
7) In the optional flow, the master checks whether the active flag of the slave station is set. If it is set, the slave station is of the active type, and a heartbeat is automatically configured: the slave's heartbeat parameter is acquired to set the copy of that parameter stored locally in the master station, rather than setting the slave's own heartbeat parameter. In other words, a heartbeat consumer is automatically configured once. The heartbeat consumer is the station that receives heartbeat packets: each slave station sends its own heartbeat packets to the master station, so the master is the receiver, the heartbeat consumer HBC (HeartBeat Consumer), while the slave station producing the heartbeat packets is the heartbeat producer HBP (HeartBeat Producer).
8) The master checks the node state and waits for the reply. If no reply is received, the start ends; if the node state is received, the master checks whether the node is currently in the running state. If it is, error-control processing is performed and the flow jumps to D, described later; if it is not, the node's communication is reset.
The significance of this is as follows. A network carries a plurality of devices, among them a master station and slave stations, all of which boot during the first start-up. During operation some slave stations may report errors to the master station, requiring the master to reset all slave stations, or the master station may need to restart its protocol stack for some reason. However, some slave stations should not be reset when other slave stations fail or when the master must restart; such a slave station may have a special task and need to run continuously. For a slave station configured as the active type, the master station therefore does not reset it again when starting.
Thus an active-type slave station is not reset when the master station starts, yet it must still enter the running state, which requires the master station to send a run command, and before that the active-type slave station must have been reset once. The active-type slave station must be reset by the master station the first time, but must not be reset again once it has started and entered the running state. Since the master station cannot know whether an active-type slave station is being reset for the first time or was already reset previously and is simply in the running state after the master restarted, the master does not reset it at start-up; instead it checks the state of the active-type slave station when starting that slave. If the active-type slave station is not in the running state, a first-time reset is needed, after which the start-up flow is entered. Therefore the node state of the active slave station is checked: if it is in the running state, the flow goes directly to D for error control; if not, this is a first-time reset, so the active slave station is reset.
For an inactive slave station, the flow continues directly, since inactive slave stations were already remotely reset during the master station's start-up.
9) The master checks whether the slave-station program version needs to be verified. If so, it is verified and updated, verified again after updating, and the next flow is entered once the versions match; if no verification is needed, the flow proceeds directly to C.
10) The configuration is checked; if it is correct, the error-control service is performed, as described next.
11) The error control service is started.
12) The flow arriving from E means the slave station was not allowed to start, so the process ends directly, accompanied by a successful-start flag.
The flow arriving from D means an active-type slave station is already in the running state and thus must not be reset and started again, so the process ends directly, accompanied by an identifier L indicating that the active slave station is already running.
13) The master then judges whether the device is required to enter the running state. Once the slave station enters the running state, data synchronization begins: data between the network nodes is synchronized, the master station's copy of the slave's input data is kept consistent, and the master station's output data is kept consistent with the slave's outputs.
The end of the slave station's start-up is exactly what the master station's start-up process waits for.
Referring to fig. 16, checking the configuration is part of the slave-station start. It mainly checks whether two objects of the master station are 0: if they are, the slave station's configuration is updated (this is the first configuration); otherwise the master requests the slave's 0x1020 object, which stores the date and time of the slave's last configuration update. If the slave's update date and time match the configuration date and time stored for that slave at the master side, no configuration is needed; otherwise the configuration is updated, the update consisting of parameters of the slave station configured by the master station.
Referring to fig. 17, checking the node status applies to active-type slave stations; the master first checks whether the consumer heartbeat time parameter is non-zero.
To describe this flow, first note that there are two ways to obtain the state of a slave station. One is the heartbeat mechanism, implemented by the HBC and HBP modules: the HBC module is the receiver of heartbeats and the HBP module the sender; the receiver is the consumer and the sender the producer, i.e. a consumer-producer model. The master is the heartbeat consumer (receiver) and the slave the heartbeat producer: the slave sends a CAN packet to the master at intervals. Accordingly the slave contains an HBP module and the master an HBC module, and the COB of the HBP module corresponds to the COB of the HBC module. One HBC module receives the heartbeat packets of multiple HBPs (all are CAN packets; the COB ID determines whose they are); concretely, the master contains an HBC module with an HBC table holding a number of sub-entries. A slave station has only one COB for this purpose, which coincides with the COB of its NMTS, so the master's HBC module stores a receiver entry for each slave's HBP module. Since each slave has only one HBP module, messages sent by an HBP are answered only by the matching HBC table entry: the COB IDs of the HBP module and the HBC table entries are tied to the nodes, giving a CAN-identifier correspondence between them. The slave station sends heartbeat packets to the HBC module at the interval set by its heartbeat time parameter, and each packet contains the slave's current state. The slave sends actively and the master receives passively; this is the heartbeat mechanism.
In one sentence, the heartbeat mechanism means that the slave station decides, according to its heartbeat parameter, how often to send a heartbeat packet to the master station; since each packet contains the slave's current state, checking the node state only requires obtaining one heartbeat packet from the slave.
The other mechanism for acquiring the slave's state is called node monitoring: the master station actively sends a node-monitoring request and the slave station passively replies with a message containing its current state.
When the node state is checked, the master first checks whether the consumer heartbeat time is 0. The consumer heartbeat time is the heartbeat parameter stored at the master for the corresponding slave, but it differs from the slave's own heartbeat parameter: the former is larger than the latter, since the former defines the window within which the slave's heartbeat packet must be received, while the latter is the period at which the slave sends heartbeat packets. Here the consumer is the master station. If the consumer heartbeat time is 0, i.e. the master-side window for receiving that slave's heartbeat is 0, the slave does not use the heartbeat mechanism but the node-monitoring mechanism: the master issues a node-monitoring request and waits for a reply; whether the wait times out without a reply or the slave's reply state is obtained, the check then ends.
If the consumer heartbeat parameter is non-zero, the heartbeat mechanism is used. Since the heartbeat is initiated by the slave, the master only needs to wait for a heartbeat packet: the check ends as soon as one is received; conversely, if the stored consumer heartbeat time is exceeded, i.e. a timeout occurs, the check also ends and the error is recorded with an error identifier.
Referring to fig. 18, starting the error-control service is part of the slave-station start. The heartbeat and node-monitoring mechanisms were used in the preceding check only for active-type slave stations; for a normal-type slave station, the heartbeat or node-monitoring mechanism is in fact already running at this point, because the slave was configured during the earlier configuration check, and the heartbeat mechanism is activated as soon as the slave's heartbeat parameter is configured.
This part mainly detects whether the slave station was configured successfully. If so, the heartbeat or life-guarding mechanism is activated, i.e. the master can receive heartbeat packets or send life-guarding messages and obtain confirmation replies within the configured parameter interval; otherwise the slave station has not started successfully.
The foregoing flow is implemented with events and state machines: the states are the steps of the flow, and the events determine the handling of the different cases.
The slave-station start is mainly implemented by the following steps:
step 1, checking the device type, manufacturer identification, product code, version number and serial number;
step 2, automatically configuring the heartbeat consumer (executed only for the active type);
step 3, checking the state of the active node (executed only for the active type);
step 4, starting the configuration management of the node;
step 5, automatically configuring heartbeat and life guarding (referring to the producer);
step 6, starting life guarding of the slave-station node;
step 7, stopping life guarding of the slave-station node;
step 8, starting error control;
step 9, error processing after finishing.
Referring to fig. 19, in step 1: in the initial state the slave station is idle; after the wait-start event is activated under the control of the master's state machine, it enters the wait-start state. If no additional event is activated in this state, the request-device-type state is entered after a waiting period. Whenever a state is entered, i.e. a state change occurs, a trigger event is activated, which initiates an SDO request for the slave's device type. After the slave replies, the SDO-transfer-success event is activated in the message callback, and the request result is checked against the master-side configuration. If the master-side configuration is non-zero and does not match, the flow jumps to the end state and records the error reason; if it matches, the flow jumps to the request-manufacturer-identification state, requests the slave's manufacturer identification, and performs the check as in the previous state.
In this way the device type, manufacturer identification, product code, version number and serial number of the slave station are checked one after another and compared with the configuration stored at the master. If they match, the next check continues; on any mismatch the flow ends and the error reason is recorded.
Referring to fig. 20, in step 3: after the device type, manufacturer identification, product code, version number and serial number of the slave station have been checked, if they are all correct, or the parameters stored at the master are 0 (indicating they need not be checked), the flow jumps to the auto-configure-heartbeat-consumer state.
As a result of the state change a trigger event is activated; the heartbeat consumer is configured automatically and the state is checked in order to handle active-type slave stations. If an active-type slave station is in the running state, the master is not allowed to reset it, and the master does not reset and restart active-type slave stations during initialization. If such a slave station exists, the master configures its own heartbeat-consumer parameter after starting; this parameter is the master's measure of whether the slave's heartbeat messages have timed out. Because the master was reset, the master as consumer needs to set this parameter again. After the trigger event is activated: if the slave is of the inactive type, or of the active type but just reset, there is no need to configure the heartbeat consumer and check its state; if the slave is of the active type and was not reset, the configure-heartbeat-consumer event of the AutoHb module is activated.
When activated, the configure-heartbeat-consumer event of the AutoHb module triggers the automatic configuration of the heartbeat consumer.
When the configuration succeeds, the AutoHb module activates the SDO success event and the flow jumps to the checking state.
In the checking state, the active type is checked and it is judged whether the slave station was just reset, so only active-type slave stations that were not reset are processed. In this state the master waits for a heartbeat packet from the slave; when one arrives, the heartbeat-receive event is activated and the master obtains the slave's state from the packet. If the active-type slave station is in the running state, the flow jumps to the error-control state; otherwise the slave station is reset and enters the wait-start state, and a start-up process is executed for it again.
If no heartbeat is received, the master checks whether a heartbeat consumer is set. If it is set and the slave's heartbeat packet has still not arrived after the delay, the flow jumps to the end state, the slave's start-up process ends, and the error reason is recorded.
If no heartbeat consumer is set, the master acquires the slave's state through the node-monitoring mechanism: it actively transmits a node-monitoring message and obtains the slave's state from the reply. If the slave is in the running state, the flow jumps to the error-control state; otherwise the slave is reset and the start-up process restarted. If the slave does not reply, the start-up process ends and an error identifier is recorded.
Referring to fig. 21, in step 7: whether the slave station is of the active or the inactive type, the flow then jumps to the configuration-manager state. A configure-node event under the trigger event is a configuration task initiated actively by the application; it is not a link in the start-up process and is therefore an additional task, which can also start the configuration manager. This part belongs to the CfgMa module, which configures the slave's object dictionary according to the content of the device file. If the configuration succeeds, the flow jumps to the end state, ending directly because this path does not belong to the start-up process; if there is no configuration file, the flow jumps to the auto-configure-heartbeat-and-life-guarding state.
The configuration data, or configuration file, is a device description file containing the object-dictionary information to be configured, such as index, sub-index and data; the corresponding objects of the slave station can be configured from this information. The configuration process is implemented by traversing the file and transmitting via SDO.
If the configuration is not an additional task but part of the start-up process, the configuration-manager state is likewise entered under the trigger event. The difference is that on successful configuration the flow jumps to the error-control state to continue the start-up process; it also jumps to the auto-configure-heartbeat-and-life-guarding state when no configuration data is available.
That is, if the configuration succeeds through the configuration manager, the flow jumps directly to the error-control state, and the heartbeat- and life-guarding-related objects need not be reconfigured.
If the heartbeat- and life-guarding-related objects could not be configured in the other way, the flow jumps to the auto-configure-heartbeat-and-life-guarding state. Under the trigger event of this state, the auto-configuration event of the AutoHb module is activated.
This event will cause the AutoHb module to automatically configure the heartbeat and lifecycle. And if the configuration is successful, the AutoHb module activates an SDO success event, if the configuration is successful, the system jumps to the ending state, otherwise, the system jumps to the error control state to continue to perform the next starting process.
Starting or stopping the lifecycle is an additional task that is associated with the AutoHb module and that is responsible for selecting whether the heartbeat mechanism or the lifecycle monitoring mechanism controls the errors of the slave station.
Referring to fig. 22, in the error-control state of step 9, the main role is to check whether the connection to the slave station is dead. To detect a dead connection of the slave station, two mechanisms are provided. One is the heartbeat mechanism, in which the slave station sends heartbeat packets to the master station at intervals. Two interval parameters exist. One is the period at which the slave station sends heartbeat packets, represented by the heartbeat parameter of the slave station; this is the heartbeat producer time parameter. The other is the heartbeat time parameter for each slave station stored at the master station side, which the master station uses to judge whether a heartbeat packet of the slave station has timed out; this master-side parameter is the heartbeat consumer parameter. The heartbeat consumer parameter must be larger than the heartbeat producer time parameter, because receiving and processing the message takes time and a certain delay exists.
In the error-control state, if a heartbeat packet is received (the heartbeat-received event is activated in the receive callback of the heartbeat packet), the connection is normal and the state machine jumps to the end state. If no heartbeat packet is received, the heartbeat consumer parameter is checked: a value of zero indicates that the heartbeat mechanism is not used and the life monitoring mechanism is used instead. The master station then configures an NMTM table entry; NMTM polls all table entries and sends life monitoring messages to the slave stations, which reply to them. A single message timeout, or a long-term timeout (the no-reply case), causes error handling. Error handling is different from error control: error handling is a callback, in which the slave station is reset and jumps to the wait-for-start state.
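The producer/consumer timing relation described above can be expressed as a small check. The 50% margin used to derive a consumer timeout is an illustrative choice, not a value taken from this embodiment:

```c
#include <stdint.h>

/* Derive a master-side consumer timeout from the slave's producer
 * period. The +50% margin is an assumed example value covering the
 * reception and processing delay mentioned in the text. */
uint16_t consumer_time_from_producer(uint16_t producer_ms) {
    uint32_t t = (uint32_t)producer_ms + producer_ms / 2;
    return (t > 0xFFFF) ? 0xFFFF : (uint16_t)t;
}

/* Validity check: a consumer timeout not larger than the producer
 * period would declare a healthy slave station lost. */
int heartbeat_params_valid(uint16_t producer_ms, uint16_t consumer_ms) {
    return producer_ms > 0 && consumer_ms > producer_ms;
}
```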
In the end state, the dynamic SDO connection is released and the state machine jumps to the idle state. The error flag of the slave station set during the start-up process is handled so as to end the start-up process early: if the slave station does not reply, it is reset and jumps to the wait-for-start state; if the slave station started correctly, a remote message is sent to put it into the running state. Once the slave station enters the running state, PDO data synchronization is started; if other errors occur, the manager enters the suspend state.
Referring to fig. 23, an AutoHb module is described next.
The AutoHb module mainly implements multi-step SDO transfer control during the configuration process and is mainly used to configure the heartbeat or life cycle service during the start-up of a slave station.
AutoHb is in the idle state at the beginning; activating one of its events triggers a state jump. There are four main events:
a) configuring heartbeat consumer events
During the start-up of a slave station of the active type, configuration of the heartbeat consumer is executed. When this event is activated, the state machine jumps to the request-heartbeat state and triggers one SDO transfer, then jumps to the wait-for-heartbeat state. If the request succeeds and the heartbeat producer value is zero, it returns to the idle state; otherwise the heartbeat parameter of the heartbeat consumer is obtained. If the consumer parameter is not zero, the two values are compared and the application is notified; if the consumer parameter is zero, the heartbeat consumer value for the corresponding station is set.
That is, configuring the heartbeat consumer is necessary only when the heartbeat producer value is not zero and the heartbeat consumer value is zero.
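The rule just stated reduces to one predicate; the function name is an illustrative assumption:

```c
#include <stdint.h>

/* Per the rule above: the heartbeat consumer needs configuring only
 * when the producer value is non-zero (the slave sends heartbeats)
 * and the consumer value is zero (the master has no timeout yet). */
int need_configure_consumer(uint16_t producer_hb, uint16_t consumer_hb) {
    return producer_hb != 0 && consumer_hb == 0;
}
```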
b) Automatically configuring events
This event is activated in the auto-configure heartbeat and life cycle state; the state machine jumps to the request-heartbeat state and performs an SDO transfer, then jumps to the wait-for-request-heartbeat state. If the heartbeat producer parameter is zero, it jumps to the request-guard-time state. In the wait-for-request-life-cycle state after the SDO transfer, the configuration ends if the life cycle of the slave station is non-zero; otherwise the configuration continues: the heartbeat consumer parameter and the local life cycle parameter are checked, and if the heartbeat consumer parameter is non-zero, or the local life cycle is zero while the heartbeat producer parameter is zero, the state machine jumps to the set-heartbeat state; otherwise it jumps to the set-life-cycle state.
Auto-configuration of heartbeat and life cycle allows only one of the two to be configured: either the heartbeat or the life cycle.
For heartbeats there are producers and consumers: the producer is the slave station and the consumer is the master station. Heartbeat and life cycle parameters exist both at the master station side and at the slave station. During auto-configuration, if the heartbeat of the slave station is non-zero, the slave station has already been configured for the heartbeat mechanism and does not need to be reconfigured.
If the heartbeat of the slave station is zero, i.e. the heartbeat is not configured, whether the life cycle of the slave station is configured is checked. If the life cycle of the slave station is non-zero, it is configured and the auto-configuration process also ends, since this indicates that the slave station uses the life cycle mechanism.
Only when the heartbeat and the life cycle of the slave station are both zero, i.e. the slave station has not selected either way of detecting a dead connection, does the master station need to configure it; which mechanism is selected is determined by the heartbeat and life cycle parameters of the master station.
At this time, the parameters of the master station end have four situations:
heartbeat is zero and lifecycle is zero;
the heartbeat is non-zero and the life cycle is zero;
heartbeat is non-zero and lifecycle is non-zero;
the heartbeat is zero and the life cycle is non-zero.
In the first three cases the heartbeat of the slave station is configured; in the fourth case the life cycle of the slave station is configured.
In the first case, although the heartbeat of the master station is zero (the heartbeat of the master station refers to the maximum heartbeat timeout of the corresponding slave station stored at the master station side), it is set to a default value.
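The four-case decision above can be collapsed into a single function; the enum and function names are illustrative assumptions, not identifiers from this stack:

```c
#include <stdint.h>

typedef enum { CFG_HEARTBEAT, CFG_LIFE_CYCLE } cfg_choice_t;

/* Decide which mechanism the master configures for a slave station
 * whose own heartbeat and life cycle are both zero, from the
 * master-side parameters. Per the text: only "heartbeat zero, life
 * cycle non-zero" selects the life cycle; the other three cases
 * select the heartbeat (a default value is used when the master-side
 * heartbeat is zero). */
cfg_choice_t choose_mechanism(uint16_t master_hb, uint16_t master_life) {
    if (master_hb == 0 && master_life != 0)
        return CFG_LIFE_CYCLE;
    return CFG_HEARTBEAT;
}
```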
Therefore, the heartbeat is configured in the set-heartbeat state and the life cycle in the set-life-cycle state. In the set-heartbeat state, the heartbeat of the slave station is set by SDO transfer using the heartbeat parameter of the corresponding slave station at the master station side, or a default value if the master-side value is zero; the state machine then jumps to the wait-for-set-heartbeat state.
In the wait state, the SDO transfer-complete event is activated, the state machine jumps to the idle state and notifies the slave-station start-up state machine; auto-configuration completes by activating the SDO-complete event of the slave-station start-up state machine.
If setting the life cycle is selected, two parameters of the slave station are set by SDO transfer: the life cycle and the life cycle factor. The life cycle is the interval of the monitoring messages used to judge whether the connection to the slave station is alive; if no reply is received within the interval, the message is considered lost. The life cycle factor means that if no message is replied within the life cycle that many consecutive times, the slave station is considered lost. After the life cycle and life cycle factor are set, the slave-station start-up state machine is notified and the process ends.
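In standard CANopen (CiA 301) the two parameters correspond to the object-dictionary entries 0x100C (guard time) and 0x100D (life time factor); whether this stack uses exactly these indices is an assumption. The effective node lifetime is their product:

```c
#include <stdint.h>

#define OD_GUARD_TIME       0x100Cu  /* guard time in ms (CiA 301) */
#define OD_LIFE_TIME_FACTOR 0x100Du  /* life time factor (CiA 301) */

/* Node lifetime = guard time * life time factor: the slave station is
 * declared lost only after that many guard periods pass with no reply. */
uint32_t node_lifetime_ms(uint16_t guard_time_ms, uint8_t life_factor) {
    return (uint32_t)guard_time_ms * life_factor;
}
```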
c) Initiating a lifecycle event
This event sets the two parameters, life cycle and life cycle factor, of the slave station. The precondition is that the slave station is not under the heartbeat mechanism, i.e. its heartbeat parameter is not set; once the life cycle of the slave station is set, the life cycle mechanism is started.
d) Off lifecycle events
This event stops the life cycle mechanism for the application. The precondition is that the slave station is under the life cycle mechanism. Stopping the life cycle only requires clearing the two parameters, life cycle and life cycle factor, of the slave station; after clearing, the slave station enters the node monitoring protocol.
In this embodiment, the orchestration manager needs to perform reliability control. Reliability control mainly deals with the problem of slave stations lost from network communication: the master station can detect whether a slave station has interrupted communication for some reason. The detection mechanism is implemented by the heartbeat mechanism and the life cycle mechanism.
Reliability control includes the start-up process of the slave station, during which one of the heartbeat mechanism and the life cycle mechanism is selected to monitor the communication connection. If the connection is lost, error handling is generated; depending on the configuration, the error handling selects whether to stop all nodes, reset all nodes, or merely reset the single node and append a new start-up process.
First consider the start-up of a device. The term start-up covers both the start-up procedure of the slave-station start-up state machine described previously and the stages of the internal start-up of the slave station itself; the two parts interact with each other.
With continued reference to fig. 1, when a slave station is powered on, it connects to the network and executes local initialization commands, which cause in turn a local-initialization event, a reset-node event, a pre-reset-communication event, a post-reset-communication event and a pre-operational event. Each event triggers the execution of a module event of the corresponding protocol stack; for reliability control, two modules of the slave station are involved, HBP and NMTS. In the pre-reset-communication event the slave station uses the boot-up protocol: the slave station sends a boot-up message to the master station to indicate that it has started itself, and the master station, after receiving the message, begins the start-up flow of the slave station shown in fig. 2. When the master station receives the boot-up message of a slave station it begins the start-up processing, and if the slave station is in the network list, the node device is started, i.e. the master station does not need to wait and directly enters the request-device-type state.
When the slave station enters the post-reset-communication event of the local events, it adopts the node monitoring protocol and waits under this protocol for the configuration by the master station, i.e. the automatic configuration of heartbeat and life cycle during the slave-station start-up process, because at this moment the slave station has not selected any mechanism as its reliability mechanism.
When the slave station starts successfully, one of the heartbeat or the life cycle has been selected, and reliability control then begins.
The reliability control includes four parts:
1. heartbeat mechanism
Under the heartbeat mechanism, the slave station actively sends heartbeat packets to the master station; the HBP and HBC modules are designed for this. The master station contains the HBC module and the slave station contains the HBP module.
The HBC table contains multiple entries; each entry corresponds to one COB and is responsible for the heartbeat information of one slave station. The HBP module has no module table, only one COB, although this COB is shared with the COB of the NMTS. The polling process of the HBP module checks whether the interval between the previous sending time and the current time is larger than the configured heartbeat time parameter and, if so, sends a heartbeat packet. The polling process of the HBC module polls all entries of the HBC table, checks whether the time since the last heartbeat was received has exceeded the timeout, and if so enters the error handling described above.
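The two polling checks can be sketched as pure predicates; the parameter names are illustrative assumptions. Unsigned subtraction is used so the comparison stays correct across a millisecond-counter wrap-around:

```c
#include <stdint.h>

/* HBP side (slave): send a heartbeat when the configured producer
 * period has elapsed since the last send. A period of 0 disables it. */
int hbp_should_send(uint32_t now_ms, uint32_t last_send_ms,
                    uint16_t producer_time_ms) {
    return producer_time_ms != 0 &&
           (now_ms - last_send_ms) >= producer_time_ms;
}

/* HBC side (master): one table entry per slave station; the entry has
 * timed out when no heartbeat arrived within the consumer timeout. */
int hbc_entry_timed_out(uint32_t now_ms, uint32_t last_recv_ms,
                        uint16_t consumer_time_ms) {
    return consumer_time_ms != 0 &&
           (now_ms - last_recv_ms) > consumer_time_ms;
}
```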
2. Life cycle mechanism
Under the life cycle mechanism, the master station actively sends life monitoring messages to the slave station, and the slave station replies to the master station to indicate that the connection is alive. This functionality is contained in the NMT, NMTM and NMTS modules.
With continued reference to fig. 3, the NMTM module includes a module table (NMTM table), and each table entry corresponds to a slave station, that is, the master station needs to store the relevant information of each slave station, which is used for performing COB communication.
With continued reference to fig. 4, each entry of the NMTM table corresponds to one slave station, i.e. the single-entry structure in the figure. It contains the life cycle parameters and information about the slave station, as well as a COB parameter structure used to set up the COB; this structure contains the transmit and receive COBs that uniquely identify between which NMTM entry of the master station and which slave station the messages are exchanged.
The slave station contains the NMTS module, which has no module table but has one COB; this is the same COB as the one of the heartbeat HBP module above, since the CAN ID is shared. Which protocol uses it at a given time depends on whether the slave station currently uses the heartbeat mechanism or the life cycle mechanism.
In the polling of the NMTM module, life monitoring messages are sent to the slave stations according to the life monitoring time of each NMTM entry; the slave station replies to the master station in the receive callback of its NMTS module. If the reply of a slave station times out, the NMTM polling enters error handling.
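One polling pass over the NMTM table can be sketched as below. The entry layout, the `awaiting_reply` flag and the return convention are all illustrative assumptions about how such a poll could be organized, not the structure of the NMTM module itself:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t  node_id;
    uint16_t guard_time_ms;   /* life monitoring interval */
    uint32_t last_poll_ms;    /* when the last guard message went out */
    uint8_t  awaiting_reply;  /* set while a guard message is outstanding */
} nmtm_entry_t;

/* One polling pass: for each entry whose guard time has elapsed, either
 * a new guard message is due (counted in the return value) or, if the
 * previous one is still unanswered, the entry is flagged for error
 * handling (flags[i] = 1). */
size_t nmtm_poll(nmtm_entry_t *tab, size_t n, uint32_t now_ms,
                 uint8_t *flags) {
    size_t due = 0;
    for (size_t i = 0; i < n; i++) {
        flags[i] = 0;
        if (tab[i].guard_time_ms == 0) continue;      /* entry disabled */
        if (now_ms - tab[i].last_poll_ms >= tab[i].guard_time_ms) {
            if (tab[i].awaiting_reply) {
                flags[i] = 1;               /* reply timed out */
            } else {
                tab[i].awaiting_reply = 1;  /* guard message would be sent */
                tab[i].last_poll_ms = now_ms;
                due++;
            }
        }
    }
    return due;
}
```

The receive callback of the NMTS module at the slave side would clear `awaiting_reply` when the reply arrives.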
3. Startup chain and error handling
The slave-station start-up in the slave-station start-up state machine is the start-up of a single slave station; in practice the master station manages the start-up of multiple slave stations together. The start-up speed of each slave station is not controlled, and any of the other slave stations may start successfully or fail.
To handle the parallel start-up of multiple slave stations and the restart of erroneous slave stations, a start-up chain is designed. The slave stations are first started in node order; after that the start-up speed is uncontrolled, so the second slave station may finish first, or the last one may.
The start-up chain works as follows:
a) All slave stations are arranged in node order. If any slave station starts successfully, it is removed from the chain. Specifically, each later slave station checks whether a successfully started slave station exists ahead of it; if so, it moves forward, and the subsequent slave stations do the same in turn. That is, after one or more slave stations have started successfully, the subsequent slave stations move forward and push the successfully started ones out, which is equivalent to removing them from the chain.
b) A slave station that failed to start and does not need to be started again is removed from the chain in the same way as a successfully started one.
c) If a slave station that failed to start needs to be started again, it is moved to the tail of the chain. In practice a copy of the slave station is appended to the tail, the original is marked as a node to be removed, and it is removed at the end of the chain polling in the same way as a successfully started or no-restart slave station; this is equivalent to moving the failed slave station directly to the tail of the chain.
Therefore, when a slave station fails during the start-up process, it can be reset and started again; restarting means putting the slave station into the waiting state. The reset is not a power-off restart and the program does not run again; only the local events of the slave station change. Reset refers to resetting the node or resetting communication, and the master station issues it as a remote reset command through a reset message.
However, a slave station that never starts successfully is not reset forever; this depends on the configuration. If an error has already occurred, the configuration manager is not restarted after jumping out of the configuration state. That is, during the start-up of a slave station, after the first start-up failure the slave station is reset and a second start-up attempt is made; the manager then jumps out of the configuration state, so if the second start-up is still unsuccessful, the slave station is reset again but no further restart is attempted, and the slave station is removed from the start-up chain. In other words, if the start-up fails and no restart is needed, the slave station is removed from the start-up list; if a restart is needed, the node is moved to the tail of the start-up chain.
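The two chain operations described in a) to c) can be sketched on a simple array; the fixed capacity and the names are illustrative assumptions, the text itself describes a polling-time copy-and-remove scheme with the same net effect:

```c
#include <stddef.h>
#include <stdint.h>

/* Start-up chain as an array of node IDs. Removing a finished node
 * shifts later nodes forward (the "move forward" in the text); a
 * failed node that must retry is moved to the tail. */
typedef struct {
    uint8_t node[16];
    size_t  len;
} start_chain_t;

/* Remove the node at `pos`; subsequent nodes move one slot forward. */
void chain_remove(start_chain_t *c, size_t pos) {
    for (size_t i = pos + 1; i < c->len; i++)
        c->node[i - 1] = c->node[i];
    c->len--;
}

/* Move the node at `pos` to the tail: equivalent to appending a copy
 * and removing the original, as the text describes. */
void chain_move_to_tail(start_chain_t *c, size_t pos) {
    uint8_t id = c->node[pos];
    chain_remove(c, pos);
    c->node[c->len++] = id;
}
```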
With continued reference to fig. 5, error handling occurs when, at run time, the master station fails to receive the heartbeat packets of a slave station, or fails several times to receive the replies to its life cycle monitoring messages. In this case configurable handling is performed depending on the slave-station type. If the slave station is of the type that must start successfully, i.e. the distributed task represented by the slave station does not allow even a short error in the module, the master station may stop the entire network or reset all devices when such a slave-station error is detected. If resetting the device after an error is allowed, the slave station is reset in the error handling and re-enqueued, i.e. added to the start-up chain: since the slave station was removed from the start-up chain after the initial start-up succeeded, it now loses the connection during operation and rejoins the start-up chain.
For the heartbeat mechanism, the loss of heartbeat packets from a slave station generates error handling only once; error handling is not generated endlessly unless the heartbeat packets of the slave station recover. If the heartbeat recovers and is lost again, new error handling is generated.
For the life cycle mechanism, if a slave station does not respond to the life cycle monitoring messages, the slave station is considered lost and error handling is generated once. The slave station is then no longer checked with monitoring messages, and its life cycle is no longer enabled, so life cycle monitoring messages are no longer sent at intervals. To enable the life cycle monitoring mechanism again, the slave station must be configured anew, which requires a new slave-station start-up process.
When an error occurs at a slave station that is in the network node list, whether all nodes are stopped, all nodes are reset, or only the single node is reset is determined by the configuration of the master station; this mainly concerns slave stations that must start successfully. An ordinary slave station can be reset and restarted independently; restarting means adding the slave station to the start-up chain and setting it to the wait-for-start state.
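The configurable decision just described (and spelled out step by step in claim 8) reduces to a small selector; the enum and flag names are illustrative assumptions:

```c
typedef enum {
    EH_IGNORE,      /* node not in the control list: end error handling */
    EH_STOP_ALL,    /* stop all nodes */
    EH_RESET_ALL,   /* reset all nodes */
    EH_RESET_NODE   /* reset the single node and restart its device */
} eh_action_t;

/* Decision flow of the error handling: outside the control list the
 * error is ignored; otherwise the master's configuration selects
 * stop-all, then reset-all, falling back to resetting the single node. */
eh_action_t error_action(int in_control_list, int allow_stop_all,
                         int allow_reset_all) {
    if (!in_control_list) return EH_IGNORE;
    if (allow_stop_all)   return EH_STOP_ALL;
    if (allow_reset_all)  return EH_RESET_ALL;
    return EH_RESET_NODE;
}
```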
In addition, the EMCP and EMCC modules handle internal errors that occur during the operation of the protocol stack. The heartbeat and life cycle mechanisms above monitor the communication connection, whereas internal errors may generate an emergency event. Internal errors mainly arise during the operation of the COB, PDO and NMT:
internal COB errors are drive-related errors, caused when the CAN controller driver is not found or the driver state is wrong; internal NMT errors include errors in the event processing of all modules; PDO errors are errors generated during PDO synchronization, sending PDO packets, PDO encoding, PDO decoding and checking mapping objects.
The emergency producer generates an emergency message; the emergency consumer receives and processes it, mainly by notifying the application so that the application can decide how to handle the emergency.
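For reference, the standard CANopen emergency (EMCY) frame in CiA 301 uses COB-ID 0x080 + node-ID and eight data bytes: a 16-bit error code (little-endian), the error register, and five manufacturer-specific bytes. Whether the EMCP/EMCC modules of this embodiment use exactly this layout is an assumption; the sketch below encodes that standard frame:

```c
#include <stdint.h>
#include <string.h>

/* EMCY COB-ID per CiA 301: 0x080 + node-ID. */
uint16_t emcy_cob_id(uint8_t node_id) {
    return (uint16_t)(0x080u + node_id);
}

/* Encode the 8-byte EMCY payload: error code (2 bytes, little-endian),
 * error register (1 byte), manufacturer-specific field (5 bytes). */
void emcy_encode(uint8_t out[8], uint16_t err_code, uint8_t err_reg,
                 const uint8_t mfr[5]) {
    out[0] = (uint8_t)(err_code & 0xFF);
    out[1] = (uint8_t)(err_code >> 8);
    out[2] = err_reg;
    memcpy(&out[3], mfr, 5);
}
```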
The present invention is not limited to the above preferred embodiments, and any modifications, equivalent substitutions and improvements made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A CANopen master-slave station reliability control method is applied to a plurality of devices on a CAN bus, one device is a master station, the other devices are slave stations, and each slave station is provided with a node number to serve as a node; the method is characterized in that when the master station starts the slave station, the master station configures the heartbeat or the life cycle of the slave station;
the reliability control method comprises the following steps:
enabling the slave station to send a heartbeat packet to the master station, or enabling the master station to send a life cycle monitoring message to the slave station;
arranging all slave stations according to the node sequence and forming a starting chain;
removing the slave station which is successfully started or the slave station which does not need to be started again from the starting chain, and moving the slave station which fails to be started and needs to be started again to the tail of the starting chain;
when the master station does not receive a heartbeat packet of a slave station within a preset time one, or the master station has sent the life cycle monitoring message multiple times without receiving a reply of the slave station, performing corresponding error processing according to the type of the slave station;
the starting method of the master station comprises the following steps:
step S1, after the device is powered on, judging whether the device is configured as an NMT master station;
when the equipment is not configured as an NMT master station, judging whether the equipment allows the equipment to start, if so, automatically jumping to an NMT running state and entering an NMT slave station mode, otherwise, directly entering the NMT slave station mode;
when the device is configured as an NMT master station, executing step S2 to perform NMT flying-master processing on the master station;
after the master station loses the master right in the NMT flying-master processing, entering an NMT slave station mode;
after the master station wins the master right in the NMT flying-master processing, executing a step S3 and judging whether the master station requires the LSS service; the LSS service is the layer setting service, used for setting the identification information of the slave station;
when the master station requires the LSS service, executing step S4 to perform the LSS master station processing;
after the master station does not require the LSS service or completes the LSS master station processing, step S5 is executed, whether the active bit of the slave station is set or not is judged, if yes, the slave station with the active bit not set is reset, and if not, all the slave stations are reset;
step S6, judging whether all the forced start slave stations are successfully started;
stopping the starting of the master station when the forced starting slave station is not successfully started;
when all the forced-start slave stations are started successfully, executing a step S7, and judging whether the master station is configured to automatically enter a running state;
when the master station is not configured to automatically enter the running state, waiting for an application to trigger the master station to enter the running state;
step S8, judging whether the master station allows all nodes to enter the running state by remote commands;
when the master station does not allow all the nodes to enter the running state, skipping to normal operation to finish starting;
when the master station allows all the nodes to enter the running state through the remote command, executing a step S9 and judging whether the optional slave stations are started successfully;
when all the slave stations are started successfully, remotely commanding all the slave stations to enter the running state, and then jumping to normal operation to finish starting;
when some slave stations are not started successfully, remotely commanding the part of the slave stations that started successfully to enter the running state, and then jumping to normal operation to finish starting.
2. The CANopen master-slave reliability control method of claim 1, wherein the master station comprises a heartbeat consumer module, and each slave station comprises a heartbeat producer module;
one heartbeat consumer module comprises an HBC table with a plurality of table entries, and each heartbeat producer module corresponds to one table entry of the communication object table;
one table entry of the HBC table corresponds to one table entry of a communication object table, and each table entry of the HBC table corresponds to heartbeat information of a slave station;
each heartbeat producer module judges whether the time interval between the previous sending time and the current time is greater than a preset heartbeat time, and sends a heartbeat packet to the master station when the time interval is greater than the preset heartbeat time;
and polling all the table entries of the HBC table by each heartbeat consumer module, and if the last heartbeat receiving time and the current time exceed a preset time two, performing error processing.
3. The CANopen master-slave station reliability control method of claim 1, wherein the master station comprises an NMTM module, the NMTM module comprises an NMTM module table, the NMTM module table is provided with a plurality of NMTM entries, and each NMTM entry corresponds to one slave station;
each NMTM table entry corresponds to two COB table entries of a communication object table, and the two COB table entries of the communication object table correspond to table entries of a communication object table of a slave station.
4. The CANopen master-slave station reliability control method of claim 3, wherein the slave station comprises an NMTS module, and the NMTS module corresponds to an entry of a communication object table;
the NMTM module carries out polling and sends a life cycle monitoring message to the slave station according to the life monitoring time of the NMTM table item; the NMTS module receives call-back and replies a message to the master station; in the polling of the NMTM module, the NMTM module performs the error processing when waiting for the slave station to reply for more than a preset time.
5. A CANopen master-slave station reliability control method as claimed in claim 1,
the method for arranging all the slave stations according to the node sequence and forming the starting chain comprises the following steps:
all the slave stations are started in sequence, and the slave stations started subsequently check whether the previous slave station is started successfully or not and move forwards when the slave station with successful starting exists.
6. The CANopen master-slave reliability control method of claim 1, wherein the method for moving the slave station which fails to start and needs to start again to the tail of the start chain comprises:
and adding a copy of the slave station which fails to start and needs to start again at the tail of the chain, taking the original copy as a node to be removed, and removing the original copy at the end of polling of the start chain.
7. The CANopen master-slave station reliability control method of claim 1, wherein the reliability control method further comprises:
starting the slave station for a plurality of times;
and after the secondary starting fails, resetting the secondary station again, and removing the secondary station failed in starting at least twice.
8. The CANopen master-slave reliability control method of claim 1, wherein the error handling method comprises:
judging whether the node is positioned in the control list or not;
when the nodes are positioned in the control list, judging whether to allow all the nodes to stop, if so, stopping all the nodes and finishing error processing, otherwise, judging whether to allow all the nodes to reset;
when the node is not located in the control list, ending error processing;
resetting all nodes when all nodes are allowed to be reset; when all the nodes are not allowed to be reset, resetting a single node, starting equipment of the single node, and ending error processing.
9. The CANopen master-slave station reliability control method of claim 1, wherein when the slave station sends a heartbeat packet to the master station, the slave station performs a corresponding error process every time it loses a reply of the heartbeat packet;
when the master station sends a life cycle monitoring message to the slave station, and the slave station loses a reply of the life cycle monitoring message every time, corresponding error processing is carried out; and after the primary error processing is carried out, the slave station abandons the checking of the life cycle monitoring message.
10. A orchestration manager for managing a plurality of modules provided in a plurality of devices on a CAN bus; the overall management manager is characterized by realizing the CANopen master-slave station reliability control method according to any one of claims 1-9 when managing a plurality of modules; the modules comprise a heartbeat consumer module and an NMTM module of a main station, and a heartbeat producer module and an NMTS module of a slave station.
CN201811303119.0A 2018-11-02 2018-11-02 CANopen master-slave station reliability control method and overall management device thereof Active CN109245979B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811303119.0A CN109245979B (en) 2018-11-02 2018-11-02 CANopen master-slave station reliability control method and overall management device thereof


Publications (2)

Publication Number Publication Date
CN109245979A CN109245979A (en) 2019-01-18
CN109245979B true CN109245979B (en) 2021-07-27

Family

ID=65076733

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811303119.0A Active CN109245979B (en) 2018-11-02 2018-11-02 CANopen master-slave station reliability control method and overall management device thereof

Country Status (1)

Country Link
CN (1) CN109245979B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112383457B (en) * 2020-09-23 2022-01-28 卡斯柯信号有限公司 Safety slave station system based on CANopen protocol
CN115617729A (en) * 2021-07-16 2023-01-17 施耐德电器工业公司 Communication method, communication device and communication system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN201335955Y (en) * 2009-01-16 2009-10-28 深圳市浚海仪表设备有限公司 CAN bus intelligent electric device based on CANopen protocol
CN101626333A (en) * 2009-08-07 2010-01-13 北京和利时电机技术有限公司 Controller area network (CAN) bus system and application layer communication method in same
CN103188122A (en) * 2013-03-19 2013-07-03 深圳市汇川控制技术有限公司 Communication system and communication method based on CAN (Controller Area Network)
US9201744B2 (en) * 2013-12-02 2015-12-01 Qbase, LLC Fault tolerant architecture for distributed computing systems
CN106330531A (en) * 2016-08-15 2017-01-11 东软集团股份有限公司 Node fault recording and processing method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101854300B (en) * 2010-06-02 2012-05-16 北京工业大学 Method for realizing CANopen slave station


Also Published As

Publication number Publication date
CN109245979A (en) 2019-01-18

Similar Documents

Publication Publication Date Title
CN101290587B (en) Realization progress start-up and control process
TW201944236A (en) Task processing method, apparatus, and system
KR20010032846A (en) Method for coordinating network components
CN109245979B (en) CANopen master-slave station reliability control method and overall management device thereof
CN109450757B (en) Method for starting CANopen master station and overall management device thereof
JP7345921B2 (en) OTA differential update method and system for master-slave architecture
CN103516735A (en) Method and apparatus for upgrading network node
CN101980171A (en) Failure self-recovery method for software system and software watchdog system used by same
CN110995481A (en) Configuration method, server and computer-readable storage medium
CN111324385A (en) Starting method of application module, container, control device and readable storage medium
EP0409604A2 (en) Processing method by which continuous operation of communication control program is obtained
CN109361586B (en) CANopen slave station starting method and overall planning manager thereof
JP5921781B2 (en) Programmable controller and control method of programmable controller
CN113965494A (en) Method for fault detection and role selection in a redundant process network
CN112698973A (en) System, method and device for automatic registration and management of modbus equipment
CN102291303B (en) Single board and method for determining primary and standby states for the same
CN109905459B (en) Data transmission method and device
JP3730545B2 (en) Service control application execution method and system
CN111615819A (en) Method and device for transmitting data
CN112261512B (en) Master station remote control method, device, storage medium and equipment of unbalanced polling protocol
US11496564B2 (en) Device state synchronization method and common capability component
JP2011018223A (en) System and method for communicating information
CN108885575B (en) Control device and restoration processing method for control device
CN108599903B (en) Single board starting control method and device
US11734118B2 (en) Software wireless device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant