CN116935069B - Man-machine asynchronous detection method, device and medium based on improved attention mechanism - Google Patents
- Publication number
- CN116935069B (application CN202311187641.8A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
Abstract
The invention provides a man-machine asynchrony detection method, device and medium based on an improved attention mechanism, belonging to the technical field of ventilator man-machine asynchrony detection. Data are acquired and sliced, and each data slice is drawn into an image to generate model samples; the model samples are preprocessed through data set cleaning and image labeling; a yolov5s man-machine asynchrony recognition network model with an improved attention mechanism is built and trained on the training set; patient ventilator data detected in real time are collected and fed into the trained model to determine whether man-machine asynchrony occurs in the patient; finally, each labeled box is judged as asynchronous or not according to its confidence and handled accordingly. The method expresses man-machine asynchrony more efficiently and accurately, and the extracted abstract features are more robust and generalize better.
Description
Technical Field
The invention relates to a man-machine asynchrony detection method, device and medium based on an improved attention mechanism, and belongs to the technical field of ventilator man-machine asynchrony detection.
Background
Mechanical ventilation (MV) is currently one of the most effective life-support measures for patients with respiratory failure and the critically ill. During mechanical ventilation, a mismatch between the ventilator's support and the patient's inspiratory effort is called patient-ventilator asynchrony (PVA). Man-machine asynchrony is one of the most common problems of mechanical ventilation, and its detection is an important topic in clinical practice. At present, automatic detection of man-machine asynchrony is performed on the three breathing waveforms of pressure-time, flow-time and volume-time, using waveform feature rules, machine learning or deep learning.
Traditional methods based on waveform feature rules often place high demands on waveform regularity. The feature rules rely mainly on manually designed feature extractors, whose design requires considerable expert knowledge, and the choice of thresholds is somewhat subjective, so generalization is poor.
Machine learning based methods require extensive and complex parameter tuning, and also suffer from poor generalization and robustness. In particular, their detection performance is unsatisfactory for waveform signals containing substantial high-frequency noise.
Methods based on waveform feature rules and machine learning usually require a feature extraction step. Irregular waveforms may offer no extractable features, and such models depend strongly on the chosen features: a one-sided selection degrades model performance, while redundant features slow the model down.
Current deep learning based man-machine asynchrony detection methods mostly consider only the current breathing waveform, so they either ignore the connection to the preceding periodic waveforms or fail to localize the abnormal region accurately. In actual practice, however, a respiratory therapist judges the type of respiratory asynchrony by combining waveforms from multiple cycles.
Disclosure of Invention
The invention aims to provide a man-machine asynchrony detection method, device and medium based on an improved attention mechanism that express man-machine asynchrony more efficiently and accurately, with more robust extracted abstract features and better generalization.
This object is achieved by the following technical scheme:
a human-machine dyssynchrony detection method based on an improved attention mechanism, comprising:
step 1: and data acquisition, namely performing gas interaction with the breathing machine through simulating the lung, and acquiring the pressure, the flow speed and the capacity on the breathing machine during the interaction.
Step 2: and slicing the acquired pressure, flow rate and capacity, and drawing the data of the data slice into an image generation model sample, wherein the image generation model sample comprises a pressure-time waveform diagram, a flow rate-time waveform diagram and a capacity-time waveform diagram.
Step 3: and preprocessing the model sample through data set cleaning and image labeling.
Step 4: the method comprises the steps of constructing a yolov5s man-machine asynchronous recognition network model for improving an attention mechanism based on a yolov5s framework, comprising a main network, a head network and a prediction network, introducing a C3CBAM compound attention mechanism after each convolution layer of the main network, and adding a CA layer attention mechanism before a rapid spatial pyramid pooling module layer of the main network.
Step 5: dividing a model sample into a training set and a verification set, training a yolov5s man-machine asynchronous recognition network model with an improved attention mechanism through the training set, and calculating map@0.5 and map@0.5 of a training process by taking forward calculation and backward propagation as a training period: and 0.95, respectively weighting and calculating average weights according to the proportion of 0.1 to 0.9, wherein the weight corresponding to the maximum value of the current weighted sum is used as the optimal solution of the model, and is used as the weight of the yolov5s detection human-computer asynchronous model, so as to obtain the trained yolov5s human-computer asynchronous recognition network model with an improved attention mechanism.
Step 6: and collecting patient breathing machine data detected in real time, preprocessing the patient breathing machine data, inputting the data into a yolov5s human-machine asynchronous recognition network model of an improved attention mechanism after training is completed, and obtaining whether the patient is in a human-machine asynchronous condition.
Step 7: when the confidence coefficient is larger than 0.3, the annotation frame is judged to be asynchronous to a human machine, the annotation frame is drawn on the breathing waveform diagram by the corresponding coordinates, otherwise, synchronous breathing is judged currently, no processing is carried out, and the confidence coefficient is multiplied by the probability of existence of a target, the intersection ratio of the prediction frame and the real frame, and the probability that the prediction frame belongs to each category.
Preferably, the data acquisition period is 14.6 days.
Preferably, the data slices are multi-cycle time sequences; each slice is 14 seconds long and contains 700 time points, with three values per time point: the pressure, flow rate and volume at that moment.
Preferably, the data set is generated as follows: data set cleaning consists of deleting duplicate pictures and removing breathing waveform images that contain neither invalid-trigger nor delayed-trigger man-machine asynchrony.
The data are labeled as follows: the labeling box does not cover the full-cycle waveform, but only the part of the asynchronous waveform in that cycle that differs from a synchronous breathing waveform.
Preferably, the backbone network consists, in order, of a first enhanced convolution module, a second enhanced convolution module, a first C3 module fused with the CBAM attention mechanism, a third enhanced convolution module, a second C3 module fused with CBAM, a fourth enhanced convolution module, a third C3 module fused with CBAM, another enhanced convolution module, a fourth C3 module fused with CBAM, a CA attention module and a fast spatial pyramid pooling module, connected in sequence.
Each enhanced convolution module consists, in order, of a Conv convolution layer, a BatchNorm normalization layer and a SiLU activation layer.
Preferably, the Conv layer in the first enhanced convolution module has a 6×6 kernel with stride 2 and padding 2; the Conv layers in the second, third and fourth enhanced convolution modules all have 3×3 kernels with stride 2.
Preferably, in the head network: the feature map output by the fast spatial pyramid pooling module of the backbone passes through a fifth enhanced convolution module and upsampling, and is then concatenated with the feature map output by the backbone's third C3 module fused with CBAM.
The resulting fused features pass in sequence through a first C3 module, a sixth enhanced convolution module and an upsampling module, and are then concatenated with the feature map output by the backbone's second C3 module fused with CBAM.
The output feature map then passes through the second C3 module and a seventh enhanced convolution module, and is concatenated with the output of the sixth enhanced convolution module to give a fused feature map.
This feature map then passes through a third C3 module and an eighth enhanced convolution module, is concatenated with the output of the fifth enhanced convolution module in the head, and is input into a fourth C3 module.
The feature maps output by the second, third and fourth C3 modules are input into the prediction network.
Preferably, the optimal weights are determined as follows: mAP@0.5 and mAP@0.5:0.95 are computed during training and weighted by 0.1 and 0.9 respectively; the weights corresponding to the maximum weighted sum are taken as the model's optimal solution.
The specified confidence threshold is 0.3.
The invention has the advantages that: feature extraction is driven mainly by the pixels of the breathing waveform image, which yields a deeper feature representation, so man-machine asynchrony is expressed more efficiently and accurately, the extracted abstract features are more robust, and generalization is better. The man-machine asynchrony detection model based on the improved attention mechanism suits the image-recognition nature of the task and achieves higher recognition accuracy. The superiority of the invention is demonstrated by comparing the improved and unimproved attention mechanisms on the same data set.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention.
Fig. 1 is a schematic diagram of a network structure according to the present invention.
Fig. 2 is a schematic diagram of an enhanced convolution structure.
FIG. 3 is a schematic diagram of a rapid spatial pyramid pooling module architecture.
Detailed Description
The following describes the technical solutions in the embodiments of the present invention clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, embodiments of the invention; all other embodiments obtained by those skilled in the art without inventive effort fall within the scope of the invention.
A human-machine dyssynchrony detection method based on an improved attention mechanism, comprising:
step 1: and data acquisition, namely performing gas interaction with the breathing machine through simulating the lung, and acquiring the pressure, the flow speed and the capacity on the breathing machine during the interaction. Specifically, the simulated lung is used for replacing the lung of a patient, and the gas interaction is carried out with the breathing machine, and the interaction time is as long as 14.6 days. In order to make the data more universal and closer to the lung condition of the actual patient during interaction, various linear and nonlinear parameters of the simulated lung represent different chest lung compliance, airway resistance, oxygenation states and the like of the actual patient, and five parameters on the breathing machine are set according to the simulated lung performance: ventilation frequency, inhalation flow rate, tidal volume, respiratory ratio, and airflow pattern are matched to the simulated lungs. During the data generation period, various types of man-machine asynchronous waveform data such as double trigger, active trigger, reverse trigger, delay trigger, invalid trigger and the like are generated.
Step 2: and slicing the acquired pressure, flow rate and capacity, and drawing the data of the data slice into an image generation model sample, wherein the image generation model sample comprises a pressure-time waveform diagram, a flow rate-time waveform diagram and a capacity-time waveform diagram.
Unlike prior methods that use only single-cycle time sequences, the invention slices the data into multi-cycle time sequences. When identifying man-machine asynchrony in a real patient, a respiratory therapist usually refers to the breathing waveforms of the preceding cycles, often the two to three cycles before the current one, which is significant for judging whether the current cycle's waveform is asynchronous.
During the 14.6-day data generation, the simulated lung reproduced the breathing of a real patient. Since a synchronous breath lasts three to five seconds, the breathing frequency of the simulated lung was set between 12 and 24 breaths per minute, and to ensure that every slice contains the current waveform plus its two to three preceding cycles, the slice length was set to 14 seconds. The ventilator generated data at 50 frames per second, so each slice contains 700 time points, each with three values: the pressure-time, flow-time and volume-time waveform values at that point. The 14.6 days of data were sliced into 71094 segments.
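The slicing step above can be sketched as follows; this is a minimal NumPy illustration, not the patent's implementation, using the stated values of 50 frames per second and a 14-second window (700 points per slice, three channels per point):

```python
import numpy as np

def slice_waveforms(pressure, flow, volume, fs=50, window_s=14):
    """Stack the three ventilator channels and cut them into
    fixed-length multi-cycle slices: fs = 50 frames/s and a 14 s
    window give 700 time points per slice, three values each."""
    data = np.stack([np.asarray(pressure), np.asarray(flow),
                     np.asarray(volume)], axis=-1)   # shape (T, 3)
    win = fs * window_s                              # 700 points
    n = len(data) // win                             # whole slices only
    return data[: n * win].reshape(n, win, 3)
```

Non-overlapping slicing is an assumption here; the patent does not state whether adjacent slices overlap.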
The data set is generated as follows. The invention detects man-machine asynchrony by image processing, judging whether a patient's breathing waveform at a given moment is an invalid trigger or a delayed trigger. Each data slice is drawn into an image with python's plt.figure() method, with every canvas set to 8 inches wide and 16 inches high. When drawing, blue curves are plotted top to bottom in the order pressure-time, flow-time, volume-time. To keep the data of each period consistent, the abscissa of each of the three waveforms increases with time, and their ordinate ranges are fixed to [-50,50], [-80,80] and [-10,1000] respectively. A yellow auxiliary line at y=0 is drawn in all three waveforms on the same canvas. All slices are drawn into pictures by these rules, producing 71094 ventilator waveform images that form the basic model data set v0.0.
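The rendering rules above can be reproduced with a short matplotlib sketch. The function name `draw_slice` and the use of subplots are illustrative assumptions; the canvas size, colors, y=0 guide line and fixed ordinate ranges are taken from the text:

```python
import matplotlib
matplotlib.use("Agg")            # headless backend for batch rendering
import matplotlib.pyplot as plt
import numpy as np

def draw_slice(slice_data, out_path):
    """Render one 700x3 data slice as described in the text: pressure,
    flow and volume panels top to bottom, blue curves, a yellow y = 0
    guide line, an 8 x 16 inch canvas, fixed ordinate ranges."""
    y_ranges = [(-50, 50), (-80, 80), (-10, 1000)]
    fig, axes = plt.subplots(3, 1, figsize=(8, 16))
    for ch, (ax, ylim) in enumerate(zip(axes, y_ranges)):
        ax.plot(slice_data[:, ch], color="blue")
        ax.axhline(0, color="yellow")
        ax.set_ylim(*ylim)
    fig.savefig(out_path)
    plt.close(fig)
```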
Step 3: and preprocessing the model sample through data set cleaning and image labeling. In the data generation process, the time for collecting the breathing waveform on the breathing machine is 14.6 days to ensure sufficient data sample size. Although the breathing waveform is as generalized as possible by changing the parameter settings that simulate the lungs and ventilator, picture repetition is still unavoidable due to the periodicity of the breathing waveform. Excessive repeated pictures in the data set often cause the occurrence of bad phenomena such as over fitting of the model, reduced generalization capability of the model and the like in the subsequent model training process. In addition, some human-machine unsynchronized types of data are excessive, causing sample imbalance. The present invention therefore uses the image mode of the Duplicate Cleaner software to delete duplicate pictures of the dataset, leaving 4116 non-duplicate pictures. And eliminating the breathing waveform pictures which do not contain the nonsynchronous triggering of the two types of robots, namely the ineffective triggering and the delayed triggering. And 1025 pictures are remained after the data set is cleaned, so that a model data set v1.0 is formed.
Images are annotated with labelImg. Labels use the PASCAL VOC format, with the labeling box and class stored as XML files. So that man-machine asynchrony can be localized to the abnormal part of the waveform, the labeling box does not cover the full-cycle waveform but only the part of the asynchronous waveform in that cycle that differs from a synchronous breathing waveform.
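A PASCAL VOC annotation of the kind labelImg writes can be read back with the standard library alone; this is a minimal sketch, and the class name used in the test is a hypothetical example, since the patent does not give its label strings:

```python
import xml.etree.ElementTree as ET

def read_voc_boxes(xml_text):
    """Parse a PASCAL VOC annotation string (the format labelImg
    writes) into (class_name, xmin, ymin, xmax, ymax) tuples."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        boxes.append((obj.findtext("name"),
                      int(bb.findtext("xmin")), int(bb.findtext("ymin")),
                      int(bb.findtext("xmax")), int(bb.findtext("ymax"))))
    return boxes
```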
Step 4: the method comprises the steps of constructing a yolov5s man-machine asynchronous recognition network model for improving an attention mechanism based on a yolov5s framework, comprising a main network, a head network and a prediction network, introducing a C3CBAM compound attention mechanism after each convolution layer of the main network, and adding a CA layer attention mechanism before a rapid spatial pyramid pooling module layer of the main network.
The backbone network consists, in order, of a first enhanced convolution module, a second enhanced convolution module, a first C3 module fused with the CBAM attention mechanism, a third enhanced convolution module, a second C3 module fused with CBAM, a fourth enhanced convolution module, a third C3 module fused with CBAM, another enhanced convolution module, a fourth C3 module fused with CBAM, a CA attention module and a fast spatial pyramid pooling module, connected in sequence.
Each enhanced convolution module consists, in order, of a Conv convolution layer, a BatchNorm normalization layer and a SiLU activation layer.
The Conv layer in the first enhanced convolution module has a 6×6 kernel with stride 2 and padding 2; the Conv layers in the second, third and fourth enhanced convolution modules all have 3×3 kernels with stride 2.
After the sample picture has been downsampled by the first enhanced convolution module, the output features pass through the first C3 module fused with the CBAM attention mechanism; CBAM is an attention module combining channel attention and spatial attention. The features first pass through the channel attention module to obtain a weighted result, then through the spatial attention module, and the resulting features are input into the next enhanced convolution module. After the fourth C3 module fused with CBAM, the features are input into the CA attention module, which performs two operations: coordinate information embedding and coordinate attention generation. First, the input feature map is average-pooled along the width and the height directions separately, giving two feature maps with global receptive fields in those directions. After concatenation, a new feature map is obtained by dimensionality reduction through a convolution module with a 1×1 kernel, normalized, and activated. Then, convolutions with 1×1 kernels are applied in the height and width directions to restore the channel number. After a sigmoid activation, the attention weights of the feature map along the height and along the width are obtained respectively.
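The CA (coordinate attention) step described above can be sketched in PyTorch as follows. This is an illustrative module under the usual coordinate-attention formulation, not the patent's exact code; the reduction ratio of 32 and the SiLU activation in the embedding step are assumptions:

```python
import torch
import torch.nn as nn

class CoordAtt(nn.Module):
    """Sketch of coordinate attention: pool along height and width
    separately, embed jointly via 1x1 conv + BN + activation, then
    produce per-direction sigmoid attention weights."""
    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, 1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.SiLU()
        self.conv_h = nn.Conv2d(mid, channels, 1)  # height-direction weights
        self.conv_w = nn.Conv2d(mid, channels, 1)  # width-direction weights

    def forward(self, x):
        n, c, h, w = x.shape
        xh = x.mean(dim=3, keepdim=True)                      # (n, c, h, 1)
        xw = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)  # (n, c, w, 1)
        y = self.act(self.bn(self.conv1(torch.cat([xh, xw], dim=2))))
        yh, yw = torch.split(y, [h, w], dim=2)
        ah = torch.sigmoid(self.conv_h(yh))                      # (n, c, h, 1)
        aw = torch.sigmoid(self.conv_w(yw.permute(0, 1, 3, 2)))  # (n, c, 1, w)
        return x * ah * aw   # rescale input; output scale is unchanged
```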
After the CA attention module, the feature map is compressed and recombined into a feature map of unchanged scale, while a large number of parameters and much computation in the model are saved. This tensor is input into the subsequent fast spatial pyramid pooling module, whose structure is shown in Fig. 3. In this module the feature map passes through an enhanced convolution block and then three successive 5×5 MaxPool layers; the four resulting feature maps are concatenated and passed through another enhanced convolution. The enhanced convolution block inside the fast spatial pyramid pooling module is identical to the enhanced convolution structure in Fig. 2.
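The enhanced convolution block and the fast spatial pyramid pooling module can be sketched together in PyTorch. This follows the description above and yolov5's standard SPPF layout; the hidden-channel halving in SPPF is an assumption carried over from yolov5, not stated in the patent:

```python
import torch
import torch.nn as nn

class CBS(nn.Module):
    """Enhanced convolution block: Conv -> BatchNorm -> SiLU."""
    def __init__(self, c_in, c_out, k, s, p):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, p, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()
    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class SPPF(nn.Module):
    """Fast spatial pyramid pooling: one CBS, three chained 5x5
    MaxPools, concatenation of the four maps, then a final CBS."""
    def __init__(self, c_in, c_out):
        super().__init__()
        c_hid = c_in // 2                      # yolov5-style halving
        self.cv1 = CBS(c_in, c_hid, 1, 1, 0)
        self.pool = nn.MaxPool2d(5, stride=1, padding=2)
        self.cv2 = CBS(4 * c_hid, c_out, 1, 1, 0)
    def forward(self, x):
        x = self.cv1(x)
        y1 = self.pool(x)
        y2 = self.pool(y1)
        y3 = self.pool(y2)
        return self.cv2(torch.cat([x, y1, y2, y3], dim=1))
```

Chaining three 5×5 pools reproduces the receptive fields of 5×5, 9×9 and 13×13 pooling at lower cost, which is why this variant is called "fast".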
In the head network: the feature map output by the fast spatial pyramid pooling module of the backbone passes through a fifth enhanced convolution module and upsampling, and is then concatenated with the feature map output by the backbone's third C3 module fused with CBAM.
The resulting fused features pass in sequence through a first C3 module, a sixth enhanced convolution module and an upsampling module, and are then concatenated with the feature map output by the backbone's second C3 module fused with CBAM.
The output feature map then passes through the second C3 module and a seventh enhanced convolution module, and is concatenated with the output of the sixth enhanced convolution module to give a fused feature map.
This feature map then passes through a third C3 module and an eighth enhanced convolution module, is concatenated with the output of the fifth enhanced convolution module in the head, and is input into a fourth C3 module.
The feature maps output by the second, third and fourth C3 modules are input into the prediction network.
Step 5: dividing a model sample into a training set and a verification set, training a yolov5s man-machine asynchronous recognition network model with an improved attention mechanism through the training set, and calculating map@0.5 and map@0.5 of a training process by taking forward calculation and backward propagation as a training period: and 0.95, respectively weighting and calculating average weights according to the proportion of 0.1 to 0.9, wherein the weight corresponding to the maximum value of the current weighted sum is used as the optimal solution of the model, and is used as the weight of the yolov5s detection human-computer asynchronous model, so as to obtain the trained yolov5s human-computer asynchronous recognition network model with an improved attention mechanism.
Step 6: and collecting patient breathing machine data detected in real time, preprocessing the patient breathing machine data, inputting the data into a yolov5s human-machine asynchronous recognition network model of an improved attention mechanism after training is completed, and obtaining whether the patient is in a human-machine asynchronous condition.
Step 7: when the confidence coefficient is larger than 0.3, the annotation frame is judged to be asynchronous to a human machine, the annotation frame is drawn on the breathing waveform diagram by the corresponding coordinates, otherwise, synchronous breathing is judged currently, no processing is carried out, and the confidence coefficient is multiplied by the probability of existence of a target, the intersection ratio of the prediction frame and the real frame, and the probability that the prediction frame belongs to each category.
The advantage of the present invention can be demonstrated by comparing the improved and unimproved attention mechanisms on the same data set: the mAP@0.5 of the model without an attention mechanism, the model with the CBAM attention mechanism, the model with the CA attention mechanism, and the model with the improved attention mechanism is 0.977, 0.958, 0.973 and 0.98, respectively. For the delayed-trigger class, the corresponding mAP@0.5 values of the four models are 0.921, 0.703, 0.952 and 0.947.
Finally, it should be noted that the foregoing description is only a preferred embodiment of the present invention, and the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described therein or substitute equivalents for some of the technical features. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.
Claims (10)
1. A human-machine dyssynchrony detection method based on an improved attention mechanism, comprising:
step 1: data acquisition: performing gas interaction between a simulated lung and a ventilator, and acquiring the pressure, flow rate and volume on the ventilator during the interaction;
step 2: slicing the acquired pressure, flow rate and volume data, and plotting each data slice as an image to generate model samples, wherein the model samples comprise a pressure-time waveform diagram, a flow-rate-time waveform diagram and a volume-time waveform diagram;
step 3: preprocessing the model samples through data set cleaning and image labeling;
step 4: constructing a yolov5s human-machine asynchrony recognition network model with an improved attention mechanism based on the yolov5s framework, wherein the model comprises a backbone network, a head network and a prediction network, a C3CBAM compound attention mechanism is introduced after each convolution layer of the backbone network, and a CA attention mechanism layer is added before the fast spatial pyramid pooling module layer of the backbone network;
step 5: dividing the model samples into a training set and a validation set, training the yolov5s human-machine asynchrony recognition network model with the improved attention mechanism on the training set, taking one forward calculation and one back-propagation as a training period, and selecting the best-performing weights during training as the weights of the yolov5s human-machine asynchrony detection model to obtain the trained yolov5s human-machine asynchrony recognition network model with the improved attention mechanism;
step 6: collecting patient ventilator data detected in real time, preprocessing the data, and inputting the preprocessed data into the trained yolov5s human-machine asynchrony recognition network model with the improved attention mechanism to determine whether human-machine asynchrony occurs in the patient;
step 7: when the confidence is greater than a specified threshold, judging that human-machine asynchrony is present and drawing the annotation box at the corresponding coordinates on the respiratory waveform diagram; otherwise, judging the current breathing to be synchronous and taking no action; wherein the confidence is obtained by multiplying the probability that a target exists, the intersection-over-union of the predicted box with the ground-truth box, and the probability that the predicted box belongs to each category.
2. The improved attention mechanism based human-machine dyssynchrony detection method as in claim 1, wherein the data acquisition period is 14.6 days.
3. The human-machine dyssynchrony detection method based on an improved attention mechanism according to claim 1, wherein the data slices form a multi-cycle time series, each slice has a length of 14 seconds and contains 700 time points, and each time point carries three data values, namely the pressure, flow rate and volume at that time point.
4. The human-machine dyssynchrony detection method based on an improved attention mechanism according to claim 3, wherein the data set is generated as follows: the data set cleaning comprises deleting repeated pictures among the respiratory waveform diagrams of invalid-trigger and delayed-trigger human-machine asynchrony;
the data labeling is performed as follows: the label box does not cover the full-cycle waveform, but covers only the portion of the asynchronous waveform within the cycle that differs from a synchronous respiratory waveform.
5. The human-machine dyssynchrony detection method based on an improved attention mechanism according to any one of claims 1-4, wherein the backbone network consists, in sequence, of a first enhanced convolution module, a second enhanced convolution module, a first C3 module fused with the CBAM attention mechanism, a third enhanced convolution module, a second C3 module fused with the CBAM attention mechanism, a fourth enhanced convolution module, a third C3 module fused with the CBAM attention mechanism, an enhanced convolution module, a fourth C3 module fused with the CBAM attention mechanism, a CA attention mechanism module, and a fast spatial pyramid pooling module, the modules being connected in sequence by their input-output relationships;
each enhanced convolution module consists, in sequence, of a Conv convolution layer, a BatchNorm regularization layer and a SiLU activation function layer.
6. The human-machine dyssynchrony detection method based on an improved attention mechanism according to claim 5, wherein the Conv convolution layer in the first enhanced convolution module has a convolution kernel size of 6 × 6 with a stride and padding of 2, and the Conv convolution layers in the second, third and fourth enhanced convolution modules have a convolution kernel size of 3 × 3 and a stride of 2.
7. The human-machine dyssynchrony detection method based on an improved attention mechanism according to claim 6, wherein in the head network: the feature map output by the fast spatial pyramid pooling module layer of the backbone network passes through a fifth enhanced convolution module and an up-sampling module, and is then concatenated with the feature map output by the third C3 module fused with the CBAM attention mechanism in the backbone network;
the fused features are sequentially input into a first C3 module, a sixth enhanced convolution module and an up-sampling module, and are then concatenated with the feature map output by the second C3 module fused with the CBAM attention mechanism in the backbone network;
the resulting feature map sequentially passes through a second C3 module and a seventh enhanced convolution module, and is concatenated with the output features of the sixth enhanced convolution module to output a fused feature map;
the fused feature map sequentially passes through a third C3 module and an eighth enhanced convolution module, is concatenated with the output of the fifth enhanced convolution module in the head network, and is then input into a fourth C3 module;
the feature maps output by the second C3 module, the third C3 module and the fourth C3 module are input into the prediction network.
8. The human-machine dyssynchrony detection method based on an improved attention mechanism according to claim 1, wherein the optimal weights are determined as follows: mAP@0.5 and mAP@0.5:0.95 are calculated during training and combined into a weighted average with weights of 0.1 and 0.9 respectively, and the weights corresponding to the maximum weighted sum are taken as the optimal solution of the model;
the specified threshold is 0.3.
9. A human-machine dyssynchrony detection device based on an improved attention mechanism, comprising a processor and a memory storing program instructions, characterized in that the processor is configured to perform, when running the program instructions, the human-machine dyssynchrony detection method based on an improved attention mechanism according to any one of claims 1 to 8.
10. A storage medium storing program instructions which, when executed, perform the improved attention mechanism based human-machine dyssynchrony detection method as claimed in any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311187641.8A CN116935069B (en) | 2023-09-15 | 2023-09-15 | Man-machine asynchronous detection method, device and medium based on improved attention mechanism |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116935069A CN116935069A (en) | 2023-10-24 |
CN116935069B true CN116935069B (en) | 2023-11-21 |
Family
ID=88386410
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311187641.8A Active CN116935069B (en) | 2023-09-15 | 2023-09-15 | Man-machine asynchronous detection method, device and medium based on improved attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116935069B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021139069A1 (en) * | 2020-01-09 | 2021-07-15 | 南京信息工程大学 | General target detection method for adaptive attention guidance mechanism |
CN116229319A (en) * | 2023-03-01 | 2023-06-06 | 广东宜教通教育有限公司 | Multi-scale feature fusion class behavior detection method and system |
CN116542932A (en) * | 2023-05-08 | 2023-08-04 | 江苏大学 | Injection molding surface defect detection method based on improved YOLOv5s |
Non-Patent Citations (1)
Title |
---|
Remote Sensing Image Object Detection Based on a Dual Attention Mechanism; Zhou Xing; Chen Lifu; Computer and Modernization, Issue 08; full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP7097641B2 (en) | Loop detection method based on convolution perception hash algorithm | |
CN104123545B (en) | A kind of real-time human facial feature extraction and expression recognition method | |
CN109948647B (en) | Electrocardiogram classification method and system based on depth residual error network | |
CN108229268A (en) | Expression Recognition and convolutional neural networks model training method, device and electronic equipment | |
CN103268495B (en) | Human body behavior modeling recognition methods based on priori knowledge cluster in computer system | |
CN111914917A (en) | Target detection improved algorithm based on feature pyramid network and attention mechanism | |
CN109902584A (en) | A kind of recognition methods, device, equipment and the storage medium of mask defect | |
CN109034092A (en) | Accident detection method for monitoring system | |
CN105740780A (en) | Method and device for human face in-vivo detection | |
JP2022146884A (en) | Method and device for evaluating difficult airway based on artificial intelligence | |
KR20190105180A (en) | Apparatus for Lesion Diagnosis Based on Convolutional Neural Network and Method thereof | |
CN109191465A (en) | A kind of system for being determined based on deep learning network, identifying human body or so the first rib cage | |
CN110532850A (en) | A kind of fall detection method based on video artis and hybrid classifer | |
CN112801009B (en) | Facial emotion recognition method, device, medium and equipment based on double-flow network | |
CN110087143A (en) | Method for processing video frequency and device, electronic equipment and computer readable storage medium | |
CN112101315A (en) | Deep learning-based exercise judgment guidance method and system | |
CN114359199A (en) | Fish counting method, device, equipment and medium based on deep learning | |
CN112308827A (en) | Hair follicle detection method based on deep convolutional neural network | |
CN116524062A (en) | Diffusion model-based 2D human body posture estimation method | |
CN117409002A (en) | Visual identification detection system for wounds and detection method thereof | |
CN116935069B (en) | Man-machine asynchronous detection method, device and medium based on improved attention mechanism | |
CN108846342A (en) | A kind of harelip operation mark point recognition system | |
CN108960326A (en) | A kind of point cloud fast partition method and its system based on deep learning frame | |
CN115546491B (en) | Fall alarm method, system, electronic equipment and storage medium | |
CN116805415A (en) | Cage broiler health status identification method based on lightweight improved YOLOv5 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||