CN114679759B - Wearable electrocardiograph monitoring network switching method based on reinforcement learning - Google Patents

Wearable electrocardiograph monitoring network switching method based on reinforcement learning

Info

Publication number
CN114679759B
Authority
CN
China
Prior art keywords
data
network
value
time
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210323583.6A
Other languages
Chinese (zh)
Other versions
CN114679759A (en)
Inventor
张羽
赵文娟
杨慧
亢羽童
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ningbo Research Institute of Northwestern Polytechnical University
Original Assignee
Ningbo Research Institute of Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ningbo Research Institute of Northwestern Polytechnical University filed Critical Ningbo Research Institute of Northwestern Polytechnical University
Priority to CN202210323583.6A priority Critical patent/CN114679759B/en
Publication of CN114679759A publication Critical patent/CN114679759A/en
Application granted granted Critical
Publication of CN114679759B publication Critical patent/CN114679759B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 36/00 Hand-off or reselection arrangements
    • H04W 36/14 Reselecting a network or an air interface
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B 5/24 Detecting, measuring or recording bioelectric or biomagnetic signals of the body or parts thereof
    • A61B 5/25 Bioelectric electrodes therefor
    • A61B 5/251 Means for maintaining electrode contact with the body
    • A61B 5/256 Wearable electrodes, e.g. having straps or bands
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B 5/24 Detecting, measuring or recording bioelectric or biomagnetic signals of the body or parts thereof
    • A61B 5/25 Bioelectric electrodes therefor
    • A61B 5/279 Bioelectric electrodes therefor specially adapted for particular uses
    • A61B 5/28 Bioelectric electrodes therefor specially adapted for particular uses for electrocardiography [ECG]
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B 5/24 Detecting, measuring or recording bioelectric or biomagnetic signals of the body or parts thereof
    • A61B 5/316 Modalities, i.e. specific diagnostic methods
    • A61B 5/318 Heart-related electrical modalities, e.g. electrocardiography [ECG]
    • A61B 5/332 Portable devices specially adapted therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 36/00 Hand-off or reselection arrangements
    • H04W 36/24 Reselection being triggered by specific parameters
    • H04W 36/30 Reselection being triggered by specific parameters by measured or perceived connection quality data
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00 Reducing energy consumption in communication networks
    • Y02D 30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Public Health (AREA)
  • Animal Behavior & Ethology (AREA)
  • Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Veterinary Medicine (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Biophysics (AREA)
  • Pathology (AREA)
  • Evolutionary Biology (AREA)
  • Cardiology (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Measurement And Recording Of Electrical Phenomena And Electrical Characteristics Of The Living Body (AREA)

Abstract

The invention discloses a wearable electrocardiograph monitoring network switching method based on reinforcement learning, which comprises the following steps: inputting the environment state, preprocessing the data, outputting an action with a Q-learning decision, receiving the reward fed back by the environment, updating the Q-table parameters, and executing the network switch. The method comprehensively considers the various factors in the wearable electrocardiograph monitoring environment, improves the accuracy of the model, keeps the computational load as low as possible while still making correct decisions, completes the task of timely wearable-network switching efficiently, and can serve as a model foundation for intelligent switching of the wearable electrocardiograph monitoring data transmission network.

Description

Wearable electrocardiograph monitoring network switching method based on reinforcement learning
Technical Field
The invention relates to the field of communication control, in particular to a wearable electrocardiograph monitoring network switching method based on reinforcement learning.
Background
Wearable electrocardiographic monitoring can improve the efficiency of healthcare. However, current wearable devices that rely on only a single wireless transmission technology cannot guarantee that the patient can connect to the remote electrocardiographic healthcare center at any time and in any place. For this reason, a vertical switching scheme sensitive to the health condition of the patient is proposed for remote wearable electrocardiographic monitoring applications in heterogeneous wireless networks. Vertical handover in heterogeneous networks refers to the process of migrating from one network to another without breaking a traffic link. Therefore, among the existing communication technologies, the most appropriate one must be selected according to factors such as the power level, data rate, operating range and coexistence of multiple technologies required for wearable electrocardiograph data transmission, so as to reduce the network switching delay and ensure that electrocardiograph monitoring data can be transmitted in real time.
In the prior art, there are methods that scan for Bluetooth connectivity or switch away from the cellular network according to a blacklist of applications on the terminal device, so as to avoid heavy data consumption and provide a good Internet experience. However, the threshold conditions set by such methods are overly complicated and add a large amount of unnecessary computation and scanning energy consumption. There are also methods that select, according to the spatio-temporal characteristics of the mobile terminal's network connections, the network that meets the current service performance requirements with the highest reliability, so as to keep transmission delay low and reliability high. However, such methods need to collect information such as the position, time period, service type and network reliability of every mobile terminal, require a third-party server, involve a large amount of computation, and are therefore unsuitable for a real-time physiological data transmission system. There are further methods that determine the network type according to the usage state of network-related functions in the terminal and switch to a matching high-power or low-power network, but they do not consider the real-time transmission of wearable-device monitoring data, and the high-power and low-power network types cannot easily be distinguished.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a wearable electrocardiograph monitoring network switching method based on reinforcement learning.
In order to achieve the aim of the invention, the invention adopts the following technical scheme:
A wearable electrocardiograph monitoring network switching method based on reinforcement learning comprises the following steps:
S1, receiving data returned by the environmental network monitoring device and the electrocardiograph data sensing module;
S2, preprocessing the data returned in the step S1, and obtaining a Q table of expected actions and the corresponding expected action gain values by using a Q-learning algorithm;
S3, classifying the Q-table data preprocessed in the step S2 into clusters by using a subtractive clustering method, thereby handling uncertain and imprecise data, to obtain an environment state matrix;
S4, finding the output with the maximum gain value in the Q table and sending the corresponding optimal action as output to an external task controller for execution; if the task is not completed after the action is executed, returning to the step S1 to continue; if the task is completed, ending the work.
Further, the data returned by the environmental network monitoring device in S1 include the received signal strength RSSI_t of the Bluetooth network of the wearable electrocardiograph monitoring device at time t, the throughput Throughput_t and the data transmission rate R_t;
the data returned by the electrocardiograph data sensing module include the electrocardiograph data frequency at time t.
Further, the data preprocessing method in S2 is as follows:
S21, sampling the data returned in the step S1 at a fixed interval Δt, each group of data containing a plurality of samples;
S22, encoding the data sampled in S21 into data sequences to obtain the received-signal-strength data sequence RSSI_Δt, the throughput data sequence Throughput_Δt and the data-transmission-rate data sequence R_Δt of the currently connected network within the Δt interval starting at time t, and removing the maximum and minimum values of RSSI_Δt, Throughput_Δt and R_Δt;
S23, averaging the data sequences obtained after removing the maximum and minimum values of RSSI_Δt, Throughput_Δt and R_Δt to obtain the corresponding sequence means, and merging the obtained sequence means with the current electrocardiograph data frequency at time t into the environment state matrix S_t.
Further, the Q-learning algorithm in S2 is computed as:
Q(s_t, a_t) ← Q(s_t, a_t) + α[ r_t + γ · max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]
wherein Q(s_t, a_t) is the Q-table value at time t, s_t is the environment state matrix value at time t, a_t is the action at time t, S_t is the environment state matrix, r_t is the transmission rate of the currently connected network at time t (used as the reward), and α and γ are constant parameters (the learning rate and the discount factor, respectively).
Further, the subtractive clustering method in S3 is computed as:
D_j = Σ_i exp( −‖χ_j − χ_i‖² / (r/2)² )
wherein χ_i and χ_j are respectively the i-th and j-th data in the Q-table data preprocessed in S2, r is the cluster radius, ‖χ_j − χ_i‖ is the Euclidean distance between the i-th and j-th data, D_j is the density index at data point χ_j, ‖·‖² denotes the squared two-norm, and the data point with the highest density index is taken as the cluster centre.
Further, the step S4 specifically comprises the following steps:
S41, initializing Q(s, a), s ∈ S_t, a ∈ A, where A is the action set, and initializing the environment state matrix S_t according to the result of the subtractive clustering;
S42, updating the environment state matrix S_t according to the network parameters of the connected network, and updating the Q-table values;
S43, selecting and outputting the optimal action a′_t corresponding to the maximum Q value max(Q(S_t, A)), which is executed by the external task controller; if the task is not completed after execution, returning to the step S1.
The invention has the following beneficial effects:
the invention creatively provides a heterogeneous network switching light reinforcement learning model construction method for wearable electrocardiograph monitoring equipment, and the adaptability of the model to perform rapid switching tasks in the wearable network environment is greatly improved by comprehensively considering network environment factors and ECG signal frequency characteristics of the wearable network switching; the invention also adopts the subtractive clustering technology to classify the switching measurement into the corresponding state, improves the accuracy of the Q table, can effectively reduce the calculated amount of the model, and is suitable for providing service for the wearable network in an embedded environment.
Drawings
Fig. 1 is a schematic flow chart of a wearable electrocardiographic monitoring network switching method based on reinforcement learning.
Fig. 2 is a schematic diagram of a network handover embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention is provided to help those skilled in the art understand the invention, but it should be understood that the invention is not limited to the scope of these embodiments; to those skilled in the art, all inventions that make use of the inventive concept fall within the spirit and scope of the invention as defined in the appended claims.
A wearable electrocardiograph monitoring network switching method based on reinforcement learning, as shown in figure 1, comprises the following steps:
S1, receiving data returned by the environmental network monitoring device and the electrocardiograph data sensing module;
In this embodiment, the model receives as input the data returned by the network environment state monitor, i.e. environment state information such as the received signal strength RSSI_t of the Bluetooth network of the wearable electrocardiograph monitoring device at time t, the throughput Throughput_t and the data transmission rate R_t; it also receives as input the electrocardiograph frequency data returned by the electrocardiograph data sensing module, such as the electrocardiograph data frequency at time t.
S2, preprocessing the data returned in the step S1, and obtaining a Q table about expected actions and corresponding expected action gain values by using a Q learning algorithm;
as shown in fig. 2, the data preprocessing module mainly performs data sampling, and uses data obtained by sampling network condition monitor data and electrocardiographic data frequency at fixed intervals as output of the data preprocessing module, specifically, the method comprises the following steps:
S21, sampling the data returned in the step S1 at a fixed interval Δt, each group of data containing a plurality of samples;
S22, encoding the data sampled in S21 into data sequences to obtain the received-signal-strength data sequence RSSI_Δt, the throughput data sequence Throughput_Δt and the data-transmission-rate data sequence R_Δt of the currently connected network within the Δt interval starting at time t, and removing the maximum and minimum values of RSSI_Δt, Throughput_Δt and R_Δt;
S23, averaging the data sequences obtained after removing the maximum and minimum values of RSSI_Δt, Throughput_Δt and R_Δt to obtain the corresponding sequence means, and merging the obtained sequence means with the current electrocardiograph data frequency at time t into the environment state matrix S_t.
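As a concrete illustration of steps S21-S23, the following is a minimal Python sketch of this preprocessing; the variable names (trimmed_mean, build_state, ecg_freq) and the state layout are assumptions for illustration, not the patent's reference implementation.

```python
import numpy as np

def trimmed_mean(seq):
    """Average a sampled sequence after removing its maximum and minimum (steps S22/S23)."""
    seq = sorted(seq)
    return float(np.mean(seq[1:-1])) if len(seq) > 2 else float(np.mean(seq))

def build_state(rssi_seq, throughput_seq, rate_seq, ecg_freq):
    """Merge the trimmed sequence means with the current ECG data frequency
    into the environment state vector S_t (step S23)."""
    return np.array([
        trimmed_mean(rssi_seq),        # mean RSSI over the delta-t window
        trimmed_mean(throughput_seq),  # mean throughput over the delta-t window
        trimmed_mean(rate_seq),        # mean data transmission rate over the window
        ecg_freq,                      # ECG data frequency at time t
    ])

# Example: one delta-t window of samples from the network monitor plus the ECG frequency.
s_t = build_state(rssi_seq=[-62, -60, -75, -58, -64],            # dBm
                  throughput_seq=[1.1, 1.3, 0.9, 1.2, 1.0],       # Mbit/s
                  rate_seq=[0.8, 1.0, 0.7, 0.9, 0.85],            # Mbit/s
                  ecg_freq=250.0)                                  # Hz
```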
In the embodiment, as shown in fig. 2, a model adopts a Q learning algorithm (formula 1), and performs efficient learning and timely response on a network environment, namely, Q (s, a) takes action a to obtain a benefit expectation in a state of a certain moment, the environment feeds back corresponding rewards according to the action of a reagent, a state and an action are constructed into a q_table table to store a Q value, then the action capable of obtaining the maximum benefit is selected according to the Q value, so that an optimal decision is made, the defect of long time delay caused by network switching is overcome, and the frequency characteristic of electrocardiographic data is combined, the electrocardiographic data is subjected to layered coding according to the electrocardiographic sampling frequency, and is divided into a low-frequency ECG containing important cardiac condition details and a high-frequency ECG increasing the detail quantity of electrocardiographic signals, and then switching performance is evaluated from two aspects of energy efficiency and switching delay. The model judges whether the network switching condition is met according to the locally recorded historical information about the network transmission quality, dynamically determines that the network interface of the current wearable device autonomously performs network switching without information interaction with a receiving end, effectively avoids occurrence of network interruption events, and greatly reduces the time delay of network switching. In order to improve training efficiency and obtain a smaller Q table, a subtractive clustering technology (formula 2) is introduced to classify switching metrics into corresponding state intervals according to data distribution, and input indexes are classified into clusters, so that uncertain and inaccurate data are effectively processed, and the influence of reasoning on decisions is reduced to the greatest extent.
Q(s_t, a_t) ← Q(s_t, a_t) + α[ r_t + γ · max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]   (formula 1)
wherein Q(s_t, a_t) is the Q-table value at time t, s_t is the environment state matrix value at time t, a_t is the action at time t, S_t is the environment state matrix, r_t is the transmission rate of the currently connected network at time t (used as the reward), and α and γ are constant parameters (the learning rate and the discount factor, respectively).
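To make formula 1 concrete, here is a minimal tabular Q-learning update sketched in Python; the use of the transmission rate as the reward follows the description above, while the table size, the two-action set and the parameter values are illustrative assumptions.

```python
import numpy as np

n_states, n_actions = 8, 2          # e.g. clustered state intervals x {stay, switch} (assumed)
alpha, gamma = 0.1, 0.9             # constant parameters of formula 1 (assumed values)
Q = np.zeros((n_states, n_actions)) # the Q_table

def q_update(s_t, a_t, r_t, s_next):
    """Q(s_t,a_t) <- Q(s_t,a_t) + alpha * (r_t + gamma * max_a Q(s_next,a) - Q(s_t,a_t))."""
    td_target = r_t + gamma * np.max(Q[s_next])
    Q[s_t, a_t] += alpha * (td_target - Q[s_t, a_t])

# One step: in state 3 the agent keeps the current network (action 0),
# observes the current transmission rate as the reward, and lands in state 5.
q_update(s_t=3, a_t=0, r_t=0.9, s_next=5)
```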
D_j = Σ_i exp( −‖χ_j − χ_i‖² / (r/2)² )   (formula 2)
wherein χ_i and χ_j are respectively the i-th and j-th data in the Q-table data preprocessed in S2, r is the cluster radius, ‖χ_j − χ_i‖ is the Euclidean distance between the i-th and j-th data, D_j is the density index at data point χ_j, ‖·‖² denotes the squared two-norm, and the data point with the highest density index is taken as the cluster centre.
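The density index of formula 2 can be sketched as follows; this is a minimal illustration assuming a small set of preprocessed state samples and a cluster radius r, not the patent's full clustering routine (which would iteratively suppress the density around each selected centre).

```python
import numpy as np

def density_indices(X, r):
    """Density index D_j = sum_i exp(-||x_j - x_i||^2 / (r/2)^2) for every sample x_j (formula 2)."""
    X = np.asarray(X, dtype=float)
    diffs = X[:, None, :] - X[None, :, :]      # pairwise differences between samples
    sq_dist = np.sum(diffs ** 2, axis=-1)      # squared Euclidean distances
    return np.exp(-sq_dist / (r / 2.0) ** 2).sum(axis=1)

# Example: pick the first cluster centre as the point with the highest density index.
# In practice the feature columns would typically be normalised before clustering.
X = [[-62.0, 1.1, 0.8, 250.0],
     [-61.0, 1.2, 0.9, 250.0],
     [-80.0, 0.3, 0.2, 500.0]]
D = density_indices(X, r=5.0)
first_centre = int(np.argmax(D))
```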
S3, classifying the data of the Q table preprocessed in the step S2 into clusters by using a subtractive clustering method, and calculating uncertain and inaccurate data to obtain a new environment state matrix;
and S4, finding out the output with the maximum profit value from the Q table, sending the corresponding optimal action as output to an external task controller for execution, returning to the step S1 for continuous execution if the task is not completed after the action is executed, and ending the work if the task is completed.
Specifically, the step S4 comprises the following steps:
S41, initializing Q(s, a), s ∈ S_t, a ∈ A, where A is the action set, and initializing the environment state matrix S_t according to the result of the subtractive clustering;
S42, updating the environment state matrix S_t according to the network parameters of the connected network, and updating the Q-table values;
S43, selecting and outputting the optimal action a′_t corresponding to the maximum Q value max(Q(S_t, A)), which is executed by the external task controller; if the task is not completed after execution, returning to the step S1.
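The loop S41-S43 can be summarized with the sketch below; the environment-reading helpers (read_network_monitor, read_ecg_frequency, execute_handover, task_completed, discretize) are hypothetical placeholders for the external monitor, sensing module, task controller and cluster-based state mapping, and the epsilon-greedy exploration is an assumption not spelled out in the text.

```python
import numpy as np

def run_handover_agent(Q, discretize, read_network_monitor, read_ecg_frequency,
                       execute_handover, task_completed,
                       alpha=0.1, gamma=0.9, epsilon=0.1):
    """Sketch of steps S41-S43: observe the state, pick an action from the Q table,
    execute it through the external task controller, and update the table."""
    rssi, throughput, rate = read_network_monitor()                    # S1: environment data
    s = discretize(rssi, throughput, rate, read_ecg_frequency())       # cluster-based state index
    while not task_completed():
        # S43: optimal action = argmax of the Q row (with a little exploration)
        a = np.random.randint(Q.shape[1]) if np.random.rand() < epsilon else int(np.argmax(Q[s]))
        execute_handover(a)                                            # 0 = stay, 1 = switch (assumed)
        rssi, throughput, rate = read_network_monitor()                # back to S1
        r = rate                                                       # reward = current transmission rate
        s_next = discretize(rssi, throughput, rate, read_ecg_frequency())
        # S42: update the environment state and the Q-table value (formula 1)
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next
```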
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The principles and embodiments of the present invention have been described in detail with reference to specific examples, which are provided only to help in understanding the method and core ideas of the present invention. Since those skilled in the art may vary the specific embodiments and the scope of application in accordance with the ideas of the present invention, this description should not be construed as limiting the present invention.
Those of ordinary skill in the art will recognize that the embodiments described herein are intended to aid the reader in understanding the principles of the present invention, and that the scope of the invention is not limited to such specific statements and embodiments. Those of ordinary skill in the art can make various other specific modifications and combinations based on the teachings of the present disclosure without departing from its spirit, and such modifications and combinations remain within the scope of the present disclosure.

Claims (1)

1. A wearable electrocardiograph monitoring network switching method based on reinforcement learning, characterized by comprising the following steps:
S1, receiving data returned by an environmental network monitoring device and an electrocardiograph data sensing module, wherein the data returned by the environmental network monitoring device comprise the received signal strength RSSI_t of the Bluetooth network of the wearable electrocardiograph monitoring device at time t, the throughput Throughput_t and the data transmission rate R_t, and the data returned by the electrocardiograph data sensing module comprise the electrocardiograph data frequency at time t;
S2, preprocessing the data returned in the step S1, the preprocessing being carried out as follows:
S21, sampling the data returned in the step S1 at a fixed interval Δt, each group of data containing a plurality of samples;
S22, encoding the data sampled in S21 into data sequences to obtain the received-signal-strength data sequence RSSI_Δt, the throughput data sequence Throughput_Δt and the data-transmission-rate data sequence R_Δt of the currently connected network within the Δt interval starting at time t, and removing the maximum and minimum values of RSSI_Δt, Throughput_Δt and R_Δt;
S23, averaging the data sequences obtained after removing the maximum and minimum values of RSSI_Δt, Throughput_Δt and R_Δt to obtain the corresponding sequence means, and merging the obtained sequence means with the current electrocardiograph data frequency at time t into the environment state matrix S_t;
and obtaining a Q table of expected actions and the return values corresponding to the expected actions by using a Q-learning algorithm, the Q-learning algorithm being computed as:
Q(s_t, a_t) ← Q(s_t, a_t) + α[ r_t + γ · max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]
wherein Q(s_t, a_t) is the Q-table value at time t, s_t is the environment state matrix value at time t, a_t is the action at time t, S_t is the environment state matrix, r_t is the transmission rate of the currently connected network at time t, and α and γ are constant parameters;
S3, classifying the Q-table data preprocessed in the step S2 into clusters by using a subtractive clustering method, thereby handling uncertain and imprecise data, to obtain an environment state matrix, the subtractive clustering method being computed as:
D_j = Σ_i exp( −‖χ_j − χ_i‖² / (r/2)² )
wherein χ_i and χ_j are respectively the i-th and j-th data in the Q-table data preprocessed in S2, r is the cluster radius, ‖χ_j − χ_i‖ is the Euclidean distance between the i-th and j-th data, D_j is the density index at data point χ_j, ‖·‖² denotes the squared two-norm, and the data point with the highest density index is taken as the cluster centre;
S4, finding the output with the maximum gain value in the Q table and sending the corresponding optimal action as output to an external task controller for execution; if the task is not completed after the action is executed, returning to the step S1 to continue; if the task is completed, ending the work; specifically comprising the following steps:
S41, initializing Q(s, a), s ∈ S_t, a ∈ A, where A is the action set, and initializing the environment state matrix S_t according to the result of the subtractive clustering;
S42, updating the environment state matrix S_t according to the network parameters of the connected network, and updating the Q-table values;
S43, according to the updated Q-table values, selecting the optimal action a′_t corresponding to the maximum Q value max(Q(S_t, A)), which is executed by the external task controller; if the task is not completed after execution, returning to the step S1.
CN202210323583.6A 2022-03-29 2022-03-29 Wearable electrocardiograph monitoring network switching method based on reinforcement learning Active CN114679759B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210323583.6A CN114679759B (en) 2022-03-29 2022-03-29 Wearable electrocardiograph monitoring network switching method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210323583.6A CN114679759B (en) 2022-03-29 2022-03-29 Wearable electrocardiograph monitoring network switching method based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN114679759A CN114679759A (en) 2022-06-28
CN114679759B (en) 2023-06-09

Family

ID=82075786

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210323583.6A Active CN114679759B (en) 2022-03-29 2022-03-29 Wearable electrocardiograph monitoring network switching method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN114679759B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110809306A (en) * 2019-11-04 2020-02-18 电子科技大学 Terminal access selection method based on deep reinforcement learning
WO2021055595A1 (en) * 2019-09-18 2021-03-25 Bioxcel Therapeutics, Inc. Systems and methods for detection and prevention of emergence of agitation
CN112863653A (en) * 2021-03-01 2021-05-28 武汉中旗生物医疗电子有限公司 Electrocardio data compression method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11138503B2 (en) * 2017-03-22 2021-10-05 Larsx Continuously learning and optimizing artificial intelligence (AI) adaptive neural network (ANN) computer modeling methods and systems
US20220054850A1 (en) * 2020-08-24 2022-02-24 West Affum Holdings Corp. Wearable cardioverter defibrillator care system with health and emotional companion accessory

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021055595A1 (en) * 2019-09-18 2021-03-25 Bioxcel Therapeutics, Inc. Systems and methods for detection and prevention of emergence of agitation
CN110809306A (en) * 2019-11-04 2020-02-18 电子科技大学 Terminal access selection method based on deep reinforcement learning
CN112863653A (en) * 2021-03-01 2021-05-28 武汉中旗生物医疗电子有限公司 Electrocardio data compression method and device

Also Published As

Publication number Publication date
CN114679759A (en) 2022-06-28

Similar Documents

Publication Publication Date Title
CN108545556B (en) Information processing unit neural network based and method
CN111314473A (en) Environmental monitoring system based on artificial intelligence
CN113423081B (en) Wireless energy transmission method and device, computer equipment and storage medium
JP2018124852A (en) Information processing system, information processing method, and program
TWI684139B (en) System and method of learning-based prediction for anomalies within a base station
CN114781439B (en) Model acquisition system, gesture recognition method, gesture recognition device, apparatus and storage medium
JP2020068473A (en) Sensor data compression system, sensor terminal, model building device, and program
CN114039870B (en) Deep learning-based real-time bandwidth prediction method for video stream application in cellular network
CN111030850B (en) SCADA system data acquisition period control method and device
CN115065728B (en) Multi-strategy reinforcement learning-based multi-target content storage method
CN113869482A (en) Intelligent street lamp self-adaptive energy-saving control method and system based on deep reinforcement learning
CN103974112B (en) A kind of TV set control method and device
CN114679759B (en) Wearable electrocardiograph monitoring network switching method based on reinforcement learning
CN117154844A (en) Energy supply control method and device for energy storage system
CN114521023A (en) SWIPT-assisted NOMA-MEC system resource allocation modeling method
CN117077870B (en) Water resource digital management method based on artificial intelligence
CN116225102B (en) Mobile energy storage communication temperature rise automatic monitoring system and device
CN113490248B (en) Multi-mode terminal switching method and device
CN116993240A (en) Logistics system control method, device, equipment and storage medium
KR101052332B1 (en) Wildlife Location Tracking System and Method
CN116416545A (en) Behavior detection method, apparatus, device and computer readable storage medium
AU2021105964A4 (en) Self-adaptive Energy-saving Control Method and System for Smart Street Lamps Based on Deep Reinforcement Learning
CN117748733A (en) Power grid information control method based on digital twin model
CN115718536B (en) Frequency modulation method and device, electronic equipment and readable storage medium
CN112422627B (en) Internet of things node control method and device, intelligent terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant