CN114845332A - Millimeter wave communication link blocking prediction method based on visual information fusion - Google Patents

Millimeter wave communication link blocking prediction method based on visual information fusion

Info

Publication number
CN114845332A
Authority
CN
China
Prior art keywords
sequence
model
module
embedding
coordinate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210480580.3A
Other languages
Chinese (zh)
Inventor
杨绿溪
张明寒
邓淼佩
周婷
李春国
黄永明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202210480580.3A priority Critical patent/CN114845332A/en
Publication of CN114845332A publication Critical patent/CN114845332A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W24/00Supervisory, monitoring or testing arrangements
    • H04W24/06Testing, supervising or monitoring using simulated traffic
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14Network analysis or design
    • H04L41/147Network analysis or design for predicting network behaviour
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W36/00Hand-off or reselection arrangements
    • H04W36/08Reselecting an access point
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W36/00Hand-off or reselection arrangements
    • H04W36/24Reselection being triggered by specific parameters
    • H04W36/30Reselection being triggered by specific parameters by measured or perceived connection quality data
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a millimeter wave communication link blocking prediction method based on visual information fusion. The method effectively predicts moving blockages during communication, enabling a user to proactively switch to another line-of-sight base station before a blockage occurs, so that communication always remains on a line-of-sight link and the stability of the millimeter wave communication system is improved.

Description

Millimeter wave communication link blocking prediction method based on visual information fusion
Technical Field
The invention belongs to the field of wireless communication and deep learning, and particularly relates to a millimeter wave communication system beam blocking prediction method based on visual information fusion.
Background
Millimeter waves and massive MIMO are among the key technologies of 5G mobile communication. The large bandwidth of millimeter waves greatly increases channel capacity and can meet the high data-rate requirements of future applications such as autonomous driving and virtual reality. Using beamforming, the base station can aim the signal beam at the user's position, improving the communication signal-to-noise ratio.
However, a key challenge faced by millimeter wave communication systems is the susceptibility of high-frequency signals to blockage. Because of their high free-space loss and weak reflection, high-frequency signals are transmitted mainly over line-of-sight links. When an obstacle lies between the user and the base station, the received signal-to-noise ratio drops dramatically, which may cause a sudden interruption of communication and seriously affect its stability. When the link between the user and the base station is blocked, a new line-of-sight link usually has to be re-established, which takes processing time; for massive MIMO systems in particular, beam training brings a large time overhead. Given the low-latency requirements of future communication networks, a communication system should not only maintain line-of-sight connectivity but also be able to sense future blockages in advance.
Studies have shown that machine learning models can use wireless channel data (e.g., the channel or the received power) to distinguish line-of-sight from non-line-of-sight links; for example, blockage prediction can be performed by feeding a user's beam sequence into a gated recurrent unit (GRU) network. However, such algorithms suit the case of stationary blockages and cannot predict moving blockages well.
Multimodal deep learning designs algorithms so that a model can simultaneously use information from several modalities such as text, images and sound, and has recently achieved excellent performance in many natural language processing tasks. In a communication system, multimodal techniques can combine wireless channel data with data of other modalities, improving the algorithm's awareness of the environment.
Disclosure of Invention
The invention aims to provide a Transformer-based, vision-fused beam blockage prediction method to cope with the complex multi-directional moving-blockage scenes found in real communication networks, so that sudden blockages in a millimeter wave communication system can be sensed in advance. The scheme allows a user to proactively switch to another line-of-sight base station before the blockage occurs, avoiding the sudden drop in signal-to-noise ratio caused by blockage and ensuring the stability of the communication process.
In order to achieve this, the technical scheme adopted by the invention is as follows: a millimeter wave communication link blocking prediction method based on visual information fusion, comprising the following steps:
Step (1): model the beam blockage prediction problem as a binary classification problem based on multimodal information, where the model consists of a target detection module, a camera selection module, an embedding module, a Transformer module and a classification module. Initialize the model parameters, including the neural network weights and biases of each module;
The target detection module locates the coordinates of suspected obstacles in the acquired image; the embedding module encodes the input beam sequence and target coordinate sequence into vectors of a specified dimension; the camera selection module predicts, from the input beam sequence, the index of the camera covering the user; the Transformer module is an attention-based encoder; and the classification module outputs the final binary classification result.
The millimeter wave base station is equipped with three cameras, two of which are side cameras with a 75 degree field of view and one of which is a center camera with a 110 degree field of view.
Step (2): for user u, at each slot τ, construct a beam sequence of length r, $\{b_u[\tau-r+1], \ldots, b_u[\tau]\}$, and an image sequence $\{X_n[\tau-r+1], \ldots, X_n[\tau]\}$ as the training sample sequence $S_u$. Simultaneously construct a link state sequence of length r′, $\{a_u[\tau+1], \ldots, a_u[\tau+r']\}$, as the training sample label $q_u$.
(2.1) Defining the input sequence: the aim is to develop a deep learning model that uses an RGB image sequence and a beam sequence to predict the blocking condition of the communication link. For any user u in the communication environment, the image sequence and beam sequence observed over r unit time intervals form a group of input sequences. For any time slot τ, the input sequence is

$$S_u[\tau] = \{(X_n[t],\, b_u[t])\}_{t=\tau-r+1}^{\tau}$$
where $X_n[t] \in \mathbb{R}^{W \times H \times C}$ denotes the RGB image captured by the n-th camera in the t-th time slot, and W, H, C denote the width, height and number of color channels of the image, respectively. $b_u[t]$ denotes the index, in the codebook $\mathcal{F}$, of the beamforming vector used to serve user u in the t-th slot. $r \in \mathbb{Z}^{+}$ denotes the length of the observation interval.
(2.2) Defining the output variable $q_u$: let $a_u[t] \in \{0,1\}$ denote the communication link state of user u in the t-th time slot, where 0 denotes line-of-sight and 1 denotes non-line-of-sight communication. The link connection state $q_u$ of user u within a future time window of length r′ is

$$q_u[\tau] = \max\{a_u[\tau+1], \ldots, a_u[\tau+r']\} \in \{0,1\}$$

where 0 indicates that user u maintains line-of-sight communication throughout the window, and 1 indicates that a link blockage occurs within the window.
(2.3) Defining the model function: the algorithm aims to establish a function $f_\Theta(S)$ that receives the observed image-beam sequence pairs and predicts the future link state $\hat{q}_u$, where Θ denotes the parameter set of the model, learned from the labeled sequence dataset. The goal of model training can be expressed as

$$f_{\Theta}^{*} = \operatorname*{argmax}_{f_{\Theta}}\; \mathbb{P}\left(\hat{q}_u = q_u \mid S_u[\tau]\right)$$
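As an illustration of steps (2.1)-(2.3), the following minimal Python sketch (hypothetical helper name and array layouts; not part of the patent) builds one training sample $S_u$ and its label $q_u$ from per-slot logs:

```python
import numpy as np

def build_sample(images, beams, link_states, tau, r, r_prime):
    """Build one (S_u, q_u) pair as defined in (2.1)-(2.2).

    images:      [T, W, H, C] RGB frames X_n[t] from one camera
    beams:       [T]          optimal codeword indices b_u[t]
    link_states: [T]          a_u[t] in {0, 1} (0 = LOS, 1 = NLOS)
    """
    S_u = {
        "X": images[tau - r + 1 : tau + 1],  # observed image sequence
        "b": beams[tau - r + 1 : tau + 1],   # observed beam sequence
    }
    # q_u = 1 if any blockage occurs in the future window of length r'
    q_u = int(link_states[tau + 1 : tau + r_prime + 1].max())
    return S_u, q_u
```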
Step (3): input the image sequence $\{X_n[\tau-r+1], \ldots, X_n[\tau]\}$ into the target detection module, which outputs the obstacle detection-box coordinate sequence $\{d_n[\tau-r+1], \ldots, d_n[\tau]\}$;
The target detection module needs two basic capabilities: 1) fast and accurate detection of object coordinates, and 2) effective identification of object types. The YOLO detector satisfies both requirements; this module adopts the recent YOLOv5 framework with modifications. The original architecture is adapted to detect objects of interest in the scene, i.e., objects that may block the user's communication link, such as buses, trucks, trees and buildings.
For a given time slot τ, the following steps are performed in order:
(3.1) Obtain the RGB image sequence $\{X_n[\tau-r+1], \ldots, X_n[\tau]\}$, $X_n[t] \in \mathbb{R}^{W \times H \times C}$;
(3.2) Input the sequence $X_n$ into the YOLO detector to obtain the bounding-box coordinates of the detected targets;
(3.3) Convert each bounding-box coordinate into a 6-dimensional vector comprising the center coordinate $[x_{cent}, y_{cent}]$, the upper-left corner coordinate $[x_1, y_1]$ and the lower-right corner coordinate $[x_2, y_2]$. The coordinates are normalized to the interval [0,1]; together they mark the exact position of an object in the scene;
(3.4) Stack the converted coordinate vectors of one image into a high-dimensional vector $d_n[t] \in \mathbb{R}^{6M}$, where M denotes the number of target objects detected in the image and $t \in \{\tau-r+1, \ldots, \tau\}$. Since the scene is a dynamic communication environment, the number of detected objects in the image varies over time, so the length of $d_n[t]$ is variable. Padding with N−M zero vectors therefore yields a fixed-length vector $d_n[t] \in \mathbb{R}^{6N}$;
(3.5) The module finally outputs the detection-box coordinate sequence $\{d_n[\tau-r+1], \ldots, d_n[\tau]\}$.
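The conversion and zero-padding of steps (3.3)-(3.4) can be sketched as follows (a minimal illustration assuming pixel-coordinate boxes; the function name and argument layout are hypothetical):

```python
import numpy as np

def boxes_to_vector(boxes, img_w, img_h, n_max):
    """Convert detected bounding boxes into the fixed-length vector d_n[t].

    boxes: [M, 4] pixel corners (x1, y1, x2, y2); n_max: padded size N.
    Returns a [n_max, 6] array, zero-padded when M < n_max.
    """
    out = np.zeros((n_max, 6), dtype=np.float32)
    for i, (x1, y1, x2, y2) in enumerate(boxes[:n_max]):
        xc, yc = (x1 + x2) / 2, (y1 + y2) / 2  # center coordinate
        vec = np.array([xc, yc, x1, y1, x2, y2], dtype=np.float32)
        vec /= np.array([img_w, img_h] * 3, dtype=np.float32)  # normalize to [0, 1]
        out[i] = vec
    return out
```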
Step (4): input the beam sequence $\{b_u[\tau-r+1], \ldots, b_u[\tau]\}$ into the beam embedding module to obtain the beam embedding sequence $\{b[\tau-r+1], \ldots, b[\tau]\}$. Input the beam embedding sequence into the camera selection module to determine the camera covering the user at this moment. Input the detection coordinate sequence $d_n$ of that camera into the coordinate embedding module, which outputs the corresponding detection-coordinate embedding sequence $\{d[\tau-r+1], \ldots, d[\tau]\}$;
(4.1) Obtain the beam sequence $\{b_u[\tau-r+1], \ldots, b_u[\tau]\}$ of user u, where each beam is the index of the optimal codeword in the codebook $\mathcal{F}$ serving the user. The optimal codeword is defined as

$$b_u[t] = \operatorname*{argmax}_{\mathbf{f}_m \in \mathcal{F}} \sum_{k=1}^{K} \log_2\!\left(1 + \frac{P_s}{K\sigma_n^2}\,\bigl|\mathbf{h}_{u,k}^{H}\mathbf{f}_m\bigr|^2\right)$$

where $\mathbf{f}_m \in \mathbb{C}^{N_m}$ is a codeword of the codebook $\mathcal{F}$, $N_m$ is the number of base station antennas, $\mathbf{h}_{u,k} \in \mathbb{C}^{N_m}$ is the downlink channel between the base station and the user, $P_s$ denotes the transmit power, $\sigma_n^2$ denotes the noise power, and k denotes the k-th carrier.
(4.2) The beam sequence $\{b_u[\tau-r+1], \ldots, b_u[\tau]\}$ is input to the beam embedding module. Since the algorithm receives data of two modalities (beams and images) whose dimensions differ, both must be converted into vectors of the same dimension by the embedding module.
For the beam sequence, a lookup table of size $|\mathcal{F}|$ is generated; given an input beam codeword index $b_u[t]$, the embedding layer returns the embedding vector corresponding to that index, $b[t] \in \mathbb{R}^{d_{model}}$, where $d_{model}$ is the defined feature vector dimension.
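In PyTorch, the beam embedding lookup table can be sketched as below; the codebook size follows the 128-codeword DFT codebook of the simulation, while d_model = 64 is an assumed value (the patent does not fix it):

```python
import torch
import torch.nn as nn

codebook_size = 128  # |F|
d_model = 64         # assumed feature vector dimension

beam_embedding = nn.Embedding(codebook_size, d_model)  # lookup table of size |F|

beam_seq = torch.tensor([[5, 9, 12, 17, 23, 31, 40, 52]])  # b_u indices, [batch, r]
b = beam_embedding(beam_seq)  # beam embedding sequence, shape [1, 8, d_model]
```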
(4.3) The beam embedding sequence $\{b[\tau-r+1], \ldots, b[\tau]\}$ is input into the camera selection model $\mathrm{NET}_s$, which outputs the feature vector $\hat{y}_s$. The camera selection module comprises $L_s$ fully connected layers, and the model can be expressed as

$$\hat{y}_s = \mathrm{NET}_s\bigl(\{b[\tau-r+1], \ldots, b[\tau]\};\, \Theta_s\bigr)$$

where $\Theta_s = \{W_s, b_s\}$ denotes the weights and biases of the fully connected layers, and the nonlinear function applied in each layer is the ReLU activation

$$g(x) = \mathrm{ReLU}(x) = \max(0, x)$$
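A minimal sketch of the camera selection network $\mathrm{NET}_s$ (the hidden width and window length r are assumed values; the patent only specifies $L_s$ fully connected layers with ReLU). Returning raw logits keeps it compatible with the cross-entropy loss sketched in step (6):

```python
import torch
import torch.nn as nn

class CameraSelector(nn.Module):
    """Fully connected layers with ReLU; one logit per camera."""
    def __init__(self, d_model=64, r=8, hidden=128, n_cameras=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                  # [batch, r, d_model] -> [batch, r * d_model]
            nn.Linear(r * d_model, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_cameras),  # logits for the three cameras
        )

    def forward(self, b):
        return self.net(b)  # raw logits; apply softmax for probabilities

y_s_logits = CameraSelector()(torch.randn(1, 8, 64))  # -> shape [1, 3]
```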
Step (5): fuse the target-detection coordinate embedding sequence with the beam embedding sequence, feed the fused sequence into the Transformer encoder module for encoding, then feed the encoded sequence into the classification module for binary classification, predicting the link connection state $\hat{q}_u$ of user u within a future time window of length r′;
(5.1) Since the Transformer relies only on the attention mechanism and has no recurrent or convolutional structure, information with absolute position must be injected into the input sequence so that the model can exploit its order. The invention encodes the input sequence with position embedding and modal-type embedding.
The position embedding is calculated as follows:

$$PE_{(pos,\,2i)} = \sin\!\left(pos / 10000^{2i/d_{model}}\right)$$
$$PE_{(pos,\,2i+1)} = \cos\!\left(pos / 10000^{2i/d_{model}}\right)$$

where $pos \in \{0, \ldots, L_{seq}-1\}$ denotes the position of the token in the sequence, $L_{seq}$ denotes the sequence length, and $i \in [0, d_{model}/2)$ indexes the dimensions of the position embedding.
The beam embedding sequence $\{b[\tau-r+1], \ldots, b[\tau]\}$ and the target-detection coordinate embedding sequence $\{d[\tau-r+1], \ldots, d[\tau]\}$ are each passed through the position encoding function $F_{PE}(\cdot)$:

b = b + F_PE(b)
d = d + F_PE(d)

The modal-type embedding mainly lets the model distinguish the two modalities:

ME_b = full_like(b, 0)
ME_d = full_like(d, 1)
b = b + ME_b
d = d + ME_d

where full_like(x, n) constructs a vector with the same dimensions as x, filled with n.
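Both embeddings can be sketched in a few lines of PyTorch (shapes assumed [batch, length, d_model]; adding ME_b = 0 is a no-op, kept only to mirror the formulas above):

```python
import torch

def position_encoding(seq_len, d_model):
    """Sinusoidal position embedding F_PE from (5.1)."""
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
    i = torch.arange(0, d_model, 2, dtype=torch.float32)
    angle = pos / torch.pow(10000.0, i / d_model)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angle)
    pe[:, 1::2] = torch.cos(angle)
    return pe

def add_position_and_modal(b, d):
    b = b + position_encoding(b.shape[1], b.shape[2])  # b = b + F_PE(b)
    d = d + position_encoding(d.shape[1], d.shape[2])  # d = d + F_PE(d)
    b = b + torch.full_like(b, 0.0)  # modal-type embedding ME_b
    d = d + torch.full_like(d, 1.0)  # modal-type embedding ME_d
    return b, d
```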
(5.2) The beam sequence b and the target-detection coordinate sequence d are concatenated and input into the Transformer encoder model. The Transformer encoder is a stack of L multi-head attention layers and feedforward neural network layers. The multi-head attention mechanism is computed as follows:

$$\mathrm{MultiHead}(Q,K,V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,W^{O}$$
$$\mathrm{head}_i = \mathrm{Attention}(QW_i^{Q},\, KW_i^{K},\, VW_i^{V})$$
$$\mathrm{Attention}(Q,K,V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

where the input of the MultiHead function is

$$Q = K = V = \{b[\tau-r+1], \ldots, b[\tau],\, d[\tau-r+1], \ldots, d[\tau]\}$$
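The encoder itself maps directly onto PyTorch's built-in Transformer encoder; the hyper-parameters below (d_model, heads, layers) are assumed for illustration:

```python
import torch
import torch.nn as nn

d_model, n_heads, n_layers = 64, 8, 4  # assumed hyper-parameters

encoder_layer = nn.TransformerEncoderLayer(
    d_model=d_model, nhead=n_heads,
    dim_feedforward=4 * d_model, batch_first=True,
)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)

b = torch.randn(1, 8, d_model)     # beam embedding sequence
d = torch.randn(1, 8, d_model)     # coordinate embedding sequence
tokens = torch.cat([b, d], dim=1)  # concatenation, i.e. Q = K = V
y = encoder(tokens)                # encoded features, shape [1, 16, d_model]
```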
(5.3) The Transformer encoder model produces a feature vector $y_t$. Inputting $y_t$ to the classification module $\mathrm{NET}_o$ gives the final prediction

$$\hat{q}_u = \mathrm{NET}_o(y_t;\, \Theta_o)$$

where $\mathrm{NET}_o$ is a fully connected network whose activation is the ReLU function $g(x) = \max(0, x)$.
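A sketch of the classification module $\mathrm{NET}_o$; mean-pooling the encoded tokens into $y_t$ is an assumption, since the patent does not state how the feature vector is extracted from the encoder output:

```python
import torch
import torch.nn as nn

class BlockagePredictor(nn.Module):
    """NET_o: pool the encoder output, then FC + ReLU + sigmoid."""
    def __init__(self, d_model=64, hidden=64):
        super().__init__()
        self.fc1 = nn.Linear(d_model, hidden)
        self.fc2 = nn.Linear(hidden, 1)

    def forward(self, y):
        y_t = y.mean(dim=1)                # pooled feature vector y_t (assumed)
        h = torch.relu(self.fc1(y_t))
        return torch.sigmoid(self.fc2(h))  # \hat{q}_u in (0, 1)
```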
Step (6): compute the loss between the predicted value $\hat{q}_u$ and the label $q_u$, and update the model parameters Θ by back-propagating the gradient;
The classification model outputs the blockage prediction $\hat{q}_u$, with blockage label $q_u \in \{0,1\}$. In step (4), the model outputs the camera selection $\hat{y}_s$, with camera label $y_s \in \{(0,0,1), (0,1,0), (1,0,0)\}$.
The loss function $\mathcal{L}$ is defined as

$$\mathcal{L} = \mathcal{L}_{block} + \alpha\, \mathcal{L}_{cam}$$
$$\mathcal{L}_{block} = -\bigl[q_u \log \hat{q}_u + (1-q_u)\log(1-\hat{q}_u)\bigr]$$
$$\mathcal{L}_{cam} = -\sum_{j} y_{s,j} \log \hat{y}_{s,j}$$

where $\mathcal{L}_{block}$ is the blockage prediction loss, $\mathcal{L}_{cam}$ is the camera prediction loss, $\mathcal{L}$ is the total model loss, and α is the weight coefficient of the camera loss. The model parameters Θ are updated by stochastic gradient descent:

$$\Theta \leftarrow \Theta - \lambda\, \nabla_{\Theta}\, \mathcal{L}$$

where λ is the learning rate. Steps (2) to (6) are executed cyclically until the algorithm converges.
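A sketch of one training step under these definitions (α = 0.1 is an assumed weight; the camera branch follows the usual logits-plus-class-index cross-entropy convention):

```python
import torch
import torch.nn.functional as F

def total_loss(q_hat, q, cam_logits, cam_label, alpha=0.1):
    """L = L_block + alpha * L_cam."""
    loss_block = F.binary_cross_entropy(q_hat, q.float())  # blockage BCE
    loss_cam = F.cross_entropy(cam_logits, cam_label)      # camera CE (logits + class index)
    return loss_block + alpha * loss_cam

# One SGD update, Theta <- Theta - lambda * grad(L):
# optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
# loss = total_loss(q_hat, q, cam_logits, cam_label)
# optimizer.zero_grad(); loss.backward(); optimizer.step()
```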
The invention has the beneficial effects that:
1) Using a machine learning algorithm, a user can predict an impending communication link blockage and switch the communication to another line-of-sight link in advance, ensuring stable communication;
2) The model uses bimodal beam-and-image information; whereas models based only on wireless information are limited to stationary blockages, this method suits complex scenes with multi-directional moving blockages;
3) Using the attention-based Transformer model instead of networks such as RNN and LSTM greatly improves the parallel computing capability of the model, shortening training and inference time and facilitating large-scale practical application;
4) The model trains and infers using only the user's beam sequence and image information, and is insensitive to signal-to-noise-ratio changes in the communication environment.
Drawings
FIG. 1 is a flow chart of a method for beam blockage prediction;
FIG. 2 is a schematic diagram of a Transformer module;
FIG. 3 is a validation set ROC graph.
Detailed Description
The technical scheme of the invention is further explained by combining the attached drawings and the detailed implementation mode.
In order to cope with the complex multi-directional moving-blockage scenes in real communication networks, the invention provides a Transformer-based, vision-fused beam blockage prediction method, enabling sudden blockages in a millimeter wave communication system to be sensed in advance. The scheme allows a user to proactively switch to another line-of-sight base station before the blockage occurs, avoiding the sudden drop in signal-to-noise ratio caused by blockage and ensuring stable communication.
As shown in figs. 1-3, the beam blockage prediction problem is modeled as a multimodal binary classification problem based on beams and images. The model consists of a target detection module, a camera selection module, an embedding module, a Transformer encoder module and a classification module. The target detection module locates the coordinates of suspected obstacles in the acquired image; the embedding module encodes the input beam sequence and target coordinate sequence into vectors of a specified dimension; the camera selection module predicts, from the input beam sequence, the index of the camera covering the user; the Transformer encoder module is an attention-based encoder; and the classification module outputs the final binary classification result.
1. Simulation environment construction
The simulated communication environment is built on an open-source scenario, the ViWi multi-user scenario "ASUDT1_28". This is an outdoor millimeter wave communication environment built with a game engine and ray-tracing software, developed using the ViWi data generation framework. The scene depicts a typical busy street with vehicles, pedestrians, trees and buildings. The cars moving in the scene represent the communication users, while large vehicles such as moving buses and trucks act as dynamic blockages during user communication. The simulation scenario includes 50 cars, 8 buses and 2 trucks in total, all moving at different speeds.
A millimeter wave base station operating at 28 GHz is deployed beside the street. The base station carries three cameras, 4.5 meters high and facing different directions: two side cameras with 75-degree fields of view and one center camera with a 110-degree field of view. The base station has a uniform linear array of N antennas and uses a predefined DFT codebook $\mathcal{F} = \{\mathbf{f}_m\}_{m=1}^{N}$, where the m-th codeword $\mathbf{f}_m$ can be expressed as

$$\mathbf{f}_m = \frac{1}{\sqrt{N}}\left[1,\; e^{j\frac{2\pi}{N}m},\; \ldots,\; e^{j\frac{2\pi}{N}m(N-1)}\right]^{T}$$
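Under this definition, the codebook can be generated with a few lines of NumPy (a sketch; the phase convention matches the expression above):

```python
import numpy as np

def dft_codebook(n_antennas=128):
    """N x N DFT codebook for a uniform linear array; column m is codeword f_m."""
    n = np.arange(n_antennas).reshape(-1, 1)  # antenna index
    m = np.arange(n_antennas).reshape(1, -1)  # codeword index
    return np.exp(1j * 2 * np.pi * n * m / n_antennas) / np.sqrt(n_antennas)

F = dft_codebook()  # shape [128, 128]
```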
User u in the simulation system communicates via OFDM with K subcarriers; the received downlink signal is

$$y_{u,k} = \mathbf{h}_{u,k}^{H}\,\mathbf{f}\, s_k + n_k$$

where $y_{u,k}$ is the received signal of user u on carrier k, $\mathbf{h}_{u,k} \in \mathbb{C}^{N}$ is the channel between the base station and user u at carrier k, and $n_k$ is random noise following the Gaussian distribution $\mathcal{CN}(0, \sigma^2)$. The channel $\mathbf{h}_{u,k}$ is expressed as

$$\mathbf{h}_{u,k} = \sum_{l=1}^{L} \alpha_l\, e^{\,j\left(\upsilon_l - \frac{2\pi k}{K}\tau_l B\right)}\, \mathbf{a}\!\left(\phi_l^{az},\, \phi_l^{el}\right)$$

where $\alpha_l$ is the attenuation coefficient of the l-th path, $\phi_l^{az}$ is the departure azimuth angle of the l-th path, $\phi_l^{el}$ is the departure elevation of the l-th path, $\upsilon_l$ is the phase of path l, $\tau_l$ is the propagation delay of path l, B is the signal bandwidth, K is the number of carriers, and $\mathbf{a}(\cdot)$ is the array response vector.
For user u, the beam at the current time is the index of the optimal codeword between the user and the base station; the optimal codeword is defined as

$$b_u[t] = \operatorname*{argmax}_{\mathbf{f}_m \in \mathcal{F}} \sum_{k=1}^{K} \log_2\!\left(1 + \frac{P_s}{K\sigma_n^2}\,\bigl|\mathbf{h}_{u,k}^{H}\mathbf{f}_m\bigr|^2\right)$$

where $\mathbf{f}_m \in \mathbb{C}^{N_m}$ is a codeword of the codebook $\mathcal{F}$, $N_m$ is the number of base station antennas, $\mathbf{h}_{u,k} \in \mathbb{C}^{N_m}$ is the downlink channel between the base station and user u, $P_s$ denotes the transmit power, $\sigma_n^2$ denotes the noise power, and k denotes the k-th carrier.
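A sketch of the beam (optimal codeword) selection implied by this definition, assuming the channel-matrix and codebook layouts shown in the comments:

```python
import numpy as np

def best_codeword(H, F, p_s=1.0, noise_power=1e-3):
    """Return the codebook index maximizing the rate summed over carriers.

    H: [K, N] downlink channels h_{u,k};  F: [N, M] codebook (column = codeword).
    """
    K = H.shape[0]
    gain = np.abs(H.conj() @ F) ** 2  # |h_{u,k}^H f_m|^2, shape [K, M]
    rate = np.log2(1 + p_s / (K * noise_power) * gain).sum(axis=0)
    return int(np.argmax(rate))       # beam index b_u[t]
```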
The simulation environment parameters are as follows:
1) The base station antennas form a uniform linear array of 128 antennas; the user side adopts a single omnidirectional antenna;
2) The base station operating frequency is 28 GHz;
3) A DFT codebook with 128 codewords is used;
4) The number of carriers K is 64.
2. Communication blocking prediction method training process based on visual information fusion
Construct $\{b_u[\tau-r+1], \ldots, b_u[\tau]\}$ as the original beam input sequence, where τ is the current time slot and r is the time window length. Construct $\{X_n[\tau-r+1], \ldots, X_n[\tau]\}$, n ∈ {1, 2, 3}, as the original image input sequence, where n is the camera index. For any time slot τ, the input sequence is represented as

$$S_u[\tau] = \{(X_n[t],\, b_u[t])\}_{t=\tau-r+1}^{\tau}$$
Let $a_u[t] \in \{0,1\}$ denote the communication link state of user u in the t-th time slot, where 0 denotes line-of-sight and 1 denotes non-line-of-sight communication. The link connection state $q_u$ of user u within a future time window of length r′ is

$$q_u[\tau] = \max\{a_u[\tau+1], \ldots, a_u[\tau+r']\} \in \{0,1\}$$

where 0 indicates that user u maintains line-of-sight communication throughout the window, and 1 indicates that a link blockage occurs within the window.
Establish a function $f_\Theta(S)$ that receives the observed image-beam sequence pairs and predicts the future link state $\hat{q}_u$, where Θ denotes the parameter set of the model, learned from the labeled sequence dataset. The goal of model training can be expressed as

$$f_{\Theta}^{*} = \operatorname*{argmax}_{f_{\Theta}}\; \mathbb{P}\left(\hat{q}_u = q_u \mid S_u[\tau]\right)$$
Input the sequence $X_n$ into the YOLO detector to obtain the bounding-box coordinates of the detected objects, and convert each bounding-box coordinate into a 6-dimensional vector comprising the center coordinate $[x_{cent}, y_{cent}]$, the upper-left corner coordinate $[x_1, y_1]$ and the lower-right corner coordinate $[x_2, y_2]$. The coordinates are normalized to the interval [0,1]; together they mark the exact position of an object in the scene. Stack the converted coordinate vectors of one image into a high-dimensional vector $d_n[t] \in \mathbb{R}^{6M}$, where M denotes the number of target objects detected in the image and $t \in \{\tau-r+1, \ldots, \tau\}$. Since the scene is a dynamic communication environment, the number of detected objects in the image varies over time, so the length of $d_n[t]$ is variable. Padding with N−M zero vectors therefore yields a fixed-length vector $d_n[t] \in \mathbb{R}^{6N}$.
Input the beam sequence $\{b_u[\tau-r+1], \ldots, b_u[\tau]\}$ into the beam embedding module to obtain the beam embedding sequence $\{b[\tau-r+1], \ldots, b[\tau]\}$. Input the beam embedding sequence into the camera selection module to determine the camera covering the user at this moment. Input the detection coordinate sequence $d_n$ of that camera into the coordinate embedding module, which outputs the corresponding detection-coordinate embedding sequence $\{d[\tau-r+1], \ldots, d[\tau]\}$.
The target-detection coordinate embedding sequence and the beam embedding sequence are fused and fed into the Transformer encoder module. Since the Transformer relies only on the attention mechanism and has no recurrent or convolutional structure, information with absolute position must be injected into the input sequence so that the model can exploit its order. The invention encodes the input sequence with position embedding and modal-type embedding.
The position embedding is calculated as follows:

$$PE_{(pos,\,2i)} = \sin\!\left(pos / 10000^{2i/d_{model}}\right)$$
$$PE_{(pos,\,2i+1)} = \cos\!\left(pos / 10000^{2i/d_{model}}\right)$$

where $pos \in \{0, \ldots, L_{seq}-1\}$ denotes the position of the token in the sequence, $L_{seq}$ denotes the sequence length, and $i \in [0, d_{model}/2)$ indexes the dimensions of the position embedding.
The beam embedding sequence $\{b[\tau-r+1], \ldots, b[\tau]\}$ and the target-detection coordinate embedding sequence $\{d[\tau-r+1], \ldots, d[\tau]\}$ are each passed through the position encoding function $F_{PE}(\cdot)$:

b = b + F_PE(b)
d = d + F_PE(d)

The modal-type embedding mainly lets the model distinguish the two modalities:

ME_b = full_like(b, 0)
ME_d = full_like(d, 1)
b = b + ME_b
d = d + ME_d

where full_like(x, n) constructs a vector with the same dimensions as x, filled with n.
Concatenate the beam sequence b and the target-detection coordinate sequence d and input them into the Transformer encoder model, which produces a feature vector $y_t$. Inputting $y_t$ to the classification module $\mathrm{NET}_o$ gives the final prediction

$$\hat{q}_u = \mathrm{NET}_o(y_t;\, \Theta_o)$$

where $\mathrm{NET}_o$ is a fully connected network whose activation is the ReLU function $g(x) = \max(0, x)$.
Compute the loss between the predicted value $\hat{q}_u$ and the label $q_u$, and update the model parameters Θ by back-propagating the gradient.
The classification model outputs the blockage prediction $\hat{q}_u$, with blockage label $q_u \in \{0,1\}$. The model also outputs the camera selection $\hat{y}_s$, with camera label $y_s \in \{(0,0,1), (0,1,0), (1,0,0)\}$.
The loss function $\mathcal{L}$ is defined as

$$\mathcal{L} = \mathcal{L}_{block} + \alpha\, \mathcal{L}_{cam}$$
$$\mathcal{L}_{block} = -\bigl[q_u \log \hat{q}_u + (1-q_u)\log(1-\hat{q}_u)\bigr]$$
$$\mathcal{L}_{cam} = -\sum_{j} y_{s,j} \log \hat{y}_{s,j}$$

where $\mathcal{L}_{block}$ is the blockage prediction loss, $\mathcal{L}_{cam}$ is the camera prediction loss, $\mathcal{L}$ is the total model loss, and α is the weight coefficient of the camera loss. The model parameters Θ are updated by stochastic gradient descent:

$$\Theta \leftarrow \Theta - \lambda\, \nabla_{\Theta}\, \mathcal{L}$$
The training results are as follows: after model training is complete, 2050 samples in total are input as the test set, of which 1280 are non-blocked samples (label 0) and 770 are blocked samples (label 1).
The verification results (confusion matrix, rows = true label, columns = predicted label) are as follows:

real \ predicted      0       1
0                  1187      93
1                    49     721
the prediction accuracy is as follows:
Figure BDA00036274422000001011
recall is defined as the proportion of samples correctly predicted to be blocked to all samples labeled as blocked
Figure BDA00036274422000001012
Precision is defined as the ratio of correctly predicted as blocked samples to all predicted as blocked samples
Figure BDA00036274422000001013
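These figures can be re-derived directly from the confusion matrix above (TN = 1187, FP = 93, FN = 49, TP = 721):

```python
tn, fp, fn, tp = 1187, 93, 49, 721  # from the verification table

accuracy = (tp + tn) / (tp + tn + fp + fn)  # ~0.931
recall = tp / (tp + fn)                     # ~0.936
precision = tp / (tp + fp)                  # ~0.886
print(f"acc={accuracy:.3f} recall={recall:.3f} precision={precision:.3f}")
```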
The ROC curves for model prediction are shown in fig. 3.
It should be noted that modifications and adaptations may occur to those skilled in the art without departing from the principles of the present invention and should be considered within the scope of the present invention.

Claims (7)

1. A millimeter wave communication link blocking prediction method based on visual information fusion is characterized by comprising the following steps:
step (1): modeling the beam blockage prediction problem as a binary classification problem based on multimodal information, wherein the model consists of a target detection module, a camera selection module, an embedding module, a Transformer module and a classification module; initializing model parameters, including the neural network weights and biases of the modules;
step (2): for user u, at each time slot τ, constructing a beam sequence of length r, $\{b_u[\tau-r+1], \ldots, b_u[\tau]\}$, and an image sequence $\{X_n[\tau-r+1], \ldots, X_n[\tau]\}$ as the training sample sequence $S_u$; simultaneously constructing a link state sequence of length r′, $\{a_u[\tau+1], \ldots, a_u[\tau+r']\}$, as the training sample label $q_u$;
step (3): inputting the image sequence $\{X_n[\tau-r+1], \ldots, X_n[\tau]\}$ into the target detection module, which outputs the obstacle detection-box coordinate sequence $\{d_n[\tau-r+1], \ldots, d_n[\tau]\}$;
step (4): inputting the beam sequence $\{b_u[\tau-r+1], \ldots, b_u[\tau]\}$ into the beam embedding module to obtain the beam embedding sequence $\{b[\tau-r+1], \ldots, b[\tau]\}$; inputting the beam embedding sequence into the camera selection module to determine the camera covering the user at this moment; inputting the detection coordinate sequence $d_n$ of that camera into the coordinate embedding module, which outputs the corresponding detection-coordinate embedding sequence $\{d[\tau-r+1], \ldots, d[\tau]\}$;
step (5): fusing the target-detection coordinate embedding sequence with the beam embedding sequence, feeding the fused sequence into the Transformer module for encoding, then feeding the encoded sequence into the classification module for binary classification, predicting the link connection state $\hat{q}_u$ of user u within a future time window of length r′;
step (6): computing the loss between the predicted value $\hat{q}_u$ and the label $q_u$, and updating the model parameters Θ by back-propagating the gradient;
executing steps (2) to (6) cyclically until the algorithm converges.
2. The millimeter wave communication link blocking prediction method based on visual information fusion of claim 1, wherein: the step (1) specifically comprises the following steps:
the target detection module locates the coordinates of suspected obstacles in the acquired image; the embedding module encodes the input beam sequence and target coordinate sequence into vectors of a specified dimension; the camera selection module predicts, from the input beam sequence, the index of the camera covering the user; the Transformer module is an attention-based encoder; and the classification module outputs the final binary classification result;
the millimeter wave base station is equipped with three cameras, two of which are side cameras with a 75 degree field of view and one of which is a center camera with a 110 degree field of view.
3. The millimeter wave communication link blocking prediction method based on visual information fusion of claim 1, wherein: the step (2) specifically comprises:
(2.1) defining the input sequence: a deep learning model is developed that uses an RGB image sequence and a beam sequence to predict the blocking condition of the communication link; for any user u in the communication environment, the image sequence and beam sequence observed over r unit time intervals form a group of input sequences; for any time slot τ, the input sequence is

$$S_u[\tau] = \{(X_n[t],\, b_u[t])\}_{t=\tau-r+1}^{\tau}$$

where $X_n[t] \in \mathbb{R}^{W \times H \times C}$ denotes the RGB image captured by the n-th camera in the t-th time slot, W, H, C denote the width, height and number of color channels of the image respectively, $b_u[t]$ denotes the index in the codebook $\mathcal{F}$ of the beamforming vector used to serve user u in the t-th slot, and $r \in \mathbb{Z}^{+}$ denotes the length of the observation interval;
(2.2) defining the output variable $q_u$: let $a_u[t] \in \{0,1\}$ denote the communication link state of user u in the t-th time slot, where 0 denotes line-of-sight and 1 denotes non-line-of-sight communication; the link connection state $q_u$ of user u within a future time window of length r′ is

$$q_u[\tau] = \max\{a_u[\tau+1], \ldots, a_u[\tau+r']\} \in \{0,1\}$$

where 0 indicates that user u maintains line-of-sight communication throughout the window, and 1 indicates that a link blockage occurs within the window;
(2.3) defining the model function: a function $f_\Theta(S)$ is established that receives the observed image-beam sequence pairs and predicts the future link state $\hat{q}_u$, where Θ denotes the parameter set of the model, learned from the labeled sequence dataset; the goal of model training can be expressed as

$$f_{\Theta}^{*} = \operatorname*{argmax}_{f_{\Theta}}\; \mathbb{P}\left(\hat{q}_u = q_u \mid S_u[\tau]\right)$$
4. The millimeter wave communication link blocking prediction method based on visual information fusion of claim 1, wherein: the step (3) specifically comprises the following steps:
(3.1) obtaining the RGB image sequence $\{X_n[\tau-r+1], \ldots, X_n[\tau]\}$, $X_n[t] \in \mathbb{R}^{W \times H \times C}$;
(3.2) inputting the sequence $X_n$ into the YOLO detector to obtain the bounding-box coordinates of the detected targets;
(3.3) converting each bounding-box coordinate into a 6-dimensional vector comprising the center coordinate $[x_{cent}, y_{cent}]$, the upper-left corner coordinate $[x_1, y_1]$ and the lower-right corner coordinate $[x_2, y_2]$; the coordinates are normalized to the interval [0,1]; together they mark the exact position of an object in the scene;
(3.4) stacking the converted coordinate vectors of one image into a high-dimensional vector $d_n[t] \in \mathbb{R}^{6M}$, where M denotes the number of target objects detected in the image and $t \in \{\tau-r+1, \ldots, \tau\}$; since the scene is a dynamic communication environment, the number of detected objects in the image varies over time, so the length of $d_n[t]$ is variable; padding with N−M zero vectors therefore yields a fixed-length vector $d_n[t] \in \mathbb{R}^{6N}$;
(3.5) the module finally outputs the detection-box coordinate sequence $\{d_n[\tau-r+1], \ldots, d_n[\tau]\}$.
5. The millimeter wave communication link blocking prediction method based on visual information fusion of claim 1, wherein: the step (4) specifically comprises the following steps:
(4.1) obtaining the beam sequence $\{b_u[\tau-r+1], \ldots, b_u[\tau]\}$ of user u, where each beam is the index of the optimal codeword in the codebook $\mathcal{F}$ serving the user; the optimal codeword is defined as

$$b_u[t] = \operatorname*{argmax}_{\mathbf{f}_m \in \mathcal{F}} \sum_{k=1}^{K} \log_2\!\left(1 + \frac{P_s}{K\sigma_n^2}\,\bigl|\mathbf{h}_{u,k}^{H}\mathbf{f}_m\bigr|^2\right)$$

where $\mathbf{f}_m \in \mathbb{C}^{N_m}$ is a codeword of the codebook $\mathcal{F}$, $N_m$ is the number of base station antennas, $\mathbf{h}_{u,k} \in \mathbb{C}^{N_m}$ is the downlink channel between the base station and the user, $P_s$ denotes the transmit power, $\sigma_n^2$ denotes the noise power, and k denotes the k-th carrier;
(4.2) the beam sequence $\{b_u[\tau-r+1], \ldots, b_u[\tau]\}$ is input to the beam embedding module; since the algorithm receives data of two modalities whose dimensions differ, both must be converted into vectors of the same dimension by the embedding module;
for the beam sequence, a lookup table of size $|\mathcal{F}|$ is generated; given an input beam codeword index $b_u[t]$, the embedding layer returns the embedding vector corresponding to that index, $b[t] \in \mathbb{R}^{d_{model}}$, where $d_{model}$ is the defined feature vector dimension;
(4.3) the beam embedding sequence $\{b[\tau-r+1], \ldots, b[\tau]\}$ is input into the camera selection model $\mathrm{NET}_s$, which outputs the feature vector $\hat{y}_s$; the camera selection module comprises $L_s$ fully connected layers, and the model is expressed as

$$\hat{y}_s = \mathrm{NET}_s\bigl(\{b[\tau-r+1], \ldots, b[\tau]\};\, \Theta_s\bigr)$$

where $\Theta_s = \{W_s, b_s\}$ denotes the weights and biases of the fully connected layers, and the nonlinear function applied in each layer is the ReLU activation

$$g(x) = \mathrm{ReLU}(x) = \max(0, x)$$
6. The millimeter wave communication link blocking prediction method based on visual information fusion of claim 1, wherein: the step (5) specifically comprises:
(5.1) since the Transformer relies only on the attention mechanism and has no recurrent or convolutional structure, information with absolute position must be injected into the input sequence so that the model can exploit its order; the input sequence is encoded with position embedding and modal-type embedding;
the position embedding is calculated as follows:

$$PE_{(pos,\,2i)} = \sin\!\left(pos / 10000^{2i/d_{model}}\right)$$
$$PE_{(pos,\,2i+1)} = \cos\!\left(pos / 10000^{2i/d_{model}}\right)$$

where $pos \in \{0, \ldots, L_{seq}-1\}$ denotes the position of the token in the sequence, $L_{seq}$ denotes the sequence length, and $i \in [0, d_{model}/2)$ indexes the dimensions of the position embedding;
the beam embedding sequence $\{b[\tau-r+1], \ldots, b[\tau]\}$ and the target-detection coordinate embedding sequence $\{d[\tau-r+1], \ldots, d[\tau]\}$ are each passed through the position encoding function $F_{PE}(\cdot)$:

b = b + F_PE(b)
d = d + F_PE(d)

the modal-type embedding mainly lets the model distinguish the two modalities:

ME_b = full_like(b, 0)
ME_d = full_like(d, 1)
b = b + ME_b
d = d + ME_d

where full_like(x, n) constructs a vector with the same dimensions as x, filled with n;
(5.2) concatenating the beam sequence b and the target-detection coordinate sequence d and inputting them into the Transformer model; the Transformer encoder is a stack of L multi-head attention layers and feedforward neural network layers; the multi-head attention mechanism is computed as follows:

$$\mathrm{MultiHead}(Q,K,V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,W^{O}$$
$$\mathrm{head}_i = \mathrm{Attention}(QW_i^{Q},\, KW_i^{K},\, VW_i^{V})$$
$$\mathrm{Attention}(Q,K,V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

where the input of the MultiHead function is

$$Q = K = V = \{b[\tau-r+1], \ldots, b[\tau],\, d[\tau-r+1], \ldots, d[\tau]\}$$

(5.3) the Transformer encoder model produces a feature vector $y_t$; inputting $y_t$ to the classification module $\mathrm{NET}_o$ gives the final prediction

$$\hat{q}_u = \mathrm{NET}_o(y_t;\, \Theta_o)$$

where $\mathrm{NET}_o$ is a fully connected network whose activation is the ReLU function $g(x) = \max(0, x)$.
7. The millimeter wave communication link blocking prediction method based on visual information fusion of claim 1, wherein in step (6), the classification model outputs the blockage prediction $\hat{q}_u$, with blockage label $q_u \in \{0,1\}$; in step (4), the model outputs the camera selection $\hat{y}_s$, with camera label $y_s \in \{(0,0,1), (0,1,0), (1,0,0)\}$;
the loss function $\mathcal{L}$ is defined as

$$\mathcal{L} = \mathcal{L}_{block} + \alpha\, \mathcal{L}_{cam}$$
$$\mathcal{L}_{block} = -\bigl[q_u \log \hat{q}_u + (1-q_u)\log(1-\hat{q}_u)\bigr]$$
$$\mathcal{L}_{cam} = -\sum_{j} y_{s,j} \log \hat{y}_{s,j}$$

where $\mathcal{L}_{block}$ is the blockage prediction loss, $\mathcal{L}_{cam}$ is the camera prediction loss, $\mathcal{L}$ is the total model loss, and α is the weight coefficient of the camera loss; the model parameters Θ are updated by stochastic gradient descent:

$$\Theta \leftarrow \Theta - \lambda\, \nabla_{\Theta}\, \mathcal{L}$$

where λ is the learning rate; steps (2) to (6) are executed cyclically until the algorithm converges.
CN202210480580.3A 2022-05-05 2022-05-05 Millimeter wave communication link blocking prediction method based on visual information fusion Pending CN114845332A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210480580.3A CN114845332A (en) 2022-05-05 2022-05-05 Millimeter wave communication link blocking prediction method based on visual information fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210480580.3A CN114845332A (en) 2022-05-05 2022-05-05 Millimeter wave communication link blocking prediction method based on visual information fusion

Publications (1)

Publication Number Publication Date
CN114845332A true CN114845332A (en) 2022-08-02

Family

ID=82568278

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210480580.3A Pending CN114845332A (en) 2022-05-05 2022-05-05 Millimeter wave communication link blocking prediction method based on visual information fusion

Country Status (1)

Country Link
CN (1) CN114845332A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113595608A (en) * 2021-06-23 2021-11-02 清华大学 Millimeter wave/terahertz communication method, device and system based on visual perception
CN113709701A (en) * 2021-08-27 2021-11-26 西安电子科技大学 Millimeter wave vehicle networking combined beam distribution and relay selection method
WO2021255640A1 (en) * 2020-06-16 2021-12-23 King Abdullah University Of Science And Technology Deep-learning-based computer vision method and system for beam forming

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021255640A1 (en) * 2020-06-16 2021-12-23 King Abdullah University Of Science And Technology Deep-learning-based computer vision method and system for beam forming
CN113595608A (en) * 2021-06-23 2021-11-02 清华大学 Millimeter wave/terahertz communication method, device and system based on visual perception
CN113709701A (en) * 2021-08-27 2021-11-26 西安电子科技大学 Millimeter wave vehicle networking combined beam distribution and relay selection method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Ma Wenyan; Qi Chenhao: "Deep learning based beam selection method for millimeter wave communication in uplink transmission", Journal of Hefei University of Technology (Natural Science), no. 12, 28 December 2019 (2019-12-28) *

Similar Documents

Publication Publication Date Title
Charan et al. Vision-aided 6G wireless communications: Blockage prediction and proactive handoff
US11940803B2 (en) Method, apparatus and computer storage medium for training trajectory planning model
CN109961019A (en) A kind of time-space behavior detection method
JP2020119501A (en) Learning method and learning device for improving segmentation performance to be used for detecting events including pedestrian event, automobile event, falling event, and fallen event by using edge loss, and test method and test device using the same
CN111461251A (en) Indoor positioning method of WiFi fingerprint based on random forest and self-encoder
WO2021183993A1 (en) Vision-aided wireless communication systems
WO2021255640A1 (en) Deep-learning-based computer vision method and system for beam forming
CN114266938A (en) Scene recognition method based on multi-mode information and global attention mechanism
Comiter et al. Localization convolutional neural networks using angle of arrival images
Yang et al. Environment semantics aided wireless communications: A case study of mmWave beam prediction and blockage prediction
CN114844545A (en) Communication beam selection method based on sub6GHz channel and partial millimeter wave pilot frequency
CN116580322A (en) Unmanned aerial vehicle infrared small target detection method under ground background
US20230362039A1 (en) Neural network-based channel estimation method and communication apparatus
CN114845332A (en) Millimeter wave communication link blocking prediction method based on visual information fusion
CN116503723A (en) Dense multi-scale target detection method in low-visibility environment
Wu et al. Enhanced path loss model by image-based environmental characterization
Lin et al. Multi-camera view based proactive bs selection and beam switching for v2x
CN113030853B (en) RSS and AOA combined measurement-based multi-radiation source passive positioning method
CN115426671A (en) Method, system and equipment for graph neural network training and wireless cell fault prediction
NL2026432B1 (en) Multi-source target tracking method for complex scenes
CN112765892A (en) Intelligent switching judgment method in heterogeneous Internet of vehicles
Yapar et al. The First Pathloss Radio Map Prediction Challenge
Lin et al. Multi-Camera Views Based Beam Searching and BS Selection with Reduced Training Overhead
Neema et al. User spatial localization for vision aided beam tracking based millimeter wave systems using convolutional neural networks
Xiang et al. Computer Vision Aided Beamforming Fused with Limited Feedback

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination