CN114845332A - Millimeter wave communication link blocking prediction method based on visual information fusion - Google Patents
- Publication number: CN114845332A
- Application: CN202210480580.3A
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Classifications
- H04W24/06: Testing, supervising or monitoring using simulated traffic
- G06N3/044: Recurrent networks, e.g. Hopfield networks
- G06N3/045: Combinations of networks
- G06N3/084: Backpropagation, e.g. using gradient descent
- H04L41/147: Network analysis or design for predicting network behaviour
- H04W36/08: Reselecting an access point
- H04W36/30: Reselection triggered by measured or perceived connection quality data
- Y02D30/70: Reducing energy consumption in wireless communication networks
Abstract
The invention discloses a millimeter wave communication link blocking prediction method based on visual information fusion. The method effectively predicts moving blockages during communication, allowing a user to proactively switch to another line-of-sight base station before a blockage occurs, keeping the communication in a line-of-sight link state and improving the stability of the millimeter wave communication system.
Description
Technical Field
The invention belongs to the field of wireless communication and deep learning, and particularly relates to a millimeter wave communication system beam blocking prediction method based on visual information fusion.
Background
Millimeter waves and massive MIMO are among the key technologies of 5G mobile communication. The large bandwidth of millimeter waves greatly increases channel capacity and can meet the high data-rate requirements of future applications such as autonomous driving and virtual reality. Using beamforming, the base station can aim the signal beam at the user's position, improving the communication signal-to-noise ratio.
However, a key challenge for millimeter wave communication systems is the susceptibility of high-frequency signals to blockage. Because of their high free-space loss and weak reflection, high-frequency signals propagate mainly over line-of-sight links. When an obstacle comes between the user and the base station, the received signal-to-noise ratio drops sharply, which can cause a sudden communication interruption and seriously degrade stability. When the link between user and base station is blocked, a new line-of-sight link usually has to be re-established, which takes processing time; for massive MIMO systems in particular, beam training brings a large time overhead. Given the low-latency requirements of future networks, a communication system should not only maintain line-of-sight connectivity but also be able to anticipate future blockages.
Studies have shown that machine learning models can use wireless channel data (e.g., channel state or received power) to distinguish line-of-sight from non-line-of-sight links; for example, blockage prediction can be performed by feeding a user's beam sequence into a gated recurrent unit (GRU) network. However, such algorithms suit only fixed blockages and cannot predict moving blockages well.
Multimodal deep learning designs algorithms so that a model can ingest several modalities at once, such as text, images, and sound, and has recently achieved excellent performance on many tasks such as natural language processing. In a communication system, multimodal techniques can combine wireless channel data with data from other modalities, improving the algorithm's perception of the environment.
Disclosure of Invention
To handle the complex scenario of multidirectional moving blockages in real communication networks, the invention provides a Transformer-based, vision-fused beam-blockage prediction method, so that sudden blockages in a millimeter wave communication system can be sensed in advance. The scheme lets a user proactively switch to another line-of-sight base station before the blockage occurs, avoids the sudden drop in signal-to-noise ratio caused by blockage, and ensures the stability of the communication process.
To achieve this, the invention adopts the following technical scheme: a millimeter wave communication link blocking prediction method based on visual information fusion, comprising the following steps:
Step (1): model the beam-blockage prediction problem as a binary classification problem over multimodal information. The model consists of a target detection module, a camera selection module, an embedding module, a Transformer module, and a classification module. Initialize the model parameters, including the neural-network weights and biases of each module;
The target detection module locates the coordinates of suspected obstacles in the captured images; the embedding module encodes the input beam sequence and target-coordinate sequence into vectors of a specified dimension; the camera selection module predicts, from the input beam sequence, the index of the camera covering the user; the Transformer module is an attention-based encoder; and the classification module outputs the model's binary classification result.
The millimeter wave base station is equipped with three cameras, two of which are side cameras with a 75 degree field of view and one of which is a center camera with a 110 degree field of view.
Step (2): for user u, at each slot τ, construct a beam sequence of length r, {b_u[τ-r+1], ..., b_u[τ]}, and an image sequence {X_n[τ-r+1], ..., X_n[τ]} as the training sample sequence S_u. Simultaneously construct a link-state sequence of length r′, {a_u[τ+1], ..., a_u[τ+r′]}, as the training sample label q_u;
(2.1) Defining the input sequence: the aim is to develop a deep learning model that predicts the link-blockage state from an RGB image sequence and a beam sequence. For any user u in the communication environment, the images and beams observed over r unit time slots form one input sequence. For any slot τ the sequence is

$$S_u[\tau] = \{(X_n[t],\, b_u[t])\}_{t=\tau-r+1}^{\tau}$$

where $X_n[t] \in \mathbb{R}^{W \times H \times C}$ is the RGB image captured by the n-th camera in slot t, with W, H, C the image width, height, and number of color channels; $b_u[t]$ is the index, in codebook $\mathcal{F}$, of the beamforming vector serving user u in slot t; and r is the length of the observation interval.
(2.2) Defining the output variable q_u: let $a_u[t] \in \{0,1\}$ denote the communication link state of user u in slot t, where 0 denotes line-of-sight and 1 non-line-of-sight communication. The link connection state of user u over a future window of length r′ is

$$q_u = \max\{a_u[\tau+1], \ldots, a_u[\tau+r']\}$$

where 0 indicates user u keeps line-of-sight communication throughout the window and 1 indicates a link blockage occurs within it.
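The label construction in step (2.2) reduces to taking the maximum of the link-state bits over the future window. A minimal sketch (function name and data are illustrative, not from the patent):

```python
def make_label(link_states, tau, r_prime):
    """Binary label q_u for user u at slot tau.

    link_states: per-slot link states (0 = line-of-sight, 1 = blocked).
    q_u = 1 if any blockage occurs in the future window
    (tau+1 .. tau+r_prime), i.e. the max over that window; else 0.
    """
    window = link_states[tau + 1 : tau + 1 + r_prime]
    return max(window)

# A user stays line-of-sight for six slots, then is blocked at slot 6:
a_u = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0]
label_at_3 = make_label(a_u, tau=3, r_prime=3)   # window covers slots 4..6
label_at_0 = make_label(a_u, tau=0, r_prime=3)   # window covers slots 1..3
```

With this windowing, a sample is labeled "blocked" as soon as any slot in the look-ahead window loses line of sight, which matches the definition above.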
(2.3) Defining the model function: the algorithm aims to learn a function $f_\Theta(S)$ that receives the observed image-beam sequence and predicts the future link state $\hat{q}_u$, where Θ is the set of model parameters learned from the labeled sequence dataset. The goal of training is to choose Θ so that the prediction $\hat{q}_u$ matches the label $q_u$ with maximum probability over the dataset.
Step (3): input the image sequence {X_n[τ-r+1], ..., X_n[τ]} into the target detection module, which outputs the obstacle detection-box coordinate sequence {d_n[τ-r+1], ..., d_n[τ]};
The target detection module needs two basic capabilities: 1) fast and accurate localization of object coordinates, and 2) reliable identification of object types. The YOLO detector offers a good balance of speed and precision; the module adopts the YOLOv5 framework with optimizations. The original architecture is adapted to detect objects of interest, i.e. objects that may block the user's communication link in the scene, such as buses, trucks, trees, and buildings.
For a certain time slot τ, the following steps are performed in order:
(3.2) Input the sequence X_n into the YOLO detector to obtain the bounding-box coordinates of the detected targets;
(3.3) Convert each bounding box into a 6-dimensional vector comprising the center coordinate [x_cent, y_cent], the top-left coordinate [x_1, y_1], and the bottom-right coordinate [x_2, y_2]. The coordinates are normalized to the interval [0, 1]; together they mark the exact position of an object in the scene;
(3.4) Stack the converted coordinate vectors of one image into a high-dimensional vector $d_n[t] \in \mathbb{R}^{M \times 6}$, where M is the number of target objects detected in the image and $t \in \{\tau-r+1, \ldots, \tau\}$. Because the scene is a dynamic communication environment, the number of detected objects per image is not fixed, so $d_n[t]$ has variable length. Padding with N - M zero vectors therefore yields a fixed-length sequence $d_n[t] \in \mathbb{R}^{N \times 6}$.
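Steps (3.3)-(3.4) can be sketched as follows. This is an illustrative implementation, not the patent's code; the function name, corner-format input, and image size are assumptions:

```python
import numpy as np

def boxes_to_padded_vectors(boxes, img_w, img_h, n_max):
    """Convert detector bounding boxes to padded 6-dim coordinate vectors.

    boxes: list of (x1, y1, x2, y2) pixel corners per detected object.
    Each box becomes [x_cent, y_cent, x1, y1, x2, y2], normalized to [0, 1].
    The M detected boxes are padded with n_max - M zero vectors so every
    image yields a fixed-shape (n_max, 6) array.
    """
    out = np.zeros((n_max, 6), dtype=np.float32)
    for i, (x1, y1, x2, y2) in enumerate(boxes[:n_max]):
        xc, yc = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        out[i] = [xc / img_w, yc / img_h,
                  x1 / img_w, y1 / img_h,
                  x2 / img_w, y2 / img_h]
    return out

# One detected object in a 1000x500 image, padded to n_max = 4 slots:
d_t = boxes_to_padded_vectors([(100, 50, 300, 250)],
                              img_w=1000, img_h=500, n_max=4)
```

The zero-padding keeps the coordinate sequence a fixed length even though the number of detections varies per slot, as the step above requires.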
Step (4): input the beam sequence {b_u[τ-r+1], ..., b_u[τ]} into the beam embedding module to obtain the beam embedding sequence {b[τ-r+1], ..., b[τ]}. Input the beam embedding sequence into the camera selection module to determine which camera currently covers the user. Input the detection coordinate sequence d_n of that camera into the coordinate embedding module, which outputs the corresponding detection-coordinate embedding sequence {d[τ-r+1], ..., d[τ]};
(4.1) Obtain the beam sequence {b_u[τ-r+1], ..., b_u[τ]} of user u, where each beam is the index, in codebook $\mathcal{F}$, of the optimal codeword serving the user. The optimal codeword is defined as

$$b_u[t] = \arg\max_{m \in \{1,\ldots,|\mathcal{F}|\}} \sum_{k=1}^{K} \frac{P_s}{K\,\sigma_n^2} \left| \mathbf{h}_{u,k}^{H} \mathbf{f}_m \right|^2$$

where $\mathbf{f}_m \in \mathcal{F}$ is a codeword of the codebook, $N_m$ is the number of base-station antennas, $\mathbf{h}_{u,k}$ is the downlink channel between the base station and the user, $P_s$ is the transmit power, $\sigma_n^2$ is the noise power, and k indexes the k-th carrier.
(4.2) Input the beam sequence {b_u[τ-r+1], ..., b_u[τ]} into the beam embedding module. Because the algorithm receives data of two modalities (beams and images) whose dimensions differ, the embedding module must convert both into vectors of the same dimension.
For the beam sequence, a lookup table of size $|\mathcal{F}|$ is generated; given the beam codeword index $b_u[t]$, the embedding layer returns the corresponding embedded vector $\mathbf{b}[t] \in \mathbb{R}^{d_{model}}$, where $d_{model}$ is the defined feature-vector dimension.
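The lookup-table embedding of step (4.2) is just row indexing into a learnable matrix. A minimal sketch with illustrative sizes (the table here is randomly initialized; in training it would be learned):

```python
import numpy as np

rng = np.random.default_rng(0)
F_SIZE, D_MODEL = 128, 64   # |F| codewords, feature dimension d_model (illustrative)

# The beam-embedding module is a lookup table of shape (|F|, d_model):
# beam index b_u[t] selects row b_u[t], giving a vector in R^{d_model}.
beam_table = rng.normal(size=(F_SIZE, D_MODEL))

beam_sequence = [3, 17, 17, 42]          # beam indices over r = 4 slots
embedded = beam_table[beam_sequence]     # shape (r, d_model)
```

Identical beam indices map to identical embedding vectors, which is what makes the table a consistent encoding of the discrete codeword index.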
(4.3) Input the beam embedding sequence {b[τ-r+1], ..., b[τ]} into the camera selection model NET_s, which outputs the feature vector $\hat{y}_s$. The camera selection module consists of $L_s$ fully connected layers and can be expressed as

$$\hat{y}_s = \mathrm{NET}_s(\mathbf{b};\, \Theta_s) = \sigma(W_s \mathbf{b} + b_s)$$

where $\Theta_s = \{W_s, b_s\}$ denotes the weights and biases of the fully connected layers and $\sigma(\cdot)$ is the model's nonlinear activation function.
Step (5): fuse the target-detection coordinate embedding sequence with the beam embedding sequence, feed the fused sequence into the Transformer encoder module for encoding, then send the encoded sequence to the classification module for binary classification, predicting the link connection state $\hat{q}_u$ of user u over a future window of length r′.
(5.1) Because the Transformer relies only on the attention mechanism, with no recurrence or convolution structures, information carrying absolute position must be injected into the input sequence so the model can exploit its order. The invention encodes the input sequence with position embedding and modal-type embedding.
The position embedding is computed as

$$PE(pos, 2i) = \sin\!\left(\frac{pos}{10000^{2i/d_{model}}}\right), \qquad PE(pos, 2i+1) = \cos\!\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$

where pos is the position of the token in the sequence, $L_{seq}$ is the sequence length, and $i \in [0, d_{model}/2)$ indexes the embedding dimension.
The beam embedding sequence {b[τ-r+1], ..., b[τ]} and the target-detection coordinate embedding sequence {d[τ-r+1], ..., d[τ]} are passed in turn through the position-encoding function $F_{PE}(\cdot)$:

b = b + F_PE(b)
d = d + F_PE(d)

The modal-type embedding mainly lets the model distinguish the information of the two modalities:

ME_b = full_like(b, 0)
ME_d = full_like(d, 1)
b = b + ME_b
d = d + ME_d

where full_like(x, n) constructs a vector with the same dimensions as x, filled with n.
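The position and modal-type embeddings of step (5.1) can be sketched as follows, using the standard sinusoidal formula. Dimensions are illustrative and the data are random stand-ins for real embeddings:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal position embedding:
    PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))
    """
    pos = np.arange(seq_len)[:, None]              # (L_seq, 1)
    i = np.arange(d_model // 2)[None, :]           # (1, d_model/2)
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                   # even dimensions
    pe[:, 1::2] = np.cos(angles)                   # odd dimensions
    return pe

r, d_model = 4, 8
b = np.random.default_rng(1).normal(size=(r, d_model))  # beam embeddings
d = np.random.default_rng(2).normal(size=(r, d_model))  # coordinate embeddings

pe = positional_encoding(r, d_model)
# Add position embedding, then modal-type embedding (0 = beams, 1 = coordinates):
b = b + pe + np.full_like(b, 0)
d = d + pe + np.full_like(d, 1)
```

The constant modal offset (0 vs. 1) is what allows the encoder to tell beam tokens from coordinate tokens after the two sequences are concatenated.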
(5.2) Concatenate the beam sequence b and the target-detection coordinate sequence d and input them into the Transformer encoder model, which is formed by stacking L multi-head attention layers and feed-forward neural-network layers. The multi-head attention mechanism is

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^{O}$$
$$\mathrm{head}_i = \mathrm{Attention}(Q W_i^{Q},\; K W_i^{K},\; V W_i^{V})$$

with input parameters

$$Q = K = V = \{b[\tau-r+1], \ldots, b[\tau],\; d[\tau-r+1], \ldots, d[\tau]\}$$
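The multi-head self-attention of step (5.2) can be sketched in a few lines. This is a simplified, loop-based illustration with random weights, not the patent's trained encoder; all names and sizes are assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(Q, K, V, Wq, Wk, Wv, Wo, h):
    """head_i = Attention(Q Wq[i], K Wk[i], V Wv[i]);
    MultiHead = Concat(head_1 .. head_h) @ Wo."""
    heads = []
    for i in range(h):
        q, k, v = Q @ Wq[i], K @ Wk[i], V @ Wv[i]
        scores = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # scaled dot-product
        heads.append(scores @ v)
    return np.concatenate(heads, axis=-1) @ Wo

rng = np.random.default_rng(3)
L_seq, d_model, h = 8, 16, 4      # 2r tokens: r beam + r coordinate embeddings
d_k = d_model // h
X = rng.normal(size=(L_seq, d_model))   # Q = K = V = fused input sequence
Wq = [rng.normal(size=(d_model, d_k)) for _ in range(h)]
Wk = [rng.normal(size=(d_model, d_k)) for _ in range(h)]
Wv = [rng.normal(size=(d_model, d_k)) for _ in range(h)]
Wo = rng.normal(size=(h * d_k, d_model))
out = multi_head_attention(X, X, X, Wq, Wk, Wv, Wo, h)
```

Because Q = K = V is the fused beam-plus-coordinate sequence, every token attends over both modalities in a single layer, which is the point of fusing before encoding.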
(5.3) The Transformer encoder model outputs a feature vector $y_t$; input $y_t$ into the classification module NET_o to obtain the final prediction $\hat{q}_u$.
Step (6): compute the loss between the prediction $\hat{q}_u$ and the label $q_u$, and update the model parameters Θ by backpropagation;
The classification model outputs the blockage prediction $\hat{q}_u$; the blockage label is $q_u \in \{0,1\}$. In step (4) the model outputs the camera selection $\hat{y}_s$; the camera label is $y_s \in \{(0,0,1), (0,1,0), (1,0,0)\}$. The total loss is

$$\mathcal{L} = \mathcal{L}_q + \alpha\, \mathcal{L}_s$$

where $\mathcal{L}_q$ is the blockage-prediction loss, $\mathcal{L}_s$ is the camera-prediction loss, $\mathcal{L}$ is the total model loss, and α is the camera-loss weight coefficient. The model parameters Θ are updated by stochastic gradient descent,

$$\Theta \leftarrow \Theta - \lambda\, \nabla_{\Theta}\, \mathcal{L}$$

where λ is the learning rate. Steps (2) to (6) are executed cyclically until the algorithm converges.
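The weighted two-head loss and a single gradient step can be sketched as below. Cross-entropy for both heads is an assumption (the patent does not name the loss functions), and the gradient value is a stand-in scalar, since the full backward pass is out of scope here:

```python
import math

def bce(p, y, eps=1e-9):
    """Binary cross-entropy for the blockage head (assumed loss form)."""
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

def camera_ce(probs, onehot, eps=1e-9):
    """Cross-entropy for the 3-way camera-selection head (assumed loss form)."""
    return -sum(t * math.log(p + eps) for p, t in zip(probs, onehot))

# Total loss L = L_q + alpha * L_s, then one SGD step: theta <- theta - lambda * grad
alpha, lam = 0.5, 0.01                   # illustrative weight and learning rate
L_q = bce(0.9, 1)                        # blockage predicted 0.9, label 1
L_s = camera_ce([0.2, 0.7, 0.1], (0, 1, 0))
L_total = L_q + alpha * L_s

theta = 1.0
grad = 0.3                               # stand-in gradient of L_total w.r.t. theta
theta = theta - lam * grad
```

The weight α trades off the auxiliary camera-selection task against the main blockage-prediction task during training.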
The beneficial effects of the invention are:
1) Using a machine learning algorithm, a user can predict an impending communication-link blockage and switch the communication to another line-of-sight link in advance, ensuring communication stability;
2) The model is based on bimodal beam-and-image information; whereas models based only on wireless information are limited to fixed blockages, this method suits complex scenes with multidirectional moving blockages;
3) Using an attention-based Transformer model instead of networks such as RNN or LSTM greatly improves parallel computing capability, shortening training and inference time and facilitating large-scale practical application;
4) The model trains and infers only from the user's beam-sequence and image information, and is insensitive to signal-to-noise-ratio changes in the communication environment.
Drawings
FIG. 1 is a flow chart of a method for beam blockage prediction;
FIG. 2 is a schematic diagram of a Transformer module;
FIG. 3 is a validation set ROC graph.
Detailed Description
The technical scheme of the invention is further explained below with reference to the drawings and a detailed embodiment.
To cope with the complex scenario of multidirectional moving blockages in real communication networks, the invention provides a Transformer-based, vision-fused beam-blockage prediction method that senses sudden blockages in a millimeter wave communication system in advance. The scheme lets a user proactively switch to another line-of-sight base station before the blockage occurs, avoids the sudden drop in signal-to-noise ratio caused by blockage, and ensures the stability of the communication process.
As shown in figs. 1-3, the beam-blockage prediction problem is modeled as a multimodal binary classification problem over beams and images; the model consists of a target detection module, a camera selection module, an embedding module, a Transformer encoder module, and a classification module. The target detection module locates the coordinates of suspected obstacles in the captured images; the embedding module encodes the input beam sequence and target-coordinate sequence into vectors of a specified dimension; the camera selection module predicts, from the input beam sequence, the index of the camera covering the user; the Transformer encoder module is an attention-based encoder; and the classification module outputs the model's binary classification result.
1. Simulation environment construction
The simulated communication environment is built on the open-source ViWi multi-user scenario "ASUDT1_28", an outdoor millimeter wave communication environment built with a game engine and ray-tracing software, developed using the ViWi data-generation framework. The scene depicts a typical busy street with vehicles, pedestrians, trees, and buildings. The moving cars represent communication users, while large vehicles such as moving buses and trucks act as dynamic blockages during user communications. The simulation contains 50 cars, 8 buses, and 2 trucks in total, all moving at different speeds.
A millimeter wave base station operating at 28 GHz is deployed beside the street. The base station carries three cameras at a height of 4.5 meters facing different directions: two side cameras with 75-degree fields of view and one center camera with a 110-degree field of view. The base station has a uniform linear array of N antennas and uses a predefined DFT codebook $\mathcal{F} = \{\mathbf{f}_1, \ldots, \mathbf{f}_{|\mathcal{F}|}\}$, whose m-th codeword can be expressed as

$$\mathbf{f}_m = \frac{1}{\sqrt{N}} \left[ 1,\; e^{j\frac{2\pi m}{|\mathcal{F}|}},\; \ldots,\; e^{j(N-1)\frac{2\pi m}{|\mathcal{F}|}} \right]^{T}$$
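The DFT codebook for a uniform linear array can be generated in a few lines. This sketch uses a common DFT-codebook phase convention; the patent's exact codeword expression may differ in normalization or phase sign:

```python
import numpy as np

def dft_codebook(n_antennas, n_codewords):
    """DFT beamforming codebook for a uniform linear array.

    Codeword m (column m) is
    f_m = (1/sqrt(N)) [1, e^{j 2 pi m/|F|}, ..., e^{j (N-1) 2 pi m/|F|}]^T.
    """
    n = np.arange(n_antennas)[:, None]   # antenna index 0..N-1
    m = np.arange(n_codewords)[None, :]  # codeword index 0..|F|-1
    return np.exp(1j * 2 * np.pi * n * m / n_codewords) / np.sqrt(n_antennas)

F = dft_codebook(128, 128)  # matches the simulation: N = 128 antennas, 128 codewords
```

With N equal to the number of codewords, the columns are orthonormal, so each codeword points at a distinct angular direction.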
User u communicates in the simulation system via OFDM over K subcarriers; the received downlink signal is

$$y_{u,k} = \mathbf{h}_{u,k}^{H}\, \mathbf{f}\, s + n_k$$

where $y_{u,k}$ is the received signal of user u on carrier k, $\mathbf{h}_{u,k}$ is the channel between the base station and user u on carrier k, and $n_k$ is random noise following a Gaussian distribution $\mathcal{N}(0, \sigma_n^2)$. The channel $\mathbf{h}_{u,k}$ is represented as

$$\mathbf{h}_{u,k} = \sum_{l=1}^{L} \alpha_l\, e^{\,j\left(\upsilon_l - \frac{2\pi k}{K}\, \tau_l B\right)}\, \mathbf{a}\!\left(\phi_l^{az},\, \phi_l^{el}\right)$$

where $\alpha_l$ is the attenuation coefficient of the l-th path, $\phi_l^{az}$ and $\phi_l^{el}$ are the departure azimuth and elevation angles of the l-th path, $\upsilon_l$ is the phase of path l, $\tau_l$ is its propagation delay, B is the signal bandwidth, K is the number of carriers, and $\mathbf{a}(\cdot)$ is the channel response vector.
For user u, the beam at the current time is the index of the optimal codeword between the user and the base station, defined as

$$b_u[t] = \arg\max_{m \in \{1,\ldots,|\mathcal{F}|\}} \sum_{k=1}^{K} \frac{P_s}{K\,\sigma_n^2} \left| \mathbf{h}_{u,k}^{H} \mathbf{f}_m \right|^2$$

where $\mathbf{f}_m \in \mathcal{F}$ is a codeword of the codebook, $N_m$ is the number of base-station antennas, $\mathbf{h}_{u,k}$ is the downlink channel between the base station and user u, $P_s$ is the transmit power, $\sigma_n^2$ is the noise power, and k indexes the k-th carrier.
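The optimal-codeword rule above reduces to an argmax of the per-codeword received SNR summed over subcarriers (the constants P_s and the noise power do not change which index wins). A sketch under that reading, with an illustrative channel chosen to align with one codeword:

```python
import numpy as np

def best_codeword(H, F, p_s=1.0, noise_power=1.0):
    """Index of the codebook column maximizing sum_k |h_{u,k}^H f_m|^2.

    H: (K, N) downlink channel h_{u,k}, one row per subcarrier.
    F: (N, M) codebook, one codeword per column.
    """
    gains = np.abs(H.conj() @ F) ** 2                 # |h^H f_m|^2, shape (K, M)
    snr_per_codeword = p_s / noise_power * gains.sum(axis=0)
    return int(np.argmax(snr_per_codeword))

N, M, K = 16, 16, 8
F = np.exp(1j * 2 * np.pi * np.outer(np.arange(N), np.arange(M)) / M) / np.sqrt(N)
# Channel aligned with codeword 5 on every subcarrier (illustrative):
H = np.tile(F[:, 5], (K, 1))
b = best_codeword(H, F)
```

When the channel direction matches a DFT codeword, the orthonormality of the codebook makes that codeword the unique maximizer.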
The simulation environment parameters are:
1) The base-station antennas form a uniform linear array of 128 antennas; the user side uses a single omnidirectional antenna;
2) The base station operates at 28 GHz;
3) A DFT codebook with 128 codewords is used;
4) The number of carriers is K = 64.
2. Communication blocking prediction method training process based on visual information fusion
Construct {b_u[τ-r+1], ..., b_u[τ]} as the raw beam input sequence, where τ is the current slot and r is the time-window length. Construct {X_n[τ-r+1], ..., X_n[τ]}, n ∈ {1, 2, 3}, as the raw image input sequence, where n is the camera index. For any slot τ the input sequence is represented as

$$S_u[\tau] = \{(X_n[t],\, b_u[t])\}_{t=\tau-r+1}^{\tau}$$
Let $a_u[t] \in \{0,1\}$ denote the communication link state of user u in slot t, where 0 denotes line-of-sight and 1 non-line-of-sight communication. The link connection state of user u over a future window of length r′ is

$$q_u = \max\{a_u[\tau+1], \ldots, a_u[\tau+r']\}$$

where 0 indicates user u keeps line-of-sight communication throughout the window and 1 indicates a link blockage occurs within it.
A function $f_\Theta(S)$ is established that receives the observed image-beam sequence and predicts the future link state $\hat{q}_u$, where Θ is the set of model parameters learned from the labeled sequence dataset. The goal of training is to choose Θ so that the prediction matches the label with maximum probability over the dataset.
Input the sequence X_n into the YOLO detector to obtain the bounding-box coordinates of detected objects, and convert each bounding box into a 6-dimensional vector comprising the center coordinate [x_cent, y_cent], the top-left coordinate [x_1, y_1], and the bottom-right coordinate [x_2, y_2]. The coordinates are normalized to the interval [0, 1]; together they mark the exact position of an object in the scene. Stack the converted coordinate vectors of one image into a high-dimensional vector $d_n[t] \in \mathbb{R}^{M \times 6}$, where M is the number of target objects detected in the image and $t \in \{\tau-r+1, \ldots, \tau\}$. Because the scene is a dynamic communication environment, the number of detected objects per image is not fixed, so $d_n[t]$ has variable length; padding with N - M zero vectors yields a fixed-length sequence $d_n[t] \in \mathbb{R}^{N \times 6}$.
Input the beam sequence {b_u[τ-r+1], ..., b_u[τ]} into the beam embedding module to obtain the beam embedding sequence {b[τ-r+1], ..., b[τ]}. Input the beam embedding sequence into the camera selection module to determine which camera currently covers the user. Input the detection coordinate sequence d_n of that camera into the coordinate embedding module, which outputs the corresponding detection-coordinate embedding sequence {d[τ-r+1], ..., d[τ]}.
The target-detection coordinate embedding sequence and the beam embedding sequence are fused and fed into the Transformer encoder module. Because the Transformer relies only on the attention mechanism, with no recurrence or convolution structures, absolute-position information must be injected into the input sequence so the model can exploit its order; the invention uses position embedding and modal-type embedding to encode the input sequence.
The position embedding is computed as

$$PE(pos, 2i) = \sin\!\left(\frac{pos}{10000^{2i/d_{model}}}\right), \qquad PE(pos, 2i+1) = \cos\!\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$

where pos is the position of the token in the sequence, $L_{seq}$ is the sequence length, and $i \in [0, d_{model}/2)$ indexes the embedding dimension.
The beam embedding sequence {b[τ-r+1], ..., b[τ]} and the target-detection coordinate embedding sequence {d[τ-r+1], ..., d[τ]} are passed in turn through the position-encoding function $F_{PE}(\cdot)$:

b = b + F_PE(b)
d = d + F_PE(d)

The modal-type embedding mainly lets the model distinguish the information of the two modalities:

ME_b = full_like(b, 0)
ME_d = full_like(d, 1)
b = b + ME_b
d = d + ME_d

where full_like(x, n) constructs a vector with the same dimensions as x, filled with n.
Concatenate the beam sequence b and the target-detection coordinate sequence d and input them into the Transformer encoder model. The encoder outputs a feature vector $y_t$; input $y_t$ into the classification module NET_o to obtain the final prediction $\hat{q}_u$.
Compute the loss between the prediction $\hat{q}_u$ and the label $q_u$, and update the model parameters Θ by backpropagation.
The classification model outputs the blockage prediction $\hat{q}_u$; the blockage label is $q_u \in \{0,1\}$. The model also outputs the camera selection $\hat{y}_s$; the camera label is $y_s \in \{(0,0,1), (0,1,0), (1,0,0)\}$. The total loss is

$$\mathcal{L} = \mathcal{L}_q + \alpha\, \mathcal{L}_s$$

where $\mathcal{L}_q$ is the blockage-prediction loss, $\mathcal{L}_s$ is the camera-prediction loss, $\mathcal{L}$ is the total model loss, and α is the camera-loss weight coefficient. The model parameters Θ are updated by stochastic gradient descent.
Training results: after model training is completed, the test set of 2050 samples in total is evaluated, comprising 1280 non-blocked samples (label 0) and 770 blocked samples (label 1).
The verification results are as follows:
| real \ predicted | 0 | 1 |
| --- | --- | --- |
| 0 | 1187 | 93 |
| 1 | 49 | 721 |
The prediction metrics are as follows:
Recall, the proportion of samples correctly predicted as blocked among all samples labeled blocked: recall = 721 / (721 + 49) = 721/770 ≈ 0.936.
Precision, the proportion of correctly predicted blocked samples among all samples predicted blocked: precision = 721 / (721 + 93) = 721/814 ≈ 0.886.
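The metrics above follow directly from the confusion-matrix counts. A short sketch (variable names are illustrative; accuracy is an additional derived metric, not reported in the patent text):

```python
# Confusion matrix from the verification table:
#                     predicted 0   predicted 1
# actual 0 (LoS)          1187           93
# actual 1 (blocked)        49          721
tn, fp, fn, tp = 1187, 93, 49, 721

recall = tp / (tp + fn)        # correctly-predicted blocked / all actually blocked
precision = tp / (tp + fp)     # correctly-predicted blocked / all predicted blocked
accuracy = (tp + tn) / (tp + tn + fp + fn)
```

High recall matters most here: a missed blockage means the user fails to switch base stations before the link drops, while a false alarm only triggers an unnecessary handover.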
The ROC curves for model prediction are shown in fig. 3.
It should be noted that modifications and adaptations may occur to those skilled in the art without departing from the principles of the present invention and should be considered within the scope of the present invention.
Claims (7)
1. A millimeter wave communication link blocking prediction method based on visual information fusion is characterized by comprising the following steps:
step (1): model the beam blocking prediction problem as a binary classification problem based on multi-modal information, where the model consists of a target detection module, a camera selection module, an embedding module, a Transformer module, and a classification module; initialize the model parameters, including the neural network weights and biases of each module;
step (2): for user u, at each time slot τ, construct a beam sequence {b_u[τ−r+1], …, b_u[τ]} of length r and an image sequence {X_n[τ−r+1], …, X_n[τ]} as the training sample sequence S_u; simultaneously construct a link state sequence {a_u[τ+1], …, a_u[τ+r′]} of length r′ as the training sample label q_u;
step (3): input the image sequence {X_n[τ−r+1], …, X_n[τ]} into the target detection module, which outputs the obstacle detection-box coordinate sequence {d_n[τ−r+1], …, d_n[τ]};
step (4): input the beam sequence {b_u[τ−r+1], …, b_u[τ]} into the beam embedding module to obtain the beam embedding sequence {b[τ−r+1], …, b[τ]}; input the beam embedding sequence into the camera selection module to determine the camera covering the user at that moment; input the detection coordinate sequence d_n corresponding to that camera into the coordinate embedding module, which outputs the corresponding detection coordinate embedding sequence {d[τ−r+1], …, d[τ]};
step (5): fuse the target detection coordinate embedding sequence with the beam embedding sequence, send the fused sequence into the Transformer module for encoding, then send the encoded sequence into the classification module for binary classification, predicting the link connection state q̂_u of user u within a future time window of length r′;
step (6): compute the loss function between the predicted value q̂_u and the label q_u, and update the model parameters Θ by backpropagating the gradient;
steps (2) to (6) are executed cyclically until the algorithm converges.
2. The millimeter wave communication link blocking prediction method based on visual information fusion of claim 1, wherein: the step (1) specifically comprises the following steps:
the target detection module is responsible for locating the coordinates of suspected obstacles in the acquired images; the embedding module encodes the input beam sequence and target coordinate sequence into vectors of a specified dimension; the camera selection module predicts, from the input beam sequence, the number of the camera covering the user; the Transformer module is an encoder based on the attention mechanism; and the classification module outputs the model's final binary classification result;
the millimeter wave base station is equipped with three cameras, two of which are side cameras with a 75 degree field of view and one of which is a center camera with a 110 degree field of view.
3. The millimeter wave communication link blocking prediction method based on visual information fusion of claim 1, wherein: the step (2) specifically comprises:
(2.1) defining the input sequence: the method develops a deep learning model that uses an RGB image sequence and a beam sequence to predict communication link blocking; for any user u in the communication environment, the image sequence and the beam sequence observed over r unit time intervals form one input sequence; for any time slot τ, the sequence is
S_u = {(X_n[t], b_u[t]) | t = τ−r+1, …, τ}
where X_n[t] ∈ R^{W×H×C} represents the RGB image captured by the nth camera in the t-th time slot, with W, H, C the width, height, and number of color channels of the image; b_u[t] represents the index of the beamforming vector in the codebook F used to serve user u in the t-th time slot; and r represents the length of the observation interval;
(2.2) defining the output variable q_u: let a_u[t] ∈ {0,1} represent the communication link state of user u in the t-th time slot, where 0 denotes line-of-sight communication and 1 denotes non-line-of-sight communication; the link connection state q_u of user u within a future time window of length r′ is
q_u = max{a_u[τ+1], …, a_u[τ+r′]}
where 0 indicates that user u maintains line-of-sight communication throughout the time window, and 1 indicates that link blocking occurs within the window;
(2.3) defining the model function: the goal is to learn a function f_Θ(S) that receives the observed image-beam sequence pair and predicts the future link state; Θ represents the parameter set of the model, learned from the labeled sequence dataset; the goal of model training can be expressed as
f_{Θ*} = argmax_{f_Θ} P(q̂_u = q_u | S_u)
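The label construction in (2.2) reduces to taking the maximum of the link states over the future window. A minimal sketch, where `make_label` and the toy state list are illustrative names, not part of the patent:

```python
def make_label(a_u, tau, r_prime):
    """a_u: per-slot link states (0 = line-of-sight, 1 = non-line-of-sight).
    Returns q_u = 1 iff the link is blocked at any slot in the window
    tau+1 .. tau+r_prime, else 0."""
    window = a_u[tau + 1 : tau + 1 + r_prime]
    return max(window)

a_u = [0, 0, 0, 1, 0, 0]                     # toy link-state trace
q = make_label(a_u, tau=1, r_prime=3)        # window covers slots 2..4 -> [0, 1, 0]
```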
4. The millimeter wave communication link blocking prediction method based on visual information fusion of claim 1, wherein: the step (3) specifically comprises the following steps:
(3.2) input the image sequence X_n into a YOLO detector to obtain the bounding-box coordinates of the detected targets;
(3.3) convert each bounding-box coordinate into a 6-dimensional vector comprising the center coordinate [x_cent, y_cent], the upper-left corner coordinate [x_1, y_1], and the lower-right corner coordinate [x_2, y_2]; the coordinates are normalized to the interval [0, 1] and together mark the exact position of an object in the scene;
(3.4) stack the converted coordinate vectors of one image into a high-dimensional vector d_n[t] ∈ R^{M×6}, where M represents the number of target objects detected in the image and t ∈ {τ−r+1, …, τ}; since the scene is a dynamic communication environment, the number of detected objects per image is not fixed, so d_n[t] is of variable length; padding with N−M zero vectors therefore yields a fixed-length sequence d_n[t] ∈ R^{N×6};
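Steps (3.3)-(3.4) can be sketched as follows. This assumes bounding boxes arrive as pixel-space (x1, y1, x2, y2) tuples; the YOLO detector itself is not invoked, and `boxes_to_vectors` is an illustrative name.

```python
import numpy as np

def boxes_to_vectors(boxes, img_w, img_h, n_max):
    """Convert M boxes to normalized 6-d vectors [xc, yc, x1, y1, x2, y2]
    and zero-pad to n_max rows (the N - M zero vectors of step 3.4)."""
    vecs = []
    for x1, y1, x2, y2 in boxes:
        xc, yc = (x1 + x2) / 2, (y1 + y2) / 2   # center coordinate
        vecs.append([xc / img_w, yc / img_h,     # normalize to [0, 1]
                     x1 / img_w, y1 / img_h,
                     x2 / img_w, y2 / img_h])
    d = np.zeros((n_max, 6))                     # fixed-length container
    if vecs:
        d[:len(vecs)] = vecs
    return d

# One detected object in a 1000 x 500 image, padded to N = 4 slots
d = boxes_to_vectors([(100, 50, 300, 250)], img_w=1000, img_h=500, n_max=4)
```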
5. The millimeter wave communication link blocking prediction method based on visual information fusion of claim 1, wherein: the step (4) specifically comprises the following steps:
(4.1) obtain the beam sequence {b_u[τ−r+1], …, b_u[τ]} of user u, where each beam is the index of the optimal codeword in the codebook F serving the user; the optimal codeword is defined as
b_u[t] = argmax_m Σ_k log₂(1 + (P_s/σ²)·|h_{u,k}^H f_m|²)
where f_m ∈ C^{N_m} is a codeword of the codebook F and N_m is the number of base station antennas; h_{u,k} ∈ C^{N_m} is the downlink channel between the base station and the user; P_s represents the transmit power, σ² represents the noise power, and k indexes the kth carrier;
(4.2) input the beam sequence {b_u[τ−r+1], …, b_u[τ]} into the beam embedding module; since the algorithm receives data of two modalities with different dimensions, both must be converted into vectors of the same dimension by the embedding module;
for the beam sequence, an embedding table of size |F| × d_model is generated; for an input beam codeword index b_u[t], the embedding layer returns the embedded vector b[t] ∈ R^{d_model} corresponding to that index, where d_model is the defined feature-vector dimension;
(4.3) input the beam embedding sequence {b[τ−r+1], …, b[τ]} into the camera selection model NET_s, which outputs the feature vector y_s ∈ R³; the camera selection module consists of an L_s-layer fully connected network, and the model is expressed as
y_s = NET_s({b[τ−r+1], …, b[τ]}; Θ_s)
where Θ_s = {W_s, b_s} denotes the weights and biases of the fully connected layers, and σ(·) denotes the nonlinear activation function of the model;
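The embedding lookup of (4.2) and the fully connected camera selector of (4.3) can be sketched in numpy. All weights are random placeholders, not trained parameters; the codebook size, d_model, layer widths, and the name `net_s` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
codebook_size, d_model, r = 64, 16, 8

# Embedding table of size |F| x d_model: index in, d_model-dim vector out
embed_table = rng.normal(size=(codebook_size, d_model))

beam_idx = rng.integers(0, codebook_size, size=r)   # beam index sequence b_u
b = embed_table[beam_idx]                            # beam embedding, r x d_model

def net_s(x, w1, w2):
    """Two-layer fully connected camera selector with ReLU and softmax."""
    h = np.maximum(0.0, x.reshape(-1) @ w1)          # ReLU activation
    z = h @ w2
    e = np.exp(z - z.max())                          # numerically stable softmax
    return e / e.sum()                               # probabilities over 3 cameras

w1 = rng.normal(size=(r * d_model, 32))              # placeholder weights
w2 = rng.normal(size=(32, 3))
y_s = net_s(b, w1, w2)
camera = int(np.argmax(y_s))                         # predicted camera index
```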
6. The millimeter wave communication link blocking prediction method based on visual information fusion of claim 1, wherein: the step (5) specifically comprises:
(5.1) since the Transformer relies only on the attention mechanism and contains no recurrent or convolutional structure, information about absolute position must be injected into the input sequence for the model to exploit its ordering; the input sequence is encoded with positional embedding and modal-type embedding;
the positional embedding is computed as
PE(pos, 2i) = sin(pos / 10000^{2i/d_model})
PE(pos, 2i+1) = cos(pos / 10000^{2i/d_model})
where pos ∈ {0, …, L_seq−1} represents the position of the token in the sequence, L_seq denotes the length of the sequence, and i ∈ [0, d_model/2) indexes the dimension of the positional embedding;
the beam embedding sequence {b[τ−r+1], …, b[τ]} and the target detection coordinate embedding sequence {d[τ−r+1], …, d[τ]} are sent in turn into the position coding function F_PE(·):
b = b + F_PE(b)
d = d + F_PE(d)
the modal-type embedding mainly enables the model to distinguish the information of the two modalities, namely
ME_b = full_like(b, 0)
ME_d = full_like(d, 1)
b = b + ME_b
d = d + ME_d
where full_like(x, n) denotes a vector with the same dimensions as x, filled with n;
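The two encodings of (5.1) can be sketched together: a standard sinusoidal positional embedding plus a constant modal-type embedding (0 for beam tokens, 1 for coordinate tokens). The zero matrices stand in for the embedded sequences; dimensions are illustrative.

```python
import numpy as np

def positional_embedding(l_seq, d_model):
    """Sinusoidal positional embedding: sin on even dims, cos on odd dims."""
    pe = np.zeros((l_seq, d_model))
    pos = np.arange(l_seq)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    pe[:, 0::2] = np.sin(pos / 10000 ** (i / d_model))
    pe[:, 1::2] = np.cos(pos / 10000 ** (i / d_model))
    return pe

r, d_model = 8, 16
b = np.zeros((r, d_model))          # stand-in for the beam embedding sequence
d = np.zeros((r, d_model))          # stand-in for the coordinate embedding sequence

# Position coding: b = b + F_PE(b), d = d + F_PE(d)
b = b + positional_embedding(r, d_model)
d = d + positional_embedding(r, d_model)

# Modal-type embedding: ME_b = full_like(b, 0), ME_d = full_like(d, 1)
b = b + np.full_like(b, 0)
d = d + np.full_like(d, 1)
```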
(5.2) concatenate the beam sequence b and the target detection coordinate sequence d and input them into the Transformer model; the Transformer encoder is formed by stacking L multi-head attention layers and feed-forward neural network layers; the multi-head attention mechanism is computed as follows:
MultiHead(Q, K, V) = Concat(head_1, …, head_h) W_O
head_i = Attention(Q·W_i^Q, K·W_i^K, V·W_i^V)
Attention(Q, K, V) = softmax(Q·K^T / √d_k)·V
where the input parameters of the MultiHead function are
Q = K = V = {b[τ−r+1], …, b[τ], d[τ−r+1], …, d[τ]}
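A single-head version of the attention computation in (5.2), softmax(Q·K^T/√d_k)·V, can be sketched as follows; multi-head attention runs h such heads on learned projections and concatenates the results. The random tokens and dimensions are illustrative stand-ins for the concatenated beam and coordinate sequences.

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product attention: softmax(q k^T / sqrt(d_k)) v."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                      # L x L similarity scores
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)                # row-wise softmax
    return w @ v

rng = np.random.default_rng(1)
seq = rng.normal(size=(16, 8))   # 2r = 16 concatenated beam + coordinate tokens
out = attention(seq, seq, seq)   # self-attention: Q = K = V = seq
```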
(5.3) after the Transformer encoder model, a feature vector y_t is obtained; input y_t into the classification module NET_o to obtain the final prediction q̂_u.
7. The millimeter wave communication link blocking prediction method based on visual information fusion of claim 1, wherein: in step (6), the classification model outputs the blocking prediction q̂_u with blocking label q_u ∈ {0,1}; in step (4), the model outputs the camera selection ŷ_s with camera label y_s ∈ {(0,0,1), (0,1,0), (1,0,0)};
the total model loss is L = L_b + α·L_s, where L_b is the blocking-prediction loss, L_s is the camera-prediction loss, and α is the camera-loss weight coefficient; the model parameters Θ are updated by stochastic gradient descent,
Θ ← Θ − λ·∇_Θ L
where λ is the learning rate; steps (2) to (6) are executed cyclically until the algorithm converges.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210480580.3A CN114845332A (en) | 2022-05-05 | 2022-05-05 | Millimeter wave communication link blocking prediction method based on visual information fusion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114845332A (en) | 2022-08-02 |
Family
ID=82568278
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210480580.3A Pending CN114845332A (en) | 2022-05-05 | 2022-05-05 | Millimeter wave communication link blocking prediction method based on visual information fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114845332A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113595608A (en) * | 2021-06-23 | 2021-11-02 | 清华大学 | Millimeter wave/terahertz communication method, device and system based on visual perception |
CN113709701A (en) * | 2021-08-27 | 2021-11-26 | 西安电子科技大学 | Millimeter wave vehicle networking combined beam distribution and relay selection method |
WO2021255640A1 (en) * | 2020-06-16 | 2021-12-23 | King Abdullah University Of Science And Technology | Deep-learning-based computer vision method and system for beam forming |
Non-Patent Citations (1)
Title |
---|
MA WENYAN; QI CHENHAO: "Deep-learning-based beam selection method for millimeter wave communication in uplink transmission", Journal of Hefei University of Technology (Natural Science), no. 12, 28 December 2019 (2019-12-28) * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |