WO2020065384A1 - Systems and methods for video-assisted network operations - Google Patents

Systems and methods for video-assisted network operations

Info

Publication number
WO2020065384A1
Authority
WO
WIPO (PCT)
Prior art keywords
wireless device
visual
network
visual system
assistance information
Prior art date
Application number
PCT/IB2018/057579
Other languages
French (fr)
Inventor
Faisal EL-SHABANI
Edward Sich
Original Assignee
Telefonaktiebolaget Lm Ericsson (Publ)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Telefonaktiebolaget Lm Ericsson (Publ) filed Critical Telefonaktiebolaget Lm Ericsson (Publ)
Priority to PCT/IB2018/057579 priority Critical patent/WO2020065384A1/en
Publication of WO2020065384A1 publication Critical patent/WO2020065384A1/en

Links

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B 7/00 Radio transmission systems, i.e. using radiation field
    • H04B 7/02 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
    • H04B 7/04 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
    • H04B 7/06 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station
    • H04B 7/0613 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission
    • H04B 7/0615 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission of weighted versions of same signal
    • H04B 7/0617 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission of weighted versions of same signal for beam forming

Definitions

  • the present disclosure relates to a wireless network and, more specifically, network operations performed in a wireless network such as a cellular communications network.
  • the number of MIMO layers in the channel can be increased by increasing the number of antennas at the base station and at the User Equipment (UE).
  • the ratio S/N can be improved by using a technique called “transmit diversity,” where for example more than one antenna at the base station will send the same signal. The signals from those antennas will go through different radio propagation paths to arrive at the UE, where the signals are then properly decoded.
  • Another technique for improving the ratio S/N is beamforming.
  • In beamforming, a phase/amplitude modified copy of the same signal is sent on multiple antennas, causing the energy of the transmitted signal to be focused in one direction (where the UE would be), i.e., reducing its beam width and increasing directivity.
  • the phase and amplitude of the transmitted signals can be modified to control the beam width and steer the beam in different directions, allowing users to be tracked and throughput to be maintained with the UE’s movement.
  • SSBs Synchronization Signal Blocks
  • an SSB can be sent using wider beams with fewer options to sweep.
  • the UE will decode the signal from these beams and return an index of the beam that gives the best ratio S/N.
  • the list of possible beams at the base station side and at the UE side along with the respective beamforming coefficients will be predetermined and defined in a codebook, covering a certain angular space. The number of possible beams will directly depend on beam width of the antenna structure and the area that needs to be covered.
  • the base station can now send SSB bursts or Channel State Information-Reference Signals (CSI-RSs) to find an even narrower beam with higher S/N.
  • CSI-RSs Channel State Information-Reference Signals
  • a method of operation of a network node in a wireless communication network comprises receiving, from a visual system comprising one or more cameras, visual assistance information related to an object detected by the visual system. The method further comprises correlating the object detected by the visual system to a wireless device served by the wireless communication network, and performing one or more network operations associated with the wireless device based on the visual assistance information related to the object detected by the visual system that is correlated to the wireless device.
  • the network node can, e.g., rapidly determine the most likely direction to initiate a beam selection process for the wireless device, thereby saving time and improving the likelihood of a high throughput connection.
  • performing the one or more network operations comprises performing beam selection for a transmit beam for a downlink transmission from a radio access node of the wireless communication network to the wireless device and/or a receive beam for reception of an uplink transmission from the wireless device at the radio access node of the wireless communication network.
  • the visual assistance information related to the object detected by the visual system comprises information that indicates, for each time instant in a set of time instants, a relative position of the object within a field of view of the one or more cameras at that time instant.
  • Correlating the object detected by the visual system to the wireless device served by the wireless communication network comprises determining that, for each time instant in at least a subset of the set of time instants, the relative position of the object within the field of view of the one or more cameras at the time instant matches a beam index used for the wireless device at that time instant.
  • performing beam selection for the transmit beam for the downlink transmission from the radio access node of the wireless communication network to the wireless device and/or the receive beam for reception of the uplink transmission from the wireless device at the radio access node of the wireless communication network comprises predicting a position of the wireless device during a future transmission time interval in which the downlink transmission and/or uplink transmission is to occur based on the visual assistance information related to the object detected by the visual system that is correlated to the wireless device and selecting a beam index of the transmit beam and/or the receive beam based on a predefined mapping between the predicted position of the wireless device and the beam index.
  • the information that indicates, for each time instant in the set of time instants, the relative position of the object within the field of view of the one or more cameras at that time instant comprises, for each time instant in the set of time instants, information that indicates an azimuth angle and/or an elevation angle of the object within the field of view of the one or more cameras. Further, in some embodiments, the information that indicates, for each time instant in the set of time instants, the relative position of the object within the field of view of the one or more cameras at that time instant further comprises, for each time instant in the set of time instants, information that indicates a distance of the object from a known location of a respective one of the one or more cameras.
  • the visual assistance information related to the object detected by the visual system comprises information that indicates, for each time instant in the set of time instants, a velocity of the object.
  • the one or more cameras consist of a camera having a field of view that matches a coverage area of an antenna system of the radio access node used to provide beamforming. Further, in some embodiments,
  • the wireless device is in a line-of-sight of the antenna system of the radio access node.
  • the wireless communication network is a cellular communication network
  • performing the one or more network operations comprises making a decision to perform a handover of the wireless device from one cell to another cell based on the visual assistance information related to the object detected by the visual system that is correlated to the wireless device and initiating the handover of the wireless device from the one cell to the other cell upon making the decision.
  • the one or more network operations comprise one or more security related operations.
  • performing the one or more network operations comprises obtaining, from the visual system, one or more images or one or more videos captured of the object and associating the one or more images or the one or more videos with the wireless device or with a particular communication or communication session of the wireless device.
  • the visual assistance information related to the object detected by the visual system comprises, for a period of time, one or more images and/or one or more videos captured of the object during the period of time.
  • Performing the one or more network operations comprises associating the one or more images and/or the one or more videos captured of the object during the period of time with a communication or communication session of the wireless device that occurred during the period of time.
  • the method further comprises receiving, from the visual system, visual assistance information related to one or more additional objects detected by the visual system and performing the one or more network operations comprises estimating a wireless communication channel for the wireless device based on the visual assistance information related to the one or more additional objects detected by the visual system.
  • the method further comprises receiving, from the visual system, visual assistance information related to one or more additional objects detected by the visual system, and performing the one or more network operations comprises performing beam steering for the wireless device based on the visual assistance information related to the one or more additional objects detected by the visual system.
  • the method further comprises receiving, from the visual system, visual assistance information related to one or more additional objects detected by the visual system, and performing the one or more network operations comprises identifying sources of passive intermodulation distortion for the wireless device based on the visual assistance information related to the one or more additional objects detected by the visual system.
  • the wireless communication network is a cellular communications network.
  • network node for a wireless communication network is adapted to receive, from a visual system comprising one or more cameras, visual assistance information related to an object detected by the visual system, correlate the object detected by the visual system to a wireless device served by the wireless communication network, and perform one or more network operations associated with the wireless device based on the visual assistance information related to the object detected by the visual system that is correlated to the wireless device.
  • a network node for a wireless communication network comprises a communication interface and processing circuitry associated with the communication interface, wherein the processing circuitry is operable to cause the network node to receive, from a visual system comprising one or more cameras, visual assistance information related to an object detected by the visual system, correlate the object detected by the visual system to a wireless device served by the wireless communication network, and perform one or more network operations associated with the wireless device based on the visual assistance information related to the object detected by the visual system that is correlated to the wireless device.
  • Figure 1 illustrates one example of a system including a network node (e.g., a base station) and a visual system that provides visual assistance information to the network node, which then uses it to perform one or more network operations such as, e.g., beam tracking, in accordance with embodiments of the present disclosure;
  • Figure 2 illustrates the operation of the network node and the visual system of Figure 1 in accordance with embodiments of the present disclosure
  • Figure 3 illustrates one example of a cellular communications network according to some embodiments of the present disclosure
  • Figure 4 illustrates a wireless communication system represented as a Fifth Generation (5G) network architecture composed of core Network Functions (NFs), where interaction between any two NFs is represented by a point-to-point reference point/interface;
  • 5G Fifth Generation
  • NFs core Network Functions
  • Figure 5 illustrates a 5G network architecture using service-based interfaces between the NFs in the control plane, instead of the point-to-point reference points/interfaces used in the 5G network architecture of Figure 4;
  • Figure 6 is a schematic block diagram of a radio access node according to some embodiments of the present disclosure
  • Figure 7 is a schematic block diagram that illustrates a virtualized embodiment of the radio access node of Figure 6 according to some embodiments of the present disclosure;
  • Figure 8 is a schematic block diagram of the radio access node of Figure 6 according to some other embodiments of the present disclosure.
  • FIG. 9 is a schematic block diagram of a User Equipment device (UE) according to some embodiments of the present disclosure.
  • Figure 10 is a schematic block diagram of the UE of Figure 9 according to some other embodiments of the present disclosure.
  • Radio Node As used herein, a “radio node” is either a radio access node or a wireless device.
  • Radio Access Node As used herein, a “radio access node” or “radio network node” is any node in a radio access network of a cellular communications network that operates to wirelessly transmit and/or receive signals.
  • Some examples of a radio access node include, but are not limited to, a base station (e.g., a New Radio (NR) base station (gNB) in a Third Generation Partnership Project (3GPP) Fifth Generation (5G) NR network or an enhanced or evolved Node B (eNB) in a 3GPP Long Term Evolution (LTE) network), a high-power or macro base station, a low-power base station (e.g., a micro base station, a pico base station, a home eNB, or the like), and a relay node.
  • a “core network node” is any type of node in a core network.
  • Some examples of a core network node include, e.g., a Mobility Management Entity (MME), a Packet Data Network Gateway (P-GW), a Service Capability Exposure Function (SCEF), or the like.
  • MME Mobility Management Entity
  • P-GW Packet Data Network Gateway
  • SCEF Service Capability Exposure Function
  • a “wireless device” is any type of device that has access to (i.e., is served by) a cellular communications network by wirelessly transmitting and/or receiving signals to a radio access node(s).
  • Some examples of a wireless device include, but are not limited to, a User Equipment device (UE) in a 3GPP network and a Machine Type Communication (MTC) device.
  • UE User Equipment device
  • MTC Machine Type Communication
  • Network Node As used herein, a “network node” is any node that is either part of the radio access network or the core network of a cellular communications network/system.
  • Systems and methods are disclosed herein in which visual assistance information from a visual system including one or more cameras is provided to a network node and utilized by the network node to perform one or more network operations such as, e.g., beam tracking, beam steering, handover, etc.
  • beam tracking i.e., beam selection
  • the present disclosure describes systems and methods that utilize a camera(s) and image processing to get a better estimate of where UEs are in relation to the base station.
  • Such a visual system can, e.g., help the base station rapidly determine the most likely direction to initiate its beam selection process, thereby saving time and improving the likelihood of a high throughput connection to each potential user.
  • a UE’s location can first be estimated using existing beam tracking schemes. Visual assistance information from the visual system that indicates the position of an object is then correlated with the beam selected via the existing beam tracking scheme to thereby correlate the object detected by the visual system to the UE. Thereafter, beam selection for that UE is performed based on visual assistance information regarding the correlated object obtained from the visual system.
  • the visual system includes a camera having a field of view that is matched to a coverage area of the antenna system of the base station, and the UE’s angle referenced to the antenna system of the base station can be estimated by averaging the pixels that it occupies and then recording its (x,y) coordinates relative to the origin (front-facing direction, center of the image). From there, the angle can be calculated and used to select the appropriate beam for the UE.
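  • As an illustration of this pixel-to-angle step, the following Python sketch averages an object’s pixel coordinates and converts the offset from the image center into azimuth and elevation angles. It assumes a simple linear mapping between pixels and angle over the camera’s field of view; the function name, parameters, and numbers are illustrative, not taken from the patent.

```python
import numpy as np

def pixel_centroid_to_angles(pixel_coords, image_width, image_height, hfov_deg, vfov_deg):
    """Estimate azimuth/elevation of an object relative to boresight (image center)
    from the pixels it occupies.

    pixel_coords: iterable of (x, y) pixel positions belonging to the object.
    hfov_deg / vfov_deg: camera horizontal / vertical field of view in degrees,
    assumed here to match the antenna coverage area.
    """
    xs, ys = np.asarray(pixel_coords, dtype=float).T
    cx, cy = xs.mean(), ys.mean()                  # average the pixels the object occupies
    dx = cx - image_width / 2.0                    # offset from the front-facing direction
    dy = (image_height / 2.0) - cy                 # image y grows downward
    azimuth_deg = dx * (hfov_deg / image_width)    # simple linear pixel-to-angle mapping
    elevation_deg = dy * (vfov_deg / image_height)
    return azimuth_deg, elevation_deg

# Example: an object slightly right of boresight in a 1920x1080 frame,
# camera field of view 90 x 60 degrees.
az, el = pixel_centroid_to_angles([(1200, 500), (1210, 520)], 1920, 1080, 90.0, 60.0)
```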
  • stereoscopic imaging can be used to create a Three Dimensional (3D) plot and estimate the distance to the user.
  • the visual system can help the base station rapidly estimate the UE density in the cell and/or at various locations of the cell.
  • the visual system can be used to assist the base station in locating particular UEs.
  • embodiments of the present disclosure utilize the visual assistance information for beam selection, the present disclosure is not limited thereto.
  • the visual assistance information can be used by the base station or other network nodes to perform any suitable type of network operation.
  • embodiments of the present disclosure provide a number of advantages. For example, embodiments of the present disclosure provide better accuracy in estimating where a UE is located, especially for a moving object, which leads to faster beam tracking, higher throughput and user experience, lower system latency, and lower overhead.
  • the UE’s environment, direction, and movement can also be better estimated. This gives more intelligence to the base station as to what is happening around the base station. Also, there will be lower latency in
  • “Stereopsis is a term that is most often used to refer to the perception of depth and 3-dimensional structure obtained on the basis of visual information deriving from two eyes.”
  • a stereo camera is a type of camera with two or more lenses with a separate image sensor or film frame for each lens. This allows the camera to simulate human binocular vision, and therefore gives it the ability to capture three-dimensional images.” (see reference [2]).
  • Computer stereo vision is the extraction of 3D information from digital images, such as obtained by a CCD camera. By comparing information about a scene from two vantage points, 3D information can be extracted by examination of the relative positions of objects in the two panels.” (see reference [3]).
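  • A minimal sketch of the stereo relation underlying such distance estimates, assuming an ideal rectified stereo pair (the variable names and numbers are illustrative):

```python
def stereo_depth(disparity_px, focal_length_px, baseline_m):
    """Classic pinhole stereo relation: depth = f * B / d.

    disparity_px: horizontal pixel shift of the same object between the two views.
    focal_length_px: focal length expressed in pixels.
    baseline_m: distance between the two lenses in meters.
    Returns the estimated distance to the object in meters.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px

# Example: 1400-pixel focal length, 20 cm baseline, 9-pixel disparity -> ~31 m.
distance_m = stereo_depth(disparity_px=9, focal_length_px=1400, baseline_m=0.20)
```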
  • Object detection is a computer technology related to computer vision and image processing that deals with detecting instances of semantic objects of a certain class (such as humans, buildings, or cars) in digital images and videos.” (see reference [4]). Object detection can be achieved through training a computer using machine learning techniques (see reference [5]), by feeding it data of what a certain object usually looks like. Through enough examples, the computer will be eventually able to recognize some features and determine by itself what the object is. Google owns a currently dominant object detection API. (see reference [6]). Video in the following footnote shows an example of what can be done (see reference [7]).
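  • The patent does not mandate any particular detector; as one hedged example, OpenCV’s built-in HOG pedestrian detector could serve as a stand-in for the object detection step (assuming the opencv-python package is available):

```python
import cv2  # assumes OpenCV (opencv-python) is installed

def detect_people(frame):
    """Illustrative detection pass using OpenCV's built-in HOG + linear SVM pedestrian
    detector; a trained neural-network detector could equally play this role."""
    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
    boxes, weights = hog.detectMultiScale(frame, winStride=(8, 8))
    # boxes: one (x, y, w, h) rectangle per detected person, in pixel coordinates
    # weights: the detector's confidence score for each rectangle
    return boxes, weights
```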
  • “Kalman filtering is an algorithm that uses a series of measurements observed over time, containing statistical noise and other inaccuracies, and produces estimates of unknown variables that tend to be more accurate than those based on a single measurement alone.” (see reference [8]). Tracking moving objects and predicting their speed and future position can be achieved through a Kalman Filter (see reference [8]).
  • Johnson’s Criteria is a method used in security and defense to note down the number of pixels on a target needed to classify an object of interest. This classification can be divided into: detection (i.e., an object of the size you want to detect is present), recognition (i.e., determining what class the object belongs to, such as building, truck, man, etc.), and identification (i.e., description of the object to the limit of the observer’s knowledge).
  • the number of pixels needed for detection is 2 vertical pixels on the target
  • the number of pixels needed for recognition is 8 vertical pixels on the target
  • the number of pixels needed for identification is 14 vertical pixels on the target. This is based on a 50% probability of positive assessment.
  • This whitepaper also states that there are three main criteria that determine how far a camera can see, which are: object size, field of view of the camera, and image resolution.
  • Angle of view is another term for field of view.
  • the field of view determines how much of a scene that the camera is going to see.
  • Focal length is the distance between the lens and the image sensor when the subject is in focus. The field of view can be specified in degrees in vertical and horizontal or in meters in vertical and horizontal at the target location.
  • Step 1 Calculate Pixels per Foot in Image
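  • The calculation implied by this step is the common CCTV design rule of dividing the image resolution by the scene width at the target distance; a hedged Python sketch follows, assuming square pixels and a rectilinear lens, with illustrative numbers.

```python
import math

def pixels_per_foot(horizontal_resolution, hfov_deg, distance_ft):
    """Pixels per foot on a target at a given distance, from the image resolution
    and the horizontal field of view."""
    scene_width_ft = 2.0 * distance_ft * math.tan(math.radians(hfov_deg) / 2.0)
    return horizontal_resolution / scene_width_ft

# Example: 1920-pixel-wide image, 90-degree field of view, target at 100 ft
# -> ~9.6 pixels per foot, so a 6 ft person spans ~58 pixels, well above the
# 14-pixel identification threshold quoted above.
ppf = pixels_per_foot(horizontal_resolution=1920, hfov_deg=90.0, distance_ft=100.0)
pixels_on_person = ppf * 6.0
```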
  • Figure 1 illustrates a system 100 in which a network node 102 utilizes visual assistance information from a visual system 104 to perform one or more network operations in accordance with embodiments of the present disclosure.
  • the system 100 includes the network node 102 and the visual system 104.
  • the network node 102 can be any network node in any type of wireless communication system, preferably one that utilizes beamforming but not necessarily limited thereto. Examples of wireless communication systems in which embodiments of the present disclosure may be used are a cellular communications system (e.g., 5G NR), WiFi, and microwave Peer to Peer (P2P) systems.
  • the network node 102 may be a base station (i.e., a gNB in 5G NR terminology) or a core network node (e.g., a network node implementing a 5G Core (5GC) core network function).
  • the network node 102 includes a correlation function 106 and a task performance function 108, each of which may be implemented, e.g., in software that is executed by one or more processors (e.g., Central Processing Units (CPUs), Application Specific Integrated Circuits (ASICs), or the like) of the network node 102.
  • the correlation function 106 correlates objects detected by the visual system 104 to wireless devices (e.g., UEs) served by the wireless communication network based on respective visual assistance information obtained from the visual system 104 for the objects and knowledge that the network node 102 has regarding the wireless devices (e.g., current and possibly recent past beam indices used for the wireless devices).
  • a “beam index” is the nomenclature used to identify the position and the power of an individual RF beam to or from the base station.
  • the task performance function 108 performs one or more network operations, or tasks, associated with the wireless devices based on the visual assistance information obtained from the visual system 104.
  • the visual system 104 includes one or more cameras 110 and one or more processing units 112 implementing an object detection function 114 and a visual assistance information reporting function 116.
  • the one or more cameras 110 may include a single camera or multiple cameras.
  • the network node 102 is a base station
  • the one or more cameras 110 include a single camera 110 that has a field of view that matches a coverage area of an antenna system of the base station.
  • the one or more cameras 110 include multiple cameras, e.g., positioned at known physical locations, e.g., around an urban environment or inside a building.
  • the one or more cameras 110 operate to capture visual data, which includes images and/or video, and provide the visual data and associated metadata (e.g., camera attributes such as focal length, image resolution, etc.) to the processing unit 112.
  • the object detection function 114 processes the visual data and associated metadata to perform object detection using any suitable object detection technique. Note that object detection techniques are well-known in the art. As such, the details of object detection are not described herein.
  • the visual assistance information reporting function 116 processes the output of the object detection function 114 to generate and report visual assistance information to the network node 102.
  • the visual assistance information includes information regarding one or more objects detected by the object detection function 1 14.
  • the visual assistance information for that object includes:
  • Object Identifier A unique identifier assigned to the object, e.g., by the object detection function 114.
  • Position Information Information that indicates a relative position(s) of the object within a field of view of the respective camera 110 that captured the respective visual data in which the object was detected.
  • the visual assistance information includes information that indicates the position of the object within the field of view of the respective camera 110 at that time instant.
  • the set of time instants may be a single time instant or multiple time instants.
  • the information that indicates the position of the object within the field of view may include an azimuth (horizontal) angle and an elevation (vertical) angle relative to a reference point within the field of view (i.e., a point at the center of the field of view).
  • If the camera 110 is, e.g., a stereoscopic camera, the information that indicates the position of the object within the field of view may also include a distance from the camera 110.
  • (Optional) Velocity Information Information that indicates a velocity of each of the detected objects, e.g., at each time instant in the set of time instants for which the visual assistance information includes position information for the detected object.
  • the visual assistance information may include, for each object, information that indicates the classification of the object (e.g., a person, an automobile, a building, etc.).
  • the visual assistance information may be reported to the network node 102 in any desired manner.
  • the visual assistance information may be reported to the network node 102 upon request (i.e., on-demand), periodically (e.g., at a predefined or preconfigured periodicity), or in real-time.
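  • The fields listed above could be carried in a simple report structure; the following Python sketch is illustrative only, as the patent lists the information items but not any particular encoding or field names.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ObjectReport:
    """One detected object's entry in a visual assistance information report."""
    object_id: int                              # unique identifier assigned by object detection
    timestamps: List[float]                     # the set of time instants covered by the report
    azimuth_deg: List[float]                    # relative position within the field of view
    elevation_deg: List[float]
    distance_m: Optional[List[float]] = None    # e.g., available from a stereoscopic camera
    velocity_mps: Optional[List[float]] = None  # optional velocity per time instant
    classification: Optional[str] = None        # optional, e.g., "person", "automobile"

@dataclass
class VisualAssistanceReport:
    camera_id: int
    objects: List[ObjectReport] = field(default_factory=list)
```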
  • Figure 2 illustrates the operation of the network node 102 and the visual system 104 in accordance with some embodiments of the present disclosure. As illustrated, the visual system 104, and in particular the camera(s) 110, captures visual data that is processed by the object detection function 114 to detect objects.
  • the visual assistance information reporting function 116 processes the output of the object detection function 114 to generate visual assistance information (step 204).
  • the visual assistance information includes, for each detected object, an object ID of the object, position information for the object, (optional) velocity information, and (optional) classification type of the object.
  • the visual assistance information reporting function 116 of the visual system 104 reports the visual assistance information to the network node 102 (step 206).
  • the network node 102 receives the visual assistance information from the visual system 104.
  • the correlation function 106 of the network node 102 correlates at least some of the visual assistance information to one or more wireless devices (e.g., UEs) served by the wireless communication network (step 208). More specifically, the correlation function 106 correlates known information related to the positions of the wireless devices served by the wireless communication network with the visual assistance information reported for the detected objects.
  • the known information related to the positions of the wireless devices may include, e.g., beam indices selected for the wireless devices, e.g., using conventional beam selection techniques for one or more transmission time intervals. If the beam index (or beam indices) selected for a particular wireless device at a particular transmission time interval(s) matches the position(s) of a particular detected object for a corresponding time instant(s), then that particular wireless device is determined to match (i.e., correlate to) that particular detected object.
  • For example, a predefined mapping between beam indices and position information (e.g., azimuth and elevation angles) within the field of view of the camera(s) 110 can be used to determine whether a particular beam index selected for a wireless device “matches” the position information of a detected object.
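  • A hedged sketch of this correlation test, in which an object and a wireless device are declared correlated when the beam index chosen by conventional beam selection matches the beam index implied by the object’s reported position for enough time instants (all names and the match threshold are illustrative):

```python
def correlate_object_to_device(object_track, device_beam_history, angle_to_beam_index,
                               min_matches=3):
    """object_track: list of (timestamp, azimuth_deg, elevation_deg) from the visual system.
    device_beam_history: dict {timestamp: beam_index} from conventional beam selection.
    angle_to_beam_index: callable implementing the predefined position-to-beam mapping.
    Returns True if the object and the wireless device are considered correlated."""
    matches = 0
    for t, az, el in object_track:
        if t in device_beam_history and device_beam_history[t] == angle_to_beam_index(az, el):
            matches += 1
    return matches >= min_matches
```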
  • beam indices are used in the example above, other types of known information related to the positions of the wireless devices may be used. For example, Global Positioning System (GPS) position or the like may alternatively be used.
  • GPS Global Positioning System
  • the network node 102 and in particular the task performance function 108, then performs one or more network operations associated with the wireless devices based on the visual assistance information for the correlated objects detected by the visual system 104 (step 210).
  • the network operation(s) may include beam selection or beam tracking, handover, channel estimation, intelligent beam steering (i.e., beam optimization, which may include adjustment of angle, power, and/or number of beams used to reach a particular UE), security operation(s), and/or Internet of Things (loT) operation(s).
  • the task performance function 108 may predict a future location of a particular wireless device based on the visual assistance information for the correlated object received in step 206 and/or based on future visual assistance information received from the visual system 104 for the same object.
  • the position (and, e.g., velocity) of the wireless device can be tracked using the visual assistance information reported by the visual system 104 for the correlated object and used to select the appropriate beam indices for the wireless device.
  • the visual assistance information for an object correlated to a particular wireless device may be utilized to determine when the wireless device is about to move from one cell to another (e.g., from a cell outside of a building to a cell inside a building as the correlated object is moving into an entryway of the building) and initiate handover of the wireless device to the other cell.
  • the camera(s) 110 may include multiple cameras 110 distributed throughout a building (e.g., a large shopping mall), and the visual assistance information for an object moving throughout the building may be used by the network node 102 to, e.g., track the correlated wireless device within the building, initiate handovers between different cells within the building, or intelligently steer a beam toward the wireless device based on its position within the building and a known layout of the building.
  • the visual assistance information can be used for any type of network operation in which knowledge of the position of objects (e.g., objects correlated to wireless devices and/or objects that are not correlated to wireless devices but impact the operation of the wireless communication network (e.g., buildings impacting channel conditions)) is beneficial.
  • the network node 102 is a base station
  • the camera(s) 110 include a video camera that is, e.g., mounted together with an antenna system of the base station, where the field of view of the video camera covers the same field of view as the antenna system of the base station.
  • the video camera captures real time video with a high number of frames per second. This will give the base station the ability to detect wireless devices (referred to in this example as UEs), whether the UEs are stationary or moving. This allows increased beam tracking and steering accuracy and speed, which leads to higher user throughput.
  • Tracking via the visual assistance information from the visual system 104 is done once there is a correlation between an object detected in the video and the position of a corresponding UE (e.g., a correlation between the position information for the detected object and the beam index(ices) selected for the UE via a conventional beam selection scheme).
  • the frames from the video camera can be used to track and predict movement of the object and thus the movement of the UE.
  • the direction (e.g., azimuth and elevation angle) to the UE can be estimated and converted to a beam index.
  • the system can be used either with a codebook-based pattern generation system that uses precoding to create the desired antenna response pattern, or with a high-resolution beamforming system, such as exists in millimeter wave phased array systems, which use phase and amplitude control of the antenna element signals to create high resolution beams, which can be aimed within the coverage area.
  • the number of possible beams in the codebook is based on the beam width and the intended coverage area. A beam will be created for every possible angle within the intended coverage area. This, along with the UE angle referenced to the antenna system of the base station as estimated by the visual system 104, will be used as the input to the task performance function 108 for beam selection.
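  • A minimal sketch of such a codebook and of mapping an estimated UE angle to the nearest beam index (a one-dimensional, azimuth-only codebook with illustrative numbers; the patent does not specify the codebook construction):

```python
def build_azimuth_codebook(beam_width_deg, coverage_min_deg, coverage_max_deg):
    """One beam per beam-width step across the intended coverage area, so the number
    of beams follows from the beam width and the coverage span."""
    span = coverage_max_deg - coverage_min_deg
    num_beams = int(round(span / beam_width_deg))
    return [coverage_min_deg + (i + 0.5) * beam_width_deg for i in range(num_beams)]

def angle_to_beam_index(azimuth_deg, beam_centers):
    """Pick the codebook beam whose center is closest to the estimated UE angle."""
    return min(range(len(beam_centers)), key=lambda i: abs(beam_centers[i] - azimuth_deg))

# Example: 2-degree beams covering -60..+60 degrees azimuth -> 60 beams.
centers = build_azimuth_codebook(2.0, -60.0, 60.0)
idx = angle_to_beam_index(-12.3, centers)
```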
  • the UE’s (i.e., the correlated object’s) angle referenced to the base station antenna system can be estimated by averaging the pixels that it occupies and then recording its (x,y) coordinates relative to the origin (front-facing direction, center of the image). From there, the angle can be calculated.
  • Doing this in the case of Line of Sight (LoS) gives the base station the ability to identify which beam will best match the position of the UE much faster than before. Via the visual assistance information, the base station knows where the UE is positioned (e.g., the angle at which the corresponding object is located in reference to the antenna system of the base station) and can then use the beam that best matches the position of the UE. This speeds up the system’s responsiveness in identifying where the UE is and tracking the UE’s movement, since it is not necessary to continuously sweep the codebook. This also reduces system overhead. Using the visual system 104, the UE’s movement can be predicted, which enables the base station to initiate action earlier.
  • the camera’s rate i.e., frames per second
  • FPS Frames Per Second
  • a pedestrian user may change beams about once per second depending on their walking speed and distance from the base station. For example, assuming a beam width of 2 degrees, at a distance of 30 meters between the user and the base station and a walking speed of 1.4 meters per second, the user would stay in the same beam for about 1 second.
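  • This figure can be reproduced with a short, purely illustrative calculation: the arc spanned by the beam at the user’s distance divided by the walking speed gives the beam dwell time.

```python
import math

def beam_dwell_time_s(beam_width_deg, distance_m, speed_mps):
    """Rough time a user stays inside one beam: arc length spanned by the beam at the
    user's distance, divided by the user's speed."""
    arc_m = 2.0 * distance_m * math.tan(math.radians(beam_width_deg) / 2.0)
    return arc_m / speed_mps

# Numbers from the text: 2-degree beam, 30 m range, 1.4 m/s walking speed -> ~0.75 s,
# i.e., roughly one beam change per second.
dwell = beam_dwell_time_s(2.0, 30.0, 1.4)
```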
  • the image processing should be considered. This is where the bulk of the work will be and where most of the processing time will be spent.
  • the second class of hardware that can be used is Graphics Processing Units (GPUs), which are cheaper but come with a latency on the order of ×100 ms.
  • the third class of hardware is general purpose CPUs. These are much cheaper, but latency increases even more.
  • the conversion from an angle to the UE to a set of new beamforming coefficients for the UE should be considered.
  • a look-up table can be used for the mapping of UE angle to beamforming coefficients or beam index, and the beamforming coefficients can then be provided to Field Programmable Gate Array (FPGA) / ASIC responsible for making the changes.
  • FPGA Field Programmable Gate Array
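  • As one hedged way to realize such a look-up table, the textbook steering vector of a uniform linear array can be precomputed per codebook angle and handed to the hardware that applies the weights; this is a generic beamforming sketch, not the patent’s specific scheme.

```python
import numpy as np

def steering_vector(angle_deg, num_elements, element_spacing_wavelengths=0.5):
    """Phase-only beamforming coefficients for a uniform linear array steered toward
    angle_deg (a textbook steering vector; illustrative only)."""
    n = np.arange(num_elements)
    phase = 2.0 * np.pi * element_spacing_wavelengths * n * np.sin(np.radians(angle_deg))
    return np.exp(1j * phase) / np.sqrt(num_elements)

# Precompute a look-up table from beam index (codebook angle) to coefficients,
# which can then be handed to the FPGA/ASIC that applies the weights.
beam_centers = [-59.0 + 2.0 * i for i in range(60)]   # matches the 2-degree example earlier
coefficient_lut = {i: steering_vector(a, num_elements=16) for i, a in enumerate(beam_centers)}
```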
  • machine learning technology improvements can be made in hardware and software allowing the previously mentioned latency numbers to be reduced further, and for the hardware to become cheaper. This will allow for smoother integration of machine learning and beamforming technologies.
  • Another thing to keep in mind is that the latency of the proposed system can be reduced in the case of tracking a user’s movement by using a Kalman Filter. Images from previous frames and images from current frames can be used to make a prediction where the UE will be in future frames. From that, the angle to the UE can be estimated and then translated to a beam index defined by a codebook.
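  • A minimal constant-velocity Kalman filter on the object’s azimuth angle illustrates this prediction step; the predicted angle for a future transmission time interval can then be translated to a beam index via the codebook mapping sketched earlier. The motion model, noise values, and class name are illustrative assumptions.

```python
import numpy as np

class AzimuthTracker:
    """Constant-velocity Kalman filter on azimuth, used to predict where the UE
    (i.e., the correlated object) will be in future frames."""
    def __init__(self, dt, process_var=1e-2, meas_var=0.5):
        self.x = np.zeros(2)                        # [azimuth_deg, azimuth_rate_deg_per_s]
        self.P = np.eye(2) * 100.0                  # large initial uncertainty
        self.F = np.array([[1.0, dt], [0.0, 1.0]])  # constant-velocity motion model
        self.Q = np.eye(2) * process_var
        self.H = np.array([[1.0, 0.0]])             # only the angle is measured
        self.R = np.array([[meas_var]])

    def update(self, measured_azimuth_deg):
        # Predict to the current frame, then fuse the camera measurement.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        y = measured_azimuth_deg - (self.H @ self.x)[0]
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K.flatten() * y
        self.P = (np.eye(2) - K @ self.H) @ self.P

    def predict_azimuth(self, horizon_s):
        # Extrapolate to a future transmission time interval.
        return self.x[0] + self.x[1] * horizon_s
```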
  • the visual assistance information for objects can be used by the base station to estimate channel conditions and calculate precoders that best fit the scenario.
  • the distance from the antenna system of the base station to buildings and UEs can be used to estimate signal propagation conditions, which can then be used to calculate the best precoder coefficients and select an ideal number of MIMO layers to fit the scenario.
  • MU-MIMO Multiple User MIMO
  • the visual assistance information can include an indication of the classification of the detected object(s).
  • the classifications of the detected objects will allow the base station to be more intelligent and better predict network usage. For example, in an indoor scenario where there is a concert, a video camera’s images with this API can give the ability to detect how many people are in the same location, and this information can be used to let the network know to be prepared in case of possible spike usage. Towards the end of the concert when people are leaving the building, image detection can tell the base station that everyone is leaving the building so that the base station can prepare for a handover event from an indoor cell to an outdoor cell.
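  • A toy sketch of how such classification counts might feed a capacity decision; the threshold and field names are illustrative, not taken from the patent.

```python
def crowd_spike_alert(classifications, spike_threshold=500):
    """Count detected objects classified as people in the camera's coverage area and
    flag when the base station should prepare for a possible usage spike."""
    person_count = sum(1 for c in classifications if c == "person")
    return {"person_count": person_count,
            "prepare_for_spike": person_count >= spike_threshold}

# Example: classifications reported by the visual system for one frame of a concert venue.
alert = crowd_spike_alert(["person"] * 650 + ["automobile"] * 3)
```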
  • Another use case is an outdoor setting where a UE is moving and ends up being behind a building where network coverage would deteriorate if there is no LoS.
  • the base station can know that when a UE ends up in that location, the UE’s network performance will degrade so that the base station can take one or more actions to address this issue such as, e.g., steering the beam to create a reflection that will reach the user or initiating a handover to a neighboring base station.
  • Machine learning can also be used to teach the base station to detect from the detected objects what a“user” of a UE looks like, whether that is a human or something else.
  • Another example is an outdoor setting in which a UE is moving away from the direction that the antenna system of the base station is facing. In this case, throughput is going down, but the base station does not know why. Once Received Signal Strength Indicator (RSSI) goes below a certain threshold, a handover event is triggered.
  • RSSI Received Signal Strength Indicator
  • the UE By using stereoscopic vision, which relies on using a stereo camera, along with possible use of Kalman Filter, the UE’s movement, distance, direction, etc. can be detected, thereby giving the base station a much higher accuracy and resolution estimate of the position of the UE such that the beam can be adjusted and a handover event can be predicted and prepared for sooner. This gives better network conditions to the UE.
  • the movement does not have to be in the direction where the antenna system of the base station is facing; it could be in any direction.
  • the UE’s movement velocity can be detected, which can be used by the base station to handle handover more smoothly, which avoids interruption of connection.
  • an orientation of the UE is detected by the visual system 104.
  • the network node 102 can inform the UE of the UE’s orientation in order to assist the UE in beam selection on the uplink side. This can potentially save battery lifetime as the UE beam searching and selection algorithm is simplified.
  • A Passive Intermodulation (PIM) detection algorithm could approximate the angle and distance to PIM sources relative to the antenna system of the base station based on radio-internal algorithms. With a stereoscopic camera, the accuracy of prediction of the angle and distance to objects the camera can see can be improved. The estimate from the distance-to-PIM feature can then be correlated with what is seen or detected by the visual system, and possible PIM sources, including another antenna or rust on cables, etc., can be identified. In other words, the visual system can work together with a PIM detection function. The PIM detection function would estimate the distance and/or angle to the PIM source. The visual system could then, for example, highlight areas of the viewing area, or of the captured image, that correspond to the estimated distance and/or angle so that possible PIM sources can be identified.
  • Another application is in security, where images and/or videos can be correlated with a communication or communication session (e.g., a text message sent or received, a website viewed) that was part of criminal activity.
  • the network node 102 may request an image(s) or video(s) of a particular object of interest during a period of time during which the communication or communication session of interest occurred.
  • images and/or videos may be included in the visual assistance information provided to the network node 102.
  • FIG. 3 illustrates one example of a cellular communications network 300 in which the system 100 of Figure 1 may be implemented according to some embodiments of the present disclosure.
  • the cellular communications network 300 is a 5G NR network.
  • the cellular communications network 300 includes base stations 302-1 and 302-2, which in 5G NR are referred to as gNBs, controlling corresponding macro cells 304-1 and 304-2.
  • the base stations 302-1 and 302-2 are generally referred to herein collectively as base stations 302 and individually as base station 302.
  • the macro cells 304-1 and 304-2 are generally referred to herein collectively as macro cells 304 and individually as macro cell 304.
  • the cellular communications network 300 may also include a number of low power nodes 306-1 through 306-4 controlling corresponding small cells 308-1 through 308-4.
  • the low power nodes 306-1 through 306-4 can be small base stations (such as pico or femto base stations) or Remote Radio Heads (RRHs), or the like.
  • RRHs Remote Radio Heads
  • one or more of the small cells 308-1 through 308-4 may alternatively be provided by the base stations 302.
  • the low power nodes 306-1 through 306-4 are generally referred to herein collectively as low power nodes 306 and individually as low power node 306.
  • the small cells 308-1 through 308-4 are generally referred to herein collectively as small cells 308 and individually as small cell 308.
  • the base stations 302 (and optionally the low power nodes 306) are connected to a core network 310.
  • the base stations 302 and the low power nodes 306 provide service to wireless devices 312-1 through 312-5 in the corresponding cells 304 and 308.
  • the wireless devices 312-1 through 312-5 are generally referred to herein collectively as wireless devices 312 and individually as wireless device 312.
  • the wireless devices 312 are also sometimes referred to herein as UEs.
  • the functionality of the network node 102 described above is implemented in the base stations 302 and/or the low power nodes 306.
  • the visual system 104 is not illustrated in Figure 3, but should be understood to be present and operating as described above to provide visual assistance information to the respective base station(s) 302 and/or the respective low power node(s) 306.
  • Figure 4 illustrates a wireless communication system represented as a 5G network architecture composed of core Network Functions (NFs), where interaction between any two NFs is represented by a point-to-point reference point/interface.
  • Figure 4 can be viewed as one particular implementation of the system 300 of Figure 3.
  • the 5G network architecture shown in Figure 4 comprises a plurality of UEs connected to either a Radio Access Network (RAN) or an Access Network (AN), as well as an Access and Mobility Management Function (AMF).
  • the (R)AN comprises base stations, e.g., eNBs or 5G base stations (gNBs) or similar.
  • the 5G core NFs shown in Figure 4 include a Network Slice Selection Function (NSSF), an Authentication Server Function (AUSF), a Unified Data Management (UDM), an AMF, a Session Management Function (SMF), a Policy Control Function (PCF), and an Application Function (AF).
  • NSSF Network Slice Selection Function
  • AUSF Authentication Server Function
  • UDM Unified Data Management
  • AMF Access and Mobility Management Function
  • SMF Session Management Function
  • PCF Policy Control Function
  • AF Application Function
  • the N1 reference point is defined to carry signaling between the UE and AMF.
  • the reference points for connecting between the AN and AMF and between the AN and UPF are defined as N2 and N3, respectively. There is a reference point, N11, between the AMF and SMF, which implies that the SMF is at least partly controlled by the AMF.
  • N4 is used by the SMF and UPF so that the UPF can be set using the control signal generated by the SMF, and the UPF can report its state to the SMF.
  • N9 is the reference point for the connection between different UPFs
  • N14 is the reference point connecting between different AMFs.
  • N15 and N7 are defined since the PCF applies policy to the AMF and SMF, respectively.
  • N12 is required for the AMF to perform authentication of the UE.
  • N8 and N10 are defined because the subscription data of the UE is required for the AMF and SMF.
  • the 5G core network aims at separating user plane and control plane.
  • the user plane carries user traffic while the control plane carries signaling in the network.
  • the UPF is in the user plane and all other NFs, i.e., the AMF, SMF, PCF, AF, AUSF, and UDM, are in the control plane. Separating the user and control planes guarantees each plane resource to be scaled independently.
  • It also allows UPFs to be deployed separately from control plane functions in a distributed fashion. In this architecture, UPFs may be deployed very close to UEs to shorten the Round Trip Time (RTT) between UEs and the data network for some applications requiring low latency.
  • RTT Round Trip Time
  • the core 5G network architecture is composed of modularized functions.
  • the AMF and SMF are independent functions in the control plane. Separated AMF and SMF allow independent evolution and scaling.
  • Other control plane functions like the PCF and AUSF can be separated as shown in Figure 4.
  • Modularized function design enables the 5G core network to support various services flexibly.
  • Each NF interacts with another NF directly. It is possible to use intermediate functions to route messages from one NF to another NF.
  • a set of interactions between two NFs is defined as service so that its reuse is possible. This service enables support for modularity.
  • the user plane supports interactions such as forwarding operations between different UPFs.
  • Figure 5 illustrates a 5G network architecture using service-based interfaces between the NFs in the control plane, instead of the point-to-point reference points/interfaces used in the 5G network architecture of Figure 4.
  • the NFs described above with reference to Figure 4 correspond to the NFs shown in Figure 5.
  • the service(s) etc. that a NF provides to other authorized NFs can be exposed to the authorized NFs through the service-based interface.
  • the service based interfaces are indicated by the letter “N” followed by the name of the NF, e.g. Namf for the service based interface of the AMF and Nsmf for the service based interface of the SMF etc.
  • the Network Exposure Function (NEF) and the Network Repository Function (NRF) in Figure 5 are not shown in Figure 4 discussed above. However, it should be clarified that all NFs depicted in Figure 4 can interact with the NEF and the NRF of Figure 5 as necessary, though not explicitly indicated in Figure 4.
  • the AMF provides UE-based authentication, authorization, mobility management, etc.
  • a UE, even one using multiple access technologies, is basically connected to a single AMF because the AMF is independent of the access technologies.
  • the SMF is responsible for session management and allocates Internet Protocol (IP) addresses to UEs. It also selects and controls the UPF for data transfer. If a UE has multiple sessions, different SMFs may be allocated to each session to manage them individually and possibly provide different functionalities per session.
  • IP Internet Protocol
  • the AF provides information on the packet flow to the PCF responsible for policy control in order to support Quality of Service (QoS).
  • QoS Quality of Service
  • the PCF determines policies about mobility and session management to make the AMF and SMF operate properly.
  • the AUSF supports authentication function for UEs or similar and thus stores data for authentication of UEs or similar while the UDM stores subscription data of the UE.
  • the Data Network (DN), not part of the 5G core network, provides Internet access or operator services and similar.
  • An NF may be implemented either as a network element on dedicated hardware, as a software instance running on dedicated hardware, or as a virtualized function instantiated on an appropriate platform, e.g., a cloud infrastructure.
  • the functionality of the network node 102 described above is implemented in a core network node or core network entity such as one of the existing network functions illustrated in Figure 4 or 5 or as a new network function in addition to those illustrated in Figures 4 and 5.
  • FIG. 6 is a schematic block diagram of a network node 600 according to some embodiments of the present disclosure.
  • the network node 600 is one example implementation of the network node 102 described above.
  • the network node 600 may be, for example, a base station 302 or 306.
  • the network node 600 includes a control system 602 that includes one or more processors 604 (e.g., CPUs, ASICs, FPGAs, and/or the like), memory 606, and a network interface 608.
  • the one or more processors 604 are also referred to herein as processing circuitry.
  • the network node 600 includes one or more radio units 610 that each includes one or more transmitters 612 and one or more receivers 614 coupled to one or more antennas 616.
  • the radio units 610 may be referred to or be part of radio interface circuitry.
  • the radio unit(s) 610 is external to the control system 602 and connected to the control system 602 via, e.g., a wired connection (e.g., an optical cable).
  • the radio unit(s) 610 and potentially the antenna(s) 616 are integrated together with the control system 602.
  • the one or more processors 604 operate to provide one or more functions of a network node 600 as described herein.
  • the function(s) are implemented in software that is stored, e.g., in the memory 606 and executed by the one or more processors 604.
  • FIG. 7 is a schematic block diagram that illustrates a virtualized embodiment of the network node 600 according to some embodiments of the present disclosure.
  • a “virtualized” radio access node is an implementation of the network node 600 in which at least a portion of the functionality of the network node 600 is implemented as a virtual component(s) (e.g., via a virtual machine(s) executing on a physical processing node(s) in a network(s)).
  • the network node 600 includes one or more processing nodes 700 coupled to or included as part of a network(s)
  • Each processing node 700 includes one or more processors 704 (e.g., CPUs, ASICs, FPGAs, and/or the like), memory 706, and a network interface 708.
  • functions 710 of the network node 600 are implemented at the one or more processing nodes 700 or distributed across the control system 602 and the one or more processing nodes 700 in any desired manner.
  • some or all of the functions 710 of the network node 600 described herein are implemented as virtual components executed by one or more virtual machines implemented in a virtual environment(s) hosted by the processing node(s) 700.
  • a computer program including instructions which, when executed by at least one processor, causes the at least one processor to carry out the functionality of the network node 600 (i.e., the functionality of the network node 102) or a node (e.g., a processing node 700) implementing one or more of the functions 710 of the network node 600 in a virtual environment according to any of the embodiments described herein is provided.
  • a carrier comprising the aforementioned computer program product.
  • the carrier is one of an electronic signal, an optical signal, a radio signal, or a computer readable storage medium (e.g., a non- transitory computer readable medium such as memory).
  • FIG 8 is a schematic block diagram of the network node 600 according to some other embodiments of the present disclosure.
  • the network node 600 includes one or more modules 800, each of which is implemented in software.
  • the module(s) 800 provide the functionality of the network node 600 described herein, e.g., with respect to Figure 2. This discussion is equally applicable to the processing node 700 of Figure 7 where the modules 800 may be implemented at one of the processing nodes 700 or distributed across multiple processing nodes 700 and/or distributed across the processing node(s) 700 and the control system 602.
  • FIG. 9 is a schematic block diagram of a UE 900 according to some embodiments of the present disclosure.
  • the UE 900 includes one or more processors 902 (e.g., CPUs, ASICs, FPGAs, and/or the like), memory 904, and one or more transceivers 906 each including one or more transmitters 908 and one or more receivers 910 coupled to one or more antennas 912.
  • the transceiver(s) 906 includes radio front-end circuitry connected to the antenna(s) 912 that is configured to condition signals communicated between the antenna(s) 912 and the processor(s) 902, as will be appreciated by one of ordinary skill in the art.
  • the processors 902 are also referred to herein as processing circuitry.
  • the transceivers 906 are also referred to herein as radio circuitry.
  • the functionality of the UE 900 described above may be fully or partially implemented in software that is, e.g., stored in the memory 904 and executed by the processor(s) 902.
  • the UE 900 may include additional components not illustrated in Figure 9 such as, e.g., one or more user interface components (e.g., an input/output interface including a display, buttons, a touch screen, a microphone, a speaker(s), and/or the like and/or any other components for allowing input of information into the UE 900 and/or allowing output of information from the UE 900), a power supply (e.g., a battery and associated power circuitry), etc.
  • a computer program including instructions which, when executed by at least one processor, causes the at least one processor to carry out the functionality of the UE 900 according to any of the embodiments described herein is provided.
  • a carrier comprising the aforementioned computer program product is provided.
  • the carrier is one of an electronic signal, an optical signal, a radio signal, or a computer readable storage medium (e.g., a non-transitory computer readable medium such as memory).
  • FIG. 10 is a schematic block diagram of the UE 900 according to some other embodiments of the present disclosure.
  • the UE 900 includes one or more modules 1000, each of which is implemented in software.
  • the module(s) 1000 provide the functionality of the UE 900 described herein.
  • Any appropriate steps, methods, features, functions, or benefits disclosed herein may be performed through one or more functional units or modules of one or more virtual apparatuses.
  • Each virtual apparatus may comprise a number of these functional units.
  • These functional units may be implemented via processing circuitry, which may include one or more processors (e.g., CPUs, ASICs, FPGAs, and/or the like).
  • the processing circuitry may be configured to execute program code stored in memory, which may include one or several types of memory such as Read Only Memory (ROM), Random Access Memory (RAM), cache memory, flash memory devices, optical storage devices, etc.
  • Program code stored in memory includes program instructions for executing one or more telecommunications and/or data communications protocols as well as instructions for carrying out one or more of the techniques described herein.
  • the processing circuitry may be used to cause the respective functional unit to perform corresponding functions according to one or more embodiments of the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

Systems and methods for video-assisted network operations in a wireless communication network are disclosed. In some embodiments, a method of operation of a network node in a wireless communication network comprises receiving, from a visual system comprising one or more cameras, visual assistance information related to an object detected by the visual system. The method further comprises correlating the object detected by the visual system to a wireless device served by the wireless communication network, and performing one or more network operations associated with the wireless device based on the visual assistance information related to the object detected by the visual system that is correlated to the wireless device. Using the visual assistance information, the network node can, e.g., rapidly determine the most likely direction to initiate a beam selection process for the wireless device, thereby saving time and improving the likelihood of a high throughput connection.

Description

SYSTEMS AND METHODS FOR VIDEO-ASSISTED NETWORK OPERATIONS
Technical Field
[0001 ] The present disclosure relates to a wireless network and, more specifically, network operations performed in a wireless network such as a cellular communications network.
Background
[0002] Operators today pay a significant amount of money for spectrum. So, they are interested in the push for higher capacity (bits per second (bits/s)) and higher spectrum efficiency (bits/s per Hertz (bits/s/Hz)). A theoretical limit on capacity can be calculated based on the Shannon-Hartley theorem which shows C=B log2(1 +S/N), where B is the bandwidth of the channel, S is the received signal power over the bandwidth, and N is the noise power over the bandwidth. This formula was later extended to account for the number of layers used in transmission. Today, this is used in Multiple Input Multiple Output (MIMO) systems.
[0003] The number of MIMO layers in the channel can be increased by increasing the number of antennas at the base station and at the User
Equipment (UE). The ratio S/N can be improved by using a technique called “transmit diversity,” where for example more than one antenna at the base station will send the same signal. The signals from those antennas will go through different radio propagation paths to arrive at the UE, where the signals are then properly decoded.
[0004] Another technique for improving the ratio S/N is beamforming. For beamforming, a phase/amplitude modified copy of the same signal is sent on multiple antennas, causing the energy of the transmitted signal to be focused in one direction (where the UE would be), i.e. reducing its beam width and increasing directivity. The phase and amplitude of the transmitted signals can be modified to control the beam width and steer the beam in different directions, allowing users to be tracked and throughput to be maintained with the UE’s movement.
[0005] Today, frequency allocation at Long Term Evolution (LTE) bands is more crowded than ever. A rise in operators’ interest is seen for millimeter wave frequencies (30-300 gigahertz (GHz)), especially for Fifth Generation (5G) applications. The downside of millimeter wave signals is that path loss increases significantly. To maintain the same system throughput, power at the base station and the UE needs to be increased significantly, or we must rely on beamforming. The first option is not feasible and results in a lot of wasted power. This indicates the need for even higher directivity achieved through an increased number of antennas in an antenna array, while utilizing beamforming and steering.
[0006] The first thing a UE does once it is turned on is to perform the attach procedure, during which the UE connects to the network through the closest base station. Network planning divides geographical areas into sectors.
Knowledge of the sector to which the UE has attached gives a general idea of where the UE is located. In the 5G New Radio (NR) case, initial access is performed using Synchronization Signal Blocks (SSBs), where an SSB is a group of four (4) symbols occupying twenty (20) resource blocks.
[0007] Since fast initial acquisition time is required, an SSB can be sent using wider beams with fewer options to sweep. The UE will decode the signal from these beams and return an index of the beam that gives the best ratio S/N. The list of possible beams at the base station side and at the UE side along with the respective beamforming coefficients will be predetermined and defined in a codebook, covering a certain angular space. The number of possible beams will directly depend on the beam width of the antenna structure and the area that needs to be covered.
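As a rough illustration of how the codebook size scales with beam width and coverage area, the following sketch (Python; the function name and the overlap factor are illustrative assumptions and not part of any 3GPP procedure) estimates the number of candidate beams that would have to be swept:

    import math

    def codebook_size(az_span_deg, el_span_deg, beam_width_deg, overlap=1.0):
        # Rough estimate of how many beams are needed to cover an angular sector.
        # az_span_deg / el_span_deg: total azimuth / elevation coverage in degrees.
        # beam_width_deg: beam width of a single beam; overlap > 1.0 adds margin.
        n_az = math.ceil(az_span_deg / beam_width_deg * overlap)
        n_el = math.ceil(el_span_deg / beam_width_deg * overlap)
        return n_az * n_el

    # Example: a 120-degree azimuth sector, 30 degrees of elevation, 2-degree beams
    print(codebook_size(120, 30, 2))  # 900 candidate beams to sweep

Sweeping hundreds of such narrow beams is what drives the latency and overhead discussed in the following paragraphs.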
[0008] Once initial acquisition is complete, the base station can now send SSB bursts or Channel State Information-Reference Signals (CSI-RSs) to find an even narrower beam with higher S/N. For tracking UE movement, a subset of the list of beams will continuously be swept by the base station, and the UE will update the base station according to its position. Tracking is also done using SSB bursts and CSI-RS signals. This will make sure the UE can be tracked by the base station and maintain good throughput.
[0009] When using millimeter wave, e.g., in 5G NR, many more beams will be used and, therefore, many more beams will have to be swept and checked. This increases the complexity and overhead of the system and degrades its responsiveness. To reduce the severity of this problem, the base station can decide to create a wider beam that covers multiple UEs, but this will reduce the S/N experienced by each user.
[0010] Thus, there is a need for systems and methods that reduce overhead for beam tracking, which is particularly beneficial for millimeter wave systems such as, e.g., 5G NR.
Summary
[0011 ] Systems and methods for video-assisted network operations in a wireless communication network are disclosed. In some embodiments, a method of operation of a network node in a wireless communication network comprises receiving, from a visual system comprising one or more cameras, visual assistance information related to an object detected by the visual system. The method further comprises correlating the object detected by the visual system to a wireless device served by the wireless communication network, and performing one or more network operations associated with the wireless device based on the visual assistance information related to the object detected by the visual system that is correlated to the wireless device. Using the visual assistance information, the network node can, e.g., rapidly determine the most likely direction to initiate a beam selection process for the wireless device, thereby saving time and improving the likelihood of a high throughput connection.
[0012] In some embodiments, performing the one or more network operations comprises performing beam selection for a transmit beam for a downlink transmission from a radio access node of the wireless communication network to the wireless device and/or a receive beam for reception of an uplink transmission from the wireless device at the radio access node of the wireless communication network. Further, in some embodiments, the visual assistance information related to the object detected by the visual system comprises information that indicates, for each time instant in a set of time instants, a relative position of the object within a field of view of the one or more cameras at that time instant.
Correlating the object detected by the visual system to the wireless device served by the wireless communication network comprises determining that, for each time instant in at least a subset of the set of time instants, the relative position of the object within the field of view of the one or more cameras at the time instant matches a beam index used for the wireless device at that time instant.
[0013] Further, in some embodiments, performing beam selection for the transmit beam for the downlink transmission from the radio access node of the wireless communication network to the wireless device and/or the receive beam for reception of the uplink transmission from the wireless device at the radio access node of the wireless communication network comprises predicting a position of the wireless device during a future transmission time interval in which the downlink transmission and/or uplink transmission is to occur based on the visual assistance information related to the object detected by the visual system that is correlated to the wireless device and selecting a beam index of the transmit beam and/or the receive beam based on a predefined mapping between the predicted position of the wireless device and the beam index.
[0014] In some embodiments, the information that indicates, for each time instant in the set of time instants, the relative position of the object within the field of view of the one or more cameras at that time instant comprises, for each time instant in the set of time instants, information that indicates an azimuth angle and/or an elevation angle of the object within the field of view of the one or more cameras. Further, in some embodiments, the information that indicates, for each time instant in the set of time instants, the relative position of the object within the field of view of the one or more cameras at that time instant further comprises, for each time instant in the set of time instants, information that indicates a distance of the object from a known location of a respective one of the one or more cameras.
[0015] In some embodiments, the visual assistance information related to the object detected by the visual system comprises information that indicates, for each time instant in the set of time instants, a velocity of the object.
[0016] In some embodiments, the one or more cameras consist of a camera having a field of view that matches a coverage area of an antenna system of the radio access node used to provide beamforming. Further, in some
embodiments, the wireless device is in a line-of-sight of the antenna system of the radio access node.
[0017] In some embodiments, the wireless communication network is a cellular communication network, and performing the one or more network operations comprises making a decision to perform a handover of the wireless device from one cell to another cell based on the visual assistance information related to the object detected by the visual system that is correlated to the wireless device and initiating the handover of the wireless device from the one cell to the other cell upon making the decision.
[0018] In some embodiments, the one or more network operations comprise one or more security related operations.
[0019] In some embodiments, performing the one or more network operations comprises obtaining, from the visual system, one or more images or one or more videos captured of the object and associating the one or more images or the one or more videos with the wireless device or with a particular communication or communication session of the wireless device.
[0020] In some embodiments, the visual assistance information related to the object detected by the visual system comprises, for a period of time, one or more images and/or one or more videos captured of the object during the period of time. Performing the one or more network operations comprises associating the one or more images and/or the one or more videos captured of the object during the period of time with a communication or communication session of the wireless device that occurred during the period of time.
[0021] In some embodiments, the method further comprises receiving, from the visual system, visual assistance information related to one or more additional objects detected by the visual system and performing the one or more network operations comprises estimating a wireless communication channel for the wireless device based on the visual assistance information related to the one or more additional objects detected by the visual system.
[0022] In some embodiments, the method further comprises receiving, from the visual system, visual assistance information related to one or more additional objects detected by the visual system, and performing the one or more network operations comprises performing beam steering for the wireless device based on the visual assistance information related to the one or more additional objects detected by the visual system.
[0023] In some embodiments, the method further comprises receiving, from the visual system, visual assistance information related to one or more additional objects detected by the visual system, and performing the one or more network operations comprises identifying sources of passive intermodulation distortion for the wireless device based on the visual assistance information related to the one or more additional objects detected by the visual system.
[0024] In some embodiments, the wireless communication network is a cellular communications network.
[0025] Embodiments of a network node for a wireless communication network are also disclosed. In some embodiments, a network node for a wireless communication network is adapted to receive, from a visual system comprising one or more cameras, visual assistance information related to an object detected by the visual system, correlate the object detected by the visual system to a wireless device served by the wireless communication network, and perform one or more network operations associated with the wireless device based on the visual assistance information related to the object detected by the visual system that is correlated to the wireless device.
[0026] In some embodiments, a network node for a wireless communication network comprises a communication interface and processing circuitry associated with the communication interface, wherein the processing circuitry is operable to cause the network node to receive, from a visual system comprising one or more cameras, visual assistance information related to an object detected by the visual system, correlate the object detected by the visual system to a wireless device served by the wireless communication network, and perform one or more network operations associated with the wireless device based on the visual assistance information related to the object detected by the visual system that is correlated to the wireless device.
Brief Description of the Drawings
[0027] The accompanying drawing figures incorporated in and forming a part of this specification illustrate several aspects of the disclosure, and together with the description serve to explain the principles of the disclosure.
[0028] Figure 1 illustrates one example of a system including a network node (e.g., a base station) and a visual system that provides visual assistance information to the network node, which is then used by the network node to perform one or more network operations such as, e.g., beam tracking, in accordance with embodiments of the present disclosure;
[0029] Figure 2 illustrates the operation of the network node and the visual system of Figure 1 in accordance with embodiments of the present disclosure;
[0030] Figure 3 illustrates one example of a cellular communications network according to some embodiments of the present disclosure;
[0031 ] Figure 4 illustrates a wireless communication system represented as a Fifth Generation (5G) network architecture composed of core Network Functions (NFs), where interaction between any two NFs is represented by a point-to-point reference point/interface;
[0032] Figure 5 illustrates a 5G network architecture using service-based interfaces between the NFs in the control plane, instead of the point-to-point reference points/interfaces used in the 5G network architecture of Figure 4;
[0033] Figure 6 is a schematic block diagram of a radio access node according to some embodiments of the present disclosure;
[0034] Figure 7 is a schematic block diagram that illustrates a virtualized embodiment of the radio access node of Figure 6 according to some
embodiments of the present disclosure;
[0035] Figure 8 is a schematic block diagram of the radio access node of Figure 6 according to some other embodiments of the present disclosure;
[0036] Figure 9 is a schematic block diagram of a User Equipment device (UE) according to some embodiments of the present disclosure; and
[0037] Figure 10 is a schematic block diagram of the UE of Figure 9 according to some other embodiments of the present disclosure.
Detailed Description
[0038] The embodiments set forth below represent information to enable those skilled in the art to practice the embodiments and illustrate the best mode of practicing the embodiments. Upon reading the following description in light of the accompanying drawing figures, those skilled in the art will understand the concepts of the disclosure and will recognize applications of these concepts not particularly addressed herein. It should be understood that these concepts and applications fall within the scope of the disclosure.
[0039] Radio Node: As used herein, a “radio node” is either a radio access node or a wireless device.
[0040] Radio Access Node: As used herein, a “radio access node” or “radio network node” is any node in a radio access network of a cellular communications network that operates to wirelessly transmit and/or receive signals. Some examples of a radio access node include, but are not limited to, a base station (e.g., a New Radio (NR) base station (gNB) in a Third Generation Partnership Project (3GPP) Fifth Generation (5G) NR network or an enhanced or evolved Node B (eNB) in a 3GPP Long Term Evolution (LTE) network), a high-power or macro base station, a low-power base station (e.g., a micro base station, a pico base station, a home eNB, or the like), and a relay node.
[0041] Core Network Node: As used herein, a “core network node” is any type of node in a core network. Some examples of a core network node include, e.g., a Mobility Management Entity (MME), a Packet Data Network Gateway (P-GW), a Service Capability Exposure Function (SCEF), or the like.
[0042] Wireless Device: As used herein, a “wireless device” is any type of device that has access to (i.e., is served by) a cellular communications network by wirelessly transmitting and/or receiving signals to a radio access node(s). Some examples of a wireless device include, but are not limited to, a User Equipment device (UE) in a 3GPP network and a Machine Type Communication (MTC) device.
[0043] Network Node: As used herein, a “network node” is any node that is either part of the radio access network or the core network of a cellular communications network/system.
[0044] Note that the description given herein focuses on a 3GPP cellular communications system and, as such, 3GPP terminology or terminology similar to 3GPP terminology is oftentimes used. However, the concepts disclosed herein are not limited to a 3GPP system.
[0045] Note that, in the description herein, reference may be made to the term “cell”; however, particularly with respect to 5G NR concepts, beams may be used instead of cells and, as such, it is important to note that the concepts described herein are equally applicable to both cells and beams.
[0046] Existing beam tracking technologies depend on the base station sending reference signals on multiple possible beams, and the UE responding back to the base station with an index of the beam that gives the best S/N ratio. With the increased number of antennas needed for beamforming to maintain system throughput and coverage at millimeter wave frequencies, beam tracking and management will become a much more important and much more difficult problem. This is because the increase in the number of possible beams to sweep will result in increased system latency and overhead. Thus, there is a need for systems and methods to address this problem.
[0047] Systems and methods are disclosed herein in which visual assistance information from a visual system including one or more cameras is provided to a network node and utilized by the network node to perform one or more network operations such as, e.g., beam tracking, beam steering, handover, etc. With regard to beam tracking (i.e., beam selection), the present disclosure describes systems and methods that utilize a camera(s) and image processing to get a better estimate of where UEs are in relation to the base station. Such a visual system can, e.g., help the base station rapidly determine the most likely direction to initiate its beam selection process, thereby saving time and improving the likelihood of a high throughput connection to each potential user.
[0048] In some embodiments, a UE’s location can first be estimated using existing beam tracking schemes. Visual assistance information from the visual system that indicates the position of an object is then correlated with the beam selected via the existing beam tracking scheme to thereby correlate the object detected by the visual system to the UE. Thereafter, beam selection for that UE is performed based on visual assistance information regarding the correlated object obtained from the visual system. In some embodiments, the visual system includes a camera having a field of view that is matched to a coverage area of the antenna system of the base station, and the UE’s angle referenced to the antenna system of the base station can be estimated by averaging the pixels that it occupies and then recording its (x, y) coordinates relative to the origin (front-facing direction, center of the image). From there, the angle can be calculated and used to select the appropriate beam for the UE. In some embodiments, stereoscopic imaging can be used to create a Three Dimensional (3D) plot and estimate the distance to the user.
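A minimal sketch of this pixel-to-angle estimation is given below (Python; it assumes a simple pinhole camera whose field of view matches the antenna coverage area, and the function and parameter names are illustrative rather than part of any standardized interface):

    import math

    def centroid(pixels):
        # Average the (x, y) coordinates of the pixels occupied by the detected object.
        xs, ys = zip(*pixels)
        return sum(xs) / len(xs), sum(ys) / len(ys)

    def pixel_to_angles(pixel_xy, image_size, fov_deg):
        # Convert a pixel centroid to azimuth/elevation angles (degrees) relative to
        # the camera boresight (front-facing direction, center of the image).
        (x, y), (width, height) = pixel_xy, image_size
        fov_h, fov_v = math.radians(fov_deg[0]), math.radians(fov_deg[1])
        f_x = (width / 2) / math.tan(fov_h / 2)   # focal length in pixels (horizontal)
        f_y = (height / 2) / math.tan(fov_v / 2)  # focal length in pixels (vertical)
        azimuth = math.degrees(math.atan((x - width / 2) / f_x))
        elevation = math.degrees(math.atan((height / 2 - y) / f_y))  # image y grows downward
        return azimuth, elevation

    # Example: object pixels centered near (960, 300) in a 1280x720 frame,
    # camera field of view of 90 x 50 degrees matched to the antenna coverage area
    print(pixel_to_angles(centroid([(958, 298), (962, 302)]), (1280, 720), (90, 50)))

The resulting azimuth/elevation pair can then be mapped to the nearest codebook beam, as discussed further below.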
[0049] In addition, in some embodiments, the visual system can help the base station rapidly estimate the UE density in the cell and/or at various locations of the cell. As another example, in scenarios in which the uplink Signal to Noise Ratio (SNR) is poor, the visual system can be used to assist the base station in locating particular UEs.
[0050] As discussed below, while some embodiments utilize the visual assistance information for beam selection, the present disclosure is not limited thereto. The visual assistance information can be used by the base station or other network nodes to perform any suitable type of network operation.
[0051] While not being limited to or by any particular advantage, embodiments of the present disclosure provide a number of advantages. For example, embodiments of the present disclosure provide better accuracy in estimating where a UE is located, especially for a moving object, which leads to faster beam tracking, higher throughput and user experience, lower system latency, and lower overhead. The UE’s environment, direction, and movement can also be better estimated. This gives more intelligence to the base station as to what is happening around the base station. Also, there will be lower latency in
identifying which beam fits the UE best, and improved overall system capacity and responsiveness.
[0052] Before describing embodiments of the present disclosure, some information regarding cameras and object detection technology that may be utilized by the visual system described herein is beneficial.
[0053] The following definitions and background descriptions are sourced from Wikipedia. See references [1]-[9].
• “Stereopsis is a term that is most often used to refer to the perception of depth and 3-dimensional structure obtained on the basis of visual information deriving from two eyes.” (see reference [1 ]). “A stereo camera is a type of camera with two or more lenses with a separate image sensor or film frame for each lens. This allows the camera to simulate human binocular vision, and therefore gives it the ability to capture three-dimensional images.” (see reference [2]). “Computer stereo vision is the extraction of 3D information from digital images, such as obtained by a CCD camera. By comparing information about a scene from two vantage points, 3D information can be extracted by examination of the relative positions of objects in the two panels.” (see reference [3]).
• “Object detection is a computer technology related to computer vision and image processing that deals with detecting instances of semantic objects of a certain class (such as humans, buildings, or cars) in digital images and videos.” (see reference [4]). Object detection can be achieved through training a computer using machine learning techniques (see reference [5]), by feeding it data of what a certain object usually looks like. Through enough examples, the computer will be eventually able to recognize some features and determine by itself what the object is. Google owns a currently dominant object detection API. (see reference [6]). Video in the following footnote shows an example of what can be done (see reference [7]).
  • “Kalman filtering is an algorithm that uses a series of measurements observed over time, containing statistical noise and other inaccuracies, and produces estimates of unknown variables that tend to be more accurate than those based on a single measurement alone.” (see reference [8]). Tracking moving objects and predicting their speed and future position can be achieved through a Kalman Filter (see reference [9]).
[0054] With regard to object detection, one question is how far the camera can see for the purposes of object detection. This question is addressed in a whitepaper by CohuHD Costar entitled “How Far Can I See,” 2014
(http://www.cohuhd.com/Files/white_papers/How_Far_Can_l_See.pdf). As discussed in this white paper, the Johnson’s Criteria is a method used in security and defense to note down the number of pixels on a target needed to classify an object of interest. This classification can be divided into: detection (i.e. , an object of the size you want to detect is present), recognition (i.e., determining what class the object belongs to such as building, truck, man, etc.), and identification (i.e., description of the object to the limit of the observer’s knowledge). As set forth in the white paper, the number of pixels needed for detection is 2 vertical pixels on the target, the number of pixels needed for recognition is 8 vertical pixels on the target, and the number of pixels needed for identification is 14 vertical pixels on the target. This is based on a 50% probability of positive assessment.
[0055] This whitepaper also states that there are three main criteria that determine how far a camera can see, which are: object size, field of view of the camera, and image resolution. Angle of view is another term for field of view. Regarding field of view, the field of view determines how much of a scene that the camera is going to see. Focal length is the distance between the lens and the image sensor when the subject is in focus. The field of view can be specified in degrees in vertical and horizontal or in meters in vertical and horizontal at the target location.
[0056] Many tools exist today to translate target distance and camera properties including focal length and width of lenses, etc. into field of view in meters at the target’s location.
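For reference, the standard pinhole-model relations behind such tools can be sketched as follows (Python; the lens and sensor values in the example are illustrative assumptions, not taken from any particular product):

    import math

    def field_of_view(focal_length_mm, sensor_width_mm, sensor_height_mm, distance_m):
        # Angular field of view from the lens focal length and sensor size,
        # plus the corresponding coverage in meters at a given target distance.
        fov_h = 2 * math.atan(sensor_width_mm / (2 * focal_length_mm))
        fov_v = 2 * math.atan(sensor_height_mm / (2 * focal_length_mm))
        width_at_target = 2 * distance_m * math.tan(fov_h / 2)
        height_at_target = 2 * distance_m * math.tan(fov_v / 2)
        return (math.degrees(fov_h), math.degrees(fov_v)), (width_at_target, height_at_target)

    # Example: 50 mm lens, 5.1 mm x 2.9 mm sensor (illustrative), target 1,000 m away
    print(field_of_view(50, 5.1, 2.9, 1000))  # ~5.8 x 3.3 degrees, ~102 m x ~58 m at the target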
[0057] Regarding image resolution, or camera resolution, the higher the image resolution, the greater the detail in the image.
[0058] The following is an example from the white paper that shows how the number of vertical pixels needed to identify a 6 foot tall person is determined:
• GIVEN
o Object Size : 6 ft. (V) x 2 ft. (H) [Person]
  o Camera Field of View : 259 ft. (H) x 145 ft. (V) [Measured @ 4,560 ft. from camera]
o Image Resolution : 1280 (H) x 720 (V) [720p, High Definition]
• Step 1 : Calculate Pixels per Foot in Image
o Pixels per Foot = Image Resolution / Field of View
Horizontal Pixels per Foot (HPF) = [ 1280 / 259 ] = 5 Pixels per Foot
Vertical Pixels per Foot (VPF) = [ 720 / 145 ] = 5 Pixels per Foot
• Step 2: Calculate Pixels on Object
o Pixels on Object (PoT) = Pixels per Foot x Object Size
o Horizontal PoT (HPoT) = 5 HPF x 2 ft. = 10 HPoT
  o Vertical PoT (VPoT) = 5 VPF x 6 ft. = 30 VPoT
• Step 3: Compare to your Detection Recognition Identification (DRI) Definition
o Here, we use the Johnson Criteria
Detection - 2 VPoT
Recognition - 8 VPoT
Identification - 14 VPoT
• Step 4: Result
o This example shows that you will have 30 VPoT on a 6 ft. man 4,560 ft. away, and will be able to identify him with better than 50% probability.
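The worked example above can be reproduced with a few lines of code (Python; only the whitepaper’s numbers are used, and the function name is an illustrative choice):

    def pixels_on_target(image_res, fov_at_target_ft, object_size_ft):
        # Pixels per foot in each dimension multiplied by the object size, per the whitepaper.
        hpf = image_res[0] / fov_at_target_ft[0]   # horizontal pixels per foot
        vpf = image_res[1] / fov_at_target_ft[1]   # vertical pixels per foot
        return hpf * object_size_ft[0], vpf * object_size_ft[1]

    JOHNSON_VPOT = {"detection": 2, "recognition": 8, "identification": 14}

    hpot, vpot = pixels_on_target((1280, 720), (259, 145), (2, 6))
    print(round(hpot), round(vpot))                # ~10 horizontal, ~30 vertical pixels on target
    print(vpot >= JOHNSON_VPOT["identification"])  # True: identification at better than 50% probability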
[0059] Now, turning to embodiments of the present disclosure, Figure 1 illustrates a system 100 in which a network node 102 utilizes visual assistance information from a visual system 104 to perform one or more network operations in accordance with embodiments of the present disclosure. The system 100 includes the network node 102 and the visual system 104. The network node 102 can be any network node in any type of wireless communication system, preferably one that utilizes beamforming but not necessarily limited thereto. Examples of wireless communication systems in which embodiments of the present disclosure may be used are a cellular communications system (e.g., 5G NR), WiFi, and microwave Peer to Peer (P2P) systems. In some preferred embodiments, the network node 102 is a network node in a cellular
communications network such as, e.g., a 5G NR network. In this regard, the network node 102 may be a base station (i.e., a gNB in 5G NR terminology) or a core network node (e.g., a network node implementing a 5G Core (5GC) core network function). The network node 102 includes a correlation function 106 and a task performance function 108, each of which may be implemented, e.g., in software that is executed by one or more processors (e.g., Central Processing Units (CPUs), Application Specific Integrated Circuits (ASICs), or the like) of the network node 102. The correlation function 106 correlates objects detected by the visual system 104 to wireless devices (e.g., UEs) served by the wireless communication network based on respective visual assistance information obtained from the visual system 104 for the objects and knowledge that the network node 102 has regarding the wireless devices (e.g., current and possibly recent past beam indices used for the wireless devices). Note that a“beam index” is the nomenclature used to identify the position and the power of an individual RF beam to or from the base station. The task performance function 108 performs one or more network operations, or tasks, associated with the wireless devices based on the visual assistance information obtained from the visual system 104.
[0060] The visual system 104 includes one or more cameras 1 10 and one or more processing units 1 12 implementing an object detection function 1 14 and a visual assistance information reporting function 1 16. The one or more cameras 1 10 may include a single camera or multiple cameras. For example, in some embodiments, the network node 102 is a base station, and the one or more cameras 1 10 include a single camera 1 10 that has a field of view that matches a coverage area of an antenna system of the base station. As another example, the one or more cameras 1 10 include multiple cameras, e.g., positioned at known physical locations, e.g., around an urban environment or inside a building. The one or more cameras 1 10 operate to capture visual data, which includes images and/or video, and provide the visual data and associated metadata (e.g., camera attributes such as focal length, image resolution, etc.) to the processing unit 1 12. At the processing unit 1 12, the object detection function 1 14 processes the visual data and associated metadata to perform object detection using any suitable object detection technique. Note that object detection techniques are well-known in the art. As such, the details of object detection are not described herein. The visual assistance information reporting function 1 16 processes the output of the object detection function 1 14 to generate and report visual assistance information to the network node 102. In general, the visual assistance information includes information regarding one or more objects detected by the object detection function 1 14. In one example embodiment, for each object, the visual assistance information for that object includes:
• Object Identifier (ID): A unique identifier assigned to the object, e.g., by the object detection function 1 14.
• Position Information: Information that indicates a relative position(s) of the object within a field of view of the respective camera 1 10 that captured the respective visual data in which the object was detected. For example, for each time instant in a set of time instants in which the object is detected, the visual assistance information includes information that indicates the position of the object within the field of view of the respective camera 1 10 at that time instant. The set of time instants may be a single time instant or multiple time instants. The information that indicates the position of the object within the field of view may include an azimuth (horizontal) angle and an elevation (vertical) angle relative to a reference point within the field of view (i.e., a point at the center of the field of view). If the camera 1 10 is, e.g., a stereoscopic camera, the information that indicates the position of the object within the field of view may also include a distance from the camera 1 10.
• (Optional) Velocity Information: Information that indicates a velocity of each of the detected objects, e.g., at each time instant in the set of time instants for which the visual assistance information includes position information for the detected object.
• (Optional) Object Classification: The visual assistance information may include, for each object, information that indicates the
classification of the object (e.g., a person, an automobile, a building, etc.).
The visual assistance information may be reported to the network node 102 in any desired manner. For example, the visual assistance information may be reported to the network node 102 upon request (i.e., on-demand), periodically (e.g., at a predefined or preconfigured periodicity), or in real-time.
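One possible, non-normative way to structure the per-object visual assistance information described above is sketched below (Python dataclasses; all names and fields are assumptions for illustration, not a standardized message format):

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class ObjectObservation:
        # One observation of a detected object at a single time instant.
        timestamp: float                       # time instant of the observation
        azimuth_deg: float                     # relative position within the camera field of view
        elevation_deg: float
        distance_m: Optional[float] = None     # only available if, e.g., a stereoscopic camera is used
        velocity_mps: Optional[float] = None   # optional velocity estimate

    @dataclass
    class VisualAssistanceReport:
        # Visual assistance information for one detected object.
        object_id: str                                                  # unique ID from the object detection function
        observations: List[ObjectObservation] = field(default_factory=list)
        classification: Optional[str] = None                            # optional, e.g., "person", "automobile", "building"

A report of this kind could be delivered on request, periodically, or in real time, matching the reporting options listed above.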
[0061 ] Figure 2 illustrates the operation of the network node 102 and the visual system 104 in accordance with some embodiments of the present disclosure. As illustrated, the visual system 104, and in particular the camera(s)
1 10, capture visual data (e.g., images and/or video) (step 200). The visual system 104, and in particular the object detection function 1 14, performs object detection using the captured visual data to thereby detect one or more objects within the field of view of the camera(s) 1 10 (step 202). The visual assistance information reporting function 1 16 processes the output of the object detection function 1 14 to generate visual assistance information (step 204). As discussed above, the visual assistance information includes, for each detected object, an object ID of the object, position information for the object, (optional) velocity information, and (optional) classification type of the object. The visual assistance information reporting function 1 16 of the visual system 104 reports the visual assistance information to the network node 102 (step 206).
[0062] The network node 102 receives the visual assistance information from the visual system 104. Upon receiving the visual assistance information from the visual system 104, the correlation function 106 of the network node 102 correlates at least some of the visual assistance information to one or more wireless devices (e.g., UEs) served by the wireless communication network (step 208). More specifically, the correlation function 106 correlates known information related to the positions of the wireless devices served by the wireless
communication network with the position information of the detected objects included in the visual assistance information. The known information related to the positions of the wireless devices may include, e.g., beam indices selected for the wireless devices, e.g., using conventional beam selection techniques for one or more transmission time intervals. If the beam index (or beam indices) selected for a particular wireless device at a particular transmission time interval(s) matches the position(s) of a particular detected object for a
corresponding time instant(s), then that particular wireless device is determined to match (i.e., correlate to) that particular detected object. Here, a predefined mapping between beam indices and position information (e.g., azimuth and elevation angles) within the field of view of the camera(s) 110 can be used to determine whether a particular beam index selected for a wireless device “matches” the position information of a detected object. Note that while beam indices are used in the example above, other types of known information related to the positions of the wireless devices may be used. For example, Global Positioning System (GPS) position or the like may alternatively be used.
[0063] The network node 102, and in particular the task performance function 108, then performs one or more network operations associated with the wireless devices based on the visual assistance information for the correlated objects detected by the visual system 104 (step 210). The network operation(s) may include beam selection or beam tracking, handover, channel estimation, intelligent beam steering (i.e., beam optimization, which may include adjustment of angle, power, and/or number of beams used to reach a particular UE), security operation(s), and/or Internet of Things (IoT) operation(s). For example, for beam selection, the task performance function 108 may predict a future location of a particular wireless device based on the visual assistance information for the correlated object received in step 206 and/or based on future visual assistance information received from the visual system 104 for the same object. In other words, the position (and, e.g., velocity) of the wireless device can be tracked using the visual assistance information reported by the visual system 104 for the correlated object and used to select the appropriate beam indices for the wireless device.
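A simplified sketch of the correlation of step 208 described above is given below (Python; the angular tolerance, the minimum number of matching time instants, and the dictionary layout are illustrative assumptions):

    def correlate(object_track, device_beam_history, beam_to_angle, tol_deg=2.0, min_matches=3):
        # Decide whether a detected object corresponds to a served wireless device by
        # comparing the object's reported angles with the angles of the beams selected
        # for the device via conventional beam tracking.
        #   object_track:        {timestamp: (azimuth_deg, elevation_deg)} from the visual system
        #   device_beam_history: {timestamp: beam_index} known to the network node
        #   beam_to_angle:       predefined mapping {beam_index: (azimuth_deg, elevation_deg)}
        matches = 0
        for t, beam_index in device_beam_history.items():
            if t not in object_track:
                continue
            obj_az, obj_el = object_track[t]
            beam_az, beam_el = beam_to_angle[beam_index]
            if abs(obj_az - beam_az) <= tol_deg and abs(obj_el - beam_el) <= tol_deg:
                matches += 1
        return matches >= min_matches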
[0064] As another example, the visual assistance information for an object correlated to a particular wireless device may be utilized to determine when the wireless device is about to move from one cell to another (e.g., from a cell outside of a building to a cell inside a building as the correlated object is moving into an entryway of the building) and initiate handover of the wireless device to the other cell.
[0065] As another example, the camera(s) 1 10 may include multiple cameras 1 10 distributed throughout a building (e.g., a large shopping mall), and the visual assistance information for an object moving throughout the building may be used by the network node 102 to, e.g., track the correlated wireless device within the building, initiate handovers between different cells within the building, or intelligently steer a beam toward the wireless device based on its position within the building and a known layout of the building.
[0066] Some additional examples are described below. However, it should be noted that the network operations described herein are only examples. The visual assistance information can be used for any type of network operation in which knowledge of the position of objects (e.g., objects correlated to wireless devices and/or objects that are not correlated to wireless devices but impact the operation of the wireless communication network (e.g., buildings impacting channel conditions)) is beneficial.
[0067] Some particular aspects of an example embodiment in which the visual assistance information is used by the network node 102 to perform beam selection will now be described.
[0068] In this example, the network node 102 is a base station, and the camera(s) 1 10 include a video camera that is, e.g., mounted together with an antenna system of the base station, where the field of view of the video camera covers the same field of view as the antenna system of the base station. The video camera captures real time video with a high number of frames per second. This will give the base station the ability to detect wireless devices (referred to in this example as UEs), whether the UEs are stationary or moving. This allows increased beam tracking and steering accuracy and speed, which leads to higher user throughput. Tracking via the visual assistance information from the visual system 104 is done once there is a correlation between an object detected in the video and the position of a corresponding UE (e.g., a correlation between the position information for the detected object and the beam index(ices) selected for the UE via a conventional beam selection scheme). Once there is a correlation, the frames from the video camera can be used to track and predict movement of the object and thus the movement of the UE. The direction (e.g., azimuth and elevation angle) to the UE can be estimated and converted to a beam index.
This reduces the system latency and overhead when identifying the best possible beam for the UE.
[0069] The system can be used either with a codebook-based pattern generation system that uses precoding to create the desired antenna response pattern, or with a high-resolution beamforming system, such as exists in millimeter wave phased array systems, which use phase and amplitude control of the antenna element signals to create high resolution beams, which can be aimed within the coverage area. In the following discussion, wherever the phrase “codebook” is used, the same principles can be applied to phased array beamforming systems which use a “grid of beams” or similar approach to beamforming equally well.
[0070] The number of possible beams in the codebook is based on the beam width and the intended coverage area. A beam will be created for every possible angle within the intended coverage area. This, along with the UE angle referenced to the antenna system of the base station estimated by the visual system 104, will be used as the input to the task performance function 108 for beam selection. The UE’s (i.e., the correlated object’s) angle referenced to the base station antenna system can be estimated by averaging the pixels that it occupies and then recording its (x, y) coordinates relative to the origin (front-facing direction, center of the image). From there, the angle can be calculated.
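Given such an angle estimate, selecting the beam reduces to a nearest-neighbor lookup in the codebook, as in the following sketch (Python; the 2-degree grid over a 120 x 30 degree sector is an illustrative assumption):

    def select_beam(azimuth_deg, elevation_deg, codebook):
        # Pick the codebook entry whose pointing direction is closest to the
        # estimated angle of the UE, referenced to the antenna system.
        # codebook: list of (beam_index, azimuth_deg, elevation_deg) entries.
        def angular_error(entry):
            _, beam_az, beam_el = entry
            return (azimuth_deg - beam_az) ** 2 + (elevation_deg - beam_el) ** 2
        return min(codebook, key=angular_error)[0]

    # Example: a 2-degree grid of beams covering a 120 x 30 degree sector
    codebook = []
    index = 0
    for az in range(-60, 61, 2):
        for el in range(-15, 16, 2):
            codebook.append((index, az, el))
            index += 1
    print(select_beam(26.6, 4.4, codebook))  # index of the beam pointing nearest to (26, 5) degrees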
[0071 ] Doing this in the case of Line of Sight (LoS) gives the base station the ability to identify which beam will best match the position of the UE much faster than before. Via the visual assistance information, the base station knows where the UE is positioned (e.g., the angle at which the corresponding object is located in reference to the antenna system of the base station) and can then use the beam that best matches the position of the UE. This will speed up the system responsiveness to identifying where the UE is and tracking the UE’s movement, since it is not necessary to continuously sweep the codebook. This will also result in reduction in system overhead. Using the visual system 104, the UE’s movement can be predicted, which enables the base station to initiate action earlier.
[0072] In order to calculate the latency of the new system, one should account for three things. First, the camera’s rate (i.e., frames per second) should be considered. There has been a lot of progress in this area with some cameras today reaching 1 trillion Frames Per Second (FPS), which gives a latency of 1 picosecond. However, for latency targets in the sub-millisecond range, such a high frame rate is not required. For reference, a pedestrian user may change beams about once per second depending on their walking speed and distance from the base station. For example, assuming a beam width of 2 degrees, at a distance of 30 meters between the user and the base station and a walking speed of 1.4 meters per second, the user would stay in the same beam for about 1 second. Second, the image processing, including machine learning/object detection and angle calculation, should be considered. This is where the bulk of the work will be and where most of the processing time will be spent. Google’s specialized hardware Tensor Processing Units (TPUs) claim a latency of 7 milliseconds (ms). The second class of hardware that can be used is Graphics Processing Units (GPUs), which are cheaper but come at a latency on the order of 100 ms. The third class of hardware is general purpose CPUs. These are much cheaper, but latency increases even more. Third, the conversion from an angle to the UE to a set of new beamforming coefficients for the UE should be considered. This should be quite fast as, for example, a look-up table can be used for the mapping of UE angle to beamforming coefficients or beam index, and the beamforming coefficients can then be provided to the Field Programmable Gate Array (FPGA) / ASIC responsible for making the changes.
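The beam dwell time quoted for the pedestrian example can be checked with a small-angle approximation (Python; illustrative only):

    import math

    def beam_dwell_time(beam_width_deg, distance_m, speed_mps):
        # Rough time a user stays within one beam, assuming movement roughly
        # perpendicular to the boresight and a narrow beam width.
        beam_footprint_m = distance_m * math.radians(beam_width_deg)
        return beam_footprint_m / speed_mps

    # 2-degree beam, 30 m from the base station, 1.4 m/s walking speed
    print(beam_dwell_time(2, 30, 1.4))  # ~0.75 s, i.e., on the order of one beam change per second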
[0073] In some embodiments, machine learning technology improvements can be made in hardware and software allowing the previously mentioned latency numbers to be reduced further, and for the hardware to become cheaper. This will allow for smoother integration of machine learning and beamforming technologies. Another thing to keep in mind is that the latency of the proposed system can be reduced in the case of tracking a user’s movement by using a Kalman Filter. Images from previous frames and images from current frames can be used to make a prediction where the UE will be in future frames. From that, the angle to the UE can be estimated and then translated to a beam index defined by a codebook.
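As an illustration of the prediction idea, a minimal constant-velocity Kalman filter over a single image coordinate is sketched below (Python with NumPy; the frame rate, noise parameters, and prediction horizon are illustrative assumptions):

    import numpy as np

    def kalman_predict_track(measurements, dt=1 / 30, meas_var=4.0, accel_var=1.0, horizon=5):
        # Minimal constant-velocity Kalman filter over one image coordinate.
        # measurements: noisy pixel positions of the tracked object, one per frame.
        # Returns the predicted position `horizon` frames into the future.
        F = np.array([[1.0, dt], [0.0, 1.0]])                  # state transition (position, velocity)
        H = np.array([[1.0, 0.0]])                             # only position is measured
        Q = accel_var * np.array([[dt ** 4 / 4, dt ** 3 / 2],
                                  [dt ** 3 / 2, dt ** 2]])     # process noise
        R = np.array([[meas_var]])                             # measurement noise
        x = np.array([[measurements[0]], [0.0]])               # initial state
        P = np.eye(2) * 100.0                                  # initial uncertainty

        for z in measurements[1:]:
            x = F @ x                                          # predict
            P = F @ P @ F.T + Q
            y = np.array([[z]]) - H @ x                        # update with the new measurement
            S = H @ P @ H.T + R
            K = P @ H.T @ np.linalg.inv(S)
            x = x + K @ y
            P = (np.eye(2) - K @ H) @ P

        # Extrapolate `horizon` frames ahead to predict where the object (UE) will be.
        return float((np.linalg.matrix_power(F, horizon) @ x)[0, 0])

    # Example: an object drifting right by ~3 pixels per frame with measurement noise
    track = [100 + 3 * i + float(np.random.randn()) for i in range(30)]
    print(kalman_predict_track(track))  # roughly 100 + 3 * (29 + 5) = 202

The predicted pixel position can then be converted to an angle and a beam index exactly as in the earlier sketches.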
[0074] For Multiple Input Multiple Output (MIMO), where the transmitter will be sending different traffic data on different antennas, the visual assistance information for objects, including buildings, UEs, etc. can be used by the base station to estimate channel conditions and calculate precoders that best fit the scenario. For example, using the stereoscopic camera, the distance from the antenna system of the base station to buildings and UEs can be used to estimate signal propagation conditions, which can then be used to calculate the best precoder coefficients and select an ideal number of MIMO layers to fit the scenario. This application is more relevant for LTE/NR frequency bands (<= 3.5 gigahertz (GHz)) in a dense urban environment. Multiple UEs moving in different directions can be also detected, Multiple User MIMO (MU-MIMO) can then be used to give them all a better experience.
[0075] There are many Application Programming Interfaces (APIs) that can be used to detect objects and identify what those objects are. One example is TensorFlow, which is owned by Google. In this case, the visual assistance information can include an indication of the classification of the detected object(s). The classifications of the detected objects will allow the base station to be more intelligent and better predict network usage. For example, in an indoor scenario where there is a concert, a video camera’s images with this API can give the ability to detect how many people are in the same location, and this information can be used to let the network know to be prepared in case of possible spike usage. Towards the end of the concert when people are leaving the building, image detection can tell the base station that everyone is leaving the building so that the base station can prepare for a handover event from an indoor cell to an outdoor cell.
[0076] Another use case is an outdoor setting where a UE is moving and ends up being behind a building where network coverage would deteriorate if there is no LoS. Using object detection from images captured from the video camera along with machine learning, the base station can know that when a UE ends up in that location, the UE’s network performance will degrade so that the base station can take one or more actions to address this issue such as, e.g., steering the beam to create a reflection that will reach the user or initiating a handover to a neighboring base station. Machine learning can also be used to teach the base station to detect from the detected objects what a “user” of a UE looks like, whether that is a human or something else.
[0077] Another example is an outdoor setting in which a UE is moving away from the direction that the antenna system of the base station is facing. In this case, throughput is going down, but the base station does not know why. Once Received Signal Strength Indicator (RSSI) goes below a certain threshold, a handover event is triggered. By using stereoscopic vision, which relies on using a stereo camera, along with possible use of a Kalman Filter, the UE’s movement, distance, direction, etc. can be detected, thereby giving the base station a much higher accuracy and resolution estimate of the position of the UE such that the beam can be adjusted and a handover event can be predicted and prepared for sooner. This gives better network conditions to the UE. The movement does not have to be in the direction where the antenna system of the base station is facing; it could be in any direction. By using stereo imaging, the UE’s movement velocity can be detected, which can be used by the base station to handle handover more smoothly, which avoids interruption of connection.
[0078] As another example, an orientation of the UE is detected by the visual system 104. The network node 102 can inform the UE of the UE’s orientation in order to assist the UE in beam selection on the uplink side. This can potentially save battery lifetime as the UE beam searching and selection algorithm is simplified.
[0079] Another application is related to Passive Intermodulation (PIM) detection. A PIM detection algorithm could approximate the angle and distance to PIM sources relative to the antenna system of the base station based on radio internal algorithms. With a stereoscopic camera, the accuracy of prediction of the angle and distance to objects the camera can see can be improved. The estimate from the distance to PIM feature can then be correlated with what is seen or detected by the visual system, and possible PIM sources including another antenna or rust on cables, etc. can be identified. In other words, the visual system can work together with a PIM detection function. The PIM detection function would estimate distance and/or angle to the PIM. The visual system could then, for example, highlight areas of the viewing area or
pictures/video captured by the camera to help identify PIM sources. By making the camera view of the detected PIM source available to the operator, this can help an onsite person to isolate the exact source of the PIM and potentially remove or correct it. This results in quicker and more effective site maintenance and possible removal or repair of the offending PIM source, since identifying what is the PIM source in a network setup is time consuming.
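A simplified sketch of how a radio-side PIM estimate could be matched against objects reported by the visual system is given below (Python; the field names, tolerances, and example values are purely illustrative assumptions):

    def candidate_pim_sources(pim_estimate, detected_objects, angle_tol_deg=5.0, dist_tol_m=3.0):
        # Match the PIM detection function's angle/distance estimate (relative to the
        # antenna system) against objects seen by the visual system, so that likely
        # PIM sources can be highlighted in the camera view for site maintenance.
        candidates = []
        for obj in detected_objects:
            if (abs(obj["azimuth_deg"] - pim_estimate["azimuth_deg"]) <= angle_tol_deg
                    and abs(obj["distance_m"] - pim_estimate["distance_m"]) <= dist_tol_m):
                candidates.append(obj)
        # Closest angular matches first, for display in the maintenance view.
        return sorted(candidates, key=lambda o: abs(o["azimuth_deg"] - pim_estimate["azimuth_deg"]))

    print(candidate_pim_sources(
        {"azimuth_deg": 12.0, "distance_m": 40.0},
        [{"object_id": "obj-1", "azimuth_deg": 11.0, "distance_m": 38.5, "classification": "antenna"},
         {"object_id": "obj-2", "azimuth_deg": -30.0, "distance_m": 10.0, "classification": "person"}]))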
[0080] Another application is in security, where images and/or videos can be correlated with a communication or communication session (e.g., a text message sent or received, a website viewed) that was part of criminal activity. In this case, the network node 102 may request an image(s) or video(s) of a particular object of interest during a period of time during which the communication or
communication session occurred. Alternatively, images and/or videos may be included in the visual assistance information provided to the network node 102.
[0081 ] Another application is in loT, where sensor locations can be visually pinpointed. In a specialized example, in a crop field setting, a captured image which shows smoke or fire could be correlated with a high temperature sensor reading, leading to a warning system with better intelligence of what the problem is.
[0082] Figure 3 illustrates one example of a cellular communications network 300 in which the system 100 of Figure 1 may be implemented according to some embodiments of the present disclosure. In the embodiments described herein, the cellular communications network 300 is a 5G NR network. In this example, the cellular communications network 300 includes base stations 302-1 and 302-2, which in 5G NR are referred to as gNBs, controlling corresponding macro cells 304-1 and 304-2. The base stations 302-1 and 302-2 are generally referred to herein collectively as base stations 302 and individually as base station 302. Likewise, the macro cells 304-1 and 304-2 are generally referred to herein collectively as macro cells 304 and individually as macro cell 304. The cellular communications network 300 may also include a number of low power nodes 306-1 through 306-4 controlling corresponding small cells 308-1 through 308-4. The low power nodes 306-1 through 306-4 can be small base stations (such as pico or femto base stations) or Remote Radio Heads (RRHs), or the like. Notably, while not illustrated, one or more of the small cells 308-1 through 308-4 may alternatively be provided by the base stations 302. The low power nodes 306-1 through 306-4 are generally referred to herein collectively as low power nodes 306 and individually as low power node 306. Likewise, the small cells 308-1 through 308-4 are generally referred to herein collectively as small cells 308 and individually as small cell 308. The base stations 302 (and optionally the low power nodes 306) are connected to a core network 310.
[0083] The base stations 302 and the low power nodes 306 provide service to wireless devices 312-1 through 312-5 in the corresponding cells 304 and 308.
The wireless devices 312-1 through 312-5 are generally referred to herein collectively as wireless devices 312 and individually as wireless device 312. The wireless devices 312 are also sometimes referred to herein as UEs.
[0084] In some embodiments, the functionality of the network node 102 described above is implemented in the base stations 302 and/or the low power nodes 306. The visual system 104 is not illustrated in Figure 3, but should be understood to be present and operating as described above to provide visual assistance information to the respective base station(s) 302 and/or the respective low power node(s) 306.
[0085] Figure 4 illustrates a wireless communication system represented as a 5G network architecture composed of core Network Functions (NFs), where interaction between any two NFs is represented by a point-to-point reference point/interface. Figure 4 can be viewed as one particular implementation of the system 300 of Figure 3.
[0086] Seen from the access side, the 5G network architecture shown in Figure 4 comprises a plurality of UEs connected to either a Radio Access Network (RAN) or an Access Network (AN), as well as to an Access and Mobility Management Function (AMF). Typically, the (R)AN comprises base stations, e.g. eNBs or 5G base stations (gNBs) or similar. Seen from the core network side, the 5G core NFs shown in Figure 4 include a Network Slice Selection Function (NSSF), an Authentication Server Function (AUSF), a Unified Data Management (UDM), an AMF, a Session Management Function (SMF), a Policy Control Function (PCF), and an Application Function (AF).
[0087] Reference point representations of the 5G network architecture are used to develop detailed call flows in the normative standardization. The N1 reference point is defined to carry signaling between the UE and the AMF. The reference points for connecting between the AN and the AMF and between the AN and the UPF are defined as N2 and N3, respectively. There is a reference point, N11, between the AMF and the SMF, which implies that the SMF is at least partly controlled by the AMF. N4 is used by the SMF and the UPF so that the UPF can be configured using control signaling generated by the SMF and the UPF can report its state to the SMF. N9 is the reference point for the connection between different UPFs, and N14 is the reference point connecting different AMFs. N15 and N7 are defined since the PCF applies policy to the AMF and the SMF, respectively. N12 is required for the AMF to perform authentication of the UE. N8 and N10 are defined because the subscription data of the UE is required by the AMF and the SMF.
[0088] The 5G core network aims at separating the user plane and the control plane. The user plane carries user traffic while the control plane carries signaling in the network. In Figure 4, the UPF is in the user plane and all other NFs, i.e., the AMF, SMF, PCF, AF, AUSF, and UDM, are in the control plane. Separating the user and control planes guarantees that the resources of each plane can be scaled independently. It also allows UPFs to be deployed separately from control plane functions in a distributed fashion. In this architecture, UPFs may be deployed very close to UEs to shorten the Round Trip Time (RTT) between the UEs and the data network for applications requiring low latency.
[0089] The core 5G network architecture is composed of modularized functions. For example, the AMF and the SMF are independent functions in the control plane. Separating the AMF and the SMF allows independent evolution and scaling. Other control plane functions like the PCF and the AUSF can be separated as shown in Figure 4. Modularized function design enables the 5G core network to support various services flexibly.

[0090] Each NF interacts with another NF directly, although it is possible to use intermediate functions to route messages from one NF to another. In the control plane, a set of interactions between two NFs is defined as a service so that it can be reused. This service model supports modularity. The user plane supports interactions such as forwarding operations between different UPFs.
[0091] Figure 5 illustrates a 5G network architecture using service-based interfaces between the NFs in the control plane, instead of the point-to-point reference points/interfaces used in the 5G network architecture of Figure 4.
However, the NFs described above with reference to Figure 4 correspond to the NFs shown in Figure 5. The service(s) that an NF provides to other authorized NFs can be exposed to those NFs through the service-based interface. In Figure 5 the service-based interfaces are indicated by the letter "N" followed by the name of the NF, e.g. Namf for the service-based interface of the AMF and Nsmf for the service-based interface of the SMF, etc. The Network Exposure Function (NEF) and the Network Repository Function (NRF) in Figure 5 are not shown in Figure 4 discussed above. However, it should be clarified that all NFs depicted in Figure 4 can interact with the NEF and the NRF of Figure 5 as necessary, though not explicitly indicated in Figure 4.
[0092] Some properties of the NFs shown in Figures 4 and 5 may be described in the following manner. The AMF provides UE-based authentication, authorization, mobility management, etc. Even a UE using multiple access technologies is basically connected to a single AMF, because the AMF is independent of the access technologies. The SMF is responsible for session management and allocates Internet Protocol (IP) addresses to UEs. It also selects and controls the UPF for data transfer. If a UE has multiple sessions, different SMFs may be allocated to each session to manage them individually and possibly provide different functionalities per session. The AF provides information on the packet flow to the PCF, which is responsible for policy control, in order to support Quality of Service (QoS). Based on this information, the PCF determines policies about mobility and session management to make the AMF and the SMF operate properly. The AUSF supports the authentication function for UEs or similar and thus stores data for authentication of UEs or similar, while the UDM stores subscription data of the UE. The Data Network (DN), which is not part of the 5G core network, provides Internet access or operator services and similar.
[0093] An NF may be implemented either as a network element on dedicated hardware, as a software instance running on dedicated hardware, or as a virtualized function instantiated on an appropriate platform, e.g., a cloud infrastructure.
[0094] In some embodiments, the functionality of the network node 102 described above is implemented in a core network node or core network entity such as one of the existing network functions illustrated in Figure 4 or 5 or as a new network function in addition to those illustrated in Figures 4 and 5.
[0095] Figure 6 is a schematic block diagram of a network node 600 according to some embodiments of the present disclosure. The network node 600 is one example implementation of the network node 102 described above. The network node 600 may be, for example, a base station 302 or 306. As illustrated, the network node 600 includes a control system 602 that includes one or more processors 604 (e.g., CPUs, ASICs, FPGAs, and/or the like), memory 606, and a network interface 608. The one or more processors 604 are also referred to herein as processing circuitry. In addition, if the network node 600 is a base station or other radio access node, the network node 600 includes one or more radio units 610 that each includes one or more transmitters 612 and one or more receivers 614 coupled to one or more antennas 616. The radio units 610 may be referred to or be part of radio interface circuitry. In some embodiments, the radio unit(s) 610 is external to the control system 602 and connected to the control system 602 via, e.g., a wired connection (e.g., an optical cable).
However, in some other embodiments, the radio unit(s) 610 and potentially the antenna(s) 616 are integrated together with the control system 602. The one or more processors 604 operate to provide one or more functions of a network node 600 as described herein. In some embodiments, the function(s) are implemented in software that is stored, e.g., in the memory 606 and executed by the one or more processors 604.
[0096] Figure 7 is a schematic block diagram that illustrates a virtualized embodiment of the network node 600 according to some embodiments of the present disclosure. As used herein, a "virtualized" radio access node is an implementation of the network node 600 in which at least a portion of the functionality of the network node 600 is implemented as a virtual component(s) (e.g., via a virtual machine(s) executing on a physical processing node(s) in a network(s)). As illustrated, in this example, the network node 600 includes one or more processing nodes 700 coupled to or included as part of a network(s)
702. Each processing node 700 includes one or more processors 704 (e.g., CPUs, ASICs, FPGAs, and/or the like), memory 706, and a network interface 708. In addition, if the network node 600 is a base station or other radio access node, the network node 600 includes the radio unit(s) 610 and, optionally, the control system 602.
[0097] In this example, functions 710 of the network node 600 (i.e., the network node 102) described herein are implemented at the one or more processing nodes 700 or distributed across the control system 602 and the one or more processing nodes 700 in any desired manner. In some particular embodiments, some or all of the functions 710 of the network node 600 described herein are implemented as virtual components executed by one or more virtual machines implemented in a virtual environment(s) hosted by the processing node(s) 700.
[0098] In some embodiments, a computer program including instructions which, when executed by at least one processor, causes the at least one processor to carry out the functionality of network node 600 (i.e., the functionality of the network node 102) or a node (e.g., a processing node 700) implementing one or more of the functions 710 of the network node 600 in a virtual
environment according to any of the embodiments described herein is provided. In some embodiments, a carrier comprising the aforementioned computer program product is provided. The carrier is one of an electronic signal, an optical signal, a radio signal, or a computer readable storage medium (e.g., a non- transitory computer readable medium such as memory).
[0099] Figure 8 is a schematic block diagram of the network node 600 according to some other embodiments of the present disclosure. The network node 600 includes one or more modules 800, each of which is implemented in software. The module(s) 800 provide the functionality of the network node 600 described herein, e.g., with respect to Figure 2. This discussion is equally applicable to the processing node 700 of Figure 7 where the modules 800 may be implemented at one of the processing nodes 700 or distributed across multiple processing nodes 700 and/or distributed across the processing node(s) 700 and the control system 602.
[0100] Figure 9 is a schematic block diagram of a UE 900 according to some embodiments of the present disclosure. As illustrated, the UE 900 includes one or more processors 902 (e.g., CPUs, ASICs, FPGAs, and/or the like), memory 904, and one or more transceivers 906 each including one or more transmitters 908 and one or more receivers 910 coupled to one or more antennas 912. The transceiver(s) 906 includes radio front-end circuitry connected to the antenna(s) 912 that is configured to condition signals communicated between the antenna(s) 912 and the processor(s) 902, as will be appreciated by one of ordinary skill in the art. The processors 902 are also referred to herein as processing circuitry. The transceivers 906 are also referred to herein as radio circuitry. In some
embodiments, the functionality of the UE 900 described above may be fully or partially implemented in software that is, e.g., stored in the memory 904 and executed by the processor(s) 902. Note that the UE 900 may include additional components not illustrated in Figure 9 such as, e.g., one or more user interface components (e.g., an input/output interface including a display, buttons, a touch screen, a microphone, a speaker(s), and/or the like and/or any other components for allowing input of information into the UE 900 and/or allowing output of information from the UE 900), a power supply (e.g., a battery and associated power circuitry), etc.

[0101] In some embodiments, a computer program including instructions which, when executed by at least one processor, causes the at least one processor to carry out the functionality of the UE 900 according to any of the embodiments described herein is provided. In some embodiments, a carrier comprising the aforementioned computer program product is provided. The carrier is one of an electronic signal, an optical signal, a radio signal, or a computer readable storage medium (e.g., a non-transitory computer readable medium such as memory).
[0102] Figure 10 is a schematic block diagram of the UE 900 according to some other embodiments of the present disclosure. The UE 900 includes one or more modules 1000, each of which is implemented in software. The module(s) 1000 provide the functionality of the UE 900 described herein.
[0103] Any appropriate steps, methods, features, functions, or benefits disclosed herein may be performed through one or more functional units or modules of one or more virtual apparatuses. Each virtual apparatus may comprise a number of these functional units. These functional units may be implemented via processing circuitry, which may include one or more microprocessors or microcontrollers, as well as other digital hardware, which may include Digital Signal Processors (DSPs), special-purpose digital logic, and the like. The processing circuitry may be configured to execute program code stored in memory, which may include one or several types of memory such as Read Only Memory (ROM), Random Access Memory (RAM), cache memory, flash memory devices, optical storage devices, etc. Program code stored in memory includes program instructions for executing one or more telecommunications and/or data communications protocols as well as instructions for carrying out one or more of the techniques described herein. In some implementations, the processing circuitry may be used to cause the respective functional unit to perform corresponding functions according to one or more embodiments of the present disclosure.
[0104] While processes in the figures may show a particular order of operations performed by certain embodiments of the present disclosure, it should be understood that such order is exemplary (e.g., alternative embodiments may perform the operations in a different order, combine certain operations, overlap certain operations, etc.).
[0105] At least some of the following abbreviations may be used in this disclosure. If there is an inconsistency between abbreviations, preference should be given to how it is used above. If listed multiple times below, the first listing should be preferred over any subsequent listing(s).
3D Three Dimensional
3GPP Third Generation Partnership Project
5G Fifth Generation
5GC Fifth Generation Core
AF Application Function
AMF Access and Mobility Management Function
AN Access Network
API Application Programming Interface
ASIC Application Specific Integrated Circuit
AUSF Authentication Server Function
Bits/s Bits per Second
Bits/s/Hz Bits per Second per Hertz
CPU Central Processing Unit
CSI-RS Channel State Information-Reference Signal
DN Data Network
DRI Detection Recognition Identification
DSP Digital Signal Processor
eNB Enhanced or Evolved Node B
FPGA Field Programmable Gate Array
FPS Frames Per Second
GHz Gigahertz
gNB New Radio Base Station
GPS Global Positioning System
• GPU Graphics Processing Unit
• HPF Horizontal Pixels per Foot
• HPoT Horizontal Pixels on Object
• ID Identifier
• IoT Internet of Things
• IP Internet Protocol
• LoS Line of Sight
• LTE Long Term Evolution
• MIMO Multiple Input Multiple Output
• MME Mobility Management Entity
• ms Millisecond
• MTC Machine Type Communication
• MU-MIMO Multiple User Multiple Input Multiple Output
• NEF Network Exposure Function
• NF Network Function
• NR New Radio
• NRF Network Repository Function
• NSSF Network Slice Selection Function
• P2P Peer-to-Peer
• PCF Policy Control Function
• PIM Passive Intermodulation
• P-GW Packet Data Network Gateway
• PoT Pixels on Object
• QoS Quality of Service
• RAM Random Access Memory
• RAN Radio Access Network
• ROM Read Only Memory
• RRH Remote Radio Head
• RSSI Received Signal Strength Indicator
• RTT Round Trip Time
• SCEF Service Capability Exposure Function
• SMF Session Management Function
SNR Signal to Noise Ratio
SSB Synchronization Signal Block
TPU Tensor Processing Unit
UDM Unified Data Management
UE User Equipment
VPF Vertical Pixels per Foot
VPoT Vertical Pixels on Object
[0106] Those skilled in the art will recognize improvements and modifications to the embodiments of the present disclosure. All such improvements and modifications are considered within the scope of the concepts disclosed herein.

Claims

What is claimed is:
1. A method of operation of a network node (102) in a wireless
communication network, comprising:
receiving (206), from a visual system (104) comprising one or more cameras (110), visual assistance information related to an object detected by the visual system (104);
correlating (208) the object detected by the visual system (104) to a wireless device served by the wireless communication network; and
performing (210) one or more network operations associated with the wireless device based on the visual assistance information related to the object detected by the visual system (104) that is correlated to the wireless device.
2. The method of claim 1 wherein performing (210) the one or more network operations comprises performing beam selection for a transmit beam for a downlink transmission from a radio access node of the wireless communication network to the wireless device and/or a receive beam for reception of an uplink transmission from the wireless device at the radio access node of the wireless communication network.
3. The method of claim 2 wherein:
the visual assistance information related to the object detected by the visual system (104) comprises information that indicates, for each time instant in a set of time instants, a relative position of the object within a field of view of the one or more cameras (110) at that time instant; and
correlating (208) the object detected by the visual system (104) to the wireless device served by the wireless communication network comprises determining that, for each time instant in at least a subset of the set of time instants, the relative position of the object within the field of view of the one or more cameras (110) at the time instant matches a beam index used for the wireless device at that time instant.
4. The method of claim 3 wherein performing beam selection for the transmit beam for the downlink transmission from the radio access node of the wireless communication network to the wireless device and/or the receive beam for reception of the uplink transmission from the wireless device at the radio access node of the wireless communication network comprises:
predicting a position of the wireless device during a future transmission time interval in which the downlink transmission and/or uplink transmission is to occur based on the visual assistance information related to the object detected by the visual system (104) that is correlated to the wireless device; and
selecting a beam index of the transmit beam and/or the receive beam based on a predefined mapping between the predicted position of the wireless device and the beam index.
5. The method of claim 3 or 4 wherein the information that indicates, for each time instant in the set of time instants, the relative position of the object within the field of view of the one or more cameras (110) at that time instant comprises: for each time instant in the set of time instants, information that indicates an azimuth angle and/or an elevation angle of the object within the field of view of the one or more cameras (110).
6. The method of claim 5 wherein the information that indicates, for each time instant in the set of time instants, the relative position of the object within the field of view of the one or more cameras (110) at that time instant further comprises:
for each time instant in the set of time instants, information that indicates a distance of the object from a known location of a respective one of the one or more cameras (110).
7. The method of claim 5 or 6 wherein the visual assistance information related to the object detected by the visual system (104) comprises information that indicates, for each time instant in the set of time instants, a velocity of the object.
8. The method of any one of claims 3 to 7 wherein the one or more cameras (110) consist of a camera (110) having a field of view that matches a coverage area of an antenna system of the radio access node used to provide
beamforming.
9. The method of claim 8 wherein the wireless device is in a line-of-sight of the antenna system of the radio access node.
10. The method of claim 1 wherein the wireless communication network is a cellular communication network, and performing (210) the one or more network operations comprises:
making a decision to perform a handover of the wireless device from one cell to another cell based on the visual assistance information related to the object detected by the visual system (104) that is correlated to the wireless device; and
initiating the handover of the wireless device from the one cell to the other cell upon making the decision.
11. The method of claim 1 wherein the one or more network operations comprise one or more security related operations.
12. The method of claim 1 wherein performing (210) the one or more network operations comprises:
obtaining, from the visual system (104), one or more images or one or more videos captured of the object; and associating the one or more images or the one or more videos with the wireless device or with a particular communication or communication session of the wireless device.
13. The method of claim 1 wherein:
the visual assistance information related to the object detected by the visual system (104) comprises, for a period of time, one or more images and/or one or more videos captured of the object during the period of time; and
performing (210) the one or more network operations comprises associating the one or more images and/or the one or more videos captured of the object during the period of time with a communication or communication session of the wireless device that occurred during the period of time.
14. The method of any one of claims 1 to 13 further comprising:
receiving (206), from the visual system (104), visual assistance information related to one or more additional objects detected by the visual system (104); and
performing (210) the one or more network operations comprises estimating a wireless communication channel for the wireless device based on the visual assistance information related to the one or more additional objects detected by the visual system (104).
15. The method of any one of claims 1 to 13 further comprising:
receiving (206), from the visual system (104), visual assistance information related to one or more additional objects detected by the visual system (104); and
performing (210) the one or more network operations comprises performing beam steering for the wireless device based on the visual assistance information related to the one or more additional objects detected by the visual system (104).
16. The method of any one of claims 1 to 13 further comprising:
receiving (206), from the visual system (104), visual assistance information related to one or more additional objects detected by the visual system (104); and
performing (210) the one or more network operations comprises identifying sources of passive intermodulation distortion for the wireless device based on the visual assistance information related to the one or more additional objects detected by the visual system (104).
17. The method of any one of claims 1 to 16 wherein the wireless
communication network is a cellular communications network.
18. A network node (102) for a wireless communication network, the network node (102) adapted to:
receive, from a visual system (104) comprising one or more cameras (110), visual assistance information related to an object detected by the visual system (104);
correlate the object detected by the visual system (104) to a wireless device served by the wireless communication network; and
perform one or more network operations associated with the wireless device based on the visual assistance information related to the object detected by the visual system (104) that is correlated to the wireless device.
19. The network node (102) of claim 18 wherein the network node (102) is further adapted to perform the method of any one of claims 2 to 17.
20. A network node (102) for a wireless communication network, comprising: a communication interface; and
processing circuitry associated with the communication interface, wherein the processing circuitry is operable to cause the network node (102) to: receive, from a visual system (104) comprising one or more cameras (110), visual assistance information related to an object detected by the visual system (104);
correlate the object detected by the visual system (104) to a wireless device served by the wireless communication network; and
perform one or more network operations associated with the wireless device based on the visual assistance information related to the object detected by the visual system (104) that is correlated to the wireless device.
21. The network node (102) of claim 20 wherein the processing circuitry is further operable to cause the network node (102) to perform the method of any one of claims 2 to 17.
22. A network node (102) for a wireless communication network, comprising: a receiving module operable to receive, from a visual system (104) comprising one or more cameras (110), visual assistance information related to an object detected by the visual system (104);
a correlating module operable to correlate the object detected by the visual system (104) to a wireless device served by the wireless communication network; and
a performing module operable to perform one or more network operations associated with the wireless device based on the visual assistance information related to the object detected by the visual system (104) that is correlated to the wireless device.
23. A computer program comprising instructions which, when executed on at least one processor, cause the at least one processor to carry out the method according to any one of claims 1-17.
24. A carrier containing the computer program of claim 23, wherein the carrier is one of an electronic signal, an optical signal, a radio signal, or a computer readable storage medium.
PCT/IB2018/057579 2018-09-28 2018-09-28 Systems and methods for video-assisted network operations WO2020065384A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/IB2018/057579 WO2020065384A1 (en) 2018-09-28 2018-09-28 Systems and methods for video-assisted network operations

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/IB2018/057579 WO2020065384A1 (en) 2018-09-28 2018-09-28 Systems and methods for video-assisted network operations

Publications (1)

Publication Number Publication Date
WO2020065384A1 true WO2020065384A1 (en) 2020-04-02

Family

ID=63963318

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2018/057579 WO2020065384A1 (en) 2018-09-28 2018-09-28 Systems and methods for video-assisted network operations

Country Status (1)

Country Link
WO (1) WO2020065384A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021255640A1 (en) * 2020-06-16 2021-12-23 King Abdullah University Of Science And Technology Deep-learning-based computer vision method and system for beam forming
CN113965874A (en) * 2020-07-03 2022-01-21 大唐移动通信设备有限公司 Wave beam forming signal sending method and base station equipment
WO2022033695A1 (en) * 2020-08-13 2022-02-17 Telefonaktiebolaget Lm Ericsson (Publ) Estimating a location of a user equipment
WO2023017301A1 (en) * 2021-08-10 2023-02-16 Telefonaktiebolaget Lm Ericsson (Publ) Vision assisted user clustering in mmwave/thz-noma systems
GB2609911A (en) * 2021-08-10 2023-02-22 British Telecomm Wireless telecommunications network
EP4152825A1 (en) * 2021-09-16 2023-03-22 Apple Inc. Tree canopy coverage determination for improved wireless connectivity
WO2023227193A1 (en) * 2022-05-23 2023-11-30 Telefonaktiebolaget Lm Ericsson (Publ) Direction-based communication
US11956053B2 (en) 2020-11-30 2024-04-09 British Telecommunications Public Limited Company Wireless telecommunications network

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170092109A1 (en) * 2015-09-30 2017-03-30 Alarm.Com Incorporated Drone-augmented emergency response services
WO2017061429A1 (en) * 2015-10-05 2017-04-13 日本電気株式会社 Antenna directivity control signal generation device, wireless communication device, wireless communication control system, and method and program for controlling antenna directivity
US20180144623A1 (en) * 2016-11-21 2018-05-24 Panasonic Intellectual Property Corporation Of America Intersection information distribution apparatus and intersection information distribution method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170092109A1 (en) * 2015-09-30 2017-03-30 Alarm.Com Incorporated Drone-augmented emergency response services
WO2017061429A1 (en) * 2015-10-05 2017-04-13 日本電気株式会社 Antenna directivity control signal generation device, wireless communication device, wireless communication control system, and method and program for controlling antenna directivity
US20180284217A1 (en) * 2015-10-05 2018-10-04 Nec Corporation Antenna directivity control signal generating apparatus, wireless communication apparatus, wireless communication control system, antenna control method, and non-transitory computer readable medium
US20180144623A1 (en) * 2016-11-21 2018-05-24 Panasonic Intellectual Property Corporation Of America Intersection information distribution apparatus and intersection information distribution method

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021255640A1 (en) * 2020-06-16 2021-12-23 King Abdullah University Of Science And Technology Deep-learning-based computer vision method and system for beam forming
CN113965874A (en) * 2020-07-03 2022-01-21 大唐移动通信设备有限公司 Wave beam forming signal sending method and base station equipment
CN113965874B (en) * 2020-07-03 2023-04-07 大唐移动通信设备有限公司 Wave beam forming signal sending method and base station equipment
WO2022033695A1 (en) * 2020-08-13 2022-02-17 Telefonaktiebolaget Lm Ericsson (Publ) Estimating a location of a user equipment
US11956053B2 (en) 2020-11-30 2024-04-09 British Telecommunications Public Limited Company Wireless telecommunications network
WO2023017301A1 (en) * 2021-08-10 2023-02-16 Telefonaktiebolaget Lm Ericsson (Publ) Vision assisted user clustering in mmwave/thz-noma systems
GB2609911A (en) * 2021-08-10 2023-02-22 British Telecomm Wireless telecommunications network
GB2609911B (en) * 2021-08-10 2023-09-20 British Telecomm Wireless telecommunications network
US20240267854A1 (en) * 2021-08-10 2024-08-08 British Telecommunications Public Limited Company Wireless telecommunications network
EP4152825A1 (en) * 2021-09-16 2023-03-22 Apple Inc. Tree canopy coverage determination for improved wireless connectivity
WO2023227193A1 (en) * 2022-05-23 2023-11-30 Telefonaktiebolaget Lm Ericsson (Publ) Direction-based communication

Similar Documents

Publication Publication Date Title
US11463980B2 (en) Methods and apparatuses using sensing system in cooperation with wireless communication system
WO2020065384A1 (en) Systems and methods for video-assisted network operations
US20220070843A1 (en) Beam management based on location and sensor data
WO2020042081A1 (en) Method and apparatus for location services
WO2019004925A1 (en) Method and network nodes for determining whether a wireless device is aerial
EP3793096B1 (en) Efficient data generation for beam pattern optimization
JP2021141578A (en) Future position estimation for improved reliability of connectivity
US11039388B2 (en) Cellular telecommunications network
US20240040542A1 (en) Method and Apparatus for Efficient Positioning
US20230231614A1 (en) Apparatus for selecting radio beams
US11812320B2 (en) Initiation of transfer of user equipment to base station according to visual data
US20230397063A1 (en) Spatially aware cells
US20240259834A1 (en) Method, apparatus, and system for high frequency beam acquisition
WO2024158006A1 (en) Access network node, core network node, user equipment, and method
WO2024162065A1 (en) Beam management in communication network
Li et al. Vision-aided Multi-user Beam Tracking for mmWave Massive MIMO System: Prototyping and Experimental Results
WO2024032889A1 (en) Positioning anchor selection based on reinforcement learning
EP4385141A1 (en) Augmenting communication signal measurement with environmental information relative to a communication device
WO2024165252A1 (en) Anchor terminal device selection for sidelink positioning
CN118100996A (en) Doppler-based beam training interval adaptation
CN118509984A (en) Bandwidth-based and/or scene-based feature selection for high line-of-sight/non-line-of-sight classification accuracy
EP4204842A1 (en) Determining timing offset for improved positioning accuracy
WO2024047582A1 (en) Sensing architecture and procedure in 3gpp-based cellular networks
WO2023232431A1 (en) Positioning reference unit selection for sidelink positioning
CN118120314A (en) Resources for reference signal transmission

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18792496

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18792496

Country of ref document: EP

Kind code of ref document: A1