CN113179496A - Video analysis framework based on MEC and indoor positioning system under framework - Google Patents

Video analysis framework based on MEC and indoor positioning system under framework Download PDF

Info

Publication number
CN113179496A
Authority
CN
China
Prior art keywords
video
video analysis
edge
layer
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110482405.3A
Other languages
Chinese (zh)
Inventor
陆音
程然然
李清远
朱洪波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202110482405.3A priority Critical patent/CN113179496A/en
Publication of CN113179496A publication Critical patent/CN113179496A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W4/00Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/30Services specially adapted for particular environments, situations or purposes
    • H04W4/33Services specially adapted for particular environments, situations or purposes for indoor environments, e.g. buildings
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/08Network architectures or network communication protocols for network security for authentication of entities
    • H04L63/0861Network architectures or network communication protocols for network security for authentication of entities using biometrical features, e.g. fingerprint, retina-scan
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/10Architectures or entities
    • H04L65/1013Network architectures, gateways, control or user entities
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60Network streaming of media packets
    • H04L65/65Network streaming protocols, e.g. real-time transport protocol [RTP] or real-time control protocol [RTCP]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W28/00Network traffic management; Network resource management
    • H04W28/02Traffic management, e.g. flow control or congestion control
    • H04W28/0289Congestion control
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W64/00Locating users or terminals or network equipment for network management purposes, e.g. mobility management

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a video analysis architecture and an indoor positioning system under the architecture. The architecture comprises an application layer, a control arrangement layer, an edge computing layer and a video acquisition layer. The video acquisition layer is used for acquiring video data; the edge computing layer comprises independent edge video analysis units that provide video analysis; the control arrangement layer is used for providing an edge network environment for the edge video analysis units, allocating resources to each video analysis unit in the edge computing layer and monitoring the resource usage of each video analysis unit; and the application layer is used for deploying video analysis applications or providing an interface for application access. The invention provides a novel video analysis architecture in which large amounts of video data need not be uploaded through the core network for remote processing; part or all of the computation is performed on edge computing nodes, which reduces the possibility of link congestion and failure, improves performance and shortens response time.

Description

Video analysis framework based on MEC and indoor positioning system under framework
Technical Field
The invention relates to a video analysis framework based on MEC (Mobile Edge Computing) and an indoor positioning system under the framework, belonging to the technical field of video analysis.
Background
Video analysis consists of computationally intensive tasks such as image processing, computer vision and pattern recognition. Location-based applications and services require very low latency when processing video, because the analysis results are typically used to interact with humans (as in video applications such as virtual reality and augmented reality) and with other systems (such as traffic lights). In current multimedia internet of things systems, a large number of internet of things camera nodes cooperate closely with application servers located in a small number of distributed large-scale data centers, and the video data is transmitted to remote servers or computing centers for video analysis. Large distributed data centers provide pay-on-demand services to users in a highly centralized manner, and this centralized cloud computing model has enjoyed tremendous success on the current internet. Considering the amount of video data generated by camera nodes in the internet of things, however, the centralized processing model faces a huge challenge in internet of things applications, namely the need for lower response times. First, applications rely primarily on data centers owned by application service providers such as Tencent, Alibaba, Google, Amazon and Facebook to meet computing, storage and network resource requirements. With the dramatic growth of internet of things devices and data, this model cannot keep transmitting all data from edge devices to the remote data centers. Second, edge devices are typically located far from the data center; therefore, as the number of edge devices grows exponentially, high latency becomes an unavoidable problem for some end-to-end communication applications.
With the development of information technology, the future is an era of the Internet of Everything; according to a Cisco report, the number of sensors and devices connected through the internet of things reached 50 billion by 2020. These connected sensors and devices not only make location-aware services possible in intelligent environments, but also help build smart buildings, smart campuses, smart transportation and smart cities. For location-based applications and services of the internet of things, Global Navigation Satellite Systems (GNSS), such as the Global Positioning System (GPS) and the BeiDou Navigation Satellite System (BDS), have been widely used in outdoor environments and provide very high positioning accuracy. In indoor scenes, however, the many obstacles cause GNSS signals to attenuate quickly or even disappear completely, so they cannot meet the requirements of indoor positioning. Thus, many indoor positioning technologies, such as Wi-Fi, visible light, Ultra Wideband (UWB) and Bluetooth, have emerged. However, these methods are not general, and additional devices must be configured for each scene in advance. In the internet of things environment, the large amount of video data captured by widely distributed camera nodes provides abundant information. With the development of image processing and computer vision technology, indoor positioning technology based on video analysis has a huge development prospect.
Disclosure of Invention
The invention aims to solve the problem of high delay in current video analysis systems, and provides a video analysis architecture based on MEC (Mobile Edge Computing), which comprises: an application layer, a control arrangement layer, an edge computing layer and a video acquisition layer; the video acquisition layer is used for acquiring video data; the edge computing layer comprises independent edge video analysis units, and the edge video analysis units are used for providing video analysis based on video data acquired by the video acquisition layer;
the control arrangement layer comprises a network controller and a container arranger, the network controller is used for providing an edge network environment for the edge video analysis units, and the container arranger is used for allocating resources for each video analysis unit in the edge calculation layer and monitoring the resource condition of each video analysis unit;
the application layer is used for deploying the application program of the video analysis or providing an interface for the access of the application program according to the video analysis result obtained by the edge video analysis unit.
In a second aspect, the present invention provides an indoor positioning system under an MEC-based video analysis architecture, where the video analysis architecture is a video analysis architecture provided in any one of the possible implementation manners of the technical solutions provided in the first aspect, the edge computing layer comprises independent edge video analysis units, and each edge video analysis unit is composed of a video data preprocessing unit, a camera calibration unit, a foreground extraction unit, a feature point extraction unit, a face recognition unit and a target positioning unit;
the video data preprocessing unit is used for preprocessing the captured video data according to distributed stream processing to obtain a preprocessed video picture;
the camera calibration unit is used for segmenting an indoor plane, acquiring a projection matrix through camera calibration and acquiring a mapping relation between pixels in an image and a spatial three-dimensional object;
the foreground extraction unit is used for carrying out foreground extraction on a moving target appearing in the preprocessed video picture to obtain a target;
the characteristic point extraction unit is used for extracting characteristic points of targets in the video pictures;
the face recognition unit is used for carrying out face recognition based on the feature points to confirm identity information;
and the target positioning unit is used for determining the specific indoor position of the target person according to the identity information of the target person.
The invention has the following beneficial technical effects: the invention provides a novel video analysis architecture. In this new architecture, large amounts of video data do not need to be uploaded and processed remotely through the core network. In addition, part or all of the computation is performed on edge computing nodes, which reduces the possibility of link congestion and failure, improves performance and shortens response time. The video analysis process is modular and uses containers as its carrying entities. The same module can be shared among different applications, which improves resource utilization and facilitates unified scheduling and resource allocation in the edge cluster.
The invention designs and realizes an intelligent indoor positioning system based on the proposed video analysis architecture. The position information with centimeter-level positioning accuracy provided by the system can support various location-based services in the internet of things.
Drawings
Fig. 1 is a schematic structural diagram of a video analysis architecture according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an indoor positioning system according to an embodiment of the present invention;
fig. 3 is a schematic flowchart of an execution of a target positioning unit in an indoor positioning system according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a simulation evaluation environment of an indoor positioning system according to an embodiment of the present invention.
Fig. 5 is a CDF of position error distances of positioning under different test cases of the indoor positioning system according to an embodiment of the present invention;
fig. 6 is a CDF of position error distances for positioning by different students for the indoor positioning system according to the embodiment of the present invention;
fig. 7 shows average position distance errors of indoor positioning systems positioned at different resolutions according to an embodiment of the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and specific examples. The present embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation manner and specific operation steps are given, but the scope of the present invention is not limited to the following embodiments.
Example 1: the video analysis architecture, as shown in fig. 1, includes: the system comprises an application layer, a control arrangement layer, an edge calculation layer and a video acquisition layer; the video acquisition layer is used for acquiring video data; the edge computing layer comprises independent edge video analysis units, and the edge video analysis units are used for providing video analysis based on video data acquired by the video acquisition layer;
the control arrangement layer comprises a network controller and a container arranger, the network controller is used for providing an edge network environment for the edge video analysis units, and the container arranger is used for allocating resources for each video analysis unit in the edge calculation layer and monitoring the resource condition of each video analysis unit;
the application layer is used for deploying the application program of the video analysis or providing an interface for the access of the application program according to the video analysis result obtained by the edge video analysis unit.
Fig. 1 shows the proposed architecture and the relations between its functional components: from bottom to top, the video acquisition layer, the edge computing layer, the control orchestration layer and the application layer.
In the video acquisition layer, the real-time video streams acquired by cameras are the important data source of the video analysis system. Various types of camera nodes continuously collect data in uninterrupted operation and stream the captured video source data to the streaming service layer. Real-time video streams from network cameras with different IP addresses are obtained through the Real-Time Streaming Protocol (RTSP), and the streams are parsed into corresponding video frames and provided to the system for further analysis and processing. From a global perspective, the density of camera nodes determines the geographic distribution of edge computing nodes: because more camera nodes mean more multimedia video data, the computing and storage resources of the edge cluster should meet the requirements of the video analytics applications.
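A minimal sketch of how a camera node's stream might be pulled over RTSP and parsed into frames is given below; the RTSP URL, credentials and the downstream process_frame hook are illustrative assumptions, not values taken from the patent.

```python
import cv2

def process_frame(frame):
    """Placeholder for the downstream edge video analysis units (assumption)."""
    print("frame received:", frame.shape)

# Hypothetical RTSP URL of one camera node (address, credentials and path are placeholders).
RTSP_URL = "rtsp://user:password@192.168.1.64:554/Streaming/Channels/101"

cap = cv2.VideoCapture(RTSP_URL)        # open the real-time stream over RTSP
if not cap.isOpened():
    raise RuntimeError("cannot connect to camera node")

while True:
    ok, frame = cap.read()              # parse the stream into individual video frames
    if not ok:
        break                           # stream interrupted; a production system would reconnect
    process_frame(frame)                # hand the decoded frame to the analysis units

cap.release()
```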
The edge computing layer is a cluster composed of multiple Edge Video Analysis Units (EVAUs), and each EVAU is an independent edge computing node capable of providing video analysis computation. By using container-level lightweight virtualization, a specific container instance is used as the basic operation unit, and the computing, network and storage resources of the EVAU are abstracted, so that service programs are decoupled from the underlying hardware. In this way, the underlying infrastructure is masked, and the highly heterogeneous cluster of EVAUs is coordinated to act as a homogeneous computing platform. Considering the function division, the service requirements and later maintenance requirements, the whole video analysis task can be divided into different sub-modules by function, for example modules such as face recognition and indoor positioning. Kubernetes, the open-source system from Google for automatically deploying, scaling and managing containerized applications, combines the containers that make up an application into logical units for management and service discovery. Video analysis is realized by means of Docker orchestration and management technology, and Docker containers isolate the corresponding functional sub-modules from each other. The functional processes therefore do not influence each other, and computing resources can be reasonably scheduled. A container is a method to package and isolate everything needed for program execution; "isolated" here means that the container can be allocated resources separate from the host. Each module is built as a Docker image and then deployed. Because each functional module is relatively independent, each module can be packaged into a transparent image, which facilitates monitoring and management, and changes to a functional module do not depend on changes to the environment.
Different service requests tend to contain partially identical video analysis tasks. Therefore, the whole video analysis task can be divided into several modules by function, each module having a corresponding video analysis function. The processing result of the current functional module is forwarded to the next functional module for subsequent analysis and processing. By dividing functions in this way, a general functional module is shared among multiple applications, which can greatly improve resource utilization. To reduce the response delay of an application, the different functional modules are executed at the appropriate time and on the appropriate EVAU according to the resources consumed by each module and a specific resource scheduling algorithm. In addition, a uniform naming scheme is used to differentiate and manage all video analysis modules.
The video analysis tasks typically run on heterogeneous video analysis units (EVAUs). In order to enable heterogeneous EVAUs to execute the same video analysis module accurately, containers are used as the carrying entities of the video analysis functional modules. Since containers provide virtualization at the operating-system level, many problems caused by service migration between different operating systems and infrastructures, such as environment configuration and compatibility, can be avoided. The video analysis modules run in Docker containers, and resource supervision and task scheduling for the modules are realized by means of the platform's container orchestrator. The container orchestration function of the lightweight edge computing platform is implemented by Kubernetes (k8s), a platform for automated container operations open-sourced by Google, whose operations include deployment, scheduling and node-cluster scaling. Using Kubernetes to manage containers in the video analysis system enables automatic container deployment and replication, load balancing among containers, container elasticity, and so on.
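As a rough illustration only of how one video analysis module could be deployed as a replicated, resource-limited container through Kubernetes, the sketch below uses the official Python client; the image name, module name, namespace and resource figures are assumptions, not values stated in the patent.

```python
from kubernetes import client, config

config.load_kube_config()                        # use the credentials of the k8s master node
apps = client.AppsV1Api()

# Hypothetical container image for one video analysis module (e.g. foreground extraction).
container = client.V1Container(
    name="foreground-extraction",
    image="registry.local/evau/foreground-extraction:1.0",
    resources=client.V1ResourceRequirements(
        requests={"cpu": "2", "memory": "2Gi"},  # assumed per-module resource request
        limits={"cpu": "4", "memory": "4Gi"},
    ),
)

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="foreground-extraction"),
    spec=client.V1DeploymentSpec(
        replicas=2,                              # two copies per container instance
        selector=client.V1LabelSelector(match_labels={"app": "foreground-extraction"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "foreground-extraction"}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)

apps.create_namespaced_deployment(namespace="default", body=deployment)
```

In practice the same description is more commonly written as a declarative manifest (the "orchestration file" mentioned below) that the orchestration node applies to the cluster.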
The control arrangement layer realizes network and resource monitoring for the video analysis system by means of the two functional modules provided by the lightweight edge computing platform, network control and container orchestration, thereby ensuring that the system runs smoothly on the edge platform. The network control aims to provide an edge network environment for video analysis, and the network controller can construct a large-scale edge computing network by controlling virtual switches. Meanwhile, the container orchestrator can acquire the current state of node resources and the service deployment situation, supervise the resource conditions (CPU, memory, etc.) of each video analysis unit in the edge computing layer, and coordinate the smooth operation of the video analysis modules through resource scheduling.
The video analysis application layer is used for displaying and applying video analysis results, and provides diversified services for users based on various computer vision algorithms. Diverse video analysis applications, such as region management, people-flow statistics and security monitoring alarms, can be deployed in this layer. Meanwhile, through the service interaction points, the applications can also be provided to third parties as open REST APIs. For example, the cloud may control the behavior of edge nodes through Remote Procedure Calls (RPCs). The video analysis scheme provided by this embodiment contains two key designs which, unlike other existing solutions, improve video analysis efficiency.
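Purely as an illustration of what such an open REST interface might look like, the following minimal Flask sketch exposes positioning results to third parties; the endpoint path, port and the in-memory store fed by the positioning unit are hypothetical.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical in-memory store written by the target positioning unit.
latest_positions = {}   # person_id -> {"x_cm": ..., "y_cm": ..., "timestamp": ...}

@app.route("/api/v1/position/<person_id>", methods=["GET"])
def get_position(person_id):
    """Return the most recent indoor position of a recognized person."""
    result = latest_positions.get(person_id)
    if result is None:
        return jsonify({"error": "person not found"}), 404
    return jsonify(result)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)   # assumed port for third-party access
```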
The main steps of the process for realizing video analysis by using the video analysis architecture provided by the embodiment are summarized as follows:
(1) Collecting and processing user request parameters: computer vision algorithms typically include various parameters such as video resolution and video sampling rate. Therefore, for a specific application, the corresponding service interaction point needs to determine the specific parameters of the video analysis task from the user's request.
(2) Dividing the video analysis task: video analysis tasks often involve a series of image processing operations that run on top of specific container instances. The application controller obtains the configuration information of a given video analysis task through an SQL query, including the required video analysis functional modules and the corresponding resource consumption. According to the queried configuration information, the container instances available in the cluster are matched, and the result is sent to the container orchestrator.
(3) Resource allocation: the container orchestrator allocates resources to each container instance (functional module) according to the availability of the current edge cluster (such as CPU utilization, memory space, network throughput and storage usage), and expresses the allocation result in the form of an orchestration file.
(4) Starting the container: the container is started on the appropriate EVAU in the edge cluster. The resources occupied by each container instance are specified according to the corresponding parameters in the orchestration file. In addition, the administrator may additionally set the number of copies of each container instance in the edge cluster to cope with contingencies that may occur with an EVAU, thereby improving availability.
(5) Acquiring a video: each camera node has an independent IP address, and source video streams captured by the camera nodes are transmitted on an edge network according to a real-time streaming protocol. The streaming media server performs unified management on various video streams and performs format conversion according to requirements.
(6) Executing the video analysis task: when a particular functional module obtains video data, it performs its respective computational task, such as target tracking, face recognition or foreground extraction.
Example 2: an indoor positioning system under a video analysis architecture, the video analysis architecture being that of embodiment 1, the indoor positioning system comprising: the edge computing layer comprises independent edge video analysis units, each consisting of a video data preprocessing unit, a camera calibration unit, a foreground extraction unit, a feature point extraction unit, a face recognition unit and a target positioning unit;
the video data preprocessing unit is used for processing the captured video data according to distributed stream processing;
the camera calibration unit is used for segmenting an indoor plane, acquiring a projection matrix through camera calibration and acquiring a mapping relation between pixels in an image and a spatial three-dimensional object;
the foreground extraction unit is used for carrying out foreground extraction on a moving target appearing in a video picture to obtain a target;
the characteristic point extraction unit is used for extracting characteristic points of targets in the video frames;
the face recognition unit is used for carrying out face recognition based on the feature points to confirm identity information;
and the target positioning unit is used for determining the specific indoor position of the target person according to the identity information of the target person.
The present embodiment provides an indoor positioning system based on the video analysis architecture provided in embodiment 1 and on the system's positioning requirements. As shown in fig. 2, the indoor positioning system mainly comprises three parts: Infrastructure as a Service (IaaS), Positioning as a Service (PaaS) and Software as a Service (SaaS).
The Infrastructure as a Service (IaaS) part mainly provides the necessary computing, network and storage resources for the entire system. The infrastructure services are realized with the edge computing layer and the control arrangement layer; meanwhile, using the lightweight virtualization provided by containers, the series of video analysis tasks required by the whole system is divided into modules by function, and the collected video data are processed in parallel.
The positioning service is realized with the video acquisition layer and the edge computing layer: the video stream is acquired first, and target positioning is then realized through video analysis (foreground extraction, face recognition, target positioning, etc.). The positioning service (PaaS) is the core of the whole system. The indoor positioning scheme is based on computer-vision indoor positioning: by processing the images of the video streams uploaded by the cameras, targets in the video frames are positioned in real time, which reduces interference from external noise and improves positioning accuracy. It mainly comprises the following parts: video acquisition and preprocessing, camera calibration, foreground extraction, feature point extraction, face recognition and target positioning.
The application of the video analysis results in the application layer realizes the software service. Software as a Service (SaaS) sits at the top of the entire system service stack and provides various Web services to the system, such as watching videos, obtaining location information and some administrative operations, so that users can access them easily and directly.
A schematic flowchart of the process performed by the target location unit in the indoor positioning system is shown in fig. 3.
In this embodiment, the video acquisition layer acquires the original video data provided by the camera nodes as input. Different scenarios have different requirements on the accuracy of location information. For example, in a mall, location information with an error distance within 5 meters is sufficient to push advertisements, while in a hospital the error distance must be reduced to 1 meter to monitor patients. Higher video resolution increases the detailed information in the video data, which means the positioning information will be greatly enhanced. Therefore, according to the parameters provided by the application request, the original video data is converted into video data meeting the positioning accuracy requirement of the specific scene.
Noise is inevitably generated in the video data acquisition process. Therefore, before transmitting the video data to the next portion, the video data needs to be processed by filtering, histogram equalization, or the like using a video data preprocessing unit.
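A minimal sketch of such a preprocessing step, assuming Gaussian filtering followed by histogram equalization with OpenCV (the kernel size is an illustrative choice):

```python
import cv2

def preprocess(frame):
    """Denoise and equalize a captured video frame before further analysis."""
    blurred = cv2.GaussianBlur(frame, (5, 5), 0)      # filtering step: suppress acquisition noise
    gray = cv2.cvtColor(blurred, cv2.COLOR_BGR2GRAY)  # equalizeHist expects a single-channel image
    return cv2.equalizeHist(gray)                     # histogram equalization improves contrast
```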
Considering the resource limitations of edge devices and the massive video data generated by the large number of camera nodes in the internet of things, this embodiment processes the captured video data with distributed stream processing. Arriving video data is not stored first but processed in memory. Because the video data is not stored, disk read/write overhead is reduced, which further optimizes video analysis.
The camera calibration unit is used for acquiring three-dimensional scale information from a two-dimensional image, dividing an indoor plane and acquiring a projection matrix through camera calibration to further acquire a mapping relation between pixels in the image and a space three-dimensional object, so that the purpose of calculating space coordinates by using pixel coordinates is achieved. The following mapping of pixel coordinates and spatial coordinates can be obtained by means of a pinhole imaging model:
$$
s\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= A\,\begin{bmatrix} r_1 & r_2 & t \end{bmatrix}
\begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}
$$
where $s$ is a scale factor; $u$ and $v$ are pixel coordinates; $X$ and $Y$ are spatial coordinates; $A$ is the camera's intrinsic matrix, an inherent parameter of the camera; $r_1$ and $r_2$ are rotation parameters and $t$ is a translation parameter, which are the camera's extrinsic parameters and represent the relationship between the camera coordinate system and the actual coordinate system. Calibration of the camera on the indoor plane is realized by solving this equation; on the basis of the camera imaging model, the conversion between pixel coordinates and spatial coordinates is obtained by solving the camera's intrinsic and extrinsic parameters with Zhang Zhengyou's single-plane checkerboard camera calibration method.
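A sketch of this checkerboard calibration step with OpenCV is shown below; the board size and image directory are illustrative assumptions rather than values from the patent.

```python
import glob
import cv2
import numpy as np

BOARD = (9, 6)  # assumed inner-corner layout of the checkerboard (not specified in the patent)

# World coordinates of the board corners on the Z = 0 plane (in units of one square).
objp = np.zeros((BOARD[0] * BOARD[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:BOARD[0], 0:BOARD[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in glob.glob("calibration_images/*.jpg"):    # hypothetical image directory
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, BOARD)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Zhang's single-plane checkerboard method: recover the intrinsic matrix A,
# the distortion coefficients and per-view extrinsics (rotation/translation).
rms, A, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
```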
After the camera is calibrated, the foreground extraction unit extracts the foreground of moving objects appearing in the video frames so that the target person can be located. Common foreground extraction algorithms include the frame difference method, the optical flow method and background model methods. In the indoor positioning strategy of the system, this embodiment selects the frame difference method to perform foreground extraction on moving objects in the video frames, implemented specifically with Python and OpenCV.
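A minimal sketch of frame-difference foreground extraction in Python/OpenCV (the binarization threshold and minimum contour area are assumed values):

```python
import cv2

def extract_foreground(prev_gray, curr_gray, thresh=25, min_area=500):
    """Frame-difference foreground extraction: return bounding boxes of moving targets."""
    diff = cv2.absdiff(curr_gray, prev_gray)        # pixel-wise difference of consecutive frames
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, None, iterations=2)     # fill small holes in the moving regions
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) >= min_area]
```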
After foreground extraction of the moving target is finished, feature point extraction of the target can be completed and the target's pixel coordinates in the image obtained; meanwhile, the mapping relation between pixel coordinates and spatial coordinates can be further expressed as
$$
Z_c\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= \begin{bmatrix} f/d_x & 0 & u_0 \\ 0 & f/d_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} R & T \end{bmatrix}
\begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}
= M_1 M_2 \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}
$$
where $Z_c$ is the actual distance between the person and the camera, $d_x$ and $d_y$ are the width and height of a pixel, and $f$ is the camera focal length. The parameters $f$, $d_x$, $d_y$, $u_0$ and $v_0$ that form the intrinsic matrix $M_1$ are related to the camera itself; they are the camera's internal parameters and describe its optical characteristics. $R$ and $T$, which form the matrix $M_2$, represent the rotation matrix and translation vector from the camera coordinate system to the spatial coordinate system, and are the camera's external parameters. Substituting the coordinates of the object's feature pixel points obtained after moving-target detection into formula (3) yields the following relationship:
$$
\begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix}
= R^{-1}\!\left( Z_c\, M_1^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} - T \right)
$$
After the camera is calibrated, the specific values of the camera's internal and external parameters can be uniquely determined.
For the target positioning unit, a database comprising facial images of different persons from different angles is created in advance. These face images are used as a sample set, and corresponding face models are generated through training to support face recognition. Each individual's height is also stored in the database along with the face images.
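As an illustration of how such a sample set might be organized and a face model trained, the sketch below uses the LBPH recognizer from opencv-contrib; the recognizer choice, file paths and person records are assumptions, since the patent does not name a specific face model.

```python
import cv2
import numpy as np

# Hypothetical sample set: several face images per person from different angles,
# plus each person's height, as described above.
samples = {
    1: {"images": ["faces/p1_front.jpg", "faces/p1_side.jpg"], "height_cm": 176},
    2: {"images": ["faces/p2_front.jpg", "faces/p2_side.jpg"], "height_cm": 165},
}

faces, labels = [], []
for person_id, record in samples.items():
    for path in record["images"]:
        faces.append(cv2.imread(path, cv2.IMREAD_GRAYSCALE))
        labels.append(person_id)

# LBPH (opencv-contrib) is an assumed choice of recognizer, not specified by the patent.
recognizer = cv2.face.LBPHFaceRecognizer_create()
recognizer.train(faces, np.array(labels))
recognizer.write("face_model.yml")   # persisted model later used to confirm identity information
```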
After the feature points of the target person are extracted from the video frame and the identity is confirmed by face recognition based on these feature points, the system retrieves the corresponding person's height from the database. Moving-target detection yields the person's size in pixels in the image; combined with the known height, this gives the actual distance between the indoor person and the camera, i.e. the camera's depth of field. Once this actual distance is obtained, the specific indoor position of the target person is confirmed by solving the mapping relation between pixel coordinates and spatial coordinates.
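A sketch of that positioning computation under the stated pinhole model: the depth $Z_c$ is estimated from the known real height and the person's pixel height, and the pixel coordinates are then mapped back to room coordinates with the calibrated parameters. The function and variable names are illustrative.

```python
import numpy as np

def locate(u, v, pixel_height, real_height_cm, A, R, T):
    """Estimate the indoor position of a recognized person.

    u, v           -- pixel coordinates of the person's reference point in the image
    pixel_height   -- the person's height in pixels (from moving-target detection)
    real_height_cm -- the person's true height, looked up in the database
    A              -- 3x3 intrinsic matrix from camera calibration
    R, T           -- rotation matrix and translation vector (extrinsic parameters)
    """
    f_y = A[1, 1]                                    # focal length in pixel units (f / d_y)
    Z_c = f_y * real_height_cm / pixel_height        # depth of field: person-to-camera distance

    # Back-project the pixel into the camera coordinate system ...
    p_cam = Z_c * np.linalg.inv(A) @ np.array([u, v, 1.0])
    # ... then transform it into room (spatial) coordinates.
    p_world = np.linalg.inv(R) @ (p_cam - T.reshape(3))
    return p_world[:2]                               # planar indoor position (X, Y)
```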
A schematic diagram of the simulation evaluation environment of the indoor positioning system according to an embodiment of the present invention is shown in fig. 4. A conference room in a laboratory is used as the evaluation environment; the conference room floor tiles are 60 cm by 60 cm. In the experiment, three servers with the same configuration are deployed to act as video analysis units (EVAUs) together with one container orchestration node, and they are discovered and connected together to form a cluster. Each server is configured with dual six-core Intel Xeon E5-2620 v3 processors, 32 GB RAM, 2 TB storage and the Ubuntu 14.04 LTS operating system. A Docker container engine is installed on each edge computing node, and the modularization of video analysis is supported through the virtualization isolation provided by containers. Each module corresponds to a specific Docker container instance, and the container encapsulates the environment the module depends on (OpenCV, FFmpeg) and the corresponding code implementation. For the indoor positioning system provided by this embodiment, three video analysis modules corresponding to three specific Docker containers are provided: foreground extraction, face detection and recognition, and feature point extraction and positioning. A HIKVISION DS-2CD3345F-IS IP camera is used as the camera node. The video data of the camera node is encoded with H.264, the resolution is 2560 x 1440 and the frame rate is 30 fps. The Docker container engine is a tool for creating and managing containers; the virtualization isolation provided by containers then makes independent video analysis modules possible, so the applications do not interfere with each other at run time.
The container orchestration node realizes the container orchestration function of the lightweight edge computing platform, implemented specifically through Kubernetes (k8s), the platform for automated container operations open-sourced by Google, whose supported operations include deployment, scheduling and node-cluster scaling. Using Kubernetes to manage containers in the video analysis system enables automatic container deployment and replication, load balancing among containers, container elasticity, and so on.
Kubernetes is deployed on all devices in the cluster. The container orchestration node, which stores the pixel-point model and the database, serves as the Kubernetes master node and uniformly schedules and manages the computing, network and storage resources of the whole cluster. In addition, the master node is responsible for the deployment, replication and scaling of the various containers in the edge cluster. The behavior of all containers in the edge cluster is specified by configuration files. The number of copies of each container instance was set to two in our experiment. When the Docker process starts on an EVAU, a virtual bridge named docker0 is automatically created, and its IP address is set as the default gateway for all containers started on that node.
Common industrial cameras typically have large optical distortions, meaning that straight lines in the actual scene cannot remain as straight lines in the image. The system adopts a HIKVISION DS-2CD3345F-IS network camera to collect video. In order to accurately acquire scene information, distortion correction processing needs to be performed on captured video.
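A minimal sketch of this distortion-correction step, assuming the intrinsic matrix and distortion coefficients obtained during calibration are reused:

```python
import cv2

def undistort_frame(frame, A, dist_coeffs):
    """Correct the optical distortion of a captured frame using the calibration results."""
    h, w = frame.shape[:2]
    new_A, roi = cv2.getOptimalNewCameraMatrix(A, dist_coeffs, (w, h), 1, (w, h))
    corrected = cv2.undistort(frame, A, dist_coeffs, None, new_A)
    x, y, rw, rh = roi
    return corrected[y:y + rh, x:x + rw]   # crop to the valid (distortion-free) region
```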
The effectiveness of positioning is often evaluated by the positioning accuracy, defined as the cumulative percentage of position errors within a specified distance. 37 position points were selected in the laboratory to evaluate the positioning accuracy of the system. Fig. 5 shows the comparison before and after camera-node distortion correction. It can be seen that the corrected camera nodes show better positioning accuracy, with approximately 80% of the points having a position error within 20 cm.
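For reference, the cumulative accuracy metric can be computed as in the sketch below; the error values shown are placeholders, not the measured data.

```python
import numpy as np

def error_cdf(errors_cm, distance_cm):
    """Cumulative percentage of position errors within a specified distance."""
    errors = np.asarray(errors_cm, dtype=float)
    return float(np.mean(errors <= distance_cm)) * 100.0

# Placeholder error distances for the 37 evaluation points (not the measured data).
errors = np.random.uniform(0.0, 40.0, size=37)
print(f"{error_cdf(errors, 20.0):.1f}% of points within 20 cm")
```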
After distortion correction of the camera nodes, three students in the laboratory were randomly selected to further evaluate the positioning system. Students A and B are already in the database, with heights of 176 cm and 165 cm respectively, while student C is not in the database.
The video resolution was 2560 x 1440. Fig. 6 shows the CDFs (Cumulative Distribution Functions) of the position error distances of the three students. It can be seen that the feature point selection method keeps the error within 20 cm with a probability of approximately 90% if the target to be located is present in the database. If the target is not in the database, the position error increases to 30 cm. Overall, the video-analysis-based localization system can provide a localization accuracy of 30 cm for 90% of the localization results.
In order to further explore the factors influencing the positioning accuracy of the system, the positioning accuracy under different resolutions is analyzed and compared. As shown in fig. 6, as the resolution decreases, the average error distance increases, and the positioning accuracy decreases. This is mainly because the feature information of the target is compressed, resulting in a reduction in the feature detection and recognition effect.
In addition, at low resolution, fewer pixels represent the face, which means fewer feature points can be extracted, reducing the face recognition performance. As can also be seen from fig. 6, when the resolution is reduced to 720 x 480, the face cannot be correctly detected and recognized, resulting in a sudden increase in the average error distance of the positioning results for the targets present in the database (students A and B). In contrast, for the target not present in the database (student C), no steep increase in the average error distance occurs. Different feature point selection strategies can be employed to improve positioning accuracy at different resolutions.
The response time of the system mainly comprises three parts: the transmission time and processing time of the video data, and the return time of the processing result. Since different applications return different results, this section mainly discusses video data transmission time and processing time.
In addition, in order to maximize the utilization of resources in the edge cluster, the resource requirements of the system are evaluated. In the experiment, each edge computing node has 24 logical CPUs, and the main node of Kubernetes is responsible for uniformly distributing resources occupied by all containers in the cluster.
Fig. 7 shows the computational resources and corresponding processing time required by the system in processing video frames of different resolutions. Higher resolution means a larger amount of data and longer processing time. More advanced computer vision algorithms and efficient resource scheduling strategies can be employed to reduce the response time of the system.
The invention firstly provides a video analysis framework for processing video data near different camera nodes by utilizing edge calculation. In addition, in order to improve the efficiency of video analysis, the video analysis process is modularized, so that typical and general functional modules can be shared among different applications. On the basis, an indoor positioning system based on video analysis is designed and realized, and the container is used as a bearing entity of the video analysis functional module. The positioning accuracy of the system and the processing time and computational resources at different resolutions are discussed. Experimental results show that the system can provide centimeter-level positioning accuracy for monitoring scenes in the Internet of things so as to meet the requirements of various location-based services. On the other hand, compared with the traditional cloud computing model, the system has the characteristic of quick response and can meet the real-time requirement of delay-sensitive application.

Claims (7)

1. An MEC-based video analytics architecture, comprising: the system comprises an application layer, a control arrangement layer, an edge calculation layer and a video acquisition layer; the video acquisition layer is used for acquiring video data; the edge computing layer comprises independent edge video analysis units, and the edge video analysis units are used for providing video analysis based on video data acquired by the video acquisition layer; the control arrangement layer comprises a network controller and a container arranger, the network controller is used for providing an edge network environment for the edge video analysis units, and the container arranger is used for allocating resources for each video analysis unit in the edge calculation layer and monitoring the resource condition of each video analysis unit;
the application layer is used for deploying the application program of the video analysis or providing an interface for the access of the application program according to the video analysis result obtained by the edge video analysis unit.
2. The MEC-based video analytics architecture of claim 1, wherein said edge video analysis unit uses container instances as basic operation units.
3. The MEC-based video analytics architecture of claim 1, wherein said edge computing layer is further configured to store network data.
4. An indoor positioning system under the MEC-based video analysis architecture, characterized in that the video analysis architecture adopts the video analysis architecture of any one of claims 1 to 3, the edge computing layer comprises independent edge video analysis units, and each edge video analysis unit consists of a video data preprocessing unit, a camera calibration unit, a foreground extraction unit, a feature point extraction unit, a face recognition unit and a target positioning unit;
the video data preprocessing unit is used for preprocessing the captured video data according to distributed stream processing to obtain a preprocessed video picture;
the camera calibration unit is used for segmenting an indoor plane, acquiring a projection matrix through camera calibration and acquiring a mapping relation between pixels in an image and a spatial three-dimensional object;
the foreground extraction unit is used for carrying out foreground extraction on a moving target appearing in the preprocessed video picture to obtain a target;
the characteristic point extraction unit is used for extracting characteristic points of targets in the video pictures;
the face recognition unit is used for carrying out face recognition based on the feature points to confirm identity information;
and the target positioning unit is used for determining the specific indoor position of the target person according to the identity information of the target person.
5. The indoor positioning system under MEC-based video analytics architecture of claim 4, further comprising a people image database, said people image database being stored at said edge computing layer.
6. The indoor positioning system under the MEC-based video analysis architecture of claim 5, wherein the target positioning unit is specifically configured to: search and obtain the height of the corresponding person from the person image database; detect the moving target through the foreground extraction unit to obtain the pixel size of the corresponding person in the image; and, combined with the pixel distance between the person in the image and the camera, obtain the actual distance between the indoor person and the camera, namely the camera's depth of field; after the actual distance is obtained, the specific indoor position of the target person is confirmed by solving the mapping relation between pixel coordinates and spatial coordinates.
7. The indoor positioning system under MEC-based video analysis architecture of claim 4, wherein the foreground extraction unit selects frame difference method to perform foreground extraction on the moving object in the video frame.
CN202110482405.3A 2021-04-30 2021-04-30 Video analysis framework based on MEC and indoor positioning system under framework Pending CN113179496A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110482405.3A CN113179496A (en) 2021-04-30 2021-04-30 Video analysis framework based on MEC and indoor positioning system under framework

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110482405.3A CN113179496A (en) 2021-04-30 2021-04-30 Video analysis framework based on MEC and indoor positioning system under framework

Publications (1)

Publication Number Publication Date
CN113179496A true CN113179496A (en) 2021-07-27

Family

ID=76925903

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110482405.3A Pending CN113179496A (en) 2021-04-30 2021-04-30 Video analysis framework based on MEC and indoor positioning system under framework

Country Status (1)

Country Link
CN (1) CN113179496A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115002514A (en) * 2022-05-27 2022-09-02 浙江大学 Spark video transcoding system and video transcoding method based on cloud native controller

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120233315A1 (en) * 2011-03-11 2012-09-13 Hoffman Jason A Systems and methods for sizing resources in a cloud-based environment
CN106529405A (en) * 2016-09-30 2017-03-22 南京邮电大学 Local anomaly behavior detection method based on video image block model
CN111753664A (en) * 2020-05-25 2020-10-09 西南石油大学 Suspect identification and positioning tracking system and method based on 5G wireless network
US20210096911A1 (en) * 2020-08-17 2021-04-01 Essence Information Technology Co., Ltd Fine granularity real-time supervision system based on edge computing

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120233315A1 (en) * 2011-03-11 2012-09-13 Hoffman Jason A Systems and methods for sizing resources in a cloud-based environment
CN106529405A (en) * 2016-09-30 2017-03-22 南京邮电大学 Local anomaly behavior detection method based on video image block model
CN111753664A (en) * 2020-05-25 2020-10-09 西南石油大学 Suspect identification and positioning tracking system and method based on 5G wireless network
US20210096911A1 (en) * 2020-08-17 2021-04-01 Essence Information Technology Co., Ltd Fine granularity real-time supervision system based on edge computing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YUEJUN CHEN: "Design and implementation of video analytics system based on edge computing", 2018 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, pages 1-6 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115002514A (en) * 2022-05-27 2022-09-02 浙江大学 Spark video transcoding system and video transcoding method based on cloud native controller
CN115002514B (en) * 2022-05-27 2023-07-21 浙江大学 Spark video transcoding system and video transcoding method based on cloud native controller

Similar Documents

Publication Publication Date Title
CN108255605B (en) Image recognition cooperative computing method and system based on neural network
Jain et al. Scaling video analytics systems to large camera deployments
US10249047B2 (en) System and method for detecting and tracking multiple moving targets based on wide-area motion imagery
CN107564012B (en) Augmented reality method and device for unknown environment
Gargees et al. Incident-supporting visual cloud computing utilizing software-defined networking
EP3134870B1 (en) Electronic device localization based on imagery
CN106791613B (en) A kind of intelligent monitor system combined based on 3DGIS and video
CN112053446A (en) Real-time monitoring video and three-dimensional scene fusion method based on three-dimensional GIS
JP2021111385A (en) System and method enabling collaborative 3d map data fusion platform and virtual world system thereof
CN112189335A (en) CMOS assisted inside-out dynamic vision sensor tracking for low power mobile platforms
EP3846057A1 (en) System and method for optimizing virtual world computations through an n-tier architecture
US11127162B2 (en) Method and apparatus for improved location decisions based on surroundings
CN109996039A (en) A kind of target tracking method and device based on edge calculations
US11816810B2 (en) 3-D reconstruction using augmented reality frameworks
US20170311013A1 (en) Integrated intelligent server based system for unified multiple sensory data mapped imagery analysis
CN111753664A (en) Suspect identification and positioning tracking system and method based on 5G wireless network
CN102959946A (en) Augmenting image data based on related 3d point cloud data
CN103874193A (en) Method and system for positioning mobile terminal
Xie et al. A video analytics-based intelligent indoor positioning system using edge computing for IoT
Xu et al. {SwarmMap}: Scaling up real-time collaborative visual {SLAM} at the edge
CN112449152A (en) Method, system and equipment for synchronizing multiple paths of videos
CN103136739B (en) Controlled camera supervised video and three-dimensional model method for registering under a kind of complex scene
CN113179496A (en) Video analysis framework based on MEC and indoor positioning system under framework
Prakash et al. Smart city video surveillance using fog computing
CN113379748A (en) Point cloud panorama segmentation method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination