WO2023245833A1 - 基于边缘计算的场景监控方法、装置、设备及存储介质 - Google Patents

基于边缘计算的场景监控方法、装置、设备及存储介质 Download PDF

Info

Publication number
WO2023245833A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
abnormal
preset
original image
information
Prior art date
Application number
PCT/CN2022/111842
Other languages
English (en)
French (fr)
Inventor
刘璘
刘译键
曾正
Original Assignee
清华大学
中国人寿财产保险股份有限公司
Priority date
Filing date
Publication date
Application filed by 清华大学 and 中国人寿财产保险股份有限公司
Publication of WO2023245833A1

Classifications

    • G06V 20/40 — Scenes; scene-specific elements in video content
    • G06T 7/11 — Image analysis; segmentation; region-based segmentation
    • G06V 10/22 — Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
    • G06V 10/761 — Image or video pattern matching; proximity, similarity or dissimilarity measures
    • G06V 10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82 — Image or video recognition or understanding using neural networks
    • G06V 20/52 — Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 40/10 — Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G08B 21/02 — Alarms for ensuring the safety of persons
    • G06T 2207/20081 — Indexing scheme for image analysis: training; learning
    • G06T 2207/20084 — Indexing scheme for image analysis: artificial neural networks [ANN]

Definitions

  • This application relates to video surveillance technology and, more specifically, to a scene monitoring method, device, equipment and storage medium based on edge computing.
  • The purpose of this application is to provide a scene monitoring method, device, equipment and storage medium based on edge computing, so as to improve the efficiency and accuracy of scene monitoring.
  • According to a first aspect, this application discloses a scene monitoring method based on edge computing.
  • The method is applied to edge devices and includes: acquiring an original image in a preset scene;
  • determining abnormal information from the original image according to a preset neural network model, where the abnormal information is used to represent information that needs to be warned about in the preset scene, and the abnormal information is the information in the abnormal image area of the original image,
  • the abnormal image area being determined based on the confidence between the abnormal image area and a preset abnormal image; and sending the abnormal information to the client for an alarm.
  • During video monitoring of the preset scene, the image frames within the scene are obtained as the original image.
  • Feature extraction is performed on the original image to obtain the abnormal information in the original image.
  • The edge device directly sends the abnormal information to the client, and the client issues an alarm. This avoids the process of the edge device sending the video to the cloud platform, the cloud platform identifying the anomaly, and then the cloud platform sending the identification results to the client. It solves the delay caused by network transmission in the prior art, reduces the pressure on the cloud platform, and improves the efficiency and accuracy of scene monitoring.
  • In an embodiment, determining abnormal information from the original image according to the preset neural network model includes:
  • inputting the original image into the preset neural network model and traversing the original image according to the preset target frame size to obtain at least two partial images in the original image, where the size of each partial image is the preset target frame size.
  • Abnormal information in the original image is then determined based on the confidence level of each partial image.
  • Each pixel in the original image can thus be identified via the partial images, avoiding missed information and improving the accuracy of scene monitoring.
  • Based on the confidence level of a local image, it can be determined whether the local image is similar to an abnormal image. The greater the confidence level, the higher the similarity between the local image and the abnormal image, and the greater the possibility that abnormal information exists in the local image. Abnormal information is thus determined automatically, saving manpower and time and improving scene monitoring efficiency.
  • Determining the abnormal information in the original image according to the confidence level of the partial image includes:
  • converting the abnormal image area into text data, and determining the text data as the abnormal information of the original image.
  • The confidence level can be used to quickly determine whether a local image is abnormal, improving monitoring efficiency. Abnormal content is converted into text format for users to view through the client, improving the user's scene monitoring experience.
  • The confidence between the local image of each target frame size and the abnormal image corresponding to that target frame size is determined.
  • Determining the corresponding abnormal image in this way avoids calculating the confidence between the local image and all abnormal images, reduces the amount of computation, and effectively improves the efficiency of scene monitoring.
  • In an embodiment, after determining abnormal information from the original image according to the preset neural network model, the method further includes:
  • encoding the original image corresponding to the abnormal information into a video stream, and sending the video stream to the client.
  • The video stream is sent to the client so that users can visually view the abnormal conditions of the scene.
  • The abnormal information and the original image to which it belongs can also be sent to the client; that is, the client can receive both dynamic videos and static pictures, which is convenient for users to view and improves the user experience.
  • Sending the video stream to the client includes:
  • pushing the video stream to the cloud platform according to a preset real-time messaging protocol, so that the cloud platform pushes the video stream to the client according to a preset web real-time communication protocol.
  • The edge device uses the real-time messaging protocol to push the video stream to the cloud platform.
  • The cloud platform supports converting the real-time-messaging-protocol video stream into a web-real-time-communication-protocol video stream.
  • The cloud platform then uses the web real-time communication protocol to push the video stream to the client. This effectively reduces delay, achieves the effect of a real-time live broadcast, and improves the efficiency of scene monitoring.
  • Sending the abnormal information to the client for an alarm includes: determining the collection time and collection location of the original image corresponding to the abnormal information, and sending them to the client together with the abnormal information.
  • In this way the user can know the time and location of the abnormality, which facilitates timely processing of abnormal situations and improves the efficiency and accuracy of scene monitoring.
  • Obtaining the original image in the preset scene includes:
  • using a camera to automatically obtain video of the preset scene and derive the original images, which facilitates acquiring the real conditions of each preset scene, keeps the user informed of the in-scene situation in a timely manner, and effectively improves the efficiency and accuracy of scene monitoring.
  • This application further discloses a scene monitoring device based on edge computing.
  • the device is configured on an edge device and includes:
  • An image acquisition module, used to acquire original images in preset scenes;
  • An information determination module configured to determine abnormal information from the original image according to a preset neural network model, wherein the abnormal information is used to represent information that needs to be warned in a preset scene, and the abnormal information is the Information in the abnormal image area in the original image, the abnormal image area is determined based on the confidence between the abnormal image area and the preset abnormal image;
  • An alarm module is used to send the abnormal information to the client for alarm.
  • This application further provides an electronic device, including a processor and a memory communicatively connected to the processor;
  • the memory stores computer-executable instructions;
  • and the processor executes the computer-executable instructions stored in the memory to implement the edge computing-based scene monitoring method described in the first aspect.
  • The present application further provides a computer-readable storage medium.
  • The computer-readable storage medium stores computer-executable instructions which, when executed by a processor, implement the edge computing-based scene monitoring method described in the first aspect.
  • The present application further provides a computer program product, including a computer program that, when executed by a processor, implements the edge computing-based scene monitoring method described in the first aspect.
  • In summary, this application provides a scene monitoring method, device, equipment and storage medium based on edge computing.
  • During video monitoring of the preset scene, image frames within the scene are obtained as original images.
  • Feature extraction is performed on the original images to obtain the abnormal information in them.
  • The edge device directly sends the abnormal information to the client, and the client issues an alarm. This avoids the process of the edge device sending the video to the cloud platform, the cloud platform identifying the anomalies, and then the cloud platform sending the identification results to the client. It solves the delay caused by network transmission in the prior art, reduces the pressure on the cloud platform, and improves the efficiency and accuracy of scene monitoring.
  • Figure 1 is a structural block diagram of an edge computing-based scene monitoring system provided by an embodiment of the present application.
  • Figure 2 is a schematic flowchart of an edge computing-based scene monitoring method provided by an embodiment of the present application
  • Figure 3 is a structural block diagram of an edge computing-based scene monitoring system provided by an embodiment of the present application.
  • Figure 4 is a schematic flowchart of an edge computing-based scene monitoring method provided by an embodiment of the present application.
  • Figure 5 is a schematic flow chart of an edge computing-based scene monitoring method provided by an embodiment of the present application.
  • Figure 6 is a structural block diagram of an edge computing-based scene monitoring device provided by an embodiment of the present application.
  • Figure 7 is a structural block diagram of an edge computing-based scene monitoring device provided by an embodiment of the present application.
  • Figure 8 is a structural block diagram of an electronic device provided by an embodiment of the present application.
  • Figure 9 is a structural block diagram of an electronic device provided by an embodiment of the present application.
  • Figure 1 is a structural block diagram of a scene monitoring system based on edge computing in this application.
  • In the prior art, the edge device uses a camera device to collect scene video of the production site and sends the collected video data to the cloud platform.
  • The cloud platform identifies the video data and determines whether there is an abnormality at the production site; that is, the response service for abnormalities is deployed on the cloud platform. If it is determined that an abnormality exists, the cloud platform can send the abnormality information to the client, for example by pushing it to an enterprise platform or a safety-production big-data platform on the client.
  • The client can be a mobile phone or a PC (Personal Computer), etc.
  • In this architecture, the cloud platform needs to identify abnormal information in the video data of multiple edge devices.
  • The cloud platform is therefore under great pressure and is prone to errors, making scene monitoring less accurate.
  • In addition, the edge devices transmit data to the cloud platform, and the cloud platform then transmits data to the client; multiple network transmissions easily cause high delays and affect the efficiency of scene monitoring.
  • This application provides a scene monitoring method, device, equipment and storage medium based on edge computing, aiming to solve the above technical problems of the prior art.
  • FIG. 2 is a schematic flowchart of an edge computing-based scene monitoring method provided according to an embodiment of the present application.
  • The method provided in this embodiment is applied to edge devices and executed by an edge computing-based scene monitoring device. As shown in Figure 2, the method includes the following steps:
  • The edge device may be a device arranged on the side of a preset scene; there may be multiple preset scenes, and one edge device may monitor multiple preset scenes.
  • The preset scene can be an industrial production site, and the edge device can be a mobile terminal or similar device.
  • An image collection device can be installed on the edge device and arranged in the preset scene to collect video of the preset scene in real time or at scheduled times.
  • The edge device obtains the video collected by the image collection device.
  • The video includes multiple video frames, and each video frame can serve as an original image, thereby obtaining the original images in the preset scene.
  • The edge device can also directly capture images of the preset scene in real time or on a schedule without collecting video; the resulting images are the original images.
  • For example, the edge device can acquire an original image every 10 minutes through the image acquisition device, or acquire a 3-minute video every 10 minutes and use the video frames of that 3-minute video as the original images.
  • In an embodiment, obtaining the original image in the preset scene includes: using a camera installed on the edge device to obtain video frames of the preset scene as original images based on a preset data collection cycle.
  • In this embodiment, the edge device is equipped with an image acquisition device, and the image acquisition device may be a camera.
  • The data collection period is the period at which the camera collects video of the preset scene.
  • For example, the data collection period may be 10 minutes; that is, a video of the preset scene is collected every 10 minutes. The length of each video can also be preset, for example 5 minutes per collection.
  • The camera thus collects videos of the preset scene on a schedule.
  • The edge device obtains the video of the preset scene, decomposes the video into its video frames, and determines each video frame as an original image.
  • The beneficial effect of this setting is that the camera automatically obtains the video of the preset scene and thereby the original images, which facilitates acquiring the real conditions of each preset scene, keeps the user informed in a timely manner, and effectively improves the efficiency and accuracy of scene monitoring. A sketch of such a collection loop follows.
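  • For illustration only, a minimal sketch of such a timed collection loop is shown below. OpenCV is assumed as the capture library, and the 10-minute period and 5-minute clip length are the example values above, not fixed by the method:

        import time
        import cv2  # assumed capture library; any camera SDK would serve

        COLLECTION_PERIOD_S = 10 * 60  # example data collection period: 10 minutes
        CLIP_LENGTH_S = 5 * 60         # example clip length: 5 minutes

        def collect_original_images(camera_index=0):
            """Periodically record a short clip and yield its frames as original images."""
            while True:
                cap = cv2.VideoCapture(camera_index)
                start = time.time()
                while time.time() - start < CLIP_LENGTH_S:
                    ok, frame = cap.read()
                    if not ok:
                        break
                    yield frame  # each video frame is treated as one original image
                cap.release()
                time.sleep(max(0.0, COLLECTION_PERIOD_S - (time.time() - start)))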
  • S202: Determine abnormal information from the original image according to the preset neural network model.
  • The abnormal information is used to represent information that needs to be warned about in the preset scene.
  • The abnormal information is the information in the abnormal image area of the original image.
  • The abnormal image area is determined based on the confidence between the abnormal image area and the preset abnormal image.
  • Abnormal information refers to information that requires an alarm in the preset scene.
  • Abnormal information can include the unsafe status of equipment and unsafe behaviors of personnel.
  • The unsafe status of equipment can include exposed live wires, rotating or cutting equipment and other dangerous equipment lacking protection, toxic gas leaks, and the like; unsafe behaviors of personnel may include workers not wearing safety helmets or safety clothing at the construction site, not correctly wearing safety harnesses when working at heights, or smoking or making phone calls near flammable and explosive equipment.
  • Through the neural network model it can be identified whether abnormal information exists in the original image, and if so, the abnormal information in the original image is determined. For example, it is possible to identify whether there is a person in the original image; if so, identify whether the person is wearing a safety helmet; if not, it is determined that abnormal information exists in the original image, namely that the person is not wearing a safety helmet.
  • The neural network model can include convolutional layers, pooling layers, fully connected layers, etc. Feature extraction is performed through the neural network model to obtain the image features of the original image, and abnormal information is obtained based on those image features. For example, people can be identified from the image features to determine whether a person is wearing a safety helmet; or the clarity of the original image can be determined, and based on the clarity it can be judged whether abnormal information such as smoke exists in the preset scene.
  • The abnormal information may be the information in an abnormal image area of the original image, where the abnormal image area is an area of the original image in which abnormal information exists.
  • The abnormal image area may be a rectangular-frame area.
  • The abnormal image area can be located at any position in the original image.
  • For example, the abnormal image area can be at the upper right or in the middle of the original image.
  • The original image can be divided into multiple areas; whether abnormal information exists in each area is determined, and an area with abnormal information is determined as an abnormal image area.
  • The abnormal image area may be determined based on the confidence between the abnormal image area and the preset abnormal image.
  • The preset abnormal image is a preset image containing abnormal information. The confidence between each area in the original image and the preset abnormal image is calculated, and based on the confidence it is determined whether each area of the original image is an abnormal image area. For example, an area whose confidence exceeds a preset confidence threshold is determined to be an abnormal image area in which abnormal information exists.
  • In other words, the abnormal image area in which abnormal information exists can first be determined through the confidence; the abnormal image area is then identified, and the abnormal information in it is taken as the abnormal information of the original image in which the abnormal image area is located.
  • In an embodiment, the preset neural network model may be an autoencoder.
  • The autoencoder is pre-trained, and the training samples are normal scene pictures without abnormal information. The training samples are input into the autoencoder, which outputs reconstructed images close to the normal scene pictures. The generalization ability of the autoencoder is deliberately suppressed to force the reconstructed image to approach the normal scene image, completing the training of the autoencoder.
  • During monitoring, the original image is input into the autoencoder.
  • The similarity between the original image and the reconstructed image is then determined, and based on the similarity it can be judged whether an abnormality exists. If the original image is a normal scene image, the similarity between the original image and the reconstructed image is high; if abnormal information exists in the original image, the similarity is low.
  • A similarity threshold can be preset: if the similarity is lower than the similarity threshold, it is determined that the similarity between the original image and the reconstructed image is low and abnormal information exists in the original image. A sketch of this check follows.
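  • A minimal sketch of this reconstruction-based check is shown below (PyTorch assumed; the architecture and threshold are illustrative, not the patent's actual model):

        import torch
        import torch.nn as nn

        class ConvAutoencoder(nn.Module):
            """Toy autoencoder trained only on normal scene images."""
            def __init__(self):
                super().__init__()
                self.encoder = nn.Sequential(
                    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                )
                self.decoder = nn.Sequential(
                    nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
                    nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid(),
                )

            def forward(self, x):
                return self.decoder(self.encoder(x))

        SIMILARITY_THRESHOLD = 0.9  # illustrative value

        def is_abnormal(model, image):
            """Low similarity between input and reconstruction signals an anomaly."""
            with torch.no_grad():
                recon = model(image.unsqueeze(0)).squeeze(0)
            similarity = 1.0 - torch.mean((image - recon) ** 2).item()  # values in [0, 1]
            return similarity < SIMILARITY_THRESHOLD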
  • A supervised neural network model can also be pre-trained to obtain the abnormal information of the original image.
  • The autoencoder is an unsupervised neural network model, thus realizing a combined application of supervised and unsupervised neural network models.
  • Both the unsupervised and the supervised neural network models can be pre-trained.
  • Deep neural network acceleration and optimization techniques can be used to optimize the neural network model and improve its efficiency and accuracy.
  • The optimization and acceleration directions of the model can include convolution optimization, model pruning, and model quantization.
  • Model quantization can convert the parameters of the quantizer into trainable objects, thereby constructing a mapping from the high-bit representation of the model weights and activations to a low-bit representation.
  • Quantization-function training can be performed on key layers to speed up quantization-aware training, allowing the model to quickly recover its accuracy after only a small amount of quantization-aware training, as in the sketch below.
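  • One way to read "trainable quantizer parameters" is a learned-step fake quantizer, sketched below in PyTorch. This is an assumption about the general technique (LSQ-style quantization-aware training), not the patent's exact scheme:

        import torch
        import torch.nn as nn

        class LearnableFakeQuantize(nn.Module):
            """Maps full-precision values onto a low-bit grid with a trainable step size."""
            def __init__(self, bits=8):
                super().__init__()
                self.qmax = 2 ** (bits - 1) - 1
                self.scale = nn.Parameter(torch.tensor(0.1))  # trainable quantizer parameter

            def forward(self, x):
                v = torch.clamp(x / self.scale, -self.qmax - 1, self.qmax)
                # straight-through estimator: round in the forward pass, identity in backward
                q = v + (torch.round(v) - v).detach()
                return q * self.scale  # high-bit -> low-bit -> dequantized mapping

  • Such a module can be attached to the weights and activations of key layers only, so that a short quantization-aware fine-tuning run is enough to recover accuracy.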
  • If there is no abnormal information in the original image, the edge device continues to obtain a new original image in the preset scene and determines whether abnormal information exists in it. If abnormal information exists in the original image, the edge device sends the abnormal information to the client, and the client raises an alarm and issues prompt information.
  • The prompt information can be a text message, a voice prompt, etc.
  • Edge devices can also issue prompts, such as broadcasts, within the preset scene.
  • The edge device can be bound to one or more clients in advance and, after determining the abnormal information, send it to the bound clients.
  • FIG. 3 is a structural block diagram of an edge computing-based scene monitoring system in an embodiment of the present application.
  • As shown in Figure 3, the scene monitoring system based on edge computing can include edge devices and clients, without a cloud platform. The edge device communicates directly with the client: the edge device performs image recognition and determines the abnormal information, sends the abnormal information to the client, and the client handles the alarm. This reduces the pressure on the cloud platform, reduces network transmission delays, and improves monitoring efficiency.
  • In an embodiment, sending the abnormal information to the client for an alarm includes: determining the collection time and collection location of the original image corresponding to the abnormal information, and sending the collection time, collection location and abnormal information to the client.
  • The collection time and collection location of the original image with abnormal information are determined; for example, the number of the preset scene corresponding to the original image can be used as the collection location.
  • When the edge device obtains original images, it stores the collection time and collection location of each original image to facilitate subsequently sending them to the client.
  • The beneficial effect of this setting is that, by obtaining the collection time and collection location of the original image, the user can know the time and location of the abnormality, which facilitates timely processing of abnormal situations and improves the efficiency and accuracy of scene monitoring. A sketch of such an alarm payload follows.
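  • For illustration, the alarm message could be a small structured payload carrying the abnormal information together with the stored collection time and location; the field names and endpoint below are hypothetical:

        import json
        import time
        from urllib import request

        def send_alarm(abnormal_info, scene_id, capture_ts,
                       client_url="http://client.example/alarm"):  # hypothetical endpoint
            payload = {
                "abnormal_info": abnormal_info,       # e.g. "not wearing a safety helmet"
                "collection_location": scene_id,      # preset-scene number used as the location
                "collection_time": time.strftime(
                    "%Y-%m-%d %H:%M:%S", time.localtime(capture_ts)),
            }
            req = request.Request(client_url, data=json.dumps(payload).encode(),
                                  headers={"Content-Type": "application/json"})
            request.urlopen(req)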
  • An embodiment of the present application provides a scene monitoring method based on edge computing.
  • Image frames within the scene are obtained as original images.
  • Feature extraction is performed on the original images to obtain the abnormal information in them.
  • The edge device directly sends the abnormal information to the client, and the client issues an alarm. This avoids the process of the edge device sending the video to the cloud platform, the cloud platform identifying the anomaly, and then the cloud platform sending the anomaly information to the client. It solves the delay caused by network transmission in the prior art, reduces the pressure on the cloud platform, and improves the efficiency and accuracy of scene monitoring.
  • Figure 4 is a schematic flowchart of a scene monitoring method based on edge computing provided by an embodiment of the present application. This embodiment is an optional embodiment based on the above embodiment.
  • In this embodiment, determining the abnormal information from the original image according to the preset neural network model can be refined as follows: input the original image into the preset neural network model, and traverse the original image according to the preset target frame size to obtain at least two partial images in the original image, where the size of each partial image is the preset target frame size; extract the image features of the partial images, and determine the confidence level of each partial image based on its image features, where the confidence level represents the confidence between the local image and the preset abnormal image; and determine the abnormal information in the original image according to the confidence levels of the partial images.
  • the method includes the following steps:
  • For this step, refer to step S201 above; it will not be described again.
  • S402: Input the original image into the preset neural network model, and traverse the original image according to the preset target frame size to obtain at least two partial images in the original image, where the size of each partial image is the preset target frame size.
  • A neural network model is built in advance; its input is the original image and its output is the abnormal information in the original image.
  • The original image is input into the neural network model, in which the target frame size is set.
  • The target frame size is equal to or smaller than the size of the original image.
  • For example, the target frame size can be 10×10 pixels.
  • Traversing the original image means performing frame selection on the original image based on the target frame size.
  • The original image is divided according to its pixel ordering, and multiple images of the target frame size are selected as partial images.
  • Each pixel in the original image can be assigned to at least one partial image. For example, if the target frame size is 10×10, the first 10 rows and 10 columns of the original image can be taken as one partial image; then, starting from the second column, the next 10×10 area is selected as another partial image, i.e., the area covering the first 10 rows and columns 2 to 11.
  • In this way, each pixel in the original image can be identified via the partial images, avoiding missed information and improving the accuracy of scene monitoring; a sliding-window sketch follows.
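  • A minimal NumPy sketch of this stride-1 traversal is shown below (the 10×10 size is the example above; in practice a convolutional network performs this windowing implicitly):

        import numpy as np

        def traverse(original, box=10, stride=1):
            """Yield every box-by-box partial image so each pixel falls in at least one window."""
            h, w = original.shape[:2]
            for top in range(0, h - box + 1, stride):
                for left in range(0, w - box + 1, stride):
                    yield original[top:top + box, left:left + box]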
  • Target frame sizes of different sizes can be preset.
  • For example, the target frame sizes can be 10×10 and 80×80, with 10×10 being a small target frame and 80×80 a large target frame.
  • The pixels of the original image are traversed so that each pixel can be assigned to partial images of each target frame size.
  • The number of partial images can differ between target frame sizes. This supports multi-scale anomaly recognition, with good recognition of both large-target-frame and small-target-frame anomalies, adapting to the multi-scale features of industrial production scenarios.
  • Multi-scale here refers to the diversity of target frame sizes.
  • For example, if the entire human body is to be recognized in an industrial production scenario, anomalies can be identified from partial images of large target frames; if safety helmets on human heads are to be recognized, anomalies can be identified from partial images of small target frames.
  • The neural network model includes convolutional layers, pooling layers, fully connected layers, etc., which can extract features from images. After the partial images are obtained, the image features of each partial image are extracted by the neural network model.
  • The abnormal image is an image containing abnormal information.
  • For example, the abnormal image may be an image of a person not wearing a safety helmet.
  • The confidence between each local image and each abnormal image is calculated and taken as the confidence level of the local image. If there are two preset abnormal images, two confidence levels are obtained for each partial image.
  • Based on the confidence level of a local image, it can be determined whether the local image is similar to an abnormal image: the greater the confidence, the higher the similarity between the local image and the abnormal image, and the greater the possibility that abnormal information exists in the local image.
  • In an embodiment, the preset target frame size comprises at least two preset sizes, and determining the confidence level of the local image according to its image features includes: determining, based on the association between the different preset target frame sizes and the abnormal images, the abnormal images corresponding to each target frame size; and determining the confidence between the local image of each target frame size and the abnormal images corresponding to that target frame size.
  • That is, target frames of different sizes are preset, and abnormal images of different sizes can be preset according to the preset target frame sizes.
  • Each target frame size can correspond to multiple abnormal images.
  • The association between a target frame size and abnormal images can be set in advance. For example, for a target frame size of 10×10, the associated abnormal image can be an image of a small area such as a human head; for a target frame size of 80×80, the associated abnormal image may be an image of a large area such as the entire human body.
  • Based on the target frame size of a local image, the corresponding abnormal images are determined, and the confidence between the local image and each corresponding abnormal image is calculated to obtain the confidence levels of the local image. Multiple confidence levels can thus be obtained for each local image, to determine whether it is similar to multiple abnormal images.
  • The beneficial effect of this setting is that the corresponding abnormal images are determined based on the target frame size of the local image, which avoids calculating the confidence between the local image and all abnormal images, reduces the amount of computation, and effectively improves the efficiency of scene monitoring, as the sketch below illustrates.
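  • A sketch of restricting the comparison to the abnormal images associated with the window's own size is shown below; cosine similarity between feature vectors stands in for the model's confidence score, which is an assumption rather than the patent's stated measure:

        import numpy as np

        # Hypothetical registry: target frame size -> feature vectors of its abnormal images.
        ABNORMAL_FEATURES = {
            10: [np.ones(128)],  # e.g. small regions such as a head without a helmet
            80: [np.ones(128)],  # e.g. large regions such as a whole body
        }

        def confidences(local_feature, box):
            """Compare a partial image's features only with size-matched abnormal images."""
            scores = []
            for ref in ABNORMAL_FEATURES[box]:
                cos = float(local_feature @ ref /
                            (np.linalg.norm(local_feature) * np.linalg.norm(ref) + 1e-8))
                scores.append(cos)  # one confidence per associated abnormal image
            return scores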
  • After the confidence level of each local image is obtained, whether abnormal information exists in the local image is determined based on that confidence. If it exists, it is determined that abnormal information exists in the original image in which the partial image is located; determining the abnormal information in the partial image thus yields the abnormal information in the original image.
  • The confidence level of a local image is calculated against multiple abnormal images, so the confidence levels of the local image can be compared. The greater a confidence level, the closer the local image is to the abnormal image corresponding to that confidence level; if the local image is close to an abnormal image, it is determined that abnormal information exists in the local image. That is, by comparing confidence levels it can be determined whether abnormal information exists in the local image.
  • The image content of the partial image can be extracted as text data. For example, if the partial image shows a person not wearing a safety helmet, the abnormal information can be "not wearing a safety helmet". Abnormal information is thus determined automatically, saving manpower and time and improving scene monitoring efficiency.
  • In an embodiment, determining the abnormal information in the original image according to the confidence level of the partial image includes: determining a partial image whose confidence level exceeds a preset confidence threshold as an abnormal image area; converting the abnormal image area into text data; and determining the text data as the abnormal information of the original image.
  • The confidence threshold is set in advance and is the maximum confidence allowed when no abnormal information exists in the local image; that is, if the confidence level of a partial image exceeds the confidence threshold, it is determined that abnormal information exists in that partial image.
  • The abnormal information of each abnormal image can be set in advance, with the preset abnormal information in text-data format. After determining that abnormal information exists in a local image, the abnormal image whose confidence with the local image exceeds the confidence threshold is determined, and the abnormal information of that abnormal image is taken as the abnormal information of the local image, i.e., as the abnormal information of the original image.
  • The beneficial effect of this setting is that the confidence level makes it possible to quickly determine whether a local image is abnormal, improving monitoring efficiency, and that abnormal content is converted into text format for users to view through the client, improving the user's scene monitoring experience. A sketch of this thresholding follows.
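  • The thresholding and text lookup might look like the following sketch, with an illustrative threshold and hypothetical preset messages:

        CONFIDENCE_THRESHOLD = 0.8  # illustrative value

        # Preset text data for each abnormal image (hypothetical entries).
        ABNORMAL_TEXT = {
            "no_helmet": "not wearing a safety helmet",
            "smoke": "smoke detected near equipment",
        }

        def to_abnormal_info(scores):
            """Abnormal images whose confidence exceeds the threshold become text alarms."""
            return [ABNORMAL_TEXT[name] for name, conf in scores.items()
                    if conf > CONFIDENCE_THRESHOLD]

  • Here scores maps each preset abnormal image to the best confidence any partial image achieved against it; the returned strings are the abnormal information of the original image.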
  • For this step, refer to step S203 above; it will not be described again.
  • An embodiment of the present application provides a scene monitoring method based on edge computing.
  • Image frames within the scene are obtained as original images.
  • Feature extraction is performed on the original images to obtain the abnormal information in them.
  • The edge device directly sends the abnormal information to the client, and the client issues an alarm. This avoids the process of the edge device sending the video to the cloud platform, the cloud platform identifying the anomaly, and then the cloud platform sending the anomaly information to the client. It solves the delay caused by network transmission in the prior art, reduces the pressure on the cloud platform, and improves the efficiency and accuracy of scene monitoring.
  • Figure 5 is a schematic flowchart of a scene monitoring method based on edge computing provided by an embodiment of the present application. This embodiment is an optional embodiment based on the above embodiment.
  • After the abnormal information is determined, the original image corresponding to the abnormal information is encoded into a video stream, and the video stream is sent to the client.
  • the method includes the following steps:
  • For this step, refer to step S201 above; it will not be described again.
  • The abnormal information is used to represent the information that needs to be warned about in the preset scene.
  • The abnormal information is the information in the abnormal image area of the original image.
  • The abnormal image area is determined based on the confidence between the abnormal image area and the preset abnormal image.
  • For this step, refer to step S202 above; it will not be described again.
  • A hardware video stream encoder is preset in the edge device and can be used to encode multiple original images into a video stream.
  • Encoding can be performed by the GPU (Graphics Processing Unit) rather than the CPU (Central Processing Unit).
  • Consecutive original images containing abnormal information are determined, and those consecutive original images are encoded into a video stream. If there is only one original image with abnormal information, or the original images with abnormal information are not consecutive, a preset number of original images before and after each such image can be determined and encoded into a video stream, as sketched below.
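  • As one possible rendering of this step, OpenCV's VideoWriter can stand in for the edge device's hardware encoder in a sketch; the codec choice and frame rate are assumptions:

        import cv2

        def encode_clip(frames, path="abnormal.mp4", fps=25):
            """Encode the consecutive original images around an anomaly into one clip."""
            h, w = frames[0].shape[:2]
            writer = cv2.VideoWriter(path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
            for frame in frames:
                writer.write(frame)  # frames are BGR images of identical size
            writer.release()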
  • The video stream is sent to the client so that users can visually view the abnormal conditions of the scene.
  • The abnormal information and the original image to which it belongs can also be sent to the client; that is, the client can receive both dynamic videos and static pictures, which is convenient for users to view and improves the user experience.
  • In an embodiment, sending the video stream to the client includes: pushing the video stream to the cloud platform according to a preset real-time messaging protocol, so that the cloud platform pushes the video stream to the client according to a preset web real-time communication protocol.
  • In this embodiment, the edge device can also communicate with the cloud platform, and the cloud platform with the client.
  • The edge device can send the video stream to the cloud platform according to the preset transmission protocol, and the cloud platform then sends the video stream to the client.
  • Specifically, the edge device pushes the video stream to the cloud platform according to RTMP (Real Time Messaging Protocol), and the cloud platform pushes the video stream to the client according to WebRTC (Web Real-Time Communication).
  • The beneficial effect of this setting is that, by adopting real-time audio and video communication technology, the edge device uses the RTMP protocol to push video streams to the cloud platform.
  • The cloud platform supports converting RTMP video streams into WebRTC video streams and uses the WebRTC protocol to push the video to the client. That is, the video stream is published from the edge device to the cloud platform, and the cloud platform sends the video stream to the client through a streaming-media processing service, effectively reducing delay.
  • The measured delay is 300 milliseconds, achieving the effect of a real-time live broadcast and improving the efficiency of scene monitoring. A sketch of the RTMP leg follows.
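  • One common way to implement the RTMP leg is to pipe raw frames into an ffmpeg process, as in the sketch below; the flags are standard ffmpeg options, and the RTMP URL is a placeholder:

        import subprocess

        def open_rtmp_pipe(width, height, fps=25,
                           url="rtmp://cloud.example/live/scene1"):  # placeholder URL
            """Start ffmpeg reading raw BGR frames on stdin and pushing an RTMP stream."""
            cmd = [
                "ffmpeg", "-y",
                "-f", "rawvideo", "-pix_fmt", "bgr24",
                "-s", f"{width}x{height}", "-r", str(fps),
                "-i", "-",                        # raw frames arrive on stdin
                "-c:v", "libx264", "-preset", "ultrafast", "-tune", "zerolatency",
                "-f", "flv", url,                 # RTMP carries FLV-muxed H.264
            ]
            return subprocess.Popen(cmd, stdin=subprocess.PIPE)

  • Each captured frame is then written with proc.stdin.write(frame.tobytes()); the cloud platform's streaming service handles the RTMP-to-WebRTC conversion.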
  • An embodiment of the present application provides a scene monitoring method based on edge computing.
  • Image frames within the scene are obtained as original images.
  • Feature extraction is performed on the original images to obtain the abnormal information in them.
  • The edge device directly sends the abnormal information to the client, and the client issues an alarm. This avoids the process of the edge device sending the video to the cloud platform, the cloud platform identifying the anomaly, and then the cloud platform sending the anomaly information to the client. It solves the delay caused by network transmission in the prior art, reduces the pressure on the cloud platform, and improves the efficiency and accuracy of scene monitoring.
  • FIG. 6 is a structural block diagram of an edge computing-based scene monitoring device provided by an embodiment of the present application.
  • the device is configured on an edge device. For convenience of explanation, only parts related to the embodiments of the present disclosure are shown.
  • the device includes: an image acquisition module 601 , an information determination module 602 and an alarm module 603 .
  • the image acquisition module 601 is used to acquire the original image in the preset scene
  • the information determination module 602 is used to determine abnormal information from the original image according to the preset neural network model, where the abnormal information is used to represent information that needs to be warned in the preset scene, and the abnormal information is the Information in the abnormal image area in the original image, where the abnormal image area is determined based on the confidence between the abnormal image area and the preset abnormal image;
  • the alarm module 603 is used to send the abnormal information to the client for alarm.
  • Figure 7 is a structural block diagram of an edge computing-based scene monitoring device provided by an embodiment of the present application. Based on the embodiment shown in Figure 6, as shown in Figure 7, the information determination module 602 includes a local image acquisition unit 6021, Confidence determining unit 6022 and anomaly information determining unit 6023.
  • The local image obtaining unit 6021 is used to input the original image into the preset neural network model and traverse the original image according to the preset target frame size to obtain at least two partial images in the original image, where the size of each partial image is the preset target frame size;
  • the confidence determining unit 6022 is configured to extract the image features of the partial image and determine the confidence level of the partial image based on those image features, where the confidence level represents the confidence between the partial image and the preset abnormal image;
  • the abnormal information determining unit 6023 is configured to determine the abnormal information in the original image according to the confidence level of the partial image.
  • The abnormal information determining unit 6023 is specifically used for:
  • the abnormal image area is converted into text data, and the text data is determined as abnormal information of the original image.
  • the preset target frame size is at least two preset sizes; the confidence determination unit 6022 is specifically used to:
  • the confidence between the local image of each target frame size and the abnormal image corresponding to the target frame size is determined.
  • the device also includes:
  • a video stream sending module, configured to encode the original image corresponding to the abnormal information into a video stream according to a preset hardware video stream encoder after the abnormal information is determined from the original image according to the preset neural network model, and to send the video stream to the client.
  • the video stream sending module is specifically used for:
  • the video stream is pushed to the cloud platform according to the preset real-time message transmission protocol, and the cloud platform pushes the video stream to the client according to the preset web page real-time communication protocol.
  • the alarm module 603 is specifically used for:
  • the image acquisition module 601 is specifically used for:
  • Figure 8 is a structural block diagram of an electronic device provided by an embodiment of the present application. As shown in Figure 8, the electronic device includes a memory 81 and a processor 82; the memory 81 is used to store instructions executable by the processor 82.
  • the processor 82 is configured to execute the method provided by the above embodiment.
  • the electronic device also includes a receiver 83 and a transmitter 84.
  • the receiver 83 is used to receive instructions and data sent by other devices, and the transmitter 84 is used to send instructions and data to external devices.
  • Figure 9 is a structural block diagram of an electronic device according to an exemplary embodiment.
  • the device may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal Digital assistants, vehicles and more.
  • Device 900 may include one or more of the following components: processing component 902 , memory 904 , power component 906 , multimedia component 908 , audio component 910 , input/output (I/O) interface 912 , sensor component 914 , and communications component 916 .
  • Processing component 902 generally controls the overall operations of device 900, such as operations associated with display, phone calls, data communications, camera operations, and recording operations.
  • the processing component 902 may include one or more processors 920 to execute instructions to complete all or part of the steps of the above method.
  • processing component 902 may include one or more modules that facilitate interaction between processing component 902 and other components.
  • processing component 902 may include a multimedia module to facilitate interaction between multimedia component 908 and processing component 902.
  • Memory 904 is configured to store various types of data to support operations at device 900 . Examples of such data include instructions for any application or method operating on device 900, contact data, phonebook data, messages, pictures, videos, etc.
  • Memory 904 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
  • Power supply component 906 provides power to the various components of device 900 .
  • Power supply components 906 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to device 900 .
  • Multimedia component 908 includes a screen that provides an output interface between the device 900 and the user.
  • the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user.
  • the touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may not only sense the boundaries of a touch or slide action, but also detect the duration and pressure associated with the touch or slide action.
  • multimedia component 908 includes a front-facing camera and/or a rear-facing camera.
  • the front camera and/or the rear camera may receive external multimedia data.
  • Each front-facing and rear-facing camera can be a fixed optical lens system or have focal-length and optical-zoom capability.
  • Audio component 910 is configured to output and/or input audio signals.
  • audio component 910 includes a microphone (MIC) configured to receive external audio signals when device 900 is in operating modes, such as call mode, recording mode, and speech recognition mode. The received audio signals may be further stored in memory 904 or sent via communications component 916 .
  • audio component 910 also includes a speaker for outputting audio signals.
  • the I/O interface 912 provides an interface between the processing component 902 and a peripheral interface module, which may be a keyboard, a click wheel, a button, etc. These buttons may include, but are not limited to: Home button, Volume buttons, Start button, and Lock button.
  • Sensor component 914 includes one or more sensors that provide various aspects of status assessment for device 900 .
  • For example, the sensor component 914 can detect the open/closed state of the device 900 and the relative positioning of components such as its display and keypad; it can also detect a change in position of the device 900 or one of its components, the presence or absence of user contact with the device 900, the orientation or acceleration/deceleration of the device 900, and temperature changes of the device 900.
  • Sensor assembly 914 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact.
  • Sensor assembly 914 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
  • the sensor component 914 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
  • Communication component 916 is configured to facilitate wired or wireless communications between device 900 and other devices.
  • Device 900 can access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof.
  • the communication component 916 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communications component 916 also includes a near field communications (NFC) module to facilitate short-range communications.
  • In an exemplary embodiment, the device 900 may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for executing the above method.
  • In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions, such as the memory 904 including instructions, is also provided; the instructions are executable by the processor 920 of the device 900 to complete the above method.
  • the non-transitory computer-readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
  • When instructions in such a non-transitory computer-readable storage medium are executed by a processor of a terminal device, they enable the terminal device to perform the above edge computing-based scene monitoring method.
  • This application also discloses a computer program product, which includes a computer program that implements the method described in this embodiment when executed by a processor.
  • Various implementations of the systems and techniques described above may be realized in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOC), complex programmable logic devices (CPLD), computer hardware, firmware, software, and/or combinations thereof.
  • These various embodiments may include implementation in one or more computer programs executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
  • Program code for implementing the methods of the present application may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing device, such that when executed by the processor or controller, the program code causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented.
  • the program code may execute entirely on the machine, partially on the machine, as a stand-alone software package, partially on the machine and partially on a remote machine or entirely on the remote machine or electronic device.
  • A machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium, and may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses or devices, or any suitable combination of the foregoing.
  • More specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • The systems and techniques described herein may be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and pointing device (e.g., a mouse or trackball) through which the user can provide input to the computer.
  • Other kinds of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form, including acoustic, speech, or tactile input.
  • The systems and techniques described herein may be implemented in a computing system that includes back-end components (e.g., as a data electronic device), or a computing system that includes middleware components (e.g., an application electronic device), or a computing system that includes front-end components (e.g., a user computer having a graphical user interface or a web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components.
  • The components of the system may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LAN), wide area networks (WAN), and the Internet.
  • A computer system may include clients and electronic devices. Clients and electronic devices are generally remote from each other and typically interact through a communication network. The relationship of client and electronic device arises by virtue of computer programs running on the respective computers and having a client-electronic-device relationship with each other.
  • The electronic device can be a cloud electronic device, also known as a cloud computing electronic device or cloud host, a host product in the cloud computing service system that overcomes the shortcomings of difficult management and weak business scalability found in traditional physical hosts and VPS ("Virtual Private Server") services.
  • The electronic device can also be an electronic device of a distributed system, or an electronic device combined with a blockchain.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Emergency Management (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

A scene monitoring method, apparatus, device and storage medium based on edge computing. The method is applied to an edge device and includes: acquiring an original image of a preset scene (S201); determining abnormal information from the original image according to a preset neural network model, where the abnormal information represents information in the preset scene that requires an alarm, the abnormal information is information in an abnormal image region of the original image, and the abnormal image region is determined based on the confidence between that region and a preset abnormal image (S202); and sending the abnormal information to a client for alarming (S203). Abnormal information is determined on the edge side and sent directly from the edge side to the client, reducing the latency introduced by network transmission between the edge side and the cloud platform and between the cloud platform and the client, and improving alarm efficiency.

Description

Scene monitoring method, apparatus, device and storage medium based on edge computing
This application claims priority to the Chinese patent application No. 2022107276426, filed with the China National Intellectual Property Administration on June 22, 2022 and entitled "Scene monitoring method, apparatus, device and storage medium based on edge computing", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to video surveillance technology, and more particularly to a scene monitoring method, apparatus, device and storage medium based on edge computing.
Background
Unsafe states of equipment and unsafe human behavior can cause major safety incidents. Introducing new artificial intelligence technology into the field of industrial safety enables real-time monitoring of and alarming on abnormal scenes and abnormal behavior, so that accident hazards are discovered early and industrial safety is safeguarded. Common unsafe equipment states include exposed live wires, dangerous equipment such as rotating or cutting machinery lacking guards, and toxic gas leaks; unsafe human behavior includes workers not wearing safety helmets or safety clothing on site, not wearing safety harnesses correctly when working at height, and smoking or making phone calls near flammable and explosive equipment.
In the prior art, sites are monitored and alarms are raised manually from video feeds, which wastes considerable manpower and time, easily misses information in the video, and degrades the efficiency and accuracy of scene monitoring.
Summary
The purpose of this application is to provide a scene monitoring method, apparatus, device and storage medium based on edge computing, so as to improve the efficiency and accuracy of scene monitoring.
In a first aspect, this application discloses a scene monitoring method based on edge computing, the method being applied to an edge device and including:
acquiring an original image of a preset scene;
determining abnormal information from the original image according to a preset neural network model, where the abnormal information represents information in the preset scene that requires an alarm, the abnormal information is information in an abnormal image region of the original image, and the abnormal image region is determined based on the confidence between that abnormal image region and a preset abnormal image; and
sending the abnormal information to a client for alarming.
Based on the above technical content, video surveillance of a preset scene yields image frames of the scene, which serve as original images. Feature extraction is performed on an original image according to a preset neural network model to obtain the abnormal information in the original image. The edge device sends the abnormal information directly to the client, and the client raises the alarm. This avoids the process in which the edge device sends video to a cloud platform, the cloud platform performs anomaly recognition, and the cloud platform then sends the recognition result to the client. It solves the latency caused by network transmission in the prior art, relieves pressure on the cloud platform, and improves the efficiency and accuracy of scene monitoring.
Optionally, determining abnormal information from the original image according to a preset neural network model includes:
inputting the original image into the preset neural network model, and traversing the original image according to a preset target-box size to obtain at least two partial images of the original image, where the size of each partial image is the preset target-box size;
extracting image features of the partial images, and determining a confidence value for each partial image according to its image features, where the confidence value represents the confidence between the partial image and a preset abnormal image; and
determining the abnormal information in the original image according to the confidence values of the partial images.
By dividing out partial images, every pixel of the original image can be examined on the basis of a partial image, avoiding missed information and improving the accuracy of scene monitoring. The confidence value of a partial image indicates whether it resembles an abnormal image: the higher the confidence, the more similar the partial image is to the abnormal image, and the more likely the partial image contains abnormal information. Abnormal information is thus determined automatically, saving manpower and time and improving monitoring efficiency.
Optionally, determining the abnormal information in the original image according to the confidence values of the partial images includes:
determining a partial image whose confidence value exceeds a preset confidence threshold to be an abnormal image region; and
converting the abnormal image region into text data, and determining the text data to be the abnormal information of the original image.
The confidence value quickly determines whether a partial image is abnormal, improving monitoring efficiency. Converting the abnormal content into text makes it easy for users to view through the client, improving the user's scene monitoring experience.
Optionally, the preset target-box size covers at least two preset dimensions; determining the confidence value of a partial image according to its image features includes:
determining, according to a preset association between different target-box sizes and abnormal images, the abnormal images corresponding to each target-box size; and
determining the confidence between a partial image of each target-box size and the abnormal images corresponding to that target-box size.
Determining the corresponding abnormal images from the partial image's target-box size avoids computing confidence between the partial image and every abnormal image, reducing computation and effectively improving the efficiency of scene monitoring.
Optionally, after determining abnormal information from the original image according to the preset neural network model, the method further includes:
encoding the original images corresponding to the abnormal information into a video stream according to a preset hardware video-stream encoder, and sending the video stream to the client.
Sending the video stream to the client lets the user view the scene's anomaly directly. When sending the video stream, the abnormal information and the original image it belongs to can also be sent, i.e., the client can receive both dynamic video and static pictures, which is convenient for viewing and improves the user experience.
Optionally, sending the video stream to the client includes:
pushing the video stream to a cloud platform according to a preset real-time messaging protocol, for the cloud platform to push the video stream to the client according to a preset web real-time communication protocol.
Real-time audio/video communication technology is adopted: the video stream is pushed from the edge device to the cloud platform over a real-time messaging protocol, the cloud platform converts the real-time messaging protocol stream into a web real-time communication stream, and the stream is pushed from the cloud platform to the client over the web real-time communication protocol. Latency is effectively reduced, real-time live viewing is achieved, and the efficiency of scene monitoring is improved.
Optionally, sending the abnormal information to a client for alarming includes:
determining the acquisition time and acquisition location of the original image corresponding to the abnormal information, and sending the acquisition time, acquisition location and abnormal information to the client.
Obtaining the acquisition time and location of the original image lets the user know when and where the anomaly occurred, facilitating timely handling of the abnormal situation and improving the efficiency and accuracy of scene monitoring.
Optionally, acquiring an original image of a preset scene includes:
acquiring, through a camera mounted on the edge device and based on a preset data acquisition period, video frames of the preset scene as the original images.
Automatically capturing video of a preset scene through the camera to obtain original images makes it convenient to capture the actual conditions of each preset scene and learn of them in time, effectively improving the efficiency and accuracy of scene monitoring.
In a second aspect, this application discloses a scene monitoring apparatus based on edge computing, the apparatus being deployed on an edge device and including:
an image acquisition module, configured to acquire an original image of a preset scene;
an information determination module, configured to determine abnormal information from the original image according to a preset neural network model, where the abnormal information represents information in the preset scene that requires an alarm, the abnormal information is information in an abnormal image region of the original image, and the abnormal image region is determined based on the confidence between that abnormal image region and a preset abnormal image; and
an alarm module, configured to send the abnormal information to a client for alarming.
In a third aspect, this application provides an electronic device including a processor and a memory communicatively connected to the processor;
the memory stores computer-executable instructions; and
the processor executes the computer-executable instructions stored in the memory to implement the scene monitoring method based on edge computing of the first aspect.
In a fourth aspect, this application provides a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the scene monitoring method based on edge computing of the first aspect.
In a fifth aspect, this application provides a computer program product including a computer program that, when executed by a processor, implements the scene monitoring method based on edge computing of the first aspect.
In combination with the above technical solutions, the scene monitoring method, apparatus, device and storage medium based on edge computing provided by this application perform video surveillance of a preset scene to obtain image frames of the scene as original images. Feature extraction is performed on an original image according to a preset neural network model to obtain the abnormal information in the original image. The edge device sends the abnormal information directly to the client, and the client raises the alarm. This avoids the process in which the edge device sends video to a cloud platform, the cloud platform performs anomaly recognition, and the cloud platform then sends the recognition result to the client. It solves the latency caused by network transmission in the prior art, relieves pressure on the cloud platform, and improves the efficiency and accuracy of scene monitoring.
Brief Description of the Drawings
The accompanying drawings, which are incorporated into and constitute a part of this specification, illustrate embodiments consistent with this application and together with the description serve to explain the principles of this application.
Fig. 1 is a structural block diagram of a scene monitoring system based on edge computing provided by an embodiment of this application;
Fig. 2 is a schematic flowchart of a scene monitoring method based on edge computing provided by an embodiment of this application;
Fig. 3 is a structural block diagram of a scene monitoring system based on edge computing provided by an embodiment of this application;
Fig. 4 is a schematic flowchart of a scene monitoring method based on edge computing provided by an embodiment of this application;
Fig. 5 is a schematic flowchart of a scene monitoring method based on edge computing provided by an embodiment of this application;
Fig. 6 is a structural block diagram of a scene monitoring apparatus based on edge computing provided by an embodiment of this application;
Fig. 7 is a structural block diagram of a scene monitoring apparatus based on edge computing provided by an embodiment of this application;
Fig. 8 is a structural block diagram of an electronic device provided by an embodiment of this application;
Fig. 9 is a structural block diagram of an electronic device provided by an embodiment of this application.
The above drawings show explicit embodiments of this application, which are described in more detail below. These drawings and the accompanying text are not intended to limit the scope of the concepts of this application in any way, but rather to illustrate the concepts of this application to those skilled in the art by reference to specific embodiments.
Detailed Description
To make the objectives, technical solutions and advantages of the embodiments of this application clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of this application without creative effort fall within the scope of protection of this application.
Note that, for reasons of space, this specification does not exhaust all optional implementations; after reading this specification, those skilled in the art should appreciate that any combination of technical features constitutes an optional implementation as long as the features do not contradict one another.
Recognizing and alarming on abnormal conditions in industrial safety scenarios requires close cooperation between data and algorithms. In related scene monitoring technology, the overall system architecture can be divided into three parts: edge devices, a cloud platform, and clients. Fig. 1 is a structural block diagram of a scene monitoring system based on edge computing in this application. An edge device uses a camera to capture video of the production site and sends the captured video data to the cloud platform; the cloud platform analyzes the video data to determine whether an anomaly exists at the production site, i.e., the anomaly response service is deployed on the cloud platform. If an anomaly is determined to exist, the cloud platform can send the abnormal information to a client, for example, push it to an enterprise platform or a work-safety big-data platform on the client. The client can be a mobile phone or a PC (Personal Computer), etc.
However, there are multiple edge devices, and the cloud platform must recognize abnormal information in video data from all of them; it is under heavy load and prone to error, lowering the accuracy of scene monitoring. Moreover, after data is transmitted between the edge devices and the cloud platform, the cloud platform in turn transmits data to the clients; the repeated network transmission easily introduces high latency and hurts the efficiency of scene monitoring.
The scene monitoring method, apparatus, device and storage medium based on edge computing provided by this application aim to solve the above technical problems of the prior art.
The technical solutions of this application, and how they solve the above technical problems, are described in detail below through specific embodiments. The following specific embodiments can be combined with one another, and identical or similar concepts or processes may not be repeated in some embodiments. The embodiments of this application are described below with reference to the accompanying drawings.
Fig. 2 is a schematic flowchart of a scene monitoring method based on edge computing provided by an embodiment of this application. The method of this embodiment is applied to an edge device and is executed by a scene monitoring apparatus based on edge computing. As shown in Fig. 2, the method includes the following steps:
S201: Acquire an original image of a preset scene.
Illustratively, the edge device may be deployed at the side of a preset scene; there may be multiple preset scenes, and the edge device may monitor several of them. A preset scene may be an industrial production site, the edge device may be a mobile terminal or similar device, and one or more edge devices may be present in a preset scene. An image acquisition device may be mounted on the edge device and placed in the preset scene to capture video of the scene in real time or at scheduled times. The edge device obtains the captured video, which contains multiple video frames; each frame can be one original image, thereby yielding the original images of the preset scene.
The edge device may also capture images of the preset scene directly, in real time or periodically, without capturing video; the resulting images are the original images. For example, the edge device may acquire one original image every 10 minutes through the image acquisition device; or it may acquire a 3-minute video every 10 minutes and take the frames of that video as original images.
In this embodiment, acquiring an original image of a preset scene includes: acquiring, through a camera mounted on the edge device and based on a preset data acquisition period, video frames of the preset scene as the original images.
Specifically, the edge device is fitted with an image acquisition device, which may be a camera. A data acquisition period is preset; it is the period at which the camera captures video of the preset scene. For example, if the data acquisition period is 10 minutes, a segment of video of the scene is captured every 10 minutes. The length of each segment can be preset, for example, 5 minutes per capture.
According to the preset data acquisition period, the camera captures video of the preset scene on schedule. The edge device obtains the video of the preset scene, decomposes it into its video frames, and determines each video frame to be an original image.
The benefit of this arrangement is that automatically capturing video of the preset scene through the camera to obtain original images makes it convenient to capture the actual conditions of each preset scene and learn of them in time, effectively improving the efficiency and accuracy of scene monitoring.
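As a minimal sketch of this capture loop — the 10-minute period and 5-minute segment length are only the examples given above, and an OpenCV-accessible camera is assumed:

```python
import time
import cv2  # OpenCV for camera capture

ACQUISITION_PERIOD_S = 10 * 60   # preset data acquisition period (example: 10 minutes)
SEGMENT_LENGTH_S = 5 * 60        # preset length of each captured segment (example: 5 minutes)

def capture_original_images(camera_index: int = 0) -> list:
    """Capture one video segment and return its frames as original images."""
    cap = cv2.VideoCapture(camera_index)
    frames = []
    start = time.time()
    while time.time() - start < SEGMENT_LENGTH_S:
        ok, frame = cap.read()   # each video frame becomes one original image
        if not ok:
            break
        frames.append(frame)
    cap.release()
    return frames

# Scheduled capture: one segment per acquisition period.
# while True:
#     original_images = capture_original_images()
#     ...run anomaly detection on original_images...
#     time.sleep(ACQUISITION_PERIOD_S - SEGMENT_LENGTH_S)
```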
S202: Determine abnormal information from the original image according to a preset neural network model, where the abnormal information represents information in the preset scene that requires an alarm, the abnormal information is information in an abnormal image region of the original image, and the abnormal image region is determined based on the confidence between that abnormal image region and a preset abnormal image.
Illustratively, a neural network model is built in advance; it can be used for image processing to recognize abnormal information in the original image. Abnormal information is information in the preset scene that requires an alarm; for example, it may cover unsafe equipment states and unsafe human behavior. Unsafe equipment states may include exposed live wires, dangerous equipment such as rotating or cutting machinery lacking guards, and toxic gas leaks; unsafe human behavior may include workers not wearing safety helmets or safety clothing on site, not wearing safety harnesses correctly when working at height, and smoking or making phone calls near flammable and explosive equipment.
The neural network model can recognize whether abnormal information exists in the original image and, if so, determine what it is. For example, it may detect whether a person is present in the original image; if so, detect whether the person's head lacks a safety helmet; and if a helmet is missing, determine that abnormal information exists and that the abnormal information is that a person is not wearing a safety helmet. The neural network model may include convolutional layers, pooling layers, fully connected layers and the like; feature extraction through the model yields the image features of the original image, from which the abnormal information is obtained. For example, person recognition can be performed on the image features to then determine whether the person is wearing a safety helmet; or the sharpness of the original image can be evaluated to determine whether abnormal information such as smoke is present in the preset scene.
The abnormal information may be information in an abnormal image region of the original image, i.e., a region of the original image where abnormal information exists; for example, the abnormal image region may be a rectangular box, and it may lie anywhere in the original image, such as in the upper right or in the middle. The original image can be divided into multiple regions; whether abnormal information exists in each region is determined, and regions that contain abnormal information are determined to be abnormal image regions.
An abnormal image region may be determined based on the confidence between that region and a preset abnormal image. A preset abnormal image is a pre-established image containing abnormal information. The confidence between each region of the original image and the preset abnormal images is computed, and whether each region is an abnormal image region is determined by the confidence value; for example, regions whose confidence exceeds a preset confidence threshold are determined to be abnormal image regions containing abnormal information. The abnormal image regions can first be determined by confidence, and then recognized to extract the abnormal information within them, which serves as the abnormal information of the original image the regions belong to.
In this embodiment, the preset neural network model may be an autoencoder trained in advance; the training samples are pictures of normal scenes containing no abnormal information. Feeding a training sample to the autoencoder outputs a reconstructed image that tends toward the normal scene picture. The autoencoder's generalization ability is therefore explicitly suppressed, forcing the reconstructed image to approach the normal scene picture and completing the training.
After training, an original image is fed to the autoencoder; comparing the original image with the reconstructed image yields their similarity, from which it can be judged whether an anomaly exists in the original image. If the original image is a normal scene image, the similarity between the original and reconstructed images is high; if the original image contains abnormal information, the similarity is low. A threshold can be preset: if the difference between the original and reconstructed images exceeds the threshold, the similarity is determined to be low and the original image is determined to contain abnormal information.
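A minimal sketch of this reconstruction-error check follows; the convolutional autoencoder shape and the threshold value are illustrative assumptions, not the concrete model of this application:

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Toy autoencoder trained only on normal-scene images."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 2, stride=2), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

@torch.no_grad()
def is_abnormal(model: ConvAutoencoder, image: torch.Tensor,
                threshold: float = 0.02) -> bool:
    """Flag an image whose reconstruction differs too much from the input.

    Because the model only learned to reconstruct normal scenes, a large
    reconstruction error means low similarity, i.e. a likely anomaly.
    """
    reconstructed = model(image.unsqueeze(0))
    error = torch.mean((reconstructed.squeeze(0) - image) ** 2).item()
    return error > threshold  # illustrative preset threshold
```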
After abnormal information is determined to exist in the original image, feature extraction is performed on the image content to determine the abnormal information. For example, a supervised neural network model can be trained in advance to obtain the abnormal information of the original image. The autoencoder is an unsupervised neural network model, so supervised and unsupervised neural network models are applied in combination.
Both the unsupervised and the supervised neural network models can be trained in advance. During training, deep-neural-network acceleration and optimization techniques can be used to optimize the model and improve its efficiency and accuracy. For example, optimization and acceleration directions may include convolution optimization, model pruning, and model quantization. In model quantization, the parameters in the quantizer can be converted into trainable objects, building a mapping from high-bit representations of model weights and activations to low-bit representations. Training the quantization functions of key layers raises the speed of quantization-aware training, allowing the model to recover its accuracy quickly after a small amount of quantization-aware training.
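The application does not spell out its quantizer design; as one plausible stand-in, the following sketch uses PyTorch's stock eager-mode quantization-aware training workflow on an illustrative toy detector:

```python
import torch
import torch.nn as nn

class TinyDetector(nn.Module):
    """Placeholder for the supervised detector; all dims are illustrative."""
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # fp32 -> int8 boundary
        self.conv = nn.Conv2d(3, 8, 3, padding=1)
        self.relu = nn.ReLU()
        self.fc = nn.Linear(8 * 32 * 32, 2)              # assumes 32x32 inputs
        self.dequant = torch.quantization.DeQuantStub()  # int8 -> fp32 boundary

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        x = self.fc(x.flatten(1))
        return self.dequant(x)

model = TinyDetector().train()
model.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")
qat_model = torch.quantization.prepare_qat(model)

# ...a short quantization-aware fine-tune goes here, so the low-bit
# mapping of weights and activations is learned and accuracy recovers...

int8_model = torch.quantization.convert(qat_model.eval())  # int8 deployment model
```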
S203: Send the abnormal information to a client for alarming.
Illustratively, if no abnormal information exists in the original image, the edge device continues acquiring new original images of the preset scene and checking them for abnormal information. If abnormal information exists, the edge device sends it to the client, which raises the alarm by issuing a prompt. The prompt may be an SMS prompt or a voice prompt, etc. The edge device may also broadcast a prompt within the preset scene.
The edge device can be bound to one or more clients in advance; once abnormal information is determined, it is sent to the bound clients.
Multiple scenes can be preset, each corresponding to different clients. After abnormal information is determined to exist, the scene it belongs to is determined, the client corresponding to that scene is determined, and the abnormal information is sent to that client. That is, different clients have permission to view different scenes. The abnormal information can be sent to an enterprise group on a mobile phone, or to an enterprise management platform on a PC, etc., and can be transmitted over wireless technologies such as Wi-Fi and Bluetooth Low Energy. Fig. 3 is a structural block diagram of a scene monitoring system based on edge computing in an embodiment of this application. In Fig. 3, the system includes edge devices and clients, with no cloud platform: the edge device communicates directly with the client, performs image recognition and the determination of abnormal information, and sends the abnormal information to the client, which handles the alarm. This relieves pressure on the cloud platform, reduces network transmission latency, and improves monitoring efficiency.
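A sketch of the scene-to-client routing just described; the scene identifiers, client endpoints, and the HTTP push are all illustrative assumptions:

```python
import json
import urllib.request

# Hypothetical binding of scene IDs to the clients allowed to view them.
SCENE_CLIENTS = {
    "workshop-1": ["http://client-a.example/alert"],
    "workshop-2": ["http://client-b.example/alert",
                   "http://client-c.example/alert"],
}

def push_alert(scene_id: str, abnormal_info: str) -> None:
    """Send the abnormal information only to clients bound to this scene."""
    payload = json.dumps({"scene": scene_id, "info": abnormal_info}).encode()
    for url in SCENE_CLIENTS.get(scene_id, []):
        req = urllib.request.Request(
            url, data=payload, headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req)  # fire the alert to a bound client
```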
In this embodiment, sending the abnormal information to a client for alarming includes: determining the acquisition time and acquisition location of the original image corresponding to the abnormal information, and sending the acquisition time, acquisition location and abnormal information to the client.
Specifically, after abnormal information is determined to exist in the original image, the acquisition time and location of that original image are determined. For example, the number of the preset scene corresponding to the original image can be used as the acquisition location. When acquiring original images, the edge device stores the acquisition time and location of each one so that they can later be provided to the client.
The original image containing the abnormal information is identified, its acquisition time and location are retrieved, and the acquisition time, acquisition location and abnormal information are sent together to the client, so that the user can clearly see the anomaly of the preset scene and quickly reach the scene to make adjustments.
The benefit of this arrangement is that obtaining the acquisition time and location of the original image lets the user know when and where the anomaly occurred, facilitating timely handling of the abnormal situation and improving the efficiency and accuracy of scene monitoring.
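For concreteness, the alert record sent to the client might look like the following; the field names and values are illustrative, not prescribed by this application:

```python
import json
from datetime import datetime

def build_alert(abnormal_info: str, scene_id: str) -> str:
    """Bundle abnormal information with the image's capture time and place."""
    alert = {
        "info": abnormal_info,    # e.g. "person not wearing a safety helmet"
        "captured_at": datetime.now().isoformat(timespec="seconds"),
        "location": scene_id,     # e.g. the preset scene's number
    }
    return json.dumps(alert, ensure_ascii=False)

print(build_alert("person not wearing a safety helmet", "scene-07"))
```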
The scene monitoring method based on edge computing provided by this embodiment of the application performs video surveillance of a preset scene to obtain image frames of the scene as original images. Feature extraction is performed on an original image according to a preset neural network model to obtain the abnormal information in the original image. The edge device sends the abnormal information directly to the client, and the client raises the alarm. This avoids the process in which the edge device sends video to a cloud platform, the cloud platform performs anomaly recognition, and the cloud platform then sends the abnormal information to the client. It solves the latency caused by network transmission in the prior art, relieves pressure on the cloud platform, and improves the efficiency and accuracy of scene monitoring.
Fig. 4 is a schematic flowchart of a scene monitoring method based on edge computing provided by an embodiment of this application; this embodiment is an optional embodiment based on the foregoing embodiments.
In this embodiment, determining abnormal information from the original image according to a preset neural network model can be refined as: inputting the original image into the preset neural network model and traversing the original image according to a preset target-box size to obtain at least two partial images of the original image, where the size of each partial image is the preset target-box size; extracting the image features of the partial images and determining each partial image's confidence value according to its image features, the confidence value representing the confidence between the partial image and a preset abnormal image; and determining the abnormal information in the original image according to the confidence values of the partial images.
As shown in Fig. 4, the method includes the following steps:
S401: Acquire an original image of a preset scene.
Illustratively, see step S201 above; details are not repeated.
S402: Input the original image into the preset neural network model and traverse the original image according to the preset target-box size to obtain at least two partial images of the original image, where the size of each partial image is the preset target-box size.
Illustratively, the neural network model is built in advance; its input is the original image and its output is the abnormal information in the original image. The original image is fed into the model, in which a target-box size is set; the target-box size is equal to or smaller than the original image, for example 10×10 pixels.
Traversing the original image by the target-box size means sweeping a box of that size over the original image. Following the ordering of the original image's pixels, the image is divided and multiple images of the target-box size are boxed out as partial images, such that every pixel of the original image falls into at least one partial image. For example, with a 10×10 target box, the region covering the first 10 rows and first 10 columns can be taken as one partial image, and the box is then slid to start from the second column to box out another 10×10 region, i.e., the first 10 rows and columns 2 to 11 become another partial image. Pixels already assigned to a partial image may also be skipped so that only unassigned pixels are divided; for example, after the first 10 rows and first 10 columns become one partial image, the first 10 rows and columns 11 to 20 become the next. One original image is divided into multiple partial images of identical size, the target-box size.
By dividing out partial images, every pixel of the original image can be examined on the basis of a partial image, avoiding missed information and improving the accuracy of scene monitoring.
Target boxes of different dimensions can be preset; for example, 10×10 and 80×80, where 10×10 is a small target box and 80×80 a large one. If target boxes of different dimensions are preset, the pixels of the original image are traversed for each target-box size, so that every pixel falls into partial images of every target-box size. For the same original image, the number of partial images can differ across target-box sizes. This supports multi-scale anomaly recognition, giving good recognition of both large-box and small-box anomalies and suiting the multi-scale characteristics of industrial production scenes. In this embodiment, multi-scale refers to the diversity of target-box sizes: anomalies can be recognized from a wide view with a large target box, or more finely with a small one. For example, when recognizing anomalies of a whole human body in an industrial scene, the partial images of the large target box can be analyzed; when recognizing a safety helmet on a person's head, the partial images of the small target box can be analyzed.
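A sketch of this multi-scale traversal; stride-equals-box tiling reproduces the skip-assigned-pixels variant described above, and the two box sizes are only the examples given:

```python
import numpy as np

def traverse(original: np.ndarray, box: int, stride: int):
    """Yield partial images of size box x box while sweeping the image.

    stride == 1 reproduces the column-by-column sweep; stride == box
    reproduces the variant that skips pixels already assigned to a box.
    """
    h, w = original.shape[:2]
    for top in range(0, h - box + 1, stride):
        for left in range(0, w - box + 1, stride):
            yield original[top:top + box, left:left + box]

image = np.zeros((240, 320, 3), dtype=np.uint8)           # stand-in original image
small_patches = list(traverse(image, box=10, stride=10))  # helmet-scale boxes
large_patches = list(traverse(image, box=80, stride=80))  # whole-body boxes
print(len(small_patches), len(large_patches))             # counts differ per box size
```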
S403: Extract the image features of the partial images and determine each partial image's confidence value according to its image features, the confidence value representing the confidence between the partial image and a preset abnormal image.
Illustratively, the neural network model includes convolutional layers, pooling layers, fully connected layers and the like, and can extract features from images. Once the partial images are obtained, the image features of each are extracted with the neural network model.
One or more abnormal images are preset; an abnormal image is an image containing abnormal information, for example, an image of a person without a safety helmet. Based on the image features of a partial image and a preset confidence formula, the confidence between each partial image and each abnormal image is computed, and the confidence between a partial image and an abnormal image is taken as that partial image's confidence value. With two preset abnormal images, each partial image yields two confidence values.
The confidence value of a partial image indicates whether it resembles an abnormal image: the higher the confidence, the more similar the partial image is to the abnormal image, and the more likely the partial image contains abnormal information.
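The application does not fix the confidence formula; as one plausible instantiation, cosine similarity between feature vectors can play that role:

```python
import numpy as np

def confidence(patch_feat: np.ndarray, abnormal_feat: np.ndarray) -> float:
    """Cosine similarity as a stand-in for the preset confidence formula."""
    num = float(patch_feat @ abnormal_feat)
    den = float(np.linalg.norm(patch_feat) * np.linalg.norm(abnormal_feat)) + 1e-9
    return num / den

# One confidence value per preset abnormal image, as described above.
patch_feat = np.random.rand(128)                 # features of one partial image
abnormal_feats = [np.random.rand(128) for _ in range(2)]
scores = [confidence(patch_feat, f) for f in abnormal_feats]
```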
In this embodiment, the preset target-box size covers at least two preset dimensions; determining a partial image's confidence value according to its image features includes: determining, according to the preset association between different target-box sizes and abnormal images, the abnormal images corresponding to each target-box size; and determining the confidence between the partial images of each target-box size and the abnormal images corresponding to that size.
Specifically, target boxes of different dimensions are preset, and abnormal images of different dimensions can be preset accordingly. Each target-box size can correspond to multiple abnormal images. The association between target-box sizes and abnormal images can be set in advance; for example, for the 10×10 target box, the associated abnormal images may depict small regions of a person's head, while for the 80×80 target box, the associated abnormal images may depict large regions of a whole human body.
Once a partial image is obtained, the abnormal images corresponding to its target-box size are determined, the confidence between the partial image and each of those abnormal images is computed, and the partial image's confidence values are obtained. Each partial image can obtain multiple confidence values, establishing whether it resembles each of several abnormal images.
The benefit of this arrangement is that determining the corresponding abnormal images from the partial image's target-box size avoids computing confidence between the partial image and every abnormal image, reducing computation and effectively improving the efficiency of scene monitoring.
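The size-to-image association can be kept as a simple lookup table; the entries below are illustrative assumptions:

```python
# Hypothetical association: target-box size -> preset abnormal images.
ABNORMAL_BY_BOX = {
    10: ["no_helmet_head", "smoking_closeup"],       # small-box anomalies
    80: ["no_safety_suit", "no_harness_at_height"],  # large-box anomalies
}

def candidate_anomalies(box: int) -> list:
    """Only the abnormal images bound to this box size are scored."""
    return ABNORMAL_BY_BOX.get(box, [])
```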
S404: Determine the abnormal information in the original image according to the confidence values of the partial images.
Illustratively, once each partial image's confidence values are obtained, whether abnormal information exists in the partial image is determined from those values. If it does, the original image containing that partial image is determined to contain abnormal information, and determining the abnormal information in the partial image yields the abnormal information of the original image.
A partial image's confidence is computed against multiple abnormal images, so its confidence values can be compared: the larger a confidence, the closer the partial image is to the abnormal image corresponding to that confidence. If the partial image is close to an abnormal image, it is determined to contain abnormal information; that is, comparing confidence values determines whether abnormal information exists in the partial image.
After abnormal information is determined to exist in a partial image, the image content of the partial image can be extracted as text data; for example, if the partial image shows a person without a safety helmet, the abnormal information can be "no safety helmet". Abnormal information is determined automatically, saving manpower and time and improving monitoring efficiency.
In this embodiment, determining the abnormal information in the original image according to the confidence values of the partial images includes: determining a partial image whose confidence value exceeds a preset confidence threshold to be an abnormal image region; converting the abnormal image region into text data; and determining the text data to be the abnormal information of the original image.
Specifically, a confidence threshold is preset; it is the maximum confidence allowed when no abnormal information exists in a partial image. That is, if a partial image's confidence value exceeds the threshold, the partial image is determined to contain abnormal information.
The partial image's confidence value is compared with the preset confidence threshold to determine whether it exceeds the threshold. If so, the partial image is determined to contain abnormal information and to be an abnormal image region; if not, the partial image is determined to contain no abnormal information, and no determination of abnormal information is needed for it. If a partial image is determined to be an abnormal image region, the region is converted into text data, and the text data is determined to be the abnormal information of the original image the region belongs to.
The abnormal information of each abnormal image can be preset, in text-data format. After abnormal information is determined to exist in a partial image, the abnormal image against which the partial image's confidence exceeded the threshold is identified, and that abnormal image's abnormal information is determined to be the partial image's abnormal information, i.e., the abnormal information of the original image.
The benefit of this arrangement is that the confidence value quickly determines whether a partial image is abnormal, improving monitoring efficiency, and converting the abnormal content into text makes it easy for users to view through the client, improving the user's scene monitoring experience.
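Putting the threshold rule and text lookup together — the threshold, labels, and data layout are all assumptions for illustration:

```python
CONF_THRESHOLD = 0.8  # illustrative preset confidence threshold

# Hypothetical preset text for each abnormal image.
ABNORMAL_TEXT = {
    "no_helmet_head": "no safety helmet",
    "smoking_closeup": "smoking near equipment",
}

def abnormal_info(scores: dict) -> list:
    """Map over-threshold confidences to preset text descriptions.

    `scores` pairs each preset abnormal image's name with the partial
    image's confidence against it, e.g. {"no_helmet_head": 0.91}.
    """
    return [ABNORMAL_TEXT[name]
            for name, conf in scores.items() if conf > CONF_THRESHOLD]

print(abnormal_info({"no_helmet_head": 0.91, "smoking_closeup": 0.32}))
# -> ['no safety helmet']
```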
S405: Send the abnormal information to the client for alarming.
Illustratively, see step S203 above; details are not repeated.
The scene monitoring method based on edge computing provided by this embodiment of the application performs video surveillance of a preset scene to obtain image frames of the scene as original images. Feature extraction is performed on an original image according to a preset neural network model to obtain the abnormal information in the original image. The edge device sends the abnormal information directly to the client, and the client raises the alarm. This avoids the process in which the edge device sends video to a cloud platform, the cloud platform performs anomaly recognition, and the cloud platform then sends the abnormal information to the client. It solves the latency caused by network transmission in the prior art, relieves pressure on the cloud platform, and improves the efficiency and accuracy of scene monitoring.
Fig. 5 is a schematic flowchart of a scene monitoring method based on edge computing provided by an embodiment of this application; this embodiment is an optional embodiment based on the foregoing embodiments.
In this embodiment, after determining abnormal information from the original image according to the preset neural network model, the following can be appended: encoding the original images corresponding to the abnormal information into a video stream according to a preset hardware video-stream encoder, and sending the video stream to the client.
As shown in Fig. 5, the method includes the following steps:
S501: Acquire an original image of a preset scene.
Illustratively, see step S201 above; details are not repeated.
S502: Determine abnormal information from the original image according to a preset neural network model, where the abnormal information represents information in the preset scene that requires an alarm, the abnormal information is information in an abnormal image region of the original image, and the abnormal image region is determined based on the confidence between that abnormal image region and a preset abnormal image.
Illustratively, see step S202 above; details are not repeated.
S503: Encode the original images corresponding to the abnormal information into a video stream according to a preset hardware video-stream encoder, and send the video stream to the client.
Illustratively, a hardware video-stream encoder is preset in the edge device; it can encode multiple original images into a video stream. Using a hardware video-stream encoder efficiently exploits GPU (Graphics Processing Unit) resources rather than seizing CPU (Central Processing Unit) resources. That is, once abnormal information is recognized, the original images containing it are processed by the hardware encoder; compared with traditional software encoding, this greatly reduces the edge device's CPU utilization, achieving long-term stable operation.
The consecutive original images containing abnormal information are determined and encoded into a video stream. If there is only one original image with abnormal information, or such images are not consecutive, a preset number of original images before and after that image can be determined and encoded into the video stream.
Sending the video stream to the client lets the user view the scene's anomaly directly. When sending the video stream, the abnormal information and the original image it belongs to can also be sent, i.e., the client can receive both dynamic video and static pictures, which is convenient for viewing and improves the user experience.
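As one way to drive such a hardware encoder — this sketch shells out to ffmpeg with NVIDIA's NVENC codec, which is an assumption about the platform; the application does not name a specific encoder:

```python
import subprocess

def encode_with_hardware(frame_dir: str, out_path: str, fps: int = 25) -> None:
    """Encode a directory of anomalous frames into H.264 on the GPU.

    h264_nvenc offloads encoding to NVENC hardware, keeping CPU
    utilization low, unlike the software libx264 encoder.
    """
    subprocess.run([
        "ffmpeg", "-y",
        "-framerate", str(fps),
        "-i", f"{frame_dir}/frame_%05d.png",  # frames saved as numbered PNGs
        "-c:v", "h264_nvenc",                 # hardware video-stream encoder
        out_path,
    ], check=True)

encode_with_hardware("alerts/scene-07", "alerts/scene-07.mp4")
```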
In this embodiment, sending the video stream to the client includes: pushing the video stream to a cloud platform according to a preset real-time messaging protocol, for the cloud platform to push the video stream to the client according to a preset web real-time communication protocol.
Specifically, besides communicating with the client, the edge device can also communicate with the cloud platform, and the cloud platform can communicate with the client. After obtaining the video stream, the edge device can send it to the cloud platform according to a preset transport protocol, and the cloud platform then forwards it to the client. For example, the edge device pushes the video stream to the cloud platform over RTMP (Real Time Messaging Protocol), and the cloud platform then pushes the stream to the client over WebRTC (Web Real-Time Communication).
The benefit of this arrangement is that real-time audio/video communication technology is adopted: the edge device pushes the video stream to the cloud platform over RTMP, the cloud platform converts the RTMP stream into a WebRTC stream, and the cloud platform pushes the stream to the client over WebRTC. That is, the video stream is published from the edge device to the cloud platform, and the cloud platform delivers it to the client through a streaming-media service, effectively reducing latency; a measured latency of 300 milliseconds achieves the effect of real-time live viewing and improves the efficiency of scene monitoring.
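On the edge side, the RTMP publish leg can again be driven with ffmpeg; the ingest URL and stream key below are placeholders, not values from this application:

```python
import subprocess

def push_rtmp(video_path: str,
              ingest: str = "rtmp://cloud.example/live/scene-07") -> None:
    """Publish the encoded stream to the cloud platform's RTMP ingest.

    The cloud platform is then responsible for converting the RTMP
    stream into WebRTC for the client, as described above.
    """
    subprocess.run([
        "ffmpeg", "-re",      # read input at native frame rate (live pacing)
        "-i", video_path,
        "-c", "copy",         # already hardware-encoded; do not re-encode
        "-f", "flv", ingest,  # RTMP carries FLV-muxed streams
    ], check=True)
```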
The scene monitoring method based on edge computing provided by this embodiment of the application performs video surveillance of a preset scene to obtain image frames of the scene as original images. Feature extraction is performed on an original image according to a preset neural network model to obtain the abnormal information in the original image. The edge device sends the abnormal information directly to the client, and the client raises the alarm. This avoids the process in which the edge device sends video to a cloud platform, the cloud platform performs anomaly recognition, and the cloud platform then sends the abnormal information to the client. It solves the latency caused by network transmission in the prior art, relieves pressure on the cloud platform, and improves the efficiency and accuracy of scene monitoring.
Fig. 6 is a structural block diagram of a scene monitoring apparatus based on edge computing provided by an embodiment of this application; the apparatus is deployed on an edge device. For ease of description, only the parts relevant to the embodiments of the present disclosure are shown. Referring to Fig. 6, the apparatus includes an image acquisition module 601, an information determination module 602 and an alarm module 603.
The image acquisition module 601 is configured to acquire an original image of a preset scene;
the information determination module 602 is configured to determine abnormal information from the original image according to a preset neural network model, where the abnormal information represents information in the preset scene that requires an alarm, the abnormal information is information in an abnormal image region of the original image, and the abnormal image region is determined based on the confidence between that abnormal image region and a preset abnormal image; and
the alarm module 603 is configured to send the abnormal information to a client for alarming.
Fig. 7 is a structural block diagram of a scene monitoring apparatus based on edge computing provided by an embodiment of this application. Building on the embodiment shown in Fig. 6, as shown in Fig. 7, the information determination module 602 includes a partial image obtaining unit 6021, a confidence determination unit 6022 and an abnormal information determination unit 6023.
In one example, the partial image obtaining unit 6021 is configured to input the original image into the preset neural network model and traverse the original image according to the preset target-box size to obtain at least two partial images of the original image, where the size of each partial image is the preset target-box size;
the confidence determination unit 6022 is configured to extract the image features of the partial images and determine each partial image's confidence value according to its image features, the confidence value representing the confidence between the partial image and a preset abnormal image; and
the abnormal information determination unit 6023 is configured to determine the abnormal information in the original image according to the confidence values of the partial images.
In one example, the abnormal information determination unit 6023 is specifically configured to:
determine a partial image whose confidence value exceeds a preset confidence threshold to be an abnormal image region; and
convert the abnormal image region into text data, and determine the text data to be the abnormal information of the original image.
In one example, the preset target-box size covers at least two preset dimensions; the confidence determination unit 6022 is specifically configured to:
determine, according to the preset association between different target-box sizes and abnormal images, the abnormal images corresponding to each target-box size; and
determine the confidence between the partial images of each target-box size and the abnormal images corresponding to that target-box size.
In one example, the apparatus further includes:
a video stream sending module, configured to, after abnormal information is determined from the original image according to the preset neural network model, encode the original images corresponding to the abnormal information into a video stream according to a preset hardware video-stream encoder, and send the video stream to the client.
In one example, the video stream sending module is specifically configured to:
push the video stream to a cloud platform according to a preset real-time messaging protocol, for the cloud platform to push the video stream to the client according to a preset web real-time communication protocol.
In one example, the alarm module 603 is specifically configured to:
determine the acquisition time and acquisition location of the original image corresponding to the abnormal information, and send the acquisition time, acquisition location and abnormal information to the client.
In one example, the image acquisition module 601 is specifically configured to:
acquire, through a camera mounted on the edge device and based on a preset data acquisition period, video frames of the preset scene as the original images.
Fig. 8 is a structural block diagram of an electronic device provided by an embodiment of this application. As shown in Fig. 8, the electronic device includes a memory 81 and a processor 82; the memory 81 is configured to store instructions executable by the processor 82.
The processor 82 is configured to execute the methods provided by the above embodiments.
The electronic device further includes a receiver 83 and a transmitter 84. The receiver 83 is configured to receive instructions and data sent by other devices, and the transmitter 84 is configured to send instructions and data to external devices.
Fig. 9 is a structural block diagram of an electronic device shown according to an exemplary embodiment; the device may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, fitness device, personal digital assistant, vehicle, etc.
Device 900 may include one or more of the following components: a processing component 902, a memory 904, a power component 906, a multimedia component 908, an audio component 910, an input/output (I/O) interface 912, a sensor component 914, and a communication component 916.
The processing component 902 generally controls the overall operation of the device 900, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 902 may include one or more processors 920 to execute instructions, so as to complete all or part of the steps of the above methods. In addition, the processing component 902 may include one or more modules to facilitate interaction between the processing component 902 and other components; for example, it may include a multimedia module to facilitate interaction between the multimedia component 908 and the processing component 902.
The memory 904 is configured to store various types of data to support operation of the device 900. Examples of such data include instructions for any application or method operated on the device 900, contact data, phonebook data, messages, pictures, video, etc. The memory 904 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
The power component 906 supplies power to the various components of the device 900. It may include a power management system, one or more power supplies, and other components associated with generating, managing and distributing power for the device 900.
The multimedia component 908 includes a screen that provides an output interface between the device 900 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, it may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or swipe action but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 908 includes a front camera and/or a rear camera. When the device 900 is in an operating mode, such as shooting mode or video mode, the front camera and/or the rear camera can receive external multimedia data. Each front and rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 910 is configured to output and/or input audio signals. For example, the audio component 910 includes a microphone (MIC) configured to receive external audio signals when the device 900 is in an operating mode, such as call mode, recording mode, or voice recognition mode. The received audio signals may be further stored in the memory 904 or sent via the communication component 916. In some embodiments, the audio component 910 also includes a speaker for outputting audio signals.
The I/O interface 912 provides an interface between the processing component 902 and peripheral interface modules, which may be keyboards, click wheels, buttons, and the like. These buttons may include, but are not limited to: a home button, volume buttons, a start button, and a lock button.
The sensor component 914 includes one or more sensors for providing status assessments of various aspects of the device 900. For example, the sensor component 914 can detect the open/closed state of the device 900 and the relative positioning of components, such as the display and keypad of the device 900; it can also detect a change in position of the device 900 or of one of its components, the presence or absence of user contact with the device 900, the orientation or acceleration/deceleration of the device 900, and temperature changes of the device 900. The sensor component 914 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact, and may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 914 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 916 is configured to facilitate wired or wireless communication between the device 900 and other devices. The device 900 can access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In one exemplary embodiment, the communication component 916 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 916 also includes a near field communication (NFC) module to facilitate short-range communication; for example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 900 may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for executing the above methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions, such as the memory 904 including instructions executable by the processor 920 of the device 900 to complete the above methods, is also provided. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
A non-transitory computer-readable storage medium is provided; when the instructions in the storage medium are executed by a processor of a terminal device, the terminal device is enabled to perform the above scene monitoring method based on edge computing of the terminal device.
This application also discloses a computer program product including a computer program that, when executed by a processor, implements the methods described in the embodiments.
Various implementations of the systems and techniques described above in this application may be realized in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs executable and/or interpretable on a programmable system including at least one programmable processor; the programmable processor, which may be special-purpose or general-purpose, may receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
Program code for implementing the methods of this application may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or electronic device.
In the context of this application, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium, and may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and pointing device (e.g., a mouse or trackball) by which the user can provide input to the computer. Other kinds of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual, auditory, or tactile feedback), and input from the user can be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes back-end components (e.g., as a data electronic device), or a computing system that includes middleware components (e.g., an application electronic device), or a computing system that includes front-end components (e.g., a user computer with a graphical user interface or a web browser through which the user can interact with implementations of the systems and techniques described here), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LAN), wide area networks (WAN), and the Internet.
A computer system can include clients and electronic devices. Clients and electronic devices are generally remote from each other and typically interact through a communication network. The relationship of client and electronic device arises by virtue of computer programs running on the respective computers and having a client-electronic-device relationship to each other. The electronic device may be a cloud electronic device, also called a cloud computing electronic device or cloud host, a host product in the cloud computing service system that overcomes the shortcomings of difficult management and weak business scalability found in traditional physical hosts and VPS ("Virtual Private Server") services. The electronic device may also be an electronic device of a distributed system, or an electronic device combined with a blockchain. It should be understood that steps can be reordered, added, or removed using the various forms of flow shown above; for example, the steps described in this application can be executed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in this application can be achieved, and no limitation is imposed herein.
Other embodiments of this application will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of this application that follow its general principles and include common knowledge or customary technical means in the art not disclosed in this application. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of this application being indicated by the following claims.
It should be understood that this application is not limited to the precise constructions described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of this application is limited only by the appended claims.

Claims (12)

  1. A scene monitoring method based on edge computing, wherein the method is applied to an edge device and comprises:
    acquiring an original image of a preset scene;
    determining abnormal information from the original image according to a preset neural network model, wherein the abnormal information represents information in the preset scene that requires an alarm, the abnormal information is information in an abnormal image region of the original image, and the abnormal image region is determined based on the confidence between that abnormal image region and a preset abnormal image; and
    sending the abnormal information to a client for alarming.
  2. The method according to claim 1, wherein determining abnormal information from the original image according to a preset neural network model comprises:
    inputting the original image into the preset neural network model, and traversing the original image according to a preset target-box size to obtain at least two partial images of the original image, wherein the size of each partial image is the preset target-box size;
    extracting image features of the partial images, and determining a confidence value of each partial image according to its image features, wherein the confidence value represents the confidence between the partial image and a preset abnormal image; and
    determining the abnormal information in the original image according to the confidence values of the partial images.
  3. The method according to claim 2, wherein determining the abnormal information in the original image according to the confidence values of the partial images comprises:
    determining a partial image whose confidence value exceeds a preset confidence threshold to be an abnormal image region; and
    converting the abnormal image region into text data, and determining the text data to be the abnormal information of the original image.
  4. The method according to claim 2, wherein the preset target-box size covers at least two preset dimensions; and determining the confidence value of each partial image according to its image features comprises:
    determining, according to a preset association between different target-box sizes and abnormal images, the abnormal images corresponding to each target-box size; and
    determining the confidence between a partial image of each target-box size and the abnormal images corresponding to that target-box size.
  5. The method according to any one of claims 1-4, wherein after determining abnormal information from the original image according to the preset neural network model, the method further comprises:
    encoding the original images corresponding to the abnormal information into a video stream according to a preset hardware video-stream encoder, and sending the video stream to the client.
  6. The method according to claim 5, wherein sending the video stream to the client comprises:
    pushing the video stream to a cloud platform according to a preset real-time messaging protocol, for the cloud platform to push the video stream to the client according to a preset web real-time communication protocol.
  7. The method according to any one of claims 1-6, wherein sending the abnormal information to a client for alarming comprises:
    determining the acquisition time and acquisition location of the original image corresponding to the abnormal information, and sending the acquisition time, acquisition location and abnormal information to the client.
  8. The method according to any one of claims 1-6, wherein acquiring an original image of a preset scene comprises:
    acquiring, through a camera mounted on the edge device and based on a preset data acquisition period, video frames of the preset scene as the original images.
  9. A scene monitoring apparatus based on edge computing, wherein the apparatus is deployed on an edge device and comprises:
    an image acquisition module, configured to acquire an original image of a preset scene;
    an information determination module, configured to determine abnormal information from the original image according to a preset neural network model, wherein the abnormal information represents information in the preset scene that requires an alarm, the abnormal information is information in an abnormal image region of the original image, and the abnormal image region is determined based on the confidence between that abnormal image region and a preset abnormal image; and
    an alarm module, configured to send the abnormal information to a client for alarming.
  10. An electronic device, comprising: a processor, and a memory communicatively connected to the processor;
    wherein the memory stores computer-executable instructions; and
    the processor executes the computer-executable instructions stored in the memory to implement the scene monitoring method based on edge computing according to any one of claims 1-8.
  11. A computer-readable storage medium, wherein the computer-readable storage medium stores computer-executable instructions that, when executed by a processor, implement the scene monitoring method based on edge computing according to any one of claims 1-8.
  12. A computer program product, comprising a computer program that, when executed by a processor, implements the scene monitoring method based on edge computing according to any one of claims 1-8.
PCT/CN2022/111842 2022-06-22 2022-08-11 Scene monitoring method, apparatus, device and storage medium based on edge computing WO2023245833A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210727642.6A CN115171007A (zh) 2022-06-22 2022-06-22 Scene monitoring method, apparatus, device and storage medium based on edge computing
CN202210727642.6 2022-06-22

Publications (1)

Publication Number Publication Date
WO2023245833A1 true WO2023245833A1 (zh) 2023-12-28

Family

ID=83486509

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/111842 WO2023245833A1 (zh) 2022-06-22 2022-08-11 基于边缘计算的场景监控方法、装置、设备及存储介质

Country Status (2)

Country Link
CN (1) CN115171007A (zh)
WO (1) WO2023245833A1 (zh)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190130204A1 (en) * 2017-10-31 2019-05-02 The University Of Florida Research Foundation, Incorporated Apparatus and method for detecting scene text in an image
CN111738240A (zh) * 2020-08-20 2020-10-02 江苏神彩科技股份有限公司 Region monitoring method, apparatus, device and storage medium
CN112257604A (zh) * 2020-10-23 2021-01-22 北京百度网讯科技有限公司 Image detection method and apparatus, electronic device and storage medium
CN112784797A (zh) * 2021-01-29 2021-05-11 北京百度网讯科技有限公司 Target image recognition method and apparatus
CN112989987A (zh) * 2021-03-09 2021-06-18 北京京东乾石科技有限公司 Method, apparatus, device and storage medium for recognizing crowd behavior
CN113111782A (zh) * 2021-04-14 2021-07-13 中国工商银行股份有限公司 Video surveillance method and apparatus based on salient object detection
WO2022046077A1 (en) * 2020-08-28 2022-03-03 Siemens Aktiengesellschaft Incremental learning for anomaly detection and localization in images

Also Published As

Publication number Publication date
CN115171007A (zh) 2022-10-11

Similar Documents

Publication Publication Date Title
KR101852284B1 (ko) Alarm method and apparatus
US8903317B2 (en) System and method for controlling an infrared camera using a mobile phone
EP2688296B1 (en) Video monitoring system and method
WO2016041340A1 (zh) Prompting method and mobile terminal
US11264027B2 (en) Method and apparatus for determining target audio data during application waking-up
US20190051147A1 (en) Remote control method, apparatus, terminal device, and computer readable storage medium
US9131106B2 (en) Obscuring a camera lens to terminate video output
CN108307106B (zh) Image processing method and apparatus, and mobile terminal
US10768682B2 (en) Detection-based wakeup of detection devices
KR20170094745A (ko) 영상 인코딩 방법 및 이를 지원하는 전자 장치
KR102154457B1 (ko) 상태 검출 방법, 장치 및 저장 매체
US20190020803A1 (en) Controlling flash behavior during capture of image data
WO2022052613A1 (zh) Camera control method and apparatus, electronic device and storage medium
CN109981890B (zh) Reminder task processing method, terminal, and computer-readable storage medium
US11816269B1 (en) Gesture recognition for wearable multimedia device using real-time data streams
WO2023245833A1 (zh) Scene monitoring method, apparatus, device and storage medium based on edge computing
CN104023207A (zh) Single-call instant live communication terminal, method and tool
CN103956032A (zh) DVR security alarm method and system
EP3428779A1 (en) User-machine interaction method and system based on feedback signals
KR20180020374A (ko) 이벤트 검색 시스템, 장치 및 방법
CN115291792A (zh) Display device and control method thereof
CN114553725B (zh) Machine room monitoring and alarm method and apparatus, electronic device and storage medium
US20160125303A1 (en) Method and apparatus for calculating smart indicator
CN106375646B (zh) Information processing method and terminal
CN114520955A (zh) Information sending method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22947574

Country of ref document: EP

Kind code of ref document: A1