CN108921012B - Method for processing image video frames using an artificial intelligence chip - Google Patents

Method for processing image video frames using an artificial intelligence chip

Info

Publication number
CN108921012B
CN108921012B (application CN201810470989.0A)
Authority
CN
China
Prior art keywords
image
monitoring
video frame
emergency
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810470989.0A
Other languages
Chinese (zh)
Other versions
CN108921012A (en)
Inventor
高钰峰
陈云霁
Current Assignee
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN201810470989.0A
Publication of CN108921012A
Application granted
Publication of CN108921012B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 - Television systems
    • H04N7/18 - Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/44 - Event detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)
  • Closed-Circuit Television Systems (AREA)

Abstract

The present disclosure provides a method for processing image video frames using an artificial intelligence chip, comprising: an image processing device acquires, in real time, a monitoring image captured by a monitoring system; and the image processing device receives a video frame of the monitoring image, performs an artificial neural network operation on the video frame, and after the operation outputs the emergency-type data corresponding to the monitoring image. By way of machine learning, the method lets a computer program judge the type of an emergency in a surveillance video in real time, saving a large amount of human resources.

Description

Method for processing image video frames using an artificial intelligence chip
Technical Field
The disclosure relates to the technical field of information processing, and in particular to a method for automatically monitoring emergencies.
Background
In the prior art, emergencies in video are analyzed mainly by manual monitoring and judgment, which has several obvious problems: manual monitoring and detection requires enormous human resources, and uninterrupted monitoring and judgment are difficult to sustain through manual browsing; manual retrieval is inefficient and time-consuming, the volume of video material is large, judgment is strongly affected by the quality of the video feed, and manual browsing cannot guarantee accurate judgment; in addition, manual monitoring is constrained by its hardware, so the system cannot be made portable.
Disclosure of Invention
Technical problem to be solved
In view of the above, an object of the present disclosure is to provide a method for automatically monitoring emergencies, so as to solve at least some of the above technical problems.
(II) technical scheme
In order to achieve the above object, the present disclosure provides a method for automatically monitoring emergencies, comprising:
the image processing device acquires, in real time, a monitoring image captured by a monitoring system; and
the image processing device receives a video frame of the monitoring image, performs an artificial neural network operation on the video frame, and after the operation outputs the emergency-type data corresponding to the monitoring image.
In a further aspect, before the monitoring image captured by the monitoring system is acquired in real time, the method further comprises adaptively training the neural network model.
In a further aspect, the adaptive training comprises: inputting images that comprise at least video frames of emergency videos, together with the emergency-type code label corresponding to each image; feeding a video frame into the current neural network structure, computing through a loss function the update gradient direction and update magnitude of the network parameters for the type to which the current picture belongs, and computing through a joint loss function the update gradient direction and update magnitude of the overall network parameters for the type to which the video clip belongs; and updating the neural network parameters according to those gradient directions and magnitudes.
In a further aspect, the monitoring image is preprocessed by a preprocessing module before a video frame of the monitoring image is received.
In a further aspect, the preprocessing comprises segmentation of the monitoring image data, Gaussian filtering, binarization, regularization and/or normalization.
In a further aspect, the emergency-type data comprises n bits representing different types of emergencies, where n is an integer greater than 1.
In a further aspect, performing the artificial neural network operation on the video frame comprises: the storage module receives the monitoring image, which comprises the video frame; through direct memory access (DMA), the instructions in the storage module are transferred into an instruction cache module, and the video frame data and the weights are transferred into an input neuron cache module and a weight cache module, respectively; the control circuit reads the instructions from the instruction cache module, decodes them and transmits them to the operation circuit; according to the instructions, the operation circuit executes the corresponding neural network operation and transmits the result to the output neuron cache module; and the result of the operation, taken as the judgment result for the current video frame, is written back through the DMA to the corresponding judgment-result storage address.
In a further aspect, when there are a plurality of images, the artificial neural network operation is executed on the images in sequence; the judgment results form a judgment queue, which is then fed back into the operation circuit for weighted addition to determine the emergency type of the whole surveillance video at the current moment.
In a further aspect, the adaptive training is performed off-line, and its input data may come from an external continuous-time image acquisition device.
In a further aspect, the operation circuit executing the corresponding neural network operation comprises: multiplying the input neurons by the weight data in a multiplication circuit; summing the products stage by stage through an addition tree to obtain a weighted sum, to which a bias may or may not be added; and feeding the weighted sum, with or without the bias, into the activation function circuit to obtain the output neurons.
(III) advantageous effects
(1) the automatic emergency monitoring method of the present disclosure can judge the type of an emergency in a surveillance video in real time by a computer program, in a machine-learning manner, saving a large amount of human resources;
(2) through machine recognition, the method can monitor and judge emergency types under complex environments and video backgrounds, compensating for the loss of judgment accuracy that manual monitoring suffers from poor monitored-video quality and environmental interference;
(3) the method employs an image processing device capable of performing neural network operations, which greatly reduces the scale of the hardware required by the whole judgment and early-warning system; no bulky display system is needed, the method can be realized on a mobile phone, a tablet computer or even a dedicated signal receiver, and a portable design of the system is easy to achieve;
(4) the method can greatly promote the adoption of emergency monitoring for surveillance video, and provides support for public security and for manual monitoring.
Drawings
Fig. 1 is a block diagram of an automatic emergency monitoring system according to an embodiment of the present disclosure.
Fig. 2 is a block diagram of an image processing apparatus of the automatic monitoring system of fig. 1.
Fig. 3 is a block schematic diagram of another image processing apparatus of the automated monitoring system of fig. 1.
Fig. 4 is a flow chart of a method of processing a surveillance image according to an embodiment of the disclosure.
Fig. 5 is a flow chart of another method of processing a monitored image according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure are clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making any creative effort, shall fall within the protection scope of the disclosure.
In the present disclosure, a "video frame" refers to a single image captured with a short exposure during video shooting; played in succession, these images form the video. A video frame may be the current video frame on which the neural network operation is to be performed, or a historical video frame on which the operation has already been performed and which carries a corresponding true emergency-type code label. In the present disclosure, an "emergency" refers to a natural event, an accident or disaster, a public event or a social event that occurs suddenly, causing or possibly causing serious social harm, including but not limited to floods, terrorist events, social conflicts, fires or power outages.
Existing video monitoring relies on manual monitoring and judgment, which is often affected by factors such as image quality, the individual condition of the monitoring personnel and the environment, so judgment accuracy and efficiency are low. The embodiments of the disclosure provide an automatic emergency monitoring system and method that monitor and judge emergency types in complex environments and video backgrounds through automatic machine recognition, compensating for the loss of judgment accuracy that manual monitoring suffers from poor monitored-video quality and environmental interference.
Fig. 1 is a block diagram of an automatic emergency monitoring system according to an embodiment of the present disclosure. According to an aspect of the embodiments of the present disclosure, an automatic emergency monitoring system 100 is provided, which includes a monitoring device 110 and an image processing device 120. The monitoring device 110 is configured to capture a monitoring image of a monitored area; the image processing device 120 is configured to receive a video frame of the monitoring image, perform an artificial neural network operation on the video frame, and output the emergency-type data corresponding to the monitoring image after the operation. Because the image passes through the neural network operation before the emergency-type data is output, the emergency type can be judged automatically. The monitoring device 110 can be any prior-art device capable of recording images, including but not limited to a video camera, a still camera or a mobile phone, and converts the captured images or image frames into electronic-format images (which may be preprocessed). The image processing device 120 of the embodiments receives the electronic-format image and performs the neural network operation on it in a hardware circuit to obtain the emergency-type data (for example, a determination that the emergency is a fire).
For the neural network operation, the network model may be any of various existing models, including but not limited to a DNN (deep neural network), a CNN (convolutional neural network) or an RNN (recurrent neural network, e.g. an LSTM long short-term memory network); the neurons of the network's output layer contain the emergency-type data corresponding to the image or video frame. Because the operation is accelerated by the hardware of the disclosed embodiments, the overall performance improves and the emergency can be judged more efficiently.
Fig. 2 is a block diagram of an image processing apparatus of the automatic monitoring system of fig. 1. In some embodiments, as shown in fig. 2, the image processing apparatus 120 includes a storage module 121 and an operation circuit 123. The storage module 121 is configured to store instructions, neural network parameters and operation data, where the operation data includes video frames (both current and historical) and the emergency-type data corresponding to the historical video frames; the operation circuit 123 is configured to perform the corresponding neural network operation on the operation data. The storage module 121 may further store the output neuron data produced by the operation circuit. Neural network parameters here include, but are not limited to, weights, biases and activation functions. Preferably, the initial weights are weights already updated by training on historical data; the training can be done off-line, so the artificial neural network operation can be performed directly and the step of training the network on-line is saved.
In some embodiments, the operation circuit 123 configured to perform the neural network operation on the operation data includes: a multiplication circuit for multiplying the input neurons by the weight data; an addition tree for summing the products stage by stage to obtain a weighted sum, to which a bias may or may not be added; and an activation function circuit for taking the weighted sum, with or without the bias, as input and applying the activation function to obtain the output neurons. Preferably, the activation function may be a sigmoid, tanh, ReLU or softmax function.
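The multiply / addition-tree / bias / activation pipeline just described can be sketched in software as follows. This is an illustrative NumPy model of the data flow, not the patented hardware circuit; all names and values are invented for the example:

```python
import numpy as np

def neuron_layer(inputs, weights, bias=None, activation="relu"):
    """Software sketch of the operation circuit: multiply, addition tree,
    optional bias, then activation."""
    # Multiplication circuit: elementwise products of input neurons and weights
    products = inputs * weights            # broadcast to shape (n_out, n_in)
    # Addition tree: reduce the products to one weighted sum per output neuron
    weighted_sum = products.sum(axis=-1)
    # A bias may or may not be added, as the text allows either case
    if bias is not None:
        weighted_sum = weighted_sum + bias
    # Activation function circuit: sigmoid / tanh / ReLU / softmax per the text
    if activation == "relu":
        return np.maximum(weighted_sum, 0.0)
    if activation == "sigmoid":
        return 1.0 / (1.0 + np.exp(-weighted_sum))
    if activation == "tanh":
        return np.tanh(weighted_sum)
    if activation == "softmax":
        e = np.exp(weighted_sum - weighted_sum.max())
        return e / e.sum()
    raise ValueError(activation)

# One layer with 3 input neurons and 2 output neurons (invented values)
x = np.array([1.0, 2.0, 3.0])
W = np.array([[0.1, 0.2, 0.3], [-0.5, 0.0, 0.5]])
b = np.array([0.1, -0.2])
out = neuron_layer(x, W, b)
```

With these values the weighted sums are 1.4 and 1.0, and after the bias and ReLU the output neurons are 1.5 and 0.8.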
In some embodiments, the image processing apparatus 120 further includes a control circuit 122, the control circuit 122 is electrically connected (directly or indirectly) with the memory module 121 and the arithmetic circuit 123, respectively, and is configured to decode the instructions in the memory module 121 into arithmetic instructions and input the arithmetic instructions to the arithmetic circuit 123, and further configured to control the data reading or arithmetic process of the memory module 121 and the arithmetic circuit 123.
In some embodiments, as shown in fig. 2, the image processing apparatus 120 may further include a direct memory access unit (DMA) 124 for fetching the input data, neural network parameters and instructions stored in the storage module 121 for the control circuit 122 and the operation circuit 123 to use, and for writing the output neurons computed by the operation circuit 123 back into the storage module 121.
In some embodiments, as shown in fig. 2, the image processing apparatus 120 further includes an instruction cache module 125 for caching instructions from the direct memory access DMA124 for the control circuit 122 to call. The instruction cache module 125 may be an on-chip cache, which is integrated on the processor through a manufacturing process, and may increase a processing speed and save an overall operation time when an instruction is fetched.
In some embodiments, the image processing apparatus 120 further comprises an input neuron caching module 126, the input neuron caching module 126 being configured to cache input neurons from the direct memory access DMA124 for invocation by the arithmetic circuitry; the image processing apparatus 120 may further include a weight caching module 127, configured to cache a weight from the DMA124 for the operation circuit 123 to call; the image processing apparatus 120 may further include an output neuron buffer module 128 for storing the output neurons obtained from the operation circuit 123 for outputting to the direct memory access DMA 124. The input neuron cache module, the weight cache module and the output neuron cache module can also be on-chip caches, and are integrated on the image processing device 120 through a semiconductor process, so that the processing speed can be increased when the operation circuit 123 reads and writes, and the overall operation time is saved.
Fig. 3 is a block diagram of another image processing apparatus 120 of the automatic monitoring system of fig. 1. As shown in fig. 3, the image processing apparatus 120 in this embodiment may include a preprocessing module 129 configured to preprocess the monitoring image captured by the monitoring apparatus 110 and convert it into data conforming to the input format of the neural network. Preferably, the preprocessing includes segmentation, Gaussian filtering, binarization, regularization and/or normalization of the image and/or video data captured by the monitoring device, so as to obtain data conforming to the network's input format. The purpose of the preprocessing is to improve the accuracy of the subsequent neural network operation and thereby obtain an accurate emergency-type judgment.
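A minimal NumPy sketch of three of the named preprocessing steps (Gaussian filtering, optional binarization, normalization) is given below; the 5x5 kernel size, sigma and threshold are invented for the example and are not the module's actual parameters:

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Separable Gaussian built as an outer product, normalized to sum to 1."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def preprocess(frame, threshold=None):
    """Gaussian filtering, optional binarization, then min-max normalization."""
    frame = frame.astype(np.float64)
    k = gaussian_kernel()
    pad = k.shape[0] // 2
    padded = np.pad(frame, pad, mode="edge")
    smoothed = np.zeros_like(frame)
    for i in range(frame.shape[0]):
        for j in range(frame.shape[1]):
            smoothed[i, j] = (padded[i:i + k.shape[0], j:j + k.shape[1]] * k).sum()
    if threshold is not None:
        smoothed = (smoothed >= threshold).astype(np.float64)    # binarization
    lo, hi = smoothed.min(), smoothed.max()
    return (smoothed - lo) / (hi - lo) if hi > lo else smoothed  # normalization

frame = np.random.default_rng(0).integers(0, 256, size=(32, 32))
x = preprocess(frame)                      # smoothed and scaled to [0, 1]
mask = preprocess(frame, threshold=128.0)  # additionally binarized
```

A production system would use an optimized convolution rather than the explicit double loop; the loop is kept here only to make the filtering step visible.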
It should be noted that the preprocessing module 129 according to the embodiment of the disclosure may be disposed in the image processing apparatus 120, integrally formed with the image processing apparatus 120 through a semiconductor process, or may be disposed outside the image processing apparatus 120, including but not limited to being disposed in the monitoring apparatus 110.
In some embodiments, the parameters of the neural network (such as weights and biases) may be adaptively trained: one or more images containing emergency video frames, together with their corresponding labels (such as corresponding codes), are input to the image processing apparatus 120 containing the neural network structure; the update gradient direction and update magnitude of the network parameters for the current image are computed through the loss function; and the loss function is then reduced adaptively through continued iteration, so that the error rate on both the single video-frame image and the emergency type of the whole surveillance video keeps falling, until a correct emergency-type judgment can reliably be returned. Preferably, the adaptive training is processed in real time.
In some embodiments, the automatic emergency monitoring system 100 may further include a result processing and display device for receiving the emergency-type data computed by the image processing device and converting it into a user-recognizable format, such as a picture, a table, text, video and/or voice. The result processing device may, for example, perform digital-to-analog conversion of the emergency-type data (e.g. a string of codes) computed by the image processing device 120 into an analog signal such as sound; or perform format conversion into a picture that is shown to the user on a display device (such as a touch screen or monitor) for selection; or convert it into a control signal that drives a corresponding device in response to the emergency (e.g. directing fire-extinguishing equipment to act on the monitored area).
According to another aspect of the embodiments of the present disclosure, there is also provided an automatic emergency monitoring system, including an image processing device, configured to receive a video frame in a monitored image, perform an artificial neural network operation on the video frame, and output emergency type data corresponding to the monitored image after the operation. The setting manner of the image processing apparatus may be the image processing apparatus 120 in the above embodiments, which is not described herein again.
In another aspect, the embodiment of the present disclosure further provides an automatic monitoring method for an emergency event. Fig. 4 is a flow chart of a method of processing a surveillance image according to an embodiment of the disclosure. Fig. 4 shows an automatic emergency monitoring method, which includes:
s401: the image processing device acquires a monitoring image shot by the monitoring device in real time;
s402: and the image processing device receives the video frame in the monitoring image, performs artificial neural network operation on the video frame, and outputs the emergency type data corresponding to the monitoring image after the operation.
In step S401, the images captured by the monitoring device are processed in real time. This makes it possible to judge promptly whether an emergency has occurred, so that the relevant personnel can deal with the scene in time.
In step S402, either a section of video (comprising multiple images) or a single image (one video frame) is obtained; the judgment result is produced either by running the neural network operation on the multiple images in sequence and then weighting the results, or by running the neural network operation directly on the single image to give the emergency-type judgment.
In some embodiments, the method further includes, before step S401, adaptively training the neural network model. The adaptive training may comprise the following steps: inputting images that comprise at least video frames of emergency videos, together with the emergency-type code label corresponding to each image; feeding a video frame into the current neural network structure, computing through a loss function the update gradient direction and update magnitude of the network parameters for the type to which the current picture belongs, and computing through a joint loss function the update gradient direction and update magnitude of the overall network parameters for the type to which the video clip belongs; and updating the neural network parameters accordingly. The adaptive training is performed off-line, and its input data may come from an external continuous-time image acquisition device.
In some embodiments, the monitoring image is preprocessed by a preprocessing module before a video frame of the monitoring image is received. The preprocessing comprises segmentation of the monitoring image data, Gaussian filtering, binarization, regularization and/or normalization. These functions can be realized by providing a corresponding preprocessing module; for its arrangement, refer to the preprocessing module 129 of the automatic emergency monitoring system described above, which is not repeated here.
In some embodiments, the emergency-type data comprises n bits representing different types of emergencies, where n is an integer greater than 1. Naturally, images that contain no emergency also need a corresponding code, for example n'b0, which must be distinguishable from the codes of the images that do contain an emergency.
In some embodiments, performing the artificial neural network operation on the video frame comprises: the storage module receives the monitoring image, which comprises the video frame; through direct memory access (DMA), the instructions in the storage module are transferred into an instruction cache module, and the video frame data and the weights are transferred into an input neuron cache module and a weight cache module, respectively; the control circuit reads the instructions from the instruction cache module, decodes them and transmits them to the operation circuit; according to the instructions, the operation circuit executes the corresponding neural network operation and transmits the result to the output neuron cache module; and the result of the operation, taken as the judgment result for the current video frame, is written back through the DMA to the corresponding judgment-result storage address.
Further, when there are a plurality of images, the artificial neural network operation is executed on the images in sequence; the judgment results form a judgment queue, which is then fed back into the operation circuit for weighted addition to determine the emergency type of the whole surveillance video at the current moment.
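The judgment-queue weighting described above can be sketched as follows; the frame weights, the number of types and the event labels are all invented for illustration:

```python
import numpy as np

def segment_judgment(frame_scores, weights=None):
    """Weighted addition over a queue of per-frame judgment vectors,
    returning the segment-level emergency type and the combined scores."""
    frame_scores = np.asarray(frame_scores, dtype=np.float64)
    if weights is None:
        # Default to a plain average when no per-frame weights are given
        weights = np.full(frame_scores.shape[0], 1.0 / frame_scores.shape[0])
    weights = np.asarray(weights, dtype=np.float64)
    combined = (weights[:, None] * frame_scores).sum(axis=0)  # weighted addition
    return int(combined.argmax()), combined

# Per-frame type scores for three frames over three hypothetical types
# (index 0 = no event, 1 = fire, 2 = flood); values invented for illustration.
scores = [[0.7, 0.2, 0.1],
          [0.2, 0.6, 0.2],
          [0.1, 0.8, 0.1]]
recent_heavy = [0.2, 0.3, 0.5]   # later frames weighted more heavily
event_type, combined = segment_judgment(scores, recent_heavy)
```

Here the first frame alone would vote for "no event", but the weighted addition over the queue yields type 1 for the segment, illustrating how the queue smooths out single-frame errors.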
In some embodiments, the operational circuitry performs respective neural network operations, including: multiplying the input neuron by the weight data through a multiplication circuit; adding the multiplications step by step through an addition tree to obtain a weighted sum, and adding or not adding an offset according to the weighted sum; and performing activation function operation by using the activation function operation circuit and taking the weighted sum with bias or without bias as input to obtain the output neuron.
In addition, the embodiment of the present disclosure further provides another automatic emergency monitoring method, similar to the method of the above embodiment but with some differences. Fig. 5 is a flow chart of another method of processing a monitored image according to an embodiment of the present disclosure. The method shown in fig. 5 includes:
s501: the image processing device obtains multiple groups of externally supplied historical images whose emergency types are to be judged;
s502: the image processing device screens the video frames in the groups of historical images, performs the artificial neural network operation on them in sequence, and after the operation outputs the emergency-type data for those image segments in which an emergency occurs.
In step S501, multiple groups of external historical images are obtained; the later computation then screens out the images in which an emergency occurs and determines its type, while non-emergency events (for example, traffic violations) are left for later-stage processing. Automatically computing over and screening a large number of images saves a great deal of manual labor.
In step S502, the images are sequentially subjected to the neural network operation and then to a weighted calculation, finally giving a judgment result; this permits a comprehensive judgment over a segment of images and further improves the overall screening efficiency.
The specific neural network operation details, the training method and the preprocessing method can be performed by referring to the corresponding steps in the method of the above embodiment, which are not repeated herein.
Specific examples are given below to describe the automatic emergency monitoring method in detail. Embodiment 1 corresponds to processing the images of the monitoring device in real time and promptly computing the emergency type they correspond to; embodiment 2 corresponds to searching a plurality of (e.g. a large number of) video segments for the segment in which an emergency occurs. For the functions and interconnections of the specific devices, modules, circuits and units mentioned below, refer to the description of the automatic emergency monitoring system in the embodiments above.
Example 1:
the embodiment provides a method capable of processing a monitoring image in real time and detecting the type of an emergency, and the method can judge whether the emergency occurs in time so as to facilitate relevant personnel to process the emergency on site.
In this embodiment 1, the storage module of the image processing apparatus interacts with the monitoring device in real time and stores the video frames of the monitoring images in the storage module as input data, where the input data includes, but is not limited to, one or more groups of video frames of the monitoring videos. The apparatus is trained on the input monitoring video frames, combined with a period of historical video frames and their video-frame labels, and predicts and outputs the emergency type code for the input. The input video frame images of the surveillance video can be the raw input or the result of preprocessing the raw input.
The image processing apparatus may perform adaptive training. For example, the apparatus takes as input a group of images (belonging to an emergency video) or a single image containing monitoring video frames, together with an emergency category label (represented as a code; the absence of an emergency is also given a corresponding label code). The apparatus feeds the image into the current neural network structure. Through a loss function (a cost function measuring the error in judging the emergency type of a single image) it computes the update gradient direction and update magnitude of the network parameters (such as weights and biases) for the type of the current image, and through a combined loss function (a cost function measuring the judgment error over all video frames in a short period) it computes the update gradient direction and update magnitude of the overall network parameters for the type of the monitoring segment. By continuous iteration the loss functions are adaptively reduced, so that the error rate of emergency-type judgment for both single video frames and the overall monitoring video keeps decreasing, and the apparatus ultimately returns correct emergency-type judgment results more reliably.
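The adaptive training step described above can be sketched as follows. This is an illustrative Python sketch only, using a single-layer softmax classifier as a stand-in for the full network; all names (`frame_loss`, `segment_loss`, the learning rate, the feature sizes) are assumptions, not part of the patent.

```python
import numpy as np

rng = np.random.default_rng(0)
n_types, n_features, n_frames = 4, 8, 5
W = rng.normal(scale=0.1, size=(n_features, n_types))  # network weights
b = np.zeros(n_types)                                  # biases

frames = rng.normal(size=(n_frames, n_features))       # preprocessed video frames
label = 2                                              # emergency-type label of the segment

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

for _ in range(200):
    probs = softmax(frames @ W + b)                    # per-frame type probabilities
    # Loss function: cross-entropy of each single frame against its label.
    frame_loss = -np.log(probs[:, label] + 1e-9).mean()
    # Combined loss function: cross-entropy of the averaged segment prediction.
    seg_probs = probs.mean(axis=0)
    segment_loss = -np.log(seg_probs[label] + 1e-9)
    # Gradient direction and magnitude of the per-frame term (the segment term
    # shares the same one-hot target here, folded in for brevity).
    grad = probs.copy()
    grad[:, label] -= 1.0
    W -= 0.1 * frames.T @ grad / n_frames              # update weights
    b -= 0.1 * grad.mean(axis=0)                       # update biases

print(round(float(frame_loss), 3), int(seg_probs.argmax()))
```

After iterating, both losses shrink and the segment-level prediction settles on the labeled type, which is the behavior the adaptive training aims for.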
The input emergency type code requires at least n bits: the code n'b0 indicates that no emergency has occurred, and the other emergency types are represented in turn by n-bit binary numbers. These codes serve both as the video-frame labels of the training monitoring videos, input into the network as training labels of the neural network, and as the output results for the videos to be judged.
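The n-bit coding can be sketched as follows. The event list is purely illustrative (the patent only fixes code n'b0 for "no emergency"); `encode`/`decode` are hypothetical helper names.

```python
# Illustrative emergency-type list; index 0 is reserved for "no emergency".
events = ["none", "fire", "traffic_accident", "crowd_panic"]
n = max(1, (len(events) - 1).bit_length())   # bits needed to code all types

def encode(event: str) -> str:
    """Return the n-bit binary code; 'none' maps to all zeros (n'b0)."""
    return format(events.index(event), f"0{n}b")

def decode(code: str) -> str:
    """Map an n-bit binary code back to its emergency type."""
    return events[int(code, 2)]

print(encode("none"))              # → '00' (n'b0: no emergency)
print(encode("traffic_accident"))  # → '10'
print(decode("01"))                # → 'fire'
```

The same codes double as training labels and as the network's output format, so no separate label vocabulary is needed.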
The adaptive training process is performed offline, while the type judgment of the monitoring video to be judged is processed in real time; the image processing device is an artificial neural network chip.
The whole working process of the device is as follows:
step 1, input data is transmitted into a storage module through a preprocessing module or directly transmitted into the storage module;
step 2, the direct memory access (DMA) transfers the instructions in batches into the instruction cache, and transfers the input data into the neuron cache and the weights into the weight cache;
step 3, the control circuit reads the instruction from the instruction cache, decodes the instruction and then transmits the decoded instruction to the operation circuit;
step 4, according to the instructions, the operation circuit executes the corresponding operations. In each layer of the neural network, the operation mainly consists of three steps: step 4.1, multiplying the corresponding input neurons by the weights; step 4.2, performing the addition-tree operation, i.e., adding the results of step 4.1 stage by stage through an addition tree to obtain a weighted sum, and adding a bias to the weighted sum or leaving it unchanged as needed; and step 4.3, applying the activation function to the result of step 4.2 to obtain the output neurons, which are written into the output neuron cache.
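The three sub-steps of step 4 can be sketched for one layer as follows. This is an illustrative sketch: the pairwise reduction mimics how an adder tree accumulates, and the tanh activation and shapes are assumptions, not fixed by the patent.

```python
import math

def addition_tree(values):
    """Step 4.2: tree-shaped pairwise reduction, as an adder tree computes it."""
    values = list(values)
    while len(values) > 1:
        values = [values[i] + values[i + 1] if i + 1 < len(values) else values[i]
                  for i in range(0, len(values), 2)]
    return values[0]

def layer(inputs, weights, bias=None, activation=math.tanh):
    outputs = []
    rows = len(weights)
    biases = bias if bias is not None else [0.0] * rows
    for w_row, b in zip(weights, biases):
        products = [x * w for x, w in zip(inputs, w_row)]  # step 4.1: multiply
        s = addition_tree(products) + b                    # step 4.2: sum (+ bias)
        outputs.append(activation(s))                      # step 4.3: activation
    return outputs

out = layer([1.0, 2.0, 3.0],
            [[0.1, 0.2, 0.3], [0.0, -0.1, 0.1]],
            bias=[0.0, 0.5])
print([round(v, 3) for v in out])   # two output neurons
```

Each output neuron is thus the activated, optionally biased weighted sum of its inputs, matching steps 4.1 through 4.3 above.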
Step 5, repeat steps 2 to 4 until all data have been processed. The DMA then stores the operation result, as the judgment result of the current video frame image, into the corresponding judgment-result storage address.
Step 6, the judgment-result queue obtained in step 5 is taken as the input of the operation circuit and combined by weighted addition; the result is the emergency-type judgment for the whole monitoring video at this moment.
According to the functional requirements: if the judgment result for the emergency in the video images is required, the final weighted output of the neural network, together with the code of the corresponding emergency, constitutes the final judgment result for the video.
Example 2:
This embodiment provides a method for screening a plurality of historical images, judging whether an emergency occurs in them, and giving the judgment result of the emergency type. A large number of images are calculated and screened by an automated process, saving a great deal of manual labor; the monitored images can also be processed in real time to detect the type of emergency, so that whether an emergency has occurred can be judged in time and the relevant personnel can handle it on site.
In this embodiment 2, the storage circuit of the image processing apparatus receives a plurality of video images and stores their video frames in the storage circuit as input data, where the input data includes, but is not limited to, a group of video frames or a single video frame. The apparatus is trained on the input video frames and their labels, and predicts and outputs the emergency type code for the input. The input video frame images can be the raw input or the result of preprocessing the raw input.
In some embodiments, the image processing apparatus is capable of adaptive training. For example, the apparatus takes as input a group of images (belonging to an emergency video) containing video frames of the emergency videos, together with the corresponding emergency-type code labels (the absence of an emergency also has a corresponding code). The apparatus feeds the input image into the current neural network structure. Through a loss function (a cost function measuring the error in judging the emergency type of a single image) it computes the update gradient direction and update magnitude of the network parameters (such as weights and biases) for the type of the current image, and through a combined loss function (a cost function measuring the judgment error over all video frames in a short period) it computes the update gradient direction and update magnitude of the overall network parameters for the type of the video segment. By continuous iteration the loss functions are adaptively reduced, so that the error rate of emergency-type judgment for both single video frames and the overall video keeps decreasing, and the apparatus ultimately returns correct emergency-type judgment results more reliably.
In some embodiments, the input emergency type code requires at least n bits: the code n'b0 indicates that no emergency has occurred, and the other emergency types are represented in turn by n-bit binary numbers. These codes serve both as the video-frame labels of the training videos, input into the network as training labels of the neural network, and as the output results for the videos to be judged.
In some embodiments, the adaptive training process is offline (i.e., processed by a local computer without being connected to a cloud server via a network). Preferably, the type judgment of the monitoring video to be judged is processed in real time. Preferably, the image processing device is an artificial neural network chip.
The whole working process of the device is as follows:
step 1, input data is transmitted into a storage module through a preprocessing module or directly transmitted into the storage module;
step 2, the direct memory access (DMA) transfers the instructions in batches into the instruction cache, and transfers the input data into the neuron cache and the weights into the weight cache;
step 3, the control circuit reads the instruction from the instruction cache, decodes the instruction and then transmits the decoded instruction to the operation circuit;
and 4, according to the instruction, the operation circuit executes corresponding operation: in each layer of the neural network, the operation is mainly divided into three steps: step 4.1, multiplying the corresponding input neuron by the weight; step 4.2, performing addition tree operation, namely adding the results of the step 4.1 step by step through an addition tree to obtain a weighted sum, and adding bias or not processing the weighted sum according to needs; and 4.3, performing activation function operation on the result obtained in the step 4.2 to obtain an output neuron, and transmitting the output neuron into an output neuron cache.
Step 5, repeat steps 2 to 4 until all data have been processed. The DMA then stores the operation result, as the judgment result of the current video frame image, into the corresponding judgment-result storage address.
Step 6, the judgment-result queue obtained in step 5 is taken as the input of the operation circuit and combined by weighted addition; the result is the emergency-type judgment for the whole video.
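Step 6 above can be sketched as follows. The per-frame judgment results form a queue, and a weighted addition over the queue yields the segment-level emergency type; the uniform weights and three-type scores are assumptions, since the patent leaves the weighting open.

```python
import numpy as np

frame_judgments = np.array([   # per-frame type scores produced by steps 2-5
    [0.1, 0.7, 0.2],           # frame 1: type 1 most likely
    [0.2, 0.6, 0.2],           # frame 2: type 1 most likely
    [0.3, 0.3, 0.4],           # frame 3: a noisy frame leaning to type 2
])
# Assumed uniform weights over the judgment queue.
weights = np.full(len(frame_judgments), 1.0 / len(frame_judgments))

segment_score = weights @ frame_judgments   # weighted addition over the queue
segment_type = int(segment_score.argmax())  # coded emergency type of the video
print(segment_type)  # → 1
```

Because the weighting aggregates over the whole queue, a single noisy frame (frame 3 here) does not flip the segment-level judgment, which is the comprehensive-judgment benefit described earlier.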
According to the functional requirements: if the judgment result for the emergency in the video images is required, the final weighted output of the neural network, together with the code of the corresponding emergency, constitutes the final judgment result for the video.
In the embodiments provided in the present disclosure, it should be understood that the disclosed related devices and methods may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the described parts or modules is only one logical division, and other divisions may be realized in practice, for example, a plurality of parts or modules may be combined or integrated into a system, or some features may be omitted or not executed.
In this disclosure, the term "and/or" may have been used. As used herein, the term "and/or" means one or the other or both (e.g., A and/or B means A or B or both A and B).
In the description above, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It will be apparent, however, to one skilled in the art, that one or more other embodiments may be practiced without some of these specific details. The specific embodiments described are not intended to limit the disclosure but rather to illustrate it. The scope of the present disclosure is not to be determined by the specific examples provided above but only by the claims below. In other instances, well-known circuits, structures, devices, and operations are shown in block diagram form, rather than in detail, in order not to obscure an understanding of the description. Where considered appropriate, reference numerals or terminal portions of reference numerals have been repeated among the figures to indicate corresponding or analogous elements, optionally having similar characteristics or identical features, unless otherwise specified or evident.
Various operations and methods have been described. Some methods have been described in a relatively basic manner in a flow chart form, but operations may alternatively be added to and/or removed from the methods. Additionally, while the flow diagrams illustrate a particular order of operation according to example embodiments, it is understood that this particular order is exemplary. Alternative embodiments may optionally perform these operations in a different manner, combine certain operations, interleave certain operations, etc. The components, features, and specific optional details of the devices described herein may also optionally be applied to the methods described herein, which may be performed by and/or within such devices in various embodiments.
Each functional unit/subunit/module/submodule in the present disclosure may be hardware, for example, the hardware may be a circuit, including a digital circuit, an analog circuit, and the like. Physical implementations of hardware structures include, but are not limited to, physical devices including, but not limited to, transistors, memristors, and the like. The memory module may be any suitable magnetic or magneto-optical storage medium, such as RRAM, DRAM, SRAM, EDRAM, HBM, HMC, and the like.
It will be clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to perform all or part of the above described functions.
The above-mentioned embodiments are intended to illustrate the objects, aspects and advantages of the present disclosure in further detail, and it should be understood that the above-mentioned embodiments are only illustrative of the present disclosure and are not intended to limit the present disclosure, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims (8)

1. An automatic monitoring method for an emergency, which is characterized by comprising the following steps:
the method comprises the steps that an image processing device obtains a monitoring image shot by a monitoring system in real time, wherein the monitoring image comprises a single video frame image;
the image processing device receives a single video frame image in the monitoring image, performs artificial neural network operation on the single video frame image, and outputs emergency type data corresponding to the monitoring image after the operation;
before acquiring a monitoring image shot by a monitoring system in real time, the method further comprises the step of carrying out adaptive training on the neural network model, wherein the adaptive training comprises the following steps:
inputting an image at least comprising an emergency video image video frame and an emergency type coding label corresponding to the image;
inputting a video frame into a current neural network structure, calculating the updating gradient direction and the updating amplitude of the network parameters of the type to which a current picture belongs through a loss function, and calculating the updating gradient direction and the updating amplitude of the overall network parameters of the type to which a video clip belongs through a combined loss function;
and updating the neural network parameters according to the updating gradient direction and the updating amplitude.
2. The method of claim 1, wherein the monitoring image is pre-processed by a pre-processing module prior to receiving a video frame in the monitoring image.
3. The method of claim 2, wherein the pre-processing comprises: the method comprises the steps of monitoring image data segmentation, Gaussian filtering, binarization, regularization and/or normalization.
4. The method of claim 1, wherein the type data of the emergency event comprises n bits for representing different types of emergency events, and n is an integer greater than 1.
5. The method of claim 1, wherein performing an artificial neural network operation on the video frame comprises:
the storage module receives a monitoring image, wherein the monitoring image comprises a video frame;
respectively transmitting the instruction, the video frame data and the weight in the storage unit into an instruction cache module and inputting the instruction, the video frame data and the weight into a neuron cache module and a weight cache module through a Direct Memory Access (DMA);
the control circuit reads the instruction from the instruction cache module, decodes the instruction and transmits the decoded instruction to the operation circuit;
according to the instruction, the operation circuit executes corresponding neural network operation and transmits an operation result to the output neuron cache module;
and taking the result of the operation as the judgment result of the current video frame image, and storing it, by the direct memory access (DMA), into the corresponding judgment-result storage address.
6. The method according to claim 5, wherein when the image is a plurality of images, each image sequentially performs an artificial neural network operation, and the judgment results of the operations form a judgment queue and are used as the input of the operation circuit for weighted addition to determine the judgment result of the emergency type of the whole monitoring video at the current time.
7. The method of claim 1, wherein the adaptive training process is off-line training, and the input data for the adaptive training is derived from an external continuous-time image acquisition device.
8. The method of claim 5, wherein the operational circuitry performs respective neural network operations, comprising:
multiplying the input neuron by the weight data through a multiplication circuit;
adding the multiplications step by step through an addition tree to obtain a weighted sum, and adding or not adding an offset according to the weighted sum;
and performing activation function operation by using the activation function operation circuit and taking the weighted sum with bias or without bias as input to obtain the output neuron.