CN109714526B - Intelligent camera and control system

Intelligent camera and control system

Info

Publication number
CN109714526B
Authority
CN
China
Prior art keywords
image
target object
neural network
target
data
Prior art date
Legal status
Active
Application number
CN201811402218.4A
Other languages
Chinese (zh)
Other versions
CN109714526A (en)
Inventor
周诗怡
陈云霁
Current Assignee
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN201811402218.4A
Publication of CN109714526A
Application granted
Publication of CN109714526B

Landscapes

  • Image Analysis (AREA)
  • Studio Devices (AREA)

Abstract

The present disclosure provides an intelligent camera and a control system containing it. The intelligent camera includes: an image capturing device for capturing or receiving images and/or videos containing a target object; and a processing device comprising: a data preprocessing module for selecting an image or video frame satisfying a set condition from the captured images and/or videos; a target extraction module for performing target detection and acquiring a target image of the target object, or of at least part of it, in the image; and a recognition module for recognizing the target image and identifying identification data that distinguishes the target object. In the present disclosure, arranging multiple modules that use neural networks for processing in the intelligent camera improves processing speed.

Description

Intelligent camera and control system
Technical Field
The invention relates to the technical field of information processing, in particular to an intelligent camera and a control system comprising the intelligent camera.
Background
Settings that call for intelligent cameras present several problems that are hard to solve. For example, in a large parking lot, drivers frequently cannot find the specific position where their vehicle is parked; without an intelligent camera, locating a parked vehicle requires the key fob and the guidance of several staff members. Searching for a specific person in a crowded place can at present basically be done only by visual inspection, and among many animals and plants, searching for a particular species likewise relies on visual observation alone.
Disclosure of Invention
(I) Technical problem to be solved
In view of the above, the present invention provides an intelligent camera and a control system including the intelligent camera.
(II) Technical solution
According to an aspect of the present disclosure, there is provided an intelligent camera, including:
an image capturing device for capturing or receiving images and/or videos containing a target object; and
a processing device comprising: a data preprocessing module for selecting an image or video frame satisfying a set condition from the captured images and/or videos; a target extraction module for performing target detection and acquiring a target image of the target object, or of at least part of it, in the image; and a recognition module for recognizing the target image and identifying identification data that distinguishes the target object.
In a further embodiment, the processing device further comprises: an enhancement module for processing an image or video frame that satisfies the set condition and has a first resolution to obtain an image or video frame of a second resolution, the second resolution being higher than the first.
In a further embodiment, the set condition comprises: extracting, from the captured images and/or videos, images or video frames whose difference exceeds a set threshold.
In a further embodiment, extracting video frames whose difference exceeds the set threshold comprises processing the video through an artificial neural network, specifically: take the current video frame T, extract its features by convolution, and obtain a score f_T from the output layer of the neural network as the score of that frame, representing its features; compare f_T with f_0, where f_0 is initialized to 0; if the difference exceeds the set threshold, pass the frame to the subsequent modules as data and assign f_T to f_0; then take the next video frame T+1 and repeat until all video frames are processed.
In further embodiments, the target object is a human, an animal, a plant, a natural object, or an artificial object.
In a further embodiment, the artificial object is an automobile, and the at least partial target image includes the automobile's license plate.
In a further embodiment, the target extraction module performing target detection and acquiring a target image of the target object, or of at least part of it, in the image comprises: reading one image or video frame through an artificial neural network and acquiring the target image of the target object or of at least part of the target object.
In further embodiments, acquiring a target image of the target object or of at least part of it comprises: generating candidate regions with a selective search algorithm through an artificial neural network, dividing the picture into many small regions and merging them by similarity with a hierarchical grouping method to obtain candidate bounding boxes of the target object or of at least part of it; and applying a sliding window method to the candidate bounding boxes, sliding a window sized to the proportions of the target object, or of the relevant part of it, over each box to obtain the target image region.
In further embodiments, the identification data comprises at least one of: patterns, Chinese characters, letters, numbers and symbols.
In a further embodiment, the recognition module recognizing the target image and identifying identification data that distinguishes the target object comprises: locating and separately recognizing the identification data in the picture through an artificial neural network: extract all candidate boxes from the image, resize each candidate box to the input size of the artificial neural network, obtain a feature map through the convolutional neural network, and feed the feature map into a classification network that recognizes it, finally yielding the identification data sought in the original image.
In a further embodiment, the processing device comprises a neural network processor integrating at least one of the data preprocessing module, the target extraction module, the recognition module, and the enhancement module.
In a further embodiment, the neural network processor comprises: a storage unit for storing the input data, neural network parameters, and instructions; a control unit for reading dedicated instructions from the storage unit, decoding them into arithmetic-unit instructions, and feeding them to the arithmetic unit; and an arithmetic unit for performing the corresponding neural network operations on the data according to the arithmetic-unit instructions to obtain output neurons.
In a further embodiment, performing the corresponding neural network operation in the arithmetic unit comprises: multiplying the input neurons by the weight data to obtain products; performing an addition-tree operation that sums the products stage by stage through an addition tree to obtain a weighted sum, to which a bias is added or which is left unprocessed; and applying an activation function to the biased or unprocessed weighted sum to obtain the output neurons.
In a further embodiment, the processor further comprises: a preprocessing unit for preprocessing the image and/or video data captured by the camera and converting it into data that conforms to the neural network input format; and/or a direct memory access (DMA) unit through which the input data, neural network parameters, and instructions stored in the storage unit are fetched for the control unit and the arithmetic unit to call.
In further embodiments, the processor further comprises at least one of: an instruction cache for caching instructions from the direct memory access DMA for the control unit to call; an input neuron cache for caching input neurons from the direct memory access DMA for the arithmetic unit to call; a weight cache for caching weights from the direct memory access DMA for the arithmetic unit to call; and an output neuron cache for storing the output neurons produced by the arithmetic unit for output to the direct memory access DMA.
In a further embodiment, the instruction cache, the input neuron cache, the weight cache, and the output neuron cache are on-chip caches.
According to another aspect of the present disclosure, there is provided a control system comprising: at least one intelligent camera as described above, the number of intelligent cameras being configured to cover a set place;
and a control end that receives the identification data processed by each intelligent camera and determines the position, within the set place, of the target object corresponding to the identification data.
In a further embodiment, the control end further comprises: a display device and/or a voice output device for outputting the position information, determined by the control end, of the target object in the set place.
(III) Advantageous effects
The control system with intelligent cameras facilitates parking-lot management and the recognition of vehicles parked haphazardly on roads;
providing the intelligent camera with multiple modules that use neural networks for processing improves processing speed;
executing the intelligent camera's modules in parallel increases processing speed further.
Drawings
Fig. 1 is a schematic diagram of the principle of an intelligent camera according to an embodiment of the present invention.
Fig. 2 is a schematic block diagram of the processing apparatus of fig. 1.
Fig. 3 is a schematic block diagram of a neural network processor in the processing device of fig. 2.
Fig. 4 is a schematic view of an application scenario of the smart camera according to the embodiment of the present invention.
FIG. 5 is a schematic diagram of a control system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present disclosure are clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making any creative effort, shall fall within the protection scope of the disclosure.
According to the basic concept of the present disclosure, an intelligent camera is provided that includes an image capturing device and a processing device. The image capturing device captures or receives images and/or videos containing a target object; the processing device selects from them an image or video frame satisfying a set condition, acquires the target image of the target object, or of at least part of it, in that image or frame, and identifies the identification data that distinguishes the target object. With this intelligent camera, target objects can be distinguished by computing their identification data, enabling further downstream processing (for example, statistical analysis or searching for a target object).
Fig. 1 is a schematic diagram of an intelligent camera 100 according to an embodiment of the present invention. As shown in fig. 1, the smart camera 100 includes an image capturing device 120 and a processing device 110.
The image capturing device 120 captures or receives images and/or videos containing the target object. It may be any of the various prior-art electronic devices capable of capturing images and/or videos and obtaining external information through electromagnetic, optical, or other signal sources; its structure may follow prior-art devices with an image capturing function, including but not limited to video cameras, still cameras, mobile phones with shooting functions, and tablet computers. Note, however, that prior-art devices with an image capturing function do not include a functional unit or module that performs neural network operations on the image data.
In embodiments of the invention, the target object may be a human, an animal, a plant, a natural object, or an artificial object, provided individuals can be distinguished and identified by identification data. If the target object is a person, the identification data may be a pattern, text, and/or numbers on the clothing the person wears, or a local or global feature of the person's body, including but not limited to hairstyle, facial shape, or height. An animal, like a human, may be distinguished by covering or by local or global features. For an artificial object, this description uses an automobile as the running example: an automobile may be distinguished by its license plate, or by its color, model, and so on.
Fig. 2 is a schematic block diagram of the processing apparatus of fig. 1. In some embodiments, the processing device 200 may include a data preprocessing module 201, a target extraction module 202, and a recognition module 204.
The data preprocessing module 201 is configured to select an image or video frame satisfying a set condition from the images and/or videos captured by the image capturing apparatus 120. The set condition may include extracting, from the captured images and/or videos, images or video frames whose difference exceeds a set threshold. Optionally, the video may be processed through an artificial neural network, specifically: take the current video frame T, extract its features by convolution, and obtain a score f_T from the output layer of the neural network as the score of that frame, representing its features; compare f_T with f_0, where f_0 is initialized to 0; if the difference exceeds the set threshold, pass the frame to the subsequent modules as data and assign f_T to f_0; then take the next video frame T+1 and repeat until all video frames are processed.
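As a concrete illustration of this selection loop, the following Python sketch scores each frame with a small convolutional network and keeps a frame only when its score differs from f_0 by more than a threshold. The network architecture, the threshold value, and the random stand-in frames are assumptions for illustration; the patent fixes none of them.

```python
# Hypothetical sketch of the frame-selection loop described above.
import torch
import torch.nn as nn

class FrameScorer(nn.Module):
    """Tiny CNN mapping a frame to a single scalar score f_T."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.output = nn.Linear(16, 1)  # output layer producing the score

    def forward(self, frame):           # frame: (1, 3, H, W)
        x = self.features(frame).flatten(1)
        return self.output(x).squeeze()

def select_frames(frames, threshold=0.5):
    """Keep frames whose score differs from f_0 by more than the threshold."""
    scorer = FrameScorer().eval()
    f0, kept = 0.0, []
    with torch.no_grad():
        for t, frame in enumerate(frames):
            f_t = scorer(frame).item()
            if abs(f_t - f0) > threshold:   # difference above the set threshold
                kept.append(t)              # frame becomes data for later modules
                f0 = f_t                    # assign f_T to f_0
    return kept

# Usage: ten random 64x64 "frames" stand in for decoded video.
frames = [torch.rand(1, 3, 64, 64) for _ in range(10)]
print(select_frames(frames))
```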
In some embodiments, the artificial neural network used in the data preprocessing module 201 may be a deep neural network; the deep neural network algorithm divides into a training process and a use process. Training uses historically collected images or video content. Optionally, different data sets are used for different targets: in a parking-lot task, for example, videos of different parking lots annotated with license plate numbers serve as the training set for the deep neural network. The deep neural network may include convolutional, fully connected, pooling, and batch normalization (batch norm) layers to perform the coarse processing (i.e., the selection operation of the data preprocessing module 201 described above) on the video or images.
The target extraction module 202 performs target detection and acquires the target object, or the target image of at least part of it, in the image. The partial target image should be selected to form a portion of the image that distinguishes between target objects, such as an image containing part of a vehicle's license plate.
In some embodiments, acquiring a target image of the target object or of at least part of it may include: generating candidate regions with a selective search algorithm, dividing the picture into many small regions and merging them by similarity with a hierarchical grouping method to obtain candidate bounding boxes of the target object or of at least part of it; and applying a sliding window method to the candidate bounding boxes, sliding a window sized to the proportions of the target object or of the relevant part over each box to obtain the target image region. Taking an automobile as the target object, with the partial target image including the license plate: a target-detection network can be trained; the selective search algorithm generates candidate regions by first dividing the image into many small regions with a simple partition, then merging them by similarity with a hierarchical grouping method to obtain a candidate bounding box for each automobile; a sliding window method is then applied to each automobile's candidate bounding box, sliding a window whose size matches the proportions of a license plate over the box to obtain the license plate region.
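The sketch below is a heavily simplified stand-in for this step: a grid over-segmentation plays the role of the initial small regions, a greedy colour-similarity merge plays the role of hierarchical grouping, and a plate-shaped window slides inside each resulting box. The grid size, the similarity tolerance, and the 3:1 plate aspect ratio are assumptions, not values from the patent.

```python
# Simplified stand-in for candidate-region generation plus sliding window.
import numpy as np

def grid_regions(img, cell=32):
    """Split the image into small square regions (initial over-segmentation)."""
    h, w, _ = img.shape
    return [(y, x, cell, cell)
            for y in range(0, h - cell + 1, cell)
            for x in range(0, w - cell + 1, cell)]

def merge_similar(img, regions, tol=20.0):
    """Greedy grouping: merge horizontally adjacent regions with similar
    mean colours, yielding candidate bounding boxes."""
    boxes = []
    for (y, x, h, w) in regions:
        mean = img[y:y+h, x:x+w].mean(axis=(0, 1))
        merged = False
        for b in boxes:
            by, bx, bh, bw, bmean = b
            adjacent = (by == y and bx + bw == x)
            if adjacent and np.linalg.norm(bmean - mean) < tol:
                b[3] += w                       # grow the box to the right
                b[4] = (bmean + mean) / 2.0     # update running mean colour
                merged = True
                break
        if not merged:
            boxes.append([y, x, h, w, mean])
    return [(y, x, h, w) for (y, x, h, w, _) in boxes]

def slide_plate_windows(box, aspect=3.0, step=8):
    """Slide a window with a license-plate aspect ratio inside a candidate box."""
    y, x, h, w = box
    wh = max(step, int(w / aspect))             # window height from aspect ratio
    return [(yy, x, wh, w) for yy in range(y, y + h - wh + 1, step)]

img = (np.random.rand(128, 256, 3) * 255).astype(np.float32)
candidates = merge_similar(img, grid_regions(img))
windows = [win for box in candidates for win in slide_plate_windows(box)]
print(len(candidates), "candidate boxes,", len(windows), "plate-shaped windows")
```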
In some embodiments, the artificial neural network used by the target extraction module 202 may be a deep neural network; the algorithm divides into a training process and a use process. Training uses historically collected images or video frames that meet the set condition. The deep neural network may include convolutional, fully connected, pooling, and batch normalization (batch norm) layers to acquire the target object, or at least part of it, from the images or video frames satisfying the set condition (i.e., the acquisition operation of the target extraction module 202 described above).
The recognition module 204 recognizes the target image and identifies identification data that distinguishes the target object. The identification data may be any data extracted from the target image that can distinguish target objects, including but not limited to patterns, Chinese characters, letters, numbers, and symbols. For a person, it may be height or the patterns, characters, letters, and/or numbers on worn articles; for an artificial object such as an automobile, it may be the combination of letters and digits on the license plate, or part of the plate number, such as its last three digits.
In some embodiments, recognizing the target image and identifying the identification data that distinguishes the target objects may include: locating and separately recognizing the identification data in the picture through an artificial neural network: extract all candidate boxes from the image, resize each candidate box to the input size of the artificial neural network, obtain a feature map through the convolutional neural network, and feed the feature map into a classification network that recognizes it, finally yielding the identification data sought in the original image. Taking an automobile as the target object, with the partial target image including the license plate: all candidate boxes are extracted from the image and resized to the input size of the convolutional neural network; the feature map produced by the convolutional neural network is fed into a classification network that recognizes it, finally yielding the information sought in the original image; in the parking-lot task this is the automobile's license plate number (identification data combining letters and digits).
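A minimal sketch of this recognition step follows, assuming a 32x32 network input size, a 36-character class set (digits and uppercase letters), and untrained weights; none of these specifics come from the patent.

```python
# Hypothetical recognition sketch: resize candidate crops to the network
# input size, produce a feature map with a small CNN, classify each crop.
import torch
import torch.nn as nn
import torch.nn.functional as F

CLASSES = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

class PlateCharClassifier(nn.Module):
    def __init__(self, num_classes=len(CLASSES)):
        super().__init__()
        self.backbone = nn.Sequential(              # produces the feature map
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)  # classification net

    def forward(self, x):                           # x: (N, 3, 32, 32)
        feat = self.backbone(x)
        return self.classifier(feat.flatten(1))

def recognize(crops, input_size=32):
    """Resize crops to the network input size and decode predicted characters."""
    model = PlateCharClassifier().eval()
    batch = torch.stack([
        F.interpolate(c.unsqueeze(0), size=(input_size, input_size),
                      mode="bilinear", align_corners=False).squeeze(0)
        for c in crops])
    with torch.no_grad():
        preds = model(batch).argmax(dim=1)
    return "".join(CLASSES[i] for i in preds)

# Usage: random crops stand in for the character regions of one plate.
crops = [torch.rand(3, 40, 24) for _ in range(7)]   # e.g. a 7-character plate
print(recognize(crops))   # untrained weights -> arbitrary characters
```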
In some embodiments, the artificial neural network used by the recognition module 204 may be a deep neural network; the algorithm divides into a training process and a use process. Training uses historically collected target images. The deep neural network may include convolutional, fully connected, pooling, and batch normalization (batch norm) layers to recognize the target object in the target image (i.e., the recognition operation of the recognition module 204 described above).
In some embodiments, the processing apparatus 200 may further include an enhancement module 203, which processes an image or video frame that satisfies the set condition and has a first resolution to obtain an image or video frame of a second, higher resolution. If the image coming from the target extraction module 202 has low resolution, recognition performed directly by the recognition module 204 is not accurate, so the enhancement module 203 processes the low-resolution image into a high-resolution one, improving definition. The module can be composed of convolutions and deconvolutions: features are extracted by convolution, a high-dimensional feature map is obtained by deconvolution, a convolutional layer with 1x1 kernels performs a non-linear mapping from that high-dimensional feature map to another one, and a final convolutional layer reconstructs the high-resolution image.
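The following sketch mirrors that layer sequence: convolution for feature extraction, a transposed convolution (deconvolution) for upsampling, a 1x1 convolution for the non-linear mapping, and a final convolution for reconstruction. The channel counts and the 2x scale factor are assumptions; the patent specifies only the kinds of layers.

```python
# Sketch of the enhancement module's layer sequence.
import torch
import torch.nn as nn

class Enhancer(nn.Module):
    def __init__(self, scale=2):
        super().__init__()
        self.extract = nn.Sequential(nn.Conv2d(3, 32, 5, padding=2), nn.ReLU())
        self.upsample = nn.Sequential(                     # deconvolution layer
            nn.ConvTranspose2d(32, 32, 4, stride=scale, padding=1), nn.ReLU())
        self.mapping = nn.Sequential(nn.Conv2d(32, 32, 1), nn.ReLU())  # 1x1 conv
        self.reconstruct = nn.Conv2d(32, 3, 5, padding=2)  # reconstruction layer

    def forward(self, lr):
        return self.reconstruct(self.mapping(self.upsample(self.extract(lr))))

lr = torch.rand(1, 3, 60, 80)          # first (lower) resolution
sr = Enhancer()(lr)
print(sr.shape)                        # (1, 3, 120, 160): second (higher) resolution
```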
In some embodiments, the processing device comprises a neural network processor integrating at least one of the data preprocessing module 201, the target extraction module 202, the recognition module 204, and the enhancement module 203. Alternatively, the four modules of the processing device may each use their own processor; each neural network processor has the structure shown in fig. 3, and the processor can perform both training and inference.
As shown in fig. 3, in some embodiments, the neural network processor includes a storage unit 310, a control unit 320, and an arithmetic unit 330. The storage unit 310 stores input data (which may be input neurons), neural network parameters, and instructions; the control unit 320 reads dedicated instructions from the storage unit 310, decodes them into arithmetic-unit instructions, and feeds them to the arithmetic unit 330; the arithmetic unit 330 performs the corresponding neural network operations on the data according to those instructions to obtain output neurons. The storage unit 310 may also store the output neurons produced by the arithmetic unit 330. Neural network parameters here include, but are not limited to, weights, biases, and activation functions. Preferably, the initialized weights among the parameters are already-trained weights, so the artificial neural network operation can be performed directly, saving the step of training the network.
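To make the division of labour concrete, here is a toy software model of the three units: a storage dictionary, a control step that decodes a dedicated instruction into an arithmetic operation, and an arithmetic step that executes it. The instruction name and the fully connected operation are invented for illustration; real dedicated instruction sets are device-specific.

```python
# Toy model of the storage/control/arithmetic organisation described above.
import numpy as np

class NNProcessor:
    def __init__(self):
        # Storage unit: holds input data, parameters, and instructions.
        self.storage = {"input": None, "weights": None, "instructions": []}

    def control_decode(self, instr):
        """Control unit: map a dedicated instruction to an arithmetic op."""
        return {"FC_LAYER": self.fully_connected}[instr]

    def fully_connected(self):
        """Arithmetic unit: weighted sum plus ReLU activation."""
        x, w = self.storage["input"], self.storage["weights"]
        return np.maximum(w @ x, 0.0)   # output neurons

    def run(self):
        out = None
        for instr in self.storage["instructions"]:
            out = self.control_decode(instr)()
        return out

p = NNProcessor()
p.storage.update(input=np.array([1.0, -2.0, 0.5]),
                 weights=np.random.randn(4, 3),
                 instructions=["FC_LAYER"])
print(p.run())   # four output neurons
```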
In some embodiments, performing the corresponding neural network operation in the arithmetic unit 330 includes: multiplying the input neurons by the weight data to obtain products; performing an addition-tree operation that sums the products stage by stage through an addition tree to obtain a weighted sum, to which a bias is added or which is left unprocessed; and applying an activation function to the biased or unprocessed weighted sum to obtain the output neurons. Preferably, the activation function may be a sigmoid, tanh, ReLU, or softmax function.
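A small numeric sketch of this sequence, with toy sizes and values: elementwise multiplication, a stage-by-stage addition tree (padded to a power of two, as a hardware tree would be), an optional bias, and an activation function.

```python
# Minimal numeric sketch of the arithmetic unit's operation sequence.
import numpy as np

def addition_tree(products):
    """Sum values by pairwise addition, stage by stage, as a hardware
    addition tree would (length padded to a power of two with zeros)."""
    n = 1 << (len(products) - 1).bit_length()
    level = np.pad(products, (0, n - len(products)))
    while level.size > 1:
        level = level[0::2] + level[1::2]   # one stage of the tree
    return level[0]

def neuron(inputs, weights, bias=None, activation=np.tanh):
    products = inputs * weights             # multiply input neurons by weights
    weighted_sum = addition_tree(products)  # addition-tree accumulation
    if bias is not None:                    # add bias, or leave unprocessed
        weighted_sum += bias
    return activation(weighted_sum)         # e.g. sigmoid/tanh/ReLU/softmax

x = np.array([0.5, -1.0, 2.0, 0.25, 1.5])
w = np.array([0.2,  0.4, 0.1, -0.3, 0.6])
print(neuron(x, w, bias=0.1))
```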
In some embodiments, as shown in fig. 3, the neural network processor may further include a direct memory access unit DMA 340, through which the input data, neural network parameters, and instructions stored in the storage unit 310 are fetched for the control unit 320 and the arithmetic unit 330 to call; the arithmetic unit 330 also writes the output neurons back to the storage unit 310 after computing them.
In some embodiments, as shown in fig. 3, the neural network processor further comprises an instruction cache 350 for caching instructions from the direct memory access DMA 340 for the control unit 320 to call. The instruction cache 350 may be an on-chip cache, integrated on the neural network processor by the fabrication process; it speeds up instruction fetch and saves overall operation time.
In some embodiments, the neural network processor further comprises: an input neuron cache 370 for caching input neurons from the direct memory access DMA 340 for the arithmetic unit 330 to call; a weight cache 360 for caching weights from the DMA 340 for the arithmetic unit 330 to call; and an output neuron cache 380 for storing output neurons produced by the arithmetic unit 330 for output to the DMA 340. The input neuron cache 370, weight cache 360, and output neuron cache 380 may likewise be on-chip caches integrated by a semiconductor process, speeding up the arithmetic unit 330's reads and writes and saving overall operation time.
Based on the same inventive concept, an embodiment of the present disclosure further provides a control system comprising: at least one intelligent camera of the above embodiments, the number of intelligent cameras being configured to cover a set place; and a control end that receives the identification data processed by each intelligent camera and determines the position, within the set place, of the target object corresponding to the identification data.
Referring to figs. 4 and 5, in a set place 430 (any of various interior or exterior spaces, including but not limited to a parking lot, a square, a classroom, or an office), images are captured and processed by the smart cameras 410. The figure schematically shows a captured image and/or video frame at least partially containing a target object 420 (for example, an automobile); in practice there may be many automobiles, possibly parked irregularly, and once a smart camera captures and recognizes the image, the identification data of the target object (for example, the license plate number or part of it) can be determined.
Since the set place 430 is covered by at least one camera, the whole place can normally be photographed and analyzed, so the system can determine which smart camera 410 photographed the target object 420; the bearing of the target object within the set place can then be derived from the image captured by that smart camera, using the smart camera's own position information as the reference.
Referring to fig. 5, the control end 440 receives the identification data of the targets determined by each smart camera 410 and can combine it with the targets' (i.e., the smart cameras' 410) position information for statistics, analysis, and display: for example, analyzing which parts of the set place 430 are full and which have vacancies, counting the targets in the set place whose identification data is identical or similar, looking up a target's position from its identification data, or looking up a target's identification data from a position, and so on.
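A hypothetical sketch of such a control end follows: it collects the identification data reported by each smart camera and answers location and occupancy queries from the cameras' known positions. The camera IDs, zone names, and plate numbers are invented for illustration.

```python
# Hypothetical control-end sketch: aggregate camera reports, answer queries.
from collections import defaultdict

class ControlEnd:
    def __init__(self, camera_positions):
        self.camera_positions = camera_positions      # camera id -> zone name
        self.sightings = defaultdict(set)             # identification -> camera ids

    def report(self, camera_id, identification):
        """Called when a camera recognises identification data (e.g. a plate)."""
        self.sightings[identification].add(camera_id)

    def locate(self, identification):
        """Return the zones where the target with this identification was seen."""
        return [self.camera_positions[c] for c in self.sightings[identification]]

    def occupancy(self):
        """Number of distinct targets seen per zone (e.g. parking-lot load)."""
        counts = defaultdict(int)
        for cams in self.sightings.values():
            for c in cams:
                counts[self.camera_positions[c]] += 1
        return dict(counts)

ctrl = ControlEnd({1: "zone A", 2: "zone B"})
ctrl.report(1, "B-12345")        # camera 1 recognised plate B-12345
ctrl.report(2, "C-777")
print(ctrl.locate("B-12345"))    # -> ['zone A']
print(ctrl.occupancy())
```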
In some embodiments, the control end 440 may include a display device and/or a voice output device for outputting the position information of the target object in the set place, determined by the control end 440 as described above.
In the embodiments provided in the present disclosure, it should be understood that the disclosed related devices and methods may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the described parts or modules is only one logical division, and other divisions may be realized in practice, for example, a plurality of parts or modules may be combined or integrated into a system, or some features may be omitted or not executed.
This disclosure uses the term "and/or". As used herein, "and/or" means one or the other or both (e.g., A and/or B means A, or B, or both A and B).
In the description above, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It will be apparent, however, to one skilled in the art, that one or more other embodiments may be practiced without some of these specific details. The specific embodiments described are not intended to limit the disclosure but rather to illustrate it. The scope of the present disclosure is not to be determined by the specific examples provided above but only by the claims below. In other instances, well-known circuits, structures, devices, and operations are shown in block diagram form, rather than in detail, in order not to obscure an understanding of the description. Where considered appropriate, reference numerals or terminal portions of reference numerals have been repeated among the figures to indicate corresponding or analogous elements, optionally having similar characteristics or identical features, unless otherwise specified or evident.
Each functional unit/subunit/module/submodule in the present disclosure may be hardware, for example, the hardware may be a circuit, including a digital circuit, an analog circuit, and the like. Physical implementations of hardware structures include, but are not limited to, physical devices including, but not limited to, transistors, memristors, and the like. The computing module in the computing device may be any suitable hardware processor, such as a CPU, GPU, FPGA, DSP, ASIC, and the like. The memory unit may be any suitable magnetic or magneto-optical storage medium, such as RRAM, DRAM, SRAM, EDRAM, HBM, HMC, etc.
It will be clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to perform all or part of the above described functions.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention and are not intended to limit the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (13)

1. An intelligent camera, characterized by comprising:
an image capturing device for capturing or receiving images and/or videos containing a target object; and a processing device comprising:
a data preprocessing module for selecting an image or video frame satisfying a set condition from the captured images and/or videos, wherein the set condition comprises extracting, from the captured images and/or videos, images or video frames whose difference exceeds a set threshold; extracting the images or video frames whose difference exceeds the set threshold comprises processing the video through an artificial neural network, specifically: taking the current video frame T, extracting its features by convolution, and obtaining a score f_T from the output layer of the neural network as the score of that frame, representing its features; comparing f_T with f_0, where f_0 is initialized to 0; if the difference exceeds the set threshold, taking the frame as data for the subsequent modules and assigning f_T to f_0; then taking the next video frame T+1 and repeating until all video frames are processed;
an enhancement module for processing an image or video frame that satisfies the set condition and has a first resolution to obtain an image or video frame of a second resolution, the second resolution being higher than the first;
a target extraction module for performing target detection and acquiring a target image of the target object, or of at least part of it, in the image, comprising: reading one image or video frame through an artificial neural network and acquiring the target image of the target object or of at least part of the target object, wherein acquiring the target image comprises: generating candidate regions with a selective search algorithm through an artificial neural network, dividing the picture into many small regions and merging them by similarity with a hierarchical grouping method to obtain candidate bounding boxes of the target object or of at least part of it; and applying a sliding window method to the candidate bounding boxes, sliding a window sized to the proportions of the target object, or of the relevant part of it, over each box to obtain the target image region; and
a recognition module for recognizing the target image and identifying identification data that distinguishes the target object.
2. The intelligent camera of claim 1, wherein the target object is a human, an animal, a plant, a natural object, or an artificial object.
3. The intelligent camera of claim 2, wherein the artificial object is an automobile and the at least partial target image includes the automobile's license plate.
4. The smart camera of claim 1, wherein the identification data comprises at least one of:
patterns, Chinese characters, letters, numbers and symbols.
5. The intelligent camera of claim 1, wherein the recognition module recognizing the target image and identifying identification data that distinguishes the target object comprises:
locating and separately recognizing the identification data in the picture through an artificial neural network: extracting all candidate boxes from the image, resizing each candidate box to the input size of the artificial neural network, obtaining a feature map through the convolutional neural network, and feeding the feature map into a classification network that recognizes it, finally yielding the identification data sought in the original image.
6. The intelligent camera of claim 1, wherein the processing device comprises a neural network processor that integrates at least one of the data preprocessing module, the target extraction module, the recognition module, and the enhancement module.
7. The smart camera of claim 6, wherein the neural network processor comprises:
a storage unit for storing input data, neural network parameters, and instructions;
a control unit for reading dedicated instructions from the storage unit, decoding them into arithmetic-unit instructions, and feeding them to the arithmetic unit; and
an arithmetic unit for performing the corresponding neural network operations on the data according to the arithmetic-unit instructions to obtain output neurons.
8. The intelligent camera of claim 7, wherein performing the corresponding neural network operation in the arithmetic unit comprises:
multiplying input neurons by weight data to obtain products;
performing an addition-tree operation that sums the products stage by stage through an addition tree to obtain a weighted sum, and adding a bias to the weighted sum or leaving it unprocessed; and
applying an activation function to the biased or unprocessed weighted sum to obtain output neurons.
9. The smart camera of claim 7, wherein the processor further comprises:
a preprocessing unit for preprocessing the image and/or video data captured by the camera and converting it into data that conforms to the neural network input format;
and/or a direct memory access DMA through which the input data, neural network parameters, and instructions stored in the storage unit are fetched for the control unit and the arithmetic unit to call.
10. The smart camera of claim 9, wherein the processor further comprises at least one of:
an instruction cache for caching instructions from the direct memory access DMA for the control unit to call;
an input neuron cache for caching input neurons from the direct memory access DMA for the arithmetic unit to call;
a weight cache for caching weights from the direct memory access DMA for the arithmetic unit to call; and
an output neuron cache for storing the output neurons produced by the arithmetic unit for output to the direct memory access DMA.
11. The intelligent camera of claim 10, wherein the instruction cache, input neuron cache, weight cache, and output neuron cache are on-chip caches.
12. A control system, comprising:
at least one intelligent camera as recited in any one of claims 1-11, the number of intelligent cameras being configured to cover a set place; and
a control end that receives the identification data processed by each intelligent camera and determines the position, within the set place, of the target object corresponding to the identification data.
13. The control system of claim 12, wherein the control end comprises:
a display device and/or a voice output device for outputting the position information, determined by the control end, of the target object in the set place.
CN201811402218.4A 2018-11-22 2018-11-22 Intelligent camera and control system Active CN109714526B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811402218.4A CN109714526B (en) 2018-11-22 2018-11-22 Intelligent camera and control system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811402218.4A CN109714526B (en) 2018-11-22 2018-11-22 Intelligent camera and control system

Publications (2)

Publication Number Publication Date
CN109714526A CN109714526A (en) 2019-05-03
CN109714526B true CN109714526B (en) 2021-02-09

Family

ID=66254976

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811402218.4A Active CN109714526B (en) 2018-11-22 2018-11-22 Intelligent camera and control system

Country Status (1)

Country Link
CN (1) CN109714526B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021179186A1 (en) * 2020-03-10 2021-09-16 华为技术有限公司 Focusing method and apparatus, and electronic device
CN111739065A (en) * 2020-06-29 2020-10-02 上海出版印刷高等专科学校 Target identification method, system, electronic equipment and medium based on digital printing
CN111931692A (en) * 2020-08-31 2020-11-13 青岛聚看云科技有限公司 Display device and image recognition method
CN112104840B (en) * 2020-09-09 2022-10-04 深圳市有方科技股份有限公司 Video acquisition method and mobile baseband workstation
CN113343891A (en) * 2021-06-24 2021-09-03 深圳市起点人工智能科技有限公司 Detection device and detection method for child kicking quilt
CN113918679A (en) * 2021-09-22 2022-01-11 三一汽车制造有限公司 Knowledge question and answer method and device and engineering machinery
CN114003160B (en) * 2021-10-29 2024-03-29 影石创新科技股份有限公司 Data visual display method, device, computer equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140300758A1 (en) * 2013-04-04 2014-10-09 Bao Tran Video processing systems and methods
CN104284150A (en) * 2014-09-23 2015-01-14 同济大学 Smart camera autonomous coordinative tracking method and monitoring system based on road traffic monitoring
CN108764468A (en) * 2018-05-03 2018-11-06 中国科学院计算技术研究所 Artificial neural network processor for intelligent recognition

Also Published As

Publication number Publication date
CN109714526A (en) 2019-05-03


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant