CN109714526A

CN109714526A - Intelligent camera and control system

Info

Publication number: CN109714526A
Application number: CN201811402218.4A
Authority: CN
Inventors: 周诗怡; 陈云霁
Original assignee: Institute of Computing Technology of CAS
Current assignee: Institute of Computing Technology of CAS
Priority date: 2018-11-22
Filing date: 2018-11-22
Publication date: 2019-05-03
Anticipated expiration: 2038-11-22
Also published as: CN109714526B

Abstract

The disclosure provides an intelligent camera and a control system comprising it. The intelligent camera includes: an image capture device for shooting or receiving images and/or video containing a target object; and a processing unit comprising: a data preprocessing module that selects, from the captured images and/or video, images or video frames meeting a set condition; an object extraction module for target detection that obtains a target image of the target object, or of at least part of the target object, in the image; and an identification module that recognizes the target image and identifies the identification data that distinguishes the target object. Arranging several neural-network-based processing modules in the intelligent camera improves processing speed.

Description

Intelligent camera and control system

Technical field

The present disclosure relates to the technical field of information processing, and in particular to an intelligent camera and a control system comprising the intelligent camera.

Background technique

Settings that call for an intelligent camera present several problems that are hard to solve. For example, in a large parking lot, drivers often cannot find the specific space where their own vehicle is parked and can only locate it with the car key and with guidance from several staff members; an intelligent camera could help with finding parked vehicles and parking spaces in the lot. Finding a specific person in a place containing many people can currently only be done by visual inspection, and finding animals or plants of a particular species among many animals and plants likewise relies on visual inspection.

Summary of the invention

(1) Technical problem to be solved

In view of this, the purpose of the present invention is to provide an intelligent camera and a control system comprising the intelligent camera.

(2) Technical solution

According to one aspect of the disclosure, an intelligent camera is provided, comprising:

An image capture device for shooting or receiving images and/or video containing a target object; and

A processing unit, comprising: a data preprocessing module that selects, from the captured images and/or video, images or video frames meeting a set condition; an object extraction module for target detection that obtains a target image of the target object, or of at least part of the target object, in the image; and an identification module that recognizes the target image and identifies the identification data that distinguishes the target object.

In a further embodiment, the processing unit further includes an enhancement module that processes those images or video frames meeting the set condition whose resolution is a first resolution to obtain images or video frames of a second resolution, the second resolution being higher than the first resolution.

In a further embodiment, the set condition includes: extracting, from the captured images and/or video, images or video frames whose difference is above a set threshold.

In a further embodiment, extracting video frames whose difference is above the set threshold includes processing the video with an artificial neural network, specifically: take the current video frame T; extract features from the frame by convolution; finally, obtain a score f_T from the output layer of the neural network, which serves as the score of the video frame and represents the features of the frame. Compare f_T with f_0, where f_0 is initialized to 0. If the difference is greater than the set threshold, the frame is used as data for the subsequent modules and f_T is assigned to f_0. Then take the next video frame T+1 and repeat the above until all video frames have been processed.

In a further embodiment, the target object is a person, an animal, a plant, a natural object, or a man-made object.

In a further embodiment, the man-made object is an automobile, and the image of at least part of the target object includes the automobile's license plate.

In a further embodiment, in the object extraction module, performing target detection to obtain a target image of the target object, or of at least part of the target object, in the image includes: reading one of the images or video frames through an artificial neural network and obtaining the target image of the target object or of at least part of the target object.

In a further embodiment, obtaining the target image of the target object or of at least part of the target object includes: through an artificial neural network, generating candidate regions with a selective search algorithm, in which the picture is divided into many small regions that are merged according to similarity by a hierarchical grouping method, to obtain candidate bounding boxes of the target object or of at least part of it; and applying a sliding-window method to each candidate bounding box, sliding a window sized according to the scale of the target object or of at least part of it over the bounding box, to obtain the target image region of the target object or of at least part of it.

In a further embodiment, the identification data includes at least one of the following: patterns, Chinese characters, letters, digits, and symbols.

In a further embodiment, in the identification module, recognizing the target image and identifying the identification data that distinguishes the target object includes: locating and recognizing the identification data in the picture through an artificial neural network, specifically: extracting all candidate boxes of the image; resizing each candidate box to fit the input size of the artificial neural network; and feeding the feature map obtained by a convolutional neural network into a classification network, which recognizes the feature map to finally obtain the identification data sought in the original image.

In a further embodiment, the processing unit includes a neural network processor that integrates at least one of the data preprocessing module, the object extraction module, the data processing module, and the enhancement module.

In a further embodiment, the neural network processor includes: a storage unit for storing input data, neural network parameters, and instructions; a control unit for reading dedicated instructions from the storage unit, decoding them into arithmetic unit instructions, and inputting them to the arithmetic unit; and an arithmetic unit for performing the corresponding neural network operations on the data according to the arithmetic unit instructions to obtain output neurons.

In a further embodiment, performing the corresponding neural network operations in the arithmetic unit includes: multiplying input neurons by weight data to obtain multiplication results; performing an adder tree operation, in which the multiplication results are added stage by stage through an adder tree to obtain a weighted sum, to which a bias is added or which is left unprocessed; and performing an activation function operation on the biased or unprocessed weighted sum to obtain output neurons.

In a further embodiment, the processor further includes: a preprocessing unit for preprocessing the image and/or video data captured by the camera and converting it into a face recognition result, the result being data that conforms to the neural network input format; and/or a direct memory access (DMA) unit for fetching the input data, neural network parameters, and instructions stored in the storage unit for use by the control unit and the arithmetic unit.

In a further embodiment, the processor further includes at least one of the following: an instruction cache for caching instructions from the DMA, for use by the control unit; an input neuron cache for caching input neurons from the DMA, for use by the arithmetic unit; a weight cache for caching weights from the DMA, for use by the arithmetic unit; and an output neuron cache for storing the output neurons obtained from the arithmetic unit after the operation, for output to the DMA.

In a further embodiment, the instruction cache, input neuron cache, weight cache, and output neuron cache are on-chip caches.

According to another aspect of the present disclosure, a control system is provided, comprising: any of the intelligent cameras described above, the number of intelligent cameras being configured so that their imaging covers a set site; and

A control terminal that receives the identification data produced by each intelligent camera and determines the position, in the set site, of the target object corresponding to the identification data.

In a further embodiment, the control terminal further includes a display device and/or a voice output device for outputting the position information, in the set site, of the target object determined by the control terminal.

(3) Beneficial effects

Providing a control system containing intelligent cameras facilitates parking lot management, or the identification of vehicles parked haphazardly on roads;

Arranging several neural-network-based processing modules in the intelligent camera improves processing speed;

Executing the several modules of the intelligent camera in parallel further increases processing speed.

Brief description of the drawings

Fig. 1 is a schematic diagram of the intelligent camera according to an embodiment of the present invention.

Fig. 2 is a functional block diagram of the processing unit in Fig. 1.

Fig. 3 is a functional block diagram of the neural network processor in the processing unit of Fig. 2.

Fig. 4 is a schematic diagram of an application scenario of the intelligent camera according to an embodiment of the present invention.

Fig. 5 is a schematic diagram of the control system according to an embodiment of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the disclosure. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the disclosure without creative effort fall within the protection scope of the disclosure.

According to the basic concept of the disclosure, an intelligent camera is provided that includes an image capture device and a processing unit. The image capture device shoots or receives images and/or video frames containing a target object. The processing unit selects, from the images and/or video captured by the image capture device, images or video frames meeting a set condition, then obtains a target image of the target object, or of at least part of the target object, in the image or video frame, and identifies the identification data that distinguishes the target object. Through these operations the intelligent camera can identify the identification data that distinguishes target objects, for later further processing (such as statistical analysis or searching for a target object).

Fig. 1 is a schematic diagram of the intelligent camera 100 according to an embodiment of the present invention. As shown in Fig. 1, the intelligent camera 100 includes an image capture device 120 and a processing unit 110.

The image capture device 120 shoots or receives images and/or video containing a target object. It can be any existing electronic device capable of recording images and/or video, obtaining external information through electromagnetic, optical, or other signal sources. Its structure can follow that of the various prior-art electronic devices with an image capture function, including but not limited to video cameras, still cameras, mobile phones with a shooting function, and tablet computers. Such prior-art devices, however, do not include a functional unit or module that performs neural network operations on image data.

In embodiments of the present invention, the target object can be a person, an animal, a plant, a natural object, or a man-made object; individual objects can differ from one another and can be told apart by identification data. For example, if the target object is a person, the identification data can be patterns, text, and/or numbers on the clothing the person wears, or a partial or overall bodily feature of the person, including but not limited to hairstyle, face shape, or height. If the target object is an animal, it can, as with a person, be distinguished by what it wears or by partial or overall features of the animal. If the target object is a man-made object, taking an automobile as an example, automobiles can be distinguished by license plate, or by vehicle color, model, and so on.

Fig. 2 is the functional-block diagram of processing unit in Fig. 1.It in some embodiments, can be with for processing unit 200 Including data preprocessing module 201, object extraction module 202 and identification module 204.

The data preprocessing module 201 selects, from the images and/or video captured by the image capture device 120, images or video frames meeting a set condition. The set condition can include extracting, from the captured images and/or video, images or video frames whose difference is above a set threshold. Further, the selection can be made by processing the video with an artificial neural network, specifically: take the current video frame T; extract features from the frame by convolution; finally, obtain a score f_T from the output layer of the neural network, which serves as the score of the video frame and represents the features of the frame. Compare f_T with f_0, where f_0 is initialized to 0. If the difference is greater than the set threshold, the frame is used as data for the subsequent modules and f_T is assigned to f_0. Then take the next video frame T+1 and repeat the above until all video frames have been processed.
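The frame selection loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scoring network is replaced by a hypothetical callable `score_fn` (here a simple mean-intensity function), frames are plain lists of numbers, and the "difference" is taken as the absolute difference between f_T and f_0, which the text does not specify.

```python
from typing import Callable, Iterable, List, Sequence

Frame = Sequence[float]  # stand-in for decoded pixel data (assumption)


def select_key_frames(
    frames: Iterable[Frame],
    score_fn: Callable[[Frame], float],
    threshold: float,
) -> List[int]:
    """Return indices of frames whose score differs from f_0 by more than threshold."""
    f_0 = 0.0  # reference score, initialized to 0 as in the text
    selected: List[int] = []
    for index, frame in enumerate(frames):
        f_t = score_fn(frame)  # in the patent this comes from the network's output layer
        # Assumption: "difference" means absolute difference.
        if abs(f_t - f_0) > threshold:
            selected.append(index)  # frame is passed on to the subsequent modules
            f_0 = f_t               # the kept frame becomes the new reference
    return selected


def mean_intensity(frame: Frame) -> float:
    """Hypothetical scoring function standing in for the neural network."""
    return sum(frame) / len(frame)


if __name__ == "__main__":
    video = [[0.1, 0.1], [0.5, 0.7], [0.6, 0.7], [0.1, 0.2], [0.9, 1.0]]
    print(select_key_frames(video, mean_intensity, threshold=0.3))
```

Because f_0 is only updated when a frame is kept, slow drift across many nearly identical frames still eventually triggers a selection, while runs of near-duplicate frames are skipped.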

In some embodiments, the artificial neural network used in the data preprocessing module 201 can be a deep neural network. The deep neural network algorithm is divided into two parts, a training process and a use process. The training process uses historically collected image or video content. Optionally, different data sets are used for different target objects; for example, in a parking lot task, videos of different parking lots annotated with license plate numbers serve as the training set for this deep neural network. The deep neural network can include convolutional layers, fully connected layers, pooling layers, and batch normalization (batch norm) layers, and performs coarse processing of the video or images (that is, the selection operation of the data preprocessing module 201 described above).

The object extraction module 202 performs target detection and obtains a target image of the target object, or of at least part of the target object, in the image. The part chosen for the target image should be one that distinguishes target objects from one another, such as the part of the image containing an automobile's license plate.

In some embodiments, obtaining the target image of the target object or of at least part of the target object can include: generating candidate regions with a selective search algorithm, in which the picture is divided into many small regions that are merged according to similarity by a hierarchical grouping method, to obtain candidate bounding boxes of the target object or of at least part of it; and applying a sliding-window method to each candidate bounding box, sliding a window sized according to the scale of the target object or of at least part of it over the bounding box, to obtain the target image region. As an example, take the target object to be an automobile and the target image of at least part of it to include the license plate. A network for target detection can be trained. Candidate regions are generated with the selective search algorithm: the picture is first divided into many small regions by simple region partitioning, and these are then merged according to a certain similarity by the hierarchical grouping method, yielding candidate bounding boxes for many vehicles. A sliding-window method is then applied to the candidate bounding box of each vehicle, sliding a window sized according to the scale of the license plate over the bounding box to obtain the license plate region.
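The sliding-window step can be illustrated with a short Python sketch. It covers only the window enumeration inside one candidate bounding box; the selective search stage is not reproduced. The box layout `(x, y, width, height)`, the stride, and the scoring callable `score_fn` (which in practice would be the trained detection network) are all assumptions made for illustration.

```python
from typing import Callable, Iterator, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels (assumed layout)


def sliding_windows(box: Box, window: Tuple[int, int], stride: int) -> Iterator[Box]:
    """Yield every window of size `window` that fits entirely inside `box`."""
    x0, y0, width, height = box
    win_w, win_h = window
    for y in range(y0, y0 + height - win_h + 1, stride):
        for x in range(x0, x0 + width - win_w + 1, stride):
            yield (x, y, win_w, win_h)


def best_window(
    box: Box,
    window: Tuple[int, int],
    stride: int,
    score_fn: Callable[[Box], float],
) -> Optional[Box]:
    """Return the highest-scoring window, or None if the window does not fit."""
    return max(sliding_windows(box, window, stride), key=score_fn, default=None)


if __name__ == "__main__":
    vehicle_box = (100, 50, 200, 120)  # hypothetical vehicle bounding box
    plate_size = (80, 30)              # hypothetical license plate scale
    # Toy score that prefers windows near a made-up plate location.
    toy_score = lambda b: -abs(b[0] - 160) - abs(b[1] - 130)
    print(best_window(vehicle_box, plate_size, 20, toy_score))
```

A smaller stride gives finer localization at the cost of more windows to score; the window size is fixed here because the text sizes it from the known scale of the plate.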

In some embodiments, the artificial neural network used by the object extraction module 202 can be a deep neural network. The deep neural network algorithm is divided into two parts, a training process and a use process. The training process uses historically collected images or video frames meeting the set condition. The deep neural network can include convolutional layers, fully connected layers, pooling layers, and batch normalization (batch norm) layers, and processes the images or video frames meeting the set condition to obtain the target object or at least part of it (that is, the acquisition operation of the object extraction module 202 described above).

The identification module 204 recognizes the target image and identifies the identification data that distinguishes the target object. The identification data can be any data extracted from the target image that can distinguish target objects, including but not limited to patterns, Chinese characters, letters, digits, and symbols. For a person, it can be height, or the patterns, text, letters, and/or numbers on what the person wears. For a man-made object such as an automobile, it can be the combination of letters and digits in the license plate number or, of course, part of the license plate number, such as its last three characters.

In some embodiments, recognizing the target image and identifying the identification data that distinguishes the target object can include: locating and recognizing the identification data in the picture through an artificial neural network, specifically: extracting all candidate boxes of the image; resizing each candidate box to fit the input size of the artificial neural network; and feeding the feature map obtained by a convolutional neural network into a classification network, which recognizes the feature map to finally obtain the identification data sought in the original image. As an example, take the target object to be an automobile and the target image of at least part of it to include the license plate: all candidate boxes of the image are extracted, each candidate box is resized to fit the input size of the convolutional neural network, and the feature map produced by the convolutional neural network is fed into the classification network, which recognizes it to obtain the information sought in the original image. In the parking lot task, this yields the license plate numbers of many vehicles (the identification data here being a combination of letters and digits).
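The "resize each candidate box to the network input size" step can be sketched as follows. This is a minimal illustration, not the patent's implementation: images are nested lists of pixel values, boxes use an assumed `(x, y, width, height)` layout, and nearest-neighbour sampling is chosen only because it is the simplest resizing method; the text does not say which interpolation is used.

```python
from typing import List, Tuple

Image = List[List[int]]          # rows of pixel values (assumed representation)
Box = Tuple[int, int, int, int]  # (x, y, width, height), assumed layout


def crop(image: Image, box: Box) -> Image:
    """Cut the region described by `box` out of `image`."""
    x, y, width, height = box
    return [row[x:x + width] for row in image[y:y + height]]


def resize_nearest(image: Image, out_w: int, out_h: int) -> Image:
    """Resize with nearest-neighbour sampling (an assumption, see lead-in)."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def prepare_candidates(image: Image, boxes: List[Box], input_size: Tuple[int, int]) -> List[Image]:
    """Crop every candidate box and bring it to the network's input size (width, height)."""
    out_w, out_h = input_size
    return [resize_nearest(crop(image, box), out_w, out_h) for box in boxes]


if __name__ == "__main__":
    picture = [[r * 4 + c for c in range(4)] for r in range(4)]
    for row in prepare_candidates(picture, [(1, 1, 2, 2)], (4, 4))[0]:
        print(row)
```

Each prepared crop would then be passed to the convolutional network and classifier, which are outside the scope of this sketch.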

In some embodiments, the artificial neural network used by the identification module 204 can be a deep neural network. The deep neural network algorithm is divided into two parts, a training process and a use process. The training process uses historically collected target images. The deep neural network can include convolutional layers, fully connected layers, pooling layers, and batch normalization (batch norm) layers, and processes the target image to identify the target object (that is, the identification operation of the identification module 204 described above).

In some embodiments, the processing unit 200 can also include an enhancement module 203, which processes those images or video frames meeting the set condition whose resolution is a first resolution to obtain images or video frames of a second resolution, the second resolution being higher than the first resolution. For example, if the resolution of the image obtained from the object extraction module 202 is low, the accuracy of recognizing it directly with the identification module 204 is not high, so the enhancement module 203 can process the low-resolution image into a high-resolution image, improving clarity. This module can consist of convolution and deconvolution: features are extracted by convolution, a high-dimensional feature map is obtained by deconvolution, a convolutional layer with a 1*1 kernel then performs a nonlinear mapping from that high-dimensional feature map to another high-dimensional feature map, and finally a convolutional layer reconstructs the high-resolution image.
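To see how the convolution, deconvolution, 1*1 convolution, and reconstruction stages change the spatial size of an image, the following Python sketch traces one side length through the four layers using the standard output-size formulas for convolution and transposed convolution. The kernel sizes, strides, and paddings are assumptions chosen so that the deconvolution upsamples by an even factor; the patent fixes only the 1*1 kernel of the mapping layer.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output side length of a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output side length of a transposed convolution: (n - 1) * s - 2p + k."""
    return (size - 1) * stride - 2 * padding + kernel


def enhanced_size(size: int, scale: int = 2) -> int:
    """Trace one side length through the four stages (layer settings are assumptions)."""
    if scale < 2 or scale % 2:
        raise ValueError("this sketch only handles even upscaling factors")
    size = conv_out(size, kernel=3, padding=1)   # feature extraction, size unchanged
    size = deconv_out(size, kernel=2 * scale, stride=scale, padding=scale // 2)  # upsampling
    size = conv_out(size, kernel=1)              # 1*1 nonlinear mapping, size unchanged
    size = conv_out(size, kernel=3, padding=1)   # reconstruction, size unchanged
    return size


if __name__ == "__main__":
    print(enhanced_size(32))      # side length after 2x enhancement
    print(enhanced_size(40, 4))   # side length after 4x enhancement
```

With these settings only the deconvolution changes the resolution; the other layers reshape the feature channels, which is consistent with the text's description of the 1*1 layer as a mapping between feature maps.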

In some embodiments, the processing unit includes a neural network processor that integrates at least one of the data preprocessing module, the object extraction module 202, the data processing module, and the enhancement module. That is, the four modules in the processing unit can each use one of four processors; each neural network processor has the same structure, shown in Fig. 3, and the processor can perform both training and inference.

As shown in Fig. 3, in some embodiments the neural network processor includes a storage unit 310, a control unit 320, and an arithmetic unit 330. The storage unit 310 stores input data (which can serve as input neurons), neural network parameters, and instructions. The control unit 320 reads dedicated instructions from the storage unit 310, decodes them into arithmetic unit 330 instructions, and inputs them to the arithmetic unit 330. The arithmetic unit 330 performs the corresponding neural network operations on the data according to the arithmetic unit 330 instructions to obtain output neurons. The storage unit 310 can also store the output neurons obtained after the operation of the arithmetic unit 330. The neural network parameters here include but are not limited to weights, biases, and activation functions. Preferably, the initial weights in the parameters are trained weights, so that artificial neural network operations can be performed directly, saving the process of training the neural network.

In some embodiments, performing the corresponding neural network operations in the arithmetic unit 330 includes: multiplying input neurons by weight data to obtain multiplication results; performing an adder tree operation, in which the multiplication results are added stage by stage through an adder tree to obtain a weighted sum, to which a bias is added or which is left unprocessed; and performing an activation function operation on the biased or unprocessed weighted sum to obtain output neurons. Preferably, the activation function can be a sigmoid, tanh, ReLU, or softmax function.
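The three stages of the arithmetic unit (multiplication, adder tree, optional bias plus activation) can be mimicked in software as below. This is a functional sketch only: the hardware adds pairs in parallel at each tree level, whereas this code simply reproduces the same pairwise order sequentially. The function names and the zero-padding of odd-length levels are assumptions made for illustration.

```python
import math
from typing import Callable, List, Optional, Sequence


def adder_tree(values: Sequence[float]) -> float:
    """Sum values level by level, pairing neighbours as an adder tree would."""
    level: List[float] = list(values)
    if not level:
        return 0.0
    while len(level) > 1:
        if len(level) % 2:
            level.append(0.0)  # pad odd levels so every adder has two inputs (assumption)
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]


def relu(x: float) -> float:
    return max(0.0, x)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(
    inputs: Sequence[float],
    weights: Sequence[float],
    bias: Optional[float] = None,
    activation: Callable[[float], float] = relu,
) -> float:
    """Multiply, reduce through the adder tree, optionally add a bias, then activate."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    products = [x * w for x, w in zip(inputs, weights)]  # multiplication stage
    weighted_sum = adder_tree(products)                  # adder tree stage
    if bias is not None:                                 # bias is optional, as in the text
        weighted_sum += bias
    return activation(weighted_sum)                      # activation stage


if __name__ == "__main__":
    print(neuron_output([1.0, 2.0, 3.0], [0.5, -1.0, 2.0], bias=0.5))
```

A tree of depth log2(n) is what lets the hardware finish n additions in far fewer sequential steps than a running sum would need.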

In some embodiments, as shown in Fig. 3, the neural network processor can also include a DMA 340 (Direct Memory Access), which fetches the input data, neural network parameters, and instructions stored in the storage unit 310 for use by the control unit 320 and the arithmetic unit 330. It is further used, after the arithmetic unit 330 has computed the output neurons, to write those output neurons to the storage unit 310.

In some embodiments, as shown in Fig. 3, the neural network processor further includes an instruction cache 350 for caching instructions from the DMA 340, for use by the control unit 320. The instruction cache 350 can be an on-chip cache, integrated on the neural network processor through the fabrication process, which improves processing speed when instructions are fetched and saves overall operation time.

In some embodiments, the neural network processor further includes: an input neuron cache 370 for caching input neurons from the DMA 340, for use by the arithmetic unit 330; a weight cache 360 for caching weights from the DMA 340, for use by the arithmetic unit 330; and an output neuron cache 380 for storing the output neurons obtained from the arithmetic unit 330 after the operation, for output to the DMA 340. The input neuron cache 370, weight cache 360, and output neuron cache 380 can also be on-chip caches, integrated on the neural network processor through semiconductor technology, which improves processing speed when the arithmetic unit 330 reads and writes and saves overall operation time.

Based on the same inventive concept, the embodiments of the present disclosure also provide a control system comprising at least one intelligent camera as described in the embodiments above and a control terminal. The number of intelligent cameras is configured so that their imaging covers a set site. The control terminal receives the identification data produced by each intelligent camera and determines the position, in the set site, of the target object corresponding to the identification data.

Referring to Fig. 4 and Fig. 5, in a set site 430 (such as various indoor or outdoor spaces, including but not limited to a parking lot, square, classroom, or office), image capture and processing are performed by intelligent cameras 410. For example, the captured images and/or video frames at least partly include images of the target object 420 (such as an automobile). The figure is only schematic; in practice there can be multiple automobiles, parked regularly or irregularly. After shooting and recognition by the intelligent camera, the identification data of the target object (such as the license plate number or part of it) can be determined.

Because the set site 430 is monitored by at least one camera, the whole set site can generally be imaged and analyzed to determine which intelligent camera 410 captured the target object 420. In addition, the orientation of the target object in the set site can be determined from the image captured by the intelligent camera (or obtained with reference to the installation position information of the intelligent camera).

Referring to Fig. 5, the control terminal 440 receives the identification data of the target objects determined by each intelligent camera 410 and can combine it with the orientation information of the target object (or of the intelligent camera 410) for statistics, analysis, and display. For example, it can analyze where in the set site 430 is full or still has vacancies, count the target objects in the set site that have identical or similar identification data, determine the orientation information of a target object by searching for its identification data, or determine the identification data of a target object by searching for orientation information.
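The bookkeeping performed by the control terminal can be sketched with a small in-memory registry. Everything named here is hypothetical (the `Report` record, the `ControlTerminal` class, the position labels such as "A-01", and the fixed capacity); the sketch only shows the two lookups the text describes, by identification data and by position, plus a vacancy count and a partial-plate search.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Report:
    """One message from an intelligent camera (hypothetical format)."""
    camera_id: str
    identification: str  # e.g. a license plate number
    position: str        # e.g. a parking space label


class ControlTerminal:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity                 # number of spaces in the set site
        self._by_id: Dict[str, Report] = {}

    def receive(self, report: Report) -> None:
        """Store the latest report for each piece of identification data."""
        self._by_id[report.identification] = report

    def locate(self, identification: str) -> Optional[str]:
        """Search by identification data to get the position."""
        report = self._by_id.get(identification)
        return report.position if report else None

    def identify_at(self, position: str) -> List[str]:
        """Search by position to get the identification data found there."""
        return sorted(r.identification for r in self._by_id.values() if r.position == position)

    def find_by_suffix(self, suffix: str) -> List[str]:
        """Match partial identification data, such as the last three plate characters."""
        return sorted(i for i in self._by_id if i.endswith(suffix))

    def free_spaces(self) -> int:
        occupied = {r.position for r in self._by_id.values()}
        return self.capacity - len(occupied)


if __name__ == "__main__":
    terminal = ControlTerminal(capacity=3)
    terminal.receive(Report("cam-1", "ABC123", "A-01"))
    terminal.receive(Report("cam-2", "XYZ123", "B-07"))
    print(terminal.locate("ABC123"), terminal.find_by_suffix("123"), terminal.free_spaces())
```

Keying the registry on identification data means a later report for the same vehicle simply overwrites its previous position, which matches the idea of tracking where each target object currently is.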

In some embodiments, the control terminal 440 can include a display device and/or a voice output device for outputting the position information, in the set site, of the target object determined by the control terminal 440 as described above.

In the embodiments provided by the disclosure, it should be noted that the disclosed devices and methods can be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into parts or modules is only a division by logical function, and other divisions are possible in actual implementation. For instance, multiple parts or modules can be combined or integrated into one system, or some features can be omitted or not executed.

In the disclosure, the term "and/or" may have been used. As used herein, "and/or" means one or the other or both (for example, A and/or B means A, or B, or both A and B).

In the above description, numerous details are set forth for purposes of explanation in order to provide a thorough understanding of the embodiments of the disclosure. However, it will be apparent to those skilled in the art that one or more other embodiments can be practiced without some of these details. The specific embodiments described are not intended to limit the disclosure but to illustrate it. The scope of the disclosure is determined not by the specific examples provided above but only by the claims below. In other cases, known circuits, structures, devices, and operations are shown in block diagram form rather than in detail so as not to obscure the understanding of the description. Where considered appropriate, reference numerals or their terminal portions are repeated among the drawings to indicate corresponding or similar elements, which optionally have similar or identical characteristics, unless otherwise specified or obvious.

Each functional unit, subunit, module, or submodule in the disclosure can be hardware; for example, the hardware can be a circuit, including digital circuits, analog circuits, and so on. Physical implementations of the hardware structure include but are not limited to physical devices, and physical devices include but are not limited to transistors, memristors, and so on. The computing module in the computing device can be any appropriate hardware processor, such as a CPU, GPU, FPGA, DSP, or ASIC. The storage unit can be any appropriate magnetic or magneto-optical storage medium, such as RRAM, DRAM, SRAM, EDRAM, HBM, or HMC.

It will be clear to those skilled in the art that, for convenience and brevity of description, only the division into the functional modules above is used as an example. In practical applications, the functions can be allocated to different functional modules as needed; that is, the internal structure of the device can be divided into different functional modules to perform all or part of the functions described above.

The specific embodiments described above further explain the purpose, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit it; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.

Claims

1. An intelligent camera, characterized by comprising:

An image capture device for shooting or receiving images and/or video containing a target object; and a processing unit, comprising:

A data preprocessing module that selects, from the captured images and/or video, images or video frames meeting a set condition;

An object extraction module for target detection that obtains a target image of the target object, or of at least part of the target object, in the image;

An identification module that recognizes the target image and identifies the identification data that distinguishes the target object.

2. The intelligent camera according to claim 1, characterized in that the processing unit further comprises:

An enhancement module that processes those images or video frames meeting the set condition whose resolution is a first resolution to obtain images or video frames of a second resolution, the second resolution being higher than the first resolution.

3. The intelligent video camera head according to claim 1, characterized in that the set condition comprises:

extracting, from the captured images and/or video, images or video frames whose difference exceeds a set threshold.

4. The intelligent video camera head according to claim 3, characterized in that extracting the video frames whose difference exceeds the set threshold comprises:

processing the video by means of an artificial neural network, specifically: taking the current video frame T; performing feature extraction on the video frame through convolution, and finally obtaining, through the output layer of the neural network, a score f_T, which serves as the score of the video frame and represents the features of the frame; comparing f_T with f_0, where f_0 is initialized to 0; if the difference is greater than the set threshold, using the frame as data for the subsequent modules and assigning f_T to f_0; then taking the next video frame T+1 and repeating the above until all video frames have been processed.
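
The frame selection loop of claim 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `select_key_frames` and `mean_intensity` are hypothetical, the "difference" is assumed to be the absolute difference of the two scores, and the scoring function is a simple stand-in for the convolutional feature extraction and output layer, which the claim does not specify further.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]  # stand-in for the pixel data of one video frame


def select_key_frames(
    frames: Sequence[Frame],
    score: Callable[[Frame], float],
    threshold: float,
) -> List[int]:
    """Return the indices of frames whose score differs from the reference
    score f_0 by more than `threshold` (absolute difference is an assumption)."""
    kept: List[int] = []
    f_0 = 0.0  # reference score, initialized to 0 as stated in the claim
    for t, frame in enumerate(frames):
        f_t = score(frame)  # score of the current frame T
        if abs(f_t - f_0) > threshold:
            kept.append(t)  # frame is passed on to the subsequent modules
            f_0 = f_t       # the kept frame's score becomes the new reference
    return kept


def mean_intensity(frame: Frame) -> float:
    """Hypothetical scorer standing in for the neural network output layer."""
    return sum(frame) / len(frame)


frames = [[0, 0], [10, 10], [11, 11], [30, 30]]
key_frames = select_key_frames(frames, mean_intensity, threshold=5.0)
```

With these sample frames, only the second and fourth frames differ enough from the last kept score to be selected; near-duplicate frames are skipped, which is the stated purpose of the set condition.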

5. The intelligent video camera head according to claim 1, characterized in that the object is a human, an animal, a plant, a natural object or an artificially manufactured object.

6. The intelligent video camera head according to claim 5, characterized in that the artificially manufactured object is an automobile, and the image of at least part of the object includes an automobile license plate.

7. The intelligent video camera head according to claim 1, characterized in that, in the object extraction module, performing target detection to obtain a target image of the object, or of at least part of the object, in the image comprises:

reading one of the images or video frames by means of an artificial neural network and obtaining the target image of the object or of at least part of the object.

8. The intelligent video camera head according to claim 7, characterized in that obtaining the target image of the object or of at least part of the object comprises:

by means of an artificial neural network, generating candidate regions using a selective search algorithm, in which the picture is divided into many small regions that are merged according to similarity by a hierarchical grouping method, to obtain candidate bounding boxes of the object or of at least part of the object; and applying a sliding window method to the candidate bounding boxes, in which a window sized according to the scale of the object or of at least part of the object slides over each bounding box, to obtain the target image region of the object or of at least part of the object.
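
The sliding window step of claim 8 can be illustrated with a short sketch. It covers only the window enumeration inside one candidate bounding box; the selective search and hierarchical grouping that produce the box are not shown. The function name `sliding_windows`, the `(x, y, width, height)` box layout and the `stride` parameter are assumptions made for illustration and do not appear in the claim.

```python
from typing import Iterator, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height), an assumed layout


def sliding_windows(
    box: Box, window: Tuple[int, int], stride: int = 1
) -> Iterator[Box]:
    """Yield every window of size `window` that fits fully inside `box`,
    moving left to right and top to bottom in steps of `stride` pixels."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    x0, y0, width, height = box
    win_w, win_h = window
    # The ranges are empty when the window is larger than the box,
    # in which case no window is produced.
    for y in range(y0, y0 + height - win_h + 1, stride):
        for x in range(x0, x0 + width - win_w + 1, stride):
            yield (x, y, win_w, win_h)


# A 4x2 candidate box scanned with a 2x2 window (e.g. a plate-sized window).
windows = list(sliding_windows((0, 0, 4, 2), window=(2, 2), stride=2))
```

Each yielded window would then be scored by the detection network, and the best-scoring window taken as the target image region; that scoring step is outside this sketch.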

9. The intelligent video camera head according to claim 1, characterized in that the mark data includes at least one of the following:

patterns, Chinese characters, letters, digits and symbols.

10. The intelligent video camera head according to claim 1, characterized in that, in the identification module, recognizing the target image and identifying the mark data used to distinguish the object comprises:

by means of an artificial neural network, locating and recognizing the mark data in the picture respectively: extracting all candidate boxes of the image; resizing each candidate box to fit the input size of the artificial neural network; and inputting the feature map obtained by a convolutional neural network into a classification network capable of recognizing the feature map, to finally obtain the mark data information sought in the original image.
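
The resizing of candidate boxes to the network input size in claim 10 can be sketched as below. The claim does not say which resizing method is used; nearest-neighbour sampling is assumed here purely for illustration, and the name `crop_and_resize`, the nested-list image representation and the `(x, y, width, height)` box layout are likewise hypothetical.

```python
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[int]]  # rows of grey-level pixels, an assumed format
Box = Tuple[int, int, int, int]  # (x, y, width, height)


def crop_and_resize(
    image: Image, box: Box, out_size: Tuple[int, int]
) -> List[List[int]]:
    """Crop `box` from `image` and rescale it to `out_size` (width, height)
    using nearest-neighbour sampling (assumed, not stated in the claim)."""
    x0, y0, width, height = box
    out_w, out_h = out_size
    if min(width, height, out_w, out_h) <= 0:
        raise ValueError("box and output dimensions must be positive")
    resized: List[List[int]] = []
    for j in range(out_h):
        src_y = y0 + (j * height) // out_h  # source row for output row j
        resized.append(
            [image[src_y][x0 + (i * width) // out_w] for i in range(out_w)]
        )
    return resized


image = [[1, 2], [3, 4]]
patch = crop_and_resize(image, box=(0, 0, 2, 2), out_size=(4, 4))
```

Every candidate box resized this way has the same shape, so all of them can be fed to a convolutional network with a fixed input size before classification.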

11. The intelligent video camera head according to claim 2, characterized in that the processing unit includes a neural network processor that integrates at least one of the data preprocessing module, the object extraction module, the data processing module and the enhancement module.

12. The intelligent video camera head according to claim 11, characterized in that the neural network processor comprises:

a storage unit for storing input data, neural network parameters and instructions;

a control unit for reading dedicated instructions from the storage unit, decoding them into arithmetic unit instructions and inputting them to the arithmetic unit;

an arithmetic unit for executing the corresponding neural network operation on the data according to the arithmetic unit instructions, to obtain output neurons.

13. The intelligent video camera head according to claim 12, characterized in that, in the arithmetic unit, executing the corresponding neural network operation comprises:

multiplying input neurons by weight data to obtain multiplication results;

executing an adder tree operation, in which the multiplication results are added stage by stage through an adder tree to obtain a weighted sum, and a bias is added to the weighted sum or the weighted sum is left unprocessed;

executing an activation function operation on the biased or unprocessed weighted sum to obtain output neurons.
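
The three operations of claim 13 (multiplication, stage-by-stage adder tree, optional bias followed by activation) can be modelled in software as follows. This is a sketch of the arithmetic only, not of the hardware: the function names are hypothetical, and ReLU is used as the activation function merely as an assumption, since the claim does not name one.

```python
from typing import Callable, List, Optional, Sequence


def adder_tree(values: Sequence[float]) -> float:
    """Sum `values` pairwise, one tree level at a time."""
    level: List[float] = list(values)
    if not level:
        return 0.0
    while len(level) > 1:
        # Add neighbouring pairs; an unpaired last element is carried upward.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


def relu(x: float) -> float:
    """Assumed activation function; the claim leaves the choice open."""
    return max(0.0, x)


def output_neuron(
    inputs: Sequence[float],
    weights: Sequence[float],
    bias: Optional[float] = None,
    activation: Callable[[float], float] = relu,
) -> float:
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    products = [x * w for x, w in zip(inputs, weights)]  # multiplication step
    weighted_sum = adder_tree(products)                  # adder tree step
    if bias is not None:                                 # bias is optional
        weighted_sum += bias
    return activation(weighted_sum)                      # activation step


result = output_neuron([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], bias=-30.0)
```

In hardware, the point of the tree arrangement is that the additions within one level are independent and can run in parallel, so n products are summed in roughly log2(n) stages rather than n - 1 sequential additions.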

14. The intelligent video camera head according to claim 12, characterized in that the processor further comprises:

a preprocessing unit for preprocessing the image and/or video data captured by the camera and converting it into a face recognition result, the face recognition result being data that conforms to the neural network input format;

and/or a direct memory access (DMA) unit, which makes the input data, neural network parameters and instructions stored in the storage unit available for the control unit and the arithmetic unit to call.

15. The intelligent video camera head according to claim 12, characterized in that the processor further comprises at least one of the following:

an instruction cache for caching instructions from the direct memory access DMA, for the control unit to call;

an input neuron cache for caching input neurons from the direct memory access DMA, for the arithmetic unit to call;

a weight cache for caching weights from the direct memory access DMA, for the arithmetic unit to call; and

an output neuron cache for storing the output neurons obtained from the arithmetic unit after the operation, for output to the direct memory access DMA.

16. The intelligent video camera head according to claim 15, characterized in that the instruction cache, input neuron cache, weight cache and output neuron cache are on-chip caches.

17. A control system, characterized by comprising:

at least one intelligent video camera head according to any one of claims 1-16, the number of intelligent video camera heads being configured so that their imaging covers a set site;

a control terminal that receives the mark data processed by each intelligent video camera head and determines the position, within the set site, of the object corresponding to the mark data.
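
One way the control terminal of claim 17 might map mark data to a position is sketched below. The claim does not state how the position is determined; this sketch assumes that each camera covers a known zone of the site and that the most recent report for a given mark wins. The class name `ControlTerminal`, its methods, the camera identifiers and the zone labels are all hypothetical.

```python
from typing import Dict, Optional


class ControlTerminal:
    """Toy model: each camera is assumed to cover one known zone of the site."""

    def __init__(self, camera_zones: Dict[str, str]) -> None:
        self._camera_zones = dict(camera_zones)  # camera id -> zone label
        self._last_seen: Dict[str, str] = {}     # mark data -> camera id

    def receive(self, camera_id: str, mark_data: str) -> None:
        """Record mark data (e.g. a license plate string) sent by a camera."""
        if camera_id not in self._camera_zones:
            raise KeyError(f"unknown camera: {camera_id}")
        self._last_seen[mark_data] = camera_id   # the latest report wins

    def locate(self, mark_data: str) -> Optional[str]:
        """Return the zone where the marked object was last seen, if any."""
        camera_id = self._last_seen.get(mark_data)
        if camera_id is None:
            return None
        return self._camera_zones[camera_id]


terminal = ControlTerminal({"cam-1": "Row A", "cam-2": "Row B"})
terminal.receive("cam-1", "ABC123")
terminal.receive("cam-2", "ABC123")  # the vehicle was later seen by cam-2
position = terminal.locate("ABC123")
```

In the parking lot scenario from the background section, the returned zone label is what a display or voice output device would present to a driver looking for a vehicle.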

18. The control system according to claim 17, characterized in that the control terminal comprises:

a display device and/or a voice output device for outputting the position information, within the set site, of the object determined by the control terminal.