CN112784987B

CN112784987B - Target nursing method and device based on multistage neural network cascade

Info

Publication number: CN112784987B
Application number: CN202110296284.3A
Authority: CN
Inventors: 陈辉; 熊章; 张智
Original assignee: Wuhan Xingxun Intelligent Technology Co ltd
Current assignee: Wuhan Xingxun Intelligent Technology Co ltd
Priority date: 2019-01-29
Filing date: 2019-01-29
Publication date: 2024-01-23
Anticipated expiration: 2039-01-29
Also published as: CN112784987A; CN109829542A; CN109829542B

Abstract

The invention discloses a target nursing method and device based on multistage neural network cascading. The method comprises the following steps: acquiring a video stream image acquired by a camera; selecting a logic combination relation, and determining cascading relations among all depth network models in the depth network module and corresponding output actions according to the logic combination relation; loading a corresponding depth network model; calling a multi-core dynamic allocation management instruction, calculating the complexity of the loaded depth network model, and allocating corresponding memory and a preset number of core processors for each depth network model according to the complexity; inputting the acquired video stream image into a corresponding depth network model; and analyzing scene information in the video stream image according to the appointed output information obtained after the processing of each cascaded depth network model, and executing corresponding output actions. The invention has low requirement on memory space, and various deep network models are flexibly combined to form a building block type development mode, so that the development efficiency and the interestingness of users are improved.

Description

Target nursing method and device based on multistage neural network cascade

The application is a divisional application of the invention patent application with the application number 201910088001.9, which is filed on the 1 st and 29 th 2019 and is named as a multi-core processor-based multi-element depth network model reconstruction method and device.

Technical Field

The invention relates to a data processing technology, in particular to a data processing under a deep network model by adopting a multi-core processor, and particularly relates to a target nursing method and device based on multistage neural network cascading.

Background

The existing deep network model can only process tasks such as detection, classification, segmentation and the like of a single scene. The complex scene involved will be disabled.

The existing methods for solving the complex scene analysis can generally adopt two modes: one is to use a GPU with higher cost to directly output the output information of multiple depth network models in series. However, the GPU is too high in cost and is not beneficial to large-scale use; the other is to directly perform time sequence analysis on the scene, and analyze the scene through the spatial position correlation and the time correlation between each action or each target in the scene. The training data set required by the method is special and troublesome to process, and the time sequence analysis is continuously carried out on the video, so that the required memory space is large, and the algorithm complexity is high.

Disclosure of Invention

The invention solves the technical problem of providing a multi-element depth network model reconstruction method and device based on a multi-core processor, which have low memory space requirement, low algorithm complexity, flexible combination and low cost.

To achieve the above object, in one aspect, the present invention provides a target nursing method based on multistage neural network cascade, for nursing old people and/or infants, comprising:

acquiring a video stream image acquired by a camera;

selecting a logic combination relation according to a nursing object, and determining cascade relation and corresponding output action among all depth network models in a depth network module according to the logic combination relation;

loading a corresponding depth network model according to the logic combination relation;

calling a multi-core dynamic allocation management instruction, calculating the complexity of the loaded depth network model, and allocating corresponding memories and a preset number of core processors for each depth network model according to the complexity;

inputting the acquired video stream image into a corresponding depth network model;

analyzing scene information in the video stream image according to the appointed output information obtained after the processing of each cascaded depth network model, and executing corresponding output actions;

the logical combination relation is as follows: and a valid identification area specified in the acquired video stream image.

Preferably, if the care object is an infant, the depth network model comprises a child depth detection model and a network tracking identification model, and the child depth detection model and the network tracking identification model are in secondary cascade;

if the nursing object is the old man, the depth network model comprises an old man depth detection model, a network tracking identification model and a depth network falling classification model;

the old man deep network detection model, the deep network tracking identification model and the deep network falling classification model are subjected to three-level cascade; the output of the depth network detection model of the aged is used as the input of the depth network tracking identification model, and the output of the depth network tracking identification model is used as the input of the depth network fall classification model.

Preferably, the output action is a voice prompt, wherein prompt content comprises that the child falls beyond a delimited area or the old falls.

Preferably, the allocating a corresponding memory and a predetermined number of core processors for each depth network model according to the complexity includes:

obtaining a parameter file and a weight file of each depth network model;

according to the parameter file and the weight file corresponding to each depth network model, respectively calculating the space complexity and the time complexity of each depth network model;

and allocating corresponding memory and a preset number of core processors for each depth network model according to the space complexity and the time complexity of each depth network model.

Preferably, the parameter file includes at least one of: the number of convolution layers, the number of depth convolution layers, and the convolution kernel size.

Preferably, the inputting the acquired video stream image into the corresponding depth network model includes:

obtaining the number of the detection models contained in the depth network model group;

dividing the video stream image into subcode streams corresponding to the number of the models;

and scaling each sub-code stream and then respectively sending the scaled sub-code streams into each detection model.

Preferably, the detection model is a network detection model trained based on mobiletv 2-SSD.

Preferably, the allocating a corresponding memory and a predetermined number of core processors to each depth network model according to the spatial complexity and the temporal complexity of each depth network model includes:

acquiring the computing capacity of each vector computing unit of the multi-core processor;

determining the number of vector computing units required by each depth network model according to the time complexity and the space complexity of each depth network model through each computing capability, and distributing the number of core processors to each depth network model;

wherein the number of core processors is the same as the number of vector computation units.

Preferably, the effective identification area is: the area located in the middle of the image is selected and the area is 1/2 of the whole image area.

In another aspect, the present invention also provides a target scene care device based on multistage neural network cascade, the device comprising:

the video stream image input module is used for acquiring video stream images acquired by the camera;

the logic combination module is used for selecting a logic combination relation according to a nursing object, and determining cascade relation and corresponding output action among all depth network models in the depth network module according to the logic combination relation;

the loading module is used for loading a corresponding depth network model according to the logic combination relation;

the multi-core dynamic allocation management module is used for calling a multi-core dynamic allocation management instruction, calculating the complexity of the loaded depth network model, and allocating corresponding memory and a preset number of core processors for each depth network model according to the complexity;

the depth network module is used for inputting the acquired video stream images into the corresponding depth network model;

the execution module is used for analyzing scene information in the video stream image according to the appointed output information obtained after the processing of each cascaded depth network model, and executing corresponding output actions;

According to the target nursing method and device based on multistage neural network cascading, the cascade relation among the depth network models is determined by adopting the multi-core processor according to the preset logic combination relation, then corresponding memory and core processors are allocated to each depth network model according to the complexity of the depth network models, comprehensive analysis is carried out on scenes in video stream images, the requirements on memory space are low, the algorithm complexity is low, the product cost is low, various depth network models (depth network learning algorithms) can be flexibly combined, a building block type development mode is formed, and the development efficiency and the interestingness of users are improved.

Drawings

FIG. 1 is a flow chart of a preferred embodiment of a multi-core processor-based multi-element depth network model reconstruction method of the present invention;

FIG. 2 is a video stream image classification block diagram of the multi-core processor-based multi-element depth network model reconstruction method of FIG. 1;

fig. 3 is a schematic diagram of task classification specified by a user in the multi-core processor-based multi-deep network model reconstruction method in fig. 1.

Fig. 4 is a schematic structural diagram of a multi-core dynamic allocation management module according to an embodiment of the present invention.

Fig. 5 is a schematic structural diagram of a multi-core processor-based multi-element deep network model reconstruction device according to a preferred embodiment of the present invention.

Detailed Description

The invention will be described in detail with reference to the drawings and examples. It should be noted that, if not in conflict, the embodiments of the present invention and the features of the embodiments may be combined with each other, which are all within the scope of protection of the present invention.

Example 1

Referring to fig. 1 to fig. 4, an embodiment of the present invention provides a multi-core processor-based multi-element depth network model reconstruction method, where the multi-core processor may be a multi-core neural network processing chip or other integrated chips with a plurality of core processors, and the multi-core processor includes a predetermined number of vector computing units, typically 12 or 16 vector computing units at present, or other numbers. The computing power and on-chip cache size of each vector computation unit may be set by itself. In the embodiment of the invention, a multi-core neural network processing chip is selected and connected with a CCD camera, a camera device based on visible light or infrared light (such as a home scene, a working scene, a conference scene and the like) can be obtained through the CCD camera externally connected with an infrared light supplementing lamp to shoot a preset area of the scene, so that a real-time image of the current scene is obtained, and a visible light video stream image is adopted as a test case in the embodiment. The multi-core processor-based multi-element depth network model reconstruction method mainly comprises the following steps:

s10, acquiring a video stream image acquired by a camera; here, the video stream image is a visible light video stream image, but it may be an infrared light video stream image.

S20, selecting a logic combination relation, and determining cascade relation and corresponding output action among all depth network models in the depth network module according to the logic combination relation; the logic combination relation is designed in advance, and which depth network models are determined to be cascaded according to the needs of users.

S30, loading a corresponding depth network model according to the logic combination relation;

s40, calling a multi-core dynamic allocation management instruction, calculating the complexity of the loaded depth network model, and allocating corresponding memory and a preset number of core processors for each depth network model according to the complexity; the multi-core dynamic allocation management instruction mainly comprises a memory management instruction and a multi-core allocation management instruction. The memory management instruction is mainly used for managing a plurality of memory blocks, such as a memory block 1, a memory block 2, a memory block 3 and a … memory block n; the multi-core allocation management instruction mainly manages allocation of a plurality of core processors, such as processor 1, processor 2, and processor 3 and …, and processor n, which are exemplified herein, and the number of the processors may be equal to or different from the number of the memory blocks.

S50, inputting the acquired video stream image into a corresponding depth network model;

s60, analyzing scene information in the video stream image according to the appointed output information obtained after the cascade deep network model processing, and executing corresponding output actions.

The multi-core processor-based multi-element depth network model reconstruction method provided by the embodiment of the invention analyzes the scene, has no special requirement on a training data set, is simple to process, can continuously analyze the time sequence of the acquired video stream image, has low requirement on the memory space, is simpler in algorithm, can accurately understand the semantics of the scene, and can timely execute corresponding output actions after analyzing the scene information. The invention can flexibly combine various deep learning algorithms (namely, deep learning models) to form a building block type development mode, thereby improving the development efficiency and the interestingness of users.

In a preferred embodiment, the video stream image includes a main stream video image and a plurality of sub stream video images, and the main stream video image and the plurality of sub stream video images are respectively input into respective corresponding depth network models. The resolution and frame rate of the main stream video image and the plurality of subcode stream video images are customized by the user as desired. As shown in fig. 2, the video stream image includes one main stream video image and a plurality of sub stream images, where the plurality of sub stream images are sub stream image 1, sub stream image 2, sub stream image 3 …, and sub stream image W, where W is an integer greater than 3.

In a preferred embodiment, the invoking the multi-core dynamic allocation management instruction calculates the complexity of the loaded depth network model, and allocating a corresponding memory and a predetermined number of core processors for each depth network model according to the complexity further includes:

and calling a multi-core dynamic allocation management instruction, calculating the time complexity and the space complexity of the corresponding depth network model according to the loaded parameter file of the depth network model, and determining the memory space and the number of core processors dynamically allocated to the corresponding depth network model according to the time complexity and the space complexity.

In a preferred embodiment, the spatial complexity is calculated by the formula:

the calculation formula of the time complexity is divided into:

(1) The time complexity formula of the single convolution layer is calculated as follows:

Time～O(M ² ·K ² ·C _in ·C _out )

(2) The time complexity formula for calculating the whole depth network model is as follows:

wherein M is the size of the output feature map, K is the size of the convolution kernel, C _in Is the number of input channels, C _out The number of output channels, D is the total convolution layer number of the depth network model; l is the first convolution layer of the depth network model, C _l The number of output channels of the first convolution layer of the depth network model is also the number of convolution kernels of the convolution layer; c (C) _l-1 Is the first said convolutionNumber of input channels of a layer. Wherein M and K are numbers greater than 0 and D is an integer greater than 0.

In a preferred embodiment, said determining the memory space and the number of core processors dynamically allocated to said corresponding deep network model according to said temporal complexity and said spatial complexity further comprises:

according to the N depth network models specified by the logic combination relation, calculating the spatial complexity of each depth network model as M _i (K) Allocating a corresponding memory space for each depth network model; the calculated time complexity of each depth network model is T _i (G) The method comprises the steps of carrying out a first treatment on the surface of the Setting the computing capability of each core processor in the multi-core processors as H (G), wherein G is a unit for measuring the complexity of computing time, and the total number of the core processors required by the depth network model is sum=T _i (G) H (G), wherein N is an integer greater than 0. Each deep network model represents a deep learning algorithm.

In a preferred embodiment, the logical combination relationship includes at least one of: a cascading relationship between the specified depth network models, a selected video stream image region, a specified task, and a specified depth network model. The selected video stream image area includes: a rectangular region, a circular region, or a polygonal region with a number of sides greater than 4 is selected in the video stream image.

In a preferred embodiment, the N depth network models include: a deep network detection model, a deep network classification model, a deep network semantic segmentation model, a deep network tracking recognition model and a deep network voice recognition model; the deep network detection model is mainly based on a target detection model of deep learning and detects targets trained by users in advance. The depth network classification model is a target classification model based on deep learning, extracts depth features of an image, classifies the image, and judges what scene or target the image belongs to. The deep network semantic segmentation module is a semantic segmentation model based on deep learning, and mainly segments objects with specific meanings. The depth network tracking model is a tracking model based on deep learning, and is used for extracting depth features of an image for tracking. The deep network voice recognition model is a voice recognition model based on deep learning, and semantic meaning is extracted based on deep learning recognition of user voice information.

The cascade relationship between the deep network models is as follows:

a single-layer cascade or multi-layer cascade between the depth network detection model and another depth network detection model, a single-layer cascade or multi-layer cascade between the depth network detection model and one depth network classification model, a single-layer cascade or multi-layer cascade between the depth network classification model and another depth network classification model, a single-layer cascade or multi-layer cascade between the depth network detection model and one depth tracking model, a single-layer cascade or multi-layer cascade between the depth network detection model and one depth network semantic segmentation model, a single-layer cascade or multi-layer cascade between the depth network tracking model and one depth network semantic segmentation model. The accurate output information can be obtained through cascading, and judgment errors are reduced.

In a preferred embodiment, the specified tasks include at least one of:

safety area detection and protection, timing start-up procedures, article care, stranger intrusion alarms, face whitelist and/or blacklist alerts, wonderful instant shooting (including so-called snap shots), elderly care and child care.

In a preferred embodiment, the multi-core processor employs a front-end embedded processing chip, comprising: at least one of a multi-core DSP, a multi-core CPU and a multi-core FPGA.

In a preferred embodiment, the corresponding output action includes at least one of voice prompts, automatic video recording, automatic photographing, and flashing.

In a preferred embodiment, when cascading into a plurality of depth network models, the output information of one depth network model is used as the input information of another depth network model, and new output information is generated after calculation processing, so as to analyze the deep meaning of the scene in the video stream image.

Application example 1

In the application embodiment of the present invention, the present invention will be described in detail taking a care scenario of fall detection for the elderly as an example.

S100, selecting a multi-core neural network processing chip, wherein the multi-core neural network processing chip comprises 12 vector computing units, the computing power of each vector computing unit is 10 (G), and the on-chip cache is 2M.

S200, shooting a scene where the old people are located by adopting a CCD camera, and acquiring video stream images of infrared light or visible light; in this example, a visible light video stream image is used as a test case.

The method comprises the steps that a multi-core neural network processing chip is used for carrying a CCD camera on the chip, a camera device based on visible light or infrared light is obtained through an external infrared light supplementing lamp, and a preset area of a home scene is shot, so that a real-time image of an old person in a current scene is obtained;

s300, selecting a logic combination relation (logic combination module), and determining a cascade relation and a corresponding output action between each depth network model in the depth network module according to the logic combination relation; the logical combination relationship here is mainly: the cascading relationship between the depth network models specified by the user, the area selected by the user, the depth network type specified by the user. The output action may be a voice prompt or sending a remote network alarm signal to the designated caretaker's hand mobile terminal. Meanwhile, the electronic equipment can automatically shoot, and the picture and the alarm signal are sent to the electronic equipment of the caretaker.

Specifically, the selected logical combination relationship is to preferentially select the 1/2 area in the middle of the collected video stream image, and take the area as an effective identification area, and of course, other areas in the video stream image can also be selected to realize the old people fall detection function, where the selected depth network model includes: the system comprises three depth network models, namely an old man depth network detection model, a depth network tracking identification model, a depth network fall classification model and the like. And performing three-level cascade connection on the old man deep network detection model, the deep network tracking and identifying model and the deep network falling classification model, wherein the output of the old man deep network detection model is used as the input of the deep network tracking and identifying model, and the output of the deep network tracking and identifying model is used as the input of the deep network falling classification model. The appointed output task is to judge that the old people fall down and then send out voice to alarm.

S400, loading a corresponding depth network model, namely loading an old man depth network detection model, a depth network tracking and identifying model, a depth network falling classification model and the like which are included in the depth network model.

S500, calling a multi-core dynamic allocation management instruction, calculating the complexity of the loaded depth network model, and allocating corresponding memory and a preset number of core processors for each depth network model according to the complexity.

Specifically, the parameter files and the weight files of the three depth network models of the old man depth network detection model, the depth network tracking and identifying model and the depth network falling and classifying model are imported into a memory through a multi-core dynamic allocation management model, and the space complexity and the time complexity of the corresponding depth network model are calculated: each depth network model is provided with a unique corresponding parameter file and a weight file, wherein the parameter file describes the calculation rule of each layer, and the weight file is obtained by data training.

In this embodiment, the deep neural network detection model of the elderly is a detection model (for example, mobiletv 2-SSD, but also other deep neural networks) based on deep neural network training, and the parameter file of the deep neural network detection model of the elderly is analyzed, where the parameter file mainly includes: the number of convolutional layers and the number of layers, the number of depth convolutional layers and the size of the convolutional kernel, etc.

The parameter file of the old man detection model has 78 convolution layers, the calculated amount is 16 (G) calculated by the previous time complexity formula, and the space complexity is 120K calculated by the space complexity calculation formula. The computing power of each vector computing unit of the multi-core processor is 10 (G), and the multi-core dynamic allocation management model calculates that 2 computing vector units need to be called. Thus the system will automatically allocate two core processors with a memory space of 128K. The input video to the geriatric detection model is subcode stream 1 scaled to 300 x 300 resolution.

The deep network tracking and identifying model in the embodiment of the invention mainly tracks and identifies the characteristics of the old in the scene. The embodiment of the invention adopts an ECO tracking model and also can adopt a C-COT tracking model. The main calculated amount in the ECO tracking model is divided into two parts, namely a relevant filtering calculation part; the other part is the calculation of depth features. The calculation amount of the relevant filtering part is lower than 1G, and the depth characteristic adopts a model extraction characteristic in a deep neural network (such as MobileNet V2). The deep neural network structure adopted by the invention is shown in the following table 1:

TABLE 1

The calculated amount is 0.585G, the space complexity is 25K, 1 calculation vector unit is called by the multi-core dynamic allocation management model, and the allocation memory space is 128K.

The deep network fall classification model of the old can also adopt a classification model based on MobilenetV2 training, the space complexity is 25K, and the calculated amount is 0.585G. Here the system will automatically allocate 1 core with a memory space of 128K.

S600, inputting the two paths of subcode streams separated by the video input model into the corresponding depth network model.

S700, automatically sending the video subcode streams to two paths of advanced age network detection models for detection respectively, and directly sending the detection results of the subcode stream 1 and the subcode stream 2 to the advanced network tracking and identifying model. And tracking the old people through the deep network tracking and identifying model, and then displaying the tracking result into the main code stream. After the old people are detected by the old people deep network detection model in the area where the old people are located in the scene, the old people are tracked by the deep network tracking and identifying model, at the moment, the output of the deep network tracking and identifying model is an image containing the old people, the output is used as an input image of the deep network falling classification model, the image is classified and judged by the deep network falling classification model, if the old people fall, the logic combination model loads alarm audio according to the output action corresponding to the preset task, and therefore a user (caretaker) can be reminded of the falling of the old people. Through tracking the old man in real time, when the old man falls in this region, send alarm signal, perhaps shoot old man and fall the picture and send on nurse electronic equipment, therefore great precaution risk, also help the user in time to know if the old man falls the injury even when busy with other things, alleviateed caretaker's pressure.

Application example 2

Taking care of an infant who is not yet able to walk upright as an example. The main steps of this application example 2 are the same as those of application example 1. Only for infant care, the deep network model mainly comprises the following components according to application example 2: the child depth network detection model and the depth network tracking recognition model are only required to be connected in a two-stage cascade mode, the designated task is a child care function, and the designated output task is that the child exceeds a delimited area to prompt voice.

When the child is detected by the child depth network detection model in the delimited area, the child depth network detection model output at the moment is a rectangular frame containing the child, the rectangular frame area is used as the input of the tracking network, the rectangular frame containing the child is always generated, the rectangular frame always tracks the child, when the child moves, the rectangular frame also moves along, when the child climbs out of the area appointed by the user, the logic combination model automatically loads alarm audio, and gives an alarm sound to prompt the user that the child climbs out of the appointed area.

The child detection model can also adopt a child depth network detection model based on MobilenetV2-SSD training, a parameter file of the child depth network detection model is analyzed, the parameter calculation amount of the child depth network detection model is 16 (G) through the calculation of the time complexity and the space complexity, the space complexity is 120K, and 2 calculation vector units are required to be called by the multi-core dynamic allocation management model to calculate the position. Here the system will automatically allocate 2 core processors with a memory space of 128K. And the input video into the model is subcode stream 2 scaled to 300 x 300 resolution.

When the deep network tracking recognition model tracks that the child exceeds a set effective area, the system automatically sends out a voice prompt.

Application example 3

Based on application embodiments 1 and 2, application embodiment 3 of the present invention provides a multi-core processor-based multi-element depth network model reconstruction method with both old people care and child care, which comprises the following steps:

selecting a multi-core neural network processing chip;

collecting video stream images of visible light or infrared light;

selecting corresponding logic combination relations, and determining cascade relations and output actions among all depth network models in the depth network module;

in a preferred embodiment of the invention, the logical combination selected is to select the area 1/2 of the entire image area in the middle of the image as the effective identification area.

Taking the example of implementing the fall detection of the old and the nursing function of the child at the same time, the selected depth network model comprises: four depth network models, namely an old man detection model, a child detection model, a depth tracking identification model, a fall classification model and the like. The old man detection model, the depth tracking recognition model and the falling classification model are subjected to three-level cascade connection, namely the old man detection model is connected with the depth tracking recognition model, the depth tracking recognition model is connected with the falling classification model, and the appointed output task is to judge that the old man falls and then report the alarm through voice. The child detection model and the depth tracking recognition model are in secondary cascade, the designated task is a child care function, and the designated output task is that the child exceeds a delimited area to prompt voice. The same depth tracking recognition model can be used, and when the detection area determines that the old and the child exist, for example, the old nurses the child or the children accompany the scene, the old and the child can be nursed by only one set of equipment.

The four depth network models are imported into the memory through the multi-core dynamic allocation management module, and two processing cores are allocated to each network model:

specifically, in this embodiment, the old man detection model is a detection model based on mobiletv 2-SSD training, and the input video fed into the model is subcode stream 1 scaled to 300×300 subcode stream 1, and the model is loaded into the memory of addresses 0-256K in the Movidius chip for calculation. Since the calculated amount of training mobiletv 2-SSD is 6G here, movidius has 12 vector calculation units in total, and the calculation force of each calculation unit is 10G, and in order to obtain better real-time capability, calculation vector units SHAVE1 and SHAVE2 are called here for calculation. The child detection model is also based on a detection model trained by the MobilenetV2-SSD, and the input video fed into the model is scaled to 300×300 subcode stream 2. And loading the model to the memory of the address 256-512K in the Movidius chip for calculation, and calling the calculation vector units SHAVE3 and SHAVE4 for calculation. In order to obtain a good tracking effect, four SHAVE are adopted for calculation. And calls the memory of address 512-640K in the Movidius chip to calculate. The old man fall model adopts a classification model based on MobilenetV2 training, and is loaded to the memory of addresses 640-758K in a Movidius sheet for calculation, the calculated amount is 1G, and a calculation vector unit SHAVE10 is called.

And automatically sending the video subcode streams to two paths of detection models for detection respectively through the two paths of subcode streams separated by the video stream image input module, and directly sending the detection results of the subcode stream 1 and the subcode stream 2 to the tracking model. And tracking old people and children through a tracking model, and displaying the tracking result into a main code stream. By tracking the old and the child in real time, when the child exceeds a set effective area, the system automatically sends out voice reminding; when the old falls in the effective area, an alarm is also sent out.

The invention is only exemplified by the nursing of the old and the nursing of the infants, and can be applied to various scene recognition, such as detection and protection of a safety area, timing starting of a program, nursing of important articles, alarm of stranger intrusion, reminding of a white list and/or a black list of a face, instant shooting of highlight, real-time dynamic shooting, pet shooting, smiling shooting and the like through reconstruction of a multi-element depth network model based on a multi-core processor. And one or more of the above can be combined, and a set of equipment is adopted to realize, for example, important article nursing and stranger intrusion alarm. The highlight instant shooting is combined with the real-time dynamic snapshot. Or a combination of a plurality.

Example 2

Referring to fig. 5, an embodiment of the present invention corresponds to the method for reconstructing a multi-core processor-based multi-depth network model set forth in the above embodiment 1 and application embodiments 1 to 3, and further provides a multi-core processor-based multi-depth network model reconstruction device, where the device includes:

the video stream image input module 10 is used for acquiring video stream images acquired by the camera; the method comprises the steps of carrying out a first treatment on the surface of the

The logic combination module 20 is configured to select a logic combination relationship, and determine a cascade relationship and a corresponding output action between each depth network model in the depth network module according to the logic combination relationship;

the loading module 30 is configured to load a corresponding deep network model according to the logical combination relationship;

the multi-core dynamic allocation management module 40 is configured to invoke a multi-core dynamic allocation management instruction, calculate the complexity of the loaded depth network model, and allocate a corresponding memory and a predetermined number of core processors for each depth network model according to the complexity;

the depth network processing module 50 is configured to input the acquired video stream image into a corresponding depth network model;

and the execution module 60 is configured to analyze scene information in the video stream image according to the specified output information obtained after the deep network model processing, and execute a corresponding output action.

According to the multi-core processor-based multi-element depth network model reconstruction device, the cascade relation among the depth network models is determined according to the logic combination relation by adopting the multi-core processor, then corresponding memory and the core processor are allocated to each depth network model according to the complexity of the depth network model, and comprehensive analysis is carried out on scenes in video stream images, so that the requirements on memory space are low, the algorithm complexity is low, the product cost is low, various depth network models (depth network learning algorithms) can be flexibly combined, a building block type development mode is formed, and the development efficiency and the interestingness of users are improved.

The multi-core dynamic allocation management module 40 is specifically configured to invoke a multi-core dynamic allocation management instruction, calculate a time complexity and a space complexity of a corresponding depth network model according to a loaded parameter file of the depth network model, and determine a memory space and the number of core processors dynamically allocated to the corresponding depth network model according to the time complexity and the space complexity.

The multi-core dynamic allocation management module 40 includes: the system comprises a space complexity calculating submodule and a time complexity calculating submodule, wherein the space complexity calculating submodule is used for calculating space complexity. The calculation formula of the space complexity is as follows:

the time complexity calculation submodule is used for calculating the time complexity. The calculation formula of the time complexity is divided into:

Time～O(M ² ·K ² ·C _in •C _out )

wherein M is the size of the output feature map, K is the size of the convolution kernel, C _in Is the number of input channels, C _out The number of output channels, D is the total convolution layer number of the depth network model; l is the first convolution layer of the depth network model, C _l The number of output channels of the first convolution layer of the depth network model is also the number of convolution kernels of the convolution layer; c (C) _l-1 Is the number of input channels of the first said convolution layer.

The multi-core dynamic allocation management module 40 further includes:

the memory space allocation submodule is used for calculating the space complexity of each depth network model to be M according to the N depth network models specified by the logic combination relation _i (K) Allocating a corresponding memory space for each depth network model;

a sub-module for allocating the number of the core processors, wherein the sub-module is used for calculating the time complexity of each depth network model to be T _i (G) The method comprises the steps of carrying out a first treatment on the surface of the Setting the computing capability of each core processor in the multi-core processors as H (G), wherein G is a unit for measuring the complexity of computing time, and the total number of the core processors required by the depth network model is sum=T _i (G) H (G), wherein N is an integer greater than 0.

The multi-core processor-based multi-element depth network model reconstruction method and device provided by the invention are described in detail, and specific examples are applied to illustrate the principle and implementation of the invention, and the description of the above examples is only used for helping to understand the method and core idea of the invention; also, as will occur to those of ordinary skill in the art, many modifications are possible in view of the teachings of the present invention, both in its specific embodiments and its application range. In summary, the disclosure is not limited to the embodiments of the invention, but is intended to cover all modifications of equivalent structures or equivalent processes, or direct or indirect application in other related arts, which are included in the scope of the invention, and should not be construed to limit the scope of the invention.

Claims

1. The target nursing method based on the multistage neural network cascade is characterized in that the nursing object is the old and/or the infant, and the method comprises the following steps:

acquiring a video stream image which is acquired by a camera and comprises a nursing object;

2. The target nursing method based on the multistage neural network cascade according to claim 1, wherein if the nursing object is an infant, the depth network model comprises a child depth detection model and a network tracking identification model, and the child depth detection model and the network tracking identification model are in secondary cascade;

3. The multi-level neural network cascade-based target care method of claim 2, wherein the output action is a voice prompt, and wherein prompt content includes child falling beyond a delimited area or elderly falling.

4. A method of target care based on multi-level neural network cascading according to any one of claims 2 to 3, wherein said allocating a respective memory and a predetermined number of core processors for each deep network model according to the complexity comprises:

obtaining a parameter file and a weight file of each depth network model;

5. The method of claim 4, wherein the parameter file comprises at least one of: the number of convolution layers, the number of depth convolution layers, and the convolution kernel size.

6. The method of claim 4, wherein inputting the captured video stream image into a corresponding depth network model comprises:

7. The multi-level neural network cascade-based target care method of claim 4, wherein the detection model is a network detection model based on mobiletv 2-SSD training.

8. The method of claim 4, wherein assigning each depth network model a respective memory and a predetermined number of core processors based on spatial complexity and temporal complexity of each depth network model comprises:

9. The multi-level neural network cascade-based target care method of claim 1, wherein the effective identification area is: the area located in the middle of the image is selected and the area is 1/2 of the whole image area.

10. Target nurse device based on multistage neural network cascade, its characterized in that, nurse the object and be old man and/or infant, include:

the video stream image input module is used for acquiring video stream images which are acquired by the camera and comprise nursing objects;