CN111770317A - Video monitoring method, device, equipment and medium for intelligent community - Google Patents


Info

Publication number
CN111770317A
Authority
CN
China
Prior art keywords: data, processing, target, model, video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010713809.4A
Other languages
Chinese (zh)
Other versions
CN111770317B (en)
Inventor
张超 (Zhang Chao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An International Smart City Technology Co Ltd
Original Assignee
Ping An International Smart City Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An International Smart City Technology Co Ltd filed Critical Ping An International Smart City Technology Co Ltd
Priority to CN202010713809.4A
Publication of CN111770317A
Application granted
Publication of CN111770317B
Active legal status
Anticipated expiration legal status


Classifications

    • H04N 7/18: Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 21/64: Protecting data integrity, e.g. using checksums, certificates or signatures
    • G06N 3/045: Combinations of networks
    • G06V 20/40: Scenes; Scene-specific elements in video content
    • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/20: Movements or behaviour, e.g. gesture recognition
    • H04N 19/44: Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • H04N 23/00: Cameras or camera modules comprising electronic image sensors; Control thereof

Abstract

The application relates to the field of artificial intelligence and provides a video monitoring method, device, equipment, and medium for a smart community. Video stream data collected in real time by cameras is stored on a blockchain to prevent tampering and improve security. A target model is called from a model library according to the processing type of a video processing request; the video stream data is retrieved from the blockchain and decoded by a pre-trained decoder to obtain target data; the target data is input into the target model for processing, and a processing result is output to determine response measures, which are reported to a designated server. Effective video monitoring is thereby achieved, and smart supervision of the community is realized in combination with artificial intelligence. The application also relates to blockchain technology, on which the video stream data may be stored. The method and device can be applied to smart community scenarios, thereby promoting the construction of smart cities.

Description

Video monitoring method, device, equipment and medium for intelligent community
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a video monitoring method, device, equipment and medium for an intelligent community.
Background
The smart community is a new concept of community management and a new mode of social-management innovation under new circumstances. It makes full use of the integrated application of new-generation information technologies such as the Internet of Things, cloud computing, and the mobile internet to provide community residents with a safe, comfortable, and convenient modern, intelligent living environment, forming a community with a new management form based on informatized, intelligent social management and services.
Video monitoring is an important component of the smart community, but traditional video monitoring remains a local, manually operated affair and cannot realize automatic management of the community, which greatly limits the development of smart communities.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a video monitoring method, apparatus, device and medium for smart communities, which can prevent video stream data from being tampered, improve security, and simultaneously implement smart monitoring of the communities by combining with artificial intelligence means.
A video monitoring method for a smart community comprises the following steps:
collecting video stream data in real time through at least one camera device, and storing the video stream data to a blockchain;
identifying a processing type from a video processing request in response to the video processing request;
calling a target model from a model library according to the processing type;
acquiring the video stream data from the blockchain, and decoding the video stream data by using a pre-trained decoder to obtain target data;
inputting the target data into the target model for processing, and outputting a processing result; and
determining response measures according to the processing result, and reporting the response measures to a designated server.
According to a preferred embodiment of the present invention, identifying the processing type from the video processing request comprises:
parsing the method body in the video processing request to obtain all information carried in the request;
acquiring a preset tag; and
acquiring the information corresponding to the preset tag from all the information as the processing type.
According to a preferred embodiment of the present invention, before the video stream data is decoded by the pre-trained decoder to obtain the target data, the method further comprises:
adjusting parameters of a decoder initial model by gradient descent;
acquiring input data and output data of the decoder;
calculating a mean square error of the output data relative to the input data; and
when the mean square error is smaller than or equal to a preset threshold, stopping adjusting the parameters to obtain the decoder.
According to a preferred embodiment of the present invention, when the processing type is a pet supervision type, inputting the target data into the target model for processing and outputting the processing result comprises:
performing binarization processing on the target data to obtain a binarized image;
filling the binarized image to obtain a filled image;
extracting a contour and the position of the contour from the filled image to obtain a target object;
performing facial recognition on the target object to obtain a pet type;
judging whether the target object is a large animal according to the pet type to obtain a judgment result;
acquiring motion data of the target object;
determining the motion state of the target object by using a motion state evaluation model according to the motion data; and
determining the judgment result and the motion state as the processing result, and outputting the processing result.
According to a preferred embodiment of the present invention, before the motion state of the target object is determined using the motion state evaluation model, the method further comprises:
acquiring motion data of positive samples and of a preset number of negative samples, and labeling the motion data of the positive samples with motion states so that the positive samples carry motion state labels;
randomly dividing the positive samples and the negative samples into a training set of a first preset proportion and a verification set of a second preset proportion;
training an initial model with the training set, and verifying the accuracy of the trained initial model with the verification set;
when the accuracy is greater than or equal to a preset accuracy, stopping training to obtain the motion state evaluation model; or
when the accuracy is smaller than the preset accuracy, increasing the numbers of positive and negative samples to retrain the motion state evaluation model.
According to a preferred embodiment of the present invention, when the processing type is a target user supervision type, inputting the target data into the target model for processing and outputting the processing result comprises:
performing face recognition on the target data to obtain a current user;
when the current user belongs to the target users, detecting posture data of the current user by using an improved posture detection model; and
inputting the posture data of the current user into a classifier for classification, and outputting the posture classification result of the current user as the posture detection result of the current user.
According to a preferred embodiment of the present invention, detecting the posture data of the current user by using the improved posture detection model comprises:
inputting the target data into the backbone neural network of a MobileNet model, and outputting first feature data;
inputting the first feature data into an initial stage and a refinement stage, and outputting a feature map;
inputting the feature map into a 1 × 1 convolutional layer, and outputting a keypoint heat map and part affinity fields as the posture data of the current user;
wherein the backbone, the initial stage, and the refinement stage employ a cascaded convolution consisting of a 1 × 1 convolutional layer, a first 3 × 3 convolutional layer, and a second 3 × 3 convolutional layer, the second 3 × 3 convolutional layer being a dilated ("cavity") convolution with a dilation rate of 2.
A video monitoring device for a smart community, the device comprising:
an acquisition unit, configured to acquire video stream data in real time through at least one camera device and store the video stream data to a blockchain;
an identifying unit, configured to identify a processing type from a video processing request in response to the video processing request;
a calling unit, configured to call a target model from a model library according to the processing type;
a decoding unit, configured to acquire the video stream data from the blockchain and decode the video stream data by using a pre-trained decoder to obtain target data;
a processing unit, configured to input the target data into the target model for processing and output a processing result; and
a determining unit, configured to determine response measures according to the processing result and report the response measures to a designated server.
An electronic device, the electronic device comprising:
a memory storing at least one instruction; and
a processor, which executes the instructions stored in the memory to implement the video monitoring method for the smart community.
A computer-readable storage medium having at least one instruction stored therein, the at least one instruction being executable by a processor in an electronic device to implement the video monitoring method for a smart community.
According to the technical scheme, the invention can collect video stream data in real time through at least one camera device and store the video stream data to a blockchain, preventing the video stream data from being tampered with and improving security. In response to a video processing request, a processing type is identified from the request, a target model is called from a model library according to the processing type, the video stream data is acquired from the blockchain and decoded by a pre-trained decoder to obtain target data, and the target data is input into the target model for processing to output a processing result. Response measures are then determined according to the processing result and reported to a designated server, realizing effective video monitoring and intelligent supervision of the community in combination with artificial intelligence.
Drawings
FIG. 1 is a flowchart of a video monitoring method for a smart community according to a preferred embodiment of the present invention.
FIG. 2 is a functional block diagram of a video monitoring device for a smart community according to a preferred embodiment of the present invention.
FIG. 3 is a schematic structural diagram of an electronic device implementing a video monitoring method for a smart community according to a preferred embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
FIG. 1 is a flowchart of the video monitoring method for a smart community according to a preferred embodiment of the present invention. The order of the steps in the flowchart may be changed, and some steps may be omitted, according to different needs.
The video monitoring method for the smart community is applied to one or more electronic devices. An electronic device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The electronic device may be any electronic product capable of human-computer interaction with a user, for example, a personal computer, a tablet computer, a smartphone, a Personal Digital Assistant (PDA), a game machine, an Internet Protocol Television (IPTV), an intelligent wearable device, and the like.
The electronic device may also include a network device and/or a user device. The network device includes, but is not limited to, a single network server, a server group consisting of a plurality of network servers, or a cloud computing (cloud computing) based cloud consisting of a large number of hosts or network servers.
The network where the electronic device is located includes, but is not limited to, the internet, a wide area network, a metropolitan area network, a local area network, a Virtual Private Network (VPN), and the like.
S10, acquiring video stream data in real time through at least one camera device, and storing the video stream data to a blockchain.
The camera device may include a plurality of cameras, such as a monitoring camera in a community.
The camera device can be configured as a static camera, or can be configured with a motion-sensing function, that is, images are collected only when a moving object appears in the picture, and still scenes are not recorded, so as to save energy and storage space.
In this embodiment, in order to prevent the video stream data from being tampered and further improve the security of the video stream data, the video stream data is saved to a blockchain.
S11, in response to the video processing request, identifying a processing type from the video processing request.
In this embodiment, the video processing request may be triggered by relevant personnel when needed (for example, when responsibility needs to be determined), or may be triggered at regular times, for example, configured to be triggered after dinner or in the early morning, which are peak times for outdoor activity, so as to improve community supervision.
In at least one embodiment of the invention, identifying the processing type from the video processing request comprises:
parsing the method body in the video processing request to obtain all information carried in the request;
acquiring a preset tag; and
acquiring the information corresponding to the preset tag from all the information as the processing type.
The preset tag is a predefined tag through which the processing type can be located. For example, the preset tag corresponding to the processing type may be configured as "type".
By parsing the method body in the video processing request, all information carried in the request can be quickly obtained, increasing parsing speed, and the processing type is accurately determined through the mapping between the preset tag and the processing type.
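The tag-based lookup above can be sketched in a few lines. The JSON request format, the key name "type", and the function name are illustrative assumptions, since the patent does not fix a concrete message format.

```python
import json

# Assumed for illustration: the method body of the video processing request
# is JSON, and the preset tag is the key "type". Neither is specified by
# the patent.
PRESET_TAG = "type"

def identify_processing_type(method_body: str, preset_tag: str = PRESET_TAG) -> str:
    """Parse the method body and return the information stored under the preset tag."""
    info = json.loads(method_body)   # all information carried in the request
    return info[preset_tag]          # the information corresponding to the preset tag

request_body = '{"type": "pet_supervision", "camera_id": "cam-03"}'
processing_type = identify_processing_type(request_body)
```

Parsing the whole body once and then indexing by the preset tag keeps the lookup a single dictionary access, which matches the mapping-based determination described above.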
S12, calling the target model from the model library according to the processing type.
Wherein, a plurality of different models are pre-stored in the model library and are used for realizing different data processing.
In the present embodiment, the process type has a correspondence relationship with the target model.
For example: when the target user is supervised, that is, the processing type is a target user supervision type, the corresponding target model may include, but is not limited to: a face recognition model, a pose detection model, etc.
S13, acquiring the video stream data from the blockchain, and decoding the video stream data by using a pre-trained decoder to obtain target data.
It can be understood that when there is a data processing requirement, the video stream data is acquired from the blockchain, which effectively ensures the reliability of the data.
In at least one embodiment of the present invention, before the video stream data is decoded by the pre-trained decoder to obtain the target data, the method further includes:
adjusting parameters of a decoder initial model by gradient descent;
acquiring input data and output data of the decoder;
calculating a mean square error of the output data relative to the input data; and
when the mean square error is smaller than or equal to a preset threshold, stopping adjusting the parameters to obtain the decoder.
The decoder initial model comprises an input layer, a hidden layer and an output layer.
Unlike the prior-art practice of using cross entropy or squared error as the reconstruction loss function of a decoder, the present method computes the loss as a mean square error during training and continuously adjusts the decoder by gradient descent to minimize the difference between the input picture and the output picture, making the decoder more accurate.
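A minimal sketch of this stopping criterion, with a single linear layer standing in for the decoder initial model; the dimensions, learning rate, and threshold are illustrative assumptions, not values from the patent.

```python
import numpy as np

# Sketch of the training loop described above: adjust decoder parameters by
# gradient descent and stop once the mean square error of the output
# relative to the input falls below a preset threshold. A single linear
# layer stands in for the real decoder; all values are illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))                 # stand-in input data
W = rng.normal(scale=0.1, size=(8, 8))       # decoder parameters

threshold, lr = 1e-3, 0.05
for step in range(5000):
    Y = X @ W                                # decoder output
    mse = np.mean((Y - X) ** 2)              # mean square error vs. input
    if mse <= threshold:                     # stop adjusting the parameters
        break
    grad = 2.0 * X.T @ (Y - X) / X.shape[0]  # gradient of the reconstruction loss
    W -= lr * grad                           # gradient descent update
```

In a real decoder with input, hidden, and output layers, the loop has the same shape: compute the MSE of the output against the input and stop updating once it drops below the preset threshold.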
S14, inputting the target data into the target model for processing, and outputting a processing result.
It will be appreciated that the processing results obtained will vary for different processing types.
In at least one embodiment of the present invention, when the processing type is a pet supervision type, inputting the target data into the target model for processing and outputting the processing result includes:
performing binarization processing on the target data to obtain a binarized image;
filling the binarized image to obtain a filled image;
extracting a contour and the position of the contour from the filled image to obtain a target object;
performing facial recognition on the target object to obtain a pet type;
judging whether the target object is a large animal according to the pet type to obtain a judgment result;
acquiring motion data of the target object;
determining the motion state of the target object by using a motion state evaluation model according to the motion data; and
determining the judgment result and the motion state as the processing result, and outputting the processing result.
Binarizing the target data yields the boundary of the monitored object, facilitating subsequent acquisition of the monitored object's position and contour.
For example, when the target data is a picture, the pixel value of each pixel in the picture is compared with a pixel threshold to implement binarization: if the pixel value of a pixel is greater than or equal to the threshold, the pixel value is set to 1; if it is smaller than the threshold, the pixel value is set to 0. Of course, in other embodiments, other binarization methods may also be used; the invention is not limited in this respect.
The binarized image may be filled using an image dilation method.
By filling the binarized image, broken parts in it can be connected, facilitating subsequent extraction of a clear contour.
Contours and their positions can be extracted from the filled image using an edge detection operator, which may include, but is not limited to, the Sobel operator, the Laplacian of Gaussian operator, and the like.
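The binarization and filling steps can be sketched without an image library as follows; in practice a toolkit such as OpenCV (cv2.threshold, cv2.dilate, cv2.findContours) would normally be used, and the pixel threshold here is an arbitrary example.

```python
import numpy as np

# NumPy-only sketch of the binarization and dilation-based filling steps;
# the 3x3 test image and the threshold of 128 are invented for illustration.
def binarize(img: np.ndarray, pixel_threshold: int) -> np.ndarray:
    """Pixels >= threshold become 1, others 0."""
    return (img >= pixel_threshold).astype(np.uint8)

def dilate(binary: np.ndarray) -> np.ndarray:
    """3x3 dilation: a pixel becomes 1 if any neighbour is 1, bridging small breaks."""
    padded = np.pad(binary, 1)
    out = np.zeros_like(binary)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out |= padded[1 + dy : 1 + dy + binary.shape[0],
                          1 + dx : 1 + dx + binary.shape[1]]
    return out

img = np.array([[10, 200, 10],
                [10,  10, 10],
                [10, 200, 10]], dtype=np.uint8)
b = binarize(img, 128)   # a contour broken into two pixels
filled = dilate(b)       # the gap at the centre is bridged
```

The dilation pass connects the one-pixel gap between the two bright pixels, which is exactly the "filling" that makes the subsequent contour extraction cleaner.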
The face recognition algorithm is not limited by the present invention; since face recognition is a relatively mature technology, it is not described here.
Through this embodiment, the system can respond in time when dangerous animals such as large dogs appear in the community, and can promptly judge the animals' motion state to determine risks such as whether they may hurt people, thereby improving the quality of pet supervision in the community.
Specifically, before the motion state of the target object is determined using the motion state evaluation model, the method further includes:
acquiring motion data of positive samples and of a preset number of negative samples, and labeling the motion data of the positive samples with motion states so that the positive samples carry motion state labels;
randomly dividing the positive samples and the negative samples into a training set of a first preset proportion and a verification set of a second preset proportion;
training an initial model with the training set, and verifying the accuracy of the trained initial model with the verification set;
when the accuracy is greater than or equal to a preset accuracy, stopping training to obtain the motion state evaluation model; or
when the accuracy is smaller than the preset accuracy, increasing the numbers of positive and negative samples to retrain the motion state evaluation model.
Wherein the motion state assessment model may include, but is not limited to: support Vector Machine (SVM) models.
The motion state may include, but is not limited to, any of the following: an agitated state, a normal state, and a quiet state.
Through the embodiment, the motion state evaluation model with high accuracy can be trained so as to accurately identify the motion state.
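The sample-splitting and accuracy-gated loop described above might look like the following sketch; a trivial speed-threshold rule stands in for the SVM-based motion state evaluation model, and all sample values, proportions, and the preset accuracy are invented for illustration.

```python
import random

# Illustrative sketch of the random division into training and verification
# sets and the accuracy check; nothing here is the patented model.
def split(samples, train_ratio=0.8, seed=0):
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)       # random division
    cut = int(len(shuffled) * train_ratio)      # first / second preset proportion
    return shuffled[:cut], shuffled[cut:]

# (speed, label) pairs: label 1 = agitated, 0 = quiet (values invented)
samples = [(s, 1) for s in (9, 11, 12, 14)] + [(s, 0) for s in (1, 2, 3, 4)]
train, val = split(samples)

# "Train" a decision boundary, then verify accuracy on the held-out set.
boundary = sum(s for s, _ in train) / len(train)
accuracy = sum((s > boundary) == bool(y) for s, y in val) / len(val)

preset_accuracy = 0.9
# Stop training if accurate enough; otherwise gather more samples and retrain.
model_ready = accuracy >= preset_accuracy
```

When `model_ready` is false, the loop in the patent text increases the numbers of positive and negative samples and retrains, repeating until the verification accuracy reaches the preset value.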
In at least one embodiment of the present invention, when the processing type is a target user supervision type, inputting the target data into the target model for processing and outputting the processing result includes:
performing face recognition on the target data to obtain a current user;
when the current user belongs to the target users, detecting posture data of the current user by using an improved posture detection model; and
inputting the posture data of the current user into a classifier for classification, and outputting the posture classification result of the current user as the posture detection result of the current user.
For example: and when the classification result of the user A is that the user A is in a falling state, determining that the gesture detection result of the user A is 'falling'.
The target users may include the elderly, people with dyskinesia, and the like.
Through the embodiment, the motion state of the target user can be determined in time, so that the target user can respond in time when dangers such as falling down occur.
Specifically, detecting the posture data of the current user by using the improved posture detection model includes:
inputting the target data into the backbone neural network of a MobileNet model, and outputting first feature data;
inputting the first feature data into an initial stage and a refinement stage, and outputting a feature map;
inputting the feature map into a 1 × 1 convolutional layer, and outputting a keypoint heat map and part affinity fields as the posture data of the current user;
wherein the backbone, the initial stage, and the refinement stage employ a cascaded convolution consisting of a 1 × 1 convolutional layer, a first 3 × 3 convolutional layer, and a second 3 × 3 convolutional layer, the second 3 × 3 convolutional layer being a dilated ("cavity") convolution with a dilation rate of 2.
In the prior art, a backbone with a VGG (Visual Geometry Group deep convolutional neural network) structure is usually adopted for feature extraction in posture detection. Such a network contains multiple 7 × 7 convolutions, an initial stage, and a refinement stage, so detection is slow. Fall detection, however, requires high real-time performance to ensure that a monitored target who falls receives a timely response and effective protective measures; posture detection in the prior art clearly cannot meet this speed requirement.
In this implementation, the MobileNet backbone replaces the VGG structure of the prior art to construct the improved posture detection model for posture detection of a moving target. The multiple initial stages of the prior art are reduced to a single initial stage, reducing the amount of computation to meet the detection task's demanding speed requirement, and the prior art's 7 × 7 convolutions are replaced by the cascaded convolution containing a dilated convolution. Introducing the dilated convolution effectively enlarges the receptive field and makes detection more accurate, so detection efficiency is further improved while the same or even higher accuracy is achieved, realizing real-time detection.
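The effect of the dilated convolution can be illustrated in one dimension: with a dilation rate of 2, a 3-tap kernel spans 5 input positions instead of 3, enlarging the receptive field with no extra parameters. This NumPy sketch is illustrative only and is not the patented network.

```python
import numpy as np

# 1-D valid convolution with an adjustable dilation rate; kernel taps are
# spaced `dilation` positions apart, so the input span grows while the
# parameter count (3 taps) stays fixed.
def conv1d(x, kernel, dilation=1):
    k = len(kernel)
    span = (k - 1) * dilation + 1            # input positions the kernel sees
    out = [sum(kernel[j] * x[i + j * dilation] for j in range(k))
           for i in range(len(x) - span + 1)]
    return np.array(out), span

x = np.arange(8.0)
kernel = [1.0, 1.0, 1.0]

_, span_plain = conv1d(x, kernel, dilation=1)    # ordinary 3-tap conv
_, span_dilated = conv1d(x, kernel, dilation=2)  # dilation rate 2
```

Here `span_plain` is 3 and `span_dilated` is 5: the dilated kernel covers a wider window per output, which is why stacking such layers enlarges the receptive field cheaply.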
S15, determining response measures according to the processing result, and reporting the response measures to a designated server.
In this embodiment, the correspondence between processing results and response measures may be configured in advance, so that the corresponding response measure can be matched directly from the processing result.
For example, when a dangerous dog is detected, community security is notified and asked to assist; or when an elderly person is detected to have fallen, the person's guardian is notified in time, and the staff member nearest the location of the fall is notified to assist.
The designated server may be a server of community security managers or other relevant personnel.
Through this embodiment, a timely response can be made when an emergency occurs, further strengthening community management.
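The preconfigured correspondence might be represented as a simple lookup table; the result keys, measure strings, and server URL below are placeholders, not values from the patent.

```python
# Hypothetical mapping from processing results to response measures;
# every key and value here is illustrative.
RESPONSE_MEASURES = {
    "dangerous_dog_detected": "notify community security for assistance",
    "elderly_fall_detected": "notify guardian and nearest staff member",
}

def determine_response(processing_result: str) -> str:
    # Directly match the corresponding response measure; unknown results
    # fall back to manual review.
    return RESPONSE_MEASURES.get(processing_result, "escalate for manual review")

def report(measure: str, server: str = "https://server.example/report") -> dict:
    # Payload that would be sent to the designated server (placeholder URL).
    return {"server": server, "measure": measure}

payload = report(determine_response("elderly_fall_detected"))
```

Keeping the mapping as configuration rather than code means new processing results and measures can be added without changing the matching logic.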
According to the technical scheme, the invention can collect video stream data in real time through at least one camera device and store the video stream data to a blockchain, preventing the video stream data from being tampered with and improving security. In response to a video processing request, a processing type is identified from the request, a target model is called from a model library according to the processing type, the video stream data is acquired from the blockchain and decoded by a pre-trained decoder to obtain target data, and the target data is input into the target model for processing to output a processing result. Response measures are then determined according to the processing result and reported to a designated server, realizing effective video monitoring and intelligent supervision of the community in combination with artificial intelligence.
FIG. 2 is a functional block diagram of a video monitoring device for a smart community according to a preferred embodiment of the present invention. The video monitoring device 11 for the smart community comprises an acquisition unit 110, an identifying unit 111, a calling unit 112, a decoding unit 113, a processing unit 114, and a determining unit 115. A module/unit referred to in the present invention is a series of computer program segments that can be executed by the processor 13 and perform a fixed function, and that are stored in the memory 12. The functions of the modules/units are described in detail in the following embodiments.
The acquisition unit 110 acquires video stream data in real time through at least one camera device and stores the video stream data to the blockchain.
The camera device may include a plurality of cameras, such as a monitoring camera in a community.
The camera device can be configured as a static camera, or can be configured with a motion-sensing function, that is, images are collected only when a moving object appears in the picture, and still scenes are not recorded, so as to save energy and storage space.
In this embodiment, in order to prevent the video stream data from being tampered with and to further improve its security, the video stream data is saved to a blockchain.
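As a minimal, purely illustrative sketch (the patent does not specify a blockchain implementation, and every name below is an assumption), the tamper-evidence property can be modeled by chaining the hash of each stored video segment to the hash of the previous block:

```python
import hashlib
import json
import time

def make_block(prev_hash: str, segment_bytes: bytes) -> dict:
    """Wrap one video segment in a block that commits to the previous block's hash."""
    block = {
        "timestamp": time.time(),
        "prev_hash": prev_hash,
        "segment_sha256": hashlib.sha256(segment_bytes).hexdigest(),
    }
    # hash the block's own contents so that later blocks can commit to it
    block["block_hash"] = hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()
    ).hexdigest()
    return block

def verify_chain(chain: list, segments: list) -> bool:
    """Any modified segment or broken linkage changes a hash and is detected."""
    for i, (block, seg) in enumerate(zip(chain, segments)):
        if block["segment_sha256"] != hashlib.sha256(seg).hexdigest():
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["block_hash"]:
            return False
    return True
```

A real blockchain additionally distributes these blocks across nodes under a consensus mechanism; this sketch only shows the hash chaining that makes tampering with stored video detectable.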
In response to the video processing request, the identifying unit 111 identifies a processing type from the video processing request.
In this embodiment, the video processing request may be triggered by relevant personnel when needed (e.g., when responsibility needs to be determined), or may be triggered at scheduled times, for example after dinner or in the morning, which are peak times for outdoor activity, so as to improve community supervision.
In at least one embodiment of the present invention, the identifying unit 111 identifies the processing type from the video processing request includes:
analyzing the method body in the video processing request to obtain all information carried in the video processing request;
acquiring a preset label;
and acquiring information corresponding to the preset label from all the information as the processing type.
The preset label is a predefined label through which the processing type can be located. For example, the preset tag corresponding to the processing type may be configured as "type".
By analyzing the method body in the video processing request, all information carried in the request can be obtained quickly, which increases parsing speed, and the processing type is determined accurately through the mapping relation between the preset label and the processing type.
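A minimal sketch of this lookup, assuming a JSON request body and the tag name "type" from the example above (the field names are illustrative, not the patent's format):

```python
import json

def identify_processing_type(request_body: str, preset_label: str = "type") -> str:
    """Parse the method body and return the value stored under the preset label."""
    info = json.loads(request_body)  # all information carried in the request
    if preset_label not in info:
        raise KeyError(f"request carries no '{preset_label}' field")
    return info[preset_label]
```

For example, `identify_processing_type('{"type": "pet_supervision", "camera": 3}')` returns `"pet_supervision"`.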
The calling unit 112 calls a target model from the model library according to the processing type.
Wherein, a plurality of different models are pre-stored in the model library and are used for realizing different data processing.
In the present embodiment, the process type has a correspondence relationship with the target model.
For example: when the target user is supervised, that is, the processing type is a target user supervision type, the corresponding target model may include, but is not limited to: a face recognition model, a pose detection model, etc.
The decoding unit 113 acquires the video stream data from the blockchain, and decodes the video stream data by using a pre-trained decoder to obtain target data.
It can be understood that when there is a data processing requirement, the video stream data is acquired from the blockchain, so that the reliability of the data can be effectively ensured.
In at least one embodiment of the present invention, the decoder is trained as follows before it is used to decode the video stream data into the target data: adjusting parameters of an initial decoder model by using a gradient descent method;
acquiring input data and output data of the decoder;
calculating a mean square error of the output data relative to the input data;
and when the mean square error is smaller than or equal to a preset threshold value, stopping adjusting the parameters to obtain the decoder.
The decoder initial model comprises an input layer, a hidden layer and an output layer.
Different from the prior art, which adopts cross entropy or squared error as the reconstruction loss function of a decoder, the present invention calculates the loss as a mean square error during training and continuously adjusts the decoder by gradient descent so as to minimize the difference between the input picture and the output picture, making the decoder more accurate.
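The training rule above (gradient descent on the mean square error of the output relative to the input, stopping at a preset threshold) can be sketched with a tiny linear autoencoder; the architecture and every hyperparameter here are illustrative assumptions, not the patent's actual decoder:

```python
import numpy as np

def train_decoder(X, hidden=2, lr=0.05, mse_threshold=0.01, max_iter=5000, seed=0):
    """Adjust the parameters of an initial (linear) decoder model by gradient
    descent, stopping once the mean square error of the output relative to
    the input falls to or below the preset threshold."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(scale=0.1, size=(d, hidden))   # input -> hidden layer
    W_dec = rng.normal(scale=0.1, size=(hidden, d))   # hidden -> output layer
    mse = float("inf")
    for _ in range(max_iter):
        H = X @ W_enc                  # encode
        X_hat = H @ W_dec              # decode (reconstruction)
        err = X_hat - X
        mse = float(np.mean(err ** 2))
        if mse <= mse_threshold:       # stop adjusting the parameters
            break
        # gradient descent step on the mean-square reconstruction error
        grad_dec = H.T @ err * (2.0 / (n * d))
        grad_enc = X.T @ (err @ W_dec.T) * (2.0 / (n * d))
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return W_enc, W_dec, mse
```

The single linear encode/decode pair corresponds to the input, hidden and output layers of the initial model described above, kept minimal for brevity.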
The processing unit 114 inputs the target data to the target model for processing, and outputs a processing result.
It will be appreciated that the processing results obtained will vary for different processing types.
In at least one embodiment of the present invention, when the processing type is a pet supervision type, the processing unit 114 inputs the target data into the target model for processing, and outputting the processing result includes:
carrying out binarization processing on the target data to obtain a binarized image;
filling the binary image to obtain a filled image;
extracting a contour and the position of the contour from the filled image to obtain a target object;
carrying out facial recognition on the target object to obtain a pet type;
judging whether the target object is a large animal or not according to the pet type to obtain a judgment result;
acquiring motion data of the target object;
determining the motion state of the target object by utilizing a motion state evaluation model according to the motion data;
and determining the judgment result and the motion state as the processing result, and outputting the processing result.
The boundary of the monitored object can be obtained by carrying out binarization on the target data, so that the position and the contour of the monitored object can be conveniently obtained subsequently.
For example, when the target data is a picture, the pixel value of each pixel in the picture is compared with a pixel threshold to implement binarization: if the pixel value of a pixel is greater than or equal to the pixel threshold, the pixel value is set to 1; if it is smaller than the pixel threshold, the pixel value is set to 0. Of course, in other embodiments, other binarization methods may also be used, and the present invention is not limited in this respect.
Wherein the binarized image may be padded using an image dilation method.
By filling the binary image, the fracture parts in the binary image can be connected, so that the clear outline can be conveniently extracted subsequently.
Wherein contours and their positions can be extracted from the filled image using an edge detection operator. The edge detection operator may include, but is not limited to: sobel operator, laplacian of gaussian operator, and the like.
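The binarize-fill-locate chain above can be sketched as follows. The threshold rule matches the example given earlier, the filling uses a simple 3 × 3 dilation, and a bounding box stands in for full contour extraction (an edge detection operator such as Sobel would normally be applied at that step):

```python
import numpy as np

def binarize(img: np.ndarray, thresh: int = 128) -> np.ndarray:
    """Pixels at or above the threshold become 1, the rest 0."""
    return (img >= thresh).astype(np.uint8)

def dilate(binary: np.ndarray, iterations: int = 1) -> np.ndarray:
    """3x3 dilation: connects break points so contours come out whole."""
    out = binary.copy()
    for _ in range(iterations):
        padded = np.pad(out, 1)
        shifts = [padded[i:i + out.shape[0], j:j + out.shape[1]]
                  for i in range(3) for j in range(3)]
        out = np.max(shifts, axis=0)
    return out

def bounding_box(filled: np.ndarray):
    """Position of the (single) object as a (top, left, bottom, right) box."""
    ys, xs = np.nonzero(filled)
    return int(ys.min()), int(xs.min()), int(ys.max()), int(xs.max())
```

On an object split by a one-pixel break, the dilation bridges the break so that a single connected region, and hence a clean contour position, is recovered.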
The algorithm of face recognition is not limited in the present invention, and is not described herein since face recognition belongs to a relatively mature technology.
Through this embodiment, the system can respond in time when dangerous animals such as large dogs appear in the community, and can promptly judge the motion state of the animal to determine whether there is a risk, such as the animal hurting people, thereby improving the quality of pet supervision in the community.
Specifically, before the motion state of the target object is determined by using a motion state evaluation model, motion data of positive samples and a preset number of negative samples are obtained, and motion state labeling is performed on the motion data of the positive samples, so that the positive samples carry motion state labels;
randomly dividing the positive sample and the negative sample into a training set with a first preset proportion and a verification set with a second preset proportion;
training an initial model by using the training set, and verifying the accuracy of the trained initial model by using the verification set;
when the accuracy is greater than or equal to a preset accuracy, stopping training to obtain the motion state evaluation model; or
And when the accuracy is smaller than the preset accuracy, increasing the number of positive samples and the number of negative samples to retrain the motion state evaluation model.
Wherein the motion state assessment model may include, but is not limited to: support Vector Machine (SVM) models.
The motion state may include, but is not limited to, any of the following: an agitated state, a normal state, and a quiet state.
Through the embodiment, the motion state evaluation model with high accuracy can be trained so as to accurately identify the motion state.
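A hedged sketch of the training procedure above using scikit-learn's SVM (the patent names SVM as one possible motion state evaluation model; the features, split ratio and accuracy threshold here are illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def train_motion_state_model(X, y, target_accuracy=0.9, train_ratio=0.8, seed=0):
    """Split labeled motion data into a training set and a verification set,
    train an SVM, and accept the model only if the verification accuracy
    reaches the preset accuracy."""
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, train_size=train_ratio, random_state=seed, stratify=y)
    model = SVC(kernel="rbf").fit(X_train, y_train)
    accuracy = model.score(X_val, y_val)
    return (model if accuracy >= target_accuracy else None), accuracy
```

When the returned model is `None`, the caller would increase the number of positive and negative samples and retrain, as described above.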
In at least one embodiment of the present invention, when the processing type is a target user supervision type, the processing unit 114 inputs the target data into the target model for processing, and outputting a processing result includes:
carrying out face recognition on the target data to obtain a current user;
when the current user belongs to the target user, detecting the posture data of the current user by adopting an improved posture detection model;
inputting the posture data of the current user into a classifier for classification, and outputting a posture classification result of the current user as a posture detection result of the current user.
For example, when the classification result of user A indicates that user A is in a falling state, the posture detection result of user A is determined to be "falling".
The target users may include the elderly, people with dyskinesia, and the like.
Through the embodiment, the motion state of the target user can be determined in time, so that the target user can respond in time when dangers such as falling down occur.
Specifically, the detection of the posture data of the current user by the processing unit 114 using the improved posture detection model includes:
inputting the target data into the backbone of the mobile network model MobileNet, and outputting first feature data;
inputting the first feature data into an initial stage and a refinement stage, and outputting a feature map;
inputting the feature map into a 1 × 1 convolutional layer, and outputting a keypoint heat map and part affinity fields as the posture data of the current user;
wherein the backbone, the initial stage and the refinement stage employ a cascaded convolution consisting of a 1 × 1 convolutional layer, a first 3 × 3 convolutional layer and a second 3 × 3 convolutional layer, and the second 3 × 3 convolutional layer is a dilated convolution with a dilation rate of 2.
In the prior art, a backbone with a VGG (Visual Geometry Group deep convolutional neural network) structure is usually adopted for feature extraction in posture detection. Such a network includes a plurality of 7 × 7 convolutions, an initial stage and a refinement stage, so the detection speed is slow. Fall detection, however, requires high real-time performance to ensure that a monitored target receives a timely response when falling, so that effective safety protection measures can be taken. Obviously, the posture detection of the prior art cannot meet this speed requirement.
Through this implementation, the backbone of MobileNet replaces the VGG structure adopted in the prior art, constructing an improved posture detection model that realizes posture detection on a moving target. Meanwhile, the multiple initial stages of the prior art are reduced to a single initial stage, which reduces the amount of computation and meets the high speed requirement of the current detection task. Furthermore, the 7 × 7 convolutions of the prior art are replaced by the cascaded convolution containing a dilated convolution; introducing the dilated convolution effectively enlarges the receptive field and makes detection more accurate. Detection efficiency is therefore further improved while the same or even higher accuracy is achieved, and real-time detection is realized.
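The cascaded convolution described above can be sketched in PyTorch (the module name and channel counts are illustrative assumptions). The dilated 3 × 3 layer covers a 5 × 5 area, so the cascade reaches the 7 × 7 receptive field of the convolution it replaces with far fewer weights:

```python
import torch
import torch.nn as nn

class CascadeConv(nn.Module):
    """Cascaded convolution: 1x1 -> 3x3 -> 3x3 dilated (dilation rate 2).

    Receptive field: 1 (1x1) + 2 (3x3) + 4 (3x3 with dilation 2) = 7,
    matching the 7x7 convolution it replaces.
    """
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            # dilated ("void") convolution: kernel 3, dilation 2, padding 2
            nn.Conv2d(out_channels, out_channels, kernel_size=3,
                      padding=2, dilation=2),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)
```

The padding values are chosen so that the spatial size of the feature map is preserved through the cascade.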
The determining unit 115 determines a response measure according to the processing result, and reports the response measure to the designated server.
In this embodiment, the corresponding relationship between the processing result and the response measure may be configured in advance, so as to directly match the corresponding response measure according to the processing result.
For example, when a dangerous dog is detected, community security is notified and asked to assist in handling the situation; or when an elderly person is detected to have fallen, the person's guardian is notified in time, and the staff member nearest to the location of the fall is also notified to assist.
The designated server may be a server of a community security manager and other related personnel.
Through the embodiment, timely response can be realized when an emergency occurs, and further community management is enhanced.
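The pre-configured correspondence between processing results and response measures can be sketched as a simple lookup table; every processing type, result string, measure and URL below is illustrative, not part of the patent:

```python
# illustrative mapping from (processing type, processing result) to a measure
RESPONSE_MEASURES = {
    ("pet_supervision", "large_animal_agitated"):
        "notify community security to assist on site",
    ("target_user_supervision", "falling"):
        "notify the guardian and the nearest staff member",
}

def determine_response(processing_type: str, processing_result: str) -> str:
    """Look up the pre-configured response measure for a processing result."""
    return RESPONSE_MEASURES.get(
        (processing_type, processing_result),
        "forward to operator for manual review",
    )

def build_report(measure: str, server_url: str) -> dict:
    """Payload that would be sent to the designated server (URL is hypothetical)."""
    return {"measure": measure, "server": server_url}
```

Results without a configured measure fall back to manual review rather than being dropped silently.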
According to the technical scheme, the invention can collect video stream data in real time through at least one camera device and store the video stream data to a blockchain, which prevents the video stream data from being tampered with and improves security. In response to a video processing request, a processing type is identified from the request, a target model is called from a model library according to the processing type, the video stream data is acquired from the blockchain, and the video stream data is decoded by a pre-trained decoder to obtain target data. The target data is input into the target model for processing, and a processing result is output. A response measure is then determined according to the processing result and reported to a designated server, thereby realizing effective video monitoring and intelligent supervision of the community by means of artificial intelligence.
Fig. 3 is a schematic structural diagram of an electronic device for implementing a video monitoring method for smart communities according to a preferred embodiment of the present invention.
The electronic device 1 may comprise a memory 12, a processor 13 and a bus, and may further comprise a computer program, such as a video surveillance program for a smart community, stored in the memory 12 and executable on the processor 13.
It will be understood by those skilled in the art that the schematic diagram is merely an example of the electronic device 1 and does not constitute a limitation of the electronic device 1. The electronic device 1 may have a bus-type or star-type structure, may include more or fewer hardware or software components than shown, or a different arrangement of components; for example, the electronic device 1 may further include an input/output device, a network access device, and the like.
It should be noted that the electronic device 1 is only an example, and other existing or future electronic products that can be adapted to the present invention should also be included in the scope of protection of the present invention and are incorporated herein by reference.
The memory 12 includes at least one type of readable storage medium, which includes flash memory, removable hard disks, multimedia cards, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disks, optical disks, etc. The memory 12 may in some embodiments be an internal storage unit of the electronic device 1, for example a removable hard disk of the electronic device 1. The memory 12 may also be an external storage device of the electronic device 1 in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the electronic device 1. Further, the memory 12 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 12 may be used not only to store application software installed in the electronic device 1 and various types of data, such as codes of video surveillance programs for smart communities, etc., but also to temporarily store data that has been output or is to be output.
The processor 13 may be composed of an integrated circuit in some embodiments, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips. The processor 13 is a Control Unit (Control Unit) of the electronic device 1, connects various components of the electronic device 1 by using various interfaces and lines, and executes various functions and processes data of the electronic device 1 by running or executing programs or modules (for example, executing a video monitoring program for a smart community, etc.) stored in the memory 12 and calling data stored in the memory 12.
The processor 13 executes an operating system of the electronic device 1 and various installed application programs. The processor 13 executes the application program to implement the steps of the above-mentioned video monitoring method embodiments for intelligent communities, such as the steps shown in fig. 1.
Illustratively, the computer program may be divided into one or more modules/units, which are stored in the memory 12 and executed by the processor 13 to accomplish the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program in the electronic device 1. For example, the computer program may be divided into an acquisition unit 110, a recognition unit 111, a retrieval unit 112, a decoding unit 113, a processing unit 114, a determination unit 115.
Alternatively, the processor 13, when executing the computer program, implements the functions of the modules/units in the above device embodiments, for example:
collecting video stream data in real time through at least one camera device, and storing the video stream data to a block chain;
identifying a processing type from a video processing request in response to the video processing request;
calling a target model from a model library according to the processing type;
acquiring the video stream data from the block chain, and decoding the video stream data by using a pre-trained decoder to obtain target data;
inputting the target data into the target model for processing, and outputting a processing result;
and determining response measures according to the processing result, and reporting the response measures to a specified server.
An integrated unit implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute parts of the video monitoring method for the intelligent community according to the embodiments of the present invention.
The integrated modules/units of the electronic device 1 may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented.
Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying said computer program code, recording medium, U-disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM).
Further, the computer-usable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks associated with one another by cryptographic methods, each data block containing information on a batch of network transactions, which is used to verify the validity (anti-counterfeiting) of the information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one arrow is shown in FIG. 3, but this does not indicate only one bus or one type of bus. The bus is arranged to enable connection communication between the memory 12 and at least one processor 13 or the like.
Although not shown, the electronic device 1 may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 13 through a power management device, so as to implement functions of charge management, discharge management, power consumption management, and the like through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device 1 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
Further, the electronic device 1 may further include a network interface, and optionally, the network interface may include a wired interface and/or a wireless interface (such as a WI-FI interface, a bluetooth interface, etc.), which are generally used for establishing a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further comprise a user interface, which may be a Display (Display), an input unit (such as a Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (organic light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the electronic device 1 and for displaying a visualized user interface, among other things.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
Fig. 3 only shows the electronic device 1 with components 12-13, and it will be understood by a person skilled in the art that the structure shown in fig. 3 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than shown, or a combination of certain components, or a different arrangement of components.
With reference to fig. 1, the memory 12 in the electronic device 1 stores a plurality of instructions to implement a video monitoring method for smart communities, and the processor 13 can execute the plurality of instructions to implement:
collecting video stream data in real time through at least one camera device, and storing the video stream data to a block chain;
identifying a processing type from a video processing request in response to the video processing request;
calling a target model from a model library according to the processing type;
acquiring the video stream data from the block chain, and decoding the video stream data by using a pre-trained decoder to obtain target data;
inputting the target data into the target model for processing, and outputting a processing result;
and determining response measures according to the processing result, and reporting the response measures to a specified server.
Specifically, the processor 13 may refer to the description of the relevant steps in the embodiment corresponding to fig. 1 for a specific implementation method of the instruction, which is not described herein again.
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means through software or hardware. Terms such as first and second are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A video monitoring method for a smart community is characterized by comprising the following steps:
collecting video stream data in real time through at least one camera device, and storing the video stream data to a block chain;
identifying a processing type from a video processing request in response to the video processing request;
calling a target model from a model library according to the processing type;
acquiring the video stream data from the block chain, and decoding the video stream data by using a pre-trained decoder to obtain target data;
inputting the target data into the target model for processing, and outputting a processing result;
and determining response measures according to the processing result, and reporting the response measures to a specified server.
2. The method of claim 1, wherein the identifying a processing type from the video processing request comprises:
analyzing the method body in the video processing request to obtain all information carried in the video processing request;
acquiring a preset label;
and acquiring information corresponding to the preset label from all the information as the processing type.
3. The method as claimed in claim 1, wherein before decoding the video stream data with a pre-trained decoder to obtain the target data, the method further comprises:
adjusting parameters of an initial model of a decoder by adopting a gradient descent method;
acquiring input data and output data of the decoder;
calculating a mean square error of the output data relative to the input data;
and when the mean square error is smaller than or equal to a preset threshold value, stopping adjusting the parameters to obtain the decoder.
4. The method as claimed in claim 1, wherein when the processing type is a pet supervision type, the inputting the target data into the target model for processing, and the outputting the processing result comprises:
carrying out binarization processing on the target data to obtain a binarized image;
filling the binary image to obtain a filled image;
extracting a contour and the position of the contour from the filled image to obtain a target object;
carrying out facial recognition on the target object to obtain a pet type;
judging whether the target object is a large animal or not according to the pet type to obtain a judgment result;
acquiring motion data of the target object;
determining the motion state of the target object by utilizing a motion state evaluation model according to the motion data;
and determining the judgment result and the motion state as the processing result, and outputting the processing result.
5. The video surveillance method for intelligent communities as claimed in claim 4, wherein before determining the motion state of the target object using the motion state evaluation model, the method further comprises:
acquiring motion data of positive samples and a preset number of negative samples, and carrying out motion state labeling on the motion data of the positive samples so as to enable the positive samples to carry motion state labels;
randomly dividing the positive sample and the negative sample into a training set with a first preset proportion and a verification set with a second preset proportion;
training an initial model by using the training set, and verifying the accuracy of the trained initial model by using the verification set;
when the accuracy is greater than or equal to a preset accuracy, stopping training to obtain the motion state evaluation model; or
And when the accuracy is smaller than the preset accuracy, increasing the number of positive samples and the number of negative samples to retrain the motion state evaluation model.
6. The method as claimed in claim 1, wherein when the processing type is a target user supervision type, the inputting the target data into the target model for processing, and outputting the processing result comprises:
carrying out face recognition on the target data to obtain a current user;
when the current user belongs to the target user, detecting the attitude data of the current user by adopting an improved attitude detection model;
inputting the posture data of the current user into a classifier for classification, and outputting a posture classification result of the current user as a posture detection result of the current user.
7. The method for video surveillance of a smart community as claimed in claim 6, wherein said detecting the pose data of the current user using the improved pose detection model comprises:
inputting the target data into a neural network model of a mobile network model, and outputting first feature data;
inputting the first feature data into an initial stage and a refinement stage, and outputting a feature map;
inputting the feature map into a 1 × 1 convolutional layer, and outputting a keypoint heat map and part affinity fields as the posture data of the current user;
wherein the neural network model, the initial stage, and the refinement stage employ a cascaded convolution consisting of a 1 × 1 convolutional layer, a first 3 × 3 convolutional layer, and a second 3 × 3 convolutional layer, and the second 3 × 3 convolutional layer is a dilated convolution with a dilation rate of 2.
8. A video monitoring device for a smart community, characterized in that the video monitoring device for a smart community comprises:
an acquisition unit configured to acquire video stream data in real time through at least one camera device and store the video stream data in a blockchain;
an identification unit configured to, in response to a video processing request, identify a processing type from the video processing request;
a calling unit configured to call a target model from a model library according to the processing type;
a decoding unit configured to acquire the video stream data from the blockchain and decode the video stream data with a pre-trained decoder to obtain target data;
a processing unit configured to input the target data into the target model for processing and output a processing result; and
a determination unit configured to determine a response measure according to the processing result and report the response measure to a specified server.
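The identify → call → process flow of the units above can be sketched as a dispatch table keyed by processing type. The type names and model stubs here are hypothetical, chosen only to mirror the claimed flow.

```python
# Hypothetical model library: processing type -> target model (stub callables).
MODEL_LIBRARY = {
    "target_user_supervision": lambda data: f"pose result for {data}",
    "motion_state_evaluation": lambda data: f"motion result for {data}",
}

def handle_request(request, target_data):
    processing_type = request["processing_type"]    # identify the processing type
    target_model = MODEL_LIBRARY[processing_type]   # call the target model from the library
    return target_model(target_data)                # process target data, output result

result = handle_request({"processing_type": "motion_state_evaluation"}, "frame-42")
```

Keeping the library as a mapping means new processing types can be registered without touching the request-handling path.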
9. An electronic device, characterized in that the electronic device comprises:
a memory storing at least one instruction; and
a processor executing the instructions stored in the memory to implement the video monitoring method for a smart community according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores at least one instruction, and the at least one instruction is executed by a processor in an electronic device to implement the video monitoring method for a smart community according to any one of claims 1 to 7.
CN202010713809.4A 2020-07-22 2020-07-22 Video monitoring method, device, equipment and medium for intelligent community Active CN111770317B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010713809.4A CN111770317B (en) 2020-07-22 2020-07-22 Video monitoring method, device, equipment and medium for intelligent community

Publications (2)

Publication Number Publication Date
CN111770317A true CN111770317A (en) 2020-10-13
CN111770317B CN111770317B (en) 2023-02-03

Family

ID=72727407

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010713809.4A Active CN111770317B (en) 2020-07-22 2020-07-22 Video monitoring method, device, equipment and medium for intelligent community

Country Status (1)

Country Link
CN (1) CN111770317B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112561891A (en) * 2020-12-18 2021-03-26 深圳赛安特技术服务有限公司 Image quality detection method, device, equipment and storage medium
CN112766697A (en) * 2021-01-13 2021-05-07 上海源庐加佳信息科技有限公司 Community liveness index acquisition method, system, medium and terminal based on block chain system
CN113673495A (en) * 2021-10-25 2021-11-19 北京通建泰利特智能系统工程技术有限公司 Intelligent security method and system based on neural network and readable storage medium
US20220174076A1 (en) * 2020-11-30 2022-06-02 Microsoft Technology Licensing, Llc Methods and systems for recognizing video stream hijacking on edge devices
CN115880113A (en) * 2022-11-24 2023-03-31 浙江省通信产业服务有限公司 Digital management system and management method based on comprehensive treatment of smart villages

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102665071A (en) * 2012-05-14 2012-09-12 安徽三联交通应用技术股份有限公司 Intelligent processing and search method for social security video monitoring images
CN107516090A (en) * 2017-09-11 2017-12-26 北京百度网讯科技有限公司 Integrated face identification method and system
CN108494090A (en) * 2018-04-16 2018-09-04 华东师范大学 A kind of energy net connection intelligent apparatus and system based on block chain
CN109447156A (en) * 2018-10-30 2019-03-08 北京字节跳动网络技术有限公司 Method and apparatus for generating model
CN109598885A (en) * 2018-12-21 2019-04-09 广东中安金狮科创有限公司 Monitoring system and its alarm method
WO2019126908A1 (en) * 2017-12-25 2019-07-04 深圳市大疆创新科技有限公司 Image data processing method, device and equipment
CN110210243A (en) * 2019-04-30 2019-09-06 江苏亿鸿信息工程有限公司 A kind of monitor video evidence-obtaining system and its evidence collecting method based on block chain
CN110223474A (en) * 2019-07-04 2019-09-10 无锡睿勤科技有限公司 A kind of intelligent monitoring and alarming method, system and storage medium
CN110335437A (en) * 2019-05-05 2019-10-15 广东白云学院 A kind of wisdom house security system based on block chain
CN110633643A (en) * 2019-08-15 2019-12-31 青岛文达通科技股份有限公司 Abnormal behavior detection method and system for smart community
CN111125784A (en) * 2019-12-24 2020-05-08 山东爱城市网信息技术有限公司 Artificial intelligence training model method, device and medium based on block chain
CN111191507A (en) * 2019-11-26 2020-05-22 恒大智慧科技有限公司 Safety early warning analysis method and system for smart community


Similar Documents

Publication Publication Date Title
CN111770317B (en) Video monitoring method, device, equipment and medium for intelligent community
US8675917B2 (en) Abandoned object recognition using pedestrian detection
CN112100425B (en) Label labeling method and device based on artificial intelligence, electronic equipment and medium
CN114565882B (en) Abnormal behavior analysis method and device based on intelligent linkage of multiple video cameras
CN111949708B (en) Multi-task prediction method, device, equipment and medium based on time sequence feature extraction
CN112052850A (en) License plate recognition method and device, electronic equipment and storage medium
CN111814646B (en) AI vision-based monitoring method, device, equipment and medium
CN111860377A (en) Live broadcast method and device based on artificial intelligence, electronic equipment and storage medium
CN111950621A (en) Target data detection method, device, equipment and medium based on artificial intelligence
CN111797756A (en) Video analysis method, device and medium based on artificial intelligence
CN111931729B (en) Pedestrian detection method, device, equipment and medium based on artificial intelligence
CN114663390A (en) Intelligent anti-pinch method, device, equipment and storage medium for automatic door
CN114677650B (en) Intelligent analysis method and device for pedestrian illegal behaviors of subway passengers
CN113903068A (en) Stranger monitoring method, device and equipment based on human face features and storage medium
CN114005093A (en) Driving behavior warning method, device, equipment and medium based on video analysis
CN114550076A (en) Method, device and equipment for monitoring area abnormal behaviors and storage medium
CN112528903B (en) Face image acquisition method and device, electronic equipment and medium
CN114022841A (en) Personnel monitoring and identifying method and device, electronic equipment and readable storage medium
CN111985545A (en) Target data detection method, device, equipment and medium based on artificial intelligence
CN112101191A (en) Expression recognition method, device, equipment and medium based on frame attention network
CN112396547A (en) Course recommendation method, device, equipment and medium based on unsupervised learning
CN112115890A (en) Drunk driving identification method, device, equipment and medium based on artificial intelligence
CN115131826B (en) Article detection and identification method, and network model training method and device
CN116453226A (en) Human body posture recognition method and device based on artificial intelligence and related equipment
CN110717432B (en) Article detection method, apparatus and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant