CN114049597A - Household scene event detection and identification system and method - Google Patents


Info

Publication number: CN114049597A
Application number: CN202111595260.4A
Authority: CN (China)
Prior art keywords: family, event detection, event, scene, event information
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN114049597B
Inventors: 潘广毅, 黄燕青, 杨洋
Current Assignee: Shanghai Imilab Technology Co Ltd
Original Assignees: Beijing Chuangmizhihui Iot Technology Co ltd; Shanghai Imilab Technology Co Ltd
Application filed by Beijing Chuangmizhihui Iot Technology Co ltd and Shanghai Imilab Technology Co Ltd
Priority to CN202111595260.4A
Publication of CN114049597A; application granted; publication of CN114049597B


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/217 - Validation; Performance evaluation; Active pattern learning techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60 - Protecting data
    • G06F 21/62 - Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F 21/6218 - Protecting access to data via a platform, e.g. using keys or access control rules, to a system of files or objects, e.g. local or distributed file system or database
    • G06F 21/6245 - Protecting personal data, e.g. for financial or medical purposes

Abstract

The application provides a family scene event detection and identification system and method. The method includes: collecting family scene event information through an information collection device; labeling, through a client, the family scene event information identified by the family server; and, by the family server: acquiring the family scene event information; identifying it through a general event detection and identification model; and adjusting the general event detection and identification model into a family event detection and identification model according to the labeled family scene event information. Because family information is stored only on the family server, user data privacy is protected; because training is performed separately for each scene, the system can adapt to different scenes; and the models can be continuously and iteratively improved based on user participation and feedback.

Description

Household scene event detection and identification system and method
Technical Field
The application relates to the field of smart homes, and in particular to a system and a method for detecting and identifying family scene events.
Background
Currently, some intelligent video detection systems for home scenes exist on the market. They collect event information from the home scene in real time, identify the event, and then notify the user. For example, when a thief breaks into a house, the intelligent video detection system can capture video of the break-in through a camera installed in the house, recognize the event as a thief intrusion, and then notify the user that a thief has broken in.
However, current intelligent video detection systems still suffer from problems such as a lack of privacy protection, difficulty adapting to different home scenes, difficulty in continuously improving the algorithm models, and insufficient user engagement. There is therefore a need for more efficient and reliable solutions.
Disclosure of Invention
The application provides a family scene event detection and identification system and method that protect user data privacy, adapt to different scenes, and can be continuously and iteratively improved through user participation and feedback.
One aspect of the present application provides a family scene event detection and identification system, including: an information collection device, arranged in different rooms of the family scene and configured to collect family scene event information in those rooms; a client, configured to label the family scene event information identified by the family server; and a family server, communicatively connected to the information collection device and the client and provided with a general event detection and identification model. In operation, the family server: acquires family scene event information for the different rooms from the information collection device; detects and identifies the family scene event information through the general event detection and identification model; and adjusts the general event detection and identification model into family event detection and identification models according to the family scene event information labeled by the client, where the family event detection and identification models of different rooms differ. The family scene event information and the data of the family event detection and identification models are stored only on the family server.
In some embodiments of the present application, the system further comprises a remote server configured to collect general scene event information from a public network and to construct the general event detection and identification model from the collected information; the general event detection and identification model in the family server is downloaded from the remote server.
In some embodiments of the present application, the remote server comprises: a shared database configured to store general scene event information collected from a public network; a cloud algorithm training module configured to construct the general event detection and identification model based on the general scene event information stored in the shared database; and a cloud model library configured to store the general event detection and identification model constructed by the cloud algorithm training module.
In some embodiments of the present application, the remote server periodically updates the shared database and the general event detection and identification model.
In some embodiments of the present application, when collecting general scene event information from the public network and constructing the general event detection and identification model, a separate event detection and identification model is constructed for each kind of general scene event information.
In some embodiments of the present application, the general scene event information includes a picture and/or a video corresponding to a general scene event.
In some embodiments of the application, once the number of family scene event information items labeled by the client reaches a preset threshold, the family server adjusts the general event detection and identification model into the family event detection and identification model according to the labeled family scene event information.
In some embodiments of the present application, the family server comprises: a family database configured to store the family scene event information labeled by the client; a family algorithm training module configured to adjust the general event detection and identification model into the family event detection and identification model based on the labeled family scene event information stored in the family database; and a family model library configured to store the adjusted family event detection and identification model.
In some embodiments of the present application, the general event detection and identification model is adjusted into the family event detection and identification model by fine-tuning it based on a stochastic gradient descent algorithm, a preset fine-tuning learning rate, the current scene model, the current model parameters, and the labeled data set.
In some embodiments of the present application, the client is further configured to set the family scene to which the information collection device belongs and the event types that need to be detected and identified.
Another aspect of the present application provides a method for detecting and identifying family scene events, including: collecting family scene event information for different rooms through an information collection device; labeling, through a client, the family scene event information identified by the family server; and, by the family server: acquiring the family scene event information for the different rooms; detecting and identifying it through a general event detection and identification model; and adjusting the general event detection and identification model into family event detection and identification models according to the labeled family scene event information, where the family event detection and identification models of different rooms differ. The family scene event information and the data of the family event detection and identification models are stored only on the family server.
In some embodiments of the present application, the method further comprises: collecting general scene event information from a public network through a remote server and constructing the general event detection recognition model based on the collected general scene event information.
In some embodiments of the present application, the collecting general scene event information from a public network and constructing the general event detection recognition model based on the collected general scene event information includes: storing general scene event information collected from a public network through a shared database; establishing the universal event detection and recognition model based on the collected universal scene event information stored in the shared database through a cloud algorithm training module; and storing the universal event detection and recognition model constructed by the cloud algorithm training module through a cloud model base.
In some embodiments of the present application, the remote server periodically updates the shared database and the general event detection and identification model.
In some embodiments of the present application, when collecting general scene event information from the public network and constructing the general event detection and identification model, a separate event detection and identification model is constructed for each kind of general scene event information.
In some embodiments of the present application, the general scene event information includes a picture and/or a video corresponding to a general scene event.
In some embodiments of the application, once the number of family scene event information items labeled by the client reaches a preset threshold, the family server adjusts the general event detection and identification model into the family event detection and identification model according to the labeled family scene event information.
In some embodiments of the present application, the general event detection and identification model is adjusted into the family event detection and identification model by fine-tuning it based on a stochastic gradient descent algorithm, a preset fine-tuning learning rate, the current scene model, the current model parameters, and the labeled data set.
In some embodiments of the present application, the method further comprises: setting, through the client, the family scene to which the information collection device belongs and the event types that need to be detected and identified.
According to the family scene event detection and identification system and method of the application, family information is stored only on the family server, so user data privacy is protected; training is performed separately for each scene, so the system can adapt to different scenes; and the models can be continuously and iteratively improved based on user participation and feedback.
Drawings
The following drawings describe exemplary embodiments disclosed in the present application in detail. Like reference numerals represent similar structures throughout the several views. Those of ordinary skill in the art will understand that these embodiments are non-limiting, exemplary embodiments, that the drawings are for illustration and description only, and that they are not intended to limit the scope of the application, as other embodiments may equally fulfill the inventive intent of the application. It should be understood that the drawings are not to scale.
fig. 1 is a schematic diagram of a family scenario event detection and identification system according to an embodiment of the present application;
fig. 2 is a flowchart of a home scenario event detection and identification method according to an embodiment of the present application.
Detailed Description
The following description is presented to enable any person skilled in the art to make and use the present disclosure, and is provided in the context of a particular application and its requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present application. Thus, the present application is not limited to the embodiments shown, but is to be accorded the widest scope consistent with the claims.
The technical solution of the present invention will be described in detail below with reference to the embodiments and the accompanying drawings.
Some current video detection and identification systems for home scenes mainly suffer from the following problems.
(1) Lack of privacy protection. Privacy is especially important in the family scene. Many existing AI video detection systems collect massive amounts of scene video centrally and train the machine learning model in the cloud, without considering how to protect user data privacy. This raises user concerns about data privacy and data security, and also carries the risk of violating legal regulations.
(2) Difficulty with scene adaptation. Family scenes are varied, including indoor and outdoor scenes and scenes of different living areas (bedrooms, kitchens, living rooms, balconies, and the like), and the same living area differs noticeably from family to family. An AI video detection model trained in one scene may not be applicable in another.
(3) Difficulty in continuously improving the algorithm model. Given privacy-protection requirements, if users' home video images cannot be collected centrally and incrementally, the model's performance cannot be continuously improved. On the other hand, even when some users are willing to submit family data to the cloud server to improve the algorithm model, the data transmission occupies a large amount of network bandwidth and cloud training occupies a large amount of computing resources, bringing huge costs to cloud service operation.
(4) Insufficient user engagement. Without good privacy protection and feedback incentive mechanisms, users have no way, or no will, to participate in helping the algorithm model improve.
Current intelligent detection and identification algorithm models are mainly based on centralized deep learning or on federated learning for vision. The main drawbacks of centralized deep learning include: data must be concentrated on the service provider's central server, with risks to privacy and security; data from different scenes must be manually labeled, divided into separate training sets, and used to train a separate model for each scene; and, because there is no privacy protection mechanism, it is difficult to encourage users to keep providing home video images to help the algorithm improve iteratively. Federated learning for vision partially solves the privacy problem, but it cannot solve the data heterogeneity brought by different scene data, so the algorithm model adapts poorly across scenes; it does not consider a user feedback and incentive mechanism combined with the video training system, so the algorithm model cannot be continuously updated; and no deployed applications in home scenes, nor algorithms, systems, and mechanisms that allow continuous iterative updates, have yet been seen.
To address these problems, the application provides a family scene event detection and identification system and method: family information is stored only on the family server, protecting user data privacy; training is performed separately for each scene, so the system can adapt to different scenes; and the models can be continuously and iteratively improved based on user participation and feedback.
Fig. 1 is a schematic diagram of a family scene event detection and identification system according to an embodiment of the present application. The following describes the home scenario event detection and identification system according to an embodiment of the present application in detail with reference to the accompanying drawings.
An embodiment of the present application provides a family scene event detection and identification system, as shown in fig. 1, including: an information collection device 110, arranged in different rooms of the family scene and configured to collect family scene event information in those rooms; a client 120, configured to label the family scene event information identified by the family server; and a family server 130, communicatively connected to the information collection device 110 and the client 120 and provided with a general event detection and identification model. In operation, the family server 130: acquires family scene event information for the different rooms from the information collection device 110; detects and identifies the family scene event information through the general event detection and identification model; and adjusts the general event detection and identification model into family event detection and identification models according to the family scene event information labeled by the client 120, where the family event detection and identification models of different rooms differ. The family scene event information and the data of the family event detection and identification models are stored only on the family server.
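A minimal sketch of the three components and their data flow, under the constraint that event data stays on the family server (all class and method names here are illustrative assumptions, not the patent's implementation):

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class EventInfo:
    """One piece of family scene event information (e.g. a picture or video frame)."""
    room: str
    data: bytes
    label: Optional[str] = None   # correction later supplied by the user via the client

class InformationCollector:
    """Element 110: a camera or sensor installed in one room."""
    def __init__(self, room: str):
        self.room = room
    def capture(self, data: bytes) -> EventInfo:
        return EventInfo(room=self.room, data=data)

class HomeServer:
    """Element 130: detects events and keeps all event data locally."""
    def __init__(self, generic_model: Callable[[EventInfo], str]):
        self.generic_model = generic_model
        self.room_models: Dict[str, Callable[[EventInfo], str]] = {}  # per-room fine-tuned models
        self.local_store: List[EventInfo] = []  # event info never leaves the family server
    def detect(self, event: EventInfo) -> str:
        self.local_store.append(event)
        # use the room-specific model if one has been trained, else the generic one
        model = self.room_models.get(event.room, self.generic_model)
        return model(event)

# Example: the generic model flags everything as an intrusion;
# the client (element 120) later labels the event as a misidentification.
server = HomeServer(generic_model=lambda e: "intrusion")
camera = InformationCollector("living room")
event = camera.capture(b"\x00\x01")
result = server.detect(event)
event.label = "relative entering"   # user correction via the client
```

The per-room model dictionary reflects the claim that the family event detection and identification models of different rooms differ, while `local_store` stands in for the home-only storage requirement.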
It should be noted that the family scene described in the present application may be replaced by any other scene, for example a factory, a company office, or a supermarket.
Referring to fig. 1, the home scenario event detection and recognition system includes an information collection device 110 configured to collect home scenario event information.
In some embodiments of the present application, the information collection device 110 is a camera; for example, if the scene is a home, the information collection device 110 may be a camera installed in the home. In other embodiments, the information collection device 110 may also be a life detector (for monitoring living beings in the scene, such as a pet's activity), a temperature or smoke sensor (for detecting temperature and smoke to determine whether there is a fire), and the like. In some embodiments, the information collection device 110 may further be communicatively connected to smart home appliances in the scene, so as to obtain their working condition directly, for example that of a television, an air conditioner, or a refrigerator.
In some embodiments of the present application, the family scene event information includes a picture and/or a video corresponding to the family scene event. For example, if the scene event is a thief break-in, the scene event information is a picture and/or video of the thief breaking into the home. In other embodiments, the family scene event information may further include a temperature; for example, if the scene event is a fire, the family scene event information is the high temperature measured when the fire occurs.
With continued reference to fig. 1, the family scene event detection and identification system includes a family server 130, communicatively connected to the information collection device 110 and provided with a general event detection and identification model. In operation, the family server 130: acquires family scene event information from the information collection device 110; identifies it through the general event detection and identification model; and adjusts the general event detection and identification model into a family event detection and identification model according to the family scene event information labeled by the client 120. The family server 130 may further include an algorithm inference module into which the general event detection and identification model is loaded, so that scene events are detected and identified by that module.
In other embodiments of the present application, the general event detection and identification model may be placed not in the family server but in the information collection device 110; that is, detection and identification with the general model may occur on the information collection device 110 itself. Because an ordinary home does not have a large server, this reduces the load on the family server. The information collection device 110 may be provided with an algorithm inference module into which the general event detection and identification model is loaded, so that scene events are detected and identified by that module.
In the embodiment of the present application, the general event detection and identification model in the family server 130 is downloaded from the remote server 100 (the remote server 100 is described below). In other embodiments, the remote server 100 may be omitted, and the general event detection and identification model in the family server 130 is preset, obtained directly from other channels (e.g., a public model library).
In operation, the family server 130 acquires family scene event information from the information collection device 110 and identifies it through the general event detection and identification model. For example, the information collection device 110 captures a picture and/or video of a thief breaking into the home; after acquiring it, the family server 130 identifies the thief break-in event through the general event detection and identification model.
In some embodiments of the present application, the home scenario event detection and recognition system may further include an alarm module configured to send alarm information to the client. For example, after the home server 130 recognizes the event that a thief intrudes into the home, the user is notified of the event through an alarm module.
The information collection device 110 and the family server 130 can thus achieve detection and identification of family scene events to some extent. However, because a general event detection and identification model is used, it is not well targeted: it may not suit every scene, and it cannot meet users' personalized needs. The general event detection and identification model therefore needs to be improved so that it adapts to various scenes and meets users' personalized needs.
The home server 130 may further adjust the general event detection and identification model to a home event detection and identification model according to the home scenario event information labeled by the client 120, where the home event detection and identification models of different rooms are different; and the family scene event information and the relevant data of the family event detection and identification model are only stored in the family server. The client 120 will be described later, and will not be described herein again.
The same event information may carry different meanings in different rooms, so family event detection and identification models are trained separately for each room. For example, the same high-temperature event information may indicate a fire in the living room, someone cooking in the kitchen, or someone taking a hot bath in the bathroom.
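The room-dependent interpretation in this example can be sketched with a simple lookup (the table structure and function name are illustrative assumptions):

```python
# The same raw event ("high temperature") maps to a different meaning per room,
# mirroring why a separate model is trained for each room.
ROOM_INTERPRETATION = {
    "living room": {"high temperature": "possible fire"},
    "kitchen":     {"high temperature": "someone is cooking"},
    "bathroom":    {"high temperature": "someone is taking a hot bath"},
}

def interpret(room: str, event: str) -> str:
    """Return the room-specific meaning of an event, or a fallback."""
    return ROOM_INTERPRETATION.get(room, {}).get(event, "unknown event")
```

In the actual system this mapping is learned by the per-room fine-tuned models rather than hard-coded; the table only illustrates why one shared model is insufficient.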
As described above, the general event detection and identification model may not suit every scene or satisfy user requirements; that is, it may misidentify events, and the user can label these misidentifications through the client 120. For example, a friend or relative of the user enters the house, and the general event detection and identification model, not knowing the person, identifies the event as a thief break-in. The user can label this event, for example with the identity of the friend or relative. The family server 130 then adjusts the general event detection and identification model into a family event detection and identification model according to the label, and the adjusted model no longer makes this misidentification but directly recognizes that a friend or relative has entered the home.
In some embodiments of the present application, the home server 130 includes: a family database configured to store family scene event information labeled by the client 120; the family algorithm training module is used for adjusting the general event detection and recognition model into a family event detection and recognition model based on the labeled family scene event information stored in the family database; and the family model library is used for storing the adjusted family event detection and identification model.
In some embodiments of the present application, only after the number of family scene event information items labeled by the client 120 reaches a preset threshold does the family server adjust the general event detection and identification model into the family event detection and identification model according to the labeled information. To reduce resource consumption and improve the user experience, the family server 130 does not adjust the model in real time; otherwise the model would be updated every time a new label appeared, which would be too costly. The updating is also cumulative: after the general event detection and identification model is first adjusted into the family event detection and identification model, each subsequent update starts from the previous family event detection and identification model.
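A minimal sketch of this threshold-triggered, cumulative update (the buffer class, threshold value, and `retrain` callback are illustrative assumptions, not from the patent):

```python
class LabelBuffer:
    """Accumulates user-labeled events; triggers fine-tuning once a threshold is reached."""
    def __init__(self, threshold: int, retrain):
        self.threshold = threshold
        self.retrain = retrain      # callable: (previous_model, labels) -> new model
        self.pending = []
        self.current_model = None   # starts as the generic model; replaced after each round

    def add_label(self, labeled_event) -> bool:
        """Store one labeled event; return True if a retraining round was triggered."""
        self.pending.append(labeled_event)
        if len(self.pending) >= self.threshold:
            # each update starts from the previous family model (cumulative updating)
            self.current_model = self.retrain(self.current_model, self.pending)
            self.pending.clear()
            return True
        return False

# Example with a threshold of 3 and a stub retrain callback:
buf = LabelBuffer(threshold=3, retrain=lambda model, labels: ("model", len(labels)))
first = buf.add_label("label-a")
second = buf.add_label("label-b")
third = buf.add_label("label-c")   # third label reaches the threshold
```

A manual trigger from the client, as in the next embodiment, would simply call the retraining step directly regardless of the pending count.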
In other embodiments of the present application, the adjustment of the generic event detection recognition model may also be triggered manually by the user through the client 120.
In some embodiments of the present application, the general event detection and identification model is adjusted into the family event detection and identification model by fine-tuning it based on a stochastic gradient descent (SGD) algorithm, a preset fine-tuning learning rate, the current scene model, the current model parameters, and the labeled data set.
The fine-tuning training process is as follows:
Input: fine-tuning learning rate c; current scene model M (the event detection and identification model used in the current scene) with current model parameters W(0); and the user-labeled data set D = (Dx[1:N], Dy[1:N]), where Dx is the image/video data of the current room and Dy is the correct/false-alarm label given by the user.
Repeat for i = 1:K:
    select a small batch of data Dx[1:n] from the data set D;
    feed the batch into the model M to obtain the model output Do[1:n];
    compute the loss function L(Dy, Do);
    compute the average gradient g(i) over the batch according to the SGD algorithm;
    update the model parameters: W(i) = W(i-1) - c·g(i).
Output: the new room model M-new and model parameters W(K).
After the fine-tuning training is finished, the new event detection and identification model is stored in the family model library and simultaneously pushed to the camera, which updates its model and uses the new event detection and identification model for subsequent detection and alarms.
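As a self-contained illustration of the fine-tuning loop above, using NumPy with a linear model and a squared loss as stand-ins for the patent's unspecified model M and loss L (the data, hyperparameters, and function name are assumptions):

```python
import numpy as np

def finetune(W0, Dx, Dy, c=0.1, K=200, batch=4, seed=0):
    """Mini-batch SGD fine-tuning: W(i) = W(i-1) - c * g(i),
    where g(i) is the gradient averaged over the batch."""
    rng = np.random.default_rng(seed)
    W = W0.copy()
    for _ in range(K):
        idx = rng.choice(len(Dx), size=batch, replace=False)  # small batch Dx[1:n]
        x, y = Dx[idx], Dy[idx]
        out = x @ W                        # model output Do[1:n]
        grad = x.T @ (out - y) / batch     # average gradient of the squared loss
        W = W - c * grad                   # parameter update
    return W

# Synthetic "room data" with true parameters [2, -1]; fine-tuning from W(0) = 0
# should recover them, standing in for adapting the model to one room's labels.
rng = np.random.default_rng(1)
Dx = rng.normal(size=(32, 2))
Dy = Dx @ np.array([2.0, -1.0])
W_new = finetune(np.zeros(2), Dx, Dy)
```

In the actual system M would be a deep detection/identification network and Dy the user's correct/false-alarm labels; only the update rule is the same.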
With continued reference to fig. 1, the family scene event detection and identification system further includes a remote server 100, configured to collect general scene event information from a public network and to construct the general event detection and identification model from it; the general event detection and identification model in the family server 130 is downloaded from the remote server 100. The remote server 100 is communicatively connected to the family server 130; the family server 130 can obtain data from the remote server 100, but the remote server 100 cannot obtain data from the family server 130. This protects user data privacy and prevents the family data on the family server 130 from leaking.
The remote server 100 collects and builds a public picture/video database of common scene events (such as face detection, human body detection, motion detection, fall detection, and the like) and labels the picture/video clips; based on this public image/video database, event detection/recognition algorithm models are constructed by the cloud algorithm training module (using classification algorithms such as deep learning or logistic regression) and stored in the cloud model library.
In some embodiments of the present application, the general scene event information includes a picture and/or a video corresponding to a general scene event. Taking a family scene as an example, if the scene event is, for example, a thief break-in event, the scene event information is a picture and/or a video of the thief break-in at home.
The event detection and recognition model for detecting and identifying scene event information is trained and constructed through artificial intelligence deep learning, which requires a large amount of data. The events generated within a single family scene are far too few to supply that data, so a large amount of event information needs to be collected from the public network as training data for the deep learning.
In order to protect user data privacy, the remote server 100 only obtains data from a public network, and does not obtain data from a home server.
In some embodiments of the present application, the remote server 100 is, for example, a cloud server.
In some embodiments of the present application, the remote server 100 comprises: a shared database configured to store general scene event information collected from a public network; the cloud algorithm training module is configured to construct a general event detection and recognition model based on the collected general scene event information stored in the shared database; and the cloud model library is configured to store the universal event detection recognition model constructed by the cloud algorithm training module.
Also exemplified is a thief break-in event in a home scenario. The remote server 100 collects a large number of pictures and videos related to the intrusion of a thief into a home on a public network and stores the pictures and videos into the shared database; the cloud algorithm training module carries out artificial intelligence deep learning according to the pictures and videos related to the intrusion of the thief into the house and constructs a general event detection and recognition model aiming at the event of the intrusion of the thief into the house; the cloud model base stores a plurality of universal event detection and identification models constructed by the cloud algorithm training module.
The general event detection and recognition model is constructed from data on the public network; it is a generic model with a wide application range but without strong pertinence to any particular scene. The general event detection and recognition model can detect and identify a thief break-in event from the related pictures and videos. That is, when a thief breaks into the home, the general event detection and recognition model can recognize the break-in event from the pictures and videos of the intrusion.
In some embodiments of the present application, when collecting general scene event information from a public network and constructing the general event detection and recognition model, a separate event detection and recognition model is constructed for each kind of general scene event information. The thief break-in example above illustrates such a model for a single event. For other events, such as a fire at home, a household appliance left switched on, or an elderly person or child falling down, dedicated one-to-one event detection and recognition models can likewise be constructed.
In some embodiments of the present application, the remote server 100 periodically updates the shared database and the general event detection and recognition model. Data on the public network changes every day, so the remote server 100 also updates the shared database periodically and reconstructs a new general event detection and recognition model from the new event information. The general event detection and recognition model thus keeps pace with the times, is updated regularly, and can better adapt to the varying needs of users.
With continued reference to fig. 1, the home scenario event detection and recognition system includes: a client 120 configured to label the home scenario event information identified by the home server 130. The client 120 and the home server 130 are communicatively connected, that is, the home server 130 can obtain data from the client 120, and the client 120 can also obtain data from the home server 130.
Labeling the family scene event information identified by the home server 130 includes, for example, marking the misidentification and its reason when the home server 130 identifies the family scene event information incorrectly.
In some embodiments of the present application, the client 120 is further configured to deploy a home scene (e.g., a room corresponding to the camera, a living room, a bedroom, a kitchen, etc.) to which the information acquisition device 110 belongs and a type of event that needs to be detected and identified (e.g., detecting whether a thief intrudes, detecting whether an old person or a child in the home falls, etc.).
In some embodiments of the present application, the client 120 is, for example, a mobile phone APP.
The client 120 is the user's operation interface, through which the user deploys the family scene event detection and recognition system. When a camera is deployed at home for the first time:
the user configures in the APP the room to which the camera belongs and the event types that need real-time detection, and deploys the family software modules on the home server, including the family algorithm training module, the family model library, and the family database;
the camera is configured to connect to the home server, and subsequent snapshot alarm pictures/videos are stored on the home server;
the home server downloads the general event recognition models from the remote server and pushes them to the camera according to the camera configuration;
the camera recognition module is started; after startup, the camera loads the room detection and recognition model (the general event recognition model on first use; after user labeling, the personalized, fine-tuned family event detection and recognition model from the family model library);
the camera algorithm inference module detects the video stream and performs real-time event detection/recognition alarms, and the alarm pictures/videos are stored to the family database on the home server;
the user opens the family database on the home server and labels the alarm picture/video data, marking which alarms are correct and which are wrong;
after the user-labeled data reaches a preset data volume threshold, family algorithm model training can be started.
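For instance, the first-time configuration the user enters in the APP might be represented as a simple per-camera record. The patent does not specify a schema, so every field name below is an assumption for illustration only:

```python
# Hypothetical per-camera deployment record; the patent defines no schema,
# so the function name, field names, and room list are invented examples.
def make_camera_config(camera_id, room, event_types):
    """Room and event types are chosen by the user in the APP on first deployment."""
    allowed_rooms = {"living_room", "bedroom", "kitchen", "bathroom"}
    if room not in allowed_rooms:
        raise ValueError(f"unknown room: {room}")
    return {
        "camera_id": camera_id,
        "room": room,                      # family scene the camera belongs to
        "event_types": list(event_types),  # events to detect in real time
        "model": "generic",                # generic model is loaded first
    }

config = make_camera_config("cam-01", "living_room", ["thief_break_in", "fall"])
```

The `"model": "generic"` field reflects that the camera loads the general event recognition model on first use, before any fine-tuned family model exists.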
The above technical solution designs a scene-personalized fine-tuning training algorithm that combines cloud training of the general model with in-family training of the personalized model. Ownership of the data never leaves the user: the general model is iteratively improved inside the family while user data privacy is protected, and the personalized model achieves better detection accuracy in the family scene. By acquiring the scene metadata of the cameras deployed by the user, scene models are selected and trained automatically. Through the connection with the home server, the model is continuously improved by incrementally using the user's home video images. Because user labeling increases the accuracy of family event alarms, users are encouraged to keep participating in model improvement.
In the family scene event detection and recognition system described above, family information is stored only on the home server, so user data privacy is protected; training is carried out separately for different scenes, so the system adapts to each of them; and the system is continuously and iteratively improved according to the user's participation and feedback.
Fig. 2 is a flowchart of a home scenario event detection and identification method according to an embodiment of the present application.
An embodiment of the present application further provides a method for detecting and identifying a family scene event, which is shown in fig. 2 and includes:
step S1: collecting family scene event information;
step S2: acquiring the family scene event information;
step S3: identifying the family scene event information;
step S4: marking the identified family scene event information;
step S5: and adjusting the general event detection and identification model into a family event detection and identification model according to the labeled family scene event information.
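As a hedged, highly simplified sketch of steps S3 to S5 (the patent's models are trained networks, not lookup tables; `detect`, `annotate`, and `adapt` are invented names used only to show the identify-label-adjust cycle):

```python
# Toy sketch of the identify (S3) -> label (S4) -> adapt (S5) cycle.
# Real models are neural networks; dictionaries stand in for illustration.
def detect(model, event):
    """S3: identify an event with the current model."""
    return model.get(event, "unknown")

def annotate(labels, event, truth):
    """S4: the user labels a misidentified event via the client."""
    labels[event] = truth

def adapt(generic_model, labels):
    """S5: derive a family model; user corrections override the generic one."""
    family_model = dict(generic_model)   # the generic model itself is kept
    family_model.update(labels)
    return family_model

generic = {"person_enters": "thief break-in"}
labels = {}
annotate(labels, "person_enters", "relative visiting")  # user correction
family = adapt(generic, labels)
```

After adaptation, the family model returns the user's labeled meaning for the same event, while the generic model remains unchanged on the remote side.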
Referring to fig. 2, in step S1, home scene event information of different rooms is collected by an information collecting apparatus.
In some embodiments of the present application, the home scene event information includes a picture and/or a video corresponding to the home scene event. Taking a family scene as an example, if the scene event is, for example, a thief break-in event, the scene event information is a picture and/or a video of the thief break-in at home. In other embodiments of the present application, the home scenario event information may further include a temperature, for example, the scenario event is a fire, and the home scenario event information is a high temperature when the fire occurs.
Continuing to refer to fig. 2, at step S2, the home scene event information of the different room is acquired by the home server. And step S3, identifying the family scene event information of the different rooms through a general event detection identification model.
In some embodiments of the present application, the method further comprises: the method comprises the steps of collecting general scene event information from a public network through a remote server and constructing a general event detection recognition model based on the collected general scene event information.
In some embodiments of the present application, the collecting general scene event information from a public network and constructing a general event detection recognition model based on the collected general scene event information includes: storing general scene event information collected from a public network through a shared database; establishing a universal event detection and recognition model based on the collected universal scene event information stored in the shared database through a cloud algorithm training module; and storing the universal event detection and recognition model constructed by the cloud algorithm training module through a cloud model base.
In other embodiments of the present application, the generic event detection identification model is pre-set, which may be obtained directly from other channels (e.g., a common model library).
In some embodiments of the present application, when collecting general scene event information from a public network and constructing the general event detection and recognition model, a separate event detection and recognition model is constructed for each kind of general scene event information. The thief break-in example above illustrates such a model for a single event. For other events, such as a fire at home, a household appliance left switched on, or an elderly person or child falling down, dedicated one-to-one event detection and recognition models can likewise be constructed.
In some embodiments of the present application, the remote server periodically updates the shared database and the general event detection and recognition model. Data on the public network changes every day, so the remote server also updates the shared database periodically and reconstructs a new general event detection and recognition model from the new event information. The general event detection and recognition model thus keeps pace with the times, is updated regularly, and can better adapt to the varying needs of users.
In operation, the home server acquires the family scene event information from the information collection device and identifies it through the general event detection and recognition model. For example, the information collection device captures a picture and/or video of a thief breaking in; after acquiring it, the home server identifies the thief break-in event through the general event detection and recognition model.
In some embodiments of the present application, the home scenario event detection and identification method may further include sending an alarm message to the client. For example, the home server notifies the user of the event after recognizing the event that a thief intrudes into the home.
With continued reference to fig. 2, at step S4, the home scene event information identified by the home server is annotated by the client.
Labeling the family scene event information identified by the home server includes, for example, marking the misidentification and its reason when the home server identifies the family scene event information incorrectly.
In some embodiments of the present application, the method further comprises: the client deploys a family scene (such as a room, a living room, a bedroom, a kitchen and the like corresponding to the camera) to which the information acquisition device belongs and an event type needing to be detected and identified (such as detecting whether a thief breaks into the house, detecting whether an old man or a child at home falls down and the like).
Although the steps described above can realize family scene event detection and recognition to some extent, the general event detection and recognition model is not strongly targeted: it may not suit every scene and cannot meet users' personalized needs. The general event detection and recognition model therefore needs to be improved so as to adapt to various scenes and meet those needs.
Continuing to refer to fig. 2, in step S5, the home server adjusts the general event detection and recognition model into a family event detection and recognition model according to the labeled family scene event information, where the family event detection and recognition models of different rooms are different; the family scene event information and the data related to the family event detection and recognition model are stored only on the home server. The same event information may carry different meanings in different rooms, so a family event detection and recognition model is trained specifically for each room. For example, the same high-temperature event information may indicate a fire in the living room, someone cooking in the kitchen, or someone taking a hot bath in the bathroom.
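The room-dependent interpretation above can be illustrated with a toy lookup. The table entries come from the living-room/kitchen/bathroom example in the text; the function and key names are assumptions:

```python
# Toy illustration: the same "high temperature" event information means
# different things depending on the room, per the example in the text.
INTERPRETATION = {
    ("living_room", "high_temperature"): "possible fire",
    ("kitchen", "high_temperature"): "someone is cooking",
    ("bathroom", "high_temperature"): "someone is taking a hot bath",
}

def interpret(room, event):
    # fall back to a generic reading when no room-specific rule exists
    return INTERPRETATION.get((room, event), "unclassified event")
```

This is why the system trains one family event detection and recognition model per room rather than a single shared model.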
As described above, the general event detection and recognition model may not suit every scene or satisfy every user; that is, it may misidentify events, and the user can label those misidentifications through the client. For example, a friend or relative of the user enters the room, and the general event detection and recognition model, not knowing this person, identifies the event as a thief break-in. The user can label this event, for example with the identity of the friend or relative. The home server adjusts the general event detection and recognition model into the family event detection and recognition model according to the label; the family model will no longer produce this misidentification and will directly recognize that a friend or relative has entered the home.
In some embodiments of the application, after the number of family scene event information items labeled by the client reaches a preset threshold, the home server adjusts the general event detection and recognition model into the family event detection and recognition model according to the labeled information. To reduce resource consumption and improve user experience, the home server does not adjust the general model in real time; otherwise the model would be retrained every time a new label appeared, which would be too costly. The update of the general event detection and recognition model therefore starts only after the number of labeled family scene event information items reaches the preset threshold. In addition, the updating is continuous: after the general event detection and recognition model has been adjusted into the family event detection and recognition model for the first time, each subsequent update builds on the previous family event detection and recognition model.
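A minimal sketch of this threshold-triggered update, assuming an invented `LabelBuffer` abstraction (the patent does not name one):

```python
# Hypothetical accumulator that starts retraining only once enough user
# labels have been collected, mirroring the preset-threshold behavior.
class LabelBuffer:
    def __init__(self, threshold):
        self.threshold = threshold
        self.labels = []

    def add(self, label):
        """Store a user label; return a batch when retraining should start."""
        self.labels.append(label)
        if len(self.labels) >= self.threshold:
            batch, self.labels = self.labels, []  # hand off and reset
            return batch                           # triggers fine-tuning
        return None

buf = LabelBuffer(threshold=3)
assert buf.add("correct alarm") is None
assert buf.add("wrong alarm") is None
batch = buf.add("wrong alarm")   # third label reaches the threshold
```

Each returned batch would feed one round of fine-tuning, so every update builds on the previous family model rather than restarting from the generic one.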
In other embodiments of the present application, the adjusting of the generic event detection recognition model may also be triggered manually by a user through a client.
In some embodiments of the present application, the general event detection and recognition model is adapted into the family event detection and recognition model by fine-tuning training based on the stochastic gradient descent (SGD) algorithm, a preset fine-tuning learning rate, the current scene model, the current model parameters, and the labeled data set.
The fine-tuning training process is as follows:
Input: the fine-tuning learning rate c; the current scene model M (i.e., the event detection and recognition model used by the current scene) with current model parameters W(0); and the user-labeled data set D = (Dx[1:N], Dy[1:N]), where Dx is the image/video data of the current room and Dy is the correct/incorrect alarm label given by the user.
The following procedure is performed for i = 1:K:
selecting a mini-batch of data Dx[1:n] from the data set D;
inputting the mini-batch into the model M to obtain the model output Do[1:n];
calculating the loss function L(Dy, Do);
calculating the average gradient g(i) over the batch according to the SGD algorithm;
updating the model parameters: W(i) = W(i-1) - c·g(i).
Output: the new room model M-new and its model parameters W(K).
After the fine-tuning training is finished, the new event detection and recognition model is saved to the family model library and simultaneously pushed to the camera, which will use the new model for subsequent detection and alarms.
The above technical solution designs a scene-personalized fine-tuning training algorithm that combines cloud training of the general model with in-family training of the personalized model. Ownership of the data never leaves the user: the general model is iteratively improved inside the family while user data privacy is protected, and the personalized model achieves better detection accuracy in the family scene. By acquiring the scene metadata of the cameras deployed by the user, scene models are selected and trained automatically. Through the connection with the home server, the model is continuously improved by incrementally using the user's home video images. Because user labeling increases the accuracy of family event alarms, users are encouraged to keep participating in model improvement.
According to the system and method for family scene event detection and recognition described above, family information is stored only on the home server, so user data privacy is protected; training is carried out separately for different scenes, so the system adapts to each of them; and the system is continuously and iteratively improved according to the user's participation and feedback.
In view of the above, it will be apparent to those skilled in the art upon reading the present application that the foregoing application content may be presented by way of example only, and may not be limiting. Those skilled in the art will appreciate that the present application is intended to cover various reasonable variations, adaptations, and modifications of the embodiments described herein, although not explicitly described herein. Such alterations, modifications, and variations are intended to be within the spirit and scope of the exemplary embodiments of this application.
It is to be understood that the term "and/or" as used herein includes any and all combinations of one or more of the associated listed items. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present.
It will be further understood that the terms "comprises," "comprising," "includes" or "including," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It will be further understood that, although the terms first, second, third, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. Thus, a first element in some embodiments may be termed a second element in other embodiments without departing from the teachings of the present application. The same reference numerals or the same reference characters denote the same elements throughout the specification.
Further, the present specification describes example embodiments with reference to idealized example cross-sectional and/or plan and/or perspective views. Accordingly, variations from the shapes of the illustrations as a result, for example, of manufacturing techniques and/or tolerances, are to be expected. Thus, example embodiments should not be construed as limited to the shapes of regions illustrated herein but are to include deviations in shapes that result, for example, from manufacturing. Thus, the regions illustrated in the figures are schematic in nature and their shapes are not intended to illustrate the actual shape of a region of a device and are not intended to limit the scope of exemplary embodiments.

Claims (19)

1. A home scenario event detection and recognition system, comprising:
the information acquisition device is arranged in different rooms of the family scene and is configured to acquire family scene event information in the different rooms;
the client is configured to label the family scene event information identified by the family server;
the home server is in communication connection with the information acquisition device and the client, a universal event detection and identification model is arranged in the home server, and when the home server works:
acquiring family scene event information in the different rooms from the information acquisition device;
detecting and identifying the family scene event information through the universal event detection and identification model;
adjusting the general event detection and identification model into a family event detection and identification model according to family scene event information marked by the client, wherein the family event detection and identification models of different rooms are different;
and the family scene event information and the relevant data of the family event detection and identification model are only stored in the family server.
2. The system of claim 1, further comprising: a remote server configured to collect general scene event information from a public network and construct the general event detection recognition model based on the collected general scene event information, the general event detection recognition model set in the home server being downloaded from the remote server.
3. The system of claim 2, wherein the remote server comprises:
a shared database configured to store general scene event information collected from a public network;
the cloud algorithm training module is configured to construct the universal event detection and recognition model based on the collected universal scene event information stored in the shared database;
and the cloud model library is configured to store the universal event detection and recognition model constructed by the cloud algorithm training module.
4. The system of claim 3, wherein the remote server periodically updates the shared database and updates the generic event detection recognition model.
5. The system of claim 2, wherein in collecting generic scene event information from a public network and constructing the generic event detection recognition model based on the collected generic scene event information, a single event detection recognition model is constructed for each generic scene event information.
6. The system of claim 2, wherein the general scene event information includes a picture and/or a video corresponding to a general scene event.
7. The system of claim 1, wherein the home server adjusts the generic event detection recognition model to the home event detection recognition model according to the home scenario event information labeled by the client after the number of the home scenario event information labeled by the client reaches a preset number threshold.
8. The system of claim 1, wherein the home server comprises:
the family database is configured to store the family scene event information labeled by the client;
the family algorithm training module is used for adjusting the general event detection and recognition model into the family event detection and recognition model based on the labeled family scene event information stored in the family database;
and the family model library is used for storing the adjusted family event detection and identification model.
9. The system of claim 1, wherein the generic event detection recognition model is adapted to the home event detection recognition model by: and carrying out fine tuning training on the general event detection and recognition model based on a random gradient descent algorithm, a preset fine tuning learning rate, a current scene model, current model parameters and a labeled data set.
10. The system of claim 1, wherein the client is further configured to deploy a home scenario to which the information gathering device belongs and a type of event that needs to be detected for identification.
11. A family scene event detection and identification method is characterized by comprising the following steps:
acquiring family scene event information in different rooms through an information acquisition device;
marking the family scene event information identified by the family server through the client; and
by the home server:
acquiring family scene event information in different rooms;
identifying the family scene event information through a general event detection and identification model;
adjusting the general event detection and identification model into a family event detection and identification model according to the labeled family scene event information, wherein the family event detection and identification models of different rooms are different;
and the family scene event information and the relevant data of the family event detection and identification model are only stored in the family server.
12. The method of claim 11, further comprising: collecting general scene event information from a public network through a remote server and constructing the general event detection recognition model based on the collected general scene event information.
13. The method of claim 12, wherein said collecting generic context event information from a public network and building the generic event detection recognition model based on the collected generic context event information comprises:
storing general scene event information collected from a public network through a shared database;
establishing the universal event detection and recognition model based on the collected universal scene event information stored in the shared database through a cloud algorithm training module;
and storing the universal event detection and recognition model constructed by the cloud algorithm training module through a cloud model base.
14. The method of claim 13, wherein the remote server periodically updates the shared database and updates the generic event detection recognition model.
15. The method of claim 12, wherein in collecting general scene event information from a public network and constructing the general event detection recognition model based on the collected general scene event information, a single event detection recognition model is constructed for each general scene event information.
16. The method of claim 12, wherein the general scene event information comprises a picture and/or a video corresponding to a general scene event.
17. The method of claim 11, wherein the home server adjusts the generic event detection recognition model to the home event detection recognition model according to the home scenario event information labeled by the client after the number of the home scenario event information labeled by the client reaches a preset number threshold.
18. The method of claim 11, wherein the generic event detection recognition model is adapted to the home event detection recognition model by: and carrying out fine tuning training on the general event detection and recognition model based on a random gradient descent algorithm, a preset fine tuning learning rate, a current scene model, current model parameters and a labeled data set.
19. The method of claim 11, further comprising: and deploying the family scene to which the information acquisition device belongs and the event type needing to be detected and identified through the client.
CN202111595260.4A 2021-12-24 2021-12-24 Household scene event detection and identification system and method Active CN114049597B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111595260.4A CN114049597B (en) 2021-12-24 2021-12-24 Household scene event detection and identification system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111595260.4A CN114049597B (en) 2021-12-24 2021-12-24 Household scene event detection and identification system and method

Publications (2)

Publication Number Publication Date
CN114049597A true CN114049597A (en) 2022-02-15
CN114049597B CN114049597B (en) 2022-05-06

Family

ID=80213490

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111595260.4A Active CN114049597B (en) 2021-12-24 2021-12-24 Household scene event detection and identification system and method

Country Status (1)

Country Link
CN (1) CN114049597B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105607508A (en) * 2016-03-24 2016-05-25 重庆邮电大学 Smart home device control method and system based on user behavior analysis
US20160217760A1 (en) * 2015-01-22 2016-07-28 Microsoft Technology Licensing, Llc. Reconstructing viewport upon user viewpoint misprediction
CN107065586A (en) * 2017-05-23 2017-08-18 中国科学院自动化研究所 Interactive intelligent home services system and method
CN107707657A (en) * 2017-09-30 2018-02-16 苏州涟漪信息科技有限公司 Safety custody system based on multisensor
US20190042851A1 (en) * 2017-12-19 2019-02-07 Intel Corporation Protection and recovery of identities in surveillance camera environments
CN109802876A (en) * 2018-01-15 2019-05-24 武汉思美特云智慧科技有限公司 A kind of micro- smart home system
CN111856956A (en) * 2020-07-21 2020-10-30 青岛海信日立空调系统有限公司 Control method and control terminal for realizing scene interaction
CN112288599A (en) * 2020-10-29 2021-01-29 四川长虹电器股份有限公司 Scene service implementation method for smart home, computer device and storage medium
CN113170000A (en) * 2019-02-22 2021-07-23 深圳市欢太科技有限公司 Equipment control method, device, system, electronic equipment and cloud server

Also Published As

Publication number Publication date
CN114049597B (en) 2022-05-06

Similar Documents

Publication Publication Date Title
US10083599B2 (en) Remote user interface and display for events for a monitored location
US10904397B2 (en) Doorbell call center
US10867217B1 (en) Fusion of visual and non-visual information for training deep learning models
US10706699B1 (en) Projector assisted monitoring system
US20220215725A1 (en) Integrated doorbell devices
CN110908340A (en) Smart home control method and device
WO2013069565A1 (en) Imaging/recording device
CN114049597B (en) Household scene event detection and identification system and method
CN111491258B (en) Object type detection method and device
CN109327681B (en) Specific personnel identification alarm system and method thereof
WO2023280273A1 (en) Control method and system
CN211207107U (en) Intelligent household control system
CN111766786B (en) Intelligent control method and controller
CN116343420A (en) Alarm processing method and device, storage medium and electronic device
CN112507756A (en) Detection method, control method and device
CN116453676A (en) Health monitoring method, electronic equipment, robot and readable storage medium
CN115273369A (en) Intelligent household security monitoring device and monitoring method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230607

Address after: Room 001a, 11 / F, building 1, 588 Zixing Road, Minhang District, Shanghai, 200241

Patentee after: Shanghai chuangmi Shulian Intelligent Technology Development Co.,Ltd.

Address before: Room 410-1, floor 4, building 1, courtyard 10, North Longyu street, Changping District, Beijing 100085

Patentee before: Beijing chuangmizhihui IOT Technology Co.,Ltd.

Patentee before: Shanghai chuangmi Shulian Intelligent Technology Development Co.,Ltd.