CN114842319A - Method and device for detecting home scene, electronic equipment and medium - Google Patents

Method and device for detecting home scene, electronic equipment and medium

Info

Publication number
CN114842319A
CN114842319A (application CN202210197907.6A)
Authority
CN
China
Prior art keywords
transform domain
home
video data
scene
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210197907.6A
Other languages
Chinese (zh)
Inventor
张鹏
刘浩宁
向国庆
范益波
黄晓峰
严伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced Institute of Information Technology AIIT of Peking University
Hangzhou Weiming Information Technology Co Ltd
Original Assignee
Advanced Institute of Information Technology AIIT of Peking University
Hangzhou Weiming Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Advanced Institute of Information Technology AIIT of Peking University, Hangzhou Weiming Information Technology Co Ltd filed Critical Advanced Institute of Information Technology AIIT of Peking University
Priority to CN202210197907.6A priority Critical patent/CN114842319A/en
Publication of CN114842319A publication Critical patent/CN114842319A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/12 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/18 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being a set of transform coefficients

Abstract

The application discloses a method and apparatus for detecting a home scene, an electronic device, and a medium. In the application, home scene video data for a target area can be collected by a camera device arranged at a smart home terminal; target video transform domain coefficients of the home scene video data are acquired and input into a preset scene detection network model to detect whether a user operation event exists in the target area; and if so, the smart home terminal is controlled to execute the action corresponding to the user operation event. With this technical solution, after the video data is collected by the smart home camera, the unreadable video transform domain coefficients of the video data are extracted and the original video data is deleted, so that the home scene is subsequently detected from the transform domain coefficient data and a learning model. This solves the problem that the readability of original video data easily creates safety hazards once the data is leaked.

Description

Method and device for detecting home scene, electronic equipment and medium
Technical Field
The present application relates to data processing technologies, and in particular, to a method and an apparatus for detecting a home scene, an electronic device, and a medium.
Background
In smart city applications such as smart homes, visual signals are the main and indispensable information source in every scene; application scenarios include home monitoring, gesture recognition, face recognition, home medical care, autonomous driving, live entertainment, and the like.
Although visual signals play an essential role in smart homes, in many application scenarios the smart home system directly performs intelligent analysis on the original visual signal data at its different processing stages. As a result, once sensitive user privacy data present in the original video image signal is leaked, significant privacy and security risks arise.
Disclosure of Invention
The embodiments of the application provide a method and apparatus for detecting a home scene, an electronic device, and a medium, which are used to solve the problem in the related art that private user content in home scene video data collected by a smart home terminal is easily leaked.
According to an aspect of an embodiment of the present application, a method for detecting a home scene includes:
collecting home scene video data for a target area by using a camera device arranged at a smart home terminal;
acquiring target video transform domain coefficients of the home scene video data, inputting the target video transform domain coefficients into a preset scene detection network model, and detecting whether a user operation event exists in the target area;
and if so, controlling the smart home terminal to execute the action corresponding to the user operation event.
Optionally, in another embodiment based on the foregoing method of the present application, the acquiring target video transform domain coefficients of the home scene video data includes:
collecting the coding unit (CU) transform domain coefficients of the home scene video data; collecting the largest coding unit (LCU) transform domain coefficients of the home scene video data; and collecting the single-frame transform domain coefficients of the home scene video data;
and merging the CU transform domain coefficients, the LCU transform domain coefficients, and the single-frame transform domain coefficients into the target video transform domain coefficients.
Optionally, in another embodiment based on the foregoing method of the present application, the acquiring target video transform domain coefficients of the home scene video data includes:
decoding the home scene video data to obtain decoded video data;
and acquiring the target video transform domain coefficient of the decoded video data.
Optionally, in another embodiment based on the foregoing method of the present application, after the acquiring the target video transform domain coefficient of the home scene video data, the method further includes:
and clearing the video data of the home scene.
Optionally, in another embodiment based on the foregoing method of the present application, the inputting the target video transform domain coefficient to a preset scene detection network model, and detecting whether a user operation event exists in the target area includes:
determining, by recognizing the target video transform domain coefficients, whether human body features exist in the target area where the smart home terminal is currently located, wherein the human body features include at least one of a size feature, a color feature, and a contour feature;
and if the human body characteristics are determined to exist, inputting the target video transform domain coefficient into a preset scene detection network model, and detecting the user operation event.
Optionally, in another embodiment based on the foregoing method of the present application, after the controlling the smart home terminal to execute the action corresponding to the user operation event, the method further includes:
recording the association relationship and association data between the target video transform domain coefficients and the action executed for the user operation event;
and after combining the association relationship and the association data into model training data, uploading the model training data to a cloud, wherein the model training data is used for training the scene detection network model.
Optionally, in another embodiment based on the foregoing method of the present application, after the combining the association relationship and the association data into model training data and uploading the model training data to a cloud, the method further includes:
after receiving a model update instruction, receiving an updated scene detection network model transmitted from the cloud; and clearing the scene detection network model originally stored in the smart home terminal.
According to another aspect of the embodiments of the present application, there is provided an apparatus for detecting a home scene, including:
the first acquisition module is configured to collect home scene video data for a target area by using a camera device arranged at the smart home terminal;
the second acquisition module is configured to acquire target video transform domain coefficients of the home scene video data, input them into a preset scene detection network model, and detect whether a user operation event exists in the target area;
and the execution module is configured to, if a user operation event exists, control the smart home terminal to execute the action corresponding to the user operation event.
According to another aspect of the embodiments of the present application, there is provided an electronic device including:
a memory for storing executable instructions; and
and a display, configured to cooperate with the memory to execute the executable instructions so as to complete the operations of any one of the above methods for detecting a home scene.
According to a further aspect of the embodiments of the present application, there is provided a computer-readable storage medium for storing computer-readable instructions, which, when executed, perform the operations of any one of the methods for detecting a home scene.
In the application, home scene video data for a target area can be collected by a camera device arranged at the smart home terminal; target video transform domain coefficients of the home scene video data are acquired and input into a preset scene detection network model to detect whether a user operation event exists in the target area; and if so, the smart home terminal is controlled to execute the action corresponding to the user operation event. By applying this technical solution, after the video data is collected by the smart home camera, the unreadable video transform domain coefficients are extracted and the original video data is deleted, so that the home scene is subsequently detected from the transform domain coefficient data and a learning model. This solves the problem that the readability of original video data easily creates safety hazards once the data is leaked.
The technical solution of the present application is further described in detail by the accompanying drawings and examples.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description, serve to explain the principles of the application.
The present application may be more clearly understood from the following detailed description with reference to the accompanying drawings, in which:
fig. 1 is a schematic diagram of a method for detecting a home scene according to the present application;
fig. 2 is a schematic flow chart illustrating a process of converting video data of a home scene into video transform domain coefficients according to the present application;
FIG. 3 is a schematic flow chart illustrating a method for detecting a user operation event using a scene detection network model according to the present application;
FIG. 4 is a schematic view of a model operation flow for detecting a user operation event by using a scene detection network model according to the present application;
fig. 5 is a schematic structural diagram of an electronic device for detecting a home scene according to the present application;
fig. 6 is a schematic structural diagram of an electronic device for detecting a home scene according to the present application.
Detailed Description
Various exemplary embodiments of the present application will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present application unless specifically stated otherwise.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the application, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
In addition, the technical solutions in the embodiments of the present application may be combined with each other, but only insofar as such a combination can be realized by a person skilled in the art; when technical solutions contradict each other or cannot be realized, the combination should be considered not to exist and does not fall within the protection scope claimed in the present application.
It should be noted that all directional indicators (such as upper, lower, left, right, front and rear … …) in the embodiments of the present application are only used to explain the relative position relationship between the components, the movement situation, etc. in a specific posture (as shown in the drawings), and if the specific posture is changed, the directional indicator is changed accordingly.
A method for detecting a home scene according to an exemplary embodiment of the present application is described below with reference to fig. 1 to 4. It should be noted that the following application scenarios are merely illustrated for the convenience of understanding the spirit and principles of the present application, and the embodiments of the present application are not limited in this respect. Rather, embodiments of the present application may be applied to any scenario where applicable.
The application also provides a method and a device for detecting the home scene, electronic equipment and a medium.
Fig. 1 schematically shows a flow chart of a method for detecting a home scene according to an embodiment of the present application. As shown in fig. 1, the method includes:
s101, collecting home scene video data aiming at a target area by using a camera device arranged at the intelligent home terminal.
S102, acquiring a target video transformation domain coefficient of home scene video data, inputting the target video transformation domain coefficient into a preset scene detection network model, and detecting whether a user operation event exists in a target area.
And S103, if the operation event exists, controlling the intelligent home terminal to execute the action corresponding to the user operation event.
In the related art, in smart city applications such as smart homes, visual signals are the main and indispensable information source in every scene, including home monitoring, gesture recognition, face recognition, home medical care, autonomous driving, and live entertainment. With visual signals as the core, mainstream smart home frameworks are built around users, data, software and hardware systems, and the necessary underlying AI and Internet of Things technologies; they increasingly improve people's daily lives and have become an important component of intelligent living.
Although visual signals play an indispensable role in smart homes, in many application scenarios the smart home system directly performs intelligent analysis on the original visual signal data at different processing stages. Clearly, once the sensitive user privacy data in the original visual signal is leaked at some stage, serious hidden privacy risks are unavoidable.
Among visual signals, video data is particularly important, and this way of processing it cannot avoid the privacy hazards that arise at different stages. Privacy security in scenes such as smart homes is now a major concern, and has become an important challenge for more advanced smart home applications.
For these typical problems, video data stream desensitization in current scenes such as smart homes mainly applies further processing to the decoded data of the traditional video coding pipeline. This protects private data to a certain extent and reduces the risk of exposure. However, once the private data is leaked at any stage before decoding, it is directly exposed, which inevitably creates serious hidden privacy risks; moreover, post-decoding processing offers only a limited degree of protection for private content.
To address this problem, after the video data is collected by the smart home camera, the present application collects the unreadable video transform domain coefficients of the video data and removes the original video data, so that the home scene can subsequently be detected from the transform domain coefficient data and a learning model. This solves the problem that the readability of original video data easily creates safety hazards once the data is leaked.
Furthermore, to avoid the safety hazard of video data being cracked and exposed after leakage, in scenes with a large amount of sensitive private data such as smart homes, the present application directly removes the original sensitive video data and extracts only the video transform domain coefficients from the home scene video collected by the smart home camera, so that user instructions in the home scene are subsequently detected from those coefficients.
Specifically, as shown in fig. 2, processing the video data with feature signals may include converting the home scene video data into coefficients by collecting the coding unit (CU) transform domain coefficients, the largest coding unit (LCU) transform domain coefficients, and the single-frame transform domain coefficients of the home scene video data. Further, for many smart home scenarios, common functions include identification, retrieval, and tracking; such functions do not necessarily need to operate on the original data signal. Because video transform domain coefficients usually lack direct human readability and reversibility, even if they are leaked the recipient cannot translate or understand the meaning the data represents, which greatly reduces the risk of cracking and exposure.
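Purely as an illustrative sketch (a real codec such as HEVC uses integer transforms and codec-specific CU partitioning; the plain 8x8 orthonormal DCT and all names below are assumptions), collecting block-level transform domain coefficients from one luma frame could look like:

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix, standing in for a block video transform."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0] *= np.sqrt(1 / n)    # DC row scaling
    m[1:] *= np.sqrt(2 / n)   # AC row scaling
    return m

def block_transform_coeffs(frame: np.ndarray, block: int = 8) -> np.ndarray:
    """Split a frame into block x block units (CU stand-ins) and return the
    2-D DCT coefficients of each; shape (n_blocks_y, n_blocks_x, block, block)."""
    h, w = frame.shape
    d = dct_matrix(block)
    out = np.empty((h // block, w // block, block, block))
    for by in range(h // block):
        for bx in range(w // block):
            cu = frame[by * block:(by + 1) * block, bx * block:(bx + 1) * block]
            out[by, bx] = d @ cu @ d.T  # separable 2-D transform of one block
    return out
```

Only these coefficients would be retained; the pixel-domain `frame` would be discarded, matching the text's removal of the original video data.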
In addition, after the target video transform domain coefficients of the home scene video data are acquired, the home scene video data needs to be cleared. This further avoids the safety hazards that easily arise from leakage of the original video image data.
In addition, the present application can perform correlation analysis between unreadable data such as CDVS/CDVA descriptors and the scene detection network model, converting the unreadable data into an analysis result for each application scenario. This overcomes the difficulty of depending on original data and avoids the privacy disclosure and exposure risks of the original video data.
Specifically, the video transform domain coefficients can be input into a preset scene detection network model to detect whether a user operation event exists in the target area. The data generated throughout this process still has no human readability or reversibility and is unique, so this technical means can prevent the data from being cracked and exposed, effectively reducing the risk to private content.
In one mode, to improve the decision-making performance of the smart home, as shown in fig. 3, when the smart home detects that a manually operated user operation event occurs, the visual feature signal and the manual operation result may be recorded and stored as training data for the scene detection network model. The newest data replaces the oldest, keeping the total amount of stored data unchanged. With this technique, the processed data is transmitted to the cloud for subsequent device updates, function upgrades, and the like. The transmitted data does not depend on the original video data, which effectively avoids the privacy hazards that would arise if the data were intercepted.
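A minimal sketch of the fixed-size training store described above, where the newest record replaces the oldest so total storage stays constant (the class and method names are hypothetical, not from the patent):

```python
from collections import deque

class TrainingDataBuffer:
    """Fixed-capacity store of (transform-domain coefficients, manual operation
    result) pairs; appending beyond capacity silently drops the oldest pair."""

    def __init__(self, capacity: int):
        self._buf = deque(maxlen=capacity)

    def record(self, coeffs, operation_result):
        # Newest data replaces old data once capacity is reached.
        self._buf.append((coeffs, operation_result))

    def export_for_upload(self):
        # Only unreadable transform-domain features leave the device;
        # no raw video is ever included in the upload payload.
        return list(self._buf)
```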
In the application, home scene video data for a target area can be collected by a camera device arranged at the smart home terminal; target video transform domain coefficients of the home scene video data are acquired and input into a preset scene detection network model to detect whether a user operation event exists in the target area; and if so, the smart home terminal is controlled to execute the action corresponding to the user operation event. By applying this technical solution, after the video data is collected by the smart home camera, the unreadable video transform domain coefficients are extracted and the original video data is deleted, so that the home scene is subsequently detected from the transform domain coefficient data and a learning model. This solves the problem that the readability of original video data easily creates safety hazards once the data is leaked.
Optionally, in another embodiment based on the foregoing method of the present application, the acquiring target video transform domain coefficients of the home scene video data includes:
collecting the coding unit (CU) transform domain coefficients of the home scene video data; collecting the largest coding unit (LCU) transform domain coefficients of the home scene video data; and collecting the single-frame transform domain coefficients of the home scene video data;
and merging the CU transform domain coefficients, the LCU transform domain coefficients, and the single-frame transform domain coefficients into the target video transform domain coefficients.
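The text does not fix how the three coefficient sets are merged; one simple assumption is plain concatenation of the flattened arrays into a single feature vector:

```python
import numpy as np

def merge_transform_coeffs(cu: np.ndarray, lcu: np.ndarray,
                           frame: np.ndarray) -> np.ndarray:
    """Concatenate flattened CU-, LCU- and single-frame-level coefficients into
    one vector (the 'target video transform domain coefficients'). The
    concatenation order is an illustrative assumption."""
    return np.concatenate([np.ravel(cu), np.ravel(lcu), np.ravel(frame)])
```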
Optionally, in another embodiment based on the foregoing method of the present application, the acquiring target video transform domain coefficients of the home scene video data includes:
decoding the home scene video data to obtain decoded video data;
and acquiring the target video transform domain coefficient of the decoded video data.
In line with the stated goal and the problem to be solved, in scenes with a large amount of sensitive private data such as smart homes, the present application directly abandons the traditional full video decoding pipeline in favor of collecting the transform domain coefficients during decoding. Concretely, in processing the video data, the first half is handled by traditional video decoding, while the second half collects each video transform domain coefficient of the video data in turn (for example, the CU transform domain coefficients, the LCU transform domain coefficients, and the single-frame transform domain coefficients). It can be understood that with this processing, the video code stream never needs to be completely decoded: no reconstructed file is generated, which avoids the risk of privacy disclosure from reconstructed file data.
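A toy sketch of this "first half only" decoding; the `entropy_decode` and `dequantize` stand-ins below are gross simplifications of real codec internals (e.g. CABAC) and exist only to show where the pipeline stops:

```python
def entropy_decode(block_bytes: bytes) -> list:
    # Toy stand-in for entropy decoding: each byte encodes one signed level.
    return [b - 128 for b in block_bytes]

def dequantize(levels: list, qstep: int = 4) -> list:
    # Toy stand-in for inverse quantization with a fixed step size.
    return [lvl * qstep for lvl in levels]

def extract_coeffs_without_reconstruction(bitstream_blocks):
    """Run only the front half of a decoder (entropy decode + dequantize) per
    block, then stop: no inverse transform is applied, so no viewable
    reconstructed frame ever exists."""
    return [dequantize(entropy_decode(blk)) for blk in bitstream_blocks]
```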
Optionally, in another embodiment based on the foregoing method of the present application, after the acquiring the target video transform domain coefficient of the home scene video data, the method further includes:
and clearing the video data of the home scene.
Optionally, in another embodiment based on the foregoing method of the present application, the inputting the target video transform domain coefficient to a preset scene detection network model, and detecting whether a user operation event exists in the target area includes:
determining, by recognizing the target video transform domain coefficients, whether human body features exist in the target area where the smart home terminal is currently located, wherein the human body features include at least one of a size feature, a color feature, and a contour feature;
and if the human body characteristics are determined to exist, inputting the target video transform domain coefficient into a preset scene detection network model, and detecting the user operation event.
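The two-stage gating above can be sketched as follows; the energy-based body cue is an assumption for illustration, not the patent's actual feature test:

```python
import numpy as np

def has_body_features(coeffs: np.ndarray, energy_threshold: float = 10.0) -> bool:
    """Toy body-cue test: treat high total coefficient energy as evidence of a
    foreground (size/colour/contour) feature. Threshold is an assumption."""
    return float(np.abs(coeffs).sum()) > energy_threshold

def detect_user_operation(coeffs: np.ndarray, scene_model):
    """Gate from the text: run the heavier scene detection network only when
    body features are first found in the transform-domain data."""
    if not has_body_features(coeffs):
        return None           # nobody present: skip model inference entirely
    return scene_model(coeffs)  # full user-operation-event detection
```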
In the process of inputting the target video transform domain coefficients into the preset scene detection network model to detect whether a user operation event exists in the target area, the method may include the following steps:
The input of the scene detection network model is the down-sampled target video transform domain coefficients, where down-sampling removes the influence of resolution in the home scene image so as to fix the input size of the neural network model.
The down-sampled target video transform domain coefficients are then input into the network structure for computation; the present application does not specifically limit the network structure of the scene detection network model.
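A minimal down-sampling step that maps coefficient maps of any camera resolution to one fixed network input size (the 64x64 target and nearest-neighbour scheme are illustrative assumptions):

```python
import numpy as np

def downsample_to_fixed(coeff_map: np.ndarray,
                        out_h: int = 64, out_w: int = 64) -> np.ndarray:
    """Nearest-neighbour down-sampling of a 2-D coefficient map, so inputs of
    any resolution share one fixed neural-network input size."""
    h, w = coeff_map.shape
    ys = np.arange(out_h) * h // out_h   # source row index per output row
    xs = np.arange(out_w) * w // out_w   # source column index per output column
    return coeff_map[np.ix_(ys, xs)]
```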
Further, the prediction result output by the scene detection network model may specify the type and location of an object. Whether a user operation event exists in the target area is then determined by identifying the type and location of the object.
In one approach, the model output can be defined as:
Y = [P_c, b_x, b_y, b_h, b_w, C_1, C_2, C_3]^T
where P_c is the probability that an object is present in the image; b_x, b_y, b_h, b_w are the position parameters of the corresponding object; and C_1, C_2, C_3 correspond to the categories of specific objects.
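Given that output definition, splitting one prediction vector into objectness, box, and class could be sketched as follows (the 0.5 objectness threshold is an assumption):

```python
import numpy as np

def parse_detection(y: np.ndarray, threshold: float = 0.5):
    """Split Y = [P_c, b_x, b_y, b_h, b_w, C_1, C_2, C_3]^T into its parts.
    Returns None when no object is confidently present."""
    p_c = y[0]
    if p_c < threshold:
        return None                # P_c too low: no object in the frame
    box = tuple(y[1:5])            # (b_x, b_y, b_h, b_w) position parameters
    cls = int(np.argmax(y[5:8]))   # index of the winning category C_1..C_3
    return p_c, box, cls
```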
Optionally, in another embodiment based on the foregoing method of the present application, after the controlling the smart home terminal to execute the action corresponding to the user operation event, the method further includes:
recording the association relationship and association data between the target video transform domain coefficients and the action executed for the user operation event;
and after combining the association relationship and the association data into model training data, uploading the model training data to a cloud, wherein the model training data is used for training the scene detection network model.
Optionally, in another embodiment based on the foregoing method of the present application, after the combining the association relationship and the association data into model training data and uploading the model training data to a cloud, the method further includes:
after receiving a model update instruction, receiving an updated scene detection network model transmitted from the cloud; and clearing the scene detection network model originally stored in the smart home terminal.
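A minimal sketch (hypothetical API, not from the patent) of the update step on the terminal: install the cloud-delivered network and discard the one originally stored.

```python
class ModelStore:
    """Holds the scene detection network currently deployed on the terminal."""

    def __init__(self, model):
        self._model = model

    def apply_update(self, new_model):
        # Replacing the reference clears the originally stored model; after
        # this call the terminal holds only the updated network.
        self._model = new_model

    def current(self):
        return self._model
```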
In one mode, after recording the association relationship and association data between the target video transform domain coefficients and the action executed for the user operation event, the application may also use them as model training data to train the scene detection network model through continuous updating. It can be understood that, in the related art, transform domain coefficients cannot directly be used to detect application scenarios such as user operation events. Therefore, to apply the target transform domain coefficients to a concrete smart home detection scene, the application needs to perform correlation analysis on the target transform domain data using the device training data in the smart home, thereby generating a network model that can achieve target detection based on transform domain data.
With this scheme, target detection can be achieved using video decoding transform domain coefficients, avoiding the smart home product's dependence on original data. The data generated by this technique, from the receiving stage through processing to the generation of analysis results, has no human readability or reversibility and is unique; even if information is leaked, it cannot be cracked, which greatly reduces the risk of private content exposure.
Furthermore, after obtaining a trained scene detection network model that can perform detection from transform domain coefficients, the application can also compress the model to obtain a compressed scene detection network model with a smaller data architecture, avoiding the drawback that an overly large model would occupy a large amount of smart home memory.
Optionally, the application may compress the scene detection network model directly, for example by means of two techniques: kernel sparsification and model pruning. Kernel sparsification requires the support of sparse computation libraries, and its acceleration effect may be limited by many factors, such as bandwidth and sparsity. Model pruning, by contrast, removes unimportant filter parameters directly from the original model. Because neural networks are highly adaptive, and models with large data architectures tend to be redundant, the performance lost by removing parameters can be recovered through retraining. Therefore, with suitable pruning and retraining, the model can be compressed substantially on the basis of the existing model; this is currently the most commonly used approach.
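The pruning step described above can be illustrated with magnitude-based pruning: zero out the fraction of weights with the smallest absolute value, after which a retraining pass would normally recover the lost accuracy. This is a generic sketch of the technique, not the patent's specific compression method; the weight matrix below is invented.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest
    absolute value; in practice a retraining pass would follow."""
    flat = sorted(abs(w) for row in weights for w in row)
    k = int(len(flat) * sparsity)
    threshold = flat[k - 1] if k > 0 else float("-inf")
    return [[0.0 if abs(w) <= threshold else w for w in row]
            for row in weights]

# Toy 2x3 weight matrix: half the weights are near zero and get pruned.
w = [[0.01, -0.8, 0.02], [0.5, -0.03, 0.9]]
pruned = prune_by_magnitude(w, 0.5)
```

The surviving large-magnitude weights are unchanged, so the pruned model's behavior stays close to the original before retraining even begins.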
Furthermore, after the compressed scene detection network model with its smaller data architecture is obtained, it can be deployed on the smart home device. The smart home device can then use the compressed scene detection network model to recognize home scene video data collected by its camera device, so that the video data can subsequently be converted into the corresponding video transform domain coefficients.
In addition, the scene detection network model with the larger data architecture can be deployed on the server, so that the recognition mode is determined based on the running state of the smart home device, and the corresponding detection network model is then selected in a targeted manner to detect whether a user operation event exists in the target area.
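Selecting between the on-device compressed model and the server-side full model based on the device's running state might look like the following. The load and memory thresholds, and the model names, are illustrative assumptions and not part of the original disclosure.

```python
def choose_model(cpu_load, free_memory_mb,
                 load_threshold=0.7, memory_threshold_mb=128):
    """Pick the on-device compressed model when resources allow,
    otherwise fall back to the full model deployed on the server."""
    if cpu_load < load_threshold and free_memory_mb >= memory_threshold_mb:
        return "local_compressed_model"
    return "server_full_model"
```

A real implementation would also weigh network availability and latency, since falling back to the server implies uploading coefficient data.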
In one implementation, using the target transform domain coefficients for smart home scene detection means the video bitstream does not need to be fully decoded and no reconstructed file is generated, which avoids the risk of privacy disclosure through reconstructed files and protects private video data to a certain extent. In addition, the scene detection network model obtained from a neural network model avoids dependence on the original video data while still producing predictions for the target detection scenario, achieving both video privacy protection and smart home product intelligence. Meanwhile, the method can retain the latest transform domain coefficient data and transmit the processed coefficients to the cloud for subsequent device updates, feature upgrades, and the like, alleviating the timeliness problem of smart home devices. The application can thus support more scene applications, such as home monitoring and face recognition.
By applying this technical scheme, after video data is collected by the smart home camera, the unreadable video transform domain coefficients of the video data are extracted and the original video data is cleared, so that home scenes are subsequently detected from the transform domain coefficient data and the learning model. This avoids the potential safety hazards that arise when readable original video data is leaked.
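The core idea, keeping only transform domain coefficients and discarding the readable pixels, can be sketched with a 1-D DCT-II standing in for the codec's transform. In the patent's scheme the coefficients would come from the decoder's entropy stage rather than being recomputed as here; this toy version only shows that the readable data can be cleared once the coefficients exist.

```python
import math

def dct_1d(block):
    """Type-II DCT: converts readable samples into transform domain
    coefficients (the only representation retained for analysis)."""
    n = len(block)
    return [sum(block[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i in range(n))
            for k in range(n)]

def ingest(frame_rows):
    """Extract coefficients row by row, then clear the raw frame."""
    coeffs = [dct_1d(row) for row in frame_rows]
    frame_rows.clear()   # the original (readable) video data is cleared
    return coeffs

# A flat row of samples: all energy lands in the DC coefficient.
frame = [[10.0, 10.0, 10.0, 10.0]]
c = ingest(frame)
```

After `ingest` returns, only the coefficient representation survives; the frame buffer holding readable pixels is empty.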
Optionally, in another embodiment of the present application, as shown in FIG. 5, the present application further provides a device for detecting a home scene, which comprises:
a first acquisition module 201, configured to acquire home scene video data for a target area by using a camera device arranged on the smart home terminal;
a second acquisition module 202, configured to acquire target video transform domain coefficients of the home scene video data, input the target video transform domain coefficients into a preset scene detection network model, and detect whether a user operation event exists in the target area;
and an execution module 203, configured to control the smart home terminal to execute the action corresponding to the user operation event if such an event exists.
In the present application, home scene video data for a target area can be acquired by using a camera device arranged on the smart home terminal; target video transform domain coefficients of the home scene video data are acquired and input into a preset scene detection network model to detect whether a user operation event exists in the target area; and if so, the smart home terminal is controlled to execute the action corresponding to the user operation event. By applying this technical scheme, after video data is collected by the smart home camera, the unreadable video transform domain coefficients of the video data are extracted and the original video data is cleared, so that home scenes are subsequently detected from the transform domain coefficient data and the learning model. This avoids the potential safety hazards that arise when readable original video data is leaked.
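The three modules can be tied together in a minimal pipeline sketch. The camera capture, coefficient extractor, detection model, and action table below are all hypothetical stubs; only the control flow mirrors the device described in the text.

```python
class HomeSceneDetector:
    """Sketch of the acquisition -> coefficients -> detection -> action flow."""

    def __init__(self, extract_coeffs, model, actions):
        self.extract_coeffs = extract_coeffs   # coefficient acquisition step
        self.model = model                     # preset scene detection model
        self.actions = actions                 # execution module's action table

    def process(self, video_data):
        coeffs = self.extract_coeffs(video_data)   # raw video no longer needed
        event = self.model(coeffs)                 # e.g. "light_on" or None
        if event is not None:
            return self.actions[event]()           # execute the matching action
        return None

# Stub wiring for illustration only.
detector = HomeSceneDetector(
    extract_coeffs=lambda video: sum(video),           # toy "coefficients"
    model=lambda c: "light_on" if c > 10 else None,    # toy threshold model
    actions={"light_on": lambda: "turned light on"},
)
result = detector.process([5, 4, 3])
```

The point of the structure is that `model` only ever sees coefficients, never the `video_data` pixels themselves.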
In another embodiment of the present application, the second acquisition module 202 is configured to perform the following steps:
collecting coding unit (CU) transform domain coefficients of the home scene video data; collecting largest coding unit (LCU) transform domain coefficients of the home scene video data; and collecting single-frame transform domain coefficients of the home scene video data;
merging the CU transform domain coefficients, the LCU transform domain coefficients, and the single-frame transform domain coefficients into the target video transform domain coefficients.
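Merging the three coefficient sets into a single target vector can be as simple as concatenating their flattened blocks. The flattening layout below is an assumption, since the patent does not specify how the merge is laid out.

```python
def merge_coefficients(cu_coeffs, lcu_coeffs, frame_coeffs):
    """Flatten and concatenate CU-, LCU-, and single-frame-level
    transform domain coefficients into one target feature vector."""
    target = []
    for group in (cu_coeffs, lcu_coeffs, frame_coeffs):
        for block in group:        # each block is one unit's coefficients
            target.extend(block)
    return target

# Toy coefficient blocks at the three granularities.
cu = [[1, 2], [3]]
lcu = [[4, 5]]
frame = [[6]]
vec = merge_coefficients(cu, lcu, frame)
```

Keeping a fixed group order (CU, then LCU, then frame) matters: the downstream model expects features at stable positions.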
In another embodiment of the present application, the second acquisition module 202 is configured to perform the following steps:
decoding the home scene video data to obtain decoded video data;
and acquiring the target video transform domain coefficient of the decoded video data.
In another embodiment of the present application, the second acquisition module 202 is configured to perform the following steps:
and clearing the video data of the home scene.
In another embodiment of the present application, the second acquisition module 202 is configured to perform the following steps:
determining, according to recognition of the target video transform domain coefficients, whether human body features exist in the target area where the smart home terminal is currently located, the human body features including at least one of size features, color features, and contour features;
and if the human body features are determined to exist, inputting the target video transform domain coefficients into the preset scene detection network model to detect the user operation event.
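This two-stage check, a cheap human-feature test before running the full model, might be sketched as follows. The feature heuristics below are invented placeholders for the size/color/contour features named in the text.

```python
def has_human_features(coeffs, size_min=4, contour_min=2):
    """Cheap pre-check on coefficient statistics standing in for the
    size and contour features; a real check would be codec-specific."""
    size_score = sum(1 for c in coeffs if abs(c) > 1)       # "size" proxy
    contour_score = sum(1 for a, b in zip(coeffs, coeffs[1:])
                        if (a > 0) != (b > 0))              # sign changes
    return size_score >= size_min or contour_score >= contour_min

def detect_event(coeffs, model):
    """Run the (expensive) scene detection model only when the
    pre-check suggests a person is present in the target area."""
    if not has_human_features(coeffs):
        return None
    return model(coeffs)

event = detect_event([3, -2, 4, -5, 2], lambda c: "event")
```

The gate saves inference cost on empty scenes: low-energy coefficient vectors never reach the model at all.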
In another embodiment of the present application, the execution module 203 is configured to perform the following steps:
recording the association relationship and association data between the target video transform domain coefficients and the action executed for the user operation event;
and after the association relationship and the association data are combined into model training data, uploading the model training data to a cloud, wherein the model training data are used for training the scene detection network model.
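Recording the association and packaging it for upload could look like the following; the record fields and the JSON payload shape are assumptions made for illustration.

```python
import json
import time

def build_training_record(coeffs, action, extra=None):
    """Combine the coefficient/action association and its context data
    into one model-training record ready for upload to the cloud."""
    record = {
        "transform_domain_coeffs": coeffs,
        "executed_action": action,
        "timestamp": time.time(),   # hypothetical association data
    }
    if extra:
        record.update(extra)
    return json.dumps(record)

payload = build_training_record([1.5, -0.2], "turn_on_light")
```

Since only coefficients (not pixels) appear in the payload, the uploaded training data preserves the privacy property claimed for the rest of the pipeline.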
In another embodiment of the present application, the execution module 203 is configured to perform the following steps:
after a model update instruction is received, receiving an updated scene detection network model transmitted from the cloud; and clearing the scene detection network model originally stored on the smart home terminal.
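The update flow, receive the new model and clear the old one, can be sketched with a simple in-memory model store; the store layout and key name are illustrative.

```python
def apply_model_update(store, new_model, key="scene_detection_model"):
    """Replace the locally stored scene detection model with the
    updated model received from the cloud, clearing the old one."""
    old = store.pop(key, None)   # clear the originally stored model
    store[key] = new_model       # install the cloud-delivered update
    return old                   # returned so the caller can log/verify

store = {"scene_detection_model": "v1"}
replaced = apply_model_update(store, "v2")
```

Returning the replaced model makes the swap auditable; a real terminal would also verify the download's integrity before clearing the old model.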
FIG. 6 is a block diagram illustrating a logical structure of an electronic device in accordance with an exemplary embodiment. For example, the electronic device 300 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium, such as a memory, including instructions executable by a processor of an electronic device to perform the method of detecting a home scene described above, the method including: acquiring home scene video data for a target area by using a camera device arranged on a smart home terminal; acquiring target video transform domain coefficients of the home scene video data, inputting the target video transform domain coefficients into a preset scene detection network model, and detecting whether a user operation event exists in the target area; and if so, controlling the smart home terminal to execute the action corresponding to the user operation event. Optionally, the instructions may also be executable by the processor of the electronic device to perform the other steps involved in the exemplary embodiments described above. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
In an exemplary embodiment, there is also provided an application/computer program product including one or more instructions executable by a processor of an electronic device to perform the method of detecting a home scene described above, the method including: acquiring home scene video data for a target area by using a camera device arranged on a smart home terminal; acquiring target video transform domain coefficients of the home scene video data, inputting the target video transform domain coefficients into a preset scene detection network model, and detecting whether a user operation event exists in the target area; and if so, controlling the smart home terminal to execute the action corresponding to the user operation event. Optionally, the instructions may also be executable by the processor of the electronic device to perform the other steps involved in the exemplary embodiments described above.
Those skilled in the art will appreciate that FIG. 6 is merely an example of the electronic device 300 and does not constitute a limitation on it; the device may include more or fewer components than shown, combine certain components, or use different components. For example, the electronic device 300 may also include input/output devices, network access devices, buses, and the like.
The processor 302 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like. A general-purpose processor may be a microprocessor, or the processor 302 may be any conventional processor. The processor 302 is the control center of the electronic device 300 and connects the various parts of the entire electronic device 300 using various interfaces and lines.
The memory 301 may be used to store computer readable instructions, and the processor 302 implements the various functions of the electronic device 300 by running or executing the computer readable instructions or modules stored in the memory 301 and by invoking data stored in the memory 301. The memory 301 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required by at least one function (such as a sound playing function or an image playing function), and the data storage area may store data created according to the use of the electronic device 300. In addition, the memory 301 may include a hard disk, a memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash memory card, at least one magnetic disk storage device, a flash memory device, a read-only memory (ROM), a random access memory (RAM), or another non-volatile/volatile storage device.
If the modules integrated in the electronic device 300 are implemented in the form of software functional modules and sold or used as separate products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the method embodiments described above may be implemented by means of computer readable instructions, which may be stored in a computer-readable storage medium; when the computer readable instructions are executed by a processor, the steps of the method embodiments described above are realized.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention that follow the general principles of the application, including such departures from the present disclosure as come within known or customary practice in the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (10)

1. A method for detecting a home scene is characterized by comprising the following steps:
acquiring home scene video data aiming at a target area by utilizing a camera device arranged at an intelligent home terminal;
acquiring a target video transform domain coefficient of the home scene video data, inputting the target video transform domain coefficient into a preset scene detection network model, and detecting whether a user operation event exists in the target area;
and if so, controlling the intelligent home terminal to execute the action corresponding to the user operation event.
2. The method of claim 1, wherein said capturing target video transform domain coefficients of the home scene video data comprises:
collecting a transform domain coefficient of a coding unit CU of home scene video data; acquiring a Largest Coding Unit (LCU) transform domain coefficient of the home scene video data; acquiring single-frame transform domain coefficients of the home scene video data;
merging the CU transform domain coefficients, the LCU transform domain coefficients, and the single frame transform domain coefficients as the target video transform domain coefficients.
3. The method of claim 1 or 2, wherein said capturing target video transform domain coefficients of the home scene video data comprises:
decoding the home scene video data to obtain decoded video data;
and acquiring the target video transform domain coefficient of the decoded video data.
4. The method of claim 2, wherein after said capturing target video transform domain coefficients of the home scene video data, further comprising:
and clearing the video data of the home scene.
5. The method of claim 1, wherein the inputting the target video transform domain coefficient into a preset scene detection network model and detecting whether a user operation event exists in the target area comprises:
determining whether human body characteristics exist in a target area where the intelligent home terminal is located currently according to the identification of the target video transform domain coefficient, wherein the human body characteristics comprise at least one of size characteristics, color characteristics and contour characteristics;
and if the human body characteristics are determined to exist, inputting the target video transform domain coefficient into a preset scene detection network model, and detecting the user operation event.
6. The method according to claim 1, wherein after the controlling the smart home terminal to execute the action corresponding to the user operation event, the method further comprises:
recording the association relationship and association data between the target video transform domain coefficient and the action of executing the user operation event;
and after the association relationship and the association data are combined into model training data, uploading the model training data to a cloud, wherein the model training data are used for training the scene detection network model.
7. The method of claim 6, wherein after the association relationship and the association data are merged into model training data and the model training data are uploaded to a cloud, the method further comprises:
after a model update instruction is received, receiving an updated scene detection network model transmitted from the cloud; and clearing the scene detection network model originally stored on the smart home terminal.
8. An apparatus for detecting a home scene, comprising:
the first acquisition module is configured to acquire home scene video data aiming at a target area by utilizing a camera device arranged at the intelligent home terminal;
the second acquisition module is configured to acquire a target video transform domain coefficient of the home scene video data, input the target video transform domain coefficient into a preset scene detection network model, and detect whether a user operation event exists in the target area;
and the execution module is configured to control the smart home terminal to execute the action corresponding to the user operation event if the user operation event exists.
9. An electronic device, comprising:
a memory for storing executable instructions; and the number of the first and second groups,
a processor for executing the executable instructions with the memory to perform the operations of the method of detecting a home scenario of any of claims 1-7.
10. A computer-readable storage medium storing computer-readable instructions, wherein the instructions, when executed, perform the operations of the method for detecting a home scene of any one of claims 1-7.
CN202210197907.6A 2022-03-01 2022-03-01 Method and device for detecting home scene, electronic equipment and medium Pending CN114842319A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210197907.6A CN114842319A (en) 2022-03-01 2022-03-01 Method and device for detecting home scene, electronic equipment and medium


Publications (1)

Publication Number Publication Date
CN114842319A true CN114842319A (en) 2022-08-02

Family

ID=82561539


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080062311A1 (en) * 2006-09-13 2008-03-13 Jiliang Song Methods and Devices to Use Two Different Clocks in a Television Digital Encoder
CN101610396A (en) * 2008-06-16 2009-12-23 北京智安邦科技有限公司 Intellective video monitoring device module and system and method for supervising thereof with secret protection
US20160321891A1 (en) * 2014-11-10 2016-11-03 Sengled Optoelectronics Co., Ltd. Method, apparatus, and system for controlling smart home environment using led lighting device
US10636451B1 (en) * 2018-11-09 2020-04-28 Tencent America LLC Method and system for video processing and signaling in transitional video scene
CN111752165A (en) * 2020-07-10 2020-10-09 广州博冠智能科技有限公司 Intelligent equipment control method and device of intelligent home system
CN111935435A (en) * 2020-07-28 2020-11-13 深圳市鼎盛光电有限公司 Video file encryption method and device, digital television equipment and storage medium
CN112380946A (en) * 2020-11-09 2021-02-19 上海泗科智能科技有限公司 Fall detection method and device based on end-side AI chip


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
PENG ZHANG, ET AL.: "An object-based video authentication mechanism for smart-living surveillance", IEEE, 31 December 2015 (2015-12-31) *
TONG LINGLING; LI YANGXI; HUANG WENTING: "A survey of video privacy protection technology", Journal on Communications, no. 08, 25 August 2013 (2013-08-25) *
DANG QING: "Application of video surveillance technology in smart homes", Silicon Valley, no. 21, 8 November 2009 (2009-11-08) *
WANG WEI ET AL.: "Anomaly detection and normal behavior classification of pedestrian behavior", Computer Engineering and Applications, 31 December 2010 (2010-12-31) *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination