CN114519818A - Method and device for detecting home scene, electronic equipment and medium - Google Patents

Method and device for detecting home scene, electronic equipment and medium Download PDF

Info

Publication number
CN114519818A
CN114519818A (Application CN202210044678.4A)
Authority
CN
China
Prior art keywords
scene
network model
detection network
user operation
home
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210044678.4A
Other languages
Chinese (zh)
Inventor
张鹏
向国庆
刘浩宁
卢东东
黄晓峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced Institute of Information Technology AIIT of Peking University
Hangzhou Weiming Information Technology Co Ltd
Original Assignee
Advanced Institute of Information Technology AIIT of Peking University
Hangzhou Weiming Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Advanced Institute of Information Technology AIIT of Peking University, Hangzhou Weiming Information Technology Co Ltd filed Critical Advanced Institute of Information Technology AIIT of Peking University
Priority to CN202210044678.4A priority Critical patent/CN114519818A/en
Publication of CN114519818A publication Critical patent/CN114519818A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 — Pattern recognition
    • G06F 18/20 — Analysing
    • G06F 18/21 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/40 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video transcoding, i.e. partial or full decoding of a coded input stream followed by re-encoding of the decoded output stream
    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 — Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/02 — Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a method and apparatus for detecting a home scene, an electronic device, and a medium. In the application, a plurality of home-scene images of a target area can be captured by a camera device arranged at a smart-home terminal; the plurality of home-scene images are converted into corresponding visual descriptor signals, the visual descriptor signals are input into a preset scene detection network model, and whether a user operation event exists in the target area is detected; if so, the smart-home terminal is controlled to execute the action corresponding to the user operation event. By applying this technical solution, after video data is collected by the smart-home camera, the video data is converted into non-human-readable feature-signal data and the original video data is erased, so that the home scene is subsequently detected from the feature-signal data and the learning model. This resolves the safety hazards that arise when readable original video data is leaked.

Description

Method and device for detecting home scene, electronic equipment and medium
Technical Field
The present application relates to data processing technology, and in particular to a method and apparatus for detecting a home scene, an electronic device, and a medium.
Background
In smart-city applications such as the smart home, visual signals are the principal and indispensable information source in every scene; application scenarios include home monitoring, gesture recognition, face recognition, home healthcare, autonomous driving, live entertainment, and the like.
Although the visual signal plays an essential role in the smart home, in many application scenarios the smart-home system performs intelligent analysis directly on the original visual-signal data at its various processing stages. As a result, once sensitive user privacy data is present in the original video image signal, a significant privacy and security risk arises.
Disclosure of Invention
The embodiments of the present application provide a method and apparatus for detecting a home scene, an electronic device, and a medium, so as to solve the problem in the related art that user privacy content in video images cannot be handled properly.
According to one aspect of the embodiments of the present application, there is provided a method for detecting a home scene, including:
capturing a plurality of home-scene images of a target area by a camera device arranged at a smart-home terminal;
converting the plurality of home-scene images into corresponding visual descriptor signals, inputting the visual descriptor signals into a preset scene detection network model, and detecting whether a user operation event exists in the target area, wherein the visual descriptor signals are CDVS signals or CDVA signals;
and if so, controlling the smart-home terminal to execute the action corresponding to the user operation event.
Optionally, in another embodiment based on the foregoing method of the present application, after converting the plurality of home-scene images into a CDVS or CDVA signal, the method further includes:
and clearing the home scene image.
Optionally, in another embodiment of the method according to the present application, the inputting the visual descriptor signal to a preset scene detection network model, and detecting whether there is a user operation event in the target area includes:
determining whether human body characteristics exist in a target area where the intelligent home terminal is located currently according to the identification of the visual descriptor signal, wherein the human body characteristics comprise at least one of size characteristics, color characteristics and contour characteristics;
and if the human body characteristics are determined to exist, inputting the visual descriptor signal to a preset scene detection network model, and detecting the user operation event.
Optionally, in another embodiment of the method based on the present application, after the controlling the smart home terminal to execute the action corresponding to the user operation event, the method further includes:
recording the association between the visual descriptor signal and the action executed for the user operation event, together with the associated data;
and combining the association and the associated data into model training data and uploading the model training data to a cloud, wherein the model training data are used for training the scene detection network model.
Optionally, in another embodiment based on the foregoing method of the present application, before the capturing, by a camera device disposed at the smart home terminal, a plurality of images of a home scene for a target area, the method further includes:
acquiring at least two sample images, wherein the sample images comprise at least one user operation instruction feature;
marking a corresponding user operation event for each sample image based on the user operation instruction characteristics;
and training a preset image semantic segmentation model by using the sample image marked with the user operation event and the user operation instruction characteristics included in the sample image to obtain the scene detection network model meeting preset conditions.
Optionally, in another embodiment based on the foregoing method of the present application, after obtaining the scene detection network model satisfying a preset condition, the method further includes:
carrying out model compression on the scene detection network model to obtain a compressed scene detection network model;
deploying the scene detection network model to a server side, and deploying the compressed scene detection network model to the intelligent home terminal;
after an operation instruction is obtained, determining an identification mode based on the operation state of the intelligent home terminal, wherein the identification mode is identified by utilizing the scene detection network model or the compressed scene detection network model;
and determining to input the visual descriptor signal to a preset scene detection network model based on the identification mode.
According to another aspect of the embodiments of the present application, there is provided an apparatus for detecting a home scene, including:
an acquisition module, configured to capture a plurality of home-scene images of a target area by a camera device arranged at a smart-home terminal;
the conversion module is configured to convert the multiple home scene images into corresponding visual descriptor signals, input the visual descriptor signals into a preset scene detection network model, and detect whether a user operation event exists in the target area, wherein the visual descriptor signals are one of CDVS signals or CDVA signals;
and an execution module, configured to, if the user operation event exists, control the smart-home terminal to execute the action corresponding to the user operation event.
According to another aspect of the embodiments of the present application, there is provided an electronic device including:
a memory for storing executable instructions; and
a processor, configured to execute the executable instructions in cooperation with the memory, so as to complete the operations of any one of the above methods for detecting a home scene.
According to a further aspect of the embodiments of the present application, there is provided a computer-readable storage medium for storing computer-readable instructions, which, when executed, perform the operations of any one of the methods for detecting a home scene.
In the application, a plurality of home-scene images of a target area can be captured by a camera device arranged at a smart-home terminal; the plurality of home-scene images are converted into corresponding visual descriptor signals, the visual descriptor signals are input into a preset scene detection network model, and whether a user operation event exists in the target area is detected; if so, the smart-home terminal is controlled to execute the action corresponding to the user operation event. By applying this technical solution, after video data is collected by the smart-home camera, the video data is converted into non-human-readable feature-signal data and the original video data is erased, so that the home scene is subsequently detected from the feature-signal data and the learning model. This resolves the safety hazards that arise when readable original video data is leaked.
The technical solution of the present application is further described in detail by the accompanying drawings and examples.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description, serve to explain the principles of the application.
The present application may be more clearly understood from the following detailed description with reference to the accompanying drawings, in which:
fig. 1 is a schematic diagram of a method for detecting a home scene according to the present application;
fig. 2 is a schematic flow chart of converting a home scene image into a visual descriptor signal according to the present application;
FIG. 3 is a schematic flow chart of a method for training a scene detection network model according to the present application;
fig. 4 is a schematic structural diagram of an electronic device for detecting a home scene according to the present application;
fig. 5 is a schematic structural diagram of an electronic device for detecting a home scene according to the present application.
Detailed Description
Various exemplary embodiments of the present application will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present application unless specifically stated otherwise.
Meanwhile, it should be understood that, for ease of description, the sizes of the parts shown in the drawings are not drawn to actual scale.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the application, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
In addition, the technical solutions of the various embodiments of the present application may be combined with one another, provided that the combination can be implemented by a person skilled in the art; where a combination of technical solutions is contradictory or cannot be implemented, the combination should be deemed absent and outside the protection scope of the present application.
It should be noted that all directional indicators in the embodiments of the present application (such as up, down, left, right, front, and rear) are used only to explain the relative positional relationship, motion, and so on between components in a specific posture (as shown in the drawings); if the specific posture changes, the directional indicator changes accordingly.
A method for detecting a home scene according to an exemplary embodiment of the present application is described below with reference to fig. 1 to 3. It should be noted that the following application scenarios are merely illustrated for the convenience of understanding the spirit and principles of the present application, and the embodiments of the present application are not limited in this respect. Rather, embodiments of the present application may be applied to any scenario where applicable.
The application also provides a method and a device for detecting the home scene, electronic equipment and a medium.
Fig. 1 schematically shows a flow chart of a method for detecting a home scene according to an embodiment of the present application. As shown in fig. 1, the method includes:
s101, collecting a plurality of home scene images aiming at a target area by using a camera device arranged at an intelligent home terminal.
S102, converting the multiple home scene images into corresponding visual descriptor signals, inputting the visual descriptor signals into a preset scene detection network model, and detecting whether a user operation event exists in a target area, wherein the visual descriptor signals are CDVS signals or CDVA signals.
And S103, if the user operation event exists, controlling the intelligent home terminal to execute the action corresponding to the user operation event.
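As an illustrative sketch only (not part of the patent disclosure), steps S101–S103 can be outlined as follows. The descriptor extraction, the detection model, the event label, and the threshold are all hypothetical stand-ins: real CDVS/CDVA extraction and a trained scene detection network would replace them.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class DescriptorSignal:
    """Compact, non-human-readable feature signal (stand-in for CDVS/CDVA)."""
    features: List[float]

def to_descriptor(images: List[List[int]]) -> DescriptorSignal:
    """S102a: convert raw frames into a compact descriptor signal.
    A toy per-frame mean stands in for real CDVS/CDVA extraction."""
    feats = [sum(img) / len(img) for img in images]
    return DescriptorSignal(features=feats)

def detect_user_operation(signal: DescriptorSignal, threshold: float = 100.0) -> Optional[str]:
    """S102b: stand-in for the preset scene detection network model.
    Returns a hypothetical event label when a user operation is detected."""
    return "light_switch_pressed" if max(signal.features) > threshold else None

def run_pipeline(images: List[List[int]]) -> Optional[str]:
    signal = to_descriptor(images)
    images.clear()  # erase the readable original frames (the privacy step of S102)
    event = detect_user_operation(signal)
    if event is not None:
        return f"execute:{event}"  # S103: trigger the corresponding terminal action
    return None
```

For example, `run_pipeline([[120, 130, 140], [10, 20, 30]])` returns `"execute:light_switch_pressed"`, and the caller's frame list is emptied as a side effect, so no readable video survives the call.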
In the related art, in smart-home and other smart-city application scenes, visual signals are the principal and indispensable information source in each scene; applications include home monitoring, gesture recognition, face recognition, home healthcare, autonomous driving, live entertainment, and the like. With the visual signal as the core, the mainstream smart-home framework is built around the user, the data and software/hardware systems, and the necessary underlying AI and Internet-of-Things technologies; it increasingly improves people's daily life and has become an important component of intelligent living.
Although visual signals play an indispensable role in the smart home, in many application scenarios the smart-home system performs intelligent analysis directly on the original visual-signal data at its various processing stages. Clearly, once sensitive user privacy data in the original visual signal is leaked at any stage, a serious hidden privacy risk cannot be avoided.
Among visual signals, video data is particularly important, and the way a smart-home system processes video data cannot avoid the privacy risks that arise at its different stages. Privacy security in scenes such as the smart home is now taken very seriously, and it has become an important challenge for more advanced smart-home applications.
To address these typical problems, current desensitization of video data streams in scenes such as the smart home mainly applies further processing to the decoded data within the traditional video-coding pipeline. This protects private data to some extent and reduces the risk of exposure. However, if the data is leaked at any stage before decoding, the private content is exposed directly, creating a serious hidden privacy risk; moreover, post-decoding processing offers only limited control over the private content.
To address the above problem, in the present application, after video data is collected by the smart-home camera, the video data is converted into non-human-readable feature-signal data and the original video data is erased, so that the home scene is subsequently detected from the feature-signal data and the learned model. This resolves the safety hazards that readable original video data would pose if leaked.
Furthermore, to avoid the safety hazard of video data being cracked and exposed after leakage, the present invention proposes, for scenes with sensitive private data such as the smart home, to directly remove the original sensitive video data: the video data collected by the smart-home camera, containing a plurality of home-scene images, is instead directly characterized to obtain corresponding visual descriptor signals.
Specifically, as shown in fig. 2, the present application may convert the plurality of home-scene images into a CDVS or CDVA signal, so that the video data is processed through the feature signal. For many smart-home scenes, common functions include identification, retrieval, tracking, and the like; such functions do not need to operate on the original data signal. Many devices likewise take the original signal and then convert it into a feature signal for processing. Unlike the original visual signal, a feature signal is generally applied directly to model training and to in-machine scene recognition, analysis, and application. Feature signals are usually neither directly human-readable nor reversible: even if a visual feature signal is leaked, its holder cannot translate or interpret what it represents, so the risk of cracking and exposure is greatly reduced.
In addition, in the present application, after the plurality of home-scene images are converted into a CDVS or CDVA signal, the home-scene images are erased. This further avoids the safety hazards caused by leakage of the original video image data.
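The irreversibility claim above can be illustrated with a minimal sketch, assuming a toy descriptor (a coarse quantized intensity histogram standing in for CDVS/CDVA): the mapping is many-to-one, so two different frames can yield the same descriptor, and the original pixels cannot be reconstructed from it.

```python
def toy_descriptor(frame):
    """Quantize pixel intensities (0-255) into 4 coarse bins.
    The mapping is many-to-one, so the frame is not recoverable from it."""
    bins = [0, 0, 0, 0]
    for px in frame:
        bins[min(px // 64, 3)] += 1
    return tuple(bins)

frame_a = [10, 20, 200, 210]   # hypothetical readable frame
frame_b = [40, 50, 230, 250]   # a different frame with different pixels
desc_a = toy_descriptor(frame_a)
desc_b = toy_descriptor(frame_b)
# Both frames collapse to the same descriptor, demonstrating non-reversibility.
frame_a.clear()                # erase the originals after conversion
frame_b.clear()
```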
In addition, according to the present application, unreadable data such as CDVS/CDVA can be analyzed together with the scene detection network model and converted into an analysis result for each application scene, which removes the dependence on the original data and avoids the privacy-leakage and exposure risks of the original video data.
Specifically, the data generated while the visual descriptor signal is input to the preset scene detection network model to detect whether a user operation event exists in the target area is likewise neither human-readable nor reversible, and is unique to the process; this prevents the data from being cracked and exposed and effectively reduces the risk to private content.
In one implementation, to improve the decision-making performance of the smart home, as shown in fig. 3, when the smart home detects a manually-performed user operation event, it may record and store the visual feature signal and the manual operation result as training data for the scene detection network model. The newest data replaces the oldest, so the total amount of stored data remains unchanged. The processed data is then transmitted to the cloud for subsequent device updates, function upgrades, and the like. Because the transmitted data does not depend on the original video data, the privacy hazards that would follow from interception of the data are effectively avoided.
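The fixed-size, newest-replaces-oldest training store described above can be sketched with a bounded deque. The record fields and the class name are hypothetical; a real system would serialize the pairs and upload them to the cloud.

```python
from collections import deque

class TrainingBuffer:
    """Rolling store of (feature signal, manual operation result) pairs.
    When full, appending evicts the oldest record, keeping total storage constant."""
    def __init__(self, capacity: int):
        self.records = deque(maxlen=capacity)

    def record(self, feature_signal, operation_result):
        self.records.append((feature_signal, operation_result))

    def export_for_upload(self):
        """Package buffered pairs as model training data (no raw video involved)."""
        return list(self.records)

buf = TrainingBuffer(capacity=3)
for i in range(5):  # five events into a 3-slot buffer: the oldest two are evicted
    buf.record((0.1 * i,), f"manual_op_{i}")
```

After the loop, only the three newest pairs remain, matching the "latest data replaces old data, total storage unchanged" behavior in the text.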
In the application, a plurality of home-scene images of a target area can be captured by a camera device arranged at a smart-home terminal; the plurality of home-scene images are converted into corresponding visual descriptor signals, the visual descriptor signals are input into a preset scene detection network model, and whether a user operation event exists in the target area is detected; if so, the smart-home terminal is controlled to execute the action corresponding to the user operation event. By applying this technical solution, after video data is collected by the smart-home camera, the video data is converted into non-human-readable feature-signal data and the original video data is erased, so that the home scene is subsequently detected from the feature-signal data and the learning model. This resolves the safety hazards that arise when readable original video data is leaked.
Optionally, in another embodiment based on the above method of the present application, the converting the plurality of images of the home scene into corresponding visual descriptor signals includes:
and converting the plurality of home scene images into one of a CDVS signal or a CDVA signal.
Optionally, in another embodiment based on the foregoing method of the present application, after converting the plurality of home-scene images into a CDVS or CDVA signal, the method further includes:
and clearing the home scene image.
Optionally, in another embodiment based on the foregoing method of the present application, the inputting the visual descriptor signal to a preset scene detection network model, and detecting whether there is a user operation event in the target area includes:
determining whether human body characteristics exist in a target area where the intelligent home terminal is located currently according to the identification of the visual descriptor signal, wherein the human body characteristics comprise at least one of size characteristics, color characteristics and contour characteristics;
and if the human body characteristics are determined to exist, inputting the visual descriptor signal to a preset scene detection network model, and detecting the user operation event.
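A hedged sketch of this two-stage gating (a cheap human-feature check on the descriptor signal before running the full detection model); the feature heuristic, the 0.5 cut-off, and the toy model are placeholders for real size/color/contour screening and the trained network:

```python
def has_human_features(signal):
    """Stage 1: cheap screen on the descriptor signal for human-like
    size / color / contour cues (toy heuristic: any feature above 0.5)."""
    return any(f > 0.5 for f in signal)

def detect_event(signal, model):
    """Stage 2: invoke the scene detection network model only when stage 1 fires."""
    if not has_human_features(signal):
        return None  # no human present in the target area: skip the expensive model
    return model(signal)

# Hypothetical stand-in for the preset scene detection network model.
toy_model = lambda s: "user_operation" if sum(s) > 1.0 else None
```

The design point is that most frames with no person present never reach the heavier model, which saves computation on the terminal.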
Optionally, in another embodiment based on the foregoing method of the present application, after the controlling the smart home terminal to execute the action corresponding to the user operation event, the method further includes:
recording the association between the visual descriptor signal and the action executed for the user operation event, together with the associated data;
and combining the association and the associated data into model training data and uploading the model training data to a cloud, wherein the model training data are used for training the scene detection network model.
Optionally, in another embodiment based on the foregoing method of the present application, before the capturing, by a camera device disposed at the smart home terminal, a plurality of images of a home scene for a target area, the method further includes:
acquiring at least two sample images, wherein the sample images comprise at least one user operation instruction feature;
marking a corresponding user operation event for each sample image based on the user operation instruction characteristics;
and training a preset image semantic segmentation model by using the sample image marked with the user operation event and the user operation instruction characteristics included in the sample image to obtain the scene detection network model meeting preset conditions.
Optionally, in another embodiment based on the foregoing method of the present application, after obtaining the scene detection network model satisfying a preset condition, the method further includes:
carrying out model compression on the scene detection network model to obtain a compressed scene detection network model;
deploying the scene detection network model to a server side, and deploying the compressed scene detection network model to the intelligent home terminal;
after an operation instruction is obtained, determining an identification mode based on the operation state of the intelligent home terminal, wherein the identification mode corresponds to identification by using the scene detection network model or identification by using the compressed scene detection network model;
and determining to input the visual descriptor signal to a preset scene detection network model based on the recognition mode.
Further, after the scene detection network model is obtained, to avoid the drawback that an oversized model would occupy a large amount of smart-home memory, the present application may also compress the model to obtain a corresponding compressed scene detection network model with a smaller footprint.
Optionally, the scene detection network model may be compressed directly, for example by kernel sparsification and model pruning. Sparsifying the kernels requires support from sparse-computation libraries, and the acceleration achieved may be limited by many factors such as bandwidth and the degree of sparsity. Pruning, in turn, removes the unimportant filter parameters directly from the original model. Because neural networks are highly adaptive and large models tend to be redundant, the performance lost by removing some parameters can be recovered by retraining; thus, with suitable pruning and retraining schemes, the existing model can be compressed effectively to a large extent, which is the most commonly used approach at present.
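Magnitude-based filter pruning of the kind described can be sketched in pure Python. The L1-norm ranking criterion and the 50% keep ratio are illustrative assumptions, and the retraining step is represented only by a comment:

```python
def l1_norm(filt):
    """Importance score of one filter: sum of absolute weight values."""
    return sum(abs(w) for w in filt)

def prune_filters(filters, keep_ratio=0.5):
    """Drop the filters with the smallest L1 norm (the 'unimportant' ones),
    keeping the top keep_ratio fraction. Retraining would follow to recover
    the accuracy lost by removal."""
    n_keep = max(1, int(len(filters) * keep_ratio))
    ranked = sorted(filters, key=l1_norm, reverse=True)
    return ranked[:n_keep]

# Toy layer: four filters of two weights each.
layer = [[0.9, -0.8], [0.01, 0.02], [0.5, 0.4], [-0.03, 0.0]]
compressed = prune_filters(layer, keep_ratio=0.5)
# ... retrain the compressed model here to restore performance ...
```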
Furthermore, after the compressed scene detection network model with the small footprint is obtained, it can be deployed on the smart-home terminal. The smart home can then use the compressed scene detection network model to recognize the plurality of home-scene images collected by the camera device and subsequently convert them into the corresponding visual descriptor signals.
In addition, the scene detection network model with the larger footprint can be deployed on the server, and the identification mode is determined based on the running state of the smart home; the corresponding detection network model is then selected to detect whether a user operation event exists in the target area.
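The run-state-based choice between the on-device compressed model and the server-side full model can be sketched as follows; the load metric, the 0.8 threshold, and the string model stand-ins are hypothetical:

```python
def choose_model(device_load, local_model, server_model, load_threshold=0.8):
    """Pick the recognition mode from the terminal's running state:
    a busy terminal offloads to the full server-side model, otherwise
    it uses the smaller compressed model locally."""
    if device_load >= load_threshold:
        return ("server", server_model)
    return ("local", local_model)

mode, model = choose_model(0.3, local_model="compressed_net", server_model="full_net")
```

With a lightly loaded terminal (`device_load=0.3`), the compressed local model is chosen; a heavily loaded terminal would route the descriptor signal to the server-side model instead.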
By applying this technical solution, after video data is collected by the smart-home camera, the video data is converted into non-human-readable feature-signal data and the original video data is erased, so that the home scene is subsequently detected from the feature-signal data and the learning model. This resolves the safety hazards that arise when readable original video data is leaked.
Optionally, in another embodiment of the present application, as shown in fig. 4, the present application further provides a device for detecting a home scene. Which comprises the following steps:
the acquisition module 201, configured to capture a plurality of home-scene images of a target area by a camera device arranged at a smart-home terminal;
a conversion module 202, configured to convert the multiple home scene images into corresponding visual descriptor signals, input the visual descriptor signals into a preset scene detection network model, and detect whether a user operation event exists in the target area, where the visual descriptor signals are one of CDVS signals or CDVA signals;
and the execution module 203 is configured to control the smart home terminal to execute the action corresponding to the user operation event if the user operation event exists.
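The three-module split above (acquisition, conversion/detection, execution) can be sketched as a simple pipeline. The `capture`, `to_signal`, `detect`, and `act` callables are hypothetical stand-ins for the camera driver, the CDVS/CDVA encoder, the scene detection network, and the smart-home actuator respectively; none of these names come from the patent.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class DetectionDevice:
    """Sketch of the acquisition / conversion / execution module split."""
    capture: Callable[[], List[bytes]]        # acquisition module 201
    to_signal: Callable[[bytes], list]        # conversion module 202 (part 1)
    detect: Callable[[list], Optional[str]]   # conversion module 202 (part 2)
    act: Callable[[str], None]                # execution module 203

    def run_once(self) -> None:
        for image in self.capture():
            event = self.detect(self.to_signal(image))
            if event is not None:             # user operation event exists
                self.act(event)

# Toy wiring: any image whose signal starts like b"hand" triggers "light_on".
log = []
device = DetectionDevice(
    capture=lambda: [b"empty room", b"hand wave"],
    to_signal=lambda img: list(img),
    detect=lambda sig: "light_on" if sig[:4] == list(b"hand") else None,
    act=log.append,
)
device.run_once()
assert log == ["light_on"]
```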
In the application, a plurality of home scene images for a target area can be acquired by using a camera device arranged at the smart home terminal; the multiple home scene images are converted into corresponding visual descriptor signals, which are input into a preset scene detection network model to detect whether a user operation event exists in the target area; and if so, the smart home terminal is controlled to execute the action corresponding to the user operation event. By applying this technical scheme, after the video data are collected by the camera of the smart home, they are converted into unreadable feature signal data and the original video data are cleared, so that the home scene can subsequently be detected from the feature signal data and the learning model. This solves the problem that the readability of the original video data easily creates potential safety hazards once the data are leaked.
In another embodiment of the present application, the device is further configured to perform the following step:
clearing the home scene images.
In another embodiment of the present application, the device is further configured to perform the following steps:
determining whether human body characteristics exist in a target area where the intelligent home terminal is located currently according to the identification of the visual descriptor signal, wherein the human body characteristics comprise at least one of size characteristics, color characteristics and contour characteristics;
and if the human body characteristics exist, inputting the visual descriptor signal to a preset scene detection network model, and detecting the user operation event.
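The gating step above — run the scene detection network only when the descriptor signal shows human body characteristics — might look like the following sketch, where `has_human_features` and `scene_model` are hypothetical placeholders for the size/color/contour check and the trained network.

```python
from typing import Callable, List, Optional

def detect_user_event(
    signal: List[float],
    has_human_features: Callable[[List[float]], bool],
    scene_model: Callable[[List[float]], Optional[str]],
) -> Optional[str]:
    """Run the (expensive) scene detection network only when the visual
    descriptor signal indicates a human in the target area."""
    if not has_human_features(signal):
        return None                     # skip inference: nobody is present
    return scene_model(signal)

# Toy check: "human present" iff any feature value exceeds a threshold.
def has_human(sig):
    return max(sig) > 0.5

def model(sig):
    return "turn_on_tv"

assert detect_user_event([0.1, 0.2], has_human, model) is None
assert detect_user_event([0.9, 0.2], has_human, model) == "turn_on_tv"
```

Skipping inference when no human features are found is what saves compute on the terminal.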
In another embodiment of the present application, the device is further configured to perform the following steps:
recording the association relation between the visual descriptor signal and the action of executing the user operation event, together with the associated data;
and after the association relation and the associated data are combined into model training data, uploading the model training data to a cloud, wherein the model training data are used for training the scene detection network model.
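Recording the association between a descriptor signal and the executed action, then combining the records into model training data for upload to the cloud, could be sketched as follows. `record_association`, `upload_training_batch`, and the `send` transport callback are illustrative names introduced here, not APIs from the patent.

```python
import json
import time

def record_association(signal, event, action_result):
    """Bundle a descriptor signal with the action it triggered so the pair
    can later serve as a training sample for the scene detection network."""
    return {
        "signal": list(signal),
        "event": event,
        "result": action_result,
        "timestamp": time.time(),
    }

def upload_training_batch(records, send):
    """Combine association records into one training payload and push it to
    the cloud. `send` is a hypothetical transport (e.g. an HTTPS POST)."""
    payload = json.dumps({"samples": records})
    send(payload)
    return payload

sent = []
rec = record_association([0.9, 0.1], "turn_on_tv", "ok")
upload_training_batch([rec], sent.append)
assert json.loads(sent[0])["samples"][0]["event"] == "turn_on_tv"
```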
In another embodiment of the present application, the device is further configured to perform the following steps:
the method comprises the steps of obtaining at least two sample images, wherein the sample images comprise at least one user operation instruction characteristic;
marking a corresponding user operation event for each sample image based on the user operation instruction characteristics;
and training a preset image semantic segmentation model by using the sample image marked with the user operation event and the user operation instruction characteristics included in the sample image to obtain a scene detection network model meeting preset conditions.
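The label-then-train procedure above (sample images marked with user operation events, used to train a detection model) can be illustrated with a deliberately minimal classifier. The patent trains an image semantic segmentation model; the single-layer logistic classifier below is only a stand-in showing the same supervised loop over labelled descriptor features.

```python
import numpy as np

def train_scene_detector(samples, labels, epochs=200, lr=0.5):
    """Minimal stand-in for training the scene detection network: a
    logistic classifier fitted by gradient descent on labelled samples."""
    X = np.asarray(samples, dtype=np.float64)
    y = np.asarray(labels, dtype=np.float64)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid prediction
        grad = p - y                             # cross-entropy gradient
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    # Return a predictor: True means "user operation event detected".
    return lambda x: bool(1.0 / (1.0 + np.exp(-(np.asarray(x) @ w + b))) > 0.5)

# Two labelled "sample images": feature[0] high marks an operation event.
detect = train_scene_detector([[1.0, 0.0], [0.0, 1.0]], [1, 0])
assert detect([1.0, 0.0]) is True
assert detect([0.0, 1.0]) is False
```

The real model would of course be far larger; the point is only the flow from labelled samples to a model meeting a preset condition.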
In another embodiment of the present application, the device is further configured to perform the following steps:
carrying out model compression on the scene detection network model to obtain a compressed scene detection network model;
deploying the scene detection network model to a server side, and deploying the compressed scene detection network model to the intelligent home terminal;
after an operation instruction is obtained, determining an identification mode based on the operation state of the intelligent home terminal, wherein the identification mode corresponds to identification by using the scene detection network model or identification by using the compressed scene detection network model;
and determining to input the visual descriptor signal to a preset scene detection network model based on the identification mode.
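Choosing between the full server-side model and the compressed on-terminal model based on the terminal's running state might be implemented along these lines. The load threshold, the inputs, and the mode names are illustrative assumptions, not values from the patent.

```python
def choose_recognition_mode(cpu_load: float, network_ok: bool,
                            threshold: float = 0.7) -> str:
    """Pick which deployed scene detection model handles the signal.

    The full model lives on the server; the compressed copy lives on the
    smart home terminal. Offload only when the terminal is busy AND the
    server is reachable; otherwise stay on-device.
    """
    if network_ok and cpu_load > threshold:
        return "server_full_model"        # offload to the larger model
    return "terminal_compressed_model"    # local, compressed model

assert choose_recognition_mode(0.9, network_ok=True) == "server_full_model"
assert choose_recognition_mode(0.3, network_ok=True) == "terminal_compressed_model"
assert choose_recognition_mode(0.9, network_ok=False) == "terminal_compressed_model"
```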
Fig. 5 is a block diagram illustrating a logical structure of an electronic device in accordance with an exemplary embodiment. For example, the electronic device 300 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium, such as a memory, including instructions executable by a processor of an electronic device to perform the method of detecting a home scene, the method including: acquiring a plurality of home scene images aiming at a target area by utilizing a camera device arranged at an intelligent home terminal; converting the multiple home scene images into corresponding visual descriptor signals, inputting the visual descriptor signals into a preset scene detection network model, and detecting whether a user operation event exists in the target area; and if so, controlling the intelligent home terminal to execute the action corresponding to the user operation event. Optionally, the instructions may also be executable by a processor of the electronic device to perform other steps involved in the exemplary embodiments described above. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, there is also provided an application/computer program product including one or more instructions executable by a processor of an electronic device to perform the method of detecting a home scenario described above, the method including: acquiring a plurality of home scene images aiming at a target area by utilizing a camera device arranged at an intelligent home terminal; converting the multiple home scene images into corresponding visual descriptor signals, inputting the visual descriptor signals into a preset scene detection network model, and detecting whether a user operation event exists in the target area; and if so, controlling the intelligent home terminal to execute the action corresponding to the user operation event. Optionally, the instructions may also be executable by a processor of the electronic device to perform other steps involved in the exemplary embodiments described above.
Those skilled in the art will appreciate that fig. 5 is merely an example of the electronic device 300 and does not constitute a limitation of the electronic device 300, which may include more or fewer components than those shown, combine certain components, or have different components; for example, the electronic device 300 may also include input-output devices, network access devices, buses, etc.
The Processor 302 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. The general purpose processor may be a microprocessor or the processor 302 may be any conventional processor or the like, and the processor 302 is the control center of the electronic device 300 and connects the various parts of the entire electronic device 300 using various interfaces and lines.
The memory 301 may be used to store computer readable instructions, and the processor 302 may implement various functions of the electronic device 300 by executing or invoking the computer readable instructions or modules stored in the memory 301 and by calling the data stored in the memory 301. The memory 301 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to the use of the electronic device 300, and the like. In addition, the memory 301 may include a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Memory Card (Flash Card), at least one disk storage device, a flash memory device, a Read-Only Memory (ROM), a Random Access Memory (RAM), or other non-volatile/volatile storage devices.
The modules integrated by the electronic device 300 may be stored in a computer-readable storage medium if they are implemented in the form of software functional modules and sold or used as separate products. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by hardware related to computer readable instructions, which may be stored in a computer readable storage medium, and when the computer readable instructions are executed by a processor, the steps of the method embodiments may be implemented.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (9)

1. A method for detecting a home scene is characterized by comprising the following steps:
acquiring a plurality of home scene images aiming at a target area by utilizing a camera device arranged at an intelligent home terminal;
converting the multiple home scene images into corresponding visual descriptor signals, inputting the visual descriptor signals into a preset scene detection network model, and detecting whether a user operation event exists in the target area, wherein the visual descriptor signals are one of CDVS signals or CDVA signals;
and if so, controlling the intelligent home terminal to execute the action corresponding to the user operation event.
2. The method of claim 1, further comprising, after said converting the plurality of home scene images to one of a CDVS signal or a CDVA signal:
and clearing the home scene image.
3. The method of claim 1, wherein inputting the visual descriptor signal to a preset scene detection network model, detecting whether a user operation event exists in the target area comprises:
determining whether human body characteristics exist in a target area where the intelligent home terminal is located currently according to the identification of the visual descriptor signal, wherein the human body characteristics comprise at least one of size characteristics, color characteristics and contour characteristics;
and if the human body characteristics are determined to exist, inputting the visual descriptor signal to a preset scene detection network model, and detecting the user operation event.
4. The method according to claim 1, wherein after the controlling the smart home terminal to execute the action corresponding to the user operation event, the method further comprises:
recording the association relation between the visual descriptor signal and the action of executing the user operation event, together with the associated data;
and after the association relation and the associated data are combined into model training data, uploading the model training data to a cloud, wherein the model training data are used for training the scene detection network model.
5. The method according to claim 1, wherein before the acquiring, by using a camera device arranged at the smart home terminal, a plurality of home scene images for a target area, the method further comprises:
acquiring at least two sample images, wherein the sample images comprise at least one user operation instruction feature;
marking a corresponding user operation event for each sample image based on the user operation instruction characteristics;
and training a preset image semantic segmentation model by using the sample image marked with the user operation event and the user operation instruction characteristics included in the sample image to obtain a scene detection network model meeting preset conditions.
6. The method of claim 5, wherein after obtaining the scene detection network model satisfying a preset condition, the method further comprises:
carrying out model compression on the scene detection network model to obtain a compressed scene detection network model;
deploying the scene detection network model to a server side, and deploying the compressed scene detection network model to the intelligent home terminal;
after an operation instruction is obtained, determining an identification mode based on the operation state of the intelligent home terminal, wherein the identification mode corresponds to identification by using the scene detection network model or identification by using the compressed scene detection network model;
and determining to input the visual descriptor signal to a preset scene detection network model based on the identification mode.
7. An apparatus for detecting a home scene, comprising:
an acquisition module, configured to acquire a plurality of home scene images for a target area by using a camera device arranged at the smart home terminal;
the conversion module is configured to convert the multiple home scene images into corresponding visual descriptor signals, input the visual descriptor signals into a preset scene detection network model, and detect whether a user operation event exists in the target area, wherein the visual descriptor signals are one of CDVS signals or CDVA signals;
and an execution module, configured to control the smart home terminal to execute the action corresponding to the user operation event if the user operation event exists.
8. An electronic device, comprising:
a memory for storing executable instructions; and
a processor for executing the executable instructions stored in the memory to perform the operations of the method for detecting a home scene of any one of claims 1-6.
9. A computer-readable storage medium storing computer-readable instructions, wherein the instructions, when executed, perform the operations of the method for detecting a home scene of any one of claims 1-6.
CN202210044678.4A 2022-01-14 2022-01-14 Method and device for detecting home scene, electronic equipment and medium Pending CN114519818A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210044678.4A CN114519818A (en) 2022-01-14 2022-01-14 Method and device for detecting home scene, electronic equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210044678.4A CN114519818A (en) 2022-01-14 2022-01-14 Method and device for detecting home scene, electronic equipment and medium

Publications (1)

Publication Number Publication Date
CN114519818A true CN114519818A (en) 2022-05-20

Family

ID=81596479

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210044678.4A Pending CN114519818A (en) 2022-01-14 2022-01-14 Method and device for detecting home scene, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN114519818A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106803943A (en) * 2016-03-31 2017-06-06 小蚁科技(香港)有限公司 Video monitoring system and equipment
CN110659333A (en) * 2019-08-23 2020-01-07 浙江省北大信息技术高等研究院 Multi-level visual feature description method and visual retrieval system
WO2020116177A1 (en) * 2018-12-05 2020-06-11 ソニー株式会社 Image capturing element, image capturing device and method
CN111752165A (en) * 2020-07-10 2020-10-09 广州博冠智能科技有限公司 Intelligent equipment control method and device of intelligent home system
WO2021050007A1 (en) * 2019-09-11 2021-03-18 Nanyang Technological University Network-based visual analysis
CN112558760A (en) * 2020-11-30 2021-03-26 青岛海信日立空调系统有限公司 Air conditioner and control method
CN113627339A (en) * 2021-08-11 2021-11-09 普联技术有限公司 Privacy protection method, device and equipment


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
曾子明; 秦思琪: "Research on Mobile Visual Search Service for Smart Libraries and Its Technical Framework", 情报资料工作 (Information and Documentation Services), no. 04, 25 July 2017 (2017-07-25) *
高文; 田永鸿; 王坚: "Digital Retina: A Key Link in the Evolution of Smart City Systems", 中国科学:信息科学 (Scientia Sinica Informationis), no. 08, 20 August 2018 (2018-08-20) *

Similar Documents

Publication Publication Date Title
Bayar et al. On the robustness of constrained convolutional neural networks to jpeg post-compression for image resampling detection
CN109858371B (en) Face recognition method and device
CN110705405B (en) Target labeling method and device
CN108805047B (en) Living body detection method and device, electronic equipment and computer readable medium
CN109948497B (en) Object detection method and device and electronic equipment
CN112381775B (en) Image tampering detection method, terminal device and storage medium
CN110688524B (en) Video retrieval method and device, electronic equipment and storage medium
CN111476234B (en) License plate character shielding recognition method and device, storage medium and intelligent equipment
CN107944381B (en) Face tracking method, face tracking device, terminal and storage medium
CN113128368A (en) Method, device and system for detecting character interaction relationship
CN111325107A (en) Detection model training method and device, electronic equipment and readable storage medium
CN111461202A (en) Real-time thyroid nodule ultrasonic image identification method and device
CN112989098B (en) Automatic retrieval method and device for image infringement entity and electronic equipment
CN113158773B (en) Training method and training device for living body detection model
CN111680670B (en) Cross-mode human head detection method and device
CN110210425B (en) Face recognition method and device, electronic equipment and storage medium
CN114519818A (en) Method and device for detecting home scene, electronic equipment and medium
CN111866573B (en) Video playing method and device, electronic equipment and storage medium
CN114842319A (en) Method and device for detecting home scene, electronic equipment and medium
CN112381055A (en) First-person perspective image recognition method and device and computer readable storage medium
CN113723310A (en) Image identification method based on neural network and related device
CN112348112A (en) Training method and device for image recognition model and terminal equipment
CN112270257A (en) Motion trajectory determination method and device and computer readable storage medium
CN111062337B (en) People stream direction detection method and device, storage medium and electronic equipment
US20230274377A1 (en) An end-to-end proctoring system and method for conducting a secure online examination

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination