CN114519818A - Method and device for detecting home scene, electronic equipment and medium - Google Patents

Method and device for detecting home scene, electronic equipment and medium Download PDF

Info

Publication number
CN114519818A
CN114519818A (Application CN202210044678.4A)
Authority
CN
China
Prior art keywords
scene
network model
detection network
user operation
home
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210044678.4A
Other languages
Chinese (zh)
Inventor
张鹏
向国庆
刘浩宁
卢东东
黄晓峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced Institute of Information Technology AIIT of Peking University
Hangzhou Weiming Information Technology Co Ltd
Original Assignee
Advanced Institute of Information Technology AIIT of Peking University
Hangzhou Weiming Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Advanced Institute of Information Technology AIIT of Peking University, Hangzhou Weiming Information Technology Co Ltd filed Critical Advanced Institute of Information Technology AIIT of Peking University
Priority to CN202210044678.4A priority Critical patent/CN114519818A/en
Publication of CN114519818A publication Critical patent/CN114519818A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 — Pattern recognition
    • G06F 18/20 — Analysing
    • G06F 18/21 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/40 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video transcoding, i.e. partial or full decoding of a coded input stream followed by re-encoding of the decoded output stream
    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 — Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/02 — Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a method and apparatus for detecting a home scene, an electronic device, and a medium. In the application, a plurality of home-scene images of a target area can be captured by a camera device arranged at a smart-home terminal; the plurality of home-scene images are converted into corresponding visual descriptor signals, the visual descriptor signals are input into a preset scene detection network model, and whether a user operation event exists in the target area is detected; if so, the smart-home terminal is controlled to execute the action corresponding to the user operation event. By applying this technical solution, after video data is collected by the smart-home camera, the video data is converted into non-human-readable feature-signal data and the original video data is erased, so that the home scene is subsequently detected from the feature-signal data and the learning model. This resolves the safety hazards that arise when readable original video data is leaked.

Description

Method and device for detecting home scene, electronic equipment and medium
Technical Field
The present application relates to data processing technology, and in particular to a method and apparatus for detecting a home scene, an electronic device, and a medium.
Background
In smart-city applications such as the smart home, visual signals are the principal and indispensable information source in every scene; application scenarios include home monitoring, gesture recognition, face recognition, home healthcare, autonomous driving, live entertainment, and the like.
Although the visual signal plays an essential role in the smart home, in many application scenarios the smart-home system performs intelligent analysis directly on the original visual-signal data at its various processing stages. As a result, once sensitive user privacy data is present in the original video image signal, a significant privacy and security risk arises.
Disclosure of Invention
The embodiments of the present application provide a method and apparatus for detecting a home scene, an electronic device, and a medium, so as to solve the problem in the related art that user privacy content in video images cannot be handled properly.
According to one aspect of the embodiments of the present application, there is provided a method for detecting a home scene, including:
capturing a plurality of home-scene images of a target area by a camera device arranged at a smart-home terminal;
converting the plurality of home-scene images into corresponding visual descriptor signals, inputting the visual descriptor signals into a preset scene detection network model, and detecting whether a user operation event exists in the target area, wherein the visual descriptor signals are CDVS signals or CDVA signals;
and if so, controlling the smart-home terminal to execute the action corresponding to the user operation event.
Optionally, in another embodiment based on the foregoing method of the present application, after converting the plurality of home-scene images into a CDVS or CDVA signal, the method further includes:
and clearing the home scene image.
Optionally, in another embodiment of the method according to the present application, the inputting the visual descriptor signal to a preset scene detection network model, and detecting whether there is a user operation event in the target area includes:
determining whether human body characteristics exist in a target area where the intelligent home terminal is located currently according to the identification of the visual descriptor signal, wherein the human body characteristics comprise at least one of size characteristics, color characteristics and contour characteristics;
and if the human body characteristics are determined to exist, inputting the visual descriptor signal to a preset scene detection network model, and detecting the user operation event.
Optionally, in another embodiment of the method based on the present application, after the controlling the smart home terminal to execute the action corresponding to the user operation event, the method further includes:
recording the association between the visual descriptor signal and the action executed for the user operation event, together with the associated data;
and combining the association and the associated data into model training data and uploading the model training data to a cloud, wherein the model training data are used for training the scene detection network model.
Optionally, in another embodiment based on the foregoing method of the present application, before the capturing, by a camera device disposed at the smart home terminal, a plurality of images of a home scene for a target area, the method further includes:
acquiring at least two sample images, wherein the sample images comprise at least one user operation instruction feature;
marking a corresponding user operation event for each sample image based on the user operation instruction characteristics;
and training a preset image semantic segmentation model by using the sample image marked with the user operation event and the user operation instruction characteristics included in the sample image to obtain the scene detection network model meeting preset conditions.
Optionally, in another embodiment based on the foregoing method of the present application, after obtaining the scene detection network model satisfying a preset condition, the method further includes:
carrying out model compression on the scene detection network model to obtain a compressed scene detection network model;
deploying the scene detection network model to a server side, and deploying the compressed scene detection network model to the intelligent home terminal;
after an operation instruction is obtained, determining an identification mode based on the operation state of the intelligent home terminal, wherein the identification mode is identified by utilizing the scene detection network model or the compressed scene detection network model;
and determining to input the visual descriptor signal to a preset scene detection network model based on the identification mode.
According to another aspect of the embodiments of the present application, there is provided an apparatus for detecting a home scene, including:
an acquisition module, configured to capture a plurality of home-scene images of a target area by a camera device arranged at a smart-home terminal;
the conversion module is configured to convert the multiple home scene images into corresponding visual descriptor signals, input the visual descriptor signals into a preset scene detection network model, and detect whether a user operation event exists in the target area, wherein the visual descriptor signals are one of CDVS signals or CDVA signals;
and an execution module, configured to, if the user operation event exists, control the smart-home terminal to execute the action corresponding to the user operation event.
According to another aspect of the embodiments of the present application, there is provided an electronic device including:
a memory for storing executable instructions; and
a processor, configured to execute the executable instructions in cooperation with the memory, so as to complete the operations of any one of the above methods for detecting a home scene.
According to a further aspect of the embodiments of the present application, there is provided a computer-readable storage medium for storing computer-readable instructions, which, when executed, perform the operations of any one of the methods for detecting a home scene.
In the application, a plurality of home-scene images of a target area can be captured by a camera device arranged at a smart-home terminal; the plurality of home-scene images are converted into corresponding visual descriptor signals, the visual descriptor signals are input into a preset scene detection network model, and whether a user operation event exists in the target area is detected; if so, the smart-home terminal is controlled to execute the action corresponding to the user operation event. By applying this technical solution, after video data is collected by the smart-home camera, the video data is converted into non-human-readable feature-signal data and the original video data is erased, so that the home scene is subsequently detected from the feature-signal data and the learning model. This resolves the safety hazards that arise when readable original video data is leaked.
The technical solution of the present application is further described in detail by the accompanying drawings and examples.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description, serve to explain the principles of the application.
The present application may be more clearly understood from the following detailed description with reference to the accompanying drawings, in which:
fig. 1 is a schematic diagram of a method for detecting a home scene according to the present application;
fig. 2 is a schematic flow chart of converting a home scene image into a visual descriptor signal according to the present application;
FIG. 3 is a schematic flow chart of a method for training a scene detection network model according to the present application;
fig. 4 is a schematic structural diagram of an electronic device for detecting a home scene according to the present application;
fig. 5 is a schematic structural diagram of an electronic device for detecting a home scene according to the present application.
Detailed Description
Various exemplary embodiments of the present application will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present application unless specifically stated otherwise.
Meanwhile, it should be understood that, for ease of description, the sizes of the parts shown in the drawings are not drawn to actual scale.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the application, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
In addition, the technical solutions of the various embodiments of the present application may be combined with one another, provided that the combination can be implemented by a person skilled in the art; where a combination of technical solutions is contradictory or cannot be implemented, the combination should be deemed absent and outside the protection scope of the present application.
It should be noted that all directional indicators in the embodiments of the present application (such as up, down, left, right, front, and rear) are used only to explain the relative positional relationship, motion, and so on between components in a specific posture (as shown in the drawings); if the specific posture changes, the directional indicator changes accordingly.
A method for detecting a home scene according to an exemplary embodiment of the present application is described below with reference to fig. 1 to 3. It should be noted that the following application scenarios are merely illustrated for the convenience of understanding the spirit and principles of the present application, and the embodiments of the present application are not limited in this respect. Rather, embodiments of the present application may be applied to any scenario where applicable.
The application also provides a method and a device for detecting the home scene, electronic equipment and a medium.
Fig. 1 schematically shows a flow chart of a method for detecting a home scene according to an embodiment of the present application. As shown in fig. 1, the method includes:
s101, collecting a plurality of home scene images aiming at a target area by using a camera device arranged at an intelligent home terminal.
S102, converting the multiple home scene images into corresponding visual descriptor signals, inputting the visual descriptor signals into a preset scene detection network model, and detecting whether a user operation event exists in a target area, wherein the visual descriptor signals are CDVS signals or CDVA signals.
And S103, if the user operation event exists, controlling the intelligent home terminal to execute the action corresponding to the user operation event.
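As an illustrative sketch only (not part of the patent disclosure), steps S101–S103 can be outlined as follows. The descriptor extraction, the detection model, the event label, and the threshold are all hypothetical stand-ins: real CDVS/CDVA extraction and a trained scene detection network would replace them.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class DescriptorSignal:
    """Compact, non-human-readable feature signal (stand-in for CDVS/CDVA)."""
    features: List[float]

def to_descriptor(images: List[List[int]]) -> DescriptorSignal:
    """S102a: convert raw frames into a compact descriptor signal.
    A toy per-frame mean stands in for real CDVS/CDVA extraction."""
    feats = [sum(img) / len(img) for img in images]
    return DescriptorSignal(features=feats)

def detect_user_operation(signal: DescriptorSignal, threshold: float = 100.0) -> Optional[str]:
    """S102b: stand-in for the preset scene detection network model.
    Returns a hypothetical event label when a user operation is detected."""
    return "light_switch_pressed" if max(signal.features) > threshold else None

def run_pipeline(images: List[List[int]]) -> Optional[str]:
    signal = to_descriptor(images)
    images.clear()  # erase the readable original frames (the privacy step of S102)
    event = detect_user_operation(signal)
    if event is not None:
        return f"execute:{event}"  # S103: trigger the corresponding terminal action
    return None
```

For example, `run_pipeline([[120, 130, 140], [10, 20, 30]])` returns `"execute:light_switch_pressed"`, and the caller's frame list is emptied as a side effect, so no readable video survives the call.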
In the related art, in smart-home and other smart-city application scenes, visual signals are the principal and indispensable information source in each scene; applications include home monitoring, gesture recognition, face recognition, home healthcare, autonomous driving, live entertainment, and the like. With the visual signal as the core, the mainstream smart-home framework is built around the user, the data and software/hardware systems, and the necessary underlying AI and Internet-of-Things technologies; it increasingly improves people's daily life and has become an important component of intelligent living.
Although visual signals play an indispensable role in the smart home, in many application scenarios the smart-home system performs intelligent analysis directly on the original visual-signal data at its various processing stages. Clearly, once sensitive user privacy data in the original visual signal is leaked at any stage, a serious hidden privacy risk cannot be avoided.
Among visual signals, video data is particularly important, and the way a smart-home system processes video data cannot avoid the privacy risks that arise at its different stages. Privacy security in scenes such as the smart home is now taken very seriously, and it has become an important challenge for more advanced smart-home applications.
To address these typical problems, current desensitization of video data streams in scenes such as the smart home mainly applies further processing to the decoded data within the traditional video-coding pipeline. This protects private data to some extent and reduces the risk of exposure. However, if the data is leaked at any stage before decoding, the private content is exposed directly, creating a serious hidden privacy risk; moreover, post-decoding processing offers only limited control over the private content.
To address the above problem, in the present application, after video data is collected by the smart-home camera, the video data is converted into non-human-readable feature-signal data and the original video data is erased, so that the home scene is subsequently detected from the feature-signal data and the learned model. This resolves the safety hazards that readable original video data would pose if leaked.
Furthermore, to avoid the safety hazard of video data being cracked and exposed after leakage, the present invention proposes, for scenes with sensitive private data such as the smart home, to directly remove the original sensitive video data: the video data collected by the smart-home camera, containing a plurality of home-scene images, is instead directly characterized to obtain corresponding visual descriptor signals.
Specifically, as shown in fig. 2, the present application may convert the plurality of home-scene images into a CDVS or CDVA signal, so that the video data is processed through the feature signal. For many smart-home scenes, common functions include identification, retrieval, tracking, and the like; such functions do not need to operate on the original data signal. Many devices likewise take the original signal and then convert it into a feature signal for processing. Unlike the original visual signal, a feature signal is generally applied directly to model training and to in-machine scene recognition, analysis, and application. Feature signals are usually neither directly human-readable nor reversible: even if a visual feature signal is leaked, its holder cannot translate or interpret what it represents, so the risk of cracking and exposure is greatly reduced.
In addition, in the present application, after the plurality of home-scene images are converted into a CDVS or CDVA signal, the home-scene images are erased. This further avoids the safety hazards caused by leakage of the original video image data.
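The irreversibility claim above can be illustrated with a minimal sketch, assuming a toy descriptor (a coarse quantized intensity histogram standing in for CDVS/CDVA): the mapping is many-to-one, so two different frames can yield the same descriptor, and the original pixels cannot be reconstructed from it.

```python
def toy_descriptor(frame):
    """Quantize pixel intensities (0-255) into 4 coarse bins.
    The mapping is many-to-one, so the frame is not recoverable from it."""
    bins = [0, 0, 0, 0]
    for px in frame:
        bins[min(px // 64, 3)] += 1
    return tuple(bins)

frame_a = [10, 20, 200, 210]   # hypothetical readable frame
frame_b = [40, 50, 230, 250]   # a different frame with different pixels
desc_a = toy_descriptor(frame_a)
desc_b = toy_descriptor(frame_b)
# Both frames collapse to the same descriptor, demonstrating non-reversibility.
frame_a.clear()                # erase the originals after conversion
frame_b.clear()
```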
In addition, according to the present application, unreadable data such as CDVS/CDVA can be analyzed together with the scene detection network model and converted into an analysis result for each application scene, which removes the dependence on the original data and avoids the privacy-leakage and exposure risks of the original video data.
Specifically, the data generated while the visual descriptor signal is input to the preset scene detection network model to detect whether a user operation event exists in the target area is likewise neither human-readable nor reversible, and is unique to the process; this prevents the data from being cracked and exposed and effectively reduces the risk to private content.
In one implementation, to improve the decision-making performance of the smart home, as shown in fig. 3, when the smart home detects a manually-performed user operation event, it may record and store the visual feature signal and the manual operation result as training data for the scene detection network model. The newest data replaces the oldest, so the total amount of stored data remains unchanged. The processed data is then transmitted to the cloud for subsequent device updates, function upgrades, and the like. Because the transmitted data does not depend on the original video data, the privacy hazards that would follow from interception of the data are effectively avoided.
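The fixed-size, newest-replaces-oldest training store described above can be sketched with a bounded deque. The record fields and the class name are hypothetical; a real system would serialize the pairs and upload them to the cloud.

```python
from collections import deque

class TrainingBuffer:
    """Rolling store of (feature signal, manual operation result) pairs.
    When full, appending evicts the oldest record, keeping total storage constant."""
    def __init__(self, capacity: int):
        self.records = deque(maxlen=capacity)

    def record(self, feature_signal, operation_result):
        self.records.append((feature_signal, operation_result))

    def export_for_upload(self):
        """Package buffered pairs as model training data (no raw video involved)."""
        return list(self.records)

buf = TrainingBuffer(capacity=3)
for i in range(5):  # five events into a 3-slot buffer: the oldest two are evicted
    buf.record((0.1 * i,), f"manual_op_{i}")
```

After the loop, only the three newest pairs remain, matching the "latest data replaces old data, total storage unchanged" behavior in the text.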
In the application, a plurality of home-scene images of a target area can be captured by a camera device arranged at a smart-home terminal; the plurality of home-scene images are converted into corresponding visual descriptor signals, the visual descriptor signals are input into a preset scene detection network model, and whether a user operation event exists in the target area is detected; if so, the smart-home terminal is controlled to execute the action corresponding to the user operation event. By applying this technical solution, after video data is collected by the smart-home camera, the video data is converted into non-human-readable feature-signal data and the original video data is erased, so that the home scene is subsequently detected from the feature-signal data and the learning model. This resolves the safety hazards that arise when readable original video data is leaked.
Optionally, in another embodiment based on the above method of the present application, the converting the plurality of images of the home scene into corresponding visual descriptor signals includes:
and converting the plurality of home scene images into one of a CDVS signal or a CDVA signal.
Optionally, in another embodiment based on the foregoing method of the present application, after converting the plurality of home-scene images into a CDVS or CDVA signal, the method further includes:
and clearing the home scene image.
Optionally, in another embodiment based on the foregoing method of the present application, the inputting the visual descriptor signal to a preset scene detection network model, and detecting whether there is a user operation event in the target area includes:
determining whether human body characteristics exist in a target area where the intelligent home terminal is located currently according to the identification of the visual descriptor signal, wherein the human body characteristics comprise at least one of size characteristics, color characteristics and contour characteristics;
and if the human body characteristics are determined to exist, inputting the visual descriptor signal to a preset scene detection network model, and detecting the user operation event.
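A hedged sketch of this two-stage gating (a cheap human-feature check on the descriptor signal before running the full detection model); the feature heuristic, the 0.5 cut-off, and the toy model are placeholders for real size/color/contour screening and the trained network:

```python
def has_human_features(signal):
    """Stage 1: cheap screen on the descriptor signal for human-like
    size / color / contour cues (toy heuristic: any feature above 0.5)."""
    return any(f > 0.5 for f in signal)

def detect_event(signal, model):
    """Stage 2: invoke the scene detection network model only when stage 1 fires."""
    if not has_human_features(signal):
        return None  # no human present in the target area: skip the expensive model
    return model(signal)

# Hypothetical stand-in for the preset scene detection network model.
toy_model = lambda s: "user_operation" if sum(s) > 1.0 else None
```

The design point is that most frames with no person present never reach the heavier model, which saves computation on the terminal.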
Optionally, in another embodiment based on the foregoing method of the present application, after the controlling the smart home terminal to execute the action corresponding to the user operation event, the method further includes:
recording the association between the visual descriptor signal and the action executed for the user operation event, together with the associated data;
and combining the association and the associated data into model training data and uploading the model training data to a cloud, wherein the model training data are used for training the scene detection network model.
Optionally, in another embodiment based on the foregoing method of the present application, before the capturing, by a camera device disposed at the smart home terminal, a plurality of images of a home scene for a target area, the method further includes:
acquiring at least two sample images, wherein the sample images comprise at least one user operation instruction feature;
marking a corresponding user operation event for each sample image based on the user operation instruction characteristics;
and training a preset image semantic segmentation model by using the sample image marked with the user operation event and the user operation instruction characteristics included in the sample image to obtain the scene detection network model meeting preset conditions.
Optionally, in another embodiment based on the foregoing method of the present application, after obtaining the scene detection network model satisfying a preset condition, the method further includes:
carrying out model compression on the scene detection network model to obtain a compressed scene detection network model;
deploying the scene detection network model to a server side, and deploying the compressed scene detection network model to the intelligent home terminal;
after an operation instruction is obtained, determining an identification mode based on the operation state of the intelligent home terminal, wherein the identification mode corresponds to identification by using the scene detection network model or identification by using the compressed scene detection network model;
and determining to input the visual descriptor signal to a preset scene detection network model based on the recognition mode.
Further, after the scene detection network model is obtained, to avoid the drawback that an oversized model would occupy a large amount of smart-home memory, the present application may also compress the model to obtain a corresponding compressed scene detection network model with a smaller footprint.
Optionally, the scene detection network model may be compressed directly, for example by kernel sparsification and model pruning. Sparsifying the kernels requires support from sparse-computation libraries, and the acceleration achieved may be limited by many factors such as bandwidth and the degree of sparsity. Pruning, in turn, removes the unimportant filter parameters directly from the original model. Because neural networks are highly adaptive and large models tend to be redundant, the performance lost by removing some parameters can be recovered by retraining; thus, with suitable pruning and retraining schemes, the existing model can be compressed effectively to a large extent, which is the most commonly used approach at present.
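Magnitude-based filter pruning of the kind described can be sketched in pure Python. The L1-norm ranking criterion and the 50% keep ratio are illustrative assumptions, and the retraining step is represented only by a comment:

```python
def l1_norm(filt):
    """Importance score of one filter: sum of absolute weight values."""
    return sum(abs(w) for w in filt)

def prune_filters(filters, keep_ratio=0.5):
    """Drop the filters with the smallest L1 norm (the 'unimportant' ones),
    keeping the top keep_ratio fraction. Retraining would follow to recover
    the accuracy lost by removal."""
    n_keep = max(1, int(len(filters) * keep_ratio))
    ranked = sorted(filters, key=l1_norm, reverse=True)
    return ranked[:n_keep]

# Toy layer: four filters of two weights each.
layer = [[0.9, -0.8], [0.01, 0.02], [0.5, 0.4], [-0.03, 0.0]]
compressed = prune_filters(layer, keep_ratio=0.5)
# ... retrain the compressed model here to restore performance ...
```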
Furthermore, after the compressed scene detection network model with the small footprint is obtained, it can be deployed on the smart-home terminal. The smart home can then use the compressed scene detection network model to recognize the plurality of home-scene images collected by the camera device and subsequently convert them into the corresponding visual descriptor signals.
In addition, the scene detection network model with the larger footprint can be deployed on the server, and the identification mode is determined based on the running state of the smart home; the corresponding detection network model is then selected to detect whether a user operation event exists in the target area.
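The run-state-based choice between the on-device compressed model and the server-side full model can be sketched as follows; the load metric, the 0.8 threshold, and the string model stand-ins are hypothetical:

```python
def choose_model(device_load, local_model, server_model, load_threshold=0.8):
    """Pick the recognition mode from the terminal's running state:
    a busy terminal offloads to the full server-side model, otherwise
    it uses the smaller compressed model locally."""
    if device_load >= load_threshold:
        return ("server", server_model)
    return ("local", local_model)

mode, model = choose_model(0.3, local_model="compressed_net", server_model="full_net")
```

With a lightly loaded terminal (`device_load=0.3`), the compressed local model is chosen; a heavily loaded terminal would route the descriptor signal to the server-side model instead.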
By applying this technical solution, after video data is collected by the smart-home camera, the video data is converted into non-human-readable feature-signal data and the original video data is erased, so that the home scene is subsequently detected from the feature-signal data and the learning model. This resolves the safety hazards that arise when readable original video data is leaked.
Optionally, in another embodiment of the present application, as shown in fig. 4, the present application further provides a device for detecting a home scene. Which comprises the following steps:
the acquisition module 201, configured to capture a plurality of home-scene images of a target area by a camera device arranged at a smart-home terminal;
a conversion module 202, configured to convert the multiple home scene images into corresponding visual descriptor signals, input the visual descriptor signals into a preset scene detection network model, and detect whether a user operation event exists in the target area, where the visual descriptor signals are one of CDVS signals or CDVA signals;
and the execution module 203 is configured to control the smart home terminal to execute the action corresponding to the user operation event if the user operation event exists.
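The three-module split above (acquisition, conversion/detection, execution) can be sketched as a simple pipeline. The `capture`, `to_signal`, `detect`, and `act` callables are hypothetical stand-ins for the camera driver, the CDVS/CDVA encoder, the scene detection network, and the smart-home actuator respectively; none of these names come from the patent.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class DetectionDevice:
    """Sketch of the acquisition / conversion / execution module split."""
    capture: Callable[[], List[bytes]]        # acquisition module 201
    to_signal: Callable[[bytes], list]        # conversion module 202 (part 1)
    detect: Callable[[list], Optional[str]]   # conversion module 202 (part 2)
    act: Callable[[str], None]                # execution module 203

    def run_once(self) -> None:
        for image in self.capture():
            event = self.detect(self.to_signal(image))
            if event is not None:             # user operation event exists
                self.act(event)

# Toy wiring: any image whose signal starts like b"hand" triggers "light_on".
log = []
device = DetectionDevice(
    capture=lambda: [b"empty room", b"hand wave"],
    to_signal=lambda img: list(img),
    detect=lambda sig: "light_on" if sig[:4] == list(b"hand") else None,
    act=log.append,
)
device.run_once()
assert log == ["light_on"]
```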
In the application, a plurality of home scene images for a target area can be acquired by using a camera device arranged at the smart home terminal; the multiple home scene images are converted into corresponding visual descriptor signals, which are input into a preset scene detection network model to detect whether a user operation event exists in the target area; and if so, the smart home terminal is controlled to execute the action corresponding to the user operation event. By applying this technical scheme, after the video data are collected by the camera of the smart home, they are converted into unreadable feature signal data and the original video data are cleared, so that the home scene can subsequently be detected from the feature signal data and the learning model. This solves the problem that the readability of the original video data easily creates potential safety hazards once the data are leaked.
In another embodiment of the present application, the device is further configured to perform the following step:
clearing the home scene images.
In another embodiment of the present application, the device is further configured to perform the following steps:
determining whether human body characteristics exist in a target area where the intelligent home terminal is located currently according to the identification of the visual descriptor signal, wherein the human body characteristics comprise at least one of size characteristics, color characteristics and contour characteristics;
and if the human body characteristics exist, inputting the visual descriptor signal to a preset scene detection network model, and detecting the user operation event.
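The gating step above — run the scene detection network only when the descriptor signal shows human body characteristics — might look like the following sketch, where `has_human_features` and `scene_model` are hypothetical placeholders for the size/color/contour check and the trained network.

```python
from typing import Callable, List, Optional

def detect_user_event(
    signal: List[float],
    has_human_features: Callable[[List[float]], bool],
    scene_model: Callable[[List[float]], Optional[str]],
) -> Optional[str]:
    """Run the (expensive) scene detection network only when the visual
    descriptor signal indicates a human in the target area."""
    if not has_human_features(signal):
        return None                     # skip inference: nobody is present
    return scene_model(signal)

# Toy check: "human present" iff any feature value exceeds a threshold.
def has_human(sig):
    return max(sig) > 0.5

def model(sig):
    return "turn_on_tv"

assert detect_user_event([0.1, 0.2], has_human, model) is None
assert detect_user_event([0.9, 0.2], has_human, model) == "turn_on_tv"
```

Skipping inference when no human features are found is what saves compute on the terminal.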
In another embodiment of the present application, the device is further configured to perform the following steps:
recording the association relation between the visual descriptor signal and the action of executing the user operation event, together with the associated data;
and after the association relation and the associated data are combined into model training data, uploading the model training data to a cloud, wherein the model training data are used for training the scene detection network model.
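Recording the association between a descriptor signal and the executed action, then combining the records into model training data for upload to the cloud, could be sketched as follows. `record_association`, `upload_training_batch`, and the `send` transport callback are illustrative names introduced here, not APIs from the patent.

```python
import json
import time

def record_association(signal, event, action_result):
    """Bundle a descriptor signal with the action it triggered so the pair
    can later serve as a training sample for the scene detection network."""
    return {
        "signal": list(signal),
        "event": event,
        "result": action_result,
        "timestamp": time.time(),
    }

def upload_training_batch(records, send):
    """Combine association records into one training payload and push it to
    the cloud. `send` is a hypothetical transport (e.g. an HTTPS POST)."""
    payload = json.dumps({"samples": records})
    send(payload)
    return payload

sent = []
rec = record_association([0.9, 0.1], "turn_on_tv", "ok")
upload_training_batch([rec], sent.append)
assert json.loads(sent[0])["samples"][0]["event"] == "turn_on_tv"
```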
In another embodiment of the present application, the device is further configured to perform the following steps:
the method comprises the steps of obtaining at least two sample images, wherein the sample images comprise at least one user operation instruction characteristic;
marking a corresponding user operation event for each sample image based on the user operation instruction characteristics;
and training a preset image semantic segmentation model by using the sample image marked with the user operation event and the user operation instruction characteristics included in the sample image to obtain a scene detection network model meeting preset conditions.
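The label-then-train procedure above (sample images marked with user operation events, used to train a detection model) can be illustrated with a deliberately minimal classifier. The patent trains an image semantic segmentation model; the single-layer logistic classifier below is only a stand-in showing the same supervised loop over labelled descriptor features.

```python
import numpy as np

def train_scene_detector(samples, labels, epochs=200, lr=0.5):
    """Minimal stand-in for training the scene detection network: a
    logistic classifier fitted by gradient descent on labelled samples."""
    X = np.asarray(samples, dtype=np.float64)
    y = np.asarray(labels, dtype=np.float64)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid prediction
        grad = p - y                             # cross-entropy gradient
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    # Return a predictor: True means "user operation event detected".
    return lambda x: bool(1.0 / (1.0 + np.exp(-(np.asarray(x) @ w + b))) > 0.5)

# Two labelled "sample images": feature[0] high marks an operation event.
detect = train_scene_detector([[1.0, 0.0], [0.0, 1.0]], [1, 0])
assert detect([1.0, 0.0]) is True
assert detect([0.0, 1.0]) is False
```

The real model would of course be far larger; the point is only the flow from labelled samples to a model meeting a preset condition.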
In another embodiment of the present application, the device is further configured to perform the following steps:
carrying out model compression on the scene detection network model to obtain a compressed scene detection network model;
deploying the scene detection network model to a server side, and deploying the compressed scene detection network model to the intelligent home terminal;
after an operation instruction is obtained, determining an identification mode based on the operation state of the intelligent home terminal, wherein the identification mode corresponds to identification by using the scene detection network model or identification by using the compressed scene detection network model;
and determining to input the visual descriptor signal to a preset scene detection network model based on the identification mode.
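Choosing between the full server-side model and the compressed on-terminal model based on the terminal's running state might be implemented along these lines. The load threshold, the inputs, and the mode names are illustrative assumptions, not values from the patent.

```python
def choose_recognition_mode(cpu_load: float, network_ok: bool,
                            threshold: float = 0.7) -> str:
    """Pick which deployed scene detection model handles the signal.

    The full model lives on the server; the compressed copy lives on the
    smart home terminal. Offload only when the terminal is busy AND the
    server is reachable; otherwise stay on-device.
    """
    if network_ok and cpu_load > threshold:
        return "server_full_model"        # offload to the larger model
    return "terminal_compressed_model"    # local, compressed model

assert choose_recognition_mode(0.9, network_ok=True) == "server_full_model"
assert choose_recognition_mode(0.3, network_ok=True) == "terminal_compressed_model"
assert choose_recognition_mode(0.9, network_ok=False) == "terminal_compressed_model"
```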
Fig. 5 is a block diagram illustrating a logical structure of an electronic device in accordance with an exemplary embodiment. For example, the electronic device 300 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium, such as a memory, including instructions executable by a processor of an electronic device to perform the method of detecting a home scene, the method including: acquiring a plurality of home scene images aiming at a target area by utilizing a camera device arranged at an intelligent home terminal; converting the multiple home scene images into corresponding visual descriptor signals, inputting the visual descriptor signals into a preset scene detection network model, and detecting whether a user operation event exists in the target area; and if so, controlling the intelligent home terminal to execute the action corresponding to the user operation event. Optionally, the instructions may also be executable by a processor of the electronic device to perform other steps involved in the exemplary embodiments described above. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, there is also provided an application/computer program product including one or more instructions executable by a processor of an electronic device to perform the method of detecting a home scenario described above, the method including: acquiring a plurality of home scene images aiming at a target area by utilizing a camera device arranged at an intelligent home terminal; converting the multiple home scene images into corresponding visual descriptor signals, inputting the visual descriptor signals into a preset scene detection network model, and detecting whether a user operation event exists in the target area; and if so, controlling the intelligent home terminal to execute the action corresponding to the user operation event. Optionally, the instructions may also be executable by a processor of the electronic device to perform other steps involved in the exemplary embodiments described above.
Those skilled in the art will appreciate that fig. 5 is merely an example of the electronic device 300 and does not constitute a limitation of the electronic device 300, which may include more or fewer components than those shown, combine certain components, or have different components; for example, the electronic device 300 may also include input-output devices, network access devices, buses, etc.
The Processor 302 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. The general purpose processor may be a microprocessor or the processor 302 may be any conventional processor or the like, and the processor 302 is the control center of the electronic device 300 and connects the various parts of the entire electronic device 300 using various interfaces and lines.
The memory 301 may be used to store computer readable instructions, and the processor 302 may implement various functions of the electronic device 300 by executing or invoking the computer readable instructions or modules stored in the memory 301 and by calling the data stored in the memory 301. The memory 301 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to the use of the electronic device 300, and the like. In addition, the memory 301 may include a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Memory Card (Flash Card), at least one disk storage device, a flash memory device, a Read-Only Memory (ROM), a Random Access Memory (RAM), or other non-volatile/volatile storage devices.
The modules integrated by the electronic device 300 may be stored in a computer-readable storage medium if they are implemented in the form of software functional modules and sold or used as separate products. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by hardware related to computer readable instructions, which may be stored in a computer readable storage medium, and when the computer readable instructions are executed by a processor, the steps of the method embodiments may be implemented.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (9)

1. A method for detecting a home scene is characterized by comprising the following steps:
acquiring a plurality of home scene images aiming at a target area by utilizing a camera device arranged at an intelligent home terminal;
converting the multiple home scene images into corresponding visual descriptor signals, inputting the visual descriptor signals into a preset scene detection network model, and detecting whether a user operation event exists in the target area, wherein the visual descriptor signals are one of CDVS signals or CDVA signals;
and if so, controlling the intelligent home terminal to execute the action corresponding to the user operation event.
2. The method of claim 1, further comprising, after said converting the plurality of home scene images to one of a CDVS signal or a CDVA signal:
and clearing the home scene image.
3. The method of claim 1, wherein inputting the visual descriptor signal to a preset scene detection network model, detecting whether a user operation event exists in the target area comprises:
determining whether human body characteristics exist in a target area where the intelligent home terminal is located currently according to the identification of the visual descriptor signal, wherein the human body characteristics comprise at least one of size characteristics, color characteristics and contour characteristics;
and if the human body characteristics are determined to exist, inputting the visual descriptor signal to a preset scene detection network model, and detecting the user operation event.
4. The method according to claim 1, wherein after the controlling the smart home terminal to execute the action corresponding to the user operation event, the method further comprises:
recording the association relation between the visual descriptor signal and the action of executing the user operation event, together with the associated data;
and after the association relation and the associated data are combined into model training data, uploading the model training data to a cloud, wherein the model training data are used for training the scene detection network model.
5. The method according to claim 1, wherein before the acquiring, by using a camera device arranged at the smart home terminal, a plurality of home scene images for a target area, the method further comprises:
acquiring at least two sample images, wherein the sample images comprise at least one user operation instruction feature;
marking a corresponding user operation event for each sample image based on the user operation instruction characteristics;
and training a preset image semantic segmentation model by using the sample image marked with the user operation event and the user operation instruction characteristics included in the sample image to obtain a scene detection network model meeting preset conditions.
6. The method of claim 5, wherein after obtaining the scene detection network model satisfying a preset condition, the method further comprises:
carrying out model compression on the scene detection network model to obtain a compressed scene detection network model;
deploying the scene detection network model to a server side, and deploying the compressed scene detection network model to the intelligent home terminal;
after an operation instruction is obtained, determining an identification mode based on the operation state of the intelligent home terminal, wherein the identification mode corresponds to identification by using the scene detection network model or identification by using the compressed scene detection network model;
and determining to input the visual descriptor signal to a preset scene detection network model based on the identification mode.
7. An apparatus for detecting a home scene, comprising:
an acquisition module, configured to acquire a plurality of home scene images for a target area by using a camera device arranged at the smart home terminal;
the conversion module is configured to convert the multiple home scene images into corresponding visual descriptor signals, input the visual descriptor signals into a preset scene detection network model, and detect whether a user operation event exists in the target area, wherein the visual descriptor signals are one of CDVS signals or CDVA signals;
and an execution module, configured to control the smart home terminal to execute the action corresponding to the user operation event if the user operation event exists.
8. An electronic device, comprising:
a memory for storing executable instructions; and
a processor for executing the executable instructions stored in the memory to perform the operations of the method for detecting a home scene of any one of claims 1-6.
9. A computer-readable storage medium storing computer-readable instructions, wherein the instructions, when executed, perform the operations of the method for detecting a home scene of any one of claims 1-6.
CN202210044678.4A 2022-01-14 2022-01-14 Method and device for detecting home scene, electronic equipment and medium Pending CN114519818A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210044678.4A CN114519818A (en) 2022-01-14 2022-01-14 Method and device for detecting home scene, electronic equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210044678.4A CN114519818A (en) 2022-01-14 2022-01-14 Method and device for detecting home scene, electronic equipment and medium

Publications (1)

Publication Number Publication Date
CN114519818A true CN114519818A (en) 2022-05-20

Family

ID=81596479

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210044678.4A Pending CN114519818A (en) 2022-01-14 2022-01-14 Method and device for detecting home scene, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN114519818A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106803943A (en) * 2016-03-31 2017-06-06 小蚁科技(香港)有限公司 Video monitoring system and equipment
CN110659333A (en) * 2019-08-23 2020-01-07 浙江省北大信息技术高等研究院 Multi-level visual feature description method and visual retrieval system
WO2020116177A1 (en) * 2018-12-05 2020-06-11 ソニー株式会社 Image capturing element, image capturing device and method
CN111752165A (en) * 2020-07-10 2020-10-09 广州博冠智能科技有限公司 Intelligent equipment control method and device of intelligent home system
WO2021050007A1 (en) * 2019-09-11 2021-03-18 Nanyang Technological University Network-based visual analysis
CN112558760A (en) * 2020-11-30 2021-03-26 青岛海信日立空调系统有限公司 Air conditioner and control method
CN113627339A (en) * 2021-08-11 2021-11-09 普联技术有限公司 Privacy protection method, device and equipment


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
曾子明; 秦思琪: "Research on Mobile Visual Search Service for Smart Libraries and Its Technical Framework", 情报资料工作 (Information and Documentation Services), no. 04, 25 July 2017 (2017-07-25) *
高文; 田永鸿; 王坚: "Digital Retina: A Key Link in the Evolution of Smart City Systems", 中国科学:信息科学 (Scientia Sinica Informationis), no. 08, 20 August 2018 (2018-08-20) *

Similar Documents

Publication Publication Date Title
Bayar et al. On the robustness of constrained convolutional neural networks to jpeg post-compression for image resampling detection
CN109858371B (en) Face recognition method and device
CN110705405B (en) Target labeling method and device
CN108805047B (en) Living body detection method and device, electronic equipment and computer readable medium
CN109948497B (en) Object detection method and device and electronic equipment
CN112381775B (en) Image tampering detection method, terminal device and storage medium
CN110688524B (en) Video retrieval method and device, electronic equipment and storage medium
CN111476234B (en) License plate character shielding recognition method and device, storage medium and intelligent equipment
CN107944381B (en) Face tracking method, face tracking device, terminal and storage medium
CN113128368A (en) Method, device and system for detecting character interaction relationship
CN111325107A (en) Detection model training method and device, electronic equipment and readable storage medium
CN111461202A (en) Real-time thyroid nodule ultrasonic image identification method and device
CN112989098B (en) Automatic retrieval method and device for image infringement entity and electronic equipment
CN113158773B (en) Training method and training device for living body detection model
CN111680670B (en) Cross-mode human head detection method and device
CN110210425B (en) Face recognition method and device, electronic equipment and storage medium
CN114519818A (en) Method and device for detecting home scene, electronic equipment and medium
CN111866573B (en) Video playing method and device, electronic equipment and storage medium
CN114842319A (en) Method and device for detecting home scene, electronic equipment and medium
CN112381055A (en) First-person perspective image recognition method and device and computer readable storage medium
CN113723310A (en) Image identification method based on neural network and related device
CN112348112A (en) Training method and device for image recognition model and terminal equipment
CN112270257A (en) Motion trajectory determination method and device and computer readable storage medium
CN111062337B (en) People stream direction detection method and device, storage medium and electronic equipment
US20230274377A1 (en) An end-to-end proctoring system and method for conducting a secure online examination

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination