WO2024093296A1 - A wake-up method and device - Google Patents

A wake-up method and device

Info

Publication number
WO2024093296A1
Authority
WO
WIPO (PCT)
Prior art keywords: network, image, wake, level, sub
Application number
PCT/CN2023/103466
Other languages
English (en)
French (fr)
Inventor
余家林
黄婧
余智平
秘谧
Original Assignee
Huawei Technologies Co., Ltd.
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2024093296A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/4401 Bootstrapping

Definitions

  • the present application relates to the field of security monitoring technology, and in particular to a wake-up method and device.
  • Humanoid recognition technology refers to technology that uses certain characteristics of human-body imaging and processes graphic images to ultimately discover, identify, and locate humanoid targets in the imaging space. Humanoid recognition is an important technology of intelligent security systems and can be widely used in fields such as intelligent monitoring, intelligent transportation, and target tracking, for example in smart door locks and monitoring equipment.
  • existing smart door locks mainly use low-power devices such as image sensors, infrared sensors, pyroelectric infrared sensors (PIR) and two-level smart motion detection (SMD) for human recognition.
  • the first-level low-frame rate SMD and PIR are used together as the first-level wake-up interrupt source to detect the target motion or thermal motion status inside the region of interest (ROI) of the scene.
  • the second-level high-frame rate SMD can be entered to further determine whether there is really a human figure recognized in the surveillance image.
  • the current human recognition method can only enable smart door locks to recognize moving creatures, but cannot distinguish whether the moving creatures are humans or animals. As a result, the recognition accuracy of this method is not high, and many false alarms, missed alarms or late alarms will occur, seriously reducing the user experience and the credibility of the product warning function.
  • the present application provides a wake-up method and device for realizing human figure recognition of a smart door lock with high accuracy.
  • the present application provides a wake-up method, which can be applied to the aforementioned smart door lock and other detection devices.
  • the detection device acquires an image, such as the detection device includes a camera, and the camera can collect images.
  • the detection device inputs the acquired image into a first-level wake-up network to obtain a first detection result output by the first-level wake-up network.
  • the first-level wake-up network is used to detect the target object of the input image. If the first detection result indicates that there is a target object such as a human figure in the image, the second-level wake-up network is awakened.
  • the second-level wake-up network is also used to detect the target object of the image, but the detection accuracy of the second-level wake-up network is higher than the detection accuracy of the first-level wake-up network.
  • the image is further detected using the second-level wake-up network to obtain a second detection result output by the second-level wake-up network; when the second detection result indicates that there is a target object in the image, the wake-up processing unit performs a preset operation.
  • the first-level wake-up network and the second-level wake-up network can identify the target object such as a human figure in a single-frame image.
  • when the first-level wake-up network detects the presence of a human figure in the image, the second-level wake-up network is awakened; when the second-level wake-up network also detects the human figure, the processing unit is awakened, thereby realizing two-level wake-up of a single-frame image and reducing power consumption.
  • the target object perception accuracy of the second-level wake-up network is higher than that of the first-level wake-up network.
  • the second-level wake-up network can filter the detection results of the first-level wake-up network, which can greatly reduce the frequency of false awakening and missed awakening, and improve the target object perception accuracy.
  • since the present application can realize two-level wake-up of a single-frame image, it does not rely on time-domain information, reduces the storage burden, and can respond rapidly, reducing or avoiding delayed alarms.
  • the first-level wake-up network includes a first subnetwork and a second subnetwork, the first subnetwork is used for feature extraction, and the second subnetwork is used for detecting the target object based on the features extracted by the first subnetwork; when the detection device inputs the image into the first-level wake-up network, it includes: the detection device inputs multiple parts of the image into the first subnetwork in sequence to obtain the features extracted by the first subnetwork for each part; determines the fused features corresponding to the image based on the features corresponding to each part; and uses the fused features as input data of the second subnetwork to obtain the first detection result output by the second subnetwork.
  • the input data of the first sub-network is a partial image in a complete image frame.
  • the features of the partial images are extracted respectively and then feature fusion is performed to determine the fusion features corresponding to a frame of image.
  • the second sub-network can determine whether there is a target object in the image based on the fusion features. This method does not require ROI setting, cropping or scaling of the image, which simplifies the wake-up process.
  • the running memory and intermediate-layer data cache of the first-level wake-up network are only 1/N of those required at the original image resolution; while the perception accuracy of the target object is maintained, the input cache and computation amount of the second sub-network are reduced. This solves the prior-art problem of having to cache full-frame data and greatly reduces the data-cache requirement.
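The streaming scheme described above can be sketched as follows. The "feature extractor", "fusion", and "classifier" below are trivial hypothetical stand-ins (a per-part mean, concatenation, and a threshold), not the patented networks; only the structure (process N rows at a time, fuse, then classify) reflects the text.

```python
def extract_part_features(rows):
    # Hypothetical stand-in for the first subnetwork: one scalar per part.
    return sum(sum(r) for r in rows) / (len(rows) * len(rows[0]))

def fuse(features):
    # Hypothetical fusion: simple concatenation of per-part features.
    return list(features)

def first_level_detect(image, rows_per_part=2, threshold=0.5):
    # Stream the frame through the extractor a few rows at a time, so the
    # line buffer only ever holds rows_per_part rows, not the full frame.
    parts = [image[i:i + rows_per_part] for i in range(0, len(image), rows_per_part)]
    fused = fuse(extract_part_features(p) for p in parts)
    # Hypothetical second subnetwork: mean-feature threshold as a binary classifier.
    return sum(fused) / len(fused) > threshold

image = [[0.9, 0.8], [0.7, 0.9], [0.6, 0.5], [0.8, 0.7]]  # 4x2 toy "frame"
print(first_level_detect(image))  # True: mean fused feature exceeds 0.5
```

The point of the design is that memory scales with the part size rather than the frame size, which is why no ROI setting, cropping, or scaling is needed.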
  • the detection device includes a memory, and each part of the image is one or more rows of data in the image read from a target memory space. Different parts include different image data.
  • the target memory space can be the entire space of the memory, and the capacity of the target memory space is smaller than the amount of data in the image.
  • the detection device uses the secondary wake-up network to detect the image, including: the detection device inputs the fusion feature corresponding to the image into the secondary wake-up network to obtain the second detection result output by the secondary wake-up network.
  • the secondary wake-up network can detect target objects in the image based on the fusion features, realize the reuse of the first sub-network in the primary wake-up network and the secondary wake-up network, and further reduce power consumption.
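The reuse described here amounts to computing the first sub-network's features once and feeding them to both wake-up levels. A minimal sketch under that assumption, with hypothetical backbone and heads:

```python
def shared_backbone(image):
    # Hypothetical first subnetwork: one coarse feature per image row.
    return [sum(row) / len(row) for row in image]

def primary_head(features, threshold=0.5):
    # Low-precision head: a single global statistic.
    return max(features) > threshold

def secondary_head(features, threshold=0.5, min_rows=2):
    # Higher-precision head reusing the SAME features: requires the
    # evidence to persist across several rows before waking the unit.
    return sum(f > threshold for f in features) >= min_rows

image = [[0.1, 0.2], [0.8, 0.9], [0.7, 0.6]]
feats = shared_backbone(image)      # computed once
if primary_head(feats):             # first-level wake-up
    wake = secondary_head(feats)    # second level reuses the features
    print("wake processing unit:", wake)
```

Because the backbone runs only once per frame, the second level adds only the cost of its head, which is the power saving the bullet describes.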
  • the detection device uses the secondary wake-up network to detect the image, including: the detection device inputs the image into the secondary wake-up network to obtain a second detection result output by the secondary wake-up network.
  • the power consumption of the primary wake-up network is lower than a first set value, the power consumption of the secondary wake-up network is lower than a second set value, and the first set value is not higher than the second set value.
  • a low-power wake-up network is used to realize target object recognition, meeting the low-power consumption requirement of the detection device.
  • the present application provides a detection device, which is used to execute the first aspect and any possible method of the first aspect.
  • the beneficial effects can be referred to the relevant description of the first aspect and will not be repeated here.
  • the detection device includes a camera, a first processor, and a second processor. The camera is used to capture an image; the first processor is used to obtain the image captured by the camera and input it into a primary wake-up network to obtain a first detection result output by the primary wake-up network, the primary wake-up network being used to detect a target object in the image. When the first detection result indicates that there is a target object in the image, the secondary wake-up network is awakened and the primary wake-up network is interrupted; the second processor then uses the secondary wake-up network to detect the image to obtain a second detection result output by the secondary wake-up network, the detection accuracy of the secondary wake-up network being higher than that of the primary wake-up network.
  • the first-level wake-up network includes a first subnetwork and a second subnetwork, the first subnetwork is used for feature extraction, and the second subnetwork is used for detecting the target object based on the features extracted by the first subnetwork; when the first processor uses the image as input data of the first-level wake-up network to obtain a first detection result output by the first-level wake-up network, it is specifically used to: input multiple parts of the image into the first subnetwork in sequence to obtain the features extracted by the first subnetwork for each part; determine the fusion features corresponding to the image according to the features of each part of the multiple parts; and use the fusion features as input data of the second subnetwork to obtain the first detection result output by the second subnetwork.
  • the device also includes a memory, and each part of the image is one or more rows of data in the image read from a target memory space. Different parts include different image data.
  • the target memory space can be the entire space of the memory, and the capacity of the target memory space is smaller than the amount of data in the image.
  • when the second processor uses the secondary wake-up network to detect the image, it is specifically used to: input the fusion feature corresponding to the image into the secondary wake-up network to obtain a second detection result output by the secondary wake-up network.
  • when the second processor uses the secondary wake-up network to detect the image, it is specifically configured to: input the image into the secondary wake-up network to obtain a second detection result output by the secondary wake-up network.
  • the power consumption of the primary wake-up network is lower than a first set value, the power consumption of the secondary wake-up network is lower than a second set value, and the first set value is not higher than the second set value.
  • the present application provides a detection device, and the beneficial effects can be found in the description of the first aspect and will not be repeated here.
  • the device has the function of implementing the behavior in the method example of the first aspect.
  • the function can be implemented by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more units corresponding to the above functions.
  • the structure of the device includes an acquisition unit, a detection unit, and a processing unit. These units can perform the corresponding functions in the method example of the above-mentioned first aspect; please refer to the detailed description in the method example, which will not be repeated here.
  • an embodiment of the present application provides a computer-readable storage medium, in which a computer program or instructions are stored.
  • when the computer program or instructions are executed by a detection device, the detection device performs the method described in the first aspect or any design of the first aspect.
  • an embodiment of the present application provides a computer program product, which includes a computer program or instructions.
  • when the computer program or instructions are executed by a detection device, the method described in the first aspect or any design of the first aspect is implemented.
  • FIG. 1 is a schematic diagram of a possible system architecture provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a scene of the prior art.
  • FIG. 3 is a schematic diagram of a possible system architecture provided in an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a wake-up method provided in an embodiment of the present application.
  • FIG. 5 is a schematic diagram of another possible system architecture provided in an embodiment of the present application.
  • FIG. 6 is a schematic diagram of the structure of a detection device provided in an embodiment of the present application.
  • FIG. 7 is a flow chart of another wake-up method provided in an embodiment of the present application.
  • Fig. 1 is a schematic diagram of a monitoring system architecture provided by an embodiment of the present application.
  • the system includes an image acquisition device 1 and a detection device 2, and the detection device 2 may include a detection unit 20 and a processing unit 30.
  • the image acquisition device 1 and the detection device 2 can communicate via wired (such as a bus) or wireless (such as a network), and in one implementation, the image acquisition device 1 is integrated in the detection device 2, and the two can be connected via a bus.
  • the image acquisition device 1 and the detection device 2 are respectively located in two independent devices, and the two can communicate via a network.
  • the image acquisition device 1 can be a camera for acquiring images of the object being photographed.
  • the image acquisition device 1 can acquire images after receiving a shooting instruction, or the image acquisition device 1 can acquire images periodically.
  • a camera usually acquires one frame of image per unit time (such as 1s or 1ms, etc., not specifically limited).
  • the detection unit 20 can be used to detect the target object (including target object recognition) in the image. For example, if the target object is a human figure, the detection unit 20 can be used to recognize whether there is a human figure in the image. For another example, if the target object is a vehicle, the detection unit 20 can be used to recognize whether there is a vehicle in the image, etc.
  • the detection unit 20 can be software, hardware, or a combination of hardware and software.
  • the processing unit 30 can be used for data processing and calculation.
  • the processing unit 30 can be used for face detection, notification alarm, log recording, etc., without specific limitation.
  • the processing unit 30 can be a component with computing power such as a central processing unit (CPU) or a microcontroller (MCU).
  • the processing unit 30 has multiple power consumption modes, such as sleep mode and working mode. When in sleep mode, the processing unit 30 only needs to perform a few operations or maintain some basic functions, thereby achieving a low power consumption state. When in working mode, the processing unit 30 needs to perform more processing operations to achieve a high power consumption state. In order to reduce the power consumption of the system, the present application can make the processing unit 30 in sleep mode by default. When a specific processing operation needs to be performed, the detection unit 20 wakes up the processing unit 30.
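The two power modes of the processing unit can be modeled as a small state machine. The class and method names below are illustrative, not from the patent; only the default-sleep, wake-on-detection behavior reflects the text.

```python
from enum import Enum

class PowerMode(Enum):
    SLEEP = "sleep"      # minimal activity, low power
    WORKING = "working"  # full processing, high power

class ProcessingUnit:
    # Hypothetical model of the processing unit's power states.
    def __init__(self):
        self.mode = PowerMode.SLEEP  # sleep by default to save power

    def wake(self):
        # Called by the detection unit when a target object is found.
        self.mode = PowerMode.WORKING

    def finish(self):
        # Return to sleep once the preset operation is done.
        self.mode = PowerMode.SLEEP

unit = ProcessingUnit()
print(unit.mode)   # PowerMode.SLEEP
unit.wake()        # detection unit detected a target object
print(unit.mode)   # PowerMode.WORKING
```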
  • the architecture of the system shown in Figure 1 is only an example. In actual applications, the system applicable to the present application may have more or fewer units relative to Figure 1.
  • the system may also include an audio unit, such as a speaker, a microphone, and a display unit, such as a display screen.
  • the system provided in the embodiment of the present application can be applied to various visual sensing fields such as security monitoring, doorbell access control, automatic driving, and household robots.
  • the system can be a monitoring device, such as an IP Camera (IPC), which can be installed indoors (or outdoors, without limitation), wherein the image acquisition device 1 is used to capture images of indoor areas, and the detection device 2 uses the images captured by the image acquisition device 1 as input data of the detection unit 20.
  • when the detection unit 20 detects that there is a human figure in the image, it wakes up the processing unit 30; the processing unit 30 can then perform face detection on the image, and when it detects that the face in the image belongs to a stranger, an alarm notification can be sent to the householder.
  • the operations performed by the processing unit 30 described here and below are only an example, and the present application does not limit this.
  • the system can be a smart door lock, which is usually installed on the entrance door.
  • the image acquisition device 1 is used to collect images in the area near the entrance door.
  • the detection device 2 uses the image collected by the image acquisition device 1 as input data of the detection unit 20.
  • when a human figure is detected, the processing unit 30 is awakened; when the processing unit 30 detects the presence of a stranger in the image, it sends an alarm notification to the homeowner.
  • the system in the field of autonomous driving, can be a vehicle-mounted device, such as a driving recorder.
  • the image acquisition device 1 in the driving recorder is used to acquire images in the area near the vehicle.
  • the detection device 2 uses the image acquired by the image acquisition device 1 as input data of the detection unit 20.
  • when the detection unit 20 detects the presence of a human figure in the image, it wakes up the processing unit 30; when the processing unit 30 detects the presence of a stranger in the image, it can record the image of the stranger in the form of a log, or send a notification to the car owner.
  • the design of the detection unit 20 in the smart door lock mostly adopts multi-level low-power devices (such as pyroelectric infrared sensor (PIR), smart motion detection (SMD), etc.), and multi-level low-power devices are awakened step by step (also called multi-level low-power wake-up) to save energy, so as to achieve the low power consumption requirement of the smart door lock.
  • FIG2 shows an internal architecture of an existing smart door lock.
  • the detection unit 20 in the smart door lock includes a primary SMD, a PIR and a secondary SMD, and its processing flow mainly includes:
  • the image acquisition device 1 acquires images in real time, and what it acquires is a panoramic image.
  • a regular or irregular region of interest (ROI) is set in the panoramic image.
  • the ROI is usually manually set according to the range to be detected.
  • the ROI is an area in the image taken by the image acquisition device 1, such as an entrance, a corridor, etc., where there is a high probability of human figures.
  • the first-level SMD can be used to detect moving objects for two adjacent frames of input images (here, the ROI images in the two frames of images). Specifically, when there is a pixel difference between the two frames of images, it is considered that there is object movement. For example, when a person, animal or non-living thing enters the shooting area of the smart door lock, the first-level SMD detects the movement of the object. At this time, the first-level SMD will wake up the second-level SMD.
  • when the PIR connected in parallel with the primary SMD detects thermal motion, it will also wake up the secondary SMD; in other words, either the primary SMD or the PIR can wake up the secondary SMD. Once the secondary SMD is woken up, the primary SMD is interrupted, and images subsequently captured by the image acquisition device 1 will be input to the secondary SMD for further human-figure recognition.
  • the secondary SMD may perform motion detection on the i+1-th frame ROI image and the i+2-th frame ROI image, or on the i+3-th frame ROI image and the i+4-th frame ROI image. If the secondary SMD still detects a moving object, the secondary SMD awakens the processing unit 30 (i.e., the MCU in FIG. 2 ). In this way, multi-level low-power awakening is achieved.
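The SMD stage in this prior-art pipeline is essentially frame differencing on adjacent ROI frames. A toy sketch with hypothetical thresholds (the actual SMD thresholds are device-specific and not given in the text):

```python
def smd_motion(frame_a, frame_b, pixel_threshold=10, count_threshold=1):
    # Declare motion when enough pixels change between two adjacent
    # ROI frames; this is why it cannot tell humans from curtains.
    changed = sum(
        1
        for row_a, row_b in zip(frame_a, frame_b)
        for pa, pb in zip(row_a, row_b)
        if abs(pa - pb) > pixel_threshold
    )
    return changed >= count_threshold

roi_t0 = [[10, 10], [10, 10]]
roi_t1 = [[10, 10], [50, 10]]  # one pixel changed: something moved
print(smd_motion(roi_t0, roi_t1))  # True
```

In the prior-art flow, a low-frame-rate version of this check (or the PIR) triggers a higher-frame-rate version, and only then is the MCU woken, which is exactly where the false and missed wake-ups described below come from.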
  • the secondary SMD can play the role of filtering the primary SMD, that is, to eliminate a large number of useless false wake-up frequencies, which can reduce power consumption.
  • although the smart door lock shown in Figure 2 can meet the low-power-consumption requirement, the SMD itself cannot distinguish human from non-human movement (such as curtain movement or pixel changes caused by lighting changes), and the PIR cannot identify whether thermal movement comes from a human or an animal, so this technology still produces a large number of false wake-ups, which increases additional system power consumption. At the same time, a large number of false wake-ups puts computing-resource and storage pressure on the subsequent detection process, affecting end-to-end computing efficiency.
  • missed awakening may occur due to the human figure quickly leaving the scene.
  • by the time the secondary SMD is awakened, the human figure may already have left, so no human figure is captured in the ROI image input to the secondary SMD, resulting in missed awakening.
  • the secondary SMD may not detect the moving object until after a gap of multiple frames, resulting in delayed awakening.
  • because the SMD cannot detect stationary human figures, stationary human targets may also be missed.
  • the present application provides a new architecture of the detection unit 20 and a corresponding wake-up method.
  • the detection unit 20 may include multiple levels of detection networks (also called wake-up networks) with target object detection functions, such as neural network models, and the detection accuracy of each level of the wake-up network increases step by step.
  • the detection unit 20 can distinguish between human figures (including static human figures) and non-human figures in the image based on a single frame image, thereby reducing the frequency of false awakening, missed awakening, and delayed awakening, saving storage overhead, and improving detection accuracy.
  • the structure of the detection unit 20 provided in the embodiment of the present application is described by taking the system architecture shown in FIG1 as an example.
  • the following description takes a two-level wake-up network (i.e., a multi-level wake-up network that includes the secondary wake-up network) as an example.
  • FIG3 shows a schematic diagram of the structure of the detection unit 20 provided in the embodiment of the present application.
  • the detection unit 20a includes a primary wake-up network 301 and a secondary wake-up network 302.
  • the primary wake-up network can be used to detect the target object in the image.
  • the target object can be a human figure, an animal, or a non-biological object such as a vehicle or an obstacle, etc., without specific limitation.
  • a human figure is usually detected.
  • the following description takes the target object as a human figure as an example.
  • the human figure in the following text can be replaced with the target object.
  • the first-level awakening network 301 can be used to perform human figure recognition on the image captured by the image acquisition device 1. Specifically, the first-level awakening network 301 can extract image features of the input image and perform human figure recognition based on the image features to determine whether there is a human figure in the input image, and when a human figure is recognized in the image, the second-level awakening network 302 is awakened. The second-level awakening network 302 can also be used to perform human figure recognition on the image, and when a human figure is recognized in the image, the processing unit 30 is awakened.
  • Both the primary wake-up network 301 and the secondary wake-up network 302 can adopt a neural network model.
  • the primary wake-up network 301 can be a classification model, and the classification model includes a binary classification model and a multi-classification model.
  • the binary classification model can be used to identify whether there is a human figure in the image.
  • the primary wake-up network 301 can adopt a multi-classification model, which can be used to identify whether there are people and/or vehicles in the image.
  • the primary wake-up network 301 can also be a human figure detection model, which can be used to further detect the position of a human figure in the image, etc.
  • the neural network model used in the present application can adopt a minimalist structure, that is, the neural network model has fewer layers, and/or the neural network model adopts a more streamlined data processing algorithm, such as the neural network model can convert floating-point operations into low-bit operations, reduce the complexity of data processing, and achieve low power consumption and high accuracy.
  • since the human-figure detection model adds operations such as positioning on top of human-figure recognition, in actual applications, in order to further reduce power consumption, both the first-level wake-up network 301 and the second-level wake-up network 302 can adopt a classification model with a human-figure recognition function.
  • the processing accuracy of the secondary wake-up network 302 is higher than that of the primary wake-up network 301.
  • the secondary wake-up network 302 has a larger capacity.
  • the secondary wake-up network 302 includes more layers, uses more training data, has higher data processing accuracy, and has a higher processing frame rate.
  • the secondary wake-up network 302 can convert floating-point operations into 8-bit operations, while the primary wake-up network 301 can convert floating-point operations into 4-bit operations.
  • the primary wake-up network 301 can determine whether to wake up the secondary wake-up network 302 based on a low frame rate (such as the recognition result of one frame of image), while the secondary wake-up network 302 can determine whether to wake up the processing unit 30 based on a high frame rate (such as the recognition result of multiple frames of image), so that the secondary wake-up network 302 has a higher detection accuracy.
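The 8-bit vs. 4-bit trade-off can be illustrated with generic uniform quantization. The patent does not specify the quantization scheme, so the function below is an assumption used only to show why the 8-bit grid of the secondary network preserves more precision than the 4-bit grid of the primary one:

```python
def quantize(x, bits, x_min=-1.0, x_max=1.0):
    # Generic uniform quantization of a float to a low-bit grid and back
    # (illustrative only; not the scheme used by the wake-up networks).
    levels = (1 << bits) - 1
    step = (x_max - x_min) / levels
    q = round((x - x_min) / step)
    return x_min + q * step

x = 0.41
err8 = abs(x - quantize(x, 8))  # secondary network: finer grid
err4 = abs(x - quantize(x, 4))  # primary network: coarser, cheaper
print(err4 > err8)  # True: the 8-bit grid keeps more precision
```

Fewer bits mean cheaper arithmetic and smaller buffers, which is why the coarser primary network can stay always-on while the finer secondary network is only woken on demand.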
  • FIG4 is a flow chart of the method. As shown in FIG4 , the method includes the following steps:
  • Step 401: the primary wake-up network 301 obtains an image (referred to as the first image) captured by the image acquisition device 1.
  • the image acquisition device 1 inputs the acquired image to the detection device 2, and accordingly, the detection device 2 receives the image sent by the image acquisition device 1. It should be noted that the image acquisition device 1 can periodically acquire images, such as acquiring one frame of image per second, and sequentially send the acquired images to the detection device 2. Accordingly, the detection device 2 sequentially receives the images sent by the image acquisition device 1.
  • Step 402: the first-level wake-up network 301 takes the first image as input data and performs human-figure recognition on the first image.
  • Step 403: when the primary wake-up network 301 recognizes that there is a human figure in the first image, the secondary wake-up network 302 is awakened.
  • the primary wake-up network 301 wakes up the secondary wake-up network 302 , the primary wake-up network 301 is interrupted, and the image input by the image acquisition device 1 will be input to the secondary wake-up network 302 for processing.
  • Step 404: the secondary wake-up network 302 performs human-figure recognition on the first image.
  • the secondary awakening network 302 obtains the first image and uses the first image as input data, thereby performing human recognition on the first image.
  • the input data of the secondary awakening network 302 may also be the image features of the first image determined by the primary awakening network 301. It should be understood that when the input data is different, the secondary awakening network 302 used may be different.
  • the secondary wake-up network 302 after the secondary wake-up network 302 is awakened, it can also detect an image frame (such as the second image) after the first image, and determine whether to wake up the processing unit 30 based on the detection result of the second image, or the secondary wake-up network 302 can also determine whether to wake up the processing unit 30 based on the detection results of multiple frames of images. For example, the secondary wake-up network 302 makes a judgment based on the detection result of the first image and the detection result of the second image, or based on the detection result of the second image and the detection result of the third image. If multiple detection results all indicate that a human figure exists, the secondary wake-up network 302 wakes up the processing unit 30.
  • Step 405: when the secondary wake-up network 302 identifies that there is a human figure in the first image, the processing unit 30 is woken up.
  • Step 406: the processing unit 30 executes a preset operation, as described above, which will not be repeated here.
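Steps 401-406 can be sketched end to end. The two "networks" below are hypothetical threshold classifiers standing in for the neural networks, and the preset operation is reduced to returning a label; only the control flow (primary gates secondary, secondary gates the processing unit) follows the method.

```python
def primary_network(image):
    # Step 402: coarse single-frame human-figure classifier (stand-in).
    return any(p > 0.5 for row in image for p in row)

def secondary_network(image):
    # Step 404: stricter classifier with higher accuracy (stand-in).
    return sum(p > 0.5 for row in image for p in row) >= 2

def preset_operation():
    # Step 406: e.g. face detection, alarm notification, logging.
    return "alarm"

def wake_pipeline(image):
    if not primary_network(image):     # steps 402/403
        return None                    # stay in the low-power state
    if not secondary_network(image):   # steps 404/405
        return None                    # secondary network filters it out
    return preset_operation()          # step 406: processing unit woken

frame = [[0.1, 0.9], [0.8, 0.2]]
print(wake_pipeline(frame))  # alarm: both levels detect a figure
```

Note that both gates operate on the same single frame, which is the property that removes the dependence on time-domain information.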
  • the embodiment of the present application can perform two-level wake-up based on a single-frame image.
  • both the first-level wake-up network and the second-level wake-up network use a low-power neural network model in place of SMD/PIR, which addresses pain points of the existing ultra-low-power sensor device market and is expected to enable a milliwatt- or even microwatt-level feature-extraction and pattern-recognition system architecture.
  • the power consumption is greatly reduced, and it has the characteristics of strong specialization, small area, and ultra-low power consumption.
  • the target object perception accuracy of the second-level wake-up network is higher than that of the first-level wake-up network.
  • the second-level wake-up network can filter the recognition results of the first-level wake-up network, which can greatly reduce the frequency of false wake-up and missed wake-up, and improve the perception accuracy of the target object.
  • the present application can achieve two-level wake-up based on a single-frame image, it does not rely on time domain information, and can achieve rapid response, reducing or avoiding the frequency of delayed alarms.
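As a rough single-file sketch (not code from the patent), the single-frame cascade described in these bullets can be modelled as follows; `primary_detect`, `secondary_detect`, and `votes_needed` are illustrative stand-ins for the level-1 network, the level-2 network, and the optional multi-frame judgment:

```python
def cascade_wake(primary_detect, secondary_detect, frames, votes_needed=1):
    """Two-level wake-up over a stream of single frames.

    primary_detect / secondary_detect are stand-ins for the level-1 and
    level-2 networks (illustrative names, not from the patent text).
    Returns True when the processing unit should be woken up.
    """
    hits = 0
    for frame in frames:
        if not primary_detect(frame):   # level 1: coarse, low-power check
            continue                    # stay asleep, keep scanning
        if secondary_detect(frame):     # level 2: higher-accuracy filter
            hits += 1
            if hits >= votes_needed:    # optionally require N agreeing frames
                return True
        else:
            hits = 0                    # a level-2 veto resets the count
    return False
```

With `votes_needed=1` a single frame confirmed by both levels wakes the processor; a larger value models the multi-frame judgment mentioned above.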
  • FIG. 5 shows a schematic diagram of the structure of another detection unit 20b provided in an embodiment of the present application. The detection unit 20b includes a primary wake-up network 401 and a secondary wake-up network 402; for their functions, refer respectively to the introductions of the primary wake-up network 301 and the secondary wake-up network 302. Only the differences are described below.
  • the first-level wake-up network 401 includes multiple modules, such as the first subnetwork and the second subnetwork in FIG. 3 (shown as two modules, but the present application is not limited to this). It can also be understood that the first-level wake-up network 401 is divided into two parts to obtain the first subnetwork and the second subnetwork, wherein each module may include one or more layers of the first-level wake-up network 401.
  • the first subnetwork is used to extract features of the input image to extract image features corresponding to the input image.
  • the input data of the first part can be a partial image in a complete frame of an image (such as an image captured by the image acquisition device 1).
  • a complete frame of an image is divided into multiple non-overlapping blocks with a fixed size as the granularity, that is, each block has the same size.
  • Each block is input into the first subnetwork in turn, and the first subnetwork extracts the image features of each block in turn.
  • In this way, the first subnetwork can also be applied to systems with small memory capacity, without the need to perform ROI setting, cropping or scaling of the image, thereby simplifying the wake-up process.
  • the second sub-network can be used to fuse the image features of multiple blocks to obtain the image features corresponding to a complete image frame.
  • the fusion here can refer to splicing the image features of multiple blocks in the order of blocks.
  • the convolution operation of the second sub-network can perform further convolution and pooling operations at the splicing of the image features of each fused block, and fill the "gaps" of the image features of each block to restore the spatial correlation, which is beneficial to improve the target perception accuracy.
  • the image features of multiple blocks can also be fused by the first sub-network (not shown in Figure 5), and there is no specific limitation.
  • the second sub-network performs human figure recognition based on the input image features, and wakes up the secondary wake-up network 402 when a human figure is recognized in the image.
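A minimal sketch of the block-wise scheme these bullets describe, under the assumption of equal-height row blocks; `block_features` is a toy stand-in (column means) for the first sub-network, and `fuse` models the order-preserving splice performed before the second sub-network's seam-repairing convolutions:

```python
def split_into_blocks(image, block_rows):
    """Split a frame (list of rows) into non-overlapping equal-height blocks."""
    assert len(image) % block_rows == 0, "blocks must tile the frame exactly"
    return [image[r:r + block_rows] for r in range(0, len(image), block_rows)]

def block_features(block):
    """Stand-in for the first sub-network: reduce one block to a feature
    vector (here per-column means; a real network would run convolutions)."""
    cols = len(block[0])
    return [sum(row[c] for row in block) / len(block) for c in range(cols)]

def fuse(features):
    """Splice per-block features in block order, as the second sub-network
    would before repairing the seams between blocks."""
    return [x for f in features for x in f]
```

Only one block is resident at a time, which is what lets this scheme run on small-memory devices.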
  • Fig. 6 is a schematic diagram of the structure of another detection device provided in an embodiment of the present application. As shown in Fig. 6, the detection device 600 includes: a first processor 602, a second processor 604, a memory 606 and a bus 608. Optionally, a camera 601 may also be included. The camera 601, the first processor 602, the second processor 604 and the memory 606 communicate through the bus 608.
  • the detection device 600 may be the system shown in Fig. 1 or Fig. 3 or Fig. 5. It should be understood that the present application does not limit the number of processors and memories in the detection device 600.
  • the bus 608 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, etc.
  • the bus may be divided into an address bus, a data bus, a control bus, etc.
  • FIG. 6 is represented by only one line, but does not mean that there is only one bus or one type of bus.
  • the bus 608 may include a path for transmitting information between various components of the detection device 600 (e.g., the first processor 602, the second processor 604, and the memory 606).
  • the first processor 602 may include any one or more of a graphics processing unit (GPU), a neural-network processing unit (NPU), an FPGA, and the like. In the present application, the first processor 602 may be used to run the primary wake-up network 301 and the secondary wake-up network 302 in FIG. 3 , or to run the primary wake-up network 401 and the secondary wake-up network 402 in FIG. 5 .
  • the second processor 604 may include any one or more of a central processing unit (CPU), a microcontroller (MCU), a GPU, a microprocessor (MP), or a digital signal processor (DSP).
  • the second processor 604 may have the function of the processing unit 30 in FIG. 3 or FIG. 5, and is used to execute the steps executed by the processing unit 30 in the embodiment shown in FIG. 4, or to execute the steps executed by the processing unit 30 in the embodiment shown in FIG. 7 below, which will not be repeated here.
  • the memory 606 may be a memory that directly exchanges data with the first processor 602, and the memory 606 includes a volatile memory, such as a random access memory (RAM).
  • the memory 606 may also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
  • the memory 606 may include a program storage area and a data storage area, and an executable program code is stored in the program storage area.
  • the first processor 602 executes the executable program code to respectively implement the first-level wake-up network 301 and the second-level wake-up network 302 in Figure 3, or to implement the functions of the first-level wake-up network 401 and the second-level wake-up network 402 in Figure 5, thereby implementing the wake-up method. That is, the memory 606 stores instructions for the detection device 600 to execute the wake-up method provided in the present application.
  • the data storage area stores data, such as image data obtained from the camera 601.
  • the memory 606 may also store executable program codes of the second processor 604, or the detection device 600 also includes a memory dedicated to exchanging data with the second processor 604, which is not shown in Figure 6.
  • the detection device 600 may not have the camera 601.
  • the detection device 600 may also include a communication interface, using a transceiver module such as, but not limited to, a network interface card or a transceiver to achieve communication between the detection device 600 and other devices or a communication network.
  • the detection device 600 may communicate with the camera 601 through the communication interface to obtain image data collected by the camera 601.
  • Taking the system shown in FIG. 5 as an example, FIG. 7 is a flow chart of another wake-up method. As shown in FIG. 7, the method includes the following steps:
  • Step 700 obtaining input data.
  • Acquiring input data includes acquiring multiple blocks included in a frame of image captured by a camera, and inputting each block into the first sub-network in sequence.
  • Step 701 The first sub-network determines the image features of each block respectively.
  • the first processor divides the original image captured by the camera 601 into a plurality of blocks, inputs each block into the first sub-network in turn, and the first sub-network extracts image features of each block.
  • the complete process of step 701 may include: the camera 601 captures an image, and transmits a frame of the captured image to the memory 606 in a row scanning manner.
  • a frame of the image includes multiple rows, and each row scan of the image obtains one row of data in the image; that is, the memory 606 stores image data at "row" granularity.
  • when the "available space" of the memory 606 is full, the multiple rows of data stored in the available space form a "block"; the first processor 602 obtains the multiple rows of data stored in the available space, that is, obtains one block of the image, uses the block as input data of the first sub-network, and extracts the image features of the block through the first sub-network.
  • the "available space" in the memory 606 can be part of the storage space in the memory 606, and the capacity of the available space can be a preset value.
  • the preset value can also be dynamically adjusted.
  • the present application may adopt a "ping-pong buffer" mechanism to alternately write and read the image data of two adjacent blocks.
  • the memory 606 includes two "available spaces", which are respectively recorded as the first buffer and the second buffer.
  • the first buffer is used to temporarily cache the H rows of data of block 1.
  • After all H rows of block 1 have been written, the first processor 602 reads the H rows of data of block 1 from the first buffer and performs one feature extraction (i.e., the H rows of data are input into the first sub-network, and the image features of block 1 are extracted through the first sub-network).
  • the second buffer continues to receive H rows of data of block 2 online.
  • After all H rows of block 2 have been written, the first processor 602 reads the H rows of data of block 2 from the second buffer and performs one feature extraction. By analogy, the first buffer and the second buffer complete the feature extraction of the N blocks one by one, from top to bottom, in a dynamic pipelined manner.
  • H and N are both positive integers.
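The ping-pong pipeline above can be modelled sequentially as follows (on hardware, filling one buffer overlaps in time with reading the other; `extract` is a stand-in for the first sub-network's per-block feature extraction):

```python
def ping_pong_stream(rows, rows_per_block, extract):
    """Alternate two block buffers over a stream of image rows: when the
    active buffer has accumulated rows_per_block rows, run one feature
    extraction on that block, release the buffer, and switch to the other.
    The alternation is modelled sequentially here; real hardware overlaps
    the fill of one buffer with the read of the other."""
    buffers = [[], []]   # the "first buffer" and "second buffer"
    active = 0
    features = []
    for row in rows:
        buffers[active].append(row)
        if len(buffers[active]) == rows_per_block:  # block ready
            features.append(extract(buffers[active]))
            buffers[active] = []                    # release for next fill
            active ^= 1                             # switch buffers
    return features
```

With 8 rows, 2 rows per block, and `sum` as the toy extractor, this yields one feature per block in top-to-bottom order.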
  • Step 702 The second sub-network determines fused image features based on the image features of the multiple blocks.
  • Step 703 The second sub-network recognizes the target object (such as a human figure) based on the fused image features.
  • Step 704 When the second sub-network detects that there is a target object (such as a human figure) in the image, the secondary wake-up network 402 is awakened.
  • When the primary wake-up network 401 recognizes a human figure, an interrupt instruction is triggered to interrupt the primary wake-up network 401 and wake up the secondary wake-up network 402; correspondingly, the first processor runs the secondary wake-up network 402.
  • Step 705 The secondary wake-up network 402 uses the fused image features as input data and performs target object (e.g., human figure) recognition based on the input data.
  • the input data of the secondary wake-up network 402 may be fused image features, so that the first sub-network may be reused in the two-stage wake-up network, thereby further reducing the power consumption of the system.
  • the input data of the secondary wake-up network 402 may also be the original image captured by the camera 601.
  • This solution requires an additional buffer for caching the original image, that is, it is applied to devices with a large capacity of the memory 606. It should be understood that when the input data is different, the secondary wake-up network 402 used may be different.
  • Step 706 When the secondary wake-up network 402 identifies that there is a target object (such as a human figure) in the image, it wakes up the processing unit 30.
  • Step 707 the processing unit 30 executes a preset operation, as described above, which will not be described again.
  • the above design can be applied to devices with small memory capacity.
  • the memory capacity of devices such as smart door locks is usually small.
  • the image captured by the camera is usually scaled or cropped and then input into the detection unit 20 for processing.
  • In the present application, there is no need to perform ROI setting, cropping or scaling on the image, which simplifies the wake-up process.
  • the resolution of the input data for a single inference of the first-level wake-up network 401 is only 1/N of the original image resolution
  • accordingly, the running memory and intermediate-layer data cache of the first-level wake-up network 401 are also reduced to 1/N of those required for the full image; while maintaining the perception accuracy of the target object, the input cache and computation of the second sub-network are reduced.
  • the problem of the prior art that full-frame data needs to be cached is solved, and the data cache requirement is greatly reduced.
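As a back-of-the-envelope check of the 1/N claim (the numbers below are illustrative, not from the patent): a single level-1 inference only needs one block resident, while the ping-pong scheme keeps two block buffers live at once:

```python
def level1_buffer_rows(frame_rows, n_blocks, ping_pong=True):
    """Rows that must be resident for level-1 inference when a frame of
    frame_rows rows is processed as n_blocks equal blocks; the ping-pong
    scheme keeps two block buffers live simultaneously."""
    assert frame_rows % n_blocks == 0, "blocks must tile the frame"
    per_block = frame_rows // n_blocks
    return per_block * (2 if ping_pong else 1)

# e.g. a 480-row frame processed as 8 blocks: 60 rows (1/8 of the frame)
# per inference, 120 rows resident with ping-pong buffering
```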
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means.
  • the computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media.
  • the available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid-state drive (SSD)), etc.
  • the various illustrative logic units and circuits described in the embodiments of the present application can be implemented or operated by a general-purpose processor, a digital signal processor, an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic, a discrete hardware component, or the design of any combination of the above functions.
  • the general-purpose processor can be a microprocessor, and optionally, the general-purpose processor can also be any traditional processor, controller, microcontroller or state machine.
  • the processor can also be implemented by a combination of computing devices, such as a digital signal processor and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a digital signal processor core, or any other similar configuration.
  • the steps of the method or algorithm described in the embodiments of the present application can be directly embedded in hardware, in a software unit executed by a processor, or in a combination of the two.
  • the software unit can be stored in a RAM memory, a flash memory, a ROM memory, an EPROM memory, an EEPROM memory, a register, a hard disk, a removable disk, a CD-ROM or other storage media of any form in the art.
  • the storage medium can be connected to the processor so that the processor can read information from the storage medium and can write information to the storage medium.
  • the storage medium can also be integrated into the processor.
  • the processor and the storage medium can be arranged in an ASIC.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device so that a series of operational steps are executed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

A wake-up method and device, the method including: acquiring an image; using the image as input data of a first-level wake-up network to obtain a first detection result output by the first-level wake-up network, the first-level wake-up network being used to detect a target object in the input image; when the first detection result indicates that the target object is present in the image, waking up a second-level wake-up network, the second-level wake-up network being used to detect the target object in the image, the detection accuracy of the second-level wake-up network being higher than that of the first-level wake-up network; using the second-level wake-up network to detect the image to obtain a second detection result output by the second-level wake-up network; and when the second detection result indicates that the target object is present in the image, waking up a processing unit to execute a preset operation.

Description

A wake-up method and device
Cross-reference to related applications
The present application claims priority to the Chinese patent application No. 202211370332.X, entitled "A data processing method and device", filed with the China Patent Office on November 3, 2022, the entire contents of which are incorporated herein by reference; the present application also claims priority to the Chinese patent application No. 202310156937.7, entitled "A wake-up method and device", filed with the China Patent Office on February 14, 2023, the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of security monitoring technology, and in particular to a wake-up method and device.
Background
Human figure recognition technology refers to technology that uses certain characteristics of human body imaging and processes graphic images to ultimately discover, identify and locate human figure targets in the imaging space. Human figure recognition is an important technology in intelligent security systems and can be widely applied in fields such as intelligent monitoring, intelligent transportation, target tracing and target tracking, for example in smart door locks and monitoring equipment.
Taking smart door locks as an example, existing smart door locks mainly use low-power devices such as image sensors, infrared sensors, pyroelectric infrared sensors (PIR) and two-level smart motion detection (SMD) for human figure recognition. The first-level low-frame-rate SMD and the PIR jointly serve as first-level wake-up interrupt sources that detect target motion or thermal motion within a region of interest (ROI) of the scene. When either the first-level SMD or the PIR triggers an interrupt first, the second-level high-frame-rate SMD is entered to further judge whether a human figure is actually recognized in the monitoring image.
However, current human figure recognition methods only enable the smart door lock to recognize a moving creature, but cannot distinguish whether that moving creature is a human or an animal. As a result, the recognition accuracy of such methods is not high, and many false alarms, missed alarms or delayed alarms occur, seriously degrading the user experience and the credibility of the product's warning function.
In summary, a solution for improving the accuracy of human figure recognition in smart door locks is urgently needed.
Summary
The present application provides a wake-up method and device for implementing human figure recognition with relatively high accuracy in smart door locks.
In a first aspect, the present application provides a wake-up method that can be applied to detection devices such as the aforementioned smart door lock. Taking a detection device as an example, in this method the detection device acquires an image; for example, the detection device includes a camera that can capture images. The detection device inputs the acquired image into a first-level wake-up network to obtain a first detection result output by the first-level wake-up network, where the first-level wake-up network is used to detect a target object in the input image. If the first detection result indicates that a target object such as a human figure is present in the image, a second-level wake-up network is woken up; the second-level wake-up network is also used to detect the target object in the image, but its detection accuracy is higher than that of the first-level wake-up network. The second-level wake-up network further detects the image to obtain a second detection result output by the second-level wake-up network; when the second detection result indicates that the target object is present in the image, a processing unit is woken up to execute a preset operation.
With the above design, the first-level and second-level wake-up networks can recognize a target object such as a human figure in a single image frame. When the first-level wake-up network detects a human figure in the image, it wakes up the second-level wake-up network; when the second-level wake-up network still detects a human figure in that frame, it wakes up the processing unit, thereby implementing two-level wake-up on a single frame and reducing power consumption. Since the target object perception accuracy of the second-level wake-up network is higher than that of the first-level network, the second-level network can filter the detection results of the first-level network, greatly reducing the frequency of false and missed wake-ups and improving target object perception accuracy. In addition, because the present application can implement two-level wake-up on a single frame, it does not rely on temporal information, which reduces the storage burden while enabling rapid response and reducing or avoiding delayed alarms.
In one possible design, the first-level wake-up network includes a first sub-network and a second sub-network; the first sub-network is used for feature extraction, and the second sub-network is used to detect the target object based on the features extracted by the first sub-network. When inputting the image into the first-level wake-up network, the detection device inputs multiple parts of the image into the first sub-network in sequence to obtain the features extracted by the first sub-network for each part; determines a fused feature corresponding to the image based on the feature of each part; and uses the fused feature as input data of the second sub-network to obtain the first detection result output by the second sub-network.
With the above design, the input data of the first sub-network is a partial image of one complete frame; the features of the partial images are extracted separately and then fused to determine the fused feature corresponding to one frame, and the second sub-network can determine whether a target object is present in the image based on the fused feature. This method requires no ROI setting, cropping or scaling of the image, simplifying the wake-up process. Moreover, since the resolution of the input data for a single inference of the first-level wake-up network is only 1/N of the original image resolution, the running memory and intermediate-layer data cache of the first-level wake-up network are likewise reduced to 1/N, which reduces the input cache and computation of the second sub-network while maintaining target object perception accuracy. This solves the prior-art problem of having to cache full-frame data and greatly reduces the data caching requirement.
In one possible design, the detection device includes a memory; each part of the image is one or more rows of data of the image read from a target memory space, different parts include different data of the image, the target memory space may be the entire space of the memory, and the capacity of the target memory space is smaller than the data volume of the image.
With the above design, the prior-art problem of caching full-frame data is solved and the data caching requirement is greatly reduced, so that the method can be applied to detection devices with different memory capacities, enhancing its practicality.
In one possible design, the detection device using the second-level wake-up network to detect the image includes: the detection device inputs the fused feature corresponding to the image into the second-level wake-up network to obtain the second detection result output by the second-level wake-up network.
With the above design, the second-level wake-up network can detect the target object based on the fused feature, so that the first sub-network is reused in both the first-level and second-level wake-up networks, further reducing power consumption.
In one possible design, the detection device using the second-level wake-up network to detect the image includes: the detection device inputs the image into the second-level wake-up network to obtain the second detection result output by the second-level wake-up network.
In one possible design, the power consumption of the first-level wake-up network is lower than a first set value, the power consumption of the second-level wake-up network is lower than a second set value, and the first set value is not higher than the second set value.
With the above design, low-power wake-up networks are used to implement target object recognition, meeting the low-power requirements of the detection device.
In a second aspect, the present application provides a detection device for executing the method of the first aspect or any possible design of the first aspect; for the beneficial effects, refer to the relevant description of the first aspect, which is not repeated here. The detection device includes a camera, a first processor and a second processor. The camera is used to capture images. The first processor is used to acquire an image captured by the camera and input the image into a first-level wake-up network to obtain a first detection result output by the first-level wake-up network; at this time the first processor runs the first-level wake-up network, which detects the target object in the image. When the first detection result indicates that the target object is present in the image, the second-level wake-up network is woken up; at this time the first processor runs the second-level wake-up network and interrupts the first-level wake-up network. The first processor uses the second-level wake-up network to detect the image to obtain a second detection result output by the second-level wake-up network; the second-level wake-up network detects the target object in the input image, and its detection accuracy is higher than that of the first-level wake-up network. When the second detection result indicates that the target object is present in the image, the first processor wakes up the second processor; after being woken up, the second processor executes a preset operation.
In one possible design, the first-level wake-up network includes a first sub-network and a second sub-network; the first sub-network performs feature extraction, and the second sub-network detects the target object based on the features extracted by the first sub-network. When using the image as input data of the first-level wake-up network to obtain the first detection result output by the first-level wake-up network, the first processor is specifically configured to: input multiple parts of the image into the first sub-network in sequence to obtain the features extracted by the first sub-network for each part; determine a fused feature corresponding to the image based on the feature of each of the multiple parts; and use the fused feature as input data of the second sub-network to obtain the first detection result output by the second sub-network.
In one possible design, the device further includes a memory; each part of the image is one or more rows of data of the image read from a target memory space, different parts include different data of the image, the target memory space may be the entire space of the memory, and the capacity of the target memory space is smaller than the data volume of the image.
In one possible design, when using the second-level wake-up network to detect the image, the second processor is specifically configured to input the fused feature corresponding to the image into the second-level wake-up network to obtain the second detection result output by the second-level wake-up network.
In one possible design, when using the second-level wake-up network to detect the image, the second processor is specifically configured to input the image into the second-level wake-up network to obtain the second detection result output by the second-level wake-up network.
In one possible design, the power consumption of the first-level wake-up network is lower than a first set value, the power consumption of the second-level wake-up network is lower than a second set value, and the first set value is not higher than the second set value.
In a third aspect, the present application provides a detection device; for the beneficial effects, refer to the description of the first aspect, which is not repeated here. The device has the function of implementing the behavior in the method examples of the first aspect. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more units corresponding to the above function. In one possible design, the structure of the device includes an acquisition unit, a detection unit and a processing unit; these units can execute the corresponding functions in the method examples of the first aspect, for which refer to the detailed description in the method examples, not repeated here.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program or instructions which, when executed by a detection device, cause the detection device to execute the method of the first aspect or any design of the first aspect.
In a fifth aspect, an embodiment of the present application provides a computer program product including a computer program or instructions which, when executed by a detection device, implement the method of the first aspect or any design of the first aspect.
On the basis of the implementations provided in the above aspects, the present application may be further combined to provide more implementations.
Brief description of the drawings
FIG. 1 is a schematic diagram of a possible system architecture provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of a prior-art scenario;
FIG. 3 is a schematic diagram of a possible system architecture provided by an embodiment of the present application;
FIG. 4 is a schematic flow chart of a wake-up method provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of another possible system architecture provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of the structure of a detection device provided by an embodiment of the present application;
FIG. 7 is a schematic flow chart of another wake-up method provided by an embodiment of the present application.
Detailed description
FIG. 1 is a schematic diagram of a monitoring system architecture provided by an embodiment of the present application. The system includes an image acquisition device 1 and a detection device 2; the detection device 2 may include a detection unit 20 and a processing unit 30. The image acquisition device 1 and the detection device 2 may communicate by wired (e.g., bus) or wireless (e.g., network) means. In one implementation, the image acquisition device 1 is integrated in the detection device 2, in which case the two may be connected by a bus; alternatively, the image acquisition device 1 and the detection device 2 are located in two independent devices, in which case they may communicate over a network.
The image acquisition device 1 may be a camera for capturing images of a subject. It may capture an image after receiving a shooting instruction, or it may capture images periodically; for example, in the security monitoring field, a camera usually captures one frame per unit time (e.g., 1 s or 1 ms, without specific limitation).
The detection unit 20 may be used to perform target object detection (including target object recognition) on an image. For example, if the target object is a human figure, the detection unit 20 may be used to recognize whether a human figure is present in the image; if the target object is a vehicle, the detection unit 20 may be used to recognize whether a vehicle is present in the image; and so on. The detection unit 20 may be software, hardware, or a combination of the two.
The processing unit 30 may be used for data processing and computation; for example, in the present application the processing unit 30 may be used for face detection, notification and alerting, logging, etc., without specific limitation. The processing unit 30 may be a component with computing power such as a central processing unit (CPU) or a microcontroller (MCU). The processing unit 30 has multiple power modes, such as a sleep mode and a working mode: in sleep mode, the processing unit 30 only needs to perform very few operations or maintain some basic functions, achieving a low-power state; in working mode, the processing operations it performs increase, resulting in a high-power state. To reduce the power consumption of the system, the present application may keep the processing unit 30 in sleep mode by default, and have the detection unit 20 wake up the processing unit 30 when a specific processing operation needs to be executed.
It should be noted that the system architecture shown in FIG. 1 is only an example; in practical applications, systems to which the present application applies may have more or fewer units than FIG. 1. For example, the system may also include an audio unit such as a speaker and a microphone, and a display unit such as a display screen.
The system provided by the embodiments of the present application can be applied to many visual sensing fields such as security monitoring, doorbells and access control, autonomous driving, and home robots. For example, in the security monitoring field, the system may be a monitoring device such as an IP camera (IPC). The IPC may be installed indoors (or outdoors, without limitation), with the image acquisition device 1 capturing images of the indoor area; the detection device 2 uses the image captured by the image acquisition device 1 as input data of the detection unit 20. When the detection unit 20 detects a human figure in the image, it wakes up the processing unit 30, which may perform face detection on the image and, when the face in the image belongs to a stranger, send an alert notification to the householder. It should be understood that the operations performed by the processing unit 30 described here and below are only examples, and the present application does not limit them.
For another example, in the doorbell and access control field, the system may be a smart door lock, usually installed on an entrance door. The image acquisition device 1 captures images of the area near the entrance; similarly, the detection device 2 uses the captured image as input data of the detection unit 20. When the detection unit 20 detects a human figure in the image, it wakes up the processing unit 30, which sends an alert notification to the householder when detecting a stranger in the image.
For yet another example, in the autonomous driving field, the system may be an in-vehicle device such as a dashcam. The image acquisition device 1 in the dashcam captures images of the area near the vehicle; similarly, the detection device 2 uses the captured image as input data of the detection unit 20. When the detection unit 20 detects a human figure in the image, it wakes up the processing unit 30, which, upon detecting a stranger in the image, may record the stranger's image in a log or send a notification to the vehicle owner.
It should be understood that the above fields and scenarios are only examples, and the present application does not limit the application fields and scenarios of the system. For ease of understanding, the following description takes the system being a smart door lock as an example.
Those skilled in the art know that devices such as smart door locks are low-power devices, with power levels usually at the milliwatt or even microwatt level. To meet this requirement, the internal operation of a smart door lock follows low-power multi-level wake-up. Specifically, the detection unit 20 in a smart door lock is mostly designed with multi-level low-power devices (such as a pyroelectric infrared sensor (PIR) and smart motion detection (SMD)) that wake each other up level by level (also called multi-level low-power wake-up), so as to meet the low-power requirement of the smart door lock.
For example, FIG. 2 shows the internal architecture of an existing smart door lock. Understood in conjunction with FIG. 1 and FIG. 2, the detection unit 20 in this smart door lock includes a first-level SMD, a PIR and a second-level SMD, and its processing flow mainly includes:
The image acquisition device 1 captures images in real time; what it captures is a panoramic image, in which a regular or irregular region of interest (ROI) is set. The ROI is usually set manually according to the range to be detected; for example, the ROI is an area with a relatively high probability of human figure presence, such as an entrance or corridor in the image captured by the image acquisition device 1.
Subsequently, the image corresponding to the ROI in the complete image captured by the image acquisition device 1 (ROI image for short) is appropriately scaled and input into the first-level SMD. The first-level SMD can perform moving object detection on two adjacent input frames (here the ROI images of the two frames); specifically, when a pixel difference is detected between the two frames, it is considered that object motion exists. For example, when a person, animal or inanimate object enters the shooting area of the smart door lock, the first-level SMD detects object motion and then wakes up the second-level SMD.
In addition, the PIR, which occupies a detection position parallel to the first-level SMD, also wakes up the second-level SMD when it detects thermal motion; in other words, either the first-level SMD or the PIR can wake up the second-level SMD. When the second-level SMD is woken up, the first-level SMD is interrupted, and images subsequently captured by the image acquisition device 1 are input into the second-level SMD for further human figure recognition.
Similarly, after the second-level SMD is woken up, the ROI images of two new frames subsequently captured by the image acquisition device 1 are input into the second-level SMD. For example, after the first-level SMD wakes up the second-level SMD based on the i-th and (i+1)-th ROI frames, the second-level SMD may perform motion detection on the (i+1)-th and (i+2)-th ROI frames, or on the (i+3)-th and (i+4)-th ROI frames. If the second-level SMD still detects a moving object, it wakes up the processing unit 30 (i.e., the MCU in FIG. 2). In this way, multi-level low-power wake-up is achieved.
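The two-frame SMD check described above boils down to thresholded frame differencing, which can be sketched as follows (the thresholds and the flat-list ROI representation are illustrative assumptions, not values from the patent):

```python
def smd_motion(prev_roi, cur_roi, pixel_thresh=10, count_thresh=5):
    """Crude frame-difference motion check in the spirit of the SMD stage
    described above: flag motion when enough ROI pixels change by more
    than pixel_thresh between two adjacent frames."""
    changed = sum(
        1 for p, c in zip(prev_roi, cur_roi) if abs(p - c) > pixel_thresh
    )
    return changed >= count_thresh
```

Note how this only detects *change*: it cannot tell a person from an animal, a curtain, or a lighting shift, which is exactly the limitation the patent attributes to SMD.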
The second-level SMD can filter the first-level SMD, eliminating a large number of useless false wake-ups and thus reducing power consumption. However, although the smart door lock shown in FIG. 2 can meet low-power requirements, SMD itself cannot distinguish human figure motion from non-human-figure motion (such as curtain movement or pixel changes caused by lighting changes), and PIR cannot distinguish whether thermal motion comes from a human or an animal. As a result, this technology still produces a large number of false wake-ups, adding extra system power consumption; at the same time, the many false wake-ups put pressure on the computing resources and storage of the subsequent detection process, affecting end-to-end computing efficiency.
In addition, since SMD needs at least two frames for detection, in scenarios where a human figure is at the shooting boundary or moving quickly, wake-ups may be missed because the human figure quickly leaves the scene; for example, after the second-level SMD is woken up, the human figure has already left, so no human figure is captured in the ROI image input into the second-level SMD, resulting in a missed wake-up. Alternatively, because an object moves slowly, the second-level SMD may only detect the moving object after an interval of many frames, resulting in delayed wake-up. And since SMD cannot detect a stationary human figure, wake-ups for stationary human figure targets are also missed.
In summary, existing smart door locks still suffer from many false, missed and delayed wake-ups, and their detection accuracy is relatively low.
On this basis, the present application provides a new detection unit 20 architecture and a corresponding wake-up method. In the present application, the detection unit 20 may include multiple levels of detection networks with target object detection capability (also called wake-up networks), such as neural network models, with the detection accuracy of each level increasing level by level. When a previous-level wake-up network detects a target object in the image, it wakes up the next-level wake-up network. When the target object is a human figure, the detection unit 20 can distinguish human figures (including stationary human figures) from non-human-figures based on a single frame, thereby reducing the frequency of false, missed and delayed wake-ups, saving storage overhead and improving detection accuracy.
Next, taking application to the system architecture shown in FIG. 1 as an example, the structure of the detection unit 20 provided by the embodiments of the present application is introduced. For ease of description, the following introduction takes the case where the multi-level wake-up networks in the present application include two levels of wake-up networks as an example.
FIG. 3 shows a schematic diagram of the structure of the detection unit 20 provided by an embodiment of the present application. As shown in FIG. 3, the detection unit 20a includes a first-level wake-up network 301 and a second-level wake-up network 302. The first-level wake-up network can be used to perform target object detection on an image; the target object may be a human figure, an animal, or an inanimate object such as a vehicle or an obstacle, without specific limitation. In the application scenarios of the embodiments of the present application, human figures are usually detected; the following description takes the target object being a human figure as an example, and "human figure" below may be replaced by "target object".
The first-level wake-up network 301 may be used to perform human figure recognition on the image captured by the image acquisition device 1. Specifically, the first-level wake-up network 301 may extract image features of the input image and perform human figure recognition based on the image features, so as to determine whether a human figure is present in the input image; when it recognizes a human figure in the image, it wakes up the second-level wake-up network 302. The second-level wake-up network 302 may also be used to perform human figure recognition on the image, and wakes up the processing unit 30 when it recognizes a human figure in the image.
Both the first-level wake-up network 301 and the second-level wake-up network 302 may adopt neural network models. Taking the first-level wake-up network 301 as an example, it may be a classification model; classification models include binary classification models and multi-class models. A binary classification model can be used to recognize whether a human figure is present in the image. When multiple target objects (such as humans and vehicles) are set, the first-level wake-up network 301 may adopt a multi-class model that can recognize whether humans and/or vehicles are present in the image. Alternatively, the first-level wake-up network 301 may be a human figure detection model that can further detect the position of the human figure in the image, etc.
In the present application, to reduce system power consumption, the neural network models applied may adopt a minimalist structure; that is, the model has relatively few layers, and/or the model uses a more streamlined data processing algorithm. For example, the model may convert floating-point operations into low-bit operations, reducing the complexity of data processing and achieving low power consumption and high accuracy. In addition, since a human figure detection model adds extra operations such as "localization" on top of human figure recognition, in practical applications, to further reduce power consumption, both the first-level wake-up network 301 and the second-level wake-up network 302 may adopt classification models with human figure recognition capability.
It is worth pointing out that although the functions of the two are similar, the processing accuracy of the second-level wake-up network 302 is higher than that of the first-level wake-up network 301. This is reflected, for example, in the second-level wake-up network 302 having a larger capacity: compared with the first-level wake-up network 301, the second-level wake-up network 302 includes more layers, uses more training data, has higher data processing precision, processes at a higher frame rate, and so on. For example, in terms of data processing precision, the second-level wake-up network 302 may convert floating-point operations into 8-bit operations, while the first-level wake-up network 301 may convert floating-point operations into 4-bit operations. In terms of frame rate, the first-level wake-up network 301 may decide whether to wake up the second-level wake-up network 302 based on a low frame rate (e.g., the recognition result of one frame), while the second-level wake-up network 302 may decide whether to wake up the processing unit 30 based on a high frame rate (e.g., the recognition results of multiple frames), giving the second-level wake-up network 302 higher detection accuracy.
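The low-bit arithmetic mentioned here (e.g., 4-bit operations for the first-level network, 8-bit for the second) can be illustrated with a toy uniform quantizer; the [-1, 1] value range and the rounding scheme are assumptions for illustration, not details from the patent:

```python
def quantize(x, bits):
    """Uniform symmetric quantization of a value in [-1, 1] to `bits` bits,
    then dequantization -- a toy stand-in for the low-bit arithmetic the
    two wake-up networks use (e.g., 4-bit for level 1, 8-bit for level 2)."""
    levels = 2 ** (bits - 1) - 1            # e.g. 7 for 4-bit, 127 for 8-bit
    q = round(max(-1.0, min(1.0, x)) * levels)
    return q / levels                       # reconstructed value

# 8-bit keeps more precision than 4-bit, mirroring why the level-2 network
# is more accurate (and more power-hungry) than the level-1 network
```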
A wake-up method provided by an embodiment of the present application is introduced below based on the structure shown in FIG. 3. FIG. 4 is a schematic flow chart of this method; as shown in FIG. 4, the method includes the following steps:
Step 401: The first-level wake-up network 301 acquires an image captured by the image acquisition device 1 (denoted as the first image).
The image acquisition device 1 inputs the captured image into the detection device 2; correspondingly, the detection device 2 receives the image sent by the image acquisition device 1. It should be noted that the image acquisition device 1 may capture images periodically, e.g., one frame per second, and send the captured images to the detection device 2 in sequence; correspondingly, the detection device 2 receives the images sent by the image acquisition device 1 in sequence.
Step 402: The first-level wake-up network 301 uses the first image as input data and performs human figure recognition on the first image.
Step 403: When the first-level wake-up network 301 recognizes a human figure in the first image, it wakes up the second-level wake-up network 302.
After the first-level wake-up network 301 wakes up the second-level wake-up network 302, the first-level wake-up network 301 is interrupted, and images subsequently input from the image acquisition device 1 are processed by the second-level wake-up network 302.
Step 404: The second-level wake-up network 302 performs human figure recognition on the first image.
The second-level wake-up network 302 acquires the first image and uses it as input data, thereby performing human figure recognition on the first image. Optionally, the input data of the second-level wake-up network 302 may also be the image features of the first image determined by the first-level wake-up network 301. It should be understood that when the input data differs, the second-level wake-up network 302 adopted may differ.
In another implementation, after being woken up, the second-level wake-up network 302 may also detect an image frame after the first image (such as a second image) and decide whether to wake up the processing unit 30 based on the detection result of the second image; alternatively, the second-level wake-up network 302 may decide whether to wake up the processing unit 30 based on the detection results of multiple frames. For example, it makes a judgment based on the detection results of the first image and the second image, or of the second image and the third image; if the multiple detection results all indicate that a human figure is present, the second-level wake-up network 302 wakes up the processing unit 30.
Step 405: When the second-level wake-up network 302 recognizes a human figure in the first image, it wakes up the processing unit 30.
Step 406: The processing unit 30 executes a preset operation; see the foregoing introduction, not repeated here.
With the above design, the embodiments of the present application can perform two-level wake-up based on a single frame. Both the first-level and second-level wake-up networks adopt low-power neural network models to replace SMD/PIR, which can resolve the business pain points of the existing ultra-low-power sensor device market and is expected to achieve a feature extraction and pattern recognition system architecture at the milliwatt or even microwatt level; compared with prior-art visual sensor chips, power consumption is greatly reduced, with strong specialization, small area and ultra-low power consumption. The target object perception accuracy of the second-level wake-up network is higher than that of the first-level network, so the second-level network can filter the recognition results of the first-level network, greatly reducing false and missed wake-ups and improving target object perception accuracy. In addition, since the present application can implement two-level wake-up based on a single frame without relying on temporal information, rapid response can be achieved, reducing or avoiding delayed alarms.
FIG. 5 shows a schematic diagram of the structure of another detection unit 20b provided by an embodiment of the present application. The detection unit 20b includes a first-level wake-up network 401 and a second-level wake-up network 402; for their functions, refer respectively to the introductions of the first-level wake-up network 301 and the second-level wake-up network 302. Only the differences are described below.
In an optional implementation, as shown in FIG. 5, the first-level wake-up network 401 includes multiple modules, such as the first sub-network and the second sub-network in FIG. 3 (shown as two modules, but the present application is not limited to this). It can also be understood that the first-level wake-up network 401 is divided into two parts, yielding the first sub-network and the second sub-network, where each module may include one or more layers of the first-level wake-up network 401.
The first sub-network is used to perform feature extraction on the input image, so as to extract the image features corresponding to the input image. The input data of the first part may be a partial image of one complete frame (such as an image captured by the image acquisition device 1); for example, taking a fixed size as the granularity, a complete frame is divided into multiple non-overlapping blocks, i.e., each block has the same size. Each block is input into the first sub-network in turn, and the first sub-network extracts the image features of each block in turn. In this way, the first sub-network can also be applied to systems with small memory capacity, without ROI setting, cropping or scaling of the image, simplifying the wake-up process.
The second sub-network may be used to fuse the image features of the multiple blocks to obtain the image features corresponding to one complete frame; fusion here may refer to splicing the image features of the multiple blocks in block order. For example, the convolution operations of the second sub-network may perform further convolution and pooling at the junctions of the fused block features, stitching the "seams" between the image features of the blocks to restore spatial correlation, which helps improve target perception accuracy. Optionally, the image features of the multiple blocks may also be fused by the first sub-network (not shown in FIG. 5), without specific limitation. The second sub-network performs human figure recognition based on the input image features and, when it recognizes a human figure in the image, wakes up the second-level wake-up network 402.
FIG. 6 is a schematic diagram of the structure of another detection device provided by an embodiment of the present application. As shown in FIG. 6, the detection device 600 includes a first processor 602, a second processor 604, a memory 606 and a bus 608; optionally, it may also include a camera 601. The camera 601, the first processor 602, the second processor 604 and the memory 606 communicate through the bus 608. The detection device 600 may be the system shown in FIG. 1, FIG. 3 or FIG. 5. It should be understood that the present application does not limit the number of processors or memories in the detection device 600.
The bus 608 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, etc. Buses may be divided into address buses, data buses, control buses, etc. For ease of representation, only one line is used in FIG. 6, but this does not mean there is only one bus or one type of bus. The bus 608 may include a path for transmitting information between the components of the detection device 600 (e.g., the first processor 602, the second processor 604 and the memory 606).
The first processor 602 may include any one or more of a graphics processing unit (GPU), a neural-network processing unit (NPU), an FPGA, and other processors. In the present application, the first processor 602 may be used to run the first-level wake-up network 301 and the second-level wake-up network 302 in FIG. 3, or the first-level wake-up network 401 and the second-level wake-up network 402 in FIG. 5.
The second processor 604 may include any one or more of a central processing unit (CPU), a microcontroller (MCU), a GPU, a microprocessor (MP), a digital signal processor (DSP), and other processors. In the present application, the second processor 604 may have the functions of the processing unit 30 in FIG. 3 or FIG. 5, and is used to execute the steps executed by the processing unit 30 in the embodiment shown in FIG. 4, or in the embodiment shown in FIG. 7 below, not repeated here.
The memory 606 may be a memory that directly exchanges data with the first processor 602, and includes volatile memory, such as random access memory (RAM). The memory 606 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD) or a solid state drive (SSD).
The memory 606 may include a program storage area and a data storage area. Executable program code is stored in the program storage area; the first processor 602 executes this code to implement the first-level wake-up network 301 and the second-level wake-up network 302 in FIG. 3, or the functions of the first-level wake-up network 401 and the second-level wake-up network 402 in FIG. 5, thereby implementing the wake-up method. That is, the memory 606 stores instructions for the detection device 600 to execute the wake-up method provided by the present application. The data storage area stores data, such as image data acquired from the camera 601. Optionally, the memory 606 may also store program code executable by the second processor 604, or the detection device 600 may further include a memory dedicated to exchanging data with the second processor 604, not shown in FIG. 6.
It should be noted that the architecture shown in FIG. 6 is only an example; in practical applications, applicable systems may have more or fewer units than FIG. 6. For example, the detection device 600 may not have the camera 601; or the detection device 600 may further include a communication interface using a transceiver module such as, but not limited to, a network interface card or a transceiver to communicate with other devices or a communication network. For example, when the detection device 600 does not have the camera 601, it may communicate with the camera 601 through the communication interface to acquire the image data captured by the camera 601.
Taking application of the system shown in FIG. 5 as an example, another wake-up method provided by an embodiment of the present application is introduced below. FIG. 7 is a schematic flow chart of this method; as shown in FIG. 7, the method includes the following steps:
Step 700: Acquire input data.
Acquiring input data includes acquiring the multiple blocks included in one frame captured by the camera, and inputting each block into the first sub-network in sequence.
Step 701: The first sub-network determines the image features of each block separately.
Understood in conjunction with FIG. 6, in one implementation the first processor divides the original image captured by the camera 601 into multiple blocks and inputs each block into the first sub-network in turn, with the first sub-network extracting the image features of each block.
In this example, the complete flow of step 701 may include: the camera 601 captures an image and transmits one captured frame to the memory 606 in row-scan fashion. It should be understood that a frame includes multiple rows, and each row scan of the image obtains one row of data in the image; that is, the memory 606 stores image data at "row" granularity. When the "available space" of the memory 606 is full, the multiple rows of data stored in the available space form one "block"; the first processor 602 obtains those rows, i.e., obtains one block of the image, uses the block as input data of the first sub-network, and extracts the image features of the block through the first sub-network. The "available space" of the memory 606 may be part of its storage space, and the capacity of the available space may be a preset value; optionally, the preset value may also be dynamically adjusted.
Specifically, the present application may adopt a "ping-pong buffer" mechanism to alternately write and read the image data of two adjacent blocks. For example, the memory 606 includes two "available spaces", denoted the first buffer and the second buffer. Understood in conjunction with FIG. 5, the first buffer temporarily caches the H rows of data of block 1; after all H rows of block 1 have been written, the first processor 602 reads the H rows of block 1 from the first buffer and performs one feature extraction (i.e., the H rows are input into the first sub-network, and the image features of block 1 are extracted through the first sub-network). Meanwhile, the second buffer continues to receive the H rows of block 2 online; after all H rows of block 2 have been written, the first processor 602 reads the H rows of block 2 from the second buffer and performs one feature extraction. By analogy, the first buffer and the second buffer complete the feature extraction of the N blocks one by one, from top to bottom, in a dynamic pipelined manner, where H and N are both positive integers.
Step 702: The second sub-network determines fused image features based on the image features of the multiple blocks.
Step 703: The second sub-network performs target object (e.g., human figure) recognition based on the fused image features.
Step 704: When the second sub-network detects a target object (e.g., a human figure) in the image, the second-level wake-up network 402 is woken up.
Taking the target object being a human figure as an example, in one implementation, when the first-level wake-up network 401 recognizes a human figure, an interrupt instruction is triggered, interrupting the first-level wake-up network 401 and waking up the second-level wake-up network 402; correspondingly, the first processor runs the second-level wake-up network 402.
Step 705: The second-level wake-up network 402 uses the fused image features as input data and performs target object (e.g., human figure) recognition based on the input data.
In the example of FIG. 5, the input data of the second-level wake-up network 402 may be the fused image features; in this way, the first sub-network is reused across the two wake-up levels, further reducing system power consumption.
Optionally, the input data of the second-level wake-up network 402 may also be the original image captured by the camera 601. This solution requires an additional buffer for caching the original image, i.e., it applies to devices whose memory 606 has a large capacity. It should be understood that when the input data differs, the second-level wake-up network 402 adopted may differ.
Step 706: When the second-level wake-up network 402 recognizes a target object (e.g., a human figure) in the image, it wakes up the processing unit 30.
Step 707: The processing unit 30 executes a preset operation; see the foregoing introduction, not repeated here.
The above design can be applied to devices with small memory capacity. Those skilled in the art know that the memory capacity of devices such as smart door locks is usually small; in the prior art, the image captured by the camera is usually scaled or cropped before being input into the detection unit 20 for processing. In the present application, no ROI setting, cropping or scaling of the image is required, simplifying the wake-up process. Moreover, since the resolution of the input data for a single inference of the first-level wake-up network 401 is only 1/N of the original image resolution, its running memory and intermediate-layer data cache are likewise reduced to 1/N, which reduces the input cache and computation of the second sub-network while maintaining target object perception accuracy. This solves the prior-art problem of having to cache full-frame data and greatly reduces the data caching requirement.
In the above embodiments, implementation may be wholly or partly by software, hardware, firmware or any combination thereof. When software is used, implementation may be wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are wholly or partly produced. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium accessible by a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., a solid state disk (SSD)), etc.
The various illustrative logic units and circuits described in the embodiments of the present application may implement or operate the described functions through a general-purpose processor, a digital signal processor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or a design of any combination of the above. The general-purpose processor may be a microprocessor; optionally, it may also be any conventional processor, controller, microcontroller or state machine. A processor may also be implemented by a combination of computing devices, such as a digital signal processor and a microprocessor, multiple microprocessors, one or more microprocessors combined with a digital signal processor core, or any other similar configuration.
The steps of the methods or algorithms described in the embodiments of the present application may be directly embedded in hardware, in a software unit executed by a processor, or in a combination of the two. The software unit may be stored in RAM, flash memory, ROM, EPROM, EEPROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium in the art. Exemplarily, the storage medium may be connected to the processor so that the processor can read information from, and write information to, the storage medium; optionally, the storage medium may also be integrated into the processor. The processor and the storage medium may be arranged in an ASIC.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
Although the present application has been described in conjunction with specific features and embodiments, it is apparent that various modifications and combinations can be made without departing from the spirit and scope of the present application. Accordingly, this specification and the drawings are merely exemplary illustrations of the present application as defined by the appended claims, and are deemed to cover any and all modifications, variations, combinations or equivalents within the scope of the present application. Obviously, those skilled in the art can make various changes and variations to the present application without departing from its scope. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalent technologies, the present application is also intended to include them.

Claims (12)

  1. 一种唤醒方法,其特征在于,包括:
    获取图像;
    将所述图像作为一级唤醒网络的输入数据,以得到所述一级唤醒网络输出的第一检测结果,所述一级唤醒网络用于对输入的所述图像进行目标对象的检测;
    当所述第一检测结果指示所述图像中存在所述目标对象时,唤醒所述二级唤醒网络,所述二级唤醒网络用于对所述图像进行目标对象的检测,所述二级唤醒网络的检测精度高于所述一级唤醒网络的检测精度;
    使用所述二级唤醒网络对所述图像进行检测,以得到所述二级唤醒网络输出的第二检测结果;
    当所述第二检测结果指示所述图像中存在所述目标对象时,唤醒处理单元执行预设操作。
  2. 如权利要求1所述的方法,其特征在于,所述一级唤醒网络包括第一子网络和第二子网络,所述第一子网络用于进行特征提取,所述第二子网络用于基于所述第一子网络所提取的特征进行目标对象的检测;
    所述将所述图像作为一级唤醒网络的输入数据,以得到所述一级唤醒网络输出的第一检测结果,包括:
    将所述图像的多个部分依次输入所述第一子网络,以得到所述第一子网络对每个部分所提取的特征;
    将所述图像对应的融合特征作为所述第二子网络的输入数据,以得到所述第二子网络输出的所述第一检测结果,所述融合特征为根据所述多个部分中的每个部分的特征确定的。
  3. 如权利要求2所述的方法,其特征在于,所述图像的每个部分均为从目标内存空间中读取的所述图像的一行或多行数据,不同的部分所包括的图像的数据不同。
  4. The method according to claim 2, characterized in that the detecting the image using the level-2 wake-up network comprises:
    using the fused feature as input data of the level-2 wake-up network to obtain the second detection result output by the level-2 wake-up network.
  5. The method according to claim 1 or 2, characterized in that the detecting the image using the level-2 wake-up network comprises:
    using the image as input data of the level-2 wake-up network to obtain the second detection result output by the level-2 wake-up network.
  6. A detection apparatus, characterized by comprising a camera, a first processor, and a second processor, wherein:
    the camera is configured to capture an image;
    the first processor is configured to: obtain the image captured by the camera; use the image as input data of a level-1 wake-up network to obtain a first detection result output by the level-1 wake-up network, the level-1 wake-up network being configured to detect a target object in the image; when the first detection result indicates that the target object is present in the image, wake a level-2 wake-up network; detect the image using the level-2 wake-up network to obtain a second detection result output by the level-2 wake-up network, the level-2 wake-up network being configured to detect the target object in the input image with a detection accuracy higher than that of the level-1 wake-up network; and when the second detection result indicates that the target object is present in the image, wake the second processor;
    the second processor is configured to perform a corresponding preset operation after being woken.
  7. The apparatus according to claim 6, characterized in that the level-1 wake-up network comprises a first sub-network and a second sub-network, the first sub-network being configured for feature extraction and the second sub-network being configured to detect the target object based on the features extracted by the first sub-network;
    when using the image as input data of the level-1 wake-up network to obtain the first detection result output by the level-1 wake-up network, the first processor is specifically configured to: input multiple parts of the image into the first sub-network in sequence to obtain the features extracted by the first sub-network for each part; and use a fused feature corresponding to the image as input data of the second sub-network to obtain the first detection result output by the second sub-network, the fused feature being obtained by fusing the features of each of the multiple parts.
  8. The apparatus according to claim 7, characterized in that the apparatus further comprises a memory; each part of the image is one or more rows of data of the image read from a target memory space, and different parts comprise different data of the image.
  9. The apparatus according to claim 7, characterized in that, when detecting the image using the level-2 wake-up network, the second processor is specifically configured to: use the fused feature as input data of the level-2 wake-up network to obtain the second detection result output by the level-2 wake-up network.
  10. The apparatus according to claim 6 or 7, characterized in that, when detecting the image using the level-2 wake-up network, the second processor is specifically configured to: use the image as input data of the level-2 wake-up network to obtain the second detection result output by the level-2 wake-up network.
  11. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program or instructions which, when executed by a detection apparatus, cause the detection apparatus to perform the method according to any one of claims 1 to 5.
  12. A computer program product, characterized in that the computer program product comprises a computer program or instructions which, when executed by a detection apparatus, implement the method according to any one of claims 1 to 5.
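The claimed two-level cascade can be summarized in a short control-flow sketch. The function names and the callable stand-ins for the two networks and the preset operation are hypothetical; the sketch only shows the claimed ordering, in which the level-2 network and the processing unit run solely when the preceding stage has detected the target object.

```python
def wake_pipeline(image, level1_detect, level2_detect, preset_op):
    # Two-level wake-up cascade: the cheap level-1 network screens
    # every frame; the higher-precision level-2 network and the
    # processing unit are woken only when the previous stage fires.
    if not level1_detect(image):   # level-1 finds no target: stay asleep
        return False
    if not level2_detect(image):   # level-2 re-checks at higher precision
        return False
    preset_op()                    # wake processing unit, run preset operation
    return True
```

Because the inexpensive stage gates the expensive one, most frames never reach the level-2 network, which is what keeps average power low on a battery-operated device such as a smart door lock.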
PCT/CN2023/103466 2022-11-03 2023-06-28 Wake-up method and apparatus WO2024093296A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202211370332 2022-11-03
CN202211370332.X 2022-11-03
CN202310156937.7 2023-02-14
CN202310156937.7A CN117992128A (zh) 2022-11-03 2023-02-14 Wake-up method and apparatus

Publications (1)

Publication Number Publication Date
WO2024093296A1 true WO2024093296A1 (zh) 2024-05-10

Family

ID=90895201

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/103466 WO2024093296A1 (zh) 2022-11-03 2023-06-28 Wake-up method and apparatus

Country Status (2)

Country Link
CN (1) CN117992128A (zh)
WO (1) WO2024093296A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106842350A (zh) * 2016-12-26 2017-06-13 首都师范大学 同平台不同分辨率传感器联合动目标检测系统及检测方法
CN111429901A (zh) * 2020-03-16 2020-07-17 云知声智能科技股份有限公司 一种面向IoT芯片的多级语音智能唤醒方法及系统
KR20210057358A (ko) * 2019-11-12 2021-05-21 주식회사 에스오에스랩 제스처 인식 방법 및 이를 수행하는 제스처 인식 장치


Also Published As

Publication number Publication date
CN117992128A (zh) 2024-05-07


Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application
Ref document number: 23884235
Country of ref document: EP
Kind code of ref document: A1