CN112036267A - Target detection method, device, equipment and computer readable storage medium - Google Patents


Info

Publication number
CN112036267A
CN112036267A
Authority
CN
China
Prior art keywords
data
target object
dimensional
image depth
fusion
Legal status
Pending
Application number
CN202010819962.5A
Other languages
Chinese (zh)
Inventor
邓海燕
谭龙田
陈高
陈彦宇
马雅奇
Current Assignee
Gree Electric Appliances Inc of Zhuhai
Zhuhai Lianyun Technology Co Ltd
Original Assignee
Gree Electric Appliances Inc of Zhuhai
Zhuhai Lianyun Technology Co Ltd
Application filed by Gree Electric Appliances Inc of Zhuhai and Zhuhai Lianyun Technology Co Ltd
Priority to CN202010819962.5A
Publication of CN112036267A
Legal status: Pending

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target detection method, a target detection device, target detection equipment and a computer-readable storage medium. The method comprises the following steps: acquiring image data of a detection environment and depth data corresponding to a target object at intervals of a preset time period; sequentially fusing image data and depth data acquired at the same time according to the sequence of the acquisition time from first to last to obtain image depth fusion data corresponding to the same time; sequentially inputting the image depth fusion data obtained by fusion into a pre-trained target detection model according to the sequence of fusion time from first to last, extracting attribute data of a target object from the sequentially input image depth fusion data through the target detection model, and generating three-dimensional motion trajectory data corresponding to the target object according to the sequentially input image depth fusion data. According to the invention, a user does not need to stay around the target object at all times and can instead check the attribute data and three-dimensional motion trajectory data of the target object output by the target detection model, so that the monitoring process is more flexible and the monitoring efficiency is high.

Description

Target detection method, device, equipment and computer readable storage medium
Technical Field
The present invention relates to the field of image recognition technologies, and in particular, to a target detection method, apparatus, device, and computer-readable storage medium.
Background
In some application scenarios, attention needs to be paid to the dynamics of a target object in order to identify its needs. For example, in daily life there are people who cannot take care of themselves or who have health problems, such as children, the elderly, and patients. These people require round-the-clock care to avoid the adverse consequences of being left unattended, for example a child being injured because no one is supervising, or a patient's illness going untreated because no caregiver is present.
At present, such people are mostly monitored manually: a guardian generally has to stay with the person under guardianship around the clock, because unpredictable risks may arise once the guardian leaves. However, 24-hour manual monitoring requires the guardian to remain at the monitoring location, and if several persons need to be monitored at the same time, a separate guardian must be assigned to each of them, so the manual monitoring method is inefficient.
Disclosure of Invention
Embodiments of the present invention mainly aim to provide a target detection method, an apparatus, a device, and a computer-readable storage medium, so as to solve the problem of low efficiency of the existing manual monitoring.
In view of the above technical problems, the embodiments of the present invention are implemented by the following technical solutions:
the embodiment of the invention provides a target detection method, which comprises the following steps: acquiring image data of a detection environment and depth data corresponding to a target object at intervals of a preset time period; sequentially fusing the image data and the depth data acquired at the same time according to the sequence of the acquisition time from first to last to obtain image depth fusion data corresponding to the same time; sequentially inputting the image depth fusion data obtained by fusion into a pre-trained target detection model according to the sequence of fusion time from first to last, extracting attribute data of the target object from the sequentially input image depth fusion data through the target detection model, and generating three-dimensional motion trajectory data corresponding to the target object according to the sequentially input image depth fusion data.
The sequentially fusing the image data and the depth data acquired at the same time to obtain image depth fusion data corresponding to the same time comprises the following steps: and combining the image data and the depth data acquired at the same time to form an image depth one-dimensional array corresponding to the same time, and taking the image depth one-dimensional array as image depth fusion data corresponding to the same time.
Wherein the target detection model comprises: a connected YOLO model and a long-short term memory (LSTM) model; the extracting, by the target detection model, attribute data of the target object from the sequentially input image depth fusion data, and generating three-dimensional motion trajectory data corresponding to the target object according to the sequentially input image depth fusion data, includes: detecting the target object in the image depth fusion data sequentially input through the YOLO model, and extracting three-dimensional coordinate data of key points of the target object and attribute data of the target object; and tracking the motion trail of the target object detected by the YOLO model in the sequentially input image depth fusion data through the LSTM model to obtain the motion trail data of the target object, and generating the three-dimensional motion trail data of the target object according to the motion trail data of the target object and the three-dimensional coordinate data of the key point of the target object extracted by the YOLO model in the image depth fusion data.
Wherein the YOLO model comprises: a three-dimensional convolutional layer; the extracting three-dimensional coordinate data of the key points of the target object comprises: extracting two-dimensional feature data of key points of the target object in an image data part of the image depth fusion data through the three-dimensional convolution layer, and extracting one-dimensional feature data of the key points of the target object in a depth data part of the image depth fusion data; generating three-dimensional coordinate data of key points of the target object according to the two-dimensional characteristic data and the one-dimensional characteristic data; wherein the spatial dimension of the one-dimensional feature data is different from the spatial dimension of the two-dimensional feature data.
The motion trail data is region position data of the target object in the image depth fusion data; the three-dimensional motion trajectory data includes: multi-frame three-dimensional motion trail images; the generating three-dimensional motion trajectory data of the target object according to the motion trajectory data of the target object and the three-dimensional coordinate data of the key point of the target object extracted by the YOLO model in the image depth fusion data includes: constructing a three-dimensional coordinate space; sequentially acquiring the region position data of the target object and the three-dimensional coordinate data of the key points in each image depth fusion data according to the sequence of fusion time from first to last to obtain a plurality of groups of region position data and three-dimensional coordinate data of the key points; and setting a three-dimensional model corresponding to the target object in the three-dimensional coordinate space according to the region position data and the three-dimensional coordinate data of the key points aiming at each group of the region position data and the three-dimensional coordinate data of the key points, and generating a frame of three-dimensional motion trail image.
Wherein after the obtaining of the motion trajectory data of the target object, the method further comprises: comparing the motion trail data of the target object with preset abnormal state data; and when the similarity between the motion trail data of the target object and the abnormal state data is greater than a preset similarity threshold, executing abnormal alarm operation corresponding to the abnormal state data.
Before the sequentially inputting the image depth fusion data obtained by fusion into a pre-trained target detection model, the method further comprises: simultaneously acquiring image data of a detection environment and depth data corresponding to a target object; fusing the image data and the depth data which are acquired simultaneously to obtain sample image depth fusion data and marking attribute data and region position data for the sample image depth fusion data; and performing data enhancement processing based on the sample image depth fusion to obtain a plurality of enhanced image depth fusion data corresponding to the sample image depth fusion data, and using each enhanced image depth fusion data as one sample image depth fusion data so as to train the target detection model by using all the obtained sample image depth fusion data.
An embodiment of the present invention further provides a target detection apparatus, including: the acquisition module is used for acquiring image data of a detection environment and depth data corresponding to a target object at intervals of a preset time period; the fusion module is used for sequentially fusing the image data and the depth data acquired at the same time according to the sequence of the acquisition time from first to last to obtain image depth fusion data corresponding to the same time; the detection module is used for sequentially inputting the image depth fusion data obtained by fusion into a pre-trained target detection model according to the sequence of fusion time from first to last, extracting attribute data of the target object from the sequentially input image depth fusion data through the target detection model, and generating three-dimensional motion trajectory data corresponding to the target object according to the sequentially input image depth fusion data.
The embodiment of the invention also provides target detection equipment, which comprises a processor and a memory; the processor is configured to execute an object detection program stored in the memory to implement any one of the object detection methods described above.
Embodiments of the present invention also provide a computer-readable storage medium, which stores one or more programs that can be executed by one or more processors to implement any of the object detection methods described above.
The embodiment of the invention has the following beneficial effects:
in the embodiment of the invention, the image data of the detection environment where the target object is located and the depth data corresponding to the target object are collected, the image data and the depth data are fused into one path of data and input into a pre-trained target detection model, and the attribute data and the three-dimensional motion trajectory data of the target object output by the target detection model are displayed, so that a user can visually see the attribute information of the target object and its motion trajectory. According to the embodiment of the invention, a user does not need to stay around the target object at all times and can instead check the attribute data and the three-dimensional motion trajectory data of the target object output by the target detection model, so the monitoring process is more flexible, a plurality of monitored objects can be monitored simultaneously, the labor cost is low, and the monitoring efficiency is high.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flowchart of a target detection method according to an embodiment of the invention;
FIG. 2 is a flowchart of the processing steps of a target detection model according to an embodiment of the invention;
FIG. 3 is a flowchart of the processing steps of an LSTM model according to an embodiment of the present invention;
FIG. 4 is a block diagram of a target detection apparatus according to an embodiment of the present invention;
FIG. 5 is a block diagram of a target detection device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and specific embodiments.
According to an embodiment of the present invention, there is provided a target detection method. Fig. 1 is a flowchart illustrating a target detection method according to an embodiment of the invention.
Step S110, collecting image data of the detection environment and depth data corresponding to the target object at intervals of a preset time period.
The time length of the preset time period is an empirical value or a value obtained through experiments.
The detection environment refers to an environment in which the target object is located.
The target object refers to the image of the monitored object. For example, the monitored object may be a child, an elderly person, or a patient; it may also be an animal.
Specifically, the camera and the depth sensor are invoked simultaneously, so that the camera collects image data of the detection environment and the depth sensor collects depth data corresponding to the target object. The shooting interval of the camera and the sampling interval of the depth sensor are both equal to the preset time period.
The image data refers to data of one frame of image collected by the camera. Further, the viewing range of the camera is the detection environment.
The depth data refers to the depth value of the monitored object in the detection environment, measured from the position of the depth sensor; this depth value is taken as the depth value corresponding to the target object. Further, the depth sensor directly faces the monitored object or the monitored location of the object. The monitored location may be, for example, a child's bed or a patient's bed. Further, if the depth sensor must track a movable monitored object, a positioning device can be arranged on the monitored object so that it transmits the position of the target object to the depth sensor in real time; alternatively, an infrared receiver is arranged on the depth sensor and an infrared transmitter on the monitored object, the infrared transmitter sends an infrared signal to the infrared receiver, the infrared receiver locates the position of the infrared transmitter according to the received signal, and the depth sensor is controlled to point at that position.
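By way of illustration only, the following Python sketch shows one possible acquisition loop for this step. The camera is read through OpenCV; `read_depth_sensor` and the sampling period are hypothetical placeholders, since the patent does not name a specific depth-sensor API or interval value.

```python
import time
import cv2  # camera capture; the depth-sensor call below is a placeholder assumption

SAMPLE_PERIOD_S = 0.5  # the "preset time period"; an assumed value

def read_depth_sensor():
    """Placeholder for a vendor-specific depth-sensor read; returns the depth
    value of the monitored object relative to the sensor position."""
    raise NotImplementedError

def acquire_stream(camera_index=0):
    cap = cv2.VideoCapture(camera_index)
    try:
        while True:
            ok, frame = cap.read()          # image data of the detection environment
            depth = read_depth_sensor()     # depth data corresponding to the target object
            if ok:
                yield time.time(), frame, depth
            time.sleep(SAMPLE_PERIOD_S)     # camera and depth sensor share the same interval
    finally:
        cap.release()
```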
And step S120, sequentially fusing the image data and the depth data acquired at the same time according to the sequence of the acquisition time from first to last to obtain image depth fusion data corresponding to the same time.
The interval between two adjacent sampling moments is the preset time period. A set of image data and depth data may be acquired at each acquisition instant. The preceding acquisition instant is earlier than the following acquisition instant.
And performing fusion processing on each group of image data and depth data according to the sequence of the acquisition time from first to last to obtain image depth fusion data corresponding to the group of image data and depth data.
Specifically, the fusion process includes: and combining the image data and the depth data acquired at the same time to form an image depth one-dimensional array corresponding to the same time, and taking the image depth one-dimensional array as image depth fusion data corresponding to the same time. The image data is taken as an image data part (image data element) in the image depth one-dimensional array, and the depth data is taken as a depth data part (depth data element) in the image depth one-dimensional array. Further, the present embodiment does not limit the ordering of the image data elements and the depth data elements in the image depth one-dimensional array.
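As a minimal sketch (not part of the patent), the fusion described above can be expressed as a simple concatenation of the image data and the depth data acquired at the same instant into one one-dimensional array; the element ordering chosen here is arbitrary, consistent with the statement that the ordering is not limited.

```python
import numpy as np

def fuse_to_1d(frame, depth):
    """Concatenate one camera frame and its depth reading into a single 1-D array,
    i.e. the 'image depth fusion data' for one acquisition instant."""
    image_part = np.asarray(frame, dtype=np.float32).ravel()   # image data elements
    depth_part = np.atleast_1d(np.float32(depth)).ravel()      # depth data element(s)
    return np.concatenate([image_part, depth_part])            # ordering is not restricted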
And step S130, sequentially inputting the image depth fusion data obtained by fusion into a pre-trained target detection model according to the sequence of fusion time from first to last, extracting attribute data of the target object through the target detection model, and generating three-dimensional motion trajectory data corresponding to the target object.
And the target detection model is used for extracting attribute data of the target object from the sequentially input image depth fusion data and generating three-dimensional motion trail data corresponding to the target object according to the sequentially input image depth fusion data.
Attribute data of the target object includes, but is not limited to: the category, name, age, height, skin tone, clothing, and number of the target object. The categories of the target object are, for example, children, the elderly, and patients. The number may be, for example, the hospitalization number shown on the patient's wristband.
The three-dimensional motion trajectory data refers to video data of a target object moving in a three-dimensional coordinate space.
Three-dimensional motion trajectory data comprising: and (4) multiple frames of three-dimensional motion trail images. And playing the multi-frame three-dimensional motion track image to display the motion track of the target object in the three-dimensional coordinate space.
Further, the three-dimensional motion trajectory data comprises: three-dimensional motion trajectory data of a target object (a monitored object) at the current acquisition moment and three-dimensional motion trajectory data of the target object at a future moment. The future time is a time after a preset time length from the current acquisition time. That is, the target detection model is trained so that the target detection model predicts three-dimensional motion trajectory data of the target object at a future time from the sequentially input image depth fusion data.
And displaying the attribute data of the target object output by the target detection model and the three-dimensional motion trail data corresponding to the target object in preset display equipment.
In the embodiment of the invention, the image data of the detection environment where the target object is located and the depth data corresponding to the target object are collected, the image data and the depth data are fused into one path of data and input into a pre-trained target detection model, and the attribute data and the three-dimensional motion trajectory data of the target object output by the target detection model are displayed, so that a user can visually see the attribute information of the target object and its motion trajectory. According to the embodiment of the invention, a user does not need to stay around the target object at all times and can instead check the attribute data and the three-dimensional motion trajectory data of the target object output by the target detection model, so the monitoring process is more flexible, a plurality of monitored objects can be monitored simultaneously, the labor cost is low, and the monitoring efficiency is high.
Further, because the state of the monitored object changes over time, the types of monitored object are diverse (children, the elderly, and various patients), and the path along which the monitored object moves in the detection environment is random, using the target detection model to track the trajectory of the target object (the monitored object) and generate three-dimensional motion trajectory data is of great significance for automatic monitoring of the monitored object.
Furthermore, the pre-trained target detection model can predict the three-dimensional motion trajectory data of the target object at a future moment, so that the user can anticipate events from that predicted trajectory, estimate the risk to the monitored object, solve problems the monitored object encounters in time, ensure the safety of the monitored object, and avoid the time, labor, and economic cost wasted when problems are discovered or handled too late. For example, from the image data and depth data acquired over a period of time, it is predicted that the target object is about to pull out an infusion needle; the user can then reach the target object in time to help, or to prevent the needle-pulling behavior.
In order to make the object detection process more clear, the structure and function of the object detection model are further described below.
The target detection model is a pre-trained model. The target detection model includes a YOLO (You Only Look Once) model and an LSTM (Long Short-Term Memory) model connected to each other.
The processing of the object detection model is further described below.
FIG. 2 is a flowchart illustrating the processing steps of the target detection model according to an embodiment of the invention.
Step S210, sequentially inputting the image depth fusion data obtained by fusion into the YOLO model.
And the YOLO model is used for detecting a target object, extracting attribute data of the target object, detecting key points of the target object and extracting three-dimensional coordinate data of the key points in the image data of the image depth fusion data. Wherein the number of the key points may be plural.
The key point refers to a key position of the target object. For example: eyebrows, canthus, corners of the mouth, earlobes, shoulders, elbows, knees, etc. The keypoints may delineate an image region of the target object.
Step S220, detecting the target object in the sequentially input image depth fusion data through the YOLO model, and extracting three-dimensional coordinate data of a key point of the target object and attribute data of the target object.
The three-dimensional coordinate data of a key point refers to the coordinates of that key point in three-dimensional space.
In the present embodiment, in order to more accurately represent the three-dimensional motion data of the target object, three-dimensional coordinate data of each of a plurality of key points may be extracted.
The detected target object and the key points of the detected target object are output to the LSTM model through the YOLO model.
Step S230, performing motion trajectory tracking on the target object detected by the YOLO model in the sequentially input image depth fusion data through the LSTM model to obtain motion trajectory data of the target object, and generating three-dimensional motion trajectory data of the target object according to the motion trajectory data of the target object and the three-dimensional coordinate data of the key point of the target object extracted from the image depth fusion data by the YOLO model.
And the LSTM model is used for tracking the motion trail of the target object according to the output result of the YOLO model and generating three-dimensional motion trail data of the target object.
And tracking the motion trail, namely tracking key points of the target object.
Motion trajectory data comprising: region position data of the target object in the image depth fusion data. The region position data may be a region position of a figure formed by key points of the target object. The graphic is similar to the target object.
The image depth fusion data comprises an image data part and a depth data part, and the motion trail data is the region position data of the target object in the image data of the image depth fusion data.
Since the key points of the target object may reflect its posture, the motion trajectory data may reflect the state of the monitored object (target object). The state of the monitored object may be determined using the motion trajectory data of a single frame of image data or of multiple frames of image data. The state includes health status and behavioral status. For example, from the motion trajectory data of one frame of image data, it is determined whether the monitored object is within a preset range of the detection environment. As another example, from the motion trajectory data of one frame of image data, it is determined whether the monitored object is wearing a mask and whether it keeps its distance from other people. As a further example, from the motion trajectory data of multiple frames of image data, it is determined whether the gait of the monitored object is normal and whether the monitored object is twitching.
In the target detection model, a basic network model may be further included, an output of the basic network model is connected to an input of the YOLO model, and an output of the YOLO model is connected to an input of the LSTM model. The basic network model is used for preprocessing the image data in the image depth fusion data, converting the image data into gray data and reducing noise in the image data.
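As an illustrative sketch only, the preprocessing attributed to the basic network model might look as follows in Python; the patent does not fix the exact operators, so the Gaussian blur used here for noise reduction is an assumption.

```python
import cv2

def preprocess(frame):
    """Convert the image part of the fused data to grey-scale and suppress noise
    before it is passed to the YOLO model (stand-in for the 'basic network model')."""
    grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)   # convert to grey-scale data
    denoised = cv2.GaussianBlur(grey, (3, 3), 0)     # assumed noise-reduction operator
    return denoised
```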
The process of the YOLO model detecting the key points and extracting the three-dimensional coordinate data of the key points is further described below.
After the image of the target object is detected, the YOLO model identifies key points in the image of the target object, and extracts three-dimensional coordinate data of each key point according to a plurality of identified key points of preset types. The plurality of preset types of key points include, but are not limited to: eyebrow key points, shoulder key points, elbow key points, waist key points, knee key points and step key points.
Further, the YOLO model includes a three-dimensional convolutional layer. Further, the YOLO model is the fourth version of the YOLO algorithm (YOLOv4), whose two-dimensional convolutional layer is expanded into a three-dimensional convolutional layer. The three-dimensional convolutional layer comprises convolutional layers of three channels; the three channels have the same structure, and the convolutional layer of each channel is used to extract feature data of one spatial dimension.
Extracting two-dimensional feature data of key points of the target object in an image data part of the image depth fusion data through the three-dimensional convolution layer, and extracting one-dimensional feature data of the key points of the target object in a depth data part of the image depth fusion data; and generating three-dimensional coordinate data of key points of the target object according to the two-dimensional characteristic data and the one-dimensional characteristic data.
Wherein the spatial dimension of the one-dimensional feature data is different from the spatial dimension of the two-dimensional feature data. For example: the two-dimensional feature data includes: coordinates of an X axis and a Y axis corresponding to the key points; the one-dimensional feature data includes: z-axis coordinates corresponding to the key points; and fusing the X-axis, Y-axis and Z-axis coordinates corresponding to the key points to obtain three-dimensional coordinate data corresponding to the key points.
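A minimal sketch of this coordinate fusion is given below, assuming the two-dimensional features are already per-keypoint (X, Y) coordinates from the image data part and the one-dimensional features are per-keypoint Z coordinates from the depth data part; the per-keypoint layout is an assumption for illustration.

```python
import numpy as np

def to_3d_keypoints(xy_features, z_features):
    """Fuse per-keypoint (x, y) coordinates from the image branch with the
    z coordinate from the depth branch into (x, y, z) triples."""
    xy = np.asarray(xy_features, dtype=np.float32).reshape(-1, 2)  # from the image data part
    z = np.asarray(z_features, dtype=np.float32).reshape(-1, 1)    # from the depth data part
    return np.concatenate([xy, z], axis=1)                         # shape: (num_keypoints, 3)
```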
The processing for the LSTM model is further described below. FIG. 3 is a flow chart illustrating the processing steps of the LSTM model according to an embodiment of the present invention.
Step S310, a three-dimensional coordinate space is constructed.
The three-dimensional coordinate space may be a sky box of a preset viewing angle.
The preset viewing angle is the same as the viewing angle of the camera that collects the image data.
Step S320, sequentially obtaining the region position data of the target object and the three-dimensional coordinate data of the key point in each image depth fusion data according to the sequence of the fusion time from first to last, and obtaining multiple sets of region position data and three-dimensional coordinate data of the key point.
The three-dimensional motion trajectory data includes: and (4) multiple frames of three-dimensional motion trail images. The three-dimensional motion trail image refers to an image of a three-dimensional model corresponding to the target object in a three-dimensional coordinate space.
The region position data of the target object may embody a region position of the target object in the image data.
The LSTM model may extract time-series features of multiple frames of image data from the continuously input image depth fusion data and determine the region position of the target object in the image data according to those features. The extracted time-series features capture what differs between the previous frame of image data and the next. For example, the human body moves continuously while walking, so as it moves from near to far (or from far to near) two adjacent frames of image data are partly similar and partly different; the time-series feature represents this temporal relationship and can express the difference between the two frames.
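For illustration, a toy PyTorch module of this kind of sequence model is sketched below; the layer sizes and the output format (one region position per frame, as a box) are assumptions, not the patent's architecture.

```python
import torch
import torch.nn as nn

class TrajectoryLSTM(nn.Module):
    """Toy LSTM that maps a sequence of per-frame detection features
    (e.g. flattened keypoint coordinates) to per-frame region positions
    (x, y, w, h). Layer sizes are illustrative only."""
    def __init__(self, feature_dim, hidden_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 4)   # one region position per frame

    def forward(self, features):               # features: (batch, time, feature_dim)
        hidden, _ = self.lstm(features)        # temporal features across frames
        return self.head(hidden)               # (batch, time, 4)
```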
Step S330, setting a three-dimensional model corresponding to the target object in the three-dimensional coordinate space according to the region position data and the three-dimensional coordinate data of the key points aiming at each group of region position data and three-dimensional coordinate data of the key points, and generating a frame of three-dimensional motion trail image.
And each group of regional position data and the three-dimensional coordinate data of the key points are extracted from the same image depth fusion data.
In the present embodiment, a three-dimensional motion trajectory image of the target object may be generated using Unity 3D.
Specifically, a three-dimensional model of the monitored object, i.e., the three-dimensional model corresponding to the target object, may be acquired in advance. The region position data indicates the region position of the target object in the XOY plane; the three-dimensional coordinate data of the key points indicates the positions of the key points of the target object in the three-dimensional coordinate space. The three-dimensional model is placed at the region position of the target object in the XOY plane; the mapping points on the three-dimensional model corresponding to each key point of the target object are determined, and the three-dimensional coordinates of each mapping point are adjusted to the three-dimensional coordinates of the corresponding key point. After the adjustment is finished, an image of the three-dimensional model under the preset viewing angle is generated as one frame of the three-dimensional motion trajectory image corresponding to the target object.
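The patent renders these frames with Unity 3D; purely as an illustration of the placement step, the following NumPy sketch translates a pre-built model to the region position and snaps its mapped vertices onto the detected key-point coordinates. The vertex/keypoint mapping is a hypothetical input.

```python
import numpy as np

def place_model(template_vertices, mapping, keypoints_3d, region_xy):
    """Position a pre-built 3-D model of the monitored object for one frame:
    translate it to the region position in the XOY plane, then snap the
    vertices mapped to keypoints onto the detected 3-D keypoint coordinates.
    mapping[i] gives the vertex index corresponding to keypoint i."""
    verts = np.asarray(template_vertices, dtype=np.float32).copy()
    verts[:, 0] += region_xy[0]                 # move the model to the region position (X)
    verts[:, 1] += region_xy[1]                 # move the model to the region position (Y)
    for kp_idx, v_idx in enumerate(mapping):
        verts[v_idx] = keypoints_3d[kp_idx]     # adjust mapped point to keypoint coordinates
    return verts                                # to be rendered from the preset viewing angle
```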
In this embodiment, since the motion trajectory data of the target object may be used to represent the state of the monitored object, in order to avoid missing an abnormal behavior of the monitored object, after obtaining the motion trajectory data of the target object, the motion trajectory data of the target object is compared with preset abnormal state data; and when the similarity between the motion trail data of the target object and the abnormal state data is greater than a preset similarity threshold, executing abnormal alarm operation corresponding to the abnormal state data.
The abnormal state data is image data when the subject has abnormal behavior.
The similarity threshold is an empirical value or a value obtained through experiments.
The abnormal alarm operation includes: and sending a preset warning sound and/or sending information of warning content corresponding to the abnormal state data to the target user. The information is text information and/or voice information.
For example, the monitored object is a patient, the motion trajectory data of the target object is image data of the patient pulling out an infusion needle, and the preset abnormal state data is image data of infusion-needle-pulling behavior. The similarity between the motion trajectory data of the target object and the abnormal state data therefore exceeds the similarity threshold, and a preset warning sound is emitted or a message that the patient is attempting to pull out the needle is sent to the nurse station.
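A minimal sketch of the comparison and alarm step is shown below; the patent does not specify the similarity measure, so cosine similarity and the threshold value are assumptions, and `alert` stands in for the warning-sound or nurse-station notification.

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.8  # the preset threshold is an empirical value; 0.8 is assumed

def cosine_similarity(a, b):
    a, b = np.ravel(a).astype(np.float32), np.ravel(b).astype(np.float32)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def check_abnormal(trajectory_data, abnormal_patterns, alert):
    """Compare trajectory data against each preset abnormal-state pattern and
    trigger the corresponding alarm when the similarity exceeds the threshold."""
    for name, pattern in abnormal_patterns.items():
        if cosine_similarity(trajectory_data, pattern) > SIMILARITY_THRESHOLD:
            alert(name)   # e.g. play a warning sound or notify the nurse station
```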
In this embodiment, the target detection model is obtained by pre-training. Before training the target detection model, a sample data set needs to be set for the target detection model. The sample dataset comprises a plurality of sample image depth fusion data which have been annotated.
Specifically, image data of the detection environment and depth data corresponding to the target object are acquired simultaneously, and the simultaneously acquired image data and depth data are fused to obtain sample image depth fusion data, which is annotated with attribute data and region position data. Furthermore, when the image data of the detection environment is collected, image data of the monitored object from the side, the front, the back, and other angles should be collected as far as possible to ensure the diversity of the sample image depth fusion data, so that three-dimensional detection of the target object can be performed better.
When the amount of sample image depth fusion data is small, it can be enhanced, i.e., the number of samples is expanded on the basis of the existing sample image depth fusion data. Specifically, data enhancement processing is performed on the sample image depth fusion data to obtain multiple enhanced image depth fusion data corresponding to it, and each enhanced image depth fusion data is used as one sample image depth fusion data, so that the target detection model is trained with all of the obtained sample image depth fusion data.
Further, the image data in the sample image depth fusion data is randomly cropped, scaled, rotated, flipped, mirrored, given added noise, or has its contrast, brightness, and chroma randomly adjusted to obtain new image data; the new image data is combined with the depth data of the sample image depth fusion data to obtain new image depth fusion data, which is used as additional sample image depth fusion data.
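For illustration, the following Python sketch produces one enhanced sample under a few of these operations; the parameter ranges are assumptions, and adjusting the keypoint/box annotations for the geometric transforms is omitted for brevity.

```python
import random
import cv2
import numpy as np

def augment(image, depth):
    """Produce one 'enhanced' sample from an annotated sample: perturb the
    image part and re-fuse it with the unchanged depth part."""
    out = image.copy()
    if random.random() < 0.5:
        out = cv2.flip(out, 1)                                      # horizontal mirror
    angle = random.uniform(-10, 10)
    h, w = out.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    out = cv2.warpAffine(out, m, (w, h))                            # small random rotation
    out = cv2.convertScaleAbs(out, alpha=random.uniform(0.8, 1.2),  # random contrast
                              beta=random.uniform(-20, 20))         # random brightness
    noise = np.random.normal(0, 5, out.shape)
    out = np.clip(out.astype(np.float32) + noise, 0, 255).astype(np.uint8)  # added noise
    return out, depth   # re-fusing with the depth data yields a new sample
```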
And storing the obtained sample image depth fusion data into a preset sample data set.
Before training the target detection model, its training configuration is set. In this embodiment, this configuration includes, but is not limited to: the number of iterations, the initial weights and network structure of the target detection model, the learning rate, and the convolution kernels.
The target detection model is trained with a gradient descent method. While training with the training data set, the parameters and weights of the target detection model are adjusted continuously over the preset number of iterations. If the target detection model has not converged when the iterations finish, its network structure is adjusted and the model is trained again, until it converges.
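A schematic gradient-descent training loop is sketched below for illustration only; the optimizer, learning rate, and epoch count are assumptions rather than the patent's settings.

```python
import torch

def train(model, loader, loss_fn, epochs=50, lr=1e-3):
    """Plain gradient-descent training loop for the target detection model;
    hyper-parameters are illustrative, not the patent's values."""
    optimiser = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):                       # preset number of iterations
        for fused_sample, labels in loader:       # annotated sample image depth fusion data
            optimiser.zero_grad()
            loss = loss_fn(model(fused_sample), labels)
            loss.backward()                       # gradient descent step
            optimiser.step()
```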
Further, in order to increase the accuracy and robustness of the YOLO model, multi-scale training is added to the YOLO model, image data in the sample image depth fusion data are converted into multiple scales, and the YOLO model is trained by using the image data of the multiple scales. For example: the image width and height in the configuration file of the YOLO model are set to 640 x 640, so that the detection accuracy of the YOLO model on the small target is improved.
In this embodiment, the accuracy and stability of the target detection model are evaluated with a loss function. When the loss value of the target detection model is smaller than a preset loss threshold, the target detection model has converged. Further, the loss function may be a confidence loss function, a classification loss function, or a loss function based on the target bounding box. For example, a Distance Intersection over Union (DIoU) loss function or a Complete Intersection over Union (CIoU) loss function may be used to calculate the loss value of the target detection model, and when the loss value is greater than the preset loss threshold, the parameters of the target detection model are adjusted.
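As an illustrative reference implementation of the DIoU loss mentioned above (boxes given as (x1, y1, x2, y2) corner coordinates); this is a standard formulation, not code from the patent.

```python
import torch

def diou_loss(pred, target, eps=1e-7):
    """Distance-IoU loss for axis-aligned boxes:
    1 - IoU + (centre distance)^2 / (enclosing-box diagonal)^2."""
    x1 = torch.max(pred[..., 0], target[..., 0])
    y1 = torch.max(pred[..., 1], target[..., 1])
    x2 = torch.min(pred[..., 2], target[..., 2])
    y2 = torch.min(pred[..., 3], target[..., 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    iou = inter / (area_p + area_t - inter + eps)

    # squared distance between box centres
    cdx = (pred[..., 0] + pred[..., 2]) / 2 - (target[..., 0] + target[..., 2]) / 2
    cdy = (pred[..., 1] + pred[..., 3]) / 2 - (target[..., 1] + target[..., 3]) / 2
    centre_dist2 = cdx ** 2 + cdy ** 2

    # squared diagonal of the smallest enclosing box
    ex1 = torch.min(pred[..., 0], target[..., 0])
    ey1 = torch.min(pred[..., 1], target[..., 1])
    ex2 = torch.max(pred[..., 2], target[..., 2])
    ey2 = torch.max(pred[..., 3], target[..., 3])
    diag2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + eps

    return 1 - iou + centre_dist2 / diag2
```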
The embodiment of the invention also provides a target detection apparatus. Fig. 4 is a block diagram of a target detection apparatus according to an embodiment of the present invention.
The target detection apparatus includes: an acquisition module 410, a fusion module 420, and a detection module 430.
The acquisition module 410 is configured to acquire image data of a detection environment and depth data corresponding to a target object at intervals of a preset time period.
And the fusion module 420 is configured to sequentially fuse the image data and the depth data acquired at the same time according to the sequence of the acquisition time from first to last to obtain image depth fusion data corresponding to the same time.
The detection module 430 is configured to sequentially input the image depth fusion data obtained by fusion into a pre-trained target detection model according to a sequence from first to last at a fusion time, extract attribute data of the target object from the sequentially input image depth fusion data through the target detection model, and generate three-dimensional motion trajectory data corresponding to the target object according to the sequentially input image depth fusion data.
The functions of the apparatus according to the embodiments of the present invention have been described in the above method embodiments, so that reference may be made to the related descriptions in the foregoing embodiments for details which are not described in the present embodiment, and further details are not described herein.
The present embodiment also provides a target detection device. Fig. 5 is a block diagram of a target detection device according to an embodiment of the present invention.
In this embodiment, the target detection device includes, but is not limited to: processor 510, memory 520.
The processor 510 is configured to execute an object detection program stored in the memory 520 to implement the object detection method described above.
Specifically, the processor 510 is configured to execute the object detection program stored in the memory 520 to implement the following steps: acquiring image data of a detection environment and depth data corresponding to a target object at intervals of a preset time period; sequentially fusing the image data and the depth data acquired at the same time according to the sequence of the acquisition time from first to last to obtain image depth fusion data corresponding to the same time; sequentially inputting the image depth fusion data obtained by fusion into a pre-trained target detection model according to the sequence of fusion time from first to last, extracting attribute data of the target object from the sequentially input image depth fusion data through the target detection model, and generating three-dimensional motion trajectory data corresponding to the target object according to the sequentially input image depth fusion data.
The sequentially fusing the image data and the depth data acquired at the same time to obtain image depth fusion data corresponding to the same time comprises the following steps: and combining the image data and the depth data acquired at the same time to form an image depth one-dimensional array corresponding to the same time, and taking the image depth one-dimensional array as image depth fusion data corresponding to the same time.
Wherein the target detection model comprises: a connected YOLO model and a long-short term memory (LSTM) model; the extracting, by the target detection model, attribute data of the target object from the sequentially input image depth fusion data, and generating three-dimensional motion trajectory data corresponding to the target object according to the sequentially input image depth fusion data, includes: detecting the target object in the image depth fusion data sequentially input through the YOLO model, and extracting three-dimensional coordinate data of key points of the target object and attribute data of the target object; and tracking the motion trail of the target object detected by the YOLO model in the sequentially input image depth fusion data through the LSTM model to obtain the motion trail data of the target object, and generating the three-dimensional motion trail data of the target object according to the motion trail data of the target object and the three-dimensional coordinate data of the key point of the target object extracted by the YOLO model in the image depth fusion data.
Wherein the YOLO model comprises: a three-dimensional convolutional layer; the extracting three-dimensional coordinate data of the key points of the target object comprises: extracting two-dimensional feature data of key points of the target object in an image data part of the image depth fusion data through the three-dimensional convolution layer, and extracting one-dimensional feature data of the key points of the target object in a depth data part of the image depth fusion data; generating three-dimensional coordinate data of key points of the target object according to the two-dimensional characteristic data and the one-dimensional characteristic data; wherein the spatial dimension of the one-dimensional feature data is different from the spatial dimension of the two-dimensional feature data.
The motion trail data is region position data of the target object in the image depth fusion data; the three-dimensional motion trajectory data includes: multi-frame three-dimensional motion trail images; the generating three-dimensional motion trajectory data of the target object according to the motion trajectory data of the target object and the three-dimensional coordinate data of the key point of the target object extracted by the YOLO model in the image depth fusion data includes: constructing a three-dimensional coordinate space; sequentially acquiring the region position data of the target object and the three-dimensional coordinate data of the key points in each image depth fusion data according to the sequence of fusion time from first to last to obtain a plurality of groups of region position data and three-dimensional coordinate data of the key points; and setting a three-dimensional model corresponding to the target object in the three-dimensional coordinate space according to the region position data and the three-dimensional coordinate data of the key points aiming at each group of the region position data and the three-dimensional coordinate data of the key points, and generating a frame of three-dimensional motion trail image.
Wherein after the obtaining of the motion trajectory data of the target object, the method further comprises: comparing the motion trail data of the target object with preset abnormal state data; and when the similarity between the motion trail data of the target object and the abnormal state data is greater than a preset similarity threshold, executing abnormal alarm operation corresponding to the abnormal state data.
Before the sequentially inputting the image depth fusion data obtained by fusion into a pre-trained target detection model, the method further comprises: simultaneously acquiring image data of a detection environment and depth data corresponding to a target object; fusing the image data and the depth data which are acquired simultaneously to obtain sample image depth fusion data and marking attribute data and region position data for the sample image depth fusion data; and performing data enhancement processing based on the sample image depth fusion to obtain a plurality of enhanced image depth fusion data corresponding to the sample image depth fusion data, and using each enhanced image depth fusion data as one sample image depth fusion data so as to train the target detection model by using all the obtained sample image depth fusion data.
The embodiment of the invention also provides a computer readable storage medium. The computer-readable storage medium herein stores one or more programs. Among other things, computer-readable storage media may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as read-only memory, flash memory, a hard disk, or a solid state disk; the memory may also comprise a combination of memories of the kind described above.
The one or more programs in the computer-readable storage medium can be executed by the one or more processors to implement the object detection method described above. Since the target detection method has been described in detail above, it is not described herein again.
The above description is only an example of the present invention, and is not intended to limit the present invention, and it is obvious to those skilled in the art that various modifications and variations can be made in the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims (10)

1. A method of object detection, comprising:
acquiring image data of a detection environment and depth data corresponding to a target object at intervals of a preset time period;
sequentially fusing the image data and the depth data acquired at the same time according to the sequence of the acquisition time from first to last to obtain image depth fusion data corresponding to the same time;
sequentially inputting the image depth fusion data obtained by fusion into a pre-trained target detection model according to the sequence of fusion time from first to last, extracting attribute data of the target object from the sequentially input image depth fusion data through the target detection model, and generating three-dimensional motion trajectory data corresponding to the target object according to the sequentially input image depth fusion data.
2. The method according to claim 1, wherein the sequentially fusing the image data and the depth data acquired at the same time to obtain image depth fusion data corresponding to the same time comprises:
and combining the image data and the depth data acquired at the same time to form an image depth one-dimensional array corresponding to the same time, and taking the image depth one-dimensional array as image depth fusion data corresponding to the same time.
3. The method of claim 1,
the target detection model includes: a connected YOLO model and a long-short term memory (LSTM) model;
the extracting, by the target detection model, attribute data of the target object from the sequentially input image depth fusion data, and generating three-dimensional motion trajectory data corresponding to the target object according to the sequentially input image depth fusion data, includes:
detecting the target object in the image depth fusion data sequentially input through the YOLO model, and extracting three-dimensional coordinate data of key points of the target object and attribute data of the target object;
and tracking the motion trail of the target object detected by the YOLO model in the sequentially input image depth fusion data through the LSTM model to obtain the motion trail data of the target object, and generating the three-dimensional motion trail data of the target object according to the motion trail data of the target object and the three-dimensional coordinate data of the key point of the target object extracted by the YOLO model in the image depth fusion data.
4. The method of claim 3,
the YOLO model includes: a three-dimensional convolutional layer;
the extracting three-dimensional coordinate data of the key points of the target object comprises:
extracting two-dimensional feature data of key points of the target object in an image data part of the image depth fusion data through the three-dimensional convolution layer, and extracting one-dimensional feature data of the key points of the target object in a depth data part of the image depth fusion data;
generating three-dimensional coordinate data of key points of the target object according to the two-dimensional characteristic data and the one-dimensional characteristic data; wherein the spatial dimension of the one-dimensional feature data is different from the spatial dimension of the two-dimensional feature data.
5. The method of claim 3,
the motion trail data is regional position data of the target object in the image depth fusion data;
the three-dimensional motion trajectory data includes: multi-frame three-dimensional motion trail images;
the generating three-dimensional motion trajectory data of the target object according to the motion trajectory data of the target object and the three-dimensional coordinate data of the key point of the target object extracted by the YOLO model in the image depth fusion data includes:
constructing a three-dimensional coordinate space;
sequentially acquiring the region position data of the target object and the three-dimensional coordinate data of the key points in each image depth fusion data according to the sequence of fusion time from first to last to obtain a plurality of groups of region position data and three-dimensional coordinate data of the key points;
and setting a three-dimensional model corresponding to the target object in the three-dimensional coordinate space according to the region position data and the three-dimensional coordinate data of the key points aiming at each group of the region position data and the three-dimensional coordinate data of the key points, and generating a frame of three-dimensional motion trail image.
6. The method of claim 3, wherein after said obtaining motion trajectory data of said target object, said method further comprises:
comparing the motion trail data of the target object with preset abnormal state data;
and when the similarity between the motion trail data of the target object and the abnormal state data is greater than a preset similarity threshold, executing abnormal alarm operation corresponding to the abnormal state data.
7. The method according to any one of claims 1-6, wherein before the sequentially inputting the image depth fusion data obtained by fusion into a pre-trained target detection model, the method further comprises:
simultaneously acquiring image data of a detection environment and depth data corresponding to a target object;
fusing the image data and the depth data which are acquired simultaneously to obtain sample image depth fusion data and marking attribute data and region position data for the sample image depth fusion data;
and performing data enhancement processing based on the sample image depth fusion to obtain a plurality of enhanced image depth fusion data corresponding to the sample image depth fusion data, and using each enhanced image depth fusion data as one sample image depth fusion data so as to train the target detection model by using all the obtained sample image depth fusion data.
8. A target detection device, comprising:
an acquisition module configured to acquire, at intervals of a preset time period, image data of a detection environment and depth data corresponding to a target object;
a fusion module configured to sequentially fuse, in order of acquisition time from earliest to latest, the image data and the depth data acquired at the same time, to obtain image depth fusion data corresponding to that time; and
a detection module configured to sequentially input, in order of fusion time from earliest to latest, the image depth fusion data obtained by fusion into a pre-trained target detection model, extract attribute data of the target object from the sequentially input image depth fusion data through the target detection model, and generate three-dimensional motion trajectory data corresponding to the target object according to the sequentially input image depth fusion data.
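A skeleton mirroring the module decomposition of claim 8 is sketched below; the sensor and model interfaces (read_rgb, read_depth, the detector callable) are hypothetical placeholders, and the period corresponds to the preset acquisition interval.

```python
# Sketch: acquisition, fusion and detection modules of the claimed apparatus.
import time

class TargetDetector:
    def __init__(self, camera, depth_sensor, detector, period_s=0.5):
        self.camera, self.depth_sensor = camera, depth_sensor
        self.detector, self.period_s = detector, period_s

    def acquire(self):                            # acquisition module
        return self.camera.read_rgb(), self.depth_sensor.read_depth()

    def fuse(self, rgb, depth):                   # fusion module
        return {"rgb": rgb, "depth": depth, "fused_at": time.time()}

    def detect(self):                             # detection module
        rgb, depth = self.acquire()
        return self.detector(self.fuse(rgb, depth))   # attributes + 3-D trajectory data

    def run(self, n_frames):
        results = []
        for _ in range(n_frames):                 # acquire every preset time period
            results.append(self.detect())
            time.sleep(self.period_s)
        return results
```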
9. A target detection device, characterized in that the target detection device comprises a processor and a memory, the processor being configured to execute a target detection program stored in the memory to implement the target detection method of any one of claims 1 to 7.
10. A computer-readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the target detection method of any one of claims 1 to 7.
CN202010819962.5A 2020-08-14 2020-08-14 Target detection method, device, equipment and computer readable storage medium Pending CN112036267A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010819962.5A CN112036267A (en) 2020-08-14 2020-08-14 Target detection method, device, equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010819962.5A CN112036267A (en) 2020-08-14 2020-08-14 Target detection method, device, equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN112036267A true CN112036267A (en) 2020-12-04

Family

ID=73578617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010819962.5A Pending CN112036267A (en) 2020-08-14 2020-08-14 Target detection method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112036267A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014154839A1 (en) * 2013-03-27 2014-10-02 Mindmaze S.A. High-definition 3d camera device
CN105912999A (en) * 2016-04-05 2016-08-31 中国民航大学 Human behavior identification method based on depth information
CN107330410A (en) * 2017-07-03 2017-11-07 南京工程学院 Method for detecting abnormality based on deep learning under complex environment
WO2019037498A1 (en) * 2017-08-25 2019-02-28 腾讯科技(深圳)有限公司 Active tracking method, device and system
CN108229531A (en) * 2017-09-29 2018-06-29 北京市商汤科技开发有限公司 Characteristics of objects processing method, device, storage medium and electronic equipment
US20190206066A1 (en) * 2017-12-29 2019-07-04 RetailNext, Inc. Human Analytics Using Fusion Of Image & Depth Modalities
CN108171212A (en) * 2018-01-19 2018-06-15 百度在线网络技术(北京)有限公司 For detecting the method and apparatus of target
CN109376667A (en) * 2018-10-29 2019-02-22 北京旷视科技有限公司 Object detection method, device and electronic equipment
CN111460978A (en) * 2020-03-30 2020-07-28 中国科学院自动化研究所南京人工智能芯片创新研究院 Infant behavior monitoring system based on motion judgment sensor and deep learning technology and judgment method thereof

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112859907A (en) * 2020-12-25 2021-05-28 湖北航天飞行器研究所 Rocket debris high-altitude detection method based on three-dimensional special effect simulation under condition of few samples
CN112819804A (en) * 2021-02-23 2021-05-18 西北工业大学 Insulator defect detection method based on improved YOLOv5 convolutional neural network
WO2023119968A1 (en) * 2021-12-20 2023-06-29 コニカミノルタ株式会社 Method for calculating three-dimensional coordinates and device for calculating three-dimensional coordinates
CN114863201A (en) * 2022-03-24 2022-08-05 深圳元戎启行科技有限公司 Training method and device of three-dimensional detection model, computer equipment and storage medium
CN116524135A (en) * 2023-07-05 2023-08-01 方心科技股份有限公司 Three-dimensional model generation method and system based on image
CN116524135B (en) * 2023-07-05 2023-09-15 方心科技股份有限公司 Three-dimensional model generation method and system based on image

Similar Documents

Publication Publication Date Title
CN109477951B (en) System and method for identifying persons and/or identifying and quantifying pain, fatigue, mood and intent while preserving privacy
CN112036267A (en) Target detection method, device, equipment and computer readable storage medium
US11948401B2 (en) AI-based physical function assessment system
Lu et al. Deep learning for fall detection: Three-dimensional CNN combined with LSTM on video kinematic data
Stone et al. Fall detection in homes of older adults using the Microsoft Kinect
CN112784662A (en) Video-based fall risk evaluation system
CN109726672B (en) Tumbling detection method based on human body skeleton sequence and convolutional neural network
Luštrek et al. Fall detection and activity recognition with machine learning
CN111753747B (en) Violent motion detection method based on monocular camera and three-dimensional attitude estimation
Kumar et al. Human activity recognition (har) using deep learning: Review, methodologies, progress and future research directions
CN107411753A (en) A kind of wearable device for identifying gait
Xu et al. Elders’ fall detection based on biomechanical features using depth camera
Nagalakshmi Vallabhaneni The analysis of the impact of yoga on healthcare and conventional strategies for human pose recognition
Alazrai et al. Fall detection for elderly using anatomical-plane-based representation
Mansoor et al. A machine learning approach for non-invasive fall detection using Kinect
Pogorelc et al. Detecting gait-related health problems of the elderly using multidimensional dynamic time warping approach with semantic attributes
Liu et al. A review of wearable sensors based fall-related recognition systems
CN115695734A (en) Infrared thermal imaging protection monitoring method, device, equipment, system and medium
CN113688740B (en) Indoor gesture detection method based on multi-sensor fusion vision
Seredin et al. The study of skeleton description reduction in the human fall-detection task
Dai Vision-based 3d human motion analysis for fall detection and bed-exiting
Pogorelc et al. Home-based health monitoring of the elderly through gait recognition
Nouisser et al. Deep learning and kinect skeleton-based approach for fall prediction of elderly physically disabled
Mastorakis Human fall detection methodologies: from machine learning using acted data to fall modelling using myoskeletal simulation
Li et al. Non-Invasive Screen Exposure Time Assessment Using Wearable Sensor and Object Detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination