CN112036267A - Target detection method, device, equipment and computer readable storage medium - Google Patents


Info

Publication number
CN112036267A
CN112036267A
Authority
CN
China
Prior art keywords
data
target object
dimensional
image depth
fusion
Legal status
Pending
Application number
CN202010819962.5A
Other languages
Chinese (zh)
Inventor
邓海燕
谭龙田
陈高
陈彦宇
马雅奇
Current Assignee
Gree Electric Appliances Inc of Zhuhai
Zhuhai Lianyun Technology Co Ltd
Original Assignee
Gree Electric Appliances Inc of Zhuhai
Zhuhai Lianyun Technology Co Ltd
Application filed by Gree Electric Appliances Inc of Zhuhai and Zhuhai Lianyun Technology Co Ltd
Priority to CN202010819962.5A
Publication of CN112036267A
Legal status: Pending

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target detection method, a target detection device, target detection equipment and a computer-readable storage medium. The method comprises the following steps: acquiring image data of a detection environment and depth data corresponding to a target object at intervals of a preset time period; sequentially fusing image data and depth data acquired at the same time according to the sequence of the acquisition time from first to last to obtain image depth fusion data corresponding to the same time; sequentially inputting the image depth fusion data obtained by fusion into a pre-trained target detection model according to the sequence of fusion time from first to last, extracting attribute data of a target object from the sequentially input image depth fusion data through the target detection model, and generating three-dimensional motion trajectory data corresponding to the target object according to the sequentially input image depth fusion data. According to the invention, a user does not need to stay around the target object at all times and can instead check the attribute data and three-dimensional motion trajectory data of the target object output by the target detection model, so that the monitoring process is more flexible and the monitoring efficiency is high.

Description

Target detection method, device, equipment and computer readable storage medium
Technical Field
The present invention relates to the field of image recognition technologies, and in particular, to a target detection method, apparatus, device, and computer-readable storage medium.
Background
In some application scenarios, attention needs to be paid to the dynamics of a target object in order to identify its needs. For example, in daily life there are people who cannot take care of themselves or who have health problems, such as children, the elderly, and patients. These people require round-the-clock care to avoid the adverse consequences of being left unattended, for example a child being injured because no one is supervising, or a patient's illness going untreated because no caregiver is present.
At present, such people are mostly monitored manually: a guardian generally has to stay with the person under guardianship around the clock, because unpredictable risks may arise once the guardian leaves. However, 24-hour manual monitoring requires the guardian to remain at the monitoring location, and if several persons need to be monitored at the same time, a separate guardian must be assigned to each of them, so the manual monitoring method is inefficient.
Disclosure of Invention
Embodiments of the present invention mainly aim to provide a target detection method, an apparatus, a device, and a computer-readable storage medium, so as to solve the problem of low efficiency of the existing manual monitoring.
In view of the above technical problems, the embodiments of the present invention are implemented by the following technical solutions:
the embodiment of the invention provides a target detection method, which comprises the following steps: acquiring image data of a detection environment and depth data corresponding to a target object at intervals of a preset time period; sequentially fusing the image data and the depth data acquired at the same time according to the sequence of the acquisition time from first to last to obtain image depth fusion data corresponding to the same time; sequentially inputting the image depth fusion data obtained by fusion into a pre-trained target detection model according to the sequence of fusion time from first to last, extracting attribute data of the target object from the sequentially input image depth fusion data through the target detection model, and generating three-dimensional motion trajectory data corresponding to the target object according to the sequentially input image depth fusion data.
The sequentially fusing the image data and the depth data acquired at the same time to obtain image depth fusion data corresponding to the same time comprises the following steps: and combining the image data and the depth data acquired at the same time to form an image depth one-dimensional array corresponding to the same time, and taking the image depth one-dimensional array as image depth fusion data corresponding to the same time.
Wherein the target detection model comprises: a connected YOLO model and a long-short term memory (LSTM) model; the extracting, by the target detection model, attribute data of the target object from the sequentially input image depth fusion data, and generating three-dimensional motion trajectory data corresponding to the target object according to the sequentially input image depth fusion data, includes: detecting the target object in the image depth fusion data sequentially input through the YOLO model, and extracting three-dimensional coordinate data of key points of the target object and attribute data of the target object; and tracking the motion trail of the target object detected by the YOLO model in the sequentially input image depth fusion data through the LSTM model to obtain the motion trail data of the target object, and generating the three-dimensional motion trail data of the target object according to the motion trail data of the target object and the three-dimensional coordinate data of the key point of the target object extracted by the YOLO model in the image depth fusion data.
Wherein the YOLO model comprises: a three-dimensional convolutional layer; the extracting three-dimensional coordinate data of the key points of the target object comprises: extracting two-dimensional feature data of key points of the target object in an image data part of the image depth fusion data through the three-dimensional convolution layer, and extracting one-dimensional feature data of the key points of the target object in a depth data part of the image depth fusion data; generating three-dimensional coordinate data of key points of the target object according to the two-dimensional characteristic data and the one-dimensional characteristic data; wherein the spatial dimension of the one-dimensional feature data is different from the spatial dimension of the two-dimensional feature data.
The motion trail data is region position data of the target object in the image depth fusion data; the three-dimensional motion trajectory data includes: multi-frame three-dimensional motion trail images; the generating three-dimensional motion trajectory data of the target object according to the motion trajectory data of the target object and the three-dimensional coordinate data of the key point of the target object extracted by the YOLO model in the image depth fusion data includes: constructing a three-dimensional coordinate space; sequentially acquiring the region position data of the target object and the three-dimensional coordinate data of the key points in each image depth fusion data according to the sequence of fusion time from first to last to obtain a plurality of groups of region position data and three-dimensional coordinate data of the key points; and setting a three-dimensional model corresponding to the target object in the three-dimensional coordinate space according to the region position data and the three-dimensional coordinate data of the key points aiming at each group of the region position data and the three-dimensional coordinate data of the key points, and generating a frame of three-dimensional motion trail image.
Wherein after the obtaining of the motion trajectory data of the target object, the method further comprises: comparing the motion trail data of the target object with preset abnormal state data; and when the similarity between the motion trail data of the target object and the abnormal state data is greater than a preset similarity threshold, executing abnormal alarm operation corresponding to the abnormal state data.
Before the sequentially inputting the image depth fusion data obtained by fusion into a pre-trained target detection model, the method further comprises: simultaneously acquiring image data of a detection environment and depth data corresponding to a target object; fusing the image data and the depth data which are acquired simultaneously to obtain sample image depth fusion data and marking attribute data and region position data for the sample image depth fusion data; and performing data enhancement processing based on the sample image depth fusion to obtain a plurality of enhanced image depth fusion data corresponding to the sample image depth fusion data, and using each enhanced image depth fusion data as one sample image depth fusion data so as to train the target detection model by using all the obtained sample image depth fusion data.
An embodiment of the present invention further provides a target detection apparatus, including: the acquisition module is used for acquiring image data of a detection environment and depth data corresponding to a target object at intervals of a preset time period; the fusion module is used for sequentially fusing the image data and the depth data acquired at the same time according to the sequence of the acquisition time from first to last to obtain image depth fusion data corresponding to the same time; the detection module is used for sequentially inputting the image depth fusion data obtained by fusion into a pre-trained target detection model according to the sequence of fusion time from first to last, extracting attribute data of the target object from the sequentially input image depth fusion data through the target detection model, and generating three-dimensional motion trajectory data corresponding to the target object according to the sequentially input image depth fusion data.
The embodiment of the invention also provides target detection equipment, which comprises a processor and a memory; the processor is configured to execute an object detection program stored in the memory to implement any one of the object detection methods described above.
Embodiments of the present invention also provide a computer-readable storage medium, which stores one or more programs that can be executed by one or more processors to implement any of the object detection methods described above.
The embodiment of the invention has the following beneficial effects:
in the embodiment of the invention, the image data of the detection environment where the target object is located and the depth data corresponding to the target object are collected, the image data and the depth data are fused into one path of data and input into a pre-trained target detection model, and the attribute data and the three-dimensional motion trajectory data of the target object output by the target detection model are displayed, so that a user can visually see the attribute information of the target object and its motion trajectory. According to the embodiment of the invention, a user does not need to stay around the target object at all times and can instead check the attribute data and the three-dimensional motion trajectory data of the target object output by the target detection model, so the monitoring process is more flexible, a plurality of monitored objects can be monitored simultaneously, the labor cost is low, and the monitoring efficiency is high.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flowchart of a target detection method according to an embodiment of the invention;
FIG. 2 is a flowchart of the processing steps of a target detection model according to an embodiment of the invention;
FIG. 3 is a flowchart of the processing steps of an LSTM model according to an embodiment of the present invention;
FIG. 4 is a block diagram of a target detection apparatus according to an embodiment of the present invention;
FIG. 5 is a block diagram of a target detection device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and specific embodiments.
According to an embodiment of the present invention, there is provided a target detection method. Fig. 1 is a flowchart illustrating a target detection method according to an embodiment of the invention.
Step S110, collecting image data of the detection environment and depth data corresponding to the target object at intervals of a preset time period.
The time length of the preset time period is an empirical value or a value obtained through experiments.
The detection environment refers to an environment in which the target object is located.
The target object refers to the image of the monitored object. For example, the monitored object may be a child, an elderly person, or a patient; it may also be an animal.
Specifically, the camera and the depth sensor are invoked simultaneously, so that the camera collects image data of the detection environment and the depth sensor collects depth data corresponding to the target object. The shooting interval of the camera and the sampling interval of the depth sensor are both equal to the preset time period.
The image data refers to data of one frame of image collected by the camera. Further, the viewing range of the camera is the detection environment.
The depth data refers to the depth value of the monitored object in the detection environment, measured from the position of the depth sensor; this depth value is taken as the depth value corresponding to the target object. Further, the depth sensor directly faces the monitored object or the monitored location of the object. The monitored location may be, for example, a child's bed or a patient's bed. Further, if the depth sensor must track a movable monitored object, a positioning device can be arranged on the monitored object so that it transmits the position of the target object to the depth sensor in real time; alternatively, an infrared receiver is arranged on the depth sensor and an infrared transmitter on the monitored object, the infrared transmitter sends an infrared signal to the infrared receiver, the infrared receiver locates the position of the infrared transmitter according to the received signal, and the depth sensor is controlled to point at that position.
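By way of illustration only, the following Python sketch shows one possible acquisition loop for this step. The camera is read through OpenCV; `read_depth_sensor` and the sampling period are hypothetical placeholders, since the patent does not name a specific depth-sensor API or interval value.

```python
import time
import cv2  # camera capture; the depth-sensor call below is a placeholder assumption

SAMPLE_PERIOD_S = 0.5  # the "preset time period"; an assumed value

def read_depth_sensor():
    """Placeholder for a vendor-specific depth-sensor read; returns the depth
    value of the monitored object relative to the sensor position."""
    raise NotImplementedError

def acquire_stream(camera_index=0):
    cap = cv2.VideoCapture(camera_index)
    try:
        while True:
            ok, frame = cap.read()          # image data of the detection environment
            depth = read_depth_sensor()     # depth data corresponding to the target object
            if ok:
                yield time.time(), frame, depth
            time.sleep(SAMPLE_PERIOD_S)     # camera and depth sensor share the same interval
    finally:
        cap.release()
```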
And step S120, sequentially fusing the image data and the depth data acquired at the same time according to the sequence of the acquisition time from first to last to obtain image depth fusion data corresponding to the same time.
The interval between two adjacent sampling moments is the preset time period. A set of image data and depth data may be acquired at each acquisition instant. The preceding acquisition instant is earlier than the following acquisition instant.
And performing fusion processing on each group of image data and depth data according to the sequence of the acquisition time from first to last to obtain image depth fusion data corresponding to the group of image data and depth data.
Specifically, the fusion process includes: and combining the image data and the depth data acquired at the same time to form an image depth one-dimensional array corresponding to the same time, and taking the image depth one-dimensional array as image depth fusion data corresponding to the same time. The image data is taken as an image data part (image data element) in the image depth one-dimensional array, and the depth data is taken as a depth data part (depth data element) in the image depth one-dimensional array. Further, the present embodiment does not limit the ordering of the image data elements and the depth data elements in the image depth one-dimensional array.
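As a minimal sketch (not part of the patent), the fusion described above can be expressed as a simple concatenation of the image data and the depth data acquired at the same instant into one one-dimensional array; the element ordering chosen here is arbitrary, consistent with the statement that the ordering is not limited.

```python
import numpy as np

def fuse_to_1d(frame, depth):
    """Concatenate one camera frame and its depth reading into a single 1-D array,
    i.e. the 'image depth fusion data' for one acquisition instant."""
    image_part = np.asarray(frame, dtype=np.float32).ravel()   # image data elements
    depth_part = np.atleast_1d(np.float32(depth)).ravel()      # depth data element(s)
    return np.concatenate([image_part, depth_part])            # ordering is not restricted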
And step S130, sequentially inputting the image depth fusion data obtained by fusion into a pre-trained target detection model according to the sequence of fusion time from first to last, extracting attribute data of the target object through the target detection model, and generating three-dimensional motion trajectory data corresponding to the target object.
And the target detection model is used for extracting attribute data of the target object from the sequentially input image depth fusion data and generating three-dimensional motion trail data corresponding to the target object according to the sequentially input image depth fusion data.
Attribute data of the target object includes, but is not limited to: the category, name, age, height, skin tone, clothing, and number of the target object. The categories of the target object are, for example, children, the elderly, and patients. The number may be, for example, the hospitalization number shown on the patient's wristband.
The three-dimensional motion trajectory data refers to video data of a target object moving in a three-dimensional coordinate space.
Three-dimensional motion trajectory data comprising: and (4) multiple frames of three-dimensional motion trail images. And playing the multi-frame three-dimensional motion track image to display the motion track of the target object in the three-dimensional coordinate space.
Further, the three-dimensional motion trajectory data comprises: three-dimensional motion trajectory data of a target object (a monitored object) at the current acquisition moment and three-dimensional motion trajectory data of the target object at a future moment. The future time is a time after a preset time length from the current acquisition time. That is, the target detection model is trained so that the target detection model predicts three-dimensional motion trajectory data of the target object at a future time from the sequentially input image depth fusion data.
And displaying the attribute data of the target object output by the target detection model and the three-dimensional motion trail data corresponding to the target object in preset display equipment.
In the embodiment of the invention, the image data of the detection environment where the target object is located and the depth data corresponding to the target object are collected, the image data and the depth data are fused into one path of data and input into a pre-trained target detection model, and the attribute data and the three-dimensional motion trajectory data of the target object output by the target detection model are displayed, so that a user can visually see the attribute information of the target object and its motion trajectory. According to the embodiment of the invention, a user does not need to stay around the target object at all times and can instead check the attribute data and the three-dimensional motion trajectory data of the target object output by the target detection model, so the monitoring process is more flexible, a plurality of monitored objects can be monitored simultaneously, the labor cost is low, and the monitoring efficiency is high.
Further, because the state of the monitored object changes over time, the types of monitored object are diverse (children, the elderly, and various patients), and the path along which the monitored object moves in the detection environment is random, using the target detection model to track the trajectory of the target object (the monitored object) and generate three-dimensional motion trajectory data is of great significance for automatic monitoring of the monitored object.
Furthermore, the pre-trained target detection model can predict the three-dimensional motion trajectory data of the target object at a future moment, so that the user can anticipate events from that predicted trajectory, estimate the risk to the monitored object, solve problems the monitored object encounters in time, ensure the safety of the monitored object, and avoid the time, labor, and economic cost wasted when problems are discovered or handled too late. For example, from the image data and depth data acquired over a period of time, it is predicted that the target object is about to pull out an infusion needle; the user can then reach the target object in time to help, or to prevent the needle-pulling behavior.
In order to make the object detection process more clear, the structure and function of the object detection model are further described below.
The target detection model is a pre-trained model. The target detection model includes a YOLO (You Only Look Once) model and an LSTM (Long Short-Term Memory) model connected to each other.
The processing of the object detection model is further described below.
FIG. 2 is a flowchart illustrating the processing steps of the target detection model according to an embodiment of the invention.
Step S210, sequentially inputting the image depth fusion data obtained by fusion into the YOLO model.
And the YOLO model is used for detecting a target object, extracting attribute data of the target object, detecting key points of the target object and extracting three-dimensional coordinate data of the key points in the image data of the image depth fusion data. Wherein the number of the key points may be plural.
The key point refers to a key position of the target object. For example: eyebrows, canthus, corners of the mouth, earlobes, shoulders, elbows, knees, etc. The keypoints may delineate an image region of the target object.
Step S220, detecting the target object in the sequentially input image depth fusion data through the YOLO model, and extracting three-dimensional coordinate data of a key point of the target object and attribute data of the target object.
The three-dimensional coordinate data of a key point refers to the coordinates of that key point in three-dimensional space.
In the present embodiment, in order to more accurately represent the three-dimensional motion data of the target object, three-dimensional coordinate data of each of a plurality of key points may be extracted.
The detected target object and the key points of the detected target object are output to the LSTM model through the YOLO model.
Step S230, performing motion trajectory tracking on the target object detected by the YOLO model in the sequentially input image depth fusion data through the LSTM model to obtain motion trajectory data of the target object, and generating three-dimensional motion trajectory data of the target object according to the motion trajectory data of the target object and the three-dimensional coordinate data of the key point of the target object extracted from the image depth fusion data by the YOLO model.
And the LSTM model is used for tracking the motion trail of the target object according to the output result of the YOLO model and generating three-dimensional motion trail data of the target object.
And tracking the motion trail, namely tracking key points of the target object.
Motion trajectory data comprising: region position data of the target object in the image depth fusion data. The region position data may be a region position of a figure formed by key points of the target object. The graphic is similar to the target object.
The image depth fusion data comprises an image data part and a depth data part, and the motion trail data is the region position data of the target object in the image data of the image depth fusion data.
Since the key points of the target object may reflect its posture, the motion trajectory data may reflect the state of the monitored object (target object). The state of the monitored object may be determined using the motion trajectory data of a single frame of image data or of multiple frames of image data. The state includes health status and behavioral status. For example, from the motion trajectory data of one frame of image data, it is determined whether the monitored object is within a preset range of the detection environment. As another example, from the motion trajectory data of one frame of image data, it is determined whether the monitored object is wearing a mask and whether it keeps its distance from other people. As a further example, from the motion trajectory data of multiple frames of image data, it is determined whether the gait of the monitored object is normal and whether the monitored object is twitching.
In the target detection model, a basic network model may be further included, an output of the basic network model is connected to an input of the YOLO model, and an output of the YOLO model is connected to an input of the LSTM model. The basic network model is used for preprocessing the image data in the image depth fusion data, converting the image data into gray data and reducing noise in the image data.
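As an illustrative sketch only, the preprocessing attributed to the basic network model might look as follows in Python; the patent does not fix the exact operators, so the Gaussian blur used here for noise reduction is an assumption.

```python
import cv2

def preprocess(frame):
    """Convert the image part of the fused data to grey-scale and suppress noise
    before it is passed to the YOLO model (stand-in for the 'basic network model')."""
    grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)   # convert to grey-scale data
    denoised = cv2.GaussianBlur(grey, (3, 3), 0)     # assumed noise-reduction operator
    return denoised
```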
The process of the YOLO model detecting the key points and extracting the three-dimensional coordinate data of the key points is further described below.
After the image of the target object is detected, the YOLO model identifies key points in the image of the target object, and extracts three-dimensional coordinate data of each key point according to a plurality of identified key points of preset types. The plurality of preset types of key points include, but are not limited to: eyebrow key points, shoulder key points, elbow key points, waist key points, knee key points and step key points.
Further, the YOLO model includes a three-dimensional convolutional layer. Further, the YOLO model is the fourth version of the YOLO algorithm (YOLOv4), whose two-dimensional convolutional layer is expanded into a three-dimensional convolutional layer. The three-dimensional convolutional layer comprises convolutional layers of three channels; the three channels have the same structure, and the convolutional layer of each channel is used to extract feature data of one spatial dimension.
Extracting two-dimensional feature data of key points of the target object in an image data part of the image depth fusion data through the three-dimensional convolution layer, and extracting one-dimensional feature data of the key points of the target object in a depth data part of the image depth fusion data; and generating three-dimensional coordinate data of key points of the target object according to the two-dimensional characteristic data and the one-dimensional characteristic data.
Wherein the spatial dimension of the one-dimensional feature data is different from the spatial dimension of the two-dimensional feature data. For example: the two-dimensional feature data includes: coordinates of an X axis and a Y axis corresponding to the key points; the one-dimensional feature data includes: z-axis coordinates corresponding to the key points; and fusing the X-axis, Y-axis and Z-axis coordinates corresponding to the key points to obtain three-dimensional coordinate data corresponding to the key points.
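A minimal sketch of this coordinate fusion is given below, assuming the two-dimensional features are already per-keypoint (X, Y) coordinates from the image data part and the one-dimensional features are per-keypoint Z coordinates from the depth data part; the per-keypoint layout is an assumption for illustration.

```python
import numpy as np

def to_3d_keypoints(xy_features, z_features):
    """Fuse per-keypoint (x, y) coordinates from the image branch with the
    z coordinate from the depth branch into (x, y, z) triples."""
    xy = np.asarray(xy_features, dtype=np.float32).reshape(-1, 2)  # from the image data part
    z = np.asarray(z_features, dtype=np.float32).reshape(-1, 1)    # from the depth data part
    return np.concatenate([xy, z], axis=1)                         # shape: (num_keypoints, 3)
```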
The processing for the LSTM model is further described below. FIG. 3 is a flow chart illustrating the processing steps of the LSTM model according to an embodiment of the present invention.
Step S310, a three-dimensional coordinate space is constructed.
The three-dimensional coordinate space may be a sky box of a preset viewing angle.
The preset viewing angle is the same as the viewing angle of the camera that collects the image data.
Step S320, sequentially obtaining the region position data of the target object and the three-dimensional coordinate data of the key point in each image depth fusion data according to the sequence of the fusion time from first to last, and obtaining multiple sets of region position data and three-dimensional coordinate data of the key point.
The three-dimensional motion trajectory data includes: and (4) multiple frames of three-dimensional motion trail images. The three-dimensional motion trail image refers to an image of a three-dimensional model corresponding to the target object in a three-dimensional coordinate space.
The region position data of the target object may embody a region position of the target object in the image data.
The LSTM model may extract time-series features of multiple frames of image data from the continuously input image depth fusion data and determine the region position of the target object in the image data according to those features. The extracted time-series features capture what differs between the previous frame of image data and the next. For example, the human body moves continuously while walking, so as it moves from near to far (or from far to near) two adjacent frames of image data are partly similar and partly different; the time-series feature represents this temporal relationship and can express the difference between the two frames.
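For illustration, a toy PyTorch module of this kind of sequence model is sketched below; the layer sizes and the output format (one region position per frame, as a box) are assumptions, not the patent's architecture.

```python
import torch
import torch.nn as nn

class TrajectoryLSTM(nn.Module):
    """Toy LSTM that maps a sequence of per-frame detection features
    (e.g. flattened keypoint coordinates) to per-frame region positions
    (x, y, w, h). Layer sizes are illustrative only."""
    def __init__(self, feature_dim, hidden_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 4)   # one region position per frame

    def forward(self, features):               # features: (batch, time, feature_dim)
        hidden, _ = self.lstm(features)        # temporal features across frames
        return self.head(hidden)               # (batch, time, 4)
```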
Step S330, setting a three-dimensional model corresponding to the target object in the three-dimensional coordinate space according to the region position data and the three-dimensional coordinate data of the key points aiming at each group of region position data and three-dimensional coordinate data of the key points, and generating a frame of three-dimensional motion trail image.
And each group of regional position data and the three-dimensional coordinate data of the key points are extracted from the same image depth fusion data.
In the present embodiment, a three-dimensional motion trajectory image of the target object may be generated using Unity 3D.
Specifically, a three-dimensional model of the monitored object, i.e., the three-dimensional model corresponding to the target object, may be acquired in advance. The region position data indicates the region position of the target object in the XOY plane; the three-dimensional coordinate data of the key points indicates the positions of the key points of the target object in the three-dimensional coordinate space. The three-dimensional model is placed at the region position of the target object in the XOY plane; the mapping points on the three-dimensional model corresponding to each key point of the target object are determined, and the three-dimensional coordinates of each mapping point are adjusted to the three-dimensional coordinates of the corresponding key point. After the adjustment is finished, an image of the three-dimensional model under the preset viewing angle is generated as one frame of the three-dimensional motion trajectory image corresponding to the target object.
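The patent renders these frames with Unity 3D; purely as an illustration of the placement step, the following NumPy sketch translates a pre-built model to the region position and snaps its mapped vertices onto the detected key-point coordinates. The vertex/keypoint mapping is a hypothetical input.

```python
import numpy as np

def place_model(template_vertices, mapping, keypoints_3d, region_xy):
    """Position a pre-built 3-D model of the monitored object for one frame:
    translate it to the region position in the XOY plane, then snap the
    vertices mapped to keypoints onto the detected 3-D keypoint coordinates.
    mapping[i] gives the vertex index corresponding to keypoint i."""
    verts = np.asarray(template_vertices, dtype=np.float32).copy()
    verts[:, 0] += region_xy[0]                 # move the model to the region position (X)
    verts[:, 1] += region_xy[1]                 # move the model to the region position (Y)
    for kp_idx, v_idx in enumerate(mapping):
        verts[v_idx] = keypoints_3d[kp_idx]     # adjust mapped point to keypoint coordinates
    return verts                                # to be rendered from the preset viewing angle
```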
In this embodiment, since the motion trajectory data of the target object may be used to represent the state of the monitored object, in order to avoid missing an abnormal behavior of the monitored object, after obtaining the motion trajectory data of the target object, the motion trajectory data of the target object is compared with preset abnormal state data; and when the similarity between the motion trail data of the target object and the abnormal state data is greater than a preset similarity threshold, executing abnormal alarm operation corresponding to the abnormal state data.
The abnormal state data is image data when the subject has abnormal behavior.
The similarity threshold is an empirical value or a value obtained through experiments.
The abnormal alarm operation includes: and sending a preset warning sound and/or sending information of warning content corresponding to the abnormal state data to the target user. The information is text information and/or voice information.
For example, the monitored object is a patient, the motion trajectory data of the target object is image data of the patient pulling out an infusion needle, and the preset abnormal state data is image data of infusion-needle-pulling behavior. The similarity between the motion trajectory data of the target object and the abnormal state data therefore exceeds the similarity threshold, and a preset warning sound is emitted or a message that the patient is attempting to pull out the needle is sent to the nurse station.
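A minimal sketch of the comparison and alarm step is shown below; the patent does not specify the similarity measure, so cosine similarity and the threshold value are assumptions, and `alert` stands in for the warning-sound or nurse-station notification.

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.8  # the preset threshold is an empirical value; 0.8 is assumed

def cosine_similarity(a, b):
    a, b = np.ravel(a).astype(np.float32), np.ravel(b).astype(np.float32)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def check_abnormal(trajectory_data, abnormal_patterns, alert):
    """Compare trajectory data against each preset abnormal-state pattern and
    trigger the corresponding alarm when the similarity exceeds the threshold."""
    for name, pattern in abnormal_patterns.items():
        if cosine_similarity(trajectory_data, pattern) > SIMILARITY_THRESHOLD:
            alert(name)   # e.g. play a warning sound or notify the nurse station
```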
In this embodiment, the target detection model is obtained by pre-training. Before training the target detection model, a sample data set needs to be set for the target detection model. The sample dataset comprises a plurality of sample image depth fusion data which have been annotated.
Specifically, image data of the detection environment and depth data corresponding to the target object are acquired simultaneously, and the simultaneously acquired image data and depth data are fused to obtain sample image depth fusion data, which is annotated with attribute data and region position data. Furthermore, when the image data of the detection environment is collected, image data of the monitored object from the side, the front, the back, and other angles should be collected as far as possible to ensure the diversity of the sample image depth fusion data, so that three-dimensional detection of the target object can be performed better.
When the amount of sample image depth fusion data is small, it can be enhanced, i.e., the number of samples is expanded on the basis of the existing sample image depth fusion data. Specifically, data enhancement processing is performed on the sample image depth fusion data to obtain multiple enhanced image depth fusion data corresponding to it, and each enhanced image depth fusion data is used as one sample image depth fusion data, so that the target detection model is trained with all of the obtained sample image depth fusion data.
Further, the image data in the sample image depth fusion data is randomly cropped, scaled, rotated, flipped, mirrored, given added noise, or has its contrast, brightness, and chroma randomly adjusted to obtain new image data; the new image data is combined with the depth data of the sample image depth fusion data to obtain new image depth fusion data, which is used as additional sample image depth fusion data.
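For illustration, the following Python sketch produces one enhanced sample under a few of these operations; the parameter ranges are assumptions, and adjusting the keypoint/box annotations for the geometric transforms is omitted for brevity.

```python
import random
import cv2
import numpy as np

def augment(image, depth):
    """Produce one 'enhanced' sample from an annotated sample: perturb the
    image part and re-fuse it with the unchanged depth part."""
    out = image.copy()
    if random.random() < 0.5:
        out = cv2.flip(out, 1)                                      # horizontal mirror
    angle = random.uniform(-10, 10)
    h, w = out.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    out = cv2.warpAffine(out, m, (w, h))                            # small random rotation
    out = cv2.convertScaleAbs(out, alpha=random.uniform(0.8, 1.2),  # random contrast
                              beta=random.uniform(-20, 20))         # random brightness
    noise = np.random.normal(0, 5, out.shape)
    out = np.clip(out.astype(np.float32) + noise, 0, 255).astype(np.uint8)  # added noise
    return out, depth   # re-fusing with the depth data yields a new sample
```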
And storing the obtained sample image depth fusion data into a preset sample data set.
Before training the target detection model, its training configuration is set. In this embodiment, this configuration includes, but is not limited to: the number of iterations, the initial weights and network structure of the target detection model, the learning rate, and the convolution kernels.
The target detection model is trained with a gradient descent method. While training with the training data set, the parameters and weights of the target detection model are adjusted continuously over the preset number of iterations. If the target detection model has not converged when the iterations finish, its network structure is adjusted and the model is trained again, until it converges.
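A schematic gradient-descent training loop is sketched below for illustration only; the optimizer, learning rate, and epoch count are assumptions rather than the patent's settings.

```python
import torch

def train(model, loader, loss_fn, epochs=50, lr=1e-3):
    """Plain gradient-descent training loop for the target detection model;
    hyper-parameters are illustrative, not the patent's values."""
    optimiser = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):                       # preset number of iterations
        for fused_sample, labels in loader:       # annotated sample image depth fusion data
            optimiser.zero_grad()
            loss = loss_fn(model(fused_sample), labels)
            loss.backward()                       # gradient descent step
            optimiser.step()
```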
Further, in order to increase the accuracy and robustness of the YOLO model, multi-scale training is added to the YOLO model, image data in the sample image depth fusion data are converted into multiple scales, and the YOLO model is trained by using the image data of the multiple scales. For example: the image width and height in the configuration file of the YOLO model are set to 640 x 640, so that the detection accuracy of the YOLO model on the small target is improved.
In this embodiment, the accuracy and stability of the target detection model are evaluated with a loss function. When the loss value of the target detection model is smaller than a preset loss threshold, the target detection model has converged. Further, the loss function may be a confidence loss function, a classification loss function, or a loss function based on the target bounding box. For example, a Distance Intersection over Union (DIoU) loss function or a Complete Intersection over Union (CIoU) loss function may be used to calculate the loss value of the target detection model, and when the loss value is greater than the preset loss threshold, the parameters of the target detection model are adjusted.
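As an illustrative reference implementation of the DIoU loss mentioned above (boxes given as (x1, y1, x2, y2) corner coordinates); this is a standard formulation, not code from the patent.

```python
import torch

def diou_loss(pred, target, eps=1e-7):
    """Distance-IoU loss for axis-aligned boxes:
    1 - IoU + (centre distance)^2 / (enclosing-box diagonal)^2."""
    x1 = torch.max(pred[..., 0], target[..., 0])
    y1 = torch.max(pred[..., 1], target[..., 1])
    x2 = torch.min(pred[..., 2], target[..., 2])
    y2 = torch.min(pred[..., 3], target[..., 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    iou = inter / (area_p + area_t - inter + eps)

    # squared distance between box centres
    cdx = (pred[..., 0] + pred[..., 2]) / 2 - (target[..., 0] + target[..., 2]) / 2
    cdy = (pred[..., 1] + pred[..., 3]) / 2 - (target[..., 1] + target[..., 3]) / 2
    centre_dist2 = cdx ** 2 + cdy ** 2

    # squared diagonal of the smallest enclosing box
    ex1 = torch.min(pred[..., 0], target[..., 0])
    ey1 = torch.min(pred[..., 1], target[..., 1])
    ex2 = torch.max(pred[..., 2], target[..., 2])
    ey2 = torch.max(pred[..., 3], target[..., 3])
    diag2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + eps

    return 1 - iou + centre_dist2 / diag2
```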
The embodiment of the invention also provides a target detection apparatus. Fig. 4 is a block diagram of a target detection apparatus according to an embodiment of the present invention.
The target detection apparatus includes: an acquisition module 410, a fusion module 420, and a detection module 430.
The acquisition module 410 is configured to acquire image data of a detection environment and depth data corresponding to a target object at intervals of a preset time period.
And the fusion module 420 is configured to sequentially fuse the image data and the depth data acquired at the same time according to the sequence of the acquisition time from first to last to obtain image depth fusion data corresponding to the same time.
The detection module 430 is configured to sequentially input the image depth fusion data obtained by fusion into a pre-trained target detection model according to a sequence from first to last at a fusion time, extract attribute data of the target object from the sequentially input image depth fusion data through the target detection model, and generate three-dimensional motion trajectory data corresponding to the target object according to the sequentially input image depth fusion data.
The functions of the apparatus according to the embodiments of the present invention have been described in the above method embodiments, so that reference may be made to the related descriptions in the foregoing embodiments for details which are not described in the present embodiment, and further details are not described herein.
The present embodiment also provides a target detection device. Fig. 5 is a block diagram of a target detection device according to an embodiment of the present invention.
In this embodiment, the target detection device includes, but is not limited to: processor 510, memory 520.
The processor 510 is configured to execute an object detection program stored in the memory 520 to implement the object detection method described above.
Specifically, the processor 510 is configured to execute the object detection program stored in the memory 520 to implement the following steps: acquiring image data of a detection environment and depth data corresponding to a target object at intervals of a preset time period; sequentially fusing the image data and the depth data acquired at the same time according to the sequence of the acquisition time from first to last to obtain image depth fusion data corresponding to the same time; sequentially inputting the image depth fusion data obtained by fusion into a pre-trained target detection model according to the sequence of fusion time from first to last, extracting attribute data of the target object from the sequentially input image depth fusion data through the target detection model, and generating three-dimensional motion trajectory data corresponding to the target object according to the sequentially input image depth fusion data.
The sequentially fusing the image data and the depth data acquired at the same time to obtain image depth fusion data corresponding to the same time comprises the following steps: and combining the image data and the depth data acquired at the same time to form an image depth one-dimensional array corresponding to the same time, and taking the image depth one-dimensional array as image depth fusion data corresponding to the same time.
Wherein the target detection model comprises: a connected YOLO model and a long-short term memory (LSTM) model; the extracting, by the target detection model, attribute data of the target object from the sequentially input image depth fusion data, and generating three-dimensional motion trajectory data corresponding to the target object according to the sequentially input image depth fusion data, includes: detecting the target object in the image depth fusion data sequentially input through the YOLO model, and extracting three-dimensional coordinate data of key points of the target object and attribute data of the target object; and tracking the motion trail of the target object detected by the YOLO model in the sequentially input image depth fusion data through the LSTM model to obtain the motion trail data of the target object, and generating the three-dimensional motion trail data of the target object according to the motion trail data of the target object and the three-dimensional coordinate data of the key point of the target object extracted by the YOLO model in the image depth fusion data.
Wherein the YOLO model comprises: a three-dimensional convolutional layer; the extracting three-dimensional coordinate data of the key points of the target object comprises: extracting two-dimensional feature data of key points of the target object in an image data part of the image depth fusion data through the three-dimensional convolution layer, and extracting one-dimensional feature data of the key points of the target object in a depth data part of the image depth fusion data; generating three-dimensional coordinate data of key points of the target object according to the two-dimensional characteristic data and the one-dimensional characteristic data; wherein the spatial dimension of the one-dimensional feature data is different from the spatial dimension of the two-dimensional feature data.
The motion trail data is region position data of the target object in the image depth fusion data; the three-dimensional motion trajectory data includes: multi-frame three-dimensional motion trail images; the generating three-dimensional motion trajectory data of the target object according to the motion trajectory data of the target object and the three-dimensional coordinate data of the key point of the target object extracted by the YOLO model in the image depth fusion data includes: constructing a three-dimensional coordinate space; sequentially acquiring the region position data of the target object and the three-dimensional coordinate data of the key points in each image depth fusion data according to the sequence of fusion time from first to last to obtain a plurality of groups of region position data and three-dimensional coordinate data of the key points; and setting a three-dimensional model corresponding to the target object in the three-dimensional coordinate space according to the region position data and the three-dimensional coordinate data of the key points aiming at each group of the region position data and the three-dimensional coordinate data of the key points, and generating a frame of three-dimensional motion trail image.
Wherein after the obtaining of the motion trajectory data of the target object, the method further comprises: comparing the motion trail data of the target object with preset abnormal state data; and when the similarity between the motion trail data of the target object and the abnormal state data is greater than a preset similarity threshold, executing abnormal alarm operation corresponding to the abnormal state data.
Before the sequentially inputting the image depth fusion data obtained by fusion into a pre-trained target detection model, the method further comprises: simultaneously acquiring image data of a detection environment and depth data corresponding to a target object; fusing the image data and the depth data which are acquired simultaneously to obtain sample image depth fusion data and marking attribute data and region position data for the sample image depth fusion data; and performing data enhancement processing based on the sample image depth fusion to obtain a plurality of enhanced image depth fusion data corresponding to the sample image depth fusion data, and using each enhanced image depth fusion data as one sample image depth fusion data so as to train the target detection model by using all the obtained sample image depth fusion data.
The embodiment of the invention also provides a computer readable storage medium. The computer-readable storage medium herein stores one or more programs. Among other things, computer-readable storage media may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as read-only memory, flash memory, a hard disk, or a solid state disk; the memory may also comprise a combination of memories of the kind described above.
The one or more programs in the computer-readable storage medium can be executed by the one or more processors to implement the object detection method described above. Since the target detection method has been described in detail above, it is not described herein again.
The above description is only an example of the present invention, and is not intended to limit the present invention, and it is obvious to those skilled in the art that various modifications and variations can be made in the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims (10)

1. A method of object detection, comprising:
acquiring image data of a detection environment and depth data corresponding to a target object at intervals of a preset time period;
sequentially fusing the image data and the depth data acquired at the same time according to the sequence of the acquisition time from first to last to obtain image depth fusion data corresponding to the same time;
sequentially inputting the image depth fusion data obtained by fusion into a pre-trained target detection model according to the sequence of fusion time from first to last, extracting attribute data of the target object from the sequentially input image depth fusion data through the target detection model, and generating three-dimensional motion trajectory data corresponding to the target object according to the sequentially input image depth fusion data.
2. The method according to claim 1, wherein the sequentially fusing the image data and the depth data acquired at the same time to obtain image depth fusion data corresponding to the same time comprises:
and combining the image data and the depth data acquired at the same time to form an image depth one-dimensional array corresponding to the same time, and taking the image depth one-dimensional array as image depth fusion data corresponding to the same time.
3. The method of claim 1,
the target detection model includes: a connected YOLO model and a long-short term memory (LSTM) model;
the extracting, by the target detection model, attribute data of the target object from the sequentially input image depth fusion data, and generating three-dimensional motion trajectory data corresponding to the target object according to the sequentially input image depth fusion data, includes:
detecting the target object in the image depth fusion data sequentially input through the YOLO model, and extracting three-dimensional coordinate data of key points of the target object and attribute data of the target object;
and tracking the motion trail of the target object detected by the YOLO model in the sequentially input image depth fusion data through the LSTM model to obtain the motion trail data of the target object, and generating the three-dimensional motion trail data of the target object according to the motion trail data of the target object and the three-dimensional coordinate data of the key point of the target object extracted by the YOLO model in the image depth fusion data.
4. The method of claim 3,
the YOLO model includes: a three-dimensional convolutional layer;
the extracting three-dimensional coordinate data of the key points of the target object comprises:
extracting two-dimensional feature data of key points of the target object in an image data part of the image depth fusion data through the three-dimensional convolution layer, and extracting one-dimensional feature data of the key points of the target object in a depth data part of the image depth fusion data;
generating three-dimensional coordinate data of key points of the target object according to the two-dimensional characteristic data and the one-dimensional characteristic data; wherein the spatial dimension of the one-dimensional feature data is different from the spatial dimension of the two-dimensional feature data.
5. The method of claim 3,
the motion trail data is regional position data of the target object in the image depth fusion data;
the three-dimensional motion trajectory data includes: multi-frame three-dimensional motion trail images;
the generating three-dimensional motion trajectory data of the target object according to the motion trajectory data of the target object and the three-dimensional coordinate data of the key point of the target object extracted by the YOLO model in the image depth fusion data includes:
constructing a three-dimensional coordinate space;
sequentially acquiring the region position data of the target object and the three-dimensional coordinate data of the key points in each image depth fusion data according to the sequence of fusion time from first to last to obtain a plurality of groups of region position data and three-dimensional coordinate data of the key points;
and setting a three-dimensional model corresponding to the target object in the three-dimensional coordinate space according to the region position data and the three-dimensional coordinate data of the key points aiming at each group of the region position data and the three-dimensional coordinate data of the key points, and generating a frame of three-dimensional motion trail image.
6. The method of claim 3, wherein after said obtaining motion trajectory data of said target object, said method further comprises:
comparing the motion trail data of the target object with preset abnormal state data;
and when the similarity between the motion trail data of the target object and the abnormal state data is greater than a preset similarity threshold, executing abnormal alarm operation corresponding to the abnormal state data.
7. The method according to any one of claims 1-6, wherein before the sequentially inputting the image depth fusion data obtained by fusion into a pre-trained target detection model, the method further comprises:
simultaneously acquiring image data of a detection environment and depth data corresponding to a target object;
fusing the image data and the depth data which are acquired simultaneously to obtain sample image depth fusion data and marking attribute data and region position data for the sample image depth fusion data;
and performing data enhancement processing based on the sample image depth fusion to obtain a plurality of enhanced image depth fusion data corresponding to the sample image depth fusion data, and using each enhanced image depth fusion data as one sample image depth fusion data so as to train the target detection model by using all the obtained sample image depth fusion data.
8. A target detection device, comprising:
an acquisition module configured to acquire, at intervals of a preset time period, image data of a detection environment and depth data corresponding to a target object;
a fusion module configured to sequentially fuse, in order of acquisition time from earliest to latest, the image data and the depth data acquired at the same time, to obtain image depth fusion data corresponding to that time; and
a detection module configured to sequentially input, in order of fusion time from earliest to latest, the image depth fusion data obtained by fusion into a pre-trained target detection model, extract attribute data of the target object from the sequentially input image depth fusion data through the target detection model, and generate three-dimensional motion trajectory data corresponding to the target object according to the sequentially input image depth fusion data.
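A skeleton mirroring the module decomposition of claim 8 is sketched below; the sensor and model interfaces (read_rgb, read_depth, the detector callable) are hypothetical placeholders, and the period corresponds to the preset acquisition interval.

```python
# Sketch: acquisition, fusion and detection modules of the claimed apparatus.
import time

class TargetDetector:
    def __init__(self, camera, depth_sensor, detector, period_s=0.5):
        self.camera, self.depth_sensor = camera, depth_sensor
        self.detector, self.period_s = detector, period_s

    def acquire(self):                            # acquisition module
        return self.camera.read_rgb(), self.depth_sensor.read_depth()

    def fuse(self, rgb, depth):                   # fusion module
        return {"rgb": rgb, "depth": depth, "fused_at": time.time()}

    def detect(self):                             # detection module
        rgb, depth = self.acquire()
        return self.detector(self.fuse(rgb, depth))   # attributes + 3-D trajectory data

    def run(self, n_frames):
        results = []
        for _ in range(n_frames):                 # acquire every preset time period
            results.append(self.detect())
            time.sleep(self.period_s)
        return results
```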
9. A target detection device, characterized in that the target detection device comprises a processor and a memory, the processor being configured to execute a target detection program stored in the memory to implement the target detection method of any one of claims 1 to 7.
10. A computer-readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the target detection method of any one of claims 1 to 7.
CN202010819962.5A 2020-08-14 2020-08-14 Target detection method, device, equipment and computer readable storage medium Pending CN112036267A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010819962.5A CN112036267A (en) 2020-08-14 2020-08-14 Target detection method, device, equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010819962.5A CN112036267A (en) 2020-08-14 2020-08-14 Target detection method, device, equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN112036267A true CN112036267A (en) 2020-12-04

Family

ID=73578617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010819962.5A Pending CN112036267A (en) 2020-08-14 2020-08-14 Target detection method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112036267A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014154839A1 (en) * 2013-03-27 2014-10-02 Mindmaze S.A. High-definition 3d camera device
CN105912999A (en) * 2016-04-05 2016-08-31 中国民航大学 Human behavior identification method based on depth information
CN107330410A (en) * 2017-07-03 2017-11-07 南京工程学院 Method for detecting abnormality based on deep learning under complex environment
WO2019037498A1 (en) * 2017-08-25 2019-02-28 腾讯科技(深圳)有限公司 Active tracking method, device and system
CN108229531A (en) * 2017-09-29 2018-06-29 北京市商汤科技开发有限公司 Characteristics of objects processing method, device, storage medium and electronic equipment
US20190206066A1 (en) * 2017-12-29 2019-07-04 RetailNext, Inc. Human Analytics Using Fusion Of Image & Depth Modalities
CN108171212A (en) * 2018-01-19 2018-06-15 百度在线网络技术(北京)有限公司 For detecting the method and apparatus of target
CN109376667A (en) * 2018-10-29 2019-02-22 北京旷视科技有限公司 Object detection method, device and electronic equipment
CN111460978A (en) * 2020-03-30 2020-07-28 中国科学院自动化研究所南京人工智能芯片创新研究院 Infant behavior monitoring system based on motion judgment sensor and deep learning technology and judgment method thereof

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112859907A (en) * 2020-12-25 2021-05-28 湖北航天飞行器研究所 Rocket debris high-altitude detection method based on three-dimensional special effect simulation under condition of few samples
CN112819804A (en) * 2021-02-23 2021-05-18 西北工业大学 Insulator defect detection method based on improved YOLOv5 convolutional neural network
WO2023119968A1 (en) * 2021-12-20 2023-06-29 コニカミノルタ株式会社 Method for calculating three-dimensional coordinates and device for calculating three-dimensional coordinates
CN114863201A (en) * 2022-03-24 2022-08-05 深圳元戎启行科技有限公司 Training method and device of three-dimensional detection model, computer equipment and storage medium
CN116524135A (en) * 2023-07-05 2023-08-01 方心科技股份有限公司 Three-dimensional model generation method and system based on image
CN116524135B (en) * 2023-07-05 2023-09-15 方心科技股份有限公司 Three-dimensional model generation method and system based on image

Similar Documents

Publication Publication Date Title
CN109477951B (en) System and method for identifying persons and/or identifying and quantifying pain, fatigue, mood and intent while preserving privacy
CN112036267A (en) Target detection method, device, equipment and computer readable storage medium
US11948401B2 (en) AI-based physical function assessment system
Lu et al. Deep learning for fall detection: Three-dimensional CNN combined with LSTM on video kinematic data
Stone et al. Fall detection in homes of older adults using the Microsoft Kinect
CN112784662A (en) Video-based fall risk evaluation system
CN109726672B (en) Tumbling detection method based on human body skeleton sequence and convolutional neural network
Luštrek et al. Fall detection and activity recognition with machine learning
CN111753747B (en) Violent motion detection method based on monocular camera and three-dimensional attitude estimation
Kumar et al. Human activity recognition (har) using deep learning: Review, methodologies, progress and future research directions
CN107411753A (en) A kind of wearable device for identifying gait
Xu et al. Elders’ fall detection based on biomechanical features using depth camera
Nagalakshmi Vallabhaneni The analysis of the impact of yoga on healthcare and conventional strategies for human pose recognition
Alazrai et al. Fall detection for elderly using anatomical-plane-based representation
Mansoor et al. A machine learning approach for non-invasive fall detection using Kinect
Pogorelc et al. Detecting gait-related health problems of the elderly using multidimensional dynamic time warping approach with semantic attributes
Liu et al. A review of wearable sensors based fall-related recognition systems
CN115695734A (en) Infrared thermal imaging protection monitoring method, device, equipment, system and medium
CN113688740B (en) Indoor gesture detection method based on multi-sensor fusion vision
Seredin et al. The study of skeleton description reduction in the human fall-detection task
Dai Vision-based 3d human motion analysis for fall detection and bed-exiting
Pogorelc et al. Home-based health monitoring of the elderly through gait recognition
Nouisser et al. Deep learning and kinect skeleton-based approach for fall prediction of elderly physically disabled
Mastorakis Human fall detection methodologies: from machine learning using acted data to fall modelling using myoskeletal simulation
Li et al. Non-Invasive Screen Exposure Time Assessment Using Wearable Sensor and Object Detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination