WO2020233427A1 - Method and device for determining features of a target - Google Patents

Method and device for determining features of a target

Info

Publication number
WO2020233427A1
WO2020233427A1 PCT/CN2020/089410 CN2020089410W WO2020233427A1 WO 2020233427 A1 WO2020233427 A1 WO 2020233427A1 CN 2020089410 W CN2020089410 W CN 2020089410W WO 2020233427 A1 WO2020233427 A1 WO 2020233427A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
feature
frame image
frame
image
Prior art date
Application number
PCT/CN2020/089410
Other languages
English (en)
French (fr)
Inventor
刘武
叶韵
梅涛
孙宇
Original Assignee
北京京东尚科信息技术有限公司
北京京东世纪贸易有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京京东尚科信息技术有限公司, 北京京东世纪贸易有限公司 filed Critical 北京京东尚科信息技术有限公司
Publication of WO2020233427A1 publication Critical patent/WO2020233427A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/64Three-dimensional objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/29Graphical models, e.g. Bayesian networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition

Definitions

  • the present disclosure relates to the field of artificial intelligence technology, and in particular to a method for determining characteristics of a target, a device for determining characteristics of a target, and a non-volatile computer-readable storage medium.
  • the three-dimensional shape information and posture information of the person in each frame image can be obtained, so as to achieve the establishment of a three-dimensional human body model.
  • a three-dimensional human body model can be used to implement applications such as smart fitting and identity authentication.
  • the key points of the human body in the frame image are extracted, and the three-dimensional shape information and posture information of the human body are estimated based on the image segmentation result.
  • a method for determining the characteristics of a target, including: extracting target features of the target in each frame image, the frame images including the frame image to be processed and the adjacent frame images of the frame image to be processed; using the attention mechanism model to extract the correlation between the target features of each frame image to determine the associated feature of each frame image; and optimizing the target features of the frame image to be processed according to the associated feature of each frame image to determine the comprehensive feature of the target in the frame image to be processed.
  • the extracting the target feature of the target in each frame image includes: according to the feature vector of each frame image, using the first feature extraction module of the first machine learning model to extract the overall feature information of the target in each frame image; according to the feature vector of each frame image, using the second feature extraction module of the first machine learning model to extract the local feature information of the target in each frame image; and fusing the overall feature information and the local feature information to determine the target feature.
  • the first feature extraction module is a deconvolution layer
  • the overall feature information is skeleton model information of the target.
  • the second feature extraction module is a fully connected layer
  • the local feature information includes local shape features and local posture features.
  • the local feature information includes position information of the target in the image, zoom information relative to the camera, rotation information and translation information.
  • the fusing the overall feature information and the local feature information to determine the target feature includes: performing a bilinear transformation on the overall feature information and the local feature information to determine the shape feature and posture feature of the target as the target feature.
  • the attention mechanism model includes a plurality of Transformer modules, and the plurality of Transformer modules are connected in series.
  • a convolutional neural network model is used to determine the comprehensive feature of the target in the frame image to be processed according to the associated features of each frame image.
  • the determining the comprehensive feature of the target in the frame image to be processed includes: sorting the associated features according to the inherent sequence of each frame image in the video; and, according to the sorted associated features of each frame image, using a TCN (Temporal Convolutional Network) model to determine the comprehensive feature of the target in the frame image to be processed, the comprehensive feature including the shape feature and posture feature of the target.
  • the attention mechanism model is trained through the following steps:
  • an associated feature queue is generated, and the sequence of the associated features in the associated feature queue is different from the inherent sequence of the corresponding frame images in the video; a second machine learning model is used to sort the associated features in the associated feature queue; and the attention mechanism model is trained according to the sorting result and the inherent order.
  • an apparatus for determining features of a target including: a target feature extraction unit for extracting target features of the target in each frame image, each frame image including the frame image to be processed and the adjacent frame images of the frame image to be processed; an associated feature determining unit configured to use the attention mechanism model to extract the association relationship between the target features of the frame images to determine the associated feature of each frame image; and an integrated feature determining unit configured to optimize the target feature of the frame image to be processed according to the associated feature of each frame image to determine the integrated feature of the target in the frame image to be processed.
  • a device for determining the characteristics of a target, including: a memory; and a processor coupled to the memory, the processor being configured to execute the method for determining the characteristics of the target in any of the foregoing embodiments based on instructions stored in the memory.
  • a non-volatile computer-readable storage medium on which a computer program is stored.
  • when the program is executed by a processor, the method for determining the characteristics of the target in any of the above embodiments is implemented.
  • FIG. 1 shows a flowchart of some embodiments of a method for determining characteristics of a target of the present disclosure
  • FIG. 2 shows a flowchart of some embodiments of step 110 in FIG. 1;
  • FIG. 3 shows a schematic diagram of some embodiments of step 110 in FIG. 1;
  • FIG. 4 shows a schematic diagram of some embodiments of step 120 and step 130 in FIG. 1;
  • FIG. 5 shows a block diagram of some embodiments of the device for determining the characteristics of the target of the present disclosure
  • FIG. 6 shows a block diagram of other embodiments of the device for determining the characteristics of the target of the present disclosure
  • FIG. 7 shows a block diagram of further embodiments of the device for determining the characteristics of the target of the present disclosure.
  • the inventors of the present disclosure have discovered the following problems in the above-mentioned related technologies: relying on the key point extraction accuracy and image segmentation accuracy of a single frame of image, and not using the relationship between multiple frames of images, resulting in low accuracy of feature determination.
  • the present disclosure proposes a technical solution for determining the feature of the target, which can improve the accuracy of feature determination.
  • FIG. 1 shows a flowchart of some embodiments of a method for determining characteristics of a target of the present disclosure.
  • the method includes: step 110, extracting the target features of each frame image; step 120, determining the associated features of each frame image; and step 130, determining the comprehensive feature of the frame image to be processed.
  • each frame image includes a frame image to be processed and adjacent frame images of the frame image to be processed.
  • the k-th frame image of the video may be used as the frame image to be processed, and the N frames before and the N frames after the k-th frame image may be regarded as adjacent frame images, where both k and N are integers greater than 0.
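  • As an illustrative aside (not part of the original disclosure), the following minimal Python sketch shows the sliding-window selection described above; the helper name frame_window and the default window size are assumptions.

```python
# Minimal sketch of the frame window: for the frame to be processed (index k),
# take the N frames before it and the N frames after it (2N+1 frames in total).
def frame_window(frames, k, n=4):
    """Return the 2n+1 consecutive frames centred on frame k.

    Assumes 0 <= k - n and k + n < len(frames); boundary handling is omitted.
    """
    return frames[k - n: k + n + 1]
```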
  • the target may be a human body contained in each frame of image, and the target feature may be shape information and posture information of the human body.
  • the shape information can be the shape parameters of the SMPL (Skinned Multi-Person Linear) human body shape model (such as a vector of length 10)
  • the posture information can be the pose parameters of the SMPL human body shape model (such as a vector of length 72).
  • the human body detection can be performed on the frame image to be processed (for example, using the AlphaPose algorithm) to obtain the rectangular region of the human body in the frame image to be processed (which can be referred to as the target area); a machine learning method (such as a Resnet-50 neural network model) is then used to extract the feature vector of the frame image to be processed from the target area.
  • the same method can be used to extract feature vectors of adjacent frame images.
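  • The per-frame feature extraction could be sketched as follows; this is an illustrative assumption (not the patent's reference code), using a torchvision ResNet-50 backbone, a 224×224 person crop, and a 1×1 convolution to reduce the 2048-channel output to the 512×7×7 shape mentioned in the text. The class and function names are hypothetical.

```python
import torch
import torch.nn as nn
import torchvision.models as models
import torchvision.transforms.functional as TF

class FrameFeatureExtractor(nn.Module):
    """Extract a 512x7x7 feature map from a cropped person region."""
    def __init__(self):
        super().__init__()
        backbone = models.resnet50(weights=None)
        # keep everything up to (but not including) global pooling -> 2048x7x7
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])
        self.reduce = nn.Conv2d(2048, 512, kernel_size=1)   # 2048 -> 512 channels

    def forward(self, crops):                 # crops: (B, 3, 224, 224)
        feats = self.backbone(crops)          # (B, 2048, 7, 7)
        return self.reduce(feats)             # (B, 512, 7, 7)

def crop_person(frame, box):
    """Crop the detected person box (x1, y1, x2, y2 in pixels) and resize to 224x224."""
    x1, y1, x2, y2 = box
    crop = frame[:, y1:y2, x1:x2]             # frame: (3, H, W) tensor
    return TF.resize(crop, [224, 224])
```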
  • step 110 may be implemented through the steps in FIG. 2.
  • FIG. 2 shows a flowchart of some embodiments of step 110 in FIG. 1.
  • step 110 includes: step 1110, extracting the overall feature information of the target in each frame image; step 1120, extracting the local feature information of the target in each frame image; and step 1130, fusing the two kinds of information to determine the target feature.
  • step 1110 according to the feature vector of each frame image, the first feature extraction module of the first machine learning model is used to extract the overall feature information of the target in each frame image.
  • the first feature extraction module is a deconvolution layer (for example, transposed convolution processing), and the overall feature information is target skeleton model information.
  • the skeleton model information may be the position coordinates of the joint points of the human body model.
  • step 1120 according to the feature vector of each frame image, the second feature extraction module of the first machine learning model is used to extract the local feature information of the target in each frame image.
  • the second feature extraction module is a fully connected layer
  • the local feature information includes: local shape features and local posture features (such as local feature information of the hands, head and feet that cannot be reflected by the skeleton model of the human body), the position information of the target in the image, as well as the zoom information, rotation information and translation information of the target relative to the camera.
  • the local feature information may also include the shape information of the human body.
  • the target feature can be extracted through the embodiment in FIG. 3.
  • FIG. 3 shows a schematic diagram of some embodiments of step 110 in FIG. 1.
  • the image feature extraction module 31 (such as the Resnet-50 neural network model) is used to extract the feature vector of the k-th frame image.
  • the feature vector is a 16×512×7×7 vector, which is input to the first machine learning model 32.
  • the first feature extraction module 321 is used to extract the overall feature information of the target.
  • the second feature extraction module 322 is used to extract the local feature information of the target.
  • the first feature extraction module 321 may be a deconvolution layer.
  • the first feature extraction module 321 may include 3 transposed convolutional layers to expand the 16×512×7×7 vector into a 16×512×56×56 feature map (for example, the feature map may be a heatmap describing the positions of key points of the human body), which is used as the overall feature information.
  • the second feature extraction module 322 may be a fully connected layer.
  • a global mean pooling method can be used to convert the 16×512×7×7 vector into a 16×512 vector, and then a fully connected layer is used to extract a vector of the same size from the 16×512 vector to describe the local feature information (detail information of the human body).
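  • The two decoupled branches could be sketched as below; this is a hedged illustration under the shapes given in the text (input 16×512×7×7, output heatmap 16×512×56×56), and the exact kernel sizes, normalization layers and class names are assumptions rather than the patent's configuration.

```python
import torch
import torch.nn as nn

class FirstFeatureExtractor(nn.Module):
    """Deconvolution branch: (B, 512, 7, 7) -> (B, 512, 56, 56) key-point heatmaps."""
    def __init__(self, channels=512):
        super().__init__()
        def up(c):  # one transposed-conv stage that doubles the spatial size
            return nn.Sequential(
                nn.ConvTranspose2d(c, c, kernel_size=4, stride=2, padding=1),
                nn.BatchNorm2d(c), nn.ReLU(inplace=True))
        self.deconv = nn.Sequential(up(channels), up(channels), up(channels))  # 7->14->28->56

    def forward(self, x):                     # x: (B, 512, 7, 7)
        return self.deconv(x)                 # (B, 512, 56, 56) overall feature information

class SecondFeatureExtractor(nn.Module):
    """Fully connected branch: global mean pooling, then a 512-d local feature."""
    def __init__(self, channels=512):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(channels, channels)

    def forward(self, x):                     # x: (B, 512, 7, 7)
        v = self.pool(x).flatten(1)           # (B, 512)
        return self.fc(v)                     # (B, 512) local feature information
```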
  • the algorithm for building a three-dimensional human body model can be decomposed into two relatively simple sub-tasks: overall feature information extraction and local feature information extraction, thereby reducing the complexity of the algorithm through decoupling.
  • the overall feature information and the local feature information are input to the bilinear transformation layer 323 to obtain the target feature k in the k-th frame of image.
  • for example, the overall feature information is a vector X1, the local feature information is a vector X2, and the weight parameter W of the bilinear transformation layer 323 can be obtained through training; the output of the bilinear transformation layer 323 is then X1ᵀWX2, where T denotes the transpose operation.
  • the bilinear transformation layer 323 fuses the overall feature information and the local feature information, which can ensure that the two types of information do not affect each other and are independent of each other, and maintain the decoupling state of the two while fusing the information, thereby improving the accuracy of feature extraction .
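  • A minimal sketch of such a bilinear fusion layer is given below, expressed with torch.nn.Bilinear (which computes exactly the X1ᵀWX2 form per output feature). The 512-d inputs and the 85-d output (SMPL pose 72 + shape 10 + camera 3) are illustrative assumptions about how the shape and posture feature could be laid out, not figures from the disclosure.

```python
import torch
import torch.nn as nn

# overall feature X1 and local feature X2 are both assumed to be 512-d here;
# the fused target feature is assumed to be 85-d (pose + shape + camera).
bilinear = nn.Bilinear(in1_features=512, in2_features=512, out_features=85)

x1 = torch.randn(16, 512)            # overall feature information
x2 = torch.randn(16, 512)            # local feature information
target_feature = bilinear(x1, x2)    # (16, 85): shape feature + posture feature
```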
  • the same method as the above-mentioned embodiment may be used to extract target features in adjacent frame images of the k-th frame image to be processed.
  • the target feature k-1 in the k-1 frame image, the target feature k+1 in the k+1 frame image, and so on can be extracted.
  • after the target features in each frame image have been extracted, the comprehensive features of the target can be determined using the remaining steps in FIG. 1.
  • the attention mechanism model is used to extract the correlation between the target features of each frame of image to determine the correlation feature of each frame of image. For example, the target features in the first 4 frames and the last 4 frames of the frame image to be processed (the target features in 9 consecutive frame images in total) can be extracted, and the attention mechanism model can be input for processing.
  • the attention mechanism model includes a plurality of Transformer modules connected in series. In this way, based on the consistency of the target shape and the continuity of the target posture in consecutive frame images, the association information between the target features is mined repeatedly and the feature expression learned from the data is optimized, thereby improving the accuracy of feature determination.
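  • One way such an attention mechanism model could be sketched is with stacked Transformer encoder layers applied to the per-frame target features, as below; the number of modules, heads and feature dimension (85, matching the earlier assumption) are illustrative choices, not the patent's configuration.

```python
import torch
import torch.nn as nn

class AttentionModel(nn.Module):
    """Several Transformer modules in series over the target features of 2N+1 frames."""
    def __init__(self, feat_dim=85, n_modules=2, n_heads=5):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=n_heads, dim_feedforward=256, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_modules)

    def forward(self, target_feats):          # (B, 2N+1, feat_dim)
        # each output position is an "associated feature": the same frame's target
        # feature enriched with information from the neighbouring frames
        return self.encoder(target_feats)     # (B, 2N+1, feat_dim)

# usage: 9 consecutive frames (N = 4) around the frame to be processed
assoc = AttentionModel()(torch.randn(16, 9, 85))
```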
  • the target characteristics of the frame image to be processed are optimized according to the associated characteristics of each frame image to determine the comprehensive characteristics of the target in the frame image to be processed.
  • a convolutional neural network can be used to process the associated features acquired based on the target feature, so as to optimize the target feature.
  • the associated features are sorted according to the inherent order of each frame of image in the video. According to the correlation characteristics of each frame image after sorting, the TCN model is used to determine the comprehensive characteristics of the target in the frame image to be processed. Comprehensive features include the shape feature and posture feature of the target.
  • the first extracted target feature (i.e. the feature to be processed) does not contain the correlation information between each frame image, so the target feature is not accurate enough;
  • the associated feature determined by the attention mechanism from the target features is a feature of each frame image that contains the association relationship information;
  • the comprehensive feature is the feature of the target determined by the association relationship information in the association feature. In this way, compared with target features, comprehensive features can describe the target more accurately.
  • steps 120 and 130 may be implemented through the embodiment in FIG. 4.
  • FIG. 4 shows a schematic diagram of some embodiments of step 120 and step 130 in FIG. 1.
  • the target feature k-1, target feature k, and target feature k+1 in the extracted continuous frame images can be input into the attention mechanism model 41 in the order of each frame image in the video to obtain the corresponding association Feature k-1, associated feature k, and associated feature k+1.
  • the attention mechanism model 41 includes a Transformer module 411 and a Transformer module 412 connected in series.
  • the output associated features include the associated information between the target features, and the comprehensive features in the frame image to be processed are determined according to the associated features, which can improve accuracy.
  • the correlation feature k-1, the correlation feature k, and the correlation feature k+1 are input to the TCN model 42, and the target feature k is optimized to obtain the comprehensive feature k of the k-th frame image.
  • the TCN model 42 may include two one-dimensional convolution layers and one one-dimensional convolution module.
  • the TCN model 42 can introduce information of each associated feature through the first convolution layer, then process it through the one-dimensional convolution module, and finally perform the result prediction output through the second convolution layer.
  • a one-dimensional convolution module may include a third convolution layer connected by residuals (for one-dimensional convolution processing), a BN (Batch Normalization) layer, and an activation layer.
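  • The TCN described above could be sketched as follows; this is an assumption-level illustration (hidden channel width, kernel sizes and the way the centre frame is read out are guesses), showing a first 1-D convolution, a residual 1-D convolution module (convolution + BN + activation), and a second 1-D convolution for the prediction.

```python
import torch
import torch.nn as nn

class ResidualConv1d(nn.Module):
    """Residual-connected 1-D convolution module: conv + BN + activation."""
    def __init__(self, channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm1d(channels), nn.ReLU(inplace=True))

    def forward(self, x):
        return x + self.block(x)              # residual connection

class TCN(nn.Module):
    def __init__(self, feat_dim=85, hidden=256):
        super().__init__()
        self.conv_in = nn.Conv1d(feat_dim, hidden, kernel_size=3, padding=1)   # introduce neighbours
        self.res = ResidualConv1d(hidden)                                       # 1-D conv module
        self.conv_out = nn.Conv1d(hidden, feat_dim, kernel_size=3, padding=1)   # prediction output

    def forward(self, assoc):                 # assoc: (B, 2N+1, feat_dim)
        x = assoc.transpose(1, 2)             # -> (B, feat_dim, 2N+1)
        x = self.conv_out(self.res(self.conv_in(x)))
        mid = x.shape[-1] // 2
        return x[:, :, mid]                   # comprehensive feature of the centre frame k

comprehensive_k = TCN()(torch.randn(16, 9, 85))   # (16, 85)
```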
  • an associated feature queue may be generated according to the associated features of each frame of image, and the sequence of the associated features in the associated feature queue is different from the inherent order of each frame of image in the video.
  • the second machine learning model 43 is used to sort the associated features in the associated feature queue. According to the sorting result and the inherent order, the attention mechanism model 41 is trained.
  • the second machine learning model 43 is a sorted network model including three convolutional layers and three fully connected layers.
  • the associated feature k-1, the associated feature k, and the associated feature k+1 may be shuffled and input into the second machine learning model 43 for sorting. That is to say, the inherent sequence of the frame images in the video can be used for supervised training so as to regress the correct sequence, and the sorting result is used to train the attention mechanism model 41.
  • Adopting this adversarial training method can enable the attention mechanism model 41 to deeply understand the sequence between the frame images, thereby obtaining more accurate feature determination results.
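  • A hedged sketch of this order-based training is given below: the associated features of consecutive frames are shuffled, a sorting network (here 1-D convolutions followed by fully connected layers) predicts the original position of every feature, and the resulting loss also updates the attention model. All layer sizes, the loss formulation and the names SortingNet and order_loss are assumptions for illustration.

```python
import torch
import torch.nn as nn

class SortingNet(nn.Module):
    """Sorting network: three conv layers and three fully connected layers."""
    def __init__(self, feat_dim=85, seq_len=9):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv1d(feat_dim, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv1d(128, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv1d(128, 128, 3, padding=1), nn.ReLU(inplace=True))
        self.fcs = nn.Sequential(
            nn.Linear(128 * seq_len, 256), nn.ReLU(inplace=True),
            nn.Linear(256, 256), nn.ReLU(inplace=True),
            nn.Linear(256, seq_len * seq_len))   # per-position logits over original indices
        self.seq_len = seq_len

    def forward(self, feats):                    # feats: (B, seq_len, feat_dim), shuffled
        x = self.convs(feats.transpose(1, 2)).flatten(1)
        return self.fcs(x).view(-1, self.seq_len, self.seq_len)

def order_loss(attention_model, sorting_net, target_feats):
    """Shuffle the associated features and train to recover the inherent order."""
    assoc = attention_model(target_feats)                    # (B, T, D) associated features
    perm = torch.randperm(assoc.shape[1])                    # shuffled order
    logits = sorting_net(assoc[:, perm, :])                  # predict original index per position
    labels = perm.unsqueeze(0).expand(assoc.shape[0], -1)    # ground-truth original indices
    return nn.functional.cross_entropy(
        logits.reshape(-1, assoc.shape[1]), labels.reshape(-1))
```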
  • the attention mechanism model is used to determine the associated features of the frame image to be processed and the adjacent frame image, and the target features in the frame image to be processed are optimized through each associated feature. In this way, the consistency of the target shape and the continuity of the target posture in each frame of image are used, and the accuracy of target feature determination is improved.
  • FIG. 5 shows a block diagram of some embodiments of an apparatus for determining characteristics of a target of the present disclosure.
  • the target feature determination device 5 includes a target feature extraction unit 51, an associated feature determination unit 52 and a comprehensive feature determination unit 53.
  • the target feature extraction unit 51 extracts target features of the target in each frame image, and each frame image includes the frame image to be processed and adjacent frame images of the frame image to be processed.
  • the target feature extraction unit 51 uses the first feature extraction module of the first machine learning model to extract the overall feature information of the target in each frame image according to the feature vector of each frame image.
  • the first feature extraction module is a deconvolution layer
  • the overall feature information is the target's skeleton model information.
  • the target feature extraction unit 51 uses the second feature extraction module of the first machine learning model to extract the local feature information of the target in each frame image according to the feature vector of each frame image.
  • the second feature extraction module is a fully connected layer
  • the local feature information includes local shape features and local posture features, position information of the target in the image, zoom information relative to the camera, rotation information and translation information.
  • the target feature extraction unit 51 fuses overall feature information and local feature information to determine the target feature.
  • the target feature extraction unit 51 performs bilinear transformation on the overall feature information and the local feature information, and determines the shape feature and posture feature of the target as the target feature.
  • the associated feature determining unit 52 uses the attention mechanism model to extract the associated relationship between the target features of each frame image to determine the associated feature of each frame image.
  • the attention mechanism model includes multiple Transformer modules, and multiple Transformer modules are connected in series.
  • the attention mechanism model is trained through the following steps: generate an associated feature queue according to the associated features of each frame of image, the arrangement order of each associated feature in the associated feature queue and the inherent order of each frame of image in the video Different; use the second machine learning model to sort the correlation features in the correlation feature queue; train the attention mechanism model according to the sorting result and inherent order.
  • the comprehensive feature determining unit 53 uses a convolutional neural network model to determine the comprehensive feature of the target in the frame image to be processed according to the associated features of each frame image.
  • the comprehensive feature determining unit 53 optimizes the target feature of the frame image to be processed according to the associated features of each frame image to determine the comprehensive feature of the target in the frame image to be processed. For example, the associated features are sorted according to the inherent sequence of each frame image in the video, and the TCN model is used to determine the comprehensive feature of the target in the frame image to be processed according to the sorted associated features; the comprehensive feature includes the shape feature and posture feature of the target.
  • the attention mechanism model is used to determine the associated features of the frame image to be processed and the adjacent frame image, and the target features in the frame image to be processed are optimized through each associated feature. In this way, the consistency of the target shape and the continuity of the target posture in each frame of image are used, and the accuracy of target feature determination is improved.
  • FIG. 6 shows a block diagram of other embodiments of the device for determining the characteristics of the target of the present disclosure.
  • the device 6 for determining the target feature of this embodiment includes: a memory 61 and a processor 62 coupled to the memory 61, and the processor 62 is configured to execute the present disclosure based on instructions stored in the memory 61 The method for determining the target feature in any one of the embodiments in.
  • the memory 61 may include, for example, a system memory, a fixed non-volatile storage medium, and the like.
  • the system memory stores, for example, an operating system, an application program, a boot loader (Boot Loader), a database, and other programs.
  • FIG. 7 shows a block diagram of further embodiments of the device for determining the characteristics of the target of the present disclosure.
  • the device 7 for determining the target feature of this embodiment includes: a memory 710 and a processor 720 coupled to the memory 710.
  • the processor 720 is configured to execute the method for determining the target feature in any of the foregoing embodiments based on instructions stored in the memory 710.
  • the memory 710 may include, for example, a system memory, a fixed non-volatile storage medium, and the like.
  • the system memory for example, stores an operating system, an application program, a boot loader (Boot Loader), and other programs.
  • the device 7 for determining the target feature may also include an input/output interface 730, a network interface 740, a storage interface 750, and so on. These interfaces 730, 740, 750, and the memory 710 and the processor 720 may be connected by a bus 760, for example.
  • the input and output interface 730 provides a connection interface for input and output devices such as a display, a mouse, a keyboard, and a touch screen.
  • the network interface 740 provides a connection interface for various networked devices.
  • the storage interface 750 provides a connection interface for external storage devices such as SD cards and U disks.
  • the embodiments of the present disclosure may be provided as methods, systems, or computer program products. Therefore, the present disclosure may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable non-transitory storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • the method and system of the present disclosure may be implemented in many ways.
  • the method and system of the present disclosure can be implemented by software, hardware, firmware or any combination of software, hardware, and firmware.
  • the above-mentioned order of the steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above, unless otherwise specifically stated.
  • the present disclosure may also be implemented as programs recorded in a recording medium, and these programs include machine-readable instructions for implementing the method according to the present disclosure.
  • the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.

Abstract

The present disclosure relates to a method and device for determining features of a target and to a computer-readable storage medium, and belongs to the field of artificial intelligence technology. The method includes: extracting target features of a target in each frame image, the frame images including a frame image to be processed and frame images adjacent to the frame image to be processed; using an attention mechanism model to extract the association relationships between the target features of the frame images so as to determine an associated feature of each frame image; and determining, according to the associated features of the frame images, a comprehensive feature of the target in the frame image to be processed.

Description

Method and device for determining features of a target
Cross-Reference to Related Application
This application is based on and claims priority to CN application No. 201910411768.0, filed on May 17, 2019, the disclosure of which is incorporated into this application by reference in its entirety.
Technical Field
The present disclosure relates to the field of artificial intelligence technology, and in particular to a method for determining features of a target, a device for determining features of a target, and a non-volatile computer-readable storage medium.
Background
By processing each frame image of a video, the three-dimensional shape information and posture information of a person in each frame image can be obtained, so that a three-dimensional human body model can be established. Such a three-dimensional human body model can be used for applications such as smart fitting and identity authentication.
In the related art, key points of the human body in a frame image are extracted, and the three-dimensional shape information and posture information of the human body are estimated in combination with image segmentation results.
Summary of the Invention
According to some embodiments of the present disclosure, a method for determining features of a target is provided, including: extracting target features of a target in each frame image, the frame images including a frame image to be processed and frame images adjacent to the frame image to be processed; using an attention mechanism model to extract association relationships between the target features of the frame images so as to determine an associated feature of each frame image; and optimizing the target feature of the frame image to be processed according to the associated features of the frame images so as to determine a comprehensive feature of the target in the frame image to be processed.
In some embodiments, extracting the target features of the target in each frame image includes: extracting, according to a feature vector of each frame image, overall feature information of the target in each frame image using a first feature extraction module of a first machine learning model; extracting, according to the feature vector of each frame image, local feature information of the target in each frame image using a second feature extraction module of the first machine learning model; and fusing the overall feature information and the local feature information to determine the target feature.
In some embodiments, the first feature extraction module is a deconvolution layer, and the overall feature information is skeleton model information of the target.
In some embodiments, the second feature extraction module is a fully connected layer, and the local feature information includes local shape features and local posture features.
In some embodiments, the local feature information includes position information of the target in the image, and zoom information, rotation information and translation information of the target relative to the camera.
In some embodiments, fusing the overall feature information and the local feature information to determine the target feature includes: performing a bilinear transformation on the overall feature information and the local feature information to determine the shape feature and posture feature of the target as the target feature.
In some embodiments, the attention mechanism model includes a plurality of Transformer modules, and the plurality of Transformer modules are connected in series.
In some embodiments, a convolutional neural network model is used to determine the comprehensive feature of the target in the frame image to be processed according to the associated features of the frame images.
In some embodiments, determining the comprehensive feature of the target in the frame image to be processed includes: sorting the associated features according to the inherent order of the frame images in the video; and determining, according to the sorted associated features of the frame images, the comprehensive feature of the target in the frame image to be processed using a TCN (Temporal Convolutional Network) model, the comprehensive feature including the shape feature and posture feature of the target.
In some embodiments, the attention mechanism model is trained through the following steps: generating an associated feature queue according to the associated features of the frame images, the order of the associated features in the associated feature queue being different from the inherent order of the corresponding frame images in the video; sorting the associated features in the associated feature queue using a second machine learning model; and training the attention mechanism model according to the sorting result and the inherent order.
According to other embodiments of the present disclosure, a device for determining features of a target is provided, including: a target feature extraction unit configured to extract target features of a target in each frame image, the frame images including a frame image to be processed and frame images adjacent to the frame image to be processed; an associated feature determination unit configured to use an attention mechanism model to extract association relationships between the target features of the frame images so as to determine an associated feature of each frame image; and a comprehensive feature determination unit configured to optimize the target feature of the frame image to be processed according to the associated features of the frame images so as to determine a comprehensive feature of the target in the frame image to be processed.
According to still other embodiments of the present disclosure, a device for determining features of a target is provided, including: a memory; and a processor coupled to the memory, the processor being configured to execute, based on instructions stored in the memory, the method for determining features of a target in any of the above embodiments.
According to further embodiments of the present disclosure, a non-volatile computer-readable storage medium is provided, on which a computer program is stored, and the program, when executed by a processor, implements the method for determining features of a target in any of the above embodiments.
Brief Description of the Drawings
The drawings described here are used to provide a further understanding of the present disclosure and constitute a part of this application. The illustrative embodiments of the present disclosure and their descriptions are used to explain the present disclosure and do not constitute an undue limitation of the present disclosure. In the drawings:
FIG. 1 shows a flowchart of some embodiments of the method for determining features of a target of the present disclosure;
FIG. 2 shows a flowchart of some embodiments of step 110 in FIG. 1;
FIG. 3 shows a schematic diagram of some embodiments of step 110 in FIG. 1;
FIG. 4 shows a schematic diagram of some embodiments of steps 120 and 130 in FIG. 1;
FIG. 5 shows a block diagram of some embodiments of the device for determining features of a target of the present disclosure;
FIG. 6 shows a block diagram of other embodiments of the device for determining features of a target of the present disclosure;
FIG. 7 shows a block diagram of further embodiments of the device for determining features of a target of the present disclosure.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the drawings. It should be noted that, unless otherwise specified, the relative arrangement of components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present disclosure.
At the same time, it should be understood that, for convenience of description, the sizes of the various parts shown in the drawings are not drawn to actual scale.
The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the present disclosure or its application or use.
Techniques, methods and devices known to those of ordinary skill in the relevant fields may not be discussed in detail, but where appropriate, such techniques, methods and devices should be regarded as part of the specification.
In all examples shown and discussed here, any specific value should be interpreted as merely exemplary rather than as a limitation. Therefore, other examples of the exemplary embodiments may have different values.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item has been defined in one drawing, it does not need to be further discussed in subsequent drawings.
The inventors of the present disclosure have found the following problem in the above related art: feature determination depends on the key-point extraction accuracy and image segmentation accuracy of a single frame image and does not make use of the relationship between multiple frame images, resulting in low accuracy of feature determination.
In view of this, the present disclosure proposes a technical solution for determining features of a target that can improve the accuracy of feature determination.
FIG. 1 shows a flowchart of some embodiments of the method for determining features of a target of the present disclosure.
As shown in FIG. 1, the method includes: step 110, extracting target features of each frame image; step 120, determining associated features of each frame image; and step 130, determining a comprehensive feature of the frame image to be processed.
In step 110, target features of a target in each frame image are extracted. The frame images include the frame image to be processed and frame images adjacent to the frame image to be processed. For example, the k-th frame image of a video may be used as the frame image to be processed, and the N frames before and the N frames after the k-th frame image may be used as adjacent frame images, where k and N are both integers greater than 0.
In some embodiments, the target may be a human body contained in each frame image, and the target features may be shape information and posture information of the human body. For example, the shape information may be the shape parameters of an SMPL (Skinned Multi-Person Linear) human body shape model (such as a vector of length 10), and the posture information may be the pose parameters of the SMPL human body shape model (such as a vector of length 72).
In some embodiments, human body detection may first be performed on the frame image to be processed (for example, using the AlphaPose algorithm) to obtain the rectangular region of the frame image to be processed that contains the human body (which may be referred to as the target region); then a machine learning method (such as a Resnet-50 neural network model) is used to extract a feature vector of the frame image to be processed from the target region. For example, the same method may be used to extract feature vectors of the adjacent frame images.
In some embodiments, step 110 may be implemented through the steps in FIG. 2.
FIG. 2 shows a flowchart of some embodiments of step 110 in FIG. 1.
As shown in FIG. 2, step 110 includes: step 1110, extracting overall feature information of the target in each frame image; step 1120, extracting local feature information of the target in each frame image; and step 1130, fusing the two kinds of information to determine the target feature.
In step 1110, according to the feature vector of each frame image, the first feature extraction module of the first machine learning model is used to extract overall feature information of the target in each frame image.
In some embodiments, the first feature extraction module is a deconvolution layer (for example, performing transposed convolution processing), and the overall feature information is skeleton model information of the target. For example, the skeleton model information may be the position coordinates of the joint points of a human body model.
In step 1120, according to the feature vector of each frame image, the second feature extraction module of the first machine learning model is used to extract local feature information of the target in each frame image.
In some embodiments, the second feature extraction module is a fully connected layer, and the local feature information includes: local shape features and local posture features (local feature information such as the hands, head and feet of the human body that cannot be reflected by the skeleton model), position information of the target in the image, and zoom information, rotation information and translation information of the target relative to the camera. The local feature information may also include shape information of the human body.
In some embodiments, the target feature may be extracted through the embodiment in FIG. 3.
FIG. 3 shows a schematic diagram of some embodiments of step 110 in FIG. 1.
As shown in FIG. 3, an image feature extraction module 31 (such as a Resnet-50 neural network model) is used to extract the feature vector of the k-th frame image. For example, the feature vector is a 16×512×7×7 vector, which is input to the first machine learning model 32.
According to the feature vector, the first feature extraction module 321 is used to extract the overall feature information of the target, and the second feature extraction module 322 is used to extract the local feature information of the target.
In some embodiments, the first feature extraction module 321 may be a deconvolution layer. For example, the first feature extraction module 321 may include 3 transposed convolutional layers, which expand the 16×512×7×7 vector into a 16×512×56×56 feature map (for example, the feature map may be a heatmap describing the positions of key points of the human body) as the overall feature information.
In some embodiments, the second feature extraction module 322 may be a fully connected layer. For example, a global mean pooling method may be used to convert the 16×512×7×7 vector into a 16×512 vector, and then a fully connected layer is used to extract a vector of the same size from the 16×512 vector to describe the local feature information (detail information of the human body).
In this way, the algorithm for establishing a three-dimensional human body model can be decomposed into two relatively simple sub-tasks: overall feature information extraction and local feature information extraction, thereby reducing the complexity of the algorithm through decoupling.
In some embodiments, the overall feature information and the local feature information are input to a bilinear transformation layer 323 to obtain the target feature k in the k-th frame image. For example, the overall feature information is a vector X1, the local feature information is a vector X2, and the weight parameter W of the bilinear transformation layer 323 can be obtained through training; the output of the bilinear transformation layer 323 is then X1ᵀWX2, where T denotes the transpose operation.
In this way, the bilinear transformation layer 323 fuses the overall feature information and the local feature information while ensuring that the two kinds of information do not affect each other and remain mutually independent; the decoupled state of the two is maintained while the information is fused, thereby improving the accuracy of feature extraction.
In some embodiments, the same method as in the above embodiments may be used to extract target features in the frame images adjacent to the k-th frame image to be processed. For example, the target feature k-1 in the (k-1)-th frame image, the target feature k+1 in the (k+1)-th frame image, and so on, may be extracted.
After the target features in each frame image have been extracted, the comprehensive feature of the target can be determined using the remaining steps in FIG. 1.
In step 120, an attention mechanism model is used to extract the association relationships between the target features of the frame images so as to determine the associated feature of each frame image. For example, the target features in the 4 frames before and the 4 frames after the frame image to be processed (the target features in 9 consecutive frame images in total) may be extracted and input to the attention mechanism model for processing.
In some embodiments, the attention mechanism model includes a plurality of Transformer modules connected in series. In this way, based on the consistency of the target shape and the continuity of the target posture in consecutive frame images, the association information between the target features is mined repeatedly and the feature expression learned from the data is optimized, thereby improving the accuracy of feature determination.
In step 130, the target feature of the frame image to be processed is optimized according to the associated features of the frame images so as to determine the comprehensive feature of the target in the frame image to be processed. For example, a convolutional neural network may be used to process the associated features obtained on the basis of the target features so as to optimize the target feature.
In some embodiments, the associated features are sorted according to the inherent order of the frame images in the video. According to the sorted associated features of the frame images, a TCN model is used to determine the comprehensive feature of the target in the frame image to be processed. The comprehensive feature includes the shape feature and posture feature of the target.
In the above embodiments, the target feature extracted first (i.e., the feature to be processed) does not contain the association relationship information between the frame images, so this target feature is not accurate enough; the associated feature determined from the target features by the attention mechanism is a feature of each frame image that contains the association relationship information; and the comprehensive feature is a feature of the target determined using the association relationship information in the associated features. In this way, compared with the target feature, the comprehensive feature can describe the target more accurately.
In some embodiments, steps 120 and 130 may be implemented through the embodiment in FIG. 4.
FIG. 4 shows a schematic diagram of some embodiments of steps 120 and 130 in FIG. 1.
As shown in FIG. 4, the target feature k-1, target feature k and target feature k+1 extracted from consecutive frame images may be input to an attention mechanism model 41 in the order of the frame images in the video to obtain the corresponding associated feature k-1, associated feature k and associated feature k+1. For example, the attention mechanism model 41 includes a Transformer module 411 and a Transformer module 412 connected in series.
In this way, each output associated feature contains the association information between the target features, and determining the comprehensive feature in the frame image to be processed according to the associated features can improve accuracy.
In some embodiments, the associated feature k-1, the associated feature k and the associated feature k+1 are input to a TCN model 42, and the target feature k is optimized to obtain the comprehensive feature k of the k-th frame image.
In some embodiments, the TCN model 42 may include two one-dimensional convolution layers and one one-dimensional convolution module. The TCN model 42 may introduce the information of each associated feature through the first convolution layer, then process it through the one-dimensional convolution module, and finally output the predicted result through the second convolution layer. For example, the one-dimensional convolution module may include a residual-connected third convolution layer (performing one-dimensional convolution processing), a BN (Batch Normalization) layer and an activation layer.
In some embodiments, an associated feature queue may be generated according to the associated features of the frame images, the order of the associated features in the associated feature queue being different from the inherent order of the frame images in the video. A second machine learning model 43 is used to sort the associated features in the associated feature queue, and the attention mechanism model 41 is trained according to the sorting result and the inherent order.
For example, the second machine learning model 43 is a sorting network model including three convolutional layers and three fully connected layers. The associated feature k-1, the associated feature k and the associated feature k+1 may be shuffled and then input to the second machine learning model 43 for sorting. That is, the inherent order of the frame images in the video can be used for supervised training so as to regress the correct order, and the sorting result is used to train the attention mechanism model 41.
Adopting this adversarial training method enables the attention mechanism model 41 to deeply understand the order between the frame images, thereby obtaining more accurate feature determination results.
In the above embodiments, the attention mechanism model is used to determine the associated features of the frame image to be processed and its adjacent frame images, and the target feature in the frame image to be processed is optimized through the associated features. In this way, the consistency of the target shape and the continuity of the target posture in the frame images are utilized, and the accuracy of target feature determination is improved.
FIG. 5 shows a block diagram of some embodiments of the device for determining features of a target of the present disclosure.
As shown in FIG. 5, the device 5 for determining target features includes a target feature extraction unit 51, an associated feature determination unit 52 and a comprehensive feature determination unit 53.
The target feature extraction unit 51 extracts target features of the target in each frame image, the frame images including the frame image to be processed and frame images adjacent to the frame image to be processed.
In some embodiments, the target feature extraction unit 51 uses the first feature extraction module of the first machine learning model to extract the overall feature information of the target in each frame image according to the feature vector of each frame image. For example, the first feature extraction module is a deconvolution layer, and the overall feature information is skeleton model information of the target.
In some embodiments, the target feature extraction unit 51 uses the second feature extraction module of the first machine learning model to extract the local feature information of the target in each frame image according to the feature vector of each frame image. For example, the second feature extraction module is a fully connected layer, and the local feature information includes local shape features and local posture features, position information of the target in the image, and zoom information, rotation information and translation information relative to the camera.
In some embodiments, the target feature extraction unit 51 fuses the overall feature information and the local feature information to determine the target feature. The target feature extraction unit 51 performs a bilinear transformation on the overall feature information and the local feature information, and determines the shape feature and posture feature of the target as the target feature.
The associated feature determination unit 52 uses the attention mechanism model to extract the association relationships between the target features of the frame images so as to determine the associated feature of each frame image. For example, the attention mechanism model includes a plurality of Transformer modules connected in series.
In some embodiments, the attention mechanism model is trained through the following steps: generating an associated feature queue according to the associated features of the frame images, the order of the associated features in the associated feature queue being different from the inherent order of the frame images in the video; sorting the associated features in the associated feature queue using the second machine learning model; and training the attention mechanism model according to the sorting result and the inherent order.
In some embodiments, the comprehensive feature determination unit 53 uses a convolutional neural network model to determine the comprehensive feature of the target in the frame image to be processed according to the associated features of the frame images.
The comprehensive feature determination unit 53 optimizes the target feature of the frame image to be processed according to the associated features of the frame images so as to determine the comprehensive feature of the target in the frame image to be processed. For example, the associated features are sorted according to the inherent order of the frame images in the video, and a TCN model is used to determine the comprehensive feature of the target in the frame image to be processed according to the sorted associated features of the frame images, the comprehensive feature including the shape feature and posture feature of the target. In the above embodiments, the attention mechanism model is used to determine the associated features of the frame image to be processed and its adjacent frame images, and the target feature in the frame image to be processed is optimized through the associated features. In this way, the consistency of the target shape and the continuity of the target posture in the frame images are utilized, and the accuracy of target feature determination is improved.
FIG. 6 shows a block diagram of other embodiments of the device for determining features of a target of the present disclosure.
As shown in FIG. 6, the device 6 for determining target features of this embodiment includes a memory 61 and a processor 62 coupled to the memory 61, the processor 62 being configured to execute, based on instructions stored in the memory 61, the method for determining target features in any one of the embodiments of the present disclosure.
The memory 61 may include, for example, a system memory, a fixed non-volatile storage medium, and the like. The system memory stores, for example, an operating system, application programs, a boot loader (Boot Loader), a database and other programs.
FIG. 7 shows a block diagram of further embodiments of the device for determining features of a target of the present disclosure.
As shown in FIG. 7, the device 7 for determining target features of this embodiment includes a memory 710 and a processor 720 coupled to the memory 710, the processor 720 being configured to execute, based on instructions stored in the memory 710, the method for determining target features in any one of the foregoing embodiments.
The memory 710 may include, for example, a system memory, a fixed non-volatile storage medium, and the like. The system memory stores, for example, an operating system, application programs, a boot loader (Boot Loader) and other programs.
The device 7 for determining target features may also include an input/output interface 730, a network interface 740, a storage interface 750, and so on. These interfaces 730, 740 and 750 and the memory 710 and the processor 720 may be connected, for example, through a bus 760. The input/output interface 730 provides a connection interface for input/output devices such as a display, a mouse, a keyboard and a touch screen. The network interface 740 provides a connection interface for various networked devices. The storage interface 750 provides a connection interface for external storage devices such as SD cards and USB flash drives.
Those skilled in the art should understand that the embodiments of the present disclosure may be provided as methods, systems or computer program products. Therefore, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable non-transitory storage media (including but not limited to disk storage, CD-ROM, optical storage and the like) containing computer-usable program code.
Up to this point, the method for determining features of a target, the device for determining features of a target and the non-volatile computer-readable storage medium according to the present disclosure have been described in detail. Some details well known in the art have not been described in order to avoid obscuring the concept of the present disclosure. Based on the above description, those skilled in the art can fully understand how to implement the technical solutions disclosed here.
The method and system of the present disclosure may be implemented in many ways. For example, the method and system of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware and firmware. The above order of the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above, unless otherwise specifically stated. In addition, in some embodiments, the present disclosure may also be implemented as programs recorded in a recording medium, and these programs include machine-readable instructions for implementing the method according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
Although some specific embodiments of the present disclosure have been described in detail by way of example, those skilled in the art should understand that the above examples are for illustration only and are not intended to limit the scope of the present disclosure. Those skilled in the art should understand that the above embodiments may be modified without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.

Claims (13)

  1. A method for determining features of a target, comprising:
    extracting target features of a target in each frame image, the frame images including a frame image to be processed and frame images adjacent to the frame image to be processed;
    using an attention mechanism model to extract association relationships between the target features of the frame images so as to determine an associated feature of each frame image; and
    determining, according to the associated features of the frame images, a comprehensive feature of the target in the frame image to be processed.
  2. The determination method according to claim 1, wherein extracting the target features of the target in each frame image comprises:
    extracting, according to a feature vector of each frame image, overall feature information of the target in each frame image using a first feature extraction module of a first machine learning model;
    extracting, according to the feature vector of each frame image, local feature information of the target in each frame image using a second feature extraction module of the first machine learning model; and
    fusing the overall feature information and the local feature information to determine the target feature.
  3. The determination method according to claim 2, wherein
    the first feature extraction module is a deconvolution layer, and the overall feature information is skeleton model information of the target.
  4. The determination method according to claim 2, wherein
    the second feature extraction module is a fully connected layer, and the local feature information includes local shape features and local posture features.
  5. The determination method according to claim 4, wherein the local feature information includes: position information of the target in the image; and zoom information, rotation information and translation information of the target relative to the camera.
  6. The determination method according to claim 2, wherein fusing the overall feature information and the local feature information to determine the target feature comprises:
    performing a bilinear transformation on the overall feature information and the local feature information to determine the shape feature and posture feature of the target as the target feature.
  7. The determination method according to claim 1, wherein
    the attention mechanism model includes a plurality of Transformer modules, and the plurality of Transformer modules are connected in series.
  8. The determination method according to claim 1, wherein determining the comprehensive feature of the target in the frame image to be processed comprises:
    determining, according to the associated features of the frame images, the comprehensive feature of the target in the frame image to be processed using a convolutional neural network model.
  9. The determination method according to claim 1, wherein determining the comprehensive feature of the target in the frame image to be processed comprises:
    sorting the associated features according to the inherent order of the frame images in the video; and
    determining, according to the sorted associated features of the frame images, the comprehensive feature of the target in the frame image to be processed using a temporal convolutional network (TCN) model, the comprehensive feature including the shape feature and posture feature of the target.
  10. The determination method according to any one of claims 1-9, wherein the attention mechanism model is trained through the following steps:
    generating an associated feature queue according to the associated features of the frame images, the order of the associated features in the associated feature queue being different from the inherent order of the corresponding frame images in the video;
    sorting the associated features in the associated feature queue using a second machine learning model; and
    training the attention mechanism model according to the sorting result and the inherent order.
  11. A device for determining features of a target, comprising:
    a target feature extraction unit configured to extract target features of a target in each frame image, the frame images including a frame image to be processed and frame images adjacent to the frame image to be processed;
    an associated feature determination unit configured to use an attention mechanism model to extract association relationships between the target features of the frame images so as to determine an associated feature of each frame image; and
    a comprehensive feature determination unit configured to determine, according to the associated features of the frame images, a comprehensive feature of the target in the frame image to be processed.
  12. A device for determining features of a target, comprising:
    a memory; and
    a processor coupled to the memory, the processor being configured to execute, based on instructions stored in the memory, the method for determining features of a target according to any one of claims 1-10.
  13. A non-volatile computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method for determining features of a target according to any one of claims 1-10.
PCT/CN2020/089410 2019-05-17 2020-05-09 目标的特征的确定方法和装置 WO2020233427A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910411768.0A CN111783506A (zh) 2019-05-17 2019-05-17 目标特征的确定方法、装置和计算机可读存储介质
CN201910411768.0 2019-05-17

Publications (1)

Publication Number Publication Date
WO2020233427A1 true WO2020233427A1 (zh) 2020-11-26

Family

ID=72755588

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/089410 WO2020233427A1 (zh) 2019-05-17 2020-05-09 目标的特征的确定方法和装置

Country Status (2)

Country Link
CN (1) CN111783506A (zh)
WO (1) WO2020233427A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113378973A (zh) * 2021-06-29 2021-09-10 沈阳雅译网络技术有限公司 一种基于自注意力机制的图像分类方法
CN114170558A (zh) * 2021-12-14 2022-03-11 北京有竹居网络技术有限公司 用于视频处理的方法、系统、设备、介质和产品
CN117180952A (zh) * 2023-11-07 2023-12-08 湖南正明环保股份有限公司 多向气流料层循环半干法烟气脱硫系统及其方法

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113220859A (zh) * 2021-06-01 2021-08-06 平安科技(深圳)有限公司 基于图像的问答方法、装置、计算机设备及存储介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107066973A (zh) * 2017-04-17 2017-08-18 杭州电子科技大学 一种利用时空注意力模型的视频内容描述方法
US9740949B1 (en) * 2007-06-14 2017-08-22 Hrl Laboratories, Llc System and method for detection of objects of interest in imagery
CN109063626A (zh) * 2018-07-27 2018-12-21 深圳市践科技有限公司 动态人脸识别方法和装置
CN109359592A (zh) * 2018-10-16 2019-02-19 北京达佳互联信息技术有限公司 视频帧的处理方法、装置、电子设备及存储介质
CN109409165A (zh) * 2017-08-15 2019-03-01 杭州海康威视数字技术股份有限公司 一种视频内容识别方法、装置及电子设备
CN109544554A (zh) * 2018-10-18 2019-03-29 中国科学院空间应用工程与技术中心 一种植物图像分割及叶片骨架提取方法及系统
CN109583334A (zh) * 2018-11-16 2019-04-05 中山大学 一种基于时空关联神经网络的动作识别方法及其系统

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107832672B (zh) * 2017-10-12 2020-07-07 北京航空航天大学 一种利用姿态信息设计多损失函数的行人重识别方法
CN108510012B (zh) * 2018-05-04 2022-04-01 四川大学 一种基于多尺度特征图的目标快速检测方法
CN109472248B (zh) * 2018-11-22 2022-03-25 广东工业大学 一种行人重识别方法、系统及电子设备和存储介质

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9740949B1 (en) * 2007-06-14 2017-08-22 Hrl Laboratories, Llc System and method for detection of objects of interest in imagery
CN107066973A (zh) * 2017-04-17 2017-08-18 杭州电子科技大学 一种利用时空注意力模型的视频内容描述方法
CN109409165A (zh) * 2017-08-15 2019-03-01 杭州海康威视数字技术股份有限公司 一种视频内容识别方法、装置及电子设备
CN109063626A (zh) * 2018-07-27 2018-12-21 深圳市践科技有限公司 动态人脸识别方法和装置
CN109359592A (zh) * 2018-10-16 2019-02-19 北京达佳互联信息技术有限公司 视频帧的处理方法、装置、电子设备及存储介质
CN109544554A (zh) * 2018-10-18 2019-03-29 中国科学院空间应用工程与技术中心 一种植物图像分割及叶片骨架提取方法及系统
CN109583334A (zh) * 2018-11-16 2019-04-05 中山大学 一种基于时空关联神经网络的动作识别方法及其系统

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113378973A (zh) * 2021-06-29 2021-09-10 沈阳雅译网络技术有限公司 一种基于自注意力机制的图像分类方法
CN113378973B (zh) * 2021-06-29 2023-08-08 沈阳雅译网络技术有限公司 一种基于自注意力机制的图像分类方法
CN114170558A (zh) * 2021-12-14 2022-03-11 北京有竹居网络技术有限公司 用于视频处理的方法、系统、设备、介质和产品
CN117180952A (zh) * 2023-11-07 2023-12-08 湖南正明环保股份有限公司 多向气流料层循环半干法烟气脱硫系统及其方法
CN117180952B (zh) * 2023-11-07 2024-02-02 湖南正明环保股份有限公司 多向气流料层循环半干法烟气脱硫系统及其方法

Also Published As

Publication number Publication date
CN111783506A (zh) 2020-10-16

Similar Documents

Publication Publication Date Title
WO2020233427A1 (zh) 目标的特征的确定方法和装置
Park et al. 3d human pose estimation using convolutional neural networks with 2d pose information
Sun et al. Compositional human pose regression
US10885365B2 (en) Method and apparatus for detecting object keypoint, and electronic device
Chen et al. Facial expression recognition in video with multiple feature fusion
US10769496B2 (en) Logo detection
US9542621B2 (en) Spatial pyramid pooling networks for image processing
EP2893491B1 (en) Image processing apparatus and method for fitting a deformable shape model to an image using random forest regression voting
Yan et al. Ranking with uncertain labels
US9098740B2 (en) Apparatus, method, and medium detecting object pose
WO2020107847A1 (zh) 基于骨骼点的跌倒检测方法及其跌倒检测装置
Jiang et al. Dual attention mobdensenet (damdnet) for robust 3d face alignment
Ma et al. Ppt: token-pruned pose transformer for monocular and multi-view human pose estimation
Tian et al. Densely connected attentional pyramid residual network for human pose estimation
CN110599395A (zh) 目标图像生成方法、装置、服务器及存储介质
CN110738650B (zh) 一种传染病感染识别方法、终端设备及存储介质
Xia et al. Human motion recovery jointly utilizing statistical and kinematic information
Liu et al. Iterative relaxed collaborative representation with adaptive weights learning for noise robust face hallucination
Zhang et al. Multi-local-task learning with global regularization for object tracking
Luvizon et al. Consensus-based optimization for 3D human pose estimation in camera coordinates
Chang et al. 2d–3d pose consistency-based conditional random fields for 3d human pose estimation
Verma et al. Two-stage multi-view deep network for 3D human pose reconstruction using images and its 2D joint heatmaps through enhanced stack-hourglass approach
JP6202938B2 (ja) 画像認識装置および画像認識方法
CN111783497A (zh) 视频中目标的特征确定方法、装置和计算机可读存储介质
US20240013357A1 (en) Recognition system, recognition method, program, learning method, trained model, distillation model and training data set generation method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20809210

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20809210

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 24.03.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 20809210

Country of ref document: EP

Kind code of ref document: A1