WO2020233427A1 - Method and apparatus for determining features of a target - Google Patents
Method and apparatus for determining features of a target
- Publication number
- WO2020233427A1 (publication of PCT application PCT/CN2020/089410)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- target
- feature
- frame image
- frame
- image
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/64—Three-dimensional objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/29—Graphical models, e.g. Bayesian networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
Definitions
- the present disclosure relates to the field of artificial intelligence technology, and in particular to a method for determining characteristics of a target, a device for determining characteristics of a target, and a non-volatile computer-readable storage medium.
- the three-dimensional shape information and posture information of the person in each frame image can be obtained, so as to achieve the establishment of a three-dimensional human body model.
- a three-dimensional human body model can be used to implement smart fitting, identity authentication, etc.
- in the related art, the key points of the human body in the frame image are extracted, and the three-dimensional shape information and posture information of the human body are estimated based on the image segmentation result.
- a method for determining features of a target is provided, including: extracting target features of the target in each frame image, the frame images including a frame image to be processed and adjacent frame images of the frame image to be processed; using an attention mechanism model to extract the association relationships between the target features of the frame images, so as to determine the associated feature of each frame image; and optimizing, according to the associated features of the frame images, the target feature of the frame image to be processed, so as to determine the comprehensive feature of the target in the frame image to be processed.
- the extracting of the target features of the target in each frame image includes: according to the feature vector of each frame image, using the first feature extraction module of a first machine learning model to extract the overall feature information of the target in each frame image; according to the feature vector of each frame image, using the second feature extraction module of the first machine learning model to extract the local feature information of the target in each frame image; and fusing the overall feature information and the local feature information to determine the target feature.
- the first feature extraction module is a deconvolution layer
- the overall feature information is skeleton model information of the target.
- the second feature extraction module is a fully connected layer
- the local feature information includes local shape features and local posture features.
- the local feature information includes position information of the target in the image, as well as zoom information, rotation information, and translation information of the target relative to the camera.
- the fusing of the overall feature information and the local feature information to determine the target feature includes: performing a bilinear transformation on the overall feature information and the local feature information to determine the shape feature and posture feature of the target as the target feature.
- the attention mechanism model includes a plurality of Transformer modules, and the plurality of Transformer modules are connected in series.
- a convolutional neural network model is used to determine the comprehensive feature of the target in the frame image to be processed according to the associated features of each frame image.
- the determining of the comprehensive feature of the target in the frame image to be processed includes: sorting the associated features according to the inherent order of the frame images in the video; and, according to the sorted associated features of the frame images, using a TCN (Temporal Convolutional Network) model to determine the comprehensive feature of the target in the frame image to be processed, the comprehensive feature including the shape feature and posture feature of the target.
- the attention mechanism model is trained through the following steps:
- an associated feature queue is generated, in which the order of the associated features differs from the inherent order of the corresponding frame images in the video; a second machine learning model is used to sort the associated features in the associated feature queue; and the attention mechanism model is trained according to the sorting result and the inherent order.
- an apparatus for determining features of a target is provided, including: a target feature extraction unit for extracting target features of the target in each frame image, the frame images including a frame image to be processed and adjacent frame images of the frame image to be processed; an associated feature determination unit configured to use an attention mechanism model to extract the association relationships between the target features of the frame images, so as to determine the associated feature of each frame image; and a comprehensive feature determination unit configured to optimize the target feature of the frame image to be processed according to the associated features of the frame images, so as to determine the comprehensive feature of the target in the frame image to be processed.
- a device for determining features of a target is provided, including: a memory; and a processor coupled to the memory, the processor being configured to execute, based on instructions stored in the memory, the method for determining features of a target in any of the foregoing embodiments.
- a non-volatile computer-readable storage medium is provided, on which a computer program is stored. When the program is executed by a processor, the method for determining features of a target in any of the foregoing embodiments is implemented.
- FIG. 1 shows a flowchart of some embodiments of a method for determining characteristics of a target of the present disclosure
- FIG. 2 shows a flowchart of some embodiments of step 110 in FIG. 1;
- FIG. 3 shows a schematic diagram of some embodiments of step 110 in FIG. 1;
- FIG. 4 shows a schematic diagram of some embodiments of step 120 and step 130 in FIG. 1;
- FIG. 5 shows a block diagram of some embodiments of the device for determining the characteristics of the target of the present disclosure
- FIG. 6 shows a block diagram of other embodiments of the device for determining the characteristics of the target of the present disclosure
- FIG. 7 shows a block diagram of further embodiments of the device for determining the characteristics of the target of the present disclosure.
- the inventors of the present disclosure have discovered the following problems in the above-mentioned related technologies: they rely on the key point extraction accuracy and image segmentation accuracy of a single frame image, and do not use the relationships between multiple frame images, resulting in low accuracy of feature determination.
- the present disclosure proposes a technical solution for determining the feature of the target, which can improve the accuracy of feature determination.
- FIG. 1 shows a flowchart of some embodiments of a method for determining characteristics of a target of the present disclosure.
- the method includes: step 110, extracting the target features of each frame image; step 120, determining the associated feature of each frame image; and step 130, determining the comprehensive feature of the frame image to be processed.
- each frame image includes a frame image to be processed and adjacent frame images of the frame image to be processed.
- the k-th frame image of the video may be used as the frame image to be processed, and the N frames preceding and the N frames following the k-th frame image may be regarded as adjacent frame images, where k and N are both integers greater than 0.
- the target may be a human body contained in each frame of image, and the target feature may be shape information and posture information of the human body.
- the shape information can be the shape parameters of the SMPL (Skinned Multi-Person Linear) human body model (for example, a vector of length 10), and the posture information can be the pose parameters of the SMPL model (for example, a vector of length 72).
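- as an illustration (not part of the original disclosure), the SMPL parameter layout can be sketched as follows; the split of the pose vector into a global rotation plus 23 per-joint axis-angle rotations follows the standard SMPL convention and is an assumption here:

```python
import numpy as np

# Hedged sketch of the conventional SMPL parameter layout; the theta
# decomposition below follows standard SMPL usage, not a detail stated
# in this document.
beta = np.zeros(10, dtype=np.float32)   # shape parameters (vector of length 10)
theta = np.zeros(72, dtype=np.float32)  # pose parameters (vector of length 72)

global_orient = theta[:3]               # root (pelvis) orientation, axis-angle
body_pose = theta[3:].reshape(23, 3)    # 23 joints x 3 axis-angle values
```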
- human body detection can be performed on the frame image to be processed (for example, using the AlphaPose algorithm) to obtain the rectangular region of the human body in the frame image to be processed (which can be referred to as the target area); a neural network model (such as ResNet-50) can then be used to extract the feature vector of the frame image to be processed from the target area.
- the same method can be used to extract feature vectors of adjacent frame images.
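- a minimal sketch of this per-frame feature extraction, assuming the person box is already provided by a detector such as AlphaPose and using torchvision's ResNet-50 as a stand-in backbone (the 16 × 512 × 7 × 7 feature map mentioned below implies a reduced or projected backbone; stock ResNet-50 yields 2048 channels):

```python
import torch
import torchvision.models as models
import torchvision.transforms.functional as TF

# Keep the convolutional trunk of ResNet-50 (everything before global
# pooling) so the output is a spatial feature map, as in the text.
backbone = models.resnet50(weights=None)
trunk = torch.nn.Sequential(*list(backbone.children())[:-2])

def extract_frame_feature(frame: torch.Tensor, box: tuple) -> torch.Tensor:
    """frame: 3xHxW image tensor; box: (top, left, height, width) from a detector."""
    top, left, h, w = box
    crop = TF.resized_crop(frame, top, left, h, w, size=[224, 224])
    with torch.no_grad():
        return trunk(crop.unsqueeze(0))  # 1 x 2048 x 7 x 7 feature map
```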
- step 110 may be implemented through the steps in FIG. 2.
- FIG. 2 shows a flowchart of some embodiments of step 110 in FIG. 1.
- step 110 includes: step 1110, extracting the overall feature information of the target in each frame image; step 1120, extracting the local feature information of the target in each frame image; and step 1130, fusing the overall feature information and the local feature information to determine the target feature.
- in step 1110, according to the feature vector of each frame image, the first feature extraction module of the first machine learning model is used to extract the overall feature information of the target in each frame image.
- the first feature extraction module is a deconvolution layer (for example, transposed convolution processing), and the overall feature information is target skeleton model information.
- the skeleton model information may be the position coordinates of the joint points of the human body model.
- in step 1120, according to the feature vector of each frame image, the second feature extraction module of the first machine learning model is used to extract the local feature information of the target in each frame image.
- the second feature extraction module is a fully connected layer
- the local feature information includes: local shape features and local posture features (for example, details of parts such as the hands, head, and feet that cannot be reflected by the skeleton model of the human body), the position information of the target in the image, and the zoom, rotation, and translation information of the target relative to the camera.
- the local feature information may also include the shape information of the human body.
- the target feature can be extracted through the embodiment in FIG. 3.
- FIG. 3 shows a schematic diagram of some embodiments of step 110 in FIG. 1.
- the image feature extraction module 31 (such as the ResNet-50 neural network model) is used to extract the feature vector of the k-th frame image.
- the feature vector is a 16 ⁇ 512 ⁇ 7 ⁇ 7 vector, which is input to the first machine learning model 32.
- the first feature extraction module 321 is used to extract the overall feature information of the target.
- the second feature extraction module 322 is used to extract the local feature information of the target.
- the first feature extraction module 321 may be a deconvolution layer.
- the first feature extraction module 321 may include 3 transposed convolutional layers to expand the 16 × 512 × 7 × 7 vector into a 16 × 512 × 56 × 56 feature map (for example, the feature map may be a heatmap describing the positions of the key points of the human body), which is used as the overall feature information.
- the second feature extraction module 322 may be a fully connected layer.
- a global mean pooling operation can be used to convert the 16 × 512 × 7 × 7 vector into a 16 × 512 vector, and a fully connected layer can then be used to extract a vector of the same size from the 16 × 512 vector to describe the local feature information (detailed information of the human body).
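- a hedged PyTorch sketch of the two branches described above (channel widths and kernel sizes are assumptions; only the 7 → 56 expansion and the pooling-plus-fully-connected path follow the text):

```python
import torch
import torch.nn as nn

class TwoBranchExtractor(nn.Module):
    """Deconvolution branch for overall (heatmap) features and a pooled
    fully connected branch for local features, per the description."""
    def __init__(self, channels: int = 512):
        super().__init__()
        # Three transposed convolutions double the spatial size each time:
        # 7 -> 14 -> 28 -> 56.
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)      # global mean pooling
        self.fc = nn.Linear(channels, channels)  # same-size local vector

    def forward(self, x):                         # x: B x 512 x 7 x 7
        overall = self.deconv(x)                  # B x 512 x 56 x 56 heatmaps
        local = self.fc(self.pool(x).flatten(1))  # B x 512 local vector
        return overall, local
```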
- in this way, the algorithm for building a three-dimensional human body model is decomposed into two relatively simple sub-tasks, overall feature information extraction and local feature information extraction, thereby reducing the complexity of the algorithm through decoupling.
- the overall feature information and the local feature information are input to the bilinear transformation layer 323 to obtain the target feature k in the k-th frame of image.
- for example, the overall feature information is the vector X1 and the local feature information is the vector X2; the weight parameter W of the bilinear transformation layer 323 can be obtained through training, and the output of the bilinear transformation layer 323 is X1ᵀWX2, where ᵀ denotes the transpose operation.
- the bilinear transformation layer 323 fuses the overall feature information and the local feature information while ensuring that the two types of information remain mutually independent and do not affect each other; maintaining this decoupled state during fusion improves the accuracy of feature extraction.
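- the X1ᵀWX2 fusion corresponds to a standard bilinear layer; below is a sketch using PyTorch's nn.Bilinear, where the 512-dimensional inputs and the 85-dimensional output (10 SMPL shape + 72 pose + 3 camera values) are assumptions rather than figures from this document:

```python
import torch
import torch.nn as nn

# nn.Bilinear computes y_k = x1^T W_k x2 + b_k for each output unit k,
# i.e. the X1^T W X2 form above; W is learned during training.
fuse = nn.Bilinear(in1_features=512, in2_features=512, out_features=85)

x1 = torch.randn(16, 512)      # overall feature information (pooled to a vector)
x2 = torch.randn(16, 512)      # local feature information
target_feature = fuse(x1, x2)  # 16 x 85 per-frame target features
```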
- the same method as the above-mentioned embodiment may be used to extract target features in adjacent frame images of the k-th frame image to be processed.
- for example, the target feature k-1 in the (k-1)-th frame image, the target feature k+1 in the (k+1)-th frame image, and so on can be extracted.
- after the target features are extracted, the comprehensive feature of the target can be determined using the remaining steps in FIG. 1.
- in step 120, the attention mechanism model is used to extract the association relationships between the target features of the frame images to determine the associated feature of each frame image. For example, the target features in the 4 frames preceding and the 4 frames following the frame image to be processed (the target features of 9 consecutive frame images in total) can be extracted and input into the attention mechanism model for processing.
- in some embodiments, the attention mechanism model includes a plurality of Transformer modules connected in series. In this way, based on the consistency of the target's shape and the continuity of the target's posture across consecutive frame images, the association information between the target features is mined repeatedly, and the feature expression learned from the data is optimized, thereby improving the accuracy of feature determination.
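- a sketch of such a serial attention model, using PyTorch's Transformer encoder layers as stand-ins for the Transformer modules (the feature size, head count, and the use of nn.TransformerEncoder are assumptions):

```python
import torch
import torch.nn as nn

# Two encoder layers in series attend over the 2N+1 per-frame target
# features; each output position is that frame's associated feature.
d_model, num_frames, batch = 85, 9, 16
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=5, batch_first=True)
attention_model = nn.TransformerEncoder(layer, num_layers=2)

target_features = torch.randn(batch, num_frames, d_model)
associated_features = attention_model(target_features)  # B x 9 x 85
```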
- the target characteristics of the frame image to be processed are optimized according to the associated characteristics of each frame image to determine the comprehensive characteristics of the target in the frame image to be processed.
- a convolutional neural network can be used to process the associated features acquired based on the target feature, so as to optimize the target feature.
- the associated features are sorted according to the inherent order of the frame images in the video; according to the sorted associated features, the TCN model is used to determine the comprehensive feature of the target in the frame image to be processed, the comprehensive feature including the shape feature and posture feature of the target.
- the initially extracted target features (i.e., the features to be processed) do not contain the association information between the frame images, so the target features are not accurate enough; the associated features determined by the attention mechanism from the target features are per-frame features that do contain this association information; and the comprehensive feature is the feature of the target determined from the association information in the associated features. In this way, compared with the target features, the comprehensive feature can describe the target more accurately.
- steps 120 and 130 may be implemented through the embodiment in FIG. 4.
- FIG. 4 shows a schematic diagram of some embodiments of step 120 and step 130 in FIG. 1.
- the target feature k-1, target feature k, and target feature k+1 extracted from the consecutive frame images can be input into the attention mechanism model 41 in the order of the frame images in the video, to obtain the corresponding associated feature k-1, associated feature k, and associated feature k+1.
- the attention mechanism model 41 includes a Transformer module 411 and a Transformer module 412 connected in series.
- the output associated features include the associated information between the target features, and the comprehensive features in the frame image to be processed are determined according to the associated features, which can improve accuracy.
- the associated feature k-1, the associated feature k, and the associated feature k+1 are input into the TCN model 42, and the target feature k is optimized to obtain the comprehensive feature k of the k-th frame image.
- the TCN model 42 may include two one-dimensional convolution layers and one one-dimensional convolution module.
- the TCN model 42 can take in the information of each associated feature through the first convolution layer, process it through the one-dimensional convolution module, and finally output the predicted result through the second convolution layer.
- a one-dimensional convolution module may include a residually connected third convolution layer (for one-dimensional convolution processing), a BN (Batch Normalization) layer, and an activation layer.
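- a hedged sketch of this TCN structure (channel widths and kernel sizes are assumptions; the three-stage layout and the residual conv + BN + activation module follow the description above):

```python
import torch
import torch.nn as nn

class ResidualConv1d(nn.Module):
    """One-dimensional convolution module: residually connected
    convolution layer, BN layer, and activation layer."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm1d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return x + self.act(self.bn(self.conv(x)))

class TCNSketch(nn.Module):
    def __init__(self, feat_dim: int = 85, hidden: int = 128):
        super().__init__()
        self.conv_in = nn.Conv1d(feat_dim, hidden, kernel_size=3, padding=1)
        self.res = ResidualConv1d(hidden)
        self.conv_out = nn.Conv1d(hidden, feat_dim, kernel_size=1)

    def forward(self, assoc):          # assoc: B x T x feat_dim
        x = assoc.transpose(1, 2)       # Conv1d expects B x C x T
        x = self.conv_out(self.res(self.conv_in(x)))
        return x[:, :, x.size(2) // 2]  # middle frame's comprehensive feature
```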
- an associated feature queue may be generated according to the associated features of each frame of image, and the sequence of the associated features in the associated feature queue is different from the inherent order of each frame of image in the video.
- the second machine learning model 43 is used to sort the associated features in the associated feature queue. According to the sorting result and the inherent order, the attention mechanism model 41 is trained.
- for example, the second machine learning model 43 is a sorting network model including three convolutional layers and three fully connected layers.
- for example, the associated feature k-1, the associated feature k, and the associated feature k+1 may be shuffled and input into the second machine learning model 43 for sorting. That is, the inherent order of the frame images in the video can be used as supervision to train the model to recover the correct order, and the sorting result can be used to train the attention mechanism model 41.
- adopting this adversarial training method enables the attention mechanism model 41 to deeply understand the ordering relationships between the frame images, thereby obtaining more accurate feature determination results.
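- a sketch of this ordering-based training signal (the sorting network below is abbreviated relative to the three-convolution, three-fully-connected model described; the shapes and the cross-entropy formulation are assumptions):

```python
import torch
import torch.nn as nn

num_frames, feat_dim = 9, 85
# Simplified sorting network: predicts, for every queue position, which
# original position its feature came from.
sorter = nn.Sequential(
    nn.Conv1d(feat_dim, 64, 3, padding=1), nn.ReLU(),
    nn.Conv1d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Conv1d(64, num_frames, 3, padding=1),  # logits: B x classes x positions
)
criterion = nn.CrossEntropyLoss()

def ordering_loss(associated: torch.Tensor) -> torch.Tensor:
    """associated: B x T x feat_dim associated features in inherent order."""
    perm = torch.randperm(num_frames)                  # shuffled feature queue
    shuffled = associated[:, perm, :].transpose(1, 2)  # B x feat_dim x T
    logits = sorter(shuffled)
    target = perm.unsqueeze(0).expand(associated.size(0), -1)
    # Gradients also flow back into the attention mechanism model that
    # produced `associated`, supervising it with the inherent order.
    return criterion(logits, target)
```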
- in the above embodiments, the attention mechanism model is used to determine the associated features of the frame image to be processed and its adjacent frame images, and the target feature of the frame image to be processed is optimized using the associated features. In this way, the consistency of the target's shape and the continuity of the target's posture across the frame images are exploited, improving the accuracy of target feature determination.
- FIG. 5 shows a block diagram of some embodiments of an apparatus for determining characteristics of a target of the present disclosure.
- the target feature determination device 5 includes a target feature extraction unit 51, an associated feature determination unit 52 and a comprehensive feature determination unit 53.
- the target feature extraction unit 51 extracts target features of the target in each frame image, and each frame image includes the frame image to be processed and adjacent frame images of the frame image to be processed.
- the target feature extraction unit 51 uses the first feature extraction module of the first machine learning model to extract the overall feature information of the target in each frame image according to the feature vector of each frame image.
- the first feature extraction module is a deconvolution layer
- the overall feature information is the target's skeleton model information.
- the target feature extraction unit 51 uses the second feature extraction module of the first machine learning model to extract the local feature information of the target in each frame image according to the feature vector of each frame image.
- the second feature extraction module is a fully connected layer
- the local feature information includes local shape features and local posture features, the position information of the target in the image, and the zoom, rotation, and translation information of the target relative to the camera.
- the target feature extraction unit 51 fuses overall feature information and local feature information to determine the target feature.
- the target feature extraction unit 51 performs bilinear transformation on the overall feature information and the local feature information, and determines the shape feature and posture feature of the target as the target feature.
- the associated feature determining unit 52 uses the attention mechanism model to extract the associated relationship between the target features of each frame image to determine the associated feature of each frame image.
- the attention mechanism model includes multiple Transformer modules, and multiple Transformer modules are connected in series.
- the attention mechanism model is trained through the following steps: generating an associated feature queue according to the associated features of the frame images, the order of the associated features in the queue being different from the inherent order of the frame images in the video; sorting the associated features in the queue using the second machine learning model; and training the attention mechanism model according to the sorting result and the inherent order.
- the comprehensive feature determination unit 53 uses a convolutional neural network model to determine the comprehensive feature of the target in the frame image to be processed according to the associated features of the frame images.
- the comprehensive feature determination unit 53 optimizes the target feature of the frame image to be processed according to the associated features of the frame images to determine the comprehensive feature of the target in the frame image to be processed. For example, the associated features are sorted according to the inherent order of the frame images in the video, and the TCN model is used to determine the comprehensive feature of the target in the frame image to be processed according to the sorted associated features; the comprehensive feature includes the shape feature and posture feature of the target.
- in the above embodiments, the attention mechanism model is used to determine the associated features of the frame image to be processed and its adjacent frame images, and the target feature of the frame image to be processed is optimized using the associated features. In this way, the consistency of the target's shape and the continuity of the target's posture across the frame images are exploited, improving the accuracy of target feature determination.
- FIG. 6 shows a block diagram of other embodiments of the device for determining the characteristics of the target of the present disclosure.
- the device 6 for determining target features of this embodiment includes: a memory 61 and a processor 62 coupled to the memory 61, the processor 62 being configured to execute, based on instructions stored in the memory 61, the method for determining target features in any of the embodiments of the present disclosure.
- the memory 61 may include, for example, a system memory, a fixed non-volatile storage medium, and the like.
- the system memory stores, for example, an operating system, an application program, a boot loader (Boot Loader), a database, and other programs.
- FIG. 7 shows a block diagram of further embodiments of the device for determining the characteristics of the target of the present disclosure.
- the device 7 for determining target features of this embodiment includes: a memory 710 and a processor 720 coupled to the memory 710. The processor 720 is configured to execute, based on instructions stored in the memory 710, the method for determining target features in any of the foregoing embodiments.
- the memory 710 may include, for example, a system memory, a fixed non-volatile storage medium, and the like.
- the system memory for example, stores an operating system, an application program, a boot loader (Boot Loader), and other programs.
- the device 7 for determining the target feature may also include an input/output interface 730, a network interface 740, a storage interface 750, and so on. These interfaces 730, 740, 750, and the memory 710 and the processor 720 may be connected by a bus 760, for example.
- the input and output interface 730 provides a connection interface for input and output devices such as a display, a mouse, a keyboard, and a touch screen.
- the network interface 740 provides a connection interface for various networked devices.
- the storage interface 750 provides a connection interface for external storage devices such as SD cards and USB flash drives.
- the embodiments of the present disclosure may be provided as methods, systems, or computer program products. Therefore, the present disclosure may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable non-transitory storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
- the method and system of the present disclosure may be implemented in many ways.
- the method and system of the present disclosure can be implemented by software, hardware, firmware or any combination of software, hardware, and firmware.
- the above-mentioned order of the steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above, unless otherwise specifically stated.
- the present disclosure may also be implemented as programs recorded in a recording medium, and these programs include machine-readable instructions for implementing the method according to the present disclosure.
- the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
Abstract
Description
Claims (13)
- A method for determining features of a target, comprising: extracting target features of the target in each frame image, the frame images including a frame image to be processed and adjacent frame images of the frame image to be processed; using an attention mechanism model to extract the association relationships between the target features of the frame images, so as to determine the associated feature of each frame image; and determining, according to the associated features of the frame images, the comprehensive feature of the target in the frame image to be processed.
- The determination method according to claim 1, wherein the extracting of the target features of the target in each frame image comprises: according to the feature vector of each frame image, using a first feature extraction module of a first machine learning model to extract the overall feature information of the target in each frame image; according to the feature vector of each frame image, using a second feature extraction module of the first machine learning model to extract the local feature information of the target in each frame image; and fusing the overall feature information and the local feature information to determine the target feature.
- The determination method according to claim 2, wherein the first feature extraction module is a deconvolution layer, and the overall feature information is skeleton model information of the target.
- The determination method according to claim 2, wherein the second feature extraction module is a fully connected layer, and the local feature information includes local shape features and local posture features.
- The determination method according to claim 4, wherein the local feature information includes: position information of the target in the image; and zoom information, rotation information, and translation information of the target relative to the camera.
- The determination method according to claim 2, wherein the fusing of the overall feature information and the local feature information to determine the target feature comprises: performing a bilinear transformation on the overall feature information and the local feature information to determine the shape feature and posture feature of the target as the target feature.
- The determination method according to claim 1, wherein the attention mechanism model includes a plurality of Transformer modules, and the plurality of Transformer modules are connected in series.
- The determination method according to claim 1, wherein the determining of the comprehensive feature of the target in the frame image to be processed comprises: according to the associated features of the frame images, using a convolutional neural network model to determine the comprehensive feature of the target in the frame image to be processed.
- The determination method according to claim 1, wherein the determining of the comprehensive feature of the target in the frame image to be processed comprises: sorting the associated features according to the inherent order of the frame images in the video; and, according to the sorted associated features of the frame images, using a temporal convolutional network (TCN) model to determine the comprehensive feature of the target in the frame image to be processed, the comprehensive feature including the shape feature and posture feature of the target.
- The determination method according to any one of claims 1-9, wherein the attention mechanism model is trained through the following steps: generating an associated feature queue according to the associated features of the frame images, the order of the associated features in the associated feature queue being different from the inherent order of the corresponding frame images in the video; using a second machine learning model to sort the associated features in the associated feature queue; and training the attention mechanism model according to the sorting result and the inherent order.
- An apparatus for determining features of a target, comprising: a target feature extraction unit for extracting target features of the target in each frame image, the frame images including a frame image to be processed and adjacent frame images of the frame image to be processed; an associated feature determination unit for using an attention mechanism model to extract the association relationships between the target features of the frame images, so as to determine the associated feature of each frame image; and a comprehensive feature determination unit for determining, according to the associated features of the frame images, the comprehensive feature of the target in the frame image to be processed.
- An apparatus for determining features of a target, comprising: a memory; and a processor coupled to the memory, the processor being configured to execute, based on instructions stored in the memory, the method for determining features of a target according to any one of claims 1-10.
- A non-volatile computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method for determining features of a target according to any one of claims 1-10.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910411768.0A CN111783506A (zh) | 2019-05-17 | 2019-05-17 | Method and apparatus for determining target features, and computer-readable storage medium |
CN201910411768.0 | 2019-05-17 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020233427A1 (zh) | 2020-11-26 |
Family
ID=72755588
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/089410 WO2020233427A1 (zh) | 2019-05-17 | 2020-05-09 | Method and apparatus for determining features of a target |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111783506A (zh) |
WO (1) | WO2020233427A1 (zh) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113220859A (zh) * | 2021-06-01 | 2021-08-06 | Ping An Technology (Shenzhen) Co., Ltd. | Image-based question answering method and apparatus, computer device, and storage medium |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107832672B (zh) * | 2017-10-12 | 2020-07-07 | Beihang University | Pedestrian re-identification method using posture information to design multiple loss functions |
CN108510012B (zh) * | 2018-05-04 | 2022-04-01 | Sichuan University | Fast target detection method based on multi-scale feature maps |
CN109472248B (zh) * | 2018-11-22 | 2022-03-25 | Guangdong University of Technology | Pedestrian re-identification method and system, electronic device, and storage medium |
- 2019-05-17: CN application CN201910411768.0A filed; published as CN111783506A (status: pending)
- 2020-05-09: PCT application PCT/CN2020/089410 filed; published as WO2020233427A1 (status: application filing)
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9740949B1 (en) * | 2007-06-14 | 2017-08-22 | Hrl Laboratories, Llc | System and method for detection of objects of interest in imagery |
CN107066973A (zh) * | 2017-04-17 | 2017-08-18 | Hangzhou Dianzi University | Video content description method using a spatio-temporal attention model |
CN109409165A (zh) * | 2017-08-15 | 2019-03-01 | Hangzhou Hikvision Digital Technology Co., Ltd. | Video content recognition method, apparatus, and electronic device |
CN109063626A (zh) * | 2018-07-27 | 2018-12-21 | Shenzhen Jian Technology Co., Ltd. | Dynamic face recognition method and device |
CN109359592A (zh) * | 2018-10-16 | 2019-02-19 | Beijing Dajia Internet Information Technology Co., Ltd. | Video frame processing method and apparatus, electronic device, and storage medium |
CN109544554A (zh) * | 2018-10-18 | 2019-03-29 | Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences | Plant image segmentation and leaf skeleton extraction method and system |
CN109583334A (zh) * | 2018-11-16 | 2019-04-05 | Sun Yat-sen University | Action recognition method and system based on a spatio-temporal correlation neural network |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113378973A (zh) * | 2021-06-29 | 2021-09-10 | Shenyang Yayi Network Technology Co., Ltd. | Image classification method based on a self-attention mechanism |
CN113378973B (zh) * | 2021-06-29 | 2023-08-08 | Shenyang Yayi Network Technology Co., Ltd. | Image classification method based on a self-attention mechanism |
CN114170558A (zh) * | 2021-12-14 | 2022-03-11 | Beijing Youzhuju Network Technology Co., Ltd. | Method, system, device, medium, and product for video processing |
CN117180952A (zh) * | 2023-11-07 | 2023-12-08 | Hunan Zhengming Environmental Protection Co., Ltd. | Multi-directional airflow material-layer circulation semi-dry flue gas desulfurization system and method |
CN117180952B (zh) * | 2023-11-07 | 2024-02-02 | Hunan Zhengming Environmental Protection Co., Ltd. | Multi-directional airflow material-layer circulation semi-dry flue gas desulfurization system and method |
Also Published As
Publication number | Publication date |
---|---|
CN111783506A (zh) | 2020-10-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020233427A1 (zh) | Method and apparatus for determining features of a target | |
Park et al. | 3d human pose estimation using convolutional neural networks with 2d pose information | |
Sun et al. | Compositional human pose regression | |
US10885365B2 (en) | Method and apparatus for detecting object keypoint, and electronic device | |
Chen et al. | Facial expression recognition in video with multiple feature fusion | |
US10769496B2 (en) | Logo detection | |
US9542621B2 (en) | Spatial pyramid pooling networks for image processing | |
EP2893491B1 (en) | Image processing apparatus and method for fitting a deformable shape model to an image using random forest regression voting | |
Yan et al. | Ranking with uncertain labels | |
US9098740B2 (en) | Apparatus, method, and medium detecting object pose | |
WO2020107847A1 (zh) | Fall detection method based on skeleton points and fall detection apparatus thereof | |
Jiang et al. | Dual attention mobdensenet (damdnet) for robust 3d face alignment | |
Ma et al. | Ppt: token-pruned pose transformer for monocular and multi-view human pose estimation | |
Tian et al. | Densely connected attentional pyramid residual network for human pose estimation | |
CN110599395A (zh) | Target image generation method and apparatus, server, and storage medium | |
CN110738650A (zh) | Infectious disease infection identification method, terminal device, and storage medium | |
Xia et al. | Human motion recovery jointly utilizing statistical and kinematic information | |
Liu et al. | Iterative relaxed collaborative representation with adaptive weights learning for noise robust face hallucination | |
Zhang et al. | Multi-local-task learning with global regularization for object tracking | |
Luvizon et al. | Consensus-based optimization for 3D human pose estimation in camera coordinates | |
Chang et al. | 2d–3d pose consistency-based conditional random fields for 3d human pose estimation | |
Verma et al. | Two-stage multi-view deep network for 3D human pose reconstruction using images and its 2D joint heatmaps through enhanced stack-hourglass approach | |
JP6202938B2 (ja) | Image recognition device and image recognition method | |
CN111783497A (zh) | Method and apparatus for determining features of a target in a video, and computer-readable storage medium | |
US20240013357A1 (en) | Recognition system, recognition method, program, learning method, trained model, distillation model and training data set generation method |
Legal Events

Date | Code | Title | Description
---|---|---|---
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20809210; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: PCT application non-entry in European phase | Ref document number: 20809210; Country of ref document: EP; Kind code of ref document: A1 |
| | 32PN | Ep: public notification in the EP bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 24.03.2022) |