WO2024037330A1 - Image feature processing method and apparatus, and storage medium - Google Patents

Image feature processing method and apparatus, and storage medium

Info

Publication number
WO2024037330A1
WO2024037330A1 (PCT/CN2023/110526)
Authority
WO
WIPO (PCT)
Prior art keywords
blocks
gradient
block
feature map
value
Prior art date
Application number
PCT/CN2023/110526
Other languages
French (fr)
Chinese (zh)
Inventor
韩韬
张园
杨明川
王慧芬
薛俊达
Original Assignee
中国电信股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国电信股份有限公司
Publication of WO2024037330A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/771 Feature selection, e.g. selecting representative features from a multi-dimensional feature space
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • the present disclosure relates to the field of image processing, and in particular to an image feature processing method and device, and a storage medium.
  • the 5G era has spawned massive machine-oriented applications, such as the Internet of Vehicles, autonomous driving, the industrial Internet, smart and safe cities, wearables, and video surveillance. Compared with the increasingly saturated video for human vision tasks, the application scenarios of machine vision are far broader. Video coding for machine vision will become one of the main sources of incremental traffic in the 5G and post-5G era.
  • an encoder encodes images or videos oriented to human vision tasks to generate a bitstream.
  • the decoder decodes the bitstream to obtain the images or videos. Because the encoder and decoder are mainly based on convolutional neural networks, they cannot selectively compress images or videos and cannot discard the features of non-important areas in an image or video.
  • the present disclosure provides an image feature processing solution that can effectively discard features of non-important areas in images or videos in order to better complete machine vision intelligent analysis tasks.
  • an image feature processing method is provided, executed by an image feature processing device and including: extracting features of an original image to obtain a source feature map; dividing the source feature map into a predetermined number D of mutually non-overlapping first blocks; calculating the gradient value of each of the first blocks; sorting all the first blocks by their gradient averages so as to delete the first p first blocks with the smallest gradient averages; using a preset encoder to encode the block embedding features and position embedding features of the D-p first blocks that were not deleted, to obtain D second blocks, where the D second blocks include D-p coded visible blocks and p mask tokens located at predetermined positions, and the D-p coded visible blocks correspond one-to-one to the D-p first blocks; and using a preset decoder to decode all the second blocks and the position embedding features of the source feature map, to obtain a reconstructed feature map.
  • calculating the gradient value of each first block includes: in the first block d(i,j), calculating the first gradient g_x of each feature point f(x,y) in a first direction and the second gradient g_y in a second direction, where the first direction and the second direction are mutually perpendicular, 1 ≤ i ≤ d_1, 1 ≤ j ≤ d_2, and D = d_1 × d_2; determining the gradient value of the feature point f(x,y) according to the first gradient g_x and the second gradient g_y; and determining the gradient value of the first block d(i,j) according to the gradient values of all feature points in the first block d(i,j).
  • the gradient value of the feature point f(x,y) is the mean square value of the first gradient g_x and the second gradient g_y.
  • the gradient value of the first block d(i,j) is the average of the gradient values of all feature points in the first block d(i,j).
  • the parameter p is determined according to a preset compression ratio α.
  • the compression ratio α is: α = (p × m_1 × m_2) / (n_1 × n_2), where n_1 × n_2 is the size of the source feature map and m_1 × m_2 is the size of each first block.
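  • as a worked example under this reconstruction of formula (5): dropping p = 11 blocks of size 7 × 7 from a 28 × 28 source feature map gives α = (11 × 7 × 7) / (28 × 28) = 11/16 ≈ 0.69, consistent with the example below in which 5 of 16 blocks are retained.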
  • using a preset decoder to decode all the second blocks and the position embedding features of the source feature map includes: for the t-th second block among all the second blocks, determining the corresponding first vector matrix Q_t, second vector matrix K_t, and third vector matrix V_t according to the first attention weight matrix W_t^Q, the second attention weight matrix W_t^K, and the third attention weight matrix W_t^V of each single head, where 1 ≤ t ≤ D; determining the attention value of each single head according to Q_t, K_t, and V_t; determining the multi-head attention value of the t-th second block according to the attention values of all single heads of the t-th second block; and performing multi-layer perception processing on the multi-head attention value of the t-th second block and the t-th second block, to obtain the reconstructed feature map.
  • the encoder is a Transformer encoder; the decoder is a Transformer decoder.
  • an image feature processing device is provided, including: a first processing module configured to extract features of an original image to obtain a source feature map; a second processing module configured to divide the source feature map into a predetermined number D of mutually non-overlapping first blocks, calculate the gradient value of each of the first blocks, and sort all the first blocks by their gradient averages so as to delete the first p first blocks with the smallest gradient averages; a third processing module configured to use a preset encoder to encode the block embedding features and position embedding features of the D-p first blocks that were not deleted, to obtain D second blocks, where the D second blocks include D-p coded visible blocks and p mask tokens located at predetermined positions, and the D-p coded visible blocks correspond one-to-one to the D-p first blocks; and a fourth processing module configured to use a preset decoder to decode all the second blocks and the position embedding features of the source feature map, to obtain a reconstructed feature map.
  • the gradient value of the feature point f(x,y) is the mean square value of the first gradient g_x and the second gradient g_y.
  • the gradient value of the first block d(i,j) is the average of the gradient values of all feature points in the first block d(i,j).
  • the second processing module is configured to determine the parameter p according to the preset compression ratio α.
  • the compression ratio α is: α = (p × m_1 × m_2) / (n_1 × n_2), where n_1 × n_2 is the size of the source feature map and m_1 × m_2 is the size of each first block.
  • the fourth processing module is configured to: for the t-th second block among all the second blocks, determine the corresponding first vector matrix Q_t, second vector matrix K_t, and third vector matrix V_t according to the first attention weight matrix W_t^Q, the second attention weight matrix W_t^K, and the third attention weight matrix W_t^V of each single head, where 1 ≤ t ≤ D; determine the attention value of each single head according to Q_t, K_t, and V_t; determine the multi-head attention value of the t-th second block according to the attention values of all single heads of the t-th second block; and perform multi-layer perception processing on the multi-head attention value of the t-th second block and the t-th second block, to obtain the reconstructed feature map.
  • the encoder is a Transformer encoder; the decoder is a Transformer decoder.
  • an image feature processing device including: a memory configured to store instructions; and a processor coupled to the memory, the processor being configured to execute, based on the instructions stored in the memory, the method described in any of the above embodiments.
  • a non-transitory computer-readable storage medium stores computer instructions that, when executed by a processor, implement the method described in any of the above embodiments.
  • a computer program product including computer instructions, wherein when the computer instructions are executed by a processor, the method as described in any of the above embodiments is implemented.
  • Figure 1 is a schematic flowchart of an image feature processing method according to an embodiment of the present disclosure
  • Figures 2A-2C are schematic diagrams of feature maps of some embodiments of the present disclosure.
  • Figures 3A-3B are schematic diagrams of feature maps of other embodiments of the present disclosure.
  • Figure 4 is a schematic diagram of an encoder output according to an embodiment of the present disclosure.
  • Figure 5 is a schematic diagram of a decoder according to an embodiment of the present disclosure.
  • Figure 6 is a schematic structural diagram of an image feature processing device according to an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of an image feature processing device according to another embodiment of the present disclosure.
  • any specific values are to be construed as illustrative only and not as limiting. Accordingly, other examples of the exemplary embodiments may have different values.
  • Figure 1 is a schematic flowchart of an image feature processing method according to an embodiment of the present disclosure.
  • the following image feature processing method is performed by an image feature processing device.
  • in step 101, the features of the original image are extracted to obtain the source feature map.
  • the original image is input into a CNN (Convolutional Neural Network) to obtain the source feature map.
  • in step 102, the source feature map is divided into a predetermined number D of first blocks that do not overlap with each other.
  • the size of the source feature map is 28 × 28.
  • the size of each block is 7 × 7; that is, the source feature map is divided into 4 × 4 blocks, d(0,0) to d(3,3), as shown in Figure 2C.
  • in step 103, the gradient value of each of the first blocks is calculated.
  • in the first block d(i,j), the first gradient g_x in the first direction and the second gradient g_y in the second direction of each feature point f(x,y) are calculated, where the first direction and the second direction are mutually perpendicular.
  • the first direction is the x-axis direction in the preset plane
  • the second direction is the y-axis direction in the preset plane.
  • the first gradient g_x of the feature point f(x,y) in the first direction is: g_x = f(x+1, y) - f(x, y)    (1)
  • the second gradient g_y of the feature point f(x,y) in the second direction is: g_y = f(x, y+1) - f(x, y)    (2)
  • the gradient value of the feature point f(x,y) is determined according to the first gradient g_x and the second gradient g_y.
  • the gradient value of the feature point f(x,y) is the mean square value of the first gradient g_x and the second gradient g_y, i.e., g(x, y) = (g_x^2 + g_y^2) / 2    (3)
  • the gradient value of the first block d(i,j) is the average of the gradient values of all feature points in the first block d(i,j), i.e., G(i, j) = (1 / (m_1 × m_2)) Σ g(x, y)    (4)
  • in step 104, all the first blocks are sorted by gradient average, so as to delete the top p first blocks with the smallest gradient averages.
  • the parameter p is determined according to a preset compression ratio α, as α = (p × m_1 × m_2) / (n_1 × n_2), where n_1 × n_2 is the size of the source feature map. The number of discarded blocks can therefore be chosen as needed, and the compression ratio of the features can be adjusted flexibly.
  • for example, the feature map shown in Figure 3A includes 16 blocks, i.e., D = 16; all the blocks are sorted by gradient average and the 11 blocks with the smallest gradient averages are deleted, i.e., p = 11, so that only D-p = 5 blocks are retained.
  • the five retained blocks are: the 2nd block, the 7th block, the 9th block, the 14th block, and the 16th block.
  • in step 105, a preset encoder is used to encode the block embedding features and position embedding features of the D-p first blocks that have not been deleted, to obtain D second blocks, where the D second blocks include D-p coded visible blocks and p mask tokens located at predetermined positions, and the D-p coded visible blocks correspond one-to-one to the D-p first blocks.
  • the preset encoder is a Transformer encoder.
  • the 5 blocks shown in Figure 3B, namely the 2nd, 7th, 9th, 14th, and 16th blocks, are input into the trained encoder so that the encoder outputs the encoding result.
  • the encoding result includes 16 blocks.
  • the positions of the five coded visible blocks in the encoding result correspond to the positions of the five blocks in Figure 3B.
  • the five coded visible blocks are the 2nd, 7th, 9th, 14th, and 16th blocks of the encoding result.
  • the encoder adds a corresponding mask token (Mask Token) at the position of each discarded block, as shown by the 11 dark boxes 42 in Figure 4. That is, the encoding result includes 11 mask tokens, at the 1st, 3rd, 4th, 5th, 6th, 8th, 10th, 11th, 12th, 13th, and 15th positions of the encoding result.
  • in step 106, a preset decoder is used to decode all the second blocks and the position embedding features of the source feature map to obtain the reconstructed feature map.
  • the preset decoder is a Transformer decoder.
  • the structure of the decoder is as shown in Figure 5.
  • the input features are processed by the multi-head self-attention (Multi-head Self Attention) layer after normalization.
  • the calculation is shown in formula (6), where F_t is the feature of the t-th second block.
  • the attention value s t of each single head is determined according to the first vector matrix Q t , the second vector matrix K t and the third vector matrix V t , as shown in formula (7).
  • the multi-head attention value of the t-th second block is determined based on the attention values of all single-heads of the t-th second block, as shown in formula (8).
  • the multi-head attention value of the t-th second block and the t-th second block are input into the MLP (Multi-Layer Perceptron) layer for corresponding processing, to obtain the reconstructed feature map.
  • by discarding the first p blocks with the smallest gradient averages in the feature map, the features of non-important areas in an image or video can be effectively discarded, so as to better complete machine vision intelligent analysis tasks.
  • FIG. 6 is a schematic structural diagram of an image feature processing device according to an embodiment of the present disclosure. As shown in FIG. 6 , the image feature processing device includes a first processing module 61 , a second processing module 62 , a third processing module 63 and a fourth processing module 64 .
  • the first processing module 61 is configured to extract features of the original image to obtain a source feature map.
  • the original image is input into the CNN to obtain the source feature map.
  • the second processing module 62 is configured to divide the source feature map into a predetermined number D of mutually non-overlapping first blocks, calculate the gradient value of each of the first blocks, and sort all the first blocks by their gradient averages so as to remove the top p first blocks with the smallest gradient averages.
  • the second processing module 62 calculates, in the first block d(i,j), the first gradient g_x of each feature point f(x,y) in the first direction and the second gradient g_y in the second direction, where the first direction and the second direction are mutually perpendicular, 1 ≤ i ≤ d_1, 1 ≤ j ≤ d_2, and D = d_1 × d_2.
  • the first direction is the x-axis direction in the preset plane
  • the second direction is the y-axis direction in the preset plane.
  • the second processing module 62 determines the gradient value of the feature point f(x,y) according to the first gradient g_x and the second gradient g_y.
  • the gradient value of the first block d(i,j) is determined according to the gradient values of all feature points in the first block d(i,j).
  • the gradient value of the feature point f(x,y) is the mean square value of the first gradient g_x and the second gradient g_y, as shown in formula (3) above.
  • the gradient value of the first block d(i,j) is the average of the gradient values of all feature points in the first block d(i,j), as shown in formula (4) above.
  • the parameter p is determined according to a preset compression ratio α.
  • the parameter p is determined according to the above formula (5).
  • the third processing module 63 is configured to use a preset encoder to encode the block embedding features and position embedding features of the D-p first blocks that have not been deleted, to obtain D second blocks, where the D second blocks include D-p coded visible blocks and p mask tokens located at predetermined positions, and the D-p coded visible blocks correspond one-to-one to the D-p first blocks.
  • the preset encoder is a Transformer encoder.
  • the fourth processing module 64 is configured to use a preset decoder to decode all the second blocks and the position embedded features of the source feature map to obtain the reconstructed feature map.
  • the preset decoder is a Transformer decoder.
  • the fourth processing module 64, for the t-th second block among all the second blocks, determines the corresponding first vector matrix Q_t, second vector matrix K_t, and third vector matrix V_t according to the first attention weight matrix W_t^Q, the second attention weight matrix W_t^K, and the third attention weight matrix W_t^V of each single head, where 1 ≤ t ≤ D.
  • the above formula (6) is used for calculation.
  • the fourth processing module 64 determines the attention value of each single head according to the first vector matrix Q_t, the second vector matrix K_t, and the third vector matrix V_t.
  • the above formula (7) is used for calculation.
  • the fourth processing module 64 determines the multi-head attention value of the t-th second block based on the attention values of all single heads of the t-th second block. For example, the above formula (8) is used for calculation.
  • multi-layer perception processing is performed on the multi-head attention value of the t-th second block and the t-th second block to obtain the reconstructed feature map.
  • FIG. 7 is a schematic structural diagram of an image feature processing device according to another embodiment of the present disclosure. As shown in FIG. 7 , the image feature processing device includes a memory 71 and a processor 72 .
  • the memory 71 is used to store instructions, and the processor 72 is coupled to the memory 71 .
  • the processor 72 is configured to execute the method involved in any embodiment in FIG. 1 based on the instructions stored in the memory.
  • the image feature processing device also includes a communication interface 73 for information interaction with other devices.
  • the image feature processing device also includes a bus 74, through which the processor 72, the communication interface 73, and the memory 71 complete communication with each other.
  • the memory 71 may include high-speed RAM memory, and may also include non-volatile memory, such as at least one disk memory.
  • the memory 71 may also be a memory array.
  • the memory 71 may also be divided into blocks, and the blocks may be combined into virtual volumes according to certain rules.
  • the processor 72 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present disclosure.
  • the present disclosure also relates to a computer-readable storage medium, where the computer-readable storage medium stores computer instructions that, when executed by the processor, implement the method involved in any embodiment of Figure 1.
  • by calculating the gradient values of the blocks of the feature map, the present disclosure can filter out, according to the sorted gradient values, the important areas on which the feature information focuses, and then discard the non-important blocks of the features to achieve feature compression;
  • the present disclosure can flexibly control the compression rate by controlling the ratio of discarding feature blocks, and flexibly match various compression rate requirements;
  • the encoding part of the present disclosure can be added to the image device of the machine vision system as a machine vision encoding module, and the decoding part of the present disclosure can be added to the edge function set of the machine vision system as a machine vision decoding module, thereby improving compression efficiency.
  • the functional units described above can be implemented as a general-purpose processor, a programmable logic controller (PLC), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic device for performing the functions described in this disclosure.
  • the program can be stored in a computer-readable storage medium.
  • the storage medium can be a read-only memory, a magnetic disk or an optical disk, etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Editing Of Facsimile Originals (AREA)
  • Image Processing (AREA)

Abstract

The present disclosure relates to the field of image processing, and provides an image feature processing method and apparatus, and a storage medium. The image feature processing method comprises: extracting features of an original image so as to obtain a source feature map; dividing the source feature map into a predetermined number D of mutually non-overlapping first blocks; calculating a gradient value of each of the first blocks; sorting all of the first blocks according to their gradient average values so as to delete the first p first blocks having the smallest gradient average values; using a preset encoder to encode block embedding features and position embedding features of the D-p first blocks which are not deleted, so as to obtain D second blocks, wherein the D second blocks comprise D-p coded visible blocks and p mask tokens located at predetermined positions, and the D-p coded visible blocks are in one-to-one correspondence with the D-p first blocks; and using a preset decoder to decode all of the second blocks and position embedding features of the source feature map so as to obtain a reconstructed feature map.

Description

Image feature processing method and device, and storage medium
Cross-Reference to Related Applications
This application is based on, and claims priority to, the CN application No. 202210998237.8 filed on August 19, 2022, the disclosure of which is hereby incorporated into this application in its entirety.
Technical Field
The present disclosure relates to the field of image processing, and in particular to an image feature processing method and device, and a storage medium.
Background
With the rapid development of 5G, big data, and artificial intelligence, and against the background of image and video big data applications, media content such as images and videos is widely used in intelligent vision tasks such as object detection, object tracking, image classification, image segmentation, and pedestrian re-identification.
The 5G era has spawned massive machine-oriented applications, such as the Internet of Vehicles, autonomous driving, the industrial Internet, smart and safe cities, wearables, and video surveillance; compared with the increasingly saturated video for human vision tasks, the application scenarios of machine vision are far broader. Video coding for machine vision will become one of the main sources of incremental traffic in the 5G and post-5G era.
Summary
The inventors have noted that, in the related art, an encoder encodes images or videos oriented to human vision tasks to generate a bitstream, and a decoder decodes the bitstream to obtain the images or videos. Because such encoders and decoders are mainly based on convolutional neural networks, they cannot selectively compress images or videos and cannot discard the features of non-important areas in an image or video.
Accordingly, the present disclosure provides an image feature processing solution that can effectively discard the features of non-important areas in images or videos, so as to better complete machine vision intelligent analysis tasks.
According to a first aspect of the embodiments of the present disclosure, an image feature processing method is provided, executed by an image feature processing device and including: extracting features of an original image to obtain a source feature map; dividing the source feature map into a predetermined number D of mutually non-overlapping first blocks; calculating the gradient value of each of the first blocks; sorting all the first blocks by their gradient averages so as to delete the first p first blocks with the smallest gradient averages; using a preset encoder to encode the block embedding features and position embedding features of the D-p first blocks that were not deleted, to obtain D second blocks, where the D second blocks include D-p coded visible blocks and p mask tokens located at predetermined positions, and the D-p coded visible blocks correspond one-to-one to the D-p first blocks; and using a preset decoder to decode all the second blocks and the position embedding features of the source feature map, to obtain a reconstructed feature map.
In some embodiments, calculating the gradient value of each first block includes: in the first block d(i,j), calculating the first gradient g_x of each feature point f(x,y) in a first direction and the second gradient g_y in a second direction, where the first direction and the second direction are mutually perpendicular, 1 ≤ i ≤ d_1, 1 ≤ j ≤ d_2, and D = d_1 × d_2; determining the gradient value of the feature point f(x,y) according to the first gradient g_x and the second gradient g_y; and determining the gradient value of the first block d(i,j) according to the gradient values of all feature points in the first block d(i,j).
In some embodiments, the gradient value of the feature point f(x,y) is the mean square value of the first gradient g_x and the second gradient g_y.
In some embodiments, the gradient value of the first block d(i,j) is the average of the gradient values of all feature points in the first block d(i,j).
In some embodiments, the parameter p is determined according to a preset compression ratio α.
In some embodiments, the compression ratio α is:
α = (p × m_1 × m_2) / (n_1 × n_2)
where n_1 × n_2 is the size of the source feature map and m_1 × m_2 is the size of each first block.
In some embodiments, using a preset decoder to decode all the second blocks and the position embedding features of the source feature map includes: for the t-th second block among all the second blocks, determining the corresponding first vector matrix Q_t, second vector matrix K_t, and third vector matrix V_t according to the first attention weight matrix W_t^Q, the second attention weight matrix W_t^K, and the third attention weight matrix W_t^V of each single head, where 1 ≤ t ≤ D; determining the attention value of each single head according to the first vector matrix Q_t, the second vector matrix K_t, and the third vector matrix V_t; determining the multi-head attention value of the t-th second block according to the attention values of all single heads of the t-th second block; and performing multi-layer perception processing on the multi-head attention value of the t-th second block and the t-th second block, to obtain the reconstructed feature map.
In some embodiments, the encoder is a Transformer encoder, and the decoder is a Transformer decoder.
According to a second aspect of the embodiments of the present disclosure, an image feature processing device is provided, including: a first processing module configured to extract features of an original image to obtain a source feature map; a second processing module configured to divide the source feature map into a predetermined number D of mutually non-overlapping first blocks, calculate the gradient value of each of the first blocks, and sort all the first blocks by their gradient averages so as to delete the first p first blocks with the smallest gradient averages; a third processing module configured to use a preset encoder to encode the block embedding features and position embedding features of the D-p first blocks that were not deleted, to obtain D second blocks, where the D second blocks include D-p coded visible blocks and p mask tokens located at predetermined positions, and the D-p coded visible blocks correspond one-to-one to the D-p first blocks; and a fourth processing module configured to use a preset decoder to decode all the second blocks and the position embedding features of the source feature map, to obtain a reconstructed feature map.
In some embodiments, the second processing module is configured to: in the first block d(i,j), calculate the first gradient g_x of each feature point f(x,y) in a first direction and the second gradient g_y in a second direction, where the first direction and the second direction are mutually perpendicular, 1 ≤ i ≤ d_1, 1 ≤ j ≤ d_2, and D = d_1 × d_2; determine the gradient value of the feature point f(x,y) according to the first gradient g_x and the second gradient g_y; and determine the gradient value of the first block d(i,j) according to the gradient values of all feature points in the first block d(i,j).
In some embodiments, the gradient value of the feature point f(x,y) is the mean square value of the first gradient g_x and the second gradient g_y.
In some embodiments, the gradient value of the first block d(i,j) is the average of the gradient values of all feature points in the first block d(i,j).
In some embodiments, the second processing module is configured to determine the parameter p according to a preset compression ratio α.
In some embodiments, the compression ratio α is:
α = (p × m_1 × m_2) / (n_1 × n_2)
where n_1 × n_2 is the size of the source feature map and m_1 × m_2 is the size of each first block.
In some embodiments, the fourth processing module is configured to: for the t-th second block among all the second blocks, determine the corresponding first vector matrix Q_t, second vector matrix K_t, and third vector matrix V_t according to the first attention weight matrix W_t^Q, the second attention weight matrix W_t^K, and the third attention weight matrix W_t^V of each single head, where 1 ≤ t ≤ D; determine the attention value of each single head according to the first vector matrix Q_t, the second vector matrix K_t, and the third vector matrix V_t; determine the multi-head attention value of the t-th second block according to the attention values of all single heads of the t-th second block; and perform multi-layer perception processing on the multi-head attention value of the t-th second block and the t-th second block, to obtain the reconstructed feature map.
In some embodiments, the encoder is a Transformer encoder, and the decoder is a Transformer decoder.
According to a third aspect of the embodiments of the present disclosure, an image feature processing device is provided, including: a memory configured to store instructions; and a processor coupled to the memory, the processor being configured to execute, based on the instructions stored in the memory, the method described in any of the above embodiments.
According to a fourth aspect of the embodiments of the present disclosure, a non-transitory computer-readable storage medium is provided, where the computer-readable storage medium stores computer instructions that, when executed by a processor, implement the method described in any of the above embodiments.
According to a fifth aspect of the embodiments of the present disclosure, a computer program product is provided, including computer instructions that, when executed by a processor, implement the method described in any of the above embodiments.
Other features and advantages of the present disclosure will become apparent from the following detailed description of exemplary embodiments of the present disclosure with reference to the accompanying drawings.
Brief Description of the Drawings
To explain the embodiments of the present disclosure or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present disclosure; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Figure 1 is a schematic flowchart of an image feature processing method according to an embodiment of the present disclosure;
Figures 2A-2C are schematic diagrams of feature maps of some embodiments of the present disclosure;
Figures 3A-3B are schematic diagrams of feature maps of other embodiments of the present disclosure;
Figure 4 is a schematic diagram of an encoder output according to an embodiment of the present disclosure;
Figure 5 is a schematic diagram of a decoder according to an embodiment of the present disclosure;
Figure 6 is a schematic structural diagram of an image feature processing device according to an embodiment of the present disclosure;
Figure 7 is a schematic structural diagram of an image feature processing device according to another embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present disclosure, rather than all of them. The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the disclosure or its application or uses. Based on the embodiments in this disclosure, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of this disclosure.
Unless otherwise specifically stated, the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the disclosure.
At the same time, it should be understood that, for convenience of description, the dimensions of the parts shown in the drawings are not drawn to actual scale.
Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods, and devices should be considered part of the specification.
In all examples shown and discussed here, any specific value should be interpreted as merely exemplary rather than limiting; other examples of the exemplary embodiments may therefore have different values.
It should be noted that similar reference numerals and letters denote similar items in the following figures; therefore, once an item is defined in one figure, it does not need to be discussed further in subsequent figures.
Figure 1 is a schematic flowchart of an image feature processing method according to an embodiment of the present disclosure. In some embodiments, the following image feature processing method is performed by an image feature processing device.
In step 101, features of the original image are extracted to obtain a source feature map.
In some embodiments, the original image is input into a CNN (Convolutional Neural Network) to obtain the source feature map.
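To make step 101 concrete, the sketch below extracts a source feature map with a pretrained convolutional backbone. This is a minimal sketch, not the implementation mandated by the disclosure: the choice of torchvision's ResNet-50, the truncation point after its first residual stage, and the input size are assumptions for illustration only.

```python
# Minimal sketch of step 101: obtain a source feature map from a CNN.
# Assumption: torchvision's ResNet-50 truncated after its first residual
# stage; the disclosure only requires "a CNN".
import torch
from torchvision.models import resnet50

backbone = resnet50(weights="IMAGENET1K_V1")
feature_extractor = torch.nn.Sequential(
    backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool,
    backbone.layer1,
)
feature_extractor.eval()

image = torch.randn(1, 3, 224, 224)  # placeholder for the original image
with torch.no_grad():
    source_feature_map = feature_extractor(image)
print(source_feature_map.shape)  # torch.Size([1, 256, 56, 56])
```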
In step 102, the source feature map is divided into a predetermined number D of mutually non-overlapping first blocks.
In some embodiments, if the size of the source feature map is n_1 × n_2 and the size of each first block is m_1 × m_2, then D = d_1 × d_2, where d_1 = n_1/m_1 and d_2 = n_2/m_2.
For example, as shown in Figure 2A, the size of the source feature map is 28 × 28. As shown in Figure 2B, the size of each block is 7 × 7; that is, the source feature map is divided into 4 × 4 blocks, d(0,0) to d(3,3), as shown in Figure 2C.
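The division of step 102 amounts to a reshape when m_1 divides n_1 and m_2 divides n_2. The sketch below assumes a single-channel n_1 × n_2 map for clarity; a real feature map would carry a channel dimension as well.

```python
# Minimal sketch of step 102: divide an n1 x n2 feature map into
# D = d1 x d2 non-overlapping m1 x m2 first blocks.
import numpy as np

def split_into_blocks(feature_map: np.ndarray, m1: int, m2: int) -> np.ndarray:
    n1, n2 = feature_map.shape
    assert n1 % m1 == 0 and n2 % m2 == 0, "block size must divide map size"
    d1, d2 = n1 // m1, n2 // m2
    # blocks[i, j] is the first block d(i, j)
    return feature_map.reshape(d1, m1, d2, m2).swapaxes(1, 2)

blocks = split_into_blocks(np.random.rand(28, 28), 7, 7)
print(blocks.shape)  # (4, 4, 7, 7): D = 16 blocks, as in Figures 2A-2C
```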
In step 103, the gradient value of each of the first blocks is calculated.
In some embodiments, in the first block d(i,j), the first gradient g_x of each feature point f(x,y) in a first direction and the second gradient g_y in a second direction are calculated, where the first direction and the second direction are mutually perpendicular, 1 ≤ i ≤ d_1, 1 ≤ j ≤ d_2, and D = d_1 × d_2.
For example, the first direction is the x-axis direction in a preset plane, and the second direction is the y-axis direction in the preset plane.
The first gradient g_x of the feature point f(x,y) in the first direction is:
g_x = f(x+1, y) - f(x, y)    (1)
The second gradient g_y of the feature point f(x,y) in the second direction is:
g_y = f(x, y+1) - f(x, y)    (2)
Next, the gradient value of the feature point f(x,y) is determined according to the first gradient g_x and the second gradient g_y.
In some embodiments, the gradient value of the feature point f(x,y) is the mean square value of the first gradient g_x and the second gradient g_y, that is:
g(x, y) = (g_x^2 + g_y^2) / 2    (3)
Next, the gradient value of the first block d(i,j) is determined according to the gradient values of all feature points in the first block d(i,j).
In some embodiments, the gradient value of the first block d(i,j) is the average of the gradient values of all feature points in the first block d(i,j), that is:
G(i, j) = (1 / (m_1 × m_2)) Σ g(x, y)    (4)
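The block scores of step 103 then follow from formulas (1)-(4). In this sketch the forward differences of formulas (1)-(2) and the clamping at block borders are assumptions; only the mean-square combination (3) and the block average (4) are fixed by the text above.

```python
# Minimal sketch of step 103: score each first block d(i, j) by the average
# of per-point mean-square gradients.
import numpy as np

def block_gradient_scores(blocks: np.ndarray) -> np.ndarray:
    d1, d2, _, _ = blocks.shape
    scores = np.zeros((d1, d2))
    for i in range(d1):
        for j in range(d2):
            f = blocks[i, j]
            gx = np.diff(f, axis=0, append=f[-1:, :])  # formula (1), clamped edge
            gy = np.diff(f, axis=1, append=f[:, -1:])  # formula (2), clamped edge
            point_grad = (gx ** 2 + gy ** 2) / 2.0     # formula (3)
            scores[i, j] = point_grad.mean()           # formula (4)
    return scores
```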
In step 104, all the first blocks are sorted by their gradient averages, so as to delete the first p first blocks with the smallest gradient averages.
In some embodiments, the parameter p is determined according to a preset compression ratio α.
For example, the relationship between the compression ratio α and the parameter p is shown in formula (5):
α = (p × m_1 × m_2) / (n_1 × n_2)    (5)
where n_1 × n_2 is the size of the source feature map. The number of discarded blocks can therefore be chosen as needed; that is, the compression ratio of the features can be adjusted flexibly.
For example, as shown in Figure 3A, the feature map includes 16 blocks, i.e., D = 16. All the blocks are sorted by gradient average, and the 11 blocks with the smallest gradient averages are deleted, i.e., p = 11, so that only 5 blocks remain in the feature map, i.e., D-p = 5. As shown in Figure 3B, the five retained blocks are the 2nd, 7th, 9th, 14th, and 16th blocks.
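Step 104 is then an argsort over the block scores. In the sketch below, the expression used to derive p inverts the reconstruction of formula (5) given above, and the rounding to the nearest integer is an assumption.

```python
# Minimal sketch of step 104: derive p from the preset compression ratio and
# keep only the D-p blocks with the largest gradient averages.
import numpy as np

def select_visible_blocks(scores: np.ndarray, alpha: float,
                          m1: int, m2: int, n1: int, n2: int) -> np.ndarray:
    p = int(round(alpha * n1 * n2 / (m1 * m2)))  # inverted formula (5)
    order = np.argsort(scores.ravel())           # ascending gradient average
    dropped = order[:p]                          # the p least informative blocks
    return np.setdiff1d(np.arange(scores.size), dropped)  # indices kept

# With alpha = 11/16 on the 4 x 4 example, p = 11 and 5 blocks survive,
# matching Figures 3A-3B.
```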
In step 105, a preset encoder is used to encode the block embedding features and position embedding features of the D-p first blocks that were not deleted, to obtain D second blocks, where the D second blocks include D-p coded visible blocks and p mask tokens located at predetermined positions, and the D-p coded visible blocks correspond one-to-one to the D-p first blocks.
In some embodiments, the preset encoder is a Transformer encoder.
For example, the 5 blocks shown in Figure 3B, namely the 2nd, 7th, 9th, 14th, and 16th blocks, are input into the trained encoder so that the encoder outputs an encoding result. The encoding result includes 16 blocks: 5 encoded visible patches corresponding to the 5 blocks in Figure 3B, shown as the 5 white boxes 41 in Figure 4, and 11 mask tokens, shown as the 11 dark boxes 42 in Figure 4.
It should be noted that the positions of the 5 coded visible blocks in the encoding result correspond to the positions of the 5 blocks in Figure 3B. For example, as shown in Figure 4, the 5 coded visible blocks are the 2nd, 7th, 9th, 14th, and 16th blocks of the encoding result.
It should also be noted that, because 11 blocks were discarded from the feature map shown in Figure 3B (the 1st, 3rd, 4th, 5th, 6th, 8th, 10th, 11th, 12th, 13th, and 15th blocks), the encoder adds a corresponding mask token (Mask Token) at the position of each discarded block, as shown by the 11 dark boxes 42 in Figure 4. That is, the encoding result includes 11 mask tokens, at the 1st, 3rd, 4th, 5th, 6th, 8th, 10th, 11th, 12th, 13th, and 15th positions of the encoding result.
It should be noted that, since how the encoder is trained is not the inventive point of the present disclosure, it is not described in detail here.
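A sketch of step 105 follows: only the visible blocks pass through a Transformer encoder, and a shared learnable mask token fills the p discarded positions so that the output again has D entries. The embedding width, layer count, and the use of PyTorch's nn.TransformerEncoder are assumptions for illustration; the disclosure only requires a preset (trained) Transformer encoder.

```python
# Minimal sketch of step 105: encode the D-p visible blocks, then scatter the
# coded visible blocks back to their positions and place mask tokens elsewhere.
import torch
import torch.nn as nn

D, dim = 16, 128
pos_embed = nn.Parameter(torch.zeros(D, dim))   # position embedding features
mask_token = nn.Parameter(torch.zeros(dim))     # shared mask token
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
    num_layers=2,
)

def encode_with_mask_tokens(patch_embed: torch.Tensor,
                            visible: torch.Tensor) -> torch.Tensor:
    # patch_embed: (D, dim) block embedding features for all D positions
    tokens = patch_embed[visible] + pos_embed[visible]  # D-p visible tokens
    encoded = encoder(tokens.unsqueeze(0)).squeeze(0)   # coded visible blocks
    out = mask_token.expand(D, dim).clone()             # mask tokens by default
    out[visible] = encoded                              # one-to-one scatter
    return out                                          # the D second blocks
```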
In step 106, a preset decoder is used to decode all the second blocks and the position embedding features of the source feature map, to obtain the reconstructed feature map.
In some embodiments, the preset decoder is a Transformer decoder.
In some embodiments, the structure of the decoder is shown in Figure 5.
As shown in Figure 5, the input features are normalized and then processed by a multi-head self-attention (Multi-head Self Attention) layer.
For example, for the t-th second block, the corresponding first vector matrix Q_t, second vector matrix K_t, and third vector matrix V_t are determined according to the first attention weight matrix W_t^Q, the second attention weight matrix W_t^K, and the third attention weight matrix W_t^V of each single head, where 1 ≤ t ≤ D.
For example, the calculation is shown in formula (6), where F_t is the feature of the t-th second block:
Q_t = F_t W_t^Q,  K_t = F_t W_t^K,  V_t = F_t W_t^V    (6)
Next, the attention value s_t of each single head is determined according to the first vector matrix Q_t, the second vector matrix K_t, and the third vector matrix V_t, as shown in formula (7):
s_t = τ(Q_t K_t^T / √d_K) V_t    (7)
where d_K is the dimension of the matrix K_t, δ denotes the attention calculation function (here the scaled dot product), and τ is the Softmax logistic regression function.
Next, the multi-head attention value of the t-th second block is determined according to the attention values of all single heads of the t-th second block, as shown in formula (8):
S_t = Concat(s_t^1, ..., s_t^h) W^O    (8)
where Concat(·) is the concatenate function and W^O is a parameter matrix.
Next, the multi-head attention value of the t-th second block and the t-th second block are input into an MLP (Multi-Layer Perceptron) layer for corresponding processing, to obtain the reconstructed feature map.
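Putting formulas (6)-(8) and the MLP layer together, one decoder block in the style of Figure 5 might look as follows. This is a sketch under assumptions: the pre-norm residual wiring, head count, and MLP width are illustrative choices; nn.MultiheadAttention internally realizes the per-head projections of formula (6), the scaled dot-product attention of formula (7), and the concatenation with output projection of formula (8).

```python
# Minimal sketch of step 106: one pre-norm decoder block combining multi-head
# self-attention (formulas (6)-(8)) with an MLP, as in Figure 5.
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    def __init__(self, dim: int = 128, heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)                                  # Normalize
        x = x + self.attn(h, h, h, need_weights=False)[0]  # multi-head attention
        x = x + self.mlp(self.norm2(x))                    # MLP processing
        return x

tokens = torch.randn(1, 16, 128)        # D second blocks plus position embeds
reconstructed = DecoderBlock()(tokens)  # (1, 16, 128) reconstructed features
```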
In the image feature processing method provided by the above embodiments of the present disclosure, by discarding the first p blocks with the smallest gradient averages in the feature map, the features of non-important areas in an image or video can be effectively discarded, so as to better complete machine vision intelligent analysis tasks.
图6为本公开一个实施例的图像特征处理装置的结构示意图。如图6所示,图像特征处理装置包括第一处理模块61、第二处理模块62、第三处理模块63和第四处理模块64。FIG. 6 is a schematic structural diagram of an image feature processing device according to an embodiment of the present disclosure. As shown in FIG. 6 , the image feature processing device includes a first processing module 61 , a second processing module 62 , a third processing module 63 and a fourth processing module 64 .
第一处理模块61被配置为提取原始图像的特征,以得到源特征图。The first processing module 61 is configured to extract features of the original image to obtain a source feature map.
在一些实施例中,将原始图像输入CNN,以得到源特征图。In some embodiments, the original image is input into the CNN to obtain the source feature map.
第二处理模块62被配置为将源特征图分割为预定数量D个相互不重叠的第一分块,计算全部第一分块中的每个第一分块的梯度值,将全部第一分块按照梯度平均值进行排序,以删除梯度平均值最小的前p个第一分块。The second processing module 62 is configured to divide the source feature map into a predetermined number D of mutually non-overlapping first blocks, calculate the gradient value of each first block in all first blocks, and divide all first blocks into Blocks are sorted by gradient mean to remove the top p first blocks with the smallest gradient mean.
在一些实施例中,若源特征图的大小为n1×n2,每个第一分块的大小为m1×m2,则D=d1×d2,其中d1=n1/m1,d2=n2/m2In some embodiments, if the size of the source feature map is n 1 ×n 2 and the size of each first block is m 1 ×m 2 , then D=d 1 ×d 2 , where d 1 =n 1 / m 1 , d 2 =n 2 /m 2 .
在一些实施例中,第二处理模块62在第一分块d(i,j)中,计算每个特征点f(x,y)在第一方向上的第一梯度gx和在第二方向上的第二梯度gy,其中第一方向和第二方向相互垂直,其中1≤i≤d1,1≤j≤d2,D=d1×d2In some embodiments, the second processing module 62 calculates the first gradient g x of each feature point f (x, y) in the first direction and the second gradient g x in the first block d (i, j). The second gradient g y in the direction, where the first direction and the second direction are perpendicular to each other, where 1≤i≤d 1 , 1≤j≤d 2 , D=d 1 ×d 2 .
例如,第一方向为预设平面中的x轴方向,第二方向为预设平面中的y轴方向。For example, the first direction is the x-axis direction in the preset plane, and the second direction is the y-axis direction in the preset plane.
接下来,第二处理模块62根据第一梯度gx和第二梯度gy确定特征点f(x,y)的梯度值。根据第一分块d(i,j)中的全部特征点的梯度值确定第一分块d(i,j)的梯度值。Next, the second processing module 62 determines the gradient value of the feature point f(x, y) according to the first gradient g x and the second gradient gy . The gradient value of the first block d(i,j) is determined based on the gradient values of all feature points in the first block d(i,j).
在一些实施例中,特征点f(x,y)的梯度值为第一梯度gx和第二梯度gy的均方值,如上述公式(3)所示。In some embodiments, the gradient value of the feature point f(x, y) is the mean square value of the first gradient g x and the second gradient g y , as shown in the above formula (3).
在一些实施例中,第一分块d(i,j)的梯度值为第一分块d(i,j)中的全部特征点的梯度值的平均值,如上述公式(4)所示。In some embodiments, the gradient value of the first block d(i,j) is the average of the gradient values of all feature points in the first block d(i,j), as shown in the above formula (4) .
在一些实施例中,根据预设的压缩比α确定参数p。例如根据上述公式(5)确定参数p。In some embodiments, the parameter p is determined according to a preset compression ratio α. For example, the parameter p is determined according to the above formula (5).
第三处理模块63被配置为利用预设的编码器对未被删除的D-p个第一分块的分 块嵌入特征和位置嵌入特征进行编码处理,以得到D个第二分块,其中D个第二分块中包括位于预定位置上的D-p个编码可视分块和p个掩膜令牌,其中D-p个编码可视分块与D-p个第一分块一一对应。The third processing module 63 is configured to use a preset encoder to segment the Dp first blocks that have not been deleted. Block embedding features and position embedding features are encoded to obtain D second blocks, where the D second blocks include Dp encoded visual blocks and p mask tokens located at predetermined positions, where Dp coded visible blocks correspond to Dp first blocks one-to-one.
在一些实施例中,预设的编码器为Transformer编码器。In some embodiments, the default encoder is a Transformer encoder.
The fourth processing module 64 is configured to use a preset decoder to decode all the second blocks together with the position embedding features of the source feature map, to obtain a reconstructed feature map.
In some embodiments, the preset decoder is a Transformer decoder.
In some embodiments, for the t-th second block among all the second blocks, the fourth processing module 64 determines the corresponding first vector matrix Qt, second vector matrix Kt and third vector matrix Vt according to the first attention weight matrix, the second attention weight matrix and the third attention weight matrix of each single head, where 1≤t≤D. For example, the calculation follows formula (6) above.
Next, the fourth processing module 64 determines the attention value of each single head according to the first vector matrix Qt, the second vector matrix Kt and the third vector matrix Vt, for example according to formula (7) above.
Next, the fourth processing module 64 determines the multi-head attention value of the t-th second block according to the attention values of all the single heads of the t-th second block, for example according to formula (8) above.
Next, multi-layer perceptron processing is performed on the multi-head attention value of the t-th second block and the t-th second block, to obtain the reconstructed feature map.
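For the attention arithmetic of this decoding step, a self-contained sketch follows; we assume the usual scaled dot-product form for formula (7) and concatenation plus an output projection for formula (8), and the sketch processes all D second blocks at once rather than one t at a time:

    import math
    import torch

    def decode_attention(X, Wq, Wk, Wv, Wo):
        # X: (D, dim) matrix of second blocks (position embeddings already added).
        # Wq, Wk, Wv: lists of per-head (dim, d_head) weight matrices; Wo: (heads*d_head, dim).
        heads = []
        for h in range(len(Wq)):
            Q, K, V = X @ Wq[h], X @ Wk[h], X @ Wv[h]        # cf. formula (6)
            scores = Q @ K.T / math.sqrt(Q.shape[-1])        # scaled dot products
            heads.append(torch.softmax(scores, dim=-1) @ V)  # cf. formula (7): single-head value
        return torch.cat(heads, dim=-1) @ Wo                 # cf. formula (8): multi-head value

In a standard Transformer block, this multi-head attention value would then pass, together with the second blocks through a residual connection, into the multi-layer perceptron that yields the reconstructed feature map.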
FIG. 7 is a schematic structural diagram of an image feature processing apparatus according to another embodiment of the present disclosure. As shown in FIG. 7, the image feature processing apparatus includes a memory 71 and a processor 72.
The memory 71 is used to store instructions. The processor 72 is coupled to the memory 71 and is configured to execute, based on the instructions stored in the memory, the method involved in any of the embodiments in FIG. 1.
As shown in FIG. 7, the image feature processing apparatus further includes a communication interface 73 for exchanging information with other devices. The apparatus also includes a bus 74, through which the processor 72, the communication interface 73 and the memory 71 communicate with one another.
The memory 71 may include high-speed RAM, and may also include non-volatile memory, such as at least one disk memory. The memory 71 may also be a memory array. The memory 71 may further be divided into blocks, and the blocks may be combined into virtual volumes according to certain rules.
In addition, the processor 72 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present disclosure.
The present disclosure also relates to a computer-readable storage medium, where the computer-readable storage medium stores computer instructions which, when executed by a processor, implement the method involved in any of the embodiments in FIG. 1.
By implementing the above embodiments of the present disclosure, the following beneficial effects can be obtained:
1) By calculating gradient values within the blocks of a feature map, the present disclosure can, based on the ranking of those gradient values, identify the important regions on which the feature information is focused, and can then discard the blocks of non-important regions to achieve feature compression;
2) The present disclosure can flexibly control the compression rate by controlling the ratio of discarded feature blocks, so that various compression-rate requirements can be met;
3) The encoding part of the present disclosure can be added to an imaging device of a machine vision system as a machine vision encoding module, and the decoding part of the present disclosure can be added to the edge function set of the machine vision system as a machine vision decoding module, thereby improving compression efficiency.
In some embodiments, the functional units described above may be implemented as a general-purpose processor, a programmable logic controller (PLC), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any appropriate combination thereof, for performing the functions described in the present disclosure.
Those of ordinary skill in the art can understand that all or part of the steps for implementing the above embodiments may be completed by hardware, or by hardware controlled by program instructions; the program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.
The description of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or to limit the disclosure to the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described in order to best explain the principles of the disclosure and its practical application, and to enable others of ordinary skill in the art to understand the disclosure and design various embodiments with various modifications suited to the particular use contemplated.

Claims (19)

  1. An image feature processing method, executed by an image feature processing apparatus, comprising:
    extracting features of an original image to obtain a source feature map;
    dividing the source feature map into a predetermined number D of mutually non-overlapping first blocks;
    calculating a gradient value of each first block among all the first blocks;
    sorting all the first blocks according to their gradient average values, to delete the first p first blocks having the smallest gradient average values;
    using a preset encoder to encode block embedding features and position embedding features of the D-p first blocks that have not been deleted, to obtain D second blocks, wherein the D second blocks comprise D-p encoded visible blocks located at predetermined positions and p mask tokens, the D-p encoded visible blocks corresponding one-to-one to the D-p first blocks;
    using a preset decoder to decode all the second blocks and the position embedding features of the source feature map, to obtain a reconstructed feature map.
  2. The method according to claim 1, wherein calculating the gradient value of each first block comprises:
    in a first block d(i,j), calculating a first gradient gx of each feature point f(x,y) in a first direction and a second gradient gy in a second direction, wherein the first direction and the second direction are perpendicular to each other, where 1≤i≤d1, 1≤j≤d2, D=d1×d2;
    determining a gradient value of the feature point f(x,y) according to the first gradient gx and the second gradient gy;
    determining the gradient value of the first block d(i,j) according to the gradient values of all the feature points in the first block d(i,j).
  3. The method according to claim 2, wherein the gradient value of the feature point f(x,y) is the mean square value of the first gradient gx and the second gradient gy.
  4. The method according to claim 2, wherein the gradient value of the first block d(i,j) is the average of the gradient values of all the feature points in the first block d(i,j).
  5. The method according to claim 1, further comprising:
    determining the parameter p according to a preset compression ratio α.
  6. The method according to claim 5, wherein the compression ratio α is:

    where n1×n2 is the size of the source feature map.
  7. The method according to claim 1, wherein using a preset decoder to decode all the second blocks and the position embedding features of the source feature map comprises:
    for the t-th second block among all the second blocks, determining a corresponding first vector matrix Qt, second vector matrix Kt and third vector matrix Vt according to the first attention weight matrix, the second attention weight matrix and the third attention weight matrix of each single head, where 1≤t≤D;
    determining an attention value of each single head according to the first vector matrix Qt, the second vector matrix Kt and the third vector matrix Vt;
    determining a multi-head attention value of the t-th second block according to the attention values of all the single heads of the t-th second block;
    performing multi-layer perceptron processing on the multi-head attention value of the t-th second block and the t-th second block, to obtain the reconstructed feature map.
  8. The method according to any one of claims 1-7, wherein:
    the encoder is a Transformer encoder;
    the decoder is a Transformer decoder.
  9. An image feature processing apparatus, comprising:
    a first processing module configured to extract features of an original image to obtain a source feature map;
    a second processing module configured to divide the source feature map into a predetermined number D of mutually non-overlapping first blocks, calculate a gradient value of each first block among all the first blocks, and sort all the first blocks according to their gradient average values to delete the first p first blocks having the smallest gradient average values;
    a third processing module configured to use a preset encoder to encode block embedding features and position embedding features of the D-p first blocks that have not been deleted, to obtain D second blocks, wherein the D second blocks comprise D-p encoded visible blocks located at predetermined positions and p mask tokens, the D-p encoded visible blocks corresponding one-to-one to the D-p first blocks;
    a fourth processing module configured to use a preset decoder to decode all the second blocks and the position embedding features of the source feature map, to obtain a reconstructed feature map.
  10. The apparatus according to claim 9, wherein:
    the second processing module is configured to calculate, in a first block d(i,j), a first gradient gx of each feature point f(x,y) in a first direction and a second gradient gy in a second direction, wherein the first direction and the second direction are perpendicular to each other, where 1≤i≤d1, 1≤j≤d2, D=d1×d2; determine a gradient value of the feature point f(x,y) according to the first gradient gx and the second gradient gy; and determine the gradient value of the first block d(i,j) according to the gradient values of all the feature points in the first block d(i,j).
  11. The apparatus according to claim 10, wherein:
    the gradient value of the feature point f(x,y) is the mean square value of the first gradient gx and the second gradient gy.
  12. The apparatus according to claim 10, wherein:
    the gradient value of the first block d(i,j) is the average of the gradient values of all the feature points in the first block d(i,j).
  13. The apparatus according to claim 9, wherein:
    the second processing module is configured to determine the parameter p according to a preset compression ratio α.
  14. The apparatus according to claim 13, wherein the compression ratio α is:

    where n1×n2 is the size of the source feature map.
  15. The apparatus according to claim 9, wherein:
    the fourth processing module is configured to, for the t-th second block among all the second blocks, determine a corresponding first vector matrix Qt, second vector matrix Kt and third vector matrix Vt according to the first attention weight matrix, the second attention weight matrix and the third attention weight matrix of each single head, where 1≤t≤D; determine an attention value of each single head according to the first vector matrix Qt, the second vector matrix Kt and the third vector matrix Vt; determine a multi-head attention value of the t-th second block according to the attention values of all the single heads of the t-th second block; and perform multi-layer perceptron processing on the multi-head attention value of the t-th second block and the t-th second block, to obtain the reconstructed feature map.
  16. The apparatus according to any one of claims 9-15, wherein:
    the encoder is a Transformer encoder;
    the decoder is a Transformer decoder.
  17. An image feature processing apparatus, comprising:
    a memory configured to store instructions;
    a processor coupled to the memory, the processor being configured to execute, based on the instructions stored in the memory, the method according to any one of claims 1-8.
  18. A non-transitory computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions which, when executed by a processor, implement the method according to any one of claims 1-8.
  19. A computer program product comprising computer instructions, wherein the computer instructions, when executed by a processor, implement the method according to any one of claims 1-8.
PCT/CN2023/110526 2022-08-19 2023-08-01 Image feature processing method and apparatus, and storage medium WO2024037330A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210998237.8 2022-08-19
CN202210998237.8A CN117649569A (en) 2022-08-19 2022-08-19 Image feature processing method and device and storage medium

Publications (1)

Publication Number Publication Date
WO2024037330A1 (en)

Family

ID=89940676

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/110526 WO2024037330A1 (en) 2022-08-19 2023-08-01 Image feature processing method and apparatus, and storage medium

Country Status (2)

Country Link
CN (1) CN117649569A (en)
WO (1) WO2024037330A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090067491A1 (en) * 2007-09-07 2009-03-12 Microsoft Corporation Learning-Based Image Compression
CN107154061A (en) * 2017-05-09 2017-09-12 北京航宇天穹科技有限公司 The regularization coding/decoding method that a kind of splits' positions are perceived
US20200105022A1 (en) * 2018-09-27 2020-04-02 Ateme Method for image processing and apparatus for implementing the same
CN115514976A (en) * 2022-07-15 2022-12-23 中国电信股份有限公司 Image encoding method, decoding method, device, readable medium and electronic equipment
CN115661276A (en) * 2022-10-21 2023-01-31 中国电信股份有限公司 Image data encoding method, device, apparatus, medium, and program

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SHI RONG; FU ZHIZHONG; SONG YAHUI; LI ZAIMING: "The New Technique for the Digital Image Compression Based on the Information Redundancy among Image Blocks", Chinese Journal of Scientific Instrument, vol. 24, no. 4, 1 December 2003, pages 375-376, XP093140315, DOI: 10.19650/j.cnki.cjsi.2003.s2.167 *
XU HAOHANG; DING SHUANGRUI; ZHANG XIAOPENG; XIONG HONGKAI; TIAN QI: "Masked Autoencoders are Robust Data Augmentors", arXiv, 9 June 2022, XP093140308, [retrieved on 2024-03-12], DOI: 10.48550/arxiv.2206.04846 *

Also Published As

Publication number Publication date
CN117649569A (en) 2024-03-05


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23854234

Country of ref document: EP

Kind code of ref document: A1