WO2022205937A1 - Feature information extraction method, model training method, apparatus, and electronic device - Google Patents

Feature information extraction method, model training method, apparatus, and electronic device Download PDF

Info

Publication number
WO2022205937A1
WO2022205937A1 · PCT/CN2021/131681 · CN2021131681W
Authority
WO
WIPO (PCT)
Prior art keywords
feature information
candidate frame
feature
image
module
Prior art date
Application number
PCT/CN2021/131681
Other languages
English (en)
French (fr)
Inventor
刘业鹏
程骏
谢琨
庞建新
Original Assignee
深圳市优必选科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市优必选科技股份有限公司
Publication of WO2022205937A1 publication Critical patent/WO2022205937A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Definitions

  • the present application relates to the technical field of image processing, and in particular, to a feature information extraction method, a model training method, an apparatus, and an electronic device.
  • the traditional Sort-based multi-target tracking framework does not introduce pedestrian re-identification (Re-identification, Reid) information when calculating the distances between detection frames and trajectories, but only performs movement tracking of faces. This leads to the problem of ID loss when two faces overlap.
  • in addition, because a face can move through a relatively large angle, when the face shakes from side to side it may not be detected in some frames, which also causes ID loss in the subsequent frames.
  • the purpose of this application is to provide a method for extracting feature information, the method comprising:
  • the method further includes:
  • the face in the to-be-processed image is identified and tracked.
  • the step of performing feature extraction on the to-be-processed image to obtain overall feature information of the to-be-processed image includes:
  • the first feature information is input into a multi-layer feature pyramid network connected across stages for processing, and the second feature information output by the feature pyramid network at each level is obtained as the overall feature information of the image to be processed.
  • the step of inputting the first feature information into a multi-layer feature pyramid network connected across stages for processing includes:
  • the input data is processed by the bottom-up module and the top-down module of the first-layer feature pyramid network, and the data output by the top-down module of the first-layer feature pyramid network is obtained as the second feature information output by this layer;
  • the input data is processed by the bottom-up module and the top-down module of the non-first-layer feature pyramid network, and after the data output by the bottom-up module of this non-first-layer feature pyramid network is fused with the data output by the bottom-up module of the previous-layer feature pyramid network, the fused data is used as the second feature information output by this layer.
  • the step of inputting the overall feature information into the candidate frame position identification module, the candidate frame classification module and the candidate frame feature extraction module respectively for processing includes:
  • the second feature information of each layer is respectively input into the candidate frame position identification module, candidate frame classification module and candidate frame feature extraction module corresponding to that layer, to obtain the candidate frame position feature information, candidate frame classification feature information and image feature information in the candidate frame of each layer;
  • the candidate frame position feature information output by each layer is spliced to obtain the spliced candidate frame position feature information; the candidate frame classification feature information output by each layer is spliced to obtain the spliced candidate frame classification feature information;
  • the image feature information in the candidate frame output by each layer is spliced to obtain the spliced image feature information in the candidate frame.
  • the step of performing feature extraction on the to-be-processed image through a backbone network to obtain the first feature information includes:
  • Feature extraction is performed on the image to be processed through a residual network to obtain first feature information.
  • Another object of the present application is to provide a model training method, the method comprising:
  • the training samples include two face images marked as the same target and one face image marked as a different target, and the training samples carry the position and size information of the face frame;
  • according to the position and size information of the face frame in the face image, a regression function is used as the loss function to adjust the network parameters of the part that extracts the candidate frame position feature information;
  • according to the position and size information of the face frame in the face image, a classification function is used as the loss function to adjust the network parameters of the part that extracts the candidate frame classification feature information;
  • in combination with the annotation information in the training samples, a Siamese network with a ternary (triplet) loss function is used to adjust the network parameters of the part that extracts the image feature information in the candidate frame.
  • Another object of the present application is to provide a feature information extraction device, the device comprising:
  • a data acquisition module, configured to acquire the image to be processed;
  • an overall feature extraction module configured to perform feature extraction on the to-be-processed image to obtain first feature information of the to-be-processed image
  • a candidate frame feature extraction module, configured to input the first feature information respectively into the candidate frame position identification module, the candidate frame classification module and the candidate frame feature extraction module for processing, to obtain the candidate frame position feature information, candidate frame classification feature information and image feature information in the candidate frame corresponding to each candidate frame identified according to the first feature information.
  • Another object of the present application is to provide an electronic device, including a processor and a machine-readable storage medium, where the machine-readable storage medium stores machine-executable instructions, and the machine-executable instructions, when executed by the processor, implement the feature information extraction method or model training method provided in this application.
  • Another object of the present application is to provide a machine-readable storage medium, where the machine-readable storage medium stores machine-executable instructions, and the machine-executable instructions, when executed by one or more processors, implement the feature information extraction method or model training method provided in this application.
  • the feature information extraction method, model training method, apparatus, and electronic device provided by the embodiments of the present application perform further feature extraction on the basis of the overall image feature extraction of the image to be processed, and obtain the candidate frame position feature information, candidate frame classification feature information and image feature information in the candidate frame corresponding to each candidate frame. In this way, when subsequent tracking and recognition are performed according to this feature information, even if image frames appear in which faces overlap or no face is recognized, pedestrian Reid can still be performed according to the image feature information in the candidate frame, which can reduce the ID loss problem during multi-target tracking.
  • FIG. 1 is a schematic diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 2 is one of the schematic flow charts of the steps of the feature information extraction method provided by the embodiment of the present application.
  • FIG. 3 is the second schematic flowchart of the steps of the feature information extraction method provided by the embodiment of the present application.
  • FIG. 4 is a schematic flowchart of sub-steps of step S120;
  • FIG. 5 is a schematic structural diagram of a backbone network provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a multi-layer feature pyramid network provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of output result transmission of a multi-layer feature pyramid network provided by an embodiment of the present application.
  • FIG. 8 is a schematic flowchart of steps of a model training method provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a data flow of a model training method provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of functional modules of a feature information extraction apparatus provided by an embodiment of the present application.
  • FIG. 11 is a schematic diagram of functional modules of a model training apparatus provided by an embodiment of the present application.
  • Reference numerals: 100-electronic device; 120-machine-readable storage medium; 130-processor; 140-feature information extraction apparatus; 141-data acquisition module; 142-overall feature extraction module; 143-candidate frame feature extraction module; 150-model training apparatus; 151-sample acquisition module; 152-feature acquisition module; 153-training module.
  • the electronic device may be a device with image processing capability, such as a server, a smart phone, a personal computer (personal computer, PC), an intelligent robot, and the like.
  • FIG. 1 is a schematic diagram of the hardware structure of the electronic device 100 .
  • the electronic device 100 may include a processor 130 and a machine-readable storage medium 120 .
  • the processor 130 and the machine-readable storage medium 120 may communicate via a system bus.
  • the machine-readable storage medium 120 stores machine-executable instructions.
  • by reading and executing the machine-executable instructions corresponding to the feature information extraction logic or the model training logic, the processor 130 can execute the feature information extraction method or model training method provided by this embodiment.
  • the machine-readable storage medium 120 may be, but is not limited to, a random access memory (Random Access Memory, RAM), a read-only memory (Read Only Memory, ROM), a programmable read-only memory (Programmable Read-Only Memory, PROM), an erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM), an electrically erasable programmable read-only memory (Electric Erasable Programmable Read-Only Memory, EEPROM), etc.
  • the processor 130 may be an integrated circuit chip with signal processing capability.
  • the above-mentioned processor can be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; it can also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • it can be understood that FIG. 1 is only a schematic diagram of the composition of the electronic device 100; the electronic device 100 may further include more or fewer components than those shown in FIG. 1, or have a configuration different from that shown in FIG. 1. Each component shown in FIG. 1 may be implemented in hardware, software, or a combination thereof.
  • FIG. 2 is a flowchart of a feature information extraction method applied to the electronic device 100 shown in FIG. 1. The steps of the method may be implemented by a feature extraction model configured in the electronic device 100, and each step of the method is described in detail below.
  • Step S110 acquiring an image to be processed.
  • the to-be-processed image may be an image on which face recognition or face tracking needs to be performed; for example, the to-be-processed image may be a certain frame of video data collected by an image acquisition device.
  • Step S120 perform feature extraction on the to-be-processed image to obtain overall feature information of the to-be-processed image.
  • the feature extraction model may include an overall feature extraction part and a candidate frame feature extraction part.
  • the overall feature information may be feature information obtained after feature extraction is performed by the overall feature extraction part.
  • Step S130 input the overall feature information into the candidate frame position identification module, the candidate frame classification module and the candidate frame feature extraction module respectively for processing, and obtain the candidate frame position feature information corresponding to each candidate frame identified according to the overall feature information , candidate frame classification feature information, and candidate frame image feature information.
  • the candidate frame feature extraction part of the feature extraction model may include a candidate frame position identification module, a candidate frame classification module, and a candidate frame feature extraction module.
  • the overall feature information obtained by the overall feature extraction part can be respectively input to the candidate frame position identification module, the candidate frame classification module and the candidate frame feature extraction module for further feature extraction.
  • the candidate frame position identification module is used to further identify, according to the overall feature information, candidate frames on the to-be-processed image that may contain targets to be tracked, and to obtain the position coordinates and size information of these candidate frames as the candidate frame position feature information. For example, feature information representing the center coordinates of a candidate frame and the length and width of the rectangular candidate frame is obtained.
  • the candidate frame classification module is configured to further identify, according to the overall feature information, the probability that each candidate frame is a foreground image (face) or a background image as the candidate frame classification feature information.
  • the candidate frame feature extraction module is configured to perform further feature extraction on the image feature information related to face recognition in each candidate frame on the basis of the overall feature information to obtain image feature information in the candidate frame.
  • further feature extraction is performed to obtain the position feature information of the candidate frame corresponding to each candidate frame, the classification feature information of the candidate frame, and the image feature information in the candidate frame.
  • step S140 may be further included after step S130.
  • Step S140 Identify and track the face in the to-be-processed image according to the candidate frame position feature information, the candidate frame classification feature information, and the image feature information in the candidate frame of each candidate frame.
  • in this way, when the features extracted from each image frame of a video by the method of this embodiment are used for subsequent tracking and recognition, even if image frames appear in which faces overlap or no face is recognized, face Reid can still be performed according to the image feature information in the candidate frame, which can reduce the ID loss problem when performing multi-target tracking.
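The application itself stops at producing the per-candidate Reid feature; purely as an illustration of how such a feature could be consumed downstream, the following sketch combines an appearance (cosine-distance) cost with a motion (IoU-based) cost when assigning detections to tracks. The weighting, threshold and function names are illustrative assumptions, not part of the application.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(track_feats, det_feats, iou_cost, reid_weight=0.5, max_cost=0.7):
    """Assign detections to tracks with a weighted sum of appearance (Reid cosine
    distance) and motion (1 - IoU) costs. Weights and threshold are illustrative."""
    t = track_feats / np.linalg.norm(track_feats, axis=1, keepdims=True)
    d = det_feats / np.linalg.norm(det_feats, axis=1, keepdims=True)
    reid_cost = 1.0 - t @ d.T                            # shape: (num_tracks, num_dets)
    cost = reid_weight * reid_cost + (1.0 - reid_weight) * iou_cost
    rows, cols = linear_sum_assignment(cost)             # Hungarian assignment
    # Pairs that are still too expensive are treated as unmatched tracks/detections.
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < max_cost]
```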
  • the overall feature extraction part of the feature extraction model may include a backbone network and a multi-layer feature pyramid network of Cross Stage Connect (CSC).
  • step S120 may include the following sub-steps.
  • step S121 feature extraction is performed on the to-be-processed image through a backbone network to obtain first feature information.
  • the backbone network may be a residual network, that is, the feature extraction is performed on the image to be processed through the residual network to obtain the first feature information.
  • for example, the backbone network can be a lightweight ResNet18 network, in which a plurality of convolutional layers of different sizes are residually connected (skip connections) according to a preset configuration, and the result is finally fed into average pooling for processing, so as to ensure that the entire backbone network has good feature extraction capability.
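As a rough illustration of the "skip connect" building block mentioned above, the sketch below shows a basic residual block of the kind a ResNet18-style backbone stacks; the application does not spell out its exact layer configuration, so channel counts and strides here are placeholders.

```python
import torch.nn as nn

class BasicBlock(nn.Module):
    """Two 3x3 convolutions with a residual (skip) connection, ResNet18-style."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        # 1x1 projection so the skip path matches the main path when shapes change.
        self.down = (
            nn.Sequential(nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                          nn.BatchNorm2d(out_ch))
            if stride != 1 or in_ch != out_ch else nn.Identity()
        )

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.down(x))  # skip connection
```

The backbone stacks several such blocks and, as noted above, finishes with average pooling before its output is handed to the feature pyramid stage.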
  • Step S122 Input the first feature information into a multi-layer feature pyramid network connected across stages for processing, and obtain the second feature information output by the feature pyramid network at each level as the overall feature information of the image to be processed.
  • each layer in the multi-layer feature pyramid network may include a bottom-up module and a top-down module.
  • when feature extraction is performed through the multi-layer feature pyramid network, the first feature information is respectively input into the multi-layer feature pyramid networks having a hierarchical relationship; for the first-layer feature pyramid network among them, the bottom-up module and the top-down module of the first-layer feature pyramid network process the input data, and the data output by the top-down module of the first-layer feature pyramid network is obtained as the second feature information output by that layer;
  • for each non-first-layer feature pyramid network, the input data is processed by the bottom-up module and the top-down module of that non-first-layer feature pyramid network, and after the data output by the bottom-up module of that non-first-layer feature pyramid network is fused with the data output by the bottom-up module of the previous-layer feature pyramid network, the fused data is used as the second feature information output by this layer.
  • for example, taking a three-layer feature pyramid network as an example, the first-layer feature pyramid network (FPN1) is the first (head) layer.
  • FPN1 obtains the first feature information from the backbone network, and after processing by the bottom-up module and the top-down module of FPN1, the data output by the top-down module is used as the second feature information output by this layer.
  • the second-layer feature pyramid network (FPN2) is a non-first-layer feature pyramid network.
  • FPN2 also obtains the first feature information from the backbone network; after this is processed by the bottom-up module and the top-down module of FPN2, the result is fused with the data output by the bottom-up module of FPN1, and the fused data is used as the second feature information output by this layer.
  • the fusion can adopt an element-wise operation, that is, the data values of corresponding pixel points in the two feature maps are added and averaged.
  • the third-layer feature pyramid network (FPN3) is likewise a non-first-layer feature pyramid network.
  • FPN3 also obtains the first feature information from the backbone network; after this is processed by the bottom-up module and the top-down module of FPN3, the result is fused with the data output by the bottom-up module of FPN2, and the fused data is used as the second feature information output by this layer.
  • because the size of faces in actually collected images varies over a wide range, adopting a feature pyramid network can improve the ability of multi-scale face detection.
  • in the multi-layer feature pyramid network, cross-stage connections are used for feature fusion, so that the feature information of the previous layer is fused into the output data of each layer, thereby enhancing the expressiveness of the output data.
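Reading the FPN1-FPN3 example above (every layer receives the first feature information, runs its bottom-up and top-down modules, and non-first layers average their result with the previous layer's bottom-up output), a schematic PyTorch sketch might look as follows. The bottom-up/top-down modules are placeholders, the feature maps are assumed to have matching shapes, and the choice of which tensors are fused follows the worked example rather than being a definitive reading of the claims.

```python
import torch.nn as nn

class CSCFeaturePyramid(nn.Module):
    """Cross-stage-connected multi-layer FPN sketch. Each element of `layers`
    is assumed to expose `bottom_up` and `top_down` sub-modules."""
    def __init__(self, layers):
        super().__init__()
        self.layers = nn.ModuleList(layers)

    def forward(self, first_feature):
        second_features, prev_bottom_up = [], None
        for i, layer in enumerate(self.layers):
            bu = layer.bottom_up(first_feature)   # bottom-up pass
            td = layer.top_down(bu)               # top-down pass
            if i == 0:
                second_features.append(td)        # FPN1: top-down output used as-is
            else:
                # Cross-stage fusion: element-wise add with the previous layer's
                # bottom-up output, then average (assumes matching shapes).
                second_features.append((td + prev_bottom_up) / 2)
            prev_bottom_up = bu
        return second_features                    # one second-feature map per layer
```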
  • the feature extraction model includes a candidate frame position identification module, a candidate frame classification module and a candidate frame feature extraction module corresponding to each layer of the feature pyramid network.
  • in step S130, the second feature information of each layer obtained in step S122 is input to the candidate frame position identification module, candidate frame classification module and candidate frame feature extraction module corresponding to that layer, to obtain the candidate frame position feature information, candidate frame classification feature information and image feature information in the candidate frame of each layer.
  • for example, the FPN1 layer corresponds to a candidate frame position identification module, a candidate frame classification module and a candidate frame feature extraction module;
  • the data output by the FPN1 layer is respectively input to the candidate frame position identification module, candidate frame classification module and candidate frame feature extraction module corresponding to this layer.
  • the candidate frame position identification module can be a 1*1*4 convolution module, the candidate frame classification module can be a 1*1*2 convolution module, and the candidate frame feature extraction module can be a 1*1*128 convolution module.
  • after being processed by the candidate frame position identification module, candidate frame classification module and candidate frame feature extraction module of this layer, the candidate frame position feature information bbox1, candidate frame classification feature information cls1 and image feature information feature1 in the candidate frame corresponding to this layer are obtained.
  • correspondingly, FPN2 and FPN3 have their own candidate frame position identification modules, candidate frame classification modules and candidate frame feature extraction modules, which output the candidate frame position feature information bbox2, candidate frame classification feature information cls2 and image feature information feature2 in the candidate frame corresponding to the second layer, as well as the candidate frame position feature information bbox3, candidate frame classification feature information cls3 and image feature information feature3 in the candidate frame corresponding to the third layer.
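The following sketch shows what the three 1*1 heads attached to each pyramid layer could look like, matching the 4/2/128 channel counts given above; the input channel count and everything else are placeholders.

```python
import torch.nn as nn

class CandidateFrameHeads(nn.Module):
    """Per-layer heads: 1x1x4 position, 1x1x2 classification, 1x1x128 Reid feature."""
    def __init__(self, in_ch):
        super().__init__()
        self.bbox = nn.Conv2d(in_ch, 4, kernel_size=1)    # (x, y, w, h) per location
        self.cls = nn.Conv2d(in_ch, 2, kernel_size=1)     # foreground / background scores
        self.feat = nn.Conv2d(in_ch, 128, kernel_size=1)  # 128-d image feature in the frame

    def forward(self, second_feature):
        return (self.bbox(second_feature),
                self.cls(second_feature),
                self.feat(second_feature))
```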
  • taking the candidate frame position feature information as an example, the candidate frame position feature information bbox1, bbox2 and bbox3 output by each layer is spliced and fused, and used as the finally output candidate frame position feature information.
  • similarly, the finally output candidate frame classification feature information and image feature information in the candidate frame are also the result of splicing and fusing the corresponding per-layer outputs.
  • the candidate frame position feature information output by each layer may be a 4-dimensional feature vector (x, y, w, h), whose components respectively represent the center coordinates of the candidate frame and the length and width of the rectangular frame.
  • for example, the second feature information output by FPN1 is a 100*100 feature map; after the 1*1*4 convolution in the candidate frame position identification module corresponding to this layer, the output is a 100*100*4 feature map, where 4 means there are 4 position coordinates.
  • the candidate frame classification feature information output by each layer can be a 2-dimensional feature vector, representing the probabilities that the pedestrian frame belongs to the foreground and the background, respectively.
  • the image feature information in the candidate frame output by each layer may be a 128-dimensional feature vector, representing the 128-dimensional image feature information extracted from the face frame.
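Purely as an illustration of the "splicing" step, the per-layer head outputs can be flattened over spatial locations and concatenated across the pyramid layers; the reshaping and concatenation dimension below are assumptions.

```python
import torch

def splice(per_layer_outputs):
    """Concatenate per-layer head outputs (each N x C x H x W) into a single
    N x num_candidates x C tensor of per-candidate vectors."""
    flattened = [o.flatten(2).transpose(1, 2) for o in per_layer_outputs]  # N x (H*W) x C
    return torch.cat(flattened, dim=1)

# e.g. bboxes = splice([bbox1, bbox2, bbox3])   # N x num_candidates x 4
#      feats  = splice([feat1, feat2, feat3])   # N x num_candidates x 128
```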
  • in this way, this embodiment provides a lightweight multi-task framework, which can simultaneously extract the position of the candidate frame containing the face in the image to be processed and extract the image feature information in the candidate frame; when this information is input into the subsequent recognition and tracking module for further processing, the problem of target tracking ID loss can be reduced. Since the above network architecture is lightweight, it can be deployed on terminal devices with limited processing capability, such as mobile robots.
  • this embodiment further provides a model training method for training the feature extraction model, and each step of the method is explained in detail below.
  • Step S210 Acquire training samples, where the training samples include two face images marked as the same target and one face image marked as a different target, and the training samples carry the position and size information of the face frame.
  • the training samples may include three face images Face1, Face2, and Face3. Among them, Face1 and Face2 are marked as the same target, Face3 is marked as a different target, and the three face images are also marked with the position and size information of the face frame.
  • Step S220 performing feature extraction on the face image through steps S110-S130 shown in FIG. 2 .
  • the above-mentioned feature information extraction method can be used to perform feature extraction on the three face images respectively to obtain the position feature information of the candidate frame, the classification feature information of the candidate frame and the image feature information in the candidate frame corresponding to each face image.
  • Step S230 According to the position and size information of the face frame in the face image, a regression function is used as the loss function to adjust the network parameters of the part that extracts the candidate frame position feature information; according to the position and size information of the face frame in the face image, a classification function is used as the loss function to adjust the network parameters of the part that extracts the candidate frame classification feature information; and, in combination with the annotation information in the training samples, a Siamese network with a ternary (triplet) loss function is used to adjust the network parameters of the part that extracts the image feature information in the candidate frame.
  • specifically, for the part that extracts the candidate frame position feature information, a regression function can be used as the loss function to adjust the network parameters according to the position and size information of the face frame in the face images of the training sample.
  • Smooth L1 Loss can be used as the loss function.
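For reference, the usual definition of the Smooth L1 loss (the standard form, not reproduced from the application) is

$$\mathrm{SmoothL1}(x) = \begin{cases} 0.5\,x^{2}, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$

where $x$ is the difference between a predicted and an annotated box coordinate.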
  • for the part that extracts the candidate frame classification feature information, a classification function can be used as the loss function to adjust the network parameters according to the position and size information of the face frame in the face images of the training sample.
  • Softmax can be adopted as the loss function.
  • the expression form of the Softmax function can be as follows:
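The expression itself is rendered as an image in the original publication and is not reproduced on this page; the standard softmax it presumably refers to is

$$\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$$

applied to the foreground/background scores of each candidate frame.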
  • for the part that extracts the image feature information in the candidate frame, a Siamese (twin) network can be used to adjust the network parameters with a ternary (triplet) loss function, in combination with the annotation information indicating whether the three face images in the training sample are the same target.
  • the Triplet loss function is used as the loss function of the Siamese network.
  • the expression form of the Triplet loss function can be as follows:
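This expression is likewise rendered as an image in the original publication; the standard triplet (ternary) loss over the anchor, positive and negative embeddings $f(a)$, $f(p)$, $f(n)$ is

$$L = \max\!\big(\lVert f(a) - f(p) \rVert_2^{2} - \lVert f(a) - f(n) \rVert_2^{2} + \alpha,\; 0\big)$$

where $\alpha$ is a margin.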
  • each time, the three face images in a training sample are respectively input into the feature extraction model to obtain three sets of feature vectors; for each batch of data, the loss function is computed over 32 sets of data for backward gradient propagation.
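A minimal sketch of that training step, assuming the shared-weight (Siamese) extractor returns the 128-dimensional in-candidate-frame embedding for each face image and using PyTorch's built-in triplet loss; the margin and the optimizer are illustrative assumptions.

```python
import torch.nn as nn

triplet_loss = nn.TripletMarginLoss(margin=1.0)  # margin value is an assumption

def train_step(model, optimizer, face1, face2, face3):
    """One step on a batch of (anchor, positive, negative) face images.
    `model` is the shared-weight feature extractor returning (N, 128) embeddings."""
    anchor = model(face1)    # Face1
    positive = model(face2)  # Face2, same identity as Face1
    negative = model(face3)  # Face3, different identity
    loss = triplet_loss(anchor, positive, negative)
    optimizer.zero_grad()
    loss.backward()          # backward gradient propagation over the batch (e.g. 32 triplets)
    optimizer.step()
    return loss.item()
```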
  • it should be noted that the feature information extraction method and the model training method may be executed by different electronic devices, or may be executed by the same electronic device at different stages; this is not limited in this embodiment.
  • this embodiment further provides a feature information extraction apparatus 140 , and the feature information extraction apparatus 140 includes at least one functional module that can be stored in the machine-readable storage medium 120 in the form of software.
  • the feature information extraction device 140 may include a data acquisition module 141 , an overall feature extraction module 142 and a candidate frame feature extraction module 143 .
  • the data acquisition module 141 is used to acquire images to be processed.
  • the data acquisition module 141 may be configured to execute the step S110 shown in FIG. 2 , and for the specific description of the data acquisition module 141 , please refer to the description of the step S110 .
  • the overall feature extraction module 142 is configured to perform feature extraction on the to-be-processed image to obtain first feature information of the to-be-processed image.
  • the overall feature extraction module 142 may be configured to execute the step S120 shown in FIG. 2 , and for the specific description of the overall feature extraction module 142 , please refer to the description of the step S120 .
  • the candidate frame feature extraction module 143 is used for inputting the first feature information respectively into the candidate frame position identification module, the candidate frame classification module and the candidate frame feature extraction module for processing, and for obtaining the candidate frame position feature information, candidate frame classification feature information and image feature information in the candidate frame corresponding to each candidate frame identified according to the first feature information.
  • the candidate frame feature extraction module 143 may be configured to perform step S130 shown in FIG. 2 , and for the specific description of the candidate frame feature extraction module 143 , please refer to the description of the step S130 .
  • this embodiment further provides a model training apparatus 150 , and the model training apparatus 150 includes at least one functional module that can be stored in the machine-readable storage medium 120 in the form of software.
  • the model training device 150 may include a sample acquisition module 151, a feature acquisition module 152 and a training module 153.
  • the sample acquisition module 151 is used to acquire training samples, where the training samples include two face images marked as the same target and one face image marked as a different target, and the training samples carry the position and size information of the face frame.
  • the sample acquisition module 151 may be configured to execute the step S210 shown in FIG. 8 , and for the specific description of the sample acquisition module 151 , please refer to the description of the step S210 .
  • the feature acquisition module 152 is configured to perform feature extraction on the face image through the feature information extraction method.
  • the feature acquisition module 152 may be configured to execute the step S220 shown in FIG. 8 , and for the specific description of the feature acquisition module 152 , please refer to the description of the step S220 .
  • the training module 153 is used to: according to the position and size information of the face frame in the face image, use a regression function as the loss function to adjust the network parameters of the part that extracts the candidate frame position feature information; according to the position and size information of the face frame in the face image, use a classification function as the loss function to adjust the network parameters of the part that extracts the candidate frame classification feature information; and, in combination with the annotation information in the training samples, use a Siamese network with a ternary (triplet) loss function to adjust the network parameters of the part that extracts the image feature information in the candidate frame.
  • the training module 153 may be configured to execute the step S230 shown in FIG. 8 , and for the specific description of the training module 153 , please refer to the description of the step S230 .
  • to sum up, the feature information extraction method, model training method, apparatus and electronic device provided by the embodiments of the present application perform further feature extraction on the basis of the overall image feature extraction of the image to be processed, and obtain the candidate frame position feature information, candidate frame classification feature information and image feature information in the candidate frame corresponding to each candidate frame.
  • in this way, when subsequent tracking and recognition are performed according to this feature information, even if image frames appear in which faces overlap or no face is recognized, pedestrian Reid can still be performed according to the image feature information in the candidate frame, which can reduce the ID loss problem during multi-target tracking.
  • each block in the flowchart or block diagrams may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by dedicated hardware-based systems that perform the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
  • each functional module in each embodiment of the present application may be integrated together to form an independent part, or each module may exist independently, or two or more modules may be integrated to form an independent part.
  • if the functions are implemented in the form of software function modules and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present application provides a feature information extraction method, a model training method, an apparatus, and an electronic device. The feature information extraction method includes: acquiring an image to be processed; performing feature extraction on the image to be processed to obtain overall feature information of the image to be processed; and inputting the overall feature information respectively into a candidate frame position identification module, a candidate frame classification module, and a candidate frame feature extraction module for processing, to obtain candidate frame position feature information, candidate frame classification feature information, and image feature information in the candidate frame corresponding to each candidate frame identified according to the overall feature information. In this way, when subsequent tracking and recognition are performed according to this feature information, even if image frames appear in which faces overlap or no face is recognized, pedestrian Reid can still be performed according to the image feature information in the candidate frame, which can reduce the ID loss problem during multi-target tracking.

Description

特征信息提取方法、模型训练方法、装置及电子设备
相关申请的交叉引用
本申请要求于2021年04月01日提交中国专利局的申请号为2021103573240、名称为“特征信息提取方法、模型训练方法、装置及电子设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及图像处理技术领域,具体而言,涉及一种特征信息提取方法、模型训练方法、装置及电子设备。
背景技术
在动态人脸识别系统中,通常需要先对人脸进行初步识别,然后跟踪人脸的移动,在同一人脸的移动轨迹中找到一张最优的人脸图像来进行后续的人脸识别或者人脸属性分析。
但是传统的基于Sort的多目标跟踪框架在计算框和轨迹距离的时候并没有引入行人重识别(Re-identification,Reid)信息,仅仅是对人脸进行移动跟踪。这会导致,在出现两个人脸重叠时出现ID丢失的问题。另外,由于人脸活动角度比较大,当人脸出现左右晃动的时候,由于有些帧没检测到人脸,也会导致后面的帧出现ID丢失。
申请内容
为了克服现有技术中的上述不足,本申请的目的在于提供一种特征信息提取方法,所述方法包括:
获取待处理图像;
对所述待处理图像进行特征提取,获得所述待处理图像的整体特征信息;
将所述整体特征信息分别输入候选框位置识别模块、候选框分类模块及候选框特征提取模块进行处理,获得根据所述整体特征信息识别出的各候选框对应的候选框位置特征信息、候选框分类特征信息以及候选框内图像特征信息。
在一些可能的实现方式中,所述方法还包括:
根据各候选框的所述候选框位置特征信息、候选框分类特征信息以及候选框内图像 特征信息,对所述待处理图像中的人脸进行识别和跟踪。
在一些可能的实现方式中,所述对所述待处理图像进行特征提取,获得所述待处理图像的整体特征信息的步骤,包括:
通过骨干网络对所述待处理图像进行特征提取,获得第一特征信息;
将所述第一特征信息输入跨阶段连接的多层特征金字塔网络进行处理,获得各层级的所述特征金字塔网络输出的第二特征信息作为所述待处理图像的整体特征信息。
在一些可能的实现方式中,所述将所述第一特征信息输入跨阶段连接的多层特征金字塔网络进行处理的步骤,包括:
将所述第一特征信息分别输入具有层级关系的多层特征金字塔网络;
针对多层所述特征金字塔网络中的首层特征金字塔网络,通过该首层特征金字塔网络的自底向上模块和自顶向下模块对输入的数据进行处理,获得该首层特征金字塔网络的自顶向下模块输出的数据作为该层输出的第二特征信息;
针对多层所述特征金字塔网络中的每个非首层特征金字塔网络,通过该非首层特征金字塔网络的自底向上模块和自顶向下模块对输入的数据进行处理,将该非首层特征金字塔网络自底向上模块的输出的数据和上一层特征金字塔网络自底向上模块输出的数据融合后,作为该层输出的第二特征信息。
在一些可能的实现方式中,所述将所述整体特征信息分别输入候选框位置识别模块、候选框分类模块及候选框特征提取模块进行处理的步骤,包括:
分别将各层的所述第二特征信息输入至与各层对应的候选框位置识别模块、候选框分类模块及候选框特征提取模块,获得与各层的候选框位置特征信息、候选框分类特征信息以及候选框内图像特征信息;
对各层输出的候选框位置特征信息进行拼接,获得拼接后的候选框位置特征信息;
对各层输出的候选框分类特征信息进行拼接,获得拼接后的候选框分类特征信息;
对各层输出的候选框内图像特征信息进行拼接,获得拼接后的候选框内图像特征信息。
在一些可能的实现方式中,所述通过骨干网络对所述待处理图像进行特征提取,获得第一特征信息的步骤,包括:
通过残差网络对待处理图像进行特征提取,获得第一特征信息。
本申请的另一目的在于提供一种模型训练方法,所述方法包括:
获取训练样本,所述训练样本包括被标注为相同目标的两张人脸图像和被标注为不同目标的一张人脸图像,所述训练样本中携带有人脸框的位置尺寸信息;
通过本申请提供的所述特征信息提取方法对所述人脸图像进行特征提取;
根据所述人脸图像中人脸框的位置尺寸信息,采用回归函数作为损失函数对提取所述候选框位置特征信息的部分进行网络参数调整;
根据所述人脸图像中人脸框的位置尺寸信息,采用分类函数作为损失函数对提取所述候选框分类特征信息的部分进行网络参数调整;
结合所述训练样本中标注信息,利用孪生网路采用三元损失函数对提取所述候选框内图像特征信息的部分进行网络参数调整。
本申请的另一目的在于提供一种特征信息提取装置,所述装置包括:
数据获取模块,用于获取待处理图像;
整体特征提取模块,用于对所述待处理图像进行特征提取,获得所述待处理图像的第一特征信息;
候选框特征提取模块,用于将所述第一特征信息分别输入候选框位置识别模块、候选框分类模块及候选框特征提取模块进行处理,获得根据所述第一特征信息识别出的各候选框对应的候选框位置特征信息、候选框分类特征信息以及候选框内图像特征信息。
本申请的另一目的在于提供一种电子设备,包括处理器及机器可读存储介质,所述机器可读存储介质存储有机器可执行指令,所述机器可执行指令在被所述处理器执行时,实现本申请提供的所述特征信息提取或模型训练方法。
本申请的另一目的在于提供一种机器可读存储介质,所述机器可读存储介质存储有机器可执行指令,所述机器可执行指令在被一个或多个处理器执行时,实现本申请提供的所述特征信息提取或模型训练方法。
相对于现有技术而言,本申请具有以下有益效果:
本申请实施例提供的特征信息提取方法、模型训练方法、装置及电子设备,在对待处理图像进行整体图像特征提取的基础上,进行了进一步的特征提取,获得各候选框对应的候选框位置特征信息、候选框分类特征信息以及候选框内图像特征信息。如此,在将根据特征信息进行后续跟踪识别时,即使出现人脸重叠或未识别到人脸的图像帧,还可以依据候选框内的图像特征信息进行行人Reid,可以减少多目标跟踪时的ID丢失问题。
附图说明
为了更清楚地说明本申请实施例的技术方案,下面将对实施例中所需要使用的附图作简单地介绍,应当理解,以下附图仅示出了本申请的某些实施例,因此不应被看作是对范围的限定,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他相关的附图。
图1为本申请实施例提供的电子设备的示意图;
图2为本申请实施例提供的特征信息提取方法的步骤流程示意图之一;
图3为本申请实施例提供的特征信息提取方法的步骤流程示意图之二;
图4为步骤S120的子步骤流程示意图;
图5为本申请实施例提供的骨干网络的结构示意图;
图6为本申请实施例提供的多层特征金字塔网络的结构示意图;
图7为本申请实施例提供的多层特征金字塔网络输出结果传递示意图;
图8为本申请实施例提供的模型训练方法的步骤流程示意图;
图9为本申请实施例提供的模型训练方法的数据流向示意图;
图10为本申请实施例提供的特征信息提取装置的功能模块示意图;
图11为本申请实施例提供的模型训练装置的功能模块示意图。
图标:100-电子设备;120-机器可读存储介质;130-处理器;140-特征信息提取装置;141-数据获取模块;142-整体特征提取模块;143-候选框特征提取模块;150-模型训练装置;151-样本获取模块;152-特征获取模块;153-训练模块。
具体实施方式
为使本申请实施例的目的、技术方案和优点更加清楚,下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。通常在此处附图中描述和示出的本申请实施例的组件可以以各种不同的配置来布置和设计。
因此,以下对在附图中提供的本申请的实施例的详细描述并非旨在限制要求保护的本申请的范围,而是仅仅表示本申请的选定实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
应注意到:相似的标号和字母在下面的附图中表示类似项,因此,一旦某一项在一 个附图中被定义,则在随后的附图中不需要对其进行进一步定义和解释。
在本申请的描述中,需要理解的是,术语“第一”和“第二”等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来,而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他30性的包含,从而使得包括一系列要素的过程、方法、物品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的过程、方法、物品或者设备中还存在另外的相同要素。对于本领域的普通技术人员而言,可以具体情况理解上述术语在本申请中的具体含义。
发明人经过研究发现,传统的基于Sort的多目标跟踪框架在计算框和轨迹距离的时候并没有引入人脸Reid信息,仅仅是对人脸进行移动跟踪。这会导致在图像中出现两个人脸重叠时出现ID丢失的问题。例如,人脸重叠后,跟踪轨迹会交叉,由于没有Reid信息,导致后续跟踪轨迹而分离后无法与之前的跟踪轨迹对应上。
另外,由于人脸活动角度比较大,当人脸出现左右晃动的时候,可能存在有的画面帧中没检测到人脸,也会导致后面的帧出现ID丢失。例如,当某几帧图像没有会出现人脸时,会导致踪轨迹中断,后续出现的跟踪轨迹无法和之前原有的跟踪轨迹对应上。
下面结合附图,对本申请的一些实施方式作详细说明。在不冲突的情况下,下述的实施例及实施例中的特征可以相互结合。
本申请实施例提供的一种电子设备,所述电子设备可以是具有图像处理能力的设备,例如,服务器、智能手机、个人电脑(personal computer,PC)、智能机器人等。
请参照图1,图1是所述电子设备100的硬件结构示意图。该电子设备100可包括处理器130及机器可读存储介质120。处理器130与机器可读存储介质120可经由系统总线通信。并且,机器可读存储介质120存储有机器可执行指令,通过读取并执行机器可读存储介质120中与特征信息提取逻辑或模型训练逻辑对应的机器可执行指令,处理器130可执行本实施例提供的特征信息提取方法或模型训练方法。
其中,所述机器可读存储介质120可以是,但不限于,随机存取存储器(Random Access Memory,RAM),只读存储器(Read Only Memory,ROM),可编程只读存储器(Programmable Read-Only Memory,PROM),可擦除只读存储器(Erasable Programmable  Read-Only Memory,EPROM),电可擦除只读存储器(Electric Erasable Programmable Read-Only Memory,EEPROM)等。其中,机器可读存储介质120用于存储程序,所述处理器130在接收到执行指令后,执行所述程序。
所述处理器130可能是一种集成电路芯片,具有信号的处理能力。上述的处理器可以是通用处理器,包括中央处理器(Central Processing Unit,简称CPU)、网络处理器(Network Processor,简称NP)等;还可以是数字信号处理器(DSP)、专用集成电路(ASIC)、现场可编程门阵列(FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。
可以理解的是,图1所示的框图仅为所述电子设备100的一种组成示意图,所述电子设备100还可包括比图1中所示更多或者更少的组件,或者具有与图1所示不同的配置。图1中所示的各组件可以采用硬件、软件或其组合实现。
请参照图2,图2为应用于图1所示的电子设备100的一种特征信息提取方法的流程图,该特征信息提取方法的各个步骤可以由所述电子设备100中配置的特征提取模型实现,以下将对所述方法包括各个步骤进行详细阐述。
步骤S110,获取待处理图像。
在本实施例中,所述待处理图像可以为需要进行人脸识别或人脸的跟踪图像,例如,该待处理图像可以为通过图像采集设备采集到的视频数据中的某一帧图像。
步骤S120,对所述待处理图像进行特征提取,获得所述待处理图像的整体特征信息。
在本实施例中,所述特征提取模型可以包括整体特征提取部分和候选框特征提取部分。所述整体特征信息可以为通过所述整体特征提取部分进行特征提取后获得的特征信息。
步骤S130,将所述整体特征信息分别输入候选框位置识别模块、候选框分类模块及候选框特征提取模块进行处理,获得根据所述整体特征信息识别出的各候选框对应的候选框位置特征信息、候选框分类特征信息以及候选框内图像特征信息。
在本实施例中,所述特征提取模型的候选框特征提取部分可以包括候选框位置识别模块、候选框分类模块及候选框特征提取模块。所述整体特征提取部分获得的整体特征信息可以被分别输入至所述候选框位置识别模块、候选框分类模块及候选框特征提取模 块进行进一步的特征提取。
其中,所述候选框位置识别模块用于根据所述整体特征信息,进一步识别出所述待处理图像上可能存在待跟踪目标的候选框,并获得这些候选框的位置坐标及尺寸大小信息作为所述候选框位置特征信息。例如,获取表征候选框中心坐标和矩形候选框长宽尺寸的特征信息。
所述候选框分类模块用于根据所述整体特征信息,进一步识别出各个所述候选框为前景图像(人脸)或背景图像的概率作为候选框分类特征信息。
所述候选框特征提取模块用于在所述整体特征信息的基础上,对各个候选框中和人脸识别相关的图像特征信息进行进一步特征提取,获得所述候选框内图像特征信息。
在本实施例中,在对待处理图像进行整体图像特征提取的基础上,进行了进一步的特征提取,获得各候选框对应的候选框位置特征信息、候选框分类特征信息以及候选框内图像特征信息。
例如,请参见图2,在步骤S130之后还可以包括步骤S140。
步骤S140,根据各候选框的所述候选框位置特征信息、候选框分类特征信息以及候选框内图像特征信息,对所述待处理图像中的人脸进行识别和跟踪。
如此,在根据本实施例提供的方法对视频中的各图像帧进行特征并用于进行后续跟踪识别时,即使出现人脸重叠或未识别到人脸的图像帧,后续还可以依据候选框内的图像特征信息进行人脸Reid,可以减少执行多目标跟踪时的ID丢失问题。
在一些可能的实现方式中,请参照图3,所述特征提取模型的整体特征提取部分可以包括骨干网络和跨阶段连接(Corss Stage Connect,CSC)的多层特征金字塔网络。请参照图4,步骤S120可以包括以下子步骤。
步骤S121,通过骨干网络对所述待处理图像进行特征提取,获得第一特征信息。
在本实施例中,所述骨干网络可以为残差网络,即所述通过残差网络对待处理图像进行特征提取,获得第一特征信息。例如,请参照图5,所述骨干网络可以为轻量级的Resnet18网络,其由多个不同大小的卷积层根据预设配置进行残差连接(skip connect),最后输入平均池化进行处理,如此保证整个所述骨干网络的具有较好的特征提取能力。
步骤S122,将所述第一特征信息输入跨阶段连接的多层特征金字塔网络进行处理,获得各层级的所述特征金字塔网络输出的第二特征信息作为所述待处理图像的整体特征信息。
具体地,多层所述特征金字塔网络中每一层可以包括自底向上(bottom-up)模块和自顶向下(top-down)模块。在通过多层所述特征金字塔网络进行特征提取时,将所述第一特征信息分别输入具有层级关系的多层特征金字塔网络;针对多层所述特征金字塔网络中的首层特征金字塔网络,通过该首层特征金字塔网络的自底向上模块和自顶向下模块对输入的数据进行处理,获得该首层特征金字塔网络的自顶向下模块输出的数据作为该层输出的第二特征信息;针对多层所述特征金字塔网络中的每个非首层特征金字塔网络,通过该非首层特征金字塔网络的自底向上模块和自顶向下模块对输入的数据进行处理,将该非首层特征金字塔网络自底向上模块的输出的数据和上一层特征金字塔网络自底向上模块输出的数据融合后,作为该层输出的第二特征信息。
例如,请再次参照图6,以三层特征金字塔网络为例,第一层特征金字塔网络(FPN1)为首层特征金字塔网络。FPN1从骨干网络(backbone)获得所述第一特征信息,经过FPN1的自底向上模块和自顶向下模块处理后,自顶向下模块输出的数据则作为该层输出的第二特征信息。
第二层特征金字塔网络(FPN2)为非首层特征金字塔网络。FPN2也从骨干网络(backbone)获得所述第一特征信息经过FPN2的自底向上模块和自顶向下模块处理后,再与FPN1自底向上模块输出的数据进行融合,融合后的数据作为该层输出的第二特征信息。其中,融合的方式可以采用element-wise操作,即两个将特征图中对应像素点的数据值相加后取平均值。
第二层特征金字塔网络(FPN3)为非首层特征金字塔网络。FPN3也从骨干网络(backbone)获得所述第一特征信息经过FPN3的自底向上模块和自顶向下模块处理后,再与FPN2自底向上模块输出的数据进行融合,融合后的数据作为该层输出的第二特征信息。
因为实际采集的图像中人脸大小变化的区间很大,采用特征金字塔网络可以提升对多尺度人脸检测的能力。并且在所述多层特征金字塔网络中,采用跨阶段连接进行特征融合,使得每层输出的数据中融合上一层的特征信息,从而增强了输出数据的表达能力。
在一些可能的实现方式中,请参照图7,所述特征提取模型包括与每层所述特征金字塔网络对应的候选框位置识别模块、候选框分类模块及候选框特征提取模块。
在步骤S130中,将从步骤S122获得的各层第二特征信息输入至与各层对应的候选框位置识别模块、候选框分类模块及候选框特征提取模块,获得与各层的候选框位置特 征信息、候选框分类特征信息以及候选框内图像特征信息。
针对各层输出的候选框位置特征信息进行拼接,获得拼接后的候选框位置特征信息;针对各层输出的候选框分类特征信息进行拼接,获得拼接后的候选框分类特征信息;针对各层输出的候选框内图像特征信息进行拼接,获得拼接后的候选框内图像特征信息。
最后输出拼接后的候选框位置特征信息、候选框分类特征信息以及候选框内图像特征信息。
例如,再次请参照图7,FPN1层对应一个候选框位置识别模块、候选框分类模块及候选框特征提取模块,FPN1层输出的数据分别输入至该层对应的候选框位置识别模块、候选框分类模块及候选框特征提取模块。候选框位置识别模块可以为1*1*4的卷积模块,候选框分类特征信息可以为1*1*2的卷积模块,候选框特征提取模块可以为1*1*128的卷积模块。经该层的候选框位置识别模块、候选框分类模块及候选框特征提取模块处理后,获得该层对应的候选框位置特征信息bbox1、候选框分类特征信息cls1以及候选框内图像特征信息feature1。
相应的,FPN2和FPN3对应有各自的选框位置识别模块、候选框分类模块及候选框特征提取模块,能够输出第二层对应的候选框位置特征信息bbox2、候选框分类特征信息cls2以及候选框内图像特征信息feature2,以及第三层对应的候选框位置特征信息bbox3、候选框分类特征信息cls3以及候选框内图像特征信息feature3。
以图7所示候选框位置特征信息为例,对各层输出的候选框位置特征信息bbox1、bbox2、bbox3进行拼接融合后,作为最后输出的候选框位置特征信息。同理,对最后输出的所述候选框分类特征信息和候选框内图像特征信息也是各种拼接同融合后的结果。
在本实施例中,每层输出的候选框位置特征信息可以是4维的特征向量(x,y,w,h),分别代表候选框的中心坐标和矩形框的长和宽。例如,FPN1输出的第二特征信息为100*100的特征图,经过该层对应的候选框位置识别模块中1*1*4的卷积处理后,输出得是100*100*4的特征图,4是代表位置坐标有4个。
每层输出的候选框分类特征信息可以是2维的特征向量,分别代表行人框的归属于(前景/背景)的概率。
每层输出的候选框内图像特征信息可以是128维的特征向量,分别代表该人脸框提取到的128维图像特征信息。
如此,本实施例提供了一种轻量级的多任务框架,可以同时提取包含人脸的候选框 在待处理图像中的位置并提取候选框中的图像特征信息,将这些信息输入至后续识别跟踪模块进行进一步处理时,可以减少目标跟踪ID丢失的问题。由于上述采用的网络架构为轻量级的,可以配置于处理能力有限的终端设备,例如,移动端机器人。
请参照图8,本实施例还提供一种用于对所述特征提取模型进行训练的模型训练方法,下面对该方法的各个步骤进行详细解释。
步骤S210,获取训练样本,所述训练样本包括被标注为相同目标的两张人脸图像和被标注为不同目标的一张人脸图像,所述训练样本中携带有人脸框的位置尺寸信息。
请参照图9,在本实施例中,所述训练样本中可以包括3个人脸图像Face1、Face2、Face3。其中,Face1、Face2被标注为相同目标,Face3被标注为不同目标,并且,3个人脸图像中还标注有人脸框的位置和尺寸信息。
步骤S220,通过图2所示步骤S110-步骤S130对所述人脸图像进行特征提取。
在本实施例中,可以通过上述特征信息提取方法分别对3个人脸图像进行特征提取,获得各人脸图像对应的候选框位置特征信息、候选框分类特征信息以及候选框内图像特征信息。
步骤S230,根据所述人脸图像中人脸框的位置尺寸信息,采用回归函数作为损失函数对提取所述候选框位置特征信息的部分进行网络参数调整;根据所述人脸图像中人脸框的位置尺寸信息,采用分类函数作为损失函数对提取所述候选框分类特征信息的部分进行网络参数调整;结合所述训练样本中标注信息,利用孪生网络采用三元损失函数对提取所述候选框内图像特征信息的部分进行网络参数调整。
具体地,在本实施例中,针对提取所述候选框位置特征信息的部分,可以根据训练样本中所述人脸图像中人脸框的位置尺寸信息,采用回归函数作为损失函数进行网络参数调整。例如,可以采用Smooth L1 Loss作为损失函数。
针对提取所述候选框分类特征信息的部分,可以根据训练样本中所述人脸图像中人脸框的位置尺寸信息,采用分类函数作为损失函数进行网络参数调整。例如,可以采用Softmax作为损失函数。
其中,Softmax函数的表达形式可以如下:
Figure PCTCN2021131681-appb-000001
针对提取所述候选框内图像特征信息的部分,可以结合所述训练样本中3个人脸图 像是否为相同目标的标注信息,利用孪生网络采用三元损失函数进行网络参数调整。利用孪生网络采用Triplet loss函数作为损失函数。
其中,Triplet loss函数的表达形式可以如下:
Figure PCTCN2021131681-appb-000002
每次会分别将训练样本中的3张人脸图像分别输入所述特征提取模型得到三组特征向量。每一批数据计算32组数据的loss函数进行反向梯度求导。
需要说明的是,在本实施例中,所述特征信息提取方法和所述模型训练方法,可以由不同的电子设备执行,也可以由相同的电子设备在不同阶段执行,在本实施例中不做限定。
请参照图10,本实施例还提供一种特征信息提取装置140,所述特征信息提取装置140包括至少一个可以软件形式存储于机器可读存储介质120中的功能模块。从功能上划分,所述特征信息提取装置140可以包括数据获取模块141、整体特征提取模块142及候选框特征提取模块143。
所述数据获取模块141用于获取待处理图像。
本实施例中,所述数据获取模块141可用于执行图2所示的步骤S110,关于所述数据获取模块141的具体描述可参对所述步骤S110的描述。
所述整体特征提取模块142用于对所述待处理图像进行特征提取,获得所述待处理图像的第一特征信息。
本实施例中,所述整体特征提取模块142可用于执行图2所示的步骤S120,关于所述整体特征提取模块142的具体描述可参对所述步骤S120的描述。
所述候选框特征提取模块143用于将所述第一特征信息分别输入候选框位置识别模块、候选框分类模块及候选框特征提取模块143进行处理,获得根据所述第一特征信息识别出的各候选框对应的候选框位置特征信息、候选框分类特征信息以及候选框内图像特征信息。
本实施例中,所述候选框特征提取模块143可用于执行图2所示的步骤S130,关于所述候选框特征提取模块143的具体描述可参对所述步骤S130的描述。
请参照图11,本实施例还提供一种模型训练装置150,所述模型训练装置150包括至少一个可以软件形式存储于机器可读存储介质120中的功能模块。从功能上划分,所 述模型训练装置150可以包括样本获取模块151、特征获取模块152及训练模块153。
所述样本获取模块151用于获取训练样本,所述训练样本包括被标注为相同目标的两张人脸图像和被标注为不同目标的一张人脸图像,所述训练样本中携带有人脸框的位置尺寸信息。
本实施例中,所述样本获取模块151可用于执行图8所示的步骤S210,关于所述样本获取模块151的具体描述可参对所述步骤S210的描述。
所述特征获取模块152用于通过所述特征信息提取方法对所述人脸图像进行特征提取。
本实施例中,所述特征获取模块152可用于执行图8所示的步骤S220,关于所述特征获取模块152的具体描述可参对所述步骤S220的描述。
所述训练模块153用于根据所述人脸图像中人脸框的位置尺寸信息,采用回归函数作为损失函数对提取所述候选框位置特征信息的部分进行网络参数调整;根据所述人脸图像中人脸框的位置尺寸信息,采用分类函数作为损失函数对提取所述候选框分类特征信息的部分进行网络参数调整;结合所述训练样本中标注信息,利用孪生网路采用三元损失函数对提取所述候选框内图像特征信息的部分进行网络参数调整。
本实施例中,所述训练模块153可用于执行图8所示的步骤S230,关于所述训练模块153的具体描述可参对所述步骤S230的描述。
综上所述,本申请实施例提供的特征信息提取方法、模型训练方法、装置及电子设备,在对待处理图像进行整体图像特征提取的基础上,进行了进一步的特征提取,获得各候选框对应的候选框位置特征信息、候选框分类特征信息以及候选框内图像特征信息。如此,在将根据特征信息进行后续跟踪识别时,即使出现人脸重叠或未识别到人脸的图像帧,还可以依据候选框内的图像特征信息进行行人Reid,可以减少多目标跟踪时的ID丢失问题。
在本申请所提供的实施例中,应该理解到,所揭露的装置和方法,也可以通过其它的方式实现。以上所描述的装置实施例仅仅是示意性的,例如,附图中的流程图和框图显示了根据本申请的多个实施例的装置、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段或代码的一部分,所述模块、程序段或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替换的实现方式中,方框中所标注的功能也可 以以不同于附图中所标注的顺序发生。例如,两个连续的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或动作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。
另外,在本申请各个实施例中的各功能模块可以集成在一起形成一个独立的部分,也可以是各个模块单独存在,也可以两个或两个以上模块集成形成一个独立的部分。
所述功能如果以软件功能模块的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。
需要说明的是,在本文中,诸如第一和第二等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来,而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的过程、方法、物品或者设备中还存在另外的相同要素。
以上所述,仅为本申请的各种实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应所述以权利要求的保护范围为准。

Claims (10)

  1. A feature information extraction method, characterized in that the method comprises:
    acquiring an image to be processed;
    performing feature extraction on the image to be processed to obtain overall feature information of the image to be processed;
    inputting the overall feature information respectively into a candidate frame position identification module, a candidate frame classification module and a candidate frame feature extraction module for processing, to obtain candidate frame position feature information, candidate frame classification feature information and image feature information in the candidate frame corresponding to each candidate frame identified according to the overall feature information.
  2. The method according to claim 1, characterized in that the method further comprises:
    identifying and tracking the face in the image to be processed according to the candidate frame position feature information, candidate frame classification feature information and image feature information in the candidate frame of each candidate frame.
  3. The method according to claim 1, characterized in that the step of performing feature extraction on the image to be processed to obtain the overall feature information of the image to be processed comprises:
    performing feature extraction on the image to be processed through a backbone network to obtain first feature information;
    inputting the first feature information into a cross-stage-connected multi-layer feature pyramid network for processing, and obtaining second feature information output by the feature pyramid network at each level as the overall feature information of the image to be processed.
  4. The method according to claim 3, characterized in that the step of inputting the first feature information into the cross-stage-connected multi-layer feature pyramid network for processing comprises:
    respectively inputting the first feature information into the multi-layer feature pyramid networks having a hierarchical relationship;
    for the first-layer feature pyramid network among the multiple layers of feature pyramid networks, processing the input data through the bottom-up module and the top-down module of the first-layer feature pyramid network, and obtaining the data output by the top-down module of the first-layer feature pyramid network as the second feature information output by that layer;
    for each non-first-layer feature pyramid network among the multiple layers of feature pyramid networks, processing the input data through the bottom-up module and the top-down module of that non-first-layer feature pyramid network, and, after fusing the data output by the bottom-up module of that non-first-layer feature pyramid network with the data output by the bottom-up module of the previous-layer feature pyramid network, using the fused data as the second feature information output by that layer.
  5. The method according to claim 4, characterized in that the step of inputting the overall feature information respectively into the candidate frame position identification module, the candidate frame classification module and the candidate frame feature extraction module for processing comprises:
    respectively inputting the second feature information of each layer into the candidate frame position identification module, candidate frame classification module and candidate frame feature extraction module corresponding to that layer, to obtain the candidate frame position feature information, candidate frame classification feature information and image feature information in the candidate frame of each layer;
    splicing the candidate frame position feature information output by each layer to obtain spliced candidate frame position feature information;
    splicing the candidate frame classification feature information output by each layer to obtain spliced candidate frame classification feature information;
    splicing the image feature information in the candidate frame output by each layer to obtain spliced image feature information in the candidate frame.
  6. The method according to claim 3, characterized in that the step of performing feature extraction on the image to be processed through the backbone network to obtain the first feature information comprises:
    performing feature extraction on the image to be processed through a residual network to obtain the first feature information.
  7. A model training method, characterized in that the method comprises:
    acquiring training samples, the training samples comprising two face images marked as the same target and one face image marked as a different target, the training samples carrying position and size information of the face frame;
    performing feature extraction on the face images through the feature information extraction method according to any one of claims 1-6;
    according to the position and size information of the face frame in the face image, adjusting, with a regression function as the loss function, the network parameters of the part that extracts the candidate frame position feature information;
    according to the position and size information of the face frame in the face image, adjusting, with a classification function as the loss function, the network parameters of the part that extracts the candidate frame classification feature information;
    in combination with the annotation information in the training samples, adjusting, with a Siamese network and a ternary loss function, the network parameters of the part that extracts the image feature information in the candidate frame.
  8. A feature information extraction apparatus, characterized in that the apparatus comprises:
    a data acquisition module, configured to acquire an image to be processed;
    an overall feature extraction module, configured to perform feature extraction on the image to be processed to obtain first feature information of the image to be processed;
    a candidate frame feature extraction module, configured to input the first feature information respectively into a candidate frame position identification module, a candidate frame classification module and a candidate frame feature extraction module for processing, to obtain candidate frame position feature information, candidate frame classification feature information and image feature information in the candidate frame corresponding to each candidate frame identified according to the first feature information.
  9. An electronic device, characterized by comprising a processor and a machine-readable storage medium, the machine-readable storage medium storing machine-executable instructions which, when executed by the processor, implement the method according to any one of claims 1-7.
  10. A machine-readable storage medium, characterized in that the machine-readable storage medium stores machine-executable instructions which, when executed by one or more processors, implement the method according to any one of claims 1-7.
PCT/CN2021/131681 2021-04-01 2021-11-19 特征信息提取方法、模型训练方法、装置及电子设备 WO2022205937A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110357324.0 2021-04-01
CN202110357324.0A CN112926531B (zh) 2021-04-01 2021-04-01 特征信息提取方法、模型训练方法、装置及电子设备

Publications (1)

Publication Number Publication Date
WO2022205937A1 true WO2022205937A1 (zh) 2022-10-06

Family

ID=76173852

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/131681 WO2022205937A1 (zh) 2021-04-01 2021-11-19 特征信息提取方法、模型训练方法、装置及电子设备

Country Status (2)

Country Link
CN (1) CN112926531B (zh)
WO (1) WO2022205937A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115661586A (zh) * 2022-12-09 2023-01-31 云粒智慧科技有限公司 模型训练和人流量统计方法、装置及设备
CN116883951A (zh) * 2023-09-07 2023-10-13 杭州像素元科技有限公司 基于多源信息感知的高速施工员识别方法、装置及其应用
CN117635688B (zh) * 2023-11-28 2024-06-07 广州恒沙数字科技有限公司 一种尺寸测量方法、装置、电子设备及存储介质

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112926531B (zh) * 2021-04-01 2023-09-26 深圳市优必选科技股份有限公司 特征信息提取方法、模型训练方法、装置及电子设备
CN113963150B (zh) * 2021-11-16 2022-04-08 北京中电兴发科技有限公司 一种基于多尺度孪生级联网络的行人重识别方法
CN114639165B (zh) * 2022-03-16 2024-05-10 平安科技(深圳)有限公司 基于人工智能的行人重识别方法、装置、设备及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304820A (zh) * 2018-02-12 2018-07-20 腾讯科技(深圳)有限公司 一种人脸检测方法、装置及终端设备
CN109948568A (zh) * 2019-03-26 2019-06-28 东华大学 基于arm微处理器和深度学习的嵌入式人脸识别系统
CN110163032A (zh) * 2018-02-13 2019-08-23 浙江宇视科技有限公司 一种人脸检测方法及装置
CN112381075A (zh) * 2021-01-18 2021-02-19 北京蒙帕信创科技有限公司 一种机房特定场景下进行人脸识别的方法及系统
CN112926531A (zh) * 2021-04-01 2021-06-08 深圳市优必选科技股份有限公司 特征信息提取方法、模型训练方法、装置及电子设备

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW201123031A (en) * 2009-12-24 2011-07-01 Univ Nat Taiwan Science Tech Robot and method for recognizing human faces and gestures thereof
WO2015083199A1 (en) * 2013-12-04 2015-06-11 J Tech Solutions, Inc. Computer device and method executed by the computer device
CN109934115B (zh) * 2019-02-18 2021-11-02 苏州市科远软件技术开发有限公司 人脸识别模型的构建方法、人脸识别方法及电子设备
CN110210285A (zh) * 2019-04-16 2019-09-06 浙江大华技术股份有限公司 人脸跟踪方法、人脸跟踪装置以及计算机存储介质
CN110321923B (zh) * 2019-05-10 2021-05-04 上海大学 不同尺度感受野特征层融合的目标检测方法、系统及介质
CN110729045A (zh) * 2019-10-12 2020-01-24 闽江学院 一种基于上下文感知残差网络的舌图像分割方法
CN111340039B (zh) * 2020-02-12 2023-10-17 杰创智能科技股份有限公司 一种基于特征选择的目标检测方法
CN111339893B (zh) * 2020-02-21 2022-11-22 哈尔滨工业大学 基于深度学习和无人机的管道检测系统及方法
CN111738280A (zh) * 2020-06-29 2020-10-02 腾讯科技(武汉)有限公司 一种图像识别方法、装置、设备及可读存储介质
CN111914782A (zh) * 2020-08-10 2020-11-10 河南威虎智能科技有限公司 人脸及其特征点的检测方法、装置、电子设备和存储介质

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304820A (zh) * 2018-02-12 2018-07-20 腾讯科技(深圳)有限公司 一种人脸检测方法、装置及终端设备
CN110163032A (zh) * 2018-02-13 2019-08-23 浙江宇视科技有限公司 一种人脸检测方法及装置
CN109948568A (zh) * 2019-03-26 2019-06-28 东华大学 基于arm微处理器和深度学习的嵌入式人脸识别系统
CN112381075A (zh) * 2021-01-18 2021-02-19 北京蒙帕信创科技有限公司 一种机房特定场景下进行人脸识别的方法及系统
CN112926531A (zh) * 2021-04-01 2021-06-08 深圳市优必选科技股份有限公司 特征信息提取方法、模型训练方法、装置及电子设备

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115661586A (zh) * 2022-12-09 2023-01-31 云粒智慧科技有限公司 模型训练和人流量统计方法、装置及设备
CN115661586B (zh) * 2022-12-09 2023-04-18 云粒智慧科技有限公司 模型训练和人流量统计方法、装置及设备
CN116883951A (zh) * 2023-09-07 2023-10-13 杭州像素元科技有限公司 基于多源信息感知的高速施工员识别方法、装置及其应用
CN116883951B (zh) * 2023-09-07 2023-11-10 杭州像素元科技有限公司 基于多源信息感知的高速施工员识别方法、装置及其应用
CN117635688B (zh) * 2023-11-28 2024-06-07 广州恒沙数字科技有限公司 一种尺寸测量方法、装置、电子设备及存储介质

Also Published As

Publication number Publication date
CN112926531B (zh) 2023-09-26
CN112926531A (zh) 2021-06-08

Similar Documents

Publication Publication Date Title
WO2022205937A1 (zh) 特征信息提取方法、模型训练方法、装置及电子设备
CN109344701B (zh) 一种基于Kinect的动态手势识别方法
WO2022126377A1 (zh) 检测车道线的方法、装置、终端设备及可读存储介质
US9619696B2 (en) Duplicate reduction for face detection
WO2020206850A1 (zh) 基于高维图像的图像标注方法和装置
CN109543641B (zh) 一种实时视频的多目标去重方法、终端设备及存储介质
US20120148093A1 (en) Blob Representation in Video Processing
US20230267735A1 (en) Method for structuring pedestrian information, device, apparatus and storage medium
WO2022082999A1 (zh) 一种物体识别方法、装置、终端设备及存储介质
CN109447022B (zh) 一种镜头类型识别方法及装置
CN112154476A (zh) 用于快速对象检测的系统和方法
WO2024077781A1 (zh) 基于卷积神经网络模型的图像识别方法、装置及终端设备
CN113673584A (zh) 一种图像检测方法及相关装置
WO2024001123A1 (zh) 基于神经网络模型的图像识别方法、装置及终端设备
WO2022033264A1 (zh) 人体特征点的筛选方法、装置、电子设备以及存储介质
US11709914B2 (en) Face recognition method, terminal device using the same, and computer readable storage medium
CN109635649B (zh) 一种无人机侦察目标的高速检测方法及系统
CN110070490B (zh) 图像拼接方法和装置
WO2023160061A1 (zh) 图像中运动对象的确定方法、装置、电子设备和存储介质
CN114267076B (zh) 一种图像识别方法、装置、设备及存储介质
CN115841672A (zh) 文字检测识别方法、装置及设备
CN114821777A (zh) 一种手势检测方法、装置、设备及存储介质
KR102224218B1 (ko) 비디오 시간 정보를 활용하는 딥러닝 기반 물체 검출 방법 및 장치
CN113705643A (zh) 一种目标物检测方法、装置以及电子设备
Soon et al. The utilization of feature based Viola-Jones method for face detection in invariant rotation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21934573

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21934573

Country of ref document: EP

Kind code of ref document: A1