WO2022012179A1 - Method, apparatus, device and computer-readable medium for generating a feature extraction network - Google Patents

Method, apparatus, device and computer-readable medium for generating a feature extraction network

Info

Publication number
WO2022012179A1
WO2022012179A1 (PCT/CN2021/096145; CN2021096145W)
Authority
WO
WIPO (PCT)
Prior art keywords
sample
feature map
affine transformation
image
vector
Prior art date
Application number
PCT/CN2021/096145
Other languages
English (en)
French (fr)
Inventor
何轶 (He Yi)
王长虎 (Wang Changhu)
Original Assignee
北京字节跳动网络技术有限公司 (Beijing ByteDance Network Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字节跳动网络技术有限公司 (Beijing ByteDance Network Technology Co., Ltd.)
Publication of WO2022012179A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/14 Transformations for image registration, e.g. adjusting or mapping for alignment of images
    • G06T 3/147 Transformations for image registration using affine transformations
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features

Definitions

  • Embodiments of the present disclosure relate to the field of computer technology, and in particular, to a method, apparatus, electronic device, and computer-readable medium for generating a feature extraction network.
  • Some embodiments of the present disclosure propose a method, apparatus, electronic device, and computer-readable medium for generating a feature extraction network to solve the technical problems mentioned in the background art above.
  • In a first aspect, some embodiments of the present disclosure provide a method for generating a feature extraction network, the method comprising: inputting a first sample image and a second sample image into the feature extraction network respectively to obtain a first sample feature map and a second sample feature map, wherein the second sample image is obtained by performing an affine transformation on the first sample image; performing an affine transformation on the first sample feature map to obtain a first sample affine transformation feature map; for a first vector and a second vector at the same position in the first sample affine transformation feature map and the second sample feature map, determining a loss value of the first vector and the second vector based on a preset loss function; and training the feature extraction network based on the loss value.
  • In a second aspect, some embodiments of the present disclosure provide an apparatus for generating a feature extraction network, the apparatus comprising: a feature map generation unit configured to input a first sample image and a second sample image into the feature extraction network respectively to obtain a first sample feature map and a second sample feature map, wherein the second sample image is obtained by performing an affine transformation on the first sample image; an affine transformation unit configured to perform an affine transformation on the first sample feature map to obtain a first sample affine transformation feature map; a loss determination unit configured to, for a first vector and a second vector at the same position in the first sample affine transformation feature map and the second sample feature map, determine a loss value of the first vector and the second vector based on a preset loss function; and a network training unit configured to train the feature extraction network based on the loss value.
  • In a third aspect, some embodiments of the present disclosure provide an electronic device comprising: one or more processors; and a storage device on which one or more programs are stored, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method described in any implementation of the first aspect, or the method described in any implementation of the third aspect.
  • In a fourth aspect, some embodiments of the present disclosure provide a computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method described in any implementation of the first aspect, or the method described in any implementation of the third aspect.
  • One of the foregoing embodiments of the present disclosure has the following beneficial effect: an affine transformation is performed on the first sample image to obtain the second sample image, and the first sample image and the second sample image are input into the feature extraction network to obtain the feature maps of these images. The feature extraction network can then be optimized using these feature maps together with the map obtained by performing an affine transformation on the first sample feature map: the loss value at each shared position between the affine-transformed first sample feature map and the feature map of the second sample image can be determined, and the feature extraction network can be trained with this loss value. On this basis, the similarity between the features of a picture after affine transformation and the features of the original picture can be improved.
  • FIG. 1 is a schematic diagram of an application scenario of a method for generating a feature extraction network according to some embodiments of the present disclosure
  • FIG. 2 is a flowchart of some embodiments of a method of generating a feature extraction network according to the present disclosure
  • FIG. 3 is a flowchart of other embodiments of methods of generating a feature extraction network according to the present disclosure
  • FIG. 4 is a schematic structural diagram of some embodiments of an apparatus for generating a feature extraction network according to the present disclosure
  • FIG. 5 is a schematic structural diagram of an electronic device suitable for implementing some embodiments of the present disclosure.
  • FIG. 1 is a schematic diagram 100 of an application scenario of a method for generating a feature extraction network according to some embodiments of the present disclosure.
  • the computing device 101 can respectively input the first sample image 102 and the second sample image 103 into the feature extraction network 104 to obtain the first sample feature map 105 and the second sample feature map 106 .
  • As an example, the first sample image 102 may be a face image, and the second sample image 103 is that face image after an affine transformation. An affine transformation, for example a translation, is performed on the first sample feature map 105, yielding the first sample affine transformation feature map 107.
  • Based on a preset loss function, the loss value 108 of the first vector and the second vector at the same position in the first sample affine transformation feature map 107 and the second sample feature map 106 can be determined. As an example, for a given pixel position in the first sample image 102, the first position it reaches in the first sample affine transformation feature map 107 after feature extraction followed by affine transformation, and the second position it reaches in the second sample feature map 106 after affine transformation followed by feature extraction, may be the same position.
  • the above loss function may be a Euclidean distance loss function.
  • Training the feature extraction network 104 based on the loss value 108 realizes optimization of the feature extraction network and, on this basis, can improve the similarity between the features extracted from an image and those extracted from its affine-transformed counterpart.
  • The method for generating a feature extraction network may be executed by the computing device 101 or by a server; the execution body of the method may also be a device formed by integrating the computing device 101 and the server through a network, or various software programs.
  • the computing device 101 may be various electronic devices with information processing capabilities, including but not limited to smart phones, tablet computers, e-book readers, laptop computers, desktop computers, and the like.
  • the execution body may also be embodied as a server, software, or the like.
  • When the execution body is software, it can be installed in the electronic devices listed above. It can be implemented, for example, as multiple software programs or software modules for providing distributed services, or as a single software program or module. No specific limitation is made here.
  • the method for generating a feature extraction network includes the following steps:
  • Step 201: input the first sample image and the second sample image into the feature extraction network respectively to obtain a first sample feature map and a second sample feature map, wherein the second sample image is obtained by performing an affine transformation on the first sample image.
  • the first sample image may be any image.
  • The first sample feature map and the second sample feature map may carry features of the image such as size features and brightness features.
  • In some optional implementations, the first sample feature map and the second sample feature map have color features, texture features, shape features, and spatial relationship features.
  • affine transformations may include operations such as translation, rotation, scaling, shearing, or reflection.
  • the first sample image may be rotated to obtain the second sample image, or the first sample image may be scaled to obtain the second sample image.
  • The feature extraction network may be any of various neural networks used for feature extraction, for example a convolutional neural network or a recurrent neural network.
  • the first sample feature map and the second sample feature map may be images having features of the first sample image and the second sample image, respectively.
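To make the data flow of step 201 concrete, here is a minimal sketch assuming a PyTorch-style implementation (the disclosure does not name a framework); the backbone `feature_net`, the helper `make_affine_pair`, and the example translation matrix `theta` are illustrative stand-ins rather than the patent's own components:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative feature extraction network: any backbone that maps an
# image to a spatial feature map would do.
feature_net = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
)

def make_affine_pair(first_image: torch.Tensor, theta: torch.Tensor) -> torch.Tensor:
    """first_image: (N, 3, H, W); theta: (N, 2, 3) affine matrices.
    Returns the second sample image, i.e. the affine-transformed first image."""
    grid = F.affine_grid(theta, size=first_image.shape, align_corners=False)
    return F.grid_sample(first_image, grid, align_corners=False)

first_image = torch.rand(1, 3, 64, 64)
theta = torch.tensor([[[1.0, 0.0, 0.2],   # identity rotation/scale plus
                       [0.0, 1.0, 0.1]]]) # a small x/y translation
second_image = make_affine_pair(first_image, theta)

first_feat = feature_net(first_image)    # first sample feature map
second_feat = feature_net(second_image)  # second sample feature map
```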
  • Step 202: perform an affine transformation on the first sample feature map to obtain a first sample affine transformation feature map.
  • translation transformation may be performed on the first sample feature map to obtain the first sample affine transformation feature map.
  • the first sample feature map may also be rotated to obtain the first sample affine transformation feature map.
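Continuing the same hypothetical sketch, step 202 applies the identical transformation parameters to the first sample feature map; only the sampling grid is rebuilt, because the feature map has its own spatial size and channel count:

```python
# Apply the same affine parameters to the first sample feature map.
grid_feat = F.affine_grid(theta, size=first_feat.shape, align_corners=False)
first_feat_affine = F.grid_sample(first_feat, grid_feat, align_corners=False)
# first_feat_affine is the first sample affine transformation feature map.
```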
  • Step 203: for the first vector and the second vector at the same position in the first sample affine transformation feature map and the second sample feature map, determine the loss value of the first vector and the second vector based on a preset loss function.
  • The same position may be a position having the same coordinates, in the same coordinate system, in the first sample affine transformation feature map and the second sample feature map.
  • the first vector and the second vector are feature vectors at the same location in the first sample affine transformation feature map and the second sample feature map.
  • the loss function is a function that defines the difference between the fitted and true results.
  • the loss function may be an absolute value loss function or a squared loss function.
  • the loss value may be the degree of image dissimilarity between the affine transformation feature map of the first sample and the feature map of the second sample.
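One hedged reading of this step with a squared loss, continuing the sketch above: take the channel vector at every spatial position of the two maps as the first and second vectors, and average the squared differences (the function name is illustrative):

```python
def position_wise_sq_loss(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # a, b: (N, C, H, W); the C-dimensional vector at each position (h, w)
    # plays the role of the first/second vector at the "same position".
    return ((a - b) ** 2).sum(dim=1).mean()

loss = position_wise_sq_loss(first_feat_affine, second_feat)
```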
  • As an example, the vector corresponding to each pixel in the first sample affine transformation feature map and in the second sample feature map may be normalized, yielding a set of normalized vectors for the first sample affine transformation feature map and a set of normalized vectors for the second sample feature map.
  • For each normalized vector of the first sample affine transformation feature map and the corresponding normalized vector of the second sample feature map, the loss value is determined by the following formula:
  • L = -∑_i log( p_i·q_i + (1 - p_i)·(1 - q_i) )
  • Here i indexes the i-th bit of the hash code corresponding to the normalized vector of the first sample affine transformation feature map and of the hash code corresponding to the normalized vector of the second sample feature map at the same position. p_i is the probability that the i-th bit of the hash code corresponding to the normalized vector of the first sample affine transformation feature map takes the value 1; for example, elements of a normalized vector greater than 0.5 may be coded as hash bit 1 and elements smaller than 0.5 as hash bit 0. q_i is the probability that the i-th bit of the hash code corresponding to the normalized vector of the second sample feature map takes the value 1. The product p_i·q_i is the probability that both i-th bits take the value 1, and (1 - p_i)·(1 - q_i) is the probability that both take the value 0, so the sum p_i·q_i + (1 - p_i)·(1 - q_i) measures how well the predicted i-th bits of the two hash codes agree. Summing the per-bit terms over every element of the two normalized vector sets gives the loss value between the normalized vector of the first sample affine transformation feature map and the normalized vector of the second sample feature map at the same position.
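A sketch of the hash-code likelihood loss just described, under the same PyTorch assumption. Two caveats: the negative-log form mirrors the reconstruction above rather than a formula printed in the source, and using a sigmoid as the per-element normalization is an assumption (the text only says the vectors are normalized, with 0.5 as the binarization threshold):

```python
def hash_agreement_loss(a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    p = torch.sigmoid(a)  # p_i: P(i-th bit = 1), transformed first-sample map
    q = torch.sigmoid(b)  # q_i: P(i-th bit = 1), second-sample map
    agree = p * q + (1 - p) * (1 - q)  # P(the i-th bits agree)
    return -(agree + eps).log().sum(dim=1).mean()

loss = hash_agreement_loss(first_feat_affine, second_feat)
```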
  • the loss function may be a maximum likelihood estimation function, a divergence function, or a Hamming distance.
  • Step 204: train the feature extraction network based on the loss value.
  • the weights in the feature extraction network can be optimized by gradient descent to minimize the loss.
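Putting steps 201-204 together, a minimal training loop under the same assumptions; SGD stands in for the unspecified gradient-descent variant, and the fixed `theta` would normally be sampled per batch:

```python
optimizer = torch.optim.SGD(feature_net.parameters(), lr=1e-3)

for step in range(1000):
    second_image = make_affine_pair(first_image, theta)   # step 201
    first_feat = feature_net(first_image)
    second_feat = feature_net(second_image)
    grid_feat = F.affine_grid(theta, size=first_feat.shape, align_corners=False)
    first_feat_affine = F.grid_sample(first_feat, grid_feat, align_corners=False)  # step 202
    loss = position_wise_sq_loss(first_feat_affine, second_feat)  # step 203
    optimizer.zero_grad()
    loss.backward()   # step 204: gradient descent on the network weights
    optimizer.step()
```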
  • One of the above-mentioned embodiments of the present disclosure has the following beneficial effect: based on the loss function, the picture obtained by performing an affine transformation on the first sample image and then extracting features is compared with the picture obtained by extracting features from the first sample image and then applying the affine transformation; this comparison yields the loss value for training the neural network. Using this loss value, the feature extraction network can be optimized and trained, thereby realizing the optimization of the feature extraction network.
  • the method for generating a feature extraction network includes the following steps:
  • Step 301: preprocess a first image to obtain the first sample image.
  • grayscale processing, geometric transformation, and image enhancement may be performed on the first image.
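As a purely illustrative reading of step 301, an OpenCV sketch using grayscale conversion, a small rotation as the geometric transformation, and histogram equalization as the image enhancement; the patent does not prescribe these particular operations:

```python
import cv2
import numpy as np

def preprocess(first_image: np.ndarray) -> np.ndarray:
    """first_image: H x W x 3 BGR uint8 array; returns the first sample image."""
    gray = cv2.cvtColor(first_image, cv2.COLOR_BGR2GRAY)  # grayscale processing
    h, w = gray.shape
    m = cv2.getRotationMatrix2D((w / 2, h / 2), 5, 1.0)   # 5-degree rotation
    rotated = cv2.warpAffine(gray, m, (w, h))             # geometric transformation
    return cv2.equalizeHist(rotated)                      # image enhancement
```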
  • Step 302: input the first sample image and the second sample image into the feature extraction network respectively to obtain a first sample feature map and a second sample feature map, wherein the second sample image is obtained by performing an affine transformation on the first sample image.
  • Step 303: perform an affine transformation on the first sample feature map to obtain a first sample affine transformation feature map.
  • Step 304: for the first vector and the second vector at the same position in the first sample affine transformation feature map and the second sample feature map, determine the loss value of the first vector and the second vector based on a preset loss function.
  • Step 305: train the feature extraction network based on the loss value.
  • For the specific implementation of steps 302, 303, 304 and 305 and the technical effects they bring, reference may be made to steps 201, 202, 203 and 204 in the embodiment corresponding to FIG. 2, which are not repeated here.
  • The method for generating a feature extraction network disclosed in some embodiments of the present disclosure preprocesses the image with grayscale processing, geometric transformation and image enhancement, which can remove irrelevant information from the image and thereby improve the training effect of the network; the improved training effect in turn improves the accuracy of the features extracted by the network.
  • As an implementation of the methods shown in the above figures, the present disclosure provides some embodiments of an apparatus for generating a feature extraction network. These apparatus embodiments correspond to the method embodiments shown in FIG. 2, and the apparatus can be applied to various electronic devices.
  • an apparatus 400 for generating a feature extraction network in some embodiments includes: a feature map generating unit 401 , an affine transformation unit 402 , a loss value determining unit 403 and a network training unit 404 .
  • The feature map generation unit 401 is configured to input the first sample image and the second sample image into the feature extraction network respectively to obtain the first sample feature map and the second sample feature map, wherein the second sample image is obtained by performing an affine transformation on the first sample image; the affine transformation unit 402 is configured to perform an affine transformation on the first sample feature map to obtain the first sample affine transformation feature map; the loss value determination unit 403 is configured to, for the first vector and the second vector at the same position in the first sample affine transformation feature map and the second sample feature map, determine the loss value of the first vector and the second vector based on a preset loss function; and the network training unit 404 is configured to train the feature extraction network based on the loss value.
  • the above-mentioned apparatus further includes: an image preprocessing unit configured to preprocess the first image to obtain the above-mentioned first sample image.
  • the above-mentioned first sample feature map and second sample feature map include: color features, texture features, shape features, and spatial relationship features.
  • the above-mentioned loss function is one of the following: a maximum likelihood estimation function, a divergence function, and a Hamming distance.
  • the units recorded in the apparatus 400 correspond to the respective steps in the method described with reference to FIG. 2 . Therefore, the operations, features, and beneficial effects described above with respect to the method are also applicable to the apparatus 400 and the units included therein, and details are not described herein again.
  • The electronic device 500 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 501, which can execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage device 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the electronic device 500.
  • the processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504.
  • An input/output (I/O) interface 505 is also connected to bus 504 .
  • The following devices may be connected to the I/O interface 505: an input device 506 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 507 including, for example, a liquid crystal display (LCD), speaker, vibrator, etc.; a storage device 508 including, for example, a magnetic tape, hard disk, etc.; and a communication device 509.
  • The communication device 509 may allow the electronic device 500 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 5 shows the electronic device 500 with various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided; more or fewer devices may alternatively be implemented or provided. Each block shown in FIG. 5 may represent one device, or multiple devices as required.
  • the processes described above with reference to the flowcharts may be implemented as computer software programs.
  • some embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
  • the computer program may be downloaded and installed from a network via communication device 509, or from storage device 508, or from ROM 502.
  • When the computer program is executed by the processing device 501, the above-mentioned functions defined in the methods of some embodiments of the present disclosure are performed.
  • the computer-readable medium described in some embodiments of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • The computer-readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium; such a medium can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted using any suitable medium including, but not limited to, electrical wire, optical fiber cable, RF (radio frequency), etc., or any suitable combination of the foregoing.
  • The client and server can communicate using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication (e.g., a communication network) in any form or medium.
  • Examples of communication networks include local area networks ("LAN"), wide area networks ("WAN"), internets (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any network currently known or developed in the future.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic apparatus; or may exist alone without being incorporated into the electronic apparatus.
  • The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: input the first sample image and the second sample image into the feature extraction network respectively to obtain a first sample feature map and a second sample feature map, wherein the second sample image is obtained by performing an affine transformation on the first sample image; perform an affine transformation on the first sample feature map to obtain a first sample affine transformation feature map; for the first vector and the second vector at the same position in the first sample affine transformation feature map and the second sample feature map, determine the loss value of the first vector and the second vector based on a preset loss function; and train the feature extraction network based on the loss value.
  • Computer program code for carrying out operations of some embodiments of the present disclosure may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • The remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider).
  • Each block in the flowchart or block diagrams may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • Each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented in dedicated hardware-based systems that perform the specified functions or operations, or in a combination of dedicated hardware and computer instructions.
  • the units described in some embodiments of the present disclosure may be implemented by means of software, and may also be implemented by means of hardware.
  • The described units may also be provided in a processor, for example described as: a processor including a feature map generation unit, an affine transformation unit, a loss value determination unit and a network training unit. The names of these units do not in some cases constitute a limitation on the units themselves; for example, the feature map generation unit may also be described as "a unit for generating feature maps".
  • exemplary types of hardware logic components include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chips (SOCs), Complex Programmable Logical Devices (CPLDs) and more.
  • According to one or more embodiments of the present disclosure, a method for generating a feature extraction network comprises: inputting a first sample image and a second sample image into the feature extraction network respectively to obtain a first sample feature map and a second sample feature map, wherein the second sample image is obtained by performing an affine transformation on the first sample image; performing an affine transformation on the first sample feature map to obtain a first sample affine transformation feature map; for a first vector and a second vector at the same position in the first sample affine transformation feature map and the second sample feature map, determining a loss value of the first vector and the second vector based on a preset loss function; and training the feature extraction network based on the loss value.
  • According to one or more embodiments of the present disclosure, before the first sample image and the second sample image are respectively input into the feature extraction network to obtain the first sample feature map and the second sample feature map, wherein the second sample image is obtained by performing an affine transformation on the first sample image, the method further includes: preprocessing a first image to obtain the first sample image.
  • the above-mentioned first sample feature map and second sample feature map include: color features, texture features, shape features, and spatial relationship features.
  • the above-mentioned loss function is one of the following: a maximum likelihood estimation function, a divergence function, and a Hamming distance.
  • The apparatus for generating a feature extraction network includes: a feature map generation unit configured to input a first sample image and a second sample image into the feature extraction network respectively to obtain a first sample feature map and a second sample feature map, wherein the second sample image is obtained by performing an affine transformation on the first sample image; an affine transformation unit configured to perform an affine transformation on the first sample feature map to obtain a first sample affine transformation feature map; a loss value determination unit configured to, for a first vector and a second vector at the same position in the first sample affine transformation feature map and the second sample feature map, determine a loss value of the first vector and the second vector based on a preset loss function; and a network training unit configured to train the feature extraction network based on the loss value.
  • the above-mentioned apparatus further includes: an image preprocessing unit configured to preprocess the first image to obtain the above-mentioned first sample image.
  • the above-mentioned first sample feature map and second sample feature map include: color features, texture features, shape features, and spatial relationship features.
  • the above-mentioned loss function is one of the following: a maximum likelihood estimation function, a divergence function, and a Hamming distance.
  • An electronic device comprises: one or more processors; and a storage device on which one or more programs are stored, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement any of the methods described above.
  • A computer-readable medium has a computer program stored thereon, wherein the program, when executed by a processor, implements the method described in any of the foregoing embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A method, apparatus, electronic device and computer-readable medium for generating a feature extraction network. The method includes: inputting a first sample image and a second sample image into the feature extraction network respectively to obtain a first sample feature map and a second sample feature map, where the second sample image is obtained by performing an affine transformation on the first sample image (201); performing an affine transformation on the first sample feature map to obtain a first sample affine transformation feature map (202); for a first vector and a second vector at the same position in the first sample affine transformation feature map and the second sample feature map, determining a loss value of the first vector and the second vector based on a preset loss function (203); and training the feature extraction network based on the loss value (204). The method achieves training and optimization of the feature extraction network, so that the features extracted from an affine-transformed picture are close to those extracted from the original picture.

Description

Method, Apparatus, Device and Computer-Readable Medium for Generating a Feature Extraction Network
CROSS-REFERENCE TO RELATED APPLICATION
This application claims priority to Chinese Patent Application No. 202010685579.5, filed on July 16, 2020 and entitled "Method, Apparatus, Device and Computer-Readable Medium for Generating a Feature Extraction Network", the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
Embodiments of the present disclosure relate to the field of computer technology, and in particular to a method, apparatus, electronic device and computer-readable medium for generating a feature extraction network.
BACKGROUND
With the development of the Internet and the spread of artificial intelligence technology centered on deep learning, computer vision technology touches every area of people's lives. In practice, the features extracted from a picture that has undergone an affine transformation differ considerably from the features extracted from the original picture, which in turn affects the accuracy of subsequent similarity calculations.
SUMMARY
This summary is provided to introduce concepts in a brief form that are described in detail in the detailed description below. This summary is not intended to identify key or essential features of the claimed technical solution, nor is it intended to limit the scope of the claimed technical solution.
Some embodiments of the present disclosure propose a method, apparatus, electronic device and computer-readable medium for generating a feature extraction network, to solve the technical problem mentioned in the background above.
In a first aspect, some embodiments of the present disclosure provide a method for generating a feature extraction network, the method including: inputting a first sample image and a second sample image into the feature extraction network respectively to obtain a first sample feature map and a second sample feature map, where the second sample image is obtained by performing an affine transformation on the first sample image; performing an affine transformation on the first sample feature map to obtain a first sample affine transformation feature map; for a first vector and a second vector at the same position in the first sample affine transformation feature map and the second sample feature map, determining a loss value of the first vector and the second vector based on a preset loss function; and training the feature extraction network based on the loss value.
In a second aspect, some embodiments of the present disclosure provide an apparatus for generating a feature extraction network, the apparatus including: a feature map generation unit configured to input a first sample image and a second sample image into the feature extraction network respectively to obtain a first sample feature map and a second sample feature map, where the second sample image is obtained by performing an affine transformation on the first sample image; an affine transformation unit configured to perform an affine transformation on the first sample feature map to obtain a first sample affine transformation feature map; a loss determination unit configured to, for a first vector and a second vector at the same position in the first sample affine transformation feature map and the second sample feature map, determine a loss value of the first vector and the second vector based on a preset loss function; and a network training unit configured to train the feature extraction network based on the loss value.
In a third aspect, some embodiments of the present disclosure provide an electronic device including: one or more processors; and a storage device on which one or more programs are stored, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method described in any implementation of the first aspect, or the method described in any implementation of the third aspect.
In a fourth aspect, some embodiments of the present disclosure provide a computer-readable medium on which a computer program is stored, where the program, when executed by a processor, implements the method described in any implementation of the first aspect, or the method described in any implementation of the third aspect.
One of the above embodiments of the present disclosure has the following beneficial effect: an affine transformation is performed on the first sample image to obtain the second sample image, and the first sample image and the second sample image are input into the feature extraction network to obtain the feature maps of these images. The feature extraction network can be optimized using these feature maps together with the map obtained by performing an affine transformation on the first sample feature map: the loss value at the same position between the affine-transformed first sample feature map and the feature map of the second sample image can be determined, and the feature extraction network can be trained with this loss value. On this basis, the similarity between the features of a picture after affine transformation and the features of the original picture can be improved.
BRIEF DESCRIPTION OF THE DRAWINGS
The above and other features, advantages and aspects of embodiments of the present disclosure will become more apparent with reference to the following detailed description taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic and that parts and elements are not necessarily drawn to scale.
FIG. 1 is a schematic diagram of an application scenario of a method for generating a feature extraction network according to some embodiments of the present disclosure;
FIG. 2 is a flowchart of some embodiments of a method for generating a feature extraction network according to the present disclosure;
FIG. 3 is a flowchart of other embodiments of a method for generating a feature extraction network according to the present disclosure;
FIG. 4 is a schematic structural diagram of some embodiments of an apparatus for generating a feature extraction network according to the present disclosure;
FIG. 5 is a schematic structural diagram of an electronic device suitable for implementing some embodiments of the present disclosure.
DETAILED DESCRIPTION
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be construed as limited to the embodiments set forth here; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the protection scope of the present disclosure.
It should also be noted that, for ease of description, only the parts related to the relevant invention are shown in the drawings. In the absence of conflict, the embodiments of the present disclosure and the features in the embodiments may be combined with each other.
It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are only used to distinguish different apparatuses, modules or units, and are not used to limit the order or interdependence of the functions performed by these apparatuses, modules or units.
It should be noted that the modifiers "a/an" and "multiple" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".
The names of messages or information exchanged between multiple apparatuses in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of these messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings and in conjunction with embodiments.
FIG. 1 is a schematic diagram 100 of an application scenario of a method for generating a feature extraction network according to some embodiments of the present disclosure.
As shown in FIG. 1, the computing device 101 may input a first sample image 102 and a second sample image 103 into the feature extraction network 104 respectively, to obtain a first sample feature map 105 and a second sample feature map 106. As an example, the first sample image 102 may be a face image, and the second sample image 103 is that face image after an affine transformation. An affine transformation, for example a translation, is performed on the first sample feature map 105 to obtain a first sample affine transformation feature map 107. Based on a preset Euclidean distance loss function, a loss value 108 of a first vector and a second vector at the same position in the first sample affine transformation feature map 107 and the second sample feature map 106 can be determined. As an example, for a given pixel position in the first sample image 102, the first position it reaches in the first sample affine transformation feature map 107 after feature extraction followed by affine transformation, and the second position it reaches in the second sample feature map 106 after affine transformation followed by feature extraction, may be the same position. The loss function may be a Euclidean distance loss function. Training the feature extraction network 104 based on the loss value 108 realizes optimization of the feature extraction network and, on this basis, can improve the similarity between the features extracted from a picture and from its affine-transformed counterpart. It can be understood that the method for generating a feature extraction network may be executed by the computing device 101 or by a server; the execution body of the method may also be a device formed by integrating the computing device 101 and the server through a network, or various software programs. The computing device 101 may be any electronic device with information processing capability, including but not limited to a smartphone, tablet computer, e-book reader, laptop portable computer, desktop computer, and the like. The execution body may also be embodied as a server, software, etc. When the execution body is software, it may be installed in the electronic devices listed above; it may be implemented, for example, as multiple pieces of software or software modules for providing distributed services, or as a single piece of software or software module. No specific limitation is made here.
It should be understood that the number of computing devices in FIG. 1 is merely illustrative. There may be any number of computing devices according to implementation needs.
With continued reference to FIG. 2, a flow 200 of some embodiments of a method for generating a feature extraction network according to the present disclosure is shown. The method for generating a feature extraction network includes the following steps:
Step 201: input a first sample image and a second sample image into the feature extraction network respectively to obtain a first sample feature map and a second sample feature map, where the second sample image is obtained by performing an affine transformation on the first sample image.
In some embodiments, the first sample image may be any image.
In some embodiments, the first sample feature map and the second sample feature map may carry features of the image such as size features and brightness features.
In some optional implementations of some embodiments, the first sample feature map and the second sample feature map have color features, texture features, shape features and spatial relationship features.
In some embodiments, the affine transformation may include operations such as translation, rotation, scaling, shearing or reflection. As an example, the first sample image may be rotated to obtain the second sample image, or the first sample image may be scaled to obtain the second sample image.
In some embodiments, the feature extraction network may be any of various neural networks used for feature extraction, for example a convolutional neural network or a recurrent neural network. The first sample feature map and the second sample feature map may be images carrying the features of the first sample image and the second sample image, respectively.
Step 202: perform an affine transformation on the first sample feature map to obtain a first sample affine transformation feature map.
In some embodiments, as an example, a translation may be applied to the first sample feature map to obtain the first sample affine transformation feature map, or the first sample feature map may be rotated to obtain the first sample affine transformation feature map.
Step 203: for a first vector and a second vector at the same position in the first sample affine transformation feature map and the second sample feature map, determine a loss value of the first vector and the second vector based on a preset loss function.
In some embodiments, the same position may be a position having the same coordinates, in the same coordinate system, in the first sample affine transformation feature map and the second sample feature map.
In some embodiments, the first vector and the second vector are the feature vectors at the same position in the first sample affine transformation feature map and the second sample feature map.
In some embodiments, the loss function is a function defining the difference between a fitted result and the true result. As an example, the loss function may be an absolute-value loss function or a squared loss function. The loss value may be the degree of image dissimilarity between the first sample affine transformation feature map and the second sample feature map. As an example, the vector corresponding to each pixel in the first sample affine transformation feature map and in the second sample feature map is normalized, yielding a set of normalized vectors for the first sample affine transformation feature map and a set of normalized vectors for the second sample feature map. For each normalized vector of the first sample affine transformation feature map and the corresponding normalized vector of the second sample feature map, the loss value is determined by the following formula:
L = -∑_i log( p_i·q_i + (1 - p_i)·(1 - q_i) )
Here, for a given pixel position in the target image, the first position in the first sample affine transformation feature map after feature extraction and affine transformation, and the second position in the second sample feature map after affine transformation and feature extraction, may be the same position. The index i denotes the i-th bit of the hash code corresponding to the normalized vector of the first sample affine transformation feature map at that position and of the hash code corresponding to the normalized vector of the second sample feature map. p_i is the probability that the i-th bit of the hash code corresponding to the normalized vector of the first sample affine transformation feature map takes the value 1; elements of a normalized vector greater than 0.5 may be coded as hash bit 1 and elements smaller than 0.5 as hash bit 0. q_i is the probability that the i-th bit of the hash code corresponding to the normalized vector of the second sample feature map takes the value 1. The product p_i·q_i is the probability that both i-th bits take the value 1, and (1 - p_i)·(1 - q_i) is the probability that both take the value 0, so p_i·q_i + (1 - p_i)·(1 - q_i) measures the agreement between the predicted i-th bits of the two hash codes. Summing the per-bit terms over every element of the two normalized vector sets gives the loss value between the normalized vector of the first sample affine transformation feature map and the normalized vector of the second sample feature map at the same position.
In some optional implementations of some embodiments, the loss function may be a maximum likelihood estimation function, a divergence function or the Hamming distance.
Step 204: train the feature extraction network based on the loss value.
In some embodiments, the weights of the feature extraction network may be optimized by gradient descent so as to minimize the loss.
One of the above embodiments of the present disclosure has the following beneficial effect: based on the loss function, the picture obtained by performing an affine transformation on the first sample image and then extracting features is compared with the picture obtained by extracting features from the first sample image and then applying the affine transformation; this comparison yields the loss value for training the neural network. Using this loss value, the feature extraction network can be optimized and trained, thereby realizing the optimization of the feature extraction network.
With further reference to FIG. 3, a flow 300 of other embodiments of a method for generating a feature extraction network is shown. The method for generating a feature extraction network includes the following steps:
Step 301: preprocess a first image to obtain a first sample image.
In some embodiments, grayscale processing, geometric transformation and image enhancement may be performed on the first image.
Step 302: input the first sample image and a second sample image into the feature extraction network respectively to obtain a first sample feature map and a second sample feature map, where the second sample image is obtained by performing an affine transformation on the first sample image.
Step 303: perform an affine transformation on the first sample feature map to obtain a first sample affine transformation feature map.
Step 304: for a first vector and a second vector at the same position in the first sample affine transformation feature map and the second sample feature map, determine a loss value of the first vector and the second vector based on a preset loss function.
Step 305: train the feature extraction network based on the loss value.
In some embodiments, for the specific implementation of steps 302, 303, 304 and 305 and the technical effects they bring, reference may be made to steps 201, 202, 203 and 204 in the embodiment corresponding to FIG. 2, which are not repeated here.
The method for generating a feature extraction network disclosed in some embodiments of the present disclosure preprocesses the image with grayscale processing, geometric transformation and image enhancement, which can remove irrelevant information from the image and thereby improve the training effect of the network; the improved training effect in turn improves the accuracy of the features extracted by the network.
With further reference to FIG. 4, as an implementation of the methods shown in the above figures, the present disclosure provides some embodiments of an apparatus for generating a feature extraction network. These apparatus embodiments correspond to the method embodiments shown in FIG. 2, and the apparatus can be applied to various electronic devices.
As shown in FIG. 4, the apparatus 400 for generating a feature extraction network of some embodiments includes: a feature map generation unit 401, an affine transformation unit 402, a loss value determination unit 403 and a network training unit 404. The feature map generation unit 401 is configured to input a first sample image and a second sample image into the feature extraction network respectively to obtain a first sample feature map and a second sample feature map, where the second sample image is obtained by performing an affine transformation on the first sample image; the affine transformation unit 402 is configured to perform an affine transformation on the first sample feature map to obtain a first sample affine transformation feature map; the loss value determination unit 403 is configured to, for a first vector and a second vector at the same position in the first sample affine transformation feature map and the second sample feature map, determine a loss value of the first vector and the second vector based on a preset loss function; and the network training unit 404 is configured to train the feature extraction network based on the loss value.
In optional implementations of some embodiments, the apparatus further includes: an image preprocessing unit configured to preprocess a first image to obtain the first sample image.
In optional implementations of some embodiments, the first sample feature map and the second sample feature map include: color features, texture features, shape features and spatial relationship features.
In optional implementations of some embodiments, the loss function is one of the following: a maximum likelihood estimation function, a divergence function, the Hamming distance.
It can be understood that the units recorded in the apparatus 400 correspond to the respective steps of the method described with reference to FIG. 2. Therefore, the operations, features and beneficial effects described above for the method also apply to the apparatus 400 and the units contained in it, and are not repeated here.
As shown in FIG. 5, the electronic device 500 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 501, which can execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage device 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the electronic device 500. The processing device 501, the ROM 502 and the RAM 503 are connected to one another through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
Generally, the following devices may be connected to the I/O interface 505: an input device 506 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 507 including, for example, a liquid crystal display (LCD), speaker, vibrator, etc.; a storage device 508 including, for example, a magnetic tape, hard disk, etc.; and a communication device 509. The communication device 509 may allow the electronic device 500 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 5 shows the electronic device 500 with various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided; more or fewer devices may alternatively be implemented or provided. Each block shown in FIG. 5 may represent one device, or multiple devices as required.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such embodiments, the computer program may be downloaded and installed from a network through the communication device 509, or installed from the storage device 508, or installed from the ROM 502. When the computer program is executed by the processing device 501, the above functions defined in the methods of some embodiments of the present disclosure are performed.
It should be noted that the computer-readable medium described in some embodiments of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In some embodiments of the present disclosure, the computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in combination with an instruction execution system, apparatus or device. In some embodiments of the present disclosure, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; the computer-readable signal medium may send, propagate or transmit a program for use by or in combination with an instruction execution system, apparatus or device. The program code contained on the computer-readable medium may be transmitted by any suitable medium, including but not limited to: an electric wire, an optical cable, RF (radio frequency), etc., or any suitable combination of the above.
In some implementations, the client and the server may communicate using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication (e.g., a communication network) in any form or medium. Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internet (e.g., the Internet) and a peer-to-peer network (e.g., an ad hoc peer-to-peer network), as well as any network currently known or developed in the future.
The above computer-readable medium may be contained in the above electronic device, or may exist alone without being assembled into the electronic device. The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: input a first sample image and a second sample image into the feature extraction network respectively to obtain a first sample feature map and a second sample feature map, where the second sample image is obtained by performing an affine transformation on the first sample image; perform an affine transformation on the first sample feature map to obtain a first sample affine transformation feature map; for a first vector and a second vector at the same position in the first sample affine transformation feature map and the second sample feature map, determine a loss value of the first vector and the second vector based on a preset loss function; and train the feature extraction network based on the loss value.
Computer program code for executing the operations of some embodiments of the present disclosure may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions and operations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment or part of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented with a dedicated hardware-based system that performs the specified functions or operations, or with a combination of dedicated hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented by software or by hardware. The described units may also be provided in a processor, which may for example be described as: a processor including a feature map generation unit, an affine transformation unit, a loss value determination unit and a network training unit. The names of these units do not in some cases constitute a limitation on the units themselves; for example, the feature map generation unit may also be described as "a unit for generating feature maps".
The functions described above herein may be executed at least in part by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that can be used include: field programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), application-specific standard products (ASSP), systems on chip (SOC), complex programmable logic devices (CPLD), and so on.
According to one or more embodiments of the present disclosure, a method for generating a feature extraction network is provided, including: inputting a first sample image and a second sample image into the feature extraction network respectively to obtain a first sample feature map and a second sample feature map, where the second sample image is obtained by performing an affine transformation on the first sample image; performing an affine transformation on the first sample feature map to obtain a first sample affine transformation feature map; for a first vector and a second vector at the same position in the first sample affine transformation feature map and the second sample feature map, determining a loss value of the first vector and the second vector based on a preset loss function; and training the feature extraction network based on the loss value.
According to one or more embodiments of the present disclosure, before inputting the first sample image and the second sample image into the feature extraction network respectively to obtain the first sample feature map and the second sample feature map, where the second sample image is obtained by performing an affine transformation on the first sample image, the method further includes: preprocessing a first image to obtain the first sample image.
According to one or more embodiments of the present disclosure, the first sample feature map and the second sample feature map include: color features, texture features, shape features and spatial relationship features.
According to one or more embodiments of the present disclosure, the loss function is one of the following: a maximum likelihood estimation function, a divergence function, the Hamming distance.
According to one or more embodiments of the present disclosure, an apparatus for generating a feature extraction network includes: a feature map generation unit configured to input a first sample image and a second sample image into the feature extraction network respectively to obtain a first sample feature map and a second sample feature map, where the second sample image is obtained by performing an affine transformation on the first sample image; an affine transformation unit configured to perform an affine transformation on the first sample feature map to obtain a first sample affine transformation feature map; a loss value determination unit configured to, for a first vector and a second vector at the same position in the first sample affine transformation feature map and the second sample feature map, determine a loss value of the first vector and the second vector based on a preset loss function; and a network training unit configured to train the feature extraction network based on the loss value.
According to one or more embodiments of the present disclosure, the apparatus further includes: an image preprocessing unit configured to preprocess a first image to obtain the first sample image.
According to one or more embodiments of the present disclosure, the first sample feature map and the second sample feature map include: color features, texture features, shape features and spatial relationship features.
According to one or more embodiments of the present disclosure, the loss function is one of the following: a maximum likelihood estimation function, a divergence function, the Hamming distance.
According to one or more embodiments of the present disclosure, an electronic device is provided, including: one or more processors; and a storage device on which one or more programs are stored, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement any of the methods described above.
According to one or more embodiments of the present disclosure, a computer-readable medium is provided, on which a computer program is stored, where the program, when executed by a processor, implements the method described in any of the above embodiments.
The above description is only of some preferred embodiments of the present disclosure and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of invention involved in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features; it should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with technical features of similar function disclosed in (but not limited to) the embodiments of the present disclosure.

Claims (10)

  1. A method for generating a feature extraction network, comprising:
    inputting a first sample image and a second sample image into the feature extraction network respectively to obtain a first sample feature map and a second sample feature map, wherein the second sample image is obtained by performing an affine transformation on the first sample image;
    performing an affine transformation on the first sample feature map to obtain a first sample affine transformation feature map;
    for a first vector and a second vector at the same position in the first sample affine transformation feature map and the second sample feature map, determining a loss value of the first vector and the second vector based on a preset loss function; and
    training the feature extraction network based on the loss value.
  2. The method according to claim 1, wherein before inputting the first sample image and the second sample image into the feature extraction network respectively to obtain the first sample feature map and the second sample feature map, wherein the second sample image is obtained by performing an affine transformation on the first sample image, the method further comprises:
    preprocessing a first image to obtain the first sample image.
  3. The method according to claim 1, wherein the first sample feature map and the second sample feature map comprise:
    color features, texture features, shape features and spatial relationship features.
  4. The method according to claim 1, wherein the loss function is one of the following:
    a maximum likelihood estimation function, a divergence function, the Hamming distance.
  5. An apparatus for generating a feature extraction network, comprising:
    a feature map generation unit configured to input a first sample image and a second sample image into the feature extraction network respectively to obtain a first sample feature map and a second sample feature map, wherein the second sample image is obtained by performing an affine transformation on the first sample image;
    an affine transformation unit configured to perform an affine transformation on the first sample feature map to obtain a first sample affine transformation feature map;
    a loss value determination unit configured to, for a first vector and a second vector at the same position in the first sample affine transformation feature map and the second sample feature map, determine a loss value of the first vector and the second vector based on a preset loss function; and
    a network training unit configured to train the feature extraction network based on the loss value.
  6. The apparatus according to claim 5, wherein the apparatus further comprises:
    an image preprocessing unit configured to preprocess a first image to obtain the first sample image.
  7. The apparatus according to claim 5, wherein the first sample feature map and the second sample feature map comprise:
    color features, texture features, shape features and spatial relationship features.
  8. The apparatus according to claim 5, wherein the loss function is one of the following: a maximum likelihood estimation function, a divergence function, the Hamming distance.
  9. An electronic device, comprising:
    one or more processors; and
    a storage device on which one or more programs are stored,
    wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-4.
  10. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1-4.
PCT/CN2021/096145 2020-07-16 2021-05-26 Method, apparatus, device and computer-readable medium for generating a feature extraction network WO2022012179A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010685579.5 2020-07-16
CN202010685579.5A CN111915480B (zh) Method, apparatus, device and computer-readable medium for generating a feature extraction network

Publications (1)

Publication Number Publication Date
WO2022012179A1 true WO2022012179A1 (zh) 2022-01-20

Family

ID=73280390

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/096145 WO2022012179A1 (zh) Method, apparatus, device and computer-readable medium for generating a feature extraction network

Country Status (2)

Country Link
CN (1) CN111915480B (zh)
WO (1) WO2022012179A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111915480B (zh) * 2020-07-16 2023-05-23 抖音视界有限公司 生成特征提取网络的方法、装置、设备和计算机可读介质
CN112651880B (zh) * 2020-12-25 2022-12-30 北京市商汤科技开发有限公司 视频数据处理方法及装置、电子设备和存储介质
CN113065475B (zh) * 2021-04-08 2023-11-07 上海晓材科技有限公司 一种cad图例快速精准识别方法
CN113313022B (zh) * 2021-05-27 2023-11-10 北京百度网讯科技有限公司 文字识别模型的训练方法和识别图像中文字的方法
CN114528976B (zh) * 2022-01-24 2023-01-03 北京智源人工智能研究院 一种等变网络训练方法、装置、电子设备及存储介质
CN115082740B (zh) * 2022-07-18 2023-09-01 北京百度网讯科技有限公司 目标检测模型训练方法、目标检测方法、装置、电子设备

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007133840A (ja) * 2005-11-07 2007-05-31 Hirotaka Niitsuma EM object localization using Haar-like features
CN102231191A (zh) * 2011-07-17 2011-11-02 西安电子科技大学 ASIFT-based multimodal image feature extraction and matching method
CN111340013A (zh) * 2020-05-22 2020-06-26 腾讯科技(深圳)有限公司 Face recognition method and apparatus, computer device and storage medium
CN111382727A (zh) * 2020-04-02 2020-07-07 安徽睿极智能科技有限公司 Deep-learning-based dog face recognition method
CN111382793A (zh) * 2020-03-09 2020-07-07 腾讯音乐娱乐科技(深圳)有限公司 Feature extraction method, apparatus and storage medium
CN111915480A (zh) * 2020-07-16 2020-11-10 北京字节跳动网络技术有限公司 Method, apparatus, device and computer-readable medium for generating a feature extraction network

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6095559B2 (ja) * 2013-12-17 2017-03-15 日本電信電話株式会社 Feature extraction apparatus, method, and program
CN109344845B (zh) * 2018-09-21 2020-06-09 哈尔滨工业大学 Feature matching method based on a Triplet deep neural network structure
CN110188754B (zh) * 2019-05-29 2021-07-13 腾讯科技(深圳)有限公司 Image segmentation method and apparatus, and model training method and apparatus
CN110555835B (zh) * 2019-09-04 2022-12-02 郑州大学 Brain slice image region division method and apparatus

Also Published As

Publication number Publication date
CN111915480B (zh) 2023-05-23
CN111915480A (zh) 2020-11-10

Similar Documents

Publication Publication Date Title
WO2022012179A1 (zh) Method, apparatus, device and computer-readable medium for generating a feature extraction network
CN108509915B (zh) Method and apparatus for generating a face recognition model
CN109800732B (zh) Method and apparatus for generating a cartoon-avatar generation model
CN108427939B (zh) Model generation method and apparatus
WO2020215974A1 (zh) Method and apparatus for human body detection
WO2020006961A1 (zh) Method and apparatus for extracting images
WO2023005386A1 (zh) Model training method and apparatus
WO2023077995A1 (zh) Information extraction method, apparatus, device, medium and product
CN112766284B (zh) Image recognition method and apparatus, storage medium and electronic device
WO2023179310A1 (zh) Image inpainting method, apparatus, device, medium and product
WO2020034981A1 (zh) Method for generating and method for recognizing coded information
CN111539287B (zh) Method and apparatus for training a face image generation model
WO2020056901A1 (zh) Method and apparatus for processing images
CN111091182A (zh) Data processing method, electronic device and storage medium
CN111402122A (zh) Image texture-mapping method, apparatus, readable medium and electronic device
CN112418249A (zh) Mask image generation method, apparatus, electronic device and computer-readable medium
WO2022012178A1 (zh) Method, apparatus, electronic device and computer-readable medium for generating an objective function
WO2024040870A1 (zh) Text image generation, training and text image processing methods, and electronic device
CN111967584A (zh) Method and apparatus for generating adversarial examples, electronic device and computer storage medium
WO2023116744A1 (zh) Image processing method, apparatus, device and medium
WO2023130925A1 (zh) Font recognition method, apparatus, readable medium and electronic device
CN115100536B (zh) Building recognition method, apparatus, electronic device and computer-readable medium
CN115375657A (zh) Polyp detection model training method, detection method, apparatus, medium and device
CN116704593A (zh) Prediction model training method, apparatus, electronic device and computer-readable medium
CN115700838A (zh) Training method and apparatus for an image recognition model, and image recognition method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21841205

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 08/05/2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21841205

Country of ref document: EP

Kind code of ref document: A1