WO2018153322A1 - Key point detection method, neural network training method, apparatus, and electronic device - Google Patents

Key point detection method, neural network training method, apparatus, and electronic device

Info

Publication number
WO2018153322A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
feature information
hourglass
sub
current
Prior art date
Application number
PCT/CN2018/076689
Other languages
English (en)
French (fr)
Inventor
王晓刚
初晓
杨巍
欧阳万里
Original Assignee
北京市商汤科技开发有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司
Publication of WO2018153322A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/34 - Smoothing or thinning of the pattern; Morphological operations; Skeletonisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Definitions

  • The embodiments of the present application relate to artificial intelligence technologies, and in particular to a key point detection method, apparatus, and electronic device, and a neural network training method, apparatus, and electronic device.
  • A neural network is an important research subject in computer vision and pattern recognition. Loosely inspired by the biological brain, it enables a computer to process information about specific objects in a human-like way. Through a neural network, target objects (such as people, animals, and vehicles) can be detected and recognized efficiently. With the development of Internet technology and the rapid growth of information volume, neural networks are increasingly widely used in image detection and target object recognition to find the information actually needed in a large amount of data.
  • The embodiments of the present application provide a key point detection scheme and a neural network training scheme.
  • A key point detection method includes: performing feature extraction, via a neural network, on an image to be detected that includes a target object; generating an attention map of the target object according to the extracted feature information; correcting the feature information by using the attention map; and performing key point detection on the target object according to the corrected feature information.
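The four steps of the claimed method (extract features, generate an attention map, correct the features, detect key points) can be sketched as follows. This is a minimal illustration with stand-in operations, not the patented network; the sigmoid attention and the argmax detection are assumptions made for the sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def detect_keypoint(features):
    """Toy pipeline: attention map -> feature correction -> argmax key point."""
    # Step 2: generate an attention map from the features (stand-in: sigmoid).
    attention = sigmoid(features)
    # Step 3: correct the features with the attention map
    # (regions with low attention are suppressed toward zero).
    corrected = features * attention
    # Step 4: take the strongest response as the detected key point.
    return np.unravel_index(np.argmax(corrected), corrected.shape)

# Step 1 stand-in: a fake feature map with a peak at row 2, column 3.
feat = np.zeros((5, 5))
feat[2, 3] = 4.0
print(tuple(map(int, detect_keypoint(feat))))  # -> (2, 3)
```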
  • The feature extraction performed by the neural network on the image to be detected that includes the target object includes: performing a convolution operation on the image to be detected by using a convolutional neural network, to obtain first feature information of the image to be detected.
  • Generating the attention map of the target object according to the extracted feature information includes: performing a nonlinear transformation on the first feature information to obtain second feature information; and generating the attention map of the target object according to the second feature information.
  • Before the feature information is corrected by using the attention map, the method further includes: smoothing the attention map by using a conditional random field; or normalizing the attention map by using a normalization function.
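As one option, the claim above normalizes the attention map before it is used. A common choice, assumed here purely for illustration (the claim does not fix the normalization function), is a spatial softmax, which turns raw attention scores into a distribution over pixel locations:

```python
import numpy as np

def spatial_softmax(attention):
    """Normalize a 2D attention map so its values sum to 1."""
    # Subtract the max before exponentiating, for numerical stability.
    e = np.exp(attention - attention.max())
    return e / e.sum()

raw = np.array([[0.0, 1.0],
                [2.0, 3.0]])
norm = spatial_softmax(raw)
print(round(norm.sum(), 6))   # -> 1.0
print(int(norm.argmax()))     # -> 3 (the largest raw score stays largest)
```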
  • The neural network includes multiple sub-neural networks stacked end to end. For each sub-neural network, the attention map of the current sub-neural network is generated according to the feature information extracted by the current sub-neural network, and the attention map of the current sub-neural network is used to correct the feature information extracted by the current sub-neural network; if the current sub-neural network is not the last sub-neural network in the plurality of sub-neural networks, the corrected feature information of the current sub-neural network is used as the input of the adjacent subsequent sub-neural network; and/or, if the current sub-neural network is the last sub-neural network in the plurality of sub-neural networks, key point detection is performed on the target object according to the corrected feature information of the current sub-neural network.
  • Using the attention map of the current sub-neural network to correct the feature information extracted by the current sub-neural network includes: according to the attention map of the current sub-neural network, setting to zero the pixel values of regions, in the feature map representing the feature information extracted by the current sub-neural network, that correspond to at least part of the non-target object, to obtain the corrected feature information of the current sub-neural network.
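The correction step in this claim can be read as masking: wherever the attention map indicates a non-target region, the corresponding feature values are set to zero. A minimal sketch (the 0.5 threshold is an assumption for illustration, not from the claim):

```python
import numpy as np

def correct_features(features, attention, threshold=0.5):
    """Zero out feature values in regions the attention map marks as non-target."""
    mask = attention > threshold   # True where the target object is
    return features * mask         # non-target pixels become zero

features = np.array([[1.0, 2.0],
                     [3.0, 4.0]])
attention = np.array([[0.9, 0.1],
                      [0.2, 0.8]])
print(correct_features(features, attention))
# [[1. 0.]
#  [0. 4.]]
```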
  • Setting to zero the pixel values of regions, in the feature map representing the feature information extracted by the current sub-neural network, that correspond to at least part of the non-target object, and obtaining the corrected feature information of the current sub-neural network, includes: if the current sub-neural network is among the first N sub-neural networks of the plurality of sub-neural networks, using the attention map of the current sub-neural network to set to zero the pixel values of regions, in the feature map representing the feature information extracted by the current sub-neural network, that correspond to at least part of the non-target object, to obtain feature information of the region where the target object is located; and/or, if the current sub-neural network is not among the first N sub-neural networks of the plurality of sub-neural networks, extracting, by the current sub-neural network, the feature map representing the feature information of the region where the target object is located, and generating the attention map of the current sub-neural network according to the extracted feature information; then using the attention map of the current sub-neural network to set to zero the pixel values of regions, in the feature map representing the feature information extracted by the current sub-neural network, that correspond to key points of the non-target object, to obtain feature information of the regions corresponding to the key points of the target object.
  • The feature extraction performed by the neural network on the image to be detected that includes the target object includes: obtaining multiple feature maps of different resolutions output by multiple convolutional layers of the current sub-neural network, and up-sampling the multiple feature maps respectively to obtain feature information corresponding to the multiple feature maps. Generating the attention map of the target object according to the extracted feature information includes: generating multiple attention maps of different resolutions according to the feature information corresponding to the multiple feature maps; and combining the multiple attention maps of different resolutions to generate the attention map of the target object of the current sub-neural network.
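The multi-resolution scheme in these claims can be sketched as follows: attention maps produced at different resolutions are up-sampled to a common size and then fused. Nearest-neighbor up-sampling via `np.kron` and elementwise multiplication are stand-ins; the claim does not commit to a particular up-sampling or fusion operator.

```python
import numpy as np

def upsample_nn(a, factor):
    """Nearest-neighbor up-sampling by an integer factor."""
    return np.kron(a, np.ones((factor, factor)))

def combine_attention(maps, target_size):
    """Up-sample attention maps of different resolutions and fuse them."""
    combined = np.ones((target_size, target_size))
    for m in maps:
        combined *= upsample_nn(m, target_size // m.shape[0])
    return combined

coarse = np.array([[0.2, 0.8],
                   [0.1, 0.9]])   # 2x2 attention map
fine = np.full((4, 4), 0.5)       # 4x4 attention map
out = combine_attention([coarse, fine], 4)
print(out.shape)  # -> (4, 4)
```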
  • The neural network includes an hourglass neural network.
  • The hourglass neural network includes a plurality of hourglass sub-neural networks; each hourglass sub-neural network includes at least one hourglass residual module, and each hourglass residual module includes a first residual branch, a second residual branch, and a third residual branch. Each hourglass residual module in each hourglass sub-neural network performs feature extraction on the image to be detected that includes the target object by: performing, via the first residual branch, an identity mapping on the image block input to the current hourglass residual module, to obtain first feature information included in the first image block after the identity mapping; performing, via the second residual branch, convolution processing on the image area indicated by the convolution kernel size in the image block input to the current hourglass residual module, to obtain second feature information included in the second image region after the convolution processing; and, via the third residual branch, pooling the image block input to the current hourglass residual module according to the pooling kernel size, performing convolution processing on the image area in the pooled image block according to the convolution kernel size, and up-sampling the convolved image area to match the size of the image block input to the current hourglass residual module.
  • If the current hourglass sub-neural network is the first sub-neural network in the plurality of sub-neural networks, feature extraction is performed on the input original image to be detected that includes the target object, through the hourglass residual module and/or the residual module of the current hourglass sub-neural network; and/or, if the current hourglass sub-neural network is not the first sub-neural network in the plurality of sub-neural networks, feature extraction is performed, through the hourglass residual module and/or the residual module of the current hourglass sub-neural network, on the image output by the previous hourglass sub-neural network adjacent to the current hourglass sub-neural network.
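The three-branch hourglass residual module described in these claims can be sketched in numpy: the first branch passes the input through unchanged, the second applies a local filter (a 3x3 mean filter stands in for a learned convolution), and the third pools, filters, and up-samples back to the input size; the branch outputs are then summed. All filter choices here are illustrative stand-ins for learned weights, not the patented parameters.

```python
import numpy as np

def mean_filter3(x):
    """3x3 mean filter with edge padding (stand-in for a learned 3x3 conv)."""
    p = np.pad(x, 1, mode="edge")
    return sum(p[i:i + x.shape[0], j:j + x.shape[1]]
               for i in range(3) for j in range(3)) / 9.0

def max_pool2(x):
    """2x2 max pooling (input sides assumed even)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample_nn2(x):
    """Nearest-neighbor up-sampling by a factor of 2."""
    return np.kron(x, np.ones((2, 2)))

def hourglass_residual(x):
    branch1 = x                                          # identity mapping
    branch2 = mean_filter3(x)                            # convolution branch
    branch3 = upsample_nn2(mean_filter3(max_pool2(x)))   # pool -> conv -> upsample
    return branch1 + branch2 + branch3                   # fuse the three branches

x = np.arange(64.0).reshape(8, 8) / 64.0
y = hourglass_residual(x)
print(y.shape)  # -> (8, 8), same size as the input
```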
  • A neural network training method includes: performing feature extraction, via a neural network, on a training sample image that includes a target object; generating an attention map of the target object according to the extracted feature information; correcting the feature information by using the attention map; obtaining key point prediction information of the target object according to the corrected feature information; obtaining the difference between the key point prediction information and the key point labeling information in the training sample image; and adjusting the network parameters of the neural network according to the difference.
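In training, the "difference between the key point prediction information and the key point labeling information" is typically computed against ground-truth heatmaps: each labeled key point is rendered as a small Gaussian peak, and the predicted heatmap is compared with it, e.g. by mean squared error. The Gaussian/MSE choice is a common convention assumed here for illustration; the claim itself does not fix the loss.

```python
import numpy as np

def gaussian_heatmap(size, center, sigma=1.0):
    """Render a labeled key point as a Gaussian peak on a heatmap."""
    ys, xs = np.mgrid[0:size, 0:size]
    cy, cx = center
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))

def mse_loss(pred, target):
    """Mean squared error between predicted and ground-truth heatmaps."""
    return float(np.mean((pred - target) ** 2))

target = gaussian_heatmap(8, (3, 4))
perfect = target.copy()
noisy = target + 0.1

print(mse_loss(perfect, target))          # -> 0.0
print(round(mse_loss(noisy, target), 4))  # -> 0.01
```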
  • The feature extraction performed by the neural network on the training sample image that includes the target object includes: convolving the training sample image by using a convolutional neural network, to obtain first feature information of the training sample image. Generating the attention map of the target object according to the extracted feature information includes: performing a nonlinear transformation on the first feature information to obtain second feature information; and generating the attention map of the target object according to the second feature information.
  • Before the feature information is corrected by using the attention map, the method further includes: smoothing the attention map by using a conditional random field; or normalizing the attention map by using a normalization function.
  • The neural network includes multiple sub-neural networks stacked end to end. For each sub-neural network, the attention map of the current sub-neural network is generated according to the feature information extracted by the current sub-neural network, and the attention map of the current sub-neural network is used to correct the feature information extracted by the current sub-neural network; if the current sub-neural network is not the last sub-neural network in the plurality of sub-neural networks, the corrected feature information of the current sub-neural network is used as the input of the adjacent subsequent sub-neural network; and/or, if the current sub-neural network is the last sub-neural network in the plurality of sub-neural networks, key point prediction is performed on the target object according to the corrected feature information of the current sub-neural network, to obtain key point prediction information of the target object.
  • Using the attention map of the current sub-neural network to correct the feature information extracted by the current sub-neural network includes: according to the attention map of the current sub-neural network, setting to zero the pixel values of regions, in the feature map representing the feature information extracted by the current sub-neural network, that correspond to at least part of the non-target object, to obtain the corrected feature information of the current sub-neural network.
  • The feature extraction performed by the neural network on the training sample image that includes the target object includes: obtaining multiple feature maps of different resolutions output by multiple convolutional layers of the current sub-neural network, and up-sampling the multiple feature maps respectively to obtain feature information corresponding to the multiple feature maps. Generating the attention map of the target object according to the extracted feature information includes: generating multiple attention maps of different resolutions according to the feature information corresponding to the multiple feature maps; and combining the multiple attention maps of different resolutions to generate the attention map of the target object of the current sub-neural network.
  • The neural network is an hourglass neural network.
  • The hourglass neural network includes a plurality of hourglass sub-neural networks, where the output of the previous hourglass sub-neural network is used as the input of the adjacent subsequent hourglass sub-neural network, and each hourglass sub-neural network is trained by using the neural network training method described in any of the above embodiments of the present application.
  • Each hourglass sub-neural network includes at least one hourglass residual module; each hourglass residual module includes a first residual branch, a second residual branch, and a third residual branch. Each hourglass residual module in each hourglass sub-neural network performs feature extraction on the training sample image that includes the target object by: performing, via the first residual branch, an identity mapping on the image block input to the current hourglass residual module, to obtain first feature information included in the first image block after the identity mapping; performing, via the second residual branch, convolution processing on the image area indicated by the convolution kernel size in the image block input to the current hourglass residual module, to obtain second feature information included in the second image region after the convolution processing; and, via the third residual branch, pooling the image block input to the current hourglass residual module according to the pooling kernel size, performing convolution processing on the image area in the pooled image block according to the convolution kernel size, and up-sampling the convolved image area to match the size of the image block input to the current hourglass residual module.
  • If the current hourglass sub-neural network is the first sub-neural network in the plurality of sub-neural networks, feature extraction is performed on the input original image that includes the target object, through the hourglass residual module and/or the residual module of the current hourglass sub-neural network; and/or, if the current hourglass sub-neural network is not the first sub-neural network in the plurality of sub-neural networks, feature extraction is performed, through the hourglass residual module and/or the residual module of the current hourglass sub-neural network, on the image output by the previous hourglass sub-neural network adjacent to the current hourglass sub-neural network.
  • A key point detection apparatus includes: a first feature extraction module, configured to perform feature extraction, via a neural network, on an image to be detected that includes a target object; a first generation module, configured to generate an attention map of the target object according to the extracted feature information; a first correction module, configured to correct the feature information by using the attention map; and a detection module, configured to perform key point detection on the target object according to the corrected feature information.
  • The first feature extraction module is configured to perform a convolution operation on the image to be detected by using a convolutional neural network, to obtain first feature information of the image to be detected; the first generation module is configured to perform a nonlinear transformation on the first feature information to obtain second feature information, and to generate the attention map of the target object according to the second feature information.
  • The apparatus further includes: a first processing module, configured to, before the first correction module corrects the feature information by using the attention map, smooth the attention map by using a conditional random field, or normalize the attention map by using a normalization function.
  • The neural network includes multiple sub-neural networks stacked end to end. For each sub-neural network, the first generation module generates the attention map of the current sub-neural network according to the feature information extracted by the current sub-neural network, and the first correction module corrects the feature information extracted by the current sub-neural network by using the attention map of the current sub-neural network; if the current sub-neural network is not the last sub-neural network in the plurality of sub-neural networks, the corrected feature information of the current sub-neural network is used as the input of the adjacent subsequent sub-neural network; and/or, if the current sub-neural network is the last sub-neural network in the plurality of sub-neural networks, the detection module performs key point detection on the target object according to the corrected feature information of the current sub-neural network.
  • When correcting the feature information extracted by the current sub-neural network by using the attention map of the current sub-neural network, the first correction module sets to zero, according to the attention map of the current sub-neural network, the pixel values of regions, in the feature map representing the feature information extracted by the current sub-neural network, that correspond to at least part of the non-target object, to obtain the corrected feature information of the current sub-neural network.
  • When setting to zero, according to the attention map of the current sub-neural network, the pixel values of regions, in the feature map representing the feature information extracted by the current sub-neural network, that correspond to at least part of the non-target object, to obtain the corrected feature information of the current sub-neural network, the first correction module operates as follows: if the current sub-neural network is among the first N sub-neural networks of the plurality of sub-neural networks, the attention map of the current sub-neural network is used to set to zero the pixel values of regions, in the feature map representing the feature information extracted by the current sub-neural network, that correspond to at least part of the non-target object, to obtain feature information of the region where the target object is located; and/or, if the current sub-neural network is not among the first N sub-neural networks of the plurality of sub-neural networks, the current sub-neural network extracts the feature map representing the feature information of the region where the target object is located and generates the attention map of the current sub-neural network according to the extracted feature information; the attention map of the current sub-neural network is then used to set to zero the pixel values of regions, in the feature map representing the feature information extracted by the current sub-neural network, that correspond to key points of the non-target object, to obtain feature information of the regions corresponding to the key points of the target object; the attention maps corresponding to the first N sub-neural networks have a lower resolution.
  • The first feature extraction module obtains multiple feature maps of different resolutions output by multiple convolutional layers of the current sub-neural network, and up-samples the multiple feature maps respectively to obtain feature information corresponding to the multiple feature maps; the first generation module generates multiple attention maps of different resolutions according to the feature information corresponding to the multiple feature maps, and combines the multiple attention maps of different resolutions to generate the attention map of the target object of the current sub-neural network.
  • The neural network is an hourglass neural network.
  • The hourglass neural network includes a plurality of hourglass sub-neural networks; each hourglass sub-neural network includes at least one hourglass residual module, and each hourglass residual module includes a first residual branch, a second residual branch, and a third residual branch. When performing feature extraction on the image to be detected that includes the target object through each hourglass residual module in each hourglass sub-neural network, the first feature extraction module: performs, via the first residual branch, an identity mapping on the image block input to the current hourglass residual module, to obtain first feature information included in the first image block after the identity mapping; performs, via the second residual branch, convolution processing on the image area indicated by the convolution kernel size in the image block input to the current hourglass residual module, to obtain second feature information included in the second image region after the convolution processing; and, via the third residual branch, pools the image block input to the current hourglass residual module according to the pooling kernel size and performs convolution processing on the image area in the pooled image block according to the convolution kernel size.
  • When the first feature extraction module performs feature extraction: if the current hourglass sub-neural network is the first sub-neural network in the plurality of sub-neural networks, feature extraction is performed on the input original image to be detected that includes the target object, through the hourglass residual module and/or the residual module of the current hourglass sub-neural network; and/or, if the current hourglass sub-neural network is not the first sub-neural network in the plurality of sub-neural networks, feature extraction is performed, through the hourglass residual module and/or the residual module of the current hourglass sub-neural network, on the image output by the previous hourglass sub-neural network adjacent to the current hourglass sub-neural network.
  • A neural network training apparatus includes: a second feature extraction module, configured to perform feature extraction, via a neural network, on a training sample image that includes a target object; a second generation module, configured to generate an attention map of the target object according to the extracted feature information; a second correction module, configured to correct the feature information by using the attention map; a prediction module, configured to obtain key point prediction information of the target object according to the corrected feature information; a difference obtaining module, configured to obtain the difference between the key point prediction information and the key point labeling information in the training sample image; and an adjustment module, configured to adjust the network parameters of the neural network according to the difference.
  • The second feature extraction module is configured to convolve the training sample image by using a convolutional neural network, to obtain first feature information of the training sample image; the second generation module is configured to perform a nonlinear transformation on the first feature information to obtain second feature information, and to generate the attention map of the target object according to the second feature information.
  • The apparatus further includes: a second processing module, configured to, before the second correction module corrects the feature information by using the attention map, smooth the attention map by using a conditional random field, or normalize the attention map by using a normalization function.
  • The neural network includes multiple sub-neural networks stacked end to end. For each sub-neural network, the second generation module generates the attention map of the current sub-neural network according to the feature information extracted by the current sub-neural network, and the second correction module corrects the feature information extracted by the current sub-neural network by using the attention map of the current sub-neural network; if the current sub-neural network is not the last sub-neural network in the plurality of sub-neural networks, the corrected feature information of the current sub-neural network is used as the input of the adjacent subsequent sub-neural network; and/or, if the current sub-neural network is the last sub-neural network in the plurality of sub-neural networks, the prediction module performs key point prediction on the target object according to the corrected feature information of the current sub-neural network, to obtain key point prediction information of the target object.
  • When correcting the feature information extracted by the current sub-neural network by using the attention map of the current sub-neural network, the second correction module sets to zero, according to the attention map of the current sub-neural network, the pixel values of regions, in the feature map representing the feature information extracted by the current sub-neural network, that correspond to at least part of the non-target object, to obtain the corrected feature information of the current sub-neural network.
  • The second feature extraction module obtains multiple feature maps of different resolutions output by multiple convolutional layers of the current sub-neural network, and up-samples the multiple feature maps respectively to obtain feature information corresponding to the multiple feature maps; the second generation module generates multiple attention maps of different resolutions according to the feature information corresponding to the multiple feature maps, and combines the multiple attention maps of different resolutions to generate the attention map of the target object of the current sub-neural network.
  • The neural network is an hourglass neural network.
  • The hourglass neural network includes a plurality of hourglass sub-neural networks, where the output of the previous hourglass sub-neural network is used as the input of the adjacent subsequent hourglass sub-neural network, and each hourglass sub-neural network is trained by using the apparatus described in any of the above embodiments.
  • Each hourglass sub-neural network includes at least one hourglass residual module; each hourglass residual module includes a first residual branch, a second residual branch, and a third residual branch. When performing feature extraction on the training sample image that includes the target object through each hourglass residual module in each hourglass sub-neural network, the second feature extraction module performs, via the first residual branch, an identity mapping on the image block input to the current hourglass residual module, performs convolution processing via the second residual branch, and performs pooling, convolution, and up-sampling via the third residual branch, in the same manner as described above.
  • When the second feature extraction module performs feature extraction: if the current hourglass sub-neural network is the first sub-neural network in the plurality of sub-neural networks, feature extraction is performed on the input original image to be detected that includes the target object, through the hourglass residual module and/or the residual module of the current hourglass sub-neural network; and/or, if the current hourglass sub-neural network is not the first sub-neural network in the plurality of sub-neural networks, feature extraction is performed, through the hourglass residual module and/or the residual module of the current hourglass sub-neural network, on the image output by the previous hourglass sub-neural network adjacent to the current hourglass sub-neural network.
  • An electronic device includes a processor and a memory; the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the key point detection method or the neural network training method provided by any of the foregoing embodiments of the present application.
  • A computer readable storage medium stores a computer program; when the computer program is executed by a processor, the key point detection method or the neural network training method provided by any of the foregoing embodiments of the present application is implemented.
  • Another computer readable storage medium stores: executable instructions for performing feature extraction, via a neural network, on an image to be detected that includes a target object; executable instructions for generating an attention map of the target object according to the extracted feature information; executable instructions for correcting the feature information by using the attention map; and executable instructions for performing key point detection on the target object according to the corrected feature information.
  • Still another computer readable storage medium stores: executable instructions for performing feature extraction, via a neural network, on a training sample image that includes a target object; executable instructions for generating an attention map of the target object according to the extracted feature information; executable instructions for correcting the feature information by using the attention map; executable instructions for obtaining key point prediction information of the target object according to the corrected feature information; executable instructions for obtaining the difference between the key point prediction information and the key point labeling information in the training sample image; and executable instructions for adjusting the network parameters of the neural network according to the difference.
  • An attention mechanism is introduced into the neural network, and an attention map is generated according to the feature information of the target object output by the neural network.
  • After the attention mechanism is introduced, the neural network can focus on the information of the target object.
  • The feature information of the target object differs significantly from that of the non-target object. Correcting the feature map with the attention map therefore emphasizes the features of the target object, so that the feature information of the target object in the image to be detected is more prominent and more easily detected and recognized, which improves the accuracy of the detection result and reduces false detections and missed detections.
  • FIG. 1 is a flow chart showing the steps of a key point detection method according to an embodiment of the present application.
  • FIG. 2 is a flow chart showing the steps of another key point detecting method according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of an hourglass network structure for key point detection in the embodiment shown in FIG. 2;
  • FIG. 4 is a schematic diagram of an improved hourglass residual module in the embodiment of FIG. 2;
  • FIG. 5 is a flowchart of steps of a neural network training method according to an embodiment of the present application.
  • FIG. 6 is a flow chart of steps of another neural network training method according to an embodiment of the present application.
  • FIG. 7 is a structural block diagram of a key point detecting apparatus according to an embodiment of the present application.
  • FIG. 8 is a structural block diagram of another key point detecting apparatus according to an embodiment of the present application.
  • FIG. 9 is a structural block diagram of a neural network training apparatus according to an embodiment of the present application.
  • FIG. 10 is a structural block diagram of another neural network training apparatus according to an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • Embodiments of the present application can be applied to electronic devices such as terminal devices, computer systems, servers, etc., which can operate with numerous other general purpose or special purpose computing system environments or configurations.
  • Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, servers, and the like include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, small computer systems, mainframe computer systems, and distributed cloud computing environments including any of the above, and the like.
  • Electronic devices such as terminal devices, computer systems, servers, etc., can be described in the general context of computer system executable instructions (such as program modules) being executed by a computer system.
  • program modules may include routines, programs, target programs, components, logic, data structures, and the like that perform particular tasks or implement particular abstract data types.
  • the computer system/server can be implemented in a distributed cloud computing environment where tasks are performed by remote processing devices that are linked through a communication network.
  • program modules may be located on a local or remote computing system storage medium including storage devices.
  • the key point detecting method of each embodiment of the present application may be performed by any suitable device having data processing capability, including but not limited to: a terminal device, a server, and the like.
  • the key point detection method of this embodiment includes the following steps:
  • Step S102: Perform a feature extraction operation, via the neural network, on the image to be detected including the target object.
  • the neural network may be any suitable neural network that can implement feature extraction or target object detection, for example, but not limited to: a convolutional neural network, a reinforcement learning neural network, the generative network in a generative adversarial network, and the like.
  • the configuration of the structure of the neural network, such as the number of convolutional layers, the size of the convolution kernels, the number of channels, and the like, may be appropriately set by a person skilled in the art according to actual needs; the embodiment of the present application does not limit this.
  • the step S102 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by the first feature extraction module 502 being executed by the processor.
  • the feature information of the target object can be obtained, for example, by feature extraction of the convolutional neural network, and a feature map including the feature information is obtained.
  • Step S104: Generate an attention map of the target object according to the extracted feature information.
  • an attention mechanism is introduced in a neural network, and an attention map is generated.
  • the step S104 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by the first generation module 504 being executed by the processor.
  • Human visual attention is not balanced in its processing of information: it automatically processes regions of interest to extract useful information while leaving uninteresting regions unprocessed, so that humans can quickly locate targets of interest in a complex visual environment.
  • the attention mechanism is a model that uses a computer to simulate human visual attention, extracting an eye-catching focus that can be observed by the human eye, that is, a salient region of the image.
  • the feature map extracted based on the neural network generates the attention map.
  • on the one hand, the saliency region of the image, such as the region where the target object is located, is made more prominent; on the other hand, compared with processing the original image directly, the attention mechanism reduces the data processing burden.
  • Step S106 Correct the feature information using the attention map.
  • step S106 may be performed by the processor invoking a corresponding instruction stored in the memory or by the first modification module 506 being executed by the processor.
  • the attention map can be used to correct the feature information. For example, using the attention map to correct the feature map effectively filters out the information of non-target objects, so that the information of the target object is more prominent.
  • Step S108 Perform key point detection on the target object according to the modified feature information.
  • step S108 may be performed by the processor invoking a corresponding instruction stored in the memory or by the detection module 508 being executed by the processor.
  • the corrected feature information makes the feature information of the target object more prominent.
  • on the one hand, the information of non-target objects interferes less with the recognition and detection of the target object; on the other hand, the feature information of the target object extracted by the attention mechanism has a certain spatial context correlation.
  • the highlighted feature information of the target object facilitates comprehensive detection of the key points by the neural network and avoids missed detection of key points as much as possible. All of the above make the target object easier to detect and recognize.
  • the attention mechanism is introduced into the neural network, and the attention map is generated according to the characteristic information output by the neural network.
  • the neural network after introducing the attention mechanism can focus on the information of the target object.
  • the feature information of the target object is significantly different from that of non-target objects. Therefore, the feature map is corrected using the attention map to correct the features of the target object, so that the feature information of the target object in the image to be detected is more prominent and more easily detected and recognized, the detection accuracy is improved, and false detections and missed detections are reduced.
  • the key point detection method of this embodiment includes the following steps:
  • Step S202 Acquire an image to be detected including the target object.
  • the image to be detected may be a static image, or may be any frame of a video.
  • Step S204: Perform a feature extraction operation on the image to be detected via the neural network.
  • the neural network may select any suitable neural network that can implement feature extraction or target object detection.
  • the neural network selects a convolutional neural network.
  • the convolutional neural network may be a HOURGLASS (hourglass) neural network.
  • the hourglass neural network can realize the recognition of the target object by effectively detecting the key points of the target object, and can detect the human body posture very effectively.
  • a single hourglass neural network adopts a symmetric topology, which usually includes an input layer, a convolution layer, a pooling layer, an upsampling layer, etc.
  • the input of the hourglass neural network is an image, and the output is a set of score maps giving a score for at least one pixel point (for example, each pixel point).
  • each score map in the output corresponds to one key point of the target object.
  • the position with the highest score on a score map represents the detected position of that key point.
  • the resolution is continuously reduced by the pooling (POOLING) layers to obtain global features; the global features are then enlarged by interpolation and combined with the positions at the corresponding resolutions in the feature maps for judgment.
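  • The score-map decoding described above (the highest-scoring position per map gives the key point location) can be sketched as follows; the function name and the toy data are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def decode_keypoints(score_maps):
    """Decode key point positions from per-key-point score maps.

    score_maps: array of shape (K, H, W), one score map per key point.
    Returns a list of (row, col) positions -- the argmax of each map,
    i.e. the position with the highest score.
    """
    positions = []
    for m in score_maps:
        r, c = np.unravel_index(np.argmax(m), m.shape)
        positions.append((int(r), int(c)))
    return positions

# Toy example: 2 key points on a 4x4 grid.
maps = np.zeros((2, 4, 4))
maps[0, 1, 2] = 0.9   # key point 0 peaks at (row 1, col 2)
maps[1, 3, 0] = 0.7   # key point 1 peaks at (row 3, col 0)
print(decode_keypoints(maps))  # [(1, 2), (3, 0)]
```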
  • the neural network may comprise a plurality of sub-neural networks stacked end-to-end, such as a plurality of convolutional neural networks stacked end-to-end, optionally with a plurality of hourglass sub-networks stacked end-to-end.
  • a plurality of sub-neural networks stacked end-to-end can perform deeper extraction of features to ensure accurate and effective extracted features.
  • the solution is not limited to the hourglass neural network; other neural networks having the same or a similar structure to the hourglass neural network and having the key point detection function can also be applied to the solution of the embodiment of the present application.
  • When the neural network selects multiple hourglass sub-networks stacked end-to-end, a feasible structure is shown in FIG. 3.
  • In FIG. 3, eight hourglass sub-networks are stacked together to form an hourglass neural network for key point detection.
  • the eight hourglass neural networks are connected end-to-end, and the output of the previous hourglass is the input of the adjacent latter hourglass.
  • the bottom-up, top-down analysis and learning are consistent throughout the model, so that the detection of key points of the target object is more accurate.
  • the number of hourglass sub-networks can be appropriately set according to actual needs; eight is used here only as an example for description.
  • When the neural network selects a convolutional neural network, the convolutional neural network performs a convolution operation on the image to be detected to obtain first feature information of the image to be detected.
  • the convolutional neural network performs feature extraction on the input image to be detected to obtain feature information and generate a feature map.
  • the feature map can be considered as a representation form of the feature information, and in the actual application, the feature information can be directly operated.
  • the feature information of the target object of the convolutional neural network can be obtained.
  • When the hourglass neural network includes multiple hourglass sub-networks, an attention mechanism is introduced for each hourglass sub-network to obtain the feature information (such as a feature map) output by the last convolutional layer in each hourglass sub-network.
  • each hourglass sub-network usually includes a plurality of Residual Units (RUs).
  • the hourglass neural network extracts higher-level features of the image through the residual modules while retaining the original level of information; the data size is unchanged and only the data depth changes, so a residual module can be thought of as a high-level convolutional layer that preserves the data size.
  • the residual module can combine features of different resolutions to make feature learning more robust.
  • At least one of the residual modules in each hourglass sub-network is improved, and the improved residual module is called an Hourglass Residual Unit (HRU).
  • Each hourglass includes at least one hourglass residual module, and each hourglass residual module includes a first residual branch, a second residual branch, and a third residual branch.
  • Each of the hourglass residual modules performs an identity mapping operation on the image block of the current hourglass residual module through the first residual branch to obtain the first feature information included in the first image block after the identity mapping.
  • the third residual branch performs pooling processing on the image block input to the current hourglass residual module according to the size of the pooling kernel, performs convolution processing on the image region of the pooled image block according to the convolution kernel size, and upsamples the convolved image region to generate a third image block having the same size as the image block input to the current hourglass residual module, obtaining third feature information of the third image block; the first feature information, the second feature information, and the third feature information are then combined to obtain the feature information extracted by the current hourglass residual module.
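  • A minimal numerical sketch of the three-branch combination described above (using max pooling, a caller-supplied placeholder residual transform, and nearest-neighbour upsampling; all names and the toy transform are illustrative assumptions, not the patent's exact operations):

```python
import numpy as np

def max_pool2x2(x):
    """2x2 max pooling of a square 2D block (side must be even)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample2x(x):
    """Nearest-neighbour 2x upsampling via a Kronecker product."""
    return np.kron(x, np.ones((2, 2)))

def hourglass_residual_unit(x, residual_fn):
    """Combine the three HRU branches:
    A: identity mapping of the input block (first feature information),
    B: the residual transform (second feature information),
    C: pool -> transform -> upsample back to the input size
       (third feature information).
    The three outputs are summed, so the result keeps the input size."""
    a = x                                          # A: identity branch
    b = residual_fn(x)                             # B: residual branch
    c = upsample2x(residual_fn(max_pool2x2(x)))    # C: hourglass branch
    return a + b + c

x = np.arange(16, dtype=float).reshape(4, 4)
out = hourglass_residual_unit(x, lambda t: 0.1 * t)  # toy residual transform
print(out.shape)  # (4, 4) -- same size as the input block
```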
  • a conventional residual module, that is, a residual module having only the first residual branch and the second residual branch, is also applicable to the solution of the embodiment of the present application.
  • an hourglass sub-network may include only a plurality of hourglass residual modules, only a plurality of conventional residual modules, or at least one hourglass residual module and at least one conventional residual module.
  • the output of the previous hourglass residual module or residual module is the input of the adjacent latter one, and the output of the last hourglass residual module or residual module in the hourglass sub-network is the output of the current hourglass sub-network.
  • If the current hourglass sub-network is the first of the plurality of sub-networks, its input is the original image to be detected, and the hourglass residual module(s) and/or residual module(s) of the current hourglass sub-network perform the feature extraction operation on the input original image to be detected including the target object; and/or, if the current hourglass sub-network is a non-first sub-network of the plurality of sub-networks, the feature extraction operation is performed, through the hourglass residual module(s) and/or residual module(s) of the current hourglass sub-network, on the image output by the previous hourglass sub-network adjacent to the current one.
  • a plurality of feature maps of different resolutions output by multiple convolutional layers of the current sub-network may be obtained, and the plurality of feature maps are respectively upsampled to obtain feature information corresponding to the plurality of feature maps.
  • step S204 may be performed by the processor invoking a corresponding instruction stored in the memory, or may be performed by the first feature extraction module 602 being executed by the processor.
  • Step S206: Generate an attention map of the target object according to the extracted feature information.
  • the first feature information may be nonlinearly transformed to obtain second feature information, and the attention map is generated according to the second feature information.
  • For example, the attention map s may be generated as s = g(w⊤ ∗ f + b), where w⊤ denotes a convolution filter, a matrix of linear-transformation parameters (such as the network parameters of the hourglass neural network); f denotes the features output by the neural network (such as the features of the final output feature layer of the hourglass neural network); b denotes the bias; and g() denotes the equation of the nonlinear transformation (such as ReLU).
  • The feature f of the feature layer has multiple channels (such as the three commonly used settings of 128, 256, and 512), but s is a single-channel output; the value of s is controlled between 0 and 1 by the nonlinear transformation g().
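  • A minimal sketch of this channel-collapsing step, assuming a 1×1-convolution-style filter over the channels and a sigmoid as the bounding nonlinearity g (these specific choices, and all names, are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_map(f, w, b):
    """s = g(w^T * f + b): collapse a C x H x W feature layer f into a
    single-channel H x W attention map with values in (0, 1).
    w acts like a 1x1 convolution filter over the C channels; g is a
    bounding nonlinearity (a sigmoid here, as an illustrative choice)."""
    z = np.tensordot(w, f, axes=([0], [0])) + b  # -> (H, W)
    return sigmoid(z)

f = np.random.default_rng(0).normal(size=(128, 4, 4))  # 128-channel features
w = np.zeros(128); w[0] = 1.0                          # toy filter weights
s = attention_map(f, w, b=0.0)
print(s.shape)  # (4, 4) -- one channel, values between 0 and 1
```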
  • When the neural network selects the hourglass neural network and the hourglass neural network includes multiple hourglass sub-networks, a plurality of feature maps of different resolutions of the current hourglass sub-network can be obtained; the feature maps are respectively upsampled to obtain feature information corresponding to the plurality of feature maps, and a corresponding plurality of attention maps of different resolutions are generated according to the feature information corresponding to the plurality of feature maps.
  • Feature maps with different resolutions enable multi-level extraction of features from coarse to fine.
  • step S206 may be performed by the processor invoking a corresponding instruction stored in the memory or by the first generation module 604 being executed by the processor.
  • Step S208: Process the attention map, for example, smooth it using Conditional Random Fields (CRFs).
  • the conditional random field can be obtained by a person skilled in the art through any suitable method; its parameters can reflect the spatial context information between features, enabling smoothing of the attention map.
  • step S208 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by the first processing module 610 being executed by the processor.
  • This step is an optional step. Through this step, the noise points in the attention map can be removed.
  • Step S210 Correct the feature information using the attention map.
  • the attention map has more significant feature information of the target object, and the feature information can be corrected by using the attention map to make the feature information of the target object more significant.
  • the attention map of the current sub-network is generated according to the feature information extracted by the current sub-network; if the current sub-network is not the last of the plurality of sub-networks, the corrected feature information of the current sub-network serves as the input of the adjacent subsequent sub-network; and/or, if the current sub-network is the last of the plurality of sub-networks, key point detection may be performed on the target object according to the corrected feature information of the current sub-network.
  • the plurality of feature maps are respectively upsampled to obtain features corresponding to the plurality of feature maps.
  • a plurality of attention maps of different resolutions may be generated according to the feature information corresponding to the plurality of feature maps, and the generated attention maps of different resolutions are combined to generate a final attention map of the target object for the current sub-network; the final attention map is used to correct the feature map output by the current hourglass sub-network to obtain the corrected feature information.
  • When the hourglass neural network includes a plurality of hourglass sub-networks, each hourglass sub-network performs the above-described correction process.
  • the pixel values of the regions corresponding to at least part of the non-target objects in the feature map indicating the feature information extracted by the current sub-network may be set to zero, to obtain the corrected feature information of the current sub-network.
  • a point of 1 in the attention map does not change the value of the feature information at the corresponding position, while a point of 0 sets the feature information at the corresponding position to 0, thereby classifying that position as non-target, so the target object is more prominent.
  • the positions set to 0 no longer participate in subsequent processing, which reduces the data processing burden of key point detection of the target object and improves processing efficiency.
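  • The 0/1 correction described above amounts to broadcasting the single-channel map across the feature channels and multiplying pointwise (a sketch; the function and variable names are illustrative):

```python
import numpy as np

def correct_features(features, attn):
    """Correct a C x H x W feature layer with a single-channel H x W
    attention map: the map is broadcast across the C channels and
    multiplied pointwise, so positions where the map is 0 are zeroed
    (treated as non-target) and positions where it is 1 are kept."""
    return features * attn[None, :, :]

feats = np.ones((3, 2, 2))             # toy 3-channel feature layer
attn = np.array([[1.0, 0.0],
                 [0.0, 1.0]])          # toy binary attention map
out = correct_features(feats, attn)
print(out[:, 0, 1])  # [0. 0. 0.] -- zeroed across all channels
```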
  • If the current sub-network is one of the first N sub-networks set in the plurality of sub-networks, the attention map of the current sub-network is used to set to zero the pixel values of pixels corresponding to at least part of the non-target objects in the feature map representing the feature information extracted by the current sub-network, obtaining the feature information of the region where the target object is located; and/or, if the current sub-network is not one of the first N sub-networks, the current sub-network performs the feature extraction operation on the feature map indicating the feature information of the region where the target object is located, generates the attention map of the current sub-network according to the extracted feature information, and uses that attention map to set to zero the pixel values of the regions corresponding to non-key points of the target object in the feature map of the feature information extracted by the current sub-network, obtaining the feature information of the regions corresponding to the key points of the target object; wherein the resolution of the attention maps corresponding to the first N sub-networks is lower than that of the attention maps corresponding to the last M−N sub-networks, where M is the total number of the plurality of sub-networks.
  • When the attention map is used to correct the feature information, it can be determined whether the current hourglass sub-network is one of the first N sub-networks set in the plurality of sub-networks; if yes, the attention map is used to correct the feature map output by the current hourglass sub-network, obtaining the feature information of the region where the target object is located; if not, the attention map is used to correct the feature map output by the current hourglass sub-network, obtaining the feature information of the key points of the target object. In this manner, the feature information extracted by the stacked hourglass sub-networks is differentiated, and the differentiation can be implemented by adjusting network parameters.
  • the resolution of the feature information extracted by the first N hourglass sub-networks is lower, which makes the foreground part containing the target object more prominent and removes as much as possible the influence of the background on subsequent processing; the resolution of the feature information extracted by the last M−N hourglass sub-networks is higher, so that, with the influence of the background removed, the key points of the target object are further detected and identified.
  • N can be appropriately set by a person skilled in the art according to actual needs.
  • For example, N can be set to half of M.
  • step S210 may be performed by the processor invoking a corresponding instruction stored in the memory, or may be performed by the first modification module 606 being executed by the processor.
  • Step S212 Perform key point detection on the target object according to the modified feature information.
  • step S212 may be performed by the processor invoking a corresponding instruction stored in the memory or by the detection module 608 being executed by the processor.
  • Based on the hourglass neural network, this example stacks eight hourglass sub-networks together; the initial input is the source picture, and the final output is multiple score maps over the pixel points of the source picture.
  • Each score map corresponds to a key point on a human body.
  • the position with the highest score on the score map for key point A represents the position where key point A is detected.
  • the hourglass neural network continuously reduces the resolution through the POOLING layers to obtain global features, then enlarges the global features by interpolation and combines them with the positions at the corresponding resolutions in the feature maps.
  • This example improves on the network stacked from the eight hourglass sub-networks by introducing an attention mechanism after the last convolutional layer of each hourglass sub-network, including: generating the attention map, smoothing the attention map, and using the attention map to change the values of the input features from the source image.
  • In the formula s = g(w⊤ ∗ f + b), f is the feature of the feature layer output by the last convolutional layer of the current hourglass sub-network, w⊤ is the matrix of the linear transformation (containing all trainable network parameters), b is the bias, and g() is the equation of the nonlinear transformation (such as a conditional random field or SOFTMAX).
  • the feature layer has multiple channels (such as the three commonly used settings of 128, 256, and 512), but s, as the output, has only one channel; the value of s is controlled between 0 and 1 by the nonlinear transformation g().
  • one way is to normalize the values in the attention map to 0-1 through the traditional SOFTMAX function; the other way is to learn a smoothing kernel through multiple iterations, that is, to remove the noise in the attention map through a conditional random field.
  • the conditional random field can be obtained by a person skilled in the art through any suitable method; its parameters can reflect the spatial context information between features, enabling smoothing of the attention map.
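  • The first of the two smoothing ways, SOFTMAX normalization, can be sketched as a softmax over all spatial positions of the map (an illustrative sketch; the patent does not fix this exact form):

```python
import numpy as np

def spatial_softmax(attn):
    """Normalize an H x W attention map with a softmax over all
    spatial positions, mapping every value into (0, 1)."""
    z = attn - attn.max()        # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

a = np.array([[1.0, 2.0],
              [3.0, 4.0]])
s = spatial_softmax(a)
print(round(float(s.sum()), 6))  # 1.0 -- the normalized values sum to one
```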
  • the attention map is a W*H map with only one channel, while the feature layer is a tensor of W*H*C, where W is the width, H is the height, and C is the number of channels; the attention map is broadcast across the C channels and then multiplied pointwise with the feature layer. Thus, a point of 1 in the attention map does not change the value at the corresponding position of the feature layer, while a point of 0 in the attention map sets the corresponding position in the feature layer to 0, classifying it as background so that it no longer participates in subsequent judgments.
  • feature layers of different resolutions are used, which combines the judgment of global features and local detail features.
  • a plurality of attention images of different sizes are generated.
  • the 8*8 attention map can pull the whole body out of the background, while in the 64*64 attention map only the key points of the human body are selected. The four attention maps are added together, and the merged attention map is then used to change the values of the input features from the source image.
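  • Merging attention maps of different sizes can be sketched by upsampling each to a common resolution and summing (nearest-neighbour upsampling is an assumption for illustration; the patent does not specify the interpolation):

```python
import numpy as np

def upsample_to(a, size):
    """Nearest-neighbour upsampling of a square map to size x size
    (assumes size is a multiple of the map's side)."""
    factor = size // a.shape[0]
    return np.kron(a, np.ones((factor, factor)))

def merge_attention_maps(maps, size):
    """Upsample attention maps of different resolutions to a common
    size and add them, yielding one merged coarse-to-fine map."""
    return sum(upsample_to(m, size) for m in maps)

maps = [np.ones((8, 8)), np.ones((16, 16)),
        np.ones((32, 32)), np.ones((64, 64))]   # four toy maps
merged = merge_attention_maps(maps, 64)
print(merged.shape)  # (64, 64)
```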
  • this example uses a coarse-to-fine attention mechanism.
  • at different stages, the attention mechanism has different points of interest.
  • In the first four hourglass sub-networks, the network is relatively shallow and its ability to distinguish foreground from background is poor; therefore, in the first four hourglass sub-networks, the attention mechanism is used only to distinguish foreground from background, making a rough segmentation. In the last four hourglass sub-networks, the network is deeper, its learning ability is stronger, and its discrimination is better.
  • the attention mechanism is there used to further distinguish the classification of key points within the foreground (such as the head or the hand).
  • this example replaces all or part of the residual modules in each hourglass sub-neural network with a new hourglass residual module structure.
  • the original residual module there are only two branches of the A branch (ie, the Identity mapping branch) and the B branch (ie, the Residual branch).
  • This example adds a C branch, i.e., the hourglass residual branch.
  • the A branch is mainly used for identity mapping of the input image of the current hourglass residual module, and still outputs the input image; the B branch sequentially performs convolution processing, such as a 1×1 convolution, on the image input to the current hourglass residual module.
  • the attention mechanism is introduced in the hourglass neural network, which can effectively distinguish the foreground (such as human) and background (such as surrounding objects) where the target object of the image is located, and then focus on detecting the key points of the target object in the foreground.
  • On the one hand, the occluded parts of the target object can be segmented into the foreground so that they are more easily detected in subsequent detection; on the other hand, the key points of the target object are determined by combining the attention maps generated from feature layers of different resolutions.
  • the features of the smaller-resolution feature maps produce attention maps that cover a relatively large area.
  • the features of the larger-resolution feature maps produce attention maps that cover more detailed points.
  • the improved hourglass residual module is used to expand the receptive field of the model.
  • the neural network training method of various embodiments of the present application may be performed by any suitable device having data processing capabilities, including but not limited to: a terminal device and a server.
  • the neural network training method of this embodiment includes the following steps:
  • Step S302: Perform a feature extraction operation on the training sample image including the target object via the neural network.
  • step S302 may be performed by the processor invoking a corresponding instruction stored in the memory, or may be performed by the second feature extraction module 702 being executed by the processor.
  • the neural network may be any suitable neural network that can implement feature extraction and target object key point detection, such as, but not limited to: a convolutional neural network, a reinforcement learning neural network, the generative network in a generative adversarial network, and the like.
  • the convolutional neural network may be an hourglass neural network.
  • Step S304: Generate an attention map of the target object according to the extracted feature information.
  • the step S304 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a second generation module 704 being executed by the processor.
  • Step S306 Correct the feature information using the attention map.
  • step S306 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a second modification module 706 executed by the processor.
  • Step S308 Obtain key point prediction information of the target object according to the modified feature information.
  • the step S308 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a prediction module 708 being executed by the processor.
  • the training of neural networks, such as convolutional neural networks, is an iterative process of repeated training and learning.
  • the key points of the target objects in the image are predicted, and the key point prediction information of the target objects is obtained.
  • the network parameters of the convolutional neural network can be reversely adjusted to achieve a more accurate prediction.
  • the termination condition of the training may be a conventional condition, such as the number of training iterations reaching a set value, and the embodiment of the present application does not limit this.
  • Step S310 Obtain a difference between the key point prediction information and the key point annotation information in the training sample image.
  • the manner of obtaining the difference between the key point prediction information and the key point labeling information may be appropriately set by a person skilled in the art according to actual needs, including but not limited to the mean square error mode, etc., which is not limited by the embodiment of the present application.
  • the step S310 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a difference obtaining module 710 executed by the processor.
  • Step S312 Adjust network parameters of the convolutional neural network according to the difference.
  • the step S312 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by an adjustment module 712 that is executed by the processor.
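The cycle of steps S302–S312 can be sketched abstractly as a loop of prediction, difference computation, and parameter adjustment. In this hypothetical sketch a single linear layer stands in for the neural network, mean square error serves as the difference measure, and all shapes and the learning rate are illustrative assumptions:

```python
import numpy as np

# Toy sketch of the training cycle in steps S302-S312 (all shapes, the
# learning rate, and the single linear layer are illustrative assumptions).
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 16))           # stands in for extracted feature information
Y = X @ rng.normal(size=(16, 2))       # key point annotation information

W = np.zeros((16, 2))                  # network parameters to be learned
lr = 0.01
for _ in range(200):
    pred = X @ W                       # key point prediction information (S308)
    diff = pred - Y                    # difference vs. annotation information (S310)
    grad = 2.0 * X.T @ diff / len(X)   # gradient of the mean square error
    W -= lr * grad                     # adjust network parameters (S312)

final_loss = np.mean((X @ W - Y) ** 2)
```

The loop terminates after a fixed number of iterations, matching the termination condition mentioned above.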
  • through the above steps, the training of a neural network incorporating the attention mechanism is realized. The trained neural network can correct the feature information of the image to be detected using the attention map, making the feature information of the target object in the image more prominent and easier to detect and recognize.
  • Referring to FIG. 6, a flow chart of the steps of another neural network training method according to an embodiment of the present application is shown.
  • This embodiment takes the training of the hourglass neural network with the attention mechanism introduced as an example.
  • the training of other convolutional neural networks or other neural networks that introduce the attention mechanism can be implemented by referring to this embodiment.
  • the hourglass neural network in this embodiment includes a plurality of hourglass sub-neural networks.
  • the neural network training method of this embodiment includes the following steps:
  • Step S402 Perform a feature extraction operation on the training sample image including the target object via the hourglass neural network.
  • the step S402 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a second feature extraction module 802 executed by the processor.
  • the hourglass neural network includes a plurality of hourglass sub-neural networks, as shown in FIG. 3, wherein the input of the first hourglass sub-neural network is the original training sample image, and the input of every other hourglass sub-neural network is the output of the adjacent previous hourglass sub-neural network.
  • in this step, the training sample image is convolved through the hourglass neural network to obtain the first feature information of the training sample image.
  • the neural network uses a convolutional neural network, such as an hourglass neural network, and the hourglass neural network includes a plurality of hourglass sub-neural networks, wherein the output of a preceding hourglass sub-neural network serves as the input of the adjacent subsequent hourglass sub-neural network; each hourglass sub-neural network is trained by the method of the embodiment of the present application.
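The end-to-end stacking described above can be sketched as follows; the one-parameter `hourglass_sub_network` function is purely a stand-in for a real hourglass sub-neural network, used only to show how each stage's output becomes the next stage's input and how each stage can receive its own supervision:

```python
import numpy as np

def hourglass_sub_network(x, weight):
    """Stand-in for one hourglass sub-neural network (a single scaled tanh)."""
    return np.tanh(weight * x)

weights = [0.9, 1.1, 1.0]             # one parameter per stacked sub-network (assumed)
x = np.ones((4, 4))                   # original training sample "image"
intermediate_outputs = []
for w in weights:
    x = hourglass_sub_network(x, w)   # input = output of the adjacent previous stage
    intermediate_outputs.append(x)    # each stage can be supervised separately
```

Every stage keeps the spatial size of its input, which is what allows the stages to be chained directly.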
  • in some embodiments, the neural network includes a plurality of sub-neural networks stacked end-to-end. When the feature extraction operation is performed on the training sample image including the target object via the neural network, a plurality of feature maps of different resolutions output by multiple convolutional layers of the current sub-neural network can be obtained, and the plurality of feature maps are respectively upsampled to obtain the feature information corresponding to them, so that the obtained feature information is rich and accurate.
  • each hourglass sub-neural network includes at least one hourglass residual module, and each hourglass residual module includes a first residual branch, a second residual branch, and a third residual branch.
  • a feature extraction operation is performed on the training sample image including the target object via each hourglass residual module in each hourglass sub-neural network. The operation includes: performing an identity mapping on the image block input to the current hourglass residual module via the first residual branch, to obtain the first feature information included in the identity-mapped first image block; performing convolution processing, via the second residual branch, on the image region indicated by the convolution kernel size in the image block input to the current hourglass residual module, to obtain the second feature information included in the convolved second image region; pooling, via the third residual branch, the image block input to the current hourglass residual module according to the pooling kernel size, performing convolution processing on the image region in the pooled image block according to the convolution kernel size, and upsampling the convolved image region to generate a third image block of the same size as the image block input to the current hourglass residual module, to obtain the third feature information of the third image block; and combining the first feature information, the second feature information, and the third feature information to obtain the feature information extracted by the current hourglass residual module.
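The three-branch hourglass residual module can be sketched minimally as follows, under loud assumptions: a fixed 3x3 mean filter stands in for the learned convolution of the second branch, the third branch uses 2x2 max pooling with nearest-neighbour upsampling and omits its learned convolution, and the three branches are combined by summation:

```python
import numpy as np

def identity_branch(block):
    # first residual branch: identity mapping
    return block

def conv_branch(block):
    # second residual branch: 3x3 mean filter as a stand-in for a learned convolution
    padded = np.pad(block, 1, mode="edge")
    out = np.zeros_like(block)
    h, w = block.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + 3, j:j + 3].mean()
    return out

def pooled_branch(block):
    # third residual branch: 2x2 max pooling, then nearest-neighbour
    # upsampling back to the size of the input image block
    h, w = block.shape
    pooled = block.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
    return np.repeat(np.repeat(pooled, 2, axis=0), 2, axis=1)

def hourglass_residual_module(block):
    # combine the first, second and third feature information by summation
    return identity_branch(block) + conv_branch(block) + pooled_branch(block)

block = np.arange(16, dtype=float).reshape(4, 4)
out = hourglass_residual_module(block)
```

The pooled branch gives the module a larger receptive field than an ordinary two-branch residual module, which is the motivation given later for adding the C branch.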
  • a conventional residual module, that is, a residual module having only a first residual branch and a second residual branch, is also applicable to the solution of the embodiment of the present application.
  • if the current hourglass sub-neural network is the first sub-neural network in the plurality of sub-neural networks, a feature extraction operation is performed on the input original image including the target object via the hourglass residual module and/or residual module of the current hourglass sub-neural network; and/or, if the current hourglass sub-neural network is a non-first sub-neural network in the plurality of sub-neural networks, the feature extraction operation is performed, via the hourglass residual module and/or residual module of the current hourglass sub-neural network, on the image output by the previous hourglass sub-neural network adjacent to the current one.
  • the obtained feature information may be the feature information output by the last convolutional layer of the current hourglass sub-neural network.
  • Step S404 Generate an attention map of the target object according to the extracted feature information.
  • in some embodiments, the first feature information is nonlinearly transformed to obtain second feature information, and the attention map of the target object is generated according to the second feature information.
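One way the nonlinear transformation and attention-map generation could look is sketched below, with `tanh` as the nonlinearity and a random vector standing in for a learned 1x1 convolution that collapses the channels; both choices are assumptions, since the embodiment does not fix them:

```python
import numpy as np

rng = np.random.default_rng(1)
first_feature = rng.normal(size=(8, 8, 4))     # H x W x C first feature information
proj = rng.normal(size=(4,))                   # stand-in for a learned 1x1 convolution

second_feature = np.tanh(first_feature)        # nonlinear transformation
logits = second_feature @ proj                 # collapse channels to an H x W map
attention_map = 1.0 / (1.0 + np.exp(-logits))  # per-pixel attention weight in (0, 1)
```

Each spatial position thus receives a scalar weight indicating how likely it belongs to the target object.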
  • the method for generating an attention map in the embodiment shown in FIG. 2 above may be used, and details are not described herein again.
  • in some embodiments, a plurality of attention maps of different resolutions may be generated according to the feature information corresponding to the plurality of feature maps, and the attention maps of the plurality of different resolutions are combined to generate the final attention map of the target object for the current sub-neural network.
  • the step S404 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a second generation module 804 that is executed by the processor.
  • Step S406 Correct the feature information using the attention map.
  • in some embodiments, a conditional random field (CRF) may be used to smooth the attention map, or a normalization function may be used to normalize the attention map.
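A spatial softmax is one possible choice for the normalization function mentioned here (an assumption; the embodiment equally allows CRF smoothing, which is not shown):

```python
import numpy as np

def spatial_softmax(att):
    """Normalize an attention map so its weights over all positions sum to 1."""
    e = np.exp(att - att.max())   # subtract the max for numerical stability
    return e / e.sum()

att = np.array([[0.0, 1.0],
                [2.0, 3.0]])
normalized = spatial_softmax(att)
```

The relative ordering of positions is preserved: the most attended position before normalization remains the most attended afterwards.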
  • for each sub-neural network, the attention map of the current sub-neural network is generated according to the feature information extracted by the current sub-neural network, and this attention map is used to correct the feature information extracted by the current sub-neural network. If the current sub-neural network is a non-last sub-neural network in the plurality of sub-neural networks, the corrected feature information of the current sub-neural network is the input of the adjacent subsequent sub-neural network; and/or, if the current sub-neural network is the last sub-neural network in the plurality of sub-neural networks, key point detection is performed on the target object according to the corrected feature information of the current sub-neural network.
  • in some embodiments, the pixel values of regions corresponding to at least part of the non-target object in the feature map representing the feature information extracted by the current sub-neural network may be set to zero according to the attention map, to obtain the corrected feature information of the current sub-neural network.
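The zeroing operation can be sketched as an element-wise mask derived from the attention map; the 0.5 threshold below is an illustrative assumption:

```python
import numpy as np

feature_map = np.array([[5.0, 2.0],
                        [1.0, 4.0]])   # feature information extracted by the sub-network
attention   = np.array([[0.9, 0.2],
                        [0.1, 0.8]])   # attention map of the current sub-network

# keep positions judged likely to belong to the target object,
# zero the pixel values of (at least part of) the non-target regions
mask = (attention > 0.5).astype(feature_map.dtype)
corrected = feature_map * mask
# corrected -> [[5.0, 0.0], [0.0, 4.0]]
```

A soft variant would multiply the feature map by the attention weights directly instead of thresholding.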
  • step S406 may be performed by the processor invoking a corresponding instruction stored in the memory or by the second modification module 806 being executed by the processor.
  • Step S408 Obtain key point prediction information of the target object according to the modified feature information.
  • step S408 can be performed by the processor invoking a corresponding instruction stored in the memory or by the prediction module 808 being executed by the processor.
  • Step S410 Obtain a difference between the key point prediction information and the key point labeling information in the training sample image.
  • the difference between the key point prediction information and the key point labeling information may be calculated by a loss function, for example, the mean square error between the two.
  • the step S410 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a difference obtaining module 810 executed by the processor.
  • Step S412 Adjust network parameters of the current hourglass sub-neural network according to the difference.
  • the step S412 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by an adjustment module 812 that is executed by the processor.
  • through the above process, the training of a single hourglass sub-neural network is realized; the above training is performed for each hourglass sub-neural network to realize the training of the entire hourglass neural network.
  • the focus of training on different hourglass sub-neural networks can be different.
  • for example, eight hourglass sub-neural networks are stacked to form one hourglass neural network.
  • in the first four hourglass sub-neural networks, the network is shallow and its ability to distinguish foreground from background is poor; therefore, in the first four sub-networks, training focuses on distinguishing the foreground from the background through the attention mechanism and producing a rough segmentation. In the latter four hourglass sub-neural networks, the network is deeper, its learning ability is stronger, and its discriminative power is better, so the attention mechanism is used to further distinguish the classes of key points within the foreground (such as the head or the hand).
  • the differentiation of the emphasis can be achieved by a person skilled in the art by adjusting the network training parameters.
  • the residual module in the hourglass neural network used for training can be improved, with the new hourglass residual module structure replacing all or part of the residual modules in each hourglass sub-neural network.
  • in the original residual module, there are only two branches: the A branch (that is, the identity mapping branch) and the B branch (that is, the residual branch); this example adds a C branch (that is, the hourglass residual branch).
  • in this way, the judgment is not limited to a small local area, and the training difficulty and burden of the hourglass sub-neural network are alleviated.
  • through the above steps, the training of a neural network incorporating the attention mechanism is realized. The trained neural network can correct the feature information of the image to be detected using the attention map, making the feature information of the target object in the image more prominent and easier to detect and recognize.
  • the neural network training method of this embodiment may be performed by any suitable device having data processing capabilities, including but not limited to: a terminal device, a server, and the like.
  • the method of any of the above embodiments of the present application may be performed by any suitable device having data processing capabilities, including but not limited to: a terminal device, a server, and the like.
  • alternatively, the method of any of the above embodiments of the present application may be executed by a processor, for example, by the processor executing a corresponding instruction stored in the memory. This will not be repeated below.
  • the foregoing program may be stored in a computer-readable storage medium; when executed, the program performs the steps of the foregoing method embodiments. The foregoing storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
  • the key point detecting device of the embodiments of the present application can be used to implement the corresponding key point detecting method in the foregoing method embodiments, and has the beneficial effects of the corresponding method embodiments, and details are not described herein again.
  • the key point detecting device of this embodiment includes: a first feature extraction module 502, configured to perform a feature extraction operation on the image to be detected including the target object via the neural network; a first generation module 504, configured to generate an attention map of the target object according to the extracted feature information; a first correction module 506, configured to correct the feature information using the attention map; and a detection module 508, configured to perform key point detection on the target object according to the corrected feature information.
  • the key point detecting device of this embodiment includes: a first feature extraction module 602, configured to perform a feature extraction operation on the image to be detected including the target object via the neural network; a first generation module 604, configured to generate an attention map of the target object according to the extracted feature information; a first correction module 606, configured to correct the feature information using the attention map; and a detection module 608, configured to perform key point detection on the target object according to the corrected feature information.
  • in some embodiments, the first feature extraction module 602 is configured to perform a convolution operation on the image to be detected via a convolutional neural network to obtain the first feature information of the image to be detected; the first generation module 604 is configured to nonlinearly transform the first feature information to obtain second feature information, and to generate the attention map of the target object according to the second feature information.
  • in some embodiments, the key point detecting apparatus of this embodiment further includes: a first processing module 610, configured to smooth the attention map using a conditional random field, or to normalize the attention map using a normalization function, before the first correction module 606 corrects the feature information using the attention map.
  • in some embodiments, the neural network includes multiple sub-neural networks stacked end-to-end. For each sub-neural network, the first generation module 604 generates the attention map of the current sub-neural network according to the feature information extracted by the current sub-neural network, and the first correction module 606 corrects the feature information extracted by the current sub-neural network using this attention map. If the current sub-neural network is a non-last sub-neural network in the plurality of sub-neural networks, the corrected feature information of the current sub-neural network is the input of the adjacent subsequent sub-neural network; and/or, if the current sub-neural network is the last sub-neural network in the plurality of sub-neural networks, the detection module 608 performs key point detection on the target object according to the corrected feature information of the current sub-neural network.
  • in some embodiments, when correcting the feature information extracted by the current sub-neural network using the attention map of the current sub-neural network, the first correction module 606 sets to zero, according to the attention map of the current sub-neural network, the pixel values of regions corresponding to at least part of the non-target object in the feature map representing the feature information extracted by the current sub-neural network, to obtain the corrected feature information of the current sub-neural network.
  • in some embodiments, if the current sub-neural network is one of the first N sub-neural networks in the plurality of sub-neural networks, the first correction module 606 sets to zero, according to the attention map of the current sub-neural network, the pixel values of regions corresponding to at least part of the non-target object in the feature map representing the feature information extracted by the current sub-neural network, to obtain the feature information of the region where the target object is located; and/or, if the current sub-neural network is not one of the first N sub-neural networks in the plurality of sub-neural networks, the current sub-neural network performs a feature extraction operation on the feature map representing the feature information of the region where the target object is located, generates the attention map of the current sub-neural network according to the extracted feature information, and uses this attention map to set to zero the pixel values of regions corresponding to key points of the non-target object, to obtain the feature information of regions corresponding to the key points of the target object. The resolution of the attention maps corresponding to the first N sub-neural networks is lower than that of the latter M-N sub-neural networks, where M represents the total number of the plurality of sub-neural networks, M is an integer greater than 1, N is an integer greater than 0, and N is less than M.
  • in some embodiments, the first feature extraction module 602 obtains multiple feature maps of different resolutions output by multiple convolutional layers of the current sub-neural network, and respectively upsamples the multiple feature maps to obtain the feature information corresponding to them;
  • the first generation module 604 generates a plurality of attention maps of different resolutions according to the feature information corresponding to the multiple feature maps, and combines the attention maps of the plurality of different resolutions to generate the final attention map of the target object for the current sub-neural network.
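Combining attention maps of different resolutions might look like the following sketch, where nearest-neighbour upsampling and simple averaging are assumptions (the text only states that the maps are combined):

```python
import numpy as np

def upsample_nearest(att, factor):
    """Nearest-neighbour upsampling of a 2-D attention map."""
    return np.repeat(np.repeat(att, factor, axis=0), factor, axis=1)

coarse = np.array([[0.2, 0.8],
                   [0.4, 0.6]])   # low-resolution attention map
fine = np.full((4, 4), 0.5)       # high-resolution attention map

# bring both maps to the highest resolution, then average them
combined = (upsample_nearest(coarse, 2) + fine) / 2.0
```

Summation or learned weighting would serve equally as the combining operation.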
  • the neural network is an hourglass neural network.
  • in some embodiments, the hourglass neural network includes a plurality of hourglass sub-neural networks, each hourglass sub-neural network includes at least one hourglass residual module, and each hourglass residual module includes a first residual branch, a second residual branch, and a third residual branch. When performing the feature extraction operation on the image to be detected including the target object via each hourglass residual module in each hourglass sub-neural network, the first feature extraction module 602 performs an identity mapping on the image block input to the current hourglass residual module via the first residual branch, to obtain the first feature information included in the identity-mapped first image block; performs convolution processing, via the second residual branch, on the image region indicated by the convolution kernel size in the image block input to the current hourglass residual module, to obtain the second feature information included in the convolved second image region; and, via the third residual branch, pools the image block input to the current hourglass residual module according to the pooling kernel size, performs convolution processing on the image region in the pooled image block according to the convolution kernel size, and upsamples the convolved image region to generate a third image block of the same size as the image block input to the current hourglass residual module, to obtain the third feature information of the third image block; the first feature information, the second feature information, and the third feature information are combined to obtain the feature information extracted by the current hourglass residual module.
  • in some embodiments, when performing the feature extraction operation, the first feature extraction module 602: if the current hourglass sub-neural network is the first sub-neural network in the plurality of sub-neural networks, performs the feature extraction operation on the input original image to be detected including the target object via the hourglass residual module and/or residual module of the current hourglass sub-neural network; and/or, if the current hourglass sub-neural network is a non-first sub-neural network in the plurality of sub-neural networks, performs the feature extraction operation, via the hourglass residual module and/or residual module of the current hourglass sub-neural network, on the image output by the previous hourglass sub-neural network adjacent to the current one.
  • the key point detecting device of the embodiments of the present application can be used to implement the corresponding key point detection method in the foregoing method embodiments, and has the beneficial effects of the corresponding method embodiments; details are not described herein again.
  • Referring to FIG. 9, a block diagram of a neural network training device according to an embodiment of the present application is shown.
  • the neural network training device of this embodiment includes: a second feature extraction module 702, configured to perform a feature extraction operation on the training sample image including the target object via the neural network; a second generation module 704, configured to generate an attention map of the target object according to the extracted feature information; a second correction module 706, configured to correct the feature information using the attention map; a prediction module 708, configured to obtain key point prediction information of the target object according to the corrected feature information; a difference obtaining module 710, configured to obtain the difference between the key point prediction information and the key point annotation information in the training sample image; and an adjustment module 712, configured to adjust the network parameters of the neural network according to the difference.
  • Referring to FIG. 10, a block diagram of another neural network training device according to an embodiment of the present application is shown.
  • the neural network training device of this embodiment includes: a second feature extraction module 802, configured to perform a feature extraction operation on the training sample image including the target object via the neural network; a second generation module 804, configured to generate an attention map of the target object according to the extracted feature information; a second correction module 806, configured to correct the feature information using the attention map; a prediction module 808, configured to obtain key point prediction information of the target object according to the corrected feature information; a difference obtaining module 810, configured to obtain the difference between the key point prediction information and the key point annotation information in the training sample image; and an adjustment module 812, configured to adjust the network parameters of the neural network according to the difference.
  • in some embodiments, the second feature extraction module 802 is configured to perform a convolution operation on the training sample image via the convolutional neural network to obtain the first feature information of the training sample image; the second generation module 804 is configured to nonlinearly transform the first feature information to obtain second feature information, and to generate the attention map of the target object according to the second feature information.
  • in some embodiments, the neural network training apparatus of this embodiment further includes: a second processing module 814, configured to smooth the attention map using a conditional random field (CRF), or to normalize the attention map using a normalization function, before the second correction module 806 corrects the feature information using the attention map.
  • in some embodiments, the neural network includes a plurality of sub-neural networks stacked end-to-end. For each sub-neural network, the second generation module 804 generates the attention map of the current sub-neural network according to the feature information extracted by the current sub-neural network, and the second correction module 806 corrects the feature information extracted by the current sub-neural network using this attention map. If the current sub-neural network is a non-last sub-neural network in the plurality of sub-neural networks, the corrected feature information of the current sub-neural network is the input of the adjacent subsequent sub-neural network; and/or, if the current sub-neural network is the last sub-neural network in the plurality of sub-neural networks, the prediction module 808 performs key point prediction on the target object according to the corrected feature information of the current sub-neural network, to obtain the key point prediction information of the target object.
  • in some embodiments, when correcting the feature information extracted by the current sub-neural network using the attention map of the current sub-neural network, the second correction module 806 sets to zero, according to the attention map of the current sub-neural network, the pixel values of regions corresponding to at least part of the non-target object in the feature map representing the feature information extracted by the current sub-neural network, to obtain the corrected feature information of the current sub-neural network.
  • in some embodiments, the second feature extraction module 802 obtains multiple feature maps of different resolutions output by multiple convolutional layers of the current sub-neural network, and respectively upsamples the multiple feature maps to obtain the feature information corresponding to them; the second generation module 804 generates a plurality of attention maps of different resolutions according to the feature information corresponding to the multiple feature maps, and combines the attention maps of the plurality of different resolutions to generate the final attention map of the target object for the current sub-neural network.
  • the neural network is an hourglass neural network.
  • in some embodiments, the hourglass neural network comprises a plurality of hourglass sub-neural networks, wherein the output of a preceding hourglass sub-neural network is used as the input of the adjacent subsequent hourglass sub-neural network, and each hourglass sub-neural network is trained by the neural network training device of this embodiment.
  • in some embodiments, each hourglass sub-neural network includes at least one hourglass residual module, and each hourglass residual module includes a first residual branch, a second residual branch, and a third residual branch. When performing the feature extraction operation on the training sample image including the target object via each hourglass residual module in each hourglass sub-neural network, the second feature extraction module 802 performs an identity mapping on the image block input to the current hourglass residual module via the first residual branch, to obtain the first feature information included in the identity-mapped first image block; performs convolution processing, via the second residual branch, on the image region indicated by the convolution kernel size in the image block input to the current hourglass residual module, to obtain the second feature information included in the convolved second image region; and, via the third residual branch, pools the image block input to the current hourglass residual module according to the pooling kernel size, performs convolution processing on the image region in the pooled image block according to the convolution kernel size, and upsamples the convolved image region to generate a third image block of the same size as the image block input to the current hourglass residual module, to obtain the third feature information of the third image block; the first feature information, the second feature information, and the third feature information are combined to obtain the feature information extracted by the current hourglass residual module.
  • in some embodiments, when performing the feature extraction operation, the second feature extraction module 802: if the current hourglass sub-neural network is the first sub-neural network in the plurality of sub-neural networks, performs the feature extraction operation on the input original training sample image including the target object via the hourglass residual module and/or residual module of the current hourglass sub-neural network; and/or, if the current hourglass sub-neural network is a non-first sub-neural network in the plurality of sub-neural networks, performs the feature extraction operation, via the hourglass residual module and/or residual module of the current hourglass sub-neural network, on the image output by the previous hourglass sub-neural network adjacent to the current one.
  • an embodiment of the present application further provides an electronic device, including: a processor and a memory.
  • the memory is configured to store at least one executable instruction that causes the processor to perform the operations corresponding to the key point detection method or the neural network training method of any of the above embodiments of the present application.
  • the electronic device provided by the embodiment of the present application may be, for example, a mobile terminal, a personal computer (PC), a tablet computer, a server, or the like.
  • the electronic device 900 includes one or more first processors and a first communication component, the one or more first processors being, for example, one or more central processing units (CPUs) 901 and/or one or more graphics processors (GPUs) 913. The first processor may execute various appropriate actions and processes according to executable instructions stored in a read-only memory (ROM) 902 or loaded into a random access memory (RAM) 903 from a storage portion 908.
  • the first read only memory 902 and the random access memory 903 are collectively referred to as a first memory.
  • the first communication component includes a communication component 912 and/or a communication interface 909.
  • the communication component 912 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (InfiniBand) network card.
  • the communication interface 909 includes a communication interface of a network interface card such as a LAN card or a modem, and performs communication processing via a network such as the Internet.
  • the first processor may communicate with the read-only memory 902 and/or the random access memory 903 to execute executable instructions, is connected to the communication component 912 via the first communication bus 904, and communicates with other target devices via the communication component 912, thereby completing operations corresponding to the key point detection method of any embodiment of the present application, for example: performing a feature extraction operation on an image to be detected that includes a target object via a neural network; generating an attention map of the target object according to the extracted feature information; correcting the feature information using the attention map; and performing key point detection on the target object according to the corrected feature information.
  • alternatively, it completes operations corresponding to the neural network training method of any embodiment of the present application, for example: performing feature extraction on a training sample image that includes a target object via a neural network; generating an attention map of the target object according to the extracted feature information; correcting the feature information using the attention map; obtaining key point prediction information of the target object according to the corrected feature information; obtaining a difference between the key point prediction information and key point annotation information in the training sample image; and adjusting network parameters of the neural network according to the difference.
  • in the RAM 903, various programs and data required for the operation of the device may also be stored.
  • the CPU 901 or the GPU 913, the ROM 902, and the RAM 903 are connected to each other through the first communication bus 904.
  • ROM 902 is an optional module.
  • the RAM 903 stores executable instructions, or executable instructions are written into the ROM 902 at runtime; the executable instructions cause the first processor to perform the operations corresponding to the above-described methods.
  • An input/output (I/O) interface 905 is also coupled to the first communication bus 904.
  • the communication component 912 may be integrated, or may be configured to have multiple sub-modules (e.g., multiple IB network cards) attached to the communication bus link.
  • the following components are connected to the I/O interface 905: an input portion 906 including a keyboard, a mouse, and the like; an output portion 907 including, for example, a cathode ray tube (CRT), a liquid crystal display (LCD), and the like; a storage portion 908 including a hard disk and the like; and a communication interface 909 including a network interface card such as a LAN card or a modem.
  • a drive 910 is also connected to the I/O interface 905 as needed.
  • a removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory or the like is mounted on the drive 910 as needed so that a computer program read therefrom is installed into the storage portion 908 as needed.
  • it should be noted that the architecture shown in FIG. 11 is only an optional implementation; in practice, the number and types of the components in FIG. 11 may be selected, deleted, added, or replaced according to actual needs. In the arrangement of different functional components, separate or integrated implementations may also be adopted: for example, the GPU and the CPU may be arranged separately, or the GPU may be integrated on the CPU; the communication component may be arranged separately, or may be integrated on the CPU or the GPU; and so on.
  • These alternative embodiments are all within the scope of the present application.
  • embodiments of the present application include a computer program product comprising a computer program tangibly embodied on a machine-readable medium; the computer program comprises program code for executing the method illustrated in the flowchart, and the program code may include instructions corresponding to the method steps provided by the embodiments of the present application, for example: performing a feature extraction operation on an image to be detected that includes a target object via a neural network; generating an attention map of the target object according to the extracted feature information; correcting the feature information using the attention map; and performing key point detection on the target object according to the corrected feature information.
  • the computer program can be downloaded and installed from the network via a communication component, and/or installed from the removable media 911.
  • the above-described functions defined in the method of the embodiments of the present application are executed when the computer program is executed by the first processor.
  • the embodiments of the present application further provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the key point detection method or the neural network training method of any of the above embodiments of the present application is implemented.
  • the embodiments of the present application further provide a computer program, including computer instructions; when the computer instructions are run in a processor of a device, the key point detection method or the neural network training method of any of the above embodiments of the present application is implemented.
  • the methods and apparatuses of the present application may be implemented in many ways.
  • for example, the methods and apparatuses of the embodiments of the present application may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware.
  • the above-described order of the steps of the method is for illustration only; the steps of the methods of the embodiments of the present application are not limited to the order described above unless otherwise specified.
  • the present application may also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the embodiments of the present application.
  • the present application therefore also covers a recording medium storing a program for performing the methods according to the embodiments of the present application.

Abstract

A key point detection method, a neural network training method, apparatuses, and an electronic device. The key point detection method includes: performing a feature extraction operation on an image to be detected that includes a target object via a neural network (S102); generating an attention map of the target object according to the extracted feature information (S104); correcting the feature information using the attention map (S106); and performing key point detection on the target object according to the corrected feature information (S108). The above method makes the feature information of the target object in the image to be detected more prominent and easier to detect and recognize, improving detection accuracy and reducing false detections and missed detections.

Description

Key point detection method, neural network training method, apparatus, and electronic device
This application claims priority to Chinese patent application No. CN201710100498.2, entitled "Key point detection method, neural network training method, apparatus, and electronic device", filed with the Chinese Patent Office on February 23, 2017, the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present application relate to artificial intelligence technologies, and in particular to a key point detection method, apparatus, and electronic device, and a neural network training method, apparatus, and electronic device.
Background
Neural networks are an important research field in computer vision and pattern recognition; inspired by the biological brain, they use computers to process information about specific objects in a human-like way. With neural networks, target objects (such as people, animals, and vehicles) can be detected and recognized effectively. With the development of Internet technology and the sharp increase in the amount of information, neural networks are increasingly widely applied in image detection and target object recognition, so as to find the actually needed information among large amounts of information.
Summary of the Invention
Embodiments of the present application provide a key point detection scheme and a neural network training scheme.
According to one aspect of the embodiments of the present application, a key point detection method is provided, including: performing feature extraction on an image to be detected that includes a target object via a neural network; generating an attention map of the target object according to the extracted feature information; correcting the feature information using the attention map; and performing key point detection on the target object according to the corrected feature information.
Optionally, performing the feature extraction operation on the image to be detected that includes the target object via the neural network includes: performing a convolution operation on the image to be detected via a convolutional neural network to obtain first feature information of the image to be detected. Generating the attention map of the target object according to the extracted feature information includes: performing a non-linear transformation on the first feature information to obtain second feature information, and generating the attention map of the target object according to the second feature information.
Optionally, before correcting the feature information using the attention map, the method further includes: smoothing the attention map using a conditional random field; or normalizing the attention map using a normalization function.
Optionally, the neural network includes multiple sub-neural networks stacked end to end. For each sub-neural network, an attention map of the current sub-neural network is generated according to the feature information extracted by the current sub-neural network, and the feature information extracted by the current sub-neural network is corrected using that attention map. If the current sub-neural network is not the last of the multiple sub-neural networks, its corrected feature information is the input of the adjacent following sub-neural network; and/or, if the current sub-neural network is the last of the multiple sub-neural networks, key point detection is performed on the target object according to its corrected feature information.
Optionally, correcting the feature information extracted by the current sub-neural network using the attention map of the current sub-neural network includes: according to the attention map of the current sub-neural network, setting to zero the pixel values of at least part of the regions corresponding to non-target objects in the feature map representing the feature information extracted by the current sub-neural network, to obtain the corrected feature information of the current sub-neural network.
Optionally, the above zeroing operation includes: if the current sub-neural network is one of the first N sub-neural networks set among the multiple sub-neural networks, using the attention map of the current sub-neural network to set to zero the pixel values of at least part of the regions corresponding to non-target objects in the feature map representing the extracted feature information, to obtain the feature information of the region where the target object is located; and/or, if the current sub-neural network is not one of the set first N sub-neural networks, performing feature extraction via the current sub-neural network on the feature map representing the feature information of the region where the target object is located, generating the attention map of the current sub-neural network according to the extracted feature information, and using that attention map to set to zero the pixel values of at least part of the regions corresponding to key points of non-target objects in the feature map, to obtain the feature information of the regions corresponding to the key points of the target object. The resolution of the attention maps corresponding to the first N sub-neural networks is lower than that of the attention maps corresponding to the last M-N sub-neural networks, where M denotes the total number of the multiple sub-neural networks, M is an integer greater than 1, and N is an integer greater than 0 and less than M.
Optionally, for each sub-neural network, performing feature extraction on the image to be detected that includes the target object includes: obtaining multiple feature maps of different resolutions output by multiple convolutional layers of the current sub-neural network, and upsampling the multiple feature maps respectively to obtain the feature information corresponding to the multiple feature maps. Generating the attention map of the target object according to the extracted feature information includes: generating multiple corresponding attention maps of different resolutions according to the feature information corresponding to the multiple feature maps, and merging the multiple attention maps of different resolutions to generate the attention map of the target object for the current sub-neural network.
Optionally, the neural network includes an hourglass neural network.
Optionally, the hourglass neural network includes multiple hourglass sub-neural networks, each of which includes at least one hourglass residual module; each hourglass residual module includes a first residual branch, a second residual branch, and a third residual branch. Performing feature extraction on the image to be detected via each hourglass residual module in each hourglass sub-neural network includes: performing, via the first residual branch, an identity mapping on the image block input to the current hourglass residual module, to obtain first feature information contained in the identity-mapped first image block; performing, via the second residual branch, convolution processing on the image region indicated by the convolution kernel size in the image block input to the current hourglass residual module, to obtain second feature information contained in the convolved second image region; pooling, via the third residual branch, the image block input to the current hourglass residual module according to the pooling kernel size, performing convolution processing on the image region in the pooled image block according to the convolution kernel size, and upsampling the convolved image region to generate a third image block of the same size as the image block input to the current hourglass residual module, to obtain third feature information of the third image block; and merging the first feature information, the second feature information, and the third feature information to obtain the feature information extracted by the current hourglass residual module.
Optionally, if the current hourglass sub-neural network is the first of the multiple sub-neural networks, feature extraction is performed, via the hourglass residual module and/or the residual module of the current hourglass sub-neural network, on the input original image to be detected that includes the target object; and/or, if the current hourglass sub-neural network is not the first of the multiple sub-neural networks, feature extraction is performed, via the hourglass residual module and/or the residual module of the current hourglass sub-neural network, on the image output by the preceding adjacent hourglass sub-neural network.
According to another aspect of the embodiments of the present application, a neural network training method is provided, including: performing feature extraction on a training sample image that includes a target object via a neural network; generating an attention map of the target object according to the extracted feature information; correcting the feature information using the attention map; obtaining key point prediction information of the target object according to the corrected feature information; obtaining a difference between the key point prediction information and the key point annotation information in the training sample image; and adjusting network parameters of the neural network according to the difference.
Optionally, performing feature extraction on the training sample image that includes the target object via the neural network includes: convolving the training sample image via a convolutional neural network to obtain first feature information of the training sample image. Generating the attention map of the target object according to the extracted feature information includes: performing a non-linear transformation on the first feature information to obtain second feature information, and generating the attention map of the target object according to the second feature information.
Optionally, before correcting the feature information using the attention map, the method further includes: smoothing the attention map using a conditional random field; or normalizing the attention map using a normalization function.
Optionally, the neural network includes multiple sub-neural networks stacked end to end. For each sub-neural network, an attention map of the current sub-neural network is generated according to the feature information extracted by the current sub-neural network, and that feature information is corrected using the attention map. If the current sub-neural network is not the last of the multiple sub-neural networks, its corrected feature information is the input of the adjacent following sub-neural network; and/or, if the current sub-neural network is the last of the multiple sub-neural networks, key point prediction is performed on the target object according to its corrected feature information to obtain key point prediction information of the target object.
Optionally, correcting the feature information extracted by the current sub-neural network using the attention map of the current sub-neural network includes: according to the attention map of the current sub-neural network, setting to zero the pixel values of at least part of the regions corresponding to non-target objects in the feature map representing the feature information extracted by the current sub-neural network, to obtain the corrected feature information of the current sub-neural network.
Optionally, for each sub-neural network, performing feature extraction on the training sample image that includes the target object via the neural network includes: obtaining multiple feature maps of different resolutions output by multiple convolutional layers of the current sub-neural network, and upsampling the multiple feature maps respectively to obtain the feature information corresponding to the multiple feature maps. Generating the attention map of the target object according to the extracted feature information includes: generating multiple corresponding attention maps of different resolutions according to the feature information corresponding to the multiple feature maps, and merging the multiple attention maps of different resolutions to generate the attention map of the target object for the current sub-neural network.
Optionally, the neural network is an hourglass neural network.
Optionally, the hourglass neural network includes multiple hourglass sub-neural networks, where the output of a preceding hourglass sub-neural network serves as the input of the adjacent following hourglass sub-neural network, and each hourglass sub-neural network is trained using the neural network training method of any of the above embodiments of the present application.
Optionally, each hourglass sub-neural network includes at least one hourglass residual module; each hourglass residual module includes a first residual branch, a second residual branch, and a third residual branch. Performing feature extraction on the training sample image that includes the target object via each hourglass residual module in each hourglass sub-neural network includes: performing, via the first residual branch, an identity mapping on the image block input to the current hourglass residual module, to obtain first feature information contained in the identity-mapped first image block; performing, via the second residual branch, convolution processing on the image region indicated by the convolution kernel size in the image block input to the current hourglass residual module, to obtain second feature information contained in the convolved second image region; pooling, via the third residual branch, the image block input to the current hourglass residual module according to the pooling kernel size, performing convolution processing on the image region in the pooled image block according to the convolution kernel size, and upsampling the convolved image region to generate a third image block of the same size as the image block input to the current hourglass residual module, to obtain third feature information of the third image block; and merging the first feature information, the second feature information, and the third feature information to obtain the feature information extracted by the current hourglass residual module.
Optionally, if the current hourglass sub-neural network is the first of the multiple sub-neural networks, feature extraction is performed, via the hourglass residual module and/or the residual module of the current hourglass sub-neural network, on the input original image to be detected that includes the target object; and/or, if the current hourglass sub-neural network is not the first of the multiple sub-neural networks, feature extraction is performed, via the hourglass residual module and/or the residual module of the current hourglass sub-neural network, on the image output by the preceding adjacent hourglass sub-neural network.
According to yet another aspect of the embodiments of the present application, a key point detection apparatus is provided, including: a first feature extraction module configured to perform feature extraction on an image to be detected that includes a target object via a neural network; a first generation module configured to generate an attention map of the target object according to the extracted feature information; a first correction module configured to correct the feature information using the attention map; and a detection module configured to perform key point detection on the target object according to the corrected feature information.
Optionally, the first feature extraction module is configured to perform a convolution operation on the image to be detected via a convolutional neural network to obtain first feature information of the image to be detected; the first generation module is configured to perform a non-linear transformation on the first feature information to obtain second feature information, and to generate the attention map of the target object according to the second feature information.
Optionally, the apparatus further includes: a first processing module configured to, before the first correction module corrects the feature information using the attention map, smooth the attention map using a conditional random field, or normalize the attention map using a normalization function.
Optionally, the neural network includes multiple sub-neural networks stacked end to end. For each sub-neural network, the first generation module generates an attention map of the current sub-neural network according to the feature information extracted by it, and the first correction module corrects that feature information using the attention map of the current sub-neural network. If the current sub-neural network is not the last of the multiple sub-neural networks, its corrected feature information is the input of the adjacent following sub-neural network; and/or, if the current sub-neural network is the last of the multiple sub-neural networks, the detection module performs key point detection on the target object according to its corrected feature information.
Optionally, when correcting the feature information extracted by the current sub-neural network using its attention map, the first correction module sets to zero, according to the attention map of the current sub-neural network, the pixel values of at least part of the regions corresponding to non-target objects in the feature map representing the extracted feature information, obtaining the corrected feature information of the current sub-neural network.
Optionally, when performing the above zeroing operation, the first correction module: if the current sub-neural network is one of the first N sub-neural networks set among the multiple sub-neural networks, uses the attention map of the current sub-neural network to set to zero the pixel values of at least part of the regions corresponding to non-target objects in the feature map representing the extracted feature information, obtaining the feature information of the region where the target object is located; and/or, if the current sub-neural network is not one of the set first N sub-neural networks, performs feature extraction via the current sub-neural network on the feature map representing the feature information of the region where the target object is located, generates the attention map of the current sub-neural network according to the extracted feature information, and uses that attention map to set to zero the pixel values of at least part of the regions corresponding to key points of non-target objects, obtaining the feature information of the regions corresponding to the key points of the target object. The resolution of the attention maps corresponding to the first N sub-neural networks is lower than that of the attention maps corresponding to the last M-N sub-neural networks, where M denotes the total number of the multiple sub-neural networks, M is an integer greater than 1, and N is an integer greater than 0 and less than M.
Optionally, for each sub-neural network, the first feature extraction module obtains multiple feature maps of different resolutions output by multiple convolutional layers of the current sub-neural network, and upsamples the multiple feature maps respectively to obtain their corresponding feature information; the first generation module generates multiple corresponding attention maps of different resolutions according to the feature information corresponding to the multiple feature maps, and merges the attention maps of different resolutions to generate the attention map of the target object for the current sub-neural network.
Optionally, the neural network is an hourglass neural network.
Optionally, the hourglass neural network includes multiple hourglass sub-neural networks, each including at least one hourglass residual module; each hourglass residual module includes a first residual branch, a second residual branch, and a third residual branch. When performing feature extraction on the image to be detected via each hourglass residual module in each hourglass sub-neural network, the first feature extraction module: performs, via the first residual branch, an identity mapping on the image block input to the current hourglass residual module, obtaining the first feature information contained in the identity-mapped first image block; performs, via the second residual branch, convolution processing on the image region indicated by the convolution kernel size in the input image block, obtaining the second feature information contained in the convolved second image region; pools, via the third residual branch, the input image block according to the pooling kernel size, convolves the image region in the pooled image block according to the convolution kernel size, and upsamples the convolved image region to generate a third image block of the same size as the input image block, obtaining the third feature information of the third image block; and merges the first feature information, the second feature information, and the third feature information to obtain the feature information extracted by the current hourglass residual module.
Optionally, when performing feature extraction, the first feature extraction module: if the current hourglass sub-neural network is the first of the multiple sub-neural networks, performs feature extraction, via the hourglass residual module and/or the residual module of the current hourglass sub-neural network, on the input original image to be detected that includes the target object; and/or, if the current hourglass sub-neural network is not the first of the multiple sub-neural networks, performs feature extraction, via the hourglass residual module and/or the residual module of the current hourglass sub-neural network, on the image output by the preceding adjacent hourglass sub-neural network.
According to still another aspect of the embodiments of the present application, a neural network training apparatus is provided, including: a second feature extraction module configured to perform feature extraction on a training sample image that includes a target object via a neural network; a second generation module configured to generate an attention map of the target object according to the extracted feature information; a second correction module configured to correct the feature information using the attention map; a prediction module configured to obtain key point prediction information of the target object according to the corrected feature information; a difference obtaining module configured to obtain a difference between the key point prediction information and the key point annotation information in the training sample image; and an adjustment module configured to adjust network parameters of the neural network according to the difference.
Optionally, the second feature extraction module is configured to convolve the training sample image via a convolutional neural network to obtain first feature information of the training sample image; the second generation module is configured to perform a non-linear transformation on the first feature information to obtain second feature information, and to generate the attention map of the target object according to the second feature information.
Optionally, the apparatus further includes: a second processing module configured to, before the second correction module corrects the feature information using the attention map, smooth the attention map using a conditional random field, or normalize the attention map using a normalization function.
Optionally, the neural network includes multiple sub-neural networks stacked end to end. For each sub-neural network, the second generation module generates an attention map of the current sub-neural network according to the feature information extracted by it, and the second correction module corrects that feature information using the attention map of the current sub-neural network. If the current sub-neural network is not the last of the multiple sub-neural networks, its corrected feature information is the input of the adjacent following sub-neural network; and/or, if the current sub-neural network is the last of the multiple sub-neural networks, the prediction module performs key point prediction on the target object according to its corrected feature information, obtaining key point prediction information of the target object.
Optionally, when correcting the feature information extracted by the current sub-neural network using its attention map, the second correction module sets to zero, according to the attention map of the current sub-neural network, the pixel values of at least part of the regions corresponding to non-target objects in the feature map representing the extracted feature information, obtaining the corrected feature information of the current sub-neural network.
Optionally, for each sub-neural network, the second feature extraction module obtains multiple feature maps of different resolutions output by multiple convolutional layers of the current sub-neural network, and upsamples the multiple feature maps respectively to obtain their corresponding feature information; the second generation module generates multiple corresponding attention maps of different resolutions according to the feature information corresponding to the multiple feature maps, and merges the attention maps of different resolutions to generate the attention map of the target object for the current sub-neural network.
Optionally, the neural network is an hourglass neural network.
Optionally, the hourglass neural network includes multiple hourglass sub-neural networks, where the output of a preceding hourglass sub-neural network serves as the input of the adjacent following hourglass sub-neural network, and each hourglass sub-neural network is trained using the apparatus described in the fourth aspect.
Optionally, each hourglass sub-neural network includes at least one hourglass residual module; each hourglass residual module includes a first residual branch, a second residual branch, and a third residual branch. When performing feature extraction on the training sample image via each hourglass residual module in each hourglass sub-neural network, the second feature extraction module: performs, via the first residual branch, an identity mapping on the image block input to the current hourglass residual module, obtaining the first feature information contained in the identity-mapped first image block; performs, via the second residual branch, convolution processing on the image region indicated by the convolution kernel size in the input image block, obtaining the second feature information contained in the convolved second image region; pools, via the third residual branch, the input image block according to the pooling kernel size, convolves the image region in the pooled image block according to the convolution kernel size, and upsamples the convolved image region to generate a third image block of the same size as the input image block, obtaining the third feature information of the third image block; and merges the first feature information, the second feature information, and the third feature information to obtain the feature information extracted by the current hourglass residual module.
Optionally, when performing feature extraction, the second feature extraction module: if the current hourglass sub-neural network is the first of the multiple sub-neural networks, performs feature extraction, via the hourglass residual module and/or the residual module of the current hourglass sub-neural network, on the input original image to be detected that includes the target object; and/or, if the current hourglass sub-neural network is not the first of the multiple sub-neural networks, performs feature extraction, via the hourglass residual module and/or the residual module of the current hourglass sub-neural network, on the image output by the preceding adjacent hourglass sub-neural network.
According to still another aspect of the embodiments of the present application, an electronic device is provided, including a processor and a memory; the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform operations corresponding to the key point detection method or the neural network training method provided by any of the above embodiments of the present application.
According to still another aspect of the embodiments of the present application, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, the key point detection method or the neural network training method provided by any of the above embodiments of the present application is implemented. According to still another aspect of the embodiments of the present application, another computer-readable storage medium is provided, storing: executable instructions for performing feature extraction on an image to be detected that includes a target object via a neural network; executable instructions for generating an attention map of the target object according to the extracted feature information; executable instructions for correcting the feature information using the attention map; and executable instructions for performing key point detection on the target object according to the corrected feature information.
According to still another aspect of the embodiments of the present application, yet another computer-readable storage medium is provided, storing: executable instructions for performing feature extraction on a training sample image that includes a target object via a neural network; executable instructions for generating an attention map of the target object according to the extracted feature information; executable instructions for correcting the feature information using the attention map; executable instructions for obtaining key point prediction information of the target object according to the corrected feature information; executable instructions for obtaining a difference between the key point prediction information and the key point annotation information in the training sample image; and executable instructions for adjusting network parameters of the neural network according to the difference.
According to the technical solutions provided by the embodiments of the present application, an attention mechanism is introduced into the neural network, and an attention map is generated according to the feature information of the target object output by the neural network. A neural network with an attention mechanism can focus on the information of the target object; in the generated attention map, the feature information of the target object differs considerably from that of non-target objects. Therefore, correcting the feature map with the attention map, and thereby correcting the features of the target object, makes the feature information of the target object in the image to be detected more prominent and easier to detect and recognize, improves the accuracy of detection results, and reduces false detections and missed detections.
The technical solutions of the embodiments of the present application are further described in detail below with reference to the accompanying drawings and embodiments.
Brief Description of the Drawings
The accompanying drawings, which form a part of the specification, describe embodiments of the present application and, together with the description, serve to explain the principles of the present application.
The present application can be understood more clearly from the following detailed description with reference to the accompanying drawings, in which:
FIG. 1 is a flowchart of the steps of a key point detection method according to an embodiment of the present application;
FIG. 2 is a flowchart of the steps of another key point detection method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an hourglass network structure for key point detection in the embodiment shown in FIG. 2;
FIG. 4 is a schematic diagram of an improved hourglass residual module in the embodiment shown in FIG. 2;
FIG. 5 is a flowchart of the steps of a neural network training method according to an embodiment of the present application;
FIG. 6 is a flowchart of the steps of another neural network training method according to an embodiment of the present application;
FIG. 7 is a structural block diagram of a key point detection apparatus according to an embodiment of the present application;
FIG. 8 is a structural block diagram of another key point detection apparatus according to an embodiment of the present application;
FIG. 9 is a structural block diagram of a neural network training apparatus according to an embodiment of the present application;
FIG. 10 is a structural block diagram of another neural network training apparatus according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Implementations of the embodiments of the present application are described in further detail below with reference to the accompanying drawings (in which the same reference numerals denote the same elements) and the embodiments. The following embodiments are intended to illustrate the present application, not to limit its scope.
It should be noted that, unless otherwise specified, the relative arrangement of the components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present application. Those skilled in the art will understand that terms such as "first" and "second" in the embodiments of the present application are only used to distinguish different steps, devices, or modules, and represent neither any particular technical meaning nor any necessary logical order between them.
It should also be understood that, for ease of description, the dimensions of the parts shown in the drawings are not drawn to actual scale.
The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the present application or its application or use.
Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods, and devices should be considered part of the specification.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further discussed in subsequent drawings.
The embodiments of the present application may be applied to electronic devices such as terminal devices, computer systems, and servers, which can operate with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing environments including any of the above systems, and so on.
Electronic devices such as terminal devices, computer systems, and servers may be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system. Generally, program modules may include routines, programs, target programs, components, logic, data structures, and the like, which perform particular tasks or implement particular abstract data types. The computer system/server may be implemented in a distributed cloud computing environment, in which tasks are performed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media including storage devices.
Referring to FIG. 1, a flowchart of the steps of a key point detection method according to an embodiment of the present application is shown. The key point detection methods of the embodiments of the present application may be performed by any appropriate device with data processing capability, including but not limited to terminal devices and servers. The key point detection method of this embodiment includes the following steps:
Step S102: performing a feature extraction operation on an image to be detected that includes a target object via a neural network.
In the embodiments of the present application, the neural network may be any appropriate neural network capable of feature extraction or target object detection, including but not limited to a convolutional neural network, a reinforcement learning neural network, the generator network of a generative adversarial network, and so on. The settings of the structures in the neural network, such as the number of convolutional layers, the convolution kernel size, and the number of channels, may be set appropriately by those skilled in the art according to actual needs, and are not limited by the embodiments of the present application.
In an optional example, step S102 may be performed by a processor invoking corresponding instructions stored in a memory, or by the first feature extraction module 502 run by the processor.
Through feature extraction by the neural network, feature information of the target object can be obtained; for example, through feature extraction by a convolutional neural network, a feature map containing the feature information is obtained.
Step S104: generating an attention map of the target object according to the extracted feature information.
In the embodiments of the present application, an attention mechanism is introduced into the neural network, and an attention map is generated.
In an optional example, step S104 may be performed by a processor invoking corresponding instructions stored in a memory, or by the first generation module 504 run by the processor.
Human visual attention does not process information uniformly: it automatically processes regions of interest and extracts useful information from them, while leaving uninteresting regions unprocessed, enabling humans to quickly locate targets of interest in complex visual environments. The attention mechanism is a model that uses a computer to simulate human visual attention, extracting from an image the attention-drawing focal points the human eye would observe, that is, the salient regions of the image. Generating the attention map from the feature map extracted by the neural network, on the one hand, makes the salient regions of the image, such as the region where the target object is located, more prominent; on the other hand, compared with processing the original image, it reduces the data processing burden of the attention mechanism.
Step S106: correcting the feature information using the attention map.
In an optional example, step S106 may be performed by a processor invoking corresponding instructions stored in a memory, or by the first correction module 506 run by the processor.
Because the region where the target object is located is relatively salient in the attention map, the attention map can be used to correct the feature information; for example, the attention map is used to correct the feature map, so as to effectively filter out the information of non-target objects and make the information of the target object more prominent.
Step S108: performing key point detection on the target object according to the corrected feature information.
In an optional example, step S108 may be performed by a processor invoking corresponding instructions stored in a memory, or by the detection module 508 run by the processor.
As described above, the corrected feature information makes the feature information of the target object more prominent. On the one hand, the information of non-target objects causes less interference with the recognition and detection of the target object; on the other hand, the feature information of the target object extracted through the attention mechanism has a certain spatial contextual correlation, and the prominent feature information of the target object facilitates comprehensive detection of the key points by the neural network, avoiding missed key points as far as possible. All of the above makes the target object easier to detect and recognize.
According to the key point detection method of this embodiment, an attention mechanism is introduced into the neural network, and an attention map is generated according to the feature information output by the neural network. A neural network with an attention mechanism can focus on the information of the target object; in the generated attention map, the feature information of the target object differs considerably from that of non-target objects. Therefore, correcting the feature map with the attention map, and thereby correcting the features of the target object, makes the feature information of the target object in the image to be detected more prominent and easier to detect and recognize, improves detection accuracy, and reduces false detections and missed detections.
Referring to FIG. 2, a flowchart of the steps of another key point detection method according to an embodiment of the present application is shown. The key point detection method of this embodiment includes the following steps:
Step S202: acquiring an image to be detected that includes a target object.
In the embodiments of the present application, the image to be detected may be a static image, or any one frame of a video.
Step S204: performing a feature extraction operation on the image to be detected via a neural network.
As described in the embodiment shown in FIG. 1 above, any appropriate neural network capable of feature extraction or target object detection may be selected. In this embodiment, a convolutional neural network is selected; optionally, the convolutional neural network may be an HOURGLASS neural network. Compared with other convolutional neural networks, the hourglass neural network can recognize the target object by effectively detecting its key points, and can detect human poses very effectively. A single hourglass neural network adopts a symmetric topology, and usually includes an input layer, convolutional layers, pooling layers, upsampling layers, and the like. The input of the hourglass neural network is a picture, and the output is score maps that allow a judgment for at least one pixel (for example, every pixel). Each score map in the output corresponds to one key point on the target object. For a given key point, the highest-scoring position on its score map represents the detected location of that key point. In the hourglass neural network, the resolution is continually reduced through the POOLING layers to obtain global features; the global features are then enlarged by interpolation and combined with the corresponding-resolution positions in the feature map for the judgment.
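The score-map readout described above (the highest-scoring position on a key point's score map is the detected location of that key point) can be sketched as follows. This is a minimal illustration in NumPy, not the patent's implementation; the function name `keypoint_from_score_map` is introduced here for illustration only.

```python
import numpy as np

def keypoint_from_score_map(score_map):
    """Return the (row, col) of the highest-scoring position in one score map:
    each score map corresponds to one key point of the target object."""
    return np.unravel_index(np.argmax(score_map), score_map.shape)

# toy score map for a single key point, with its peak placed at (2, 3)
score = np.zeros((8, 8))
score[2, 3] = 0.9
loc = tuple(int(v) for v in keypoint_from_score_map(score))
```

In a real network there is one such score map per key point, so this readout is simply repeated over the channel axis of the output tensor.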
Optionally, the neural network may include multiple sub-neural networks stacked end to end, for example multiple convolutional neural networks stacked end to end; optionally, multiple hourglass sub-neural networks stacked end to end may be selected. Compared with a single neural network, multiple end-to-end stacked sub-neural networks can extract features at deeper levels, ensuring that the extracted features are accurate and effective. The scheme is not limited to hourglass sub-neural networks: other neural networks with the same or a similar structure as the hourglass neural network and with a key point detection function are all applicable to the solutions of the embodiments of the present application.
When multiple end-to-end stacked hourglass sub-neural networks are selected as the neural network, one feasible structure is shown in FIG. 3. In FIG. 3, eight hourglass sub-neural networks are stacked together to form an hourglass neural network for key point detection. The eight hourglass sub-neural networks are connected end to end, with the output of each hourglass being the input of the adjacent following hourglass. With this structure, bottom-up and top-down analysis and learning run through the whole model, making the detection of the key points of the target object more accurate. However, those skilled in the art should understand that in practical applications the number of hourglass sub-neural networks may be set appropriately according to actual needs; the embodiments of the present application use eight merely as an example.
When a convolutional neural network is selected as the neural network, a convolution operation is performed on the image to be detected via the convolutional neural network to obtain first feature information of the image to be detected.
In one feasible manner, the convolutional neural network performs feature extraction on the input image to be detected to obtain feature information and generate a feature map. It should be noted, however, that a feature map can be regarded as one form of expression of feature information; in practical applications, operations may be performed directly on the feature information.
Generally, the feature information of the target object output by the last convolutional layer of a convolutional neural network such as the hourglass neural network can be obtained. When the hourglass neural network includes multiple hourglass sub-neural networks, the attention mechanism is introduced into each hourglass sub-neural network, and the feature information (such as a feature map) output by the last convolutional layer of each hourglass sub-neural network is obtained.
In addition, each hourglass sub-neural network usually includes multiple residual modules (Residual Units, RU). Through the residual modules, the hourglass neural network extracts higher-level features of the image while retaining the information of the original level; it does not change the data size, only the data depth, and can be regarded as an advanced convolutional layer that preserves the data size. Moreover, a residual module can combine features of different resolutions, making feature learning more robust.
In this embodiment, among the multiple residual modules in each hourglass sub-neural network, at least one residual module is improved; the improved residual module is called an hourglass residual unit (Hourglass Residual Unit, HRU). Each hourglass includes at least one hourglass residual module, and each hourglass residual module includes a first residual branch, a second residual branch, and a third residual branch. When each hourglass residual module performs a feature extraction operation: via the first residual branch, an identity mapping is performed on the image block input to the current hourglass residual module, obtaining the first feature information contained in the identity-mapped first image block; via the second residual branch, convolution processing is performed on the image region indicated by the convolution kernel size in the input image block, obtaining the second feature information contained in the convolved second image region; via the third residual branch, the input image block is pooled according to the pooling kernel size, the image region in the pooled image block is convolved according to the convolution kernel size, and the convolved image region is upsampled to generate a third image block of the same size as the input image block, obtaining the third feature information of the third image block. The first, second, and third feature information are then merged to obtain the feature information extracted by the current hourglass residual module. This improvement over the conventional residual module enlarges the receptive field of the residual module's output and simplifies the module's learning and detection process. Those skilled in the art should understand, however, that in practical applications a conventional residual module, that is, a residual module provided only with the first residual branch and the second residual branch, is equally applicable to the solutions of the embodiments of the present application.
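The three-branch hourglass residual unit above can be sketched numerically. This is a single-channel toy, assuming a 3×3 mean filter as a stand-in for the learned convolutions, a 2×2 max pool, and nearest-neighbour upsampling; the real module uses learned convolution kernels, and the merge here is a simple sum.

```python
import numpy as np

def mean_filter_3x3(x):
    """Naive 3x3 'same' mean filter standing in for a learned convolution."""
    p = np.pad(x, 1, mode="edge")
    out = np.zeros_like(x)
    h, w = x.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = p[i:i + 3, j:j + 3].mean()
    return out

def hourglass_residual_unit(x):
    """Merge the three branches: identity, convolution, and pool-conv-upsample."""
    branch_a = x                           # first branch: identity mapping
    branch_b = mean_filter_3x3(x)          # second branch: convolution
    h, w = x.shape
    pooled = x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))   # 2x2 max pooling
    branch_c = mean_filter_3x3(pooled).repeat(2, axis=0).repeat(2, axis=1)  # conv + upsample
    return branch_a + branch_b + branch_c  # output has the same size as the input

x = np.random.rand(8, 8)
y = hourglass_residual_unit(x)
```

Because the third branch first pools and only then convolves, each output position aggregates a larger input neighbourhood, which is the enlarged receptive field the text describes.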
A single hourglass sub-neural network may include only multiple hourglass residual modules, or only multiple residual modules, or at least one hourglass residual module together with at least one residual module. In these cases, the output of a preceding hourglass residual module or residual module is the input of the adjacent following one, and the output of the last hourglass residual module or residual module in the hourglass sub-neural network is the output of the current hourglass sub-neural network.
Moreover, if the current hourglass sub-neural network is the first of the multiple sub-neural networks (such as the first hourglass sub-neural network in FIG. 3), its input is the original image to be detected, and a feature extraction operation is performed, via its hourglass residual module and/or residual module, on the input original image that includes the target object; and/or, if the current hourglass sub-neural network is not the first of the multiple sub-neural networks, a feature extraction operation is performed, via its hourglass residual module and/or residual module, on the image output by the preceding adjacent hourglass sub-neural network.
Optionally, to make the feature information extracted by the neural network more accurate, when performing the feature extraction operation on the image to be detected that includes the target object, multiple feature maps of different resolutions output by multiple convolutional layers of the current sub-neural network may be obtained and respectively upsampled to obtain the feature information corresponding to the multiple feature maps.
In an optional example, step S204 may be performed by a processor invoking corresponding instructions stored in a memory, or by the first feature extraction module 602 run by the processor.
Step S206: generating an attention map of the target object according to the extracted feature information.
In one feasible manner, for example when the aforementioned convolution operation is performed on the image to be detected via the convolutional neural network to obtain the first feature information, a non-linear transformation may be performed on the first feature information to obtain second feature information, and the attention map is generated according to the second feature information.
For example, the attention map is generated using the formula s = g(w_α * f + b), where w_α denotes a convolution filter, a matrix of the linear transformation containing network parameters such as the parameters of the hourglass neural network; f denotes a feature output by the neural network, such as the feature finally output by the hourglass neural network (which can be expressed as the feature f of a feature layer); b denotes the bias; and g() denotes the non-linear transformation function (such as ReLU). The feature layer's feature f has multiple channels (for example the three common settings 128, 256, and 512), but the output s has only one channel. Through the non-linear transformation g(), the value of s is controlled between 0 and 1.
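The formula s = g(w_α * f + b) above amounts to a 1×1 convolution that collapses the C channels of the feature layer into one channel, followed by a squashing non-linearity. The sketch below uses a sigmoid for g() as an assumption (it guarantees the 0-1 range the text describes); the patent itself names g() only as a non-linear transformation such as ReLU.

```python
import numpy as np

def attention_map(f, w, b):
    """s = g(w_a * f + b): a 1x1 convolution across channels (tensordot over
    the channel axis) followed by a sigmoid squashing s into (0, 1)."""
    pre = np.tensordot(w, f, axes=([0], [0])) + b   # (H, W): single-channel output
    return 1.0 / (1.0 + np.exp(-pre))               # g(): non-linear squash

C, H, W = 128, 16, 16                  # 128 channels, one of the common settings
f = np.random.randn(C, H, W)           # feature layer f output by the network
w = np.random.randn(C) * 0.01          # w_a: linear-transform weights (1x1 filter)
s = attention_map(f, w, b=0.0)
```

Note that the multi-channel feature layer goes in, and a single-channel map s comes out, exactly as the text states.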
When the hourglass neural network is selected and includes multiple hourglass sub-neural networks, for each hourglass sub-neural network: multiple feature maps of different resolutions output by multiple convolutional layers of the current hourglass sub-neural network can be obtained; the multiple feature maps are respectively upsampled to obtain their corresponding feature information; and multiple attention maps of different resolutions are generated according to that feature information. Feature maps of different resolutions enable multi-level, coarse-to-fine feature extraction.
In an optional example, step S206 may be performed by a processor invoking corresponding instructions stored in a memory, or by the first generation module 604 run by the processor.
Step S208: processing the attention map.
This includes: smoothing the attention map using Conditional Random Fields (CRF); or normalizing the attention map using a normalization function (including but not limited to the SOFTMAX function).
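The SOFTMAX normalization option mentioned above can be sketched as a spatial softmax over the attention map; the CRF smoothing alternative is a learned kernel and is not reproduced here. A minimal NumPy version:

```python
import numpy as np

def softmax_normalize(att):
    """Normalize an attention map with a spatial softmax: after normalization
    all values are positive and sum to 1, while their ordering is preserved."""
    z = att - att.max()     # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

att = np.array([[0.1, 2.0],
                [0.5, 1.0]])
norm = softmax_normalize(att)
```

Because softmax is monotone, the most salient position of the attention map is unchanged by the normalization.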
The conditional random field may be obtained by those skilled in the art in any appropriate manner; its parameters can reflect the spatial contextual information between features, realizing the smoothing of the attention map.
In an optional example, step S208 may be performed by a processor invoking corresponding instructions stored in a memory, or by the first processing module 610 run by the processor.
This step is optional; through it, noise points in the attention map can be removed.
Step S210: correcting the feature information using the attention map.
The attention map carries relatively salient feature information of the target object, so using it to correct the feature information can make the feature information of the target object more salient.
When the neural network includes multiple end-to-end stacked sub-neural networks, such as the aforementioned multiple hourglass sub-neural networks, for each sub-neural network an attention map of the current sub-neural network is generated according to the feature information it extracted, and that feature information is corrected by the attention map of the current sub-neural network. If the current sub-neural network is not the last of the multiple sub-neural networks, its corrected feature information is the input of the adjacent following sub-neural network; and/or, if it is the last, key point detection can be performed on the target object according to its corrected feature information.
When, as described in step S206, multiple feature maps of different resolutions output by multiple convolutional layers of the current sub-neural network are obtained and respectively upsampled to obtain their corresponding feature information, multiple attention maps of different resolutions can be generated accordingly and merged to produce the final attention map of the target object for the current sub-neural network; the final attention map is used to correct the feature map output by the current hourglass, obtaining the corrected feature information. When the hourglass neural network includes multiple hourglass sub-neural networks, each of them performs the above correction process.
Optionally, according to the attention map of the current sub-neural network, the pixel values of at least part of the regions corresponding to non-target objects in the feature map representing the feature information extracted by the current sub-neural network may be set to zero, obtaining the corrected feature information of the current sub-neural network. In this way, a point with value 1 in the attention map does not change the value of the feature information at the corresponding position, while a point with value 0 sets the feature information at the corresponding position to 0, classifying it into the non-target-object region. On the one hand, this makes the target object more prominent; on the other hand, points set to 0 no longer participate in subsequent processing, reducing the data processing burden of key point detection and improving processing efficiency.
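The correction step just described (1 keeps a position, 0 zeroes it out) is a point-wise multiplication of the single-channel attention map against every channel of the feature map. A minimal sketch, assuming a C×H×W feature layout:

```python
import numpy as np

def apply_attention(features, att):
    """Broadcast a single-channel H x W attention map over a C x H x W feature
    map: positions where att == 0 are zeroed, positions where att == 1 are kept."""
    return features * att[np.newaxis, :, :]   # broadcast over the channel axis

features = np.ones((3, 4, 4))                 # toy C x H x W feature map
att = np.zeros((4, 4))
att[1:3, 1:3] = 1.0                           # keep only the central (target) region
out = apply_attention(features, att)
```

The zeroed positions correspond to the non-target regions the text says are classified into the background and excluded from subsequent processing.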
In one feasible manner, if the current sub-neural network is one of the first N sub-neural networks set among the multiple sub-neural networks, the attention map of the current sub-neural network is used to set to zero the pixel values of at least part of the regions corresponding to non-target objects in the feature map representing the extracted feature information, obtaining the feature information of the region where the target object is located; and/or, if the current sub-neural network is not one of the set first N sub-neural networks, a feature extraction operation is performed via the current sub-neural network on the feature map representing the feature information of the region where the target object is located, an attention map of the current sub-neural network is generated according to the extracted feature information, and it is used to set to zero the pixel values of at least part of the regions corresponding to key points of non-target objects in the feature map, obtaining the feature information of the regions corresponding to the key points of the target object. The resolution of the attention maps corresponding to the first N sub-neural networks is lower than that of the attention maps corresponding to the last M-N sub-neural networks, where M denotes the total number of the multiple sub-neural networks, M is an integer greater than 1, and N is an integer greater than 0 and less than M.
For example, when the neural network is composed of multiple hourglass sub-neural networks, when correcting the feature information with the attention map, it may be judged whether the current hourglass sub-neural network is one of the set first N sub-neural networks. If so, the attention map is used to correct the feature map output by the current hourglass sub-neural network, obtaining the feature information of the region where the target object is located; if not, the attention map is used to correct the feature map output by the current hourglass sub-neural network, obtaining the feature information of the key points of the target object. In this manner, the feature information extracted by the stacked hourglass sub-neural networks is differentiated, which can be achieved by adjusting the network parameters. The feature information extracted by the first N hourglass sub-neural networks has a lower resolution, which makes the foreground where the target object is located more prominent and removes, as far as possible, the influence of the background on the subsequent determination of the target object; the feature information extracted by the last M-N hourglass sub-neural networks has a higher resolution and, on the basis of removing the influence of the background, further detects and recognizes the key points of the target object explicitly.
The values of M and N may be set appropriately by those skilled in the art according to actual needs; for example, in an optional example, N may be set to half of M.
In an optional example, step S210 may be performed by a processor invoking corresponding instructions stored in a memory, or by the first correction module 606 run by the processor.
Step S212: performing key point detection on the target object according to the corrected feature information.
In an optional example, step S212 may be performed by a processor invoking corresponding instructions stored in a memory, or by the detection module 608 run by the processor.
Below, the image detection method of the embodiments of the present application is described by taking an instance of human body recognition as an example.
This instance is based on the hourglass neural network: eight hourglass sub-neural networks are stacked together; the initial input is the source picture, and the final output is multiple score maps for judging every pixel in the source picture. Each score map corresponds to one key point on the human body. The highest-scoring position on the score map of key point A represents the detected location of key point A. The hourglass neural network continually reduces the resolution through POOLING layers to obtain global features, then enlarges the global features by interpolation and combines them with the corresponding-resolution positions of the feature map for the judgment.
In this instance, the above network of eight stacked hourglass sub-neural networks is improved: an attention mechanism is introduced after the last convolutional layer of each hourglass sub-neural network, including generating an attention map, smoothing the attention map, and using the attention map to change the values of the input features from the source picture.
Below, the hourglass neural network with the introduced attention mechanism is described taking the improvement of a single hourglass sub-neural network as an example; the other hourglass sub-neural networks can be improved with reference to the following description. The improvement includes:
(1) Generating the attention map.
For example, the attention map is generated using the formula s = g(w_α * f + b).
In the formula, f is the feature in the feature layer output by the last convolutional layer of the current hourglass sub-neural network, w_α is the matrix of the linear transformation (including all network training parameters), b is the bias, and g() is the non-linear transformation function (such as a conditional random field or SOFTMAX). The feature layer contains features of multiple channels (for example the three common settings 128, 256, and 512), but the output s has only one channel; through the non-linear transformation g(), the value of s is controlled between 0 and 1.
(2) Smoothing the attention map.
In this step, one way is to normalize the values in the attention map to between 0 and 1 through the conventional SOFTMAX function; another way is to use a smoothing kernel learned through multiple iterations, that is, to remove the noise points in the attention map through a conditional random field. The conditional random field may be obtained by those skilled in the art in any appropriate manner; its parameters can reflect the spatial contextual information between features, realizing the smoothing of the attention map.
(3) Using the attention map to change the values of the input features of the source image (the values of the features in the feature map).
The attention map is a W*H map with only one channel, whereas the feature layer is a W*H*C tensor, where W denotes width, H denotes height, and C denotes the number of channels. The attention map is copied into C channels and then multiplied point by point onto the feature layer. In this way, a point with value 1 in the attention map does not change the value at the corresponding position of the feature layer, while a point with value 0 sets the corresponding position of the feature layer to 0, classifying it into the background so that it no longer participates in subsequent judgments.
In this instance, feature layers of different resolutions are used, combining the judgments of global features and local detail features. Thus, while interpolating the features of the feature layers, multiple attention maps of different sizes are produced, for example four attention maps of different sizes (8*8, 16*16, 32*32, and 64*64). The different attention maps are resized to a set size, such as 1/4 of the source image size, and overlaid onto the feature map. The 8*8 attention map can cut the whole human body out of the background, whereas in the 64*64 attention map only the key points of the human body are selected. The four attention maps are added and merged, and the merged attention map is then used to change the values of the input features of the source image.
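The merge of the four attention maps described above can be sketched as resizing each map to a common resolution and summing. The sketch assumes nearest-neighbour upsampling and uses 64×64 as the common size; the patent's "set size, such as 1/4 of the source image" may differ, and interpolation is an implementation choice.

```python
import numpy as np

def upsample_to(att, size):
    """Nearest-neighbour upsampling of a square attention map to size x size
    (assumes size is an integer multiple of the map's side length)."""
    factor = size // att.shape[0]
    return att.repeat(factor, axis=0).repeat(factor, axis=1)

# four attention maps from coarse (whole body vs. background) to fine (key points)
maps = [np.random.rand(n, n) for n in (8, 16, 32, 64)]
merged = sum(upsample_to(m, 64) for m in maps)   # add the resized maps together
```

The coarse 8×8 map contributes a body-level mask while the 64×64 map contributes key-point-level detail, so the sum combines the global and local judgments the text describes.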
In addition, the attention mechanism of this instance adopts a coarse-to-fine scheme: on different hourglass sub-neural networks, the attention mechanism attends to different things. In the first four hourglass sub-neural networks, the network is relatively shallow and its ability to distinguish foreground from background is weaker, so in these four the attention mechanism is only used to distinguish foreground from background, producing a rough segmentation. In the last four hourglass sub-neural networks, the network is deeper, has a stronger learning ability and better discriminative power, and the attention mechanism is used to further distinguish the classes of the key points in the foreground (for example, head or hand).
Through the above process, the introduction of the attention mechanism into the hourglass neural network is realized.
On this basis, optionally, this instance uses a new hourglass residual module structure to replace all or part of the residual modules in each hourglass sub-neural network. As shown in FIG. 4, the original residual module has only two branches, branch A (the identity mapping branch) and branch B (the residual branch); this instance adds branch C (the hourglass residual branch). As shown in FIG. 4, branch A mainly performs an identity mapping on the image input to the current hourglass residual module and still outputs that input image; branch B performs 1×1, 3×3, and 1×1 convolutions in sequence on the input image, finally obtaining a 1×1 convolution result; branch C performs a 2×2 pooling, two 3×3 convolutions, and an upsampling in sequence on the input image, finally obtaining an image of the same size as the input. Adding branch C enlarges the receptive field of the residual module's output, so that the judgment is not confined to a small region.
Through this instance: first, introducing the attention mechanism into the hourglass neural network can effectively distinguish the foreground (such as a person) where the target object is located from the background (such as surrounding objects) and then concentrate on detecting the key points of the target object in the foreground, so that occluded parts of the target object are classified into the foreground and can be detected more easily in subsequent detection; second, the key points of the target object are judged by combining feature maps produced by feature layers of different resolutions: attention maps produced from lower-resolution feature maps cover relatively large regions, while those produced from higher-resolution feature maps cover finer points, and combining maps of different resolutions unites global and local judgments, better handling the occlusion of the target object's key points; third, the normalization function in the conventional attention mechanism can be replaced by a conditional random field, removing the noise points in the attention mechanism; fourth, the improved hourglass residual module is used, enlarging the model's receptive field.
Referring to FIG. 5, a flowchart of the steps of a neural network training method according to an embodiment of the present application is shown. The neural network training methods of the embodiments of the present application may be performed by any appropriate device with data processing capability, including but not limited to terminal devices and servers. The neural network training method of this embodiment includes the following steps:
Step S302: performing a feature extraction operation on a training sample image that includes a target object via a neural network.
In an optional example, step S302 may be performed by a processor invoking corresponding instructions stored in a memory, or by the second feature extraction module 702 run by the processor.
In the embodiments of the present application, the neural network may be any appropriate neural network capable of feature extraction and target object key point detection, including but not limited to a convolutional neural network, a reinforcement learning neural network, the generator network of a generative adversarial network, and so on. Optionally, the convolutional neural network may be an hourglass neural network.
Step S304: generating an attention map of the target object according to the extracted feature information.
In an optional example, step S304 may be performed by a processor invoking corresponding instructions stored in a memory, or by the second generation module 704 run by the processor.
Step S306: correcting the feature information using the attention map.
In an optional example, step S306 may be performed by a processor invoking corresponding instructions stored in a memory, or by the second correction module 706 run by the processor.
Step S308: obtaining key point prediction information of the target object according to the corrected feature information.
In an optional example, step S308 may be performed by a processor invoking corresponding instructions stored in a memory, or by the prediction module 708 run by the processor.
Training a neural network such as a convolutional neural network is an iterative process of repeated training and learning. In each pass, the key points of the target object in the image are predicted to obtain key point prediction information. Then, according to the difference between the key point prediction information and the actual annotation information, the network parameters of the convolutional neural network can be adjusted in reverse, so as to achieve a finally accurate prediction. The termination condition of training may be a conventional condition, such as the number of training iterations reaching a set number, which is not limited by the embodiments of the present application.
Step S310: obtaining a difference between the key point prediction information and the key point annotation information in the training sample image.
The manner of obtaining the difference between the key point prediction information and the key point annotation information may be set appropriately by those skilled in the art according to actual needs, including but not limited to the mean squared error, which is not limited by the embodiments of the present application.
In an optional example, step S310 may be performed by a processor invoking corresponding instructions stored in a memory, or by the difference obtaining module 710 run by the processor.
Step S312: adjusting the network parameters of the convolutional neural network according to the difference.
In an optional example, step S312 may be performed by a processor invoking corresponding instructions stored in a memory, or by the adjustment module 712 run by the processor.
Through this embodiment, training of a neural network with an introduced attention mechanism is realized. The trained neural network can use the attention map to correct the feature information of the image to be detected, thereby correcting the features of the image to be detected, making the feature information of the target object in the image more prominent and easier to detect and recognize.
Referring to FIG. 6, a flowchart of the steps of another neural network training method according to an embodiment of the present application is shown. This embodiment takes the training of an hourglass neural network with an introduced attention mechanism as an example; the training of other convolutional neural networks or other neural networks with an attention mechanism may be implemented with reference to this embodiment. The hourglass neural network in this embodiment includes multiple hourglass sub-neural networks. The neural network training method of this embodiment includes the following steps:
Step S402: performing a feature extraction operation on a training sample image that includes a target object via an hourglass sub-neural network.
In an optional example, step S402 may be performed by a processor invoking corresponding instructions stored in a memory, or by the second feature extraction module 802 run by the processor.
In this embodiment, the hourglass neural network includes multiple hourglass sub-neural networks, for example the eight shown in FIG. 3, where the input of the first hourglass sub-neural network is the original training sample image and the input of every other hourglass sub-neural network is the output of its preceding adjacent hourglass sub-neural network.
In one feasible manner, this step may perform a convolution operation on the training sample image via a convolutional neural network to obtain first feature information of the training sample image; for example, the training sample image is convolved via the hourglass sub-neural network to obtain its first feature information.
In this embodiment, the neural network is a convolutional neural network, for example an hourglass neural network that includes multiple hourglass sub-neural networks, where the output of a preceding hourglass sub-neural network serves as the input of the adjacent following one, and each hourglass sub-neural network is trained using the method of the embodiments of the present application.
When the neural network includes multiple end-to-end stacked sub-neural networks, for each sub-neural network, when performing the feature extraction operation on the training sample image that includes the target object, multiple feature maps of different resolutions output by multiple convolutional layers of the current sub-neural network may be obtained and respectively upsampled to obtain the feature information corresponding to the multiple feature maps, making the obtained feature information rich and accurate.
In addition, when the neural network adopts a structure including multiple hourglass sub-neural networks, each hourglass sub-neural network includes at least one hourglass residual module, and each hourglass residual module includes a first residual branch, a second residual branch, and a third residual branch. In this case, the feature extraction operation is performed on the training sample image via each hourglass residual module in each hourglass sub-neural network. Optionally, this includes: performing, via the first residual branch, an identity mapping on the image block input to the current hourglass residual module, obtaining the first feature information contained in the identity-mapped first image block; performing, via the second residual branch, convolution processing on the image region indicated by the convolution kernel size in the input image block, obtaining the second feature information contained in the convolved second image region; pooling, via the third residual branch, the input image block according to the pooling kernel size, convolving the image region in the pooled image block according to the convolution kernel size, and upsampling the convolved image region to generate a third image block of the same size as the input image block, obtaining its third feature information; and merging the first, second, and third feature information to obtain the feature information extracted by the current hourglass residual module. In this way, the receptive field of the residual module's output is enlarged, and its learning and detection process is simplified. Those skilled in the art should understand, however, that in practical applications a conventional residual module, that is, a residual module provided only with the first residual branch and the second residual branch, is equally applicable to the solutions of the embodiments of the present application.
It should also be noted that if the current hourglass sub-neural network is the first of the multiple sub-neural networks, the feature extraction operation is performed, via its hourglass residual module and/or residual module, on the input original image that includes the target object; and/or, if the current hourglass sub-neural network is not the first, the feature extraction operation is performed, via its hourglass residual module and/or residual module, on the image output by the preceding adjacent hourglass sub-neural network.
Below, the training of one hourglass sub-neural network is taken as an example; the training of the other hourglass sub-neural networks can be performed with reference to this embodiment.
In this step, the obtained feature information may be the feature information output by the last convolutional layer of the current hourglass sub-neural network.
Step S404: generating an attention map of the target object according to the extracted feature information.
For example, on the basis of the first feature information obtained in step S402, a non-linear transformation is performed on the first feature information to obtain second feature information, and the attention map of the target object is generated according to the second feature information; for example, it may be generated in the manner of the embodiment shown in FIG. 2 above, which is not repeated here.
In addition, when multiple feature maps of different resolutions output by multiple convolutional layers of the current sub-neural network are obtained and respectively upsampled to obtain their corresponding feature information, multiple attention maps of different resolutions can be generated accordingly and merged to generate the final attention map of the target object for the current sub-neural network.
In an optional example, step S404 may be performed by a processor invoking corresponding instructions stored in a memory, or by the second generation module 804 run by the processor.
Step S406: correcting the feature information using the attention map.
In one feasible manner, before this step, optionally, the attention map may also be smoothed using a conditional random field, or normalized using a normalization function.
When the neural network includes multiple end-to-end stacked sub-neural networks, for each sub-neural network an attention map of the current sub-neural network is generated according to the feature information it extracted, and that feature information is corrected by the attention map of the current sub-neural network. If the current sub-neural network is not the last of the multiple sub-neural networks, its corrected feature information is the input of the adjacent following sub-neural network; and/or, if it is the last, key point detection is performed on the target object according to its corrected feature information.
Optionally, according to the attention map of the current sub-neural network, the pixel values of at least part of the regions corresponding to non-target objects in the feature map representing the extracted feature information may be set to zero, obtaining the corrected feature information of the current sub-neural network.
In an optional example, step S406 may be performed by a processor invoking corresponding instructions stored in a memory, or by the second correction module 806 run by the processor.
Step S408: obtaining key point prediction information of the target object according to the corrected feature information.
In an optional example, step S408 may be performed by a processor invoking corresponding instructions stored in a memory, or by the prediction module 808 run by the processor.
Step S410: obtaining a difference between the key point prediction information and the key point annotation information in the training sample image.
For example, the difference between the key point prediction information and the key point annotation information, such as the mean squared error between the two, is calculated through a loss function.
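The mean-squared-error loss mentioned above can be sketched directly: the predicted score maps are compared element-wise with the annotated ground-truth heatmaps. A minimal NumPy illustration (the heatmap encoding of the annotations is an assumption for this sketch):

```python
import numpy as np

def mse_loss(pred, target):
    """Mean squared error between predicted score maps and annotated heatmaps;
    arrays are shaped (num_keypoints, H, W)."""
    return np.mean((pred - target) ** 2)

pred = np.zeros((2, 4, 4))       # untrained prediction: all zeros
target = np.zeros((2, 4, 4))
target[0, 1, 1] = 1.0            # annotated location of the first key point
loss = mse_loss(pred, target)
```

During training this scalar difference is what is back-propagated to adjust the network parameters in step S412.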
In an optional example, step S410 may be performed by a processor invoking corresponding instructions stored in a memory, or by the difference obtaining module 810 run by the processor.
Step S412: adjusting the network parameters of the current hourglass sub-neural network according to the difference.
In an optional example, step S412 may be performed by a processor invoking corresponding instructions stored in a memory, or by the adjustment module 812 run by the processor.
Through the above steps, the training of a single hourglass sub-neural network is realized. The above training is performed for every hourglass sub-neural network, realizing the training of the whole hourglass network.
In addition, the emphasis of training may differ across hourglass sub-neural networks. For example, taking eight hourglass sub-neural networks stacked into one hourglass neural network as an example: in the first four, the network is relatively shallow and its ability to distinguish foreground from background is weaker, so their training focuses on using the attention mechanism to distinguish foreground from background and produce a rough segmentation; in the last four, the network is deeper, with a stronger learning ability and better discriminative power, so the focus is on using the attention mechanism to further distinguish the classes of the key points in the foreground (for example, head or hand). This differentiation of emphasis can be achieved by those skilled in the art by adjusting the network training parameters.
Furthermore, the residual modules in the hourglass sub-neural networks used for training may also be improved, using the new hourglass residual module structure to replace all or part of the residual modules in each hourglass sub-neural network. As shown in FIG. 4, the original residual module has only two branches, branch A (the identity mapping branch) and branch B (the residual branch); this instance adds branch C (the hourglass residual branch) to enlarge the receptive field of the residual module's output, so that the judgment is not confined to a small region, reducing the difficulty and burden of training the hourglass sub-neural network.
Through this embodiment, training of a neural network with an introduced attention mechanism is realized. The trained neural network can use the attention map to correct the feature information of the image to be detected, thereby correcting the features of the image to be detected, making the feature information of the target object in the image more prominent and easier to detect and recognize.
The neural network training method of this embodiment may be performed by any appropriate device with data processing capability, including but not limited to terminal devices and servers.
The method of any of the above embodiments of the present application may be performed by any appropriate device with data processing capability, including but not limited to terminal devices and servers; or it may be performed by a processor, for example by the processor invoking corresponding instructions stored in a memory to perform the method of any of the above embodiments of the present application. This is not repeated below.
Those of ordinary skill in the art will understand that all or part of the steps for implementing the above method embodiments may be completed by hardware related to program instructions; the aforementioned program may be stored in a computer-readable storage medium and, when executed, performs the steps including those of the above method embodiments; the aforementioned storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
本申请各实施例的关键点检测装置可用于实现前述方法实施例中相应的关键点检测方法,并具有相应的方法实施例的有益效果,在此不再赘述。
参照图7,示出了根据本申请实施例的一种关键点检测装置的结构框图。本实施例的关键点检测装置包括:第一特征提取模块502,用于经神经网络对包括有目标对象的待检测图像进行特征提取操作;第一生成模块504,用于根据提取到的特征信息,生成所述目标对象的注意力图;第一修正模块506,用于使用注意力图修正所述特征信息;检测模块508,用于根据修正后的特征信息,对目标对象进行关 键点检测。
参照图8,示出了根据本申请实施例的另一种关键点检测装置的结构框图。本实施例的关键点检测装置包括:第一特征提取模块602,用于经神经网络对包括有目标对象的待检测图像进行特征提取操作;第一生成模块604,用于根据提取到的特征信息,生成目标对象的注意力图;第一修正模块606,用于使用注意力图修正所述特征信息;检测模块608,用于根据修正后的特征信息,对目标对象进行关键点检测。
可选地,第一特征提取模块602用于经卷积神经网络对所述待检测图像进行卷积操作,获得所述待检测图像的第一特征信息;第一生成模块604用于对第一特征信息进行非线性变换,获得第二特征信息;根据第二特征信息,生成目标对象的注意力图。
可选地,本实施例的关键点检测装置还包括:第一处理模块610,用于在第一修正模块606使用注意力图修正所述特征信息之前,使用条件随机场对注意力图进行平滑化处理;或者,使用归一化函数对注意力图进行归一化处理。
可选地,神经网络包括端对端堆叠的多个子神经网络;针对每一个子神经网络,第一生成模块604根据当前子神经网络提取的特征信息生成当前子神经网络的注意力图,第一修正模块606通过当前子神经网络的注意力图修正当前子神经网络提取的特征信息;如果当前子神经网络为多个子神经网络中的非末个子神经网络,则当前子神经网络修正后的特征信息为相邻的后一子神经网络的输入;和/或,如果当前子神经网络为多个子神经网络中的末个子神经网络,则检测模块608根据当前子神经网络修正后的特征信息,对目标对象进行关键点检测。
可选地,第一修正模块606在通过当前子神经网络的注意力图修正当前子神经网络提取的特征信息时,根据当前子神经网络的注意力图,对表示当前子神经网络提取的特征信息的特征图中至少部分非目标对象对应的区域的像素值置零,获得当前子神经网络修正后的特征信息。
可选地,第一修正模块606在根据当前子神经网络的注意力图,对表示当前子神经网络提取的特征信息的特征图中至少部分非目标对象对应的区域的像素值置零,获得当前子神经网络修正后的特征信息时,如果当前子神经网络是上述多个子神经网络中设定的前N个子神经网络,则使用当前子神经网络的注意力图,对表示当前子神经网络提取的特征信息的特征图中至少部分非目标对象对应的区域的像素值置零,获得目标对象所在的区域的特征信息;和/或,如果当前子神经网络并非上述多个子神经网络中设定的前N个子神经网络,则经当前子神经网络对表示目标对象所在的区域的特征信息的特征图进行特征提取操作,根据提取到的特征信息生成当前子神经网络的注意力图;使用当前子神经网络的注意力图,对表示当前子神经网络提取的特征信息的特征图中至少部分非目标对象的关键点对应的区域的像素值置零,获得目标对象的关键点对应的区域的特征信息;其中,前N个子神经网络对应的注意力图的分辨率,低于后M-N个子神经网络对应的注意力图的分辨率,其中,M表示上述多个子神经网络的总数量,M为大于1的整数,N为大于0的整数且N小于M。
可选地,针对每一个子神经网络,所述第一特征提取模块602获得当前子神经网络的多个卷积层对应输出的不同分辨率的多个特征图,分别对多个特征图进行上采样,获得多个特征图对应的特征信息;第一生成模块604根据多个特征图对应的特征信息,生成对应的多个不同分辨率的注意力图;对多个不同分辨率的注意力图进行合并处理,生成当前子神经网络的最终的目标对象的注意力图。
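上述多分辨率注意力图的上采样与合并过程,可用如下极简 numpy 示意(假设性示例:以最近邻上采样代替真实实现中的上采样层,以取均值示意合并处理,函数名均为本示例虚构,且假定各注意力图边长能整除目标分辨率):

```python
import numpy as np

def upsample_nn(att, factor):
    """最近邻上采样:将 (H, W) 的注意力图放大 factor 倍。"""
    return np.repeat(np.repeat(att, factor, axis=0), factor, axis=1)

def merge_attention_maps(att_maps, target_size):
    """将多个不同分辨率的注意力图统一到目标分辨率后取均值合并,
    生成当前子神经网络最终的目标对象的注意力图。"""
    merged = np.zeros((target_size, target_size))
    for att in att_maps:
        factor = target_size // att.shape[0]
        merged += upsample_nn(att, factor)
    return merged / len(att_maps)
```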
可选地,神经网络为沙漏神经网络。
可选地,沙漏神经网络包括多个沙漏子神经网络,每个沙漏子神经网络包括至少一个沙漏残差模块;每个沙漏残差模块包括第一残差分支、第二残差分支和第三残差分支;其中,第一特征提取模块602在经每个沙漏子神经网络中的每个沙漏残差模块对包括有目标对象的待检测图像进行特征提取操作时,经第一残差分支对输入当前沙漏残差模块的图像块进行恒等映射,获得恒等映射后的第一图像块包含的第一特征信息;经第二残差分支对输入当前沙漏残差模块的图像块中的卷积核大小指示的图像区域进行卷积处理,获得卷积处理后的第二图像区域包含的第二特征信息;经第三残差分支将输入当前沙漏残差模块的图像块按照池化核大小进行池化处理,并按照卷积核大小对池化处理后的图像块中的图像区域进行卷积处理,对卷积处理后的图像区域进行上采样,生成与输入当前沙漏残差模块的图像块大小相同的第三图像块,获得所述第三图像块的第三特征信息;将第一特征信息、第二特征信息和第三特征信息进行合并处理,获得当前沙漏残差模块提取到的特征信息。
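沙漏残差模块三个残差分支的数据流可用如下极简 numpy 示意(假设性示例:以 3×3 均值滤波近似卷积处理、以逐元素相加示意三路特征信息的合并;真实实现中各分支均为可学习的卷积/池化层,此处仅说明“恒等映射 + 卷积 + 池化-卷积-上采样”的结构):

```python
import numpy as np

def identity_branch(x):
    """第一残差分支:对输入图像块进行恒等映射。"""
    return x

def conv_branch(x):
    """第二残差分支的示意:以 3x3 均值滤波(边界补零)近似卷积处理。"""
    h, w = x.shape
    padded = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + 3, j:j + 3].mean()
    return out

def hourglass_branch(x, pool=2):
    """第三残差分支的示意:先按池化核大小做最大池化,再做“卷积”,
    最后最近邻上采样回输入大小,从而扩大感受野。"""
    h, w = x.shape
    pooled = x.reshape(h // pool, pool, w // pool, pool).max(axis=(1, 3))
    conved = conv_branch(pooled)
    return np.repeat(np.repeat(conved, pool, axis=0), pool, axis=1)

def hourglass_residual(x):
    """将三个分支得到的特征信息合并(此处以逐元素相加示意合并处理)。"""
    return identity_branch(x) + conv_branch(x) + hourglass_branch(x)
```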
可选地,第一特征提取模块602在进行特征提取操作时:如果当前沙漏子神经网络为多个子神经网络中的首个子神经网络,则通过当前沙漏子神经网络的沙漏残差模块和/或残差模块,对输入的包括有目标对象的原始待检测图像进行特征提取操作;和/或,如果当前沙漏子神经网络为多个子神经网络中的非首个子神经网络,则通过当前沙漏子神经网络的沙漏残差模块和/或残差模块,对与当前沙漏子神经网络相邻的前一沙漏子神经网络输出的图像进行特征提取操作。
本申请各实施例的神经网络训练装置可用于实现前述方法实施例中相应的神经网络训练方法,并具有相应的方法实施例的有益效果,在此不再赘述。
参照图9,示出了根据本申请实施例的一种神经网络训练装置的结构框图。
本实施例的神经网络训练装置包括:第二特征提取模块702,用于经神经网络对包括目标对象的训练样本图像进行特征提取操作;第二生成模块704,用于根据提取到的特征信息,生成目标对象的注意力图;第二修正模块706,用于使用注意力图修正所述特征信息;预测模块708,用于根据修正后的特征信息,获得目标对象的关键点预测信息;差异获得模块710,用于获得关键点预测信息与训练样本图像中的关键点标注信息之间的差异;调整模块712,用于根据所述差异调整神经网络的网络参数。
参照图10,示出了根据本申请实施例的另一种神经网络训练装置的结构框图。
本实施例的神经网络训练装置包括:第二特征提取模块802,用于经神经网络对包括目标对象的训练样本图像进行特征提取操作;第二生成模块804,用于根据提取到的特征信息,生成目标对象的注意力图;第二修正模块806,用于使用注意力图修正所述特征信息;预测模块808,用于根据修正后的特征信息,获得目标对象的关键点预测信息;差异获得模块810,用于获得关键点预测信息与训练样本图像中的关键点标注信息之间的差异;调整模块812,用于根据所述差异调整神经网络的网络参数。
可选地,第二特征提取模块802用于经卷积神经网络对训练样本图像进行卷积操作,获得训练样本图像的第一特征信息;第二生成模块804用于对第一特征信息进行非线性变换,获得第二特征信息;根据第二特征信息,生成目标对象的注意力图。
可选地,本实施例的神经网络训练装置还包括:第二处理模块814,用于在第二修正模块806使用注意力图修正所述特征信息之前,使用CRF对注意力图进行平滑化处理;或者,使用归一化函数对所述注意力图进行归一化处理。
可选地,神经网络包括端对端堆叠的多个子神经网络;针对每一个子神经网络,第二生成模块804根据当前子神经网络提取的特征信息生成当前子神经网络的注意力图,第二修正模块806通过当前子神经网络的注意力图修正当前子神经网络提取的特征信息;如果当前子神经网络为多个子神经网络中的非末个子神经网络,则当前子神经网络修正后的特征信息为相邻的后一子神经网络的输入;和/或,如果当前子神经网络为多个子神经网络中的末个子神经网络,则预测模块808根据当前子神经网络修正后的特征信息,对目标对象进行关键点预测,获得目标对象的关键点预测信息。
可选地,第二修正模块806在通过当前子神经网络的注意力图修正当前子神经网络提取的特征信息时,根据当前子神经网络的注意力图,对表示当前子神经网络提取的特征信息的特征图中至少部分非目标对象对应的区域的像素值置零,获得当前子神经网络修正后的特征信息。
可选地,针对每一个子神经网络,第二特征提取模块802获得当前子神经网络的多个卷积层对应输出的不同分辨率的多个特征图,分别对多个特征图进行上采样,获得多个特征图对应的特征信息;第二生成模块804根据多个特征图对应的特征信息,生成对应的多个不同分辨率的注意力图;对多个不同分辨率的注意力图进行合并处理,生成当前子神经网络的最终的目标对象的注意力图。
可选地,神经网络为沙漏神经网络。
可选地,沙漏神经网络包括多个沙漏子神经网络,其中,在先沙漏子神经网络的输出作为相邻的在后沙漏子神经网络的输入,每个沙漏子神经网络均采用本实施例的神经网络训练装置进行训练。
可选地,每个沙漏子神经网络包括至少一个沙漏残差模块;每个沙漏残差模块包括第一残差分支、第二残差分支和第三残差分支;其中,第二特征提取模块802在经每个沙漏子神经网络中的每个沙漏残差模块对包括有目标对象的训练样本图像进行特征提取操作时,经第一残差分支对输入当前沙漏残差模块的图像块进行恒等映射,获得恒等映射后的第一图像块包含的第一特征信息;经第二残差分支对输入当前沙漏残差模块的图像块中的卷积核大小指示的图像区域进行卷积处理,获得卷积处理后的第二图像区域包含的第二特征信息;经第三残差分支将输入当前沙漏残差模块的图像块按照池化核大小进行池化处理,并按照卷积核大小对池化处理后的图像块中的图像区域进行卷积处理,对卷积处理后的图像区域进行上采样,生成与输入当前沙漏残差模块的图像块大小相同的第三图像块,获得第三图像块的第三特征信息;将第一特征信息、第二特征信息和第三特征信息进行合并处理,获得当前沙漏残差模块提取到的特征信息。
可选地,第二特征提取模块802在进行特征提取操作时:如果当前沙漏子神经网络为多个子神经网络中的首个子神经网络,则通过当前沙漏子神经网络的沙漏残差模块和/或残差模块,对输入的包括有目标对象的训练样本图像进行特征提取操作;和/或,如果当前沙漏子神经网络为多个子神经网络中的非首个子神经网络,则通过当前沙漏子神经网络的沙漏残差模块和/或残差模块,对与当前沙漏子神经网络相邻的前一沙漏子神经网络输出的图像进行特征提取操作。
另外,本申请实施例还提供了一种电子设备,包括:处理器和存储器。其中,存储器用于存放至少一可执行指令,该可执行指令使处理器执行本申请上述任一实施例的关键点检测方法、或者神经网络训练方法对应的操作。
本申请实施例提供的电子设备,例如可以是移动终端、个人计算机(PC)、平板电脑、服务器等。
下面参考图11,其示出了适于用来实现本申请实施例的终端设备或服务器的电子设备900的结构示意图。如图11所示,电子设备900包括一个或多个第一处理器、第一通信元件等,所述一个或多个第一处理器例如:一个或多个中央处理单元(CPU)901,和/或一个或多个图像处理器(GPU)913等,第一处理器可以根据存储在只读存储器(ROM)902中的可执行指令或者从存储部分908加载到随机访问存储器(RAM)903中的可执行指令而执行各种适当的动作和处理。本实施例中,第一只读存储器902和随机访问存储器903统称为第一存储器。第一通信元件包括通信组件912和/或通信接口909。其中,通信组件912可包括但不限于网卡,所述网卡可包括但不限于IB(Infiniband)网卡,通信接口909包括诸如LAN卡、调制解调器等的网络接口卡的通信接口,通信接口909经由诸如因特网的网络执行通信处理。
第一处理器可与只读存储器902和/或随机访问存储器903通信以执行可执行指令,通过第一通信总线904与通信组件912相连、并经通信组件912与其他目标设备通信,从而完成本申请任一实施例关键点检测方法对应的操作,例如,经神经网络对包括有目标对象的待检测图像进行特征提取操作;根据提取到的特征信息,生成目标对象的注意力图;使用注意力图修正所述特征信息;根据修正后的特征信息,对目标对象进行关键点检测。或者,完成本申请任一实施例神经网络训练方法对应的操作,例如,经神经网络对包括目标对象的训练样本图像进行特征提取;根据提取到的特征信息,生成所述目标对象的注意力图;使用所述注意力图修正所述特征信息;根据修正后的特征信息,获得目标对象的关键点预测信息;获得所述关键点预测信息与所述训练样本图像中的关键点标注信息之间的差异;根据所述差异调整所述神经网络的网络参数。
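上述关键点检测对应的“特征提取 → 生成注意力图 → 修正特征 → 关键点检测”流程,可用如下极简示意串联起来(假设性示例:各步骤以 numpy 运算代替真实的神经网络层,函数名为本示例虚构,仅用于说明数据在各步骤之间的流动):

```python
import numpy as np

def detect_keypoint(image):
    """关键点检测整体流程的示意实现,返回关键点坐标 (行, 列)。"""
    # 1) 特征提取:示意起见,直接以像素值作为特征信息
    features = image.astype(float)
    # 2) 根据特征信息生成注意力图(空间 softmax)
    e = np.exp(features - features.max())
    attention = e / e.sum()
    # 3) 使用注意力图修正特征:注意力低于均值的区域置零
    refined = features * (attention >= attention.mean())
    # 4) 根据修正后的特征信息,取响应最大的位置作为关键点
    return np.unravel_index(np.argmax(refined), refined.shape)
```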
此外,在RAM 903中,还可存储有装置操作所需的各种程序和数据。CPU901或GPU913、ROM902以及RAM903通过第一通信总线904彼此相连。在有RAM903的情况下,ROM902为可选模块。RAM903存储可执行指令,或在运行时向ROM902中写入可执行指令,可执行指令使第一处理器执行上述方法对应的操作。输入/输出(I/O)接口905也连接至第一通信总线904。通信组件912可以集成设置,也可以设置为具有多个子模块(例如多个IB网卡),并链接在通信总线上。
以下部件连接至I/O接口905:包括键盘、鼠标等的输入部分906;包括诸如阴极射线管(CRT)、液晶显示器(LCD)等以及扬声器等的输出部分907;包括硬盘等的存储部分908;以及包括诸如LAN卡、调制解调器等的网络接口卡的通信接口909。驱动器910也根据需要连接至I/O接口905。可拆卸介质911,诸如磁盘、光盘、磁光盘、半导体存储器等等,根据需要安装在驱动器910上,以便于从其上读出的计算机程序根据需要被安装入存储部分908。
需要说明的,如图11所示的架构仅为一种可选实现方式,在可选实践过程中,可根据实际需要对上述图11的部件数量和类型进行选择、删减、增加或替换;在不同功能部件设置上,也可采用分离设置或集成设置等实现方式,例如GPU和CPU可分离设置或者可将GPU集成在CPU上,通信元件可分离设置,也可集成设置在CPU或GPU上,等等。这些可替换的实施方式均落入本申请的保护范围。
特别地,根据本申请实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本申请实施例包括一种计算机程序产品,其包括有形地包含在机器可读介质上的计算机程序,计算机程序包含用于执行流程图所示的方法的程序代码,程序代码可包括对应执行本申请实施例提供的方法步骤对应的指令,例如,经神经网络对包括有目标对象的待检测图像进行特征提取操作;根据提取到的特征信息,生成目标对象的注意力图;使用注意力图修正所述特征信息;根据修正后的特征信息,对目标对象进行关键点检测。在这样的实施例中,该计算机程序可以通过通信元件从网络上被下载和安装,和/或从可拆卸介质911被安装。在该计算机程序被第一处理器执行时,执行本申请实施例的方法中限定的上述功能。
另外,本申请实施例还提供了一种计算机可读存储介质,其上存储有计算机程序,该计算机程序被处理器执行时,实现本申请上述任一实施例的关键点检测方法、或者神经网络训练方法。
另外,本申请实施例还提供了一种计算机程序,包括计算机指令,当所述计算机指令在设备的处理器中运行时,实现本申请上述任一实施例的关键点检测方法、或者神经网络训练方法。
可能以许多方式来实现本申请的方法和装置、设备。例如,可通过软件、硬件、固件或者软件、硬件、固件的任何组合来实现本申请实施例的方法和装置、设备。用于方法的步骤的上述顺序仅是为了进行说明,本申请实施例的方法的步骤不限于以上可选描述的顺序,除非以其它方式特别说明。此外,在一些实施例中,还可将本申请实施为记录在记录介质中的程序,这些程序包括用于实现根据本申请实施例的方法的机器可读指令。因而,本申请还覆盖存储用于执行根据本申请实施例的方法的程序的记录介质。
本申请实施例的描述是为了示例和描述起见而给出的,而并不是无遗漏的或者将本申请限于所公开的形式,很多修改和变化对于本领域的普通技术人员而言是显然的。选择和描述实施例是为了更好说明本申请的原理和实际应用,并且使本领域的普通技术人员能够理解本申请从而设计适于特定用途的带有各种修改的各种实施例。

Claims (43)

  1. 一种关键点检测方法,包括:
    经神经网络对包括有目标对象的待检测图像进行特征提取;
    根据提取到的特征信息,生成所述目标对象的注意力图;
    使用所述注意力图修正所述特征信息;
    根据修正后的特征信息,对所述目标对象进行关键点检测。
  2. 根据权利要求1所述的方法,其中,所述经神经网络对包括有目标对象的待检测图像进行特征提取,包括:经卷积神经网络对所述待检测图像进行卷积操作,获得所述待检测图像的第一特征信息;
    所述根据提取到的特征信息,生成所述目标对象的注意力图,包括:对所述第一特征信息进行非线性变换,获得第二特征信息;根据所述第二特征信息,生成所述目标对象的注意力图。
  3. 根据权利要求1或2所述的方法,其中,在使用所述注意力图修正所述特征信息之前,还包括:
    使用条件随机场对所述注意力图进行平滑化处理;或者,
    使用归一化函数对所述注意力图进行归一化处理。
  4. 根据权利要求1-3任一项所述的方法,其中,所述神经网络包括端对端堆叠的多个子神经网络;
    针对每一个子神经网络,根据当前子神经网络提取的特征信息生成当前子神经网络的注意力图,使用当前子神经网络的注意力图修正当前子神经网络提取的特征信息;
    如果当前子神经网络为所述多个子神经网络中的非末个子神经网络,则当前子神经网络修正后的特征信息为相邻的后一子神经网络的输入;和/或,如果当前子神经网络为所述多个子神经网络中的末个子神经网络,则根据当前子神经网络修正后的特征信息,对所述目标对象进行关键点检测。
  5. 根据权利要求4所述的方法,其中,所述使用当前子神经网络的注意力图修正当前子神经网络提取的特征信息,包括:
    根据当前子神经网络的注意力图,对表示当前子神经网络提取的特征信息的特征图中至少部分非目标对象对应的区域的像素值置零,获得当前子神经网络修正后的特征信息。
  6. 根据权利要求5所述的方法,其中,根据当前子神经网络的注意力图,对表示当前子神经网络提取的特征信息的特征图中至少部分非目标对象对应的区域的像素值置零,获得当前子神经网络修正后的特征信息,包括:
    如果当前子神经网络是所述多个子神经网络中设定的前N个子神经网络,则使用当前子神经网络的注意力图,对表示当前子神经网络提取的特征信息的特征图中至少部分非目标对象对应的区域的像素值置零,获得所述目标对象所在的区域的特征信息;和/或,
    如果当前子神经网络并非所述多个子神经网络中设定的前N个子神经网络,则经当前子神经网络对表示目标对象所在的区域的特征信息的特征图进行特征提取,根据提取到的特征信息生成当前子神经网络的注意力图;使用当前子神经网络的注意力图,对表示当前子神经网络提取的特征信息的特征图中至少部分非目标对象的关键点对应的区域的像素值置零,获得所述目标对象的关键点对应的区域的特征信息;
    其中,所述前N个子神经网络对应的注意力图的分辨率,低于后M-N个子神经网络对应的注意力图的分辨率,其中,M表示所述多个子神经网络的总数量,M为大于1的整数,N为大于0的整数且N小于M。
  7. 根据权利要求4-6任一项所述的方法,其中,针对每一个子神经网络:
    所述经神经网络对包括有目标对象的待检测图像进行特征提取,包括:获得当前子神经网络的多个卷积层对应输出的不同分辨率的多个特征图,分别对多个特征图进行上采样,获得多个特征图对应的特征信息;
    所述根据提取到的特征信息,生成所述目标对象的注意力图,包括:根据所述多个特征图对应的特征信息,生成对应的多个不同分辨率的注意力图;对所述多个不同分辨率的注意力图进行合并处理,生成当前子神经网络的目标对象的注意力图。
  8. 根据权利要求1-7任一项所述的方法,其中,所述神经网络包括:沙漏神经网络。
  9. 根据权利要求8所述的方法,其中,所述沙漏神经网络包括多个沙漏子神经网络,每个沙漏子神经网络包括至少一个沙漏残差模块;
    每个沙漏残差模块包括:第一残差分支、第二残差分支和第三残差分支;
    其中,经每个沙漏子神经网络中的每个沙漏残差模块对包括有目标对象的待检测图像进行特征提取,包括:
    经所述第一残差分支对输入当前沙漏残差模块的图像块进行恒等映射,获得恒等映射后的第一图像块包含的第一特征信息;
    经所述第二残差分支对输入当前沙漏残差模块的图像块中的卷积核大小指示的图像区域进行卷积处理,获得卷积处理后的第二图像区域包含的第二特征信息;
    经所述第三残差分支将输入当前沙漏残差模块的图像块按照池化核大小进行池化处理,并按照卷积核大小对池化处理后的图像块中的图像区域进行卷积处理,对卷积处理后的图像区域进行上采样,生成与输入当前沙漏残差模块的图像块大小相同的第三图像块,获得所述第三图像块的第三特征信息;
    将所述第一特征信息、第二特征信息和第三特征信息进行合并处理,获得当前沙漏残差模块提取到的特征信息。
  10. 根据权利要求9所述的方法,其中,
    如果当前沙漏子神经网络为所述多个子神经网络中的首个子神经网络,则通过当前沙漏子神经网络的沙漏残差模块和/或残差模块,对输入的包括有目标对象的待检测图像进行特征提取;和/或,
    如果当前沙漏子神经网络为所述多个子神经网络中的非首个子神经网络,则通过当前沙漏子神经网络的沙漏残差模块和/或残差模块,对与当前沙漏子神经网络相邻的前一沙漏子神经网络输出的图像进行特征提取。
  11. 一种神经网络训练方法,包括:
    经神经网络对包括目标对象的训练样本图像进行特征提取;
    根据提取到的特征信息,生成所述目标对象的注意力图;
    使用所述注意力图修正所述特征信息;
    根据修正后的特征信息,获得目标对象的关键点预测信息;
    获得所述关键点预测信息与所述训练样本图像中的关键点标注信息之间的差异;
    根据所述差异调整所述神经网络的网络参数。
  12. 根据权利要求11所述的方法,其中,
    所述经神经网络对包括目标对象的训练样本图像进行特征提取,包括:经卷积神经网络对所述训练样本图像进行卷积操作,获得所述训练样本图像的第一特征信息;
    所述根据提取到的特征信息,生成所述目标对象的注意力图,包括:对所述第一特征信息进行非线性变换,获得第二特征信息;根据所述第二特征信息,生成所述目标对象的注意力图。
  13. 根据权利要求11或12所述的方法,其中,在使用所述注意力图修正所述特征信息之前,还包括:
    使用条件随机场对所述注意力图进行平滑化处理;或者,
    使用归一化函数对所述注意力图进行归一化处理。
  14. 根据权利要求11-13任一项所述的方法,其中,所述神经网络包括端对端堆叠的多个子神经网络;
    针对每一个子神经网络,根据当前子神经网络提取的特征信息生成当前子神经网络的注意力图,使用当前子神经网络的注意力图修正当前子神经网络提取的特征信息;
    如果当前子神经网络为所述多个子神经网络中的非末个子神经网络,则当前子神经网络修正后的特征信息为相邻的后一子神经网络的输入;和/或,
    如果当前子神经网络为所述多个子神经网络中的末个子神经网络,则根据当前子神经网络修正后的特征信息,对所述目标对象进行关键点预测,获得目标对象的关键点预测信息。
  15. 根据权利要求14所述的方法,其中,所述使用当前子神经网络的注意力图修正当前子神经网络提取的特征信息,包括:
    根据当前子神经网络的注意力图,对表示当前子神经网络提取的特征信息的特征图中至少部分非目标对象对应的区域的像素值置零,获得当前子神经网络修正后的特征信息。
  16. 根据权利要求14-15任一项所述的方法,其中,针对每一个子神经网络,
    所述经神经网络对包括目标对象的训练样本图像进行特征提取,包括:获得当前子神经网络的多个卷积层对应输出的不同分辨率的多个特征图,分别对多个特征图进行上采样,获得多个特征图对应的特征信息;
    所述根据提取到的特征信息,生成所述目标对象的注意力图,包括:根据多个特征图对应的特征信息,生成对应的多个不同分辨率的注意力图;对多个不同分辨率的注意力图进行合并处理,生成当前子神经网络的目标对象的注意力图。
  17. 根据权利要求11-16任一项所述的方法,其中,所述神经网络为沙漏神经网络。
  18. 根据权利要求17所述的方法,其中,所述沙漏神经网络包括多个沙漏子神经网络,其中,在先沙漏子神经网络的输出作为相邻的在后沙漏子神经网络的输入,每个沙漏子神经网络均采用权利要求11所述的方法进行训练。
  19. 根据权利要求18所述的方法,其中,每个沙漏子神经网络包括至少一个沙漏残差模块;
    每个沙漏残差模块包括:第一残差分支、第二残差分支和第三残差分支;
    其中,经每个沙漏子神经网络中的每个沙漏残差模块对包括有目标对象的训练样本图像进行特征提取,包括:
    经所述第一残差分支对输入当前沙漏残差模块的图像块进行恒等映射,获得恒等映射后的第一图像块包含的第一特征信息;
    经所述第二残差分支对输入当前沙漏残差模块的图像块中的卷积核大小指示的图像区域进行卷积处理,获得卷积处理后的第二图像区域包含的第二特征信息;
    经所述第三残差分支将输入当前沙漏残差模块的图像块按照池化核大小进行池化处理,并按照卷积核大小对池化处理后的图像块中的图像区域进行卷积处理,对卷积处理后的图像区域进行上采样,生成与输入当前沙漏残差模块的图像块大小相同的第三图像块,获得所述第三图像块的第三特征信息;
    将所述第一特征信息、第二特征信息和第三特征信息进行合并处理,获得当前沙漏残差模块提取到的特征信息。
  20. 根据权利要求19所述的方法,其中,
    如果当前沙漏子神经网络为所述多个子神经网络中的首个子神经网络,则通过当前沙漏子神经网络的沙漏残差模块和/或残差模块,对输入的包括有目标对象的训练样本图像进行特征提取;
    和/或,
    如果当前沙漏子神经网络为所述多个子神经网络中的非首个子神经网络,则通过当前沙漏子神经网络的沙漏残差模块和/或残差模块,对与当前沙漏子神经网络相邻的前一沙漏子神经网络输出的图像进行特征提取。
  21. 一种关键点检测装置,包括:
    第一特征提取模块,用于经神经网络对包括有目标对象的待检测图像进行特征提取;
    第一生成模块,用于根据提取到的特征信息,生成所述目标对象的注意力图;
    第一修正模块,用于使用所述注意力图修正所述特征信息;
    检测模块,用于根据修正后的特征信息,对所述目标对象进行关键点检测。
  22. 根据权利要求21所述的装置,其中,
    所述第一特征提取模块,用于经卷积神经网络对所述待检测图像进行卷积操作,获得所述待检测图像的第一特征信息;
    所述第一生成模块,用于对所述第一特征信息进行非线性变换,获得第二特征信息;根据所述第二特征信息,生成所述目标对象的注意力图。
  23. 根据权利要求21或22所述的装置,其中,还包括:
    第一处理模块,用于在所述第一修正模块使用所述注意力图修正所述特征信息之前,使用条件随机场对所述注意力图进行平滑化处理;或者,使用归一化函数对所述注意力图进行归一化处理。
  24. 根据权利要求21-23任一项所述的装置,其中,所述神经网络包括端对端堆叠的多个子神经网络;
    针对每一个子神经网络,所述第一生成模块根据当前子神经网络提取的特征信息生成当前子神经网络的注意力图,所述第一修正模块使用当前子神经网络的注意力图修正当前子神经网络提取的特征信息;
    如果当前子神经网络为所述多个子神经网络中的非末个子神经网络,则当前子神经网络修正后的特征信息为相邻的后一子神经网络的输入;和/或,如果当前子神经网络为所述多个子神经网络中的末个子神经网络,则所述检测模块根据当前子神经网络修正后的特征信息,对所述目标对象进行关键点检测。
  25. 根据权利要求24所述的装置,其中,所述第一修正模块在使用当前子神经网络的注意力图修正当前子神经网络提取的特征信息时,根据当前子神经网络的注意力图,对表示当前子神经网络提取的特征信息的特征图中至少部分非目标对象对应的区域的像素值置零,获得当前子神经网络修正后的特征信息。
  26. 根据权利要求25所述的装置,其中,所述第一修正模块在根据当前子神经网络的注意力图,对表示当前子神经网络提取的特征信息的特征图中至少部分非目标对象对应的区域的像素值置零,获得当前子神经网络修正后的特征信息时,
    如果当前子神经网络是所述多个子神经网络中设定的前N个子神经网络,则使用当前子神经网络的注意力图,对表示当前子神经网络提取的特征信息的特征图中至少部分非目标对象对应的区域的像素值置零,获得所述目标对象所在的区域的特征信息;和/或,
    如果当前子神经网络并非所述多个子神经网络中设定的前N个子神经网络,则经当前子神经网络对表示目标对象所在的区域的特征信息的特征图进行特征提取,根据提取到的特征信息生成当前子神经网络的注意力图;使用当前子神经网络的注意力图,对表示当前子神经网络提取的特征信息的特征图中至少部分非目标对象的关键点对应的区域的像素值置零,获得所述目标对象的关键点对应的区域的特征信息;
    其中,所述前N个子神经网络对应的注意力图的分辨率,低于后M-N个子神经网络对应的注意力图的分辨率,其中,M表示所述多个子神经网络的总数量,M为大于1的整数,N为大于0的整数且N小于M。
  27. 根据权利要求24-26任一项所述的装置,其中,针对每一个子神经网络:
    所述第一特征提取模块获得当前子神经网络的多个卷积层对应输出的不同分辨率的多个特征图,分别对多个特征图进行上采样,获得多个特征图对应的特征信息;
    所述第一生成模块根据多个特征图对应的特征信息,生成对应的多个不同分辨率的注意力图;对多个不同分辨率的注意力图进行合并处理,生成当前子神经网络的目标对象的注意力图。
  28. 根据权利要求21-27任一项所述的装置,其中,所述神经网络包括:沙漏神经网络。
  29. 根据权利要求28所述的装置,其中,所述沙漏神经网络包括多个沙漏子神经网络,每个沙漏子神经网络包括至少一个沙漏残差模块;
    每个沙漏残差模块包括:第一残差分支、第二残差分支和第三残差分支;
    其中,所述第一特征提取模块在经每个沙漏子神经网络中的每个沙漏残差模块对包括有目标对象的待检测图像进行特征提取时,
    经所述第一残差分支对输入当前沙漏残差模块的图像块进行恒等映射,获得恒等映射后的第一图像块包含的第一特征信息;
    经所述第二残差分支对输入当前沙漏残差模块的图像块中的卷积核大小指示的图像区域进行卷积处理,获得卷积处理后的第二图像区域包含的第二特征信息;
    经所述第三残差分支将输入当前沙漏残差模块的图像块按照池化核大小进行池化处理,并按照卷积核大小对池化处理后的图像块中的图像区域进行卷积处理,对卷积处理后的图像区域进行上采样,生成与输入当前沙漏残差模块的图像块大小相同的第三图像块,获得所述第三图像块的第三特征信息;
    将所述第一特征信息、第二特征信息和第三特征信息进行合并处理,获得当前沙漏残差模块提取到的特征信息。
  30. 根据权利要求29所述的装置,其中,第一特征提取模块在进行特征提取操作时:
    如果当前沙漏子神经网络为所述多个子神经网络中的首个子神经网络,则通过当前沙漏子神经网络的沙漏残差模块和/或残差模块,对输入的包括有目标对象的原始待检测图像进行特征提取;和/或,
    如果当前沙漏子神经网络为所述多个子神经网络中的非首个子神经网络,则通过当前沙漏子神经网络的沙漏残差模块和/或残差模块,对与当前沙漏子神经网络相邻的前一沙漏子神经网络输出的图像进行特征提取。
  31. 一种神经网络训练装置,包括:
    第二特征提取模块,用于经神经网络对包括目标对象的训练样本图像进行特征提取;
    第二生成模块,用于根据提取到的特征信息,生成所述目标对象的注意力图;
    第二修正模块,用于使用所述注意力图修正所述特征信息;
    预测模块,用于根据修正后的特征信息,获得目标对象的关键点预测信息;
    差异获得模块,用于获得所述关键点预测信息与所述训练样本图像中的关键点标注信息之间的差异;
    调整模块,用于根据所述差异调整所述神经网络的网络参数。
  32. 根据权利要求31所述的装置,其中,所述第二特征提取模块,用于经卷积神经网络对所述训练样本图像进行卷积操作,获得所述训练样本图像的第一特征信息;
    所述第二生成模块,用于对所述第一特征信息进行非线性变换,获得第二特征信息;根据所述第二特征信息,生成所述目标对象的注意力图。
  33. 根据权利要求31或32所述的装置,其中,还包括:
    第二处理模块,用于在所述第二修正模块使用所述注意力图修正所述特征信息之前,使用条件随机场对所述注意力图进行平滑化处理;或者,使用归一化函数对所述注意力图进行归一化处理。
  34. 根据权利要求31-33任一项所述的装置,其中,所述神经网络包括端对端堆叠的多个子神经网络;
    针对每一个子神经网络,所述第二生成模块根据当前子神经网络提取的特征信息生成当前子神经网络的注意力图,所述第二修正模块使用当前子神经网络的注意力图修正当前子神经网络提取的特征信息;
    如果当前子神经网络为所述多个子神经网络中的非末个子神经网络,则当前子神经网络修正后的特征信息为相邻的后一子神经网络的输入;和/或,
    如果当前子神经网络为所述多个子神经网络中的末个子神经网络,则所述预测模块根据当前子神经网络修正后的特征信息,对所述目标对象进行关键点预测,获得目标对象的关键点预测信息。
  35. 根据权利要求34所述的装置,其中,所述第二修正模块在使用当前子神经网络的注意力图修正当前子神经网络提取的特征信息时,根据当前子神经网络的注意力图,对表示当前子神经网络提取的特征信息的特征图中至少部分非目标对象对应的区域的像素值置零,获得当前子神经网络修正后的特征信息。
  36. 根据权利要求34-35任一项所述的装置,其中,针对每一个子神经网络,
    所述第二特征提取模块获得当前子神经网络的多个卷积层对应输出的不同分辨率的多个特征图,分别对多个特征图进行上采样,获得多个特征图对应的特征信息;
    所述第二生成模块根据多个特征图对应的特征信息,生成对应的多个不同分辨率的注意力图;对多个不同分辨率的注意力图进行合并处理,生成当前子神经网络的目标对象的注意力图。
  37. 根据权利要求31-36任一项所述的装置,其中,所述神经网络为沙漏神经网络。
  38. 根据权利要求37所述的装置,其中,所述沙漏神经网络包括多个沙漏子神经网络,其中,在先沙漏子神经网络的输出作为相邻的在后沙漏子神经网络的输入,每个沙漏子神经网络均采用权利要求31所述的装置进行训练。
  39. 根据权利要求38所述的装置,其中,每个沙漏子神经网络包括至少一个沙漏残差模块;
    每个沙漏残差模块包括:第一残差分支、第二残差分支和第三残差分支;
    其中,所述第二特征提取模块在经每个沙漏子神经网络中的每个沙漏残差模块对包括有目标对象的训练样本图像进行特征提取时,
    经所述第一残差分支对输入当前沙漏残差模块的图像块进行恒等映射,获得恒等映射后的第一图像块包含的第一特征信息;
    经所述第二残差分支对输入当前沙漏残差模块的图像块中的卷积核大小指示的图像区域进行卷积处理,获得卷积处理后的第二图像区域包含的第二特征信息;
    经所述第三残差分支将输入当前沙漏残差模块的图像块按照池化核大小进行池化处理,并按照卷积核大小对池化处理后的图像块中的图像区域进行卷积处理,对卷积处理后的图像区域进行上采样,生成与输入当前沙漏残差模块的图像块大小相同的第三图像块,获得所述第三图像块的第三特征信息;
    将所述第一特征信息、第二特征信息和第三特征信息进行合并处理,获得当前沙漏残差模块提取到的特征信息。
  40. 根据权利要求39所述的装置,其中,第二特征提取模块在进行特征提取操作时:
    如果当前沙漏子神经网络为所述多个子神经网络中的首个子神经网络,则通过当前沙漏子神经网络的沙漏残差模块和/或残差模块,对输入的包括有目标对象的训练样本图像进行特征提取;和/或,
    如果当前沙漏子神经网络为所述多个子神经网络中的非首个子神经网络,则通过当前沙漏子神经网络的沙漏残差模块和/或残差模块,对与当前沙漏子神经网络相邻的前一沙漏子神经网络输出的图像进行特征提取。
  41. 一种电子设备,包括:处理器和存储器;
    所述存储器用于存放至少一可执行指令,所述可执行指令使所述处理器执行如权利要求1-20任一项所述方法对应的操作。
  42. 一种计算机可读存储介质,其上存储有计算机程序,该计算机程序被处理器执行时,实现上述权利要求1-20任一项所述的方法。
  43. 一种计算机程序,包括计算机指令,当所述计算机指令在设备的处理器中运行时,实现上述权利要求1-20任一项所述的方法。
PCT/CN2018/076689 2017-02-23 2018-02-13 关键点检测方法、神经网络训练方法、装置和电子设备 WO2018153322A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710100498.2 2017-02-23
CN201710100498.2A CN108229490B (zh) 2017-02-23 2017-02-23 关键点检测方法、神经网络训练方法、装置和电子设备

Publications (1)

Publication Number Publication Date
WO2018153322A1 true WO2018153322A1 (zh) 2018-08-30

Family

ID=62656500

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/076689 WO2018153322A1 (zh) 2017-02-23 2018-02-13 关键点检测方法、神经网络训练方法、装置和电子设备

Country Status (2)

Country Link
CN (1) CN108229490B (zh)
WO (1) WO2018153322A1 (zh)

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635630A (zh) * 2018-10-23 2019-04-16 百度在线网络技术(北京)有限公司 手部关节点检测方法、装置及存储介质
CN109657482A (zh) * 2018-10-26 2019-04-19 阿里巴巴集团控股有限公司 一种数据有效性的验证方法和装置
CN109685246A (zh) * 2018-11-13 2019-04-26 平安科技(深圳)有限公司 环境数据预估方法、装置及存储介质、服务器
CN110110689A (zh) * 2019-05-15 2019-08-09 东北大学 一种行人重识别方法
CN110148212A (zh) * 2019-05-17 2019-08-20 北京市商汤科技开发有限公司 一种动作序列生成方法及装置、电子设备和存储介质
CN110222718A (zh) * 2019-05-09 2019-09-10 华为技术有限公司 图像处理的方法及装置
CN110287846A (zh) * 2019-06-19 2019-09-27 南京云智控产业技术研究院有限公司 一种基于注意力机制的人脸关键点检测方法
CN111008929A (zh) * 2019-12-19 2020-04-14 维沃移动通信(杭州)有限公司 图像矫正方法及电子设备
CN111144168A (zh) * 2018-11-02 2020-05-12 阿里巴巴集团控股有限公司 农作物生长周期的识别方法、设备以及系统
CN111191486A (zh) * 2018-11-14 2020-05-22 杭州海康威视数字技术股份有限公司 一种溺水行为识别方法、监控相机及监控系统
CN111210432A (zh) * 2020-01-12 2020-05-29 湘潭大学 一种基于多尺度多级注意力机制的图像语义分割方法
CN111353349A (zh) * 2018-12-24 2020-06-30 杭州海康威视数字技术股份有限公司 人体关键点检测方法、装置、电子设备及存储介质
CN111680722A (zh) * 2020-05-25 2020-09-18 腾讯科技(深圳)有限公司 内容识别方法、装置、设备及可读存储介质
CN111815606A (zh) * 2020-07-09 2020-10-23 浙江大华技术股份有限公司 图像质量评估方法、存储介质及计算装置
CN111860652A (zh) * 2020-07-22 2020-10-30 中国平安财产保险股份有限公司 基于图像检测的动物体重测量方法、装置、设备及介质
CN112164109A (zh) * 2020-07-08 2021-01-01 浙江大华技术股份有限公司 坐标修正方法、装置、存储介质及电子装置
CN112183826A (zh) * 2020-09-15 2021-01-05 湖北大学 基于深度级联生成对抗网络的建筑能耗预测方法及相关产品
CN112183269A (zh) * 2020-09-18 2021-01-05 哈尔滨工业大学(深圳) 一种适用于智能视频监控的目标检测方法与系统
CN112257567A (zh) * 2020-10-20 2021-01-22 浙江大华技术股份有限公司 行为识别网络的训练、行为识别方法及相关设备
CN112712061A (zh) * 2021-01-18 2021-04-27 清华大学 适用于多方向交警指挥手势的识别方法、系统及存储介质
CN112990046A (zh) * 2021-03-25 2021-06-18 北京百度网讯科技有限公司 差异信息获取方法、相关装置及计算机程序产品
CN113052175A (zh) * 2021-03-26 2021-06-29 北京百度网讯科技有限公司 目标检测方法、装置、电子设备及可读存储介质
CN113140005A (zh) * 2021-04-29 2021-07-20 上海商汤科技开发有限公司 目标对象定位方法、装置、设备及存储介质
CN113569798A (zh) * 2018-11-16 2021-10-29 北京市商汤科技开发有限公司 关键点检测方法及装置、电子设备和存储介质
CN109685246B (zh) * 2018-11-13 2024-04-23 平安科技(深圳)有限公司 环境数据预估方法、装置及存储介质、服务器

Families Citing this family (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110751162B (zh) * 2018-07-24 2023-04-07 杭州海康威视数字技术股份有限公司 一种图像识别方法、装置和计算机设备
CN109190467A (zh) * 2018-07-26 2019-01-11 北京纵目安驰智能科技有限公司 一种基于关键点回归的多物体检测方法、系统、终端和存储介质
CN109271842A (zh) * 2018-07-26 2019-01-25 北京纵目安驰智能科技有限公司 一种基于关键点回归的通用物体检测方法、系统、终端和存储介质
CN109376571B (zh) * 2018-08-03 2022-04-08 西安电子科技大学 基于变形卷积的人体姿态估计方法
CN108960212A (zh) * 2018-08-13 2018-12-07 电子科技大学 基于端到端的人体关节点检测与分类方法
CN109145816B (zh) * 2018-08-21 2021-01-26 北京京东尚科信息技术有限公司 商品识别方法和系统
CN109191255B (zh) * 2018-09-04 2022-04-15 中山大学 一种基于无监督特征点检测的商品对齐方法
CN109308459B (zh) * 2018-09-05 2022-06-24 南京大学 基于手指注意力模型和关键点拓扑模型的手势估计方法
CN109257622A (zh) * 2018-11-01 2019-01-22 广州市百果园信息技术有限公司 一种音视频处理方法、装置、设备及介质
CN109670397B (zh) 2018-11-07 2020-10-30 北京达佳互联信息技术有限公司 人体骨骼关键点的检测方法、装置、电子设备及存储介质
CN109635926B (zh) * 2018-11-30 2021-11-05 深圳市商汤科技有限公司 用于神经网络的注意力特征获取方法、装置及存储介质
CN109726659A (zh) * 2018-12-21 2019-05-07 北京达佳互联信息技术有限公司 人体骨骼关键点的检测方法、装置、电子设备和可读介质
CN109829391B (zh) * 2019-01-10 2023-04-07 哈尔滨工业大学 基于级联卷积网络和对抗学习的显著性目标检测方法
CN111626082A (zh) * 2019-02-28 2020-09-04 佳能株式会社 检测装置和方法及图像处理装置和系统
CN109934183B (zh) * 2019-03-18 2021-09-14 北京市商汤科技开发有限公司 图像处理方法及装置、检测设备及存储介质
CN110084161B (zh) * 2019-04-17 2023-04-18 中山大学 一种人体骨骼关键点的快速检测方法及系统
CN110084180A (zh) * 2019-04-24 2019-08-02 北京达佳互联信息技术有限公司 关键点检测方法、装置、电子设备及可读存储介质
US11282180B1 (en) 2019-04-24 2022-03-22 Apple Inc. Object detection with position, pose, and shape estimation
CN110426112B (zh) * 2019-07-04 2022-05-13 平安科技(深圳)有限公司 一种生猪体重测量方法及装置
CN110648291B (zh) * 2019-09-10 2023-03-03 武汉科技大学 一种基于深度学习的无人机运动模糊图像的复原方法
CN111079749B (zh) * 2019-12-12 2023-12-22 创新奇智(重庆)科技有限公司 一种带姿态校正的端到端商品价签文字识别方法和系统
CN111445440B (zh) * 2020-02-20 2023-10-31 上海联影智能医疗科技有限公司 一种医学图像分析方法、设备和存储介质
CN111368685B (zh) * 2020-02-27 2023-09-29 北京字节跳动网络技术有限公司 关键点的识别方法、装置、可读介质和电子设备
CN111523480B (zh) * 2020-04-24 2021-06-18 北京嘀嘀无限科技发展有限公司 一种面部遮挡物的检测方法、装置、电子设备及存储介质
CN111652244A (zh) * 2020-04-27 2020-09-11 合肥中科类脑智能技术有限公司 一种基于无监督特征提取和匹配的指针式表计识别方法
CN113689527B (zh) * 2020-05-15 2024-02-20 武汉Tcl集团工业研究院有限公司 一种人脸转换模型的训练方法、人脸图像转换方法
CN112259119B (zh) * 2020-10-19 2021-11-16 深圳市策慧科技有限公司 基于堆叠沙漏网络的音乐源分离方法
CN112287855A (zh) * 2020-11-02 2021-01-29 东软睿驰汽车技术(沈阳)有限公司 基于多任务神经网络的驾驶行为检测方法和装置
CN112668430A (zh) * 2020-12-21 2021-04-16 四川长虹电器股份有限公司 一种吸烟行为检测方法、系统、计算机设备、存储介质
CN113361540A (zh) * 2021-05-25 2021-09-07 商汤集团有限公司 图像处理方法及装置、电子设备和存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110249886A1 (en) * 2010-04-12 2011-10-13 Samsung Electronics Co., Ltd. Image converting device and three-dimensional image display device including the same
CN103198316A (zh) * 2011-12-12 2013-07-10 佳能株式会社 用于识别图像中的干扰元素的方法、装置和系统
CN106203376A (zh) * 2016-07-19 2016-12-07 北京旷视科技有限公司 人脸关键点定位方法及装置
CN106295547A (zh) * 2016-08-05 2017-01-04 深圳市商汤科技有限公司 一种图像比对方法及图像比对装置

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20140001358A (ko) * 2012-06-26 2014-01-07 한국전자통신연구원 차폐 영역 필터링 기반 영상 처리 방법
CN103345763B (zh) * 2013-06-25 2016-06-01 西安理工大学 一种基于多尺度可变块的运动注意力计算方法


Also Published As

Publication number Publication date
CN108229490A (zh) 2018-06-29
CN108229490B (zh) 2021-01-05


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18756582

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 05.12.2019)

122 Ep: pct application non-entry in european phase

Ref document number: 18756582

Country of ref document: EP

Kind code of ref document: A1