WO2019100886A1 - Method, apparatus, medium and device for determining a bounding box of a target object - Google Patents

Method, apparatus, medium and device for determining a bounding box of a target object

Info

Publication number
WO2019100886A1
Authority
WO
WIPO (PCT)
Prior art keywords
attribute information
key points
target object
key point
information
Prior art date
Application number
PCT/CN2018/111464
Other languages
English (en)
French (fr)
Inventor
李步宇
李全全
闫俊杰
Original Assignee
北京市商汤科技开发有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司
Priority to JP2019572712A (granted as JP6872044B2)
Priority to SG11201913529UA
Publication of WO2019100886A1
Priority to US16/731,858 (granted as US11348275B2)

Classifications

    • G06V 40/103: Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/045: Combinations of networks
    • G06N 3/048: Activation functions
    • G06N 3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06T 7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06V 10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/462: Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V 10/764: Recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V 10/82: Recognition or understanding using neural networks
    • G06T 2207/20076: Probabilistic image processing
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]
    • G06T 2207/30196: Human being; Person
    • G06T 2210/12: Bounding box

Definitions

  • the present application relates to computer vision technology, and more particularly to a method, apparatus, electronic device and computer readable storage medium for determining a bounding box of a target object.
  • At present, a Faster-RCNN (Faster Region-based Convolutional Neural Network) is typically used to determine the bounding box of a human body: an RPN (Region Proposal Network) first obtains multiple candidate regions, and the RCNN is then used to score and correct each candidate region to determine the bounding box of the human body.
  • However, the accuracy of determining the bounding box of the human body needs to be further improved.
  • Embodiments of the present application provide a technical solution for determining a bounding box of a target object.
  • a method for determining a bounding box of a target object includes: acquiring attribute information of each of a plurality of key points of the target object; and determining the position of the bounding box of the target object according to the attribute information of each of the plurality of key points and a preset neural network.
  • the target object includes: a human body.
  • the attribute information of the key point includes: coordinate information and a presence discriminant value.
  • determining the position of the bounding box of the target object according to the attribute information of each key point of the target object and the preset neural network includes: determining at least one valid key point from the plurality of key points according to the attribute information of each of the plurality of key points; processing the attribute information of the plurality of key points according to the attribute information of each of the at least one valid key point, to obtain processed attribute information of the plurality of key points; and inputting the processed attribute information of the plurality of key points into the preset neural network for processing, to obtain the position of the bounding box of the target object.
  • the processed attribute information of the plurality of key points includes: the processed attribute information of each of the at least one valid key point, and the attribute information of the other key points among the plurality of key points other than the at least one valid key point.
  • processing the attribute information of the plurality of key points according to the attribute information of each of the at least one valid key point, to obtain the processed attribute information of the plurality of key points, includes: determining a reference coordinate according to the coordinate information included in the attribute information of each of the at least one valid key point; and determining the coordinate information in the processed attribute information of each valid key point according to the reference coordinate and the coordinate information in the attribute information of that valid key point.
  • determining the reference coordinate according to the coordinate information included in the attribute information of each of the at least one valid key point includes: averaging the coordinates corresponding to the coordinate information of each of the at least one valid key point, to obtain the reference coordinate; and/or, determining the coordinate information in the processed attribute information of each valid key point according to the reference coordinate and the coordinate information in the attribute information of that valid key point includes: taking the reference coordinate as the origin, determining the processed coordinate information corresponding to the coordinate information of each of the at least one valid key point.
  • inputting the processed attribute information of the plurality of key points into the preset neural network for processing, to obtain the position of the bounding box of the target object, includes: inputting the processed attribute information of the plurality of key points into the preset neural network for processing, to obtain output position information; and determining the position of the bounding box of the target object according to the reference coordinate and the output position information.
  • the method further includes: acquiring a sample set including a plurality of pieces of sample data, where each piece of sample data includes attribute information of a plurality of key points of a sample object, and the sample data is labeled with the position of the bounding box of the sample object;
  • the neural network is trained according to the attribute information of the plurality of key points of the sample object in each piece of sample data and the position of the bounding box of the sample object.
  • the neural network is trained based on a stochastic gradient descent algorithm.
  • the position of the bounding box of the target object includes: coordinate information of two diagonally opposite vertices of the bounding box of the target object.
  • the neural network includes: at least two fully connected layers.
  • the neural network includes: three fully connected layers, where the activation function of at least one of the first and second fully connected layers of the three fully connected layers includes: a rectified linear unit (ReLU) activation function.
  • the first fully connected layer includes 320 neurons, the second fully connected layer includes 320 neurons, and the last of the three fully connected layers includes 4 neurons.
  • an apparatus for determining a bounding box of a target object includes: an acquiring module, configured to acquire attribute information of each of a plurality of key points of the target object; and a determining module, configured to determine the position of the bounding box of the target object according to the attribute information of each of the plurality of key points of the target object acquired by the acquiring module and a preset neural network.
  • the target object includes: a human body.
  • the attribute information of the key point includes: coordinate information and a presence discriminant value.
  • the determining module includes: a first submodule, configured to determine at least one valid key point from the plurality of key points according to the attribute information of each of the plurality of key points acquired by the acquiring module; a second submodule, configured to process the attribute information of the plurality of key points according to the attribute information of each of the at least one valid key point determined by the first submodule, to obtain processed attribute information of the plurality of key points; and a third submodule, configured to input the processed attribute information of the plurality of key points obtained by the second submodule into the preset neural network for processing, to obtain the position of the bounding box of the target object.
  • the processed attribute information of the plurality of key points includes: the processed attribute information of each of the at least one valid key point, and the attribute information of the other key points among the plurality of key points other than the at least one valid key point.
  • the second submodule includes: a first unit, configured to determine a reference coordinate according to the coordinate information included in the attribute information of each of the at least one valid key point determined by the first submodule; and a second unit, configured to determine the coordinate information in the processed attribute information of each valid key point according to the reference coordinate determined by the first unit and the coordinate information in the attribute information of that valid key point.
  • the first unit is configured to: average the coordinates corresponding to the coordinate information of each of the at least one valid key point determined by the first submodule, to obtain the reference coordinate; and/or the second unit is configured to: take the reference coordinate determined by the first unit as the origin, and determine the processed coordinate information corresponding to the coordinate information of each of the at least one valid key point.
  • the third submodule is configured to: input the processed attribute information of the plurality of key points obtained by the second submodule into the preset neural network for processing, to obtain output position information; and determine the position of the bounding box of the target object according to the reference coordinate and the output position information.
  • the apparatus further includes: a training module, configured to: acquire a sample set including a plurality of pieces of sample data, where each piece of sample data includes attribute information of a plurality of key points of a sample object and is labeled with the position of the bounding box of the sample object; and train the neural network according to the attribute information of the plurality of key points of the sample object in each piece of sample data and the position of the bounding box of the sample object.
  • the neural network is trained based on a stochastic gradient descent algorithm.
  • the position of the bounding box of the target object includes: coordinate information of two diagonally opposite vertices of the bounding box of the target object.
  • the neural network comprises: at least two fully connected layers.
  • the neural network includes: three fully connected layers, where the activation function of at least one of the first and second fully connected layers of the three fully connected layers includes: a rectified linear unit (ReLU) activation function.
  • the first fully connected layer includes 320 neurons, the second fully connected layer includes 320 neurons, and the last of the three fully connected layers includes 4 neurons.
  • an electronic device includes: a processor and a computer readable storage medium for storing instructions, where execution of the instructions by the processor causes the electronic device to perform any embodiment of the method described above.
  • a computer program product includes at least one instruction which, when executed by a processor, causes any embodiment of the above method to be performed.
  • in an optional embodiment, the computer program product is a computer storage medium; in another optional embodiment, the computer program product is a software product, such as an SDK.
  • the methods and apparatuses, electronic devices, and computer program products for determining a bounding box of a target object provided by the above embodiments of the present application determine the position of the bounding box of the target object by using the attribute information of each of a plurality of key points of the target object together with a neural network, which helps to improve the efficiency and accuracy of determining the bounding box of the target object.
  • FIG. 1 is a flow chart of a method for determining a bounding box of a target object in some embodiments of the present application
  • FIG. 2 is a flow chart of a method for training a neural network in some embodiments of the present application
  • FIG. 3 is a schematic structural diagram of an apparatus for determining a bounding box of a target object in some embodiments of the present application
  • FIG. 4 is a schematic structural diagram of an electronic device in some embodiments of the present application.
  • FIG. 5 is a schematic diagram of a computer storage medium in some embodiments of the present application.
  • Embodiments of the present application can be applied to electronic devices such as terminal devices, computer systems, servers, etc., which can operate with numerous other general purpose or special purpose computing system environments or configurations.
  • Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing environments including any of the above, and the like.
  • Electronic devices such as terminal devices, computer systems, servers, etc., can be described in the general context of computer system executable instructions (such as program modules) being executed by a computer system.
  • program modules may include routines, programs, target programs, components, logic, data structures, and the like that perform particular tasks or implement particular abstract data types.
  • the computer system/server can be implemented in a distributed cloud computing environment where tasks are performed by remote processing devices that are linked through a communication network.
  • program modules may be located on a local or remote computing system storage medium including storage devices.
  • FIG. 1 is a flow chart of a method for determining a bounding box of a target object in some embodiments of the present application. As shown in FIG. 1, the method for determining a bounding box of a target object of the present application includes: S100 and S110. The respective operations in Fig. 1 will be described below.
  • the target object in the embodiments of the present application may also be referred to as a detection object or a bounding box detection object, and the like.
  • the target object may be a human body, or may be a human face or a specific object.
  • the embodiment of the present application does not limit the representation form of the target object.
  • the bounding box in the embodiments of the present application generally refers to a polygon (usually a rectangle) that indicates the area where the target object is located; it should not only accurately cover all parts of the target object, but also have an area as small as possible.
  • the attribute information of the key points in the embodiment of the present application may include various information of key points.
  • the attribute information of the key points may be used to describe whether each key point of the target object is visible in the image and, for each key point visible in the image, its position in the image. In the embodiments of the present application, the key points visible in the image (i.e., the key points present in the image) may be referred to as valid key points, and the key points not visible in the image (i.e., the key points absent from the image) may be referred to as invalid key points.
  • the key point that is not visible in the image may be a key point that is occluded, or may be a key point that is located outside the image, which is not limited in this embodiment of the present application.
  • the attribute information of a key point may include: the coordinate information of the key point and the presence discriminant value of the key point. The coordinate information of the key point may be used to indicate the position of the key point in the image; for example, the coordinate information may be the two-dimensional coordinates of the key point, but the embodiments of the present application are not limited thereto. The presence discriminant value of the key point may be used to indicate whether the key point is visible in the image; for example, if the presence discriminant value of a key point is 1, the key point is visible, and if it is 0, the key point is not visible. The presence discriminant value in the embodiments of the present application may also be implemented in other manners, which is not limited in the embodiments.
  • the attribute information may further include other information, and the embodiment of the present application is not limited thereto.
  • the attribute information of the key points acquired in the embodiments of the present application may form a 3×N-dimensional vector, where N represents the preset number of key points of the target object.
  • the attribute information of a key point in the embodiments of the present application may be represented by an array (x, y, v), where x and y are the two-dimensional coordinates of the key point in the image, and v is the presence discriminant value of the key point: when v takes the first discriminant value, the key point is a visible key point in the image, and when v takes the second discriminant value, the key point is an invisible key point in the image.
  • for example, if a key point is a valid key point, its attribute information can be represented as the array (x, y, 1); if it is an invalid key point (e.g., occluded or located outside the image), its attribute information can be represented as the array (0, 0, 0).
  • when the target object is a human body, the key points in the embodiments of the present application may generally include 14 key points: the top of the head, the neck, the left shoulder, the right shoulder, the left elbow, the right elbow, the left wrist, the right wrist, the left hip, the right hip, the left knee, the right knee, the left ankle, and the right ankle; these 14 key points can describe the posture of the human body relatively completely.
  • the attribute information of the plurality of key points may include attribute information of some or all of the 14 key points.
  • for example, the attribute information of the plurality of key points acquired in the embodiments of the present application may include: the coordinate information and presence discriminant value of the top of the head, the coordinate information and presence discriminant value of the neck, the coordinate information and presence discriminant value of the left shoulder, and so on for each of the 14 key points. Based on the attribute information of these 14 key points, the profile of the human body in the image can be described.
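  • As an illustration of this representation, the following minimal sketch (assuming NumPy; the key-point names, their order, and the sample values are hypothetical, not prescribed by the patent) encodes the 14 human-body key points as (x, y, v) triples:

```python
import numpy as np

# The 14 human-body key points named in the embodiments (order is illustrative).
KEY_POINT_NAMES = [
    "head_top", "neck", "left_shoulder", "right_shoulder",
    "left_elbow", "right_elbow", "left_wrist", "right_wrist",
    "left_hip", "right_hip", "left_knee", "right_knee",
    "left_ankle", "right_ankle",
]

# Each row is an (x, y, v) triple: two-dimensional image coordinates plus a
# presence discriminant value (1 = visible/valid, 0 = not visible).
# Invalid key points are represented as (0, 0, 0).
keypoints = np.zeros((14, 3), dtype=np.float32)
keypoints[0] = (152.0, 40.0, 1.0)   # head_top, visible (hypothetical values)
keypoints[1] = (150.0, 78.0, 1.0)   # neck, visible
# ... remaining rows filled the same way; occluded points stay (0, 0, 0)

valid_mask = keypoints[:, 2] == 1   # selects the valid key points
```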
  • when the type of the target object changes, its key points usually change accordingly; the embodiments of the present application do not limit the representation of the key points of the target object.
  • the embodiments of the present application may be applied to an application scenario in which the attribute information of a plurality of key points of the target object has already been obtained, that is, the attribute information of the plurality of key points of the target object has been obtained from an image or by other means.
  • the embodiment of the present application can obtain the attribute information of the key point of the target object by means of information reading, etc., but the embodiment of the present application is not limited thereto.
  • the embodiment of the present application may obtain the location of the bounding box of the target object by using the pre-trained neural network according to the attribute information of the plurality of key points of the target object.
  • the S100 may be executed by a processor invoking a corresponding instruction stored in a memory, or may be performed by an acquisition module 300 executed by the processor.
  • S110 Determine a location of a bounding box of the target object according to attribute information of each of the plurality of key points of the target object and a preset neural network.
  • the bounding box location of the target object can be used to determine the bounding box of the target object.
  • the bounding box position may include the location information of one or more vertices of the bounding box.
  • for example, the bounding box position may include the position information of two diagonally opposite vertices of the bounding box, such as the two-dimensional coordinates of each of the two opposite vertices; the embodiments of the present application do not limit the implementation of the bounding box position of the target object.
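  • For concreteness, a bounding box represented by two diagonally opposite vertices might be modeled as follows (a hypothetical helper for illustration, not part of the patent):

```python
from dataclasses import dataclass

@dataclass
class BoundingBox:
    # Two diagonally opposite vertices, e.g. top-left (x1, y1) and
    # bottom-right (x2, y2), in image coordinates.
    x1: float
    y1: float
    x2: float
    y2: float

    def width(self) -> float:
        return abs(self.x2 - self.x1)

    def height(self) -> float:
        return abs(self.y2 - self.y1)
```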
  • the neural network in the embodiments of the present application may be a dedicated neural network.
  • the neural network may be trained by using a large amount of sample data, where the sample data may include the attribute information of a plurality of key points of a sample object and the position of its bounding box, that is, the sample data may be labeled with the position of the bounding box of the sample object.
  • An optional example of the training process can be found in the description of Figure 2 below, and therefore will not be described in detail herein.
  • the neural network in the embodiment of the present application may include: at least two layers of fully connected layers. Compared to convolutional neural networks, fully connected networks can have faster computational speeds and processing efficiencies.
  • optionally, the neural network in one embodiment of the present application includes two fully connected layers, and the activation function of the first fully connected layer may be a ReLU (Rectified Linear Unit) activation function.
  • optionally, the neural network in another embodiment of the present application includes three fully connected layers, where the activation function of the first fully connected layer may be a ReLU activation function, and the activation function of the second fully connected layer may also be a ReLU activation function.
  • the number of fully connected layers included in the neural network and the number of neurons in each fully connected layer may be set according to actual conditions. When the number of layers and the number of neurons are sufficient, the neural network has strong expressive capacity, so that the bounding box position obtained based on the neural network is more accurate.
  • for example, the number of neurons in the first fully connected layer may be 320, the number of neurons in the second fully connected layer may also be 320, and the number of neurons in the third fully connected layer may be set to 4.
  • multiple experiments have verified that, for a neural network with three fully connected layers in which the first and second fully connected layers use the ReLU activation function and each contain 320 neurons, and the third layer contains 4 neurons, both the computation speed and the accuracy of the determined bounding box position can meet practical needs.
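  • A minimal sketch of this three-layer fully connected network in PyTorch (the framework choice is an assumption; the patent only specifies the layer sizes and activations), with N = 14 key points flattened into a 3×N = 42-dimensional input:

```python
import torch
import torch.nn as nn

N_KEY_POINTS = 14              # preset number of key points (human-body example)
INPUT_DIM = 3 * N_KEY_POINTS   # each key point contributes (x, y, v)

# Three fully connected layers (320 -> 320 -> 4) with ReLU activations after
# the first two layers; the 4 outputs are the coordinates of two diagonally
# opposite vertices of the bounding box.
bbox_regressor = nn.Sequential(
    nn.Linear(INPUT_DIM, 320),
    nn.ReLU(),
    nn.Linear(320, 320),
    nn.ReLU(),
    nn.Linear(320, 4),
)

# Example forward pass on a batch containing one flattened key-point vector.
dummy_input = torch.zeros(1, INPUT_DIM)
output = bbox_regressor(dummy_input)   # shape (1, 4): (bx1, by1, bx2, by2)
```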
  • the S110 may be executed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a determination module 310 executed by the processor.
  • the attribute information of the plurality of key points may be directly input to the neural network, or may be input to the neural network after being processed. That is to say, the input information of the neural network may be determined according to the attribute information of the plurality of key points, where the input information may be the attribute information of the plurality of key points itself, or the processed attribute information of the plurality of key points.
  • the neural network can process the input information to obtain an output result, wherein the location of the bounding box of the target object can be obtained according to the output result of the neural network.
  • the output result of the neural network may include location information of the bounding box of the target object, for example, coordinate information of one or more vertices of the bounding box of the target object.
  • alternatively, the position of the bounding box of the target object may be obtained by further processing the output of the neural network.
  • the present application may select valid key points according to the attribute information of each of the plurality of key points of the target object. For example, if the attribute information of a key point includes the presence discriminant value, the key point whose presence discriminant value indicates existence may be determined as a valid key point; for example, if the presence discriminant value of a key point is 1, the key point may be determined as a valid key point, but the embodiments of the present application are not limited thereto.
  • the attribute information of part or all of the plurality of key points may be processed according to the attribute information of each of the at least one valid key point, to obtain the processed attribute information of the plurality of key points, and the processed attribute information of the plurality of key points is used as the input information.
  • the attribute information of the processed plurality of key points may include processed attribute information of each of the plurality of key points, or processed by a part of the plurality of key points. The attribute information and the original attribute information of the other key point of the plurality of key points.
  • alternatively, the processed attribute information of the plurality of key points may include the processed attribute information of each of the at least one valid key point and the original attribute information of the other key points among the plurality of key points; that is, the attribute information of each of the at least one valid key point may be processed while the attribute information of the other key points is left unprocessed, but the embodiments of the present application are not limited to this.
  • attribute information of the at least one valid key point may be processed in various manners.
  • for example, a reference coordinate may be determined according to the coordinate information included in the attribute information of each of the at least one valid key point, and the coordinate information in the processed attribute information of each valid key point is determined according to the reference coordinate and the coordinate information in the attribute information of that valid key point.
  • the reference coordinates can be obtained by processing coordinate information of the at least one valid key point.
  • the reference coordinate may be obtained by averaging the coordinates of the at least one valid key point, but the implementation manner of the reference coordinate is not limited in the embodiment of the present application.
  • in an optional example, the attribute information of the key points acquired in S100 may be subjected to zero-mean processing, and the information obtained after the zero-mean processing is provided to the neural network as part of the input information.
  • for example, the coordinate mean (m_x, m_y) can first be calculated from the coordinate information in the attribute information of the valid key points; then, for each valid key point among all the key points, the difference (x_i - m_x, y_i - m_y) between its coordinate information (x_i, y_i) and the coordinate mean is calculated and used as the coordinate information of that valid key point; finally, the coordinate information of all key points of the target object and the presence discriminant values of all key points can be provided to the neural network as input information.
  • correspondingly, the embodiments of the present application may use the sum of the coordinate information output by the neural network and the calculated coordinate mean as the final coordinates of the vertices of the bounding box of the target object, such as the two vertices on the diagonal of a rectangular bounding box.
  • for example, if the output position information of the neural network is (bx_1, by_1) and (bx_2, by_2), the coordinates of the two diagonally opposite vertices of the bounding box of the target object may be (bx_1 + m_x, by_1 + m_y) and (bx_2 + m_x, by_2 + m_y).
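  • A minimal sketch of this zero-mean preprocessing and the corresponding restoration of the network output (assuming the NumPy (x, y, v) array from the earlier sketch; the helper names are hypothetical):

```python
import numpy as np

def preprocess_keypoints(keypoints):
    """Zero-mean the coordinates of the valid key points (v == 1).

    keypoints: (N, 3) array of (x, y, v) rows. Returns the processed array
    and the reference coordinate (m_x, m_y) needed to restore the network
    output to image coordinates.
    """
    processed = keypoints.copy()
    valid = keypoints[:, 2] == 1
    m_x, m_y = keypoints[valid, :2].mean(axis=0)   # reference coordinate
    processed[valid, 0] -= m_x                     # x_i - m_x
    processed[valid, 1] -= m_y                     # y_i - m_y
    return processed, (m_x, m_y)

def restore_bbox(output, reference):
    """Add the reference coordinate back to the network output.

    output: (bx1, by1, bx2, by2) from the network; reference: (m_x, m_y).
    Returns the two diagonally opposite vertices in image coordinates.
    """
    bx1, by1, bx2, by2 = output
    m_x, m_y = reference
    return (bx1 + m_x, by1 + m_y), (bx2 + m_x, by2 + m_y)
```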
  • FIG. 2 is a flow chart of a method of training a neural network in some embodiments of the present application.
  • in the following description, N denotes the number of key points, the attribute information of each key point is a 3-dimensional vector (x, y, v), and the bounding box is a rectangle; accordingly, the input of the neural network is a 3×N matrix, and the output is a 2×2 matrix, namely the two-dimensional coordinates of the two diagonally opposite vertices of the bounding box.
  • the method for training a neural network in the embodiments of the present application includes: S200, S210, S220, S230, S240, and S250.
  • the respective operations in Fig. 2 will be described below.
  • S200 Obtain a piece of sample data from a sample set.
  • the sample set in the embodiments of the present application is typically non-empty and typically includes a large amount of sample data.
  • for example, the sample set may be the publicly available MS COCO database, and the like.
  • each piece of sample data in the sample set may include the attribute information of a plurality of key points of a sample object, and each piece of sample data may be labeled with the position of the bounding box of the sample object, where the attribute information of a key point may include the coordinate information and the presence discriminant value of the key point, but the embodiments of the present application are not limited thereto.
  • the sample object corresponding to the sample data is usually of the same type as the target object.
  • for example, when the target object is a human body, the sample object is also a human body.
  • one sample data may be sequentially selected from the sample set according to the arrangement order of the sample data, or one sample data may be randomly selected from the sample set.
  • the manner in which the sample data is selected is not limited in the embodiment of the present application.
  • the coordinate mean (m_x, m_y) is calculated from the coordinate information in the attribute information of all key points whose discriminant value v is 1 in the sample data.
  • for the coordinate information (x_i, y_i) of each valid key point in the sample data, the difference (x_i - m_x, y_i - m_y) is calculated and used as its coordinate information.
  • the attribute information of all the key points of the piece of sample data is provided as an input to the neural network.
  • the output of the neural network is the two-dimensional coordinates (bx_1, by_1) and (bx_2, by_2) of the two vertices on the diagonal of the rectangle; the coordinates of the bounding box used for supervision can then be expressed as the sum of the output coordinate information and the coordinate mean, namely (bx_1 + m_x, by_1 + m_y) and (bx_2 + m_x, by_2 + m_y).
  • in the training process, the embodiments of the present application may perform the computation by using a stochastic gradient descent algorithm to implement the training.
  • whether to adjust the parameters of the neural network may be determined by comparing the computed result of the neural network with the labeled bounding box position of the sample data: if the difference between the computed result and the labeled bounding box position is below a certain range, the training process may be terminated or new sample data may be selected from the sample set; otherwise, the parameters of the neural network are adjusted and the computation continues with the adjusted neural network.
  • the embodiments of the present application may decide whether to continue fetching new sample data from the sample set according to factors such as whether all sample data in the sample set have been used for training, whether the output of the neural network meets a predetermined accuracy requirement, or whether the number of samples read has reached a predetermined number.
  • if the output of the neural network is determined to meet the predetermined accuracy requirement, the neural network has been trained successfully; if all sample data in the sample set have been used for training, or the number of samples read has reached the predetermined number, but the accuracy requirement is still not met, the neural network has not been trained successfully and can be trained again.
  • the foregoing accuracy check may be performed by selecting several pieces of sample data not used in training from the sample set, providing them to the neural network according to the method shown in FIG. 1, and computing the error between each bounding box position obtained from the neural network and the manually labeled bounding box position in the corresponding sample data; the training is successful when the accuracy determined from these errors meets the predetermined accuracy requirement.
  • the embodiments of the present application may supervise the training by using the L2 loss function, but the embodiments of the present application are not limited thereto.
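  • A compact sketch of one step of this training procedure (assuming the PyTorch regressor and the preprocessing helper from the earlier sketches; the optimizer settings are hypothetical, and the L2 supervision is realized here with a mean squared error loss):

```python
import torch
import torch.nn as nn

# Stochastic gradient descent with an L2-style (mean squared error) loss,
# as described above; the learning rate is a hypothetical choice.
optimizer = torch.optim.SGD(bbox_regressor.parameters(), lr=1e-3)
criterion = nn.MSELoss()

def train_step(keypoints, labeled_box):
    """One training step on a single piece of sample data.

    keypoints: (N, 3) NumPy array of (x, y, v) rows; labeled_box: the labeled
    vertices ((x1, y1), (x2, y2)) of the sample object's bounding box.
    """
    processed, (m_x, m_y) = preprocess_keypoints(keypoints)
    inputs = torch.as_tensor(processed, dtype=torch.float32).reshape(1, -1)

    # Supervision is given in the zero-mean frame: subtract the coordinate
    # mean from the labeled vertices so the target matches the network output.
    (x1, y1), (x2, y2) = labeled_box
    target = torch.tensor([[x1 - m_x, y1 - m_y, x2 - m_x, y2 - m_y]],
                          dtype=torch.float32)

    optimizer.zero_grad()
    loss = criterion(bbox_regressor(inputs), target)
    loss.backward()
    optimizer.step()
    return loss.item()
```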
  • the present application trains the neural network by using the attribute information of the key points of the sample object and the position of the bounding box, so that the trained neural network can directly determine the position of the bounding box of the target object based on the attribute information of the key points of the target object;
  • the embodiments of the present application can thus quickly determine the bounding box by utilizing the already obtained attribute information of the key points of the target object, without using the image itself.
  • any of the methods provided by the embodiments of the present application may be performed by any suitable device having data processing capabilities, including but not limited to: a terminal device, a server, and the like.
  • any of the methods provided by the embodiments of the present application may be executed by a processor; for example, the processor performs any of the methods mentioned in the embodiments of the present application by executing corresponding instructions stored in a memory. This will not be repeated below.
  • the foregoing programs may be stored in a computer readable storage medium; when executed, the program performs the operations of the foregoing method embodiments.
  • the foregoing storage medium includes at least one medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
  • FIG. 3 is a schematic structural diagram of an apparatus for determining a bounding box of a target object in some embodiments of the present application.
  • the apparatus of this embodiment can be used to implement the various method embodiments described above.
  • the apparatus of this embodiment includes: an obtaining module 300 and a determining module 310.
  • the apparatus may further include: a training module 320.
  • the obtaining module 300 is configured to acquire attribute information of each key point of the plurality of key points of the target object.
  • for the target object, the key points, the valid key points, and the attribute information of the key points, reference can be made to the related description of S100 in the foregoing method embodiments, which is therefore not described in detail here.
  • the apparatus of the embodiments of the present application may be applied to an application scenario in which the attribute information of the key points of the target object has already been successfully obtained, that is, the attribute information of the key points of the target object has been obtained from the image or by other means.
  • the obtaining module 300 can directly obtain the attribute information of the key point of the existing target object by means of information reading or the like.
  • the determining module 310 is configured to determine the location of the bounding box of the target object according to the attribute information of each of the plurality of key points of the target object acquired by the obtaining module 300 and the preset neural network.
  • the representation of the neural network in the present application (for example, the number of layers, the number of neurons, and the activation function, etc.) can be referred to the related description in the foregoing method embodiments, and thus will not be described in detail herein.
  • the determining module 310 can include: a first sub-module, a second sub-module, and a third sub-module.
  • the first sub-module is configured to determine at least one valid key point from the plurality of key points according to the attribute information of each of the plurality of key points acquired by the obtaining module 300; the second sub-module is configured to process the attribute information of the plurality of key points according to the attribute information of each valid key point determined by the first sub-module, to obtain the processed attribute information of the plurality of key points; and the third sub-module is configured to input the processed attribute information obtained by the second sub-module into the preset neural network for processing, to obtain the position of the bounding box of the target object.
  • the processed attribute information of the plurality of key points may include: the processed attribute information of each of the at least one valid key point and the attribute information of the other key points among the plurality of key points other than the at least one valid key point.
  • the second submodule may include: a first unit and a second unit.
  • the first unit is configured to determine the reference coordinate according to the coordinate information included in the attribute information of each of the at least one valid key point determined by the first sub-module; for example, the first unit averages the coordinates corresponding to the coordinate information of each of the at least one valid key point to obtain the reference coordinate. The second unit is configured to determine the coordinate information in the processed attribute information of each valid key point according to the reference coordinate determined by the first unit and the coordinate information in the attribute information of that valid key point; for example, the second unit takes the reference coordinate determined by the first unit as the origin and determines the processed coordinate information corresponding to the coordinate information of each of the at least one valid key point.
  • the third sub-module may be configured to input the attribute information of the plurality of key points processed by the second unit to the neural network for processing, obtain output position information, and determine the position of the bounding box of the target object according to the reference coordinate and the output position information.
  • in an optional example, the first unit is configured to calculate the two-dimensional coordinate mean according to the coordinate information of all valid key points of the target object; the second unit is configured to calculate, for each valid key point of the target object, the difference between its coordinate information and the two-dimensional coordinate mean, and to use the difference as the coordinate information of that valid key point; the third sub-module is configured to provide the attribute information of all key points of the target object to the neural network; and the determining module 310 may use the sum of the bounding box coordinate information output by the neural network and the coordinate mean as the two-dimensional coordinate information of the bounding box of the target object.
  • the training module 320 is configured to train the neural network: it acquires a sample set including a plurality of pieces of sample data, where each piece of sample data includes the attribute information of a plurality of key points of a sample object and is labeled with the position of the bounding box of the sample object, and then trains the neural network according to the attribute information of the plurality of key points of the sample object in each piece of sample data and the position of the bounding box of the sample object.
  • in an optional example, the training module 320 acquires a plurality of pieces of sample data from the sample set; for each piece of sample data, it calculates the coordinate mean according to the coordinate information of all valid key points of that piece of sample data, separately calculates the difference between the coordinate information of each valid key point and the coordinate mean, uses the calculated difference as the coordinate information of the corresponding valid key point, and then provides the attribute information of all key points of the piece of sample data as input to the neural network. An example of the operations performed by the training module 320 to train the neural network can be found in the description of the above method, and is therefore not repeated here.
  • the embodiment of the present application further provides an electronic device, such as a mobile terminal, a personal computer (PC), a tablet computer, a server, and the like.
  • Referring to FIG. 4, there is shown a schematic structural diagram of an electronic device 400 suitable for implementing a terminal device or a server of an embodiment of the present application.
  • the electronic device 400 includes one or more processors and a communication unit.
  • the one or more processors are, for example, one or more central processing units (CPUs) 401 and/or one or more acceleration units 413, where the acceleration units 413 may include, but are not limited to, GPUs, FPGAs, and other types of dedicated processors.
  • the processor can perform various appropriate actions in accordance with executable instructions stored in a read only memory (ROM) 402 or executable instructions loaded from the storage section 408 into a random access memory (RAM) 403.
  • the communication unit 412 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (Infiniband) network card.
  • the processor can communicate with the read only memory 402 and/or the random access memory 403 to execute executable instructions, connect to the communication unit 412 via the bus 404, and communicate with other target devices via the communication unit 412, thereby completing operations corresponding to any of the methods provided by the embodiments of the present application, for example: acquiring attribute information of each of a plurality of key points of the target object; and determining the position of the bounding box of the target object according to the attribute information of each of the plurality of key points of the target object and a preset neural network.
  • in the RAM 403, various programs and data required for the operation of the device can be stored.
  • the CPU 401, the ROM 402, and the RAM 403 are connected to each other through a bus 404.
  • ROM 402 is an optional module.
  • the RAM 403 stores executable instructions, or writes executable instructions into the ROM 402 at runtime; the executable instructions cause the processor to perform the operations corresponding to the above-described methods.
  • An input/output (I/O) interface 405 is also coupled to bus 404.
  • the communication unit 412 may be integrated, or may be provided with a plurality of sub-modules (e.g., a plurality of IB network cards) respectively connected to the bus link.
  • the following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output section 407 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a LAN card or a modem. The communication section 409 performs communication processing via a network such as the Internet.
  • a drive 410 is also coupled to the I/O interface 405 as needed.
  • a removable medium 411 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory or the like is mounted on the drive 410 as needed so that a computer program read therefrom is installed into the storage portion 408 as needed.
  • FIG. 4 is only an optional implementation manner.
  • in practice, the number and types of the components in FIG. 4 may be selected, reduced, increased, or replaced according to actual needs; different functional components may be arranged separately or in an integrated manner.
  • for example, the acceleration unit 413 and the CPU 401 may be arranged separately, or the acceleration unit 413 may be integrated into the CPU 401; the communication unit may be arranged separately, or may be integrated into the CPU 401 or the acceleration unit 413, and so on.
  • in particular, according to the embodiments disclosed herein, the process described above may be implemented as a computer program product, which includes a computer program tangibly embodied on a computer readable medium; the computer program includes program code for executing the method illustrated in the flowchart, and the program code includes instructions corresponding to the method steps provided by the embodiments of the present application, for example: acquiring attribute information of each of a plurality of key points of the target object; and determining the position of the bounding box of the target object according to the attribute information of each of the plurality of key points of the target object and a preset neural network.
  • the computer program can be downloaded and installed from the network via the communication portion 409, and/or installed from the removable medium 411.
  • when the instructions in the computer program are executed by the central processing unit (CPU) 401, the functions defined in the methods of the present application are performed.
  • the methods and apparatuses, electronic devices, and computer readable storage media of the present application may be implemented in many ways, for example, in software, hardware, firmware, or any combination of software, hardware, and firmware.
  • the above-described order of the steps of the method is for illustration only, and the steps of the method of the present application are not limited to the order specifically described above unless otherwise specifically stated.
  • in some embodiments, the present application can also be embodied as programs recorded in a recording medium, the programs including computer readable instructions for implementing the methods according to the present application.
  • thus, the present application also covers a recording medium storing a program for executing the methods according to the present application, for example, the computer readable storage medium 500 shown in FIG. 5.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application disclose a method, apparatus, medium, and device for determining a bounding box of a target object. The method includes: acquiring attribute information of each of a plurality of key points of the target object; and determining the position of the bounding box of the target object according to the attribute information of each of the plurality of key points of the target object and a preset neural network. The embodiments of the present application can improve the efficiency and accuracy of determining the bounding box of a target object.

Description

Method, apparatus, medium, and device for determining a bounding box of a target object
This application claims priority to Chinese patent application No. CN 201711165979.8, entitled "Method, apparatus, and electronic device for determining a bounding box of a target object" and filed with the Chinese Patent Office on November 21, 2017, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to computer vision technologies, and in particular to a method, apparatus, electronic device, and computer-readable storage medium for determining a bounding box of a target object.
Background
In computer vision fields such as image recognition, it is often necessary to determine the bounding box of a human body quickly and accurately.
At present, the bounding box of a human body is usually determined with Faster-RCNN (Faster Region-based Convolutional Neural Network): first, an RPN (Region Proposal Network) is used to obtain multiple candidate regions, and then RCNN is used to score and refine each candidate region, thereby determining the bounding box of the human body. However, the accuracy of determining the bounding box of the human body in this way still needs to be further improved.
Summary
Embodiments of the present application provide technical solutions for determining a bounding box of a target object.
According to one aspect of the embodiments of the present application, a method for determining a bounding box of a target object is provided, including: obtaining attribute information of each of a plurality of key points of a target object; and determining the position of the bounding box of the target object according to the attribute information of each of the plurality of key points of the target object and a preset neural network.
Optionally, in an embodiment of the above method of the present application, the target object includes: a human body.
Optionally, in another embodiment of the above method of the present application, the attribute information of a key point includes: coordinate information and an existence discriminant value.
Optionally, in an embodiment of the above method of the present application, determining the position of the bounding box of the target object according to the attribute information of each of the plurality of key points of the target object and the preset neural network includes: determining at least one valid key point from the plurality of key points according to the attribute information of each of the plurality of key points; processing the attribute information of the plurality of key points according to the attribute information of each of the at least one valid key point to obtain processed attribute information of the plurality of key points; and inputting the processed attribute information of the plurality of key points into the preset neural network for processing to obtain the position of the bounding box of the target object.
Optionally, in an embodiment of the above method of the present application, the processed attribute information of the plurality of key points includes: the processed attribute information of each of the at least one valid key point, and the attribute information of the key points other than the at least one valid key point among the plurality of key points.
Optionally, in an embodiment of the above method of the present application, processing the attribute information of the plurality of key points according to the attribute information of each of the at least one valid key point to obtain the processed attribute information of the plurality of key points includes: determining reference coordinates according to the coordinate information included in the attribute information of each of the at least one valid key point; and determining the coordinate information in the processed attribute information of each valid key point according to the reference coordinates and the coordinate information in the attribute information of each of the at least one valid key point.
Optionally, in an embodiment of the above method of the present application, determining the reference coordinates according to the coordinate information included in the attribute information of each of the at least one valid key point includes: averaging the coordinates corresponding to the coordinate information of each of the at least one valid key point to obtain the reference coordinates; and/or determining the coordinate information in the processed attribute information of each valid key point according to the reference coordinates and the coordinate information in the attribute information of each of the at least one valid key point includes: taking the reference coordinates as the origin, determining the processed coordinate information corresponding to the coordinate information of each of the at least one valid key point.
Optionally, in an embodiment of the above method of the present application, inputting the processed attribute information of the plurality of key points into the preset neural network for processing to obtain the position of the bounding box of the target object includes: inputting the processed attribute information of the plurality of key points into the preset neural network for processing to obtain output position information; and determining the position of the bounding box of the target object according to the reference coordinates and the output position information.
Optionally, in an embodiment of the above method of the present application, the method further includes: obtaining a sample set including a plurality of pieces of sample data, where the sample data includes attribute information of a plurality of key points of a sample object, and the sample data is annotated with the position of the bounding box of the sample object;
training the neural network according to the attribute information of the plurality of key points of the sample object in each piece of the sample data and the position of the bounding box of the sample object.
Optionally, in an embodiment of the above method of the present application, the neural network is trained based on a stochastic gradient descent algorithm.
Optionally, in an embodiment of the above method of the present application, the position of the bounding box of the target object includes: coordinate information of two vertices in the diagonal direction of the bounding box of the target object.
Optionally, in an embodiment of the above method of the present application, the neural network includes: at least two fully connected layers.
Optionally, in an embodiment of the above method of the present application, the neural network includes: three fully connected layers, where the activation function of at least one of the first and second fully connected layers of the three fully connected layers includes: a Rectified Linear Unit (ReLU) activation function.
Optionally, in an embodiment of the above method of the present application, the first fully connected layer includes 320 neurons, the second fully connected layer includes 320 neurons, and the last of the three fully connected layers includes 4 neurons.
According to another aspect of the embodiments of the present application, an apparatus for determining a bounding box of a target object is provided, including: an obtaining module configured to obtain attribute information of each of a plurality of key points of a target object; and a determining module configured to determine the position of the bounding box of the target object according to the attribute information of each of the plurality of key points of the target object obtained by the obtaining module and a preset neural network.
Optionally, in an embodiment of the above apparatus of the present application, the target object includes: a human body.
Optionally, in another embodiment of the above apparatus of the present application, the attribute information of a key point includes: coordinate information and an existence discriminant value.
Optionally, in yet another embodiment of the above apparatus of the present application, the determining module includes: a first sub-module configured to determine at least one valid key point from the plurality of key points according to the attribute information of each of the plurality of key points obtained by the obtaining module; a second sub-module configured to process the attribute information of the plurality of key points according to the attribute information of each of the at least one valid key point determined by the first sub-module to obtain processed attribute information of the plurality of key points; and a third sub-module configured to input the processed attribute information of the plurality of key points obtained by the second sub-module into the preset neural network for processing to obtain the position of the bounding box of the target object.
Optionally, in yet another embodiment of the above apparatus of the present application, the processed attribute information of the plurality of key points includes: the processed attribute information of each of the at least one valid key point, and the attribute information of the key points other than the at least one valid key point among the plurality of key points.
Optionally, in yet another embodiment of the above apparatus of the present application, the second sub-module includes: a first unit configured to determine reference coordinates according to the coordinate information included in the attribute information of each of the at least one valid key point determined by the first sub-module; and a second unit configured to determine the coordinate information in the processed attribute information of each valid key point according to the reference coordinates determined by the first unit and the coordinate information in the attribute information of each of the at least one valid key point.
Optionally, in yet another embodiment of the above apparatus of the present application, the first unit is configured to average the coordinates corresponding to the coordinate information of each of the at least one valid key point determined by the first sub-module to obtain the reference coordinates; and/or the second unit is configured to take the reference coordinates determined by the first unit as the origin and determine the processed coordinate information corresponding to the coordinate information of each of the at least one valid key point.
Optionally, in yet another embodiment of the above apparatus of the present application, the third sub-module is configured to: input the processed attribute information of the plurality of key points obtained by the second sub-module into the preset neural network for processing to obtain output position information; and determine the position of the bounding box of the target object according to the reference coordinates and the output position information.
Optionally, in yet another embodiment of the above apparatus of the present application, the apparatus further includes a training module configured to: obtain a sample set including a plurality of pieces of sample data, where the sample data includes attribute information of a plurality of key points of a sample object, and the sample data is annotated with the position of the bounding box of the sample object; and train the neural network according to the attribute information of the plurality of key points of the sample object in each piece of the sample data and the position of the bounding box of the sample object.
Optionally, in yet another embodiment of the above apparatus of the present application, the neural network is trained based on a stochastic gradient descent algorithm.
Optionally, in yet another embodiment of the above apparatus of the present application, the position of the bounding box of the target object includes: coordinate information of two vertices in the diagonal direction of the bounding box of the target object.
Optionally, in yet another embodiment of the above apparatus of the present application, the neural network includes: at least two fully connected layers.
Optionally, in yet another embodiment of the above apparatus of the present application, the neural network includes: three fully connected layers, where the activation function of at least one of the first and second fully connected layers of the three fully connected layers includes: a Rectified Linear Unit (ReLU) activation function. The first fully connected layer includes 320 neurons, the second fully connected layer includes 320 neurons, and the last of the three fully connected layers includes 4 neurons.
According to yet another aspect of the embodiments of the present application, an electronic device is provided, including: a processor and a computer-readable storage medium, where the computer-readable storage medium is configured to store instructions, and execution of the instructions by the processor causes the electronic device to perform any of the embodiments of the above method.
According to yet another aspect of the embodiments of the present application, a computer program product is provided, including at least one instruction, where any of the embodiments of the above method is performed when the at least one instruction is executed by a processor.
In an optional embodiment, the computer program product is a computer storage medium; in another optional embodiment, the computer program product is a software product, such as an SDK.
Based on the method and apparatus, electronic device, and computer program product for determining the bounding box of a target object provided by the above embodiments of the present application, the position of the bounding box of the target object is determined by using the attribute information of each of a plurality of key points of the target object together with a neural network, which helps improve the efficiency and accuracy of determining the bounding box of the target object.
The technical solutions of the present application are further described in detail below with reference to the accompanying drawings and embodiments.
Brief Description of the Drawings
The accompanying drawings, which constitute a part of the specification, describe embodiments of the present application and, together with the description, serve to explain the principles of the present application.
The present application can be understood more clearly from the following detailed description with reference to the accompanying drawings, in which:
FIG. 1 is a flowchart of a method for determining a bounding box of a target object in some embodiments of the present application;
FIG. 2 is a flowchart of a method for training a neural network in some embodiments of the present application;
FIG. 3 is a schematic structural diagram of an apparatus for determining a bounding box of a target object in some embodiments of the present application;
FIG. 4 is a schematic structural diagram of an electronic device in some embodiments of the present application;
FIG. 5 is a schematic diagram of a computer storage medium in some embodiments of the present application.
Detailed Description
Various exemplary embodiments of the present application will now be described in detail with reference to the accompanying drawings. It should be noted that, unless otherwise specified, the relative arrangement of components and steps, numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present application.
It should also be understood that, for ease of description, the sizes of the parts shown in the drawings are not drawn according to actual proportional relationships.
The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the present application or its application or use.
Technologies, methods, and devices known to those of ordinary skill in the relevant fields may not be discussed in detail, but where appropriate, such technologies, methods, and devices should be regarded as part of the specification.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it does not need to be further discussed in subsequent drawings.
The embodiments of the present application can be applied to electronic devices such as terminal devices, computer systems, and servers, which can operate together with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, minicomputer systems, mainframe computer systems, distributed cloud computing environments including any of the above systems, and the like.
Electronic devices such as terminal devices, computer systems, and servers can be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system. Generally, program modules may include routines, programs, object programs, components, logic, data structures, and the like that perform specific tasks or implement specific abstract data types. The computer system/server can be implemented in a distributed cloud computing environment in which tasks are performed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media including storage devices.
FIG. 1 is a flowchart of a method for determining a bounding box of a target object in some embodiments of the present application. As shown in FIG. 1, the method of the present application for determining a bounding box of a target object includes S100 and S110. Each operation in FIG. 1 is described below.
S100. Obtain attribute information of each of a plurality of key points of a target object.
In an optional example, the target object in the embodiments of the present application may also be referred to as a detection object, a bounding-box detection object, or the like, which is not limited in the embodiments of the present application. Optionally, the target object may be a human body, a human face, or some specific object; the embodiments of the present application do not limit the form of the target object. The bounding box in the embodiments of the present application generally refers to a polygon (usually a rectangle) that indicates the region in which the target object is located; the bounding box can usually not only accurately cover all parts of the target object but also have as small an area as possible.
In an optional example, the attribute information of a key point in the embodiments of the present application may include multiple types of information about the key point. As an example, the attribute information of key points may describe whether at least one key point of the target object is visible in the image, and the position in the image of at least one key point that is visible in the image. In the embodiments of the present application, a key point visible in the image (that is, a key point located in the image) may be referred to as a valid key point, and a key point invisible in the image (that is, a key point not located in the image) may be referred to as an invalid key point. An invisible key point may be an occluded key point or a key point located outside the image, which is not limited in the embodiments of the present application.
In an optional example, the attribute information of a key point may include: the coordinate information of the key point and the existence discriminant value of the key point. The coordinate information may indicate the position of the key point in the image; for example, the coordinate information may be the two-dimensional coordinates of the key point, but the embodiments of the present application are not limited thereto. The existence discriminant value may indicate whether the key point is visible in the image. For example, an existence discriminant value of 1 indicates that the key point is visible, and a value of 0 indicates that it is invisible; the existence discriminant value may also be implemented in other ways, which is not limited in this embodiment. Optionally, the attribute information may further include other information, and the embodiments of the present application are not limited thereto.
As an example, the attribute information of the key points obtained in the embodiments of the present application may be a 3×N-dimensional vector, where N denotes the number of key points preset for the target object. The attribute information of one key point may be represented by an array (x, y, v), where x and y are the two-dimensional coordinates of the key point in the image and v is the existence discriminant value of the key point. When v takes a first discriminant value, the key point is a visible key point in the image; when v takes a second discriminant value, the key point is an invisible key point in the image. For example, for a key point of the target object, if the key point is a valid key point, its attribute information may be represented by the array (x, y, 1); if it is an invalid key point (occluded or located outside the image), its attribute information may be represented by the array (0, 0, 0). Representing the attribute information in this way makes it easy to learn the actual situation of all key points of the target object in the image.
In an optional example, when the target object is a human body, the key points of the human body in the embodiments of the present application may generally include: the top of the head, the neck, the left shoulder, the right shoulder, the left elbow, the right elbow, the left wrist, the right wrist, the left hip, the right hip, the left knee, the right knee, the left ankle, and the right ankle. These 14 key points can describe the posture of a human body relatively completely. In this case, the attribute information of the plurality of key points may include the attribute information of some or all of these 14 key points. As an example, the attribute information of the plurality of key points obtained in the embodiments of the present application may include, for each of these 14 key points, its coordinate information and its existence discriminant value, and the attribute information of these 14 key points can describe the outline of the human body in the image. When the target object is something else, its key points generally change accordingly; the embodiments of the present application do not limit the form of the key points of the target object.
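For illustration only, the following is a minimal sketch in Python with NumPy of the (x, y, v) representation described above for the 14 human-body key points; the array and constant names are illustrative assumptions, not terminology from the present application:

    import numpy as np

    # The 14 human-body key points named above, in a fixed order.
    KEYPOINT_NAMES = [
        "head_top", "neck", "left_shoulder", "right_shoulder",
        "left_elbow", "right_elbow", "left_wrist", "right_wrist",
        "left_hip", "right_hip", "left_knee", "right_knee",
        "left_ankle", "right_ankle",
    ]
    N = len(KEYPOINT_NAMES)  # N = 14

    # One (x, y, v) triple per key point: v = 1 for a valid (visible)
    # key point; an invalid key point is recorded as (0, 0, 0).
    keypoints = np.zeros((N, 3), dtype=np.float32)
    keypoints[0] = (152.0, 80.0, 1.0)   # head_top, visible
    keypoints[1] = (150.0, 110.0, 1.0)  # neck, visible
    # ... an occluded key point simply stays (0, 0, 0)

    # The 3xN-dimensional attribute vector mentioned above.
    attribute_vector = keypoints.reshape(-1)  # shape (42,)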
In an optional example, the embodiments of the present application are applicable to application scenarios in which the attribute information of a plurality of key points of a target object has already been obtained. That is, in application scenarios where the attribute information of the plurality of key points of the target object has already been obtained from an image or in other ways, the embodiments of the present application may obtain the attribute information of the key points of the target object by reading the information or the like, but the embodiments of the present application are not limited thereto. In such application scenarios, the embodiments of the present application may obtain the position of the bounding box of the target object according to the attribute information of the plurality of key points of the target object by using a pre-trained neural network.
In an optional example, S100 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by an obtaining module 300 run by the processor.
S110. Determine the position of the bounding box of the target object according to the attribute information of each of the plurality of key points of the target object and a preset neural network.
Optionally, the position of the bounding box of the target object may be used to determine the bounding box of the target object. Optionally, the bounding box position may include position information of one or more vertices of the bounding box. In an optional example, if the bounding box is a quadrilateral, for example, a rectangle, the bounding box position may include position information of two opposite vertices of the bounding box, for example, the two-dimensional coordinates of each of the two opposite vertices; however, the embodiments of the present application do not limit the implementation of the position of the bounding box of the target object.
In an optional example, the neural network in the embodiments of the present application may be a dedicated neural network. The neural network may be trained with a large amount of sample data, where the sample data may include the attribute information of a plurality of key points of a sample object and the bounding box position; that is, the sample data may be annotated with the position of the bounding box of the sample object. An optional example of the training process is described below with reference to FIG. 2 and is therefore not detailed here.
Optionally, the neural network in the embodiments of the present application may include: at least two fully connected layers. Compared with a convolutional neural network, a fully connected network can have faster computation speed and higher processing efficiency.
In an optional example, the neural network in the embodiments of the present application includes two fully connected layers, and the activation function of the first fully connected layer may be a ReLU (Rectified Linear Unit) activation function.
In an optional example, the neural network in the embodiments of the present application includes three fully connected layers, where the activation function of the first fully connected layer may be a ReLU activation function, and the activation function of the second fully connected layer may also be a ReLU activation function.
In the embodiments of the present application, the number of fully connected layers in the neural network and the number of neurons in each fully connected layer can be set according to actual situations. When the number of layers and the number of neurons are sufficiently large, the neural network has strong function-expression capability, so the bounding box position obtained based on the neural network is more accurate. In an optional example, for a neural network formed by three fully connected layers, the number of neurons in the first fully connected layer may be 320, and the number of neurons in the second fully connected layer may also be 320; when the bounding box of the target object is a quadrilateral (for example, a rectangle) and the position of the bounding box is represented by the two-dimensional coordinate information of two vertices in the diagonal direction of the bounding box, the number of neurons in the third fully connected layer may be set to 4.
It has been verified through multiple experiments that a neural network with three fully connected layers, in which the activation functions of the first and second fully connected layers are ReLU activation functions, the first and second fully connected layers each have 320 neurons, and the third fully connected layer has 4 neurons, can meet practical requirements both in computation speed and in the accuracy of determining the bounding box position.
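As a concrete reading of this architecture, the following is a minimal sketch assuming PyTorch (the present application does not name a framework, and the variable names are illustrative) of a three-fully-connected-layer network with 320, 320, and 4 neurons and ReLU activations after the first two layers:

    import torch
    import torch.nn as nn

    N = 14  # number of key points; the input is the flattened 3xN attribute vector

    # Three fully connected layers, 3N -> 320 -> 320 -> 4, with ReLU after
    # the first two layers; the 4 outputs can be read as the two diagonal
    # vertices (bx1, by1, bx2, by2) of a rectangular bounding box.
    bbox_net = nn.Sequential(
        nn.Linear(3 * N, 320),
        nn.ReLU(),
        nn.Linear(320, 320),
        nn.ReLU(),
        nn.Linear(320, 4),
    )

    dummy_input = torch.randn(1, 3 * N)  # a batch of one attribute vector
    output = bbox_net(dummy_input)       # shape (1, 4)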
In an optional example, S110 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a determining module 310 run by the processor.
In the embodiments of the present application, the attribute information of the plurality of key points may be input into the neural network directly, or may be input into the neural network after being processed. That is, the input information of the neural network may be determined according to the attribute information of the plurality of key points, where the input information may be the attribute information of the plurality of key points itself, or may be obtained by processing the attribute information of the plurality of key points. The neural network may process the input information to obtain an output result, and the position of the bounding box of the target object may be obtained according to the output result of the neural network. Optionally, the output result of the neural network may include position information of the bounding box of the target object, for example, coordinate information of one or more vertices of the bounding box; as an example, if the bounding box is a rectangle, the output result may include coordinate information of two opposite vertices of the bounding box. Alternatively, the position of the bounding box of the target object may be obtained by processing the output result of the neural network, which is not limited in the embodiments of the present application.
In an optional example, the present application may select valid key points according to the attribute information of each of the plurality of key points of the target object. For example, if the attribute information of a key point includes an existence discriminant value, a key point whose existence discriminant value indicates existence may be determined as a valid key point; for example, if the existence discriminant value of a key point is 1, the key point may be determined as a valid key point, but the embodiments of the present application are not limited thereto.
Optionally, if at least one valid key point can be selected from the plurality of key points, the attribute information of some or all of the plurality of key points may be processed according to the attribute information of each of the at least one valid key point to obtain processed attribute information of the plurality of key points, and the processed attribute information may be used as the input information. Optionally, the processed attribute information of the plurality of key points may include processed attribute information of each of the plurality of key points, or include processed attribute information of one part of the plurality of key points and original attribute information of another part of the plurality of key points. As an example, the processed attribute information of the plurality of key points may include the processed attribute information of each of the at least one valid key point and the original attribute information of the key points other than the at least one valid key point among the plurality of key points; that is, the attribute information of each of the at least one valid key point may be processed while the attribute information of the other key points is not processed, but the embodiments of the present application are not limited thereto.
In the embodiments of the present application, the attribute information of the at least one valid key point may be processed in multiple ways. As an example, reference coordinates may be determined according to the coordinate information included in the attribute information of each of the at least one valid key point, and the coordinate information in the processed attribute information of a valid key point may be determined according to the reference coordinates and the coordinate information in the attribute information of that valid key point. The reference coordinates may be obtained by processing the coordinate information of the at least one valid key point; for example, the reference coordinates may be obtained by averaging the coordinates of the at least one valid key point, but the embodiments of the present application do not limit the implementation of the reference coordinates.
In an optional example, the attribute information of the key points obtained in S100 may be subjected to zero-mean processing, and the information obtained after the zero-mean processing may be provided to the neural network as part of the input information. For example, the coordinate mean (m_x, m_y) may be calculated according to the coordinate information in the attribute information of the valid key points; then, for each valid key point among all the key points, the difference between the coordinate information (x_i, y_i) of the key point and the above coordinate mean, namely (x_i - m_x, y_i - m_y), is calculated, and the calculated difference is used as the coordinate information of that valid key point; finally, the coordinate information of all key points of the target object and the existence discriminant values of all key points may be provided to the neural network as the input information.
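As an illustration of this zero-mean step, here is a minimal sketch in Python with NumPy, continuing the illustrative (N, 3) keypoint array from the earlier sketch; the function name is an assumption, and it assumes at least one valid key point exists:

    import numpy as np

    def zero_mean(keypoints: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
        """Subtract the valid-key-point coordinate mean, as described above.

        keypoints: (N, 3) array of (x, y, v) rows. Returns the processed
        copy and the mean (m_x, m_y) needed later to restore coordinates.
        """
        processed = keypoints.copy()
        valid = keypoints[:, 2] == 1                 # existence discriminant value v
        mean_xy = keypoints[valid, :2].mean(axis=0)  # (m_x, m_y)
        processed[valid, :2] -= mean_xy              # (x_i - m_x, y_i - m_y)
        return processed, mean_xy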
It should be particularly noted that, if the sample data is not subjected to zero-mean processing during the training of the neural network in the embodiments of the present application, the two-dimensional coordinates of the valid key points of the target object likewise do not need to be subjected to zero-mean processing in S110.
Optionally, when the input information provided to the neural network has undergone zero-mean processing, the embodiments of the present application may use the sum of the coordinate information output by the neural network and the coordinate mean calculated above as the final coordinates of the vertices of the bounding box of the target object (for example, the two vertices on the diagonal of a rectangular bounding box). For example, if the output position information of the neural network is (bx_1, by_1) and (bx_2, by_2), the coordinates of the two vertices on the diagonal of the bounding box of the target object may be (bx_1 + m_x, by_1 + m_y) and (bx_2 + m_x, by_2 + m_y).
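Continuing the same illustrative sketch, the coordinate mean is added back to the network output at inference time to recover the final diagonal vertices:

    import numpy as np

    def restore_bbox(output_xy: np.ndarray, mean_xy: np.ndarray) -> np.ndarray:
        """output_xy: (bx1, by1, bx2, by2) from the network; mean_xy: (m_x, m_y).

        Returns (bx1 + m_x, by1 + m_y, bx2 + m_x, by2 + m_y), the final
        coordinates of the two diagonal vertices of the bounding box.
        """
        return output_xy + np.tile(mean_xy, 2)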
FIG. 2 is a flowchart of a method for training a neural network in some embodiments of the present application. Here, it is assumed that the number of the plurality of key points is N, the attribute information of each key point may be a 3-dimensional vector (x, y, v), and the bounding box is a rectangle. In addition, it is assumed that the input of the neural network includes a 3×N matrix and the output includes a 2×2 matrix, which may be the two-dimensional coordinates of two vertices on the diagonal of the bounding box.
As shown in FIG. 2, the method for training a neural network in the embodiments of the present application includes: S200, S210, S220, S230, S240, and S250. Each operation in FIG. 2 is described below.
S200. Obtain a piece of sample data from a sample set.
In an optional example, the sample set in the embodiments of the present application is generally non-empty and generally includes a large amount of sample data; for example, the sample set may be the currently public MS COCO database or the like. Each piece of sample data in the sample set may include attribute information of a plurality of key points of a sample object, and each piece of sample data may be annotated with the position of the bounding box of the sample object, where the attribute information of a key point may include the coordinate information of the key point and the existence discriminant value of the key point, but the embodiments of the present application are not limited thereto. The sample object corresponding to the sample data generally has the same type as the target object; for example, when the target object is a human body, the sample object is also a human body. In the embodiments of the present application, a piece of sample data may be selected from the sample set sequentially in the arrangement order of the sample data, or may be selected randomly from the sample set; the embodiments of the present application do not limit the way of selecting sample data.
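For readers who want to assemble sample pairs of this kind, one possible route (an assumption, since the present application does not prescribe a loader) is the public MS COCO person-keypoints annotations via the pycocotools package; note that COCO uses 17 key points with visibility flags in {0, 1, 2} rather than the 14-point, 0/1 convention above, so a small remapping is sketched here:

    from pycocotools.coco import COCO

    coco = COCO("annotations/person_keypoints_train2017.json")
    ann_ids = coco.getAnnIds(catIds=coco.getCatIds(catNms=["person"]))

    samples = []
    for ann in coco.loadAnns(ann_ids):
        if ann["num_keypoints"] == 0:
            continue
        kps = ann["keypoints"]  # flat list: x1, y1, v1, x2, y2, v2, ...
        # Collapse COCO's v in {0, 1, 2} to the 0/1 discriminant used above;
        # invalid key points are zeroed out as (0, 0, 0).
        triples = [
            (x, y, 1.0) if v > 0 else (0.0, 0.0, 0.0)
            for x, y, v in zip(kps[0::3], kps[1::3], kps[2::3])
        ]
        x0, y0, w, h = ann["bbox"]       # COCO stores (x, y, width, height)
        diag = (x0, y0, x0 + w, y0 + h)  # two diagonal vertices
        samples.append((triples, diag))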
S210. Calculate a coordinate mean according to the coordinate information of all valid key points of the piece of sample data.
For example, the coordinate mean (m_x, m_y) is calculated over the coordinate information in the attribute information of all key points whose existence discriminant value v is 1 in the sample data.
S220. Calculate the difference between the coordinate information of at least one valid key point in the piece of sample data and the above coordinate mean, and use the calculated difference as the coordinate information of the corresponding valid key point.
For example, for the coordinate information (x_i, y_i) of a valid key point in the sample data, (x_i - m_x, y_i - m_y) is calculated.
S230. Provide the attribute information of all key points of the piece of sample data as input to the neural network.
In an optional example, when the output of the neural network is the two-dimensional coordinates (bx_1, by_1) and (bx_2, by_2) of the two vertices on the diagonal of the rectangle, the coordinates of the bounding box may be determined as the sum of the above output coordinate information and the coordinate mean (that is, the given supervision), which may be expressed as (bx_1 + m_x, by_1 + m_y) and (bx_2 + m_x, by_2 + m_y).
Optionally, the embodiments of the present application may use a stochastic gradient descent algorithm for the computation to implement the training.
Optionally, whether to adjust the parameters of the neural network may be determined by comparing the result computed by the neural network with the bounding box position annotated in the sample data. If the difference between the result computed by the neural network and the annotated bounding box position is below a certain range, the training process may be terminated or new sample data may continue to be selected from the sample set; otherwise, the parameters of the neural network may be adjusted, and the computation continues with the adjusted neural network.
S240. Determine whether to continue to obtain a new piece of sample data from the sample set.
If it is necessary to continue to obtain a new piece of sample data from the sample set, return to S200; otherwise, go to S250.
In an optional example, whether to continue to obtain a new piece of sample data from the sample set may be determined according to factors such as whether all sample data in the sample set have been used for training, whether the result output by the neural network meets a predetermined accuracy requirement, or whether the number of read samples reaches a predetermined number.
S250. The current training process ends.
In an optional example, when it is determined through testing that the result output by the neural network meets the predetermined accuracy requirement, the neural network is trained successfully. If all sample data in the sample set have been used for training or the number of read samples has reached the predetermined number, but it is determined through testing that the result output by the neural network does not yet meet the predetermined accuracy requirement, then although the current training process has ended, the neural network has not been trained successfully and may be trained again. The above testing may be as follows: multiple pieces of sample data that have not been used for training are selected from the sample set and provided to the neural network according to the method shown in FIG. 1, and the error between at least one bounding box position obtained based on the neural network and the manually annotated bounding box position in the corresponding sample data is determined; when it is determined according to the at least one error that the accuracy meets the predetermined accuracy requirement, the neural network is trained successfully. In addition, in the process of training the neural network in the embodiments of the present application, an L2 loss function may be used for training supervision, but the embodiments of the present application are not limited thereto.
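Pulling S200 through S230 together, the following is a hedged sketch of a single optimization step, assuming PyTorch, the illustrative three-layer network bbox_net sketched earlier, a stochastic-gradient-descent optimizer, and an MSE loss standing in for the L2 training supervision; the names and the learning rate are assumptions:

    import torch
    import torch.nn as nn

    optimizer = torch.optim.SGD(bbox_net.parameters(), lr=1e-3)  # stochastic gradient descent
    loss_fn = nn.MSELoss()  # L2 loss used as the training supervision

    def train_step(inputs: torch.Tensor, target_boxes: torch.Tensor) -> float:
        """inputs: (B, 3N) zero-meaned attribute vectors; target_boxes: (B, 4)
        annotated diagonal vertices, expressed relative to each sample's
        coordinate mean (the supervision described in S230)."""
        optimizer.zero_grad()
        predicted = bbox_net(inputs)
        loss = loss_fn(predicted, target_boxes)
        loss.backward()
        optimizer.step()
        return loss.item()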
In the present application, the neural network is trained using the attribute information of the key points of sample objects and the bounding box positions, so that the trained neural network can directly determine the position of the bounding box of a target object based on the attribute information of the key points of the target object. Since in some practical applications the attribute information of the key points of the target object has already been successfully obtained, the embodiments of the present application can obtain the bounding box of the target object quickly, without requiring the image, by making full use of the attribute information of the key points of the target object that has already been obtained. Since the neural network in the embodiments of the present application is trained using the attribute information of the key points of sample objects together with the bounding box positions, when the number of key points of the sample objects is large and the number of neurons set is large, the neural network needs to learn more parameters, which helps the neural network determine the bounding box of the target object relatively accurately.
Any method provided in the embodiments of the present application may be performed by any suitable device with data processing capability, including but not limited to a terminal device and a server. Alternatively, any method provided in the embodiments of the present application may be performed by a processor; for example, the processor performs any method mentioned in the embodiments of the present application by invoking corresponding instructions stored in a memory. Details are not described below again.
Those of ordinary skill in the art can understand that all or some of the operations for implementing the above method embodiments can be completed by hardware related to program instructions. The foregoing program can be stored in a computer-readable storage medium; when executed, the program performs the operations of the above method embodiments. The foregoing storage medium includes at least one type of medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
FIG. 3 is a schematic structural diagram of an apparatus for determining a bounding box of a target object in some embodiments of the present application. The apparatus of this embodiment can be used to implement the above method embodiments of the present application.
As shown in FIG. 3, the apparatus of this embodiment includes: an obtaining module 300 and a determining module 310. Optionally, the apparatus may further include: a training module 320.
The obtaining module 300 is configured to obtain attribute information of each of a plurality of key points of a target object.
Optionally, for the content of the target object, the key points, the valid key points, and the attribute information of the key points, reference may be made to the relevant description of S100 in the above method embodiments; details are not repeated here.
In an optional example, the apparatus of the embodiments of the present application is applicable to application scenarios in which the attribute information of the key points of the target object has already been successfully obtained. That is, in application scenarios where the attribute information of the key points of the target object has already been obtained from an image, the obtaining module 300 can directly obtain the already existing attribute information of the key points of the target object by reading the information or the like.
The determining module 310 is configured to determine the position of the bounding box of the target object according to the attribute information of each of the plurality of key points of the target object obtained by the obtaining module 300 and a preset neural network.
Optionally, for the form of the neural network in the present application (for example, the number of layers, the number of neurons, and the activation functions), reference may be made to the relevant descriptions in the above method embodiments; details are not repeated here.
In an optional example, the determining module 310 may include: a first sub-module, a second sub-module, and a third sub-module. The first sub-module is configured to determine at least one valid key point from the plurality of key points according to the attribute information of each of the plurality of key points obtained by the obtaining module 300; the second sub-module is configured to process the attribute information of the plurality of key points according to the attribute information of each of the at least one valid key point determined by the first sub-module to obtain processed attribute information of the plurality of key points; and the third sub-module is configured to input the processed attribute information of the plurality of key points obtained by the second sub-module into the preset neural network for processing to obtain the position of the bounding box of the target object.
Optionally, the processed attribute information of the plurality of key points may include: the processed attribute information of each of the at least one valid key point and the attribute information of the key points other than the at least one valid key point among the plurality of key points.
Optionally, the second sub-module may include: a first unit and a second unit. The first unit is configured to determine reference coordinates according to the coordinate information included in the attribute information of each of the at least one valid key point determined by the first sub-module; for example, the first unit averages the coordinates corresponding to the coordinate information of each of the at least one valid key point to obtain the reference coordinates. The second unit is configured to determine the coordinate information in the processed attribute information of each valid key point according to the reference coordinates determined by the first unit and the coordinate information in the attribute information of each of the at least one valid key point; for example, the second unit takes the reference coordinates determined by the first unit as the origin and determines the processed coordinate information corresponding to the coordinate information of each of the at least one valid key point. In this case, the third sub-module may be configured to input the attribute information of the plurality of key points processed by the second unit into the neural network for processing to obtain output position information, and to determine the position of the bounding box of the target object according to the reference coordinates and the output position information.
In an optional example, when zero-mean processing needs to be performed on the two-dimensional coordinates of the key points of the target object, the first unit is configured to calculate a two-dimensional coordinate mean according to the coordinate information of all valid key points of the target object; the second unit is configured to calculate, for each valid key point of the target object, the difference between the coordinate information of the key point and the two-dimensional coordinate mean and use the difference as the coordinate information of that valid key point; and the third sub-module is configured to provide the coordinate information of all key points of the target object and the existence discriminant values of all key points to the neural network as input information.
When the determining module 310 performs zero-mean processing on the two-dimensional coordinates of the key points of the target object, the determining module 310 may use the sum of the bounding box coordinate information output by the neural network and the coordinate mean as the two-dimensional coordinate information of the bounding box of the target object.
The training module 320 is configured to train the neural network: it obtains a sample set including a plurality of pieces of sample data, where the sample data includes attribute information of a plurality of key points of a sample object and the sample data is annotated with the position of the bounding box of the sample object, and then trains the neural network according to the attribute information of the plurality of key points of the sample object in each piece of sample data and the position of the bounding box of the sample object.
In an optional example, the training module 320 obtains multiple pieces of sample data from the sample set; for each piece of sample data, it calculates a coordinate mean according to the coordinate information of all valid key points of the piece of sample data, calculates the difference between the coordinate information of at least one valid key point in the piece of sample data and the above coordinate mean, uses the calculated difference as the coordinate information of the corresponding valid key point, and then provides the attribute information of all key points of the piece of sample data as input to the neural network. For an example of the operations performed by the training module 320 to train the neural network, reference may be made to the description in the above method embodiments; details are not repeated here.
The embodiments of the present application further provide an electronic device, which may be, for example, a mobile terminal, a personal computer (PC), a tablet computer, or a server. Referring now to FIG. 4, a schematic structural diagram of an electronic device 400 suitable for implementing a terminal device or server of the embodiments of the present application is shown. As shown in FIG. 4, the electronic device 400 includes one or more processors, a communication part, and the like. The one or more processors are, for example, one or more central processing units (CPUs) 401 and/or one or more acceleration units 413; the acceleration unit 413 may include, but is not limited to, a GPU, an FPGA, and other types of dedicated processors. The processor can perform various appropriate actions and processing according to executable instructions stored in a read-only memory (ROM) 402 or executable instructions loaded from a storage section 408 into a random access memory (RAM) 403. The communication part 412 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (InfiniBand) network card. The processor can communicate with the read-only memory 402 and/or the random access memory 403 to execute the executable instructions, is connected to the communication part 412 through a bus 404, and communicates with other target devices via the communication part 412, thereby completing the operations corresponding to any method provided in the embodiments of the present application, for example: obtaining attribute information of each of a plurality of key points of a target object; and determining the position of the bounding box of the target object according to the attribute information of each of the plurality of key points of the target object and a preset neural network.
In addition, the RAM 403 can also store various programs and data required for the operation of the apparatus. The CPU 401, the ROM 402, and the RAM 403 are connected to one another through the bus 404. When the RAM 403 is present, the ROM 402 is an optional module. The RAM 403 stores executable instructions, or writes executable instructions into the ROM 402 at runtime, and the executable instructions cause the processor to perform the operations corresponding to the above method. An input/output (I/O) interface 405 is also connected to the bus 404. The communication part 412 may be integrated, or may be configured with multiple sub-modules (for example, multiple IB network cards) on the bus link.
The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output section 407 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a LAN card or a modem. The communication section 409 performs communication processing via a network such as the Internet. A drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411, such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory, is mounted on the drive 410 as needed so that a computer program read therefrom is installed into the storage section 408 as needed.
It should be noted that the architecture shown in FIG. 4 is only an optional implementation. In specific practice, the number and types of the components in FIG. 4 can be selected, deleted, added, or replaced according to actual needs. In the arrangement of different functional components, implementations such as separate arrangement or integrated arrangement can also be adopted; for example, the acceleration unit 413 and the CPU 401 can be arranged separately, or the acceleration unit 413 can be integrated on the CPU 401, and the communication part can be arranged separately or integrated on the CPU 401 or the acceleration unit 413, and so on. These alternative implementations all fall within the protection scope of the present disclosure.
In particular, according to the embodiments disclosed in the present application, the process described above with reference to the flowchart can be implemented as a computer software program. For example, the embodiments disclosed in the present application include a computer program product, which includes a computer program tangibly contained on a computer-readable medium; the computer program includes program code for performing the method shown in the flowchart, and the program code may include instructions corresponding to the method steps provided in the embodiments of the present application, for example: obtaining attribute information of each of a plurality of key points of a target object; and determining the position of the bounding box of the target object according to the attribute information of each of the plurality of key points of the target object and a preset neural network. In such embodiments, the computer program can be downloaded and installed from the network via the communication section 409 and/or installed from the removable medium 411. When the instructions in the computer program are executed by the central processing unit (CPU) 401, the above functions defined in the method of the present application are performed.
The methods, apparatuses, and devices of the present application may be implemented in many ways. For example, the methods, apparatuses, and devices of the present application can be implemented in software, hardware, firmware, or any combination of software, hardware, and firmware. The above order of the steps of the method is for illustration only, and the steps of the method of the present application are not limited to the order specifically described above unless otherwise specifically stated. In addition, in some embodiments, the present application can also be embodied as programs recorded in a recording medium, the programs including computer-readable instructions for implementing the method according to the present application. Thus, the present application also covers a recording medium storing a program for executing the method of the present application, for example, the computer-readable storage medium 500 shown in FIG. 5.
The methods and apparatuses, electronic devices, and computer-readable storage media of the present application may likewise be implemented in many ways, for example, in software, hardware, firmware, or any combination of software, hardware, and firmware. The above order of the steps of the method is for illustration only, and the steps of the method of the present application are not limited to the order specifically described above unless otherwise specifically stated. In addition, in some embodiments, the present application can also be embodied as programs recorded in a recording medium, the programs including computer-readable instructions for implementing the method according to the present application, and the present application thus also covers a recording medium storing a program for executing the method according to the present application.
The description of the present application is given for the sake of example and explanation; it is not exhaustive and does not limit the present application to the disclosed forms. Many modifications and variations are obvious to those of ordinary skill in the art. The embodiments are selected and described in order to better illustrate the principles and practical applications of the present application, and to enable those of ordinary skill in the art to understand the present application so as to design various embodiments with various modifications suited to particular uses.

Claims (30)

  1. A method for determining a bounding box of a target object, comprising:
    obtaining attribute information of each of a plurality of key points of a target object;
    determining the position of the bounding box of the target object according to the attribute information of each of the plurality of key points of the target object and a preset neural network.
  2. The method according to claim 1, wherein the target object comprises: a human body.
  3. The method according to claim 1 or 2, wherein the attribute information of the key points comprises: coordinate information and an existence discriminant value.
  4. The method according to claim 3, wherein determining the position of the bounding box of the target object according to the attribute information of each of the plurality of key points of the target object and the preset neural network comprises:
    determining at least one valid key point from the plurality of key points according to the attribute information of each of the plurality of key points;
    processing the attribute information of the plurality of key points according to the attribute information of each of the at least one valid key point to obtain processed attribute information of the plurality of key points;
    inputting the processed attribute information of the plurality of key points into the preset neural network for processing to obtain the position of the bounding box of the target object.
  5. The method according to claim 4, wherein the processed attribute information of the plurality of key points comprises: the processed attribute information of each of the at least one valid key point and the attribute information of the key points other than the at least one valid key point among the plurality of key points.
  6. The method according to claim 4 or 5, wherein processing the attribute information of the plurality of key points according to the attribute information of each of the at least one valid key point to obtain the processed attribute information of the plurality of key points comprises:
    determining reference coordinates according to the coordinate information included in the attribute information of each of the at least one valid key point;
    determining the coordinate information in the processed attribute information of each valid key point according to the reference coordinates and the coordinate information in the attribute information of each of the at least one valid key point.
  7. The method according to claim 6, wherein determining the reference coordinates according to the coordinate information included in the attribute information of each of the at least one valid key point comprises:
    averaging the coordinates corresponding to the coordinate information of each of the at least one valid key point to obtain the reference coordinates; and/or
    determining the coordinate information in the processed attribute information of each valid key point according to the reference coordinates and the coordinate information in the attribute information of each of the at least one valid key point comprises:
    taking the reference coordinates as the origin, determining the processed coordinate information corresponding to the coordinate information of each of the at least one valid key point.
  8. The method according to claim 6 or 7, wherein inputting the processed attribute information of the plurality of key points into the preset neural network for processing to obtain the position of the bounding box of the target object comprises:
    inputting the processed attribute information of the plurality of key points into the preset neural network for processing to obtain output position information;
    determining the position of the bounding box of the target object according to the reference coordinates and the output position information.
  9. The method according to any one of claims 1 to 8, wherein the method further comprises:
    obtaining a sample set including a plurality of pieces of sample data, wherein the sample data includes: attribute information of a plurality of key points of a sample object, and the sample data is annotated with the position of the bounding box of the sample object;
    training the neural network according to the attribute information of the plurality of key points of the sample object in each piece of the sample data and the position of the bounding box of the sample object.
  10. The method according to any one of claims 1 to 9, wherein the neural network is trained based on a stochastic gradient descent algorithm.
  11. The method according to any one of claims 1 to 10, wherein the position of the bounding box of the target object comprises: coordinate information of two vertices in the diagonal direction of the bounding box of the target object.
  12. The method according to any one of claims 1 to 11, wherein the neural network comprises: at least two fully connected layers.
  13. The method according to any one of claims 1 to 12, wherein the neural network comprises: three fully connected layers, wherein the activation function of at least one of the first and second fully connected layers of the three fully connected layers comprises: a Rectified Linear Unit (ReLU) activation function.
  14. The method according to claim 13, wherein the first fully connected layer comprises 320 neurons, the second fully connected layer comprises 320 neurons, and the last of the three fully connected layers comprises 4 neurons.
  15. An apparatus for determining a bounding box of a target object, comprising:
    an obtaining module configured to obtain attribute information of each of a plurality of key points of a target object;
    a determining module configured to determine the position of the bounding box of the target object according to the attribute information of each of the plurality of key points of the target object obtained by the obtaining module and a preset neural network.
  16. The apparatus according to claim 15, wherein the target object comprises: a human body.
  17. The apparatus according to claim 15 or 16, wherein the attribute information of the key points comprises: coordinate information and an existence discriminant value.
  18. The apparatus according to claim 17, wherein the determining module comprises:
    a first sub-module configured to determine at least one valid key point from the plurality of key points according to the attribute information of each of the plurality of key points obtained by the obtaining module;
    a second sub-module configured to process the attribute information of the plurality of key points according to the attribute information of each of the at least one valid key point determined by the first sub-module to obtain processed attribute information of the plurality of key points;
    a third sub-module configured to input the processed attribute information of the plurality of key points obtained by the second sub-module into the preset neural network for processing to obtain the position of the bounding box of the target object.
  19. The apparatus according to claim 18, wherein the processed attribute information of the plurality of key points comprises: the processed attribute information of each of the at least one valid key point and the attribute information of the key points other than the at least one valid key point among the plurality of key points.
  20. The apparatus according to claim 18 or 19, wherein the second sub-module comprises:
    a first unit configured to determine reference coordinates according to the coordinate information included in the attribute information of each of the at least one valid key point determined by the first sub-module;
    a second unit configured to determine the coordinate information in the processed attribute information of each valid key point according to the reference coordinates determined by the first unit and the coordinate information in the attribute information of each of the at least one valid key point.
  21. The apparatus according to claim 20, wherein the first unit is configured to:
    average the coordinates corresponding to the coordinate information of each of the at least one valid key point determined by the first sub-module to obtain the reference coordinates; and/or
    the second unit is configured to:
    take the reference coordinates determined by the first unit as the origin and determine the processed coordinate information corresponding to the coordinate information of each of the at least one valid key point.
  22. The apparatus according to claim 20 or 21, wherein the third sub-module is configured to:
    input the processed attribute information of the plurality of key points obtained by the second sub-module into the preset neural network for processing to obtain output position information;
    determine the position of the bounding box of the target object according to the reference coordinates and the output position information.
  23. The apparatus according to any one of claims 15 to 22, wherein the apparatus further comprises a training module configured to:
    obtain a sample set including a plurality of pieces of sample data, wherein the sample data includes: attribute information of a plurality of key points of a sample object, and the sample data is annotated with the position of the bounding box of the sample object;
    train the neural network according to the attribute information of the plurality of key points of the sample object in each piece of the sample data and the position of the bounding box of the sample object.
  24. The apparatus according to any one of claims 15 to 23, wherein the neural network is trained based on a stochastic gradient descent algorithm.
  25. The apparatus according to any one of claims 15 to 24, wherein the position of the bounding box of the target object comprises: coordinate information of two vertices in the diagonal direction of the bounding box of the target object.
  26. The apparatus according to any one of claims 15 to 25, wherein the neural network comprises: at least two fully connected layers.
  27. The apparatus according to any one of claims 15 to 26, wherein the neural network comprises: three fully connected layers, wherein the activation function of at least one of the first and second fully connected layers of the three fully connected layers comprises: a Rectified Linear Unit (ReLU) activation function.
  28. The apparatus according to claim 27, wherein the first fully connected layer comprises 320 neurons, the second fully connected layer comprises 320 neurons, and the last of the three fully connected layers comprises 4 neurons.
  29. An electronic device, comprising: a processor and a computer-readable storage medium, wherein the computer-readable storage medium is configured to store instructions, and execution of the instructions by the processor causes the electronic device to perform the method according to any one of claims 1 to 14.
  30. A computer-readable storage medium having instructions stored thereon, wherein the method according to any one of claims 1 to 14 is performed when the instructions are executed by a processor.
PCT/CN2018/111464 2017-11-21 2018-10-23 Method, apparatus, medium, and device for determining a bounding box of a target object WO2019100886A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2019572712A JP6872044B2 (ja) 2017-11-21 2018-10-23 Method, apparatus, medium, and device for determining a bounding box of an object
SG11201913529UA SG11201913529UA (en) 2017-11-21 2018-10-23 Methods and apparatuses for determining bounding box of target object, media, and devices
US16/731,858 US11348275B2 (en) 2017-11-21 2019-12-31 Methods and apparatuses for determining bounding box of target object, media, and devices

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711165979.8 2017-11-21
CN201711165979.8A CN108229305B (zh) 2017-11-21 2017-11-21 Method, apparatus, and electronic device for determining a bounding box of a target object

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/731,858 Continuation US11348275B2 (en) 2017-11-21 2019-12-31 Methods and apparatuses for determining bounding box of target object, media, and devices

Publications (1)

Publication Number Publication Date
WO2019100886A1 true WO2019100886A1 (zh) 2019-05-31

Family

ID=62652771

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/111464 2017-11-21 2018-10-23 Method, apparatus, medium, and device for determining a bounding box of a target object WO2019100886A1 (zh)

Country Status (5)

Country Link
US (1) US11348275B2 (zh)
JP (1) JP6872044B2 (zh)
CN (1) CN108229305B (zh)
SG (1) SG11201913529UA (zh)
WO (1) WO2019100886A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7419964B2 (ja) 2019-06-21 2024-01-23 Fujitsu Limited Human body motion recognition apparatus and method, and electronic device

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018033137A1 (zh) * 2016-08-19 2018-02-22 Beijing SenseTime Technology Development Co., Ltd. Method, apparatus, and electronic device for displaying a business object in a video image
CN108229305B (zh) * 2017-11-21 2021-06-04 Beijing SenseTime Technology Development Co., Ltd. Method, apparatus, and electronic device for determining a bounding box of a target object
CN110826357B (zh) * 2018-08-07 2022-07-26 Beijing SenseTime Technology Development Co., Ltd. Method, apparatus, medium, and device for three-dimensional object detection and intelligent driving control
CN111241887B (zh) * 2018-11-29 2024-04-16 Beijing SenseTime Technology Development Co., Ltd. Target object key point recognition method and apparatus, electronic device, and storage medium
CN110782404B (zh) * 2019-10-11 2022-06-10 Beijing Dajia Internet Information Technology Co., Ltd. Image processing method, apparatus, and storage medium
CN110929792B (zh) * 2019-11-27 2024-05-24 Shenzhen SenseTime Technology Co., Ltd. Image annotation method and apparatus, electronic device, and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120020519A1 (en) * 2010-07-21 2012-01-26 Canon Kabushiki Kaisha Image processing apparatus, image processing method, and storage medium
CN107194361A (zh) * 2017-05-27 2017-09-22 Chengdu Tongjia Youbo Technology Co., Ltd. Two-dimensional pose detection method and apparatus
CN107220604A (zh) * 2017-05-18 2017-09-29 Graduate School at Shenzhen, Tsinghua University Video-based fall detection method
CN108229305A (zh) * 2017-11-21 2018-06-29 Beijing SenseTime Technology Development Co., Ltd. Method, apparatus, and electronic device for determining a bounding box of a target object

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9342888B2 (en) * 2014-02-08 2016-05-17 Honda Motor Co., Ltd. System and method for mapping, localization and pose correction of a vehicle based on images
IL231862A (en) * 2014-04-01 2015-04-30 Superfish Ltd Image representation using a neural network
JP2016006626A (ja) 2014-05-28 2016-01-14 Denso IT Laboratory, Inc. Detection device, detection program, detection method, vehicle, parameter calculation device, parameter calculation program, and parameter calculation method
WO2016004330A1 (en) * 2014-07-03 2016-01-07 Oim Squared Inc. Interactive content generation
CN104573715B 2014-12-30 2017-07-25 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for recognizing the subject region of an image
WO2016179808A1 (en) 2015-05-13 2016-11-17 Xiaoou Tang An apparatus and a method for face parts and face detection
US9767381B2 (en) 2015-09-22 2017-09-19 Xerox Corporation Similarity-based detection of prominent objects using deep CNN pooling layers as features
WO2017095948A1 (en) 2015-11-30 2017-06-08 Pilot Ai Labs, Inc. Improved general object detection using neural networks
KR102592076B1 (ko) 2015-12-14 2023-10-19 Samsung Electronics Co., Ltd. Deep-learning-based image processing apparatus and method, and learning apparatus
CN107194338A (zh) 2017-05-14 2017-09-22 Beijing University of Technology Pedestrian detection method for traffic environments based on a human-body tree graph model
US11080551B2 (en) * 2017-05-22 2021-08-03 Intel Corporation Proposal region filter for digital image processing
US10438371B2 (en) * 2017-09-22 2019-10-08 Zoox, Inc. Three-dimensional bounding box from two-dimensional image and point cloud data

Also Published As

Publication number Publication date
JP6872044B2 (ja) 2021-05-19
US11348275B2 (en) 2022-05-31
SG11201913529UA (en) 2020-01-30
JP2020525959A (ja) 2020-08-27
CN108229305B (zh) 2021-06-04
CN108229305A (zh) 2018-06-29
US20200134859A1 (en) 2020-04-30

Similar Documents

Publication Publication Date Title
WO2019100886A1 (zh) Method, apparatus, medium, and device for determining a bounding box of a target object
CN108427927B (zh) Target re-identification method and apparatus, electronic device, program, and storage medium
US11120254B2 (en) Methods and apparatuses for determining hand three-dimensional data
US10210418B2 (en) Object detection system and object detection method
US20190108447A1 (en) Multifunction perceptrons in machine learning environments
WO2019128932A1 (zh) Face pose analysis method, apparatus, device, storage medium, and program
US10157309B2 (en) Online detection and classification of dynamic gestures with recurrent convolutional neural networks
WO2019105337A1 (zh) Video-based face recognition method, apparatus, device, medium, and program
US10572072B2 (en) Depth-based touch detection
CN108229353B (zh) Human body image classification method and apparatus, electronic device, storage medium, and program
WO2018054329A1 (zh) Object detection method and apparatus, electronic device, computer program, and storage medium
US20150253864A1 (en) Image Processor Comprising Gesture Recognition System with Finger Detection and Tracking Functionality
CN108229301B (zh) Eyelid line detection method, apparatus, and electronic device
US20160026857A1 (en) Image processor comprising gesture recognition system with static hand pose recognition based on dynamic warping
US11954862B2 (en) Joint estimation of heart rate and respiratory rate using neural networks
US11604963B2 (en) Feedback adversarial learning
CN110659570A (zh) Target object pose tracking method, and neural network training method and apparatus
WO2023083030A1 (zh) Pose recognition method and related device
CN114005149A (zh) Training method and apparatus for a target angle detection model
US10867441B2 (en) Method and apparatus for prefetching data items to a cache
López-Rubio et al. Robust fitting of ellipsoids by separating interior and exterior points during optimization
Guo et al. A hybrid framework based on warped hierarchical tree for pose estimation of texture-less objects
KR102215811B1 (ko) Pyramid-image-based image analysis method for improving object recognition speed using particle images and ICP matching, and image analysis apparatus therefor
JP7364077B2 (ja) Image processing apparatus, image processing method, and program
US20220215564A1 (en) Three-dimensional scan registration with deformable models

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18881654

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2019572712

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 08/09/2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18881654

Country of ref document: EP

Kind code of ref document: A1