WO2019100886A1 - Method, apparatus, medium and device for determining a bounding box of a target object - Google Patents
Method, apparatus, medium and device for determining a bounding box of a target object
- Publication number
- WO2019100886A1, PCT/CN2018/111464, CN2018111464W
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- attribute information
- key points
- target object
- key point
- information
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20076—Probabilistic image processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2210/00—Indexing scheme for image generation or computer graphics
- G06T2210/12—Bounding box
Definitions
- the present application relates to computer vision technology, and more particularly to a method, apparatus, electronic device and computer readable storage medium for determining a bounding box of a target object.
- In a Faster-RCNN-based approach, a Region Proposal Network (RPN) first generates candidate regions, and the RCNN then scores and refines each candidate region to determine the bounding box of the human body.
- However, the accuracy of determining the bounding box of the human body in this way still needs to be further improved.
- Embodiments of the present application provide a technical solution for determining a bounding box of a target object.
- a method for determining a bounding box of a target object includes: acquiring attribute information of each of a plurality of key points of the target object; and determining the position of the bounding box of the target object according to the attribute information of each of the plurality of key points and a preset neural network.
- the target object includes: a human body.
- the attribute information of the key point includes: coordinate information and a presence discriminant value.
- determining the position of the bounding box of the target object according to the attribute information of each key point of the target object and the preset neural network includes: determining at least one valid key point from the plurality of key points according to the attribute information of each of the plurality of key points; processing the attribute information of the plurality of key points according to the attribute information of each of the at least one valid key point, to obtain processed attribute information of the plurality of key points; and inputting the processed attribute information of the plurality of key points to the preset neural network for processing, to obtain the position of the bounding box of the target object.
- the processed attribute information of the plurality of key points includes: the processed attribute information of each of the at least one valid key point, and the attribute information of the key points other than the at least one valid key point among the plurality of key points.
- processing the attribute information of the plurality of key points according to the attribute information of each of the at least one valid key point includes: determining a reference coordinate according to the coordinate information included in the attribute information of each of the at least one valid key point; and determining the coordinate information in the processed attribute information of each valid key point according to the reference coordinate and the coordinate information in the attribute information of that valid key point.
- determining the reference coordinate according to the coordinate information included in the attribute information of each of the at least one valid key point includes: averaging the coordinates corresponding to the coordinate information of each of the at least one valid key point to obtain the reference coordinate; and/or, determining the coordinate information in the processed attribute information of each valid key point according to the reference coordinate and the coordinate information in the attribute information of that valid key point includes: taking the reference coordinate as the origin, determining the processed coordinate information corresponding to the coordinate information of each of the at least one valid key point.
- inputting the processed attribute information of the plurality of key points to the preset neural network for processing, to obtain the position of the bounding box of the target object, includes: inputting the processed attribute information of the plurality of key points to the preset neural network for processing, to obtain output position information; and determining the position of the bounding box of the target object according to the reference coordinate and the output position information.
- the method further includes: acquiring a sample set including a plurality of pieces of sample data, where each piece of sample data includes attribute information of a plurality of key points of a sample object and is labeled with the position of the bounding box of the sample object; and training the neural network according to the attribute information of the plurality of key points of the sample object in each piece of sample data and the labeled bounding-box position of the sample object.
- the neural network is trained based on a stochastic gradient descent algorithm.
- the position of the bounding box of the target object includes: coordinate information of two vertices in a diagonal direction of the bounding box of the target object.
- the neural network includes: at least two fully connected layers.
- the neural network includes three fully connected layers, where the activation function of at least one of the first and second fully connected layers includes a rectified linear unit (ReLU) activation function; the first fully connected layer includes 320 neurons, the second fully connected layer includes 320 neurons, and the last fully connected layer includes 4 neurons.
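The three-layer fully connected architecture described above can be sketched as a plain forward pass. This is a minimal illustration, not the patented implementation: the weights are random placeholders, and the input dimension 3 × 14 = 42 assumes the 14 human-body key points described later in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes from the text: 320 -> 320 -> 4, with ReLU after the first
# two fully connected layers. The 42-dimensional input (3 values per key
# point, 14 key points) is an assumption for this sketch.
sizes = [42, 320, 320, 4]
weights = [rng.standard_normal((m, n)) * 0.01 for m, n in zip(sizes, sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def relu(x):
    return np.maximum(x, 0.0)

def forward(x):
    # ReLU on the first two fully connected layers, linear output on the last.
    h = relu(x @ weights[0] + biases[0])
    h = relu(h @ weights[1] + biases[1])
    return h @ weights[2] + biases[2]

out = forward(rng.standard_normal(42))
print(out.shape)  # (4,) -- e.g. two diagonal vertices (x1, y1, x2, y2)
```

The 4-neuron output layer matches the claim that the bounding-box position consists of the coordinates of two diagonal vertices.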
- an apparatus for determining a bounding box of a target object includes: an acquiring module configured to acquire attribute information of each of a plurality of key points of the target object; and a determining module configured to determine the position of the bounding box of the target object according to the attribute information of each of the plurality of key points acquired by the acquiring module and a preset neural network.
- the target object includes: a human body.
- the attribute information of the key point includes: coordinate information and a presence discriminant value.
- the determining module includes: a first submodule configured to determine at least one valid key point from the plurality of key points according to the attribute information of each of the plurality of key points acquired by the acquiring module; a second submodule configured to process the attribute information of the plurality of key points according to the attribute information of each valid key point determined by the first submodule, to obtain processed attribute information of the plurality of key points; and a third submodule configured to input the processed attribute information of the plurality of key points obtained by the second submodule to the preset neural network for processing, to obtain the position of the bounding box of the target object.
- the processed attribute information of the plurality of key points includes: the processed attribute information of each of the at least one valid key point, and the attribute information of the key points other than the at least one valid key point among the plurality of key points.
- the second submodule includes: a first unit configured to determine a reference coordinate according to the coordinate information included in the attribute information of each of the at least one valid key point determined by the first submodule; and a second unit configured to determine the coordinate information in the processed attribute information of each valid key point according to the reference coordinate determined by the first unit and the coordinate information in the attribute information of that valid key point.
- the first unit is configured to: average the coordinates corresponding to the coordinate information of each of the at least one valid key point determined by the first submodule, to obtain the reference coordinate; and/or the second unit is configured to: taking the reference coordinate determined by the first unit as the origin, determine the processed coordinate information corresponding to the coordinate information of each of the at least one valid key point.
- the third submodule is configured to: input the processed attribute information of the plurality of key points obtained by the second submodule to the preset neural network for processing, to obtain output position information; and determine the position of the bounding box of the target object according to the reference coordinate and the output position information.
- the apparatus further includes: a training module configured to acquire a sample set including a plurality of pieces of sample data, where each piece of sample data includes attribute information of a plurality of key points of a sample object and is labeled with the position of the bounding box of the sample object, and to train the neural network according to the attribute information of the plurality of key points of the sample object in each piece of sample data and the labeled bounding-box position of the sample object.
- the neural network is trained based on a stochastic gradient descent algorithm.
- the position of the bounding box of the target object includes: coordinate information of two vertices in a diagonal direction of the bounding box of the target object.
- the neural network comprises: at least two fully connected layers.
- the neural network includes three fully connected layers, where the activation function of at least one of the first and second fully connected layers includes a rectified linear unit (ReLU) activation function; the first fully connected layer includes 320 neurons, the second fully connected layer includes 320 neurons, and the last fully connected layer includes 4 neurons.
- an electronic device comprising: a processor and a computer readable storage medium for storing instructions, where execution of the instructions by the processor causes the electronic device to perform the method of any of the embodiments described above.
- a computer program product comprising at least one instruction; when the at least one instruction is executed by a processor, the method of any of the embodiments is performed.
- in one alternative embodiment, the computer program product is a computer storage medium; in another alternative embodiment, the computer program product is a software product, such as an SDK.
- with the method and apparatus, electronic device, and computer program product for determining a bounding box of a target object provided by the above embodiments of the present application, the position of the bounding box of the target object is determined by using the attribute information of each of a plurality of key points of the target object together with a neural network, which is beneficial to improving both the efficiency and the accuracy of determining the bounding box of the target object.
- FIG. 1 is a flow chart of a method for determining a bounding box of a target object in some embodiments of the present application.
- FIG. 2 is a flow chart of a method for training a neural network in some embodiments of the present application.
- FIG. 3 is a schematic structural diagram of an apparatus for determining a bounding box of a target object in some embodiments of the present application.
- FIG. 4 is a schematic structural diagram of an electronic device in some embodiments of the present application.
- FIG. 5 is a schematic diagram of a computer storage medium in some embodiments of the present application.
- Embodiments of the present application can be applied to electronic devices such as terminal devices, computer systems, servers, etc., which can operate with numerous other general purpose or special purpose computing system environments or configurations.
- Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above.
- Electronic devices such as terminal devices, computer systems, servers, etc., can be described in the general context of computer system executable instructions (such as program modules) being executed by a computer system.
- program modules may include routines, programs, object programs, components, logic, data structures, and the like that perform particular tasks or implement particular abstract data types.
- the computer system/server can be implemented in a distributed cloud computing environment where tasks are performed by remote processing devices that are linked through a communication network.
- program modules may be located on a local or remote computing system storage medium including storage devices.
- FIG. 1 is a flow chart of a method for determining a bounding box of a target object in some embodiments of the present application. As shown in FIG. 1, the method for determining a bounding box of a target object of the present application includes: S100 and S110. The respective operations in Fig. 1 will be described below.
- the target object in the embodiments of the present application may also be referred to as a detection object, an object of bounding-box detection, or the like.
- the target object may be a human body, or may be a human face or a specific object.
- the embodiment of the present application does not limit the representation form of the target object.
- the bounding box in the embodiments of the present application generally refers to a polygon (usually a rectangle) that indicates the area in which the target object is located; the bounding box should generally not only accurately cover all parts of the target object but also have an area that is as small as possible.
- the attribute information of the key points in the embodiment of the present application may include various information of key points.
- the attribute information of the key points may be used to describe whether each key point of the target object is visible in the image, and the position in the image of each key point that is visible. In the embodiments of the present application, a key point that is visible in the image (i.e., a key point present in the image) may be referred to as a valid key point, and a key point that is not visible in the image (i.e., a key point not present in the image) may be referred to as an invalid key point.
- a key point that is not visible in the image may be a key point that is occluded, or a key point that is located outside the image; this is not limited in the embodiments of the present application.
- the attribute information of a key point may include: the coordinate information of the key point and the presence discriminant value of the key point. The coordinate information of the key point may be used to indicate the position of the key point in the image; for example, it may be the two-dimensional coordinates of the key point, but the embodiments of the present application are not limited thereto. The presence discriminant value of the key point may be used to indicate whether the key point is visible in the image.
- for example, if the presence discriminant value of a key point is 1, the key point is visible, and if the presence discriminant value is 0, the key point is not visible; the discriminant value may also be implemented in other manners, which is not limited in these embodiments.
- the attribute information may further include other information, and the embodiment of the present application is not limited thereto.
- the attribute information of the key points acquired in the embodiments of the present application may be a 3×N-dimensional vector, where N represents the number of key points set in advance for the target object.
- the attribute information of a key point in the embodiments of the present application may be represented by an array (x, y, v), where x and y are the two-dimensional coordinates of the key point in the image, and v is the presence discriminant value of the key point: when v takes the first discriminant value, the key point is a visible key point in the image, and when v takes the second discriminant value, the key point is an invisible key point in the image.
- for example, if a key point is a valid key point, its attribute information can be represented as the array (x, y, 1); if a key point is an invalid key point (e.g., occluded or located outside the image), its attribute information can be represented as the array (0, 0, 0).
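As a concrete illustration of this (x, y, v) representation, the sketch below holds key-point attributes, selects the valid key points by their discriminant value, and flattens them into the 3×N input vector mentioned above. The coordinate values are invented for the example, not taken from the patent.

```python
# Hypothetical key-point attribute arrays (x, y, v): v == 1 means the key
# point is visible (valid); v == 0 means occluded or outside the image,
# in which case the attribute information is represented as (0, 0, 0).
keypoints = [
    (310.0, 120.0, 1),   # visible key point
    (0.0, 0.0, 0),       # invalid key point
    (295.0, 180.0, 1),
]

# Select the valid key points, as described in the text.
valid = [kp for kp in keypoints if kp[2] == 1]
print(len(valid))  # 2

# Flatten into the 3*N-dimensional input vector (N = 3 key points here).
vector = [value for kp in keypoints for value in kp]
print(len(vector))  # 9
```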
- the key points of the human body in the embodiments of the present application may generally include 14 key points: the top of the head, neck, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left hip, right hip, left knee, right knee, left ankle, and right ankle; these 14 key points can describe the posture of the human body fairly completely.
- the attribute information of the plurality of key points may include attribute information of some or all of the 14 key points.
- the attribute information of the plurality of key points acquired in the embodiments of the present application may include: the coordinate information and presence discriminant value of the top of the head, the coordinate information and presence discriminant value of the neck, the coordinate information and presence discriminant value of the left shoulder, and so on for the remaining key points.
- with the attribute information of these 14 key points, the human body profile in the image can be described.
- when the target object changes, its key points usually change accordingly; the embodiments of the present application do not limit the expression of the key points of the target object.
- the embodiments of the present application may be applied to an application scenario in which the attribute information of a plurality of key points of the target object has already been obtained, that is, the attribute information of the plurality of key points has been obtained from an image or by other means.
- the embodiment of the present application can obtain the attribute information of the key point of the target object by means of information reading, etc., but the embodiment of the present application is not limited thereto.
- the embodiment of the present application may obtain the location of the bounding box of the target object by using the pre-trained neural network according to the attribute information of the plurality of key points of the target object.
- the S100 may be executed by a processor invoking a corresponding instruction stored in a memory, or may be performed by an acquisition module 300 executed by the processor.
- S110 Determine a location of a bounding box of the target object according to attribute information of each of the plurality of key points of the target object and a preset neural network.
- the bounding box location of the target object can be used to determine the bounding box of the target object.
- the bounding-box position may include the location information of one or more vertices of the bounding box.
- for example, the bounding-box position may include the position information of two opposite (diagonal) vertices of the bounding box, such as the two-dimensional coordinates of each of the two opposite vertices; the embodiments of the present application do not limit the implementation of the bounding-box position of the target object.
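As a small, self-contained illustration of a bounding box represented by two opposite vertices (the coordinates below are invented for the example, not taken from the patent):

```python
# A bounding box given by the two-dimensional coordinates of two
# opposite (diagonal) vertices, as described above.
x1, y1 = 90.0, 30.0
x2, y2 = 150.0, 110.0

# The two diagonal vertices fully determine the rectangle.
width, height = abs(x2 - x1), abs(y2 - y1)
area = width * height
print(width, height, area)  # 60.0 80.0 4800.0
```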
- the neural network in the embodiments of the present application may be a dedicated neural network.
- the neural network may be trained by using a large amount of sample data, where the sample data may include attribute information of a plurality of key points of the sample object and the position of its bounding box; that is, the sample data may be labeled with the position of the bounding box of the sample object.
- An optional example of the training process can be found in the description of Figure 2 below, and therefore will not be described in detail herein.
- the neural network in the embodiment of the present application may include: at least two layers of fully connected layers. Compared to convolutional neural networks, fully connected networks can have faster computational speeds and processing efficiencies.
- in one example, the neural network in the embodiment of the present application includes two fully connected layers, and the activation function of the first fully connected layer may be a ReLU (Rectified Linear Unit) activation function.
- in another example, the neural network includes three fully connected layers, where the activation function of the first fully connected layer may be a ReLU activation function, and the activation function of the second fully connected layer may also be a ReLU activation function.
- the number of fully connected layers included in the neural network and the number of neurons in each fully connected layer may be set according to actual conditions. When the number of layers and the number of neurons are sufficient, the neural network has a strong function-approximation capability, so that the bounding-box position obtained based on the neural network is more accurate.
- for example, the number of neurons in the first fully connected layer may be 320, the number of neurons in the second fully connected layer may also be 320, and the number of neurons in the third fully connected layer may be set to 4.
- a neural network with three fully connected layers has been verified through multiple experiments, where the first and second fully connected layers use the ReLU activation function and each contain 320 neurons, and the third layer contains 4 neurons; its computation speed can meet practical needs, and the accuracy of the determined bounding-box position can also meet practical needs.
- the S110 may be executed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a determination module 310 executed by the processor.
- the attribute information of the plurality of key points may be directly input to the neural network, or may be input to the neural network after processing the attribute information of the plurality of key points. That is to say, the input information of the neural network may be determined according to the attribute information of the plurality of key points, wherein the input information may be the attribute information of the plurality of key points itself, or the attribute of the plurality of key points The information is processed.
- the neural network can process the input information to obtain an output result, wherein the location of the bounding box of the target object can be obtained according to the output result of the neural network.
- the output result of the neural network may include location information of the bounding box of the target object, for example, coordinate information of one or more vertices of the bounding box of the target object.
- the position of the bounding box of the target object may also be obtained by further processing the output of the neural network.
- the present application may select valid key points according to the attribute information of each of the plurality of key points of the target object. For example, if the attribute information of a key point includes the presence discriminant value, a key point whose presence discriminant value indicates existence may be determined as a valid key point; for instance, if the presence discriminant value of a key point is 1, the key point may be determined as a valid key point, but the embodiments of the present application are not limited thereto.
- the attribute information of some or all of the plurality of key points may be processed according to the attribute information of each of the at least one valid key point, to obtain processed attribute information of the plurality of key points, and the processed attribute information of the plurality of key points is used as the input information.
- the processed attribute information of the plurality of key points may include the processed attribute information of each of the plurality of key points, or the processed attribute information of a part of the plurality of key points together with the original attribute information of the other key points.
- for example, the processed attribute information of the plurality of key points may include the processed attribute information of each of the at least one valid key point and the original attribute information of the key points other than the at least one valid key point; that is, the attribute information of each valid key point may be processed without processing the attribute information of the other key points, but the embodiments of the present application are not limited thereto.
- the attribute information of the at least one valid key point may be processed in various manners.
- the reference coordinates may be determined according to the coordinate information included in the attribute information of each valid key point of the at least one valid key point, and the coordinate information in the processed attribute information of each valid key point is then determined according to the reference coordinates and the coordinate information in the attribute information of that valid key point.
- the reference coordinates can be obtained by processing coordinate information of the at least one valid key point.
- the reference coordinate may be obtained by averaging the coordinates of the at least one valid key point, but the implementation manner of the reference coordinate is not limited in the embodiment of the present application.
- the attribute information of the key points acquired in S100 may be subjected to zero-mean processing, and the information obtained after the zero-mean processing is provided to the neural network as part of the input information.
- the coordinate mean (m_x, m_y) can be calculated according to the coordinate information in the attribute information of the valid key points; after that, for each valid key point, the difference between the coordinate information (x_i, y_i) of that key point and the above coordinate mean, i.e. (x_i - m_x, y_i - m_y), is calculated and used as the coordinate information of that valid key point; finally, the coordinate information of all key points of the target object and the presence discriminant values of all key points can be provided to the neural network as input information.
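The zero-mean step described above can be sketched as follows; this is a hedged illustration assuming (x, y, v) key-point tuples with v == 1 marking valid key points:

```python
# Sketch of the zero-averaging step: compute the coordinate mean
# (m_x, m_y) over valid key points (v == 1), then replace each valid
# key point's coordinates with its offset from that mean; invalid
# key points keep their original attribute information.
def zero_mean_keypoints(keypoints):
    valid = [(x, y) for x, y, v in keypoints if v == 1]
    m_x = sum(x for x, _ in valid) / len(valid)
    m_y = sum(y for _, y in valid) / len(valid)
    processed = [
        (x - m_x, y - m_y, v) if v == 1 else (x, y, v)
        for x, y, v in keypoints
    ]
    return processed, (m_x, m_y)

keypoints = [(10.0, 20.0, 1), (30.0, 40.0, 1), (0.0, 0.0, 0)]
processed, mean = zero_mean_keypoints(keypoints)
# mean == (20.0, 30.0); the valid points become (-10.0, -10.0, 1)
# and (10.0, 10.0, 1), while the v == 0 point is left unchanged
```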
- the embodiments of the present application may use the sum of the coordinate information output by the neural network and the calculated coordinate mean as the final coordinates of the vertices of the bounding box of the target object, such as the two vertices on the diagonal of a rectangular bounding box.
- if the position information output by the neural network is (bx_1, by_1) and (bx_2, by_2), the coordinates of the two vertices on the diagonal of the bounding box of the target object may be (bx_1 + m_x, by_1 + m_y) and (bx_2 + m_x, by_2 + m_y).
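The corresponding restoration step, adding the coordinate mean back to the network's output vertices, might look like this (function and variable names are assumptions):

```python
# Sketch: restore absolute bounding-box vertex coordinates by adding
# the coordinate mean (m_x, m_y) back to the network's output vertices.
def restore_bounding_box(output_vertices, mean):
    m_x, m_y = mean
    return [(bx + m_x, by + m_y) for bx, by in output_vertices]

box = restore_bounding_box([(-12.0, -15.0), (12.0, 15.0)], (20.0, 30.0))
# box == [(8.0, 15.0), (32.0, 45.0)]
```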
- FIG. 2 is a flow chart of a method of training a neural network in some embodiments of the present application.
- assuming that N denotes the number of key points, that the attribute information of each key point is a 3-dimensional vector (x, y, v), and that the bounding box is a rectangle, the input of the neural network comprises a 3×N matrix and the output comprises a 2×2 matrix, which may be the two-dimensional coordinates of the two vertices on the diagonal of the bounding box.
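For illustration, a forward pass with these shapes could be sketched as below. The layer sizes (two 320-neuron fully connected ReLU layers and a 4-neuron output layer) follow the description and claims, but the weights here are random placeholders and the key-point count N = 14 is an assumption:

```python
import numpy as np

# Sketch of the network's input/output shapes: a 3×N key-point matrix
# in, a 2×2 matrix (two diagonal box vertices) out. The weights below
# are untrained random values, used only to demonstrate the shapes.
N = 14  # assumed number of key points
rng = np.random.default_rng(0)
W1, b1 = 0.01 * rng.standard_normal((3 * N, 320)), np.zeros(320)
W2, b2 = 0.01 * rng.standard_normal((320, 320)), np.zeros(320)
W3, b3 = 0.01 * rng.standard_normal((320, 4)), np.zeros(4)

def forward(keypoint_matrix):
    x = keypoint_matrix.reshape(-1)        # flatten the 3×N input
    h1 = np.maximum(0.0, x @ W1 + b1)      # first FC layer + ReLU
    h2 = np.maximum(0.0, h1 @ W2 + b2)     # second FC layer + ReLU
    return (h2 @ W3 + b3).reshape(2, 2)    # 4 outputs -> 2×2 vertices

vertices = forward(np.ones((3, N)))
# vertices has shape (2, 2): two (x, y) vertices on the box diagonal
```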
- the method for training a neural network in the embodiments of the present application includes: S200, S210, S220, S230, S240, and S250.
- the respective operations in Fig. 2 will be described below.
- S200 Obtain a piece of sample data from a sample set.
- the sample set in the embodiments of the present application is typically non-empty and typically includes a large amount of sample data; for example, the sample set may be the publicly available MS COCO dataset, or the like.
- each piece of sample data in the sample set may include the attribute information of a plurality of key points of a sample object, and each piece of sample data may be labeled with the position of the bounding box of the sample object, wherein the attribute information of a key point may include the coordinate information and the presence discriminant value of the key point, but the embodiments of the present application are not limited thereto.
- the sample object corresponding to the sample data is usually of the same type as the target object.
- for example, if the target object is a human body, the sample object is also a human body.
- one sample data may be sequentially selected from the sample set according to the arrangement order of the sample data, or one sample data may be randomly selected from the sample set.
- the manner in which the sample data is selected is not limited in the embodiment of the present application.
- the coordinate mean (m_x, m_y) is calculated over the coordinate information in the attribute information of all key points whose presence discriminant value v is 1 in the sample data.
- the difference (x_i - m_x, y_i - m_y) is calculated for the coordinate information (x_i, y_i) of each valid key point in the sample data.
- the attribute information of all the key points of the piece of sample data is provided as an input to the neural network.
- the output of the neural network is the two-dimensional coordinates (bx_1, by_1) and (bx_2, by_2) of the two vertices on the diagonal of the rectangle, and the coordinates of the bounding box (i.e., the given supervision) can be determined as the sum of the above output coordinate information and the coordinate mean, expressed as (bx_1 + m_x, by_1 + m_y) and (bx_2 + m_x, by_2 + m_y).
- the embodiments of the present application may perform the calculation by using a stochastic gradient descent algorithm to implement the training.
- whether to adjust the parameters of the neural network may be determined by comparing the result calculated by the neural network with the position of the bounding box labeled in the sample data. If the difference between the result calculated by the neural network and the position of the bounding box labeled in the sample data falls below a certain range, the training process may be terminated or new sample data may be selected from the sample set; otherwise, the parameters of the neural network may be adjusted and the calculation continued with the adjusted neural network.
- the embodiments of the present application may determine whether to continue obtaining new sample data from the sample set according to factors such as whether all sample data in the sample set has been used for training, whether the result output by the neural network meets a predetermined accuracy requirement, or whether the number of samples read has reached a predetermined number.
- if it is determined that the result output by the neural network meets the predetermined accuracy requirement, the neural network is successfully trained; if all sample data in the sample set has been used for training, or the number of samples read has reached the predetermined number, while the accuracy requirement is still not met, the neural network is not successfully trained and may be trained again.
- the foregoing detection may be: selecting a plurality of pieces of untrained sample data from the sample set, providing them to the neural network according to the method shown in FIG. 1, and determining the error between the position of at least one bounding box obtained based on the neural network and the position of the manually labeled bounding box in the corresponding sample data; the training is successful when the accuracy determined according to the at least one error meets the predetermined accuracy requirement.
- the embodiments of the present application may supervise the training by using an L2 loss function, but the embodiments of the present application are not limited thereto.
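The supervision described here (an L2 loss minimized by stochastic gradient descent) can be illustrated with a toy linear model standing in for the actual network; everything in this sketch is an assumption for demonstration, not the patent's implementation:

```python
import numpy as np

# Toy stand-in for the network: a linear map from a feature vector
# (representing processed key points) to 4 numbers (two box vertices).
rng = np.random.default_rng(1)
W = 0.1 * rng.standard_normal((4, 4))
lr = 0.05

def l2_loss(pred, target):
    # L2 (sum-of-squares) loss between prediction and label
    return float(np.sum((pred - target) ** 2))

x = np.array([0.5, -1.0, 0.25, 0.8])     # stand-in input features
target = np.array([1.0, 2.0, 3.0, 4.0])  # labeled box vertices (flat)

losses = []
for _ in range(200):
    pred = W @ x
    losses.append(l2_loss(pred, target))
    grad_W = 2.0 * np.outer(pred - target, x)  # gradient of the L2 loss
    W -= lr * grad_W                           # SGD parameter update
# the loss shrinks toward zero as the updates proceed
```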
- the present application trains the neural network by using the attribute information of the key points of the sample object and the position of the bounding box, so that the trained neural network can directly determine the position of the bounding box of the target object based on the attribute information of the key points of the target object;
- the embodiments of the present application can quickly determine the bounding box position by using the attribute information of the key points of the target object that has already been obtained, without processing the image itself.
- any of the methods provided by the embodiments of the present application may be performed by any suitable device having data processing capabilities, including but not limited to: a terminal device, a server, and the like.
- any of the methods provided by the embodiments of the present application may be executed by a processor, for example, by the processor executing corresponding instructions stored in a memory. This will not be repeated below.
- the foregoing programs may be stored in a computer readable storage medium; when the program is executed, the operations of the foregoing method embodiments are performed. The foregoing storage medium includes media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
- FIG. 3 is a schematic structural diagram of an apparatus for determining a bounding box of a target object in some embodiments of the present application.
- the apparatus of this embodiment can be used to implement the various method embodiments described above.
- the apparatus of this embodiment includes: an obtaining module 300 and a determining module 310.
- the apparatus may further include: a training module 320.
- the obtaining module 300 is configured to acquire attribute information of each key point of the plurality of key points of the target object.
- for the target object, the key points, the valid key points, and the content of the attribute information of a key point, reference may be made to the related description of S100 in the foregoing method embodiments, and details are not described here again.
- the apparatus of the embodiments of the present application may be applied to an application scenario in which attribute information of a key point of a target object has been successfully obtained, that is, an attribute of a key point of the target object has been obtained from the image.
- the obtaining module 300 can directly obtain the attribute information of the key point of the existing target object by means of information reading or the like.
- the determining module 310 is configured to determine the location of the bounding box of the target object according to the attribute information of each of the plurality of key points of the target object acquired by the obtaining module 300 and the preset neural network.
- the representation of the neural network in the present application (for example, the number of layers, the number of neurons, and the activation function, etc.) can be referred to the related description in the foregoing method embodiments, and thus will not be described in detail herein.
- the determining module 310 can include: a first sub-module, a second sub-module, and a third sub-module.
- the first sub-module is configured to determine at least one valid key point from the plurality of key points according to the attribute information of each of the plurality of key points acquired by the obtaining module 300;
- the second sub-module is configured to process the attribute information of the plurality of key points according to the attribute information of each valid key point of the at least one valid key point determined by the first sub-module, to obtain the processed attribute information of the plurality of key points.
- the third sub-module is configured to input the processed attribute information of the plurality of key points obtained by the second sub-module to the preset neural network for processing, to obtain the position of the bounding box of the target object.
- the processed attribute information of the plurality of key points may include: the processed attribute information of each of the at least one valid key point, and the attribute information of the other key points of the plurality of key points other than the at least one valid key point.
- the second submodule may include: a first unit and a second unit.
- the first unit is configured to determine reference coordinates according to the coordinate information included in the attribute information of each valid key point of the at least one valid key point determined by the first sub-module; for example, the first unit averages the coordinates corresponding to the coordinate information of each valid key point of the at least one valid key point to obtain the reference coordinates. The second unit is configured to determine the coordinate information in the processed attribute information of each valid key point according to the reference coordinates determined by the first unit and the coordinate information in the attribute information of each valid key point of the at least one valid key point; for example, the second unit takes the reference coordinates determined by the first unit as the origin and determines the processed coordinate information corresponding to the coordinate information of each valid key point of the at least one valid key point.
- the third sub-module may be configured to input the attribute information of the plurality of key points processed by the second unit to the neural network for processing, obtain output position information, and determine the position of the bounding box of the target object according to the reference coordinates and the output position information.
- for example, the first unit is configured to calculate a two-dimensional coordinate mean according to the coordinate information of all valid key points of the target object; the second unit is configured to calculate, for each valid key point of the target object, the difference between the coordinate information of that key point and the two-dimensional coordinate mean, and to use the difference as the coordinate information of that valid key point; and the third sub-module is configured to provide the resulting attribute information of the target object to the neural network, whereupon the determining module 310 may use the sum of the bounding box coordinate information output by the neural network and the coordinate mean as the two-dimensional coordinate information of the bounding box of the target object.
- the training module 320 is configured to train the neural network: it acquires a sample set including a plurality of pieces of sample data, wherein each piece of sample data includes the attribute information of a plurality of key points of a sample object and is labeled with the position of the bounding box of the sample object; it then trains the neural network according to the attribute information of the plurality of key points of the sample object in each piece of sample data and the position of the bounding box of the sample object.
- for example, the training module 320 acquires a plurality of pieces of sample data from the sample set; for each piece of sample data, it calculates the coordinate mean according to the coordinate information of all valid key points of that piece of sample data, separately calculates the difference between the coordinate information of the at least one valid key point of that piece of sample data and the above coordinate mean, uses the calculated difference as the coordinate information of the corresponding valid key point, and then provides the attribute information of all key points of that piece of sample data as input to the neural network. For an example of the operations performed by the training module 320 to train the neural network, reference may be made to the description of the above method, and details are not repeated here.
- the embodiments of the present application further provide an electronic device, such as a mobile terminal, a personal computer (PC), a tablet computer, or a server.
- referring to FIG. 4, there is shown a schematic structural diagram of an electronic device 400 suitable for implementing the terminal device or server of an embodiment of the present application.
- the electronic device 400 includes one or more processors and a communication unit.
- the one or more processors are, for example, one or more central processing units (CPUs) 401 and/or one or more acceleration units 413, where the acceleration units 413 may include, but are not limited to, GPUs, FPGAs, and other types of dedicated processors.
- the processor may perform various appropriate actions in accordance with executable instructions stored in a read-only memory (ROM) 402 or executable instructions loaded into a random access memory (RAM) 403 from a storage portion 408.
- the communication unit 412 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (Infiniband) network card.
- the processor may communicate with the read-only memory 402 and/or the random access memory 403 to execute executable instructions, connect to the communication unit 412 via the bus 404, and communicate with other target devices via the communication unit 412, thereby completing the operations corresponding to any of the methods provided by the embodiments of the present application, for example: acquiring attribute information of each of a plurality of key points of a target object; and determining the position of the bounding box of the target object according to the attribute information of each of the plurality of key points of the target object and a preset neural network.
- the RAM 403 may also store various programs and data required for the operation of the device.
- the CPU 401, the ROM 402, and the RAM 403 are connected to each other through a bus 404.
- ROM 402 is an optional module.
- the RAM 403 stores executable instructions, or writes executable instructions to the ROM 402 at runtime, the executable instructions causing the processor to perform operations corresponding to the above-described communication methods.
- An input/output (I/O) interface 405 is also coupled to bus 404.
- the communication unit 412 may be provided integrally, or may be provided with a plurality of sub-modules (e.g., a plurality of IB network cards) that are respectively connected to the bus link.
- the following components are connected to the I/O interface 405: an input portion 406 including a keyboard, a mouse, and the like; an output portion 407 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like; a storage portion 408 including a hard disk and the like; and a communication portion 409 including a network interface card such as a LAN card or a modem. The communication portion 409 performs communication processing via a network such as the Internet.
- Driver 410 is also coupled to I/O interface 405 as needed.
- a removable medium 411 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory or the like is mounted on the drive 410 as needed so that a computer program read therefrom is installed into the storage portion 408 as needed.
- FIG. 4 shows only an optional implementation; in practice, the number and types of the components in FIG. 4 may be selected, reduced, added, or replaced according to actual needs, and different functional components may be arranged separately or integrally. For example, the acceleration unit 413 and the CPU 401 may be arranged separately, or the acceleration unit 413 may be integrated into the CPU 401; similarly, the communication unit may be arranged separately, or may be integrated into the CPU 401 or the acceleration unit 413, and so on.
- the embodiments disclosed herein include a computer program product comprising a computer program tangibly embodied on a computer readable medium, the computer program comprising program code for executing the method illustrated in the flowchart, the program code including instructions corresponding to the method steps provided by the embodiments of the present application, for example: acquiring attribute information of each of a plurality of key points of a target object; and determining the position of the bounding box of the target object according to the attribute information of each of the plurality of key points of the target object and a preset neural network.
- the computer program can be downloaded and installed from the network via the communication portion 409, and/or installed from the removable medium 411.
- when the instructions in the computer program are executed by the central processing unit (CPU) 401, the above-described functions defined in the methods of the present application are performed.
- the methods, apparatus, and apparatus of the present application may be implemented in a number of ways.
- the methods, apparatus, and apparatus of the present application can be implemented in software, hardware, firmware, or any combination of software, hardware, and firmware.
- the above-described sequence of steps for the method is for illustrative purposes only, and the steps of the method of the present application are not limited to the order specifically described above unless otherwise specifically stated.
- the present application can also be embodied as a program recorded in a recording medium, the program comprising computer readable instructions for implementing the method according to the present application.
- the present application also covers a recording medium storing a program for executing the method of the present application, for example, the computer readable storage medium 500 shown in FIG.
Claims (30)
- A method for determining a bounding box of a target object, comprising: acquiring attribute information of each key point of a plurality of key points of the target object; and determining the position of the bounding box of the target object according to the attribute information of each key point of the plurality of key points of the target object and a preset neural network.
- The method according to claim 1, wherein the target object comprises: a human body.
- The method according to claim 1 or 2, wherein the attribute information of a key point comprises: coordinate information and a presence discriminant value.
- The method according to claim 3, wherein the determining the position of the bounding box of the target object according to the attribute information of each key point of the plurality of key points of the target object and the preset neural network comprises: determining at least one valid key point from the plurality of key points according to the attribute information of each key point of the plurality of key points; processing the attribute information of the plurality of key points according to the attribute information of each valid key point of the at least one valid key point, to obtain processed attribute information of the plurality of key points; and inputting the processed attribute information of the plurality of key points to the preset neural network for processing, to obtain the position of the bounding box of the target object.
- The method according to claim 4, wherein the processed attribute information of the plurality of key points comprises: the processed attribute information of each valid key point of the at least one valid key point, and the attribute information of the key points of the plurality of key points other than the at least one valid key point.
- The method according to claim 4 or 5, wherein the processing the attribute information of the plurality of key points according to the attribute information of each valid key point of the at least one valid key point, to obtain the processed attribute information of the plurality of key points, comprises: determining reference coordinates according to the coordinate information included in the attribute information of each valid key point of the at least one valid key point; and determining the coordinate information in the processed attribute information of each valid key point according to the reference coordinates and the coordinate information in the attribute information of each valid key point of the at least one valid key point.
- The method according to claim 6, wherein the determining reference coordinates according to the coordinate information included in the attribute information of each valid key point of the at least one valid key point comprises: averaging the coordinates corresponding to the coordinate information of each valid key point of the at least one valid key point to obtain the reference coordinates; and/or the determining the coordinate information in the processed attribute information of each valid key point according to the reference coordinates and the coordinate information in the attribute information of each valid key point of the at least one valid key point comprises: taking the reference coordinates as an origin, determining the processed coordinate information corresponding to the coordinate information of each valid key point of the at least one valid key point.
- The method according to claim 6 or 7, wherein the inputting the processed attribute information of the plurality of key points to the preset neural network for processing, to obtain the position of the bounding box of the target object, comprises: inputting the processed attribute information of the plurality of key points to the preset neural network for processing, to obtain output position information; and determining the position of the bounding box of the target object according to the reference coordinates and the output position information.
- The method according to any one of claims 1 to 8, further comprising: acquiring a sample set comprising a plurality of pieces of sample data, wherein each piece of sample data comprises attribute information of a plurality of key points of a sample object, and the sample data is labeled with the position of the bounding box of the sample object; and training the neural network according to the attribute information of the plurality of key points of the sample object in each piece of sample data and the position of the bounding box of the sample object.
- The method according to any one of claims 1 to 9, wherein the neural network is obtained through training based on a stochastic gradient descent algorithm.
- The method according to any one of claims 1 to 10, wherein the position of the bounding box of the target object comprises: coordinate information of two vertices in the diagonal direction of the bounding box of the target object.
- The method according to any one of claims 1 to 11, wherein the neural network comprises: at least two fully connected layers.
- The method according to any one of claims 1 to 12, wherein the neural network comprises: three fully connected layers, wherein an activation function of at least one of the first fully connected layer and the second fully connected layer of the three fully connected layers comprises: a rectified linear unit (ReLU) activation function.
- The method according to claim 13, wherein the first fully connected layer comprises 320 neurons, the second fully connected layer comprises 320 neurons, and the last fully connected layer of the three fully connected layers comprises 4 neurons.
- An apparatus for determining a bounding box of a target object, comprising: an obtaining module configured to acquire attribute information of each key point of a plurality of key points of the target object; and a determining module configured to determine the position of the bounding box of the target object according to the attribute information of each key point of the plurality of key points of the target object acquired by the obtaining module and a preset neural network.
- The apparatus according to claim 15, wherein the target object comprises: a human body.
- The apparatus according to claim 15 or 16, wherein the attribute information of a key point comprises: coordinate information and a presence discriminant value.
- The apparatus according to claim 17, wherein the determining module comprises: a first sub-module configured to determine at least one valid key point from the plurality of key points according to the attribute information of each key point of the plurality of key points acquired by the obtaining module; a second sub-module configured to process the attribute information of the plurality of key points according to the attribute information of each valid key point of the at least one valid key point determined by the first sub-module, to obtain processed attribute information of the plurality of key points; and a third sub-module configured to input the processed attribute information of the plurality of key points obtained by the second sub-module to the preset neural network for processing, to obtain the position of the bounding box of the target object.
- The apparatus according to claim 18, wherein the processed attribute information of the plurality of key points comprises: the processed attribute information of each valid key point of the at least one valid key point, and the attribute information of the key points of the plurality of key points other than the at least one valid key point.
- The apparatus according to claim 18 or 19, wherein the second sub-module comprises: a first unit configured to determine reference coordinates according to the coordinate information included in the attribute information of each valid key point of the at least one valid key point determined by the first sub-module; and a second unit configured to determine the coordinate information in the processed attribute information of each valid key point according to the reference coordinates determined by the first unit and the coordinate information in the attribute information of each valid key point of the at least one valid key point.
- The apparatus according to claim 20, wherein the first unit is configured to: average the coordinates corresponding to the coordinate information of each valid key point of the at least one valid key point determined by the first sub-module, to obtain the reference coordinates; and/or the second unit is configured to: take the reference coordinates determined by the first unit as an origin, and determine the processed coordinate information corresponding to the coordinate information of each valid key point of the at least one valid key point.
- The apparatus according to claim 20 or 21, wherein the third sub-module is configured to: input the processed attribute information of the plurality of key points obtained by the second sub-module to the preset neural network for processing, to obtain output position information; and determine the position of the bounding box of the target object according to the reference coordinates and the output position information.
- The apparatus according to any one of claims 15 to 22, further comprising a training module configured to: acquire a sample set comprising a plurality of pieces of sample data, wherein each piece of sample data comprises attribute information of a plurality of key points of a sample object, and the sample data is labeled with the position of the bounding box of the sample object; and train the neural network according to the attribute information of the plurality of key points of the sample object in each piece of sample data and the position of the bounding box of the sample object.
- The apparatus according to any one of claims 15 to 23, wherein the neural network is obtained through training based on a stochastic gradient descent algorithm.
- The apparatus according to any one of claims 15 to 24, wherein the position of the bounding box of the target object comprises: coordinate information of two vertices in the diagonal direction of the bounding box of the target object.
- The apparatus according to any one of claims 15 to 25, wherein the neural network comprises: at least two fully connected layers.
- The apparatus according to any one of claims 15 to 26, wherein the neural network comprises: three fully connected layers, wherein an activation function of at least one of the first fully connected layer and the second fully connected layer of the three fully connected layers comprises: a rectified linear unit (ReLU) activation function.
- The apparatus according to claim 27, wherein the first fully connected layer comprises 320 neurons, the second fully connected layer comprises 320 neurons, and the last fully connected layer of the three fully connected layers comprises 4 neurons.
- An electronic device, comprising: a processor and a computer readable storage medium, the computer readable storage medium being configured to store instructions, wherein execution of the instructions by the processor causes the electronic device to perform the method according to any one of claims 1 to 14.
- A computer readable storage medium having instructions stored thereon, wherein when the instructions are executed by a processor, the method according to any one of claims 1 to 14 is performed.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2019572712A JP6872044B2 (ja) | 2017-11-21 | 2018-10-23 | 対象物の外接枠を決定するための方法、装置、媒体及び機器 |
SG11201913529UA SG11201913529UA (en) | 2017-11-21 | 2018-10-23 | Methods and apparatuses for determining bounding box of target object, media, and devices |
US16/731,858 US11348275B2 (en) | 2017-11-21 | 2019-12-31 | Methods and apparatuses for determining bounding box of target object, media, and devices |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711165979.8 | 2017-11-21 | ||
CN201711165979.8A CN108229305B (zh) | 2017-11-21 | 2017-11-21 | 用于确定目标对象的外接框的方法、装置和电子设备 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/731,858 Continuation US11348275B2 (en) | 2017-11-21 | 2019-12-31 | Methods and apparatuses for determining bounding box of target object, media, and devices |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019100886A1 true WO2019100886A1 (zh) | 2019-05-31 |
Family
ID=62652771
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/111464 WO2019100886A1 (zh) | 2017-11-21 | 2018-10-23 | 用于确定目标对象的外接框的方法、装置、介质和设备 |
Country Status (5)
Country | Link |
---|---|
US (1) | US11348275B2 (zh) |
JP (1) | JP6872044B2 (zh) |
CN (1) | CN108229305B (zh) |
SG (1) | SG11201913529UA (zh) |
WO (1) | WO2019100886A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP7419964B2 (ja) | 2019-06-21 | 2024-01-23 | 富士通株式会社 | 人体動作認識装置及び方法、電子機器 |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018033137A1 (zh) * | 2016-08-19 | 2018-02-22 | 北京市商汤科技开发有限公司 | 在视频图像中展示业务对象的方法、装置和电子设备 |
CN108229305B (zh) * | 2017-11-21 | 2021-06-04 | 北京市商汤科技开发有限公司 | 用于确定目标对象的外接框的方法、装置和电子设备 |
CN110826357B (zh) * | 2018-08-07 | 2022-07-26 | 北京市商汤科技开发有限公司 | 对象三维检测及智能驾驶控制的方法、装置、介质及设备 |
CN111241887B (zh) * | 2018-11-29 | 2024-04-16 | 北京市商汤科技开发有限公司 | 目标对象关键点识别方法及装置、电子设备和存储介质 |
CN110782404B (zh) * | 2019-10-11 | 2022-06-10 | 北京达佳互联信息技术有限公司 | 一种图像处理方法、装置及存储介质 |
CN110929792B (zh) * | 2019-11-27 | 2024-05-24 | 深圳市商汤科技有限公司 | 图像标注方法、装置、电子设备及存储介质 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120020519A1 (en) * | 2010-07-21 | 2012-01-26 | Canon Kabushiki Kaisha | Image processing apparatus, image processing method, and storage medium |
CN107194361A (zh) * | 2017-05-27 | 2017-09-22 | 成都通甲优博科技有限责任公司 | 二维姿势检测方法及装置 |
CN107220604A (zh) * | 2017-05-18 | 2017-09-29 | 清华大学深圳研究生院 | 一种基于视频的跌倒检测方法 |
CN108229305A (zh) * | 2017-11-21 | 2018-06-29 | 北京市商汤科技开发有限公司 | 用于确定目标对象的外接框的方法、装置和电子设备 |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9342888B2 (en) * | 2014-02-08 | 2016-05-17 | Honda Motor Co., Ltd. | System and method for mapping, localization and pose correction of a vehicle based on images |
IL231862A (en) * | 2014-04-01 | 2015-04-30 | Superfish Ltd | Image representation using a neural network |
JP2016006626A (ja) | 2014-05-28 | 2016-01-14 | 株式会社デンソーアイティーラボラトリ | 検知装置、検知プログラム、検知方法、車両、パラメータ算出装置、パラメータ算出プログラムおよびパラメータ算出方法 |
WO2016004330A1 (en) * | 2014-07-03 | 2016-01-07 | Oim Squared Inc. | Interactive content generation |
CN104573715B (zh) | 2014-12-30 | 2017-07-25 | 百度在线网络技术(北京)有限公司 | 图像主体区域的识别方法及装置 |
WO2016179808A1 (en) | 2015-05-13 | 2016-11-17 | Xiaoou Tang | An apparatus and a method for face parts and face detection |
US9767381B2 (en) | 2015-09-22 | 2017-09-19 | Xerox Corporation | Similarity-based detection of prominent objects using deep CNN pooling layers as features |
WO2017095948A1 (en) | 2015-11-30 | 2017-06-08 | Pilot Ai Labs, Inc. | Improved general object detection using neural networks |
KR102592076B1 (ko) | 2015-12-14 | 2023-10-19 | 삼성전자주식회사 | 딥러닝 기반 영상 처리 장치 및 방법, 학습 장치 |
CN107194338A (zh) | 2017-05-14 | 2017-09-22 | 北京工业大学 | 基于人体树图模型的交通环境行人检测方法 |
US11080551B2 (en) * | 2017-05-22 | 2021-08-03 | Intel Corporation | Proposal region filter for digital image processing |
US10438371B2 (en) * | 2017-09-22 | 2019-10-08 | Zoox, Inc. | Three-dimensional bounding box from two-dimensional image and point cloud data |
-
2017
- 2017-11-21 CN CN201711165979.8A patent/CN108229305B/zh active Active
-
2018
- 2018-10-23 SG SG11201913529UA patent/SG11201913529UA/en unknown
- 2018-10-23 JP JP2019572712A patent/JP6872044B2/ja active Active
- 2018-10-23 WO PCT/CN2018/111464 patent/WO2019100886A1/zh active Application Filing
-
2019
- 2019-12-31 US US16/731,858 patent/US11348275B2/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120020519A1 (en) * | 2010-07-21 | 2012-01-26 | Canon Kabushiki Kaisha | Image processing apparatus, image processing method, and storage medium |
CN107220604A (zh) * | 2017-05-18 | 2017-09-29 | 清华大学深圳研究生院 | 一种基于视频的跌倒检测方法 |
CN107194361A (zh) * | 2017-05-27 | 2017-09-22 | 成都通甲优博科技有限责任公司 | 二维姿势检测方法及装置 |
CN108229305A (zh) * | 2017-11-21 | 2018-06-29 | 北京市商汤科技开发有限公司 | 用于确定目标对象的外接框的方法、装置和电子设备 |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP7419964B2 (ja) | 2019-06-21 | 2024-01-23 | 富士通株式会社 | 人体動作認識装置及び方法、電子機器 |
Also Published As
Publication number | Publication date |
---|---|
JP6872044B2 (ja) | 2021-05-19 |
US11348275B2 (en) | 2022-05-31 |
SG11201913529UA (en) | 2020-01-30 |
JP2020525959A (ja) | 2020-08-27 |
CN108229305B (zh) | 2021-06-04 |
CN108229305A (zh) | 2018-06-29 |
US20200134859A1 (en) | 2020-04-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2019100886A1 (zh) | Method, apparatus, medium and device for determining a bounding box of a target object | |
CN108427927B (zh) | Target re-identification method and apparatus, electronic device, program and storage medium | |
US11120254B2 (en) | Methods and apparatuses for determining hand three-dimensional data | |
US10210418B2 (en) | Object detection system and object detection method | |
US20190108447A1 (en) | Multifunction perceptrons in machine learning environments | |
WO2019128932A1 (zh) | Face pose analysis method, apparatus, device, storage medium and program | |
US10157309B2 (en) | Online detection and classification of dynamic gestures with recurrent convolutional neural networks | |
WO2019105337A1 (zh) | Video-based face recognition method, apparatus, device, medium and program | |
US10572072B2 (en) | Depth-based touch detection | |
CN108229353B (zh) | Human body image classification method and apparatus, electronic device, storage medium, and program | |
WO2018054329A1 (zh) | Object detection method and apparatus, electronic device, computer program and storage medium | |
US20150253864A1 (en) | Image Processor Comprising Gesture Recognition System with Finger Detection and Tracking Functionality | |
CN108229301B (zh) | Eyelid line detection method, apparatus and electronic device | |
US20160026857A1 (en) | Image processor comprising gesture recognition system with static hand pose recognition based on dynamic warping | |
US11954862B2 (en) | Joint estimation of heart rate and respiratory rate using neural networks | |
US11604963B2 (en) | Feedback adversarial learning | |
CN110659570A (zh) | Target object pose tracking method, and neural network training method and apparatus | |
WO2023083030A1 (zh) | Pose recognition method and related device | |
CN114005149A (zh) | Training method and apparatus for a target angle detection model | |
US10867441B2 (en) | Method and apparatus for prefetching data items to a cache | |
López-Rubio et al. | Robust fitting of ellipsoids by separating interior and exterior points during optimization | |
Guo et al. | A hybrid framework based on warped hierarchical tree for pose estimation of texture-less objects | |
KR102215811B1 (ko) | Pyramid-image-based video analysis method with improved object recognition speed using particle images and ICP matching, and video analysis apparatus therefor | |
JP7364077B2 (ja) | Image processing apparatus, image processing method, and program | |
US20220215564A1 (en) | Three-dimensional scan registration with deformable models |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 18881654 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2019572712 Country of ref document: JP Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 08/09/2020) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 18881654 Country of ref document: EP Kind code of ref document: A1 |