WO2020062493A1 - Image processing method and apparatus - Google Patents

Image processing method and apparatus

Info

Publication number
WO2020062493A1
WO2020062493A1 PCT/CN2018/115968
Authority
WO
WIPO (PCT)
Prior art keywords
key point
pose
preset
candidate frame
target candidate
Prior art date
Application number
PCT/CN2018/115968
Other languages
English (en)
Chinese (zh)
Inventor
胡耀全
Original Assignee
北京字节跳动网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字节跳动网络技术有限公司 filed Critical 北京字节跳动网络技术有限公司
Publication of WO2020062493A1

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person

Definitions

  • Embodiments of the present application relate to the field of computer technology, and specifically to the field of Internet technology, and in particular, to an image processing method and device.
  • the embodiments of the present application provide an image processing method and device.
  • an embodiment of the present application provides an image processing method, including: obtaining an image in which object poses are labeled, where the image includes at least two objects, different objects have different poses, and each pose is indicated by multiple key points; and training a convolutional neural network based on the image and the pose annotations to obtain a trained convolutional neural network.
  • the training process includes: inputting the image into the convolutional neural network, and determining the candidate pose of each object based on the anchor poses previously set for the convolutional neural network; determining the coincidence degree between the candidate frame where each candidate pose is located and the labeled frame of each labeled pose, and using the candidate frames whose coincidence degree is greater than a preset coincidence degree threshold as target candidate frames; for each key point in the target candidate frames corresponding to each labeled frame, taking the position average value of that key point over the target candidate frames; and taking the set of position average values of the key points as a pose detected on the image.
  • before the image is input into the convolutional neural network and the candidate pose of each object is determined based on the previously set anchor poses, the method further includes: clustering multiple preset poses in a target image to obtain key point sets; and determining each key point set as an anchor pose, where the key points included in different key point sets have different positions in the target image.
  • clustering a plurality of preset poses in the target image to obtain a key point set includes: clustering the multi-dimensional vectors corresponding to the preset poses, where the number of dimensions of the multi-dimensional vector corresponding to a preset pose is the same as the number of key points of that preset pose; and forming a key point set from the key points of the pose corresponding to the multi-dimensional vector of each cluster center.
  • taking the position average value of the key point in the candidate poses within each target candidate frame includes: for each key point within each target candidate frame corresponding to a labeled frame, in response to determining that the position of the key point is outside the labeled frame, using a preset first preset weight as the weight of the key point in the target candidate frame; in response to determining that the position of the key point is within the labeled frame, using a preset second preset weight as the weight of the key point in the target candidate frame, where the first preset weight is smaller than the second preset weight; and determining the average position of the key point over the target candidate frames based on the weights of the key point in the target candidate frames corresponding to the labeled frame.
  • taking the position average value of the key point in the candidate poses within each target candidate frame includes: for each key point within each target candidate frame corresponding to a labeled frame, determining whether the distance between the key point and the corresponding key point in the labeled pose is less than or equal to a preset distance threshold; and in response to determining that the distance is less than or equal to the threshold, determining the average position of the key point over the target candidate frames based on the weights of the key point in the target candidate frames corresponding to the labeled frame.
  • an embodiment of the present application provides an image processing apparatus, including: an obtaining unit configured to obtain an image in which object poses are labeled, where the image includes at least two objects, different objects have different poses, and each pose is indicated by multiple key points; and a training unit configured to train a convolutional neural network based on the image and the pose annotations to obtain a trained convolutional neural network.
  • the training process includes: inputting the image into the convolutional neural network, and determining the candidate pose of each object based on the previously set anchor poses of the convolutional neural network; determining the coincidence degree between the candidate frame where each candidate pose is located and the labeled frame of each labeled pose, and using the candidate frames whose coincidence degree is greater than a preset coincidence degree threshold as target candidate frames; for each key point in the target candidate frames corresponding to each labeled frame, taking the average position of that key point over the target candidate frames; and taking the set of position averages of the key points as a pose detected on the image.
  • the apparatus further includes: a clustering unit configured to cluster a plurality of preset poses in a target image to obtain key point sets; and a determining unit configured to determine each key point set as an anchor pose, where the key points included in different key point sets have different positions in the target image.
  • the clustering unit is further configured to: cluster the multi-dimensional vectors corresponding to the preset poses, where the number of dimensions of the multi-dimensional vector corresponding to a preset pose is the same as the number of key points of that preset pose; and form a key point set from the key points of the preset pose corresponding to the multi-dimensional vector of each cluster center.
  • the training unit is further configured to: for each key point within each target candidate frame corresponding to each labeled frame, in response to determining that the position of the key point is outside the labeled frame, take a preset first preset weight as the weight of the key point within the target candidate frame; in response to determining that the position of the key point is within the labeled frame, take a preset second preset weight as the weight of the key point within the target candidate frame, where the first preset weight is less than the second preset weight; and determine the average position of the key point over the target candidate frames based on the weights of the key point in the target candidate frames corresponding to the labeled frame.
  • the training unit is further configured to: for each key point in each target candidate frame corresponding to each labeled frame, determine whether the distance between the key point and the corresponding key point in the labeled pose is less than or equal to a preset distance threshold; and in response to determining that it is, determine the average position of the key point over the target candidate frames based on the weights of the key point in the target candidate frames corresponding to the labeled frame.
  • an embodiment of the present application provides an electronic device including: one or more processors; and a storage device configured to store one or more programs, where, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any embodiment of the image processing method.
  • an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the method as in any embodiment of the image processing method is implemented.
  • an image in which object poses are labeled is obtained, where the image includes at least two objects, different objects have different poses, and each pose is indicated by multiple key points.
  • the convolutional neural network is trained based on the image and the annotation of the pose to obtain a trained convolutional neural network.
  • the training process includes: inputting the image into the convolutional neural network, and determining candidate poses for each object based on the previously set anchor poses of the convolutional neural network. Then, the coincidence degree between the candidate frame where each candidate pose is located and the labeled frame of each labeled pose is determined, and the candidate frames whose coincidence degree is greater than a preset coincidence degree threshold are used as target candidate frames.
  • for each key point in the target candidate frames corresponding to each labeled frame, the average position of that key point over the target candidate frames is taken. Finally, the set of position averages of the key points is used as a pose detected on the image.
  • FIG. 1 is an exemplary system architecture diagram to which the present application can be applied;
  • FIG. 2 is a flowchart of an embodiment of an image processing method according to the present application.
  • FIG. 3 is a schematic diagram of an application scenario of an image processing method according to the present application.
  • FIG. 5 is a schematic structural diagram of an embodiment of an image processing apparatus according to the present application.
  • FIG. 6 is a schematic structural diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present application.
  • FIG. 1 illustrates an exemplary system architecture 100 to which an embodiment of an image processing method or an image processing apparatus of the present application can be applied.
  • the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105.
  • the network 104 is a medium for providing a communication link between the terminal devices 101, 102, 103 and the server 105.
  • the network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, and so on.
  • the user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like.
  • Various communication client applications can be installed on the terminal devices 101, 102, 103, such as image processing applications, video applications, live broadcast applications, instant communication tools, mailbox clients, social platform software, and so on.
  • the terminal devices 101, 102, and 103 may be hardware or software.
  • the terminal devices 101, 102, and 103 can be various electronic devices with a display screen, including but not limited to smart phones, tablet computers, e-book readers, laptop computers and desktop computers.
  • the terminal devices 101, 102, and 103 are software, they can be installed in the electronic devices listed above. It can be implemented as multiple software or software modules (such as multiple software or software modules used to provide distributed services), or it can be implemented as a single software or software module. It is not specifically limited here.
  • the server 105 may be a server that provides various services, such as a background server that supports the terminal devices 101, 102, and 103.
  • the background server may analyze and process the acquired data such as the image of the labeled object posture, and feed back the processing result (such as a posture detected on the image) to the terminal device.
  • the image processing method provided in the embodiment of the present application may be executed by the server 105 or the terminal devices 101, 102, and 103. Accordingly, the image processing apparatus may be provided in the server 105 or the terminal devices 101, 102, and 103.
  • terminal devices, networks, and servers in FIG. 1 are merely exemplary. According to implementation needs, there can be any number of terminal devices, networks, and servers.
  • the image processing method includes the following steps:
  • Step 201 Obtain an image of the posture of the labeled object, where the image includes at least two objects, and different objects have different postures, and the posture is indicated by multiple key points.
  • an execution body of the image processing method (for example, the server or a terminal device shown in FIG. 1) may acquire an image in which object poses are labeled.
  • the object's pose is labeled.
  • the objects here can be people, faces, cats, objects, and so on.
  • the posture can be represented by the coordinates of the key points. For example, when a person is in a standing posture and a squatting posture, the distance between the coordinates of the nose key point and the coordinates of the toe key point is different.
  • Step 202 Train the convolutional neural network based on the image and the annotation of the pose to obtain a trained convolutional neural network.
  • the training process includes steps 2021, 2022, and 2023, as follows:
  • Step 2021 input the image into the convolutional neural network, and determine the candidate pose of each object based on the previously set anchor pose of the convolutional neural network.
  • the above-mentioned execution body may input the acquired image into the convolutional neural network, so that based on the previously set anchor pose in the convolutional neural network, the convolutional neural network obtains the candidate pose of each object.
  • the convolutional neural network includes a region candidate network (also known as a Region Proposal Network, RPN).
  • the size and position of the anchor poses of the convolutional neural network are fixed in the image.
  • the execution body may input the image into the region candidate network, and the region candidate network may determine the difference in size and position between each candidate pose and an anchor pose, and use these size and position differences to represent the size and position of each candidate pose.
  • the size here can be expressed by an area, by a width and a height (or a length and a width), and so on, and the position can be expressed by coordinates.
  • the execution subject described above may determine multiple candidate poses.
  • the execution body can obtain the pose output by the convolutional neural network as the pose detected on the image, determine a loss value between this pose and the labeled pose based on a preset loss function, and then use this loss value for training to obtain the trained convolutional neural network.
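  • As an illustration only (not part of the application), the sketch below shows one way a candidate pose could be decoded as per-key-point offsets from an anchor pose and compared with a labeled pose through a smooth-L1-style loss. The function names, the offset parameterization, and the loss form are assumptions; the application only states that the network predicts size and position differences relative to the anchor poses and that a preset loss function is used.

```python
import numpy as np

def decode_candidate_pose(anchor_keypoints, predicted_offsets):
    """Decode a candidate pose from an anchor pose plus regressed offsets.

    anchor_keypoints:  (K, 2) array of fixed (x, y) anchor key point positions.
    predicted_offsets: (K, 2) array of per-key-point deltas predicted by the network.
    Returns the candidate pose as a (K, 2) array.
    """
    return anchor_keypoints + predicted_offsets

def pose_regression_loss(candidate_keypoints, labeled_keypoints):
    """Smooth-L1-style loss between a candidate pose and the labeled pose."""
    diff = np.abs(candidate_keypoints - labeled_keypoints)
    return np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5).sum()
```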
  • Step 2022 Determine the degree of coincidence between the candidate frame where each candidate pose is located and the annotated frame of the already labeled pose, and use the candidate frame whose coincidence is greater than a preset coincidence degree threshold as the target candidate frame.
  • the execution body may determine the coincidence degree (Intersection over Union, IoU) between the candidate frame where each candidate pose is located and the labeled frame of each labeled pose. After that, the execution body may select the candidate frames whose coincidence degree is greater than a preset coincidence degree threshold, and use the selected candidate frames as target candidate frames.
  • the width and height of a pose's frame may be the width (or length) spanned by the leftmost and rightmost coordinates of the key points included in the pose, and the height (or width) spanned by the uppermost and lowermost coordinates.
  • the coincidence degree may be the ratio of the intersection between the candidate frame and the labeled frame to the union between the candidate frame and the labeled frame. If the overlap between the candidate frame and the labeled frame is large, it indicates that the candidate frame frames the object accurately. In this way, the candidate frame can more accurately separate the object from the non-object.
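  • The following is a minimal sketch, under the assumption that a pose's frame is the axis-aligned box spanned by its key points, of how the coincidence degree could be computed and target candidate frames selected; the function names and array representation are illustrative, not taken from the application.

```python
import numpy as np

def pose_frame(keypoints):
    """Frame of a pose: (x_min, y_min, x_max, y_max) spanned by its key points (K, 2)."""
    xs, ys = keypoints[:, 0], keypoints[:, 1]
    return xs.min(), ys.min(), xs.max(), ys.max()

def coincidence_degree(frame_a, frame_b):
    """Intersection over Union (IoU) of two frames given as (x_min, y_min, x_max, y_max)."""
    ix1, iy1 = max(frame_a[0], frame_b[0]), max(frame_a[1], frame_b[1])
    ix2, iy2 = min(frame_a[2], frame_b[2]), min(frame_a[3], frame_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (frame_a[2] - frame_a[0]) * (frame_a[3] - frame_a[1])
    area_b = (frame_b[2] - frame_b[0]) * (frame_b[3] - frame_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def select_target_frames(candidate_frames, labeled_frame, threshold):
    """Keep candidate frames whose coincidence degree with the labeled frame exceeds the threshold."""
    return [f for f in candidate_frames if coincidence_degree(f, labeled_frame) > threshold]
```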
  • Step 2023 For each key point in the target candidate frames corresponding to each labeled frame, take the position average value of the key point over the target candidate frames; use the set of position average values of the key points as a pose detected on the image.
  • the execution body may take, for each key point in the target candidate frames corresponding to a labeled frame, the average position of that key point in the candidate poses within those target candidate frames. The execution body may then use the set of position average values of the key points of the target candidate frames corresponding to the labeled frame as a pose detected on the image.
  • a labeled frame and its corresponding target candidate frames indicate the same object.
  • the same weight can be set for the positions of the respective key points to calculate the average position.
  • the weights set for the positions of the key points may also be different.
  • in some implementations of step 2023, for each key point in the target candidate frames corresponding to each labeled frame, averaging the positions of the key point in at least two target candidate frames may include the following.
  • for each key point within each target candidate frame corresponding to a labeled frame: in response to determining that the position of the key point is outside the labeled frame, a preset first preset weight is taken as the weight of the key point in that target candidate frame; in response to determining that the position is within the labeled frame, a preset second preset weight, larger than the first, is used.
  • in other words, the execution body may use a smaller weight for the coordinates of positions outside the labeled frame and a larger weight for the coordinates of positions inside the labeled frame.
  • For example, suppose key point A and key point B lie inside the labeled frame while key point C lies outside it. Weights 1, 1 and 0.5 can then be used for key points A, B and C, respectively, to calculate the position average. The resulting average position is (1 × key point A position + 1 × key point B position + 0.5 × key point C position) / (1 + 1 + 0.5).
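  • A minimal sketch of this weighted averaging is shown below, assuming each row of the input array is the position of the same key point in one target candidate frame and that the labeled frame is an axis-aligned box; the default weights 0.5 and 1.0 simply mirror the example above, and the function name is illustrative.

```python
import numpy as np

def weighted_keypoint_average(keypoint_positions, labeled_frame, w_outside=0.5, w_inside=1.0):
    """Weighted average of one key point's positions over the target candidate frames.

    keypoint_positions: (N, 2) array, the key point's (x, y) position in each of N target candidate frames.
    labeled_frame:      (x_min, y_min, x_max, y_max) of the labeled frame.
    w_outside:          first preset weight, applied when a position falls outside the labeled frame.
    w_inside:           second preset weight (larger), applied when it falls inside.
    """
    x_min, y_min, x_max, y_max = labeled_frame
    inside = ((keypoint_positions[:, 0] >= x_min) & (keypoint_positions[:, 0] <= x_max) &
              (keypoint_positions[:, 1] >= y_min) & (keypoint_positions[:, 1] <= y_max))
    weights = np.where(inside, w_inside, w_outside)
    return (weights[:, None] * keypoint_positions).sum(axis=0) / weights.sum()
```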
  • in other implementations of step 2023, for each key point in the target candidate frames corresponding to each labeled frame, averaging the positions of the key point in at least two target candidate frames may include the following.
  • for each key point in each target candidate frame corresponding to a labeled frame, determine whether the distance between the key point and the corresponding key point in the labeled pose is less than or equal to a preset distance threshold; in response to determining that it is, determine the average position of the key point over the target candidate frames based on the weights of the key point in the target candidate frames corresponding to the labeled frame.
  • the execution body may determine whether the distance between each key point in each target candidate frame corresponding to a labeled frame and the corresponding key point of the labeled pose in that labeled frame is less than or equal to a preset distance threshold, and thereby select which key points in the target candidate frames participate in the averaging. That is, in these implementations, the key points of some target candidate frames do not participate in obtaining the position average. Specifically, if the distance between a key point in a target candidate frame corresponding to the labeled frame and the labeled key point is small, it is determined that the key point can participate in calculating the position average.
  • For example, suppose the three target candidate frames a, b and c corresponding to a labeled frame M each contain a nose tip key point, and the distances between the nose tip key points in a, b and c and the nose tip key point labeled in frame M are 1, 2 and 3, respectively. If the preset distance threshold is 2.5, the distances 1 and 2 for target candidate frames a and b are smaller than the threshold, so the nose tip key points in target candidate frames a and b can participate in calculating the position average, while the one in c cannot.
  • These implementations select, from the key points within the target candidate frames corresponding to a labeled frame, those closer to the labeled key points to determine the position average, and prevent key points with large deviations from participating in the calculation, thereby improving the accuracy of the determined pose.
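  • A minimal sketch of this distance-based filtering is given below, assuming Euclidean distance (the application does not name the distance metric) and an illustrative function name.

```python
import numpy as np

def filter_keypoints_by_distance(keypoint_positions, labeled_keypoint, distance_threshold):
    """Keep only the positions of a key point that lie close to its labeled position.

    keypoint_positions: (N, 2) array, the key point's position in each target candidate frame.
    labeled_keypoint:   (2,) array, the labeled position of the same key point.
    distance_threshold: preset distance threshold.
    """
    distances = np.linalg.norm(keypoint_positions - labeled_keypoint, axis=1)
    return keypoint_positions[distances <= distance_threshold]

# With the nose tip example above: positions at distances 1, 2 and 3 from the labeled
# nose tip and a threshold of 2.5 keep only the positions from frames a and b.
```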
  • FIG. 3 is a schematic diagram of an application scenario of the image processing method according to this embodiment.
  • the execution body 301 may obtain an image 302 of the posture of the labeled object, where the image includes at least two objects, and the posture of different objects is different, and the posture is indicated by multiple key points.
  • the convolutional neural network is trained to obtain the trained convolutional neural network.
  • the training process includes: inputting the image into the convolutional neural network, and determining candidate poses 304 for each object based on the previously set anchor poses 303 of the convolutional neural network.
  • among the candidate poses from the image containing at least two objects, the coincidence degree is used to select target candidate frames that indicate the objects more accurately.
  • the average value of the key points is taken to accurately distinguish each pose in the image.
  • FIG. 4 illustrates a flowchart 400 of still another embodiment of an image processing method.
  • the process 400 of the image processing method includes the following steps:
  • Step 401 Cluster multiple preset poses in the target image to obtain a key point set.
  • the execution body on which the image processing method runs (for example, the server or a terminal device shown in FIG. 1) can obtain a target image and cluster multiple preset poses in the target image to obtain key point sets.
  • the foregoing execution subject may cluster multiple preset poses in multiple ways. For example, the coordinates of the position of each key point can be clustered to obtain the clustering result of each key point.
  • the foregoing step 401 may include the following steps:
  • the preset pose may be represented by a multi-dimensional vector.
  • the vector of each dimension in the multi-dimensional vector corresponds to the coordinates of the position of a key point in the preset pose.
  • One or more cluster centers can be obtained by clustering.
  • the cluster center here is also a multi-dimensional vector.
  • the above-mentioned execution body may form the key points of the pose indicated by this multi-dimensional vector into a key point set.
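  • As a minimal sketch (and an assumption, since the application only states that the preset poses are clustered without naming an algorithm), the snippet below uses k-means over flattened pose vectors and turns each cluster center back into a key point set that can serve as an anchor pose; scikit-learn and the chosen vector layout are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def anchor_poses_from_presets(preset_poses, num_anchors):
    """Cluster preset poses and return the cluster centers as anchor poses.

    preset_poses: (P, K, 2) array of P preset poses, each with K key points (x, y);
                  each pose is flattened into one multi-dimensional vector, one
                  coordinate pair per key point.
    num_anchors:  number of cluster centers, i.e. anchor poses to produce.
    """
    P, K, _ = preset_poses.shape
    vectors = preset_poses.reshape(P, K * 2)
    kmeans = KMeans(n_clusters=num_anchors, n_init=10).fit(vectors)
    # Reshape each cluster-center vector back into K key points, forming a key point set.
    return kmeans.cluster_centers_.reshape(num_anchors, K, 2)
```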
  • Step 402 Determine each key point set as an anchor point pose, where key points included in different key point sets have different positions in the target image.
  • the above-mentioned execution body may determine each of the obtained key point sets as an anchor point posture. In this way, the position of each anchor point posture obtained is more differentiated. At the same time, this embodiment can also cluster multiple preset poses to obtain an accurate anchor point pose. In this way, in the process of detecting the posture, the deviation between the detected candidate posture and the anchor point posture can be reduced.
  • Step 403 Obtain an image of the posture of the labeled object, where the image includes at least two objects, and different objects have different postures, and the posture is indicated by multiple key points.
  • the above-mentioned execution subject may obtain an image of the posture of the labeled object.
  • the object's pose is labeled.
  • the objects here can be people, faces, cats, objects, and so on.
  • the posture can be represented by the coordinates of the key points.
  • step 404 the convolutional neural network is trained based on the image and the annotation of the pose to obtain a trained convolutional neural network.
  • the training process includes steps 4041, 4042, and 4043, as follows:
  • Step 4041 Input the image into the convolutional neural network, and determine the candidate pose of each object based on the previously set anchor pose of the convolutional neural network.
  • the above-mentioned execution body may input the acquired image into the convolutional neural network, so that based on the previously set anchor pose of the convolutional neural network, the candidate pose of each object is obtained by the convolutional neural network.
  • the convolutional neural network includes a region candidate network. The size and position of the anchor poses in the image are fixed.
  • Step 4042 Determine the degree of coincidence between the candidate frame where each candidate pose is located and the annotated frame of the already identified pose, and use the candidate frame whose coincidence is greater than a preset coincidence degree threshold as the target candidate frame.
  • the execution body may determine the degree of coincidence between the candidate frame where each candidate pose is located and the labeled frame of the already labeled pose. After that, the execution body may select a candidate frame whose coincidence degree is greater than a preset coincidence degree threshold, and use the selected candidate frame as a target candidate frame.
  • Step 4043 For each key point in the target candidate frames corresponding to each labeled frame, take the position average value of the key point over the target candidate frames; use the set of position average values of the key points as a pose detected on the image.
  • the execution body may take, for each key point in the target candidate frames corresponding to a labeled frame, the average position of that key point in the candidate poses within those target candidate frames. The execution body may then use the set of position average values of the key points of the target candidate frames corresponding to the labeled frame as a pose detected on the image.
  • the anchor poses obtained in this embodiment are more differentiated, which helps control the number of anchor poses while still obtaining a rich set of anchor poses. In this way, the computation speed of the region candidate network can be increased, and the deviation between the detected candidate poses and the anchor poses can be kept small.
  • this embodiment can also cluster multiple preset poses to obtain an accurate anchor point pose, thereby further reducing the deviation between the detected candidate pose and the anchor point pose.
  • this application provides an embodiment of an image processing device.
  • the device embodiment corresponds to the method embodiment shown in FIG. 2, and the device may specifically be used in various electronic devices.
  • the image processing apparatus 500 in this embodiment includes an obtaining unit 501 and a training unit 502.
  • the obtaining unit 501 is configured to obtain an image of the pose of the labeled object, where the image includes at least two objects, the poses of different objects are different, and the pose is indicated by multiple key points
  • the training unit 502 is configured to train the convolutional neural network based on the image and the pose annotations to obtain a trained convolutional neural network.
  • the training process includes: inputting the image into the convolutional neural network, and determining candidate poses for each object based on the previously set anchor poses of the convolutional neural network; determining the coincidence degree between the candidate frame where each candidate pose is located and the labeled frame of each labeled pose, and using the candidate frames with a coincidence degree greater than a preset coincidence degree threshold as target candidate frames; for each key point in the target candidate frames corresponding to each labeled frame, taking the average position of that key point over the target candidate frames; and using the set of position averages of the key points as a pose detected on the image.
  • the obtaining unit 501 of the image processing apparatus 500 may obtain an image of the pose of the labeled object.
  • the object's pose is labeled.
  • the objects here can be people, faces, cats, objects, and so on.
  • the posture can be represented by the coordinates of the key points. For example, when a person is in a standing posture and a squatting posture, the distance between the coordinates of the nose key point and the coordinates of the toe key point is different.
  • the training unit 502 may input the acquired image into a convolutional neural network, so as to obtain candidate poses of each object from the convolutional neural network based on a previously set anchor pose in the convolutional neural network. Then, a candidate frame with a coincidence degree greater than a preset coincidence degree threshold is selected, and the selected candidate frame is used as a target candidate frame.
  • the above-mentioned execution body may also take, for each key point in the target candidate frame corresponding to each label frame, an average position of the key point in the candidate poses in each target candidate frame corresponding to the label frame.
  • the device further includes: a clustering unit configured to cluster a plurality of preset poses in the target image to obtain a key point set; a determining unit configured to convert Each key point set is determined as an anchor point pose, where the key points included in different key point sets have different positions in the target image.
  • the clustering unit is further configured to: cluster the multi-dimensional vectors corresponding to each preset pose, wherein the number of dimensions of the multi-dimensional vector corresponding to the preset pose and the number of key points of the preset pose Same; the key points of the preset pose corresponding to the multi-dimensional vector of the cluster center are formed into a key point set.
  • the training unit is further configured to: for each key point in each target candidate frame corresponding to each labeled frame, in response to determining that the position of the key point is outside the labeled frame, take the preset first preset weight as the weight of the key point in the target candidate frame; in response to determining that the position of the key point is within the labeled frame, take the preset second preset weight as the weight of the key point in the target candidate frame, where the first preset weight is smaller than the second preset weight; and, based on the weight of the key point in each target candidate frame corresponding to the labeled frame, determine the average position of the key point over the target candidate frames.
  • the training unit is further configured to: for each key point within each target candidate frame corresponding to each labeled frame, determine whether the distance between the key point and the corresponding key point in the labeled pose is less than or equal to a preset distance threshold; and in response to determining that it is, determine the average position of the key point over the target candidate frames based on the weights of the key point in the target candidate frames corresponding to the labeled frame.
  • FIG. 6 illustrates a schematic structural diagram of a computer system 600 suitable for implementing an electronic device according to an embodiment of the present application.
  • the electronic device shown in FIG. 6 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
  • the computer system 600 includes a central processing unit (CPU and/or GPU) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage portion 608 into a random access memory (RAM) 603.
  • the RAM 603 also stores various programs and data required for the operation of the system 600.
  • the central processing unit 601, ROM 602, and RAM 603 are connected to each other through a bus 604.
  • An input / output (I / O) interface 605 is also connected to the bus 604.
  • the following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a LAN card, a modem, and the like.
  • the communication section 609 performs communication processing via a network such as the Internet.
  • a drive 610 is also connected to the I/O interface 605 as necessary.
  • a removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is installed on the drive 610 as needed, so that a computer program read therefrom is installed into the storage section 608 as needed.
  • the process described above with reference to the flowchart may be implemented as a computer software program.
  • embodiments of the present disclosure include a computer program product including a computer program carried on a computer-readable medium, the computer program containing program code for performing a method shown in a flowchart.
  • the computer program may be downloaded and installed from a network through the communication portion 609, and / or installed from a removable medium 611.
  • the computer-readable medium of the present application may be a computer-readable signal medium or a computer-readable storage medium or any combination of the foregoing.
  • the computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and the computer-readable medium may send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device .
  • Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • each block in the flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more functions to implement a specified logical function Executable instructions.
  • the functions noted in the blocks may also occur in a different order than those marked in the drawings. For example, two successively represented boxes may actually be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved.
  • each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts can be implemented by a dedicated hardware-based system that performs the specified function or operation , Or it can be implemented with a combination of dedicated hardware and computer instructions.
  • the units described in the embodiments of the present application may be implemented by software or hardware.
  • the described units may also be provided in a processor, which may, for example, be described as: a processor including an obtaining unit and a training unit. The names of these units do not, in some cases, constitute a limitation on the units themselves.
  • the obtaining unit may also be described as a “unit for obtaining an image of the posture of an already labeled object”.
  • the present application also provides a computer-readable medium, which may be included in the device described in the foregoing embodiments; or may exist alone without being assembled into the device.
  • the computer-readable medium described above carries one or more programs, and when the one or more programs are executed by the device, they cause the device to: obtain an image in which object poses are labeled, where the image includes at least two objects, different objects have different poses, and each pose is indicated by multiple key points.
  • the convolutional neural network is trained to obtain the trained convolutional neural network.
  • the training process includes: inputting the image into the convolutional neural network, and determining candidate poses for each object based on the previously set anchor poses of the convolutional neural network; determining the coincidence degree between the candidate frame where each candidate pose is located and the labeled frame of each labeled pose, and using the candidate frames whose coincidence degree is greater than a preset coincidence degree threshold as target candidate frames; for each key point in the target candidate frames corresponding to each labeled frame, taking the average position of that key point over the target candidate frames; and using the set of position averages of the key points as a pose detected on the image.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application provide an image processing method and apparatus. A specific embodiment of the method includes: acquiring an image in which the poses of the subjects are already labeled; and, on the basis of the images and the pose labeling, training a convolutional neural network to obtain a trained convolutional neural network. The training process includes: inputting the image into the convolutional neural network and, on the basis of preset anchor poses of the convolutional neural network, determining candidate poses of each subject; taking candidate frames whose coincidence degree is greater than a preset coincidence degree as target candidate frames; for each key point in a target candidate frame corresponding to each labeled frame, taking average position values of the key point in each target candidate frame; and taking the set of average position values of the key points as a pose detected for the image. In the present embodiment, the candidate poses are filtered by means of the coincidence degree and the average values of the key points are taken so as to accurately distinguish the poses in an image.
PCT/CN2018/115968 2018-09-29 2018-11-16 Image processing method and apparatus WO2020062493A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811149818.4A CN109389640A (zh) 2018-09-29 2018-09-29 Image processing method and apparatus
CN201811149818.4 2018-09-29

Publications (1)

Publication Number Publication Date
WO2020062493A1 true WO2020062493A1 (fr) 2020-04-02

Family

ID=65418681

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/115968 WO2020062493A1 (fr) 2018-09-29 2018-11-16 Image processing method and apparatus

Country Status (2)

Country Link
CN (1) CN109389640A (fr)
WO (1) WO2020062493A1 (fr)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163841B (zh) * 2019-04-12 2021-05-14 中科微至智能制造科技江苏股份有限公司 Method, apparatus, device and storage medium for detecting object surface defects
US10885625B2 (en) 2019-05-10 2021-01-05 Advanced New Technologies Co., Ltd. Recognizing damage through image analysis
CN110569703B (zh) * 2019-05-10 2020-09-01 阿里巴巴集团控股有限公司 Computer-implemented method and apparatus for identifying damage from pictures
CN110378244B (zh) * 2019-05-31 2021-12-31 曹凯 Method and apparatus for detecting abnormal poses
CN112132913A (zh) * 2019-06-25 2020-12-25 北京字节跳动网络技术有限公司 Image processing method, apparatus, medium and electronic device
CN110738125B (zh) * 2019-09-19 2023-08-01 平安科技(深圳)有限公司 Method, apparatus and storage medium for selecting detection frames using Mask R-CNN
CN110765942A (zh) * 2019-10-23 2020-02-07 睿魔智能科技(深圳)有限公司 Method, apparatus, device and storage medium for annotating image data
CN111695540B (zh) * 2020-06-17 2023-05-30 北京字节跳动网络技术有限公司 Video border recognition method and cropping method, apparatus, electronic device and medium
CN112907583B (zh) * 2021-03-29 2023-04-07 苏州科达科技股份有限公司 Target object pose selection method, image scoring method and model training method
CN112819937B (zh) * 2021-04-19 2021-07-06 清华大学 Adaptive multi-object light field three-dimensional reconstruction method, apparatus and device
CN113326901A (zh) * 2021-06-30 2021-08-31 北京百度网讯科技有限公司 Image annotation method and apparatus

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080187172A1 (en) * 2004-12-02 2008-08-07 Nobuyuki Otsu Tracking Apparatus And Tracking Method
CN106355188A (zh) * 2015-07-13 2017-01-25 阿里巴巴集团控股有限公司 Image detection method and apparatus
CN107358149A (zh) * 2017-05-27 2017-11-17 深圳市深网视界科技有限公司 Human body pose detection method and apparatus
CN107463903A (zh) * 2017-08-08 2017-12-12 北京小米移动软件有限公司 Face key point locating method and apparatus
CN107909005A (zh) * 2017-10-26 2018-04-13 西安电子科技大学 Deep learning-based person pose recognition method in surveillance scenes
CN108229445A (zh) * 2018-02-09 2018-06-29 深圳市唯特视科技有限公司 Multi-person pose estimation method based on a cascaded pyramid network

Also Published As

Publication number Publication date
CN109389640A (zh) 2019-02-26

Similar Documents

Publication Publication Date Title
WO2020062493A1 (fr) Image processing method and apparatus
US11238272B2 (en) Method and apparatus for detecting face image
CN108898086B (zh) Video image processing method and apparatus, computer-readable medium and electronic device
CN108898186B (zh) Method and apparatus for extracting images
CN108509915B (zh) Method and apparatus for generating a face recognition model
CN109584276B (zh) Key point detection method, apparatus, device and readable medium
CN108256479B (zh) Face tracking method and apparatus
CN108416310B (zh) Method and apparatus for generating information
US11436863B2 (en) Method and apparatus for outputting data
CN109993150B (zh) Method and apparatus for identifying age
CN109389072B (zh) Data processing method and apparatus
CN109344762B (zh) Image processing method and apparatus
WO2020029466A1 (fr) Image processing method and apparatus
EP3872764B1 (fr) Map construction method and apparatus
WO2020056902A1 (fr) Method and apparatus for processing mouth image
WO2019149186A1 (fr) Method and apparatus for generating information
CN108509921B (zh) Method and apparatus for generating information
CN110059623B (zh) Method and apparatus for generating information
CN110163171B (zh) Method and apparatus for recognizing face attributes
CN108388889B (zh) Method and apparatus for analyzing face images
CN108229375B (zh) Method and apparatus for detecting face images
WO2019080702A1 (fr) Image processing method and apparatus
US11210563B2 (en) Method and apparatus for processing image
WO2020238321A1 (fr) Age identification method and device
WO2020034981A1 (fr) Method for generating encoded information and method for recognizing encoded information

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18935375

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 30.06.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18935375

Country of ref document: EP

Kind code of ref document: A1