CN109389640A - Image processing method and device - Google Patents

Image processing method and device

Info

Publication number
CN109389640A
CN109389640A (application CN201811149818.4A)
Authority
CN
China
Prior art keywords
key point
posture
candidate frame
target candidate
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811149818.4A
Other languages
Chinese (zh)
Inventor
胡耀全
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN201811149818.4A priority Critical patent/CN109389640A/en
Priority to PCT/CN2018/115968 priority patent/WO2020062493A1/en
Publication of CN109389640A publication Critical patent/CN109389640A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 — Image analysis
    • G06T7/70 — Determining position or orientation of objects or cameras
    • G06T2207/00 — Indexing scheme for image analysis or image enhancement
    • G06T2207/20 — Special algorithmic details
    • G06T2207/20081 — Training; Learning
    • G06T2207/20084 — Artificial neural networks [ANN]
    • G06T2207/30 — Subject of image; Context of image processing
    • G06T2207/30196 — Human being; Person

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application disclose an image processing method and device. One specific embodiment of the method includes: obtaining an image in which the postures of objects have been annotated; training a convolutional neural network based on the image and the posture annotations to obtain a trained convolutional neural network, the training process including: inputting the image into the convolutional neural network and, based on anchor postures preset in the network, determining the candidate postures of each object; taking candidate boxes whose overlap with an annotation box exceeds a preset overlap threshold as target candidate boxes; for each keypoint in the target candidate boxes corresponding to each annotation box, taking the mean position of that keypoint across the target candidate boxes; and taking the set of keypoint mean positions as the posture detected in the image. By filtering the candidate postures through overlap and averaging the keypoint positions, this embodiment accurately distinguishes each posture in the image.

Description

Image processing method and device
Technical field
Embodiments of the present application relate to the field of computer technology, in particular to the field of Internet technology, and more particularly to an image processing method and device.
Background technique
When detecting human-body keypoints, it is sometimes necessary to detect the keypoints of a single person, and sometimes the keypoints of each person among several people. In the related art, when detecting the keypoints of every person in a multi-person image, it is generally difficult to obtain accurate detection results.
Summary of the invention
Embodiments of the present application propose an image processing method and device.
In a first aspect, an embodiment of the present application provides an image processing method, comprising: obtaining an image in which the postures of objects have been annotated, wherein the image contains at least two objects, different objects have different postures, and each posture is indicated by a plurality of keypoints; training a convolutional neural network based on the image and the posture annotations to obtain a trained convolutional neural network, the training process comprising: inputting the image into the convolutional neural network and, based on anchor postures preset in the network, determining the candidate postures of each object; determining the overlap between the candidate box enclosing each candidate posture and the annotation box of an annotated posture, and taking candidate boxes whose overlap exceeds a preset overlap threshold as target candidate boxes; for each keypoint in the target candidate boxes corresponding to each annotation box, taking the mean position of that keypoint across the target candidate boxes; and taking the set of keypoint mean positions as the posture detected in the image.
In some embodiments, before inputting the image into the convolutional neural network and determining the candidate postures of each object based on the preset anchor postures, the method further includes: clustering a plurality of preset postures in a target image to obtain keypoint sets; and determining each keypoint set as an anchor posture, wherein the keypoints contained in different keypoint sets occupy different positions in the target image.
In some embodiments, clustering the plurality of preset postures in the target image to obtain keypoint sets includes: clustering the multi-dimensional vectors corresponding to the preset postures, wherein the number of dimensions of the multi-dimensional vector corresponding to a preset posture is the same as the number of keypoints of that posture; and forming a keypoint set from the keypoints of the posture corresponding to each cluster centre's multi-dimensional vector.
In some embodiments, for each keypoint of the target candidate boxes corresponding to each annotation box, taking the mean position of the keypoint across the target candidate boxes includes: for each keypoint in each target candidate box corresponding to each annotation box, in response to determining that the position of the keypoint lies outside the annotation box, using a preset first weight as the weight of the keypoint in that target candidate box; in response to determining that the position of the keypoint lies within the annotation box, using a preset second weight as the weight of the keypoint in that target candidate box, the first weight being smaller than the second weight; and determining the mean position of the keypoint across the target candidate boxes based on the weights of the keypoint in each target candidate box corresponding to the annotation box.
In some embodiments, for each keypoint of the target candidate boxes corresponding to each annotation box, taking the mean position of the keypoint across the target candidate boxes includes: for each keypoint in each target candidate box corresponding to each annotation box, determining whether the distance between the keypoint and the corresponding keypoint in the annotated posture is less than or equal to a preset distance threshold; and, in response to determining that it is, determining the mean position of the keypoint across the target candidate boxes based on the weights of the keypoint in each target candidate box corresponding to the annotation box.
In a second aspect, an embodiment of the present application provides an image processing device, comprising: an acquiring unit configured to obtain an image in which the postures of objects have been annotated, wherein the image contains at least two objects, different objects have different postures, and each posture is indicated by a plurality of keypoints; and a training unit configured to train a convolutional neural network based on the image and the posture annotations to obtain a trained convolutional neural network, the training process comprising: inputting the image into the convolutional neural network and, based on anchor postures preset in the network, determining the candidate postures of each object; determining the overlap between the candidate box enclosing each candidate posture and the annotation box of an annotated posture, and taking candidate boxes whose overlap exceeds a preset overlap threshold as target candidate boxes; for each keypoint in the target candidate boxes corresponding to each annotation box, taking the mean position of the keypoint across the target candidate boxes; and taking the set of keypoint mean positions as the posture detected in the image.
In some embodiments, the device further includes: a clustering unit configured to cluster a plurality of preset postures in a target image to obtain keypoint sets; and a determining unit configured to determine each keypoint set as an anchor posture, wherein the keypoints contained in different keypoint sets occupy different positions in the target image.
In some embodiments, the clustering unit is further configured to: cluster the multi-dimensional vectors corresponding to the preset postures, wherein the number of dimensions of the multi-dimensional vector corresponding to a preset posture is the same as the number of keypoints of that posture; and form a keypoint set from the keypoints of the preset posture corresponding to each cluster centre's multi-dimensional vector.
In some embodiments, the training unit is further configured to: for each keypoint in each target candidate box corresponding to each annotation box, in response to determining that the position of the keypoint lies outside the annotation box, use a preset first weight as the weight of the keypoint in that target candidate box; in response to determining that the position of the keypoint lies within the annotation box, use a preset second weight as the weight of the keypoint in that target candidate box, the first weight being smaller than the second weight; and determine the mean position of the keypoint across the target candidate boxes based on the weights of the keypoint in each target candidate box corresponding to the annotation box.
In some embodiments, the training unit is further configured to: for each keypoint in each target candidate box corresponding to each annotation box, determine whether the distance between the keypoint and the corresponding keypoint in the annotated posture is less than or equal to a preset distance threshold; and, in response to determining that it is, determine the mean position of the keypoint across the target candidate boxes based on the weights of the keypoint in each target candidate box corresponding to the annotation box.
In a third aspect, an embodiment of the present application provides an electronic device, comprising: one or more processors; and a storage device storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of any embodiment of the image processing method.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, the program implementing the method of any embodiment of the image processing method when executed by a processor.
The image processing scheme provided by the embodiments of the present application first obtains an image in which the postures of objects have been annotated, wherein the image contains at least two objects, different objects have different postures, and each posture is indicated by a plurality of keypoints. A convolutional neural network is then trained based on the image and the posture annotations to obtain a trained convolutional neural network. The training process includes: inputting the image into the convolutional neural network and, based on the anchor postures preset in the network, determining the candidate postures of each object; determining the overlap between the candidate box enclosing each candidate posture and the annotation box of the annotated posture, and taking candidate boxes whose overlap exceeds a preset overlap threshold as target candidate boxes; for each keypoint in the target candidate boxes corresponding to each annotation box, taking the mean position of the keypoint across the target candidate boxes; and, finally, taking the set of keypoint mean positions as the posture detected in the image. From an image containing at least two objects, this embodiment can filter the candidate postures by overlap to choose target candidate boxes that indicate objects more accurately, and, by taking the mean of the keypoints, accurately distinguish each posture in the image.
Brief description of the drawings
Other features, objects and advantages of the present application will become more apparent upon reading the following detailed description of non-restrictive embodiments with reference to the accompanying drawings:
Fig. 1 is an exemplary system architecture diagram to which the present application may be applied;
Fig. 2 is a flow chart of one embodiment of the image processing method according to the present application;
Fig. 3 is a schematic diagram of an application scenario of the image processing method according to the present application;
Fig. 4 is a flow chart of another embodiment of the image processing method according to the present application;
Fig. 5 is a structural schematic diagram of one embodiment of the image processing device according to the present application;
Fig. 6 is a structural schematic diagram of a computer system adapted to implement an electronic device of the embodiments of the present application.
Detailed description of the embodiments
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the relevant invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the relevant invention.
It should be noted that, in the absence of conflict, the embodiments of the present application and the features therein may be combined with one another. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 in which embodiments of the image processing method or image processing device of the present application may be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102 and 103, a network 104 and a server 105. The network 104 serves as the medium providing communication links between the terminal devices 101, 102, 103 and the server 105, and may include various connection types, such as wired or wireless communication links or fibre-optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as image processing applications, video applications, live-streaming applications, instant messaging tools, mailbox clients and social platform software.
The terminal devices 101, 102, 103 here may be hardware or software. When they are hardware, they may be various electronic devices with a display screen, including but not limited to smartphones, tablet computers, e-book readers, laptop portable computers, desktop computers and so on. When they are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
The server 105 may be a server providing various services, for example a background server providing support to the terminal devices 101, 102, 103. The background server may analyse and otherwise process data such as the obtained images with annotated object postures, and feed the processing result (for example, a posture detected in the image) back to a terminal device.
It should be noted that the image processing method provided by the embodiments of the present application may be executed by the server 105 or by the terminal devices 101, 102, 103; correspondingly, the image processing device may be arranged in the server 105 or in the terminal devices 101, 102, 103.
It should be understood that the numbers of terminal devices, networks and servers in Fig. 1 are merely illustrative. Any number of terminal devices, networks and servers may be provided according to implementation needs.
With continued reference to Fig. 2, a process 200 of one embodiment of the image processing method according to the present application is shown. The image processing method comprises the following steps:
Step 201: obtain an image in which the postures of objects have been annotated, wherein the image contains at least two objects, different objects have different postures, and each posture is indicated by a plurality of keypoints.
In this embodiment, the executing body of the image processing method (for example the server or terminal device shown in Fig. 1) may obtain an image in which the postures of objects have been annotated. In the image, the postures of the objects are marked out. An object here may be a person, a face, a cat, an article and so on. Specifically, a posture may be indicated by the coordinates of keypoints. For example, when a person is standing versus squatting, the distance between the coordinates of the nose keypoint and those of the toe keypoint differs.
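The keypoint-coordinate representation of a posture described above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the keypoint names and coordinates are invented for the example.

```python
# A posture as a mapping from keypoint name to (x, y) image coordinates.
# Different postures of the same object differ in where the keypoints lie.
standing = {"nose": (50.0, 10.0), "toe": (50.0, 180.0)}
squatting = {"nose": (50.0, 90.0), "toe": (50.0, 180.0)}

def keypoint_distance(pose, a, b):
    """Euclidean distance between two keypoints of one posture."""
    (ax, ay), (bx, by) = pose[a], pose[b]
    return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5

# As in the text, the nose-to-toe distance distinguishes standing
# from squatting: 170.0 vs 90.0 in this made-up example.
```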
Step 202: train a convolutional neural network based on the image and the posture annotations to obtain a trained convolutional neural network. The training process includes step 2021, step 2022 and step 2023, as follows:
Step 2021: input the image into the convolutional neural network and, based on the anchor postures preset in the network, determine the candidate postures of each object.
In this embodiment, the executing body may input the obtained image into the convolutional neural network, and thereby, based on the anchor postures preset in the network, obtain the candidate postures (proposals) of each object from the network. Specifically, the convolutional neural network includes a Region Proposal Network (RPN). The sizes and positions of the anchor postures (anchors) in the image are fixed. The executing body may input the image into the region proposal network, which determines the differences in size and position between each candidate posture and an anchor posture, and uses these differences to indicate the size and position of each candidate posture. Size here may be expressed by area, or by width and height (or length and width); position may be expressed by coordinates. For each object in the image, the executing body may determine multiple candidate postures.
During training, the executing body may take the posture output by the convolutional neural network as the posture detected in the image and, based on a preset loss function, determine the loss value between the detected posture and the annotated posture. The network is then trained using this loss value to obtain the trained convolutional neural network.
Step 2022: determine the overlap between the candidate box enclosing each candidate posture and the annotation box of an annotated posture, and take candidate boxes whose overlap exceeds a preset overlap threshold as target candidate boxes.
In this embodiment, the executing body may determine the overlap (Intersection over Union, IoU) between the candidate box enclosing each candidate posture and the annotation box of the annotated posture. The executing body may then select candidate boxes whose overlap exceeds the preset overlap threshold, and take the selected candidate boxes as target candidate boxes. Specifically, the width and height of a posture's box may be the width (or length) spanned by the leftmost and rightmost coordinates of the posture's keypoints, and the height (or width) spanned by the topmost and bottommost coordinates. The overlap may be the ratio of the intersection of a candidate box and an annotation box to their union. If the overlap between a candidate box and an annotation box is large, the candidate box encloses the object more accurately, so the candidate box can more accurately separate the object from what is not the object.
Step 2023: for each keypoint in the target candidate boxes corresponding to each annotation box, take the mean position of the keypoint across the target candidate boxes; take the set of keypoint mean positions as the posture detected in the image.
In this embodiment, for each keypoint in the target candidate boxes corresponding to each annotation box, the executing body may take the mean position of the keypoint across the target candidate boxes corresponding to that annotation box. The executing body may thus take the set of keypoint mean positions of the target candidate boxes corresponding to the annotation box as the posture detected in the image. A corresponding annotation box and target candidate box indicate the same object.
Specifically, the positions of the keypoints may be given identical weights when calculating the position mean. Alternatively, the weights assigned to the positions of different keypoints may differ.
It should be noted that, although a position mean is taken for each keypoint of the posture in the target candidate boxes, this embodiment does not exclude the possibility that the keypoints in some target candidate boxes do not participate in the position mean.
In some optional implementations of this embodiment, taking the mean position of a keypoint across at least two target candidate boxes in step 2023, for each keypoint in the target candidate boxes corresponding to each annotation box, may include:
For each keypoint in each target candidate box corresponding to each annotation box, in response to determining that the position of the keypoint lies outside the annotation box, using a preset first weight as the weight of the keypoint in that target candidate box; in response to determining that the position of the keypoint lies within the annotation box, using a preset second weight as the weight of the keypoint in that target candidate box, the first weight being smaller than the second weight; and determining the mean position of the keypoint across the target candidate boxes based on the weights of the keypoint in each target candidate box corresponding to the annotation box.
In these optional implementations, when calculating the position mean, the executing body may use a smaller weight for the coordinates of positions outside the annotation box and a larger weight for the coordinates of positions within it. For example, if keypoint A, keypoint B and keypoint C lie inside, inside and outside the annotation box respectively, the position mean may be calculated using weights 1, 1 and 0.5 for keypoints A, B and C respectively. The resulting position mean is (1 × position of A + 1 × position of B + 0.5 × position of C) / (1 + 1 + 0.5).
These implementations can weight different target candidate boxes differentially. Because keypoints outside the annotation box are often less accurate, this weighting scheme can reduce the weight of those keypoints to obtain a more accurate keypoint position mean, and thereby determine the posture accurately.
In some optional implementations of this embodiment, taking the mean position of a keypoint across at least two target candidate boxes in step 2023, for each keypoint in the target candidate boxes corresponding to each annotation box, may include:
For each keypoint in each target candidate box corresponding to each annotation box, determining whether the distance between the keypoint and the corresponding keypoint in the annotated posture is less than or equal to a preset distance threshold; and, in response to determining that it is, determining the mean position of the keypoint across the target candidate boxes based on the weights of the keypoint in each target candidate box corresponding to the annotation box.
In these optional implementations, the executing body may determine whether the distance between each keypoint in each target candidate box corresponding to an annotation box and the corresponding keypoint in the posture annotated in that annotation box is less than or equal to a preset distance threshold, and accordingly accept or reject the keypoint in each target candidate box corresponding to that annotation box. That is, in these implementations the keypoints in some target candidate boxes do not participate in the position mean. Specifically, if the distance between a keypoint in a target candidate box corresponding to the annotation box and the annotated keypoint is small, it may be determined that the keypoint participates in calculating the position mean. If the distance between a keypoint in a target candidate box corresponding to the annotation box and the annotated keypoint is large, the accuracy of that keypoint in the candidate posture obtained by the convolutional neural network is poor, and it may be determined that the keypoint does not participate in calculating the position mean.
For example, suppose three target candidate boxes a, b and c correspond to an annotation box M and each contains a nose keypoint, whose distances from the nose keypoint annotated in M are 1, 2 and 3 respectively. If the preset distance threshold is 2.5, the distances 1 and 2 corresponding to target candidate boxes a and b are both below the threshold, so the nose keypoints in a and b may participate in calculating the position mean.
These implementations can select, from the keypoints in the target candidate boxes corresponding to an annotation box, those closer to the annotation to determine the position mean, avoiding the participation of keypoints with larger deviations and thereby improving the accuracy of the determined posture.
With continued reference to Fig. 3, a schematic diagram of an application scenario of the image processing method according to this embodiment is shown. In the application scenario of Fig. 3, an executing body 301 may obtain an image 302 in which the postures of objects have been annotated, wherein the image contains at least two objects, different objects have different postures, and each posture is indicated by a plurality of keypoints. Based on the image and the posture annotations, a convolutional neural network is trained to obtain a trained convolutional neural network; the training process includes: inputting the image into the convolutional neural network and, based on the anchor postures 303 preset in the network, determining the candidate postures 304 of each object; determining the overlap between the candidate box enclosing each candidate posture and the annotation box of the annotated posture, and taking candidate boxes whose overlap exceeds a preset overlap threshold as target candidate boxes 305; for each keypoint in the target candidate boxes corresponding to each annotation box, taking the position mean 306 of the keypoint across the target candidate boxes; and taking the set of keypoint position means as the posture 307 detected in the image.
From an image containing at least two objects, this embodiment can filter the candidate postures by overlap to choose target candidate boxes that indicate objects more accurately, and, by taking the mean of the keypoints, accurately distinguish each posture in the image.
With further reference to Fig. 4, a process 400 of another embodiment of the image processing method is illustrated. The process 400 of this image processing method comprises the following steps:
Step 401: cluster a plurality of preset postures in a target image to obtain keypoint sets.
In this embodiment, the executing body on which the image processing method runs (for example the server or terminal device shown in Fig. 1) may obtain a target image and cluster a plurality of preset postures in the target image to obtain keypoint sets. Specifically, the executing body may cluster the preset postures in various ways; for example, the coordinates of the position of each keypoint may be clustered to obtain a clustering result for each keypoint.
In some optional implementations of this embodiment, step 401 above may comprise the following steps:
Cluster the multi-dimensional vectors corresponding to the preset postures, wherein the number of dimensions of the multi-dimensional vector corresponding to a preset posture is the same as the number of keypoints of the preset posture; form a keypoint set from the keypoints of the preset posture corresponding to each cluster centre's multi-dimensional vector.
In these implementations, a preset posture may be indicated by a multi-dimensional vector, each dimension of which corresponds to the position coordinates of one keypoint of the preset posture. Clustering yields one or more cluster centres, each of which is also a multi-dimensional vector. The executing body may form a keypoint set from the keypoints of the posture indicated by each such multi-dimensional vector.
Step 402, each set of keypoints is determined as anchor point posture, wherein pass included by different set of keypoints The position of key point in the target image is different.
In this embodiment, the executing body may determine each obtained keypoint set as an anchor posture. In this way, the positions of the obtained anchor postures are well differentiated. At the same time, this embodiment can cluster a plurality of preset postures to obtain accurate anchor postures, which can reduce the deviation between the detected candidate postures and the anchor postures during posture detection.
Step 403: an image in which the poses of objects have been annotated is acquired, where the image contains at least two objects, the poses of different objects are different, and a pose is represented by a plurality of keypoints.
In this embodiment, the executing entity may acquire an image in which the poses of objects have been annotated. In the image, the poses of the objects are marked out. Here an object may be a person, a face, a cat, an article, and so on. Specifically, a pose may be represented by the coordinates of its keypoints.
Step 404: based on the image and the pose annotations, a convolutional neural network is trained to obtain the trained convolutional neural network. The training process includes steps 4041, 4042, and 4043, as follows:
Step 4041: the image is input into the convolutional neural network, and the candidate pose of each object is determined based on the anchor poses previously set for the convolutional neural network.
In this embodiment, the executing entity may input the acquired image into the convolutional neural network, so that the candidate pose of each object is obtained by the network based on its previously set anchor poses. Specifically, the convolutional neural network includes a region proposal network, and the size and position of each anchor pose in the image are fixed.
Step 4042: the degree of overlap between the candidate box where each candidate pose is located and the annotation box of an annotated pose is determined, and candidate boxes whose degree of overlap is greater than a preset overlap threshold are taken as target candidate boxes.
In this embodiment, the executing entity may determine the degree of overlap between the candidate box where each candidate pose is located and the annotation box of the annotated pose. The executing entity may then select the candidate boxes whose degree of overlap is greater than the preset overlap threshold and use them as target candidate boxes.
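By way of a non-limiting illustration, the degree of overlap in step 4042 can be computed as an intersection-over-union of the candidate box and the annotation box. The function names and the 0.5 threshold below are assumptions of this sketch, not values fixed by the disclosure.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def select_target_boxes(candidate_boxes, annotation_box, thresh=0.5):
    """Keep the candidate boxes whose overlap with the annotation box
    exceeds the preset overlap threshold."""
    return [c for c in candidate_boxes if iou(c, annotation_box) > thresh]

annotation = (0, 0, 10, 10)
candidates = [(1, 1, 11, 11), (20, 20, 30, 30)]
targets = select_target_boxes(candidates, annotation)
```

Here the first candidate overlaps the annotation box by 81/119 ≈ 0.68 and is kept; the disjoint second candidate is discarded.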
Step 4043: for each keypoint in the target candidate boxes corresponding to each annotation box, the mean position of that keypoint over the target candidate boxes is taken; the set of the mean positions of the keypoints serves as the pose detected from the image.
In this embodiment, for each keypoint in the target candidate boxes corresponding to an annotation box, the executing entity may take the mean of that keypoint's positions, in the candidate poses, over the target candidate boxes corresponding to the annotation box. The executing entity may then use the set of the mean keypoint positions of the target candidate boxes corresponding to the annotation box as the pose detected from the image.
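The keypoint-wise averaging of step 4043 can be sketched as follows; the function name and the toy data are illustrative assumptions, not part of the disclosure. For one annotation box, each keypoint of the detected pose is the mean of that keypoint's positions over the candidate poses of the target candidate boxes.

```python
def average_pose(candidate_poses):
    """candidate_poses: one pose per target candidate box of a single
    annotation box, each pose a list of (x, y) keypoints.  Returns the
    keypoint-wise mean pose, i.e. the pose detected for that object."""
    n_keypoints = len(candidate_poses[0])
    detected = []
    for k in range(n_keypoints):
        xs = [pose[k][0] for pose in candidate_poses]
        ys = [pose[k][1] for pose in candidate_poses]
        detected.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return detected

# two candidate poses with two keypoints each
detected = average_pose([[(0, 0), (2, 2)], [(2, 0), (4, 2)]])
```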
The anchor poses obtained in this embodiment are well differentiated, which helps to obtain a rich set of anchor poses while keeping their number under control. In this way, the operation speed of the region proposal network can be improved while ensuring that the deviation between the detected candidate poses and the anchor poses is small. Moreover, this embodiment clusters a plurality of preset poses to obtain accurate anchor poses, further reducing the deviation between the detected candidate poses and the anchor poses.
With further reference to Fig. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an image processing apparatus. This apparatus embodiment corresponds to the method embodiment shown in Fig. 2, and the apparatus may be applied to various electronic devices.
As shown in Fig. 5, the image processing apparatus 500 of this embodiment includes an acquiring unit 501 and a training unit 502. The acquiring unit 501 is configured to acquire an image in which the poses of objects have been annotated, where the image contains at least two objects, the poses of different objects are different, and a pose is represented by a plurality of keypoints. The training unit 502 is configured to train a convolutional neural network based on the image and the pose annotations to obtain the trained convolutional neural network, the training process including: inputting the image into the convolutional neural network and determining the candidate pose of each object based on the anchor poses previously set for the network; determining the degree of overlap between the candidate box where each candidate pose is located and the annotation box of an annotated pose, and taking candidate boxes whose degree of overlap is greater than a preset overlap threshold as target candidate boxes; for each keypoint in the target candidate boxes corresponding to each annotation box, taking the mean position of that keypoint over the target candidate boxes; and using the set of the mean keypoint positions as the pose detected from the image.
In some embodiments, the acquiring unit 501 of the image processing apparatus 500 may acquire an image in which the poses of objects have been annotated. In the image, the poses of the objects are marked out. Here an object may be a person, a face, a cat, an article, and so on. Specifically, a pose may be represented by the coordinates of its keypoints. For example, when a person is in a standing pose versus a squatting pose, the distance between the coordinate of the nose keypoint and that of the toe keypoint differs.
In some embodiments, the training unit 502 may input the acquired image into the convolutional neural network, so that the candidate pose of each object is obtained by the network based on its previously set anchor poses. It may then select the candidate boxes whose degree of overlap is greater than the preset overlap threshold and use them as target candidate boxes. For each keypoint in the target candidate boxes corresponding to an annotation box, the training unit may further take the mean of that keypoint's positions, in the candidate poses, over the target candidate boxes corresponding to the annotation box.
In some optional implementations of this embodiment, the apparatus further includes: a clustering unit configured to cluster a plurality of preset poses in a target image to obtain keypoint sets; and a determining unit configured to determine each keypoint set as an anchor pose, where the positions in the target image of the keypoints contained in different keypoint sets are different.
In some embodiments, the clustering unit is further configured to: cluster the multi-dimensional vector corresponding to each preset pose, where the number of dimensions of the multi-dimensional vector corresponding to a preset pose is the same as the number of keypoints of the preset pose; and assemble the keypoints of the preset pose corresponding to a cluster-centre multi-dimensional vector into a keypoint set.
In some optional implementations of this embodiment, the training unit is further configured to: for each keypoint in each target candidate box corresponding to each annotation box, in response to determining that the position of the keypoint is outside the annotation box, use a preset first weight as the weight of the keypoint in that target candidate box; in response to determining that the position of the keypoint is inside the annotation box, use a preset second weight as the weight of the keypoint in that target candidate box, the first weight being less than the second weight; and determine the mean position of the keypoint over the target candidate boxes based on the weight of the keypoint in each target candidate box corresponding to the annotation box.
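This weighted variant can be sketched as follows. The weight values 0.2 and 1.0 and the function names are illustrative assumptions; the only constraint stated above is that the first weight (outside the annotation box) is less than the second weight (inside it).

```python
def inside(box, point):
    """True if the point lies within the axis-aligned box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return x1 <= point[0] <= x2 and y1 <= point[1] <= y2

def weighted_keypoint_mean(positions, annotation_box, w_outside=0.2, w_inside=1.0):
    """Weighted mean of one keypoint's positions over the target candidate
    boxes: positions outside the annotation box get the smaller first
    weight, positions inside it get the larger second weight."""
    weights = [w_inside if inside(annotation_box, p) else w_outside
               for p in positions]
    total = sum(weights)
    return (sum(w * p[0] for w, p in zip(weights, positions)) / total,
            sum(w * p[1] for w, p in zip(weights, positions)) / total)

# the position at (20, 20) lies outside the box and is down-weighted
merged = weighted_keypoint_mean([(5, 5), (20, 20)], (0, 0, 10, 10))
```

Down-weighting out-of-box positions pulls the merged keypoint toward the candidates that agree with the annotation.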
In some optional implementations of this embodiment, the training unit is further configured to: for each keypoint in each target candidate box corresponding to each annotation box, determine whether the distance between the keypoint and the corresponding keypoint in the annotated pose is less than or equal to a preset distance threshold; and, in response to determining that it is, determine the mean position of the keypoint over the target candidate boxes based on the weight of the keypoint in each target candidate box corresponding to the annotation box.
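The distance gate described above can be sketched as follows. The function name and the 5.0 threshold are illustrative assumptions, and for simplicity the surviving positions are averaged with equal weight here rather than with the per-box weights.

```python
import math

def filtered_mean(positions, annotated_keypoint, dist_thresh=5.0):
    """Average one keypoint's positions over the target candidate boxes,
    discarding positions farther than dist_thresh from the annotated
    keypoint; returns None if nothing survives the gate."""
    kept = [p for p in positions
            if math.dist(p, annotated_keypoint) <= dist_thresh]
    if not kept:
        return None
    return (sum(p[0] for p in kept) / len(kept),
            sum(p[1] for p in kept) / len(kept))

# (30, 40) is 50 units from the annotated keypoint and is discarded
merged = filtered_mean([(1, 1), (3, 4), (30, 40)], (0, 0))
```

Gating by distance keeps gross outliers from dragging the averaged keypoint away from the annotated position.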
Referring now to Fig. 6, it shows a structural schematic diagram of a computer system 600 of an electronic device suitable for implementing the embodiments of the present application. The electronic device shown in Fig. 6 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in Fig. 6, the computer system 600 includes a central processing unit (CPU and/or GPU) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage portion 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the system 600. The central processing unit 601, the ROM 602, and the RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a LAN card or a modem. The communication portion 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read therefrom can be installed into the storage portion 608 as needed.
In particular, according to the embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 609 and/or installed from the removable medium 611. When the computer program is executed by the central processing unit 601, the above-described functions defined in the method of the present application are performed. It should be noted that the computer-readable medium of the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program, which may be used by or in connection with an instruction execution system, apparatus, or device. In the present application, a computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions, and operations of the systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks therein, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, a processor may be described as including an acquiring unit and a training unit. Under certain circumstances, the names of these units do not limit the units themselves; for example, the acquiring unit may also be described as "a unit for acquiring an image in which the poses of objects have been annotated".
As another aspect, the present application also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquire an image in which the poses of objects have been annotated, where the image contains at least two objects, the poses of different objects are different, and a pose is represented by a plurality of keypoints; and train a convolutional neural network based on the image and the pose annotations to obtain the trained convolutional neural network, the training process including: inputting the image into the convolutional neural network and determining the candidate pose of each object based on the anchor poses previously set for the network; determining the degree of overlap between the candidate box where each candidate pose is located and the annotation box of an annotated pose, and taking candidate boxes whose degree of overlap is greater than a preset overlap threshold as target candidate boxes; for each keypoint in the target candidate boxes corresponding to each annotation box, taking the mean position of that keypoint over the target candidate boxes; and using the set of the mean keypoint positions as the pose detected from the image.
The above description is only a preferred embodiment of the present application and an explanation of the applied technical principles. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present application.

Claims (12)

1. An image processing method, comprising:
acquiring an image in which the poses of objects have been annotated, wherein the image contains at least two objects, the poses of different objects are different, and a pose is represented by a plurality of keypoints;
training a convolutional neural network based on the image and the annotations of the poses, to obtain a trained convolutional neural network, the training process comprising:
inputting the image into the convolutional neural network, and determining a candidate pose of each object based on anchor poses previously set for the convolutional neural network;
determining a degree of overlap between the candidate box where each candidate pose is located and the annotation box of an annotated pose, and taking candidate boxes whose degree of overlap is greater than a preset overlap threshold as target candidate boxes;
for each keypoint in the target candidate boxes corresponding to each annotation box, taking the mean position of the keypoint over the target candidate boxes; and using the set of the mean positions of the keypoints as the pose detected from the image.
2. The method according to claim 1, wherein before the inputting of the image into the convolutional neural network and the determining of the candidate pose of each object based on the anchor poses previously set for the convolutional neural network, the method further comprises:
clustering a plurality of preset poses in a target image to obtain keypoint sets;
determining each keypoint set as an anchor pose, wherein the positions in the target image of the keypoints contained in different keypoint sets are different.
3. The method according to claim 2, wherein the clustering of the plurality of preset poses in the target image to obtain the keypoint sets comprises:
clustering a multi-dimensional vector corresponding to each preset pose, wherein the number of dimensions of the multi-dimensional vector corresponding to a preset pose is the same as the number of keypoints of the preset pose;
forming a keypoint set from the keypoints of the pose corresponding to a cluster-centre multi-dimensional vector.
4. The method according to claim 1, wherein the taking, for each keypoint in the target candidate boxes corresponding to each annotation box, of the mean position of the keypoint in the candidate poses over the target candidate boxes comprises:
for each keypoint in each target candidate box corresponding to each annotation box, in response to determining that the position of the keypoint is outside the annotation box, using a preset first weight as the weight of the keypoint in the target candidate box; in response to determining that the position of the keypoint is inside the annotation box, using a preset second weight as the weight of the keypoint in the target candidate box, the first weight being less than the second weight; and determining the mean position of the keypoint over the target candidate boxes based on the weight of the keypoint in each target candidate box corresponding to the annotation box.
5. The method according to claim 1, wherein the taking, for each keypoint in the target candidate boxes corresponding to each annotation box, of the mean position of the keypoint in the candidate poses over the target candidate boxes comprises:
for each keypoint in each target candidate box corresponding to each annotation box, determining whether the distance between the keypoint and the corresponding keypoint in the annotated pose is less than or equal to a preset distance threshold; and, in response to determining that it is, determining the mean position of the keypoint over the target candidate boxes based on the weight of the keypoint in each target candidate box corresponding to the annotation box.
6. An image processing apparatus, comprising:
an acquiring unit configured to acquire an image in which the poses of objects have been annotated, wherein the image contains at least two objects, the poses of different objects are different, and a pose is represented by a plurality of keypoints;
a training unit configured to train a convolutional neural network based on the image and the annotations of the poses, to obtain a trained convolutional neural network, the training process comprising:
inputting the image into the convolutional neural network, and determining a candidate pose of each object based on anchor poses previously set for the convolutional neural network; determining a degree of overlap between the candidate box where each candidate pose is located and the annotation box of an annotated pose, and taking candidate boxes whose degree of overlap is greater than a preset overlap threshold as target candidate boxes; for each keypoint in the target candidate boxes corresponding to each annotation box, taking the mean position of the keypoint over the target candidate boxes; and using the set of the mean positions of the keypoints as the pose detected from the image.
7. The apparatus according to claim 6, wherein the apparatus further comprises:
a clustering unit configured to cluster a plurality of preset poses in a target image to obtain keypoint sets;
a determining unit configured to determine each keypoint set as an anchor pose, wherein the positions in the target image of the keypoints contained in different keypoint sets are different.
8. The apparatus according to claim 7, wherein the clustering unit is further configured to:
cluster a multi-dimensional vector corresponding to each preset pose, wherein the number of dimensions of the multi-dimensional vector corresponding to a preset pose is the same as the number of keypoints of the preset pose;
form a keypoint set from the keypoints of the preset pose corresponding to a cluster-centre multi-dimensional vector.
9. The apparatus according to claim 6, wherein the training unit is further configured to:
for each keypoint in each target candidate box corresponding to each annotation box, in response to determining that the position of the keypoint is outside the annotation box, use a preset first weight as the weight of the keypoint in the target candidate box; in response to determining that the position of the keypoint is inside the annotation box, use a preset second weight as the weight of the keypoint in the target candidate box, the first weight being less than the second weight; and determine the mean position of the keypoint over the target candidate boxes based on the weight of the keypoint in each target candidate box corresponding to the annotation box.
10. The apparatus according to claim 6, wherein the training unit is further configured to:
for each keypoint in each target candidate box corresponding to each annotation box, determine whether the distance between the keypoint and the corresponding keypoint in the annotated pose is less than or equal to a preset distance threshold; and, in response to determining that it is, determine the mean position of the keypoint over the target candidate boxes based on the weight of the keypoint in each target candidate box corresponding to the annotation box.
11. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1 to 5.
12. A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method according to any one of claims 1 to 5.
CN201811149818.4A 2018-09-29 2018-09-29 Image processing method and device Pending CN109389640A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811149818.4A CN109389640A (en) 2018-09-29 2018-09-29 Image processing method and device
PCT/CN2018/115968 WO2020062493A1 (en) 2018-09-29 2018-11-16 Image processing method and apparatus


Publications (1)

Publication Number Publication Date
CN109389640A true CN109389640A (en) 2019-02-26

Family

ID=65418681

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811149818.4A Pending CN109389640A (en) 2018-09-29 2018-09-29 Image processing method and device

Country Status (2)

Country Link
CN (1) CN109389640A (en)
WO (1) WO2020062493A1 (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163841A (en) * 2019-04-12 2019-08-23 中科院微电子研究所昆山分所 Detection method, device, equipment and the storage medium of body surface defect
CN110378244A (en) * 2019-05-31 2019-10-25 曹凯 The detection method and device of abnormal posture
CN110569703A (en) * 2019-05-10 2019-12-13 阿里巴巴集团控股有限公司 computer-implemented method and device for identifying damage from picture
CN110765942A (en) * 2019-10-23 2020-02-07 睿魔智能科技(深圳)有限公司 Image data labeling method, device, equipment and storage medium
CN111695540A (en) * 2020-06-17 2020-09-22 北京字节跳动网络技术有限公司 Video frame identification method, video frame cutting device, electronic equipment and medium
CN112132913A (en) * 2019-06-25 2020-12-25 北京字节跳动网络技术有限公司 Image processing method, image processing apparatus, image processing medium, and electronic device
US10885625B2 (en) 2019-05-10 2021-01-05 Advanced New Technologies Co., Ltd. Recognizing damage through image analysis
WO2021051601A1 (en) * 2019-09-19 2021-03-25 平安科技(深圳)有限公司 Method and system for selecting detection box using mask r-cnn, and electronic device and storage medium
CN112819937A (en) * 2021-04-19 2021-05-18 清华大学 Self-adaptive multi-object light field three-dimensional reconstruction method, device and equipment
CN112907583A (en) * 2021-03-29 2021-06-04 苏州科达科技股份有限公司 Target object posture selection method, image scoring method and model training method
CN113326901A (en) * 2021-06-30 2021-08-31 北京百度网讯科技有限公司 Image annotation method and device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3970877B2 (en) * 2004-12-02 2007-09-05 独立行政法人産業技術総合研究所 Tracking device and tracking method
CN106355188B (en) * 2015-07-13 2020-01-21 阿里巴巴集团控股有限公司 Image detection method and device
CN107358149B (en) * 2017-05-27 2020-09-22 深圳市深网视界科技有限公司 Human body posture detection method and device
CN107463903B (en) * 2017-08-08 2020-09-04 北京小米移动软件有限公司 Face key point positioning method and device
CN107909005A (en) * 2017-10-26 2018-04-13 西安电子科技大学 Personage's gesture recognition method under monitoring scene based on deep learning
CN108229445A (en) * 2018-02-09 2018-06-29 深圳市唯特视科技有限公司 A kind of more people's Attitude estimation methods based on cascade pyramid network


Also Published As

Publication number Publication date
WO2020062493A1 (en) 2020-04-02


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination