WO2021135424A1 - Image processing method and apparatus, storage medium, and electronic device - Google Patents

Image processing method and apparatus, storage medium, and electronic device

Info

Publication number
WO2021135424A1
WO2021135424A1 PCT/CN2020/116889
Authority
WO
WIPO (PCT)
Prior art keywords
interaction
target
image
key point
point
Application number
PCT/CN2020/116889
Other languages
English (en)
French (fr)
Inventor
廖越
王飞
陈彦杰
钱晨
刘偲
Original Assignee
上海商汤临港智能科技有限公司
Application filed by 上海商汤临港智能科技有限公司
Priority to KR1020217034504A (KR102432204B1)
Priority to JP2021557461A (JP7105383B2)
Publication of WO2021135424A1

Classifications

    • G06V40/20: Recognition of biometric, human-related or animal-related patterns in image or video data; movements or behaviour, e.g. gesture recognition
    • G06F18/241: Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N3/045: Neural networks; combinations of networks
    • G06N3/08: Neural networks; learning methods
    • G06V10/25: Image preprocessing; determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/764: Image or video recognition or understanding using classification, e.g. of video objects
    • G06V10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V10/82: Image or video recognition or understanding using neural networks

Definitions

  • the present disclosure relates to image processing technology, and in particular to an image processing method, device, storage medium, and electronic equipment.
  • In related approaches, the people and objects in an image are first detected by a detector, the people and objects whose confidence is higher than a certain threshold are selected, and the selected people and objects are paired to form person-object pairs; each person-object pair is then classified by a relationship classification network, which outputs the action relationship category.
  • This approach has two drawbacks. First, it only considers the detection confidence and not the possibility of interaction between people and objects, so persons or objects that actually have an interaction relationship may be lost; moreover, many of the generated person-object pairs do not have a real interactive action relationship. Second, under normal circumstances only a few people and objects in an image have an interactive action relationship; if M persons and N objects are detected in the image, the above processing generates M × N person-object pairs, and the relationship classification network must determine the action relationship category for each of them, resulting in unnecessary processing and consumption.
  • the embodiments of the present disclosure provide an image processing method, device, storage medium, and electronic equipment.
  • An embodiment of the present disclosure provides an image processing method. The method includes: extracting feature data of a first image; determining, based on the feature data, each interaction key point in the first image and the center point of each target, where an interaction key point is a point on a line that is within a preset range from the midpoint of the line, and the line connects the center points of two targets in an interactive action; determining at least two offsets based on the feature data, where one offset represents the offset between the interaction key point of an interactive action and the center point of one target in the interactive action; and determining the interaction relationship between the targets in the first image based on the center point of each target, the interaction key points, and the at least two offsets.
  • the determining each interaction key point in the first image and the center point of each target in the first image based on the feature data includes: determining the first image based on the feature data The center point of each target in the image, and the confidence of each target; determining the interaction key points in the first image based on the feature data, and the confidence of each interaction key point corresponding to each interaction action category;
  • the determining the interaction relationship between the targets in the first image based on the center point of each target, the interaction key point, and the at least two offsets includes: based on the center point of each target, the interaction The key points, the at least two offsets, the confidence of each target, and the confidence of each interaction key point corresponding to each preset interactive action category determine the interaction relationship between the targets in the first image.
  • the determining the center point of each target in the first image and the confidence level of each target in the first image based on the characteristic data includes: determining the The center point and category of each target in the first image, and the confidence that each target belongs to each category; the center point based on each target, the interaction key point, the at least two offsets, The confidence of each target and the confidence of each interactive key point corresponding to each preset interactive action category, and determining the interactive relationship between the targets in the first image includes: based on the center point of each target and its category, The interaction key point, the at least two offsets, the confidence that each target belongs to each category, and the confidence that each interaction key point corresponds to each preset interactive action category, determine the target in the first image The interaction relationship between.
  • In some embodiments, determining the interaction relationship between the targets in the first image based on the center point of each target, the interaction key points, the at least two offsets, the confidence of each target, and the confidence of each interaction key point for each preset interactive action category includes: for one interaction key point, determining the two offsets corresponding to the interaction key point; determining the two predicted center points corresponding to the interaction key point according to the interaction key point and its two offsets; determining the two targets corresponding to each interaction key point according to the center point of each target and the two predicted center points corresponding to each interaction key point; and determining the interaction relationship between the targets in the first image according to the two targets corresponding to each interaction key point, the confidence of each target, and the confidence of each interaction key point for each preset interactive action category.
  • In some embodiments, determining the interaction relationship between the targets in the first image according to the two targets corresponding to each interaction key point, the confidence of each target, and the confidence of each interaction key point for each preset interactive action category includes: for one interaction key point, multiplying the confidence of the interaction key point for a preset interactive action category by the confidences of the two targets corresponding to the interaction key point to obtain a first confidence, where the first confidence is the confidence that the interaction relationship between the two targets corresponding to the interaction key point belongs to the preset interactive action category; in response to the first confidence being greater than a confidence threshold, determining that the interaction relationship between the two targets corresponding to the interaction key point belongs to the preset interactive action category; and in response to the first confidence being not greater than the confidence threshold, determining that the interaction relationship between the two targets corresponding to the interaction key point does not belong to the preset interactive action category.
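  • A minimal sketch of the confidence combination just described, assuming scalar confidences in [0, 1]; the function name and the threshold value are illustrative assumptions, not part of the disclosure.

```python
def interaction_confidence(kp_conf, target_conf_a, target_conf_b):
    """First confidence for one key point and one preset action category:
    product of the key-point confidence and the two target confidences."""
    return kp_conf * target_conf_a * target_conf_b

# Example: decide whether a pair is kept for one preset category.
first_conf = interaction_confidence(0.9, 0.8, 0.7)   # 0.504
has_interaction = first_conf > 0.3                    # 0.3 is an assumed threshold
```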
  • In some embodiments, the method further includes: after determining that the interaction relationship between the two targets corresponding to an interaction key point does not belong to any preset interactive action category, determining that there is no interaction relationship between the two targets corresponding to the interaction key point.
  • In some embodiments, determining the two targets corresponding to each interaction key point according to the center point of each target and the two predicted center points corresponding to each interaction key point includes: for one predicted center point, determining the distance between the center point of each target and the predicted center point; and taking a target whose center point is within the preset distance threshold of the predicted center point as a target corresponding to the interaction key point corresponding to that predicted center point.
  • In some embodiments, determining the center point of each target in the first image based on the feature data includes: down-sampling the feature data to obtain a heat map of the first image; and determining, according to the heat map, the position offset of each point in the first image, the center point of each target in the first image, and the height and width of the detection frame of each target. After the center point of each target in the first image is determined, the method further includes: correcting the position of the center point of a target having an interaction relationship in the first image according to the position offset of that center point, to obtain the corrected position of the center point of the target having an interaction relationship in the first image; and determining the detection frame of the target having an interaction relationship in the first image according to the corrected position of its center point and the height and width of its detection frame.
  • In some embodiments, the image processing method is executed by a neural network, and the neural network is trained using sample images; a sample image is marked with the detection frames of targets having an interaction relationship, the marked center points and marked interaction key points of the targets having an interaction relationship in the sample image are determined according to the marked detection frames, and the marked offsets are determined according to the marked center points of the targets having an interaction relationship and the marked interaction key points.
  • In some embodiments, training the neural network using sample images includes: extracting feature data of the sample image using the neural network; down-sampling the feature data of the sample image using the neural network to obtain a heat map of the sample image; using the neural network to predict, based on the heat map of the sample image, the position offset of each point in the sample image, each interaction key point in the sample image, the center point of each target in the sample image, and the height and width of the detection frame of each target in the sample image; using the neural network to predict at least two offsets based on the feature data of the sample image; predicting the interaction relationship between the targets in the sample image based on the center point of each target in the sample image, the interaction key points in the sample image, and the at least two offsets in the sample image; and adjusting the network parameter values of the neural network according to the predicted position offsets, the predicted center points and predicted detection frame heights and widths of the targets having an interaction relationship in the sample image, the predicted interaction key points corresponding to the targets having an interaction relationship and their corresponding predicted offsets, as well as the marked position offsets and the detection frames of the targets having an interaction relationship marked in the sample image.
  • the embodiment of the present disclosure also provides an image processing device, the device includes: an extraction unit, a first determination unit, a second determination unit, and a third determination unit; wherein,
  • the extraction unit is configured to extract feature data of the first image
  • the first determining unit is configured to determine each interaction key point and the center point of each target in the first image based on the feature data extracted by the extraction unit; an interaction key point is a point on a line that is within a preset range from the midpoint of the line, and the line connects the center points of two targets in an interactive action;
  • the second determining unit is configured to determine at least two offsets based on the feature data extracted by the extraction unit; one offset represents the offset between the interaction key point of an interactive action and the center point of one target in the interactive action;
  • the third determining unit is configured to determine the interaction relationship between the targets in the first image based on the center point of each target, the interaction key point, and the at least two offsets.
  • In some embodiments, the first determining unit is configured to determine, based on the feature data, the center point of each target in the first image and the confidence of each target, and to determine, based on the feature data, the interaction key points in the first image and the confidence of each interaction key point for each interactive action category;
  • the third determining unit is configured to determine the interaction relationship between the targets in the first image based on the center point of each target, the interaction key points, the at least two offsets, the confidence of each target, and the confidence of each interaction key point for each preset interactive action category.
  • In some embodiments, the first determining unit is configured to determine, based on the feature data, the center point and category of each target in the first image, and the confidence that each target belongs to each preset category;
  • the third determining unit is configured to determine the interaction relationship between the targets in the first image based on the center point of each target and its category, the interaction key points, the at least two offsets, the confidence that each target belongs to each preset category, and the confidence of each interaction key point for each preset interactive action category.
  • In some embodiments, the third determining unit is configured to: for one interaction key point, determine the two offsets corresponding to the interaction key point; determine the two predicted center points corresponding to the interaction key point according to the interaction key point and its two offsets; determine the two targets corresponding to each interaction key point according to the center point of each target and the two predicted center points corresponding to each interaction key point; and determine the interaction relationship between the targets in the first image according to the two targets corresponding to each interaction key point, the confidence of each target, and the confidence of each interaction key point for each preset interactive action category.
  • In some embodiments, the third determining unit is configured to: for one interaction key point, multiply the confidence of the interaction key point for a preset interactive action category by the confidences of the two targets corresponding to the interaction key point to obtain a first confidence, where the first confidence is the confidence that the interaction relationship between the two targets corresponding to the interaction key point belongs to the interactive action category; in response to the first confidence being greater than a confidence threshold, determine that the interaction relationship between the two targets corresponding to the interaction key point belongs to the preset interactive action category; and in response to the first confidence being not greater than the confidence threshold, determine that the interaction relationship between the two targets corresponding to the interaction key point does not belong to the preset interactive action category.
  • In some embodiments, the third determining unit is further configured to determine, after determining that the interaction relationship between the two targets corresponding to an interaction key point does not belong to any preset interactive action category, that there is no interaction relationship between the two targets corresponding to the interaction key point.
  • In some embodiments, the third determining unit is configured to: for one predicted center point, determine the distance between the center point of each target and the predicted center point; and take a target whose center point is within the preset distance threshold of the predicted center point as a target corresponding to the interaction key point corresponding to that predicted center point.
  • In some embodiments, the first determining unit is configured to down-sample the feature data to obtain a heat map of the first image, and to determine, according to the heat map, the position offset of each point in the first image, the center point of each target in the first image, and the height and width of the detection frame of each target. After the center point of each target in the first image is determined, the position of the center point of a target having an interaction relationship in the first image is corrected according to the position offset of that center point, to obtain the corrected position of the center point of the target having an interaction relationship in the first image; and the detection frame of the target having an interaction relationship in the first image is determined according to the corrected position of its center point and the height and width of its detection frame.
  • In some embodiments, each functional unit in the image processing device is implemented by a neural network, and the neural network is trained using sample images; the sample images are marked with the detection frames of targets having an interaction relationship, the marked center points and marked interaction key points of the targets having an interaction relationship in the sample images are determined according to the marked detection frames, and the marked offsets are determined according to the marked center points of the targets having an interaction relationship and the marked interaction key points.
  • In some embodiments, the device further includes a training unit configured to train the neural network using sample images, and specifically configured to: extract feature data of the sample image using the neural network; down-sample the feature data of the sample image using the neural network to obtain a heat map of the sample image; use the neural network to predict, based on the heat map of the sample image, the position offset of each point in the sample image, each interaction key point in the sample image, the center point of each target in the sample image, and the height and width of the detection frame of each target in the sample image; use the neural network to predict at least two offsets based on the feature data of the sample image; predict the interaction relationship between the targets in the sample image based on the center point of each target in the sample image, the interaction key points in the sample image, and the at least two offsets in the sample image; and adjust the network parameter values of the neural network according to the predicted position offsets, the predicted center points and predicted detection frame heights and widths of the targets having an interaction relationship in the sample image, the predicted interaction key points corresponding to the targets having an interaction relationship in the sample image and their corresponding predicted offsets, as well as the marked position offsets and the detection frames of the targets having an interaction relationship marked in the sample image.
  • the embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the steps of the method described in the embodiment of the present disclosure are realized.
  • The embodiment of the present disclosure also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, wherein the processor implements the steps of the method described in the embodiments of the present disclosure when executing the program.
  • The embodiment of the present disclosure also provides a computer program, including computer-readable code; when the computer-readable code runs in an electronic device, a processor in the electronic device executes the steps of the method described in the embodiments of the present disclosure.
  • The image processing method, device, storage medium, and electronic equipment provided by the embodiments of the present disclosure include: extracting feature data of a first image; determining, based on the feature data, each interaction key point in the first image and the center point of each target, where an interaction key point is a point on a line that is within a preset range from the midpoint of the line, and the line connects the center points of two targets in an interactive action; determining at least two offsets based on the feature data, where one offset represents the offset between the interaction key point of an interactive action and the center point of one target in the interactive action; and determining the interaction relationship between the targets in the first image based on the center point of each target, the interaction key points, and the at least two offsets.
  • By defining interaction key points related to interactive actions and determining at least two offsets related to the interaction key points, and then determining the interaction relationship between the targets in the first image through the center point of each target, the interaction key points, and the at least two offsets, no person-object pairs need to be generated, which avoids the loss of person-object pairs having a real interaction relationship that arises when person-object pairs are used for interactive action detection; moreover, compared with the traditional approach of first detecting people and objects, then pairing them, and then classifying each person-object pair with a relationship classification network, this embodiment greatly increases the detection speed and improves the detection efficiency.
  • FIG. 1 is a schematic flowchart of an image processing method according to an embodiment of the disclosure
  • FIG. 2 is a schematic diagram of an application of the image processing method according to an embodiment of the disclosure.
  • FIG. 3 is a schematic diagram of another application of the image processing method according to an embodiment of the disclosure.
  • FIG. 4 is a schematic flow chart of a neural network training method in an image processing method according to an embodiment of the disclosure
  • FIG. 5 is a first schematic diagram of the composition structure of the image processing apparatus according to an embodiment of the disclosure.
  • FIG. 6 is a second schematic diagram of the composition structure of the image processing apparatus according to an embodiment of the disclosure.
  • FIG. 7 is a schematic diagram of the hardware composition structure of an electronic device according to an embodiment of the disclosure.
  • FIG. 1 is a schematic flowchart of an image processing method according to an embodiment of the disclosure; as shown in FIG. 1, the method includes:
  • Step 101 Extract feature data of the first image
  • Step 102 Determine each interaction key point in the first image and the center point of each target based on the feature data; an interaction key point is a point on a line that is within a preset range from the midpoint of the line, and the line connects the center points of two targets in an interactive action;
  • Step 103 Determine at least two offsets based on the characteristic data; one offset represents the offset between the interaction key point in an interactive action and the center point of a target in the interactive action;
  • Step 104 Determine the interaction relationship between the targets in the first image based on the center point of each target, the interaction key point, and the at least two offsets.
  • The first image may include multiple targets. The multiple targets may have no interaction relationship among them, or may include at least one group of targets having an interaction relationship. The targets having an interaction relationship are specifically at least two targets, at least one of which may be a target person; for example, the two targets having an interaction relationship may be two target persons, or a target person and a target object.
  • In some embodiments, the at least two targets having an interaction relationship may specifically be two targets having an interactive action, where the two targets having an interactive action may be two targets having a direct interactive action or an implicit interactive action.
  • For example, if the target person included in the first image holds a cigarette in his hand, it can be considered that the target person has a direct action relationship with the cigarette as the target object. As another example, if the target person included in the first image is bouncing a ball, the target person makes a bouncing motion and the ball is in the air below the target person's hand; it can then be considered that the target person has an implicit action relationship with the ball as the target object.
  • In this embodiment, the step of determining the center points and interaction key points can be performed in parallel with the step of determining the offsets (the point matching step), and the targets having an interaction relationship and the type of the interactive action are then finally determined according to the offsets and the detected center points and interaction key points, thereby improving the efficiency of interaction relationship detection.
  • the extracting feature data of the first image includes: extracting feature data of the first image through a deep neural network model.
  • the first image is input into the deep neural network model as input data to obtain feature data of the first image.
  • the deep neural network model may include multiple convolutional layers, and convolution processing is performed on the first image sequentially through each convolutional layer, so as to obtain feature data of the first image.
  • step 102 may be performed through the first branch network obtained through pre-training, that is, the center point of each target and each interaction key point are determined based on the characteristic data through the first branch network.
  • the feature data of the first image is input into the first branch network as input data to obtain the center point of each target in the first image and each interaction key point.
  • the characteristic data is processed through the first branch network to obtain the center point of each target person and each interaction key point.
  • If the targets included in the first image include a target person and a target object, the feature data is processed through the first branch network to obtain the center point of the target person, the center point of the target object, and each interaction key point.
  • the first branch network will return the length and width of the detection frame of the target.
  • The detection frame of the target is determined based on the center point of the target and the length and width of the detection frame of the target.
  • the first image includes two target persons and two target objects (the two target objects are two balls).
  • The center point of a target person can be recorded as the first center point, and the center point of a target object as the second center point.
  • the interaction key point is a point on a line between the center points of two targets in an interactive action that is within a preset range from the midpoint of the line.
  • the interaction key point may be a midpoint of a line between the center points of two targets in an interaction action.
  • an interaction key point may be the midpoint of the line between the first center point of the target person and the second center point of the target object in an interactive action.
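  • For illustration, assuming two center points given as 2D coordinates, the midpoint used as an interaction key point could be computed as in the following sketch; the coordinate values are hypothetical.

```python
person_center = (40.0, 56.0)   # hypothetical first center point
object_center = (88.0, 32.0)   # hypothetical second center point

# Midpoint of the line between the two center points, one valid choice
# of interaction key point within the preset range of the midpoint.
key_point = ((person_center[0] + object_center[0]) / 2,
             (person_center[1] + object_center[1]) / 2)   # (64.0, 44.0)
```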
  • In some embodiments, step 103 can be performed through the second branch network obtained by pre-training, that is, at least two offsets are determined by the second branch network based on the feature data, where one offset represents the offset between the interaction key point of an interactive action and the center point of one target in the interactive action.
  • At least two offsets corresponding to each point can be represented by an offset matrix. Then, based on the interaction key points determined in step 102, at least two offsets corresponding to each interaction key point can be determined. In some embodiments, at least two offsets corresponding to each interactive key point can be determined according to the coordinates of each interactive key point and the offset matrix corresponding to each point.
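  • The offset-matrix lookup described above might look like the following sketch, assuming the second branch outputs a dense map of shape (H, W, 4) holding two 2D offsets per location; the shape, layout, and names are assumptions for illustration.

```python
import numpy as np

# Assumed layout: (H, W, 2 offsets x 2 coordinates) predicted by the second branch.
offset_map = np.zeros((64, 64, 4), dtype=np.float32)

def offsets_at(key_point_xy):
    """Read the two offsets predicted at an interaction key point's location."""
    x, y = int(round(key_point_xy[0])), int(round(key_point_xy[1]))
    vec = offset_map[y, x]                  # 4 values at that point
    return vec[:2], vec[2:]                 # (offset to first center, offset to second center)
```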
  • one offset represents the offset between the interaction key point in the interactive action and the first center point
  • the other offset represents the offset between the interaction key point in the interactive action and the second center point.
  • In order to distinguish them, the offset between the interaction key point of an interactive action and the first center point is recorded as the first offset, and the offset between the interaction key point and the second center point is recorded as the second offset; that is, the first offset represents the offset of the interaction key point of the interactive action from the first center point, and the second offset represents the offset of the interaction key point from the second center point.
  • the two targets can also be denoted as the first target and the second target respectively, and the first offset represents the offset between the interaction key point in the interactive action and the center point of the first target.
  • the second offset represents the offset between the interaction key point in the interaction action and the center point of the second target.
  • In some embodiments, determining the interaction relationship between the targets in the first image based on the center point of each target, the interaction key points, and the at least two offsets includes: for one interaction key point, determining the two offsets corresponding to the interaction key point; determining the two predicted center points corresponding to the interaction key point according to the interaction key point and its two offsets; determining the two targets corresponding to each interaction key point according to the center point of each target and the two predicted center points corresponding to each interaction key point; and determining the interaction relationship between the targets in the first image according to the two targets corresponding to each interaction key point.
  • the function of the at least two offsets determined in step 103 is to determine at least two targets with interactive actions (ie, interactive relationships). Through the center points of the targets and the key points of interaction determined in step 102, it is not known which targets have interactive actions. Based on this, in this embodiment, two offsets corresponding to each interaction key point are determined, and the interaction key is determined according to the interaction key point and the two offsets corresponding to the interaction key point. The two prediction center points corresponding to the points.
  • Taking any interaction key point (denoted here as the first interaction key point) as an example, a first position can be determined based on the position of the first interaction key point and one offset corresponding to it (for example, the first offset); the first position can theoretically serve as the position of the center point (for example, the first center point) of a target that matches the first interaction key point, and is recorded here as the first predicted center point. In the same way, a second position can be determined based on the position of the first interaction key point and the other offset corresponding to it (for example, the second offset), and the second position is recorded here as the second predicted center point.
  • a target whose distance between the center point and the obtained predicted center point is less than a preset distance threshold is taken as the target corresponding to the interaction key point corresponding to the predicted center point.
  • If the distance between the center point of the first target and the first predicted center point is less than the preset distance threshold, and the distance between the center point of the second target and the second predicted center point is less than the preset distance threshold, it may indicate that the first target and the second target are the two targets corresponding to the first interaction key point. It can be understood that more than one target center point may be within the preset distance threshold of a certain predicted center point, that is, there may be two or more targets corresponding to one interaction key point.
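  • A sketch of the distance-threshold matching, including the case where several target centers fall within the threshold of one predicted center point; the function name and threshold value are assumptions.

```python
import numpy as np

def match_targets(target_centers, predicted_center, dist_thresh=8.0):
    """Return the indices of all targets whose center point lies within the
    preset distance threshold of a predicted center point."""
    dists = np.linalg.norm(np.asarray(target_centers) - predicted_center, axis=1)
    return [i for i, d in enumerate(dists) if d < dist_thresh]
```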
  • In this embodiment, the interaction relationship between the at least two targets corresponding to an interaction key point may be determined based on the confidence of each preset interactive action category corresponding to the interaction key point. It can be understood that when the feature data is processed through the first branch network to obtain each interaction key point in the first image, the confidence of each interaction key point for each preset interactive action category can also be obtained, and the interaction relationship between the at least two targets is determined based on these confidences.
  • By defining interaction key points related to interactive actions and determining at least two offsets related to the interaction key points, and then determining the interaction relationship between the targets in the first image through the center point of each target, the interaction key points, and the at least two offsets, no person-object pairs need to be generated, which avoids the loss of person-object pairs having a real interaction relationship that arises when person-object pairs are used for interactive action detection; moreover, this embodiment can directly obtain the targets that have an interaction relationship, and compared with the traditional way of classifying each person-object pair with a relationship classification network, it greatly increases the detection speed and improves the detection efficiency.
  • In some embodiments, determining the center point of each target in the first image based on the feature data includes: down-sampling the feature data to obtain a heat map of the first image; and determining, according to the heat map, the position offset of each point in the first image, the center point of each target in the first image, and the height and width of the detection frame of each target. After the center point of each target in the first image is determined based on the feature data, the method further includes: correcting the position of the center point of a target having an interaction relationship in the first image according to the position offset of that center point, to obtain the corrected position of the center point of the target having an interaction relationship in the first image; and determining the detection frame of the target having an interaction relationship in the first image according to the corrected position of its center point and the height and width of its detection frame.
  • In this embodiment, down-sampling is performed on the feature data of the first image. The down-sampling may be, for example, reducing the size of the feature map containing the feature data; as a result, the points in the heat map obtained after down-sampling do not correspond one-to-one with the points in the first image.
  • For example, suppose the size of the first image is 128x128 and the center point of the target person in the first image is (10, 10). Since the heat map is obtained by down-sampling, assuming a down-sampling factor of 4 so that the heat map is 32x32, the center point of the target person maps to (2.5, 2.5) and is rounded down to (2, 2); that is to say, down-sampling causes a position shift of the center point of the target person.
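  • The worked example above can be reproduced as follows; the stride of 4 is only the assumed down-sampling factor.

```python
stride = 4                      # assumed down-sampling factor: 128x128 -> 32x32
center = (10, 10)               # center point of the target person in the first image

heatmap_center = (center[0] / stride, center[1] / stride)          # (2.5, 2.5)
quantized = (int(heatmap_center[0]), int(heatmap_center[1]))       # (2, 2) after rounding down
position_offset = (heatmap_center[0] - quantized[0],
                   heatmap_center[1] - quantized[1])                # (0.5, 0.5), to be regressed
```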
  • In this embodiment, the feature data can be processed through the first branch network, specifically by first down-sampling the feature map containing the feature data to obtain a heat map (Heatmap), and then determining, according to the heat map, the position offset of each point in the first image, the center point of each target in the first image, and the height and width of the detection frame of each target. It can be understood that the feature data is used as the input data of the first branch network; after the heat map is obtained by down-sampling the feature data, the first branch network determines the position offset (offset) of each point in the first image based on the heat map.
  • Based on the heat map, the first branch network also determines each interaction key point in the first image and the confidence that each interaction key point belongs to each preset interactive action category.
  • In this embodiment, the position of the center point of a target having an interaction relationship may be corrected based on the position offset of that center point.
  • the obtained center point of the target and the corresponding position offset may be added together to obtain the corrected position of the center point of the target.
  • the detection frame of the target is obtained according to the corrected position of the center point of the target and the height and width of the detection frame, thereby outputting the detection frame of the target with an interactive relationship.
  • The first center point in FIG. 2 is the corrected position; the vertical dashed line passing through the first center point indicates the height of the detection frame, and the horizontal dashed line passing through the first center point indicates the width of the detection frame.
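  • As a sketch of the correction and box reconstruction just described (corrected center = quantized center + position offset; detection frame built from the corrected center and [height, width]); variable names are illustrative.

```python
def recover_box(center, position_offset, height, width):
    """Correct a target's center point with its position offset and build the
    detection frame as (x1, y1, x2, y2) from the corrected center and box size."""
    cx = center[0] + position_offset[0]
    cy = center[1] + position_offset[1]
    return (cx - width / 2, cy - height / 2, cx + width / 2, cy + height / 2)
```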
  • In some embodiments, determining each interaction key point and the center point of each target in the first image based on the feature data includes: determining, based on the feature data, the center point of each target in the first image and the confidence of each target; and determining, based on the feature data, the interaction key points in the first image and the confidence of each interaction key point for each preset interactive action category;
  • the determining the interaction relationship between the targets in the first image based on the center point of each target, the interaction key point, and the at least two offsets includes: based on the center point of each target, the interaction The key points, the at least two offsets, the confidence of each target, and the confidence of each preset interactive action category corresponding to each interactive key point, determine the interaction relationship between the targets in the first image .
  • the feature data can be processed based on the first branch network.
  • Specifically, the feature data can be convolved through multiple convolutional layers in the first branch network to obtain the center point of each target in the first image and the confidence of each target, where the confidence of a target may be the confidence that the target exists in the first image.
  • The feature data can also be convolved through multiple convolutional layers in the first branch network to obtain each interaction key point in the first image and the confidence of each interaction key point for each preset interactive action category, where a preset interactive action category can be any pre-set interactive action category, such as a smoking action or a ball-bouncing action.
  • In some embodiments, determining the center point of each target in the first image and the confidence of each target based on the feature data includes: determining, based on the feature data, the center point and category of each target in the first image and the confidence that each target belongs to each category. Determining the interaction relationship between the targets in the first image based on the center point of each target, the interaction key points, the at least two offsets, the confidence of each target, and the confidence of each interaction key point for each preset interactive action category then includes: determining the interaction relationship between the targets in the first image based on the center point of each target and its category, the interaction key points, the at least two offsets, the confidence that each target belongs to each category, and the confidence of each interaction key point for each preset interactive action category.
  • the feature data can be processed based on the first branch network.
  • Specifically, the feature data can be convolved through multiple convolutional layers in the first branch network to obtain the center point and category of each target in the first image and the confidence that each target belongs to each category, that is, the confidence that a target belonging to a certain category exists somewhere in the first image.
  • The interaction relationship between the targets in the first image is then determined based on the center point of each target and its category, the interaction key points, the at least two offsets, the confidence that each target belongs to each category, and the confidence of each interaction key point for each preset interactive action category.
  • Taking any interaction key point (denoted here as the first interaction key point) as an example, a first position is determined based on the position of the first interaction key point and one offset corresponding to it (for example, the first offset), and the first position is recorded here as the first predicted center point; in the same way, a second position is determined based on the position of the first interaction key point and the other offset corresponding to it (for example, the second offset), and the second position is recorded here as the second predicted center point.
  • The two targets corresponding to each interaction key point are then determined, and the interaction relationship between the targets in the first image is determined according to the two targets corresponding to each interaction key point, the confidence that each target belongs to each category, and the confidence of each interaction key point for each preset interactive action category.
  • In some embodiments, determining the two targets corresponding to each interaction key point according to the center point of each target and the two predicted center points corresponding to each interaction key point includes: for one predicted center point, determining the distance between the center point of each target and the predicted center point; and taking a target whose center point is within the preset distance threshold of the predicted center point as a target corresponding to the interaction key point corresponding to that predicted center point.
  • the target whose distance between the center point of the target and the obtained predicted center point is less than the preset distance threshold is taken as the target corresponding to the interaction key point corresponding to the predicted center point.
  • If the distance between the center point of the first target and the first predicted center point is less than the preset distance threshold, and the distance between the center point of the second target and the second predicted center point is less than the preset distance threshold, it may indicate that the first target and the second target are the two targets corresponding to the first interaction key point. It can be understood that more than one target center point may be within the preset distance threshold of a certain predicted center point, that is, there may be two or more targets corresponding to one interaction key point.
  • The interaction relationship between the targets in the first image is further determined based on the at least two targets corresponding to each interaction key point, the confidence that each target belongs to each category, and the confidence of each interaction key point for each preset interactive action category.
  • In some embodiments, determining the interaction relationship between the targets in the first image according to the two targets corresponding to each interaction key point, the confidence of each target, and the confidence of each interaction key point for each preset interactive action category includes: for one interaction key point, multiplying the confidence of the interaction key point for a preset interactive action category by the confidence that the two targets corresponding to the interaction key point belong to the corresponding categories, to obtain a first confidence, where the first confidence is the confidence that the interaction relationship between the two targets corresponding to the interaction key point belongs to the interactive action category. Here, the corresponding categories are the categories such that, when the two targets belong to them, the interaction between the two targets can belong to the preset interactive action category; for example, if the preset action category is playing ball, the corresponding categories are that one target's category is person and the other target's category is ball; if the preset action category is making a phone call, the corresponding categories are that one target's category is person and the other target's category is phone.
  • In response to the first confidence being greater than the confidence threshold, it is determined that the interaction relationship between the two targets corresponding to the interaction key point belongs to the preset interactive action category; in response to the first confidence being not greater than the confidence threshold, it is determined that the interaction relationship between the two targets corresponding to the interaction key point does not belong to the preset interactive action category.
  • In some embodiments, the method further includes: after determining that the interaction relationship between the two targets corresponding to an interaction key point does not belong to any preset interactive action category, determining that there is no interaction relationship between the two targets corresponding to the interaction key point.
  • If more than two targets correspond to one interaction key point, the above solution may be used to determine, for every two of those targets, whether their interaction relationship belongs to the preset interactive action category corresponding to that interaction key point, and so on. For example, if three targets correspond to one interaction key point, denoted as target 1, target 2, and target 3, the above scheme can be used to determine the interaction relationship between target 1 and target 2, between target 2 and target 3, and between target 3 and target 1, as enumerated in the sketch below.
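  • A trivial sketch of enumerating the pairwise checks when more than two targets correspond to one interaction key point; the helper name is hypothetical.

```python
from itertools import combinations

def pairwise_checks(target_ids):
    """Enumerate the target pairs to examine for one interaction key point,
    e.g. [1, 2, 3] -> [(1, 2), (1, 3), (2, 3)]."""
    return list(combinations(target_ids, 2))
```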
  • Fig. 3 is a schematic diagram of another application of the image processing method according to an embodiment of the disclosure; as shown in Fig. 3, the neural network may include a feature extraction network, a first branch network, and a second branch network; wherein, the feature extraction network is used for matching Feature extraction is performed on the input image to obtain feature data.
  • the first branch network is used to down-sample the feature data to obtain a heat map, and then to determine, according to the heat map, the center point of each target in the input image and each interaction key point, and to obtain the position offset (offset) of each point, the height and width [height, width] of each target's detection frame, the confidence of each target category, and the confidence of each interaction key point for each preset interactive action category.
  • the second branch network is used to process the feature data to obtain at least two offsets of each point in the input image, and one offset represents an interactive key point in an interactive action and the center of a target in the interactive action The offset of the point.
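  • A highly simplified PyTorch-style sketch of the two-branch layout in FIG. 3, with an arbitrary backbone and head sizes chosen only for illustration; it is not the actual network of the disclosure, and all layer choices are assumptions.

```python
import torch
import torch.nn as nn

class InteractionNet(nn.Module):
    def __init__(self, num_target_classes, num_action_classes, feat_ch=64):
        super().__init__()
        # Feature extraction network (placeholder backbone).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_ch, 3, stride=4, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
        )
        # First branch: center/key-point heat maps, position offsets, box sizes.
        self.center_heatmap = nn.Conv2d(feat_ch, num_target_classes, 1)
        self.keypoint_heatmap = nn.Conv2d(feat_ch, num_action_classes, 1)
        self.position_offset = nn.Conv2d(feat_ch, 2, 1)
        self.box_size = nn.Conv2d(feat_ch, 2, 1)        # [height, width]
        # Second branch: two 2D offsets (key point -> each target center).
        self.pair_offsets = nn.Conv2d(feat_ch, 4, 1)

    def forward(self, image):
        feat = self.backbone(image)
        return {
            "centers": self.center_heatmap(feat).sigmoid(),
            "keypoints": self.keypoint_heatmap(feat).sigmoid(),
            "pos_offset": self.position_offset(feat),
            "box_size": self.box_size(feat),
            "pair_offsets": self.pair_offsets(feat),
        }
```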
  • the feature map containing the feature data is down-sampled through the first branch network to obtain the heat map.
  • the target in the input image includes the target person and the target object.
  • the center point of the target person is recorded as the first center point
  • the center point of the target object is recorded as the second center point.
  • a first heat map including a first center point, a second heat map including a second center point, and a third heat map including each interaction key point can be obtained.
  • the output data of the first branch network may include the first heat map, the second heat map, the third heat map, the position offset of each point in the input image, and the height and height of the detection frame of the target person and the target object. width.
  • Based on the first branch network, the center point and category of each target, the confidence that each target belongs to each category, and the confidence of each interaction key point for each preset interactive action category can also be obtained.
  • the feature map containing the feature data is processed through the second branch network to obtain two offsets corresponding to each interaction key point.
  • The offset between the interaction key point of an interactive action and the first center point of the target person is recorded as the first offset, and the offset between the interaction key point and the second center point of the target object is recorded as the second offset.
  • two prediction center points corresponding to the interaction key point are determined, which are respectively recorded as the first prediction center point and the second prediction center point.
  • For the first predicted center point, the distance between each first center point and the first predicted center point is determined, and the first center point whose distance from the first predicted center point is less than the preset distance threshold is determined; correspondingly, for the second predicted center point, the distance between each second center point and the second predicted center point is determined, and the second center point whose distance from the second predicted center point is less than the preset distance threshold is determined.
  • The confidence of the preset interactive action category corresponding to each interaction key point is multiplied by the confidence of the target person and the confidence of the target object corresponding to the interaction key point, so as to obtain the confidence that the interaction relationship between the target person and the target object belongs to the preset interactive action category.
  • The position offset of each point in the input image output by the first branch network is used to correct the positions of the first center point of the target person and the second center point of the target object that have an interaction relationship, so as to obtain the corrected position of the first center point of the target person and the corrected position of the second center point of the target object. The detection frames of the targets having an interaction relationship in the input image are then determined based on the corrected position of the first center point of the target person and the height and width [height, width] of its detection frame, and the corrected position of the second center point of the target object and the height and width [height, width] of its detection frame.
  • The output result of the neural network is the corrected position of the first center point of the target person and the corresponding detection frame, the corrected position of the second center point of the target object and the corresponding detection frame, and the interaction relationship (i.e. the interaction action category) between the target person and the target object. No detection frame is output for targets that have no interaction relationship in the input image.
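A small sketch of how a corrected center point and a predicted [height, width] could be turned back into a detection frame in input-image coordinates; the heat-map stride and the scale at which the height and width are predicted are assumptions.

```python
def decode_box(center_xy, pos_offset_xy, hw, down_ratio=4):
    """Correct a heat-map center with its predicted position offset and rebuild
    the detection frame [x1, y1, x2, y2] in input-image pixels."""
    cx = (center_xy[0] + pos_offset_xy[0]) * down_ratio
    cy = (center_xy[1] + pos_offset_xy[1]) * down_ratio
    h, w = hw[0] * down_ratio, hw[1] * down_ratio   # assumes sizes are predicted on the heat-map scale
    return [cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0]
```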
  • The image processing method of this embodiment is executed by a neural network trained using sample images. The sample images are marked with the detection frames of targets that have an interactive relationship; the marked center point of a target (that is, the center of the target's detection frame) and the marked interaction key point (the midpoint of the line between the centers of the detection frames of the targets in the interactive relationship) are determined according to the marked detection frames, and the marked offset is determined according to the size of the sample image and the size of the heat map derived from the sample image.
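The label construction described above can be sketched as follows; the box format, the down-sampling ratio and the exact convention in which the marked offsets are stored are assumptions made for illustration.

```python
def make_labels(person_box, object_box, down_ratio=4):
    """Derive the marked center points, the marked interaction key point and the
    marked offsets for one annotated interacting pair.

    Boxes are [x1, y1, x2, y2] in input-image pixels (an assumed format).
    """
    def center(box):
        return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)

    p_c, o_c = center(person_box), center(object_box)
    # Marked interaction key point: midpoint of the line between the two centers.
    key_point = ((p_c[0] + o_c[0]) / 2.0, (p_c[1] + o_c[1]) / 2.0)

    def to_heatmap(pt):
        x, y = pt[0] / down_ratio, pt[1] / down_ratio
        col, row = int(x), int(y)
        return (col, row), (x - col, y - row)   # integer cell + sub-pixel position offset

    (p_cell, p_off), (o_cell, o_off), (k_cell, k_off) = map(to_heatmap, (p_c, o_c, key_point))
    # Marked pair offsets (one possible convention): from the key point cell to
    # each marked center, expressed on the heat-map grid.
    off_to_person = (p_c[0] / down_ratio - k_cell[0], p_c[1] / down_ratio - k_cell[1])
    off_to_object = (o_c[0] / down_ratio - k_cell[0], o_c[1] / down_ratio - k_cell[1])
    return dict(person_center=p_cell, object_center=o_cell, key_point=k_cell,
                position_offsets=(p_off, o_off, k_off),
                pair_offsets=(off_to_person, off_to_object))
```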
  • FIG. 4 is a schematic flowchart of a neural network training method in an image processing method according to an embodiment of the disclosure; as shown in FIG. 4, the method includes:
  • Step 201: Use the neural network to extract feature data of the sample image;
  • Step 202: Use the neural network to down-sample the feature data of the sample image to obtain a heat map of the sample image;
  • Step 203: Use the neural network to predict, based on the heat map of the sample image, the position offset of each point in the sample image, each interaction key point in the sample image, the center point of each target in the sample image, and the height and width of the detection frame of each target in the sample image;
  • Step 204: Use the neural network to predict at least two offsets based on the feature data of the sample image;
  • Step 205: Predict the interaction relationship between the targets in the sample image based on the center point of each target in the sample image, the interaction key points in the sample image, and the at least two offsets in the sample image;
  • Step 206: Adjust the network parameter values of the neural network according to the predicted position offset, the predicted center points of the targets with an interaction relationship in the sample image and the predicted heights and widths of their detection frames, the predicted interaction key points corresponding to those targets and their corresponding predicted offsets, as well as the marked position offset and the detection frames of the targets with an interaction relationship marked in the sample image.
  • For details of step 201 to step 205 in this embodiment, reference may be made to the description in the foregoing embodiments, which will not be repeated here.
  • In step 206, for the first branch network in the neural network, a loss function can be determined by combining the predicted center points of the targets that have an interactive relationship in the sample image, the predicted heights and widths of their detection frames, and the predicted interaction key points with the marked detection frames of the interactive targets and the marked position offset, and the network parameters of the first branch network are adjusted based on this loss function.
  • For the second branch network, a loss function can be determined according to the predicted offsets corresponding to the interaction key points and the marked offsets, and the network parameters of the second branch network are adjusted based on this loss function.
  • A loss function may also be determined based on the predicted position offset and the marked position offset; the position deviation caused by down-sampling the feature map containing the feature data is regressed through this loss function, so that the loss introduced by down-sampling is minimized and the obtained position offset (offset) of each point is more accurate. Based on this, the network parameters of the first branch network are adjusted through this loss function.
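One possible way to combine these loss terms is sketched below. The specific loss types (binary cross-entropy on the heat maps, L1 on the regressed quantities), the loss weights, and computing the regression losses densely rather than only at annotated locations are simplifying assumptions; the disclosure only states that separate loss functions drive the two branches.

```python
import torch
import torch.nn.functional as F

def training_losses(pred, target):
    """Combine heat-map, size, position-offset and pair-offset losses."""
    hm_loss = (F.binary_cross_entropy(pred["person_hm"], target["person_hm"])
               + F.binary_cross_entropy(pred["object_hm"], target["object_hm"])
               + F.binary_cross_entropy(pred["interact_hm"], target["interact_hm"]))
    wh_loss = F.l1_loss(pred["wh"], target["wh"])                        # detection-frame height/width
    pos_loss = F.l1_loss(pred["pos_offset"], target["pos_offset"])       # down-sampling position offset
    pair_loss = F.l1_loss(pred["pair_offsets"], target["pair_offsets"])  # second branch offsets
    return hm_loss + 0.1 * wh_loss + pos_loss + pair_loss
```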
  • the parameter adjustment methods in the foregoing embodiments can be used to adjust the network parameter values of the neural network.
  • FIG. 5 is a schematic diagram 1 of the composition structure of an image processing device according to an embodiment of the disclosure; as shown in FIG. 5, the device includes: an extraction unit 41, a first determining unit 42, a second determining unit 43, and a third determining unit 44; wherein,
  • the extraction unit 41 is configured to extract feature data of the first image
  • The first determining unit 42 is configured to determine each interaction key point and the center point of each target in the first image based on the feature data extracted by the extraction unit 41; an interaction key point is a point on a connecting line within a preset range from the midpoint of the connecting line, where the connecting line is the line between the center points of two targets in an interactive action;
  • The second determining unit 43 is configured to determine at least two offsets based on the feature data extracted by the extraction unit 41; one offset represents the offset between an interaction key point in an interactive action and the center point of a target in that interactive action;
  • the third determining unit 44 is configured to determine the interaction relationship between the targets in the first image based on the center point of each target, the interaction key point, and the at least two offsets.
  • The first determining unit 42 is configured to determine the center point of each target in the first image and the confidence of each target based on the feature data, and to determine the interaction key points in the first image based on the feature data, as well as the confidence of each interaction key point corresponding to each interaction action category;
  • The third determining unit 44 is configured to determine the interaction relationship between the targets in the first image based on the center point of each target, the interaction key points, the at least two offsets, the confidence of each target, and the confidence of each preset interaction action category corresponding to each interaction key point.
  • The first determining unit 42 is configured to determine the center point and category of each target in the first image based on the feature data, as well as the confidence that each target belongs to each preset category;
  • The third determining unit 44 is configured to determine the interaction relationship between the targets in the first image based on the center point of each target and its category, the interaction key points, the at least two offsets, the confidence that each target belongs to each preset category, and the confidence of each preset interaction action category corresponding to each interaction key point.
  • The third determining unit 44 is configured to: for an interaction key point, determine the two offsets corresponding to the interaction key point; determine, according to the interaction key point and the two offsets corresponding to it, the two prediction center points corresponding to the interaction key point; determine, according to the center point of each target and the two prediction center points corresponding to each interaction key point, the two targets corresponding to each interaction key point; and determine the interaction relationship between the targets in the first image according to the two targets corresponding to each interaction key point, the confidence of each target, and the confidence of each preset interaction action category corresponding to each interaction key point.
  • The third determining unit 44 is configured to: for an interaction key point, multiply the confidence that the interaction key point corresponds to a preset interaction action category by the confidences of the two targets corresponding to the interaction key point to obtain a first confidence, where the first confidence is the confidence that the interaction relationship between the two targets corresponding to the interaction key point belongs to that interaction action category; in response to the first confidence being greater than a confidence threshold, determine that the interaction relationship between the two targets corresponding to the interaction key point belongs to the preset interaction action category; and in response to the first confidence being not greater than the confidence threshold, determine that the interaction relationship between the two targets corresponding to the interaction key point does not belong to the preset interaction action category.
  • The third determining unit 44 is further configured to determine, after determining that the interaction relationship between the two targets corresponding to an interaction key point does not belong to any preset interaction action category, that there is no interaction relationship between the two targets corresponding to that interaction key point.
  • The third determining unit 44 is configured to: for a prediction center point, determine the distance between the center point of each target and the prediction center point; and take the target whose center point lies at a distance from the prediction center point that is less than the preset distance threshold as the target corresponding to the interaction key point to which that prediction center point corresponds.
  • The first determining unit 42 is configured to down-sample the feature data to obtain a heat map of the first image, and to determine, according to the heat map, the position offset of each point in the first image, the center point of each target in the first image, and the height and width of the detection frame of each target. After the center point of each target in the first image is determined based on the feature data, the position of the center point of each interactive target in the first image is corrected according to the position offset of that center point, to obtain the corrected position of the center point of the interactive target in the first image; the detection frame of the interactive target in the first image is then determined from the corrected position of its center point and the height and width of its detection frame.
  • Each functional unit in the image processing device is implemented by a neural network trained using sample images. The sample images are marked with the detection frames of targets that have an interactive relationship; the marked center point and the marked interaction key point of a target in the sample image are determined according to the marked detection frame, and the marked offsets are determined based on the marked center points of the targets with an interactive relationship and the marked interaction key points.
  • The device further includes a training unit 45 configured to train the neural network using sample images, specifically configured to: use the neural network to extract the feature data of the sample image; use the neural network to down-sample the feature data of the sample image to obtain the heat map of the sample image; use the neural network to predict, based on the heat map of the sample image, the position offset of each point in the sample image, each interaction key point in the sample image, the center point of each target in the sample image, and the height and width of the detection frame of each target in the sample image; use the neural network to predict at least two offsets based on the feature data of the sample image; predict the interaction relationship between the targets in the sample image based on the center point of each target in the sample image, the interaction key points in the sample image, and the at least two offsets in the sample image; and adjust the network parameter values of the neural network according to the predicted position offset, the predicted center points of the targets with an interaction relationship in the sample image and the predicted heights and widths of their detection frames, the predicted interaction key points corresponding to those targets and their corresponding predicted offsets, as well as the marked position offset and the detection frames of the targets with an interaction relationship marked in the sample image.
  • In practical applications, the extraction unit 41, the first determining unit 42, the second determining unit 43, the third determining unit 44, and the training unit 45 in the device can all be implemented by a central processing unit (CPU, Central Processing Unit), a digital signal processor (DSP, Digital Signal Processor), a microcontroller unit (MCU, Microcontroller Unit), or a field-programmable gate array (FPGA, Field-Programmable Gate Array) in the device.
  • It should be noted that when the image processing device provided in the above embodiment performs image processing, only the division of the above-mentioned program modules is used as an example for illustration; in practical applications, the above-mentioned processing can be allocated to different program modules as needed, that is, the internal structure of the device is divided into different program modules to complete all or part of the processing described above.
  • the image processing device provided in the foregoing embodiment and the image processing method embodiment belong to the same concept, and the specific implementation process is detailed in the method embodiment, which is not repeated here.
  • FIG. 7 is a schematic diagram of the hardware composition structure of an electronic device according to an embodiment of the disclosure.
  • The electronic device includes a memory 52, a processor 51, and a computer program stored on the memory 52 and runnable on the processor 51; when the processor 51 executes the program, the steps of the image processing method described in the embodiments of the present disclosure are implemented.
  • Optionally, the various components in the electronic device are coupled together through a bus system 53.
  • It can be understood that the bus system 53 is used to implement connection and communication between these components.
  • In addition to a data bus, the bus system 53 also includes a power bus, a control bus, and a status signal bus.
  • However, for the sake of clarity, the various buses are all marked as the bus system 53 in FIG. 7.
  • the memory 52 may be a volatile memory or a non-volatile memory, and may also include both volatile and non-volatile memory.
  • The non-volatile memory can be a read-only memory (ROM, Read Only Memory), a programmable read-only memory (PROM, Programmable Read-Only Memory), an erasable programmable read-only memory (EPROM, Erasable Programmable Read-Only Memory), an electrically erasable programmable read-only memory (EEPROM, Electrically Erasable Programmable Read-Only Memory), a ferromagnetic random access memory (FRAM, Ferromagnetic Random Access Memory), a flash memory (Flash Memory), a magnetic surface memory, an optical disc, or a compact disc read-only memory (CD-ROM, Compact Disc Read-Only Memory); the magnetic surface memory can be a magnetic disk memory or a magnetic tape memory.
  • The volatile memory may be a random access memory (RAM, Random Access Memory), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM, Static Random Access Memory), synchronous static random access memory (SSRAM, Synchronous Static Random Access Memory), dynamic random access memory (DRAM, Dynamic Random Access Memory), synchronous dynamic random access memory (SDRAM, Synchronous Dynamic Random Access Memory), double data rate synchronous dynamic random access memory (DDRSDRAM, Double Data Rate Synchronous Dynamic Random Access Memory), enhanced synchronous dynamic random access memory (ESDRAM, Enhanced Synchronous Dynamic Random Access Memory), SyncLink dynamic random access memory (SLDRAM, SyncLink Dynamic Random Access Memory), and direct Rambus random access memory (DRRAM, Direct Rambus Random Access Memory).
  • the memory 52 described in the embodiments of the present disclosure is intended to include, but is not limited to, these and any other suitable types of memory.
  • the methods disclosed in the foregoing embodiments of the present disclosure may be applied to the processor 51 or implemented by the processor 51.
  • the processor 51 may be an integrated circuit chip with signal processing capability. In the implementation process, the steps of the foregoing method may be completed by an integrated logic circuit of hardware in the processor 51 or instructions in the form of software.
  • the aforementioned processor 51 may be a general-purpose processor, a DSP, or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components, and the like.
  • the processor 51 may implement or execute various methods, steps, and logical block diagrams disclosed in the embodiments of the present disclosure.
  • the general-purpose processor may be a microprocessor or any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present disclosure may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • the software module may be located in a storage medium.
  • the storage medium is located in the memory 52.
  • the processor 51 reads the information in the memory 52 and completes the steps of the foregoing method in combination with its hardware.
  • In an exemplary embodiment, the electronic device may be implemented by one or more application specific integrated circuits (ASIC, Application Specific Integrated Circuit), DSPs, programmable logic devices (PLD, Programmable Logic Device), complex programmable logic devices (CPLD, Complex Programmable Logic Device), FPGAs, general-purpose processors, controllers, MCUs, microprocessors (Microprocessor), or other electronic components, configured to execute the foregoing methods.
  • the embodiment of the present disclosure also provides a computer-readable storage medium, such as a memory 52 including a computer program, which can be executed by the processor 51 of the image processing apparatus to complete the steps described in the foregoing method.
  • the computer-readable storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, Flash Memory, magnetic surface memory, optical disk, or CD-ROM; it may also be various devices including one or any combination of the foregoing memories.
  • the computer-readable storage medium provided by the embodiment of the present disclosure has a computer program stored thereon, and when the program is executed by a processor, the steps of the image processing method described in the embodiment of the present disclosure are realized.
  • The computer program provided by the embodiments of the present disclosure includes computer-readable code; when the computer-readable code runs in an electronic device, the processor in the electronic device executes the steps of the image processing method described in the embodiments of the present disclosure.
  • the disclosed device and method may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • The division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented.
  • The coupling, direct coupling, or communication connection between the components shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.
  • the units described above as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, they may be located in one place or distributed on multiple network units; Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the embodiments of the present disclosure can be all integrated into one processing unit, or each unit can be individually used as a unit, or two or more units can be integrated into one unit;
  • the unit can be implemented in the form of hardware, or in the form of hardware plus software functional units.
  • The foregoing program can be stored in a computer-readable storage medium; when the program is executed, the steps of the foregoing method embodiments are performed; and the foregoing storage medium includes various media that can store program codes, such as a removable storage device, a ROM, a RAM, a magnetic disk, or an optical disk.
  • If the aforementioned integrated unit of the present disclosure is implemented in the form of a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. The computer software product is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods described in the various embodiments of the present disclosure.
  • the aforementioned storage media include: removable storage devices, ROM, RAM, magnetic disks, or optical disks and other media that can store program codes.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present disclosure disclose an image processing method and apparatus, a storage medium, and an electronic device. The method includes: extracting feature data of a first image; determining, based on the feature data, each interaction key point in the first image and the center point of each target, where an interaction key point is a point on a connecting line within a preset range from the midpoint of the connecting line, and the connecting line is the line between the center points of two targets in an interactive action; determining at least two offsets based on the feature data, where one offset represents the offset between an interaction key point in an interactive action and the center point of a target in that interactive action; and determining the interaction relationship between the targets in the first image based on the center point of each target, the interaction key points, and the at least two offsets.

Description

图像处理方法、装置、存储介质和电子设备
相关申请的交叉引用
本公开基于申请号为201911404450.6、申请日为2019年12月30日的中国专利申请提出,并要求该中国专利申请的优先权,该中国专利申请的全部内容在此以引入方式并入本公开。
技术领域
本公开涉及图像处理技术,具体涉及一种图像处理方法、装置、存储介质和电子设备。
背景技术
针对图片中的人和物体之间的交互动作关系检测,通常先通过检测器检测出图片中的人和物体,选取置信度高于一定阈值的人和物体,并将选取出的人和物体进行两两配对,形成人-物体对;再通过关系分类网络对每个人-物体对进行分类,输出动作关系类别。
上述处理过程中,第一方面,只是考虑到检测的置信度,并未考虑人和物体产生交互动作的可能性,这样会丢失具有真正交互动作关系的人或物体,也即丢失了具有真正交互动作关系的人-物体对,并且会产生大量的不具有真正交互动作关系的人-物体对;第二方面,通常情况下,一张图片中只有很少的人和物体具有交互动作关系,若图片中检测出M个人,N个物体,则采用上述处理方式,会生成M×N个人-物体对,则关系分类网络需要针对每个人-物体对确定其对应的动作关系类别,产生较多不必要的处理以及消耗。
发明内容
本公开实施例提供一种图像处理方法、装置、存储介质和电子设备。
本公开实施例提供了一种图像处理方法,所述方法包括:提取第一图像的特征数据;基于所述特征数据确定所述第一图像中的各个交互关键点以及每个目标的中心点;一个交互关键点为连线上距离所述连线的中点预设范围内的一个点,所述连线为一个交互动作中的两个目标的中心点之间的连线;基于所述特征数据确定至少两个偏移量;一个偏移量表征一个交互动作中的交互关键点与该交互动作中的一个目标的中心点的偏移量;基于各个目标的中心点、所述交互关键点以及所述至少两个偏移量确定所述第一图像中的目标之间的交互关系。
在本公开的一些可选实施例中,所述基于所述特征数据确定所述第一图像中的各个交互关键点以及每个目标的中心点,包括:基于所述特征数据确定所述第一图像中的每个目标的中心点,以及每个目标的置信度;基于所述特征数据确定所述第一图像中的交互关键点,以及每个交互关键点对应各个交互动作类别的置信度;所述基于各个目标的 中心点、所述交互关键点以及所述至少两个偏移量确定所述第一图像中的目标之间的交互关系,包括:基于各个目标的中心点、所述交互关键点、所述至少两个偏移量、每个目标的置信度以及每个交互关键点对应各个预设交互动作类别的置信度,确定所述第一图像中的目标之间的交互关系。
在本公开的一些可选实施例中,所述基于所述特征数据确定所述第一图像中的每个目标的中心点以及每个目标的置信度,包括:基于所述特征数据确定所述第一图像中的每个目标的中心点及其类别,以及每个目标属于各个类别的置信度;所述基于各个目标的中心点、所述交互关键点、所述至少两个偏移量、每个目标的置信度以及每个交互关键点对应各个预设交互动作类别的置信度,确定所述第一图像中的目标之间的交互关系,包括:基于各个目标的中心点及其类别、所述交互关键点、所述至少两个偏移量、每个目标属于各个类别的置信度以及每个交互关键点对应各个预设交互动作类别的置信度,确定所述第一图像中的目标之间的交互关系。
在本公开的一些可选实施例中,所述基于各个目标的中心点、所述交互关键点、所述至少两个偏移量、每个目标的置信度以及每个交互关键点对应各个预设交互动作类别的置信度,确定所述第一图像中的目标之间的交互关系,包括:针对一个交互关键点,确定与所述交互关键点相对应的两个偏移量;根据所述交互关键点以及与所述交互关键点相对应的两个偏移量,确定与该交互关键点对应的两个预测中心点;根据各目标的中心点以及与各个交互关键点对应的两个预测中心点,确定每个交互关键点对应的两个目标;根据每个交互关键点对应的两个目标、每个目标的置信度以及每个交互关键点对应各个预设交互动作类别的置信度,确定所述第一图像中的目标之间的交互关系。
在本公开的一些可选实施例中,所述根据每个交互关键点对应的两个目标、每个目标的置信度以及每个交互关键点对应各个预设交互动作类别的置信度,确定所述第一图像中的目标之间的交互关系,包括:针对一个交互关键点,将所述交互关键点对应一个预设交互动作类别的置信度与所述交互关键点对应的两个目标的置信度相乘,得到第一置信度,所述第一置信度为所述交互关键点对应的两个目标之间的交互关系属于该预设交互动作类别的置信度;响应于所述第一置信度大于置信度阈值,确定所述交互关键点对应的两个目标之间的交互关系属于所述预设交互动作类别;响应于所述第一置信度不大于置信度阈值,确定所述交互关键点对应的两个目标之间的交互关系不属于所述预设交互动作类别。
在本公开的一些可选实施例中,所述方法还包括:在确定一个交互关键点对应的两个目标之间的交互关系不属于各个预设交互动作类别之后,确定所述交互关键点对应的两个目标之间不存在交互关系。
在本公开的一些可选实施例中,所述根据各目标的中心点以及与各个交互关键点对应的两个预测中心点,确定每个交互关键点对应的两个目标,包括:针对一个预测中心点,确定各目标的中心点与所述预测中心点之间的距离;将中心点与所述该预测中心点之间的距离小于预设距离阈值的目标作为该预测中心点对应的交互关键点所对应的目标。
在本公开的一些可选实施例中,基于所述特征数据确定所述第一图像中的每个目标的中心点,包括:将所述特征数据下采样得到所述第一图像的热力图;根据所述热力图确定所述第一图像中各点的位置偏移、所述第一图像中的每个目标的中心点以及每个目标的检测框的高度和宽度;在基于所述特征数据确定所述第一图像中的每个目标的中心点之后,所述方法还包括:根据所述第一图像中具有交互关系的目标的中心点的位置偏移对所述第一图像中具有交互关系的目标的中心点的位置进行修正,得到所述第一图像中具有交互关系的目标的中心点的修正后的位置;根据所述第一图像中具有交互关系的 目标的中心点的修正后的位置及其检测框的高度和宽度,确定所述第一图像中具有交互关系的目标的检测框。
在本公开的一些可选实施例中,所述图像处理方法由神经网络执行,所述神经网络采用样本图像训练得到,所述样本图像中标注了存在交互关系的目标的检测框,所述样本图像中存在交互关系的目标的标注的中心点以及标注的交互关键点根据标注的检测框确定,标注的偏移量根据存在交互关系的目标的标注的中心点以及标注的交互关键点确定。
在本公开的一些可选实施例中,所述神经网络采用样本图像训练得到,包括:利用所述神经网络提取所述样本图像的特征数据;利用所述神经网络对所述样本图像的特征数据下采样得到所述样本图像的热力图;利用所述神经网络基于所述样本图像的热力图预测所述样本图像中各点的位置偏移、所述样本图像中的各个交互关键点、所述样本图像中的每个目标的中心点、所述样本图像中的每个目标的检测框的高度和宽度;利用所述神经网络基于所述样本图像的特征数据预测至少两个偏移量;基于所述样本图像中的各个目标的中心点、所述样本图像中的所述交互关键点以及所述样本图像中的至少两个偏移量预测所述样本图像中的目标之间的交互关系;根据预测的位置偏移、所述样本图像中存在交互关系的目标的预测的中心点及预测的检测框的高度和宽度、所述样本图像中存在交互关系的目标对应的预测的交互关键点及其对应的预测的偏移量,以及标注的位置偏移以及所述样本图像中标注的存在交互关系的目标的检测框,调整所述神经网络的网络参数值。
本公开实施例还提供了一种图像处理装置,所述装置包括:提取单元、第一确定单元、第二确定单元和第三确定单元;其中,
所述提取单元,配置为提取第一图像的特征数据;
所述第一确定单元,配置为基于所述提取单元提取的所述特征数据确定所述第一图像中的各个交互关键点以及每个目标的中心点;一个交互关键点为连线上距离所述连线的中点预设范围内的一个点,所述连线为一个交互动作中的两个目标的中心点之间的连线;
所述第二确定单元,配置为基于所述提取单元提取的所述特征数据确定至少两个偏移量;一个偏移量表征一个交互动作中的交互关键点与该交互动作中的一个目标的中心点的偏移量;
所述第三确定单元,配置为基于各个目标的中心点、所述交互关键点以及所述至少两个偏移量确定所述第一图像中的目标之间的交互关系。
在本公开的一些可选实施例中,所述第一确定单元,配置为基于所述特征数据确定所述第一图像中的每个目标的中心点,以及每个目标的置信度;基于所述特征数据确定所述第一图像中的交互关键点,以及每个交互关键点对应各个交互动作类别的置信度;
所述第三确定单元,配置为基于各个目标的中心点、所述交互关键点、所述至少两个偏移量、每个目标的置信度以及每个交互关键点对应各个预设交互动作类别的置信度,确定所述第一图像中的目标之间的交互关系。
在本公开的一些可选实施例中,所述第一确定单元,配置为基于所述特征数据确定所述第一图像中的每个目标的中心点及其类别,以及每个目标属于各个预设类别的置信度;
所述第三确定单元,配置为基于各个目标的中心点及其类别、所述交互关键点、所述至少两个偏移量、每个目标属于各个预设类别的置信度以及每个交互关键点对应各个预设交互动作类别的置信度,确定所述第一图像中的目标之间的交互关系。
在本公开的一些可选实施例中,所述第三确定单元,配置为针对一个交互关键点, 确定与所述交互关键点相对应的两个偏移量;根据所述交互关键点以及与所述交互关键点相对应的两个偏移量,确定与该交互关键点对应的两个预测中心点;根据各目标的中心点以及与各个交互关键点对应的两个预测中心点,确定每个交互关键点对应的两个目标;根据每个交互关键点对应的两个目标、每个目标的置信度以及每个交互关键点对应各个预设交互动作类别的置信度,确定所述第一图像中的目标之间的交互关系。
在本公开的一些可选实施例中,所述第三确定单元,配置为针对一个交互关键点,将所述交互关键点对应一个预设交互动作类别的置信度与所述交互关键点对应的两个目标的置信度相乘,得到第一置信度,所述第一置信度为所述交互关键点对应的两个目标之间的交互关系属于该交互动作类别的置信度;响应于所述第一置信度大于置信度阈值,确定所述交互关键点对应的两个目标之间的交互关系属于所述预设交互动作类别;响应于所述第一置信度不大于置信度阈值,确定所述交互关键点对应的两个目标之间的交互关系不属于所述预设交互动作类别。
在本公开的一些可选实施例中,所述第三确定单元,还配置为在确定一个交互关键点对应的两个目标之间的交互关系不属于各个预设交互动作类别之后,确定所述交互关键点对应的两个目标之间不存在交互关系。
在本公开的一些可选实施例中,所述第三确定单元,配置为针对一个预测中心点,确定各目标的中心点与所述预测中心点之间的距离;将中心点与所述该预测中心点之间的距离小于预设距离阈值的目标作为该预测中心点对应的交互关键点所对应的目标。
在本公开的一些可选实施例中,所述第一确定单元,配置为将所述特征数据下采样得到所述第一图像的热力图;根据所述热力图确定所述第一图像中各点的位置偏移、所述第一图像中的每个目标的中心点以及每个目标的检测框的高度和宽度;还配置为在基于所述特征数据确定所述第一图像中的每个目标的中心点之后,根据所述第一图像中具有交互关系的目标的中心点的位置偏移对所述第一图像中具有交互关系的目标的中心点的位置进行修正,得到所述第一图像中具有交互关系的目标的中心点的修正后的位置;根据所述第一图像中具有交互关系的目标的中心点的修正后的位置及其检测框的高度和宽度,确定所述第一图像中具有交互关系的目标的检测框。
在本公开的一些可选实施例中,所述图像处理装置中的各功能单元由神经网络实现,所述神经网络采用样本图像训练得到,所述样本图像中标注了存在交互关系的目标的检测框,所述样本图像中存在交互关系的目标的标注的中心点以及标注的交互关键点根据标注的检测框确定,标注的偏移量根据存在交互关系的目标的标注的中心点以及标注的交互关键点确定。
在本公开的一些可选实施例中,所述装置还包括训练单元,配置为采用样本图像训练得到所述神经网络,具体配置为:利用所述神经网络提取所述样本图像的特征数据;利用所述神经网络对所述样本图像的特征数据下采样得到所述样本图像的热力图;利用所述神经网络基于所述样本图像的热力图预测所述样本图像中各点的位置偏移、所述样本图像中的各个交互关键点、所述样本图像中的每个目标的中心点、所述样本图像中的每个目标的检测框的高度和宽度;利用所述神经网络基于所述样本图像的特征数据预测至少两个偏移量;基于所述样本图像中的各个目标的中心点、所述样本图像中的所述交互关键点以及所述样本图像中的至少两个偏移量预测所述样本图像中的目标之间的交互关系;根据预测的位置偏移、所述样本图像中存在交互关系的目标的预测的中心点及预测的检测框的高度和宽度、所述样本图像中存在交互关系的目标对应的预测的交互关键点及其对应的预测的偏移量,以及标注的位置偏移以及所述样本图像中标注的存在交互关系的目标的检测框,调整所述神经网络的网络参数值。
本公开实施例还提供了一种计算机可读存储介质,其上存储有计算机程序,该程序 被处理器执行时实现本公开实施例所述方法的步骤。
本公开实施例还提供了一种电子设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,所述处理器执行所述程序时实现本公开实施例所述方法的步骤。
本公开实施例还提供了一种计算机程序,包括计算机可读代码,当所述计算机可读代码在电子设备中运行时,所述电子设备中的处理器执行用于实现本公开实施例所述方法的步骤。
本公开实施例提供的图像处理方法、装置、存储介质和电子设备,所述方法包括:提取第一图像的特征数据;基于所述特征数据确定所述第一图像中的各个交互关键点以及每个目标的中心点;一个交互关键点为连线上距离所述连线的中点预设范围内的一个点,所述连线为一个交互动作中的两个目标的中心点之间的连线;基于所述特征数据确定至少两个偏移量;一个偏移量表征一个交互动作中的交互关键点与该交互动作中的一个目标的中心点的偏移量;基于各个目标的中心点、所述交互关键点以及所述至少两个偏移量确定所述第一图像中的目标之间的交互关系。采用本公开实施例的技术方案,通过定义与交互动作相关的交互关键点,以及确定与交互关键点相关的至少两个偏移量,进而通过各个目标的中心点、所述交互关键点以及所述至少两个偏移量确定所述第一图像中的目标之间的交互关系,无需生成人-物体对,也避免了采用人-物体对进行交互动作检测产生的具有真正交互关系的人-物体对丢失的问题;并且,相比于传统方式中先检测人和物体,然后再将人和物体组队,再基于关系分类网络对每个人-物体对进行分类检测,本实施例大大提升了检测速度,提升了检测效率。
附图说明
图1为本公开实施例的图像处理方法的流程示意图;
图2为本公开实施例的图像处理方法的一种应用示意图;
图3为本公开实施例的图像处理方法的另一种应用示意图;
图4为本公开实施例的图像处理方法中的神经网络的训练方法流程示意图;
图5为本公开实施例的图像处理装置的组成结构示意图一;
图6为本公开实施例的图像处理装置的组成结构示意图二;
图7为本公开实施例的电子设备的硬件组成结构示意图。
具体实施方式
下面结合附图及具体实施例对本公开作进一步详细的说明。
本公开实施例提供了一种图像处理方法。图1为本公开实施例的图像处理方法的流程示意图;如图1所示,所述方法包括:
步骤101:提取第一图像的特征数据;
步骤102:基于所述特征数据确定所述第一图像中的各个交互关键点以及每个目标的中心点;一个交互关键点为连线上距离所述连线的中点预设范围内的一个点,所述连线为一个交互动作中的两个目标的中心点之间的连线;
步骤103:基于所述特征数据确定至少两个偏移量;一个偏移量表征一个交互动作中的交互关键点与该交互动作中的一个目标的中心点的偏移量;
步骤104:基于各个目标的中心点、所述交互关键点以及所述至少两个偏移量确定所述第一图像中的目标之间的交互关系。
本实施例中,第一图像中可包括多个目标,其中,所述多个目标中各目标之间可能不具有交互关系,或者,所述多个目标中可包括至少一组具有交互关系的目标;其中,所述具有交互关系的目标具体是至少两个目标,示例性的,所述至少两个目标中至少具有一个目标人物,例如,具有交互关系的两个目标为具有交互关系的两个目标人物,或者,具有交互关系的两个目标为具有交互关系的一个目标人物和一个目标物体。可以理解,所述具有交互关系的至少两个目标具体可以是具有交互动作的两个目标;其中,所述具有交互动作的两个目标可以是具有直接交互动作或隐含交互动作的两个目标。作为一种示例,若第一图像中包括的目标人物手中执有一根香烟,则可认为该目标人物与作为目标物体的香烟具有直接动作关系,则本示例中目标人物和目标对象具有直接动作关系。作为另一种示例,若第一图像中包括的目标人物拍球,目标人物做出拍球的动作,球在目标人物的手部的下方半空中,则可认为该目标人物与作为目标物体的球具有隐含动作关系。
本公开实施例提供的图像处理方法在确定图像中的目标是否存在交互关系时,确定目标的中心点和交互关键点的步骤(点检测步骤)可以与确定偏移量的步骤(点匹配步骤)并行,然后根据确当的偏移量以及检测的中心点和交互关键点来最终确定存在交互关系的目标及其交互动作类别,从而提高交互关系检测的效率。
在本公开的一种可选实施例中,针对步骤101,所述提取第一图像的特征数据,包括:通过深度神经网络模型提取所述第一图像的特征数据。示例性的,将第一图像作为输入数据输入至深度神经网络模型中,获得所述第一图像的特征数据。其中,可以理解,深度神经网络模型中可包括多个卷积层,通过各卷积层依次对第一图像进行卷积处理,从而获得第一图像的特征数据。
本实施例中,可通过预先训练获得的第一分支网络执行步骤102,即通过第一分支网络基于所述特征数据确定每个目标的中心点以及各个交互关键点。可以理解,将所述第一图像的特征数据作为输入数据输入至所述第一分支网络中,得到所述第一图像中每个目标的中心点以及各个交互关键点。例如,若第一图像中包括的目标均为目标人物,则通过所述第一分支网络对所述特征数据进行处理,得到每个目标人物的中心点以及各个交互关键点。又例如,若第一图像中包括的目标包括目标人物和目标物体,则通过所述第一分支网络对所述特征数据进行处理,得到目标人物的中心点、目标物体的中心点以及各个交互关键点。
其中,在一些实施例中,第一分支网络在目标的中心点之后,还会回归出目标的检测框的长度和宽度,目标的检测框根据目标的中心点和目标的检测框的长度和宽度确定。如图2所示,第一图像中包括两个目标人物和两个目标物体(两个目标物体为两个球),为了以示区别,可将目标人物的中心点记为第一中心点,将目标物体的中心点记为第二中心点。
其中,在一些实施例中,交互关键点为一个交互动作中的两个目标的中心点之间的连线上距离该连线的中点在预设范围内的点。作为一种示例,所述交互关键点可以为一个交互动作中的两个目标的中心点之间的连线的中点。如图2所示,一个交互关键点可以为一个交互动作中的目标人物的第一中心点和目标物体的第二中心点之间的连线的中点。
本实施例中,可通过预先训练获得的第二分支网络执行步骤103,即通过第二分支网络基于所述特征数据确定至少两个偏移量;其中,一个偏移量表征一个交互动作中的交互关键点与该交互动作中的一个目标的中心点的偏移量。可以理解,将第一图像的特征数据作为输入数据输入至所述第二分支网络中,得到第一图像中的每个点的至少两个偏移量。
实际应用中,每个点对应的至少两个偏移量可通过偏移量矩阵表示。则可基于步骤102中确定的各交互关键点,确定每个交互关键点对应的至少两个偏移量。在一些实施例中,可根据各交互关键点的坐标,以及每个点对应的偏移量矩阵,确定各交互关键点对应的至少两个偏移量。
参照图2所示,示例性的,一个偏移量表征交互动作中的交互关键点与第一中心点的偏移量,另一个偏移量表征所述交互动作中的交互关键点与第二中心点的偏移量,为了以示区别,将交互动作中的交互关键点与第一中心点的偏移量记为第一偏移量,将所述交互动作中的交互关键点与第二中心点的偏移量记为第二偏移量,则本示例中,第一偏移量表征交互动作中的交互关键点与第一中心点的偏移量,第二偏移量表征交互动作中的交互关键点与第二中心点的偏移量。当然,在其他示例中,也可将两个目标分别记为第一目标和第二目标,则第一偏移量表征交互动作中的交互关键点与第一目标的中心点的偏移量,第二偏移量表征交互动作中的交互关键点与第二目标的中心点的偏移量。
本实施例中,针对步骤104,所述基于各个目标的中心点、所述交互关键点以及所述至少两个偏移量确定所述第一图像中的目标之间的交互关系,包括:针对一个交互关键点,确定与所述交互关键点相对应的两个偏移量;根据所述交互关键点以及与所述交互关键点相对应的两个偏移量,确定与该交互关键点对应的两个预测中心点;根据各目标的中心点以及与各个交互关键点对应的两个预测中心点,确定每个交互关键点对应的两个目标;根据每个交互关键点对应的两个目标确定所述第一图像中的目标之间的交互关系。
本实施例中,通过步骤103确定的至少两个偏移量的作用是为了确定具有交互动作(即交互关系)的至少两个目标。通过步骤102中确定的各目标的中心点以及各交互关键点,但并不知道哪些目标具有交互动作。基于此,本实施例中确定与每个交互关键点相对应的两个偏移量,根据所述交互关键点以及与所述交互关键点相对应的两个偏移量,确定与该交互关键点对应的两个预测中心点。
示例性的,以任意交互关键点(这里记为第一交互关键点)为例,则基于第一交互关键点的位置和与该第一交互关键点对应的一个偏移量(例如第一偏移量)可确定第一位置,所述第一位置理论上可作为与第一交互关键点匹配的一个目标的中心点(例如第一中心点)所在位置,这里将所述第一位置记为第一预测中心点;同理,则基于第一交互关键点的位置和与该第一交互关键点对应的另一个偏移量(例如第二偏移量)可确定第二位置,这里将所述第二位置记为第二预测中心点。
进一步地,将中心点与获得的预测中心点之间的距离小于预设距离阈值的目标作为该预测中心点对应的交互关键点所对应的目标。示例性的,第一目标的中心点与上述第一预测中心点之间的距离小于预设距离阈值,第二目标的中心点与上述第二预设中心点之间的距离小于所述预设距离阈值,则可表明,所述第一目标和所述第二目标为上述第一交互关键点对应的两个目标。可以理解,与某预测中心点之间的距离小于预设距离阈值的目标的中心点可能不止一个,也即与一个交互关键点对应的目标可以是两个或两个以上。
本实施例中,可基于各交互关键点对应的各个预设交互动作类别的置信度确定与该交互关键点对应的至少两个目标之间的交互关系。可以理解,在通过第一分支网络对特征数据进行处理得到第一图像中的各个交互关键点时,还可获得每个交互关键点对应的各个预设交互动作类别的置信度,基于所述预设交互动作类别的置信度确定至少两个目标之间的交互关系。
采用本公开实施例的技术方案,通过定义与交互动作相关的交互关键点,以及确定与交互关键点相关的至少两个偏移量,进而通过各个目标的中心点、所述交互关键点以 及所述至少两个偏移量确定所述第一图像中的目标之间的交互关,无需生成人-物体对,也避免了采用人-物体对进行交互动作检测产生的具有真正交互关系的人-物体对丢失的问题;并且,本实施例可直接获得具有交互关系的目标,相比于传统方式中基于关系分类网络对每个人-物体对进行分类检测,本实施例大大提升了检测速度,提升了检测效率。
下面针对图1所示的图像处理方法的各步骤进行具体说明。
在本公开的一种可选实施例中,针对步骤102,基于所述特征数据确定所述第一图像中的每个目标的中心点,包括:将所述特征数据下采样得到所述第一图像的热力图;根据所述热力图确定所述第一图像中各点的位置偏移、所述第一图像中的每个目标的中心点以及每个目标的检测框的高度和宽度;在基于所述特征数据确定所述第一图像中的每个目标的中心点之后,所述方法还包括:根据所述第一图像中具有交互关系的目标的中心点的位置偏移对所述第一图像中具有交互关系的目标的中心点的位置进行修正,得到所述第一图像中具有交互关系的目标的中心点的修正后的位置;根据所述第一图像中具有交互关系的目标的中心点的修正后的位置及其检测框的高度和宽度,确定所述第一图像中具有交互关系的目标的检测框。
本实施例中,对所述第一图像的特征数据进行下采样处理,所述下采样处理例如可以是对包含有特征数据的特征图进行图像缩小处理,即缩小特征图的尺寸,这导致下采样后得到的热力图中的各点与第一图像中的各点并不是一一对应的关系。例如,第一图像的大小为128x128,第一图像中的目标人物的中心点是(10,10),但是,由于热力图是下采样得到的,假设下采样4倍为32x32,那么目标人物的中心点映射过来应该是(10/4,10/4)=(2.5,2.5),但是由于再热力图中点的坐标是整数,因此,热力图中预测出来的目标人物的中心点是坐标下取整的点,即坐标为(2,2),也就是说,下采样会导致目标人物的中心点的位置产生一个位置偏移。
因此,可通过第一分支网络对所述特征数据进行处理,具体是先通过对包含有特征数据的特征图进行下采样处理得到热力图(Heatmap),再根据热力图确定所述第一图像中各点的位置偏移、所述第一图像中的每个目标的中心点以及每个目标的检测框的高度和宽度。可以理解,将特征数据作为第一分支网络的输入数据,在根据特征数据下采样得到热力图后,第一分支网络基于热力图确定出第一图像中各点的位置偏移(4ffset),第一图像中的各目标的中心点、各目标的检测框的高度和宽度[height,width]以及各目标属于各个类别置信度、第一图像中的各个交互关键点以及各个交互关键点属于各个预设交互动作类别的置信度。
本实施例中,在一些实施例中,在基于所述特征数据确定所述第一图像中的各点的位置偏移之后,可基于具有交互关系的目标的中心点的位置偏移对该中心点的位置进行修正。示例性的,可将得到的目标的中心点与相应的位置偏移进行加和处理,得到修正后的目标的中心点的位置。相应的,根据目标的中心点的修正后的位置以及检测框的高度和宽度,得到该目标的检测框,从而输出具有交互关系的目标的检测框。
示例性的,可参照图2所示,假设图2中的第一中心点即为修正后的位置,贯穿该第一中心点的纵向虚线表明检测框的高度(height),贯穿该第一中心点的横向虚线表明检测框的宽度(width)
在本公开的一种可选实施例中,针对步骤102,所述基于所述特征数据确定所述第一图像中的各个交互关键点以及每个目标的中心点,包括:基于所述特征数据确定所述第一图像中的每个目标的中心点,以及每个目标的置信度;基于所述特征数据确定所述第一图像中的交互关键点,以及每个交互关键点对应的各个预设交互动作类别的置信度;
所述基于各个目标的中心点、所述交互关键点以及所述至少两个偏移量确定所述第一图像中的目标之间的交互关系,包括:基于各个目标的中心点、所述交互关键点、所述至少两个偏移量、每个目标的置信度以及每个交互关键点对应的各个预设交互动作类别的置信度,确定所述第一图像中的目标之间的交互关系。
本实施例中,可基于第一分支网络对特征数据进行处理,示例性的,可通过第一分支网络中的多个卷积层对特征数据进行卷积处理,得到第一图像中的各目标的中心点以及每个目标的置信度,其中,所述目标的置信度可以为所述第一图像中存在所述目标的置信度。相应的,还可通过第一分支网络中的多个卷积层对特征数据进行卷积处理,得到第一图像中的各交互关键点以及每个交互关键点对应的预设交互动作类别的置信度;其中,所述预设交互动作类别可以是预先设置的任意交互动作类别,例如吸烟交互动作、拍球交互动作等等。进一步地,基于各个目标的中心点、所述交互关键点、所述至少两个偏移量、每个目标的置信度以及每个交互关键点对应各个预设交互动作类别的置信度,确定所述第一图像中的目标之间的交互关系。
基于此,在本公开的一种可选实施例中,所述基于所述特征数据确定所述第一图像中的每个目标的中心点以及每个目标的置信度,包括:基于所述特征数据确定所述第一图像中的每个目标的中心点及其类别,以及每个目标属于各个类别的置信度;所述基于各个目标的中心点、所述交互关键点、所述至少两个偏移量、每个目标的置信度以及每个交互关键点对应各个预设交互动作类别的置信度,确定所述第一图像中的目标之间的交互关系,包括:基于各个目标的中心点及其类别、所述交互关键点、所述至少两个偏移量、每个目标属于各个类别的置信度以及每个交互关键点对应各个预设交互动作类别的置信度,确定所述第一图像中的目标之间的交互关系。
本实施例中,可基于第一分支网络对特征数据进行处理,示例性的,可通过第一分支网络中的多个卷积层对特征数据进行卷积处理,得到第一图像中的各目标的中心点及其类别,以及每个目标属于各个类别的置信度;其中,第一图像中的目标所属类别可包括人、车、球类等任意类别,所述目标属于各个类别的置信度所述第一图像中所述目标属于该类别的置信度,也就是第一图像中的某处存在属于某一类别的目标的置信度。则本实施例中,基于各个目标的中心点及其类别、所述交互关键点、所述至少两个偏移量、每个目标属于各个类别的置信度以及每个交互关键点对应各个预设交互动作类别的置信度,确定所述第一图像中的目标之间的交互关系。
在本公开的一种可选实施例中,所述基于各个目标的中心点、所述交互关键点、所述至少两个偏移量、每个目标的置信度以及每个交互关键点对应各个预设交互动作类别的置信度,确定所述第一图像中的目标之间的交互关系,包括:针对一个交互关键点,确定与所述交互关键点相对应的两个偏移量;根据所述交互关键点以及与所述交互关键点相对应的两个偏移量,确定与该交互关键点对应的两个预测中心点;根据各目标的中心点以及与各个交互关键点对应的两个预测中心点,确定每个交互关键点对应的两个目标;根据每个交互关键点对应的两个目标、每个目标属于各个类别的置信度以及每个交互关键点对应各个预设交互动作类别的置信度,确定所述第一图像中的目标之间的交互关系。
本实施例中,以任意交互关键点(这里记为第一交互关键点)为例,则基于第一交互关键点的位置和与该第一交互关键点对应的一个偏移量(例如第一偏移量)可确定第一位置,这里将所述第一位置记为第一预测中心点;同理,则基于第一交互关键点的位置和与该第一交互关键点对应的另一个偏移量(例如第二偏移量)可确定第二位置,这里将所述第二位置记为第二预测中心点。
进一步基于各目标的中心点以及与各个交互关键点对应的两个预测中心点,确定每 个交互关键点对应的两个目标,根据每个交互关键点对应的两个目标、每个目标属于各个类别的置信度以及每个交互关键点对应各个预设交互动作类别的置信度,确定所述第一图像中的目标之间的交互关系。
在本公开的一种可选实施例中,所述根据各目标的中心点以及与各个交互关键点对应的两个预测中心点,确定每个交互关键点对应的两个目标,包括:针对一个预测中心点,确定各目标的中心点与所述预测中心点之间的距离;将中心点与所述该预测中心点之间的距离小于预设距离阈值的目标作为该预测中心点对应的交互关键点所对应的目标。
本实施例中,将目标的中心点与获得的预测中心点之间的距离小于预设距离阈值的目标作为该预测中心点对应的交互关键点所对应的目标。示例性的,第一目标的中心点与上述第一预测中心点之间的距离小于预设距离阈值,第二目标的中心点与上述第二预设中心点之间的距离小于所述预设距离阈值,则可表明,所述第一目标和所述第二目标为上述第一交互关键点对应的两个目标。可以理解,与某预测中心点之间的距离小于预设距离阈值的目标的中心点可能不止一个,也即与一个交互关键点对应的目标可以是两个或两个以上。进一步基于根据每个交互关键点对应的至少两个目标、每个目标属于各个类别的置信度以及每个交互关键点对应各个预设交互动作类别的置信度,确定所述第一图像中的目标之间的交互关系。
在本公开的一种可选实施例中,所述根据每个交互关键点对应的两个目标、每个目标的置信度以及每个交互关键点对应各个预设交互动作类别的置信度,确定所述第一图像中的目标之间的交互关系,包括:针对一个交互关键点,将所述交互关键点对应一个预设交互动作类别的置信度与所述交互关键点对应的两个目标属于相应类别的置信度相乘,得到第一置信度,所述第一置信度为所述交互关键点对应的两个目标之间的交互关系属于该交互动作类别的置信度;其中,相应类别是指两个目标属于该类别的时候,两个目标之间的交互属于预设交互动作类别;例如,预设动作类别为排球,那么相应类别是一个目标的类别为人,另一个目标的类别为球;预设动作类别为打电话,那么相应类别是一个目标的类别为人,另一个目标的类别为电话。响应于所述第一置信度大于置信度阈值,确定所述交互关键点对应的两个目标之间的交互关系属于所述预设交互动作类别;响应于所述第一置信度不大于置信度阈值,确定所述交互关键点对应的两个目标之间的交互关系不属于所述预设交互动作类别。
在本公开的一种可选实施例中,所述方法还包括:在确定一个交互关键点对应的两个目标之间的交互关系不属于各个预设交互动作类别之后,确定所述交互关键点对应的两个目标之间不存在交互关系。
本实施例中,若一个交互关键点对应至少两个目标,也即确定多个目标之间的交互关系过程中,可先采用上述方案确定多个目标中的两两目标之间的交互关系,确定这两两目标之间的交互关系是否属于对应的交互关键点对应的预设交互动作类别,以此类推。例如与一个交互关键点对应有三个目标,记为目标1、目标2和目标3,则可采用上述方案分别确定目标1和目标2、目标2和目标3以及目标3和目标1之间的交互关系。
图3为本公开实施例的图像处理方法的另一种应用示意图;如图3所示,神经网络可包括特征提取网络、第一分支网络和第二分支网络;其中,特征提取网络用于对输入图像进行特征提取,得到特征数据。第一分支网络用于对特征数据进行下采样得到热力图,再根据热力图确定输入图像中的各目标的中心点以及各交互关键点,以及得到各点的位置偏移(offset)和每个目标的检测框的高度和宽度[高度,宽度],各目标所属类别的置信度以及各个交互关键点对应各个预设交互动作类别的置信度。第二分支网络用于 对特征数据进行处理得到输入图像中的每个点的至少两个偏移量,一个偏移量表征一个交互动作中的交互关键点与该交互动作中的一个目标的中心点的偏移量。
在一种实施方式中,通过第一分支网络对包含有特征数据的特征图进行下采样处理,得到热力图。以本示例中输入图像中的目标包括目标人物和目标物体为例,为了以示区别,将目标人物的中心点记为第一中心点,将目标物体的中心点记为第二中心点,则可得到分别包含有第一中心点的第一热力图、包含有第二中心点的第二热力图和包含有各交互关键点的第三热力图。也就是说,第一分支网络的输出数据可以包括上述第一热力图、第二热力图、第三热力图以及输入图像中各点的位置偏移以及目标人物和目标物体的检测框的高度和宽度。
具体的,基于第一分支网络还可获得每个目标的中心点及其类别以及每个目标属于各个类别的置信度,以及每个交互关键点对应的各个预设交互动作类别的置信度。
在一种实施方式中,通过第二分支网络对包含有特征数据的特征图进行处理,得到每个交互关键点对应的两个偏移量,为了以示区别,将交互关键点与交互动作中的目标人物的第一中心点之间的偏移量记为第一偏移量,将交互关键点与交互动作中的目标物体的第二中心点之间的偏移量记为第二偏移量。
根据一个交互关键点以及与该交互关键点相对应的第一偏移量和第二偏移量,确定与该交互关键点对应的两个预测中心点,分别记为第一预测中心点和第二预测中心点;针对第一预测中心点,分别确定各第一中心点与第一预测中心点之间的距离,确定与所述该第一预测中心点之间的距离小于预设距离阈值的第一中心点;相应的,针对第二预测中心点,分别确定各第二中心点与第二预测中心点之间的距离,确定与该第二预测中心点之间的距离小于预设距离阈值的第二中心点。
针对图3中的两个交互关键点,分别将每个交互关键点对应的预设交互动作类别的置信度与所述交互关键点对应的目标人物的置信度和目标物体的置信度相乘,得到第一置信度;在第一置信度大于置信度阈值的情况下,确定该交互关键点对应的目标人物和目标物体之间的交互关系属于所述交互关键点对应的预设交互动作类别;在第一置信度不大于置信度阈值的情况下,确定该交互关键点对应的目标人物和目标物体之间的交互关系不属于所述交互关键点对应的预设交互动作类别。
本示例中,通过第一分支网络输出的输入图像中的各点的位置偏移对目标人物的第一中心点和目标物体的第二中心点的位置进行修正,得到具有交互关系的目标人物的第一中心点的修正后的位置,以及目标物体的第二中心点的修正后的位置,根据输入图像中具有交互关系的目标人物的第一中心点的修正后的位置及其检测框的高度和宽度[高度,宽度]、目标物体的第二中心点的修正后的位置及其检测框的高度和宽度[高度,宽度],确定所述第一图像中具有交互关系的目标的检测框。神经网络的输出结果为目标人物的第一中心点的修正后的位置和对应的检测框、目标物体的第二中心点的修正后的位置和对应的检测框,以及目标人物和目标物体的交互关系(即交互动作类别)。对于输入图像中不存在交互关系的目标则不会输出检测框。
在本公开的一种可选实施例中,本实施例的所述图像处理方法由神经网络执行,所述神经网络采用样本图像训练得到,所述样本图像中标注了存在交互关系的目标的检测框,所述样本图像中存在交互关系的目标的标注的中心点(即目标检测框的中心)以及标注的交互关键点(存在交互关系的目标的检测框的中心的连线的中点)根据标注的检测框确定,标注的偏移量根据样本图像的大小以及根据样本图像确定的热力图的大小确定。基于此,本公开实施例还提供了一种神经网络的训练方法。图4为本公开实施例的图像处理方法中的神经网络的训练方法流程示意图;如图4所示,所述方法包括:
步骤201:利用所述神经网络提取所述样本图像的特征数据;
步骤202:利用所述神经网络对所述样本图像的特征数据下采样得到所述样本图像的热力图;
步骤203:利用所述神经网络基于所述样本图像的热力图预测所述样本图像中各点的位置偏移、所述样本图像中的各个交互关键点、所述样本图像中的每个目标的中心点、所述样本图像中的每个目标的检测框的高度和宽度;
步骤204:利用所述神经网络基于所述样本图像的特征数据预测至少两个偏移量;
步骤205:基于所述样本图像中的各个目标的中心点、所述样本图像中的所述交互关键点以及所述样本图像中的至少两个偏移量预测所述样本图像中的目标之间的交互关系;
步骤206:根据预测的位置偏移、所述样本图像中存在交互关系的目标的预测的中心点及预测的检测框的高度和宽度、所述样本图像中存在交互关系的目标对应的预测的交互关键点及其对应的预测的偏移量,以及标注的位置偏移以及所述样本图像中标注的存在交互关系的目标的检测框,调整所述神经网络的网络参数值。
本实施例步骤201至步骤205具体可参照前述实施例中所述,这里不再赘述。
本实施例步骤206中,在一些实施例中,针对神经网络中的第一分支网络,可根据预测的样本图像中存在交互关系的目标的预测的中心点及预测的检测框的高度和宽度以及预测的交互关键点、结合标注的存在交互关系的目标的检测框以及标注的位置偏移确定一个损失函数,基于该损失函数对第一分支网络的网络参数进行调整。
在一些实施例中,针对神经网络中的第二分支网络,可根据交互关键点对应的预测的偏移量以及标注的偏移量确定一个损失函数,基于该损失函数对第二分支网络的网络参数进行调整。
在一些实施例中,可基于预测的位置偏移和标注的位置偏移确定一个损失函数,通过该损失函数回归对包含特征数据的特征图进行下采样处理导致的位置偏差,尽量减少下采样带来的损失,可以使获得的各点的位置偏移(offset)更为准确。基于此,通过该损失函数对第一分支网络的网络参数进行调整。
本实施例中可采用上述各实施例中的参数调整方式对神经网络的网络参数值进行调整。
本公开实施例还提供了一种图像处理装置。图5为本公开实施例的图像处理装置的组成结构示意图一;如图5所示,所述装置包括:提取单元41、第一确定单元42、第二确定单元43和第三确定单元44;其中,
所述提取单元41,配置为提取第一图像的特征数据;
所述第一确定单元42,配置为基于所述提取单元41提取的所述特征数据确定所述第一图像中的各个交互关键点以及每个目标的中心点;一个交互关键点为连线上距离所述连线的中点预设范围内的一个点,所述连线为一个交互动作中的两个目标的中心点之间的连线;
所述第二确定单元43,配置为基于所述提取单元41提取的所述特征数据确定至少两个偏移量;一个偏移量表征一个交互动作中的交互关键点与该交互动作中的一个目标的中心点的偏移量;
所述第三确定单元44,配置为基于各个目标的中心点、所述交互关键点以及所述至少两个偏移量确定所述第一图像中的目标之间的交互关系。
在本公开的一种可选实施例中,所述第一确定单元42,配置为基于所述特征数据确定所述第一图像中的每个目标的中心点,以及每个目标的置信度;基于所述特征数据确定所述第一图像中的交互关键点,以及每个交互关键点对应各个交互动作类别的置信度;
所述第三确定单元44,配置为基于各个目标的中心点、所述交互关键点、所述至少两个偏移量、每个目标的置信度以及每个交互关键点对应各个预设交互动作类别的置信度,确定所述第一图像中的目标之间的交互关系。
在本公开的一种可选实施例中,所述第一确定单元42,配置为基于所述特征数据确定所述第一图像中的每个目标的中心点及其类别,以及每个目标属于各个预设类别的置信度;
所述第三确定单元44,配置为基于各个目标的中心点及其类别、所述交互关键点、所述至少两个偏移量、每个目标属于各个预设类别的置信度以及每个交互关键点对应各个预设交互动作类别的置信度,确定所述第一图像中的目标之间的交互关系。
在本公开的一种可选实施例中,所述第三确定单元44,配置为针对一个交互关键点,确定与所述交互关键点相对应的两个偏移量;根据所述交互关键点以及与所述交互关键点相对应的两个偏移量,确定与该交互关键点对应的两个预测中心点;根据各目标的中心点以及与各个交互关键点对应的两个预测中心点,确定每个交互关键点对应的两个目标;根据每个交互关键点对应的两个目标、每个目标的置信度以及每个交互关键点对应各个预设交互动作类别的置信度,确定所述第一图像中的目标之间的交互关系。
在本公开的一种可选实施例中,所述第三确定单元44,配置为针对一个交互关键点,将所述交互关键点对应一个预设交互动作类别的置信度与所述交互关键点对应的两个目标的置信度相乘,得到第一置信度,所述第一置信度为所述交互关键点对应的两个目标之间的交互关系属于该交互动作类别的置信度;响应于所述第一置信度大于置信度阈值,确定所述交互关键点对应的两个目标之间的交互关系属于所述预设交互动作类别;响应于所述第一置信度不大于置信度阈值,确定所述交互关键点对应的两个目标之间的交互关系不属于所述预设交互动作类别。
在本公开的一种可选实施例中,所述第三确定单元44,还配置为在确定一个交互关键点对应的两个目标之间的交互关系不属于各个预设交互动作类别之后,确定所述交互关键点对应的两个目标之间不存在交互关系。
在本公开的一种可选实施例中,所述第三确定单元44,配置为针对一个预测中心点,确定各目标的中心点与所述预测中心点之间的距离;将中心点与所述该预测中心点之间的距离小于预设距离阈值的目标作为该预测中心点对应的交互关键点所对应的目标。
在本公开的一种可选实施例中,所述第一确定单元42,配置为将所述特征数据下采样得到所述第一图像的热力图;根据所述热力图确定所述第一图像中各点的位置偏移、所述第一图像中的每个目标的中心点以及每个目标的检测框的高度和宽度;还配置为在基于所述特征数据确定所述第一图像中的每个目标的中心点之后,根据所述第一图像中具有交互关系的目标的中心点的位置偏移对所述第一图像中具有交互关系的目标的中心点的位置进行修正,得到所述第一图像中具有交互关系的目标的中心点的修正后的位置;根据所述第一图像中具有交互关系的目标的中心点的修正后的位置及其检测框的高度和宽度,确定所述第一图像中具有交互关系的目标的检测框。
在本公开的一种可选实施例中,所述图像处理装置中的各功能单元由神经网络实现,所述神经网络采用样本图像训练得到,所述样本图像中标注了存在交互关系的目标的检测框,所述样本图像中存在交互关系的目标的标注的中心点以及标注的交互关键点根据标注的检测框确定,标注的偏移量根据存在交互关系的目标的标注的中心点以及标注的交互关键点确定。
在本公开的一种可选实施例中,如图6所示,所述装置还包括训练单元45,配置为采用样本图像训练得到所述神经网络,具体配置为:利用所述神经网络提取所述样本图像的特征数据;利用所述神经网络对所述样本图像的特征数据下采样得到所述样本图像 的热力图;利用所述神经网络基于所述样本图像的热力图预测所述样本图像中各点的位置偏移、所述样本图像中的各个交互关键点、所述样本图像中的每个目标的中心点、所述样本图像中的每个目标的检测框的高度和宽度;利用所述神经网络基于所述样本图像的特征数据预测至少两个偏移量;基于所述样本图像中的各个目标的中心点、所述样本图像中的所述交互关键点以及所述样本图像中的至少两个偏移量预测所述样本图像中的目标之间的交互关系;根据预测的位置偏移、所述样本图像中存在交互关系的目标的预测的中心点及预测的检测框的高度和宽度、所述样本图像中存在交互关系的目标对应的预测的交互关键点及其对应的预测的偏移量,以及标注的位置偏移以及所述样本图像中标注的存在交互关系的目标的检测框,调整所述神经网络的网络参数值。
本公开实施例中,所述装置中的提取单元41、第一确定单元42、第二确定单元43和第三确定单元44和训练单元45,在实际应用中均可由所述装置中的中央处理器(CPU,Central Processing Unit)、数字信号处理器(DSP,Digital Signal Processor)、微控制单元(MCU,Microcontroller Unit)或可编程门阵列(FPGA,Field-Programmable Gate Array)实现。
需要说明的是:上述实施例提供的图像处理装置在进行图像处理时,仅以上述各程序模块的划分进行举例说明,实际应用中,可以根据需要而将上述处理分配由不同的程序模块完成,即将装置的内部结构划分成不同的程序模块,以完成以上描述的全部或者部分处理。另外,上述实施例提供的图像处理装置与图像处理方法实施例属于同一构思,其具体实现过程详见方法实施例,这里不再赘述。
本公开实施例还提供了一种电子设备。图7为本公开实施例的电子设备的硬件组成结构示意图,如图7所示,所述电子设备包括存储器52、处理器51及存储在存储器52上并可在处理器51上运行的计算机程序,所述处理器51执行所述程序时实现本公开实施例述图像处理方法的步骤。
可选地,电子设备中的各个组件通过总线系统53耦合在一起。可理解,总线系统53用于实现这些组件之间的连接通信。总线系统53除包括数据总线之外,还包括电源总线、控制总线和状态信号总线。但是为了清楚说明起见,在图7中将各种总线都标为总线系统53。
可以理解,存储器52可以是易失性存储器或非易失性存储器,也可包括易失性和非易失性存储器两者。其中,非易失性存储器可以是只读存储器(ROM,Read Only Memory)、可编程只读存储器(PROM,Programmable Read-Only Memory)、可擦除可编程只读存储器(EPROM,Erasable Programmable Read-Only Memory)、电可擦除可编程只读存储器(EEPROM,Electrically Erasable Programmable Read-Only Memory)、磁性随机存取存储器(FRAM,Ferromagnetic Random Access Memory)、快闪存储器(Flash Memory)、磁表面存储器、光盘、或只读光盘(CD-ROM,Compact Disc Read-Only Memory);磁表面存储器可以是磁盘存储器或磁带存储器。易失性存储器可以是随机存取存储器(RAM,Random Access Memory),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的RAM可用,例如静态随机存取存储器(SRAM,Static Random Access Memory)、同步静态随机存取存储器(SSRAM,Synchronous Static Random Access Memory)、动态随机存取存储器(DRAM,Dynamic Random Access Memory)、同步动态随机存取存储器(SDRAM,Synchronous Dynamic Random Access Memory)、双倍数据速率同步动态随机存取存储器(DDRSDRAM,Double Data Rate Synchronous Dynamic Random Access Memory)、增强型同步动态随机存取存储器(ESDRAM,Enhanced Synchronous Dynamic Random Access Memory)、同步连接动态随机存取存储器(SLDRAM,SyncLink Dynamic Random Access Memory)、直接内存总线随机存取存储 器(DRRAM,Direct Rambus Random Access Memory)。本公开实施例描述的存储器52旨在包括但不限于这些和任意其它适合类型的存储器。
上述本公开实施例揭示的方法可以应用于处理器51中,或者由处理器51实现。处理器51可能是一种集成电路芯片,具有信号的处理能力。在实现过程中,上述方法的各步骤可以通过处理器51中的硬件的集成逻辑电路或者软件形式的指令完成。上述的处理器51可以是通用处理器、DSP,或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。处理器51可以实现或者执行本公开实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者任何常规的处理器等。结合本公开实施例所公开的方法的步骤,可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于存储介质中,该存储介质位于存储器52,处理器51读取存储器52中的信息,结合其硬件完成前述方法的步骤。
在示例性实施例中,电子设备可以被一个或多个应用专用集成电路(ASIC,Application Specific Integrated Circuit)、DSP、可编程逻辑器件(PLD,Programmable Logic Device)、复杂可编程逻辑器件(CPLD,Complex Programmable Logic Device)、FPGA、通用处理器、控制器、MCU、微处理器(Microprocessor)、或其他电子元件实现,用于执行前述方法。
在示例性实施例中,本公开实施例还提供了一种计算机可读存储介质,例如包括计算机程序的存储器52,上述计算机程序可由图像处理装置的处理器51执行,以完成前述方法所述步骤。计算机可读存储介质可以是FRAM、ROM、PROM、EPROM、EEPROM、Flash Memory、磁表面存储器、光盘、或CD-ROM等存储器;也可以是包括上述存储器之一或任意组合的各种设备。
本公开实施例提供的计算机可读存储介质,其上存储有计算机程序,该程序被处理器执行时实现本公开实施例所述的图像处理方法的步骤。
本公开实施例提供的计算机程序,包括计算机可读代码,当所述计算机可读代码在电子设备中运行时,所述电子设备中的处理器执行用于实现本公开实施例所述的图像处理方法的步骤。
本申请所提供的几个方法实施例中所揭露的方法,在不冲突的情况下可以任意组合,得到新的方法实施例。
本申请所提供的几个产品实施例中所揭露的特征,在不冲突的情况下可以任意组合,得到新的产品实施例。
本申请所提供的几个方法或设备实施例中所揭露的特征,在不冲突的情况下可以任意组合,得到新的方法实施例或设备实施例。
在本申请所提供的几个实施例中,应该理解到,所揭露的设备和方法,可以通过其它的方式实现。以上所描述的设备实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,如:多个单元或组件可以结合,或可以集成到另一个系统,或一些特征可以忽略,或不执行。另外,所显示或讨论的各组成部分相互之间的耦合、或直接耦合、或通信连接可以是通过一些接口,设备或单元的间接耦合或通信连接,可以是电性的、机械的或其它形式的。
上述作为分离部件说明的单元可以是、或也可以不是物理上分开的,作为单元显示的部件可以是、或也可以不是物理单元,即可以位于一个地方,也可以分布到多个网络单元上;可以根据实际的需要选择其中的部分或全部单元来实现本实施例方案的目的。
另外,在本公开各实施例中的各功能单元可以全部集成在一个处理单元中,也可以是各单元分别单独作为一个单元,也可以两个或两个以上单元集成在一个单元中;上述集成的单元既可以采用硬件的形式实现,也可以采用硬件加软件功能单元的形式实现。
本领域普通技术人员可以理解:实现上述方法实施例的全部或部分步骤可以通过程序指令相关的硬件来完成,前述的程序可以存储于一计算机可读取存储介质中,该程序在执行时,执行包括上述方法实施例的步骤;而前述的存储介质包括:移动存储设备、ROM、RAM、磁碟或者光盘等各种可以存储程序代码的介质。
或者,本公开上述集成的单元如果以软件功能模块的形式实现并作为独立的产品销售或使用时,也可以存储在一个计算机可读取存储介质中。基于这样的理解,本公开实施例的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机、服务器、或者网络设备等)执行本公开各个实施例所述方法的全部或部分。而前述的存储介质包括:移动存储设备、ROM、RAM、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本公开的具体实施方式,但本公开的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本公开揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本公开的保护范围之内。因此,本公开的保护范围应以所述权利要求的保护范围为准。

Claims (21)

  1. 一种图像处理方法,所述方法包括:
    提取第一图像的特征数据;
    基于所述特征数据确定所述第一图像中的各个交互关键点以及每个目标的中心点;一个交互关键点为连线上距离所述连线的中点预设范围内的一个点,所述连线为一个交互动作中的两个目标的中心点之间的连线;
    基于所述特征数据确定至少两个偏移量;一个偏移量表征一个交互动作中的交互关键点与该交互动作中的一个目标的中心点的偏移量;
    基于各个目标的中心点、所述交互关键点以及所述至少两个偏移量确定所述第一图像中的目标之间的交互关系。
  2. 根据权利要求1所述的方法,其中,所述基于所述特征数据确定所述第一图像中的各个交互关键点以及每个目标的中心点,包括:
    基于所述特征数据确定所述第一图像中的每个目标的中心点,以及每个目标的置信度;
    基于所述特征数据确定所述第一图像中的交互关键点,以及每个交互关键点对应的各个预设交互动作类别的置信度;
    所述基于各个目标的中心点、所述交互关键点以及所述至少两个偏移量确定所述第一图像中的目标之间的交互关系,包括:
    基于各个目标的中心点、所述交互关键点、所述至少两个偏移量、每个目标的置信度以及每个交互关键点对应各个预设交互动作类别的置信度,确定所述第一图像中的目标之间的交互关系。
  3. 根据权利要求2所述的方法,其中,所述基于所述特征数据确定所述第一图像中的每个目标的中心点以及每个目标的置信度,包括:
    基于所述特征数据确定所述第一图像中的每个目标的中心点及其类别,以及每个目标属于各个类别的置信度;
    所述基于各个目标的中心点、所述交互关键点、所述至少两个偏移量、每个目标的置信度以及每个交互关键点对应各个预设交互动作类别的置信度,确定所述第一图像中的目标之间的交互关系,包括:
    基于各个目标的中心点及其类别、所述交互关键点、所述至少两个偏移量、每个目标属于各个类别的置信度以及每个交互关键点对应各个预设交互动作类别的置信度,确定所述第一图像中的目标之间的交互关系。
  4. 根据权利要求2或3所述的方法,其中,所述基于各个目标的中心点、所述交互关键点、所述至少两个偏移量、每个目标的置信度以及每个交互关键点对应各个预设交互动作类别的置信度,确定所述第一图像中的目标之间的交互关系,包括:
    针对一个交互关键点,确定与所述交互关键点相对应的两个偏移量;
    根据所述交互关键点以及与所述交互关键点相对应的两个偏移量,确定与该交互关键点对应的两个预测中心点;
    根据各目标的中心点以及与各个交互关键点对应的两个预测中心点,确定每个交互关键点对应的两个目标;
    根据每个交互关键点对应的两个目标、每个目标的置信度以及每个交互关键点对应各个预设交互动作类别的置信度,确定所述第一图像中的目标之间的交互关系。
  5. 根据权利要求4所述的方法,其中,所述根据每个交互关键点对应的两个目标、 每个目标的置信度以及每个交互关键点对应各个预设交互动作类别的置信度,确定所述第一图像中的目标之间的交互关系,包括:
    针对一个交互关键点,将所述交互关键点对应一个预设交互动作类别的置信度与所述交互关键点对应的两个目标的置信度相乘,得到第一置信度,所述第一置信度为所述交互关键点对应的两个目标之间的交互关系属于该预设交互动作类别的置信度;
    响应于所述第一置信度大于置信度阈值,确定所述交互关键点对应的两个目标之间的交互关系属于所述预设交互动作类别;
    响应于所述第一置信度不大于置信度阈值,确定所述交互关键点对应的两个目标之间的交互关系不属于所述预设交互动作类别。
  6. 根据权利要求5所述的方法,其中,所述方法还包括:
    在确定一个交互关键点对应的两个目标之间的交互关系不属于各个预设交互动作类别之后,确定所述交互关键点对应的两个目标之间不存在交互关系。
  7. 根据权利要求4至6任一项所述的方法,其中,所述根据各目标的中心点以及与各个交互关键点对应的两个预测中心点,确定每个交互关键点对应的两个目标,包括:
    针对一个预测中心点,确定各目标的中心点与所述预测中心点之间的距离;
    将中心点与所述该预测中心点之间的距离小于预设距离阈值的目标作为该预测中心点对应的交互关键点所对应的目标。
  8. 根据权利要求1至7任一项所述的方法,其中,基于所述特征数据确定所述第一图像中的每个目标的中心点,包括:
    将所述特征数据下采样得到所述第一图像的热力图;
    根据所述热力图确定所述第一图像中各点的位置偏移、所述第一图像中的每个目标的中心点以及每个目标的检测框的高度和宽度;
    在基于所述特征数据确定所述第一图像中的每个目标的中心点之后,所述方法还包括:
    根据所述第一图像中具有交互关系的目标的中心点的位置偏移对所述第一图像中具有交互关系的目标的中心点的位置进行修正,得到所述第一图像中具有交互关系的目标的中心点的修正后的位置;
    根据所述第一图像中具有交互关系的目标的中心点的修正后的位置及其检测框的高度和宽度,确定所述第一图像中具有交互关系的目标的检测框。
  9. 根据权利要求8所述的方法,其中,所述图像处理方法由神经网络执行,所述神经网络采用样本图像训练得到,所述样本图像中标注了存在交互关系的目标的检测框,所述样本图像中存在交互关系的目标的标注的中心点以及标注的交互关键点根据标注的检测框确定,标注的偏移量根据存在交互关系的目标的标注的中心点以及标注的交互关键点确定。
  10. 根据权利要求9所述的方法,其中,所述神经网络采用样本图像训练得到,包括:
    利用所述神经网络提取所述样本图像的特征数据;
    利用所述神经网络对所述样本图像的特征数据下采样得到所述样本图像的热力图;
    利用所述神经网络基于所述样本图像的热力图预测所述样本图像中各点的位置偏移、所述样本图像中的各个交互关键点、所述样本图像中的每个目标的中心点、所述样本图像中的每个目标的检测框的高度和宽度;
    利用所述神经网络基于所述样本图像的特征数据预测至少两个偏移量;
    基于所述样本图像中的各个目标的中心点、所述样本图像中的所述交互关键点以及所述样本图像中的至少两个偏移量预测所述样本图像中的目标之间的交互关系;
    根据预测的位置偏移、所述样本图像中存在交互关系的目标的预测的中心点及预测的检测框的高度和宽度、所述样本图像中存在交互关系的目标对应的预测的交互关键点及其对应的预测的偏移量,以及标注的位置偏移以及所述样本图像中标注的存在交互关系的目标的检测框,调整所述神经网络的网络参数值。
  11. 一种图像处理装置,所述装置包括:提取单元、第一确定单元、第二确定单元和第三确定单元;其中,
    所述提取单元,配置为提取第一图像的特征数据;
    所述第一确定单元,配置为基于所述提取单元提取的所述特征数据确定所述第一图像中的各个交互关键点以及每个目标的中心点;一个交互关键点为连线上距离所述连线的中点预设范围内的一个点,所述连线为一个交互动作中的两个目标的中心点之间的连线;
    所述第二确定单元,配置为基于所述提取单元提取的所述特征数据确定至少两个偏移量;一个偏移量表征一个交互动作中的交互关键点与该交互动作中的一个目标的中心点的偏移量;
    所述第三确定单元,配置为基于各个目标的中心点、所述交互关键点以及所述至少两个偏移量确定所述第一图像中的目标之间的交互关系。
  12. 根据权利要求11所述的装置,其中,所述第一确定单元,配置为基于所述特征数据确定所述第一图像中的每个目标的中心点,以及每个目标的置信度;基于所述特征数据确定所述第一图像中的交互关键点,以及每个交互关键点对应各个交互动作类别的置信度;
    所述第三确定单元,配置为基于各个目标的中心点、所述交互关键点、所述至少两个偏移量、每个目标的置信度以及每个交互关键点对应各个预设交互动作类别的置信度,确定所述第一图像中的目标之间的交互关系。
  13. 根据权利要求12所述的装置,其中,所述第一确定单元,配置为基于所述特征数据确定所述第一图像中的每个目标的中心点及其类别,以及每个目标属于各个预设类别的置信度;
    所述第三确定单元,配置为基于各个目标的中心点及其类别、所述交互关键点、所述至少两个偏移量、每个目标属于各个预设类别的置信度以及每个交互关键点对应各个预设交互动作类别的置信度,确定所述第一图像中的目标之间的交互关系。
  14. 根据权利要求12或13所述的装置,其中,所述第三确定单元,配置为针对一个交互关键点,确定与所述交互关键点相对应的两个偏移量;根据所述交互关键点以及与所述交互关键点相对应的两个偏移量,确定与该交互关键点对应的两个预测中心点;根据各目标的中心点以及与各个交互关键点对应的两个预测中心点,确定每个交互关键点对应的两个目标;根据每个交互关键点对应的两个目标、每个目标的置信度以及每个交互关键点对应各个预设交互动作类别的置信度,确定所述第一图像中的目标之间的交互关系。
  15. 根据权利要求14所述的装置,其中,所述第三确定单元,配置为针对一个交互关键点,将所述交互关键点对应一个预设交互动作类别的置信度与所述交互关键点对应的两个目标的置信度相乘,得到第一置信度,所述第一置信度为所述交互关键点对应的两个目标之间的交互关系属于该交互动作类别的置信度;响应于所述第一置信度大于置信度阈值,确定所述交互关键点对应的两个目标之间的交互关系属于所述预设交互动作类别;响应于所述第一置信度不大于置信度阈值,确定所述交互关键点对应的两个目标之间的交互关系不属于所述预设交互动作类别。
  16. 根据权利要求15所述的装置,其中,所述第三确定单元,还配置为在确定一 个交互关键点对应的两个目标之间的交互关系不属于各个预设交互动作类别之后,确定所述交互关键点对应的两个目标之间不存在交互关系。
  17. 根据权利要求14至16任一项所述的装置,其中,所述第三确定单元,配置为针对一个预测中心点,确定各目标的中心点与所述预测中心点之间的距离;将中心点与所述该预测中心点之间的距离小于预设距离阈值的目标作为该预测中心点对应的交互关键点所对应的目标。
  18. 根据权利要求11至17任一项所述的装置,其中,所述第一确定单元,配置为将所述特征数据下采样得到所述第一图像的热力图;根据所述热力图确定所述第一图像中各点的位置偏移、所述第一图像中的每个目标的中心点以及每个目标的检测框的高度和宽度;还配置为在基于所述特征数据确定所述第一图像中的每个目标的中心点之后,根据所述第一图像中具有交互关系的目标的中心点的位置偏移对所述第一图像中具有交互关系的目标的中心点的位置进行修正,得到所述第一图像中具有交互关系的目标的中心点的修正后的位置;根据所述第一图像中具有交互关系的目标的中心点的修正后的位置及其检测框的高度和宽度,确定所述第一图像中具有交互关系的目标的检测框。
  19. 一种计算机可读存储介质,其上存储有计算机程序,该程序被处理器执行时实现权利要求1至10任一项所述方法的步骤。
  20. 一种电子设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,所述处理器执行所述程序时实现权利要求1至10任一项所述方法的步骤。
  21. 一种计算机程序,包括计算机可读代码,当所述计算机可读代码在电子设备中运行时,所述电子设备中的处理器执行用于实现权利要求1至10中任意一项所述的方法。
PCT/CN2020/116889 2019-12-30 2020-09-22 图像处理方法、装置、存储介质和电子设备 WO2021135424A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
KR1020217034504A KR102432204B1 (ko) 2019-12-30 2020-09-22 이미지 처리 방법, 장치, 저장 매체 및 전자 기기
JP2021557461A JP7105383B2 (ja) 2019-12-30 2020-09-22 画像処理方法、装置、記憶媒体及び電子機器

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911404450.6 2019-12-30
CN201911404450.6A CN111104925B (zh) 2019-12-30 2019-12-30 图像处理方法、装置、存储介质和电子设备

Publications (1)

Publication Number Publication Date
WO2021135424A1 true WO2021135424A1 (zh) 2021-07-08

Family

ID=70424673

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/116889 WO2021135424A1 (zh) 2019-12-30 2020-09-22 图像处理方法、装置、存储介质和电子设备

Country Status (4)

Country Link
JP (1) JP7105383B2 (zh)
KR (1) KR102432204B1 (zh)
CN (1) CN111104925B (zh)
WO (1) WO2021135424A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116258722A (zh) * 2023-05-16 2023-06-13 青岛奥维特智能科技有限公司 基于图像处理的桥梁建筑智能检测方法
CN116862980A (zh) * 2023-06-12 2023-10-10 上海玉贲智能科技有限公司 图像边缘的目标检测框位置优化校正方法、系统、介质及终端
CN117523645A (zh) * 2024-01-08 2024-02-06 深圳市宗匠科技有限公司 一种人脸关键点检测方法、装置、电子设备及存储介质

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111104925B (zh) * 2019-12-30 2022-03-11 上海商汤临港智能科技有限公司 图像处理方法、装置、存储介质和电子设备
CN111695519B (zh) 2020-06-12 2023-08-08 北京百度网讯科技有限公司 关键点定位方法、装置、设备以及存储介质
CN112560726B (zh) * 2020-12-22 2023-08-29 阿波罗智联(北京)科技有限公司 目标检测置信度确定方法、路侧设备及云控平台

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10089556B1 (en) * 2017-06-12 2018-10-02 Konica Minolta Laboratory U.S.A., Inc. Self-attention deep neural network for action recognition in surveillance videos
CN109241835A (zh) * 2018-07-27 2019-01-18 上海商汤智能科技有限公司 图像处理方法及装置、电子设备和存储介质
CN109255296A (zh) * 2018-08-06 2019-01-22 广东工业大学 一种基于深度卷积神经网络的日常人体行为识别方法
CN109685041A (zh) * 2019-01-23 2019-04-26 北京市商汤科技开发有限公司 图像分析方法及装置、电子设备和存储介质
CN109726808A (zh) * 2017-10-27 2019-05-07 腾讯科技(深圳)有限公司 神经网络训练方法和装置、存储介质及电子装置
CN111104925A (zh) * 2019-12-30 2020-05-05 上海商汤临港智能科技有限公司 图像处理方法、装置、存储介质和电子设备

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9870523B2 (en) * 2016-01-26 2018-01-16 Kabushiki Kaisha Toshiba Image forming system and image forming apparatus
JP6853528B2 (ja) * 2016-10-25 2021-03-31 東芝デジタルソリューションズ株式会社 映像処理プログラム、映像処理方法、及び映像処理装置
JP2019057836A (ja) * 2017-09-21 2019-04-11 キヤノン株式会社 映像処理装置、映像処理方法、コンピュータプログラム、及び記憶媒体
CN108268863B (zh) * 2018-02-13 2020-12-01 北京市商汤科技开发有限公司 一种图像处理方法、装置和计算机存储介质
JP2019148865A (ja) * 2018-02-26 2019-09-05 パナソニックIpマネジメント株式会社 識別装置、識別方法、識別プログラムおよび識別プログラムを記録した一時的でない有形の記録媒体
JP2019179459A (ja) * 2018-03-30 2019-10-17 株式会社Preferred Networks 推定処理装置、推定モデル生成装置、推定モデル、推定方法およびプログラム
CN110532838A (zh) * 2018-05-25 2019-12-03 佳能株式会社 对象检测装置和方法及存储介质
WO2019235350A1 (ja) * 2018-06-06 2019-12-12 日本電気株式会社 情報処理システム、情報処理方法及び記憶媒体
KR101969050B1 (ko) * 2019-01-16 2019-04-15 주식회사 컨티넘 자세 추정
CN110232706B (zh) * 2019-06-12 2022-07-29 睿魔智能科技(深圳)有限公司 多人跟拍方法、装置、设备及存储介质
CN110348335B (zh) * 2019-06-25 2022-07-12 平安科技(深圳)有限公司 行为识别的方法、装置、终端设备及存储介质

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10089556B1 (en) * 2017-06-12 2018-10-02 Konica Minolta Laboratory U.S.A., Inc. Self-attention deep neural network for action recognition in surveillance videos
CN109726808A (zh) * 2017-10-27 2019-05-07 腾讯科技(深圳)有限公司 神经网络训练方法和装置、存储介质及电子装置
CN109241835A (zh) * 2018-07-27 2019-01-18 上海商汤智能科技有限公司 图像处理方法及装置、电子设备和存储介质
CN109255296A (zh) * 2018-08-06 2019-01-22 广东工业大学 一种基于深度卷积神经网络的日常人体行为识别方法
CN109685041A (zh) * 2019-01-23 2019-04-26 北京市商汤科技开发有限公司 图像分析方法及装置、电子设备和存储介质
CN111104925A (zh) * 2019-12-30 2020-05-05 上海商汤临港智能科技有限公司 图像处理方法、装置、存储介质和电子设备

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116258722A (zh) * 2023-05-16 2023-06-13 青岛奥维特智能科技有限公司 基于图像处理的桥梁建筑智能检测方法
CN116258722B (zh) * 2023-05-16 2023-08-11 青岛奥维特智能科技有限公司 基于图像处理的桥梁建筑智能检测方法
CN116862980A (zh) * 2023-06-12 2023-10-10 上海玉贲智能科技有限公司 图像边缘的目标检测框位置优化校正方法、系统、介质及终端
CN116862980B (zh) * 2023-06-12 2024-01-23 上海玉贲智能科技有限公司 图像边缘的目标检测框位置优化校正方法、系统、介质及终端
CN117523645A (zh) * 2024-01-08 2024-02-06 深圳市宗匠科技有限公司 一种人脸关键点检测方法、装置、电子设备及存储介质
CN117523645B (zh) * 2024-01-08 2024-03-22 深圳市宗匠科技有限公司 一种人脸关键点检测方法、装置、电子设备及存储介质

Also Published As

Publication number Publication date
JP2022520498A (ja) 2022-03-30
CN111104925B (zh) 2022-03-11
KR102432204B1 (ko) 2022-08-12
CN111104925A (zh) 2020-05-05
JP7105383B2 (ja) 2022-07-22
KR20210136138A (ko) 2021-11-16

Similar Documents

Publication Publication Date Title
WO2021135424A1 (zh) 图像处理方法、装置、存储介质和电子设备
US11625953B2 (en) Action recognition using implicit pose representations
CN106934376B (zh) 一种图像识别方法、装置及移动终端
US11468682B2 (en) Target object identification
CN108416250A (zh) 人数统计方法及装置
US10565713B2 (en) Image processing apparatus and method
WO2021164395A1 (zh) 图像处理方法、装置、电子设备及计算机程序产品
CN112926410B (zh) 目标跟踪方法、装置、存储介质及智能视频系统
CN106326853A (zh) 一种人脸跟踪方法及装置
CN107563299B (zh) 一种利用ReCNN融合上下文信息的行人检测方法
CN107909016A (zh) 一种卷积神经网络生成方法及车系识别方法
CN112560710B (zh) 一种用于构建指静脉识别系统的方法及指静脉识别系统
CN113343985B (zh) 车牌识别方法和装置
WO2021217937A1 (zh) 姿态识别模型的训练方法及设备、姿态识别方法及其设备
CN111461145A (zh) 一种基于卷积神经网络进行目标检测的方法
CN108053447A (zh) 基于图像的重定位方法、服务器及存储介质
CN113822163A (zh) 一种复杂场景下的行人目标跟踪方法及装置
CN116453067A (zh) 基于动态视觉识别的短跑计时方法
CN113033524A (zh) 遮挡预测模型训练方法、装置、电子设备及存储介质
CN111401335B (zh) 一种关键点检测方法及装置、存储介质
CN116977783A (zh) 一种目标检测模型的训练方法、装置、设备及介质
Li et al. Detection of partially occluded pedestrians by an enhanced cascade detector
CN114463835A (zh) 行为识别方法、电子设备及计算机可读存储介质
CN113536859A (zh) 行为识别模型训练方法、识别方法、装置及存储介质
CN114677611A (zh) 数据识别方法、存储介质及设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20908572

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021557461

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20217034504

Country of ref document: KR

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20908572

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 25.01.2023)
