CN111523403B - Method and device for acquiring target area in picture and computer readable storage medium

Method and device for acquiring target area in picture and computer readable storage medium

Info

Publication number
CN111523403B
CN111523403B
Authority
CN
China
Prior art keywords
module
detection
picture
sub
target area
Prior art date
Legal status
Active
Application number
CN202010258207.4A
Other languages
Chinese (zh)
Other versions
CN111523403A (en)
Inventor
徐嵚嵛
Current Assignee
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, MIGU Culture Technology Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN202010258207.4A
Publication of CN111523403A
Application granted
Publication of CN111523403B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)

Abstract

The application relates to the field of image processing, and discloses a method and a device for acquiring a target area in a picture, and a computer readable storage medium. The method for acquiring the target area in the picture comprises the following steps: inputting the picture to be detected into a detection model, and acquiring a target detection point in the picture to be detected, wherein the detection model comprises a backbone module and a multi-stage detection module connected with the backbone module, the backbone module is used for acquiring feature maps of the picture to be detected in each preset dimension, and the multi-stage detection module is used for performing position detection and category detection on the feature maps and determining the target detection point according to the detection results of the position detection and the category detection; and acquiring the target area from the picture to be detected according to the target detection point. Compared with the prior art, the method, the device and the computer readable storage medium provided by the embodiments of the application can accurately detect the target area where the target to be identified is located, which facilitates accurate cropping of the picture.

Description

Method and device for acquiring target area in picture and computer readable storage medium
Technical Field
The present application relates to the field of image processing, and in particular, to a method and apparatus for acquiring a target area in a picture, and a computer readable storage medium.
Background
With the development of modern information technology toward intelligence and user-friendliness, various human-computer interaction, virtual reality and intelligent monitoring systems have emerged one after another. Computer-vision-based techniques such as human body posture estimation, action recognition and behavior understanding play an important role in these systems.
However, the inventor of the present application found that the prior art generally relies on a deep learning model for action recognition, and that such a model performs poorly when the figure performing the action occupies only a small part of the picture. For example, when action recognition is performed on a long-shot picture of a football match, the figure of each player is small; in this case the prior art generally scales the picture, so part of the action details are lost and the action recognition effect is poor. To improve the action recognition effect and reduce the loss of action details, the prior art also crops a larger picture into a number of small pictures and performs action recognition on the small pictures. However, since the prior art cannot accurately crop the region where the target to be identified is located, action recognition must be performed on every small picture separately, which multiplies the amount of computation.
Disclosure of Invention
The embodiments of the present application aim to provide a method and a device for acquiring a target area in a picture, and a computer readable storage medium, which can accurately detect the target area where the target to be identified is located and thereby allow the picture to be cropped accurately.
In order to solve the above technical problems, an embodiment of the present application provides a method for acquiring a target area in a picture, where the method includes: inputting the picture to be detected into a detection model, and acquiring a target detection point in the picture to be detected; the detection model comprises a backbone module and a multi-stage detection module connected with the backbone module; the backbone module is used for acquiring feature maps of the picture to be detected in each preset dimension; the multi-stage detection module is used for performing position detection and category detection on the feature maps, and determining the target detection point according to the detection result of the position detection and the detection result of the category detection; and acquiring a target area from the picture to be detected according to the target detection point.
The embodiment of the application also provides a target detection device, which comprises: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method for acquiring a target area in a picture as described above.
The embodiment of the application also provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the method for acquiring the target area in the picture described above.
Compared with the prior art, the detection model built in the embodiments of the application comprises a backbone module and a multi-stage detection module. The backbone module acquires feature maps of the picture to be detected in each preset dimension, and the feature maps of the plurality of preset dimensions are respectively input into the multi-stage detection module. The multi-stage detection module performs position detection and category detection on the input feature maps, and then combines the obtained detection results of the position detection and the category detection to determine the position corresponding to the target detection point. After the picture to be detected is input into the trained detection model, the detection model outputs the target detection point in the picture to be detected, and once the position of the target detection point is determined, the range of the target area in the picture to be detected can be determined from it. Through accurate identification of the target area, the picture to be detected can be cropped accurately, and action recognition can then be performed on the cropped picture, reducing the loss of action details while reducing the amount of computation required for action recognition.
In addition, the detection result of the position detection comprises a central point coordinate parameter of the target area and a size parameter of the target area; and constructing a loss function of the detection model according to the central point coordinate parameter and the size parameter, wherein the weight of the central point coordinate parameter in the loss function is larger than the weight of the size parameter, and the weight of the central point coordinate parameter and the weight of the size parameter are preset constants. Because the validity of the central point coordinate parameter of the target detection point is greater than that of the size parameter, the weight of the central point coordinate parameter in the loss function is set to be greater than that of the size parameter, and the accuracy of the target detection point output by the detection model can be effectively improved.
In addition, the backbone module comprises a first sub-module, a second sub-module, a third sub-module and a fourth sub-module which are sequentially connected; each of the four sub-modules comprises a plurality of convolution modules with the same convolution kernel, and the number of convolution kernels of the first, second, third and fourth sub-modules increases in sequence; the four sub-modules are respectively used for outputting the feature maps of their corresponding preset dimensions.
In addition, each of the convolution modules with the same convolution kernel comprises a feature mapping sub-module, a convolution computing sub-module, a batch normalization computing sub-module, a linear rectification computing sub-module and a second feature mapping sub-module which are connected in sequence.
In addition, the detection model further comprises a feature pyramid network connecting the fourth sub-module and the multi-stage detection module; the feature pyramid network is used for combining the output results of the fourth sub-module and inputting the combined output results into the multi-stage detection module.
In addition, the determining the target detection point according to the detection result of the position detection and the detection result of the category detection specifically includes: and combining the detection result of the position detection and the detection result of the category detection according to a preset dimension to obtain the target detection point.
In addition, the obtaining the target area from the picture to be detected according to the target detection point specifically includes: and cutting the picture to be detected by taking the target detection point as a center point and the preset size as a side length to obtain the target area.
In addition, before inputting the picture to be detected into the detection model, the method further comprises the following steps: acquiring a plurality of football match pictures; inputting the plurality of football match pictures into a far/middle/near-view judgment model to obtain close-range football match pictures, middle-range football match pictures and long-range football match pictures; and taking the middle-range football match pictures and the long-range football match pictures as the pictures to be detected. Inputting the middle-range and long-range football match pictures into the detection model can effectively improve the accuracy of the detection results for football actions in them. In addition, action recognition is performed directly on the close-range football match pictures without cropping, which effectively reduces the amount of computation.
Drawings
Fig. 1 is a program flow chart of a method for acquiring a target area in a picture according to a first embodiment of the present application;
fig. 2 is a schematic structural diagram of a detection model in the method for acquiring a target area in a picture according to the first embodiment of the present application;
fig. 3 is a schematic structural diagram of a convolution module in the method for obtaining a target area in a picture according to the first embodiment of the present application;
fig. 4 is a schematic structural diagram of a pyramid module in the method for obtaining a target area in a picture according to the first embodiment of the present application;
fig. 5 is a flowchart of a method for acquiring a target area in a picture according to a second embodiment of the present application;
fig. 6 is a schematic structural diagram of an object detection device according to a third embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application clearer, embodiments of the present application will be described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will understand that numerous technical details are set forth in the embodiments in order to provide a better understanding of the present application; however, the claimed application may be practiced without these specific details and with various changes and modifications based on the following embodiments.
The first embodiment of the application relates to a method for acquiring a target area in a picture. The specific flow is shown in fig. 1, and comprises the following steps:
step S101: and inputting the picture to be detected into the detection model.
Specifically, in the present embodiment, as shown in fig. 2, the detection model includes a backbone module 201 and a multi-stage detection module 202 connected to the backbone module 201. The backbone module is used for acquiring feature maps of the picture to be detected in each preset dimension, and the multi-stage detection module is used for performing position detection and category detection on the feature maps and determining the target detection point in the picture to be detected by combining the detection results of the position detection and the category detection.
Further, in the present embodiment, the backbone module 201 includes a first sub-module 2011, a second sub-module 2012, a third sub-module 2013 and a fourth sub-module 2014 that are sequentially connected. Each of the four sub-modules comprises a plurality of convolution modules 203 with identical convolution kernels, and the number of convolution kernels in the first sub-module 2011, the second sub-module 2012, the third sub-module 2013 and the fourth sub-module 2014 increases in sequence.
In this embodiment, the convolution kernel of the convolution module 203 is a 3x3 kernel. It should be understood that this is only a specific example; in other embodiments of the present application, the convolution kernel of the convolution module 203 may take other sizes, such as 4x4 or 5x5, which are not listed here and may be set flexibly according to actual needs.
The operation of the backbone module in this embodiment is illustrated below; it will be understood that the number of convolution kernels and the sizes of the output feature maps described here are only one specific illustration and are not limiting. In this embodiment, a picture to be detected with a size of 1280x720 is input into the backbone module. The convolution modules 203 of the first sub-module 2011 have 128 convolution kernels and output a feature map of size 1280x720; the convolution modules 203 of the second sub-module 2012 have 256 convolution kernels and output a feature map of size 640x360; the convolution modules 203 of the third sub-module 2013 have 512 convolution kernels and output a feature map of size 320x180; and the convolution modules 203 of the fourth sub-module 2014 have 1024 convolution kernels and output a feature map of size 160x90.
Preferably, in this embodiment, as shown in fig. 3, the convolution module 203 includes a feature mapping sub-module 2031, a convolution calculation sub-module 2032, a batch normalization calculation sub-module 2033, a linear rectification calculation sub-module 2034 and a feature mapping sub-module 2035, which are connected in sequence.
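For illustration only, the following is a minimal PyTorch sketch of the convolution module and the four-sub-module backbone described above. The text does not specify how the feature mapping sub-modules, the downsampling between sub-modules, or the number of convolution modules per sub-module are realized; the sketch assumes 1x1 convolutions for feature mapping, stride-2 convolutions for downsampling and two convolution modules per sub-module, and all names are illustrative rather than the applicant's implementation.

```python
import torch
import torch.nn as nn

class ConvModule(nn.Module):
    """Feature mapping -> convolution -> batch norm -> ReLU -> feature mapping."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.map_in = nn.Conv2d(channels, channels, 1)    # assumed 1x1 mapping
        self.conv = nn.Conv2d(channels, channels, kernel_size,
                              padding=kernel_size // 2)   # stride 1, size kept
        self.bn = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)
        self.map_out = nn.Conv2d(channels, channels, 1)   # assumed 1x1 mapping

    def forward(self, x):
        return self.map_out(self.relu(self.bn(self.conv(self.map_in(x)))))

def sub_module(in_ch, out_ch, num_blocks=2, downsample=True):
    # a stride-2 convolution is assumed for the halving of resolution between stages
    stride = 2 if downsample else 1
    layers = [nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1)]
    layers += [ConvModule(out_ch) for _ in range(num_blocks)]
    return nn.Sequential(*layers)

class Backbone(nn.Module):
    """Four sub-modules with 128/256/512/1024 kernels, one feature map per scale."""
    def __init__(self):
        super().__init__()
        self.s1 = sub_module(3, 128, downsample=False)  # 1280x720 in, 1280x720 out
        self.s2 = sub_module(128, 256)                  # -> 640x360
        self.s3 = sub_module(256, 512)                  # -> 320x180
        self.s4 = sub_module(512, 1024)                 # -> 160x90

    def forward(self, x):
        f1 = self.s1(x)
        f2 = self.s2(f1)
        f3 = self.s3(f2)
        f4 = self.s4(f3)
        return f1, f2, f3, f4
```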
Note that, in fig. 2, the horizontally extending arrows indicate the same stride-1 3×3 convolution calculation as in the convolution module 203; the top-down arrows indicate the downsampling process, and the bottom-up arrows indicate the upsampling process. The upsampling and downsampling processes are also convolution calculations, but with strides different from that of the convolution module 203; for example, they may be convolution calculations with a stride of 2 or a stride of 4.
Further, in the present embodiment, as shown in fig. 2, the multi-level detection module 202 includes a position detection sub-module 2021 and a category detection sub-module 2022. The position detection sub-module 2021 is configured to perform position detection on an input feature map, and the category detection sub-module 2022 is configured to perform category detection on the input feature map.
In this embodiment, the detection model further includes a feature pyramid module 204, which is connected to the fourth sub-module 2014, the position detection sub-module 2021 and the category detection sub-module 2022 respectively. The specific structure of the feature pyramid module 204 is shown in fig. 4; it is used for combining the feature information output by the fourth sub-module 2014 to form a feature map, and transmitting the feature map to the position detection sub-module 2021 and the category detection sub-module 2022.
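The structure of fig. 4 is not reproduced in this text, so the sketch below shows a standard top-down feature pyramid (lateral 1x1 convolutions, nearest-neighbour upsampling and element-wise addition) as one plausible realization. Taking all four backbone scales as inputs and the 256 output channels are assumptions based on the common feature pyramid design, not the applicant's structure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeaturePyramid(nn.Module):
    def __init__(self, in_channels=(128, 256, 512, 1024), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out_channels, 1) for c in in_channels)
        self.smooth = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, 3, padding=1)
            for _ in in_channels)

    def forward(self, feats):
        # feats: backbone outputs ordered from highest to lowest resolution
        laterals = [lat(f) for lat, f in zip(self.lateral, feats)]
        # top-down pass: upsample each coarser map and add it to the finer one
        for i in range(len(laterals) - 1, 0, -1):
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], size=laterals[i - 1].shape[-2:], mode="nearest")
        # the merged maps would be fed to the two detection sub-modules
        return [sm(lat) for sm, lat in zip(self.smooth, laterals)]
```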
Specifically, in the present embodiment, the multi-stage detection module 202 further includes a combination module connected to the position detection sub-module 2021 and the category detection sub-module 2022. The combination module is configured to combine the detection results of the position detection sub-module 2021 and the category detection sub-module 2022 so as to determine the location of the target detection point. In the present embodiment, the combination module performs a combination calculation of a preset dimension on the two detection results and takes the combined result as the target detection point. For example, the dimension of the detection result of the position detection sub-module 2021 before combination is 5x5x24; for position detection, the four-dimensional position coordinates are unified, so 5x5x24 is reshaped to 150x4. Similarly, the dimension of the detection result of the category detection sub-module 2022 before combination is 5x5x18; for category detection, the three class scores are unified, so 5x5x18 is reshaped to 150x3. Combination is then performed along the common dimension of 150. It should be understood that the foregoing is merely a specific example of the present embodiment and is not limiting; other combinations may be adopted in other embodiments of the present application, which may be set flexibly according to actual needs.
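A short sketch of this combination step, using the example dimensions from the text: the position output (5x5x24) is reshaped to 150x4 coordinates, the category output (5x5x18) to 150x3 class scores, and the two are joined along the common first dimension of 150 candidate points.

```python
import torch

loc_out = torch.randn(5, 5, 24)        # position detection sub-module output
cls_out = torch.randn(5, 5, 18)        # category detection sub-module output

loc = loc_out.reshape(-1, 4)           # 5*5*24 = 600 values -> 150 x 4
cls = cls_out.reshape(-1, 3)           # 5*5*18 = 450 values -> 150 x 3

combined = torch.cat([loc, cls], dim=1)  # 150 x 7, one row per candidate point
print(combined.shape)                    # torch.Size([150, 7])
```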
Specifically, the detection model is a learning model that has completed training. The training process is as follows. First, training samples are obtained; in the present embodiment, football match pictures are taken as an example.
First, a plurality of football match pictures are acquired; for example, a plurality of frames may be extracted from a piece of football match video. Then, the target areas in the football match pictures are labeled with anchor frame labels; for example, the football and the player in possession of the ball may be labeled by manual annotation or other methods, forming a plurality of labeled football match pictures as training samples. Finally, the training samples are input into the detection model: feature maps of each preset dimension of the training samples are first acquired through the backbone module, then position detection is performed on the feature maps through the position detection sub-module, and category detection is performed through the category detection sub-module. A loss function is computed from the position detection results and the category detection results in combination with the labels of the training samples. Specifically, the position detection sub-module presets anchor frames on the feature map, and each coordinate point on the feature map generates a corresponding anchor frame. Each anchor frame is compared with the anchor frame label, and the loss function is calculated.
The loss function of the position detection result is calculated as:

$$L_{loc} = \frac{1}{n_{positives}} \sum_{i \in positives} \left( \alpha_1 \cdot \mathrm{SizeLoss}_i + \alpha_2 \cdot \mathrm{CenterLoss}_i \right)$$

wherein $L_{loc}$ is the loss of position detection, $n_{positives}$ is the number of positives, and positives are the anchor frames whose IoU with the anchor frame label is greater than 0.5. The input of SizeLoss is the last two dimensions of the four-dimensional coordinates after combination transformation in the detection module, namely the width and the height of the target area, and the input of CenterLoss is the first two dimensions of the four-dimensional coordinates after combination transformation in the detection module, namely the coordinates of the target detection point; both are computed with SmoothL1, the smooth L1 loss function:

$$\mathrm{SmoothL1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$

IoU is calculated as $IoU = \frac{|A \cap B|}{|A \cup B|}$, where A and B are the anchor frame and the anchor frame label respectively.
In the present embodiment, since the validity of the center point coordinate parameter of the target detection point is greater than the validity of the size parameter of the target area, different weights are assigned with α2 > α1; that is, the weight of the position offset is larger than the weight of the predicted size offset.
The loss function of the category detection result is calculated as:

$$L_{conf} = \frac{1}{n_{positives}} \left( \sum_{i \in positives} \mathrm{CELoss}_i + \sum_{j \in \mathrm{hard\ negatives}} \mathrm{CELoss}_j \right)$$

wherein $L_{conf}$ is the loss of the category detection result, CELoss is the cross entropy loss function, and the number of hard negatives is a fixed multiple of the number of positives. The hard negatives are the negative matches (IoU < 0.5) whose anchor frame predictions have the largest cross entropy losses.
The overall loss function is $L = L_{conf} + \beta \cdot L_{loc}$, where β is a preset constant.
After the loss function is obtained, the model is trained through a back propagation algorithm, continuously reducing the value of the loss function until it reaches a preset threshold.
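A minimal sketch of the loss computation as reconstructed above. It assumes SizeLoss and CenterLoss are smooth L1 losses over the (w, h) and (x, y) halves of the four-dimensional coordinates, and that hard negatives are the negative anchors (IoU < 0.5) with the largest cross entropy, kept at a fixed multiple (here 3) of the number of positives; the function names and default weights are illustrative, not the applicant's values.

```python
import torch
import torch.nn.functional as F

def detection_loss(pred_loc, pred_cls, gt_loc, gt_cls, pos_mask,
                   alpha1=0.5, alpha2=1.0, beta=1.0, neg_ratio=3):
    # pred_loc: (N, 4) as (x, y, w, h); pred_cls: (N, C); pos_mask: (N,) bool
    n_pos = pos_mask.sum().clamp(min=1)

    # position loss over positive anchors, with alpha2 > alpha1 as in the text
    center = F.smooth_l1_loss(pred_loc[pos_mask, :2], gt_loc[pos_mask, :2],
                              reduction="sum")
    size = F.smooth_l1_loss(pred_loc[pos_mask, 2:], gt_loc[pos_mask, 2:],
                            reduction="sum")
    l_loc = (alpha1 * size + alpha2 * center) / n_pos

    # category loss with hard negative mining
    ce = F.cross_entropy(pred_cls, gt_cls, reduction="none")
    neg_ce = ce[~pos_mask]
    n_neg = min(neg_ratio * int(n_pos), neg_ce.numel())
    hard_neg, _ = neg_ce.topk(n_neg)             # largest-loss negatives
    l_conf = (ce[pos_mask].sum() + hard_neg.sum()) / n_pos

    return l_conf + beta * l_loc
```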
Step S102: and acquiring a target area from the picture to be detected according to the target detection point.
Specifically, in this embodiment, after the target detection point is obtained, a preset side length can be set according to actual requirements, and the picture to be detected is cropped with the target detection point as the center point to obtain the target area.
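A sketch of this cropping step: a square of preset side length is cut out with the target detection point as the center, with the window shifted to stay inside the picture. The 320-pixel side length and the file name are illustrative, and the picture is assumed to be at least side x side pixels.

```python
from PIL import Image

def crop_target(img, cx, cy, side=320):
    half = side // 2
    # shift the window so the crop stays within the image bounds
    left = max(0, min(cx - half, img.width - side))
    top = max(0, min(cy - half, img.height - side))
    return img.crop((left, top, left + side, top + side))

picture = Image.open("match_frame.jpg")             # picture to be detected
target_area = crop_target(picture, cx=640, cy=360)  # point from the model
```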
Compared with the prior art, in the method for acquiring the target area in the picture provided by the first embodiment of the application, the detection model consists of two parts, a backbone module and a multi-stage detection module. After the backbone module acquires the feature maps of the picture to be detected in a plurality of preset dimensions, the feature maps are respectively input into the multi-stage detection module, which performs position detection and category detection on them and then combines the obtained detection results of the position detection and the category detection to determine the position corresponding to the target detection point. Training the detection model improves the accuracy of the target detection point it outputs. After the picture to be detected is input into the trained detection model, the detection model outputs the target detection point in the picture to be detected, and once the position of the target detection point is determined, the range of the target area in the picture to be detected can be determined from it. Through accurate identification of the target area, the picture to be detected can be cropped accurately, and action recognition can then be performed on the cropped picture, reducing the loss of action details while reducing the amount of computation required for action recognition.
The second embodiment of the application relates to a method for acquiring a target area in a picture. The second embodiment is substantially the same as the first embodiment; the main difference is that the second embodiment applies the cropping process to football match pictures. The specific steps are shown in fig. 5 and include:
step S201: a plurality of football match pictures are acquired.
Specifically, in this embodiment, a football match video may be obtained, and a plurality of picture frames extracted from the football match video to obtain a plurality of football match pictures. It should be understood that the foregoing is merely one specific way of acquiring a plurality of football match pictures in the present embodiment and is not limiting.
Step S202: inputting the plurality of football match pictures into a far/middle/near-view judgment model to obtain close-range football match pictures, middle-range football match pictures and long-range football match pictures.
Specifically, in this embodiment, the far/middle/near-view judgment model is a ResNet-34 network. It should be understood that ResNet-34 is merely one specific example of the judgment model in this embodiment; in other embodiments of the present application, other network structures may also be used, which are not listed here and may be set flexibly according to actual needs.
Step S203: cutting the close-range football match pictures according to a preset size to obtain a target area.
Step S204: inputting the middle-range football match pictures and the long-range football match pictures into the trained detection model, and obtaining the target detection points output by the detection model.
Step S205: acquiring the target area from the middle-range football match pictures and the long-range football match pictures according to the target detection points.
It should be understood that the steps S204 to S205 are substantially the same as the steps S101 to S102 in the first embodiment, and specific reference may be made to the specific description of the first embodiment, which is not repeated herein.
Compared with the prior art, the method for acquiring the target area in the picture provided by the second embodiment of the application retains the technical effects of the first embodiment while using the far/middle/near-view judgment model to separate close-range, middle-range and long-range football match pictures. For the close-range football match pictures, since the action details are already clear, the target area is obtained by cropping directly according to the preset size, or no cropping is needed at all, which effectively reduces the amount of computation.
The above division of the steps of the methods is for clarity of description only; when implemented, the steps may be combined into one step or split into multiple steps, and as long as the same logical relationship is contained, they are all within the protection scope of this patent. Adding insignificant modifications to the algorithm or flow, or introducing insignificant designs, without changing the core design of the algorithm and flow, is also within the protection scope of this patent.
A third embodiment of the present application relates to an object detection apparatus, as shown in fig. 6, including: at least one processor 601; and a memory 602 communicatively coupled to the at least one processor 601; the memory 602 stores instructions executable by the at least one processor 601, where the instructions are executed by the at least one processor 601, so that the at least one processor 601 can perform a method for acquiring a target area in a picture as described above.
Where the memory 602 and the processor 601 are connected by a bus, the bus may comprise any number of interconnected buses and bridges, the buses connecting the various circuits of the one or more processors 601 and the memory 602. The bus may also connect various other circuits such as peripherals, voltage regulators, and power management circuits, which are well known in the art, and therefore, will not be described any further herein. The bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or may be a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. The data processed by the processor 601 is transmitted over a wireless medium via an antenna, which further receives the data and transmits the data to the processor 601.
The processor 601 is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And memory 602 may be used to store data used by processor 601 in performing operations.
A fourth embodiment of the present application relates to a computer-readable storage medium storing a computer program. The computer program implements the above-described method embodiments when executed by a processor.
That is, it will be understood by those skilled in the art that all or part of the steps in the methods of the embodiments described above may be implemented by a program stored in a storage medium, where the program includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or part of the steps of the methods of the embodiments of the application. The aforementioned storage medium includes: a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples of carrying out the application and that various changes in form and details may be made therein without departing from the spirit and scope of the application.

Claims (7)

1. The method for acquiring the target area in the picture is characterized by comprising the following steps:
inputting the picture to be detected into a detection model, and obtaining a target detection point in the picture to be detected; the detection model comprises a backbone module and a multi-stage detection module connected with the backbone module; the backbone module is used for acquiring feature maps of the picture to be detected in each preset dimension; the multi-stage detection module is used for performing position detection and category detection on the feature maps, and determining the target detection point according to the detection result of the position detection and the detection result of the category detection;
acquiring a target area from the picture to be detected according to the target detection point;
the backbone module comprises a first sub-module, a second sub-module, a third sub-module and a fourth sub-module which are sequentially connected; the first sub-module, the second sub-module, the third sub-module and the fourth sub-module each comprise a plurality of convolution modules with the same convolution kernel, and the number of convolution kernels of the first sub-module, the second sub-module, the third sub-module and the fourth sub-module increases in sequence; the first sub-module, the second sub-module, the third sub-module and the fourth sub-module are respectively used for outputting the feature maps of the corresponding preset dimensions; each of the convolution modules with the same convolution kernel comprises a feature mapping sub-module, a convolution computing sub-module, a batch normalization computing sub-module, a linear rectification computing sub-module and a second feature mapping sub-module which are connected in sequence; the detection model further comprises a feature pyramid network connecting the fourth sub-module and the multi-stage detection module; and the feature pyramid network is used for combining output results output by the fourth sub-module and inputting the combined output results into the multi-stage detection module.
2. The method for obtaining a target area in a picture according to claim 1, wherein the detection result of the position detection includes a center point coordinate parameter of the target area and a size parameter of the target area;
and constructing a loss function of the detection model according to the central point coordinate parameter and the size parameter, wherein the weight of the central point coordinate parameter in the loss function is larger than the weight of the size parameter, and the weight of the central point coordinate parameter and the weight of the size parameter are preset constants.
3. The method for acquiring the target area in the picture according to claim 1, wherein the determining the target detection point according to the detection result of the position detection and the detection result of the category detection specifically includes:
and combining the detection result of the position detection and the detection result of the category detection according to a preset dimension to obtain the target detection point.
4. The method for obtaining a target area in a picture according to claim 1, wherein the obtaining the target area from the picture to be detected according to the target detection point specifically includes:
and cutting the picture to be detected by taking the target detection point as a center point and the preset size as a side length to obtain the target area.
5. The method for obtaining a target area in a picture according to claim 1, wherein before inputting the picture to be detected into the detection model, the method further comprises:
acquiring a plurality of football match pictures;
inputting the plurality of football match pictures into a far/middle/near-view judgment model to obtain close-range football match pictures, middle-range football match pictures and long-range football match pictures;
and taking the middle-range football match pictures and the long-range football match pictures as the pictures to be detected.
6. An object detection apparatus, comprising:
at least one processor; the method comprises the steps of,
a memory communicatively coupled to at least one of the processors; wherein,
the memory stores instructions executable by at least one of the processors to enable the at least one of the processors to perform the method of acquiring a target region in a picture according to any one of claims 1 to 5.
7. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method of acquiring a target area in a picture as claimed in any one of claims 1 to 5.
CN202010258207.4A 2020-04-03 2020-04-03 Method and device for acquiring target area in picture and computer readable storage medium Active CN111523403B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010258207.4A CN111523403B (en) 2020-04-03 2020-04-03 Method and device for acquiring target area in picture and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010258207.4A CN111523403B (en) 2020-04-03 2020-04-03 Method and device for acquiring target area in picture and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111523403A CN111523403A (en) 2020-08-11
CN111523403B (en) 2023-10-20

Family

ID=71901943

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010258207.4A Active CN111523403B (en) 2020-04-03 2020-04-03 Method and device for acquiring target area in picture and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111523403B (en)

Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102541494A (en) * 2010-12-30 2012-07-04 中国科学院声学研究所 Video size switching system and video size switching method facing display terminal
CN104091171A (en) * 2014-07-04 2014-10-08 华南理工大学 Vehicle-mounted far infrared pedestrian detection system and method based on local features
CN107392244A (en) * 2017-07-18 2017-11-24 厦门大学 The image aesthetic feeling Enhancement Method returned based on deep neural network with cascade
CN108108657A (en) * 2017-11-16 2018-06-01 浙江工业大学 A kind of amendment local sensitivity Hash vehicle retrieval method based on multitask deep learning
WO2019041360A1 (en) * 2017-09-04 2019-03-07 华为技术有限公司 Pedestrian attribute recognition and positioning method and convolutional neural network system
CN109492608A (en) * 2018-11-27 2019-03-19 腾讯科技(深圳)有限公司 Image partition method, device, computer equipment and storage medium
CN109886273A (en) * 2019-02-26 2019-06-14 四川大学华西医院 A kind of CMR classification of image segmentation system
CN109919097A (en) * 2019-03-08 2019-06-21 中国科学院自动化研究所 Face and key point combined detection system, method based on multi-task learning
CN110069993A (en) * 2019-03-19 2019-07-30 同济大学 A kind of target vehicle detection method based on deep learning
CN110309876A (en) * 2019-06-28 2019-10-08 腾讯科技(深圳)有限公司 Object detection method, device, computer readable storage medium and computer equipment
CN110363104A (en) * 2019-06-24 2019-10-22 中国科学技术大学 A kind of detection method of diesel oil black smoke vehicle
CN110414574A (en) * 2019-07-10 2019-11-05 厦门美图之家科技有限公司 A kind of object detection method calculates equipment and storage medium
CN110503112A (en) * 2019-08-27 2019-11-26 电子科技大学 A kind of small target detection of Enhanced feature study and recognition methods
CN110729045A (en) * 2019-10-12 2020-01-24 闽江学院 Tongue image segmentation method based on context-aware residual error network
WO2020024584A1 (en) * 2018-08-03 2020-02-06 华为技术有限公司 Method, device and apparatus for training object detection model
CN110781728A (en) * 2019-09-16 2020-02-11 北京嘀嘀无限科技发展有限公司 Face orientation estimation method and device, electronic equipment and storage medium
WO2020032354A1 (en) * 2018-08-06 2020-02-13 Samsung Electronics Co., Ltd. Method, storage medium and apparatus for converting 2d picture set to 3d model
CN110852177A (en) * 2019-10-17 2020-02-28 北京全路通信信号研究设计院集团有限公司 Obstacle detection method and system based on monocular camera

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108694401B (en) * 2018-05-09 2021-01-12 北京旷视科技有限公司 Target detection method, device and system
CN109272530B (en) * 2018-08-08 2020-07-21 北京航空航天大学 Target tracking method and device for space-based monitoring scene

Patent Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102541494A (en) * 2010-12-30 2012-07-04 中国科学院声学研究所 Video size switching system and video size switching method facing display terminal
CN104091171A (en) * 2014-07-04 2014-10-08 华南理工大学 Vehicle-mounted far infrared pedestrian detection system and method based on local features
CN107392244A (en) * 2017-07-18 2017-11-24 厦门大学 The image aesthetic feeling Enhancement Method returned based on deep neural network with cascade
WO2019041360A1 (en) * 2017-09-04 2019-03-07 华为技术有限公司 Pedestrian attribute recognition and positioning method and convolutional neural network system
CN108108657A (en) * 2017-11-16 2018-06-01 浙江工业大学 A kind of amendment local sensitivity Hash vehicle retrieval method based on multitask deep learning
WO2020024584A1 (en) * 2018-08-03 2020-02-06 华为技术有限公司 Method, device and apparatus for training object detection model
WO2020032354A1 (en) * 2018-08-06 2020-02-13 Samsung Electronics Co., Ltd. Method, storage medium and apparatus for converting 2d picture set to 3d model
CN109492608A (en) * 2018-11-27 2019-03-19 腾讯科技(深圳)有限公司 Image partition method, device, computer equipment and storage medium
CN109886273A (en) * 2019-02-26 2019-06-14 四川大学华西医院 A kind of CMR classification of image segmentation system
CN109919097A (en) * 2019-03-08 2019-06-21 中国科学院自动化研究所 Face and key point combined detection system, method based on multi-task learning
CN110069993A (en) * 2019-03-19 2019-07-30 同济大学 A kind of target vehicle detection method based on deep learning
CN110363104A (en) * 2019-06-24 2019-10-22 中国科学技术大学 A kind of detection method of diesel oil black smoke vehicle
CN110309876A (en) * 2019-06-28 2019-10-08 腾讯科技(深圳)有限公司 Object detection method, device, computer readable storage medium and computer equipment
CN110414574A (en) * 2019-07-10 2019-11-05 厦门美图之家科技有限公司 A kind of object detection method calculates equipment and storage medium
CN110503112A (en) * 2019-08-27 2019-11-26 电子科技大学 A kind of small target detection of Enhanced feature study and recognition methods
CN110781728A (en) * 2019-09-16 2020-02-11 北京嘀嘀无限科技发展有限公司 Face orientation estimation method and device, electronic equipment and storage medium
CN110729045A (en) * 2019-10-12 2020-01-24 闽江学院 Tongue image segmentation method based on context-aware residual error network
CN110852177A (en) * 2019-10-17 2020-02-28 北京全路通信信号研究设计院集团有限公司 Obstacle detection method and system based on monocular camera

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Adult Image and Video Recognition by a Deep Multicontext Network and Fine-to-Coarse Strategy; Ou X et al.; ACM Transactions on Intelligent Systems and Technology (TIST); Vol. 8, No. 5; 1-5 *
Deep fully-connected networks for video compressive sensing; Iliadis M et al.; Digital Signal Processing; 9-18 *
Research on Road Vehicle Detection Algorithm Based on Faster RCNN; Liu Dunqiang; China Master's Theses Full-text Database, Engineering Science and Technology II, No. 2, 2019; C034-519 *
Research on Object Detection Method Based on SSD and MobileNet Networks; Ren Yujie et al.; Journal of Frontiers of Computer Science and Technology; Vol. 13, No. 11; 1881-1893 *
Research Progress of Video Saliency Detection; Cong Runmin et al.; Journal of Software; Vol. 29, No. 8; 2527-2544 *

Also Published As

Publication number Publication date
CN111523403A (en) 2020-08-11

Similar Documents

Publication Publication Date Title
CN108229509B (en) Method and device for identifying object class and electronic equipment
CN110363817B (en) Target pose estimation method, electronic device, and medium
WO2022116423A1 (en) Object posture estimation method and apparatus, and electronic device and computer storage medium
CN114186632B (en) Method, device, equipment and storage medium for training key point detection model
CN109858476B (en) Tag expansion method and electronic equipment
CN108573471B (en) Image processing apparatus, image processing method, and recording medium
CN112509036B (en) Pose estimation network training and positioning method, device, equipment and storage medium
CN114519881A (en) Face pose estimation method and device, electronic equipment and storage medium
CN110910375A (en) Detection model training method, device, equipment and medium based on semi-supervised learning
CN114092963A (en) Key point detection and model training method, device, equipment and storage medium
CN114359665A (en) Training method and device of full-task face recognition model and face recognition method
CN110276801B (en) Object positioning method and device and storage medium
CN111177811A (en) Automatic fire point location layout method applied to cloud platform
CN115965961B (en) Local-global multi-mode fusion method, system, equipment and storage medium
CN111523403B (en) Method and device for acquiring target area in picture and computer readable storage medium
CN115131621A (en) Image quality evaluation method and device
CN110633630B (en) Behavior identification method and device and terminal equipment
CN116069801B (en) Traffic video structured data generation method, device and medium
WO2023109086A1 (en) Character recognition method, apparatus and device, and storage medium
CN113721240B (en) Target association method, device, electronic equipment and storage medium
CN113033578B (en) Image calibration method, system, terminal and medium based on multi-scale feature matching
CN113705643A (en) Target detection method and device and electronic equipment
CN113792660B (en) Pedestrian detection method, system, medium and equipment based on improved YOLOv3 network
CN113139579B (en) Image classification method and system based on image feature self-adaptive convolution network
CN116152345B (en) Real-time object 6D pose and distance estimation method for embedded system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant