CN112258572A - Target detection method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN112258572A
Authority
CN
China
Prior art keywords
position information
target
information
determining
loss value
Legal status
Pending
Application number
CN202011066445.1A
Other languages
Chinese (zh)
Inventor
何建业
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202011066445.1A
Publication of CN112258572A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]


Abstract

The embodiments of the present application disclose a target detection method and apparatus, an electronic device, and a storage medium. The method includes: acquiring a target image including a first object and a second object; inputting the target image into a target detection model to obtain position information of the first object and position information of the second object; and determining the relative position of the first object and the second object according to the two pieces of position information. When the relative position of the first object and the second object conforms to a preset positional relationship, the first object is determined to be the target object. In the target detection process, a second object of an auxiliary category is set for a plurality of first objects of the same category. After the plurality of first objects and the second object are detected by the target detection model, the first object having a certain characteristic can be further determined from among the plurality of first objects according to the relative position between the first object and the second object, so that target detection accuracy can be improved.

Description

Target detection method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a target detection method and apparatus, an electronic device, and a storage medium.
Background
Target detection is an important research direction in the field of computer vision and is widely applied in fields such as robot navigation, intelligent video surveillance, industrial inspection, and aerospace. With the development of deep learning, target detection technology based on deep learning has also advanced rapidly.
In the target detection process, a target detection model analyzes the features of an image, determines the objects in the image that belong to a preset category, and determines the position information of each object in the image. When a plurality of objects of the preset category exist in an image, the target detection model can find all of them, but it cannot single out a target object having a certain characteristic from among these objects of the same (preset) category, which reduces target detection accuracy.
Disclosure of Invention
The embodiments of the present application provide a target detection method and apparatus, an electronic device, and a storage medium, to solve the problem that an electronic device cannot determine a target object from among a plurality of objects of the same category.
In view of the above, a first aspect of the present application provides a target detection method, including:
acquiring a target image, wherein the target image includes a plurality of first objects of a preset category and at least one second object of an auxiliary category;
inputting the target image into a pre-trained target detection model to obtain the position information of the first object and the position information of the second object output by the target detection model;
and determining the first object of which the relative position with the second object conforms to a preset position relation as a target object according to the position information of the first object and the position information of the second object.
Optionally, the determining, according to the position information of the first object and the position information of the second object, that the first object whose relative position with the second object conforms to a preset position relationship is a target object includes:
determining a target area according to the position information of the second object;
and determining the first object positioned in the target area as the target object according to the position information of each first object.
Optionally, the target area includes an interior of the second object or an exterior of the second object.
Optionally, the determining the target area according to the position information of the second object includes:
and under the condition that a plurality of second objects form a preset shape according to the position information of each second object, determining that the preset position of the preset shape is the target area.
Optionally, the determining, according to the position information of the first object and the position information of the second object, that the first object whose relative position with the second object conforms to a preset position relationship is a target object includes:
determining a target area corresponding to each first object according to the position information of each first object;
and determining a target area where the second object is located according to the position information of the second object, and taking a first object corresponding to the target area where the second object is located as the target object.
Optionally, the determining, according to the position information of the first object and the position information of the second object, that the first object whose relative position with the second object conforms to a preset position relationship is a target object includes:
and determining that the first object overlapped with the second object is the target object according to the position information of the first object and the position information of the second object.
Optionally, before the acquiring the target image, the method further includes:
acquiring a plurality of sample images and marking information of each sample image; the sample image comprises the first object and the second object, and the annotation information comprises category information and position information of the first object and category information and position information of the second object;
inputting the sample image into a target detection model to be trained to obtain the predicted position information and the predicted category information of the first object and the predicted position information and the predicted category information of the second object, which are output by the target detection model to be trained;
calculating a first loss value according to the position information and the predicted position information of the first object, calculating a second loss value according to the category information and the predicted category information of the first object, calculating a third loss value according to the position information and the predicted position information of the second object, and calculating a fourth loss value according to the category information and the predicted category information of the second object;
and determining a target loss value according to the first loss value, the second loss value, the third loss value and the fourth loss value, and training the target detection model to be trained according to the target loss value to obtain the target detection model.
A second aspect of the embodiments of the present application provides a target detection apparatus, including:
a first acquisition module configured to acquire a target image including a plurality of first objects of a preset category and at least one second object of an auxiliary category;
the first detection module is configured to input the target image into a pre-trained target detection model, and obtain position information of the first object and position information of the second object output by the target detection model;
the determining module is configured to determine a first object of which the relative position with the second object conforms to a preset position relation as a target object according to the position information of the first object and the position information of the second object.
Optionally, the determining module includes:
a first determination unit configured to determine a target area from the position information of the second object;
a second determination unit configured to determine, as the target object, the first object located in the target area according to the position information of each of the first objects.
Optionally, the target area includes an interior of the second object or an exterior of the second object.
Optionally, the second objects are multiple, and the first determining unit is specifically configured to determine that the preset position of the preset shape is the target area when determining that the multiple second objects form the preset shape according to the position information of each second object.
Optionally, the determining module is specifically configured to determine, according to the position information of each first object, a target area corresponding to each first object; and determining a target area where the second object is located according to the position information of the second object, and taking a first object corresponding to the target area where the second object is located as the target object.
Optionally, the determining module is specifically configured to determine, according to the position information of the first object and the position information of the second object, that the first object overlapping with the second object is the target object.
Optionally, the apparatus further includes:
the second acquisition module is configured to acquire a plurality of sample images and the annotation information of each sample image; the sample image comprises the first object and the second object, and the annotation information comprises category information and position information of the first object and category information and position information of the second object;
the second detection module is configured to input the sample image into a target detection model to be trained, and obtain the predicted position information and the predicted category information of the first object and the predicted position information and the predicted category information of the second object, which are output by the target detection model to be trained;
a calculation module configured to calculate a first loss value from the position information and the predicted position information of the first object, calculate a second loss value from the category information and the predicted category information of the first object, and calculate a third loss value from the position information and the predicted position information of the second object, and calculate a fourth loss value from the category information and the predicted category information of the second object;
and the training module is configured to determine a target loss value according to the first loss value, the second loss value, the third loss value and the fourth loss value, and train the target detection model to be trained according to the target loss value to obtain the target detection model.
A third aspect of the embodiments of the present application provides an electronic device, including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the target detection method described in any one of the optional implementations of the first aspect of the present application.
A fourth aspect of the embodiments of the present application provides a non-transitory computer-readable storage medium, where instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the target detection method described in any one of the optional implementations of the first aspect of the present application.
A fifth aspect of the embodiments of the present application provides a computer program product including instructions which, when run on an electronic device, enable the electronic device to perform the target detection method described in any one of the optional implementations of the first aspect of the present application.
According to the above technical solutions, the embodiments of the present application have the following advantages:
in the embodiments of the present application, a target image including a first object and a second object is acquired; the target image is input into a pre-trained target detection model to obtain the position information of the first object and the position information of the second object output by the model; and the relative position of the first object and the second object is determined according to the two pieces of position information. When the relative position of the first object and the second object conforms to the preset positional relationship, the first object is determined to be the target object. In the target detection process, a second object of an auxiliary category is set for a plurality of first objects of the same category. After the plurality of first objects and the second object are detected by the target detection model, the first object having a certain characteristic can be further determined from among the plurality of first objects according to the relative position between the first object and the second object, so that target detection accuracy can be improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained from these drawings by those of ordinary skill in the art without creative effort.
FIG. 1 is a flowchart illustrating the steps of a target detection method according to an exemplary embodiment;
FIG. 2 is a schematic diagram of a target image according to an exemplary embodiment;
FIG. 3 is a schematic diagram of a detection result according to an exemplary embodiment;
FIG. 4 is a flowchart illustrating the steps of another target detection method according to an exemplary embodiment;
FIG. 5 is a flowchart illustrating the steps of a model training method according to an exemplary embodiment;
FIG. 6 is a block diagram illustrating the structure of a target detection apparatus according to an exemplary embodiment;
FIG. 7 is a block diagram illustrating the structure of an electronic device according to an exemplary embodiment;
FIG. 8 is a block diagram illustrating the structure of another target detection apparatus according to an exemplary embodiment.
Detailed Description
In order to make those skilled in the art better understand the technical solutions of the present application, the technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
Fig. 1 is a flowchart illustrating the steps of a target detection method according to an exemplary embodiment. Referring to fig. 1, the target detection method provided in this embodiment may be applied to detecting objects in a target image, so as to determine a target object having a certain characteristic from among a plurality of objects of the same category contained in the image. The method may be executed by a target detection apparatus, which is typically implemented in software and/or hardware and may be disposed in an electronic device. The method may include the following steps:
101. Acquire a target image.
The target image includes a plurality of first objects of a preset category and at least one second object of an auxiliary category. The second object is a reference object set in advance, according to the characteristic that distinguishes one first object (the target object), for selecting that target object from among the plurality of first objects.
Illustratively, as shown in fig. 2, which is a schematic diagram of a target image according to an exemplary embodiment, the preset category is "bridge opening", and the first objects are bridge opening 1, bridge opening 2, and bridge opening 3 shown in fig. 2; the auxiliary category is "composite bridge opening", and the second object is composite bridge opening 4, shown in the solid-line box in fig. 2. The target object is the bridge opening located between the other two bridge openings (bridge opening 2); composite bridge opening 4 is the reference object preset according to this characteristic, and bridge opening 2 is located inside composite bridge opening 4.
In practical applications, the preset category and the auxiliary category may be set as required. For example, the preset category may be "person", the target object may be a "person wearing a hat", and the auxiliary category may be "hat"; the hat is a reference object chosen according to the characteristic of the target object (a hat at the head position of the human body). Alternatively, the preset category may be "cup", the target object may be a "cup with a handle", and the auxiliary category may be "handle", the handle being located at the middle position on one side of the cup.
The specific method for obtaining the target image may be set according to the requirement, and this embodiment does not limit this.
102. Input the target image into the pre-trained target detection model to obtain the position information of the first object and the position information of the second object output by the target detection model.
In this embodiment, after acquiring the target image, the electronic device first obtains, through the target detection model, a plurality of first objects belonging to a preset category and one or more second objects belonging to an auxiliary category in the target image, and then determines the target object from the plurality of first objects according to a relative position between the first object and the second object.
Before detecting the target image, a target detection model for detecting the first object and the second object may be trained in advance. The target detection model may be, for example, any one of an RCNN (Region-based Convolutional Neural Network) model, an SSD (Single Shot MultiBox Detector) model, or a YOLO (You Only Look Once) model. After the target image is obtained, it may be input into the target detection model to obtain the detection result output by the model. The detection result may include the category information and position information of the first object, and the category information and position information of the second object. In combination with step 101, if the target image includes bridge opening 1, bridge opening 2, and bridge opening 3, the detection result may include the category information "bridge opening" and the location information of each of these bridge openings, as well as the category information and location information of composite bridge opening 4. For the specific process by which the target detection model detects the target image, reference may be made to the prior art, which is not described in detail in this embodiment.
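For illustration only, the following Python sketch shows one form the detection result described above could take and how the first objects and second objects might be separated by category; the record layout, the category strings, and the split_detections() helper are assumptions made for this example, not an interface defined by the present application.

```python
# A minimal sketch of consuming the detection result of step 102.
# The (category, box) record layout is an illustrative assumption.
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1) upper-left, (x2, y2) lower-right

def split_detections(detections: List[Tuple[str, Box]],
                     preset_category: str = "bridge opening",
                     auxiliary_category: str = "composite bridge opening"):
    """Separate first objects (preset category) from second objects (auxiliary category)."""
    first_objects = [box for cat, box in detections if cat == preset_category]
    second_objects = [box for cat, box in detections if cat == auxiliary_category]
    return first_objects, second_objects

# The coordinates used in fig. 3 of the description (ordinate grows upward):
detections = [
    ("bridge opening", (20, 50, 120, 10)),
    ("bridge opening", (140, 50, 240, 10)),
    ("bridge opening", (260, 50, 360, 10)),
    ("composite bridge opening", (70, 70, 310, 0)),
]
first_objects, second_objects = split_detections(detections)
```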
103. Determine, according to the position information of the first object and the position information of the second object, the first object whose relative position to the second object conforms to the preset positional relationship as the target object.
In this embodiment, after obtaining the detection result output by the target detection model, the relative position of the first object and the second object may be determined according to the position information of the first object and the position information of the second object.
For example, in connection with step 102, the detection result may include the position information of bridge opening 1, bridge opening 2, and bridge opening 3, and the position information of composite bridge opening 4. The electronic device can determine the relative position of each first object (bridge opening 1, bridge opening 2, and bridge opening 3) to the second object (composite bridge opening 4) according to the position information of each first object and the position information of the second object. As shown in fig. 3, which is a schematic diagram of a detection result according to an exemplary embodiment, the position information may be represented by the coordinates of the area where each object is located: the position information of bridge opening 1 is the upper-left corner coordinates (20, 50) and lower-right corner coordinates (120, 10) of its dashed-box area; the position information of bridge opening 2 is the upper-left corner coordinates (140, 50) and lower-right corner coordinates (240, 10) of its dashed-box area; the position information of bridge opening 3 is the upper-left corner coordinates (260, 50) and lower-right corner coordinates (360, 10) of its dashed-box area; and the position information of composite bridge opening 4 is the upper-left corner coordinates (70, 70) and lower-right corner coordinates (310, 0) of its solid-box area.
The electronic device may compare the coordinates of bridge opening 1, bridge opening 2, and bridge opening 3 with those of composite bridge opening 4, respectively, to determine the relative position of each bridge opening to composite bridge opening 4. For example, the abscissa of the upper-left corner of bridge opening 1 is smaller than the abscissa of the upper-left corner of composite bridge opening 4, so it can be determined that part of bridge opening 1 lies outside composite bridge opening 4. The abscissa of the lower-right corner of bridge opening 3 is greater than the abscissa of the lower-right corner of composite bridge opening 4, so it can be determined that part of bridge opening 3 lies outside composite bridge opening 4. The abscissa of the upper-left corner of bridge opening 2 is greater than the abscissa of the upper-left corner of composite bridge opening 4, and the abscissa of the lower-right corner of bridge opening 2 is smaller than the abscissa of the lower-right corner of composite bridge opening 4, so it can be determined that bridge opening 2 lies entirely inside composite bridge opening 4. The specific form of the position information may be set as required, and the method for determining the relative position of the first object and the second object from their position information may be set according to that form, which is not limited in this embodiment.
After determining the relative positions of the first objects and the second object, the electronic device may determine whether each relative position conforms to the preset positional relationship. Continuing the above example, if the preset positional relationship is that the first object is located inside the second object, then among bridge opening 1, bridge opening 2, and bridge opening 3, only bridge opening 2 is located entirely inside composite bridge opening 4, so bridge opening 2 can be determined to be the target object, that is, the bridge opening located between two other bridge openings.
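The containment test used in this example can be sketched as follows. The sketch assumes boxes given as upper-left (x1, y1) and lower-right (x2, y2) corners in the fig. 3 convention, where the ordinate grows upward; the is_inside() helper is illustrative, not part of the application.

```python
def is_inside(inner, outer) -> bool:
    """True if `inner` lies entirely within `outer`, under the fig. 3 convention."""
    ix1, iy1, ix2, iy2 = inner
    ox1, oy1, ox2, oy2 = outer
    return (ix1 >= ox1 and ix2 <= ox2       # abscissa comparisons, as in the description
            and iy1 <= oy1 and iy2 >= oy2)  # ordinate comparisons (y grows upward)

composite = (70, 70, 310, 0)
openings = {1: (20, 50, 120, 10), 2: (140, 50, 240, 10), 3: (260, 50, 360, 10)}
targets = [k for k, box in openings.items() if is_inside(box, composite)]
# targets == [2]: only bridge opening 2 lies entirely inside composite bridge opening 4
```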
In practical applications, the preset positional relationship may be set according to the characteristics of the target object. In combination with step 101, if the first object is a "person", the target object is a "person wearing a hat", and the second object is a "hat", the preset positional relationship may be that the second object is located inside the first object, in the upper half area of the first object. If the first object is a "cup", the target object is a "cup with a handle", and the second object is a "handle", the preset positional relationship may be that the second object is located at the middle position outside one side of the first object.
In summary, in this embodiment, a target image including a first object and a second object is acquired; the target image is input into a pre-trained target detection model to obtain the position information of the first object and the position information of the second object output by the model; and the relative position of the first object and the second object is determined according to the two pieces of position information. When the relative position of the first object and the second object conforms to the preset positional relationship, the first object is determined to be the target object. In the target detection process, a second object of an auxiliary category is set for a plurality of first objects of the same category. After the plurality of first objects and the second object are detected by the target detection model, the first object having a certain characteristic can be further determined from among the plurality of first objects according to the relative position between the first object and the second object, so that target detection accuracy can be improved.
Optionally, step 103 may be implemented as follows:
determining a target area corresponding to each first object according to the position information of each first object;
and determining a target area where the second object is located according to the position information of the second object, and taking the first object corresponding to the target area where the second object is located as the target object.
In this embodiment, the first object that has the second object at its preset position may be determined to be the target object. After the electronic device obtains the detection result output by the target detection model, it may determine the target area corresponding to each first object according to the position information of that first object; if the second object lies in the target area corresponding to a first object, that first object is determined to be the target object. For example, the first object is a "person", the second object is a "hat", and the hat is located on the person's head. After obtaining the position information of the persons and of the hat output by the target detection model, the electronic device may first determine the area where each person is located as a target area, then determine the target area where the hat is located according to the hat's position information, and determine that the person corresponding to the target area where the hat is located is the target object, i.e., the "person wearing a hat".
The target area corresponding to the first object may be one or more of an inside, an outside, a left side, a right side, an upper side and a lower side of the first object. For example, the target area may include the left and right sides of a person, or include the upper half area or the lower half area of the inside of a human body.
In practical applications, the target area is determined according to the position information of the first object, and the first object with the second object in its target area is determined to be the target object; in this way, the second object can be set flexibly and the target object can be detected conveniently.
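As a sketch of the "person wearing a hat" variant, under the same assumed box convention: the target area is taken as the upper half of each first object's box, and a first object is the target object if some second object's center falls in that area. The helper names here are hypothetical.

```python
def box_center(box):
    """Center point of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def upper_half(box):
    """Target area: the upper half of a first object's box (ordinate grows upward)."""
    x1, y1, x2, y2 = box
    return (x1, y1, x2, (y1 + y2) / 2.0)

def contains_point(area, point):
    """True if `point` lies inside the (x1, y1, x2, y2) area."""
    x1, y1, x2, y2 = area
    px, py = point
    return x1 <= px <= x2 and y2 <= py <= y1

def wears_hat(person_box, hat_boxes):
    """A person is the target object if some hat's center lies in the person's upper-half area."""
    area = upper_half(person_box)
    return any(contains_point(area, box_center(hat)) for hat in hat_boxes)
```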
Optionally, step 103 may also be implemented as follows:
and determining the first object overlapped with the second object as the target object according to the position information of the first object and the position information of the second object.
In this embodiment, the first object that partially or entirely overlaps the second object may be determined to be the target object. As shown in fig. 2, the target objects may be the bridge openings located on the two sides of the bridge. After the position information of bridge opening 1 and composite bridge opening 4 is detected, it can be determined that the abscissa of the upper-left corner of bridge opening 1 is smaller than the abscissa of the upper-left corner of composite bridge opening 4, while the abscissa of the lower-right corner of bridge opening 1 is greater than the abscissa of the upper-left corner of composite bridge opening 4; it can therefore be determined that bridge opening 1 partially overlaps composite bridge opening 4, and bridge opening 1 can be determined to be a target object. Similarly, bridge opening 3 can be determined to be a target object. The first object and the second object may overlap completely or partially, which is not limited in this embodiment.
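A minimal sketch of the overlap test, under the same assumed convention: two boxes overlap (partially or completely) exactly when their x-intervals and y-intervals both intersect. Full containment also counts as overlap here, consistent with the statement above that complete or partial overlap is not limited.

```python
def boxes_overlap(a, b) -> bool:
    """Partial or complete overlap of two (x1, y1, x2, y2) boxes, y growing upward."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    x_overlap = ax1 <= bx2 and bx1 <= ax2   # the x-intervals intersect
    y_overlap = ay2 <= by1 and by2 <= ay1   # the y-intervals intersect (y1 top, y2 bottom)
    return x_overlap and y_overlap

composite = (70, 70, 310, 0)
assert boxes_overlap((20, 50, 120, 10), composite)   # bridge opening 1 partially overlaps
assert boxes_overlap((260, 50, 360, 10), composite)  # bridge opening 3 partially overlaps
```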
In practical applications, detecting the target object according to the overlap between the first object and the second object allows the second object to be set flexibly and the target object to be detected conveniently.
Fig. 4 is a flowchart illustrating the steps of another target detection method according to an exemplary embodiment. As shown in fig. 4, the method may include the following steps:
401. Acquire a target image.
402. Input the target image into the pre-trained target detection model to obtain the position information of the first object and the position information of the second object output by the target detection model.
403. Determine the target area according to the position information of the second object.
404. Determine the first object located in the target area as the target object according to the position information of each first object.
In this embodiment, after the first object and the second object are obtained through detection, the target area may be determined according to the position information of the second object, and then the first object located in the target area is determined as the target object according to the position information of each first object.
Alternatively, the target area may comprise an interior of the second object or an exterior of the second object.
For example, the target area may be the inside of the second object. As shown in fig. 3, after bridge opening 1, bridge opening 2, bridge opening 3, and composite bridge opening 4 are detected, the inside of composite bridge opening 4 is determined as the target area according to the position information of composite bridge opening 4; the horizontal and vertical coordinates of bridge openings 1, 2, and 3 are then compared with those of composite bridge opening 4, and bridge opening 2, whose horizontal and vertical coordinates all lie inside composite bridge opening 4 (the target area), is determined as the target object.
In this embodiment, the target area may also be the outside of the second object. In that case, the target area may be the entire area outside the second object; as shown in fig. 3, it may be the entire region other than the solid-line box of composite bridge opening 4. Alternatively, the target area may be one or more of the left side, right side, upper side, and lower side of the second object's exterior. Referring to fig. 3, when the target area is the left side of the second object's exterior, a first object whose lower-right corner abscissa is smaller than the lower-left corner abscissa of the second object may be determined as the target object. Conversely, when the target area is the right side of the second object's exterior, a first object whose lower-left corner abscissa is larger than the lower-right corner abscissa of the second object may be determined as the target object. Similarly, when the target area includes the upper side and the lower side of the second object's exterior, the first objects located on the upper and lower sides outside the second object may be determined as target objects.
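The left-side and right-side tests described above reduce to abscissa comparisons; a short sketch under the same assumptions:

```python
def is_left_of(first_box, second_box) -> bool:
    """First object lies on the left side of the second object's exterior."""
    return first_box[2] < second_box[0]   # lower-right abscissa < second object's left edge

def is_right_of(first_box, second_box) -> bool:
    """First object lies on the right side of the second object's exterior."""
    return first_box[0] > second_box[2]   # lower-left abscissa > second object's right edge
```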
In practical application, the target area is determined according to the position information of the second object, the first object located in the target area is determined to be the target object, a user can set the second object of an auxiliary category according to the area where the target object is located, the second object can be flexibly set, and the target object can be conveniently detected.
Optionally, when the second object is multiple, step 403 may be implemented as follows:
and under the condition that a plurality of second objects form a preset shape according to the position information of each second object, determining the preset position of the preset shape as the target area.
In this embodiment, when there are multiple second objects, the relative positions of the multiple second objects may be determined according to their position information; if the positions of the multiple second objects form a preset shape, the target area may be determined according to that shape. For example, the preset shape may be a straight line, and the preset position may be the upper side of the line. After the multiple second objects are detected, the center coordinates of each second object may be determined from its position information; for example, the center coordinates of bridge opening 1 may be determined from its upper-left and lower-right corner coordinates. If the ordinates of the center coordinates of the second objects are all the same, it can be determined that the second objects lie on the same horizontal line and thus form a straight line. The upper side of this line may then be determined to be the target area, and a first object located in the target area is the target object; that is, a first object is determined to be the target object when the ordinate of its lower-right corner is higher than the ordinate of the center coordinates of all the second objects.
For another example, the preset shape may be a rectangle, and the preset position may be the inside of the rectangle. After the multiple second objects are detected, the center coordinates of each second object are determined; if the multiple second objects are determined to enclose a rectangle according to these center coordinates, the first object located inside the rectangle (the target area) can be determined as the target object according to its coordinates. Similarly, the outside of the rectangle may also be determined as the target area.
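A minimal sketch of the straight-line case: the center of each second object is computed from its box, and if all center ordinates agree (a small tolerance is added here as an assumption, for robustness to detection noise), the area above that line is taken as the target area.

```python
def center(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def horizontal_line_ordinate(second_boxes, tol=1e-6):
    """If all second-object centers share one ordinate, return it; otherwise None."""
    ys = [center(b)[1] for b in second_boxes]
    if max(ys) - min(ys) <= tol:   # the centers form a horizontal straight line
        return sum(ys) / len(ys)
    return None

def above_line(first_box, line_y) -> bool:
    """Target object: its lower-right corner lies above the line (y grows upward)."""
    return first_box[3] > line_y
```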
The preset shape may be a regular shape such as a circle, a triangle, or a pentagon, or an irregular shape; there may be one or more preset shapes, and the preset position corresponding to each preset shape may be set as required, which is not limited in this embodiment.
In practical applications, determining the target area from the position information of multiple second objects allows the second objects and the corresponding target area to be set flexibly, and allows the target area to be determined accurately.
Fig. 5 is a flowchart illustrating the steps of a model training method according to an exemplary embodiment. As shown in fig. 5, the method may include the following steps:
501. Acquire a plurality of sample images and the annotation information of each sample image.
The sample image comprises a first object of a preset category and a second object of an auxiliary category, and the labeling information comprises category information and position information of the first object and category information and position information of the second object.
In this embodiment, before detecting the target image, a target detection model may be obtained by training according to a sample image including the first object and the second object. Correspondingly, the labeling information of the sample image includes the category information and the position information of the first object, and the category information and the position information of the second object.
In practical applications, a user may annotate the sample images to identify the first object and the second object in each sample image. As shown in fig. 2, the sample image includes bridge opening 1, bridge opening 2, and bridge opening 3; using an annotation tool, the user may mark the position information of each bridge opening and set the category of bridge openings 1, 2, and 3 to the preset category "bridge opening" (the category information). Specifically, for bridge opening 1, the coordinates of the upper-left and lower-right corners of the dashed-box area where bridge opening 1 is located may be marked and its category set, giving the annotation (0, 20, 50, 120, 10), where "0" indicates that the category of bridge opening 1 is the preset category, "20 and 50" are the abscissa and ordinate of the upper-left corner of the dashed-box area, and "120 and 10" are the abscissa and ordinate of the lower-right corner of the dashed-box area. Similarly, for composite bridge opening 4, the annotation (1, 70, 70, 310, 0) may be obtained, where "1" represents the auxiliary category "composite bridge opening", "70 and 70" are the abscissa and ordinate of the upper-left corner of the solid-box area where composite bridge opening 4 is located, and "310 and 0" are the abscissa and ordinate of its lower-right corner. For the specific annotation method of the sample images, reference may be made to the prior art, which is not limited in this embodiment.
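The annotation records in this example can be read as 5-tuples. The following sketch assumes the (class_id, x1, y1, x2, y2) layout shown above, with class 0 for the preset category and class 1 for the auxiliary category; the mapping and helper are illustrative.

```python
CATEGORY_NAMES = {0: "bridge opening", 1: "composite bridge opening"}  # assumed mapping

def parse_annotation(record):
    """Split one (class_id, x1, y1, x2, y2) annotation into category name and box."""
    class_id, x1, y1, x2, y2 = record
    return CATEGORY_NAMES[class_id], (x1, y1, x2, y2)

# The two annotation records given in the example above:
for record in [(0, 20, 50, 120, 10),    # bridge opening 1, preset category
               (1, 70, 70, 310, 0)]:    # composite bridge opening 4, auxiliary category
    category, box = parse_annotation(record)
```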
502. Input the sample image into the target detection model to be trained to obtain the predicted position information and predicted category information of the first object and the predicted position information and predicted category information of the second object output by the model.
In this embodiment, after the sample images and their annotation information are acquired, a sample image may be input into the target detection model to be trained to obtain the detection result output by the model. In conjunction with step 102, the detection result may include the predicted category information and predicted position information of the first object (i.e., the category and position information of bridge opening 1 output by the model to be trained) and the predicted category information and predicted position information of the second object (i.e., the category and position information of composite bridge opening 4 output by the model to be trained).
The target detection model to be trained may be any one of an RCNN model, an SSD model, or a YOLO model.
503. Calculate a first loss value based on the position information and the predicted position information of the first object; calculate a second loss value based on the category information and the predicted category information of the first object; calculate a third loss value based on the position information and the predicted position information of the second object; and calculate a fourth loss value based on the category information and the predicted category information of the second object.
504. Determine a target loss value according to the first, second, third, and fourth loss values, and train the target detection model to be trained according to the target loss value to obtain the target detection model.
In this embodiment, after the detection result output by the target detection model to be trained is obtained, the loss value of the model (the target loss value) may be calculated from the annotation information of the sample image and the detection result. Specifically, a first loss value may be calculated from the position information and predicted position information of the first object; a second loss value from the category information and predicted category information of the first object; a third loss value from the position information and predicted position information of the second object; and a fourth loss value from the category information and predicted category information of the second object. A target loss value is then calculated from the first, second, third, and fourth loss values using a target loss function, and the model parameters of the target detection model to be trained are adjusted according to the target loss value, completing one training pass. For the processes of calculating a loss value from position information and predicted position information, calculating a loss value from category information and predicted category information, and calculating the target loss value through the target loss function, reference may be made to the prior art, which is not described in detail in this embodiment.
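One way the target loss value might be assembled from the four loss values is a weighted sum, sketched below with PyTorch. The smooth-L1 position loss, the cross-entropy category loss, and the unit weights are assumptions chosen for illustration; this application does not fix a particular loss function or weighting.

```python
import torch
import torch.nn.functional as F

def target_loss(pred_box1, gt_box1, pred_cls1, gt_cls1,
                pred_box2, gt_box2, pred_cls2, gt_cls2,
                weights=(1.0, 1.0, 1.0, 1.0)):
    """Combine the four loss values of steps 503-504 into one target loss value."""
    loss1 = F.smooth_l1_loss(pred_box1, gt_box1)   # first object: position loss
    loss2 = F.cross_entropy(pred_cls1, gt_cls1)    # first object: category loss
    loss3 = F.smooth_l1_loss(pred_box2, gt_box2)   # second object: position loss
    loss4 = F.cross_entropy(pred_cls2, gt_cls2)    # second object: category loss
    w1, w2, w3, w4 = weights
    return w1 * loss1 + w2 * loss2 + w3 * loss3 + w4 * loss4
```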
During training, each sample image may be input in turn into the target detection model to be trained to obtain the corresponding detection result; the first, second, third, and fourth loss values are calculated from the detection result and the annotation information, the target loss value is computed, and the model is trained accordingly. Training ends when the target loss value meets a preset ending condition, yielding the target detection model. The preset ending condition may be set with reference to the prior art, which is not limited in this embodiment. After the target detection model is obtained, the target image may be detected by the model, which outputs the position information and category information of the first object and the second object in the target image.
Fig. 6 is a block diagram illustrating a target detection apparatus according to an exemplary embodiment. The apparatus 600 may include: a first acquisition module 601, a first detection module 602, and a determination module 603.
The first acquisition module 601 is configured to acquire a target image including a plurality of preset categories of first objects and at least one auxiliary category of second objects.
The first detection module 602 is configured to input a target image into a pre-trained target detection model, and obtain position information of a first object and position information of a second object output by the target detection model.
The determining module 603 is configured to determine, as the target object, a first object whose relative position with respect to a second object matches a preset positional relationship, according to the position information of the first object and the position information of the second object.
Optionally, the determining module 603 includes: a first determination unit and a second determination unit.
The first determination unit is configured to determine the target area from the position information of the second object.
The second determination unit is configured to determine the first object located in the target area as the target object based on the position information of each of the first objects.
Optionally, the target area comprises an interior of the second object or an exterior of the second object.
Optionally, the second objects are multiple, and the first determining unit is specifically configured to determine the preset position of the preset shape as the target area when determining that the multiple second objects form the preset shape according to the position information of each second object.
Optionally, the determining module 603 is specifically configured to determine, according to the position information of each first object, a target region corresponding to each first object; and determining a target area where the second object is located according to the position information of the second object, and taking the first object corresponding to the target area where the second object is located as the target object.
Optionally, the determining module 603 is specifically configured to determine, according to the position information of the first object and the position information of the second object, that the first object overlapping with the second object is the target object.
Optionally, the apparatus further includes: a second acquisition module configured to acquire a plurality of sample images and the annotation information of each sample image; the sample image includes the first object and the second object, and the annotation information includes the category information and position information of the first object, and the category information and position information of the second object.
A second detection module is configured to input the sample image into the target detection model to be trained, and obtain the predicted position information and predicted category information of the first object and the predicted position information and predicted category information of the second object output by the target detection model to be trained.
A calculation module configured to calculate a first loss value based on the position information and the predicted position information of the first object, calculate a second loss value based on the category information and the predicted category information of the first object, and calculate a third loss value based on the position information and the predicted position information of the second object, and calculate a fourth loss value based on the category information and the predicted category information of the second object.
A training module is configured to determine a target loss value according to the first loss value, the second loss value, the third loss value, and the fourth loss value, and train the target detection model to be trained according to the target loss value to obtain the target detection model.
In the embodiments of the present application, a target image including a first object and a second object is acquired; the target image is input into a pre-trained target detection model to obtain the position information of the first object and the position information of the second object output by the model; and the relative position of the first object and the second object is determined according to the two pieces of position information. When the relative position of the first object and the second object conforms to the preset positional relationship, the first object is determined to be the target object. In the target detection process, a second object of an auxiliary category is set for a plurality of first objects of the same category. After the plurality of first objects and the second object are detected by the target detection model, the first object having a certain characteristic can be further determined from among the plurality of first objects according to the relative position between the first object and the second object, so that target detection accuracy can be improved.
Fig. 7 is a block diagram illustrating a structure of an electronic device according to an example embodiment, where the electronic device 700 may include:
a processor 701;
a memory 702 for storing instructions executable by the processor 701;
wherein the processor 701 is configured to perform the method performed by the electronic device in the embodiment shown in fig. 1, fig. 4 or fig. 5.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, the instructions in which, when executed by a processor of an electronic device, enable the electronic device to perform the method performed by the electronic device in the embodiments shown in fig. 1, 4 or 5.
In an exemplary embodiment, a computer program product containing instructions is also provided, which when run on an electronic device, enables the electronic device to perform the method performed by the electronic device in the embodiments shown in fig. 1, fig. 4 or fig. 5.
Fig. 8 is a block diagram illustrating another target detection apparatus according to an exemplary embodiment. The apparatus 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or part of the steps of the target detection method described above. Further, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the apparatus 800. Examples of such data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power components 806 provide power to the various components of device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 814 includes one or more sensors for providing status assessments of various aspects of the apparatus 800. For example, the sensor component 814 may detect the open/closed state of the apparatus 800 and the relative positioning of components, such as the display and keypad of the apparatus 800. The sensor component 814 may also detect a change in the position of the apparatus 800 or a component of the apparatus 800, the presence or absence of user contact with the apparatus 800, the orientation or acceleration/deceleration of the apparatus 800, and a change in the temperature of the apparatus 800. The sensor component 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate communications between the apparatus 800 and other devices in a wired or wireless manner. The apparatus 800 may access a wireless network based on a communication standard, such as WiFi, an operator network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors, or other electronic components, for performing the above-described object detection method.
In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium including instructions, such as the memory 804 including instructions, which are executable by the processor 820 of the apparatus 800 to perform the above-described object detection method. For example, the non-transitory computer-readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, the implementation may take the form, in whole or in part, of a computer program product.
The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or a wireless manner (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a server or a data center, integrating one or more available media. The available media may be magnetic media (e.g., floppy disk, hard disk, magnetic tape), optical media (e.g., DVD), or semiconductor media (e.g., Solid State Disk (SSD)), among others.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. The division of the units is only a logical functional division, and other divisions may be used in actual implementation; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in an electrical, mechanical, or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part thereof contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
As for the device embodiment, since it is basically similar to the method embodiment, the description is relatively brief; for relevant parts, reference may be made to the corresponding description of the method embodiment.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow its general principles, including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method of object detection, comprising:
acquiring a target image, wherein the target image comprises a plurality of first objects of a preset category and at least one second object of an auxiliary category;
inputting the target image into a pre-trained target detection model to obtain the position information of the first object and the position information of the second object output by the target detection model;
and determining, according to the position information of the first object and the position information of the second object, a first object whose relative position with respect to the second object conforms to a preset positional relationship as a target object.
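For illustration only, the following Python sketch shows one way the method of claim 1 could be realized. The detector interface `detect`, the category labels "first" and "second", and the predicate `conforms_to_relation` are assumptions introduced for this sketch, not elements fixed by the claim.

```python
# Illustrative sketch of the method of claim 1 (not the claimed implementation).
# `detect` stands in for any pre-trained detection model returning
# (category, box) pairs, where box = (x1, y1, x2, y2) in pixels.

def find_targets(image, detect, conforms_to_relation):
    detections = detect(image)
    first_boxes = [box for cat, box in detections if cat == "first"]    # preset category
    second_boxes = [box for cat, box in detections if cat == "second"]  # auxiliary category

    targets = []
    for first in first_boxes:
        # A first object is a target when its relative position to some
        # second object conforms to the preset positional relationship.
        if any(conforms_to_relation(first, second) for second in second_boxes):
            targets.append(first)
    return targets
```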
2. The method according to claim 1, wherein the determining, according to the position information of the first object and the position information of the second object, a first object whose relative position with respect to the second object conforms to a preset positional relationship as a target object comprises:
determining a target area according to the position information of the second object;
and determining, according to the position information of each first object, the first object located in the target area as the target object.
3. The method of claim 2, wherein the target area comprises an interior of the second object or an exterior of the second object.
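Claims 2 and 3 narrow the positional relationship to membership in a target area derived from the second object. A minimal sketch, under the assumptions that the target area is the second object's bounding box and that a first object is located by its box center:

```python
def center(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def inside(box, area):
    # A box counts as "in" an area when its center falls within the area.
    cx, cy = center(box)
    ax1, ay1, ax2, ay2 = area
    return ax1 <= cx <= ax2 and ay1 <= cy <= ay2

def targets_in_area(first_boxes, second_box, interior=True):
    # Claim 3: the target area may be the interior or the exterior of
    # the second object; interior=False selects the exterior case.
    return [b for b in first_boxes if inside(b, second_box) == interior]
```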
4. The method according to claim 2, wherein the at least one second object comprises a plurality of second objects, and the determining the target area according to the position information of the second object comprises:
determining, in a case where it is determined from the position information of each second object that the plurality of second objects form a preset shape, a preset position of the preset shape as the target area.
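Claim 4 leaves both the preset shape and its preset position open. One hypothetical reading, in which four auxiliary markers form a quadrilateral and the target area is taken to be the axis-aligned region their centers enclose (`center` is reused from the sketch above):

```python
def quad_target_area(second_boxes):
    # Hypothetical case: the preset shape is a quadrilateral formed by
    # four second objects, and the preset position is the axis-aligned
    # region their centers enclose.
    if len(second_boxes) != 4:
        return None  # the preset shape is not formed
    xs, ys = zip(*(center(b) for b in second_boxes))
    return (min(xs), min(ys), max(xs), max(ys))
```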
5. The method according to claim 1, wherein the determining, according to the position information of the first object and the position information of the second object, a first object whose relative position with respect to the second object conforms to a preset positional relationship as a target object comprises:
determining a target area corresponding to each first object according to the position information of each first object;
and determining, according to the position information of the second object, a target area in which the second object is located, and taking the first object corresponding to that target area as the target object.
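Claim 5 inverts the test of claim 2: each first object defines its own target area, and the first object whose area contains the second object is selected. A sketch under the assumption that the per-object area is the first object's box expanded by a fixed margin, which the claim does not specify (`inside` is reused from the sketch under claim 3):

```python
def expand(box, margin):
    x1, y1, x2, y2 = box
    return (x1 - margin, y1 - margin, x2 + margin, y2 + margin)

def target_by_containment(first_boxes, second_box, margin=20):
    # Return the first object whose (expanded) area contains the second
    # object's center; the 20 px margin is purely illustrative.
    for first in first_boxes:
        if inside(second_box, expand(first, margin)):
            return first
    return None
```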
6. The method according to claim 1, wherein the determining, according to the position information of the first object and the position information of the second object, a first object whose relative position with respect to the second object conforms to a preset positional relationship as a target object comprises:
determining, according to the position information of the first object and the position information of the second object, a first object that overlaps the second object as the target object.
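Claim 6 reduces the positional relationship to overlap between the two objects' boxes. A direct sketch of a non-empty-intersection test:

```python
def overlaps(a, b):
    # Two axis-aligned boxes overlap when they intersect on both axes.
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    return max(ax1, bx1) < min(ax2, bx2) and max(ay1, by1) < min(ay2, by2)

def targets_by_overlap(first_boxes, second_boxes):
    return [f for f in first_boxes
            if any(overlaps(f, s) for s in second_boxes)]
```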
7. The method of any of claims 1-6, further comprising, prior to said acquiring a target image:
acquiring a plurality of sample images and annotation information of each sample image, wherein the sample image comprises the first object and the second object, and the annotation information comprises category information and position information of the first object and category information and position information of the second object;
inputting the sample image into a target detection model to be trained to obtain the predicted position information and the predicted category information of the first object and the predicted position information and the predicted category information of the second object, which are output by the target detection model to be trained;
calculating a first loss value according to the position information and the predicted position information of the first object, calculating a second loss value according to the category information and the predicted category information of the first object, calculating a third loss value according to the position information and the predicted position information of the second object, and calculating a fourth loss value according to the category information and the predicted category information of the second object;
and determining a target loss value according to the first loss value, the second loss value, the third loss value and the fourth loss value, and training the target detection model to be trained according to the target loss value to obtain the target detection model.
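Claim 7 specifies four loss terms but fixes neither the loss functions nor the combination rule. The sketch below assumes smooth L1 loss for the position terms, cross-entropy for the category terms, an unweighted sum as the target loss, and PyTorch as the framework; all of these are illustrative choices, not requirements of the claim.

```python
import torch.nn.functional as F

def target_loss(pred_pos1, pos1, pred_cls1, cls1,
                pred_pos2, pos2, pred_cls2, cls2):
    loss1 = F.smooth_l1_loss(pred_pos1, pos1)  # first object, position
    loss2 = F.cross_entropy(pred_cls1, cls1)   # first object, category
    loss3 = F.smooth_l1_loss(pred_pos2, pos2)  # second object, position
    loss4 = F.cross_entropy(pred_cls2, cls2)   # second object, category
    return loss1 + loss2 + loss3 + loss4       # target loss (equal weights assumed)
```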
8. An object detection device, comprising:
a first acquisition module configured to acquire a target image, the target image including a plurality of first objects of a preset category and at least one second object of an auxiliary category;
a first detection module configured to input the target image into a pre-trained target detection model to obtain position information of the first object and position information of the second object output by the target detection model;
a determining module configured to determine, according to the position information of the first object and the position information of the second object, a first object whose relative position with respect to the second object conforms to a preset positional relationship as a target object.
9. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the object detection method of any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the object detection method of any of claims 1-7.
CN202011066445.1A 2020-09-30 2020-09-30 Target detection method and device, electronic equipment and storage medium Pending CN112258572A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011066445.1A CN112258572A (en) 2020-09-30 2020-09-30 Target detection method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112258572A true CN112258572A (en) 2021-01-22

Family

ID=74234705

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011066445.1A Pending CN112258572A (en) 2020-09-30 2020-09-30 Target detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112258572A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110135290A (en) * 2019-04-28 2019-08-16 中国地质大学(武汉) A kind of safety cap wearing detection method and system based on SSD and AlphaPose
CN110347772A (en) * 2019-07-16 2019-10-18 北京百度网讯科技有限公司 Article condition detection method, device and computer readable storage medium
CN110414432A (en) * 2019-07-29 2019-11-05 腾讯科技(深圳)有限公司 Training method, object identifying method and the corresponding device of Object identifying model
CN111179247A (en) * 2019-12-27 2020-05-19 上海商汤智能科技有限公司 Three-dimensional target detection method, training method of model thereof, and related device and equipment

Similar Documents

Publication Publication Date Title
US20170300503A1 (en) Method and apparatus for managing video data, terminal, and server
EP3041206B1 (en) Method and device for displaying notification information
EP2977956A1 (en) Method, apparatus and device for segmenting an image
WO2020088069A1 (en) Hand gesture keypoints detection method and apparatus, electronic device, and storage medium
EP3453794A1 (en) Foreign matter recognition method and device
EP3185160B1 (en) Screen unlocking method and apparatus, terminal
US20160352661A1 (en) Video communication method and apparatus
EP3133527A1 (en) Human face recognition method, apparatus and terminal
US20170364730A1 (en) Fingerprint entry prompting method and device
EP3825960A1 (en) Method and device for obtaining localization information
US10248855B2 (en) Method and apparatus for identifying gesture
CN106454336A (en) Method and device for detecting whether camera of terminal is covered or not, and terminal
CN107944367B (en) Face key point detection method and device
CN104503888A (en) Warning method and device
CN105469003B (en) screen protection method and device
CN112115894B (en) Training method and device of hand key point detection model and electronic equipment
CN105357425A (en) Image shooting method and image shooting device
CN104332037A (en) Method and device for alarm detection
CN105487758A (en) Method and device for popup control of application software, and terminal equipment
CN110930351A (en) Light spot detection method and device and electronic equipment
CN107392160A (en) Optical finger print recognition methods and device, computer-readable recording medium
US10810439B2 (en) Video identification method and device
CN110738267B (en) Image classification method, device, electronic equipment and storage medium
EP3211564A1 (en) Method and device for verifying a fingerprint
CN115861741A (en) Target calibration method and device, electronic equipment, storage medium and vehicle

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20210122)