CN113326796A - Object detection method, model training method and device and electronic equipment - Google Patents

Object detection method, model training method and device and electronic equipment

Info

Publication number
CN113326796A
CN113326796A (application number CN202110672301.9A)
Authority
CN
China
Prior art keywords
scene
image
target
detected
category
Prior art date
Legal status
Granted
Application number
CN202110672301.9A
Other languages
Chinese (zh)
Other versions
CN113326796B
Inventor
钱正宇
袁正雄
李金麒
褚振方
黄悦
李润青
胡鸣人
施恩
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110672301.9A
Publication of CN113326796A
Application granted
Publication of CN113326796B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Abstract

The disclosure provides an object detection method, a model training method and apparatus, and an electronic device, and relates to the field of artificial intelligence technologies such as computer vision and deep learning. The specific implementation scheme is as follows: acquiring an image to be detected; performing object detection on the image to be detected to obtain first object detection information of the image to be detected; performing scene detection on the image to be detected to obtain target scene detection information of the image to be detected, wherein the target scene detection information includes a target scene category corresponding to the image to be detected; acquiring a scene recognition model corresponding to the target scene category; and determining second object detection information of the image to be detected based on the scene recognition model and the first object detection information. The disclosed technology solves the problem of poor object detection performance in existing object detection technology and improves the object detection effect.

Description

Object detection method, model training method and device and electronic equipment
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision and deep learning, and specifically to an object detection method, a model training method and apparatus, and an electronic device.
Background
With the rapid development of artificial intelligence, more and more application scenes can solve practical problems based on deep-learning object detection technology, for example retail-industry inspection, crop inspection by unmanned aerial vehicle, and assembly-line detection of industrial standard parts.
In these application scenes, the image to be detected usually covers multiple detection scenes, so one object detection application often needs to process data from several detection scenes simultaneously.
At present, an object detection application generally integrates only a single deep learning model and uses that one model to detect objects across all of these detection scenes.
Disclosure of Invention
The disclosure provides an object detection method, a model training method and apparatus, and an electronic device.
According to a first aspect of the present disclosure, there is provided an object detection method comprising:
acquiring an image to be detected;
carrying out object detection on the image to be detected to obtain first object detection information of the image to be detected;
performing scene detection on the image to be detected to acquire target scene detection information of the image to be detected, wherein the target scene detection information comprises a target scene category corresponding to the image to be detected;
acquiring a scene recognition model corresponding to the target scene category;
and determining second object detection information of the image to be detected based on the scene recognition model and the first object detection information.
According to a second aspect of the present disclosure, there is provided a model training method, comprising:
acquiring target data, wherein the target data comprises scene image sample data in a target scene category in an industry scene library, and/or input object image sample data in the target scene category, and the target scene category is a scene category in target scene detection information acquired by performing scene detection on an image to be detected;
training a scene recognition model corresponding to the target scene category based on the target data;
the scene recognition model is used for determining second object detection information of the image to be detected by combining first object detection information, and the first object detection information is object detection information obtained by performing object detection on the image to be detected.
According to a third aspect of the present disclosure, there is provided an object detection apparatus comprising:
the first acquisition module is used for acquiring an image to be detected;
the object detection module is used for carrying out object detection on the image to be detected to obtain first object detection information of the image to be detected;
the scene detection module is used for carrying out scene detection on the image to be detected so as to obtain target scene detection information of the image to be detected, wherein the target scene detection information comprises a target scene category corresponding to the image to be detected;
the second acquisition module is used for acquiring a scene identification model corresponding to the target scene category;
and the determining module is used for determining second object detection information of the image to be detected based on the scene recognition model and the first object detection information.
According to a fourth aspect of the present disclosure, there is provided a model training apparatus comprising:
a fifth obtaining module, configured to obtain target data, where the target data includes scene image sample data in a target scene category, and the target scene category is a scene category in target scene detection information obtained by performing scene detection on an image to be detected;
the first training module is used for training a scene recognition model corresponding to the target scene category based on the target data;
the scene recognition model is used for determining second object detection information of the image to be detected by combining first object detection information, and the first object detection information is object detection information obtained by performing object detection on the image to be detected.
According to a fifth aspect of the present disclosure, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform any one of the methods of the first aspect or to perform any one of the methods of the second aspect.
According to a sixth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform any one of the methods of the first aspect or to perform any one of the methods of the second aspect.
According to a seventh aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements any of the methods of the first aspect, or implements any of the methods of the second aspect.
According to the disclosed technology, the problem of poor object detection performance in object detection technology is solved, and the object detection effect is improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic flow chart of an object detection method according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram of the identification of an object frame in an image to be detected;
FIG. 3 is a schematic diagram of a model structure in an object detection platform;
FIG. 4 is a schematic diagram of the identification of a scene frame in an image to be detected;
FIG. 5 is an identification schematic of a shelf scene;
FIG. 6 is a schematic overall flow chart of object detection in the object detection platform;
FIG. 7 is a schematic flow chart of a model training method according to a second embodiment of the present disclosure;
FIG. 8 is a schematic structural diagram of an object detection apparatus according to a third embodiment of the present disclosure;
FIG. 9 is a schematic structural diagram of a model training apparatus according to a fourth embodiment of the present disclosure;
FIG. 10 is a schematic block diagram of an example electronic device that may be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
First embodiment
As shown in FIG. 1, the present disclosure provides an object detection method, including the steps of:
step S101: and acquiring an image to be detected.
In this embodiment, the object detection method relates to artificial intelligence technology, in particular to the fields of computer vision and deep learning, and can be widely applied in application scenes such as retail-industry inspection, crop inspection by unmanned aerial vehicle, and assembly-line detection of industrial standard parts. The method may be performed by an object detection apparatus according to an embodiment of the present disclosure. The object detection apparatus may be configured in any electronic device to perform the object detection method of the embodiments of the present disclosure; the electronic device may be a server or a terminal, and is not limited here.
The image to be detected may be an image that includes both object image content and scene image content: the object image content may involve at least one object, and the scene image content may involve at least one scene.
For example, for a retail-industry inspection application scene, the scenes involved in the image content of the image to be detected may include a shelf scene, a ground pile scene, and the like.
For another example, for an application scene of crop inspection by unmanned aerial vehicle, the scenes involved in the image content of the image to be detected may include a terraced-field scene, a paddy-field scene, a corn crop scene, a rice crop scene, and the like.
The purpose of this embodiment is to perform object detection on the image to be detected to determine information about the objects in the image in the actual scene, so that corresponding applications can be built on the detected information.
For example, for a retail-industry inspection application scene, the number of shelf layers on which a commodity is placed, the display-area proportion of the commodity within its layer, and the fullness of the commodity's arrangement on the shelf can be detected; the commodity can then be automatically sold according to the number of layers it occupies on the shelf, its display-area proportion can be adjusted according to sales, and the commodity can be rearranged according to the fullness. For another example, the actual stacking quantity and placing depth of the commodities in a ground pile scene can be detected, which reduces the labor cost of counting commodities.
The image to be detected may be acquired in several ways: a camera may capture an image in real time to serve as the image to be detected, a pre-stored image may be retrieved, an image may be downloaded from a network, or an image sent by another electronic device may be received and used as the image to be detected.
Step S102: and carrying out object detection on the image to be detected to obtain first object detection information of the image to be detected.
In this step, the first object detection information may be a list of the objects in the image to be detected, including but not limited to the object category, confidence, and object position information of each object, where the object position information is the pixel position of the object in the image to be detected and may be marked with a frame-shaped identifier referred to as an object frame. As shown in FIG. 2, the solid square identifier 201 represents an object, and the bold frame 202 around it represents the object's object frame.
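As a rough illustration, such an object list could be represented as follows. This is a minimal sketch: the field names and the box convention are assumptions for illustration, not anything specified by the disclosure.

    # Minimal sketch of "first object detection information": one entry per
    # detected object. Field names and the (x_min, y_min, x_max, y_max)
    # pixel-box convention are illustrative assumptions.
    from dataclasses import dataclass
    from typing import List, Tuple

    Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels

    @dataclass
    class ObjectDetection:
        category: str      # object category of the detected object
        confidence: float  # detection confidence in [0, 1]
        box: Box           # object frame: pixel position in the image to be detected

    FirstObjectDetectionInfo = List[ObjectDetection]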
Object detection may be performed on the image to be detected based on an object detection model. The object detection model may be a custom-trained object detection model, where customized training means training on actual data: the input of the customized training process is the actual data supplied by a user, and its output is the object detection model. The object detection model may be trained with an existing or a new training method, which is not specifically limited here.
In practical applications, referring to FIG. 3, a schematic diagram of the model structure in an object detection platform, the object detection model may be one model within the platform. The object detection platform may be connected to object detection applications from different industries; these applications may reuse one object detection model, or each application may be trained to obtain its own object detection model, which is not specifically limited here.
In addition, the customized training stage of the object detection model can run on a deep learning customized training platform and can be supported by all the processes and optimization functions of that platform, such as data augmentation, data feature extraction, pre-trained model selection, parameter tuning, model evaluation, attribution analysis, and the like.
Step S103: and carrying out scene detection on the image to be detected so as to obtain target scene detection information of the image to be detected, wherein the target scene detection information comprises a target scene category corresponding to the image to be detected.
In this step, the target scene detection information may be used to represent the environment of all or some of the detection scenes corresponding to the image to be detected, and may be a list of the scenes in the image, including but not limited to the scene category, confidence, and scene position information of each detection scene, where the scene position information is the pixel position of the detection scene in the image and may likewise be marked with a frame-shaped identifier, referred to as a scene frame. As shown in FIG. 4, the square grid marker 401 represents a shelf comprising a plurality of layers on which objects 402 are placed, and the bold frame 403 outside the shelf represents the scene frame of the shelf scene.
The target scene detection information also varies with the industry to which the image content belongs. For example, for a retail-industry inspection application scene, the scenes involved in the image content may include a shelf scene, a ground pile scene, and the like; for an application scene of crop inspection by unmanned aerial vehicle, they may include a terraced-field scene, a paddy-field scene, a corn crop scene, a rice crop scene, and the like. The scene categories of the two industries are completely different, and the target scene detection information differs correspondingly.
In one approach, scene detection is performed on the image to be detected to obtain the target scene detection information directly, in which case the target scene categories included in the target scene detection information may be the scene categories of all detection scenes involved in the image.
Alternatively, scene detection may first be performed on the image to be detected to obtain scene detection information representing the environment of all detection scenes corresponding to the image, and that scene detection information may then be filtered based on its scene categories to finally obtain the target scene detection information.
Filtering the scene detection information based on its scene categories to obtain the target scene detection information may proceed as follows: based on the scene category of each piece of scene detection information, and according to filtering information (which indicates which scene categories need to be filtered out), the scene detection information of the filtered categories is removed, finally yielding the target scene detection information. This avoids unnecessary scene recognition operations, reduces the difficulty of scene recognition, shortens scene recognition time, and thus improves the object detection effect.
For example, for a retail-industry inspection application scene, scene detection on the image to be detected yields scene detection information containing both a shelf scene and a ground pile scene; based on the scene categories and according to the filtering information (which indicates that the ground pile scene needs to be filtered out), the scene detection information of the ground pile scene is removed, finally yielding the target scene detection information of the shelf scene.
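A minimal sketch of this filtering step follows, assuming the filtering information is simply a set of scene categories to drop; the disclosure does not fix a concrete representation, so all names here are illustrative.

    # Sketch of category-based scene filtering. SceneDetection mirrors the scene
    # list described above; modeling the filtering information as a set of scene
    # categories is an assumption for illustration.
    from dataclasses import dataclass
    from typing import List, Set, Tuple

    @dataclass
    class SceneDetection:
        category: str                           # scene category, e.g. "shelf"
        confidence: float
        box: Tuple[float, float, float, float]  # scene frame in image pixels

    def filter_scene_info(scene_info: List[SceneDetection],
                          filtering_info: Set[str]) -> List[SceneDetection]:
        """Remove scene detections whose category the filtering information drops."""
        return [s for s in scene_info if s.category not in filtering_info]

    # Retail example from the text: the ground pile scene is filtered out and the
    # shelf scene remains as the target scene detection information.
    scene_info = [SceneDetection("shelf", 0.93, (10, 20, 400, 600)),
                  SceneDetection("ground_pile", 0.88, (420, 350, 780, 600))]
    target_scene_info = filter_scene_info(scene_info, {"ground_pile"})
    assert [s.category for s in target_scene_info] == ["shelf"]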
Scene detection may be performed on the image to be detected based on a scene detection model, which may be a custom-trained scene detection model or an industry-general scene detection model; the industry-general scene detection model may be called a prefabricated scene detection model. This is not specifically limited here.
Here, "industry-general" means the following: although the scenes involved in each object detection application differ, the scenes are usually similar once the view is widened to the industry dimension. Therefore, for one industry, even different object detection applications can use a single general scene detection model to perform scene detection on the image to be detected; that is, the different object detection applications of one industry can reuse one scene detection model, which solves the problem of multi-scene adaptation for most applications in the industry dimension.
For each industry category, an industry-general scene detection model can be obtained by training on the scene image sample data under that industry category in an industry scene library: the training input is large-scale scene image sample data under the industry category in the library, and the training output is the scene detection model. The data in an industry scene library can be gathered by manual collection or web crawling, and within an object detection platform the richness of the scene data grows as more object detection applications are connected.
In practical applications, as shown in FIG. 3, the object detection platform may include scene detection models corresponding to multiple industry categories, for example a scene detection model for the retail-industry inspection application scene, one for the crop UAV inspection application scene, and one for the industrial standard part assembly-line detection application scene. Before scene detection is performed on the image to be detected based on a scene detection model, the target industry category corresponding to the image may be obtained, and the scene detection model corresponding to that category may be obtained from the platform's prefabricated scene detection models; scene detection is then performed based on that model, which improves the accuracy of scene detection for the image.
Step S104: and acquiring a scene recognition model corresponding to the target scene category.
The scene recognition model may be a model capable of handling scene recognition for multiple detection scenes, which may include the detection scenes in the target scene category; or it may be a model that performs scene recognition only for the detection scene in the target scene category, that is, for one specific detection scene. This is not specifically limited here.
Taking as an example a scene recognition model that performs scene recognition only for a specific detection scene: its input is the image to be detected (or an image cropped from it according to the target scene category) together with the target scene frame detected by the scene detection model, where the target scene frame is the scene frame corresponding to the target scene category; its output is scene recognition information.
The scene identification information is used to represent the scene environment of the objects within the scene frame corresponding to the target scene category, and it differs across target scene categories. For example, for a shelf scene, the scene identification information may include the number of shelf layers, confidence, layer position information, and the like; for a ground pile scene, it may include the stack height, confidence, and stack position information.
There may be one or more target scene categories. When there are several, the scene recognition model corresponding to each target scene category may be obtained separately, and for each target scene category, scene recognition is performed on the image to be detected based on that category's scene recognition model.
For example, for a shelf scene, scene recognition is performed on the image content of the shelf scene's scene frame in the image to be detected, based on the scene recognition model corresponding to the shelf scene; likewise, for a ground pile scene, scene recognition is performed on the image content of the ground pile scene's scene frame, based on the scene recognition model corresponding to the ground pile scene.
The scene recognition model may be called a scene-specific model, that is, a model dedicated to a specific scene, and may be either a custom-trained scene-specific model or a prefabricated scene-specific model, which is not specifically limited here.
The prefabricated scene-specific model may be a model trained on the scene image sample data under the corresponding scene category in the industry scene library: the training input is that scene image sample data, and the training output is the scene-specific model.
A custom-trained scene-specific model can take in actual data input by a user under a certain scene category and fuse it with the scene image sample data under that scene category in the industry scene library for customized training. To address the image-feature deviation between the actual data and the library's scene image sample data, the library samples can be style-converted with a generative adversarial network during actual training, generating a data set suitable for training.
In practical applications, as shown in FIG. 3, the object detection platform may include scene recognition models corresponding to multiple scene categories, including custom-trained scene-specific models (such as the scene-A-specific model) and prefabricated scene-specific models (such as the scene-B-specific model); the scene recognition model corresponding to the target scene category can be obtained from among these.
Step S105: and determining second object detection information of the image to be detected based on the scene recognition model and the first object detection information.
In this step, the second object detection information may be used to represent information about the objects in a specific scene, such as placement position and placement condition; it differs across scene categories.
For example, for a shelf scene, the second object detection information includes but is not limited to the number of the shelf layer on which a commodity is placed, the display-area proportion of the commodity within its layer, and the fullness of the commodity's arrangement on the shelf. For a ground pile scene, the second object detection information includes but is not limited to the actual stacking quantity of the commodities, the placing depth of the commodities, and the like.
In an optional embodiment, the step specifically includes:
performing scene recognition on the image to be detected based on the scene recognition model to obtain scene recognition information under the target scene category, wherein the scene recognition information is used for representing the scene environment of an object in a scene frame corresponding to the target scene category;
and performing fusion processing on the first object detection information and the scene identification information to obtain the second object detection information.
Specific description thereof will be set forth in detail later with respect to this embodiment.
In another optional embodiment, the first object detection information and the image to be detected may be input to the scene recognition model to perform scene recognition and information matching operations, so as to obtain second object detection information of the image to be detected.
In the embodiment, an image to be detected is obtained; carrying out object detection on the image to be detected to obtain first object detection information of the image to be detected; performing scene detection on the image to be detected to acquire target scene detection information of the image to be detected, wherein the target scene detection information comprises a target scene category corresponding to the image to be detected; acquiring a scene recognition model corresponding to the target scene category; and determining second object detection information of the image to be detected based on the scene recognition model and the first object detection information. Therefore, the object detection can be performed on the image to be detected by combining the object detection model, the scene detection model and the scene recognition model, so that under the condition that the image to be detected comprises a plurality of detection scenes, the related information of the object in a specific scene can be accurately detected, and the object detection effect can be improved.
Optionally, step S105 specifically includes:
performing scene recognition on the image to be detected based on the scene recognition model to obtain scene recognition information under the target scene category, wherein the scene recognition information is used for representing the scene environment of an object in a scene frame corresponding to the target scene category;
and performing fusion processing on the first object detection information and the scene identification information to obtain the second object detection information.
In this embodiment, the image to be detected marked with the scene frame corresponding to the target scene category may be input to a scene recognition model for scene recognition, so as to obtain the scene recognition information under the target scene category.
The scene identification information is used to represent the scene environment of the objects within the scene frame corresponding to the target scene category, and it differs across target scene categories. For example, for a shelf scene, the scene identification information may include the number of shelf layers, confidence, layer position information, and the like. As shown in FIG. 5, the square grid markers 501 represent shelf layers (two layers, on which objects 502 are placed), and the bold frame 503 outside each shelf layer represents the marking frame of that layer. For a ground pile scene, the scene identification information may include the stack height, confidence, and stack position information.
Then, the first object detection information and the scene identification information may be fused to obtain the second object detection information. The fusion may be done in several ways. For example, it may be based on the pixel-position relationship between an object's position in the image to be detected and the scene divisions within the scene frame: for a shelf scene, the shelf layer on which an object sits can be determined from the relationship between the object's pixel position and the pixel positions of the shelf layers in the image. Alternatively, a dedicated fusion function may be set to fuse the first object detection information with the scene identification information.
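A minimal sketch of such pixel-position fusion for a shelf scene follows, assuming axis-aligned pixel boxes; the layer-assignment rule and the fullness formula are illustrative assumptions, not the disclosure's fusion function.

    # Sketch of fusing object detections with shelf-layer identification
    # information by pixel position: an object is assigned to the layer whose
    # box contains the object's vertical center, and a crude per-layer fullness
    # is derived. Both rules are illustrative assumptions.
    from typing import Dict, List, Tuple

    Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels

    def assign_objects_to_layers(object_boxes: List[Box],
                                 layer_boxes: List[Box]) -> Dict[int, List[Box]]:
        layers: Dict[int, List[Box]] = {i: [] for i in range(len(layer_boxes))}
        for obj in object_boxes:
            center_y = (obj[1] + obj[3]) / 2.0
            for i, layer in enumerate(layer_boxes):
                if layer[1] <= center_y <= layer[3]:
                    layers[i].append(obj)  # object sits on shelf layer i
                    break
        return layers

    def layer_fullness(layer_box: Box, objects: List[Box]) -> float:
        """Fraction of the layer's width covered by object boxes (ignores overlap)."""
        width = layer_box[2] - layer_box[0]
        covered = sum(o[2] - o[0] for o in objects)
        return min(covered / width, 1.0) if width > 0 else 0.0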
In this embodiment, scene recognition is performed on the image to be detected based on the scene recognition model to obtain scene recognition information under the target scene category, where the scene recognition information is used to represent a scene environment of an object in a scene frame corresponding to the target scene category; and performing fusion processing on the first object detection information and the scene identification information to obtain the second object detection information. Therefore, the final second object detection information is obtained in an information fusion mode, the situation that a scene recognition model needs to be retrained when a new scene or a scene changes can be avoided, and the training difficulty of the scene recognition model in the object detection platform can be reduced.
Optionally, the performing scene detection on the image to be detected to obtain target scene detection information of the image to be detected includes:
performing scene detection on the image to be detected based on a scene detection model to obtain target scene detection information of the image to be detected; the scene detection model is obtained by training scene image sample data under the industry category in an industry scene library, the industry scene library comprises the scene image sample data under M industry categories, and M is a positive integer.
In this embodiment, the scene detection model may be an industry-common scene detection model, that is, a model obtained by training based on scene image sample data under an industry category in an industry scene library, where the industry scene library includes scene image sample data under M industry categories, and the industry-common scene detection model may be referred to as a prefabricated scene detection model.
Here, "industry-general" means the following: although the scenes involved in each object detection application differ, the scenes are usually similar once the view is widened to the industry dimension. Therefore, for one industry, even different object detection applications can use a single general scene detection model to perform scene detection on the image to be detected; that is, the different object detection applications of one industry can reuse one scene detection model, which solves the problem of multi-scene adaptation for most applications in the industry dimension.
Optionally, before the scene detection is performed on the image to be detected based on the scene detection model to obtain the target scene detection information of the image to be detected, the method further includes:
acquiring a target industry category corresponding to the image to be detected;
and acquiring the scene detection model corresponding to the target industry category from a preset scene detection model.
In this embodiment, the preset scene detection models are one or more prefabricated scene detection models, and each prefabricated scene detection model may correspond to one industry category.
For each industry category, an industry-general scene detection model can be obtained by training on the scene image sample data under that industry category in an industry scene library: the training input is large-scale scene image sample data under the industry category in the library, and the training output is the scene detection model. The data in an industry scene library can be gathered by manual collection or web crawling, and within an object detection platform the richness of the scene data grows as more object detection applications are connected.
When the object detection platform serves multiple industries, a scene detection model matched to the industry category of the image to be detected must be obtained to ensure the accuracy of scene detection. Specifically, the target industry category corresponding to the image may be obtained in several ways: the industry category of the image may be identified directly, the target industry category may have been obtained and stored in advance, or it may be received from another electronic device. The scene detection model corresponding to the target industry category is then obtained from among the prefabricated scene detection models, and scene detection is performed on the image based on that model.
Therefore, the object detection platform can be applied to object detection in multiple industries, and a scene detection model can be highly reused along the industry dimension, thereby solving the problem of multi-scene adaptation for most applications in the industry dimension.
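A minimal sketch of this model selection follows, with a hypothetical registry keyed by industry category; the loader stub and the registry contents are assumptions for illustration.

    # Sketch of selecting a prefabricated scene detection model by target
    # industry category. The registry keys and the stub loader are hypothetical.
    from typing import Callable, Dict, List

    SceneDetector = Callable[[object], List[dict]]  # image -> scene detection info

    def load_prefab_detector(name: str) -> SceneDetector:
        # Stand-in for loading a trained model by name; returns a stub detector.
        def detect(image) -> List[dict]:
            return []  # a real model would return the scene detections for `image`
        return detect

    PREFAB_SCENE_DETECTORS: Dict[str, SceneDetector] = {
        "retail_inspection": load_prefab_detector("scene_det_retail"),
        "crop_uav_inspection": load_prefab_detector("scene_det_crop_uav"),
    }

    def get_scene_detector(target_industry_category: str) -> SceneDetector:
        if target_industry_category not in PREFAB_SCENE_DETECTORS:
            raise KeyError(f"no prefabricated scene detection model for "
                           f"{target_industry_category!r}")
        return PREFAB_SCENE_DETECTORS[target_industry_category]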
Optionally, the industry scene library includes scene image sample data in N scene categories corresponding to each industry category, where N is a positive integer, the scene recognition model is a model obtained by training based on target data, and the target data includes the scene image sample data in the target scene category of the target industry category in the industry scene library, and/or the input object image sample data in the target scene category.
In this embodiment, the scene recognition model may be called a scene-specific model, that is, a model dedicated to a specific scene; when it is trained on the scene image sample data in the target scene category of the target industry category in the industry scene library, it is a prefabricated scene-specific model.
When the scene recognition model is trained on the object image sample data in the target scene category input by the user (which is actual data), it is a custom-trained scene-specific model; a custom-trained scene-specific model can also be obtained by jointly training on the user-input object image sample data in the target scene category and the scene image sample data in the target scene category of the target industry category in the industry scene library.
In this way, the object detection platform can include both prefabricated and custom-trained scene-specific models, which enriches the platform's scene recognition models and improves the flexibility of scene recognition.
To explain the solution of the present disclosure in more detail, refer to FIG. 6, a schematic overall flow chart of object detection in the object detection platform. As shown in FIG. 6, the object detection flow is as follows (a minimal code sketch follows the list):
acquiring an image to be detected;
carrying out object detection on an image to be detected to obtain first object detection information; simultaneously, carrying out scene detection on an image to be detected to obtain scene detection information;
acquiring filtering information under the condition that the object detection platform starts a scene selection switch;
scene filtering is carried out on the scene detection information based on the filtering information, and target scene detection information is obtained;
acquiring the scene-specific models corresponding to the target scene categories included in the target scene detection information, where the scene-specific models include a scene-A-specific model and a scene-B-specific model;
performing scene recognition on the image to be detected based on the scene-A-specific model and the scene-B-specific model respectively, obtaining scene A information and scene B information;
and fusing the first object detection information, the scene A information and the scene B information to obtain second object detection information.
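The flow above can be condensed into the following sketch, with every stage injected as a callable so that only the data flow is pinned down; all names here are hypothetical, and the fusion step is passed in rather than fixed.

    # End-to-end sketch of the FIG. 6 flow. Each stage is an injected callable,
    # so this pins down only the data flow, not any concrete model.
    def detect(image,
               object_detector,      # image -> first object detection information
               scene_detector,       # image -> scene detection information
               scene_models,         # scene category -> scene-specific model
               fuse,                 # (first info, recognition info) -> second info
               filtering_info=None): # scene categories to drop, or None when the
                                     # scene selection switch is off
        first_info = object_detector(image)
        scene_info = scene_detector(image)

        # Scene filtering, only when the scene selection switch is on.
        if filtering_info is not None:
            scene_info = [s for s in scene_info
                          if s["category"] not in filtering_info]

        # Scene recognition with the scene-specific model of each target category.
        recognition_info = {}
        for scene in scene_info:
            model = scene_models.get(scene["category"])
            if model is not None:
                recognition_info[scene["category"]] = model(image, scene["box"])

        # Fuse the first object detection information with all scene recognition
        # information to obtain the second object detection information.
        return fuse(first_info, recognition_info)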
Second embodiment
As shown in FIG. 7, the present disclosure provides a model training method, comprising the steps of:
step S701: acquiring target data, wherein the target data comprises scene image sample data under a target scene category, and the target scene category is a scene category in target scene detection information acquired by carrying out scene detection on an image to be detected;
step S702: training a scene recognition model corresponding to the target scene category based on the target data;
the scene recognition model is used for determining second object detection information of the image to be detected by combining first object detection information, and the first object detection information is object detection information obtained by performing object detection on the image to be detected.
In this embodiment, the scene recognition model may be trained with scene image sample data under a target scene category. This sample data may be the scene image sample data under the target scene category in the industry scene library, or input scene image sample data under the target scene category; the latter may also be called object image sample data under the target scene category (for a shelf scene, for example, an image containing shelf objects may be input as a scene image sample). The sample data may also combine both at once: the input scene image sample data under the target scene category together with the scene image sample data under that category in the industry scene library.
In an optional implementation, the scene image sample data under the target scene category in the industry scene library may be used for training. The industry scene library contains scene image sample data under multiple scene categories, including the target scene category, so the sample data under the target scene category can be obtained from the library, and the scene recognition model corresponding to the target scene category can be trained on it to obtain a prefabricated scene recognition model.
In another optional implementation, the actual data under the target scene category may be used for training: scene image sample data under the target scene category can be collected and input into the object detection platform, and the scene recognition model corresponding to the target scene category is trained on this input sample data to obtain a custom-trained scene recognition model.
In yet another optional implementation, the input scene image sample data under the target scene category can be combined with the scene image sample data under the target scene category in the industry scene library to train the scene recognition model corresponding to the target scene category, obtaining a custom-trained scene recognition model.
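A minimal sketch covering all three options follows; the library structure, the user samples, and the optional style-conversion hook are hypothetical stand-ins, not the disclosure's training pipeline.

    # Sketch of assembling the target data for training the scene recognition
    # model: library samples, user samples, or both, per the three options above.
    # The style_convert hook stands in for the GAN-based style conversion
    # mentioned in the first embodiment; everything here is an assumption.
    from typing import Callable, Dict, List, Optional

    def build_target_data(industry_scene_library: Dict[str, List],
                          target_scene_category: str,
                          user_samples: Optional[List] = None,
                          style_convert: Optional[Callable[[List], List]] = None) -> List:
        library_samples = industry_scene_library.get(target_scene_category, [])
        if user_samples and style_convert is not None:
            # Reduce the image-feature deviation between library samples and the
            # user's actual data before mixing the two sources.
            library_samples = style_convert(library_samples)
        return list(library_samples) + list(user_samples or [])

The scene recognition model for the target category is then trained on the assembled target data with any standard supervised training loop.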
The scene recognition model can be used for determining second object detection information of an image to be detected by combining first object detection information, wherein the first object detection information is object detection information obtained by performing object detection on the image to be detected.
The related concepts such as the image to be detected, the first object detection information, the second object detection information, etc. have been explained in detail in the first embodiment, and are not described herein again. The object detection model can be used for carrying out object detection on the image to be detected to obtain first object detection information, and the scene detection model can be used for carrying out scene detection on the image to be detected to obtain target scene detection information.
In the embodiment, the object detection can be performed on the image to be detected by combining the object detection model and the scene detection model through training the scene recognition model corresponding to the target scene category, so that the related information of the object in a specific scene can be accurately detected under the condition that the image to be detected comprises a plurality of detection scenes, and the object detection effect can be improved.
Optionally, the target scene category is a scene category in target scene detection information obtained by performing scene detection on an image to be detected based on a scene detection model, the target scene category is a scene category under a target industry category, and the target industry category is an industry category corresponding to the image to be detected;
the method further comprises the following steps:
acquiring scene image sample data under the target industry category from an industry scene library;
training the scene detection model based on the scene image sample data under the target industry category, wherein the industry scene library comprises the scene image sample data under M industry categories, the M industry categories comprise the target industry category, and M is a positive integer.
In this embodiment, the target scene category is a scene category in the target industry category, the industry scene library includes scene image sample data in M industry categories, and the M industry categories include the target industry category, so that the scene image sample data in the target industry category can be obtained from the industry scene library, and the scene detection model is trained based on the scene image sample data in the target industry category to obtain a prefabricated scene detection model.
The prefabricated scene detection model can be a scene detection model which is universal for industries corresponding to target industry types. Thus, for an industry, even different object detection applications can use a universal scene detection model to perform scene detection on an image to be detected, namely, different object detection applications of the industry can reuse one scene detection model, so that the problem of multi-scene adaptation of most applications in industry dimension can be solved.
Third embodiment
As shown in FIG. 8, the present disclosure provides an object detection apparatus 800 comprising:
a first obtaining module 801, configured to obtain an image to be detected;
an object detection module 802, configured to perform object detection on the image to be detected to obtain first object detection information of the image to be detected;
a scene detection module 803, configured to perform scene detection on the image to be detected to obtain target scene detection information of the image to be detected, where the target scene detection information includes a target scene category corresponding to the image to be detected;
a second obtaining module 804, configured to obtain a scene identification model corresponding to the target scene category;
a determining module 805, configured to determine second object detection information of the image to be detected based on the scene recognition model and the first object detection information.
Optionally, the determining module 805 includes:
the scene recognition unit is used for carrying out scene recognition on the image to be detected based on the scene recognition model to obtain scene recognition information under the target scene category, and the scene recognition information is used for representing the scene environment of an object in a scene frame corresponding to the target scene category;
and the fusion processing unit is used for carrying out fusion processing on the first object detection information and the scene identification information to obtain the second object detection information.
Optionally, the scene detection module 803 is specifically configured to perform scene detection on the image to be detected based on a scene detection model, so as to obtain target scene detection information of the image to be detected; the scene detection model is obtained by training scene image sample data under the industry category in an industry scene library, the industry scene library comprises the scene image sample data under M industry categories, and M is a positive integer.
Optionally, the apparatus further comprises:
the third acquisition module is used for acquiring the target industry category corresponding to the image to be detected;
and the fourth acquisition module is used for acquiring the scene detection model corresponding to the target industry category from a preset scene detection model.
Optionally, the industry scene library includes scene image sample data in N scene categories corresponding to each industry category, where N is a positive integer, the scene identification model is a model obtained by training based on target data, and the target data includes the scene image sample data in the target scene category of the target industry category in the industry scene library, and/or the input object image sample data in the target scene category.
The object detection apparatus 800 provided by the present disclosure can implement each process implemented by the above-mentioned object detection method embodiments, and can achieve the same beneficial effects, and for avoiding repetition, the details are not repeated here.
Fourth embodiment
As shown in FIG. 9, the present disclosure provides a model training apparatus 900 comprising:
a fifth obtaining module 901, configured to obtain target data, where the target data includes scene image sample data in a target scene category, and the target scene category is a scene category in target scene detection information obtained by performing scene detection on an image to be detected;
a first training module 902, configured to train, based on the target data, a scene recognition model corresponding to the target scene category;
the scene recognition model is used for determining second object detection information of the image to be detected by combining first object detection information, and the first object detection information is object detection information obtained by performing object detection on the image to be detected.
Optionally, the target scene category is a scene category in target scene detection information obtained by performing scene detection on an image to be detected based on a scene detection model, the target scene category is a scene category under a target industry category, and the target industry category is an industry category corresponding to the image to be detected;
the device further comprises:
a sixth obtaining module, configured to obtain scene image sample data in the target industry category from an industry scene library;
and the second training module is used for training the scene detection model based on the scene image sample data under the target industry category, wherein the industry scene library comprises the scene image sample data under M industry categories, the M industry categories comprise the target industry category, and M is a positive integer.
The model training device 900 provided by the present disclosure can implement each process implemented by the above-mentioned model training method embodiments, and can achieve the same beneficial effects, and for avoiding repetition, it is not repeated here.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 10 illustrates a schematic block diagram of an example electronic device 1000 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in FIG. 10, the device 1000 includes a computing unit 1001, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 1002 or a computer program loaded from a storage unit 1008 into a random access memory (RAM) 1003. The RAM 1003 can also store various programs and data necessary for the operation of the device 1000. The computing unit 1001, the ROM 1002, and the RAM 1003 are connected to one another by a bus 1004. An input/output (I/O) interface 1005 is also connected to the bus 1004.
A number of components in device 1000 are connected to I/O interface 1005, including: an input unit 1006 such as a keyboard, a mouse, and the like; an output unit 1007 such as various types of displays, speakers, and the like; a storage unit 1008 such as a magnetic disk, an optical disk, or the like; and a communication unit 1009 such as a network card, modem, wireless communication transceiver, or the like. The communication unit 1009 allows the device 1000 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 1001 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 1001 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 1001 executes the methods and processes described above, such as the object detection method or the model training method. For example, in some embodiments, the object detection method or the model training method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1008. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1000 via the ROM 1002 and/or the communication unit 1009. When the computer program is loaded into the RAM 1003 and executed by the computing unit 1001, one or more steps of the object detection method or the model training method described above may be performed. Alternatively, in other embodiments, the computing unit 1001 may be configured in any other suitable way (e.g., by means of firmware) to perform the object detection method or the model training method.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose and can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, a host product in the cloud computing service system that remedies the high management difficulty and weak service scalability of traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order; no limitation is imposed herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (17)

1. An object detection method comprising:
acquiring an image to be detected;
carrying out object detection on the image to be detected to obtain first object detection information of the image to be detected;
performing scene detection on the image to be detected to acquire target scene detection information of the image to be detected, wherein the target scene detection information comprises a target scene category corresponding to the image to be detected;
acquiring a scene recognition model corresponding to the target scene category;
and determining second object detection information of the image to be detected based on the scene recognition model and the first object detection information.
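By way of illustration only, the flow of claim 1 can be sketched in a few lines of Python. The helpers below (detect_objects, detect_scene, load_scene_recognition_model) and all class names are hypothetical stand-ins for trained models, not the disclosed implementation:

    from typing import Any, Dict, List

    def detect_objects(image) -> List[Dict[str, Any]]:
        # Stand-in for a generic object detector: first object detection info.
        return [{"class": "shelf", "score": 0.70, "box": (0, 0, 100, 50)}]

    def detect_scene(image) -> Dict[str, Any]:
        # Stand-in for the scene detection model: target scene detection
        # info, including the target scene category.
        return {"category": "supermarket_aisle"}

    def load_scene_recognition_model(category: str):
        # Stand-in lookup of the scene recognition model for the category.
        return lambda img: {"plausible_classes": {"shelf", "price_tag"}}

    def detect(image) -> List[Dict[str, Any]]:
        first_info = detect_objects(image)                    # object detection
        category = detect_scene(image)["category"]            # scene detection
        scene_model = load_scene_recognition_model(category)  # model lookup
        scene_info = scene_model(image)
        # Determine second object detection info: here, keep only the
        # detections whose class is plausible in the recognized scene.
        return [d for d in first_info
                if d["class"] in scene_info["plausible_classes"]]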
2. The method of claim 1, wherein the determining second object detection information for the image to be detected based on the scene recognition model and the first object detection information comprises:
performing scene recognition on the image to be detected based on the scene recognition model to obtain scene recognition information under the target scene category, wherein the scene recognition information is used for representing the scene environment of an object in a scene frame corresponding to the target scene category;
and performing fusion processing on the first object detection information and the scene identification information to obtain the second object detection information.
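One possible reading of the fusion step, as a sketch only (the re-weighting rule, thresholds, and weights are assumptions, not the claimed procedure): scene recognition information can rescale the confidence of each first-pass detection according to its consistency with the scene environment.

    def fuse(first_info, scene_info, boost=1.2, penalty=0.5):
        # Hypothetical fusion: detections consistent with the recognized
        # scene are boosted, inconsistent ones are suppressed.
        fused = []
        for det in first_info:
            consistent = det["class"] in scene_info["plausible_classes"]
            factor = boost if consistent else penalty
            fused.append({**det, "score": min(1.0, det["score"] * factor)})
        return fused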
3. The method according to claim 1, wherein the performing scene detection on the image to be detected to acquire target scene detection information of the image to be detected comprises:
performing scene detection on the image to be detected based on a scene detection model to obtain target scene detection information of the image to be detected; the scene detection model is obtained by training scene image sample data under the industry category in an industry scene library, the industry scene library comprises the scene image sample data under M industry categories, and M is a positive integer.
4. The method according to claim 3, wherein before performing scene detection on the image to be detected based on the scene detection model to obtain the target scene detection information of the image to be detected, the method further comprises:
acquiring a target industry category corresponding to the image to be detected;
and acquiring the scene detection model corresponding to the target industry category from among preset scene detection models.
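A sketch of the per-industry model lookup in claim 4; the industry category names and model handles are invented for illustration:

    # Hypothetical preset registry: one scene detection model per industry
    # category; entries here are placeholder callables, not real models.
    PRESET_SCENE_DETECTORS = {
        "retail": lambda img: {"category": "supermarket_aisle"},
        "power": lambda img: {"category": "substation_yard"},
        "logistics": lambda img: {"category": "parcel_depot"},
    }

    def scene_detector_for(industry_category: str):
        # Acquire the scene detection model corresponding to the target
        # industry category from among the preset models.
        return PRESET_SCENE_DETECTORS[industry_category]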
5. The method according to claim 3, wherein the industry scene library comprises scene image sample data under N scene classes corresponding to each industry class, wherein N is a positive integer, the scene recognition model is a model trained based on target data, and the target data comprises the scene image sample data under the target scene class of the target industry class in the industry scene library and/or the input object image sample data under the target scene class.
6. A model training method, comprising:
acquiring target data, wherein the target data comprises scene image sample data under a target scene category, and the target scene category is a scene category in target scene detection information acquired by carrying out scene detection on an image to be detected;
training a scene recognition model corresponding to the target scene category based on the target data;
the scene recognition model is used for determining second object detection information of the image to be detected by combining first object detection information, and the first object detection information is object detection information obtained by performing object detection on the image to be detected.
7. The method according to claim 6, wherein the target scene category is a scene category in target scene detection information obtained by performing scene detection on the image to be detected based on a scene detection model, the target scene category is a scene category under a target industry category, and the target industry category is an industry category corresponding to the image to be detected;
the method further comprises the following steps:
acquiring scene image sample data under the target industry category from an industry scene library;
training the scene detection model based on the scene image sample data under the target industry category, wherein the industry scene library comprises the scene image sample data under M industry categories, the M industry categories comprise the target industry category, and M is a positive integer.
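The two training stages in claims 6-7 can be pictured with a toy library layout; the nested-dict structure, category names, and file names below are assumptions for illustration, not the disclosed data format:

    # Toy industry scene library: M industry categories, each holding
    # scene image sample data under N scene categories.
    industry_scene_library = {
        "retail": {
            "supermarket_aisle": ["img_0001.jpg", "img_0002.jpg"],
            "checkout_counter": ["img_0003.jpg"],
        },
    }

    def training_sets(library, target_industry, target_scene, user_samples=()):
        # Claim 7: the scene detection model trains on all scene image
        # samples under the target industry category.
        detector_set = [img for scene in library[target_industry].values()
                        for img in scene]
        # Claim 6: the scene recognition model trains on the target data,
        # i.e. the samples under the target scene category and/or input
        # object image samples for that scene.
        recognizer_set = list(library[target_industry][target_scene])
        recognizer_set += list(user_samples)
        return detector_set, recognizer_set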
8. An object detecting device comprising:
the first acquisition module is used for acquiring an image to be detected;
the object detection module is used for carrying out object detection on the image to be detected to obtain first object detection information of the image to be detected;
the scene detection module is used for carrying out scene detection on the image to be detected so as to obtain target scene detection information of the image to be detected, wherein the target scene detection information comprises a target scene category corresponding to the image to be detected;
the second acquisition module is used for acquiring a scene identification model corresponding to the target scene category;
and the determining module is used for determining second object detection information of the image to be detected based on the scene recognition model and the first object detection information.
9. The apparatus of claim 8, wherein the determining module comprises:
the scene recognition unit is used for carrying out scene recognition on the image to be detected based on the scene recognition model to obtain scene recognition information under the target scene category, and the scene recognition information is used for representing the scene environment of an object in a scene frame corresponding to the target scene category;
and the fusion processing unit is used for carrying out fusion processing on the first object detection information and the scene identification information to obtain the second object detection information.
10. The apparatus according to claim 8, wherein the scene detection module is specifically configured to perform scene detection on the image to be detected based on a scene detection model to obtain target scene detection information of the image to be detected; the scene detection model is obtained by training scene image sample data under the industry category in an industry scene library, the industry scene library comprises the scene image sample data under M industry categories, and M is a positive integer.
11. The apparatus of claim 10, wherein the apparatus further comprises:
the third acquisition module is used for acquiring the target industry category corresponding to the image to be detected;
and the fourth acquisition module is used for acquiring the scene detection model corresponding to the target industry category from among preset scene detection models.
12. The apparatus according to claim 10, wherein the industry scene library includes scene image sample data in N scene classes corresponding to each industry class, where N is a positive integer, the scene identification model is a model trained based on target data, and the target data includes scene image sample data in the target scene class of the target industry class in the industry scene library, and/or input object image sample data in the target scene class.
13. A model training apparatus comprising:
a fifth obtaining module, configured to obtain target data, where the target data includes scene image sample data in a target scene category, and the target scene category is a scene category in target scene detection information obtained by performing scene detection on an image to be detected;
the first training module is used for training a scene recognition model corresponding to the target scene category based on the target data;
the scene recognition model is used for determining second object detection information of the image to be detected by combining first object detection information, and the first object detection information is object detection information obtained by performing object detection on the image to be detected.
14. The apparatus according to claim 13, wherein the target scene category is a scene category in target scene detection information obtained by performing scene detection on an image to be detected based on a scene detection model, the target scene category is a scene category under a target industry category, and the target industry category is an industry category corresponding to the image to be detected;
the apparatus further comprises:
a sixth obtaining module, configured to obtain scene image sample data in the target industry category from an industry scene library;
and the second training module is used for training the scene detection model based on the scene image sample data under the target industry category, wherein the industry scene library comprises the scene image sample data under M industry categories, the M industry categories comprise the target industry category, and M is a positive integer.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5 or to perform the method of any one of claims 6-7.
16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-5 or to perform the method of any one of claims 6-7.
17. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-5, or implements the method according to any one of claims 6-7.
CN202110672301.9A 2021-06-17 2021-06-17 Object detection method, model training method and device and electronic equipment Active CN113326796B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110672301.9A CN113326796B (en) 2021-06-17 2021-06-17 Object detection method, model training method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN113326796A true CN113326796A (en) 2021-08-31
CN113326796B CN113326796B (en) 2022-11-29

Family

ID=77423763

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110672301.9A Active CN113326796B (en) 2021-06-17 2021-06-17 Object detection method, model training method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113326796B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108875750A (en) * 2017-08-25 2018-11-23 北京旷视科技有限公司 object detecting method, device and system and storage medium
CN110569913A (en) * 2019-09-11 2019-12-13 北京云迹科技有限公司 Scene classifier training method and device, scene recognition method and robot
CN112800804A (en) * 2019-10-28 2021-05-14 汉朔科技股份有限公司 Price tag-based out-of-stock detection method and device
CN110969166A (en) * 2019-12-04 2020-04-07 国网智能科技股份有限公司 Small target identification method and system in inspection scene
CN111310645A (en) * 2020-02-12 2020-06-19 上海东普信息科技有限公司 Overflow bin early warning method, device, equipment and storage medium for cargo accumulation amount
CN111695622A (en) * 2020-06-09 2020-09-22 全球能源互联网研究院有限公司 Identification model training method, identification method and device for power transformation operation scene
CN112200631A (en) * 2020-10-12 2021-01-08 支付宝(杭州)信息技术有限公司 Industry classification model training method and device
CN112257649A (en) * 2020-11-03 2021-01-22 深圳创新奇智科技有限公司 Article identification method, model training method, device and electronic equipment

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114998357A (en) * 2022-08-08 2022-09-02 长春摩诺维智能光电科技有限公司 Industrial detection method, system, terminal and medium based on multi-information analysis
CN115460388A (en) * 2022-08-26 2022-12-09 富泰华工业(深圳)有限公司 Projection method of augmented reality equipment and related equipment
CN115460388B (en) * 2022-08-26 2024-04-19 富泰华工业(深圳)有限公司 Projection method of augmented reality equipment and related equipment

Also Published As

Publication number Publication date
CN113326796B (en) 2022-11-29

Similar Documents

Publication Publication Date Title
CN112801164A (en) Training method, device and equipment of target detection model and storage medium
CN111768381A (en) Part defect detection method and device and electronic equipment
CN113326796B (en) Object detection method, model training method and device and electronic equipment
CN112785625A (en) Target tracking method and device, electronic equipment and storage medium
CN113205041A (en) Structured information extraction method, device, equipment and storage medium
CN113344862A (en) Defect detection method, defect detection device, electronic equipment and storage medium
CN114419035B (en) Product identification method, model training device and electronic equipment
CN116245193A (en) Training method and device of target detection model, electronic equipment and medium
CN113724388B (en) High-precision map generation method, device, equipment and storage medium
CN113378694B (en) Method and device for generating target detection and positioning system and target detection and positioning
CN113436233A (en) Registration method and device of automatic driving vehicle, electronic equipment and vehicle
CN115719436A (en) Model training method, target detection method, device, equipment and storage medium
CN115035481A (en) Image object distance fusion method, device, equipment and storage medium
CN113869147A (en) Target detection method and device
CN116295466A (en) Map generation method, map generation device, electronic device, storage medium and vehicle
CN113706705A (en) Image processing method, device and equipment for high-precision map and storage medium
CN113591569A (en) Obstacle detection method, obstacle detection device, electronic apparatus, and storage medium
CN112749978A (en) Detection method, apparatus, device, storage medium, and program product
CN114694138B (en) Road surface detection method, device and equipment applied to intelligent driving
CN114494818B (en) Image processing method, model training method, related device and electronic equipment
CN114092739B (en) Image processing method, apparatus, device, storage medium, and program product
CN113362543B (en) Settlement method, settlement device, electronic equipment and storage medium
CN113327284B (en) Image recognition method and device, electronic equipment and storage medium
CN114359561A (en) Target detection method and training method and device of target detection model
CN114359513A (en) Method and device for determining position of obstacle and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant