CN114708185A - Target detection method, system and equipment based on big data enabling and model flow

Info

Publication number
CN114708185A
Authority
CN
China
Prior art keywords
model
target detection
current scene
image
training
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111258992.4A
Other languages
Chinese (zh)
Inventor
张兆翔 (Zhang Zhaoxiang)
彭君然 (Peng Junran)
卜兴源 (Bu Xingyuan)
常清 (Chang Qing)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation, Chinese Academy of Sciences
Original Assignee
Institute of Automation, Chinese Academy of Sciences
Priority date: 2021-10-28 (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed)
Filing date: 2021-10-28
Publication date: 2022-07-05
Application filed by the Institute of Automation, Chinese Academy of Sciences
Priority to CN202111258992.4A
Publication of CN114708185A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/0002: Inspection of images, e.g. flaw detection
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/30: Semantic analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of computer vision and pattern recognition, and particularly relates to a target detection method, system and device based on big data enabling and model flow, aiming to solve the problems that existing target detection models are limited by their training data, so that model performance is low and reusability across different application scenes is poor. The invention comprises the following steps: integrating all public target detection data sets, and building a model sampling space with an arbitrary model as the anchor; completing, in one training run, a dynamic supernet covering various operating requirements; in the current scene, performing model initialization and sub-model screening through the semantic information vectors of the categories; and pre-training the sub-models on the current-scene data to finally obtain the target detection model, which performs target detection on the image to be detected in the current scene. Once the flexible dynamic supernet is constructed, a target detection model with excellent performance in the current scene can be obtained by quick fine-tuning with a small amount of labeled data from the current scene.

Description

Target detection method, system and equipment based on big data enabling and model flow
Technical Field
The invention belongs to the field of computer vision and pattern recognition, and particularly relates to a target detection method, a system and equipment based on big data enabling and model flow.
Background
Target detection is an important and challenging computer vision task with wide applications in security monitoring, intelligent video analysis, automatic driving and other fields. In each application scenario, building a data set with annotations on the order of COCO is too costly to be practical. Generally, only a small amount of cheaply labeled data can be obtained, and this limited data makes model training very difficult and keeps the model from reaching its full performance. Meanwhile, usable hardware resources differ markedly between application scenarios, so the deployable models differ as well; in a new scenario, a new model must be selected from scratch according to the experience of human experts and retrained, so the reusability of models is very poor and hardware resources are badly wasted.
Disclosure of Invention
In order to solve the above problems in the prior art, namely that existing target detection models are limited by insufficient training data, so that model performance is low, and that reusability is poor because a change of hardware requirements across application scenarios forces a change of model and therefore retraining, the invention provides a target detection method based on big data enabling and model flow, which comprises the following steps:
step S10, acquiring an image to be target-detected in the current scene, and converting the category information of the image into a semantic information vector through a Word2vector model;
step S20, performing cosine similarity matching between the semantic information vector and the semantic information vectors of the image category information of all public target detection data sets;
step S30, initializing the target detection model of the current scene with the classification fully-connected weights of the category with the highest matching value in the supernet with variable width and depth;
step S40, pre-specifying the floating-point operation count and parameter quantity of the model, and traversing the model sampling space one by one to obtain K target detection sub-models of the current scene;
step S50, pre-training the K target detection sub-models respectively with training images of the current scene, and taking the pre-trained sub-model with the largest mAP value as the final target detection model of the current scene;
and step S60, performing target detection on the image to be detected in the current scene through the final target detection model of the current scene to obtain a target detection result.
In some preferred embodiments, the model sampling space is obtained by:
step A10, normalizing the images in all the obtained public target detection data sets, and converting the category information corresponding to the images into semantic information vectors through a Word2vector model;
step A20, calculating cosine similarity among semantic information vectors of each category, merging categories with the cosine similarity larger than a preset threshold value, and then performing label remapping;
and A30, selecting any deep learning target detection model, and setting the width and depth sampling spaces of its backbone network according to the characteristics of the model to obtain the model sampling space.
In some preferred embodiments, the supernet with variable width and depth is constructed and trained as follows:
step B10, dividing the label-remapped image set into batches of a set size, extracting the features of each batch of images through the feature extraction backbone network, and randomly sampling a model from the model sampling space;
step B20, inputting the features of the current batch of images into the classification and bounding-box regression branches of the corresponding randomly sampled model for forward propagation, and calculating the global loss of the model;
and step B30, reducing the global loss through backpropagation and stochastic gradient descent to update the model parameters, and training iteratively until a set training end condition is reached, obtaining the supernet with variable width and depth.
In some preferred embodiments, before the label-remapped image set is divided into batches of a set size in step B10, an image set expansion step is further provided, as follows:
performing random multi-scale scaling and multi-angle flipping on the label-remapped images to obtain an expanded image set.
In some preferred embodiments, the global loss is expressed as:

L_all = λ·L_rcnn + L_rpn

where L_rcnn is the classification and bounding-box regression loss of the model, L_rpn is the loss of the region proposal network part of the model, and λ is a balance factor for balancing the two losses.
In some preferred embodiments, the classification and bounding-box regression loss of the model is expressed as:

L_rcnn = (1/N_cls)·Σ_{k,i} L_cls(p_ki, p*_ki) + γ·(1/N_reg)·Σ_{k,i} p*_ki·L_reg(t_ki, t*_ki)

where k is the index of the prediction box, p_ki is the predicted probability that the k-th prediction box is predicted as class i, p*_ki is the actual probability that the label corresponding to the k-th prediction box is the i-th class, L_cls is the cross-entropy loss, and N_cls is the total number of categories contained in the data set; t_ki are the predicted coordinates of the k-th prediction box, t*_ki are the true coordinates corresponding to the k-th prediction box, L_reg is the smooth L1 loss, and N_reg is the total number of prediction boxes; γ is a balance factor for balancing the two losses.
In some preferred embodiments, the region proposal network part loss of the model is expressed as:

L_rpn = (1/K)·Σ_k L_cls(p_k, p*_k) + β·(1/K)·Σ_k p*_k·L_reg(t_k, t*_k)

where k is the index of the prediction box, p_k is the predicted probability that the k-th prediction box contains an object, p*_k is the actual probability that the k-th prediction box contains an object, L_cls is the binary cross-entropy loss, and K is the total number of predicted detection boxes; t_k are the predicted coordinates of the k-th prediction box, t*_k are the true coordinates corresponding to the k-th prediction box, and L_reg is the smooth L1 loss; β is a balance factor for balancing the two losses.
In another aspect of the invention, an object detection system based on big data enabling and model flow is provided, which comprises the following modules:
the semantic information vector extraction module is configured to acquire an image to be target-detected in the current scene and convert the category information of the image into a semantic information vector through a Word2vector model;
the matching module is configured to perform cosine similarity matching between the semantic information vector and the semantic information vectors of the image category information of all public target detection data sets;
the initialization module is configured to initialize the target detection model of the current scene with the classification fully-connected weights of the category with the highest matching value in the supernet with variable width and depth;
the sub-model screening module is configured to pre-specify the floating-point operation count and parameter quantity of the model and traverse the model sampling space one by one to obtain K target detection sub-models of the current scene;
the model training module is configured to pre-train the K target detection sub-models respectively with training images of the current scene and take the pre-trained sub-model with the largest mAP value as the final target detection model of the current scene;
and the target detection module is configured to perform target detection on the image to be detected in the current scene through the final target detection model of the current scene to obtain a target detection result.
In a third aspect of the present invention, an electronic device is provided, including:
at least one processor; and
a memory communicatively coupled to at least one of the processors; wherein,
the memory stores instructions executable by the processor for execution by the processor to implement the big data enable and model flow based object detection method described above.
In a fourth aspect of the present invention, a computer-readable storage medium is provided, which stores computer instructions for execution by the computer to implement the above-mentioned big data enabling and model flow based object detection method.
The invention has the beneficial effects that:
(1) According to the target detection method based on big data enabling and model flow of the invention, a supernet with dynamically variable depth and width is constructed by integrating publicly available large target detection data sets; enabled by this big data, the adaptability of the model is greatly improved, achieving the effect of training hundreds of thousands of sub-models simultaneously.
(2) According to the target detection method based on big data enabling and model flow, the trained model has extremely strong generalization ability because it is trained on a million-scale data set, and can serve as an excellent initialization model for a downstream actual deployment scene; the model can achieve excellent performance with only a small amount of labeled data from the specific deployment scene.
(3) The target detection method based on big data enabling and model flow of the invention exploits the property that images from the same domain distribution have similar feature maps at the fully connected layer of the network: for the images provided by a downstream deployment scene, the closest labeled images are matched and selected from the database used for supernet training and added to the deployment training of the model, further improving model performance.
(4) According to the target detection method based on big data enabling and model flow, the design of a dynamic structure during initial supernet training achieves simultaneous training of the contained subnets; when deploying to various scenes, the trained subnets only need to be extracted directly according to the requirements of the corresponding scene, saving model deployment cost in each environment.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a schematic flow diagram of a big data enabling and model flow based object detection method of the present invention;
FIG. 2 is a block diagram of the target detection method based on big data enabling and model flow of the present invention;
FIG. 3 is a schematic diagram of the search space of an embodiment of the target detection method based on big data enabling and model flow of the present invention, with a residual neural network as the backbone network;
FIG. 4 is a schematic diagram of similar-image extraction in a sparse-data scene according to an embodiment of the target detection method based on big data enabling and model flow.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Aiming at the problems of target detection in the prior art, namely the low performance of existing target detection models in few-sample scenes on the one hand, and the complexity and resource waste of model deployment on the other, the invention provides a target detection method based on big data enabling and model flow, which trains hundreds of thousands of sub-models at once by integrating large target detection data sets and constructing a flexible dynamic supernet. In a specific usage scene, the weights corresponding to the closest semantic categories in the training database are selected for initialization according to the word vector model, and quick fine-tuning with a small amount of labeled data from the usage scene achieves excellent target detection performance. When the labeled data of the usage scene are sparse, similar data are selected from the training database according to the distribution characteristics of the labeled data in the feature map of the fully connected layer of the network to assist deployment training, further enhancing the performance of the model.
The invention relates to a target detection method based on big data enabling and model flow, which comprises the following steps:
step S10, acquiring an image to be target-detected in the current scene, and converting the category information of the image into a semantic information vector through a Word2vector model;
step S20, performing cosine similarity matching between the semantic information vector and the semantic information vectors of the image category information of all public target detection data sets;
step S30, initializing the target detection model of the current scene with the classification fully-connected weights of the category with the highest matching value in the supernet with variable width and depth;
step S40, pre-specifying the floating-point operation count and parameter quantity of the model, and traversing the model sampling space one by one to obtain K target detection sub-models of the current scene;
step S50, pre-training the K target detection sub-models respectively with training images of the current scene, and taking the pre-trained sub-model with the largest mAP value as the final target detection model of the current scene;
and step S60, performing target detection on the image to be detected in the current scene through the final target detection model of the current scene to obtain a target detection result; an end-to-end sketch of these steps is given below.
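Before each step is described in detail, the overall flow can be summarized in a short sketch. Every name used below (word2vector, match_by_cosine, init_from_supernet, screen_submodels, best_by_map, model.detect) is a hypothetical placeholder standing in for the corresponding step S10 to S60, not an interface defined by this disclosure:

    def detect_in_new_scene(scene, supernet, database, flops_budget, param_budget, k=5):
        # S10: convert current-scene category names into semantic information vectors
        vecs = [word2vector(name) for name in scene.category_names]
        # S20: cosine-similarity matching against the database category vectors
        match_idx = match_by_cosine(vecs, database.category_vecs)
        # S30: initialize the detector from the best-matching classification FC weights
        model = init_from_supernet(supernet, match_idx)
        # S40: traverse the sampling space under FLOPs/parameter budgets, keep K sub-models
        submodels = screen_submodels(supernet.space, flops_budget, param_budget, k)
        # S50: pre-train each sub-model on current-scene images, keep the highest-mAP one
        model = best_by_map(submodels, scene.train_images, init=model)
        # S60: detect targets in the images to be detected in the current scene
        return [model.detect(image) for image in scene.test_images]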
In order to more clearly describe the object detection method based on big data enabling and model flow of the present invention, the following describes the steps in the embodiment of the present invention in detail with reference to fig. 1 and 2.
The object detection method based on big data enabling and model flow in the first embodiment of the invention has the following steps:
step A10, normalizing the images in all the obtained public target detection data sets, and converting the category information corresponding to the images into semantic information vectors through a Word2vector model. In one embodiment of the invention, the length of the short side of all the images is adjusted to 800 pixels.
And step A20, calculating cosine similarity among semantic information vectors of each category, merging categories with the cosine similarity larger than a preset threshold value, and then performing label remapping. In one embodiment of the present invention, the predetermined threshold is 0.8.
And A30, selecting any deep learning target detection model, and setting the width and depth sampling spaces of its backbone network according to the characteristics of the model to obtain the model sampling space.
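As an illustration of steps A10 and A20, the following is a minimal sketch of category merging and label remapping, assuming the category names have already been converted to semantic vectors (e.g. by a Word2vector model) and using a simple union-find to merge categories whose cosine similarity exceeds the 0.8 threshold of the embodiment:

    import numpy as np

    def cosine(u, v):
        # Cosine similarity between two semantic information vectors.
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def merge_categories(vectors, threshold=0.8):
        """Merge categories whose semantic vectors exceed the cosine threshold,
        then remap the surviving groups onto consecutive integer labels."""
        names = list(vectors)
        parent = {n: n for n in names}              # union-find forest

        def find(n):
            while parent[n] != n:
                parent[n] = parent[parent[n]]       # path halving
                n = parent[n]
            return n

        for i, a in enumerate(names):
            for b in names[i + 1:]:
                if cosine(vectors[a], vectors[b]) > threshold:
                    parent[find(a)] = find(b)       # merge the two categories

        roots = sorted({find(n) for n in names})
        new_id = {r: j for j, r in enumerate(roots)}
        return {n: new_id[find(n)] for n in names}  # label remapping

For example, categories such as "sofa" and "couch" from two data sets would collapse onto a single remapped label if their word vectors are close enough.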
In one embodiment of the invention, the two-stage detector Faster R-CNN is selected as the target detection model, the residual network ResNet is selected as the backbone network of the model, and a ResNet50, ResNet77 or ResNet101 network is selected as the model anchor point.
Fig. 3 is a schematic diagram of the search space when a residual neural network is used as the backbone network in one embodiment of the target detection method based on big data enabling and model flow of the present invention. AR50, AR77 and AR101 denote the ResNet50, ResNet77 and ResNet101 networks respectively; Dmin denotes the depth of the smallest network, Danchor the depth of the corresponding anchor model, Dmax the depth of the largest network, and Dstep the step size by which the network depth can change dynamically from the minimum to the maximum; Wmin denotes the width of the smallest network, Wanchor the width of the corresponding anchor model, Wmax the width of the largest network, and Wstep the step size by which the network width can change dynamically from the minimum to the maximum; Smin denotes the minimum input resolution of the network, Sanchor the reference input resolution, Smax the maximum input resolution, and Sstep the step size by which the accepted input resolution changes between the minimum and the maximum.
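Fig. 3 therefore describes three arithmetic grids: depth, width and input resolution. A minimal sketch of such a sampling space follows; the bounds and step sizes are illustrative values only (the disclosure ties the concrete numbers to the chosen anchor model), and enumerate()/sample() are the two operations the later steps rely on:

    import random
    from dataclasses import dataclass

    @dataclass
    class SamplingSpace:
        d_min: int = 26          # Dmin: depth of the smallest network (illustrative)
        d_max: int = 101         # Dmax: depth of the largest network
        d_step: int = 3          # Dstep: dynamic depth step
        w_min: int = 32          # Wmin: width (base channels) of the smallest network
        w_max: int = 64          # Wmax: width of the largest network
        w_step: int = 8          # Wstep: dynamic width step
        s_min: int = 512         # Smin: minimum input resolution
        s_max: int = 800         # Smax: maximum input resolution
        s_step: int = 32         # Sstep: input-resolution step

        @staticmethod
        def _grid(lo, hi, step):
            return list(range(lo, hi + 1, step))

        def enumerate(self):
            """Exhaustively list every (depth, width, resolution) configuration."""
            return [(d, w, s)
                    for d in self._grid(self.d_min, self.d_max, self.d_step)
                    for w in self._grid(self.w_min, self.w_max, self.w_step)
                    for s in self._grid(self.s_min, self.s_max, self.s_step)]

        def sample(self):
            """Randomly draw one sub-model configuration, as in supernet training (step B10)."""
            return random.choice(self.enumerate())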
And performing random multi-scale scaling and multi-angle flipping on the label-remapped images to obtain an expanded image set.
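A minimal sketch of this expansion step, assuming torchvision-style image tensors and N×4 boxes in (x1, y1, x2, y2) format; the scale set is illustrative, and only the horizontal flip of the multi-angle flipping is shown:

    import random
    import torchvision.transforms.functional as TF

    SCALES = [0.5, 0.75, 1.0, 1.25, 1.5]          # hypothetical multi-scale factors

    def expand_once(image, boxes):
        """One random multi-scale scaling plus flipping pass over an image and its boxes."""
        s = random.choice(SCALES)
        h, w = image.shape[-2:]
        image = TF.resize(image, [int(h * s), int(w * s)])
        boxes = boxes * s                          # box coordinates scale with the image
        if random.random() < 0.5:                  # random horizontal flip
            image = TF.hflip(image)
            width = image.shape[-1]
            boxes[:, [0, 2]] = width - boxes[:, [2, 0]]
        return image, boxes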
And step B10, dividing the image set (or the expanded image set) into batches of a set size, extracting the features of each batch of images through the feature extraction backbone network, and randomly sampling a model from the model sampling space.
Step B20, inputting the features of the current batch of images into the classification and bounding-box regression branches of the corresponding randomly sampled model for forward propagation, and calculating the global loss of the model, as shown in equation (1):

L_all = λ·L_rcnn + L_rpn   (1)

where L_rcnn is the classification and bounding-box regression loss of the model, L_rpn is the loss of the region proposal network part of the model, and λ is a balance factor for balancing the two losses.
The classification and bounding-box regression loss of the model is expressed as shown in equation (2):

L_rcnn = (1/N_cls)·Σ_{k,i} L_cls(p_ki, p*_ki) + γ·(1/N_reg)·Σ_{k,i} p*_ki·L_reg(t_ki, t*_ki)   (2)

where k is the index of the prediction box, p_ki is the predicted probability that the k-th prediction box is predicted as class i, p*_ki is the actual probability that the label corresponding to the k-th prediction box is the i-th class, L_cls is the cross-entropy loss, and N_cls is the total number of categories contained in the data set; t_ki are the predicted coordinates of the k-th prediction box, t*_ki are the true coordinates corresponding to the k-th prediction box, L_reg is the smooth L1 loss, and N_reg is the total number of prediction boxes; γ is a balance factor for balancing the two losses.
The region proposal network part loss of the model is expressed as shown in equation (3):

L_rpn = (1/K)·Σ_k L_cls(p_k, p*_k) + β·(1/K)·Σ_k p*_k·L_reg(t_k, t*_k)   (3)

where k is the index of the prediction box, p_k is the predicted probability that the k-th prediction box contains an object, p*_k is the actual probability that the k-th prediction box contains an object, L_cls is the binary cross-entropy loss, and K is the total number of predicted detection boxes; t_k are the predicted coordinates of the k-th prediction box, t*_k are the true coordinates corresponding to the k-th prediction box, and L_reg is the smooth L1 loss; β is a balance factor for balancing the two losses.
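Assuming standard PyTorch loss primitives, the three losses of equations (1) to (3) can be sketched as below; the normalizations N_cls, N_reg and K are folded into mean reductions for brevity, so this is a simplification of, not a substitute for, the formulas above:

    import torch
    import torch.nn.functional as F

    def rcnn_loss(cls_logits, labels, box_preds, box_targets, gamma=1.0):
        """L_rcnn of equation (2): cross-entropy classification plus smooth-L1 regression."""
        l_cls = F.cross_entropy(cls_logits, labels)
        l_reg = F.smooth_l1_loss(box_preds, box_targets)
        return l_cls + gamma * l_reg

    def rpn_loss(obj_logits, obj_labels, box_preds, box_targets, beta=1.0):
        """L_rpn of equation (3): binary cross-entropy objectness plus smooth-L1
        regression restricted to boxes that actually contain an object (p*_k = 1)."""
        l_cls = F.binary_cross_entropy_with_logits(obj_logits, obj_labels.float())
        pos = obj_labels.bool()
        l_reg = (F.smooth_l1_loss(box_preds[pos], box_targets[pos])
                 if pos.any() else box_preds.sum() * 0.0)
        return l_cls + beta * l_reg

    def global_loss(l_rcnn, l_rpn, lam=1.0):
        """L_all of equation (1): lambda * L_rcnn + L_rpn."""
        return lam * l_rcnn + l_rpn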
And step B30, reducing the global loss through backpropagation and stochastic gradient descent to update the model parameters, and training iteratively until a set training end condition is reached, obtaining the supernet with variable width and depth.
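A minimal sketch of the B10 to B30 loop, assuming a supernet wrapper whose set_config() switches the active sub-model and whose forward pass returns the two branch losses; both interfaces are assumptions of this sketch, not of the disclosure:

    import torch

    def train_supernet(supernet, loader, space, epochs=12, lr=0.02, lam=1.0):
        opt = torch.optim.SGD(supernet.parameters(), lr=lr,
                              momentum=0.9, weight_decay=1e-4)
        for _ in range(epochs):
            for images, targets in loader:            # B10: batches of remapped images
                supernet.set_config(space.sample())   # B10: random sub-model draw
                l_rcnn, l_rpn = supernet(images, targets)  # B20: forward propagation
                loss = lam * l_rcnn + l_rpn           # B20: global loss, equation (1)
                opt.zero_grad()
                loss.backward()                       # B30: backpropagation
                opt.step()                            # B30: stochastic gradient descent
        return supernet

Because a different sub-model is activated for every batch, one training run amortizes over the whole sampling space, which is how the supernet covers hundreds of thousands of sub-models at once.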
Step S10, obtaining an image to be target-detected in a current scene (i.e., a downstream target-detected scene that needs to be deployed), and converting the category information of the image into a semantic information vector through a Word2vector model.
And step S20, performing cosine similarity matching between the semantic information vector and the semantic information vectors of the image category information of all public target detection data sets.
And step S30, initializing the target detection model of the current scene with the classification fully-connected weights of the category with the highest matching value in the supernet with variable width and depth.
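Steps S20 and S30 reduce to a cosine argmax followed by a row-wise copy of classification fully-connected weights. A small sketch follows; cls_fc is an assumed attribute name for the classification head of either network:

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def init_classifier(det_model, supernet_fc, scene_vecs, db_vecs):
        """S20: match each current-scene category vector to the most similar database
        category vector by cosine similarity; S30: copy the matched FC rows."""
        scene = F.normalize(scene_vecs, dim=1)        # unit-length semantic vectors
        db = F.normalize(db_vecs, dim=1)
        match = (scene @ db.t()).argmax(dim=1)        # highest cosine similarity per class
        for i, j in enumerate(match.tolist()):
            det_model.cls_fc.weight[i] = supernet_fc.weight[j]
            det_model.cls_fc.bias[i] = supernet_fc.bias[j]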
Step S40, pre-specifying the floating-point operation count (FLOPs, which can be used to measure algorithm/model complexity) and parameter quantity of the model, traversing the model sampling space one by one, screening out the sub-models that meet the constraints, and randomly sampling K of the selected sub-models to obtain the K target detection sub-models of the current scene.
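Reusing the SamplingSpace sketch above, the screening of step S40 is an exhaustive traversal followed by a random draw of K survivors; flops_of and params_of stand for assumed cost estimators (analytic counts or a profiler):

    import random

    def screen_submodels(space, flops_budget, param_budget, k, flops_of, params_of):
        """Keep only configurations within the pre-specified FLOPs and parameter
        budgets, then randomly sample K of them (step S40)."""
        feasible = [cfg for cfg in space.enumerate()
                    if flops_of(cfg) <= flops_budget and params_of(cfg) <= param_budget]
        return random.sample(feasible, min(k, len(feasible)))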
Step S50, pre-training the K target detection sub-models respectively with the training images of the current scene (i.e. the data set of the downstream target detection scene to be deployed) to obtain their mAP values, and selecting the pre-trained sub-model with the best mAP result among the K target detection sub-models as the final target detection model of the current scene (i.e. the final deployment model of the downstream target detection scene to be deployed).
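Step S50 is then a fine-tune-and-argmax over the K candidates; finetune and evaluate_map are assumed helpers (a short training loop on current-scene images and a COCO-style mAP evaluation, respectively):

    def select_final_model(submodels, scene_train, scene_val, finetune, evaluate_map):
        """Pre-train each screened sub-model on current-scene data and keep the one
        with the largest mAP as the final target detection model (step S50)."""
        scored = []
        for model in submodels:
            model = finetune(model, scene_train)           # quick scene-specific tuning
            scored.append((evaluate_map(model, scene_val), model))
        return max(scored, key=lambda pair: pair[0])[1]    # highest mAP wins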
And step S60, performing target detection on the image to be detected in the current scene through the final target detection model of the current scene to obtain a target detection result.
Fig. 4 is a schematic diagram of similar-image extraction in a sparse-data scene according to one embodiment of the target detection method based on big data enabling and model flow. Labeled images very similar to those of the deployment scene can be supplied from the huge locally maintained database, and the performance of a deep learning algorithm improves greatly as the data volume grows.
By mining the latent capability of large-scale data sets, the method gives the trained model excellent transfer capability, so that excellent target detection performance can be achieved in unseen data domains with only a small amount of related-domain data. In practical application scenes, data are often sparse; the method not only provides good model initialization, but also selects, from the labeled database used for training, the data most similar to the specific deployed data domain according to the distribution of the image feature maps in the network, assisting the specific downstream model deployment training and achieving excellent target detection performance in sparse-data scenes.
Although the foregoing embodiments describe the steps in the above sequential order, those skilled in the art will understand that, in order to achieve the effect of the present embodiments, the steps may not be executed in such an order, and may be executed simultaneously (in parallel) or in an inverse order, and these simple variations are within the scope of the present invention.
A second embodiment of the present invention is a big-data-enabled and model-flow-based object detection system, comprising the following modules:
the semantic information vector extraction module is configured to acquire an image to be target-detected in the current scene and convert the category information of the image into a semantic information vector through a Word2vector model;
the matching module is configured to perform cosine similarity matching between the semantic information vector and the semantic information vectors of the image category information of all public target detection data sets;
the initialization module is configured to initialize the target detection model of the current scene with the classification fully-connected weights of the category with the highest matching value in the supernet with variable width and depth;
the sub-model screening module is configured to pre-specify the floating-point operation count and parameter quantity of the model and traverse the model sampling space one by one to obtain K target detection sub-models of the current scene;
the model training module is configured to pre-train the K target detection sub-models respectively with training images of the current scene and take the pre-trained sub-model with the largest mAP value as the final target detection model of the current scene;
and the target detection module is configured to perform target detection on the image to be detected in the current scene through the final target detection model of the current scene to obtain a target detection result.
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiment, and details are not described herein again.
It should be noted that, the object detection system based on big data enabling and model flow provided by the foregoing embodiment is only illustrated by the above division of each functional module, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the modules or steps in the embodiment of the present invention are further decomposed or combined, for example, the modules in the foregoing embodiment may be combined into one module, or may be further split into multiple sub-modules, so as to complete all or part of the above described functions. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention.
An electronic apparatus according to a third embodiment of the present invention includes:
at least one processor; and
a memory communicatively coupled to at least one of the processors; wherein,
the memory stores instructions executable by the processor for execution by the processor to implement the big data enable and model flow based object detection method described above.
A computer-readable storage medium of a fourth embodiment of the present invention stores computer instructions for execution by the computer to implement the big data enabling and model flow based object detection method described above.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Those of skill in the art will appreciate that the various illustrative modules, method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that programs corresponding to the software modules, method steps may be located in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (10)

1. An object detection method based on big data enabling and model flow, characterized in that the object detection method comprises:
step S10, acquiring an image to be target-detected in the current scene, and converting the category information of the image into a semantic information vector through a Word2vector model;
step S20, performing cosine similarity matching between the semantic information vector and the semantic information vectors of the image category information of all public target detection data sets;
step S30, initializing the target detection model of the current scene with the classification fully-connected weights of the category with the highest matching value in the supernet with variable width and depth;
step S40, pre-specifying the floating-point operation count and parameter quantity of the model, and traversing the model sampling space one by one to obtain K target detection sub-models of the current scene;
step S50, pre-training the K target detection sub-models respectively with training images of the current scene, and taking the pre-trained sub-model with the largest mean Average Precision (mAP) value as the final target detection model of the current scene;
and step S60, performing target detection on the image to be detected in the current scene through the final target detection model of the current scene to obtain a target detection result.
2. The object detection method based on big data enabling and model flow according to claim 1, wherein the model sampling space is obtained by:
step A10, normalizing the images in all the obtained public target detection data sets, and converting the category information corresponding to the images into semantic information vectors through a Word2vector model;
step A20, calculating cosine similarity among semantic information vectors of each category, merging categories with the cosine similarity larger than a preset threshold value, and then performing label remapping;
and A30, selecting any deep learning target detection model, and setting the width and depth sampling spaces of its backbone network according to the characteristics of the model to obtain the model sampling space.
3. The object detection method based on big data enabling and model flow according to claim 2, wherein the supernet with variable width and depth is constructed and trained as follows:
step B10, dividing the label-remapped image set into batches of a set size, extracting the features of each batch of images through the feature extraction backbone network, and randomly sampling a model from the model sampling space;
step B20, inputting the features of the current batch of images into the classification and bounding-box regression branches of the corresponding randomly sampled model for forward propagation, and calculating the global loss of the model;
and step B30, reducing the global loss through backpropagation and stochastic gradient descent to update the model parameters, and training iteratively until a set training end condition is reached, obtaining the supernet with variable width and depth.
4. The object detection method based on big data enabling and model flow according to claim 3, wherein before the label-remapped image set is divided into batches of a set size in step B10, an image set expansion step is further provided:
performing random multi-scale scaling and multi-angle flipping on the label-remapped images to obtain an expanded image set.
5. The object detection method based on big data enabling and model flow according to claim 3, wherein the global loss is expressed as:

L_all = λ·L_rcnn + L_rpn

where L_rcnn is the classification and bounding-box regression loss of the model, L_rpn is the loss of the region proposal network part of the model, and λ is a balance factor for balancing the two losses.
6. The object detection method based on big data enabling and model flow according to claim 5, wherein the classification and bounding-box regression loss of the model is expressed as:

L_rcnn = (1/N_cls)·Σ_{k,i} L_cls(p_ki, p*_ki) + γ·(1/N_reg)·Σ_{k,i} p*_ki·L_reg(t_ki, t*_ki)

where k is the index of the prediction box, p_ki is the predicted probability that the k-th prediction box is predicted as class i, p*_ki is the actual probability that the label corresponding to the k-th prediction box is the i-th class, L_cls is the cross-entropy loss, and N_cls is the total number of categories contained in the data set; t_ki are the predicted coordinates of the k-th prediction box, t*_ki are the true coordinates corresponding to the k-th prediction box, L_reg is the smooth L1 loss, and N_reg is the total number of prediction boxes; γ is a balance factor for balancing the two losses.
7. The object detection method based on big data enabling and model flow according to claim 5, wherein the region proposal network part loss of the model is expressed as:

L_rpn = (1/K)·Σ_k L_cls(p_k, p*_k) + β·(1/K)·Σ_k p*_k·L_reg(t_k, t*_k)

where k is the index of the prediction box, p_k is the predicted probability that the k-th prediction box contains an object, p*_k is the actual probability that the k-th prediction box contains an object, L_cls is the binary cross-entropy loss, and K is the total number of predicted detection boxes; t_k are the predicted coordinates of the k-th prediction box, t*_k are the true coordinates corresponding to the k-th prediction box, and L_reg is the smooth L1 loss; β is a balance factor for balancing the two losses.
8. An object detection system based on big data enabling and model flow, characterized in that the object detection system comprises the following modules:
the semantic information vector extraction module is configured to acquire an image to be target-detected in the current scene and convert the category information of the image into a semantic information vector through a Word2vector model;
the matching module is configured to perform cosine similarity matching between the semantic information vector and the semantic information vectors of the image category information of all public target detection data sets;
the initialization module is configured to initialize the target detection model of the current scene with the classification fully-connected weights of the category with the highest matching value in the supernet with variable width and depth;
the sub-model screening module is configured to pre-specify the floating-point operation count and parameter quantity of the model and traverse the model sampling space one by one to obtain K target detection sub-models of the current scene;
the model training module is configured to pre-train the K target detection sub-models respectively with training images of the current scene and take the pre-trained sub-model with the largest mAP value as the final target detection model of the current scene;
and the target detection module is configured to perform target detection on the image to be detected in the current scene through the final target detection model of the current scene to obtain a target detection result.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to at least one of the processors; wherein,
the memory stores instructions executable by the processor for execution by the processor to implement a big data enabling and model flow based object detection method of any of claims 1-7.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions for execution by the computer to implement the big-data enabling and model-flow based object detection method of any of claims 1-7.
CN202111258992.4A (filed 2021-10-28, priority date 2021-10-28), pending: Target detection method, system and equipment based on big data enabling and model flow (CN114708185A)

Priority Applications (1)

CN202111258992.4A (priority and filing date 2021-10-28): CN114708185A (en), Target detection method, system and equipment based on big data enabling and model flow

Applications Claiming Priority (1)

CN202111258992.4A (priority and filing date 2021-10-28): CN114708185A (en), Target detection method, system and equipment based on big data enabling and model flow

Publications (1)

CN114708185A, published 2022-07-05

Family

ID=82166424

Family Applications (1)

CN202111258992.4A (filed 2021-10-28, priority date 2021-10-28), pending: CN114708185A (en), Target detection method, system and equipment based on big data enabling and model flow

Country Status (1)

CN: CN114708185A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
WO2024012179A1 (published 2024-01-18, 马上消费金融股份有限公司 / Mashang Consumer Finance Co., Ltd.): Model training method, target detection method and apparatuses


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination