CN112580684A - Target detection method and device based on semi-supervised learning and storage medium

Target detection method and device based on semi-supervised learning and storage medium

Info

Publication number
CN112580684A
CN112580684A
Authority
CN
China
Prior art keywords
data
target detection
semi
tag data
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011288652.1A
Other languages
Chinese (zh)
Other versions
CN112580684B (en)
Inventor
唐子豪
刘莉红
刘玉宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202011288652.1A priority Critical patent/CN112580684B/en
Publication of CN112580684A publication Critical patent/CN112580684A/en
Application granted granted Critical
Publication of CN112580684B publication Critical patent/CN112580684B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 - Protecting data
    • G06F21/602 - Providing cryptographic facilities or services
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 - Protecting data
    • G06F21/64 - Protecting data integrity, e.g. using checksums, certificates or signatures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks

Abstract

The invention relates to the technical field of target detection and discloses a target detection method based on semi-supervised learning, which comprises the following steps: determining label data corresponding to the training data based on the acquired training data; performing data cleaning on the label data to obtain the cleaned new label data; performing data enhancement on the new label data to obtain enhanced data corresponding to the new label data; training a deep learning model based on the enhanced data and preset manually labeled image information until the loss function of the deep learning model converges within a preset range, to form a target detection model; and obtaining a target detection result for the data to be detected based on the target detection model. The invention also relates to blockchain technology: the new label data can be stored in a blockchain. The invention can improve the efficiency and accuracy of target detection based on semi-supervised learning.

Description

Target detection method and device based on semi-supervised learning and storage medium
Technical Field
The present invention relates to the field of target detection technologies, and in particular, to a method and an apparatus for target detection based on semi-supervised learning, an electronic device, and a computer-readable storage medium.
Background
The manual work behind artificial intelligence mainly refers to the large amount of human labor needed to label data before a model can be trained. Although public target detection datasets such as COCO exist, a target detection deep model applied to an actual project still needs to be retrained on a labeled business dataset to adapt to the business data. At present, most artificial intelligence enterprises must invest heavily in the manual annotation of business data. Meanwhile, the labeled data must be manually inspected, cleaned, and corrected to guarantee the image annotation quality, a requirement that stems from the sensitivity of neural networks to their data; a multi-level annotation and review structure therefore has to be built for the labeled data, and for large batches of data only sampling inspection can show that the labels are statistically usable.
At present, although semi-supervised learning methods for classification tasks have achieved certain results, semi-supervised learning methods for target detection are not yet mature, and problems such as limited accuracy and large required data volume still exist.
Disclosure of Invention
The invention provides a target detection method and apparatus based on semi-supervised learning, an electronic device, and a computer-readable storage medium, mainly aiming to improve the efficiency and accuracy of target detection based on semi-supervised learning.
In order to achieve the above object, the present invention provides a target detection method based on semi-supervised learning, comprising: determining label data corresponding to the training data based on the acquired training data;
performing data cleaning on the label data to obtain the cleaned new label data;
performing data enhancement on the new label data to obtain enhanced data corresponding to the new label data;
training a deep learning model based on the enhanced data and preset manually labeled image information until the loss function of the deep learning model converges within a preset range, to form a target detection model;
and obtaining a target detection result for the data to be detected based on the target detection model.
Optionally, the step of determining label data corresponding to the training data includes:
performing horizontal mirror flipping on the training data to obtain the processed training data;
training an open-source model on the processed training data until the open-source model converges within a specified range, to form a label acquisition model;
and obtaining label data for the unlabeled training data from the label acquisition model.
Optionally, the training data comprises unlabeled image information;
the label data includes the category of an object located in the image information and the top-left abscissa, top-left ordinate, bottom-right abscissa, and bottom-right ordinate of a bounding box enclosing the object.
Optionally, the step of performing data cleaning on the label data and acquiring the cleaned new label data includes:
determining the width, height, and center-point coordinate information of the bounding box based on its top-left abscissa, top-left ordinate, bottom-right abscissa, and bottom-right ordinate;
determining converted coordinates for the bounding box from its width and the width of the corresponding image information, and from its height and center coordinates and the height of the corresponding image information;
and performing data cleaning on the converted coordinates based on the open-source framework CLEANLAB to obtain the cleaned new label data.
Optionally, the new label data is stored in a blockchain, and the step of performing data enhancement on the new label data includes:
randomly jittering the color variables of the new label data; and/or geometrically deforming the objects within the bounding boxes in the new label data; and/or geometrically deforming the new label data and transforming the bounding boxes accordingly; wherein:
the color variables include brightness, saturation, contrast, and transparency;
the geometric deformations include translation, flipping, shearing, and rotation.
Optionally, the loss function comprises the sum of a supervised loss and an unsupervised loss; wherein:
the expression of the supervised loss is as follows:
L_s(x, p, t) = (1/N_cls) Σ_i Σ_b L_cls(p_i, p_{i,b}) + (1/N_reg) Σ_i Σ_b p_{i,b} · L_reg(t_i, t_b)
wherein x denotes an image, p and t denote vector information, b denotes the index of a bounding box in the manually labeled image information, i denotes the index of a prior box, p_i denotes the predicted probability that prior box i belongs to a positive sample, t_i denotes the coordinates of prior box i (the top-left and bottom-right corner coordinates), p_{i,b} = 1 when prior box i belongs to bounding box b and p_{i,b} = 0 otherwise, t_b denotes the coordinates of the manually labeled bounding box, L_s denotes the supervised loss, L_cls denotes the classification loss function, L_reg denotes the regression loss function, N_reg denotes the normalization coefficient of the regression term, and N_cls denotes the normalization coefficient of the classification term;
the expression of the unsupervised loss is as follows:
L_u(x, q) = ω(x) · [ (1/N_cls) Σ_i Σ_b L_cls(p_i, q_{i,b}) + (1/N_reg) Σ_i Σ_b q_{i,b} · L_reg(t_i, s_b) ]
wherein x denotes an image, q denotes the label data of the image x, b denotes the index of a bounding box in the label data, i denotes the index of a prior box, p_i denotes the predicted probability that prior box i belongs to a positive sample, t_i denotes the coordinates of prior box i (the top-left and bottom-right corner coordinates), q_{i,b} = 1 when prior box i belongs to bounding box b and q_{i,b} = 0 otherwise, s_b denotes the coordinates of the labeled bounding box, L_u denotes the unsupervised loss, L_cls denotes the classification loss function, L_reg denotes the regression loss function, N_reg denotes the normalization coefficient of the regression term, and N_cls denotes the normalization coefficient of the classification term; wherein:
ω(x) = 1 if max(p(x; θ)) ≥ τ, else 0
q(x) = ONE_HOT(argmax(p(x; θ)))
where θ denotes the trainable parameters of the deep learning model and τ denotes the confidence threshold for the new label data.
Optionally, the quantity of the enhanced data is 10 to 15 times the quantity of the manually labeled image information.
In order to solve the above problem, the present invention further provides a target detection apparatus based on semi-supervised learning, the apparatus comprising:
a label data determination unit configured to determine label data corresponding to the training data based on the acquired training data;
a new tag data acquisition unit, configured to perform data cleaning processing on the tag data, and acquire new tag data after cleaning;
the enhanced data acquisition unit is used for performing data enhancement processing on the new tag data to acquire enhanced data corresponding to the new tag data;
the target detection model forming unit is used for training a deep learning model based on the enhanced data and preset artificially labeled image information until the loss function of the deep learning model is converged in a preset range so as to form a target detection model;
and the detection result acquisition unit is used for acquiring a target detection result of the data to be detected based on the target detection model.
In order to solve the above problem, the present invention also provides an electronic device, including:
a memory storing at least one instruction; and
a processor that executes the instructions stored in the memory to implement the above semi-supervised learning based target detection method.
In order to solve the above problem, the present invention further provides a computer-readable storage medium, which stores at least one instruction, where the at least one instruction is executed by a processor in an electronic device to implement the semi-supervised learning based target detection method described above.
According to the method, corresponding label data are determined based on the acquired training data; data cleaning and data enhancement are then performed on the label data to obtain new label data and enhanced data; and a deep learning model is trained based on the enhanced data and preset manually labeled image information until the loss function of the deep learning model converges within a preset range, forming a target detection model. Through these features, the automatic labeling of the embodiments of the invention can approach, in quality, the level reached after multi-stage inspection and correction, while greatly reducing cost; it can also be combined with manual quality inspection to check and correct existing data, simplifying the data quality-inspection process and saving management and time costs in addition to labor costs.
Drawings
Fig. 1 is a flowchart of a target detection method based on semi-supervised learning according to an embodiment of the present invention;
FIG. 2 is a block diagram of an apparatus for detecting targets based on semi-supervised learning according to an embodiment of the present invention;
fig. 3 is a schematic internal structural diagram of an electronic device implementing a target detection method based on semi-supervised learning according to an embodiment of the present invention;
the implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides a target detection method based on semi-supervised learning. Fig. 1 is a schematic flow chart of a target detection method based on semi-supervised learning according to an embodiment of the present invention. The method may be performed by an apparatus, which may be implemented by software and/or hardware.
In this embodiment, the target detection method based on semi-supervised learning includes:
s110: based on the acquired training data, tag data corresponding to the training data is determined.
The training data may be unlabeled image information, and the corresponding label data is obtained based on this unlabeled image information; the label data further comprises the category of the object in the image information and the top-left abscissa, top-left ordinate, bottom-right abscissa, and bottom-right ordinate of a bounding box enclosing the object.
Wherein the step of determining label data corresponding to the training data comprises:
1. performing horizontal mirror flipping on the training data to obtain the processed training data;
2. training the DetectoRS open-source model on the processed training data until the DetectoRS open-source model converges within a specified range, to form a label acquisition model;
3. obtaining label data for the unlabeled training data from the label acquisition model.
It should be noted that the image information in the training data consists of the original images, not reduced-size images, and that the training data includes multi-scale copies of the images at scale factors such as 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 2.75, and 3; using image information at multiple scales improves the detection accuracy of the later model.
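As an illustration of this step, the sketch below generates label data for one unlabeled image by running a trained label-acquisition model over multi-scale copies of it. The detector interface (a predict method returning objects with a category and corner coordinates) is a hypothetical stand-in for whatever trained model, such as DetectoRS, actually fills this role:

```python
# A minimal sketch of the pseudo-labeling step (S110); the detector API
# used here (detector.predict) is an assumption, not a fixed interface.
from PIL import Image

SCALES = [0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0, 2.25, 2.5, 2.75, 3.0]

def pseudo_label(image_path, detector):
    """Run the label-acquisition model over multi-scale copies of one image."""
    image = Image.open(image_path).convert("RGB")
    width, height = image.size
    labels = []
    for scale in SCALES:
        resized = image.resize((int(width * scale), int(height * scale)))
        for box in detector.predict(resized):
            # Map the detected corners back to original-image coordinates.
            labels.append({
                "category": box.category,
                "x1": box.x1 / scale, "y1": box.y1 / scale,
                "x2": box.x2 / scale, "y2": box.y2 / scale,
            })
    return labels
```

Detections gathered across all scales can then be merged (e.g., by non-maximum suppression) before the cleaning step below; the merge strategy is left open here.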
S120: performing data cleaning on the label data to obtain the cleaned new label data.
The specific process of performing data cleaning on the label data and acquiring the cleaned new label data may include:
1. determining the width, height, and center-point coordinate information of the bounding box based on its top-left abscissa, top-left ordinate, bottom-right abscissa, and bottom-right ordinate;
2. determining converted coordinates for the bounding box from its width and the width of the corresponding image information, and from its height and center coordinates and the height of the corresponding image information;
specifically, the width of the bounding box may be divided by the width of the corresponding image information, while the height and center coordinates of the bounding box are divided by the height of the corresponding image information, to obtain the converted coordinates of the bounding box;
3. performing data cleaning on the converted coordinates based on the open-source framework CLEANLAB to obtain the cleaned new label data.
It should be noted that the existing open-source framework CLEANLAB can clean noisy data and train a model using the cleaned data. However, it can only be used for classification tasks and cannot be applied directly to a target detection task; a target detection task can, however, be regarded as the superposition of a classification task and a regression task.
Since CLEANLAB can handle classification problems with up to 1000 classes, the regression target output values can be normalized to between 0 and 1 and discretized with a step of 0.001, giving 1000 classes 1 × 1e-3, 2 × 1e-3, ..., 1000 × 1e-3; a value falling in [(n-1) × 1e-3, n × 1e-3) belongs to the n-th class.
In the above step, the process of obtaining the converted coordinates of the bounding box normalizes all four coordinates of the bounding box to between 0 and 1, which further transforms the target detection task into a 1000-class classification problem. For example, in a neural network whose last layer has m neurons, m scalars are output, represented by a vector v. For a regression problem, the m neurons are connected to 1 neuron, yielding a single scalar w·v + b and a continuous output value. For a classification problem, the m neurons are connected to n neurons, yielding n scalars w·v + b that are then normalized into probabilities over n categories by an activation function such as softmax. In this way the regression problem can be converted into a classification problem.
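To make the cleaning step concrete, the sketch below converts corner coordinates to normalized center/width/height values, bins each value into one of the 1000 classes, and asks CLEANLAB to flag suspicious labels. It assumes cleanlab >= 2.0 and uses random stand-in arrays where real cross-validated model predictions (pred_probs) would go:

```python
# A sketch of the cleaning step (S120), assuming cleanlab >= 2.0; the
# random arrays below are stand-ins for real cross-validated predictions.
import numpy as np
from cleanlab.filter import find_label_issues

def to_normalized(box, img_w, img_h):
    """Corner coordinates -> normalized center-x, center-y, width, height."""
    cx = (box["x1"] + box["x2"]) / 2.0 / img_w
    cy = (box["y1"] + box["y2"]) / 2.0 / img_h
    w = (box["x2"] - box["x1"]) / img_w
    h = (box["y2"] - box["y1"]) / img_h
    return cx, cy, w, h

def to_bin(value, n_bins=1000):
    """A value in [(n-1)*1e-3, n*1e-3) belongs to the n-th class (0-indexed)."""
    return min(int(value * n_bins), n_bins - 1)

rng = np.random.default_rng(0)
normalized_values = rng.random(100)                  # stand-in regression targets
labels = np.array([to_bin(v) for v in normalized_values])
pred_probs = rng.dirichlet(np.ones(1000), size=100)  # stand-in model predictions

# Treat each discretized regression target as a classification label and
# keep only the values CLEANLAB does not flag as label issues.
issue_mask = find_label_issues(labels=labels, pred_probs=pred_probs)
clean_values = normalized_values[~issue_mask]
```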
S130: performing data enhancement on the new label data to obtain enhanced data corresponding to the new label data.
The new label data can be stored in a blockchain, and the step of performing data enhancement on the new label data comprises: randomly jittering the color variables of the new label data; and/or geometrically deforming the objects within the bounding boxes in the new label data; and/or geometrically deforming the new label data and transforming the bounding boxes accordingly; wherein the color variables include brightness, saturation, contrast, and transparency, and the geometric deformations include translation, flipping, shearing, and rotation.
It is emphasized that, to further ensure the privacy and security of the new label data, the new label data may also be stored in a node of a blockchain.
Specifically, the cleaned label data can also be understood as machine-labeled image data after cleaning, and the data enhancement can be performed in three ways. First: randomly jittering the color variables of the cleaned label data. Second: geometrically deforming the bounding boxes in the cleaned label data. Third: geometrically deforming the cleaned label data as a whole and transforming the bounding boxes accordingly. The color variables include brightness, saturation, contrast, transparency, and the like; the geometric deformations include translation, flipping, shearing, rotation, and the like.
In addition, after the enhanced data are obtained, positions can be randomly selected on the images of the enhanced data and rectangles filled with random color noise added there, simulating the occlusion found in real scenes; such rectangles can be added selectively according to the specific application scenario or requirement.
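A minimal sketch of these enhancement operations, using only PIL and NumPy, is given below. The jitter range and the occlusion-rectangle size limit are illustrative assumptions rather than values fixed by this embodiment, and transparency jitter would additionally require an RGBA image:

```python
# Illustrative enhancement operations for S130; parameter ranges are
# assumptions, not values taken from this embodiment.
import random
import numpy as np
from PIL import Image, ImageEnhance

def jitter_colors(image):
    """Randomly jitter brightness, saturation (Color), and contrast."""
    for enhancer in (ImageEnhance.Brightness, ImageEnhance.Color,
                     ImageEnhance.Contrast):
        image = enhancer(image).enhance(random.uniform(0.7, 1.3))
    return image

def hflip_with_boxes(image, boxes):
    """Flip the image horizontally and transform the bounding boxes with it."""
    width = image.size[0]
    flipped = image.transpose(Image.FLIP_LEFT_RIGHT)
    new_boxes = [{**b, "x1": width - b["x2"], "x2": width - b["x1"]}
                 for b in boxes]
    return flipped, new_boxes

def add_occlusion(image, max_frac=0.2):
    """Paste a random color-noise rectangle to simulate real-scene occlusion."""
    arr = np.asarray(image).copy()
    h, w = arr.shape[:2]
    rh = random.randint(1, max(1, int(h * max_frac)))
    rw = random.randint(1, max(1, int(w * max_frac)))
    y, x = random.randint(0, h - rh), random.randint(0, w - rw)
    arr[y:y + rh, x:x + rw] = np.random.randint(
        0, 256, (rh, rw, arr.shape[2]), dtype=np.uint8)
    return Image.fromarray(arr)
```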
S140: training the deep learning model based on the enhanced data and preset manually labeled image information until the loss function of the deep learning model converges within a preset range, to form the target detection model.
Optionally, the loss function comprises the sum of a supervised loss and an unsupervised loss; wherein:
the expression of the supervised loss is as follows:
L_s(x, p, t) = (1/N_cls) Σ_i Σ_b L_cls(p_i, p_{i,b}) + (1/N_reg) Σ_i Σ_b p_{i,b} · L_reg(t_i, t_b)
wherein x denotes an image, p and t denote vector information, b denotes the index of a bounding box in the manually labeled image information, i denotes the index of a prior box, p_i denotes the predicted probability that prior box i belongs to a positive sample, t_i denotes the coordinates of prior box i (the top-left and bottom-right corner coordinates), p_{i,b} = 1 when prior box i belongs to bounding box b and p_{i,b} = 0 otherwise, t_b denotes the coordinates of the manually labeled bounding box, L_s denotes the supervised loss, L_cls denotes the classification loss function, L_reg denotes the regression loss function, N_reg denotes the normalization coefficient of the regression term, and N_cls denotes the normalization coefficient of the classification term;
the expression for unsupervised loss is:
Figure RE-GDA0002956358740000072
wherein x represents an image, q represents label data of the image x, b represents a sequence number of a bounding box in the label data, i represents a sequence number of a prior box, pi represents the predicted probability that the prior box belongs to a positive sample, ti represents coordinates of the prior box (including a horizontal and vertical coordinate at the upper left corner and a horizontal and vertical coordinate at the lower right corner), and q represents the number of the prior box when the prior box i belongs to the bounding box bi,b1, otherwise qi,bIs 0, sbRepresenting the bounding box and the labeled coordinates thereof, Lu representing unsupervised loss, Lcs representing the loss function of classification, Lreg representing the loss function of regression, Nreg representing the normalization coefficient of regression term, and Ncs representing the normalization coefficient of classification term; wherein the content of the first and second substances,
ω(x) = 1 if max(p(x; θ)) ≥ τ, else 0
q(x) = ONE_HOT(argmax(p(x; θ)))
In the above formulas, θ denotes the trainable parameters of the deep learning model, and τ denotes the confidence threshold for the new label data.
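The combined loss can be sketched as follows. Binary cross-entropy stands in for L_cls and smooth L1 for L_reg; these choices, like the normalization constants passed in, are implementation assumptions rather than requirements of this embodiment:

```python
# A NumPy sketch of the loss in S140: supervised term plus a
# confidence-gated unsupervised term, following the formulas above.
import numpy as np

def cross_entropy(p, target):
    """L_cls stand-in: binary cross-entropy against 0/1 assignments."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return -(target * np.log(p) + (1 - target) * np.log(1 - p))

def smooth_l1(pred, target):
    """L_reg stand-in: smooth L1 summed over the 4 box coordinates."""
    d = np.abs(pred - target)
    return np.where(d < 1, 0.5 * d ** 2, d - 0.5).sum(axis=-1)

def detection_loss(p, t, assign, gt_boxes, n_cls, n_reg):
    """assign[i, b] = 1 iff prior box i belongs to bounding box b."""
    cls_term = cross_entropy(p[:, None], assign).sum() / n_cls
    reg_term = (assign * smooth_l1(t[:, None, :], gt_boxes[None, :, :])).sum() / n_reg
    return cls_term + reg_term

def total_loss(sup, unsup, tau=0.9):
    """L = L_s + omega(x) * L_u(x), with omega(x) = 1[max p(x; theta) >= tau]."""
    p_u = unsup[0]
    omega = 1.0 if p_u.max() >= tau else 0.0  # confidence gate on pseudo labels
    return detection_loss(*sup) + omega * detection_loss(*unsup)

# Toy usage with 4 prior boxes and 2 ground-truth boxes.
rng = np.random.default_rng(0)
p, t = rng.random(4), rng.random((4, 4))
assign = (rng.random((4, 2)) > 0.5).astype(float)
boxes = rng.random((2, 4))
args = (p, t, assign, boxes, 4, max(assign.sum(), 1.0))
print(total_loss(args, args))
```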
It should be noted that, in order to increase the data volume and prevent the manually annotated image information from accounting for too small a proportion, the quantity of the enhanced data (i.e., the number of automatically labeled images) is 10 to 15 times the quantity of the manually annotated image information; this ratio may also be set according to the specific application scenario and requirements and is not limited to these particular values.
S150: obtaining a target detection result for the data to be detected based on the target detection model.
Compared with traditional methods, the target detection method based on semi-supervised learning can greatly reduce the labeling cost while improving the detection precision: the automatic labeling can approach, in quality, the level reached after multi-stage inspection and correction, at a greatly reduced cost. It can also be combined with manual quality inspection to check and correct existing data, simplifying the data quality-inspection process; beyond labor costs, management and time costs are also saved. In addition, with modification, the existing data cleaning method can be applied to more complex scenes, providing more options for subsequently expanding the application field.
Corresponding to the target detection method based on semi-supervised learning, the invention also provides a target detection device based on semi-supervised learning.
Specifically, FIG. 2 shows the functional modules of a target detection apparatus based on semi-supervised learning according to an embodiment of the present invention.
As shown in FIG. 2, the target detection apparatus 100 based on semi-supervised learning according to the present invention can be installed in an electronic device. According to the implemented functions, the semi-supervised learning based target detection apparatus 100 may include: a tag data determination unit 101, a new tag data acquisition unit 102, an enhanced data acquisition unit 103, an object detection model formation unit 104, and a detection result acquisition unit 105. A module according to the present invention, which may also be referred to as a unit, refers to a series of computer program segments that can be executed by a processor of an electronic device, can perform a fixed function, and are stored in a memory of the electronic device.
In the present embodiment, the functions regarding the respective modules/units are as follows:
a tag data determination unit 101 configured to determine, based on the acquired training data, tag data corresponding to the training data;
a new tag data obtaining unit 102, configured to perform data cleaning processing on the tag data, and obtain new tag data after cleaning. It is emphasized that, to further ensure the privacy and security of the new tag data, the new tag data may also be stored in a node of a blockchain.
An enhanced data obtaining unit 103, configured to perform data enhancement processing on the new tag data, and obtain enhanced data corresponding to the new tag data;
a target detection model forming unit 104, which trains a deep learning model based on the enhanced data and preset artificially labeled image information until a loss function of the deep learning model converges within a preset range to form a target detection model;
a detection result obtaining unit 105, configured to obtain a target detection result of the to-be-detected data based on the target detection model.
Specifically, in the tag data determination unit 101, the step of determining the tag data corresponding to the training data includes:
performing horizontal mirror flipping on the training data to obtain the processed training data;
training an open-source model on the processed training data until the open-source model converges within a specified range, to form a label acquisition model;
and obtaining label data for the unlabeled training data from the label acquisition model.
Furthermore, the training data comprises unlabeled image information;
the label data includes the category of an object located in the image information and the top-left abscissa, top-left ordinate, bottom-right abscissa, and bottom-right ordinate of a bounding box enclosing the object.
In the new tag data obtaining unit 102, data cleaning is performed on the label data, and the step of obtaining the cleaned new label data includes:
determining the width, height, and center-point coordinate information of the bounding box based on its top-left abscissa, top-left ordinate, bottom-right abscissa, and bottom-right ordinate;
determining converted coordinates for the bounding box from its width and the width of the corresponding image information, and from its height and center coordinates and the height of the corresponding image information;
and performing data cleaning on the converted coordinates based on the open-source framework CLEANLAB to obtain the cleaned new label data.
In the enhanced data obtaining unit 103, the new label data is stored in a blockchain, and the step of performing data enhancement on the new label data includes:
randomly jittering the color variables of the new label data; and/or geometrically deforming the objects within the bounding boxes in the new label data; and/or geometrically deforming the new label data and transforming the bounding boxes accordingly; wherein:
the color variables include brightness, saturation, contrast, and transparency;
the geometric deformation includes translation, flipping, shearing, and rotation.
Further, in the target detection model forming unit 104, the loss function includes the sum of a supervised loss and an unsupervised loss; wherein:
the expression of the supervised loss is as follows:
L_s(x, p, t) = (1/N_cls) Σ_i Σ_b L_cls(p_i, p_{i,b}) + (1/N_reg) Σ_i Σ_b p_{i,b} · L_reg(t_i, t_b)
wherein x denotes an image, p and t denote vector information, b denotes the index of a bounding box in the manually labeled image information, i denotes the index of a prior box, p_i denotes the predicted probability that prior box i belongs to a positive sample, t_i denotes the coordinates of prior box i (the top-left and bottom-right corner coordinates), p_{i,b} = 1 when prior box i belongs to bounding box b and p_{i,b} = 0 otherwise, t_b denotes the coordinates of the manually labeled bounding box, L_s denotes the supervised loss, L_cls denotes the classification loss function, L_reg denotes the regression loss function, N_reg denotes the normalization coefficient of the regression term, and N_cls denotes the normalization coefficient of the classification term;
the expression of the unsupervised loss is as follows:
L_u(x, q) = ω(x) · [ (1/N_cls) Σ_i Σ_b L_cls(p_i, q_{i,b}) + (1/N_reg) Σ_i Σ_b q_{i,b} · L_reg(t_i, s_b) ]
wherein x denotes an image, q denotes the label data of the image x, b denotes the index of a bounding box in the label data, i denotes the index of a prior box, p_i denotes the predicted probability that prior box i belongs to a positive sample, t_i denotes the coordinates of prior box i (the top-left and bottom-right corner coordinates), q_{i,b} = 1 when prior box i belongs to bounding box b and q_{i,b} = 0 otherwise, s_b denotes the coordinates of the labeled bounding box, L_u denotes the unsupervised loss, L_cls denotes the classification loss function, L_reg denotes the regression loss function, N_reg denotes the normalization coefficient of the regression term, and N_cls denotes the normalization coefficient of the classification term; wherein:
ω(x) = 1 if max(p(x; θ)) ≥ τ, else 0
q(x) = ONE_HOT(argmax(p(x; θ)))
where θ represents a trainable parameter of the deep learning model and τ represents a confidence threshold for the new tag data.
Fig. 3 is a schematic structural diagram of an electronic device for implementing a target detection method based on semi-supervised learning according to the present invention.
The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program, such as a semi-supervised learning based object detection program 12, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, including flash memory, removable hard disks, multimedia cards, card-type memory (e.g., SD or DX memory), magnetic memory, magnetic disks, optical disks, and the like. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1. In other embodiments, the memory 11 may also be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, or a Flash memory Card (Flash Card) provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only to store application software installed in the electronic device 1 and various types of data, such as the code of the object detection program based on semi-supervised learning, but also to temporarily store data that has been or will be output.
The processor 10 may in some embodiments be composed of integrated circuits, for example a single packaged integrated circuit, or of multiple integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The processor 10 is the control unit of the electronic device; it connects the various components of the whole electronic device using various interfaces and lines, and executes the various functions and processes the data of the electronic device 1 by running or executing the programs or modules stored in the memory 11 (e.g., the object detection program based on semi-supervised learning) and calling the data stored in the memory 11.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. The bus is arranged to enable communication between the memory 11, the at least one processor 10, and other components.
Fig. 3 only shows an electronic device with components, and it will be understood by a person skilled in the art that the structure shown in fig. 3 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than shown, or a combination of certain components, or a different arrangement of components.
For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so as to implement functions of charge management, discharge management, power consumption management, and the like through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device 1 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
Further, the electronic device 1 may further include a network interface, and optionally, the network interface may include a wired interface and/or a wireless interface (such as a WI-FI interface, a bluetooth interface, etc.), which are generally used for establishing a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further comprise a user interface, which may be a Display (Display), an input unit (such as a Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the electronic device 1 and for displaying a visualized user interface, among other things.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The semi-supervised learning based object detection program 12 stored by the memory 11 in the electronic device 1 is a combination of instructions that, when executed in the processor 10, may implement:
determining label data corresponding to the training data based on the acquired training data;
carrying out data cleaning processing on the tag data to obtain new cleaned tag data;
performing data enhancement processing on the new tag data to acquire enhanced data corresponding to the new tag data;
training a deep learning model based on the enhanced data and preset artificially labeled image information until a loss function of the deep learning model converges in a preset range to form a target detection model;
and acquiring a target detection result of the data to be detected based on the target detection model.
Optionally, the step of determining label data corresponding to the training data includes:
performing horizontal mirror flipping on the training data to obtain the processed training data;
training an open-source model on the processed training data until the open-source model converges within a specified range, to form a label acquisition model;
and obtaining label data for the unlabeled training data from the label acquisition model.
Optionally, the training data comprises unlabeled image information;
the label data includes the category of an object located in the image information and the top-left abscissa, top-left ordinate, bottom-right abscissa, and bottom-right ordinate of a bounding box enclosing the object. Optionally, performing data cleaning on the label data and acquiring the cleaned new label data includes:
determining the width, height, and center-point coordinate information of the bounding box based on its top-left abscissa, top-left ordinate, bottom-right abscissa, and bottom-right ordinate;
dividing the width of the bounding box by the width of the corresponding image information, and simultaneously dividing the height and center coordinates of the bounding box by the height of the corresponding image information, to obtain the converted coordinate information of the bounding box;
and performing data cleaning on the converted coordinates based on the open-source framework CLEANLAB to obtain the cleaned new label data.
Optionally, the new label data is stored in a blockchain, and the step of performing data enhancement on the new label data includes:
randomly jittering the color variables of the new label data; and/or geometrically deforming the objects within the bounding boxes in the new label data; and/or geometrically deforming the new label data and transforming the bounding boxes accordingly; wherein:
the color variables include brightness, saturation, contrast, and transparency;
the geometric deformation includes translation, flipping, shearing, and rotation.
Optionally, the loss function comprises the sum of a supervised loss and an unsupervised loss; wherein:
the expression of the supervised loss is as follows:
L_s(x, p, t) = (1/N_cls) Σ_i Σ_b L_cls(p_i, p_{i,b}) + (1/N_reg) Σ_i Σ_b p_{i,b} · L_reg(t_i, t_b)
wherein x denotes an image, p and t denote vector information, b denotes the index of a bounding box in the manually labeled image information, i denotes the index of a prior box, p_i denotes the predicted probability that prior box i belongs to a positive sample, t_i denotes the coordinates of prior box i (the top-left and bottom-right corner coordinates), p_{i,b} = 1 when prior box i belongs to bounding box b and p_{i,b} = 0 otherwise, t_b denotes the coordinates of the manually labeled bounding box, L_s denotes the supervised loss, L_cls denotes the classification loss function, L_reg denotes the regression loss function, N_reg denotes the normalization coefficient of the regression term, and N_cls denotes the normalization coefficient of the classification term;
the expression of the unsupervised loss is as follows:
L_u(x, q) = ω(x) · [ (1/N_cls) Σ_i Σ_b L_cls(p_i, q_{i,b}) + (1/N_reg) Σ_i Σ_b q_{i,b} · L_reg(t_i, s_b) ]
wherein x denotes an image, q denotes the label data of the image x, b denotes the index of a bounding box in the label data, i denotes the index of a prior box, p_i denotes the predicted probability that prior box i belongs to a positive sample, t_i denotes the coordinates of prior box i (the top-left and bottom-right corner coordinates), q_{i,b} = 1 when prior box i belongs to bounding box b and q_{i,b} = 0 otherwise, s_b denotes the coordinates of the labeled bounding box, L_u denotes the unsupervised loss, L_cls denotes the classification loss function, L_reg denotes the regression loss function, N_reg denotes the normalization coefficient of the regression term, and N_cls denotes the normalization coefficient of the classification term; wherein:
ω(x) = 1 if max(p(x; θ)) ≥ τ, else 0
q(x) = ONE_HOT(argmax(p(x; θ)))
where θ represents a trainable parameter of the deep learning model and τ represents a confidence threshold for the new tag data.
Optionally, the quantity of the enhanced data is 10 to 15 times the quantity of the manually labeled image information.
Further, if the integrated modules/units of the electronic device 1 are implemented in the form of software functional units and sold or used as separate products, they may be stored in a computer-readable storage medium. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a Read-Only Memory (ROM).
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means through software or hardware. Terms such as first and second are used to denote names and do not indicate any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A target detection method based on semi-supervised learning is characterized by comprising the following steps:
determining label data corresponding to the training data based on the acquired training data;
carrying out data cleaning processing on the tag data to obtain new cleaned tag data;
performing data enhancement processing on the new tag data to acquire enhanced data corresponding to the new tag data;
training a deep learning model based on the enhanced data and preset artificially labeled image information until a loss function of the deep learning model converges in a preset range to form a target detection model;
and acquiring a target detection result of the data to be detected based on the target detection model.
2. The semi-supervised learning based object detection method of claim 1, wherein the step of determining label data corresponding to the training data comprises:
performing horizontal mirror flipping on the training data to obtain the flipped training data;
training an open-source model on the flipped training data until the open-source model converges within a specified range, to form a label acquisition model;
and obtaining label data for the unlabeled training data from the label acquisition model.
3. The semi-supervised learning based object detection method of claim 2,
the training data comprises image information without labels;
the label data includes the category of an object located in the image information and the top-left abscissa, top-left ordinate, bottom-right abscissa, and bottom-right ordinate of a bounding box enclosing the object.
4. The semi-supervised learning based object detection method as recited in claim 3, wherein the label data is subjected to data cleaning processing, and the step of acquiring the cleaned new label data comprises the following steps:
determining the width, height, and center-point coordinate information of the bounding box based on its top-left abscissa, top-left ordinate, bottom-right abscissa, and bottom-right ordinate;
determining converted coordinates for the bounding box from its width and the width of the corresponding image information, and from its height and center coordinates and the height of the corresponding image information;
and performing data cleaning on the converted coordinates based on the open-source framework CLEANLAB to obtain the cleaned new label data.
5. The semi-supervised learning based object detection method according to claim 3, wherein the new label data is stored in a blockchain, and the step of performing data enhancement on the new label data comprises:
randomly jittering the color variables of the new label data; and/or geometrically deforming the objects within the bounding boxes in the new label data; and/or geometrically deforming the new label data and transforming the bounding boxes accordingly; wherein:
the color variables include brightness, saturation, contrast, and transparency;
the geometric deformation includes translation, flipping, shearing, and rotation.
6. The semi-supervised learning based object detection method of claim 1, wherein the loss function comprises the sum of a supervised loss and an unsupervised loss; wherein:
the expression of the supervised loss is as follows:
L_s(x, p, t) = (1/N_cls) Σ_i Σ_b L_cls(p_i, p_{i,b}) + (1/N_reg) Σ_i Σ_b p_{i,b} · L_reg(t_i, t_b)
wherein x denotes an image, p and t denote vector information, b denotes the index of a bounding box in the manually labeled image information, i denotes the index of a prior box, p_i denotes the predicted probability that prior box i belongs to a positive sample, t_i denotes the coordinates of prior box i (the top-left and bottom-right corner coordinates), p_{i,b} = 1 when prior box i belongs to bounding box b and p_{i,b} = 0 otherwise, t_b denotes the coordinates of the manually labeled bounding box, L_s denotes the supervised loss, L_cls denotes the classification loss function, L_reg denotes the regression loss function, N_reg denotes the normalization coefficient of the regression term, and N_cls denotes the normalization coefficient of the classification term;
the expression of the unsupervised loss is as follows:
L_u(x, q) = ω(x) · [ (1/N_cls) Σ_i Σ_b L_cls(p_i, q_{i,b}) + (1/N_reg) Σ_i Σ_b q_{i,b} · L_reg(t_i, s_b) ]
wherein x denotes an image, q denotes the label data of the image x, b denotes the index of a bounding box in the label data, i denotes the index of a prior box, p_i denotes the predicted probability that prior box i belongs to a positive sample, t_i denotes the coordinates of prior box i (the top-left and bottom-right corner coordinates), q_{i,b} = 1 when prior box i belongs to bounding box b and q_{i,b} = 0 otherwise, s_b denotes the coordinates of the labeled bounding box, L_u denotes the unsupervised loss, L_cls denotes the classification loss function, L_reg denotes the regression loss function, N_reg denotes the normalization coefficient of the regression term, and N_cls denotes the normalization coefficient of the classification term; wherein:
ω(x) = 1 if max(p(x; θ)) ≥ τ, else 0
q(x) = ONE_HOT(argmax(p(x; θ)))
where θ represents a trainable parameter of the deep learning model and τ represents a confidence threshold for the new tag data.
7. The semi-supervised learning based object detection method of claim 1,
the quantity of the enhanced data is 10 to 15 times the quantity of the manually labeled image information.
8. An object detection apparatus based on semi-supervised learning, the apparatus comprising:
a label data determination unit configured to determine label data corresponding to the training data based on the acquired training data;
a new tag data acquisition unit, configured to perform data cleaning processing on the tag data, and acquire new tag data after cleaning;
the enhanced data acquisition unit is used for performing data enhancement processing on the new tag data to acquire enhanced data corresponding to the new tag data;
the target detection model forming unit is used for training a deep learning model based on the enhanced data and preset artificially labeled image information until the loss function of the deep learning model is converged in a preset range so as to form a target detection model;
and the detection result acquisition unit is used for acquiring a target detection result of the data to be detected based on the target detection model.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps in the semi-supervised learning based object detection method as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps in the semi-supervised learning based object detection method as recited in any one of claims 1 to 7.
CN202011288652.1A 2020-11-17 2020-11-17 Target detection method, device and storage medium based on semi-supervised learning Active CN112580684B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011288652.1A CN112580684B (en) 2020-11-17 2020-11-17 Target detection method, device and storage medium based on semi-supervised learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011288652.1A CN112580684B (en) 2020-11-17 2020-11-17 Target detection method, device and storage medium based on semi-supervised learning

Publications (2)

Publication Number Publication Date
CN112580684A true CN112580684A (en) 2021-03-30
CN112580684B CN112580684B (en) 2024-04-09

Family

ID=75122779

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011288652.1A Active CN112580684B (en) 2020-11-17 2020-11-17 Target detection method, device and storage medium based on semi-supervised learning

Country Status (1)

Country Link
CN (1) CN112580684B (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190205794A1 (en) * 2017-12-29 2019-07-04 Oath Inc. Method and system for detecting anomalies in data labels
CN110910375A (en) * 2019-11-26 2020-03-24 北京明略软件系统有限公司 Detection model training method, device, equipment and medium based on semi-supervised learning

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113139594A (en) * 2021-04-19 2021-07-20 北京理工大学 Airborne image unmanned aerial vehicle target self-adaptive detection method
CN113191409A (en) * 2021-04-20 2021-07-30 国网江苏省电力有限公司营销服务中心 Method for detecting abnormal electricity consumption behaviors of residents through tag data expansion and deep learning
CN112990374A (en) * 2021-04-28 2021-06-18 平安科技(深圳)有限公司 Image classification method, device, electronic equipment and medium
WO2022227192A1 (en) * 2021-04-28 2022-11-03 平安科技(深圳)有限公司 Image classification method and apparatus, and electronic device and medium
CN112990374B (en) * 2021-04-28 2023-09-15 平安科技(深圳)有限公司 Image classification method, device, electronic equipment and medium
CN113379322A (en) * 2021-07-06 2021-09-10 国网江苏省电力有限公司营销服务中心 Electricity stealing user distinguishing method based on tag augmentation

Also Published As

Publication number Publication date
CN112580684B (en) 2024-04-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant