CN112580684B - Target detection method, device and storage medium based on semi-supervised learning - Google Patents

Target detection method, device and storage medium based on semi-supervised learning

Info

Publication number
CN112580684B
CN112580684B (application CN202011288652.1A)
Authority
CN
China
Prior art keywords
data
bounding box
target detection
tag data
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011288652.1A
Other languages
Chinese (zh)
Other versions
CN112580684A (en)
Inventor
唐子豪
刘莉红
刘玉宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202011288652.1A
Publication of CN112580684A
Application granted
Publication of CN112580684B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/602Providing cryptographic facilities or services
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/64Protecting data integrity, e.g. using checksums, certificates or signatures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks


Abstract

The invention relates to the technical field of target detection and discloses a target detection method based on semi-supervised learning, comprising the following steps: determining label data corresponding to acquired training data; performing data cleaning on the label data to obtain cleaned new label data; performing data enhancement on the new label data to obtain enhancement data corresponding to the new label data; training a deep learning model based on the enhancement data and preset manually annotated image information until the loss function of the deep learning model converges within a preset range, forming a target detection model; and obtaining a target detection result for the data to be detected based on the target detection model. The invention also relates to blockchain technology: the new label data may be stored in a blockchain. The invention can improve the efficiency and accuracy of target detection based on semi-supervised learning.

Description

Target detection method, device and storage medium based on semi-supervised learning
Technical Field
The present invention relates to the field of target detection, and in particular to a target detection method, apparatus, electronic device, and computer-readable storage medium based on semi-supervised learning.
Background
The "manual work" behind artificial intelligence mainly refers to the large amount of human labor required to annotate data before training a model. Although public target detection datasets such as COCO exist, a target detection model applied to a real project still needs to be retrained on an annotated business dataset in order to adapt to the business data. At present, most artificial intelligence enterprises must invest considerable cost to obtain manual annotation of business data. Meanwhile, annotated data must be manually inspected, cleaned, and corrected to guarantee annotation quality. This requirement stems from the sensitivity of neural networks to data quality, so annotation projects need a multi-level annotation and review structure; for large datasets, quality can often only be established statistically through sampling inspection.
At present, although semi-supervised learning methods for classification tasks have achieved some success, semi-supervised learning for target detection is not yet mature and still suffers from problems such as limited accuracy and large data requirements.
Disclosure of Invention
The invention provides a target detection method, apparatus, electronic device, and computer-readable storage medium based on semi-supervised learning, aiming to improve the efficiency and accuracy of target detection through semi-supervised learning.
In order to achieve the above object, the present invention provides a target detection method based on semi-supervised learning, including: determining label data corresponding to acquired training data;
performing data cleaning on the label data to obtain cleaned new label data;
performing data enhancement on the new label data to obtain enhancement data corresponding to the new label data;
training a deep learning model based on the enhancement data and preset manually annotated image information until a loss function of the deep learning model converges within a preset range, to form a target detection model;
and obtaining a target detection result of the data to be detected based on the target detection model.
Optionally, the step of determining the label data corresponding to the training data includes:
performing horizontal mirror flipping on the training data and acquiring the processed training data;
training an open-source model on the processed training data until the open-source model converges within a specified range, to form a label acquisition model;
and obtaining label data for the unlabeled training data from the label acquisition model.
Optionally, the training data includes unlabeled image information;
the label data includes the object located in the image information and the upper-left abscissa, upper-left ordinate, lower-right abscissa, and lower-right ordinate of a bounding box enclosing the object.
Optionally, the step of performing data cleaning on the label data to obtain cleaned new label data includes:
determining the width, height, and center-point coordinates of the bounding box from its upper-left abscissa, upper-left ordinate, lower-right abscissa, and lower-right ordinate;
determining converted coordinates for the bounding box from its width and the width of the corresponding image information, and from its height and center coordinates and the height of the corresponding image information;
and performing data cleaning on the converted coordinates with the open-source framework cleanlab to obtain the cleaned new label data.
Optionally, the new label data is stored in a blockchain, and the step of performing data enhancement on the new label data includes:
randomly jittering the color variables of the new label data; and/or geometrically deforming the objects within the bounding boxes of the new label data; and/or geometrically deforming the new label data and transforming the bounding boxes accordingly; wherein,
the color variables include brightness, saturation, contrast, and transparency;
the geometric deformations include translation, flipping, shearing, and rotation.
Optionally, the loss function comprises the sum of a supervised loss and an unsupervised loss; wherein,
the expression of the supervised loss is:
wherein x represents an image, p and t represent vector information, b represents the sequence number of a bounding box in the manually annotated image information, i represents the sequence number of a prior box, p_i represents the predicted probability that prior box i belongs to a positive sample, t_i represents the coordinates of the prior box (the upper-left and lower-right corner coordinates), p*_{i,b} is 1 when prior box i belongs to bounding box b and 0 otherwise, t*_b represents the manually annotated coordinates of bounding box b, Ls represents the supervised loss, Lcls represents the classification loss function, Lreg represents the regression loss function, Nreg represents the normalization coefficient of the regression term, and Ncls represents the normalization coefficient of the classification term;
the expression of the unsupervised loss is:
wherein x represents the image, q represents the label data of image x, b represents the sequence number of a bounding box in the label data, i represents the sequence number of a prior box, p_i represents the predicted probability that prior box i belongs to a positive sample, t_i represents the coordinates of the prior box (the upper-left and lower-right corner coordinates), q*_{i,b} is 1 when prior box i belongs to bounding box b and 0 otherwise, s*_b represents the bounding box and its label coordinates, Lu represents the unsupervised loss, Lcls represents the classification loss function, Lreg represents the regression loss function, Nreg represents the normalization coefficient of the regression term, and Ncls represents the normalization coefficient of the classification term; wherein,
ω(x) = 1 if max(p(x; θ)) ≥ τ, else 0
q(x) = ONE_HOT(argmax(p(x; θ)))
where θ represents the trainable parameters of the deep learning model, and τ represents the confidence threshold for the new label data.
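The loss expressions themselves appear as images in the published patent and did not survive extraction into this text. From the variable definitions above, they can be reconstructed in the standard two-term, anchor-based detection form; this LaTeX rendering is an inference from those definitions, not the patent's verbatim equations:

```latex
L_s(x, p, t) = \frac{1}{N_{cls}} \sum_i L_{cls}\left(p_i,\, p^{*}_{i,b}\right)
             + \frac{1}{N_{reg}} \sum_i p^{*}_{i,b}\, L_{reg}\left(t_i,\, t^{*}_{b}\right)

L_u(x, q) = \omega(x) \left[ \frac{1}{N_{cls}} \sum_i L_{cls}\left(p_i,\, q^{*}_{i,b}\right)
          + \frac{1}{N_{reg}} \sum_i q^{*}_{i,b}\, L_{reg}\left(t_i,\, s^{*}_{b}\right) \right]
```

Here the gate ω(x) multiplying the unsupervised term is an assumption consistent with its definition below as a 0/1 confidence indicator.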
Optionally, the amount of enhancement data is 10 to 15 times the amount of manually annotated image information.
In order to solve the above problems, the present invention also provides a target detection apparatus based on semi-supervised learning, the apparatus comprising:
a label data determining unit, configured to determine, based on acquired training data, label data corresponding to the training data;
a new label data acquisition unit, configured to perform data cleaning on the label data to obtain cleaned new label data;
an enhancement data acquisition unit, configured to perform data enhancement on the new label data to obtain enhancement data corresponding to the new label data;
a target detection model forming unit, configured to train a deep learning model based on the enhancement data and preset manually annotated image information until a loss function of the deep learning model converges within a preset range, to form a target detection model;
and a detection result acquisition unit, configured to obtain a target detection result of the data to be detected based on the target detection model.
In order to solve the above-mentioned problems, the present invention also provides an electronic apparatus including:
a memory storing at least one instruction; and
and a processor that executes the instructions stored in the memory to implement the above target detection method based on semi-supervised learning.
In order to solve the above-mentioned problems, the present invention also provides a computer-readable storage medium having stored therein at least one instruction that is executed by a processor in an electronic device to implement the above-mentioned semi-supervised learning-based target detection method.
According to embodiments of the invention, label data corresponding to acquired training data is determined; data cleaning and data enhancement are then performed on the label data to obtain new label data and enhancement data; and a deep learning model is trained on the enhancement data together with preset manually annotated image information until its loss function converges within a preset range, forming a target detection model. With these features, the automatic annotation of embodiments of the invention can approach the quality of manual annotation after multi-level inspection and correction, while greatly reducing cost. Combined with manual quality inspection, the method can also be used to inspect and rework existing data, simplifying the data quality-inspection workflow and saving management and time cost in addition to labor cost.
Drawings
FIG. 1 is a flowchart of a method for detecting targets based on semi-supervised learning according to an embodiment of the present invention;
FIG. 2 is a schematic block diagram of a target detection device based on semi-supervised learning according to an embodiment of the present invention;
fig. 3 is a schematic diagram of an internal structure of an electronic device for implementing a target detection method based on semi-supervised learning according to an embodiment of the present invention;
the achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The invention provides a target detection method based on semi-supervised learning. Referring to fig. 1, a flow chart of a target detection method based on semi-supervised learning according to an embodiment of the invention is shown. The method may be performed by an apparatus, which may be implemented in software and/or hardware.
In this embodiment, the target detection method based on semi-supervised learning includes:
s110: based on the acquired training data, tag data corresponding to the training data is determined.
The training data may be unlabeled image information, and the corresponding label data is obtained from that unlabeled image information; the label data includes the category of the object in the image information and the upper-left abscissa, upper-left ordinate, lower-right abscissa, and lower-right ordinate of the bounding box enclosing the object.
Wherein the step of determining the tag data corresponding to the training data comprises:
1. Perform horizontal mirror flipping on the training data and acquire the processed training data;
2. Train the DetectoRS open-source model on the processed training data until it converges within a specified range, to form a label acquisition model;
3. Obtain label data for the unlabeled training data from the label acquisition model.
It should be noted that the image information in the training data uses original images rather than reduced-size images, and includes multi-scale images at scale factors 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 2.75, and 3; using image information at multiple scales improves the detection accuracy of the later model.
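The flipping of step 1 and the multi-scale sizes above can be sketched in plain Python. The helper names, the (x1, y1, x2, y2) pixel box format, and the rounding of scaled sizes are illustrative assumptions; the DetectoRS training loop itself is omitted.

```python
def hflip_image_and_boxes(width, boxes):
    """Horizontally mirror bounding boxes given as (x1, y1, x2, y2) pixel
    coordinates for an image of the given width. Each x becomes width - x,
    and x1/x2 are swapped so that x1 < x2 still holds; y is unchanged."""
    return [(width - x2, y1, width - x1, y2) for x1, y1, x2, y2 in boxes]

def multiscale_sizes(base_width, base_height):
    """Target sizes for the multi-scale training images, using the scale
    factors listed in the text."""
    factors = [0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 2.75, 3]
    return [(round(base_width * f), round(base_height * f)) for f in factors]
```

For example, flipping a box (10, 20, 30, 40) in a 100-pixel-wide image yields (70, 20, 90, 40).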
S120: and performing data cleaning treatment on the label data to obtain cleaned new label data.
The specific process of performing data cleaning processing on the tag data and obtaining the cleaned new tag data may include:
1. determining width, height and center point coordinate information of the bounding box based on an upper left corner abscissa, an upper left corner ordinate, a lower right corner abscissa and a lower right corner ordinate of the bounding box;
2. determining conversion coordinates corresponding to the bounding box according to the width coordinates of the bounding box, the width of the image information corresponding to the bounding box, the height and center coordinates of the bounding box and the height of the image information corresponding to the bounding box;
specifically, the width coordinates of the bounding box may be divided by the width of the image information corresponding to the bounding box, and the height and center coordinates of the bounding box may be simultaneously divided by the height of the image information corresponding to the bounding box, to obtain the transformation coordinates corresponding to the bounding box;
3. and carrying out data cleaning treatment on the converted coordinates based on an open source frame CLEANLAB to obtain cleaned new label data.
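Steps 1 and 2 can be sketched as follows. The helper name is illustrative, and where the translation is ambiguous about which dimension normalizes which quantity, this sketch assumes the common convention that x-quantities are divided by the image width and y-quantities by the image height:

```python
def bbox_to_normalized(x1, y1, x2, y2, img_w, img_h):
    """Convert corner coordinates (x1, y1, x2, y2) to a normalized
    (cx, cy, w, h) tuple with every component in [0, 1]."""
    w = x2 - x1            # box width in pixels
    h = y2 - y1            # box height in pixels
    cx = (x1 + x2) / 2     # center abscissa
    cy = (y1 + y2) / 2     # center ordinate
    return (cx / img_w, cy / img_h, w / img_w, h / img_h)
```

The resulting values in [0, 1] are exactly what the discretization described below turns into class labels for cleanlab.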
It should be noted that the existing open-source framework cleanlab can clean noisy data and train a model on the cleaned data. However, it only works for classification tasks and cannot be used directly in target detection tasks. Since a target detection task can be regarded as the superposition of a classification task and a regression task, the present target detection method based on semi-supervised learning converts the regression task into a classification task, so that cleanlab can be applied to target detection.
Since cleanlab can handle classification problems with up to 1000 classes, the regression target output value can be normalized to between 0 and 1 and then divided into one class per step of 0.001, giving 1000 classes 1×1e-3, 2×1e-3, ..., 1000×1e-3, where a value in [(n-1)×1e-3, n×1e-3) belongs to the nth class.
In the above step, obtaining the converted coordinates of the bounding box normalizes all four coordinates of the bounding box to between 0 and 1, thereby converting the target detection task into a 1000-class classification problem. For example, suppose the last layer of a neural network has m neurons, producing m scalars represented by a vector v. For a regression problem, these m neurons are connected to 1 further neuron, giving one scalar w·v+b as a continuous output. For a classification problem, the m neurons are connected to n further neurons, giving n scalars w·v+b that are normalized into probabilities over the n categories with an activation function such as softmax. In this way the regression problem can be converted into a classification problem.
S130: and carrying out data enhancement processing on the new tag data to obtain enhancement data corresponding to the new tag data.
Wherein, the new tag data can be stored in the blockchain, and the step of performing data enhancement processing on the new tag data comprises the following steps: randomly dithering the color variable of the new label data; and/or geometrically deforming the object within the bounding box in the new tag data; and/or performing geometric deformation on the new tag data, and performing corresponding transformation on the bounding box in the new tag data; wherein the color variables include brightness, saturation, contrast, and transparency; geometric deformation includes translation, flipping, shearing, and rotation.
It is emphasized that to further ensure the privacy and security of the new tag data, the new tag data may also be stored in a blockchain node.
Specifically, the label data after cleaning may be understood as image data with a machine label after cleaning, and the mode of performing data enhancement processing on the label data after cleaning includes three processing cases, the first type: performing random dithering of color variables on the cleaned label data; second case: geometrically deforming the bounding box in the cleaned label data; third case: and carrying out geometric deformation on the cleaned label data, and carrying out corresponding transformation on the bounding box. Wherein the color variables include brightness, saturation, contrast, transparency, etc.; geometric deformation includes translation, flipping, shearing, rotation, and the like.
In addition, after the enhancement data are acquired, positions can be selected randomly on the image of the enhancement data, and rectangular frames with random color noise are added, so that the shielding condition in the real scene is simulated through the rectangular frames, and the rectangular frames can be selectively added according to specific application scenes or requirements.
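A minimal sketch of two of the ideas above — color-variable jitter and a random noise rectangle — on an image held as nested lists of grayscale pixels. A real pipeline would operate on an image library's arrays; the jitter range, rectangle size, and helper names are illustrative assumptions:

```python
import random

def jitter_brightness(img, max_delta=0.2, rng=None):
    """Scale every pixel by one random factor in [1-max_delta, 1+max_delta],
    clamping results to the 0..255 range."""
    rng = rng or random.Random()
    factor = 1.0 + rng.uniform(-max_delta, max_delta)
    return [[min(255, max(0, int(p * factor))) for p in row] for row in img]

def add_noise_rect(img, rect_w, rect_h, rng=None):
    """Overwrite one randomly placed rect_w x rect_h region with random
    noise, simulating occlusion in a real scene. Returns a new image."""
    rng = rng or random.Random()
    h, w = len(img), len(img[0])
    x0 = rng.randrange(0, w - rect_w + 1)
    y0 = rng.randrange(0, h - rect_h + 1)
    out = [row[:] for row in img]          # copy so the input is untouched
    for y in range(y0, y0 + rect_h):
        for x in range(x0, x0 + rect_w):
            out[y][x] = rng.randrange(256)
    return out
```

Passing an explicit `rng` makes the augmentation reproducible, which is convenient when regenerating an enhancement set.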
S140: training a deep learning model based on the enhancement data and preset manually-marked image information until a loss function of the deep learning model is converged within a preset range to form a target detection model.
Optionally, the loss function comprises a sum of supervised loss and unsupervised loss; wherein,
the expression for supervised loss is:
wherein x represents an image, p and t represent vector information, b represents a sequence number of a bounding box in manually marked image information, i represents a sequence number of a priori frame, and pi represents a predicted priori frameProbability of belonging to positive samples, ti denotes the coordinates of the prior box (including the abscissa of the upper left corner and the ordinate of the lower right corner), p when the prior box i belongs to bounding box b i,b *1, otherwise p i,b * Is 0, t b * Representing manually-labeled coordinates of the bounding box, ls representing the supervisual loss, lcls representing the classified loss function, lreg representing the regressive loss function, nreg representing the normalization coefficient of the regression term, ncls representing the normalization coefficient of the classification term;
the expression for unsupervised loss is:
wherein x represents the image, q represents the label data of the image x, b represents the sequence number of the bounding box in the label data, i represents the sequence number of the prior box, pi represents the probability that the predicted prior box belongs to a positive sample, ti represents the coordinates (including the upper left-hand and lower right-hand) of the prior box, and q when the prior box i belongs to the bounding box b i,b *1, otherwise q i,b * Is 0, s b * Representing bounding boxes and their labeling coordinates, lu representing unsupervised loss, lcls representing the classified loss function, lreg representing the regressive loss function, nreg representing the normalization coefficient of the regression term, ncls representing the normalization coefficient of the classified term; wherein,
ω(x)=1if max(p(x;θ))≥τelse0
q(x)=ONE_HOT(arg max(p(x;θ)))
in the above formula, θ represents a parameter trainable by the deep learning model, and τ represents a confidence threshold of new tag data.
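The gating terms ω(x) and q(x) above can be sketched in plain Python, with p(x; θ) stood in by a precomputed probability vector; the helper names are illustrative:

```python
def omega(probs, tau):
    """w(x): 1 if the model's top confidence reaches the threshold tau,
    else 0, so only confident predictions become pseudo-labels."""
    return 1 if max(probs) >= tau else 0

def one_hot_pseudo_label(probs):
    """q(x): a one-hot vector at the argmax of the predicted distribution."""
    k = probs.index(max(probs))
    return [1 if i == k else 0 for i in range(len(probs))]

def gated_unsupervised_loss(probs, tau, loss_value):
    """Multiply a per-image unsupervised loss by w(x), so low-confidence
    predictions contribute nothing to training."""
    return omega(probs, tau) * loss_value
```

For instance, a prediction with top confidence 0.85 passes a threshold τ = 0.8, while one with top confidence 0.4 is gated out entirely.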
It should be noted that, to increase the data volume and avoid too small a proportion of manually annotated image information, the amount of enhancement data (i.e., the number of machine-labeled images) is 10 to 15 times the amount of manually annotated image information; this ratio can also be set according to the specific application scenario and requirements, and is not limited to these values.
S150: and acquiring a target detection result of the data to be detected based on the target detection model.
Compared with the traditional method, the automatic labeling can reach the quality approaching degree after the multi-stage inspection and correction, and the cost is greatly reduced; the method can also be used for checking and reforming the existing data by matching with manual quality check, and simplifying the data quality check flow; in addition, management and time costs can be saved in addition to labor costs. In addition, the method can be applied to more complicated scenes by modifying the existing data cleaning method, and provides a more alternative scheme for the subsequent expansion of the application field.
Corresponding to the target detection method based on semi-supervised learning, the invention also provides a target detection device based on semi-supervised learning.
Specifically, FIG. 2 shows the functional modules of a target detection apparatus based on semi-supervised learning according to an embodiment of the present invention.
As shown in FIG. 2, the target detection apparatus 100 based on semi-supervised learning according to the present invention may be installed in an electronic device. Depending on the functions implemented, the semi-supervised-learning-based target detection apparatus 100 may include: a label data determining unit 101, a new label data acquisition unit 102, an enhancement data acquisition unit 103, a target detection model forming unit 104, and a detection result acquisition unit 105. A module of the present invention may also be referred to as a unit, meaning a series of computer program segments stored in the memory of the electronic device that can be executed by the processor of the electronic device and perform a fixed function.
In the present embodiment, the functions concerning the respective modules/units are as follows:
a label data determining unit 101, configured to determine, based on acquired training data, label data corresponding to the training data;
a new label data acquisition unit 102, configured to perform data cleaning on the label data and obtain cleaned new label data; it is emphasized that, to further ensure the privacy and security of the new label data, the new label data may also be stored in a blockchain node;
an enhancement data acquisition unit 103, configured to perform data enhancement on the new label data and obtain enhancement data corresponding to the new label data;
a target detection model forming unit 104, configured to train a deep learning model based on the enhancement data and preset manually annotated image information until a loss function of the deep learning model converges within a preset range, to form a target detection model;
and a detection result acquisition unit 105, configured to obtain a target detection result of the data to be detected based on the target detection model.
Specifically, in the label data determining unit 101, the step of determining the label data corresponding to the training data includes:
performing horizontal mirror flipping on the training data and acquiring the processed training data;
training an open-source model on the processed training data until the open-source model converges within a specified range, to form a label acquisition model;
and obtaining label data for the unlabeled training data from the label acquisition model.
In addition, the training data includes unlabeled image information;
the label data includes the object located in the image information and the upper-left abscissa, upper-left ordinate, lower-right abscissa, and lower-right ordinate of a bounding box enclosing the object.
In the new label data acquisition unit 102, performing data cleaning on the label data and obtaining cleaned new label data includes:
determining the width, height, and center-point coordinates of the bounding box from its upper-left abscissa, upper-left ordinate, lower-right abscissa, and lower-right ordinate;
determining the converted coordinates of the bounding box from its width and the width of the corresponding image information, and from its height and center coordinates and the height of the corresponding image information;
and performing data cleaning on the converted coordinates with the open-source framework cleanlab to obtain the cleaned new label data.
In the enhancement data acquisition unit 103, the new label data is stored in a blockchain, and the step of performing data enhancement on the new label data includes:
randomly jittering the color variables of the new label data; and/or geometrically deforming the objects within the bounding boxes of the new label data; and/or geometrically deforming the new label data and transforming the bounding boxes accordingly; wherein,
the color variables include brightness, saturation, contrast, and transparency;
the geometric deformations include translation, flipping, shearing, and rotation.
Further, in the target detection model forming unit 104, the loss function includes a sum of supervised loss and unsupervised loss; wherein,
the expression of the supervised loss is:
wherein x represents an image, p and t represent vector information, b represents a sequence number of a bounding box in manually marked image information, i represents a sequence number of a priori frame, pi represents a probability that a predicted priori frame belongs to a positive sample, ti represents coordinates (including an upper left abscissa and a lower right ordinate) of the priori frame, and when the priori frame i belongs to the bounding box b, p i,b *1, otherwise p i,b * Is 0, t b * Artificially labeled coordinates representing bounding boxes, ls representing superselected loss, lcls representing classified loss functions, lreg representing regressive loss functions, nreg representing normalization of regression termsCoefficients, ncls, represent normalized coefficients of the classification term;
the expression of the unsupervised loss is:
wherein x represents the image, q represents the label data of the image x, b represents the sequence number of the bounding box in the label data, i represents the sequence number of the prior box, pi represents the probability that the predicted prior box belongs to a positive sample, ti represents the coordinates (including the upper left-hand and lower right-hand) of the prior box, and q when the prior box i belongs to the bounding box b i,b *1, otherwise q i,b * Is 0, s b * Representing bounding boxes and their labeling coordinates, lu representing unsupervised loss, lcls representing the classified loss function, lreg representing the regressive loss function, nreg representing the normalization coefficient of the regression term, ncls representing the normalization coefficient of the classified term; wherein,
ω(x) = 1 if max(p(x; θ)) ≥ τ, else 0
q(x) = ONE_HOT(arg max(p(x; θ)))
where θ represents the trainable parameters of the deep learning model, and τ represents the confidence threshold for the new tag data.
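Neither printed loss formula survived the page extraction above. As a hedged reconstruction, consistent with the variable definitions given here and with the Faster R-CNN-style loss whose notation these definitions follow (not the patent's verbatim equations), the two losses would take the form:

```latex
L_s(x, p, t) \;=\; \frac{1}{N_{cls}} \sum_i L_{cls}\!\left(p_i,\, p^{*}_{i,b}\right)
            \;+\; \frac{1}{N_{reg}} \sum_i p^{*}_{i,b}\, L_{reg}\!\left(t_i,\, t^{*}_{b}\right)

L_u(x, q) \;=\; \omega(x)\left[\frac{1}{N_{cls}} \sum_i L_{cls}\!\left(p_i,\, q_{i,b}\right)
            \;+\; \frac{1}{N_{reg}} \sum_i q_{i,b}\, L_{reg}\!\left(t_i,\, s^{*}_{b}\right)\right]
```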
Fig. 3 is a schematic structural diagram of an electronic device for implementing the target detection method based on semi-supervised learning according to the present invention.
The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program stored in the memory 11 and executable on the processor 10, such as a semi-supervised learning based object detection program 12.
The memory 11 includes at least one type of readable storage medium, including a flash memory, a mobile hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a magnetic memory, a magnetic disk, an optical disk, and the like. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1. The memory 11 may in other embodiments also be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, or a Flash memory Card (Flash Card) provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only to store application software installed in the electronic device 1 and various types of data, such as the code of the target detection program based on semi-supervised learning, but also to temporarily store data that has been output or is to be output.
The processor 10 may in some embodiments be composed of integrated circuits, for example a single packaged integrated circuit, or of multiple integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 10 is the control unit of the electronic device: it connects the components of the entire electronic device using various interfaces and lines, and performs the various functions of the electronic device 1 and processes data by running or executing programs or modules stored in the memory 11 (e.g., the target detection program based on semi-supervised learning) and invoking data stored in the memory 11.
The bus may be a peripheral component interconnect standard (peripheral component interconnect, PCI) bus or an extended industry standard architecture (extended industry standard architecture, EISA) bus, among others. The bus may be classified as an address bus, a data bus, a control bus, etc. The bus is arranged to enable a connection communication between the memory 11 and at least one processor 10 etc.
Fig. 3 shows only an electronic device with certain components; it will be understood by those skilled in the art that the structure shown in Fig. 3 does not constitute a limitation of the electronic device 1, which may comprise fewer or more components than shown, combine certain components, or arrange the components differently.
For example, although not shown, the electronic device 1 may further include a power source (such as a battery) for supplying power to each component, and preferably, the power source may be logically connected to the at least one processor 10 through a power management device, so that functions of charge management, discharge management, power consumption management, and the like are implemented through the power management device. The power supply may also include one or more of any of a direct current or alternating current power supply, recharging device, power failure detection circuit, power converter or inverter, power status indicator, etc. The electronic device 1 may further include various sensors, bluetooth modules, wi-Fi modules, etc., which will not be described herein.
Further, the electronic device 1 may also comprise a network interface, optionally the network interface may comprise a wired interface and/or a wireless interface (e.g. WI-FI interface, bluetooth interface, etc.), typically used for establishing a communication connection between the electronic device 1 and other electronic devices.
The electronic device 1 may optionally further comprise a user interface, which may be a display and/or an input unit such as a keyboard (Keyboard), and which may be a standard wired interface or a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch display, or the like. The display may also be referred to as a display screen or display unit, and is used for displaying information processed in the electronic device 1 and for displaying a visual user interface.
It should be understood that the embodiments described are for illustrative purposes only and do not limit the scope of the patent application.
The object detection program 12 based on semi-supervised learning stored by the memory 11 in the electronic device 1 is a combination of instructions that, when executed in the processor 10, may implement:
determining label data corresponding to the training data based on the acquired training data;
performing data cleaning treatment on the label data to obtain cleaned new label data;
performing data enhancement processing on the new tag data to obtain enhancement data corresponding to the new tag data;
training a deep learning model based on the enhancement data and preset manually marked image information until a loss function of the deep learning model is converged within a preset range to form a target detection model;
and acquiring a target detection result of the data to be detected based on the target detection model.
Optionally, the step of determining the tag data corresponding to the training data includes:
performing horizontal mirror flipping processing on the training data, and acquiring the processed training data;
training an open source model based on the processed training data until the open source model is converged in a specified range to form a label acquisition model;
and acquiring label data of the label-free training data according to the label acquisition model.
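The horizontal mirror flipping step above can be sketched as follows. The patent only states that the training images are mirror-flipped; the corner-format box remapping in this sketch is an illustrative assumption:

```python
import numpy as np


def hflip_with_boxes(image, boxes):
    """Horizontally mirror an H x W x C image array and remap its
    corner-format boxes (x1, y1, x2, y2) to the mirrored frame.

    The remapping (x -> w - 1 - x, swapping x1/x2 so that x1 <= x2
    still holds) is an illustrative assumption, not taken verbatim
    from the patent.
    """
    w = image.shape[1]
    flipped = image[:, ::-1].copy()  # reverse the pixel columns
    remapped = [(w - 1 - x2, y1, w - 1 - x1, y2) for x1, y1, x2, y2 in boxes]
    return flipped, remapped
```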
Optionally, the training data includes unlabeled image information;
the tag data includes an object located on the image information, and an upper left-hand abscissa, an upper left-hand ordinate, a lower right-hand abscissa, and a lower right-hand ordinate of a bounding box for bounding the object.
Optionally, the step of performing data cleaning processing on the tag data to obtain cleaned new tag data includes:
determining width, height and center point coordinate information of the bounding box based on an upper left-hand abscissa, an upper left-hand ordinate, a lower right-hand abscissa and a lower right-hand ordinate of the bounding box;
dividing the width of the bounding box and the abscissa of its center point by the width of the image information corresponding to the bounding box, and dividing the height of the bounding box and the ordinate of its center point by the height of the image information corresponding to the bounding box, to obtain converted coordinate information corresponding to the bounding box;
and performing data cleaning processing on the converted coordinates based on the open-source framework CLEANLAB to obtain the cleaned new tag data.
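The coordinate conversion described above can be sketched as a small helper. The patent's own division scheme is stated ambiguously, so the code below assumes the conventional YOLO-style normalization (x-quantities divided by the image width, y-quantities by the image height):

```python
def to_normalized_box(x1, y1, x2, y2, img_w, img_h):
    """Convert corner coordinates (upper-left x1, y1; lower-right
    x2, y2) into a normalized (cx, cy, w, h) tuple.

    x-quantities are divided by the image width and y-quantities by
    the image height -- an assumed YOLO-style convention, since the
    patent text does not state the scheme unambiguously.
    """
    bw = x2 - x1            # bounding-box width in pixels
    bh = y2 - y1            # bounding-box height in pixels
    cx = x1 + bw / 2.0      # center-point abscissa
    cy = y1 + bh / 2.0      # center-point ordinate
    return (cx / img_w, cy / img_h, bw / img_w, bh / img_h)
```

The normalized tuples are what would then be fed, per class, to the CLEANLAB-based cleaning step.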
Optionally, the new tag data is stored in a blockchain, and the step of performing data enhancement processing on the new tag data includes:
randomly dithering the color variable of the new tag data; and/or geometrically deforming the object in the bounding box in the new tag data; and/or performing geometric deformation on the new tag data, and performing corresponding transformation on the bounding box in the new tag data; wherein,
the color variables include brightness, saturation, contrast, and transparency;
the geometric deformation includes translation, flipping, shearing, and rotation.
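A minimal sketch of the color-variable jitter described above. Restricting the sketch to two of the four named color variables, and the jitter ranges themselves, are illustrative choices; the patent does not fix the jitter magnitudes:

```python
import random

import numpy as np


def jitter_colors(image, rng=None, max_delta=0.2):
    """Randomly jitter the brightness (additive) and contrast
    (multiplicative) of a float image with values in [0, 1].

    max_delta and the float-image convention are illustrative
    assumptions, not values from the patent.
    """
    rng = rng if rng is not None else random.Random()
    brightness = rng.uniform(-max_delta, max_delta)       # additive shift
    contrast = 1.0 + rng.uniform(-max_delta, max_delta)   # scale around the mean
    mean = image.mean()
    out = (image - mean) * contrast + mean + brightness
    return np.clip(out, 0.0, 1.0)  # keep the result a valid image
```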
Optionally, the loss function comprises a sum of supervised loss and unsupervised loss; wherein,
the expression of the supervised loss is:
wherein x represents an image, p and t represent vector information, b represents the sequence number of a bounding box in the manually marked image information, i represents the sequence number of a priori frame, p_i represents the probability that the predicted priori frame belongs to a positive sample, and t_i represents the coordinates of the priori frame (including the upper left-corner abscissa and the lower right-corner ordinate); when the priori frame i belongs to the bounding box b, p*_{i,b} is 1, otherwise p*_{i,b} is 0; t*_b represents the manually marked coordinates of the bounding box, L_s represents the supervised loss, L_cls represents the classification loss function, L_reg represents the regression loss function, N_reg represents the normalization coefficient of the regression term, and N_cls represents the normalization coefficient of the classification term;
the expression of the unsupervised loss is:
wherein x represents the image, q represents the tag data of the image x, b represents the sequence number of a bounding box in the tag data, i represents the sequence number of a priori frame, p_i represents the probability that the predicted priori frame belongs to a positive sample, and t_i represents the coordinates of the priori frame (including the upper left-corner abscissa and the lower right-corner ordinate); when the priori frame i belongs to the bounding box b, q_{i,b} is 1, otherwise q_{i,b} is 0; s*_b represents the bounding box and its labeled coordinates, L_u represents the unsupervised loss, L_cls represents the classification loss function, L_reg represents the regression loss function, N_reg represents the normalization coefficient of the regression term, and N_cls represents the normalization coefficient of the classification term; wherein,
ω(x) = 1 if max(p(x; θ)) ≥ τ, else 0
q(x) = ONE_HOT(arg max(p(x; θ)))
where θ represents the trainable parameters of the deep learning model, and τ represents the confidence threshold for the new tag data.
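The ω and q rules above translate directly into code. The sketch below assumes a per-box class-probability vector as input, and τ = 0.9 is only a placeholder default, since the patent leaves the confidence threshold as a tunable value:

```python
import numpy as np


def pseudo_label(probs, tau=0.9):
    """Apply the patent's omega/q rule to one class-probability vector:
    omega = 1 if max(p) >= tau else 0, and q = ONE_HOT(arg max(p)).

    A prediction only contributes to the unsupervised loss when
    omega is 1, i.e. when the model is confident enough.
    """
    probs = np.asarray(probs, dtype=float)
    omega = 1 if probs.max() >= tau else 0  # keep only confident predictions
    q = np.zeros_like(probs)
    q[probs.argmax()] = 1.0                 # one-hot pseudo label
    return omega, q
```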
Optionally, the number of the enhancement data is 10 to 15 times the number of the manually marked image information.
Further, the modules/units integrated in the electronic device 1 may, if implemented in the form of software functional units and sold or used as separate products, be stored in a computer-readable storage medium. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, and a Read-Only Memory (ROM).
In the several embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be other manners of division when actually implemented.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units can be realized in a form of hardware or a form of hardware and a form of software functional modules.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
The blockchain is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another by cryptographic means, each data block containing a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. A plurality of units or means recited in the system claims can also be implemented by one unit or means through software or hardware. The terms first, second, etc. are used to denote names and do not denote any particular order.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.

Claims (7)

1. A target detection method based on semi-supervised learning, the method comprising:
determining label data corresponding to the training data based on the acquired training data;
performing data cleaning treatment on the label data to obtain cleaned new label data;
performing data enhancement processing on the new tag data to obtain enhancement data corresponding to the new tag data;
training a deep learning model based on the enhancement data and preset manually marked image information until a loss function of the deep learning model is converged within a preset range to form a target detection model;
acquiring a target detection result of the data to be detected based on the target detection model; the step of determining the tag data corresponding to the training data includes:
performing horizontal mirror image overturning treatment on the training data to obtain overturning treated training data;
training an open source model based on the training data after the overturning treatment until the open source model is converged in a specified range to form a label acquisition model;
acquiring label data of the unlabeled training data according to the label acquisition model;
the training data comprises image information without labels;
the tag data includes an object located on the image information, an upper left-hand abscissa, an upper left-hand ordinate, a lower right-hand abscissa, and a lower right-hand ordinate of a bounding box for bounding the object; the step of carrying out data cleaning treatment on the label data and obtaining the cleaned new label data comprises the following steps:
determining width, height and center point coordinate information of the bounding box based on an upper left-hand abscissa, an upper left-hand ordinate, a lower right-hand abscissa and a lower right-hand ordinate of the bounding box;
determining conversion coordinates corresponding to the bounding box according to the width coordinates of the bounding box, the width of the image information corresponding to the bounding box, the height and center coordinates of the bounding box and the height of the image information corresponding to the bounding box;
and carrying out data cleaning treatment on the converted coordinates based on an open source frame CLEANLAB to obtain cleaned new label data.
2. The semi-supervised learning based target detection method as set forth in claim 1, wherein the new tag data is stored in a blockchain, and the step of performing data enhancement processing on the new tag data includes:
randomly dithering the color variable of the new tag data; and/or geometrically deforming the object in the bounding box in the new tag data; and/or performing geometric deformation on the new tag data, and performing corresponding transformation on the bounding box in the new tag data; wherein,
the color variables include brightness, saturation, contrast, and transparency;
the geometric deformation includes translation, flipping, shearing, and rotation.
3. The semi-supervised learning based target detection method as recited in claim 1, wherein the loss function includes a sum of supervised and unsupervised losses; wherein,
the expression of the supervised loss is:
wherein x represents an image, p and t represent vector information, b represents the sequence number of a bounding box in the manually labeled image information, i represents the sequence number of a priori frame, p_i represents the probability that the predicted priori frame belongs to a positive sample, and t_i represents the coordinates of the priori frame, including the abscissa of the upper left corner and the ordinate of the lower right corner; when the priori frame i belongs to the bounding box b, p*_{i,b} is 1, otherwise p*_{i,b} is 0; t*_b represents the manually annotated coordinates of the bounding box, L_s represents the supervised loss, L_cls represents the classification loss function, L_reg represents the regression loss function, N_reg represents the normalization coefficient of the regression term, and N_cls represents the normalization coefficient of the classification term;
the expression of the unsupervised loss is:
wherein x represents an image, q represents the tag data of the image x, b represents the sequence number of a bounding box in the tag data, i represents the sequence number of a priori frame, p_i represents the probability that the predicted priori frame belongs to a positive sample, and t_i represents the coordinates of the priori frame, including the abscissa of the upper left corner and the ordinate of the lower right corner; when the priori frame i belongs to the bounding box b, q_{i,b} is 1, otherwise q_{i,b} is 0; s*_b represents the bounding box and its labeled coordinates, L_u represents the unsupervised loss, L_cls represents the classification loss function, L_reg represents the regression loss function, N_reg represents the normalization coefficient of the regression term, and N_cls represents the normalization coefficient of the classification term; wherein,
ω(y) = 1 if max(p(y; θ)) ≥ τ, else 0
q(y) = ONE_HOT(arg max(p(y; θ)))
where y represents an unknown parameter, p represents a probability, θ represents a parameter trainable by the deep learning model, and τ represents a confidence threshold for the new tag data.
4. The method for detecting a target based on semi-supervised learning as set forth in claim 1,
the number of the enhancement data is 10-15 times of the number of the image information marked manually.
5. A semi-supervised learning based object detection apparatus, the apparatus comprising:
a tag data determining unit configured to determine tag data corresponding to acquired training data based on the training data; wherein the step of determining the tag data corresponding to the training data includes:
performing horizontal mirror image overturning treatment on the training data to obtain overturning treated training data;
training an open source model based on the training data after the overturning treatment until the open source model is converged in a specified range to form a label acquisition model;
acquiring label data of the unlabeled training data according to the label acquisition model;
the new tag data acquisition unit is used for carrying out data cleaning treatment on the tag data to acquire cleaned new tag data;
the enhancement data acquisition unit is used for carrying out data enhancement processing on the new tag data and acquiring enhancement data corresponding to the new tag data;
the target detection model forming unit is used for training a deep learning model based on the enhancement data and preset manually marked image information until a loss function of the deep learning model converges within a preset range, so as to form a target detection model;
the detection result acquisition unit is used for acquiring a target detection result of the data to be detected based on the target detection model;
the training data comprises image information without labels;
the tag data includes an object located on the image information, an upper left-hand abscissa, an upper left-hand ordinate, a lower right-hand abscissa, and a lower right-hand ordinate of a bounding box for bounding the object; the step of carrying out data cleaning treatment on the label data and obtaining the cleaned new label data comprises the following steps:
determining width, height and center point coordinate information of the bounding box based on an upper left-hand abscissa, an upper left-hand ordinate, a lower right-hand abscissa and a lower right-hand ordinate of the bounding box;
determining conversion coordinates corresponding to the bounding box according to the width coordinates of the bounding box, the width of the image information corresponding to the bounding box, the height and center coordinates of the bounding box and the height of the image information corresponding to the bounding box;
and carrying out data cleaning treatment on the converted coordinates based on an open source frame CLEANLAB to obtain cleaned new label data.
6. An electronic device, the electronic device comprising:
at least one processor; the method comprises the steps of,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps in the semi-supervised learning based objective detection method as recited in any of claims 1 to 4.
7. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the steps of the semi-supervised learning based object detection method as recited in any of claims 1 to 4.
CN202011288652.1A 2020-11-17 2020-11-17 Target detection method, device and storage medium based on semi-supervised learning Active CN112580684B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011288652.1A CN112580684B (en) 2020-11-17 2020-11-17 Target detection method, device and storage medium based on semi-supervised learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011288652.1A CN112580684B (en) 2020-11-17 2020-11-17 Target detection method, device and storage medium based on semi-supervised learning

Publications (2)

Publication Number Publication Date
CN112580684A CN112580684A (en) 2021-03-30
CN112580684B true CN112580684B (en) 2024-04-09

Family

ID=75122779

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011288652.1A Active CN112580684B (en) 2020-11-17 2020-11-17 Target detection method, device and storage medium based on semi-supervised learning

Country Status (1)

Country Link
CN (1) CN112580684B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113139594B (en) * 2021-04-19 2023-05-02 北京理工大学 Self-adaptive detection method for airborne image unmanned aerial vehicle target
CN113191409A (en) * 2021-04-20 2021-07-30 国网江苏省电力有限公司营销服务中心 Method for detecting abnormal electricity consumption behaviors of residents through tag data expansion and deep learning
CN112990374B (en) * 2021-04-28 2023-09-15 平安科技(深圳)有限公司 Image classification method, device, electronic equipment and medium
CN113379322A (en) * 2021-07-06 2021-09-10 国网江苏省电力有限公司营销服务中心 Electricity stealing user distinguishing method based on tag augmentation

Citations (1)

Publication number Priority date Publication date Assignee Title
CN110910375A (en) * 2019-11-26 2020-03-24 北京明略软件系统有限公司 Detection model training method, device, equipment and medium based on semi-supervised learning

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US11238365B2 (en) * 2017-12-29 2022-02-01 Verizon Media Inc. Method and system for detecting anomalies in data labels

Patent Citations (1)

Publication number Priority date Publication date Assignee Title
CN110910375A (en) * 2019-11-26 2020-03-24 北京明略软件系统有限公司 Detection model training method, device, equipment and medium based on semi-supervised learning

Also Published As

Publication number Publication date
CN112580684A (en) 2021-03-30

Similar Documents

Publication Publication Date Title
CN112580684B (en) Target detection method, device and storage medium based on semi-supervised learning
CN113283446B (en) Method and device for identifying object in image, electronic equipment and storage medium
CN108376267A (en) A kind of zero sample classification method based on classification transfer
CN112052850B (en) License plate recognition method and device, electronic equipment and storage medium
CN111932547B (en) Method and device for segmenting target object in image, electronic device and storage medium
WO2021151277A1 (en) Method and apparatus for determining severity of damage on target object, electronic device, and storage medium
WO2022141858A1 (en) Pedestrian detection method and apparatus, electronic device, and storage medium
CN111932564A (en) Picture identification method and device, electronic equipment and computer readable storage medium
CN112308237A (en) Question and answer data enhancement method and device, computer equipment and storage medium
CN113222063A (en) Express carton garbage classification method, device, equipment and medium
CN113298159A (en) Target detection method and device, electronic equipment and storage medium
CN115690615B (en) Video stream-oriented deep learning target recognition method and system
CN114627435B (en) Intelligent light adjusting method, device, equipment and medium based on image recognition
CN113792801B (en) Method, device, equipment and storage medium for detecting face dazzling degree
CN112580505B (en) Method and device for identifying network point switch door state, electronic equipment and storage medium
CN112233194B (en) Medical picture optimization method, device, equipment and computer readable storage medium
CN114049676A (en) Fatigue state detection method, device, equipment and storage medium
Han et al. Automatic used mobile phone color determination: Enhancing the used mobile phone recycling in China
WO2020172767A1 (en) Electronic purchase order recognition method and apparatus, and terminal device.
CN113850207B (en) Micro-expression classification method and device based on artificial intelligence, electronic equipment and medium
CN113239876B (en) Training method for large-angle face recognition model
CN113408424B (en) Article identification method, apparatus, electronic device and storage medium
WO2023178798A1 (en) Image classification method and apparatus, and device and medium
CN116805377A (en) Image classification method and device based on domain features, electronic equipment and storage medium
CN117408257A (en) Entity identification method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant