CN110610230A - Station caption detection method and device and readable storage medium - Google Patents
- Publication number
- CN110610230A (application CN201910698120.6A)
- Authority
- CN
- China
- Prior art keywords
- station caption
- loss
- neural network
- training
- station
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
Abstract
The invention discloses a station caption detection method, a device and a readable storage medium, wherein the method comprises the following steps: acquiring a station caption data set, and grouping the station caption data set to acquire a station caption training set; constructing a multi-loss fusion twin neural network, and training the constructed multi-loss fusion twin neural network based on the station caption training set to obtain a trained multi-loss fusion twin neural network; and detecting the station caption to be detected through the trained multi-loss fusion twin neural network. By constructing a twin neural network framework, the method largely eliminates the adverse effect of an insufficient number of samples on network training, and can better detect unknown new types of sensitive station captions.
Description
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a station caption detection method, a station caption detection device and a readable storage medium.
Background
With the development of science and technology and changes in information carriers, all kinds of information on the internet now reach every corner of society. The empowerment brought by the network gives people a greater voice, but it also gives those who spread harmful information an opportunity. Because the internet influences the vast number of netizens through information dissemination and public opinion guidance, maintaining network information security has become a primary task.
Existing station caption data covers many types but each type has very little data; the workload of manual labeling in the early stage is large, and using conventional object detection is therefore both difficult and costly.
Disclosure of Invention
The embodiment of the invention provides a station caption detection method, a station caption detection device and a readable storage medium.
In a first aspect, a first embodiment of the present invention provides a station caption detecting method, including the following steps:
acquiring a station caption data set, and grouping the station caption data set to acquire a station caption training set;
constructing a multi-loss fusion twin neural network, and training the constructed multi-loss fusion twin neural network based on the station caption training set to obtain a trained multi-loss fusion twin neural network;
and detecting the station caption to be detected through the trained multi-loss fusion twin neural network.
Optionally, the acquiring the station caption data set includes:
acquiring a specified amount of picture data from a public data set and frames intercepted by an existing video, and cutting the picture data into picture clips with set sizes;
carrying out random processing on the existing vector station caption, and adding the vector station caption after the random processing as a watermark to different positions of the picture clip to obtain a station caption picture set;
classifying the station caption picture set according to the type of the station caption to obtain a station caption positive sample;
acquiring a plurality of pure background pictures, adding other watermarks to a set number of the pure background pictures to obtain watermark background pictures, and combining the watermark background pictures and the residual number of the pure background pictures to form a station caption negative sample;
and forming a station caption data set according to the station caption positive sample and the station caption negative sample.
Optionally, the grouping the station caption data sets to obtain a station caption training set includes:
and randomly arranging the station caption data sets, and dividing the randomly arranged station caption data sets into a station caption training set and a station caption testing set according to a proportion.
Optionally, the constructing a multi-loss fused twin neural network includes:
constructing a residual error neural network comprising a set depth;
constructing two residual error neural sub-networks with the same structure according to the residual error neural network;
constructing a contrast loss layer, connecting the outputs of the two residual neural sub-networks to the input of the contrast loss layer.
Optionally, the training the constructed multi-loss fusion twin neural network based on the station caption training set to obtain the trained multi-loss fusion twin neural network includes:
dividing the station caption training set and the station caption testing set into two equal parts;
keeping the corresponding relation unchanged, and disordering the data of the station caption training set and the station caption testing set after the two equal parts;
inputting the disturbed station caption training set and the disturbed station caption testing set into the two residual error neural sub-networks in pairs respectively;
performing similarity contrast on input data through a contrast-loss layer to train the multi-loss fused twin neural network.
Optionally, the performing similarity comparison on the input data through a contrast loss layer to train the multi-loss fused twin neural network includes:
constructing a classification loss function of a cost function layer of a residual neural subnetwork;
performing data processing on the classification loss values output by the two residual error neural sub-networks, and adding the classification loss values and the output of the contrast loss layer to obtain a final loss value;
training the multi-loss fused twin neural network according to the final loss value.
Optionally, the training the multi-loss fused twin neural network according to the final loss value includes:
setting training parameters and training the multi-loss fused twin neural network according to the final loss value;
and stopping training after the trained multi-loss fusion twin neural network achieves a preset effect.
In a second aspect, a second embodiment of the present invention provides a station caption detecting apparatus, including:
the data processing module is used for acquiring a station caption data set and grouping the station caption data set to acquire a station caption training set;
the network training module is used for constructing a multi-loss fusion twin neural network and training the constructed multi-loss fusion twin neural network based on the station caption training set to obtain a trained multi-loss fusion twin neural network;
and the detection module is used for detecting the station caption to be detected through the trained multi-loss fusion twin neural network.
In a third aspect, a third embodiment of the present invention provides a computer-readable storage medium, on which an implementation program for information transfer is stored, and the program, when executed by a processor, implements the steps of the method of the first embodiment.
According to the embodiment of the invention, the influence of insufficient sample quantity on the training network is well eliminated by constructing the twin neural network framework, and unknown new types of sensitive station marks can be better detected.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flow chart of an embodiment of the method of the present invention;
FIG. 2 is a schematic diagram of a training model according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In a first aspect, a first embodiment of the present invention provides a station caption detection method. As shown in fig. 1, the method includes the following steps:
acquiring a station caption data set, and grouping the station caption data set to acquire a station caption training set;
constructing a multi-loss fusion twin neural network, and training the constructed multi-loss fusion twin neural network based on the station caption training set to obtain a trained multi-loss fusion twin neural network;
and detecting the station caption to be detected through the trained multi-loss fusion twin neural network.
According to the embodiment of the invention, the influence of insufficient sample quantity on the training network is well eliminated by constructing the twin neural network framework, and unknown new types of sensitive station marks can be better detected.
Optionally, in an optional embodiment of the present invention, the acquiring the station caption data set includes:
acquiring a specified amount of picture data from a public data set and frames intercepted by an existing video, and cutting the picture data into picture clips with set sizes;
specifically, the public data set may be obtained through programming, or may be obtained through other manners, which is not specifically limited in this application.
Carrying out random processing on the existing vector station caption, and adding the vector station caption after the random processing as a watermark to different positions of the picture clip to obtain a station caption picture set;
specifically, in this embodiment, the watermark is randomly subjected to size scaling, deformation, color enhancement and weakening, and is randomly added to different positions of the picture, the pictures with the same type of logo are used as data of one type of label, and each type of picture is about ten thousand, so as to obtain a logo picture set.
Classifying the station caption picture set according to the type of the station caption to obtain a station caption positive sample;
acquiring a plurality of pure background pictures, adding other watermarks to a set number of the pure background pictures to obtain watermark background pictures, and combining the watermark background pictures and the residual number of the pure background pictures to form a station caption negative sample;
specifically, in this embodiment, according to the number of station caption positive samples, for example, in this embodiment, ten thousand pure background pictures are further cut, and part of the pure background pictures is added with other watermarks, which are different from the station caption watermark, and the other watermarks are used as negative samples together with the remaining pure background pictures.
And forming a station caption data set according to the station caption positive sample and the station caption negative sample.
Optionally, in an optional embodiment of the present invention, the grouping the station caption data set to obtain a station caption training set includes:
and randomly arranging the station caption data sets, and dividing the randomly arranged station caption data sets into a station caption training set and a station caption testing set according to a proportion.
Specifically, the whole station caption data set formed in the above manner is randomly shuffled, and the shuffled station caption data set is then randomly divided into a training set and a test set according to a certain proportion.
Optionally, in an optional embodiment of the present invention, the constructing a multi-loss fused twin neural network includes:
constructing a residual error neural network comprising a set depth;
specifically, in this embodiment, a residual neural network with a predetermined depth is constructed, and the design of the residual neural network may include the following structure: the device comprises an ImageData data layer, a volume Convolution layer, a Batch Normal normalization layer, a ReLU activation function layer, a Pooling pool layer, an Eltwise addition layer, an InnerProduct full connection layer, a SoftmaxWithLoss cost function layer, an Accuracy precision layer and other structures, wherein the data layer, the initial Convolution layer, the normalization layer, the activation function layer, the maximum value pool layer, the Convolution layer and the normalization layer are sequentially connected, and a result is output to the addition layer. The initial convolutional layer is used for carrying out convolution on input sample data, a short-circuit path exists in the pooling layer and leads to the addition layer, the output of the addition layer leads to the activation function layer, and the output of the activation function layer is used as the input of the next convolutional layer and the next addition layer; and an average pooling layer and a full-link layer are connected behind the last activation function layer, and are sent to a cost function SoftmaxWithLoss layer and an Accuracy layer together with the label value output by the data layer, so that the final output of the network, namely the probability and the Accuracy of the station caption as the predicted station caption, is obtained.
In this embodiment, the residual neural network short-circuits the features of the front and rear layers and satisfies:

y = F(x, {W_i}) + x

where x and y represent the input and output vectors of the residual block, respectively, and F(x, {W_i}) is the residual mapping learned by the stacked layers with weights {W_i}.
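As an illustrative sketch of such a residual unit, a minimal PyTorch module is given below; the 3 × 3 kernels, equal channel counts and identity shortcut are simplifying assumptions and do not reproduce the exact layer configuration described above.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """y = F(x, {W_i}) + x : two conv/BN layers form F, the shortcut carries x."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = self.relu(self.bn1(self.conv1(x)))   # F(x, {W_i}): conv -> BN -> ReLU
        residual = self.bn2(self.conv2(residual))       # ... -> conv -> BN
        return self.relu(residual + x)                  # Eltwise addition: F(x) + x
```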
Constructing two residual error neural sub-networks with the same structure according to the residual error neural network;
the residual error neural network is transformed into two residual error neural sub-networks with the same structure, and the input data can be averagely divided into two parts to be respectively input into the two sub-networks in the specific implementation process.
Constructing a contrast loss layer, connecting the outputs of the two residual neural sub-networks to the input of the contrast loss layer.
In this embodiment, a contrast Loss layer is further added on top of the residual neural sub-networks; the outputs of the average pooling layers of the two residual neural sub-networks, together with the label values output by their data layers, are used as the input of the contrast Loss layer.
The contrast Loss layer function satisfies:

L1 = (1/2N) Σ_n [ y_n·d_n² + (1 − y_n)·max(margin − d_n, 0)² ]

where d_n = ||a_n − b_n||_2 represents the Euclidean distance between the features of the two samples in the nth pair, y is a label indicating whether the two samples match (y = 1 means the two samples are similar or matched, y = 0 means they do not match), and margin is a set threshold. The output of this loss function expresses the matching degree of a pair of samples well: when y = 1 (i.e., the samples are similar), only the term y·d² remains, so if the Euclidean distance between the samples in the feature space is large, the loss increases because the current model is still poor. When y = 0 (i.e., the samples are not similar), the loss function reduces to Σ (1 − y)·max(margin − d, 0)², so if the Euclidean distance in the feature space is small, the loss value becomes large. By comparing the matching degree of pairs of samples in this way, the network can be trained well.
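A minimal PyTorch sketch of this contrast loss, written directly from the expression above, might look as follows; the 1/2N averaging and the default margin value are assumptions in line with common contrastive-loss implementations rather than values stated in this embodiment.

```python
import torch

def contrastive_loss(a, b, y, margin=1.0):
    """a, b: feature batches from the two sub-networks; y: float tensor,
    1 = matching pair, 0 = non-matching pair.
    L1 = (1/2N) * sum( y*d^2 + (1-y)*max(margin - d, 0)^2 ), with d = ||a_n - b_n||_2."""
    d = torch.norm(a - b, p=2, dim=1)                                  # Euclidean distance per pair
    loss = y * d.pow(2) + (1 - y) * torch.clamp(margin - d, min=0).pow(2)
    return loss.mean() / 2
```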
Optionally, in an optional embodiment of the present invention, the training the constructed multi-loss fusion twin neural network based on the station caption training set to obtain a trained multi-loss fusion twin neural network includes:
dividing the station caption training set and the station caption testing set into two equal parts;
specifically, on the basis that the training set and the test set are randomly divided into the training set and the test set according to a certain proportion, the training set and the test set are reclassified, the training set and the test set are respectively and averagely divided into two parts, and the picture position information and the tag values are written into the text documents, wherein two text documents in the training set and two text documents in the test set can ensure that each row corresponds to one another one by one, half of the tag values are the same, and the other half of the tag values are different.
Keeping the correspondence unchanged, the data of the halved station caption training set and station caption test set are shuffled; that is, the data in the two text documents of the training set are shuffled while their line-by-line correspondence is preserved.
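A possible sketch of building such line-aligned pair files is shown below; the file names, the "path label" line format, and the 50/50 split between matching and non-matching pairs are hypothetical choices consistent with the description above.

```python
import random

def build_pair_lists(samples, out_a="train_a.txt", out_b="train_b.txt"):
    """samples: list of (path, label) tuples, assumed to cover at least two classes.
    Writes two line-aligned text files in which roughly half of the corresponding
    lines share a label (matching pairs) and half do not (non-matching pairs)."""
    by_label = {}
    for path, label in samples:
        by_label.setdefault(label, []).append(path)
    labels = list(by_label)

    pairs = []
    for path, label in samples:
        if random.random() < 0.5:                         # matching pair: same label
            other_label = label
        else:                                             # non-matching pair: different label
            other_label = random.choice([l for l in labels if l != label])
        other_path = random.choice(by_label[other_label])
        pairs.append(((path, label), (other_path, other_label)))

    random.shuffle(pairs)                                 # shuffle while keeping line correspondence
    with open(out_a, "w") as fa, open(out_b, "w") as fb:
        for (pa, la), (pb, lb) in pairs:
            fa.write(f"{pa} {la}\n")
            fb.write(f"{pb} {lb}\n")
```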
Inputting the disturbed station caption training set and the disturbed station caption testing set into the two residual error neural sub-networks in pairs respectively;
performing similarity contrast on input data through a contrast-loss layer to train the multi-loss fused twin neural network.
Then, the two parts of data are input in pairs into the two sub-networks respectively, and through the contrast loss function L1 the inputs are mapped into a target space, where their similarity is compared using a simple distance (such as the Euclidean distance). In the training phase, the loss values of pairs of samples from the same class are minimized, and the loss values of pairs of samples from different classes are maximized.
Optionally, the performing similarity comparison on the input data through a contrast loss layer to train the multi-loss fused twin neural network includes:
constructing a classification loss function of a cost function layer of a residual neural subnetwork;
in the embodiment, classification loss is applied to each network in the twin network, and the network is trained to recognize the existing type-sensitive station caption. Loss of classification L2And cross entropy loss is adopted, so that the following conditions are met:
wherein,class predicted by the network for the nth sample, znIs the corresponding true category label.
And performing data processing on the classification loss values output by the two residual neural sub-networks, and adding the classification loss values and the output of the contrast loss layer to obtain a final loss value.
Specifically, as shown in fig. 2, the classification loss value L2 output by the cost function layer is multiplied by a set coefficient and then added to the contrast loss function value L1; the sum is used as the final loss value for training the network. For example, the final loss function may be:

L = L1 + α·L2
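The fused loss could be sketched as follows, reusing the contrastive_loss function from the earlier sketch; the value of the coefficient alpha and the summing of the two branches' cross-entropy losses are illustrative assumptions.

```python
import torch.nn.functional as F

def fused_loss(feat_a, feat_b, logits_a, logits_b, labels_a, labels_b, match, alpha=0.5, margin=1.0):
    """L = L1 + alpha * L2: L1 is the contrastive loss between the two sub-network
    features, L2 sums the cross-entropy classification losses of both branches."""
    l1 = contrastive_loss(feat_a, feat_b, match, margin)           # contrast (matching) loss
    l2 = F.cross_entropy(logits_a, labels_a) + F.cross_entropy(logits_b, labels_b)
    return l1 + alpha * l2
```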
training the multi-loss fused twin neural network according to the final loss value.
Optionally, the training the multi-loss fused twin neural network according to the final loss value includes:
setting training parameters and training the multi-loss fused twin neural network according to the final loss value;
and stopping training after the trained multi-loss fusion twin neural network achieves a preset effect.
Specifically, in this embodiment, hyperparameters such as the number of iterations are set, and training is stopped and the model is saved once the output of the network reaches the expected effect.
And detecting the unknown station caption by the saved model.
In view of the fact that the position of a sensitive station caption is relatively fixed, the method can directly extract the four corner areas of a video frame to detect sensitive station captions. Meanwhile, the station captions in the sensitive library may keep increasing over time, so a station caption comparison step is added: the sample to be detected obtains its recognition result through comparison with the newly added station captions. In addition, because the amount of sensitive station caption data is scarce, the invention simulates real scenes to generate training data and uses a contrast loss to train the network, reducing the model's demand for data volume. Fusing multiple losses, namely the contrast loss and the classification loss, improves the classification performance and the comparison performance of the detection model at the same time.
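One way such a comparison against the sensitive station caption library could be implemented is sketched below; the feature extractor interface, the distance threshold, and keeping one reference feature per known station caption are assumptions about deployment rather than details fixed by the text.

```python
import torch

@torch.no_grad()
def identify_logo(model, corner_patch, library, threshold=0.5):
    """corner_patch: preprocessed tensor cut from one corner of the video frame.
    library: dict mapping station-caption name -> reference feature vector.
    Returns the closest known caption if its Euclidean distance is below the
    threshold, otherwise None (no sensitive station caption detected)."""
    feat = model(corner_patch.unsqueeze(0)).squeeze(0)     # embed the patch with one sub-network
    best_name, best_dist = None, float("inf")
    for name, ref in library.items():
        dist = torch.norm(feat - ref, p=2).item()
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist < threshold else None
```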
After the technical scheme is adopted, the invention at least has the following beneficial effects:
1. The invention adopts multi-loss fusion: the classification loss value is multiplied by a specific coefficient and then added to the contrast loss function value to form the final loss value for training the network, so that unknown new types of sensitive station captions can be better detected.
2. By adopting the twin neural network framework, the invention largely eliminates the adverse effect of an insufficient number of samples on network training, and better suits the characteristics of station caption detection data sets, which have many sample types but few samples per type.
3. The invention embeds a convolutional neural network into the twin neural network framework and adds a ContrastiveLoss contrast loss layer to the convolutional neural network, so that the network learns a similarity measure from the data and uses the learned measure to compare and match new samples of unknown classes, achieving higher accuracy.
In a second aspect, a second embodiment of the present invention provides an implementation case of a station caption detection method, including:
1. Generate a data set by program: select more than one hundred brightly colored pictures from a public data set and from frames intercepted from existing videos, and cut them to a specific size. Take the vector station captions as watermarks, randomly apply size scaling, deformation, and color enhancement or weakening to them, and randomly add them to different positions of the pictures; pictures with the same station caption are treated as data of one class of labels, with about ten thousand pictures per class. Shuffle the whole data set and randomly divide it into a training set and a test set according to a certain proportion. Divide the training set and the test set each into two equal parts, and write the picture path information and label values into text documents. The two text documents of the training set and the two of the test set correspond line by line, with half of the label values the same and the other half different. Shuffle the data in the two text documents of the training set while keeping the correspondence unchanged.
2. The data are input into the data layers of the two sub-networks for preprocessing and data enhancement, which includes the following: 64 × 112 blocks are randomly cropped and randomly mirrored, the mean of all three channels is set to 127.5 to normalize the data to between 0 and 1, and the training batch size of the network is set to 64.
3. After the station caption data set is divided, the training set is used to train the network. The specific network structure is shown in fig. 2; the whole network is formed by stacking several residual modules. First, the two groups of data are input into the initial convolution layers of the two sub-networks respectively, which use 7 × 7 convolution filters with the stride set to 2 and a padding of 3 pixels on each side of the input image; that is, all four edges are expanded by 3 pixels, so that the width and the height are each expanded by 6 pixels and the feature map after the convolution operation can keep the original size. The feature vectors output by the initial convolution layer then pass in sequence through a BatchNorm normalization layer, a ReLU activation function layer, a max Pooling layer, a Convolution layer and a BatchNorm normalization layer, and the result is output to an Eltwise addition layer. A shortcut path from the max pooling layer leads to the addition layer, the output of the addition layer leads to an activation function layer, and the output of that activation function layer serves as the input of the next convolution layer and the next addition layer. An average pooling layer and a fully connected layer follow the last activation function layer. The outputs of the average pooling layers of the two sub-networks, together with the label values output by the data layers of the two sub-networks, are input into the contrast Loss layer for training; the output of the fully connected layer and the label values are sent to the SoftmaxWithLoss cost function layer and the Accuracy layer.
4. The classification loss value output by the cost function layer is multiplied by a specific coefficient and added to the contrast loss function value to obtain the final loss value, namely:
L=L1+αL2
the network is trained with this total loss.
5. Hyperparameters are set when training the network: the initial learning rate is set to 0.1, and a multi-step learning strategy is adopted so that the learning rate is updated at specified iteration counts. Two step values (stepvalue) are set to 20000 and 80000 respectively, and the decay factor is set to 0.1, i.e., the learning rate changes to 0.01 after 20000 iterations and to 0.001 after 80000 iterations. Using a changing learning rate during training effectively improves the accuracy of the model.
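Sketched in PyTorch, this learning-rate schedule might look as follows; the choice of SGD with momentum and stepping the scheduler once per iteration are assumptions, while the milestones, base learning rate, and decay factor follow the numbers above.

```python
import torch

def train_with_multistep_lr(model, training_batches, compute_fused_loss):
    """Train with the changing learning rate described above: base lr 0.1,
    decayed by a factor of 0.1 at iterations 20000 and 80000."""
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[20000, 80000], gamma=0.1)
    for batch in training_batches:
        optimizer.zero_grad()
        loss = compute_fused_loss(model, batch)   # e.g. the fused L = L1 + alpha*L2 sketched earlier
        loss.backward()
        optimizer.step()
        scheduler.step()                          # lr: 0.1 -> 0.01 -> 0.001
```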
6. After a certain number of iterations, the network converges, the accuracy reaches its peak, and the model is saved.
7. Unknown station captions are detected using the trained model.
In a third aspect, a third embodiment of the present invention provides a station caption detecting apparatus, including:
the data processing module is used for acquiring a station caption data set and grouping the station caption data set to acquire a station caption training set;
the network training module is used for constructing a multi-loss fusion twin neural network and training the constructed multi-loss fusion twin neural network based on the station caption training set to obtain a trained multi-loss fusion twin neural network;
and the detection module is used for detecting the station caption to be detected through the trained multi-loss fusion twin neural network.
In a fourth aspect, a fourth embodiment of the present invention provides a computer-readable storage medium, on which an implementation program for information transfer is stored, and the program, when executed by a processor, implements the steps of the method of the first embodiment.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (9)
1. A station caption detection method is characterized by comprising the following steps:
acquiring a station caption data set, and grouping the station caption data set to acquire a station caption training set;
constructing a multi-loss fusion twin neural network, and training the constructed multi-loss fusion twin neural network based on the station caption training set to obtain a trained multi-loss fusion twin neural network;
and detecting the station caption to be detected through the trained multi-loss fusion twin neural network.
2. The method of claim 1, wherein said obtaining a station caption data set comprises:
acquiring a specified amount of picture data from a public data set and frames intercepted by an existing video, and cutting the picture data into picture clips with set sizes;
carrying out random processing on the existing vector station caption, and adding the vector station caption after the random processing as a watermark to different positions of the picture clip to obtain a station caption picture set;
classifying the station caption picture set according to the type of the station caption to obtain a station caption positive sample;
acquiring a plurality of pure background pictures, adding other watermarks to a set number of the pure background pictures to obtain watermark background pictures, and combining the watermark background pictures and the residual number of the pure background pictures to form a station caption negative sample;
and forming a station caption data set according to the station caption positive sample and the station caption negative sample.
3. The method of claim 2, wherein the grouping the station caption data set to obtain a station caption training set comprises:
and randomly arranging the station caption data sets, and dividing the randomly arranged station caption data sets into a station caption training set and a station caption testing set according to a proportion.
4. The method of claim 3, wherein constructing a multi-loss fused twin neural network comprises:
constructing a residual error neural network comprising a set depth;
constructing two residual error neural sub-networks with the same structure according to the residual error neural network;
constructing a contrast loss layer, connecting the outputs of the two residual neural sub-networks to the input of the contrast loss layer.
5. The method of claim 4, wherein training the constructed multi-loss fused twin neural network based on the station mark training set obtains a trained multi-loss fused twin neural network, comprising:
dividing the station caption training set and the station caption testing set into two equal parts;
keeping the corresponding relation unchanged, and disordering the data of the station caption training set and the station caption testing set after the two equal parts;
inputting the disturbed station caption training set and the disturbed station caption testing set into the two residual error neural sub-networks in pairs respectively;
performing similarity contrast on input data through a contrast-loss layer to train the multi-loss fused twin neural network.
6. The method of claim 5, wherein the similarity comparison of input data by a contrast-loss layer to train the multi-loss fused twin neural network comprises:
constructing a classification loss function of a cost function layer of a residual neural subnetwork;
performing data processing on the classification loss values output by the two residual error neural sub-networks, and adding the classification loss values and the output of the contrast loss layer to obtain a final loss value;
training the multi-loss fused twin neural network according to the final loss value.
7. The method of claim 6, wherein training the multi-loss fused twin neural network according to the final loss values comprises:
setting training parameters and training the multi-loss fused twin neural network according to the final loss value;
and stopping training after the trained multi-loss fusion twin neural network achieves a preset effect.
8. A station caption detecting apparatus, characterized in that the apparatus comprises:
the data processing module is used for acquiring a station caption data set and grouping the station caption data set to acquire a station caption training set;
the network training module is used for constructing a multi-loss fusion twin neural network and training the constructed multi-loss fusion twin neural network based on the station caption training set to obtain a trained multi-loss fusion twin neural network;
and the detection module is used for detecting the station caption to be detected through the trained multi-loss fusion twin neural network.
9. A computer-readable storage medium, characterized in that it has stored thereon a program for implementing the transfer of information, which program, when being executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910698120.6A CN110610230A (en) | 2019-07-31 | 2019-07-31 | Station caption detection method and device and readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910698120.6A CN110610230A (en) | 2019-07-31 | 2019-07-31 | Station caption detection method and device and readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110610230A true CN110610230A (en) | 2019-12-24 |
Family
ID=68890202
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910698120.6A Pending CN110610230A (en) | 2019-07-31 | 2019-07-31 | Station caption detection method and device and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110610230A (en) |
- 2019-07-31 CN CN201910698120.6A patent/CN110610230A/en active Pending
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103902987A (en) * | 2014-04-17 | 2014-07-02 | 福州大学 | Station caption identifying method based on convolutional network |
CN106488313A (en) * | 2016-10-31 | 2017-03-08 | Tcl集团股份有限公司 | A kind of TV station symbol recognition method and system |
CN108171209A (en) * | 2018-01-18 | 2018-06-15 | 中科视拓(北京)科技有限公司 | A kind of face age estimation method that metric learning is carried out based on convolutional neural networks |
CN108388927A (en) * | 2018-03-26 | 2018-08-10 | 西安电子科技大学 | Small sample polarization SAR terrain classification method based on the twin network of depth convolution |
CN108846358A (en) * | 2018-06-13 | 2018-11-20 | 浙江工业大学 | Target tracking method for feature fusion based on twin network |
CN109117744A (en) * | 2018-07-20 | 2019-01-01 | 杭州电子科技大学 | A kind of twin neural network training method for face verification |
CN109766921A (en) * | 2018-12-19 | 2019-05-17 | 合肥工业大学 | A kind of vibration data Fault Classification based on depth domain-adaptive |
Non-Patent Citations (3)
Title |
---|
KAI QIU ET AL.: "Siamese-ResNet: Implementing Loop Closure Detection based on Siamese Network", 《2018 IEEE INTELLIGENT VEHICLES SYMPOSIUM》 * |
YISEN WANG ET AL.: "Iterative Learning with Open-set Noisy Labels", 《ARXIV》 * |
LIU KUN: "Application of deep-learning-based station caption detection in online video review", 《无线互联科技》 (Wireless Internet Technology) *
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111178290A (en) * | 2019-12-31 | 2020-05-19 | 上海眼控科技股份有限公司 | Signature verification method and device |
CN111311475A (en) * | 2020-02-21 | 2020-06-19 | 广州腾讯科技有限公司 | Detection model training method and device, storage medium and computer equipment |
CN111860472A (en) * | 2020-09-24 | 2020-10-30 | 成都索贝数码科技股份有限公司 | Television station caption detection method, system, computer equipment and storage medium |
CN112975639A (en) * | 2021-05-19 | 2021-06-18 | 江苏中科云控智能工业装备有限公司 | Die casting polishing and deburring mechanism and method |
CN113222802A (en) * | 2021-05-27 | 2021-08-06 | 西安电子科技大学 | Digital image watermarking method based on anti-attack |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110610230A (en) | Station caption detection method and device and readable storage medium | |
CN110443143B (en) | Multi-branch convolutional neural network fused remote sensing image scene classification method | |
Thai et al. | Image classification using support vector machine and artificial neural network | |
CN111598182B (en) | Method, device, equipment and medium for training neural network and image recognition | |
CN110674874B (en) | Fine-grained image identification method based on target fine component detection | |
CN109063649B (en) | Pedestrian re-identification method based on twin pedestrian alignment residual error network | |
CN109063723A (en) | The Weakly supervised image, semantic dividing method of object common trait is excavated based on iteration | |
CN111178120B (en) | Pest image detection method based on crop identification cascading technology | |
CN113221987B (en) | Small sample target detection method based on cross attention mechanism | |
CN111709313B (en) | Pedestrian re-identification method based on local and channel combination characteristics | |
CN109919149B (en) | Object labeling method and related equipment based on object detection model | |
CN112052845A (en) | Image recognition method, device, equipment and storage medium | |
CN113807214B (en) | Small target face recognition method based on deit affiliated network knowledge distillation | |
CN111680705A (en) | MB-SSD method and MB-SSD feature extraction network suitable for target detection | |
CN115410059B (en) | Remote sensing image part supervision change detection method and device based on contrast loss | |
CN111339869A (en) | Face recognition method, face recognition device, computer readable storage medium and equipment | |
CN115731422A (en) | Training method, classification method and device of multi-label classification model | |
CN115661777A (en) | Semantic-combined foggy road target detection algorithm | |
CN112464775A (en) | Video target re-identification method based on multi-branch network | |
CN116563410A (en) | Electrical equipment electric spark image generation method based on two-stage generation countermeasure network | |
CN111027472A (en) | Video identification method based on fusion of video optical flow and image space feature weight | |
Cyganek | An analysis of the road signs classification based on the higher-order singular value decomposition of the deformable pattern tensors | |
CN111310516A (en) | Behavior identification method and device | |
CN113221991A (en) | Method for re-labeling data set by utilizing deep learning | |
CN109635647A (en) | A kind of clustering method based on more picture plurality of human faces under constraint condition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191224