CN111079849A - Method for constructing new target network model for voice-assisted audio-visual collaborative learning - Google Patents
- Publication number
- CN111079849A (application number CN201911334785.5A)
- Authority
- CN
- China
- Prior art keywords
- image
- feature
- new
- model
- layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/757—Matching configurations of points or features
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Mathematical Physics (AREA)
- Multimedia (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a method for constructing a new-target network model for voice-assisted audio-visual collaborative learning, comprising steps S1-S11. The method builds on a conventional object recognition model and image feature matching: known objects are accurately recognized by an initial object recognition model, and when a new object appears, its features are memorized by an online learning model and the initial recognition model is updated in real time. This strengthens the model's generalization ability and makes it better suited to real-world scenes.
Description
Technical Field
The invention relates to the technical field of computer vision, in particular to a method for constructing a new audio-visual collaborative learning target network model assisted by voice.
Background
With the rapid development of computer vision, object recognition technology has been applied in many fields and brings great economic benefit. In recent years, many object recognition network models have appeared and their recognition accuracy has improved, but they share a common drawback: an image data set must be prepared in advance, the model must be trained on that data set, and an object detector must be generated. In practical applications there are many kinds of objects, and much image data is uncollected or difficult to obtain. In some scenarios it is not even known in advance which categories of image data should be prepared, which makes it difficult to apply conventional network models to real scenes. Image feature matching technology can match two images and has strong application value when training data is insufficient; although its generalization ability is weak, it works well in certain specific scenes.
A good object recognition model, like a human, should be capable of both autonomous and guided learning: it should accurately recognize objects it has learned, memorize and learn new objects under human guidance, and continuously update its knowledge reserve, becoming more intelligent over time. Against this background, the invention provides a network model for voice-assisted audio-visual collaborative learning of new targets. It supports online learning of new targets, has important application value in specific scenes (such as home robots and inspection robots), and can promote development of the field.
Disclosure of Invention
Aiming at the above defects in the prior art, the invention provides a method for constructing a new-target network model for audio-visual collaborative learning, which addresses the inability of existing network models to learn new targets online.
In order to achieve the purpose of the invention, the technical scheme adopted by the invention is as follows:
the method for constructing the new audio-visual collaborative learning target network model assisted by voice comprises the following steps:
s1: building an original object classifier M1 for original object recognition and an object feature extraction model M2 for extracting feature vectors of objects;
s2: creating an object feature vector repository B1 for holding feature vectors of new objects and a new object image repository B2 for holding image datasets of new objects;
s3: inputting a new image picture, and loading an original object classifier M1 to perform object recognition on the new image picture;
s4: if the new image picture contains no unidentified object, stop; if unidentified objects (object-1, …, object-m) exist, load the object feature extraction model M2 to extract their features, and match each feature vector in the extracted feature vector set R against each feature vector in the feature vector library B1;
s5: if, during matching, some object's highest confidence most-value exceeds the base confidence base-value, judge that object correctly identified; otherwise judge it a new object;
s6: perform man-machine interaction through voice assistance, describe the salient features of the new object by speech, and attach a voice tag to the new object, obtaining a new image;
s7: perform image augmentation on the new image to obtain augmented images (image-1, image-2, …, image-n), and store them in the new object image library B2;
s8: load the object feature extraction model M2, extract the features of the new object in the new image, and store the resulting feature vector in the feature vector library B1;
s9: traverse the new object image library B2 and judge whether the amount of data for the new object has reached the amount N required for training;
s10: if yes, merge the new object's data set (of size N) with the data set of the original object classifier M1, train a new object classifier on the merged data set to replace M1, and delete the new object's image data set from the new object image library B2;
s11: otherwise, repeat steps S3-S9 until the amount of data for the new object reaches the amount N required for training.
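The loop over steps S1-S11 can be sketched in ordinary Python. This is a hedged illustration, not the patent's implementation: the names M1, M2, B1, B2, base-value, and N follow the text, while match_score, the augmentation factor, and every function signature are assumptions introduced here.

```python
# Sketch of one pass through steps S3-S10 of the online-learning loop.
# M1: classifier (returns truthy when an object is recognized); M2:
# feature extractor; B1: feature vector library; B2: new-object image
# library. All are stand-in stubs.

def match_score(f, g):
    # Placeholder similarity; the patent's own score combines three
    # weighted terms with alpha + beta + gamma = 1 (formula not legible).
    return sum(a * b for a, b in zip(f, g)) / (len(f) or 1)

def online_learning_step(image_objects, M1, M2, B1, B2,
                         base_value=0.8, N=100):
    """One pass of steps S3-S10 for a single input image."""
    unidentified = [o for o in image_objects if not M1(o)]      # S3-S4
    for obj in unidentified:
        feature = M2(obj)                                       # S4
        best = max((match_score(feature, q) for q in B1),
                   default=0.0)                                 # S4-S5
        if best > base_value:
            continue          # S5: matched an already-memorized object
        # S6 (voice labeling) is abstracted away; S7 augmentation is a
        # stand-in five-fold copy of the object image.
        augmented = [obj] * 5
        B2.extend(augmented)                                    # S7
        B1.append(feature)                                      # S8
    return len(B2) >= N       # S9-S10: enough data to retrain M1?
```

With stub models, a single unrecognized object is memorized and its augmented copies counted toward the retraining threshold.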
Further, the method for constructing the original object classifier M1 for original object recognition includes:
a11: generating a training image set images-input1 by using the image data set according to the actual application scene;
a12: create a residual convolutional neural network ResNet, consisting of convolutional layer conv1, relu1 layer, and pooling layer pooling1, to extract image features features-maps from the training image set images-input1;
a13: create an RPN network to generate image candidate regions proposals: input the image features features-maps, judge foreground versus background with Softmax, and correct the candidate regions proposals to generate accurate candidate regions proposals1;
a14: generate fixed-size feature regions proposals-features-maps from the candidate regions proposals1 and the image features features-maps;
a15: pass the fixed-size feature regions through fully connected layers, classify objects with Softmax, compute the loss, apply loss-based correction, and achieve accurate classification of the original objects.
Further, the method for building the object feature extraction model M2 for extracting the feature vector of the object includes:
b11: prepare an image data set Data1 containing several object categories as the training data set images-input2;
b12: load the training data set images-input2, pre-train the autonomous RPN network model RPN-model, and output object candidate regions proposals2;
b13: pre-train the feature extraction network model con-model and load the training data set images-input2; the con-model consists of convolutional layer conv2, relu2 layer, pooling layer pooling2, and fully connected layer FC;
b14: correct the object candidate regions proposals2, then input them into the feature extraction network model con-model for feature extraction, obtaining image features features-maps for each candidate region.
Further, in the feature extraction network model con-model, convolutional layer conv2 has 16 layers, the relu2 layer has 15 layers, and pooling layer pooling2 has 5 layers. conv2 uses multi-channel convolution with 3x3 kernels, padding 1, and stride 1; pooling2 uses 2x2 max-pooling filters with stride 2; the fully connected layer FC has three layers, each with a dropout mechanism.
Further, in the residual convolutional neural network ResNet, convolutional layer conv1 has 49 layers, the relu1 layer has 49 layers, and pooling layer pooling1 has 2 layers. conv1 uses multi-channel convolution and includes 1 7x7 convolution kernel, 32 1x1 convolution kernels, and 16 3x3 convolution kernels; pooling1 uses a 3x3 maximum filter and a 2x2 average filter.
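The con-model specification (sixteen 3x3 convolutional layers with padding 1 and stride 1, five 2x2/stride-2 max pools, three fully connected layers with dropout) matches the well-known VGG-19 layout. A quick sanity check of the spatial sizes, assuming a 224x224 input; the input resolution is not stated in the patent.

```python
def conv_out(size, kernel=3, pad=1, stride=1):
    # Standard convolution output-size formula.
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    # Standard pooling output-size formula.
    return (size - kernel) // stride + 1

size = 224
for _ in range(5):          # five pooling stages in the con-model spec
    # A 3x3 convolution with padding 1 and stride 1 keeps the size.
    assert conv_out(size) == size
    size = pool_out(size)   # each 2x2/stride-2 max pool halves it
print(size)  # 7: the spatial size entering the fully connected layers
```

So the convolutions never shrink the map and only the five pools do, halving 224 down to 7 before the FC layers.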
Further, the feature vector set R is extracted through a deep convolutional layer of the feature extraction network model con-model,
where i = n/2, j = m/2, p = i/2, q = j/2.
The function MatToVec(T) concatenates the rows of a matrix into a one-dimensional vector, where the parameter T = A/B/C is a matrix; the function pad(n) is a zero-padding operation, where the parameter n is the number of zeros appended. The feature vector R1 = MatToVec(S1), and the feature vector set R = (R1, R2, …, Rs). In matrices A and B, n and m are the length and width, respectively, and p and q are the length and width of matrix C.
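The two helper operations defined above, MatToVec (row-wise flattening of a matrix into a one-dimensional vector) and pad (zero padding), translate directly into NumPy. The surrounding formulas for S1 and the matrices A, B, C appear only as images in the source and are not reproduced here; these signatures are assumptions.

```python
import numpy as np

def mat_to_vec(T):
    """MatToVec(T): concatenate the rows of matrix T into a 1-D vector."""
    return np.asarray(T).reshape(-1)

def pad(v, n):
    """pad(n) from the text: append n zeros to vector v."""
    v = np.asarray(v)
    return np.concatenate([v, np.zeros(n, dtype=v.dtype)])
```

For example, mat_to_vec applied to a 2x2 matrix [[1, 2], [3, 4]] yields the vector [1, 2, 3, 4].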
Further, the highest confidence most-value is the highest matching degree between any feature vector in the feature vector set R and any feature vector in the feature vector library B1,
where α + β + γ = 1, M belongs to the feature vector set Q = (Q1, Q2, …, Qt) of the feature vector library B1, N belongs to the feature vector set R = (R1, R2, …, Rs), and all feature vectors in Q and R have size L.
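The patent's three-term weighted matching formula (with α + β + γ = 1) survives only as an image in the source, so it cannot be reproduced here. As a hedged stand-in, the sketch below uses cosine similarity as one plausible per-pair matching degree and takes most-value as the maximum over all (R, Q) pairs; both choices are assumptions.

```python
import math

def cosine(m, n):
    """Cosine similarity between two equal-length vectors (stand-in
    for the patent's weighted matching degree, which is not legible)."""
    dot = sum(a * b for a, b in zip(m, n))
    norm = math.sqrt(sum(a * a for a in m)) * math.sqrt(sum(b * b for b in n))
    return dot / norm if norm else 0.0

def most_value(R, Q):
    """Highest matching degree over all pairs of vectors from R and Q."""
    return max(cosine(r, q) for r in R for q in Q)
```

In step S5, most_value would then be compared against base-value to decide whether the object is already memorized or new.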
The invention has the following beneficial effects. Building on a conventional object recognition model and image feature matching, the method accurately recognizes known objects with an initial object recognition model; when a new object appears, its features are memorized by an online learning model and the initial recognition model is updated in real time, strengthening the model's generalization ability and making it better suited to real-world scenes.
In some scenes most objects are relatively fixed, so recognition can be achieved through feature memory alone, and the initial object recognition model is updated through continuous memory learning so that more kinds of objects can be recognized. Applied to scenes that require object identification, the network model becomes more intelligent. Compared with conventional object recognition models it has higher application value, can promote development of the object recognition field, and has important research significance.
Drawings
FIG. 1 is a flow chart of a method for constructing a new target network model for voice-assisted audio-visual collaborative learning.
Detailed Description
The following description of embodiments is provided to help those skilled in the art understand the invention, but it should be understood that the invention is not limited to the scope of these embodiments. Various changes apparent to those skilled in the art that do not depart from the spirit and scope of the invention as defined in the appended claims are protected, as is everything produced using the inventive concept.
As shown in fig. 1, the method for constructing a new target network model for audio-visual collaborative learning assisted by voice comprises the following steps:
s1: building an original object classifier M1 for identifying an original object and an object feature extraction model M2 for extracting feature vectors of the object;
the method for constructing the original object classifier M1 for original object recognition comprises the following steps:
a11: generating a training image set images-input1 by using the image data set according to the actual application scene;
a12: a residual convolutional neural network ResNet, consisting of convolutional layer conv1, relu1 layer, and pooling layer pooling1, is created to extract the image features features-maps of the images in the training image set images-input1.
In the residual convolutional neural network ResNet, convolutional layer conv1 has 49 layers, the relu1 layer has 49 layers, and pooling layer pooling1 has 2 layers. conv1 uses multi-channel convolution and includes 1 7x7 convolution kernel, 32 1x1 convolution kernels, and 16 3x3 convolution kernels; pooling1 uses a 3x3 maximum filter and a 2x2 average filter.
Step a12 includes:
a121: use an image data set Data1 containing several object categories as the training data set images-input2;
a122: load the training data set images-input2, pre-train the autonomous RPN network model RPN-model, and output object candidate regions proposals2.
a123: pre-train the feature extraction network model con-model and load the training data set images-input2; the con-model consists of convolutional layer conv2, relu2 layer, pooling layer pooling2, and fully connected layer FC.
In the feature extraction network model con-model, convolutional layer conv2 has 16 layers, the relu2 layer has 15 layers, and pooling layer pooling2 has 5 layers. conv2 uses multi-channel convolution with 3x3 kernels, padding 1, and stride 1; pooling2 uses 2x2 max-pooling filters with stride 2; the fully connected layer FC has three layers, each with a dropout mechanism.
a124: correct the object candidate regions proposals2, then input them into the feature extraction network model con-model for feature extraction, obtaining image features features-maps for each candidate region.
A13: creating an RPN network to generate image candidate region regions, inputting image feature features-maps, judging whether the image feature features-maps belong to a foreground or a background through Softmax, and correcting the candidate region regions to generate accurate candidate region regions 1;
a14: a feature region pro-features-maps of a fixed size is generated using the candidate regions pro-contaminants 1 and the image feature features-maps.
A15: and fully connecting the feature areas of fixed size, classifying the objects by using Softmax, calculating Loss, correcting the Loss, and realizing accurate classification of the original objects.
The method for building the object feature extraction model M2 for extracting the feature vector of the object comprises the following steps:
b11: generate a training data set images-input2 from an image data set Data1 containing several object categories;
b12: load the training data set images-input2, pre-train the autonomous RPN network model RPN-model, and output object candidate regions proposals2.
b13: pre-train the feature extraction network model con-model and load the training data set images-input2; the con-model consists of convolutional layer conv2, relu2 layer, pooling layer pooling2, and fully connected layer FC.
In the feature extraction network model con-model, convolutional layer conv2 has 16 layers, the relu2 layer has 15 layers, and pooling layer pooling2 has 5 layers. conv2 uses multi-channel convolution with 3x3 kernels, padding 1, and stride 1; pooling2 uses 2x2 max-pooling filters with stride 2; the fully connected layer FC has three layers, each with a dropout mechanism.
b14: correct the object candidate regions proposals2, then input them into the feature extraction network model con-model for feature extraction, obtaining image features features-maps for each candidate region.
S2: creating an object feature vector repository B1 for holding feature vectors of new objects and a new object image repository B2 for holding image datasets of new objects;
s3: inputting a new image picture, and loading an original object classifier M1 to perform object recognition on the new image picture;
s4: if the new image picture contains no unidentified object, stop; if unidentified objects (object-1, …, object-m) exist, load the object feature extraction model M2 to extract their features, and match each feature vector in the extracted feature vector set R against each feature vector in the feature vector library B1;
The feature vector set R is extracted through a deep convolutional layer of the feature extraction network model con-model,
where i = n/2, j = m/2, p = i/2, q = j/2.
The function MatToVec(T) concatenates the rows of a matrix into a one-dimensional vector, where the parameter T = A/B/C is a matrix; the function pad(n) is a zero-padding operation, where the parameter n is the number of zeros appended. The feature vector R1 = MatToVec(S1), and the feature vector set R = (R1, R2, …, Rs). In matrices A and B, n and m are the length and width, respectively, and p and q are the length and width of matrix C.
S5: if an object with the highest confidence coefficient most-value higher than the base confidence coefficient base-value exists during matching, judging that the object is correctly identified, otherwise, judging that the object is a new object;
the highest confidence most-value is the matching degree of any one feature vector in the feature vector set R and any one feature vector in the feature vector library B1:
highest confidenceWherein α + β + γ is 1, M belongs to the feature vector library B1 feature vector set Q (Q)1,Q2…Qt) N belongs to the feature vector set R ═ (R)1,R2…Rs) The size of all feature vectors in Q and R is L.
S6: performing man-machine interaction through voice assistance, performing voice description on the dominant characteristic of the new object, and printing a voice tag on the new object to obtain a new image;
s7: image augmentation is carried out on the new image to obtain augmented images (image-1, image-2, … … and image-n), and the images are stored in a new object image library B2;
s8: loading an object feature extraction model M2, extracting features of new object objects in a new image, and storing the obtained feature vector feature in a feature vector library B1;
the feature vector feature is obtained without passing through the autonomous RPN network model RPN-model in the feature extraction model M2, because the new object is already tagged with features by speech assistance and no object region needs to be extracted again. And directly inputting the image with the characteristic label into the characteristic extraction network model con-model in the characteristic extraction model M2 to extract the characteristic vector feature.
S9: traversing the new object image library B2, and judging whether the data set quantity of the new object reaches the data set quantity N required by training;
s10: if so, merge the new object's data set (of size N) with the data set of the original object classifier M1, train a new object classifier on the merged data set to replace M1, and delete the new object's image data set from the new object image library B2;
s11: if not, repeating the steps S3-S9 until the data set quantity of the new object reaches the data set quantity N required by the training.
Building on a conventional object recognition model and image feature matching, the method accurately recognizes known objects with an initial object recognition model; when a new object appears, its features are memorized by an online learning model and the initial recognition model is updated in real time, strengthening the model's generalization ability and making it better suited to real-world scenes.
In some scenes most objects are relatively fixed, so recognition can be achieved through feature memory alone, and the initial object recognition model is updated through continuous memory learning so that more kinds of objects can be recognized. Applied to scenes that require object identification, the network model becomes more intelligent. Compared with conventional object recognition models it has higher application value, can promote development of the object recognition field, and has important research significance.
Claims (7)
1. A method for constructing a new audio-visual collaborative learning target network model assisted by voice is characterized by comprising the following steps:
s1: building an original object classifier M1 for original object recognition and an object feature extraction model M2 for extracting feature vectors of objects;
s2: creating an object feature vector repository B1 for holding feature vectors of new objects and a new object image repository B2 for holding image datasets of new objects;
s3: inputting a new image picture, and loading an original object classifier M1 to perform object recognition on the new image picture;
s4: if the new image picture contains no unidentified object, stop; if unidentified objects (object-1, …, object-m) exist, load the object feature extraction model M2 to extract their features, and match each feature vector in the extracted feature vector set R against each feature vector in the feature vector library B1;
s5: if, during matching, some object's highest confidence most-value exceeds the base confidence base-value, judge that object correctly identified; otherwise judge it a new object;
s6: performing man-machine interaction through voice assistance, performing voice description on the dominant characteristic of the new object, and printing a voice tag on the new object to obtain a new image;
s7: perform image augmentation on the new image to obtain augmented images (image-1, image-2, …, image-n), and store them in the new object image library B2;
s8: load the object feature extraction model M2, extract the features of the new object in the new image, and store the resulting feature vector in the feature vector library B1;
s9: traversing the new object image library B2, and judging whether the data set quantity of the new object reaches the data set quantity N required by training;
s10: if yes, merge the new object's data set (of size N) with the data set of the original object classifier M1, train a new object classifier on the merged data set to replace M1, and delete the new object's image data set from the new object image library B2;
s11: otherwise, repeating steps S3-S9 until the data set amount of the new object reaches the data set amount N required by the training.
2. The method for constructing a new target network model for speech-assisted audio-visual collaborative learning according to claim 1, wherein the method for constructing an original object classifier M1 for original object recognition comprises the following steps:
a11: generating a training image set images-input1 by using the image data set according to the actual application scene;
a12: creating a residual convolutional neural network ResNet to extract image feature features-maps of images in the training image set images-input1, wherein the residual convolutional neural network ResNet consists of convolutional layer conv1, relu1 layers and pooling layer pooling 1;
a13: creating an RPN network to generate image candidate region regions, inputting image feature features-maps, judging whether the image feature features-maps belong to a foreground or a background through Softmax, and correcting the candidate region regions to generate accurate candidate region regions 1;
a14: a fixed-size feature region proposals-features-maps is generated using the candidate regions proposals1 and the image features features-maps;
a15: the fixed-size feature regions are passed through fully connected layers, objects are classified with Softmax, the loss is computed and corrected, and accurate classification of the original objects is achieved.
3. The method for constructing the new target network model for the voice-assisted audio-visual collaborative learning according to claim 2, wherein the method for constructing the object feature extraction model M2 for extracting the feature vectors of the object comprises the following steps:
b11: preparing image Data1 with several types as training Data set images-input 2;
b12: loading a training data set images-input2, pre-training an autonomous RPN network model RPN-model, and outputting an object candidate region proposals 2;
b13: pre-training a feature extraction network model con-model, loading a training data set images-input2, wherein the feature extraction network model con-model consists of a convolutional layer conv2, a relu2 layer, a pooling layer pooling2 and a full connection layer FC;
b14: the object candidate regions proposals2 are corrected and then input into the feature extraction network model con-model for feature extraction, obtaining image features features-maps for each candidate region.
4. The method as claimed in claim 3, wherein the convolutional layer conv2 of the feature extraction network model con-model has 16 layers, the relu2 layer has 15 layers, and the pooling layer pooling2 has 5 layers; conv2 uses multi-channel convolution with 3x3 kernels, padding 1, and stride 1; pooling2 uses 2x2 max-pooling filters with stride 2; the fully connected layer FC has three layers, each with a dropout mechanism.
5. The method of claim 2, wherein the convolutional layer conv1 of the residual convolutional neural network ResNet has 49 layers, the relu1 layer has 49 layers, and the pooling layer pooling1 has 2 layers; conv1 uses multi-channel convolution and comprises 1 7x7 convolution kernel, 32 1x1 convolution kernels, and 16 3x3 convolution kernels; pooling1 uses a 3x3 maximum filter and a 2x2 average filter.
6. The method for constructing a new target network model for speech-assisted audio-visual collaborative learning according to claim 1, wherein the feature vector set R is extracted by a deep convolutional layer in a feature extraction network model con-model:
where i = n/2, j = m/2, p = i/2, q = j/2;
the function MatToVec(T) concatenates the rows of a matrix into a one-dimensional vector, where the parameter T = A/B/C is a matrix; the function pad(n) is a zero-padding operation, where the parameter n is the number of zeros appended; the feature vector R1 = MatToVec(S1), and the feature vector set R = (R1, R2, …, Rs); in matrices A and B, n and m are the length and width, respectively, and p and q are the length and width of matrix C.
7. The method for constructing a new target network model for speech-assisted audio-visual collaborative learning according to claim 1, wherein the highest confidence most-value is a matching degree between any one feature vector in a feature vector set R and any one feature vector in a feature vector library B1:
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911334785.5A CN111079849A (en) | 2019-12-23 | 2019-12-23 | Method for constructing new target network model for voice-assisted audio-visual collaborative learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111079849A true CN111079849A (en) | 2020-04-28 |
Family
ID=70316831
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911334785.5A Pending CN111079849A (en) | 2019-12-23 | 2019-12-23 | Method for constructing new target network model for voice-assisted audio-visual collaborative learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111079849A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107506407A (en) * | 2017-08-07 | 2017-12-22 | Shenzhen Damai Technology Co., Ltd. | Document classification and invocation method and device |
CN108009591A (en) * | 2017-12-14 | 2018-05-08 | Southwest Jiaotong University | Contact-network key component identification method based on deep learning |
CN108875455A (en) * | 2017-05-11 | 2018-11-23 | TCL Corporation | Unsupervised intelligent precise face recognition method and system |
CN109063594A (en) * | 2018-07-13 | 2018-12-21 | Jilin University | Fast target detection method for remote sensing images based on YOLOv2 |
Non-Patent Citations (2)
Title |
---|
KAIMING HE et al.: "Deep Residual Learning for Image Recognition", 2016 IEEE Conference on Computer Vision and Pattern Recognition * |
SHAOQING REN et al.: "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", IEEE Transactions on Pattern Analysis and Machine Intelligence * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Kao et al. | Visual aesthetic quality assessment with a regression model | |
CN109740413A (en) | Pedestrian recognition methods, device, computer equipment and computer storage medium again | |
US20170032222A1 (en) | Cross-trained convolutional neural networks using multimodal images | |
JP2017062781A (en) | Similarity-based detection of prominent objects using deep cnn pooling layers as features | |
CN111461212A (en) | Compression method for point cloud target detection model | |
CN112347284B (en) | Combined trademark image retrieval method | |
CN108133235B (en) | Pedestrian detection method based on neural network multi-scale feature map | |
CN113836992B (en) | Label identification method, label identification model training method, device and equipment | |
CN111368766A (en) | Cattle face detection and identification method based on deep learning | |
CN111222487A (en) | Video target behavior identification method and electronic equipment | |
CN108921850B (en) | Image local feature extraction method based on image segmentation technology | |
CN109034121B (en) | Commodity identification processing method, device, equipment and computer storage medium | |
CN112200031A (en) | Network model training method and equipment for generating image corresponding word description | |
CN111340051A (en) | Picture processing method and device and storage medium | |
CN113989604A (en) | Tire DOT information identification method based on end-to-end deep learning | |
CN113963026A (en) | Target tracking method and system based on non-local feature fusion and online updating | |
CN116258990A (en) | Cross-modal affinity-based small sample reference video target segmentation method | |
Peng et al. | Document image quality assessment using discriminative sparse representation | |
CN117115614B (en) | Object identification method, device, equipment and storage medium for outdoor image | |
CN112070181B (en) | Image stream-based cooperative detection method and device and storage medium | |
Timotheatos et al. | Vision based horizon detection for UAV navigation | |
CN117437691A (en) | Real-time multi-person abnormal behavior identification method and system based on lightweight network | |
Abdelaziz et al. | Few-shot learning with saliency maps as additional visual information | |
Jiafa et al. | A scene recognition algorithm based on deep residual network | |
CN116740413A (en) | Deep sea biological target detection method based on improved YOLOv5 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 2020-04-28