CN111552837A - Animal video tag automatic generation method based on deep learning, terminal and medium - Google Patents

Animal video tag automatic generation method based on deep learning, terminal and medium

Info

Publication number
CN111552837A
Authority
CN
China
Prior art keywords
video
key frame
animal
detected
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010382574.5A
Other languages
Chinese (zh)
Inventor
刘露
蔺昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Inveno Technology Co ltd
Original Assignee
Shenzhen Inveno Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Inveno Technology Co ltd
Priority to CN202010382574.5A
Publication of CN111552837A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/7867 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73 Querying
    • G06F16/732 Query formulation
    • G06F16/7328 Query by example, e.g. a complete video frame or video sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7837 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7847 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]

Abstract

The invention provides a deep-learning-based method for automatically generating animal video tags, together with a terminal and a medium. The method comprises the following steps: extracting a plurality of key frame images from a video to be detected, and inputting the key frame images into a feature extraction model; inputting the feature information output by the feature extraction model into a trained target detection algorithm model; and recording the position and category of the target object output by the target detection algorithm model in the video to be detected, and defining the category of the target object as an animal tag of the video to be detected. The method improves recognition efficiency and recognition accuracy.

Description

Animal video tag automatic generation method based on deep learning, terminal and medium
Technical Field
The invention belongs to the technical field of video tags, and particularly relates to a method, a terminal and a medium for automatically generating an animal video tag based on deep learning.
Background
An automatic animal video tag generation system detects whether an animal is present in a video and, if so, what the animal is, and tags the video accordingly. Methods commonly used in existing automatic animal video tag generation systems include the inter-frame difference method and traditional computer-vision image processing.
Referring to fig. 1 and 2, the inter-frame difference method computes the pixel-wise difference between two images taken from adjacent frames, or frames several frames apart, takes the absolute value of the brightness difference, and then thresholds the result to extract the motion regions in the images, from which the animal regions appearing in the video are inferred. The logic is simple and processing is fast. However, the method cannot be used with a moving camera, cannot recognize static objects or objects that move very slowly or very quickly, and performs poorly when large areas of the target animal's surface have similar gray values. More importantly, it can only determine whether an animal is present in the video, not what the animal is, and cannot even guarantee that this result is correct, so its applicable scenarios are very limited.
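As an illustration only (this is not part of the claimed invention), a minimal sketch of such an inter-frame difference pipeline with OpenCV could look as follows; the frame gap and the threshold value are arbitrary assumptions.

```python
import cv2

def frame_difference_regions(video_path, gap=5, thresh=30):
    """Illustrative sketch: extract motion regions by differencing frames `gap` frames apart."""
    cap = cv2.VideoCapture(video_path)
    prev_gray = None
    regions_per_step = []
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % gap == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            if prev_gray is not None:
                # absolute brightness difference, then thresholding to a binary motion mask
                diff = cv2.absdiff(gray, prev_gray)
                _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
                contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
                # bounding boxes of moving areas, from which an animal region might be inferred
                regions_per_step.append([cv2.boundingRect(c) for c in contours])
            prev_gray = gray
        idx += 1
    cap.release()
    return regions_per_step
```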
Referring to fig. 3, 4, and 5, the conventional computer-vision image processing method requires hand-designed features for each animal in a training data set, after which a classifier is trained on the extracted features for recognition. Detecting an animal in a video frame requires both locating the animal in the frame image and identifying its category, so the recognition model needs a localization function in addition to classification. During training, so that the final model can handle pictures of different scales, each picture is first scaled into several versions with different aspect ratios; rectangular windows of different scales and aspect ratios are then slid across the whole image, and this exhaustive strategy yields candidate regions that may contain a target. A feature matrix is then extracted from the image of each candidate region, and the extracted feature matrices are finally used to train the classifier. After the model is trained, actual use requires extracting video frames at fixed time intervals and running the model on each frame image to identify the animal categories it contains; the recognition results of all extracted frames are then merged as the recognition result of the whole video.
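Again purely for illustration, a toy sketch of the multi-scale sliding-window stage described above, using HOG as one possible hand-designed feature and an externally trained classifier; the window size, stride, scales, and the `svm` object are assumptions rather than anything prescribed by the patent.

```python
import cv2

def sliding_window_candidates(image, scales=(1.0, 0.75, 0.5), win=(64, 128), step=32):
    """Exhaustively enumerate candidate windows at several scales (illustrative sketch)."""
    h, w = image.shape[:2]
    for s in scales:
        resized = cv2.resize(image, (int(w * s), int(h * s)))
        rh, rw = resized.shape[:2]
        for y in range(0, rh - win[1] + 1, step):
            for x in range(0, rw - win[0] + 1, step):
                # yield the window mapped back to original-image coordinates plus the patch itself
                box = (int(x / s), int(y / s), int(win[0] / s), int(win[1] / s))
                yield box, resized[y:y + win[1], x:x + win[0]]

def classify_windows(image, svm):
    """Extract a HOG feature matrix per candidate window and score it with a trained classifier."""
    hog = cv2.HOGDescriptor()                     # default 64x128 detection window
    detections = []
    for box, patch in sliding_window_candidates(image):
        feat = hog.compute(patch).reshape(1, -1)  # hand-designed feature matrix for this region
        label = svm.predict(feat)                 # `svm`: any scikit-learn-style classifier, trained offline
        if label[0] != 0:                         # 0 assumed to mean "background"
            detections.append((box, int(label[0])))
    return detections
```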
Conventional computer-vision image processing can identify the animal classes that may appear in a video. However, the sliding-window approach generates a large number of redundant windows, which adds to the burden of subsequent feature extraction and recognition and seriously degrades processing efficiency. Moreover, the feature matrices extracted with hand-designed feature templates have weak representational power, and the classifier is usually a weak one such as an SVM or AdaBoost, so the recognition accuracy of the final model is low.
Disclosure of Invention
To address these defects of the prior art, the invention provides a deep-learning-based method for automatically generating animal video tags, together with a terminal and a medium, so as to improve recognition efficiency and recognition accuracy.
In a first aspect, a method for automatically generating animal video tags based on deep learning includes the following steps:
extracting a plurality of key frame images in a video to be detected, and inputting the key frame images into a feature extraction model;
inputting the feature information output by the feature extraction model into a trained target detection algorithm model;
and recording the position and the category of the target object output by the target detection algorithm model in the video to be detected, and defining the category of the target object as an animal tag of the video to be detected.
Preferably, the feature extraction model is formed by a convolutional neural network and is trained on the ImageNet classification data set.
Preferably, the target detection algorithm model is obtained by training the following method:
acquiring a training set consisting of a plurality of training pictures, and marking the position and the category of an object in each training picture;
realizing a target detection algorithm based on TensorFlow framework programming;
training the target detection algorithm by using the training set;
and saving the trained target detection algorithm as the target detection algorithm model.
Preferably, the target detection algorithm model comprises a Faster R-CNN algorithm model.
Preferably, the extracting a plurality of key frame images in the video to be detected and inputting the key frame images into the feature extraction model specifically includes:
extracting a plurality of frame images in a video to be detected at a preset time interval, and performing de-duplication processing on the extracted frame images by using a perceptual hash algorithm to obtain the key frame images;
and inputting the key frame image into a feature extraction model.
Preferably, the object detection algorithm model comprises a YOLOv2 algorithm model.
Preferably, the extracting a plurality of key frame images in the video to be detected and inputting the key frame images into the feature extraction model specifically includes:
extracting a frame of image from a video to be detected according to a preset time interval;
comparing the new frame image with the cached key frame image by using a perceptual hash algorithm; if the comparison result is smaller than the preset difference threshold value, discarding the new frame image; if the comparison result is greater than or equal to the difference threshold value, defining a new frame image as the key frame image, and inputting the key frame image into a feature extraction model;
the key frame image is buffered.
Preferably, the recording of the position and the category of the target object in the video to be detected, which is output by the target detection algorithm model, and the defining of the category of the target object as the animal tag of the video to be detected specifically includes:
recording the position and the category of the target object in each key frame image output by the Faster R-CNN algorithm model or the YOLOv2 algorithm model;
counting the number of times each type of animal appears in all key frame images, and sorting the animal types by their frequency of occurrence in the video to be detected in descending order to obtain the animal tags of the video to be detected.
In a second aspect, a terminal comprises a processor, an input device, an output device, and a memory, the processor, the input device, the output device, and the memory being interconnected, wherein the memory is configured to store a computer program, the computer program comprising program instructions, and the processor is configured to invoke the program instructions to perform the method of the first aspect.
In a third aspect, a computer-readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of the first aspect.
According to the above technical solutions, the deep-learning-based animal video tag automatic generation method, terminal and medium provided by the invention improve recognition efficiency and recognition accuracy.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings needed for describing them are briefly introduced below. Throughout the drawings, like elements or portions are generally identified by like reference numerals. In the drawings, elements or portions are not necessarily drawn to scale.
Fig. 1 is a flowchart of an animal video detection method based on an inter-frame difference method in the background art.
Fig. 2 is a flowchart of an implementation of the animal video detection method based on the inter-frame difference method in the background art.
Fig. 3 is a flowchart of a conventional computer vision image processing method provided in the background art.
Fig. 4 is a flowchart of a method for training a model in a conventional computer vision image processing method provided in the background art.
Fig. 5 is a flowchart of a video tag generation method in a conventional computer vision image processing method provided in the background art.
Fig. 6 shows the main steps of the animal video tag automatic generation method provided by the invention.
FIG. 7 is a flowchart of a training method of a target detection model according to the present invention.
Fig. 8 is a flowchart of the tag generation system using the Faster R-CNN algorithm according to the second embodiment of the present invention.
Fig. 9 is a flowchart of the tag generation system using the YOLOv2 algorithm according to the second embodiment of the present invention.
Fig. 10 shows a frame-image animal recognition result of the Faster R-CNN algorithm according to the second embodiment of the present invention.
Fig. 11 shows a frame-image animal recognition result of the YOLOv2 algorithm according to the second embodiment of the present invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and therefore are only examples, and the protection scope of the present invention is not limited thereby. It is to be noted that, unless otherwise specified, technical or scientific terms used herein shall have the ordinary meaning as understood by those skilled in the art to which the invention pertains.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
Embodiment one:
An animal video tag automatic generation method based on deep learning, referring to fig. 6, includes the following steps:
extracting a plurality of key frame images in a video to be detected, and inputting the key frame images into a feature extraction model;
inputting the feature information output by the feature extraction model into a trained target detection algorithm model;
and recording the position and the category of the target object output by the target detection algorithm model in the video to be detected, and defining the category of the target object as an animal tag of the video to be detected.
Specifically, the method for automatically generating animal video tags provided by this embodiment includes a feature extraction model and a target detection model. The feature extraction model is formed by a convolutional neural network and is obtained by training on the ImageNet classification data set; it is used to extract feature information from the key frame images of the video to be detected. The target detection model comprises two functional modules, a locator and a classifier. The locator locates the target object in the key frame image and outputs the width and height of the target object and its coordinates in the key frame image. The classifier classifies the target object located by the locator and outputs the category of the target object. This deep-learning-based method for automatically generating animal video tags improves recognition efficiency and recognition accuracy.
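As a hedged sketch (the patent specifies only that the feature extraction model is a convolutional neural network trained on the ImageNet classification data set, not a particular architecture), the feature extraction model could, for example, be an ImageNet-pretrained backbone used as a pure feature extractor with tf.keras:

```python
import tensorflow as tf

def build_feature_extractor(input_shape=(600, 600, 3)):
    """Illustrative feature extraction model: an ImageNet-pretrained CNN without its
    classification head, used only to produce feature maps for the detection model."""
    backbone = tf.keras.applications.ResNet50(
        weights="imagenet",     # trained on the ImageNet classification data set
        include_top=False,      # keep convolutional features, drop the 1000-class head
        input_shape=input_shape,
    )
    backbone.trainable = False  # the locator/classifier of the detector is trained separately
    return backbone

# Usage sketch: feature information for one key frame image (batch of 1),
# which would then be fed to the target detection model.
# extractor = build_feature_extractor()
# features = extractor(tf.keras.applications.resnet50.preprocess_input(key_frame[None, ...]))
```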
Embodiment two:
the second embodiment further defines a training method of the target detection model on the basis of the first embodiment.
Referring to fig. 7, the target detection model is trained by the following method:
acquiring a training set consisting of a plurality of training pictures, and marking the position and the category of an object in each training picture;
realizing a target detection algorithm based on TensorFlow framework programming;
training the target detection algorithm by using the training set;
and saving the trained target detection algorithm as the target detection algorithm model.
Specifically, the training pictures in the training set may be selected according to the business and usage scenario of a specific user. For example, a suitable number of pictures are screened from the animal pictures that appear in the user's business, the positions and categories of the animals in these pictures are marked, and all marked pictures are used as training pictures. During training, the method continuously adjusts the parameters of the target detection model by comparing the predicted positions and categories with the annotation information of the training pictures, thereby continuously improving the model's localization and classification ability. The method periodically saves the target detection model during training until training stops, and keeps the best model as the final result. Two choices of target detection model are described below.
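A minimal sketch of the training scaffolding described above — loading the marked training pictures, comparing predictions with the annotations, and periodically saving checkpoints so that the best model can be kept at the end. The annotation format, the `detection_model` object, and its `compute_loss` method are assumptions made for illustration; the patent does not define them.

```python
import json
import tensorflow as tf

def load_training_set(annotation_file):
    """Each record is assumed to hold an image path plus the marked positions and categories,
    e.g. [{"image": "dog1.jpg", "boxes": [[x, y, w, h], ...], "labels": [3, ...]}, ...]."""
    with open(annotation_file) as f:
        return json.load(f)

def train(detection_model, records, epochs=20, ckpt_dir="checkpoints"):
    optimizer = tf.keras.optimizers.Adam(1e-4)
    ckpt = tf.train.Checkpoint(model=detection_model, optimizer=optimizer)
    manager = tf.train.CheckpointManager(ckpt, ckpt_dir, max_to_keep=5)
    for _ in range(epochs):
        for rec in records:
            image = tf.io.decode_jpeg(tf.io.read_file(rec["image"]), channels=3)
            image = tf.image.convert_image_dtype(image, tf.float32)[None, ...]
            with tf.GradientTape() as tape:
                predictions = detection_model(image, training=True)
                # compute_loss (assumed interface) compares predicted boxes/classes
                # against the marked positions and categories of this training picture
                loss = detection_model.compute_loss(predictions, rec["boxes"], rec["labels"])
            grads = tape.gradient(loss, detection_model.trainable_variables)
            optimizer.apply_gradients(zip(grads, detection_model.trainable_variables))
        manager.save()  # periodically store the model; keep the best checkpoint after training stops
```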
1. Faster R-CNN algorithm model.
The target detection model comprises a Faster R-CNN algorithm model written on the TensorFlow framework. Referring to fig. 8, extracting a plurality of key frame images from the video to be detected and inputting the key frame images into the feature extraction model specifically includes:
extracting a plurality of frame images in a video to be detected at a preset time interval, and performing de-duplication processing on the extracted frame images by using a perceptual hash algorithm to obtain the key frame images;
and inputting the key frame image into a feature extraction model.
Specifically, the method writes a Faster R-CNN algorithm model based on the TensorFlow framework and trains it with the above training set. Although the Faster R-CNN model is accurate, its complexity is high, so recognition is slow and real-time processing cannot be achieved. When the Faster R-CNN model is used, frame images are extracted from the video to be detected at fixed time intervals, and the extracted frame images are de-duplicated with a perceptual hash algorithm so that only a series of sufficiently different key frame images remains; the feature information of these key frame images is then input into the Faster R-CNN model, which outputs all animal categories and positions in each key frame image.
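Purely as an illustrative sketch of the fixed-interval frame extraction and perceptual-hash de-duplication step (the sampling interval and Hamming-distance threshold are assumptions):

```python
import cv2
import numpy as np

def phash(image, hash_size=8):
    """64-bit perceptual hash: resize, DCT, keep the low-frequency block, threshold at its median."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    small = cv2.resize(gray, (hash_size * 4, hash_size * 4)).astype(np.float32)
    low_freq = cv2.dct(small)[:hash_size, :hash_size]
    return (low_freq > np.median(low_freq)).flatten()

def hamming(h1, h2):
    return int(np.count_nonzero(h1 != h2))

def extract_key_frames(video_path, interval_s=1.0, dist_thresh=10):
    """Sample frames at a fixed interval, then keep only frames that differ enough from all kept ones."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25
    step = max(1, int(fps * interval_s))
    key_frames, kept_hashes = [], []
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            h = phash(frame)
            if all(hamming(h, kept) >= dist_thresh for kept in kept_hashes):
                key_frames.append(frame)   # a key frame image to pass to the feature extraction model
                kept_hashes.append(h)
        idx += 1
    cap.release()
    return key_frames
```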
2. YOLOv2 algorithm model.
The target detection model comprises a YOLOv2 algorithm model written on the TensorFlow framework. Referring to fig. 9, extracting a plurality of key frame images from the video to be detected and inputting the key frame images into the feature extraction model specifically includes:
extracting a frame of image from the video to be detected at a preset time interval;
Comparing the new frame image with the cached key frame image by using a perceptual hash algorithm; if the comparison result is smaller than the preset difference threshold value, discarding the new frame image; if the comparison result is greater than or equal to the difference threshold value, defining a new frame image as the key frame image, and inputting the key frame image into a feature extraction model;
the key frame image is buffered.
Specifically, the method writes a YOLOv2 algorithm model based on the TensorFlow framework and then trains it with the above training set. While keeping recognition accuracy comparable to that of the Faster R-CNN model, the YOLOv2 model greatly improves recognition speed, to 40 FPS to 67 FPS; it can therefore meet the requirement of real-time video processing and can be tuned between accuracy and speed as needed. In actual video processing, adjacent frame images often differ little, and it is not necessary to recognize the animals in every frame image. Therefore, in practical application, the method only needs to cache the most recently recognized key frame image, compare the latest frame image with the cached key frame image using a perceptual hash algorithm, and skip detection on the new frame image if it differs little from the cached key frame image. If the difference is large, the YOLOv2 model is used to locate the animals contained in the new frame image and their categories.
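An illustrative sketch of this streaming variant, reusing the `phash`/`hamming` helpers sketched above; `yolo_v2_detect` is a placeholder for the trained YOLOv2 model and is an assumption, not an API defined by the patent.

```python
import cv2

def stream_detect(video_path, yolo_v2_detect, interval_s=0.5, diff_thresh=10):
    """Run the detector only when a sampled frame differs enough from the cached key frame."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25
    step = max(1, int(fps * interval_s))
    cached_hash = None
    per_frame_detections = []        # one list of (box, animal_class) per key frame
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            h = phash(frame)
            if cached_hash is None or hamming(h, cached_hash) >= diff_thresh:
                per_frame_detections.append(yolo_v2_detect(frame))  # locate animals and their categories
                cached_hash = h                                     # cache the new key frame
            # otherwise the new frame differs little from the cached key frame and is discarded
        idx += 1
    cap.release()
    return per_frame_detections
```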
Preferably, the recording of the position and the category of the target object in the video to be detected, which is output by the target detection algorithm model, and the defining of the category of the target object as the animal tag of the video to be detected specifically includes:
recording the position and the category of the target object in each key frame image output by the Faster R-CNN algorithm model or the YOLOv2 algorithm model;
counting the number of times each type of animal appears in all key frame images, and sorting the animal types by their frequency of occurrence in the video to be detected in descending order to obtain the animal tags of the video to be detected.
Specifically, the method obtains the animal tags of the video by counting how many times each type of animal appears in all frame images and sorting the animal types by their frequency of occurrence in the video to be detected in descending order.
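A short sketch of this tag aggregation step; the per-frame detection format `(box, animal_class)` is an assumption carried over from the sketches above.

```python
from collections import Counter

def aggregate_tags(per_frame_detections):
    """per_frame_detections: list of lists of (box, animal_class) tuples, one list per key frame.
    Returns the animal categories ordered by how often they appear across all key frames."""
    counts = Counter(cls for detections in per_frame_detections for _, cls in detections)
    # descending frequency order = the animal tags of the video to be detected
    return [animal for animal, _ in counts.most_common()]
```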
FIG. 10 shows a frame-image animal recognition result of the Faster R-CNN algorithm model. In fig. 10 there are two dogs; the coordinate positions of the dogs located by the model are drawn as two boxes, and the animal in each box is labeled according to the classification result, for example as a dog. In a concrete implementation, the boxes and animal types do not need to be drawn as in fig. 10; it is sufficient to record the animal types appearing in the image and the animals of each type. Fig. 11 shows a frame-image animal recognition result of the YOLOv2 model.
For the sake of brevity, for parts of the method provided by this embodiment that are not described here, reference may be made to the corresponding content in the foregoing method embodiments.
Embodiment three:
The third embodiment provides a terminal on the basis of the above embodiments.
A terminal comprising a processor, an input device, an output device and a memory, the processor, the input device, the output device and the memory being interconnected, wherein the memory is configured to store a computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the method described above.
It should be understood that, in the embodiments of the present invention, the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The input device may include a touch pad, a fingerprint sensor (for collecting fingerprint information of a user and direction information of the fingerprint), a microphone, etc., and the output device may include a display (LCD, etc.), a speaker, etc.
The memory may include both read-only memory and random access memory, and provides instructions and data to the processor. A portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.
For the sake of brevity, for parts of this embodiment that are not described here, reference may be made to the corresponding content in the foregoing method embodiments.
Embodiment four:
The fourth embodiment provides a medium on the basis of the above-described embodiments.
A computer-readable storage medium, in which a computer program is stored, the computer program comprising program instructions which, when executed by a processor, cause the processor to carry out the above-mentioned method.
The computer readable storage medium may be an internal storage unit of the terminal according to any of the foregoing embodiments, for example, a hard disk or a memory of the terminal. The computer readable storage medium may also be an external storage device of the terminal, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the terminal. Further, the computer-readable storage medium may also include both an internal storage unit and an external storage device of the terminal. The computer-readable storage medium is used for storing the computer program and other programs and data required by the terminal. The computer readable storage medium may also be used to temporarily store data that has been output or is to be output.
For the sake of brevity, for parts of the medium provided by this embodiment that are not described here, reference may be made to the corresponding content in the foregoing method embodiments.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present invention, and they should be construed as falling within the scope of the claims and description of the present invention.

Claims (10)

1. An animal video tag automatic generation method based on deep learning is characterized by comprising the following steps:
extracting a plurality of key frame images in a video to be detected, and inputting the key frame images into a feature extraction model;
inputting the feature information output by the feature extraction model into a trained target detection algorithm model;
and recording the position and the category of the target object output by the target detection algorithm model in the video to be detected, and defining the category of the target object as an animal tag of the video to be detected.
2. The method for automatically generating animal video tags based on deep learning of claim 1,
the feature extraction model is formed by a convolutional neural network and is obtained by training on the ImageNet classification data set.
3. The method for automatically generating animal video tags based on deep learning of claim 1, wherein the target detection algorithm model is trained by the following method:
acquiring a training set consisting of a plurality of training pictures, and marking the position and the category of an object in each training picture;
realizing a target detection algorithm based on TensorFlow framework programming;
training the target detection algorithm by using the training set;
and saving the trained target detection algorithm as the target detection algorithm model.
4. The method for automatically generating animal video tags based on deep learning of claim 3,
the target detection algorithm model comprises a Faster R-CNN algorithm model.
5. The method for automatically generating animal video tags based on deep learning of claim 4, wherein the extracting a plurality of key frame images from the video to be detected and inputting the key frame images into the feature extraction model specifically comprises:
extracting a plurality of frame images in a video to be detected at a preset time interval, and performing de-duplication processing on the extracted frame images by using a perceptual hash algorithm to obtain the key frame images;
and inputting the key frame image into a feature extraction model.
6. The method for automatically generating animal video tags based on deep learning of claim 3,
the target detection algorithm model comprises a YOLOv2 algorithm model.
7. The method for automatically generating animal video tags based on deep learning of claim 6, wherein the extracting a plurality of key frame images from a video to be detected and inputting the key frame images into a feature extraction model specifically comprises:
extracting a frame of image from a video to be detected according to a preset time interval;
comparing the new frame image with the cached key frame image by using a perceptual hash algorithm; if the comparison result is smaller than the preset difference threshold value, discarding the new frame image; if the comparison result is greater than or equal to the difference threshold value, defining a new frame image as the key frame image, and inputting the key frame image into a feature extraction model;
the key frame image is buffered.
8. The method for automatically generating animal video tags based on deep learning according to claim 5 or 7, wherein the step of recording the position and the category of the target object output by the target detection algorithm model in the video to be detected and the step of defining the category of the target object as the animal tags of the video to be detected specifically comprises the steps of:
recording the position and the category of the target object in each key frame image output by the Faster R-CNN algorithm model or the YOLOv2 algorithm model;
counting the number of times each type of animal appears in all key frame images, and sorting the animal types by their frequency of occurrence in the video to be detected in descending order to obtain the animal tags of the video to be detected.
9. A terminal, comprising a processor, an input device, an output device, and a memory, the processor, the input device, the output device, and the memory being interconnected, wherein the memory is configured to store a computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the method of any of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to perform the method according to any of claims 1-7.
CN202010382574.5A 2020-05-08 2020-05-08 Animal video tag automatic generation method based on deep learning, terminal and medium Pending CN111552837A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010382574.5A CN111552837A (en) 2020-05-08 2020-05-08 Animal video tag automatic generation method based on deep learning, terminal and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010382574.5A CN111552837A (en) 2020-05-08 2020-05-08 Animal video tag automatic generation method based on deep learning, terminal and medium

Publications (1)

Publication Number Publication Date
CN111552837A (en) 2020-08-18

Family

ID=72001892

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010382574.5A Pending CN111552837A (en) 2020-05-08 2020-05-08 Animal video tag automatic generation method based on deep learning, terminal and medium

Country Status (1)

Country Link
CN (1) CN111552837A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105718890A (en) * 2016-01-22 2016-06-29 北京大学 Method for detecting specific videos based on convolution neural network
CN110119757A (en) * 2019-03-28 2019-08-13 北京奇艺世纪科技有限公司 Model training method, video category detection method, device, electronic equipment and computer-readable medium
CN110147722A (en) * 2019-04-11 2019-08-20 平安科技(深圳)有限公司 A kind of method for processing video frequency, video process apparatus and terminal device
CN110188794A (en) * 2019-04-23 2019-08-30 深圳大学 A kind of training method, device, equipment and the storage medium of deep learning model
CN110472492A (en) * 2019-07-05 2019-11-19 平安国际智慧城市科技股份有限公司 Target organism detection method, device, computer equipment and storage medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114866788A (en) * 2021-02-03 2022-08-05 阿里巴巴集团控股有限公司 Video processing method and device
CN112819885A (en) * 2021-02-20 2021-05-18 深圳市英威诺科技有限公司 Animal identification method, device and equipment based on deep learning and storage medium
CN113076882A (en) * 2021-04-03 2021-07-06 国家计算机网络与信息安全管理中心 Specific mark detection method based on deep learning
CN115115822A (en) * 2022-06-30 2022-09-27 小米汽车科技有限公司 Vehicle-end image processing method and device, vehicle, storage medium and chip
CN115115822B (en) * 2022-06-30 2023-10-31 小米汽车科技有限公司 Vehicle-end image processing method and device, vehicle, storage medium and chip
CN116612494A (en) * 2023-05-05 2023-08-18 交通运输部水运科学研究所 Pedestrian target detection method and device in video monitoring based on deep learning
CN117037049A (en) * 2023-10-10 2023-11-10 武汉博特智能科技有限公司 Image content detection method and system based on YOLOv5 deep learning
CN117037049B (en) * 2023-10-10 2023-12-15 武汉博特智能科技有限公司 Image content detection method and system based on YOLOv5 deep learning

Similar Documents

Publication Publication Date Title
CN111552837A (en) Animal video tag automatic generation method based on deep learning, terminal and medium
CN111241947B (en) Training method and device for target detection model, storage medium and computer equipment
CN109492643B (en) Certificate identification method and device based on OCR, computer equipment and storage medium
CN110197146B (en) Face image analysis method based on deep learning, electronic device and storage medium
WO2019128646A1 (en) Face detection method, method and device for training parameters of convolutional neural network, and medium
CN109344727B (en) Identity card text information detection method and device, readable storage medium and terminal
US20190362193A1 (en) Eyeglass positioning method, apparatus and storage medium
US20060222243A1 (en) Extraction and scaled display of objects in an image
CN111832366B (en) Image recognition apparatus and method
CN110222582B (en) Image processing method and camera
Molina-Moreno et al. Efficient scale-adaptive license plate detection system
CN111488943A (en) Face recognition method and device
CN111368632A (en) Signature identification method and device
CN110796039B (en) Face flaw detection method and device, electronic equipment and storage medium
CN112541394A (en) Black eye and rhinitis identification method, system and computer medium
CN110298302B (en) Human body target detection method and related equipment
CN116543261A (en) Model training method for image recognition, image recognition method device and medium
CN111027526A (en) Method for improving vehicle target detection, identification and detection efficiency
CN109034032B (en) Image processing method, apparatus, device and medium
CN112686122B (en) Human body and shadow detection method and device, electronic equipment and storage medium
CN112200218A (en) Model training method and device and electronic equipment
CN110796145B (en) Multi-certificate segmentation association method and related equipment based on intelligent decision
WO2020244076A1 (en) Face recognition method and apparatus, and electronic device and storage medium
US20240087352A1 (en) System for identifying companion animal and method therefor
CN115953744A (en) Vehicle identification tracking method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination