CN113536823A - Video scene label extraction system and method based on deep learning and application thereof - Google Patents


Info

Publication number
CN113536823A
CN113536823A (application CN202010281542.6A)
Authority
CN
China
Prior art keywords
deep learning
video
module
picture
preprocessing
Prior art date
Legal status
Pending
Application number
CN202010281542.6A
Other languages
Chinese (zh)
Inventor
秦迎梅
门聪
车艳秋
韩春晓
Current Assignee
Tianjin University of Technology and Education China Vocational Training Instructor Training Center
Original Assignee
Tianjin University of Technology and Education China Vocational Training Instructor Training Center
Priority date
Filing date
Publication date
Application filed by Tianjin University of Technology and Education China Vocational Training Instructor Training Center
Priority to CN202010281542.6A
Publication of CN113536823A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a video scene label extraction system based on deep learning and a method thereof. The method comprises the following steps: step 1, a sample construction module collects picture samples and performs a second pass of scene-label annotation; step 2, a data preprocessing module preprocesses the picture samples obtained in step 1 and divides them into a training set and a validation set; step 3, the deep learning model in the deep learning model module is trained with the training set obtained in step 2 and then validated with the validation set to obtain a recognition model; step 4, the recognition and processing module extracts key frames from the video to be processed, records each key frame's time point in the video, and preprocesses the extracted key frames; the preprocessed key-frame pictures are then input into the recognition model to obtain the candidate scene labels. The invention addresses the insufficient extraction of video content information, and clients can use the recognized information to optimize video recommendation and search.

Description

Video scene label extraction system and method based on deep learning and application thereof
Technical Field
The invention relates to the technical field of video processing, in particular to a system and a method for extracting video scene labels based on deep learning and application thereof.
Background
In the field of video apps, the user bases of short-video and micro-video products keep growing, and large amounts of video data need to be analyzed and processed. Video-product platforms need to analyze videos effectively, yet the content labels currently extracted from video content are limited, and general video search relies mainly on matching keywords against video titles, so the extracted labels deviate from the actual video content.
Disclosure of Invention
The invention aims to provide a video scene tag extraction system based on deep learning, addressing the prior-art problem that tags extracted from keywords deviate from the video content.
Another object of the present invention is to provide an extraction method of the video scene tag extraction system.
Another object of the present invention is to provide an application of the video scene tag extraction system and the extraction method.
The technical scheme adopted for realizing the purpose of the invention is as follows:
a video scene label extraction system based on deep learning is characterized by comprising a sample construction module, a data preprocessing module, a deep learning model module and an identification and processing module;
the system comprises a sample construction module, a data preprocessing module, a deep learning model module, an identification and processing module, a video processing module and a data processing module, wherein the sample construction module is used for collecting picture samples and labeling the picture samples with scene labels, the data preprocessing module is used for preprocessing the picture samples in a filtering and standardization mode, the deep learning model in the deep learning model module is trained by the preprocessed picture samples to obtain an identification model, the identification and processing module is used for extracting pictures from videos and then standardizing the pictures, and the identification model is used for identifying the standardized extracted pictures and outputting the video scene labels.
In the above technical solution, the recognition and processing module may run locally or be deployed in the cloud; cloud deployment proceeds as follows:
Step 1, build the server back end with the Python Flask framework and set up an HTTP service; step 2, the server opens a port and processes requests arriving over the Internet.
In the technical scheme, the deep learning model adopts an EfficientNet network structure.
In another aspect of the invention, the deep learning-based video scene label extraction system is applied to short video feature extraction and search strategy optimization.
In another aspect of the present invention, an extraction method of a deep learning-based video scene tag extraction system includes the following steps:
step 1, the sample construction module collects picture samples, comprising picture samples from public datasets and picture samples obtained through keyword search; all picture samples are annotated with scene labels in a second pass according to the scenes to be mined, and pictures that do not match the scene label of their category are deleted;
step 2, preprocessing the picture samples obtained in the step 1 by a data preprocessing module, sorting the picture samples according to categories, and dividing the picture samples into a training set and a verification set;
step 3, training the deep learning model in the deep learning model module by using the training set obtained in the step 2, verifying the deep learning model by using the verification set, and storing the deep learning model with the optimal effect on the verification set to obtain an identification model;
step 4, the identification and processing module extracts key frames of the video to be processed, records corresponding time points of the key frames in the video, and preprocesses the extracted key frames;
the key-frame picture generated by preprocessing is input into the recognition model to obtain a candidate scene label x and a corresponding score; if the score is above a threshold a, the picture is considered to carry label x. Preferably, the score lies between 0 and 1 and a is 0.5.
In the above technical solution, the preprocessing in step 2 first filters out picture samples narrower than 200 pixels and then resamples and black-pads the rest into 446 × 446 pixel pictures; the preprocessing in step 4 likewise resamples and black-pads into 446 × 446 pixel pictures.
In the above technical solution, in step 2, the ratio of the number of training-set samples to validation-set samples is (3-5):1, more preferably 4:1.
In the above technical solution, in step 2, after the preprocessing, the number of samples is increased with data augmentation techniques, including translation by random pixel offsets, rotation by random angles, and left-right mirroring.
In the above technical solution, in step 4, the ffmpeg tool is used to extract the key frames of the video to be processed;
in step 4, key-frame pictures are extracted every 2-5 s, and after each key frame in the video is identified, the scene labels for each time point, namely the scene labels contained in the key frame at that time point, can be generated.
In another aspect of the invention, the extraction method is applied to short video feature extraction and search strategy optimization.
Compared with the prior art, the invention has the beneficial effects that:
the method can extract and identify the scene labels of the video through a deep learning technology, and based on the sequence information, the efficiency and the accuracy of extracting the video characteristics can be improved. The method can be further applied to a short video platform, and richness and accuracy of search and recommendation results can be improved.
Drawings
FIG. 1 is an example of a potential application scenario of the method.
Detailed Description
The present invention will be described in further detail with reference to specific examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example 1
A video scene label extraction system based on deep learning comprises a sample construction module, a data preprocessing module, a deep learning model module and an identification and processing module;
the system comprises a sample construction module, a data preprocessing module, a deep learning model module, an identification and processing module, a video processing module and a data processing module, wherein the sample construction module is used for collecting picture samples and labeling the picture samples with scene labels, the data preprocessing module is used for preprocessing the picture samples in a filtering and standardization mode, the deep learning model in the deep learning model module is trained by the preprocessed picture samples to obtain an identification model, the identification and processing module is used for extracting pictures from videos and then standardizing the pictures, and the identification model is used for identifying the standardized extracted pictures and outputting the video scene labels.
The recognition and processing module can run locally or be deployed in the cloud; cloud deployment proceeds as follows:
Step 1, build the server back end with the Python Flask framework and set up an HTTP service; the cloud platform may be Aliyun. Step 2, the server opens a port, e.g., 8080, and processes requests arriving over the Internet; the requests are transmitted via the HTTP protocol.
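A minimal sketch of such a Flask back end follows. The `/predict` route and the `recognize_labels` helper are hypothetical names invented for illustration; only Flask, the HTTP service, and port 8080 come from the text, and a real deployment would run the trained recognition model instead of the placeholder.

```python
# Sketch of the cloud deployment described above: a Flask back end
# exposing the recognition model over HTTP.
from flask import Flask, request, jsonify

app = Flask(__name__)

def recognize_labels(image_bytes):
    # Placeholder: a real service would run the EfficientNet recognition
    # model on the picture bytes and return (label, score) pairs.
    return [("indoor", 0.91)]

@app.route("/predict", methods=["POST"])
def predict():
    image_bytes = request.get_data()        # raw picture bytes from the client
    labels = recognize_labels(image_bytes)  # run the recognition model
    return jsonify([{"label": l, "score": s} for l, s in labels])

# To serve requests from the internet, open a port such as 8080:
#   app.run(host="0.0.0.0", port=8080)
```

The route can be exercised without a live server via Flask's built-in test client, which is also how the sketch above can be checked locally before deploying to a cloud host.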
Example 2
The extraction method of the deep learning-based video scene tag extraction system according to embodiment 1 includes the following steps:
step 1, the sample construction module collects picture samples, comprising picture samples from public datasets and picture samples obtained through keyword search; scene labels are then annotated in a second pass according to the scenes to be mined (scene label categories such as indoor, street, park, home, office, field, seaside, mountain, car, ship, snow mountain, desert), and pictures that do not match the scene label of their category are deleted;
step 2, preprocessing the picture samples obtained in the step 1 by a data preprocessing module, sorting the picture samples according to categories, and dividing the picture samples into a training set and a verification set;
step 3, training the deep learning model in the deep learning model module by using the training set obtained in the step 2, verifying the deep learning model by using the verification set, and storing the deep learning model with the optimal effect on the verification set to obtain an identification model;
step 4, the identification and processing module extracts key frames of the video to be processed, records corresponding time points of the key frames in the video, and preprocesses the extracted key frames;
The key-frame picture generated by preprocessing is input into the recognition model to obtain a candidate scene label x and a corresponding score (the score lies between 0 and 1); if the score is above a threshold a, the picture is considered to carry label x.
Preferably, a is 0.5.
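The thresholding rule just described is easy to state in code; a sketch, where the label names and scores are illustrative rather than taken from the patent:

```python
# Keep only the scene labels whose score clears the threshold a.
# Scores lie in [0, 1]; the preferred threshold is a = 0.5, and
# "above the threshold" is read strictly (score > a) here.
def filter_labels(scored_labels, a=0.5):
    """scored_labels: list of (label, score) pairs from the recognition model."""
    return [label for label, score in scored_labels if score > a]

# Illustrative model output for one key frame:
predictions = [("seaside", 0.83), ("ship", 0.41), ("mountain", 0.57)]
kept = filter_labels(predictions)  # -> ['seaside', 'mountain']
```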
To optimize recognition, the deep learning model adopts the EfficientNet network architecture. The common ways to scale up a deep-learning classification network are widening the network, deepening the network, and increasing the input resolution. EfficientNet treats network width, depth, and resolution as joint optimization parameters and finds the best combination of the three under a given model complexity, which lets the extraction system improve recognition accuracy.
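EfficientNet's compound scaling of width, depth, and resolution can be sketched without any framework. The coefficients below come from the original EfficientNet paper (Tan & Le, 2019), not from this patent, and the base resolution of 224 is EfficientNet-B0's, not the 446 × 446 input used by the system above:

```python
# Compound scaling as used by EfficientNet: depth, width, and input
# resolution are scaled together from one coefficient phi, under the
# paper's constraint alpha * beta**2 * gamma**2 ~= 2.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution multipliers

def scale(phi, base_depth=1.0, base_width=1.0, base_resolution=224):
    """Return (depth multiplier, width multiplier, input resolution) for scale phi."""
    depth = base_depth * ALPHA ** phi
    width = base_width * BETA ** phi
    resolution = round(base_resolution * GAMMA ** phi)
    return depth, width, resolution

d, w, r = scale(phi=1)  # phi = 1 roughly corresponds to EfficientNet-B1
```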
To unify the picture samples, the preprocessing in step 2 first filters out picture samples narrower than 200 pixels and then resamples and black-pads the rest into 446 × 446 pixel pictures; the preprocessing in step 4 likewise resamples and black-pads into 446 × 446 pixel pictures.
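A minimal sketch of this preprocessing with Pillow. The function name and the centred placement of the resampled picture on the black canvas are assumptions; the patent only specifies filtering pictures narrower than 200 px and resampling with black filling to 446 × 446:

```python
# Sketch of the preprocessing in steps 2 and 4: drop pictures narrower
# than 200 px, then resample and black-pad to a 446 x 446 square.
from PIL import Image

def preprocess(img, size=446, min_width=200):
    if img.width < min_width:
        return None  # filtered out (the step-2 width filter)
    # Resample so the longer side fits `size`, preserving aspect ratio.
    factor = size / max(img.width, img.height)
    resized = img.resize((max(1, round(img.width * factor)),
                          max(1, round(img.height * factor))))
    # Paste onto a black square canvas ("black filling"), centred.
    canvas = Image.new("RGB", (size, size), (0, 0, 0))
    canvas.paste(resized, ((size - resized.width) // 2,
                           (size - resized.height) // 2))
    return canvas
```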
In step 1, relevant samples are downloaded from public network datasets, such as ImageNet, to obtain the public-dataset picture samples.
In step 2, the ratio of the number of training-set samples to validation-set samples is (3-5):1, more preferably 4:1.
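The preferred 4:1 split can be sketched as follows; shuffling before the split and the fixed seed are assumptions, since the patent only fixes the ratio:

```python
# Split preprocessed picture samples into training and validation sets
# at the preferred 4:1 ratio.
import random

def split_samples(samples, train_ratio=4, val_ratio=1, seed=0):
    samples = samples[:]                      # don't mutate the caller's list
    random.Random(seed).shuffle(samples)      # shuffle is an assumption
    cut = len(samples) * train_ratio // (train_ratio + val_ratio)
    return samples[:cut], samples[cut:]

train, val = split_samples(list(range(100)))
# 100 samples at 4:1 -> 80 for training, 20 for validation.
```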
To improve the generalization ability of the deep learning model, after the preprocessing in step 2 the number of samples is increased with data augmentation techniques, including translation by random pixel offsets (for example, random shifts of 1-50 pixels up, down, left, or right), rotation by random angles (for example, -20 to 20 degrees), and left-right mirroring. All pictures are augmented in this way; applying different augmentation methods to one picture yields multiple pictures, all of which are used as training samples.
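A sketch of these augmentations with Pillow. Using `ImageChops.offset` for translation wraps pixels around the edge rather than padding with black, which is an assumption, since the patent does not specify edge handling:

```python
# Augment one preprocessed picture into several training samples:
# random translation (1-50 px), random rotation (-20..20 degrees),
# and left-right mirroring.
import random
from PIL import Image, ImageChops, ImageOps

def augment(img, rng=None):
    rng = rng or random.Random(0)
    variants = []
    # Translation by a random pixel offset (wraps around the image edges).
    variants.append(ImageChops.offset(img, rng.randint(1, 50), rng.randint(1, 50)))
    # Rotation by a random angle between -20 and 20 degrees.
    variants.append(img.rotate(rng.uniform(-20, 20)))
    # Left-right mirroring.
    variants.append(ImageOps.mirror(img))
    return variants  # one source picture yields several training samples
```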
To extract the key-frame pictures, step 4 uses the ffmpeg tool on the video to be processed.
To optimize label extraction, key-frame pictures are extracted every 2-5 s in step 4. After each key frame in the video is identified, the scene labels for each time point, namely the scene labels contained in the key frame at that time point, can be generated. Specific uses of this information are detailed in Example 3.
Example 3
This embodiment exemplifies an application scenario of the extraction system of embodiment 1 or the extraction method of embodiment 2.
3.1
The extraction system and method can be applied to short-video feature extraction and search-strategy optimization, for example by computing which main scene labels a short video contains and using them as features in video search. Search then no longer relies on the video title alone but matches directly against the video's content labels. Current video search relies mainly on title information; as shown in FIG. 1, adding content information increases the richness of search results.
3.2
The extraction system and method can also be applied to short-video recommendation-strategy optimization: mine which labels correspond to higher completion rates and like rates for a user, then increase the recommendation weight of videos carrying those labels.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the invention, and these modifications and refinements should also be regarded as falling within the protection scope of the invention.

Claims (10)

1. A video scene label extraction system based on deep learning is characterized by comprising a sample construction module, a data preprocessing module, a deep learning model module and an identification and processing module;
the system comprises a sample construction module, a data preprocessing module, a deep learning model module, an identification and processing module, a video processing module and a data processing module, wherein the sample construction module is used for collecting picture samples and labeling the picture samples with scene labels, the data preprocessing module is used for preprocessing the picture samples in a filtering and standardization mode, the deep learning model in the deep learning model module is trained by the preprocessed picture samples to obtain an identification model, the identification and processing module is used for extracting pictures from videos and then standardizing the pictures, and the identification model is used for identifying the standardized extracted pictures and outputting the video scene labels.
2. The deep learning-based video scene tag extraction system of claim 1, wherein the recognition and processing module is configured to run locally or be deployed in a cloud, and cloud deployment proceeds as follows:
step 1, building the server back end with the Python Flask framework and setting up an HTTP service; and step 2, the server opening a port and processing requests transmitted over the Internet.
3. The deep learning based video scene tag extraction system of claim 1, wherein the deep learning model employs an EfficientNet network structure.
4. Use of a deep learning based video scene tag extraction system according to any of claims 1-3 in short video feature extraction and search strategy optimization.
5. The extraction method of the video scene label extraction system based on deep learning is characterized by comprising the following steps:
step 1, the sample construction module collects picture samples, comprising picture samples from public datasets and picture samples obtained through keyword search; all picture samples are annotated with scene labels in a second pass according to the scenes to be mined, and pictures that do not match the scene label of their category are deleted;
step 2, preprocessing the picture samples obtained in the step 1 by a data preprocessing module, sorting the picture samples according to categories, and dividing the picture samples into a training set and a verification set;
step 3, training the deep learning model in the deep learning model module by using the training set obtained in the step 2, verifying the deep learning model by using the verification set, and storing the deep learning model with the optimal effect on the verification set to obtain an identification model;
step 4, the identification and processing module extracts key frames of the video to be processed, records corresponding time points of the key frames in the video, and preprocesses the extracted key frames;
and inputting the key-frame picture generated by preprocessing into the recognition model to obtain a candidate scene label x and a corresponding score, wherein if the score is above a threshold a, the picture is considered to carry label x; preferably, the score lies between 0 and 1 and a is 0.5.
6. The extraction method according to claim 5, wherein in the step 2 preprocessing, the picture samples with the width less than 200 pixels are filtered, and then the picture samples are resampled and black-filled to be processed into a 446 × 446 pixel picture; and when the preprocessing is performed in the step 4, resampling and black filling are performed, and a 446 × 446 pixel picture is processed.
7. The extraction method according to claim 5, wherein in step 2, the ratio of the number of samples in the training set to the number of samples in the validation set is (3-5):1, and more preferably 4: 1.
8. The extraction method according to claim 5, wherein in the step 2, after the preprocessing, the number of samples is increased by using a sample enhancement technique, including translating random pixels, rotating random angles or mirroring left and right.
9. The extraction method as claimed in claim 5, wherein in step 4, the key frames of the video to be processed are extracted by using ffmpeg tool;
in the step 4, the key frame pictures are extracted every 2-5s, and in the step 4, after each key frame in the video is identified, the corresponding scene label of each time point, namely the scene label included in the key frame of each time point, can be generated.
10. Use of the extraction method of any one of claims 5-9 in short video feature extraction and search strategy optimization.
CN202010281542.6A 2020-04-10 2020-04-10 Video scene label extraction system and method based on deep learning and application thereof Pending CN113536823A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010281542.6A CN113536823A (en) 2020-04-10 2020-04-10 Video scene label extraction system and method based on deep learning and application thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010281542.6A CN113536823A (en) 2020-04-10 2020-04-10 Video scene label extraction system and method based on deep learning and application thereof

Publications (1)

Publication Number Publication Date
CN113536823A true CN113536823A (en) 2021-10-22

Family

ID=78087766

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010281542.6A Pending CN113536823A (en) 2020-04-10 2020-04-10 Video scene label extraction system and method based on deep learning and application thereof

Country Status (1)

Country Link
CN (1) CN113536823A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114187558A (en) * 2021-12-20 2022-03-15 深圳万兴软件有限公司 Video scene recognition method and device, computer equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105187795A (en) * 2015-09-14 2015-12-23 博康云信科技有限公司 Video label positioning method and device based on view library
CN108777815A (en) * 2018-06-08 2018-11-09 Oppo广东移动通信有限公司 Method for processing video frequency and device, electronic equipment, computer readable storage medium
CN109284784A (en) * 2018-09-29 2019-01-29 北京数美时代科技有限公司 A kind of content auditing model training method and device for live scene video
CN110674345A (en) * 2019-09-12 2020-01-10 北京奇艺世纪科技有限公司 Video searching method and device and server
CN110827250A (en) * 2019-10-29 2020-02-21 浙江明峰智能医疗科技有限公司 Intelligent medical image quality evaluation method based on lightweight convolutional neural network



Similar Documents

Publication Publication Date Title
CN110297943B (en) Label adding method and device, electronic equipment and storage medium
CN109657552B (en) Vehicle type recognition device and method for realizing cross-scene cold start based on transfer learning
WO2021082589A1 (en) Content check model training method and apparatus, video content check method and apparatus, computer device, and storage medium
US8315430B2 (en) Object recognition and database population for video indexing
CN107103314B (en) A kind of fake license plate vehicle retrieval system based on machine vision
CN111191695A (en) Website picture tampering detection method based on deep learning
CN106844685B (en) Method, device and server for identifying website
CN113407886A (en) Network crime platform identification method, system, device and computer storage medium
CN107992937B (en) Unstructured data judgment method and device based on deep learning
CN112488222B (en) Crowdsourcing data labeling method, system, server and storage medium
CN109684511A (en) A kind of video clipping method, video aggregation method, apparatus and system
Chen et al. Text area detection from video frames
CN111601179A (en) Network advertisement promotion method based on video content
CN115775363A (en) Illegal video detection method based on text and video fusion
CN109408671A (en) The searching method and its system of specific objective
CN111914649A (en) Face recognition method and device, electronic equipment and storage medium
CN113536823A (en) Video scene label extraction system and method based on deep learning and application thereof
CN113536032A (en) Video sequence information mining system, method and application thereof
CN114037886A (en) Image recognition method and device, electronic equipment and readable storage medium
CN113762034A (en) Video classification method and device, storage medium and electronic equipment
CN115983873B (en) User data analysis management system and method based on big data
CN112685510B (en) Asset labeling method, computer program and storage medium based on full flow label
CN111860222B (en) Video behavior recognition method, system, computer device and storage medium based on dense-segmented frame sampling
CN110163043B (en) Face detection method, device, storage medium and electronic device
CN114267084A (en) Video identification method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20211022