CN114241367A - Visual semantic detection method and system - Google Patents
- Publication number
- CN114241367A (application CN202111461695.XA)
- Authority
- CN
- China
- Prior art keywords
- label
- frame
- frame image
- dimension
- video data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computational Linguistics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a visual semantic detection method and system. A feature value of the gradient-direction histogram of each frame image is calculated to find feature-value jump points, and each jump point is marked with a label. The frame image at the moment after each label is vectorized and dimension-converted to obtain a high-dimensional sample set, which is divided into different planes. The frame images of each plane are then input into a graphic analysis model to judge whether the frame image and the corresponding data stream are compliant.
Description
Technical Field
The present application relates to the field of network multimedia, and in particular, to a method and system for visual semantic detection.
Background
With the rapid development of online video, a large number of video programs have appeared, enriching people's leisure time. However, this also raises a problem: non-compliant content, such as violence, may appear in a video. In the prior art, the feature value of the gradient-direction histogram of each frame is calculated and used for detection, but the results are not particularly good and further improvement is needed.
Therefore, a method and system for targeted visual semantic detection is urgently needed.
Disclosure of Invention
The invention aims to provide a visual semantic detection method and system in which a feature value of the gradient-direction histogram of each frame image is calculated to find feature-value jump points, each jump point is marked with a label, the frame image at the moment after each label is vectorized and dimension-converted to obtain a high-dimensional sample set, the high-dimensional sample set is divided into different planes, and the frame images of each plane are input into a graphic analysis model to judge whether the frame image and the corresponding data stream are compliant.
In a first aspect, the present application provides a method for visual semantic detection, the method comprising:
acquiring a video data stream, calculating a feature value of the gradient-direction histogram of each frame, determining that an image jump has occurred between two frames when the difference between their feature values is greater than a preset threshold, and inserting a label between the two frames, wherein the label marks the point of the image jump;
extracting, according to the label, the frame image at the moment immediately after the label, inputting the frame image into a semantic analysis model, analyzing the text information contained in the frame image, acquiring key text features, and judging whether the frame image includes non-compliant text content, to obtain a first judgment result;
vectorizing the frame image at the moment after the label and performing dimension conversion, in which the dimension P x Q of the received frame image is converted into the dimension M x N, wherein P x Q is the dimension of the signal transmission channel, M x N is the dimension of server load processing, and P, Q, M and N are all positive integers; obtaining a high-dimensional sample set, and calling a classifier to classify the high-dimensional sample set; wherein the classification refers to clustering according to the feature values of the frame images after vectorization and dimension conversion: feature values that lie within a preset difference range are placed in one group, the average feature value of each group is calculated, and groups whose average feature values differ by more than a threshold are assigned to different planes;
inputting the frame images of the same plane into a graphic analysis model, identifying the object information contained in the frame images, acquiring key object features, and judging whether the frame images include non-compliant graphic content, to obtain a second judgment result;
determining, according to the first judgment result and the second judgment result, whether the frame image at the moment after the label is compliant; if so, judging the segment of the video data stream between the current label and the next label to be compliant and storing it in a server; otherwise, judging the segment of the video data stream between the current label and the next label to be non-compliant and deleting it;
and moving to the next label and repeating the steps from extracting the frame image at the moment after the label, until the entire video data stream has been judged.
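The first step above (thresholding the frame-to-frame difference of a gradient-direction histogram feature) can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names `gradient_direction_histogram` and `find_labels`, the choice of 8 bins, magnitude weighting, and the use of the L1 distance between normalized histograms as the "difference of feature values" are all assumptions made for illustration.

```python
import math


def gradient_direction_histogram(frame, bins=8):
    """Normalized, magnitude-weighted histogram of gradient directions.

    `frame` is a grayscale image given as a list of rows of numbers.
    Gradients are central differences on interior pixels (an assumption).
    """
    hist = [0.0] * bins
    for y in range(1, len(frame) - 1):
        for x in range(1, len(frame[0]) - 1):
            gx = frame[y][x + 1] - frame[y][x - 1]
            gy = frame[y + 1][x] - frame[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            # Unsigned direction folded into [0, pi).
            angle = math.atan2(gy, gx) % math.pi
            index = min(int(angle / math.pi * bins), bins - 1)
            hist[index] += magnitude
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def find_labels(frames, threshold, bins=8):
    """Return every index i for which a label sits between frame i-1 and frame i."""
    labels = []
    previous = None
    for i, frame in enumerate(frames):
        current = gradient_direction_histogram(frame, bins)
        if previous is not None:
            # Hypothetical difference measure: L1 distance between histograms.
            difference = sum(abs(a - b) for a, b in zip(current, previous))
            if difference > threshold:
                labels.append(i)
        previous = current
    return labels
```

With two frames showing a vertical edge followed by two frames showing a horizontal edge, `find_labels` reports a single label before the third frame; the threshold itself would have to be tuned on real video.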
With reference to the first aspect, in a first possible implementation manner of the first aspect, calculating the feature value of the gradient-direction histogram includes detecting a change in the position of the grayscale centroid of the image.
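As a rough illustration of what detecting a change in the grayscale centroid position could look like, the sketch below computes the intensity-weighted centroid of two frames and the distance between them. The names `gray_centroid` and `centroid_shift`, and the use of Euclidean distance, are assumptions for illustration and are not specified in the source.

```python
import math


def gray_centroid(frame):
    """Intensity-weighted centroid (x, y) of a grayscale frame, or None if all zero."""
    total = sum_x = sum_y = 0.0
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            total += value
            sum_x += x * value
            sum_y += y * value
    if total == 0:
        return None
    return (sum_x / total, sum_y / total)


def centroid_shift(frame_a, frame_b):
    """Euclidean distance between the grayscale centroids of two frames."""
    a, b = gray_centroid(frame_a), gray_centroid(frame_b)
    if a is None or b is None:
        return 0.0  # Assumed convention for an entirely black frame.
    return math.hypot(a[0] - b[0], a[1] - b[1])
```

A large shift between consecutive frames suggests that the bright regions of the image have moved, which is one plausible cue for an image jump.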
With reference to the first aspect, in a second possible implementation manner of the first aspect, calling the classifier to classify the high-dimensional sample set includes an inner product operation between an input vector and the high-dimensional sample set.
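The source does not say how the inner products are used. One possible reading, shown here purely as an assumption, is a nearest-sample rule: the input vector takes the class of the sample with which it has the largest normalized inner product (cosine similarity). The names `inner_product` and `classify_by_inner_product` are hypothetical.

```python
import math


def inner_product(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def classify_by_inner_product(vector, samples, sample_classes):
    """Return the class of the sample most similar to `vector`.

    Similarity is the inner product divided by both vector norms
    (cosine similarity), which is an assumed choice, not the patent's.
    """
    norm_v = math.sqrt(inner_product(vector, vector))
    best_class, best_score = None, -math.inf
    for sample, sample_class in zip(samples, sample_classes):
        norm_s = math.sqrt(inner_product(sample, sample))
        if norm_v == 0 or norm_s == 0:
            continue  # Zero vectors have no direction to compare.
        score = inner_product(vector, sample) / (norm_v * norm_s)
        if score > best_score:
            best_class, best_score = sample_class, score
    return best_class
```

Normalizing by the norms keeps long vectors from dominating the comparison; an unnormalized inner product or a kernel function would be equally valid readings of the text.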
With reference to the first aspect, in a third possible implementation manner of the first aspect, the cores of the semantic analysis model and the graphic analysis model both use a neural network model.
In a second aspect, the present application provides a system for visual semantic detection, the system comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform, according to instructions in the program code, the method of the first aspect or of any of its possible implementations.
In a third aspect, the present application provides a computer readable storage medium for storing program code for performing the method of the first aspect or of any of its possible implementations.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. Those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, so that those skilled in the art can more easily understand the advantages and features of the invention and the scope of protection is more clearly defined.
Fig. 1 is a flowchart of a method for visual semantic detection provided in the present application, including:
acquiring a video data stream, calculating a feature value of the gradient-direction histogram of each frame, determining that an image jump has occurred between two frames when the difference between their feature values is greater than a preset threshold, and inserting a label between the two frames, wherein the label marks the point of the image jump;
extracting, according to the label, the frame image at the moment immediately after the label, inputting the frame image into a semantic analysis model, analyzing the text information contained in the frame image, acquiring key text features, and judging whether the frame image includes non-compliant text content, to obtain a first judgment result;
vectorizing the frame image at the moment after the label and performing dimension conversion, in which the dimension P x Q of the received frame image is converted into the dimension M x N, wherein P x Q is the dimension of the signal transmission channel, M x N is the dimension of server load processing, and P, Q, M and N are all positive integers; obtaining a high-dimensional sample set, and calling a classifier to classify the high-dimensional sample set; wherein the classification refers to clustering according to the feature values of the frame images after vectorization and dimension conversion: feature values that lie within a preset difference range are placed in one group, the average feature value of each group is calculated, and groups whose average feature values differ by more than a threshold are assigned to different planes;
inputting the frame images of the same plane into a graphic analysis model, identifying the object information contained in the frame images, acquiring key object features, and judging whether the frame images include non-compliant graphic content, to obtain a second judgment result;
determining, according to the first judgment result and the second judgment result, whether the frame image at the moment after the label is compliant; if so, judging the segment of the video data stream between the current label and the next label to be compliant and storing it in a server; otherwise, judging the segment of the video data stream between the current label and the next label to be non-compliant and deleting it;
and moving to the next label and repeating the steps from extracting the frame image at the moment after the label, until the entire video data stream has been judged.
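The dimension conversion and the grouping into planes described in the third step can be sketched as follows. Several details are assumptions made only so the example runs: nearest-neighbor resampling for the P x Q to M x N conversion, the mean pixel value as the per-frame feature value, membership in a group measured against the smallest value of that group, and the comparison of consecutive group averages to decide plane boundaries. All function names are hypothetical.

```python
def resize_nearest(frame, m, n):
    """Convert a P x Q frame (list of rows) to M x N by nearest-neighbor sampling."""
    p, q = len(frame), len(frame[0])
    return [[frame[r * p // m][c * q // n] for c in range(n)] for r in range(m)]


def vectorize(frame):
    """Flatten a 2-D frame into a 1-D vector, row by row."""
    return [value for row in frame for value in row]


def feature_value(frame):
    """Assumed scalar feature: mean of the vectorized frame."""
    vector = vectorize(frame)
    return sum(vector) / len(vector)


def assign_planes(values, group_range, plane_threshold):
    """Return a plane index for each feature value.

    Values within `group_range` of a group's smallest value share a group;
    a new plane starts when consecutive group averages differ by more
    than `plane_threshold`.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = []
    for i in order:
        if groups and values[i] - values[groups[-1][0]] <= group_range:
            groups[-1].append(i)
        else:
            groups.append([i])
    planes = [0] * len(values)
    plane, previous_mean = 0, None
    for group in groups:
        mean = sum(values[i] for i in group) / len(group)
        if previous_mean is not None and mean - previous_mean > plane_threshold:
            plane += 1
        for i in group:
            planes[i] = plane
        previous_mean = mean
    return planes
```

In use, each labeled frame would be resized, reduced to a feature value, and passed through `assign_planes`; the frames that share a plane index would then be sent to the graphic analysis model together.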
In some preferred embodiments, calculating the feature value of the gradient-direction histogram includes detecting a change in the position of the grayscale centroid of the image.
In some preferred embodiments, calling the classifier to classify the high-dimensional sample set includes an inner product operation between an input vector and the high-dimensional sample set.
In some preferred embodiments, the cores of the semantic analysis model and the graphic analysis model both use a neural network model.
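The label-by-label loop of the method (judge the frame after each label, then keep or delete the segment up to the next label) can be sketched as below. The two model calls are stand-ins: `text_is_compliant` and `graphics_are_compliant` are hypothetical callables representing the semantic analysis model and the graphic analysis model. Requiring both results to be compliant is an assumption, since the source only says the decision is made according to both results, and the grouping of frames by plane is omitted for brevity.

```python
def review_stream(frames, labels, text_is_compliant, graphics_are_compliant):
    """Return (start, end, keep) for each segment that starts at a label.

    `labels` is a sorted list of frame indices; a label at index i sits just
    before frame i, so frame i is the frame at the moment after the label.
    """
    decisions = []
    for k, start in enumerate(labels):
        end = labels[k + 1] if k + 1 < len(labels) else len(frames)
        frame = frames[start]
        first_result = text_is_compliant(frame)        # first judgment result
        second_result = graphics_are_compliant(frame)  # second judgment result
        # Assumed combination rule: compliant only if both checks pass.
        decisions.append((start, end, first_result and second_result))
    return decisions


def keep_compliant(frames, decisions):
    """Collect the frames of compliant segments; other segments are dropped."""
    kept = []
    for start, end, keep in decisions:
        if keep:
            kept.extend(frames[start:end])
    return kept
```

Frames that precede the first label are not covered by this sketch, because the source does not state how they are handled.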
The present application provides a system for visual semantic detection, the system comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform the method according to any of the embodiments of the first aspect according to instructions in the program code.
The present application provides a computer readable storage medium for storing program code for performing the method of any of the embodiments of the first aspect.
In a specific implementation, the present invention further provides a computer storage medium that may store a program; when executed, the program may perform some or all of the steps in the embodiments of the present invention. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM) or a random access memory (RAM).
Those skilled in the art will readily appreciate that the techniques of the embodiments of the present invention may be implemented as software plus a required general purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
For parts that are the same or similar across the embodiments of this specification, the embodiments may be referred to one another. In particular, because the system and storage medium embodiments are substantially similar to the method embodiments, they are described briefly; for the relevant points, refer to the description of the method embodiments.
The above-described embodiments of the present invention should not be construed as limiting the scope of the present invention.
Claims (6)
1. A visual semantic detection method, the method comprising:
acquiring a video data stream, calculating a feature value of the gradient-direction histogram of each frame image, determining that an image jump has occurred between two frames when the difference between their feature values is greater than a preset threshold, and inserting a label between the two frames, wherein the label marks the point of the image jump;
extracting, according to the label, the frame image at the moment immediately after the label, inputting the frame image into a semantic analysis model, analyzing the text information contained in the frame image, acquiring key text features, and judging whether the frame image includes non-compliant text content, to obtain a first judgment result;
vectorizing the frame image at the moment after the label and performing dimension conversion, in which the dimension P x Q of the received frame image is converted into the dimension M x N, wherein P x Q is the dimension of the signal transmission channel, M x N is the dimension of server load processing, and P, Q, M and N are all positive integers; obtaining a high-dimensional sample set, and calling a classifier to classify the high-dimensional sample set; wherein the classification refers to clustering according to the feature values of the frame images after vectorization and dimension conversion: feature values that lie within a preset difference range are placed in one group, the average feature value of each group is calculated, and groups whose average feature values differ by more than a threshold are assigned to different planes;
inputting the frame images of the same plane into a graphic analysis model, identifying the object information contained in the frame images, acquiring key object features, and judging whether the frame images include non-compliant graphic content, to obtain a second judgment result;
determining, according to the first judgment result and the second judgment result, whether the frame image at the moment after the label is compliant; if so, judging the segment of the video data stream between the current label and the next label to be compliant and storing it in a server; otherwise, judging the segment of the video data stream between the current label and the next label to be non-compliant and deleting it;
and moving to the next label and repeating the steps from extracting the frame image at the moment after the label, until the entire video data stream has been judged.
2. The method of claim 1, wherein calculating the feature value of the gradient-direction histogram includes detecting a change in the position of the gray centroid of the image.
3. The method according to any one of claims 1-2, wherein calling the classifier to classify the high-dimensional sample set includes an inner product operation between an input vector and the high-dimensional sample set.
4. The method according to any one of claims 1-3, wherein the cores of the semantic analysis model and the graphic analysis model both use a neural network model.
5. A visual semantic detection system, the system comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform, according to instructions in the program code, the method of any one of claims 1-4.
6. A computer-readable storage medium, characterized in that the computer-readable storage medium is configured to store program code for performing the method of any one of claims 1-4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111461695.XA CN114241367B (en) | 2021-12-02 | 2021-12-02 | Visual semantic detection method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114241367A (en) | 2022-03-25 |
CN114241367B (en) | 2024-08-23 |
Family
ID=80752822
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202111461695.XA | Active | CN114241367B (en) | 2021-12-02 | 2021-12-02 | Visual semantic detection method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114241367B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120201436A1 (en) * | 2011-02-03 | 2012-08-09 | Jonathan Oakley | Method and system for image analysis and interpretation |
CN103426176A (en) * | 2013-08-27 | 2013-12-04 | 重庆邮电大学 | Video shot detection method based on histogram improvement and clustering algorithm |
US20150363648A1 (en) * | 2014-06-11 | 2015-12-17 | Arris Enterprises, Inc. | Detection of demarcating segments in video |
CN108124191A (en) * | 2017-12-22 | 2018-06-05 | 北京百度网讯科技有限公司 | A kind of video reviewing method, device and server |
CN109409294A (en) * | 2018-10-29 | 2019-03-01 | 南京邮电大学 | The classification method and system of trapping event based on object motion trajectory |
CN110019817A (en) * | 2018-12-04 | 2019-07-16 | 阿里巴巴集团控股有限公司 | A kind of detection method, device and the electronic equipment of text in video information |
CN111242019A (en) * | 2020-01-10 | 2020-06-05 | 腾讯科技(深圳)有限公司 | Video content detection method and device, electronic equipment and storage medium |
US20210200802A1 (en) * | 2019-12-30 | 2021-07-01 | Alibaba Group Holding Limited | Method and apparatus for video searches and index construction |
CN113379693A (en) * | 2021-06-01 | 2021-09-10 | 大连东软教育科技集团有限公司 | Capsule endoscopy key focus image detection method based on video abstraction technology |
Non-Patent Citations (2)
Title |
---|
Shreyansh Gandhi et al.: "Scalable detection of offensive and non-compliant content/logo in product images", Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 31 December 2020 (2020-12-31), pages 2247 - 2256 * |
Song Wei et al.: "Violent and terrorist video detection based on visual semantic concepts", 《信息网络安全》 (Netinfo Security), no. 9, 4 November 2016 (2016-11-04), pages 12 - 17 * |
Also Published As
Publication number | Publication date |
---|---|
CN114241367B (en) | 2024-08-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| CB02 | Change of applicant information | Address after: 607a, 6 / F, No. 31, Fuchengmenwai street, Xicheng District, Beijing 100037; Applicant after: Beijing Guorui Digital Intelligence Technology Co.,Ltd.; Address before: 607a, 6 / F, No. 31, Fuchengmenwai street, Xicheng District, Beijing 100037; Applicant before: Beijing Zhimei Internet Technology Co.,Ltd. |
| GR01 | Patent grant | |