CN114022923A - Intelligent collecting and editing system - Google Patents

Intelligent collecting and editing system

Info

Publication number
CN114022923A
Authority
CN
China
Prior art keywords
image
sensitive
editing system
face
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111090633.2A
Other languages
Chinese (zh)
Inventor
陆建德
许文明
王必江
谢宗霖
刘永鑫
耿允
殷福权
陈儒智
肖亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunnan Kaiping Information Technology Co ltd
Original Assignee
Yunnan Kaiping Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yunnan Kaiping Information Technology Co ltd filed Critical Yunnan Kaiping Information Technology Co ltd
Priority to CN202111090633.2A
Publication of CN114022923A
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415: Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/08: Speech classification or search
    • G10L 15/16: Speech classification or search using artificial neural networks
    • G10L 15/18: Speech classification or search using natural language modelling
    • G10L 15/183: Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L 15/26: Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an intelligent collecting and editing system, which comprises: a face detection and recognition module for detecting whether public figures appear in an image; an image yellow-identification module for detecting objectionable pornographic images; an OCR + sensitive-word recognition module for detecting sensitive content in image text; a voice recognition module for recognizing sensitive content in audio and video; a text comparison module for finding sensitive content in text; and an image comparison module for finding sensitive content in images. The intelligent collecting and editing system improves auditing efficiency and effectively curbs the large-scale pollution of the Internet by illegal images and videos, helping to keep the network clean; it reduces the workload of manual image and video review and the harm done to auditors by excessive exposure to objectionable images and video content, freeing people from tedious review work.

Description

Intelligent collecting and editing system
Technical Field
The invention relates to the technical field of artificial intelligence and images, in particular to an intelligent collecting and editing system.
Background
The development of converged media is still at an early, exploratory stage, and several problems remain: 1. many organizations cooperate with one another rather than truly integrate, and editorial staff are aging and in short supply; 2. new-media products are redundant, and technology providers lack common standards and need further integration; 3. content lacks richness, appeal and the characteristics of new media; 4. no true new-media propagation matrix has been established, and there is no influential information outlet.
To address the problems of extensive cooperation but little integration among organizations in converged-media development, an extensive search located the converged-media intelligent control system of patent application CN202110316806.1. That system comprises a converged-media intelligence library, a media talent cultivation platform, a converged-media command and management platform, a converged-media data monitoring platform, a converged-media data scheduling platform and a propagation and pushing platform. The intelligence library stores converged-media resource information; the talent cultivation platform trains media-convergence personnel; the command and management platform issues and manages converged-media resource information; the data monitoring platform monitors that information; and the propagation and pushing platform distributes it. Through the fusion of technology, data and services, that invention achieves interconnection of regional media platforms, unified office work, unified command and scheduling, and unified publicity; efficiently manages party and government resources and publicity content; constructs a genuinely converged media transmission system; builds an effective propagation matrix for the news information of the converged media center; and solves the content-production problem of the media center.
However, the technical solution of that patent still has the following problems: 1. manuscripts cannot be classified, corrected, screened for sensitive words, or checked for duplication across the whole network; 2. images, video and audio cannot be classified or checked for duplication across the whole network, sensitive figures cannot be identified, and image pornography screening, text extraction and sensitive-word detection are unavailable, which lowers the efficiency of information collection, structured editing, review and distribution.
Disclosure of Invention
The invention aims to solve the technical problems described in the background art and provides an intelligent collecting and editing system.
To achieve this aim, the invention provides the following technical solution. The intelligent collecting and editing system comprises:
the face detection and recognition module is used for detecting whether public figures appear in an image;
the image yellow-identification module is used for detecting objectionable pornographic images;
the OCR + sensitive-word recognition module is used for detecting sensitive content in image text;
the voice recognition module is used for recognizing sensitive content in audio and video;
the text comparison module is used for finding sensitive content in text;
and the image comparison module is used for finding sensitive content in images.
Further preferred embodiments: the face detection and recognition module comprises the following steps:
S1: detecting faces in the image and extracting face features with Dlib, with GPU acceleration;
S2: comparing against a pre-recorded face feature library by computing the distance between the current face feature vector and each stored face feature;
S3: judging whether the face belongs to a known person by applying a threshold.
Further preferred embodiments: the image yellow-identification module comprises the following steps:
S1: augmenting the ImageNet1000 data set using data-augmentation techniques;
S2: training the model on the augmented data to obtain a pre-trained model;
S3: fine-tuning the pre-trained model, continually adjusting the hyper-parameters, and training and testing on a dedicated data set;
S4: obtaining a usable image classification model and predicting the probabilities that the target image is pornographic or conventional;
S5: judging whether the target image is pornographic by applying a threshold.
Further preferred embodiments: the ImageNet1000 data set comprises more than one million images of 1000 kinds of animals and objects common in daily life; the data augmentation applies random rotation, translation, color transformation, noise addition and the like to the original images, combined with a GAN; and the dedicated data set is the data set used to train the image yellow-identification model.
Further preferred embodiments: the OCR + sensitive-word recognition module comprises the following steps:
S1: locating and recognizing the characters in the image using a CNN + RNN scheme;
S2: segmenting the recognized text into words with the Jieba tool;
S3: combining adjacent segmentation results and judging whether sensitive words are present in the text.
Further preferred embodiments: the voice recognition module comprises the following steps:
S1: adopting a convolutional neural network (CNN) combined with connectionist temporal classification (CTC);
S2: training on large Chinese speech data sets and transcribing the audio into Chinese pinyin;
S3: converting the pinyin sequence into Chinese text through a language model.
Further preferred embodiments: the text comparison module comprises the following steps:
S1: adopting a locality-sensitive hashing (LSH) algorithm;
S2: reducing each document to a hash number and comparing the numbers pairwise.
Further preferred embodiments: the image comparison module comprises the following steps:
S1: shrinking the picture and converting the scaled picture into a 256-level grayscale image;
S2: computing the DCT, reducing the DCT, and computing the mean of the retained coefficients;
S3: further reducing the DCT to obtain an information fingerprint;
S4: comparing the fingerprints of the two pictures to obtain the Hamming distance.
Further preferred embodiments: the image comparison module uses the DCT-based frequency-reduction method of the perceptual hash algorithm (pHash).
Further preferred embodiments: the image comparison module is further configured to trigger an image-masking operation when the current image is detected to match a sensitive image sample in the database.
Advantageous effects:
1. The intelligent collecting and editing system improves auditing efficiency and effectively curbs the large-scale pollution of the Internet by illegal images and videos, helping to keep the network clean. It reduces the workload of manual image and video review and the harm done to reviewers by excessive exposure to objectionable images and video content. For individuals, the system frees people from tedious review work; for enterprises, it greatly improves working efficiency and reduces the cost and related expenditure of manual review.
2. The intelligent collecting and editing system combines advanced AI technology with cloud computing to assist the whole news-production workflow of planning, collecting, editing, reviewing and publishing. It covers all collecting, editing, reviewing and distribution applications, from information reporting, selection and classification through to reception, use and text composition, in an intelligent and networked manner. Building on massive whole-network text, video, picture and voice data, it applies the AI fields of natural language processing, machine learning and computer vision to the converged-media industry. Manuscripts can be classified, corrected, screened for sensitive words and checked for duplication across the whole network; images, video and audio can be classified, checked for duplication across the whole network, and screened for sensitive figures, pornographic content, extracted text and sensitive words. This fundamentally improves the efficiency of information collection, structured editing, review and distribution, and prevents the adverse effects on individuals or enterprises that staff carelessness could cause. On social, forum and interactive platforms, the system automatically filters sensitive and illegal images, helping to build a harmonious and clean Internet platform; it provides powerful support for information collection and processing, greatly improves editorial efficiency, effectively reduces new-media operating costs, and markedly improves company performance.
Drawings
FIG. 1 is a functional block diagram of the system of the present invention;
FIG. 2 is a schematic flow chart of a face detection and recognition module according to the present invention;
FIG. 3 is a schematic flow chart of the image yellow-identifying module according to the present invention;
FIG. 4 is a schematic flow diagram of the OCR + sensitive word recognition module of the present invention;
FIG. 5 is a flow chart of a speech recognition module according to the present invention;
FIG. 6 is a schematic flow chart of a text comparison module according to the present invention;
FIG. 7 is a schematic flow chart of an image matching module according to the present invention;
FIG. 8 is a schematic flow chart of text monitoring according to the present invention;
FIG. 9 is a schematic view of a process of image monitoring according to the present invention;
FIG. 10 is a flow chart of audio monitoring according to the present invention;
FIG. 11 is a schematic flow chart of video monitoring according to the present invention;
FIG. 12 is a flow chart of live monitoring of the present invention;
FIG. 13 is an exemplary diagram of a text, picture, audio monitoring system according to the present invention;
FIG. 14 is a diagram of an exemplary video monitor of the present invention;
FIG. 15 is a system architecture diagram of a data application and data service architecture of the present invention;
FIG. 16 is a system architecture diagram of data refinement and data processing analysis of the present invention;
FIG. 17 is a system architecture diagram of a data collection and data source of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to fig. 1 to 17.
Example 1
Referring to fig. 1 and 15-17, in an embodiment of the present invention, the intelligent editing system includes:
the face detection and recognition module is used for detecting whether public figures appear in an image;
the image yellow-identification module is used for detecting objectionable pornographic images;
the OCR + sensitive-word recognition module is used for detecting sensitive content in image text;
the voice recognition module is used for recognizing sensitive content in audio and video;
the text comparison module is used for finding sensitive content in text;
the image comparison module is used for finding sensitive content in images;
In news collection, production and distribution, an intelligent production and distribution process is constructed on AI technologies such as machine learning and deep learning; accurate speech recognition, image recognition and semantic recognition modernize how media content is produced, published and spread, greatly improving media productivity and reach. The system mainly monitors edited and submitted manuscripts, pictures, videos and audio.
Example 2
Referring to fig. 2 and fig. 13, the embodiment of the present invention is different from embodiment 1 in that: the face detection and recognition module comprises the following steps:
s1: detecting a face in the image, extracting face features through Dlib, and accelerating the face features by using a GPU;
s2: comparing the face feature library recorded in advance, and calculating the distance between the current face feature vector and the face features in the face feature library;
s3: judging whether the face belongs to a known face or not by setting a threshold value;
the face recognition algorithm can support face recognition on the front side and face recognition on the slightly non-front side, and can automatically mark the face, so that auditors can conveniently and quickly check the face; the face recognition technology is used for detecting whether public figures exist in the image, so that adverse effects caused by negligence of auditors are reduced to a great extent;
the face recognition algorithm is realized as follows:
The picture is first encoded with the HOG algorithm to create a simplified version of the image, and this simplified image is searched for the region whose HOG encoding most resembles a generic face. To find a face, the image is converted to black and white, since color data is not needed. Each pixel is then examined in turn, together with the pixels immediately surrounding it, to determine how dark the current pixel is compared with its neighbours; an arrow is drawn showing the direction in which the image gets darker (looking at just one pixel and the pixels touching it, the image may, for example, darken towards the upper right). Repeating this process for every pixel eventually replaces each pixel with an arrow. These arrows, called gradients, show the flow from light to dark across the whole image. If the pixels were analysed directly, a very dark image and a very bright image of the same person would have completely different pixel values; by considering only the direction of brightness change, both end up with exactly the same representation. Seen at this higher level, the basic flow of light and dark reveals the basic pattern of the image.
The image is then divided into small squares of 16x16 pixels each. In each square, the number of gradient arrows pointing in each major direction is counted (how many point up, up-right, right, and so on), and the square is replaced by the strongest arrow direction.
The end result converts the original image into a very simple representation that captures the basic structure of a face: the original image becomes a HOG representation that captures the main features of the image regardless of brightness. To find a face in this HOG image, all that is needed is to find the region most similar to a known HOG pattern extracted from many training faces; with this technique, a face can easily be found in any image.
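For illustration, a minimal sketch of the HOG-encoding step described above, assuming the open-source scikit-image library (not named in this document); the 16x16-pixel cell size follows the description, while the remaining parameters are illustrative assumptions:

```python
# A hedged sketch of HOG encoding: greyscale conversion, then gradient
# histograms over 16x16-pixel cells, as described above.
import numpy as np
from skimage import color, feature, io

image = color.rgb2gray(io.imread("photo.jpg"))  # drop colour first, as described

# 9 gradient-direction bins per 16x16 cell: each cell is summarised by the
# counts of its darkening directions (the "arrows" in the text).
hog_vector = feature.hog(
    image,
    orientations=9,
    pixels_per_cell=(16, 16),
    cells_per_block=(1, 1),
)
print(hog_vector.shape)  # one 9-bin histogram per 16x16 cell, flattened
```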
The pose of the face is then determined by locating its principal landmarks. Once these landmarks are found, they are used to warp the image so that the eyes and mouth are centered. The centered face image is passed through a neural network that knows how to measure facial features, and the 128 resulting feature values are stored. Finally, all previously saved face measurements are searched to find the person whose feature values are closest to those of the measured face.
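For illustration, a minimal sketch of this pipeline, assuming the open-source face_recognition package built on Dlib; the 0.6 distance threshold is a common default for that package, not a value specified in this document:

```python
# A hedged sketch of steps S1-S3: 128-d face embeddings plus a distance
# threshold against a library of known (public-figure) faces.
import face_recognition

known_image = face_recognition.load_image_file("public_figure.jpg")
query_image = face_recognition.load_image_file("uploaded.jpg")

known_vec = face_recognition.face_encodings(known_image)[0]  # 128 feature values
query_vecs = face_recognition.face_encodings(query_image)    # every face found

for vec in query_vecs:
    # S2: Euclidean distance between the current face and the library entry
    dist = face_recognition.face_distance([known_vec], vec)[0]
    if dist < 0.6:  # S3: threshold decides whether this is a known face
        print(f"known face detected, distance={dist:.3f}")
```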
Example 3
Referring to fig. 3 and 9, the embodiment of the present invention is different from embodiment 1 in that: the image yellow identification module comprises the following steps:
s1: enhancing the ImageNet1000 data set by using the ImageNet1000 data set in a data enhancement mode;
s2: training the model by using the enhanced data to obtain a pre-training model;
s3: fine-tuning the pre-training model, continuously adjusting the hyper-parameters, and training and testing on a special data set;
s4: obtaining an available image classification model, and predicting the probability that the target image belongs to the pornographic image and the conventional image;
s5: whether the target image is the pornographic image or not is judged by setting a threshold value.
In the embodiment of the invention, an ImageNet1000 data set comprises more than one million images of 1000 animals and objects which are common in life, the data enhancement mode is that the original image is subjected to random rotation, translation, color transformation, noise addition and the like and is combined with GAN, and a special data set is used for training a data set of an image yellow identification model;
by using a Resnet50 model structure, the network does not have the gradient disappearance phenomenon, so that the real data distribution situation can be infinitely approximated theoretically; the images can be automatically classified, and the accuracy rate of the images exceeds 95%; because defining image categories is highly subjective, some images may be objectionable in certain scenes and then sometimes appropriate, different thresholds are used in different scenes by predicting the probability that the target image belongs to each category, thereby achieving the effect of adjusting to local conditions; training the model by using the enhanced data can effectively enhance the generalization capability of the model, so that the model can effectively learn the characteristics of the shape, texture and the like of the image;
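For illustration, a minimal transfer-learning sketch of steps S2 to S5, assuming TensorFlow/Keras with an ImageNet-pretrained Resnet50 standing in for the pre-training step; the input size, optimizer and example threshold are illustrative assumptions:

```python
# A hedged sketch: freeze a pretrained Resnet50 backbone, add a binary
# head predicting P(pornographic), fine-tune on the dedicated data set,
# then threshold the probability per deployment scene (S5).
import tensorflow as tf

base = tf.keras.applications.ResNet50(
    weights="imagenet",            # stands in for the ImageNet pre-training
    include_top=False,
    pooling="avg",
    input_shape=(224, 224, 3),
)
base.trainable = False             # freeze first; unfreeze later to fine-tune (S3)

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(1, activation="sigmoid"),  # P(pornographic) (S4)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# S5 (illustrative): probs = model.predict(batch); flag = probs > threshold,
# with the threshold tuned separately for each scene.
```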
The image yellow-identification module further comprises a picture monitoring process, as follows:
1. A picture file is uploaded;
2. An image content monitoring algorithm is called for detection, which includes: extracting text and subtitles appearing in the image for sensitive-word detection; detecting and flagging portraits of leaders appearing in the image (face recognition algorithm); and warning whether pornographic content exists in the image (an adult/obscene picture classification model trained with the Caffe deep learning framework).
In the embodiment of the invention, the image comparison module uses the DCT-based frequency-reduction method of the perceptual hash algorithm (pHash).
In the embodiment of the invention, the image comparison module is further configured to trigger an image-masking operation when the current image is detected to match a sensitive image sample in the database.
Example 4
Referring to fig. 4, the embodiment of the present invention is different from embodiment 1 in that: the OCR + sensitive word recognition module comprises the following steps:
s1: the method comprises the steps of realizing positioning and identification of characters in an image by referring to a CNN + RNN scheme;
s2: performing word segmentation on the recognized text through a Jieba tool;
s3: carrying out adjacent combination on the word segmentation results and judging whether sensitive words exist in the text or not;
compared with the traditional scheme of directly matching sensitive words, the technical scheme of firstly segmenting words and then identifying words can effectively improve the identification accuracy.
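For illustration, a minimal sketch of steps S2 and S3, assuming the open-source jieba segmenter; the two-entry lexicon is a placeholder for a real sensitive-word list:

```python
# A hedged sketch: segment the OCR output with Jieba, then check single
# tokens and adjacent token pairs against a sensitive-word lexicon.
import jieba

SENSITIVE = {"敏感词A", "敏感词B"}  # placeholder lexicon, not a real word list

def find_sensitive(text: str) -> set[str]:
    tokens = jieba.lcut(text)  # S2: word segmentation
    # S3: adjacent combination - single tokens plus neighbouring pairs
    candidates = set(tokens) | {a + b for a, b in zip(tokens, tokens[1:])}
    return candidates & SENSITIVE

print(find_sensitive("这里是OCR识别出的文本"))
```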
Example 5
Referring to fig. 5 and 10, the embodiment of the present invention is different from embodiment 1 in that: the voice recognition module comprises the following steps:
s1: a Convolutional Neural Network (CNN) and a connectivity time sequence classification (CTC) method are adopted;
s2: training by using a large amount of Chinese voice data sets, and transcribing the voice into Chinese pinyin;
s3: converting the pinyin sequence into a Chinese text through a language model;
The voice recognition module further comprises an audio monitoring process, as follows:
1. An audio file is uploaded;
2. The ASRT artificial-intelligence algorithm is used for audio recognition. Specifically:
ASRT is a deep-learning-based speech recognition system that combines a convolutional neural network (CNN) with connectionist temporal classification (CTC). It is trained on large Chinese speech data sets, transcribes sound into Chinese pinyin, and converts the pinyin sequence into Chinese text through a language model. Feature extraction: an ordinary wav speech signal is converted, through framing, windowing and similar operations, into the two-dimensional spectrum image signal required by the neural network, i.e. a spectrogram;
Acoustic model: a deep convolutional neural network modelled on VGG, built on the Keras and TensorFlow frameworks, is trained as the network model;
CTC decoding: the acoustic-model output of a speech recognition system usually contains long runs of repeated symbols, so consecutive identical symbols are merged into one and the silence separator is removed, yielding the final sequence of pinyin symbols actually spoken;
Language model: a statistical language model converts the pinyin into the final recognized text for output; pinyin-to-text conversion is modelled as a hidden Markov chain, which gives high accuracy;
3. A sensitive-word detection algorithm is called on the recognized text to find sensitive content (such as reactionary content, advertisements, political content, gun- and explosive-related violations, pornography, corruption and the like);
4. Duplicate checking is performed against a resource library. Audio comparison uses the following algorithm:
Music is treated as a signal-processing problem using the fast Fourier transform (FFT), which is well suited to this task. Music is in effect digital code, like a very long string of numbers: an uncompressed wav file carries 44100 numbers per second per channel, so a three-minute song holds nearly 16 million samples: 3 minutes x 60 seconds x 44100 samples per second x 2 channels = 15,876,000 signal samples. A channel is an independent sequence of signal samples that can be played through a speaker.
For audio recording, it is a widely accepted rule that signals above 22050 Hz can be ignored, because the human ear cannot hear frequencies above about 20000 Hz. By the Nyquist theorem, the signal must be sampled at twice the highest frequency: samples per second = highest frequency x 2 = 22050 x 2 = 44100.
MP3 files compress this sample stream; a pure wav file is simply a sequence of 16-bit numbers (plus a small header).
Because these audio samples are signals, a spectrogram of a song can be generated by repeatedly applying the fast Fourier transform to short time windows of the song's samples. The spectrogram is a matrix with time on the horizontal axis, frequency on the vertical axis and amplitude encoded as color; the fast Fourier transform gives the strength (amplitude) of the signal at each particular frequency, and computing it over enough sliding windows and splicing the results together yields the matrix spectrum.
Note that the frequency and time values are discretized, each pair representing a "bin", while the amplitude is real-valued; color represents the real amplitude value (red for higher, green for lower) in the discretized (time, frequency) coordinate system.
The spectrogram tags the song uniquely. In practice there will be noise, for example background speech while the song is being recognized, so a robust method is needed to obtain a "digital fingerprint" of the audio signal.
Given a spectrogram generated from an audio signal, one starts by finding the amplitude "peaks": a peak is a time-frequency pair whose amplitude is a local maximum, i.e. larger than the amplitudes of the surrounding time-frequency pairs, which are more likely to be background noise.
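For illustration, a minimal sketch of the spectrogram-peak idea just described, assuming SciPy; the FFT window and neighbourhood sizes are illustrative assumptions:

```python
# A hedged sketch: short-time FFT spectrogram, then local amplitude maxima
# as the robust (time, frequency) "peaks" used for fingerprinting.
import numpy as np
from scipy import signal
from scipy.ndimage import maximum_filter

def spectrogram_peaks(samples: np.ndarray, rate: int = 44100):
    freqs, times, spec = signal.spectrogram(samples, fs=rate, nperseg=4096)
    spec = np.log(spec + 1e-10)  # work with log-amplitudes

    # a point is a peak if it equals the maximum over its neighbourhood
    # and rises above the mean level (otherwise it is likely background noise)
    neighbourhood = maximum_filter(spec, size=20)
    is_peak = (spec == neighbourhood) & (spec > spec.mean())

    f_idx, t_idx = np.nonzero(is_peak)
    return list(zip(times[t_idx], freqs[f_idx]))  # (time, frequency) pairs
```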
Example 6
Referring to fig. 6 and 8, the embodiment of the present invention is different from embodiment 1 in that: the text comparison module comprises the following steps:
s1: adopting a Locality-Sensitive Hashing (LSH) algorithm;
s2: reducing the dimension of the document to a hash number, and calculating the number pairwise;
The text comparison module further comprises a text monitoring process, as follows:
1. Text content is input;
2. Error-correction recognition is performed on the text content, and a prompt is given when erroneous content is found;
3. A sensitive-word detection algorithm is called to find sensitive content (reactionary content, advertisements, political content, gun- and explosive-related violations, pornography, corruption and the like);
4. Duplicate checking is performed against the clue library using the simhash algorithm, described as follows:
The algorithm comprises the following steps:
1) Word segmentation: the text to be judged is segmented into the characteristic words of the article; noise words are removed to form a word sequence, and each word is given a weight, here assumed to have 5 levels (1-5). For example, the sentence "Employees of the U.S. 'Area 51' say there are 9 flying saucers inside and that they have seen grey aliens" becomes "U.S.(4) Area 51(5) employees(3) say(1) inside(2) have(1) 9(3) flying saucers(5) once(1) seen(3) grey aliens(4)", where the number in parentheses indicates how important the word is in the sentence: the larger the number, the more important the word;
2) Hashing: each word is turned into a hash value by a hash function; for example, "U.S." hashes to 100101 and "Area 51" hashes to 101011;
3) Weighting: the hash from step 2) is expanded into a weighted number string according to the word's weight: the hash of "U.S.", 100101, weighted by 4, becomes "4 -4 -4 4 -4 4"; the hash of "Area 51", 101011, weighted by 5, becomes "5 -5 5 -5 5 5";
4) Merging: the number strings of all the words are accumulated position by position into a single sequence. For "4 -4 -4 4 -4 4" of "U.S." and "5 -5 5 -5 5 5" of "Area 51", accumulating each position gives "4+5 -4-5 -4+5 4-5 -4+5 4+5" = "9 -9 1 -1 1 9". Only two words are accumulated here as an example; a real computation accumulates the number strings of all words;
5) Dimension reduction: the "9 -9 1 -1 1 9" computed in step 4) is turned into a 0/1 string, the final simhash signature, by writing 1 for each position greater than 0 and 0 for each position less than 0. The final result is "101011";
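For illustration, a minimal simhash sketch following steps 1) to 5), assuming 64-bit word hashes derived from MD5 and a toy weight table in place of a real segmenter and weighting model:

```python
# A hedged sketch of simhash: hash each weighted word, accumulate signed
# weights per bit, then keep the sign pattern as the signature.
import hashlib

def hash64(word: str) -> int:
    return int.from_bytes(hashlib.md5(word.encode()).digest()[:8], "big")

def simhash(weighted_words: dict[str, int], bits: int = 64) -> int:
    acc = [0] * bits                        # step 4): per-bit accumulator
    for word, weight in weighted_words.items():
        h = hash64(word)                    # step 2): hash the word
        for i in range(bits):               # step 3): weight each bit (+w / -w)
            acc[i] += weight if (h >> i) & 1 else -weight
    sig = 0                                 # step 5): reduce to a 0/1 string
    for i in range(bits):
        if acc[i] > 0:
            sig |= 1 << i
    return sig

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")            # pairwise signature comparison

print(hamming(simhash({"美国": 4, "51区": 5}), simhash({"美国": 4, "外星人": 4})))
```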
example 7
Referring to fig. 7, fig. 11 and fig. 12, the embodiment of the present invention is different from embodiment 1 in that: the image comparison module comprises the following steps:
s1: reducing the picture, and converting the zoomed picture into a 256-level gray scale image;
among them, 32 x 32 is a better size, which facilitates DCT calculation;
s2: calculating DCT, reducing DCT, and calculating the average value of all pixel points after reducing DCT;
wherein, the matrix after DCT calculation is 32 × 32, and the upper left corner 8 × 8 is reserved, which represents the lowest frequency of the picture;
s3: further reducing DCT to obtain information fingerprint;
wherein values greater than the average are recorded as 1, otherwise they are recorded as 0; combining 64 information bits, and keeping consistency in sequence at will;
s4: comparing the fingerprints of the two pictures to obtain a Hamming distance;
wherein, this is equivalent to a Hamming Distance (in the information theory, the Hamming Distance between two equal-length character strings is the number of different characters at the corresponding positions of the two character strings); if the number of the different data bits does not exceed 5, the two images are very similar; if the number is more than 10, the two different images are indicated;
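For illustration, a minimal pHash sketch of steps S1 to S4, assuming Pillow, NumPy and SciPy; the 5/10 Hamming thresholds follow the description above:

```python
# A hedged sketch of pHash: shrink to 32x32 greyscale, 2-D DCT, keep the
# top-left 8x8 low-frequency block, threshold against its mean, compare.
import numpy as np
from PIL import Image
from scipy.fftpack import dct

def phash(path: str) -> int:
    img = Image.open(path).convert("L").resize((32, 32))  # S1: shrink + grey
    coeffs = dct(dct(np.asarray(img, dtype=float), axis=0), axis=1)
    block = coeffs[:8, :8]                                # S2: reduced DCT
    bits = (block > block.mean()).flatten()               # S3: 64-bit fingerprint
    return int("".join("1" if b else "0" for b in bits), 2)

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")                          # S4: compare fingerprints

d = hamming(phash("a.jpg"), phash("b.jpg"))
print("very similar" if d <= 5 else "different" if d > 10 else "borderline")
```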
The image comparison module further comprises a video monitoring process, as follows (a frame-extraction sketch follows this list):
1. A video file is uploaded;
2. The audio content is extracted and checked for sensitive words;
3. Frames are extracted from the video at 1 frame per second, and the image content monitoring algorithm is called on each frame, which includes: extracting text and subtitles appearing in the image for sensitive-word detection; detecting and flagging portraits of leaders appearing in the image (face recognition algorithm); and warning whether pornographic content exists in the image (an adult/obscene picture classification model trained with the Caffe deep learning framework takes the picture under test and returns a rating from 0 to 1; the higher the rating, the more pornographic or violent the picture, and pictures rated above 0.8 are filtered out);
4. Duplicate checking is performed against a resource library;
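For illustration, a minimal sketch of the one-frame-per-second extraction in step 3, assuming OpenCV; in a real pipeline each kept frame would then pass through the image-content monitoring algorithms described above:

```python
# A hedged sketch: decode the uploaded video and keep one frame per second.
import cv2

cap = cv2.VideoCapture("upload.mp4")
fps = cap.get(cv2.CAP_PROP_FPS) or 25      # fall back if the FPS is missing
step = int(round(fps))

frame_idx, kept = 0, []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % step == 0:              # 1 frame per second
        kept.append(frame)                 # hand off to OCR / face / porn checks
    frame_idx += 1
cap.release()
print(f"extracted {len(kept)} frames")
```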
The image comparison module further comprises a live-broadcast monitoring process, as follows:
1. A live stream is detected to have started;
2. Frames are extracted from the live stream at 1 frame per second, and the image content monitoring algorithm is called, which includes: extracting text and subtitles appearing in the image for sensitive-word detection; detecting and flagging portraits of leaders appearing in the image (face recognition algorithm); and warning whether pornographic content exists in the image (an adult/obscene picture classification model trained with the Caffe deep learning framework);
3. The live-broadcast early-warning content is displayed on the monitoring station.

Claims (10)

1. An intelligent collecting and editing system, characterized in that it comprises:
the face detection and recognition module is used for detecting whether public figures appear in an image;
the image yellow-identification module is used for detecting objectionable pornographic images;
the OCR + sensitive-word recognition module is used for detecting sensitive content in image text;
the voice recognition module is used for recognizing sensitive content in audio and video;
the text comparison module is used for finding sensitive content in text;
and the image comparison module is used for finding sensitive content in images.
2. The intelligent editing system of claim 1, wherein: the face detection and recognition module comprises the following steps:
S1: detecting faces in the image and extracting face features with Dlib, with GPU acceleration;
S2: comparing against a pre-recorded face feature library by computing the distance between the current face feature vector and each stored face feature;
S3: judging whether the face belongs to a known person by applying a threshold.
3. The intelligent editing system of claim 1, wherein: the image yellow-identification module comprises the following steps:
S1: augmenting the ImageNet1000 data set using data-augmentation techniques;
S2: training the model on the augmented data to obtain a pre-trained model;
S3: fine-tuning the pre-trained model, continually adjusting the hyper-parameters, and training and testing on a dedicated data set;
S4: obtaining a usable image classification model and predicting the probabilities that the target image is pornographic or conventional;
S5: judging whether the target image is pornographic by applying a threshold.
4. The intelligent editing system of claim 2, wherein: the ImageNet1000 data set comprises more than one million images of 1000 kinds of animals and objects common in daily life; the data augmentation applies random rotation, translation, color transformation, noise addition and the like to the original images, combined with a GAN; and the dedicated data set is the data set used to train the image yellow-identification model.
5. The intelligent editing system of claim 1, wherein: the OCR + sensitive-word recognition module comprises the following steps:
S1: locating and recognizing the characters in the image using a CNN + RNN scheme;
S2: segmenting the recognized text into words with the Jieba tool;
S3: combining adjacent segmentation results and judging whether sensitive words are present in the text.
6. The intelligent editing system of claim 1, wherein: the voice recognition module comprises the following steps:
S1: adopting a convolutional neural network (CNN) combined with connectionist temporal classification (CTC);
S2: training on large Chinese speech data sets and transcribing the audio into Chinese pinyin;
S3: converting the pinyin sequence into Chinese text through a language model.
7. The intelligent editing system of claim 1, wherein: the text comparison module comprises the following steps:
S1: adopting a locality-sensitive hashing (LSH) algorithm;
S2: reducing each document to a hash number and comparing the numbers pairwise.
8. The intelligent editing system of claim 1, wherein: the image comparison module comprises the following steps:
S1: shrinking the picture and converting the scaled picture into a 256-level grayscale image;
S2: computing the DCT, reducing the DCT, and computing the mean of the retained coefficients;
S3: further reducing the DCT to obtain an information fingerprint;
S4: comparing the fingerprints of the two pictures to obtain the Hamming distance.
9. The intelligent editing system of claim 8, wherein: the image comparison module uses the DCT-based frequency-reduction method of the perceptual hash algorithm (pHash).
10. The intelligent editing system of claim 1, wherein: the image comparison module is further configured to trigger an image-masking operation when the current image is detected to match a sensitive image sample in the database.
CN202111090633.2A 2021-09-17 2021-09-17 Intelligent collecting and editing system Pending CN114022923A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111090633.2A CN114022923A (en) 2021-09-17 2021-09-17 Intelligent collecting and editing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111090633.2A CN114022923A (en) 2021-09-17 2021-09-17 Intelligent collecting and editing system

Publications (1)

Publication Number Publication Date
CN114022923A 2022-02-08

Family

ID=80054726

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111090633.2A Pending CN114022923A (en) 2021-09-17 2021-09-17 Intelligent collecting and editing system

Country Status (1)

Country Link
CN (1) CN114022923A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114786035A (en) * 2022-05-25 2022-07-22 上海氪信信息技术有限公司 Compliance quality inspection and interactive question-answering system and method for live scene
CN116208802A (en) * 2023-05-05 2023-06-02 广州信安数据有限公司 Video data multi-mode compliance detection method, storage medium and compliance detection device

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102833487A (en) * 2012-08-08 2012-12-19 中国科学院自动化研究所 Visual computing-based optical field imaging device and method
US20130182182A1 (en) * 2012-01-18 2013-07-18 Eldon Technology Limited Apparatus, systems and methods for presenting text identified in a video image
US20150310107A1 (en) * 2014-04-24 2015-10-29 Shadi A. Alhakimi Video and audio content search engine
US20170289624A1 (en) * 2016-04-01 2017-10-05 Samsung Electrônica da Amazônia Ltda. Multimodal and real-time method for filtering sensitive media
CN108918532A (en) * 2018-06-15 2018-11-30 长安大学 A kind of through street traffic sign breakage detection system and its detection method
CN109271965A (en) * 2018-10-11 2019-01-25 百度在线网络技术(北京)有限公司 Video reviewing method, device and storage medium
CN110837615A (en) * 2019-11-05 2020-02-25 福建省趋普物联科技有限公司 Artificial intelligent checking system for advertisement content information filtering
CN111814860A (en) * 2020-07-01 2020-10-23 浙江工业大学 Multi-target detection method for garbage classification
CN111860434A (en) * 2020-07-31 2020-10-30 贵州大学 Robot vision privacy behavior identification and protection method
CN113139782A (en) * 2021-03-24 2021-07-20 湖南新浪信息服务有限公司 Intelligent control system for converged media

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130182182A1 (en) * 2012-01-18 2013-07-18 Eldon Technology Limited Apparatus, systems and methods for presenting text identified in a video image
CN102833487A (en) * 2012-08-08 2012-12-19 中国科学院自动化研究所 Visual computing-based optical field imaging device and method
US20150310107A1 (en) * 2014-04-24 2015-10-29 Shadi A. Alhakimi Video and audio content search engine
US20170289624A1 (en) * 2016-04-01 2017-10-05 Samsung Electrônica da Amazônia Ltda. Multimodal and real-time method for filtering sensitive media
CN108918532A (en) * 2018-06-15 2018-11-30 长安大学 A kind of through street traffic sign breakage detection system and its detection method
CN109271965A (en) * 2018-10-11 2019-01-25 百度在线网络技术(北京)有限公司 Video reviewing method, device and storage medium
CN110837615A (en) * 2019-11-05 2020-02-25 福建省趋普物联科技有限公司 Artificial intelligent checking system for advertisement content information filtering
CN111814860A (en) * 2020-07-01 2020-10-23 浙江工业大学 Multi-target detection method for garbage classification
CN111860434A (en) * 2020-07-31 2020-10-30 贵州大学 Robot vision privacy behavior identification and protection method
CN113139782A (en) * 2021-03-24 2021-07-20 湖南新浪信息服务有限公司 Intelligent control system for converged media

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
杨焕峥: "Design and Implementation of a Chinese Speech Recognition Model Based on Deep Learning" (基于深度学习的中文语音识别模型设计与实现), 湖南邮电职业技术学院学报, no. 03, 15 September 2020 (2020-09-15)
潘粤成; 刘卓; 潘文豪; 蔡典仑; 韦政松: "An End-to-End Mandarin Speech Recognition Method Based on CNN/CTC" (一种基于CNN/CTC的端到端普通话语音识别方法), 现代信息科技, no. 05, 10 March 2020 (2020-03-10)
王赛赛; 张磊; 李健: "A Security Analysis System for Website Image Data" (面向网站图像数据的安全分析系统), 计算机系统应用, vol. 27, no. 10, 15 October 2018 (2018-10-15), pages 1-6
罗时俊: "Tianxing Cloud: Artificial Intelligence Powers the Upgrade of Broadcast Video Production and Operation" (天行云：人工智能助力广电视频生产与运营升级), 视听界(广播电视技术), no. 01, 10 February 2018 (2018-02-10), pages 1-6
聂长生; 王向前: "AI Applications in Unified Content Management for Media Convergence" (媒体融合统一内容管理中的AI应用), 广播电视信息, no. 10, 15 October 2018 (2018-10-15)
韩利明: "On the Eve of the AI Explosion: Dayang LeoAI's Deep Practice in Converged Media" (AI爆发前夜，浅谈大洋LeoAI在融媒体中的深度实践), 电视工程, no. 03, 30 September 2018 (2018-09-30), pages 1-6

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114786035A (en) * 2022-05-25 2022-07-22 上海氪信信息技术有限公司 Compliance quality inspection and interactive question-answering system and method for live scene
CN116208802A (en) * 2023-05-05 2023-06-02 广州信安数据有限公司 Video data multi-mode compliance detection method, storage medium and compliance detection device

Similar Documents

Publication Publication Date Title
Huang et al. A visual–textual fused approach to automated tagging of flood-related tweets during a flood event
CN115994230A (en) Intelligent archive construction method integrating artificial intelligence and knowledge graph technology
CN114022923A (en) Intelligent collecting and editing system
CN114694220B (en) Double-flow face counterfeiting detection method based on Swin Transformer
CN112633241B (en) News story segmentation method based on multi-feature fusion and random forest model
CN111488487B (en) Advertisement detection method and detection system for all-media data
Singh et al. Systematic Linear Word String Recognition and Evaluation Technique
CN112183334A (en) Video depth relation analysis method based on multi-modal feature fusion
CN114896305A (en) Smart internet security platform based on big data technology
CN109829499A (en) Image, text and data fusion sensibility classification method and device based on same feature space
CN110781333A (en) Method for processing unstructured monitoring data of cable-stayed bridge based on machine learning
CN113936236A (en) Video entity relationship and interaction identification method based on multi-modal characteristics
Kesiman et al. ICFHR 2018 competition on document image analysis tasks for southeast asian palm leaf manuscripts
CN112925905A (en) Method, apparatus, electronic device and storage medium for extracting video subtitles
CN115512259A (en) Multimode-based short video auditing method
CN113052243A (en) Target detection method based on CycleGAN and condition distribution self-adaption
Rigaud et al. What do we expect from comic panel extraction?
CN115272533A (en) Intelligent image-text video conversion method and system based on video structured data
CN116562270A (en) Natural language processing system supporting multi-mode input and method thereof
CN111914649A (en) Face recognition method and device, electronic equipment and storage medium
CN111488813A (en) Video emotion marking method and device, electronic equipment and storage medium
CN108520740B (en) Audio content consistency analysis method and analysis system based on multiple characteristics
CN107291952B (en) Method and device for extracting meaningful strings
CN114998785B (en) Intelligent Mongolian video analysis method
CN116434759A (en) Speaker identification method based on SRS-CL network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination