CN117115155A - Image analysis method and system based on AI live broadcast - Google Patents

Image analysis method and system based on AI live broadcast

Info

Publication number
CN117115155A
Authority
CN
China
Prior art keywords
image
frame
video
video segment
optical flow
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311370419.1A
Other languages
Chinese (zh)
Inventor
陈达剑
李火亮
陈鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi Tuoshi Intelligent Technology Co ltd
Original Assignee
Jiangxi Tuoshi Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi Tuoshi Intelligent Technology Co ltd filed Critical Jiangxi Tuoshi Intelligent Technology Co ltd
Priority to CN202311370419.1A priority Critical patent/CN117115155A/en
Publication of CN117115155A publication Critical patent/CN117115155A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/269 Analysis of motion using gradient-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/765 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an image analysis method and system based on AI live broadcast. The method comprises the following steps: acquiring a live video stream and preprocessing the video stream to obtain a video set comprising a plurality of video segments, each video segment comprising a plurality of frame images; determining a key frame image from the video segment and extracting an optical flow image from the video segment; determining a combined feature of the video segment based on the key frame image and the optical flow image; and constructing an LSTM classifier and training the LSTM classifier with the combined features so that the LSTM classifier can recognize sensitive images. With this method, sensitive images can be identified automatically, accurately, efficiently and in real time, replacing manual moderation and meeting the requirement of real-time management.

Description

Image analysis method and system based on AI live broadcast
Technical Field
The application relates to the technical field of image processing, in particular to an image analysis method and system based on AI live broadcast.
Background
With the continuous improvement of network quality and bandwidth, people have access to increasingly fast, high-quality network services. Against this background, the live-streaming industry has grown rapidly with the support of extensive network deployment and high network speeds.
With the development of artificial intelligence technology, AI live broadcasting has also been widely adopted in the live-streaming industry, replacing the traditional live-broadcast mode with a new one. In AI live broadcasting, an intelligent virtual avatar constructed with artificial intelligence techniques broadcasts in real time on a network platform, which can enhance the user experience, improve broadcast quality and improve the live-broadcast environment.
However, the problems of AI live broadcasting have gradually become apparent. Sensitive images can appear during an AI live broadcast, and their management is still essentially manual, relying mainly on supervision by users and platform staff. Because the number of simultaneous AI live broadcasts is extremely large, manual management alone cannot meet the requirement of real-time control of sensitive images during live broadcasting.
Disclosure of Invention
The embodiments of the present application provide an image analysis method and system based on AI live broadcasting, which address the technical problem that, in the prior art, sensitive images in the AI live broadcast process are managed only manually, the management capacity is insufficient, and the requirement of real-time management of sensitive images during AI live broadcasting cannot be met.
In a first aspect, an embodiment of the present application provides an image analysis method based on AI live broadcast, including the following steps:
acquiring a live video stream, and preprocessing the video stream to acquire a video set comprising a plurality of video segments, wherein the video segments comprise a plurality of frame images;
determining a key frame image from the video segment, and extracting an optical flow image in the video segment;
determining a combined feature of the video segment based on the key frame image and the optical flow image;
constructing an LSTM classifier, and training the LSTM classifier through the combined features so as to enable the LSTM classifier to have sensitive image recognition capability.
Further, the step of preprocessing the video stream to obtain a video set including a plurality of video segments, the video segments including a plurality of frame images includes:
dividing the video stream into a plurality of frame images under continuous frames, dividing the frame images of a first frame into a first video segment, and taking the frame images of the first frame as a first center point of the first video segment;
comparing the similarity between the frame image of the second frame and the first center point;
if the similarity between the frame image of the second frame and the first center point is greater than a similarity threshold, dividing the frame image of the second frame into the first video segment, and calculating a first updated center point of the first video segment;
if the similarity between the frame image of the second frame and the first center point is smaller than a similarity threshold, dividing the frame image of the second frame into a second video segment, and taking the frame image of the second frame as a second center point of the second video segment;
and sequentially processing the subsequent frame images in a time sequence until a plurality of frame images are classified into a plurality of video segments to form a video set.
Further, the calculation formula of the first update center point is:
$$C_i' = \frac{1}{n_i + 1}\left(\sum_{k=1}^{n_i} x_{i,k} + x_j\right)$$

wherein $C_i'$ represents the first updated center point, $n_i$ indicates the number of existing frame images in the i-th video segment, $x_j$ represents the frame image of the j-th frame, and $x_{i,k}$ represents the k-th frame image in the i-th video segment.
Further, the step of determining a key frame image from the video segment specifically includes:
and calculating entropy values of the frame images in the video segment, and comparing the entropy values of different frame images to select the frame image with the largest entropy value as a key frame image.
Further, the step of extracting an optical flow image in the video segment includes:
extracting a first-direction moving image and a second-direction moving image of the video segment by a TV-L1 dense optical flow algorithm;
and accumulating the first direction moving image and the second direction moving image as an optical flow image.
Further, the step of determining the combined features of the video segment based on the key frame image and the optical flow image comprises:
constructing a feature extraction model, wherein the feature extraction model comprises a spatial convolution network and an optical flow convolution network, and the spatial convolution network and the optical flow convolution network are both connected with a combined network;
taking the key frame image as an input value of the spatial convolution network to acquire spatial features;
taking the optical flow image as an input value of the optical flow convolution network to acquire action characteristics;
and taking the spatial characteristics and the action characteristics as input values of the combined network so as to acquire the combined characteristics through the combined network.
Further, the step of acquiring the combined characteristic through the combined network specifically includes:
and carrying out averaging processing on the action features and the space features through the combined network to form combined features.
Further, the step of constructing an LSTM classifier, training the LSTM classifier by the combined features to enable the LSTM classifier to have sensitive image recognition capabilities includes:
constructing a plurality of first neurons to form an LSTM layer;
constructing a plurality of second neurons to form a fully connected layer, and connecting the LSTM layers with the fully connected layer to form an LSTM classifier;
the combined characteristic is partitioned into a positive sample and a negative sample based on a sensitive image and a normal image, and the positive sample and the negative sample are input into the LSTM classifier as input values, so that the LSTM classifier has sensitive image recognition capability.
Further, the fully-connected layer includes 2 second neurons, and an activation function of the fully-connected layer is a softmax function.
In a second aspect, an embodiment of the present application provides an image analysis system based on AI live broadcast, which is applied to an image analysis method based on AI live broadcast in the above technical solution, where the system includes:
the preprocessing module is used for acquiring a live video stream, preprocessing the video stream to acquire a video set comprising a plurality of video segments, wherein the video segments comprise a plurality of frame images;
the extraction module is used for determining a key frame image from the video segment and extracting an optical flow image in the video segment;
a combination module for determining a combination feature of the video segment based on the key frame image and the optical flow image;
and the analysis module is used for constructing an LSTM classifier, and training the LSTM classifier through the combined features so as to enable the LSTM classifier to have sensitive image recognition capability.
Compared with the prior art, the application has the following beneficial effects: by extracting the key frame image and then extracting its spatial features, a neural network can be trained with those spatial features to automatically recognize sensitive images in the AI live video stream; by additionally extracting the optical flow image and obtaining its action features, the spatial features and action features are fused into combined features, which avoids misjudging images that are hard to distinguish when only spatial features are considered and effectively improves the accuracy of automatic sensitive image recognition; and after the LSTM classifier is constructed and trained, once extraction of the combined features is complete, the LSTM classifier can classify images from a large number of video streams in the same time period, achieving real-time, accurate and efficient automatic recognition of sensitive images.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the other features, objects, and advantages of the application.
Drawings
Fig. 1 is a flowchart of an image analysis method based on AI live broadcast in a first embodiment of the present application;
fig. 2 is a block diagram of an image analysis system based on AI live broadcast according to a second embodiment of the present application;
the application will be further described in the following detailed description in conjunction with the above-described figures.
Detailed Description
The present application will be described and illustrated with reference to the accompanying drawings and examples in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application. All other embodiments, which can be made by a person of ordinary skill in the art based on the embodiments provided by the present application without making any inventive effort, are intended to fall within the scope of the present application.
It is apparent that the drawings in the following description show only some examples or embodiments of the present application, and those of ordinary skill in the art can apply the present application to other similar situations according to these drawings without inventive effort. Moreover, it should be appreciated that although such a development effort might be complex and time-consuming, it would nevertheless be a routine undertaking of design, fabrication, or manufacture for those of ordinary skill having the benefit of this disclosure.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is to be expressly and implicitly understood by those of ordinary skill in the art that the described embodiments of the application can be combined with other embodiments without conflict.
Referring to fig. 1, the image analysis method based on AI live broadcast provided by the first embodiment of the present application includes the following steps:
step S10: acquiring a live video stream, and preprocessing the video stream to acquire a video set comprising a plurality of video segments, wherein the video segments comprise a plurality of frame images;
the purpose of preprocessing the video stream is to summarize and classify the frame images with higher similarity, so that the data processing amount of the video stream is effectively reduced, the calculation complexity is reduced, and the efficiency of sensitive image recognition is improved. The step S10 includes:
s110: dividing the video stream into a plurality of frame images under continuous frames, dividing the frame images of a first frame into a first video segment, and taking the frame images of the first frame as a first center point of the first video segment;
s120: comparing the similarity between the frame image of the second frame and the first center point;
in the initial situation, comparing the similarity of the frame image of the second frame with that of the first frame, specifically, constructing an HSV color space, and equally dividing an H value in the HSV color space into 12 sections, an S value into 5 sections and a V value into 5 sections to form a plurality of index areas; mapping the frame image of the first frame and the frame image of the second frame into HSV color space respectively to obtain a first color histogram and a second color histogram; acquiring a corresponding minimum value of the first color histogram and the second color histogram in the index area; it is understandable that the H value, S value, and V value of the first color histogram and the second color histogram fall into the number of points in the index region; and superposing the minimum values of different index areas to form a similarity value.
S130: if the similarity between the frame image of the second frame and the first center point is greater than a similarity threshold, dividing the frame image of the second frame into the first video segment, and calculating a first updated center point of the first video segment;
that is, the similarity value is smaller than the similarity threshold, it is understood that, in the different index regions, the first color histogram and the second color histogram have a larger number of points falling into the index regions, which means that the frame image of the first frame has a higher similarity with the frame image of the second frame, and therefore, the frame image of the second frame needs to be classified into the first video segment. After the division is completed, when the frame images of the subsequent frames are processed, the number of the frame images in the first video segment is changed, so that the first center point needs to be updated to meet the subsequent similarity comparison requirement.
Specifically, the calculation formula of the first update center point is:
$$C_i' = \frac{1}{n_i + 1}\left(\sum_{k=1}^{n_i} x_{i,k} + x_j\right)$$

wherein $C_i'$ represents the first updated center point, $n_i$ indicates the number of existing frame images in the i-th video segment, $x_j$ represents the frame image of the j-th frame, and $x_{i,k}$ represents the k-th frame image in the i-th video segment.
S140: if the similarity between the frame image of the second frame and the first center point is smaller than a similarity threshold, dividing the frame image of the second frame into a second video segment, and taking the frame image of the second frame as a second center point of the second video segment;
that is, when the similarity is insufficient, the frame image of the second frame is separated from the frame image of the first frame.
S150: sequentially processing the subsequent frame images in a time sequence until a plurality of frame images are classified into a plurality of video segments to form a video set;
It can be understood that when the frame image of a later frame is processed, its similarity to the center point of each already determined video segment is compared, and the frame image is classified by the processing of steps S130 to S140, which is not repeated here.
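Putting steps S110 to S150 together, the sketch below clusters frames into video segments. It is only a sketch: it assumes each frame is represented by an HSV histogram and compared by histogram intersection as in the earlier snippet (hsv_histogram, histogram_similarity), that a segment's center point is the running mean of its members (matching the update formula above), and that each new frame is compared only with the most recently formed segment's center, which is one reading of the description; the 0.7 threshold is illustrative.

```python
def segment_video(frames, threshold=0.7):
    """Cluster consecutive frames into video segments by similarity to segment centers.

    `frames` is a list of frame images in temporal order; returns a list of
    (segment_frames, center_histogram) pairs.
    """
    segments = []                                    # each entry: [frames, center, count]
    for frame in frames:
        hist = hsv_histogram(frame)
        if segments and histogram_similarity(hist, segments[-1][1]) > threshold:
            seg = segments[-1]
            seg[0].append(frame)
            n = seg[2]
            # running-mean center update: new_center = (n * old_center + hist) / (n + 1)
            seg[1] = (seg[1] * n + hist) / (n + 1)
            seg[2] = n + 1
        else:
            segments.append([[frame], hist, 1])      # start a new segment; this frame is its center
    return [(seg[0], seg[1]) for seg in segments]
```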
Step S20: determining a key frame image from the video segment, and extracting an optical flow image in the video segment;
the step S20 includes:
s210: calculating entropy values of the frame images in the video segment, and comparing the entropy values of different frame images to select the frame image with the largest entropy value as a key frame image;
Image entropy is a statistical feature that reflects the average amount of information in an image; specifically, it refers to the amount of information contained in the aggregate features of the image's gray-level distribution. It will be appreciated that the video stream as a whole contains a number of key frame images, one per video segment.
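A minimal sketch of this entropy-based key frame selection (gray-level Shannon entropy; OpenCV and NumPy assumed):

```python
import cv2
import numpy as np

def image_entropy(frame_bgr):
    """Shannon entropy of the gray-level distribution of a frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    hist = cv2.calcHist([gray], [0], None, [256], [0, 256]).ravel()
    p = hist / hist.sum()
    p = p[p > 0]                                    # drop empty bins to avoid log(0)
    return float(-(p * np.log2(p)).sum())

def key_frame(segment_frames):
    """Select the frame with the largest entropy as the key frame of the segment."""
    return max(segment_frames, key=image_entropy)
```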
S220: extracting a first-direction moving image and a second-direction moving image of the video segment by a TV-L1 dense optical flow algorithm;
optical flow is a method used to describe objects in a scene that dynamically change between consecutive frames due to motion. Essentially, it is a two-dimensional field of vectors, each representing the displacement of the point in the scene from the previous frame to the next frame. And solving the optical flow, namely inputting two continuous frame images in the video segment, and outputting a two-dimensional vector field based on pixels of the frame images.
Taking two continuous frame images in the video segment as an example, the illumination energy of the frame image of the previous frame can determine a first X-axis coordinate and a first Y-axis coordinate in a coordinate system, the illumination energy of the frame image of the next frame can determine a second X-axis coordinate and a second Y-axis coordinate in the coordinate system, the position change from the first X-axis coordinate to the second X-axis coordinate is the first direction moving image, and the second direction moving image is the same.
S230: accumulating the first-direction moving image and the second-direction moving image as an optical flow image;
Both the first-direction moving image and the second-direction moving image describe the motion trend of the frame images, so the two need to be considered together; that is, they are accumulated into an optical flow image that corresponds to the key frame image.
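A sketch of steps S220 to S230 using the TV-L1 implementation in opencv-contrib-python (cv2.optflow.DualTVL1OpticalFlow_create); summing the per-pair horizontal and vertical flow components over the segment is our reading of the accumulation described above:

```python
import cv2
import numpy as np

def accumulated_optical_flow(segment_frames):
    """Accumulate TV-L1 dense optical flow over a video segment.

    Returns an (H, W, 2) array: channel 0 is the accumulated first-direction
    (horizontal) motion, channel 1 the accumulated second-direction (vertical) motion.
    """
    tvl1 = cv2.optflow.DualTVL1OpticalFlow_create()
    grays = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in segment_frames]
    acc = np.zeros((*grays[0].shape, 2), dtype=np.float32)
    for prev, nxt in zip(grays[:-1], grays[1:]):
        flow = tvl1.calc(prev, nxt, None)           # per-pixel (dx, dy) between consecutive frames
        acc += flow
    return acc
```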
Step S30: determining a combined feature of the video segment based on the key frame image and the optical flow image;
The key frame image and the optical flow image are considered together; that is, the action features represented by the optical flow image are taken into account at the same time as the spatial features represented by the key frame image.
Specifically, the step S30 includes:
s310: constructing a feature extraction model, wherein the feature extraction model comprises a spatial convolution network and an optical flow convolution network, and the spatial convolution network and the optical flow convolution network are both connected with a combined network;
it should be noted that, the spatial convolution network and the optical flow convolution network both use RseNet-50 network models, which are different in that the training data for model training is different. The combination network is used for combining the features extracted through the spatial convolution network and the optical flow convolution network.
S320: taking the key frame image as an input value of the spatial convolution network to acquire spatial features;
s330: taking the optical flow image as an input value of the optical flow convolution network to acquire action characteristics;
s340: and taking the spatial characteristics and the action characteristics as input values of the combined network so as to acquire the combined characteristics through the combined network.
Specifically, the action features and the spatial features are subjected to an averaging process through the combination network to form the combined features. Assuming that 1000 action features and 1000 spatial features are extracted, the number of combined features after the averaging process is 1000.
In some embodiments, the combination network may also form the combined features by superimposing the action features and the spatial features. Assuming that 1000 action features and 1000 spatial features are extracted, the number of combined features after the superposition process is still 1000.
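One possible PyTorch sketch of the feature extraction model of steps S310 to S340: two ResNet-50 backbones with the classification head removed, one taking the key frame and one taking the optical flow image, fused by element-wise averaging as the combination network. The two-channel flow input, the input resolution and the use of torchvision (0.13 or later) are assumptions, not details taken from the patent.

```python
import torch
import torch.nn as nn
from torchvision import models

class TwoStreamFeatureExtractor(nn.Module):
    def __init__(self):
        super().__init__()
        # spatial stream: standard 3-channel ResNet-50 with the classification head removed
        spatial = models.resnet50(weights=None)
        spatial.fc = nn.Identity()
        self.spatial_net = spatial
        # optical-flow stream: ResNet-50 whose first convolution takes a 2-channel (dx, dy) input
        flow = models.resnet50(weights=None)
        flow.conv1 = nn.Conv2d(2, 64, kernel_size=7, stride=2, padding=3, bias=False)
        flow.fc = nn.Identity()
        self.flow_net = flow

    def forward(self, key_frame, flow_image):
        spatial_feat = self.spatial_net(key_frame)   # (batch, 2048) spatial features
        action_feat = self.flow_net(flow_image)      # (batch, 2048) action features
        return (spatial_feat + action_feat) / 2      # combination network: averaging fusion

# usage (shapes are illustrative):
# feats = TwoStreamFeatureExtractor()(torch.randn(1, 3, 224, 224), torch.randn(1, 2, 224, 224))
```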
Step S40: constructing an LSTM classifier, and training the LSTM classifier through the combined features so as to enable the LSTM classifier to have sensitive image recognition capability.
Specifically, the step S40 includes:
s410: constructing a plurality of first neurons to form an LSTM layer;
preferably, 128 of said first neurons are constructed to form said LSTM layer. The LSTM layer is used for learning the combined features, so that the LSTM classifier has distinguishing capability.
S420: constructing a plurality of second neurons to form a fully connected layer, and connecting the LSTM layers with the fully connected layer to form an LSTM classifier;
Since only sensitive images need to be distinguished, the classification result of the LSTM classifier is either a sensitive image or a normal image, and thus the fully connected layer includes 2 second neurons. Further, the activation function of the fully connected layer is a softmax function.
S430: the combined characteristic is partitioned into a positive sample and a negative sample based on a sensitive image and a normal image, and the positive sample and the negative sample are input into the LSTM classifier as input values, so that the LSTM classifier has sensitive image recognition capability.
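A hedged PyTorch sketch of steps S410 to S430: an LSTM layer of 128 units followed by a 2-neuron fully connected layer with a softmax output. The 2048-dimensional input, the sequence handling, and the training details are assumptions; only the 128 LSTM units, the two output neurons, and the softmax activation come from the description above.

```python
import torch
import torch.nn as nn

class SensitiveImageLSTM(nn.Module):
    """LSTM classifier: an LSTM layer (128 units) followed by a 2-neuron fully connected layer."""

    def __init__(self, feature_dim=2048, hidden_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 2)            # two outputs: normal image / sensitive image

    def forward(self, combined_features):
        # combined_features: (batch, sequence_length, feature_dim), one vector per video segment
        _, (h_n, _) = self.lstm(combined_features)
        return self.fc(h_n[-1])                       # raw logits; apply softmax for probabilities

# training sketch: positive (sensitive) samples labelled 1, negative (normal) samples labelled 0
# model = SensitiveImageLSTM()
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# logits = model(batch_features)                      # (batch, 2)
# loss = nn.functional.cross_entropy(logits, labels)  # softmax is applied inside cross_entropy
# loss.backward(); optimizer.step()
```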
By extracting the key frame image and then extracting its spatial features, a neural network can be trained with those spatial features to automatically recognize sensitive images in the AI live video stream. At the same time, the optical flow image is extracted and its action features are obtained, and the spatial features and action features are fused into combined features; this avoids misjudging images that are hard to distinguish when only spatial features are considered and effectively improves the accuracy of automatic sensitive image recognition. After the LSTM classifier is constructed and trained, and once extraction of the combined features is complete, the LSTM classifier can classify images from a large number of video streams in the same time period, achieving real-time, accurate and efficient automatic recognition of sensitive images.
Referring to fig. 2, a second embodiment of the present application provides an image analysis system based on AI live broadcast, which applies the AI live broadcast-based image analysis method of the above embodiment; details already described are not repeated. As used below, the terms "module," "unit," "sub-unit," and the like may refer to a combination of software and/or hardware that implements a predetermined function. While the means described in the following embodiments are preferably implemented in software, implementations in hardware, or in a combination of software and hardware, are also possible and contemplated.
The system comprises:
the preprocessing module 10 is configured to acquire a live video stream, and preprocess the video stream to acquire a video set including a plurality of video segments, where the video segments include a plurality of frame images;
the preprocessing module 10 includes:
a first unit, configured to divide the video stream into a plurality of frame images under consecutive frames, divide the frame images of a first frame into a first video segment, and use the frame images of the first frame as a first center point of the first video segment;
a second unit, configured to perform similarity comparison on the frame image of a second frame and the first center point;
a third unit, configured to divide the frame image of the second frame into the first video segment and calculate a first updated center point of the first video segment if the similarity between the frame image of the second frame and the first center point is greater than a similarity threshold;
a fourth unit, configured to divide the frame image of the second frame into a second video segment if the similarity between the frame image of the second frame and the first center point is smaller than a similarity threshold, and take the frame image of the second frame as a second center point of the second video segment;
and a fifth unit, configured to sequentially process the subsequent frame images in a time sequence until a plurality of frame images are classified into a plurality of video segments, so as to form a video set.
An extraction module 20, configured to determine a key frame image from the video segment, and extract an optical flow image in the video segment;
the extraction module 20 includes:
a sixth unit, configured to calculate entropy values of the frame images in the video segment, and compare the entropy values of different frame images, so as to select the frame image with the largest entropy value as a key frame image;
a seventh unit, configured to extract a first direction moving image and a second direction moving image of the video segment through a TV-L1 dense optical flow algorithm;
eighth means for accumulating the first-direction moving image and the second-direction moving image as optical flow images;
a combination module 30 for determining a combination feature of the video segment based on the key frame image and the optical flow image;
the combining module 30 includes:
a ninth unit, configured to construct a feature extraction model, where the feature extraction model includes a spatial convolution network and an optical flow convolution network, and the spatial convolution network and the optical flow convolution network are both connected to a combined network;
a tenth unit, configured to use the key frame image as an input value of the spatial convolution network, so as to obtain a spatial feature;
an eleventh unit, configured to take the optical flow image as an input value of the optical flow convolution network, so as to obtain an action feature;
a twelfth unit, configured to use the spatial feature and the action feature as input values of the combination network, so as to obtain a combination feature through the combination network;
the twelfth unit is specifically configured to take the spatial feature and the action feature as input values of the combination network, and to perform an averaging process on the action feature and the spatial feature through the combination network to form a combined feature;
an analysis module 40, configured to construct an LSTM classifier, and train the LSTM classifier through the combined features, so that the LSTM classifier has a sensitive image recognition capability;
the analysis module 40 includes:
a thirteenth unit for constructing a number of first neurons to form LSTM layers;
a fourteenth unit for constructing a plurality of second neurons to form a fully connected layer, connecting the LSTM layers to the fully connected layer to form an LSTM classifier;
a fifteenth unit, configured to divide the combined feature into a positive sample and a negative sample based on a sensitive image and a normal image, and input the positive sample and the negative sample as input values into the LSTM classifier, so that the LSTM classifier has a sensitive image recognition capability.
The application also provides a computer, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the AI live broadcast-based image analysis method of the above technical solution when executing the computer program.
The application also provides a storage medium, on which a computer program is stored, which when being executed by a processor implements the AI live broadcast-based image analysis method as described in the above technical scheme.
The above examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (10)

1. An image analysis method based on AI live broadcast is characterized by comprising the following steps:
acquiring a live video stream, and preprocessing the video stream to acquire a video set comprising a plurality of video segments, wherein the video segments comprise a plurality of frame images;
determining a key frame image from the video segment, and extracting an optical flow image in the video segment;
determining a combined feature of the video segment based on the key frame image and the optical flow image;
constructing an LSTM classifier, and training the LSTM classifier through the combined features so as to enable the LSTM classifier to have sensitive image recognition capability.
2. The AI live-based image analysis method of claim 1, wherein the step of preprocessing the video stream to obtain a video set including a number of video segments, the video segments including a number of frame images, comprises:
dividing the video stream into a plurality of frame images under continuous frames, dividing the frame images of a first frame into a first video segment, and taking the frame images of the first frame as a first center point of the first video segment;
comparing the similarity between the frame image of the second frame and the first center point;
if the similarity between the frame image of the second frame and the first center point is greater than a similarity threshold, dividing the frame image of the second frame into the first video segment, and calculating a first updated center point of the first video segment;
if the similarity between the frame image of the second frame and the first center point is smaller than a similarity threshold, dividing the frame image of the second frame into a second video segment, and taking the frame image of the second frame as a second center point of the second video segment;
and sequentially processing the subsequent frame images in a time sequence until a plurality of frame images are classified into a plurality of video segments to form a video set.
3. The AI live-based image analysis method of claim 2, wherein the first updated center point has a calculation formula:
$$C_i' = \frac{1}{n_i + 1}\left(\sum_{k=1}^{n_i} x_{i,k} + x_j\right)$$

wherein $C_i'$ represents the first updated center point, $n_i$ indicates the number of existing frame images in the i-th video segment, $x_j$ represents the frame image of the j-th frame, and $x_{i,k}$ represents the k-th frame image in the i-th video segment.
4. The AI live-based image analysis method of claim 1, wherein the step of determining a key frame image from the video segment is specifically:
and calculating entropy values of the frame images in the video segment, and comparing the entropy values of different frame images to select the frame image with the largest entropy value as a key frame image.
5. The AI live-based image analysis method of claim 1, wherein the step of extracting an optical flow image in the video segment comprises:
extracting a first-direction moving image and a second-direction moving image of the video segment by a TV-L1 dense optical flow algorithm;
and accumulating the first direction moving image and the second direction moving image as an optical flow image.
6. The AI live-based image analysis method of claim 1, wherein the step of determining a combined feature of the video segment based on the key frame image and the optical flow image comprises:
constructing a feature extraction model, wherein the feature extraction model comprises a spatial convolution network and an optical flow convolution network, and the spatial convolution network and the optical flow convolution network are both connected with a combined network;
taking the key frame image as an input value of the spatial convolution network to acquire spatial features;
taking the optical flow image as an input value of the optical flow convolution network to acquire action characteristics;
and taking the spatial characteristics and the action characteristics as input values of the combined network so as to acquire the combined characteristics through the combined network.
7. The AI-live-based image analysis method of claim 6, wherein the step of obtaining the combined features through the combined network is specifically:
and carrying out averaging processing on the action features and the space features through the combined network to form combined features.
8. The AI live-based image analysis method of claim 1, wherein the step of constructing an LSTM classifier, training the LSTM classifier with the combined features to provide the LSTM classifier with sensitive image recognition capabilities comprises:
constructing a plurality of first neurons to form an LSTM layer;
constructing a plurality of second neurons to form a fully connected layer, and connecting the LSTM layers with the fully connected layer to form an LSTM classifier;
the combined characteristic is partitioned into a positive sample and a negative sample based on a sensitive image and a normal image, and the positive sample and the negative sample are input into the LSTM classifier as input values, so that the LSTM classifier has sensitive image recognition capability.
9. The AI live-based image analysis method of claim 8, wherein the fully connected layer includes 2 of the second neurons, and an activation function of the fully connected layer is a softmax function.
10. An AI live broadcast-based image analysis system applied to the AI live broadcast-based image analysis method as claimed in any one of claims 1 to 9, characterized in that the system comprises:
the preprocessing module is used for acquiring a live video stream, preprocessing the video stream to acquire a video set comprising a plurality of video segments, wherein the video segments comprise a plurality of frame images;
the extraction module is used for determining a key frame image from the video segment and extracting an optical flow image in the video segment;
a combination module for determining a combination feature of the video segment based on the key frame image and the optical flow image;
and the analysis module is used for constructing an LSTM classifier, and training the LSTM classifier through the combined features so as to enable the LSTM classifier to have sensitive image recognition capability.
CN202311370419.1A 2023-10-23 2023-10-23 Image analysis method and system based on AI live broadcast Pending CN117115155A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311370419.1A CN117115155A (en) 2023-10-23 2023-10-23 Image analysis method and system based on AI live broadcast

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311370419.1A CN117115155A (en) 2023-10-23 2023-10-23 Image analysis method and system based on AI live broadcast

Publications (1)

Publication Number Publication Date
CN117115155A true CN117115155A (en) 2023-11-24

Family

ID=88795073

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311370419.1A Pending CN117115155A (en) 2023-10-23 2023-10-23 Image analysis method and system based on AI live broadcast

Country Status (1)

Country Link
CN (1) CN117115155A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108288015A (en) * 2017-01-10 2018-07-17 武汉大学 Human motion recognition method and system in video based on THE INVARIANCE OF THE SCALE OF TIME
US10402656B1 (en) * 2017-07-13 2019-09-03 Gopro, Inc. Systems and methods for accelerating video analysis
CN111950653A (en) * 2020-08-24 2020-11-17 腾讯科技(深圳)有限公司 Video processing method and device, storage medium and electronic equipment
CN112989117A (en) * 2021-04-14 2021-06-18 北京世纪好未来教育科技有限公司 Video classification method and device, electronic equipment and computer storage medium
CN113114946A (en) * 2021-04-19 2021-07-13 深圳市帧彩影视科技有限公司 Video processing method and device, electronic equipment and storage medium
CN113673307A (en) * 2021-07-05 2021-11-19 浙江工业大学 Light-weight video motion recognition method
CN113850158A (en) * 2021-09-08 2021-12-28 深圳供电局有限公司 Video feature extraction method
CN115734025A (en) * 2021-08-30 2023-03-03 北京安云世纪科技有限公司 Method and system for detecting redundant segments of video
US11768504B2 (en) * 2020-06-10 2023-09-26 AI Incorporated Light weight and real time slam for robots

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
AMIN ULLAH et al.: "Activity Recognition Using Temporal Optical Flow Convolutional Features and Multilayers LSTM", IEEE *
ZHOU Yuxin et al.: "Research on lightweight action recognition method based on key frames", Chinese Journal of Scientific Instrument *

Similar Documents

Publication Publication Date Title
CN108304798B (en) Street level order event video detection method based on deep learning and motion consistency
CN112131978B (en) Video classification method and device, electronic equipment and storage medium
CN102334118B (en) Promoting method and system for personalized advertisement based on interested learning of user
CN108960059A (en) A kind of video actions recognition methods and device
CN104679818B (en) A kind of video key frame extracting method and system
CN110807757B (en) Image quality evaluation method and device based on artificial intelligence and computer equipment
CN111091109B (en) Method, system and equipment for predicting age and gender based on face image
CN107358141B (en) Data identification method and device
CN105740915B (en) A kind of collaboration dividing method merging perception information
CN109657715B (en) Semantic segmentation method, device, equipment and medium
CN111723773B (en) Method and device for detecting carryover, electronic equipment and readable storage medium
CN107622280B (en) Modularized processing mode image saliency detection method based on scene classification
CN111723687A (en) Human body action recognition method and device based on neural network
dos Santos et al. CV-C3D: action recognition on compressed videos with convolutional 3d networks
CN113221770A (en) Cross-domain pedestrian re-identification method and system based on multi-feature hybrid learning
CN110751191A (en) Image classification method and system
CN114782859B (en) Method for establishing target behavior perception space-time positioning model and application
CN109740527B (en) Image processing method in video frame
CN112487926A (en) Scenic spot feeding behavior identification method based on space-time diagram convolutional network
CN116993760A (en) Gesture segmentation method, system, device and medium based on graph convolution and attention mechanism
CN116469177A (en) Living body target detection method with mixed precision and training method of living body detection model
CN116110074A (en) Dynamic small-strand pedestrian recognition method based on graph neural network
CN117115155A (en) Image analysis method and system based on AI live broadcast
CN115424164A (en) Method and system for constructing scene self-adaptive video data set
CN115457620A (en) User expression recognition method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned

Effective date of abandoning: 20240920