CN117915096A - Target identification high-precision high-resolution video coding method and system for AI large model - Google Patents

Target identification high-precision high-resolution video coding method and system for AI large model

Info

Publication number: CN117915096A
Application number: CN202311716249.8A
Authority: CN (China)
Prior art keywords: image, main body, video, signal, video image
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 常学智, 娄方超
Current Assignee: Beijing Daxing Economic Development Zone Development And Operation Co ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Beijing Daxing Economic Development Zone Development And Operation Co ltd
Application filed by Beijing Daxing Economic Development Zone Development And Operation Co ltd
Priority: CN202311716249.8A
Publication: CN117915096A


Classifications

    • H04N19/172 — adaptive coding of digital video signals where the coding unit is an image region being a picture, frame or field
    • G06N3/0464 — convolutional networks [CNN, ConvNet]
    • G06N3/08 — learning methods for neural networks
    • G06V10/44 — local feature extraction by analysis of parts of the pattern, e.g. edges, contours, loops, corners, strokes or intersections; connectivity analysis
    • H04N19/154 — measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
    • H04N19/182 — adaptive coding of digital video signals where the coding unit is a pixel
    • H04N19/85 — pre-processing or post-processing specially adapted for video compression

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to the technical field of video coding, and provides a target identification high-precision high-resolution video coding method and system for an AI large model, comprising the following steps: performing video framing processing on a high-definition video stream to obtain framed video, identifying the video images in the framed video, and performing optimization processing on the video images to obtain optimized video images; performing signal analysis on the optimized video images to obtain image signals, and calculating the signal amplitude value corresponding to each image signal; performing main body recognition on the target video image to obtain the image main body, performing image segmentation on the target video image to obtain a key main body image and a non-key main body image, and performing coding processing on the key main body image to obtain a coded image; and extracting the visual features corresponding to the non-key main body image, constructing a virtual image corresponding to the non-key main body image, and generating the final coded image corresponding to the target video image. The invention aims to improve the coding efficiency of high-precision high-resolution video.

Description

Target identification high-precision high-resolution video coding method and system for AI large model
Technical Field
The invention relates to the technical field of video coding, in particular to a high-precision high-resolution video coding method and system for target identification of an AI large model.
Background
High-precision high-resolution video is video with higher image quality and resolution. It provides clearer, more lifelike image detail and is currently used for live broadcasts of sports events and relays of international conferences, so that viewers can better enjoy and experience the video content. Such video needs to be encoded during processing to enable efficient transmission.
However, the existing high-precision high-resolution video coding methods mainly use H.264/AVC, which obtains high-quality coded video through motion estimation, transform coding, quantization, entropy coding and similar processing. Because this approach also encodes inconsequential content in the video, such as the grass of a pitch or the physical scene of the audience, it increases the computational load of video coding and reduces the coding efficiency of high-precision high-resolution video. A method that can improve the coding efficiency of such video is therefore needed.
Disclosure of Invention
The invention provides a target identification high-precision high-resolution video coding method and system of an AI large model, and mainly aims to improve the coding efficiency of high-precision high-resolution video.
In order to achieve the above object, the present invention provides a target identification high-precision high-resolution video coding method for AI large models, comprising:
Obtaining a high-definition video stream to be encoded, carrying out video framing processing on the high-definition video stream to obtain a framing video, identifying video images in the framing video, and carrying out optimization processing on the video images to obtain optimized video images;
Performing signal analysis on the optimized video image to obtain an image signal, calculating a signal amplitude value corresponding to the image signal, performing de-duplication processing on the optimized video image according to the signal amplitude value to obtain a target video image, mining image information corresponding to each image in the target video image, and analyzing an image scene corresponding to the video image according to the image information;
Performing main body recognition on the target video image to obtain an image main body, extracting main body characteristics corresponding to the image main body, analyzing association relations between the image scene and the main body characteristics, and determining a key main body and a non-key main body in the image main body by combining the association relations;
according to the key main body and the non-key main body, performing image segmentation on the target video image to obtain a key main body image and a non-key main body image, and performing coding processing on the key main body image to obtain a coded image;
Inputting the non-key main body image into a trained AI large model, extracting visual characteristics corresponding to the non-key main body image by utilizing a convolutional neural network in the AI large model, constructing a virtual image corresponding to the non-key main body image by utilizing a self-encoder in the AI large model according to the visual characteristics, and generating a final coded image corresponding to the target video image according to the coded image and the virtual image.
Optionally, the optimizing the video image to obtain an optimized video image includes:
Carrying out noise reduction processing on the video image to obtain a noise-reduced video image;
performing distortion restoration processing on the noise-reduced video image to obtain a repaired video image;
performing distortion correction processing on the repaired video image to obtain a corrected video image;
calculating a pixel intensity value corresponding to a pixel point of the corrected video image, and carrying out equalization processing on the pixel intensity value to obtain an equalized pixel;
And carrying out pixel optimization processing on the corrected video image according to the balanced pixels to obtain an optimized video image.
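The five optimization steps above can be sketched with plain NumPy. The patent names no concrete implementation for noise reduction beyond its (image-based) formula, so the 3×3 mean filter and the helper names here are assumptions; the histogram equalization matches the method named later in the description:

```python
import numpy as np

def mean_filter_denoise(img, k=3):
    """Noise reduction sketch: each pixel becomes the mean of its k x k window."""
    pad = k // 2
    padded = np.pad(img.astype(np.float64), pad, mode="edge")
    out = np.zeros_like(img, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def equalize(gray):
    """Histogram equalization of an 8-bit grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum().astype(np.float64)
    lo = cdf[cdf > 0].min()
    lut = np.clip((cdf - lo) * 255.0 / (cdf[-1] - lo), 0, 255).astype(np.uint8)
    return lut[gray]

# toy 4x4 frame with one noisy bright pixel
frame = np.array([[10, 10, 200, 10],
                  [10, 10,  10, 10],
                  [10, 10,  10, 10],
                  [10, 10,  10, 10]], dtype=np.uint8)
denoised = mean_filter_denoise(frame)
equalized = equalize(frame)
```

The mean filter suppresses the bright outlier, while equalization spreads the two gray levels to the extremes of the 8-bit range.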
Optionally, the performing noise reduction processing on the video image to obtain a noise-reduced video image includes:
noise reduction processing is carried out on the video image through the following formula:
Wherein A represents the noise-reduced video image, a and b represent the length and width of the convolution window respectively, B_{a,b} represents the convolution window used for noise reduction of the pixel points in the video image, and D(x, y) represents the video image.
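The formula image did not survive extraction. A windowed-convolution form consistent with the symbol definitions given here would be the following; this reconstruction is an assumption, not the patent's exact expression:

```latex
\[
A(x, y) = \sum_{p=1}^{a} \sum_{q=1}^{b} B_{p,q}\, D(x - p,\; y - q)
\]
```

If the window weights are normalized so that $\sum_{p,q} B_{p,q} = 1$, this reduces to the familiar mean (box) filter over an $a \times b$ neighborhood.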
Optionally, the calculating a signal amplitude value corresponding to the image signal includes:
calculating a signal amplitude value corresponding to the image signal by the following formula:
Wherein E represents the signal amplitude value corresponding to the image signal, F represents the length value corresponding to the image signal, β_d represents the time-domain energy corresponding to the d-th signal sampling point in the image signal, d represents the signal sampling point index of the image signal, t represents the number of signal sampling points of the image signal, F(d) represents the frequency-domain energy corresponding to the d-th signal sampling point of the image signal, and α represents the frequency-domain conversion coefficient corresponding to the image signal.
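The formula image is missing from the source. One amplitude expression consistent with the stated symbols (length value F, per-sample time-domain energies β_d, frequency-domain energies F(d) weighted by the conversion coefficient α, summed over the t sampling points) would be the following; the exact form, including the square root, is an assumption:

```latex
\[
E = \frac{1}{F} \sum_{d=1}^{t} \sqrt{\beta_d + \alpha\, F(d)}
\]
```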
Optionally, the analyzing, according to the image information, an image scene corresponding to the video image includes:
Extracting information description characters corresponding to the image information;
carrying out semantic analysis on the information description character to obtain a character paraphrasing;
determining an information scene corresponding to the image information according to the character paraphrasing;
and counting the scene frequency corresponding to the information scene, and determining the target scene of the image information according to the scene frequency.
Optionally, the extracting the main feature corresponding to the image main body includes:
identifying a main texture corresponding to the image main body, and calculating a pixel gray value corresponding to each pixel point in the main texture;
Constructing a texture matrix corresponding to the main texture according to the pixel gray value, and calculating a matrix mean value corresponding to the texture matrix;
Extracting texture features corresponding to the main body textures according to a preset threshold value and the matrix mean value;
and extracting color features corresponding to the image main body, and taking the texture features and the color features as main body features of the image main body.
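The texture steps above can be sketched in plain NumPy. The patent does not specify how the texture matrix is built or how the preset threshold is applied, so treating the normalized gray values as the matrix and keeping a binary map of pixels whose deviation from the matrix mean exceeds the threshold are assumptions:

```python
import numpy as np

def texture_features(gray, threshold):
    """Sketch: build the texture matrix from pixel gray values, take its
    matrix mean, and extract a binary texture feature by comparing each
    entry's deviation from the mean against the preset threshold."""
    mat = gray.astype(np.float64) / 255.0        # texture matrix of normalized gray values
    mean = mat.mean()                            # matrix mean
    feats = (np.abs(mat - mean) > threshold).astype(np.uint8)
    return feats, mean

# toy 2x2 main-body texture: one bright pixel among dark ones
patch = np.array([[0, 255], [0, 0]], dtype=np.uint8)
feats, m = texture_features(patch, threshold=0.3)
```

On this toy patch the matrix mean is 0.25, and only the bright pixel deviates from it by more than the threshold.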
Optionally, the extracting the color feature corresponding to the image main body includes:
Extracting the corresponding color characteristics of the image main body through the following formula:
Wherein Q represents the color feature corresponding to the image main body, LN represents a normalization algorithm, W_e represents the pixel mean value corresponding to the e-th main body in the image main body, e represents the main body serial number of the image main body, ω represents the number of main bodies in the image main body, R_{ei} represents the pixel value corresponding to the i-th pixel point in the e-th main body, i represents the pixel point serial number of the image main body, V_e represents the pixel variance corresponding to the e-th main body in the image main body, and U_e represents the pixel skewness coefficient corresponding to the e-th main body in the image main body.
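The formula image is missing from the source. The symbol roles (pixel mean W_e, pixel variance V_e, pixel skewness U_e over the pixels R_{ei} of the e-th main body) match the standard per-region color moments below, with n_e the pixel count of the e-th main body; collecting the moments of all ω main bodies and normalizing with LN to form Q is an assumption:

```latex
\[
W_e = \frac{1}{n_e}\sum_{i=1}^{n_e} R_{ei}, \qquad
V_e = \frac{1}{n_e}\sum_{i=1}^{n_e}\bigl(R_{ei} - W_e\bigr)^2, \qquad
U_e = \frac{1}{n_e}\sum_{i=1}^{n_e}\left(\frac{R_{ei} - W_e}{\sqrt{V_e}}\right)^{3}
\]
\[
Q = \mathrm{LN}\Bigl(\bigl[\,W_e,\; V_e,\; U_e\,\bigr]_{e=1}^{\omega}\Bigr)
\]
```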
Optionally, the analyzing the association relationship between the image scene and the subject feature includes:
analyzing feature elements in the main body features and analyzing visual elements corresponding to the image scene;
calculating a correlation coefficient between the visual element and the characteristic element;
Vectorizing the characteristic elements and the visual elements respectively to obtain characteristic element vectors and visual element vectors;
calculating the vector similarity between the characteristic element vector and the visual element vector;
and analyzing the association relation between the image scene and the main body feature by combining the vector similarity and the correlation coefficient.
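The two measures above can be sketched with NumPy: a Pearson correlation coefficient between the elements and a cosine similarity between their vectorized forms. The patent does not state how the two scores are combined, so the equal weighting is an assumption:

```python
import numpy as np

def association(feature_vec, visual_vec):
    """Combine the correlation coefficient and vector similarity into one
    association score (equal weights are an assumption)."""
    f = np.asarray(feature_vec, dtype=np.float64)
    v = np.asarray(visual_vec, dtype=np.float64)
    pearson = np.corrcoef(f, v)[0, 1]                       # correlation coefficient
    cosine = f.dot(v) / (np.linalg.norm(f) * np.linalg.norm(v))  # vector similarity
    return 0.5 * pearson + 0.5 * cosine

# perfectly aligned feature and visual element vectors
score = association([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Proportional vectors score 1.0 under both measures, so the combined score is also 1.0.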
Optionally, the calculating the vector similarity between the feature element vector and the visual element vector includes:
calculating the vector similarity between the feature element vector and the visual element vector by the following formula:
Wherein N represents the vector similarity between the feature element vector and the visual element vector, j and j+1 represent the sequence numbers within the feature element vector and the visual element vector respectively, μ represents the total number of vector components of the feature element vector, M_j represents the value of the j-th component of the feature element vector, and G_{j+1} represents the value of the (j+1)-th component of the visual element vector.
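The formula image is missing from the source. A cosine-style similarity consistent with the stated symbols (products M_j G_{j+1} summed over the μ components) would be the following; the normalization in the denominator is an assumption:

```latex
\[
N = \frac{\displaystyle\sum_{j=1}^{\mu} M_j\, G_{j+1}}
         {\sqrt{\displaystyle\sum_{j=1}^{\mu} M_j^{2}}\;\sqrt{\displaystyle\sum_{j=1}^{\mu} G_{j+1}^{2}}}
\]
```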
A high-precision high-resolution video coding system for object recognition of AI large models, the system comprising:
The image optimization module is used for obtaining a high-definition video stream to be encoded, carrying out video framing processing on the high-definition video stream to obtain a framing video, identifying video images in the framing video, and carrying out optimization processing on the video images to obtain optimized video images;
The scene analysis module is used for carrying out signal analysis on the optimized video image to obtain an image signal, calculating a signal amplitude value corresponding to the image signal, carrying out de-duplication processing on the optimized video image according to the signal amplitude value to obtain a target video image, mining image information corresponding to each image in the target video image, and analyzing an image scene corresponding to the video image according to the image information;
The main body analysis module is used for carrying out main body identification on the target video image to obtain an image main body, extracting main body characteristics corresponding to the image main body, analyzing the association relation between the image scene and the main body characteristics, and determining a key main body and a non-key main body in the image main body by combining the association relation;
the image coding module is used for carrying out image segmentation on the target video image according to the key main body and the non-key main body to obtain a key main body image and a non-key main body image, and carrying out coding processing on the key main body image to obtain a coded image;
The image construction module is used for inputting the non-key main body image into a trained AI large model, extracting visual characteristics corresponding to the non-key main body image by utilizing a convolutional neural network in the AI large model, constructing a virtual image corresponding to the non-key main body image by utilizing a self-encoder in the AI large model according to the visual characteristics, and generating a final coded image corresponding to the target video image according to the coded image and the virtual image.
By performing video framing processing on the high-definition video stream, the invention reduces the difficulty of video image identification: compared with multi-frame video, the video image corresponding to a single-frame video is identified more efficiently. By performing signal analysis on the optimized video image, the method obtains the electrical signal of the optimized video image and calculates the signal amplitude value corresponding to the image signal, which facilitates the subsequent removal of repeated images from the optimized video image and avoids redundant calculation on repeated images. Extracting the main body features corresponding to the image main body yields a relevant representation of the image main body, which improves the accuracy of the subsequent association-relation analysis, and the target video image is then segmented according to the key main body and the non-key main body. Therefore, the target identification high-precision high-resolution video coding method and system of the AI large model can improve the coding efficiency of high-precision high-resolution video.
Drawings
FIG. 1 is a schematic flow chart of a method for encoding a high-precision high-resolution video with object recognition of an AI large model according to an embodiment of the invention;
Fig. 2 is a functional block diagram of a high-precision high-resolution video coding system with AI large model object recognition according to an embodiment of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The embodiment of the application provides a target identification high-precision high-resolution video coding method for an AI large model. In the embodiment of the present application, the execution body of the method includes, but is not limited to, at least one of a server, a terminal, and the like that can be configured to execute the method provided by the embodiment of the present application. In other words, the method may be performed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server end includes, but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like. The server may be an independent server, or may be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, a content delivery network (CDN), and basic cloud computing services such as big data and artificial intelligence platforms.
Referring to fig. 1, a schematic flow chart of the target identification high-precision high-resolution video coding method of the AI large model according to an embodiment of the invention is shown. In this embodiment, the method includes steps S1 to S5.
S1, obtaining a high-definition video stream to be encoded, carrying out video framing processing on the high-definition video stream to obtain a framing video, identifying video images in the framing video, and carrying out optimization processing on the video images to obtain optimized video images.
By performing video framing processing on the high-definition video stream, the method decomposes the stream into single-frame videos, reducing the difficulty of video image recognition; recognizing the video image of a single-frame video is more efficient than working on multi-frame video. The high-definition video stream is a video data stream with higher resolution, a high frame rate and a high bit rate, which provides clearer, finer and more realistic images and smoother motion, and is used, for example, in live broadcasts of sports events or international conferences. The framing videos are the single-frame videos corresponding to the high-definition video stream, and the video images are the static images corresponding to the framing videos. Video framing of the high-definition video stream can be implemented with a video framing tool such as OpenCV, and the video images in the framing videos can be recognized with an image recognition tool written in a scripting language.
The invention can improve the image quality of the video image by optimizing the video image, so that the image signal obtained by subsequent signal analysis is more accurate, wherein the optimized video image is a high-quality image obtained by optimizing the video image.
As an embodiment of the present invention, the optimizing the video image to obtain an optimized video image includes: carrying out noise reduction processing on the video image to obtain a noise-reduced video image, carrying out distortion restoration processing on the noise-reduced video image to obtain a repaired video image, carrying out distortion correction processing on the repaired video image to obtain a corrected video image, calculating the pixel intensity values corresponding to the pixel points of the corrected video image, carrying out equalization processing on the pixel intensity values to obtain equalized pixels, and carrying out pixel optimization processing on the corrected video image according to the equalized pixels to obtain the optimized video image.
The noise-reduced video image is the image obtained after noise points in the video image are suppressed, the repaired video image is the image obtained after distortion artifacts in the noise-reduced video image are repaired, the corrected video image is the image obtained after geometric distortion or lens distortion in the repaired video image is corrected, the pixel intensity value is the value corresponding to a pixel point of the corrected video image, and the equalized pixel is the pixel obtained after the pixel intensity values are subjected to equalization processing.
Optionally, the distortion restoration processing on the noise-reduced video image may be implemented by a blind deconvolution method, and the distortion correction processing on the repaired video image may be implemented by a distortion model, for example, the Brown model. The pixel intensity value corresponding to a pixel point of the corrected video image may be calculated using the following formula: pixel intensity value = 0.299×R + 0.587×G + 0.114×B, wherein R, G and B are the pixel values of the red, green, and blue channels, respectively. The equalization of the pixel intensity values may be implemented by a pixel histogram equalization method, and the pixel update of the corrected video image with the equalized pixels carries out the pixel optimization processing on the corrected video image.
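The pixel intensity formula quoted above uses the standard BT.601 luma weights, which sum exactly to 1. A minimal NumPy check:

```python
import numpy as np

def pixel_intensity(rgb):
    """Pixel intensity = 0.299*R + 0.587*G + 0.114*B, as stated in the text."""
    r = rgb[..., 0].astype(np.float64)
    g = rgb[..., 1].astype(np.float64)
    b = rgb[..., 2].astype(np.float64)
    return 0.299 * r + 0.587 * g + 0.114 * b

white = pixel_intensity(np.array([[[255, 255, 255]]], dtype=np.uint8))  # -> 255.0
red = pixel_intensity(np.array([[[255, 0, 0]]], dtype=np.uint8))        # -> 76.245
```

Pure white maps to full intensity because the three weights sum to 1; a pure-red pixel keeps only its 0.299 share.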
Further, as an optional embodiment of the present invention, the performing noise reduction processing on the video image to obtain a noise-reduced video image includes:
noise reduction processing is carried out on the video image through the following formula:
Wherein, A represents a noise reduction video image, a and B represent the length and width corresponding to the convolution window respectively, B a,b represents the convolution window for noise reduction processing of pixel points in the video image, and D (x, y) represents the video image.
S2, carrying out signal analysis on the optimized video image to obtain an image signal, calculating a signal amplitude value corresponding to the image signal, carrying out de-duplication processing on the optimized video image according to the signal amplitude value to obtain a target video image, mining image information corresponding to each image in the target video image, and analyzing an image scene corresponding to the video image according to the image information.
By performing signal analysis on the optimized video image, the invention obtains the electrical signal of the optimized video image and calculates the signal amplitude value corresponding to the image signal, which facilitates the subsequent removal of repeated images from the optimized video image and avoids redundant calculation on them. The image signal is the electrical signal corresponding to the optimized video image when it is transmitted or processed, and the signal amplitude value represents the signal intensity corresponding to the image signal. Optionally, the signal analysis of the optimized video image can be implemented with a signal analysis tool, such as a video decoder.
As one embodiment of the present invention, the calculating a signal amplitude value corresponding to the image signal includes:
calculating a signal amplitude value corresponding to the image signal by the following formula:
Wherein E represents the signal amplitude value corresponding to the image signal, F represents the length value corresponding to the image signal, β_d represents the time-domain energy corresponding to the d-th signal sampling point in the image signal, d represents the signal sampling point index of the image signal, t represents the number of signal sampling points of the image signal, F(d) represents the frequency-domain energy corresponding to the d-th signal sampling point of the image signal, and α represents the frequency-domain conversion coefficient corresponding to the image signal.
According to the signal amplitude value, the invention performs de-duplication processing on the optimized video images, which reduces the number of images and the subsequent image-related computation cost. Mining the image information corresponding to each image in the target video image yields the content contained in each image, providing a basis for the subsequent image scene analysis; the image information is the content contained in each image of the target video image, such as the objects or persons in it. The de-duplication of the optimized video images can be implemented with a perceptual hash algorithm, with the following specific steps: calculate the hash value corresponding to each optimized video image with a perceptual hash function, compare the degree of difference between the hash values to judge the similarity of the optimized video images, and remove an image as a duplicate when the similarity is greater than a preset threshold similarity. Mining the image information corresponding to each image in the target video image can then be implemented with a neural network model, such as a recurrent neural network.
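The de-duplication steps above can be sketched as follows. A true perceptual hash (pHash) uses a DCT; to keep the example dependency-free this sketch substitutes the simpler average-hash variant, and the 8×8 hash size is an assumption:

```python
import numpy as np

def average_hash(gray, size=8):
    """Average-hash stand-in for the perceptual hash named in the text:
    downsample to size x size by block means, then threshold at the mean."""
    h, w = gray.shape
    small = gray[:h - h % size, :w - w % size] \
        .reshape(size, h // size, size, w // size).mean(axis=(1, 3))
    return (small > small.mean()).flatten()

def hamming(h1, h2):
    """Degree of difference between two hashes: count of differing bits."""
    return int(np.count_nonzero(h1 != h2))

img = np.arange(64, dtype=np.float64).reshape(8, 8)
img2 = img.copy()
img2[0, 0] = 1000.0  # perturb one pixel
```

Identical frames hash to distance 0 and would be judged duplicates; the perturbed frame produces a nonzero distance.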
According to the image information, the image scene corresponding to the video image is analyzed, so that the specific application scene corresponding to the video image can be determined, and the subsequent image main body can be conveniently distinguished, wherein the image scene is the specific application scene corresponding to the video image, such as international video or live broadcast of a sports event.
As one embodiment of the present invention, the analyzing, according to the image information, an image scene corresponding to the video image includes: extracting information description characters corresponding to the image information, carrying out semantic analysis on the information description characters to obtain character paraphrasing, determining information scenes corresponding to the image information according to the character paraphrasing, counting scene frequencies corresponding to the information scenes, and determining target scenes of the image information according to the scene frequencies.
The information description characters are description texts corresponding to the image information, the character paraphrasing is the meaning corresponding to the information description characters, the information scenes are scenes corresponding to each piece of information in the image information, and the scene frequency represents the scene occurrence frequency corresponding to the information scenes.
Optionally, extracting the information description characters corresponding to the image information may be implemented by OCR text recognition technology; performing semantic analysis on the information description characters may be implemented by a semantic analysis method; the information scene corresponding to the image information is determined according to the paraphrased content of the characters; counting the scene frequency corresponding to the information scenes may be implemented by a counting method; and the information scene with the largest number of occurrences is taken as the target scene of the image information.
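The frequency-counting step above can be sketched with a few lines of Python; the scene labels are hypothetical examples, not taken from the patent text.

```python
from collections import Counter

def target_scene(info_scenes):
    """Pick the most frequent information-scene label as the target scene."""
    freq = Counter(info_scenes)          # scene frequency per information scene
    scene, _count = freq.most_common(1)[0]
    return scene
```

For example, `target_scene(["sports event", "sports event", "news"])` selects `"sports event"` because it occurs most often.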
S3, carrying out main body recognition on the target video image to obtain an image main body, extracting main body characteristics corresponding to the image main body, analyzing association relations between the image scene and the main body characteristics, and determining a key main body and a non-key main body in the image main body by combining the association relations.
According to the method, the relevant characterization of the image main body can be obtained by extracting the main body characteristics corresponding to the image main body, further improving the accuracy of the subsequent association relation analysis. The image main body is the most important target or object in the target video image, and the main body characteristics are the characteristics corresponding to the image main body. Optionally, main body identification of the target video image can be realized through a target detection algorithm, such as the Faster R-CNN algorithm.
As one embodiment of the present invention, the extracting the subject feature corresponding to the image subject includes: identifying a main texture corresponding to the image main body, calculating a pixel gray value corresponding to each pixel point in the main texture, constructing a texture matrix corresponding to the main texture according to the pixel gray value, calculating a matrix mean value corresponding to the texture matrix, extracting texture features corresponding to the main texture according to a preset threshold value and the matrix mean value, extracting color features corresponding to the image main body, and taking the texture features and the color features as main features of the image main body.
The main texture is texture corresponding to the image main body, the pixel gray value is brightness value corresponding to the image expressed by the main texture only through a single tone, the texture matrix is a square matrix constructed by the pixel gray value, the preset threshold is a standard value of matrix mean value judgment, the texture feature is texture representation corresponding to the main texture, and the color feature is color representation corresponding to the image main body.
Optionally, identifying the main texture corresponding to the image main body may be implemented by the LBP algorithm; calculating the pixel gray value corresponding to each pixel point in the main texture may be implemented by the averaging method, which averages the pixel values of the red, green and blue channels of each pixel to obtain the pixel gray value; constructing the texture matrix corresponding to the main texture may be implemented by a matrix function, for example a zero matrix function; calculating the matrix mean value corresponding to the texture matrix may be implemented by an averaging function; extracting the texture features corresponding to the main texture may be implemented by the gray-level co-occurrence matrix (GLCM) algorithm; and the texture features and the color features are taken as the main body features of the image main body.
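The graying and co-occurrence steps can be illustrated with a minimal numpy sketch, assuming channel-averaged gray values and a tiny horizontal-neighbor GLCM; the quantization level count and all names are assumptions.

```python
import numpy as np

def to_gray(rgb):
    """Average the red, green and blue channels of each pixel,
    per the averaging method described above."""
    return rgb.mean(axis=2)

def glcm(gray, levels=4):
    """Tiny gray-level co-occurrence matrix counting horizontal
    neighbor pairs; `gray` holds values in [0, 1] and `levels`
    is an assumed quantization depth."""
    q = np.minimum((gray * levels).astype(int), levels - 1)
    m = np.zeros((levels, levels), dtype=int)
    for a, b in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):
        m[a, b] += 1
    return m
```

Texture statistics (contrast, homogeneity, etc.) would then be computed from the co-occurrence matrix `m`.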
Optionally, as an optional embodiment of the present invention, the extracting a color feature corresponding to the image main body includes:
Extracting the corresponding color characteristics of the image main body through the following formula:
Wherein Q represents the color feature corresponding to the image main body, LN represents a normalization algorithm, W_e represents the pixel mean value corresponding to the e-th main body in the image main body, e represents the serial number of a main body in the image main body, ω represents the number of main bodies in the image main body, R_ei represents the pixel value corresponding to the i-th pixel point in the e-th main body, i represents the serial number of a pixel point of the image main body, V_e represents the pixel variance corresponding to the e-th main body in the image main body, and U_e represents the pixel skewness coefficient corresponding to the e-th main body in the image main body.
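The formula image itself is not reproduced here, but the symbols describe the standard first three color moments per subject (mean W_e, variance V_e, skewness U_e) combined under a normalization LN. A hedged Python sketch of that reading follows; the L2 normalization and the way the moments are combined are assumptions.

```python
import numpy as np

def color_moments(pixels):
    """Mean, variance and (cube-root) skewness of one subject's pixels."""
    mean = pixels.mean()
    var = pixels.var()
    skew = np.cbrt(((pixels - mean) ** 3).mean())
    return np.array([mean, var, skew])

def color_feature(subjects):
    """Concatenate per-subject color moments and L2-normalize the
    result (LN is only described as 'a normalization algorithm';
    L2 is an assumed choice)."""
    feats = np.concatenate([color_moments(p) for p in subjects])
    norm = np.linalg.norm(feats)
    return feats / norm if norm > 0 else feats
```

For a subject whose pixels are all equal, the variance and skewness terms vanish and only the mean contributes to the feature.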
According to the method, the degree of association between the image scene and the main body characteristic is known through the association relationship, so that the key main body and the non-key main body in the image main body can be conveniently determined later, providing a basis for subsequent image segmentation. The association relationship is the degree of association between the image scene and the main body characteristic. The key main body is the main body in the image main body with the highest degree of association with the image scene, such as an athlete in a sports event; the non-key main body is the main body with the lowest degree of association with the image scene, such as a spectator in the stands. Optionally, the key main body and the non-key main body in the image main body can be determined according to the association relationship.
As one embodiment of the present invention, the analyzing the association relationship between the image scene and the subject feature includes: analyzing feature elements in the main body features, analyzing visual elements corresponding to the image scene, calculating association coefficients between the visual elements and the feature elements, respectively carrying out vectorization processing on the feature elements and the visual elements to obtain feature element vectors and visual element vectors, calculating vector similarity between the feature element vectors and the visual element vectors, and analyzing association relations between the image scene and the main body features by combining the vector similarity and the association coefficients.
Wherein the feature element is an element, such as an object or a color, where the main feature exists, the visual element is an element, such as a person, where the visual element exists in the image scene, the association coefficient represents a degree of association between the visual element and the feature element, the feature element vector and the visual element vector are expression vectors corresponding to the feature element and the visual element, respectively, and the vector similarity represents a degree of similarity between the feature element vector and the visual element vector.
Optionally, analyzing the feature elements in the main body features may be implemented by a corner detection algorithm, for example the Harris corner detection algorithm; the visual elements are analyzed by the same method as the feature elements and are not described in detail here. Calculating the association coefficients between the visual elements and the feature elements may be implemented by the Pearson correlation coefficient algorithm, vectorizing the feature elements and the visual elements respectively may be implemented by the word2vec algorithm, and the association relationship between the image scene and the main body features is analyzed by combining the vector similarity and the association coefficients.
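The Pearson correlation step mentioned above can be sketched in Python; the numeric series passed in are hypothetical encodings of the visual and feature elements.

```python
import numpy as np

def association_coefficient(visual, feature):
    """Pearson correlation coefficient between a visual-element series
    and a feature-element series of equal length."""
    v = np.asarray(visual, dtype=float)
    f = np.asarray(feature, dtype=float)
    return float(np.corrcoef(v, f)[0, 1])
```

A coefficient near +1 indicates strongly associated elements, near -1 an inverse association, and near 0 no linear association.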
Optionally, as an optional embodiment of the present invention, the calculating a vector similarity between the feature element vector and the visual element vector includes:
calculating the vector similarity between the feature element vector and the visual element vector by the following formula:
Wherein N represents the vector similarity between the feature element vector and the visual element vector, j and j+1 represent the serial numbers of the feature element vector and the visual element vector respectively, μ represents the total number of vectors in the feature element vector, M_j represents the vector value of the j-th vector in the feature element vector, and G_{j+1} represents the vector value of the (j+1)-th vector in the visual element vector.
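The similarity formula itself is not reproduced above; a common choice consistent with the described inputs is cosine similarity, sketched here as a stand-in (the use of cosine similarity is an assumption, not a statement of the patent's exact formula).

```python
import numpy as np

def vector_similarity(m, g):
    """Cosine similarity between a feature element vector `m`
    and a visual element vector `g`."""
    denom = np.linalg.norm(m) * np.linalg.norm(g)
    return float(np.dot(m, g) / denom) if denom > 0 else 0.0
```

Parallel vectors score 1.0, orthogonal vectors 0.0, so higher scores indicate a closer match between feature and visual elements.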
And S4, performing image segmentation on the target video image according to the key main body and the non-key main body to obtain a key main body image and a non-key main body image, and performing coding processing on the key main body image to obtain a coded image.
According to the invention, the target video image is subjected to image segmentation according to the key main body and the non-key main body, so that the target video image can be effectively separated and subsequent mixed coding of the target video image is facilitated, thereby improving the coding efficiency of the image. The key main body image is the image corresponding to the key main body in the target video image, the non-key main body image is the image corresponding to the non-key main body in the target video image, and the coded image is the image obtained after the key main body image is subjected to coding processing. Optionally, image segmentation of the target video image can be realized through a threshold segmentation method, and the coding processing of the key main body image comprises the following steps: preprocessing the key main body image, including operations such as cropping, size adjustment and color space conversion, to obtain a target main body image; performing block processing on the target main body image, dividing it into image blocks of equal or unequal sizes; performing pixel prediction on the image blocks, predicting the pixels in each image block according to statistical information of encoded pixel values or surrounding pixel values, for example with a linear model method, to obtain predicted pixel values; comparing the predicted pixel values with the actual pixel values to obtain prediction errors; encoding and compressing the prediction errors by a lossless encoding algorithm, such as Huffman encoding, to obtain compressed pixel values; and performing reconstruction processing on the compressed pixel values to obtain the coded image.
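The prediction-plus-lossless-coding pipeline described in this step can be sketched as follows. zlib (DEFLATE, which uses Huffman coding internally) stands in for the Huffman-based lossless coder, block splitting is omitted for brevity, and all names are illustrative, not the patent's implementation.

```python
import zlib
import numpy as np

def predictive_encode(img):
    """Predict each pixel from its left neighbor and losslessly
    compress the prediction errors."""
    residual = img.astype(np.int16).copy()
    residual[:, 1:] -= img[:, :-1].astype(np.int16)  # prediction error
    return zlib.compress(residual.tobytes())

def predictive_decode(data, shape):
    """Decompress the residuals and invert the left-neighbor
    prediction with a running sum along each row."""
    residual = np.frombuffer(zlib.decompress(data), dtype=np.int16).reshape(shape)
    return np.cumsum(residual, axis=1).astype(np.uint8)
```

Because the coding of the residuals is lossless, decoding reproduces the original pixel values exactly, while smooth image regions yield small residuals that compress well.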
S5, inputting the non-key main body image into a trained AI large model, extracting visual characteristics corresponding to the non-key main body image by using a convolutional neural network in the AI large model, constructing a virtual image corresponding to the non-key main body image by using a self-encoder in the AI large model according to the visual characteristics, and generating a final encoded image corresponding to the target video image according to the encoded image and the virtual image.
The invention can obtain the visual characteristic attributes corresponding to the non-key main body image by utilizing the convolutional neural network in the AI large model, so as to facilitate the subsequent construction of the virtual image corresponding to the non-key main body image. The AI large model is used for modeling the non-key objects in the image; the convolutional neural network is used for extracting the characteristics of the image; the virtual model is used for replacing part of the physical image, which can improve the coding efficiency of the high-definition video stream; the self-encoder is used for modeling the image object, so that the coding of the physical image is realized by the modeling model; and the virtual image is the image obtained by replacing the coding of the physical image in the non-key main body image with the model. Optionally, the visual characteristics corresponding to the non-key main body image can be extracted by the convolution kernels in the convolutional neural network, the virtual image corresponding to the non-key main body image can be constructed by the encoder and decoder in the self-encoder, and the final coded image corresponding to the target video image can be generated by a generator, which can be implemented, for example, in the Java language.
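The data flow of convolutional feature extraction followed by autoencoder reconstruction can be illustrated with a minimal numpy sketch. A real AI large model would use a trained deep network; the untrained linear encoder/decoder below only illustrates the encode-then-reconstruct structure, and every name and dimension is an assumption.

```python
import numpy as np

def conv_features(img, kernel):
    """Valid 2-D convolution, standing in for the CNN feature extractor
    (one convolution kernel, no padding or stride)."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

class TinyAutoencoder:
    """Untrained linear encoder/decoder pair illustrating the
    self-encoder's compress-then-reconstruct data flow."""
    def __init__(self, in_dim, code_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.enc = rng.normal(scale=0.1, size=(code_dim, in_dim))
        self.dec = rng.normal(scale=0.1, size=(in_dim, code_dim))

    def reconstruct(self, x):
        # Encode to a low-dimensional code, then decode back.
        return self.dec @ (self.enc @ x)
```

The extracted feature map is flattened and passed through the encoder to a compact code; the decoder then produces the reconstruction that would serve as the virtual image.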
According to the invention, the high-definition video stream is subjected to video framing processing, which reduces the difficulty of video image identification: compared with multi-frame video, the video image corresponding to a single-frame video is identified more efficiently. By carrying out signal analysis on the optimized video image, the method obtains the image signal of the optimized video image and calculates the signal amplitude value corresponding to the image signal, thereby facilitating the subsequent removal of repeated images in the optimized video image and avoiding subsequent calculation processing of repeated images. By extracting the main body characteristics corresponding to the image main body, the relevant representation of the image main body is obtained, further improving the accuracy of the subsequent association relation analysis and of the image segmentation of the target video image according to the key main body and the non-key main body. Therefore, the target recognition high-precision high-resolution video coding method of the AI large model provided by the embodiment of the invention can improve the coding efficiency of high-precision high-resolution video.
Fig. 2 is a functional block diagram of an AI large model object recognition high-precision high-resolution video coding system according to an embodiment of the present invention.
The high-precision high-resolution video coding system 100 for target recognition of the AI large model can be installed in an electronic device. Depending on the functions implemented, the system 100 may include an image optimization module 101, a scene analysis module 102, a main body analysis module 103, an image coding module 104, and an image construction module 105. A module of the invention, which may also be referred to as a unit, refers to a series of computer program segments that are stored in the memory of the electronic device, can be executed by the processor of the electronic device, and perform a fixed function.
In the present embodiment, the functions concerning the respective modules/units are as follows:
The image optimization module 101 is configured to obtain a high-definition video stream to be encoded, perform video framing processing on the high-definition video stream to obtain a framed video, identify a video image in the framed video, and perform optimization processing on the video image to obtain an optimized video image;
The scene analysis module 102 is configured to perform signal analysis on the optimized video image to obtain an image signal, calculate a signal amplitude value corresponding to the image signal, perform de-duplication processing on the optimized video image according to the signal amplitude value to obtain a target video image, mine image information corresponding to each image in the target video image, and analyze an image scene corresponding to the video image according to the image information;
The main body analysis module 103 is configured to perform main body recognition on the target video image to obtain an image main body, extract main body features corresponding to the image main body, analyze an association relationship between the image scene and the main body features, and determine a key main body and a non-key main body in the image main body in combination with the association relationship;
The image encoding module 104 is configured to perform image segmentation on the target video image according to the key main body and the non-key main body to obtain a key main body image and a non-key main body image, and perform encoding processing on the key main body image to obtain an encoded image;
The image construction module 105 is configured to input the non-key main body image into a trained AI large model, extract visual features corresponding to the non-key main body image by using a convolutional neural network in the AI large model, construct a virtual image corresponding to the non-key main body image by using a self-encoder in the AI large model according to the visual features, and generate a final encoded image corresponding to the target video image according to the encoded image and the virtual image.
In detail, each module in the AI large model target recognition high-precision high-resolution video coding system 100 in the embodiment of the present application adopts the same technical means as the AI large model target recognition high-precision high-resolution video coding method in fig. 1, and can produce the same technical effects, which are not described herein.
In several embodiments provided by the present invention, it should be understood that the methods and systems provided may be implemented in other ways. For example, the above-described method embodiments are merely illustrative, and for example, the division of the modules is merely a logical function division, and other manners of division may be implemented in practice.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.
The embodiment of the application can acquire and process the related data based on artificial intelligence technology. Artificial intelligence (AI) is a theory, method, technique, and application system that utilizes a digital computer or a digital computer-controlled machine to simulate, extend, and expand human intelligence, sense the environment, acquire knowledge, and use knowledge to obtain optimal results.
Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. Multiple units or systems as set forth in the system claims may also be implemented by means of one unit or system in software or hardware. The terms first, second, etc. are used to denote a name, but not any particular order.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.

Claims (10)

1. A method for high-precision high-resolution video coding of object recognition of AI large models, the method comprising:
Obtaining a high-definition video stream to be encoded, carrying out video framing treatment on the high-definition video stream to obtain a framing video, identifying video images in the framing video, and carrying out optimization treatment on the video images to obtain optimized video images;
Performing signal analysis on the optimized video image to obtain an image signal, calculating a signal amplitude value corresponding to the image signal, performing de-duplication processing on the optimized video image according to the signal amplitude value to obtain a target video image, mining image information corresponding to each image in the target video image, and analyzing an image scene corresponding to the video image according to the image information;
Performing main body recognition on the target video image to obtain an image main body, extracting main body characteristics corresponding to the image main body, analyzing association relations between the image scene and the main body characteristics, and determining a key main body and a non-key main body in the image main body by combining the association relations;
according to the key main body and the non-key main body, performing image segmentation on the target video image to obtain a key main body image and a non-key main body image, and performing coding processing on the key main body image to obtain a coded image;
Inputting the non-key main body image into a trained AI large model, extracting visual characteristics corresponding to the non-key main body image by utilizing a convolutional neural network in the AI large model, constructing a virtual image corresponding to the non-key main body image by utilizing a self-encoder in the AI large model according to the visual characteristics, and generating a final coded image corresponding to the target video image according to the coded image and the virtual image.
2. The method for high-precision high-resolution video coding for object recognition of AI large model according to claim 1, wherein said optimizing said video image to obtain an optimized video image comprises:
Carrying out noise reduction treatment on the video image to obtain a noise-reduced video image;
performing distortion restoration processing on the denoising video image to obtain a restoration video image;
performing distortion correction processing on the repair video image to obtain a corrected video image;
calculating a pixel intensity value corresponding to a pixel point of the corrected video image, and carrying out equalization processing on the pixel intensity value to obtain an equalized pixel;
And carrying out pixel optimization processing on the corrected video image according to the balanced pixels to obtain an optimized video image.
3. The method for high-precision high-resolution video coding for object recognition of AI large model according to claim 2, wherein said denoising the video image to obtain a denoised video image comprises:
noise reduction processing is carried out on the video image through the following formula:
Wherein A represents the noise-reduced video image, a and b represent the length and width of the convolution window respectively, B_{a,b} represents the convolution window used for noise reduction of pixel points in the video image, and D(x, y) represents the video image.
4. The method for high-precision high-resolution video encoding for object recognition of AI large models according to claim 1, wherein said calculating signal amplitude values corresponding to said image signals comprises:
calculating a signal amplitude value corresponding to the image signal by the following formula:
Wherein E represents the signal amplitude value corresponding to the image signal, F represents the length value corresponding to the image signal, β_d represents the time domain energy corresponding to the d-th signal sampling point in the image signal, d represents the serial number of a signal sampling point of the image signal, t represents the number of signal sampling points of the image signal, f(d) represents the frequency domain energy corresponding to the d-th signal sampling point of the image signal, and α represents the frequency domain conversion coefficient corresponding to the image signal.
5. The method for high-precision and high-resolution video encoding for object recognition of AI large models according to claim 1, wherein said analyzing image scenes corresponding to said video image based on said image information comprises:
Extracting information description characters corresponding to the image information;
carrying out semantic analysis on the information description character to obtain a character paraphrasing;
determining an information scene corresponding to the image information according to the character definition;
and counting the scene frequency corresponding to the information scene, and determining the target scene of the image information according to the scene frequency.
6. The method for high-precision and high-resolution video encoding for object recognition of AI large model of claim 1, wherein said extracting the corresponding subject features of said image subject comprises:
identifying a main texture corresponding to the image main body, and calculating a pixel gray value corresponding to each pixel point in the main texture;
Constructing a texture matrix corresponding to the main texture according to the pixel gray value, and calculating a matrix mean value corresponding to the texture matrix;
Extracting texture features corresponding to the main body textures according to a preset threshold value and the matrix mean value;
and extracting color features corresponding to the image main body, and taking the texture features and the color features as main body features of the image main body.
7. The method for high-precision and high-resolution video encoding of AI large model of claim 6, wherein said extracting color features corresponding to said image subject comprises:
Extracting the corresponding color characteristics of the image main body through the following formula:
Wherein Q represents the color feature corresponding to the image main body, LN represents a normalization algorithm, W_e represents the pixel mean value corresponding to the e-th main body in the image main body, e represents the serial number of a main body in the image main body, ω represents the number of main bodies in the image main body, R_ei represents the pixel value corresponding to the i-th pixel point in the e-th main body, i represents the serial number of a pixel point of the image main body, V_e represents the pixel variance corresponding to the e-th main body in the image main body, and U_e represents the pixel skewness coefficient corresponding to the e-th main body in the image main body.
8. The method for high-precision high-resolution video encoding for object recognition of AI large models according to claim 1, wherein said analyzing the association between said image scene and said subject feature comprises:
analyzing feature elements in the main body features and analyzing visual elements corresponding to the image scene;
calculating a correlation coefficient between the visual element and the characteristic element;
Vectorizing the characteristic elements and the visual elements respectively to obtain characteristic element vectors and visual element vectors;
calculating the vector similarity between the characteristic element vector and the visual element vector;
and analyzing the association relation between the image scene and the main body characteristic by combining the vector similarity and the association coefficient.
9. The method for high-precision high-resolution video coding for object recognition of AI large models according to claim 8, wherein said calculating vector similarity between the feature element vector and the visual element vector comprises:
calculating the vector similarity between the feature element vector and the visual element vector by the following formula:
Wherein N represents the vector similarity between the feature element vector and the visual element vector, j and j+1 represent the serial numbers of the feature element vector and the visual element vector respectively, μ represents the total number of vectors in the feature element vector, M_j represents the vector value of the j-th vector in the feature element vector, and G_{j+1} represents the vector value of the (j+1)-th vector in the visual element vector.
10. A high-precision high-resolution video coding system for object recognition of AI large models, the system being based on the method of any of claims 1-9, the system comprising:
The image optimization module is used for obtaining a high-definition video stream to be encoded, carrying out video framing processing on the high-definition video stream to obtain a framing video, identifying video images in the framing video, and carrying out optimization processing on the video images to obtain optimized video images;
The scene analysis module is used for carrying out signal analysis on the optimized video image to obtain an image signal, calculating a signal amplitude value corresponding to the image signal, carrying out de-duplication processing on the optimized video image according to the signal amplitude value to obtain a target video image, mining image information corresponding to each image in the target video image, and analyzing an image scene corresponding to the video image according to the image information;
The main body analysis module is used for carrying out main body identification on the target video image to obtain an image main body, extracting main body characteristics corresponding to the image main body, analyzing the association relation between the image scene and the main body characteristics, and determining a key main body and a non-key main body in the image main body by combining the association relation;
the image coding module is used for carrying out image segmentation on the target video image according to the key main body and the non-key main body to obtain a key main body image and a non-key main body image, and carrying out coding processing on the key main body image to obtain a coded image;
The image construction module is used for inputting the non-key main body image into a trained AI large model, extracting visual characteristics corresponding to the non-key main body image by utilizing a convolutional neural network in the AI large model, constructing a virtual image corresponding to the non-key main body image by utilizing a self-encoder in the AI large model according to the visual characteristics, and generating a final coded image corresponding to the target video image according to the coded image and the virtual image.
CN202311716249.8A 2023-12-14 2023-12-14 Target identification high-precision high-resolution video coding method and system for AI large model Pending CN117915096A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311716249.8A CN117915096A (en) 2023-12-14 2023-12-14 Target identification high-precision high-resolution video coding method and system for AI large model


Publications (1)

Publication Number Publication Date
CN117915096A true CN117915096A (en) 2024-04-19

Family

ID=90691429

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311716249.8A Pending CN117915096A (en) 2023-12-14 2023-12-14 Target identification high-precision high-resolution video coding method and system for AI large model

Country Status (1)

Country Link
CN (1) CN117915096A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110310343A (en) * 2019-05-28 2019-10-08 西安万像电子科技有限公司 Image processing method and device
CN111031032A (en) * 2019-12-12 2020-04-17 深圳市万佳安物联科技股份有限公司 Cloud video transcoding method and device, decoding method and device, and electronic device
WO2021164176A1 (en) * 2020-02-20 2021-08-26 北京大学 End-to-end video compression method and system based on deep learning, and storage medium
CN114554220A (en) * 2022-01-13 2022-05-27 北京信息科技大学 Method for over-limit compression and decoding of fixed scene video based on abstract features
CN115690615A (en) * 2022-10-11 2023-02-03 杭州视图智航科技有限公司 Deep learning target identification method and system for video stream

Similar Documents

Publication Publication Date Title
US9282330B1 (en) Method and apparatus for data compression using content-based features
Zhang et al. Low-rank decomposition-based restoration of compressed images via adaptive noise estimation
Hadizadeh et al. Video error concealment using a computation-efficient low saliency prior
CN110944200B (en) Method for evaluating immersive video transcoding scheme
CN114363623A (en) Image processing method, image processing apparatus, image processing medium, and electronic device
CN111626178B (en) Compressed domain video motion recognition method and system based on new spatio-temporal feature stream
CN113379858A (en) Image compression method and device based on deep learning
CN115396669A (en) Video compression method and device based on interest area enhancement
CN111754430A (en) Color image denoising method based on pure quaternion dictionary learning
Zou et al. A nonlocal low-rank regularization method for fractal image coding
Katakol et al. Distributed learning and inference with compressed images
Kekre et al. Image Reconstruction using Fast Inverse Half tone and Huffman Coding Technique
CN116524387A (en) Ultra-high definition video compression damage grade assessment method based on deep learning network
CN117915096A (en) Target identification high-precision high-resolution video coding method and system for AI large model
US11928855B2 (en) Method, device, and computer program product for video processing
Farah et al. Full-reference and reduced-reference quality metrics based on SIFT
CN110933402B (en) No-reference stereo video quality evaluation method based on motion texture features
Xie et al. Just noticeable visual redundancy forecasting: a deep multimodal-driven approach
US20230342986A1 (en) Autoencoder-based segmentation mask generation in an alpha channel
CN114422795A (en) Face video coding method, decoding method and device
CN114549302A (en) Image super-resolution reconstruction method and system
Zhang et al. Reduced-reference image quality assessment based on entropy differences in DCT domain
WO2015128302A1 (en) Method and apparatus for filtering and analyzing a noise in an image
CN115510271B (en) Content-oriented animation video non-reference quality evaluation method
Wan et al. Omnidirectional Image Quality Assessment With a Superpixel-Based Sparse Model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination