CN111953973B - General video compression coding method supporting machine intelligence - Google Patents


Info

Publication number
CN111953973B
Authority
CN
China
Legal status
Active
Application number
CN202010895946.4A
Other languages
Chinese (zh)
Other versions
CN111953973A (en)
Inventor
陈志波
金鑫
孙思萌
冯若愚
冯润森
Current Assignee
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date
2020-08-31
Filing date
2020-08-31
Publication date
2022-10-28
Application filed by University of Science and Technology of China USTC
Priority to CN202010895946.4A
Publication of CN111953973A
Application granted
Publication of CN111953973B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/20 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 Incoming video signal characteristics or properties
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a general video compression coding method supporting machine intelligence. Because compression targets machine intelligent analysis tasks rather than human viewing, the method achieves a higher compression ratio than human-vision-oriented compression for the same machine analysis task, which reduces the information to be transmitted and lightens the transmission load. The compressed features can be applied directly to machine intelligent analysis tasks without additional decoding and processing, which reduces computation, accelerates the analysis, and supports edge computing. In addition, part of the analysis of the original video/image is performed before compression, which both improves the precision of intelligent analysis and produces a structured compressed code stream that can support further intelligent analysis tasks. In summary, the method makes video/image compression for machines more general, flexible, and efficient.

Description

General video compression coding method supporting machine intelligence
Technical Field
The invention relates to the technical field of video/image compression coding, in particular to a general video compression coding method supporting machine intelligence.
Background
Existing video/image compression standards are designed mainly for human vision: their goal is to keep the bit rate as low as possible for a given level of distortion perceived by the human eye. As machine learning algorithms mature, machine intelligent analysis tasks are increasingly applied across social life and production, for example in smart factories, smart cities, and intelligent transportation. These applications typically involve analyzing large amounts of video/image data. In the conventional approach, the video/image is compressed with an existing standard, the compressed code stream is decoded back into a video/image before analysis, and the analysis is then performed on the restored content. This approach has the following problems: 1) Because conventional standards target human vision, the compressed code stream may spend a large share of its bit rate on content that is unnecessary for analysis, placing a heavy burden on transmission. 2) Decoding and restoring the video/image before analysis introduces latency, which degrades the user experience. 3) Because the restored video/image contains compression distortion, the analysis may be inaccurate or fail altogether.
With the development of edge computing and on-device intelligence, more and more machine intelligent analysis processes videos/images on edge servers or terminal devices. If a machine-oriented encoding method could be realized whose code stream contains only content useful for machine analysis, the amount of data that must be transmitted for such tasks could be greatly reduced. At the same time, the code stream could be used directly in machine analysis tasks without first restoring the compressed video/image, which reduces computation latency and improves processing efficiency. Moreover, performing part of the machine intelligent analysis before encoding gives the code stream a structured form, which facilitates subsequent intelligent analysis tasks.
In the prior art, the Compact Descriptors for Visual Search (CDVS) international standard encodes the video/image features required by retrieval tasks and meets the above requirements to some extent. However, its feature code stream can only be used for search, so its application scenario is narrow and it cannot meet the compression requirements of more general intelligent applications. A general video compression coding method supporting machine intelligence is therefore highly desirable.
Disclosure of Invention
The invention aims to provide a general video compression coding method supporting machine intelligence that encodes the video/image feature information required by each task, thereby improving the accuracy of intelligent analysis tasks and reducing data transmission pressure.
The purpose of the invention is realized by the following technical scheme:
a general video compression coding method supporting machine intelligence, comprising intra-frame coding and inter-frame coding, wherein:
the intra-frame coding part includes: for an input video frame, first performing object detection to obtain the spatial position information and category information of each object; performing attribute analysis and relationship inference based on the spatial position and category information of each object to obtain the attribute information of each object and the topological relationship between the objects; then, using the spatial position and category information of each object as guidance information, dividing the input video frame into coding units according to the spatial positions of the objects and encoding the divided coding units, wherein the object category information contained in the resulting code stream is used in the video frame reconstruction process of the inter-frame coding part;
the inter-frame coding part includes: reconstructing video frames with either the whole input video frame or an individual object as the unit, and obtaining optical flow prediction information and residual coding information through motion compensation;
and the spatial position and category information of each object, the attribute information of each object, and the topological relationship between the objects obtained by the intra-frame coding part, together with the coded coding units and the optical flow prediction information and residual coding information obtained by the inter-frame coding part, are entropy coded to obtain the corresponding code stream.
The technical scheme provided by the invention has the following advantages: 1) it can support a variety of existing and possibly future tasks, giving it a wide range of applications and strong practical value; 2) because compression targets machine intelligent analysis tasks, it achieves a higher compression ratio than human-vision-oriented compression for the same task, reducing the information to be transmitted and the transmission burden; 3) the compressed features can be applied directly to machine intelligent analysis tasks without extra decoding and processing, which reduces computation, accelerates the analysis, and supports edge computing; 4) the general coding framework supports partial analysis of the original video/image before compression, which improves analysis precision and produces a structured compressed code stream that supports further intelligent analysis tasks. In summary, the scheme makes video/image compression for machines more general, flexible, and efficient.
Drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a block diagram of a general video compression coding method supporting machine intelligence according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an encoding process according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the code stream structure of the intra-frame coding part according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments derived by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.
An embodiment of the invention provides a general video compression coding method supporting machine intelligence. Unlike the coding scheme of a traditional video coding framework, it performs compression with a coding framework based on deep learning. Coding units can be divided in the pixel domain, and division in the latent (hidden-variable) domain is also supported. As shown in FIG. 1, the method mainly comprises intra-frame coding and inter-frame coding.
1. Intra-frame coding part.
As shown in fig. 2, the intra-frame coding portion includes an object detection module, an encoder, a spatial relationship inference module, a semantic relationship inference module, and an attribute analysis module.
The main process is as follows. For an input video frame x_t, object detection is first performed to obtain the spatial position information and category information of each object. Then, together with the video frame x_t, the spatial position and category information of each object is further mined through attribute analysis and relationship inference, yielding the attribute information of each object and the topological relationship between the objects. For a pedestrian, for example, the attribute information includes features of each body part, such as head features, upper-body and lower-body features, and accessory features. Finally, using the spatial position and category information of each object as guidance information, the input video frame is divided into coding units, and the divided coding units are encoded.
In the embodiment of the present invention, the processing units are the objects in the video and the background outside them. As shown in FIG. 2, an object may be a rectangular box containing one or more objects, or a closed boundary of arbitrary shape containing one or more objects.
In the embodiment of the present invention, relationship inference includes spatial relationship inference and semantic relationship inference. Spatial relationship inference uses the spatial position information of each object to obtain the spatial relationships between the objects; semantic relationship inference uses the category information of the objects to obtain the semantic relationships between the objects. Together, the spatial and semantic relationships form the topological relationship.
In the embodiment of the invention, the spatial position information and category information of each object serve as guidance information. Dividing the input video frame into coding units using the spatial position information of the objects, and encoding the divided coding units, comprises: mapping each object into the latent (hidden-variable) space to be encoded according to its spatial position information; semantically partitioning the latent variables (which take the form of coding units) according to the mapped spatial positions to obtain the latent variables to be encoded for each semantic region; and encoding the partitioned latent variables in top-to-bottom, left-to-right order. The resulting code stream also contains the category information of each object, which serves as the object label information (for example, pedestrian-1, vehicle-2, pedestrian-3) that the decoder needs when reconstructing video frames in the inter-frame coding part.
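The description names body-part features (head, upper body, lower body) as pedestrian attributes but does not say how the parts are located or what the features are. The sketch below assumes a fixed vertical split of the pedestrian box, a simple heuristic, and uses mean grey level as a stand-in for a learned part feature. The ratios in `PART_RATIOS`, the `Box` type, and all function names are hypothetical illustrations, not the patented attribute analysis module.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixel coordinates; x1 and y1 are exclusive."""
    x0: int
    y0: int
    x1: int
    y1: int


# Hypothetical vertical proportions of a pedestrian box (assumption, not from the patent).
PART_RATIOS = {
    "head": (0.0, 0.2),
    "upper_body": (0.2, 0.55),
    "lower_body": (0.55, 1.0),
}


def split_pedestrian(box: Box) -> dict:
    """Split a pedestrian box into horizontal body-part boxes."""
    height = box.y1 - box.y0
    return {
        name: Box(box.x0, box.y0 + round(top * height), box.x1, box.y0 + round(bottom * height))
        for name, (top, bottom) in PART_RATIOS.items()
    }


def mean_intensity(image, box: Box) -> float:
    """Placeholder part feature: mean grey level inside the box (a learned feature would replace it)."""
    values = [image[y][x] for y in range(box.y0, box.y1) for x in range(box.x0, box.x1)]
    return sum(values) / len(values) if values else 0.0


def pedestrian_attributes(image, box: Box) -> dict:
    """Attribute record for one pedestrian: one feature value per body part."""
    return {name: mean_intensity(image, part) for name, part in split_pedestrian(box).items()}
```

In practice each part region would be cropped from the frame (or its feature map) and passed to a trained attribute network; the fixed ratios only keep the example self-contained.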
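The text allows an object to be either a rectangle or an arbitrarily shaped closed boundary, with the background defined as everything outside the objects. A minimal pure-Python sketch of that region bookkeeping follows. The polygon test uses the standard even-odd (ray casting) rule evaluated at pixel centers; the function names and the (x0, y0, x1, y1) box convention are illustrative assumptions.

```python
def rect_mask(height, width, box):
    """Boolean mask of an axis-aligned box (x0, y0, x1, y1); x1 and y1 are exclusive."""
    x0, y0, x1, y1 = box
    return [[x0 <= x < x1 and y0 <= y < y1 for x in range(width)] for y in range(height)]


def polygon_mask(height, width, polygon):
    """Boolean mask of a closed polygon [(x, y), ...] using the even-odd rule at pixel centers."""

    def inside(px, py):
        result = False
        n = len(polygon)
        for i in range(n):
            xa, ya = polygon[i]
            xb, yb = polygon[(i + 1) % n]
            if (ya > py) != (yb > py):  # edge straddles the horizontal ray through py
                x_cross = xa + (py - ya) * (xb - xa) / (yb - ya)
                if px < x_cross:  # crossing lies to the right of the point
                    result = not result
        return result

    return [[inside(x + 0.5, y + 0.5) for x in range(width)] for y in range(height)]


def background_mask(height, width, object_masks):
    """Background: every pixel not covered by any object mask."""
    return [
        [not any(mask[y][x] for mask in object_masks) for x in range(width)]
        for y in range(height)
    ]


def pixel_count(mask):
    """Number of True pixels in a mask."""
    return sum(sum(row) for row in mask)
```

For example, a 2x2 rectangle and a 2x2 polygon in opposite corners of a 4x4 frame leave 8 background pixels.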
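The patent does not define a relation vocabulary. The sketch below assumes a small set of geometric relations (overlap decided by intersection-over-union, otherwise the dominant direction between box centers) and a hypothetical lookup table for semantic relations between categories. Each unordered object pair becomes one edge of the topology graph carrying both labels; `SEMANTIC_TABLE` and every relation name are illustrative assumptions.

```python
from itertools import combinations

# Hypothetical semantic relations keyed by unordered category pairs (assumption).
SEMANTIC_TABLE = {frozenset(("pedestrian", "bicycle")): "may_ride"}


def box_area(box):
    x0, y0, x1, y1 = box
    return max(0, x1 - x0) * max(0, y1 - y0)


def iou(a, b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = box_area(a) + box_area(b) - inter
    return inter / union if union else 0.0


def spatial_relation(a, b):
    """Relation of box a to box b: overlap, else the dominant direction between centers."""
    if iou(a, b) > 0:
        return "overlaps"
    dx = (b[0] + b[2]) / 2 - (a[0] + a[2]) / 2
    dy = (b[1] + b[3]) / 2 - (a[1] + a[3]) / 2
    if abs(dx) >= abs(dy):
        return "left_of" if dx > 0 else "right_of"
    return "above" if dy > 0 else "below"  # image y axis grows downward


def semantic_relation(cat_a, cat_b):
    """Relation derived from categories alone."""
    if cat_a == cat_b:
        return "same_category"
    return SEMANTIC_TABLE.get(frozenset((cat_a, cat_b)), "unrelated")


def build_topology(objects):
    """One edge (i, j, spatial, semantic) per unordered pair of detected objects."""
    return [
        (i, j, spatial_relation(a["box"], b["box"]), semantic_relation(a["category"], b["category"]))
        for (i, a), (j, b) in combinations(enumerate(objects), 2)
    ]
```

A learned relation network could replace both rule-based functions without changing the edge format.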
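The mapping from pixel positions to the latent space depends on the downsampling factor of the learned analysis transform, which the patent does not state. The sketch assumes a stride of 16 (`STRIDE`, hypothetical), maps each detected box to latent grid cells with floor/ceiling rounding, assigns each cell to one object or to the background, and emits coding units in raster (top-to-bottom, left-to-right) order. Applying the raster order both to the units and to the cells inside each unit is one possible reading of the text. Objects are tagged `category-k` with a running index, mirroring the pedestrian-1, vehicle-2 labels given as examples.

```python
STRIDE = 16  # hypothetical downsampling factor of the encoder's analysis transform


def to_latent_box(box, stride=STRIDE):
    """Map a pixel box (x0, y0, x1, y1) to latent cells (c0, r0, c1, r1), end-exclusive."""
    x0, y0, x1, y1 = box
    # Floor the start and ceil the end so every touched latent cell is covered.
    return (x0 // stride, y0 // stride, (x1 + stride - 1) // stride, (y1 + stride - 1) // stride)


def partition_latent(latent_h, latent_w, detections, stride=STRIDE):
    """Assign latent cells to tagged objects or background; return units in raster order."""
    owner = [["background"] * latent_w for _ in range(latent_h)]
    for k, (category, box) in enumerate(detections, start=1):
        tag = f"{category}-{k}"
        c0, r0, c1, r1 = to_latent_box(box, stride)
        for r in range(r0, min(r1, latent_h)):
            for c in range(c0, min(c1, latent_w)):
                if owner[r][c] == "background":  # first detection claims shared cells
                    owner[r][c] = tag
    units = {}
    for r in range(latent_h):  # top to bottom
        for c in range(latent_w):  # left to right
            # dicts keep insertion order, so units follow their first raster appearance
            units.setdefault(owner[r][c], []).append((r, c))
    return units
```

For a 64x64 frame (a 4x4 latent grid under the assumed stride), a pedestrian in the top-left corner and a vehicle in the bottom-right corner yield the units pedestrian-1, background, vehicle-2, in that order.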
2. Inter-frame coding part.
The inter-frame coding part reconstructs video frames with either the whole input video frame or an individual object as the unit, and obtains optical flow prediction information and residual coding information through motion compensation. This can be implemented with conventional techniques.
3. Entropy coding to generate the code stream.
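Because the text delegates this step to conventional techniques, the following is only a generic illustration of flow-based motion compensation on a frame-sized unit: backward-warp the reference frame with a per-pixel integer flow (nearest-neighbour sampling, border clamping), take the residual against the current frame, and reconstruct by adding the residual back. Real codecs use sub-pixel, often learned, flow and a quantized, entropy-coded residual; the function names are hypothetical.

```python
def warp(ref, flow):
    """Backward warping: pred[y][x] = ref[y + dy][x + dx], clamped to the frame border.

    flow[y][x] is an integer displacement (dy, dx) pointing into the reference frame.
    """
    h, w = len(ref), len(ref[0])
    pred = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y][x]
            sy = min(max(y + dy, 0), h - 1)
            sx = min(max(x + dx, 0), w - 1)
            pred[y][x] = ref[sy][sx]
    return pred


def residual(cur, pred):
    """Prediction error that would be passed on to residual coding."""
    return [[c - p for c, p in zip(row_c, row_p)] for row_c, row_p in zip(cur, pred)]


def reconstruct(pred, res):
    """Decoder side: motion-compensated prediction plus decoded residual."""
    return [[p + r for p, r in zip(row_p, row_r)] for row_p, row_r in zip(pred, res)]
```

Object-level compensation would apply the same steps only within an object's mask, with one flow field per object.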
As shown in FIG. 1, the spatial position and category information of each object, the attribute information of each object, and the topological relationship between the objects obtained by the intra-frame coding part, together with the coded coding units and the optical flow prediction information and residual coding information obtained by the inter-frame coding part, are entropy coded to obtain the corresponding code stream.
FIG. 3 shows the code stream structure of the intra-frame coding part.
The code stream of the intra-frame coding part consists of object header information, object attribute information, and an object information stream. The object header information includes the spatial position information, category information, and topological relationship information of the objects. The object information stream mainly contains the images corresponding to each object and to the background: the objects are those found in the initial object detection, and the remaining part of the frame is the background.
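The patent does not name an entropy coder; learned codecs commonly use arithmetic coding driven by a learned probability model. For brevity, the sketch swaps in static Huffman coding over a symbol sequence (for instance, serialized category identifiers or quantized coordinates) to show the basic principle that frequent symbols receive shorter codes. It illustrates entropy coding in general, not the coder used by the invention.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix-free Huffman code {symbol: bitstring} from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: a single distinct symbol still needs one bit
        return {next(iter(freq)): "0"}
    # Heap entries: (count, tie_breaker, partial code table); the integer tie breaker
    # keeps heapq from ever comparing the dicts.
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(symbols, code):
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    """Greedy decoding works because the code is prefix-free."""
    inverse = {v: k for k, v in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return out
```

For the sequence `aaaabbc`, the frequent symbol `a` gets a one-bit code and the whole sequence takes 10 bits instead of the 14 a fixed two-bit code would need.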
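FIG. 3 is not reproduced here, so the field widths and ordering are unknown. The sketch below assumes a little-endian layout in which each of the three parts (object header, object attributes, object information stream) is length-prefixed, so a decoder can read the header alone without parsing the rest. The field sizes, the integer category and relation identifiers, and all function names are assumptions.

```python
import struct

OBJ_FMT = "<B4H"   # category id (uint8) + box x0, y0, x1, y1 (uint16 each); assumed widths
EDGE_FMT = "<HHB"  # object index i, object index j, relation id (uint8); assumed widths


def length_prefixed(chunks):
    """Concatenate byte chunks, each preceded by its length as a uint32."""
    return b"".join(struct.pack("<I", len(c)) + c for c in chunks)


def split_segments(data):
    """Inverse of length_prefixed: recover the list of chunks."""
    chunks, offset = [], 0
    while offset < len(data):
        (n,) = struct.unpack_from("<I", data, offset)
        chunks.append(data[offset + 4: offset + 4 + n])
        offset += 4 + n
    return chunks


def pack_intra(objects, edges, attributes, payloads):
    """objects: [(cat, (x0, y0, x1, y1))]; edges: [(i, j, rel)]; payloads: objects then background."""
    header = struct.pack("<H", len(objects))
    for cat, (x0, y0, x1, y1) in objects:
        header += struct.pack(OBJ_FMT, cat, x0, y0, x1, y1)
    header += struct.pack("<H", len(edges))
    for i, j, rel in edges:
        header += struct.pack(EDGE_FMT, i, j, rel)
    return length_prefixed([header, length_prefixed(attributes), length_prefixed(payloads)])


def read_header(header):
    """Parse only the object header: positions, categories, and topology edges."""
    offset = 0
    (n_objects,) = struct.unpack_from("<H", header, offset)
    offset += 2
    objects = []
    for _ in range(n_objects):
        cat, x0, y0, x1, y1 = struct.unpack_from(OBJ_FMT, header, offset)
        offset += struct.calcsize(OBJ_FMT)
        objects.append((cat, (x0, y0, x1, y1)))
    (n_edges,) = struct.unpack_from("<H", header, offset)
    offset += 2
    edges = []
    for _ in range(n_edges):
        edges.append(struct.unpack_from(EDGE_FMT, header, offset))
        offset += struct.calcsize(EDGE_FMT)
    return objects, edges
```

A detection-style consumer would call `split_segments` once and `read_header` on the first segment, skipping the attribute and object-image segments entirely.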
The compression coding process can be carried out at the edge for specific tasks, or in the cloud for a variety of tasks.
In practical applications, the code stream is transmitted or stored. When a terminal decompresses it, the code stream is decompressed according to the header information defined during compression coding (that is, the header information required for decompression and the corresponding object information stream) to obtain the feature information for a specific task, and this feature information is fed into the task to obtain the analysis result.
Based on the scheme of this embodiment, image analysis tasks such as object detection, object segmentation, image enhancement, and image understanding can be supported by analyzing only part of the code stream data, as can video analysis tasks such as pedestrian tracking, behavior recognition, and anomaly detection. Data can also be decoded to support visual analysis and manual identification, and decoding the entire code stream to generate complete image/video data is supported.
From the description of the above embodiments, those skilled in the art will understand that the embodiments can be implemented in software, or in software plus a necessary general-purpose hardware platform. On this understanding, the technical solutions of the embodiments can be embodied as a software product stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, or removable hard disk) that includes instructions causing a computer device (such as a personal computer, server, or network device) to execute the methods of the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution readily conceivable by those skilled in the art within the technical scope disclosed by the present invention falls within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
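To make task-dependent partial decoding concrete, the sketch assigns each task a set of code stream segments and counts how many bytes must be read. The task-to-segment mapping and the segment sizes in the usage example are hypothetical; the patent only states that some tasks need part of the code stream while full reconstruction needs all of it.

```python
from enum import Enum


class Segment(Enum):
    HEADER = 0         # spatial positions, categories, topology
    ATTRIBUTES = 1     # per-object attribute information
    OBJECT_STREAM = 2  # object and background images


# Hypothetical mapping from task to the segments it must decode (assumption).
TASK_SEGMENTS = {
    "object_detection": {Segment.HEADER},
    "pedestrian_tracking": {Segment.HEADER, Segment.ATTRIBUTES},
    "full_reconstruction": {Segment.HEADER, Segment.ATTRIBUTES, Segment.OBJECT_STREAM},
}


def bytes_to_read(task, segment_sizes):
    """Total size of the segments a task needs; unknown tasks raise KeyError."""
    needed = TASK_SEGMENTS[task]
    return sum(size for seg, size in segment_sizes.items() if seg in needed)
```

With illustrative sizes of 120, 800, and 50,000 bytes for the three segments, detection reads 120 bytes while full reconstruction reads all 50,920.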

Claims (3)

1. A general video compression coding method supporting machine intelligence, comprising intra-frame coding and inter-frame coding, wherein:
the intra-frame coding part includes: for an input video frame, first performing object detection to obtain the spatial position information and category information of each object; performing attribute analysis and relationship inference based on the spatial position and category information of each object to obtain the attribute information of each object and the topological relationship between the objects; then, using the spatial position and category information of each object as guidance information, dividing the input video frame into coding units according to the spatial positions of the objects and encoding the divided coding units, wherein the object category information contained in the resulting code stream is used in the video frame reconstruction process of the inter-frame coding part;
the inter-frame coding part includes: reconstructing video frames with either the whole input video frame or an individual object as the unit, and obtaining optical flow prediction information and residual coding information through motion compensation;
the spatial position and category information of each object, the attribute information of each object, and the topological relationship between the objects obtained by the intra-frame coding part, together with the coded coding units and the optical flow prediction information and residual coding information obtained by the inter-frame coding part, are entropy coded to obtain the corresponding code stream;
using the spatial position and category information of each object as guidance information and dividing the input video frame into coding units according to the spatial positions of the objects comprises: mapping each object into the latent (hidden-variable) space to be encoded according to its spatial position information, semantically partitioning the latent variables according to the mapped spatial positions to obtain the latent variables to be encoded for each semantic region, and then encoding the partitioned latent variables in top-to-bottom, left-to-right order; and the object category information contained in the resulting code stream is used as the object label information required by the decoder when reconstructing video frames in the inter-frame coding part.
2. The method of claim 1, wherein the relationship inference comprises spatial relationship inference and semantic relationship inference;
spatial relationship inference is performed using the spatial position information of each object to obtain the spatial relationships between the objects;
semantic relationship inference is performed using the category information of the objects to obtain the semantic relationships between the objects;
the spatial relationships and the semantic relationships form the topological relationship.
3. The method of claim 1, wherein the code stream structure of the intra-frame coding part comprises object header information, object attribute information, and an object information stream;
wherein the object header information includes the spatial position information, category information, and topological relationship information of the objects, and the object information stream includes the images corresponding to each object and to the background.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010895946.4A CN111953973B (en) 2020-08-31 2020-08-31 General video compression coding method supporting machine intelligence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010895946.4A CN111953973B (en) 2020-08-31 2020-08-31 General video compression coding method supporting machine intelligence

Publications (2)

Publication Number Publication Date
CN111953973A CN111953973A (en) 2020-11-17
CN111953973B CN111953973B (en) 2022-10-28

Family

ID=73368155

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010895946.4A Active CN111953973B (en) 2020-08-31 2020-08-31 General video compression coding method supporting machine intelligence

Country Status (1)

Country Link
CN (1) CN111953973B (en)

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101877007B (en) * 2010-05-18 2012-05-02 南京师范大学 Remote sensing image retrieval method with integration of spatial direction relation semanteme
CN102724485B (en) * 2012-06-26 2016-01-13 公安部第三研究所 Dual core processor is adopted input video to be carried out to the apparatus and method of structural description
CN105049790A (en) * 2015-06-18 2015-11-11 中国人民公安大学 Video monitoring system image acquisition method and apparatus
CN108632625B (en) * 2017-03-21 2020-02-21 华为技术有限公司 Video coding method, video decoding method and related equipment
US10908616B2 (en) * 2017-05-05 2021-02-02 Hrl Laboratories, Llc Attribute aware zero shot machine vision system via joint sparse representations
CN109874011B (en) * 2018-12-28 2020-06-09 杭州海康威视数字技术股份有限公司 Encoding method, decoding method and device
CN111210518B (en) * 2020-01-15 2022-04-05 西安交通大学 Topological map generation method based on visual fusion landmark

Also Published As

Publication number Publication date
CN111953973A (en) 2020-11-17

Similar Documents

Publication Publication Date Title
US20220353525A1 (en) Image encoding method and apparatus, and image decoding method and apparatus
EP4373086A1 (en) Image processing method and apparatus, medium, and electronic device
US11893761B2 (en) Image processing apparatus and method
CN107566798A (en) A kind of system of data processing, method and device
CN116233445B (en) Video encoding and decoding processing method and device, computer equipment and storage medium
JP2024511103A (en) Method and apparatus for evaluating the quality of an image or video based on approximate values, method and apparatus for training a first model, electronic equipment, storage medium, and computer program
CN111680618B (en) Dynamic gesture recognition method based on video data characteristics, storage medium and device
CN111953973B (en) General video compression coding method supporting machine intelligence
CN112866715B (en) Universal video compression coding system supporting man-machine hybrid intelligence
KR20210064587A (en) High speed split device and method for video section
CN115914631A (en) Encoding and decoding method and system with controllable entropy decoding complexity
CN114973224A (en) Character recognition method and device, electronic equipment and storage medium
US20220377342A1 (en) Video encoding and video decoding
CN112967188A (en) Spatial self-adaptive image super-resolution reconstruction method combined with structured semantic code stream
CN109862207B (en) KVM video content change detection method based on compressed domain
CN102948147A (en) Video rate control based on transform-coefficients histogram
WO2023050431A1 (en) Encoding method, decoding method, decoder, encoder and computer-readable storage medium
CN116437089B (en) Depth video compression method based on key target
CN112492314B (en) Dynamic motion estimation algorithm selection method based on machine learning
WO2024078512A1 (en) Pre-analysis based image compression methods
CN113034626A (en) Optimization method for alignment of target object in feature domain in structured image coding
CN115567719A (en) Multi-level convolution video compression method and system
CN116979971A (en) Data encoding method, data decoding method, data encoding device, data decoding device, computer equipment and storage medium
Hu Lossless Decoding Method of Compressed Coded Video Based on Inter-Frame Differential Background Model: Multi-Algorithm Joint Lossless Decoding
CN116962695A (en) Target detection method of HEVC intra-frame coding compression domain

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant