WO2021139114A1 - Human-machine vision coding method and apparatus based on feedback optimization - Google Patents

Human-machine vision coding method and apparatus based on feedback optimization

Info

Publication number
WO2021139114A1
WO2021139114A1 (PCT/CN2020/099511; CN2020099511W)
Authority
WO
WIPO (PCT)
Prior art keywords
video
stream
feature
generate
encoding
Prior art date
Application number
PCT/CN2020/099511
Other languages
English (en)
French (fr)
Inventor
段凌宇
刘家瑛
杨文瀚
白燕
高文
Original Assignee
北京大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京大学
Publication of WO2021139114A1

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/184Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being bits, e.g. of the compressed video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136Incoming video signal characteristics or properties
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146Data rate or code amount at the encoder output

Definitions

  • This application relates to the field of image processing technology, and in particular to a human-machine vision coding method and device based on feedback optimization.
  • To reduce bandwidth and storage consumption, the current solution adopts the digital retina architecture and related methods, in which three streams (data, model, and feature) are learned collaboratively to realize joint allocation of front-end and back-end resources and achieve efficient video coding, understanding, and analysis.
  • When analyzing massive amounts of data, this framework has the following shortcomings: (1) independent processing of feature and video streams: for the same set of data, the transmission and utilization of the data stream and the feature stream are separate, so there is redundancy and resources are wasted; (2) one-way data transformation: although the front end and the back end interact, the information flow is essentially one-way, from pixel features to semantic features, i.e., from more information to less; (3) non-scalability: video compression and feature compression are optimized on video data alone and cannot flexibly support switching between the coding and analysis of different types of tasks.
  • the embodiments of the present application provide a human-machine visual coding method and device based on feedback optimization.
  • In order to provide a basic understanding of some aspects of the disclosed embodiments, a brief summary is given below. This summary is not an extensive overview, nor is it intended to identify key or critical elements or delineate the scope of protection of these embodiments. Its sole purpose is to present some concepts in a simplified form as a prelude to the detailed description that follows.
  • an embodiment of the present application provides a human-machine vision coding method based on feedback optimization, which is applied to the coding end, and the method includes:
  • the encoded feature stream and the video stream are sent to the decoding end.
  • the generating a video stream based on the semantic feature includes:
  • the residual video is encoded to generate a video stream.
  • the embodiments of the present application provide a feedback optimization-based human-machine vision coding method, which is applied to the decoding end, and the method includes:
  • the code rate parameter is generated and sent to the encoding end.
  • the generating a decoded video based on the encoded feature stream and video stream includes:
  • the residual video and the reconstructed video are added to generate a decoded video.
  • an embodiment of the present application provides a human-machine visual coding method based on feedback optimization, and the method includes:
  • the encoding end acquires the pixel features corresponding to the target video
  • the encoding end inputs the pixel features into a preset prediction model to generate semantic features
  • the encoding end generates a video stream and a feature stream based on the semantic features
  • the decoding end generates a decoded video based on the encoded feature stream and video stream;
  • when the decoding end receives a parameter adjustment instruction input by the client, it generates a bit-rate parameter and sends it to the encoding end;
  • the encoding end obtains the current bit rate
  • the encoding end adjusts the current bit rate based on the bit-rate parameter to generate an adjusted bit rate
  • the encoding end enhances the video stream and the feature stream based on the adjusted bit rate, and generates an enhanced video stream and an enhanced feature stream;
  • the decoding end updates the encoding-end model based on the enhanced video stream and the enhanced feature stream, and the encoding-end model includes a prediction model and a generation model.
  • the method further includes:
  • the encoding end uses the camera to collect image frames to generate the target video.
  • the encoding end generating a video stream and a feature stream based on the semantic feature includes:
  • the encoding end inputs the semantic features into a preset generation model to generate a reconstructed video
  • the encoding end subtracts the target video and the reconstructed video to generate a residual video
  • the encoding end encodes the residual video to generate a video stream
  • the encoding end inputs the semantic features into a preset compression model to generate a feature stream.
  • an embodiment of the present application provides a human-machine visual coding device based on feedback optimization, and the device includes:
  • the pixel feature acquisition module is used for the encoding terminal to acquire the pixel feature corresponding to the target video
  • the semantic feature acquisition module is used for the encoding terminal to input the pixel features into a preset prediction model to generate semantic features
  • the first generation module is used for the encoding end to generate a video stream and a feature stream based on the semantic feature
  • the video generation module is used for the decoding end to generate a decoded video based on the encoded feature stream and video stream;
  • the first code rate generation module is used to generate code rate parameters and send them to the encoding end when the decoding end receives a parameter adjustment instruction inputted by the client;
  • the code rate acquisition module is used for the encoding terminal to obtain the current code rate
  • the second code rate generation module is configured to adjust the current code rate based on the code rate parameter by the encoding terminal to generate an adjusted code rate
  • a second stream generating module configured to enhance the video stream and the feature stream based on the adjusted bit rate on the encoding end, and generate an enhanced video stream and an enhanced feature stream;
  • the model update module is used for the decoding end to update the encoding end model based on the enhanced video stream and the enhanced feature stream, and the encoding end model includes a prediction model and a generation model.
  • the device further includes:
  • the video acquisition module is used for the encoding end to collect image frames through the camera to generate the target video.
  • the first stream generation module includes:
  • the first video generation unit is configured to input the semantic features into a preset generation model to generate a reconstructed video at the encoding end;
  • the second video generating unit is used for the encoding terminal to subtract the target video and the reconstructed video to generate a residual video
  • a video stream generating unit configured to generate a video stream after encoding the residual video at the encoding end
  • the feature stream generating unit is configured to input the semantic feature into a preset compression model at the encoding end to generate a feature stream.
  • the encoding end acquires the pixel features corresponding to the target video; the encoding end inputs the pixel features into a preset prediction model to generate semantic features; the encoding end generates a video stream and a feature stream based on the semantic features;
  • the decoding end generates a decoded video based on the encoded feature stream and video stream; when the decoding end receives a parameter adjustment instruction input by the client, it generates a bit-rate parameter and sends it to the encoding end; the encoding end obtains the current bit rate; the encoding end adjusts the current bit rate based on the bit-rate parameter to generate an adjusted bit rate; the encoding end enhances the video stream and the feature stream based on the adjusted bit rate to generate an enhanced video stream and an enhanced feature stream; the decoding end then updates the encoding-end model based on the enhanced video stream and the enhanced feature stream, where the encoding-end model includes a prediction model and a generation model.
  • Since this solution supports direct compression and transmission of features in a smaller bitstream, it enables efficient video understanding and analysis; at the same time it supports feature-based bitstream reconstruction, so video reconstruction is also supported at a lower cost.
  • Considering the changing bit-rate requirements of practical applications, the present invention implements incremental bit-rate adjustment based on scaling feedback to support understanding, analysis, and video-viewing tasks, and also allows front-end model updates based on existing analysis data and features, improving the performance and efficiency of the model.
  • FIG. 1 is a schematic flowchart of a human-machine visual coding method based on feedback optimization provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of collaborative feedback of pixel features and semantic features provided by an embodiment of the present application
  • FIG. 3 is a schematic diagram of a front-end and back-end telescopic feedback provided by an embodiment of the present application
  • FIG. 4 is a schematic flowchart of a human-machine visual coding method based on feedback optimization provided by an embodiment of the present application applied to the coding end;
  • FIG. 5 is a schematic flowchart of a human-machine vision coding method based on feedback optimization applied to the decoding end according to an embodiment of the present application
  • FIG. 6 is a schematic structural diagram of a human-machine visual coding device based on feedback optimization provided by an embodiment of the present application
  • FIG. 7 is a schematic structural diagram of another human-machine visual coding device based on feedback optimization provided by an embodiment of the present application.
  • Fig. 8 is a schematic diagram of a first stream generation module provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a terminal provided by an embodiment of the present application.
  • To reduce bandwidth and storage consumption, the current solution adopts the digital retina architecture and related methods, in which three streams (data, model, and feature) are learned collaboratively to realize joint allocation of front-end and back-end resources and achieve efficient video coding, understanding, and analysis.
  • When analyzing massive amounts of data, this framework has the following shortcomings: (1) independent processing of feature and video streams: for the same set of data, the transmission and utilization of the data stream and the feature stream are separate, so there is redundancy and resources are wasted; (2) one-way data transformation: although the front end and the back end interact, the information flow is essentially one-way, from pixel features to semantic features, i.e., from more information to less; (3) non-scalability: video compression and feature compression are optimized on video data alone and cannot flexibly support switching between the coding and analysis of different types of tasks. To this end, the present application provides a human-machine vision coding method and apparatus based on feedback optimization to solve the above problems in the related art.
  • Since the solution supports direct compression and transmission of features in a smaller bitstream, it enables efficient video understanding and analysis; at the same time it supports feature-based bitstream reconstruction, so video reconstruction is also supported at a lower cost.
  • Considering the changing bit-rate requirements of practical applications, the present invention implements incremental bit-rate adjustment based on scaling feedback to support understanding, analysis, and video-viewing tasks, and also allows front-end model updates based on existing analysis data and features, improving the performance and efficiency of the model.
  • the following will describe in detail the human-machine visual coding method based on feedback optimization provided by the embodiments of the present application with reference to FIG. 1 to FIG. 5.
  • the method can be implemented by a computer program and can run on a feedback-optimization-based human-machine vision coding device based on the von Neumann architecture.
  • FIG. 1 provides a schematic flowchart of a human-machine visual coding method based on feedback optimization for an embodiment of the present application.
  • the method of the embodiment of the present application may include the following steps:
  • the encoding terminal obtains pixel features corresponding to the target video
  • the encoding end first collects image frames at different moments through a camera, and the set of image frames collected within a period of time forms the target video. After the target video is formed, the images are processed according to a pre-saved program to obtain the pixel features of the target video.
  • the encoding end inputs the pixel features into a preset prediction model to generate semantic features
  • the pixel feature corresponding to the target video can be obtained according to step S101. After the pixel feature is obtained, the pixel feature is input into a pre-stored prediction model for processing, and the semantic feature corresponding to the target video is generated after processing.
  • at the front end (encoding end), the input video V is passed through the prediction model P(·|θ_p) to extract the features F = {f_i}: F = P(V, λ|θ_p), where θ_p is the parameter to be learned, F is a compact feature requiring only a small bitstream for transmission and storage, and λ is a rate-control parameter. The compression model C_F(·|θ_cf) compresses F into a feature stream B_F: B_F = C_F(F|θ_cf), where θ_cf is the parameter to be learned.
  • the encoding end generates a video stream and a feature stream based on the semantic feature
  • the semantic feature corresponding to the target video can be obtained according to step S102.
  • the semantic features are input into the preset generation model to generate a reconstructed video; the reconstructed video is then subtracted from the target video to generate a residual video, and the residual video is encoded to generate a video stream.
  • the semantic features corresponding to the target video are input into a preset compression model to generate a feature stream, which is then encoded to generate an encoded feature stream.
  • the encoded feature stream and video stream are sent to the decoder.
  • at the front end (encoding end), the extracted features F are input into the generation model to obtain the reconstructed video \hat{V}: \hat{V} = G(F|θ_g), where θ_g is the parameter to be learned. The more consistent the generated \hat{V} is with the original video V, the higher the quality of the reconstructed video that can be provided directly from the transmitted F at a lower cost for human viewing.
  • the original video V and the reconstructed video \hat{V} are subtracted to obtain the residual video R = V − \hat{V}, which is encoded into the video stream B_V: B_V = C_V(R|θ_cv), where C_V(·|θ_cv) is a video compression model and θ_cv is a parameter to be learned.
  • the decoding end generates a decoded video based on the encoded feature stream and video stream;
  • the encoded feature stream and video stream sent to the decoding end are received; the encoded feature stream is decoded to generate a decoded feature stream;
  • the decoded feature stream is input into the preset generation model to obtain the reconstructed video; the video stream is restored to generate the residual video; and the residual video and the reconstructed video are added to generate the decoded video.
  • the code rate parameter is generated and sent to the encoding end.
  • at the back end (decoding end), the feature stream B_F is restored to the features \hat{F}: \hat{F} = D_F(B_F|θ_df), where D_F(·|θ_df) is the feature decompression model and θ_df is the parameter to be learned. With only a small amount of computation, \hat{F} can be used by back-end intelligent analysis applications, supporting rapid understanding and analysis.
  • the features \hat{F} are input into the generation model to obtain the reconstructed video \hat{V} = G(\hat{F}|θ_g), which provides a reconstructed video for quick viewing when no video stream has been transmitted.
  • the video stream B_V is restored to the residual video \hat{R} = D_V(B_V|θ_dv), which is added to the reconstructed video \hat{V} to obtain the decoded video \tilde{V} = \hat{R} + \hat{V}, where D_V(·|θ_dv) is the video decompression model and θ_dv is a parameter to be learned.
  • The decoded video is used for human viewing of video content and for machine vision applications.
  • the scaling feedback is initiated from the back end to the front end. According to the characteristics of the actual application or the bit rate requirements of the video, the bit rate is increased incrementally to improve the quality of serving human vision and machine vision applications.
  • when the existing features and video cannot meet the needs of the back end (decoding end), a new bit-rate parameter λ′ is generated and sent to the front end (encoding end), where enhancement generates a new incremental residual video stream R_U and feature stream ΔF.
  • the encoding end obtains the current bit rate
  • the encoding end adjusts the current bit rate based on the bit-rate parameter to generate an adjusted bit rate
  • the encoding end enhances the video stream and the feature stream based on the adjusted bit rate, and generates an enhanced video stream and an enhanced feature stream;
  • the decoding end updates the encoding-end model based on the enhanced video stream and the enhanced feature stream, and the encoding-end model includes a prediction model and a generation model.
  • based on the stored features and videos, the model parameters are optimized for the current scene, and the model parameters are transmitted, in full or incrementally, to the front end for more efficient video feature extraction and compression.
  • at the front end (encoding end), given the already-encoded features F and the adjusted rate-control parameter λ′, the input video V = {v_i} is passed through the prediction model Q(·|θ_q) for incremental feature extraction: ΔF = Q(V, F, λ′|θ_q), where θ_q is the parameter to be learned and ΔF is the incremental feature. ΔF is compressed into the feature stream B_DF: B_DF = C_DF(ΔF|θ_cdf), where C_DF(·|θ_cdf) is a feature compression model and θ_cdf is a parameter to be learned.
  • the two parts F and ΔF of the updated feature F_U = F + ΔF are input into the generation model to obtain the incremental reconstructed video \hat{V}_U = H(F, ΔF|θ_h), where θ_h is the parameter to be learned. The more consistent \hat{V}_U is with the original video V, the higher the quality of the reconstructed video that can be provided directly from the transmitted F and ΔF at a lower cost for human viewing.
  • the reconstructed video \hat{V}_U and the first-transmitted residual video R are subtracted from the original video V to obtain the incremental residual video R_U, which is encoded into the video stream B_DV: B_DV = C_DV(R_U|θ_cdr), where C_DV(·|θ_cdr) is a video compression model and θ_cdr is a parameter to be learned.
  • at the back end (decoding end), the feature stream B_DF is decoded into the incremental features Δ\hat{F}: Δ\hat{F} = D_DF(B_DF|θ_ddf), where D_DF(·|θ_ddf) is the incremental feature decompression model and θ_ddf is a parameter to be learned. Δ\hat{F} is used to improve the accuracy of back-end intelligent analysis applications.
  • the features \hat{F} and Δ\hat{F} are input into the generation model to obtain the incremental reconstructed video \hat{V}_U = H(\hat{F}, Δ\hat{F}|θ_h), which provides a higher-quality reconstructed video for quick viewing when no incremental video stream has been transmitted.
  • the video stream B_DV is restored to the incremental residual video \hat{R}_U = D_DV(B_DV|θ_ddv); adding the reconstructed video \hat{V}, the incremental reconstructed video \hat{V}_U, and the previously transmitted residual video \hat{R} yields the updated decoded video \tilde{V}_U = \hat{R}_U + \hat{V} + \hat{V}_U + \hat{R}, where D_DV(·|θ_ddv) is the video decompression model and θ_ddv is a parameter to be learned.
  • The updated decoded video is used for fine-grained viewing of video content.
  • two feedback mechanisms, namely the collaborative feedback between pixel features and semantic features and the scaling feedback between the back end and the front end, are used to achieve joint optimization of the data, feature, and model streams.
  • the collaborative feedback between pixel features and semantic features realizes flexible conversion between pixel features and semantic features through the prediction and generation models, effectively maps semantic features to pixel features, improves the coding efficiency of the framework, supports flexible and scalable applications, and efficiently serves both human vision and machine vision.
  • the scaling feedback between the back end and the front end allows the back end (decoding end) to initiate scaling feedback when the reconstruction accuracy of the encoding fails to meet application requirements, so that the front end (encoding end) incrementally provides bitstreams, improving the quality of the features and video decoded at the back end (decoding end) and thus the application performance.
  • the encoding end acquires the pixel features corresponding to the target video; the encoding end inputs the pixel features into a preset prediction model to generate semantic features; the encoding end generates a video stream and a feature stream based on the semantic features;
  • the decoding end generates a decoded video based on the encoded feature stream and video stream; when the decoding end receives a parameter adjustment instruction input by the client, it generates a bit-rate parameter and sends it to the encoding end; the encoding end obtains the current bit rate; the encoding end adjusts the current bit rate based on the bit-rate parameter to generate an adjusted bit rate; the encoding end enhances the video stream and the feature stream based on the adjusted bit rate to generate an enhanced video stream and an enhanced feature stream; the decoding end then updates the encoding-end model based on the enhanced video stream and the enhanced feature stream, where the encoding-end model includes a prediction model and a generation model.
  • Since this solution supports direct compression and transmission of features in a smaller bitstream, it enables efficient video understanding and analysis; at the same time it supports feature-based bitstream reconstruction, so video reconstruction is also supported at a lower cost.
  • Considering the changing bit-rate requirements of practical applications, the present invention implements incremental bit-rate adjustment based on scaling feedback to support understanding, analysis, and video-viewing tasks, and also allows front-end model updates based on existing analysis data and features, improving the performance and efficiency of the model.
  • FIG. 4 provides a schematic flow diagram of a human-machine visual coding method based on feedback optimization applied to the coding end in this embodiment of the application.
  • the method of the embodiment of the present application may include the following steps:
  • S201 Generate a target video by collecting image frames through a camera
  • S209 Send the encoded feature stream and the video stream to a decoding end.
  • the encoding end acquires the pixel features corresponding to the target video; the encoding end inputs the pixel features into a preset prediction model to generate semantic features; the encoding end generates a video stream and a feature stream based on the semantic features;
  • the decoding end generates a decoded video based on the encoded feature stream and video stream; when the decoding end receives a parameter adjustment instruction input by the client, it generates a bit-rate parameter and sends it to the encoding end; the encoding end obtains the current bit rate; the encoding end adjusts the current bit rate based on the bit-rate parameter to generate an adjusted bit rate; the encoding end enhances the video stream and the feature stream based on the adjusted bit rate to generate an enhanced video stream and an enhanced feature stream; the decoding end then updates the encoding-end model based on the enhanced video stream and the enhanced feature stream, where the encoding-end model includes a prediction model and a generation model.
  • Since this solution supports direct compression and transmission of features in a smaller bitstream, it enables efficient video understanding and analysis; at the same time it supports feature-based bitstream reconstruction, so video reconstruction is also supported at a lower cost.
  • Considering the changing bit-rate requirements of practical applications, the present invention implements incremental bit-rate adjustment based on scaling feedback to support understanding, analysis, and video-viewing tasks, and also allows front-end model updates based on existing analysis data and features, improving the performance and efficiency of the model.
  • FIG. 5 provides a schematic flowchart of a human-machine vision coding method based on feedback optimization applied to the decoding end for this embodiment of the application.
  • the method of the embodiment of the present application may include the following steps:
  • the encoding end acquires the pixel features corresponding to the target video; the encoding end inputs the pixel features into a preset prediction model to generate semantic features; the encoding end generates a video stream and a feature stream based on the semantic features;
  • the decoding end generates a decoded video based on the encoded feature stream and video stream; when the decoding end receives a parameter adjustment instruction input by the client, it generates a bit-rate parameter and sends it to the encoding end; the encoding end obtains the current bit rate; the encoding end adjusts the current bit rate based on the bit-rate parameter to generate an adjusted bit rate; the encoding end enhances the video stream and the feature stream based on the adjusted bit rate to generate an enhanced video stream and an enhanced feature stream; the decoding end then updates the encoding-end model based on the enhanced video stream and the enhanced feature stream, where the encoding-end model includes a prediction model and a generation model.
  • Since this solution supports direct compression and transmission of features in a smaller bitstream, it enables efficient video understanding and analysis; at the same time it supports feature-based bitstream reconstruction, so video reconstruction is also supported at a lower cost.
  • Considering the changing bit-rate requirements of practical applications, the present invention implements incremental bit-rate adjustment based on scaling feedback to support understanding, analysis, and video-viewing tasks, and also allows front-end model updates based on existing analysis data and features, improving the performance and efficiency of the model.
  • FIG. 6 shows a schematic structural diagram of a human-machine visual encoding device based on feedback optimization provided by an exemplary embodiment of the present application.
  • the human-machine visual coding method and device based on feedback optimization can be implemented as all or a part of the terminal through software, hardware or a combination of the two.
  • the device 1 includes a pixel feature acquisition module 10, a semantic feature acquisition module 20, a first stream generation module 30, a video generation module 40, a first bit rate generation module 50, a bit rate acquisition module 60, a second bit rate generation module 70, a second stream generation module 80, and a model update module 90.
  • the device 1 includes
  • the pixel feature acquiring module 10 is used for the encoding terminal to acquire the pixel feature corresponding to the target video;
  • the semantic feature acquisition module 20 is configured to input the pixel features into a preset prediction model at the encoding end to generate semantic features;
  • the first stream generation module 30 is used for the encoding end to generate a video stream and a feature stream based on the semantic feature;
  • the video generation module 40 is used for the decoder to generate a decoded video based on the encoded feature stream and video stream;
  • the first code rate generation module 50 is configured to generate code rate parameters and send them to the encoding end when the decoding end receives a parameter adjustment instruction inputted by the client;
  • the code rate obtaining module 60 is used for the encoding terminal to obtain the current code rate
  • the second code rate generating module 70 is configured to adjust the current code rate based on the code rate parameter by the encoding terminal to generate an adjusted code rate;
  • the second stream generating module 80 is configured to enhance the video stream and the feature stream on the encoding side based on the adjusted bit rate, and generate an enhanced video stream and an enhanced feature stream;
  • the model update module 90 is used for the decoding end to update the encoding end model based on the enhanced video stream and the enhanced feature stream, and the encoding end model includes a prediction model and a generation model.
  • the device 1 further includes:
  • the video acquisition module 100 is used for the encoding end to collect image frames through a camera to generate a target video.
  • the first stream generating module 30 includes:
  • the first video generation unit 310 is configured to input the semantic features into a preset generation model at the encoding end to generate a reconstructed video;
  • the second video generating unit 320 is configured to generate a residual video by subtracting the target video and the reconstructed video from the encoding end;
  • the video stream generating unit 330 is configured to encode the residual video at the encoding end to generate a video stream;
  • the feature stream generating unit 340 is configured to input the semantic features into a preset compression model at the encoding end to generate a feature stream.
  • it should be noted that when the feedback-optimization-based human-machine vision coding device provided by the above embodiment performs the feedback-optimization-based human-machine vision coding method, the division into the above functional modules is only used as an example; in practical applications, the above functions can be allocated to different functional modules as required,
  • that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above.
  • the human-machine visual coding device based on feedback optimization provided by the above-mentioned embodiment and the embodiment of the human-machine visual coding method based on feedback optimization belong to the same concept, and the implementation process is detailed in the method embodiment, which is not repeated here.
  • the encoding end acquires the pixel features corresponding to the target video; the encoding end inputs the pixel features into a preset prediction model to generate semantic features; the encoding end generates a video stream and a feature stream based on the semantic features;
  • the decoding end generates a decoded video based on the encoded feature stream and video stream; when the decoding end receives a parameter adjustment instruction input by the client, it generates a bit-rate parameter and sends it to the encoding end; the encoding end obtains the current bit rate; the encoding end adjusts the current bit rate based on the bit-rate parameter to generate an adjusted bit rate; the encoding end enhances the video stream and the feature stream based on the adjusted bit rate to generate an enhanced video stream and an enhanced feature stream; the decoding end then updates the encoding-end model based on the enhanced video stream and the enhanced feature stream, where the encoding-end model includes a prediction model and a generation model.
  • Since this solution supports direct compression and transmission of features in a smaller bitstream, it enables efficient video understanding and analysis; at the same time it supports feature-based bitstream reconstruction, so video reconstruction is also supported at a lower cost.
  • Considering the changing bit-rate requirements of practical applications, the present invention implements incremental bit-rate adjustment based on scaling feedback to support understanding, analysis, and video-viewing tasks, and also allows front-end model updates based on existing analysis data and features, improving the performance and efficiency of the model.
  • the present application also provides a computer-readable medium on which program instructions are stored, and when the program instructions are executed by a processor, the human-machine visual coding method based on feedback optimization provided by the foregoing method embodiments is implemented.
  • the present application also provides a computer program product containing instructions, which when run on a computer, causes the computer to execute the human-machine visual coding method based on feedback optimization described in the foregoing method embodiments.
  • the terminal 1000 may include: at least one processor 1001, at least one network interface 1004, a user interface 1003, a memory 1005, and at least one communication bus 1002.
  • the communication bus 1002 is used to implement connection and communication between these components.
  • the user interface 1003 may include a display screen (Display) and a camera (Camera), and the optional user interface 1003 may also include a standard wired interface and a wireless interface.
  • the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface).
  • the processor 1001 may include one or more processing cores.
  • the processor 1001 uses various interfaces and lines to connect various parts of the entire electronic device 1000, and executes various functions of the electronic device 1000 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 1005 and calling data stored in the memory 1005.
  • Optionally, the processor 1001 may be implemented in at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA).
  • the processor 1001 may integrate one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like.
  • the CPU mainly processes the operating system, user interface, and application programs; the GPU is used to render and draw the content that needs to be displayed on the display screen; the modem is used to process wireless communication. It is understandable that the above-mentioned modem may not be integrated into the processor 1001, but may be implemented by a chip alone.
  • the memory 1005 may include random access memory (RAM), and may also include read-only memory (ROM).
  • the memory 1005 includes a non-transitory computer-readable storage medium.
  • the memory 1005 may be used to store instructions, programs, codes, code sets or instruction sets.
  • the memory 1005 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing the operating system, instructions for at least one function (such as a touch function, a sound playback function, or an image playback function), instructions for implementing the above method embodiments, and the like; the data storage area may store the data involved in the above method embodiments, and the like.
  • optionally, the memory 1005 may also be at least one storage device located remotely from the foregoing processor 1001.
  • the memory 1005 as a computer storage medium may include an operating system, a network communication module, a user interface module, and a human-machine visual coding application optimized based on feedback.
  • the user interface 1003 is mainly used to provide an input interface for the user and obtain data input by the user; the processor 1001 can be used to call the feedback-optimization-based human-machine vision coding application stored in the memory 1005 and specifically perform the following operations:
  • the encoding terminal obtains the pixel feature corresponding to the target video
  • the encoding end inputs the pixel features into a preset prediction model to generate semantic features
  • the encoding end generates a video stream and a feature stream based on the semantic feature
  • the decoding end generates a decoded video based on the encoded feature stream and video stream;
  • when the decoding end receives a parameter adjustment instruction input by the client, it generates a bit-rate parameter and sends it to the encoding end;
  • the encoding end obtains the current bit rate
  • the encoding end adjusts the current bit rate based on the bit-rate parameter to generate an adjusted bit rate
  • the encoding end enhances the video stream and the feature stream based on the adjusted bit rate, and generates an enhanced video stream and an enhanced feature stream;
  • the decoding end updates the encoding end model based on the enhanced video stream and the enhanced feature stream, and the encoding end model includes a prediction model and a generation model.
  • before the encoding end acquires the pixel features corresponding to the target video, the processor 1001 further performs the following operation:
  • the encoding end uses the camera to collect image frames to generate the target video.
  • when the encoding end generates a video stream and a feature stream based on the semantic features, the processor 1001 specifically performs the following operations:
  • the encoding end inputs the semantic features into a preset generation model to generate a reconstructed video
  • the encoding end subtracts the target video and the reconstructed video to generate a residual video
  • the encoding end encodes the residual video to generate a video stream
  • the encoding end inputs the semantic features into a preset compression model to generate a feature stream.
  • the encoding end acquires the pixel features corresponding to the target video; the encoding end inputs the pixel features into a preset prediction model to generate semantic features; the encoding end generates a video stream and a feature stream based on the semantic features;
  • the decoding end generates a decoded video based on the encoded feature stream and video stream; when the decoding end receives a parameter adjustment instruction input by the client, it generates a bit-rate parameter and sends it to the encoding end; the encoding end obtains the current bit rate; the encoding end adjusts the current bit rate based on the bit-rate parameter to generate an adjusted bit rate; the encoding end enhances the video stream and the feature stream based on the adjusted bit rate to generate an enhanced video stream and an enhanced feature stream; the decoding end then updates the encoding-end model based on the enhanced video stream and the enhanced feature stream, where the encoding-end model includes a prediction model and a generation model.
  • Since this solution supports direct compression and transmission of features in a smaller bitstream, it enables efficient video understanding and analysis; at the same time it supports feature-based bitstream reconstruction, so video reconstruction is also supported at a lower cost.
  • Considering the changing bit-rate requirements of practical applications, the present invention implements incremental bit-rate adjustment based on scaling feedback to support understanding, analysis, and video-viewing tasks, and also allows front-end model updates based on existing analysis data and features, improving the performance and efficiency of the model.
  • the disclosed methods and products can be implemented in other ways.
  • the device embodiments described above are merely illustrative, for example, the division of the units is only a logical function division, and there may be other divisions in actual implementation, for example, multiple units or components may be combined or It can be integrated into another system, or some features can be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • each block in the flowchart or block diagram may represent a module, program segment, or part of code, and the module, program segment, or part of code contains one or more executable instructions for realizing the specified logical function.
  • It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two consecutive blocks can actually be executed substantially in parallel, and they can sometimes be executed in the reverse order, depending on the functions involved.
  • each block in the block diagram and/or flowchart, and the combination of the blocks in the block diagram and/or flowchart can be implemented by a dedicated hardware-based system that performs the specified functions or actions Or it can be realized by a combination of dedicated hardware and computer instructions.
  • the present application is not limited to the processes and structures that have been described above and shown in the drawings, and various modifications and changes can be made without departing from its scope. The scope of the application is only limited by the appended claims.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present application discloses a human-machine vision coding method based on feedback optimization. The method includes: the encoding end acquires the pixel features corresponding to the target video and inputs them into a preset prediction model to generate semantic features; the encoding end generates a video stream and a feature stream based on the semantic features; the decoding end generates a decoded video based on the encoded feature stream and video stream; when the decoding end receives a parameter adjustment instruction input by the client, it generates a bit-rate parameter and sends it to the encoding end; the encoding end obtains the current bit rate; the encoding end adjusts the current bit rate based on the bit-rate parameter to generate an adjusted bit rate; the encoding end enhances the video stream and the feature stream based on the adjusted bit rate to generate an enhanced video stream and an enhanced feature stream; the decoding end updates the encoding-end model based on the enhanced video stream and the enhanced feature stream. Therefore, adopting the embodiments of the present application can improve the efficiency of video feature extraction and compression.

Description

Human-machine vision coding method and apparatus based on feedback optimization
Technical Field
This application relates to the field of image processing technology, and in particular to a human-machine vision coding method and apparatus based on feedback optimization.
Background
Big data analysis in the context of smart cities challenges the existing traditional coding-and-analysis paradigm of "encode first, understand later", in which video is encoded at the front end and then decoded and analyzed at the back end. When the amount of data to be processed is very large, maintaining high-quality video compression and transmission causes delay and consumes a large amount of bandwidth and storage resources.
To reduce bandwidth and storage consumption, the current solution adopts the digital retina architecture and related methods, in which three streams (data, model, and feature) are learned collaboratively to realize joint allocation of front-end and back-end resources and achieve efficient video coding, understanding, and analysis. When analyzing massive amounts of data, this framework has the following shortcomings: (1) independent processing of feature and video streams: for the same set of data, the transmission and utilization of the data stream and the feature stream are separate, so there is redundancy and resources are wasted; (2) one-way data transformation: although the front end and the back end interact, the information flow is essentially one-way, from pixel features to semantic features, i.e., from more information to less; (3) non-scalability: video compression and feature compression are optimized on video data alone and cannot flexibly support switching between the coding and analysis of different types of tasks.
Summary
The embodiments of the present application provide a human-machine vision coding method and apparatus based on feedback optimization. In order to provide a basic understanding of some aspects of the disclosed embodiments, a brief summary is given below. This summary is not an extensive overview, nor is it intended to identify key or critical elements or delineate the scope of protection of these embodiments. Its sole purpose is to present some concepts in a simplified form as a prelude to the detailed description that follows.
In a first aspect, an embodiment of the present application provides a human-machine vision coding method based on feedback optimization, applied to an encoding end, the method including:
collecting image frames through a camera to generate a target video;
acquiring pixel features corresponding to the target video;
inputting the pixel features into a preset prediction model to generate semantic features;
generating a video stream based on the semantic features;
inputting the semantic features into a preset compression model to generate a feature stream;
encoding the feature stream to generate an encoded feature stream;
sending the encoded feature stream and the video stream to a decoding end.
Optionally, generating a video stream based on the semantic features includes:
inputting the semantic features into a preset generation model to generate a reconstructed video;
subtracting the reconstructed video from the target video to generate a residual video;
encoding the residual video to generate a video stream.
In a second aspect, an embodiment of the present application provides a human-machine vision coding method based on feedback optimization, applied to a decoding end, the method including:
when the encoded feature stream and video stream sent to the decoding end are received, acquiring the encoded feature stream and video stream;
generating a decoded video based on the encoded feature stream and video stream;
when a parameter adjustment instruction input by the client is received, generating a bit-rate parameter and sending it to the encoding end.
Optionally, generating a decoded video based on the encoded feature stream and video stream includes:
decoding the encoded feature stream to generate a decoded feature stream;
inputting the decoded feature stream into a preset generation model to obtain a reconstructed video;
restoring the video stream to generate a residual video;
adding the residual video and the reconstructed video to generate a decoded video.
In a third aspect, an embodiment of the present application provides a human-machine vision coding method based on feedback optimization, the method including:
the encoding end acquires pixel features corresponding to the target video;
the encoding end inputs the pixel features into a preset prediction model to generate semantic features;
the encoding end generates a video stream and a feature stream based on the semantic features;
the decoding end generates a decoded video based on the encoded feature stream and video stream;
when the decoding end receives a parameter adjustment instruction input by the client, it generates a bit-rate parameter and sends it to the encoding end;
the encoding end obtains the current bit rate;
the encoding end adjusts the current bit rate based on the bit-rate parameter to generate an adjusted bit rate;
the encoding end enhances the video stream and the feature stream based on the adjusted bit rate to generate an enhanced video stream and an enhanced feature stream;
the decoding end updates the encoding-end model based on the enhanced video stream and the enhanced feature stream, the encoding-end model including a prediction model and a generation model.
Optionally, before the encoding end acquires the pixel features corresponding to the target video, the method further includes:
the encoding end collects image frames through a camera to generate the target video.
Optionally, the encoding end generating a video stream and a feature stream based on the semantic features includes:
the encoding end inputs the semantic features into a preset generation model to generate a reconstructed video;
the encoding end subtracts the reconstructed video from the target video to generate a residual video;
the encoding end encodes the residual video to generate a video stream;
the encoding end inputs the semantic features into a preset compression model to generate a feature stream.
In a fourth aspect, an embodiment of the present application provides a human-machine vision coding apparatus based on feedback optimization, the apparatus including:
a pixel feature acquisition module, used for the encoding end to acquire pixel features corresponding to the target video;
a semantic feature acquisition module, used for the encoding end to input the pixel features into a preset prediction model to generate semantic features;
a first stream generation module, used for the encoding end to generate a video stream and a feature stream based on the semantic features;
a video generation module, used for the decoding end to generate a decoded video based on the encoded feature stream and video stream;
a first bit-rate generation module, used for the decoding end to generate a bit-rate parameter and send it to the encoding end when a parameter adjustment instruction input by the client is received;
a bit-rate acquisition module, used for the encoding end to obtain the current bit rate;
a second bit-rate generation module, used for the encoding end to adjust the current bit rate based on the bit-rate parameter to generate an adjusted bit rate;
a second stream generation module, used for the encoding end to enhance the video stream and the feature stream based on the adjusted bit rate to generate an enhanced video stream and an enhanced feature stream;
a model update module, used for the decoding end to update the encoding-end model based on the enhanced video stream and the enhanced feature stream, the encoding-end model including a prediction model and a generation model.
Optionally, the apparatus further includes:
a video acquisition module, used for the encoding end to collect image frames through a camera to generate a target video.
Optionally, the first stream generation module includes:
a first video generation unit, used for the encoding end to input the semantic features into a preset generation model to generate a reconstructed video;
a second video generation unit, used for the encoding end to subtract the reconstructed video from the target video to generate a residual video;
a video stream generation unit, used for the encoding end to encode the residual video to generate a video stream;
a feature stream generation unit, used for the encoding end to input the semantic features into a preset compression model to generate a feature stream.
The technical solutions provided by the embodiments of the present application may include the following beneficial effects:
In the embodiments of the present application, the encoding end acquires the pixel features corresponding to the target video; the encoding end inputs the pixel features into a preset prediction model to generate semantic features; the encoding end generates a video stream and a feature stream based on the semantic features; the decoding end generates a decoded video based on the encoded feature stream and video stream; when the decoding end receives a parameter adjustment instruction input by the client, it generates a bit-rate parameter and sends it to the encoding end; the encoding end obtains the current bit rate; the encoding end adjusts the current bit rate based on the bit-rate parameter to generate an adjusted bit rate; the encoding end enhances the video stream and the feature stream based on the adjusted bit rate to generate an enhanced video stream and an enhanced feature stream; and the decoding end updates the encoding-end model based on the enhanced video stream and the enhanced feature stream, the encoding-end model including a prediction model and a generation model. Since this solution supports direct compression and transmission of features in a smaller bitstream, it enables efficient video understanding and analysis, while also supporting feature-based bitstream reconstruction, so that video reconstruction is supported at the same time at a lower cost. Considering the changing bit-rate requirements of practical applications, the present invention implements incremental bit-rate adjustment based on scaling feedback to support understanding, analysis, and video-viewing tasks, and also allows front-end model updates based on existing analysis data and features, improving the performance and efficiency of the model.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present application.
Brief Description of the Drawings
The accompanying drawings herein are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present application, and together with the specification serve to explain the principles of the present application.
FIG. 1 is a schematic flowchart of a human-machine vision coding method based on feedback optimization provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of the collaborative feedback between pixel features and semantic features provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of the scaling feedback between the front end and the back end provided by an embodiment of the present application;
FIG. 4 is a schematic flowchart of a human-machine vision coding method based on feedback optimization applied to the encoding end provided by an embodiment of the present application;
FIG. 5 is a schematic flowchart of a human-machine vision coding method based on feedback optimization applied to the decoding end provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a human-machine vision coding apparatus based on feedback optimization provided by an embodiment of the present application;
FIG. 7 is a schematic structural diagram of another human-machine vision coding apparatus based on feedback optimization provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of a first stream generation module provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of a terminal provided by an embodiment of the present application.
Detailed Description
The following description and drawings fully describe specific embodiments of the present application so that those skilled in the art can practice them.
It should be clear that the described embodiments are only a part of the embodiments of the present application, not all of them. Based on the embodiments of the present application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present application.
When the following description refers to the drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present application as detailed in the appended claims.
In the description of the present application, it should be understood that the terms "first", "second", etc. are used for descriptive purposes only and cannot be understood as indicating or implying relative importance. Those of ordinary skill in the art can understand the specific meanings of the above terms in the present application according to specific circumstances. In addition, in the description of the present application, unless otherwise stated, "a plurality of" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates an "or" relationship between the associated objects before and after it.
So far, in the coding-and-analysis system, in order to reduce bandwidth and storage consumption, the current solution adopts the digital retina architecture and related methods, in which three streams (data, model, and feature) are learned collaboratively to realize joint allocation of front-end and back-end resources and achieve efficient video coding, understanding, and analysis. When analyzing massive amounts of data, this framework has the following shortcomings: (1) independent processing of feature and video streams: for the same set of data, the transmission and utilization of the data stream and the feature stream are separate, so there is redundancy and resources are wasted; (2) one-way data transformation: although the front end and the back end interact, the information flow is essentially one-way, from pixel features to semantic features, i.e., from more information to less; (3) non-scalability: video compression and feature compression are optimized on video data alone and cannot flexibly support switching between the coding and analysis of different types of tasks. To this end, the present application provides a human-machine vision coding method and apparatus based on feedback optimization to solve the above problems in the related art. In the technical solution provided by the present application, since the solution supports direct compression and transmission of features in a smaller bitstream, it enables efficient video understanding and analysis; at the same time it supports feature-based bitstream reconstruction, so video reconstruction is also supported at a lower cost. Considering the changing bit-rate requirements of practical applications, the present invention implements incremental bit-rate adjustment based on scaling feedback to support understanding, analysis, and video-viewing tasks, and also allows front-end model updates based on existing analysis data and features, improving the performance and efficiency of the model.
The human-machine vision coding method based on feedback optimization provided by the embodiments of the present application will be described in detail below with reference to FIG. 1 to FIG. 5. The method can be implemented by a computer program and can run on a feedback-optimization-based human-machine vision coding apparatus based on the von Neumann architecture.
Please refer to FIG. 1, which provides a schematic flowchart of a human-machine vision coding method based on feedback optimization according to an embodiment of the present application. As shown in FIG. 1, the method of this embodiment may include the following steps:
S101: the encoding end acquires pixel features corresponding to the target video.
In the embodiment of the present application, the encoding end first collects image frames at different moments through a camera, and the set of image frames collected within a period of time forms the target video. After the target video is formed, the images are processed according to a pre-saved program to obtain the pixel features of the target video.
S102: the encoding end inputs the pixel features into a preset prediction model to generate semantic features.
In one possible implementation, the pixel features corresponding to the target video are obtained according to step S101. After the pixel features are obtained, they are input into a pre-saved prediction model for processing, which generates the semantic features corresponding to the target video.
For example, at the front end (encoding end), the input video V is passed through the prediction model P(·|θ_p) to extract the features F = {f_i}:
F = P(V, λ|θ_p),
where θ_p is the parameter to be learned, F is a compact feature requiring only a small bitstream for transmission and storage, and λ is a rate-control parameter. The compression model C_F(·|θ_cf) compresses F into a feature stream B_F:
B_F = C_F(F|θ_cf),
where θ_cf is the parameter to be learned.
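To make the data flow of these two steps concrete, the following is a minimal, self-contained Python sketch. The prediction model P and the compression model C_F here are crude numerical stand-ins (per-frame averaging and int16 quantization) rather than the learned networks with parameters θ_p and θ_cf described above; the function names, shapes, and codec choices are illustrative assumptions only.

```python
import numpy as np

def predict_features(video, lam):
    # Stand-in for F = P(V, lambda | theta_p): collapse each frame to a
    # single statistic, scaled by the rate-control parameter lambda.
    return lam * video.reshape(video.shape[0], -1).mean(axis=1)

def compress_features(features):
    # Stand-in for B_F = C_F(F | theta_cf): quantize and serialize.
    return np.round(features * 255).astype(np.int16).tobytes()

video = np.random.rand(8, 64, 64)             # target video V (8 frames, 64x64)
features = predict_features(video, lam=0.5)   # compact semantic features F
feature_stream = compress_features(features)  # feature stream B_F
print(len(feature_stream), "feature bytes vs", video.nbytes, "video bytes")
```

In a real system both functions would be trained networks, with λ steering how much information F retains and hence the size of B_F.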
S103: the encoding end generates a video stream and a feature stream based on the semantic features.
In one possible implementation, the semantic features corresponding to the target video are obtained according to step S102. After the semantic features are obtained, they are input into the preset generation model to generate a reconstructed video; the reconstructed video is then subtracted from the target video to generate a residual video, and the residual video is encoded to generate a video stream. The semantic features corresponding to the target video are also input into the preset compression model to generate a feature stream, which is then encoded to generate the encoded feature stream. Finally, the encoded feature stream and the video stream are sent to the decoding end.
For example, at the front end (encoding end), the extracted features F = {f_i} are input into the generation model to obtain the reconstructed video \hat{V}:
\hat{V} = G(F|θ_g),
where θ_g is the parameter to be learned. The more consistent the generated \hat{V} is with the original video V, the higher the quality of the reconstructed video that can be provided directly from the transmitted F at a lower cost for human viewing.
At the front end (encoding end), the reconstructed video \hat{V} is subtracted from the original video V to obtain the residual video R = V − \hat{V}, which is encoded into the video stream B_V:
B_V = C_V(R|θ_cv),
where C_V(·|θ_cv) is a video compression model and θ_cv is a parameter to be learned.
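A sketch of the residual path of step S103, in the same stand-in style, might look as follows: a generation model G maps the features back to pixels, and only the residual R = V − \hat{V} is coded as the video stream B_V (zlib stands in for the learned video codec C_V; all names and models are assumptions, not the patent's trained components).

```python
import numpy as np
import zlib

def predict_features(video, lam):
    # Stand-in for F = P(V, lambda | theta_p), as in the previous sketch.
    return lam * video.reshape(video.shape[0], -1).mean(axis=1)

def generate_video(features, shape):
    # Stand-in for V_hat = G(F | theta_g): broadcast each frame's feature
    # back to a full frame (a deliberately crude "generation").
    return np.broadcast_to(features[:, None, None], shape).copy()

def encode_residual(residual):
    # Stand-in for B_V = C_V(R | theta_cv): zlib as a placeholder codec.
    return zlib.compress(residual.astype(np.float32).tobytes())

video = np.random.rand(8, 64, 64)                      # target video V
features = predict_features(video, lam=0.5)            # semantic features F
reconstructed = generate_video(features, video.shape)  # reconstructed V_hat
residual = video - reconstructed                       # residual video R
video_stream = encode_residual(residual)               # video stream B_V
# B_V is sent to the decoding end together with the encoded feature stream B_F.
```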
S104: the decoding end generates a decoded video based on the encoded feature stream and video stream.
In one possible implementation, when the encoded feature stream and video stream sent to the decoding end are received, the encoded feature stream is decoded to generate a decoded feature stream; the decoded feature stream is input into the preset generation model to obtain a reconstructed video; the video stream is restored to generate a residual video; and the residual video and the reconstructed video are added to generate the decoded video. Finally, when a parameter adjustment instruction input by the client is received, a bit-rate parameter is generated and sent to the encoding end.
For example, at the back end (decoding end), the feature stream B_F is restored to the features \hat{F}:
\hat{F} = D_F(B_F|θ_df),
where D_F(·|θ_df) is the feature decompression model and θ_df is the parameter to be learned. With only a small amount of computation, \hat{F} can be used by back-end intelligent analysis applications, supporting rapid understanding and analysis.
At the back end (decoding end), the features \hat{F} are input into the generation model to obtain the reconstructed video \hat{V}, which provides a reconstructed video for quick viewing when no video stream has been transmitted:
\hat{V} = G(\hat{F}|θ_g).
At the back end (decoding end), the video stream B_V is restored to the residual video \hat{R}, which is added to the reconstructed video \hat{V} to obtain the decoded video \tilde{V}:
\hat{R} = D_V(B_V|θ_dv),
\tilde{V} = \hat{R} + \hat{V},
where D_V(·|θ_dv) is the video decompression model and θ_dv is a parameter to be learned. The decoded video is used for human viewing of video content and for machine vision applications.
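The decoding-end counterpart of these equations can be sketched as follows, mirroring the stand-in codecs of the encoder sketches above (int16 de-quantization for D_F, zlib for D_V). The dummy byte strings at the top merely emulate streams B_F and B_V arriving from the encoding end; everything here is an illustrative assumption.

```python
import numpy as np
import zlib

def decompress_features(stream):
    # Stand-in for F_hat = D_F(B_F | theta_df): de-quantize.
    return np.frombuffer(stream, dtype=np.int16).astype(np.float64) / 255.0

def generate_video(features, shape):
    # Same stand-in generation model G as on the encoding end.
    return np.broadcast_to(features[:, None, None], shape).copy()

def decode_residual(stream, shape):
    # Stand-in for R_hat = D_V(B_V | theta_dv).
    flat = np.frombuffer(zlib.decompress(stream), dtype=np.float32)
    return flat.reshape(shape).astype(np.float64)

shape = (8, 64, 64)
# Dummy received byte strings standing in for B_F and B_V:
feature_stream = np.round(np.random.rand(8) * 255).astype(np.int16).tobytes()
video_stream = zlib.compress(np.random.rand(*shape).astype(np.float32).tobytes())

f_hat = decompress_features(feature_stream)   # F_hat: usable for analysis tasks
v_hat = generate_video(f_hat, shape)          # quick-view reconstruction V_hat
r_hat = decode_residual(video_stream, shape)  # residual R_hat
decoded = r_hat + v_hat                       # decoded video V_tilde
```

Note how \hat{F} alone already serves analysis and quick viewing; the residual stream is only needed when full-quality viewing is required.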
S105: when the decoding end receives a parameter adjustment instruction input by the client, it generates a bit-rate parameter and sends it to the encoding end.
In one possible implementation, when the quality of the existing features or video cannot meet the application requirements, scaling feedback is initiated from the back end to the front end. According to the bit-rate requirements of the features or video in the actual application, the bit rate is increased incrementally to improve the quality of service for human vision and machine vision applications.
For example, when the existing features and video cannot meet the needs of the back end (decoding end), a new bit-rate parameter λ′ is generated and sent to the front end (encoding end), where enhancement generates a new incremental residual video stream R_U and feature stream ΔF.
S106: the encoding end obtains the current bit rate.
S107: the encoding end adjusts the current bit rate based on the bit-rate parameter to generate an adjusted bit rate.
S108: the encoding end enhances the video stream and the feature stream based on the adjusted bit rate to generate an enhanced video stream and an enhanced feature stream.
S109: the decoding end updates the encoding-end model based on the enhanced video stream and the enhanced feature stream, the encoding-end model including a prediction model and a generation model.
In one possible implementation, based on the stored features and videos, the model parameters are optimized for the current scene, and the model parameters are transmitted, in full or incrementally, to the front end for more efficient video feature extraction and compression.
Specifically, at the front end (encoding end), given the already-encoded features F and the adjusted rate-control parameter λ′, the input video V = {v_i} is passed through the prediction model Q(·|θ_q) for incremental feature extraction:
ΔF = Q(V, F, λ′|θ_q),
where θ_q is the parameter to be learned and ΔF is the incremental feature. ΔF is compressed into the feature stream B_DF:
B_DF = C_DF(ΔF|θ_cdf),
where C_DF(·|θ_cdf) is a feature compression model and θ_cdf is a parameter to be learned.
At the front end (encoding end), the two parts F and ΔF of the updated feature F_U = F + ΔF are input into the generation model to obtain the incremental reconstructed video \hat{V}_U:
\hat{V}_U = H(F, ΔF|θ_h),
where θ_h is the parameter to be learned. The more consistent \hat{V}_U is with the original video V, the higher the quality of the reconstructed video that can be provided directly from the transmitted F and ΔF at a lower cost for human viewing.
At the front end (encoding end), the reconstructed video \hat{V}_U and the first-transmitted residual video R are subtracted from the original video V to obtain the incremental residual video R_U, which is encoded into the video stream B_DV:
B_DV = C_DV(R_U|θ_cdr),
where C_DV(·|θ_cdr) is a video compression model and θ_cdr is a parameter to be learned.
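The encoder side of one scaling-feedback round might be sketched as below: an incremental feature ΔF is extracted under the new rate parameter λ′, an enhanced reconstruction is rebuilt with H, and only the incremental residual R_U is coded. Q, H, and the zlib codec are again illustrative stand-ins for the learned models, and the arithmetic is a toy version of the equations above; the decoding end mirrors this sketch just as in the example after step S104.

```python
import numpy as np
import zlib

def extract_incremental(video, features, lam_new):
    # Stand-in for dF = Q(V, F, lambda' | theta_q): a finer per-frame
    # statistic (the spread) that the coarse feature F did not capture.
    # F is accepted to mirror Q's signature, though this toy ignores it.
    return lam_new * video.reshape(video.shape[0], -1).std(axis=1)

def generate_enhanced(features, delta, shape):
    # Stand-in for V_hat_U = H(F, dF | theta_h): rebuild from F_U = F + dF.
    return np.broadcast_to((features + delta)[:, None, None], shape).copy()

video = np.random.rand(8, 64, 64)                   # original video V
features = 0.5 * video.reshape(8, -1).mean(axis=1)  # previously coded F
prev_recon = np.broadcast_to(features[:, None, None], video.shape)
prev_residual = video - prev_recon                  # first-transmitted R

delta_f = extract_incremental(video, features, lam_new=0.8)               # dF
feature_stream_inc = zlib.compress(delta_f.astype(np.float32).tobytes())  # B_DF

v_hat_u = generate_enhanced(features, delta_f, video.shape)               # V_hat_U
r_u = video - v_hat_u - prev_residual               # incremental residual R_U
video_stream_inc = zlib.compress(r_u.astype(np.float32).tobytes())        # B_DV
```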
At the back end (decoding end), the feature stream B_DF is decoded into the incremental features Δ\hat{F}:
Δ\hat{F} = D_DF(B_DF|θ_ddf),
where D_DF(·|θ_ddf) is the incremental feature decompression model and θ_ddf is a parameter to be learned. Δ\hat{F} is used to improve the accuracy of back-end intelligent analysis applications.
At the back end (decoding end), the features \hat{F} and Δ\hat{F} are input into the generation model to obtain the incremental reconstructed video \hat{V}_U, which provides a higher-quality reconstructed video for quick viewing when no incremental video stream has been transmitted:
\hat{V}_U = H(\hat{F}, Δ\hat{F}|θ_h).
At the back end (decoding end), the video stream B_DV is restored to the incremental residual video \hat{R}_U, which is added to the reconstructed video \hat{V}, the incremental reconstructed video \hat{V}_U, and the previously transmitted residual video \hat{R} to obtain the updated decoded video \tilde{V}_U:
\hat{R}_U = D_DV(B_DV|θ_ddv),
\tilde{V}_U = \hat{R}_U + \hat{V} + \hat{V}_U + \hat{R},
where D_DV(·|θ_ddv) is the video decompression model and θ_ddv is a parameter to be learned. The updated decoded video is used for fine-grained viewing of video content.
At the back end (decoding end), the front-end model is adjusted according to the decoded video \tilde{V}_U and the features \hat{F} and Δ\hat{F}, generating the model change ΔM.
At the front end (encoding end), the model is updated:
M′ = ΔM + M.
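Finally, the model-update feedback M′ = ΔM + M can be sketched as follows. Here the "model" is a plain parameter array and the update rule is an arbitrary stand-in; the patent leaves the concrete computation of ΔM to the learned system, so nothing below should be read as the actual update procedure.

```python
import numpy as np

def compute_model_delta(decoded_video, features, model):
    # Stand-in for the back-end computation of dM from the decoded video
    # V_tilde_U and the decoded features: nudge the front-end parameters
    # toward the statistics of the current scene.
    scene_stat = decoded_video.mean() + features.mean()
    return 0.1 * (scene_stat - model)

model = np.zeros(4)                        # current front-end model M (toy)
decoded_video = np.random.rand(8, 64, 64)  # V_tilde_U held at the back end
features = np.random.rand(8)               # decoded features at the back end
delta_m = compute_model_delta(decoded_video, features, model)  # dM
model = model + delta_m                    # M' = dM + M, applied at front end
```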
In the embodiment of the present application, as shown for example in FIG. 2 and FIG. 3, two feedback mechanisms, namely the collaborative feedback between pixel features and semantic features and the scaling feedback between the back end and the front end, are used to achieve joint optimization of the data, feature, and model streams. The collaborative feedback between pixel features and semantic features realizes flexible conversion between pixel features and semantic features through the prediction and generation models, effectively maps semantic features to pixel features, improves the coding efficiency of the framework, supports flexible and scalable applications, and efficiently serves both human vision and machine vision. The scaling feedback between the back end and the front end allows the back end (decoding end) to initiate scaling feedback when the reconstruction accuracy of the encoding fails to meet application requirements, so that the front end (encoding end) incrementally provides bitstreams, improving the quality of the features and video decoded at the back end (decoding end) and thus the application performance.
In the embodiment of the present application, the encoding end acquires the pixel features corresponding to the target video; the encoding end inputs the pixel features into a preset prediction model to generate semantic features; the encoding end generates a video stream and a feature stream based on the semantic features; the decoding end generates a decoded video based on the encoded feature stream and video stream; when the decoding end receives a parameter adjustment instruction input by the client, it generates a bit-rate parameter and sends it to the encoding end; the encoding end obtains the current bit rate; the encoding end adjusts the current bit rate based on the bit-rate parameter to generate an adjusted bit rate; the encoding end enhances the video stream and the feature stream based on the adjusted bit rate to generate an enhanced video stream and an enhanced feature stream; and the decoding end updates the encoding-end model based on the enhanced video stream and the enhanced feature stream, the encoding-end model including a prediction model and a generation model. Since this solution supports direct compression and transmission of features in a smaller bitstream, it enables efficient video understanding and analysis, while also supporting feature-based bitstream reconstruction, so that video reconstruction is supported at the same time at a lower cost. Considering the changing bit-rate requirements of practical applications, the present invention implements incremental bit-rate adjustment based on scaling feedback to support understanding, analysis, and video-viewing tasks, and also allows front-end model updates based on existing analysis data and features, improving the performance and efficiency of the model.
Please refer to FIG. 4, which provides a schematic flowchart of a human-machine vision coding method based on feedback optimization applied to the encoding end according to an embodiment of the present application. As shown in FIG. 4, the method of this embodiment may include the following steps:
S201: collecting image frames through a camera to generate a target video;
S202: acquiring pixel features corresponding to the target video;
S203: inputting the pixel features into a preset prediction model to generate semantic features;
S204: inputting the semantic features into a preset generation model to generate a reconstructed video;
S205: subtracting the reconstructed video from the target video to generate a residual video;
S206: encoding the residual video to generate a video stream;
S207: inputting the semantic features into a preset compression model to generate a feature stream;
S208: encoding the feature stream to generate an encoded feature stream;
S209: sending the encoded feature stream and the video stream to the decoding end.
In the embodiment of the present application, the encoding end acquires the pixel features corresponding to the target video; the encoding end inputs the pixel features into a preset prediction model to generate semantic features; the encoding end generates a video stream and a feature stream based on the semantic features; the decoding end generates a decoded video based on the encoded feature stream and video stream; when the decoding end receives a parameter adjustment instruction input by the client, it generates a bit-rate parameter and sends it to the encoding end; the encoding end obtains the current bit rate; the encoding end adjusts the current bit rate based on the bit-rate parameter to generate an adjusted bit rate; the encoding end enhances the video stream and the feature stream based on the adjusted bit rate to generate an enhanced video stream and an enhanced feature stream; and the decoding end updates the encoding-end model based on the enhanced video stream and the enhanced feature stream, the encoding-end model including a prediction model and a generation model. Since this solution supports direct compression and transmission of features in a smaller bitstream, it enables efficient video understanding and analysis, while also supporting feature-based bitstream reconstruction, so that video reconstruction is supported at the same time at a lower cost. Considering the changing bit-rate requirements of practical applications, the present invention implements incremental bit-rate adjustment based on scaling feedback to support understanding, analysis, and video-viewing tasks, and also allows front-end model updates based on existing analysis data and features, improving the performance and efficiency of the model.
Please refer to FIG. 5, which provides a schematic flowchart of a human-machine vision coding method based on feedback optimization applied to the decoding end according to an embodiment of the present application. As shown in FIG. 5, the method of this embodiment may include the following steps:
S301: when the encoded feature stream and video stream sent to the decoding end are received, acquiring the encoded feature stream and video stream;
S302: decoding the encoded feature stream to generate a decoded feature stream;
S303: inputting the decoded feature stream into a preset generation model to obtain a reconstructed video;
S304: restoring the video stream to generate a residual video;
S305: adding the residual video and the reconstructed video to generate a decoded video;
S306: when a parameter adjustment instruction input by the client is received, generating a bit-rate parameter and sending it to the encoding end.
In the embodiment of the present application, the encoding end acquires the pixel features corresponding to the target video; the encoding end inputs the pixel features into a preset prediction model to generate semantic features; the encoding end generates a video stream and a feature stream based on the semantic features; the decoding end generates a decoded video based on the encoded feature stream and video stream; when the decoding end receives a parameter adjustment instruction input by the client, it generates a bit-rate parameter and sends it to the encoding end; the encoding end obtains the current bit rate; the encoding end adjusts the current bit rate based on the bit-rate parameter to generate an adjusted bit rate; the encoding end enhances the video stream and the feature stream based on the adjusted bit rate to generate an enhanced video stream and an enhanced feature stream; and the decoding end updates the encoding-end model based on the enhanced video stream and the enhanced feature stream, the encoding-end model including a prediction model and a generation model. Since this solution supports direct compression and transmission of features in a smaller bitstream, it enables efficient video understanding and analysis, while also supporting feature-based bitstream reconstruction, so that video reconstruction is supported at the same time at a lower cost. Considering the changing bit-rate requirements of practical applications, the present invention implements incremental bit-rate adjustment based on scaling feedback to support understanding, analysis, and video-viewing tasks, and also allows front-end model updates based on existing analysis data and features, improving the performance and efficiency of the model.
The following are apparatus embodiments of the present application, which can be used to execute the method embodiments of the present application. For details not disclosed in the apparatus embodiments of the present application, please refer to the method embodiments of the present application.
Please refer to FIG. 6, which shows a schematic structural diagram of a human-machine vision coding apparatus based on feedback optimization provided by an exemplary embodiment of the present application. The feedback-optimization-based human-machine vision coding apparatus can be implemented as all or a part of a terminal through software, hardware, or a combination of the two. The apparatus 1 includes a pixel feature acquisition module 10, a semantic feature acquisition module 20, a first stream generation module 30, a video generation module 40, a first bit-rate generation module 50, a bit-rate acquisition module 60, a second bit-rate generation module 70, a second stream generation module 80, and a model update module 90.
The pixel feature acquisition module 10 is used for the encoding end to acquire pixel features corresponding to the target video.
The semantic feature acquisition module 20 is used for the encoding end to input the pixel features into a preset prediction model to generate semantic features.
The first stream generation module 30 is used for the encoding end to generate a video stream and a feature stream based on the semantic features.
The video generation module 40 is used for the decoding end to generate a decoded video based on the encoded feature stream and video stream.
The first bit-rate generation module 50 is used for the decoding end to generate a bit-rate parameter and send it to the encoding end when a parameter adjustment instruction input by the client is received.
The bit-rate acquisition module 60 is used for the encoding end to obtain the current bit rate.
The second bit-rate generation module 70 is used for the encoding end to adjust the current bit rate based on the bit-rate parameter to generate an adjusted bit rate.
The second stream generation module 80 is used for the encoding end to enhance the video stream and the feature stream based on the adjusted bit rate to generate an enhanced video stream and an enhanced feature stream.
The model update module 90 is used for the decoding end to update the encoding-end model based on the enhanced video stream and the enhanced feature stream, the encoding-end model including a prediction model and a generation model.
Optionally, as shown in FIG. 7, the apparatus 1 further includes:
a video acquisition module 100, used for the encoding end to collect image frames through a camera to generate a target video.
Optionally, as shown in FIG. 8, the first stream generation module 30 includes:
a first video generation unit 310, used for the encoding end to input the semantic features into a preset generation model to generate a reconstructed video;
a second video generation unit 320, used for the encoding end to subtract the reconstructed video from the target video to generate a residual video;
a video stream generation unit 330, used for the encoding end to encode the residual video to generate a video stream;
a feature stream generation unit 340, used for the encoding end to input the semantic features into a preset compression model to generate a feature stream.
It should be noted that when the feedback-optimization-based human-machine vision coding apparatus provided by the above embodiment performs the feedback-optimization-based human-machine vision coding method, the division into the above functional modules is only used as an example; in practical applications, the above functions can be allocated to different functional modules as required, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the feedback-optimization-based human-machine vision coding apparatus provided by the above embodiment and the embodiments of the feedback-optimization-based human-machine vision coding method belong to the same concept; for details of the implementation process, refer to the method embodiments, which will not be repeated here.
The serial numbers of the above embodiments of the present application are for description only and do not represent the superiority or inferiority of the embodiments.
In the embodiment of the present application, the encoding end acquires the pixel features corresponding to the target video; the encoding end inputs the pixel features into a preset prediction model to generate semantic features; the encoding end generates a video stream and a feature stream based on the semantic features; the decoding end generates a decoded video based on the encoded feature stream and video stream; when the decoding end receives a parameter adjustment instruction input by the client, it generates a bit-rate parameter and sends it to the encoding end; the encoding end obtains the current bit rate; the encoding end adjusts the current bit rate based on the bit-rate parameter to generate an adjusted bit rate; the encoding end enhances the video stream and the feature stream based on the adjusted bit rate to generate an enhanced video stream and an enhanced feature stream; and the decoding end updates the encoding-end model based on the enhanced video stream and the enhanced feature stream, the encoding-end model including a prediction model and a generation model. Since this solution supports direct compression and transmission of features in a smaller bitstream, it enables efficient video understanding and analysis, while also supporting feature-based bitstream reconstruction, so that video reconstruction is supported at the same time at a lower cost. Considering the changing bit-rate requirements of practical applications, the present invention implements incremental bit-rate adjustment based on scaling feedback to support understanding, analysis, and video-viewing tasks, and also allows front-end model updates based on existing analysis data and features, improving the performance and efficiency of the model.
The present application further provides a computer-readable medium on which program instructions are stored; when the program instructions are executed by a processor, the feedback-optimization-based human-machine vision encoding method provided by the above method embodiments is implemented.
The present application further provides a computer program product containing instructions which, when run on a computer, causes the computer to perform the feedback-optimization-based human-machine vision encoding method described in the above method embodiments.
Referring to Fig. 9, which is a schematic structural diagram of a terminal provided by an embodiment of the present application. As shown in Fig. 9, the terminal 1000 may include at least one processor 1001, at least one network interface 1004, a user interface 1003, a memory 1005, and at least one communication bus 1002.
The communication bus 1002 is used to realize connection and communication among these components.
The user interface 1003 may include a display and a camera; optionally, the user interface 1003 may further include standard wired and wireless interfaces.
The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface).
The processor 1001 may include one or more processing cores. The processor 1001 connects the various parts of the electronic device 1000 through various interfaces and lines, and executes the various functions of the electronic device 1000 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 1005 and by invoking data stored in the memory 1005. Optionally, the processor 1001 may be implemented in at least one of the hardware forms of digital signal processing (DSP), field-programmable gate array (FPGA), and programmable logic array (PLA). The processor 1001 may integrate one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like; the CPU mainly handles the operating system, the user interface, application programs, and so on; the GPU is responsible for rendering and drawing the content to be displayed on the display; and the modem handles wireless communication. It can be understood that the modem may also be implemented in a separate chip instead of being integrated into the processor 1001.
The memory 1005 may include random access memory (RAM) or read-only memory (ROM). Optionally, the memory 1005 includes a non-transitory computer-readable storage medium. The memory 1005 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 1005 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing the operating system, instructions for at least one function (such as a touch function, a sound playback function, or an image playback function), instructions for implementing the above method embodiments, and so on; the data storage area may store the data involved in the above method embodiments. Optionally, the memory 1005 may also be at least one storage device located remotely from the aforementioned processor 1001. As shown in Fig. 9, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a feedback-optimization-based human-machine vision encoding application.
In the terminal 1000 shown in Fig. 9, the user interface 1003 is mainly used to provide an input interface for the user and obtain the data input by the user, while the processor 1001 may be used to invoke the feedback-optimization-based human-machine vision encoding application stored in the memory 1005 and specifically perform the following operations (a code sketch of the model-update step follows the list):
the encoder obtains the pixel features corresponding to the target video;
the encoder inputs the pixel features into a preset prediction model to generate semantic features;
the encoder generates a video stream and a feature stream based on the semantic features;
the decoder generates a decoded video based on the encoded feature stream and the video stream;
when the decoder receives a parameter adjustment instruction input by the client, it generates a bitrate parameter and sends it to the encoder;
the encoder obtains the current bitrate;
the encoder adjusts the current bitrate based on the bitrate parameter to generate an adjusted bitrate;
the encoder enhances the video stream and the feature stream based on the adjusted bitrate, generating an enhanced video stream and an enhanced feature stream;
the decoder updates the encoder-side models based on the enhanced video stream and the enhanced feature stream, the encoder-side models including the prediction model and the generative model.
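Finally, a hedged sketch of the last operation above, the update of an encoder-side model driven by the enhanced streams. The linear prediction model and the least-squares gradient step are assumptions made purely for illustration; the patent leaves the model architecture and the update rule open.

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(size=(16, 4)) * 0.1   # toy prediction model: pixel features -> semantics

    def update_prediction_model(pixel_features, enhanced_semantics, lr=0.01):
        # One gradient step on ||pixel_features @ W - enhanced_semantics||^2,
        # treating the decoded enhanced feature stream as the supervision signal.
        global W
        error = pixel_features @ W - enhanced_semantics
        W -= lr * pixel_features.T @ error / len(pixel_features)

    pixel_features = rng.normal(size=(32, 16))
    # Stand-in for semantics recovered from the enhanced feature stream:
    enhanced_semantics = pixel_features @ (rng.normal(size=(16, 4)) * 0.1)
    for _ in range(200):
        update_prediction_model(pixel_features, enhanced_semantics)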
In one embodiment, before performing the operation in which the encoder obtains the pixel features corresponding to the target video, the processor 1001 further performs the following operation:
the encoder captures image frames with a camera to generate the target video.
In one embodiment, when performing the operation in which the encoder generates a video stream and a feature stream based on the semantic features, the processor 1001 specifically performs the following operations:
the encoder inputs the semantic features into a preset generative model to generate a reconstructed video;
the encoder subtracts the reconstructed video from the target video to generate a residual video;
the encoder encodes the residual video to generate a video stream;
the encoder inputs the semantic features into a preset compression model to generate a feature stream.
Those skilled in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functions in different ways for each particular application, but such implementations should not be considered beyond the scope of the present application. Those skilled in the art can also clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, devices, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the embodiments disclosed herein, it should be understood that the disclosed methods and products (including but not limited to devices and equipment) may be implemented in other ways. For example, the device embodiments described above are merely illustrative; for instance, the division into units is only a logical functional division, and there may be other divisions in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or in other forms. The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments. In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
It should be understood that the flowcharts and block diagrams in the drawings show possible architectures, functions, and operations of systems, methods, and computer program products according to multiple embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions. The present application is not limited to the processes and structures that have been described above and shown in the drawings, and various modifications and changes may be made without departing from its scope. The scope of the present application is limited only by the appended claims.

Claims (10)

  1. A feedback-optimization-based human-machine vision encoding method, applied at an encoder, characterized in that the method comprises:
    capturing image frames with a camera to generate a target video;
    obtaining pixel features corresponding to the target video;
    inputting the pixel features into a preset prediction model to generate semantic features;
    generating a video stream based on the semantic features;
    inputting the semantic features into a preset compression model to generate a feature stream;
    encoding the feature stream to generate an encoded feature stream; and
    sending the encoded feature stream and the video stream to a decoder.
  2. The method according to claim 1, characterized in that the generating a video stream based on the semantic features comprises:
    inputting the semantic features into a preset generative model to generate a reconstructed video;
    subtracting the reconstructed video from the target video to generate a residual video; and
    encoding the residual video to generate a video stream.
  3. A feedback-optimization-based human-machine vision encoding method, applied at a decoder, characterized in that the method comprises:
    upon receiving an encoded feature stream and a video stream sent to the decoder, obtaining the encoded feature stream and the video stream;
    generating a decoded video based on the encoded feature stream and the video stream; and
    upon receiving a parameter adjustment instruction input by a client, generating a bitrate parameter and sending it to the encoder.
  4. The method according to claim 3, characterized in that the generating a decoded video based on the encoded feature stream and the video stream comprises:
    decoding the encoded feature stream to generate a decoded feature stream;
    inputting the decoded feature stream into a preset generative model to obtain a reconstructed video;
    restoring the video stream to generate a residual video; and
    adding the residual video and the reconstructed video to generate a decoded video.
  5. A feedback-optimization-based human-machine vision encoding method, characterized in that the method comprises:
    an encoder obtaining pixel features corresponding to a target video;
    the encoder inputting the pixel features into a preset prediction model to generate semantic features;
    the encoder generating a video stream and a feature stream based on the semantic features;
    a decoder generating a decoded video based on the encoded feature stream and the video stream;
    the decoder, upon receiving a parameter adjustment instruction input by a client, generating a bitrate parameter and sending it to the encoder;
    the encoder obtaining a current bitrate;
    the encoder adjusting the current bitrate based on the bitrate parameter to generate an adjusted bitrate;
    the encoder enhancing the video stream and the feature stream based on the adjusted bitrate, generating an enhanced video stream and an enhanced feature stream; and
    the decoder updating encoder-side models based on the enhanced video stream and the enhanced feature stream, the encoder-side models comprising the prediction model and a generative model.
  6. The method according to claim 5, characterized in that, before the encoder obtains the pixel features corresponding to the target video, the method further comprises:
    the encoder capturing image frames with a camera to generate the target video.
  7. The method according to claim 5, characterized in that the encoder generating a video stream and a feature stream based on the semantic features comprises:
    the encoder inputting the semantic features into a preset generative model to generate a reconstructed video;
    the encoder subtracting the reconstructed video from the target video to generate a residual video;
    the encoder encoding the residual video to generate a video stream; and
    the encoder inputting the semantic features into a preset compression model to generate a feature stream.
  8. A feedback-optimization-based human-machine vision encoding device, characterized in that the device comprises:
    a pixel feature obtaining module, configured for an encoder to obtain pixel features corresponding to a target video;
    a semantic feature obtaining module, configured for the encoder to input the pixel features into a preset prediction model to generate semantic features;
    a first stream generation module, configured for the encoder to generate a video stream and a feature stream based on the semantic features;
    a video generation module, configured for a decoder to generate a decoded video based on the encoded feature stream and the video stream;
    a first bitrate generation module, configured for the decoder, upon receiving a parameter adjustment instruction input by a client, to generate a bitrate parameter and send it to the encoder;
    a bitrate obtaining module, configured for the encoder to obtain a current bitrate;
    a second bitrate generation module, configured for the encoder to adjust the current bitrate based on the bitrate parameter to generate an adjusted bitrate;
    a second stream generation module, configured for the encoder to enhance the video stream and the feature stream based on the adjusted bitrate, generating an enhanced video stream and an enhanced feature stream; and
    a model update module, configured for the decoder to update encoder-side models based on the enhanced video stream and the enhanced feature stream, the encoder-side models comprising the prediction model and a generative model.
  9. The device according to claim 8, characterized in that the device further comprises:
    a video capture module, configured for the encoder to capture image frames with a camera to generate the target video.
  10. The device according to claim 8, characterized in that the first stream generation module comprises:
    a first video generation unit, configured for the encoder to input the semantic features into a preset generative model to generate a reconstructed video;
    a second video generation unit, configured for the encoder to subtract the reconstructed video from the target video to generate a residual video;
    a video stream generation unit, configured for the encoder to encode the residual video to generate a video stream; and
    a feature stream generation unit, configured for the encoder to input the semantic features into a preset compression model to generate a feature stream.
PCT/CN2020/099511 2020-01-09 2020-06-30 Human-machine vision encoding method and device based on feedback optimization WO2021139114A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010020628.3A CN111163318B (zh) 2020-01-09 2020-01-09 Human-machine vision encoding method and device based on feedback optimization
CN202010020628.3 2020-01-09

Publications (1)

Publication Number Publication Date
WO2021139114A1 true WO2021139114A1 (zh) 2021-07-15

Family

ID=70562225

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/099511 WO2021139114A1 (zh) 2020-01-09 2020-06-30 Human-machine vision encoding method and device based on feedback optimization

Country Status (2)

Country Link
CN (1) CN111163318B (zh)
WO (1) WO2021139114A1 (zh)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111163318B (zh) * 2020-01-09 2021-05-04 北京大学 Human-machine vision encoding method and device based on feedback optimization
CN112351252B (zh) * 2020-10-27 2023-10-20 重庆中星微人工智能芯片技术有限公司 Surveillance video encoding and decoding device
CN112383778B (zh) * 2020-11-12 2023-03-17 三星电子(中国)研发中心 Video encoding method and device, and decoding method and device
CN114157863B (zh) * 2022-02-07 2022-07-22 浙江智慧视频安防创新中心有限公司 Digital-retina-based video encoding method, system, and storage medium
CN114630129A (zh) * 2022-02-07 2022-06-14 浙江智慧视频安防创新中心有限公司 Video encoding and decoding method and device based on an intelligent digital retina
CN116708843B (zh) * 2023-08-03 2023-10-31 清华大学 User quality-of-experience feedback adjustment system in semantic communication
CN116743609B (zh) * 2023-08-14 2023-10-17 清华大学 QoE evaluation method and device for video streaming media based on semantic communication

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105530449A (zh) * 2014-09-30 2016-04-27 阿里巴巴集团控股有限公司 Coding parameter adjustment method and device
CN107483969A (zh) * 2017-09-19 2017-12-15 上海爱优威软件开发有限公司 PCA-based data transmission method and system
CN108882020A (zh) * 2017-05-15 2018-11-23 北京大学 Video information processing method, device, and system
CN109218727A (zh) * 2017-06-30 2019-01-15 华为软件技术有限公司 Video processing method and device
US10419773B1 (en) * 2018-03-22 2019-09-17 Amazon Technologies, Inc. Hybrid learning for adaptive video grouping and compression
CN110278050A (zh) * 2018-03-13 2019-09-24 中兴通讯股份有限公司 Feedback tuning method and device for beyond-100G WDM transmission systems
CN110381268A (zh) * 2019-06-25 2019-10-25 深圳前海达闼云端智能科技有限公司 Video generation method, device, storage medium, and electronic equipment
CN111163318A (zh) * 2020-01-09 2020-05-15 北京大学 Human-machine vision encoding method and device based on feedback optimization


Also Published As

Publication number Publication date
CN111163318B (zh) 2021-05-04
CN111163318A (zh) 2020-05-15

Similar Documents

Publication Publication Date Title
WO2021139114A1 (zh) Human-machine vision encoding method and device based on feedback optimization
WO2021208247A1 (zh) Mimetic compression method and device for video images, storage medium, and terminal
WO2021068598A1 (zh) Screen-sharing encoding method and device, storage medium, and electronic equipment
WO2023016155A1 (zh) Image processing method and device, medium, and electronic equipment
CN111652830A (zh) Image processing method and device, computer-readable medium, and terminal equipment
CN101883284B (zh) Video encoding/decoding method and system based on background modeling and optional differential mode
WO2021174878A1 (zh) Video encoding method and device, computer equipment, and storage medium
CN111586412B (zh) High-definition video processing method, master device, slave device, and chip system
WO2019228207A1 (zh) Image encoding and decoding method, related device, and storage medium
US20210006840A1 (en) Techniques and apparatus for pcm patch creation using morton codes
US11587263B2 (en) Method and apparatus for enhanced patch boundary identification for point cloud compression
CN103152573A (zh) Method and system for image frame transmission between a mobile terminal and a smart TV
CN110827380A (zh) Image rendering method and device, electronic equipment, and computer-readable medium
US10904579B2 (en) Method and apparatus for annealing iterative geometry smoothing
CN105208394B (zh) Real-time digital image compression prediction method and system
WO2016202285A1 (zh) Instant video transmission method and electronic equipment
WO2013174337A2 (zh) Subtitle extraction method and device
WO2018107715A1 (zh) Method and system for reducing the cost of an agent management system
WO2023024832A1 (zh) Data processing method and device, computer equipment, and storage medium
WO2023104186A1 (zh) Efficient and low-cost cloud gaming system
WO2022179600A1 (zh) Video encoding method, video decoding method, device, and electronic equipment
JP6216046B2 (ja) Automatic codec adaptation
CN114707646B (zh) Distributed artificial intelligence practice platform based on remote inference
CN103096086A (zh) Method for system optimization by moving downsampling forward in multi-picture display
CN116546220A (zh) Human-machine hybrid video encoding and decoding method, system, equipment, and medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20912767; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20912767; Country of ref document: EP; Kind code of ref document: A1)