WO2022205060A1 - Method and apparatus for determining an image processing mode - Google Patents

Method and apparatus for determining an image processing mode

Info

Publication number
WO2022205060A1
Authority
WO
WIPO (PCT)
Prior art keywords
parameter
image
value
relationship
probability
Prior art date
Application number
PCT/CN2021/084377
Other languages
English (en)
French (fr)
Inventor
林永兵
马莎
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to PCT/CN2021/084377 (WO2022205060A1)
Priority to CN202180001348.0A (CN113228657B)
Publication of WO2022205060A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/154: Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression

Definitions

  • the present application relates to the field of image technology, and in particular, to a method and device for determining an image processing method.
  • the camera has the characteristics of high resolution, non-contact, convenient use and low cost, and is an essential sensor for environmental perception of autonomous driving. More and more cameras can be installed on the vehicle. During automatic driving, the camera collects images in the environment and performs machine vision processing to identify obstacles or targets in the environment, so as to achieve no blind spot coverage.
  • FIG. 1a is a schematic diagram of data transmission in a compression-based sensing system in the related art.
  • the perception system includes a camera and an image signal processor (ISP); the perception system transmits the processed image data to the mobile data center (MDC), where the data is further processed.
  • the Bayer RAW image output by the camera is processed by the ISP and sent to the MDC, and the MDC performs machine vision processing on the image processed by the ISP.
  • the Bayer RAW image output by the camera in Figure 1a can be an ultra high definition (UHD) image with a resolution of 4K, a frame rate of 30 fps, and a bit depth of 16 bits.
  • in this case the bandwidth requirement is up to about 4 Gbps (4K × 2K pixels × 30 fps × 16 bits).
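  • The arithmetic behind the 4 Gbps figure can be checked with a short sketch. It assumes that "4K*2k" means 4096 × 2048 pixels per frame and that each Bayer RAW pixel carries one 16-bit sample; the function and constant names are illustrative only.

```python
# Assumption: "4K*2k" is read as 4096 x 2048 pixels per frame.
WIDTH_PX = 4096
HEIGHT_PX = 2048
FRAME_RATE_FPS = 30
BIT_DEPTH = 16  # one 16-bit sample per Bayer RAW pixel


def raw_bandwidth_bps(width, height, fps, bit_depth):
    """Uncompressed bandwidth in bits per second for single-sample-per-pixel frames."""
    return width * height * fps * bit_depth


bandwidth = raw_bandwidth_bps(WIDTH_PX, HEIGHT_PX, FRAME_RATE_FPS, BIT_DEPTH)
print(f"{bandwidth / 1e9:.2f} Gbps")  # prints "4.03 Gbps"
```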
  • the method of compressing and transmitting images can be used to reduce bandwidth requirements, and new services of UHD video transmission can be carried out without upgrading the existing network.
  • FIG. 1b shows a schematic diagram of a video compression architecture according to an example in the related art.
  • the camera outputs an image in RAW format, which an encoder encodes into a compressed image in RAW format.
  • the compressed image can be transmitted to the MDC, which can include a decoder, an ISP, and a deep neural network.
  • the decoder decodes the received compressed image to obtain the decoded image, which the ISP then processes into an image in three-primary-color (Red Green Blue, RGB) or YUV format.
  • that image is passed to the deep neural network for further processing.
  • the ISP processing may include: a demosaic (Demosaic) operation for converting an image from RAW format to RGB format; a white balance (WB) operation for performing white balance processing on an image; a color correction matrix (Color Correction Matrix, CCM) operation for converting from the sensor_RGB color space to the sRGB color space, so that the color matching characteristics of the camera meet the Luther condition; and Gamma (Gamma) correction, used to correct the nonlinear relationship between the display characteristics of the display and the input image.
  • the ISP processing may also include other processing procedures for images, and the present application is not limited to the above-mentioned processing.
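  • As an illustration of the order of the ISP stages listed above, the following minimal sketch applies white balance, a color correction matrix, and gamma correction to a single already-demosaiced RGB pixel with components in [0, 1]. It is not the ISP of the application: demosaicing is omitted, and the gain values, matrix values, and the gamma of 2.2 are hypothetical.

```python
def white_balance(rgb, gains):
    """Scale each channel by its white-balance gain."""
    return tuple(c * g for c, g in zip(rgb, gains))


def apply_ccm(rgb, ccm):
    """Multiply the pixel by a 3x3 color correction matrix (sensor RGB -> sRGB)."""
    return tuple(sum(row[i] * rgb[i] for i in range(3)) for row in ccm)


def gamma_correct(rgb, gamma=2.2):
    """Clamp to [0, 1] and apply the power-law encoding c ** (1 / gamma)."""
    return tuple(min(max(c, 0.0), 1.0) ** (1.0 / gamma) for c in rgb)


def isp_pixel(rgb, gains, ccm, gamma=2.2):
    """White balance, then color correction, then gamma correction."""
    return gamma_correct(apply_ccm(white_balance(rgb, gains), ccm), gamma)


# Hypothetical parameters, for illustration only.
example_gains = (1.8, 1.0, 1.5)
example_ccm = [
    [1.5, -0.3, -0.2],
    [-0.2, 1.4, -0.2],
    [-0.1, -0.4, 1.5],
]
print(isp_pixel((0.2, 0.3, 0.1), example_gains, example_ccm))
```

  • Each row of the hypothetical matrix sums to 1 so that neutral gray stays neutral after color correction, which is a common design choice rather than a requirement stated in the application.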
  • the processing of the image by the deep neural network can include: image recognition, segmentation, etc.
  • the example shown in Figure 1b compresses in the RAW domain, which can reduce the delay from the perception system to the MDC; because the ISP and the deep neural network shown in Figure 1b are located in the MDC, the architecture can also provide more flexible ISP capabilities and better image quality.
  • lossy image/video compression technology can achieve higher compression rates.
  • Commonly used lossy compression standards include: Joint Photographic Experts Group (JPEG), H264/H265, JPEG-XS (Joint Photographic Experts Group Extra Speed), etc.
  • JPEG-XS is a new compression standard proposed by the Joint Photographic Experts Group.
  • the image quality damage caused by the introduction of compression technology is inevitable, and the image quality damage will have an impact on subsequent machine vision processing, which may lead to problems such as a decrease in the accuracy of recognition and inaccurate image segmentation.
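  • in the embodiments of the present application, a probability index is used to characterize the degree of machine losslessness: a probability value is calculated for each code rate, and the code rate that satisfies machine lossless is determined according to the probability threshold required by the business.
  • in this way, a code rate threshold that satisfies machine lossless can be given reasonably.
  • in a first aspect, an embodiment of the present application provides a method for determining an image processing mode, the method comprising: calculating a probability corresponding to each value of a first parameter according to a first relationship between the first parameter and the precision, where the first parameter is a bit rate or a degree of distortion; determining a second relationship between the first parameter and the probability; and obtaining, according to a probability threshold required by the business and the second relationship, the threshold of the first parameter corresponding to the probability threshold.
  • the precision is the accuracy of recognizing the processed image, and the probability corresponding to each value of the first parameter indicates how close the accuracy of recognizing the processed image corresponding to that value is to the first accuracy; the first accuracy is the accuracy of recognizing the original image, and the processed image is obtained by processing the original image.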
  • the probability threshold required by the business refers to how close the recognition accuracy must be to machine lossless in a given application scenario.
  • the application scenarios can be automatic driving, assisted driving, and so on.
  • different application scenarios may have different requirements for closeness to lossless, so each application scenario has its own probability threshold required by the business.
  • in the method, a probability index characterizes the degree of machine losslessness, a probability value is calculated for each code rate, and the code rate threshold that satisfies machine lossless is determined according to the probability threshold required by the business, so that a reasonable machine-lossless code rate threshold can be given.
  • in a first possible implementation manner, calculating the probability corresponding to each value of the first parameter according to the first relationship between the first parameter and the precision includes: for each value of the first parameter, calculating the mean and standard deviation of the precisions corresponding to all values of the first parameter within the sliding window centered on that value; and calculating the probability corresponding to that value according to the precision mean, the standard deviation, the length of the sliding window, the first precision, and the cumulative distribution function.
  • in a second possible implementation manner, the probability corresponding to each value is calculated from the precision mean, the standard deviation, the length of the sliding window, the first precision, and the cumulative distribution function according to a formula that uses the cumulative distribution function of the T distribution.
  • the probability corresponding to the first parameter in the embodiment of the present application is calculated based on T hypothesis test (t-test) theory, which is suited to small samples; the evaluation is therefore more accurate, and because only a small sample is needed, the evaluation is also more efficient.
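  • A minimal sketch of a t-test-based probability follows. The exact formula of the application is not reproduced in this text, so the form used here, P = F_{n-1}((mean - mu) / (sigma / sqrt(n))), where F_{n-1} is the cumulative distribution function of the T distribution with n-1 degrees of freedom, is an assumption chosen only because it uses exactly the quantities named above. The function names are hypothetical, and the cumulative distribution function is evaluated by numerical integration so that only the standard library is needed.

```python
import math


def t_pdf(x, dof):
    """Probability density of Student's t distribution with `dof` degrees of freedom."""
    coef = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))
    return coef * (1 + x * x / dof) ** (-(dof + 1) / 2)


def t_cdf(x, dof):
    """CDF of Student's t: integrate the density over [0, |x|] with Simpson's rule.

    The number of steps, and so the cost, grows linearly with |x|.
    """
    if x == 0:
        return 0.5
    a = abs(x)
    steps = max(200, 2 * math.ceil(a * 100))  # even step count, step size <= 0.005
    h = a / steps
    total = t_pdf(0.0, dof) + t_pdf(a, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def lossless_probability(mean_acc, std_acc, n, first_acc):
    """ASSUMED form: F_{n-1}((mean - mu) / (sigma / sqrt(n))); not taken from the source."""
    if n < 2:
        raise ValueError("the sliding window must contain at least two values")
    if std_acc == 0:
        return 1.0 if mean_acc >= first_acc else 0.0
    t_stat = (mean_acc - first_acc) / (std_acc / math.sqrt(n))
    return min(1.0, max(0.0, t_cdf(t_stat, n - 1)))


# Hypothetical window statistics: mean accuracy 0.25, std 0.01, 5 samples, lossless 0.26.
print(lossless_probability(0.25, 0.01, 5, 0.26))
```

  • Under this assumed form the probability approaches 0 when the windowed accuracy falls well below the first accuracy and equals 0.5 when the two coincide; a different mapping (for example a two-sided p-value) would change those values but not the structure of the computation.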
  • in a possible implementation manner, the first parameter is a code rate, and the processed image corresponding to each value of the first parameter is an image obtained by compressing the original image at that code rate.
  • in a possible implementation manner, the first parameter is the degree of distortion of the processed image, the threshold of the first parameter is a distortion threshold, and the method further includes: determining a code rate threshold corresponding to the distortion threshold according to the distortion threshold and a third relationship, where the third relationship is the correspondence between code rate and degree of distortion.
  • the first relationship is used to evaluate the influence of the degree of distortion after compression on the accuracy, and the third relationship is used to evaluate the degree of distortion after compression at different code rates.
  • evaluating the compression process and the processing performed after compression separately decouples the two and improves the efficiency of evaluation.
  • the above-mentioned methods provided in the embodiments of the present application decouple the compression algorithm from AI processing. If a new compression algorithm is to be evaluated, it is not necessary to perform an end-to-end evaluation (from front-end compression processing to back-end artificial intelligence processing); the third relationship corresponding to the new compression algorithm can be obtained by evaluating only the compression process.
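  • The lookup from a distortion threshold to a code rate threshold through the third relationship can be sketched as follows. The sketch assumes the third relationship is stored as discrete (code rate, distortion) pairs, that distortion does not increase as the code rate increases, and that linear interpolation between neighboring pairs is acceptable; the sample values and the function name are hypothetical.

```python
def rate_for_distortion(third_relationship, distortion_threshold):
    """Smallest code rate whose (interpolated) distortion does not exceed the threshold.

    third_relationship: iterable of (code_rate, distortion) pairs, with distortion
    assumed non-increasing in code rate. Returns None if no tabulated rate is enough.
    """
    points = sorted(third_relationship)
    if points[0][1] <= distortion_threshold:
        return points[0][0]
    for (r0, d0), (r1, d1) in zip(points, points[1:]):
        if d1 <= distortion_threshold < d0:
            frac = (d0 - distortion_threshold) / (d0 - d1)
            return r0 + frac * (r1 - r0)
    return None


# Hypothetical rate-distortion measurements for one compression algorithm.
new_codec_curve = [(1.0, 8.0), (2.0, 4.0), (4.0, 2.0)]
print(rate_for_distortion(new_codec_curve, 3.0))  # prints 3.0
```

  • Because only this table depends on the compression algorithm, evaluating a new algorithm amounts to measuring a new list of pairs and reusing an existing distortion threshold.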
  • in a possible implementation manner, the original image is a Bayer RAW image, and the processed image is a RAW image, an RGB image, or a YUV image.
  • the method for determining the image processing method of the present application can be applied to various scenarios, has strong versatility, and is easy to extend to the evaluation of new compression algorithms or AI modules, which can improve the efficiency of evaluation.
  • in a second aspect, an embodiment of the present application provides an apparatus for determining an image processing method, the apparatus including: a calculation module, configured to calculate the probability corresponding to each value of the first parameter according to the first relationship between the first parameter and the precision, where the first parameter is the bit rate or the degree of distortion, the precision is the accuracy of recognizing the processed image, the probability corresponding to each value of the first parameter indicates how close the accuracy of recognizing the processed image corresponding to that value is to the first accuracy, the first accuracy is the accuracy of recognizing the original image, and the processed image is obtained by processing the original image; a first determination module, configured to determine a second relationship between the first parameter and the probability; and a second determination module, configured to obtain, according to a probability threshold required by a business and the second relationship, the threshold of the first parameter corresponding to the probability threshold.
  • the device for determining an image processing method in this embodiment of the present application uses a probability index to characterize the degree of machine losslessness, calculates a probability value for each code rate, and determines the code rate threshold that satisfies machine lossless according to the probability threshold required by the business, so that a reasonable machine-lossless code rate threshold can be given.
  • the calculation module includes: a first calculation unit, configured to calculate, for each value of the first parameter, the mean and standard deviation of the precisions corresponding to all values of the first parameter within the sliding window centered on that value; and a second calculation unit, configured to calculate the probability corresponding to each value according to the precision mean, the standard deviation, the length of the sliding window, the first precision, and the cumulative distribution function.
  • the second calculation unit is configured to calculate the probability corresponding to each value according to a formula in which P_m represents the probability corresponding to the first parameter m, μ represents the first precision, n represents the length of the sliding window, x̄_m represents the mean of the precisions corresponding to the values of the first parameter in the sliding window of length n centered on the first parameter m, and σ_m represents the standard deviation of the precisions corresponding to the values of the first parameter in that sliding window.
  • F_{n-1} represents the cumulative distribution function of the T distribution with n-1 degrees of freedom, where n is a positive integer greater than 1.
  • the probability corresponding to the first parameter in the embodiment of the present application is calculated based on T hypothesis test (t-test) theory, which is suited to small samples; the evaluation is therefore more accurate, and because only a small sample is needed, the evaluation is also more efficient.
  • in a possible implementation manner, the first parameter is a code rate, and the processed image corresponding to each value of the first parameter is an image obtained by compressing the original image at that code rate.
  • the apparatus further includes: a third determination module, configured to determine a bit rate threshold corresponding to the distortion threshold according to the distortion threshold and a third relationship, where the third relationship is the correspondence between bit rate and degree of distortion.
  • the first relationship is used to evaluate the influence of the degree of distortion after compression on the accuracy, and the third relationship is used to evaluate the degree of distortion after compression at different code rates.
  • evaluating the compression process and the processing performed after compression separately decouples the two and improves the efficiency of evaluation.
  • the above-mentioned device provided by the embodiment of the present application decouples the compression algorithm from AI processing. If a new compression algorithm is to be evaluated, end-to-end evaluation is not required; the third relationship corresponding to the new compression algorithm can be obtained by evaluating only the compression process.
  • in a possible implementation manner, the original image is a Bayer RAW image, and the processed image is a RAW image, an RGB image, or a YUV image.
  • the apparatus for determining the image processing method of the present application can be applied to various scenarios, has strong versatility, and is easy to extend to the evaluation of new compression algorithms or AI modules, which can improve the efficiency of evaluation.
  • embodiments of the present application provide a computer program product, comprising computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code; when the computer-readable code runs in an electronic device, the processor in the electronic device executes the method for determining an image processing mode of the first aspect or of one or more of the possible implementation manners of the first aspect.
  • an embodiment of the present application provides an electronic device, which can execute the method for determining an image processing mode of the first aspect or of one or more of the possible implementation manners of the first aspect.
  • an embodiment of the present application further provides a sensor system for providing a sensing function for a vehicle. It includes at least one device for determining the image processing method mentioned in the above embodiments of the present application, and at least one other sensor such as a camera or a radar. At least one sensor device in the system can be integrated into a complete machine or device, or can be provided independently as an element or device.
  • the embodiments of the present application also provide a system applied in unmanned driving or intelligent driving, which includes at least one device for determining the image processing method mentioned in the above embodiments of the present application and at least one of sensors such as cameras and radars. At least one device in the system can be integrated into a complete machine or device, or can be independently set as a component or device.
  • any of the above systems may interact with the vehicle's central controller to provide detection and/or fusion information for decision-making or control of the vehicle's driving.
  • an embodiment of the present application further provides a vehicle, where the vehicle includes at least one device for determining the image processing method mentioned in the above embodiments of the present application, or any of the above-mentioned systems.
  • FIG. 1a is a schematic diagram of data transmission in a compression-based sensing system in the related art.
  • FIG. 1b shows a schematic diagram of an architecture of video compression according to an example in the related art.
  • Fig. 2a shows a schematic diagram of an evaluation framework according to an embodiment of the present application.
  • FIG. 2b shows a schematic diagram of a rate-precision curve according to an embodiment of the present application.
  • FIG. 3 shows a flowchart of a method for determining an image processing mode according to an embodiment of the present application.
  • FIG. 4 shows a schematic diagram of a curve corresponding to the first relationship and a curve diagram of the second relationship according to an embodiment of the present application.
  • FIG. 5 shows a schematic diagram of the distribution of the probability density function of the T distribution according to an embodiment of the present application.
  • FIG. 6 shows a schematic diagram of a scenario of obtaining a first relationship according to an embodiment of the present application.
  • FIG. 7a shows a schematic diagram of a curve of the first relationship according to an embodiment of the present application.
  • FIG. 7b shows a schematic diagram of a curve corresponding to the third relationship according to an embodiment of the present application.
  • FIG. 8 shows a schematic diagram of a curve corresponding to the first relationship and a curve diagram of the second relationship according to an embodiment of the present application.
  • FIG. 9 shows a schematic diagram of an assessment framework according to some examples of the present application.
  • FIG. 10 shows a schematic diagram of an assessment framework according to some examples of the present application.
  • FIG. 11 shows a schematic diagram of an assessment framework according to some examples of the present application.
  • FIG. 12 shows a block diagram of an apparatus for determining an image processing method according to an embodiment of the present application.
  • the bit rate can indicate how often the image is sampled when compressing it.
  • Accuracy: the accuracy with which processed images are recognized, such as the accuracy with which compressed images are recognized.
  • First accuracy: the accuracy of identifying the original image (the uncompressed image).
  • Machine lossless probability: the closeness of the accuracy of recognizing the compressed image to the first accuracy.
  • Fig. 2a shows a schematic diagram of an evaluation framework according to an embodiment of the present application. The evaluation framework shown in Fig. 2a is that of the Moving Picture Experts Group (MPEG) Video Coding for Machines (VCM) working group.
  • the camera outputs the video processed by the ISP to the VCM encoder.
  • the video processed by the ISP can be in RGB or YUV format.
  • the VCM encoder encodes the video to obtain the encoded video, and the encoded video is transmitted.
  • the VCM decoder performs video decoding to obtain the decoded video, and performs machine vision processing on the decoded video.
  • the decoded image can be output to a neural network, and machine vision processing is performed through the neural network.
  • the accuracy threshold required by the business may refer to the accuracy requirements in different application scenarios, such as automatic driving and assisted driving. Different application scenarios may have different processing accuracy requirements, so each application scenario has its own accuracy threshold required by the business.
  • the accuracy threshold required by the service may refer to the difference between the accuracy required by the service and the machine-lossless accuracy.
  • FIG. 2b shows a schematic diagram of a rate-precision curve according to an embodiment of the present application.
  • in the example of FIG. 2b, the accuracy thresholds required by the business are 10% and 20%, respectively.
  • the accuracy of the neural network in recognizing the uncompressed image is about 26%. If the accuracy threshold is 10%, the corresponding accuracy is about 16%, and the code rate corresponding to 16% accuracy is about 0.03; if the accuracy threshold is 20%, the corresponding accuracy is about 6%, and the code rate corresponding to 6% accuracy is about 0.025.
  • FIG. 3 shows a flowchart of a method for determining an image processing mode according to an embodiment of the present application. As shown in FIG. 3 , the method for determining an image processing mode according to this embodiment of the present application may include the following steps:
  • Step S300: calculate the probability corresponding to each value of the first parameter according to the first relationship between the first parameter and the precision, where the first parameter is a bit rate or a distortion degree.
  • Step S301: determine a second relationship between the first parameter and the probability.
  • Step S302: obtain the threshold of the first parameter corresponding to the probability threshold according to the probability threshold required by the service and the second relationship.
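  • Step S302 can be pictured as a lookup in the second relationship. The sketch below assumes that the first parameter is a code rate, that the second relationship is stored as discrete (code rate, probability) pairs, and that the threshold sought is the smallest code rate whose probability reaches the business threshold; the probability values are hypothetical and not taken from the application.

```python
def parameter_threshold(second_relationship, probability_threshold):
    """Smallest first-parameter value whose probability meets the business threshold.

    second_relationship: iterable of (value, probability) pairs.
    Returns None when no tabulated value reaches the threshold.
    """
    for value, probability in sorted(second_relationship):
        if probability >= probability_threshold:
            return value
    return None


# Hypothetical second relationship (code rate -> machine-lossless probability).
second_relationship = [(0.02, 0.01), (0.025, 0.08), (0.03, 0.35), (0.05, 0.48)]
print(parameter_threshold(second_relationship, 0.3))  # prints 0.03
```

  • When the first parameter is a degree of distortion instead, the natural choice would be the largest distortion that still meets the threshold, which reverses the scan direction.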
  • the precision is the precision of recognizing the processed image, and the processing may refer to compression processing such as encoding/decoding performed on the image.
  • the indicators used for precision may be mean average precision (mean Average Precision, mAP), average precision (Average Precision, AP), average recall (Average Recall, AR), or mean intersection over union (Mean Intersection over Union, MIoU).
  • the AP may be AP50, AP60, AP70, or weightedAP, and so on.
  • the indicator used for accuracy can also be a combination of several of the indicators above, so that the accuracy of AI processing is evaluated comprehensively.
  • for example, a weighted index of mAP and AR can be used as the precision index; the present application does not limit the specific index used for the precision.
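  • One way to form such a weighted index is a normalized weighted average. The sketch below is only an illustration: the weights and metric values are hypothetical, and the application does not prescribe a particular weighting.

```python
def combined_precision(metrics, weights):
    """Weighted average of the named metrics; weights need not sum to 1."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * metrics[name] for name in weights) / total_weight


# Hypothetical metric values and weights.
metrics = {"mAP": 0.4, "AR": 0.6}
weights = {"mAP": 0.5, "AR": 0.5}
print(combined_precision(metrics, weights))
```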
  • the probability corresponding to each value of the first parameter indicates how close the accuracy of identifying the processed image corresponding to that value is to the first accuracy, where the first accuracy is the accuracy of identifying the original image and the processed image is obtained by processing the original image.
  • the processed image corresponding to each value of the first parameter is an image obtained by compressing the original image using that value. Therefore, the probability corresponding to each value of the first parameter indicates how close the accuracy of identifying the image obtained by processing the original image with that value is to the first accuracy.
  • when the first parameter is the code rate, the first relationship is the correspondence between the code rate and the precision. The probability corresponding to each code rate value can be calculated from it, and that probability indicates how close the accuracy of recognizing the image compressed at that code rate is to the first accuracy.
  • the first relationship can be obtained by testing the framework shown in FIG. 2a, and can be stored as a correspondence between discrete pairs of values, or can be stored as a function
  • the function can be expressed as a curve as shown in Figure 2b, that is, the first relationship can also be expressed as a curve.
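  • The two storage forms mentioned above can be connected by interpolation: discrete pairs are kept as measured, and a function is derived from them on demand. The sketch below uses piecewise-linear interpolation, which is an assumption; the application does not state how the function is obtained, and the sample pairs are hypothetical.

```python
from bisect import bisect_left


def make_precision_function(pairs):
    """Turn discrete (code_rate, precision) pairs into a piecewise-linear function."""
    points = sorted(pairs)
    rates = [rate for rate, _ in points]

    def precision_at(rate):
        # Outside the measured range, hold the nearest measured precision.
        if rate <= rates[0]:
            return points[0][1]
        if rate >= rates[-1]:
            return points[-1][1]
        i = bisect_left(rates, rate)
        (r0, p0), (r1, p1) = points[i - 1], points[i]
        return p0 + (p1 - p0) * (rate - r0) / (r1 - r0)

    return precision_at


# Hypothetical measurements of the first relationship.
first_relationship = [(1.0, 0.10), (2.0, 0.20), (4.0, 0.26)]
precision_at = make_precision_function(first_relationship)
print(precision_at(3.0))
```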
  • the first relationship in this embodiment of the present application can be obtained by testing directly on an existing test set, for example on the Cityscapes data set.
  • the Cityscapes data set includes training images, validation images, and test images.
  • the images included in the data set are all annotated, and can be directly compressed and recognized to obtain the first relationship required by the embodiment of the present application.
  • the simulation device can include the VCM encoder, VCM decoder, machine vision module, and processor shown in Figure 2a, and does not need to include the camera and ISP shown in Figure 2a; the simulation device receives the images in the test set.
  • the VCM encoder, VCM decoder, and machine vision module can be software programs stored in the memory of the simulation device, and the processor can call the corresponding modules to process the images in the test set and obtain the code rates and the corresponding precision data.
  • the processor may establish a first relationship between code rate and precision, and store the first relationship.
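  • The simulation loop described above can be sketched as follows. The encoder, decoder, and recognizer here are toy stand-ins and not a VCM codec or a neural network: "code rate" is modeled as the number of bits kept per 8-bit pixel, and "recognition" is a brightness test. All names and data are hypothetical; only the loop structure reflects the text.

```python
def build_first_relationship(images, code_rates, encode, decode, recognize_accuracy):
    """For each code rate: encode and decode every image, then measure accuracy."""
    relationship = []
    for rate in code_rates:
        decoded = [decode(encode(image, rate)) for image in images]
        relationship.append((rate, recognize_accuracy(decoded)))
    return relationship


def toy_encode(image, bits_kept):
    """Toy lossy coding: keep only the top `bits_kept` bits of each 8-bit pixel."""
    shift = 8 - bits_kept
    return [(pixel >> shift) << shift for pixel in image]


def toy_decode(payload):
    """Toy decoder: the payload already holds pixel values."""
    return payload


def toy_accuracy(decoded_images, labels):
    """Toy recognizer: an image is 'bright' if its mean pixel value is at least 100."""
    predictions = [sum(img) / len(img) >= 100 for img in decoded_images]
    return sum(p == l for p, l in zip(predictions, labels)) / len(labels)


# Hypothetical annotated test set: tiny "images" with ground-truth brightness labels.
test_images = [[130, 131], [120, 121], [200, 210], [50, 60]]
test_labels = [True, True, True, False]

first_relationship = build_first_relationship(
    test_images, [1, 4, 8], toy_encode, toy_decode,
    lambda decoded: toy_accuracy(decoded, test_labels),
)
print(first_relationship)  # prints [(1, 0.75), (4, 1.0), (8, 1.0)]
```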
  • the first relationship may also be obtained by testing in an actual application scenario, for example on an automatic driving system. The automatic driving system may include a camera (such as the camera shown in Figure 2a), and may also include, but is not limited to: a vehicle terminal, a vehicle controller, a vehicle module, vehicle parts, a vehicle chip, a vehicle unit, and other sensors such as a vehicle radar or a vehicle camera.
  • the encoder can be located on the camera and the decoder on the MDC; the machine vision module and the processor can both be located on the MDC, or the machine vision module can be located on the MDC while the processor is the processor of an external device (a device other than the automatic driving system).
  • the external device can be a general-purpose device or a dedicated device.
  • the external device may be a desktop computer, a portable computer, a network server, a personal digital assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, an embedded device, or another device with processing functions.
  • the external device may have a chip or processor with processing functions; it may include multiple processors, and each processor may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor.
  • the method for determining the image processing mode of the present application may be performed offline by the external device.
  • the code rate of the encoder in Figure 2a can be set during the test. After the camera captures an image, the image is encoded by the encoder and sent to the MDC. The image obtained after decoding by the decoder is processed by machine vision to obtain the accuracy data of the decoded image. The code rate of the encoder can be set multiple times, once for each code rate point, and the above process repeated to obtain the accuracy corresponding to multiple code rate points.
  • the obtained code rate and the accuracy data corresponding to the code rate can be output to an external device, and the external device can establish a first relationship between the code rate and the accuracy according to the code rate and the accuracy data corresponding to the code rate, and store the first relationship.
  • the automatic driving system can also execute the method for determining the image processing method of the embodiment of the present application online.
  • in this case, the code rate of the encoder in FIG. 2a can be set during testing; after the camera captures an image, the image is encoded by the encoder and then sent to the MDC.
  • the image decoded by the decoder can be processed by machine vision to obtain precision data.
  • the MDC may establish a corresponding relationship (a first relationship) between the code rate and the precision, and store the first relationship. Then, the method for determining the image processing mode of the present application is executed by the MDC.
  • locating both the machine vision module and the processor on the MDC is only an example and does not limit the present application in any way; the machine vision module and the processor may also be located on other components of the automatic driving system, which is not limited in this application.
  • in step S300, calculating the probability corresponding to each value of the first parameter according to the first relationship between the first parameter and the precision may include: for each value of the first parameter, calculating the mean and standard deviation of the precisions corresponding to all values of the first parameter within the sliding window centered on that value; and calculating the probability corresponding to that value according to the precision mean, the standard deviation, the length of the sliding window, the first precision, and the cumulative distribution function.
  • the sliding window may refer to a window that covers a fixed length and slides along the curve corresponding to the first relationship, moving in the abscissa direction by a certain step size. The width of the sliding window can also be changed during sliding: the larger the difference, the wider the sliding window, and vice versa.
  • FIG. 4 shows a schematic diagram of a curve corresponding to the first relationship and a curve diagram of the second relationship according to an embodiment of the present application.
  • the relationship curve between the accuracy of the compressed image recognition and the code rate can be an example of the curve corresponding to the first relationship.
  • The dotted rectangle and the solid rectangle in FIG. 4 can represent sliding windows.
  • At each step, the center of the sliding window moves to the intersection of the next abscissa value and the curve corresponding to the first relationship. As shown in FIG. 4, the center of the dotted rectangle is at the intersection of the abscissa 5 and the curve corresponding to the first relationship, and the center of the solid rectangle is at the intersection of the abscissa 5.5 and that curve.
  • The mean and standard deviation of the precision can be calculated over the values of the first parameter covered by the sliding window. For example, when the sliding window covers code rates 5 to 6, the mean and standard deviation of the precision corresponding to code rates 5 to 6 can be calculated according to the expression of the curve corresponding to the first relationship.
  • the length of the sliding window, the numerical value of the code rate, the numerical value of the precision, and the numerical value of the step size in the example shown in FIG. 4 are all examples of the present application and do not limit the present application in any way.
  • the length of the sliding window can also be set to 11.
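  • The sliding-window statistics can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the accuracies are sampled at evenly spaced values of the first parameter, the window length is an odd number of samples, and windows near the ends are simply truncated. The function name `window_stats` and the sample values are hypothetical.

```python
import statistics
from typing import List, Tuple


def window_stats(accuracies: List[float], center: int, length: int) -> Tuple[float, float]:
    """Mean and sample standard deviation of the accuracies inside a
    window of `length` samples centered on index `center`."""
    half = length // 2
    lo = max(0, center - half)                     # truncate at the left edge
    hi = min(len(accuracies), center + half + 1)   # truncate at the right edge
    window = accuracies[lo:hi]
    if len(window) < 2:
        raise ValueError("a window needs at least two samples")
    return statistics.mean(window), statistics.stdev(window)


# Made-up accuracies at evenly spaced code rate points (illustration only).
samples = [0.90, 0.92, 0.94, 0.96, 0.98]
mean, std = window_stats(samples, center=2, length=3)
print(mean, std)
```

  • `statistics.stdev` computes the sample standard deviation (denominator n-1), which is an assumption here; the application does not state which estimator is used.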
  • Calculating the probability corresponding to each value according to the precision mean, the standard deviation, the length of the sliding window, the first precision, and the cumulative distribution function may include evaluating formula (1), in which: $P_m$ represents the probability corresponding to the value m of the first parameter, where m may represent the code rate; n represents the length of the sliding window; the precision mean and the standard deviation $\sigma_m$ are those of the precisions corresponding to the values of the first parameter within the sliding window of length n centered on m; the first precision is the precision against which the window mean is compared; and the cumulative distribution function is that of the t distribution with n-1 degrees of freedom, where n is a positive integer greater than 1.
  • FIG. 5 shows a schematic diagram of the distribution of the probability density function of the T distribution according to an embodiment of the present application.
  • The larger P is, the closer the t value is to the y-axis (the center line); the smaller P is, the farther the t value is from the y-axis (the center line).
  • The center line corresponding to formula (1) above is the thick dashed line with precision equal to 1.0 in FIG. 4 (the curve corresponding to no compression). The larger the calculated P value, the closer the accuracy is to the center line; the smaller the calculated P value, the farther the accuracy is from the center line. Therefore, the P value calculated according to formula (1) can represent how close the accuracy is to the center line.
  • the center line here is the accuracy line for identifying the original image.
  • The probability corresponding to the first parameter in the embodiment of the present application is calculated based on t hypothesis testing theory, which suits small-sample scenarios. The evaluation accuracy is therefore higher, and working with small samples improves evaluation efficiency.
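  • To make the t-test idea concrete, here is a hedged Python sketch. It assumes the probability is the two-sided p-value of a one-sample t test, that is, 2·(1 − F(|mean − first precision| / (σ/√n))) with F the cumulative distribution function of the t distribution with n-1 degrees of freedom. This form is an assumption that is consistent with the quantities listed for formula (1), not a reproduction of that formula. The function names are hypothetical, and the cumulative distribution function is obtained by numerical integration so that only the standard library is needed.

```python
import math


def t_pdf(x: float, dof: int) -> float:
    """Probability density of Student's t distribution."""
    coef = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))
    return coef * (1 + x * x / dof) ** (-(dof + 1) / 2)


def t_cdf(x: float, dof: int) -> float:
    """CDF via Simpson's rule on [0, |x|], using the symmetry of the density."""
    a = abs(x)
    if a == 0:
        return 0.5
    steps = 2 * max(100, math.ceil(a * 50))  # even step count, step size <= 0.01
    h = a / steps
    total = t_pdf(0.0, dof) + t_pdf(a, dof)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, dof)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def lossless_probability(mean: float, std: float, n: int, first_precision: float) -> float:
    """Assumed two-sided p-value: how plausible it is that the window mean
    equals the precision obtained on the original image."""
    if n < 2:
        raise ValueError("n must be greater than 1")
    if std == 0:
        return 1.0 if mean == first_precision else 0.0
    t = abs(mean - first_precision) / (std / math.sqrt(n))
    return min(1.0, max(0.0, 2 * (1 - t_cdf(t, n - 1))))


# Made-up window statistics (illustration only).
print(lossless_probability(mean=0.99, std=0.02, n=11, first_precision=1.0))
print(lossless_probability(mean=0.90, std=0.02, n=11, first_precision=1.0))
```

  • Under this assumption, a window whose mean precision is close to the first precision yields a probability near 1, and a window far from it yields a probability near 0, which matches the behavior described for the P value.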
  • The calculated discrete value pairs may be used as the second relationship, or a function may be fitted to the calculated discrete value pairs and used as the second relationship. As shown in FIG. 4, the code rate-probability curve may be an example of the second relationship.
  • The probability threshold required by the service may refer to how close the recognition accuracy must be to machine lossless in a given application scenario, such as automatic driving or assisted driving. Different application scenarios may have different requirements for how close the processing accuracy is to machine lossless, so each application scenario has its own probability threshold required by the service.
  • The corresponding code rate threshold may be determined according to the machine lossless confidence level required by the service, which may be used as the probability threshold required by the service. For example, if the machine lossless confidence level required by the service is 0.1, then, according to the curve corresponding to the second relationship, the code rate point whose probability equals 0.1 is the machine lossless code rate threshold point, and when the code rate is higher than the code rate threshold, the machine lossless level is considered to be reached.
  • Different confidence levels have different code rate threshold points. The higher the confidence level, the larger the code rate threshold, and the closer to the accuracy index without compression.
  • A probability index is used to characterize the degree of machine losslessness: a probability value is calculated for each code rate, and a code rate threshold that satisfies machine lossless is determined according to the probability threshold required by the service. In this way, a code rate threshold that satisfies machine lossless can be given reasonably.
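  • The threshold lookup can be sketched in Python as follows, assuming the second relationship is stored as discrete (first parameter, probability) pairs and that the threshold is taken as the smallest tested value whose probability reaches the required level. The function name and the sample pairs are hypothetical, and the numbers are made up for illustration.

```python
from typing import List, Optional, Tuple


def first_parameter_threshold(
    second_relationship: List[Tuple[float, float]],
    probability_threshold: float,
) -> Optional[float]:
    """Smallest tested value whose probability reaches the threshold,
    or None if no tested value reaches it."""
    for value, probability in sorted(second_relationship):
        if probability >= probability_threshold:
            return value
    return None


# Made-up (code rate, probability) pairs, not taken from FIG. 4.
rate_probability = [(3.0, 0.001), (4.0, 0.02), (5.0, 0.12), (6.0, 0.40)]
print(first_parameter_threshold(rate_probability, 0.1))   # 5.0
print(first_parameter_threshold(rate_probability, 0.9))   # None
```

  • If a fitted function is used as the second relationship instead of a table, the same threshold could be found by solving for the point where the function equals the probability threshold.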
  • the first parameter in this embodiment of the present application may also be a degree of distortion
  • the threshold of the first parameter is a threshold of distortion
  • the degree of distortion may represent the difference between the processed (compressed) image and the real environment.
  • The index used for the degree of distortion may be the peak signal-to-noise ratio (PSNR), the mean square error (MSE), the structural similarity index (SSIM), or the perceptual loss (P-loss).
  • the degree of distortion may also be a combination of multiple indicators above, for example, combining multiple indicators of distortion to comprehensively evaluate the degree of distortion of a compressed image.
  • a weighted index of PSNR and SSIM can be used as the final distortion degree, which can be applied to applications requiring both signal fidelity (PSNR) and human vision (SSIM).
  • FIG. 6 shows a schematic diagram of a scenario of obtaining a first relationship according to an embodiment of the present application.
  • The compression module can compress the received image; during compression, it can sample the image (the sampling frequency is the bit rate). The compressed image can be transmitted to the AI module for target detection, image segmentation, etc.
  • The method of the embodiment of the present application can be tested directly on an existing test set, for example the Cityscapes dataset.
  • The Cityscapes dataset includes training images, validation images, and test images.
  • The included images are all annotated and can be compressed and recognized directly to obtain the test result data of the embodiment of the present application (the distortion and precision data corresponding to each code rate).
  • the code rate is the sampling frequency of the compressed image obtained by compressing the original image using the compression algorithm
  • the distortion degree is the difference between the compressed image and the real environment.
  • The simulation device may include the above-mentioned compression module, AI module, and processor. The compression module and the AI module may be software programs stored in the memory of the simulation device, and the processor may call the corresponding programs to process the images and obtain the test result data.
  • the processor may establish the correspondence between the distortion degree and the precision (the first relation), and the correspondence relation between the code rate and the distortion degree (the third relation), and store the first relation and the third relation .
  • the methods of the embodiments of the present application can also be tested in actual application scenarios, for example, tested on an automatic driving system.
  • The automatic driving system may include a camera, and may also include, but is not limited to, vehicle terminals, vehicle controllers, in-vehicle modules, in-vehicle components, in-vehicle chips, in-vehicle units, and other sensors such as in-vehicle radar or in-vehicle cameras.
  • The compression module may be an encoder located on the camera. The compression module may also include a decoder, which may be located on the MDC. The AI module and the processor may both be located on the MDC; alternatively, the AI module is located on the MDC and the processor is a processor of an external device (a device other than the automatic driving system).
  • the external device can be a general-purpose device or a dedicated device.
  • the external device may also be a desktop computer, a portable computer, a network server, a personal digital assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, an embedded device, or other devices with processing functions .
  • This embodiment of the present application does not limit the type of the external device.
  • The external device may have a chip or processor with a processing function (such as the processor shown in FIG. 6). The external device may include multiple processors, and each processor may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor.
  • the method for determining the image processing mode of the present application may be performed offline by the above-mentioned external device.
  • The code rate of the encoder can be set during the test. After the camera captures an image, the encoder encodes it and sends it to the MDC. The image decoded by the decoder can be stored in the MDC.
  • The AI module can recognize the decoded image to obtain precision data. The code rate of the encoder can be set multiple times, once for each code rate point, and the above process repeated to obtain the test result data.
  • The obtained test result data can be output to an external device. The external device can obtain the distortion degree of the compressed image from the decoded image, establish the correspondence between the distortion degree and the accuracy (the first relationship) and the correspondence between the code rate and the distortion degree (the third relationship), and store the first relationship and the third relationship.
  • the automatic driving system can execute the method for determining the image processing method of the embodiment of the present application online.
  • The code rate of the encoder can be set during testing. After the camera captures an image, the encoder encodes it and sends it to the MDC.
  • the image decoded by the decoder can be stored in the MDC.
  • The MDC can obtain the distortion degree of the compressed image from the decoded image, and accuracy data can be obtained by recognizing the decoded image.
  • The MDC can establish the correspondence between the distortion degree and the precision (the first relationship) and the correspondence between the code rate and the distortion degree (the third relationship), and store the first relationship and the third relationship.
  • the first relationship and the third relationship may be one-to-one values stored in the form of table entries, or may be expressed in the form of functions, which are not limited in the present application.
  • the first relationship can be represented in the form shown in Table 1.
  • The first relationship can also be expressed in the form of a function, as shown in the following formula (2): $P = f_i(D),\ D \in D_i$,
  • where $f_i(D)$ represents the functional relationship between the precision P and the degree of distortion D in the numerical range $D_i$, and i is a positive integer from 1 to n.
  • the relationship between precision and distortion can be expressed in the form of a piecewise function.
  • P and D may have a linear relationship.
  • the relationship between accuracy and distortion can be expressed as a piecewise linear function.
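  • A piecewise linear relationship of this kind can be evaluated directly from a table of breakpoints. The Python sketch below interpolates linearly between neighboring breakpoints and clamps outside the tested range; the function name `piecewise_linear`, the clamping behavior, and the breakpoint values are assumptions made for illustration only.

```python
from bisect import bisect_right
from typing import List, Tuple


def piecewise_linear(points: List[Tuple[float, float]], x: float) -> float:
    """Evaluate the piecewise linear function through `points` at `x`.
    Values outside the tested range are clamped to the end points."""
    pts = sorted(points)
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)          # xs[i-1] <= x < xs[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Made-up (distortion in PSNR, precision) breakpoints, illustration only.
first_relationship = [(30.0, 0.50), (40.0, 0.70), (50.0, 0.80)]
print(piecewise_linear(first_relationship, 35.0))
print(piecewise_linear(first_relationship, 45.0))
```

  • The same helper could represent the third relationship by passing (code rate, distortion) breakpoints instead.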
  • FIG. 7a shows a schematic diagram of a curve of the first relationship according to an embodiment of the present application.
  • the abscissa may represent the degree of distortion
  • the ordinate may represent the accuracy.
  • the index of distortion degree is PSNR
  • the index of accuracy is mAP.
  • The curves of the first relationship corresponding to three different compression algorithms, X265_medium (X265 default configuration), X264_medium (X264 default configuration), and X264_ultrafast (X264 fast configuration), show that the first relationship does not depend on the specific compression algorithm used.
  • The performance of machine vision mainly depends on the distortion degree of the input image rather than on the specific compression algorithm.
  • the performance of machine vision is related to the specific neural network used.
  • the third relationship can also be stored in a table in discrete value pairs, and Table 2 shows a form of storage of the third relationship
  • The third relationship can also be expressed in the form of a function, as shown in the following formula (3): $D = g_i(R),\ R \in R_i$,
  • where $g_i(R)$ represents the functional relationship between the distortion degree D and the code rate R in the numerical range $R_i$.
  • the relationship between the distortion degree and the code rate can be expressed in the form of a piecewise function.
  • D and R may have a linear relationship, and the relationship between the distortion degree and the code rate may be expressed as a piecewise linear function.
  • FIG. 7b shows a schematic diagram of a curve corresponding to the third relationship according to an embodiment of the present application.
  • the abscissa may represent the code rate, and the ordinate may represent the distortion degree.
  • the index of the distortion degree adopted is PSNR.
  • The curves of the third relationship corresponding to the three different compression algorithms are relatively scattered. That is, even when different compression algorithms compress the image at the same bit rate, the resulting degree of distortion varies greatly, so the third relationship depends on the specific compression algorithm used.
  • the processor may execute steps S300-S302.
  • the probability corresponding to the degree of distortion may be calculated in the same manner as in the example in which the first parameter is the code rate.
  • FIG. 8 shows a schematic diagram of a curve corresponding to the first relationship and a curve diagram of the second relationship according to an embodiment of the present application.
  • The distortion degree-accuracy relationship curve for compression may be an example of the curve corresponding to the first relationship.
  • the probability corresponding to the degree of distortion can also be calculated by a sliding window.
  • The probability corresponding to each degree of distortion can be calculated according to formula (4), in which: $P_d$ represents the probability corresponding to the value d of the first parameter, where d may represent the degree of distortion; n represents the length of the sliding window; the precision mean and the standard deviation $\sigma_d$ are those of the precisions corresponding to the values of the first parameter within the sliding window of length n centered on d; the first precision is the precision against which the window mean is compared; and the cumulative distribution function is that of the t distribution with n-1 degrees of freedom, where n is a positive integer greater than 1.
  • The calculated discrete value pairs may be used as the second relationship, or a function may be fitted to the calculated discrete value pairs and used as the second relationship. As shown in FIG. 8, the distortion-probability curve may be an example of the second relationship.
  • Step S302 is then executed: the corresponding distortion threshold can be determined according to the machine lossless confidence level required by the service, which can be used as the probability threshold required by the service.
  • For example, if the machine lossless confidence level required by the service is 0.05, then, according to the curve corresponding to the second relationship shown in FIG. 8, the distortion threshold corresponding to the probability threshold 0.05 can be determined to be 47.5.
  • The method for determining an image processing mode may further include: determining a bit rate threshold corresponding to the distortion threshold according to the distortion threshold and a third relationship, where the third relationship is the correspondence between the bit rate and the degree of distortion.
  • For different compression algorithms, the same distortion threshold corresponds to different code rates.
  • For example, for one compression algorithm the bit rate corresponding to the distortion threshold 47.5 is about 1.4, while for another it is about 1.75.
  • The probability index is used to characterize the degree of machine losslessness: a probability value is calculated for each degree of distortion, and a distortion threshold that satisfies machine lossless is determined according to the probability threshold required by the service. The code rate threshold corresponding to the distortion threshold can then be determined from the distortion threshold and the third relationship between the code rate and the distortion, so that a code rate threshold that satisfies machine lossless can be given reasonably.
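  • Mapping a distortion threshold to a code rate threshold amounts to inverting the third relationship. The Python sketch below does this by linear interpolation over discrete (code rate, distortion) pairs, assuming the distortion index (for example PSNR) does not decrease as the code rate increases. The function name, the codec labels, and all numbers are hypothetical and are not the values read from the figures.

```python
from typing import List, Optional, Tuple


def rate_for_distortion(
    third_relationship: List[Tuple[float, float]],
    distortion_threshold: float,
) -> Optional[float]:
    """Code rate at which the distortion index reaches the threshold,
    or None if the tested code rates never reach it."""
    pts = sorted(third_relationship)
    if distortion_threshold <= pts[0][1]:
        return pts[0][0]  # already satisfied at the lowest tested rate
    for (r0, d0), (r1, d1) in zip(pts, pts[1:]):
        if d0 <= distortion_threshold <= d1:
            if d1 == d0:
                return r0
            return r0 + (distortion_threshold - d0) * (r1 - r0) / (d1 - d0)
    return None


# Made-up third relationships for two hypothetical codecs (PSNR in dB).
codec_a = [(1.0, 45.0), (2.0, 50.0)]
codec_b = [(1.0, 42.5), (2.0, 47.5), (3.0, 50.0)]
print(rate_for_distortion(codec_a, 47.5))  # 1.5
print(rate_for_distortion(codec_b, 47.5))  # 2.0
```

  • The two made-up codecs reach the same distortion threshold at different code rates, which mirrors the point that the third relationship depends on the compression algorithm while the distortion threshold does not.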
  • The first relationship is used to evaluate the influence of the degree of distortion after compression on the accuracy, and the third relationship is used to evaluate the degree of distortion after compression at different code rates. Evaluating the compression process and the post-compression process separately decouples the two, that is, decouples the compression algorithm from AI processing. To evaluate a new compression algorithm, no end-to-end evaluation is needed; evaluating only the compression process yields the third relationship corresponding to the new compression algorithm.
  • The original image may be a Bayer RAW image,
  • the processed image may be a RAW image, or an RGB image, or a YUV image.
  • the evaluation process can be divided into two stages: the first stage and the second stage.
  • The first stage is used to test the compression algorithm and yields the third relationship between the bit rate and the distortion degree.
  • The second stage is used to test the recognition accuracy of the neural network and yields the first relationship between distortion and accuracy.
  • the degree of distortion may be defined in the RGB domain, and the indicator used for the degree of distortion may be the above-mentioned PSNR, or MSE, or SSIM, or P-loss, or the weighted result of the above multiple indicators.
  • distortion is mainly quantization noise introduced by compression coding.
  • the degree of distortion mainly depends on the energy of the compressed and quantized noise and has little to do with the specific noise form.
  • PSNR/MSE becomes a suitable indicator to measure the compression distortion.
  • PSNR and MSE have a logarithmic relationship. MSE characterizes the energy of the compression noise, and the PSNR/MSE index is simple to calculate and easy to use.
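  • The logarithmic relationship is the standard definition PSNR = 10·log10(MAX² / MSE), where MAX is the largest possible sample value. The Python sketch below computes both indicators for flat lists of samples; the function names and the 8-bit default of 255 are assumptions made for illustration.

```python
import math
from typing import Sequence


def mean_squared_error(original: Sequence[float], compressed: Sequence[float]) -> float:
    """Average squared difference between two equally sized sample lists."""
    if len(original) != len(compressed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(original, compressed)) / len(original)


def psnr(mse: float, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite when there is no error."""
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


mse = mean_squared_error([10, 20, 30], [12, 18, 30])
print(mse, psnr(mse))
```

  • Because of the logarithm, multiplying the MSE by 100 lowers the PSNR by exactly 20 dB.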
  • Example (a) represents the reference evaluation process.
  • The RAW image is processed by the ISP, and the resulting RGB image is output to the deep neural network.
  • The distortion data of the RGB image can be output.
  • The deep neural network performs machine vision processing on the uncompressed RGB image to obtain the recognition accuracy data.
  • Example (b) represents the scenario of compressing RAW images in the RAW domain.
  • The RAW image is compressed by the encoder/decoder to obtain a compressed image.
  • The ISP can process the compressed image to obtain an RGB image and output the distortion data of the RGB image. The deep neural network performs machine vision processing on the RGB image to obtain the recognition accuracy data.
  • Compressing RAW images in the RAW domain can reduce the complexity of the compression algorithm because the amount of data in the RAW domain is smaller.
  • Example (c) represents the scenario of compressing RGB images in the RGB domain.
  • The RAW image is processed by the ISP to obtain an RGB image, the encoder/decoder compresses the RGB image to obtain a compressed RGB image, and the distortion data of the compressed RGB image is output.
  • The deep neural network performs machine vision processing on the compressed RGB image to obtain the recognition accuracy data.
  • Example (d) represents the scenario of compressing YUV images in the YUV domain.
  • The RAW image is processed by the ISP to obtain an RGB image, and the RGB image is converted from RGB to YUV format to obtain a YUV image.
  • The YUV image is compressed by the encoder/decoder to obtain a compressed image, which is converted from YUV back to RGB format to obtain a compressed RGB image. The distortion data of the compressed RGB image is output, and the deep neural network performs machine vision processing on the compressed RGB image to obtain the recognition accuracy data.
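  • The application does not say which RGB-YUV conversion is used. As one possible illustration, the Python sketch below applies the widely used BT.601 analog YUV coefficients to a single pixel and converts it back; the choice of coefficients and the function names are assumptions, and a real pipeline may use a different matrix, value range, or chroma subsampling.

```python
from typing import Tuple

Pixel = Tuple[float, float, float]


def rgb_to_yuv(r: float, g: float, b: float) -> Pixel:
    """RGB to YUV with BT.601 analog coefficients (assumed here)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.14713 * r - 0.28886 * g + 0.436 * b
    v = 0.615 * r - 0.51499 * g - 0.10001 * b
    return y, u, v


def yuv_to_rgb(y: float, u: float, v: float) -> Pixel:
    """Approximate inverse of rgb_to_yuv (rounded published coefficients)."""
    r = y + 1.13983 * v
    g = y - 0.39465 * u - 0.58060 * v
    b = y + 2.03211 * u
    return r, g, b


pixel = (200.0, 100.0, 50.0)
restored = yuv_to_rgb(*rgb_to_yuv(*pixel))
print(restored)  # close to the original pixel, up to rounding of the coefficients
```

  • In example (d) the compression step sits between the two conversions, so the distortion measured on the final RGB image reflects the compression loss together with any rounding introduced by the conversions.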
  • Example (a) can obtain the accuracy of recognizing uncompressed images, that is, the first accuracy.
  • The frameworks of example (b), example (c), and example (d) can be used for testing to obtain the third relationship corresponding to each compression algorithm on the framework of each example, and the first relationship of each example.
  • tests can be performed on the frameworks of example (b), example (c), and example (d) according to the above process.
  • The compression algorithm X265_medium is used to compress the RAW image at different bit rates to obtain compressed images. The ISP can process each compressed image to obtain an RGB image and output the distortion data of the RGB image, from which the third relationship between the bit rate and the distortion degree corresponding to X265_medium can be established. The deep neural network can perform machine vision processing on the RGB image to obtain the recognition accuracy data, from which the first relationship between the distortion degree and the accuracy can be established.
  • The compression algorithm X264_medium is then used to compress the RAW image to obtain compressed images. The ISP can process each compressed image to obtain an RGB image and output the distortion data of the RGB image, from which the third relationship between the code rate and the distortion degree corresponding to X264_medium can be established. For X264_medium, the subsequent machine vision processing does not need to be tested again; the first relationship obtained in the X265_medium test can be reused.
  • For the compression algorithm X264_ultrafast, the same process as for X264_medium can be repeated to obtain the corresponding third relationship.
  • a probability corresponding to each distortion degree may be calculated according to step S300 in the method for determining an image processing method provided by the embodiment of the present application, and a second relationship between the distortion degree and the probability may be established.
  • a code rate threshold that satisfies the machine lossless can be reasonably given.
  • the image quality evaluation method of the embodiment of the present application is simple and efficient.
  • Compression and AI processing are decoupled to achieve staged evaluation.
  • The precision of AI processing is independent of the compression algorithm. For a new compression algorithm, it is enough to re-test only the distortion and the bit rate; no end-to-end test is needed, which simplifies testing and makes the evaluation more efficient.
  • The evaluation process can still be decomposed into two stages, a first stage and a second stage, but the stages are divided differently from the example in FIG. 9.
  • The distortion degree can be defined in the YUV domain, and PSNR, MSE, SSIM, P-loss, or a weighted result of several of these indicators can again be used as the distortion degree indicator. The first stage therefore consists of compressing the YUV image with a compression algorithm to obtain a compressed YUV image and outputting the distortion data of the compressed YUV image.
  • example (e) may represent a reference evaluation process
  • example (f) may represent a scene of compressing an image in the YUV domain.
  • Other processes are similar to the example in FIG. 9 and will not be repeated.
  • FIG. 11 shows a schematic diagram of an assessment framework according to some examples of the present application.
  • The evaluation process can still be decomposed into two stages, a first stage and a second stage, but the stages are divided differently from the examples in FIG. 9 and FIG. 10.
  • The distortion degree can be defined in the RAW domain, and PSNR, MSE, SSIM, P-loss, or a weighted result of several of these indicators can again be used as the distortion degree indicator. The first stage therefore consists of compressing the RAW image with a compression algorithm to obtain a compressed RAW image and outputting the distortion data of the compressed RAW image.
  • example (g) can represent the reference evaluation process
  • example (h) can represent the scene of compressing images in the RAW domain. Other processes are similar to the example in FIG. 9 and will not be repeated.
  • The method for determining the image processing mode of the present application can be applied to various scenarios, is highly versatile, and is easy to extend to the evaluation of new compression algorithms or AI modules, which can improve evaluation efficiency.
  • The accuracy in the embodiments of the present application may also be the accuracy of recognizing other processed data, for example the accuracy of recognizing processed audio data, where the processing may refer to compression processing such as encoding/decoding of the audio data.
  • the image in the embodiment of the present application may also be other data, such as audio data, text data, etc., and the present application may provide a method for determining a data processing method.
  • The first parameter may be a code rate or a degree of distortion. If the processed data is image data, the accuracy index can be as described above.
  • The accuracy index of speech recognition can be the word error rate (WER) or the character error rate (CER), or a combination of several of these indicators.
  • The accuracy of point cloud recognition can be the target recognition accuracy mAP, the semantic segmentation accuracy IoU, the multi-object tracking accuracy (MOTA), and so on.
  • The method for determining the data processing mode may include: calculating a probability corresponding to each value of the first parameter according to the first relationship between the first parameter and the precision; determining the second relationship between the first parameter and the probability; and obtaining the threshold of the first parameter corresponding to the probability threshold according to the probability threshold required by the service and the second relationship.
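  • For reference, the word error rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) between the recognized text and the reference text, divided by the number of reference words. The Python sketch below follows that conventional definition; the function name and the sample sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


print(word_error_rate("turn left at the light", "turn left at light"))  # 0.2
```

  • The character error rate can be computed the same way by comparing characters instead of words.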
  • the data can be images, audio, text, etc.
  • the probability index is used to characterize the degree to which the machine is lossless, a probability value is calculated for each first parameter, and the code rate threshold that satisfies the machine lossless is determined according to the probability threshold required by the business.
  • FIG. 12 shows a block diagram of an apparatus for determining an image processing method according to an embodiment of the present application.
  • The apparatus may include: a calculation module 120, configured to calculate the probability corresponding to each value of the first parameter according to the first relationship between the first parameter and the precision, where the first parameter is a code rate or a degree of distortion, the precision is the accuracy of recognizing the processed image, and the probability corresponding to each value of the first parameter indicates how close the recognition accuracy of the processed image corresponding to that value is to the first accuracy, where the first accuracy is the accuracy of recognizing the original image and the processed image is obtained by processing the original image; a first determination module 121, configured to determine the second relationship between the first parameter and the probability; and a second determination module 122, configured to obtain the threshold of the first parameter corresponding to the probability threshold according to the probability threshold required by the service and the second relationship.
  • The apparatus for determining an image processing mode in this embodiment of the present application uses a probability index to characterize the degree of machine losslessness, calculates a probability value for each code rate, and determines a code rate threshold that satisfies machine lossless according to the probability threshold required by the service, so that a code rate threshold that satisfies machine lossless is given reasonably.
  • The calculation module 120 includes: a first calculation unit, configured to calculate, for each value of the first parameter, the mean and the standard deviation of the precisions corresponding to all values of the first parameter within the sliding window centered on that value; and a second calculation unit, configured to calculate the probability corresponding to each value according to the precision mean, the standard deviation, the length of the sliding window, the first precision, and the cumulative distribution function.
  • The second calculation unit is configured to calculate the probability according to formula (1) described above.
  • The probability corresponding to the first parameter in the embodiment of the present application is calculated based on t hypothesis testing theory, which suits small-sample scenarios. The evaluation accuracy is therefore higher, and working with small samples improves evaluation efficiency.
  • the first parameter is a bit rate
  • the processed image corresponding to each value of the first parameter is the image obtained by compressing the original image at that value of the first parameter.
  • the first parameter is the degree of distortion of the processed image
  • the threshold of the first parameter is a distortion threshold
  • the apparatus further includes: a third determination module, configured to determine, according to the distortion threshold and a third relationship, the bit-rate threshold corresponding to the distortion threshold; wherein the third relationship is the correspondence between bit rate and distortion.
  • the compression algorithm and the AI processing are decoupled. To evaluate a new compression algorithm, no end-to-end evaluation is required; it is enough to evaluate the compression process alone and obtain the third relationship corresponding to the new compression algorithm. Similarly, to recognize images with a new AI module, the AI recognition process can be re-evaluated on existing data to obtain a new first relationship, again without end-to-end evaluation.
  • the device provided by the embodiment of the present application can improve the efficiency of evaluation.
  • the original image is a Bayer RAW image
  • the processed image is a RAW image, or an RGB image, or a YUV image.
  • the apparatus for determining the image processing method of the present application can be applied to various scenarios, has strong versatility, and is easy to expand to evaluate new compression algorithms or AI modules, which can improve the efficiency of evaluation.
  • the device for determining the image processing mode may be a chip with processing function or a program module in a processor, and the processor may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor.
  • the chip or processor can implement the methods of the foregoing embodiments of the present application by executing the program.
  • An embodiment of the present application provides an electronic device, including: a processor and a memory for storing instructions executable by the processor; wherein the processor is configured to implement the methods of the foregoing embodiments of the present application when executing the instructions.
  • the apparatus or electronic device for determining the above image processing method may be a general-purpose device or a special-purpose device.
  • the apparatus can also be a desktop computer, a portable computer, a network server, a personal digital assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, an embedded device, or another device with processing capability.
  • the embodiments of the present application do not limit the type of the apparatus for determining the image processing method.
  • Embodiments of the present application provide a non-volatile computer-readable storage medium on which computer program instructions are stored, and when the computer program instructions are executed by a processor, implement the above method.
  • Embodiments of the present application provide a computer program product, including computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code; when the computer-readable code runs in a processor of an electronic device, the processor in the electronic device executes the above method.
  • a computer-readable storage medium may be a tangible device that can hold and store instructions for use by the instruction execution device.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • Computer-readable storage media include: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random-access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanically encoded devices such as punch cards or raised structures in grooves on which instructions are stored, and any suitable combination of the foregoing.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

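The present application relates to a method and an apparatus for determining an image processing mode, which can be used in assisted driving and autonomous driving. The method includes: calculating, according to a first relationship between a first parameter and accuracy, the probability corresponding to each value of the first parameter, where the first parameter is a bit rate or a distortion; determining a second relationship between the first parameter and the probability; and obtaining, according to a probability threshold required by the service and the second relationship, the threshold of the first parameter corresponding to the probability threshold. The method of the embodiments of the present application uses a probability index to characterize how close the result is to machine-lossless, calculates a probability value for each bit rate, and determines the machine-lossless bit-rate threshold according to the probability threshold required by the service, so it can reasonably give a bit-rate threshold that satisfies machine losslessness. It can be applied to the Internet of Vehicles, for example vehicle-to-everything (V2X), Long Term Evolution-Vehicle (LTE-V), and vehicle-to-vehicle (V2V).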

Description

Method and apparatus for determining an image processing mode
Technical field
本申请涉及图像技术领域,尤其涉及一种图像处理方式的确定方法及装置。
背景技术
随着社会的发展,智能运输设备、智能家居设备、机器人等智能终端正在逐步进入人们的日常生活中。传感器在智能终端上发挥着十分重要的作用。安装在智能终端上的各式各样的传感器,比如毫米波雷达、激光雷达、摄像头、超声波雷达等,在智能终端在运动过程中感知周围的环境,收集数据,进行移动物体的辨识与追踪,以及静止场景如车道线、标示牌的识别,并结合导航仪及地图数据进行路径规划。传感器可以预先察觉到可能发生的危险并辅助甚至自主采取必要的规避手段,有效增加了智能终端的安全性和舒适性。
摄像头具有分辨率高、非接触、使用方便、成本低廉等特点,是自动驾驶环境感知的必备传感器。车辆上可以安装越来越多的摄像头,在自动驾驶时,通过摄像头采集环境中的图像并进行机器视觉处理,识别环境中的障碍物或者目标,从而实现无盲点覆盖。
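As parameters such as camera resolution, frame rate, and sampling depth keep increasing, the video output by a camera demands more and more transmission bandwidth. FIG. 1a is a schematic diagram of a compression-based perception system transmitting data in the related art. As shown in FIG. 1a, the perception system includes a camera and an image signal processor (ISP); the perception system transmits the processed image data to a mobile data computing platform (Mobile Data Center, MDC), which processes it further. Specifically, the Bayer RAW image output by the camera is processed by the ISP and sent to the MDC, and the MDC performs machine vision processing on the ISP-processed image.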
图1a中的摄像头输出的Bayer RAW图像可以为分辨率为4K的超高清(Ultra high definition,UHD)图像,图像的帧率可以为30fps,图像的位深度可以为16bitdepth(比特位深),图像的带宽需求高达4Gbps(4K*2k*30*16)。为缓解传输网络的压力,可以采用对图像进行压缩后传输的方法降低带宽需求,无需升级现有网络即可开展UHD视频传输的新业务。
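The bandwidth figure above follows from multiplying pixel count, frame rate, and bit depth. A minimal sketch of the arithmetic, assuming that "4K*2k" means a 4096 x 2048 frame with one sample per pixel (the exact frame size is not stated in the text, and the function name is hypothetical):

```python
def raw_bandwidth_bps(width: int, height: int, fps: int, bit_depth: int) -> int:
    """Uncompressed bandwidth in bits per second for a single-plane Bayer RAW stream."""
    return width * height * fps * bit_depth


# Assumed frame size: 4096 x 2048, 30 fps, 16 bits per sample.
bps = raw_bandwidth_bps(4096, 2048, 30, 16)
print(f"{bps / 1e9:.2f} Gbps")
```

Under these assumptions the result is roughly 4 Gbps, which matches the order of magnitude quoted in the text.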
自动驾驶对安全性要求高,因此,自动驾驶系统对感知系统的延时比较敏感。图1a所示的场景作为感知系统的一个示例,对压缩算法的需求可以包括:支持RAW格式图像的编码,低延时,低复杂度,高压缩性能。为了满足这些性能,相关技术中设计了在RAW域进行视频压缩的架构。图1b示出根据相关技术中一示例的视频压缩的架构的示意图。如图1b所示,摄像头输出RAW格式的图像,经过编码器进行编码后输出RAW格式的图像,输出的RAW格式的图像是经过压缩后的图像,编码器编码后的图像可以传输到MDC,MDC上可以包括解码器、ISP以及深度神经网络,解码器用于对收到的已压缩的图像进行解码得到解码后的图像,然后再经过ISP处理后输出三原色(Red Green Blue,RGB)或者YUV格式的图像到深度神经网络进一步处理。其中,ISP处理可以包括:去马赛克(Demosaic)操作,用于将图像从RAW格式转换成RGB格式;白平衡(white balance,WB)操作,用于对图像进行白平衡处理;色彩校正矩阵(Color Correction Matrix,CCM),用于完成sensor_RGB色彩空间到sRGB色彩空间的转换,使得相机的颜色匹配特性满足卢瑟条件;伽马(Gamma)矫正, 用于矫正显示器的显示特性和输入图像的非线性关系。ISP处理还可以包括其他对图像的处理过程,本申请不限于上述处理。深度神经网络对图像进行的处理可以包括:图像识别、分割等。
图1b所示的示例在RAW域压缩可以降低感知系统到MDC的时延;图1b所示的ISP和深度神经网络可以设置于MDC中,这样可以提供更加灵活的ISP能力,获得更好的图像质量,并且能够降低感知系统到MDC的时延。
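Lossy image/video compression can achieve a high compression ratio. Common lossy compression standards include Joint Photographic Experts Group (JPEG), H.264/H.265, and JPEG-XS (Joint Photographic Experts Group Extra Speed). JPEG-XS is a newer compression standard proposed by the Joint Photographic Experts Group. The image quality loss introduced by compression is unavoidable, and it affects subsequent machine vision processing: it may lower recognition accuracy, make image segmentation inaccurate, and so on.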
为了评估压缩带来的图像质量的损伤对后续人工智能(Artificial Intelligence,AI)处理的影响,相关技术中提出了一些图像质量评价方法,在多大的码率阈值上进行压缩,可以达到机器无损的要求。其中,机器无损是指,相比于不压缩的图像,对压缩后的图像进行识别的精度指标在一定的误差范围内。也就是说,对压缩后的图像进行识别的精度指标,与对原图像(没有压缩的图像)进行识别的精度指标之间的差值在一定的误差范围内。
在实际应用中,如何评价处理后的图像质量是否满足机器无损,尚无业界共识和标准,精度阈值定多少算机器无损,业界也不清楚。因此,往往无法给出合适的精度阈值,使得采用根据精度阈值确定的码率阈值对图像进行压缩后可以尽量达到机器无损,不影响AI处理的性能。
发明内容
有鉴于此,提出了一种图像处理方式的确定方法及装置,通过概率指标表征接近机器无损的程度,为每一个码率计算一个概率值,并根据业务要求的概率阈值确定满足机器无损的码率阈值,能够合理的给出满足机器无损的码率阈值。
第一方面,本申请的实施例提供了一种图像处理方式的确定方法,所述方法包括:根据第一参数和精度的第一关系,计算所述第一参数中每个数值对应的概率,所述第一参数为码率或失真度;其中,所述精度为对已处理图像进行识别的精度,所述第一参数中每个数值对应的概率用于指示对所述第一参数中每个数值对应的所述已处理图像进行识别的精度与第一精度的接近程度,所述第一精度为对原图像进行识别的精度,所述原图像被处理后得到所述已处理图像;确定所述第一参数和所述概率的第二关系;根据业务要求的概率阈值和所述第二关系,得到所述概率阈值对应的所述第一参数的阈值。
其中,业务要求的概率阈值可以是指不同的应用场景下对识别的精度与机器无损的接近程度的需求,应用场景可以为自动驾驶、辅助驾驶等等,这些不同的应用场景对处理精度与机器无损的接近程度的需求可能是不同的,因此,不同的应用场景有对应的业务要求的概率阈值。
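The method for determining an image processing mode in the embodiments of the present application uses a probability index to characterize how close the result is to machine-lossless, calculates a probability value for each bit rate, and determines the machine-lossless bit-rate threshold according to the probability threshold required by the service, so it can reasonably give a bit-rate threshold that satisfies machine losslessness.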
根据第一方面,在第一种可能的实现方式中,根据第一参数和精度的第一关系,计算所述第一参数中每个数值对应的概率,包括:针对所述第一参数中每个数值,计算以所述每个数值为中心的滑动窗口内所述第一参数的所有数值对应的精度的精度均值和标准差;根据所述精度均值、所述标准差、所述滑动窗口的长度、所述第一精度以及累积分布函数,计算所述每个数值对应的概率。
根据第一方面的第一种可能的实现方式,在第二种可能的实现方式中,根据所述精度均值、所述标准差、所述滑动窗口的长度、所述第一精度以及累积分布函数,计算所述每个数值对应的概率,包括:
根据公式
Figure PCTCN2021084377-appb-000001
计算每个数值对应的概率,其中,P m表示第一参数m对应的概率,μ表示所述第一精度,n表示滑动窗口的长度,
Figure PCTCN2021084377-appb-000002
表示以第一参数m为中心、长度为n的滑动窗口内的第一参数的数值对应的精度的精度均值,σ m表示以第一参数m为中心、长度为n的滑动窗口内的第一参数的数值对应的精度的标准差,
Figure PCTCN2021084377-appb-000003
表示自由度为n-1的T分布累积分布函数,n为大于1的正整数。
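Calculating the probability corresponding to the first parameter based on t-test (hypothesis testing) theory suits small-sample scenarios and makes the evaluation more accurate; working with small samples also improves evaluation efficiency.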
根据第一方面或第一方面的第一种可能的实现方式,在第三种可能的实现方式中,所述第一参数为码率,所述第一参数中每个数值对应的所述已处理图像为采用所述第一参数中每个数值对所述原图像进行压缩得到的图像。
根据第一方面或第一方面的第一种可能的实现方式,在第四种可能的实现方式中,所述第一参数为所述已处理图像的失真度,所述第一参数的阈值为失真阈值,所述方法还包括:根据所述失真阈值和第三关系,确定所述失真阈值对应的码率阈值;其中,所述第三关系为码率和失真度之间的对应关系。
在上述实施例中,通过引入失真度作为中间变量,采用第一关系评价压缩之后的失真度对精度的影响,采用第三关系评价采用不同的码率压缩之后的失真度,将压缩的过程和压缩之后的处理过程的评价分开处理,可以实现压缩和压缩之后的处理过程的解耦,提高评测的效率。示例性的,本申请实施例提供的上述方法可以实现压缩算法与AI处理的解耦,如果要评测新的压缩算法,不需要进行端到端(从前端的压缩处理、到后端的人工智能处理)评测,可以只对压缩处理的过程进行评测得到新的压缩算法对应的第三关系即可。同样的,如果要采用新的AI模块对图像进行识别,也可以采用已有的数据对AI识别的过程重新进行评测得到新的第一关系即可,不需要进行端到端的评测。本申请实施例提供的方法可以提高评测的效率。
根据第一方面的第四种可能的实现方式,在第五种可能的实现方式中,所述原图像为拜耳原始Bayer RAW图像,所述已处理图像为RAW图像、或RGB图像、或YUV图像。
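The method for determining an image processing mode of the present application can be applied to a variety of scenarios, is highly general, and extends easily to the evaluation of new compression algorithms or AI modules, which improves evaluation efficiency.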
第二方面,本申请的实施例提供了一种图像处理方式的确定装置,所述装置包括:计算模块,用于根据第一参数和精度的第一关系,计算所述第一参数中每个数值对应的概率,所述第一参数为码率或失真度;其中,所述精度为对已处理图像进行识别的精度,所述第一参数中每个数值对应的概率用于指示对所述第一参数中每个数值对应的所述已处理图像进行识别的精度与第一精度的接近程度,所述第一精度为对原图像进行识别的精度,所述原图像被处理后得到所述已处理图像;第一确定模块,用于确定所述第一参数和所述概率的第二关系;第二确定模块,用于根据业务要求的概率阈值和所述第二关系,得到所述概率阈值对应的所述第一参数的阈值。
本申请实施例的图像处理方式的确定装置,通过概率指标表征接近机器无损的程度,为每一个码率计算一个概率值,并根据业务要求的概率阈值确定满足机器无损的码率阈值,能够合理的给出满足机器无损的码率阈值。
根据第二方面,在第一种可能的实现方式中,所述计算模块包括:第一计算单元,用于针对所述第一参数中每个数值,计算以所述每个数值为中心的滑动窗口内所述第一参数的所有数值对应的精度的精度均值和标准差;第二计算单元,用于根据所述精度均值、所述标准差、所述滑动窗口的长度、所述第一精度以及累积分布函数,计算所述每个数值对应的概率。
根据第二方面的第一种可能的实现方式,在第二种可能的实现方式中,所述第二计算单元用于根据公式
Figure PCTCN2021084377-appb-000004
计算每个数值对应的概率,其中,P m表示第一参数m对应的概率,μ表示所述第一精度,n表示滑动窗口的长度,
Figure PCTCN2021084377-appb-000005
表示以第一参数m为中心、长度为n的滑动窗口内的第一参数的数值对应的精度的精度均值,σ m表示以第一参数m为中心、长度为n的滑动窗口内的第一参数的数值对应的精度的标准差,
Figure PCTCN2021084377-appb-000006
表示自由度为n-1的T分布累积分布函数,n为大于1的正整数。
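Calculating the probability corresponding to the first parameter based on t-test (hypothesis testing) theory suits small-sample scenarios and makes the evaluation more accurate; working with small samples also improves evaluation efficiency.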
根据第二方面或第二方面的第一种可能的实现方式,在第三种可能的实现方式中,所述第一参数为码率,所述第一参数中每个数值对应的所述已处理图像为采用所述第一参数中每个数值对所述原图像进行压缩得到的图像。
根据第二方面或第二方面的第一种可能的实现方式,在第四种可能的实现方式中,所述第一参数为所述已处理图像的失真度,所述第一参数的阈值为失真阈值,所述装置还包括:第三确定模块,用于根据所述失真阈值和第三关系,确定所述失真阈值对应的码率阈值;其中,所述第三关系为码率和失真度之间的对应关系。
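In the above embodiment, distortion is introduced as an intermediate variable: the first relationship evaluates how the distortion after compression affects accuracy, and the third relationship evaluates the distortion produced by compressing at different bit rates. Evaluating the compression process separately from the processing that follows it decouples the two and improves evaluation efficiency. For example, the apparatus provided in the embodiments of the present application decouples the compression algorithm from the AI processing: to evaluate a new compression algorithm, no end-to-end evaluation is needed; it is enough to evaluate the compression process alone and obtain the third relationship corresponding to the new compression algorithm. Similarly, to recognize images with a new AI module, the AI recognition process can be re-evaluated on existing data to obtain a new first relationship, again without end-to-end evaluation. The apparatus provided in the embodiments of the present application can improve evaluation efficiency.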
根据第二方面的第四种可能的实现方式,在第五种可能的实现方式中,所述原图像为拜耳原始Bayer RAW图像,所述已处理图像为RAW图像、或RGB图像、或YUV图像。
本申请的图像处理方式的确定装置可以应用于多种场景,通用性强,并且易于扩展对新的压缩算法或者AI模块的评测,可以提高评测的效率。
第三方面,本申请的实施例提供了一种计算机程序产品,包括计算机可读代码,或者承载有计算机可读代码的非易失性计算机可读存储介质,当所述计算机可读代码在电子设备中运行时,所述电子设备中的处理器执行上述第一方面或者第一方面的多种可能的实现方式中的一种或几种的图像处理方式的确定方法。
第四方面,本申请的实施例提供了一种电子设备,该电子设备可以执行上述第一方面或者第一方面的多种可能的实现方式中的一种或几种的图像处理方式的确定方法。
第五方面,本申请实施例还提供一种传感器系统,用于为车辆提供感知功能。其包含至少一个本申请上述实施例提到的图像处理方式的确定装置,以及,摄像头或雷达等其他传感器中的至少一个,该系统内的至少一个传感器装置可以集成为一个整机或设备,或者该系统内的至少一个传感器装置也可以独立设置为元件或装置。
第六方面,本申请实施例还提供一种系统,应用于无人驾驶或智能驾驶中,其包含至少一个本申请上述实施例提到的图像处理方式的确定装置,摄像头、雷达等传感器中的至少一个,该系统内的至少一个装置可以集成为一个整机或设备,或者该系统内的至少一个装置也可以独立设置为元件或装置。
进一步,上述任一系统可以与车辆的中央控制器进行交互,为所述车辆驾驶的决策或控制提供探测和/或融合信息。
第七方面,本申请实施例还提供一种车辆,所述车辆包括至少一个本申请上述实施例提到的图像处理方式的确定装置或上述任一系统。
本申请的这些和其他方面在以下(多个)实施例的描述中会更加简明易懂。
附图说明
包含在说明书中并且构成说明书的一部分的附图与说明书一起示出了本申请的示例性实施例、特征和方面,并且用于解释本申请的原理。
图1a是相关技术中一种基于压缩的感知系统传输数据的示意图。
图1b示出根据相关技术中一示例的视频压缩的架构的示意图。
图2a示出根据本申请一实施例的评测框架的示意图。
图2b示出根据本申请一实施例的码率-精度曲线的示意图。
图3示出根据本申请一实施例的图像处理方式的确定方法的流程图。
图4示出根据本申请一实施例的第一关系对应的曲线以及第二关系的曲线示意图。
图5示出根据本申请一实施例的T分布的概率密度函数的分布示意图。
图6示出根据本申请一实施例的获得第一关系的场景的示意图。
图7a示出根据本申请一实施例的第一关系的曲线的示意图。
图7b示出根据本申请一实施例的第三关系对应的曲线的示意图。
图8示出根据本申请一实施例的第一关系对应的曲线以及第二关系的曲线示意图。
图9示出根据本申请一些示例的测评框架的示意图。
图10示出根据本申请一些示例的测评框架的示意图。
图11示出根据本申请一些示例的测评框架的示意图。
图12示出根据本申请一实施例的图像处理方式的确定装置的框图。
具体实施方式
以下将参考附图详细说明本申请的各种示例性实施例、特征和方面。附图中相同的附图标记表示功能相同或相似的元件。尽管在附图中示出了实施例的各种方面,但是除非特别指出,不必按比例绘制附图。
在这里专用的词“示例性”意为“用作例子、实施例或说明性”。这里作为“示例性”所说明的任何实施例不必解释为优于或好于其它实施例。
另外,为了更好的说明本申请,在下文的具体实施方式中给出了众多的具体细节。本领域技术人员应当理解,没有某些具体细节,本申请同样可以实施。在一些实例中,对于本领域技术人员熟知的方法、手段、元件和电路未作详细描述,以便于凸显本申请的主旨。
相关概念解释:
码率可以表示在对图像进行压缩时取样的频率。
精度可以表示对已处理图像进行识别的精度,比如对已压缩图像进行识别的精度。
第一精度:对原图像(没有压缩的图像)进行识别的精度。
机器无损概率:对压缩后的图像进行识别的精度与第一精度的接近程度。
图2a示出根据本申请一实施例的评测框架的示意图,图2a所示的评测框架为动态图像专家组(Moving Picture Experts Group,MPEG)-机器视觉编码(Video Coding for Machines,VCM)工作组定义的、面向机器视觉的图像质量评价方法,可以采用端到端的评测流程。也就是说,基于图2a所示的评测框架的评价方法是对从前端的压缩处理、到后端的人工智能处理整个过程的精度的评价。
如图2a所示,摄像头将经过ISP处理的视频输出到VCM编码器,经过ISP处理的视频可以为RGB或者YUV格式,由VCM编码器对视频进行编码得到编码后的视频,编码后的视频传输到VCM解码器,由VCM解码器进行视频解码得到解码后的视频,对解码后的视频进行机器视觉处理,具体地,可以将解码后的图像输出给神经网络,通过神经网络进行机器视觉处理。
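For different compression algorithms, to determine the bit-rate threshold at which compression still lets recognition accuracy meet the machine-lossless requirement, an end-to-end test is run with the framework shown in FIG. 2a to obtain the accuracy at multiple bit-rate points of the compression algorithm. A bit rate-accuracy curve can be drawn from these points, and the bit-rate threshold corresponding to the accuracy index can be determined from the accuracy threshold required by the service and the bit rate-accuracy curve. This approach is inefficient. The accuracy threshold required by the service refers to the accuracy requirement of a given application scenario, such as autonomous driving or assisted driving; different scenarios may require different processing accuracy, so each scenario has its own service-required accuracy threshold. In a possible implementation, the accuracy threshold required by the service may be the difference between the accuracy required by the service and the machine-lossless accuracy.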
图2b示出根据本申请一实施例的码率-精度曲线的示意图。作为示例,假设业务要求的精度阈值分别为10%和20%,如图2b所示,神经网络对无压缩的图像进行识别的精度约为26%,若精度阈值为10%,那么对应的精度约为16%,精度16%对应的码率约为0.03;若精度阈值为20%,那么对应的精度约为6%,精度6%对应的码率约为0.025。
在实际应用中,如何评价处理后的图像质量是否满足机器无损,尚无业界共识和标准,精度阈值定多少算机器无损,业界也不清楚。因此,往往无法给出合适的精度阈值,使得采用根据精度阈值确定的码率阈值对图像进行压缩后可以尽量达到机器无损,不影响AI处理的性能。
为了解决上述技术问题,本申请提供了一种图像处理方式的确定方法,提出了一种基于概率估计对处理后的图像质量是否为机器无损进行评价的方式。图3示出根据本申请一实施例的图像处理方式的确定方法的流程图。如图3所示,本申请实施例的图像处理方式的确定方法可以包括以下步骤:
步骤S300,根据第一参数和精度的第一关系,计算所述第一参数中每个数值对应的概率,所述第一参数为码率或失真度。
步骤S301,确定所述第一参数和所述概率的第二关系。
步骤S302,根据业务要求的概率阈值和所述第二关系,得到所述概率阈值对应的所述第一参数的阈值。
其中,所述精度为对已处理图像进行识别的精度,处理可以是指对图像进行的编码/解码等压缩处理。在本申请的实施例中,精度采用的指标可以为平均精度均值(mean Average Precision,mAP),精度均值(Average Precision,AP),平均召回率(Average Recall,AR),均交并比(Mean Intersection over Union,MIoU)。其中,AP可以为AP50、AP60、AP70或者weightedAP,等等。精度采用的指标还可以是以上指标中多个的结合,比如说,联合以上多个指标对AI处理的精度进行综合测评。举例来说,可以采用mAP和AR两个指标的加权指标作为精度的指标,本申请对精度采用的具体指标不作限定。
在本申请的实施例中,第一参数中每个数值对应的概率用于指示对所述第一参数中每个数值对应的所述已处理图像进行识别的精度与第一精度的接近程度,所述第一精度为对原图像进行识别的精度,所述原图像被处理后得到所述已处理图像。下面分别以第一参数为码率和第一参数为失真度为例说明本申请实施例的方法。
以第一参数为码率为例,所述第一参数中每个数值对应的所述已处理图像为采用所述第一参数中每个数值对所述原图像进行压缩得到的图像。因此,第一参数中每个数值对应的概率用于指示:采用所述第一参数中每个数值对原图像进行处理后进行识别的精度与第一精度的接近程度。
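In other words, the first relationship is the correspondence between bit rate and accuracy. From it, the probability corresponding to each bit-rate value can be calculated; this probability indicates how close the accuracy of recognizing an image compressed at that bit rate is to the first accuracy.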
在本申请的实施例中,第一参数为码率,第一关系可以是通过图2a所示的框架进行测试得到的,可以存储为离散的数值对之间的对应关系,也可以存储为函数的形式,函数可以表示为如图2b所示的曲线,也就是第一关系也可以表示为曲线的形式。
需要说明的是,本申请实施例的第一关系可以直接在已有的测试集上进行测试,比如说,可以在cityscape数据集上进行测试,cityscape数据集中包括训练图、验证图、测试图,数据集中包括的图像都带有注释,可以直接进行压缩和识别处理,得到本申请实施例需要的第一关系。
这种情况下,仿真设备可以包括图2a所示的VCM编码器、VCM解码器、机器视觉模块和处理器,不需要包括图2a所示的摄像头和ISP,仿真设备接收的是测试集中的图像数据。其中,VCM编码器、VCM解码器、机器视觉模块可以是存储在仿真设备的存储器上的软件程序,处理器可以调用相应的模块实现对测试集中的图像的处理,并得到码率和对应的精度数据。处理器可以建立码率和精度之间的第一关系,并存储第一关系。
在另一种可能的实现方式中,本申请实施例的方法也可以在实际的应用场景中进行测试得到第一关系,比如说,在自动驾驶系统上进行测试,自动驾驶系统可以包括摄像头(如图2a所示的摄像头),还可以包括但不限于:车载终端、车载控制器、车载模块、车载模组、车载部件、车载芯片、车载单元、车载雷达或车载摄像头等其他传感器。
其中,编码器可以位于摄像头上,解码器可以位于MDC上,机器视觉模块和处理器可以位于MDC上,或者,机器视觉模块位于MDC上,处理器可以是外部设备(自动驾驶系统以外的设备)的处理器。
该外部设备可以为一个通用设备或者是一个专用设备。在具体实现中,该外部设备可以是台式机、便携式电脑、网络服务器、掌上电脑(personal digital assistant,PDA)、移动手机、平板电脑、无线终端设备、嵌入式设备或其他具有处理功能的设备。本申请实施例不限定该外部设备的类型。该外部设备可以具有处理功能的芯片或处理器,该外部设备可以包括多个处理器,处理器可以是一个单核(single-CPU)处理器,也可以是一个多核(multi-CPU)处理器。
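Taking the case where the processor is the processor of the external device as an example, once the first relationship has been obtained, the method for determining an image processing mode of the present application can be executed offline by the external device.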
测试时可以设置图2a中编码器的码率,摄像头采集图像后由编码器进行编码后,发送到MDC,由解码器解码后得到的图像,对解码后得到的图像进行机器视觉处理可以得到精度数据。对于不同的码率点,可以分多次设置编码器的码率,并执行上述过程得到多个码率点对应的精度。
对于得到的码率和码率对应的精度数据可以输出到外部设备,外部设备可以根据码率和码率对应的精度数据建立码率和精度之间的第一关系,并存储第一关系。
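If the machine vision module and the processor shown in FIG. 2a are both located on the MDC, the autonomous driving system can execute the method of the embodiments of the present application online. For example, during testing the bit rate of the encoder in FIG. 2a is set; the camera captures an image, the encoder encodes it and sends it to the MDC, the decoder decodes it, and machine vision processing of the decoded image yields the accuracy data. The MDC can establish the correspondence between bit rate and accuracy (the first relationship) and store it. The MDC then executes the method for determining an image processing mode of the present application.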
需要说明的是,上述示例中机器视觉模块和处理器都位于MDC上仅仅是本申请的一个示例,不以任何方式限制本申请,比如说,机器视觉模块和处理器还可以位于自动驾驶系统的其他部件上,本申请对此不作限定。
在一种可能的实现方式中,步骤S300,根据第一参数和精度的第一关系,计算所述第一参数中每个数值对应的概率,可以包括:针对所述第一参数中每个数值,计算以所述每个数值为中心的滑动窗口内所述第一参数的所有数值对应的精度的精度均值和标准差;根据所述精度均值、所述标准差、所述滑动窗口的长度、所述第一精度以及累积分布函数,计算所述每个数值对应的概率。
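Taking the first relationship expressed as a curve as an example, in the embodiments of the present application the sliding window is a window that covers a fixed length but can slide along the curve of the first relationship. It slides along the abscissa with a certain step size. The width of the sliding window may change as it slides: over the length of the window, the larger the difference between the accuracies at the left and right boundaries of the window, the wider the window, and the smaller the difference, the narrower the window.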
图4示出根据本申请一实施例的第一关系对应的曲线以及第二关系的曲线示意图。如图4所示,针对压缩后的图像识别的精度和码率之间的关系曲线(曲线压缩:码率-精度)可以为第一关系对应的曲线的示例,图4中的虚线矩形框和实线矩形框都可以表示滑动窗口。假设滑动窗口的长度为1,步长为0.5,滑动窗口的中心沿着下一个步长对应的横坐标和第一关系对应的曲线的交点的方向滑动,如图4所示,虚线矩形框的中心在横坐标5和第一关系对应的曲线的交点处,在下一次的移动过程中,沿着横坐标5.5和第一关系对应的曲线的交点的方向移动,移动后如实线矩形框所示,实线矩形框的中心在横坐标5.5和第一关系对应的曲线的交点处。
滑动完成后,可以计算滑动窗口覆盖范围内的第一参数的精度均值和标准差,结合图4所示的示例,比如说对于实线矩形框,假设滑动窗口的长度为1,滑动窗口覆盖了码率5-6范围内的数值,可以根据第一关系对应的曲线的表达式计算码率5到6对应的精度的均值和标准差。需要说明的是,图4所示的示例中的滑动窗口的长度、码率的数值、精度的数值以及步长的数值都是本申请的示例,不以任何方式限制本申请。比如说,滑动窗口的长度还可以设置为11。
在本申请的实施例中,根据所述精度均值、所述标准差、所述滑动窗口的长度、所述第一精度以及累积分布函数,计算所述每个数值对应的概率,可以包括:
根据公式(1)计算每个数值对应的概率:
Figure PCTCN2021084377-appb-000007
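where P_m denotes the probability corresponding to the first parameter m (in this example, m is the bit rate), μ denotes the first accuracy, n denotes the length of the sliding window,
Figure PCTCN2021084377-appb-000008
denotes the mean of the accuracies corresponding to the values of the first parameter within the sliding window of length n centered on the first parameter m, σ_m denotes the standard deviation of the accuracies corresponding to the values of the first parameter within that sliding window,
Figure PCTCN2021084377-appb-000009
denotes the cumulative distribution function of the t-distribution with n-1 degrees of freedom, and n is a positive integer greater than 1.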
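Formula (1) itself survives here only as an image reference. One reading that is consistent with the symbol definitions above, and with the description of the probability as twice the upper-tail area of the t-distribution, is the two-sided p-value of a one-sample t-test. The notation \bar{x}_m for the window mean and F_{n-1} for the cumulative distribution function is assumed here and is not taken from the original figure:

```latex
t_m = \frac{\bar{x}_m - \mu}{\sigma_m / \sqrt{n}}, \qquad
P_m = 2\left(1 - F_{n-1}\left(\lvert t_m \rvert\right)\right)
```

Under this reading, P_m approaches 1 when the window mean is close to the first accuracy μ and falls toward 0 as the window mean moves away from μ.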
图5示出根据本申请一实施例的T分布的概率密度函数的分布示意图。如图5所示,假设自由度n-1为4,对T分布的概率密度函数进行积分得到的值(也就是T分布的累积分布函数的值)是图5中t=-4、t=4、T分布概率密度函数在-4到4之间的曲线和x轴组成的图形的面积。因此,
Figure PCTCN2021084377-appb-000010
表示的是图5中t=4、T分布概率密度函数在4到+∞之间的曲线和x轴组成的图形的面积的两倍,P越大,表示t值越接近y轴(中心线),P越小,表示t值越远离y轴(中心线)。
由于将精度的均值和方差作为T分布的累积分布函数的参数,因此,上述公式(1)对应的中心线为图4中的精度等于1.0的粗虚线(无压缩对应的曲线),计算得到的P值越大,表示精度值越接近中心线,计算得到的P值越小,表示精度越远离中心线。因此,根据公式(1)计算得到的P值可以表示精度距离中心线的程度。这里的中心线也就是对原图像进行识别的精度线。
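Calculating the probability corresponding to the first parameter based on t-test (hypothesis testing) theory suits small-sample scenarios and makes the evaluation more accurate; working with small samples also improves evaluation efficiency.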
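The sliding-window statistics and the probability calculation described above can be sketched with the standard library only. This is an illustration under stated assumptions, not the applicant's implementation: the window length n is treated as an odd number of samples, the probability is taken to be the two-sided tail area of the t-distribution with n-1 degrees of freedom, the CDF is approximated by Simpson's rule, and all function names are hypothetical.

```python
import math
import statistics


def t_pdf(t, dof):
    """Density of Student's t-distribution with `dof` degrees of freedom."""
    coeff = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))
    return coeff * (1 + t * t / dof) ** (-(dof + 1) / 2)


def t_cdf(x, dof):
    """CDF by Simpson's rule on [0, |x|], using the symmetry of the density."""
    a = abs(x)
    if a == 0:
        return 0.5
    # Even number of intervals; capped, so accuracy degrades for extreme |x|.
    steps = 2 * min(100000, max(1000, int(50 * a)))
    h = a / steps
    total = t_pdf(0.0, dof) + t_pdf(a, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def two_sided_p(t_stat, dof):
    """Twice the upper-tail area beyond |t_stat|, clamped to [0, 1]."""
    return max(0.0, min(1.0, 2 * (1 - t_cdf(abs(t_stat), dof))))


def window_probabilities(accuracies, first_accuracy, n):
    """Return (index, probability) for every index that has a full window of n samples."""
    if n < 3 or n % 2 == 0:
        raise ValueError("this sketch assumes an odd window length n >= 3")
    half = n // 2
    out = []
    for i in range(half, len(accuracies) - half):
        window = accuracies[i - half:i + half + 1]
        mean = statistics.mean(window)
        std = statistics.stdev(window)  # sample standard deviation
        if std == 0:
            p = 1.0 if mean == first_accuracy else 0.0
        else:
            t_stat = (mean - first_accuracy) / (std / math.sqrt(n))
            p = two_sided_p(t_stat, n - 1)
        out.append((i, p))
    return out


# Hypothetical accuracies at increasing bit rates; 0.26 stands for the uncompressed accuracy.
acc = [0.10, 0.12, 0.14, 0.25, 0.26, 0.27]
for idx, p in window_probabilities(acc, first_accuracy=0.26, n=3):
    print(idx, round(p, 3))
```

Windows whose mean accuracy is far from the first accuracy get a probability near 0, and windows centered on it get a probability near 1. In practice a library routine for the t-distribution CDF would replace the hand-written integration.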
在本申请的实施例中,计算得到的离散的数值对可以作为第二关系,或者也可以对计算得到的离散的数值对进行拟合得到函数,将得到的函数作为第二关系,图4中码率-概率的曲线可以是第二关系的一个示例。
在本申请的实施例中,业务要求的概率阈值可以是指不同的应用场景下对识别的精度与机器无损的接近程度的需求,应用场景可以为自动驾驶、辅助驾驶等等,这些不同的应用场景对处理精度与机器无损的接近程度的需求可能是不同的,因此,不同的应用场景有对应的业务要求的概率阈值。在一种可能的实现方式中,可以根据业务要求的机器无损置信水平确定对应的码率阈值,业务要求的机器无损置信水平可以作为业务要求的概率阈值。比如说,业务要求的机器无损置信水平为0.1,那么,根据图4所示的第二关系对应的曲线可以确定概率阈值0.1对应的码率阈值可以为bpp=5。该码率点即为机器无损的码率阈值点,当码率高于该码率阈值时认为达到了机器无损水平。
当业务要求的机器无损置信水平为0.05时,可以确定另一码率点bpp=4.5。不同置信水平下具有不同的码率阈值点,置信水平越高,码率阈值越大,越接近无压缩时的精度指标。
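Step S302 amounts to reading the second relationship backwards: given a probability threshold, find the bit rate at which the probability first reaches it. A minimal sketch, assuming the second relationship is stored as discrete (bit rate, probability) pairs sorted by bit rate and that linear interpolation between neighbouring pairs is acceptable. The function name is hypothetical, and the sample values are made up so that the thresholds 0.1 and 0.05 map to the bit rates quoted in the text.

```python
def first_parameter_threshold(pairs, prob_threshold):
    """Smallest parameter value at which the probability reaches prob_threshold.

    pairs: list of (parameter_value, probability), sorted by parameter_value.
    Returns None if the threshold is never reached.
    """
    prev_x, prev_p = pairs[0]
    if prev_p >= prob_threshold:
        return prev_x
    for x, p in pairs[1:]:
        if p >= prob_threshold:
            # Linear interpolation between the two samples that bracket the threshold.
            frac = (prob_threshold - prev_p) / (p - prev_p)
            return prev_x + frac * (x - prev_x)
        prev_x, prev_p = x, p
    return None


# Hypothetical second relationship: probability rises with bit rate (bpp).
second_relationship = [(4.0, 0.01), (4.5, 0.05), (5.0, 0.10), (5.5, 0.40), (6.0, 0.90)]
print(first_parameter_threshold(second_relationship, 0.10))
print(first_parameter_threshold(second_relationship, 0.05))
```

A stricter (higher) probability threshold yields a higher bit-rate threshold, which is the trend the text describes.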
本申请实施例的图像处理方式的确定方法,通过概率指标表征接近机器无损的程度,为每一个码率计算一个概率值,并根据业务要求的概率阈值确定满足机器无损的码率阈值,能够合理的给出满足机器无损的码率阈值。
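The first parameter in the embodiments of the present application may also be distortion, in which case the threshold of the first parameter is a distortion threshold. Distortion represents the difference between the processed (compressed) image and the real scene. In the embodiments of the present application, the distortion index may be peak signal-to-noise ratio (PSNR), mean square error (MSE), structural similarity index (SSIM), or perceptual loss (P-loss). Distortion may also combine several of these indexes, that is, several distortion indexes are used together to evaluate the distortion of the compressed image. For example, a weighted combination of PSNR and SSIM can serve as the final distortion, which suits applications that place requirements on both signal fidelity (PSNR) and human visual perception (SSIM).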
图6示出根据本申请一实施例的获得第一关系的场景的示意图。
如图6所示,在本申请的实施例的应用场景中,可以包括压缩模块、AI模块以及处理器。其中,压缩模块可以对接收到的图像,进行压缩处理,在压缩处理时可以对图像进行取样(取样频率为码率)处理,压缩后的图像可以传输给AI模块进行目标检测、图像分割等处理。
需要说明的是,本申请实施例的方法可以直接在已有的测试集上进行测试,比如说,可以在cityscape数据集上进行测试,cityscape数据集中包括训练图、验证图、测试图,数据集中包括的图像都带有注释,可以直接进行压缩和识别处理,得到本申请实施例的测试结果数据(码率对应的失真度和精度数据)。其中,码率为采用压缩算法对原图像进行压缩得到已压缩的图像的取样频率,失真度为已压缩的图像相对于真实环境的差异。
这种情况下,仿真设备可以包括上述压缩模块、AI模块和处理器,其中,压缩模块和AI模块可以是存储在仿真设备的存储器上的软件程序,处理器可以调用相应的模块实现对测试集中的图像的处理,并得到测试结果数据。对于得到的测试结果数据,处理器可以建立失真度和精度之间的对应关系(第一关系)、以及码率和失真度的对应关系(第三关系),并存储第一关系和第三关系。
本申请实施例的方法也可以在实际的应用场景中进行测试,比如说,在自动驾驶系统上进行测试,自动驾驶系统可以包括摄像头,还可以包括但不限于:车载终端、车载控制器、车载模块、车载模组、车载部件、车载芯片、车载单元、车载雷达或车载摄像头等其他传感器。
其中,压缩模块可以是编码器,编码器可以位于摄像头上,压缩模块还可以包括解码器,解码器可以位于MDC上,AI模块和处理器可以位于MDC上,或者,AI模块位于MDC上,处理器可以是外部设备(自动驾驶系统以外的设备)的处理器。
该外部设备可以为一个通用设备或者是一个专用设备。在具体实现中,该外部设备还可以是台式机、便携式电脑、网络服务器、掌上电脑(personal digital assistant,PDA)、移动手机、平板电脑、无线终端设备、嵌入式设备或其他具有处理功能的设备。本申请实施例不限定该外部设备的类型。该外部设备可以具有处理功能的芯片或处理器(如图6所示的处理器),该外部设备可以包括多个处理器,处理器可以是一个单核(single-CPU)处理器,也可以是一个多核(multi-CPU)处理器。
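Taking the case where the processor shown in FIG. 6 is the processor of the external device as an example, the method for determining an image processing mode of the present application can be executed offline by the external device. During testing, the bit rate of the encoder is set; the camera captures an image, the encoder encodes it and sends it to the MDC, and the image decoded by the decoder can be stored on the MDC; the AI module recognizes the decoded image to obtain accuracy data. For different bit-rate points, the encoder bit rate is set in several passes and the above process is repeated to obtain the test result data.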
对于得到的测试结果数据可以输出到外部设备,外部设备可以根据解码后的图像得到已压缩的图像的失真度,外部设备可以建立失真度和精度之间的对应关系(第一关系)、以及码率和失真度的对应关系(第三关系),并存储第一关系和第三关系。
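If the AI module and the processor shown in FIG. 6 are both located on the MDC, the autonomous driving system can execute the method of the embodiments of the present application online. For example, during testing the bit rate of the encoder is set; the camera captures an image, the encoder encodes it and sends it to the MDC, and the image decoded by the decoder can be stored on the MDC. The MDC obtains the distortion of the compressed image from the decoded image, and the AI module recognizes the decoded image to obtain accuracy data. From the resulting test data, the MDC can establish the correspondence between distortion and accuracy (the first relationship) and the correspondence between bit rate and distortion (the third relationship), and store both.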
在本申请的实施例中,第一关系和第三关系可以是以表项的形式存储的一对一对的数值,也可以是以函数的形式表示,本申请对此不作限定。
举例来说,示例性的,第一关系可以表示为如表1所示的形式。
表1
精度 失真度
P1 D1
P2 D2
Pn Dn
示例性的,第一关系还可以表示成函数的形式,如下公式(2)所示:
Figure PCTCN2021084377-appb-000011
其中,fi(D)表示在数值范围Di上,精度和失真度之间的函数关系,i为1~n的正整数。换言之,精度和失真度之间的关系可以表示为分段函数的形式。
在一种可能的实现方式中,在数值范围Di上,P和D可以为线性关系。精度和失真度之间的关系可以表示为分段线性函数。
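A piecewise linear relationship of this kind can be evaluated by interpolating between stored (distortion, accuracy) pairs. A minimal sketch, assuming the table is sorted by distortion and that inputs outside the table are clamped to the end points. The function name and the sample values are hypothetical.

```python
from bisect import bisect_right


def piecewise_linear(points, x):
    """Evaluate a piecewise linear function defined by (x, y) points sorted by x."""
    xs = [p[0] for p in points]
    if x <= xs[0]:
        return points[0][1]
    if x >= xs[-1]:
        return points[-1][1]
    i = bisect_right(xs, x)  # points[i - 1] and points[i] bracket x
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical first relationship: PSNR in dB -> mAP.
first_relationship = [(30.0, 0.10), (40.0, 0.20), (50.0, 0.26)]
print(piecewise_linear(first_relationship, 35.0))
```

Each pair of neighbouring table rows plays the role of one linear segment f_i(D) on its value range D_i.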
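FIG. 7a is a schematic diagram of the curve of the first relationship according to an embodiment of the present application. As shown in FIG. 7a, the abscissa represents distortion and the ordinate represents accuracy; in this example the distortion index is PSNR and the accuracy index is mAP. The first-relationship curves of three different compression algorithms (X265_medium (X265 default configuration), X264_medium (X264 default configuration), and X264_ultrafast (X264 fast configuration)) almost coincide. In other words, the first relationship does not depend on the specific compression algorithm used.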
换言之,机器视觉的性能主要取决于输入的图像的失真度,与压缩算法无关,不依赖于具体的压缩算法。另外,机器视觉的性能和具体采用的神经网络是有关系的。
Note that the third relationship can also be stored in a table as discrete value pairs; Table 2 shows one such storage form.
表2
码率 失真度
R1 D1
R2 D2
Rn Dn
第三关系也可以表示成函数的形式,如下公式(3)所示:
Figure PCTCN2021084377-appb-000012
其中,gi(R)表示在数值范围Ri上,失真度和码率之间的函数关系。换言之,失真度和码率之间的关系可以表示为分段函数的形式。在一种可能的实现方式中,在数值范围Ri上,D和R可以为线性关系,失真度和码率之间的关系可以表示为分段线性函数。第三关系以函数的形式表示时,图7b示出根据本申请一实施例的第三关系对应的曲线的示意图。
如图7b所示,横坐标可以表示码率,纵坐标可以表示失真度,在图7b所示的示例中采用的失真度的指标为PSNR。图7b所示的示例中,三种不同压缩算法对应的第三关系的曲线比较分散,也就是说,不同的压缩算法即使是采用相同的码率对图像进行压缩处理得到的已压缩的图像的失真度差别比较大,第三关系与具体采用的压缩算法有关。
在得到第一关系后,处理器可以执行步骤S300-S302。在本示例中,对于步骤S300,可以采用与第一参数为码率的示例中相同的方式计算失真度对应的概率。图8示出根据本申请一实施例的第一关系对应的曲线以及第二关系的曲线示意图。如图8所示,针对压缩后的图像识别的精度和压缩后的图像的失真度之间的关系曲线(曲线压缩:失真度-精度)可以为第一关系对应的曲线的示例。对图8所示的示例也可以采用滑动窗口的方式计算失真度对应的概率,具体地,可以根据公式(4)计算每个失真度对应的概率:
Figure PCTCN2021084377-appb-000013
其中,P d表示第一参数d对应的概率,在本示例中,d可以表示失真度(distortion),μ表示所述第一精度,n表示滑动窗口的长度,
Figure PCTCN2021084377-appb-000014
表示以第一参数d为中心、长度为n的滑动窗口内的第一参数的数值对应的精度的精度均值,σ d表示以第一参数d为中心、长度为n的滑动窗口内的第一参数的数值对应的精度的标准差,
Figure PCTCN2021084377-appb-000015
表示自由度为n-1的T分布的累积分布函数,n为大于1的正整数。
在本申请的实施例中,计算得到的离散的数值对可以作为第二关系,或者也可以对计算得到的离散的数值对进行拟合得到函数,将得到的函数作为第二关系,图8中失真度-概率的曲线可以是第二关系的一个示例。
在确定第二关系后,执行步骤S302,可以根据业务要求的机器无损置信水平确定对应的失真阈值,业务要求的机器无损置信水平可以作为业务要求的概率阈值。比如说,业务要求的机器无损置信水平为0.05,那么,根据图8所示的第二关系对应的曲线可以确定概率阈值0.05对应的失真阈值可以为47.5。
在本申请的实施例中,图像处理方式的确定方法还可以包括:根据所述失真阈值和第三关系,确定所述失真阈值对应的码率阈值;其中,所述第三关系为码率和失真度之间的对应关系。
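As shown in FIG. 7b, the same distortion threshold corresponds to different bit rates for different compression algorithms. For X265_medium, the distortion threshold 47.5 corresponds to a bit rate of about 1.4; for x264_ultrafast, it corresponds to a bit rate of about 1.75.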
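Because the third relationship differs per codec, the same distortion threshold has to be mapped to a bit rate separately for each compression algorithm. A minimal sketch, assuming each third relationship is a table of (bit rate, PSNR) pairs in which PSNR does not decrease with bit rate. The function name is hypothetical, and the table values are made up and only chosen so that 47.5 dB lands near the bit rates quoted above.

```python
def rate_for_distortion(rd_points, psnr_threshold):
    """Smallest bit rate whose PSNR reaches psnr_threshold, by linear interpolation.

    rd_points: list of (bit_rate, psnr) sorted by bit_rate, PSNR non-decreasing.
    Returns None if the threshold lies above the measured range.
    """
    prev_r, prev_d = rd_points[0]
    if prev_d >= psnr_threshold:
        return prev_r
    for r, d in rd_points[1:]:
        if d >= psnr_threshold:
            return prev_r + (psnr_threshold - prev_d) / (d - prev_d) * (r - prev_r)
        prev_r, prev_d = r, d
    return None


# Hypothetical third relationships (bit rate -> PSNR in dB) for two codecs.
third_relationship = {
    "X265_medium": [(0.5, 42.0), (1.0, 45.5), (1.5, 48.0), (2.0, 50.0)],
    "X264_ultrafast": [(0.5, 40.0), (1.0, 43.5), (1.5, 46.0), (2.0, 49.0)],
}
for codec, points in third_relationship.items():
    print(codec, round(rate_for_distortion(points, 47.5), 2))
```

Only the per-codec table changes when a new compression algorithm is evaluated; the distortion threshold itself comes from the first and second relationships and is reused.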
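The method for determining an image processing mode in the embodiments of the present application uses a probability index to characterize how close the result is to machine-lossless, calculates a probability value for each distortion value, and determines the machine-lossless distortion threshold according to the probability threshold required by the service. The bit-rate threshold corresponding to the distortion threshold is then determined from the third relationship between distortion and bit rate together with the distortion threshold, so a bit-rate threshold that satisfies machine losslessness can be given reasonably.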
并且,在上述实施例中,通过引入失真度作为中间变量,采用第一关系评价压缩之后的失真度对精度的影响,采用第三关系评价采用不同的码率压缩之后的失真度,将压缩的过程和压缩之后的处理过程的评价分开处理,可以实现压缩和压缩之后的处理过程的解耦,也就是实现了压缩算法与AI处理的解耦,如果要评测新的压缩算法,不需要进行端到端评测,可以只对压缩处理的过程进行评测得到新的压缩算法对应的第三关系即可。同样的,如果要采用新的AI模块对图像进行识别,也可以采用已有的数据对AI识别的过程重新进行评测得到新的第一关系即可,不需要进行端到端的评测。本申请实施例提供的方法可以提高评测的效率。
需要说明的是,图7a、图7b和图8中的数值仅仅是本申请的示例,不以任何方式限制本申请。
在一种可能的实现方式中,所述原图像可以为拜耳原始Bayer RAW图像,所述已处理图像为RAW图像、或RGB图像、或YUV图像。
下面结合具体的应用场景和应用示例对本申请的图像处理方式的确定方法进行说明。
图9示出根据本申请一些示例的测评框架的示意图。如图9所示,测评的过程可以分解为第一阶段和第二阶段两个阶段,其中,第一阶段用于对压缩算法进行测试,可以得到码率和失真度的第三关系,第二阶段用于对神经网络的识别精度进行测试,可以得到失真度和精度的第一关系。在图9的示例中,失真度可以定义在RGB域,失真度采用的指标可以为上文所述的PSNR,或者MSE,或者SSIM,或者P-loss,或者以上多个指标的加权结果。
对于压缩传输系统,失真主要是压缩编码引入的量化噪声。对精度-失真度曲线,失真度主要取决于压缩量化噪声的能量而与具体噪声形态关系不大,这种情况下,PSNR/MSE成为衡量压缩失真的合适的指标,PSNR和MSE存在log关系,MSE表征为压缩噪声能量大小,PSNR/MSE指标计算简单,使用方便。
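The logarithmic relationship between PSNR and MSE mentioned above is PSNR = 10 * log10(MAX^2 / MSE), where MAX is the largest possible sample value. A minimal sketch on flat lists of samples, assuming 8-bit data (MAX = 255) by default; the sample values are made up.

```python
import math


def mse(reference, processed):
    """Mean squared error between two equally sized sample sequences."""
    if len(reference) != len(processed):
        raise ValueError("sequences must have the same length")
    return sum((a - b) ** 2 for a, b in zip(reference, processed)) / len(reference)


def psnr(reference, processed, max_value=255):
    """Peak signal-to-noise ratio in dB; infinite when the inputs are identical."""
    err = mse(reference, processed)
    if err == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / err)


original = [52, 55, 61, 66, 70, 61, 64, 73]
decoded = [53, 55, 60, 66, 71, 61, 63, 73]
print(mse(original, decoded), round(psnr(original, decoded), 2))
```

For 16-bit RAW data, `max_value` would be 65535 instead.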
示例(a)表示参照(reference)的测评过程,RAW图像经ISP处理后输出RGB图像到深度神经网络,同时可以输出RGB图像的失真度数据,深度神经网络为未经压缩的RGB图像进行机器视觉处理得到识别的精度数据。
示例(b)表示在RAW域对RAW图像进行压缩的场景,采用编码/解码器对RAW图像进行压缩后得到已压缩的图像,ISP对已压缩的图像进行处理可以得到RGB图像,输出RGB图像的失真度数据,深度神经网络对RGB图像进行机器视觉处理,可以得到识别的精度数据。相比于在RGB/YUV域压缩,在RAW域对RAW图像进行压缩可以降低压缩算法的复杂度,因为RAW域的数据量少。
示例(c)表示在RGB域对RGB图像进行压缩的场景,RAW图像经ISP处理后得到RGB图像,编码/解码器对RGB图像进行压缩后得到已压缩的图像,输出已压缩的RGB图像的失真度数据,深度神经网络对已压缩的RGB图像进行机器视觉处理,可以得到识别的精度数据。
示例(d)表示在YUV域对YUV图像进行压缩的场景,RAW图像经ISP处理后得到RGB图像,对RGB图像进行RGB-YUV格式转换可以得到YUV图像,采用编码/解码器对YUV图像进行压缩得到已压缩的图像,对已压缩的图像进行YUV-RGB格式转换可以得到已压缩的RGB图像,输出已压缩的RGB图像的失真度数据,深度神经网络对已压缩的RGB图像进行机器视觉处理,可以得到识别的精度数据。
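The text does not say which RGB-YUV conversion is used in example (d). As one common choice, the sketch below uses the full-range BT.601 matrix from JFIF (strictly a YCbCr conversion) for 8-bit samples; the choice of matrix and the function names are assumptions.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 (JFIF) RGB -> YCbCr for 8-bit samples, without rounding."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Approximate inverse of rgb_to_ycbcr."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return r, g, b


pixel = (200, 120, 50)
restored = ycbcr_to_rgb(*rgb_to_ycbcr(*pixel))
print([round(c, 3) for c in restored])
```

Without rounding or chroma subsampling the round trip is almost exact, so in a pipeline like example (d) the measured distortion would come mainly from the codec rather than from the format conversion.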
示例(a)可以得到对未压缩的图像进行识别的精度,也就是所述的第一精度。对多种不同的压缩算法,可以分别采用示例(b)、示例(c)和示例(d)的框架进行测试,得到每个示例的框架上每个压缩算法对应的第三关系,以及每个示例的第一关系。
举例来说,对于三种压缩算法X265_medium、X264_medium、X264_ultrafast,可以分别在示例(b)、示例(c)和示例(d)的框架上,根据上述过程进行测试。
以示例(b)为例,在不同的码率上采用压缩算法X265_medium对RAW图像进行压缩后得到已压缩的图像,ISP对已压缩的图像进行处理可以得到RGB图像,输出RGB图像的失真度数据,可以建立压缩算法X265_medium对应的码率和失真度的第三关系,深度神经网络对RGB图像进行机器视觉处理,可以得到识别的精度数据,可以建立失真度和精度的第一关系;在不同的码率上采用压缩算法X264_medium对RAW图像进行压缩后得到已压缩的图像,ISP对已压缩的图像进行处理可以得到RGB图像,输出RGB图像的失真度数据,可以建立压缩算法X264_medium对应的码率和失真度的第三关系,对于压缩算法X264_medium可以不继续对后续的机器视觉处理的过程进行测试,在确定机器无损对应的码率时,可以采用根据压缩算法X265_medium测试得到的第一关系;对于压缩算法X264_ultrafast,可以重复与压缩算法X264_medium相同的过程得到对应的第三关系。
在得到第一关系后,可以根据本申请实施例提供的图像处理方式的确定方法中的步骤S300计算每个失真度对应的概率,并建立失真度与概率的第二关系。根据业务要求的概率阈值、第二关系以及第三关系能够合理的给出满足机器无损的码率阈值。
由此可见,本申请实施例的图像质量的评价方法简单、高效,对压缩和AI处理的过程进行解耦,实现分阶段评价,AI处理的精度和压缩算法无关,针对新的压缩算法只需要进行失真度和码率的重新测试即可,不需要进行端到端的测试,简化了测试的过程,评测效率更高。
图10示出根据本申请一些示例的测评框架的示意图。如图10所示,测评的过程仍然可以分解为第一阶段和第二阶段两个阶段,相比于图9的示例划分阶段的方式不同。在图10的示例中,失真度可以定义在YUV域,同样可以采用PSNR,或者MSE,或者SSIM,或者P-loss,或者以上多个指标的加权结果作为失真度的指标。因此,第一阶段可以划分到采用压缩算法对YUV图像进行压缩得到已压缩的YUV图像,可以输出已压缩的YUV图像的失真度数据。图10的示例中示例(e)可以表示参照的测评过程,示例(f)可以表示在YUV域压缩图像的场景,其他过程与图9的示例相似,不再赘述。
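FIG. 11 is a schematic diagram of evaluation frameworks according to some examples of the present application. As shown in FIG. 11, the evaluation can still be split into a first stage and a second stage, but the stages are divided differently from the examples in FIG. 9 and FIG. 10. In the example of FIG. 11, distortion is defined in the RAW domain, and PSNR, MSE, SSIM, P-loss, or a weighted combination of several of these can again serve as the distortion index. The first stage therefore ends once the compression algorithm has compressed the RAW image into a compressed RAW image, at which point the distortion data of the compressed RAW image can be output. In FIG. 11, example (g) represents the reference evaluation process and example (h) represents compression in the RAW domain; the rest is similar to the example of FIG. 9 and is not repeated here.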
根据本申请的实施例可知,本申请的图像处理方式的确定方法可以应用于多种场景,通用性强,并且易于扩展对新的压缩算法或者AI模块的评测,可以提高评测的效率。
需要说明的是,本申请的实施例中的精度可以为对已处理的其他数据进行识别的精度,比如说,可以是对已处理的音频数据进行识别的精度,处理可以是指对音频数据的编码/解码等压缩处理。也就是说,本申请实施例中的图像还可以是其他数据,比如说音频数据、文本数据,等等,本申请可以提供一种数据处理方式的确定方法,相应的,第一参数可以为码率或失真度,处理的数据不同,对应的精度的指标可能不同,如果处理的数据为图像数据,精度指标可以如上文所述,如果处理的数据为语音数据,对语音识别的精度可以为词错率(Word Error Rate,WER)、字错率(Character Error Rate,CER),或者语音识别采用的精度指标还可以是以上指标中多个的结合,比如说,联合以上多个指标对语音识别的精度进行综合测评;如果处理的数据为点云数据,对点云识别的精度可以为目标识别准确率mAP、语义分割精度IoU以及多目标跟踪精度(Multi-Object Tracking Accuracy,MOTA)等。
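Word error rate is conventionally computed as the word-level edit distance between the reference transcript and the recognized transcript, divided by the number of reference words. A minimal sketch using the standard dynamic-programming edit distance; whitespace tokenization and the sample sentences are assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance over words, normalized by the reference length."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


print(word_error_rate("turn left at the next junction", "turn left at next junction"))
```

Character error rate follows the same pattern with characters instead of words as tokens. Note that for error-rate metrics lower is better, so the comparison against the uncompressed baseline runs in the opposite direction from mAP.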
本申请实施例提供的数据处理方式的确定方法可以包括根据第一参数和精度的第一关系,计算所述第一参数中每个数值对应的概率;确定所述第一参数和所述概率的第二关系;根据业务要求的概率阈值和所述第二关系,得到所述概率阈值对应的所述第一参数的阈值。其中,数据可以是图像、音频、文本等,通过概率指标表征接近机器无损的程度,为每一个第一参数计算一个概率值,并根据业务要求的概率阈值确定满足机器无损的码率阈值,能够合理的给出满足机器无损的第一参数的阈值。
本申请还提供了一种图像处理方式的确定装置,图12示出根据本申请一实施例的图像处理方式的确定装置的框图。如图12所示,所述装置可以包括:计算模块120,用于根据第一参数和精度的第一关系,计算所述第一参数中每个数值对应的概率,所述第一参数为码率或失真度;其中,所述精度为对已处理图像进行识别的精度,所述第一参数中每个数值对应的概率用于指示对所述第一参数中每个数值对应的所述已处理图像进行识别的精度与第一精度的接近程度,所述第一精度为对原图像进行识别的精度,所述原图像被处理后得到所述已处理图像;第一确定模块121,用于确定所述第一参数和所述概率的第二关系;第二确定模块122,用于根据业务要求的概率阈值和所述第二关系,得到所述概率阈值对应的所述第一参数的阈值。
本申请实施例的图像处理方式的确定装置,通过概率指标表征接近机器无损的程度,为每一个码率计算一个概率值,并根据业务要求的概率阈值确定满足机器无损的码率阈值,能够合理的给出满足机器无损的码率阈值。
在一种可能的实现方式中,所述计算模块120包括:第一计算单元,用于针对所述第一参数中每个数值,计算以所述每个数值为中心的滑动窗口内所述第一参数的所有数值对应的精度的精度均值和标准差;第二计算单元,用于根据所述精度均值、所述标准差、所述滑动窗口的长度、所述第一精度以及累积分布函数,计算所述每个数值对应的概率。
在一种可能的实现方式中,所述第二计算单元用于根据公式:
Figure PCTCN2021084377-appb-000016
计算每个数值对应的概率,其中,P m表示第一参数m对应的概率,μ表示所述第一精度,n表示滑动窗口的长度,
Figure PCTCN2021084377-appb-000017
表示以第一参数m为中心、长度为n的滑动窗口内的第一参数的数值对应的精度的精度均值,σ m表示以第一参数m为中心、长度为n的滑动窗口内的第一参数的数值对应的精度的标准差,
Figure PCTCN2021084377-appb-000018
表示自由度为n-1的T分布累积分布函数,n为大于1的正整数。
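Calculating the probability corresponding to the first parameter based on t-test (hypothesis testing) theory suits small-sample scenarios and makes the evaluation more accurate; working with small samples also improves evaluation efficiency.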
在一种可能的实现方式中,所述第一参数为码率,所述第一参数中每个数值对应的所述已处理图像为采用所述第一参数中每个数值对所述原图像进行压缩得到的图像。
在一种可能的实现方式中,所述第一参数为所述已处理图像的失真度,所述第一参数的阈值为失真阈值,所述装置还包括:第三确定模块,用于根据所述失真阈值和第三关系,确定所述失真阈值对应的码率阈值;其中,所述第三关系为码率和失真度之间的对应关系。
在上述实施例中,实现了压缩算法与AI处理的解耦,如果要评测新的压缩算法,不需要进行端到端评测,可以只对压缩处理的过程进行评测得到新的压缩算法对应的第三关系即可。同样的,如果要采用新的AI模块对图像进行识别,也可以采用已有的数据对AI识别的过程重新进行评测得到新的第一关系即可,不需要进行端到端的评测。本申请实施例提供的装置可以提高评测的效率。
在一种可能的实现方式中,所述原图像为拜耳原始Bayer RAW图像,所述已处理图像为RAW图像、或RGB图像、或YUV图像。
本申请的图像处理方式的确定装置可以应用于多种场景,通用性强,并且易于扩展对新的压缩算法或者AI模块的评测,可以提高评测的效率。
该图像处理方式的确定装置可以是具有处理功能的芯片或处理器中的程序模块,处理器可以是一个单核(single-CPU)处理器,也可以是一个多核(multi-CPU)处理器。芯片或处理器通过执行程序可以实现本申请上述实施例的方法。
本申请的实施例提供了一种电子设备,包括:处理器以及用于存储处理器可执行指令的存储器;其中,所述处理器被配置为执行所述指令时实现本申请上述实施例的方法。
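The above apparatus for determining an image processing mode, or the electronic device, may be a general-purpose device or a special-purpose device. In a specific implementation, the apparatus may be a desktop computer, a portable computer, a network server, a personal digital assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, an embedded device, or another device with processing capability. The embodiments of the present application do not limit the type of the apparatus.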
本申请的实施例提供了一种非易失性计算机可读存储介质,其上存储有计算机程序指令,所述计算机程序指令被处理器执行时实现上述方法。
本申请的实施例提供了一种计算机程序产品,包括计算机可读代码,或者承载有计算机可读代码的非易失性计算机可读存储介质,当所述计算机可读代码在电子设备的处理器中运行时,所述电子设备中的处理器执行上述方法。
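A computer-readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. It may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of these. More specific examples (a non-exhaustive list) include: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random-access memory (SRAM), portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or a raised structure in a groove on which instructions are stored, and any suitable combination of these.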

Claims (15)

  1. 一种图像处理方式的确定方法,其特征在于,所述方法包括:
    根据第一参数和精度的第一关系,计算所述第一参数中每个数值对应的概率,所述第一参数为码率或失真度;
    其中,所述精度为对已处理图像进行识别的精度,所述第一参数中每个数值对应的概率用于指示对所述第一参数中每个数值对应的所述已处理图像进行识别的精度与第一精度的接近程度,所述第一精度为对原图像进行识别的精度,所述原图像被处理后得到所述已处理图像;
    确定所述第一参数和所述概率的第二关系;
    根据业务要求的概率阈值和所述第二关系,得到所述概率阈值对应的所述第一参数的阈值。
  2. 根据权利要求1所述的方法,其特征在于,根据第一参数和精度的第一关系,计算所述第一参数中每个数值对应的概率,包括:
    针对所述第一参数中每个数值,计算以所述每个数值为中心的滑动窗口内所述第一参数的所有数值对应的精度的精度均值和标准差;
    根据所述精度均值、所述标准差、所述滑动窗口的长度、所述第一精度以及累积分布函数,计算所述每个数值对应的概率。
  3. 根据权利要求2所述的方法,其特征在于,根据所述精度均值、所述标准差、所述滑动窗口的长度、所述第一精度以及累积分布函数,计算所述每个数值对应的概率,包括:
    根据公式
    Figure PCTCN2021084377-appb-100001
    计算每个数值对应的概率,其中,P m表示第一参数m对应的概率,μ表示所述第一精度,n表示滑动窗口的长度,
    Figure PCTCN2021084377-appb-100002
    表示以第一参数m为中心、长度为n的滑动窗口内的第一参数的数值对应的精度的精度均值,σ m表示以第一参数m为中心、长度为n的滑动窗口内的第一参数的数值对应的精度的标准差,
    Figure PCTCN2021084377-appb-100003
    表示自由度为n-1的T分布累积分布函数,n为大于1的正整数。
  4. 根据权利要求1或2所述的方法,其特征在于,所述第一参数为码率,所述第一参数中每个数值对应的所述已处理图像为采用所述第一参数中每个数值对所述原图像进行压缩得到的图像。
  5. 根据权利要求1或2所述的方法,其特征在于,所述第一参数为所述已处理图像的失真度,所述第一参数的阈值为失真阈值,
    所述方法还包括:
    根据所述失真阈值和第三关系,确定所述失真阈值对应的码率阈值;
    其中,所述第三关系为码率和失真度之间的对应关系。
  6. 根据权利要求5所述的方法,其特征在于,
    所述原图像为拜耳原始Bayer RAW图像,所述已处理图像为RAW图像、或RGB图像、或YUV图像。
  7. 一种图像处理方式的确定装置,其特征在于,所述装置包括:
    计算模块,用于根据第一参数和精度的第一关系,计算所述第一参数中每个数值对应的概率,所述第一参数为码率或失真度;
    其中,所述精度为对已处理图像进行识别的精度,所述第一参数中每个数值对应的概率用于指示对所述第一参数中每个数值对应的所述已处理图像进行识别的精度与第一精度的接近程度,所述第一精度为对原图像进行识别的精度,所述原图像被处理后得到所述已处理图像;
    第一确定模块,用于确定所述第一参数和所述概率的第二关系;
    第二确定模块,用于根据业务要求的概率阈值和所述第二关系,得到所述概率阈值对应的所述第一参数的阈值。
  8. 根据权利要求7所述的装置,其特征在于,所述计算模块包括:
    第一计算单元,用于针对所述第一参数中每个数值,计算以所述每个数值为中心的滑动窗口内所述第一参数的所有数值对应的精度的精度均值和标准差;
    第二计算单元,用于根据所述精度均值、所述标准差、所述滑动窗口的长度、所述第一精度以及累积分布函数,计算所述每个数值对应的概率。
  9. 根据权利要求8所述的装置,其特征在于,所述第二计算单元用于根据公式
    Figure PCTCN2021084377-appb-100004
    计算每个数值对应的概率,其中,P m表示第一参数m对应的概率,μ表示所述第一精度,n表示滑动窗口的长度,
    Figure PCTCN2021084377-appb-100005
    表示以第一参数m为中心、长度为n的滑动窗口内的第一参数的数值对应的精度的精度均值,σ m表示以第一参数m为中心、长度为n的滑动窗口内的第一参数的数值对应的精度的标准差,
    Figure PCTCN2021084377-appb-100006
    表示自由度为n-1的T分布累积分布函数,n为大于1的正整数。
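  10. The apparatus according to claim 7 or 8, wherein the first parameter is a bit rate, and the processed image corresponding to each value of the first parameter is the image obtained by compressing the original image at that value of the first parameter.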
  11. 根据权利要求7或8所述的装置,其特征在于,所述第一参数为所述已处理图像的失真度,所述第一参数的阈值为失真阈值,
    所述装置还包括:
    第三确定模块,用于根据所述失真阈值和第三关系,确定所述失真阈值对应的码率阈值;
    其中,所述第三关系为码率和失真度之间的对应关系。
  12. 根据权利要求11所述的装置,其特征在于,
    所述原图像为拜耳原始Bayer RAW图像,所述已处理图像为RAW图像、或RGB图像、或YUV图像。
  13. 一种计算机程序产品,包括计算机可读代码,当所述计算机可读代码在电子设备中运行时,所述电子设备中的处理器执行计算机可读代码实现权利要求1-6中任意一项所述的方法。
  14. 一种非易失性计算机可读存储介质,其上存储有计算机程序指令,其特征在于,所述计算机程序指令被处理器执行时实现权利要求1-6中任意一项所述的方法。
  15. 一种电子设备,其特征在于,包括:
    处理器;
    用于存储处理器可执行指令的存储器;
    其中,所述处理器被配置为执行所述指令时实现权利要求1-6任意一项所述的方法。
PCT/CN2021/084377 2021-03-31 2021-03-31 图像处理方式的确定方法及装置 WO2022205060A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2021/084377 WO2022205060A1 (zh) 2021-03-31 2021-03-31 图像处理方式的确定方法及装置
CN202180001348.0A CN113228657B (zh) 2021-03-31 2021-03-31 图像处理方式的确定方法及装置

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/084377 WO2022205060A1 (zh) 2021-03-31 2021-03-31 图像处理方式的确定方法及装置

Publications (1)

Publication Number Publication Date
WO2022205060A1 true WO2022205060A1 (zh) 2022-10-06

Family

ID=77081353

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/084377 WO2022205060A1 (zh) 2021-03-31 2021-03-31 图像处理方式的确定方法及装置

Country Status (2)

Country Link
CN (1) CN113228657B (zh)
WO (1) WO2022205060A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116866211A (zh) * 2023-06-26 2023-10-10 中国信息通信研究院 一种改进的深度合成检测方法和系统

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080175503A1 (en) * 2006-12-21 2008-07-24 Rohde & Schwarz Gmbh & Co. Kg Method and device for estimating image quality of compressed images and/or video sequences
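CN108769685A (zh) * 2018-06-05 2018-11-06 Tencent Technology (Shenzhen) Co., Ltd. Method, apparatus and storage medium for detecting image compression coding efficiency
CN111901594A (zh) * 2020-06-29 2020-11-06 Peking University Image coding method for visual analysis tasks, electronic device and medium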

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
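CN108271024B (zh) * 2013-12-28 2021-10-26 Tongji University Image encoding and decoding methods and apparatuses
CN110418175B (zh) * 2018-04-28 2021-10-26 Huawei Technologies Co., Ltd. Method for dynamically adjusting video transmission parameters in V2X and related products
CN111332289B (zh) * 2020-03-23 2021-05-07 Tencent Technology (Shenzhen) Co., Ltd. Vehicle operating environment data acquisition method, apparatus and storage medium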

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
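CN116866211A (zh) * 2023-06-26 2023-10-10 China Academy of Information and Communications Technology Improved deep synthesis detection method and system
CN116866211B (zh) * 2023-06-26 2024-02-23 China Academy of Information and Communications Technology Improved deep synthesis detection method and system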

Also Published As

Publication number Publication date
CN113228657A (zh) 2021-08-06
CN113228657B (zh) 2022-08-09

Similar Documents

Publication Publication Date Title
WO2022205058A1 (zh) Method and apparatus for determining an image processing mode
US20210337217A1 (en) Video analytics encoding for improved efficiency of video processing and compression
US11893761B2 (en) Image processing apparatus and method
KR20160032137A (ko) Feature-based image set compression
WO2022205060A1 (zh) Method and apparatus for determining an image processing mode
CN113507611B (zh) Image storage method and apparatus, computer device and storage medium
Chan et al. Influence of AVC and HEVC compression on detection of vehicles through Faster R-CNN
CN114139703A (zh) Knowledge distillation method and apparatus, storage medium and electronic device
WO2022067775A1 (zh) Point cloud encoding and decoding methods, encoder, decoder, and codec system
US10904542B2 (en) Image transcoding method and apparatus
US11164328B2 (en) Object region detection method, object region detection apparatus, and non-transitory computer-readable medium thereof
US20240070924A1 (en) Compression of temporal data by using geometry-based point cloud compression
CN114564310A (zh) Data processing method and apparatus, electronic device and readable storage medium
US20130272583A1 (en) Methods and apparatuses for facilitating face image analysis
EP4216553A1 (en) Point cloud decoding and encoding method, and decoder, encoder and encoding and decoding system
CN114648712A (zh) Video classification method and apparatus, electronic device and computer-readable storage medium
US10706492B2 (en) Image compression/decompression in a computer vision system
WO2020107376A1 (zh) Image processing method, device and storage medium
CN116996749B (zh) Remote target object tracking system and method under multiple monitoring screens
WO2023203687A1 (en) Accuracy predicting system, accuracy predicting method, apparatus, and non-transitory computer-readable storage medium
US20240013426A1 (en) Image processing system and method for image processing
WO2022257528A1 (zh) Point cloud attribute prediction method and apparatus, and related device
CN117146739B (zh) Angle measurement verification method and system for optical sights
WO2022213843A1 (zh) Image processing method, training method, and apparatus
US20230276059A1 (en) On-vehicle device, management system, and upload method

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21933737; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 21933737; Country of ref document: EP; Kind code of ref document: A1)