CN116128827A - Intelligent evaluation method, device, equipment and computer readable storage medium


Info

Publication number
CN116128827A
CN116128827A (application CN202211732768.9A)
Authority
CN
China
Prior art keywords
dynamic range
scoring
model
high dynamic
interest
Prior art date
Legal status
Pending
Application number
CN202211732768.9A
Other languages
Chinese (zh)
Inventor
陈晨
Current Assignee
Spreadtrum Communications Shanghai Co Ltd
Original Assignee
Spreadtrum Communications Shanghai Co Ltd
Priority date
Filing date
Publication date
Application filed by Spreadtrum Communications Shanghai Co Ltd filed Critical Spreadtrum Communications Shanghai Co Ltd
Priority claimed from application CN202211732768.9A; published as CN116128827A.
Legal status: Pending.


Classifications

    • G06T7/0002 Inspection of images, e.g. flaw detection (G06T7/00 Image analysis)
    • G06V10/774 Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting (G06V10/77 Processing image or video features in feature spaces)
    • G06V10/82 Arrangements for image or video recognition or understanding using neural networks
    • G06T2207/30168 Image quality inspection (G06T2207/30 Subject of image; context of image processing)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a high dynamic range image evaluation method based on a deep neural network, which comprises the following steps: performing scene recognition and region-of-interest division in the high dynamic range image through a deep convolutional neural network; forming a training data set from the high dynamic range images after scene recognition and region-of-interest division; constructing a data analysis model and a scoring model that share the training data set; completing the training of the data analysis model on the basis of an efficient convolutional neural network designed for mobile vision applications; and having the scoring model complete the scoring of the high dynamic range image according to the scene recognition information and the region of interest output by the data analysis model. Collected verification set images are then sent in turn into the trained analysis model and scoring model to verify that the region-of-interest division and the final score meet expectations, thereby improving the quality of no-reference HDR image evaluation.

Description

Intelligent evaluation method, device, equipment and computer readable storage medium
Technical Field
The present invention relates to the field of image quality evaluation technologies, and in particular, to a method, an apparatus, a device, and a computer readable storage medium for evaluating a no-reference high dynamic range image based on a deep convolutional neural network.
Background
The development of high dynamic range (High Dynamic Range, HDR) imaging technology has changed the traditional way images are displayed and can bring people a more realistic visual experience. However, degradation is inevitably introduced during the acquisition, compression, storage and transmission of an image. Image quality directly reflects the user's quality of experience; reducing or even completely eliminating such degradation is the common wish of image consumers, and research on high dynamic range image quality evaluation can effectively help solve some of these degradation problems.
Image quality evaluation methods can be divided into subjective and objective quality evaluation methods. Subjective quality evaluation is time-consuming, costly and difficult to operate, so an appropriate objective quality evaluation model needs to be established to predict image quality. Objective quality evaluation methods for high dynamic range images fall into two classes: methods based on low dynamic range (Low Dynamic Range, LDR) image quality evaluation, and methods designed specifically for high dynamic range images. Conventional low dynamic range image quality evaluation methods, such as MSE, PSNR, SSIM, MS-SSIM, VIF and VSNR, cannot be used directly for high dynamic range images, because they are designed on the assumption that the pixel values of the image and the pixel values perceived by the human eye satisfy a linear relationship, which does not hold for high dynamic range images. Methods based on low dynamic range image quality evaluation therefore first apply a log operation or PU encoding preprocessing to the image, so that its pixel values approximately satisfy a linear relationship with the pixel values perceived by the human eye, and then apply a low dynamic range quality evaluation method.
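For illustration only (this sketch is not part of the patent), the log-preprocessing idea can be expressed as follows: compress the HDR luminance with a log transform so that pixel values approximately track perceived brightness, then reuse an LDR metric such as PSNR. The function names and the normalization scheme are assumptions.

```python
# Illustrative sketch, not the patent's method: log-preprocess HDR luminance,
# then apply an LDR metric (PSNR here) to the preprocessed images.
import numpy as np

def log_preprocess(hdr: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Map absolute HDR luminance onto an approximately perceptually linear [0, 1] scale."""
    log_lum = np.log10(np.maximum(hdr, eps))
    lo, hi = log_lum.min(), log_lum.max()
    return (log_lum - lo) / (hi - lo + eps)

def psnr(ref: np.ndarray, dist: np.ndarray, peak: float = 1.0) -> float:
    mse = float(np.mean((ref - dist) ** 2))
    return 10.0 * np.log10(peak ** 2 / mse)

# Usage (hypothetical loader): score a distorted HDR image against a reference.
# ref, dist = load_hdr("ref.exr"), load_hdr("dist.exr")
# quality = psnr(log_preprocess(ref), log_preprocess(dist))
```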
Currently, only a few quality evaluation methods are designed specifically for high dynamic range images, typically HDR-VDP-2 proposed by Mantiuk et al. (together with its weight-optimized variants) and HDR-VQM, designed for high dynamic range video quality evaluation. These methods simulate well how the human eye perceives the high brightness range of high dynamic range images and are widely applied. However, they are all full-reference high dynamic range image quality evaluation methods, requiring both a reference image and a distorted image; in practical applications, the reference image is often unavailable or does not exist. Research on no-reference objective quality evaluation methods for high dynamic range images is still relatively lacking, so accurately evaluating the quality of a no-reference high dynamic range image is an urgent problem to be solved.
An excellent objective quality evaluation method for high dynamic range images should reflect the visual perception characteristics of the human eye. The image distortion perceived by the human eye is the combined result of chromatic distortion and luminance distortion, yet the full-reference objective quality evaluation methods designed for high dynamic range images consider only luminance distortion and ignore chromatic distortion, which is inconsistent with human visual perception, especially for bright high dynamic range images.
The invention relates to an image quality evaluation method, and aims to solve the quality evaluation problem of no-reference high dynamic range (HDR) images;
with the rapid development of mobile phone imaging technology, people's requirements on image quality, and on the dynamic range of images in particular, keep rising. Image quality evaluation methods have accordingly grown more varied, but how to evaluate images in a way consistent with human vision remains a difficult point;
the HDR image can simultaneously display a highlight region and a low-highlight region in the same image, so that the exposure display range is wider, and the exposure display range is more in line with human visual characteristics. HDR can well solve the display problem, so that the image details are more abundant. Therefore, to ensure that the system is able to provide a good visual experience, it is critical to evaluate the quality of an HDR image.
Although subjective evaluation by the human eye is the final standard of image quality, evaluating a large amount of image data this way is excessively time-consuming and tedious, and there is no mainstream method for rapid evaluation;
according to the degree of dependence on the reference image, objective quality evaluation methods can be divided into full-reference, reduced-reference and no-reference image quality evaluation methods. With continued in-depth research, full-reference quality evaluation methods have become increasingly accurate, but they require a large number of undistorted reference images, which is difficult to satisfy when quickly evaluating the quality of a set of images. A no-reference image quality evaluation method, by contrast, requires no information from an undistorted reference image and can evaluate quality from the distorted image alone; it has therefore become a research hotspot in the fields of machine vision and image processing.
Disclosure of Invention
The invention aims to provide a high dynamic range image evaluation method, device and equipment based on a deep neural network, and a computer readable storage medium, to improve the quality and efficiency of HDR evaluation.
In a first aspect, an embodiment of the present invention provides a high dynamic range image evaluation method based on a deep neural network, the method comprising: performing scene recognition and region-of-interest division in the high dynamic range image through a deep convolutional neural network; forming a training data set from the high dynamic range images after scene recognition and region-of-interest division; constructing a data analysis model and a scoring model that share the training data set; completing the training of the data analysis model on the basis of an efficient convolutional neural network designed for mobile vision applications; the data analysis model analyzing the scene recognition information and extracting the region of interest from the input data; the scoring model completing the scoring of the high dynamic range image according to the scene recognition information and the region of interest output by the data analysis model; and sending collected verification set images in turn into the trained analysis model and scoring model, to check whether the region-of-interest division and the final score meet expectations.
The evaluation method provided by the embodiment of the invention has the following beneficial effects: the invention aims to eliminate the influence of the image's color, composition and other factors on automatic dynamic range evaluation. Compared with manual scoring of the dynamic range of HDR images, completing image data analysis and effect scoring with deep learning can effectively improve the efficiency of image quality evaluation and reduce its cost; at the same time, building a standard scoring model with deep learning can effectively avoid the subjective impressions and effect biases that easily arise during manual scoring, greatly improving the quality and objectivity of the evaluation.
By contrast, a traditional deep learning approach that evaluates the dynamic range of an HDR image with a single model is easily influenced by other image features such as color, distortion and composition; this introduces noise into the dynamic range evaluation and ultimately degrades its result. Moreover, the human eye applies very different dynamic range evaluation standards to different scenes, so evaluations differ greatly across scenes.
Compared with the above schemes, the method classifies the collected data features by scene and uses a front-end data processing model to complete data feature extraction and scene information acquisition; the scene information and feature information output by this front-end feature acquisition module serve as the input of the scoring model, which effectively avoids the influence of color, composition and other noise information on dynamic range evaluation. Meanwhile, because the scoring model distinguishes scene information, it is insensitive to the scene and the evaluation is more objective.
In a further embodiment, completing the training of the data analysis model on the basis of an efficient convolutional neural network for mobile vision applications includes: the data analysis model introduces an attention module that captures the links between global and local information features of the high dynamic range images in the training data set, and uses hard_sigmoid in place of the ReLU function as the activation function.
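For reference, hard_sigmoid here is the piecewise-linear approximation used by MobileNetV3, ReLU6(x + 3) / 6; this definition is general background rather than a quotation from the patent. A minimal sketch:

```python
# hard_sigmoid as used in MobileNetV3: h_sigmoid(x) = ReLU6(x + 3) / 6,
# a cheap piecewise-linear stand-in for the sigmoid function.
import torch
import torch.nn.functional as F

def hard_sigmoid(x: torch.Tensor) -> torch.Tensor:
    return F.relu6(x + 3.0) / 6.0
```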
In a further embodiment, the attention module is composed of a first convolution layer, a second convolution layer, a pooling layer, a normalization layer, a hard_sigmoid activation function, and a third convolution layer.
In some embodiments, the scoring model is consistent with the data analysis model.
In some other embodiments, the evaluation method provided includes: constructing an initial training set, the initial training set comprising high dynamic range images of different exposure levels acquired by different intelligent mobile terminals; and performing, on the initial training set, non-reproducible (manual) scoring and region-of-interest labeling of the high dynamic range images respectively.
In still other embodiments, the evaluation method provided includes: the data analysis model is constructed using the MobileNetV3 model and trained on the initial training set.
In some alternative embodiments of the evaluation method, the scoring model is constructed using the MobileNetV3 model and trained on the initial training set.
In a possible implementation manner, the present invention provides a high dynamic range image evaluation device based on a deep neural network, including:
the training data set comprises the high dynamic range images with different exposure levels collected by different intelligent mobile terminals; the data analysis module is used for identifying scenes in the high dynamic range image and dividing the regions of interest;
the scoring module is used for scoring the high dynamic range image according to the scene identification information and the interest area output by the data analysis model;
the data set is used for sharing the comprehensive data verification module by the data analysis model and the scoring model, collecting verification set images, and sequentially sending the verification set images into the trained analysis model and scoring model to verify the division of the region of interest and the final scoring and the expected conformity.
In a possible implementation manner, the invention provides an intelligent evaluation apparatus, which is characterized in that the intelligent evaluation apparatus comprises a processor, a memory and a computer program stored in the memory, the processor executing the computer program to implement the evaluation method according to any one of claims 1 to 7.
In a fourth aspect, embodiments of the present invention also provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the evaluation method according to any one of claims 1-7.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of an HDR analysis and scoring model according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a Squeeze-and-Excitation model provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of an evaluation device according to an embodiment of the present invention;
FIG. 4 is a schematic view of a daytime region of interest division according to an embodiment of the present invention;
fig. 5 is a schematic view of a night region of interest division according to an embodiment of the present invention;
fig. 6 is a schematic diagram of an electronic device according to an embodiment of the present invention;
fig. 7 is a schematic diagram of a hardware configuration block diagram of an evaluation apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below with reference to the accompanying drawings. In describing embodiments of the present invention, the terminology used below is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in the specification of this application and the appended claims, the singular forms "a," "an," and "the" are intended to include forms such as "one or more," unless the context clearly indicates otherwise. It should also be understood that in the various embodiments herein, "at least one" and "one or more" mean one, two, or more than two. The term "and/or" describes an association relationship between associated objects and covers three cases; for example, A and/or B may represent: A alone, A and B together, or B alone, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the objects around it.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise. The term "coupled" includes both direct and indirect connections, unless stated otherwise. The terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated.
In embodiments of the invention, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "for example" is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
The evaluation method provided in the embodiment of the invention, as shown in fig. 1, is a high dynamic range image evaluation method based on a deep neural network, comprising the following steps: performing scene recognition and region-of-interest division in the high dynamic range image through a deep convolutional neural network; forming a training data set from the high dynamic range images after scene recognition and region-of-interest division; constructing a data analysis model and a scoring model that share the training data set; completing the training of the data analysis model on the basis of an efficient convolutional neural network designed for mobile vision applications; the data analysis model analyzing the scene recognition information and extracting the region of interest from the input data; and the scoring model completing the scoring of the high dynamic range image according to the scene recognition information and the region of interest output by the data analysis model.
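To make that data flow concrete, the following is a minimal sketch of the two-model pipeline, in which the analysis model emits scene logits and ROI boxes that the scoring model consumes. It is an illustration under assumed interfaces (the class names, output sizes and box-regression head are all assumptions), not the patent's implementation.

```python
# Illustrative two-model pipeline (assumed interfaces, not the patent's code):
# an analysis model extracts scene info + ROIs; a scoring model consumes them.
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v3_small

class HdrAnalysisModel(nn.Module):
    """Predicts a scene class (e.g. day/night) and highlight/low-light ROI boxes."""
    def __init__(self, num_scenes: int = 2, num_rois: int = 2):
        super().__init__()
        self.num_scenes = num_scenes
        self.backbone = mobilenet_v3_small(num_classes=num_scenes + 4 * num_rois)

    def forward(self, image: torch.Tensor):
        out = self.backbone(image)
        scene_logits = out[:, : self.num_scenes]
        rois = out[:, self.num_scenes :].reshape(image.shape[0], -1, 4)  # (x, y, w, h)
        return scene_logits, rois

class HdrScoringModel(nn.Module):
    """Scores HDR quality from an ROI crop plus the scene information."""
    def __init__(self, num_scenes: int = 2, feat_dim: int = 16):
        super().__init__()
        self.backbone = mobilenet_v3_small(num_classes=feat_dim)
        self.head = nn.Linear(feat_dim + num_scenes, 1)

    def forward(self, roi_crop: torch.Tensor, scene_onehot: torch.Tensor):
        feat = self.backbone(roi_crop)
        return self.head(torch.cat([feat, scene_onehot], dim=1))  # scalar score
```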
In an alternative implementation of the evaluation method, since HDR image evaluation focuses on the dynamic range of the image, training on the whole image easily introduces the influence of other factors such as color, distortion and composition into the dynamic range evaluation; using the scene information and ROI area generated by the scene recognition and region-of-interest division model as inputs effectively avoids the influence of irrelevant image elements outside the ROI area on the dynamic range evaluation.
Manual subjective evaluation of distorted image quality is inefficient and costly, and makes it difficult to evaluate image quality accurately, in real time and efficiently, so researching objective image quality evaluation algorithms is an inevitable trend. With the development of artificial intelligence, it has become possible to simulate the perception process of the human visual system with a computer.
The greatest advantage of a deep neural network is that the image feature extraction and regression processes are integrated in one optimization framework, truly realizing end-to-end learning. Because humans apply different evaluation standards to high dynamic range images of different scenes, and their points and regions of attention also differ across such images, MobileNetV3 is used to complete scene recognition and ROI division of the image, and the MobileNetV3 model is used iteratively to train the image scoring model.
The region-of-interest training model is shown in FIG. 2. First, a 1×1 convolution layer is used for dimension expansion; after the convolution, batch normalization (Batch Normalize, BN) and a hard_sigmoid activation function are applied. The second layer is a 3×3 depthwise (DW) convolution, again followed by a BN layer and a hard_sigmoid activation function. The last layer is a 1×1 convolution layer, which performs dimension reduction. Finally, the neural network outputs the highlight and low-light ROI areas and the scene type; scene types are divided into two main classes, day and night, and different scene information is distinguished by pattern recognition, such as sky, buildings, green plants, people, etc.
As shown in fig. 2: 100 - training set image; 200 - 1×1 convolution layer; 300 - 3×3 depthwise (DW) convolution layer; 400 - pooling layer; 500 - batch normalization (Batch Normalize, BN); 600 - hard_sigmoid activation function; 700 - 1×1 convolution layer, which performs dimension reduction; 800 - Squeeze-and-Excitation (SE) module.
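The convolutional path of that figure can be sketched as below (channel sizes are assumptions; the pooling and SE components of the figure are sketched separately after the SE steps later in the text):

```python
# Illustrative sketch of the Fig. 2 convolutional path (assumed channel sizes):
# 1x1 expand -> BN -> hard_sigmoid -> 3x3 depthwise -> BN -> hard_sigmoid
# -> 1x1 reduce, i.e. the MobileNetV3-style block described above.
import torch.nn as nn

def roi_block(in_ch: int, exp_ch: int, out_ch: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(in_ch, exp_ch, kernel_size=1, bias=False),   # 1x1 dimension expansion
        nn.BatchNorm2d(exp_ch),
        nn.Hardsigmoid(),
        nn.Conv2d(exp_ch, exp_ch, kernel_size=3, padding=1,
                  groups=exp_ch, bias=False),                  # 3x3 depthwise (DW) conv
        nn.BatchNorm2d(exp_ch),
        nn.Hardsigmoid(),
        nn.Conv2d(exp_ch, out_ch, kernel_size=1, bias=False),  # 1x1 dimension reduction
        nn.BatchNorm2d(out_ch),
    )
```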
As shown in fig. 4, the R box marks a highlight region recognized by the model, and the B box marks a dark region recognized by the model. After scene recognition is completed, scoring of the HDR effect of the corresponding scene and ROIs is completed manually. This completes the data preparation for the HDR image automatic scoring function.
As shown in fig. 5, the B box marks the dark area identified by the model, and each R box marks a highlight area identified by the model. Again, after scene recognition is completed, scoring of the HDR effect of the corresponding scene and ROIs is completed manually, completing the data preparation for the HDR image automatic scoring function.
The training set images are acquired as follows: shooting equipment of different grades is used to collect images at different exposure levels, so as to ensure the diversity of environments in the data set.
Manual labeling: 10 persons are selected to divide and score the regions of interest. To obtain a more comprehensive and objective evaluation, the panel comprises: 3 image review professionals, 3 photographers, 2 persons working on image algorithms, and 2 recruits from the general public. Manual scoring adopts a trimmed-average method: the highest and the lowest scores are removed, and the remaining scores are averaged.
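A minimal sketch of that trimmed average (illustrative only; the function name and the example scores are assumptions):

```python
# Trimmed average: drop the single highest and single lowest score,
# then average what remains.
def trimmed_mean(scores: list[float]) -> float:
    if len(scores) < 3:
        raise ValueError("need at least 3 scores to trim the high and low ends")
    ordered = sorted(scores)
    inner = ordered[1:-1]  # remove one lowest and one highest score
    return sum(inner) / len(inner)

# Example with ten raters, as in the text above.
print(trimmed_mean([7, 8, 8, 9, 6, 7, 8, 9, 10, 5]))  # averages the middle 8 scores
```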
The scoring model structure also uses MobileNetV3, as shown in FIG. 2, and can share the training set with the data analysis model;
after model training is completed, collecting verification set images, and sequentially sending the verification set images into a trained analysis model and a scoring model to check whether the division and final scoring of the region of interest meet the expectations;
in some alternative embodiments, a deep convolutional neural (Deep convo l ut i ona l neura lnetworks, DCNN) network is used for scene recognition and region of interest segmentation. Because human eyes perceive different dynamic ranges of different images in different scenes, the DCNN network is used for identifying scene information and interested (regi on of I nterest, RO I) areas, so that the input of an evaluation model is more specific, and the influence of other colors, compositions and the like on the images on the automatic evaluation of the dynamic ranges is eliminated.
In other embodiments, the initial training data set serves both purposes: highlight and low-light areas are divided and marked on the images by manual labeling to form the region-of-interest division data set, and the HDR images are manually scored to complete the scoring data set; the two may share one image data source. It is worth mentioning that the manual labeling is unique and not reproducible.
In some other embodiments, the data analysis model and the scoring model are independent of each other but complement each other. The data analysis model completes the analysis of scene information and the extraction of the ROI from the input data; the scoring model completes the scoring of the image according to the ROI and scene information output by the data analysis model. The two models can share a training data set and be trained at the same time without mutual interference.
In an alternative embodiment, the training of the data analysis and scoring models is accomplished using MobileNetV3 as the basis. MobileNetV3 incorporates an SE (Squeeze-and-Excitation) module, as shown in FIG. 2, and replaces the ReLU function with hard_sigmoid as the activation function. The SE module is similar to an attention module and can flexibly capture the relationship between global and local information. Its aim is to have the model obtain the target area that needs focused attention, place more weight on that area, highlight significantly useful features, and suppress or ignore irrelevant features. The SE module mainly comprises the following steps:
global tie pooling, compressing the corresponding spatial information on each channel of the most data obtained by the first layer convolution to a constant of the corresponding channel, wherein one pixel represents one channel at the moment and becomes a vector; inputting the vector obtained in the last step into two full-connection layers, and activating a function through hard_sIgmoid to obtain a weight value; operating the characteristics according to the weight table obtained in the last step, and finishing the characteristic key degree output;
in an alternative embodiment of the present invention, as shown in fig. 3, a high dynamic range image evaluation apparatus 1000 based on a deep neural network includes:
a training data set 100 containing the high dynamic range images of different exposure levels acquired by different intelligent mobile terminals; a data analysis module 120 for identifying scenes and dividing regions of interest in the high dynamic range image;
a scoring module 130, configured to complete the scoring of the high dynamic range image according to the scene recognition information and the region of interest output by the data analysis model;
the data set is shared by the data analysis model and the scoring model;
the comprehensive data verification module 150, which collects verification set images and sends them in turn into the trained analysis model and scoring model, to verify that the region-of-interest division and the final score meet expectations.
FIG. 6 is a schematic diagram of an example of an intelligent evaluation device in accordance with an embodiment of the present application. As shown in fig. 6, the electronic device 900 of this embodiment includes: a processor 910, a memory 920 and a computer program 930 stored in the memory 920 and executable on the processor 910. The processor 910, when executing the computer program 930, implements the steps in the foregoing evaluation method embodiments.
In some embodiments, FIG. 7 shows a hardware configuration block diagram of the intelligent evaluation device 30. The intelligent evaluation device 30 includes at least one of a modem 310, a mobile communication module 320, a wireless communication module 330, a collector 340, an external device interface 350, a controller 360, a display 370, an audio output interface 380, a memory, a power supply, and a user interface.
In still other embodiments, the modem 310 senses electromagnetic waves through an antenna, converts the sensed electromagnetic waves into electrical signals, processes and transforms the electrical signals into sound, receives broadcast signals, for example, by wireless reception, and demodulates audio signals from the broadcast signals.
The mobile communication module 320 may provide a solution including 2G/3G/4G/5G wireless communication applied to the intelligent evaluation device 30. The mobile communication module 320 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA), etc. The mobile communication module 320 may receive electromagnetic waves from an antenna, filter and amplify the received electromagnetic waves, and transmit them to the modem 310 for demodulation. The mobile communication module 320 may also amplify the signal modulated by the modem 310 and convert it into electromagnetic waves for radiation through the antenna. In some embodiments, at least some of the functional modules of the mobile communication module 320 may be provided in the controller 360. In some embodiments, at least some of the functional modules of the mobile communication module 320 may be provided in the same device as at least some of the modules of the controller 360.
The wireless communication module 330 may provide solutions for wireless communication including wireless local area network (wireless local area networks, WLAN) (e.g., wireless fidelity (wireless fidelity, Wi-Fi) network), Bluetooth (BT), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field communication technology (near field communication, NFC), infrared technology (IR), etc., applied to the intelligent evaluation device 30. The wireless communication module 330 may be one or more devices that integrate at least one communication processing module. The wireless communication module 330 receives electromagnetic waves via an antenna, modulates and filters the electromagnetic wave signals, and transmits the processed signals to the controller 360. The wireless communication module 330 may also receive a signal to be transmitted from the controller 360, frequency modulate and amplify it, and convert it to electromagnetic waves for radiation via the antenna.
In other embodiments, the collector 340 is configured to collect signals of the external environment or interaction with the outside. For example, the collector 340 includes a light receiver, a sensor for collecting the intensity of ambient light; alternatively, the collector 340 includes an image collector, such as a camera, which may be used to collect external environmental scenes, attributes of a user, or user interaction gestures, or alternatively, the collector 340 includes a sound collector, such as a microphone, for receiving external sounds.
In still other embodiments, the external device interface 350 may include, but is not limited to, the following: high Definition Multimedia Interface (HDMI), analog or data high definition component input interface (component), composite video input interface (CVBS), USB input interface (USB), RGB port, or the like. The input/output interface may be a composite input/output interface formed by a plurality of interfaces.
In other embodiments, the controller 360 and the modem 310 may be located in separate devices, i.e., the modem 310 may also be located in an external device to the host device in which the controller 360 is located, such as an external set-top box or the like.
In still other embodiments, the controller 360 controls the operation of the display device and responds to user operations through various software control programs stored in the memory. The controller 360 controls the overall operation of the intelligent evaluation device 30. For example: in response to receiving a user command to select a UI object displayed on the display 370, the controller 360 may perform an operation related to the object selected by the user command.
In some possible embodiments, the controller 360 includes at least one of a central processing unit (central processing unit, CPU), a video processor, an audio processor, a graphics processor (graphics processing unit, GPU), RAM, ROM, first to nth interfaces for input/output, a communication Bus (Bus), and the like.
And the central processing unit is used for executing the operating system and application program instructions stored in the memory and executing various application programs, data and contents according to various interaction instructions received from the outside so as to finally display and play various audio and video contents. The central processor may include a plurality of processors. Such as one main processor and one or more sub-processors.
In some embodiments, a graphics processor is used to generate various graphical objects, such as: at least one of icons, operation menus, and user input instruction display graphics. The graphic processor comprises an arithmetic unit, which is used for receiving various interactive instructions input by a user to operate and displaying various objects according to display attributes; the device also comprises a renderer for rendering various objects obtained based on the arithmetic unit, wherein the rendered objects are used for being displayed on a display.
In some embodiments, the video processor is configured to receive an external video signal and perform at least one of decompression, decoding, scaling, noise reduction, frame rate conversion, resolution conversion, image synthesis and other video processing according to the standard codec of the input signal, so as to obtain a signal displayed or played directly on the intelligent evaluation device 30.
In some embodiments, the video processor includes at least one of a demultiplexing module, a video decoding module, an image compositing module, a frame rate conversion module, a display formatting module, and the like. The demultiplexing module is used for demultiplexing the input audio and video data stream. And the video decoding module is used for processing the demultiplexed video signal, including decoding, scaling and the like. And the image synthesis module, such as an image synthesizer, is used for carrying out superposition mixing processing on the graphic user interface signals generated by the graphic generator according to user input or the graphic user interface signals generated by the graphic generator and the video images after the scaling processing so as to generate image signals for display. And the frame rate conversion module is used for converting the frame rate of the input video. And the display formatting module is used for converting the received frame rate into a video output signal and changing the video output signal to be in accordance with a display format, such as outputting RGB data signals.
In some embodiments, the audio processor is configured to receive an external audio signal, decompress and decode according to a standard codec protocol of an input signal, and at least one of noise reduction, digital-to-analog conversion, and amplification, to obtain a sound signal that can be played in the speaker.
In some embodiments, a user may input user commands through a graphical user interface displayed on the display 370, and the user input interface receives the user input commands through the graphical user interface. Alternatively, the user may input the user command by inputting a specific sound or gesture, and the user input interface recognizes the sound or gesture through the sensor to receive the user input command.
In some embodiments, a "user interface" is a media interface for interaction and exchange of information between an application or operating system and a user that enables conversion between an internal form of information and a form acceptable to the user. A graphical user interface refers to a user interface that is graphically displayed in connection with the operation of a computer. It may be an interface element such as an icon, window, control, etc. displayed in a display screen of the electronic device, where the control may include at least one of a visual interface element such as an icon, button, menu, tab, text box, dialog box, status bar, navigation bar, etc.
In some embodiments, the display 370 includes a display screen component for presenting pictures, and a driving component for driving image display, components for receiving image signals from the controller output, displaying video content, image content, and menu manipulation interfaces, and user manipulation interfaces, etc.
In other embodiments, the display 370 may be at least one of a liquid crystal display, an organic light emitting diode (organic light emitting diode, OLED) display, and a projection display, and may also be a projection device and a projection screen.
In still other embodiments, audio output interface 380 includes speakers, external audio output electronics, and the like.
In some embodiments, the user interface is an interface (e.g., a physical key on the display device body, or the like) that may be used to receive control inputs.
In a specific implementation, the intelligent evaluation device 30 may be a mobile phone, a tablet computer, a handheld computer, a personal computer (personal computer, PC), a cellular phone, a personal digital assistant (personal digital assistant, PDA), a wearable device (such as a smart watch), a smart home device (such as a television), a vehicle-mounted computer, a game console, or an electronic product including a camera such as an augmented reality (augmented reality, AR) / virtual reality (virtual reality, VR) device; the specific device configuration of the intelligent evaluation device 30 is not limited in this embodiment.
From the foregoing description of the embodiments, it will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of functional modules is illustrated, and in practical application, the above-described functional allocation may be implemented by different functional modules according to needs, i.e. the internal structure of the apparatus is divided into different functional modules to implement all or part of the functions described above. The specific working processes of the above-described systems, devices and units may refer to the corresponding processes in the foregoing method embodiments, which are not described herein.
The functional units in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the embodiments of the present invention may be essentially or a part contributing to the prior art or all or part of the technical solution may be embodied in the form of a software product stored in a storage medium, including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to perform all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: flash memory, removable hard disk, read-only memory, random access memory, magnetic or optical disk, and the like.
The foregoing is merely a specific implementation of the embodiment of the present invention, but the protection scope of the embodiment of the present invention is not limited to this, and any changes or substitutions within the technical scope disclosed in the embodiment of the present invention should be covered in the protection scope of the embodiment of the present invention. Therefore, the protection scope of the embodiments of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A high dynamic range image evaluation method based on a deep neural network, characterized in that the evaluation method comprises:
scene recognition and region-of-interest division in the high dynamic range image are performed through a deep convolutional neural network;
forming a training data set by the high dynamic range image after the scene recognition and the region of interest division;
sharing the training data set by constructing a data analysis model and a scoring model;
the training of the data analysis model is completed by taking a high-efficiency convolutional neural network applied by mobile vision as a basis;
the data analysis model analyzes the scene identification information and extracts the interest region according to the input data;
the scoring model finishes scoring of the high dynamic range image according to the scene identification information and the interest area output by the data analysis model;
and sequentially sending the collected verification set images into the trained analysis model and scoring model, and checking whether the region-of-interest division and the final score meet expectations.
2. The evaluation method of claim 1, wherein the training of the data analysis model is completed on the basis of an efficient convolutional neural network for mobile vision applications, comprising: the data analysis model introduces an attention module that captures the links between global and local information features of the high dynamic range images in the training data set, and uses hard_sigmoid to replace the ReLU function as the activation function.
3. The evaluation method according to claim 2, wherein the attention module is composed of a first convolution layer, a second convolution layer, a pooling layer, a normalization layer, a hard_sigmoid activation function, and a third convolution layer.
4. The method of evaluation of claim 2, wherein the scoring model is consistent with the data analysis model.
5. The evaluation method according to claim 1, comprising: constructing an initial training set, wherein the initial training set comprises high dynamic range images of different exposure levels acquired by different intelligent mobile terminals; and performing, on the initial training set, non-reproducible (manual) scoring and region-of-interest labeling of the high dynamic range images respectively.
6. The evaluation method according to claim 1, comprising: the data analysis model is constructed using the MobileNetV3 model and trained on the initial training set.
7. The evaluation method of claim 6, wherein the scoring model is constructed using the MobileNetV3 model and trained on the initial training set.
8. A deep neural network-based high dynamic range image evaluation device, comprising:
the training data set comprises the high dynamic range images with different exposure levels collected by different intelligent mobile terminals; the data analysis module is used for identifying scenes in the high dynamic range image and dividing the regions of interest;
the scoring module is used for scoring the high dynamic range image according to the scene identification information and the interest area output by the data analysis model;
the data set is shared by the data analysis model and the scoring model;
and the comprehensive data verification module, which is used for collecting verification set images and sending them in turn into the trained analysis model and scoring model, to verify that the region-of-interest division and the final score meet expectations.
9. An intelligent evaluation apparatus, comprising a processor, a memory, and a computer program stored in the memory, the processor executing the computer program to implement the evaluation method according to any one of claims 1 to 7.
10. A computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the evaluation method according to any one of claims 1-7.
CN202211732768.9A 2022-12-30 2022-12-30 Intelligent evaluation method, device, equipment and computer readable storage medium Pending CN116128827A (en)

Priority application: CN202211732768.9A (CN), priority and filing date 2022-12-30, titled "Intelligent evaluation method, device, equipment and computer readable storage medium".

Publication: CN116128827A (pending), published 2023-05-16.

Family ID: 86300343.


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination