WO2023130990A1 - Image processing method, apparatus, device, storage medium and program product - Google Patents

Image processing method, apparatus, device, storage medium and program product

Info

Publication number
WO2023130990A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
target
image processing
type
salient
Prior art date
Application number
PCT/CN2022/141744
Other languages
English (en)
French (fr)
Inventor
丁大钧
肖斌
王宇
朱聪超
Original Assignee
荣耀终端有限公司
Priority date
Filing date
Publication date
Application filed by 荣耀终端有限公司
Publication of WO2023130990A1


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • the present application relates to the technical field of terminals, and in particular to an image processing method, device, equipment, storage medium and program product.
  • With the rapid development of terminal technology, terminals such as mobile phones and tablet computers have become increasingly powerful and have gradually become indispensable tools in people's work and life.
  • the terminal is usually provided with a camera to realize the shooting function, and in order to improve the display effect of the captured image, the terminal often processes the captured image to improve the image quality and achieve the visual experience expected by the user.
  • The present application provides an image processing method, apparatus, device, storage medium and program product, which can improve the overall quality of an image and bring a better visual experience to users. The technical solution is as follows:
  • In a first aspect, an image processing method is provided. In the method, a first image to be processed is acquired, and salient object detection is then performed on the first image to obtain a salient region in the first image.
  • Target recognition is performed on the salient region to obtain the type of each of the n targets contained in the salient region, where n is a positive integer, and an image processing strategy of each target is obtained according to the type of each of the n targets.
  • Finally, the first image is processed according to the image processing strategy of each of the n targets to obtain a second image.
  • the first image is an image that needs to be processed to improve its display effect.
  • the first image may be a captured image.
  • Salient target detection is a target detection based on visual saliency, that is, target detection that simulates human visual characteristics.
  • The salient object is the object in the image that the user is most likely to be interested in and to notice.
  • A target's image processing strategy is an image processing strategy that can improve the display effect of that target. That is, for a certain target, after the target is processed according to its image processing strategy, the user will have a better visual experience when viewing it.
  • The image processing strategy may include one or more image processing operations such as denoising, sharpening and color, and may also include the intensity of each image processing operation.
  • In descending order of denoising intensity, the options may include strong denoising, normal denoising, and weak denoising.
  • In descending order of sharpening intensity, the options may include strong sharpening, normal sharpening, and weak sharpening.
  • In descending order of color intensity, the options may include strong color, normal color, and weak color.
  • In the present application, the image processing strategy can be obtained according to the type of the target, and differentiated processing of the targets in the salient region of the first image can then be performed according to the image processing strategy. In this way, the display effect of the main part of the image is improved, which improves the overall quality of the image and thus effectively improves the user's visual experience.
  • Optionally, since the salient region may contain multiple targets, image segmentation may first be performed on the salient region to obtain n target areas in the salient region. Each of the n target areas contains one target, and target recognition is then performed on each of the n target areas to obtain the type of the target contained in each target area. In this way, the type of each of the n targets in the salient region can be accurately obtained.
  • The types of targets in this application can include not only broad categories such as portraits, animals, plants and buildings, but also subcategories under each category.
  • For example, portraits can include subcategories by race, and animals can include subcategories such as cats and dogs.
  • In one possible way, the operation of obtaining the image processing strategy of each target according to the type of each of the n targets may be as follows: for each of the n targets, according to the type of the target, the corresponding image processing strategy is obtained from the correspondence between target types and image processing strategies and used as the image processing strategy of this target.
  • The correspondence between target types and image processing strategies can be stored in advance. The correspondence includes multiple target types and multiple image processing strategies corresponding to the multiple target types one to one, and the image processing strategy corresponding to each target type is the image processing strategy for targets belonging to that target type.
  • The correspondence can be set by a technician according to visual experience requirements. For example, in the user's visual experience, a portrait may need higher definition and weaker denoising, so the image processing strategy corresponding to the portrait target type can be set accordingly; a building, on the other hand, may be expected to have prominent lines and stronger sharpening, so the image processing strategy corresponding to the building target type can be set accordingly.
  • In this way, the image processing strategy of a target can be obtained according to the correspondence between target types and image processing strategies set in advance by the technician according to visual experience requirements, so that the obtained image processing strategy better meets the user's visual experience needs.
  • the proportion of each of the n targets in the first image may also be obtained.
  • In this case, the operation of obtaining the image processing strategy of each target according to the type of each of the n targets may be: obtaining the image processing strategy of each target according to the type and proportion of each of the n targets.
  • Objects of different sizes may require different image processing strategies to enhance the display. Therefore, the image processing strategy can be obtained according to the type of the target and its proportion in the image, so that the obtained image processing strategy can improve the display effect of the target more accurately.
  • The operation of obtaining the image processing strategy of each target according to the type and proportion of each of the n targets may be: for each of the n targets, according to the type and proportion of the target, the corresponding image processing strategy is obtained from the correspondence between target types, target ratio ranges and image processing strategies and used as the image processing strategy of this target.
  • The correspondence between target types, target ratio ranges and image processing strategies can be stored in advance. The correspondence includes multiple target types, multiple target ratio ranges and multiple image processing strategies in one-to-one correspondence, and the image processing strategy corresponding to each target type and target ratio range is the image processing strategy used when the proportion of a target belonging to that target type falls within that target ratio range.
  • the corresponding relationship can be set by technicians according to visual experience requirements.
  • Optionally, the operation of processing the first image according to the image processing strategy of each of the n targets to obtain the second image may be as follows: for each of the n targets, the target is processed according to the image processing strategy of this target to obtain the processed target; image fusion is then performed on the processed n targets and the background area of the first image to obtain the second image, where the background area is the area of the first image other than the salient region.
  • the background area of the first image is kept unchanged, and the salient area of the first image is processed to obtain the second image. That is, the background area of the second image is the same as that of the first image, and the salient area of the second image is obtained by processing n objects in the salient area of the first image. In this way, a better display effect of the main part of the second image can be ensured, thereby ensuring a higher overall quality of the second image, which can bring a better visual experience to the user.
  • In a second aspect, an image processing apparatus is provided, and the image processing apparatus has the function of implementing the behavior of the image processing method in the first aspect above.
  • the image processing apparatus includes at least one module, and the at least one module is configured to implement the image processing method provided in the first aspect above.
  • In a third aspect, an image processing device is provided. The image processing device includes a processor and a memory, and the memory is used to store a program that supports the image processing device in executing the image processing method provided in the first aspect above, as well as the data involved in implementing the image processing method described in the first aspect above.
  • the processor is configured to execute programs stored in the memory.
  • the image processing apparatus may further include a communication bus for establishing a connection between the processor and the memory.
  • In a fourth aspect, a computer-readable storage medium is provided, in which instructions are stored; when the instructions are run on a computer, the computer is caused to execute the image processing method described in the first aspect above.
  • In a fifth aspect, a computer program product containing instructions is provided which, when run on a computer, causes the computer to execute the image processing method described in the first aspect above.
  • FIG. 1 is a schematic structural diagram of a terminal provided in an embodiment of the present application.
  • FIG. 2 is a block diagram of a software system of a terminal provided in an embodiment of the present application
  • FIG. 3 is a flow chart of an image processing method provided in an embodiment of the present application.
  • Fig. 4 is a schematic diagram of a first image provided by an embodiment of the present application.
  • Fig. 5 is a schematic diagram of a salient region provided by an embodiment of the present application.
  • Fig. 6 is a schematic diagram of another salient region provided by an embodiment of the present application.
  • Fig. 7 is a schematic diagram of yet another salient region provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an image processing device provided by an embodiment of the present application.
  • phrases such as “one embodiment” or “some embodiments” described in this application mean that a particular feature, structure, or characteristic described by the embodiment is included in one or more embodiments of the present application.
  • Appearances of "in one embodiment", "in some embodiments", "in other embodiments", etc. in various places in this application do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments" unless specifically stated otherwise.
  • the terms “including”, “comprising”, “having” and their variations all mean “including but not limited to”, unless specifically stated otherwise.
  • FIG. 1 is a schematic structural diagram of a terminal provided by an embodiment of the present application.
  • the terminal 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charging management module 140, a power management module 141, a battery 142, and an antenna 1 , antenna 2, mobile communication module 150, wireless communication module 160, audio module 170, speaker 170A, receiver 170B, microphone 170C, earphone jack 170D, sensor module 180, button 190, motor 191, indicator 192, camera 193, display screen 194, and a subscriber identity module (subscriber identity module, SIM) card interface 195, etc.
  • SIM subscriber identity module
  • the sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an environmental Light sensor 180L, bone conduction sensor 180M, etc.
  • the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the terminal 100 .
  • the terminal 100 may include more or fewer components than shown in the figure, or combine some components, or separate some components, or arrange different components.
  • the illustrated components can be realized in hardware, software or a combination of software and hardware.
  • The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices, or may be integrated in one or more processors.
  • the controller may be the nerve center and command center of the terminal 100 .
  • the controller can generate an operation control signal according to the instruction opcode and timing signal, and complete the control of fetching and executing the instruction.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • the memory in processor 110 is a cache memory.
  • The memory may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, it can call them directly from the memory, which avoids repeated accesses and reduces the waiting time of the processor 110, thus improving the efficiency of the system.
  • the charging management module 140 is configured to receive a charging input from a charger.
  • the charger may be a wireless charger or a wired charger.
  • the wireless communication function of the terminal 100 can be realized by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor and the baseband processor.
  • the terminal 100 realizes the display function through the GPU, the display screen 194, and the application processor.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor. GPUs are used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
  • the terminal 100 can realize the shooting function through the ISP, the camera 193 , the video codec, the GPU, the display screen 194 and the application processor.
  • the external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the terminal 100.
  • the external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. Such as saving music, video and other files in the external memory card.
  • the internal memory 121 may be used to store computer-executable program codes including instructions.
  • the processor 110 executes various functional applications and data processing of the terminal 100 by executing instructions stored in the internal memory 121 .
  • the internal memory 121 may include an area for storing programs and an area for storing data.
  • the stored program area can store an operating system, at least one application program required by a function (such as a sound playing function, an image playing function, etc.) and the like.
  • the storage data area can store data (such as audio data, phone book, etc.) created by the terminal 100 during use.
  • the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (universal flash storage, UFS) and the like.
  • the terminal 100 can implement audio functions, such as music playing and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor.
  • the audio module 170 is used to convert digital audio information into analog audio signal output, and is also used to convert analog audio input into digital audio signal.
  • the audio module 170 may also be used to encode and decode audio signals.
  • the audio module 170 may be set in the processor 110 , or some functional modules of the audio module 170 may be set in the processor 110 .
  • the software system of the terminal 100 may adopt a layered architecture, an event-driven architecture, a micro-kernel architecture, a micro-service architecture, or a cloud architecture.
  • the software system of the terminal 100 is exemplarily described by taking an Android system with a layered architecture as an example.
  • FIG. 2 is a block diagram of a software system of a terminal 100 provided by an embodiment of the present application.
  • the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. Layers communicate through software interfaces.
  • In a layered architecture, the Android system is divided, from top to bottom, into the application layer, the application framework layer, the Android runtime and system layer, and the kernel layer.
  • the application layer can consist of a series of application packages. As shown in Figure 2, the application package may include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, and short message.
  • the application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer.
  • the application framework layer includes some predefined functions. As shown in Fig. 2, the application framework layer may include window manager, content provider, view system, phone manager, resource manager, notification manager and so on.
  • a window manager is used to manage window programs. The window manager can get the size of the display screen, determine whether there is a status bar, lock the screen, capture the screen, etc.
  • Content providers are used to store and retrieve data, and make these data accessible to applications. These data can include video, images, audio, calls made and received, browsing history and bookmarks, phonebook, etc.
  • the view system includes visual controls, such as controls for displaying text, controls for displaying pictures, and so on.
  • the view system can be used to build the display interface of the application, and the display interface can be composed of one or more views, for example, including a view for displaying SMS notification icons, a view for displaying text, and a view for displaying pictures.
  • the phone manager is used to provide communication functions of the terminal 100, such as management of call status (including connecting, hanging up, etc.).
  • the resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, etc.
  • the notification manager enables the application to display notification information in the status bar, which can be used to convey notification-type messages, and can automatically disappear after a short stay without user interaction. For example, the notification manager is used to notify download completion, message reminders, etc.
  • The notification manager can also display notifications in the status bar at the top of the system in the form of charts or scrolling text, for example notifications of applications running in the background.
  • The notification manager can also display notifications on the screen in the form of a dialog window, for example by prompting text information in the status bar, playing a prompt sound, vibrating the electronic device, or flashing an indicator light.
  • the Android Runtime includes core library and virtual machine.
  • the Android runtime is responsible for the scheduling and management of the Android system.
  • The core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the core library of Android.
  • the application layer and the application framework layer run in virtual machines.
  • the virtual machine executes the java files of the application program layer and the application program framework layer as binary files.
  • the virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
  • the system layer can include multiple functional modules, such as: surface manager (surface manager), media library (Media Libraries), 3D graphics processing library (such as: OpenGL ES), 2D graphics engine (such as: SGL), etc.
  • the surface manager is used to manage the display subsystem and provides the fusion of 2D and 3D layers for multiple applications.
  • the media library supports playback and recording of various commonly used audio and video formats, as well as still image files, etc.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the 3D graphics processing library is used to implement 3D graphics drawing, image rendering, compositing, and layer processing, etc.
  • the 2D graphics engine is a drawing engine for 2D drawing.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer includes at least a display driver, a camera driver, an audio driver, and a sensor driver.
  • the workflow of the software and hardware of the terminal 100 will be exemplarily described below in conjunction with capturing and photographing scenes.
  • When a touch operation is received by the touch sensor, a corresponding hardware interrupt is sent to the kernel layer.
  • the kernel layer processes touch operations into original input events (including touch coordinates, time stamps of touch operations, and other information). Raw input events are stored at the kernel level.
  • The application framework layer obtains the original input event from the kernel layer and identifies the control corresponding to the original input event. Take the touch operation being a click operation and the control corresponding to the click operation being the control of the camera application icon as an example.
  • The camera application calls the interface of the application framework layer to start the camera application, then starts the camera driver by calling the kernel layer, and captures still images or video through the camera 193.
  • With the rapid development of terminal technology, terminals such as mobile phones and tablet computers have become increasingly powerful and have gradually become indispensable tools in people's work and life.
  • the terminal is usually equipped with a camera to realize the shooting function, and in order to improve the display effect of the captured image, the terminal often processes the captured image to improve the image quality and achieve the visual experience expected by the user.
  • the subject of shooting may be a variety of different types of objects, such as portraits, buildings, and so on.
  • For different types of targets, the focus of the user's visual attention is different.
  • For example, portraits may require higher definition, while for buildings the user may want their lines to stand out.
  • the embodiment of the present application provides an image processing method, which can perform differentiated processing on various types of objects in the main part of the captured image, thereby improving the overall quality of the image and bringing users a better visual experience .
  • FIG. 3 is a flow chart of an image processing method provided by an embodiment of the present application. Referring to Fig. 3, the method includes the following steps.
  • Step 301 The terminal acquires a first image to be processed.
  • the first image is an image that needs to be processed to improve its display effect.
  • the first image may be an image captured by the terminal, for example, the first image may be the image shown in FIG. 4 .
  • the terminal may use this image as the first image to be processed, and then perform subsequent steps to process the image.
  • Step 302 The terminal performs salient object detection on the first image to obtain salient regions in the first image.
  • Salient target detection is a target detection based on visual saliency, that is, target detection that simulates human visual characteristics.
  • the salient object is the object that the user is most interested in and noticeable in the image.
  • After the terminal performs salient object detection on the first image, it can obtain the salient region in the first image.
  • the salient area is the main part of the first image, and the objects contained in the salient area are the objects most likely to be of interest and noticed by the user.
  • the salient area may contain n targets, where n is a positive integer.
  • For example, assuming the first image is the image shown in Fig. 4, after the terminal performs salient object detection on the first image, it can obtain a salient object detection result as shown in (a) in Fig. 5.
  • The salient object detection result can be a mask map. The white part of the mask map indicates the location of the salient region in the first image, and the black part indicates the location of the background area, that is, the part of the first image other than the salient region.
  • By using the mask map, the background area in the first image can be blocked out while the salient region in the first image is retained, that is, the image shown in (b) in Fig. 5 is obtained.
  • The mask map can be an image whose pixel values are 0 and 255. As shown in (a) in Fig. 5, the white part of the mask map is the part with pixel value 255, and the black part is the part with pixel value 0.
  • An AND operation is performed between each pixel value in the mask map and the pixel value at the corresponding position in the first image: when a pixel value in the mask map is 0, the pixel value at the corresponding position in the first image is set to 0, and when a pixel value in the mask map is not 0 (that is, 255), the pixel value at the corresponding position in the first image is retained.
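  • The mask-based AND operation described above can be sketched as follows. This is an illustration only; the file names and the use of OpenCV are assumptions, not specified by the application.

```python
import cv2

# Hypothetical file names; in practice these come from the capture and detection steps.
first_image = cv2.imread("first_image.jpg")                     # H x W x 3, uint8
mask = cv2.imread("saliency_mask.png", cv2.IMREAD_GRAYSCALE)    # H x W, values 0 or 255

# Keep a pixel of the first image only where the mask is 255 (the salient region);
# set it to 0 where the mask is 0 (the background), i.e. the AND operation above.
salient_only = cv2.bitwise_and(first_image, first_image, mask=mask)
cv2.imwrite("salient_region.png", salient_only)
```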
  • the operation of the terminal performing salient object detection on the first image is similar to the operation of a certain terminal performing salient object detection on a certain image in the related art, and this embodiment of the present application does not elaborate on this.
  • For example, the terminal can perform salient object detection on the first image using a spatial-domain salient object detection algorithm (including but not limited to the Itti algorithm, the context-aware (CA) algorithm, etc.), a frequency-domain salient object detection algorithm (including but not limited to the spectral residual (SR) algorithm, the frequency-tuned (FT) algorithm, etc.), or another salient object detection algorithm.
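  • As an illustration of one of the frequency-domain options mentioned above, a minimal sketch of the spectral residual (SR) idea is given below; the resolution, kernel sizes and threshold are illustrative assumptions, not values prescribed by the application.

```python
import cv2
import numpy as np

def spectral_residual_saliency(gray):
    """Return a saliency map in [0, 1] for a grayscale image (spectral residual sketch)."""
    small = cv2.resize(gray, (64, 64)).astype(np.float32)
    spectrum = np.fft.fft2(small)
    log_amplitude = np.log(np.abs(spectrum) + 1e-8)
    phase = np.angle(spectrum)
    # The spectral residual is the log amplitude minus its local average.
    residual = log_amplitude - cv2.blur(log_amplitude, (3, 3))
    saliency = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    saliency = cv2.GaussianBlur(saliency.astype(np.float32), (9, 9), 2.5)
    saliency = cv2.normalize(saliency, None, 0.0, 1.0, cv2.NORM_MINMAX)
    # Upscale back to the original size so the map can be thresholded into a mask.
    return cv2.resize(saliency, (gray.shape[1], gray.shape[0]))

gray = cv2.imread("first_image.jpg", cv2.IMREAD_GRAYSCALE)
saliency_map = spectral_residual_saliency(gray)
mask = (saliency_map > 0.5).astype(np.uint8) * 255   # 0/255 mask map as described above
```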
  • Step 303 The terminal performs object recognition on the salient area in the first image, and obtains the type of each object in the n objects included in the salient area.
  • the salient region may contain a target, that is, n is 1, and at this time, the type of the target is recognized.
  • the salient region may contain multiple objects, that is, n is an integer greater than or equal to 2, and in this case, the type of each object in the multiple objects is identified.
  • Optionally, when the terminal performs target recognition on the salient region in the first image, it can first perform image segmentation on the salient region to obtain n target areas in the salient region, each of which contains one target, and then perform target recognition on each of the n target areas to obtain the type of the target contained in each target area. In this way, the type of each of the n targets in the salient region can be accurately obtained.
  • For example, the salient region can be as shown in (b) in Fig. 5 and contains one target; the terminal can then directly perform target recognition on the salient region and obtain that the type of the target is portrait.
  • For another example, the salient region may be as shown in Fig. 6 and contains two targets; the terminal may perform image segmentation on the salient region to obtain two target areas and then perform target recognition on each of the two target areas, obtaining that the type of the target contained in one target area is animal and the type of the target contained in the other target area is portrait.
  • It is worth noting that the types of targets in the embodiments of the present application can include not only broad categories such as portraits, animals, plants and buildings, but also subcategories under each category; for example, portraits can include subcategories by race, and animals can include subcategories such as cats and dogs. In this way, a more precise distinction of the n targets in the salient region can be achieved.
  • the operation of the terminal performing object recognition on the salient region in the first image is similar to the operation of a certain terminal performing object recognition on a certain image in the related art, which will not be described in detail in this embodiment of the present application.
  • the terminal may input the salient area in the first image into the classification model, and the classification model outputs the position of each target area in the n target areas in the salient area and the type of the target contained in each target area.
  • the position of each target area is the position of the targets contained in each target area.
  • The classification model is used to identify the types of targets that an image contains. That is, after an image is input into the classification model, the classification model can recognize the position and type of each target contained in the image and output them. In this case, if the input image contains multiple targets, the classification model can directly perform image segmentation and target recognition, that is, it can segment multiple target regions from the input image, perform target recognition on each target region, and then output the location of each target region and the type of the target it contains.
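  • The application does not name a concrete classification model. Purely as an illustration, an off-the-shelf detector can play this role, since it outputs the position of each target area together with a class label for the target it contains; the sketch below assumes torchvision's pretrained Faster R-CNN and an arbitrary score threshold.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor

# A pretrained general-purpose detector standing in for the classification model.
model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

def detect_targets(salient_region_rgb):
    """salient_region_rgb: H x W x 3 uint8 array covering the salient region only."""
    image = to_tensor(salient_region_rgb)          # C x H x W float tensor in [0, 1]
    with torch.no_grad():
        output = model([image])[0]
    # output["boxes"]  -> position of each target area in the salient region
    # output["labels"] -> type (class index) of the target in each target area
    # output["scores"] -> confidence, useful for filtering weak detections
    keep = output["scores"] > 0.5
    return output["boxes"][keep], output["labels"][keep]
```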
  • the classification model may be trained by the terminal, or may be trained by the server and then sent to the terminal, which is not limited in this embodiment of the present application.
  • the terminal or the server may obtain multiple training samples, and use the multiple training samples to train the neural network model to obtain the classification model.
  • the multiple training samples may be preset.
  • Each training sample in the plurality of training samples includes a sample image and a sample label, the sample image contains a specified target, and the sample label is a type of the specified target contained in the sample image. That is, the input data in each training sample among the plurality of training samples is a sample image containing a specified target, and the sample is marked as a type of the specified target.
  • the neural network model may include multiple network layers, and the multiple network layers include an input layer, multiple hidden layers and an output layer.
  • the input layer is responsible for receiving input data; the output layer is responsible for outputting processed data; multiple hidden layers are located between the input layer and the output layer and are responsible for processing data, and multiple hidden layers are invisible to the outside.
  • The neural network model may be a deep neural network, for example a convolutional neural network.
  • For any training sample among the multiple training samples, the input data in the training sample can be input into the neural network model to obtain output data; a loss function is then used to determine the loss value between the output data and the sample label in the training sample, and the parameters in the neural network model are adjusted according to the loss value.
  • After the neural network model is trained in this way with the multiple training samples, the neural network model with the adjusted parameters is the classification model.
  • For example, any parameter in the neural network model can be adjusted by the formula w' = w - α·dw, where w' is the adjusted parameter, w is the parameter before adjustment, α is the learning rate, which can be set in advance (for example, 0.001 or 0.000001; this is not uniquely limited in this embodiment of the present application), and dw is the partial derivative of the loss function with respect to w, which can be obtained from the loss value.
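  • A minimal sketch of one training iteration that applies this update is shown below; the network shape, loss function and data are illustrative assumptions, not taken from the application.

```python
import torch
import torch.nn as nn

# A toy convolutional classifier standing in for the classification model; the
# SGD optimizer applies exactly the update w' = w - α·dw discussed above.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 5))
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)   # α = 0.001
loss_fn = nn.CrossEntropyLoss()

sample_image = torch.randn(1, 3, 64, 64)   # stand-in for a sample image (input data)
sample_label = torch.tensor([2])           # stand-in for its sample label (target type)

output = model(sample_image)               # input data fed through the model -> output data
loss = loss_fn(output, sample_label)       # loss value between output data and sample label
optimizer.zero_grad()
loss.backward()                            # computes dw, the derivative of the loss w.r.t. each w
optimizer.step()                           # w' = w - α·dw for every parameter
```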
  • Optionally, the terminal may also obtain the proportion of each of the n targets in the first image, that is, the ratio of the size of each target (the number of pixels of the target) to the overall size of the first image (the total number of pixels in the first image), so that the image processing strategy can subsequently be determined with reference to the proportion of each target.
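  • The proportion itself is a simple pixel count; a sketch follows, with variable names assumed for illustration.

```python
import numpy as np

def target_proportion(target_mask, image_shape):
    """target_mask: H x W boolean array marking the pixels of one target."""
    total_pixels = image_shape[0] * image_shape[1]
    return np.count_nonzero(target_mask) / total_pixels
```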
  • Step 304 The terminal obtains an image processing policy of each object according to the type of each object among the n objects.
  • A target's image processing strategy is an image processing strategy that can improve the display effect of that target. That is, for a certain target, after the target is processed according to its image processing strategy, the user will have a better visual experience when viewing it.
  • the image processing strategy may include one or more image processing operations of denoising, sharpening, color, etc., and may also include the intensity of each image processing operation.
  • Denoising refers to the process of reducing noise in an image. An image can be denoised using a denoising algorithm or a neural network model. Different filter operators in the denoising algorithm have different denoising strengths, that is, different filter operators can be used to adjust the denoising strength; alternatively, different neural network models can be used to achieve different denoising strengths. Exemplarily, in descending order of denoising intensity, the options may include strong denoising, normal denoising, and weak denoising.
  • Sharpening, also called edge enhancement, is processing that makes the edges and details of an image more prominent.
  • a sharpening algorithm may be used to perform sharpening processing on an image.
  • Different filter operators in the sharpening algorithm have different sharpening strengths, that is, different filtering operators may be used to adjust the sharpening strength.
  • the order of sharpening intensity from high to low may include strong sharpening, normal sharpening, and weak sharpening.
  • Color refers to the process of color correcting and color enhancing an image.
  • the image can be color-processed using a neural network model, and different color intensities can be achieved using different neural network models.
  • the order of color intensity from high to low may include strong color, normal color, and weak color.
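  • As an illustration of how different operators can realize different intensities, a sketch is given below; the kernel sizes, unsharp-mask weights and saturation gains are arbitrary example values, not taken from the application.

```python
import cv2
import numpy as np

def denoise(img, strength):
    # A larger blur kernel gives stronger denoising.
    ksize = {"weak": 3, "normal": 5, "strong": 9}[strength]
    return cv2.GaussianBlur(img, (ksize, ksize), 0)

def sharpen(img, strength):
    # Unsharp masking: add back a scaled high-frequency component.
    amount = {"weak": 0.5, "normal": 1.0, "strong": 1.5}[strength]
    blurred = cv2.GaussianBlur(img, (0, 0), 3)
    return cv2.addWeighted(img, 1 + amount, blurred, -amount, 0)

def adjust_color(img, strength):
    # A simple saturation boost in HSV as a stand-in for "color" processing.
    gain = {"weak": 1.05, "normal": 1.15, "strong": 1.3}[strength]
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[..., 1] = np.clip(hsv[..., 1] * gain, 0, 255)
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)
```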
  • the operation of the terminal acquiring the image processing policy of each target according to the type of each target in the n targets may include the following two possible ways:
  • In the first possible way, for each of the n targets, the terminal obtains, according to the type of the target, the corresponding image processing strategy from the correspondence between target types and image processing strategies and uses it as the image processing strategy of this target.
  • the correspondence between object types and image processing strategies may be pre-stored in the terminal.
  • the correspondence includes multiple object types and multiple image processing strategies corresponding to the multiple object types one-to-one.
  • The image processing strategy corresponding to each target type is the image processing strategy for targets belonging to that target type.
  • The correspondence can be set by a technician according to visual experience requirements. For example, in the user's visual experience, a portrait may need higher definition and weaker denoising, so the image processing strategy corresponding to the portrait target type can be set accordingly; a building, on the other hand, may be expected to have prominent lines and stronger sharpening, so the image processing strategy corresponding to the building target type can be set accordingly. In this way, the image processing strategy obtained according to the correspondence better meets the user's visual experience requirements.
  • For example, for a certain target, the terminal can obtain the corresponding image processing strategy from the correspondence between target types and image processing strategies shown in Table 1 below, namely weak denoising, weak sharpening and normal color, and use this image processing strategy as the image processing strategy of this target. A sketch of such a lookup is given below.
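  • Illustration only: Table 1 is not reproduced above, so apart from the weak denoising, weak sharpening and normal color entry taken from the text, the mapping below is an assumed example.

```python
# Target type -> image processing strategy (operations and their intensities).
STRATEGY_BY_TYPE = {
    "portrait": {"denoise": "weak",   "sharpen": "weak",   "color": "normal"},
    "building": {"denoise": "normal", "sharpen": "strong", "color": "normal"},
    "animal":   {"denoise": "normal", "sharpen": "normal", "color": "strong"},
}

def strategy_for(target_type):
    return STRATEGY_BY_TYPE[target_type]
```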
  • the second possible way the terminal acquires the image processing strategy of each target according to the type and proportion of each target in the n targets.
  • Objects of different sizes may require different image processing strategies to enhance the display.
  • For example, portraits of different sizes may require different image processing strategies: as shown in (a) in Fig. 7, for a relatively large portrait, fine-grained processing can be performed in terms of definition and color.
  • the image processing strategy can be obtained according to the type of the object and its proportion in the image, so that the obtained image processing strategy can improve the display effect of the object more accurately.
  • Specifically, for each of the n targets, the terminal can obtain, according to the type and proportion of the target, the corresponding image processing strategy from the correspondence between target types, target ratio ranges and image processing strategies, and use it as the image processing strategy of this target.
  • the correspondence between target types, target ratio ranges, and image processing strategies can be pre-stored in the terminal.
  • the correspondence includes multiple target types, multiple target ratio ranges, and multiple image processing strategies in one-to-one correspondence.
  • The image processing strategy corresponding to each target type and target ratio range is the image processing strategy used when the proportion of a target belonging to that target type falls within that target ratio range.
  • the corresponding relationship can be set and obtained by a technician according to visual experience requirements.
  • For example, according to the type and proportion of a certain target, the terminal can obtain the corresponding image processing strategy from the correspondence between target types, target ratio ranges and image processing strategies shown in Table 2 below, namely weak denoising, weak sharpening and normal color, and use this image processing strategy as the image processing strategy of this target; a sketch of such a lookup follows.
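  • Illustration only: Table 2 is not reproduced above, so the ratio thresholds and most entries below are assumed example values.

```python
# (target type, lower bound, upper bound of the proportion range) -> strategy.
STRATEGY_BY_TYPE_AND_RATIO = {
    ("portrait", 0.0, 0.2): {"denoise": "weak",   "sharpen": "weak",   "color": "normal"},
    ("portrait", 0.2, 1.0): {"denoise": "weak",   "sharpen": "normal", "color": "strong"},
    ("building", 0.0, 1.0): {"denoise": "normal", "sharpen": "strong", "color": "normal"},
}

def strategy_for(target_type, proportion):
    for (kind, low, high), strategy in STRATEGY_BY_TYPE_AND_RATIO.items():
        if kind == target_type and low <= proportion < high:
            return strategy
    raise KeyError(f"no strategy for {target_type} at proportion {proportion:.2f}")
```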
  • Step 305 The terminal processes the first image according to the image processing policy of each of the n objects to obtain the second image.
  • For each of the n targets, the terminal processes the target in the first image according to the target's image processing strategy. In this way, the processing of the n targets in the salient region of the first image is completed, that is, the processing of the main part of the first image is completed. Since the user generally pays the most attention to the salient region in the first image, after the n targets in the salient region are processed, the improvement in the overall quality of the first image will be very obvious, which can effectively improve the user's visual experience.
  • Specifically, for each of the n targets, the terminal processes the target according to the image processing strategy of the target to obtain the processed target, and image fusion is then performed on the processed n targets and the background area of the first image to obtain the second image.
  • the background area of the second image is the same as that of the first image, and the salient area of the second image is obtained by processing n objects in the salient area of the first image.
  • In one possible implementation, when the terminal processes a target according to the image processing strategy of the target, it may process all pixel values in the first image according to the image processing strategy of the target; after the processing is completed, the target is segmented from the processed first image, and image fusion is performed on the segmented target and the background area of the unprocessed first image.
  • For example, for the first image shown in Fig. 4, the salient region is shown in (b) in Fig. 5 and contains one target. The terminal can process all pixel values in the first image shown in Fig. 4 according to the image processing strategy of this target; after the processing is completed, the target is segmented from the processed first image, and image fusion is performed on the segmented target and the background area of the unprocessed first image.
  • In another possible implementation, when the terminal processes a target according to the target's image processing strategy, it may process, according to that strategy, all the pixel values in the target area where the target is located, which is segmented out during salient object detection; after the processing is completed, image fusion is performed on the processed target area and the background area of the first image.
  • For example, for the first image shown in Fig. 4, the salient region is shown in (b) in Fig. 5 and contains one target; in this case the salient region is the target area where the target is located, segmented out during salient object detection. The terminal can process all pixel values in the target area shown in (b) in Fig. 5 according to the image processing strategy of the target, and after the processing is completed, perform image fusion on the processed target area and the background area of the first image.
  • When the terminal performs image fusion on the processed n targets and the background area of the first image, it may do so according to the positions of the n targets.
  • This image fusion process is similar to the operation of a certain terminal performing image fusion on a certain foreground image and a certain background image according to the position of a certain foreground image in the related art, which is not described in detail in this embodiment of the present application.
  • Specifically, for any position in the background area of the first image, the pixel value at this position in the background area is used as the pixel value at the corresponding position in the second image; for any position in the salient region, the processed pixel value of the target at this position is used as the pixel value at the corresponding position in the second image. In this way, the background area of the first image is kept unchanged, and the salient region of the first image is processed to obtain the second image.
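  • A sketch of this per-position fusion using the saliency mask from the detection step is given below; the variable names are assumptions for illustration.

```python
import numpy as np

def fuse(first_image, processed_salient, mask):
    """mask: H x W array, 255 inside the salient region and 0 in the background."""
    inside = mask.astype(bool)
    second_image = first_image.copy()                    # background pixels stay unchanged
    second_image[inside] = processed_salient[inside]     # processed target pixels
    return second_image
```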
  • To sum up, in this embodiment of the present application, after acquiring the first image to be processed, the terminal performs salient object detection on the first image to obtain the salient region in the first image. Target recognition is then performed on the salient region to obtain the type of each of the n targets contained in the salient region, and an image processing strategy of each target is obtained according to the type of each of the n targets. Finally, the first image is processed according to the image processing strategy of each of the n targets to obtain the second image.
  • Since the image processing strategy is obtained according to the type of the target, and differentiated processing of the targets in the salient region of the first image is performed according to the image processing strategy, the display effect of the main part of the image is improved, which improves the overall quality of the image and thus effectively improves the user's visual experience.
  • Fig. 8 is a schematic structural diagram of an image processing device provided by an embodiment of the present application.
  • the device can be implemented by software, hardware or a combination of the two to become part or all of computer equipment.
  • The computer equipment can be the terminal described in the embodiments shown in Fig. 1 to Fig. 2.
  • the device includes a first acquisition module 801 , a detection module 802 , an identification module 803 , a second acquisition module 804 and a processing module 805 .
  • The first acquisition module 801 is configured to acquire a first image to be processed. The detection module 802 is configured to perform salient object detection on the first image to obtain a salient region in the first image;
  • the recognition module 803 is configured to perform target recognition on the salient area, and obtain the type of each target in the n targets contained in the salient area, where n is a positive integer;
  • the second obtaining module 804 is used to obtain the image processing strategy of each target according to the type of each target in the n targets;
  • the processing module 805 is configured to process the first image according to the image processing policy of each of the n objects to obtain the second image.
  • Optionally, the identification module 803 is used to: perform image segmentation on the salient region to obtain n target areas in the salient region, each of which contains one target; and
  • perform target recognition on each of the n target areas to obtain the type of the target contained in each target area.
  • the second obtaining module 804 is used for:
  • for each of the n targets, obtain, according to the type of the target, the corresponding image processing strategy from the correspondence between target types and image processing strategies as the image processing strategy of this target.
  • the device also includes:
  • the third acquiring module acquires the proportion of each of the n targets in the first image
  • the second obtaining module 804 is used for:
  • An image processing strategy for each object is obtained according to the type and proportion of each object in the n objects.
  • the second obtaining module 804 is used for:
  • for each of the n targets, obtain, according to the type and proportion of the target, the corresponding image processing strategy from the correspondence between target types, target ratio ranges and image processing strategies as the image processing strategy of this target.
  • processing module 805 is used for:
  • the object is processed according to the image processing strategy of the object to obtain the processed object;
  • the processed n targets are image fused with the background area of the first image to obtain a second image, and the background area is other areas in the first image except the salient area.
  • the image processing strategy includes at least one image processing operation and the intensity of each image processing operation in the at least one image processing operation, and the at least one image processing operation includes one or more of denoising, sharpening, and coloring.
  • salient object detection is performed on the first image to obtain a salient region in the first image.
  • target recognition is performed on the salient area to obtain the type of each object in the n objects included in the salient area, and then an image processing strategy for each object is obtained according to the type of each object in the n objects.
  • the first image is processed according to the image processing strategy of each of the n objects to obtain the second image.
  • Since the image processing strategy is obtained according to the type of the target, and differentiated processing of the targets in the salient region of the first image is performed according to the image processing strategy, the display effect of the main part of the image is improved, which improves the overall quality of the image and thus effectively improves the user's visual experience.
  • It should be noted that, when the image processing apparatus provided in the above embodiment processes an image, the division into the above functional modules is merely used as an example for illustration. In practical applications, the above functions can be allocated to different functional modules as required, that is, the internal structure of the apparatus can be divided into different functional modules to complete all or part of the functions described above.
  • In addition, the functional units and modules in the above embodiments can be integrated into one processing unit, or each unit can exist alone physically, or two or more units can be integrated into one unit; the integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the specific names of the functional units and modules are only for the convenience of distinguishing each other, and are not used to limit the protection scope of the embodiments of the present application.
  • all or part may be implemented by software, hardware, firmware or any combination thereof.
  • When implemented using software, it may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on the computer, the processes or functions according to the embodiments of the present application will be generated in whole or in part.
  • the computer can be a general purpose computer, a special purpose computer, a computer network or other programmable devices.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from a website, computer, server or data center to another website, computer, server or data center in a wired manner (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (such as infrared, radio, or microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or may be a data storage device such as a server or a data center integrated with one or more available media.
  • The available medium may be a magnetic medium (such as a floppy disk, a hard disk, or a magnetic tape), an optical medium (such as a digital versatile disc (DVD)), a semiconductor medium (such as a solid state disk (SSD)), or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The present application discloses an image processing method, apparatus, device, storage medium and program product, belonging to the field of terminal technology. The method includes: acquiring a first image to be processed (301); performing salient object detection on the first image to obtain a salient region in the first image (302); performing target recognition on the salient region to obtain the type of each of the n targets contained in the salient region (303), where n is a positive integer; obtaining an image processing strategy of each target according to the type of each of the n targets (304); and processing the first image according to the image processing strategy of each of the n targets to obtain a second image (305). In the present application, the image processing strategy of a target is obtained according to the type of the target, and differentiated processing of the targets in the salient region of the first image is performed according to the image processing strategy, so that the display effect of the main part of the image is improved, the overall quality of the image is improved, and the user's visual experience is effectively improved.

Description

Image processing method, apparatus, device, storage medium, and program product
This application claims priority to Chinese Patent Application No. 202210017755.7, entitled "Image processing method, apparatus, device, storage medium, and program product" and filed with the China National Intellectual Property Administration on January 7, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of terminal technologies, and in particular to an image processing method, apparatus, device, storage medium, and program product.
Background
With the rapid development of terminal technologies, terminals such as mobile phones and tablet computers have become increasingly powerful and have gradually become indispensable tools in people's work and life. A terminal is usually provided with a camera to implement a shooting function, and in order to improve the display effect of a captured image, the terminal often processes the captured image to improve image quality and achieve the visual experience expected by the user.
Summary
This application provides an image processing method, apparatus, device, storage medium, and program product, which can improve the overall quality of an image and bring a better visual experience to the user. The technical solutions are as follows:
According to a first aspect, an image processing method is provided. In the method, a first image to be processed is acquired, and salient-target detection is performed on the first image to obtain a salient region of the first image. Target recognition is then performed on the salient region to obtain the type of each of n targets contained in the salient region, where n is a positive integer. An image processing strategy of each target is obtained according to the type of each of the n targets. Finally, the first image is processed according to the image processing strategy of each of the n targets to obtain a second image.
The first image is an image that needs to be processed to improve its display effect. The first image may be a captured image.
Salient-target detection is target detection based on visual saliency, that is, target detection that simulates the characteristics of human vision. Its purpose is to identify the subject of an image and highlight the most salient targets in the image (which may be called salient targets). A salient target is the target in the image that the user is most likely to be interested in and most likely to notice.
The image processing strategy of a target is a strategy capable of improving the display effect of the target. That is, for a given target, after the target is processed according to its image processing strategy, the user will have a better visual experience when viewing the target.
The image processing strategy may include one or more image processing operations such as denoising, sharpening, and color, and may further include the intensity of each image processing operation. For example, in descending order of denoising intensity, denoising may include strong denoising, normal denoising, and weak denoising; in descending order of sharpening intensity, sharpening may include strong sharpening, normal sharpening, and weak sharpening; and in descending order of color intensity, color processing may include strong color, normal color, and weak color. A small sketch of how such a strategy might be represented is given below.
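The following Python snippet is only a hedged illustration of how a per-type strategy (operations plus intensities) could be represented in code; the class and field names are hypothetical and are not part of the application itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Intensity(Enum):
    WEAK = "weak"
    NORMAL = "normal"
    STRONG = "strong"


@dataclass
class ProcessingStrategy:
    # Intensity of each optional operation; None means the operation is skipped.
    denoise: Optional[Intensity] = None
    sharpen: Optional[Intensity] = None
    color: Optional[Intensity] = None


# Example: a portrait strategy with weak denoising, weak sharpening and normal color.
portrait_strategy = ProcessingStrategy(denoise=Intensity.WEAK,
                                       sharpen=Intensity.WEAK,
                                       color=Intensity.NORMAL)
```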
In this application, the image processing strategy of a target can be obtained according to the type of the target, and the targets in the salient region of the first image can then be processed differentially according to the strategies. This improves the display effect of the main subject of the image, thereby improving the overall image quality and effectively improving the user's visual experience.
Optionally, because the salient region may contain multiple targets, when target recognition is performed on the salient region of the first image, image segmentation may first be performed on the salient region to obtain n target regions in the salient region, each of which contains one target, and target recognition may then be performed on each of the n target regions to obtain the type of the target contained in each target region. In this way, the type of each of the n targets in the salient region can be obtained accurately.
It should be noted that in this application the type of a target may include not only broad categories such as portrait, animal, plant, and building, but also subcategories under each broad category; for example, portraits may be subdivided into subcategories by complexion, and animals may include subcategories such as cat and dog. In this way, the n targets in the salient region can be distinguished more precisely.
In one possible implementation, the operation of obtaining the image processing strategy of each target according to the type of each of the n targets may be: for each of the n targets, obtaining, according to the type of the target, the corresponding image processing strategy from a correspondence between target types and image processing strategies as the image processing strategy of the target.
A correspondence between target types and image processing strategies may be stored in advance. The correspondence includes multiple target types and multiple image processing strategies in one-to-one correspondence with them, and the strategy corresponding to a target type is the image processing strategy for targets of that type. The correspondence may be set by technical personnel according to visual-experience requirements. For example, in the user's visual experience, a portrait may require higher clarity and weaker denoising, so the strategy corresponding to the portrait type can be set accordingly; for a building, prominent lines and stronger sharpening may be desired, so the strategy corresponding to the building type can be set accordingly.
In this application, the image processing strategy of a target can be obtained from a correspondence between target types and image processing strategies that is set in advance according to visual-experience requirements, so that the obtained strategy better matches the user's visual-experience requirements.
In another possible implementation, the proportion of each of the n targets in the first image may also be obtained. In this case, the operation of obtaining the image processing strategy of each target according to the type of each of the n targets may be: obtaining the image processing strategy of each target according to both the type and the proportion of each of the n targets.
Targets of different sizes may require different image processing strategies to improve their display effect. Therefore, the image processing strategy of a target can be obtained according to both its type and its proportion in the image, so that the obtained strategy can improve the display effect of the target more precisely.
The operation of obtaining the image processing strategy of each target according to the type and the proportion of each of the n targets may be: for each of the n targets, obtaining, according to the type and the proportion of the target, the corresponding image processing strategy from a correspondence among target types, target proportion ranges, and image processing strategies as the image processing strategy of the target.
A correspondence among target types, target proportion ranges, and image processing strategies may be stored in advance. The correspondence includes multiple target types, multiple target proportion ranges, and multiple image processing strategies in one-to-one correspondence, and the strategy corresponding to a target type and a proportion range is the strategy used when the proportion of a target of that type falls within that proportion range. The correspondence may be set by technical personnel according to visual-experience requirements.
Optionally, the operation of processing the first image according to the image processing strategy of each of the n targets to obtain the second image may be: for each of the n targets, processing the target according to its image processing strategy to obtain the processed target; and fusing the processed n targets with a background region of the first image to obtain the second image, where the background region is the region of the first image other than the salient region.
In this application, the background region of the first image is kept unchanged and the salient region of the first image is processed to obtain the second image. That is, the background region of the second image is the same as that of the first image, and the salient region of the second image is obtained by processing the n targets in the salient region of the first image. In this way, the display effect of the main subject of the second image is good, so the overall quality of the second image is high, bringing the user a better visual experience.
According to a second aspect, an image processing apparatus is provided. The apparatus has the function of implementing the image processing method of the first aspect, and includes at least one module configured to implement the image processing method provided in the first aspect.
According to a third aspect, an image processing apparatus is provided. The apparatus includes a processor and a memory. The memory is configured to store a program that supports the apparatus in performing the image processing method provided in the first aspect, and to store data involved in implementing the image processing method of the first aspect. The processor is configured to execute the program stored in the memory. The apparatus may further include a communication bus configured to establish a connection between the processor and the memory.
According to a fourth aspect, a computer-readable storage medium is provided. The storage medium stores instructions that, when run on a computer, cause the computer to perform the image processing method of the first aspect.
According to a fifth aspect, a computer program product containing instructions is provided. When the computer program product runs on a computer, the computer is caused to perform the image processing method of the first aspect.
The technical effects obtained by the second, third, fourth, and fifth aspects are similar to those obtained by the corresponding technical means in the first aspect, and are not repeated here.
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of a terminal according to an embodiment of this application;
FIG. 2 is a block diagram of a software system of a terminal according to an embodiment of this application;
FIG. 3 is a flowchart of an image processing method according to an embodiment of this application;
FIG. 4 is a schematic diagram of a first image according to an embodiment of this application;
FIG. 5 is a schematic diagram of a salient region according to an embodiment of this application;
FIG. 6 is a schematic diagram of another salient region according to an embodiment of this application;
FIG. 7 is a schematic diagram of yet another salient region according to an embodiment of this application;
FIG. 8 is a schematic structural diagram of an image processing apparatus according to an embodiment of this application.
Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, the implementations of this application are described in further detail below with reference to the accompanying drawings.
It should be understood that "multiple" in this application means two or more. In the description of this application, unless otherwise stated, "/" means "or"; for example, A/B may mean A or B. "And/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean that only A exists, that both A and B exist, or that only B exists. In addition, to describe the technical solutions of this application clearly, words such as "first" and "second" are used to distinguish identical or similar items whose functions and roles are basically the same. A person skilled in the art can understand that words such as "first" and "second" do not limit the quantity or the execution order, and do not necessarily indicate a difference.
Statements such as "one embodiment" or "some embodiments" described in this application mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of this application. Therefore, statements such as "in one embodiment", "in some embodiments", "in some other embodiments", and "in still other embodiments" appearing in different places in this application do not necessarily refer to the same embodiment, but mean "one or more but not all embodiments", unless otherwise specifically emphasized. In addition, the terms "include", "comprise", "have", and their variants all mean "including but not limited to", unless otherwise specifically emphasized.
Before the image processing method provided in the embodiments of this application is explained in detail, the terminal involved in the embodiments of this application is described first.
FIG. 1 is a schematic structural diagram of a terminal according to an embodiment of this application. Referring to FIG. 1, the terminal 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headset jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display 194, a subscriber identity module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, a barometric pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It can be understood that the structure illustrated in this embodiment of this application does not constitute a specific limitation on the terminal 100. In other embodiments of this application, the terminal 100 may include more or fewer components than shown, some components may be combined or split, or the components may be arranged differently. The illustrated components may be implemented by hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent components or may be integrated into one or more processors.
The controller may be the nerve center and command center of the terminal 100. The controller may generate operation control signals according to instruction operation codes and timing signals, and control instruction fetching and execution.
A memory may also be provided in the processor 110 to store instructions and data. In some embodiments, the memory in the processor 110 is a cache. The cache may store instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, it can call them directly from the cache. This avoids repeated access, reduces the waiting time of the processor 110, and thus improves system efficiency.
The charging management module 140 is configured to receive charging input from a charger. The charger may be a wireless charger or a wired charger.
The wireless communication function of the terminal 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.
The terminal 100 implements the display function through the GPU, the display 194, the application processor, and the like. The GPU is a microprocessor for image processing and connects the display 194 and the application processor. The GPU performs mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The terminal 100 may implement the shooting function through the ISP, the camera 193, the video codec, the GPU, the display 194, the application processor, and the like.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to extend the storage capability of the terminal 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function, for example, to store files such as music and videos in the external memory card.
The internal memory 121 may be used to store computer-executable program code, and the computer-executable program code includes instructions. The processor 110 executes various functional applications and data processing of the terminal 100 by running the instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store the operating system and applications required by at least one function (such as a sound playback function and an image playback function). The data storage area may store data created during use of the terminal 100 (such as audio data and a phone book). In addition, the internal memory 121 may include high-speed random access memory, and may also include nonvolatile memory, such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS).
The terminal 100 may implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headset jack 170D, the application processor, and the like.
The audio module 170 is configured to convert digital audio information into an analog audio signal for output, and also to convert analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or some functional modules of the audio module 170 may be disposed in the processor 110.
The software system of the terminal 100 is described next.
The software system of the terminal 100 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. The embodiments of this application take the Android system with a layered architecture as an example to describe the software system of the terminal 100.
FIG. 2 is a block diagram of a software system of the terminal 100 according to an embodiment of this application. Referring to FIG. 2, the layered architecture divides the software into several layers, each with a clear role and division of labor, and the layers communicate with each other through software interfaces. In some embodiments, the Android system is divided, from top to bottom, into an application layer, an application framework layer, the Android runtime and system layer, and a kernel layer.
The application layer may include a series of application packages. As shown in FIG. 2, the application packages may include applications such as Camera, Gallery, Calendar, Phone, Maps, Navigation, WLAN, Bluetooth, Music, Video, and Messages.
The application framework layer provides an application programming interface (API) and a programming framework for the applications in the application layer, and includes some predefined functions. As shown in FIG. 2, the application framework layer may include a window manager, a content provider, a view system, a telephony manager, a resource manager, a notification manager, and the like. The window manager is used to manage window programs; it can obtain the display size, determine whether there is a status bar, lock the screen, capture the screen, and so on. The content provider is used to store and retrieve data and make the data accessible to applications; the data may include videos, images, audio, calls made and received, browsing history and bookmarks, the phone book, and so on. The view system includes visual controls, such as controls for displaying text and controls for displaying pictures, and can be used to build the display interface of an application; a display interface may consist of one or more views, for example a view displaying an SMS notification icon, a view displaying text, and a view displaying a picture. The telephony manager provides the communication functions of the terminal 100, such as management of the call state (connected, hung up, and so on). The resource manager provides applications with various resources, such as localized strings, icons, pictures, layout files, and video files. The notification manager enables applications to display notification information in the status bar; it can be used to convey informational messages that disappear automatically after a short stay without user interaction, for example to announce that a download is complete or to provide message reminders. The notification manager may also present notifications in the status bar at the top of the system in the form of charts or scrolling text, such as notifications of applications running in the background, or present notifications on the screen in the form of dialog windows, for example prompting text information in the status bar, playing a prompt tone, vibrating the electronic device, or blinking the indicator light.
The Android runtime includes core libraries and a virtual machine, and is responsible for the scheduling and management of the Android system. The core libraries consist of two parts: the function functions that the Java language needs to call, and the core libraries of Android. The application layer and the application framework layer run in the virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files, and performs functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection.
The system layer may include multiple functional modules, such as a surface manager, media libraries, a three-dimensional graphics processing library (for example, OpenGL ES), and a two-dimensional graphics engine (for example, SGL). The surface manager manages the display subsystem and provides fusion of 2D and 3D layers for multiple applications. The media libraries support playback and recording of many common audio and video formats, as well as static image files, and may support multiple audio and video coding formats such as MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG. The three-dimensional graphics processing library is used for three-dimensional graphics drawing, image rendering, compositing, layer processing, and so on. The two-dimensional graphics engine is a drawing engine for 2D drawing.
The kernel layer is the layer between hardware and software, and contains at least a display driver, a camera driver, an audio driver, and a sensor driver.
The following exemplarily describes the working flow of the software and hardware of the terminal 100 with reference to a photographing scenario.
When the touch sensor 180K receives a touch operation, a corresponding hardware interrupt is sent to the kernel layer. The kernel layer processes the touch operation into an original input event (including information such as the touch coordinates and the timestamp of the touch operation), and the original input event is stored in the kernel layer. The application framework layer obtains the original input event from the kernel layer and identifies the control corresponding to the event. Taking the touch operation being a tap on the control of the Camera application icon as an example, the Camera application calls the interface of the application framework layer to start the Camera application, then calls the kernel layer to start the camera driver, and captures a static image or a video through the camera 193.
The application scenario of the image processing method provided in the embodiments of this application is described below.
With the rapid development of terminal technologies, terminals such as mobile phones and tablet computers have become increasingly powerful and have gradually become indispensable tools in people's work and life. A terminal is usually provided with a camera to implement a shooting function, and in order to improve the display effect of a captured image, the terminal often processes the captured image to improve image quality and achieve the visual experience expected by the user.
When a user shoots with a terminal, the subject of the shot may be targets of multiple different types, such as portraits and buildings. Different types of targets have different visual priorities; for example, a portrait may require higher clarity, while for a building prominent lines are desired. For this reason, the embodiments of this application provide an image processing method that can differentially process the various types of targets in the subject part of a captured image, thereby improving the overall quality of the image and bringing a better visual experience to the user.
The image processing method provided in the embodiments of this application is explained in detail below.
FIG. 3 is a flowchart of an image processing method according to an embodiment of this application. Referring to FIG. 3, the method includes the following steps.
Step 301: The terminal acquires a first image to be processed.
The first image is an image that needs to be processed to improve its display effect. The first image may be an image captured by the terminal; for example, the first image may be the image shown in FIG. 4.
For example, after the terminal captures an image with its own camera, it may take this image as the first image to be processed and then perform the subsequent steps to process the image.
Step 302: The terminal performs salient-target detection on the first image to obtain a salient region of the first image.
Salient-target detection is target detection based on visual saliency, that is, target detection that simulates the characteristics of human vision. Its purpose is to identify the subject of an image and highlight the most salient targets in the image (which may be called salient targets). A salient target is the target in the image that the user is most likely to be interested in and most likely to notice.
After the terminal performs salient-target detection on the first image, the salient region of the first image is obtained. The salient region is the main subject of the first image, and the targets contained in the salient region are the targets that the user is most likely to be interested in and to notice. In this embodiment of this application, the salient region may contain n targets, where n is a positive integer.
For example, if the first image is the image shown in FIG. 4, after the terminal performs salient-target detection on it, a salient-target detection result such as that shown in (a) of FIG. 5 is obtained. The detection result may be a mask image. The white part of the mask indicates the location of the salient region in the first image, and the black part indicates the location of the background region of the first image other than the salient region. Using the mask, the background region of the first image can be masked out and the salient region retained, yielding the image shown in (b) of FIG. 5.
Specifically, the mask may be an image whose pixel values are 0 and 255. As shown in (a) of FIG. 5, the white part of the mask consists of pixels with value 255, and the black part consists of pixels with value 0. In this case, each pixel value of the mask is ANDed with the pixel value at the corresponding position of the first image: when a pixel value in the mask is 0, the pixel value at the corresponding position is set to 0; when a pixel value in the mask is not 0 (that is, 255), the pixel value at the corresponding position of the first image is retained. After this AND operation, the pixel values of the salient region of the first image are unchanged and the pixel values of the background region are all 0, yielding the image shown in (b) of FIG. 5, which is an image segmented from the first image in which the background region is masked out (black) and the salient region is retained.
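A minimal sketch of this masking step, assuming the saliency result is a single-channel mask with values 0 and 255; the file names are placeholders.

```python
import cv2

first_image = cv2.imread("first_image.jpg")                    # H x W x 3, BGR
mask = cv2.imread("saliency_mask.png", cv2.IMREAD_GRAYSCALE)   # H x W, values 0 or 255

# Keep pixels where the mask is 255 and set background pixels to 0,
# which corresponds to the AND operation described above.
salient_only = cv2.bitwise_and(first_image, first_image, mask=mask)
cv2.imwrite("salient_region.png", salient_only)
```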
The operation in which the terminal performs salient-target detection on the first image is similar to the salient-target detection performed on an image by a terminal in the related art, and is not described in detail in this embodiment of this application.
For example, the terminal may perform salient-target detection on the first image using spatial-domain saliency detection algorithms (including but not limited to the Itti algorithm, the context-aware (CA) algorithm, and the like) or frequency-domain saliency detection algorithms (including but not limited to the spectral residual (SR) algorithm, the frequency-tuned (FT) algorithm, and the like).
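As one concrete example of a frequency-domain approach, OpenCV's contrib package ships an implementation of the spectral residual (SR) algorithm mentioned above. The sketch below assumes opencv-contrib-python is installed and is only one possible way to obtain a binary saliency mask; it is not asserted to be the specific detector used in the application.

```python
import cv2

image = cv2.imread("first_image.jpg")

# Spectral residual saliency (requires the opencv-contrib "saliency" module).
detector = cv2.saliency.StaticSaliencySpectralResidual_create()
ok, saliency_map = detector.computeSaliency(image)   # float map, roughly in [0, 1]

# Binarize the map into a 0/255 mask indicating the salient region.
saliency_u8 = (saliency_map * 255).astype("uint8")
_, mask = cv2.threshold(saliency_u8, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
```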
Step 303: The terminal performs target recognition on the salient region of the first image to obtain the type of each of the n targets contained in the salient region.
Target recognition is used to identify the type of a target. The salient region may contain one target (that is, n is 1), in which case the type of this one target is identified; or the salient region may contain multiple targets (that is, n is an integer greater than or equal to 2), in which case the type of each of the multiple targets is identified.
Because the salient region may contain multiple targets, when the terminal performs target recognition on the salient region of the first image, it may first perform image segmentation on the salient region to obtain n target regions in the salient region, each of which contains one target, and then perform target recognition on each of the n target regions to obtain the type of the target contained in each target region. In this way, the type of each of the n targets in the salient region can be obtained accurately.
For example, if the salient region is as shown in (b) of FIG. 5 and contains one target, the terminal can directly recognize this target in the salient region and obtain that its type is portrait.
For another example, if the salient region is as shown in FIG. 6 and contains two targets, the terminal can segment the salient region into two target regions and then perform target recognition on each of the two target regions, obtaining that the type of the target in one target region is animal and the type of the target in the other target region is portrait.
It should be noted that in the embodiments of this application the type of a target may include not only broad categories such as portrait, animal, plant, and building, but also subcategories under each broad category; for example, portraits may be subdivided into subcategories by complexion, and animals may include subcategories such as cat and dog. In this way, the n targets in the salient region can be distinguished more precisely.
The operation in which the terminal performs target recognition on the salient region of the first image is similar to target recognition performed on an image by a terminal in the related art, and is not described in detail in this embodiment of this application.
For example, the terminal may input the salient region of the first image into a classification model, and the classification model outputs the position of each of the n target regions in the salient region and the type of the target contained in each target region. The position of each target region is the position of the target contained in that target region.
The classification model is used to identify the types of the targets contained in an image. That is, after an image is input into the classification model, the model can identify and output the positions and types of the targets contained in the image. In this case, if the input image contains multiple targets, the classification model can directly perform image segmentation and target recognition; that is, the model can directly segment multiple target regions from the input image, perform target recognition on each target region, and then output the position of each target region and the type of the target it contains.
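The application does not fix a particular network architecture, so the following is only a hedged sketch of how an off-the-shelf detector could play the role described above (locating target regions and predicting a type for each); torchvision's pretrained Faster R-CNN is used here purely as a stand-in, not as the model of the application.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

salient_region = Image.open("salient_region.png").convert("RGB")
with torch.no_grad():
    pred = model([to_tensor(salient_region)])[0]

# Each detection gives a box (the target region's position), a class label
# (the target's type) and a confidence score.
for box, label, score in zip(pred["boxes"], pred["labels"], pred["scores"]):
    if score > 0.5:
        print(box.tolist(), int(label), float(score))
```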
The classification model may be trained by the terminal, or may be trained by a server and then sent to the terminal; this is not limited in the embodiments of this application. Optionally, when training the classification model, the terminal or the server may obtain multiple training samples and use them to train a neural network model to obtain the classification model.
The multiple training samples may be preset. Each of the multiple training samples includes a sample image and a sample label, where the sample image contains a specified target and the sample label is the type of the specified target contained in the sample image. That is, the input data of each training sample is a sample image containing a specified target, and the sample label is the type of the specified target.
The neural network model may include multiple network layers, including an input layer, multiple hidden layers, and an output layer. The input layer receives input data; the output layer outputs processed data; and the multiple hidden layers, which lie between the input layer and the output layer and are invisible to the outside, process the data. For example, the neural network model may be a deep neural network, such as a convolutional neural network.
Optionally, when the neural network model is trained with the multiple training samples, for each of the multiple training samples, the input data of the sample may be input into the neural network model to obtain output data; a loss value between the output data and the sample label of the sample is determined through a loss function; and the parameters of the neural network model are adjusted according to the loss value. After the parameters of the neural network model have been adjusted based on each of the multiple training samples, the neural network model with the adjusted parameters is the classification model.
The operation of adjusting the parameters of the neural network model according to the loss value may refer to the related art and is not described in detail in this embodiment of this application. For example, any parameter of the neural network model may be adjusted by the formula

w* = w − α · dw

where w* is the adjusted parameter, w is the parameter before adjustment, α is the learning rate (α may be preset, for example to 0.001 or 0.000001, which is not uniquely limited in the embodiments of this application), and dw is the partial derivative of the loss function with respect to w, which can be obtained from the loss value.
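A minimal PyTorch sketch of this training procedure (forward pass, loss, and the update w* = w − α·dw, which is what SGD with learning rate α performs); the network, data, and hyperparameters are placeholders, not the ones used by the application.

```python
import torch
import torch.nn as nn

model = nn.Sequential(                       # placeholder classifier
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 4),                        # e.g. portrait / animal / plant / building
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)   # alpha = 0.001

def train_step(sample_image: torch.Tensor, sample_label: torch.Tensor) -> float:
    optimizer.zero_grad()
    output = model(sample_image)             # output data
    loss = loss_fn(output, sample_label)     # loss between output and sample label
    loss.backward()                          # dw = d(loss)/dw for every parameter w
    optimizer.step()                         # w <- w - alpha * dw
    return loss.item()
```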
In some embodiments, after obtaining the type of each of the n targets contained in the salient region, the terminal may further obtain the proportion of each of the n targets in the first image, that is, the ratio of the size of each target (the number of pixels of the target) to the overall size of the first image (the total number of pixels of the first image), so that the proportion of each target can later be taken into account when determining its image processing strategy.
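A hedged sketch of this proportion computation (pixel count of a target's mask over the total pixel count of the first image); the mask is assumed to come from the segmentation step above.

```python
import numpy as np

def target_proportion(target_mask: np.ndarray) -> float:
    """target_mask: H x W array, non-zero where the target's pixels are."""
    return float(np.count_nonzero(target_mask)) / target_mask.size
```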
Step 304: The terminal obtains the image processing strategy of each of the n targets according to the type of each target.
The image processing strategy of a target is a strategy capable of improving the display effect of the target. That is, for a given target, after the target is processed according to its image processing strategy, the user will have a better visual experience when viewing the target.
The image processing strategy may include one or more image processing operations such as denoising, sharpening, and color, and may further include the intensity of each image processing operation.
Denoising is the process of reducing noise in an image. A denoising algorithm or a neural network model may be used to denoise an image. Different filter operators in a denoising algorithm have different denoising intensities; that is, different filter operators can be used to adjust the denoising intensity. Alternatively, different neural network models can be used to achieve different denoising intensities. For example, in descending order of denoising intensity, denoising may include strong denoising, normal denoising, and weak denoising.
Sharpening (also called edge enhancement) is the process of compensating for the contours of an image and enhancing its edges and the parts with gray-level jumps to make the image clearer. A sharpening algorithm can be used to sharpen an image. Different filter operators in the sharpening algorithm have different sharpening intensities; that is, different filter operators can be used to adjust the sharpening intensity. For example, in descending order of sharpening intensity, sharpening may include strong sharpening, normal sharpening, and weak sharpening.
Color processing refers to performing color correction and color enhancement on an image. A neural network model can be used for color processing, and different neural network models can be used to achieve different color intensities. For example, in descending order of color intensity, color processing may include strong color, normal color, and weak color.
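As a hedged illustration of operations with selectable intensity, the sketch below maps three denoising and sharpening levels to concrete OpenCV parameters; the specific filter strengths are assumptions made for illustration, not values taken from the application.

```python
import cv2
import numpy as np

DENOISE_H = {"weak": 3, "normal": 7, "strong": 15}            # assumed strengths
SHARPEN_AMOUNT = {"weak": 0.3, "normal": 0.7, "strong": 1.2}  # assumed strengths

def denoise(img: np.ndarray, level: str) -> np.ndarray:
    h = DENOISE_H[level]
    return cv2.fastNlMeansDenoisingColored(img, None, h, h, 7, 21)

def sharpen(img: np.ndarray, level: str) -> np.ndarray:
    # Unsharp masking: img + amount * (img - blurred)
    blurred = cv2.GaussianBlur(img, (0, 0), sigmaX=2.0)
    amount = SHARPEN_AMOUNT[level]
    return cv2.addWeighted(img, 1.0 + amount, blurred, -amount, 0)
```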
Optionally, the operation in which the terminal obtains the image processing strategy of each of the n targets according to the type of each target may include the following two possible implementations:
First possible implementation: for each of the n targets, the terminal obtains, according to the type of the target, the corresponding image processing strategy from a correspondence between target types and image processing strategies as the image processing strategy of the target.
A correspondence between target types and image processing strategies may be stored in the terminal in advance. The correspondence includes multiple target types and multiple image processing strategies in one-to-one correspondence with them, and the strategy corresponding to a target type is the image processing strategy for targets of that type. The correspondence may be set by technical personnel according to visual-experience requirements. For example, in the user's visual experience, a portrait may require higher clarity and weaker denoising, so the strategy corresponding to the portrait type can be set accordingly; for a building, prominent lines and stronger sharpening may be desired, so the strategy corresponding to the building type can be set accordingly. In this way, the strategy obtained from the correspondence better matches the user's visual-experience requirements.
For example, if the type of one of the n targets is portrait, the terminal can obtain, from the correspondence between target types and image processing strategies shown in Table 1 below, the corresponding image processing strategy of weak denoising, weak sharpening, and normal color, and use this strategy as the image processing strategy of the target.
Table 1
[Table 1: correspondence between target types and image processing strategies. In the published document the table is embedded as an image and its full contents are not reproduced here.]
In the embodiments of this application, Table 1 above is merely an example of the correspondence between target types and image processing strategies, and does not limit the embodiments of this application.
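A minimal sketch of this table-driven lookup. The portrait entry mirrors the example given in the text; the other rows are invented placeholders rather than the actual contents of Table 1.

```python
# Hypothetical correspondence between target types and strategies (not Table 1 itself).
STRATEGY_BY_TYPE = {
    "portrait": {"denoise": "weak", "sharpen": "weak", "color": "normal"},
    "building": {"denoise": "normal", "sharpen": "strong", "color": "normal"},
    "animal":   {"denoise": "normal", "sharpen": "normal", "color": "strong"},
}

def strategy_for(target_type: str) -> dict:
    return STRATEGY_BY_TYPE[target_type]
```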
Second possible implementation: the terminal obtains the image processing strategy of each of the n targets according to the type and the proportion of each target.
Targets of different sizes may require different image processing strategies to improve their display effect. For example, portraits of different sizes may require different strategies: as shown in (a) of FIG. 7, for a relatively large portrait both clarity and color can be refined, whereas as shown in (b) of FIG. 7, for a relatively small portrait it is sufficient to improve its clarity. Therefore, the image processing strategy of a target can be obtained according to its type and its proportion in the image, so that the obtained strategy can improve the display effect of the target more precisely.
Optionally, for each of the n targets, the terminal may obtain, according to the type and the proportion of the target, the corresponding image processing strategy from a correspondence among target types, target proportion ranges, and image processing strategies as the image processing strategy of the target.
A correspondence among target types, target proportion ranges, and image processing strategies may be stored in the terminal in advance. The correspondence includes multiple target types, multiple target proportion ranges, and multiple image processing strategies in one-to-one correspondence, and the strategy corresponding to a target type and a proportion range is the strategy used when the proportion of a target of that type falls within that proportion range. The correspondence may be set by technical personnel according to visual-experience requirements.
For example, if the type of one of the n targets is portrait and the proportion of the target is 50%, the terminal can obtain, from the correspondence among target types, target proportion ranges, and image processing strategies shown in Table 2 below, the corresponding image processing strategy of weak denoising, weak sharpening, and normal color, and use this strategy as the image processing strategy of the target.
Table 2
[Table 2: correspondence among target types, target proportion ranges, and image processing strategies. In the published document the table is embedded as an image and its full contents are not reproduced here.]
In the embodiments of this application, Table 2 above is merely an example of the correspondence among target types, target proportion ranges, and image processing strategies, and does not limit the embodiments of this application.
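The second implementation can be sketched in the same way, with the strategy keyed by the target type and a proportion range. The ranges and strategies below are placeholders, since the actual Table 2 is not reproduced in this text; only the 50% portrait example is taken from the description above.

```python
# (type, lower bound, upper bound of the proportion range) -> strategy; values are placeholders.
STRATEGY_BY_TYPE_AND_RATIO = [
    ("portrait", 0.3, 1.0, {"denoise": "weak", "sharpen": "weak", "color": "normal"}),
    ("portrait", 0.0, 0.3, {"denoise": "weak", "sharpen": "normal", "color": "weak"}),
    ("building", 0.0, 1.0, {"denoise": "normal", "sharpen": "strong", "color": "normal"}),
]

def strategy_for(target_type: str, proportion: float) -> dict:
    for t, lo, hi, strategy in STRATEGY_BY_TYPE_AND_RATIO:
        if t == target_type and lo <= proportion <= hi:
            return strategy
    raise KeyError((target_type, proportion))
```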
Step 305: The terminal processes the first image according to the image processing strategy of each of the n targets to obtain a second image.
For each of the n targets, the terminal processes that target in the first image according to the target's image processing strategy. This completes the processing of the n targets in the salient region of the first image, that is, the processing of the main subject of the first image. Since the user generally pays most attention to the salient region of the first image, processing the n targets in the salient region leads to a very noticeable improvement in the overall quality of the first image, which effectively improves the user's visual experience.
Optionally, for each of the n targets, the terminal processes the target according to its image processing strategy to obtain the processed target, and then fuses the processed n targets with the background region of the first image to obtain the second image.
In this case, the background region of the second image is the same as that of the first image, and the salient region of the second image is obtained by processing the n targets in the salient region of the first image.
When the terminal processes a target according to its image processing strategy, it may process all pixel values of the first image according to the strategy, then segment the target from the processed first image, and fuse the segmented target with the unprocessed background region of the first image. For example, for the first image shown in FIG. 4, whose salient region is shown in (b) of FIG. 5 and contains one target, the terminal may process all pixel values of the first image shown in FIG. 4 according to the image processing strategy of this target, then segment this target from the processed first image, and fuse the segmented target with the unprocessed background region of the first image.
Alternatively, when the terminal processes a target according to its image processing strategy, it may process all pixel values of the target region where the target is located, which was segmented during salient-target detection, and after the processing is complete, fuse the processed target region with the background region of the first image. For example, for the first image shown in FIG. 4, whose salient region is shown in (b) of FIG. 5 and contains one target, the salient region is the target region of this target segmented during salient-target detection; the terminal may process all pixel values of the target region shown in (b) of FIG. 5 according to the image processing strategy of this target, and then fuse the processed target region with the background region of the first image.
When the terminal fuses the processed n targets with the background region of the first image, it may perform the fusion according to the positions of the n targets. This fusion process is similar to the operation in the related art in which a foreground image is fused with a background image according to the position of the foreground image, and is not described in detail in this embodiment of this application.
For example, for each position in the first image, if the position belongs to the background region of the first image, the pixel value at this position of the background region is used as the pixel value at the corresponding position of the second image; if the position belongs to the salient region of the first image, the processed pixel value of the target at this position of the salient region is used as the pixel value at the corresponding position of the second image. In this way, the background region of the first image is kept unchanged and the salient region of the first image is processed to obtain the second image.
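A hedged sketch of this per-pixel fusion rule: background pixels are copied from the first image and salient-region pixels are taken from the processed targets. Variable names are placeholders.

```python
import numpy as np

def fuse(first_image: np.ndarray, processed_salient: np.ndarray,
         salient_mask: np.ndarray) -> np.ndarray:
    """salient_mask: H x W, non-zero inside the salient region."""
    mask3 = np.repeat((salient_mask > 0)[:, :, None], 3, axis=2)
    # Keep the background of the first image; take processed pixels in the salient region.
    return np.where(mask3, processed_salient, first_image)
```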
In this embodiment of this application, after acquiring the first image to be processed, the terminal performs salient-target detection on the first image to obtain the salient region of the first image. It then performs target recognition on the salient region to obtain the type of each of the n targets contained in the salient region, and obtains the image processing strategy of each target according to its type. Finally, the terminal processes the first image according to the image processing strategy of each of the n targets to obtain the second image. In this image processing procedure, the image processing strategy of a target can be obtained according to the type of the target, and the targets in the salient region of the first image can then be processed differentially according to the strategies. This improves the display effect of the main subject of the image, thereby improving the overall image quality and effectively improving the user's visual experience.
FIG. 8 is a schematic structural diagram of an image processing apparatus according to an embodiment of this application. The apparatus may be implemented by software, hardware, or a combination of both as part or all of a computer device, and the computer device may be the terminal described in the embodiments of FIG. 1 and FIG. 2. Referring to FIG. 8, the apparatus includes a first acquisition module 801, a detection module 802, a recognition module 803, a second acquisition module 804, and a processing module 805.
The first acquisition module 801 is configured to acquire a first image to be processed;
the detection module 802 is configured to perform salient-target detection on the first image to obtain a salient region of the first image;
the recognition module 803 is configured to perform target recognition on the salient region to obtain the type of each of n targets contained in the salient region, where n is a positive integer;
the second acquisition module 804 is configured to obtain an image processing strategy of each target according to the type of each of the n targets; and
the processing module 805 is configured to process the first image according to the image processing strategy of each of the n targets to obtain a second image.
Optionally, the recognition module 803 is configured to:
perform image segmentation on the salient region to obtain n target regions in the salient region; and
perform target recognition on each of the n target regions to obtain the type of the target contained in each target region.
Optionally, the second acquisition module 804 is configured to:
for each of the n targets, obtain, according to the type of the target, the corresponding image processing strategy from a correspondence between target types and image processing strategies as the image processing strategy of the target.
Optionally, the apparatus further includes:
a third acquisition module, configured to obtain the proportion of each of the n targets in the first image.
The second acquisition module 804 is configured to:
obtain the image processing strategy of each target according to the type and the proportion of each of the n targets.
Optionally, the second acquisition module 804 is configured to:
for each of the n targets, obtain, according to the type and the proportion of the target, the corresponding image processing strategy from a correspondence among target types, target proportion ranges, and image processing strategies as the image processing strategy of the target.
Optionally, the processing module 805 is configured to:
for each of the n targets, process the target according to the image processing strategy of the target to obtain the processed target; and
fuse the processed n targets with a background region of the first image to obtain the second image, where the background region is the region of the first image other than the salient region.
Optionally, the image processing strategy includes at least one image processing operation and the intensity of each of the at least one image processing operation, and the at least one image processing operation includes one or more of denoising, sharpening, and color.
In this embodiment of this application, after the first image to be processed is acquired, salient-target detection is performed on the first image to obtain the salient region of the first image. Target recognition is then performed on the salient region to obtain the type of each of the n targets contained in the salient region, and the image processing strategy of each target is obtained according to its type. Finally, the first image is processed according to the image processing strategy of each of the n targets to obtain the second image. In this image processing procedure, the image processing strategy of a target can be obtained according to its type, and the targets in the salient region of the first image can then be processed differentially according to the strategies, which improves the display effect of the main subject of the image, improves the overall image quality, and effectively improves the user's visual experience.
It should be noted that when the image processing apparatus provided in the above embodiment processes an image, the division into the functional modules described above is only an example. In practice, the functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above.
The functional units and modules in the above embodiments may be integrated into one processing unit, each unit may exist physically separately, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only intended to distinguish them from one another and do not limit the protection scope of the embodiments of this application.
The image processing apparatus provided in the above embodiment and the embodiments of the image processing method belong to the same concept. For the specific working processes of the units and modules in the above embodiment and the technical effects they bring, refer to the method embodiments; details are not repeated here.
All or part of the above embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used, the implementation may take the form of a computer program product, in whole or in part. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions according to the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (such as a floppy disk, hard disk, or magnetic tape), an optical medium (such as a digital versatile disc (DVD)), or a semiconductor medium (such as a solid state disk (SSD)).
The above are optional embodiments provided by this application and are not intended to limit this application. Any modification, equivalent replacement, improvement, and the like made within the technical scope disclosed in this application shall fall within the protection scope of this application.

Claims (11)

  1. An image processing method, characterized in that the method comprises:
    acquiring a first image to be processed;
    performing salient-target detection on the first image to obtain a salient region of the first image;
    performing target recognition on the salient region to obtain a type of each of n targets contained in the salient region, wherein n is a positive integer;
    obtaining an image processing strategy of each target according to the type of each of the n targets; and
    processing the first image according to the image processing strategy of each of the n targets to obtain a second image.
  2. The method according to claim 1, characterized in that the performing target recognition on the salient region to obtain the type of each of the n targets contained in the salient region comprises:
    performing image segmentation on the salient region to obtain n target regions in the salient region; and
    performing target recognition on each of the n target regions to obtain the type of the target contained in each target region.
  3. The method according to claim 1 or 2, characterized in that the obtaining the image processing strategy of each target according to the type of each of the n targets comprises:
    for each of the n targets, obtaining, according to the type of the target, a corresponding image processing strategy from a correspondence between target types and image processing strategies as the image processing strategy of the target.
  4. The method according to claim 1 or 2, characterized in that after the performing target recognition on the salient region to obtain the type of each of the n targets contained in the salient region, the method further comprises:
    obtaining a proportion of each of the n targets in the first image;
    wherein the obtaining the image processing strategy of each target according to the type of each of the n targets comprises:
    obtaining the image processing strategy of each target according to the type and the proportion of each of the n targets.
  5. The method according to claim 4, characterized in that the obtaining the image processing strategy of each target according to the type and the proportion of each of the n targets comprises:
    for each of the n targets, obtaining, according to the type and the proportion of the target, a corresponding image processing strategy from a correspondence among target types, target proportion ranges, and image processing strategies as the image processing strategy of the target.
  6. The method according to any one of claims 1 to 5, characterized in that the processing the first image according to the image processing strategy of each of the n targets to obtain the second image comprises:
    for each of the n targets, processing the target according to the image processing strategy of the target to obtain the processed target; and
    fusing the processed n targets with a background region of the first image to obtain the second image, wherein the background region is a region of the first image other than the salient region.
  7. The method according to any one of claims 1 to 6, characterized in that the image processing strategy comprises at least one image processing operation and an intensity of each of the at least one image processing operation, and the at least one image processing operation comprises one or more of denoising, sharpening, and color.
  8. An image processing apparatus, characterized in that the apparatus comprises:
    a first acquisition module, configured to acquire a first image to be processed;
    a detection module, configured to perform salient-target detection on the first image to obtain a salient region of the first image;
    a recognition module, configured to perform target recognition on the salient region to obtain a type of each of n targets contained in the salient region, wherein n is a positive integer;
    a second acquisition module, configured to obtain an image processing strategy of each target according to the type of each of the n targets; and
    a processing module, configured to process the first image according to the image processing strategy of each of the n targets to obtain a second image.
  9. A computer device, characterized in that the computer device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the method according to any one of claims 1 to 7.
  10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 7.
  11. A computer program product containing instructions, characterized in that, when the computer program product runs on a computer, the computer is caused to perform the method according to any one of claims 1 to 7.
PCT/CN2022/141744 2022-01-07 2022-12-26 图像处理方法、装置、设备、存储介质和程序产品 WO2023130990A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210017755.7A CN116468882B (zh) 2022-01-07 2022-01-07 图像处理方法、装置、设备、存储介质
CN202210017755.7 2022-01-07

Publications (1)

Publication Number Publication Date
WO2023130990A1 true WO2023130990A1 (zh) 2023-07-13

Family

ID=87073115

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/141744 WO2023130990A1 (zh) 2022-01-07 2022-12-26 图像处理方法、装置、设备、存储介质和程序产品

Country Status (2)

Country Link
CN (1) CN116468882B (zh)
WO (1) WO2023130990A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117351004A (zh) * 2023-11-29 2024-01-05 杭州天眼智联科技有限公司 再生物料识别方法、装置、电子设备和计算机可读介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107194869A (zh) * 2017-05-23 2017-09-22 腾讯科技(上海)有限公司 一种图像处理方法及终端、计算机存储介质、计算机设备
CN109451235A (zh) * 2018-10-29 2019-03-08 维沃移动通信有限公司 一种图像处理方法及移动终端
CN113159026A (zh) * 2021-03-31 2021-07-23 北京百度网讯科技有限公司 图像处理方法、装置、电子设备和介质
WO2021208709A1 (zh) * 2020-04-13 2021-10-21 北京字节跳动网络技术有限公司 图像处理方法、装置、电子设备及计算机可读存储介质

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108960290A (zh) * 2018-06-08 2018-12-07 Oppo广东移动通信有限公司 图像处理方法、装置、计算机可读存储介质和电子设备
CN108764370B (zh) * 2018-06-08 2021-03-12 Oppo广东移动通信有限公司 图像处理方法、装置、计算机可读存储介质和计算机设备
CN109379625B (zh) * 2018-11-27 2020-05-19 Oppo广东移动通信有限公司 视频处理方法、装置、电子设备和计算机可读介质

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107194869A (zh) * 2017-05-23 2017-09-22 腾讯科技(上海)有限公司 一种图像处理方法及终端、计算机存储介质、计算机设备
CN109451235A (zh) * 2018-10-29 2019-03-08 维沃移动通信有限公司 一种图像处理方法及移动终端
WO2021208709A1 (zh) * 2020-04-13 2021-10-21 北京字节跳动网络技术有限公司 图像处理方法、装置、电子设备及计算机可读存储介质
CN113159026A (zh) * 2021-03-31 2021-07-23 北京百度网讯科技有限公司 图像处理方法、装置、电子设备和介质

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117351004A (zh) * 2023-11-29 2024-01-05 杭州天眼智联科技有限公司 再生物料识别方法、装置、电子设备和计算机可读介质
CN117351004B (zh) * 2023-11-29 2024-02-20 杭州天眼智联科技有限公司 再生物料识别方法、装置、电子设备和计算机可读介质

Also Published As

Publication number Publication date
CN116468882A (zh) 2023-07-21
CN116468882B (zh) 2024-03-15

Similar Documents

Publication Publication Date Title
US10958850B2 (en) Electronic device and method for capturing image by using display
US11914850B2 (en) User profile picture generation method and electronic device
US10181203B2 (en) Method for processing image data and apparatus for the same
WO2021115091A1 (zh) 一种文本识别方法及装置
WO2021013132A1 (zh) 输入方法及电子设备
US11861382B2 (en) Application starting method and apparatus, and electronic device
WO2019105457A1 (zh) 图像处理方法、计算机设备和计算机可读存储介质
WO2020107463A1 (zh) 一种电子设备的控制方法及电子设备
US20230269324A1 (en) Display method applied to electronic device, graphical user interface, and electronic device
WO2023130990A1 (zh) 图像处理方法、装置、设备、存储介质和程序产品
CN114463191A (zh) 一种图像处理方法及电子设备
WO2023029913A1 (zh) 消息提示方法和电子设备
CN108062405B (zh) 图片分类方法、装置、存储介质及电子设备
CN115379208A (zh) 一种摄像头的测评方法及设备
CN115964231A (zh) 基于负载模型的评估方法和装置
CN114527903A (zh) 一种按键映射方法、电子设备及系统
CN117036206B (zh) 一种确定图像锯齿化程度的方法及相关电子设备
WO2023280021A1 (zh) 一种生成主题壁纸的方法及电子设备
CN113986406B (zh) 生成涂鸦图案的方法、装置、电子设备及存储介质
US20240046504A1 (en) Image processing method and electronic device
CN115442517B (zh) 图像处理方法、电子设备及计算机可读存储介质
CN116700554B (zh) 信息的显示方法、电子设备及可读存储介质
WO2022247664A1 (zh) 图形界面显示方法、电子设备、介质以及程序产品
WO2024036998A1 (zh) 显示方法、存储介质及电子设备
WO2022206645A1 (zh) 一种滚动截屏的方法及装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22918433

Country of ref document: EP

Kind code of ref document: A1