CN112418233A - Image processing method, image processing device, readable medium and electronic equipment - Google Patents

Info

Publication number
CN112418233A
Authority
CN
China
Prior art keywords
image
sub
processed
loss function
network
Legal status
Pending
Application number
CN202011298862.9A
Other languages
Chinese (zh)
Inventor
王光伟
Current Assignee
Beijing Zitiao Network Technology Co Ltd
Original Assignee
Beijing Zitiao Network Technology Co Ltd
Application filed by Beijing Zitiao Network Technology Co Ltd
Priority to CN202011298862.9A
Publication of CN112418233A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/90 Determination of colour characteristics

Abstract

The present disclosure relates to an image processing method and apparatus, a readable medium, and an electronic device. The method includes acquiring an image to be processed. It further includes inputting that image into a pre-trained image processing model to obtain depth information and normal information for each sub-image produced by semantically segmenting the image. When the image later needs further processing, the depth information and normal information of the relevant segmented sub-images can therefore be used directly. In addition, the normal information of the pixels within each semantically segmented sub-image is related through a mapping relation. Acquiring the normal information together with the semantic segmentation can therefore improve the accuracy of the normal information to a certain extent.

Description

Image processing method, image processing device, readable medium and electronic equipment
Technical Field
The present disclosure relates to the field of computers, and in particular, to an image processing method, an image processing apparatus, a readable medium, and an electronic device.
Background
In the related art, the different features of an image are generally analyzed and extracted separately. The accuracy of each extracted feature depends entirely on the method used to analyze the image. For example, when a neural network is used to estimate the depth features of an image, the accuracy of those depth features is determined by the accuracy of the trained network.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In a first aspect, the present disclosure provides a method of image processing, the method comprising:
acquiring an image to be processed;
and inputting the image to be processed into a pre-trained image processing model to obtain depth information and normal information for each sub-image obtained by semantic segmentation of the image to be processed.
In a second aspect, the present disclosure also provides an image processing apparatus, the apparatus comprising:
an acquisition module, configured to acquire an image to be processed;
and a processing module, configured to input the image to be processed into a pre-trained image processing model to obtain depth information and normal information for each sub-image obtained by semantic segmentation of the image to be processed.
In a third aspect, the present disclosure also provides a computer readable medium having stored thereon a computer program which, when executed by a processing apparatus, performs the steps of the method of the first aspect.
In a fourth aspect, the present disclosure also provides an electronic device, including:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage device to implement the steps of the method of the first aspect.
With the above technical solution, once the image to be processed is acquired, the image processing model directly provides the depth information and normal information of each sub-image into which the image is segmented. Subsequent processing of the image can therefore use the depth information and normal information of the corresponding segmented sub-images directly. In addition, the normal information of the pixels within each semantically segmented sub-image is related through a mapping relation. Acquiring the normal information together with the semantic segmentation can therefore improve the accuracy of the normal information to a certain extent.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale. In the drawings:
Fig. 1 is a flowchart illustrating an image processing method according to an exemplary embodiment of the present disclosure.
Fig. 2 is a schematic block diagram illustrating an image processing model in an image processing method according to an exemplary embodiment of the present disclosure.
Fig. 3 is a flowchart illustrating a method of training an image processing model in an image processing method according to an exemplary embodiment of the present disclosure.
Fig. 4 is a block diagram illustrating a configuration of an image processing apparatus according to an exemplary embodiment of the present disclosure.
Fig. 5 is a schematic diagram of an electronic device suitable for implementing embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It should be noted that the modifiers "a", "an", and "the" used in this disclosure are illustrative rather than limiting. Those skilled in the art will understand that they should be read as "one or more" unless the context clearly indicates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
Fig. 1 is a flowchart illustrating an image processing method according to an exemplary embodiment of the present disclosure. As shown in Fig. 1, the method includes steps 101 and 102.
In step 101, an image to be processed is acquired. The image may be acquired in any manner. For example, it may be a frame captured in real time while a user records a video. In an AR scene, it may instead be a frame of the scene to be displayed, captured by a corresponding video capture device.
In step 102, the image to be processed is input into a pre-trained image processing model to obtain depth information and normal information for each sub-image obtained by semantic segmentation of the image to be processed.
After the image to be processed is acquired, the pre-trained image processing model directly yields the result of semantically segmenting the image. It also yields the depth information and normal information of each segmented sub-image. When the image later needs further processing, the depth information and normal information of the relevant sub-images can therefore be used directly.

For example, suppose the image to be processed is a scene frame in an AR scene and a virtual light source needs to be added to the scene. The image can then be processed according to the position of the virtual light source and the depth information and normal information of each sub-image in each scene frame obtained in step 102. This makes the lighting effect of the virtual light source more realistic. Similarly, when the AR scene needs to be relit, the processing can also be based on the depth information and normal information of each sub-image in each scene frame obtained in step 102. Likewise, when a virtual object needs to be added to the AR scene, the processing can be performed directly from two inputs: the position at which the virtual object is to be placed, and the depth information and normal information of the one or more sub-images at that position in each scene frame.
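The disclosure does not say how the depth and normal outputs are used to add a virtual light. As one hedged illustration, a virtual point light could shade a pixel with a simple Lambertian term. That term is computed from the pixel's predicted normal and the direction toward the light. The Lambertian model, the camera-space convention (camera at the origin looking down -z), and the function name `lambert_shade` are assumptions made for this sketch.

```python
import math

def _normalize(v):
    """Return v scaled to unit length (v must be non-zero)."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def lambert_shade(point, normal, light_pos, intensity=1.0):
    """Diffuse (Lambertian) contribution of a point light at one surface point.

    point     -- 3D camera-space position of the pixel, e.g. back-projected from its depth
    normal    -- surface normal predicted for the pixel
    light_pos -- 3D position of a hypothetical virtual light source
    """
    to_light = _normalize(tuple(l - p for l, p in zip(light_pos, point)))
    n = _normalize(normal)
    cos_theta = sum(a * b for a, b in zip(n, to_light))
    # Surfaces facing away from the light receive no direct illumination.
    return intensity * max(0.0, cos_theta)

# A pixel two units in front of the camera, facing it, lit by a light at the camera.
print(lambert_shade((0.0, 0.0, -2.0), (0.0, 0.0, 1.0), (0.0, 0.0, 0.0)))
```

A fuller relighting pass would evaluate this per pixel. It would also take distance falloff and shadows into account, which the per-sub-image depth could help with.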
With the above technical solution, once the image to be processed is acquired, the image processing model directly provides the depth information and normal information of each sub-image into which the image is segmented. Subsequent processing of the image can therefore use the depth information and normal information of the corresponding segmented sub-images directly. In addition, the normal information of the pixels within each semantically segmented sub-image is related through a mapping relation. Acquiring the normal information together with the semantic segmentation can therefore improve the accuracy of the normal information to a certain extent.
Fig. 2 is a schematic block diagram illustrating an image processing model in an image processing method according to an exemplary embodiment of the present disclosure. As shown in fig. 2, the image processing model includes a first sub-network 1 and a second sub-network 2, where the input of the first sub-network 1 is the image to be processed, and the output is semantic segmentation information of the image to be processed; the input of the second sub-network 2 is the semantic segmentation information and the image to be processed, and the output is the depth information and normal information of each sub-image obtained by segmenting the image to be processed according to the semantic segmentation information.
The first sub-network 1 and the second sub-network 2 may be any suitable neural networks, and the present disclosure does not limit their types. The only requirement is that the output of the first sub-network 1 can be used as an input of the second sub-network 2. In other words, the image to be processed must be encoded in the same way before it is input into the first sub-network 1 and into the second sub-network 2.
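The text states that the second sub-network receives both the semantic segmentation information and the image, but not how the two are combined. A common choice, assumed here, is to append a one-hot encoding of each pixel's class label to that pixel's color channels. The helper `build_second_input` and the list-of-lists image layout are hypothetical.

```python
def one_hot(label, num_classes):
    """Return a one-hot vector of length num_classes for an integer label."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

def build_second_input(image, labels, num_classes):
    """Concatenate per-pixel color channels with one-hot segmentation channels.

    image  -- H x W list of (r, g, b) tuples, already encoded (e.g. scaled to [0, 1])
    labels -- H x W list of integer class ids output by the first sub-network
    Returns an H x W list of feature vectors with 3 + num_classes channels.
    """
    if len(image) != len(labels) or any(len(a) != len(b) for a, b in zip(image, labels)):
        raise ValueError("image and segmentation must have the same H x W size")
    return [
        [list(pixel) + one_hot(label, num_classes) for pixel, label in zip(img_row, lab_row)]
        for img_row, lab_row in zip(image, labels)
    ]

image = [[(0.1, 0.2, 0.3), (0.4, 0.5, 0.6)]]
labels = [[0, 2]]
features = build_second_input(image, labels, num_classes=3)
print(features[0][1])  # [0.4, 0.5, 0.6, 0.0, 0.0, 1.0]
```

The same encoding of `image` is reused for both sub-networks. That reuse is one way to meet the requirement that the image be encoded identically before entering each of them.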
With this technical solution, semantic segmentation information is obtained by one network and the normal and depth information by another. The normal information is therefore acquired on the basis of the semantic segmentation information. This makes the acquired normal information more accurate while preserving the accuracy of the semantic segmentation.
Fig. 3 is a flowchart illustrating a method of training the image processing model in an image processing method according to an exemplary embodiment of the present disclosure. As shown in Fig. 3, the method includes steps 301 to 303.
In step 301, training image samples are input into the first sub-network.
In step 302, the output data of the first sub-network and the training image samples are input together into the second sub-network.
In step 303, the network parameters of the first sub-network and the second sub-network are adjusted simultaneously. The adjustment is based on the output data of the second sub-network and a target loss function, with the goal of minimizing that target loss function.
That is, during training, the first sub-network obtains semantic segmentation information directly from a training image sample. This semantic segmentation information is combined with the corresponding training image sample and input into the second sub-network. The second sub-network then outputs the depth information and normal information of each sub-image in the training image sample.
The model parameters are adjusted according to the output data of the second sub-network and the target loss function. In that adjustment, the network parameters of both sub-networks are updated at the same time; that is, the first sub-network and the second sub-network are trained jointly.
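The toy sketch below illustrates what adjusting both sub-networks "at the same time" means. It chains two scalar-parameter stages and updates both parameters from the gradient of a single loss computed on the second stage's output. The linear stages, the squared-error loss, the learning rate, and the function name `train_jointly` are stand-ins chosen for brevity. They are not the networks or the loss of the disclosure.

```python
def train_jointly(samples, w1=0.5, w2=0.5, lr=0.01, epochs=200):
    """Jointly fit y ~ w2 * (w1 * x) by gradient descent on both stages.

    w1 stands in for the first sub-network and w2 for the second; the loss is
    computed only on the final output, and its gradient flows back into both.
    """
    history = []
    for _ in range(epochs):
        g1 = g2 = 0.0
        loss = 0.0
        for x, y in samples:
            h = w1 * x          # "first sub-network" output
            pred = w2 * h       # "second sub-network" output
            err = pred - y
            loss += err * err
            # Chain rule: d(err^2)/dw2 = 2*err*h and d(err^2)/dw1 = 2*err*w2*x
            g2 += 2 * err * h
            g1 += 2 * err * w2 * x
        n = len(samples)
        # Both parameter sets are updated in the same step from the same loss.
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        history.append(loss / n)
    return w1, w2, history

samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w1, w2, history = train_jointly(samples)
print(round(w1 * w2, 3), history[-1] < history[0])
```

For the pre-trained variant described next, one could hold `w1` fixed and update only `w2`. The joint version lets the loss on depth and normals also reshape the segmentation stage.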
In one possible embodiment, the first sub-network may instead be a semantic segmentation network trained in advance. In that case, the training data for the semantic segmentation network must be encoded, before it is input into that network, in the same way that training image samples are encoded before being input into the second sub-network.
In a possible implementation, the target loss function is determined according to a first loss function, a second loss function, and a third loss function. These respectively constrain the deviations of the semantic segmentation information, the depth information, and the normal information output by the image processing model from the ground-truth labels in the training image samples.
That is, each training image sample is labeled with ground-truth values for semantic segmentation, depth, and normals. Such samples can be obtained in several ways:

- A training image sample may be synthetic image data rendered directly from a simulated scene. In that case, the corresponding ground-truth data is obtained during rendering.
- It may be image data captured by an RGBD camera, which includes ground-truth depth. After the captured data is smoothed and denoised, the corresponding normal information can be computed from the depth information. The ground-truth labels are then produced from the captured depth information, the computed normal information, and the semantic segmentation.
- In one possible case, the image processing model is first trained on, for example, synthetic image data rendered from a simulated scene. The trained model then processes ordinary RGB images to obtain their semantic segmentation information, depth information, and normal information. These outputs are used as training image samples, and the image processing model is trained again.
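For the RGBD route, the text says normals are computed from the smoothed, denoised depth, but it gives no formula. The sketch below makes two assumptions. First, smoothing uses a 3x3 box filter. Second, it uses an orthographic approximation with unit pixel spacing, under which the normal of the surface z = d(x, y) is proportional to (-dz/dx, -dz/dy, 1). A real pipeline would more likely back-project with the camera intrinsics and use a stronger denoiser. The function names are hypothetical.

```python
import math

def box_smooth(depth):
    """3x3 mean filter; border pixels average only their in-bounds neighbors."""
    h, w = len(depth), len(depth[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [depth[a][b]
                    for a in range(max(0, i - 1), min(h, i + 2))
                    for b in range(max(0, j - 1), min(w, j + 2))]
            out[i][j] = sum(vals) / len(vals)
    return out

def normals_from_depth(depth):
    """Per-pixel unit normals from finite differences of a depth map.

    Central differences inside the image, one-sided differences at the borders,
    unit pixel spacing, so each normal is proportional to (-dz/dx, -dz/dy, 1).
    """
    h, w = len(depth), len(depth[0])
    normals = []
    for i in range(h):
        row = []
        for j in range(w):
            j_left, j_right = max(0, j - 1), min(w - 1, j + 1)
            i_up, i_dn = max(0, i - 1), min(h - 1, i + 1)
            dzdx = (depth[i][j_right] - depth[i][j_left]) / max(1, j_right - j_left)
            dzdy = (depth[i_dn][j] - depth[i_up][j]) / max(1, i_dn - i_up)
            nx, ny, nz = -dzdx, -dzdy, 1.0
            length = math.sqrt(nx * nx + ny * ny + nz * nz)
            row.append((nx / length, ny / length, nz / length))
        normals.append(row)
    return normals

ramp = [[0.5 * j for j in range(3)] for _ in range(3)]  # depth grows along x
print(normals_from_depth(ramp)[1][1])
flat = box_smooth([[2.0] * 4 for _ in range(4)])        # smoothing keeps a flat plane flat
print(normals_from_depth(flat)[0][0])
```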
When the target loss function is determined from the first, second, and third loss functions, it may be a weighted sum of the three. The weights may be equal, or they may be set according to the type of training image sample:

- For synthetic image data rendered directly from a simulated scene, the three loss functions may have equal weights.
- For samples derived from RGBD camera data containing ground-truth depth, the normal information is computed rather than measured. The third loss function (for the normal information) may therefore be given a smaller weight in the target loss function.
- For samples derived from ordinary RGB images, both the depth information and the normal information are produced by the model and are less reliable. The weights of the second and third loss functions (for the depth and normal information) should therefore be gradually reduced during training.
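The weighting rules above can be written as a small helper. Only the qualitative pattern comes from the text:

- equal weights for rendered data;
- a smaller normal weight for RGBD data;
- depth and normal weights that decrease during training for model-labeled RGB data.

The concrete numbers, the linear decay schedule, and the names `loss_weights` and `target_loss` are illustrative assumptions.

```python
def loss_weights(source, step=0, total_steps=1):
    """Return (w_seg, w_depth, w_normal) for a training sample's data source.

    source -- "synthetic", "rgbd", or "pseudo" (ordinary RGB images labeled by the model)
    """
    if source == "synthetic":
        return 1.0, 1.0, 1.0  # rendered ground truth: trust all three terms equally
    if source == "rgbd":
        return 1.0, 1.0, 0.5  # normals were computed from depth, so down-weight them
    if source == "pseudo":
        # Depth and normal labels came from the model itself; decay their weights linearly.
        progress = min(max(step / total_steps, 0.0), 1.0)
        decay = 1.0 - 0.9 * progress
        return 1.0, decay, decay
    raise ValueError(f"unknown source: {source!r}")

def target_loss(l_seg, l_depth, l_normal, source, step=0, total_steps=1):
    """Weighted sum of the first (segmentation), second (depth) and third (normal) losses."""
    w_seg, w_depth, w_normal = loss_weights(source, step, total_steps)
    return w_seg * l_seg + w_depth * l_depth + w_normal * l_normal

print(target_loss(0.2, 0.4, 0.6, "rgbd"))
print(loss_weights("pseudo", step=50, total_steps=100))
```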
In a possible embodiment, the target loss function is further determined according to a fourth loss function. This loss constrains deviations from the mapping relation between the depth information and the normal information output by the image processing model. That is, the mapping relation between the depth information and the normal information of the same pixel can constrain model training to a certain extent. This mapping relation may be that the derivative of the depth information equals the normal; in other words, the normal at a pixel can be derived from the spatial gradient of the depth there. The target loss function may then be a weighted sum of the first, second, third, and fourth loss functions.
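Below is a minimal sketch of the fourth loss. It assumes an orthographic approximation with unit pixel spacing, so the normal implied by the depth map z = d(x, y) is proportional to (-dz/dx, -dz/dy, 1). It then scores each interior pixel with 1 - cos(theta), where theta is the angle between the predicted normal and the depth-implied normal. The text only says that the derivative of depth should match the normal. This exact formulation and the name `depth_normal_consistency_loss` are therefore assumptions.

```python
import math

def _unit(v):
    """Return v scaled to unit length (v must be non-zero)."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def depth_normal_consistency_loss(depth, normals):
    """Mean (1 - cos) between predicted normals and the normals implied by predicted depth.

    depth   -- H x W predicted depth map (H, W >= 3)
    normals -- H x W predicted normals as (nx, ny, nz) tuples
    Only interior pixels are scored, so central differences are always defined.
    """
    h, w = len(depth), len(depth[0])
    total, count = 0.0, 0
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            dzdx = (depth[i][j + 1] - depth[i][j - 1]) / 2.0
            dzdy = (depth[i + 1][j] - depth[i - 1][j]) / 2.0
            implied = _unit((-dzdx, -dzdy, 1.0))
            predicted = _unit(normals[i][j])
            cos_sim = sum(a * b for a, b in zip(implied, predicted))
            total += 1.0 - cos_sim
            count += 1
    return total / count if count else 0.0

flat = [[1.0] * 3 for _ in range(3)]
facing = [[(0.0, 0.0, 1.0)] * 3 for _ in range(3)]
tilted = [[(1.0, 0.0, 0.0)] * 3 for _ in range(3)]
print(depth_normal_consistency_loss(flat, facing))  # 0.0
print(depth_normal_consistency_loss(flat, tilted))  # 1.0
```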
With this technical solution, the target loss function also accounts for the constraint between the depth information and the normal information. This can further improve the accuracy of the trained image processing model, so the depth information and normal information it produces for each sub-image are more accurate.
In a possible implementation, the target loss function is further determined according to a fifth loss function. This loss constrains deviations among the normal information output by the image processing model for the pixels within each sub-image. The sub-images here are those segmented according to the semantic segmentation information output by the model. Because the sub-images are obtained by semantic segmentation, the mapping relation among the normal information of all pixels in the same sub-image can constrain model training to a certain extent. This mapping relation may be that the normal information of all pixels in the same sub-image maps to the same normal value. The target loss function may be a weighted sum of the first, second, third, fourth, and fifth loss functions, or of only the first, second, third, and fifth loss functions, and so on.
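The fifth loss can be sketched by treating each semantic sub-image as a surface that should carry a single normal. The sketch computes each segment's mean normal and penalizes each pixel's deviation from it. Using the segment mean as the shared normal value, and squared Euclidean distance as the penalty, are assumptions. The text only says that the normals in one sub-image map to the same value. The name `segment_normal_loss` is hypothetical.

```python
from collections import defaultdict

def segment_normal_loss(labels, normals):
    """Mean squared deviation of each pixel's normal from its segment's mean normal.

    labels  -- H x W integer segment ids from the semantic segmentation output
    normals -- H x W predicted normals as (nx, ny, nz) tuples
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts = defaultdict(int)
    for lab_row, n_row in zip(labels, normals):
        for label, n in zip(lab_row, n_row):
            for k in range(3):
                sums[label][k] += n[k]
            counts[label] += 1
    means = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

    total, pixels = 0.0, 0
    for lab_row, n_row in zip(labels, normals):
        for label, n in zip(lab_row, n_row):
            total += sum((n[k] - means[label][k]) ** 2 for k in range(3))
            pixels += 1
    return total / pixels if pixels else 0.0

labels = [[0, 0, 1, 1]]
uniform = [[(0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
mixed = [[(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
print(segment_normal_loss(labels, uniform))  # 0.0
print(segment_normal_loss(labels, mixed))    # 0.25
```

This term only suits segments that are roughly planar, such as walls or floors. For curved objects, a smaller weight in the target loss function would likely be appropriate.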
With this technical solution, the target loss function also accounts for the constraint between the semantic segmentation information and the normal information. This can further improve the accuracy of the trained image processing model and make the normal information it produces for each sub-image more accurate. When the target loss function also includes the fourth loss function, the more accurate normal information in turn improves the accuracy of the acquired depth information.
Fig. 4 is a block diagram illustrating a configuration of an image processing apparatus according to an exemplary embodiment of the present disclosure. As shown in Fig. 4, the apparatus includes an acquisition module 10 and a processing module 20. The acquisition module 10 is configured to acquire an image to be processed. The processing module 20 is configured to input the image to be processed into a pre-trained image processing model to obtain depth information and normal information for each sub-image obtained by semantic segmentation of the image to be processed.
With the above technical solution, once the image to be processed is acquired, the image processing model directly provides the depth information and normal information of each sub-image into which the image is segmented. Subsequent processing of the image can therefore use the depth information and normal information of the corresponding segmented sub-images directly. In addition, the normal information of the pixels within each semantically segmented sub-image is related through a mapping relation. Acquiring the normal information together with the semantic segmentation can therefore improve the accuracy of the normal information to a certain extent.
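The two modules of Fig. 4 map naturally onto two small components. In this hedged sketch, the acquisition module wraps any frame source and the processing module wraps any callable model. The class names, the stand-in `dummy_model`, and the returned layout (segment id mapped to depth and normal) are hypothetical.

```python
from typing import Any, Callable, Dict

class AcquisitionModule:
    """Obtains the image to be processed from a pluggable source (camera, file, AR frame)."""

    def __init__(self, source: Callable[[], Any]):
        self._source = source

    def acquire(self) -> Any:
        return self._source()

class ProcessingModule:
    """Feeds the image to a pre-trained model that returns per-sub-image depth and normals."""

    def __init__(self, model: Callable[[Any], Dict[int, Dict[str, Any]]]):
        self._model = model

    def process(self, image: Any) -> Dict[int, Dict[str, Any]]:
        return self._model(image)

def dummy_model(image):
    """Stand-in model: one segment (id 0) with constant depth and a camera-facing normal."""
    return {0: {"depth": 1.0, "normal": (0.0, 0.0, 1.0)}}

acquirer = AcquisitionModule(lambda: [[(0.5, 0.5, 0.5)]])
processor = ProcessingModule(dummy_model)
result = processor.process(acquirer.acquire())
print(result[0]["normal"])  # (0.0, 0.0, 1.0)
```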
In a possible implementation, the image processing model includes a first sub-network and a second sub-network. The first sub-network takes the image to be processed as input and outputs the semantic segmentation information of that image. The second sub-network takes the semantic segmentation information together with the image to be processed as input. It outputs the depth information and normal information of each sub-image obtained by segmenting the image according to the semantic segmentation information.
In one possible embodiment, the image processing model is trained in three steps. First, training image samples are input into the first sub-network. Second, the output data of the first sub-network and the training image samples are input together into the second sub-network. Third, the network parameters of both sub-networks are adjusted simultaneously, based on the output data of the second sub-network and a target loss function, with the goal of minimizing the target loss function.
In a possible implementation, the target loss function is determined according to a first loss function, a second loss function, and a third loss function. These respectively constrain the deviations of the semantic segmentation information, the depth information, and the normal information output by the image processing model from the ground-truth labels in the training image samples.
In a possible embodiment, the target loss function is further determined according to a fourth loss function. This loss constrains deviations from the mapping relation between the depth information and the normal information output by the image processing model.
In a possible implementation, the target loss function is further determined according to a fifth loss function. This loss constrains deviations among the normal information output by the image processing model for the pixels within each sub-image. The sub-images here are those segmented according to the semantic segmentation information output by the model.
Referring now to Fig. 5, a block diagram of an electronic device 500 suitable for implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, mobile terminals and stationary terminals. Mobile terminals include a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), and a vehicle-mounted terminal (e.g., a car navigation terminal). Stationary terminals include a digital TV and a desktop computer. The electronic device shown in Fig. 5 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in Fig. 5, the electronic device 500 may include a processing device 501 (e.g., a central processing unit or a graphics processor). The processing device 501 may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502, or a program loaded from a storage device 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the electronic device 500. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
Generally, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 507 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; storage devices 508 including, for example, magnetic tape, hard disk, etc.; and a communication device 509. The communication means 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 5 illustrates an electronic device 500 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or installed from the storage means 508, or installed from the ROM 502. The computer program performs the above-described functions defined in the methods of the embodiments of the present disclosure when executed by the processing device 501.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the client and the server may communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected by digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed network.
The computer readable medium may be included in the electronic device, or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire an image to be processed; and input the image to be processed into a pre-trained image processing model to obtain depth information and normal information of each sub-image of the image to be processed, the sub-images being obtained by semantic segmentation of the image to be processed.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented in software or in hardware. In some cases, the name of a module does not constitute a limitation on the module itself; for example, the acquisition module may also be described as a "module that acquires an image to be processed".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Example 1 provides, in accordance with one or more embodiments of the present disclosure, an image processing method, the method including:
acquiring an image to be processed;
and inputting the image to be processed into a pre-trained image processing model to obtain depth information and normal information of each sub-image of the image to be processed, the sub-images being obtained by semantic segmentation of the image to be processed.
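The disclosure does not fix a data format for the per-sub-image output of Example 1. As a minimal sketch, assuming the model yields three per-pixel maps of equal size (a semantic label map, a depth map, and a unit normal map), the "depth information and normal information of each sub-image" can be collected as one record per semantic label. The function name `group_by_sub_image` and the record fields are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Normal = Tuple[float, float, float]


def group_by_sub_image(
    labels: List[List[int]],
    depth: List[List[float]],
    normals: List[List[Normal]],
) -> Dict[int, Dict[str, list]]:
    """Collect per-pixel depth and normal predictions into one record per sub-image.

    Every pixel sharing a semantic label belongs to the same sub-image.
    Assumes (hypothetically) that all three maps have the same height and width.
    """
    sub_images: Dict[int, Dict[str, list]] = defaultdict(
        lambda: {"pixels": [], "depth": [], "normal": []}
    )
    for y, row in enumerate(labels):
        for x, label in enumerate(row):
            record = sub_images[label]
            record["pixels"].append((y, x))
            record["depth"].append(depth[y][x])
            record["normal"].append(normals[y][x])
    return dict(sub_images)


# Usage: a 2x2 image whose top row is one region (label 0) and bottom row another (label 1).
labels = [[0, 0], [1, 1]]
depth = [[2.0, 2.0], [5.0, 5.5]]
normals = [[(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)], [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
sub_images = group_by_sub_image(labels, depth, normals)
print(sub_images[1]["depth"])  # [5.0, 5.5]
```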
Example 2 provides the method of example 1, the image processing model including a first sub-network and a second sub-network, wherein,
the input of the first sub-network is the image to be processed, and the output is semantic segmentation information of the image to be processed;
the input of the second sub-network is the semantic segmentation information and the image to be processed, and the output is the depth information and the normal information of each sub-image obtained by segmenting the image to be processed according to the semantic segmentation information.
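The data flow of Example 2 can be illustrated with a small composition sketch: the first sub-network maps the image to segmentation information, and the second sub-network receives both that information and the original image. The class `TwoStageModel` and the toy stand-ins `toy_first` and `toy_second` are hypothetical placeholders for trained networks; they only demonstrate which inputs each stage sees, not how the disclosed sub-networks are built.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]  # grayscale intensities, row-major
LabelMap = List[List[int]]
DepthMap = List[List[float]]
NormalMap = List[List[Tuple[float, float, float]]]

FirstSubNetwork = Callable[[Image], LabelMap]
SecondSubNetwork = Callable[[Image, LabelMap], Tuple[DepthMap, NormalMap]]


class TwoStageModel:
    """Chains the two sub-networks: segmentation first, then depth and normals."""

    def __init__(self, first: FirstSubNetwork, second: SecondSubNetwork):
        self.first = first
        self.second = second

    def __call__(self, image: Image):
        labels = self.first(image)  # semantic segmentation information
        depth, normals = self.second(image, labels)  # conditioned on image and labels
        return labels, depth, normals


# Hypothetical stand-ins for trained sub-networks, for illustration only.
def toy_first(image: Image) -> LabelMap:
    # Label bright pixels 1 and dark pixels 0.
    return [[1 if v > 0.5 else 0 for v in row] for row in image]


def toy_second(image: Image, labels: LabelMap) -> Tuple[DepthMap, NormalMap]:
    # A real network would regress these; here each label gets a fixed value.
    depth_per_label = {0: 10.0, 1: 2.0}
    normal_per_label = {0: (0.0, 0.0, 1.0), 1: (0.0, 1.0, 0.0)}
    depth = [[depth_per_label[lab] for lab in row] for row in labels]
    normals = [[normal_per_label[lab] for lab in row] for row in labels]
    return depth, normals


model = TwoStageModel(toy_first, toy_second)
labels, depth, normals = model([[0.9, 0.1]])
print(labels, depth)  # [[1, 0]] [[2.0, 10.0]]
```

Keeping each stage as a separate callable mirrors the two-sub-network split, so either stage could be swapped without changing the interface.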
Example 3 provides the method of example 2, the image processing model being trained by:
inputting training image samples into the first sub-network;
simultaneously inputting the output data of the first sub-network and the training image samples into the second sub-network;
and, based on the output data of the second sub-network and a target loss function, simultaneously adjusting the network parameters of the first sub-network and the second sub-network so as to minimize the target loss function.
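To illustrate what "simultaneously adjusting network parameters in the first sub-network and the second sub-network" means, the sketch below jointly fits a deliberately tiny two-stage model by gradient descent. It assumes a scalar first stage `h = a * x` and a second stage `y = b * h + c * x` that, as in Example 3, receives both the first stage's output and the raw input; the squared error stands in for the target loss. All parameters, the learning rate, and the step count are hypothetical. The point is only that the gradient reaches the first stage through the second (chain rule) and both stages are updated in the same step.

```python
from typing import List, Tuple


def joint_train(samples: List[Tuple[float, float]], lr: float = 0.01, steps: int = 200):
    """Jointly fit the toy model y = b * (a * x) + c * x by full-batch gradient descent.

    `a` parameterizes the first sub-network; `b` and `c` parameterize the second,
    which sees both the first stage's output h and the raw input x.
    Returns the final parameters and the loss recorded before each update.
    """
    a, b, c = 0.5, 0.5, 0.0
    history = []
    n = len(samples)
    for _ in range(steps):
        grad_a = grad_b = grad_c = 0.0
        loss = 0.0
        for x, target in samples:
            h = a * x  # first sub-network output
            y = b * h + c * x  # second sub-network output
            err = y - target
            loss += err * err
            # Chain rule: the gradient for `a` flows back through the second stage.
            grad_a += 2 * err * b * x
            grad_b += 2 * err * h
            grad_c += 2 * err * x
        history.append(loss / n)
        # Joint optimization: both sub-networks are updated in the same step.
        a -= lr * grad_a / n
        b -= lr * grad_b / n
        c -= lr * grad_c / n
    return (a, b, c), history


(a, b, c), history = joint_train([(1.0, 3.0), (2.0, 6.0), (-1.0, -3.0)])
print(history[0], history[-1])
```

Training the stages separately would freeze `a` while fitting `b` and `c`; the joint update lets segmentation-side parameters adapt to the downstream depth and normal objective.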
Example 4 provides, according to one or more embodiments of the present disclosure, the method of example 3, wherein the target loss function is determined according to a first loss function, a second loss function, and a third loss function, which respectively constrain the deviations of the semantic segmentation information, the depth information, and the normal information output by the image processing model from the ground-truth annotation values in the training image samples.
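The disclosure states only that the target loss is "determined according to" the three losses; it does not give their forms or how they are combined. A minimal sketch, assuming commonly used choices (pixel-wise cross-entropy for segmentation, mean absolute error for depth, and one minus cosine similarity for normals) and a weighted sum with hypothetical weights, could look like this:

```python
import math
from typing import Dict, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def segmentation_loss(probs: Sequence[Sequence[float]], gt_labels: Sequence[int]) -> float:
    """First loss (assumed form): mean cross-entropy; probs[i] is pixel i's class distribution."""
    eps = 1e-12  # avoids log(0)
    return -sum(math.log(p[g] + eps) for p, g in zip(probs, gt_labels)) / len(gt_labels)


def depth_loss(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Second loss (assumed form): mean absolute depth error."""
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(gt)


def normal_loss(pred: Sequence[Vec3], gt: Sequence[Vec3]) -> float:
    """Third loss (assumed form): mean (1 - cosine similarity) of normals."""
    total = 0.0
    for p, g in zip(pred, gt):
        dot = sum(u * v for u, v in zip(p, g))
        norms = math.sqrt(sum(u * u for u in p)) * math.sqrt(sum(v * v for v in g))
        total += 1.0 - dot / max(norms, 1e-12)
    return total / len(gt)


def target_loss(outputs: Dict[str, list], truth: Dict[str, list],
                weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Hypothetical weighted sum of the first, second and third loss functions."""
    w1, w2, w3 = weights
    return (w1 * segmentation_loss(outputs["seg"], truth["seg"])
            + w2 * depth_loss(outputs["depth"], truth["depth"])
            + w3 * normal_loss(outputs["normal"], truth["normal"]))


outputs = {"seg": [[0.9, 0.1], [0.2, 0.8]], "depth": [1.0, 3.0],
           "normal": [(0.0, 0.0, 1.0), (0.0, 0.1, 1.0)]}
truth = {"seg": [0, 1], "depth": [1.0, 2.0],
         "normal": [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]}
print(target_loss(outputs, truth, weights=(1.0, 0.5, 2.0)))
```

The weights balance terms measured in different units (nats, depth units, angular error); how the disclosure balances them, if at all, is not stated.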
Example 5 provides the method of example 4, wherein the target loss function is further determined according to a fourth loss function,
the fourth loss function being used to constrain the deviation from the mapping relationship between the depth information and the normal information output by the image processing model.
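One common way to express a mapping relationship between depth and normals is that normals can be derived from the spatial gradient of the depth map. The sketch below assumes that reading, an orthographic view with unit pixel spacing (so the surface is z = depth[y][x] with unnormalized normal (-dz/dx, -dz/dy, 1)), and forward differences; a practical implementation would more likely back-project depth with the camera intrinsics. The function names are hypothetical, and the disclosure does not specify this formula.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def normals_from_depth(depth: List[List[float]]) -> List[List[Vec3]]:
    """Estimate unit normals from depth with forward differences.

    Assumes an orthographic view with unit pixel spacing. The last row and column
    are skipped because a forward difference needs a neighbor to the right/below.
    """
    h, w = len(depth), len(depth[0])
    out: List[List[Vec3]] = []
    for y in range(h - 1):
        row: List[Vec3] = []
        for x in range(w - 1):
            dzdx = depth[y][x + 1] - depth[y][x]
            dzdy = depth[y + 1][x] - depth[y][x]
            n = (-dzdx, -dzdy, 1.0)
            length = math.sqrt(sum(comp * comp for comp in n))
            row.append(tuple(comp / length for comp in n))
        out.append(row)
    return out


def depth_normal_consistency_loss(depth: List[List[float]],
                                  pred_normals: List[List[Vec3]]) -> float:
    """Fourth loss (assumed form): mean (1 - cosine) between predicted and depth-implied normals."""
    implied = normals_from_depth(depth)
    total, count = 0.0, 0
    for y, row in enumerate(implied):
        for x, n_d in enumerate(row):
            n_p = pred_normals[y][x]
            norm_p = math.sqrt(sum(comp * comp for comp in n_p))
            cos = sum(u * v for u, v in zip(n_d, n_p)) / max(norm_p, 1e-12)
            total += 1.0 - cos
            count += 1
    return total / count


# A depth ramp rising along x, compared with normals that all face the camera.
ramp = [[float(x) for x in range(3)] for _ in range(3)]
facing = [[(0.0, 0.0, 1.0)] * 3 for _ in range(3)]
print(depth_normal_consistency_loss(ramp, facing))
```

Because the loss couples the two output heads, a depth map and a normal map that are each plausible on their own are still penalized if they describe different surfaces.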
Example 6 provides the method of example 4 or example 5, wherein the target loss function is further determined according to a fifth loss function,
the fifth loss function being used to constrain the deviation among the normal information, output by the image processing model, of the pixels within each sub-image, the sub-images being obtained by segmentation according to the semantic segmentation information output by the image processing model.
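A minimal sketch of the fifth loss, assuming it penalizes how far each pixel's predicted normal departs from the mean normal direction of the sub-image that pixel belongs to (the disclosure does not give a formula). This form suits roughly planar regions such as walls or floors; curved objects would also be penalized under it. The function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def _normalize(v: Vec3) -> Vec3:
    """Scale to unit length; a zero vector (e.g. opposite normals cancelling) stays zero."""
    length = math.sqrt(sum(comp * comp for comp in v))
    return tuple(comp / length for comp in v) if length > 0 else (0.0, 0.0, 0.0)


def segment_normal_consistency_loss(labels: List[List[int]],
                                    normals: List[List[Vec3]]) -> float:
    """Fifth loss (assumed form): mean (1 - cosine) between each pixel normal
    and the mean normal of its sub-image."""
    groups: Dict[int, List[Vec3]] = defaultdict(list)
    for label_row, normal_row in zip(labels, normals):
        for label, n in zip(label_row, normal_row):
            groups[label].append(_normalize(n))
    total, count = 0.0, 0
    for members in groups.values():
        mean = _normalize(tuple(sum(n[i] for n in members) / len(members) for i in range(3)))
        for n in members:
            total += 1.0 - sum(u * v for u, v in zip(n, mean))
            count += 1
    return total / count


labels = [[0, 0], [1, 1]]
uniform = [[(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)], [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
print(segment_normal_consistency_loss(labels, uniform))  # 0.0
```

Unlike the third loss, this term needs no ground-truth normals; it relies on the predicted segmentation to decide which pixels should agree.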
Example 7 provides an image processing apparatus according to one or more embodiments of the present disclosure, characterized in that the apparatus includes:
an acquisition module, configured to acquire an image to be processed;
and a processing module, configured to input the image to be processed into a pre-trained image processing model to obtain depth information and normal information of each sub-image of the image to be processed, the sub-images being obtained by semantic segmentation of the image to be processed.
Example 8 provides the apparatus of example 7, the image processing model including a first sub-network and a second sub-network, wherein,
the input of the first sub-network is the image to be processed, and the output is semantic segmentation information of the image to be processed;
the input of the second sub-network is the semantic segmentation information and the image to be processed, and the output is the depth information and the normal information of each sub-image obtained by segmenting the image to be processed according to the semantic segmentation information.
Example 9 provides, according to one or more embodiments of the present disclosure, a computer-readable medium on which a computer program is stored, characterized in that the program, when executed by a processing device, implements the steps of the method of any one of examples 1-6.
Example 10 provides, in accordance with one or more embodiments of the present disclosure, an electronic device, comprising:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage device to carry out the steps of the method of any of examples 1-6.
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to technical solutions formed by the particular combination of features described above, but also encompasses other technical solutions formed by any combination of the features described above or their equivalents without departing from the spirit of the disclosure. For example, a technical solution may be formed by replacing the above features with features having similar functions disclosed in (but not limited to) the present disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims. With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

Claims (10)

1. An image processing method, characterized in that the method comprises:
acquiring an image to be processed;
and inputting the image to be processed into a pre-trained image processing model to obtain depth information and normal information of each sub-image of the image to be processed, the sub-images being obtained by semantic segmentation of the image to be processed.
2. The method of claim 1, wherein the image processing model includes a first sub-network and a second sub-network, wherein,
the input of the first sub-network is the image to be processed, and the output is semantic segmentation information of the image to be processed;
the input of the second sub-network is the semantic segmentation information and the image to be processed, and the output is the depth information and the normal information of each sub-image obtained by segmenting the image to be processed according to the semantic segmentation information.
3. The method of claim 2, wherein the image processing model is trained by:
inputting training image samples into the first sub-network;
simultaneously inputting the output data of the first sub-network and the training image samples into the second sub-network;
and, based on the output data of the second sub-network and a target loss function, simultaneously adjusting the network parameters of the first sub-network and the second sub-network so as to minimize the target loss function.
4. The method of claim 3, wherein the target loss function is determined according to a first loss function, a second loss function, and a third loss function, which respectively constrain the deviations of the semantic segmentation information, the depth information, and the normal information output by the image processing model from the ground-truth annotation values in the training image samples.
5. The method of claim 4, wherein the target loss function is further determined according to a fourth loss function,
the fourth loss function being used to constrain the deviation from the mapping relationship between the depth information and the normal information output by the image processing model.
6. The method according to claim 4 or 5, characterized in that the target loss function is further determined according to a fifth loss function,
the fifth loss function being used to constrain the deviation among the normal information, output by the image processing model, of the pixels within each sub-image, the sub-images being obtained by segmentation according to the semantic segmentation information output by the image processing model.
7. An image processing apparatus, characterized in that the apparatus comprises:
an acquisition module, configured to acquire an image to be processed;
and a processing module, configured to input the image to be processed into a pre-trained image processing model to obtain depth information and normal information of each sub-image of the image to be processed, the sub-images being obtained by semantic segmentation of the image to be processed.
8. The apparatus of claim 7, wherein the image processing model comprises a first sub-network and a second sub-network, wherein,
the input of the first sub-network is the image to be processed, and the output is semantic segmentation information of the image to be processed;
the input of the second sub-network is the semantic segmentation information and the image to be processed, and the output is the depth information and the normal information of each sub-image obtained by segmenting the image to be processed according to the semantic segmentation information.
9. A computer-readable medium on which a computer program is stored, characterized in that the program, when executed by a processing device, carries out the steps of the method of any one of claims 1 to 6.
10. An electronic device, comprising:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage device to carry out the steps of the method according to any one of claims 1 to 6.
CN202011298862.9A 2020-11-18 2020-11-18 Image processing method, image processing device, readable medium and electronic equipment Pending CN112418233A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011298862.9A CN112418233A (en) 2020-11-18 2020-11-18 Image processing method, image processing device, readable medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN112418233A 2021-02-26

Family

ID=74773445

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011298862.9A Pending CN112418233A (en) 2020-11-18 2020-11-18 Image processing method, image processing device, readable medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN112418233A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023103887A1 (en) * 2021-12-09 2023-06-15 Beijing Zitiao Network Technology Co Ltd Image segmentation label generation method and apparatus, and electronic device and storage medium

Citations (6)

Publication number Priority date Publication date Assignee Title
CN109271990A (en) * 2018-09-03 2019-01-25 Beijing University of Posts and Telecommunications Semantic segmentation method and device for RGB-D images
CN109447990A (en) * 2018-10-22 2019-03-08 Beijing Megvii Technology Co Ltd Image semantic segmentation method and device, electronic equipment and computer-readable medium
CN109658418A (en) * 2018-10-31 2019-04-19 Baidu Online Network Technology (Beijing) Co Ltd Scene structure learning method and device, and electronic equipment
CN109816709A (en) * 2017-11-21 2019-05-28 Shenzhen UBTECH Technology Co Ltd Depth estimation method, device and equipment based on a monocular camera
WO2019218136A1 (en) * 2018-05-15 2019-11-21 Shenzhen University Image segmentation method, computer device, and storage medium
WO2020020445A1 (en) * 2018-07-24 2020-01-30 Toyota Motor Europe A method and a system for processing images to obtain foggy images

Non-Patent Citations (3)

Title
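TIAN XUAN et al.: "Review of image semantic segmentation based on deep learning", JOURNAL OF SOFTWARE, vol. 30, no. 02, pp. 440-468 *
CHEN TINGJIONG et al.: "Object detection and pose estimation based on semantic segmentation and point cloud registration", Electronic Technology, vol. 49, no. 01, pp. 36-40 *
GU PAN et al.: "Weakly supervised image semantic segmentation algorithm based on neural networks", Computer Applications and Software, no. 02, pp. 284-288 *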

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination