WO2022179581A1 - Image processing method and related device - Google Patents


Publication number
WO2022179581A1
Authority
WO
WIPO (PCT)
Prior art keywords
sub
image frame
current image
area
depth
Application number
PCT/CN2022/077788
Other languages
French (fr)
Chinese (zh)
Inventor
Liu Yi (刘毅)
Luo Daxin (罗达新)
Wan Danpan (万单盼)
Xu Songcen (许松岑)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Application filed by Huawei Technologies Co., Ltd.

Classifications

    • G06T3/04
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS > G06N3/00 Computing arrangements based on biological models > G06N3/02 Neural networks
        • G06N3/04 Architecture, e.g. interconnection topology > G06N3/045 Combinations of networks
        • G06N3/08 Learning methods > G06N3/084 Backpropagation, e.g. using gradient descent
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL > G06T7/00 Image analysis
        • G06T7/10 Segmentation; Edge detection > G06T7/11 Region-based segmentation
        • G06T7/20 Analysis of motion > G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
        • G06T7/90 Determination of colour characteristics

Definitions

  • The present application relates to the field of computer technology, and in particular, to an image processing method and related equipment.
  • Panning refers to a shooting method of tracking a moving target object.
  • The image obtained by this shooting method can present a clear foreground area (including the target object) and a blurred background area.
  • When users pan with a terminal device, they usually need to control the shutter speed carefully. If the shutter speed is too fast, the background area of the image will not show an obvious blur effect; if the shutter speed is too slow, the foreground area of the image will not be sharp enough.
  • The user can capture a set of image frames at a high shutter speed with the terminal device (because the shutter speed is high, the background area of these image frames shows no obvious blur effect) and then process them.
  • For example, for three image frames, the terminal device can first align the three image frames based on the target object and then perform frame interpolation between adjacent image frames to obtain more image frames. The terminal device then blends the original image frames with the interpolated image frames, so that the background area of the current image frame has a blur effect.
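The align, interpolate, and mix flow described above can be sketched as follows. This is a minimal sketch with several assumptions. The frames are grayscale lists of lists that are already aligned on the target object. The helper names `interpolate` and `mix_frames` are hypothetical. A plain linear cross-fade stands in for the motion-compensated interpolation that a real pipeline would more likely use.

```python
def interpolate(frame_a, frame_b, t):
    """Linear cross-fade of two aligned frames (assumed stand-in for real
    frame interpolation); t=0 returns frame_a, t=1 returns frame_b."""
    return [[(1.0 - t) * a + t * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]


def mix_frames(aligned_frames, inserts_per_gap=1):
    """Insert interpolated frames between adjacent aligned frames, then
    average every original and inserted frame pixel by pixel."""
    frames = []
    for a, b in zip(aligned_frames, aligned_frames[1:]):
        frames.append(a)
        for k in range(1, inserts_per_gap + 1):
            frames.append(interpolate(a, b, k / (inserts_per_gap + 1)))
    frames.append(aligned_frames[-1])
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    return [[sum(f[y][x] for f in frames) / n for x in range(width)]
            for y in range(height)]


# After alignment on the target, a bright background point drifts one pixel per frame.
frames = [[[9.0, 0.0, 0.0]], [[0.0, 9.0, 0.0]], [[0.0, 0.0, 9.0]]]
print(mix_frames(frames))  # the point is smeared across all three pixels
```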
  • The embodiments of the present application provide an image processing method and related equipment, which can blur the background area of the current image frame to different degrees, so that the background area has a layered blur effect, that is, a more realistic background blur.
  • A first aspect of the embodiments of the present application provides an image processing method, the method comprising:
  • Each image frame includes a foreground area and a background area, and both areas include (present) subjects. The subject in the foreground area is generally the target object that the user is paying attention to, and the subjects in the background area are non-target objects that the user is not paying attention to.
  • Since the background area of the group of image frames has no obvious blur effect, the terminal device needs to process it.
  • The terminal device may select one of the image frames as the image frame to be processed, that is, the current image frame. Next, the terminal device can obtain the depth information of the background area of the current image frame. This depth information indicates the distance from each subject in the background area to the camera, that is, the distance from the position of each subject in the actual environment (3D space) to the camera.
  • The terminal device can divide the background area of the current image frame into multiple sub-areas according to the depth information of the background area.
  • For example, the background area of the current image frame can be divided into two sub-areas: one contains the tree behind the vehicle, and the other contains the house behind the tree. In this way, the objects corresponding to (contained in) different sub-areas are at different distances from the camera.
  • The terminal device performs different degrees of blurring on different sub-areas to obtain the processed current image frame.
  • After acquiring the depth information of the background area of the current image frame, the terminal device divides the background area into multiple sub-areas according to the depth information. Since the objects corresponding to different sub-areas are at different distances from the camera, the different sub-areas also move differently relative to the previous image frame. Therefore, the terminal device can blur different sub-areas to different degrees, so that the background area of the current image frame has a more realistic blur effect.
  • The depth information of the background area of the current image frame includes the depth value of each pixel in the background area. Dividing the background area into multiple sub-areas according to the depth information specifically includes: determining the depth change rate of each pixel in the background area of the current image frame according to the depth value of each pixel in the background area, where the depth change rate of each pixel is determined based on the depth value of that pixel and the depth values of the surrounding pixels; and dividing the background area into multiple sub-areas according to the depth change rate of each pixel and a preset change rate threshold.
  • The terminal device can determine the depth change rate of a pixel according to its depth value and the depth values of the surrounding pixels. The depth change rate indicates the difference between two distances: the distance from the pixel's corresponding position in the actual environment to the camera, and the distances from the surrounding pixels' corresponding positions to the camera. After obtaining the depth change rates of all pixels in the background area of the current image frame, the terminal device can accurately divide the background area into multiple sub-areas according to these rates, so that the subjects in different sub-areas are at different distances from the camera.
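The depth-based division above can be sketched as follows. The source gives no formula for the depth change rate. This minimal sketch therefore assumes it is the largest absolute depth difference between a pixel and its 4-connected background neighbours. Pixels whose rate exceeds the threshold are treated as depth boundaries. The remaining background pixels are grouped into connected components, one per sub-area. The function names, the boolean background `mask`, and the list-of-lists layout are illustrative assumptions.

```python
from collections import deque

NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def depth_change_rate(depth, mask):
    """Assumed rate: largest |depth difference| to any 4-connected background neighbour."""
    h, w = len(depth), len(depth[0])
    rate = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue  # foreground pixel: not part of the background area
            for dy, dx in NEIGHBOURS:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and mask[ny][nx]:
                    rate[y][x] = max(rate[y][x], abs(depth[y][x] - depth[ny][nx]))
    return rate


def split_background(depth, mask, threshold):
    """Return (labels, count). Smooth background pixels get a sub-area id;
    foreground pixels and depth-boundary pixels (rate > threshold) get -1."""
    h, w = len(depth), len(depth[0])
    rate = depth_change_rate(depth, mask)
    labels = [[-1] * w for _ in range(h)]
    count = 0
    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or rate[sy][sx] > threshold or labels[sy][sx] != -1:
                continue
            labels[sy][sx] = count  # seed a new sub-area and flood-fill it
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in NEIGHBOURS:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w and mask[ny][nx]
                            and rate[ny][nx] <= threshold and labels[ny][nx] == -1):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
            count += 1
    return labels, count


# Toy background: a tree at 5 m on the left, a house at 20 m on the right.
depth = [[5.0] * 3 + [20.0] * 3 for _ in range(4)]
mask = [[True] * 6 for _ in range(4)]
labels, count = split_background(depth, mask, threshold=1.0)
print(count)      # 2
print(labels[0])  # [0, 0, -1, -1, 1, 1]
```

Boundary pixels keep the label -1 in this sketch. A fuller version might merge each one into the adjacent sub-area with the closest depth.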
  • Performing different degrees of blurring on different sub-regions to obtain the processed current image frame specifically includes two steps. First, the motion vector of each of the multiple sub-regions is acquired; the motion vector of a sub-region indicates the motion of that sub-region relative to the previous image frame. Second, each sub-region is blurred according to its motion vector to obtain the processed current image frame.
  • The camera generally rotates or translates when tracking the target object. When the camera moves, objects at different distances from the camera move differently (that is, to different degrees) relative to the camera.
  • Closer objects move more and farther objects move less, and this difference is visible in successive image frames captured by the camera.
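The depth dependence can be made concrete with a pinhole camera model. When the camera translates sideways by T between two frames, a static point at depth Z shifts by f·T/Z pixels, where f is the focal length in pixels. Under the same model, a pure rotation about the optical centre moves image points independently of their depth. The depth-dependent part therefore comes from camera translation. The helper name and the numbers below are illustrative assumptions.

```python
def image_shift_px(focal_px, translation_m, depth_m):
    """Image-plane shift (pixels) of a static point at depth depth_m when a
    pinhole camera with focal length focal_px (pixels) translates sideways
    by translation_m between two frames: shift = f * T / Z."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_px * translation_m / depth_m


# Hypothetical numbers: f = 1000 px and 0.1 m of sideways camera motion.
print(image_shift_px(1000.0, 0.1, 5.0))   # tree at 5 m   -> 20.0 px
print(image_shift_px(1000.0, 0.1, 20.0))  # house at 20 m -> 5.0 px
```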
  • The objects corresponding to different sub-regions move differently relative to the camera. Take the image frame preceding the current image frame as the reference. Different sub-regions of the background region of the current image frame then move differently relative to that previous image frame. For example, suppose that the background region of the current image frame contains two sub-regions A and B.
  • The terminal device may obtain the motion vector of each sub-area, which indicates the motion of the sub-area relative to the previous image frame. The terminal device then blurs each sub-region according to its motion vector. After all sub-regions are blurred, the background region of the current image frame has a realistic blur effect.
  • The terminal device divides the background area into multiple sub-areas according to the depth information. Since the objects corresponding to different sub-regions are at different distances from the camera, the sub-regions also move differently relative to the previous image frame. Therefore, the terminal device can acquire the motion vector of each sub-region, which indicates the motion of the sub-region relative to the previous image frame. Because different sub-areas move differently, their motion vectors also differ. The terminal device can therefore blur each sub-area according to its own motion vector. Different sub-regions are blurred to different degrees, so the background region of the current image frame has a more realistic blur effect.
  • The motion vector of each sub-area includes the motion speed of the sub-area and the motion direction of the sub-area.
  • Acquiring the motion vector of each sub-area includes the following, for each of the multiple sub-areas. The movement speed of the sub-area is determined according to the movement speed of at least one target pixel in the sub-area from the previous image frame to the current image frame. The movement direction of the sub-area is determined according to the movement direction of the at least one target pixel over the same interval.
  • For example, the terminal device can take the average movement speed of the target pixels in a sub-area, measured from the previous image frame to the current image frame, as the movement speed of the sub-area. Further, the terminal device can determine the movement direction of the sub-area according to the movement direction of the at least one target pixel from the previous image frame to the current image frame.
  • The movement directions of these target pixels are usually the same, so the terminal device uses the movement direction of these target pixels as the movement direction of the sub-area.
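The per-sub-area motion vector described above can be sketched as follows. This is a minimal sketch, assuming each target pixel's motion is given as a displacement (dx, dy) in pixels from the previous frame to the current frame. Speed is averaged as the text describes. For direction, the sketch takes a circular mean of the unit direction vectors. That choice is an assumption; it is robust when the directions are nearly but not exactly the same. The function name is hypothetical.

```python
import math


def subregion_motion(displacements):
    """Average speed (pixels/frame) and mean direction (radians) of the
    target pixels of one sub-area, from their (dx, dy) displacements."""
    if not displacements:
        raise ValueError("sub-area has no tracked target pixels")
    speeds = [math.hypot(dx, dy) for dx, dy in displacements]
    speed = sum(speeds) / len(speeds)
    # Circular mean of unit vectors: e.g. 359 deg and 1 deg average to 0 deg, not 180 deg.
    sx = sum(dx / s for (dx, _), s in zip(displacements, speeds) if s > 0)
    sy = sum(dy / s for (_, dy), s in zip(displacements, speeds) if s > 0)
    return speed, math.atan2(sy, sx)


# Two corner points in one sub-area moving the same way at different speeds.
print(subregion_motion([(3.0, 4.0), (6.0, 8.0)]))  # speed 7.5, direction about 0.927 rad
```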
  • The terminal device can thus estimate the motion speed and motion direction of each sub-region more accurately, that is, estimate relatively accurately how each sub-region moves relative to the previous image frame.
  • Blurring each sub-region according to its motion vector specifically includes, for each sub-region, two steps. First, the convolution kernel corresponding to the sub-region is constructed according to the movement speed and movement direction of the sub-region. Second, convolution processing is performed on the sub-region with that convolution kernel.
  • The motion vectors of different sub-regions differ (generally, their motion speeds differ while their motion directions are the same), so the convolution kernels constructed for different sub-regions from their motion vectors also differ.
  • Convolving each sub-region with its own kernel blurs different sub-regions to different degrees, so that the background region of the current image frame has a more realistic blur effect.
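A linear motion-blur kernel can be built from a sub-region's speed and direction and applied only inside that sub-region, as sketched below. The source does not specify how the kernel is constructed. This sketch assumes the blur streak is one pixel long per pixel/frame of speed and is oriented along the motion direction. Samples outside the image are clamped to the nearest edge pixel. Both function names are hypothetical.

```python
import math


def motion_blur_kernel(speed, direction):
    """Normalised line kernel; assumed streak length = speed (px), angle = direction (rad)."""
    length = max(1, int(round(speed)))
    size = length if length % 2 == 1 else length + 1  # odd size so the kernel has a centre
    c = size // 2
    kernel = [[0.0] * size for _ in range(size)]
    dx, dy = math.cos(direction), math.sin(direction)
    # Rasterise `length` evenly spaced samples on a segment through the centre.
    for i in range(length):
        t = i - (length - 1) / 2.0
        kernel[int(round(c + t * dy))][int(round(c + t * dx))] = 1.0
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


def blur_subregion(image, labels, region_id, kernel):
    """Convolve only pixels labelled region_id; all other pixels are copied unchanged."""
    h, w = len(image), len(image[0])
    c = len(kernel) // 2
    out = [row[:] for row in image]
    for y in range(h):
        for x in range(w):
            if labels[y][x] != region_id:
                continue
            acc = 0.0
            for ky, krow in enumerate(kernel):
                for kx, weight in enumerate(krow):
                    if weight == 0.0:
                        continue
                    # True convolution (kernel flipped); clamp samples at the image border.
                    sy = min(max(y - (ky - c), 0), h - 1)
                    sx = min(max(x - (kx - c), 0), w - 1)
                    acc += weight * image[sy][sx]
            out[y][x] = acc
    return out


# A nearer sub-region moving 3 px/frame horizontally smears a bright point over 3 pixels.
print(blur_subregion([[0.0, 0.0, 9.0, 0.0, 0.0]], [[0] * 5], 0, motion_blur_kernel(3, 0.0)))
```

A sub-region moving faster receives a longer kernel and is therefore blurred more. This is how the depth layering carries through to the final blur.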
  • The at least one target pixel is a corner point.
  • The target pixels in a sub-area are generally the corner points in that sub-area. Since corner points have distinctive features, their movement better represents the movement of the sub-area.
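The source does not name a corner detector. The sketch below swaps in the Shi–Tomasi criterion: the smaller eigenvalue of the gradient structure tensor summed over a small window. A pixel qualifies as a corner when its response is at least a fraction (`quality`, an assumed parameter) of the strongest response in its sub-area. Images are lists of lists indexed `img[y][x]`. The window plus a one-pixel margin is assumed to lie inside the image.

```python
import math


def shi_tomasi_response(img, x, y, r=1):
    """Minimum eigenvalue of the structure tensor over a (2r+1)x(2r+1) window."""
    sxx = sxy = syy = 0.0
    for v in range(y - r, y + r + 1):
        for u in range(x - r, x + r + 1):
            ix = (img[v][u + 1] - img[v][u - 1]) / 2.0  # central differences
            iy = (img[v + 1][u] - img[v - 1][u]) / 2.0
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
    half_trace = (sxx + syy) / 2.0
    return half_trace - math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)


def detect_corners(img, region_pixels, quality=0.1, r=1):
    """Keep (x, y) pixels of one sub-area whose response is >= quality * best response."""
    scores = {(x, y): shi_tomasi_response(img, x, y, r) for x, y in region_pixels}
    best = max(scores.values(), default=0.0)
    if best <= 0.0:
        return []  # flat or purely edge-like sub-area: no reliable corners
    return [p for p, s in scores.items() if s >= quality * best]


# A bright square: its corner scores high; a point on its top edge and a flat interior point do not.
img = [[1.0 if 3 <= y <= 12 and 3 <= x <= 12 else 0.0 for x in range(16)] for y in range(16)]
print(detect_corners(img, [(3, 3), (8, 3), (8, 8)]))  # [(3, 3)]
```

Edges are rejected because only one eigenvalue is large there. This explains why corners track a sub-area's motion more reliably than edge points do.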
  • The movement speed and movement direction of the at least one target pixel are acquired by an optical flow method.
  • Using the optical flow method, the terminal device can determine the moving distance of the target pixel from the previous image frame to the current image frame, the position of the target pixel in the previous image frame, and the position of the target pixel in the current image frame.
  • The terminal device can then determine the movement speed of the target pixel from its moving distance between the previous image frame and the current image frame. It determines the movement direction of the target pixel from the pixel's positions in the previous image frame and in the current image frame.
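The source says only "an optical flow method". The sketch below swaps in a single-window Lucas–Kanade estimate, one common choice. It solves the 2×2 least-squares system built from spatial gradients (central differences) and the temporal difference around one target pixel. The flow vector is then converted into a speed (pixels/frame) and a direction (radians). The sketch assumes small motion between frames and a window that stays at least one pixel inside the image. The helper names are hypothetical.

```python
import math


def lucas_kanade_at(prev, curr, cx, cy, r=2):
    """Flow (u, v) at (cx, cy) from one Lucas-Kanade solve over a (2r+1)^2 window."""
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(cy - r, cy + r + 1):
        for x in range(cx - r, cx + r + 1):
            ix = (prev[y][x + 1] - prev[y][x - 1]) / 2.0
            iy = (prev[y + 1][x] - prev[y - 1][x]) / 2.0
            it = curr[y][x] - prev[y][x]
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-9:
        return None  # textureless or edge-only window (aperture problem)
    # Solve [[sxx, sxy], [sxy, syy]] @ [u, v] = -[sxt, syt].
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


def flow_to_speed_direction(u, v):
    """Speed = length of the displacement, direction = its angle."""
    return math.hypot(u, v), math.atan2(v, u)


# Synthetic frames: a smooth pattern shifted 0.1 px to the right.
prev = [[float(x * x + y * y) for x in range(10)] for y in range(10)]
curr = [[(x - 0.1) ** 2 + float(y * y) for x in range(10)] for y in range(10)]
u, v = lucas_kanade_at(prev, curr, 5, 5)
print(flow_to_speed_direction(u, v))  # u is close to 0.1 and v close to 0
```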
  • Acquiring the depth information of the background area of the current image frame specifically includes the following. The depth value of each pixel in the current image frame is acquired, and the background area of the current image frame is acquired. Then, from the depth values of all pixels in the current image frame, the depth value of each pixel in the background area is determined.
  • The current image frame includes a foreground area and a background area, so the terminal device may perform area segmentation on the current image frame to obtain its background area.
  • The terminal device can also obtain the depth values of all pixels in the current image frame and pick out the depth value of each pixel in the background area. It then uses these depth values to divide the background area of the current image frame into multiple sub-regions.
  • Acquiring the depth value of each pixel in the current image frame specifically includes: acquiring the depth value of each pixel in the current image frame through a first neural network.
  • The first neural network can perform accurate monocular depth estimation on the current image frame to obtain the depth values of all pixels in the current image frame.
  • When the camera is a depth camera, acquiring the depth value of each pixel in the current image frame specifically includes: acquiring the depth value of each pixel in the current image frame through the depth camera.
  • The depth camera can accurately acquire the depth values of all pixels in the current image frame.
  • Acquiring the background area of the current image frame specifically includes: acquiring the background area of the current image frame through a second neural network.
  • The second neural network can perform accurate salient object detection on the current image frame. This distinguishes the foreground area from the background area of the current image frame, which yields its background area.
  • The depth camera is a time of flight (TOF) camera or a structured light camera.
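For context on the two depth-camera types named above, the textbook relations are sketched below. Pulsed TOF halves the round-trip light path. Continuous-wave TOF converts the measured phase shift of modulated light into distance. Structured light triangulates between projector and camera like a rectified stereo pair, so Z = f·B/d. These are simplified models with hypothetical function names. They are not a description of any particular sensor used by the embodiments.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def tof_depth_from_round_trip(round_trip_s):
    """Pulsed TOF: light goes to the object and back, so distance = c * t / 2."""
    return SPEED_OF_LIGHT * round_trip_s / 2.0


def tof_depth_from_phase(phase_rad, modulation_hz):
    """Continuous-wave TOF: d = c * phi / (4 * pi * f_mod), unambiguous up to c / (2 * f_mod)."""
    return SPEED_OF_LIGHT * phase_rad / (4.0 * math.pi * modulation_hz)


def structured_light_depth(focal_px, baseline_m, disparity_px):
    """Simplified projector-camera triangulation: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


print(tof_depth_from_round_trip(20e-9))       # about 3.0 m
print(tof_depth_from_phase(math.pi, 20e6))    # about 3.75 m
print(structured_light_depth(600.0, 0.05, 10.0))
```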
  • The first neural network or the second neural network is any one of a multilayer perceptron, a convolutional neural network, a recursive neural network, and a recurrent neural network.
  • A second aspect of the embodiments of the present application provides a model training method with three steps. First, the depth value of each pixel in an image frame to be trained is obtained through a first model to be trained. Second, a preset target loss function is used to calculate the deviation between these depth values and the true depth value of each pixel in that image frame. Third, the parameters of the first model to be trained are updated according to the deviation until the model training condition is met, which yields the first neural network.
  • The first neural network trained by this method can perform accurate monocular depth estimation on any image frame, thereby obtaining the depth values of all pixels in the image frame.
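The training loop described above (predict, measure the deviation with a loss, update parameters, stop once a training condition holds) can be illustrated with a deliberately tiny stand-in model. This sketch assumes a hypothetical one-feature linear "depth model" (depth = w·feature + b), mean squared error as the target loss, plain gradient descent, and "loss below `tol` or `max_iters` reached" as the training condition. A real first neural network would be far larger, and backpropagation would compute its gradients, but the update step has the same shape.

```python
def train_depth_regressor(features, true_depth, lr=0.05, tol=1e-6, max_iters=10_000):
    """Fit depth = w * feature + b by gradient descent on the MSE loss."""
    w, b = 0.0, 0.0
    n = len(features)
    loss = float("inf")
    for _ in range(max_iters):
        preds = [w * f + b for f in features]                # forward pass
        errors = [p - t for p, t in zip(preds, true_depth)]  # deviation per pixel
        loss = sum(e * e for e in errors) / n                # target loss (MSE)
        if loss < tol:                                       # training condition met
            break
        grad_w = 2.0 * sum(e * f for e, f in zip(errors, features)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= lr * grad_w                                     # parameter update
        b -= lr * grad_b
    return w, b, loss


# Toy "pixels" whose true depth is 2 * feature + 1.
w, b, loss = train_depth_regressor([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(w, b, loss)  # w close to 2, b close to 1, loss below 1e-6
```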
  • A third aspect of the embodiments of the present application provides a model training method with three steps. First, the background area of an image frame to be trained is obtained through a second model to be trained. Second, a preset target loss function is used to calculate the deviation between the obtained background area and the real background area of that image frame. Third, the parameters of the second model to be trained are updated according to the deviation until the model training condition is met, which yields the second neural network.
  • The second neural network trained by this method can accurately detect salient objects in any image frame, thereby obtaining the background area of the image frame.
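The source does not specify the "preset target loss function" for the background-segmentation model. For a per-pixel foreground/background decision, mean binary cross-entropy is a common choice, so this sketch assumes it. The model outputs a background probability per pixel, and the ground truth is a 0/1 background mask. Both are flattened into lists, and the function name is hypothetical.

```python
import math


def background_bce(pred_prob, true_mask, eps=1e-7):
    """Mean binary cross-entropy between predicted background probabilities
    and a 0/1 ground-truth background mask (flat lists of equal length)."""
    if len(pred_prob) != len(true_mask) or not pred_prob:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for p, t in zip(pred_prob, true_mask):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(pred_prob)


print(background_bce([0.5, 0.5], [1, 0]))  # ln 2, an uninformative prediction
print(background_bce([0.9, 0.1], [1, 0]))  # smaller: a confident, correct prediction
```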
  • A fourth aspect of the embodiments of the present application provides an image processing apparatus, which is the aforementioned terminal device. The apparatus includes three modules. An acquisition module is configured to acquire depth information of the background area of the current image frame. A division module is configured to divide the background area into multiple sub-areas according to the depth information; the objects corresponding to different sub-areas are at different distances from the camera that captures the current image frame. A processing module is configured to blur different sub-areas to different degrees to obtain the processed current image frame.
  • After acquiring the depth information of the background area of the current image frame, the terminal device divides the background area into multiple sub-areas according to the depth information. Since the objects corresponding to different sub-areas are at different distances from the camera, the different sub-areas also move differently relative to the previous image frame. Therefore, the terminal device can blur different sub-areas to different degrees, so that the background area of the current image frame has a more realistic blur effect.
  • The depth information of the background area of the current image frame includes the depth value of each pixel in the background area of the current image frame.
  • The division module is specifically configured to determine the depth change rate of each pixel in the background area of the current image frame from the depth values in the background area. The depth change rate of each pixel is determined based on the depth value of that pixel and the depth values of the surrounding pixels. The division module then divides the background area into multiple sub-areas according to the depth change rate of each pixel and a preset change rate threshold.
  • The processing module is specifically configured to acquire the motion vector of each of the multiple sub-regions; the motion vector of each sub-region indicates its motion relative to the previous image frame. The processing module then blurs each sub-region according to its motion vector to obtain the processed current image frame.
  • The processing module is specifically configured to perform the following for each of the multiple sub-areas. It determines the movement speed of the sub-area according to the movement speed of at least one target pixel in the sub-area from the previous image frame to the current image frame. It determines the movement direction of the sub-area according to the movement direction of the at least one target pixel over the same interval.
  • The processing module is specifically configured to: for each sub-region, construct the convolution kernel corresponding to the sub-region according to the movement speed and movement direction of the sub-region, and perform convolution processing on the sub-region with that convolution kernel.
  • The at least one target pixel is a corner point.
  • The movement speed and movement direction of the at least one target pixel are acquired by an optical flow method.
  • The acquisition module is specifically configured to acquire the depth value of each pixel in the current image frame and the background area of the current image frame. It then determines, from the depth values of all pixels in the current image frame, the depth value of each pixel in the background area.
  • The acquisition module is specifically configured to acquire the depth value of each pixel in the current image frame through the first neural network.
  • When the camera is a depth camera, the acquisition module is specifically configured to acquire the depth value of each pixel in the current image frame through the depth camera.
  • The acquisition module is specifically configured to acquire the background area of the current image frame through the second neural network.
  • The depth camera is a TOF camera or a structured light camera.
  • The first neural network or the second neural network is any one of a multilayer perceptron, a convolutional neural network, a recursive neural network, and a recurrent neural network.
  • A fifth aspect of the embodiments of the present application provides a model training apparatus with three modules. An acquisition module is configured to acquire the depth value of each pixel in an image frame to be trained through a first model to be trained. A calculation module is configured to use a preset target loss function to calculate the deviation between these depth values and the true depth value of each pixel in that image frame. An update module is configured to update the parameters of the first model to be trained according to the deviation until the model training condition is met, which yields the first neural network.
  • The first neural network trained by this apparatus can perform accurate monocular depth estimation on any image frame, thereby obtaining the depth values of all pixels in the image frame.
  • A sixth aspect of the embodiments of the present application provides a model training apparatus with three modules. An acquisition module is configured to acquire the background area of an image frame to be trained through a second model to be trained. A calculation module is configured to use a preset target loss function to calculate the deviation between the obtained background area and the real background area of that image frame. An update module is configured to update the parameters of the second model to be trained according to the deviation until the model training condition is met, which yields the second neural network.
  • The second neural network trained by this apparatus can accurately detect salient objects in any image frame, thereby obtaining the background area of the image frame.
  • A seventh aspect of the embodiments of the present application provides an image processing apparatus including a memory and a processor. The memory stores code, and the processor is configured to execute the code; when the code is executed, the image processing apparatus performs the method according to the first aspect or any possible implementation of the first aspect.
  • An eighth aspect of the embodiments of the present application provides a model training apparatus including a memory and a processor. The memory stores code, and the processor is configured to execute the code; when the code is executed, the model training apparatus performs the method according to the second aspect or the third aspect.
  • A ninth aspect of the embodiments of the present application provides a circuit system including a processing circuit. The processing circuit is configured to perform the method according to the first aspect, any possible implementation of the first aspect, the second aspect, or the third aspect.
  • A tenth aspect of the embodiments of the present application provides a chip system including a processor. The processor is configured to call a computer program or computer instructions stored in a memory, so that it performs the method according to the first aspect, any possible implementation of the first aspect, the second aspect, or the third aspect.
  • The processor is coupled to the memory through an interface.
  • The chip system further includes a memory, and the memory stores the computer program or computer instructions.
  • An eleventh aspect of the embodiments of the present application provides a computer storage medium storing a computer program. When the program is executed by a computer, the computer is enabled to implement the method according to the first aspect or any possible implementation of the first aspect.
  • A twelfth aspect of the embodiments of the present application provides a computer program product storing instructions. When executed by a computer, the instructions cause the computer to implement the method according to the first aspect, any possible implementation of the first aspect, the second aspect, or the third aspect.
  • After acquiring the depth information of the background area of the current image frame, the terminal device divides the background area into multiple sub-areas according to the depth information. Since the objects corresponding to different sub-regions are at different distances from the camera, the sub-regions also move differently relative to the previous image frame. Therefore, the terminal device can obtain the motion vector of each sub-region, which indicates the motion of the sub-region relative to the previous image frame. Because different sub-areas move differently, their motion vectors also differ. The terminal device can therefore blur each sub-area according to its own motion vector. Different sub-regions are blurred to different degrees, so the background region of the current image frame has a more realistic blur effect.
  • FIG. 1 is a schematic structural diagram of the main framework of artificial intelligence.
  • FIG. 2a is a schematic structural diagram of an image processing system provided by an embodiment of the present application.
  • FIG. 2b is another schematic structural diagram of an image processing system provided by an embodiment of the present application.
  • FIG. 2c is a schematic diagram of a related device for image processing provided by an embodiment of the present application.
  • FIG. 3a is a schematic diagram of the architecture of the system 100 provided by the embodiment of the present application.
  • FIG. 3b is a schematic diagram of a panning shot.
  • FIG. 4 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of an application scenario of the image processing method provided by the embodiment of the present application.
  • FIG. 6 is a schematic diagram of an application example of the image processing method provided by the embodiment of the present application.
  • FIG. 7 is another schematic diagram of an application example of the image processing method provided by the embodiment of the present application.
  • FIG. 8 is a schematic flowchart of a model training method provided by an embodiment of the present application.
  • FIG. 9 is another schematic flowchart of a model training method provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a model training apparatus provided by an embodiment of the application.
  • FIG. 12 is another schematic structural diagram of a model training apparatus provided by an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of an execution device provided by an embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of a training device provided by an embodiment of the application.
  • FIG. 15 is a schematic structural diagram of a chip provided by an embodiment of the present application.
  • Panning is usually difficult to perform, and its results are hard to control.
  • The user can capture a set of image frames with the terminal device at a high shutter speed (because the shutter speed is high, the background area of these image frames shows no obvious blur effect). The current image frame (that is, any image frame in the group) is then processed with frame mixing, so that the background area of the current image frame has a blur effect.
  • AI technology is a technical discipline that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence. AI technology obtains the best results by perceiving the environment, acquiring knowledge and using knowledge.
  • Artificial intelligence technology is a branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can respond in a way similar to human intelligence.
  • Image processing using artificial intelligence is a common application of artificial intelligence.
  • FIG. 1 is a schematic structural diagram of the main framework of artificial intelligence.
  • The artificial intelligence main framework is described below along two dimensions: the "intelligent information chain" and the "IT value chain".
  • The "intelligent information chain" reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data goes through the refinement "data, information, knowledge, wisdom".
  • The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry. It spans from the underlying infrastructure and information (the technologies for providing and processing it) up to the industrial ecosystem of the system.
  • The infrastructure provides computing power support for the artificial intelligence system, enables communication with the outside world, and provides support through the basic platform. The infrastructure communicates with the outside world through sensors. Computing power is provided by smart chips (hardware acceleration chips such as CPU, NPU, GPU, ASIC, and FPGA). The basic platform includes a distributed computing framework and network-related platform guarantees and support, which may include cloud storage and computing, interconnection networks, and so on. For example, sensors communicate with external parties to obtain data, and the data is provided for computation to the smart chips in the distributed computing system of the basic platform.
  • The data layer above the infrastructure represents the data sources in the field of artificial intelligence.
  • The data involves graphics, images, speech, and text, as well as Internet of Things (IoT) data from conventional devices, including business data from existing systems and sensed data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making, etc.
  • Machine learning and deep learning can perform symbolized and formalized intelligent information modeling, extraction, preprocessing, training, and so on, on data.
  • Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, using formalized information to carry out machine thinking and problem solving according to a reasoning control strategy; its typical functions are search and matching.
  • Decision-making refers to the process of making decisions after intelligent information is reasoned, usually providing functions such as classification, sorting, and prediction.
  • Based on the results of data processing, some general capabilities can be formed, such as algorithms or general-purpose systems for translation, text analysis, computer vision processing, speech recognition, image recognition, and so on.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They encapsulate the overall artificial intelligence solution and productize intelligent information decision-making for practical deployment. Application areas mainly include intelligent terminals, intelligent transportation, smart healthcare, autonomous driving, smart cities, and so on.
  • FIG. 2a is a schematic structural diagram of an image processing system provided by an embodiment of the present application, where the image processing system includes a user equipment and a data processing device.
  • the user equipment includes smart terminals such as mobile phones, personal computers, or information processing centers.
  • The user equipment is the initiator of image processing: the user usually initiates an image processing request through the user equipment.
  • the above-mentioned data processing device may be a device or server with data processing functions, such as a cloud server, a network server, an application server, and a management server.
  • The data processing device receives the image processing request from the smart terminal through an interactive interface, and then performs image processing by means of machine learning, deep learning, search, reasoning, decision-making, and so on, using a memory that stores data and a processor that processes data.
  • the memory in the data processing device may be a general term, including local storage and a database for storing historical data.
  • the database may be on the data processing device or on other network servers.
  • The user equipment can receive instructions from the user. For example, the user equipment can acquire an image input or selected by the user and then send a request to the data processing device, so that the data processing device executes an image processing application (e.g., image depth estimation, image object detection, or image blurring) on the image obtained by the user equipment, producing the corresponding processing result for that image.
  • the user equipment may acquire an image input by the user, and then initiate an image depth estimation request to the data processing device, so that the data processing device performs monocular depth estimation on the image, thereby obtaining the depth information of the image.
  • the data processing device may execute the image processing method of the embodiment of the present application.
  • Fig. 2b is another schematic structural diagram of the image processing system provided by the embodiment of the application.
  • Here, the user equipment itself serves as the data processing device: it obtains input directly from the user, and the processing is performed directly by the hardware of the user equipment.
  • the specific process of the processing is similar to that of FIG. 2a, and reference may be made to the above description, which will not be repeated here.
  • the user equipment can receive instructions from the user, for example, the user equipment can acquire an image selected by the user in the user equipment, and then the user equipment can execute an image processing application (for example, image depth estimation, image object detection, image blur processing, etc.), so as to obtain the corresponding processing result for the image.
  • the user equipment itself can execute the image processing method of the embodiment of the present application.
  • FIG. 2c is a schematic diagram of a related device for image processing provided by an embodiment of the present application.
  • The user equipment in FIGS. 2a and 2b may specifically be the local device 301 or the local device 302 in FIG. 2c, and the data processing device in FIG. 2a may specifically be the execution device 210 in FIG. 2c. The data storage system 250 stores the data to be processed by the execution device 210; it may be integrated in the execution device 210 or deployed in the cloud or on another network server.
  • The processors in FIGS. 2a and 2b may perform data training, machine learning, or deep learning through a neural network model or another model (e.g., a model based on a support vector machine), and use the finally trained or learned model to execute an image processing application on the image, thereby obtaining the corresponding processing result.
  • Fig. 3a is a schematic diagram of the architecture of the system 100 provided by the embodiment of the present application.
  • The execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices. A user may input data to the I/O interface 112 through the client device 140; in this embodiment of the present application, the input data may include tasks to be scheduled, callable resources, and other parameters.
  • The execution device 110 may call data, code, and the like in the data storage system 150 for the corresponding processing, and may also store the data, code, and the like obtained through that processing in the data storage system 150.
  • the I/O interface 112 returns the processing results to the client device 140 for provision to the user.
  • The training device 120 can generate corresponding target models/rules from different training data for different goals or tasks; these target models/rules can then be used to achieve those goals or complete those tasks, thereby providing the user with the desired result.
  • the training data may be stored in the database 130 and come from training samples collected by the data collection device 160 .
  • The user can specify the input data manually, for example through an interface provided by the I/O interface 112.
  • Alternatively, the client device 140 can send input data to the I/O interface 112 automatically. If automatic sending requires the user's authorization, the user can set the corresponding permission in the client device 140.
  • the user can view the result output by the execution device 110 on the client device 140, and the specific presentation form can be a specific manner such as display, sound, and action.
  • The client device 140 can also act as a data collection terminal, collecting the input data fed into the I/O interface 112 and the output results returned from the I/O interface 112, as shown in the figure, as new sample data and storing them in the database 130.
  • Alternatively, the I/O interface 112 can itself store the input data fed into it and the output results it returns, as shown in the figure, directly in the database 130 as new sample data.
  • FIG. 3a is only a schematic diagram of a system architecture provided by an embodiment of the present application; the positional relationships among the devices, components, modules, etc. shown in the figure do not constitute any limitation.
  • For example, in FIG. 3a the data storage system 150 is external memory relative to the execution device 110; in other cases, the data storage system 150 may be placed in the execution device 110.
  • the neural network can be obtained by training according to the training device 120.
  • An embodiment of the present application also provides a chip, where the chip includes a neural network processor NPU.
  • The chip can be set in the execution device 110 shown in FIG. 3a to complete the computation of the calculation module 111.
  • the chip can also be set in the training device 120 as shown in FIG. 3a to complete the training work of the training device 120 and output the target model/rule.
  • The neural network processor (NPU) is mounted on the host central processing unit (host CPU) as a coprocessor, and the host CPU allocates tasks to it.
  • the core part of the NPU is an arithmetic circuit, and the controller controls the arithmetic circuit to extract the data in the memory (weight memory or input memory) and perform operations.
  • the arithmetic circuit includes multiple processing units (process engines, PEs).
  • In some implementations, the arithmetic circuit is a two-dimensional systolic array.
  • The arithmetic circuit may alternatively be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition.
  • In some implementations, the arithmetic circuit is a general-purpose matrix processor.
  • For example, given an input matrix A and a weight matrix B, the arithmetic circuit fetches the data of matrix B from the weight memory and buffers it on each PE in the arithmetic circuit.
  • The arithmetic circuit then fetches the data of matrix A from the input memory, performs the matrix operation with matrix B, and stores the partial or final results of the output matrix in an accumulator.
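The dataflow above (B buffered on the PEs, A streamed in, partial results summed in an accumulator) can be mimicked functionally in plain Python. This is a minimal sketch, not a model of systolic-array timing; the function name and the `tile_k` parameter (how much of the shared dimension is consumed per pass) are illustrative assumptions.

```python
def matmul_with_accumulator(a, b, tile_k=2):
    """Compute C = A x B by consuming the shared K dimension in tiles.

    a: M x K matrix (list of rows), read from the "input memory"
    b: K x N matrix (list of rows), buffered on the PEs from the "weight memory"
    tile_k: hypothetical tile size along K
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of A and B must match")
    # The accumulator holds partial results until every K tile has been consumed.
    acc = [[0.0] * n for _ in range(m)]
    for k0 in range(0, k, tile_k):
        k1 = min(k0 + tile_k, k)
        for i in range(m):
            for j in range(n):
                acc[i][j] += sum(a[i][kk] * b[kk][j] for kk in range(k0, k1))
    return acc  # final result once all tiles are summed


A = [[1, 2, 3], [4, 5, 6]]
B = [[1, 0], [0, 1], [1, 1]]
print(matmul_with_accumulator(A, B))  # [[4.0, 5.0], [10.0, 11.0]]
```

After the first tile the accumulator holds only partial sums; it equals the true product only after the last tile, which mirrors the "partial result or final result" wording above.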
  • the vector calculation unit can further process the output of the arithmetic circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison and so on.
  • the vector computing unit can be used for network computation of non-convolutional/non-FC layers in neural networks, such as pooling, batch normalization, local response normalization, etc.
  • the vector computation unit can store the processed output vector to a unified buffer.
  • the vector computing unit may apply a nonlinear function to the output of the arithmetic circuit, such as a vector of accumulated values, to generate activation values.
  • the vector computation unit generates normalized values, merged values, or both.
  • the vector of processed outputs can be used as activation input to an arithmetic circuit, such as for use in subsequent layers in a neural network.
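As a hedged illustration of the vector unit's role, the sketch below normalizes a vector of accumulated values with given statistics and then applies a nonlinear function elementwise to produce activation values. ReLU is used here only as an example nonlinearity, and the function name and parameters are hypothetical.

```python
import math


def vector_unit(acc_vector, mean, var, eps=1e-5, gamma=1.0, beta=0.0):
    """Post-process a vector of accumulated values from the arithmetic circuit.

    Step 1: batch-normalization-style scaling with the given statistics.
    Step 2: elementwise ReLU as the nonlinear function producing activations.
    """
    scale = gamma / math.sqrt(var + eps)
    normalized = [(x - mean) * scale + beta for x in acc_vector]
    return [max(0.0, v) for v in normalized]


print(vector_unit([1.0, 3.0, 5.0], mean=3.0, var=4.0, eps=0.0))  # [0.0, 0.0, 1.0]
```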
  • Unified memory is used to store input data as well as output data.
  • The direct memory access controller (DMAC) transfers input data from the external memory to the input memory and/or the unified memory, stores weight data from the external memory into the weight memory, and stores data from the unified memory into the external memory.
  • The bus interface unit (BIU) implements interaction among the host CPU, the DMAC, and the instruction fetch buffer through a bus.
  • The instruction fetch buffer connected to the controller stores the instructions used by the controller.
  • The controller invokes the instructions cached in the instruction fetch buffer to control the working process of the operation accelerator.
  • The unified memory, input memory, weight memory, and instruction fetch buffer are all on-chip memories.
  • The external memory is memory outside the NPU.
  • The external memory may be double data rate synchronous dynamic random access memory (DDR SDRAM), high bandwidth memory (HBM), or another readable and writable memory.
  • A neural network can be composed of neural units. A neural unit can be an operation unit that takes $x_s$ and an intercept of 1 as inputs, and its output can be $h_{W,b}(x) = f(W^{T}x) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$,
  • where $s = 1, 2, \ldots, n$ and $n$ is a natural number greater than 1;
  • $W_s$ is the weight of $x_s$;
  • $b$ is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal. The output signal of this activation function can be used as the input of the next convolutional layer.
  • the activation function can be a sigmoid function.
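The neural-unit output $f\left(\sum_{s} W_s x_s + b\right)$ with a sigmoid activation can be written directly in Python. This is a minimal sketch; the function names are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neural_unit(x, w, b):
    """Output of one neural unit: f(sum_s W_s * x_s + b), with f = sigmoid."""
    if len(x) != len(w):
        raise ValueError("x and w must both have length n")
    z = sum(w_s * x_s for w_s, x_s in zip(w, x)) + b
    return sigmoid(z)


print(neural_unit([1.0, 2.0], [0.5, -0.25], 0.0))  # 0.5
```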
  • a neural network is a network formed by connecting many of the above single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field can be an area composed of several neural units.
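  • At a physical level, the work of each layer of a neural network, $\vec{y} = a(W\vec{x} + \vec{b})$, can be understood as transforming the input space into the output space through five operations: (1) raising or lowering the dimension; (2) enlarging or shrinking; (3) rotation; (4) translation; and (5) "bending". Operations 1, 2, and 3 are performed by $W\vec{x}$, operation 4 by $+\vec{b}$, and operation 5 by $a(\cdot)$.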
  • W is the weight vector, and each value in the vector represents the weight value of a neuron in the neural network of this layer.
  • This vector W determines the space transformation from the input space to the output space described above, that is, the weight W of each layer controls how the space is transformed.
  • the purpose of training the neural network is to finally obtain the weight matrix of all layers of the trained neural network (the weight matrix formed by the vectors W of many layers). Therefore, the training process of the neural network is essentially learning the way to control the spatial transformation, and more specifically, learning the weight matrix.
  • During training, the neural network can use the error back propagation (BP) algorithm to correct the values of the parameters in the initial neural network model, so that the reconstruction error loss of the model becomes smaller and smaller.
  • Specifically, the input signal is propagated forward until the output produces an error loss, and the parameters in the initial neural network model are updated by propagating the error loss information backward, so that the error loss converges.
  • The back propagation algorithm is a backward pass driven by the error loss, aimed at obtaining the optimal parameters of the neural network model, such as the weight matrices.
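The forward pass, backward pass, and parameter update described above can be sketched on the smallest possible model: a single sigmoid unit trained with squared-error loss by per-sample gradient descent. This is an illustrative sketch, not the training procedure of this application; the OR dataset, learning rate, and epoch count are arbitrary assumptions.

```python
import math


def forward(x, w, b):
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def loss(samples, w, b):
    # Squared-error loss summed over samples: L = 0.5 * sum (y - t)^2
    return sum(0.5 * (forward(x, w, b) - t) ** 2 for x, t in samples)


def train(samples, w, b, lr=0.5, epochs=1000):
    """Per-sample gradient descent; each step is a forward pass plus a backward pass."""
    w = list(w)
    for _ in range(epochs):
        for x, t in samples:
            y = forward(x, w, b)                                 # forward pass
            delta = (y - t) * y * (1.0 - y)                      # dL/dz via the chain rule
            w = [w_i - lr * delta * x_i for w_i, x_i in zip(w, x)]  # dL/dw_i = delta * x_i
            b -= lr * delta                                      # dL/db = delta
    return w, b


or_samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w0, b0 = [0.0, 0.0], 0.0
w1, b1 = train(or_samples, w0, b0)
print(loss(or_samples, w0, b0), ">", loss(or_samples, w1, b1))
```

In a multi-layer network the same `delta` term is propagated backward through each layer's weight matrix; one unit is enough to show the error loss shrinking as the parameters are corrected.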
  • Panning refers to a shooting technique that uses a slow shutter speed to track the target object.
  • Specifically, the photographer follows the moving target object, swinging the camera in the same direction at a speed close to that of the target while shooting.
  • This technique is mainly used for sports themes.
  • A panning shot is an image captured with the technique above. Such an image presents the artistic effect of a dynamically blurred background: its foreground area (containing the target object) is sharp, while its background area is blurred.
  • Figure 3b is a schematic diagram of a panning shot, in which the foreground area is the moving car and the background area is the surrounding environment and other objects near the car.
  • The model training method provided by the embodiments of the present application involves image processing and can be applied to data processing methods such as data training, machine learning, and deep learning, which perform symbolized and formalized intelligent information modeling, extraction, preprocessing, training, and so on, on training data (such as the image frames to be trained in the present application), finally obtaining trained neural networks (such as the first neural network and the second neural network in the present application).
  • The image processing method provided by the embodiments of the present application can use these trained neural networks: input data (such as the current image frame in the present application) is fed into a trained neural network to obtain output data (such as the depth information of the current image frame or the background area of the current image frame).
  • The model training method and the image processing method provided in the embodiments of this application are inventions based on the same concept; they can also be understood as two parts of one system, or as two stages of an overall process, such as a model training stage and a model application stage.
  • FIG. 4 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • The background area of an image frame processed by this method can have a realistic blur effect.
  • FIG. 5 is a schematic diagram of an application scenario of the image processing method provided by the embodiment of the present application.
  • The terminal device may select an image frame from a group of consecutive image frames and process it, so that the background area of that frame has a realistic dynamic blur effect.
  • The terminal device can also process every image frame in the group, so that the background area of each frame has a realistic dynamic blur effect.
  • the method includes:
  • A group of consecutive image frames can be captured at a relatively high shutter speed through the camera of the terminal device (i.e., the user equipment or client device described above).
  • the user may capture the set of image frames in various ways.
  • For example, the user can set the camera of the terminal device to burst (continuous shooting) mode and then long-press the shutter to acquire the group of image frames.
  • Alternatively, the user can press the shutter repeatedly to acquire the group of image frames.
  • Alternatively, the terminal device can use its perception technology to determine whether the current shooting scene is a specific scene (one in which the target object is moving), and if so, trigger continuous shooting or multiple shots to obtain the group of image frames.
  • Alternatively, the user can set the camera of the terminal device to video recording mode to obtain the group of image frames, and so on.
  • Each image frame includes a foreground area and a background area, both of which contain (present) photographed objects. The photographed object contained in the foreground area is generally the target object that the user pays attention to, and the photographed objects contained in the background area are non-target objects that the user does not pay attention to.
  • the subject contained in the foreground area may be a moving car, and the subject contained in the background area may be the sky, flowers and plants, roads, street lights, etc. around the car.
  • the subject contained in the foreground area may be a person skiing, and the subject contained in the background area may be a house, snow, trees, etc. around the person.
  • the terminal device may select any one of the image frames as the image frame to be processed, that is, the current image frame.
  • the terminal device can select the current image frame in various ways. For example, the terminal device may determine the current image frame from the group of image frames according to the instruction input by the user, that is, the current image frame is the image frame designated by the user. For another example, the terminal device may score each image frame of the group of image frames according to an aesthetic evaluation algorithm, and determine the image frame with the highest score as the current image frame.
  • The terminal device can obtain the depth information of the current image frame and the background area of the current image frame. The depth information of the current image frame consists of the depth value of each pixel in the current image frame.
  • The depth value of each pixel indicates the distance from the pixel's corresponding position in the actual environment (three-dimensional space) to the camera.
  • the depth information of the current image frame can be used to indicate the distance from each subject included in the current image frame to the camera, that is, the distance from the position of these subjects in the actual environment to the camera.
  • the terminal device can obtain the depth value of each pixel in the current image frame in various ways.
  • For example, the terminal device can obtain the depth value of each pixel in the current image frame through the first neural network; that is, it performs monocular depth estimation on the current image frame through the first neural network to obtain the depth values of all pixels in the current image frame.
  • Alternatively, if the terminal device has a depth camera, it can obtain the depth values of all pixels in the current image frame at the same time as it captures the frame through the depth camera.
  • The depth camera of the terminal device may be a time-of-flight (ToF) camera or a structured light camera.
  • the terminal device can also obtain the background area of the current image frame in various ways.
  • For example, the terminal device can obtain the background area of the current image frame through the second neural network. That is, the terminal device can perform salient object detection on the current image frame through the second neural network (directly detecting the most salient object in the current image frame, i.e., the target object) to directly separate the foreground area from the background area; or the terminal device can perform object detection (detecting each photographed object in the current image frame) and target segmentation (determining the target object among the photographed objects) on the current image frame through the second neural network.
  • the terminal device may divide a foreground area and a background area in the current image frame according to the user's instruction, and so on.
  • The first neural network can be a multi-layer perceptron (MLP), a convolutional neural network (CNN), a recursive neural network, a recurrent neural network (RNN), or another model,
  • and the second neural network can likewise be any one of an MLP, a CNN, a recursive neural network, an RNN, or another model, which is not limited here.
  • Both the first neural network and the second neural network in the embodiments of the present application are trained neural network models.
  • the following will briefly introduce the training process of the first neural network and the second neural network:
  • The foregoing process is performed for every image frame to be trained and is not repeated here. If only a small number of the image frames in the batch qualify, the parameters of the first model to be trained are adjusted and training is repeated with another batch of image frames to be trained, until a large number of image frames qualify; the result is the first neural network.
  • The foregoing process is performed for every image frame to be trained and is not repeated here. If only a small number of the image frames in the batch qualify, the parameters of the second model to be trained are adjusted and training is repeated with another batch of image frames to be trained, until a large number of image frames qualify; the result is the second neural network.
  • The terminal device can determine the depth information of the background area of the current image frame from the depth information of the current image frame. The depth information of the background area indicates the distance from each photographed object contained in the background area to the camera, that is, the distance from the position of each such object in the actual environment to the camera. Specifically, the terminal device determines which pixels of the current image frame are located in the background area; the depth values of those pixels are the depth values of all pixels in the background area of the current image frame.
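Selecting the background pixels' depth values amounts to masking the full depth map with the background area. The NumPy sketch below assumes the background area is available as a boolean mask of the same shape as the depth map (True = background); the function name is hypothetical, and marking non-background pixels as NaN is one convenient convention, not something the text specifies.

```python
import numpy as np


def background_depth(depth_map, background_mask):
    """Return the depth values of background pixels; non-background pixels become NaN."""
    depth = np.asarray(depth_map, dtype=float)
    mask = np.asarray(background_mask, dtype=bool)
    if depth.shape != mask.shape:
        raise ValueError("depth map and mask must have the same shape")
    return np.where(mask, depth, np.nan)


depth = [[1.0, 2.0], [3.0, 4.0]]
mask = [[True, False], [False, True]]
print(background_depth(depth, mask))  # background pixels keep their depth; others are NaN
```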
  • After obtaining the depth information of the background area of the current image frame, the terminal device can divide the background area into multiple sub-areas according to that depth information, where different sub-areas correspond to objects at different distances from the camera. Specifically, after obtaining the depth values of all pixels in the background area of the current image frame, the terminal device calculates the depth change rate of each pixel from the depth values, using the following formula:
  • G(i,j) is the depth change rate of the pixel
  • D(i,j) is the depth value of the pixel
  • D(i,j+1) is the depth value of a pixel surrounding the pixel.
  • the terminal device can obtain the depth change rate of all pixels in the background area of the current image frame.
  • The depth change rate of a pixel indicates the difference between the depth value of the pixel and the depth values of its surrounding pixels, that is, the difference between the distance from the pixel's corresponding position in the actual environment to the camera and the distances from the surrounding pixels' corresponding positions in the actual environment to the camera. Hence, when the depth change rate of a pixel is small, the distance from the pixel's actual position (its corresponding position in the actual environment) to the camera differs little from the distances from the surrounding pixels' actual positions to the camera.
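The exact formula for G(i,j) is not reproduced in this text, so the sketch below uses an assumed stand-in: the sum of absolute forward differences to the right neighbour D(i,j+1) and the lower neighbour D(i+1,j). It shows the qualitative behaviour described above (near zero on smooth depth, large where depth jumps) and should not be read as the application's own formula. The function name is hypothetical.

```python
import numpy as np


def depth_change_rate(depth):
    """Illustrative depth change rate (an assumed formula, not the patent's):
    G(i, j) = |D(i, j+1) - D(i, j)| + |D(i+1, j) - D(i, j)|.
    The last column/row is compared with a replicated copy of itself, giving 0 there."""
    d = np.asarray(depth, dtype=float)
    right = np.concatenate([d[:, 1:], d[:, -1:]], axis=1)  # D(i, j+1)
    down = np.concatenate([d[1:, :], d[-1:, :]], axis=0)   # D(i+1, j)
    return np.abs(right - d) + np.abs(down - d)


depth = [[1.0, 1.0, 5.0],
         [1.0, 1.0, 5.0]]
print(depth_change_rate(depth))  # large values only where the depth jumps (column 1)
```

In practice the rate would only be evaluated for background pixels, for example by masking the result with the background area.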
  • the terminal device may divide the background area of the current image frame into multiple sub-areas according to the depth change rate. Specifically, the terminal device may divide the background area of the current image frame into multiple sub-areas according to the depth change rate of each pixel in the background area of the current image frame and a preset change rate threshold.
  • The change rate threshold is equal or approximately equal to the depth change rate of the edge points of each sub-area. Because the threshold is generally set relatively large, the depth value of an edge point differs greatly from the depth values of the pixels around it.
  • In other words, the depth value changes abruptly at edge points: the distance from an edge point's actual position to the camera differs greatly from the distances from the surrounding pixels' actual positions to the camera. Therefore, from the depth change rate of each pixel in the background area of the current image frame and the preset change rate threshold, the edge points of each sub-area in the background area can be determined, and from them the multiple sub-areas. In this way, different sub-areas correspond to actual positions at different distances: photographed objects in the same sub-area are at the same or similar distances from the camera, while objects in different sub-areas are at noticeably different distances from the camera.
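One way to turn "edge points from a threshold" into sub-areas is to mark every background pixel whose change rate reaches the threshold as an edge point and then group the remaining background pixels into connected components. The sketch below does this with a 4-connected breadth-first search; the grouping strategy, the function name, and the label convention (0 = foreground or edge point) are assumptions for illustration. `scipy.ndimage.label` could replace the hand-written search.

```python
from collections import deque

import numpy as np


def split_into_subareas(change_rate, background_mask, threshold):
    """Label background sub-areas separated by edge points.

    A background pixel whose change rate is >= threshold is treated as an edge
    point; the remaining background pixels are grouped into 4-connected
    components. Returns an int label map (0 = foreground or edge point)."""
    g = np.asarray(change_rate, dtype=float)
    interior = np.asarray(background_mask, dtype=bool) & (g < threshold)
    labels = np.zeros(g.shape, dtype=int)
    rows, cols = g.shape
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if interior[r, c] and labels[r, c] == 0:
                next_label += 1              # start a new sub-area
                labels[r, c] = next_label
                queue = deque([(r, c)])
                while queue:                 # flood-fill the connected region
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and interior[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = next_label
                            queue.append((ny, nx))
    return labels


g = [[0.0, 4.0, 0.0], [0.0, 4.0, 0.0]]
bg = [[True, True, True], [True, True, True]]
print(split_into_subareas(g, bg, threshold=2.0))  # two sub-areas split by the edge column
```

Edge points keep label 0 here; assigning them to an adjacent sub-area afterwards is a further design choice not shown.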
  • For example, the terminal device divides the background area of the current image frame into three sub-areas according to the depth change rates of all pixels in the background area: the first sub-area is the road on which the car is driving, the second sub-area is the plants behind the road, and the third sub-area is the buildings behind the plants. The photographed objects in the first sub-area are closest to the camera, and those in the third sub-area are farthest from the camera.
  • When tracking the target object, the camera generally rotates or translates.
  • Objects at different distances therefore move differently relative to the camera (this can also be understood as different degrees of movement).
  • Closer objects move to a greater extent and farther objects move to a lesser extent, which is reflected in the successive image frames captured by the camera.
  • the objects corresponding to different sub-regions have different motions relative to the camera.
  • The position of a sub-area (which can also be understood as the photographed objects it contains) in the current image frame has necessarily changed relative to its position in the previous image frame, and different sub-areas change position differently; that is, different sub-areas move differently. Taking the previous image frame as a reference, different sub-areas of the background area of the current image frame therefore move differently relative to the previous image frame.
  • The first sub-area is the road on which the car is driving.
  • The second sub-area is the plants behind the road.
  • The third sub-area is the buildings behind the plants.
  • From the previous image frame to the current image frame, the first sub-area moves the most.
  • The second sub-area moves the second most.
  • The third sub-area moves the least.
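The ordering above follows from parallax. Assuming a pinhole camera with focal length f (in pixels) that translates by T parallel to the image plane, a static point at depth Z shifts by about f·T/Z pixels, so nearer sub-areas move more. The sketch covers the translation case only; the focal length, translation, and depths below are hypothetical numbers chosen for illustration.

```python
def image_shift_px(focal_px, translation_m, depth_m):
    """Approximate image-plane shift (pixels) of a static point at depth depth_m
    when a pinhole camera with focal length focal_px (pixels) translates by
    translation_m metres parallel to the image plane: shift = f * T / Z."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_px * translation_m / depth_m


# Hypothetical depths for the road, plants, and buildings in the example.
for name, z in [("road", 5.0), ("plants", 20.0), ("buildings", 100.0)]:
    print(name, image_shift_px(1000.0, 0.5, z))  # 100.0, 25.0, 5.0 px
```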
  • The terminal device can obtain the motion vector of each sub-area, which includes the motion speed and the motion direction of the sub-area.
  • The motion vector of a sub-area indicates how the sub-area has moved relative to the previous image frame.
  • For a sub-area, the terminal device may first perform corner detection on it to determine at least one target pixel (i.e., a corner); these target pixels are usually pixels with distinctive features within the sub-area.
  • the terminal device uses the optical flow method to determine the moving distance of this part of the target pixels from the previous image frame to the current image frame, the position of this part of the target pixels in the previous image frame, and the part of the target pixels in the current image frame. s position. Then, the terminal device calculates the movement speed of this part of the target pixels from the previous image frame to the current image frame according to the moving distance of this part of the target pixels and the time difference between the previous image frame and the current image frame, and according to this part of the target pixel The position of the pixel point in the previous image frame and the position of this part of the target pixel point in the current image frame determine the movement direction of this part of the target pixel point from the previous image frame to the current image frame.
  • the terminal device can determine the movement speed of the sub-area from the movement speed of these target pixels (for example, as the average movement speed of the target pixels), and take the movement direction of these target pixels as the movement direction of the sub-area.
  • after obtaining the motion speed and motion direction of each sub-area in the background area of the current image frame, the terminal device constructs, for each sub-area, a convolution kernel from the motion speed and motion direction of that sub-area, and then convolves the sub-area with its kernel. Because different sub-areas move differently, their convolution kernels also differ, so the terminal device can use these kernels to blur different sub-areas to different degrees. In this way, different sub-areas in the background area of the current image frame can have different degrees of blur, giving a layered and more realistic dynamic blurring effect.
  • after acquiring the depth information of the background area of the current image frame, the terminal device divides the background area into multiple sub-areas according to the depth information. Because the objects corresponding to different sub-areas are at different distances from the camera, the sub-areas also move differently relative to the previous image frame. The terminal device can therefore acquire the motion vector of each sub-area, which indicates the motion of the sub-area relative to the previous image frame. Since different sub-areas move differently, that is, their motion vectors differ, the terminal device blurs each sub-area according to its motion vector, blurring different sub-areas to different degrees so that the background area of the current image frame has a more realistic blurring effect.
  • FIG. 6 is a schematic diagram of an application example of the image processing method provided by the embodiment of the application
  • FIG. 7 is another schematic diagram of the application example of the image processing method provided by the embodiment of the application.
  • the application example includes:
  • after the terminal device determines the current image frame 601, it obtains the depth image 602 of the current image frame (that is, the depth information of the current image frame) through the first neural network, where areas of different colors in the depth image 602 are at different distances from the camera.
  • the terminal device obtains the salient image 603 of the current image frame through the second neural network, wherein the salient image 603 of the current image frame is used to highlight the background area of the current image frame, that is, the dark part of the salient image 603.
  • the terminal device combines the salient image 603 of the current image frame and the depth image 602 of the current image frame 601 to determine the depth image of the background area of the current image frame (ie, the depth information of the background area of the current image frame).
  • the terminal device can calculate the depth change rate of each pixel in the background area according to the depth image of the background area of the current image frame, and divide the background area of the current image frame into multiple sub-areas according to the size of the depth change rate .
  • taking the previous image frame 605 as a reference, the terminal device marks, in the current image frame 601, the motion vectors (including the motion speed and motion direction) of the corner points of each sub-area in the background area, and determines the motion speed and motion direction of each sub-area from the motion vectors of its corner points.
  • the terminal device determines the convolution kernel corresponding to the sub-area according to the movement speed and movement direction of the sub-area, and uses the convolution kernel corresponding to the sub-area to complete the convolution operation of the sub-area, so that the sub-area has a certain degree of blur effect.
  • different sub-regions can have different degrees of blurring effects, so that the background region of the current image frame has a layered blurring effect, that is, a more realistic blurring effect.
  • FIG. 8 is a schematic flowchart of the model training method provided by the embodiment of the present application. The method includes:
  • for steps 801 to 803, reference may be made to the relevant description of the training process of the first neural network in the aforementioned step 401, and details are not repeated here. It can be understood that the first neural network in the aforementioned step 401 can be obtained through steps 801 to 803, and the first neural network can perform accurate monocular depth estimation on any image frame, thereby obtaining the depth values of all pixels in the image frame.
  • FIG. 9 is another schematic flowchart of a model training method provided by an embodiment of the present application, and the method includes:
  • for steps 901 to 903, reference may be made to the relevant description of the training process of the second neural network in the foregoing step 401, and details are not repeated here. It can be understood that the second neural network in the aforementioned step 401 can be obtained through steps 901 to 903, and the second neural network can perform accurate salient object detection on any image frame, thereby obtaining the background area of the image frame.
  • FIG. 10 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application. As shown in FIG. 10 , the apparatus is the aforementioned terminal equipment, and the apparatus includes:
  • an acquisition module 1001, configured to acquire the depth information of the background area of the current image frame;
  • a division module 1002, configured to divide the background area into a plurality of sub-areas according to the depth information, where the objects corresponding to different sub-areas are at different distances from the camera, and the camera is used to capture the current image frame;
  • the processing module 1003 is configured to perform different degrees of blurring processing on different sub-regions to obtain a processed current image frame.
  • the depth information of the background area of the current image frame includes the depth value of each pixel in the background area of the current image frame
  • the dividing module 1002 is specifically configured to: determine the depth change rate of each pixel in the background area of the current image frame according to the depth values of the pixels in the background area, where the depth change rate of each pixel is determined from the depth value of the pixel and the depth values of the surrounding pixels; and divide the background area into multiple sub-areas according to the depth change rate of each pixel and a preset change rate threshold.
  • the processing module 1003 is specifically configured to: obtain, for the multiple sub-areas, a motion vector of each sub-area, where the motion vector of each sub-area indicates the motion of the sub-area relative to the previous image frame; and perform blurring processing on each sub-area according to its motion vector to obtain the processed current image frame.
  • the processing module 1003 is specifically configured to: for each sub-area in the plurality of sub-areas, determine the movement speed of the sub-area according to the movement speed of at least one target pixel in the sub-area from the previous image frame to the current image frame; and determine the movement direction of the sub-area according to the movement direction of the at least one target pixel from the previous image frame to the current image frame.
  • the processing module 1003 is specifically configured to: for each sub-area, construct a convolution kernel corresponding to the sub-area according to the movement speed and movement direction of the sub-area; and perform convolution processing on the sub-area using the convolution kernel corresponding to the sub-area.
  • the at least one target pixel is a corner point.
  • the movement speed and movement direction of at least one target pixel point are acquired by an optical flow method.
  • the obtaining module 1001 is specifically configured to: obtain the depth value of each pixel in the current image frame and the background area of the current image frame; and determine the depth value of each pixel in the background area of the current image frame from the depth values of all pixels in the current image frame.
  • the obtaining module 1001 is specifically configured to obtain the depth value of each pixel in the current image frame through the first neural network.
  • the camera is a depth camera
  • the acquiring module 1001 is specifically configured to acquire the depth value of each pixel in the current image frame through the depth camera.
  • the acquiring module 1001 is specifically configured to acquire the background area of the current image frame through the second neural network.
  • the depth camera is a TOF camera or a structured light camera.
  • the first neural network or the second neural network is any one of a multilayer perceptron, a convolutional neural network, a recursive neural network, and a recurrent neural network.
  • FIG. 11 is a schematic structural diagram of a model training apparatus provided by an embodiment of the application. As shown in FIG. 11 , the apparatus includes:
  • the calculation module 1102 is used to calculate the deviation between the depth value of each pixel in the image frame to be trained and the true depth value of each pixel in the image frame to be trained through a preset target loss function;
  • the updating module 1103 is configured to update the parameters of the first model to be trained according to the deviation until the model training conditions are met, and the first neural network is obtained.
  • FIG. 12 is another schematic structural diagram of a model training device provided by an embodiment of the present application. As shown in FIG. 12 , the device includes:
  • the acquisition module 1201 is used for acquiring the background area of the image frame to be trained through the second to-be-trained model
  • the calculation module 1202 is used to calculate the deviation between the background area of the image frame to be trained and the real background area of the image frame to be trained through a preset target loss function
  • the updating module 1203 is configured to update the parameters of the second model to be trained according to the deviation until the model training conditions are met, and a second neural network is obtained.
  • FIG. 13 is a schematic structural diagram of the execution device provided by the embodiment of the present application.
  • the execution device 1300 may specifically be represented as a mobile phone, a tablet, a notebook computer, a smart wearable device, a server, etc., which is not limited here.
  • the image processing apparatus described in the embodiment corresponding to FIG. 10 may be deployed on the execution device 1300 to implement the image processing function in the embodiment corresponding to FIG. 4 .
  • the execution device 1300 includes: a receiver 1301, a transmitter 1302, a processor 1303, and a memory 1304 (wherein the number of processors 1303 in the execution device 1300 may be one or more, and one processor is taken as an example in FIG. 13 ) , wherein the processor 1303 may include an application processor 13031 and a communication processor 13032 .
  • the receiver 1301, the transmitter 1302, the processor 1303, and the memory 1304 may be connected by a bus or otherwise.
  • Memory 1304 may include read-only memory and random access memory, and provides instructions and data to processor 1303 .
  • a portion of memory 1304 may also include non-volatile random access memory (NVRAM).
  • the memory 1304 stores processor-executable operating instructions, executable modules or data structures, or a subset or an extended set thereof, where the operating instructions may include various operating instructions for implementing various operations.
  • the processor 1303 controls the operation of the execution device.
  • various components of the execution device are coupled together through a bus system, where the bus system may include a power bus, a control bus, a status signal bus, and the like in addition to a data bus.
  • the various buses are referred to as bus systems in the figures.
  • the methods disclosed in the above embodiments of the present application may be applied to the processor 1303 or implemented by the processor 1303 .
  • the processor 1303 may be an integrated circuit chip, which has signal processing capability. In the implementation process, each step of the above-mentioned method can be completed by an integrated logic circuit of hardware in the processor 1303 or an instruction in the form of software.
  • the above-mentioned processor 1303 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the processor 1303 may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in conjunction with the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor.
  • the software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
  • the storage medium is located in the memory 1304, and the processor 1303 reads the information in the memory 1304, and completes the steps of the above method in combination with its hardware.
  • the receiver 1301 can be used to receive input numeric or character information and to generate signal inputs related to the settings and function control of the execution device.
  • the transmitter 1302 can be used to output digital or character information through the first interface; the transmitter 1302 can also be used to send instructions to the disk group through the first interface to modify the data in the disk group; the transmitter 1302 can also include a display device such as a display screen .
  • the processor 1303 is configured to execute the image processing method executed by the terminal device in the embodiment corresponding to FIG. 4 .
  • FIG. 14 is a schematic structural diagram of the training device provided by the embodiment of the present application.
  • the training device 1400 is implemented by one or more servers.
  • the training device 1400 may vary greatly due to different configurations or performances, and may include one or more central processing units (CPUs) 1414 (eg, one or more processors) and memory 1432, one or more storage media 1430 (eg, one or more mass storage devices) that store applications 1442 or data 1444.
  • the memory 1432 and the storage medium 1430 may be short-term storage or persistent storage.
  • the program stored in the storage medium 1430 may include one or more modules (not shown in the figure), and each module may include a series of instructions to operate on the training device. Further, the central processing unit 1414 may be configured to communicate with the storage medium 1430 to execute a series of instruction operations in the storage medium 1430 on the training device 1400 .
  • the training device 1400 may also include one or more power supplies 1426, one or more wired or wireless network interfaces 1450, and one or more input/output interfaces 1458, and/or one or more operating systems 1441, such as Windows Server™, Mac OS X™, Unix™, Linux™, and FreeBSD™.
  • the training device may perform the steps in the embodiment corresponding to FIG. 8 or FIG. 9 .
  • the embodiments of the present application also relate to a computer-readable storage medium storing a program for signal processing; when the program runs on a computer, it causes the computer to perform the steps performed by the aforementioned execution device, or to perform the steps performed by the aforementioned training device.
  • the embodiments of the present application also relate to a computer program product storing instructions which, when executed by a computer, cause the computer to execute the steps executed by the aforementioned execution device, or cause the computer to execute the steps executed by the aforementioned training device.
  • the execution device, training device, or terminal device provided in this embodiment of the present application may specifically be a chip, and the chip includes: a processing unit and a communication unit, the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, pins or circuits, etc.
  • the processing unit can execute the computer executable instructions stored in the storage unit, so that the chip in the execution device executes the data processing method described in the above embodiments, or the chip in the training device executes the data processing method described in the above embodiment.
  • the storage unit is a storage unit in the chip, such as a register, a cache, etc.
  • the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
  • FIG. 15 is a schematic structural diagram of a chip provided by an embodiment of the present application.
  • the chip may be embodied as a neural network processor NPU 1500, which is mounted as a coprocessor on a host CPU, and tasks are allocated by the host CPU.
  • the core part of the NPU is the arithmetic circuit 1503, which is controlled by the controller 1504 to extract the matrix data in the memory and perform multiplication operations.
  • the arithmetic circuit 1503 includes multiple processing units (Process Engine, PE). In some implementations, the arithmetic circuit 1503 is a two-dimensional systolic array. The arithmetic circuit 1503 may also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition. In some implementations, arithmetic circuit 1503 is a general-purpose matrix processor.
  • the arithmetic circuit fetches the data corresponding to the matrix B from the weight memory 1502 and buffers it on each PE in the arithmetic circuit.
  • the arithmetic circuit fetches the data of matrix A from the input memory 1501, performs a matrix operation with matrix B, and stores the partial or final results of the resulting matrix in the accumulator 1508.
  • Unified memory 1506 is used to store input data and output data.
  • weight data is transferred directly to the weight memory 1502 through the direct memory access controller (DMAC) 1505.
  • Input data is also moved into unified memory 1506 via the DMAC.
  • the bus interface unit (BIU) 1510 is used for interaction among the AXI bus, the DMAC, and the instruction fetch buffer (IFB) 1509.
  • the bus interface unit 1510 (Bus Interface Unit, BIU for short) is used for the instruction fetch memory 1509 to obtain instructions from the external memory, and also for the storage unit access controller 1505 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
  • the DMAC is mainly used to transfer the input data in the external memory DDR to the unified memory 1506 , the weight data to the weight memory 1502 , or the input data to the input memory 1501 .
  • the vector calculation unit 1507 includes a plurality of operation processing units and, if necessary, further processes the output of the operation circuit 1503, for example by vector multiplication, vector addition, exponential operations, logarithmic operations, or magnitude comparison. It is mainly used for non-convolutional and non-fully-connected layer computation in neural networks, such as batch normalization, pixel-level summation, and upsampling of feature planes.
  • the vector computation unit 1507 can store the vector of processed outputs to the unified memory 1506 .
  • the vector calculation unit 1507 may apply a linear or non-linear function to the output of the operation circuit 1503, for example to a vector of accumulated values, to generate activation values; it may also, for example, perform linear interpolation on the feature planes extracted by a convolutional layer.
  • the vector computation unit 1507 generates normalized values, pixel-level summed values, or both.
  • the vector of processed outputs can be used as activation input to the arithmetic circuit 1503, such as for use in subsequent layers in a neural network.
  • the instruction fetch buffer 1509 connected to the controller 1504 is used to store the instructions used by the controller 1504;
  • the unified memory 1506, the input memory 1501, the weight memory 1502 and the instruction fetch memory 1509 are all On-Chip memories. External memory is private to the NPU hardware architecture.
  • the processor mentioned in any one of the above can be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the above program.
  • the device embodiments described above are only schematic: the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • the connection relationship between the modules indicates that there is a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, training device, or data center to another website, computer, training device, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means.
  • the computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a training device or a data center, that integrates one or more usable media.
  • the usable media may be magnetic media (eg, floppy disks, hard disks, magnetic tapes), optical media (eg, DVD), or semiconductor media (eg, Solid State Disk (SSD)), and the like.
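The motion-vector step described above (corner points tracked from the previous image frame to the current one, with their speeds averaged into a sub-area speed) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: corner detection and optical-flow tracking (for example, Shi-Tomasi corners with pyramidal Lucas-Kanade flow in OpenCV) are assumed to have already produced matching point lists, and taking the direction of the mean displacement as the sub-area direction is our assumption. `MotionVector` and `subregion_motion_vector` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class MotionVector:
    speed: float      # average speed of the tracked corners, in pixels per second
    direction: float  # radians from the +x axis (image y axis points down)


def subregion_motion_vector(prev_pts, curr_pts, dt):
    """Combine per-corner displacements into one motion vector for a sub-area.

    prev_pts, curr_pts: matching lists of (x, y) corner positions in the previous
    and current frames, assumed to come from a corner detector plus an
    optical-flow tracker (not shown here).
    dt: time difference between the two frames, in seconds.
    """
    if not prev_pts or len(prev_pts) != len(curr_pts):
        raise ValueError("need the same non-zero number of points in both frames")
    if dt <= 0:
        raise ValueError("dt must be positive")
    speeds = []
    sum_dx = sum_dy = 0.0
    for (x0, y0), (x1, y1) in zip(prev_pts, curr_pts):
        dx, dy = x1 - x0, y1 - y0
        speeds.append(math.hypot(dx, dy) / dt)  # moving distance / time difference
        sum_dx += dx
        sum_dy += dy
    speed = sum(speeds) / len(speeds)       # average corner speed
    direction = math.atan2(sum_dy, sum_dx)  # direction of the mean displacement (assumed rule)
    return MotionVector(speed, direction)
```

For example, two corners that each move by (3, 4) pixels between frames 0.5 s apart give `subregion_motion_vector([(10, 10), (20, 10)], [(13, 14), (23, 14)], 0.5).speed == 10.0`. Averaging the displacement vectors, rather than the per-corner angles, avoids wrap-around problems near ±π.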
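Constructing a convolution kernel from a sub-area's motion speed and direction, then convolving only that sub-area, can be sketched with a linear motion-blur kernel. This is a hedged sketch: the text does not specify the kernel shape, so a normalized line kernel is assumed, and the `exposure` factor that maps speed (pixels per second) to blur length (pixels) is a hypothetical tuning parameter. `motion_blur_kernel` and `convolve_region` are illustrative names.

```python
import math


def motion_blur_kernel(speed, direction, exposure=0.02):
    """Build a normalized linear motion-blur kernel for one sub-area.

    speed: pixels per second; direction: radians.
    exposure: hypothetical "virtual exposure time" in seconds; blur length in
    pixels is assumed to be speed * exposure.
    """
    length = max(1, int(round(speed * exposure)))
    size = length if length % 2 == 1 else length + 1  # odd size -> well-defined center
    c = size // 2
    kernel = [[0.0] * size for _ in range(size)]
    dx, dy = math.cos(direction), math.sin(direction)
    steps = 4 * size
    for s in range(steps + 1):
        # t runs from -(length-1)/2 to +(length-1)/2 along the motion direction
        t = (s / steps - 0.5) * (length - 1)
        kernel[int(round(c + t * dy))][int(round(c + t * dx))] = 1.0
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


def convolve_region(img, mask, kernel, out=None):
    """Convolve img with kernel, writing results only where mask is True.

    Pixels are always read from the unmodified img, so several sub-areas can be
    blurred one after another into the same `out` without affecting each other.
    Borders use replicate (clamp-to-edge) padding.
    """
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    cy, cx = kh // 2, kw // 2
    if out is None:
        out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            acc = 0.0
            for j in range(kh):
                for i in range(kw):
                    yy = min(max(y + j - cy, 0), h - 1)
                    xx = min(max(x + i - cx, 0), w - 1)
                    # flipped kernel index -> true convolution, not correlation
                    acc += kernel[kh - 1 - j][kw - 1 - i] * img[yy][xx]
            out[y][x] = acc
    return out
```

To blur a whole background, one would call `convolve_region` once per sub-area with that sub-area's mask and kernel, passing the same `out` each time. Because faster (nearer) sub-areas get longer kernels, they end up more blurred than slower (farther) ones, which is the layered effect described above.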
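The training procedure of FIG. 11 (compute the deviation between predicted and true depth with a target loss function, then update the parameters until a training condition is met) follows the usual gradient-descent pattern. The sketch below shrinks the "first model to be trained" to a hypothetical one-feature linear depth predictor with a mean-squared-error loss so that the gradients can be written by hand. A real depth-estimation network would be trained by backpropagation in a deep-learning framework; the learning rate, tolerance, and iteration cap here are assumed values.

```python
def train_depth_model(features, true_depths, lr=0.05, tol=1e-6, max_iters=10000):
    """Fit depth ~= w * feature + b by gradient descent on mean squared error.

    Returns (w, b, loss). Training stops when the loss drops below `tol`
    (an assumed "model training condition") or after `max_iters` updates.
    """
    w, b = 0.0, 0.0
    n = len(features)
    loss = float("inf")
    for _ in range(max_iters):
        preds = [w * f + b for f in features]
        errs = [p - t for p, t in zip(preds, true_depths)]
        loss = sum(e * e for e in errs) / n  # deviation from the ground-truth depths
        if loss < tol:
            break
        # gradients of the MSE loss with respect to w and b
        grad_w = 2.0 * sum(e * f for e, f in zip(errs, features)) / n
        grad_b = 2.0 * sum(errs) / n
        w -= lr * grad_w  # parameter update driven by the deviation
        b -= lr * grad_b
    return w, b, loss
```

With `features=[0, 1, 2, 3]` and `true_depths=[1, 3, 5, 7]` the loop recovers w close to 2 and b close to 1; the same structure (loss, gradient, update, stopping check) applies to the second network in FIG. 12, with a segmentation loss in place of the depth loss.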
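The NPU description above (weight matrix B buffered in the processing elements, matrix A fetched from the input memory 1501, partial results stored in the accumulator 1508) amounts to a matrix multiplication whose partial sums are accumulated over slices of the shared dimension. The sketch below shows only that accumulation pattern in plain Python. It is not a cycle-accurate model of the arithmetic circuit 1503 or of a systolic array, and `accumulate_matmul` and `tile_k` are hypothetical names.

```python
def accumulate_matmul(A, B, tile_k=2):
    """Compute C = A @ B, accumulating partial results over K-dimension slices."""
    m, k, n = len(A), len(B), len(B[0])
    if any(len(row) != k for row in A):
        raise ValueError("inner dimensions do not match")
    acc = [[0.0] * n for _ in range(m)]  # stands in for the accumulator
    for k0 in range(0, k, tile_k):       # one pass per slice of the shared dimension
        k1 = min(k0 + tile_k, k)
        # during this pass, rows k0..k1-1 of B are the weights held by the PEs
        for i in range(m):
            for j in range(n):
                partial = sum(A[i][p] * B[p][j] for p in range(k0, k1))
                acc[i][j] += partial     # partial result added to the accumulator
    return acc
```

After the last slice the accumulator holds the final result; before that it holds partial results, which matches the "partial result or final result" wording above.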

Abstract

Disclosed are an image processing method and a related device, relating to computer vision technology in the field of artificial intelligence, which can blur the background area of the current image frame to different degrees, so that the background area of the current image frame has a hierarchical, that is, more realistic, blurring effect. The method in the present application comprises: acquiring depth information of a background area of the current image frame; dividing the background area into a plurality of sub-areas according to the depth information, wherein the distances from captured objects corresponding to different sub-areas to a camera are different; in the plurality of sub-areas, acquiring a motion vector of each sub-area, wherein the motion vector of each sub-area is used for indicating the motion condition of the sub-area relative to a previous image frame; and according to the motion vector of each sub-area, performing blurring processing on the sub-area.

Description

An image processing method and related equipment
This application claims priority to Chinese Patent Application No. 202110218462.0, entitled "An Image Processing Method and Related Equipment" and filed with the China Patent Office on February 26, 2021, which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the field of computer technology, and in particular, to an image processing method and related equipment.
Background
Panning is a shooting technique that tracks a moving target object; an image captured this way shows a sharp foreground area (containing the target object) and a blurred background area. When users pan with a terminal device, they usually need to control the shutter speed carefully: if the shutter speed is too high, the background area of the image will not show an obvious blur effect, and if it is too low, the foreground area of the image will not be sharp enough.
Given the difficulty and unpredictability of panning, the user can instead capture a group of image frames with the terminal device at a high shutter speed (because the shutter speed is high, the background areas of these frames have no obvious blur effect) and then process them. Specifically, suppose the group contains three image frames sorted by time (any one of which is taken as the current image frame). The terminal device may first align the three image frames based on the target object and then perform frame interpolation between adjacent image frames to obtain more image frames. The terminal device then mixes the original image frames and the interpolated image frames, so that the background area of the current image frame has a blurring effect.
In the above process, owing to the limitations of frame mixing, if only a few image frames are mixed, the blurring effect of the background area of the current image frame is often unrealistic, for example with ghosting or unnatural blurring in the background area.
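A toy one-dimensional example helps show why mixing only a few frames produces ghosting. Assuming frames already aligned on the target object, a background point lands at a different position in each frame; averaging three such frames leaves three separate faint copies, whereas averaging many (idealized) interpolated frames yields a continuous streak. The function names and pixel values are hypothetical.

```python
def blend_frames(frames):
    """Frame mixing: average aligned frames pixel by pixel."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def background_row(width, pos):
    """One row of background containing a single bright point at index `pos`."""
    row = [0.0] * width
    row[pos] = 1.0
    return row


# After aligning on the target object, a background point shifts 4 px per frame.
captured = [background_row(12, p) for p in (2, 6, 10)]
# Idealized interpolated frames fill in every intermediate position.
interpolated = [background_row(12, p) for p in range(2, 11)]

sparse = blend_frames(captured)     # three separate copies of the point: ghosting
dense = blend_frames(interpolated)  # a continuous streak from index 2 to 10
```

In practice interpolated frames are estimated rather than ideal and can carry their own artifacts, which compounds the limitation described above.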
SUMMARY OF THE INVENTION
The embodiments of the present application provide an image processing method and related equipment, which can blur the background area of the current image frame to different degrees, so that the background area has a layered blurring effect, that is, a more realistic blurring effect.
A first aspect of the embodiments of the present application provides an image processing method, the method comprising:
When the user needs to pan a moving target object, a group of consecutive image frames can be captured at a high shutter speed through the camera of the terminal device. Each image frame in the group includes a foreground area and a background area, both of which contain (present) subjects: the subject in the foreground area is generally the target object the user is interested in, while the subjects in the background area are non-target objects the user is not interested in.
由于该组图像帧的背景区域不具备明显的模糊效果,故终端设备需对其进行处理。在该组图像帧中,终端设备可挑选其中一个图像帧作为待处理的图像帧,即当前图像帧。接着,终端设备可获取当前图像帧的背景区域的深度信息,当前图像帧的背景区域的深度信息用于指示背景区域包含的各个被摄物体到摄像头的距离,即这些被摄物体在实际环境(三维空间)中的位置到摄像头的距离。Since the background area of the group of image frames does not have obvious blurring effect, the terminal device needs to process it. In the group of image frames, the terminal device may select one of the image frames as the image frame to be processed, that is, the current image frame. Next, the terminal device can obtain the depth information of the background area of the current image frame, and the depth information of the background area of the current image frame is used to indicate the distance from each subject included in the background area to the camera, that is, these subjects are in the actual environment ( The distance from the position in 3D space to the camera.
值得注意的是,不同被摄物体到摄像头的距离不同,例如,当前图像帧中,前景区域 包含行驶中的车辆,背景区域包含车辆后方的树以及树后方的房子,故树到摄像头的距离以及房子到摄像头的距离不同。因此,终端设备可根据当前图像帧的背景区域的深度信息,将当前图像帧的背景区域划分为多个子区域,依旧如前述例子,可将当前图像真的背景区域划分为两个子区域,一个子区域包含车辆后方的树,另一个子区域包含树后方的房子。如此一来,不同子区域对应(包含)的被摄物体到摄像头的距离不同。It is worth noting that the distances from different subjects to the camera are different. For example, in the current image frame, the foreground area includes the moving vehicle, and the background area includes the tree behind the vehicle and the house behind the tree, so the distance from the tree to the camera and The distance from the house to the camera varies. Therefore, the terminal device can divide the background area of the current image frame into multiple sub-areas according to the depth information of the background area of the current image frame. Still as in the previous example, the real background area of the current image can be divided into two sub-areas, one sub-area. The area contains the tree behind the vehicle and another sub-area contains the house behind the tree. In this way, the distances from the objects corresponding to (including) different sub-regions to the camera are different.
最后,终端设备对不同的子区域进行不同程度的模糊处理,得到处理后的当前图像帧。Finally, the terminal device performs different degrees of blurring processing on different sub-regions to obtain the processed current image frame.
从上述方法可以看出:终端设备在获取当前图像帧的背景区域的深度信息后,则根据深度信息将背景区域划分为多个子区域。由于不同的子区域对应的被摄物体到摄像头的距离不同,导致不同的子区域相对于前一图像帧的运动情况也不同。因此,终端设备可对不同的子区域进行不同程度上的模糊处理,使得当前图像帧的背景区域具备更加真实的模糊效果。It can be seen from the above method that after acquiring the depth information of the background area of the current image frame, the terminal device divides the background area into multiple sub-areas according to the depth information. Since the distances from the objects corresponding to the different sub-regions to the camera are different, the motion conditions of the different sub-regions relative to the previous image frame are also different. Therefore, the terminal device can perform blurring processing on different sub-regions to different degrees, so that the background region of the current image frame has a more realistic blurring effect.
In a possible implementation, the depth information of the background area of the current image frame includes a depth value for each pixel in that background area, and dividing the background area into multiple sub-areas according to the depth information specifically includes: determining, from the depth values of the pixels in the background area of the current image frame, a depth change rate for each pixel, where the depth change rate of a pixel is determined from its own depth value and the depth values of the pixels surrounding it; and dividing the background area into multiple sub-areas according to the depth change rate of each pixel and a preset change-rate threshold. In this implementation, for any pixel in the background area of the current image frame, its depth value indicates the distance from the pixel's corresponding position in the real environment to the camera.
Therefore, the terminal device can determine a pixel's depth change rate from its depth value and the depth values of the surrounding pixels. The depth change rate indicates the difference between the distance from the pixel's real-world position to the camera and the distances from the surrounding pixels' real-world positions to the camera. After obtaining the depth change rates of all pixels in the background area of the current image frame, the terminal device can accurately divide the background area into multiple sub-areas according to the magnitude of the depth change rate, with the objects in different sub-areas at different distances from the camera.
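A minimal sketch of this depth-change-rate split follows. The description does not define the neighbourhood or the exact rate formula, so this assumes the rate is the largest absolute depth difference to the four direct neighbours; pixels above the threshold are treated as boundaries (label -1), and the remaining background pixels are grouped into sub-areas by a 4-connected flood fill. The function names and the example threshold are hypothetical, and boundary pixels could later be merged into the adjacent sub-area with the closest depth (not shown).

```python
from collections import deque

import numpy as np


def depth_change_rate(depth):
    """Per-pixel depth change rate: the largest absolute depth difference to the
    four direct neighbours (edge pixels are compared with themselves)."""
    pad = np.pad(depth, 1, mode="edge")
    centre = pad[1:-1, 1:-1]
    diffs = [
        np.abs(centre - pad[:-2, 1:-1]),  # neighbour above
        np.abs(centre - pad[2:, 1:-1]),   # neighbour below
        np.abs(centre - pad[1:-1, :-2]),  # neighbour to the left
        np.abs(centre - pad[1:-1, 2:]),   # neighbour to the right
    ]
    return np.max(diffs, axis=0)


def split_by_depth(depth, thresh, bg_mask=None):
    """Label background pixels whose change rate is <= thresh into 4-connected
    sub-areas 0..n-1. Boundary pixels and non-background pixels get label -1."""
    if bg_mask is None:
        bg_mask = np.ones(depth.shape, dtype=bool)
    smooth = (depth_change_rate(depth) <= thresh) & bg_mask
    labels = np.full(depth.shape, -1, dtype=int)
    h, w = depth.shape
    n = 0
    for sy in range(h):
        for sx in range(w):
            if not smooth[sy, sx] or labels[sy, sx] != -1:
                continue
            labels[sy, sx] = n           # seed a new sub-area
            queue = deque([(sy, sx)])
            while queue:                 # breadth-first flood fill
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and smooth[ny, nx] and labels[ny, nx] == -1:
                        labels[ny, nx] = n
                        queue.append((ny, nx))
            n += 1
    return labels, n


depth = np.full((4, 6), 2.0)   # "tree" (arbitrary depth units)
depth[:, 3:] = 10.0            # "house", farther away
labels, n = split_by_depth(depth, thresh=1.0)
print(n, labels[0].tolist())   # 2 [0, 0, -1, -1, 1, 1]
```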
In a possible implementation, blurring the different sub-areas to different degrees to obtain the processed current image frame specifically includes: obtaining a motion vector for each of the multiple sub-areas, where the motion vector of a sub-area indicates how that sub-area has moved relative to the previous image frame; and blurring each sub-area according to its motion vector to obtain the processed current image frame. In this implementation, the camera generally rotates or translates while tracking the target object. When the camera shoots while moving, objects at different distances move differently relative to the camera (which can also be understood as moving by different amounts): nearer objects move more and farther objects move less, and this shows up in the consecutive frames the camera captures. Specifically, because the objects in the different sub-areas of the background area of the current image frame are at different distances from the camera, they move differently relative to the camera.
Therefore, taking the frame preceding the current image frame as the reference, different sub-areas of the background area of the current image frame move differently relative to that previous frame. For example, suppose the background area of the current image frame contains two sub-areas A and B. The motion of sub-area A from the previous frame to the current frame then differs from the motion of sub-area B. To determine the motion of each sub-area in the background area of the current image frame, the terminal device can obtain each sub-area's motion vector, which indicates that sub-area's motion relative to the previous image frame. The terminal device then blurs each sub-area according to its motion vector. Once all sub-areas have been blurred, the background area of the current image frame has a realistic blur effect.
As this implementation shows, after obtaining the depth information of the background area of the current image frame, the terminal device divides the background area into multiple sub-areas according to that depth information. Because the objects in different sub-areas are at different distances from the camera, the sub-areas move differently relative to the previous image frame. The terminal device can therefore obtain a motion vector for each sub-area, indicating its motion relative to the previous image frame. Since different sub-areas move differently, that is, have different motion vectors, the terminal device blurs each sub-area according to its own motion vector; in other words, it blurs the sub-areas to different degrees according to their motion, giving the background area of the current image frame a more realistic blur effect.
In a possible implementation, the motion vector of each sub-area includes the sub-area's motion speed and motion direction, and obtaining the motion vector of each sub-area includes: for each of the multiple sub-areas, determining the sub-area's motion speed from the motion speed of at least one target pixel in the sub-area between the previous image frame and the current image frame; for example, the terminal device can take the average motion speed of these target pixels as the sub-area's motion speed. The terminal device also determines the sub-area's motion direction from the motion direction of the at least one target pixel between the previous and current frames; for example, because these target pixels usually move in the same direction, the terminal device can take their common direction as the sub-area's motion direction. With this implementation, the terminal device can estimate each sub-area's motion speed and direction, and hence its motion relative to the previous image frame, fairly accurately.
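The aggregation step can be sketched as follows, assuming the target pixels' positions in both frames are already known (for example from a tracker). Following the example in the text, speed is the mean of the per-pixel speeds; for direction, this sketch takes the angle of the mean displacement, which equals the common direction when the target pixels move alike. `region_motion` is a hypothetical name; positions are (x, y) with the image y axis pointing down.

```python
import math

import numpy as np


def region_motion(prev_pts, curr_pts, dt=1.0):
    """Estimate a sub-area's motion vector from its tracked target pixels.

    prev_pts, curr_pts: (N, 2) sequences of (x, y) positions in the previous
    and current frames. Returns (speed, direction): speed is the mean per-pixel
    speed in pixels per dt; direction is the angle (radians) of the mean displacement.
    """
    disp = np.asarray(curr_pts, float) - np.asarray(prev_pts, float)
    speed = float(np.mean(np.linalg.norm(disp, axis=1)) / dt)
    mean_dx, mean_dy = disp.mean(axis=0)
    direction = math.atan2(mean_dy, mean_dx)
    return speed, direction


speed, direction = region_motion([(0, 0), (10, 10)], [(3, 4), (13, 14)])
print(speed, math.degrees(direction))  # speed 5.0, direction about 53.1 degrees
```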
In a possible implementation, blurring each sub-area according to its motion vector specifically includes: for each sub-area, constructing a convolution kernel for the sub-area from its motion speed and motion direction, and convolving the sub-area with that kernel. Because different sub-areas have different motion vectors (in general, different sub-areas move at different speeds but in the same direction), a different convolution kernel can be built for each sub-area from its motion vector and used to convolve that sub-area. The sub-areas are thus blurred to different degrees, giving the background area of the current image frame a more realistic blur effect.
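A sketch of per-sub-area blurring under stated assumptions: the kernel is a normalized straight line whose length in pixels is the sub-area's speed (rounded up to an odd number) and whose orientation is its direction. This is one common way to model linear motion blur; the description does not specify the kernel shape. For simplicity the whole image is filtered once per sub-area and only that sub-area's pixels are kept, so pixels near a sub-area boundary may mix in values from neighbouring areas. All names are hypothetical.

```python
import numpy as np


def motion_kernel(speed, angle):
    """Linear motion-blur kernel: a normalized line of odd length ~speed at `angle` (radians)."""
    n = max(1, int(round(speed)))
    if n % 2 == 0:
        n += 1  # odd length keeps the line centred on the kernel
    k = np.zeros((n, n))
    c = n // 2
    for t in range(-c, c + 1):
        x = int(round(c + t * np.cos(angle)))
        y = int(round(c + t * np.sin(angle)))
        k[y, x] = 1.0
    return k / k.sum()


def filter2d(img, k):
    """Correlate img with k using edge padding; this equals convolution here
    because the line kernel is point-symmetric about its centre."""
    ph, pw = k.shape[0] // 2, k.shape[1] // 2
    pad = np.pad(img, ((ph, ph), (pw, pw)), mode="edge")
    h, w = img.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(k.shape[0]):
        for dx in range(k.shape[1]):
            out += k[dy, dx] * pad[dy:dy + h, dx:dx + w]
    return out


def blur_sub_areas(img, labels, motions):
    """Blur each labelled sub-area with its own kernel; motions maps label -> (speed, angle)."""
    img = img.astype(float)
    out = img.copy()
    for label, (speed, angle) in motions.items():
        mask = labels == label
        blurred = filter2d(img, motion_kernel(speed, angle))
        out[mask] = blurred[mask]
    return out


img = np.zeros((7, 7))
img[3, 3] = 3.0                                      # one bright point
labels = np.zeros((7, 7), dtype=int)                 # a single sub-area, label 0
out = blur_sub_areas(img, labels, {0: (3.0, 0.0)})   # 3 px per frame, horizontal
print(out[3].round(2).tolist())                      # [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0]
```

Sub-areas with higher speeds get longer kernels and therefore stronger streaking, which is what produces the layered blur described above.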
In a possible implementation, the at least one target pixel is a corner point. The target pixels in a sub-area are generally corner points of that sub-area; because corner points have distinctive features, their motion is more representative of the motion of the sub-area as a whole.
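The text only states that target pixels are generally corner points. One standard way to find them is the Harris detector; the sketch below is a plain NumPy version, with an optional mask so that corners are picked inside a single sub-area. The window radius `r=1` and `k=0.04` are conventional choices, not values from this application, and the function names are hypothetical.

```python
import numpy as np


def box_sum(a, r):
    """Sum of a over a (2r+1) x (2r+1) window around each pixel (zero padding)."""
    p = np.pad(a, r)
    h, w = a.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += p[dy:dy + h, dx:dx + w]
    return out


def harris_response(img, k=0.04, r=1):
    """Harris response det(M) - k * trace(M)^2 of the windowed structure tensor M."""
    iy, ix = np.gradient(img.astype(float))  # derivatives along rows (y) and columns (x)
    sxx = box_sum(ix * ix, r)
    syy = box_sum(iy * iy, r)
    sxy = box_sum(ix * iy, r)
    return sxx * syy - sxy ** 2 - k * (sxx + syy) ** 2


def top_corners(img, n, mask=None):
    """Return the n (row, col) positions with the strongest Harris response inside mask."""
    resp = harris_response(img)
    if mask is not None:
        resp = np.where(mask, resp, -np.inf)
    idx = np.argsort(resp, axis=None)[::-1][:n]
    return np.column_stack(np.unravel_index(idx, resp.shape))


img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0          # a bright square; its corners are the strongest features
print(top_corners(img, 4))
```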
In a possible implementation, the motion speed and motion direction of the at least one target pixel are obtained by an optical flow method. Using optical flow, the terminal device can determine the distance a target pixel moves from the previous image frame to the current image frame, as well as the pixel's position in each of the two frames. The terminal device then determines the target pixel's motion speed from that distance, and its motion direction from its positions in the previous and current frames.
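The description does not name a specific optical flow method. As one example, the sketch below uses single-level Lucas-Kanade around one target pixel, solving the brightness-constancy least-squares system over a small window, and then converts the displacement into a speed and a direction. It is valid only for small motions (a pyramid would be needed for larger ones); the window radius and function names are assumptions.

```python
import numpy as np


def lucas_kanade(prev, curr, row, col, r=7):
    """Estimate the (dx, dy) displacement of the (2r+1)^2 window around (row, col).

    Solves [sum Ix^2, sum IxIy; sum IxIy, sum Iy^2] [dx, dy]^T = -[sum IxIt, sum IyIt]^T.
    """
    prev = prev.astype(float)
    iy, ix = np.gradient(prev)
    it = curr.astype(float) - prev
    win = (slice(row - r, row + r + 1), slice(col - r, col + r + 1))
    gx, gy, gt = ix[win].ravel(), iy[win].ravel(), it[win].ravel()
    a = np.array([[gx @ gx, gx @ gy], [gx @ gy, gy @ gy]])
    b = -np.array([gx @ gt, gy @ gt])
    dx, dy = np.linalg.solve(a, b)
    return float(dx), float(dy)


def speed_and_direction(dx, dy, dt=1.0):
    """Speed in pixels per dt and direction in radians (image y axis points down)."""
    return float(np.hypot(dx, dy) / dt), float(np.arctan2(dy, dx))


def scene(x, y):
    """Smooth synthetic texture used only for this example."""
    return np.sin(0.2 * x) + np.cos(0.15 * y) + 0.5 * np.sin(0.1 * (x + y))


yy, xx = np.mgrid[0:40, 0:40].astype(float)
prev = scene(xx, yy)
curr = scene(xx - 0.4, yy - 0.25)      # content moved by (+0.4, +0.25) pixels
dx, dy = lucas_kanade(prev, curr, 20, 20)
print(round(dx, 2), round(dy, 2), speed_and_direction(dx, dy))
```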
In a possible implementation, obtaining the depth information of the background area of the current image frame specifically includes: obtaining the depth value of each pixel in the current image frame and the background area of the current image frame; and determining, from the depth values of all pixels in the current image frame, the depth value of each pixel in the background area. In this implementation, the current image frame includes a foreground area and a background area, and the terminal device can segment the current image frame to obtain its background area. The terminal device also obtains the depth values of all pixels in the current image frame and selects from them the depth values of the pixels in the background area, which it then uses to divide the background area into multiple sub-areas.
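Combining the two outputs is a simple masking step. The sketch assumes, hypothetically, that the segmentation step yields a per-pixel foreground (salient-object) probability and that pixels below 0.5 count as background; depth values outside the background are replaced by NaN so later steps can ignore them.

```python
import numpy as np


def background_depth(depth, fg_prob, thresh=0.5):
    """Keep depth values only where the pixel is classified as background.

    depth:   (H, W) per-pixel depth from a depth network or depth camera.
    fg_prob: (H, W) per-pixel foreground (salient-object) probability.
    Returns the boolean background mask and a depth map with NaN outside it.
    """
    bg_mask = np.asarray(fg_prob) < thresh
    bg_depth = np.where(bg_mask, np.asarray(depth, dtype=float), np.nan)
    return bg_mask, bg_depth


depth = np.array([[1.0, 8.0], [9.0, 1.2]])
fg_prob = np.array([[0.9, 0.1], [0.2, 0.8]])  # the "vehicle" occupies the diagonal
mask, bg = background_depth(depth, fg_prob)
print(bg)  # NaN on the diagonal, 8.0 and 9.0 elsewhere
```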
In a possible implementation, obtaining the depth value of each pixel in the current image frame specifically includes obtaining it through a first neural network. The first neural network performs accurate monocular depth estimation on the current image frame, yielding the depth values of all of its pixels.
In a possible implementation, the camera is a depth camera, and obtaining the depth value of each pixel in the current image frame specifically includes obtaining it through the depth camera, which can accurately measure the depth values of all pixels in the current image frame.
In a possible implementation, obtaining the background area of the current image frame specifically includes obtaining it through a second neural network. The second neural network performs accurate salient object detection on the current image frame, separating its foreground area from its background area and thereby yielding the background area.
In a possible implementation, the depth camera is a time-of-flight (TOF) camera or a structured-light camera.
In a possible implementation, the first neural network or the second neural network is any one of a multilayer perceptron, a convolutional neural network, a recursive neural network, or a recurrent neural network.
A second aspect of the embodiments of this application provides a model training method, the method including: obtaining, through a first model to be trained, a depth value for each pixel in a training image frame; calculating, through a preset target loss function, the deviation between those depth values and the true depth value of each pixel in the training image frame; and updating the parameters of the first model to be trained according to the deviation until a model training condition is met, yielding the first neural network.
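The description leaves the model, the loss and the training condition open. The toy sketch below only illustrates the predict, measure deviation, update loop: it uses a per-pixel linear model on hypothetical features, mean squared error as the "preset target loss function", plain gradient descent as the parameter update, and "loss below a tolerance or an iteration cap" as the training condition. A real first neural network would be a deep network trained by backpropagation on image frames with ground-truth depth.

```python
import numpy as np


def train_depth_model(features, true_depth, lr=0.1, tol=1e-6, max_iter=5000):
    """Fit depth ~ features @ w + b by gradient descent on mean squared error."""
    n, d = features.shape
    w, b = np.zeros(d), 0.0
    loss = np.inf
    for _ in range(max_iter):
        pred = features @ w + b                 # predicted depth for every pixel
        err = pred - true_depth
        loss = float(np.mean(err ** 2))         # deviation from the true depth
        if loss < tol:                          # hypothetical training condition
            break
        w -= lr * 2.0 * features.T @ err / n    # gradient of the MSE w.r.t. w
        b -= lr * 2.0 * float(err.mean())       # gradient of the MSE w.r.t. b
    return w, b, loss


rng = np.random.default_rng(0)
features = rng.normal(size=(500, 3))            # hypothetical per-pixel features
true_depth = features @ np.array([1.0, 2.0, -1.0]) + 0.5
w, b, loss = train_depth_model(features, true_depth)
print(np.round(w, 3), round(b, 3), loss)
```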
As this method shows, the first neural network trained in this way can perform accurate monocular depth estimation on any image frame, yielding the depth values of all pixels in that frame.
A third aspect of the embodiments of this application provides a model training method, the method including: obtaining the background area of a training image frame through a second model to be trained; calculating, through a preset target loss function, the deviation between that background area and the true background area of the training image frame; and updating the parameters of the second model to be trained according to the deviation until a model training condition is met, yielding the second neural network.
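The same loop applies to the background-area model. In this toy sketch, which is an assumption rather than the method of this application, the "second model to be trained" is a per-pixel logistic model that outputs a background probability, the deviation from the true background mask is measured with binary cross-entropy, and parameters are updated by gradient descent until the loss falls below a tolerance or an iteration cap is reached. Features and labels are synthetic.

```python
import numpy as np


def bce(prob, target, eps=1e-7):
    """Binary cross-entropy between predicted background probability and the true mask."""
    p = np.clip(prob, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))


def train_background_model(features, bg_mask, lr=0.5, tol=1e-3, max_iter=2000):
    """Toy per-pixel logistic model: p(background) = sigmoid(features @ w + b)."""
    n, d = features.shape
    w, b = np.zeros(d), 0.0
    loss = np.inf
    for _ in range(max_iter):
        prob = 1.0 / (1.0 + np.exp(-(features @ w + b)))
        loss = bce(prob, bg_mask)           # deviation from the true background area
        if loss < tol:                      # hypothetical training condition
            break
        grad = prob - bg_mask               # d(BCE)/d(logit) for a sigmoid output
        w -= lr * features.T @ grad / n
        b -= lr * float(grad.mean())
    return w, b, loss


rng = np.random.default_rng(1)
features = rng.normal(size=(400, 2))        # hypothetical per-pixel features
bg_mask = (features[:, 0] + 0.5 * features[:, 1] > 0).astype(float)
w, b, loss = train_background_model(features, bg_mask)
pred = 1.0 / (1.0 + np.exp(-(features @ w + b))) > 0.5
print(loss, np.mean(pred == bg_mask.astype(bool)))
```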
As this method shows, the second neural network trained in this way can perform accurate salient object detection on any image frame, yielding the background area of that frame.
A fourth aspect of the embodiments of this application provides an image processing apparatus, namely the terminal device described above, the apparatus including: an acquisition module, configured to obtain depth information of the background area of the current image frame; a division module, configured to divide the background area into multiple sub-areas according to the depth information, where the objects corresponding to different sub-areas are at different distances from the camera, and the camera is used to capture the current image frame; and a processing module, configured to blur the different sub-areas to different degrees to obtain the processed current image frame.
As this apparatus shows, after obtaining the depth information of the background area of the current image frame, the terminal device divides the background area into multiple sub-areas according to that depth information. Because the objects in different sub-areas are at different distances from the camera, the sub-areas also move differently relative to the previous image frame. The terminal device can therefore blur the sub-areas to different degrees, giving the background area of the current image frame a more realistic blur effect.
In a possible implementation, the depth information of the background area of the current image frame includes a depth value for each pixel in that background area, and the division module is specifically configured to: determine a depth change rate for each pixel in the background area from the pixels' depth values, where the depth change rate of a pixel is determined from its own depth value and the depth values of the pixels surrounding it; and divide the background area into multiple sub-areas according to the depth change rate of each pixel and a preset change-rate threshold.
In a possible implementation, the processing module is specifically configured to: obtain a motion vector for each of the multiple sub-areas, where the motion vector of a sub-area indicates its motion relative to the previous image frame; and blur each sub-area according to its motion vector to obtain the processed current image frame.
In a possible implementation, the processing module is specifically configured to: for each of the multiple sub-areas, determine the sub-area's motion speed from the motion speed of at least one target pixel in the sub-area between the previous image frame and the current image frame, and determine the sub-area's motion direction from the motion direction of the at least one target pixel between the previous and current frames.
In a possible implementation, the processing module is specifically configured to: for each sub-area, construct a convolution kernel for the sub-area from its motion speed and motion direction, and convolve the sub-area with that kernel.
In a possible implementation, the at least one target pixel is a corner point.
In a possible implementation, the motion speed and motion direction of the at least one target pixel are obtained by an optical flow method.
In a possible implementation, the acquisition module is specifically configured to: obtain the depth value of each pixel in the current image frame and the background area of the current image frame; and determine, from the depth values of all pixels in the current image frame, the depth value of each pixel in the background area.
In a possible implementation, the acquisition module is specifically configured to obtain the depth value of each pixel in the current image frame through the first neural network.
In a possible implementation, the camera is a depth camera, and the acquisition module is specifically configured to obtain the depth value of each pixel in the current image frame through the depth camera.
In a possible implementation, the acquisition module is specifically configured to obtain the background area of the current image frame through the second neural network.
In a possible implementation, the depth camera is a TOF camera or a structured-light camera.
In a possible implementation, the first neural network or the second neural network is any one of a multilayer perceptron, a convolutional neural network, a recursive neural network, or a recurrent neural network.
A fifth aspect of the embodiments of this application provides a model training apparatus, the apparatus including: an acquisition module, configured to obtain, through a first model to be trained, a depth value for each pixel in a training image frame; a calculation module, configured to calculate, through a preset target loss function, the deviation between those depth values and the true depth value of each pixel in the training image frame; and an update module, configured to update the parameters of the first model to be trained according to the deviation until a model training condition is met, yielding the first neural network.
As this apparatus shows, the first neural network trained by it can perform accurate monocular depth estimation on any image frame, yielding the depth values of all pixels in that frame.
A sixth aspect of the embodiments of this application provides a model training apparatus, the apparatus including: an acquisition module, configured to obtain the background area of a training image frame through a second model to be trained; a calculation module, configured to calculate, through a preset target loss function, the deviation between that background area and the true background area of the training image frame; and an update module, configured to update the parameters of the second model to be trained according to the deviation until a model training condition is met, yielding the second neural network.
As this apparatus shows, the second neural network trained by it can perform accurate salient object detection on any image frame, yielding the background area of that frame.
A seventh aspect of the embodiments of this application provides an image processing apparatus including a memory and a processor. The memory stores code and the processor is configured to execute it; when the code is executed, the image processing apparatus performs the method according to the first aspect or any possible implementation of the first aspect.
An eighth aspect of the embodiments of this application provides a model training apparatus including a memory and a processor. The memory stores code and the processor is configured to execute it; when the code is executed, the model training apparatus performs the method according to the second aspect or the third aspect.
A ninth aspect of the embodiments of this application provides a circuit system including a processing circuit configured to perform the method according to the first aspect, any possible implementation of the first aspect, the second aspect, or the third aspect.
A tenth aspect of the embodiments of this application provides a chip system including a processor configured to invoke a computer program or computer instructions stored in a memory, so that the processor performs the method according to the first aspect, any possible implementation of the first aspect, the second aspect, or the third aspect.
In a possible implementation, the processor is coupled to the memory through an interface.
In a possible implementation, the chip system further includes the memory, which stores the computer program or computer instructions.
An eleventh aspect of the embodiments of this application provides a computer storage medium storing a computer program that, when executed by a computer, causes the computer to perform the method according to the first aspect, any possible implementation of the first aspect, the second aspect, or the third aspect.
A twelfth aspect of the embodiments of this application provides a computer program product storing instructions that, when executed by a computer, cause the computer to perform the method according to the first aspect, any possible implementation of the first aspect, the second aspect, or the third aspect.
In the embodiments of this application, after obtaining the depth information of the background area of the current image frame, the terminal device divides the background area into multiple sub-areas according to that depth information. Because the objects in different sub-areas are at different distances from the camera, the sub-areas move differently relative to the previous image frame. The terminal device can therefore obtain a motion vector for each sub-area, indicating its motion relative to the previous image frame. Since different sub-areas move differently, that is, have different motion vectors, the terminal device blurs each sub-area according to its own motion vector; in other words, it blurs the sub-areas to different degrees according to their motion, giving the background area of the current image frame a more realistic blur effect.
Description of Drawings
FIG. 1 is a schematic structural diagram of the main artificial intelligence framework;
FIG. 2a is a schematic structural diagram of an image processing system according to an embodiment of this application;
FIG. 2b is another schematic structural diagram of an image processing system according to an embodiment of this application;
FIG. 2c is a schematic diagram of devices related to image processing according to an embodiment of this application;
FIG. 3a is a schematic diagram of the architecture of system 100 according to an embodiment of this application;
FIG. 3b is a schematic diagram of a panning shot;
FIG. 4 is a schematic flowchart of an image processing method according to an embodiment of this application;
FIG. 5 is a schematic diagram of an application scenario of the image processing method according to an embodiment of this application;
FIG. 6 is a schematic diagram of an application example of the image processing method according to an embodiment of this application;
FIG. 7 is another schematic diagram of an application example of the image processing method according to an embodiment of this application;
FIG. 8 is a schematic flowchart of a model training method according to an embodiment of this application;
FIG. 9 is another schematic flowchart of a model training method according to an embodiment of this application;
FIG. 10 is a schematic structural diagram of an image processing apparatus according to an embodiment of this application;
FIG. 11 is a schematic structural diagram of a model training apparatus according to an embodiment of this application;
FIG. 12 is another schematic structural diagram of a model training apparatus according to an embodiment of this application;
FIG. 13 is a schematic structural diagram of an execution device according to an embodiment of this application;
FIG. 14 is a schematic structural diagram of a training device according to an embodiment of this application;
FIG. 15 is a schematic structural diagram of a chip according to an embodiment of this application.
Detailed Description of Embodiments
The technical solutions in the embodiments of this application are described in detail below with reference to the accompanying drawings.
The terms "first", "second", and the like in the specification, claims, and accompanying drawings of this application distinguish between similar objects. They do not necessarily describe a specific order or sequence. Terms used in this way are interchangeable where appropriate; they are only a way of telling apart objects with the same attributes when the embodiments of this application are described. In addition, the terms "include" and "have", and any variants of them, are intended to cover a non-exclusive inclusion. A process, method, system, product, or device that comprises a series of units is therefore not necessarily limited to those units. It may include other units that are not expressly listed or that are inherent to the process, method, product, or device.
Panning is a shooting technique that tracks a moving target object. An image captured this way shows a sharp foreground area (containing the target object) and a blurred background area. However, panning is usually difficult to perform and hard to control. To obtain a good panning shot, the user can instead capture a group of image frames with the terminal device at a high shutter speed. Because the shutter speed is high, the background areas of these frames show no obvious blur. The current image frame (any one frame in the group) is then processed with a frame blending technique so that its background area acquires a blur effect.
Frame blending has a limitation: when only a few image frames are blended, the blur in the background area of the current image frame often looks unrealistic. For example, the background area may show ghosting or unnatural softening.
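The ghosting problem can be illustrated with a toy one-dimensional model. This model is an assumption made for illustration and is not the frame blending technique referred to above. A single bright background point moves 3 pixels between consecutive, already-aligned frames. Averaging only three frames leaves three separate copies of the point, each at one third of the original brightness. Sampling the same path densely yields one continuous streak instead. The helper names are hypothetical.

```python
from typing import List


def render_scanline(width: int, pos: int) -> List[float]:
    """One background scanline containing a single bright point at `pos`."""
    return [1.0 if i == pos else 0.0 for i in range(width)]


def blend(frames: List[List[float]]) -> List[float]:
    """Frame blending modeled as a plain per-pixel average of already-aligned frames."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def count_copies(scanline: List[float], eps: float = 1e-9) -> int:
    """Count separate non-zero runs: 1 means a continuous streak, more than 1 means ghost copies."""
    runs, inside = 0, False
    for v in scanline:
        lit = v > eps
        if lit and not inside:
            runs += 1
        inside = lit
    return runs


few = blend([render_scanline(10, p) for p in (0, 3, 6)])    # 3 frames, 3 px apart
dense = blend([render_scanline(10, p) for p in range(7)])   # 7 frames, 1 px apart
print(count_copies(few), count_copies(dense))
```

The first blend contains three ghost copies, and the second contains a single streak.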
To solve the above problem, this application provides an image processing method that can be implemented with artificial intelligence (AI) technology. AI is a technical discipline that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence. It perceives the environment, acquires knowledge, and uses that knowledge to obtain the best results. In other words, AI is a branch of computer science that seeks to understand the essence of intelligence and to produce new intelligent machines that respond in ways similar to human intelligence. Image processing is a common application of AI.
The overall workflow of an AI system is described first. Refer to FIG. 1, a schematic structural diagram of the main AI framework. The framework is explained below along two dimensions:

- The "intelligent information chain" (horizontal axis) reflects the sequence of processes from data acquisition to processing. One example is the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. Along this chain, data is refined through the stages "data-information-knowledge-wisdom".
- The "IT value chain" (vertical axis) runs from the underlying AI infrastructure and information (provision and processing technologies) up to the industrial ecosystem of the system. It reflects the value that AI brings to the information technology industry.
(1) Infrastructure
The infrastructure provides computing power for the AI system, communicates with the outside world, and provides support through a base platform.

- It communicates with the outside world through sensors.
- Computing power comes from intelligent chips, that is, hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs.
- The base platform provides the related platform guarantees and support, such as distributed computing frameworks and networks. These may include cloud storage and computing, interconnection networks, and so on.

For example, sensors communicate with the outside world to acquire data. That data is then passed to intelligent chips in the distributed computing system provided by the base platform for computation.
(2) Data
The data layer above the infrastructure represents the data sources in the AI field. The data includes graphics, images, speech, and text. It also includes Internet of Things (IoT) data from conventional devices, such as business data from existing systems and sensed data for force, displacement, liquid level, temperature, and humidity.
(3) Data processing
Data processing typically includes data training, machine learning, deep learning, searching, reasoning, and decision-making.
Machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, and so on, on the data.
Reasoning is the process by which a computer or intelligent system simulates human intelligent reasoning. It uses formalized information to carry out machine thinking and solve problems according to a reasoning control strategy. Its typical functions are searching and matching.
Decision-making is the process of making decisions after reasoning has been applied to intelligent information. It typically provides functions such as classification, ranking, and prediction.
(4) General capabilities
Once the data has gone through the processing described above, general capabilities can be built on the results. Such a capability may be an algorithm or a general-purpose system, for example for translation, text analysis, computer vision processing, speech recognition, or image recognition.
(5) Smart products and industry applications
Smart products and industry applications are the products and applications of AI systems in various fields. They package the overall AI solution, turn intelligent information decision-making into products, and put it into practical use. The main application fields include smart terminals, intelligent transportation, smart healthcare, autonomous driving, and smart cities.
Several application scenarios of this application are described next.
FIG. 2a is a schematic structural diagram of an image processing system according to an embodiment of this application. The image processing system includes user equipment and a data processing device. The user equipment includes smart terminals such as mobile phones, personal computers, or information processing centers. The user equipment initiates image processing: the user usually sends the image processing request through it.
The data processing device may be any device or server with data processing capabilities, such as a cloud server, network server, application server, or management server. It receives image processing requests from the smart terminal through an interactive interface. It then processes the image using a memory that stores data and a processor that processes data, by means such as machine learning, deep learning, searching, reasoning, and decision-making. "Memory" here is a general term covering local storage and a database of historical data. The database may reside on the data processing device or on another network server.
In the image processing system shown in FIG. 2a, the user equipment can receive user instructions. For example, it can acquire an image that the user inputs or selects and send a request to the data processing device. The data processing device then runs an image processing application on that image and obtains the corresponding result. Such applications include image depth estimation, image object detection, and image blurring. For instance, the user equipment may acquire an image input by the user and send an image depth estimation request. The data processing device then performs monocular depth estimation on the image to obtain its depth information.
In FIG. 2a, the data processing device can perform the image processing method of the embodiments of this application.
FIG. 2b is another schematic structural diagram of an image processing system according to an embodiment of this application. In FIG. 2b, the user equipment itself serves as the data processing device: it obtains input directly from the user and processes it with its own hardware. The process is otherwise similar to that of FIG. 2a and is not repeated here.
In the image processing system shown in FIG. 2b, the user equipment can receive user instructions. For example, it can acquire an image that the user selects on the device and run an image processing application on that image itself, such as image depth estimation, image object detection, or image blurring. It then obtains the corresponding processing result.
In FIG. 2b, the user equipment itself can perform the image processing method of the embodiments of this application.
FIG. 2c is a schematic diagram of devices related to image processing according to an embodiment of this application.
The user equipment in FIG. 2a and FIG. 2b may be local device 301 or local device 302 in FIG. 2c. The data processing device in FIG. 2a may be execution device 210 in FIG. 2c. Data storage system 250 stores the data to be processed by execution device 210. It may be integrated into execution device 210 or located in the cloud or on another network server.
The processors in FIG. 2a and FIG. 2b can perform data training, machine learning, or deep learning with a neural network model or another model, for example a model based on a support vector machine. They then use the trained model to run image processing applications on images and obtain the corresponding results.
FIG. 3a is a schematic diagram of the architecture of system 100 according to an embodiment of this application. In FIG. 3a, execution device 110 has an input/output (I/O) interface 112 for exchanging data with external devices. A user can input data to I/O interface 112 through client device 140. In the embodiments of this application, the input data may include the tasks to be scheduled, the callable resources, and other parameters.
Execution device 110 may preprocess the input data, or its computation module 111 may perform computation and other related processing, for example implementing the functions of the neural network in this application. During this processing, execution device 110 may call data, code, and so on in data storage system 150. It may also store the resulting data, instructions, and so on in data storage system 150.
Finally, I/O interface 112 returns the processing result to client device 140, which presents it to the user.
Training device 120 can generate different target models/rules from different training data, one for each goal or task. Each target model/rule can then be used to achieve its goal or complete its task, giving the user the desired result. The training data may be stored in database 130 and comes from training samples collected by data collection device 160.
In the case shown in FIG. 3a, the user can enter input data manually through an interface provided by I/O interface 112. Alternatively, client device 140 can send input data to I/O interface 112 automatically. If automatic sending requires the user's authorization, the user can set the corresponding permission on client device 140. The user can view the results output by execution device 110 on client device 140, presented for example as a display, sound, or action.

Client device 140 can also act as a data collection terminal. As shown in the figure, it collects the input data entering I/O interface 112 and the output results leaving it, and stores them in database 130 as new sample data. Alternatively, collection can bypass client device 140: I/O interface 112 itself stores that input data and those output results in database 130 as new sample data.
FIG. 3a is only a schematic diagram of one system architecture according to an embodiment of this application. The positional relationships among the devices, components, and modules shown in it do not constitute any limitation. For example, in FIG. 3a data storage system 150 is external memory relative to execution device 110, but in other cases it may be placed inside execution device 110. As shown in FIG. 3a, the neural network can be obtained through training by training device 120.
An embodiment of this application further provides a chip that includes a neural network processing unit (NPU). The chip can be placed in execution device 110 shown in FIG. 3a to carry out the computation of computation module 111. It can also be placed in training device 120 shown in FIG. 3a to carry out the training work of training device 120 and output the target model/rule.
The NPU is mounted as a coprocessor on the host central processing unit (host CPU), which assigns it tasks. The core of the NPU is an arithmetic circuit. A controller directs the arithmetic circuit to fetch data from memory (the weight memory or the input memory) and perform operations on it.
In some implementations, the arithmetic circuit contains multiple processing engines (PEs). In some implementations, it is a two-dimensional systolic array. It may also be a one-dimensional systolic array or another electronic circuit capable of mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit is a general-purpose matrix processor.
For example, suppose there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data of matrix B from the weight memory and buffers it on each PE. It then fetches the data of matrix A from the input memory and performs the matrix operation with matrix B. Partial or final results of the output matrix are stored in an accumulator.
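The data flow above can be mimicked in software. The sketch below is a simplified model built on assumptions and does not describe the NPU's actual microarchitecture. B is processed in tiles along the shared dimension K, where the tile size `tile_k` is a hypothetical parameter. Each tile's contribution is added to an accumulator, which holds partial results until the last tile completes C.

```python
from typing import List

Matrix = List[List[float]]


def accumulate_matmul(a: Matrix, b: Matrix, tile_k: int = 2) -> Matrix:
    """C = A x B computed tile by tile along K, with partial sums kept in an accumulator."""
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions of A and B must match")
    n = len(b[0])
    acc = [[0.0] * n for _ in range(m)]  # stands in for the accumulator
    for k0 in range(0, k, tile_k):
        b_tile = b[k0:k0 + tile_k]  # slice of weight matrix B, "buffered on the PEs"
        for i in range(m):
            a_slice = a[i][k0:k0 + tile_k]  # matching slice of input matrix A
            for j in range(n):
                acc[i][j] += sum(x * row[j] for x, row in zip(a_slice, b_tile))
    return acc  # after the last tile, the partial results are final


c = accumulate_matmul([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]])
print(c)
```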
The vector computation unit can further process the output of the arithmetic circuit. Typical operations include vector multiplication, vector addition, exponential and logarithmic operations, and magnitude comparison. For example, it can compute the non-convolutional, non-fully-connected (non-FC) layers of a neural network, such as pooling, batch normalization, and local response normalization.
In some implementations, the vector computation unit can store the processed output vector in the unified buffer. For example, it can apply a nonlinear function to the output of the arithmetic circuit, such as a vector of accumulated values, to generate activation values. In some implementations, it generates normalized values, merged values, or both. In some implementations, the processed output vector can serve as the activation input to the arithmetic circuit, for example for a subsequent layer of the neural network.
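The sketch below illustrates this kind of post-processing. It normalizes a vector of accumulated values to zero mean and unit variance, then applies ReLU as the nonlinear function. The resulting activations could be fed back as the input of the next layer. The function name and the choices of normalization and nonlinearity are assumptions made for illustration.

```python
import math
from typing import List


def vector_unit(accumulated: List[float], eps: float = 1e-5) -> List[float]:
    """Post-process one accumulator vector: normalize it, then apply ReLU to get activations."""
    n = len(accumulated)
    mean = sum(accumulated) / n
    var = sum((v - mean) ** 2 for v in accumulated) / n
    normalized = [(v - mean) / math.sqrt(var + eps) for v in accumulated]  # normalized values
    return [max(0.0, v) for v in normalized]  # nonlinear function -> activation values


activations = vector_unit([1.0, 2.0, 3.0])
print(activations)
```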
The unified memory stores input data and output data.
A direct memory access controller (DMAC) moves data between on-chip and external memory:

- It transfers input data from external memory to the input memory, the unified memory, or both.
- It stores weight data from external memory in the weight memory.
- It writes data from the unified memory back to external memory.
The bus interface unit (BIU) handles interaction among the host CPU, the DMAC, and the instruction fetch buffer over the bus.
The instruction fetch buffer, which is connected to the controller, stores the instructions that the controller uses.
The controller invokes the instructions cached in the instruction fetch buffer to control the working process of the operation accelerator.
The unified memory, input memory, weight memory, and instruction fetch buffer are generally all on-chip memories. The external memory is memory outside the NPU. It may be double data rate synchronous dynamic random access memory (DDR SDRAM), high bandwidth memory (HBM), or another readable and writable memory.
The embodiments of this application make extensive use of neural networks. For ease of understanding, the related terms and concepts, such as neural networks, are introduced first.
(1) Neural network
A neural network may be composed of neural units. A neural unit may be an operation unit that takes the inputs $x_s$ and an intercept of 1. Its output may be:
$$h_{W,b}(x) = f\left(W^{T}x + b\right) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$$
where s = 1, 2, ..., n, n is a natural number greater than 1, $W_s$ is the weight of $x_s$, and b is the bias of the neural unit. f is the activation function of the neural unit. It introduces a nonlinearity into the neural network and converts the unit's input signal into an output signal. The output of the activation function can serve as the input of the next convolutional layer. The activation function may be, for example, a sigmoid function.

A neural network is formed by connecting many such single neural units, so the output of one neural unit can be the input of another. The input of each neural unit can be connected to a local receptive field of the previous layer to extract features of that field. A local receptive field may be a region made up of several neural units.
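The formula above can be written directly as code. This is a minimal sketch that uses the sigmoid mentioned in the text as f; the function names are illustrative.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Sigmoid activation function f."""
    return 1.0 / (1.0 + math.exp(-z))


def neural_unit(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Output of one neural unit: f(sum_s W_s * x_s + b) with f = sigmoid."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    return sigmoid(sum(ws * xs for ws, xs in zip(w, x)) + b)


print(neural_unit([1.0, 2.0], [0.5, -0.25], 0.0))
```

Here the weighted sum is 0.5 - 0.5 + 0 = 0, so the unit outputs sigmoid(0) = 0.5.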
The work of each layer in a neural network can be described by the expression y = a(Wx + b). Physically, each layer transforms an input space (a set of input vectors) into an output space, that is, from the row space of the matrix to its column space. It does so through five operations:

1. raising or lowering the dimension;
2. scaling up or down;
3. rotation;
4. translation;
5. "bending".

Operations 1, 2, and 3 are performed by Wx, operation 4 by +b, and operation 5 by a(). The word "space" is used because the object being classified is not a single thing but a class of things, and a space is the set of all individuals of that class.

W is the weight vector, and each value in it represents the weight of one neuron in that layer of the neural network. W determines the transformation from input space to output space described above; in other words, each layer's weights W control how the space is transformed. The goal of training a neural network is to obtain the weight matrices of all layers of the trained network, formed by the W of each layer. Training is therefore essentially learning how to control the spatial transformation, or more specifically, learning the weight matrices.
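The expression y = a(Wx + b) corresponds to the sketch below. A 3x2 matrix W lifts a 2-dimensional input into 3 dimensions (operation 1), b translates the result (operation 4), and ReLU provides the bending (operation 5). ReLU is chosen here only for illustration; the text does not fix a particular a(). The function names are illustrative.

```python
from typing import Callable, List, Sequence


def layer_forward(w: List[List[float]], x: Sequence[float], b: Sequence[float],
                  a: Callable[[float], float]) -> List[float]:
    """y = a(Wx + b): Wx scales/rotates/changes dimension, +b translates, a() bends."""
    if any(len(row) != len(x) for row in w) or len(w) != len(b):
        raise ValueError("shape mismatch between W, x and b")
    return [a(sum(wij * xj for wij, xj in zip(row, x)) + bi) for row, bi in zip(w, b)]


def relu(z: float) -> float:
    """Rectified linear unit, used here as the activation a()."""
    return max(0.0, z)


y = layer_forward([[1, 0], [0, 1], [1, 1]], [2.0, -3.0], [0.0, 0.0, 1.0], relu)
print(y)
```

For this input, Wx + b = [2, -3, 0], and ReLU clips the negative entry, giving [2.0, 0.0, 0.0].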
The output of the neural network should be as close as possible to the value that is actually to be predicted. The network's current prediction is therefore compared with the desired target value, and the weight vectors of each layer are updated according to the difference between the two. There is usually an initialization step before the first update, in which parameters are preconfigured for each layer. For example, if the network's prediction is too high, the weights are adjusted so that it predicts lower. The adjustment continues until the network can predict the desired target value.

This requires defining in advance how to measure the difference between the predicted value and the target value. That is the role of the loss function, or objective function: these are the equations used to measure the difference. Taking the loss function as an example, a higher output value (loss) means a larger difference. Training a neural network therefore becomes a process of reducing this loss as much as possible.
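Here is a minimal sketch of this idea. It assumes mean squared error as the loss and a one-weight linear model `y_hat = w * x`, both chosen purely for illustration. When the predictions are too high, the gradient step lowers the weight and the loss decreases.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error: larger values mean the predictions are further from the targets."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def update_weight(w: float, xs: Sequence[float], ys: Sequence[float], lr: float) -> float:
    """One gradient-descent step for y_hat = w * x under MSE.

    dL/dw = 2/N * sum((w*x - y) * x); when the predictions are too high the
    gradient is positive, so w is lowered.
    """
    grad = 2.0 * sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    return w - lr * grad


xs, ys = [1.0, 2.0], [2.0, 4.0]          # targets follow y = 2x
w0 = 3.0                                 # initial weight predicts too high
w1 = update_weight(w0, xs, ys, lr=0.1)
print(mse([w0 * x for x in xs], ys), w1, mse([w1 * x for x in xs], ys))
```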
(2) Backpropagation algorithm
During training, a neural network can use the error backpropagation (BP) algorithm to correct the parameters of the initial neural network model, so that the model's reconstruction error loss keeps shrinking. Passing the input signal forward to the output produces an error loss. The error loss information is then propagated backward to update the parameters of the initial model, so that the error loss converges. Backpropagation is thus a backward pass driven by the error loss, and its aim is to obtain optimal model parameters, such as the weight matrices.
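The forward-then-backward procedure can be shown on a deliberately tiny network. The network is an assumption made for illustration: one sigmoid hidden unit feeds one linear output, and the loss is 0.5*(y - t)^2. The sketch passes the input forward, propagates the error back with the chain rule, and repeatedly updates the parameters by gradient descent. The analytic gradients can be checked against finite differences, and the loss shrinks during training. All names and values are hypothetical.

```python
import math
from typing import Dict, Tuple

Params = Dict[str, float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(p: Params, x: float) -> Tuple[float, float]:
    """Forward pass: hidden h = sigmoid(w1*x + b1), output y = w2*h + b2."""
    h = sigmoid(p["w1"] * x + p["b1"])
    return h, p["w2"] * h + p["b2"]


def loss(p: Params, x: float, t: float) -> float:
    _, y = forward(p, x)
    return 0.5 * (y - t) ** 2


def backprop(p: Params, x: float, t: float) -> Params:
    """Backward pass: propagate dL/dy through the network with the chain rule."""
    h, y = forward(p, x)
    dy = y - t                      # dL/dy
    grads = {"w2": dy * h, "b2": dy}
    dh = dy * p["w2"]               # error reaching the hidden unit
    dz = dh * h * (1.0 - h)         # through the sigmoid: sigma'(z) = h * (1 - h)
    grads["w1"] = dz * x
    grads["b1"] = dz
    return grads


def train(p: Params, x: float, t: float, lr: float = 0.5, steps: int = 200) -> Params:
    """Repeated forward/backward passes with gradient-descent updates."""
    p = dict(p)
    for _ in range(steps):
        g = backprop(p, x, t)
        for k in p:
            p[k] -= lr * g[k]
    return p


p0 = {"w1": 0.5, "b1": 0.0, "w2": -0.3, "b2": 0.1}
trained = train(p0, 1.0, 0.8)
print(loss(p0, 1.0, 0.8), loss(trained, 1.0, 0.8))
```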
(3) Panning
Panning is a shooting technique that uses a slow shutter speed to track a target object. The camera follows the moving target, swinging in the same direction at a similar speed during the exposure. The technique is mainly used in sports photography. A panning shot is an image captured with this technique. It shows an artistic motion-blurred background: the foreground area (containing the target object) is sharp and the background area is blurred. FIG. 3b is a schematic diagram of a panning shot. In it, the foreground area (the moving car) is sharp, while the background area (the surroundings and other objects near the car) is blurred.
The method provided by this application is described below from the training side and the application side of the neural network.
The model training method provided by the embodiments of this application involves image processing and can be specifically applied to data processing methods such as data training, machine learning and deep learning. It performs symbolic and formalized intelligent information modeling, extraction, preprocessing, training and the like on training data (such as the image frames to be trained in this application), finally obtaining trained neural networks (such as the first neural network and the second neural network in this application). The image processing method provided by the embodiments of this application can use these trained neural networks: input data (such as the current image frame in this application) is fed into the trained neural networks to obtain output data (such as the depth information of the current image frame and the background area of the current image frame in this application). It should be noted that the model training method and the image processing method provided by the embodiments of this application are inventions based on the same concept, and can also be understood as two parts of one system, or two stages of one overall process, such as a model training stage and a model application stage.
Fig. 4 is a schematic flowchart of an image processing method provided by an embodiment of this application. The background area of an image frame processed by this method can have a realistic blur effect. As shown in Fig. 5 (a schematic diagram of an application scenario of the image processing method provided by an embodiment of this application), the terminal device may select one image frame from a group of consecutive image frames and process it so that the background area of that image frame has a realistic motion blur effect. Alternatively, the terminal device may process every image frame in the group so that the background area of each image frame has a realistic motion blur effect.
The image processing method provided by the embodiments of this application is described in detail below. As shown in Fig. 4, the method includes:
401. Acquire the depth information of the current image frame and the background area of the current image frame.
When the user wants to take a panning shot of a moving target object, a group of consecutive image frames can be acquired at a relatively high shutter speed through the camera of the terminal device (that is, the aforementioned user device or client device). The user may capture this group of image frames in various ways. For example, the user can set the camera of the terminal device to burst mode and then long-press the shutter to acquire the group of image frames. As another example, the user can press the shutter repeatedly to acquire the group of image frames. As a further example, the perception technology of the terminal device can be used to determine whether the current shooting scene matches a specific scene (a scene in which the target object is moving), and if so, burst shooting or multiple shots are triggered to obtain the group of image frames. As yet another example, the user can set the camera of the terminal device to video recording mode to obtain the group of image frames, and so on.
In this group, the image frames are ordered chronologically, and each image frame includes a foreground area and a background area, both of which contain (present) photographed subjects. The subject in the foreground area is generally the target object the user is interested in, while the subjects in the background area are non-target objects the user is not interested in. For example, the subject in the foreground area may be a moving car, and the subjects in the background area may be the sky, flowers and plants, the road, street lights and so on around the car. As another example, the subject in the foreground area may be a skier, and the subjects in the background area may be the houses, snow, trees and so on around the skier.
Because the background areas of this group of image frames have no obvious blur effect, the terminal device needs to process them so that the background area of one or more image frames has a realistic blur effect. From the group, the terminal device may select any image frame as the image frame to be processed, that is, the current image frame. The terminal device can select the current image frame in various ways. For example, it may determine the current image frame according to an instruction input by the user, that is, the current image frame is the frame designated by the user. As another example, it may score each image frame in the group with an aesthetic evaluation algorithm and take the frame with the highest score as the current image frame.
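Selecting the current image frame by score reduces to taking the arg-max over per-frame scores. The text does not specify the aesthetic evaluation algorithm, so the sketch below uses a hypothetical stand-in, `sharpness_score` (the variance of a discrete Laplacian), purely to make the selection step runnable; a real aesthetic model could be passed in as `score_fn` instead.

```python
import numpy as np

def sharpness_score(frame):
    """Hypothetical stand-in for an aesthetic score: variance of a 4-neighbour Laplacian."""
    f = np.asarray(frame, dtype=np.float64)
    lap = (-4.0 * f[1:-1, 1:-1] + f[:-2, 1:-1] + f[2:, 1:-1]
           + f[1:-1, :-2] + f[1:-1, 2:])
    return float(lap.var())

def select_current_frame(frames, score_fn=sharpness_score):
    """Score every frame in the group and return (index of best frame, all scores)."""
    scores = [score_fn(f) for f in frames]
    return int(np.argmax(scores)), scores

flat = np.zeros((8, 8))
checker = (np.indices((8, 8)).sum(axis=0) % 2).astype(float)
idx, scores = select_current_frame([flat, checker, np.full((8, 8), 5.0)])
print(idx)  # prints: 1 (the textured frame scores highest)
```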
After the current image frame is determined, the terminal device can acquire the depth information and the background area of the current image frame. The depth information of the current image frame is the depth value of each pixel in the frame, that is, the depth values of all pixels in the current image frame. The depth value of a pixel indicates the distance from the pixel's corresponding position in the real environment (three-dimensional space) to the camera. Thus, the depth information of the current image frame can indicate the distance from each subject in the frame to the camera, that is, the distance from the positions of these subjects in the real environment to the camera.
It is worth noting that the terminal device can acquire the depth value of each pixel in the current image frame in various ways. For example, the terminal device can obtain the depth values through the first neural network, that is, the first neural network performs monocular depth estimation on the current image frame to obtain the depth values of all its pixels. As another example, if the terminal device has a depth camera, it obtains the depth values of all pixels at the same time as it captures the current image frame through the depth camera. Further, the depth camera of the terminal device may be a time-of-flight (TOF) camera or a structured light camera.
The terminal device can also acquire the background area of the current image frame in various ways. For example, it can obtain the background area through the second neural network: the second neural network may perform salient object detection on the current image frame (directly detecting the most salient object in the frame, that is, the target object) to separate the foreground area from the background area directly; alternatively, the second neural network may perform object detection (detecting each subject in the current image frame) and target segmentation (determining the target object among the detected subjects) on the current image frame. As another example, the terminal device may divide the current image frame into a foreground area and a background area according to the user's instruction, and so on.
It should be understood that the first neural network may be any of models such as a multi-layer perceptron (MLP), a convolutional neural network (CNN), a recursive neural network or a recurrent neural network (RNN), and the second neural network may likewise be any of models such as an MLP, a CNN, a recursive neural network or an RNN; this is not limited here.
It should also be understood that the first neural network and the second neural network in the embodiments of this application are both trained neural network models. The training processes of the first and second neural networks are briefly introduced below:
(1) Before model training, a batch of image frames to be trained is acquired, and the true depth values of all pixels in each of these frames are determined in advance. After training starts, an image frame to be trained is input into the first model to be trained, which outputs the depth value of each pixel in that frame. Then, a preset target loss function is used to calculate the difference between the depth values output by the first model to be trained and the true depth values. If the difference is within an acceptable range, the frame is regarded as a qualified training frame; otherwise it is regarded as unqualified. This process is performed for every image frame in the batch and is not repeated here. If only a small number of frames in the batch are qualified, the parameters of the first model to be trained are adjusted and training is repeated with another batch of image frames, until a large number of the frames are qualified, yielding the first neural network.
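The batch-level acceptance rule in (1) can be sketched as follows. Everything concrete here is an assumption for illustration: the mean absolute error as the target loss, the `tolerance` and `required_fraction` values, the gradient step used to adjust the parameter, and the toy one-parameter "model" `depth = scale * intensity` trained on synthetic data. A real first neural network would replace all of these.

```python
import numpy as np

def frame_error(pred_depth, true_depth):
    # Assumed target loss: mean absolute depth error over all pixels of one frame
    return float(np.mean(np.abs(pred_depth - true_depth)))

def qualified_fraction(model, frames, true_depths, tolerance):
    # A frame is "qualified" when its error lies within the acceptable range
    ok = [frame_error(model(f), g) <= tolerance for f, g in zip(frames, true_depths)]
    return sum(ok) / len(ok)

def train_until_qualified(batches, scale=0.0, lr=0.5, tolerance=0.1,
                          required_fraction=0.9, max_rounds=100):
    """Toy first model: depth = scale * intensity (hypothetical one-parameter stand-in)."""
    frac = 0.0
    for _ in range(max_rounds):
        frames, true_depths = next(batches)
        frac = qualified_fraction(lambda f: scale * f, frames, true_depths, tolerance)
        if frac >= required_fraction:
            break  # most frames qualified: accept the model
        # Too few qualified frames: adjust the parameter (gradient step on squared error)
        grad = np.mean([np.mean(2.0 * (scale * f - g) * f)
                        for f, g in zip(frames, true_depths)])
        scale -= lr * grad
    return scale, frac

def make_batches(seed=0, batch_size=8):
    """Endless supply of synthetic batches whose true depth is 2 * intensity."""
    rng = np.random.default_rng(seed)
    while True:
        frames = [rng.random((4, 4)) for _ in range(batch_size)]
        yield frames, [2.0 * f for f in frames]

scale, frac = train_until_qualified(make_batches())
```

Each new round draws a fresh batch, mirroring "re-train with another batch of image frames" in the text.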
(2) Before model training, a batch of image frames to be trained is acquired, and the true background area of each of these frames is determined in advance. After training starts, an image frame to be trained is input into the second model to be trained, which outputs the background area of that frame. Then, the target loss function is used to calculate the difference between the background area output by the second model to be trained and the true background area. If the difference is within an acceptable range, the frame is regarded as a qualified training frame; otherwise it is regarded as unqualified. This process is performed for every image frame in the batch and is not repeated here. If only a small number of frames in the batch are qualified, the parameters of the second model to be trained are adjusted and training is repeated with another batch of image frames, until a large number of the frames are qualified, yielding the second neural network.
402. Determine the depth information of the background area of the current image frame from the depth information of the current image frame.
After acquiring the depth information and the background area of the current image frame, the terminal device can determine, from the depth information of the current image frame, the depth information of its background area, which indicates the distance from each subject in the background area to the camera, that is, the distance from the positions of these subjects in the real environment to the camera. Specifically, the terminal device determines the depth value of each pixel in the background area from the depth values of all pixels in the current image frame: it identifies which pixels of the current image frame lie in the background area, and the depth values of those pixels are the depth values of all pixels in the background area of the current image frame.
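Step 402 amounts to indexing the per-pixel depth map with the background mask. A minimal NumPy sketch, assuming the depth map and the background mask have the same shape and that foreground pixels are marked as NaN (an illustrative choice, not stated in the text):

```python
import numpy as np

def background_depth(depth, background_mask):
    """Keep depth values of background pixels; mark foreground pixels as NaN."""
    depth = np.asarray(depth, dtype=np.float64)
    mask = np.asarray(background_mask, dtype=bool)
    if depth.shape != mask.shape:
        raise ValueError("depth map and background mask must have the same shape")
    return np.where(mask, depth, np.nan)

depth = np.array([[1.0, 2.0], [3.0, 4.0]])
mask = np.array([[True, False], [True, True]])   # False = foreground (target object)
bg = background_depth(depth, mask)
print(depth[mask])  # depth values of background pixels only: [1. 3. 4.]
```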
403. Divide the background area into multiple sub-areas according to the depth information of the background area of the current image frame, where the subjects corresponding to different sub-areas are at different distances from the camera.
After obtaining the depth information of the background area of the current image frame, the terminal device can divide the background area into multiple sub-areas according to this depth information, with the subjects of different sub-areas at different distances from the camera. Specifically, after obtaining the depth values of all pixels in the background area, the terminal device calculates the depth change rate of each pixel from the depth values as follows:
$$G(i,j) = d_x(i,j) + d_y(i,j)$$
$$d_x(i,j) = D(i+1,j) - D(i,j)$$
$$d_y(i,j) = D(i,j+1) - D(i,j)$$
where G(i,j) is the depth change rate of the pixel, D(i,j) is the depth value of the pixel, D(i+1,j) and D(i,j+1) are the depth values of neighboring pixels, i = 1, 2, 3, …, N, and j = 1, 2, 3, …, N.
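The formula translates directly into array operations. In the sketch below, the axis convention (i along rows, j along columns) and the handling of the last row and column, which have no D(i+1,j) or D(i,j+1) neighbour, are assumptions, since the text does not specify them.

```python
import numpy as np

def depth_change_rate(D):
    """G(i,j) = dx(i,j) + dy(i,j) with forward differences, as in the formula above.

    Assumptions: i indexes rows (axis 0), j indexes columns (axis 1), and the last
    row/column, which have no (i+1) or (j+1) neighbour, get a difference of 0.
    """
    D = np.asarray(D, dtype=np.float64)
    dx = np.zeros_like(D)
    dy = np.zeros_like(D)
    dx[:-1, :] = D[1:, :] - D[:-1, :]   # D(i+1, j) - D(i, j)
    dy[:, :-1] = D[:, 1:] - D[:, :-1]   # D(i, j+1) - D(i, j)
    return dx + dy

D = np.array([[1.0, 1.0, 5.0]] * 3)     # depth jumps between columns 1 and 2
print(depth_change_rate(D))             # each row: [0. 4. 0.]
```

Because G sums signed differences, an increase along one axis and an equal decrease along the other cancel out; a variant using |dx| + |dy| would avoid this, but the text specifies the signed sum.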
In this way, the terminal device obtains the depth change rate of every pixel in the background area of the current image frame. For any pixel, the depth change rate indicates the difference between the depth value of that pixel and the depth values of the surrounding pixels, that is, the difference between the distance from the pixel's corresponding position in the real environment to the camera and the distance from the surrounding pixels' corresponding positions to the camera. Thus, a small depth change rate at a pixel means that the pixel's actual position (its corresponding position in the real environment) is at nearly the same distance from the camera as the actual positions of the surrounding pixels, whereas a large depth change rate means that these distances differ considerably. The terminal device can therefore divide the background area of the current image frame into multiple sub-areas according to the magnitude of the depth change rate.
Specifically, the terminal device may divide the background area of the current image frame into multiple sub-areas according to the depth change rate of each pixel in the background area and a preset change rate threshold. It should be noted that the change rate threshold is equal or approximately equal to the depth change rate at the edge points of each sub-area, and it is generally set to a relatively large value, so there is a large difference between the depth value of an edge point and the depth values of the pixels around it; that is, the depth value changes abruptly at edge points. In other words, the distance from the actual position of an edge point to the camera differs greatly from the distances from the actual positions of the surrounding pixels to the camera. Therefore, using the depth change rate of each pixel in the background area and the preset change rate threshold, the edge points of each sub-area in the background area can be determined, and from them the multiple sub-areas. As a result, different sub-areas correspond to actual positions at different distances: subjects within the same sub-area are at the same or similar distances from the camera, while subjects in different sub-areas are at clearly different distances.
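One way to turn the threshold rule into sub-areas is to treat pixels whose depth change rate reaches the threshold as edge points and group the remaining background pixels into connected components. The sketch below assumes that |G| is compared with the threshold and uses 4-connectivity; edge points themselves are left unlabeled (0), and assigning them to a neighbouring sub-area is a further design choice the text does not address.

```python
import numpy as np
from collections import deque

def split_into_subareas(G, background_mask, threshold):
    """Label background sub-areas separated by edge points where |G| >= threshold.

    Returns an int array: 0 = foreground or edge point, 1..K = sub-area labels.
    """
    G = np.asarray(G, dtype=np.float64)
    interior = np.asarray(background_mask, dtype=bool) & (np.abs(G) < threshold)
    labels = np.zeros(G.shape, dtype=int)
    rows, cols = G.shape
    next_label = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if not interior[r0, c0] or labels[r0, c0]:
                continue
            next_label += 1                  # start a new sub-area
            labels[r0, c0] = next_label
            queue = deque([(r0, c0)])
            while queue:                     # breadth-first flood fill
                r, c = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and interior[nr, nc] and not labels[nr, nc]):
                        labels[nr, nc] = next_label
                        queue.append((nr, nc))
    return labels

G = np.zeros((3, 6))
G[:, 2] = 8.0                                # a column of abrupt depth change
labels = split_into_subareas(G, np.ones((3, 6), dtype=bool), threshold=4.0)
print(labels[0])  # [1 1 0 2 2 2]
```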
For example, the terminal device divides the background area of the current image frame into three sub-areas according to the depth change rates of all pixels in the background area: the first sub-area is the road on which the car is driving, the second is the plants behind the road, and the third is the buildings behind the plants. The subjects in the first sub-area are closest to the camera, and those in the third sub-area are farthest from it.
404. Among the multiple sub-areas, acquire the motion vector of each sub-area, where the motion vector of a sub-area indicates its motion relative to the previous image frame.
When tracking a target object, the camera generally rotates or translates. When the camera shoots while moving, subjects at different distances move differently relative to the camera (which can also be understood as moving to different degrees): for example, nearer subjects move more and farther subjects move less, and this is reflected in the consecutive image frames captured by the camera. Specifically, among the sub-areas of the background area of the current image frame, the subjects of different sub-areas are at different distances from the camera, so they move differently relative to the camera. When the camera captures two adjacent image frames, the position of a sub-area (or equivalently, of the subjects it contains) in the current image frame will have changed compared with its position in the previous image frame, and the change differs between sub-areas, that is, different sub-areas move differently. Thus, taking the frame preceding the current image frame as the reference, different sub-areas of the background area of the current image frame move differently relative to the previous image frame.
Continuing the above example, suppose the background area of the current image frame contains three sub-areas: the first is the road on which the car is driving, the second is the plants behind the road, and the third is the buildings behind the plants. Then, from the previous image frame to the current image frame, the first sub-area moves the most, the second moves less, and the third moves the least.
To determine the motion of each sub-area in the background area of the current image frame, the terminal device can acquire the motion vector of each sub-area. The motion vector of a sub-area contains its motion speed and motion direction and indicates the motion of the sub-area relative to the previous image frame. Specifically, for each of the sub-areas, the terminal device may first perform corner detection on the sub-area to determine at least one target pixel (that is, a corner point); these target pixels are usually the pixels with the most distinctive features in the sub-area. Then, using an optical flow method, the terminal device determines the distance these target pixels move from the previous image frame to the current image frame, their positions in the previous image frame, and their positions in the current image frame.
Next, the terminal device calculates the motion speed of these target pixels from the previous image frame to the current image frame from their moving distance and the time difference between the two frames, and determines their motion direction from their positions in the previous and current image frames. Finally, the terminal device determines the motion speed of the sub-area from the motion speeds of these target pixels (for example, as their average) and takes the motion direction of these target pixels as the motion direction of the sub-area.
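Once corners have been tracked between the two frames (for example with OpenCV's `cv2.goodFeaturesToTrack` and `cv2.calcOpticalFlowPyrLK`, which is one possible choice and not necessarily the one intended here), the speed and direction of a sub-area follow from simple arithmetic. The sketch below averages the corner speeds, as the text suggests, and, as an assumption, takes the direction of the mean displacement when several corners are tracked.

```python
import numpy as np

def subarea_motion(prev_pts, curr_pts, dt):
    """Motion speed and direction of one sub-area from tracked corner points.

    prev_pts, curr_pts: (N, 2) arrays of (x, y) positions of the same corners in the
    previous and current frames (assumed output of corner detection + optical flow).
    dt: time between the two frames, in seconds.
    """
    prev_pts = np.asarray(prev_pts, dtype=np.float64)
    curr_pts = np.asarray(curr_pts, dtype=np.float64)
    disp = curr_pts - prev_pts                      # per-corner displacement
    speeds = np.linalg.norm(disp, axis=1) / dt      # pixels per second
    speed = float(speeds.mean())                    # average of corner speeds
    mean_disp = disp.mean(axis=0)
    angle = float(np.degrees(np.arctan2(mean_disp[1], mean_disp[0])))
    return speed, angle

speed, angle = subarea_motion([[0, 0], [10, 10]], [[3, 4], [13, 14]], dt=0.5)
print(speed, round(angle, 2))  # prints: 10.0 53.13
```

With image coordinates (y pointing down), the returned angle is measured from the +x axis toward the +y axis.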
It should be noted that the motion speeds and motion directions of the remaining sub-areas are determined in the same way as described above, and details are not repeated here.
405. Blur each sub-area according to its motion vector.
After obtaining the motion speed and motion direction of each sub-area in the background area of the current image frame, the terminal device constructs, for each sub-area, a convolution kernel from the motion speed and motion direction of that sub-area, and then convolves the sub-area with its kernel. Because different sub-areas move differently, their convolution kernels also differ, so the terminal device can use these kernels to blur different sub-areas to different degrees. As a result, different sub-areas in the background area of the current image frame can have different degrees of blur, producing a layered and more realistic motion blur effect.
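A common way to build such a kernel is a normalized line segment oriented along the motion direction. In the sketch below, the kernel length `speed * exposure` (in pixels) is an assumed mapping from speed to blur extent, and `exposure` is a hypothetical simulated-shutter parameter; the text states only that the kernel is built from the sub-area's speed and direction. The blurred result is written back only inside the sub-area's mask.

```python
import numpy as np

def motion_blur_kernel(speed, angle_deg, exposure):
    """Linear motion-blur kernel whose length is speed * exposure pixels (assumed model)."""
    length = max(1, int(round(speed * exposure)))
    size = length if length % 2 == 1 else length + 1   # odd size so the kernel has a centre
    kernel = np.zeros((size, size))
    c = size // 2
    theta = np.radians(angle_deg)
    # Mark pixels along a segment through the centre in the motion direction
    for t in np.linspace(-(length - 1) / 2, (length - 1) / 2, 2 * length):
        x = int(round(c + t * np.cos(theta)))
        y = int(round(c + t * np.sin(theta)))
        kernel[y, x] = 1.0
    return kernel / kernel.sum()                       # preserve overall brightness

def blur_subarea(image, mask, kernel):
    """Convolve the image with `kernel` and keep the result only inside `mask`."""
    image = np.asarray(image, dtype=np.float64)
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    flipped = kernel[::-1, ::-1]                       # true convolution flips the kernel
    blurred = np.zeros_like(image)
    for dy in range(kh):
        for dx in range(kw):
            blurred += flipped[dy, dx] * padded[dy:dy + image.shape[0],
                                                dx:dx + image.shape[1]]
    return np.where(mask, blurred, image)

image = np.zeros((7, 7))
image[3, 3] = 1.0
kernel = motion_blur_kernel(speed=3.0, angle_deg=0.0, exposure=1.0)
out = blur_subarea(image, np.ones((7, 7), dtype=bool), kernel)
print(np.round(out[3, 2:5], 3))  # [0.333 0.333 0.333]
```

Calling `blur_subarea` once per sub-area, each with its own kernel, yields the layered blur described above: faster-moving (nearer) sub-areas get longer kernels and therefore stronger blur.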
In the embodiments of this application, after acquiring the depth information of the background area of the current image frame, the terminal device divides the background area into multiple sub-areas according to that depth information. Because the subjects of different sub-areas are at different distances from the camera, the sub-areas move differently relative to the previous image frame. The terminal device therefore acquires the motion vector of each sub-area, which indicates the motion of that sub-area relative to the previous image frame. Since different sub-areas move differently, that is, have different motion vectors, the terminal device blurs each sub-area according to its own motion vector, applying different degrees of blur to different sub-areas so that the background area of the current image frame has a more realistic blur effect.
For further understanding, the image processing method provided by the embodiments of this application is described below with reference to an application example. Fig. 6 is a schematic diagram of an application example of the image processing method provided by an embodiment of this application, and Fig. 7 is another schematic diagram of this application example. As shown in Fig. 6 and Fig. 7, the application example includes:
(1) After determining the current image frame 601, the terminal device obtains the depth image 602 of the current image frame (that is, the aforementioned depth information of the current image frame) through the first neural network, where areas of different colors in the depth image 602 are at different distances from the camera.
(2) The terminal device obtains the saliency map 603 of the current image frame through the second neural network, where the saliency map 603 highlights the background area of the current image frame, that is, the dark part of the saliency map 603.
(3) The terminal device combines the saliency map 603 of the current image frame with the depth image 602 of the current image frame 601 to determine the depth image of the background area of the current image frame (that is, the aforementioned depth information of the background area of the current image frame).
(4) From the depth image of the background area of the current image frame, the terminal device calculates the depth change rate of each pixel in the background area and divides the background area into multiple sub-areas according to the magnitude of the depth change rate.
(5) Using an optical flow method with the previous image frame 605 as the reference, the terminal device marks, in the current image frame 601, the motion vectors (including motion speed and motion direction) of the corner points of each sub-area in the background area, and determines the motion speed and motion direction of each sub-area from the motion vectors of its corner points.
(6) The terminal device determines the convolution kernel corresponding to each sub-area according to the motion speed and motion direction of that sub-area, and uses this kernel to perform the convolution operation on the sub-area, giving the sub-area a certain degree of blur. In this way, different sub-areas can be blurred to different degrees, so that the background area of the current image frame has a layered, and therefore more realistic, blur effect.
The above is a detailed description of the image processing method provided by the embodiments of this application. The model training method provided by the embodiments of this application is introduced below. FIG. 8 is a schematic flowchart of the model training method provided by an embodiment of this application. The method includes:
801. Obtain the depth value of each pixel in the image frame to be trained through the first model to be trained;
802. Using a preset target loss function, calculate the deviation between the depth value of each pixel in the image frame to be trained and the true depth value of that pixel;
803. Update the parameters of the first model to be trained according to the deviation until the model training condition is met, to obtain the first neural network.
It should be noted that, for steps 801 to 803, reference may be made to the description of the training process of the first neural network in step 401 above; details are not repeated here. Steps 801 to 803 yield the first neural network used in step 401, which can perform accurate monocular depth estimation on any image frame to obtain the depth values of all pixels in that frame.
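The patent leaves the first model, the target loss function, and the training condition unspecified. As a minimal sketch only, the loop below assumes a toy per-pixel linear model (depth = w * feature + b), mean squared error as the target loss, and "loss below a tolerance, or an iteration cap" as the training condition; a real first neural network would be a deep monocular depth estimator trained by backpropagation. The function name `train_depth_model` and its parameters are hypothetical.

```python
def train_depth_model(features, true_depths, lr=0.1, tol=1e-6, max_iters=10_000):
    """Fit depth ~ w * feature + b by gradient descent on mean squared error.

    features and true_depths are flat lists with one entry per pixel of a
    training frame. Returns (w, b, final_loss).
    """
    w, b = 0.0, 0.0
    n = len(features)
    loss = float("inf")
    for _ in range(max_iters):
        # Step 801 (toy version): predict a depth value for every pixel.
        preds = [w * f + b for f in features]
        # Step 802: measure the deviation with the assumed MSE target loss.
        errors = [p - t for p, t in zip(preds, true_depths)]
        loss = sum(e * e for e in errors) / n
        # Assumed "model training condition": loss below a tolerance.
        if loss < tol:
            break
        # Step 803: move the parameters along the negative gradient.
        grad_w = 2.0 * sum(e * f for e, f in zip(errors, features)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, loss


# Usage: recover depth = 1 + 3 * feature from five synthetic pixels.
features = [0.0, 0.25, 0.5, 0.75, 1.0]
true_depths = [1.0 + 3.0 * f for f in features]
w, b, loss = train_depth_model(features, true_depths)
```

Stopping on a loss tolerance with an iteration cap is only one possible reading of "until the model training condition is met"; a validation-based criterion would work the same way.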
FIG. 9 is another schematic flowchart of the model training method provided by an embodiment of this application. The method includes:
901. Obtain the background area of the image frame to be trained through the second model to be trained;
902. Using a preset target loss function, calculate the deviation between the background area of the image frame to be trained and the real background area of that frame;
903. Update the parameters of the second model to be trained according to the deviation until the model training condition is met, to obtain the second neural network.
It should be noted that, for steps 901 to 903, reference may be made to the description of the training process of the second neural network in step 401 above; details are not repeated here. Steps 901 to 903 yield the second neural network used in step 401, which can perform accurate salient object detection on any image frame to obtain the background area of that frame.
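Steps 901 to 903 can be illustrated with a toy stand-in for the second model. This sketch assumes a per-pixel logistic classifier that predicts the probability that a pixel is background, binary cross-entropy as the target loss, and a fixed number of gradient-descent iterations as the training condition. None of these choices comes from the patent, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce(probs, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted and true background masks."""
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(probs, labels)) / len(labels)


def train_background_model(features, bg_labels, lr=0.5, iters=2000):
    """Fit P(background | feature) = sigmoid(w * feature + b).

    bg_labels is the "real background area" as a 0/1 mask (1 = background).
    Returns (w, b, loss), where loss is measured just before the last update.
    """
    w, b = 0.0, 0.0
    n = len(features)
    loss = float("inf")
    for _ in range(iters):
        # Step 901 (toy version): predicted background probability per pixel.
        probs = [sigmoid(w * x + b) for x in features]
        # Step 902: deviation from the real background, measured with BCE.
        loss = bce(probs, bg_labels)
        # Step 903: for BCE the gradient w.r.t. each logit is simply (p - y).
        grad_w = sum((p - y) * x for p, y, x in zip(probs, bg_labels, features)) / n
        grad_b = sum(p - y for p, y in zip(probs, bg_labels)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, loss


# Usage: dark pixels (low feature) are labelled background in this toy data.
features = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
labels = [1, 1, 1, 0, 0, 0]
w, b, loss = train_background_model(features, labels)
```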
The above is a detailed description of the model training method provided by the embodiments of this application. The image processing apparatus provided by the embodiments of this application is introduced below. FIG. 10 is a schematic structural diagram of an image processing apparatus provided by an embodiment of this application. As shown in FIG. 10, the apparatus is the terminal device described above and includes:
An acquisition module 1001, configured to acquire depth information of the background area of the current image frame;
A division module 1002, configured to divide the background area into multiple sub-areas according to the depth information, where the objects corresponding to different sub-areas are at different distances from the camera, and the camera is used to capture the current image frame;
A processing module 1003, configured to blur different sub-areas to different degrees to obtain a processed current image frame.
In this embodiment, after acquiring the depth information of the background area of the current image frame, the terminal device divides the background area into multiple sub-areas according to the depth information. Because the objects corresponding to different sub-areas are at different distances from the camera, the sub-areas also move differently relative to the previous image frame. The terminal device can therefore obtain a motion vector for each sub-area, which indicates how that sub-area has moved relative to the previous image frame. Since different sub-areas have different motion vectors, the terminal device can blur each sub-area according to its own motion vector; that is, it blurs different sub-areas to different degrees according to how they move, giving the background area of the current image frame a more realistic blur effect.
In a possible implementation, the depth information of the background area of the current image frame includes the depth value of each pixel in the background area, and the division module 1002 is specifically configured to: determine the depth change rate of each pixel in the background area of the current image frame from the depth values of the pixels in the background area, where the depth change rate of a pixel is determined from the depth value of that pixel and the depth values of the pixels surrounding it; and divide the background area into multiple sub-areas according to the depth change rate of each pixel and a preset change rate threshold.
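One way to read this implementation, sketched under assumptions: the depth change rate of a pixel is taken as the largest absolute depth difference to its 4-neighbours, pixels whose rate exceeds the preset threshold are treated as depth edges, and each connected group of the remaining background pixels becomes a sub-area. The patent does not fix the neighbourhood, the rate formula, or how edge pixels are assigned; `depth_change_rate` and `split_background` are hypothetical names, and foreground pixels are represented as `None`.

```python
from collections import deque

NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connectivity


def depth_change_rate(depth):
    """Per-pixel change rate: largest |depth difference| to any 4-neighbour.

    depth is a 2-D list; foreground pixels are None and are ignored.
    """
    h, w = len(depth), len(depth[0])
    rate = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            d = depth[y][x]
            if d is None:
                continue
            diffs = [abs(d - depth[y + dy][x + dx])
                     for dy, dx in NEIGHBOURS
                     if 0 <= y + dy < h and 0 <= x + dx < w
                     and depth[y + dy][x + dx] is not None]
            rate[y][x] = max(diffs, default=0.0)
    return rate


def split_background(depth, threshold):
    """Label connected background pixels whose change rate is <= threshold.

    Returns (labels, count): labels[y][x] is 0, 1, 2, ... for sub-areas and
    None for foreground pixels and high-rate depth-edge pixels.
    """
    rate = depth_change_rate(depth)
    h, w = len(depth), len(depth[0])
    labels = [[None] * w for _ in range(h)]
    count = 0
    for sy in range(h):
        for sx in range(w):
            r = rate[sy][sx]
            if r is None or r > threshold or labels[sy][sx] is not None:
                continue
            # Breadth-first flood fill over smooth (low-rate) pixels.
            labels[sy][sx] = count
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in NEIGHBOURS:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w and labels[ny][nx] is None
                            and rate[ny][nx] is not None and rate[ny][nx] <= threshold):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
            count += 1
    return labels, count
```

Edge pixels are left unlabelled here for simplicity; a fuller implementation might merge each of them into the adjacent sub-area with the closest depth.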
In a possible implementation, the processing module 1003 is specifically configured to: obtain a motion vector for each of the multiple sub-areas, where the motion vector of a sub-area indicates the motion of that sub-area relative to the previous image frame; and blur each sub-area according to its motion vector to obtain the processed current image frame.
In a possible implementation, the processing module 1003 is specifically configured to: for each of the multiple sub-areas, determine the motion speed of the sub-area from the motion speed of at least one target pixel in the sub-area between the previous image frame and the current image frame, and determine the motion direction of the sub-area from the motion direction of the at least one target pixel between the previous image frame and the current image frame.
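The patent does not say how the motion of several target pixels is combined. A minimal sketch, assuming the sub-area motion is the mean displacement vector of its target pixels (for example its corner points), converted to a speed and an angle; `region_motion` and `frame_interval` are hypothetical.

```python
import math


def region_motion(displacements, frame_interval=1.0):
    """Aggregate per-target-pixel motion into one speed and direction.

    displacements: list of (dx, dy) pixel displacements from the previous
    frame to the current frame, e.g. for the sub-area's corner points.
    Returns (speed in pixels per unit time, direction in radians).
    """
    if not displacements:
        return 0.0, 0.0
    n = len(displacements)
    # Average the vectors, not the angles.
    mean_dx = sum(dx for dx, _ in displacements) / n
    mean_dy = sum(dy for _, dy in displacements) / n
    speed = math.hypot(mean_dx, mean_dy) / frame_interval
    direction = math.atan2(mean_dy, mean_dx)
    return speed, direction
```

Averaging the (dx, dy) vectors rather than the angles avoids the wrap-around problem, where directions such as 359° and 1° would otherwise average to 180°.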
In a possible implementation, the processing module 1003 is specifically configured to: for each sub-area, construct the convolution kernel corresponding to the sub-area according to its motion speed and motion direction, and convolve the sub-area with that kernel.
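A common way to turn a speed and a direction into a blur kernel is a normalized line kernel oriented along the motion; the patent does not specify the kernel shape, so that is an assumption here. In the sketch below the kernel length is the speed multiplied by a hypothetical `scale` factor (rounded to an odd number of pixels), and the hypothetical `blur_region` convolves only the pixels carrying a given sub-area label, replicating the image border.

```python
import math


def motion_blur_kernel(speed, direction, scale=1.0):
    """Normalized line kernel along `direction` whose length grows with speed."""
    length = max(1, int(round(speed * scale)))
    if length % 2 == 0:
        length += 1  # odd size so the kernel has a centre pixel
    c = length // 2
    k = [[0.0] * length for _ in range(length)]
    dx, dy = math.cos(direction), math.sin(direction)
    # Rasterise a segment of `length` samples centred on the kernel origin.
    for t in range(-c, c + 1):
        x = math.floor(c + t * dx + 0.5)
        y = math.floor(c + t * dy + 0.5)
        k[y][x] = 1.0
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]


def blur_region(image, labels, region_id, kernel):
    """Convolve only pixels whose label equals region_id; copy the rest.

    Out-of-range coordinates are clamped (replicate padding).
    """
    h, w = len(image), len(image[0])
    size = len(kernel)
    c = size // 2
    out = [row[:] for row in image]
    for y in range(h):
        for x in range(w):
            if labels[y][x] != region_id:
                continue
            acc = 0.0
            for ky in range(size):
                for kx in range(size):
                    # True convolution: the kernel offset is subtracted.
                    sy = min(max(y - (ky - c), 0), h - 1)
                    sx = min(max(x - (kx - c), 0), w - 1)
                    acc += kernel[ky][kx] * image[sy][sx]
            out[y][x] = acc
    return out
```

Faster sub-areas get longer kernels and therefore stronger blur, which is what produces the layered effect described above.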
In a possible implementation, the at least one target pixel is a corner point.
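The patent does not name a corner detector. The Harris detector is one widely used option and is shown here purely as an assumption: it scores each pixel by R = det(M) - k * trace(M)^2, where M is the structure tensor of image gradients summed over a small window, so R is large only where the intensity changes in two directions. `harris_response` and `strongest_corner` are hypothetical helpers for small grey-level images stored as nested lists.

```python
def harris_response(img, k=0.04, win=1):
    """Harris corner response for every pixel at least `win` from the border."""
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    # Central-difference gradients (left at zero on the image border).
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    resp = [[0.0] * w for _ in range(h)]
    for y in range(win, h - win):
        for x in range(win, w - win):
            # Structure tensor M = [[sxx, sxy], [sxy, syy]] over the window.
            sxx = syy = sxy = 0.0
            for dy in range(-win, win + 1):
                for dx in range(-win, win + 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            resp[y][x] = sxx * syy - sxy * sxy - k * (sxx + syy) ** 2
    return resp


def strongest_corner(img, **kwargs):
    """(y, x) of the pixel with the largest Harris response."""
    resp = harris_response(img, **kwargs)
    return max((r, (y, x)) for y, row in enumerate(resp) for x, r in enumerate(row))[1]


# Usage: a bright square whose only interior corner is at (4, 4).
img = [[1.0 if (y >= 4 and x >= 4) else 0.0 for x in range(10)] for y in range(10)]
corner = strongest_corner(img)
```

Straight edges give det(M) close to zero and hence a negative R, which is why only the corner stands out.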
In a possible implementation, the motion speed and motion direction of the at least one target pixel are obtained by an optical flow method.
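The optical flow method is not specified either. The sketch below assumes the classic single-level Lucas-Kanade estimator: within a small window around a target pixel, it solves the brightness-constancy equations Ix * u + Iy * v + It = 0 in the least-squares sense for the displacement (u, v). Averaging the spatial gradients of both frames is a design choice of this sketch, the function name is hypothetical, and production code would typically use a pyramidal implementation.

```python
def lucas_kanade(prev, curr, y, x, win=2):
    """Estimate the (u, v) motion of pixel (y, x) between two grey frames.

    (y, x) must lie at least win + 1 pixels inside the image border.
    Raises ValueError when the window lacks 2-D texture (aperture problem).
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for yy in range(y - win, y + win + 1):
        for xx in range(x - win, x + win + 1):
            # Spatial gradients averaged over both frames; temporal difference.
            ix = ((prev[yy][xx + 1] - prev[yy][xx - 1])
                  + (curr[yy][xx + 1] - curr[yy][xx - 1])) / 4.0
            iy = ((prev[yy + 1][xx] - prev[yy - 1][xx])
                  + (curr[yy + 1][xx] - curr[yy - 1][xx])) / 4.0
            it = curr[yy][xx] - prev[yy][xx]
            # Accumulate the normal equations A^T A [u, v] = -A^T It.
            a11 += ix * ix
            a12 += ix * iy
            a22 += iy * iy
            b1 -= ix * it
            b2 -= iy * it
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("aperture problem: window has no 2-D texture")
    u = (a22 * b1 - a12 * b2) / det
    v = (a11 * b2 - a12 * b1) / det
    return u, v


# Usage: a textured frame shifted one pixel to the right.
prev = [[float(x * x + 3 * y * y) for x in range(12)] for y in range(12)]
curr = [[float((x - 1) ** 2 + 3 * y * y) for x in range(12)] for y in range(12)]
u, v = lucas_kanade(prev, curr, 5, 5)
```

The speed of the target pixel is then the length of (u, v) divided by the frame interval, and its direction is the angle of (u, v).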
In a possible implementation, the acquisition module 1001 is specifically configured to: obtain the depth value of each pixel in the current image frame and the background area of the current image frame; and determine, from the depth values of all pixels in the current image frame, the depth value of each pixel in the background area.
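A minimal sketch of this selection step, assuming the second neural network outputs a per-pixel background score in [0, 1] that is thresholded into a mask. The score convention and the 0.5 threshold are assumptions. Foreground pixels are marked `None` so that later steps can skip them, and `background_depth` is a hypothetical name.

```python
def background_depth(depth, saliency, threshold=0.5):
    """Select the depth values of background pixels.

    depth: per-pixel depth map (e.g. from the first neural network or a
    depth camera). saliency: per-pixel background score in [0, 1]
    (assumed convention: higher means background).
    Returns a map that keeps depth for background pixels and None elsewhere.
    """
    return [[d if s >= threshold else None for d, s in zip(drow, srow)]
            for drow, srow in zip(depth, saliency)]
```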
In a possible implementation, the acquisition module 1001 is specifically configured to obtain the depth value of each pixel in the current image frame through the first neural network.
In a possible implementation, the camera is a depth camera, and the acquisition module 1001 is specifically configured to obtain the depth value of each pixel in the current image frame through the depth camera.
In a possible implementation, the acquisition module 1001 is specifically configured to obtain the background area of the current image frame through the second neural network.
In a possible implementation, the depth camera is a time-of-flight (TOF) camera or a structured light camera.
In a possible implementation, the first neural network or the second neural network is any one of a multilayer perceptron, a convolutional neural network, a recursive neural network, or a recurrent neural network.
The above is a detailed description of the image processing apparatus provided by the embodiments of this application. The model training apparatus provided by the embodiments of this application is introduced below. FIG. 11 is a schematic structural diagram of a model training apparatus provided by an embodiment of this application. As shown in FIG. 11, the apparatus includes:
An acquisition module 1101, configured to obtain the depth value of each pixel in the image frame to be trained through the first model to be trained;
A calculation module 1102, configured to calculate, using a preset target loss function, the deviation between the depth value of each pixel in the image frame to be trained and the true depth value of that pixel;
An update module 1103, configured to update the parameters of the first model to be trained according to the deviation until the model training condition is met, to obtain the first neural network.
FIG. 12 is another schematic structural diagram of a model training apparatus provided by an embodiment of this application. As shown in FIG. 12, the apparatus includes:
An acquisition module 1201, configured to obtain the background area of the image frame to be trained through the second model to be trained;
A calculation module 1202, configured to calculate, using a preset target loss function, the deviation between the background area of the image frame to be trained and the real background area of that frame;
An update module 1203, configured to update the parameters of the second model to be trained according to the deviation until the model training condition is met, to obtain the second neural network.
It should be noted that the information exchange and execution processes between the modules/units of the above apparatuses are based on the same concept as the method embodiments of this application and bring the same technical effects. For details, refer to the descriptions in the method embodiments above; they are not repeated here.
An embodiment of this application further relates to an execution device. FIG. 13 is a schematic structural diagram of the execution device provided by an embodiment of this application. As shown in FIG. 13, the execution device 1300 may be a mobile phone, a tablet, a notebook computer, a smart wearable device, a server, or the like, which is not limited here. The image processing apparatus described in the embodiment corresponding to FIG. 10 may be deployed on the execution device 1300 to implement the image processing function of the embodiment corresponding to FIG. 4. Specifically, the execution device 1300 includes a receiver 1301, a transmitter 1302, a processor 1303, and a memory 1304 (the execution device 1300 may have one or more processors 1303; one processor is shown in FIG. 13 as an example), where the processor 1303 may include an application processor 13031 and a communication processor 13032. In some embodiments of this application, the receiver 1301, the transmitter 1302, the processor 1303, and the memory 1304 may be connected by a bus or in another manner.
The memory 1304 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1303. A part of the memory 1304 may further include a non-volatile random access memory (NVRAM). The memory 1304 stores the processor's operating instructions, executable modules, or data structures, or a subset or an extended set thereof, where the operating instructions may include various operating instructions for implementing various operations.
The processor 1303 controls the operation of the execution device. In a specific application, the components of the execution device are coupled together through a bus system, which may include a power bus, a control bus, a status signal bus, and the like in addition to a data bus. For clarity, however, the various buses are all referred to as the bus system in the figure.
The methods disclosed in the above embodiments of this application may be applied to, or implemented by, the processor 1303. The processor 1303 may be an integrated circuit chip with signal processing capability. During implementation, each step of the above methods may be completed by an integrated logic circuit of hardware in the processor 1303 or by instructions in the form of software. The processor 1303 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor 1303 may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application may be executed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well established in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1304; the processor 1303 reads the information in the memory 1304 and completes the steps of the above methods in combination with its hardware.
The receiver 1301 may be configured to receive input numeric or character information and to generate signal inputs related to the settings and function control of the execution device. The transmitter 1302 may be configured to output numeric or character information through a first interface; the transmitter 1302 may also be configured to send instructions to a disk group through the first interface to modify data in the disk group; and the transmitter 1302 may further include a display device such as a display screen.
In this embodiment of this application, in one case, the processor 1303 is configured to execute the image processing method performed by the terminal device in the embodiment corresponding to FIG. 4.
An embodiment of this application further relates to a training device. FIG. 14 is a schematic structural diagram of the training device provided by an embodiment of this application. As shown in FIG. 14, the training device 1400 is implemented by one or more servers and may vary considerably with configuration or performance. It may include one or more central processing units (CPUs) 1414 (for example, one or more processors), a memory 1432, and one or more storage media 1430 (for example, one or more mass storage devices) that store application programs 1442 or data 1444. The memory 1432 and the storage medium 1430 may provide transient or persistent storage. A program stored in the storage medium 1430 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the training device. Further, the central processing unit 1414 may be configured to communicate with the storage medium 1430 and execute, on the training device 1400, the series of instruction operations in the storage medium 1430.
The training device 1400 may further include one or more power supplies 1426, one or more wired or wireless network interfaces 1450, and one or more input/output interfaces 1458; or one or more operating systems 1441, such as Windows Server™, Mac OS X™, Unix™, Linux™, or FreeBSD™.
Specifically, the training device may perform the steps in the embodiment corresponding to FIG. 8 or FIG. 9.
An embodiment of this application further relates to a computer storage medium. The computer-readable storage medium stores a program for signal processing which, when run on a computer, causes the computer to perform the steps performed by the execution device described above, or the steps performed by the training device described above.
An embodiment of this application further relates to a computer program product storing instructions which, when executed by a computer, cause the computer to perform the steps performed by the execution device described above, or the steps performed by the training device described above.
The execution device, training device, or terminal device provided in the embodiments of this application may specifically be a chip. The chip includes a processing unit and a communication unit; the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit can execute computer-executable instructions stored in a storage unit, so that the chip in the execution device performs the data processing method described in the above embodiments, or the chip in the training device performs the data processing method described in the above embodiments. Optionally, the storage unit is a storage unit within the chip, such as a register or a cache; alternatively, it may be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
Specifically, refer to FIG. 15, which is a schematic structural diagram of a chip provided by an embodiment of this application. The chip may be a neural network processing unit (NPU) 1500. The NPU 1500 is mounted on the host CPU as a coprocessor, and the host CPU assigns tasks to it. The core part of the NPU is the arithmetic circuit 1503; the controller 1504 controls the arithmetic circuit 1503 to fetch matrix data from memory and perform multiplication operations.
In some implementations, the arithmetic circuit 1503 includes multiple processing units (process engines, PEs). In some implementations, the arithmetic circuit 1503 is a two-dimensional systolic array. The arithmetic circuit 1503 may alternatively be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 1503 is a general-purpose matrix processor.
For example, assume there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data of matrix B from the weight memory 1502 and buffers it on each PE in the arithmetic circuit. The arithmetic circuit then fetches the data of matrix A from the input memory 1501, performs a matrix operation with matrix B, and stores the partial or final results of the resulting matrix in the accumulator 1508.
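The data flow in this example can be mimicked in plain Python to show how partial results build up in the accumulator. This is a loose functional model under stated assumptions, not a cycle-accurate systolic-array simulation: B is consumed in tiles along the shared dimension K (the hypothetical `tile_k`), each tile's partial products are added into the accumulator, and C is complete after the last tile. The function name `npu_matmul` is hypothetical.

```python
def npu_matmul(a, b, tile_k=2):
    """Compute C = A x B tile by tile along K, accumulating partial sums."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of A and B must match")
    acc = [[0.0] * n for _ in range(m)]  # plays the role of accumulator 1508
    for k0 in range(0, k, tile_k):
        k1 = min(k0 + tile_k, k)
        b_tile = b[k0:k1]  # weight slice, as if buffered on the PEs
        for i in range(m):
            a_slice = a[i][k0:k1]  # input slice streamed from input memory
            for j in range(n):
                # After this line acc holds a partial result until the last tile.
                acc[i][j] += sum(a_slice[t] * b_tile[t][j] for t in range(k1 - k0))
    return acc


# Usage: a 2x3 input matrix times a 3x2 weight matrix.
c = npu_matmul([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]])
```

With `tile_k=2` and K = 3, the accumulator holds a partial result after the first tile and the final C after the second.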
The unified memory 1506 is used to store input data and output data. Weight data is transferred directly to the weight memory 1502 through the direct memory access controller (DMAC) 1505. Input data is also transferred to the unified memory 1506 through the DMAC.
The bus interface unit (BIU) 1510 is used for interaction between the AXI bus and both the DMAC and the instruction fetch buffer (IFB) 1509.
The bus interface unit 1510 is used by the instruction fetch buffer 1509 to obtain instructions from external memory, and by the direct memory access controller 1505 to obtain the original data of input matrix A or weight matrix B from external memory.
The DMAC is mainly used to transfer input data from the external DDR memory to the unified memory 1506, transfer weight data to the weight memory 1502, or transfer input data to the input memory 1501.
The vector calculation unit 1507 includes multiple arithmetic processing units and, when necessary, further processes the output of the arithmetic circuit 1503, for example by vector multiplication, vector addition, exponential operations, logarithmic operations, and magnitude comparison. It is mainly used for the computations of neural network layers other than convolutional and fully connected layers, such as batch normalization, pixel-level summation, and upsampling of feature planes.
In some implementations, the vector calculation unit 1507 can store the processed output vectors in the unified memory 1506. For example, the vector calculation unit 1507 may apply a linear or non-linear function to the output of the arithmetic circuit 1503, for example performing linear interpolation on the feature planes extracted by a convolutional layer, or applying the function to a vector of accumulated values to generate activation values. In some implementations, the vector calculation unit 1507 generates normalized values, pixel-level summed values, or both. In some implementations, the processed output vectors can be used as activation inputs to the arithmetic circuit 1503, for example for use in subsequent layers of the neural network.
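As an illustration of this post-processing, the sketch below applies inference-time batch normalization followed by a ReLU (one possible non-linear function) to a matrix of accumulated values, with columns treated as channels. The per-channel statistics and the choice of ReLU are assumptions for the example, and the function name is hypothetical.

```python
import math


def vector_unit_postprocess(acc, mean, var, gamma, beta, eps=1e-5):
    """Apply batch normalization (inference form) and then ReLU.

    acc: rows = output positions, columns = channels, e.g. values read from
    the accumulator. mean, var, gamma, beta: per-channel lists.
    Returns activations that could be written back or fed to the next layer.
    """
    out = []
    for row in acc:
        new_row = []
        for c, x in enumerate(row):
            # Normalize with the stored statistics, then scale and shift.
            y = gamma[c] * (x - mean[c]) / math.sqrt(var[c] + eps) + beta[c]
            new_row.append(max(0.0, y))  # ReLU as the non-linear function
        out.append(new_row)
    return out
```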
The instruction fetch buffer 1509 connected to the controller 1504 is used to store the instructions used by the controller 1504.
The unified memory 1506, the input memory 1501, the weight memory 1502, and the instruction fetch buffer 1509 are all on-chip memories. The external memory is private to the NPU hardware architecture.
The processor mentioned anywhere above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling execution of the above programs.
It should also be noted that the apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. In addition, in the accompanying drawings of the apparatus embodiments provided in this application, a connection between modules indicates that they have a communication connection, which may be implemented as one or more communication buses or signal lines.
From the description of the above implementations, a person skilled in the art can clearly understand that this application may be implemented by software plus necessary general-purpose hardware, or by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. Generally, any function performed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structures used to implement the same function may vary, for example analog circuits, digital circuits, or dedicated circuits. For this application, however, a software implementation is the better implementation in most cases. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc, and includes several instructions for instructing a computer device (which may be a personal computer, a training device, a network device, or the like) to perform the methods described in the embodiments of this application.
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。In the above-mentioned embodiments, it may be implemented in whole or in part by software, hardware, firmware or any combination thereof. When implemented in software, it can be implemented in whole or in part in the form of a computer program product.
所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、训练设备或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、训练设备或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存储的任何可用介质或者是包含一个或多个可用介质集成的训练设备、数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘(Solid State Disk,SSD))等。The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of the present application are generated. The computer may be a general purpose computer, special purpose computer, computer network, or other programmable device. The computer instructions may be stored in or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be retrieved from a website, computer, training device, or data Transmission from the center to another website site, computer, training facility or data center via wired (eg coaxial cable, fiber optic, digital subscriber line (DSL)) or wireless (eg infrared, wireless, microwave, etc.) means. The computer-readable storage medium may be any available medium that can be stored by a computer, or a data storage device such as a training device, a data center, or the like that includes an integration of one or more available media. The usable media may be magnetic media (eg, floppy disks, hard disks, magnetic tapes), optical media (eg, DVD), or semiconductor media (eg, Solid State Disk (SSD)), and the like.

Claims (25)

  1. An image processing method, wherein the method comprises:
    obtaining depth information of a background region of a current image frame;
    dividing the background region into a plurality of sub-regions according to the depth information, where objects corresponding to different sub-regions are at different distances from a camera, and the camera is used to capture the current image frame; and
    performing different degrees of blurring on the different sub-regions to obtain a processed current image frame.
  2. The method according to claim 1, wherein the depth information comprises a depth value of each pixel in the background region of the current image frame, and the dividing the background region into a plurality of sub-regions according to the depth information specifically comprises:
    determining, according to the depth value of each pixel in the background region of the current image frame, a depth change rate of each pixel in the background region of the current image frame, where the depth change rate of a pixel is determined from the depth value of that pixel and the depth values of the other pixels around it; and
    dividing the background region into a plurality of sub-regions according to the depth change rate of each pixel and a preset change-rate threshold.
  3. The method according to claim 1 or 2, wherein the performing different degrees of blurring on the different sub-regions to obtain a processed current image frame specifically comprises:
    obtaining a motion vector of each of the plurality of sub-regions, where the motion vector of a sub-region indicates the motion of that sub-region relative to a previous image frame; and
    blurring each sub-region according to the motion vector of the sub-region to obtain the processed current image frame.
  4. The method according to claim 3, wherein the motion vector of each sub-region comprises a motion speed of the sub-region and a motion direction of the sub-region, and the obtaining a motion vector of each of the plurality of sub-regions comprises:
    for each of the plurality of sub-regions, determining the motion speed of the sub-region according to a motion speed of at least one target pixel in the sub-region from the previous image frame to the current image frame; and
    determining the motion direction of the sub-region according to a motion direction of the at least one target pixel from the previous image frame to the current image frame.
  5. The method according to claim 4, wherein the blurring each sub-region according to the motion vector of the sub-region specifically comprises:
    for each sub-region, constructing a convolution kernel corresponding to the sub-region according to the motion speed and the motion direction of the sub-region; and
    performing convolution on the sub-region with the convolution kernel corresponding to the sub-region.
  6. The method according to claim 4 or 5, wherein the at least one target pixel is a corner point.
  7. The method according to any one of claims 4 to 6, wherein the motion speed and the motion direction of the at least one target pixel are obtained by an optical flow method.
  8. The method according to any one of claims 3 to 7, wherein the obtaining depth information of a background region of a current image frame specifically comprises:
    obtaining a depth value of each pixel in the current image frame and the background region of the current image frame; and
    determining, from the depth values of all pixels in the current image frame, the depth value of each pixel in the background region of the current image frame.
  9. The method according to claim 8, wherein the obtaining a depth value of each pixel in the current image frame specifically comprises:
    obtaining the depth value of each pixel in the current image frame by using a first neural network.
  10. The method according to claim 8, wherein the camera is a depth camera, and the obtaining a depth value of each pixel in the current image frame specifically comprises:
    obtaining the depth value of each pixel in the current image frame by using the depth camera.
  11. The method according to any one of claims 8 to 10, wherein the obtaining the background region of the current image frame specifically comprises:
    obtaining the background region of the current image frame by using a second neural network.
  12. An image processing apparatus, wherein the apparatus comprises:
    an obtaining module, configured to obtain depth information of a background region of a current image frame;
    a dividing module, configured to divide the background region into a plurality of sub-regions according to the depth information, where objects corresponding to different sub-regions are at different distances from a camera, and the camera is used to capture the current image frame; and
    a processing module, configured to perform different degrees of blurring on the different sub-regions to obtain a processed current image frame.
  13. The apparatus according to claim 12, wherein the depth information comprises a depth value of each pixel in the background region of the current image frame, and the dividing module is specifically configured to:
    determine, according to the depth value of each pixel in the background region of the current image frame, a depth change rate of each pixel in the background region of the current image frame, where the depth change rate of a pixel is determined from the depth value of that pixel and the depth values of the other pixels around it; and
    divide the background region into a plurality of sub-regions according to the depth change rate of each pixel and a preset change-rate threshold.
  14. The apparatus according to claim 12 or 13, wherein the processing module is specifically configured to:
    obtain a motion vector of each of the plurality of sub-regions, where the motion vector of a sub-region indicates the motion of that sub-region relative to a previous image frame; and
    blur each sub-region according to the motion vector of the sub-region to obtain the processed current image frame.
  15. The apparatus according to claim 14, wherein the processing module is specifically configured to:
    for each of the plurality of sub-regions, determine a motion speed of the sub-region according to a motion speed of at least one target pixel in the sub-region from the previous image frame to the current image frame; and
    determine a motion direction of the sub-region according to a motion direction of the at least one target pixel from the previous image frame to the current image frame.
  16. The apparatus according to claim 15, wherein the processing module is specifically configured to:
    for each sub-region, construct a convolution kernel corresponding to the sub-region according to the motion speed and the motion direction of the sub-region; and
    perform convolution on the sub-region with the convolution kernel corresponding to the sub-region.
  17. The apparatus according to claim 15 or 16, wherein the at least one target pixel is a corner point.
  18. The apparatus according to any one of claims 15 to 17, wherein the motion speed and the motion direction of the at least one target pixel are obtained by an optical flow method.
  19. The apparatus according to any one of claims 15 to 18, wherein the obtaining module is specifically configured to:
    obtain a depth value of each pixel in the current image frame and the background region of the current image frame; and
    determine, from the depth values of all pixels in the current image frame, the depth value of each pixel in the background region of the current image frame.
  20. The apparatus according to claim 19, wherein the obtaining module is specifically configured to obtain the depth value of each pixel in the current image frame by using a first neural network.
  21. The apparatus according to claim 19, wherein the camera is a depth camera, and the obtaining module is specifically configured to obtain the depth value of each pixel in the current image frame by using the depth camera.
  22. The apparatus according to any one of claims 19 to 21, wherein the obtaining module is specifically configured to obtain the background region of the current image frame by using a second neural network.
  23. An image processing apparatus, comprising a memory and a processor, wherein the memory stores code, the processor is configured to execute the code, and when the code is executed, the image processing apparatus performs the method according to any one of claims 1 to 11.
  24. A computer storage medium, wherein the computer storage medium stores a computer program, and when the program is executed by a computer, the computer is caused to implement the method according to any one of claims 1 to 11.
  25. A computer program product, wherein the computer program product stores instructions, and when the instructions are executed by a computer, the computer is caused to implement the method according to any one of claims 1 to 11.
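Claim 2 above does not say exactly how the depth change rate is computed or how the threshold turns it into sub-regions. The following is a minimal pure-Python sketch under two stated assumptions: (i) a pixel's change rate is the largest absolute depth difference to any 4-connected background neighbour, and (ii) background pixels whose rate is at or below the threshold are flood-filled into 4-connected sub-regions, while steep "depth edge" pixels and foreground pixels are left unassigned (label -1). The function names `depth_change_rate` and `split_background` are hypothetical.

```python
from collections import deque
from typing import List, Tuple

Grid = List[List[float]]
Mask = List[List[bool]]

# 4-connected neighbourhood offsets (row, col)
NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def depth_change_rate(depth: Grid, background: Mask) -> Grid:
    """Assumed definition: largest |depth difference| to any 4-connected
    background neighbour. Foreground pixels keep a rate of 0 and are ignored."""
    h, w = len(depth), len(depth[0])
    rate = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not background[y][x]:
                continue
            for dy, dx in NEIGHBOURS:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and background[ny][nx]:
                    diff = abs(depth[y][x] - depth[ny][nx])
                    rate[y][x] = max(rate[y][x], diff)
    return rate


def split_background(depth: Grid, background: Mask,
                     threshold: float) -> Tuple[List[List[int]], int]:
    """Label background pixels into sub-regions; returns (labels, count).

    Pixels with change rate <= threshold are flood-filled into 4-connected
    sub-regions. Foreground and depth-edge pixels (rate > threshold) keep
    the label -1 in this simplified sketch.
    """
    h, w = len(depth), len(depth[0])
    rate = depth_change_rate(depth, background)

    def smooth(y: int, x: int) -> bool:
        return background[y][x] and rate[y][x] <= threshold

    labels = [[-1] * w for _ in range(h)]
    count = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] != -1 or not smooth(sy, sx):
                continue
            labels[sy][sx] = count
            queue = deque([(sy, sx)])
            while queue:  # breadth-first flood fill of one sub-region
                y, x = queue.popleft()
                for dy, dx in NEIGHBOURS:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and labels[ny][nx] == -1 and smooth(ny, nx)):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
            count += 1
    return labels, count


# Toy frame: a near wall (depth 1) on the left, a far wall (depth 5) on the right.
depth = [[1.0, 1.0, 1.0, 5.0, 5.0, 5.0] for _ in range(4)]
background = [[True] * 6 for _ in range(4)]
labels, count = split_background(depth, background, threshold=0.5)
print(count)
print(labels[0])
```

Because the threshold is applied to the local rate rather than to absolute depth, a gently sloping surface (for example, a floor receding from the camera) stays a single sub-region even though its far end is much deeper. A fuller implementation might also merge the -1 edge pixels into an adjacent sub-region.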
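Claims 4, 6 and 7 above derive a sub-region's motion speed and direction from target pixels (corner points) tracked by an optical flow method. With OpenCV, for instance, corners could be found with `cv2.goodFeaturesToTrack` and tracked with the pyramidal Lucas-Kanade routine `cv2.calcOpticalFlowPyrLK`; the sketch below starts after that step. It assumes each tracked point arrives as a `(dx, dy)` displacement in pixels between the previous and the current frame, and that the sub-region's vector is the component-wise median of its points. The claims do not specify the aggregation; the median is simply one robust choice, and `region_motion` is a hypothetical name.

```python
import math
from statistics import median
from typing import List, Tuple


def region_motion(displacements: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (speed in px/frame, direction in radians) for one sub-region.

    displacements: (dx, dy) of each tracked target pixel (e.g. corner points)
    from the previous frame to the current frame. The component-wise median
    (an assumption) keeps a few badly tracked points from dominating.
    """
    if not displacements:
        return 0.0, 0.0  # no trackable corners: treat the sub-region as static
    mdx = median(dx for dx, _ in displacements)
    mdy = median(dy for _, dy in displacements)
    speed = math.hypot(mdx, mdy)
    direction = math.atan2(mdy, mdx)  # image coordinates: +y points down
    return speed, direction


# Three corners moving roughly 3 px right and 4 px down, plus one outlier track.
speed, direction = region_motion([(3.0, 4.0), (3.2, 3.9), (2.9, 4.1), (40.0, -2.0)])
print(round(speed, 2), round(math.degrees(direction), 1))
```

A mean would let the single outlier (40, -2) drag the result far off; the median of each component stays near (3.1, 3.95).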
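Claim 5 above constructs a convolution kernel from each sub-region's motion speed and direction and convolves the sub-region with it. A common way to do this is a linear motion-blur kernel: equal weights along a line through the kernel centre. The sketch below assumes the blur length in pixels equals the sub-region's speed (rounded, then bumped to the next odd tap count so the kernel stays point-symmetric) and that only pixels carrying the sub-region's label are rewritten. The names `motion_blur_kernel` and `blur_sub_region` are hypothetical.

```python
import math
from typing import List

Grid = List[List[float]]


def motion_blur_kernel(speed: float, direction: float) -> Grid:
    """Linear motion-blur kernel with equal weights on a line through the centre.

    Assumptions: blur length (taps) = speed in px/frame rounded to an integer,
    made odd so the kernel is point-symmetric; direction is in radians.
    """
    taps = max(1, round(speed))
    if taps % 2 == 0:
        taps += 1
    half = taps // 2
    size = 2 * half + 1
    kernel = [[0.0] * size for _ in range(size)]
    for t in range(-half, half + 1):
        # round(-v) == -round(v) in Python, so the line is symmetric about the centre
        dx = round(t * math.cos(direction))
        dy = round(t * math.sin(direction))
        kernel[half + dy][half + dx] += 1.0
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]


def blur_sub_region(image: Grid, labels: List[List[int]], region: int,
                    kernel: Grid) -> Grid:
    """Convolve only the pixels labelled `region`; other pixels are copied.

    Samples outside the frame are clamped to the nearest edge pixel. Because
    the kernel is point-symmetric, correlation and convolution coincide.
    """
    h, w = len(image), len(image[0])
    half = len(kernel) // 2
    out = [row[:] for row in image]
    for y in range(h):
        for x in range(w):
            if labels[y][x] != region:
                continue
            acc = 0.0
            for ky, krow in enumerate(kernel):
                for kx, weight in enumerate(krow):
                    if weight:
                        sy = min(max(y + ky - half, 0), h - 1)
                        sx = min(max(x + kx - half, 0), w - 1)
                        acc += weight * image[sy][sx]
            out[y][x] = acc
    return out


kernel = motion_blur_kernel(speed=3.0, direction=0.0)  # horizontal, 3 taps
frame = [[0.0, 0.0, 9.0, 0.0, 0.0]]
labels = [[0, 0, 0, 1, 1]]  # sub-region 0 = first three pixels
print(blur_sub_region(frame, labels, region=0, kernel=kernel))
```

Calling `blur_sub_region` once per sub-region, each time with a kernel built from that sub-region's own speed and direction, yields the "different degrees of blurring" of claim 1: nearly static sub-regions get a 1-tap (identity) kernel, and fast-moving ones get longer streaks. Note that this sketch samples neighbouring pixels regardless of their label, so colours can bleed across sub-region borders.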
PCT/CN2022/077788 2021-02-26 2022-02-25 Image processing method and related device WO2022179581A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110218462.0 2021-02-26
CN202110218462.0A CN113066001A (en) 2021-02-26 2021-02-26 Image processing method and related equipment

Publications (1)

Publication Number Publication Date
WO2022179581A1 (en) 2022-09-01

Family

ID=76559272

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/077788 WO2022179581A1 (en) 2021-02-26 2022-02-25 Image processing method and related device

Country Status (2)

Country Link
CN (1) CN113066001A (en)
WO (1) WO2022179581A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113066001A (en) * 2021-02-26 2021-07-02 华为技术有限公司 Image processing method and related equipment
CN114419073B (en) * 2022-03-09 2022-08-12 荣耀终端有限公司 Motion blur generation method and device and terminal equipment
CN116740241A (en) * 2022-09-30 2023-09-12 荣耀终端有限公司 Image processing method and electronic equipment
CN115359097A (en) * 2022-10-20 2022-11-18 湖北芯擎科技有限公司 Dense optical flow generation method and device, electronic equipment and readable storage medium
CN116012675B (en) * 2023-02-14 2023-08-11 荣耀终端有限公司 Model training method, image processing method and electronic equipment
CN117278865A (en) * 2023-11-16 2023-12-22 荣耀终端有限公司 Image processing method and related device

Citations (7)

Publication number Priority date Publication date Assignee Title
CN108053363A (en) * 2017-11-30 2018-05-18 广东欧珀移动通信有限公司 Background blurring processing method, device and equipment
CN108063894A (en) * 2017-12-22 2018-05-22 维沃移动通信有限公司 A kind of method for processing video frequency and mobile terminal
CN108805832A (en) * 2018-05-29 2018-11-13 重庆大学 Improvement Gray Projection digital image stabilization method suitable for tunnel environment characteristic
CN110400331A (en) * 2019-07-11 2019-11-01 Oppo广东移动通信有限公司 Depth map treating method and apparatus
US20210125305A1 (en) * 2018-04-11 2021-04-29 Nippon Telegraph And Telephone Corporation Video generation device, video generation method, program, and data structure
CN112822402A (en) * 2021-01-08 2021-05-18 重庆创通联智物联网有限公司 Image shooting method and device, electronic equipment and readable storage medium
CN113066001A (en) * 2021-02-26 2021-07-02 华为技术有限公司 Image processing method and related equipment

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN101692692A (en) * 2009-11-02 2010-04-07 彭健 Method and system for electronic image stabilization
KR101092250B1 (en) * 2010-02-18 2011-12-12 중앙대학교 산학협력단 Apparatus and method for object segmentation from range image
CN104219532B (en) * 2013-06-05 2017-10-17 华为技术有限公司 The method and apparatus for determining interpolation frame between the method in wisp region, frame of video
CN108076286B (en) * 2017-11-30 2019-12-27 Oppo广东移动通信有限公司 Image blurring method and device, mobile terminal and storage medium
CN108347558A (en) * 2017-12-29 2018-07-31 维沃移动通信有限公司 A kind of method, apparatus and mobile terminal of image optimization

Also Published As

Publication number Publication date
CN113066001A (en) 2021-07-02

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 22758945; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 22758945; Country of ref document: EP; Kind code of ref document: A1)