WO2020042126A1 - Focusing apparatus, method and related device - Google Patents


Info

Publication number
WO2020042126A1
WO2020042126A1 (PCT/CN2018/103370; CN 2018103370 W)
Authority
WO
WIPO (PCT)
Prior art keywords
roi
image
target
effective
information
Prior art date
Application number
PCT/CN2018/103370
Other languages
French (fr)
Chinese (zh)
Inventor
马彦鹏 (Ma Yanpeng)
宋永福 (Song Yongfu)
杨琪 (Yang Qi)
王军 (Wang Jun)
陈聪 (Chen Cong)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to CN201880096896.4A priority Critical patent/CN112602319B/en
Priority to PCT/CN2018/103370 priority patent/WO2020042126A1/en
Publication of WO2020042126A1 publication Critical patent/WO2020042126A1/en

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M1/00Substation equipment, e.g. for use by subscribers
    • H04M1/02Constructional features of telephone sets
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules

Definitions

  • the present application relates to the field of image processing technologies, and in particular, to a focusing device, method, and related equipment.
  • Smartphone camera photography technology is moving toward SLR-level quality.
  • SLR: single-lens reflex camera.
  • Many smartphone cameras have already surpassed traditional compact cameras in imaging capability.
  • High-quality photography relies on high-precision focusing technology.
  • In the shooting of static scenes, existing focusing technology generally places the focus point at the center of the frame. This focusing method meets the needs of most consumers.
  • When the shooting target is not at the center, however, center focusing will often cause the target to be blurred.
  • When shooting dynamic scenes, especially when the target is moving fast, this fixed center focus cannot meet users' needs, so there is an urgent need for high-precision motion-tracking focus technology.
  • Embodiments of the present invention provide a focusing device, method, and related equipment to improve focusing accuracy.
  • an embodiment of the present invention provides a focusing device, including a processor, a neural network processor and an image signal processor coupled to the processor; the image signal processor is configured to generate a first image
  • The neural network processor is configured to obtain a first region of interest (ROI) set in the first image, where the first ROI set includes one or more first ROIs, and each first ROI includes one shooting object. The processor is configured to: obtain a second ROI set in the first image, where the second ROI set includes one or more second ROIs, and each second ROI is a motion region; determine a target ROI in the first image based on the first ROI set and the second ROI set; determine characteristic information of the target ROI; identify, according to the characteristic information of the target ROI, position information and size information of the target ROI in a second image generated by the image signal processor, where the first image is located before the second image in the time domain; and perform focusing according to the position information and size information.
  • One or more candidate shooting objects are obtained by using the NPU to perform AI object detection on image frames generated by the ISP in the focusing device, and one or more candidate motion areas are obtained by using the processor to perform moving-object detection.
  • The detected shooting objects and motion areas are then combined to determine the target ROI to be focused, and subsequent tracking and focusing are performed based on the characteristic information of the target ROI. That is, AI target detection and moving-target detection are used to automatically identify the target ROI in the field of view (FOV); a target-ROI tracking algorithm then accurately calculates the real-time motion trajectory and size of the target ROI, and finally the autofocus (AF) algorithm follows the calculated movement track to perform motion focus tracking.
  • The entire process requires no manual intervention by the user, and the tracking focus is accurate, which greatly improves the shooting experience and results.
  • The processor is specifically configured to: determine an effective first ROI from the one or more first ROIs in the first ROI set, where the effective first ROI is within a first preset region of the first image; determine an effective second ROI from the one or more second ROIs in the second ROI set, where the effective second ROI is within a second preset region of the first image; and, in a case where the intersection-over-union of the effective first ROI and the effective second ROI is greater than or equal to a preset threshold, determine the effective first ROI as the target ROI.
  • The first ROI set and the second ROI set are filtered to improve the recognition accuracy of the target ROI. When the overlapping area between the effective first ROI and the effective second ROI is large, it indicates that the subject detection and the motion-area detection at this time likely both cover the effective first ROI, so the effective first ROI can be used as the target ROI.
  • The processor is further specifically configured to: when the intersection-over-union of the effective first ROI and the effective second ROI is less than the preset threshold, determine, from the effective second ROI and the effective first ROI, the ROI closer to the center point of the first image as the target ROI.
  • When the overlapping area between the effective first ROI and the effective second ROI is small, this may indicate that the detection at this time is incorrect or that the target ROI is drifting, so the ROI closer to the center point may be selected as the target ROI.
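The fusion rule described above can be sketched in a few lines. This is an illustrative sketch only: boxes are assumed to be axis-aligned tuples (x, y, w, h), and the 0.5 threshold and all function names are placeholders, not values fixed by this application.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))  # overlap width
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))  # overlap height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def dist_to_center(box, center):
    """Distance from a box's center to the image center point."""
    x, y, w, h = box
    return ((x + w / 2 - center[0]) ** 2 + (y + h / 2 - center[1]) ** 2) ** 0.5

def pick_target_roi(eff_first, eff_second, image_center, threshold=0.5):
    """IoU >= threshold: trust the detected subject box; otherwise fall
    back to whichever ROI lies closer to the image center."""
    if iou(eff_first, eff_second) >= threshold:
        return eff_first
    return min(eff_first, eff_second,
               key=lambda b: dist_to_center(b, image_center))
```

For example, two boxes that barely overlap fall through to the center-distance tie-break, while a well-overlapping pair returns the subject-detection box.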
  • The effective first ROI has the highest evaluation score among the one or more first ROIs within the first preset region of the first image; and/or the effective second ROI has the highest evaluation score among the one or more second ROIs within the second preset region of the first image; where the evaluation score of each ROI satisfies at least one of the following: it is directly proportional to the area of the ROI, inversely proportional to the distance of the ROI from the center point of the first image, and directly proportional to the priority of the object category to which the ROI belongs.
  • When multiple candidate ROIs still remain after the processor filters by the preset region, they are evaluated by the area of the ROI, the distance from the center point of the first image, and the priority of the category to which the subject belongs, and the ROI most likely to warrant tracking and focusing is selected.
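As a rough illustration of such an evaluation score, the sketch below combines the three factors linearly. The weights, the linear form, and all names are assumptions for illustration, not the application's actual scoring rule.

```python
def evaluation_score(box, image_center, category_priority,
                     w_area=1.0, w_dist=1.0, w_prio=1.0):
    """Toy score: grows with ROI area and category priority, shrinks as
    the ROI's center moves away from the image center."""
    x, y, w, h = box
    area = w * h
    cx, cy = x + w / 2, y + h / 2
    dist = ((cx - image_center[0]) ** 2 + (cy - image_center[1]) ** 2) ** 0.5
    return w_area * area + w_prio * category_priority - w_dist * dist

def best_roi(rois, image_center, priorities):
    """rois: list of (box, category); priorities: category -> priority."""
    return max(rois,
               key=lambda rc: evaluation_score(rc[0], image_center,
                                               priorities[rc[1]]))[0]
```

A large, centered, high-priority ROI (e.g., a person's face) thus wins over a small, off-center, low-priority one.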
  • the processor is further configured to update the feature information of the target ROI based on the feature information corresponding to the position and size of the target ROI in the historical image.
  • The characteristic information of the target ROI is determined according to the characteristic information of the first image corresponding to the target ROI and the characteristic information of at least one third image.
  • The at least one third image is located between the first image and the second image in the time domain.
  • the processor not only needs to determine the initial value of the target ROI, but also needs to update the feature information in real time based on the motion tracking situation of the target ROI to more accurately track the focus.
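The update rule itself is not specified here; a common choice in tracking pipelines is a running exponential average of the template features, sketched below. The linear-interpolation form and the 0.02 learning rate are assumptions, not taken from this application.

```python
def update_features(current, new, rate=0.02):
    """Blend newly extracted ROI features into the running template.
    `rate` controls how quickly old appearance information is forgotten."""
    return [(1 - rate) * c + rate * n for c, n in zip(current, new)]

# Track a feature template across a few frames (toy 3-element features).
template = [1.0, 0.0, 0.5]
for frame_features in ([1.0, 0.2, 0.5], [0.9, 0.3, 0.6]):
    template = update_features(template, frame_features)
```

A small rate keeps the template stable against occlusion; a large rate adapts faster to appearance change.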
  • The processor is further configured to: recalculate the target ROI after a first preset time period; or recalculate the target ROI when the tracking confidence of the target ROI is less than a confidence threshold, where the tracking confidence is used to indicate the tracking accuracy of the target ROI and is directly proportional to that accuracy.
  • The processor not only needs to update the feature information in real time based on the tracking situation of the target ROI so as to track the focus more accurately, but the updated feature information also needs to remain timely.
  • When the confidence level of the target ROI is low, it is necessary to consider reinitializing the related parameters to perform a new round of target-ROI confirmation and tracking.
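The two re-initialization triggers can be captured in one predicate. The 2-second age limit and 0.3 confidence threshold are illustrative values, not taken from this application.

```python
def should_redetect(elapsed_s, confidence, max_age_s=2.0, conf_threshold=0.3):
    """Re-run the full target-ROI determination when the current template
    is stale (older than max_age_s) or tracking confidence has collapsed."""
    return elapsed_s >= max_age_s or confidence < conf_threshold
```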
  • The feature information includes one or more of histogram of oriented gradients (HOG) information, color (Lab) information, and convolutional neural network (CNN) information.
  • the embodiments of the present invention provide multiple extraction methods of feature information to meet the requirements for extracting feature information in different images or different scenes.
  • an embodiment of the present invention provides a focusing method, which may include:
  • determining a first ROI set and a second ROI set, the first ROI set being an ROI set obtained from a first image generated by an image signal processor and including one or more first ROIs, each first ROI including a photographic subject; the second ROI set being an ROI set obtained from the first image and including one or more second ROIs, each second ROI being a motion region; determining a target ROI in the first image based on the first ROI set and the second ROI set; determining characteristic information of the target ROI; identifying, based on the characteristic information of the target ROI, position information and size information of the target ROI in a second image generated by the image signal processor, the first image being located before the second image in the time domain; and focusing based on the position information and size information.
  • The determining a target ROI in the first image based on the first ROI set and the second ROI set includes: determining an effective first ROI from the one or more first ROIs in the first ROI set, the effective first ROI being within a first preset region of the first image; determining an effective second ROI from the one or more second ROIs in the second ROI set, the effective second ROI being within a second preset region of the first image; and, when the intersection-over-union (IoU) of the effective first ROI and the effective second ROI is greater than or equal to a preset threshold, determining the effective first ROI as the target ROI.
  • The method further includes: when the intersection-over-union (IoU) of the effective first ROI and the effective second ROI is less than the preset threshold, determining, from the effective second ROI and the effective first ROI, the ROI closer to the center point of the first image as the target ROI.
  • The effective first ROI has the highest evaluation score among the one or more first ROIs within the first preset region of the first image; and/or the effective second ROI has the highest evaluation score among the one or more second ROIs within the second preset region of the first image; where the evaluation score of each ROI satisfies at least one of the following: it is directly proportional to the area of the ROI, inversely proportional to the distance of the ROI from the center point of the first image, and directly proportional to the priority of the object category to which the ROI belongs.
  • the method further includes: updating the feature information of the target ROI based on the feature information corresponding to the position and size of the target ROI in the historical image.
  • The characteristic information of the target ROI is determined according to the characteristic information of the first image corresponding to the target ROI and the characteristic information of at least one third image.
  • The at least one third image is located between the first image and the second image in the time domain.
  • The method further includes: recalculating the target ROI after a first preset time period; or recalculating the target ROI when the tracking confidence is less than a confidence threshold, where the tracking confidence is used to indicate the tracking accuracy of the target ROI and is directly proportional to that accuracy.
  • The feature information includes one or more of histogram of oriented gradients (HOG) information, color (Lab) information, and convolutional neural network (CNN) information.
  • an embodiment of the present invention provides a focusing device, which may include:
  • a first processing unit, configured to determine a first ROI set and a second ROI set, where the first ROI set is an ROI set obtained from a first image generated by an image signal processor and includes one or more first ROIs, each of which includes a photographic subject, and the second ROI set is an ROI set obtained from the first image and includes one or more second ROIs, each of which is a motion region; a second processing unit, configured to determine a target ROI in the first image based on the first ROI set and the second ROI set; a third processing unit, configured to determine feature information of the target ROI; a recognition unit, configured to identify position information and size information of the target ROI in a second image generated by the image signal processor according to the feature information of the target ROI, where the first image is located before the second image in the time domain; and a focusing unit, configured to focus according to the position information and size information.
  • The second processing unit is specifically configured to: determine an effective first ROI from the one or more first ROIs in the first ROI set, where the effective first ROI is within a first preset region of the first image; determine an effective second ROI from the one or more second ROIs in the second ROI set, where the effective second ROI is within a second preset region of the first image; and, in a case where the intersection-over-union of the effective first ROI and the effective second ROI is greater than or equal to a preset threshold, determine the effective first ROI as the target ROI.
  • The second processing unit is further configured to: when the intersection-over-union of the effective second ROI and the effective first ROI is less than the preset threshold, determine, from the effective second ROI and the effective first ROI, the ROI closer to the center point of the first image as the target ROI.
  • The effective first ROI has the highest evaluation score among the one or more first ROIs within the first preset region of the first image; and/or the effective second ROI has the highest evaluation score among the one or more second ROIs within the second preset region of the first image; where the evaluation score of each ROI satisfies at least one of the following: it is directly proportional to the area of the ROI, inversely proportional to the distance of the ROI from the center point of the first image, and directly proportional to the priority of the object category to which the ROI belongs.
  • the third processing unit is further configured to update the feature information of the target ROI based on the feature information corresponding to the position and size of the target ROI in the historical image.
  • The characteristic information of the target ROI is determined according to the characteristic information of the first image corresponding to the target ROI and the characteristic information of at least one third image.
  • The at least one third image is located between the first image and the second image in the time domain.
  • the apparatus further includes:
  • a first initialization unit, configured to recalculate the target ROI after a first preset time period; and
  • a second initialization unit, configured to recalculate the target ROI when the tracking confidence of the target ROI is less than a confidence threshold, where the tracking confidence is used to indicate the tracking accuracy of the target ROI and is directly proportional to that accuracy.
  • The feature information includes one or more of histogram of oriented gradients (HOG) information, color (Lab) information, and convolutional neural network (CNN) information.
  • an embodiment of the present invention provides an electronic device, including an image sensor and the focusing device according to any one of the foregoing first aspects; wherein
  • the image sensor is used to collect image data
  • the image signal processor is configured to generate the first image based on the image data.
  • the electronic device further includes: a memory for storing program instructions; and the program instructions are executed by the processor.
  • the present application provides a focusing device having the function of implementing any of the above-mentioned focusing methods.
  • This function can be implemented by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • the present application provides a terminal.
  • the terminal includes a processor, and the processor is configured to support the terminal to perform a corresponding function in a focusing method provided in the second aspect.
  • the terminal may further include a memory, which is used for coupling with the processor, and stores the program instructions and data necessary for the terminal.
  • the terminal may further include a communication interface for the terminal to communicate with other devices or a communication network.
  • the present application provides a computer storage medium that stores a computer program that, when executed by a processor, implements the focusing method flow described in any one of the second aspects.
  • an embodiment of the present invention provides a computer program.
  • the computer program includes instructions.
  • When the computer program is executed, the focusing method process according to any one of the second aspects can be performed.
  • the present application provides a chip system that includes a processor, and is configured to implement functions involved in the focusing method process in any one of the foregoing second aspects.
  • the chip system further includes a memory, and the memory is configured to store program instructions and data necessary for the focusing method.
  • the chip system can be composed of chips, and can also include chips and other discrete devices.
  • FIG. 1 is a schematic structural diagram of a focusing device according to an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of a first image according to an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of another focusing device according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a functional principle of a focusing device according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of an SSD network implementation process provided by an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of screening a target ROI provided by an embodiment of the present invention.
  • FIG. 7 is a schematic flowchart of determining a target ROI according to an embodiment of the present invention.
  • FIG. 8 is a schematic flowchart of a target ROI tracking process according to an embodiment of the present invention.
  • FIG. 9 is a schematic diagram of target ROI tracking provided by an embodiment of the present invention.
  • FIG. 10 is a schematic diagram of updating feature information of a target ROI according to an embodiment of the present invention.
  • FIG. 11 is a hardware structural diagram of a neural network processor according to an embodiment of the present invention.
  • FIG. 12 is a schematic flowchart of a focusing method according to an embodiment of the present invention.
  • FIG. 13 is a schematic structural diagram of another focusing device according to an embodiment of the present invention.
  • an embodiment herein means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application.
  • the appearances of this phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are they separate or alternative embodiments that are mutually exclusive with other embodiments. It is explicitly and implicitly understood by those skilled in the art that the embodiments described herein may be combined with other embodiments.
  • a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and / or a computer.
  • an application running on a computing device and a computing device can be components.
  • One or more components can reside within a process and / or thread of execution, and a component can be localized on one computer and / or distributed between 2 or more computers.
  • these components can execute from various computer readable media having various data structures stored thereon.
  • A component may, for example, communicate via local and/or remote processes based on a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, in a distributed system, and/or across a network such as the Internet that interacts with other systems by way of the signal).
  • ROI Region of interest
  • In image processing, the area to be processed is outlined from the image in the form of a box, circle, ellipse, or irregular polygon; this area is called the region of interest.
  • AI Artificial Intelligence
  • AI is a theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can respond in a similar way to human intelligence.
  • Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have functions of perception, reasoning and decision-making.
  • Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, and basic theories of AI.
  • Convolutional Neural Network is a multi-layer neural network. Each layer consists of multiple two-dimensional planes, and each plane consists of multiple independent neurons. The neurons share weights, and the number of parameters in the neural network can be reduced by weight sharing.
  • a processor performing a convolution operation usually converts a convolution of an input signal feature and a weight into a matrix multiplication operation between a signal matrix and a weight matrix.
  • The signal matrix and the weight matrix are divided into blocks to obtain multiple fractal signal matrices and fractal weight matrices, and matrix multiplication and accumulation are then performed on these fractal signal matrices and fractal weight matrices.
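The conversion of a convolution into a matrix multiplication described above is commonly implemented via im2col; the sketch below shows the idea for a single-channel image in the cross-correlation form used by most CNN frameworks. Block partitioning into fractal sub-matrices is omitted for brevity, and all names here are illustrative.

```python
def im2col(image, k):
    """Unroll every k x k patch of a 2-D image (nested lists) into a row."""
    h, w = len(image), len(image[0])
    rows = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            rows.append([image[i + di][j + dj]
                         for di in range(k) for dj in range(k)])
    return rows

def conv2d_as_matmul(image, kernel):
    """Valid 2-D convolution expressed as (patch matrix) x (flat kernel)."""
    k = len(kernel)
    flat = [kernel[di][dj] for di in range(k) for dj in range(k)]
    return [sum(p * w for p, w in zip(row, flat)) for row in im2col(image, k)]
```

Each output element is one row of the patch matrix dotted with the flattened weights, which is exactly the matrix-multiplication form a processor can accelerate.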
  • Image Signal Processor (ISP): a unit mainly used to process the output signal of the front-end image sensor so as to match image sensors from different manufacturers.
  • In effect, an image processor for cameras.
  • Its pipelined image-processing engine can process image signals at high speed, and it is also equipped with dedicated circuits for the evaluation of auto exposure / auto focus / auto white balance.
  • Intersection-over-union (IoU), a concept used in object detection, is the overlap rate between a generated candidate box and the ground-truth box, that is, the ratio of their intersection to their union. Ideally the boxes overlap completely, i.e., the ratio is 1.
  • a fixed center position is set in advance as the focus area.
  • the AF algorithm needs to reconfigure the focus point, which lengthens the focusing time and the user's photo taking time.
  • the focus cannot follow the target movement in real time.
  • Focus-tracking method based on feature-point detection: this method detects the feature points in the picture in real time and then sets the focus on those feature points.
  • Target-tracking method based on motion detection: through content changes between consecutive frames, moving objects in the shooting scene are quickly identified and the moving area is output to the AF algorithm in real time; the focus point is then adjusted to the moving area in real time to achieve moving-target focus tracking.
  • an artificial intelligence servo autofocus function is implemented in the prior art.
  • In a high-speed continuous focusing mode for a moving subject, half-pressing the shutter captures the subject in the viewfinder and detects its movement track.
  • the built-in autofocus sensor in the SLR can identify whether the object is stationary or moving, and identify its moving direction, so that it can achieve accurate focus when shooting sports, children or animals.
  • the problems and application scenarios that the embodiments of the present invention mainly solve include the following:
  • AI object detection algorithm is used to detect the main object in the picture, and then the main object area is input to the target tracking algorithm to monitor the status of the target in real time
  • the AF algorithm directly sets the focus on the main target object to stabilize the focus.
  • the tracking algorithm will follow the target's movement in real time, and the AF algorithm will do the tracking focus in real time.
  • The AI object detection algorithm, combined with the moving-target detection algorithm, comprehensively outputs the main object in the current picture, and the target tracking algorithm then monitors the position, area, and size of the output moving target in real time, to solve problems such as misidentification of the moving target, lack of smoothness, unstable target tracking, and discontinuous focus.
  • FIG. 1 is a schematic structural diagram of a focusing device according to an embodiment of the present invention.
  • The focusing device 10 may include a processor 101, and a neural network processor 102 and an image signal processor 103 coupled to the processor 101; wherein,
  • the Image Signal Processor (ISP) 103 is used to generate the first image; it can match image sensors from different manufacturers, processing the image data output by the front-end image sensor and generating corresponding image signals based on that data.
  • A neural network processor (Neural Processing Unit, NPU) 102 is configured to obtain a first region of interest (ROI) set in the first image, where the first ROI set includes one or more first ROIs, and each first ROI includes a subject.
  • the subject can be any object, such as a person, an animal, a building, a plant, etc.
  • For example, if the neural network processor 102 recognizes that there are a flower, a person, and a dog in the first image, the first ROI set includes three first ROIs, corresponding to the plant, the person, and the animal.
  • FIG. 2 is a schematic diagram of a first image provided by an embodiment of the present invention.
  • The NPU recognizes a human face (area 1), a dog face (area 3), a flower (area 4), and a table (area 5) as first ROIs.
  • A processor (Central Processing Unit, CPU) 101 is configured to: obtain a second ROI set in the first image; determine a target ROI in the first image based on the first ROI set and the second ROI set; determine characteristic information of the target ROI; identify position information and size information of the target ROI in the second image generated by the image signal processor 103 according to the characteristic information of the target ROI; and focus according to the position information and size information.
  • The second ROI set includes one or more second ROIs, and each second ROI is a motion region. For example, if a puppy is found to be moving, based on one or more frames before the first image together with the first image, then the area where the puppy is located in the first image is determined as a second ROI.
  • the first image is located before the second image in the time domain, that is, the feature information of the target ROI determined by integrating AI recognition and motion detection in the previously collected and generated image is used as a basis for subsequent tracking of the target ROI.
  • It can be understood that if no object movement is detected in the first image, the second ROI set may also be an empty set, which is equivalent to a static shooting scene.
  • In FIG. 2, the CPU detects through motion detection that the person is moving, and thus recognizes region 2, where the person is located, as a motion region, that is, a second ROI.
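Moving-object detection itself is not detailed in this passage; a minimal frame-differencing sketch, with grayscale frames as nested lists and an assumed change threshold, might look like the following.

```python
def motion_roi(prev, curr, threshold=25):
    """Return the bounding box (x, y, w, h) of pixels that changed by more
    than `threshold` between two frames, or None if nothing moved."""
    changed = [(x, y)
               for y, (row_p, row_c) in enumerate(zip(prev, curr))
               for x, (p, c) in enumerate(zip(row_p, row_c))
               if abs(c - p) > threshold]
    if not changed:
        return None  # static scene: the second ROI set is empty
    xs = [x for x, _ in changed]
    ys = [y for _, y in changed]
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)
```

Real implementations typically add background modeling and morphological filtering; this sketch only shows the core differencing idea behind a second ROI.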
  • The processor 101 is further configured to, for example, run general operating system software and, under that software, control the neural network processor 102 and the image signal processor 103 to perform focusing.
  • the first image generated by the image signal processor 103 is sent to the neural network processor 102 to obtain a first ROI set, and the first ROI set obtained by the neural network processor 102 is received.
  • the processor 101 is further configured to complete calculation processing and control related to the focusing process.
  • The aforementioned neural network processor may also be integrated in the processor 101 as a part of it, or may be another functional chip coupled to the processor 101 and capable of obtaining the first ROI set; similarly, the functions performed by the processor 101 may be distributed across multiple different functional chips, which is not specifically limited in the embodiments of the present invention.
  • FIG. 3 is a schematic structural diagram of another focusing device according to an embodiment of the present invention
  • FIG. 4 is a functional principle schematic diagram of a focusing device according to an embodiment of the present invention.
  • The focusing device 10 may include a processor 101, a neural network processor 102 and an image signal processor 103 coupled to the processor 101, and a lens 104, an image sensor 105, and a focus motor, such as a Voice Coil Motor (VCM), 106 coupled to the image signal processor 103.
  • the lens 104 is configured to focus the optical information of the real world on the image sensor through the principle of optical imaging.
  • the lens 104 may be a rear camera, a front camera, a rotary camera, etc. of a terminal (such as a smart phone).
  • The image sensor 105 is configured to output image data based on the optical information collected by the lens 104, so as to provide the image data to the image signal processor 103 to generate a corresponding image signal.
  • the focus motor 106 may include a mechanical structure for performing static or dynamic focusing based on the position information and size information of the target ROI determined by the processor 101. For example, if the processor 101 recognizes that the target ROI is in a stationary state, the processor 101 controls the focus motor 106 to perform static focusing; if the processor 101 recognizes that the target ROI is in a moving state, the processor 101 controls the focus motor 106 to perform dynamic focusing .
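The static/dynamic decision above can be sketched as a displacement test on the ROI center between frames; the pixel tolerance `eps` and the function name are assumed values for illustration.

```python
def focus_mode(prev_box, curr_box, eps=2.0):
    """Classify the target ROI as stationary or moving by the displacement
    of its center between two frames; boxes are (x, y, w, h)."""
    px, py = prev_box[0] + prev_box[2] / 2, prev_box[1] + prev_box[3] / 2
    cx, cy = curr_box[0] + curr_box[2] / 2, curr_box[1] + curr_box[3] / 2
    moved = ((cx - px) ** 2 + (cy - py) ** 2) ** 0.5 > eps
    return "dynamic" if moved else "static"
```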
• the focusing device in FIG. 1 or FIG. 3 may be located in a terminal (such as a smart phone, a tablet, or a smart wearable device), a smart camera device (such as a smart camera or a smart tracking device), a smart monitoring device, an aerial drone, and so on; this application will not list them one by one.
• in the focusing device of FIG. 1 or FIG. 3 described above, one or more candidate shooting objects are obtained by the NPU through AI object detection on the image frames generated by the ISP, and one or more candidate motion regions are obtained by the processor through moving object detection. The detected shooting objects and motion regions are then combined to determine the target ROI to be finally focused, and subsequent tracking and focusing are performed based on the characteristic information of the target ROI. That is, AI object detection and moving object detection are used to automatically and comprehensively identify the target ROI in the field of view (FOV), and the target ROI tracking algorithm is then used to accurately calculate the real-time motion trajectory and size of the target ROI. The auto-focus (AF) algorithm performs motion follow-focus based on the real-time motion trajectory of the target ROI. The entire process requires no manual intervention by the user and the tracking focus is accurate, which greatly improves the shooting experience and effect.
• the neural network processor 102 obtains the first ROI set in the first image; a specific implementation may be as follows:
• the neural network processor 102 uses an AI object detection algorithm to obtain the target objects in the picture (the first image), that is, the first ROIs. The algorithm uses a general structure (such as the first few layers of resnet18, resnet26, etc.) as the base network, and then adds other layers on this basis as the detection structure.
• the classification base model extracts the low-level features of the image and ensures that they are discriminative; adding a classifier over shallow features helps improve classification performance.
  • the detection part makes it possible to output a series of discretized bounding boxes on feature maps at different levels and the probability that each box contains an object instance. Finally, a non-maximum suppression (NMS) algorithm is performed to obtain the final object prediction result.
  • the detection model algorithm may adopt a single shot detection (SSD) framework. Please refer to FIG. 5.
• the main body adopts a one-stage detection structure, which avoids feeding a large number of candidate target positions into a second stage as in faster-rcnn, thereby greatly improving the detection speed.
• features at each layer have different receptive fields, so the detector can adapt to targets of different sizes and achieve better performance.
• the default boxes determine the initial position of the final prediction box. Through different sizes and aspect ratios, they can adapt to main objects of different sizes and shapes, and provide a good initial value that makes the prediction more accurate.
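For context on the NMS step used to obtain the final object prediction results, a minimal greedy non-maximum suppression sketch (the standard formulation, not necessarily the patent's exact implementation; the 0.5 threshold is an assumed value):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression over [x1, y1, x2, y2] boxes.

    Returns the indices of the boxes that are kept."""
    order = np.argsort(scores)[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # IoU of the top box against the remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou < iou_thresh]   # drop heavily overlapping boxes
    return keep

boxes = np.array([[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 140]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]
```

The two overlapping boxes collapse to the higher-scoring one, while the distant box survives.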
• since the AI object detection algorithm runs on the NPU, considering power-consumption and performance constraints, it may output detection results once every 10 frames.
• the types of objects that can be detected include: flowers, people, cats, dogs, birds, bicycles, buses, motorcycles, trucks, cars, trains, boats, horses, kites, balloons, vases, bowls, plates, cups, and classic handbags.
• the priority of the object category to which the shooting object belongs can be divided into four levels: the first priority is people, the second priority is flowers, the third priority is cats and dogs, and the fourth priority is the rest.
  • the specific implementation manner of the processor 101 in the focusing device 10 acquiring the second ROI set in the first image may be as follows:
  • the processor 101 may obtain a second ROI set by using a moving target detection algorithm.
  • the moving object detection algorithm is performed once every two frames, that is, the moving area in the current image is output every two frames.
  • the speed of the movement and the direction of the movement can be further output.
• in the figure, region 2 is the second ROI, i.e. the motion region output by the moving object detection algorithm, and region 1 is the final target ROI.
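One simple way to obtain such motion regions is frame differencing between consecutive frames; the sketch below is illustrative only and is not necessarily the patent's moving object detection algorithm (the threshold of 25 is an assumed value):

```python
import numpy as np

def motion_bbox(prev_gray, cur_gray, thresh=25):
    """Return the bounding box (x, y, w, h) of changed pixels between two
    grayscale frames, or None if nothing moved. A crude stand-in for the
    moving-object detection that produces the second ROI set."""
    diff = np.abs(cur_gray.astype(np.int16) - prev_gray.astype(np.int16))
    mask = diff > thresh
    if not mask.any():
        return None
    ys, xs = np.nonzero(mask)
    x, y = xs.min(), ys.min()
    return (int(x), int(y), int(xs.max() - x + 1), int(ys.max() - y + 1))

prev = np.zeros((120, 160), np.uint8)
cur = prev.copy()
cur[40:60, 70:100] = 200          # a bright object appears in the new frame
print(motion_bbox(prev, cur))     # (70, 40, 30, 20)
```

A real implementation would add denoising and connected-component analysis so that several disjoint motion regions can be reported separately.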
• the specific implementation in which the processor 101 in the focusing device 10 determines the target ROI in the first image based on the first ROI set and the second ROI set may be as follows: the processor 101 determines an effective first ROI from the one or more first ROIs in the first ROI set, and determines an effective second ROI from the one or more second ROIs in the second ROI set; and when the intersection-over-union ratio (IoU) between the effective first ROI and the effective second ROI is greater than or equal to a preset threshold, determines the effective first ROI as the target ROI; wherein the effective first ROI is within a first preset region of the first image, and the effective second ROI is within a second preset region of the first image.
• when the IoU between the effective first ROI and the effective second ROI is less than the preset threshold, the processor 101 determines, of the effective second ROI and the effective first ROI, the ROI closer to the center point of the first image as the target ROI. That is, when the overlapping area between the effective first ROI and the effective second ROI is large, it indicates that the subject detection and the motion-region detection are consistent at this time, so the effective first ROI can be taken as the target ROI; when the overlapping area is small, the detection may be wrong or the target ROI may be drifting, so the ROI closer to the center point is selected as the target ROI.
  • the target ROI may also be selected according to other calculation rules, such as combining a valid first ROI and a valid second ROI to obtain a new ROI, which is not enumerated in this application.
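The IoU comparison and the center-distance fallback described above can be sketched as follows (boxes given as (x, y, w, h); the 0.5 threshold is an assumed value, since the patent does not fix it):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = min(ax2, bx2) - max(a[0], b[0])
    ih = min(ay2, by2) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union

def pick_target_roi(first_roi, second_roi, center, iou_thresh=0.5):
    """If the two effective ROIs overlap enough, trust the object detection;
    otherwise fall back to whichever ROI is closer to the image center."""
    if iou(first_roi, second_roi) >= iou_thresh:
        return first_roi
    def center_dist2(r):
        cx, cy = r[0] + r[2] / 2, r[1] + r[3] / 2
        return (cx - center[0]) ** 2 + (cy - center[1]) ** 2
    return min((first_roi, second_roi), key=center_dist2)

print(iou((0, 0, 10, 10), (5, 0, 10, 10)))  # 50/150 ≈ 0.333
```

For example, with almost no overlap, `pick_target_roi((0, 0, 4, 4), (8, 8, 4, 4), (10, 10))` selects the second ROI, whose center is closer to (10, 10).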
  • FIG. 6 is a schematic diagram of screening a target ROI provided by an embodiment of the present invention.
  • a first image (field of view of a camera) displayed on a mobile phone screen in FIG. 6 has a width of width and a height of height.
  • the second ROI is valid within the second preset region.
• the width w2 of the invalid region is w2 = min(width, height) × 0.1; at this time, ROI1 and ROI2 are valid, and ROI0 is invalid.
• the effective first ROI has the highest evaluation score among the one or more first ROIs in the first preset region of the first image; and/or the effective second ROI has the highest evaluation score among the one or more second ROIs in the second preset region of the first image; wherein the evaluation score of each ROI satisfies at least one of the following: it is proportional to the area of the ROI, inversely proportional to the distance of the ROI from the center point of the first image, and proportional to the priority of the object category to which the ROI belongs. That is, when multiple ROIs may still exist after filtering through the corresponding preset regions, the area of each ROI, its distance from the center point of the first image, and the priority of the category to which its subject belongs can be comprehensively considered to determine the effective ROI.
  • the priority of different object categories can also be set according to the current shooting mode. For example, in portrait mode, people have the highest priority, and in landscape mode, plants or buildings have the highest priority.
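An illustrative evaluation score combining the three factors above (area, distance from center, category priority). The normalization, the weights, and the category list are assumptions for illustration; the patent does not specify a formula:

```python
import math

# Assumed category priorities per the text: people > flowers > cats/dogs > rest.
PRIORITY = {"person": 4, "flower": 3, "cat": 2, "dog": 2}

def evaluation_score(roi, category, image_wh, w_area=1.0, w_dist=1.0, w_prio=1.0):
    """Toy evaluation score: grows with ROI area and category priority,
    shrinks with distance from the image center. Weights are illustrative."""
    x, y, w, h = roi
    iw, ih = image_wh
    area = (w * h) / (iw * ih)                      # normalized area
    cx, cy = x + w / 2, y + h / 2
    dist = math.hypot(cx - iw / 2, cy - ih / 2)
    dist /= math.hypot(iw / 2, ih / 2)              # normalized distance, 0..1
    prio = PRIORITY.get(category, 1) / 4
    return w_area * area - w_dist * dist + w_prio * prio

# A centered person beats an off-center cat of the same size.
rois = [((860, 440, 200, 200), "person"), ((40, 40, 200, 200), "cat")]
best = max(rois, key=lambda rc: evaluation_score(rc[0], rc[1], (1920, 1080)))
print(best[1])  # person
```

In a shooting-mode-aware variant, the `PRIORITY` table would simply be swapped per mode (e.g. plants ranked highest in landscape mode).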
  • FIG. 7 is a schematic flowchart of determining a target ROI according to an embodiment of the present invention.
• AI object detection is performed by the NPU to obtain a first ROI set.
  • moving object detection is performed by the CPU to obtain a second ROI set.
• the processor 101 detects whether the first ROI in the first ROI set and the second ROI in the second ROI set are valid.
  • the focusing device 10 in the embodiment of the present invention may also combine other preset strategies to provide different methods for determining the target ROI in different scenarios.
  • the preset strategy may include: 1) user-specified priority; 2) AI object detection priority; 3) motion detection priority; 4) joint selection of object detection and motion detection.
  • the feature information of the target ROI determined by the processor 101 in the above-mentioned focusing device 10 includes one or more of directional gradient hog information, color lab information, and convolutional neural network CNN information.
• it may include only the color lab information extracted by the processor 101, only the directional gradient hog information extracted by the processor 101, or only the CNN information extracted by the neural network processor 102, or any two or all three of the above types of information.
• the above-mentioned directional gradient hog information and color lab information can be extracted by the processor 101, and the CNN information can be extracted by the neural network processor 102 and then sent to the processor 101.
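As a rough illustration of what the directional gradient hog information captures, a toy orientation histogram over an ROI patch (an illustrative sketch only, not an actual hog implementation with cells, blocks, and block normalization):

```python
import numpy as np

def hog_like_descriptor(patch, bins=9):
    """A toy orientation histogram of gradients over an ROI patch --
    a rough stand-in for the 'directional gradient hog information'."""
    gy, gx = np.gradient(patch.astype(float))       # image gradients
    mag = np.hypot(gx, gy)                          # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), np.pi)         # unsigned orientation, 0..pi
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    s = hist.sum()
    return hist / s if s > 0 else hist              # L1-normalized

patch = np.tile(np.arange(16, dtype=float), (16, 1))  # pure horizontal ramp
d = hog_like_descriptor(patch)
print(d.argmax())  # 0 (all gradient energy falls in the bin containing 0 rad)
```

A real hog feature computes such histograms per cell and normalizes over blocks, but the orientation-binning idea is the same.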
  • the processor 101 further updates the feature information of the target ROI based on the feature information corresponding to the position and size of the target ROI in the historical image.
• the feature information of the target ROI is determined according to the feature information of the first image corresponding to the target ROI and the feature information of at least one third image, where the at least one third image is located between the first image and the second image in the time domain. That is, the processor 101 in the focusing device 10 updates the feature information of the target ROI during the process of identifying the position information and the size information of the target ROI in the second image generated by the image signal processor according to the characteristic information of the target ROI.
• the processor 101 recalculates the target ROI after the first preset time period; or recalculates the target ROI when the tracking confidence of the target ROI is less than the confidence threshold, where the tracking confidence is used to indicate the tracking accuracy of the target ROI and is directly proportional to the tracking accuracy.
• the processor 101 not only needs to update the feature information in real time based on the tracking condition of the target ROI to track focus more accurately, but the updated feature information is also time-limited: after a long period of time, or when the confidence of the currently tracked target ROI is low, initializing the related parameters should be considered so as to perform a new round of target ROI confirmation and tracking.
  • FIG. 8 is a schematic diagram of a target ROI tracking process according to an embodiment of the present invention.
• the processor 101 selects a certain feature or a combination of multiple features according to a preset rule to determine the feature information, and determines whether to initialize the tracker after a rule judgment. If the tracker does not need to be initialized, it directly enters the tracking calculation, outputs the position and size information of the target ROI together with a response map of the target's possible position, and finally updates the feature information based on the new position and size of the target ROI. This can mainly include the following steps:
  • This part can choose different feature combinations according to different needs, such as using the hog feature alone, or a combination of hog + lab + cnn;
• the tracking calculation algorithm uses correlation filtering algorithms, such as KCF (Kernelized Correlation Filter), ECO (Efficient Convolution Operators), etc.
• the response map output for each image frame is a w × h floating-point two-dimensional array F[w][h], also written F_{w,h}, which has been normalized to the range 0 to 1.0.
• the response map reflects the possible distribution of the target ROI in the picture, and the position of its largest value is where the target ROI is located. The confidence of target ROI tracking can be reflected through the response map.
• the average peak-to-correlation energy index (APCE) is defined as APCE = |F_max − F_min|² / mean_{w,h}((F_{w,h} − F_min)²), where:
• F_max is max(F[w][h]), the maximum value of F[w][h];
• F_min is min(F[w][h]), the minimum value of F[w][h];
• Σ_{w,h}(F_{w,h} − F_min)² means traversing each value of F_{w,h}, subtracting the minimum value, squaring, and summing; dividing this sum by w × h gives the mean in the denominator.
• This indicator can be used as follows: when its calculated value drops sharply compared with the historical average, the position and size of the target ROI in the current frame are not reliable, for example, the target ROI is blocked or lost.
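A direct implementation of the APCE index defined above (the response maps below are synthetic examples):

```python
import numpy as np

def apce(response):
    """Average peak-to-correlation energy of a tracker response map."""
    f_max = response.max()
    f_min = response.min()
    energy = np.mean((response - f_min) ** 2)
    return (f_max - f_min) ** 2 / energy

rng = np.random.default_rng(0)
sharp = np.full((17, 17), 0.05)
sharp[8, 8] = 1.0                        # clean single peak: reliable tracking
noisy = rng.uniform(0.0, 1.0, (17, 17))  # multi-modal map: occlusion / loss
print(apce(sharp) > apce(noisy))  # True
```

A sharp, unimodal peak yields a large APCE; when the target is occluded the response flattens and APCE collapses, which is exactly the drop the text uses to veto unreliable frames.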
  • FIG. 9 is a schematic diagram of target ROI tracking provided by an embodiment of the present invention.
• the initial position of the target ROI is position 1, and the target moves through positions 1 to 6 in the picture; the target tracking algorithm module outputs the position and size of the target in each frame in real time. At this time, the tracking confidence is high, and the feature information of the target ROI needs to be updated in real time.
• the processor 101 uses the target ROI determined from the first image as the initial ROI input. After feature extraction, feature selection, and tracking calculation, the position and size of the target ROI in each subsequent image frame (including the first image) are calculated in real time. The basis for judging whether the feature information is updated is as follows:
• if the feature information update condition is satisfied, the feature information is updated;
• otherwise, the target ROI feature information is not updated; that is, the feature information of the current image frame will not participate in the update of the target ROI feature information, so as to stabilize the tracking system and avoid target ROI tracking drift;
• the processor 101 may be triggered to re-determine the target ROI (including the NPU reacquiring the first ROI set and the CPU reacquiring the second ROI set), that is, the initialization update of the tracking is completed again.
  • the position information and size information of the target ROI are output in real time.
• the position is constrained: the green frame is the effective range when the target is stationary, in which case the output is provided to the AF algorithm for stable focusing; the other frame is the effective range when the target is moving, in which case the real-time output is provided to the AF algorithm for motion follow-focus.
  • FIG. 10 is a schematic diagram of updating feature information of a target ROI according to an embodiment of the present invention.
  • the image signal processor 103 generates n frames of images in the first preset time period.
• the feature information of the target ROI is extracted from the first frame image, that is, feature information A in FIG. 10, which is also the initial identifying feature information of the target ROI. When the image signal processor generates the second frame image, feature information B of the second frame image is first obtained; feature information B may be obtained by extracting, based on the position and size of the target ROI in the first frame image, the feature information of the corresponding region in the second frame image.
• the feature information of the target ROI in subsequent image frames is extracted in the same way, and the principle will not be repeated here.
• the processor 101 compares feature information B with feature information A to determine the position and size, in the second frame image, of the target ROI determined in the first frame image; at the same time, it determines, according to feature information A and feature information B, whether the second frame satisfies the feature information update condition.
• the most recently updated feature information is used as the comparison model; when it is determined that the initialization restart conditions are met but the specified time point has not been reached (that is, the time point at which the processor 101 outputs a new target ROI), the most recently updated feature information also continues to be used as the comparison model. However, if it is determined that the initialization restart conditions are met and the specified time point has been reached, the target ROI re-output by the processor 101 can be used, and a new round of ROI tracking calculation is performed.
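One common way to maintain such a comparison model, in the spirit of KCF/ECO-style trackers, is an exponential moving average gated by the update condition. The learning rate below is an assumed illustrative value, not one specified by the text:

```python
import numpy as np

def update_model(model, new_feat, update_ok, lr=0.02):
    """Blend the newly extracted ROI features into the comparison model only
    when the frame passed the update condition; otherwise keep the old model.
    The learning rate 0.02 is an illustrative value."""
    if not update_ok:
        return model                      # e.g. frame 4 above: occlusion, skip
    return (1.0 - lr) * model + lr * new_feat

model = np.ones(4)
feat = np.full(4, 2.0)
model = update_model(model, feat, update_ok=True)
print(model)  # [1.02 1.02 1.02 1.02]
model = update_model(model, feat, update_ok=False)
print(model)  # unchanged
```

Skipping the blend on rejected frames is what keeps an occluded frame's features from contaminating the model and causing tracking drift.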
  • the application does not specifically limit the conditions for updating the characteristic information and the update formula.
• the feature information D of the target ROI is determined in the fourth frame image, and after the feature information A″ obtained from the update of the third frame is correlated with feature information D, it is determined that the current fourth frame image does not meet the feature information update condition (for example, the target ROI is blocked or drifts greatly in the fourth frame). Therefore, feature information D of the fourth frame does not participate in the subsequent update of the feature information, and the feature information updated in the third frame continues to be used; that is, after feature information E is determined in the fifth frame, it is still correlated with the feature information updated in the third frame. Further, it is assumed that feature information E is updated with feature information A″ updated in the third frame.
• the feature information is recalculated at the 11th frame image.
  • tracking and focusing can be performed based on the embodiment of the invention described above, and feature information is updated, which is not exhaustive here.
• after the processor 101 enters the target ROI tracking and focusing process, the current state of the target ROI is determined according to the real-time target ROI information.
  • the target ROI is tracked and focused.
• the combination of target detection algorithm + motion detection algorithm + tracking algorithm can solve the two major problems of missing ROI information when the target moves and ROI loss after the target becomes stationary.
• the AF algorithm can directly follow the ROI window for motion follow-focus, and can perform stable focusing when the moving target is stationary, which solves the focus selection problem when the target is not in the center.
  • FIG. 11 is a hardware structural diagram of a neural network processor according to an embodiment of the present invention.
• the neural network processor NPU 102 is mounted as a coprocessor on the CPU (the host CPU), and the host CPU assigns tasks.
  • the core part of the NPU is an arithmetic circuit 1203.
  • the controller 1204 controls the arithmetic circuit 1203 to extract matrix data in the memory and perform multiplication operations.
• the arithmetic circuit 1203 includes multiple processing engines (Process Engines, PEs). In some implementations, the arithmetic circuit 1203 is a two-dimensional systolic array; it may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 1203 is a general-purpose matrix processor.
• the arithmetic circuit fetches the data corresponding to matrix B from the weight memory 1202 and buffers it on each PE in the arithmetic circuit. The arithmetic circuit fetches matrix A data from the input memory 1201, performs matrix operations with matrix B, and stores the partial or final results of the matrix in the accumulator 1208.
  • the unified memory 1206 is used to store input data and output data.
• the weight data is transferred to the weight memory 1202 through the storage unit access controller (Direct Memory Access Controller, DMAC) 12012.
  • the input data is also transferred to the unified memory 1206 through the DMAC.
  • BIU stands for Bus Interface Unit, that is, the bus interface unit 1210, which is used for the interaction between the AXI bus and the DMAC and the instruction fetch memory 1209.
• the bus interface unit 1210 (Bus Interface Unit, BIU) is used for the instruction fetch memory 1209 to obtain instructions from the external memory, and is also used for the storage unit access controller 12012 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
• the DMAC is mainly used to transfer the input data in the external memory DDR to the unified memory 1206, transfer the weight data to the weight memory 1202, or transfer the input data to the input memory 1201.
• the vector calculation unit 1207 includes a plurality of arithmetic processing units and, if necessary, further processes the output of the arithmetic circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, and size comparison. It is mainly used for non-convolutional/FC layer network calculations in a neural network, such as pooling, batch normalization, local normalization, and so on.
• the vector calculation unit 1207 can store the processed output vector into the unified buffer 1206.
  • the vector calculation unit 1207 may apply a non-linear function to the output of the arithmetic circuit 1203, such as a vector of accumulated values, to generate an activation value.
  • the vector calculation unit 1207 generates a normalized value, a merged value, or both.
  • a vector of the processed output can be used as an activation input to the arithmetic circuit 1203, for example for use in subsequent layers in a neural network.
  • An instruction fetch memory 1209 connected to the controller 1204 is used to store instructions used by the controller 1204;
• the unified memory 1206, the input memory 1201, the weight memory 1202, and the instruction fetch memory 1209 are all on-chip memories; the external memory is a memory outside the NPU hardware architecture.
  • FIG. 12 is a schematic flowchart of a focusing method according to an embodiment of the present invention.
  • the focusing method is applicable to any one of the focusing devices in FIG. 1 and FIG. 3 and a device including the focusing device.
  • the method may include the following steps S201-S205.
• Step S201: determine a first ROI set and a second ROI set, where the first ROI set is a ROI set obtained from a first image generated by an image signal processor, the first ROI set includes one or more first ROIs, and each first ROI includes a photographic subject; the second ROI set is a ROI set obtained from the first image, the second ROI set includes one or more second ROIs, and each second ROI is a motion area;
• Step S202: determine a target ROI in the first image based on the first ROI set and the second ROI set;
  • the determining a target ROI in the first image based on the first ROI set and the second ROI set includes:
• when the IoU between the effective first ROI and the effective second ROI is greater than or equal to a preset threshold, the effective first ROI is determined as the target ROI.
  • the method further includes:
• of the effective second ROI and the effective first ROI, the ROI closer to the center point of the first image is determined as the target ROI.
• the effective first ROI has the highest evaluation score among one or more first ROIs within a preset area of the first image; and/or the effective second ROI has the highest evaluation score among one or more second ROIs within the preset area of the first image; wherein the evaluation score of each ROI satisfies at least one of the following: it is proportional to the area of the ROI, inversely proportional to the distance of the ROI from the center point of the first image, and proportional to the priority of the object category to which the ROI belongs.
• Step S203: determine the characteristic information of the target ROI.
  • the feature information includes one or more of directional gradient hog information, color lab information, and convolutional neural network CNN information.
  • the feature information of the target ROI is also updated based on the feature information corresponding to the position and size of the target ROI in the historical image.
  • the characteristic information of the target ROI is determined according to the characteristic information of the first image corresponding to the target ROI and the characteristic information of at least one third image.
• the at least one third image is located between the first image and the second image in the time domain.
• Step S204: identify the position information and size information of the target ROI in the second image generated by the image signal processor according to the characteristic information of the target ROI, where the first image is located before the second image in the time domain.
• Step S205: focus according to the position information and the size information.
• when the tracking confidence of the target ROI is less than a confidence threshold, the target ROI is recalculated, where the tracking confidence is used to indicate the tracking accuracy of the target ROI, and the tracking confidence is directly proportional to the tracking accuracy.
  • FIG. 13 is a schematic structural diagram of another focusing device according to an embodiment of the present invention.
• the focusing device 30 may include a first processing unit 301, a second processing unit 302, a third processing unit 303, a recognition unit 304, and a focusing unit 305, where:
• the first processing unit 301 is configured to determine a first ROI set and a second ROI set, where the first ROI set is a ROI set obtained from a first image generated by an image signal processor, the first ROI set includes one or more first ROIs, and each first ROI includes a photographic subject; the second ROI set is a ROI set obtained from the first image, the second ROI set includes one or more second ROIs, and each second ROI is a motion area;
  • a second processing unit 302 configured to determine a target ROI in the first image based on the first ROI set and the second ROI set;
• a third processing unit 303, configured to determine feature information of the target ROI;
  • a recognition unit 304 configured to identify position information and size information of the target ROI in a second image generated by the image signal processor according to the characteristic information of the target ROI, where the first image is located in the time domain Before the second image;
  • the focusing unit 305 is configured to perform focusing according to the position information and the size information.
  • the second processing unit 302 is specifically configured to:
• when the IoU between the effective first ROI and the effective second ROI is greater than or equal to a preset threshold, the effective first ROI is determined as the target ROI.
  • the second processing unit 302 is further configured to:
• of the effective second ROI and the effective first ROI, the ROI closer to the center point of the first image is determined as the target ROI.
• the effective first ROI has the highest evaluation score among one or more first ROIs within a preset area of the first image; and/or the effective second ROI has the highest evaluation score among one or more second ROIs within the preset area of the first image; wherein the evaluation score of each ROI satisfies at least one of the following: it is proportional to the area of the ROI, inversely proportional to the distance of the ROI from the center point of the first image, and proportional to the priority of the object category to which the ROI belongs.
  • the third processing unit 303 is further configured to update the feature information of the target ROI based on the feature information corresponding to the position and size of the target ROI in the historical image.
  • the characteristic information of the target ROI is determined according to the characteristic information of the first image corresponding to the target ROI and the characteristic information of at least one third image.
• the at least one third image is located between the first image and the second image in the time domain.
  • the apparatus further includes:
• a first initialization unit 306, configured to recalculate the target ROI after a first preset time period;
• a second initialization unit 307, configured to recalculate the target ROI when the tracking confidence of the target ROI is less than a confidence threshold, where the tracking confidence is used to indicate the tracking accuracy of the target ROI, and the tracking confidence is directly proportional to the tracking accuracy.
  • the feature information includes one or more of directional gradient hog information, color lab information, and convolutional neural network CNN information.
  • Each unit in FIG. 13 may be implemented in software, hardware, or a combination thereof.
• Units implemented in hardware may include circuits, such as logic circuits, algorithm circuits, or analog circuits.
• a unit implemented in software may include program instructions, which may be regarded as a software product, stored in a memory, and run by a processor to implement the related functions; for details, refer to the previous description.
  • An embodiment of the present invention further provides a computer storage medium, where the computer storage medium may store a program, and when the program is executed, it includes part or all of the steps described in any of the foregoing method embodiments.
  • An embodiment of the present invention further provides a computer program.
  • the computer program includes instructions.
• when the computer program is executed by a computer, the computer can perform part or all of the steps of any one of the focusing methods described above.
  • the disclosed device may be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of the above units is only a logical function division.
  • multiple units or components may be combined or integrated.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be electrical or other forms.
  • the units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, which may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objective of the solution of this embodiment.
  • the functional units in the embodiments of the present application may be integrated into one processing unit, or each of the units may exist separately physically, or two or more units may be integrated into one unit.
  • the above integrated unit may be implemented in the form of hardware or in the form of software functional unit.
• the technical solutions of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device, and specifically a processor in a computer device) to perform all or part of the steps of the foregoing methods in each embodiment of the present application.
  • The foregoing storage medium may include: a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a read-only memory (ROM), or a random access memory (RAM).

Abstract

Disclosed are a focusing apparatus, method, and related device. The focusing apparatus includes a processor, and a neural network processing unit (NPU) and an image signal processor (ISP) coupled to the processor. The ISP is used to generate a first image. The NPU is used to acquire a first set of regions of interest (ROIs) in the first image, where the first ROI set includes one or more first ROIs and each first ROI contains a photographic subject. The processor is used to: acquire a second ROI set in the first image, where the second ROI set includes one or more second ROIs and each second ROI is a motion region; determine a target ROI in the first image based on the first ROI set and the second ROI set; and, according to characteristic information of the target ROI, identify position information and size information of the target ROI in a second image and perform focusing, where the first image precedes the second image in the time domain. By means of the present application, the accuracy of focusing can be improved.

Description

Focusing device, method and related equipment

Technical Field

The present application relates to the field of image processing technologies, and in particular, to a focusing device, a focusing method, and related equipment.

Background

Smartphone camera technology is advancing toward SLR-level quality, and many smartphone cameras already surpass traditional compact cameras in imaging capability. High-quality photography relies on high-precision focusing. For static scenes, existing focusing techniques generally place the focus point at the center of the frame; this satisfies most consumers, but when the subject is not at the center of the field of view, center focusing often leaves the subject blurred. For dynamic scenes, especially a fast-moving subject, fixed center focusing cannot meet the demand, so high-precision motion focus tracking is urgently needed.

Summary of the Invention

Embodiments of the present invention provide a focusing device, a focusing method, and related equipment, to improve focusing accuracy.
In a first aspect, an embodiment of the present invention provides a focusing device, including a processor, and a neural network processor and an image signal processor coupled to the processor. The image signal processor is configured to generate a first image. The neural network processor is configured to obtain a first set of regions of interest (ROIs) in the first image, where the first ROI set includes one or more first ROIs and each first ROI contains a photographic subject. The processor is configured to: obtain a second ROI set in the first image, where the second ROI set includes one or more second ROIs and each second ROI is a motion region; determine a target ROI in the first image based on the first ROI set and the second ROI set; determine characteristic information of the target ROI; according to the characteristic information of the target ROI, identify position information and size information of the target ROI in a second image generated by the image signal processor, where the first image precedes the second image in the time domain; and perform focusing according to the position information and the size information.

In this embodiment of the present invention, on an image frame generated by the ISP in the focusing device, the NPU performs AI object detection to obtain one or more candidate photographic subjects, and the processor performs moving-object detection to obtain one or more candidate motion regions. The detected subjects and motion regions are then combined to determine the target ROI to be focused on, and subsequent tracking and focusing are performed based on the characteristic information of that target ROI. That is, AI object detection and moving-object detection are used to automatically identify the target ROI in the field of view (FOV); a target-ROI tracking algorithm then accurately computes the real-time motion trajectory and size of the target ROI; finally, an autofocus (AF) algorithm performs motion focus tracking along that trajectory. The entire process requires no manual selection by the user, and the tracking focus is accurate, which greatly improves the shooting experience and results.
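As a sketch only, the per-frame control flow described above (object detection on the NPU, motion detection on the CPU, fusion into a target ROI, then tracking-driven autofocus) might be organized as follows. Every hook name here is a hypothetical placeholder standing in for a hardware stage, not an API from the disclosure:

```python
CONF_THRESHOLD = 0.5  # illustrative value, not taken from the disclosure

def process_frame(frame, state, hooks):
    """One iteration of the sketched focus pipeline.

    `hooks` bundles the stages described above as callables (all
    hypothetical placeholders): object detection on the NPU, motion
    detection on the CPU, ROI fusion, feature extraction, tracking,
    and the autofocus actuator.
    """
    if state.get("target_roi") is None:
        first_rois = hooks["detect_objects"](frame)   # NPU: candidate subjects
        second_rois = hooks["detect_motion"](frame)   # CPU: candidate motion regions
        state["target_roi"] = hooks["fuse"](first_rois, second_rois)
        state["features"] = hooks["extract"](frame, state["target_roi"])
    else:
        # Later frames: locate the target ROI by its feature signature
        pos, size, conf = hooks["track"](frame, state["features"])
        hooks["focus"](pos, size)                     # AF follows the tracked ROI
        if conf < CONF_THRESHOLD:
            state["target_roi"] = None                # force re-detection
    return state
```

Clearing `target_roi` when the tracking confidence drops mirrors the re-initialization behavior described later in this summary.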
In a possible implementation manner, the processor is specifically configured to: determine an effective first ROI from the one or more first ROIs in the first ROI set, where the effective first ROI lies within a first preset region of the first image; determine an effective second ROI from the one or more second ROIs in the second ROI set, where the effective second ROI lies within a second preset region of the first image; and when the intersection over union (IoU) of the effective first ROI and the effective second ROI is greater than or equal to a preset threshold, determine the effective first ROI as the target ROI.

In this embodiment of the present invention, the first ROI set and the second ROI set are filtered to improve the recognition accuracy of the target ROI. When the overlap between the effective first ROI and the effective second ROI is large, both subject detection and motion detection are likely to cover the effective first ROI, so the effective first ROI can be taken as the target ROI.

In a possible implementation manner, the processor is further specifically configured to: when the IoU of the effective first ROI and the effective second ROI is less than the preset threshold, determine, from the effective second ROI and the effective first ROI, the ROI closer to the center point of the first image as the target ROI.

In this embodiment of the present invention, when the overlap between the effective first ROI and the effective second ROI is small, this may indicate a detection error or a drifting target ROI; the ROI closer to the center point is therefore selected as the target ROI.
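The fusion rule above can be sketched directly in code. The threshold value and the squared-distance tiebreak are illustrative assumptions; boxes are given as (x, y, w, h):

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def center_dist2(box, img_w, img_h):
    """Squared distance from the box centre to the image centre."""
    cx, cy = box[0] + box[2] / 2, box[1] + box[3] / 2
    return (cx - img_w / 2) ** 2 + (cy - img_h / 2) ** 2

def pick_target_roi(valid_first, valid_second, img_w, img_h, thr=0.5):
    """If the subject ROI and the motion ROI largely overlap, trust the
    subject ROI; otherwise fall back to whichever ROI lies closer to the
    image centre. `thr=0.5` is an illustrative threshold, not a value
    taken from the disclosure."""
    if iou(valid_first, valid_second) >= thr:
        return valid_first
    return min((valid_first, valid_second),
               key=lambda b: center_dist2(b, img_w, img_h))
```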
In a possible implementation manner, the effective first ROI has the highest evaluation score among the one or more first ROIs within the first preset region of the first image; and/or the effective second ROI has the highest evaluation score among the one or more second ROIs within the second preset region of the first image. The evaluation score of each ROI satisfies at least one of the following: it is proportional to the area of the ROI, inversely proportional to the distance from the ROI to the center point of the first image, and proportional to the priority of the object category to which the ROI belongs.

In this embodiment of the present invention, when multiple ROIs remain after the processor filters by preset region, the ROI area, the distance to the center point of the first image, and the priority of the category of the photographic subject can be used to select the ROI most likely to be tracked and focused.
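One way such an evaluation score could be combined is a weighted sum of the three factors named above. The weights, the normalization, and the additive combination are assumptions for illustration; the disclosure only states the proportionality relations:

```python
import math

def roi_score(box, cls_priority, img_w, img_h,
              w_area=1.0, w_dist=1.0, w_cls=1.0):
    """Illustrative scoring: larger area, smaller distance to the image
    centre, and higher class priority all raise the score. The weights
    and the exact combination are assumptions, not from the disclosure."""
    x, y, w, h = box
    area = (w * h) / (img_w * img_h)                  # normalised area
    cx, cy = x + w / 2, y + h / 2
    dist = math.hypot(cx - img_w / 2, cy - img_h / 2)
    max_dist = math.hypot(img_w / 2, img_h / 2)
    closeness = 1.0 - dist / max_dist                 # 1 at centre, 0 at corner
    return w_area * area + w_dist * closeness + w_cls * cls_priority

def pick_valid_roi(rois, img_w, img_h):
    """rois: list of (box, class_priority); returns the highest-scoring box."""
    return max(rois, key=lambda r: roi_score(r[0], r[1], img_w, img_h))[0]
```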
In a possible implementation manner, the processor is further configured to update the characteristic information of the target ROI based on characteristic information corresponding to the position and size of the target ROI in historical images.

In a possible implementation manner, the characteristic information of the target ROI is determined according to characteristic information of the first image corresponding to the target ROI and characteristic information of at least one third image, where the at least one third image lies between the first image and the second image in the time domain.

In this embodiment of the present invention, the processor not only determines the initial value of the target ROI, but also updates the characteristic information in real time based on the motion-tracking status of the target ROI, so as to track and focus more accurately.

In a possible implementation manner, the processor is further configured to: recalculate the target ROI after a first preset time period; or recalculate the target ROI when the tracking confidence of the target ROI is less than a confidence threshold, where the tracking confidence indicates the tracking accuracy of the target ROI and is proportional to the tracking accuracy.

In this embodiment of the present invention, the processor not only updates the characteristic information in real time based on the tracking status of the target ROI for more accurate focus tracking, but the updated characteristic information must also remain timely: after a long period, or when the confidence of the currently tracked target ROI is low, the relevant parameters should be re-initialized and a new round of target-ROI confirmation and tracking performed.
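The update-and-reinitialize policy above can be sketched as a small tracker state object. The exponential-moving-average blend, the time limit, and the confidence threshold are illustrative choices, not values from the disclosure:

```python
class TargetTracker:
    """Sketch of the feature-update / re-initialisation policy described
    above. The EMA learning rate, time limit, and confidence threshold
    are illustrative values, not taken from the disclosure."""

    def __init__(self, features, now, lr=0.1, max_age=5.0, conf_thr=0.3):
        self.features = features      # per-frame feature vector (list of floats)
        self.born = now               # time the target ROI was (re)computed
        self.lr = lr
        self.max_age = max_age
        self.conf_thr = conf_thr

    def update(self, new_features, confidence, now):
        """Blend in the latest observation; returns True if the target
        ROI must be recomputed from scratch (stale or unreliable)."""
        if now - self.born > self.max_age or confidence < self.conf_thr:
            return True               # trigger a new round of detection
        # Exponential moving average over the historical features
        self.features = [(1 - self.lr) * f + self.lr * g
                         for f, g in zip(self.features, new_features)]
        return False
```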
In a possible implementation manner, the characteristic information includes one or more of histogram of oriented gradients (HOG) features, Lab color features, and convolutional neural network (CNN) features.

This embodiment of the present invention provides multiple feature-extraction approaches to meet the feature-extraction requirements of different images or scenes.
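As a loose illustration of the first option, a HOG-style descriptor can be sketched with NumPy alone. This computes a single global orientation histogram weighted by gradient magnitude, not the full cell-and-block-normalized HOG used in practice:

```python
import numpy as np

def hog_descriptor(gray, n_bins=9):
    """Simplified HOG-style descriptor: one global histogram of unsigned
    gradient orientations, weighted by gradient magnitude and normalised
    to sum to 1. A sketch, not the full block-normalised HOG."""
    gy, gx = np.gradient(gray.astype(float))          # image gradients
    mag = np.hypot(gx, gy)                            # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0      # unsigned orientation
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, 180), weights=mag)
    s = hist.sum()
    return hist / s if s > 0 else hist
```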
In a second aspect, an embodiment of the present invention provides a focusing method, which may include:

determining a first set of regions of interest (ROIs) and a second ROI set, where the first ROI set is obtained from a first image generated by an image signal processor, the first ROI set includes one or more first ROIs, and each first ROI contains a photographic subject, and where the second ROI set is obtained from the first image, the second ROI set includes one or more second ROIs, and each second ROI is a motion region; determining a target ROI in the first image based on the first ROI set and the second ROI set; determining characteristic information of the target ROI; according to the characteristic information of the target ROI, identifying position information and size information of the target ROI in a second image generated by the image signal processor, where the first image precedes the second image in the time domain; and performing focusing according to the position information and the size information.
In a possible implementation manner, the determining a target ROI in the first image based on the first ROI set and the second ROI set includes: determining an effective first ROI from the one or more first ROIs in the first ROI set, where the effective first ROI lies within a first preset region of the first image; determining an effective second ROI from the one or more second ROIs in the second ROI set, where the effective second ROI lies within a second preset region of the first image; and when the intersection over union (IoU) of the effective first ROI and the effective second ROI is greater than or equal to a preset threshold, determining the effective first ROI as the target ROI.

In a possible implementation manner, the method further includes: when the IoU of the effective first ROI and the effective second ROI is less than the preset threshold, determining, from the effective second ROI and the effective first ROI, the ROI closer to the center point of the first image as the target ROI.

In a possible implementation manner, the effective first ROI has the highest evaluation score among the one or more first ROIs within the first preset region of the first image; and/or the effective second ROI has the highest evaluation score among the one or more second ROIs within the second preset region of the first image. The evaluation score of each ROI satisfies at least one of the following: it is proportional to the area of the ROI, inversely proportional to the distance from the ROI to the center point of the first image, and proportional to the priority of the object category to which the ROI belongs.

In a possible implementation manner, the method further includes: updating the characteristic information of the target ROI based on characteristic information corresponding to the position and size of the target ROI in historical images.

In a possible implementation manner, the characteristic information of the target ROI is determined according to characteristic information of the first image corresponding to the target ROI and characteristic information of at least one third image, where the at least one third image lies between the first image and the second image in the time domain.

In a possible implementation manner, the method further includes: recalculating the target ROI after a first preset time period; or recalculating the target ROI when the tracking confidence of the target ROI is less than a confidence threshold, where the tracking confidence indicates the tracking accuracy of the target ROI and is proportional to the tracking accuracy.

In a possible implementation manner, the characteristic information includes one or more of histogram of oriented gradients (HOG) features, Lab color features, and convolutional neural network (CNN) features.
In a third aspect, an embodiment of the present invention provides a focusing device, which may include:

a first processing unit, configured to determine a first set of regions of interest (ROIs) and a second ROI set, where the first ROI set is obtained from a first image generated by an image signal processor, the first ROI set includes one or more first ROIs, and each first ROI contains a photographic subject, and where the second ROI set is obtained from the first image, includes one or more second ROIs, and each second ROI is a motion region; a second processing unit, configured to determine a target ROI in the first image based on the first ROI set and the second ROI set; a third processing unit, configured to determine characteristic information of the target ROI; a recognition unit, configured to identify, according to the characteristic information of the target ROI, position information and size information of the target ROI in a second image generated by the image signal processor, where the first image precedes the second image in the time domain; and a focusing unit, configured to perform focusing according to the position information and the size information.
In a possible implementation manner, the second processing unit is specifically configured to: determine an effective first ROI from the one or more first ROIs in the first ROI set, where the effective first ROI lies within a first preset region of the first image; determine an effective second ROI from the one or more second ROIs in the second ROI set, where the effective second ROI lies within a second preset region of the first image; and when the intersection over union (IoU) of the effective first ROI and the effective second ROI is greater than or equal to a preset threshold, determine the effective first ROI as the target ROI.

In a possible implementation manner, the second processing unit is further configured to: when the IoU of the effective first ROI and the effective second ROI is less than the preset threshold, determine, from the effective second ROI and the effective first ROI, the ROI closer to the center point of the first image as the target ROI.

In a possible implementation manner, the effective first ROI has the highest evaluation score among the one or more first ROIs within the first preset region of the first image; and/or the effective second ROI has the highest evaluation score among the one or more second ROIs within the second preset region of the first image. The evaluation score of each ROI satisfies at least one of the following: it is proportional to the area of the ROI, inversely proportional to the distance from the ROI to the center point of the first image, and proportional to the priority of the object category to which the ROI belongs.

In a possible implementation manner, the third processing unit is further configured to update the characteristic information of the target ROI based on characteristic information corresponding to the position and size of the target ROI in historical images.

In a possible implementation manner, the characteristic information of the target ROI is determined according to characteristic information of the first image corresponding to the target ROI and characteristic information of at least one third image, where the at least one third image lies between the first image and the second image in the time domain.

In a possible implementation manner, the device further includes: a first initialization unit, configured to recalculate the target ROI after a first preset time period; or a second initialization unit, configured to recalculate the target ROI when the tracking confidence of the target ROI is less than a confidence threshold, where the tracking confidence indicates the tracking accuracy of the target ROI and is proportional to the tracking accuracy.

In a possible implementation manner, the characteristic information includes one or more of histogram of oriented gradients (HOG) features, Lab color features, and convolutional neural network (CNN) features.
In a fourth aspect, an embodiment of the present invention provides an electronic device, including an image sensor and the focusing device according to any implementation of the first aspect, where the image sensor is configured to collect image data, and the image signal processor is configured to generate the first image based on the image data.

In a possible implementation manner, the electronic device further includes a memory configured to store program instructions, and the program instructions are executed by the processor.
In a fifth aspect, the present application provides a focusing device that has the function of implementing any one of the above focusing methods. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above function.

In a sixth aspect, the present application provides a terminal. The terminal includes a processor configured to support the terminal in performing the corresponding functions of the focusing method provided in the second aspect. The terminal may further include a memory coupled to the processor, which stores the program instructions and data necessary for the terminal, and may further include a communication interface for the terminal to communicate with other devices or a communication network.

In a seventh aspect, the present application provides a computer storage medium storing a computer program which, when executed by a processor, implements the focusing method flow of any implementation of the second aspect.

In an eighth aspect, an embodiment of the present invention provides a computer program including instructions which, when the computer program is executed by a computer, enable the computer to perform the focusing method flow of any implementation of the second aspect.

In a ninth aspect, the present application provides a chip system including a processor configured to implement the functions involved in the focusing method flow of any implementation of the second aspect. In a possible design, the chip system further includes a memory configured to store the program instructions and data necessary for the focusing method. The chip system may consist of a chip, or may include a chip and other discrete devices.
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of a focusing device according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a first image according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of another focusing device according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the functional principle of a focusing device according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an SSD network implementation process according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of target ROI screening according to an embodiment of the present invention;
FIG. 7 is a schematic flowchart of target ROI determination according to an embodiment of the present invention;
FIG. 8 is a schematic flowchart of target ROI tracking according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of target ROI tracking according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of updating the characteristic information of a target ROI according to an embodiment of the present invention;
FIG. 11 is a hardware structural diagram of a neural network processor according to an embodiment of the present invention;
FIG. 12 is a schematic flowchart of a focusing method according to an embodiment of the present invention;
FIG. 13 is a schematic structural diagram of still another focusing device according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to the accompanying drawings.

The terms "first", "second", "third", and "fourth" in the specification, claims, and drawings of the present application are used to distinguish different objects, not to describe a specific order. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally further includes steps or units that are not listed, or optionally further includes other steps or units inherent to the process, method, product, or device.

Reference to "an embodiment" herein means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor are they separate or alternative embodiments mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.

The terms "component", "module", "system", and the like used in this specification denote computer-related entities: hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable file, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device itself may be components. One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer and/or distributed between two or more computers. In addition, these components may execute from various computer-readable media having various data structures stored thereon. The components may communicate by way of local and/or remote processes, for example according to a signal having one or more data packets (such as data from two components interacting with another component in a local system, in a distributed system, and/or across a network such as the Internet that interacts with other systems by way of the signal).
First, some terms in this application are explained to facilitate understanding by those skilled in the art.
(1)感兴趣区域(region of interested,ROI),机器视觉、图像处理中,从被处理的图像以方框、圆、椭圆、不规则多边形等方式勾勒出需要处理的区域,称为感兴趣区域。(1) Region of interest (ROI). In machine vision and image processing, the area to be processed is outlined from the processed image in the form of boxes, circles, ellipses, and irregular polygons. It is called interest. region.
(2)人工智能(Artificial Intelligence,AI),是利用数字计算机或者数字计算机控制的机器模拟、延伸和扩展人的智能,感知环境、获取知识并使用知识获得最佳结果的理论、方法、技术及应用系统。换句话说,人工智能是计算机科学的一个分支,它企图了解智能的实质,并生产出一种新的能以人类智能相似的方式作出反应的智能机器。人工智能也就是研究各种智能机器的设计原理与实现方法,使机器具有感知、推理与决策的功能。人工智能领域的研究包括机器人,自然语言处理,计算机视觉,决策与推理,人机交互,推荐与搜索,AI基础理论等。(2) Artificial Intelligence (AI) is a theory, method, technology, and method that uses digital computers or digital computer-controlled machines to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. operating system. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can respond in a similar way to human intelligence. Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have functions of perception, reasoning and decision-making. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, and basic theories of AI.
(3) A convolutional neural network (CNN) is a multi-layer neural network in which each layer is composed of multiple two-dimensional planes, and each plane is composed of multiple independent neurons. The neurons of a plane share weights, and weight sharing reduces the number of parameters in the neural network. At present, in a convolutional neural network, a processor usually performs a convolution operation by converting the convolution of an input signal feature with a weight into a matrix multiplication between a signal matrix and a weight matrix. In the matrix multiplication, the signal matrix and the weight matrix are partitioned into blocks to obtain multiple fractal signal matrices and fractal weight matrices, and matrix multiplication and accumulation operations are then performed on the fractal signal matrices and fractal weight matrices.
(4) Image signal processing (ISP) refers to a unit mainly used to process the signal output by a front-end image sensor, so as to match image sensors from different manufacturers. A camera uses an image signal processor (Image Signal Processor, ISP); its pipelined, dedicated image-processing engine can process image signals at high speed, and it is also equipped with dedicated circuits for auto exposure, auto focus, and auto white balance evaluation.

(5) Intersection over union (IoU), a concept used in object detection, is the overlap rate between a generated candidate bound and the ground truth bound, that is, the ratio of their intersection to their union. Ideally, the two overlap completely, that is, the ratio is 1.
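The following is an illustrative sketch (not part of the original disclosure) of how the IoU of two axis-aligned boxes, each given as (x, y, width, height), can be computed:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Coordinates of the intersection rectangle.
    ix1, iy1 = max(ax, bx), max(ay, by)
    ix2, iy2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

# Identical boxes overlap completely, so the ratio is 1.
print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # → 1.0
print(iou((0, 0, 10, 10), (5, 0, 10, 10)))  # → 0.3333...
```

Disjoint boxes yield 0, and the value grows toward 1 as the overlap approaches the union.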
To facilitate understanding of the embodiments of the present invention, the technical problems solved by the embodiments of the present invention and the corresponding practical application scenarios are listed below by way of example. Common shooting scenarios and the corresponding focusing methods include the following.
Scenario 1: technical solutions for static scenes:

1) The center focusing method: a fixed center position is set in advance as the focus area.

2) The user manually touches a target position on the screen, which is used as the focus area.

Disadvantages of the focusing solutions in the above static scene:

1) The center focus area is limited. When the subject deviates from the center, the focus cannot be placed on the subject.

2) When the user manually selects the focus target, the AF algorithm needs to reconfigure the focus point, which lengthens both the focusing time and the time the user spends taking the photo; and when the target starts to move, the focus cannot follow the target's movement in real time.
Scenario 2: technical solutions for shooting dynamic scenes:

1) A target focus-tracking method based on feature-point detection: feature points in the picture are detected in real time, and the focus is then set on the feature points.

2) A target focus-tracking method based on motion detection: moving objects in the shooting scene are quickly identified from the content change between two successive frames, the motion region is output to the AF algorithm in real time, and the focus point is then adjusted to the motion region in real time to track the moving target. In addition, the prior art implements an artificial-intelligence servo autofocus function: in a mode of high-speed continuous focusing on a moving subject, the shutter is half-pressed to capture the subject in the viewfinder and detect its motion trajectory. The autofocus sensor built into an SLR can identify whether the subject is stationary or moving and determine its direction of movement, thereby achieving accurate focusing when shooting subjects such as sports, children, or animals.

Disadvantages of the focusing solutions in the above dynamic scene:

1) The focus-tracking method based on feature-point detection tends to detect richly textured background regions, so the focus cannot truly be placed on the target.

2) The automatic focus-tracking method based on moving-target detection: when the background around the moving target changes, a motion region is easily detected, which readily causes false triggering and misfocusing; the trajectory of the moving target is not smooth and jumps severely, resulting in discontinuous focusing; and when the camera is moving or unstable, moving objects are easily detected in the picture even though the shooting target is actually stationary, which readily leads to misfocusing.
Therefore, for the above two scenarios, the problems and application scenarios mainly addressed by the embodiments of the present invention include the following.

1. When shooting a static scene, the problem is selecting the focus area when the target object is not at the center. An AI object detection algorithm is used to detect the subject in the picture, and the subject region is then input to a target tracking algorithm to monitor the state of the target in real time. When the target is stationary, the AF algorithm directly sets the focus point on the subject for stable focusing; when the target starts to move, the tracking algorithm follows the target's movement in real time, and the AF algorithm performs tracking focus in real time.

2. When shooting a dynamic scene, the AI object detection algorithm is combined with the moving-target detection algorithm to jointly output the subject in the current picture, and the target tracking algorithm then monitors and outputs the position region and size of the moving target in real time, solving problems such as misidentification of moving targets, non-smooth target motion, unstable target tracking, and discontinuous focusing.

It can be understood that the above application scenarios are only a few exemplary implementations of the embodiments of the present invention; the application scenarios of the embodiments of the present invention include but are not limited to the above.
Based on the above, the focusing apparatus and related devices provided by the embodiments of the present invention are described below. Referring to FIG. 1, FIG. 1 is a schematic structural diagram of a focusing apparatus according to an embodiment of the present invention. The focusing apparatus 10 may include a processor 101, and a neural network processor 102 and an image signal processor 103 coupled to the processor 101, where:

The image signal processor (ISP) 103 is configured to generate a first image. It can match image sensors from different manufacturers, process the image data output by a front-end image sensor, and generate a corresponding image signal based on the image data.

The neural network processor (NPU) 102 is configured to obtain a first set of regions of interest (ROIs) in the first image, where the first ROI set includes one or more first ROIs, and each first ROI includes one subject. The subject may be any object, such as a person, an animal, a building, or a plant. For example, when the neural network processor 102 recognizes a flower, a person, and a dog in the first image, the first ROI set includes three first ROIs: the plant, the person, and the animal. As shown in FIG. 2, which is a schematic diagram of a first image according to an embodiment of the present invention, the NPU recognizes a human face (region 1), a dog face (region 3), a flower (region 4), and a table (region 5), all of which are first ROIs.
The processor (CPU) 101 is configured to: obtain a second ROI set in the first image; determine a target ROI in the first image based on the first ROI set and the second ROI set; determine feature information of the target ROI; identify, according to the feature information of the target ROI, position information and size information of the target ROI in a second image generated by the image signal processor 103; and perform focusing according to the position information and size information. The second ROI set includes one or more second ROIs, each of which is a motion region. For example, if a puppy is detected to be moving from the first image together with one or several frames preceding it, the region where the puppy is located in the first image is determined as a second ROI. It can be understood that when multiple objects in the field of view are detected to be moving, multiple second ROIs may likewise be determined. The first image precedes the second image in the time domain; that is, the feature information of the target ROI, determined by combining AI recognition and motion detection on a previously captured image, serves as the basis for subsequently tracking that target ROI for real-time tracking focus. It can also be understood that if no object motion is detected in the first image, the second ROI set may be an empty set, which corresponds to a static shooting scene. As shown in FIG. 2, the CPU detects through motion detection that the person is moving, and therefore identifies region 2, where the person is located, as a motion region, that is, a second ROI.
It can be understood that the processor 101 is further configured to, for example, run general operating system software and, under the control of the general operating system software, control the neural network processor 102 and the image signal processor 103 to perform focusing, for example, sending the first image generated by the image signal processor 103 to the neural network processor 102 to obtain the first ROI set, and receiving the first ROI set obtained by the neural network processor 102. Further, the processor 101 is also configured to complete the computation and control related to the focusing process.

Optionally, the above neural network processor may be integrated into the processor 101 as a part of the processor 101, or may be another functional chip coupled to the processor 101 and capable of obtaining the first ROI set. Similarly, the functions performed by the processor 101 may also be distributed across multiple different functional chips; this is not specifically limited in the embodiments of the present invention.
Referring to FIG. 3 and FIG. 4, FIG. 3 is a schematic structural diagram of another focusing apparatus according to an embodiment of the present invention, and FIG. 4 is a schematic diagram of the functional principle of a focusing apparatus according to an embodiment of the present invention. The focusing apparatus 10 may include a processor 101, a neural network processor 102 and an image signal processor 103 coupled to the processor 101, and a lens 104, an image sensor 105, and a voice coil motor (VCM) 106 for focusing coupled to the image signal processor 103, where:

The lens 104 is configured to focus optical information of the real world onto the image sensor through the principle of optical imaging. For example, the lens 104 may be a rear camera, a front camera, a rotating camera, or the like of a terminal (such as a smartphone).

The image sensor 105 is configured to output image data based on the optical information collected by the lens 104, and provide the image data to the image signal processor 103 to generate a corresponding image signal.

The focus motor 106 may include a mechanical structure and is configured to perform static or dynamic focusing based on the position information and size information of the target ROI determined by the processor 101. For example, if the processor 101 recognizes that the target ROI is stationary, the processor 101 controls the focus motor 106 to perform static focusing; if the processor 101 recognizes that the target ROI is moving, the processor 101 controls the focus motor 106 to perform dynamic focusing.
It can be understood that, for the functions of the processor 101, the neural network processor 102, and the image signal processor 103, reference may be made to the related description of FIG. 1 above; details are not repeated here.

Optionally, the focusing apparatus in FIG. 1 or FIG. 3 may be located in a terminal (such as a smartphone, a tablet, or a smart wearable device), a smart photographing device (a smart camera, a smart video camera, or a smart tracking device), a smart surveillance device, an aerial drone, or the like; this application does not enumerate them one by one.

In the embodiments of the present invention, on an image frame generated by the ISP in the focusing apparatus of FIG. 1 or FIG. 3, the NPU performs AI object detection to obtain one or more candidate subjects, and the processor performs moving-object detection to obtain one or more candidate motion regions; the detected subjects and motion regions are then combined to determine the target ROI to be focused on, and subsequent tracking focus is performed based on the feature information of that target ROI. That is, AI target detection and moving-target detection are used to automatically identify the target ROI in the field of view (FOV); a target ROI tracking algorithm then accurately computes the real-time motion trajectory and size of the target ROI; and finally the autofocus (AF) algorithm performs motion focus tracking according to the real-time motion trajectory of the target ROI. The entire process requires no manual user intervention, and the tracking focus is accurate, which greatly improves the shooting experience and results.
In a possible implementation, in the above focusing apparatus 10 (including the focusing apparatus in FIG. 1 and FIG. 3, not repeated below), the neural network processor 102 obtains the first ROI set in the first image in the following manner:

The neural network processor 102 uses an AI object detection algorithm to obtain the target object, i.e., the target ROI, in the picture (the first image). A general-purpose structure (such as the first several layers of ResNet18 or ResNet26) is used as the base network, and additional layers are added on top of it as the detection structure. The classification base model extracts low-level features of the image, ensuring that the low-level features are sufficiently discriminative; adding classifiers on shallow features can help improve classification performance. The detection part outputs, on feature maps at different levels, a series of discretized bounding boxes and, for each box, the probability (score) that it contains an object instance. Finally, a non-maximum suppression (NMS) algorithm is applied to obtain the final object prediction result. Further, the detection model may adopt the single-shot detection (SSD) framework. Referring to FIG. 5, which is a schematic diagram of an SSD network implementation process according to an embodiment of the present invention, the process may include the following main steps:
1. The main body adopts a one-stage detection structure, avoiding the large number of candidate target positions that enter the second stage in two-stage detectors such as Faster R-CNN, thereby greatly improving detection speed.

2. Multi-scale feature maps are used. With multi-scale features, each layer has a different receptive field, so the network can adapt to detecting targets of different sizes and achieve better performance.

3. Default boxes of different sizes and aspect ratios are used. The default boxes determine the initial positions of the final prediction boxes; by using different sizes and ratios, the network can adapt to subjects of different scales and shapes and provide optimal initial values, making predictions more accurate.
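The following is an illustrative sketch (not part of the original disclosure) of the greedy non-maximum suppression step that produces the final object predictions; the box format (x, y, w, h) and the IoU threshold of 0.5 are assumptions for illustration:

```python
def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression.

    boxes: list of (x, y, w, h); scores: matching confidence scores.
    Returns the indices of the boxes that are kept.
    """
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2 = min(a[0] + a[2], b[0] + b[2])
        iy2 = min(a[1] + a[3], b[1] + b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        union = a[2] * a[3] + b[2] * b[3] - inter
        return inter / union if union > 0 else 0.0

    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Discard remaining boxes that overlap the best box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

# Two heavily overlapping detections collapse to the higher-scoring one.
print(nms([(0, 0, 10, 10), (1, 1, 10, 10), (50, 50, 5, 5)], [0.9, 0.8, 0.7]))  # → [0, 2]
```

Each surviving index corresponds to one predicted object instance; all lower-scoring boxes that overlap it above the threshold are suppressed.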
Since the AI object detection algorithm runs on the NPU, considering power and performance constraints, a detection result may be output once every 10 frames. The object categories that can be detected and recognized include: flowers, people, cats, dogs, birds, bicycles, buses, motorcycles, trucks, cars, trains, boats, horses, kites, balloons, vases, bowls, plates, cups, and classic handbags. The object categories to which the subjects belong may be divided into four priority levels: people have the first priority, flowers the second, cats and dogs the third, and the remaining categories the fourth.
In a possible implementation, the processor 101 of the focusing apparatus 10 obtains the second ROI set in the first image in the following manner:

The processor 101 may use a moving-target detection algorithm to obtain the second ROI set. For example, the moving-target detection algorithm is run once every two frames, that is, the motion region in the current image is output every two frames; optionally, a motion speed level, a motion direction, and the like may further be output. As shown in FIG. 2, region 2 is the motion region output by the motion detection algorithm, i.e., the second ROI, and region 1 is the finally determined target ROI.
In a possible implementation, the processor 101 of the focusing apparatus 10 determines the target ROI in the first image based on the first ROI set and the second ROI set in the following manner: the processor 101 determines an effective first ROI from the one or more first ROIs in the first ROI set, and determines an effective second ROI from the one or more second ROIs in the second ROI set; and when the intersection over union (IoU) of the effective first ROI and the effective second ROI is greater than or equal to a preset threshold, determines the effective first ROI as the target ROI, where the effective first ROI is within a first preset region of the first image, and the effective second ROI is within a second preset region of the first image. Further optionally, when the IoU of the effective first ROI and the effective second ROI is less than the preset threshold, the processor 101 determines, as the target ROI, whichever of the effective second ROI and the effective first ROI is closer to the center point of the first image. That is, when the overlapping area between the effective first ROI and the effective second ROI is large, both the subject detection and the motion-region detection likely contain the effective first region, so the effective first region can be used as the target ROI; when the overlapping area is small, the detection may be erroneous or the target ROI may have drifted, so the ROI closer to the center point can be selected as the target ROI. Optionally, the target ROI may also be selected according to other calculation rules, for example, combining the effective first ROI and the effective second ROI to obtain a new ROI; this application does not enumerate them one by one.
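The decision rule described above can be expressed as the following illustrative sketch (not part of the original disclosure); the IoU helper, the box format (x, y, w, h), and the threshold of 0.5 are assumptions for illustration:

```python
import math

def select_target_roi(roi_ai, roi_motion, image_center, iou_thresh=0.5):
    """Pick the target ROI from the effective AI ROI and effective motion ROI."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2 = min(a[0] + a[2], b[0] + b[2])
        iy2 = min(a[1] + a[3], b[1] + b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        union = a[2] * a[3] + b[2] * b[3] - inter
        return inter / union if union > 0 else 0.0

    def dist_to_center(roi):
        cx, cy = roi[0] + roi[2] / 2, roi[1] + roi[3] / 2
        return math.hypot(cx - image_center[0], cy - image_center[1])

    if iou(roi_ai, roi_motion) >= iou_thresh:
        # Large overlap: both detectors agree, so trust the AI subject region.
        return roi_ai
    # Small overlap: possible misdetection or drift; prefer the ROI nearer the center.
    return min((roi_ai, roi_motion), key=dist_to_center)
```

For example, two boxes that mostly coincide return the AI box, while disjoint boxes return whichever is closer to the image center.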
Referring to FIG. 6, FIG. 6 is a schematic diagram of target ROI screening according to an embodiment of the present invention. For example, the first image (the field of view of the camera) displayed on the mobile phone screen in FIG. 6 has width "width" and height "height". For subject recognition, a first ROI is effective when it is within the first preset region; for example, the border width of the ineffective region for the first preset region is w1 = min(width, height) × 0.2, in which case ROI2 is effective while ROI0 and ROI1 are not. For motion-region recognition, a second ROI is effective when it is within the second preset region; for example, the border width of the ineffective region for the second preset region is w2 = min(width, height) × 0.1, in which case ROI1 and ROI2 are effective while ROI0 is not.
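The border-based validity check above can be written as the following illustrative sketch (not part of the original disclosure); treating an ROI as effective only when it lies entirely inside the central region is an assumption, since the containment rule is not spelled out here:

```python
def is_effective(roi, width, height, border_ratio):
    """Check whether an ROI (x, y, w, h) lies inside the central valid region.

    border_ratio is 0.2 for AI-detected first ROIs and 0.1 for motion ROIs,
    matching w = min(width, height) * ratio from the description above.
    """
    border = min(width, height) * border_ratio
    x, y, w, h = roi
    return (x >= border and y >= border and
            x + w <= width - border and y + h <= height - border)

# 1080x1920 portrait frame: the border is min(1080, 1920) * 0.2 = 216 px.
print(is_effective((300, 400, 200, 200), 1080, 1920, 0.2))  # → True
print(is_effective((100, 400, 200, 200), 1080, 1920, 0.2))  # → False
```

The stricter 0.2 ratio for subject detection discards boxes near the edges, where detections are less likely to be the intended focus target.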
Further optionally, the effective first ROI has the highest evaluation score among the one or more first ROIs within the first preset region of the first image; and/or the effective second ROI has the highest evaluation score among the one or more second ROIs within the second preset region of the first image. The evaluation score of each ROI satisfies at least one of the following: it is proportional to the area of the ROI, inversely proportional to the distance of the ROI from the center point of the first image, and proportional to the priority of the object category to which the ROI belongs. That is, when multiple ROIs remain after screening by the corresponding preset region, the ROI most likely to be the tracking-focus target can be selected according to its area, its distance from the center of the first image, and the priority of the category of its subject. For example, an ROI that is closer to the center, larger in area, and of category "person" is more likely to be chosen as the tracking target ROI. As another example, the priorities of different object categories may be set according to the current shooting mode: in portrait mode, people have the highest priority; in landscape mode, plants or buildings have the highest priority; and so on.
Referring to FIG. 7, FIG. 7 is a schematic flowchart of target ROI determination according to an embodiment of the present invention. In FIG. 7, AI object detection is performed by the NPU to obtain the first ROI set, and moving-target detection is performed by the CPU to obtain the second ROI set. Since multiple first ROIs and second ROIs may be detected at this stage, and the recognition precision and accuracy are relatively low, some ROIs do not need to be focused on (for example, flowers in the background of the shot, or a moving object that unintentionally strays into the background). Therefore, screening by the CPU is required. First, the processor 101 checks whether the first ROIs in the first ROI set and the second ROIs in the second set are effective. For the AI object detection branch and/or the motion-region detection branch: when there is only one ROI, that ROI is output directly; when there are multiple targets, the different targets can be comprehensively scored according to the following factors: 1. the priority (priority) of the object category of the subject in each ROI; 2. the size (area) of each ROI; 3. the distance (dist) of each ROI from the center of the picture. The comprehensive score is Score = 0.4 × priority + 0.4 × area + 0.2 / dist, and the ROI with the highest score is selected as the effective ROI of that branch. Finally, the target ROI is determined according to the IoU between the effective first ROI and the effective second ROI.
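The branch scoring above can be written as the following illustrative sketch (not part of the original disclosure); the numeric priority encoding and the use of raw pixel area are assumptions, since only the weighting formula is given (in practice, priority, area, and dist would typically be normalized to comparable ranges):

```python
import math

def pick_effective_roi(rois, image_center):
    """Select a branch's effective ROI by Score = 0.4*priority + 0.4*area + 0.2/dist.

    Each ROI is a dict with 'box' = (x, y, w, h) and 'priority'
    (assumed encoding: 4 for person down to 1 for the lowest class).
    """
    if len(rois) == 1:
        return rois[0]  # A single candidate is output directly.

    def score(roi):
        x, y, w, h = roi["box"]
        cx, cy = x + w / 2, y + h / 2
        dist = math.hypot(cx - image_center[0], cy - image_center[1])
        area = w * h
        # Guard against a box sitting exactly on the center point.
        return 0.4 * roi["priority"] + 0.4 * area + 0.2 / max(dist, 1e-6)

    return max(rois, key=score)
```

With this encoding, a larger, higher-priority box nearer the center wins the branch and is passed on to the IoU comparison between the two branches.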
Optionally, in addition to the above method of determining the target ROI, the focusing apparatus 10 in the embodiments of the present invention may also combine other preset strategies to provide different target ROI determination methods in different scenarios. For example, the preset strategies may include: 1) user designation first; 2) AI object detection first; 3) motion detection first; 4) joint selection by object detection and motion detection; and so on.
In a possible implementation, the feature information of the target ROI determined by the processor 101 of the focusing apparatus 10 includes one or more of histogram-of-oriented-gradients (HOG) information, color Lab information, and convolutional neural network (CNN) information: for example, only the HOG information extracted by the processor 101, only the color Lab information extracted by the processor 101, or only the CNN information extracted by the neural network processor 102, or any two of the three, or a combination of all three. It should be emphasized that the HOG information and color Lab information can be extracted by the processor 101, whereas the CNN information is extracted by the neural network processor 102 and then sent by the neural network processor 102 to the processor 101.
在一种可能的实现方式中,所述处理器101还基于所述目标ROI在历史图像中的位置和大小所对应的特征信息更新所述目标ROI的特征信息。在另一种可能的实现方式中,所述目标ROI的特征信息是根据所述目标ROI对应的第一图像的特征信息和至少一个第三图像的特征信息确定的,所述至少一个第三图像在时域上位于第一图像和第二图像之间。也即是上述对焦装置10中的处理器10在根据所述目标ROI的特征信息,识别所述目标ROI在所述图像信号处理器生成的第二图像中的位置信息和大小信息的过程中,将目标ROI在第一图像中的特征信息作为初始的特征信息,后续还基于所述目标ROI在跟踪过程中每一帧图像中的位置和大小所对应的特征信息更新所述初始的特征信息,以保证跟踪目标ROI 的精准性。进一步地,处理器101在第一预设时间段后,重新计算所述目标ROI;或者当所述目标ROI的跟踪置信度小于置信度阈值的情况下,重新计算所述目标ROI,其中,所述跟踪置信度用于指示所述目标ROI的跟踪精确度,所述跟踪置信度与跟踪精确度成正比。本发明实施例中,处理器101不仅要基于目标ROI的跟踪情况实时的更新特征信息,以更加精准的跟踪对焦,而且更新的特征信息具有时效性,当较长一段时间之后,或者当前跟踪的目标ROI置信度低的时候,就需要考虑初始化相关参数,进行新一轮的目标ROI的确认及跟踪。In a possible implementation manner, the processor 101 further updates the feature information of the target ROI based on the feature information corresponding to the position and size of the target ROI in the historical image. In another possible implementation manner, the feature information of the target ROI is determined according to the feature information of the first image corresponding to the target ROI and the feature information of at least one third image, the at least one third image Located between the first image and the second image in the time domain. That is, the processor 10 in the focusing device 10 is in the process of identifying the position information and the size information of the target ROI in the second image generated by the image signal processor according to the characteristic information of the target ROI. Taking the feature information of the target ROI in the first image as initial feature information, and subsequently updating the initial feature information based on the feature information corresponding to the position and size of the target ROI in each frame of the image during the tracking process, To ensure the accuracy of tracking the target ROI. 
Further, the processor 101 recalculates the target ROI after a first preset time period, or recalculates the target ROI when the tracking confidence of the target ROI is less than a confidence threshold, where the tracking confidence indicates the tracking accuracy of the target ROI and is directly proportional to it. In this embodiment of the present invention, the processor 101 not only updates the feature information in real time based on how the target ROI is being tracked, so as to track and focus more accurately, but the updated feature information also ages: after a long period of time, or when the confidence of the currently tracked target ROI is low, the relevant parameters need to be reinitialized and a new round of target-ROI determination and tracking performed.
请参见图8,图8为本发明实施例所提供的目标ROI跟踪流程示意图。在目标ROI的特征提取完成后,处理器101根据预设规则选择某一种特征或者多种特征的组合以确定特征信息,经过规则判断后确定是否初始化跟踪器,如果不需要初始化跟踪器则直接进入跟踪计算,输出目标ROI的位置信息和大小信息,并输出目标的位置可能的响应图,最后基于目标ROI新的位置和大小更新特征信息等,主要可以包括如下几个步骤:Please refer to FIG. 8, which is a schematic diagram of a target ROI tracking process according to an embodiment of the present invention. After the feature extraction of the target ROI is completed, the processor 101 selects a certain feature or a combination of multiple features to determine the feature information according to a preset rule, and determines whether to initialize the tracker after the rule judgment. If the tracker does not need to be initialized, directly Enter the tracking calculation, output the position and size information of the target ROI, and output a possible response map of the target's position, and finally update the feature information based on the new position and size of the target ROI, which can mainly include the following steps:
1、特征选择:这部分可以根据不同需求选择不同的特征组合,例如单独采用hog特征,或者hog+lab+cnn组合使用;1. Feature selection: This part can choose different feature combinations according to different needs, such as using the hog feature alone, or a combination of hog + lab + cnn;
2、是否初始化?:2. Is it initialized? :
1)开始,启动跟踪系统,初始化跟踪器;1) Start, start the tracking system, and initialize the tracker;
2)基于跟踪后处理得到的置信度，当mConfidence<0.2，并且主体目标选择模块输出新的ROI时，需要重新初始化跟踪器；2) Based on the confidence obtained from the tracking post-processing, when mConfidence < 0.2 and the subject-target selection module outputs a new ROI, the tracker needs to be re-initialized;
3、跟踪后处理:3. Post-processing after tracking:
1)通过跟踪计算模块后，跟踪计算算法采用相关滤波算法，例如KCF(Kernel Correlation Filter)、ECO(Efficient Convolution Operators)等，针对每一帧图像输出的响应图为w×h的浮点二维数组F[w][h]，可以记为F_{w,h}，已归一化到0到1.0范围内；其中，响应图反映目标ROI在画面中位置的可能分布，最大点即为目标ROI所在的位置，通过响应图可以反映目标ROI跟踪的置信度水平。1) After the tracking calculation module, the tracking algorithm uses a correlation-filter algorithm such as KCF (Kernel Correlation Filter) or ECO (Efficient Convolution Operators). For each frame it outputs a w×h floating-point two-dimensional response map F[w][h], denoted F_{w,h}, normalized to the range 0 to 1.0. The response map reflects the likely distribution of the target ROI's position in the frame; the maximum point is where the target ROI is located, and the response map therefore reflects the confidence level of the target-ROI tracking.
2)置信度分析:2) Confidence analysis:
(a)依据响应图计算最大值Fmax作为当前帧的跟踪置信度;(a) Calculate the maximum value Fmax according to the response graph as the tracking confidence of the current frame;
Confidence=max(F[w][h]);Confidence = max (F [w] [h]);
(b)平均相关峰能量指标为average peak-to-correlation energy(APCE)，其中(b) The average peak-to-correlation energy indicator is APCE, where
APCE = |F_max - F_min|^2 / mean( Σ_{w,h} (F_{w,h} - F_min)^2 )
其中，F_max为max(F[w][h])，即为F[w][h]的最大取值；F_min为min(F[w][h])，即为F[w][h]的最小取值；Σ_{w,h}(F_{w,h}-F_min)^2表示遍历F_{w,h}的每一个值，与最小值相减再做平方运算，最终求和。该指标可用于表征：当计算出的该指标的值与历史平均值相比急剧下降时，就代表当前帧的目标ROI的位置和大小不可信，例如目标ROI被遮挡或者丢失等。Here F_max is max(F[w][h]), i.e. the maximum value of F[w][h]; F_min is min(F[w][h]), i.e. its minimum value; Σ_{w,h}(F_{w,h}-F_min)^2 means traversing every value of F_{w,h}, subtracting the minimum, squaring, and finally summing. This indicator can be used as follows: when its computed value drops sharply compared with the historical average, the position and size of the target ROI in the current frame are unreliable, for example because the target ROI is occluded or lost.
(c)计算每一次跟踪过程中的平均置信度AverageConfidence和平均相关峰能量AverageApce；假设当前帧为第N帧，则当前帧的AverageConfidence和AverageApce为：(c) Calculate the average confidence AverageConfidence and the average correlation-peak energy AverageApce during each tracking process; assuming the current frame is the Nth frame, the AverageConfidence and AverageApce of the current frame are:
AverageConfidence_N = (1/N) Σ_{i=1}^{N} Confidence_i

AverageApce_N = (1/N) Σ_{i=1}^{N} APCE_i
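The per-frame confidence, the APCE indicator, and their running averages described above can be sketched as follows. This is an illustrative sketch, not the patent's implementation; the function names are assumptions.

```python
def confidence(F):
    # Confidence = max(F[w][h]); F is already normalized to [0, 1.0]
    return max(max(row) for row in F)

def apce(F):
    # APCE = |F_max - F_min|^2 / mean_{w,h}((F_{w,h} - F_min)^2)
    flat = [v for row in F for v in row]
    f_max, f_min = max(flat), min(flat)
    denom = sum((v - f_min) ** 2 for v in flat) / len(flat)
    return (f_max - f_min) ** 2 / denom if denom else 0.0

def running_average(values):
    # AverageConfidence / AverageApce over the first N frames
    return sum(values) / len(values)
```

A sharp drop of `apce(F)` relative to its running average would then flag an occluded or lost target, as the text describes.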
3)目标ROI特征信息更新策略:3) Target ROI feature information update strategy:
请参见图9,图9为本发明实施例所提供的一种目标ROI跟踪示意图,其中,如图9的a部分所示,目标ROI初始位置在1,在画面中从1到6的运动过程中,目标跟踪算法模块实时输出目标在每一帧中的位置和大小。这时候跟踪置信度较高,需要实时更新目标ROI的特征信息。Please refer to FIG. 9, which is a schematic diagram of target ROI tracking provided by an embodiment of the present invention. As shown in part a of FIG. 9, the initial position of the target ROI is 1, and the movement process from 1 to 6 in the picture The target tracking algorithm module outputs the position and size of the target in each frame in real time. At this time, the tracking confidence is high, and the feature information of the target ROI needs to be updated in real time.
如图9的b部分所示，目标ROI在2和4的位置发生遮挡丢失时，算法输出置信度较低，不满足特征信息更新条件，这时候不能更新目标ROI的特征信息，否则特征信息会学习到背景或其它干扰物的特征，因此需要等到目标ROI重新出现时才能继续更新特征信息。As shown in part b of Fig. 9, when the target ROI is occluded or lost at positions 2 and 4, the confidence output by the algorithm is low and the feature-update condition is not met. The feature information of the target ROI must not be updated at this point, otherwise the model would learn the features of the background or of other distractors; updating can only resume once the target ROI reappears.
本发明实施例中，处理器101根据第一图像所确定的目标ROI作为初始ROI输入，通过特征提取、特征选择、跟踪计算后，实时计算目标ROI在后续每一帧图像(包括第一图像)中的位置和大小。其中，判断特征信息是否更新的依据如下：In this embodiment of the present invention, the processor 101 uses the target ROI determined from the first image as the initial ROI input and, after feature extraction, feature selection, and tracking calculation, computes in real time the position and size of the target ROI in each subsequent frame (including the first image). The basis for deciding whether to update the feature information is as follows:
计算当前帧的跟踪置信度为:mConfidence;Calculate the tracking confidence of the current frame as: mConfidence;
计算历史平均置信度:mHistoryAverageConfidence;Calculate historical average confidence: mHistoryAverageConfidence;
计算当前帧的相关峰能量:mApce;Calculate the correlation peak energy of the current frame: mApce;
计算历史平均相关峰能量:mHistoryAverageApce;Calculate the historical average correlation peak energy: mHistoryAverageApce;
①如果满足以下条件公式,则为满足特征信息更新条件,更新特征信息:① If the following conditional formula is satisfied, the feature information is updated in order to satisfy the feature information update condition:
mConfidence>0.7×mHistoryAverageConfidence且mApce>0.45×mHistoryAverageApce；mConfidence > 0.7 × mHistoryAverageConfidence and mApce > 0.45 × mHistoryAverageApce;
②如果不满足上述条件公式,且mConfidence>0.2,则为满足目标ROI特征信息不更新条件,即当前图像帧的特征信息不会参与到目标ROI特征信息的更新,以优化跟踪系统,避免目标ROI跟踪漂移;② If the above conditional formula is not satisfied, and mConfidence> 0.2, then the target ROI feature information is not updated, that is, the feature information of the current image frame will not participate in the update of the target ROI feature information to optimize the tracking system and avoid the target ROI Tracking drift
③如果mConfidence<0.2，并且处理器101输出新的ROI时(例如当处理器101每10帧输出一次新的目标ROI)，则此时可以触发处理器101重新确定目标ROI(包括NPU重新获取第一ROI集合以及CPU重新获取第二ROI集合)，也即是重新完成跟踪的初始化更新。③ If mConfidence < 0.2 and the processor 101 outputs a new ROI (for example, when the processor 101 outputs a new target ROI every 10 frames), the processor 101 can be triggered to re-determine the target ROI (including the NPU reacquiring the first ROI set and the CPU reacquiring the second ROI set), i.e. the tracking initialization is performed again.
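The three-way decision in ①-③ above can be sketched as one function. The thresholds (0.7, 0.45, 0.2) are those stated in the text; the function and return labels are illustrative assumptions, not the patent's API.

```python
def update_decision(m_conf, m_apce, hist_avg_conf, hist_avg_apce):
    # Condition (1): both confidence and APCE are high enough -> update features
    if m_conf > 0.7 * hist_avg_conf and m_apce > 0.45 * hist_avg_apce:
        return "update"
    # Condition (2): update condition failed but confidence is acceptable
    # -> keep the last feature model, skip this frame's update
    if m_conf > 0.2:
        return "keep"
    # Condition (3): low confidence -> re-determine the target ROI (reinitialize)
    return "reinitialize"
```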
4)实时目标信息输出:4) Real-time target information output:
通过跟踪算法模块后实时输出目标ROI的位置信息和大小信息，如下图的主体目标，对位置做约束处理：绿框为目标静止时的有效范围，这时候输出给AF算法做稳定对焦；红色虚线框为目标运动时的有效范围，这时候实时输出给AF算法做运动追焦。After the tracking algorithm module, the position and size information of the target ROI is output in real time, as for the subject target in the figure below, and the position is constrained: the green box is the effective range when the target is stationary, in which case the output is fed to the AF algorithm for stable focusing; the red dotted box is the effective range when the target is moving, in which case the output is fed in real time to the AF algorithm for motion focus tracking.
请参见图10,图10为本发明实施例提供的一种目标ROI的特征信息更新示意图。假设第一预设时间段内图像信号处理器103生成n帧图像,图10中以n=10为例,其中第1帧则可以对应本申请中的第一图像,第二图像则可以为后续第2、3、4……10帧图像中的任意一帧。具体地,Please refer to FIG. 10, which is a schematic diagram of updating feature information of a target ROI according to an embodiment of the present invention. Assume that the image signal processor 103 generates n frames of images in the first preset time period. In FIG. 10, n = 10 is taken as an example, where the first frame may correspond to the first image in this application, and the second image may be a subsequent image. Any one of the 2nd, 3rd, 4th ... 10th image. specifically,
图10中，第1帧(第一图像)经过处理器101确定第一ROI集合和第二ROI集合，再确定目标ROI之后，提取该目标ROI的特征信息，即为图10中的特征信息A，也是作为目标ROI的初始识别特征信息；当图像信号处理器生成第2帧图像时，先获取该第2帧图像的特征信息B；其中，获取特征信息B的方式可以是，基于目标ROI在第1帧图像中的位置和大小，提取该位置和大小在第2帧图像中所对应区域的特征信息，即为特征信息B，后续图像帧提取对应帧的目标ROI特征信息的原理相同，不再赘述。然后处理器101将特征信息B与特征信息A进行关联比对，从而确定第1帧图像中所确定的目标ROI在第2帧图像中的位置和大小；与此同时根据特征信息A和特征信息B确定第2帧是否满足特征信息更新条件。In FIG. 10, for the 1st frame (the first image), the processor 101 determines the first ROI set and the second ROI set, then determines the target ROI and extracts its feature information, namely feature information A in FIG. 10, which also serves as the initial identifying feature information of the target ROI. When the image signal processor generates the 2nd frame, the feature information B of that frame is first obtained; one way to obtain it is to take the position and size of the target ROI in the 1st frame and extract the feature information of the corresponding region in the 2nd frame, which is feature information B. Subsequent frames extract the target-ROI feature information of their own frame on the same principle, which is not repeated here. The processor 101 then correlates feature information B with feature information A to determine the position and size, in the 2nd frame, of the target ROI determined in the 1st frame, and at the same time determines from feature information A and feature information B whether the 2nd frame meets the feature-update condition.
If the feature-update condition is met, the feature information is updated using the formula A' = (k1 × A + k2 × B). If the update condition is not met but the initialization-restart condition is not met either, the most recently updated feature information continues to be used as the comparison model; likewise, when the initialization-restart condition is met but the specified time point (i.e. the time at which the processor 101 outputs a new target ROI) has not yet been reached, the most recently updated feature information also continues to be used as the comparison model. However, if the initialization-restart condition is met and the specified time point is reached, the target ROI newly output by the processor 101 can be used to start a new round of target-ROI tracking calculation. Optionally, k1 = 0.988 and k2 = 0.012 in the feature-update formula. This application does not specifically limit the condition for updating the feature information or the update formula.
例如，图10中，第4帧图像中确定目标ROI的特征信息D，经过将第3帧更新得到的特征信息A”与特征信息D进行关联计算之后，判断出当前第4帧图像不满足特征信息更新条件(例如，此时目标ROI在第4帧被遮挡或漂移较大)。因此，第4帧的特征信息D不参与后续的特征信息更新，需要沿用第3帧所更新的特征信息，也即是在第5帧确定了特征信息E之后，仍然与第3帧更新的特征信息进行关联计算。进一步地，假设特征信息E与第3帧更新的特征信息A”进行关联计算之后，判断出满足初始化重启条件，则需要进一步判断处理器101是否输出新的目标ROI(也可以认为判断是否达到第一预设时间段)，直到处理器101输出新的目标ROI，再进行初始化。例如图10中，需要等到第11帧再重新进行目标ROI的确定，也相当于初始化了特征信息。以下为图10中每一帧图像的特征信息更新的流程：For example, in FIG. 10, the feature information D of the target ROI is determined in the 4th frame; after correlating the feature information A'' obtained from the 3rd-frame update with feature information D, it is determined that the current 4th frame does not meet the feature-update condition (for example, the target ROI is occluded or drifts significantly in the 4th frame). The feature information D of the 4th frame therefore takes no part in subsequent feature updates, and the feature information updated in the 3rd frame continues to be used; that is, after the feature information E is determined in the 5th frame, it is still correlated against the feature information updated in the 3rd frame. Further, assuming that correlating feature information E with the 3rd-frame feature information A'' shows that the initialization-restart condition is met, it is then necessary to judge whether the processor 101 outputs a new target ROI (which can also be regarded as judging whether the first preset time period has been reached), and initialization takes place only once the processor 101 outputs a new target ROI. For example, in FIG. 10, the target ROI is re-determined only at the 11th frame, which is also equivalent to initializing the feature information. The following is the feature-update flow for each frame in FIG. 10:
第1帧图像:特征信息AFirst frame image: Feature information A
第2帧图像:特征信息B→更新→特征信息A'=(k1×A+k2×B)Image of the second frame: feature information B → update → feature information A '= (k1 × A + k2 × B)
第3帧图像:特征信息C→更新→特征信息A”=(k1×A'+k2×C)Image of the third frame: feature information C → update → feature information A ”= (k1 × A ′ + k2 × C)
第4帧图像:特征信息D→未更新→特征信息A”=(k1×A'+k2×C)Image of the fourth frame: feature information D → not updated → feature information A ”= (k1 × A ′ + k2 × C)
第5帧图像:特征信息E→未更新(满足初始化重启条件)→特征信息A”=(k1×A'+k2×C)5th frame image: feature information E → not updated (initial restart conditions are met) → feature information A ”= (k1 × A '+ k2 × C)
第6帧图像:…… Frame 6 image: ...
第7帧图像:…… Frame 7 image: ...
第8帧图像:……Frame 8 image: ...
第9帧图像:……Frame 9 image: ...
第10帧图像:……Frame 10 image: ...
第11帧图像:重新计算特征信息AFrame 11 image: recalculate feature information A
……...
可以理解的是,针对图像信号处理器103生成的任意一帧图像均可以基于上述发明实施例进行跟踪对焦,并且进行特征信息的更新,在此不再穷举。It can be understood that, for any one frame image generated by the image signal processor 103, tracking and focusing can be performed based on the embodiment of the invention described above, and feature information is updated, which is not exhaustive here.
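The frame-by-frame model update listed above can be sketched as a simple blend: whenever a frame passes the update check, the model becomes A' = k1×A + k2×B with k1 = 0.988, k2 = 0.012; otherwise the last model is kept. A scalar stands in for the feature vector here, and the function names are hypothetical.

```python
K1, K2 = 0.988, 0.012  # blend weights from the text

def track_features(initial, frames, passes_check):
    # initial: feature info A from the 1st frame
    # frames: per-frame feature info (B, C, D, ...) as scalars
    # passes_check(feat, model): True if the frame meets the update condition
    model = initial
    for feat in frames:
        if passes_check(feat, model):
            model = K1 * model + K2 * feat  # A' = k1*A + k2*B
        # otherwise the most recent model is carried forward unchanged
    return model
```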
在一种可能的实现方式中，处理器101进入目标ROI跟踪对焦流程时，依据实时的目标ROI信息，判断当前目标ROI的运动状态，当目标处于静止状态时，进入稳定的目标ROI对焦，当目标ROI处于运动状态时，进入目标ROI跟踪对焦。例如，对于AF算法而言，使用目标检测算法+运动检测算法+Tracking算法可以解决跟踪目标运动时没有ROI信息以及目标静止后ROI丢失这两大问题。在利用Tracking算法实时处理每帧图像输出ROI信息的情况下，AF算法可以直接根据ROI窗进行运动追焦，而当运动目标静止时，可以进行稳定对焦，从而解决目标不在中心时的焦点选择问题。In a possible implementation, when the processor 101 enters the target-ROI tracking-and-focusing flow, it judges the motion state of the current target ROI from the real-time target-ROI information: when the target is stationary it enters stable target-ROI focusing, and when the target ROI is moving it enters target-ROI tracking focus. For example, for the AF algorithm, combining a target detection algorithm, a motion detection algorithm, and a tracking algorithm solves two major problems: the absence of ROI information while the target is moving, and the loss of the ROI once the target becomes stationary. With the tracking algorithm processing every frame and outputting ROI information in real time, the AF algorithm can perform motion focus tracking directly from the ROI window, and when the moving target comes to rest it can perform stable focusing, solving the focus-selection problem when the target is not at the center.
基于图1和图3中对对焦装置10的结构描述,图11是本发明实施例提供的一种神经网络处理器硬件结构图,其中,Based on the structural description of the focusing device 10 in FIG. 1 and FIG. 3, FIG. 11 is a hardware structural diagram of a neural network processor according to an embodiment of the present invention.
神经网络处理器NPU 102作为协处理器挂载到CPU(如Host CPU)上,由Host CPU分配任务。NPU的核心部分为运算电路1203,通过控制器1204控制运算电路1203提取存储器中的矩阵数据并进行乘法运算。The neural network processor NPU 102 is mounted on the CPU (such as Host CPU) as a coprocessor, and the Host CPU assigns tasks. The core part of the NPU is an arithmetic circuit 1203. The controller 1204 controls the arithmetic circuit 1203 to extract matrix data in the memory and perform multiplication operations.
在一些实现中,运算电路1203内部包括多个处理单元(Process Engine,PE)。在一些实现中,运算电路1203是二维脉动阵列。运算电路1203还可以是一维脉动阵列或者能够执行例如乘法和加法这样的数学运算的其它电子线路。在一些实现中,运算电路1203是通用的矩阵处理器。In some implementations, the arithmetic circuit 1203 includes multiple processing units (Process Engines, PEs). In some implementations, the arithmetic circuit 1203 is a two-dimensional pulsating array. The arithmetic circuit 1203 may also be a one-dimensional pulsation array or other electronic circuits capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 1203 is a general-purpose matrix processor.
举例来说,假设有输入矩阵A,权重矩阵B,输出矩阵C。运算电路从权重存储器1202中取矩阵B相应的数据,并缓存在运算电路中每一个PE上。运算电路从输入存储器1201中取矩阵A数据与矩阵B进行矩阵运算,得到的矩阵的部分结果或最终结果,保存在累加器1208 accumulator中。For example, suppose there are an input matrix A, a weight matrix B, and an output matrix C. The operation circuit takes the data corresponding to the matrix B from the weight memory 1202, and buffers it on each PE in the operation circuit. The arithmetic circuit takes matrix A data from the input memory 1201 and performs matrix operations on the matrix B. Partial or final results of the obtained matrix are stored in the accumulator 1208 accumulator.
统一存储器1206用于存放输入数据以及输出数据。权重数据通过存储单元访问控制器12012(Direct Memory Access Controller，DMAC)直接被搬运到权重存储器1202中。输入数据也通过DMAC被搬运到统一存储器1206中。The unified memory 1206 stores input data and output data. Weight data is moved into the weight memory 1202 directly by the memory-unit access controller 12012 (Direct Memory Access Controller, DMAC). Input data is likewise moved into the unified memory 1206 by the DMAC.
BIU为Bus Interface Unit即,总线接口单元1210,用于AXI总线与DMAC和取指存储器1209 Instruction Fetch Buffer的交互。BIU stands for Bus Interface Unit, that is, the bus interface unit 1210, which is used for the interaction between the AXI bus and the DMAC and the instruction fetch memory 1209.
总线接口单元1210(Bus Interface Unit,简称BIU),用于取指存储器1209从外部存储器获取指令,还用于存储单元访问控制器12012从外部存储器获取输入矩阵A或者权重矩阵B的原数据。The bus interface unit 1210 (Bus Interface Unit, referred to as BIU) is used to fetch the instruction memory 1209 to obtain instructions from external memory, and is also used for the storage unit access controller 12012 to obtain the original data of the input matrix A or weight matrix B from the external memory.
DMAC主要用于将外部存储器DDR中的输入数据搬运到统一存储器1206或将权重数据搬运到权重存储器1202中或将输入数据数据搬运到输入存储器1201中。The DMAC is mainly used to transfer the input data in the external memory DDR to the unified memory 1206 or the weight data to the weight memory 1202 or the input data data to the input memory 1201.
向量计算单元1207包括多个运算处理单元，在需要的情况下，对运算电路的输出做进一步处理，如向量乘、向量加、指数运算、对数运算、大小比较等等。主要用于神经网络中非卷积/FC层网络计算，如Pooling(池化)、Batch Normalization(批归一化)、Local Response Normalization(局部响应归一化)等。The vector calculation unit 1207 includes multiple arithmetic processing units and, when needed, further processes the output of the arithmetic circuit, e.g. vector multiplication, vector addition, exponentiation, logarithm, and magnitude comparison. It is mainly used for non-convolution/FC-layer computation in neural networks, such as Pooling, Batch Normalization, and Local Response Normalization.
在一些实现中，向量计算单元1207能将经处理的输出的向量存储到统一存储器1206。例如，向量计算单元1207可以将非线性函数应用到运算电路1203的输出，例如累加值的向量，用以生成激活值。在一些实现中，向量计算单元1207生成归一化的值、合并值，或二者均有。在一些实现中，处理过的输出的向量能够用作到运算电路1203的激活输入，例如用于在神经网络中的后续层中的使用。In some implementations, the vector calculation unit 1207 can store the processed output vector into the unified memory 1206. For example, the vector calculation unit 1207 may apply a non-linear function to the output of the arithmetic circuit 1203, such as a vector of accumulated values, to generate activation values. In some implementations, the vector calculation unit 1207 generates normalized values, merged values, or both. In some implementations, the processed output vector can be used as an activation input to the arithmetic circuit 1203, for example for use in subsequent layers of the neural network.
控制器1204连接的取指存储器(instruction fetch buffer)1209,用于存储控制器1204使用的指令;An instruction fetch memory 1209 connected to the controller 1204 is used to store instructions used by the controller 1204;
统一存储器1206,输入存储器1201,权重存储器1202以及取指存储器1209均为On-Chip存储器。外部存储器私有于该NPU硬件架构。The unified memory 1206, the input memory 1201, the weight memory 1202, and the fetch memory 1209 are all On-Chip memories. External memory is private to the NPU hardware architecture.
可以理解的是，图1和图3所述的关于NPU获取第一ROI集合，以及目标ROI的CNN特征提取等相关功能，均由上述NPU中相关的功能单元进行实现，在此不再赘述。It can be understood that the related functions described in FIG. 1 and FIG. 3, such as the NPU obtaining the first ROI set and the CNN feature extraction of the target ROI, are implemented by the corresponding functional units in the NPU described above, and are not repeated here.
请参见图12,图12是本发明实施例提供的一种对焦方法的流程示意图,该对焦方法,适用于上述图1和图3中的任意一种对焦装置以及包含所述对焦装置的设备。该方法可以包括以下步骤S201-步骤S205。Please refer to FIG. 12. FIG. 12 is a schematic flowchart of a focusing method according to an embodiment of the present invention. The focusing method is applicable to any one of the focusing devices in FIG. 1 and FIG. 3 and a device including the focusing device. The method may include the following steps S201-S205.
步骤S201:确定第一感兴趣区域ROI集合和第二ROI集合,所述第一ROI集合为从图像信号处理器生成的第一图像中获取的ROI集合,所述第一ROI集合包括一个或者多个第一ROI,每个第一ROI中包括一个拍摄对象;所述第二ROI集合为从所述第一图像中获取的ROI集合,所述第二ROI集合包括一个或多个第二ROI,每个第二ROI为运动区域;Step S201: Determine a first ROI set and a second ROI set, where the first ROI set is a ROI set obtained from a first image generated by an image signal processor, and the first ROI set includes one or more First ROIs, each of which includes a photographic subject; the second ROI set is a ROI set obtained from the first image, and the second ROI set includes one or more second ROIs, Each second ROI is a motion area;
步骤S202:基于所述第一ROI集合和所述第二ROI集合确定所述第一图像中的目标ROI;Step S202: determine a target ROI in the first image based on the first ROI set and the second ROI set;
在一种可能的实现方式中,所述基于所述第一ROI集合和所述第二ROI集合确定所述第一图像中的目标ROI,包括:In a possible implementation manner, the determining a target ROI in the first image based on the first ROI set and the second ROI set includes:
从所述第一ROI集合中的一个或者多个第一ROI中确定有效第一ROI,所述有效第一ROI在所述第一图像的预设区域内;Determining a valid first ROI from one or more first ROIs in the first ROI set, where the valid first ROI is within a preset area of the first image;
从所述第二ROI集合中的一个或者多个第二ROI中确定有效第二ROI,所述有效第二ROI在所述第一图像的预设区域内;Determining an effective second ROI from one or more second ROIs in the second ROI set, where the effective second ROI is within a preset area of the first image;
在所述有效第一ROI与所述有效第二ROI的交并比IoU大于或者等于预设阈值的情况下,将所述有效第一ROI确定为目标ROI。In a case where the intersection ratio of the effective first ROI and the effective second ROI is greater than or equal to a preset threshold, the effective first ROI is determined as the target ROI.
在一种可能的实现方式中,所述方法还包括:In a possible implementation manner, the method further includes:
在所述有效第一ROI与所述有效第二ROI的交并比IoU小于预设阈值的情况下,将所述有效第二ROI与所述有效第一ROI中距离所述第一图像中心点更近的ROI确定为目标ROI。In a case where the intersection ratio IoU of the effective first ROI and the effective second ROI is less than a preset threshold, the effective second ROI and the effective first ROI are distanced from the first image center point. The more recent ROI is determined as the target ROI.
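The two selection rules above (IoU at or above the threshold → take the effective first ROI; otherwise → take whichever effective ROI is closer to the image center) can be sketched as follows. ROIs are assumed to be (x, y, w, h) boxes and the threshold value is an illustrative placeholder; none of these names come from the patent.

```python
def iou(a, b):
    # Intersection-over-union of two (x, y, w, h) boxes
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def select_target_roi(roi1, roi2, image_center, threshold=0.5):
    # roi1: effective first ROI (subject box); roi2: effective second ROI (motion box)
    if iou(roi1, roi2) >= threshold:
        return roi1
    def dist2(roi):
        cx, cy = roi[0] + roi[2] / 2, roi[1] + roi[3] / 2
        return (cx - image_center[0]) ** 2 + (cy - image_center[1]) ** 2
    return min((roi1, roi2), key=dist2)  # closer to the image center wins
```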
在一种可能的实现方式中,所述有效第一ROI在所述第一图像的预设区域内的一个或者多个第一ROI中具有最高评估分值;和/或所述有效第二ROI在所述第一图像的预设区域内的一个或者多个第二ROI中具有最高评估分值;其中,每个ROI的评估分值满足如下至少一项:与该ROI的面积成正比,与该ROI距所述第一图像的中心点的距离成反比,与该ROI所属的物体类别的优先级成正比。In a possible implementation manner, the effective first ROI has a highest evaluation score in one or more first ROIs within a preset area of the first image; and / or the effective second ROI The one or more second ROIs within the preset region of the first image have the highest evaluation score; wherein the evaluation score of each ROI satisfies at least one of the following: proportional to the area of the ROI, and The distance of the ROI from the center point of the first image is inversely proportional to the priority of the object category to which the ROI belongs.
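The text only states that the evaluation score is proportional to the ROI's area and class priority and inversely proportional to its distance from the image center; the concrete weighting below is therefore an assumption, shown only to make the scoring rule concrete.

```python
def roi_score(roi, image_center, class_priority):
    # roi: (x, y, w, h); class_priority: larger = higher-priority object class
    x, y, w, h = roi
    area = w * h
    cx, cy = x + w / 2, y + h / 2
    dist = ((cx - image_center[0]) ** 2 + (cy - image_center[1]) ** 2) ** 0.5
    # proportional to area and priority, inversely proportional to center distance
    return area * class_priority / (1.0 + dist)
```

The effective first (or second) ROI would then be the `argmax` of this score over the candidate ROIs inside the preset region.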
步骤S203:确定所述目标ROI的特征信息;Step S203: determine the characteristic information of the target ROI;
在一种可能的实现方式中,所述特征信息包括方向梯度hog信息、颜色lab信息、卷积神经网络CNN信息中的一项或者多项。In a possible implementation manner, the feature information includes one or more of directional gradient hog information, color lab information, and convolutional neural network CNN information.
在一种可能的实现方式中,还基于所述目标ROI在历史图像中的位置和大小所对应的特征信息更新所述目标ROI的特征信息。In a possible implementation manner, the feature information of the target ROI is also updated based on the feature information corresponding to the position and size of the target ROI in the historical image.
在一种可能的实现方式中,所述目标ROI的特征信息是根据所述目标ROI对应的第一图像的特征信息和至少一个第三图像的特征信息确定的,所述至少一个第三图像在时域上位于第一图像和第二图像之间。In a possible implementation manner, the characteristic information of the target ROI is determined according to the characteristic information of the first image corresponding to the target ROI and the characteristic information of at least one third image. The at least one third image is Time domain is located between the first image and the second image.
步骤S204:根据所述目标ROI的特征信息,识别所述目标ROI在所述图像信号处理器生成的第二图像中的位置信息和大小信息,所述第一图像在时域上位于所述第二图像之前;Step S204: Identify the position information and size information of the target ROI in the second image generated by the image signal processor according to the characteristic information of the target ROI, and the first image is located in the third region in the time domain. Before two images
步骤S205:根据所述位置信息和大小信息进行对焦。Step S205: Focus according to the position information and size information.
在一种可能的实现方式中,在第一预设时间段后,重新计算所述目标ROI;或者In a possible implementation manner, after the first preset time period, recalculate the target ROI; or
在一种可能的实现方式中,当所述目标ROI的跟踪置信度小于置信度阈值的情况下,重新计算所述目标ROI,其中,所述跟踪置信度用于指示所述目标ROI的跟踪精确度,所述跟踪置信度与跟踪精确度成正比。In a possible implementation manner, when the tracking confidence of the target ROI is less than a confidence threshold, the target ROI is recalculated, where the tracking confidence is used to indicate that the tracking of the target ROI is accurate The tracking confidence is directly proportional to the tracking accuracy.
需要说明的是，本发明实施例中所描述的对焦方法中的具体流程，可参见上述图1-图11中所述的发明实施例中的相关描述，此处不再赘述。It should be noted that, for the specific flow of the focusing method described in this embodiment of the present invention, reference may be made to the related descriptions in the embodiments described above in FIG. 1 to FIG. 11, which are not repeated here.
请参见图13,图13是本发明实施例提供的又一种对焦装置的结构示意图,该对焦装置30可包括第一处理单元301、第二处理单元302、第三处理单元303、识别单元304和对焦单元305,其中,Please refer to FIG. 13. FIG. 13 is a schematic structural diagram of another focusing device according to an embodiment of the present invention. The focusing device 30 may include a first processing unit 301, a second processing unit 302, a third processing unit 303, and a recognition unit 304. And focusing unit 305,
第一处理单元301,用于确定第一感兴趣区域ROI集合和第二ROI集合,所述第一ROI集合为从图像信号处理器生成的第一图像中获取的ROI集合,所述第一ROI集合包括一个或者多个第一ROI,每个第一ROI中包括一个拍摄对象;所述第二ROI集合为从所述第一图像中获取的ROI集合,所述第二ROI集合包括一个或多个第二ROI,每个第二ROI为运动区域;The first processing unit 301 is configured to determine a first ROI set and a second ROI set, where the first ROI set is a ROI set obtained from a first image generated by an image signal processor, and the first ROI The set includes one or more first ROIs, and each first ROI includes a subject; the second ROI set is a ROI set obtained from the first image, and the second ROI set includes one or more Second ROIs, each second ROI is a motion area;
第二处理单元302,用于基于所述第一ROI集合和所述第二ROI集合确定所述第一图像中的目标ROI;A second processing unit 302, configured to determine a target ROI in the first image based on the first ROI set and the second ROI set;
第三处理单元303,用于确定所述目标ROI的特征信息;A third processing unit 303, configured to determine feature information of the target ROI;
识别单元304,用于根据所述目标ROI的特征信息,识别所述目标ROI在所述图像信号处理器生成的第二图像中的位置信息和大小信息,所述第一图像在时域上位于所述第二图像之前;A recognition unit 304, configured to identify position information and size information of the target ROI in a second image generated by the image signal processor according to the characteristic information of the target ROI, where the first image is located in the time domain Before the second image;
对焦单元305,用于根据所述位置信息和大小信息进行对焦。The focusing unit 305 is configured to perform focusing according to the position information and the size information.
在一种可能的实现方式中,第二处理单元302,具体用于:In a possible implementation manner, the second processing unit 302 is specifically configured to:
从所述第一ROI集合中的一个或者多个第一ROI中确定有效第一ROI,所述有效第一ROI在所述第一图像的预设区域内;Determining a valid first ROI from one or more first ROIs in the first ROI set, where the valid first ROI is within a preset area of the first image;
从所述第二ROI集合中的一个或者多个第二ROI中确定有效第二ROI,所述有效第二ROI在所述第一图像的预设区域内;Determining an effective second ROI from one or more second ROIs in the second ROI set, where the effective second ROI is within a preset area of the first image;
在所述有效第一ROI与所述有效第二ROI的交并比IoU大于或者等于预设阈值的情况下,将所述有效第一ROI确定为目标ROI。In a case where the intersection ratio of the effective first ROI and the effective second ROI is greater than or equal to a preset threshold, the effective first ROI is determined as the target ROI.
在一种可能的实现方式中,第二处理单元302还用于:In a possible implementation manner, the second processing unit 302 is further configured to:
在所述有效第一ROI与所述有效第二ROI的交并比IoU小于预设阈值的情况下,将所述有效第二ROI与所述有效第一ROI中距离所述第一图像中心点更近的ROI确定为目标ROI。In a case where the intersection ratio IoU of the effective first ROI and the effective second ROI is less than a preset threshold, the effective second ROI and the effective first ROI are distanced from the first image center point. The more recent ROI is determined as the target ROI.
在一种可能的实现方式中,所述有效第一ROI在所述第一图像的预设区域内的一个或者多个第一ROI中具有最高评估分值;和/或所述有效第二ROI在所述第一图像的预设区域内的一个或者多个第二ROI中具有最高评估分值;其中,每个ROI的评估分值满足如下至少一项:与该ROI的面积成正比,与该ROI距所述第一图像的中心点的距离成反比,与该ROI所属的物体类别的优先级成正比。In a possible implementation manner, the effective first ROI has a highest evaluation score in one or more first ROIs within a preset area of the first image; and / or the effective second ROI The one or more second ROIs within the preset region of the first image have the highest evaluation score; wherein the evaluation score of each ROI satisfies at least one of the following: proportional to the area of the ROI, and The distance of the ROI from the center point of the first image is inversely proportional to the priority of the object category to which the ROI belongs.
In a possible implementation, the third processing unit 303 is further configured to update the feature information of the target ROI based on feature information corresponding to the position and size of the target ROI in historical images.
In a possible implementation, the feature information of the target ROI is determined according to feature information of the first image corresponding to the target ROI and feature information of at least one third image, the at least one third image being located between the first image and the second image in the time domain.
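A common way to carry the target ROI's feature template from the first image through the intermediate (third) images is a running, exponentially weighted update, as used in correlation-filter style trackers. This is a sketch of that general technique, not the patented method; the learning rate value is an assumption, since the disclosure does not fix one.

```python
import numpy as np

def update_template(template, new_features, learning_rate=0.02):
    """Blend the stored feature template with features from a newer frame.

    With a small learning rate the template changes slowly, so features from
    earlier frames still dominate after a few intermediate updates.
    """
    return (1.0 - learning_rate) * template + learning_rate * new_features

# Running the update over several intermediate frames:
template = np.zeros(4)
for frame_features in [np.ones(4), np.ones(4), np.ones(4)]:
    template = update_template(template, frame_features)
```

The slow blend gives the tracker some robustness to momentary occlusion or blur in a single intermediate frame while still adapting to gradual appearance change.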
In a possible implementation, the apparatus further includes: a first initialization unit 306, configured to recalculate the target ROI after a first preset time period; or a second initialization unit 307, configured to recalculate the target ROI when the tracking confidence of the target ROI is less than a confidence threshold, the tracking confidence indicating the tracking accuracy of the target ROI and being proportional to the tracking accuracy.
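The two re-initialization triggers above (a periodic timer and a confidence floor) can be combined in a small guard such as the following sketch. The class name, the five-second period, and the 0.3 confidence floor are illustrative assumptions, not values from the disclosure.

```python
import time

class RoiTracker:
    def __init__(self, redetect_period_s=5.0, confidence_floor=0.3):
        self.redetect_period_s = redetect_period_s
        self.confidence_floor = confidence_floor
        self.last_detect_time = time.monotonic()

    def needs_redetection(self, tracking_confidence, now=None):
        """True when the target ROI should be recomputed from scratch."""
        now = time.monotonic() if now is None else now
        # Trigger 1: the first preset time period has elapsed.
        expired = (now - self.last_detect_time) >= self.redetect_period_s
        # Trigger 2: tracking confidence fell below the confidence threshold.
        unreliable = tracking_confidence < self.confidence_floor
        return expired or unreliable

    def mark_redetected(self, now=None):
        self.last_detect_time = time.monotonic() if now is None else now
```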
In a possible implementation, the feature information includes one or more of histogram of oriented gradients (HOG) information, Lab color information, and convolutional neural network (CNN) information.
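As a toy illustration of HOG-style features, the sketch below builds a coarse orientation histogram of image gradients with plain NumPy. Real implementations are considerably more involved (cell/block normalization for a full HOG descriptor, CIELAB conversion for Lab features, learned embeddings for CNN features); the nine-bin count and the whole-patch histogram are simplifying assumptions.

```python
import numpy as np

def orientation_histogram(gray, n_bins=9):
    """Coarse HOG-like feature: histogram of gradient orientations,
    weighted by gradient magnitude, over the whole patch."""
    gray = gray.astype(np.float64)
    gy, gx = np.gradient(gray)
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees, as in the classic HOG setup.
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bins = np.minimum((orientation / (180.0 / n_bins)).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=n_bins)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist
```

A patch with a purely horizontal intensity ramp, for instance, puts nearly all of its weight into the first orientation bin.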
It should be noted that, for the functions of the relevant units in the focusing apparatus 30 described in this embodiment of the present invention, reference may be made to the related apparatus embodiments described in FIG. 1 to FIG. 11 and the related descriptions in the method embodiment described in FIG. 12; details are not repeated here.
Each unit in FIG. 13 may be implemented in software, hardware, or a combination thereof. A unit implemented in hardware may include circuits, such as algorithm circuits or analog circuits. A unit implemented in software may include program instructions, regarded as a software product, stored in a memory, and runnable by a processor to implement the related functions; for details, refer to the foregoing description.
An embodiment of the present invention further provides a computer storage medium that may store a program; when the program is executed, some or all of the steps described in any of the foregoing method embodiments are performed.
An embodiment of the present invention further provides a computer program including instructions; when the computer program is executed by a computer, the computer can perform some or all of the steps of any of the foregoing focusing methods.
In the foregoing embodiments, the description of each embodiment has its own emphasis. For a part that is not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
It should be noted that, for brevity, the foregoing method embodiments are described as a series of action combinations. However, a person skilled in the art should appreciate that this application is not limited by the described order of actions, because according to this application some steps may be performed in other orders or simultaneously. In addition, a person skilled in the art should also appreciate that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by this application.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into the foregoing units is merely a logical function division, and in actual implementation there may be other division manners; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like, and specifically a processor in the computer device) to perform all or some of the steps of the methods in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a read-only memory (ROM), or a random access memory (RAM).
The foregoing embodiments are merely intended to describe the technical solutions of this application, not to limit them. Although this application is described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that modifications may still be made to the technical solutions described in the foregoing embodiments, or equivalent replacements may be made to some technical features thereof; such modifications or replacements do not make the essence of the corresponding technical solutions depart from the spirit and scope of the technical solutions of the embodiments of this application.

Claims (18)

  1. A focusing apparatus, comprising a processor, a neural network processor coupled to the processor, and an image signal processor coupled to the processor; wherein
    the image signal processor is configured to generate a first image;
    the neural network processor is configured to obtain a first region of interest (ROI) set in the first image, the first ROI set comprising one or more first ROIs, each first ROI comprising one photographed object; and
    the processor is configured to:
    obtain a second ROI set in the first image, the second ROI set comprising one or more second ROIs, each second ROI being a motion region;
    determine a target ROI in the first image based on the first ROI set and the second ROI set;
    determine feature information of the target ROI;
    identify, according to the feature information of the target ROI, position information and size information of the target ROI in a second image generated by the image signal processor, the first image preceding the second image in the time domain; and
    perform focusing according to the position information and the size information.
  2. The apparatus according to claim 1, wherein the processor is specifically configured to:
    determine a valid first ROI from the one or more first ROIs in the first ROI set, the valid first ROI lying within a first preset region of the first image;
    determine a valid second ROI from the one or more second ROIs in the second ROI set, the valid second ROI lying within a second preset region of the first image; and
    when the intersection over union (IoU) of the valid first ROI and the valid second ROI is greater than or equal to a preset threshold, determine the valid first ROI as the target ROI.
  3. The apparatus according to claim 2, wherein the processor is further configured to:
    when the IoU of the valid first ROI and the valid second ROI is less than the preset threshold, determine, from the valid second ROI and the valid first ROI, the ROI closer to the center point of the first image as the target ROI.
  4. The apparatus according to claim 2 or 3, wherein the valid first ROI has the highest evaluation score among the one or more first ROIs within the preset region of the first image; and/or the valid second ROI has the highest evaluation score among the one or more second ROIs within the preset region of the first image; wherein the evaluation score of each ROI satisfies at least one of the following: it is proportional to the area of the ROI, inversely proportional to the distance from the ROI to the center point of the first image, and proportional to the priority of the object category to which the ROI belongs.
  5. The apparatus according to any one of claims 1 to 4, wherein the processor is further configured to update the feature information of the target ROI based on feature information corresponding to the position and size of the target ROI in historical images.
  6. The apparatus according to any one of claims 1 to 5, wherein the processor is further configured to:
    recalculate the target ROI after a first preset time period; or
    recalculate the target ROI when the tracking confidence of the target ROI is less than a confidence threshold, wherein the tracking confidence indicates the tracking accuracy of the target ROI and is proportional to the tracking accuracy.
  7. The apparatus according to any one of claims 1 to 6, wherein the feature information comprises one or more of histogram of oriented gradients (HOG) information, Lab color information, and convolutional neural network (CNN) information.
  8. A focusing method, comprising:
    determining a first region of interest (ROI) set and a second ROI set, the first ROI set being a ROI set obtained from a first image generated by an image signal processor, the first ROI set comprising one or more first ROIs, each first ROI comprising one photographed object; the second ROI set being a ROI set obtained from the first image, the second ROI set comprising one or more second ROIs, each second ROI being a motion region;
    determining a target ROI in the first image based on the first ROI set and the second ROI set;
    determining feature information of the target ROI;
    identifying, according to the feature information of the target ROI, position information and size information of the target ROI in a second image generated by the image signal processor, the first image preceding the second image in the time domain; and
    performing focusing according to the position information and the size information.
  9. The method according to claim 8, wherein determining the target ROI in the first image based on the first ROI set and the second ROI set comprises:
    determining a valid first ROI from the one or more first ROIs in the first ROI set, the valid first ROI lying within a first preset region of the first image;
    determining a valid second ROI from the one or more second ROIs in the second ROI set, the valid second ROI lying within a second preset region of the first image; and
    when the intersection over union (IoU) of the valid first ROI and the valid second ROI is greater than or equal to a preset threshold, determining the valid first ROI as the target ROI.
  10. The method according to claim 9, further comprising:
    when the IoU of the valid first ROI and the valid second ROI is less than the preset threshold, determining, from the valid second ROI and the valid first ROI, the ROI closer to the center point of the first image as the target ROI.
  11. The method according to claim 9 or 10, wherein the valid first ROI has the highest evaluation score among the one or more first ROIs within the preset region of the first image; and/or the valid second ROI has the highest evaluation score among the one or more second ROIs within the preset region of the first image; wherein the evaluation score of each ROI satisfies at least one of the following: it is proportional to the area of the ROI, inversely proportional to the distance from the ROI to the center point of the first image, and proportional to the priority of the object category to which the ROI belongs.
  12. The method according to any one of claims 8 to 11, further comprising: updating the feature information of the target ROI based on feature information corresponding to the position and size of the target ROI in historical images.
  13. The method according to any one of claims 8 to 12, further comprising:
    recalculating the target ROI after a first preset time period; or
    recalculating the target ROI when the tracking confidence of the target ROI is less than a confidence threshold, wherein the tracking confidence indicates the tracking accuracy of the target ROI and is proportional to the tracking accuracy.
  14. The method according to any one of claims 8 to 13, wherein the feature information comprises one or more of histogram of oriented gradients (HOG) information, Lab color information, and convolutional neural network (CNN) information.
  15. An electronic device, comprising an image sensor and the focusing apparatus according to any one of claims 1 to 7; wherein
    the image sensor is configured to collect image data; and
    the image signal processor is configured to generate the first image based on the image data.
  16. The electronic device according to claim 15, further comprising a memory configured to store program instructions, the program instructions being executed by the processor.
  17. A computer storage medium, wherein the computer storage medium stores a computer program, and when the computer program is executed by a processor, the method according to any one of claims 8 to 14 is implemented.
  18. A computer program, wherein the computer program comprises instructions which, when the computer program is executed by a computer, cause the computer to perform the method according to any one of claims 8 to 14.
PCT/CN2018/103370 2018-08-30 2018-08-30 Focusing apparatus, method and related device WO2020042126A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201880096896.4A CN112602319B (en) 2018-08-30 2018-08-30 Focusing device, method and related equipment
PCT/CN2018/103370 WO2020042126A1 (en) 2018-08-30 2018-08-30 Focusing apparatus, method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/103370 WO2020042126A1 (en) 2018-08-30 2018-08-30 Focusing apparatus, method and related device

Publications (1)

Publication Number Publication Date
WO2020042126A1 true WO2020042126A1 (en) 2020-03-05

Family

ID=69644764

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/103370 WO2020042126A1 (en) 2018-08-30 2018-08-30 Focusing apparatus, method and related device

Country Status (2)

Country Link
CN (1) CN112602319B (en)
WO (1) WO2020042126A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111626916A (en) * 2020-06-01 2020-09-04 上海商汤智能科技有限公司 Information processing method, device and equipment
CN112132162A (en) * 2020-09-08 2020-12-25 Oppo广东移动通信有限公司 Image processing method, image processor, electronic device, and readable storage medium
CN115735226A (en) * 2020-12-01 2023-03-03 华为技术有限公司 Image processing method and apparatus
CN116055866A (en) * 2022-05-30 2023-05-02 荣耀终端有限公司 Shooting method and related electronic equipment

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN114827481B (en) * 2022-06-29 2022-10-25 深圳思谋信息科技有限公司 Focusing method and device, zooming equipment and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
WO2007077283A1 (en) * 2005-12-30 2007-07-12 Nokia Corporation Method and device for controlling auto focusing of a video camera by tracking a region-of-interest
KR20110007437A (en) * 2009-07-16 2011-01-24 삼성전기주식회사 System for automatically tracking of moving subjects and method of same
CN106060407A (en) * 2016-07-29 2016-10-26 努比亚技术有限公司 Focusing method and terminal
CN108024065A (en) * 2017-12-28 2018-05-11 努比亚技术有限公司 A kind of method of terminal taking, terminal and computer-readable recording medium

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
JP5493789B2 (en) * 2009-12-07 2014-05-14 株式会社リコー Imaging apparatus and imaging method
JP2013191011A (en) * 2012-03-14 2013-09-26 Casio Comput Co Ltd Image processing apparatus, image processing method and program
US9538065B2 (en) * 2014-04-03 2017-01-03 Qualcomm Incorporated System and method for multi-focus imaging
CN106324945A (en) * 2015-06-30 2017-01-11 中兴通讯股份有限公司 Non-contact automatic focusing method and device
US9858496B2 (en) * 2016-01-20 2018-01-02 Microsoft Technology Licensing, Llc Object detection and classification in images
CN106254780A (en) * 2016-08-31 2016-12-21 宇龙计算机通信科技(深圳)有限公司 A kind of dual camera camera control method, photographing control device and terminal
CN107302658B (en) * 2017-06-16 2019-08-02 Oppo广东移动通信有限公司 Realize face clearly focusing method, device and computer equipment


Cited By (8)

Publication number Priority date Publication date Assignee Title
CN111626916A (en) * 2020-06-01 2020-09-04 上海商汤智能科技有限公司 Information processing method, device and equipment
CN111626916B (en) * 2020-06-01 2024-03-22 上海商汤智能科技有限公司 Information processing method, device and equipment
CN112132162A (en) * 2020-09-08 2020-12-25 Oppo广东移动通信有限公司 Image processing method, image processor, electronic device, and readable storage medium
CN112132162B (en) * 2020-09-08 2024-04-02 Oppo广东移动通信有限公司 Image processing method, image processor, electronic device, and readable storage medium
CN115735226A (en) * 2020-12-01 2023-03-03 华为技术有限公司 Image processing method and apparatus
CN115735226B (en) * 2020-12-01 2023-08-22 华为技术有限公司 Image processing method and chip
CN116055866A (en) * 2022-05-30 2023-05-02 荣耀终端有限公司 Shooting method and related electronic equipment
CN116055866B (en) * 2022-05-30 2023-09-12 荣耀终端有限公司 Shooting method and related electronic equipment

Also Published As

Publication number Publication date
CN112602319A (en) 2021-04-02
CN112602319B (en) 2022-09-23

Similar Documents

Publication Publication Date Title
WO2020042126A1 (en) Focusing apparatus, method and related device
WO2020259179A1 (en) Focusing method, electronic device, and computer readable storage medium
US11847826B2 (en) System and method for providing dominant scene classification by semantic segmentation
US11410038B2 (en) Frame selection based on a trained neural network
CN108447091B (en) Target positioning method and device, electronic equipment and storage medium
WO2021043273A1 (en) Image enhancement method and apparatus
WO2020103110A1 (en) Image boundary acquisition method and device based on point cloud map and aircraft
WO2019228196A1 (en) Method for tracking target in panoramic video, and panoramic camera
CN110866480A (en) Object tracking method and device, storage medium and electronic device
WO2020103108A1 (en) Semantic generation method and device, drone and storage medium
US20170054897A1 (en) Method of automatically focusing on region of interest by an electronic device
US20220223153A1 (en) Voice controlled camera with ai scene detection for precise focusing
WO2022076116A1 (en) Segmentation for image effects
WO2021104124A1 (en) Method, apparatus and system for determining confinement pen information, and storage medium
WO2019144263A1 (en) Control method and device for mobile platform, and computer readable storage medium
WO2023138403A1 (en) Method and apparatus for determining trigger gesture, and device
CN111291646A (en) People flow statistical method, device, equipment and storage medium
CN114339054A (en) Photographing mode generation method and device and computer readable storage medium
CN106922181A (en) Directional perception is focused on automatically
JP2020021368A (en) Image analysis system, image analysis method and image analysis program
CN113056907A (en) Imaging method, imaging device, and storage medium
CN112655021A (en) Image processing method, image processing device, electronic equipment and storage medium
CN115457666A (en) Method and system for identifying moving gravity center of living body object and computer readable storage medium
CN115223135A (en) Parking space tracking method and device, vehicle and storage medium
CN114677620A (en) Focusing method, electronic device and computer readable medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 18931886; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 18931886; Country of ref document: EP; Kind code of ref document: A1)