WO2020077544A1 - Object recognition method and terminal device - Google Patents

Object recognition method and terminal device

Info

Publication number
WO2020077544A1
WO2020077544A1 (PCT/CN2018/110525)
Authority
WO
WIPO (PCT)
Prior art keywords
target object
terminal device
image
frame
frame image
Prior art date
Application number
PCT/CN2018/110525
Other languages
English (en)
French (fr)
Inventor
杨仁志
江继勇
张腾
燕瑞
郁东健
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to PCT/CN2018/110525 (WO2020077544A1)
Priority to CN201880087013.3A (CN111615704A)
Priority to EP18937263.4A (EP3855358A4)
Publication of WO2020077544A1
Priority to US17/231,352 (US20210232853A1)

Classifications

    • G06V 40/167 Detection; Localisation; Normalisation using comparisons between temporally consecutive images (human faces)
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G06V 10/17 Image acquisition using hand-held instruments
    • G06V 10/255 Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
    • G06V 10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G06V 10/761 Proximity, similarity or dissimilarity measures
    • G06V 20/10 Terrestrial scenes
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 40/172 Classification, e.g. identification (human faces)
    • G06V 10/62 Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; pattern tracking
    • G06V 2201/07 Target detection

Definitions

  • The present application relates to the field of terminal technologies, and in particular, to an object recognition method and a terminal device.
  • The mobile phone can collect an image of an object (such as a human face) and recognize the object in the image.
  • However, current object recognition technology can only recognize objects with relatively fixed shapes. In real life, many objects deform, such as cats and dogs.
  • For example, if the terminal device recognizes that one frame of image contains a cat (for example, standing), and the next frame still contains the cat but the cat's form has changed (for example, lying down), the terminal device may fail to recognize the cat in the next frame of image, or may recognize it incorrectly (for example, because animals' lying postures are similar, the cat may be recognized as a dog).
  • In view of this, the present application provides an object recognition method and a terminal device, to improve the accuracy of recognizing objects that can deform.
  • an embodiment of the present application provides an object recognition method, which may be executed by a terminal device.
  • The method includes: the terminal device recognizes a first target object in a first frame image; the terminal device recognizes a second target object in a second frame image adjacent to the first frame image; and if the similarity between the first target object and the second target object is greater than a preset similarity and the moving speed is less than a preset speed, the terminal device determines that the first target object and the second target object are the same object.
  • In this way, the terminal device can recognize whether the objects in two frames of images, such as adjacent frames, are the same object, which helps improve the accuracy of object recognition.
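  • As a rough illustration (not part of the patent text), the decision described above can be sketched in Python; similarity here stands in for whatever similarity measure the terminal device uses, and the threshold values are assumptions:

```python
import math

PRESET_SIMILARITY = 0.8   # assumed threshold, not from the patent
PRESET_SPEED = 250.0      # assumed threshold, in pixels per second

def moving_speed(p1, p2, dt):
    """Speed of the displacement from pixel p1 = (x1, y1) on the first
    frame to the matching pixel p2 = (x2, y2) on the second frame,
    where dt is the capture interval in seconds."""
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1]) / dt

def is_same_object(similarity, p1, p2, dt):
    """True when both conditions hold: similarity above the preset
    similarity AND moving speed below the preset speed."""
    return (similarity > PRESET_SIMILARITY
            and moving_speed(p1, p2, dt) < PRESET_SPEED)

# Example: a ~6-pixel move within a 30 ms frame interval.
print(is_same_object(0.9, (100, 120), (103, 125), 0.03))  # True
```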
  • Recognizing the first target object in the first frame image includes: the terminal device acquires first feature information from the first frame image; the terminal device finds, in a pre-stored object matching template, second feature information that matches the first feature information, where the object matching template includes correspondences between objects and feature information; and the terminal device determines that the object corresponding to the second feature information in the object matching template is the first target object. The similarity between the first target object and the second target object being greater than the preset similarity includes: the first target object and the second target object belong to the same object type.
  • In this way, when the terminal device determines that the target objects in two frames of images, such as adjacent frames, belong to the same object type, it can further determine whether they are the same object, which helps improve the accuracy of object recognition.
  • The moving speed indicates the ratio of a displacement vector to time, where the displacement vector is the displacement from a first pixel point on the first target object to a second pixel point on the second target object, and the time indicates the interval at which the terminal device collects the first frame image and the second frame image; the second pixel point is a pixel that the terminal device determines, according to a matching algorithm, to match the first pixel point.
  • In this way, the terminal device determines the speed of the target object according to the positions of the target object's pixel points in adjacent frame images and the time interval of collecting the images; if the speed is small, the target objects in the adjacent frame images are the same object. This helps improve the accuracy of object recognition.
  • The moving speed being less than a preset speed includes: the rate of the moving speed is less than a preset rate; and/or the angle between the direction of the moving speed and a preset direction is less than a preset angle, where the preset direction is the moving direction from a third pixel point to the first pixel point, and the third pixel point is the pixel that the terminal device determines, according to the matching algorithm, on the frame before the first frame image to match the first pixel point.
  • In this way, the terminal device determines that the target objects in adjacent frame images are the same object only when the moving speed of the target objects satisfies the above conditions, which helps improve the accuracy of object recognition. A hedged sketch of both sub-conditions is given below.
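  • In the following sketch, per the description above, the preset direction is derived from the third pixel point (on the frame before the first frame image); the rate and angle thresholds are assumed values:

```python
import math

PRESET_RATE = 250.0   # assumed, pixels per second
PRESET_ANGLE = 10.0   # assumed, degrees (10 degrees appears later as an example)

def angle_between(v1, v2):
    """Included angle between two 2-D vectors, in degrees."""
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return 0.0
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / (n1 * n2)))))

def speed_conditions_hold(p3, p1, p2, dt):
    """p3: matched pixel on the frame before the first frame image;
    p1: first pixel on the first frame image;
    p2: matched pixel on the second frame image;
    dt: capture interval in seconds."""
    motion = (p2[0] - p1[0], p2[1] - p1[1])      # displacement p1 -> p2
    preset_dir = (p1[0] - p3[0], p1[1] - p3[1])  # preset direction p3 -> p1
    rate = math.hypot(*motion) / dt
    return (rate < PRESET_RATE
            and angle_between(motion, preset_dir) < PRESET_ANGLE)
```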
  • The terminal device may also detect a user operation and, in response to the user operation, open the camera application, start the camera, and display the viewfinder interface; a preview image collected by the camera is displayed in the viewfinder interface, and the preview image includes the first frame image and the second frame image.
  • In this way, the camera application in the terminal device can recognize objects, and can recognize whether the objects in the dynamically changing preview image are the same object, which helps improve the accuracy of object recognition.
  • a first control is displayed in the viewfinder interface, and when the first control is triggered, the terminal device recognizes the target object in the preview image.
  • In this way, the object recognition function in the terminal device (such as in the camera application of a mobile phone) can be activated or deactivated through the control, which makes operation more flexible and convenient.
  • The terminal device may also output prompt information, which is used to indicate that the first target object and the second target object are the same object.
  • In this way, when the terminal device recognizes that the target object in the two frames of images is the same object, it prompts the user that the target object is the same object, which helps the user track the object and improves the accuracy of target tracking.
  • Before the terminal device recognizes the first target object in the first frame image, the terminal device may display the first frame image; after the terminal device recognizes the first target object, it may display a label of the first target object on the first frame image, where the label includes related information of the first target object. Before the terminal device recognizes the second target object in the second frame image, the terminal device may also display the second frame image; after the terminal device determines that the first target object and the second target object are the same object, the terminal device continues to display the same label on the second frame image.
  • In this way, when the terminal device recognizes the same target object in two frames of images, it may display the same label, and the label includes related information of the target object.
  • Thus, the accuracy of object recognition is improved, the user experience is improved, and the label can display related information of the object for the user to view.
  • The display position of the label moves along with the first target object and the second target object.
  • In this way, the display position of the object label can move with the movement of the target object in the image, which helps the user track the object and improves the accuracy of target tracking.
  • The terminal device displays a chat interface of a communication application, and the chat interface includes a moving picture; when the terminal device detects an operation on the moving picture, it displays a second control; the second control is used to trigger the terminal device to recognize the target object in the moving picture.
  • In this way, a terminal device, such as a mobile phone, can recognize objects in images (moving pictures or videos) sent in a WeChat chat interface through the object recognition method provided in the embodiments of the present application.
  • Before the terminal device recognizes the first target object in the first frame image, the terminal device is in a screen-locked state; the terminal device collects at least two frames of face images; after determining that the face in the first frame image and the face in the second frame image are the same face, the terminal device unlocks.
  • In this way, when the terminal device collects multiple frames of face images and the faces in those frames are the same face, the terminal device unlocks, which improves the accuracy of face recognition.
  • The terminal device displays a payment verification interface; the terminal device collects at least two frames of face images; after determining that the face in the first frame image and the face in the second frame image are the same face, the terminal device executes the payment process.
  • In this way, when the terminal device displays a payment interface (such as a WeChat payment interface or an Alipay payment interface) and collects multiple frames of face images, it completes the payment process only when the faces in those frames are the same face. This improves payment security. A rough sketch of such a flow follows.
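  • As a sketch only (the patent does not specify an implementation), the multi-frame face check might gate unlocking or payment as follows; detect_face and face_similarity are hypothetical placeholders, and is_same_object is the sketch shown earlier:

```python
def verify_user(frames, dt):
    """frames: at least two captured face images, in capture order;
    dt: capture interval in seconds. Returns True only if every pair of
    consecutive frames is judged to contain the same face."""
    prev = detect_face(frames[0])         # placeholder face detector
    for frame in frames[1:]:
        cur = detect_face(frame)
        if prev is None or cur is None:
            return False
        sim = face_similarity(prev, cur)  # placeholder similarity metric
        if not is_same_object(sim, prev.pixel, cur.pixel, dt):
            return False
        prev = cur
    return True

# if verify_user(captured_frames, 0.03):
#     unlock()  # or execute the payment process
```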
  • An embodiment of the present application further provides a terminal device. The terminal device includes a processor and a memory, where the memory is used to store one or more computer programs; when the one or more computer programs stored in the memory are executed by the processor, the terminal device is enabled to implement the method of the first aspect or any possible design of the first aspect.
  • An embodiment of the present application further provides a terminal device that includes modules/units for executing the method of the first aspect or any possible design of the first aspect; these modules/units may be implemented by hardware, or by hardware executing corresponding software.
  • An embodiment of the present application further provides a chip, where the chip is coupled to a memory in an electronic device and executes the technical solution of the first aspect or any possible design of the first aspect; in the embodiments of the present application, "coupled" means that two components are joined to each other directly or indirectly.
  • An embodiment of the present application further provides a computer storage medium that includes a computer program; when the computer program runs on an electronic device, the electronic device is caused to perform the technical solution of the first aspect or any possible design of the first aspect.
  • An embodiment of the present application further provides a computer program product; when the computer program product runs on an electronic device, the electronic device is caused to perform the technical solution of the first aspect or any possible design of the first aspect.
  • FIG. 1 is a schematic diagram of a camera imaging process provided by an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of a mobile phone according to an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of a mobile phone according to an embodiment of the present invention.
  • FIG. 4 is a schematic flowchart of an object recognition method according to an embodiment of the present invention.
  • FIG. 5 is a schematic flowchart of an object recognition method according to an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of a pixel moving speed provided by an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of a display interface of a mobile phone according to an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of a display interface of a mobile phone according to an embodiment of the present invention.
  • FIG. 9 is a schematic diagram of a display interface of a mobile phone according to an embodiment of the present invention.
  • FIG. 10 is a schematic diagram of a display interface of a mobile phone according to an embodiment of the present invention.
  • the original image involved in the embodiment of the present application is the original data obtained by the camera converting the collected optical signal reflected by the target object into a digital image signal.
  • the original data may be data that has not been processed.
  • the original image can be raw format data.
  • The raw format data includes target object information and camera parameters, such as ISO, shutter speed, aperture value, and white balance.
  • the preview image involved in the embodiment of the present application is an image obtained by the terminal device processing the original image.
  • the terminal device converts the original image into an image containing color information such as an RGB image or YUV data based on camera parameters in the original image.
  • the preview image can be presented on the interface of the camera application, such as the viewfinder interface.
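  • As an illustration only, demosaicing raw Bayer data into a color preview might look like the following OpenCV sketch; the file name, frame dimensions, and BGGR pattern are assumptions, and a real pipeline also applies the white balance, ISO, and other camera parameters carried in the raw data:

```python
import cv2
import numpy as np

# Assumed: an 8-bit, 1920x1080, single-channel Bayer (BGGR) raw frame.
raw = np.fromfile("frame.raw", dtype=np.uint8).reshape(1080, 1920)
preview_bgr = cv2.cvtColor(raw, cv2.COLOR_BayerBG2BGR)  # demosaic to color
cv2.imwrite("preview.png", preview_bgr)
```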
  • The original image collected by the camera changes dynamically (for example, the user holding the terminal device moves, so the camera's viewing range changes; or the position or form of the target object itself changes). That is, the original image may include multiple frames of images, and the position or form of the target object (such as a person or an animal) differs between frames. Therefore, the preview image also changes dynamically, that is, the preview image may also include multiple frames of images.
  • the preview image or the original image can be used as the input image of the object recognition algorithm provided by the embodiment of the present application.
  • the preview image is taken as an input image of the object recognition algorithm provided by the embodiment of the present application as an example.
  • images involved in the embodiments of the present application may be in the form of pictures, or may be a collection of data, such as a collection of parameters (such as pixel points, color information, etc.).
  • the pixels involved in the embodiments of the present application are the smallest imaging unit on a frame of image.
  • a pixel can correspond to a coordinate point on the image.
  • a pixel can correspond to one parameter (such as gray scale), or it can correspond to a set of multiple parameters (such as gray scale, brightness, color, etc.).
  • the image plane coordinate system involved in the embodiments of the present application is a coordinate system established on the imaging plane.
  • FIG. 1 is a schematic diagram of a camera imaging process provided by an embodiment of the present application.
  • the image plane coordinate system is represented by o-x-y, where o is the origin of the image plane coordinate system, and the x-axis and y-axis are the coordinate axes of the image plane coordinate system, respectively.
  • The pixels of the original image or the preview image can be represented in the image plane coordinate system.
  • "At least one" in the embodiments of the present application means one or more, and "multiple" means two or more.
  • the terminal device may be a portable terminal containing a device such as a camera and having an image acquisition function, such as a mobile phone, a tablet computer, and the like.
  • Portable terminal devices include, but are not limited to, portable terminal devices carrying various operating systems.
  • the above portable terminal device may also be other portable terminal devices, such as a digital camera, as long as it has an image acquisition function. It should also be understood that, in some other embodiments of the present application, the terminal device may not be a portable terminal device, but a desktop computer with an image acquisition function.
  • The terminal device supports multiple applications, for example, one or more of the following: a camera application, an instant messaging application, a photo management application, and so on. There can be multiple instant messaging applications, such as WeChat, Tencent QQ, WhatsApp Messenger, LINE, Instagram, Kakao Talk, and DingTalk. Through an instant messaging application, users can send text, voice, pictures, video files, and other various files to other contacts, or carry out video or audio calls with other contacts.
  • FIG. 2 shows a schematic structural diagram of the mobile phone 100.
  • The mobile phone 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone jack 170D, a sensor module 180, a key 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and so on.
  • The sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and so on.
  • the structure illustrated in the embodiment of the present invention does not constitute a specific limitation on the mobile phone 100.
  • the mobile phone 100 may include more or fewer components than shown, or combine some components, or split some components, or arrange different components.
  • the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units.
  • The processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU).
  • different processing units may be independent devices, or may be integrated in one or more processors.
  • the controller may be the nerve center and command center of the mobile phone 100.
  • the controller can generate the operation control signal according to the instruction operation code and the timing signal to complete the control of fetching instructions and executing instructions.
  • the processor 110 may also be provided with a memory for storing instructions and data.
  • the memory in the processor 110 is a cache memory.
  • The memory may store instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, it can call them directly from the memory, which avoids repeated access, reduces the waiting time of the processor 110, and thereby improves system efficiency.
  • the mobile phone 100 realizes a display function through a GPU, a display screen 194, and an application processor.
  • the GPU is a microprocessor for image processing, connecting the display screen 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations, and is used for graphics rendering.
  • the processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
  • the display screen 194 is used to display images, videos and the like.
  • the display screen 194 includes a display panel.
  • The display panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like.
  • the mobile phone 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
  • the mobile phone 100 can realize the image capturing function through the processor 110, the camera 193, the display screen 194, and the like.
  • the camera 193 is used to capture still images, moving images, or videos.
  • The camera 193 may include photosensitive elements, such as a lens group and an image sensor, where the lens group includes a plurality of lenses (convex or concave) for collecting the optical signal reflected by the target object and transmitting the collected optical signal to the image sensor.
  • the image sensor generates an original image of the target object according to the light signal.
  • the image sensor transmits the generated original image to the processor 110.
  • the processor 110 processes the original image (for example, converts the original image into an image containing color information such as an RGB image or YUV data) to obtain a preview image.
  • the display screen 194 displays a preview image.
  • the display screen 194 displays the preview image and related information of the target object identified from the preview image.
  • the internal memory 121 may be used to store computer executable program code, where the executable program code includes instructions.
  • the processor 110 executes instructions stored in the internal memory 121 to execute various functional applications and data processing of the mobile phone 100.
  • the internal memory 121 may include a storage program area and a storage data area.
  • the storage program area may store an operating system, at least one function required application programs (such as sound playback function, image playback function, etc.) and so on.
  • the storage data area may store data (such as audio data, phone book, etc.) created during the use of the mobile phone 100 and the like.
  • The internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS).
  • the distance sensor 180F is used to measure the distance.
  • the mobile phone 100 can measure the distance by infrared or laser. In some embodiments, when shooting scenes, the mobile phone 100 may use the distance sensor 180F to measure distance to achieve fast focusing. In other embodiments, the mobile phone 100 may also use the distance sensor 180F to detect whether a person or object is close.
  • the proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector, such as a photodiode.
  • the light emitting diode may be an infrared light emitting diode.
  • the mobile phone 100 emits infrared light outward through a light emitting diode.
  • the mobile phone 100 uses a photodiode to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the mobile phone 100. When insufficient reflected light is detected, the mobile phone 100 can determine that there is no object near the mobile phone 100.
  • the mobile phone 100 can use the proximity light sensor 180G to detect that the user is holding the mobile phone 100 close to the ear to talk, so as to automatically turn off the screen to save power.
  • The proximity light sensor 180G can also be used in leather-case mode and pocket mode to automatically unlock and lock the screen.
  • the ambient light sensor 180L is used to sense the brightness of ambient light.
  • the mobile phone 100 can adaptively adjust the brightness of the display screen 194 according to the perceived ambient light brightness.
  • the ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the mobile phone 100 is in a pocket to prevent accidental touch.
  • the fingerprint sensor 180H is used to collect fingerprints.
  • the mobile phone 100 can use the collected fingerprint characteristics to unlock a fingerprint, access an application lock, take a photo with a fingerprint, and answer a call with a fingerprint.
  • the temperature sensor 180J is used to detect the temperature.
  • The mobile phone 100 uses the temperature detected by the temperature sensor 180J to execute a temperature processing strategy. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the mobile phone 100 reduces the performance of a processor located near the temperature sensor 180J, in order to reduce power consumption and implement thermal protection. In other embodiments, when the temperature is lower than another threshold, the mobile phone 100 heats the battery 142 to avoid an abnormal shutdown caused by low temperature. In some other embodiments, when the temperature is lower than still another threshold, the mobile phone 100 boosts the output voltage of the battery 142 to avoid an abnormal shutdown caused by low temperature.
  • The touch sensor 180K is also known as a "touch panel".
  • The touch sensor 180K may be provided on the display screen 194, and the touch sensor 180K and the display screen 194 constitute a touchscreen, also called a "touch screen".
  • the touch sensor 180K is used to detect a touch operation acting on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • the visual output related to the touch operation can be provided through the display screen 194.
  • the touch sensor 180K may also be disposed on the surface of the mobile phone 100, which is different from the location where the display screen 194 is located.
  • the mobile phone 100 can realize audio functions through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone interface 170D, and an application processor. For example, music playback, recording, etc.
  • the mobile phone 100 may receive key 190 input and generate key signal input related to user settings and function control of the mobile phone 100.
  • the mobile phone 100 may use the motor 191 to generate vibration prompts (such as incoming call vibration prompts).
  • the indicator 192 in the mobile phone 100 can be an indicator light, which can be used to indicate the charging state, the power change, and can also be used to indicate messages, missed calls, notifications, and the like.
  • the SIM card interface 195 in the mobile phone 100 is used to connect a SIM card. The SIM card can be inserted into or removed from the SIM card interface 195 to achieve contact and separation with the mobile phone 100.
  • the wireless communication function of the mobile phone 100 can be realized by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor and the baseband processor.
  • Antenna 1 and antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in the electronic device 100 may be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
  • the antenna 1 can be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
  • the mobile communication module 150 can provide a wireless communication solution including 2G / 3G / 4G / 5G and the like applied to the electronic device 100.
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), and the like.
  • the mobile communication module 150 can receive the electromagnetic wave from the antenna 1, filter and amplify the received electromagnetic wave, and transmit it to the modem processor for demodulation.
  • the mobile communication module 150 can also amplify the signal modulated by the modulation and demodulation processor and convert it to electromagnetic wave radiation through the antenna 1.
  • at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110.
  • at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be provided in the same device.
  • the modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low-frequency baseband signal to be transmitted into a high-frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal.
  • the demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the low-frequency baseband signal is processed by the baseband processor and then passed to the application processor.
  • the application processor outputs a sound signal through an audio device (not limited to a speaker 170A, a receiver 170B, etc.), or displays an image or video through a display screen 194.
  • the modem processor may be an independent device.
  • the modem processor may be independent of the processor 110, and may be set in the same device as the mobile communication module 150 or other functional modules.
  • The wireless communication module 160 can provide wireless communication solutions applied to the electronic device 100, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), the global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives the electromagnetic wave via the antenna 2, frequency-modulates and filters the electromagnetic wave signal, and sends the processed signal to the processor 110.
  • the wireless communication module 160 may also receive the signal to be transmitted from the processor 110, frequency-modulate it, amplify it, and convert it to electromagnetic waves through the antenna 2 to radiate it out.
  • the following embodiments can all be implemented in a terminal device (such as a mobile phone 100, a tablet computer, etc.) having the above hardware structure.
  • Object recognition by the mobile phone 100 shown in FIG. 3 may proceed as follows:
  • the display screen 194 of the mobile phone 100 displays a main interface, which includes icons of various application programs (such as a phone application icon, a video player icon, a music player icon, a camera application icon, a browser application icon, etc.).
  • the user clicks the icon of the camera application in the main interface through the touch sensor 180K (not shown in FIG. 2, see FIG. 1) provided on the display screen 194 to start the camera application and turn on the camera 193.
  • the display 194 displays an interface of the camera application, such as a viewfinder interface.
  • the lens group 193-1-1 in the camera 193 collects the optical signal reflected by the target object, and transmits the collected optical signal to the image sensor 193-2.
  • the image sensor 193-2 generates an original image of the target object based on the light signal.
  • The image sensor 193-2 sends the original image to the application processor 110-1.
  • The application processor 110-1 processes the original image (for example, converts it into an RGB image) to obtain a preview image; alternatively, the image sensor 193-2 may send the original image to another processor (such as an ISP, not shown in FIG. 3), and the ISP processes the original image to obtain the preview image.
  • the ISP sends the preview image to the application processor 110-1.
  • A specific control may be displayed on the interface of the camera application; when the control is triggered, the mobile phone 100 starts the function of recognizing objects in the preview image.
  • For example, the touch sensor 180K in the mobile phone 100 detects that the user clicks the specific control in the camera application interface (such as the viewfinder interface), and triggers the application processor 110-1 to run the code of the object recognition algorithm provided in the embodiments of the present application to recognize the target object in the preview image.
  • Alternatively, after the application processor 110-1 obtains the preview image (for example, after converting the original image into an RGB image), it can automatically run the code of the object recognition algorithm provided by the embodiments of the present application to recognize the object in the preview image, without requiring the user to actively trigger object recognition.
  • the display 194 displays the related information of the target object (such as the name and type of the target object, which will be described later).
  • In the above description, the processor 110 integrating the application processor 110-1 is taken as an example.
  • The processor 110 may instead integrate only a GPU, which performs the functions of the application processor 110-1 described above; or the processor 110 may integrate only a CPU, which performs those functions.
  • the embodiment of the present application does not limit the main body that runs the object recognition algorithm code provided by the embodiment of the present application.
  • It should be noted that the camera 193 may continuously collect images at a certain time interval, that is, collect multiple frames of images. If the target object differs between the frames of the multi-frame images (in position, form, etc.), the multi-frame images displayed on the camera application interface (such as the viewfinder interface) show the effect of a dynamically changing picture.
  • The position of the target object itself may change (for example, the target object is displaced and moves from a first position to a second position), or its form may change (for example, the target object changes from a first form to a second form), causing the display position and form of the target object to differ across the frames of the multi-frame original images collected by the camera 193.
  • Since the original image changes dynamically, the preview image also changes dynamically.
  • The application processor 110-1 runs the code of the object recognition algorithm provided in the embodiments of the present application to recognize the target object in the preview image in two stages.
  • First, the application processor 110-1 recognizes the target object in each frame of the preview image.
  • Then, the application processor 110-1 further determines whether the two target objects in adjacent frame images are the same object. For example, the application processor 110-1 can determine the correlation of the target objects in the adjacent frame images: if they are correlated (for example, the moving speed of the pixels of the target object between the adjacent frame images is less than or equal to a preset speed), the target objects in the adjacent frame images are the same object (the specific process is described later); if they are not correlated (for example, the moving speed of the pixels of the target object between the adjacent frame images is greater than the preset speed), the target objects in the adjacent frame images are not the same object.
  • the user starts the camera application in the mobile phone 100 to shoot a cat.
  • While shooting, the cat's form changes (for example, lying down or standing). Therefore, the preview image in the viewfinder interface of the camera application changes dynamically.
  • The mobile phone 100 can recognize that each frame image includes a cat, but the cat's form differs between frames, so the mobile phone 100 can further determine whether the cats included in adjacent frame images are the same cat.
  • Generally, when a terminal device recognizes objects in multiple frames of images (such as videos and moving pictures), it recognizes each frame of image separately.
  • For example, the terminal device recognizes an object in the first frame image; the form of the object changes in the next frame image, and the terminal device re-recognizes the object in the next frame image. Due to the change in form, the terminal device may fail to recognize the object, or may recognize the object as another object, that is, a recognition error, even though the object in the next frame image and in the first frame image is actually the same object.
  • In contrast, in the embodiments of the present application, when the terminal device recognizes a target object in multiple frames of images (such as a video or a moving picture), it may consider the correlation between adjacent frames (such as the moving speed of the pixels of the target object between adjacent frame images) to determine whether the target objects in the adjacent frame images are the same object. Therefore, the terminal device can not only recognize the target object in each frame of the multi-frame images, but also recognize whether the target objects in the multiple frames are the same object, improving recognition accuracy.
  • the following describes the process in which the application processor 110-1 runs the code of the object recognition algorithm provided by the embodiment of the present application and recognizes the target object in the preview image (multi-frame preview image).
  • FIG. 4 is a schematic flowchart of an object recognition method provided by an embodiment of the present application.
  • the application processor 110-1 runs the code of the object recognition algorithm to perform the following process:
  • S401: Obtain first feature information of a first target object in one frame of the preview image, and find, in a pre-stored object matching template, second feature information that matches the first feature information; the object corresponding to the second feature information in the object matching template is the first target object.
  • The mobile phone 100 may obtain the first feature information of the first target object in one frame of the preview image in various ways, for example, by foreground-background separation or an edge detection algorithm, which is not limited in the embodiments of the present application.
  • an object matching template may be stored in the mobile phone 100, and the object matching template includes feature information of different kinds of objects.
  • the feature information of the object includes the edge contour of the object, the color information and texture information of the feature points (eyes, mouth, tail, etc.).
  • the object matching template may be set before the mobile phone 100 leaves the factory, or may be customized by the user when using the mobile phone 100.
  • Taking the case where the object matching template is set before the mobile phone 100 leaves the factory as an example, the following describes the process of obtaining the object matching template.
  • the designer may use multiple images of the same target object as input images of the mobile phone 100 to identify the multiple images. Among them, the shape of the target object in each image is different. Therefore, the mobile phone 100 obtains the feature information of the target object in each image.
  • the designer can obtain 100 images of the cat in different forms (for example, self-photographed or obtained from the network side). Among them, the shape of the cat in each of the 100 images is different.
  • the mobile phone 100 recognizes the target object (such as a cat) on each image, saves the feature information of the target object (such as a cat), and then obtains the feature information of the target object in 100 forms.
  • the feature information may be the edge contour of the target object in each form, color information of the feature points (eyes, mouth, tail, etc.), texture information, etc.
  • the mobile phone 100 may store the feature information of the object in the form of a table (such as Table 1), that is, the object matching template.
  • In Table 1, only two form templates (such as lying and standing) of an object type such as a cat are shown; in practical applications, other forms may also be included. That is, Table 1 is only an example of an object matching template, and those skilled in the art can refine it. Continuing the previous example, the designer obtains 100 images of cats in different forms for recognition and obtains the cat's feature information in 100 forms, that is, Table 1 would contain 100 forms corresponding to the cat. Of course, there are many kinds of cats, and feature information for the various forms of different kinds of cats can be obtained in a similar manner (that is, there may be multiple cat object types), which is not limited in the embodiments of the present application.
  • For a frame of preview image, the application processor 110-1 may first obtain the feature information of the target object on that frame (edge contour, color information of feature points, texture information, etc.). If the acquired feature information matches certain feature information in the object matching template (such as Table 1), the application processor 110-1 determines that the object corresponding to the matched feature information is the target object. Therefore, to recognize different objects in each frame of image, the object matching template should cover as many objects as possible (such as animals, people, and other objects that may deform), together with the feature information of each object in its different forms. In this case, the mobile phone 100 stores the feature information of various objects in different forms, so the application processor 110-1 can recognize a target object in different forms in each frame of image.
  • The object matching template can also be updated, for example, manually by the user or automatically by the mobile phone 100. One hypothetical representation of such a template is sketched below.
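  • In the following sketch, the object matching template is modeled as a lookup from object type to per-form feature templates; the entries and match_score are placeholders, not the contents of Table 1:

```python
# Each object type maps to feature templates for its different forms
# (edge contour, feature-point color, texture, ...). The entries and
# match_score() are placeholders, not data from the patent.
OBJECT_TEMPLATES = {
    "cat": [{"form": "standing", "features": ...},
            {"form": "lying",    "features": ...}],
    "dog": [{"form": "standing", "features": ...}],
}

MATCH_THRESHOLD = 0.7  # assumed

def identify(feature_info):
    """Return the best-matching object type for the extracted feature
    information, or None if nothing in the template matches well enough."""
    best_type, best_score = None, MATCH_THRESHOLD
    for obj_type, forms in OBJECT_TEMPLATES.items():
        for template in forms:
            score = match_score(feature_info, template["features"])
            if score > best_score:
                best_type, best_score = obj_type, score
    return best_type
```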
  • S402: Obtain third feature information of a second target object in the next frame of the preview image, and find, in the pre-stored object matching template, fourth feature information that matches the third feature information; the object corresponding to the fourth feature information in the object matching template is the second target object.
  • The application processor 110-1 may execute S401 for each frame of the preview image. If the application processor 110-1 determines the first target object in one frame of image, it can identify the second target object in the next frame of the preview image in the same manner. Since the application processor 110-1 recognizes the first target object and the second target object in the same way (using the object matching template), the identified first target object and second target object may be of the same object type (for example, both are cats) or may not be (for example, the first target object is a cat and the second target object is a dog).
  • When the identified first target object and second target object are of the same object type, the application processor 110-1 may continue to judge whether the first target object and the second target object are the same object, that is, continue to perform the subsequent steps; when the identified first target object and second target object are not of the same object type, the application processor 110-1 may not perform the subsequent steps.
  • S403: Determine a first pixel point of the first target object on the one frame image.
  • Each frame of image is presented on the imaging plane, so after the application processor 110-1 recognizes the first target object, the first pixel point on the first target object can be determined in the image plane coordinate system.
  • the first pixel point may be the center position coordinate of the first target object, or the position coordinate of a feature point (such as an eye) on the first target object.
  • S404: Determine, according to a preset algorithm, a second pixel point on the second target object in the next frame image that corresponds to the first pixel point.
  • the first pixel point is the center position coordinate of the first target object or the position coordinate of the feature point on the first target object.
  • the second pixel point is the center position coordinate of the second target object.
  • the application processor 110-1 may determine the center position of the target object according to a filter algorithm (such as a Kalman filter algorithm), which is not described in detail in the embodiments of the present application.
  • the application processor 110-1 may find a feature point on the second target object that matches the feature point on the first target object according to a matching algorithm (such as a similarity matching algorithm).
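  • The patent only names "a matching algorithm (such as a similarity matching algorithm)"; as one concrete, hedged possibility, OpenCV ORB features could locate the matching pixel:

```python
import cv2

def matching_pixel(frame1, frame2, p1, radius=40):
    """Find the pixel on frame2 (grayscale image) best matching the
    feature at p1 = (x, y) on frame1. ORB + brute-force Hamming matching
    is an assumption; the patent does not prescribe an algorithm."""
    orb = cv2.ORB_create()
    kp1, des1 = orb.detectAndCompute(frame1, None)
    kp2, des2 = orb.detectAndCompute(frame2, None)
    if des1 is None or des2 is None:
        return None
    # Keep only keypoints near p1 on the first frame.
    near = [i for i, k in enumerate(kp1)
            if abs(k.pt[0] - p1[0]) < radius and abs(k.pt[1] - p1[1]) < radius]
    if not near:
        return None
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1[near], des2)
    if not matches:
        return None
    best = min(matches, key=lambda m: m.distance)
    return kp2[best.trainIdx].pt  # coordinates of the second pixel point
```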
  • the camera collects multiple frames of preview images.
  • One frame of the preview image captures the cat in the solid-line state, and the next frame of the preview image captures the cat in the dashed-line state.
  • the application processor 110-1 recognizes that both target objects in the two frames of preview images are cats.
  • the application processor 110-1 determines the first pixel point of the first target object (cat in a solid line state) in the one-frame preview image, that is, point A on the imaging plane.
  • the application processor 110-1 determines the second pixel point of the second target object (cat in a dotted line state) in the preview image of the next frame, that is, point B on the imaging plane.
  • It should be noted that the pixels of the one frame image and of the next frame image are all presented on the imaging plane, so in FIG. 5 the first pixel point of the one frame image and the second pixel point of the next frame image are both marked on the same imaging plane; in fact, however, the first pixel point and the second pixel point are pixels on two different images.
  • S405: Determine the moving speed v = ((x2 - x1)/t, (y2 - y1)/t) according to the first coordinates (x1, y1) of the first pixel point A and the second coordinates (x2, y2) of the second pixel point B, where t is the time interval at which the camera collects the one frame image and the next frame image.
  • When the position or form of an object in the real environment changes, the image point corresponding to each object point also changes; correspondingly, the display position of the target object in the preview image changes. Therefore, changes in the position and form of the object in the preview image can reflect changes in the position and form of the object in the real environment.
  • the time interval for the camera 193 to acquire two frames of images is short (for example, 30 ms), and the time interval may be set before the mobile phone 100 leaves the factory, or may be customized by the user during the use of the mobile phone 100.
  • Generally, the position and form of the target object in the real environment change little during this time interval. That is to say, although the position or form of the target object constantly changes in the real environment, the camera 193 continuously collects images of the target object at a short time interval, so the position or form of the target object changes very little between adjacent frame images. Therefore, the application processor 110-1 can determine whether the moving speed between the two pixel points on the two target objects in the adjacent frame images is less than the preset speed: if it is less, the two target objects are the same object; if it is greater, the two target objects are not the same object.
  • Specifically, the application processor 110-1 may determine, according to the first coordinates (x1, y1) of the first pixel point A and the second coordinates (x2, y2) of the second pixel point B, the speed v at which the first pixel point A moves to the second pixel point B.
  • The speed v at which the first pixel point A moves to the second pixel point B includes a rate and a direction. Specifically, when the rate is less than the preset rate and the angle between the direction and the preset direction is less than the preset angle, the first target object and the second target object are the same object.
  • the preset rate may be set before leaving the factory, for example, the designer determines based on experience or experiment.
  • the preset direction may be a direction determined according to the image before the one-frame image.
  • the process of the application processor 110-1 determining the preset direction is described below.
  • The preview image has multiple frames, and the coordinates of the center position of the target object in each frame of the preview image on the imaging plane are shown in FIG. 6 (each black dot in the figure represents the center position of the target object in one frame of the preview image). Since the position or form of the target object is changing, the center position of the target object on the imaging plane is also changing.
  • the application processor 110-1 determines two pixel points on the two target objects (the center positions of the two target objects) according to the one frame image and the previous frame image (adjacent frame image) before the one frame image ) The direction of the speed is The The direction of is the preset direction.
  • the application processor 110-1 determines two pixel points on the two target objects (the center positions of the two target objects) according to the one frame image and the next frame image (the adjacent frame image) after the one frame image ) The direction of the speed is Then versus When the included angle is smaller than a preset angle (such as 10 degrees), the next frame image (neighboring frame image) after the one frame image determines that the two target objects are the same object.
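  • The direction part of the check — comparing the current direction v2 with the preset direction v1 obtained from the previous frame pair — can be sketched as follows. Again, this is only an assumed illustration; the vectors and the 10-degree threshold are example values, not values fixed by the patent.

    import math

    def angle_between(v1, v2):
        """Angle in degrees between two 2-D velocity vectors."""
        n1, n2 = math.hypot(*v1), math.hypot(*v2)
        if n1 == 0 or n2 == 0:
            return 0.0  # a stationary pixel trivially passes the direction test
        cos_a = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
        return math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))

    v1 = (100.0, 10.0)  # preset direction, from the previous frame pair
    v2 = (95.0, 18.0)   # direction measured on the current frame pair
    print(angle_between(v1, v2) < 10.0)  # True -> directions are consistent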
  • The preset direction may also be a user-defined direction, a direction set before the mobile phone 100 leaves the factory, or a direction determined in other ways, which is not limited in this embodiment of the present application.
  • When determining whether two target objects on adjacent frame images are the same object, the mobile phone 100 may consider both the rate and the direction of the speed, or may consider only the rate and not the direction; that is, when the rate is less than the preset rate, it is determined that the two target objects are the same object. Alternatively, the mobile phone 100 may consider only the direction of the speed, regardless of the rate; that is, when the angle between the speed direction and the preset direction is smaller than the preset angle, it is determined that the two target objects are the same object.
  • In the above description, the mobile phone 100 first determines whether the first target object and the second target object are of the same object category, and then determines whether the moving speed of the first target object and the second target object is less than the preset speed.
  • The order of these two processes is not limited.
  • For example, the mobile phone 100 may first determine whether the moving speed of the first target object and the second target object is less than the preset speed, and then determine whether the first target object and the second target object belong to the same object category.
  • Alternatively, when the mobile phone 100 determines that the first target object and the second target object are of the same object category, it may directly determine that they are the same object (without determining whether their moving speed is less than the preset speed); or, when the mobile phone 100 determines that the moving speed of the first target object and the second target object is less than the preset speed, it may directly determine that they are the same object (without determining whether they are of the same object category). One possible ordering is sketched below.
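  • Putting the two stages together, the category-first ordering might look like the sketch below, reusing pixel_velocity, rate, and angle_between from the sketches above. The object fields (category, center) and the thresholds are assumptions made here for illustration; as just noted, either test could also be used alone or in the opposite order.

    def is_same_object(obj1, obj2, t, preset_rate=200.0,
                       preset_angle=10.0, preset_dir=None):
        """Decide whether target objects on two adjacent frames are the
        same object. obj1/obj2 are assumed to carry a recognized
        .category and a .center pixel; helpers come from the sketches above."""
        if obj1.category != obj2.category:        # stage 1: similarity
            return False
        v = pixel_velocity(obj1.center, obj2.center, t)
        if rate(v) >= preset_rate:                # stage 2a: rate check
            return False
        if preset_dir is not None:                # stage 2b: direction check
            if angle_between(preset_dir, v) >= preset_angle:
                return False
        return True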
  • FIG. 4 takes the one frame image and the next frame image as an example.
  • In practical applications, the application processor 110-1 can process every pair of adjacent frame images in a video or moving picture using the method shown in FIG. 4.
  • The above content is described by taking the camera application in the mobile phone 100 (the camera application included with the mobile phone 100, or another camera application downloaded from the network side, such as a beauty camera) as an example.
  • The object recognition algorithm provided by the embodiment of the present application can also be applied to other scenarios that require a camera to capture images, such as QQ video and WeChat video.
  • The object recognition algorithm provided by the embodiments of the present application can recognize not only target objects in images captured by the camera, but also target objects in moving pictures or videos sent by other devices (for example, received through the mobile communication module 150 or the wireless communication module 160), or in moving pictures or videos downloaded from the network side, which is not limited in the embodiments of the present application.
  • After the mobile phone 100 recognizes the target object in the preview image, it can display related information of the target object.
  • The relevant information includes, for example, the name and type of the target object, or a web page link (such as a purchase link for opening the purchase information of the target object).
  • The mobile phone 100 can display the relevant information of the target object in multiple ways, for example as text information or as an icon. Taking an icon as an example, when the mobile phone detects that the user triggers the icon, the relevant information of the target object is displayed.
  • FIGS. 7-9 show examples of several application scenarios, provided by the embodiments of the present application, in which the mobile phone 100 recognizes objects.
  • As shown in FIG. 7, the display interface of the mobile phone 100 displays a WeChat chat interface 701, which contains a moving picture 702 sent by Amy.
  • When the mobile phone 100 detects that the user triggers the moving picture 702 (for example, long-presses it), the mobile phone 100 displays a recognition control 703. Alternatively, after the user triggers the moving picture 702, the mobile phone 100 enlarges it, and the recognition control 703 is displayed when the user long-presses the enlarged moving picture.
  • When the mobile phone 100 detects that the user triggers the recognition control 703, the objects in the moving picture 702 are recognized according to the object recognition method provided in the embodiment of the present application.
  • As shown in FIG. 8, the display interface of the mobile phone 100 is the viewfinder interface 801 of the camera application, and a preview image 802 (dynamically changing) is displayed in the viewfinder interface 801.
  • The viewfinder interface 801 includes a control 803.
  • When the control 803 is triggered, the objects in the preview image 802 are recognized according to the object recognition algorithm provided in the embodiment of the present application.
  • The control 803 in FIG. 8 is only an example; in actual applications, the control 803 may be displayed in other forms or at other locations, which is not limited in the embodiments of the present application.
  • When the mobile phone 100 recognizes an object in the preview image, it can display related information of the object. For example, referring to (b) in FIG. 8, the mobile phone 100 displays a label 804 of an object (such as a flower), and the label 804 displays the name of the recognized flower.
  • When the mobile phone 100 detects that the label 804 is triggered, it displays more detailed information of the object (the flower), such as its origin, aliases, and planting method; alternatively, when the mobile phone 100 detects that the label 804 is triggered, it opens another application (such as Baidu Encyclopedia) and displays more detailed information of the object in that application's interface, which is not limited in the embodiments of the present application.
  • When the position of the object in the preview image changes, the display position of the label 804 in the preview image may also change with the object's position.
  • As shown in FIG. 9, the mobile phone 100 displays a scanning frame 901, and when the image of an object is displayed in the scanning frame 901, a scanning control 902 is displayed.
  • When the mobile phone 100 detects that the user triggers the scanning control 902, the image in the scanning frame 901 is recognized according to the object recognition method provided in the embodiment of the present application.
  • The embodiment shown in FIG. 9 can be applied to scenarios with a scanning function, such as Taobao and Alipay. Taking Taobao as an example, when the mobile phone 100 recognizes the object in the scanning frame 901, it can display a purchase link for the object.
  • FIGS. 7-9 are only examples of several application scenarios, and the object recognition algorithm provided by the embodiment of the present application may also be applied to other scenarios.
  • For example, in the field of video surveillance, the persons on the display undergo changes in position and form, and the object recognition algorithm provided by the embodiment of the present application can more accurately track the same person in the surveillance video.
  • When a person in the video moves, the display position of the person's label (a mark used to identify a person, such as a specific symbol or color) can also move, which improves the accuracy of object tracking.
  • The object recognition algorithm provided by the embodiments of the present application may be applicable to a scenario in which a terminal device is unlocked through face recognition: when the mobile phone 100 captures multiple frames of face images and the faces in the multiple frames are the same face, the terminal device is unlocked.
  • The object recognition algorithm provided by the embodiment of the present application can also be applied to the scenario of paying by face: when the mobile phone 100 displays a payment interface (such as a WeChat payment interface or an Alipay payment interface) and captures multiple frames of face images, the payment process is completed when the faces in the multiple frames are the same face.
  • The object recognition algorithm provided by the embodiment of the present application can also be applied to a face-swipe clock-in scenario, which will not be described in detail.
  • A separate application may be provided in the mobile phone 100 for photographing an object to identify it, so that the user can conveniently learn about objects.
  • The object recognition method provided by the embodiment of the present application may also be applicable to game applications, such as augmented reality (AR) or virtual reality (VR) applications.
  • Taking VR as an example, a virtual reality device (such as a mobile phone or a computer) can recognize the same object on different images, display the label of the object, and present the object and the label to the user through a virtual reality display (for example, AR glasses).
  • In the above embodiments, the mobile phone 100 is taken as an example, and the display screen of the mobile phone 100 displays the recognized object and related information of the object.
  • the object recognized by the mobile phone 100 and related information of the object may also be displayed through other display screens (such as an external display), which is not limited in the embodiment of the present application.
  • the method provided by the embodiments of the present application is introduced from the perspective of the terminal device (mobile phone 100) as an execution subject.
  • The terminal may include a hardware structure and/or a software module, and implement the above functions in the form of a hardware structure, a software module, or a hardware structure plus a software module. Whether one of the above functions is executed as a hardware structure, a software module, or a hardware structure plus a software module depends on the specific application and design constraints of the technical solution.
  • A terminal device provided by an embodiment of the present application can perform the methods in the embodiments shown in FIG. 2 to FIG. 9 described above.
  • The terminal device includes a processing unit and a display unit, wherein:
  • the processing unit is used to recognize the first target object of the first frame image, and to recognize the second target object of the second frame image adjacent to the first frame image;
  • if the degree of similarity between the first target object and the second target object is greater than a preset degree of similarity, and the moving speed is less than the preset speed, it is determined that the first target object and the second target object are the same object;
  • the display unit is used to display the first frame image or the second frame image.
  • These modules/units can be implemented by hardware, or by hardware executing corresponding software.
  • the processing unit may be the processor 110 shown in FIG. 2, or the application processor 110-1 shown in FIG. 3, or other processors.
  • the display screen may be the display screen 194 shown in FIG. 2 or other display screens connected to the terminal device (such as an external display screen).
  • An embodiment of the present application further provides a computer storage medium. The storage medium may include a memory, and the memory may store a program.
  • When the program is executed, the electronic device performs all the steps recorded in the method embodiment shown in FIG. 4 above.
  • An embodiment of the present application further provides a computer program product which, when run on an electronic device, causes the electronic device to perform all the steps recorded in the method embodiment shown in FIG. 4 above.
  • The division of units in the embodiments of the present application is schematic and is only a division of logical functions; there may be other division manners in actual implementation.
  • The functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • For example, the first acquiring unit and the second acquiring unit may be the same unit or different units.
  • The above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • Depending on the context, the term "when" may be interpreted to mean "if", "after", "in response to determining", or "in response to detecting".
  • Similarly, depending on the context, the phrase "when it is determined" or "if (the stated condition or event) is detected" may be interpreted to mean "if it is determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)".
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (such as coaxial cable, optical fiber, or digital subscriber line) or wireless (such as infrared, radio, or microwave) means.
  • The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media.
  • The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid-state drive), or the like.


Abstract

An object recognition method and a terminal device. The method includes: a terminal device recognizes a first target object of a first frame image; the terminal device recognizes a second target object of a second frame image adjacent to the first frame image; and if the degree of similarity between the first target object and the second target object is greater than a preset degree of similarity, and the moving speed is less than a preset speed, the terminal device determines that the first target object and the second target object are the same object. With this method, the terminal device can determine whether objects on adjacent frame images are the same object, which helps improve the accuracy of object recognition.

Description

Object recognition method and terminal device. Technical Field
This application relates to the field of terminal technologies, and in particular, to an object recognition method and a terminal device.
Background
With the progress of terminal technology, object recognition technology is applied on more and more terminal devices. Taking a mobile phone as an example, the phone can capture an image of an object (for example, a face) and recognize the object in the image.
In the prior art, object recognition technology can only recognize objects whose form is relatively fixed. In real life, however, many objects deform, such as cats and dogs. When a terminal device recognizes that one frame image contains a cat (for example, in a standing state), if the next frame image still contains the cat but the cat's form has changed (for example, to a lying state), the terminal device may fail to recognize the cat contained in the next frame image, or recognize it incorrectly (for example, because animals look similar when lying down, the cat may be recognized as a dog).
It can be seen that, in the prior art, the accuracy of recognizing deformable objects is low.
Summary
This application provides an object recognition method and a terminal device, to improve the accuracy of recognizing deformable objects.
According to a first aspect, an embodiment of this application provides an object recognition method, which may be performed by a terminal device. The method includes: the terminal device recognizes a first target object of a first frame image; the terminal device recognizes a second target object of a second frame image adjacent to the first frame image; and if the degree of similarity between the first target object and the second target object is greater than a preset degree of similarity, and the moving speed is less than a preset speed, the terminal device determines that the first target object and the second target object are the same object.
In this embodiment of this application, the terminal device can determine whether objects on two frame images, for example adjacent frame images, are the same object, which helps improve the accuracy of object recognition.
In a possible design, the terminal device recognizing the first target object of the first frame image includes: the terminal device obtains first feature information in the first frame image; the terminal device matches, in a prestored object matching template, second feature information that matches the first feature information, where the object matching template includes correspondences between objects and feature information; and the terminal device determines that the object corresponding to the second feature information in the object matching template is the first target object. The degree of similarity between the first target object and the second target object being greater than the preset degree of similarity includes: the first target object and the second target object belong to the same object category.
In this embodiment of this application, after the terminal device determines that the target objects on two frame images, for example adjacent frames, belong to the same object category, it can further determine whether they are the same object, which helps improve the accuracy of object recognition.
In a possible design, the moving speed indicates a ratio of a displacement vector to time, where the displacement vector is the displacement from a first pixel on the first target object to a second pixel on the second target object, and the time indicates the time interval at which the terminal device captures the first frame image and the second frame image; the second pixel is a pixel that the terminal device determines, according to a matching algorithm, to match the first pixel.
In this embodiment of this application, the terminal device determines the speed of the target object based on the positions of the pixels of the target object on adjacent frame images and the time interval at which the images are captured; if the speed is small, the target objects on the adjacent frame images are the same object. This helps improve the accuracy of object recognition.
In a possible design, the moving speed being less than the preset speed includes: the rate of the moving speed is less than a preset rate; and/or the angle between the direction of the moving speed and a preset direction is less than a preset angle, where the preset direction is the direction of movement from a third pixel to the first pixel, and the third pixel is a pixel that the terminal device determines, according to a matching algorithm, on the frame image preceding the first frame image to match the first pixel.
In this embodiment of this application, the terminal device may determine that the target objects on adjacent frame images are the same object when the speed and direction of the target objects on the adjacent frame images satisfy the conditions. This helps improve the accuracy of object recognition.
In a possible design, before the terminal device recognizes the first target object of the first frame image, the terminal device may further detect a user operation, and in response to the user operation, open the camera application, start the camera, and display a viewfinder interface, where the viewfinder interface displays a preview image captured by the camera, and the preview image includes the first frame image and the second frame image.
In this embodiment of this application, the camera application in the terminal device (for example, a mobile phone) can be used to recognize objects, and can determine whether objects on a dynamically changing preview image are the same object, which helps improve the accuracy of object recognition.
In a possible design, a first control is displayed in the viewfinder interface, and when the first control is triggered, the terminal device recognizes the target object in the preview image.
In this embodiment of this application, the object recognition function in the terminal device (for example, the camera application in a mobile phone) can be turned on or off through a control, which is flexible and convenient to operate.
In a possible design, after the terminal device recognizes that the first target object and the second target object are the same object, the terminal device may further output prompt information, where the prompt information is used to indicate that the first target object and the second target object are the same object.
In this embodiment of this application, when the terminal device recognizes that the target objects on two frame images are the same object, the terminal device prompts the user that they are the same object, which helps the user track the object and improves the accuracy of target object tracking.
In a possible design, before the terminal device recognizes the first target object of the first frame image, the terminal device may further display the first frame image; after the terminal device recognizes the first target object on the first frame image, the terminal device may display a label of the first target object on the first frame image, where the label includes related information of the first target object; before the terminal device recognizes the second target object on the second frame image, the terminal device may further display the second frame image; and after the terminal device determines that the first target object and the second target object are the same object, the terminal device continues to display the label on the second frame image.
In this embodiment of this application, when the terminal device recognizes the same target object in two frame images, it may display the same label, which includes related information of the target object. In this way, object recognition accuracy and user experience are improved, and the label displays related information of the object for the user's convenience.
In a possible design, after the terminal device determines that the first target object and the second target object are the same object, the display position of the label follows the movement of the first target object and the second target object.
In this embodiment of this application, when the terminal device recognizes that the target objects on two frame images are the same object, the display position of the object's label may move as the target object moves in the images, which helps the user track the object and improves the accuracy of target object tracking.
In a possible design, before the terminal device recognizes the first target object of the first frame image, the terminal device displays a chat interface of a communication application, where the chat interface includes a moving picture; the terminal device detects an operation on the moving picture and displays a second control, where the second control is used to trigger the terminal device to recognize the target object in the moving picture.
A terminal device such as a mobile phone can use the object recognition method provided in the embodiments of this application to recognize objects in an image (moving picture or video) sent in a WeChat chat interface.
In a possible design, before the terminal device recognizes the first target object of the first frame image, the terminal device is in a screen-locked state; the terminal device captures at least two frames of face images; and after the terminal determines that the face on the first frame image and the face on the second frame image are the same face, the terminal device is unlocked.
In this embodiment of this application, when the terminal device captures multiple frames of face images and the faces in these frames are the same face, the terminal device is unlocked, improving the accuracy of face recognition.
In a possible design, before the terminal device recognizes the first target object of the first frame image, the terminal displays a payment verification interface; the terminal device captures at least two frames of face images; and after the terminal determines that the face on the first frame image and the face on the second frame image are the same face, the terminal performs the payment process.
In this embodiment of this application, when the terminal device displays a payment interface (for example, a WeChat payment interface or an Alipay payment interface) and captures multiple frames of face images, the payment process is completed when the faces in these frames are the same face. This improves payment security.
According to a second aspect, an embodiment of this application provides a terminal device. The terminal device includes a processor and a memory, where the memory is configured to store one or more computer programs; and when the one or more computer programs stored in the memory are executed by the processor, the terminal device is enabled to implement the technical solution of the first aspect of the embodiments of this application or any possible design of the first aspect.
According to a third aspect, an embodiment of this application further provides a terminal device, where the terminal device includes modules/units for performing the method of the first aspect or any possible design of the first aspect; these modules/units may be implemented by hardware, or by hardware executing corresponding software.
According to a fourth aspect, a chip of an embodiment of this application is coupled to a memory in an electronic device and performs the technical solution of the first aspect of the embodiments of this application or any possible design of the first aspect; in the embodiments of this application, "coupled" means that two components are directly or indirectly combined with each other.
According to a fifth aspect, a computer storage medium of an embodiment of this application includes a computer program, and when the computer program runs on an electronic device, the electronic device performs the technical solution of the first aspect of the embodiments of this application or any possible design of the first aspect.
According to a sixth aspect, a computer program product of an embodiment of this application, when run on an electronic device, causes the electronic device to perform the technical solution of the first aspect of the embodiments of this application or any possible design of the first aspect.
Brief Description of Drawings
FIG. 1 is a schematic diagram of a camera imaging process according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a mobile phone according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a mobile phone according to an embodiment of the present invention;
FIG. 4 is a schematic flowchart of an object recognition method according to an embodiment of the present invention;
FIG. 5 is a schematic flowchart of an object recognition method according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of pixel movement speed according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a display interface of a mobile phone according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a display interface of a mobile phone according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a display interface of a mobile phone according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of a display interface of a mobile phone according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of this application are described below with reference to the accompanying drawings in the embodiments of this application.
Some terms used in the embodiments of this application are explained first, to facilitate understanding by those skilled in the art.
The original image in the embodiments of this application is the raw data obtained by the camera converting the light signal reflected by the captured target object into a digital image signal; the raw data may be unprocessed data. For example, the original image may be data in raw format, which includes information about the target object and camera parameters, such as ISO, shutter speed, aperture value, and white balance.
The preview image in the embodiments of this application is an image obtained by the terminal device processing the original image. For example, based on the camera parameters in the original image, the terminal device converts the original image into an image containing color information, such as an RGB image or YUV data. Generally, the preview image can be presented in an interface of the camera application, such as the viewfinder interface.
It should be noted that, because the original image captured by the camera changes dynamically (for example, the user moves while holding the terminal device, causing the camera's framing range to change; or the position or form of the target object itself changes), the original image may include multiple frames, and the position or form of the target object (for example, a person or an animal) differs between frames. Therefore, the preview image also changes dynamically, that is, the preview image may also include multiple frames.
Either the preview image or the original image can serve as the input image of the object recognition algorithm provided in the embodiments of this application. In the following, the preview image is used as an example of the input image of the object recognition algorithm provided in the embodiments of this application.
It should be noted that the images involved in the embodiments of this application, such as the original image and the preview image, may be in the form of pictures, or may be collections of data, for example, collections of parameters (such as pixels and color information).
A pixel in the embodiments of this application is the smallest imaging unit on one frame image. One pixel may correspond to one coordinate point on the image. One pixel may correspond to one parameter (such as grayscale), or to a set of multiple parameters (such as grayscale, luminance, and color).
The image plane coordinate system in the embodiments of this application is a coordinate system established on the imaging plane. Refer to FIG. 1, which is a schematic diagram of a camera imaging process according to an embodiment of this application. As shown in FIG. 1, when the camera photographs a person, it captures an image of the person and presents the captured image on the imaging plane. In FIG. 1, the image plane coordinate system is denoted o-x-y, where o is the origin of the image plane coordinate system, and the x-axis and y-axis are its coordinate axes. Pixels on the original image or the preview image can all be represented in the image plane coordinate system.
"At least one" in the embodiments of this application indicates one or more, where "multiple" means two or more.
It should be noted that the term "and/or" in this specification merely describes an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate the following three cases: only A exists, both A and B exist, and only B exists. In addition, unless otherwise specified, the character "/" in this specification generally indicates an "or" relationship between the associated objects. In the descriptions of the embodiments of the present invention, terms such as "first" and "second" are merely used for distinction in description, and shall not be understood as indicating or implying relative importance or an order.
The following describes a terminal device, a graphical user interface (GUI) for such a terminal device, and embodiments for using such a terminal device. In some embodiments of this application, the terminal device may be a portable terminal that includes a component with an image capture function, such as a camera, for example a mobile phone or a tablet computer. Exemplary embodiments of the portable terminal device include, but are not limited to, portable terminal devices running the operating systems listed in Figure PCTCN2018110525-appb-000001 or other operating systems. The portable terminal device may also be another portable terminal device, such as a digital camera, as long as it has the image capture function. It should also be understood that, in some other embodiments of this application, the terminal device may not be a portable terminal device, but a desktop computer with the image capture function.
Generally, the terminal device supports multiple applications, for example one or more of the following: a camera application, an instant messaging application, a photo management application, and so on. There may be multiple instant messaging applications, such as WeChat, Tencent QQ, WhatsApp Messenger, Line, Instagram, Kakao Talk, and DingTalk. Through an instant messaging application, a user can send information such as text, voice, pictures, video files, and other files to other contacts; or the user can make video or audio calls with other contacts through the instant messaging application.
Taking a mobile phone as the terminal device as an example, FIG. 2 shows a schematic structural diagram of the mobile phone 100.
The mobile phone 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headset jack 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, a barometric pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It can be understood that the structure shown in this embodiment of the present invention does not constitute a specific limitation on the mobile phone 100. In some other embodiments of this application, the mobile phone 100 may include more or fewer components than shown, or combine some components, or split some components, or have a different component arrangement. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), and the like. Different processing units may be independent components, or may be integrated in one or more processors.
The controller may be the nerve center and command center of the mobile phone 100. The controller may generate operation control signals based on instruction operation codes and timing signals, to complete the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 to store instructions and data. In some embodiments, the memory in the processor 110 is a cache, which may store instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, it can call them directly from the memory. This avoids repeated access and reduces the waiting time of the processor 110, thereby improving system efficiency.
The mobile phone 100 implements the display function through the GPU, the display screen 194, the application processor, and the like. The GPU is a microprocessor for image processing and connects the display screen 194 and the application processor. The GPU performs mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display screen 194 includes a display panel. The display panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Miniled, a MicroLed, a Micro-oLed, quantum dot light emitting diodes (QLED), and the like. In some embodiments, the mobile phone 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
The mobile phone 100 can implement the image shooting function through the processor 110, the camera 193, the display screen 194, and the like. The camera 193 is used to capture still images, moving images, or videos. Generally, the camera 193 may include a photosensitive assembly (such as a lens group) and an image sensor, where the lens group includes multiple lenses (convex or concave) for collecting the light signal reflected by the target object and passing the collected light signal to the image sensor. The image sensor generates an original image of the target object based on the light signal and sends the generated original image to the processor 110. The processor 110 processes the original image (for example, converts it into an image containing color information, such as an RGB image or YUV data) to obtain a preview image, and the display screen 194 displays the preview image.
After the processor 110 runs the object recognition algorithm provided in the embodiments of this application to recognize objects in the preview image (for example, the user actively triggers the processor 110 to run the object recognition algorithm provided in the embodiments of this application to recognize the target object in the preview image), the display screen 194 displays the preview image and related information of the target object recognized from the preview image.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. By running the instructions stored in the internal memory 121, the processor 110 executes the various functional applications and data processing of the mobile phone 100. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store the operating system and application programs required by at least one function (such as a sound playback function and an image playback function); the data storage area may store data created during use of the mobile phone 100 (such as audio data and a phone book). In addition, the internal memory 121 may include a high-speed random access memory, and may also include a nonvolatile memory, such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS).
The distance sensor 180F is used to measure distance. The mobile phone 100 can measure distance by infrared or laser. In some embodiments, in a shooting scenario, the mobile phone 100 can use the distance sensor 180F to measure distance for fast focusing. In some other embodiments, the mobile phone 100 can also use the distance sensor 180F to detect whether a person or object is approaching.
The proximity light sensor 180G may include, for example, a light-emitting diode (LED) and a light detector, such as a photodiode. The light-emitting diode may be an infrared light-emitting diode. The mobile phone 100 emits infrared light outward through the light-emitting diode and uses the photodiode to detect infrared light reflected from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the mobile phone 100; when insufficient reflected light is detected, the mobile phone 100 can determine that there is no object near the mobile phone 100. The mobile phone 100 can use the proximity light sensor 180G to detect that the user is holding the phone close to the ear during a call, so as to automatically turn off the screen to save power. The proximity light sensor 180G can also be used in leather-case mode and pocket mode to automatically unlock and lock the screen.
The ambient light sensor 180L is used to sense ambient light brightness. The mobile phone 100 can adaptively adjust the brightness of the display screen 194 according to the perceived ambient light brightness. The ambient light sensor 180L can also be used to automatically adjust the white balance when taking photos. The ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the mobile phone 100 is in a pocket, to prevent accidental touches.
The fingerprint sensor 180H is used to collect fingerprints. The mobile phone 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, application lock access, fingerprint photographing, fingerprint call answering, and so on.
The temperature sensor 180J is used to detect temperature. In some embodiments, the mobile phone 100 uses the temperature detected by the temperature sensor 180J to execute a temperature processing policy. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the mobile phone 100 reduces the performance of a processor located near the temperature sensor 180J, to reduce power consumption and implement thermal protection. In some other embodiments, when the temperature is lower than another threshold, the mobile phone 100 heats the battery 142 to avoid abnormal shutdown of the mobile phone 100 caused by low temperature. In some other embodiments, when the temperature is lower than yet another threshold, the mobile phone 100 boosts the output voltage of the battery 142 to avoid abnormal shutdown caused by low temperature.
The touch sensor 180K is also called a "touch panel". The touch sensor 180K may be provided on the display screen 194, and the touch sensor 180K and the display screen 194 form a touchscreen, also called a "touch screen". The touch sensor 180K is used to detect touch operations acting on or near it. The touch sensor may pass the detected touch operation to the application processor to determine the touch event type. Visual output related to the touch operation may be provided through the display screen 194. In some other embodiments, the touch sensor 180K may also be provided on the surface of the mobile phone 100, at a position different from that of the display screen 194.
In addition, the mobile phone 100 can implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headset jack 170D, the application processor, and the like. The mobile phone 100 can receive input from the button 190 and generate key signal input related to user settings and function control of the mobile phone 100. The mobile phone 100 can use the motor 191 to generate vibration prompts (such as incoming call vibration prompts). The indicator 192 in the mobile phone 100 may be an indicator light, which can be used to indicate the charging state and battery changes, and can also be used to indicate messages, missed calls, notifications, and so on. The SIM card interface 195 in the mobile phone 100 is used to connect a SIM card. The SIM card can be inserted into or pulled out of the SIM card interface 195 to make contact with or separate from the mobile phone 100.
The wireless communication function of the mobile phone 100 can be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.
The antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in the electronic device 100 can be used to cover a single or multiple communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization. For example, the antenna 1 can be multiplexed as a diversity antenna of the wireless local area network. In some other embodiments, the antennas can be used in combination with a tuning switch.
The mobile communication module 150 can provide wireless communication solutions applied on the electronic device 100, including 2G/3G/4G/5G. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like. The mobile communication module 150 can receive electromagnetic waves through the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modem processor for demodulation. The mobile communication module 150 can also amplify the signal modulated by the modem processor and convert it into electromagnetic waves through the antenna 1 for radiation. In some embodiments, at least some functional modules of the mobile communication module 150 may be provided in the processor 110. In some embodiments, at least some functional modules of the mobile communication module 150 and at least some modules of the processor 110 may be provided in the same device.
The modem processor may include a modulator and a demodulator. The modulator is used to modulate the low-frequency baseband signal to be sent into a medium-to-high-frequency signal. The demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing. After being processed by the baseband processor, the low-frequency baseband signal is passed to the application processor. The application processor outputs sound signals through audio devices (not limited to the speaker 170A and the receiver 170B), or displays images or videos through the display screen 194. In some embodiments, the modem processor may be an independent device. In some other embodiments, the modem processor may be independent of the processor 110 and provided in the same device as the mobile communication module 150 or other functional modules.
The wireless communication module 160 can provide wireless communication solutions applied on the electronic device 100, including wireless local area networks (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, frequency-modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110. The wireless communication module 160 can also receive signals to be sent from the processor 110, frequency-modulate and amplify them, and convert them into electromagnetic waves through the antenna 2 for radiation.
All the following embodiments can be implemented in a terminal device with the above hardware structure (for example, the mobile phone 100 or a tablet computer).
For ease of description, the object recognition algorithm provided in the embodiments of this application is introduced below through the components related to the algorithm; for details, see FIG. 3, and for the components in FIG. 3, refer to the related descriptions of FIG. 1. It should be noted that, in FIG. 3, the processor 110 integrating the application processor 110-1 is taken as an example.
In some embodiments of this application, object recognition performed through the mobile phone 100 shown in FIG. 3 may proceed as follows:
The display screen 194 of the mobile phone 100 displays the home screen, which includes icons of various applications (such as the phone application icon, the video player icon, the music player icon, the camera application icon, the browser application icon, and so on). Through the touch sensor 180K provided on the display screen 194 (not shown in FIG. 2; see FIG. 1), the user taps the icon of the camera application on the home screen to start the camera application and turn on the camera 193. The display screen 194 displays the interface of the camera application, for example the viewfinder interface.
The lens group 193-1-1 in the camera 193 collects the light signal reflected by the target object and passes the collected light signal to the image sensor 193-2. The image sensor 193-2 generates an original image of the target object based on the light signal and sends the original image to the application processor 110-1. The application processor 110-1 processes the original image (for example, converts it into an RGB image) to obtain a preview image; alternatively, the image sensor 193-2 may send the original image to another processor (for example, an ISP, not shown in FIG. 3), which processes the original image to obtain a preview image and sends the preview image to the application processor 110-1.
In some embodiments of this application, a specific control may be displayed in the interface of the camera application, and when the control is triggered, the mobile phone 100 starts the function of recognizing objects in the preview image. Specifically, when the touch sensor 180K in the mobile phone 100 detects that the user taps the specific control in the camera application interface (for example, the viewfinder interface), it triggers the application processor 110-1 to run the code of the object recognition algorithm provided in the embodiments of this application, to recognize the target object in the preview image.
In some other embodiments of this application, after the application processor 110-1 obtains the preview image (for example, after the application processor converts the original image into an RGB image), it may also automatically run the code of the object recognition algorithm provided in the embodiments of this application to recognize the objects in the preview image, without the user actively triggering object recognition.
In either manner, when the application processor 110-1 recognizes the target object in the preview image, the display screen 194 displays associated information of the target object (such as the name and type of the target object, described later).
It should be noted that the above description takes the processor 110 integrating the application processor 110-1 as an example. In fact, the processor 110 may integrate only a GPU, which performs the functions of the application processor 110-1 described above; or the processor 110 may integrate only a CPU, which performs the functions of the application processor 110-1 described above. In short, the embodiments of this application do not limit the entity that runs the code of the object recognition algorithm provided in the embodiments of this application.
In this embodiment of this application, the camera 193 may continuously capture images at a certain time interval, that is, capture multiple frames of images. Therefore, if the target object contained in each frame differs (in position, form, and so on), the multiple frames present a dynamically changing picture when displayed in the camera application interface (for example, the viewfinder interface). For example, a change in the target object's own position (for example, the target object moves from a first position to a second position) or form (for example, the target object changes from a first form to a second form) causes the display position and form of the target object to vary between the frames of the original images captured by the camera 193. The original image changes dynamically, and therefore the preview image also changes dynamically.
To accurately recognize the target object in the preview image, in this embodiment of this application, the application processor 110-1 running the code of the object recognition algorithm provided in the embodiments of this application may recognize the target object in the preview image in two processes. In the first process, the application processor 110-1 may recognize the target object in each frame of the preview image. In the second process, when the degree of similarity between two target objects on adjacent frame images is greater than a preset degree of similarity (for example, the two target objects belong to the same object category), the application processor 110-1 determines whether the two target objects are the same object. This is because, when the similarity between two target objects on adjacent frame images is high, the two target objects are likely to be the same object, and the application processor 110-1 can further determine whether they are. For example, the application processor 110-1 can evaluate the correlation between the target objects in adjacent frame images: if they are correlated (for example, the moving speed of the pixels of the target objects in adjacent frame images is less than or equal to a preset speed), the target objects in the adjacent frame images are the same object (the specific process is described later); if they are not correlated (for example, the moving speed of the pixels of the target objects in adjacent frame images is greater than the preset speed), the target objects in the adjacent frame images are not the same object.
For example, the user starts the camera application in the mobile phone 100 to photograph a cat whose form is changing (for example, lying down, standing). Therefore, the preview image in the viewfinder interface of the camera application changes dynamically. The mobile phone 100 can recognize that every frame includes a cat, but the cat's form differs between frames, so the mobile phone 100 can further determine whether the cats contained in adjacent frame images are the same cat.
It should be noted that, in the prior art, when a terminal device recognizes objects in multiple frames of images (such as a video or moving picture), each frame is recognized separately. Suppose the terminal device recognizes an object in the first frame image and the object's form changes in the next frame; the terminal device re-recognizes the object in the next frame, and because the form has changed, the terminal device may fail to recognize the object. Or, because the object's form has changed, the terminal device recognizes the object as a different one, that is, a recognition error, even though the object in the next frame and the first frame is actually the same object.
In the object recognition method provided in the embodiments of this application, when recognizing target objects in multiple frames of images (such as a video or moving picture), the terminal device can consider the correlation between adjacent frames (for example, the moving speed of the pixels of the target objects in adjacent frame images) to determine whether the target objects in adjacent frame images are the same object. Therefore, in the embodiments of this application, the terminal device can recognize not only the target object on each frame of multiple frames of images, but also whether the target objects in the multiple frames are the same object, improving object recognition accuracy.
The following describes the process in which the application processor 110-1 runs the code of the object recognition algorithm provided in the embodiments of this application to recognize the target object in the preview image (multiple frames of preview images).
Refer to FIG. 4, which is a schematic flowchart of the object recognition method provided in an embodiment of this application. As shown in FIG. 4, the application processor 110-1 runs the code of the object recognition algorithm to perform the following process:
S401: Obtain first feature information of a first target object in one frame of the preview image, and match, in a prestored object matching template, second feature information that matches the first feature information; the object corresponding to the second feature information in the object matching template is the first target object.
There are multiple implementations by which the mobile phone 100 obtains the first feature information of the first target object on one frame of the preview image, such as foreground-background separation or an edge detection algorithm, which are not limited in the embodiments of this application.
In this embodiment of this application, the mobile phone 100 may store an object matching template, which includes feature information of different categories of objects. The feature information of an object includes the object's edge contour, and color information, texture information, and the like of feature points (eyes, mouth, tail, and so on). The object matching template may be set before the mobile phone 100 leaves the factory, or may be customized by the user during use of the mobile phone 100.
Assuming that the object matching template is set before the mobile phone 100 leaves the factory, the following describes the process of obtaining the object matching template before the mobile phone 100 leaves the factory.
A designer may use multiple images of the same target object as input images of the mobile phone 100 and recognize the multiple images, where the target object's form differs in each image. The mobile phone 100 thus obtains the feature information of the target object in each image.
Taking a cat as the target object as an example, the designer can obtain 100 images of the cat in different forms (for example, taken by the designer or obtained from the network side), where the cat's form differs in each of the 100 images. The mobile phone 100 recognizes the target object (the cat) in each image and saves the feature information of the target object (the cat), thereby obtaining feature information for 100 forms of the target object. The feature information may be the target object's edge contour in each form, and color information, texture information, and the like of feature points (eyes, mouth, tail, and so on).
For example, the mobile phone 100 may store the feature information of objects, that is, the object matching template, in the form of a table (such as Table 1).
Table 1
(The contents of Table 1 are provided as an image in the original publication: Figure PCTCN2018110525-appb-000002.)
Table 1 shows only two form templates for an object category such as cat (for example, lying down or standing); in practical applications, other forms may also be included. That is, Table 1 is merely an example of an object matching template, and those skilled in the art can refine Table 1. Continuing the earlier example, the designer obtains 100 images of the cat in different forms and recognizes them to obtain the cat's feature information in 100 forms, that is, there are 100 forms corresponding to cat in Table 1. Of course, there are many kinds of cats, and the feature information of various forms of different kinds of cats (that is, multiple cat object categories) can be obtained in a similar manner, which is not limited in the embodiments of this application.
Therefore, in this embodiment of this application, when recognizing the target object on a certain frame of the preview image, the application processor 110-1 may first obtain the feature information of the target object on that frame (the edge contour, and the color information and texture information of feature points, and so on). If the obtained feature information matches certain feature information in the object matching template (such as Table 1), the application processor 110-1 determines that the object corresponding to the matched feature information is the target object. Therefore, to recognize different objects on each frame, the object matching template may include as many objects as possible (such as animals, people, and other deformable objects), and the feature information of each object in different forms. In this way, the mobile phone 100 stores feature information of various objects in different forms, and the application processor 110-1 can recognize target objects in different forms on each frame. Of course, the object matching template can also be updated, for example manually by the user or automatically by the mobile phone 100.
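As a toy illustration of this lookup — not the patent's algorithm, since the text leaves the feature extraction and matching methods open — S401 might be sketched as follows; the flat feature vectors, the cosine similarity measure, the 0.8 threshold, and the stored values are all assumptions made here for illustration:

    import math

    # Toy object matching template: category -> feature info per stored form.
    TEMPLATES = {
        "cat": [[0.9, 0.1, 0.3], [0.7, 0.4, 0.2]],  # e.g., standing, lying down
        "dog": [[0.2, 0.8, 0.5]],
    }

    def cosine(a, b):
        """Cosine similarity between two feature vectors."""
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0

    def match_category(features, threshold=0.8):
        """Return the category whose stored feature information best matches
        the features extracted from the frame, or None if nothing matches."""
        best_cat, best_sim = None, threshold
        for cat, forms in TEMPLATES.items():
            for form in forms:
                sim = cosine(features, form)
                if sim > best_sim:
                    best_cat, best_sim = cat, sim
        return best_cat

    print(match_category([0.85, 0.15, 0.28]))  # -> "cat"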
S402: Obtain third feature information of a second target object in the next frame of the preview image, and match, in the prestored object matching template, fourth feature information that matches the third feature information; the object corresponding to the fourth feature information in the object matching template is the second target object.
As described above, the preview image changes dynamically. Therefore, in this embodiment of this application, the application processor 110-1 may perform S401 on every frame of the preview image. After the application processor 110-1 determines the first target object in one frame image, it can recognize the second target object in the next frame of the preview image in the same manner. Since the application processor 110-1 recognizes the first target object and the second target object in the same manner (using the object matching template), the recognized first target object and second target object may be of the same object category (for example, both cats), or may not be of the same object category (for example, the first target object is a cat and the second target object is a dog).
As one example, in this embodiment of this application, when the recognized first target object and second target object are of the same object category (for example, both cats), the application processor 110-1 may determine that the two target objects are the same object. As another example, to improve object recognition accuracy, when the recognized first target object and second target object are of the same object category, the application processor 110-1 may further determine whether the first target object and the second target object are the same object, that is, continue to perform the subsequent steps; when the recognized first target object and second target object are not of the same object category, the application processor 110-1 may skip the subsequent steps.
S403: Determine a first pixel of the first target object on the one frame image.
It should be noted that every frame image is presented on the imaging plane, so after the application processor 110-1 recognizes the first target object, it can determine the first pixel on the first target object in the image plane coordinate system. The first pixel may be the coordinates of the center position of the first target object, or the coordinates of a feature point (such as an eye) on the first target object.
S404: Determine, according to a preset algorithm, a second pixel on the second target object on the next frame image corresponding to the first pixel.
There are multiple possibilities for the first pixel selected by the application processor 110-1, for example the coordinates of the center position of the first target object or the coordinates of a feature point on the first target object. If the first pixel is the center position coordinates of the first target object, the second pixel is the center position coordinates of the second target object; specifically, the application processor 110-1 can determine the center position of the target object according to a filtering algorithm (such as the Kalman filter algorithm), which is not elaborated in the embodiments of this application. If the first pixel is a feature point (such as an eye) on the first target object, the second pixel is the feature point (such as an eye) on the second target object; specifically, the application processor 110-1 can use a matching algorithm (such as a similarity matching algorithm) to find, on the second target object, the feature point that matches the feature point on the first target object.
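For the center-position variant of S403/S404, a minimal sketch is shown below. Representing the recognized object by an axis-aligned bounding box is an assumption made here for illustration; the text itself mentions a Kalman filter for the center position and similarity matching for feature points:

    def bbox_center(box):
        """Center pixel of an object bounding box (x_min, y_min, x_max, y_max)."""
        x_min, y_min, x_max, y_max = box
        return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)

    # First pixel A on the one frame image, second pixel B on the next frame.
    A = bbox_center((100, 60, 180, 140))  # -> (140.0, 100.0)
    B = bbox_center((104, 62, 186, 146))  # -> (145.0, 104.0)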
Refer to FIG. 5, taking a cat as the target object, where the cat's form changes. The camera captures multiple frames of the preview image: one frame is taken of the cat in the solid-line state, and the next frame is taken of the cat in the dashed-line state. The application processor 110-1 recognizes that the two target objects in these two frames are both cats. The application processor 110-1 determines the first pixel of the first target object (the cat in the solid-line state) in the one frame, that is, point A on the imaging plane, and the second pixel of the second target object (the cat in the dashed-line state) in the next frame, that is, point B on the imaging plane. It should be noted that the pixels of both the one frame image and the next frame image are presented on the imaging plane, so in FIG. 5 the first pixel of the one frame image and the second pixel of the next frame image are both marked on the imaging plane; in fact, however, the first pixel and the second pixel are pixels on two different images.
S405: According to the first coordinates (x1, y1) of the first pixel A and the second coordinates (x2, y2) of the second pixel B, determine the moving speed v = ((x2 − x1)/t, (y2 − y1)/t), where t indicates the time interval at which the camera captures the one frame image and the next frame image.
It should be noted that, when the position or form of the target object itself changes, that is, when an object point on the target object changes, the image point corresponding to that object point (a pixel in the image plane coordinate system) also changes, and correspondingly the display position of the target object in the preview image changes. Therefore, the changes in the position and form of an object in the preview image can reflect the changes in the position and form of the object in the real environment.
Generally, the time interval at which the camera 193 captures two frames of images is short (for example, 30 ms); the time interval may be set before the mobile phone 100 leaves the factory, or may be customized by the user during use of the mobile phone 100. Within this time interval, the position and form of the target object in the real environment change little. That is to say, although the position or form of the target object constantly changes in the real environment, the camera 193 can continuously capture images of the target object at a short time interval, so the position or form of the target object changes little between adjacent frame images. Therefore, the application processor 110-1 can determine whether the moving speed between two pixels on the two target objects in adjacent frame images is less than the preset speed: if it is less, the two target objects are the same object; if it is greater, the two target objects are not the same object.
Therefore, the application processor 110-1 can determine, according to the first coordinates (x1, y1) of the first pixel A and the second coordinates (x2, y2) of the second pixel B, the speed v at which the first pixel A moves to the second pixel B.
S406: If the rate of v is less than the preset rate v0 and the angle between its direction and the preset direction is less than the preset angle, determine that the first target object and the second target object are the same object.
The speed v at which the first pixel A moves to the second pixel B includes a rate and a direction. Specifically, when the rate is less than the preset rate and the angle between the direction and the preset direction is less than the preset angle, the first target object and the second target object are the same object. The preset rate may be set before the phone leaves the factory, for example determined by the designer based on experience or experiments.
The preset direction may be a direction determined according to the images before the one frame image. The following describes the process in which the application processor 110-1 determines the preset direction.
As an example, the preview image has multiple frames, and the coordinates in the imaging plane of the center position of the target object in each frame of the preview image are shown in FIG. 6 (a black dot in the figure represents the center position of the target object in one frame of the preview image). Since the position or form of the target object is changing, the center position of the target object on the imaging plane is also changing. The application processor 110-1 determines, according to the one frame image and the previous frame image (the adjacent frame image) before it, that the direction of the speed between the two pixels on the two target objects (the center positions of the two target objects) is v1; the direction of v1 is the preset direction.
The application processor 110-1 determines, according to the one frame image and the next frame image (the adjacent frame image) after it, that the direction of the speed between the two pixels on the two target objects (the center positions of the two target objects) is v2. Then, when the angle between v1 and v2 is smaller than the preset angle (for example, 10 degrees), the two target objects on the one frame image and the next frame image (adjacent frame images) are determined to be the same object.
In addition, the preset direction may also be a user-defined direction, a direction set before the mobile phone 100 leaves the factory, or a direction determined in other ways, which is not limited in the embodiments of this application.
It should be noted that, when determining whether two target objects on adjacent frame images are the same object, the mobile phone 100 may consider both the direction and the rate of the speed, or may consider only the rate and not the direction; that is, when the rate is less than the preset rate, the two target objects are determined to be the same object. Alternatively, the mobile phone 100 may consider only the direction of the speed and not the rate; that is, when the angle between the speed direction and the preset direction is smaller than the preset angle, the two target objects are determined to be the same object.
It should be noted that, in the above embodiment, the mobile phone 100 first determines whether the first target object and the second target object are of the same object category, and then determines whether the moving speed of the first target object and the second target object is less than the preset speed. In practical applications, the order of these two processes is not limited. For example, the mobile phone 100 may first determine whether the moving speed of the first target object and the second target object is less than the preset speed, and then determine whether the first target object and the second target object belong to the same object category. Alternatively, when the mobile phone 100 determines that the first target object and the second target object are of the same object category, it may directly determine that they are the same object (without determining whether their moving speed is less than the preset speed); or, when the mobile phone 100 determines that the moving speed of the first target object and the second target object is less than the preset speed, it may directly determine that they are the same object (without determining whether they are of the same object category).
It should be noted that the embodiment shown in FIG. 4 above takes one frame image and the next frame image as an example. In practical applications, the application processor 110-1 can process every pair of adjacent frame images in a video or moving picture using the method flow shown in FIG. 4.
The above description takes the camera application in the mobile phone 100 (the camera application included with the mobile phone 100, or another camera application downloaded from the network side, such as a beauty camera) as an example. In fact, the object recognition algorithm provided in the embodiments of this application can also be applied to other scenarios that require a camera to capture images, such as QQ video and WeChat video. As another example, the object recognition algorithm provided in the embodiments of this application can recognize not only target objects in images captured by the camera, but also target objects in moving pictures or videos sent by other devices (for example, received through the mobile communication module 150 or the wireless communication module 160), or in moving pictures or videos downloaded from the network side, which is not limited in the embodiments of this application.
In this embodiment of this application, after recognizing the target object in the preview image, the mobile phone 100 can display related information of the target object, such as the name and category of the target object, or a web page link (for example, a purchase link for opening the purchase information of the target object), which is not limited in the embodiments of this application. In addition, the mobile phone 100 can display the related information of the target object in multiple ways, as text information or as an icon. Taking an icon as an example, when the phone detects that the user triggers the icon, it displays the related information of the target object.
As examples, refer to FIGS. 7-9, which show several application scenarios in which the mobile phone 100 recognizes objects according to the embodiments of this application.
As shown in FIG. 7, the display interface of the mobile phone 100 displays a WeChat chat interface 701, which contains a moving picture 702 sent by Amy. When the mobile phone 100 detects that the user triggers the moving picture 702 (for example, long-presses the moving picture 702), the mobile phone 100 displays a recognition control 703; alternatively, after the mobile phone 100 detects that the user triggers the moving picture 702, it enlarges the moving picture, and when the mobile phone 100 detects that the user long-presses the enlarged moving picture, it displays the recognition control 703. When the mobile phone 100 detects that the user triggers the recognition control 703, the objects in the moving picture 702 are recognized according to the object recognition method provided in the embodiments of this application.
As shown in FIG. 8, the display interface of the mobile phone 100 is the viewfinder interface 801 of the camera application, which displays a preview image 802 (dynamically changing). The viewfinder interface 801 includes a control 803. When the user triggers the control 803, the objects in the preview image 802 are recognized according to the object recognition algorithm provided in the embodiments of this application. It should be noted that the control 803 in FIG. 8 is only an example; in practical applications, the control 803 may be displayed in other forms or at other positions, which is not limited in the embodiments of this application.
When the mobile phone 100 recognizes an object in the preview image, it can display related information of the object. For example, referring to (b) in FIG. 8, the mobile phone 100 displays a label 804 of an object (such as a flower), and the label 804 displays the name of the recognized flower. When the mobile phone 100 detects that the label 804 is triggered, it displays more detailed information of the object (the flower), for example its origin, aliases, and planting method; alternatively, when the mobile phone 100 detects that the label 804 is triggered, it opens another application (such as Baidu Encyclopedia) and displays more detailed information of the object in that application's interface, which is not limited in the embodiments of this application. It should be noted that, when the position of the object in the preview image changes, the display position of the label 804 in the preview image may also change with the object's position.
As shown in FIG. 9, the mobile phone 100 displays a scanning frame 901, and when the image of an object is displayed in the scanning frame 901, a scanning control 902 is displayed. When the mobile phone 100 detects that the user triggers the scan-to-recognize control 902, it recognizes the image in the scanning frame 901 according to the object recognition method provided in the embodiments of this application. The embodiment shown in FIG. 9 can be applied to scenarios with a scanning function, such as Taobao and Alipay. Taking Taobao as an example, when the mobile phone 100 recognizes the object in the scanning frame 901, it can display a purchase link for the object.
It should be noted that FIGS. 7-9 above show only examples of several application scenarios, and the object recognition algorithm provided in the embodiments of this application can also be applied to other scenarios. For example, in the field of video surveillance, referring to FIG. 10, the person on the display is undergoing changes in position and form; through the object recognition algorithm provided in the embodiments of this application, the same person in the surveillance video can be tracked more accurately. When the person in the video moves, the display position of the person's label (a mark used to identify a person, such as a specific symbol or color) can also move, improving the accuracy of object tracking.
As another example, the object recognition algorithm provided in the embodiments of this application can be applied to the scenario of unlocking a terminal device through face recognition: when the mobile phone 100 captures multiple frames of face images and the faces in these frames are the same face, the terminal device is unlocked. As another example, the object recognition algorithm provided in the embodiments of this application can also be applied to the scenario of paying by face: when the mobile phone 100 displays a payment interface (such as a WeChat payment interface or an Alipay payment interface) and captures multiple frames of face images, the payment process is completed when the faces in these frames are the same face. Similarly, the object recognition algorithm provided in the embodiments of this application can also be applied to face-swipe clock-in scenarios, which are not described in detail.
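A face-unlock flow built on this idea could gate unlocking on every adjacent pair of captured face frames passing the same-object test. This is a sketch under the assumptions of the earlier is_same_object example, not the patent's unlock logic:

    def should_unlock(face_frames, t):
        """Unlock only if each adjacent pair of captured face images passes
        the same-object test (is_same_object from the earlier sketch)."""
        if len(face_frames) < 2:
            return False
        return all(is_same_object(f1, f2, t)
                   for f1, f2 in zip(face_frames, face_frames[1:]))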
As another example, a separate application may be provided in the mobile phone 100 for photographing objects to recognize them, so that the user can conveniently learn about objects.
Of course, the object recognition method provided in the embodiments of this application can also be applied to game applications, such as augmented reality (AR) or virtual reality (VR) applications. Taking VR as an example, a virtual reality device (such as a mobile phone or a computer) can recognize the same object on different images, display the label of the object, and present the object and the label to the user through a virtual reality display (for example, AR glasses).
In the above embodiments, the mobile phone 100 is taken as an example, and the display screen of the mobile phone 100 displays the recognized object and related information of the object. When the object recognition method provided in the embodiments of this application is applied to other scenarios, the object recognized by the mobile phone 100 and related information of the object may also be displayed through other display screens (such as an external display), which is not limited in the embodiments of this application.
The various implementations of this application can be combined arbitrarily to achieve different technical effects.
In the embodiments provided above, the method provided in the embodiments of this application is described from the perspective of the terminal device (the mobile phone 100) as the execution subject. To implement the functions in the method provided in the embodiments of this application, the terminal may include a hardware structure and/or a software module, and implement the functions in the form of a hardware structure, a software module, or a hardware structure plus a software module. Whether a certain function is performed by a hardware structure, a software module, or a hardware structure plus a software module depends on the specific application and design constraints of the technical solution.
Based on the same concept, an embodiment of this application provides a terminal device that can perform the methods in the embodiments shown in FIG. 2 to FIG. 9 above. The terminal device includes a processing unit and a display unit, wherein:
the processing unit is configured to recognize a first target object of a first frame image, and to recognize a second target object of a second frame image adjacent to the first frame image;
if the degree of similarity between the first target object and the second target object is greater than a preset degree of similarity, and the moving speed is less than a preset speed, determine that the first target object and the second target object are the same object;
the display unit is configured to display the first frame image or the second frame image.
These modules/units may be implemented by hardware, or by hardware executing corresponding software.
When the terminal device is the mobile phone 100 shown in FIG. 2, the processing unit may be the processor 110 shown in FIG. 2, the application processor 110-1 shown in FIG. 3, or another processor. The display screen may be the display screen 194 shown in FIG. 2, or another display screen connected to the terminal device (such as an external display screen).
An embodiment of this application further provides a computer storage medium. The storage medium may include a memory that stores a program; when the program is executed, the electronic device performs all the steps recorded in the method embodiment shown in FIG. 4 above.
An embodiment of this application further provides a computer program product, which, when run on an electronic device, causes the electronic device to perform all the steps recorded in the method embodiment shown in FIG. 4 above.
It should be noted that the division of units in the embodiments of this application is schematic and is only a division of logical functions; there may be other division manners in actual implementation. The functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. For example, in the above embodiments, the first obtaining unit and the second obtaining unit may be the same unit or different units. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
As used in the above embodiments, depending on the context, the term "when" may be interpreted to mean "if", "after", "in response to determining", or "in response to detecting". Similarly, depending on the context, the phrase "when it is determined" or "if (the stated condition or event) is detected" may be interpreted to mean "if it is determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)".
In the above embodiments, the implementation may be done wholly or partly by software, hardware, firmware, or any combination thereof. When implemented by software, it may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (such as coaxial cable, optical fiber, or digital subscriber line) or wireless (such as infrared, radio, or microwave) means. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid-state drive), or the like.
For the purpose of explanation, the foregoing description has been given with reference to specific embodiments. However, the above exemplary discussion is not intended to be exhaustive, nor to limit this application to the precise forms disclosed. Many modifications and variations are possible in light of the above teachings. The embodiments were chosen and described to fully illustrate the principles of this application and its practical applications, thereby enabling others skilled in the art to make full use of this application and the various embodiments with various modifications suited to the particular use contemplated.

Claims (15)

  1. An object recognition method, wherein the method comprises:
    recognizing, by a terminal device, a first target object of a first frame image;
    recognizing, by the terminal device, a second target object of a second frame image adjacent to the first frame image; and
    if the degree of similarity between the first target object and the second target object is greater than a preset degree of similarity, and the moving speed is less than a preset speed, determining, by the terminal device, that the first target object and the second target object are the same object.
  2. The method according to claim 1, wherein recognizing, by the terminal device, the first target object of the first frame image comprises:
    obtaining, by the terminal device, first feature information in the first frame image;
    matching, by the terminal device in a prestored object matching template, second feature information that matches the first feature information, wherein the object matching template comprises correspondences between objects and feature information; and
    determining, by the terminal device, that the object corresponding to the second feature information in the object matching template is the first target object;
    wherein the degree of similarity between the first target object and the second target object being greater than the preset degree of similarity comprises:
    the first target object and the second target object belonging to the same object category.
  3. The method according to claim 1 or 2, wherein the moving speed indicates a ratio of a displacement vector to time, the displacement vector being the displacement from a first pixel on the first target object to a second pixel on the second target object, and the time indicating the time interval at which the terminal device captures the first frame image and the second frame image; wherein the second pixel is a pixel that the terminal device determines, according to a matching algorithm, to match the first pixel.
  4. The method according to any one of claims 1-3, wherein the moving speed being less than the preset speed comprises:
    the rate of the moving speed being less than a preset rate; and/or
    the angle between the direction of the moving speed and a preset direction being less than a preset angle;
    wherein the preset direction is the direction of movement from a third pixel to the first pixel, and the third pixel is a pixel that the terminal device determines, according to a matching algorithm, on the frame image preceding the first frame image to match the first pixel.
  5. The method according to any one of claims 1-4, wherein before the terminal device recognizes the first target object of the first frame image, the method further comprises:
    detecting, by the terminal device, a user operation, and in response to the user operation, opening a camera application, starting a camera, and displaying a viewfinder interface, wherein the viewfinder interface displays a preview image captured by the camera, and the preview image comprises the first frame image and the second frame image.
  6. The method according to claim 5, wherein the method further comprises: displaying a first control in the viewfinder interface, and when the first control is triggered, recognizing, by the terminal device, the target object in the preview image.
  7. The method according to any one of claims 1-6, wherein after the terminal device recognizes that the first target object and the second target object are the same object, the method further comprises:
    outputting, by the terminal device, prompt information, the prompt information being used to indicate that the first target object and the second target object are the same object.
  8. The method according to any one of claims 1-4, wherein before the terminal device recognizes the first target object of the first frame image, the method further comprises:
    displaying, by the terminal device, a chat interface of a communication application, the chat interface comprising a moving picture; and
    detecting, by the terminal device, an operation on the moving picture, and displaying a second control, the second control being used to trigger the terminal device to recognize the target object in the moving picture.
  9. The method according to any one of claims 1-4, wherein before the terminal device recognizes the first target object of the first frame image, the method further comprises:
    displaying, by the terminal device, the first frame image;
    after the terminal device recognizes the first target object on the first frame image, the method further comprises:
    displaying, by the terminal device, a label of the first target object on the first frame image, the label comprising related information of the first target object;
    before the terminal device recognizes the second target object on the second frame image, the method further comprises:
    displaying, by the terminal device, the second frame image; and
    after the terminal device determines that the first target object and the second target object are the same object, the method further comprises:
    continuing, by the terminal device, to display the label on the second frame image.
  10. The method according to claim 9, wherein the method further comprises:
    after the terminal device determines that the first target object and the second target object are the same object, the display position of the label following the movement of the first target object and the second target object.
  11. The method according to any one of claims 1-4, wherein before the terminal device recognizes the first target object of the first frame image, the method further comprises:
    the terminal device being in a screen-locked state; and
    capturing, by the terminal device, at least two frames of face images; and
    after the terminal determines that the face on the first frame image and the face on the second frame image of the at least two frames of face images are the same face, the method further comprises: unlocking the terminal device.
  12. The method according to any one of claims 1-4, wherein before the terminal device recognizes the first target object of the first frame image, the method further comprises:
    displaying, by the terminal, a payment verification interface; and
    capturing, by the terminal device, at least two frames of face images; and
    after the terminal determines that the face on the first frame image and the face on the second frame image of the at least two frames of face images are the same face, the method further comprises: performing, by the terminal, a payment process.
  13. A terminal device, comprising a processor and a memory;
    wherein the memory is configured to store one or more computer programs; and
    when the one or more computer programs stored in the memory are executed by the processor, the electronic device is enabled to implement the method according to any one of claims 1 to 12.
  14. A computer storage medium, wherein the computer-readable storage medium comprises a computer program, and when the computer program runs on an electronic device, the electronic device performs the method according to any one of claims 1 to 12.
  15. A computer program product, wherein when the computer program product is executed by a computer, the computer performs the method according to any one of claims 1 to 12.
PCT/CN2018/110525 2018-10-16 2018-10-16 Object recognition method and terminal device WO2020077544A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
PCT/CN2018/110525 WO2020077544A1 (zh) 2018-10-16 2018-10-16 Object recognition method and terminal device
CN201880087013.3A CN111615704A (zh) 2018-10-16 2018-10-16 Object recognition method and terminal device
EP18937263.4A EP3855358A4 (en) 2018-10-16 2018-10-16 OBJECT DETECTION METHOD AND TERMINAL DEVICE
US17/231,352 US20210232853A1 (en) 2018-10-16 2021-04-15 Object Recognition Method and Terminal Device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/110525 WO2020077544A1 (zh) 2018-10-16 2018-10-16 Object recognition method and terminal device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/231,352 Continuation US20210232853A1 (en) 2018-10-16 2021-04-15 Object Recognition Method and Terminal Device

Publications (1)

Publication Number Publication Date
WO2020077544A1 true WO2020077544A1 (zh) 2020-04-23

Family

ID=70283704

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/110525 WO2020077544A1 (zh) 2018-10-16 2018-10-16 Object recognition method and terminal device

Country Status (4)

Country Link
US (1) US20210232853A1 (zh)
EP (1) EP3855358A4 (zh)
CN (1) CN111615704A (zh)
WO (1) WO2020077544A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113609317B (zh) * 2021-09-16 2024-04-02 杭州海康威视数字技术股份有限公司 Image library construction method and apparatus, and electronic device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102831385A (zh) * 2011-06-13 2012-12-19 索尼公司 Device and method for target identification in a multi-camera surveillance network
US20150104073A1 (en) * 2013-10-16 2015-04-16 Xerox Corporation Delayed vehicle identification for privacy enforcement
CN107798292A (zh) * 2017-09-20 2018-03-13 翔创科技(北京)有限公司 Object recognition method, computer program, storage medium, and electronic device
CN108509436A (zh) * 2017-02-24 2018-09-07 阿里巴巴集团控股有限公司 Method and apparatus for determining a recommended object, and computer storage medium

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0502371D0 (en) * 2005-02-04 2005-03-16 British Telecomm Identifying spurious regions in a video frame
KR100883066B1 (ko) * 2007-08-29 2009-02-10 엘지전자 주식会사 Apparatus and method for displaying a subject's moving path using text
CN101968884A (zh) * 2009-07-28 2011-02-09 索尼株式会社 Method and apparatus for detecting a target in video images
CN102880623B (zh) * 2011-07-13 2015-09-09 富士通株式会社 Method and system for searching for persons with the same name
US9479703B2 (en) * 2014-09-28 2016-10-25 Hai Yu Automatic object viewing methods and apparatus
US10222932B2 (en) * 2015-07-15 2019-03-05 Fyusion, Inc. Virtual reality environment based manipulation of multilayered multi-view interactive digital media representations
CN107871107A (zh) * 2016-09-26 2018-04-03 北京眼神科技有限公司 Face authentication method and apparatus
US10497382B2 (en) * 2016-12-16 2019-12-03 Google Llc Associating faces with voices for speaker diarization within videos
CN107657160B (zh) * 2017-09-12 2020-01-31 Oppo广东移动通信有限公司 Facial information collection method and related products
US11507646B1 (en) * 2017-09-29 2022-11-22 Amazon Technologies, Inc. User authentication using video analysis
CN107741996A (zh) * 2017-11-30 2018-02-27 北京奇虎科技有限公司 Family graph construction method and apparatus based on face recognition, and computing device
CN108197570A (zh) * 2017-12-28 2018-06-22 珠海市君天电子科技有限公司 People counting method and apparatus, electronic device, and storage medium
CN108549867B (zh) * 2018-04-12 2019-12-20 Oppo广东移动通信有限公司 Image processing method and apparatus, computer-readable storage medium, and electronic device
TWI679612B (zh) * 2018-08-14 2019-12-11 國立交通大學 Image tracking method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102831385A (zh) * 2011-06-13 2012-12-19 索尼公司 Device and method for target identification in a multi-camera surveillance network
US20150104073A1 (en) * 2013-10-16 2015-04-16 Xerox Corporation Delayed vehicle identification for privacy enforcement
CN108509436A (zh) * 2017-02-24 2018-09-07 阿里巴巴集团控股有限公司 Method and apparatus for determining a recommended object, and computer storage medium
CN107798292A (zh) * 2017-09-20 2018-03-13 翔创科技(北京)有限公司 Object recognition method, computer program, storage medium, and electronic device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3855358A4 *

Also Published As

Publication number Publication date
EP3855358A4 (en) 2021-10-27
CN111615704A (zh) 2020-09-01
US20210232853A1 (en) 2021-07-29
EP3855358A1 (en) 2021-07-28

Similar Documents

Publication Publication Date Title
WO2020259038A1 (zh) Photographing method and device
EP3813352B1 (en) Photographing method and electronic device
CN112840642B (zh) Image capturing method and terminal device
WO2021037157A1 (zh) Image recognition method and electronic device
CN113741681B (zh) Image correction method and electronic device
CN113660408B (zh) Video shooting anti-shake method and apparatus
US11627437B2 (en) Device searching method and electronic device
WO2022228274A1 (zh) Preview image display method in zoom shooting scenario and electronic device
WO2021179186A1 (zh) Focusing method and apparatus, and electronic device
WO2020103732A1 (zh) Wrinkle detection method and terminal device
WO2020077544A1 (zh) Object recognition method and terminal device
CN115150542B (zh) Video anti-shake method and related device
WO2022033344A1 (zh) Video anti-shake method, terminal device, and computer-readable storage medium
CN113970965A (zh) Message display method and electronic device
CN116074624B (zh) Focusing method and apparatus
CN115633255B (zh) Video processing method and electronic device
WO2022206589A1 (zh) Image processing method and related device
WO2021197014A1 (zh) Picture transmission method and apparatus
CN117115003A (zh) Noise removal method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18937263

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2018937263

Country of ref document: EP

Effective date: 20210422