CN111291638A - Object comparison method, system, equipment and medium - Google Patents

Object comparison method, system, equipment and medium

Info

Publication number
CN111291638A
CN111291638A
Authority
CN
China
Prior art keywords
image
images
frame
compared
deep neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010061250.1A
Other languages
Chinese (zh)
Inventor
周曦
姚志强
袁余锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yunconghuilin Artificial Intelligence Technology Co Ltd
Original Assignee
Shanghai Yunconghuilin Artificial Intelligence Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Yunconghuilin Artificial Intelligence Technology Co Ltd
Priority to CN202010061250.1A
Publication of CN111291638A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/168 Feature extraction; Face representation
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an object comparison method, system, device and medium, comprising: acquiring images to be compared that contain one or more objects; processing the images to be compared and mapping them into the same comparison space through at least two deep neural networks; and comparing the images to be compared within that comparison space to determine whether one or more identical objects exist in them. The invention inputs multi-frame images containing one or more human faces into at least two deep neural networks and can thereby determine whether one or more faces contained in one frame of one or more continuous frame images also appear in another frame of those images. At the same time, because deep neural networks are used, prior information such as illumination, occlusion, angle, age and ethnicity can conveniently be incorporated during face image recognition, enhancing adaptability to face images and the expressive power of the features.

Description

Object comparison method, system, equipment and medium
Technical Field
The present invention relates to image recognition technologies, and in particular, to a method, a system, a device, and a medium for comparing objects.
Background
In recent years, object recognition technologies (for human faces, human bodies and the like) have been widely applied in building smart cities, safe cities and similar projects. However, more than 80% of existing cameras cannot capture a clear face or body under all circumstances; in addition, as criminals' counter-surveillance awareness grows, they may deliberately avoid cameras, so face or body information cannot be captured in time, and timely alerting and response become difficult. Moreover, in a real scene a single camera usually cannot cover the whole area, and the fields of view of multiple cameras generally do not overlap. It is therefore important to construct whole-body information with which a pedestrian can be locked onto, searched for and located: using such pedestrian features, a complete movement trajectory can be built, achieving cross-camera tracking of the pedestrian.
Disclosure of Invention
In view of the above-mentioned shortcomings in the prior art, an object of the present invention is to provide an object comparison method, system, device and medium, which are used to solve the technical problems in the prior art.
In order to achieve the above and other related objects, the present invention provides an object comparison method, comprising the following steps:
acquiring an image to be compared containing one or more objects; wherein, the images to be compared are at least two images;
processing the images to be compared, and mapping the images to be compared to the same comparison space through at least two deep neural networks;
and comparing the images to be compared through the same comparison space, and determining whether one or more identical objects exist in the images to be compared.
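For concreteness, the following is a minimal sketch of these three steps in PyTorch. The patent names no architecture, framework, feature dimension or similarity threshold, so the ResNet-18 backbone, the 256-dimensional embedding and the 0.6 threshold below are illustrative assumptions, not the claimed implementation.

```python
# Hedged sketch of the claimed pipeline; every concrete choice here
# (ResNet-18 backbone, 256-d features, 0.6 threshold) is an assumption.
import torch
import torch.nn as nn
import torchvision.models as models

class Branch(nn.Module):
    """One deep neural network mapping its image source into the shared comparison space."""
    def __init__(self, dim: int = 256):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Linear(backbone.fc.in_features, dim)
        self.net = backbone

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # L2-normalise so that features from both branches are directly comparable
        return nn.functional.normalize(self.net(x), dim=1)

branch_a, branch_b = Branch(), Branch()  # "at least two deep neural networks"

def same_object(img_a: torch.Tensor, img_b: torch.Tensor, threshold: float = 0.6):
    """Compare two acquired image batches inside the shared comparison space."""
    with torch.no_grad():
        fa, fb = branch_a(img_a), branch_b(img_b)  # map both into one space
        return (fa * fb).sum(dim=1) > threshold    # cosine-similarity decision
```

Keeping two separate branches, rather than one shared network, is what allows each image source to receive its own mapping into the common comparison space.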
Optionally, after the images to be compared are simultaneously mapped to the same comparison space, feature extraction is performed on the images to be compared, and comparison is performed.
Optionally, the types of the images to be compared include: face images, body images.
Optionally, the sources of the images to be compared include: identification photo image and image collected by camera.
Optionally, the identification photo image and the image collected by the camera are respectively input to the corresponding deep neural network for processing, so that the identification photo image and the image collected by the camera are simultaneously mapped to the same comparison space;
and respectively extracting the characteristics of the identification photo image and the image collected by the camera, and comparing the characteristics to determine whether one or more same objects exist in the image to be compared.
Optionally, the sources of the images to be compared include: a multi-frame image containing one or more objects.
Optionally, the multiple frames of images are respectively input into at least two deep neural networks, so that a certain frame of image and another frame of image containing the one or more objects are simultaneously mapped into the same comparison space;
comparing one or more objects in the certain frame image and the other frame image, and determining whether one or more same objects exist in the certain frame image and the other frame image according to a comparison result.
Optionally, inputting a frame of image containing the one or more objects into a deep neural network, and inputting another frame of image containing the one or more objects into another deep neural network;
and the deep neural network and the other deep neural network form a double-layer deep neural network, so that the certain frame of image and the other frame of image are mapped to the same comparison space at the same time.
Optionally, acquiring one or more facial features of each face in the certain frame image and one or more facial features of each face in the other frame image;
comparing one or more face features of each face in the certain frame image with one or more face features of each face in the other frame image in the same comparison space;
if one or more facial features in the certain frame image are the same as one or more facial features in the other frame image, one or more identical faces exist in the certain frame image and the other frame image.
Optionally, the method further comprises: if the one or more same faces contain the faces of one or more target objects;
acquiring each frame of image of the face containing the one or more target objects from the multi-frame images; and determining the motion information of the one or more target objects according to the acquired image of each frame of the face containing the one or more target objects.
Optionally, the motion information comprises at least one of: time of movement, geographical location of movement.
Optionally, the deep neural network refers to a deep neural network after training is completed.
Optionally, the multi-frame image including one or more human faces is acquired by one or more image acquisition devices.
Optionally, the geographic locations at which the one or more image capturing devices are installed include at least one of: residential areas, schools, stations, airports, shopping malls and hospitals.
The invention also provides an object comparison system, which comprises:
the image module is used for acquiring an image to be compared containing one or more objects; wherein, the images to be compared are at least two images;
the mapping module is used for processing the images to be compared and mapping the images to be compared to the same comparison space through at least two deep neural networks;
and the comparison module is used for comparing the images to be compared through the same comparison space and determining whether one or more same objects exist in the images to be compared.
Optionally, after the images to be compared are simultaneously mapped to the same comparison space, feature extraction is performed on the images to be compared, and comparison is performed.
Optionally, the types of the images to be compared include: face images, body images.
Optionally, the sources of the images to be compared include: identification photo image and image collected by camera.
Optionally, the identification photo image and the image collected by the camera are respectively input to the corresponding deep neural network for processing, so that the identification photo image and the image collected by the camera are simultaneously mapped to the same comparison space;
and respectively extracting the characteristics of the identification photo image and the image collected by the camera, and comparing the characteristics to determine whether one or more same objects exist in the image to be compared.
Optionally, the sources of the images to be compared include: a multi-frame image containing one or more objects.
Optionally, the image module is configured to obtain a multi-frame image including one or more objects;
the mapping module is used for respectively inputting the multi-frame images into at least two deep neural networks so that one frame of image containing the one or more objects and the other frame of image are mapped into the same comparison space at the same time;
and the comparison module is used for comparing one or more objects in the certain frame image and the other frame image and determining whether one or more same objects exist in the certain frame image and the other frame image according to a comparison result.
Optionally, inputting a frame of image containing the one or more faces into a deep neural network, and inputting another frame of image containing the one or more faces into another deep neural network;
and the deep neural network and the other deep neural network form a double-layer deep neural network, so that the certain frame of image and the other frame of image are mapped to the same comparison space at the same time.
Optionally, acquiring one or more facial features of each face in the certain frame image and one or more facial features of each face in the other frame image;
comparing one or more face features of each face in the certain frame image with one or more face features of each face in the other frame image in the same comparison space;
if one or more facial features in the certain frame image are the same as one or more facial features in the other frame image, one or more identical faces exist in the certain frame image and the other frame image.
Optionally, the method further comprises: if the one or more same faces contain the faces of one or more target objects;
acquiring each frame of image of the face containing the one or more target objects from the multi-frame images; and determining the motion information of the one or more target objects according to the acquired image of each frame of the face containing the one or more target objects.
Optionally, the motion information comprises at least one of: time of movement, geographical location of movement.
Optionally, the deep neural network refers to a deep neural network after training is completed.
Optionally, the multi-frame image including one or more human faces is acquired by one or more image acquisition devices.
Optionally, the geographic locations at which the one or more image capturing devices are installed include at least one of: residential areas, schools, stations, airports, shopping malls and hospitals.
The invention also provides an object comparison device, comprising:
acquiring an image to be compared containing one or more objects; wherein, the images to be compared are at least two images;
processing the images to be compared through at least two deep neural networks, and mapping the images to be compared to the same comparison space at the same time;
and comparing the images to be compared through the same comparison space, and determining whether one or more identical objects exist in the images to be compared.
The present invention also provides an apparatus comprising:
one or more processors; and
one or more machine-readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform a method as described in one or more of the above.
The present invention also provides one or more machine-readable media having stored thereon instructions, which when executed by one or more processors, cause an apparatus to perform the methods as described in one or more of the above.
As described above, the object comparison method, system, device and medium provided by the present invention have the following beneficial effects: images to be compared containing one or more objects are acquired, the images to be compared being at least two; the images to be compared are processed and mapped into the same comparison space through at least two deep neural networks; and the images are compared within that comparison space to determine whether one or more identical objects exist in them. By inputting multi-frame images containing one or more human faces into at least two deep neural networks, the method can determine whether one or more faces contained in one frame of one or more continuous frame images appear in another frame of those images. At the same time, because deep neural networks are used, prior information such as illumination, occlusion, angle, age and ethnicity can conveniently be incorporated during face image recognition, enhancing adaptability to face images and the expressive power of the features.
Drawings
Fig. 1 is a schematic flowchart of an object comparison method according to an embodiment.
Fig. 2 is a schematic structural diagram of a two-layer deep neural network according to an embodiment.
Fig. 3 is a schematic hardware structure diagram of an object comparison system according to an embodiment.
Fig. 4 is a schematic diagram of a hardware structure of a terminal device according to an embodiment.
Fig. 5 is a schematic diagram of a hardware structure of a terminal device according to another embodiment.
Description of the element reference numerals
M10 image module
M20 mapping module
M30 comparison module
1100 input device
1101 first processor
1102 output device
1103 first memory
1104 communication bus
1200 processing assembly
1201 second processor
1202 second memory
1203 communication assembly
1204 Power supply Assembly
1205 multimedia assembly
1206 voice assembly
1207 input/output interface
1208 sensor assembly
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It should be noted that the drawings provided in the following embodiments only illustrate the basic idea of the invention schematically. The drawings show only the components related to the invention rather than the number, shape and size of the components in an actual implementation; in practice the type, quantity and proportion of the components may vary freely, and the layout may be more complicated.
The invention provides an object comparison method, which comprises the following steps:
acquiring an image to be compared containing one or more objects; wherein the object comprises: human face, human body; the images to be compared are at least two images;
processing the images to be compared, and mapping the images to be compared to the same comparison space through at least two deep neural networks;
and comparing the images to be compared through the same comparison space, and determining whether one or more identical objects exist in the images to be compared.
Referring to fig. 1 and fig. 2, the following description will be made in detail by taking "face" as an example:
S100, acquiring a multi-frame image containing one or more human faces; the multi-frame image comprises one or more continuous frame images and a plurality of single frame images.
S200, inputting the multi-frame images into at least two deep neural networks, and simultaneously mapping a certain frame image containing the one or more faces and another frame image containing the one or more faces to the same comparison space;
S300, comparing one or more faces in the certain frame image and the other frame image in the comparison space, and determining whether one or more identical faces exist in the certain frame image and the other frame image according to a comparison result.
The method comprises the steps of obtaining a multi-frame image containing one or more human faces; inputting the multi-frame images into at least two deep neural networks, so that one frame image containing the one or more faces and another frame image containing the one or more faces are mapped to the same comparison space at the same time; and comparing one or more faces in the certain frame image and the other frame image in a comparison space, and determining whether one or more identical faces exist in the certain frame image and the other frame image according to a comparison result. Specifically, the method can input one or more frames of images in one or more continuous frames of images containing one or more faces into at least two deep neural networks, so that one frame of image containing one or more faces and another frame of image containing one or more faces are mapped to the same comparison space at the same time; and comparing one or more faces in the certain frame image and the other frame image in the comparison space, and determining whether one or more faces contained in one frame image in the one or more continuous frame images appear in the other frame image in the one or more continuous frame images. Meanwhile, the method also utilizes the deep neural network, and can conveniently add a plurality of prior information such as illumination, shielding, angle, age, race and the like in the process of identifying the face image, thereby enhancing the adaptability of the face image and the expression capability of the characteristics.
In an exemplary embodiment, a certain frame of image containing the one or more faces is input into one deep neural network, and another frame of image containing the one or more faces is input into another deep neural network; the two deep neural networks form a double-layer deep neural network, so that the certain frame of image and the other frame of image are simultaneously mapped into the same comparison space. Here, each deep neural network refers to a network on which training has been completed. During training, one or more frames of face images containing the same target object are input into each deep neural network, a two-classification loss function is adopted, and the differences between corresponding weights in the two networks are regularized, thereby realizing the mapping from different image spaces into the same comparison space. In that comparison space, the intra-class differences between face images of the same object become smaller and the inter-class differences between face images of different objects become larger, enhancing the discriminability of the features.
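A hedged sketch of this training scheme follows, reusing the Branch class from the earlier sketch: a binary same/different loss on image pairs plus a penalty on the differences between corresponding weights of the two branches. The pair-scoring head, the absolute-difference pairing and the regularisation weight lam are assumptions; the patent states only that a two-classification loss is used and that the weight differences are regularized.

```python
# Assumed training recipe: binary pair loss + weight-difference regulariser.
import torch
import torch.nn as nn

branch_a, branch_b = Branch(), Branch()   # the two networks being trained
pair_head = nn.Linear(256, 1)             # scores a feature pair as same/different
bce = nn.BCEWithLogitsLoss()

def weight_difference_penalty(net_a: nn.Module, net_b: nn.Module) -> torch.Tensor:
    """Sum of squared differences between corresponding weights of the two networks."""
    return sum(((pa - pb) ** 2).sum()
               for pa, pb in zip(net_a.parameters(), net_b.parameters()))

def training_loss(img_a, img_b, same_label, lam: float = 1e-3):
    fa, fb = branch_a(img_a), branch_b(img_b)           # map into the shared space
    logits = pair_head(torch.abs(fa - fb)).squeeze(1)   # pairwise same/different score
    loss = bce(logits, same_label.float())              # the two-classification loss
    return loss + lam * weight_difference_penalty(branch_a.net, branch_b.net)
```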
In an exemplary embodiment, one or more facial features of each face in one frame of image in one or more videos and one or more facial features of each face in another frame of image in one or more videos are obtained;
comparing one or more face features of each face in the certain frame image with one or more face features of each face in the other frame image in the same comparison space;
if one or more facial features in the certain frame image are the same as one or more facial features in the other frame image, one or more identical faces exist in the certain frame image and the other frame image.
Through the above scheme, the method can determine whether one or more faces contained in one frame of image in one or more videos appear in another frame of image in one or more videos by comparing one or more face features of the faces.
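As an illustration of this per-face comparison, the sketch below matches every face embedding from one frame against every face embedding from another frame inside the shared space; the 0.6 threshold is again an assumed value.

```python
import torch

def shared_faces(feats_frame_1: torch.Tensor,
                 feats_frame_2: torch.Tensor,
                 threshold: float = 0.6) -> list:
    """feats_*: (num_faces, dim) L2-normalised face embeddings from the two branches.

    Returns index pairs (i, j) whose cosine similarity exceeds the threshold,
    i.e. faces judged to be the same person in both frames.
    """
    sims = feats_frame_1 @ feats_frame_2.T   # pairwise cosine similarities
    return (sims > threshold).nonzero(as_tuple=False).tolist()
```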
Of course, face images acquired in other ways may also be compared, for example the face image from an identification photo against a face image captured by a camera. Specifically, the identification-photo image and the camera-captured image are input into their corresponding deep neural networks for processing, so that the two are simultaneously mapped into the same comparison space; features are then extracted from each and compared to determine whether one or more identical faces exist in the images to be compared.
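Under the same assumptions as the sketches above, applying this to the identification-photo scenario only requires routing each source to its own branch; the input tensors here are placeholders, not real data.

```python
import torch

# Placeholder batches standing in for real identification photos and camera crops;
# branch_a and branch_b are the two branches from the earlier sketch.
id_photo_batch = torch.randn(4, 3, 112, 112)
camera_batch = torch.randn(8, 3, 112, 112)

id_features = branch_a(id_photo_batch)           # branch assigned to identification photos
cam_features = branch_b(camera_batch)            # branch assigned to camera images
matches = (id_features @ cam_features.T) > 0.6   # compared in the shared space
```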
In some exemplary embodiments, the multi-frame images containing one or more human faces are acquired by one or more image acquisition devices. As an example, the image acquisition device in this application may be a camera; in particular, previously installed network cameras can be reused to acquire the one or more multi-frame images. Compared with installing new cameras, reusing existing ones avoids low-voltage cabling work and fire-protection approval, is simple to implement and poses no technical threshold. Places such as residential areas, schools, stations, airports, shopping malls and hospitals usually see heavy foot traffic and therefore cover a large number of people; accordingly, the geographic locations at which the one or more image acquisition devices are installed in the embodiments of this application include at least one of: residential areas, schools, stations, airports, shopping malls and hospitals.
In an exemplary embodiment, if one or more faces contained in one frame image of one or more videos appear in another frame image of those videos, one or more identical faces exist across the frame images. If the identical faces include the faces of one or more target objects, each frame image containing the faces of the one or more target objects is acquired from the one or more continuous frame images, and the motion information of the one or more target objects is determined from the acquired frames. The motion information comprises at least one of: time of movement, geographic position of movement.
Specifically, one or more videos shot by one or more cameras are obtained, and it is then determined whether the picture presented by each frame of those videos contains the face of one or more target objects. If the pictures presented by certain frames do contain the faces of one or more target objects, the movement times and movement geographic positions of the one or more target objects are determined from those frames. As an example, suppose videos shot by 8 cameras in a certain residential area are obtained, each camera contributing one video. The 8 videos are reviewed manually to find the segments in which faces appear; those segments are cut into individual frames containing face images, and each such frame is input into at least two deep neural networks, so that one frame containing the one or more faces and another frame containing the one or more faces are simultaneously mapped into the same comparison space. One or more faces in the two frames are then compared in that space, and whether one or more identical faces exist in the two frames is determined from the comparison result. If one or more identical faces exist in the video segments and those faces include the faces of one or more target objects, each frame containing the faces of the one or more target objects is acquired from the one or more videos, and the motion information of the one or more target objects is determined from the acquired frames; the motion information comprises at least one of the time of movement and the geographic position of movement. The deep neural network here refers to a network on which training has been completed, trained on images containing the faces of the target objects. If the faces of one or more target objects exist in certain video segments, the movement times of those target objects are obtained directly from the segments; the videos from which the segments originate are then identified, and the approximate movement geographic positions of the target objects are derived from the installation positions of the corresponding cameras. Cross-camera tracking of the one or more target objects can thus be achieved. A target object in the embodiments of this application is a person such as a lost child or a suspect.
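A hedged reconstruction of this tracking logic is sketched below: frames that match the target are grouped as sightings, each carrying the time read from its video segment, and each camera's installation site supplies the approximate geographic position. The Sighting record and the camera-to-site map are hypothetical; in practice the map would come from deployment records.

```python
from dataclasses import dataclass

@dataclass
class Sighting:
    camera_id: str    # which camera produced the matching frame
    timestamp: float  # movement time read from the video segment

# Hypothetical map from camera id to installation site.
CAMERA_SITES = {"cam_03": "residential-area gate", "cam_07": "hospital lobby"}

def motion_info(sightings: list) -> list:
    """Order the target's sightings in time and attach each camera's location,
    yielding (movement time, movement geographic position) pairs."""
    return [(s.timestamp, CAMERA_SITES.get(s.camera_id, "unknown"))
            for s in sorted(sightings, key=lambda s: s.timestamp)]
```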
The method acquires one or more continuous frame images containing one or more human faces; inputs one or more frames of those images into at least two deep neural networks, so that one frame containing the one or more faces and another frame containing the one or more faces are simultaneously mapped into the same comparison space; and compares one or more faces in the two frames in that space, determining from the comparison result whether one or more identical faces exist in the two frames. If one or more identical faces exist in the continuous frame images and those faces include the faces of one or more target objects, each frame containing the faces of the one or more target objects is acquired from the continuous frame images, and the motion information of the one or more target objects is determined from those frames. The method can thus identify whether the multi-frame images all contain the faces or bodies of one or more target objects, determine which image acquisition devices the images come from, and generate the motion information of the one or more target objects from the geographic positions of those devices, thereby achieving cross-camera tracking of the one or more target objects.
In the embodiment of the application, if the object is a human body, the comparison method of the human body is consistent with the comparison method of the human face; specific functions and technical effects can be obtained by referring to the above embodiments, which are not described herein again.
The invention also provides an object comparison system, which comprises:
an image module M10 for acquiring an image to be compared containing one or more objects; wherein the object comprises: human face, human body; the images to be compared are at least two images;
the mapping module M20 is configured to process the images to be compared, and map the images to be compared to the same comparison space through at least two deep neural networks;
and a comparison module M30, configured to compare the images to be compared through the same comparison space and determine whether one or more identical objects exist in the images to be compared.
As shown in fig. 2 and 3, the following description will be made in detail by taking "face" as an example:
an image module M10, configured to obtain a multi-frame image containing one or more human faces; the multi-frame image comprises one or more continuous frame images and a plurality of single frame images.
The mapping module M20 is configured to input the multi-frame images into at least two deep neural networks, so that a certain frame image containing the one or more faces and another frame image containing the one or more faces are simultaneously mapped into the same comparison space;
a comparing module M30, configured to compare one or more faces in the frame of image and the other frame of image in a comparison space, and determine whether one or more identical faces exist in the frame of image and the other frame of image according to a comparison result.
The system acquires a plurality of frame images containing one or more faces through an image module; inputting the multi-frame images into at least two deep neural networks through a mapping module, so that one frame of image containing the one or more faces and the other frame of image containing the one or more faces are mapped to the same comparison space at the same time; and comparing one or more faces in the certain frame image and the other frame image in a comparison space through a comparison module, and determining whether one or more identical faces exist in the certain frame image and the other frame image according to a comparison result. Specifically, the system can input one or more frames of images in one or more continuous frames of images containing one or more faces into at least two deep neural networks, so that one frame of image containing one or more faces and another frame of image containing one or more faces are mapped to the same comparison space at the same time; and comparing one or more faces in the certain frame image and the other frame image in the comparison space, and determining whether one or more faces contained in one frame image in the one or more continuous frame images appear in the other frame image in the one or more continuous frame images. Meanwhile, the system also utilizes the deep neural network, and can conveniently add a plurality of prior information such as illumination, shielding, angles, ages, ethnicities and the like in the process of identifying the face image, thereby enhancing the adaptability of the face image and the expression capability of the characteristics.
In an exemplary embodiment, a certain frame of image containing the one or more faces is input into one deep neural network, and another frame of image containing the one or more faces is input into another deep neural network; and the deep neural network and the other deep neural network form a double-layer deep neural network, so that the certain frame of image and the other frame of image are mapped to the same comparison space at the same time. Wherein, the deep neural network refers to the deep neural network after training is completed. During training, each deep neural network inputs one or more frames of face images containing the same target object for training, and the differences of corresponding weights in the two networks are regularized by adopting a two-classification loss function, so that mapping from different image spaces to the same comparison space is realized. In the same comparison space, the difference between the classes of the face images of the same object becomes smaller, and the difference between the classes of the face images of different objects becomes larger, so that the feature discrimination is enhanced.
In an exemplary embodiment, one or more facial features of each face in one frame of image in one or more videos and one or more facial features of each face in another frame of image in one or more videos are obtained;
comparing one or more face features of each face in the certain frame image with one or more face features of each face in the other frame image in the same comparison space;
if one or more facial features in the certain frame image are the same as one or more facial features in the other frame image, one or more identical faces exist in the certain frame image and the other frame image.
Through the description of the scheme, the system can determine whether one or more faces contained in one frame of image in one or more videos appear in another frame of image in one or more videos by comparing one or more face features of the faces.
Of course, face images acquired in other ways may also be compared, for example the face image from an identification photo against a face image captured by a camera. Specifically, the identification-photo image and the camera-captured image are input into their corresponding deep neural networks for processing, so that the two are simultaneously mapped into the same comparison space; features are then extracted from each and compared to determine whether one or more identical faces exist in the images to be compared.
In some exemplary embodiments, the multi-frame images containing one or more human faces are acquired by one or more image acquisition devices. As an example, the image acquisition device in this application may be a camera; in particular, previously installed network cameras can be reused to acquire the one or more multi-frame images. Compared with installing new cameras, reusing existing ones avoids low-voltage cabling work and fire-protection approval, is simple to implement and poses no technical threshold. Places such as residential areas, schools, stations, airports, shopping malls and hospitals usually see heavy foot traffic and therefore cover a large number of people; accordingly, the geographic locations at which the one or more image acquisition devices are installed in the embodiments of this application include at least one of: residential areas, schools, stations, airports, shopping malls and hospitals.
In an exemplary embodiment, if one or more faces contained in one frame image of one or more videos appear in another frame image of those videos, one or more identical faces exist across the frame images. If the identical faces include the faces of one or more target objects, each frame image containing the faces of the one or more target objects is acquired from the one or more continuous frame images, and the motion information of the one or more target objects is determined from the acquired frames. The motion information comprises at least one of: time of movement, geographic position of movement.
Specifically, one or more videos shot by one or more cameras are obtained, and it is then determined whether the picture presented by each frame of those videos contains the face of one or more target objects. If the pictures presented by certain frames do contain the faces of one or more target objects, the movement times and movement geographic positions of the one or more target objects are determined from those frames. For example, suppose videos shot by 15 cameras in a hospital are obtained, each camera contributing one video. The 15 videos are reviewed manually to find the segments in which faces appear; those segments are cut into individual frames containing face images, and each such frame is input into at least two deep neural networks, so that one frame containing the one or more faces and another frame containing the one or more faces are simultaneously mapped into the same comparison space. One or more faces in the two frames are then compared in that space, and whether one or more identical faces exist in the two frames is determined from the comparison result. If one or more identical faces exist in the video segments and those faces include the faces of one or more target objects, each frame containing the faces of the one or more target objects is acquired from the one or more videos, and the motion information of the one or more target objects is determined from the acquired frames; the motion information comprises at least one of the time of movement and the geographic position of movement. The deep neural network here refers to a network on which training has been completed, trained on images containing the faces of the target objects. If the faces of one or more target objects exist in certain video segments, the movement times of those target objects are obtained directly from the segments; the videos from which the segments originate are then identified, and the approximate movement geographic positions of the target objects are derived from the installation positions of the corresponding cameras. Cross-camera tracking of the one or more target objects can thus be achieved. A target object in the embodiments of this application is a person such as a doctor, a patient or a ticket vendor.
The system acquires one or more continuous frame images containing one or more human faces; inputs one or more frames of those images into at least two deep neural networks, so that one frame containing the one or more faces and another frame containing the one or more faces are simultaneously mapped into the same comparison space; and compares one or more faces in the two frames in that space, determining from the comparison result whether one or more identical faces exist in the two frames. If one or more identical faces exist in the continuous frame images and those faces include the faces of one or more target objects, each frame containing the faces of the one or more target objects is acquired from the continuous frame images, and the motion information of the one or more target objects is determined from those frames. The system can thus identify whether the multi-frame images all contain the faces or bodies of one or more target objects, determine which image acquisition devices the images come from, and generate the motion information of the one or more target objects from the geographic positions of those devices, thereby achieving cross-camera tracking of the one or more target objects.
In the embodiment of the application, if the object is a human body, the comparison of the human body is consistent with the comparison of the human face; specific functions and technical effects can be obtained by referring to the above embodiments, which are not described herein again.
The embodiment of the present application further provides an object comparison device, which includes:
acquiring an image to be compared containing one or more objects; wherein, the images to be compared are at least two images;
processing the images to be compared through at least two deep neural networks, and mapping the images to be compared to the same comparison space at the same time;
and comparing the images to be compared through the same comparison space, and determining whether one or more identical objects exist in the images to be compared.
In this embodiment, the object comparison device executes the system or the method, and specific functions and technical effects are described with reference to the above embodiments, which are not described herein again.
An embodiment of the present application further provides an apparatus, which may include: one or more processors; and one or more machine-readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform the method of fig. 1. In practical applications, the apparatus may serve as a terminal device or as a server; examples of the terminal device may include: a smart phone, a tablet computer, an electronic book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop, a vehicle-mounted computer, a desktop computer, a set-top box, a smart television, a wearable device, and the like.
Embodiments of the present application also provide a non-transitory readable storage medium, where one or more modules (programs) are stored in the storage medium, and when the one or more modules are applied to a device, the device may execute instructions (instructions) included in the method in fig. 1 according to the embodiments of the present application.
Fig. 4 is a schematic diagram of a hardware structure of a terminal device according to an embodiment of the present application. As shown, the terminal device may include: an input device 1100, a first processor 1101, an output device 1102, a first memory 1103, and at least one communication bus 1104. The communication bus 1104 is used to implement communication connections between the elements. The first memory 1103 may include a high-speed RAM memory, and may also include a non-volatile storage NVM, such as at least one disk memory, and the first memory 1103 may store various programs for performing various processing functions and implementing the method steps of the present embodiment.
Alternatively, the first processor 1101 may be, for example, a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, a microprocessor, or other electronic components, and the first processor 1101 is coupled to the input device 1100 and the output device 1102 through a wired or wireless connection.
Optionally, the input device 1100 may include a variety of input devices, such as at least one of a user-oriented user interface, a device-oriented device interface, a software programmable interface, a camera, and a sensor. Optionally, the device interface facing the device may be a wired interface for data transmission between devices, or may be a hardware plug-in interface (e.g., a USB interface, a serial port, etc.) for data transmission between devices; optionally, the user-facing user interface may be, for example, a user-facing control key, a voice input device for receiving voice input, and a touch sensing device (e.g., a touch screen with a touch sensing function, a touch pad, etc.) for receiving user touch input; optionally, the programmable interface of the software may be, for example, an entry for a user to edit or modify a program, such as an input pin interface or an input interface of a chip; the output devices 1102 may include output devices such as a display, audio, and the like.
In this embodiment, the processor of the terminal device includes functions for executing each module of the object comparison system described above; specific functions and technical effects may refer to the above embodiments and are not described herein again.
Fig. 5 is a schematic hardware structure diagram of a terminal device according to an embodiment of the present application. Fig. 5 is a specific embodiment of the implementation process of fig. 4. As shown, the terminal device of the present embodiment may include a second processor 1201 and a second memory 1202.
The second processor 1201 executes the computer program code stored in the second memory 1202 to implement the method described in fig. 1 in the above embodiment.
The second memory 1202 is configured to store various types of data to support operations at the terminal device. Examples of such data include instructions for any application or method operating on the terminal device, such as messages, pictures, videos, and so forth. The second memory 1202 may include a Random Access Memory (RAM) and may also include a non-volatile memory (non-volatile memory), such as at least one disk memory.
Optionally, a second processor 1201 is provided in the processing assembly 1200. The terminal device may further include: communication component 1203, power component 1204, multimedia component 1205, speech component 1206, input/output interfaces 1207, and/or sensor component 1208. The specific components included in the terminal device are set according to actual requirements, which is not limited in this embodiment.
The processing component 1200 generally controls the overall operation of the terminal device. The processing assembly 1200 may include one or more second processors 1201 to execute instructions to perform all or part of the steps of the data processing method described above. Further, the processing component 1200 can include one or more modules that facilitate interaction between the processing component 1200 and other components. For example, the processing component 1200 can include a multimedia module to facilitate interaction between the multimedia component 1205 and the processing component 1200.
The power supply component 1204 provides power to the various components of the terminal device. The power components 1204 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the terminal device.
The multimedia components 1205 include a display screen that provides an output interface between the terminal device and the user. In some embodiments, the display screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the display screen includes a touch panel, the display screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
The voice component 1206 is configured to output and/or input voice signals. For example, the voice component 1206 includes a Microphone (MIC) configured to receive external voice signals when the terminal device is in an operational mode, such as a voice recognition mode. The received speech signal may further be stored in the second memory 1202 or transmitted via the communication component 1203. In some embodiments, the speech component 1206 further comprises a speaker for outputting speech signals.
The input/output interface 1207 provides an interface between the processing component 1200 and peripheral interface modules, which may be click wheels, buttons, etc. These buttons may include, but are not limited to: a volume button, a start button, and a lock button.
The sensor component 1208 includes one or more sensors for providing various aspects of status assessment for the terminal device. For example, the sensor component 1208 may detect an open/closed state of the terminal device, relative positioning of the components, presence or absence of user contact with the terminal device. The sensor assembly 1208 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact, including detecting the distance between the user and the terminal device. In some embodiments, the sensor assembly 1208 may also include a camera or the like.
The communication component 1203 is configured to facilitate communications between the terminal device and other devices in a wired or wireless manner. The terminal device may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In one embodiment, the terminal device may include a SIM card slot therein for inserting a SIM card therein, so that the terminal device may log onto a GPRS network to establish communication with the server via the internet.
As can be seen from the above, the communication component 1203, the voice component 1206, the input/output interface 1207 and the sensor component 1208 involved in the embodiment of fig. 5 can be implemented as the input device in the embodiment of fig. 4.
The foregoing embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Any person skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the present invention shall still be covered by the claims of the present invention.

Claims (31)

1. An object comparison method, comprising the steps of:
acquiring an image to be compared containing one or more objects; wherein, the images to be compared are at least two images;
processing the images to be compared, and mapping the images to be compared to the same comparison space through at least two deep neural networks;
and comparing the images to be compared through the same comparison space, and determining whether one or more identical objects exist in the images to be compared.
2. The object comparison method according to claim 1, wherein after the images to be compared are simultaneously mapped to the same comparison space, feature extraction is performed on the images to be compared, and comparison is performed.
3. The object comparison method according to claim 2, wherein the types of the images to be compared include: face images, body images.
4. The method according to claim 3, wherein the sources of the images to be compared comprise: identification photo image and image collected by camera.
5. The object comparison method according to claim 4, wherein the identification photo image and the image collected by the camera are respectively input to the corresponding deep neural network for processing, so that the identification photo image and the image collected by the camera are simultaneously mapped to the same comparison space;
and respectively extracting the characteristics of the identification photo image and the image collected by the camera, and comparing the characteristics to determine whether one or more same objects exist in the image to be compared.
6. The method according to claim 3, wherein the sources of the images to be compared comprise: a multi-frame image containing one or more objects.
7. The method according to claim 6, wherein the plurality of frames of images are respectively input into at least two deep neural networks, such that one frame of image and another frame of image containing the one or more objects are simultaneously mapped into a same comparison space;
comparing one or more objects in the certain frame image and the other frame image, and determining whether one or more same objects exist in the certain frame image and the other frame image according to a comparison result.
8. The method according to claim 7, wherein a frame of image containing the one or more objects is inputted into a deep neural network, and another frame of image containing the one or more objects is inputted into another deep neural network;
and the deep neural network and the other deep neural network form a double-layer deep neural network, so that the certain frame of image and the other frame of image are mapped to the same comparison space at the same time.
9. The object comparison method according to claim 8, wherein one or more facial features of each face in the certain frame image and one or more facial features of each face in the other frame image are obtained;
comparing one or more face features of each face in the certain frame image with one or more face features of each face in the other frame image in the same comparison space;
if one or more facial features in the certain frame image are the same as one or more facial features in the other frame image, one or more identical faces exist in the certain frame image and the other frame image.
10. The object comparison method of claim 9, further comprising: if the one or more same faces contain the faces of one or more target objects;
acquiring each frame of image of the face containing the one or more target objects from the multi-frame images; and determining the motion information of the one or more target objects according to the acquired image of each frame of the face containing the one or more target objects.
11. The object comparison method according to claim 10, wherein the motion information comprises at least one of a time of movement and a geographical location of movement.
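Read together, claims 10 and 11 suggest that motion information is assembled by filtering the multi-frame stream for the target's face and keeping each matching frame's capture time and camera location. A plain-Python sketch; the Frame fields and the injected predicate are hypothetical.

    # Sketch of claims 10-11; field names and the predicate are assumptions.
    from dataclasses import dataclass
    from typing import Any, Callable, Iterable, List, Tuple

    @dataclass
    class Frame:
        image: Any
        timestamp: float        # time of movement
        camera_location: str    # geographical location of movement

    def motion_info(frames: Iterable[Frame],
                    contains_target: Callable[[Frame], bool]) -> List[Tuple[float, str]]:
        # contains_target(frame) would run the claim-9 face comparison
        # against the target object's face; here it is injected.
        return [(f.timestamp, f.camera_location)
                for f in frames
                if contains_target(f)]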
12. The object comparison method according to claim 1, wherein each deep neural network is a deep neural network whose training has been completed.
13. The object comparison method according to claim 1, wherein multiple frames of images containing one or more human faces are acquired by one or more image acquisition devices.
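Claim 13's acquisition step could be realised with any capture backend; the sketch below uses OpenCV, which is our assumption rather than anything the patent names.

    # Sketch of claim 13 with OpenCV as an assumed capture backend.
    import cv2

    def grab_frames(device_ids, frames_per_device: int = 5):
        # Acquire multiple frames of images from one or more image
        # acquisition devices, e.g. device_ids=[0, 1] for two cameras.
        frames = []
        for dev in device_ids:
            cap = cv2.VideoCapture(dev)
            for _ in range(frames_per_device):
                ok, frame = cap.read()
                if ok:
                    frames.append(frame)
            cap.release()
        return frames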
14. The object comparison method according to claim 13, wherein the geographical locations at which the one or more image acquisition devices are installed include at least one of residential areas, schools, stations, airports, markets, and hospitals.
15. An object comparison system, comprising:
an image module, configured to acquire images to be compared that contain one or more objects, wherein the images to be compared comprise at least two images;
a mapping module, configured to process the images to be compared and, through at least two deep neural networks, simultaneously map the images to be compared into the same comparison space;
and a comparison module, configured to compare the images to be compared in the same comparison space and determine whether one or more identical objects exist in the images to be compared.
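The three modules of claim 15 map naturally onto three small classes. The sketch below uses our own class and method names and leans on the PyTorch helpers from the claim-1 sketch; it mirrors the method claims rather than adding anything new.

    # Sketch of the claim-15 module layout; all identifiers are ours.
    import torch.nn.functional as F

    class ImageModule:
        def acquire(self):
            # Return at least two images containing one or more objects.
            raise NotImplementedError

    class MappingModule:
        def __init__(self, nets):  # at least two deep neural networks
            self.nets = nets
        def to_comparison_space(self, images):
            # Map each image through its own network into the shared space.
            return [F.normalize(net(img), dim=1)
                    for net, img in zip(self.nets, images)]

    class ComparisonModule:
        def same_objects(self, z_a, z_b, thresh: float = 0.7) -> bool:
            # Compare in the shared space; the threshold is assumed.
            return (z_a @ z_b.t()).item() > thresh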
16. The object comparison system according to claim 15, wherein, after the images to be compared are simultaneously mapped into the same comparison space, features are extracted from the images to be compared and then compared.
17. The object comparison system according to claim 16, wherein the types of the images to be compared include face images and body images.
18. The object comparison system according to claim 17, wherein the sources of the images to be compared comprise an identification-photo image and an image captured by a camera.
19. The object comparison system according to claim 18, wherein the identification-photo image and the camera-captured image are each input into the corresponding deep neural network for processing, so that the two images are simultaneously mapped into the same comparison space;
and features are extracted from the identification-photo image and the camera-captured image respectively and compared, to determine whether one or more identical objects exist in the images to be compared.
20. The object comparison system according to claim 17, wherein the sources of the images to be compared comprise multiple frames of images containing one or more objects.
21. The object comparison system according to claim 20, wherein:
the image module is configured to acquire multiple frames of images containing one or more objects;
the mapping module is configured to respectively input the multiple frames of images into at least two deep neural networks, so that one frame of image containing the one or more objects and another frame of image are simultaneously mapped into the same comparison space;
and the comparison module is configured to compare the one or more objects in the one frame of image and the other frame of image, and to determine, according to the comparison result, whether one or more identical objects exist in the two frames.
22. The object comparison system according to claim 21, wherein one frame of image containing the one or more objects is input into one deep neural network, and another frame of image containing the one or more objects is input into another deep neural network;
and the one deep neural network and the other deep neural network together form a double-layer deep neural network, so that the one frame of image and the other frame of image are simultaneously mapped into the same comparison space.
23. The object comparison system according to claim 22, wherein one or more face features of each face in the one frame of image and one or more face features of each face in the other frame of image are obtained;
the one or more face features of each face in the one frame of image are compared, in the same comparison space, with the one or more face features of each face in the other frame of image;
and if one or more face features in the one frame of image are identical to one or more face features in the other frame of image, one or more identical faces exist in the two frames of images.
24. The object comparison system according to claim 23, further comprising: if the one or more identical faces include the faces of one or more target objects,
acquiring, from the multiple frames of images, every frame of image that contains a face of the one or more target objects; and determining motion information of the one or more target objects according to the acquired frames of images.
25. The object comparison system according to claim 24, wherein the motion information comprises at least one of a time of movement and a geographical location of movement.
26. The object comparison system according to claim 21, wherein each deep neural network is a deep neural network whose training has been completed.
27. The object comparison system according to claim 21, wherein multiple frames of images containing one or more human faces are acquired by one or more image acquisition devices.
28. The object comparison system according to claim 27, wherein the geographical locations at which the one or more image acquisition devices are installed include at least one of residential areas, schools, stations, airports, markets, and hospitals.
29. An object comparison device, configured to:
acquire images to be compared that contain one or more objects, wherein the images to be compared comprise at least two images;
process the images to be compared through at least two deep neural networks, so that they are simultaneously mapped into the same comparison space;
and compare the images to be compared in the same comparison space, determining whether one or more identical objects exist in the images to be compared.
30. An apparatus, comprising:
one or more processors; and
one or more machine-readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform the method recited in one or more of claims 1-12.
31. One or more machine-readable media having instructions stored thereon that, when executed by one or more processors, cause an apparatus to perform the method recited in one or more of claims 1-12.
CN202010061250.1A 2020-01-19 2020-01-19 Object comparison method, system, equipment and medium Withdrawn CN111291638A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010061250.1A CN111291638A (en) 2020-01-19 2020-01-19 Object comparison method, system, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010061250.1A CN111291638A (en) 2020-01-19 2020-01-19 Object comparison method, system, equipment and medium

Publications (1)

Publication Number Publication Date
CN111291638A true CN111291638A (en) 2020-06-16

Family

ID=71022316

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010061250.1A Withdrawn CN111291638A (en) 2020-01-19 2020-01-19 Object comparison method, system, equipment and medium

Country Status (1)

Country Link
CN (1) CN111291638A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103530648A (en) * 2013-10-14 2014-01-22 四川空港知觉科技有限公司 Face recognition method based on multi-frame images
US20180357819A1 (en) * 2017-06-13 2018-12-13 Fotonation Limited Method for generating a set of annotated images
CN110287818A (en) * 2019-06-05 2019-09-27 广州市森锐科技股份有限公司 Face feature vector optimization method based on layered vectorization
CN110992098A (en) * 2019-12-03 2020-04-10 腾讯云计算(北京)有限责任公司 Method, device, equipment and medium for obtaining object information

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LI Jin et al., "Research on Key Technologies of Face Recognition Based on a Double-Layer Heterogeneous Deep Neural Network Model", Telecom Engineering Technics and Standardization *
阿尔法旺旺 (CSDN user), "Algorithm Theory of Face Recognition: Double-Layer Heterogeneous Deep Neural Network", BLOG.CSDN.NET/TOUCH_DREAM/ARTICLE/DETAILS/61441080 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112215204A (en) * 2020-11-05 2021-01-12 成都体育学院 Method and system for analyzing human motion state information
CN112712569A (en) * 2020-12-25 2021-04-27 百果园技术(新加坡)有限公司 Skin color detection method, device, mobile terminal and storage medium
CN112712569B (en) * 2020-12-25 2023-12-12 百果园技术(新加坡)有限公司 Skin color detection method and device, mobile terminal and storage medium
CN112883788A (en) * 2021-01-14 2021-06-01 广州云从鼎望科技有限公司 Object monitoring method, system, equipment and medium

Similar Documents

Publication Publication Date Title
CN110163048B (en) Hand key point recognition model training method, hand key point recognition method and hand key point recognition equipment
CN111047621B (en) Target object tracking method, system, equipment and readable medium
CN110929770A (en) Intelligent tracking method, system and equipment based on image processing and readable medium
CN105631403A (en) Method and device for human face recognition
CN111291638A (en) Object comparison method, system, equipment and medium
CN110807361A (en) Human body recognition method and device, computer equipment and storage medium
CN108037863A (en) A kind of method and apparatus for showing image
CN108288032B (en) Action characteristic acquisition method, device and storage medium
CN112749613B (en) Video data processing method, device, computer equipment and storage medium
CN111340848A (en) Object tracking method, system, device and medium for target area
CN111339943A (en) Object management method, system, platform, equipment and medium
KR102424296B1 (en) Method, storage medium and electronic device for providing a plurality of images
Shu et al. Cardea: Context-aware visual privacy protection from pervasive cameras
CN111932681A (en) House information display method and device and electronic equipment
CN112989299A (en) Interactive identity recognition method, system, device and medium
CN107977636B (en) Face detection method and device, terminal and storage medium
CN112529939A (en) Target track matching method and device, machine readable medium and equipment
CN114581998A (en) Deployment and control method, system, equipment and medium based on target object association feature fusion
CN112486394A (en) Information processing method and device, electronic equipment and readable storage medium
CN107944024B (en) Method and device for determining audio file
CN111260697A (en) Target object identification method, system, device and medium
CN112053360B (en) Image segmentation method, device, computer equipment and storage medium
CN112257594B (en) Method and device for displaying multimedia data, computer equipment and storage medium
CN110889346B (en) Intelligent tracking method, system, equipment and readable medium
CN112883788A (en) Object monitoring method, system, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200616