CN115862063A - Object identification method, device, equipment and storage medium


Info

Publication number: CN115862063A
Application number: CN202211511637.8A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: target object, recognition, scene, target, identification
Legal status: Pending
Inventor: 李良斌
Assignee (current and original): Beijing SoundAI Technology Co Ltd
Application filed by Beijing SoundAI Technology Co Ltd
Classification: Image Analysis (AREA)

Abstract

The application discloses an object identification method, apparatus, device, and storage medium, belonging to the field of computer technology. The object recognition method is applied to a recognition device of an object recognition system, where the recognition device is deployed in a corresponding scene area for object recognition. The device compares object features of a captured scene image with stored object features of a target object, and determines the behavior state of the target object from the comparison result, i.e., whether the target object is in the scene area. Because object features can describe the whole-body morphological features of an object, comparing features acquired in real time against the stored features greatly improves the accuracy and efficiency of determining the object's behavior state, thereby improving the efficiency of personnel management.

Description

Object identification method, device, equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a storage medium for object recognition.
Background
A digital sentry is a portable intelligent terminal device. In large-scale events and scenes where crowds pass densely, it can verify the identity of objects passing through the scene by means of face recognition, certificate verification, body temperature detection, and the like, so as to prevent the passage of abnormal objects that do not meet the passing conditions.
Since a digital sentry is portable and can be flexibly deployed, it generally lacks the physical interception capability of a gate, so a target object is usually arranged in the scene area where the digital sentry is deployed to assist it in effectively blocking abnormal objects. The behavior of the target object largely determines whether abnormal objects can truly be prevented from passing; for example, if the target object is not in the scene area, an abnormal person may pass unblocked. Therefore, in scenes that use identification devices such as digital sentries, accurately identifying the behavior state of the target object so as to improve personnel management efficiency has become an urgent problem to be solved.
Disclosure of Invention
The embodiments of the present application provide an object identification method, apparatus, device, and storage medium, which can accurately identify the behavior state of a target object and effectively improve personnel management efficiency. The technical scheme is as follows:
In one aspect, an object identification method is provided, applied to the identification devices included in an object identification system, where each identification device is deployed in a corresponding scene area for object identification; the method comprises the following steps:
the identification equipment collects at least one scene image in the scene area;
the recognition device compares the object features of the at least one scene image with the stored object features of the target object, wherein the object features describe the whole-body morphological features of the object, the object features of the scene image are obtained by performing object recognition on the scene image, and the object features of the target object are obtained by performing object recognition on the whole-body image of the target object;
the recognition device determines a behavior state of the target object based on the comparison result, the behavior state indicating whether the target object is located within the scene area.
In a possible embodiment, the identification device determines the behavior state of the target object based on the comparison result, including:
in a case where the comparison result indicates that the object feature of the scene image does not match the object feature of the target object, the identification device determines that the target object is in a first behavior state indicating that the target object is not within the scene area.
In one possible embodiment, the method further comprises:
the recognition device records the time when the target object leaves the scene area based on the time stamp of any scene image in response to detecting that the target object is in the first behavior state.
In one possible embodiment, the method further comprises:
the recognition device responds to the detection that the target object is in the first behavior state, and plays a first prompt voice which is used for enabling the target object to return to the scene area.
In one possible embodiment, the method further comprises:
the recognition device responds to the detection that the target object is in the first behavior state, and sends the object characteristics of the target object to a target recognition device adjacent to the recognition device in the object recognition system, wherein the target recognition device is used for playing the first prompt voice under the condition that the target object is detected.
In one possible embodiment, the method further comprises:
the identification device responds to identification of an abnormal object from any one of the scene images under the condition that the target object is in the first behavior state, and acquires a scene image including the abnormal object in the scene area within a target duration after the abnormal object is identified.
In a possible implementation, the identification device determines the behavior state of the target object based on the comparison result, including:
in a case that the comparison result indicates that the object features of the scene image match the object features of the target object, the recognition device determines that the target object is within the scene area;
determining the distance between the target object and the recognition device based on the depth information of the image area where the target object is located in the at least one scene image;
in the case that the distance is greater than a target threshold, determining that the target object is in a second behavior state, the second behavior state indicating that the target object is too far away from the recognition device within the scene area.
In one possible embodiment, the method further comprises:
under the condition that the target object is determined to be in the scene area, determining the distance change trend between the target object and the identification device based on the depth information of the area where the target object is located in a plurality of scene images;
and if the distance change trend indicates that the target object is far away from the recognition equipment, playing a second prompt voice aiming at the target object, wherein the second prompt voice is used for keeping the target object in the scene area.
In one possible embodiment, before the recognition device compares the object features of the at least one scene image with the stored object features of the target object, the method further comprises:
the identification equipment extracts the face characteristics of the target object based on the whole body image of the target object, and determines the identity information of the target object based on the face characteristics;
and when the identity information of the target object passes the verification, extracting object features of the target object from the whole-body image, and storing the object features of the target object.
In one aspect, an object recognition apparatus is provided, which is applied to recognition devices included in an object recognition system, and each recognition device is deployed in a corresponding scene area to perform object recognition; the device comprises:
the acquisition module is used for acquiring at least one scene image in the scene area;
a comparison module, configured to compare an object feature of the at least one scene image with an object feature of a stored target object, where the object feature describes a whole-body morphological feature of an object, the object feature of the scene image is obtained by performing object recognition on the scene image, and the object feature of the target object is obtained by performing object recognition on the whole-body image of the target object;
and the state determining module is used for determining the behavior state of the target object based on the comparison result, wherein the behavior state indicates whether the target object is positioned in the scene area or not.
In one possible implementation, the state determination module is configured to:
in a case where the comparison result indicates that the object feature of the scene image does not match the object feature of the target object, the identification device determines that the target object is in a first behavior state indicating that the target object is not within the scene area.
In one possible embodiment, the apparatus further comprises:
and the recording module is used for recording the time when the target object leaves the scene area based on the time stamp of any scene image in response to the detection that the target object is in the first behavior state.
In one possible embodiment, the apparatus further comprises:
and the first prompt module is used for responding to the detection that the target object is in the first behavior state, and playing a first prompt voice, wherein the first prompt voice is used for enabling the target object to return to the scene area.
In one possible embodiment, the apparatus further comprises:
and the linkage prompting module is used for responding to the detection that the target object is in the first behavior state, sending the object characteristics of the target object to a target recognition device adjacent to the recognition device in the object recognition system, and playing the first prompting voice under the condition that the target object is detected by the target recognition device.
In one possible embodiment, the apparatus further comprises:
and the abnormal acquisition module is used for responding to the identification of an abnormal object from any one of the scene images under the condition that the target object is in the first behavior state, and acquiring the scene image including the abnormal object in the scene area within the target time length after the abnormal object is identified.
In one possible implementation, the state determination module is configured to:
determining that the target object is within the scene area if the comparison result indicates that the object features of the scene image match the object features of the target object;
determining the distance between the target object and the recognition device based on the depth information of the image area where the target object is located in the at least one scene image;
in the case that the distance is greater than a target threshold, determining that the target object is in a second behavior state, the second behavior state indicating that the target object is too far away from the recognition device within the scene area.
In one possible embodiment, the apparatus further comprises:
the second prompting module is used for determining a distance change trend between the target object and the identification device based on the depth information of the area where the target object is located in a plurality of scene images under the condition that the target object is determined to be located in the scene area;
and if the distance change trend indicates that the target object is far away from the recognition equipment, playing a second prompt voice aiming at the target object, wherein the second prompt voice is used for keeping the target object in the scene area.
In one possible embodiment, the apparatus further comprises:
the object feature extraction module is used for extracting the face features of the target object based on the whole-body image of the target object and determining the identity information of the target object based on the face features;
and the object binding module is used for extracting the object features of the target object from the whole-body image and storing the object features of the target object under the condition that the identity information of the target object passes verification.
In one aspect, a computer device is provided, the computer device comprising:
one or more processors;
a memory for storing program code executable by the one or more processors;
wherein the processor is configured to execute the program code to implement the object recognition method described above.
In one aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores program code which, when executed by a processor of a computer device, enables the computer device to perform the above object recognition method.
In one aspect, a computer program product is provided that includes one or more instructions for execution by one or more processors of a computer device to enable the computer device to perform the object recognition method described above.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic diagram of an object recognition system provided in an embodiment of the present application;
fig. 2 is a flowchart of an object identification method according to an embodiment of the present application;
fig. 3 is a flowchart of an object identification method according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an object recognition apparatus according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a terminal according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The terms "first," "second," and the like in this application are used for distinguishing between similar items and items that have substantially the same function or similar functionality, and it should be understood that "first," "second," and "nth" do not have any logical or temporal dependency or limitation on the number or order of execution.
The term "at least one" in this application refers to one or more, "a plurality" means two or more, for example, a plurality of images refers to two or more images.
It should be noted that information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, presented data, etc.), and signals referred to in this application are authorized by the user or sufficiently authorized by various parties, and the collection, use, and processing of the relevant data is required to comply with relevant laws and regulations and standards in relevant countries and regions. For example, the scene image, the whole body image, the face image, the identity information and the like referred to in the present application are obtained under the condition of obtaining sufficient authorization of the user.
Next, an environment in which embodiments of the present application are implemented will be described.
The object identification method provided by the application can be applied to identification devices included in an object identification system, and each identification device is deployed in a corresponding scene area to perform object identification. Fig. 1 is a schematic diagram of an object recognition system according to an embodiment of the present application, and referring to fig. 1, the object recognition system shown in fig. 1 includes a plurality of recognition devices 101 and a server 102. The plurality of identification devices 101 can communicate with a server through a network.
The recognition device 101 is configured to perform object recognition in a corresponding scene area, and determines the behavior state of a target object by comparing the object features of a captured scene image with the stored object features of the target object. In some embodiments, the recognition device 101 can acquire scene images in the scene area in real time, and implement functions such as face recognition, identity verification, target tracking, and behavior state recognition based on the acquired images. In some embodiments, the recognition device 101 can communicate with the server 102 and query the database associated with the server 102 for identity information corresponding to an acquired face image, so as to implement an identity verification function. In some embodiments, the recognition device 101 may further perform image processing such as segmentation, classification, and recognition on scene images acquired in real time, and extract object features from them to compare against the stored object features, so as to determine the behavior state of the object according to the comparison result and then perform personnel management based on the recognized behavior state.
In some embodiments, the recognition device 101 is an image capturing terminal, such as at least one of a webcam, a smart phone, a smart watch, a desktop computer, a laptop computer, a virtual reality terminal, an augmented reality terminal, a wireless terminal, and the like. In some embodiments, the recognition device 101 is also referred to as a digital sentry, and is used to provide services such as identity verification, face recognition, and personnel state detection.
The identification device 101 has a communication function and can access the internet. The identification device 101 may be generally referred to as one of a plurality of identification devices, and those skilled in the art will appreciate that the number of the identification devices may be greater or less, and is not limited thereto.
The server 102 is configured to manage the plurality of recognition devices 101 and provide background data services for them, for example, storing the captured scene images, performing image processing such as segmentation and recognition on the scene images captured by the recognition devices, performing object recognition on the scene images based on an object recognition algorithm, and the like. In some embodiments, the server 102 includes a plurality of function servers, such as a device management server, a data storage server, an object recognition server, and the like, which can be used to implement the various function services described above. In other embodiments, the server 102 has an associated database for storing data relating to object identification by the object recognition system. In still other embodiments, the server 102 may be an independent physical server, a server cluster or a distributed file system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, Content Delivery Networks (CDNs), and big data and artificial intelligence platforms.
In some embodiments, the network may be a wired network or a wireless network. The network uses standard communication techniques and/or protocols. The network is typically the Internet, but can be any network, including but not limited to a Local Area Network (LAN), a Metropolitan Area Network (MAN), a Wide Area Network (WAN), a mobile, wired, or wireless network, a private network, or any combination of virtual private networks. In some embodiments, data exchanged over the network is represented using techniques and/or formats including Hypertext Markup Language (HTML), Extensible Markup Language (XML), and the like. All or some of the links can also be encrypted using conventional encryption techniques such as Secure Socket Layer (SSL), Transport Layer Security (TLS), Virtual Private Network (VPN), and Internet Protocol Security (IPsec). In other embodiments, custom and/or dedicated data communication techniques can also be used in place of, or in addition to, the data communication techniques described above.
The following describes application scenarios related to the present application.
The object identification method and the object identification system provided by this application can be applied to personnel management scenarios. Each identification device in the object identification system can be deployed in a corresponding scene area to identify the personnel passing through that area, thereby realizing efficient personnel management and safety monitoring. For the target object responsible for assisting the identification device with personnel management, the identification device can accurately identify its behavior state, thereby improving personnel management efficiency. In some embodiments, the identification device is also referred to as a digital sentry.
After the implementation environment and the application scenario of the embodiment of the present application are introduced, the object identification method provided in the embodiment of the present application is briefly described below. Fig. 2 is a flowchart of an object identification method provided in an embodiment of the present application, where the method is applied to identification devices included in the object identification system, and each identification device is deployed in a corresponding scene area to perform object identification, and referring to fig. 2, the method is performed by any one identification device, and includes the following steps 201 to 203.
201. The recognition device acquires at least one scene image in a scene area, and the recognition device is deployed in the scene area for object recognition.
The scene image acquired by the recognition device includes objects moving in the scene area, for example, people passing through the scene area corresponding to the recognition device, or staff assisting the recognition device in the scene area.
In some embodiments, the recognition device captures images of the scene within the scene area via the configured image capture device. In some embodiments, the image capture device may be a camera. Optionally, the camera may include a monocular camera, a binocular camera, a wide-angle camera, a super wide-angle camera, or other types to acquire images of various specifications or images carrying various types of image information, which is not limited in this application.
In some embodiments, the recognition device extracts image features from the pre-processed scene image by pre-processing the acquired scene image, and then performs object recognition according to the extracted image features. Illustratively, the image feature may be a human face feature, and the recognition device may recognize the identity of the object corresponding to the human face feature according to the human face feature.
202. The recognition device compares object features of at least one scene image with stored object features of a target object.
The object features describe the whole-body morphological features of an object. For example, where the object is a person, the object features may include multiple dimensions capable of describing the person's whole-body form, such as posture features, clothing features, and hair style features. In some embodiments, the posture features can characterize the person's height and body proportions, while the clothing and hair style features characterize distinctive appearance traits.
Wherein the object feature of the target object is obtained by performing object recognition on a whole-body image of the target object. In some embodiments, the recognition device performs feature extraction on the whole-body image of the target object through the feature extraction network. Illustratively, the identification device can identify the skeleton and the outline of the target object in the whole-body image through key point detection, so as to obtain the posture characteristics of the object; the identification device can also perform fine-grained feature extraction on each region by partitioning the whole-body image so as to obtain features such as hairstyle, clothing and the like; based on this, by integrating the above-mentioned multiple features, a multi-dimensional object feature can be obtained.
The object features of the scene image are obtained by performing object recognition on the scene image. In some embodiments, the recognition device determines the object features of the scene image based on a trained object recognition neural network model. Illustratively, a target segmentation network in the neural network model segments the scene image into a plurality of regions to be identified; a feature extraction network then extracts features from each region to be identified (see the feature extraction process above); the extracted features of each region are classified, and a region whose features conform to object features is determined to be a region where an object is located. Accordingly, the features of that region are the object features of the scene image. In some embodiments, "conforming to object features" means that the probability of the region's features being classified as object features is greater than a preset value.
In some embodiments, object features of a plurality of objects can be identified in one scene image, and the determination process is the same as the above process, and is not described herein again. For example, 3 persons are included in the scene image, person 1 corresponds to object feature 1, person 2 corresponds to object feature 2, and person 3 corresponds to object feature 3.
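As an illustrative sketch of this pipeline (the detector and embedder interfaces below are assumptions for illustration, not part of this application), the per-object feature extraction in a scene image may be organized as follows:

import numpy as np

def extract_scene_object_features(scene_image, detector, embedder, prob_threshold=0.5):
    # Sketch of the step 202 pipeline: segment candidate regions, extract
    # features per region, and keep regions whose features classify as objects.
    # detector.detect and embedder.encode are hypothetical interfaces.
    object_features = []
    for region_crop, object_prob in detector.detect(scene_image):
        if object_prob <= prob_threshold:  # below the preset value: not an object
            continue
        feature = embedder.encode(region_crop)  # e.g. a whole-body embedding
        object_features.append(feature / np.linalg.norm(feature))
    return object_features  # one feature vector per recognized object, e.g. 3 people -> 3 vectors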
In this embodiment, the target object is an object corresponding to the identification device. The recognition device may determine whether the object in the scene image includes the target object by comparing the stored object features of the target object with object features extracted from the scene image in real time.
In some embodiments, the comparison result obtained by the identification device indicates the probability that the object in the scene image is the target object.
203. The recognition device determines a behavior state of the target object based on the comparison result, the behavior state indicating whether the target object is located within the scene area.
Wherein the comparison result indicates the probability that the object in the scene image is the target object. In some embodiments, the recognition device may determine whether an object in the scene image is the target object according to the comparison result, and further determine whether the target object is currently in the scene area. Based on this, the behavioral state of the target object can be determined.
According to the technical scheme provided by the embodiments of the present application, because object features can describe the whole-body morphological features of an object, comparing object features acquired in real time against the stored object features greatly improves the accuracy and efficiency of determining the object's behavior state, thereby improving personnel management efficiency.
The embodiment corresponding to fig. 2 briefly describes the object recognition method provided in the embodiment of the present application. The object recognition method provided by the present application is explained in detail below with an embodiment corresponding to fig. 3.
Referring to fig. 3, fig. 3 is a flowchart of an object identification method provided in an embodiment of the present application, where the method is applied to identification devices included in the object identification system, and each identification device is deployed in a corresponding scene area to perform object identification, and referring to fig. 3, the method is performed by any one identification device, and includes the following steps 301 to 308.
301. The recognition device extracts the face features of the target object based on the whole-body image of the target object, and determines the identity information of the target object based on the face features.
In the embodiment of the present application, there is a correspondence between the target object and the identification device.
In some embodiments, each recognition device in the object recognition system corresponds to one or more target objects, that is, each recognition device has its corresponding target object, and each recognition device is responsible for detecting the behavior state of its corresponding target object.
The whole-body image includes a face image region. The recognition device can extract the face features of the target object from the whole-body image through a face recognition algorithm, and then query the identity information of the target object from the associated database based on those face features. In some embodiments, the database is provided by a server: the recognition device sends the extracted face features to the server; the server compares the received face features with the face features stored in the database, and returns the identity information corresponding to the matched face features to the identification device.
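A minimal sketch of this device-server exchange, assuming a JSON-over-HTTP interface (the endpoint and payload shape are hypothetical):

import requests

IDENTIFY_URL = "http://server.example/api/identify"  # hypothetical endpoint

def query_identity(face_feature):
    # Send the extracted face features to the server; the server compares them
    # with stored face features and returns the matched identity information.
    payload = {"face_feature": [float(x) for x in face_feature]}
    resp = requests.post(IDENTIFY_URL, json=payload, timeout=5)
    resp.raise_for_status()
    result = resp.json()  # e.g. {"matched": true, "object_id": "worker-007"}
    return result if result.get("matched") else None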
In some embodiments, the whole-body image is acquired on-site by an identification device to ensure authenticity of the whole-body image. Optionally, the whole-body image includes a plurality of angles, for example, a front whole-body image (including a face image) and a side whole-body image.
In some embodiments, the database stores face images of target objects, collected in advance and used as the basis for verifying the identity of a target object. In such an example, the recognition device may directly transmit the whole-body image to the server; the server identifies and segments the region where the face is located, then compares the segmented face image against the face images in the database to determine the identity information of the target object. Based on this, the identification device only needs to capture the whole-body image, send it, and receive the identity information, so the computing power of the server can be leveraged to further improve working efficiency.
In other embodiments, the facial images stored in the database correspond to recognition devices, so as to achieve the purpose of configuring target objects for the recognition devices. Illustratively, the identity information comprises an object identifier of the object, the identity information of the object and the face image are stored in the database, and the purpose of personnel preplanning can be achieved by binding the object identifier and the equipment identifier of the identification equipment.
302. When the identity information of the target object passes verification, the identification device extracts the object features of the target object from the whole-body image and stores them, where the object features describe the whole-body morphological features of the object.
In the embodiment of the application, the identification device can determine the identity of the target object based on the face features so as to verify the target object. In some embodiments, the identity information includes an object identifier, the recognition device determines the object identifier of the target object through the facial features, and if the object identifier is bound to the device identifier of the recognition device, the target object passes the verification.
The process of extracting the object features of the target object refers to the description in step 202, and is not described herein again. In the embodiment of the application, the process of binding the object features with the identification device is completed by storing the object features of the target object, and based on the process, the identification device can detect the behavior state of the target object in real time according to the bound object features. In some embodiments, the recognition device utilizes a pedestrian Re-identification (ReID) technique to extract the object features.
In some embodiments, the binding relationship between the identification device and the target object is valid for a target time period; that is, the binding between the recognition device and the object features of the target object is temporary. After the target time period, the recognition device can be re-bound with the object features of another object. For example, the target time period may be determined according to the work shift of the target object; e.g., with a shift change every three days, the identification device is temporarily bound to a new target object every three days.
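One way to realize such a temporary binding, sketched under the assumption of an in-memory store and a three-day validity period (all names are illustrative):

import time

class TemporaryBinding:
    # Sketch of step 302: bind a verified target object's features to this
    # device for a target time period, after which re-binding is possible.
    def __init__(self, device_id, planned_object_ids):
        self.device_id = device_id
        self.planned_object_ids = planned_object_ids  # ids pre-planned for this device
        self.bindings = {}  # object_id -> (object_feature, expires_at)

    def bind(self, object_id, object_feature, valid_seconds=3 * 24 * 3600):
        if object_id not in self.planned_object_ids:
            return False  # verification fails: object not bound to this device id
        self.bindings[object_id] = (object_feature, time.time() + valid_seconds)
        return True

    def active_features(self):
        now = time.time()
        return [feat for feat, expires in self.bindings.values() if expires > now]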
In an exemplary application scenario, the above steps 301 to 302 can be described as the process of binding a worker to the corresponding identification device, which includes: collecting a face image of each worker according to the staffing plan; when a worker reports for duty, the corresponding recognition device collects a front whole-body image and a side whole-body image of the worker, where the face in the front whole-body image is clear; the recognition device recognizes the face features from the collected whole-body images and compares them with the face features in the database to confirm the worker's identity; if the worker passes identity verification, the identification device computes the object features based on the whole-body image (for example, via the ReID technique) and temporarily binds the worker's object features to the identification device. Based on this, the worker completes check-in, the identification device enters the working state, and object recognition is performed on the personnel passing through the scene area.
Through the above technical scheme, joint verification based on object features and face features can accurately identify the corresponding worker, effectively avoiding misidentification of workers and greatly improving personnel management efficiency.
303. The recognition device acquires at least one scene image within the scene area.
For step 303, refer to step 201 above.
In some embodiments, the recognition device acquires the scene images according to a preset frequency, that is, the recognition device acquires one scene image every a preset time length, for example, the preset time length may be 0.1 second.
In some embodiments, the recognition device is configured with an image capture device disposed on a front panel of the recognition device to enable the recognition device to capture images of a scene in front. In some embodiments, the image capturing device is a wide-angle camera through which the recognition device captures images of the scene within a certain angle range, for example, the capturing angle range of the wide-angle camera disposed on the front panel is 0 ° to 160 °, and the recognition device is capable of capturing images of the scene within a range of 0 ° to 160 ° ahead.
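For illustration only, a fixed-frequency acquisition loop might look like the following (OpenCV is an assumed choice; the application does not prescribe a particular library):

import time
import cv2

def acquisition_loop(process_frame, interval_s=0.1, camera_index=0):
    # Step 303 sketch: collect one scene image every preset time length
    # (0.1 seconds here) and hand it to downstream recognition.
    cap = cv2.VideoCapture(camera_index)
    try:
        while True:
            ok, frame = cap.read()
            if ok:
                process_frame(frame)  # e.g. feature extraction and comparison
            time.sleep(interval_s)
    finally:
        cap.release()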
304. The recognition device compares object features of at least one scene image with stored object features of a target object, the object features describing a full-body morphological feature of the object.
In step 304, refer to step 202, which is not described herein.
In the process of extracting the object features of the scene image in this step, reference may be made to the description in step 202. In some embodiments, the recognition device performs object recognition on the scene image based on a Person Re-identification (ReID) technique to determine object features of the scene image.
In other embodiments, the identification device determines the object feature corresponding to a period of time based on a plurality of scene images within the period of time, that is, the identification device calculates the object feature in the scene area according to a certain frequency, so as to save the calculation resources.
305. In a case where the comparison result indicates that the object feature of the scene image does not match the object feature of the target object, the identification device determines that the target object is in a first behavior state indicating that the target object is not within the scene area.
In some embodiments, the recognition device calculates a similarity between the object feature of the target object and the object feature of the scene image, thereby determining the contrast result based on the calculated similarity. In some embodiments, the object features are represented by feature vectors, and the similarity may be represented by euclidean distances between the feature vectors, where the larger the euclidean distance, the smaller the similarity, and the smaller the probability that the object in the scene image is the target object is.
In some embodiments, the comparison result may be represented by a flag bit, where the flag bit is set to 1 to indicate that the object feature of the scene image matches the object feature of the target object, and the flag bit is set to 0 to indicate that the object feature of the scene image does not match the object feature of the target object.
In some embodiments, if the calculated similarity is smaller than the similarity threshold, it indicates that the object feature of the scene image does not match the object feature of the target object, and the flag is set to 0.
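A sketch of this comparison, with an illustrative mapping from Euclidean distance to similarity (the threshold value and the mapping are assumptions):

import numpy as np

SIMILARITY_THRESHOLD = 0.7  # assumed value, tuned per deployment

def compare_features(scene_features, target_feature):
    # Step 305 sketch: the flag bit is 1 when some object in the scene
    # matches the target object, and 0 otherwise.
    for feature in scene_features:
        distance = np.linalg.norm(feature - target_feature)  # Euclidean distance
        similarity = 1.0 / (1.0 + distance)  # larger distance -> smaller similarity
        if similarity >= SIMILARITY_THRESHOLD:
            return 1  # match: target object present in the scene area
    return 0  # no match: first behavior state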
Through the above technical scheme, by comparing object features, whether the target object is located in the scene area can be accurately detected, which ensures that the target object stays within the identification device's field of view and improves personnel management efficiency.
In other embodiments, in the case that it is determined that the target object is in the scene area, the identification device further performs fine-grained identification on the behavior state of the target object, and the process includes the following steps 1 to 3.
Step 1: when the comparison result indicates that the object features of the scene image match the object features of the target object, the identification device determines that the target object is within the scene area.
Step 1 is similar to step 305, and will not be described herein.
Step 2: the identification device determines the distance between the target object and the identification device based on the depth information of the image area where the target object is located in the at least one scene image.
Wherein the depth information indicates distances between the respective pixel points of the target object and the camera. In some embodiments, the depth information indicates three-dimensional coordinates of respective pixel points of the target object in the scene area, based on which the distance between the target object and the recognition device can be determined.
In some embodiments, the image capturing device configured on the identification device includes a depth camera, and the scene image captured by the depth camera is a depth image carrying depth information. Optionally, the depth image includes three channels of pixel values and one channel of depth values, and the recognition device obtains the depth information from the depth-value channel of the scene image. In some embodiments, the image capturing device is a binocular camera that captures left and right images synchronously; the disparity of each pixel is estimated by matching the two images, and the depth of the target object is then inferred from the disparity.
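As a sketch (the bounding box comes from the object recognition step; taking the median depth of the region is an assumption made for robustness to outliers):

import numpy as np

def estimate_distance(depth_map, bbox):
    # Step 2 sketch: estimate the target-object distance from the depth
    # values of the image region where the target object is located.
    x0, y0, x1, y1 = bbox
    region = depth_map[y0:y1, x0:x1]
    valid = region[region > 0]  # discard pixels without a depth estimate
    return float(np.median(valid)) if valid.size else None

# For a binocular camera, per-pixel depth follows from the estimated disparity:
#   depth = focal_length * baseline / disparity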
Step 3: when the distance is greater than a target threshold, the identification device determines that the target object is in a second behavior state, the second behavior state indicating that the target object is within the scene area but too far from the identification device.
The target threshold value can be set according to the personnel planning requirement in the application scene.
In some embodiments, this accounts for the case where the target object, although in the scene area, is too far from the identification device to assist it with personnel identity verification in time.
In other embodiments, after performing step 1 above and determining that the target object is located in the scene area, the identification device may further identify a behavior trend of the target object in the scene area, where the identification process includes the following steps a and B.
Step A: when the identification device determines that the target object is located within the scene area, it determines the distance change trend between the target object and the identification device based on the depth information of the area where the target object is located in a plurality of scene images.
For the depth information, refer to step 2 above.
In some embodiments, the depth information of the plurality of scene images indicates the distance between the target object and the recognition device at successive moments; if this distance keeps increasing, the target object tends to be moving away from the recognition device and may even leave the scene area.
Step B: if the distance change trend indicates that the target object is moving away from the recognition device, the recognition device plays a second prompt voice for the target object, the second prompt voice being used to keep the target object within the scene area.
In some embodiments, if the target object tends to move away from the recognition device or even leave the scene area, it may not be able to assist the recognition device with personnel identity verification in time. In this case, by accurately detecting the target object's distance change trend, the device can prompt the target object in time by playing a voice, so that the target object efficiently assists the recognition device and personnel management efficiency is improved.
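A sketch of the trend detection, assuming distances sampled at known timestamps and a simple linear fit (the choice of fit is an assumption):

import numpy as np

def distance_trend(timestamps_s, distances_m):
    # Steps A-B sketch: a positive slope of distance over time means the
    # target object is moving away from the recognition device.
    slope = np.polyfit(timestamps_s, distances_m, 1)[0]  # metres per second
    return "moving_away" if slope > 0 else "holding_or_approaching"

# e.g. play the second prompt voice when distance_trend(...) == "moving_away"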
It should be noted that, the above step a and step 2 can be executed after the step 1 is executed, and the step 2 can be implemented in the process of executing the step a, for example, according to the step a, in the process that the recognition device determines the distances between the target object and the recognition device at a plurality of time points based on the depth information in the plurality of scene images, if it is detected that the distance at any time point is greater than the target threshold, the recognition device directly executes the step 3, that is, it can directly determine that the distance between the target object and the recognition device is too far.
306. The recognition device records the time when the target object leaves the scene area based on the time stamp of any scene image in response to detecting that the target object is in the first behavior state.
In this embodiment of the application, if the recognition device detects that the target object is in the first behavior state, the target object has left the scene area. The recognition device can accurately determine the time the target object left based on the timestamp of the scene image in which the first behavior state was detected.
In other embodiments, the identification device can send the time to a server for recording to facilitate personnel management and liability determination.
307. The recognition device responds to the detection that the target object is in the first behavior state, and plays a first prompt voice which is used for enabling the target object to return to the scene area.
In some embodiments, the recognition device may link a plurality of recognition devices around the target object to prompt the target object by sending the object feature to other recognition devices adjacent to the recognition device, and the process includes: the recognition device responds to the detection that the target object is in the first behavior state, and sends the object characteristics of the target object to a target recognition device adjacent to the recognition device in the object recognition system, wherein the target recognition device is used for playing the first prompt voice under the condition that the target object is detected. In some embodiments, the target recognition device may detect whether the target object is present by comparing the received object features with object features of the captured scene image according to a process similar to steps 303 to 305.
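A sketch of the linkage notification, again assuming a JSON-over-HTTP interface between adjacent devices (endpoint and payload are hypothetical):

import requests

def notify_adjacent_devices(adjacent_urls, target_feature):
    # Step 307 linkage sketch: push the target object's features to adjacent
    # recognition devices so that they play the first prompt voice on detection.
    payload = {"object_feature": [float(x) for x in target_feature], "prompt": "first"}
    for url in adjacent_urls:
        try:
            requests.post(url, json=payload, timeout=2)
        except requests.RequestException:
            pass  # an unreachable neighbour should not block the others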
The above technical scheme provides a multi-device linkage prompting scheme, which enlarges the range over which personnel can be reminded and further improves personnel management efficiency.
308. The identification device responds to identification of an abnormal object from any one of the scene images under the condition that the target object is in the first behavior state, and acquires a scene image including the abnormal object in the scene area within a target duration after the abnormal object is identified.
In some embodiments, when the target object is in the first behavior state, an abnormal object is currently not blocked from passing; by promptly capturing images that include the abnormal object, its track can be effectively traced, providing a basis for subsequent safety tracing and responsibility determination.
In some embodiments, the target duration is a duration that ensures that the abnormal object is located within the scene area, for example, the target duration may be 30 seconds.
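A sketch of the time-bounded capture, with the 30-second target duration taken from the example above (the capture and re-recognition interfaces are assumptions):

import time

def record_abnormal_object(capture_frame, contains_abnormal, target_duration_s=30):
    # Step 308 sketch: within the target duration after the abnormal object is
    # recognized, keep the scene images that still include the abnormal object.
    kept, deadline = [], time.time() + target_duration_s
    while time.time() < deadline:
        frame = capture_frame()
        if contains_abnormal(frame):  # e.g. re-run object recognition per frame
            kept.append((time.time(), frame))
    return kept  # later sent to the server for safety tracing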
The abnormal object refers to an object whose identity fails verification. In some embodiments, the verification may be performed in the following three ways.
The first way is face verification.
In some embodiments, the recognition device queries, based on the facial features recognized in the scene image, the identity information corresponding to the facial features from the associated database, and if valid identity information cannot be queried or the queried identity information is not allowed to pass, the object corresponding to the facial features is an abnormal object that fails the verification.
The second way is certificate verification.
In some embodiments, the identification device is provided with a credential verification function. After performing security authentication with the chip in the certificate via wireless transmission, the identification device can read the identity information in the chip, and then compares the read identity information with the identity information in the database; if the read identity information does not belong to the valid identity information in the database, or the read identity information is confirmed as not allowed to pass, the object holding the certificate is an abnormal object that fails verification.
The third way is health-condition verification.
In some embodiments, the identification device may perform body temperature detection on the subject in the scene area through the target sensor, and if the body temperature detection result of the subject is abnormal (e.g., the body temperature is too high), the subject is an abnormal person who has not passed the verification.
In other embodiments, the recognition device can send the scene image including the abnormal object to a server for saving, so as to facilitate subsequent safety tracing.
It should be noted that the steps 306, 307, and 308 can be executed after the execution of the step 305 is completed.
In the technical scheme provided above, because object features describe the whole-body morphological features of an object, comparing object features acquired in real time against the stored object features greatly improves the accuracy and efficiency of determining the object's behavior state, thereby improving personnel management efficiency. In addition, a multi-device linkage prompting scheme is provided, which enlarges the range over which personnel can be reminded and further improves personnel management efficiency. Furthermore, by promptly capturing images that include an abnormal object, the abnormal object's track can be effectively traced, providing a basis for subsequent safety tracing and responsibility determination and further improving personnel management efficiency.
Fig. 4 is a schematic structural diagram of an object recognition apparatus provided in an embodiment of the present application, and referring to fig. 4, the apparatus is applied to recognition devices included in an object recognition system, and each recognition device is deployed in a corresponding scene area to perform object recognition; the device includes:
an acquisition module 401, configured to acquire at least one scene image in the scene area;
a comparing module 402, configured to compare an object feature of the at least one scene image with an object feature of a stored target object, where the object feature describes a whole-body morphological feature of an object, the object feature of the scene image is obtained by performing object recognition on the scene image, and the object feature of the target object is obtained by performing object recognition on the whole-body image of the target object;
a state determining module 403, configured to determine a behavior state of the target object based on the comparison result, where the behavior state indicates whether the target object is located in the scene area.
In one possible implementation, the state determining module 403 is configured to:
in a case where the comparison result indicates that the object feature of the scene image does not match the object feature of the target object, the identification device determines that the target object is in a first behavior state indicating that the target object is not within the scene area.
In one possible embodiment, the apparatus further comprises:
and the recording module is used for recording the time when the target object leaves the scene area based on the time stamp of any scene image in response to the detection that the target object is in the first behavior state.
In one possible embodiment, the apparatus further comprises:
and the first prompt module is used for responding to the detection that the target object is in the first behavior state, and playing a first prompt voice, wherein the first prompt voice is used for enabling the target object to return to the scene area.
In one possible embodiment, the apparatus further comprises:
and the linkage prompting module is used for responding to the detection that the target object is in the first behavior state, sending the object characteristics of the target object to a target recognition device adjacent to the recognition device in the object recognition system, and playing the first prompting voice under the condition that the target object is detected by the target recognition device.
In one possible embodiment, the apparatus further comprises:
and the abnormal acquisition module is used for responding to the identification of an abnormal object from any one of the scene images under the condition that the target object is in the first behavior state, and acquiring the scene image including the abnormal object in the scene area within the target time length after the abnormal object is identified.
In one possible implementation, the state determination module 403 is configured to:
determining that the target object is within the scene area if the comparison result indicates that the object features of the scene image match the object features of the target object;
determining the distance between the target object and the recognition device based on the depth information of the image area where the target object is located in the at least one scene image;
in the case that the distance is greater than a target threshold, determining that the target object is in a second behavior state, the second behavior state indicating that the target object is too far away from the recognition device within the scene area.
In one possible embodiment, the apparatus further comprises:
the second prompting module is used for determining a distance change trend between the target object and the identification equipment based on the depth information of the area where the target object is located in a plurality of scene images under the condition that the target object is determined to be located in the scene area;
and if the distance change trend indicates that the target object is far away from the recognition equipment, playing a second prompt voice aiming at the target object, wherein the second prompt voice is used for keeping the target object in the scene area.
In one possible embodiment, the apparatus further comprises:
an object feature extraction module, configured to extract the face features of the target object based on a whole-body image of the target object, and to determine the identity information of the target object based on the face features;
an object binding module, configured to, in a case where the identity information of the target object passes verification, extract the object features of the target object from the whole-body image and store them.
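An enrollment sketch of that face-then-body binding flow; every callable here (face encoder, identity lookup, body re-identification encoder, persistence layer) is a hypothetical stand-in rather than an API disclosed by the patent:

```python
def bind_target_object(whole_body_image, face_encoder, verify_identity,
                       body_encoder, store):
    """Verify identity via face features first, then bind and persist the
    whole-body features used later for scene-image comparison."""
    face_feature = face_encoder(whole_body_image)    # hypothetical face model
    identity = verify_identity(face_feature)         # hypothetical identity lookup
    if identity is None:
        raise ValueError("identity verification failed; features not stored")
    body_feature = body_encoder(whole_body_image)    # hypothetical re-ID model
    store(identity, body_feature)                    # hypothetical persistence layer
    return identity
```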
With the above technical solution, because the object features describe the whole-body morphological features of an object, comparing the features acquired in real time against the stored features greatly improves the accuracy and efficiency of determining the behavior state of the object, thereby improving the efficiency of personnel management. In addition, a multi-device linkage prompting scheme is provided, which expands the range over which personnel can be prompted and further improves management efficiency. Furthermore, promptly capturing images that include an abnormal object makes it possible to track that object's trajectory effectively, providing a basis for subsequent security tracing and responsibility determination, and further improving personnel management efficiency.
It should be noted that the object recognition apparatus provided in the foregoing embodiment is described, when performing the corresponding steps, only in terms of the above division of functional modules. In practical applications, these functions may be allocated to different functional modules as needed; that is, the internal structure of the computer device may be divided into different functional modules to complete all or part of the functions described above. In addition, the object recognition apparatus and the object recognition method provided by the above embodiments belong to the same concept; their specific implementation processes are described in detail in the method embodiments and are not repeated here.
The embodiment of the present application provides a computer device, which includes a processor and a memory, where the memory is used to store at least one computer program, and the at least one computer program is loaded and executed by the processor to implement the object recognition method described above.
Taking the computer device being a terminal as an example, the terminal may be implemented as the recognition device described above. Fig. 5 is a schematic structural diagram of a terminal according to an embodiment of the present application. The terminal may be: a Personal Computer (PC), a mobile phone, a smart phone, a Personal Digital Assistant (PDA), a wearable device, a Pocket PC (PPC), a tablet computer, a smart in-vehicle unit, a smart television, a smart speaker, a smart voice interaction device, a smart home appliance, a vehicle-mounted terminal, and the like. A terminal may also be referred to by other names such as user equipment, user terminal, portable terminal, laptop terminal, or desktop terminal.
Generally, a terminal includes: a processor 501 and a memory 502.
The processor 501 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 501 may be implemented in at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 501 may also include a main processor and a coprocessor: the main processor is a processor for processing data in the awake state, also called a Central Processing Unit (CPU); the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 501 may be integrated with a Graphics Processing Unit (GPU), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 501 may also include an AI processor for handling computing operations related to machine learning.
Memory 502 may include one or more computer-readable storage media, which may be non-transitory. Memory 502 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in the memory 502 is used to store at least one instruction for execution by the processor 501 to cause the terminal to implement the object identification method provided by the method embodiments herein.
In some embodiments, the terminal may further include: a peripheral interface 503 and at least one peripheral. The processor 501, memory 502 and peripheral interface 503 may be connected by a bus or signal lines. Each peripheral may be connected to the peripheral interface 503 by a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 504, display screen 505, camera assembly 506, audio circuitry 507, and power supply 508.
The peripheral interface 503 may be used to connect at least one Input/Output (I/O) related peripheral to the processor 501 and the memory 502. In some embodiments, the processor 501, memory 502, and peripheral interface 503 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 501, the memory 502, and the peripheral interface 503 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 504 is used to receive and transmit Radio Frequency (RF) signals, also called electromagnetic signals. The radio frequency circuit 504 communicates with communication networks and other communication devices via electromagnetic signals, converting electrical signals into electromagnetic signals for transmission, or converting received electromagnetic signals into electrical signals. Optionally, the radio frequency circuit 504 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and the like. The radio frequency circuit 504 may communicate with other terminals via at least one wireless communication protocol, including but not limited to: metropolitan area networks, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or Wireless Fidelity (WiFi) networks. In some embodiments, the radio frequency circuit 504 may also include Near Field Communication (NFC) related circuitry, which is not limited in this application.
The display screen 505 is used to display a User Interface (UI). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 505 is a touch display screen, it also has the ability to capture touch signals on or above its surface; such a touch signal may be input to the processor 501 as a control signal for processing. In this case, the display screen 505 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 505, disposed on the front panel of the terminal; in other embodiments, there may be at least two display screens 505, respectively disposed on different surfaces of the terminal or in a folded design; in still other embodiments, the display screen 505 may be a flexible display disposed on a curved or folded surface of the terminal. The display screen 505 may even be arranged as a non-rectangular irregular figure, that is, a shaped screen. The display screen 505 may be made of materials such as a Liquid Crystal Display (LCD) or an Organic Light-Emitting Diode (OLED).
The camera assembly 506 is used to capture images or video, for example, to capture scene images. Optionally, the camera assembly 506 includes a front camera and a rear camera. Generally, the front camera is disposed on the front panel of the terminal, and the rear camera is disposed on the back of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting, Virtual Reality (VR) shooting, or other fused shooting functions. In some embodiments, the camera assembly 506 may also include a flash, which may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation at different color temperatures.
The audio circuitry 507 may include a microphone and a speaker. The microphone is used to collect sound waves from the user and the environment, convert them into electrical signals, and input them to the processor 501 for processing or to the radio frequency circuit 504 for voice communication. For stereo collection or noise reduction, a plurality of microphones may be disposed at different parts of the terminal; the microphone may also be an array microphone or an omnidirectional pickup microphone. The speaker is used to convert electrical signals from the processor 501 or the radio frequency circuit 504 into sound waves. The speaker may be a conventional membrane speaker or a piezoelectric ceramic speaker; a piezoelectric ceramic speaker can convert electrical signals not only into sound waves audible to humans, but also into sound waves inaudible to humans for purposes such as distance measurement. In some embodiments, the audio circuitry 507 may also include a headphone jack.
The power supply 508 is used to supply power to the various components in the terminal. The power supply 508 may be an alternating current supply, a direct current supply, a disposable battery, or a rechargeable battery. When the power supply 508 includes a rechargeable battery, the battery may support wired or wireless charging, and may also support fast-charge technology.
In some embodiments, the terminal also includes one or more sensors 509. The one or more sensors 509 include, but are not limited to: acceleration sensor 510, gyro sensor 511, pressure sensor 512, optical sensor 513, and proximity sensor 514.
The acceleration sensor 510 may detect the magnitude of acceleration on three coordinate axes of a coordinate system established with the terminal. For example, the acceleration sensor 510 may be used to detect the components of the gravitational acceleration in three coordinate axes. The processor 501 may control the display screen 505 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 510.
The gyro sensor 511 may detect a body direction and a rotation angle of the terminal, and the gyro sensor 511 may cooperate with the acceleration sensor 510 to acquire a 3D motion of the user with respect to the terminal. The processor 501 may implement the following functions according to the data collected by the gyro sensor 511: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
The pressure sensor 512 may be disposed on the side frame of the terminal and/or beneath the display screen 505. When the pressure sensor 512 is disposed on the side frame of the terminal, it can detect the user's grip signal on the terminal, and the processor 501 performs left/right-hand recognition or shortcut operations according to the grip signal collected by the pressure sensor 512. When the pressure sensor 512 is disposed beneath the display screen 505, the processor 501 controls the operability controls on the UI according to the user's pressure operations on the display screen 505. The operability controls include at least one of a button control, a scroll-bar control, an icon control, and a menu control.
The optical sensor 513 is used to collect the ambient light intensity. In one embodiment, the processor 501 may control the display brightness of the display screen 505 based on the ambient light intensity collected by the optical sensor 513: when the ambient light intensity is high, the display brightness of the display screen 505 is increased; when the ambient light intensity is low, the display brightness of the display screen 505 is decreased. In another embodiment, the processor 501 may also dynamically adjust the shooting parameters of the camera assembly 506 based on the ambient light intensity collected by the optical sensor 513.
The proximity sensor 514, also known as a distance sensor, is typically disposed on the front panel of the terminal and is used to collect the distance between the user and the front of the terminal. In one embodiment, when the proximity sensor 514 detects that this distance is gradually decreasing, the processor 501 controls the display screen 505 to switch from the bright-screen state to the dark-screen state; when the proximity sensor 514 detects that the distance is gradually increasing, the processor 501 controls the display screen 505 to switch from the dark-screen state to the bright-screen state.
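A minimal sketch of the two sensor-driven behaviors just described, assuming a hypothetical display driver API and arbitrary thresholds (none of these values are given in the embodiment):

```python
def update_display(ambient_lux: float, proximity_m: float, display) -> None:
    """Blank the screen when the user is very close to the front panel;
    otherwise scale brightness with the ambient light intensity."""
    if proximity_m < 0.05:                    # assumed near-distance threshold
        display.set_state("dark")             # hypothetical display driver call
        return
    display.set_state("bright")
    display.set_brightness(min(1.0, ambient_lux / 1000.0))  # assumed mapping
```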
Those skilled in the art will appreciate that the configuration shown in fig. 5 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be used.
Taking the computer device being a server as an example, the computer device may be implemented as the server 102 described above.
Fig. 6 is a schematic structural diagram of a server according to an embodiment of the present application. The server 600 may vary considerably depending on configuration or performance, and may include one or more processors (CPUs) 601 and one or more memories 602, where the one or more memories 602 store at least one computer program that is loaded and executed by the one or more processors 601 to implement the steps performed by the server in the object identification method. Of course, the server 600 may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and may further include other components for implementing device functions, which are not described here.
In an exemplary embodiment, a computer-readable storage medium, such as a memory including a computer program, which is executable by a processor to perform the object recognition method in the above embodiments, is also provided. For example, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product or a computer program is also provided, which includes program code stored in a computer-readable storage medium, which is read by a processor of a computer apparatus from the computer-readable storage medium, and which is executed by the processor to cause the computer apparatus to execute the above-described object recognition method.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (13)

1. An object identification method, applied to recognition devices included in an object recognition system, each recognition device being deployed in a corresponding scene area to perform object recognition, the method comprising:
the recognition device acquires at least one scene image within the scene area;
the recognition device compares the object features of the at least one scene image with stored object features of a target object, wherein the object features describe whole-body morphological features of an object, the object features of the scene image are obtained by performing object recognition on the scene image, and the object features of the target object are obtained by performing object recognition on a whole-body image of the target object;
the recognition device determines a behavior state of the target object based on the comparison result, the behavior state indicating whether the target object is located within the scene area.
2. The method of claim 1, wherein the recognition device determining the behavior state of the target object based on the comparison result comprises:
in a case where the comparison result indicates that the object features of the scene image do not match the object features of the target object, the recognition device determines that the target object is in a first behavior state, the first behavior state indicating that the target object is not within the scene area.
3. The method of claim 2, further comprising:
in response to detecting that the target object is in the first behavior state, the recognition device records, based on the timestamp of any one of the scene images, the time at which the target object left the scene area.
4. The method of claim 2, further comprising:
in response to detecting that the target object is in the first behavior state, the recognition device plays a first prompt voice, wherein the first prompt voice is used to prompt the target object to return to the scene area.
5. The method of claim 4, further comprising:
in response to detecting that the target object is in the first behavior state, the recognition device sends the object features of the target object to a target recognition device adjacent to the recognition device in the object recognition system, wherein the target recognition device is configured to play the first prompt voice in a case where it detects the target object.
6. The method of claim 2, further comprising:
in a case where the target object is in the first behavior state, in response to an abnormal object being recognized from any one of the scene images, the recognition device captures scene images that include the abnormal object within the scene area for a target duration after the abnormal object is recognized.
7. The method of claim 1, wherein the recognition device determining the behavior state of the target object based on the comparison result comprises:
in a case where the comparison result indicates that the object feature of the scene image matches the object feature of the target object, the recognition device determines that the target object is within the scene area;
determining a distance between the target object and the recognition device based on depth information of an image area where the target object is located in the at least one scene image;
in a case where the distance is greater than a target threshold, determining that the target object is in a second behavior state, the second behavior state indicating that the target object is too far from the recognition device within the scene area.
8. The method of claim 7, further comprising:
in a case where the target object is determined to be within the scene area, determining a distance change trend between the target object and the recognition device based on the depth information of the region in which the target object is located in a plurality of scene images;
and if the distance change trend indicates that the target object is moving away from the recognition device, playing a second prompt voice for the target object, wherein the second prompt voice is used to prompt the target object to remain within the scene area.
9. The method of claim 1, wherein before the recognition device compares the object features of the at least one scene image with the stored object features of the target object, the method further comprises:
the recognition device extracts the face features of the target object based on the whole-body image of the target object, and determines the identity information of the target object based on the face features;
and, in a case where the identity information of the target object passes verification, extracts the object features of the target object from the whole-body image and stores the object features of the target object.
10. An object recognition apparatus, applied to recognition devices included in an object recognition system, each recognition device being deployed in a corresponding scene area to perform object recognition, the apparatus comprising:
an acquisition module, configured to acquire at least one scene image within the scene area;
a comparison module, configured to compare an object feature of the at least one scene image with an object feature of a stored target object, where the object feature describes a whole-body morphological feature of an object, the object feature of the scene image is obtained by performing object recognition on the scene image, and the object feature of the target object is obtained by performing object recognition on the whole-body image of the target object;
and a state determining module, configured to determine a behavior state of the target object based on the comparison result, the behavior state indicating whether the target object is located within the scene area.
11. A computer device, characterized in that the computer device comprises a processor and a memory for storing at least one computer program, which is loaded and executed by the processor to implement the object recognition method according to any one of claims 1 to 9.
12. A computer-readable storage medium, in which at least one computer program is stored, which is loaded and executed by a processor to implement the object recognition method according to any one of claims 1 to 9.
13. A computer program product, characterized in that the computer program product comprises at least one computer program which is loaded and executed by a computer device to implement the object recognition method according to any one of claims 1 to 9.
CN202211511637.8A 2022-11-29 2022-11-29 Object identification method, device, equipment and storage medium Pending CN115862063A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211511637.8A CN115862063A (en) 2022-11-29 2022-11-29 Object identification method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115862063A (en) 2023-03-28

Family

ID=85667778

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211511637.8A Pending CN115862063A (en) 2022-11-29 2022-11-29 Object identification method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115862063A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination