CN116939206A - Object recognition method and related equipment - Google Patents


Info

Publication number
CN116939206A
Authority
CN
China
Prior art keywords
data
depth
target
depth image
image data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210375889.6A
Other languages
Chinese (zh)
Inventor
王军
郭润增
王少鸣
陈晓杰
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202210375889.6A
Publication of CN116939206A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/105 Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/13 Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/182 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a pixel
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an object identification method and related equipment; the related embodiments can be applied to various scenes such as cloud technology, artificial intelligence and intelligent traffic. Depth image data for an object to be identified may be acquired; at least one pixel point is grouped to obtain pixel point groups; a target blank data set in a target compression format is acquired; for the pixel points in each pixel point group, data filling processing is performed on the data units under multiple channels in the target blank data set based on the depth data of the pixel points under a single channel, so as to obtain target depth data suitable for the target compression format; the target depth data is compressed to obtain compressed depth image data; and object recognition is performed on the object to be recognized according to the compressed depth image data to obtain an object recognition result. The application can improve the data transmission efficiency of the depth image, further improve the object recognition efficiency, and reduce the network transmission traffic consumed in the object recognition process.

Description

Object recognition method and related equipment
Technical Field
The application relates to the technical field of computers, in particular to an object identification method and related equipment.
Background
With the development of computer technology, image processing technology is applied to more and more fields, for example, biological feature recognition technology is widely applied to various fields such as access control and attendance checking, information security, electronic certificates and the like. Specifically, the biometric identification technology is a technology of automatically extracting biometric features from an image to be identified and then performing authentication based on the features.
In the process of biometric identification, in addition to capturing a visible light image for an object to be identified, a depth image is typically captured, where the depth image may be used for living body detection of the object to be identified, and accuracy of the depth image is critical to living body detection. Therefore, in the related art, the depth image is generally directly transmitted to the back end for living body detection and final identity confirmation, but such a direct transmission manner may result in high network traffic consumption and low image data transmission efficiency.
Disclosure of Invention
The embodiment of the application provides an object recognition method and related equipment, wherein the related equipment can comprise an object recognition device, an electronic device, a computer-readable storage medium and a computer program product. The method can improve the compression rate of depth image data, which helps improve the transmission efficiency of the depth image data, further improves the object recognition efficiency, and reduces the network transmission traffic consumed in the object recognition process.
The embodiment of the application provides an object identification method, which comprises the following steps:
collecting depth image data for an object to be identified, wherein the depth image data comprises depth data of at least one pixel point under a single channel;
grouping the at least one pixel point to obtain at least one pixel point group, wherein each pixel point group comprises a preset number of pixel points;
acquiring a target blank data set corresponding to a target compression format, wherein the target blank data set comprises data units under multiple channels;
for the pixel points in each pixel point group, performing data filling processing on the data units under the multiple channels in the target blank data set based on the depth data of the pixel points under a single channel, so as to obtain target depth data suitable for the target compression format;
compressing the target depth data based on the target compression format to obtain compressed depth image data;
and carrying out object recognition on the object to be recognized according to the compressed depth image data to obtain an object recognition result of the object to be recognized.
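The claimed steps can be illustrated with a minimal sketch. The following specifics are illustrative assumptions only, not fixed by the claims: 16-bit single-channel depth values, 8-bit data units as the multi-channel container, a group size of 2, zlib standing in for the target compression format, and all function names are hypothetical.

```python
import zlib


def pack_depth(depth16, group_size=2):
    """Split each 16-bit depth value into two 8-bit data units (high byte,
    low byte) so it fits an 8-bit-per-channel container; pad the pixel
    list with reference pixels (value 0) up to a multiple of group_size."""
    rem = len(depth16) % group_size
    if rem:
        depth16 = list(depth16) + [0] * (group_size - rem)
    units = []
    for d in depth16:
        units.append((d >> 8) & 0xFF)  # high byte -> first channel unit
        units.append(d & 0xFF)         # low byte  -> second channel unit
    return bytes(units)


def unpack_depth(units):
    """Merge pairs of 8-bit channel units back into 16-bit depth values."""
    return [(units[i] << 8) | units[i + 1] for i in range(0, len(units), 2)]


# Round trip: pack, compress (zlib stands in for the target format),
# decompress, merge -- the depth values survive losslessly.
depth = [0, 1234, 65535]
compressed = zlib.compress(pack_depth(depth))
restored = unpack_depth(zlib.decompress(compressed))
```

The round trip shows the point of the scheme: packing keeps every bit of the 16-bit depth data, so a lossless multi-channel compression format can be applied without degrading living body detection.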
Accordingly, an embodiment of the present application provides an object recognition apparatus, including:
the acquisition unit, which is used for collecting depth image data for an object to be identified, wherein the depth image data comprises depth data of at least one pixel point under a single channel;
the grouping unit is used for grouping the at least one pixel point to obtain at least one pixel point group, and each pixel point group comprises a preset number of pixel points;
the obtaining unit, which is used for obtaining a target blank data set corresponding to the target compression format, wherein the target blank data set comprises data units under multiple channels;
the filling unit is used for carrying out data filling processing on the data units in the multiple channels in the target blank data set based on the depth data of the pixel points in each pixel point group to obtain target depth data suitable for the target compression format;
the compression unit is used for compressing the target depth data based on the target compression format to obtain compressed depth image data;
and the identification unit is used for carrying out object identification on the object to be identified according to the compressed depth image data to obtain an object identification result of the object to be identified.
Alternatively, in some embodiments of the present application, the grouping unit may include a pixel filling subunit and a grouping subunit, as follows:
The pixel filling subunit is configured to fill reference pixel points into the depth image data when the total number of pixel points in the depth image data is not an integer multiple of the preset number, so as to obtain depth image data after the filling processing;
and the grouping subunit is used for grouping the pixel points in the depth image data after the filling processing based on the preset quantity to obtain at least one pixel point group.
Optionally, in some embodiments of the present application, the pixel filling subunit may be specifically configured to perform a division operation on the total number of pixel points in the depth image data and the preset number, so as to obtain a remainder after the operation; determining the number of reference pixel points to be filled according to the remainder; and based on the number, filling the reference pixel points into the depth image data to obtain the depth image data after filling.
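The remainder-based padding described above can be sketched as follows (function and parameter names are hypothetical):

```python
def padding_count(total_pixels, preset_number):
    """Number of reference pixel points to fill so that the pixel total
    becomes an integer multiple of the preset group size."""
    remainder = total_pixels % preset_number
    return 0 if remainder == 0 else preset_number - remainder
```

For example, 10 pixel points with a group size of 4 leave a remainder of 2, so 2 reference pixel points are filled to reach 12.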
Alternatively, in some embodiments of the present application, the filling unit may include a determining subunit, a splitting subunit, and a filling subunit, as follows:
the determining subunit is configured to determine a first data amount that can be carried by a data unit in the multiple channels in the target blank data set;
The splitting subunit is used for, for the pixel points in each pixel point group, splitting the depth data of the pixel points under a single channel based on the first data amount and a second data amount of the depth data of the pixel points under the single channel;
and the filling subunit is used for carrying out data filling processing on the data units under the multiple channels based on the splitting result to obtain target depth data applicable to the target compression format.
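The splitting step can be sketched generically: if the first data amount is 8 bits per data unit and the second data amount is 16 bits per single-channel depth value, each depth value splits into two units. A hedged illustration (names and bit widths are assumptions, not claim limitations):

```python
def split_depth(value, second_bits=16, first_bits=8):
    """Split a depth value of `second_bits` bits into ceil(second/first)
    chunks of `first_bits` bits each, most significant chunk first."""
    n_units = -(-second_bits // first_bits)  # ceiling division
    mask = (1 << first_bits) - 1
    return [(value >> (first_bits * (n_units - 1 - i))) & mask
            for i in range(n_units)]
```

Each resulting chunk then fills one data unit under one channel of the target blank data set.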
Alternatively, in some embodiments of the present application, the compression unit may include an acquisition subunit and a compression subunit, as follows:
the acquisition subunit is used for acquiring data index information corresponding to the target depth data;
and the compression subunit is used for compressing the data index information based on the target compression format to obtain compressed depth image data.
Optionally, in some embodiments of the present application, the obtaining subunit may specifically be configured to obtain a preset mapping relationship set, where the preset mapping relationship set includes a mapping relationship between preset depth data and preset data index information; and searching data index information corresponding to the target depth data from the preset mapping relation set.
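The preset mapping relationship set can be pictured as a lookup table. The table contents below are invented purely for illustration; the application does not specify concrete mapping values:

```python
# Hypothetical preset mapping set: frequently occurring depth values
# mapped to short data index values.
preset_mapping = {0: 0, 500: 1, 1000: 2, 1500: 3}


def index_for(depth_value):
    """Look up the data index for a depth value; fall back to the raw
    value when no preset mapping exists (fallback is an assumption)."""
    return preset_mapping.get(depth_value, depth_value)
```

Replacing common depth values with short indices before compression can further reduce the code amount the compressor has to emit.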
Alternatively, in some embodiments of the present application, the identification unit may include a transmitting subunit, a decompressing subunit, a merging subunit, and an identification subunit, as follows:
the sending subunit is configured to send an identification request for the object to be identified to a server, where the identification request includes the compressed depth image data;
the decompression subunit is used for triggering the server to decompress the compressed depth image data based on the identification request, so as to obtain decompressed depth data, wherein the decompressed depth data comprises depth data under multiple channels;
the merging subunit is used for carrying out data merging processing on the depth data under the multi-channel in the decompressed depth data to obtain the depth data of the pixel point in the depth image data under the single channel;
and the identification subunit is used for carrying out object identification on the object to be identified based on the depth data of the pixel point under a single channel to obtain an object identification result of the object to be identified.
Optionally, in some embodiments of the present application, the merging subunit may be specifically configured to divide depth data under multiple channels in the decompressed depth data to obtain at least one depth data set, where the depth data set includes depth data of at least two data units under multiple channels; and carrying out data merging processing on the depth data of at least two data units under the multi-channel to obtain the depth data of the pixel points in the depth image data under the single channel.
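The dividing-and-merging step on the server side can be sketched as follows, assuming two 8-bit data units per pixel point (unit sizes and names are illustrative):

```python
def merge_units(units, units_per_pixel=2, bits_per_unit=8):
    """Divide the decompressed channel units into groups of
    `units_per_pixel` and merge each group back into one single-channel
    depth value (most significant unit first)."""
    depths = []
    for i in range(0, len(units), units_per_pixel):
        value = 0
        for u in units[i:i + units_per_pixel]:
            value = (value << bits_per_unit) | u
        depths.append(value)
    return depths
```

This is the inverse of the splitting performed before compression, restoring the original single-channel depth data for living body detection and identification.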
The electronic equipment provided by the embodiment of the application comprises a processor and a memory, wherein the memory stores a plurality of instructions, and the processor loads the instructions to execute the steps in the object identification method provided by the embodiment of the application.
The embodiment of the application also provides a computer readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps in the object recognition method provided by the embodiment of the application.
In addition, the embodiment of the application also provides a computer program product, which comprises a computer program or instructions, and the computer program or instructions realize the steps in the object recognition method provided by the embodiment of the application when being executed by a processor.
The embodiment of the application provides an object identification method and related equipment, which can collect depth image data for an object to be identified, wherein the depth image data comprises depth data of at least one pixel point under a single channel; group the at least one pixel point to obtain at least one pixel point group, wherein each pixel point group comprises a preset number of pixel points; acquire a target blank data set corresponding to a target compression format, wherein the target blank data set comprises data units under multiple channels; for the pixel points in each pixel point group, perform data filling processing on the data units under the multiple channels in the target blank data set based on the depth data of the pixel points under a single channel, to obtain target depth data suitable for the target compression format; compress the target depth data based on the target compression format to obtain compressed depth image data; and perform object recognition on the object to be recognized according to the compressed depth image data to obtain an object recognition result of the object to be recognized. The application can convert the depth image data into target depth data under multiple channels and then compress the target depth data based on the target compression format, improving the compression rate of the depth image data while preserving its accuracy. This helps improve the transmission efficiency of the depth image data, further improves the object recognition efficiency, and reduces the network transmission traffic consumed in the object recognition process.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1a is a schematic view of a scenario of an object recognition method according to an embodiment of the present application;
FIG. 1b is a flowchart of an object recognition method provided by an embodiment of the present application;
FIG. 1c is an explanatory diagram of an object recognition method provided by an embodiment of the present application;
FIG. 1d is another illustration of an object recognition method provided by an embodiment of the present application;
FIG. 1e is a schematic page diagram of an object recognition method according to an embodiment of the present application;
FIG. 1f is another schematic diagram of a page of an object recognition method according to an embodiment of the present application;
FIG. 1g is another flow chart of an object recognition method provided by an embodiment of the present application;
FIG. 2 is another flow chart of an object recognition method provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an object recognition device according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to fall within the scope of the application.
Embodiments of the present application provide an object recognition method and related devices, which may include an object recognition apparatus, an electronic device, a computer-readable storage medium, and a computer program product. The object recognition device may be integrated in an electronic device, which may be a terminal or a server or the like.
It will be appreciated that the object recognition method of the present embodiment may be executed on the terminal, may be executed on the server, or may be executed by both the terminal and the server. The above examples should not be construed as limiting the application.
As shown in fig. 1a, an object recognition method is taken as an example, where a terminal and a server perform the object recognition method together. The object recognition system provided by the embodiment of the application comprises a terminal 10, a server 11 and the like; the terminal 10 and the server 11 are connected via a network, e.g. a wired or wireless network connection, etc., wherein the object recognition means may be integrated in the terminal.
Wherein, the terminal 10 can be used for: collecting depth image data for an object to be identified, wherein the depth image data comprises depth data of at least one pixel point under a single channel; grouping the at least one pixel point to obtain at least one pixel point group, wherein each pixel point group comprises a preset number of pixel points; acquiring a target blank data set corresponding to a target compression format, wherein the target blank data set comprises data units under multiple channels; for the pixel points in each pixel point group, performing data filling processing on the data units under the multiple channels in the target blank data set based on the depth data of the pixel points under a single channel, to obtain target depth data suitable for the target compression format; compressing the target depth data based on the target compression format to obtain compressed depth image data; and sending the compressed depth image data to a server 11, so that the server 11 performs object recognition on the object to be recognized based on the compressed depth image data to obtain an object recognition result of the object to be recognized. The terminal 10 may include a mobile phone, a smart TV, a tablet computer, a notebook computer, or a personal computer (PC, Personal Computer). A client may also be provided on the terminal 10, which may be an application client or a browser client, etc.
Wherein, the server 11 can be used for: receiving the compressed depth image data sent by the terminal 10, performing object recognition on the object to be recognized based on the compressed depth image data to obtain an object recognition result of the object to be recognized, and transmitting the object recognition result to the terminal 10. The server 11 may be a single server, or a server cluster or cloud server composed of a plurality of servers. In the object recognition method or apparatus disclosed in the application, a plurality of servers may form a blockchain, and the servers are nodes on the blockchain.
The step of performing object recognition by the server 11 may be performed by the terminal 10.
The object recognition method provided by the embodiment of the application relates to a computer vision technology in the field of artificial intelligence.
Among these, Artificial Intelligence (AI) is the theory, method, technique and application system that uses a digital computer or a digital-computer-controlled machine to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence is the study of the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making. Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Artificial intelligence software technology mainly includes computer vision, speech processing, natural language processing, machine learning/deep learning, automatic driving, intelligent traffic and other directions.
Computer Vision (CV) is a science that studies how to make machines "see"; more specifically, it replaces human eyes with cameras and computers to identify and measure targets, and further performs graphic processing so that the processed image is more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision technologies typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, automatic driving, intelligent traffic, etc., as well as common biometric technologies such as face recognition and fingerprint recognition.
The following will describe in detail. The following description of the embodiments is not intended to limit the preferred embodiments.
The present embodiment will be described from the viewpoint of an object recognition apparatus, which may be integrated in an electronic device, which may be a server or a terminal or the like.
It will be appreciated that in the specific embodiments of the present application, related data such as user information is involved, and when the above embodiments of the present application are applied to specific products or technologies, user permissions or consents need to be obtained, and the collection, use and processing of related data need to comply with related laws and regulations and standards of related countries and regions.
The object recognition method of the embodiment of the application can be applied to various scenes requiring object recognition, such as palm payment, access control and attendance checking. The embodiment can be applied to various scenes such as cloud technology, artificial intelligence, intelligent traffic and assisted driving.
As shown in fig. 1b, the specific flow of the object recognition method may be as follows:
101. depth image data for an object to be identified is acquired, the depth image data comprising depth data for at least one pixel point under a single channel.
The object to be identified may be an object that needs biometric identification, for example, an object that needs face recognition, or an object that needs palm print recognition, which is not limited in this embodiment. Specifically, the object to be identified may be a real face, or may be a false face, such as a printed paper face photo. Currently, a depth image is generally used to perform living body detection on the object to be identified, for example to judge whether the object to be identified is a photo.
The depth image data may specifically be a depth image obtained by collecting infrared light with a speckle structure through an infrared sensor and analyzing the speckle through a depth unit. In three-dimensional (3D) computer graphics and computer vision, a depth image is an image or image channel containing information about the distance from the surface of a scene object to a viewpoint. Each pixel point of the depth image represents the vertical distance between the depth camera plane and the photographed object plane, usually represented by 16 bits, in millimeters. In a face-brushing payment scenario, the depth image is typically used for living body detection and assisted identity recognition; using the depth image to assist object recognition can greatly improve the accuracy and robustness of recognition.
The biological characteristic recognition technology is a technology for carrying out personal identity authentication by closely combining a computer with high-tech means such as optics, acoustics, biological sensors, a biological statistics principle and the like and utilizing inherent physiological characteristics (such as fingerprints, faces, irises and the like) and behavioral characteristics (such as handwriting, sound, gait and the like) of a human body.
In this embodiment, the depth image may include at least one pixel, where a pixel value of each pixel represents an actual distance from the sensor to the surface of the object, and specifically, the actual distance corresponding to each pixel may be represented by depth data under a single channel, where the depth data may be understood as a distance; the depth image data may include depth data for each pixel in the depth image in a single channel. A single channel is understood to mean that each pixel is represented by only one pixel value. In one example, the pixel value of each pixel point may be represented with a single channel 16bit precision.
In a particular scenario, at least one set of biometric images for an object to be identified may be acquired, each set of biometric images including depth image data and reference image data, which may include visible light images and infrared light images.
The infrared light image may specifically be an infrared image of flood infrared light imaging collected by an infrared sensor; it can be used for living body detection of the object to be identified, for example, whether the object to be identified is a silica gel head model can be judged based on the brightness of the infrared light image. The visible light image may be a color image of natural light imaging collected by a color sensor, and can be used for identifying the identity of the object to be identified.
It should be noted that, the depth image data and the reference image data belonging to the same group may be acquired at the same time, so that the depth image data and the reference image data are corresponding to each other.
In a specific embodiment, after the camera collects multiple groups of biological feature images, each group of biological feature images can be optimized, and one group of infrared light images, visible light images and depth images which meet the preconditions of the living body detection and identification algorithm can be selected. It should be noted that the depth image, the visible light image, and the infrared light image preferably obtained belong to the same group of biometric images.
For selecting the visible light image, specifically, image analysis of at least one dimension may be performed on each visible light image, so as to select a target visible light image from multiple visible light images. Specifically, at least one dimension of image analysis is performed on each visible light image to ensure that the quality of the image meets the operation requirement of subsequent business, wherein the at least one dimension of image analysis can comprise a face shielding range, an illumination environment, a face size, a face centering degree, a face angle, an image contrast, brightness and definition of the image and the like; and selecting a target visible light image from the acquired visible light images based on the image analysis result.
In addition, the infrared light image can be selected based on its brightness; the depth image can be selected based on its integrity.
The integrity of the depth image refers to the proportion of valid points in the depth image. The value of some pixel points in the depth image is 0; usually, 0 represents an invalid point, and the remaining values represent valid points. Integrity is thus the ratio of valid points to all points in the depth image.
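As an illustrative sketch (not part of the claimed method), the integrity measure described above can be computed as follows, assuming NumPy is available and the 0-means-invalid convention:

```python
import numpy as np

def depth_integrity(depth: np.ndarray) -> float:
    """Ratio of valid points to all points in a depth image.

    Per the convention above, a pixel value of 0 marks an invalid
    point; every non-zero value counts as a valid point.
    """
    if depth.size == 0:
        return 0.0
    return float(np.count_nonzero(depth)) / depth.size

# A 2x3 single-channel 16-bit depth image with two invalid points
depth = np.array([[1200, 0, 1310],
                  [1280, 1295, 0]], dtype=np.uint16)
ratio = depth_integrity(depth)
print(round(ratio, 3))  # 4 valid points out of 6 -> 0.667
```

A threshold on this ratio can then serve as the selection criterion for depth images.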
And then, the terminal packs and compresses the depth image, the visible light image and the infrared light image which are preferably obtained, and transmits the depth image, the visible light image and the infrared light image to a rear-end server through a network to carry out living body detection, feature extraction and identity recognition, so that the online identity recognition of the object to be recognized is completed. In the payment scene, after the identity information of the object to be identified is determined, a deduction code can be generated and returned for payment.
However, in the process of object recognition, transmitting a large number of images causes considerable network traffic consumption. Since the depth image contains depth information of the scene where the object to be identified is located, and the accuracy of the depth information is critical to living body detection, the industry generally uses the PNG lossless compression scheme to compress and transmit depth image data. However, the PNG lossless compression rate is low, an ideal image compression effect is difficult to achieve, and the goals of reducing transmission traffic and improving image transmission efficiency cannot be well achieved.
Among them, PNG (Portable Network Graphics) is a bitmap format employing a lossless compression algorithm.
102. And carrying out grouping processing on the at least one pixel point to obtain at least one pixel point group, wherein each pixel point group comprises a preset number of pixel points.
The preset number may be set according to practical situations, which is not limited in this embodiment. For example, the preset number may be set to 3, and each pixel group includes 3 pixels.
In some embodiments, if the total number of pixels included in the depth image is a multiple of the preset number, for example, the preset number is 3, and the total number of pixels included in the depth image is just a multiple of 3, then the pixels in the depth image may be directly subjected to grouping processing, so that each obtained pixel group includes 3 pixels.
However, in other embodiments, if the total number of pixels included in the depth image is not a multiple of the preset number, for example, the preset number is 3 and the total number of pixels included in the depth image is not a multiple of 3, then every three pixels in the depth image are grouped into one pixel group, and the remaining one or two pixels can be padded with reference pixels and then combined into one pixel group.
For example, if one pixel point remains, two reference pixel points need to be filled, so that the pixel points in the depth image data after the filling process can be exactly divided into an integer number of pixel point groups; for another example, if two pixel points remain, one reference pixel point needs to be filled, so that the pixel points in the depth image data after the filling process can be exactly divided into an integer number of pixel point groups.
Optionally, in this embodiment, the step of "grouping the at least one pixel to obtain at least one pixel group" may include:
when the number relation between the total number of the pixel points in the depth image data and the preset number does not meet the preset multiple relation condition, filling the reference pixel points in the depth image data to obtain the depth image data after the filling;
And based on the preset number, grouping the pixel points in the depth image data after the filling processing to obtain at least one pixel point group.
The pixel value of the reference pixel point may be specifically set to 0, and the reference pixel point with the pixel value of 0 is filled, which is equivalent to filling the depth image data with a data set of all 0, so that the number relationship between the total number of the pixel points in the depth image data after the filling processing and the preset number satisfies the preset multiple relationship condition.
The preset multiple relation condition may specifically be that the total number of pixel points in the depth image data is an integer multiple of a preset number.
Optionally, in this embodiment, the step of performing the filling process of the reference pixel point on the depth image data to obtain the depth image data after the filling process may include:
performing a remainder operation of dividing the total number of the pixel points in the depth image data by the preset number to obtain a remainder;
determining the number of reference pixel points to be filled according to the remainder;
and based on the number, filling the reference pixel points into the depth image data to obtain the depth image data after filling.
Wherein the number of reference pixel points that need to be filled may be determined according to the remainder; specifically, consistent with the examples above, when the remainder is not 0, the number of reference pixel points to be filled is the preset number minus the remainder.
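The padding rule implied by the examples above (one leftover pixel in groups of 3 requires two reference pixels, two leftovers require one) can be sketched as follows; this is an illustration, with the preset number defaulting to 3:

```python
def padding_count(total_pixels: int, group_size: int = 3) -> int:
    """Number of zero-valued reference pixels to append so that the
    total pixel count becomes an integer multiple of group_size (the
    preset number); e.g. one leftover pixel in groups of 3 needs 2."""
    remainder = total_pixels % group_size
    return (group_size - remainder) % group_size

print(padding_count(7))  # remainder 1 -> fill 2 reference pixels
print(padding_count(8))  # remainder 2 -> fill 1 reference pixel
print(padding_count(9))  # already a multiple of 3 -> fill 0
```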
103. And acquiring a target blank data set corresponding to the target compression format, wherein the target blank data set comprises data units under multiple channels.
The target compression format may specifically be the WebP format, etc., which is not limited in this embodiment. WebP is a picture file format supporting both lossy and lossless compression, derived from the image encoding format VP8; VP8 is an open image compression format. The advantages of WebP are that it has a better image data compression algorithm, can produce a smaller picture volume, and has image quality indistinguishable to the naked eye. According to tests, a losslessly compressed WebP file has 45% less file volume than the corresponding PNG file.
As shown in FIG. 1c, compared with the uncompressed PNG original image, PNG lossless compression improves the compression rate by 10.9%; the picture volume is 30.1kb (kilobytes) under PNG lossless compression and 24.1kb under WebP lossless compression, and the compression rate of WebP lossless compression is about 20%-40% higher than that of PNG lossless compression. By comparison, WebP lossless compression can achieve a good compression effect and reduce the picture volume. Although the compression rates of PNG lossy compression and WebP lossy compression are higher still, the data accuracy of the depth image is important in this embodiment, so only lossless compression can be selected.
However, WebP does not support the image format of the depth image in the above embodiment, i.e., does not support the single-channel 16-bit precision image format. According to the object recognition method provided by the application, the single-channel 16-bit precision data of the depth image can be converted into the three-channel 8-bit precision image format supported by WebP, so that the converted depth image can be compressed using WebP lossless compression, improving the compression rate of the data and reducing network transmission traffic consumption and transmission time.
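As a minimal sketch of the final compression step, assuming the Pillow imaging library is available, a three-channel 8-bit array (here filled with random placeholder values rather than real repacked depth data) can be losslessly compressed to WebP and recovered bit-for-bit:

```python
import io

import numpy as np
from PIL import Image

# Placeholder three-channel 8-bit array standing in for repacked
# depth data (random values, not a real depth image)
rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(120, 160, 3), dtype=np.uint8)

buf = io.BytesIO()
Image.fromarray(rgb, mode="RGB").save(buf, format="WEBP", lossless=True)

# Lossless WebP must reproduce every byte of the original array
restored = np.asarray(Image.open(io.BytesIO(buf.getvalue())).convert("RGB"))
assert np.array_equal(rgb, restored)
print(len(buf.getvalue()), "bytes after lossless WebP compression")
```

Bit-exact recovery is what preserves the depth accuracy required for living body detection.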
The target blank data set includes data units under multiple channels, and may include data structure blocks under multiple channels, where each data structure block may be regarded as a data group, and specifically, the target blank data set may include a data group with a value of 0 under multiple channels.
In a specific embodiment, the target blank data set may include data units under three channels, where the amount of data each data unit can carry may be set according to the practical situation; for example, if each data unit can carry 8 bits, the data set format of the target blank data set is three-channel 8-bit precision, and the total amount of data the target blank data set can carry is 24 bits. The target blank data set can be understood as the data set corresponding to any one pixel in a blank image, where the blank image includes image data of at least one pixel under three channels; three channels means that one pixel is represented by three pixel values, each 8 bits in size.
104. And aiming at the pixel points in each pixel point group, carrying out data filling processing on the data units in the multiple channels in the target blank data set based on the depth data of the pixel points in the single channel, and obtaining target depth data suitable for the target compression format.
Specifically, each pixel point group includes a preset number of pixel points, depth data of the pixel points under a single channel can be filled into data units under multiple channels in a target blank data set, and the data in the target blank data set after filling processing is determined as target depth data, wherein the target depth data is the depth data under multiple channels.
Optionally, in this embodiment, the step of performing, for each pixel in each pixel group, data filling processing on the data unit in the multiple channels in the target blank data set based on the depth data of the pixel in the single channel to obtain target depth data applicable to the target compression format may include:
determining a first data volume which can be borne by a data unit in a multichannel in the target blank data set;
for the pixel points in each pixel point group, splitting the depth data of the pixel points under a single channel based on the first data amount and the second data amount of the depth data of the pixel points under the single channel;
And based on the splitting result, carrying out data filling processing on the data units under the multiple channels to obtain target depth data suitable for the target compression format.
In a specific embodiment, if the target blank data set includes data units under three channels, the first data amount that can be carried by each data unit is 8 bits, and the second data amount corresponding to the depth data of a pixel point in the depth image under a single channel is 16 bits, then the 16-bit depth data of the pixel point under the single channel can be split into 2 pieces of 8-bit data, which are respectively filled into 2 of the data units under the multiple channels.
If each pixel point group includes 3 pixel points, the data amount corresponding to each pixel point group is 16 bits × 3 = 48 bits, while the total data amount that can be carried by one target blank data set is 3 × 8 bits = 24 bits; therefore, the data amount corresponding to each pixel point group needs to be carried by two target blank data sets, that is, 3 single-channel 16-bit precision pixel points are converted into two three-channel 8-bit precision pixel points in the blank image.
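The 3-pixels-to-2-RGB-pixels conversion described above can be sketched as follows (a NumPy illustration; the high-byte-first order is an assumption of this sketch, since the embodiment does not fix a byte order):

```python
import numpy as np

def pack_depth(depth: np.ndarray) -> np.ndarray:
    """Repack single-channel 16-bit depth values into three-channel
    8-bit pixels: every 3 depth values (48 bits) become 2 RGB pixels
    (2 x 24 bits). The pixel count must already be a multiple of 3
    (fill reference pixels first if it is not)."""
    flat = depth.astype(np.uint16).ravel()
    assert flat.size % 3 == 0, "fill reference pixels first"
    hi = (flat >> 8).astype(np.uint8)    # high byte of each 16-bit value
    lo = (flat & 0xFF).astype(np.uint8)  # low byte of each 16-bit value
    interleaved = np.empty(flat.size * 2, dtype=np.uint8)
    interleaved[0::2] = hi
    interleaved[1::2] = lo
    # Every 3 consecutive bytes form one three-channel 8-bit pixel
    return interleaved.reshape(-1, 3)

packed = pack_depth(np.array([0x1234, 0x5678, 0x9ABC], dtype=np.uint16))
print(packed.tolist())  # [[18, 52, 86], [120, 154, 188]], i.e. bytes 12 34 56 78 9A BC (hex)
```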
In some embodiments, the step of performing data filling processing on the data units under the multiple channels to obtain the target depth data applicable to the target compression format based on the splitting result may include:
Determining the target quantity of the target blank data sets for filling according to the data quantity carried by each target blank data set and the data quantity corresponding to the splitting result;
and based on the splitting result, performing data filling processing on the data units in the multiple channels in the target blank data set with the target number to obtain target depth data suitable for the target compression format.
The splitting result may specifically include the splitting result of each pixel point in the pixel point group, and the data amount corresponding to the splitting result may refer to the data amount corresponding to the pixel point group. If the pixel point group includes three pixel points, and the 16-bit depth data of each pixel point under a single channel is split into two pieces of 8-bit data, then six pieces of 8-bit data are obtained, and two groups of target blank data sets are needed for filling.
105. And carrying out compression processing on the target depth data based on the target compression format to obtain compressed depth image data.
Alternatively, in the present embodiment, the target compression format may be a WebP compression format or the like. The WebP compression format may convert image data using different techniques, which may include: prediction spatial transformation, color spatial conversion, using palette, multi-pixel packing into one pixel, alpha value replacement, entropy coding, etc.
Among them, the prediction spatial transformation is a technique that reduces entropy based on spatial prediction, taking advantage of the fact that neighboring pixels are often correlated. In the predictor transform, the current pixel value is predicted (in scan-line order) from already decoded pixels, and only the residual value (actual value minus prediction) is encoded.
Where color space conversion refers to that if the image data before compression is in RGB (red green blue) format, the RGB format can be converted into YUV format, Y represents a luminance component, and UV represents a chrominance component. The conversion to YUV format is because human vision is far more sensitive to luminance than chrominance, so the space occupied by the data can be saved by properly reducing the storage of chrominance data, but without greatly affecting the visual effect.
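A common form of such a conversion uses the BT.601 coefficients (the exact matrix a codec applies internally is an implementation detail and may differ); the sketch below shows why chrominance can be stored at reduced precision:

```python
def rgb_to_yuv(r: float, g: float, b: float) -> tuple:
    """BT.601-style RGB -> YUV conversion: Y carries luminance,
    U and V carry the less perceptually sensitive chrominance."""
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luminance
    u = 0.492 * (b - y)                    # blue-difference chrominance
    v = 0.877 * (r - y)                    # red-difference chrominance
    return y, u, v

# Pure gray (R = G = B) has zero chrominance, so only Y is non-trivial
print(rgb_to_yuv(128, 128, 128))  # approximately (128.0, 0.0, 0.0)
```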
Where using a palette means that if there are not many unique pixel values, a color index array can be created and the pixel values replaced by the indices of that array, which may achieve better compression.
The entropy coding is a lossless data compression coding mode. For entropy coding, WebP uses a modified LZ77-Huffman coding to compact sparse values, which employs a two-dimensional encoding technique for distance values.
Optionally, in this embodiment, the step of performing compression processing on the target depth data based on the target compression format to obtain compressed depth image data may include:
acquiring data index information corresponding to the target depth data;
and based on the target compression format, compressing the data index information to obtain compressed depth image data.
Optionally, in this embodiment, the step of "obtaining the data index information corresponding to the target depth data" may include:
acquiring a preset mapping relation set, wherein the preset mapping relation set comprises a mapping relation between preset depth data and preset data index information;
and searching data index information corresponding to the target depth data from the preset mapping relation set.
The preset mapping relation set may be a relation table between preset depth data and preset data index information, and different depth data may correspond to different data index information.
For the depth data corresponding to each byte in the target depth data, the depth data can be replaced by the corresponding data index information, so that the data volume is reduced, and the compression effect is improved.
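A minimal sketch of this index substitution follows, with a hypothetical mapping table (a real preset mapping relation set would be agreed between the terminal and the server; the fallback to the raw value is also a simplifying assumption of this sketch):

```python
# Hypothetical mapping between preset depth byte values and shorter
# preset data index codes; real tables are deployment-specific.
preset_mapping = {0x00: 0, 0xFF: 1, 0x80: 2}

def to_index(byte_value: int) -> int:
    """Replace a depth byte with its data index when a preset mapping
    exists; otherwise pass the raw value through unchanged."""
    return preset_mapping.get(byte_value, byte_value)

codes = [to_index(b) for b in (0x00, 0x80, 0x2A)]
print(codes)  # [0, 2, 42] -- 0x2A has no preset entry, so it passes through
```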
106. And carrying out object recognition on the object to be recognized according to the compressed depth image data to obtain an object recognition result of the object to be recognized.
In a specific scenario, the compressed depth image data and the compressed reference image data may be transmitted to the server, where the compressed reference image data may include a compressed infrared light image and a compressed visible light image, so as to trigger the server to perform in-vivo detection on the object to be identified based on the compressed depth image data and the compressed infrared light image, and perform identity recognition on the object to be identified based on the compressed visible light image.
The compression format of the visible light image and the infrared light image may be different from or the same as the compression format of the depth image data, which is not limited in this embodiment. For example, the visible light image and the infrared light image may be compressed in the JPG (JPEG, Joint Photographic Experts Group) compression format, and the depth image data may be compressed in the WebP lossless compression format.
Optionally, in this embodiment, the step of performing object recognition on the object to be recognized according to the compressed depth image data to obtain an object recognition result of the object to be recognized may include:
Sending an identification request for the object to be identified to a server, wherein the identification request comprises the compressed depth image data;
the server is triggered to decompress the compressed depth image data based on the identification request to obtain decompressed depth data, wherein the decompressed depth data comprise depth data under multiple channels;
carrying out data merging processing on the depth data under multiple channels in the decompressed depth data to obtain depth data of pixel points in the depth image data under a single channel;
and carrying out object recognition on the object to be recognized based on the depth data of the pixel point under a single channel to obtain an object recognition result of the object to be recognized.
The recognition request can indicate to perform object recognition on the object to be recognized, the recognition request can also include a compressed infrared light image and a compressed visible light image, the server can respectively decompress the compressed infrared light image and the compressed visible light image to obtain a decompressed infrared light image and a decompressed visible light image, and the object to be recognized is recognized by combining depth data of pixels in depth image data under a single channel. Object recognition may include, among other things, in vivo detection and identification.
The decompression method used for decompressing the compressed depth image data corresponds to the compression method used in the above embodiment. If the target compression format is adopted to compress the target depth data, the decompression format corresponding to the target compression format is also adopted to decompress the compressed depth image data.
The server can perform living detection on the object to be identified based on the depth data of the pixel points under the single channel so as to judge whether the object to be identified is a photo.
In this embodiment, the step of performing data merging processing on the depth data under multiple channels in the decompressed depth data to obtain depth data of a pixel point in the depth image data under a single channel may include:
deleting the depth data corresponding to the reference pixel points from the decompressed depth data under multiple channels to obtain processed depth data;
and carrying out data merging processing on the depth data under the multi-channel in the processed depth data to obtain the depth data of the pixel point in the depth image data under the single channel.
In the data filling process in the above embodiment, if the number relationship between the total number of pixel points in the depth image data and the preset number does not satisfy the preset multiple relationship condition, the filling process of the reference pixel points needs to be performed on the depth image data, which is equivalent to filling all-0 data sets into the depth image data; therefore, after decompressing to obtain the depth data under multiple channels (e.g., data with three-channel 8-bit precision), the trailing all-0 data in the three-channel 8-bit data can be discarded.
Optionally, in this embodiment, the step of performing data merging processing on the depth data under multiple channels in the decompressed depth data to obtain depth data of a pixel point in the depth image data under a single channel may include:
dividing the depth data under the multiple channels in the decompressed depth data to obtain at least one depth data set, wherein the depth data set comprises the depth data of at least two data units under the multiple channels;
and carrying out data merging processing on the depth data of at least two data units under the multi-channel to obtain the depth data of the pixel points in the depth image data under the single channel.
The depth data of every two data units under the multiple channels of the decompressed depth data can be divided into one depth data set. For example, if the data amount carried by each data unit under the multiple channels of the decompressed depth data is 8 bits, then the data amount contained in one depth data set is 16 bits, and after the depth data of the two data units under the multiple channels in the depth data set are merged, the 16-bit depth data of a pixel point in the depth image data under a single channel can be obtained.
It should be noted that the depth data in the two data units being merged were originally obtained by splitting the depth data of the same pixel point of the depth image data under a single channel.
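The merging step described above, the inverse of the earlier splitting of 16-bit depth data into two 8-bit units, can be illustrated as follows (a NumPy sketch; the high-byte-first order is an assumption, since the embodiment does not fix a byte order):

```python
import numpy as np

def unpack_depth(rgb: np.ndarray, original_pixels: int) -> np.ndarray:
    """Merge pairs of 8-bit bytes from decompressed three-channel data
    back into single-channel 16-bit depth values, then drop the
    trailing reference (padding) pixels added before grouping."""
    flat = rgb.astype(np.uint8).ravel()
    hi = flat[0::2].astype(np.uint16)  # high byte of each depth value
    lo = flat[1::2].astype(np.uint16)  # low byte of each depth value
    depth = (hi << 8) | lo
    return depth[:original_pixels]     # discard padded reference pixels

packed = np.array([[18, 52, 86], [120, 154, 188]], dtype=np.uint8)
recovered = unpack_depth(packed, original_pixels=3)
print([hex(v) for v in recovered])  # ['0x1234', '0x5678', '0x9abc']
```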
As shown in fig. 1d, in this embodiment, in order to enable the single-channel 16-bit depth image data to undergo WebP lossless compression and thus achieve a higher compression rate, the 16-bit depth data of every three pixel points in the depth image data under a single channel can be split to obtain six pieces of 8-bit data, 48 bits in total, and the six pieces of 8-bit data are filled into two sets of three-channel data that likewise total 48 bits, so that the depth image data is converted into a series of three-channel 8-bit data.
After obtaining the compressed depth image data subjected to WebP lossless compression, the background server can decompress the compressed depth image data to obtain decompressed depth data, where the decompressed depth data includes depth data under multiple channels; the three-channel 8-bit data is then converted back through the inverse of the process in fig. 1d, restoring the single-channel 16-bit depth data.
The object recognition method provided by the application can solve the problem that depth image data is difficult to compress through WebP. According to the application, the depth image data in the single-channel 16-bit format can be converted into three-channel 8-bit precision data accepted by the WebP image format through data filling, so that the converted target depth data can be subjected to lossless compression by using the WebP, the compression rate is improved, the network transmission flow consumption and the overall transmission time consumption are reduced, and the face-brushing payment flow is smoother.
As shown in fig. 1e, a page schematic diagram corresponding to the object recognition method of the present application in a face-brushing payment scene: specifically, when the object to be identified starts to brush the face, prompt information of "please face the screen" and an acquired face image (specifically, a visible light image) are displayed on the screen; in addition, the face-brushing payment device may acquire an infrared light image and a depth image. The object recognition method converts the depth image into 8-bit data under three channels, compresses the converted target depth data based on the WebP compression format, and sends the compressed depth image data, the acquired infrared light image and the visible light image to the background for object recognition. When the identity of the object to be identified cannot be clearly determined through object recognition, secondary verification is needed: the object to be identified can be guided to input the last four digits of the bound mobile phone number on the screen, so that the identity information of the object to be identified is determined and the corresponding payment is carried out; after the payment succeeds, the paid amount, discount information and the like can be displayed on the screen.
Specifically, as shown in fig. 1f, the page a, the page b, the page c and the page d are front screen display pages in the face-brushing payment process, the page e, the page f, the page g and the page h are rear screen display pages in the face-brushing payment process, the front screen may be a screen of a terminal corresponding to an object to be identified, the rear screen may be a screen of a terminal corresponding to a cashier, and the rear screen may observe the front screen in real time in the face-brushing payment process so as to guide the front screen user when the operation steps are ambiguous. The object to be identified may click on "pay for face" in page a to initiate a process of pay for face, and the cashier may click on "pay for face" in the back screen (page e) to initiate a process of pay for face. After the face-brushing payment flow starts, a face image aiming at an object to be identified can be acquired based on the page b, after the identity information of the object to be identified is acquired through the acquired face image, a page shown as the page c can be displayed, a user can click a payment confirmation button in the page c to finish payment, a front screen displays a successful payment page (shown as the page d) when the payment is finished, and a rear screen pops up a successful payment message notification (shown as the page h).
In a specific embodiment, as shown in fig. 1g, a flowchart corresponding to face payment based on the object recognition method provided by the present application is specifically described as follows:
1. the face-brushing flow is started, the object to be identified can be guided to cooperate with image acquisition, and selection and living body detection operations are performed on the acquired visible light image, infrared light image and depth image of the object to be identified;
2. the selection and living body detection result is obtained, and the visible light image, infrared light image and depth image corresponding to a successful selection and living body detection are obtained;
3. compressing the visible light image and the infrared light image through JPG; for the depth image, 16bit depth data under a single channel can be filled and converted into three-channel 8bit precision data supported by WebP (target compression format), and then lossless compression is carried out through WebP to obtain compressed depth image data;
4. and packaging the three compressed images (namely the compressed depth image data, the compressed visible light image and the compressed infrared light image) and directly sending the three compressed images to the background for object recognition. After receiving the image, the background needs to decompress the image and then identify the object to obtain an object identification result;
5. and after the back end determines the identity information of the object to be identified, returning a payment code, and ending the flow after the terminal finishes the payment operation.
As can be seen from the above, the present embodiment may collect depth image data for an object to be identified, where the depth image data includes depth data of at least one pixel point under a single channel; grouping the at least one pixel point to obtain at least one pixel point group, wherein each pixel point group comprises a preset number of pixel points; acquiring a target blank data set corresponding to a target compression format, wherein the target blank data set comprises data units under multiple channels; aiming at the pixel points in each pixel point group, carrying out data filling processing on the data units in the multiple channels in the target blank data set based on the depth data of the pixel points in a single channel to obtain target depth data suitable for the target compression format; compressing the target depth data based on the target compression format to obtain compressed depth image data; and carrying out object recognition on the object to be recognized according to the compressed depth image data to obtain an object recognition result of the object to be recognized. The application can convert the depth image data into the target depth data under multiple channels, thereby compressing the target depth data based on the target compression format, improving the compression rate of the depth image data while ensuring the accuracy of the depth image data, being beneficial to improving the transmission efficiency of the depth image data, further improving the object recognition efficiency and reducing the network transmission flow consumption in the object recognition process.
The method according to the previous embodiment will be described in further detail below with the object recognition device being integrated in the terminal.
The embodiment of the application provides an object identification method, as shown in fig. 2, the specific flow of the object identification method can be as follows:
201. the terminal acquires depth image data for an object to be identified, wherein the depth image data comprises depth data of at least one pixel point under a single channel.
The object to be identified may be an object that needs to be identified by a biometric feature, for example, an object that needs to be identified by a human face, or an object that needs to be identified by a palm print, which is not limited in this embodiment. Specifically, the object to be identified may be a real face, or may be a false face, such as a printed paper face photo. Currently, a depth image is generally used to perform living body detection on an object to be identified, and whether the object to be identified is a photo or not is judged.
Wherein the depth image is an image or image channel containing information about the distance from the surface of the captured scene object to the viewpoint. Each pixel point of the depth image represents the vertical distance between the depth camera plane and the plane of the photographed object, usually represented by 16 bits in units of millimeters. In a face payment scenario, depth images are typically used for living body detection and auxiliary identity recognition.
In this embodiment, the depth image may include at least one pixel, where a pixel value of each pixel represents an actual distance from the sensor to the surface of the object, and specifically, the actual distance corresponding to each pixel may be represented by depth data under a single channel, where the depth data may be understood as a distance; the depth image data may include depth data for each pixel in the depth image in a single channel. A single channel is understood to mean that each pixel is represented by only one pixel value. In one example, the pixel value of each pixel point may be represented with a single channel 16bit precision.
202. The terminal performs grouping processing on the at least one pixel point to obtain at least one pixel point group, wherein each pixel point group comprises a preset number of pixel points.
The preset number may be set according to practical situations, which is not limited in this embodiment. For example, the preset number may be set to 3, and each pixel group includes 3 pixels.
In some embodiments, if the total number of pixels included in the depth image is a multiple of the preset number, for example, the preset number is 3, and the total number of pixels included in the depth image is just a multiple of 3, then the pixels in the depth image may be directly subjected to grouping processing, so that each obtained pixel group includes 3 pixels.
Optionally, in this embodiment, the step of "grouping the at least one pixel to obtain at least one pixel group" may include:
when the number relation between the total number of the pixel points in the depth image data and the preset number does not meet the preset multiple relation condition, filling the reference pixel points in the depth image data to obtain the depth image data after the filling;
and based on the preset number, grouping the pixel points in the depth image data after the filling processing to obtain at least one pixel point group.
The pixel value of a reference pixel point may specifically be set to 0; filling in reference pixel points with a pixel value of 0 is equivalent to padding the depth image data with an all-zero data set, so that the number relationship between the total number of pixel points in the depth image data after the filling processing and the preset number satisfies the preset multiple relationship condition.
The preset multiple relation condition may specifically be that the total number of pixel points in the depth image data is an integer multiple of a preset number.
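The padding and grouping steps above can be sketched as follows (a minimal illustration; the function name and fill strategy are assumptions, with 0 as the reference pixel value per this embodiment):

```python
def pad_depth_data(depth_data, group_size=3, fill_value=0):
    """Pad the flattened depth data with reference pixel points (value 0)
    so the total pixel count is an integer multiple of group_size."""
    remainder = len(depth_data) % group_size
    if remainder == 0:
        return list(depth_data)  # multiple relationship condition already met
    n_fill = group_size - remainder  # number of reference pixel points to fill
    return list(depth_data) + [fill_value] * n_fill

# 7 pixel points are padded to 9, yielding three groups of 3.
padded = pad_depth_data([500, 510, 520, 530, 540, 550, 560])
groups = [padded[i:i + 3] for i in range(0, len(padded), 3)]
```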
203. The terminal acquires a target blank data set corresponding to a target compression format, wherein the target blank data set comprises data units under multiple channels.
The target compression format may specifically be the WebP format or the like, which is not limited in this embodiment. WebP is a picture file format supporting both lossy and lossless compression, derived from the image encoding format VP8, which is an open image compression format. WebP has the advantage of a better image data compression algorithm: it yields a smaller picture volume with image quality that is visually indistinguishable from the original. According to tests, a losslessly compressed WebP file has 45% less file volume than the corresponding PNG file.
In a specific embodiment, the target blank data set may include data units under three channels, where the amount of data that each data unit can carry may be set according to the actual situation. For example, each data unit may carry 8 bits, so that the data set format of the target blank data set is three-channel 8-bit precision, and the total amount of data the target blank data set can carry is 24 bits. The target blank data set may be understood as the data set corresponding to any one pixel point in a blank image, where the blank image includes image data of at least one pixel point under three channels; three channels may be understood as one pixel point being represented by three pixel values, each 8 bits in size.
204. And the terminal performs data filling processing on the data units in the multiple channels in the target blank data set based on the depth data of the pixel points in each pixel point group under a single channel to obtain target depth data suitable for the target compression format.
Specifically, each pixel point group includes a preset number of pixel points; the depth data of these pixel points under a single channel can be filled into the data units under multiple channels in a target blank data set, and the data in the target blank data set after the filling processing is determined as the target depth data, where the target depth data is depth data under multiple channels.
Optionally, in this embodiment, the step of performing, for each pixel in each pixel group, data filling processing on the data unit in the multiple channels in the target blank data set based on the depth data of the pixel in the single channel to obtain target depth data applicable to the target compression format may include:
determining a first data volume which can be borne by a data unit in a multichannel in the target blank data set;
for the pixel points in each pixel point group, splitting the depth data of the pixel points under a single channel based on the first data amount and the second data amount of the depth data of the pixel points under the single channel;
And based on the splitting result, carrying out data filling processing on the data units under the multiple channels to obtain target depth data suitable for the target compression format.
In a specific embodiment, if the target blank data set includes data units under three channels, the first data amount that each data unit can carry is 8 bits, and the second data amount corresponding to the depth data of a pixel point in the depth image under the single channel is 16 bits, then the 16-bit single-channel depth data of the pixel point can be split into two 8-bit pieces, which are filled into two of the data units under the multiple channels respectively.
If each pixel point group includes 3 pixel points, the data amount corresponding to each pixel point group is 16 bits × 3 = 48 bits, while the total data amount that one target blank data set can carry is 3 × 8 bits = 24 bits. The data amount corresponding to each pixel point group therefore needs to be carried by two target blank data sets; that is, 3 single-channel 16-bit pixel points are converted into two three-channel 8-bit pixel points in the blank image.
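The splitting and filling described above can be sketched as follows (a hedged illustration; the function name and byte order, high byte first, are assumptions not fixed by this embodiment):

```python
def pack_group(group):
    """Split each 16-bit single-channel depth value of a 3-pixel group into
    two 8-bit halves and fill them into the data units of two three-channel
    (8 bits per channel) blank data sets."""
    assert len(group) == 3
    data_units = []
    for value in group:
        data_units.append((value >> 8) & 0xFF)  # upper 8 bits
        data_units.append(value & 0xFF)         # lower 8 bits
    # 3 x 16 bits = 48 bits -> two 24-bit (three-channel 8-bit) data sets
    return [data_units[0:3], data_units[3:6]]

packed = pack_group([0x1234, 0xABCD, 0x00FF])
# packed[0] == [0x12, 0x34, 0xAB] and packed[1] == [0xCD, 0x00, 0xFF]
```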
205. And the terminal compresses the target depth data based on the target compression format to obtain compressed depth image data.
Alternatively, in the present embodiment, the target compression format may be the WebP compression format or the like. The WebP compression format may convert image data using different techniques, which may include: predictive spatial transformation, color space conversion, use of a palette, packing multiple pixels into one pixel, alpha value replacement, entropy coding, and so on.
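WebP encoding itself requires an image codec library; as a stand-in, the sketch below uses zlib (not WebP) purely to illustrate the property this embodiment relies on — the packed multi-channel bytes form an ordinary byte stream that a lossless codec can shrink for transmission and restore exactly:

```python
import zlib

# Packed three-channel 8-bit bytes from the filling step (illustrative data).
packed_bytes = bytes([0x12, 0x34, 0xAB, 0xCD, 0x00, 0xFF] * 100)

compressed = zlib.compress(packed_bytes, level=9)  # lossless compression
restored = zlib.decompress(compressed)

assert restored == packed_bytes             # lossless: exact round trip
assert len(compressed) < len(packed_bytes)  # smaller transmission volume
```

In the embodiment itself the packed bytes would be handed to a WebP encoder in lossless mode instead, so the depth precision survives the compression unchanged.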
206. And the terminal transmits the compressed depth image data to a server so as to perform object recognition on the object to be recognized based on the compressed depth image data through the server, and an object recognition result of the object to be recognized is obtained.
In a specific scenario, the compressed depth image data and compressed reference image data may be transmitted to the server, where the compressed reference image data may include a compressed infrared light image and a compressed visible light image, so as to trigger the server to perform liveness detection on the object to be identified based on the compressed depth image data and the compressed infrared light image, and to perform identity recognition on the object to be identified based on the compressed visible light image.
The compression format of the visible light image and the infrared light image may be different from or the same as the compression format of the depth image data, which is not limited in this embodiment. For example, the visible light image and the infrared light image may be compressed in the JPEG (Joint Photographic Experts Group) compression format, while the depth image data is compressed in the WebP lossless compression format.
Optionally, in this embodiment, the step of transmitting, by the terminal, the compressed depth image data to the server, so as to perform object recognition on the object to be recognized based on the compressed depth image data by the server, to obtain an object recognition result of the object to be recognized may include:
sending an identification request for the object to be identified to a server, wherein the identification request comprises the compressed depth image data;
the server is triggered to decompress the compressed depth image data based on the identification request to obtain decompressed depth data, wherein the decompressed depth data comprise depth data under multiple channels;
carrying out data merging processing on the depth data under multiple channels in the decompressed depth data to obtain depth data of pixel points in the depth image data under a single channel;
and carrying out object recognition on the object to be recognized based on the depth data of the pixel point under a single channel to obtain an object recognition result of the object to be recognized.
The decompression method used for decompressing the compressed depth image data corresponds to the compression method used in the above embodiment. If the target compression format is adopted to compress the target depth data, the decompression format corresponding to the target compression format is likewise adopted to decompress the compressed depth image data.
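The server-side merging step — the inverse of the earlier splitting — can be sketched as follows (a hedged illustration; the function name and high-byte-first order are assumptions matching the packing sketch, and padding pixels are dropped using the original pixel count):

```python
def merge_channels(data_units, original_pixel_count):
    """Recombine consecutive pairs of 8-bit data units into single-channel
    16-bit depth data, then drop any reference (padding) pixel points
    that were appended during grouping."""
    flat = [b for unit in data_units for b in unit]
    merged = [(flat[i] << 8) | flat[i + 1] for i in range(0, len(flat), 2)]
    return merged[:original_pixel_count]

# Two three-channel 8-bit "pixels" merge back into three 16-bit depth values.
depth = merge_channels([[0x12, 0x34, 0xAB], [0xCD, 0x00, 0xFF]], 3)
# depth == [0x1234, 0xABCD, 0x00FF]
```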
As can be seen from the above, in this embodiment, depth image data for an object to be identified may be acquired by a terminal, where the depth image data includes depth data of at least one pixel point under a single channel; grouping the at least one pixel point to obtain at least one pixel point group, wherein each pixel point group comprises a preset number of pixel points; acquiring a target blank data set corresponding to a target compression format, wherein the target blank data set comprises data units under multiple channels; aiming at the pixel points in each pixel point group, carrying out data filling processing on the data units in the multiple channels in the target blank data set based on the depth data of the pixel points in a single channel to obtain target depth data suitable for the target compression format; compressing the target depth data based on the target compression format to obtain compressed depth image data; and the terminal transmits the compressed depth image data to a server so as to perform object recognition on the object to be recognized based on the compressed depth image data through the server, and an object recognition result of the object to be recognized is obtained. The application can convert the depth image data into the target depth data under multiple channels, thereby compressing the target depth data based on the target compression format, improving the compression rate of the depth image data while ensuring the accuracy of the depth image data, being beneficial to improving the transmission efficiency of the depth image data, further improving the object recognition efficiency and reducing the network transmission flow consumption in the object recognition process.
In order to better implement the above method, an embodiment of the present application further provides an object recognition device. As shown in fig. 3, the object recognition device may include a collection unit 301, a grouping unit 302, an acquisition unit 303, a filling unit 304, a compression unit 305, and a recognition unit 306, as follows:
(1) A collection unit 301;
and the collection unit is used for collecting depth image data for the object to be identified, wherein the depth image data includes depth data of at least one pixel point under a single channel.
(2) A grouping unit 302;
the grouping unit is used for grouping the at least one pixel point to obtain at least one pixel point group, and each pixel point group comprises a preset number of pixel points.
Alternatively, in some embodiments of the present application, the grouping unit may include a pixel filling subunit and a grouping subunit, as follows:
the pixel filling subunit is configured to perform filling processing of reference pixel points on the depth image data when a number relationship between the total number of pixel points in the depth image data and the preset number does not meet a preset multiple relationship condition, so as to obtain depth image data after the filling processing;
And the grouping subunit is used for grouping the pixel points in the depth image data after the filling processing based on the preset quantity to obtain at least one pixel point group.
Optionally, in some embodiments of the present application, the pixel filling subunit may be specifically configured to divide the total number of pixel points in the depth image data by the preset number to obtain a remainder of the operation; determine the number of reference pixel points to be filled according to the remainder; and, based on the number, fill the reference pixel points into the depth image data to obtain the depth image data after the filling processing.
(3) An acquisition unit 303;
the acquisition unit is used for acquiring a target blank data set corresponding to the target compression format, wherein the target blank data set comprises data units under multiple channels.
(4) A filling unit 304;
and the filling unit is used for performing, for the pixel points in each pixel point group, data filling processing on the data units under the multiple channels in the target blank data set based on the depth data of the pixel points under a single channel, to obtain target depth data applicable to the target compression format.
Alternatively, in some embodiments of the present application, the filling unit may include a determining subunit, a splitting subunit, and a filling subunit, as follows:
the determining subunit is configured to determine a first data amount that can be carried by a data unit in the multiple channels in the target blank data set;
the splitting subunit is used for splitting, for the pixel points in each pixel point group, the depth data of the pixel points under a single channel based on the first data amount and a second data amount of the depth data of the pixel points under the single channel;
and the filling subunit is used for carrying out data filling processing on the data units under the multiple channels based on the splitting result to obtain target depth data applicable to the target compression format.
(5) A compression unit 305;
and the compression unit is used for compressing the target depth data based on the target compression format to obtain compressed depth image data.
Alternatively, in some embodiments of the present application, the compression unit may include an acquisition subunit and a compression subunit, as follows:
the acquisition subunit is used for acquiring data index information corresponding to the target depth data;
And the compression subunit is used for compressing the data index information based on the target compression format to obtain compressed depth image data.
Optionally, in some embodiments of the present application, the obtaining subunit may specifically be configured to obtain a preset mapping relationship set, where the preset mapping relationship set includes a mapping relationship between preset depth data and preset data index information; and searching data index information corresponding to the target depth data from the preset mapping relation set.
(6) An identification unit 306;
and the identification unit is used for carrying out object identification on the object to be identified according to the compressed depth image data to obtain an object identification result of the object to be identified.
Alternatively, in some embodiments of the present application, the identification unit may include a transmitting subunit, a decompressing subunit, a merging subunit, and an identification subunit, as follows:
the sending subunit is configured to send an identification request for the object to be identified to a server, where the identification request includes the compressed depth image data;
the decompression subunit is used for performing decompression processing on the compressed depth image data based on the identification request by triggering the server to obtain decompressed depth data, wherein the decompressed depth data comprises depth data under multiple channels;
The merging subunit is used for carrying out data merging processing on the depth data under the multi-channel in the decompressed depth data to obtain the depth data of the pixel point in the depth image data under the single channel;
and the identification subunit is used for carrying out object identification on the object to be identified based on the depth data of the pixel point under a single channel to obtain an object identification result of the object to be identified.
Optionally, in some embodiments of the present application, the merging subunit may be specifically configured to divide depth data under multiple channels in the decompressed depth data to obtain at least one depth data set, where the depth data set includes depth data of at least two data units under multiple channels; and carrying out data merging processing on the depth data of at least two data units under the multi-channel to obtain the depth data of the pixel points in the depth image data under the single channel.
As can be seen from the above, in this embodiment, the collection unit 301 may collect depth image data for an object to be identified, where the depth image data includes depth data of at least one pixel point under a single channel; the grouping unit 302 groups the at least one pixel point to obtain at least one pixel point group, where each pixel point group includes a preset number of pixel points; the acquisition unit 303 acquires a target blank data set corresponding to a target compression format, where the target blank data set includes data units under multiple channels; for the pixel points in each pixel point group, the filling unit 304 performs data filling processing on the data units under the multiple channels in the target blank data set based on the depth data of the pixel points under a single channel, to obtain target depth data applicable to the target compression format; the compression unit 305 compresses the target depth data based on the target compression format to obtain compressed depth image data; and the recognition unit 306 performs object recognition on the object to be recognized according to the compressed depth image data to obtain an object recognition result of the object to be recognized. The application can convert the depth image data into target depth data under multiple channels and thereby compress the target depth data based on the target compression format, improving the compression rate of the depth image data while ensuring its accuracy, which is beneficial to improving the transmission efficiency of the depth image data, further improving object recognition efficiency and reducing network transmission traffic consumption in the object recognition process.
The embodiment of the application also provides an electronic device, as shown in fig. 4, which shows a schematic structural diagram of the electronic device according to the embodiment of the application, where the electronic device may be a terminal or a server, specifically:
The electronic device may include a processor 401 having one or more processing cores, a memory 402 comprising one or more computer-readable storage media, a power supply 403, an input unit 404, and other components. Those skilled in the art will appreciate that the electronic device structure shown in fig. 4 does not limit the electronic device, which may include more or fewer components than shown, combine certain components, or adopt a different arrangement of components. Wherein:
the processor 401 is a control center of the electronic device, connects various parts of the entire electronic device using various interfaces and lines, and performs various functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory 402, and calling data stored in the memory 402. Optionally, processor 401 may include one or more processing cores; preferably, the processor 401 may integrate an application processor and a modem processor, wherein the application processor mainly processes an operating system, a user interface, an application program, etc., and the modem processor mainly processes wireless communication. It will be appreciated that the modem processor described above may not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and performs data processing by running the software programs and modules stored in the memory 402. The memory 402 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to the use of the electronic device, etc. In addition, the memory 402 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 with access to the memory 402.
The electronic device further comprises a power supply 403 for supplying power to the various components, preferably the power supply 403 may be logically connected to the processor 401 by a power management system, so that functions of managing charging, discharging, and power consumption are performed by the power management system. The power supply 403 may also include one or more of any of a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
The electronic device may further comprise an input unit 404, which input unit 404 may be used for receiving input digital or character information and generating keyboard, mouse, joystick, optical or trackball signal inputs in connection with user settings and function control.
Although not shown, the electronic device may further include a display unit or the like, which is not described herein. In particular, in this embodiment, the processor 401 in the electronic device loads executable files corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and the processor 401 executes the application programs stored in the memory 402, so as to implement various functions as follows:
collecting depth image data aiming at an object to be identified, wherein the depth image data comprises depth data of at least one pixel point under a single channel; grouping the at least one pixel point to obtain at least one pixel point group, wherein each pixel point group comprises a preset number of pixel points; acquiring a target blank data set corresponding to a target compression format, wherein the target blank data set comprises data units under multiple channels; aiming at the pixel points in each pixel point group, carrying out data filling processing on the data units in the multiple channels in the target blank data set based on the depth data of the pixel points in a single channel to obtain target depth data suitable for the target compression format; compressing the target depth data based on the target compression format to obtain compressed depth image data; and carrying out object recognition on the object to be recognized according to the compressed depth image data to obtain an object recognition result of the object to be recognized.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
As can be seen from the above, the present embodiment may collect depth image data for an object to be identified, where the depth image data includes depth data of at least one pixel point under a single channel; grouping the at least one pixel point to obtain at least one pixel point group, wherein each pixel point group comprises a preset number of pixel points; acquiring a target blank data set corresponding to a target compression format, wherein the target blank data set comprises data units under multiple channels; aiming at the pixel points in each pixel point group, carrying out data filling processing on the data units in the multiple channels in the target blank data set based on the depth data of the pixel points in a single channel to obtain target depth data suitable for the target compression format; compressing the target depth data based on the target compression format to obtain compressed depth image data; and carrying out object recognition on the object to be recognized according to the compressed depth image data to obtain an object recognition result of the object to be recognized. The application can convert the depth image data into the target depth data under multiple channels, thereby compressing the target depth data based on the target compression format, improving the compression rate of the depth image data while ensuring the accuracy of the depth image data, being beneficial to improving the transmission efficiency of the depth image data, further improving the object recognition efficiency and reducing the network transmission flow consumption in the object recognition process.
Those of ordinary skill in the art will appreciate that all or a portion of the steps of the various methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware, which may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, embodiments of the present application provide a computer readable storage medium having stored therein a plurality of instructions capable of being loaded by a processor to perform the steps of any of the object recognition methods provided by the embodiments of the present application. For example, the instructions may perform the steps of:
collecting depth image data aiming at an object to be identified, wherein the depth image data comprises depth data of at least one pixel point under a single channel; grouping the at least one pixel point to obtain at least one pixel point group, wherein each pixel point group comprises a preset number of pixel points; acquiring a target blank data set corresponding to a target compression format, wherein the target blank data set comprises data units under multiple channels; aiming at the pixel points in each pixel point group, carrying out data filling processing on the data units in the multiple channels in the target blank data set based on the depth data of the pixel points in a single channel to obtain target depth data suitable for the target compression format; compressing the target depth data based on the target compression format to obtain compressed depth image data; and carrying out object recognition on the object to be recognized according to the compressed depth image data to obtain an object recognition result of the object to be recognized.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
The computer-readable storage medium may include: Read-Only Memory (ROM), Random Access Memory (RAM), a magnetic disk or optical disk, and the like.
Because the instructions stored in the computer readable storage medium may execute the steps in any one of the object recognition methods provided in the embodiments of the present application, the beneficial effects that any one of the object recognition methods provided in the embodiments of the present application can be achieved, which are detailed in the previous embodiments and are not described herein.
According to one aspect of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions are read from a computer-readable storage medium by a processor of a computer device, and executed by the processor, cause the computer device to perform the methods provided in various alternative implementations of the object recognition aspects described above.
The foregoing has described in detail an object recognition method and related devices according to embodiments of the present application. Specific examples have been applied herein to illustrate the principles and implementations of the present application, and the above description of the embodiments is only intended to aid in understanding the method and core ideas of the present application. Meanwhile, those skilled in the art may make variations in the specific implementations and application scope in light of the ideas of the present application. In summary, the contents of this description should not be construed as limiting the present application.

Claims (12)

1. An object recognition method, comprising:
collecting depth image data aiming at an object to be identified, wherein the depth image data comprises depth data of at least one pixel point under a single channel;
grouping the at least one pixel point to obtain at least one pixel point group, wherein each pixel point group comprises a preset number of pixel points;
acquiring a target blank data set corresponding to a target compression format, wherein the target blank data set comprises data units under multiple channels;
aiming at the pixel points in each pixel point group, carrying out data filling processing on the data units in the multiple channels in the target blank data set based on the depth data of the pixel points in a single channel to obtain target depth data suitable for the target compression format;
compressing the target depth data based on the target compression format to obtain compressed depth image data;
and carrying out object recognition on the object to be recognized according to the compressed depth image data to obtain an object recognition result of the object to be recognized.
2. The method of claim 1, wherein grouping the at least one pixel to obtain at least one pixel group comprises:
When the number relation between the total number of the pixel points in the depth image data and the preset number does not meet the preset multiple relation condition, filling the reference pixel points in the depth image data to obtain the depth image data after the filling;
and based on the preset number, grouping the pixel points in the depth image data after the filling processing to obtain at least one pixel point group.
3. The method according to claim 2, wherein the filling processing of the reference pixel points for the depth image data to obtain the depth image data after the filling processing includes:
dividing the total number of pixel points in the depth image data by the preset number to obtain a remainder of the operation;
determining the number of reference pixel points to be filled according to the remainder;
and based on the number, filling the reference pixel points into the depth image data to obtain the depth image data after filling.
4. The method according to claim 1, wherein the performing data filling processing on the data units in the multiple channels in the target blank data set based on the depth data of the pixels in the single channel for each pixel group to obtain target depth data applicable to the target compression format includes:
Determining a first data volume which can be borne by a data unit in a multichannel in the target blank data set;
for the pixel points in each pixel point group, splitting the depth data of the pixel points under a single channel based on the first data amount and the second data amount of the depth data of the pixel points under the single channel;
and based on the splitting result, carrying out data filling processing on the data units under the multiple channels to obtain target depth data suitable for the target compression format.
5. The method of claim 1, wherein the compressing the target depth data based on the target compression format to obtain compressed depth image data comprises:
acquiring data index information corresponding to the target depth data;
and based on the target compression format, compressing the data index information to obtain compressed depth image data.
6. The method of claim 5, wherein the obtaining the data index information corresponding to the target depth data comprises:
acquiring a preset mapping relation set, wherein the preset mapping relation set comprises mapping relations between preset depth data and preset data index information;
and searching the preset mapping relation set for the data index information corresponding to the target depth data.
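A minimal sketch of the lookup in claim 6, with the mapping relation set modeled as a dictionary. The table contents are invented for illustration; the patent does not specify them.

```python
# Hypothetical preset mapping relation set: depth value -> index code.
preset_mapping = {0: 0, 500: 1, 1000: 2, 1500: 3}

def lookup_index_info(target_depth_data, mapping):
    """Replace each depth value with its preset data index information."""
    return [mapping[value] for value in target_depth_data]

codes = lookup_index_info([500, 500, 1500, 0], preset_mapping)
# codes == [1, 1, 3, 0]; the index stream is what gets compressed
```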
7. The method according to claim 1, wherein the performing object recognition on the object to be recognized according to the compressed depth image data to obtain an object recognition result of the object to be recognized comprises:
sending an identification request for the object to be recognized to a server, wherein the identification request comprises the compressed depth image data;
triggering, based on the identification request, the server to decompress the compressed depth image data to obtain decompressed depth data, wherein the decompressed depth data comprises depth data under multiple channels;
performing data merging processing on the depth data under the multiple channels in the decompressed depth data to obtain depth data of the pixel points in the depth image data under the single channel;
and performing object recognition on the object to be recognized based on the depth data of the pixel points under the single channel to obtain the object recognition result of the object to be recognized.
8. The method of claim 7, wherein the performing data merging processing on the depth data under multiple channels in the decompressed depth data to obtain depth data of pixels in the depth image data under a single channel includes:
dividing the depth data under the multiple channels in the decompressed depth data to obtain at least one depth data set, wherein the depth data set comprises depth data of at least two data units under the multiple channels;
and performing data merging processing on the depth data of the at least two data units under the multiple channels to obtain the depth data of the pixel points in the depth image data under the single channel.
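The merging step is the inverse of the channel split. As a hedged sketch, assume 8-bit data units recombined into a 16-bit single-channel depth value; the bit widths and function name are assumptions, not fixed by the claims.

```python
def merge_depth_units(units, unit_bits=8):
    """Recombine multi-channel data units (high unit first) into one
    single-channel depth value, reversing the encoding-side split."""
    value = 0
    for unit in units:
        value = (value << unit_bits) | unit
    return value

# Two 8-bit units recovered from decompression yield the 16-bit depth:
assert merge_depth_units([0x12, 0x34]) == 0x1234
```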
9. An object recognition apparatus, comprising:
an acquisition unit, configured to acquire depth image data of an object to be recognized, wherein the depth image data comprises depth data of at least one pixel point under a single channel;
a grouping unit, configured to group the at least one pixel point to obtain at least one pixel point group, wherein each pixel point group comprises a preset number of pixel points;
the acquisition unit being further configured to acquire a target blank data set corresponding to a target compression format, wherein the target blank data set comprises data units under multiple channels;
a filling unit, configured to perform, for each pixel point group, data filling processing on the data units under the multiple channels in the target blank data set based on the depth data of the pixel points under the single channel, to obtain target depth data applicable to the target compression format;
a compression unit, configured to compress the target depth data based on the target compression format to obtain compressed depth image data;
and an identification unit, configured to perform object recognition on the object to be recognized according to the compressed depth image data to obtain an object recognition result of the object to be recognized.
10. An electronic device comprising a memory and a processor; the memory stores an application program, and the processor is configured to execute the application program in the memory to perform the operations in the object recognition method according to any one of claims 1 to 8.
11. A computer readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps in the object recognition method of any one of claims 1 to 8.
12. A computer program product comprising a computer program or instructions which, when executed by a processor, implement the steps in the object recognition method according to any one of claims 1 to 8.
CN202210375889.6A 2022-04-11 2022-04-11 Object recognition method and related equipment Pending CN116939206A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210375889.6A CN116939206A (en) 2022-04-11 2022-04-11 Object recognition method and related equipment

Publications (1)

Publication Number Publication Date
CN116939206A true CN116939206A (en) 2023-10-24

Family

ID=88388434


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination