CN112949427A - Person identification method, electronic device, storage medium, and apparatus - Google Patents

Person identification method, electronic device, storage medium, and apparatus

Info

Publication number
CN112949427A
CN112949427A
Authority
CN
China
Prior art keywords
face
video frame
model
detection model
face detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110178851.5A
Other languages
Chinese (zh)
Inventor
李洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202110178851.5A
Publication of CN112949427A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Abstract

The present application relates to a person identification method, an electronic device, a storage medium, and an apparatus. The method includes: acquiring a video frame at the current moment, where the current moment is the moment at which an operation event related to a video is detected; acquiring a face image from the video frame; and sending the face image to a server and receiving person information corresponding to the face image returned by the server. Because the detection stage for the video frame is completed on the client side and the recognition stage on the server, the load on the server is relieved and bandwidth costs are reduced.

Description

Person identification method, electronic device, storage medium, and apparatus
Technical Field
The present application relates to the field of computer technologies, and in particular, to a person identification method, an electronic device, a storage medium, and an apparatus.
Background
At present, when an AI algorithm is used to identify the persons appearing in video content, the whole process is implemented on the server: the server captures the video frame image to be identified and uses the AI algorithm to detect and recognize it, thereby obtaining the persons contained in that frame.
However, transmitting a large number of video frame images to the server over the network for detection and recognition not only increases bandwidth costs but also puts pressure on the server.
Disclosure of Invention
The present application provides a person identification method, an electronic device, a storage medium, and an apparatus, which relieve the load on the server and reduce bandwidth costs.
In a first aspect, a person identification method is provided, where the method includes:
acquiring a video frame at the current moment, where the current moment is the moment at which an operation event related to a video is detected;
acquiring a face image from the video frame;
and sending the face image to a server, and receiving person information corresponding to the face image returned by the server.
Optionally, acquiring the face image from the video frame includes:
using a pre-trained face detection model to infer the target region of the face in the video frame;
and cropping the face image corresponding to the target region from the video frame.
Optionally, inferring the target region of the face in the video frame with the pre-trained face detection model includes:
creating an inference request corresponding to the face detection model;
and running the inference request on the video frame to obtain the target region.
Optionally, before the target region of the face in the video frame is inferred with the pre-trained face detection model, the method further includes:
loading the face detection model.
Optionally, loading the face detection model includes:
obtaining an intermediate file corresponding to the face detection model;
and loading the intermediate file, thereby loading the face detection model.
Optionally, the intermediate file includes a first file and a second file, the first file is used to describe a network topology of the face detection model, and the second file includes model parameters and model variables of the face detection model.
Optionally, loading the intermediate file includes:
obtaining the model parameters and model variables from the second file;
deriving the weight parameters of the face detection model from the model parameters and model variables;
and loading the first file and the weight parameters, thereby loading the intermediate file.
Optionally, loading the face detection model includes:
obtaining a model file corresponding to the face detection model;
loading the model file through a neural network processor to obtain a task model;
and configuring model memory for the task model, thereby loading the face detection model.
In a second aspect, an electronic device is provided, which includes a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory for storing a computer program;
the processor is configured to execute the program stored in the memory to implement the method of the first aspect.
In a third aspect, a computer-readable storage medium is provided, which stores a computer program, wherein the computer program, when executed by a processor, implements the method of the first aspect.
In a fourth aspect, a person identification apparatus is provided, the apparatus including:
a first acquisition unit, configured to acquire a video frame at the current moment, where the current moment is the moment at which an operation event related to a video is detected;
a second acquisition unit, configured to acquire a face image from the video frame;
and a receiving unit, configured to send the face image to a server and receive person information corresponding to the face image returned by the server.
Compared with the prior art, the technical solutions provided by the embodiments of the present application have the following advantages:
According to the technical solutions provided by the embodiments of the present application, when persons in a video frame are to be identified, the client acquires the video frame at the current moment together with the face image in that frame, and the server recognizes the face image to obtain the person information corresponding to it. That is, the detection stage for the video frame is completed on the client side and the recognition stage on the server side, which relieves the load on the server and reduces bandwidth costs.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a schematic flowchart of a person identification method in an embodiment of the present application;
FIG. 2 is a schematic flowchart of another person identification method in an embodiment of the present application;
FIG. 3 is a schematic flowchart of another person identification method in an embodiment of the present application;
FIG. 4 is a schematic flowchart of another person identification method in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a person identification apparatus in an embodiment of the present application;
FIG. 6 is a schematic structural diagram of another person identification apparatus in an embodiment of the present application;
FIG. 7 is a schematic diagram of a person identification system in an embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
An embodiment of the present application provides a person identification method that can be applied to any electronic device on the client side; the electronic device may be at least one of a mobile phone, a tablet computer, and a notebook computer.
as shown in fig. 1, the method may include the steps of:
step 101, obtaining a video frame at the current moment.
The current moment is the moment when the operation event related to the video is monitored.
In one example, the operation event may be a drag event on the video.
In this case, the current moment is the moment at which the drag event ends.
In practical applications, when a user watching a video on the electronic device drags the video's progress bar, the electronic device detects the drag event; when the drag gesture ends, the electronic device determines the end moment of the drag event and acquires the video frame at that moment.
In another example, the operation event may be a click event on the video.
In practical applications, when a user watching a video clicks it on the display interface of the electronic device, the electronic device detects the click event and determines the corresponding current moment, so as to acquire the video frame at that moment.
In another example, the operation event may be a click event on an AI (Artificial Intelligence) recognition button, a virtual button displayed on the interface that plays the video.
Optionally, since the pixel size of a raw video frame is large, in order to reduce the amount of computation and save network bandwidth, this embodiment may perform format conversion and compression on the video frame after acquiring it at the current moment.
Specifically, the video frame may be converted from YUV format to RGB format, and the RGB frame then compressed to obtain a video frame in JPEG format.
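For illustration only, the following is a minimal sketch of this conversion-and-compression step. The patent prescribes neither a language nor a library; Python with OpenCV is assumed here, along with an I420 planar layout for the YUV input.

```python
import cv2
import numpy as np

def compress_frame(yuv_i420: np.ndarray, quality: int = 80) -> bytes:
    """Convert a planar I420 (YUV 4:2:0) frame to RGB order, then
    JPEG-compress it to cut the payload sent over the network.

    `yuv_i420` is expected as a (height * 3 // 2, width) uint8 array,
    the layout OpenCV uses for planar 4:2:0 data.
    """
    # OpenCV's JPEG encoder expects BGR channel order.
    bgr = cv2.cvtColor(yuv_i420, cv2.COLOR_YUV2BGR_I420)
    ok, jpeg = cv2.imencode(".jpg", bgr, [cv2.IMWRITE_JPEG_QUALITY, quality])
    if not ok:
        raise RuntimeError("JPEG encoding failed")
    return jpeg.tobytes()
```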
Step 102, acquiring the face image from the video frame.
Optionally, in this embodiment, a pre-trained face detection model may be used to infer the target region of the face in the video frame, and the face image corresponding to the target region is then cropped from the video frame.
Optionally, in this embodiment, in order to speed up the face detection model, the model may be loaded onto a device such as a graphics processing unit (GPU). In a specific implementation, an inference engine is created on the GPU, and the face detection model is loaded onto the GPU device by the inference engine.
Optionally, an inference request corresponding to the face detection model may be created through the inference engine; the inference request is run on the video frame to obtain the target region of the face, and the target region is then used to crop the face image from the video frame.
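For illustration only, a sketch of this loading-and-inference flow follows. The patent names no toolkit, but the XML/BIN intermediate files and GPU inference engine described below match the 2021-era OpenVINO Inference Engine API, which is assumed here together with the model file names.

```python
import numpy as np
from openvino.inference_engine import IECore  # 2021-era Inference Engine API

ie = IECore()                                               # inference engine
net = ie.read_network(model="face_det.xml", weights="face_det.bin")
exec_net = ie.load_network(network=net, device_name="GPU",  # load onto the GPU
                           num_requests=1)                  # one inference request

input_name = next(iter(net.input_info))
output_name = next(iter(net.outputs))

def infer_target_region(frame_nchw: np.ndarray) -> np.ndarray:
    """Run the inference request on a preprocessed NCHW frame and return
    the raw detection tensor (candidate face boxes and confidences)."""
    request = exec_net.requests[0]
    request.infer({input_name: frame_nchw})
    return request.output_blobs[output_name].buffer
```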
For example, the face detection model may use a sliding window to detect the region of the face in the video frame, yielding a target region that contains the face.
The size of the sliding window is smaller than the pixel size of the video frame.
Optionally, the sliding window may be a rectangular window, so the resulting target region may also be a rectangular region.
In a specific implementation, the sub-picture covered by the sliding window after each move is taken from the frame and recognized by the neural network model; if the sub-picture is recognized as part of the face, it is retained, and this continues until the whole face in the frame has been identified.
Optionally, when the face image is cropped, the coordinates of the target region may be obtained, and the face image corresponding to those coordinates is cropped from the video frame.
In this embodiment, since the target region covers multiple sub-pictures, its coordinates can be determined from the pixel coordinates of those sub-pictures.
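For illustration only, a sketch of the sliding-window scan follows; the window size, stride, and threshold are assumptions, and `classify` stands in for the patent's neural-network classifier rather than any real library call.

```python
import numpy as np
from typing import Callable, Optional, Tuple

Box = Tuple[int, int, int, int]

def sliding_window_detect(image: np.ndarray,
                          classify: Callable[[np.ndarray], float],
                          win: int = 64, stride: int = 16,
                          threshold: float = 0.5) -> Optional[Box]:
    """Slide a win x win rectangular window over the frame, score each
    sub-picture with the classifier, and keep the hits; the union of the
    retained sub-pictures gives the target region's coordinates."""
    hits = []
    h, w = image.shape[:2]
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            if classify(image[y:y + win, x:x + win]) > threshold:
                hits.append((x, y, x + win, y + win))
    if not hits:
        return None
    xs0, ys0, xs1, ys1 = zip(*hits)
    return min(xs0), min(ys0), max(xs1), max(ys1)

def crop_target_region(image: np.ndarray, box: Box) -> np.ndarray:
    """Crop the face image corresponding to the target region."""
    x0, y0, x1, y1 = box
    return image[y0:y1, x0:x1]
```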
Illustratively, the face image can be obtained by calling the Windows GDI+ (Gdiplus) graphics library to capture the picture at the target region's coordinates in the video frame.
Optionally, in this embodiment, the video frame acquired at the current moment may be stored locally; therefore, before the face detection model runs inference on the video frame, the frame's binary picture data can be read from its local storage path.
Illustratively, the binary picture data of the video frame may be read through a file-format reader library in the electronic device based on the local storage path, and the binary data is then fed to the face detection model.
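For illustration only, a minimal sketch of reading the locally stored frame back as binary data; plain file I/O and OpenCV decoding are assumed in place of the file-format reader library, which the patent does not specify further.

```python
import cv2
import numpy as np

def load_frame(path: str) -> np.ndarray:
    """Read the locally stored frame as raw binary picture data, then
    decode it into a pixel array the detection model can consume."""
    with open(path, "rb") as f:
        raw = np.frombuffer(f.read(), dtype=np.uint8)
    frame = cv2.imdecode(raw, cv2.IMREAD_COLOR)
    if frame is None:
        raise ValueError(f"could not decode picture at {path}")
    return frame
```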
Optionally, depending on the type of device, this embodiment provides the following two implementations of loading the face detection model onto the device:
first, when the device is a Graphics Processing Unit (GPU):
optionally, the intermediate files of the face detection model may include a first file and a second file.
The first file is used for describing a network topology structure of the face detection model, and the second file comprises model parameters and model variables of the face detection model.
For example, the first file may be an XML file; that is, after the network topology of the face detection model is obtained, it may be stored in an XML file to produce the first file.
Illustratively, the file format of the second file may be a BIN file.
Based on the above composition of the intermediate file, loading the intermediate file may, as shown in fig. 2, include the following steps:
Step 201, obtaining the model parameters and model variables from the second file;
Step 202, deriving the weight parameters of the face detection model from the model parameters and model variables;
Step 203, loading the first file and the weight parameters, thereby loading the intermediate file.
Second, when the device is a Neural-network Processing Unit (NPU):
When the face detection model is loaded onto the device, the model file corresponding to the face detection model is obtained; the model file is loaded through the neural network processor to obtain a task model; and model memory is configured for the task model, thereby loading the face detection model.
The model file stores the network topology, model parameters, and model variables of the face detection model.
Illustratively, the model file may use a dedicated file extension defined by the neural network processor's toolchain.
In this case, when the face image in the video frame is to be obtained, the video frame is fed into the model memory and the task model is run there, yielding the face image.
The model memory specifies the amount of memory the task model occupies while running.
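The patent names no NPU vendor or runtime, so no real SDK calls can be shown; for illustration only, the schematic below models the three-step flow (obtain the model file, build the task model, configure model memory) with plain Python objects.

```python
from dataclasses import dataclass

@dataclass
class TaskModel:
    """Schematic stand-in for an NPU task model; not a real SDK type."""
    blob: bytes                  # network topology, parameters and variables
    model_memory_bytes: int = 0  # memory the model occupies while running

def load_face_detector(model_path: str, model_memory_bytes: int) -> TaskModel:
    with open(model_path, "rb") as f:
        blob = f.read()                                # step 1: obtain the model file
    task_model = TaskModel(blob=blob)                  # step 2: load it into a task model
    task_model.model_memory_bytes = model_memory_bytes # step 3: configure model memory
    return task_model
```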
Step 103, sending the face image to the server and receiving the person information corresponding to the face image returned by the server.
Optionally, considering that the server may recognize pictures at a pixel size of 112 × 112, in order to reduce network bandwidth, the face image may be scaled down to the smallest picture size supported by the server before being sent.
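For illustration only, a sketch of the scale-and-upload step; the endpoint URL, upload format, and JSON response shape are all assumptions, since the patent does not define the client-server protocol.

```python
import cv2
import numpy as np
import requests

RECOGNIZE_URL = "https://example.com/recognize"  # placeholder endpoint

def recognize(face_img: np.ndarray) -> dict:
    """Scale the cropped face down to the smallest size the server
    supports (112 x 112 assumed) before uploading, to save bandwidth."""
    small = cv2.resize(face_img, (112, 112), interpolation=cv2.INTER_AREA)
    ok, jpeg = cv2.imencode(".jpg", small)
    if not ok:
        raise RuntimeError("JPEG encoding failed")
    resp = requests.post(
        RECOGNIZE_URL,
        files={"face": ("face.jpg", jpeg.tobytes(), "image/jpeg")},
    )
    resp.raise_for_status()
    return resp.json()  # e.g. {"person_id": ..., "name": ...} (assumed shape)
```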
Illustratively, the person information includes, but is not limited to, a person name and a person identifier.
In practical applications, after the person information corresponding to the face image is received, the person's avatar and name can be displayed on the video's display interface.
Optionally, after the person information corresponding to the face image is received, the display order corresponding to the person identifier in the person list associated with the video may be updated from the current order to a target order according to the person identifier in the person information, so that when the user operates on the person list (e.g., clicks it), the list is displayed to the user in the target order.
Optionally, when the display order corresponding to the person identifier in the person list is updated to the target order, the display order of the other person information in the list, that is, the entries whose positions change as a result of the person identifier's change, may be moved down in sequence.
For example, before the operation event is detected, the display order of the person information in the person list may be as shown in Table 1:
Table 1
Person information A
Person information B
Person information C
Suppose the person identifier obtained from the operation event corresponds to person information C, and the determined target order places person information C in the second row of the list; the updated person list is then as shown in Table 2:
Table 2
Person information A
Person information C
Person information B
It can be seen that the display order of person information B changes as a consequence of the change for the person identifier: following the move of person information C, person information B is moved down in sequence, from the second row to the third row of the person list.
Optionally, the initial display order of each piece of person information in the person list may be issued by the server to the client.
For example, the initial display order may correspond to the user portrait of the viewer; therefore, for the same video, viewers with different user portraits may see different initial orders of the person list.
The user portrait includes parameters such as age, gender, and interests.
Optionally, in this embodiment, the update cycle of the person list may fall into the following two cases:
First, the display order of the person information in the person list is updated whenever an operation event is detected.
In this case, after the display order corresponding to the person identifier is updated to the target order, when another operation event is detected, the display order of each piece of person information in the list is updated again according to that event.
Second, the display order of each piece of person information in the person list is updated based on both operation events and a preset update period.
In this case, after the display order corresponding to the person identifier is updated to the target order, the display order of the person information in the list is updated again either when another operation event is detected or when the preset update period arrives.
For example, when the preset update period arrives, the display order of each piece of person information in the person list is refreshed according to either the order in effect after the last operation event or the initial display order of the list.
According to the technical solutions provided by the embodiments of the present application, when persons in a video frame are to be identified, the client acquires the video frame at the current moment together with the face image in that frame, and the server recognizes the face image to obtain the person information corresponding to it. That is, the detection stage for the video frame is completed on the client side and the recognition stage on the server side, which relieves the load on the server and reduces bandwidth costs.
Optionally, the target order may be determined from the weight corresponding to the person identifier. Building on the above embodiment, and as shown in fig. 3, the target order may be determined as follows:
and 301, acquiring the current weight corresponding to the person identifier in the current order.
In this embodiment, the current weight corresponding to the person identifier includes the following two cases:
first, the current weight is the last target weight determined after the last operation event of the user is finished.
Secondly, the current weight is an initial preset value.
In this case, when it is determined that there is no last target weight, the current weight is set to the initial preset value.
Wherein, the initial preset value can be set manually.
In this embodiment, the initial preset values of the personal information in the personal list are the same.
Step 302, determining the target weight from the current weight and a preset weight variable.
In this embodiment, the preset weight variable is set in advance by the user.
Optionally, the current weight and the preset weight variable may be summed to obtain the target weight.
Optionally, when the weight corresponding to the person identifier is updated from the current weight to the target weight, the weights corresponding to the other person information in the person list change accordingly.
For example, when the target weight is obtained by summing the current weight and the preset weight variable, the weights corresponding to the other person information are reduced by subtracting the preset weight variable.
For example, when the current weight corresponding to the person identifier is 1, the weights corresponding to the other person information are 1, and the preset weight variable is also 1, the target weight becomes 2 and the weights of the other person information become 0.
Step 303, determining the target order corresponding to the person identifier from the target weight.
In this embodiment, a correspondence between weights and display orders may be preset; therefore, after the target weight is determined, the target order corresponding to it can be looked up in that preset correspondence.
Taking a drag event as the operation event, the process of updating the display order of the person list according to the weights is illustrated as follows:
Before the drag event is detected, the person list holds person information A, person information B, and person information C, each with a current weight of 1. When the user drags the video progress bar once to a video frame containing only person B, the recognition result increases B's weight by 1, so B's target weight becomes 2 while the weights of A and C become 0. After the person list is re-sorted by weight, the display order of the person information changes from A, B, C to B, A, C.
An embodiment of the present application further provides a person identification method applied to a server; as shown in fig. 4, the method may include the following steps:
Step 401, obtaining the face image from the client;
Step 402, recognizing the face image to obtain the person identifier corresponding to the face image.
optionally, the process of recognizing the face avatar picture by the server includes two processes of face feature point marking and face recognition.
The face feature point marks comprise feature point marks of facial five sense organs and feature point marks of facial contours.
Optionally, feature points of the facial features and feature points of the facial contour may be obtained by acquiring a facial feature region and a facial contour region of the facial features in the facial head image picture, and respectively performing feature labeling on a picture corresponding to the facial feature region and a picture corresponding to the facial contour region.
Illustratively, the feature point marking of the facial features is realized with a three-level deep convolutional neural network (DCNN) model, and the marking process may include the following three parts:
first, the face image is input to the first-level convolutional neural network, which marks the facial-feature regions where the facial features are located;
second, the face image with the marked facial-feature regions is input to the second-level convolutional neural network, which marks the feature points of the facial features in the face image;
and third, the face image is segmented based on the marked feature points to obtain images of the individual facial features, which are further marked with the third-level convolutional neural network model.
Illustratively, the feature point marking process for the face contour may include:
first, obtaining the face-contour region based on the picture and the facial-feature regions;
and second, marking the feature points of the face-contour region with a deep convolutional neural network.
In practical applications, the marked feature points of the face-contour region may lie on the chin, the forehead, the cheeks, and so on.
After the feature points of the facial features and the feature points of the face contour are obtained, a feature vector can be generated based on the face image, the feature points of the facial features, and the feature points of the face contour, as sketched below.
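The cascaded DCNN itself is not spelled out in the patent; as a stand-in for illustration only, the sketch below uses dlib's off-the-shelf 68-point landmark predictor and ResNet face encoder, which cover the same two stages (marking feature points, then deriving a feature vector).

```python
import dlib
import numpy as np

# Off-the-shelf dlib models (standard file names from dlib's model zoo).
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
encoder = dlib.face_recognition_model_v1("dlib_face_recognition_resnet_model_v1.dat")

def face_feature_vector(rgb: np.ndarray) -> np.ndarray:
    """Mark the feature points (facial features and jaw contour), then
    derive a feature vector from the face picture and those points."""
    boxes = detector(rgb, 1)
    if not boxes:
        raise ValueError("no face found in the picture")
    landmarks = predictor(rgb, boxes[0])   # 68 points: eyes, brows, nose,
                                           # mouth, and the face contour
    return np.array(encoder.compute_face_descriptor(rgb, landmarks))
```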
The face recognition process may include:
searching a database for the target feature vector with the highest matching degree to the feature vector;
where the database stores the feature vectors of a plurality of person portraits.
For example, the target feature vector with the highest matching degree may be the one with the highest similarity to the feature vector.
Then, the matching value between the feature vector and the target feature vector is obtained;
and when the matching value is greater than a preset matching threshold, the person identifier of the person portrait corresponding to the target feature vector is taken as the person identifier corresponding to the face image.
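For illustration only, a sketch of this nearest-match search; cosine similarity and the threshold value are assumptions, since the patent fixes neither the similarity measure nor the threshold.

```python
import numpy as np
from typing import Dict, Optional

def match_person(query: np.ndarray,
                 database: Dict[str, np.ndarray],
                 threshold: float = 0.8) -> Optional[str]:
    """Find the stored feature vector with the highest matching degree to
    the query and return its person identifier if the matching value
    exceeds the preset matching threshold."""
    best_id, best_score = None, -1.0
    qn = query / np.linalg.norm(query)
    for person_id, vec in database.items():
        score = float(np.dot(qn, vec / np.linalg.norm(vec)))  # cosine similarity
        if score > best_score:
            best_id, best_score = person_id, score
    return best_id if best_score > threshold else None
```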
Step 403, obtaining the person information corresponding to the person identifier and returning the person information to the client.
Optionally, multiple pairs of person identifiers and person information are stored on the server; therefore, once the person identifier corresponding to the face image is obtained, the matching person information can be retrieved from these pre-stored pairs.
Based on the same inventive concept, an embodiment of the present application further provides a person identification apparatus, as shown in fig. 5, including:
a first obtaining unit 501, configured to obtain the video frame at the current moment, where the current moment is the moment at which an operation event related to a video is detected;
a second obtaining unit 502, configured to obtain the face image from the video frame;
and a receiving unit 503, configured to send the face image to the server and receive the person information corresponding to the face image returned by the server.
Based on the same inventive concept, an embodiment of the present application further provides another person identification apparatus, as shown in fig. 6, including:
a third obtaining unit 601, configured to obtain the face image from the client;
a recognition unit 602, configured to recognize the face image to obtain the person identifier corresponding to the face image;
and a returning unit 603, configured to obtain the person information corresponding to the person identifier and return the person information to the client.
Based on the same inventive concept, an embodiment of the present application further provides a person identification system, as shown in fig. 7, including:
a server 701 and a client 702 in communication with the server 701;
a server 701, configured to obtain the face image from the client 702, recognize the face image to obtain the corresponding person identifier, obtain the person information corresponding to the person identifier, and return the person information to the client 702;
and a client 702, configured to acquire the video frame at the current moment, where the current moment is the moment at which an operation event related to a video is detected, acquire the face image from the video frame, send the face image to the server 701, and receive the person information corresponding to the face image returned by the server 701.
Based on the same concept, an embodiment of the present application further provides an electronic device. As shown in fig. 8, the electronic device mainly includes: a processor 801, a communication interface 802, a memory 803, and a communication bus 804, where the processor 801, the communication interface 802, and the memory 803 communicate with one another through the communication bus 804. The memory 803 stores a program executable by the processor 801; when the processor 801 executes the program stored in the memory 803, the following steps are implemented:
acquiring a video frame at the current moment, where the current moment is the moment at which an operation event related to a video is detected;
acquiring a face image from the video frame;
and sending the face image to a server, and receiving person information corresponding to the face image returned by the server.
The communication bus 804 mentioned for the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 804 may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 8, but this does not mean that there is only one bus or one type of bus.
The communication interface 802 is used for communication between the above-described electronic apparatus and other apparatuses.
The memory 803 may include a random access memory (RAM) or a non-volatile memory, such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor 801.
The processor 801 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
In still another embodiment of the present application, a computer-readable storage medium is further provided, in which a computer program is stored; when the computer program runs on a computer, the computer is caused to execute the person identification method described in the above embodiments.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored on a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, microwave, and so on). The computer-readable storage medium can be any available medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, tapes), optical media (e.g., DVDs), or semiconductor media (e.g., solid-state drives), among others.
It is noted that, in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus comprising a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present invention, which enable those skilled in the art to understand or practice the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

1. A person identification method, the method comprising:
acquiring a video frame at the current moment, wherein the current moment is the moment at which an operation event related to a video is detected;
acquiring a face image from the video frame;
and sending the face image to a server, and receiving person information corresponding to the face image returned by the server.
2. The method of claim 1, wherein acquiring the face image from the video frame comprises:
using a pre-trained face detection model to infer a target region of the face in the video frame;
and cropping the face image corresponding to the target region from the video frame.
3. The method of claim 2, wherein inferring the target region of the face in the video frame with the pre-trained face detection model comprises:
creating an inference request corresponding to the face detection model;
and running the inference request on the video frame to obtain the target region.
4. The method of claim 2, wherein before the target region of the face in the video frame is inferred with the pre-trained face detection model, the method further comprises:
loading the face detection model.
5. The method of claim 4, wherein loading the face detection model comprises:
obtaining an intermediate file corresponding to the face detection model;
and loading the intermediate file, thereby loading the face detection model.
6. The method of claim 5, wherein the intermediate file comprises a first file describing a network topology of the face detection model and a second file comprising model parameters and model variables of the face detection model.
7. The method of claim 6, wherein loading the intermediate file comprises:
obtaining the model parameters and model variables from the second file;
deriving weight parameters of the face detection model from the model parameters and the model variables;
and loading the first file and the weight parameters, thereby loading the intermediate file.
8. The method of claim 4, wherein loading the face detection model comprises:
obtaining a model file corresponding to the face detection model;
loading the model file through a neural network processor to obtain a task model;
and configuring model memory for the task model, thereby loading the face detection model.
9. An electronic device, comprising: the system comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory are communicated with each other through the communication bus;
the memory for storing a computer program;
the processor, executing a program stored in the memory, implementing the method of any of claims 1-7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method of any one of claims 1 to 7.
11. A person identification apparatus, characterized in that the apparatus comprises:
a first acquisition unit, configured to acquire a video frame at the current moment, wherein the current moment is the moment at which an operation event related to a video is detected;
a second acquisition unit, configured to acquire a face image from the video frame;
and a receiving unit, configured to send the face image to a server and receive person information corresponding to the face image returned by the server.
CN202110178851.5A 2021-02-09 2021-02-09 Person identification method, electronic device, storage medium, and apparatus Pending CN112949427A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110178851.5A CN112949427A (en) 2021-02-09 2021-02-09 Person identification method, electronic device, storage medium, and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110178851.5A CN112949427A (en) 2021-02-09 2021-02-09 Person identification method, electronic device, storage medium, and apparatus

Publications (1)

Publication Number Publication Date
CN112949427A true CN112949427A (en) 2021-06-11

Family

ID=76244854

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110178851.5A Pending CN112949427A (en) 2021-02-09 2021-02-09 Person identification method, electronic device, storage medium, and apparatus

Country Status (1)

Country Link
CN (1) CN112949427A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106951926A (en) * 2017-03-29 2017-07-14 山东英特力数据技术有限公司 The deep learning systems approach and device of a kind of mixed architecture
CN108960147A (en) * 2018-07-05 2018-12-07 郑永春 Mobile population video monitoring system based on recognition of face
CN109276233A (en) * 2018-08-31 2019-01-29 北京工商大学 A kind of distributed face and physiological characteristic recognition methods towards video flowing
CN110427265A (en) * 2019-07-03 2019-11-08 平安科技(深圳)有限公司 Method, apparatus, computer equipment and the storage medium of recognition of face
CN111428656A (en) * 2020-03-27 2020-07-17 信雅达系统工程股份有限公司 Mobile terminal identity card identification method based on deep learning and mobile device
CN112163568A (en) * 2020-10-28 2021-01-01 成都中科大旗软件股份有限公司 Scenic spot person searching system based on video detection
CN112188140A (en) * 2020-09-29 2021-01-05 深圳康佳电子科技有限公司 Face tracking video chat method, system and storage medium

Similar Documents

Publication Publication Date Title
US11526549B2 (en) Method and system for interfacing with a user to facilitate an image search for an object-of-interest
US10810255B2 (en) Method and system for interfacing with a user to facilitate an image search for a person-of-interest
CA3111097C (en) Bounding box doubling as redaction boundary
CN108446390A (en) Method and apparatus for pushed information
KR20210088600A (en) Exhibition area state recognition method, apparatus, electronic device and recording medium
CN105917305A (en) Filter and shutter based on image emotion content
CN107220614B (en) Image recognition method, image recognition device and computer-readable storage medium
CN112348089B (en) Working state identification method, server, storage medium and device
US20220237943A1 (en) Method and apparatus for adjusting cabin environment
CN112149570B (en) Multi-person living body detection method, device, electronic equipment and storage medium
US20210127071A1 (en) Method, system and computer program product for object-initiated redaction of surveillance video
CN109271929B (en) Detection method and device
CN111553838A (en) Model parameter updating method, device, equipment and storage medium
CN112948630B (en) List updating method, electronic equipment, storage medium and device
CN111382791B (en) Deep learning task processing method, image recognition task processing method and device
US9501710B2 (en) Systems, methods, and media for identifying object characteristics based on fixation points
CN112188108A (en) Photographing method, terminal, and computer-readable storage medium
CN112036307A (en) Image processing method and device, electronic equipment and storage medium
KR101961462B1 (en) Object recognition method and the device thereof
CN112949427A (en) Person identification method, electronic device, storage medium, and apparatus
CN111866573B (en) Video playing method and device, electronic equipment and storage medium
CN113676734A (en) Image compression method and image compression device
CN112200626A (en) Method and device for determining recommended product, electronic equipment and computer readable medium
CN111010526A (en) Interaction method and device in video communication
CN111275183A (en) Visual task processing method and device and electronic system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210611