WO2021159672A1 - Face image recognition method and apparatus - Google Patents


Info

Publication number
WO2021159672A1
Authority
WO
WIPO (PCT)
Prior art keywords: face, plug-in, video, image, video playback
Application number
PCT/CN2020/105861
Other languages: French (fr), Chinese (zh)
Inventor: 吴贞海 (Wu Zhenhai)
Original Assignee: 深圳壹账通智能科技有限公司 (Shenzhen OneConnect Smart Technology Co., Ltd.)
Application filed by 深圳壹账通智能科技有限公司
Publication of WO2021159672A1 publication Critical patent/WO2021159672A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44: Arrangements for executing specific programs
    • G06F 9/445: Program loading or initiating
    • G06F 9/44521: Dynamic linking or loading; Link editing at or after load time, e.g. Java class loading
    • G06F 9/44526: Plug-ins; Add-ons
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions

Definitions

  • This application belongs to the field of image processing technology, and in particular relates to a method and device for recognizing a face image.
  • the OCR algorithm is used to recognize the text in a picture;
  • the QR code recognition algorithm is used to parse a two-dimensional code image and extract the information it contains.
  • face recognition technology can automatically determine a user's identity, and its range of applications continues to widen; performing face recognition efficiently and accurately has therefore become a problem that needs to be solved today.
  • the inventor realizes that existing face recognition technology is mainly used for static image recognition, and face recognition in video is difficult to achieve. In particular, for most video playback applications, the user must manually capture the video image frames in which the target user appears and hand them to corresponding software for recognition, which increases the difficulty of face collection and reduces operating efficiency.
  • the embodiments of the present application provide a face image recognition method and apparatus to solve the problem that existing face image recognition technology can only perform face recognition on static images, requiring the user to manually capture the video image frames in which the target user appears and hand them to corresponding software for recognition, which increases the difficulty of face collection and reduces operating efficiency.
  • the first aspect of the embodiments of the present application provides a face image recognition method, including:
  • the video play instruction carries the plug-in identifier of the video play plug-in that needs to be called when the video file is played;
  • if the plug-in identifier matches the identifier of the face recognition plug-in, each video image frame of the video file is extracted through the video playback application after the plug-in is loaded;
  • a facial image library of the video file is established.
  • This embodiment of the application starts the video playback application upon receiving a video playback instruction initiated by the user, and loads the video playback plug-in indicated by the instruction to extend the functions of the video playback application. If the loaded video playback plug-in is detected to be a face recognition plug-in, the video file can be parsed through the video playback application, each video image frame extracted and imported into the face recognition plug-in, the face images contained in each video image frame recognized through the plug-in, and a face image library associated with the video file built from all the face images, thereby realizing face recognition for dynamic video files.
  • this application does not require the user to manually capture video image frames and hand them to other applications for face image recognition.
  • the face recognition plug-in can be loaded in the video playback application. While playing the video file, the face image contained in each video image frame is automatically recognized, which improves the efficiency of face image recognition and reduces user operations.
  • because the recognition of face images is synchronized with the playback of the video file, the user does not need to perform face image recognition after watching the video file, which reduces the time consumed by the face recognition operation.
  • FIG. 1 is an implementation flowchart of a face image recognition method provided by the first embodiment of the present application
  • FIG. 2 is a schematic diagram of playing a video file provided by an embodiment of the present application.
  • FIG. 3 is a specific implementation flowchart of a face image recognition method provided by the second embodiment of the present application.
  • FIG. 4 is a specific implementation flow chart of a face image recognition method S105 provided by the third embodiment of the present application.
  • FIG. 5 is a specific implementation flowchart of a face image recognition method S1051 provided by the fourth embodiment of the present application.
  • FIG. 6 is a specific implementation flowchart of a face image recognition method S105 provided by the fifth embodiment of the present application.
  • FIG. 7 is a specific implementation flowchart of a face image recognition method S104 provided by the sixth embodiment of the present application.
  • FIG. 8 is a specific implementation flowchart of a face image recognition method S102 provided by the seventh embodiment of the present application.
  • FIG. 9 is a structural block diagram of a face image recognition device provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a terminal device provided by another embodiment of the present application.
  • the execution subject of the process is the terminal device.
  • the terminal device includes, but is not limited to, servers, computers, smartphones, and tablet computers capable of performing face image recognition tasks.
  • Fig. 1 shows a flow chart of the method for recognizing a face image provided by the first embodiment of the present application, and the details are as follows:
  • a video play instruction is received; the video play instruction carries the plug-in identifier of the video play plug-in that needs to be called when the video file is played.
  • the user can send a video play instruction to the terminal device.
  • the user can trigger the video playback instruction locally on the terminal device through an interactive module configured on the terminal device, such as a keyboard, mouse, or touch screen. Alternatively, the user can generate the video playback instruction on a local user terminal, establish a communication link between the user terminal and the terminal device, and send the instruction to the terminal device through that link; in this case the user terminal is equivalent to a remote control device that controls the terminal device to perform the video playback operation.
  • when the user initiates a video playback operation, he can select the video playback plug-ins to be called from the terminal device's list of loadable plug-ins, choosing one or more by clicking or checking, and then click the play button. The terminal device recognizes the video playback plug-ins the user selected and adds their plug-in identifiers to the video playback instruction that triggers the video playback operation.
  • the terminal device can be configured with a default configuration mode, that is, when the terminal device performs a video playback operation, it can load one or more video playback plug-ins by default, without the need for the user to re-select plug-ins for each playback operation.
  • for example, if the default configuration mode of the terminal device is to load the frame rate optimization plug-in and the face recognition plug-in by default, the plug-in identifiers of these two plug-ins are added when the video playback instruction is generated.
  • the video playback plug-in to be loaded in the default configuration mode can be set by default by the system, or can be manually configured by the user.
  • the terminal device can count the number of uses of each video playback plug-in. If it detects that the use count of a certain video playback plug-in is greater than a preset use threshold, it prompts the user whether to add that plug-in to the default configuration mode; after receiving the user's feedback agreeing to the addition, the plug-in whose use count exceeds the threshold is added to the default configuration mode, so that it is loaded automatically in subsequent playback operations.
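The usage-count logic described above can be sketched as follows. This is an illustrative sketch only; the class and method names (`PluginUsageTracker`, `record_use`, `confirm_add`) are assumptions, not names from the application.

```python
from collections import Counter

class PluginUsageTracker:
    """Counts plug-in uses and proposes frequently used plug-ins
    for the default configuration mode (hypothetical sketch)."""

    def __init__(self, use_threshold: int):
        self.use_threshold = use_threshold
        self.counts = Counter()
        self.default_plugins = set()

    def record_use(self, plugin_id: str) -> bool:
        """Count one use; return True when the plug-in's use count exceeds
        the threshold and it should be proposed for the default mode."""
        self.counts[plugin_id] += 1
        return (self.counts[plugin_id] > self.use_threshold
                and plugin_id not in self.default_plugins)

    def confirm_add(self, plugin_id: str) -> None:
        """Called after the user agrees to the addition prompt."""
        self.default_plugins.add(plugin_id)

tracker = PluginUsageTracker(use_threshold=2)
tracker.record_use("facereader")
tracker.record_use("facereader")
should_prompt = tracker.record_use("facereader")  # third use exceeds threshold
tracker.confirm_add("facereader")                 # now loaded by default
```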
  • the terminal device is installed with a VLC video playback application.
  • the VLC video playback application is specifically an application with a core framework that performs the video playback function. Multiple playback-related plug-ins can be added to it based on user needs, such as a video optimization plug-in, a video recording plug-in, and the face recognition plug-in that needs to be called in this embodiment.
  • the video playback instruction can not only specify the video file to be played, but also carry the plug-in identifier that needs to be loaded.
  • the video playback instruction can be: vlc.exe --video-filter all,facereader test.mp4, where vlc.exe is the video playback application that needs to be started, facereader is the plug-in identifier, and test.mp4 is the file identifier of the video file to be played.
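Parsing an instruction of the form shown above into its three parts (application, plug-in identifiers, file identifier) can be sketched as follows. The option grammar assumed here is only the example's shape, not VLC's real command-line grammar.

```python
def parse_play_instruction(instruction: str):
    """Split a playback instruction into (application, plug-in IDs, video file).
    Hypothetical sketch based on the example instruction's shape."""
    tokens = instruction.split()
    app = tokens[0]                      # e.g. "vlc.exe"
    plugin_ids, video_file = [], None
    i = 1
    while i < len(tokens):
        if tokens[i] == "--video-filter":
            # option value: comma-separated filter/plug-in identifiers
            plugin_ids = [p for p in tokens[i + 1].split(",") if p and p != "all"]
            i += 2
        else:
            video_file = tokens[i]       # file identifier of the video to play
            i += 1
    return app, plugin_ids, video_file

app, plugins, video = parse_play_instruction(
    "vlc.exe --video-filter all,facereader test.mp4")
# app == "vlc.exe", plugins == ["facereader"], video == "test.mp4"
```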
  • a video playback application is started, and the video playback plug-in is loaded to the video playback application based on the plug-in identifier.
  • the terminal device can start the video playback application associated with the instruction. The terminal device also parses the video playback instruction, extracts the plug-in identifier of each video playback plug-in, queries the video playback plug-in corresponding to each identifier, and loads those plug-ins into the video playback application to extend its functions.
  • the video playback application can be associated with a list of loadable plug-ins.
  • Each video playback plug-in can store a startup declaration file in the installation address of the video playback application. When the video playback application starts, it detects the startup declaration files stored in the installation location and generates the list of loadable plug-ins for the video playback application.
  • the terminal device detects whether the plug-in identifier in the video playback instruction is in the list of loadable plug-ins. If so, it queries the installation address of the video playback plug-in, creates a new thread in the process of the video playback application, and runs the plug-in's running file through this thread to load the video playback plug-in into the video playback application. If the plug-in identifier is detected not to be in the list of loadable plug-ins, plug-in non-existence information is output.
  • a plug-in download request may be generated based on the plug-in identifier, and the plug-in download request is sent to the server corresponding to the video playback application.
  • the plug-in running file corresponding to the plug-in identifier is downloaded from the server, and after the download is completed, the plug-in identifier is added to the list of loadable plug-ins so that the video playback plug-in can be loaded into the video playback application.
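The load flow of the preceding paragraphs (check the loadable-plug-in list; run the plug-in's running file in a new thread of the playback application's process; otherwise report non-existence and request a download) can be sketched as below. All function and parameter names are illustrative assumptions.

```python
import threading

def load_plugin(plugin_id, loadable, install_addrs, loaded, request_download):
    """Load one plug-in into the playback application (hypothetical sketch).
    loadable: set of identifiers in the loadable-plug-in list.
    install_addrs: {plugin_id: path of the plug-in's running file}."""
    if plugin_id not in loadable:
        print(f"plug-in {plugin_id} does not exist")   # non-existence information
        request_download(plugin_id)                    # ask the server for it
        return False
    run_file = install_addrs[plugin_id]                # query installation address
    # new thread in the playback application's process runs the plug-in
    t = threading.Thread(target=lambda: loaded.append((plugin_id, run_file)))
    t.start()
    t.join()
    return True
```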
  • each video image frame of the video file is extracted by the video playback application after the plug-in is loaded.
  • the terminal device parses the video playback instruction, determines the file identifier of the video file to be played, and obtains the video file based on the file identifier, imports the video file into the video playback application, and outputs the video file through the video playback application.
  • when the video playback application outputs a video file, it reads each video image frame contained in the file and, based on the frame number of each video image frame, outputs each frame in turn at the preset video playback frame rate; for example, the playback frame rate can be 60 fps, that is, 60 video image frames are output per second.
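The ordered frame output at a fixed playback frame rate described above can be sketched as a simple schedule: frames sorted by frame number, each assigned a presentation time 1/fps after the previous one. The function name is an illustrative assumption.

```python
def frame_schedule(frame_numbers, fps=60):
    """Return (frame_number, presentation_time_seconds) pairs, ordered by
    frame number, spaced at 1/fps intervals (e.g. 60 frames per second)."""
    ordered = sorted(frame_numbers)
    return [(n, i / fps) for i, n in enumerate(ordered)]

schedule = frame_schedule([2, 0, 1], fps=60)
# frames are emitted in frame-number order: 0, 1, 2
```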
  • the terminal device can create multiple parallel processing threads in the process of the video playback application and run the corresponding video playback plug-in through each parallel thread to perform different video processing operations. Different video playback plug-ins require different types of input data: for an audio processing plug-in, the input data type is an audio signal, while for the face recognition plug-in, the input data type is a video image frame. Therefore, the terminal device needs to determine the currently loaded video playback plug-ins before playing the video file, extract the corresponding data from the video file based on the input data type each plug-in requires, and process that data through the corresponding concurrent thread.
  • if the terminal device detects that the video playback plug-ins to be loaded include a face recognition plug-in, that is, a plug-in identifier matches the identifier of the face recognition plug-in, it needs to obtain input data for the face recognition plug-in. Because the face recognition plug-in performs recognition on each video image frame, i.e. its input data type is the video image frame, each video image frame can be extracted in sequence according to its frame number while the video application plays the video file and imported into the face recognition plug-in for the face recognition operation.
  • the face recognition plug-in is called to extract the face images contained in each of the video image frames.
  • the video image frames can be output to the graphics processing unit (GPU) for the display output flow.
  • at the same time, the video image frames can be imported into the face recognition plug-in. The face recognition plug-in analyzes each input video image frame and extracts the face images it contains through a built-in face recognition algorithm. Since there can be multiple photographed subjects, one video image frame can contain multiple face images.
  • the way for the face recognition plug-in to obtain a face image may be: slide a built-in multi-size face template across the video image frame and calculate the matching degree between the framed region image and the face template; if the matching degree is detected to be greater than a preset matching threshold, the currently framed region is recognized as containing a human face, and the region image is recognized as a face image.
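The sliding-template matching described above can be sketched in miniature as follows. This is purely illustrative: real face templates and matching degrees are far richer, and the "matching degree" here is just an inverse of the mean absolute pixel difference, an assumption chosen to keep the sketch self-contained.

```python
def match_degree(window, template):
    """Toy matching degree in (0, 1]: 1.0 means a pixel-perfect match."""
    h, w = len(template), len(template[0])
    diff = sum(abs(window[r][c] - template[r][c])
               for r in range(h) for c in range(w))
    return 1.0 / (1.0 + diff / (h * w))

def detect_faces(image, template, threshold):
    """Slide the template over the image; return top-left corners of
    windows whose matching degree exceeds the threshold."""
    h, w = len(template), len(template[0])
    hits = []
    for r in range(len(image) - h + 1):
        for c in range(len(image[0]) - w + 1):
            window = [row[c:c + w] for row in image[r:r + h]]
            if match_degree(window, template) > threshold:
                hits.append((r, c))      # framed region recognized as a face
    return hits
```

A multi-size template, as the text mentions, would simply repeat this scan with templates of several sizes.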
  • the terminal device may first perform face recognition, and after determining the face image contained in the video image frame, perform the video playback operation.
  • the terminal device can mark the recognized face images in the video image frame, for example by marking each face region with a rectangular frame, and output the marked video image frame so that the user can quickly locate the face images in the video file.
  • FIG. 2 shows a schematic diagram of playing a video file provided by an embodiment of the present application.
  • a face image library of the video file is established according to the entity user corresponding to each of the face images.
  • after acquiring the face images of each video image frame, the terminal device can determine the entity user to which each face image belongs, classify the face images based on that entity user, and create the face image library of the video file.
  • the terminal device can calculate the similarity between any two face images, for example by converting each face image into a face feature vector, calculating the Euclidean distance between the two face feature vectors, and taking the reciprocal of the Euclidean distance as the similarity between the two. Face region images with greater similarity are recognized as the same entity user, and face images belonging to different entity users are distinguished, thus realizing the classification of face images; a user identifier is then marked for the face region images belonging to the same entity user.
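The similarity measure just described (reciprocal of the Euclidean distance between face feature vectors, compared against an association threshold) can be sketched as below. Feature extraction itself is out of scope here, so the vectors are taken as given; the function names are illustrative.

```python
import math

def similarity(vec_a, vec_b):
    """Similarity between two face feature vectors: the reciprocal of
    their Euclidean distance (infinite for identical vectors)."""
    dist = math.dist(vec_a, vec_b)
    return float("inf") if dist == 0 else 1.0 / dist

def same_user(vec_a, vec_b, association_threshold):
    """Two faces are treated as the same entity user when their
    similarity exceeds the association threshold."""
    return similarity(vec_a, vec_b) > association_threshold
```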
  • the face image recognition method receives a user-initiated video playback instruction, starts a video playback application, and loads the video playback plug-in indicated by the instruction into the video playback application. If the loaded video playback plug-in is detected to be a face recognition plug-in, the video file can be parsed through the video playback application, each video image frame extracted and imported into the face recognition plug-in, the face images contained in each video image frame recognized by the plug-in, and a face image library associated with the video file built from all the face images, thereby realizing face recognition for dynamic video files.
  • this application does not require the user to manually capture video image frames and hand them to other applications for face image recognition.
  • the face recognition plug-in can be loaded in the video playback application. While playing the video file, the face image contained in each video image frame is automatically recognized, which improves the efficiency of face image recognition and reduces user operations.
  • because the recognition of face images is synchronized with the playback of the video file, the user does not need to perform face image recognition after watching the video file, which reduces the time consumed by the face recognition operation.
  • Fig. 3 shows a specific implementation flowchart of a face image recognition method provided by the second embodiment of the present application.
  • before the video playback application is started and the video playback plug-in is loaded into the video playback application, the method further includes S301 to S303, detailed as follows:
  • the terminal device can obtain the plug-in data package of the face recognition plug-in through a mobile storage device or a network download.
  • the terminal device may check the completeness of the obtained plug-in data package, for example by extracting the CRC check code of the plug-in data package to determine whether the package is complete.
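A CRC-based completeness check of this kind can be sketched with the standard CRC-32 (`zlib.crc32`): compare the check code carried with the plug-in data package against the check code of the received bytes. The package layout assumed here (payload plus a declared CRC) is illustrative.

```python
import zlib

def package_is_complete(payload: bytes, declared_crc: int) -> bool:
    """True when the CRC-32 of the received bytes matches the CRC check
    code declared for the plug-in data package."""
    return zlib.crc32(payload) == declared_crc

data = b"plug-in data package bytes"
crc = zlib.crc32(data)
intact = package_is_complete(data, crc)        # complete data package
truncated = package_is_complete(data[:-1], crc)  # abnormal data package
```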
  • the terminal device can run the plug-in data package and import a preset test image into the running process to obtain an output result. If the position of the face image marked in the output result matches the preset standard coordinates, the plug-in data package is recognized as a complete data package and the operation of S302 is executed; otherwise, if the package cannot run or the face image contained in the test image cannot be recognized, the plug-in data package is recognized as an abnormal data package, and the plug-in data package of the face recognition plug-in is retrieved again.
  • a version verification request is sent to the server, and a legal verification result fed back by the server based on the version verification request is received; the version verification request includes the version identification of the plug-in data package.
  • the terminal device may extract the version identifier carried in the plug-in data package, generate a version verification request carrying the version identifier, and send the request to the server corresponding to the video playback application. Since the face recognition plug-in needs to be loaded into the video playback application, the video playback application must be compatible with the face recognition plug-in. If the server detects that the face recognition plug-in is compatible with the video playback application, it can return a successful legality verification result to the terminal device; otherwise, if the server detects that the plug-in is not compatible with the video playback application, it returns a failed verification result.
  • the terminal device may extract the download link carried in the legal verification result, and retrieve the plug-in data package of the face recognition plug-in through the download link.
  • if the server detects that the current face recognition plug-in is incompatible with the video playback application, a download link for a compatible face recognition plug-in can be provided, so that the terminal device can obtain a legal face recognition plug-in through the download link.
  • if the terminal device determines that the plug-in data package is a legal data package, that is, the legality verification result indicates success, it extracts the call declaration file contained in the plug-in data package, queries the installation location of the video playback application, and adds the extracted call declaration file to the file directory corresponding to that installation location. Because the video playback application detects the declaration files in that directory during startup and generates the list of callable plug-ins from them, the video playback application can call the face recognition plug-in in subsequent playback operations.
  • the video playback application is a VLC video playback application
  • the installation location of the VLC video playback application is //plugins; that is, the call declaration file of the face recognition plug-in is added to the directory corresponding to the installation location. For the FaceReader plug-in, this means placing the libfacereader_plugin.dll file in the plugins directory of the VLC player installation directory.
  • FIG. 4 shows a specific implementation flowchart of a face image recognition method S105 provided by the third embodiment of the present application.
  • S105 includes: S1051 to S1054, and the details are as follows:
  • the establishment of the face image database of the video file according to the entity user corresponding to each of the face images includes:
  • after the terminal device extracts the face images contained in each video image frame, it can perform similarity calculations on face images selected from any two video image frames.
  • for example, if the first video image frame contains face image A, face image B, and face image C, and the second video image frame contains face image A', face image B', and face image C', the terminal device can extract face image A from the first video image frame and face image C' from the second video image frame, and calculate the similarity between face image A and face image C'.
  • the calculation of similarity can refer to the description of S105: the face image is converted into a face feature vector, the Euclidean distance between the face feature vectors in the two video image frames is calculated, and the reciprocal of the Euclidean distance is taken as the similarity between the two face images.
  • optionally, before calculating the similarity, the terminal device can select face images in adjacent video image frames for comparison: it calculates the differences between the center coordinates of the face images, recognizes face images whose coordinate difference is less than a preset distance threshold as target images, and calculates the similarity only between target images. Since the two video image frames are adjacent, the distance a face moves between them is small; selecting the face images with the smaller movement distance as target images and calculating similarity only between them eliminates a large number of invalid face similarity calculations and improves the construction speed of the face database.
  • the terminal device is set with an association threshold. If the similarity between two face images is detected to be less than the preset association threshold, the two face images are determined to belong to different entity users and no relationship needs to be established between them. Conversely, if the similarity between two face images is greater than the association threshold, the two face images are determined to belong to the same entity user, are identified as related images, and an association relationship is established between them, so that all face images belonging to the same entity user can be determined from the association relationships.
  • the terminal device can divide the face images according to the association relationships, placing all mutually related images into one user face group. If face images are related images, they belong to the same entity user; this realizes the grouping of face images by entity user, and a user identifier is configured for each divided face group.
  • the user identifier can be a user serial number, where the serial number can be determined according to the order of the appearance time of the earliest face image in each user face group.
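The grouping step above (merging pairwise association relationships into user face groups and numbering the groups by earliest appearance) can be sketched with a union-find structure. The data shapes and names here are illustrative assumptions.

```python
def build_face_groups(face_frames, associations):
    """face_frames: {face_id: frame number of first appearance};
    associations: iterable of (face_id, face_id) related-image pairs.
    Returns {user_serial_number: sorted list of face_ids}, with serial
    numbers assigned in order of each group's earliest appearance."""
    parent = {f: f for f in face_frames}

    def find(x):                        # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in associations:           # merge related images
        parent[find(a)] = find(b)

    groups = {}
    for f in face_frames:
        groups.setdefault(find(f), []).append(f)

    # serial numbers follow the earliest face image in each user face group
    ordered = sorted(groups.values(),
                     key=lambda g: min(face_frames[f] for f in g))
    return {serial: sorted(g) for serial, g in enumerate(ordered, start=1)}
```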
  • optionally, the terminal device may store standard face images of candidate users, match each standard face image against the face images in each user face group, identify the user face group associated with a candidate user according to the matching result, and use the candidate user's user identifier as the user identifier of that face group.
  • the face image database is established according to the user face group and the user identifier.
  • after the terminal device has identified all user face groups and configured a user identifier for each, it can construct the face image library for the video file.
  • in this embodiment, related images are recognized and, based on the association relationships between them, divided into user face groups belonging to the same entity user, thereby realizing the classification of face images and improving the management efficiency of face images.
  • Fig. 5 shows a specific implementation flowchart of a face image recognition method S1051 provided by the fourth embodiment of the present application.
  • a face image recognition method S1051 provided in this embodiment includes: S501 to S505, which are detailed as follows:
  • the calculating the similarity between the face images in any two video image frames includes:
  • the terminal device can configure a face key feature list according to the face features to be located.
  • the face key feature list can include four facial features (eyes, ears, mouth, and nose) and can further include eyebrows, forehead, and so on; the specific facial features included can be configured according to the required recognition accuracy.
  • the terminal device can mark each key feature of the face in the face image according to the list of key features of the face, and obtain the feature coordinates according to the coordinates of each key feature of the face in the face image.
  • the feature coordinate sequence of the face image is constructed according to the feature coordinates of the key features of all faces in the list of key features of the face.
  • the terminal device can be configured with a sequence template that specifies the position of each key face feature in the sequence. The feature coordinates of each key feature in the face image are imported in turn into their associated positions in the sequence template to generate the feature coordinate sequence corresponding to the face image.
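Building such a feature coordinate sequence can be sketched as follows: the sequence template fixes the order of the key features, and each face image's marked coordinates are placed at the template's positions. The feature names and the (None, None) placeholder for missing features are illustrative assumptions.

```python
# Hypothetical sequence template: fixed position for each key face feature.
SEQUENCE_TEMPLATE = ["left_eye", "right_eye", "nose", "mouth",
                     "left_ear", "right_ear"]

def feature_sequence(marked_coords):
    """marked_coords: {feature_name: (x, y)} marked in one face image.
    Returns the feature coordinate sequence in template order; features
    not found in the image are filled with (None, None)."""
    return [marked_coords.get(name, (None, None))
            for name in SEQUENCE_TEMPLATE]

seq = feature_sequence({"nose": (5, 5), "mouth": (5, 8)})
```

Because every face image yields a sequence in the same order, the feature distance between two faces can then be computed position by position.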
  • the terminal device can calculate the feature distance value between the two feature coordinate sequences through the Euclidean distance calculation formula and the coordinate distance calculation formula.
  • the feature distance value and the number of interval image frames are imported into a preset similarity calculation model to obtain the similarity between the face images in the two video image frames. In the similarity calculation model: Similarity is the similarity; ActFrame is the number of interval image frames; FigDist is the feature distance value; BaseDist is the reference distance value; BaseFrame is the shooting frame rate of the video file; StandardDist is the preset adjustment coefficient.
  • in this embodiment, normalization can be performed using the number of interval image frames, thereby reducing the influence of the frame-count difference on the similarity calculation; the difference between the feature distance value and the reference distance value is evaluated, and the similarity between the two face images is calculated through the face feature coordinates.
  • in this embodiment, by identifying the face feature coordinates, constructing the face feature sequence, and calculating the similarity of the face images based on the number of interval image frames between the two video image frames and the feature distance value between the face feature sequences, the accuracy of the similarity calculation is improved.
  • FIG. 6 shows a specific implementation flowchart of a face image recognition method S105 provided by the fifth embodiment of the present application.
  • a face image recognition method S105 provided in this embodiment includes: S601 to S603, which are detailed as follows:
  • the establishing of the face image library of the video file includes:
  • the expression type of the face image is determined, and the expression type is recognized as a reference expression.
  • the terminal device may use an expression recognition algorithm to determine the expression type of the face image at the moment of shooting.
  • the expression type may be: smiling type, laughing type, crying type, sad type, etc., and the expression type of the face image is recognized as a reference expression.
  • a derivative image of the face image is output; the expression type of the derivative image is different from the expression type of the face image.
  • the terminal device can adjust the expression conversion algorithm according to the reference expression to determine how to convert from the reference expression to other expressions, for example, from a smiling expression to a crying expression, and from a smiling expression to a laughing expression.
  • the terminal device can import the face image into the expression conversion algorithm adjusted according to the reference expression, and can output derivative images corresponding to different tag types.
  • the face image library is generated according to the face image and the derivative image.
  • the face image library can be established, so that the content of the face image library can be expanded.
  • multiple derivative images with different tags are obtained based on the face image, so that the richness of the face image library can be improved.
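The library-expansion step above can be sketched as follows. This is an illustrative Python outline under stated assumptions: `convert_expression` is a hypothetical stand-in for the patent's expression conversion algorithm (here it only relabels the image), and the expression tags are the examples named in the text.

```python
EXPRESSION_TYPES = ["smiling", "laughing", "crying", "sad"]

def convert_expression(face_image, source, target):
    # Hypothetical stand-in for the expression conversion algorithm:
    # a real implementation would synthesize a new face; here we only
    # tag the image record with its new expression type.
    return {**face_image, "expression": target}

def build_face_library(face_image, reference_expression):
    """Generate one derivative image per expression type other than the
    reference expression, and collect them with the original image."""
    library = [face_image]
    for target in EXPRESSION_TYPES:
        if target != reference_expression:
            library.append(
                convert_expression(face_image, reference_expression, target))
    return library
```

One input face with a "smiling" reference expression thus yields a library of four differently tagged images, enriching the face image library as described.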
  • FIG. 7 shows a specific implementation flowchart of a face image recognition method S104 provided by the sixth embodiment of the present application.
  • S104 in a face image recognition method provided in this embodiment includes: S1041 to S1043, which are detailed as follows:
  • the invoking the face recognition plug-in to extract the face images contained in each of the video image frames includes:
  • based on the image data corresponding to the RGB channels in the video image frame, an image matrix of the video image frame is generated.
  • the terminal device can preprocess the video image frame before importing it into the face recognition plug-in.
  • the specific operation is to perform data fusion on the image data of the video image frame on the three RGB channels, so as to generate an image matrix of the video image frame.
  • S1042 configure a convolution kernel corresponding to the video image frame according to the matrix size of the image matrix, and perform a convolution operation on the image matrix through the convolution kernel to obtain a standard matrix.
  • after the terminal device generates the image matrix of the video image frame, it can identify the matrix size of the image matrix, and determine the convolution kernel used for the convolution operation based on that size and the matrix size of the standard matrix; the image matrix is then convolved with the convolution kernel so that its size is adjusted to the standard size, and the standard matrix is obtained.
  • the standard matrix is imported into the face recognition algorithm of the face recognition plug-in, and the face image is output.
  • the standard matrix is imported into the face recognition plug-in, the standard matrix is analyzed by the face recognition algorithm in the face recognition plug-in, and the face image is output.
  • the standard matrix is obtained by preprocessing the video image frame, and the standard matrix is imported into the face recognition plug-in to output the face image, thereby improving both the processing efficiency of the face recognition plug-in on video image frames and the accuracy of face recognition.
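The preprocessing steps above (RGB fusion into an image matrix, then a size-adjusting convolution) can be sketched in Python. This is a minimal illustration under assumptions: channel fusion is shown as a simple per-pixel average, and the "convolution kernel configured from the matrix size" is realized as a block-averaging kernel whose size is the ratio between the input and standard sizes — the patent does not specify either choice.

```python
import numpy as np

def image_matrix(rgb_frame):
    """Fuse the three RGB channel planes of a video image frame into a
    single image matrix (illustrated here as a per-pixel average)."""
    r, g, b = rgb_frame[..., 0], rgb_frame[..., 1], rgb_frame[..., 2]
    return (r + g + b) / 3.0

def to_standard_matrix(matrix, standard_size):
    """Adjust the image matrix to the standard size by convolving with
    an averaging kernel whose dimensions are derived from the ratio of
    the matrix size to the standard matrix size (assumes divisibility)."""
    h, w = matrix.shape
    sh, sw = standard_size
    kh, kw = h // sh, w // sw   # kernel configured from the matrix size
    out = np.zeros((sh, sw))
    for i in range(sh):
        for j in range(sw):
            out[i, j] = matrix[i*kh:(i+1)*kh, j*kw:(j+1)*kw].mean()
    return out
```

The resulting standard matrix can then be fed to the face recognition algorithm regardless of the original frame resolution.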
  • FIG. 8 shows a specific implementation flowchart of a face image recognition method S102 provided by the seventh embodiment of the present application.
  • S102 in a face image recognition method provided in this embodiment includes: S1021 to S1022, and the details are as follows:
  • when the terminal device starts the video playback application and the video playback plug-in needs to be loaded, the installation location of the video playback plug-in must be determined in order to locate the corresponding plug-in file; therefore, the plug-in identifier can be queried in the playback plug-in addressing table.
  • the storage address associated with the plug-in identifier is the installation address of the video playback plug-in.
  • the plug-in file of the video playback plug-in is acquired from the installation address, and the plug-in file is run through the video playback application to load the video playback plug-in to the video playback application.
  • the terminal device extracts the plug-in file of the video playback plug-in from the installation address and runs the plug-in file through the video playback application; that is, it creates a new concurrent thread under the process corresponding to the video playback application and executes the plug-in file on that concurrent thread, so as to load the video playback plug-in into the video playback application.
  • the installation location of the video playback plug-in is determined through the playback plug-in addressing table, so that the video playback plug-in can be loaded into the video playback application, which improves the efficiency of the loading operation.
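The addressing-table lookup and concurrent-thread loading described above can be sketched as follows. The table contents, the plug-in identifier `facereader`, and the installation path are hypothetical examples, and running a plug-in file is abstracted into a callable.

```python
import threading

# Hypothetical playback plug-in addressing table: plug-in identifier
# mapped to the installation address of its plug-in file.
PLUGIN_ADDRESSING_TABLE = {
    "facereader": "/opt/vlc/plugins/facereader.so",
}

def load_plugin(plugin_id, run_plugin_file):
    """Query the plug-in's installation address in the addressing table,
    then execute its plug-in file on a new concurrent thread created
    under the video playback application's process."""
    install_address = PLUGIN_ADDRESSING_TABLE.get(plugin_id)
    if install_address is None:
        raise KeyError(f"plug-in {plugin_id!r} not found in addressing table")
    worker = threading.Thread(target=run_plugin_file, args=(install_address,))
    worker.start()
    return worker
```

An unknown identifier raises immediately, which corresponds to the "plug-in does not exist" branch in the main flow.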
  • FIG. 9 shows a structural block diagram of a facial image recognition device provided by an embodiment of the present application, and each unit included in the facial image recognition device is used to execute each step in the embodiment corresponding to FIG. 1.
  • only the parts related to this embodiment are shown.
  • the facial image recognition device includes:
  • the video play instruction receiving unit 91 is configured to receive a video play instruction; the video play instruction carries the plug-in identifier of the video play plug-in that needs to be called when the video file is played;
  • the video playback application starting unit 92 is configured to start a video playback application, and load the video playback plug-in to the video playback application based on the plug-in identifier;
  • the video image frame extraction unit 93 is configured to extract each video image frame of the video file through the video playback application after the plug-in is loaded if the plug-in ID matches the ID of the face recognition plug-in;
  • the face image recognition unit 94 is configured to call the face recognition plug-in to extract the face images contained in each of the video image frames;
  • the face image database establishment unit 95 is configured to establish a face image database of the video file according to the entity user corresponding to each of the face images.
  • the facial image recognition device further includes:
  • a plug-in data package acquisition unit, configured to acquire the plug-in data package of the face recognition plug-in;
  • the legal verification result receiving unit is configured to send a version verification request to the server and receive the legal verification result fed back by the server based on the version verification request; the version verification request includes the version identifier of the plug-in data package;
  • the calling statement file adding unit is used to query the installation location of the video playback application if the legal verification result is that verification succeeds, and to add the calling statement file in the plug-in data package to the installation location, so as to add the face recognition plug-in to the list of callable plug-ins of the video playback application.
  • the face image database establishment unit 95 includes:
  • a similarity calculation unit configured to calculate the similarity between the face images in any two video image frames
  • an association relationship establishment unit configured to, if the similarity is greater than a preset association threshold, identify the face images located in the two different video image frames as associated images, and establish an association relationship between the two face images;
  • the user face group dividing unit is configured to divide all the face images into multiple user face groups based on their association relationships, and to configure a user identifier for each of the user face groups; all face images in a user face group are associated images of one another;
  • the first face image database establishment unit is configured to establish the face image database according to the user face group and the user identifier.
  • the similarity calculation unit includes:
  • the feature coordinate marking unit is configured to mark the feature coordinates of each key feature of the face in the face image based on a preset list of key features of the face;
  • the feature coordinate sequence generating unit is configured to construct the feature coordinate sequence of the face image according to the feature coordinates of the key features of all faces in the face key feature list;
  • a feature distance value calculation unit configured to calculate feature distance values between the feature coordinate sequences of the face images of any two video image frames
  • an interval image frame number identification unit, configured to identify the number of interval image frames between any two video image frames;
  • a similarity conversion unit configured to import the characteristic distance value and the number of interval image frames into a preset similarity calculation model to obtain the similarity between the face images in the two video image frames;
  • the similarity calculation model is specifically:
  • Similarity is the similarity; ActFrame is the number of interval image frames; FigDist is the characteristic distance value; BaseDist is the reference distance value; BaseFrame is the shooting frame rate of the video file; StandardDist is the preset adjustment coefficient.
  • the face image database establishment unit 95 includes:
  • a reference expression recognition unit configured to determine the expression type of the face image, and recognize the expression type as a reference expression
  • a derivative image output unit configured to output a derivative image of the face image according to the expression conversion algorithm and the reference expression; the expression type of the derivative image is different from the expression type of the face image;
  • the second face image database establishment unit is configured to generate the face image database according to the face image and the derivative image.
  • the face image recognition unit 94 includes:
  • An image matrix generating unit configured to generate an image matrix of the video image frame based on the image data corresponding to the RGB channels in the video image frame;
  • a standard matrix generating unit configured to configure a convolution kernel corresponding to the video image frame according to the matrix size of the image matrix, and perform a convolution operation on the image matrix through the convolution kernel to obtain a standard matrix
  • the standard matrix importing unit is configured to import the standard matrix into the face recognition algorithm of the face recognition plug-in, and output the face image.
  • the video playback application starting unit 92 includes:
  • the installation address obtaining unit is configured to query the installation address of the video playback plug-in corresponding to the plug-in identifier according to a preset playback plug-in addressing table;
  • the video playback plug-in loading unit is configured to obtain the plug-in file of the video playback plug-in from the installation address, and run the plug-in file through the video playback application to load the video playback plug-in to the video playback application.
  • compared with existing face image recognition technology, the facial image recognition device provided by this embodiment of the present application does not require the user to manually intercept video image frames and hand them over to other applications for facial image recognition.
  • instead, the face recognition plug-in can be loaded in the video playback application, so that while the video file is playing, the face image contained in each video image frame is automatically recognized, which improves the recognition efficiency of face images and reduces the user's operations.
  • since the recognition process of the face images is synchronized with the playback of the video file, the user does not need to perform face image recognition after watching the video file, which reduces the time consumed by the face recognition operation.
  • FIG. 10 is a schematic diagram of a terminal device provided by another embodiment of the present application.
  • the terminal device 10 of this embodiment includes: a processor 100, a memory 101, and computer-readable instructions 102 stored in the memory 101 and executable on the processor 100, such as a face image recognition program.
  • the processor 100 executes the computer-readable instructions 102, the steps in the above-mentioned face image recognition method embodiments are implemented, for example, S101 to S105 shown in FIG. 1.
  • when the processor 100 executes the computer-readable instructions 102, the functions of the units in the foregoing device embodiments, for example, the functions of the modules 91 to 95 shown in FIG. 9, are realized.
  • the computer-readable instructions 102 may be divided into one or more units, and the one or more units are stored in the memory 101 and executed by the processor 100 to complete the present application.
  • the one or more units may be a series of computer-readable instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer-readable instructions 102 in the terminal device 10.
  • the computer-readable instructions 102 can be divided into a video playback instruction receiving unit, a video playback application startup unit, a video image frame extraction unit, a face image recognition unit, and a face image library establishment unit; the specific functions of each unit are as described above.
  • the terminal device 10 may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the terminal device may include, but is not limited to, a processor 100 and a memory 101.
  • FIG. 10 is only an example of the terminal device 10, and does not constitute a limitation on the terminal device 10. It may include more or fewer components than shown in the figure, or a combination of certain components, or different components.
  • the terminal device may also include input and output devices, network access devices, buses, and so on.
  • the so-called processor 100 may be a central processing unit (Central Processing Unit, CPU), another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the memory 101 may be an internal storage unit of the terminal device 10, such as a hard disk or a memory of the terminal device 10.
  • the memory 101 may also be an external storage device of the terminal device 10, such as a plug-in hard disk equipped on the terminal device 10, a smart memory card (Smart Media Card, SMC), and a Secure Digital (SD) Card, Flash Card, etc. Further, the memory 101 may also include both an internal storage unit of the terminal device 10 and an external storage device.
  • the memory 101 is used to store the computer-readable instructions and other programs and data required by the terminal device.
  • the memory 101 can also be used to temporarily store data that has been output or will be output.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium.
  • the computer-readable storage medium stores computer instructions, and when the computer instructions are run on a computer, the computer is caused to execute the steps of the method for recognizing the face image.


Abstract

The present application is applicable to the technical field of image processing, and provides a face image recognition method and apparatus. The method comprises: receiving a video playback instruction; starting a video playback application, and loading a video playback plugin into the video playback application on the basis of a plugin identifier; if the plugin identifier matches an identifier of a face recognition plugin, extracting each video image frame of a video file by means of the video playback application into which the plugin has been loaded; calling the face recognition plugin to extract a face image comprised in each video image frame; and establishing a face image library of the video file according to a physical user corresponding to each face image. In the present application, the face recognition plugin is loaded into the video playback application, such that the face image comprised in each video image frame is automatically recognized while the video file is being played, thereby improving the recognition efficiency of face images and reducing user operations.

Description

Face image recognition method and device
This application claims priority to the Chinese patent application No. 202010087125.8, filed on February 11, 2020 and entitled "A face image recognition method and device", the entire content of which is incorporated herein by reference.
Technical field
This application belongs to the field of image processing technology, and in particular relates to a face image recognition method and device.
Background art
With the continuous development of recognition technology, more and more recognition tasks can be performed automatically by computers; for example, an OCR algorithm is used to recognize the text in a picture, and a QR code recognition algorithm is used to parse a QR code image and extract the information it carries. In addition to the above recognition technologies, face recognition technology, which can automatically determine a user's identity, is being applied in ever wider fields, and how to perform face recognition efficiently and accurately has become a problem that urgently needs to be solved.
The inventor has realized that existing face recognition technology is mainly applied to static image recognition, while face recognition in video is difficult to achieve; in particular, for most video playback applications, the user needs to manually intercept the video image frames containing the target user and hand them over to the corresponding software for recognition, which increases the difficulty of face collection and lowers operating efficiency.
Technical problem
In view of this, the embodiments of the present application provide a face image recognition method and device to solve the problem that existing face image recognition technology can only perform face recognition on static images, requiring the user to manually intercept the video image frames containing the target user and hand them over to the corresponding software for recognition, which increases the difficulty of face collection and lowers operating efficiency.
Technical solutions
A first aspect of the embodiments of the present application provides a face image recognition method, including:
receiving a video play instruction, where the video play instruction carries the plug-in identifier of the video playback plug-in that needs to be called when a video file is played;
starting a video playback application, and loading the video playback plug-in into the video playback application based on the plug-in identifier;
if the plug-in identifier matches the identifier of a face recognition plug-in, extracting each video image frame of the video file through the video playback application after the plug-in is loaded;
calling the face recognition plug-in to extract the face images contained in each of the video image frames; and
establishing a face image library of the video file according to the entity user corresponding to each of the face images.
Beneficial effects
In the embodiments of the present application, a video playback application is started upon receiving a video play instruction initiated by a user, and the video playback plug-in indicated by the instruction is loaded so as to extend the functions of the video playback application. If the loaded video playback plug-in is detected to be a face recognition plug-in, the video playback application can parse the video file, extract each video image frame, and import each video image frame into the face recognition plug-in, so that the face image contained in each video image frame is recognized by the face recognition plug-in and a face image library associated with the video file is built from all the face images, thereby realizing face recognition on dynamic video files. Compared with existing face image recognition technology, this application does not require the user to manually intercept video image frames and hand them over to other applications for face image recognition; instead, by loading the face recognition plug-in in the video playback application, the face image contained in each video image frame is automatically recognized while the video file is playing, which improves the recognition efficiency of face images and reduces user operations. On the other hand, since the recognition process of the face images is synchronized with the playback of the video file, the user does not need to perform face image recognition after watching the video file, which reduces the time consumed by the face recognition operation.
Description of the drawings
FIG. 1 is an implementation flowchart of a face image recognition method provided by the first embodiment of the present application;
FIG. 2 is a schematic diagram of the playback of a video file provided by an embodiment of the present application;
FIG. 3 is a specific implementation flowchart of a face image recognition method provided by the second embodiment of the present application;
FIG. 4 is a specific implementation flowchart of S105 of a face image recognition method provided by the third embodiment of the present application;
FIG. 5 is a specific implementation flowchart of S1051 of a face image recognition method provided by the fourth embodiment of the present application;
FIG. 6 is a specific implementation flowchart of S105 of a face image recognition method provided by the fifth embodiment of the present application;
FIG. 7 is a specific implementation flowchart of S104 of a face image recognition method provided by the sixth embodiment of the present application;
FIG. 8 is a specific implementation flowchart of S102 of a face image recognition method provided by the seventh embodiment of the present application;
FIG. 9 is a structural block diagram of a face image recognition device provided by an embodiment of the present application;
FIG. 10 is a schematic diagram of a terminal device provided by another embodiment of the present application.
Embodiments of the present invention
In the embodiments of the present application, the execution subject of the process is a terminal device. The terminal device includes, but is not limited to: servers, computers, smart phones, tablet computers, and other devices capable of performing face image recognition tasks. FIG. 1 shows the implementation flowchart of the face image recognition method provided by the first embodiment of the present application, detailed as follows:
In S101, a video play instruction is received; the video play instruction carries the plug-in identifier of the video playback plug-in that needs to be called when the video file is played.
In this embodiment, the user can send a video play instruction to the terminal device. Specifically, the user can trigger the video play instruction locally on the terminal device through an interaction module configured on the terminal device, such as a keyboard, mouse, or touch screen; of course, the user can also generate the video play instruction on a local user terminal, establish a communication link between the user terminal and the terminal device, and send the video play instruction to the terminal device through the communication link. That is, the user terminal is equivalent to a remote control device that can control the terminal device to perform the video playback operation.
In this embodiment, when the user initiates a video playback operation, the user can select the video playback plug-ins to be called from the loadable plug-in list of the terminal device, choosing one or more video playback plug-ins from the list by clicking or ticking, and then press the play button after the selection is complete. At this time, the terminal device recognizes that the user has finished selecting, adds the plug-in identifiers of the selected video playback plug-ins to the video play instruction, and triggers the video playback operation. Optionally, the terminal device can be configured with a default configuration mode; that is, when performing a video playback operation, the terminal device can load one or more video playback plug-ins by default, without the user having to reselect plug-ins for every playback operation, thereby improving the user's operating efficiency. For example, if the default configuration mode of the terminal device is to load a frame rate optimization plug-in and the face recognition plug-in by default, then when the terminal device detects that the user has clicked the video playback button without selecting the video playback plug-ins to be loaded, the plug-in identifiers of the above two plug-ins are added to the video play instruction, and the video play instruction is generated.
The video playback plug-ins to be loaded in the default configuration mode can be set by the system by default, or configured manually by the user. Preferably, the terminal device can count the number of times each video playback plug-in is used; if it detects that the use count of a certain video playback plug-in is greater than a preset use threshold, it prompts the user whether to add that video playback plug-in to the default configuration mode, and if an agree-to-add instruction fed back by the user is received, the video playback plug-in whose use count is greater than the use threshold is added to the default configuration mode, so that it is loaded automatically in subsequent playback operations.
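The usage-count promotion logic described above can be sketched in Python. The threshold value and the identifier `facereader` are hypothetical, and the user prompt is reduced to a boolean return value for illustration.

```python
USE_THRESHOLD = 5   # hypothetical preset use threshold

class PluginUsageTracker:
    """Count plug-in usage and flag plug-ins whose use count exceeds
    the threshold as candidates for the default configuration mode."""

    def __init__(self):
        self.counts = {}
        self.default_plugins = set()

    def record_use(self, plugin_id):
        """Record one use; return True when the user should be prompted
        to add this plug-in to the default configuration mode."""
        self.counts[plugin_id] = self.counts.get(plugin_id, 0) + 1
        return (self.counts[plugin_id] > USE_THRESHOLD
                and plugin_id not in self.default_plugins)

    def add_to_defaults(self, plugin_id):
        """Called after the user agrees to the prompt."""
        self.default_plugins.add(plugin_id)
```

Once a plug-in is in `default_plugins`, further uses no longer trigger the prompt, matching the behavior of a one-time promotion.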
Specifically, the terminal device is installed with a VLC video playback application. The VLC video playback application is an application with a core framework that performs the video playback function, to which multiple video-playback-based plug-ins can be added according to user needs, such as a video optimization plug-in, a video recording plug-in, and the face recognition plug-in that needs to be called in this embodiment. When the user performs a playback operation, the video play instruction can not only specify the video file to be played but also carry the identifiers of the plug-ins that need to be loaded. For example, the video play instruction can be: vlc.exe --video-filter all,facereader test.mp4, where vlc.exe is the video playback application to be started, facereader is the plug-in identifier, and test.mp4 is the file identifier of the video file to be played.
In S102, a video playback application is started, and the video playback plug-in is loaded into the video playback application based on the plug-in identifier.
In this embodiment, after receiving the video play instruction, the terminal device can start the video playback application associated with the instruction. The terminal device also parses the video play instruction, extracts the plug-in identifiers corresponding to the video playback plug-ins, queries the video playback plug-in corresponding to each plug-in identifier, and loads the video playback plug-ins into the video playback application to extend its functions.
In this embodiment, the video playback application can be associated with a loadable plug-in list. Each video playback plug-in can store a startup declaration file in the installation address of the video playback application; when the video playback application starts, it detects all the startup declaration files stored in its installation location, thereby generating the loadable plug-in list of the video playback application. The terminal device detects whether the plug-in identifier in the video play instruction is in the loadable plug-in list; if so, it queries the installation address of the video playback plug-in, creates a new thread in the process of the video playback application, and runs the running file of the video playback plug-in on that thread so as to load the video playback plug-in into the video playback application. If it detects that the plug-in identifier is not in the loadable plug-in list, it outputs plug-in-does-not-exist information.
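Building the loadable plug-in list by scanning for startup declaration files can be sketched as follows. The `.plugin` file suffix is a hypothetical convention for the declaration files; the patent does not specify their naming.

```python
from pathlib import Path

def loadable_plugins(install_dir):
    """Scan the video playback application's installation directory for
    startup declaration files (assumed here to carry a hypothetical
    '.plugin' suffix) and return the loadable plug-in list."""
    return sorted(p.stem for p in Path(install_dir).glob("*.plugin"))
```

A play instruction's plug-in identifier can then simply be checked for membership in the returned list before loading, with the miss case mapped to the plug-in-does-not-exist branch.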
Optionally, in this embodiment, if the plug-in identifier carried by the video playback instruction is detected to be absent from the loadable-plug-in list, a plug-in download request may be generated from the identifier and sent to the server corresponding to the video playback application, so that the plug-in executable file matching the identifier can be downloaded from the server. After the download completes, the identifier is added to the loadable-plug-in list, and the video playback plug-in is loaded into the video playback application.
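The lookup-then-download flow described above can be sketched as follows; the names `loadable_plugins` and `request_download` are illustrative, not taken from the patent:

```python
def load_plugin(plugin_id, loadable_plugins, request_download):
    """Load a plug-in if its identifier is in the loadable list,
    otherwise request it from the application's server first."""
    if plugin_id in loadable_plugins:
        # In the patent, this step runs the plug-in's executable file
        # on a new thread inside the playback application's process.
        return f"loaded:{plugin_id}"
    # Fallback: ask the server for the plug-in, then register it.
    request_download(plugin_id)
    loadable_plugins.add(plugin_id)
    return f"loaded:{plugin_id}"
```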
In S103, if the plug-in identifier matches the identifier of the face recognition plug-in, each video image frame of the video file is extracted through the video playback application with the plug-in loaded.
In this embodiment, the terminal device parses the video playback instruction, determines the file identifier of the video file to be played, obtains the video file based on that identifier, imports the video file into the video playback application, and outputs it through the application. When outputting the video file, the application reads each video image frame it contains and, based on the frame sequence numbers, outputs the frames in order at a preset playback frame rate; for example, the playback frame rate may be 60 fps, i.e., 60 video image frames are output per second.
In this embodiment, since the video playback application may load multiple video playback plug-ins, the terminal device can create multiple parallel processing threads within the application's process and run the video processing operation of each plug-in on a different thread. Different plug-ins require different input data types: for a voice-enhancement plug-in, the input is an audio signal, while for a picture-brightening plug-in, the input is video image frames. Therefore, before playing the video file, the terminal device needs to determine which video playback plug-ins are currently loaded, extract the corresponding data from the video file according to the input data type each plug-in requires, and process that data on the corresponding concurrent thread. If the terminal device detects that the plug-ins to be loaded include the face recognition plug-in, that is, a plug-in identifier matches the identifier of the face recognition plug-in, it needs to obtain the input data of the face recognition plug-in. Since the face recognition plug-in recognizes individual video image frames, i.e., its input data type is video image frames, the frames can be extracted in order of frame sequence number while the application plays the video file and imported into the face recognition plug-in for the face recognition operation.
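Routing each plug-in the data type it consumes can be sketched as below; the plug-in names and the stream-keyed `video_file` structure are assumptions for illustration:

```python
# Illustrative mapping from plug-in to the input data type it consumes.
PLUGIN_INPUT_TYPE = {
    "voice_enhancement": "audio",
    "picture_brightening": "video_frames",
    "face_recognition": "video_frames",
}

def dispatch(loaded_plugins, video_file):
    """Pair each loaded plug-in with the data stream it requires.
    In the patent each resulting task would run on its own thread
    inside the playback application's process."""
    tasks = []
    for plugin in loaded_plugins:
        data_type = PLUGIN_INPUT_TYPE[plugin]
        tasks.append((plugin, video_file[data_type]))
    return tasks
```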
In S104, the face recognition plug-in is called to extract the face images contained in each of the video image frames.
In this embodiment, during playback the video playback application can output the video image frames to the graphics processing unit (GPU) for the display output flow while simultaneously importing them into the face recognition plug-in. The plug-in parses each input video image frame and extracts the face images it contains using a built-in face recognition algorithm. Since multiple subjects may have been filmed, a single video image frame may contain multiple face images.
Specifically, the face recognition plug-in may obtain face images as follows: slide built-in face templates of multiple sizes across the video image frame, and compute the matching degree between each framed region and the face template. If the matching degree between the two is detected to exceed a preset matching threshold, the currently framed region is deemed to contain a face and is identified as a face image.
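The sliding-window template match can be sketched on a grayscale frame represented as a nested list; the inverse mean-absolute-difference score below is a stand-in for whatever matching degree the plug-in actually computes:

```python
def find_faces(frame, template, threshold):
    """Slide `template` over `frame` and return top-left corners of
    windows whose match score exceeds `threshold`."""
    th, tw = len(template), len(template[0])
    hits = []
    for y in range(len(frame) - th + 1):
        for x in range(len(frame[0]) - tw + 1):
            # Mean absolute difference between window and template.
            diff = sum(
                abs(frame[y + i][x + j] - template[i][j])
                for i in range(th) for j in range(tw)
            ) / (th * tw)
            score = 1.0 / (1.0 + diff)  # higher means better match
            if score > threshold:
                hits.append((x, y))
    return hits
```

In practice the plug-in would repeat this with templates of several sizes, as the embodiment describes.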
Optionally, in this embodiment, the terminal device may perform face recognition first and execute the video playback operation only after determining the face images contained in the video image frames. In that case, the device can mark the recognized face images within the video image frames, for example by drawing a rectangular box around each face region, and output the marked frames so the user can quickly locate the face images in the video file. Refer to FIG. 2, which shows a schematic diagram of playing a video file according to an embodiment of this application.
In S105, a face image library of the video file is established according to the entity user corresponding to each of the face images.
In this embodiment, after obtaining the face images from each video image frame, the terminal device can determine the entity user to whom each face image belongs, classify the face images according to their entity users, and establish the face image library of the video file.
Specifically, in this embodiment, the terminal device can compute the similarity between any two face images: for example, convert each face image into a face feature vector, compute the Euclidean distance between the two feature vectors, and take the reciprocal of that distance as their similarity. Face images with high mutual similarity are identified as belonging to the same entity user, while face images belonging to different entity users are distinguished, thereby classifying the face images; the face region images belonging to one entity user are tagged with a single user identifier.
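The reciprocal-distance similarity described here can be sketched directly; the feature vectors are assumed to come from some upstream face encoder:

```python
import math

def similarity(vec_a, vec_b):
    """Reciprocal of the Euclidean distance between two face feature
    vectors, as in the embodiment; identical vectors yield infinity."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(vec_a, vec_b)))
    return math.inf if dist == 0 else 1.0 / dist
```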
As can be seen from the above, the face image recognition method provided by the embodiments of this application receives a user-initiated video playback instruction, starts the video playback application, and loads the video playback plug-in indicated by the instruction so as to extend the application's functionality. If the loaded video playback plug-in is detected to be the face recognition plug-in, the video playback application can parse the video file, extract each video image frame, and import each frame into the face recognition plug-in, which recognizes the face images contained in each video image frame; a face image library associated with the video file is then built from all the face images, thereby achieving face recognition on dynamic video files. Compared with existing face image recognition techniques, this application does not require the user to manually capture video image frames and hand them to other applications for face recognition. Instead, by loading a face recognition plug-in into the video playback application, the face images contained in each video image frame are recognized automatically while the video file is being played, which improves the efficiency of face image recognition and reduces user operations.
Furthermore, since the face image recognition proceeds in step with playback of the video file, the user does not need to run face recognition after watching the video, which reduces the time consumed by the face recognition operation.
FIG. 3 shows a flowchart of a specific implementation of a face image recognition method provided by the second embodiment of this application. Referring to FIG. 3, relative to the embodiment of FIG. 1, the method of this embodiment further includes, before the starting of the video playback application and the loading of the video playback plug-in into it based on the plug-in identifier, steps S301 to S303, detailed as follows:
Further, before the starting of the video playback application and the loading of the video playback plug-in into the video playback application based on the plug-in identifier, the method further includes:
In S301, a plug-in data package of the face recognition plug-in is acquired.
In this embodiment, the terminal device can obtain the plug-in data package of the face recognition plug-in from a removable storage device or via network download. Optionally, the terminal device may check the integrity of the obtained package, for example by extracting the package's CRC checksum to determine whether the package is complete.
Preferably, in this embodiment, the terminal device can run the plug-in data package, import a preset test image into the running process, and obtain the output. If the position of the face image marked in the output matches the preset standard coordinates, the package is identified as a complete data package and S302 is executed; otherwise, if the package cannot run or fails to recognize the face image contained in the test image, it is identified as an abnormal data package, and the plug-in data package of the face recognition plug-in is re-acquired.
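This test-image validation step can be sketched as follows; `run_plugin` and the coordinate tolerance are placeholders for the real plug-in invocation, which the patent does not specify:

```python
def validate_package(run_plugin, test_image, standard_coords, tol=2):
    """Run the plug-in package on a known test image and accept it only
    if the reported face position matches the preset standard coordinates."""
    try:
        x, y = run_plugin(test_image)  # may raise if the package is broken
    except Exception:
        return False  # cannot run at all: abnormal package
    return (abs(x - standard_coords[0]) <= tol
            and abs(y - standard_coords[1]) <= tol)
```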
In S302, a version verification request is sent to a server, and a validity verification result fed back by the server based on the request is received; the version verification request contains the version identifier of the plug-in data package.
In this embodiment, the terminal device can extract the version identifier carried in the plug-in data package, generate a version verification request carrying that identifier, and send the request to the server corresponding to the video playback application. Because the face recognition plug-in is to be loaded into the video playback application, the application and the plug-in must be compatible. If the server detects that the face recognition plug-in and the video playback application are mutually compatible, it can return a verification-success result to the terminal device; conversely, if the server detects that the plug-in is incompatible with the video playback application, it returns a verification-failure result.
Optionally, if the validity verification result indicates failure, the terminal device can extract a download link carried in the result and re-acquire the plug-in data package of the face recognition plug-in through that link. When the server detects that the current face recognition plug-in is incompatible with the video playback application, it can provide a download link to a compatible face recognition plug-in, so that the terminal device can obtain a valid face recognition plug-in through the link.
In S303, if the validity verification result indicates success, the installation location of the video playback application is queried, and the call declaration file in the plug-in data package is added to the file directory associated with that installation location, so as to add the face recognition plug-in to the callable-plug-in list of the video playback application.
In this embodiment, if the terminal device determines that the plug-in data package is valid, i.e., the validity verification succeeded, it extracts the call declaration file contained in the package, queries the installation location of the video playback application, and adds the extracted call declaration file to the file directory corresponding to that location. Since the video playback application detects, during startup, the declaration files contained in the directory at its installation location and generates the callable-plug-in list from them, after these operations the application can call the face recognition plug-in in subsequent playback operations.
For example, the video playback application may be the VLC media player, whose plug-in location under its installation path is //plugins; the call declaration file of the face recognition plug-in is added to the directory corresponding to that location, i.e., the libfacereader_plugin.dll file of the FaceReader plug-in is placed in the plugins directory under the VLC player's installation directory.
In this embodiment of the application, validity checking of the plug-in data package ensures that the face recognition plug-in and the video playback application are mutually compatible, and adding the face recognition plug-in to an existing video playback application improves the convenience of the face recognition operation.
FIG. 4 shows a flowchart of a specific implementation of S105 of a face image recognition method provided by the third embodiment of this application. Referring to FIG. 4, relative to the embodiment of FIG. 1, S105 of this embodiment includes S1051 to S1054, detailed as follows:
Further, the establishing of the face image library of the video file according to the entity user corresponding to each of the face images includes:
In S1051, the similarity between the face images in any two of the video image frames is calculated.
In this embodiment, after extracting the face images contained in each video image frame, the terminal device can perform the similarity calculation on face images selected from any two video image frames. For example, suppose a first video image frame contains face images A, B, and C, and a second video image frame contains face images A', B', and C'; the terminal device can extract face image A from the first frame and face image C' from the second frame, and compute the similarity between face image A and face image C'. For the similarity calculation, refer to the description of S105: convert each face image into a face feature vector, compute the Euclidean distance between the feature vectors from the two video image frames, and take the reciprocal of that distance as the similarity between the two face images.
Preferably, when computing similarity, the terminal device selects face images from adjacent video image frames for comparison, calculates the differences between the center coordinates of the face images, identifies as target images those face images whose center-coordinate difference is smaller than a preset distance threshold, and computes the similarity only between target images. Because the two video image frames are adjacent, a face moves only a short distance between them; selecting the face images with small movement as target images and computing similarity only between them avoids a large number of useless face-similarity calculations and speeds up construction of the face library.
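The center-distance prefilter over adjacent frames can be sketched like this; each face is reduced to its center coordinate, and indices of the surviving pairs are returned for the later similarity pass:

```python
def candidate_pairs(faces_a, faces_b, dist_threshold):
    """Pair faces from two adjacent frames whose centers moved less
    than `dist_threshold`, skipping similarity work for the rest."""
    pairs = []
    for i, (xa, ya) in enumerate(faces_a):
        for j, (xb, yb) in enumerate(faces_b):
            # Squared center distance avoids a sqrt in the hot loop.
            if (xa - xb) ** 2 + (ya - yb) ** 2 < dist_threshold ** 2:
                pairs.append((i, j))
    return pairs
```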
In S1052, if the similarity is greater than a preset association threshold, the face images located in the two different video image frames are identified as associated images, and an association relationship between the two face images is established.
In this embodiment, the terminal device is configured with an association threshold. If the similarity between two face images is detected to be less than the preset association threshold, the two face images are judged to belong to different entity users, and no association relationship between them needs to be established. Conversely, if the similarity is detected to be greater than the preset association threshold, the two face images are judged to belong to the same entity user; they are identified as associated images of each other, and an association relationship between them is established so that all face images belonging to the same entity user can be determined from these relationships.
In S1053, based on the association relationships among all the face images, the face images are divided into multiple user face groups, and a user identifier is configured for each of the user face groups; all face images within a user face group are associated images of one another.
In this embodiment, after determining the association relationships among all face images, the terminal device can partition the face images according to those relationships, placing all mutually associated images into one user face group. Since two associated face images belong to the same entity user, this achieves grouping of the face images by entity user. A user identifier is then configured for each resulting face group; the identifier may be a user sequence number, which can be determined by the order of appearance of the earliest face image in each user face group.
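Partitioning mutually associated images into user face groups is a connected-components problem; a union-find sketch (face identifiers here are simply indices 0..n-1, an illustrative choice):

```python
def group_faces(n_faces, associations):
    """Split faces 0..n_faces-1 into groups of mutually associated images.
    `associations` lists (i, j) pairs whose similarity exceeded the
    association threshold. Group numbers follow first appearance, which
    mirrors numbering by the earliest face image in each group."""
    parent = list(range(n_faces))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in associations:
        parent[find(i)] = find(j)

    group_ids, labels = {}, []
    for i in range(n_faces):
        labels.append(group_ids.setdefault(find(i), len(group_ids)))
    return labels
```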
Optionally, in this embodiment, the terminal device may store standard face images of candidate users. The terminal device can match the face images in each user face group against the standard face images, identify from the matching result the user face group associated with a candidate user, and use that candidate user's identifier as the identifier of the user face group.
In S1054, the face image library is established according to the user face groups and the user identifiers.
In this embodiment, after identifying all user face groups and configuring a user identifier for each of them, the terminal device can build the face image library of the video file.
In this embodiment of the application, associated images are identified by computing the similarity between face images; based on the association relationships among them, the images are divided into user face groups belonging to the same entity users, which classifies the face images and improves the efficiency of face image management.
FIG. 5 shows a flowchart of a specific implementation of S1051 of a face image recognition method provided by the fourth embodiment of this application. Referring to FIG. 5, relative to the embodiment of FIG. 4, S1051 of this embodiment includes S501 to S505, detailed as follows:
Further, the calculating of the similarity between the face images in any two of the video image frames includes:
In S501, based on a preset face key feature list, the feature coordinates of each face key feature in the face image are marked.
In this embodiment, the terminal device can configure the face key feature list according to the facial features to be located. For example, the list can contain the four facial features of eyes, ears, mouth, and nose, and can also include eyebrows, forehead, and so on; the specific facial features included can be configured according to the required recognition accuracy. Based on the face key feature list, the terminal device can mark each face key feature in the face image and obtain the feature coordinates from the coordinates of each key feature within the face image.
In S502, a feature coordinate sequence of the face image is constructed according to the feature coordinates of all the face key features in the face key feature list.
In this embodiment, the terminal device can be configured with a sequence template that specifies the position associated with each face key feature within the template. The feature coordinates of the key features in the face image are filled into their associated positions in the template in order, thereby generating the feature coordinate sequence corresponding to the face image.
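Filling the sequence template can be sketched as follows; the feature names and template order are examples, not mandated by the patent:

```python
# Example sequence template: a fixed ordering of face key features.
SEQUENCE_TEMPLATE = ["left_eye", "right_eye", "nose", "mouth"]

def feature_sequence(marked_features):
    """Flatten the marked feature coordinates into one sequence,
    following the template order so that sequences from different
    face images are directly comparable."""
    seq = []
    for name in SEQUENCE_TEMPLATE:
        x, y = marked_features[name]
        seq.extend([x, y])
    return seq
```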
In S503, the feature distance value between the feature coordinate sequences of the face images of any two of the video image frames is calculated.
In this embodiment, after determining the feature coordinate sequences of the face images, the terminal device can compute the feature distance value between two feature coordinate sequences using a coordinate distance formula such as the Euclidean distance formula.
In S504, the number of interval image frames between any two of the video image frames is identified.
In this embodiment, the terminal device can also identify the frame sequence numbers of the two video image frames and determine the number of interval image frames between them from the difference of the sequence numbers. For example, if one video image frame has sequence number 65 and the other has sequence number 68, the number of interval image frames is 68 − 65 = 3.
In S505, the feature distance value and the number of interval image frames are imported into a preset similarity calculation model to obtain the similarity between the face images in the two video image frames. The similarity calculation model is specifically:
(similarity calculation formula, published in the original as image PCTCN2020105861-appb-000001)
where Similarity is the similarity; ActFrame is the number of interval image frames; FigDist is the feature distance value; BaseDist is a reference distance value; BaseFrame is the shooting frame rate of the video file; and StandardDist is a preset adjustment coefficient.
In this embodiment, because a larger number of interval image frames between two video image frames allows a longer face movement distance for the same entity user, the similarity between two face images can be normalized by the number of interval image frames, reducing the influence of the frame-count difference on the similarity calculation. The degree of difference between the feature distances is then judged from the difference between the feature distance value and the reference distance value, so that the similarity between the two face images is computed from the face feature coordinates.
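The exact formula is published only as an image (PCTCN2020105861-appb-000001), so the sketch below is one plausible reading of the surrounding description: normalize the feature distance by the interval frame count against the shooting frame rate, then compare the result with the reference distance. The functional form and the role of StandardDist as a smoothing/adjustment term are assumptions, not the patent's actual formula:

```python
def similarity_model(fig_dist, act_frame, base_dist, base_frame, standard_dist):
    """Hypothetical similarity model consistent with the description:
    more elapsed frames tolerate a larger feature distance."""
    # Normalize the feature displacement to a per-frame-rate scale
    # (assumption: this is the normalization the text refers to).
    per_frame_dist = fig_dist * base_frame / max(act_frame, 1)
    # Similarity decays as the normalized distance departs from the
    # reference distance; standard_dist acts as an adjustment coefficient.
    return standard_dist / (standard_dist + abs(per_frame_dist - base_dist))
```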
In this embodiment of the application, the face feature coordinates are identified and a face feature sequence is constructed; the similarity between face images is then computed from the number of interval image frames between the two video image frames and the feature distance value between the face feature sequences, which improves the accuracy of the similarity calculation.
FIG. 6 shows a flowchart of a specific implementation of S105 of a face image recognition method provided by the fifth embodiment of this application. Referring to FIG. 6, relative to any of the embodiments of FIGS. 1 to 5, S105 of this embodiment includes S601 to S603, detailed as follows:
Further, the establishing of the face image library of the video file according to the entity user corresponding to each of the face images includes:
In S601, the expression type of the face image is determined, and that expression type is identified as a reference expression.
In this embodiment, the terminal device can determine, through an expression recognition algorithm, the expression type shown in the face image at the moment of capture. For example, the expression type may be smiling, laughing, crying, sad, and so on; the expression type of the face image is identified as the reference expression.
In S602, a derivative image of the face image is output according to an expression conversion algorithm and the reference expression; the expression type of the derivative image is different from the expression type of the face image.
In this embodiment, the terminal device can adjust the expression conversion algorithm according to the reference expression, determining how to convert from the reference expression to other expressions, for example from a smiling expression to a crying expression, or from a smiling expression to a laughing expression. The terminal device can import the face image into the expression conversion algorithm adjusted for the reference expression and output derivative images corresponding to different expression types.
In S603, the face image library is generated according to the face image and the derivative images.
In this embodiment, after outputting the face images and the derivative images with different expressions obtained from those face images, the terminal device can establish the face image library, thereby expanding the library's content.
In this embodiment of the application, multiple derivative images with different expression types are obtained from a face image, which improves the richness of the face image library.
图7示出了本申请第六实施例提供的一种人脸图像的识别方法S104的具体实现流程图。参见图7,相对于图1至图5任一所述实施例,本实施例提供的一种人脸图像的识别方法中S104包括:S1041~S1043,具体详述如下:FIG. 7 shows a specific implementation flowchart of a face image recognition method S104 provided by the sixth embodiment of the present application. Referring to Fig. 7, with respect to any one of the embodiments described in Figs. 1 to 5, S104 in a face image recognition method provided in this embodiment includes: S1041 to S1043, which are detailed as follows:
进一步地,所述调用所述人脸识别插件提取各个所述视频图像帧包含的人脸图像,包括:Further, the invoking the face recognition plug-in to extract the face images contained in each of the video image frames includes:
在S1041中,基于视频图像帧中的RGB通道对应的图像数据,生成所述视频图像帧的图像矩阵。In S1041, based on the image data corresponding to the RGB channels in the video image frame, an image matrix of the video image frame is generated.
在本实施例中,终端设备在获取了视频图像帧后,在导入到人脸识别插件之前,可以对视频图像帧进行预处理,具体的操作为,根据视频图像帧在RGB三个通道上的图像数据进行数据融合,生成关于视频图像帧的图像矩阵。In this embodiment, after acquiring the video image frame and before importing it into the face recognition plug-in, the terminal device can preprocess the video image frame. Specifically, the image data of the video image frame on the three RGB channels is fused to generate an image matrix of the video image frame.
在S1042中,根据所述图像矩阵的矩阵尺寸,配置所述视频图像帧对应的卷积核,并通过所述卷积核对所述图像矩阵进行卷积操作,得到标准矩阵。In S1042, configure a convolution kernel corresponding to the video image frame according to the matrix size of the image matrix, and perform a convolution operation on the image matrix through the convolution kernel to obtain a standard matrix.
在本实施例中,终端设备在生成了视频图像帧的图像矩阵后,可以识别该图像矩阵的矩阵尺寸,并基于矩阵尺寸以及标准矩阵的矩阵尺寸,确定卷积操作所使用的卷积核,并通过该卷积核对图像矩阵进行卷积操作,从而将图像矩阵的尺寸调整为标准尺寸,得到标准矩阵。In this embodiment, after the terminal device generates the image matrix of the video image frame, it can identify the matrix size of the image matrix, determine the convolution kernel used for the convolution operation based on that size and the size of the standard matrix, and perform a convolution operation on the image matrix through the convolution kernel, thereby adjusting the image matrix to the standard size and obtaining the standard matrix.
在S1043中,将所述标准矩阵导入所述人脸识别插件的人脸识别算法,输出所述人脸图像。In S1043, the standard matrix is imported into the face recognition algorithm of the face recognition plug-in, and the face image is output.
在本实施例中,将标准矩阵导入到人脸识别插件,通过人脸识别插件内的人脸识别算法对标准矩阵进行解析,输出人脸图像。In this embodiment, the standard matrix is imported into the face recognition plug-in, the standard matrix is analyzed by the face recognition algorithm in the face recognition plug-in, and the face image is output.
在本申请实施例中,通过对视频图像帧进行预处理,得到标准矩阵,将标准矩阵导入到人脸识别插件,输出人脸图像,从而提高了人脸识别插件对于视频图像帧的处理效率以及人脸识别的准确率。In the embodiment of the present application, the standard matrix is obtained by preprocessing the video image frame, the standard matrix is imported into the face recognition plug-in, and the face image is output, thereby improving the processing efficiency of the face recognition plug-in on video image frames and the accuracy of face recognition.
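A minimal sketch of the S1041 to S1043 preprocessing above, under two stated assumptions that the application does not fix: channel fusion is a per-pixel average over R, G, B, and the convolution kernel is an averaging kernel whose size is derived so that a "valid" convolution shrinks the image matrix exactly to the standard size:

```python
import numpy as np

def fuse_rgb_channels(rgb_frame):
    """S1041: fuse the R, G, B channel data of a frame of shape (H, W, 3)
    into one image matrix; a per-pixel channel average is one plausible
    fusion rule (assumption, not specified by the application)."""
    return rgb_frame.mean(axis=2)

def to_standard_matrix(image_matrix, standard_shape):
    """S1042: configure the convolution kernel from the matrix size so that
    a 'valid' convolution reduces the image matrix to the standard size:
    output_dim = input_dim - kernel_dim + 1 along each axis."""
    h, w = image_matrix.shape
    sh, sw = standard_shape
    kh, kw = h - sh + 1, w - sw + 1              # kernel size from both sizes
    kernel = np.full((kh, kw), 1.0 / (kh * kw))  # averaging kernel (assumed)
    out = np.empty((sh, sw))
    for i in range(sh):
        for j in range(sw):
            out[i, j] = (image_matrix[i:i + kh, j:j + kw] * kernel).sum()
    return out
```

The resulting standard matrix would then be fed to the face recognition algorithm (S1043); the recognition step itself is not sketched here.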
图8示出了本申请第七实施例提供的一种人脸图像的识别方法S102的具体实现流程图。参见图8,相对于图1至图5任一所述实施例,本实施例提供的一种人脸图像的识别方法中S102包括:S1021~S1022,具体详述如下:FIG. 8 shows a specific implementation flowchart of a face image recognition method S102 provided by the seventh embodiment of the present application. Referring to Fig. 8, with respect to any one of the embodiments described in Figs. 1 to 5, S102 in a face image recognition method provided in this embodiment includes: S1021 to S1022, and the details are as follows:
在S1021中,根据预设的播放插件寻址表,查询所述插件标识对应的所述视频播放插件的安装地址。In S1021, query the installation address of the video playback plug-in corresponding to the plug-in identifier according to a preset playback plug-in addressing table.
在本实施例中,终端设备启动了视频播放应用,需要加载视频播放插件时,需要确定视频播放插件的安装位置,以运行对应的插件文件。因此,可以根据播放插件寻址表,查询插件标识关联的存储地址,该存储地址即为视频播放插件的安装地址。In this embodiment, when the terminal device has started the video playback application and needs to load the video playback plug-in, it needs to determine the installation location of the video playback plug-in in order to run the corresponding plug-in file. Therefore, the storage address associated with the plug-in identifier can be queried according to the playback plug-in addressing table, and this storage address is the installation address of the video playback plug-in.
在S1022中,从所述安装地址获取所述视频播放插件的插件文件,通过所述视频播放应用运行所述插件文件,以将所述视频播放插件加载至所述视频播放应用。In S1022, the plug-in file of the video playback plug-in is acquired from the installation address, and the plug-in file is run through the video playback application to load the video playback plug-in to the video playback application.
在本实施例中,终端设备从安装地址提取视频播放插件的插件文件,通过视频播放应用运行插件文件,即在视频播放应用对应的进程下创建新的并发线程,通过并发线程执行插件文件,实现将视频播放插件加载至视频播放应用。In this embodiment, the terminal device extracts the plug-in file of the video playback plug-in from the installation address and runs the plug-in file through the video playback application, that is, creates a new concurrent thread under the process corresponding to the video playback application and executes the plug-in file through the concurrent thread, thereby loading the video playback plug-in into the video playback application.
在本申请实施例中,通过播放插件寻址表确定视频播放插件的安装地址,实现将视频播放插件加载至视频播放应用,提高了加载操作的效率。In the embodiment of the present application, the installation address of the video playback plug-in is determined through the playback plug-in addressing table, so that the video playback plug-in can be loaded into the video playback application, which improves the efficiency of the loading operation.
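The S1021 to S1022 flow can be sketched as follows. The addressing-table contents, the example path, and `run_plugin_file` are illustrative stand-ins; the sketch only shows resolving the plug-in identifier to an installation address and executing the plug-in file on a new concurrent thread:

```python
import threading

# Hypothetical preset playback plug-in addressing table: plug-in identifier
# mapped to the installation address of the plug-in file (example path).
PLUGIN_ADDRESS_TABLE = {
    "face_recognition": "/plugins/face_recognition/plugin.so",
}

def lookup_install_address(plugin_id):
    # S1021: query the preset addressing table for the installation address.
    return PLUGIN_ADDRESS_TABLE[plugin_id]

def load_plugin(plugin_id, run_plugin_file):
    # S1022: fetch the plug-in file from the installation address and execute
    # it on a new concurrent thread under the player's process.
    address = lookup_install_address(plugin_id)
    t = threading.Thread(target=run_plugin_file, args=(address,))
    t.start()
    t.join()
```

Running the plug-in file on a separate thread keeps the playback application's main thread free, which matches the concurrent-thread loading described above.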
应理解,上述实施例中各步骤的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。It should be understood that the size of the sequence number of each step in the foregoing embodiment does not mean the order of execution. The execution sequence of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiment of the present application.
图9示出了本申请一实施例提供的一种人脸图像的识别设备的结构框图,该人脸图像的识别设备包括的各单元用于执行图1对应的实施例中的各步骤。具体请参阅图1与图1所对应的实施例中的相关描述。为了便于说明,仅示出了与本实施例相关的部分。FIG. 9 shows a structural block diagram of a face image recognition device provided by an embodiment of the present application, and each unit included in the face image recognition device is used to execute each step in the embodiment corresponding to FIG. 1. For details, please refer to FIG. 1 and the related description in the embodiment corresponding to FIG. 1. For ease of description, only the parts related to this embodiment are shown.
参见图9,所述人脸图像的识别设备包括:Referring to Figure 9, the facial image recognition device includes:
视频播放指令接收单元91,用于接收视频播放指令;所述视频播放指令携带有播放视频文件时所需调用的视频播放插件的插件标识;The video play instruction receiving unit 91 is configured to receive a video play instruction; the video play instruction carries the plug-in identifier of the video play plug-in that needs to be called when the video file is played;
视频播放应用启动单元92,用于启动视频播放应用,并基于所述插件标识加载所述视频播放插件至所述视频播放应用;The video playback application starting unit 92 is configured to start a video playback application, and load the video playback plug-in to the video playback application based on the plug-in identifier;
视频图像帧提取单元93,用于若所述插件标识与人脸识别插件的标识匹配,则通过加载插件后的所述视频播放应用提取所述视频文件的各个视频图像帧;The video image frame extraction unit 93 is configured to extract each video image frame of the video file through the video playback application after the plug-in is loaded if the plug-in ID matches the ID of the face recognition plug-in;
人脸图像识别单元94,用于调用所述人脸识别插件提取各个所述视频图像帧包含的人脸图像;The face image recognition unit 94 is configured to call the face recognition plug-in to extract the face images contained in each of the video image frames;
人脸图像库建立单元95,用于根据各个所述人脸图像对应的实体用户,建立所述视频文件的人脸图像库。The face image database establishment unit 95 is configured to establish a face image database of the video file according to the entity user corresponding to each of the face images.
可选地,所述人脸图像的识别设备还包括:Optionally, the facial image recognition device further includes:
插件数据包获取单元,用于获取所述人脸识别插件的插件数据包;A plug-in data package acquisition unit for acquiring the plug-in data package of the face recognition plug-in;
合法校验结果接收单元,用于向服务器发送版本校验请求,并接收所述服务器基于所述版本校验请求反馈的合法校验结果;所述版本校验请求包含所述插件数据包的版本标识;The legal verification result receiving unit is configured to send a version verification request to the server, and receive the legal verification result fed back by the server based on the version verification request; the version verification request includes the version identifier of the plug-in data package;
调用声明文件添加单元,用于若所述合法校验结果为校验成功,则查询所述视频播放应用的安装位置,并将所述插件数据包内的调用声明文件添加到所述安装位置关联的文件目录内,以添加所述人脸识别插件至所述视频播放应用的可调用插件列表。The call declaration file adding unit is configured to, if the legal verification result is a successful verification, query the installation location of the video playback application, and add the call declaration file in the plug-in data package to the file directory associated with the installation location, so as to add the face recognition plug-in to the callable plug-in list of the video playback application.
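A hedged sketch of this installation flow: ask the server to verify the package version, and only on success register the call declaration file in the application's callable plug-in list. `check_version` stands in for the server round trip, and the file directory is modeled as a plain list, since neither the protocol nor the directory layout is specified here:

```python
# Sketch of the plug-in installation flow (version check, then registration).
# check_version is a hypothetical stand-in for the server-side legality check.

def install_plugin(package, plugin_list, check_version):
    # Send the version identifier to the server and act on the result.
    result = check_version(package["version_id"])
    if result != "ok":
        return False
    # Add the call declaration file to the directory associated with the
    # installation location, registering the plug-in as callable.
    plugin_list.append(package["declaration_file"])
    return True
```

A usage example: a package whose version the server accepts is registered, while a rejected version leaves the plug-in list untouched.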
可选地,所述人脸图像库建立单元95包括:Optionally, the face image database establishment unit 95 includes:
相似度计算单元,用于计算任意两个所述视频图像帧内的所述人脸图像之间的相似度;A similarity calculation unit, configured to calculate the similarity between the face images in any two video image frames;
关联关系建立单元,用于若所述相似度大于预设的关联阈值,则识别位于两个不同的所述视频图像帧内的所述人脸图像为关联图像,建立两个所述人脸图像之间的关联关系;The association relationship establishment unit is configured to, if the similarity is greater than a preset association threshold, identify the face images located in two different video image frames as associated images, and establish an association relationship between the two face images;
用户人脸组划分单元,用于基于所有所述人脸图像的所述关联关系,划分为多个用户人脸组,并为各个所述用户人脸组配置用户标识;所述用户人脸组内的所有人脸图像互为所述关联图像;The user face group dividing unit is configured to divide all the face images into multiple user face groups based on the association relationships, and configure a user identifier for each of the user face groups; all the face images within a user face group are the associated images of one another;
第一人脸图像库建立单元,用于根据所述用户人脸组以及所述用户标识,建立所述人脸图像库。The first face image database establishment unit is configured to establish the face image database according to the user face group and the user identifier.
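The grouping step above amounts to computing connected components over the association relationships: any two face images whose similarity exceeds the threshold are linked, and linked images end up in the same user face group. One common way to sketch this is union-find; the similarity function and the `user_N` identifiers here are stand-ins:

```python
# Sketch: merge associated face images into user face groups via union-find.
# similarity(a, b) is a stand-in for the similarity calculation described
# in the text; user identifiers are illustrative.

def group_faces(face_ids, similarity, threshold):
    parent = {f: f for f in face_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    # Establish an association for every pair above the threshold.
    for i, a in enumerate(face_ids):
        for b in face_ids[i + 1:]:
            if similarity(a, b) > threshold:
                parent[find(a)] = find(b)

    # Collect the groups and configure a user identifier for each.
    groups = {}
    for f in face_ids:
        groups.setdefault(find(f), []).append(f)
    return {f"user_{n}": members
            for n, members in enumerate(groups.values())}
```

Within each returned group, every face image is (directly or transitively) an associated image of every other, matching the description of the user face groups.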
可选地,所述相似度计算单元包括:Optionally, the similarity calculation unit includes:
特征坐标标记单元,用于基于预设的人脸关键特征列表,标记出所述人脸图像中关于各个人脸关键特征的特征坐标;The feature coordinate marking unit is configured to mark the feature coordinates of each key feature of the face in the face image based on a preset list of key features of the face;
特征坐标序列生成单元,用于根据所述人脸关键特征列表内的所有人脸关键特征的特征坐标,构建所述人脸图像的特征坐标序列;The feature coordinate sequence generating unit is configured to construct the feature coordinate sequence of the face image according to the feature coordinates of the key features of all faces in the face key feature list;
特征距离值计算单元,用于计算任意两个所述视频图像帧的所述人脸图像的特征坐标序列之间的特征距离值;A feature distance value calculation unit, configured to calculate feature distance values between the feature coordinate sequences of the face images of any two video image frames;
间隔图像帧数识别单元,用于识别任意两个所述视频图像帧之间的间隔图像帧数;Interval image frame number identification unit for identifying the number of interval image frames between any two video image frames;
相似度转换单元,用于将所述特征距离值以及所述间隔图像帧数导入预设的相似度计算模型,得到两个所述视频图像帧内的所述人脸图像之间的相似度;所述相似度计算模型具体为:A similarity conversion unit, configured to import the characteristic distance value and the number of interval image frames into a preset similarity calculation model to obtain the similarity between the face images in the two video image frames; The similarity calculation model is specifically:
Figure PCTCN2020105861-appb-000002
其中,Similarity为所述相似度;ActFrame为所述间隔图像帧数;FigDist为所述特征距离值;BaseDist为基准距离值;BaseFrame为所述视频文件的拍摄帧率;StandardDist为预设调整系数。Wherein, Similarity is the similarity; ActFrame is the number of interval image frames; FigDist is the characteristic distance value; BaseDist is the reference distance value; BaseFrame is the shooting frame rate of the video file; StandardDist is the preset adjustment coefficient.
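The exact similarity formula is given only as a figure, so it is not reproduced here. The feature-distance input to that formula can, however, be sketched: each face image is reduced to a feature coordinate sequence over a fixed key-feature list, and the distance between two sequences is computed pointwise. The key-feature names and the mean-Euclidean-distance choice below are assumptions for illustration:

```python
import math

# Sketch of the feature coordinate sequence and feature distance value.
# KEY_FEATURES and the mean Euclidean distance are illustrative assumptions;
# the application does not fix the feature list or the distance measure.

KEY_FEATURES = ["left_eye", "right_eye", "nose_tip",
                "mouth_left", "mouth_right"]

def feature_sequence(landmarks):
    # Build the feature coordinate sequence in the fixed key-feature order.
    return [landmarks[name] for name in KEY_FEATURES]

def feature_distance(seq_a, seq_b):
    # Mean Euclidean distance over corresponding key-feature coordinates.
    dists = [math.dist(p, q) for p, q in zip(seq_a, seq_b)]
    return sum(dists) / len(dists)
```

This FigDist-style value, together with the interval frame count, would then feed the preset similarity calculation model shown in the figure.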
可选地,所述人脸图像库建立单元95包括:Optionally, the face image database establishment unit 95 includes:
基准表情识别单元,用于确定所述人脸图像的表情类型,并识别所述表情类型为基准表情;A reference expression recognition unit, configured to determine the expression type of the face image, and recognize the expression type as a reference expression;
衍生图像输出单元,用于根据表情转换算法以及所述基准表情,输出所述人脸图像的衍生图像;所述衍生图像的表情类型与所述人脸图像的表情类型不同;A derivative image output unit, configured to output a derivative image of the face image according to the expression conversion algorithm and the reference expression; the expression type of the derivative image is different from the expression type of the face image;
第二人脸图像库建立单元,用于根据所述人脸图像以及所述衍生图像,生成所述人脸图像库。The second face image database establishment unit is configured to generate the face image database according to the face image and the derivative image.
可选地,所述人脸图像识别单元94包括:Optionally, the face image recognition unit 94 includes:
图像矩阵生成单元,用于基于视频图像帧中的RGB通道对应的图像数据,生成所述视频图像帧的图像矩阵;An image matrix generating unit, configured to generate an image matrix of the video image frame based on the image data corresponding to the RGB channels in the video image frame;
标准矩阵生成单元,用于根据所述图像矩阵的矩阵尺寸,配置所述视频图像帧对应的卷积核,并通过所述卷积核对所述图像矩阵进行卷积操作,得到标准矩阵;A standard matrix generating unit, configured to configure a convolution kernel corresponding to the video image frame according to the matrix size of the image matrix, and perform a convolution operation on the image matrix through the convolution kernel to obtain a standard matrix;
标准矩阵导入单元,用于将所述标准矩阵导入所述人脸识别插件的人脸识别算法,输出所述人脸图像。The standard matrix importing unit is configured to import the standard matrix into the face recognition algorithm of the face recognition plug-in, and output the face image.
可选地,所述视频播放应用启动单元92包括:Optionally, the video playback application starting unit 92 includes:
安装地址获取单元,用于根据预设的播放插件寻址表,查询所述插件标识对应的所述视频播放插件的安装地址;The installation address obtaining unit is configured to query the installation address of the video playback plug-in corresponding to the plug-in identifier according to a preset playback plug-in addressing table;
视频播放插件加载单元,用于从所述安装地址获取所述视频播放插件的插件文件,通 过所述视频播放应用运行所述插件文件,以将所述视频播放插件加载至所述视频播放应用。The video playback plug-in loading unit is configured to obtain the plug-in file of the video playback plug-in from the installation address, and run the plug-in file through the video playback application to load the video playback plug-in to the video playback application.
因此,本申请实施例提供的人脸图像的识别设备同样可以无需用户手动截取视频图像帧,并交由其他应用进行人脸图像识别,而是可以通过在视频播放应用中加载人脸识别插件,在播放视频文件的同时,自动识别每个视频图像帧所包含的人脸图像,提高了人脸图像的识别效率,减少了用户的操作。另一方面,由于人脸图像的识别过程与视频文件的播放是同步进行的,用户无需在视频文件观看后,再执行人脸图像识别,从而减少了人脸识别操作的耗时。Therefore, the face image recognition device provided by the embodiment of the present application likewise does not require the user to manually capture video image frames and hand them over to other applications for face image recognition. Instead, by loading the face recognition plug-in in the video playback application, the face image contained in each video image frame is automatically recognized while the video file is being played, which improves the recognition efficiency of face images and reduces the user's operations. On the other hand, since the recognition process of the face images is synchronized with the playback of the video file, the user does not need to perform face image recognition again after watching the video file, thereby reducing the time consumed by the face recognition operation.
图10是本申请另一实施例提供的一种终端设备的示意图。如图10所示,该实施例的终端设备10包括:处理器100、存储器101以及存储在所述存储器101中并可在所述处理器100上运行的计算机可读指令102,例如人脸图像的识别程序。所述处理器100执行所述计算机可读指令102时实现上述各个人脸图像的识别方法实施例中的步骤,例如图1所示的S101至S105。或者,所述处理器100执行所述计算机可读指令102时实现上述各装置实施例中各单元的功能,例如图9所示模块91至95的功能。FIG. 10 is a schematic diagram of a terminal device provided by another embodiment of the present application. As shown in FIG. 10, the terminal device 10 of this embodiment includes: a processor 100, a memory 101, and computer-readable instructions 102 stored in the memory 101 and executable on the processor 100, such as a face image recognition program. When the processor 100 executes the computer-readable instructions 102, the steps in the above face image recognition method embodiments are implemented, for example, S101 to S105 shown in FIG. 1. Alternatively, when the processor 100 executes the computer-readable instructions 102, the functions of the units in the foregoing device embodiments are realized, for example, the functions of the modules 91 to 95 shown in FIG. 9.
示例性的,所述计算机可读指令102可以被分割成一个或多个单元,所述一个或者多个单元被存储在所述存储器101中,并由所述处理器100执行,以完成本申请。所述一个或多个单元可以是能够完成特定功能的一系列计算机可读指令段,该指令段用于描述所述计算机可读指令102在所述终端设备10中的执行过程。例如,所述计算机可读指令102可以被分割成视频播放指令接收单元、视频播放应用启动单元、视频图像帧提取单元、人脸图像识别单元以及人脸图像库建立单元,各单元具体功能如上所述。Exemplarily, the computer-readable instructions 102 may be divided into one or more units, and the one or more units are stored in the memory 101 and executed by the processor 100 to complete the present application. The one or more units may be a series of computer-readable instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer-readable instructions 102 in the terminal device 10. For example, the computer-readable instructions 102 may be divided into a video playback instruction receiving unit, a video playback application starting unit, a video image frame extraction unit, a face image recognition unit, and a face image library establishment unit; the specific functions of each unit are as described above.
所述终端设备10可以是桌上型计算机、笔记本、掌上电脑及云端服务器等计算设备。所述终端设备可包括,但不仅限于,处理器100、存储器101。本领域技术人员可以理解,图10仅仅是终端设备10的示例,并不构成对终端设备10的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件,例如所述终端设备还可以包括输入输出设备、网络接入设备、总线等。The terminal device 10 may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server. The terminal device may include, but is not limited to, a processor 100 and a memory 101. Those skilled in the art can understand that FIG. 10 is only an example of the terminal device 10, and does not constitute a limitation on the terminal device 10. It may include more or fewer components than shown in the figure, or a combination of certain components, or different components. For example, the terminal device may also include input and output devices, network access devices, buses, and so on.
所称处理器100可以是中央处理单元(Central Processing Unit,CPU),还可以是其他通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现成可编程门阵列(Field-Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。The so-called processor 100 may be a central processing unit (Central Processing Unit, CPU), other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), Ready-made programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components, etc. The general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
所述存储器101可以是所述终端设备10的内部存储单元,例如终端设备10的硬盘或内存。所述存储器101也可以是所述终端设备10的外部存储设备,例如所述终端设备10上配备的插接式硬盘,智能存储卡(Smart Media Card,SMC),安全数字(Secure Digital,SD)卡,闪存卡(Flash Card)等。进一步地,所述存储器101还可以既包括所述终端设备10的内部存储单元也包括外部存储设备。所述存储器101用于存储所述计算机可读指令以及所述终端设备所需的其他程序和数据。所述存储器101还可以用于暂时地存储已经输出或者将要输出的数据。另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。The memory 101 may be an internal storage unit of the terminal device 10, such as a hard disk or a memory of the terminal device 10. The memory 101 may also be an external storage device of the terminal device 10, such as a plug-in hard disk equipped on the terminal device 10, a smart memory card (Smart Media Card, SMC), and a Secure Digital (SD) Card, Flash Card, etc. Further, the memory 101 may also include both an internal storage unit of the terminal device 10 and an external storage device. The memory 101 is used to store the computer-readable instructions and other programs and data required by the terminal device. The memory 101 can also be used to temporarily store data that has been output or will be output. In addition, the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
本申请还提供一种计算机可读存储介质,该计算机可读存储介质可以为非易失性计算机可读存储介质,也可以为易失性计算机可读存储介质。计算机可读存储介质存储有计算机指令,当所述计算机指令在计算机上运行时,使得计算机执行所述人脸图像的识别方法的步骤。The present application also provides a computer-readable storage medium. The computer-readable storage medium may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium. The computer-readable storage medium stores computer instructions, and when the computer instructions are run on a computer, the computer is caused to execute the steps of the method for recognizing the face image.
以上所述实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围,均应包含在本申请的保护范围之内。The above-mentioned embodiments are only used to illustrate the technical solutions of the present application, not to limit them; although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that modifications can still be made to the technical solutions recorded in the foregoing embodiments, or some of the technical features can be equivalently replaced; these modifications or replacements do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present application, and should all be included within the scope of protection of this application.

Claims (20)

  1. 一种人脸图像的识别方法,其中,包括:A face image recognition method, which includes:
    接收视频播放指令;所述视频播放指令携带有播放视频文件时所需调用的视频播放插件的插件标识;Receiving a video play instruction; the video play instruction carries the plug-in identifier of the video play plug-in that needs to be called when the video file is played;
    启动视频播放应用,并基于所述插件标识加载所述视频播放插件至所述视频播放应用;Start a video playback application, and load the video playback plug-in to the video playback application based on the plug-in identifier;
    若所述插件标识与人脸识别插件的标识匹配,则通过加载插件后的所述视频播放应用提取所述视频文件的各个视频图像帧;If the plug-in ID matches the ID of the face recognition plug-in, extract each video image frame of the video file through the video playback application after the plug-in is loaded;
    调用所述人脸识别插件提取各个所述视频图像帧包含的人脸图像;Calling the face recognition plug-in to extract the face images contained in each of the video image frames;
    根据各个所述人脸图像对应的实体用户,建立所述视频文件的人脸图像库。According to the entity user corresponding to each of the facial images, a facial image library of the video file is established.
  2. 根据权利要求1所述的识别方法,其中,在所述启动视频播放应用,并基于所述插件标识加载所述视频播放插件至所述视频播放应用之前,还包括:The recognition method according to claim 1, wherein, before the starting of the video playback application and the loading of the video playback plug-in to the video playback application based on the plug-in identifier, the method further comprises:
    获取所述人脸识别插件的插件数据包;Acquiring the plug-in data package of the face recognition plug-in;
    向服务器发送版本校验请求,并接收所述服务器基于所述版本校验请求反馈的合法校验结果;所述版本校验请求包含所述插件数据包的版本标识;Sending a version verification request to the server, and receiving a legal verification result fed back by the server based on the version verification request; the version verification request includes the version identification of the plug-in data package;
    若所述合法校验结果为校验成功,则查询所述视频播放应用的安装位置,并将所述插件数据包内的调用声明文件添加到所述安装位置关联的文件目录内,以添加所述人脸识别插件至所述视频播放应用的可调用插件列表。If the legal verification result is a successful verification, querying the installation location of the video playback application, and adding the call declaration file in the plug-in data package to the file directory associated with the installation location, so as to add the face recognition plug-in to the callable plug-in list of the video playback application.
  3. 根据权利要求1所述的识别方法,其中,所述根据各个所述人脸图像对应的实体用户,建立所述视频文件的人脸图像库,包括:The recognition method according to claim 1, wherein said establishing a face image library of said video file according to the entity user corresponding to each said face image comprises:
    计算任意两个所述视频图像帧内的所述人脸图像之间的相似度;Calculating the similarity between the face images in any two video image frames;
    若所述相似度大于预设的关联阈值,则识别位于两个不同的所述视频图像帧内的所述人脸图像为关联图像,建立两个所述人脸图像之间的关联关系;If the similarity is greater than a preset association threshold, identifying the face images located in two different video image frames as associated images, and establishing an association relationship between the two face images;
    基于所有所述人脸图像的所述关联关系,划分为多个用户人脸组,并为各个所述用户人脸组配置用户标识;所述用户人脸组内的所有人脸图像互为所述关联图像;Dividing all the face images into multiple user face groups based on the association relationships, and configuring a user identifier for each of the user face groups; all the face images within a user face group are the associated images of one another;
    根据所述用户人脸组以及所述用户标识,建立所述人脸图像库。The face image database is established according to the user face group and the user identifier.
  4. 根据权利要求3所述的识别方法,其中,所述计算任意两个所述视频图像帧内的所述人脸图像之间的相似度,包括:The recognition method according to claim 3, wherein the calculating the similarity between the face images in any two of the video image frames comprises:
    基于预设的人脸关键特征列表,标记出所述人脸图像中关于各个人脸关键特征的特征坐标;Based on a preset list of key face features, mark the feature coordinates of each key feature of the face in the face image;
    根据所述人脸关键特征列表内的所有人脸关键特征的特征坐标,构建所述人脸图像的特征坐标序列;Constructing the feature coordinate sequence of the face image according to the feature coordinates of the key features of all faces in the face key feature list;
    计算任意两个所述视频图像帧的所述人脸图像的特征坐标序列之间的特征距离值;Calculating the feature distance value between the feature coordinate sequences of the face image of any two video image frames;
    识别任意两个所述视频图像帧之间的间隔图像帧数;Identifying the number of interval image frames between any two video image frames;
    将所述特征距离值以及所述间隔图像帧数导入预设的相似度计算模型,得到两个所述视频图像帧内的所述人脸图像之间的相似度;所述相似度计算模型具体为:Importing the characteristic distance value and the number of interval image frames into a preset similarity calculation model to obtain the similarity between the face images in the two video image frames; the similarity calculation model is specifically:
    Figure PCTCN2020105861-appb-100001
    其中,Similarity为所述相似度;ActFrame为所述间隔图像帧数;FigDist为所述特征距离值;BaseDist为基准距离值;BaseFrame为所述视频文件的拍摄帧率;StandardDist为预设调整系数。Wherein, Similarity is the similarity; ActFrame is the number of interval image frames; FigDist is the characteristic distance value; BaseDist is the reference distance value; BaseFrame is the shooting frame rate of the video file; StandardDist is the preset adjustment coefficient.
  5. 根据权利要求1-4任一项所述的识别方法,其中,所述根据各个所述人脸图像对应的实体用户,建立所述视频文件的人脸图像库,包括:The recognition method according to any one of claims 1 to 4, wherein the establishing the face image library of the video file according to the entity user corresponding to each of the face images comprises:
    确定所述人脸图像的表情类型,并识别所述表情类型为基准表情;Determining the expression type of the face image, and identifying the expression type as a reference expression;
    根据表情转换算法以及所述基准表情,输出所述人脸图像的衍生图像;所述衍生图像的表情类型与所述人脸图像的表情类型不同;Outputting a derivative image of the face image according to the expression conversion algorithm and the reference expression; the expression type of the derivative image is different from the expression type of the face image;
    根据所述人脸图像以及所述衍生图像,生成所述人脸图像库。According to the face image and the derivative image, the face image library is generated.
  6. 根据权利要求1-4任一项所述的识别方法,其中,所述调用所述人脸识别插件提取各个所述视频图像帧包含的人脸图像,包括:The recognition method according to any one of claims 1 to 4, wherein the invoking the face recognition plug-in to extract the face images contained in each of the video image frames comprises:
    基于视频图像帧中的RGB通道对应的图像数据,生成所述视频图像帧的图像矩阵;Generating an image matrix of the video image frame based on the image data corresponding to the RGB channels in the video image frame;
    根据所述图像矩阵的矩阵尺寸,配置所述视频图像帧对应的卷积核,并通过所述卷积核对所述图像矩阵进行卷积操作,得到标准矩阵;Configure a convolution kernel corresponding to the video image frame according to the matrix size of the image matrix, and perform a convolution operation on the image matrix through the convolution kernel to obtain a standard matrix;
    将所述标准矩阵导入所述人脸识别插件的人脸识别算法,输出所述人脸图像。Import the standard matrix into the face recognition algorithm of the face recognition plug-in, and output the face image.
  7. 根据权利要求1-4任一项所述的识别方法,其中,所述启动视频播放应用,并基于所述插件标识加载所述视频播放插件至所述视频播放应用,包括:The identification method according to any one of claims 1 to 4, wherein the starting a video playing application and loading the video playing plug-in to the video playing application based on the plug-in identifier comprises:
    根据预设的播放插件寻址表,查询所述插件标识对应的所述视频播放插件的安装地址;Query the installation address of the video playback plug-in corresponding to the plug-in identifier according to the preset playback plug-in addressing table;
    从所述安装地址获取所述视频播放插件的插件文件,通过所述视频播放应用运行所述插件文件,以将所述视频播放插件加载至所述视频播放应用。Obtain the plug-in file of the video playback plug-in from the installation address, and run the plug-in file through the video playback application to load the video playback plug-in to the video playback application.
  8. 一种人脸图像的识别设备,其中,包括:A face image recognition device, which includes:
    视频播放指令接收单元,用于接收视频播放指令;所述视频播放指令携带有播放视频文件时所需调用的视频播放插件的插件标识;The video play instruction receiving unit is configured to receive the video play instruction; the video play instruction carries the plug-in identifier of the video play plug-in that needs to be called when the video file is played;
    视频播放应用启动单元,用于启动视频播放应用,并基于所述插件标识加载所述视频播放插件至所述视频播放应用;A video playback application starting unit, configured to start a video playback application, and load the video playback plug-in to the video playback application based on the plug-in identifier;
    视频图像帧提取单元,用于若所述插件标识与人脸识别插件的标识匹配,则通过加载插件后的所述视频播放应用提取所述视频文件的各个视频图像帧;The video image frame extraction unit is configured to extract each video image frame of the video file through the video playback application after the plug-in is loaded if the plug-in ID matches the ID of the face recognition plug-in;
    人脸图像识别单元,用于调用所述人脸识别插件提取各个所述视频图像帧包含的人脸图像;The face image recognition unit is configured to call the face recognition plug-in to extract the face images contained in each of the video image frames;
    人脸图像库建立单元,用于根据各个所述人脸图像对应的实体用户,建立所述视频文件的人脸图像库。The face image database establishment unit is used to establish the face image database of the video file according to the entity user corresponding to each of the face images.
  9. 根据权利要求8所述的识别设备,其中,所述人脸图像的识别设备还包括:The recognition device according to claim 8, wherein the recognition device of the face image further comprises:
    插件数据包获取单元,用于获取所述人脸识别插件的插件数据包;A plug-in data package acquisition unit for acquiring the plug-in data package of the face recognition plug-in;
    合法校验结果接收单元,用于向服务器发送版本校验请求,并接收所述服务器基于所述版本校验请求反馈的合法校验结果;所述版本校验请求包含所述插件数据包的版本标识;The legal verification result receiving unit is configured to send a version verification request to the server, and receive the legal verification result fed back by the server based on the version verification request; the version verification request includes the version identifier of the plug-in data package;
    调用声明文件添加单元,用于若所述合法校验结果为校验成功,则查询所述视频播放应用的安装位置,并将所述插件数据包内的调用声明文件添加到所述安装位置关联的文件目录内,以添加所述人脸识别插件至所述视频播放应用的可调用插件列表。The call declaration file adding unit is configured to, if the legal verification result is a successful verification, query the installation location of the video playback application, and add the call declaration file in the plug-in data package to the file directory associated with the installation location, so as to add the face recognition plug-in to the callable plug-in list of the video playback application.
  10. The recognition device according to claim 8, wherein the face image library establishment unit comprises:
    a similarity calculation unit, configured to calculate the similarity between the face images in any two of the video image frames;
    an association relationship establishment unit, configured to, if the similarity is greater than a preset association threshold, identify the face images located in the two different video image frames as associated images and establish an association relationship between the two face images;
    a user face group division unit, configured to divide all of the face images into a plurality of user face groups based on their association relationships and to configure a user identifier for each of the user face groups, wherein all face images within a user face group are associated images of one another; and
    a first face image library establishment unit, configured to establish the face image library according to the user face groups and the user identifiers.
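The grouping described in claims 10 and 13 — any two faces whose similarity exceeds the threshold are associated images, and association propagates so that each user face group contains only mutually associated images — amounts to partitioning the faces into connected components. A minimal sketch using union-find (the data structure is a choice made for this example; the claims do not prescribe one):

```python
def group_faces(faces, similarity, threshold):
    """Partition face images into user face groups: faces whose pairwise
    similarity exceeds the threshold end up in the same group,
    transitively, and each group gets a user identifier."""
    parent = list(range(len(faces)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    # Establish an association for every pair above the threshold.
    for i in range(len(faces)):
        for j in range(i + 1, len(faces)):
            if similarity(faces[i], faces[j]) > threshold:
                union(i, j)

    # Collect components and assign a user identifier to each group.
    groups = {}
    for i in range(len(faces)):
        groups.setdefault(find(i), []).append(faces[i])
    return {f"user_{k}": members for k, members in enumerate(groups.values())}
```

For example, with a similarity function that links faces less than 5 units apart, `group_faces([0, 1, 10, 11, 50], ...)` yields three groups: {0, 1}, {10, 11}, and {50}.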
  11. A terminal device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, performs the following steps:
    receiving a video play instruction, wherein the video play instruction carries a plug-in identifier of the video playback plug-in to be called when playing a video file;
    starting a video playback application, and loading the video playback plug-in into the video playback application based on the plug-in identifier;
    if the plug-in identifier matches the identifier of a face recognition plug-in, extracting each video image frame of the video file through the video playback application after the plug-in is loaded;
    calling the face recognition plug-in to extract the face images contained in each of the video image frames; and
    establishing a face image library of the video file according to the entity user corresponding to each of the face images.
  12. The terminal device according to claim 11, wherein before the starting of the video playback application and the loading of the video playback plug-in into the video playback application based on the plug-in identifier, the processor, when executing the computer-readable instructions, further performs the following steps:
    acquiring a plug-in data package of the face recognition plug-in;
    sending a version verification request to a server, and receiving a legality verification result fed back by the server based on the version verification request, wherein the version verification request contains a version identifier of the plug-in data package; and
    if the legality verification result indicates a successful verification, querying the installation location of the video playback application, and adding the calling declaration file in the plug-in data package to the file directory associated with the installation location, so as to add the face recognition plug-in to the callable plug-in list of the video playback application.
  13. The terminal device according to claim 11, wherein the establishing of the face image library of the video file according to the entity user corresponding to each of the face images comprises:
    calculating the similarity between the face images in any two of the video image frames;
    if the similarity is greater than a preset association threshold, identifying the face images located in the two different video image frames as associated images, and establishing an association relationship between the two face images;
    dividing all of the face images into a plurality of user face groups based on their association relationships, and configuring a user identifier for each of the user face groups, wherein all face images within a user face group are associated images of one another; and
    establishing the face image library according to the user face groups and the user identifiers.
  14. The terminal device according to claim 13, wherein the calculating of the similarity between the face images in any two of the video image frames comprises:
    marking, based on a preset face key feature list, the feature coordinates of each face key feature in the face image;
    constructing a feature coordinate sequence of the face image according to the feature coordinates of all face key features in the face key feature list;
    calculating a feature distance value between the feature coordinate sequences of the face images in any two of the video image frames;
    identifying the number of interval image frames between any two of the video image frames; and
    importing the feature distance value and the number of interval image frames into a preset similarity calculation model to obtain the similarity between the face images in the two video image frames, wherein the similarity calculation model is specifically:
    Figure PCTCN2020105861-appb-100002
    wherein Similarity is the similarity; ActFrame is the number of interval image frames; FigDist is the feature distance value; BaseDist is a reference distance value; BaseFrame is the shooting frame rate of the video file; and StandardDist is a preset adjustment coefficient.
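The exact similarity model is in the referenced figure, which is not reproduced in this text, so only its variables are known. A hedged stand-in that respects the stated roles of those variables — a larger FigDist lowers similarity, while a larger ActFrame relative to the shooting frame rate BaseFrame tolerates more movement between frames — might look like the sketch below; the functional form here is an assumption for illustration, not the patent's formula.

```python
import math

def feature_distance(seq_a, seq_b):
    """Mean Euclidean distance between paired key-feature coordinates
    of two feature coordinate sequences (FigDist in the claim)."""
    assert len(seq_a) == len(seq_b)
    return sum(math.dist(p, q) for p, q in zip(seq_a, seq_b)) / len(seq_a)

def similarity(fig_dist, act_frame, base_dist=1.0, base_frame=25,
               standard_dist=1.0):
    """Illustrative stand-in for the similarity calculation model.
    Faces farther apart in time (more interval frames) are allowed a
    larger feature distance before similarity drops."""
    # Tolerance grows with the frame gap, scaled by the frame rate and
    # the preset adjustment coefficient.
    tolerance = standard_dist * (1 + act_frame / base_frame)
    return max(0.0, 1.0 - fig_dist / (base_dist * tolerance))
```

With this form, identical coordinate sequences score 1.0, and the same feature distance scores higher when the two frames are further apart, matching the intuition that a face may move more across a longer interval.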
  15. The terminal device according to any one of claims 11 to 14, wherein the establishing of the face image library of the video file according to the entity user corresponding to each of the face images comprises:
    determining the expression type of the face image, and identifying the expression type as a reference expression;
    outputting a derivative image of the face image according to an expression conversion algorithm and the reference expression, wherein the expression type of the derivative image is different from the expression type of the face image; and
    generating the face image library according to the face image and the derivative image.
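The enrichment step of claims 15 and 20 — classify each face's expression as its reference expression, then synthesize derivative images whose expressions differ from it — can be sketched as below. `classify_expression` and `convert_expression` are placeholders for the unspecified expression conversion algorithm, and the expression-type list is illustrative only.

```python
EXPRESSION_TYPES = ["neutral", "smile", "surprise"]  # illustrative set

def build_face_library(face_images, classify_expression, convert_expression):
    """Build a face image library containing each original face plus
    derivative images covering the other expression types."""
    library = []
    for face in face_images:
        base = classify_expression(face)   # the reference expression
        library.append((face, base))
        for target in EXPRESSION_TYPES:
            if target != base:             # derivative must differ in type
                library.append((convert_expression(face, target), target))
    return library
```

Storing derivatives alongside the reference face means a later lookup can match the same user even when the query image shows a different expression.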
  16. A computer-readable storage medium storing computer-readable instructions, wherein the computer-readable instructions, when executed by a processor, implement the following steps:
    receiving a video play instruction, wherein the video play instruction carries a plug-in identifier of the video playback plug-in to be called when playing a video file;
    starting a video playback application, and loading the video playback plug-in into the video playback application based on the plug-in identifier;
    if the plug-in identifier matches the identifier of a face recognition plug-in, extracting each video image frame of the video file through the video playback application after the plug-in is loaded;
    calling the face recognition plug-in to extract the face images contained in each of the video image frames; and
    establishing a face image library of the video file according to the entity user corresponding to each of the face images.
  17. The computer-readable storage medium according to claim 16, wherein before the starting of the video playback application and the loading of the video playback plug-in into the video playback application based on the plug-in identifier, the computer-readable instructions, when executed by the processor, further implement the following steps:
    acquiring a plug-in data package of the face recognition plug-in;
    sending a version verification request to a server, and receiving a legality verification result fed back by the server based on the version verification request, wherein the version verification request contains a version identifier of the plug-in data package; and
    if the legality verification result indicates a successful verification, querying the installation location of the video playback application, and adding the calling declaration file in the plug-in data package to the file directory associated with the installation location, so as to add the face recognition plug-in to the callable plug-in list of the video playback application.
  18. The computer-readable storage medium according to claim 16, wherein the establishing of the face image library of the video file according to the entity user corresponding to each of the face images comprises:
    calculating the similarity between the face images in any two of the video image frames;
    if the similarity is greater than a preset association threshold, identifying the face images located in the two different video image frames as associated images, and establishing an association relationship between the two face images;
    dividing all of the face images into a plurality of user face groups based on their association relationships, and configuring a user identifier for each of the user face groups, wherein all face images within a user face group are associated images of one another; and
    establishing the face image library according to the user face groups and the user identifiers.
  19. The computer-readable storage medium according to claim 18, wherein the calculating of the similarity between the face images in any two of the video image frames comprises:
    marking, based on a preset face key feature list, the feature coordinates of each face key feature in the face image;
    constructing a feature coordinate sequence of the face image according to the feature coordinates of all face key features in the face key feature list;
    calculating a feature distance value between the feature coordinate sequences of the face images in any two of the video image frames;
    identifying the number of interval image frames between any two of the video image frames; and
    importing the feature distance value and the number of interval image frames into a preset similarity calculation model to obtain the similarity between the face images in the two video image frames, wherein the similarity calculation model is specifically:
    Figure PCTCN2020105861-appb-100003
    wherein Similarity is the similarity; ActFrame is the number of interval image frames; FigDist is the feature distance value; BaseDist is a reference distance value; BaseFrame is the shooting frame rate of the video file; and StandardDist is a preset adjustment coefficient.
  20. The computer-readable storage medium according to any one of claims 16 to 19, wherein the establishing of the face image library of the video file according to the entity user corresponding to each of the face images comprises:
    determining the expression type of the face image, and identifying the expression type as a reference expression;
    outputting a derivative image of the face image according to an expression conversion algorithm and the reference expression, wherein the expression type of the derivative image is different from the expression type of the face image; and
    generating the face image library according to the face image and the derivative image.
PCT/CN2020/105861 2020-02-11 2020-07-30 Face image recognition method and apparatus WO2021159672A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010087125.8 2020-02-11
CN202010087125.8A CN111290800A (en) 2020-02-11 2020-02-11 Face image recognition method and device

Publications (1)

Publication Number Publication Date
WO2021159672A1 true WO2021159672A1 (en) 2021-08-19

Family

ID=71029984

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/105861 WO2021159672A1 (en) 2020-02-11 2020-07-30 Face image recognition method and apparatus

Country Status (2)

Country Link
CN (1) CN111290800A (en)
WO (1) WO2021159672A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114095781A (en) * 2021-11-02 2022-02-25 北京鲸鲮信息系统技术有限公司 Multimedia data processing method and device, electronic equipment and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111290800A (en) * 2020-02-11 2020-06-16 深圳壹账通智能科技有限公司 Face image recognition method and device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102567720A (en) * 2011-12-26 2012-07-11 广州市千钧网络科技有限公司 Face identification method and face identification device for Flash online video
CN107123079A (en) * 2016-02-24 2017-09-01 掌赢信息科技(上海)有限公司 One kind expression moving method and electronic equipment
CN108875531A (en) * 2018-01-18 2018-11-23 北京迈格威科技有限公司 Method for detecting human face, device, system and computer storage medium
CN110472516A (en) * 2019-07-23 2019-11-19 腾讯科技(深圳)有限公司 A kind of construction method, device, equipment and the system of character image identifying system
US20200005065A1 (en) * 2018-06-28 2020-01-02 Google Llc Object classification for image recognition processing
CN110990604A (en) * 2019-11-28 2020-04-10 浙江大华技术股份有限公司 Image base generation method, face recognition method and intelligent access control system
CN111290800A (en) * 2020-02-11 2020-06-16 深圳壹账通智能科技有限公司 Face image recognition method and device



Also Published As

Publication number Publication date
CN111290800A (en) 2020-06-16


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20919084

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as the address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 091222)

122 Ep: pct application non-entry in european phase

Ref document number: 20919084

Country of ref document: EP

Kind code of ref document: A1