CN111444817B - Character image recognition method and device, electronic equipment and storage medium - Google Patents

Character image recognition method and device, electronic equipment and storage medium

Info

Publication number
CN111444817B
CN111444817B CN202010213416.7A
Authority
CN
China
Prior art keywords
image
area
identity information
tracking
video
Prior art date
Legal status
Active
Application number
CN202010213416.7A
Other languages
Chinese (zh)
Other versions
CN111444817A (en)
Inventor
程星星
Current Assignee
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and MIGU Culture Technology Co Ltd
Priority to CN202010213416.7A
Publication of CN111444817A
Application granted
Publication of CN111444817B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Molecular Biology (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Collating Specific Patterns (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the invention provides a person image recognition method and apparatus, an electronic device and a storage medium. After identity information is recognized from a video through face recognition, for each piece of identity information, every first image in which that identity information was not recognized by face recognition is re-examined through an image tracking algorithm, so that the tracking result supplements the face recognition result. Combining face recognition with image tracking improves the success rate of recognizing the person images of the identity information and reduces missed recognitions.

Description

Character image recognition method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular to a person image recognition method and apparatus, an electronic device, and a storage medium.
Background
With the development of deep neural networks and deep learning, deep networks have succeeded in more and more areas thanks to their strong learning ability; their face recognition performance in particular is excellent, even exceeding the accuracy of manual recognition. For person identification in video, the existing approach performs face recognition on the video frames: it computes the distance (e.g., the Euclidean distance) between each standard face feature vector stored in a database and the face feature vector detected in an image frame, and uses a distance threshold to judge how well the face in the current frame matches a face in the database. If the distance is smaller than the threshold, recognition succeeds; otherwise it fails. The person identity corresponding to the best-matching feature vector in the database is taken as the identity of the current person.
Specifically, the existing solution converts a face image into a feature vector (e.g., a 512-dimensional vector) and, under a chosen distance metric (e.g., the Euclidean distance), checks whether the minimum distance between the face vector to be identified and all standard vectors in the database satisfies a set threshold condition, thereby determining the face recognition result. However, unlike recognition scenarios such as face unlocking and face payment, scenes in video are complex and changeable: people appear in different postures, face angles and expressions vary widely, and shots are switched frequently. Identifying person identities in video through face recognition alone therefore easily misses recognitions and yields a low recognition success rate.
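The threshold-based matching described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the 512-dimensional vectors, the database contents and the threshold value are all hypothetical.

```python
import numpy as np

def match_identity(query_vec, database, threshold=1.0):
    """Return the identity whose standard vector is closest to
    query_vec under the Euclidean distance, or None when even the
    closest vector exceeds the threshold (recognition failure)."""
    best_name, best_dist = None, float("inf")
    for name, std_vec in database.items():
        dist = np.linalg.norm(query_vec - std_vec)  # Euclidean distance
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist < threshold else None

# Hypothetical database of 512-dimensional standard face vectors.
db = {"person1": np.zeros(512), "person2": np.ones(512)}
query = np.full(512, 0.01)        # very close to person1's vector
print(match_identity(query, db))  # person1
```

A vector far from every stored entry falls above the threshold and is rejected, which is exactly the missed-recognition case the tracking step of this application is meant to repair.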
Disclosure of Invention
The embodiment of the invention provides a person image recognition method, apparatus, electronic device and storage medium, so as to solve the problems of the prior art that identifying person identities in video through face recognition alone easily misses recognitions and has a low recognition success rate.
In view of the above technical problems, in a first aspect, an embodiment of the present invention provides a person image recognition method, including:
For any identity information identified from each frame of image of a video through face recognition, acquiring a first image of the video, in which the identity information is not identified through face recognition;
identifying a target image area corresponding to the identity information from the first image through an image tracking algorithm according to the face recognition area corresponding to the identity information in the second image;
the second image is an image in the video, wherein the identity information is successfully identified through face recognition.
In a second aspect, an embodiment of the present invention provides a person image recognition apparatus, including:
the acquisition module is configured to, for any identity information identified from each frame of the video through face recognition, acquire a first image of the video in which the identity information is not identified through face recognition;
the identification module is configured to identify a target image area corresponding to the identity information from the first image through an image tracking algorithm according to the face recognition area corresponding to the identity information in the second image.
In a third aspect, an embodiment of the present invention provides an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the program, implements the steps of the person image recognition method described above.
In a fourth aspect, embodiments of the present invention provide a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the person image recognition method of any of the above.
According to the person image recognition method, apparatus, electronic device and storage medium, after identity information is recognized from the video through face recognition, each first image in which a given piece of identity information was not recognized by face recognition is re-examined through an image tracking algorithm, so that the tracking result supplements the face recognition result. Combining face recognition with image tracking improves the success rate of recognizing the person images of the identity information and reduces missed recognitions.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and a person skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a flow chart of a person image recognition method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a general flow for identifying person identities through face detection, identification and tracking according to another embodiment of the present invention;
FIG. 3 is a schematic diagram of a tracking network model for tracking a person image according to another embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating the principle of calculating the area overlap ratio according to another embodiment of the present invention;
FIG. 5 is a structural block diagram of a person image recognition apparatus according to another embodiment of the present invention;
FIG. 6 is a schematic diagram of the physical structure of an electronic device according to another embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, embodiments of the present invention. All other embodiments obtained by a person skilled in the art from these embodiments without inventive effort fall within the protection scope of the invention.
Existing methods for identifying persons in video suffer from a low recognition success rate; for example, frames are affected by the environment or by shot switching, so the faces in some images cannot provide comprehensive, clear facial features. To solve this problem, the present application provides a person image recognition method mainly used to identify persons appearing in videos (such as films and film clips), so that a user can learn which persons appear in each frame of the video before watching it. The method may be performed by any device, e.g., a computer, a server, or a mobile phone. Fig. 1 is a flowchart of the person image recognition method provided in this embodiment; referring to Fig. 1, the method includes:
step 101: and acquiring any identity information identified from each frame of image of the video through face recognition, and acquiring a first image of the video, in which the identity information is not identified through face recognition.
The identity information is information that distinguishes different persons, for example a person's name, certificate number, or biometric information; this embodiment does not specifically limit it.
Step 102: identify a target image area corresponding to the identity information from the first image through an image tracking algorithm according to the face recognition area corresponding to the identity information in the second image.
There are many kinds of image tracking algorithms, for example tracking an image by similarity or by change of image position; this embodiment does not specifically limit the choice. For each first image, the target image area corresponding to the identity information can be identified from the face recognition area corresponding to that identity information in any second image of the video (specifically, a target tracking area is determined from the face recognition area corresponding to the identity information in the second image, and the image tracking algorithm is executed from that target tracking area). However, to improve the recognition success rate of the image tracking algorithm, the target image area corresponding to the identity information is preferably identified from a second image whose video playing time precedes and is close to that of the first image (e.g., the second image closest to the first image in video playing time).
According to the person image recognition method provided by the embodiment of the invention, after identity information is recognized from the video through face recognition, each first image in which a given piece of identity information was not recognized by face recognition is re-examined through an image tracking algorithm, so that the tracking result supplements the face recognition result. Combining face recognition with image tracking improves the success rate of recognizing the person images of the identity information and reduces missed recognitions.
To improve the recognition success rate, this embodiment divides the first images into video segments and, for each video segment, exploits the correlation between images within the video to determine a corresponding tracking image, so as to identify the identity information within that segment. Further, on the basis of the foregoing embodiment, the identifying, through an image tracking algorithm, of the target image area corresponding to the identity information from the first image according to the face recognition area corresponding to the identity information in the second image includes:
acquiring, as the video segment to be identified, any initial video segment composed of first images whose video playing times are consecutive, and acquiring, as the tracking image, a second image whose video playing time precedes the first-frame playing time of the video segment to be identified by a preset duration;
taking the face recognition area corresponding to the identity information in the tracking image as the target tracking area, and identifying the target image area from the video segment to be identified through an image tracking algorithm according to the target tracking area.
It should be noted that a video segment to be identified must be determined for each piece of identity information. The video segments to be identified can be extracted from the first images that are consecutive in video playing time. The video playing time refers to the playing time point of each frame image when the whole video is played through without pause.
The determination of the initial video segments is illustrated by the following example:
For the three pieces of identity information "person1", "person2" and "person3", an initial video segment must be determined for each. Table 1 is the timeline of recognition results for the different identity information provided in this embodiment. Referring to Table 1, person1 was successfully recognized during the three periods 00:00:05:880-00:01:10:200, 00:02:39:360-00:03:13:060 and 00:04:37:540-00:05:05:340, and was not recognized during 00:01:10:200-00:02:39:360, 00:03:13:060-00:04:37:540 and 00:05:05:340-end of video; these three periods are the three initial video segments determined for person1. Here 00:01:10:200, 00:03:13:060 and 00:05:05:340 are the starting time points of person1's three unrecognized periods.
Table 1 Timeline of recognition results for the different identity information
[Table 1 is provided as an image in the original; it lists, for each identity, the time periods in which that identity was and was not recognized. The recoverable content for person1 is given in the paragraph above.]
In this embodiment, for each initial video segment serving as the video segment to be identified, a second image that precedes the initial video segment and is close to it in video playing time is determined as the tracking image. Identifying the identity information in the initial video segment from the tracking image exploits the correlation between images in the video and thus improves the accuracy of identifying the identity information.
It should be noted that the preset duration is a set value. For example, the preset duration equals the video-playing-time interval between two adjacent frames of the video (i.e., the tracking image is the image that precedes the initial video segment and is closest to its first-frame playing time), or it equals the interval between any image of the video and the 2nd or 3rd frame before it; this embodiment does not limit it.
In summary, the first images are divided into initial video segments, and a tracking image is determined for each initial video segment, yielding a target video segment composed of the tracking image and the initial video segment. Since the tracking image precedes and is strongly associated with the initial video segment, the face recognition area in the tracking image further improves the success rate of recognizing the target video segment.
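The extraction of initial video segments from a recognition timeline, as in the person1 example above, can be sketched as follows. This is an illustrative sketch: the times are simplified to whole seconds, the video end time of 400 s is assumed, and the helper name is hypothetical. A gap before the first successful recognition is skipped, since no earlier tracking image exists for it.

```python
def initial_video_segments(recognized, video_end):
    """Given sorted (start, end) periods in which one identity was
    recognized, return the complementary unrecognized periods that
    follow at least one successful recognition (the initial video
    segments handed to the tracking step)."""
    segments = []
    cursor = None  # end of the most recent recognized period
    for start, end in recognized:
        if cursor is not None and start > cursor:
            segments.append((cursor, start))
        cursor = end
    if cursor is not None and cursor < video_end:
        segments.append((cursor, video_end))
    return segments

# person1's recognized periods from Table 1, rounded to whole seconds
recognized = [(5, 70), (159, 193), (277, 305)]
print(initial_video_segments(recognized, 400))
# [(70, 159), (193, 277), (305, 400)]
```

The three returned gaps correspond to person1's three unrecognized periods, the last one running to the end of the video.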
To further explain the specific flow of the person recognition method in the present application, Fig. 2 is a schematic diagram of the general flow of person identity recognition through face detection, recognition and tracking provided in this embodiment; referring to Fig. 2, the process includes the following steps:
step 1: performing face detection on images in the video;
step 2: performing face recognition;
step 3: calculating unrecognized time axis intervals;
step 4: extracting interval fragments with tracking values;
step 5: the discriminability of the target to be tracked is improved;
step 6: tracking a target area;
step 7: and the character recognition rate is improved.
To improve recognition efficiency, the first images of each video segment may also be "shortened". Further, on the basis of the foregoing embodiments, the acquiring, as the video segment to be identified, of any initial video segment composed of first images with consecutive video playing times includes:
for any initial video segment composed of first images with consecutive video playing times, determining, from the initial video segment, the first images suspected to contain an image area corresponding to the identity information, according to the position information of the target tracking area in the tracking image and the position information of the face recognition areas appearing in each first image of the initial video segment;
and taking the video segment consisting of the first images suspected to contain an image area corresponding to the identity information as the video segment to be identified.
The position information of a face recognition area comprises its coordinate position and size.
Specifically, whether the next frame is suspected to contain the corresponding image area can be preliminarily judged from the area overlap ratio of the face recognition areas in adjacent frames, determined from their position information.
In this embodiment, the initial video segment is further shortened according to whether each first image is suspected to contain the image area corresponding to the identity information, and the shortened segment is used as the video segment to be identified. This reduces the computation required by the image tracking algorithm and improves recognition efficiency.
In some cases (e.g., when a person is occluded), the face recognition area may provide few features, so that identification based on those features alone is difficult to succeed. In this embodiment, the face recognition area is enlarged and the enlarged person recognition area is used as the target tracking area, so that it contains more features, which helps to further improve recognition accuracy. Further, on the basis of the foregoing embodiments, the identifying, through an image tracking algorithm, of the target image area corresponding to the identity information from the first image according to the face recognition area corresponding to the identity information in the second image includes:
identifying the target image area corresponding to the identity information from the first image through an image tracking algorithm according to a person recognition area formed by enlarging the face recognition area corresponding to the identity information in the second image.
The person recognition area is obtained by enlarging the face recognition area corresponding to the identity information in the second image according to a set enlargement rule.
Specifically, the person recognition area obtained after enlargement is used as the target tracking area, and the target image area in the first image is identified through the image tracking algorithm.
The enlargement rule enlarges the face recognition area corresponding to the identity information in the second image according to

x' = x - 0.3w,  y' = y - 0.3h,  w' = w + 1.3w,  h' = h + 1.3h,

where x and y are the coordinates of the face recognition area's bounding box relative to the upper left corner of the image, and w and h are the width and height of the bounding box.
That is, relative to the face recognition area, the person recognition area obtained by the enlargement has its abscissa reduced by 0.3 times the width w and its ordinate reduced by 0.3 times the height h of the face recognition area, while its width is increased by 1.3 times the width w and its height by 1.3 times the height h.
This embodiment enlarges the face recognition area and initializes the tracking area of the image tracking algorithm with the person recognition area obtained after enlargement. Because the enlarged person recognition area contains more of the person, it can further improve the success rate of recognizing the person images of the identity information and reduce missed recognitions.
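Under the enlargement rule above, the bounding-box transformation can be sketched as follows. The function name is hypothetical, and in practice the result would also be clipped to the image boundaries, which the sketch omits.

```python
def enlarge_face_box(x, y, w, h):
    """Enlarge a face bounding box into a person recognition area:
    shift the top-left corner up and left by 0.3*w and 0.3*h, and
    grow the width and height by 1.3*w and 1.3*h respectively."""
    return (x - 0.3 * w, y - 0.3 * h, w + 1.3 * w, h + 1.3 * h)

# A 50x80 face box at (100, 100) becomes roughly a 115x184 person box
# at (85, 76), covering more of the person than the face alone.
print(enlarge_face_box(100, 100, 50, 80))
```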
This embodiment provides a specific implementation for identifying the target image area corresponding to the identity information. Further, on the basis of the foregoing embodiments, the identifying of the target image area from the initial video segment through an image tracking algorithm according to the target tracking area includes:
determining a target video segment composed of the tracking image and the video segment to be identified, with the tracking image as its first frame;
inputting the target video segment and the position information of the target tracking area in its first frame into a pre-trained tracking network model, which outputs the similarity between the target tracking area and each image area contained in each first image of the target video segment;
identifying the target image area from the target video segment according to the similarities output by the tracking network model for each first image;
The tracking network model is trained through machine learning from the position information of a sample person tracking area marked in the first frame of a sample video and from the image areas marked in the sample video that represent the same person as the sample tracking area. Given an input video and the position information of a person tracking area in its first frame, the model identifies from the video the image areas that represent the same person as the tracking area.
Identifying the target image area from the target video segment according to the similarities output by the tracking network model for each first image comprises:
for any first image in the target video segment, judging whether the image contains an image area whose similarity to the target tracking area exceeds the similarity threshold; if so, taking the image area with the maximum similarity to the target tracking area as the target image area; otherwise, the first image contains no target image area.
The tracking network model is a "twin network" (Siamese network) based on a deep neural network.
Specifically, the tracking network model compares the subsequent images of the video against the initialized target tracking area, takes the area with the highest similarity to the target tracking area as the tracking result, and tracks the areas frame by frame. For step 6 above, Fig. 3 is a schematic diagram of the principle by which the tracking network model provided in this embodiment tracks the person image; this embodiment completes the tracking task with a "twin network" tracking method based on a deep neural network. Referring to Fig. 3, z is the template image, i.e., the enlarged bounding-box area; x is a subsequent image frame of the video; and phi is a convolutional neural network that maps each image area into a feature space. A convolution operation correlates the 6 x 6 x 128 template features with the 22 x 22 x 128 search features and outputs a 17 x 17 tracking-score matrix that represents the degree of similarity between each position in the search area x and the template z. The image area corresponding to the point with the highest similarity is the tracker's result, and the successfully tracked image x has the same person identity as the template image z.
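The cross-correlation at the heart of the twin-network tracker can be sketched with plain NumPy. This is an illustrative sketch, not the patent's implementation: the feature shapes (6 x 6 x 128 template, 22 x 22 x 128 search) are taken from the text, but random arrays stand in for the output of the trained backbone phi, and a naive loop stands in for an efficient convolution.

```python
import numpy as np

def cross_correlate(x_feat, z_feat):
    """Slide the template feature map z over the search feature map x
    and take the inner product at every offset, producing the score
    matrix that measures similarity at each position."""
    _, H, W = x_feat.shape
    _, h, w = z_feat.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            out[i, j] = np.sum(x_feat[:, i:i + h, j:j + w] * z_feat)
    return out

rng = np.random.default_rng(0)
x_feat = rng.standard_normal((128, 22, 22))  # phi(x): search-frame features
z_feat = rng.standard_normal((128, 6, 6))    # phi(z): template features
score = cross_correlate(x_feat, z_feat)
print(score.shape)                           # (17, 17)

# The image area at the peak of the score map is the tracking result.
row, col = np.unravel_index(np.argmax(score), score.shape)
```

When the peak score still falls below the similarity threshold, the frame is judged not to contain the tracked person, matching the thresholding step described above.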
In this embodiment, the target tracking area is tracked through the pre-trained tracking network model, and the image area of a given piece of identity information is identified through similarity comparison. First images that face recognition failed to recognize are thus re-identified, which improves the recognition success rate of the identity information.
This embodiment provides a specific implementation of how to "shorten" the video segments. Further, on the basis of the foregoing embodiments, the determining, from the initial video segment, of the first images suspected to contain the image area corresponding to the identity information, according to the position information of the target tracking area in the tracking image and the position information of the face recognition areas appearing in each first image of the initial video segment, includes:
starting from the first frame of the initial video segment, executing the detection operation in a loop until a first image is detected that does not contain an image area corresponding to the identity information, or the detection operation has been executed on the last frame of the initial video segment;
wherein the detection operation includes:
when the detection operation is executed for the first time: determining, from the position information of the target tracking area in the tracking image and the position information of each face recognition area in the first frame of the initial video segment, the first area overlap ratio between each face recognition area in that frame and the target tracking area. If a first area overlap ratio greater than the overlap ratio threshold exists, the first frame of the initial video segment is suspected to contain an image area corresponding to the identity information, and the face recognition area with the largest first area overlap ratio is taken as the suspected image area corresponding to the identity information; otherwise, the first frame of the initial video segment does not contain an image area corresponding to the identity information;
when the detection operation is not executed for the first time: obtaining the suspected image area determined by the previous execution of the detection operation, and determining, from the position information of that suspected image area in the previously processed first image and the position information of each face recognition area in the currently processed first image, the second area overlap ratio between each face recognition area in the current first image and the suspected image area. If a second area overlap ratio greater than the overlap ratio threshold exists, the current first image is suspected to contain an image area corresponding to the identity information, and the face recognition area with the largest second area overlap ratio is taken as the new suspected image area corresponding to the identity information; otherwise, the current first image does not contain an image area corresponding to the identity information;
The area overlap ratio is the ratio of a first area, occupied by the overlapping part of the two areas, to the sum of that first area and a second area occupied by the non-overlapping parts of the two areas.
Regarding the calculation of the area overlap ratio, fig. 4 is a schematic diagram of the calculation principle provided in this embodiment. Referring to fig. 4, for two face recognition areas, the area overlap ratio is the ratio of the first area occupied by their intersection (the overlapping portion of the two areas) to their union (the sum of the first area and the second area occupied by the non-overlapping portions), i.e. the intersection-over-union ratio.
In fig. 4, the two boxes represent the coordinate positions of the face regions detected in two consecutive frames, "intersection" represents the area where the two regions overlap in the coordinate system, and "union" represents the total area covered by the two regions. The area overlap ratio satisfies IoU ∈ [0, 1]. This embodiment sets the IoU threshold (i.e. the overlap ratio threshold) to 0.5; a value below the threshold indicates that the subsequent video scene changes substantially and is less trackable. The closer the IoU value is to 1, the greater the overlap of the face regions in the two consecutive frames, the higher the probability that the video segment is continuous, and the greater its tracking value.
The overlap ratio threshold is a set value, for example, the overlap ratio threshold is 0.5.
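The IoU described above can be computed directly from two (x, y, w, h) quadruples. The following is a minimal sketch (function and variable names are illustrative, not from the patent):

```python
def iou(box_a, box_b):
    """Area overlap ratio (IoU) of two (x, y, w, h) face bounding boxes.

    x, y are the top-left corner coordinates; w, h are width and height.
    Returns intersection area divided by union area, a value in [0, 1].
    """
    ax2, ay2 = box_a[0] + box_a[2], box_a[1] + box_a[3]
    bx2, by2 = box_b[0] + box_b[2], box_b[1] + box_b[3]
    # Intersection: the "first area" occupied by the overlapping part.
    iw = max(0, min(ax2, bx2) - max(box_a[0], box_b[0]))
    ih = max(0, min(ay2, by2) - max(box_a[1], box_b[1]))
    inter = iw * ih
    # Union: both box areas minus the double-counted intersection.
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0
```

With the embodiment's threshold of 0.5, two boxes would be judged continuous when `iou(a, b) > 0.5`.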
Specifically, for step 4 above, only continuously progressive video clips are worth tracking and likely to be tracked successfully. For each unidentified video segment output in step 3, taking the face bounding box successfully recognized in the first frame as the starting point, the overlap ratio of the coordinate positions of the face bounding boxes output in step 2 is calculated across consecutive video images. The continuity of the video segments is preliminarily judged through the bounding box overlap ratio, and segments with tracking value are screened out. The higher the bounding box overlap ratio, the higher the probability that the video is continuous and that tracking will succeed. The face region bounding box overlap ratio can be calculated as the "overlap ratio" of the face regions in two consecutive frames shown in fig. 4.
In step 5, based on the screening result of step 4, a target tracker is initialized with the successfully identified face bounding box area in the first frame of each segment as the initial tracking area. Because the face region detected by face detection contains only facial features and no information such as the person's hairstyle, clothing style, colors, or background, tracking the face region alone yields low accuracy, and the tracking result easily drifts, causing tracking failure. The method therefore initializes the tracker with an enlarged face detection box: image information around the face is introduced into the enlarged box area, which increases the discriminability of the tracking area and helps improve the accuracy and robustness of the tracking result.
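The enlargement of the face box before tracker initialization can be sketched as follows. The scale factor 1.5 is an illustrative assumption (the patent does not specify a value); the box is enlarged about its centre and clamped to the image bounds:

```python
def enlarge_box(box, img_w, img_h, scale=1.5):
    """Enlarge an (x, y, w, h) face box about its centre so the tracking
    area also covers hair, clothing, and background context, clamped so
    it stays inside the image. `scale` is an assumed, tunable factor."""
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0          # box centre
    nw, nh = w * scale, h * scale              # enlarged size
    nx = max(0.0, cx - nw / 2.0)               # clamp top-left to image
    ny = max(0.0, cy - nh / 2.0)
    nw = min(nw, img_w - nx)                   # clamp size to image edge
    nh = min(nh, img_h - ny)
    return (nx, ny, nw, nh)
```

The tracker would then be initialized with `enlarge_box(face_box, frame_w, frame_h)` rather than the raw detection box.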
In this embodiment, the initial video segment is further shortened by means of the area overlap ratio of the face recognition areas: the images of the initial video segment with good continuity are extracted to form the target video segment. This improves the effectiveness of the image tracking algorithm on the target video segment and raises both the recognition efficiency and the recognition success rate.
Further, on the basis of the above embodiments, the method further includes:
Determining face recognition areas contained in each frame of image of the video through face detection, determining the similarity between the face recognition areas and each reference face image in a database through face recognition for each face recognition area, and determining identity information corresponding to the face recognition areas according to the similarity between the face recognition areas and each reference face image in the database;
the database comprises a corresponding relation between identity information and a reference face image.
The method further includes: determining the position information of each face recognition area in the image through face detection, wherein the position information includes the position of the face recognition area in the image and the size of the face recognition area.
The method for determining the position information of each face recognition area in the image through face detection comprises the following steps:
detecting the face recognition areas in each frame of the video through face detection, and storing quadruple information for each face recognition area; in the quadruple (x, y, w, h), x and y represent the coordinates of the face recognition area relative to the upper left corner of the image, and w and h represent the width and height of the face recognition area.
Specifically, for step 1, the face detection algorithm can detect the face regions in an image, the bounding box coordinates of each face region, and the position coordinates of the facial features. Face detection is performed on all image frames in the video to obtain the position information of all face region bounding boxes. Each bounding box is stored as a quadruple (x, y, w, h).
Determining the identity information corresponding to the face recognition area according to the similarity between the face recognition area and each reference face image in the database includes: for each face recognition area, converting the face recognition area into a feature vector of fixed length; determining the similarity between the face recognition area and each reference face image in the database according to the converted feature vector and the feature vectors stored in the database; and taking the identity information of the reference face image corresponding to the maximum similarity exceeding the face recognition threshold as the identity information corresponding to the face recognition area.
The determining the similarity between the face recognition area and each reference face image in the database according to the converted feature vector and the feature vector stored in the database comprises the following steps:
by the formula

dist(x_i, x_j) = sqrt( sum_{k=1}^{512} (x_{i,k} - x_{j,k})^2 )

calculating the similarity;
wherein x_i and x_j respectively represent two 512-dimensional face feature vectors, and dist ∈ [0, +∞) represents the Euclidean distance between the two feature vectors. If dist is less than the face recognition threshold, x_i and x_j belong to the same person and the face recognition succeeds; otherwise, the face recognition fails.
Specifically, for the step 2, based on the face detection result in the step 1, the detected face image is converted into a feature vector (such as a 512-dimensional vector) with a fixed length, and the best match between the detected feature vector and the feature vector stored in the database is calculated to determine the attribution of the face identity.
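The matching in step 2 can be sketched as a nearest-neighbour search under Euclidean distance. The threshold value 1.0 and the function names below are illustrative assumptions; in practice the vectors would be 512-dimensional as stated above:

```python
import math

def euclidean_dist(a, b):
    # dist(x_i, x_j) = sqrt(sum_k (x_ik - x_jk)^2), per the formula above.
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def match_identity(query, database, threshold=1.0):
    """Return the identity of the closest reference feature vector if its
    distance is below the face recognition threshold, else None.

    `database` maps identity -> reference feature vector. Maximum
    similarity corresponds to minimum Euclidean distance."""
    best_id, best_dist = None, float("inf")
    for identity, ref in database.items():
        d = euclidean_dist(query, ref)
        if d < best_dist:
            best_id, best_dist = identity, d
    return best_id if best_dist < threshold else None
```

A query vector far from every reference vector yields `None`, i.e. face recognition fails for that area.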
The embodiment realizes the identification of the easily-identified image in the video through the face recognition.
For step 7, based on steps 1 to 6, the finally output recognition results for the characters in the video cover not only the face images that are easy to identify but also the characters that are hard to identify because of environmental or camera influences, improving the success rate of character recognition in the video.
Therefore, the method combines face detection, face recognition, and target tracking, improving the recognition rate of characters in the video. The parts of the video that are easy to identify are handled by face recognition, and the parts that are hard to identify are handled by target tracking. Introducing target tracking improves the recognition rate of faces in scenes with occlusion, poor illumination, side faces, lowered heads, exaggerated expressions, long shots, and the like. In addition, the video is split into clips based on the face detection and face recognition results, and clips with potential tracking value are screened by calculating the overlap ratio IoU of the face areas. When the target tracker is initialized, the face region bounding box is enlarged, introducing image information such as the person's hairstyle, clothing style, colors, and background; this improves the discrimination capability of the tracking region and the accuracy and robustness of the tracking result.
Fig. 5 is a block diagram of the configuration of the personal image recognition apparatus provided in this embodiment, referring to fig. 5, the apparatus includes an acquisition module 501 and a recognition module 502, wherein,
an obtaining module 501, configured to obtain, for any identity information identified from each frame of image of a video by face recognition, a first image in the video in which the identity information is not identified by face recognition;
the identifying module 502 is configured to identify, according to a face recognition area corresponding to the identity information in the second image, a target image area corresponding to the identity information from the first image by using an image tracking algorithm;
the second image is an image in the video, wherein the identity information is successfully identified through face recognition.
The person image recognition device provided in this embodiment is applicable to the person image recognition method provided in the foregoing embodiment, and will not be described herein.
According to the character image recognition device provided by this embodiment, after identity information is recognized from the video through face recognition, for each piece of identity information, the first images in which that identity information was not recognized through face recognition are re-identified through an image tracking algorithm, so that the face recognition result is supplemented by the tracking result. The combination of face recognition and the image tracking algorithm improves the success rate of character image recognition for the identity information and reduces missed recognitions.
Further, on the basis of the above embodiment, the identification module is further configured to:
obtain any initial video segment composed of first images whose video playing times are continuous as a video segment to be identified, and obtain, as a tracking image, a second image whose video playing time precedes the playing time of the first frame of the video segment to be identified by a preset duration;
and taking a face recognition area corresponding to the identity information in the tracking image as a target tracking area, and identifying the target image area from the video fragment to be identified through an image tracking algorithm according to the target tracking area.
Further, on the basis of the above embodiments,
the step of obtaining any initial video segment composed of the first images with continuous video playing time as the video segment to be identified comprises the following steps:
for any initial video segment composed of first images with continuous video playing time, determining a first image suspected to contain an image area corresponding to the identity information from the initial video segment according to the position information of the target tracking area in the tracking image and the position information of a face recognition area appearing in each first image in the initial video segment;
And taking the video segment consisting of the first image suspected to contain the image area corresponding to the identity information as the video segment to be identified.
Further, on the basis of the foregoing embodiments, the identifying, by an image tracking algorithm, the target image area corresponding to the identity information from the first image according to the face recognition area corresponding to the identity information in the second image includes:
and identifying a target image area corresponding to the identity information from the first image through an image tracking algorithm according to a person identification area formed by amplifying the face identification area corresponding to the identity information in the second image.
Further, on the basis of the above embodiments,
the identifying the target image area from the initial video segment by an image tracking algorithm according to the target tracking area comprises the following steps:
determining a target video segment which consists of the tracking image and the video segment to be identified, and taking the tracking image as a first frame image;
inputting the target video segment and the position information of the target tracking area in the first frame image of the target video segment into a pre-trained tracking network model, and outputting the similarity between each image area contained in each first image in the target video segment and the target tracking area by the tracking network model;
Identifying the target image area from the target video segment according to the similarity of the tracking network model output to each first image;
the tracking network model is a model trained through machine learning from the position information of a sample person tracking area marked in the first frame image of a sample video and the image areas marked in the sample video as representing the same person as that sample person tracking area; given an input video and the position information of a person tracking area in its first frame image, the model identifies from the video the image areas representing the same person as the person tracking area.
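The inference-time use of such a tracking network can be sketched as below. The `tracker` wrapper and its `init`/`score_regions` methods, and the similarity cut-off 0.6, are hypothetical names introduced for illustration; the patent only specifies that the model outputs per-region similarities to the target tracking area:

```python
def identify_targets(frames, init_box, tracker, sim_threshold=0.6):
    """Sketch of the tracking step: frame 0 is the tracking image,
    init_box is the target tracking area in it. For each subsequent
    first image, pick the candidate region most similar to the target,
    or None if no region is similar enough."""
    tracker.init(frames[0], init_box)          # set the tracking template
    results = []
    for frame in frames[1:]:
        # Hypothetical API: [(region_box, similarity), ...] per frame.
        candidates = tracker.score_regions(frame)
        box, sim = max(candidates, key=lambda c: c[1])
        results.append(box if sim >= sim_threshold else None)
    return results
```

The returned boxes are the target image areas corresponding to the identity information in the first images.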
Further, on the basis of the foregoing embodiments, the determining, from the initial video segment, the first image suspected to include the image area corresponding to the identity information according to the position information of the target tracking area in the tracking image and the position information of the face recognition area appearing in each first image in the initial video segment, includes:
starting from the first image of the first frame of the initial video segment, circularly executing the detection operation until the first image which does not contain the image area corresponding to the identity information is detected or the detection operation is executed on the first image of the last frame of the initial video segment;
Wherein the detecting operation includes:
When the detection operation is executed for the first time, a first area overlap ratio between the target tracking area and each face recognition area in the first-frame first image of the initial video segment is determined according to the position information of the target tracking area in the tracking image and the position information of each face recognition area in that first image. If a first area overlap ratio larger than an overlap ratio threshold exists, the first image is suspected to contain an image area corresponding to the identity information, and the face recognition area with the largest first area overlap ratio is taken as the suspected image area corresponding to the identity information in that first image; otherwise, the first image does not contain an image area corresponding to the identity information.
When the detection operation is not executed for the first time, the suspected image area determined by the previous detection operation is obtained. A second area overlap ratio between that suspected image area and each face recognition area in the first image currently under detection is determined according to the position information of the suspected image area in the previously detected first image and the position information of each face recognition area in the current first image. If a second area overlap ratio larger than the overlap ratio threshold exists, the current first image is suspected to contain an image area corresponding to the identity information, and the face recognition area with the largest second area overlap ratio is taken as that suspected image area; otherwise, the current first image does not contain an image area corresponding to the identity information.
The area overlap ratio is the ratio of a first area, occupied by the overlapping part of the two areas, to the sum of that first area and a second area occupied by the non-overlapping parts of the two areas.
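The cyclic detection operation described above can be sketched as a loop that chains the suspected area from frame to frame. Function names and the threshold default are illustrative; the overlap ratio is the standard intersection-over-union described earlier:

```python
def iou(a, b):
    # Area overlap ratio of two (x, y, w, h) boxes: intersection / union.
    iw = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def find_suspected_areas(target_box, frames_boxes, threshold=0.5):
    """frames_boxes[i]: list of face boxes in the i-th first image.
    The first iteration compares against the target tracking area;
    later iterations compare against the previous suspected area.
    Stops at the first frame with no box whose overlap ratio exceeds
    the threshold, and returns the chain of suspected areas so far."""
    ref = target_box
    suspected = []
    for boxes in frames_boxes:
        scored = [(iou(ref, b), b) for b in boxes]
        best = max(scored, key=lambda s: s[0], default=(0.0, None))
        if best[0] <= threshold:
            break  # frame does not contain the identity's image area
        suspected.append(best[1])
        ref = best[1]
    return suspected
```

The frames whose boxes appear in the returned chain form the video segment to be identified.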
Further, on the basis of the above embodiments, the method further includes:
determining face recognition areas contained in each frame of image of the video through face detection, determining the similarity between the face recognition areas and each reference face image in a database through face recognition for each face recognition area, and determining identity information corresponding to the face recognition areas according to the similarity between the face recognition areas and each reference face image in the database;
the database comprises a corresponding relation between identity information and a reference face image.
Fig. 6 illustrates a physical schematic diagram of an electronic device, as shown in fig. 6, which may include: processor 601, communication interface (Communications Interface) 602, memory 603 and communication bus 604, wherein processor 601, communication interface 602, memory 603 complete the communication between each other through communication bus 604. The processor 601 may call logic instructions in the memory 603 to perform the following method: for any identity information identified from each frame of image of a video through face recognition, acquiring a first image of the video, in which the identity information is not identified through face recognition; identifying a target image area corresponding to the identity information from the first image through an image tracking algorithm according to the face identification area corresponding to the identity information in the second image; the second image is an image in the video, wherein the identity information is successfully identified through face recognition.
Further, the logic instructions in the memory 603 described above may be implemented in the form of software functional units and may be stored in a computer readable storage medium when sold or used as a stand alone product. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Further, embodiments of the present invention disclose a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the methods provided by the above-described method embodiments, for example comprising: for any identity information identified from each frame of image of a video through face recognition, acquiring a first image of the video, in which the identity information is not identified through face recognition; identifying a target image area corresponding to the identity information from the first image through an image tracking algorithm according to the face identification area corresponding to the identity information in the second image; the second image is an image in the video, wherein the identity information is successfully identified through face recognition.
In another aspect, embodiments of the present invention further provide a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the character image recognition method provided in the above embodiments, for example, including: for any identity information identified from each frame of image of a video through face recognition, acquiring a first image of the video, in which the identity information is not identified through face recognition; identifying a target image area corresponding to the identity information from the first image through an image tracking algorithm according to the face identification area corresponding to the identity information in the second image; the second image is an image in the video, wherein the identity information is successfully identified through face recognition.
The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, i.e. they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A character image recognition method, comprising:
for any identity information identified from each frame of image of a video through face recognition, acquiring a first image of the video, in which the identity information is not identified through face recognition;
identifying a target image area corresponding to the identity information from the first image through an image tracking algorithm according to the face identification area corresponding to the identity information in the second image;
the second image is an image in the video, wherein the identity information is successfully identified through face recognition.
2. The person image recognition method according to claim 1, wherein the identifying, by an image tracking algorithm, the target image area corresponding to the identity information from the first image based on the face recognition area corresponding to the identity information in the second image includes:
any initial video segment formed by first images with continuous video playing time is obtained to be used as a video segment to be identified, and a second image with the video playing time before the first frame playing time of the video segment to be identified and with preset duration of the first frame playing time interval is obtained to be used as a tracking image;
And taking a face recognition area corresponding to the identity information in the tracking image as a target tracking area, and identifying the target image area from the video fragment to be identified through an image tracking algorithm according to the target tracking area.
3. The person image recognition method according to claim 2, wherein the acquiring any one of the initial video clips composed of the first images whose video playing time is continuous as the video clip to be recognized includes:
for any initial video segment composed of first images with continuous video playing time, determining a first image suspected to contain an image area corresponding to the identity information from the initial video segment according to the position information of the target tracking area in the tracking image and the position information of a face recognition area appearing in each first image in the initial video segment;
and taking the video segment consisting of the first image suspected to contain the image area corresponding to the identity information as the video segment to be identified.
4. The person image recognition method according to claim 1, wherein the identifying, by an image tracking algorithm, the target image area corresponding to the identity information from the first image based on the face recognition area corresponding to the identity information in the second image includes:
And identifying a target image area corresponding to the identity information from the first image through an image tracking algorithm according to a person identification area formed by amplifying the face identification area corresponding to the identity information in the second image.
5. The person image recognition method according to claim 2, wherein the recognizing the target image area from the initial video clip by an image tracking algorithm according to the target tracking area includes:
determining a target video segment which consists of the tracking image and the video segment to be identified, and taking the tracking image as a first frame image;
inputting the target video segment and the position information of the target tracking area in the first frame image of the target video segment into a pre-trained tracking network model, and outputting the similarity between each image area contained in each first image in the target video segment and the target tracking area by the tracking network model;
identifying the target image area from the target video segment according to the similarity of the tracking network model output to each first image;
the tracking network model is a model trained through machine learning from the position information of a sample person tracking area marked in the first frame image of a sample video and the image areas marked in the sample video as representing the same person as that sample person tracking area, and is used for identifying, from an input video and the position information of a person tracking area in the first frame image of that video, the image areas representing the same person as the person tracking area.
6. The person image recognition method according to claim 3, wherein the determining a first image suspected to include an image area corresponding to the identity information from the initial video clip based on the position information of the target tracking area in the tracking image and the position information of the face recognition area appearing in each first image in the initial video clip, includes:
starting from the first image of the first frame of the initial video segment, circularly executing the detection operation until the first image which does not contain the image area corresponding to the identity information is detected or the detection operation is executed on the first image of the last frame of the initial video segment;
wherein the detecting operation includes:
when the detection operation is executed for the first time, determining, according to the position information of the target tracking area in the tracking image and the position information of each face recognition area in the first-frame first image of the initial video segment, a first area overlap ratio between each such face recognition area and the target tracking area; if a first area overlap ratio larger than an overlap ratio threshold exists, the first image is suspected to contain an image area corresponding to the identity information, and the face recognition area with the largest first area overlap ratio is taken as the suspected image area corresponding to the identity information in that first image; otherwise, the first image does not contain an image area corresponding to the identity information;
when the detection operation is not executed for the first time, obtaining the suspected image area determined by the previous detection operation, and determining, according to the position information of that suspected image area in the previously detected first image and the position information of each face recognition area in the first image currently under detection, a second area overlap ratio between each such face recognition area and the suspected image area corresponding to the identity information; if a second area overlap ratio larger than the overlap ratio threshold exists, the current first image is suspected to contain an image area corresponding to the identity information, and the face recognition area with the largest second area overlap ratio is taken as that suspected image area; otherwise, the current first image does not contain an image area corresponding to the identity information;
wherein the area overlap ratio is determined as the ratio of a first area, occupied by the overlapping part of the two areas, to the sum of the first area and a second area occupied by the non-overlapping parts of the two areas.
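The overlap ratio described above — overlapping area divided by overlapping plus non-overlapping area — is the standard intersection-over-union (IoU) measure, and the per-pass selection of the best-overlapping face recognition area can be sketched as follows. This is an illustrative sketch, not the patented implementation: the `(x, y, w, h)` box format, the function names, and the 0.5 default threshold are all assumptions.

```python
def area_overlap_ratio(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x, y, w, h).

    The first area (overlapping part) divided by the sum of the first
    area and the non-overlapping parts, i.e. the union of both boxes.
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width/height of the overlapping part (zero when the boxes are disjoint).
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h          # first area: overlapping part
    union = aw * ah + bw * bh - inter  # first area + non-overlapping parts
    return inter / union if union else 0.0


def find_suspected_area(reference_box, face_boxes, threshold=0.5):
    """Pick the face recognition area that best overlaps the reference box
    (the target tracking area on the first pass, the previously determined
    suspected area on later passes); None when no ratio exceeds the
    threshold, i.e. the image does not contain the identity's area."""
    best_box, best_ratio = None, threshold
    for box in face_boxes:
        ratio = area_overlap_ratio(reference_box, box)
        if ratio > best_ratio:
            best_box, best_ratio = box, ratio
    return best_box
```

Carrying the best-matching box forward as the next pass's reference box gives the chained, frame-to-frame propagation the claim describes.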
7. The character image recognition method as set forth in claim 1, further comprising:
determining, through face detection, the face recognition areas contained in each frame of image of the video; for each face recognition area, determining, through face recognition, the similarity between the face recognition area and each reference face image in a database, and determining the identity information corresponding to the face recognition area according to those similarities;
the database comprises a corresponding relation between identity information and a reference face image.
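The similarity-based lookup in claim 7 can be sketched as a nearest-reference match against the database. The sketch below is an assumption about the mechanics, not the claimed method's implementation: it presumes faces are compared as L2-normalised embedding vectors (so the dot product is cosine similarity), and the `identify` name, the dictionary-shaped database, and the 0.6 threshold are all hypothetical.

```python
import numpy as np


def identify(embedding, database, threshold=0.6):
    """Return the identity whose reference embedding is most similar to
    `embedding`, or None when no similarity exceeds the threshold.

    `database` maps identity information -> reference face embedding,
    mirroring the claimed correspondence between identity information
    and reference face images. Embeddings are assumed L2-normalised.
    """
    best_id, best_sim = None, threshold
    for identity, reference in database.items():
        sim = float(np.dot(embedding, reference))  # cosine similarity
        if sim > best_sim:
            best_id, best_sim = identity, sim
    return best_id
```

The threshold keeps unknown faces from being forced onto the closest known identity, which is what lets the rest of the method treat some frames as "identity information not identified".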
8. A character image recognition apparatus, comprising:
the acquisition module is used for, for any identity information identified through face recognition from the frames of the video, acquiring a first image of the video in which the identity information is not identified through face recognition;
the identification module is used for identifying, through an image tracking algorithm, a target image area corresponding to the identity information from the first image according to the face recognition area corresponding to the identity information in a second image;
wherein the second image is an image in the video in which the identity information is successfully identified through face recognition.
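The identification module's job — carrying a recognized face area from the second image to the first image with an image tracking algorithm — can be sketched as the frame-walking loop below. The claim does not name a specific tracker, so the tracker is left abstract: `make_tracker` is a hypothetical factory returning any object with `init(frame, box)` and `update(frame) -> (ok, box)` methods (the shape of OpenCV's single-object tracker API), and all names here are illustrative.

```python
from typing import Callable, Optional, Tuple

Box = Tuple[int, int, int, int]  # hypothetical (x, y, w, h) box format


def recover_identity_area(frames, second_idx: int, first_idx: int,
                          init_box: Box,
                          make_tracker: Callable) -> Optional[Box]:
    """Propagate the face recognition area found in the second image
    (frame `second_idx`) to the first image (frame `first_idx`).

    Returns the target image area in the first image, or None when the
    tracker loses the target on an intermediate frame.
    """
    tracker = make_tracker()
    tracker.init(frames[second_idx], init_box)
    step = 1 if first_idx >= second_idx else -1
    box = init_box
    # Walk frame by frame from the second image to the first image,
    # updating the tracked box on every intermediate frame.
    for i in range(second_idx + step, first_idx + step, step):
        ok, box = tracker.update(frames[i])
        if not ok:
            return None  # tracking lost: the identity cannot be recovered
    return box
```

Because the second image may lie before or after the first image in the video, the loop supports walking in either direction.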
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the character image recognition method according to any one of claims 1 to 7 when the program is executed.
10. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor, implements the steps of the character image recognition method according to any one of claims 1 to 7.
CN202010213416.7A 2020-03-24 2020-03-24 Character image recognition method and device, electronic equipment and storage medium Active CN111444817B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010213416.7A CN111444817B (en) 2020-03-24 2020-03-24 Character image recognition method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010213416.7A CN111444817B (en) 2020-03-24 2020-03-24 Character image recognition method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111444817A CN111444817A (en) 2020-07-24
CN111444817B true CN111444817B (en) 2023-07-07

Family

ID=71652460

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010213416.7A Active CN111444817B (en) 2020-03-24 2020-03-24 Character image recognition method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111444817B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112200084A (en) * 2020-10-10 2021-01-08 华航高科(北京)技术有限公司 Face recognition method and device for video stream, electronic equipment and storage medium
CN112614160B (en) * 2020-12-24 2021-08-31 中标慧安信息技术股份有限公司 Multi-object face tracking method and system
CN117036418A (en) * 2022-04-29 2023-11-10 广州视源电子科技股份有限公司 Image processing method, device and equipment

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9449216B1 (en) * 2013-04-10 2016-09-20 Amazon Technologies, Inc. Detection of cast members in video content
CN108009482A (en) * 2017-11-25 2018-05-08 宁波亿拍客网络科技有限公司 One kind improves recognition of face efficiency method
CN109325964A (en) * 2018-08-17 2019-02-12 深圳市中电数通智慧安全科技股份有限公司 A kind of face tracking methods, device and terminal
CN109344787A (en) * 2018-10-15 2019-02-15 浙江工业大学 A kind of specific objective tracking identified again based on recognition of face and pedestrian
CN110008793A (en) * 2018-01-05 2019-07-12 中国移动通信有限公司研究院 Face identification method, device and equipment
CN110516755A (en) * 2019-08-30 2019-11-29 上海依图信息技术有限公司 A kind of the body track method for real time tracking and device of combination speech recognition
CN110766093A (en) * 2019-10-31 2020-02-07 智慧视通(杭州)科技发展有限公司 Video target re-identification method based on multi-frame feature fusion
CN112954451A (en) * 2021-02-05 2021-06-11 广州市奥威亚电子科技有限公司 Method, device and equipment for adding information to video character and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8363951B2 (en) * 2007-03-05 2013-01-29 DigitalOptics Corporation Europe Limited Face recognition training method and apparatus
US8180112B2 (en) * 2008-01-21 2012-05-15 Eastman Kodak Company Enabling persistent recognition of individuals in images

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9449216B1 (en) * 2013-04-10 2016-09-20 Amazon Technologies, Inc. Detection of cast members in video content
CN108009482A (en) * 2017-11-25 2018-05-08 宁波亿拍客网络科技有限公司 One kind improves recognition of face efficiency method
CN110008793A (en) * 2018-01-05 2019-07-12 中国移动通信有限公司研究院 Face identification method, device and equipment
CN109325964A (en) * 2018-08-17 2019-02-12 深圳市中电数通智慧安全科技股份有限公司 A kind of face tracking methods, device and terminal
CN109344787A (en) * 2018-10-15 2019-02-15 浙江工业大学 A kind of specific objective tracking identified again based on recognition of face and pedestrian
CN110516755A (en) * 2019-08-30 2019-11-29 上海依图信息技术有限公司 A kind of the body track method for real time tracking and device of combination speech recognition
CN110766093A (en) * 2019-10-31 2020-02-07 智慧视通(杭州)科技发展有限公司 Video target re-identification method based on multi-frame feature fusion
CN112954451A (en) * 2021-02-05 2021-06-11 广州市奥威亚电子科技有限公司 Method, device and equipment for adding information to video character and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
C Li et al. Fast algorithms for video target detection and tracking. Signal Processing, Sensor Fusion, and Target Recognition. 2008, full text. *
Yu Jie. Research on video-based human target tracking and recognition technology. China Master's Theses Full-text Database (Information Science and Technology Series). 2017, full text. *

Also Published As

Publication number Publication date
CN111444817A (en) 2020-07-24

Similar Documents

Publication Publication Date Title
CN111444817B (en) Character image recognition method and device, electronic equipment and storage medium
CN110569731B (en) Face recognition method and device and electronic equipment
CN109635686B (en) Two-stage pedestrian searching method combining human face and appearance
US20070122009A1 (en) Face recognition method and apparatus
Yang et al. Tracking based multi-orientation scene text detection: A unified framework with dynamic programming
CN112016353B (en) Method and device for carrying out identity recognition on face image based on video
Liu et al. An efficient finger-vein extraction algorithm based on random forest regression with efficient local binary patterns
CN111144293A (en) Human face identity authentication system with interactive living body detection and method thereof
US20120257802A1 (en) Apparatus and method for generating representative fingerprint template
JP2005149506A (en) Method and apparatus for automatic object recognition/collation
JP2000003452A (en) Method for detecting face surface in digital picture, its detecting device, picture judging method, picture judging device and computer readable record medium
CN111401171B (en) Face image recognition method and device, electronic equipment and storage medium
CN110069989B (en) Face image processing method and device and computer readable storage medium
CN112487886A (en) Method and device for identifying face with shielding, storage medium and terminal
CN110866466A (en) Face recognition method, face recognition device, storage medium and server
CN112541448A (en) Pedestrian re-identification method and device, electronic equipment and storage medium
CN111401196A (en) Method, computer device and computer readable storage medium for self-adaptive face clustering in limited space
CN111105443A (en) Video group figure motion trajectory tracking method based on feature association
CN111488798B (en) Fingerprint identification method, fingerprint identification device, electronic equipment and storage medium
CN114429649B (en) Target image identification method and device
CN111353385A (en) Pedestrian re-identification method and device based on mask alignment and attention mechanism
CN111444875B (en) Face tracking method, device, equipment and computer readable storage medium
CN114638963B (en) Method and device for identifying and tracking suspicious tissues in endoscopy
KR102592601B1 (en) Face recognition method using facial emotion synthesys
CN114399731A (en) Target positioning method under single-coarse-point supervision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant