WO2023173686A1

WO2023173686A1 - Detection method and apparatus, electronic device, and storage medium

Info

Publication number: WO2023173686A1
Application number: PCT/CN2022/114904
Authority: WO
Inventors: 张殿炎; 尹瑞鹏; 胡文超
Original assignee: 上海商汤智能科技有限公司
Priority date: 2022-03-17
Filing date: 2022-08-25
Publication date: 2023-09-21
Also published as: CN114612986A

Abstract

The present disclosure relates to a detection method and apparatus, an electronic device and a storage medium. The detection method comprises: receiving an image sequence sent by a terminal in response to an action sequence, the image sequence comprising multiple frames of images; sequentially acquiring one action content in the action sequence as current action content, and performing the following operations: determining a starting image corresponding to the current action content, and sequentially determining action scores of each image after the starting image in the image sequence and of the current action content; determining, according to the action score of any image, a matching result corresponding to the current action content; determining a matching result between the image sequence and the action sequence according to the matching results corresponding to all the action content in the action sequence; and generating a detection result on the basis of the matching result between the image sequence and the action sequence. The present disclosure can improve the security of a user detection environment and the accuracy of detection results.

Description

Detection method and apparatus, electronic device, and storage medium

This application claims priority to the Chinese patent application filed on March 17, 2022, with the application number 202210265003.2 and the invention title "Detection method, device, electronic equipment and storage medium", the entire content of which is incorporated into this application by reference.

Technical field

The present disclosure relates to the field of detection, and in particular, to a detection method, device, electronic equipment and storage medium.

Background

In scenarios that require human-machine verification, such as online finance and account login, operators want the users who pass human-machine verification to be the real account owners, not program scripts or impostors. If the user in a human-machine verification scenario is a program script or an impostor, the verification is very likely malicious. That is, the verification environment is not safe and may easily cause property losses to the user. Therefore, how to improve the security of the verification environment is a problem that urgently needs to be solved.

Summary

The present disclosure proposes a technical solution for detection.

According to an aspect of the present disclosure, a detection method is provided and applied to a server. The detection method includes: receiving an image sequence sent by a terminal in response to an action sequence, where the image sequence includes multiple frames of images; sequentially acquiring one action content in the action sequence as the current action content, and performing the following operations: determining a starting image corresponding to the current action content, and sequentially determining an action score of at least one image after the starting image in the image sequence with respect to the current action content; determining, according to the action score of any image, a matching result corresponding to the current action content; determining a matching result between the image sequence and the action sequence according to the matching results corresponding to all action contents in the action sequence; and generating a detection result based on the matching result between the image sequence and the action sequence.

In a possible implementation, determining the starting image corresponding to the current action content includes: when the current action content is determined to be the first action content in the action sequence, the starting image is the starting image of the image sequence; when the current action content is determined not to be the first action content in the action sequence, the starting image is the frame following the image that successfully matched the previous action content.

In a possible implementation, determining the matching result between the image sequence and the action sequence includes: determining that the matching result between the image sequence and the action sequence is a matching failure in the case that no successful matching result for the action sequence is obtained within a first preset time from the start of determining the matching result of the action sequence, and/or in the case that no successful matching result for an action content is obtained within a second preset time from the start of determining the matching result of any action content in the action sequence.

In a possible implementation, generating a detection result based on the matching result between the image sequence and the action sequence includes: in the case that the matching result between the image sequence and the action sequence is determined to be a successful match, selecting first images from the image sequence; generating a living body detection result based on the first images; and determining the detection result based on the living body detection result, wherein, in the case that the living body detection result is determined to be a living body, the detection result is that the detection passes.

In a possible implementation, selecting the first images from the image sequence includes: selecting, from the image sequence, a preset number of first images whose action scores are greater than or equal to a second score threshold.

In a possible implementation, generating a living body detection result based on the first images includes: generating, for each first image, a corresponding living body detection sub-result; selecting the first image with the highest action score as a second image; and determining that the living body detection result is a living body in the case that the living body detection sub-result corresponding to the second image is a living body and the ratio of the number of first images whose living body detection sub-result is a living body to the number of all first images is greater than or equal to a preset ratio.
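The selection of first images and the living body decision described above can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the function names, the `(action_score, is_live)` tuple layout, and the rule of keeping the highest-scoring images when more than the preset number qualify are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical record for one frame: (action_score, living-body sub-result).
ScoredImage = Tuple[float, bool]


def select_first_images(
    scored: List[ScoredImage], second_threshold: float, preset_number: int
) -> List[ScoredImage]:
    """Keep up to `preset_number` images whose action score >= second_threshold."""
    eligible = [item for item in scored if item[0] >= second_threshold]
    # Assumption: when too many images qualify, prefer the highest scores.
    eligible.sort(key=lambda item: item[0], reverse=True)
    return eligible[:preset_number]


def liveness_result(first_images: List[ScoredImage], preset_ratio: float) -> bool:
    """Living body only if the second image is live AND enough first images are live."""
    if not first_images:
        return False
    # The "second image" is the first image with the highest action score.
    _, second_is_live = max(first_images, key=lambda item: item[0])
    live_count = sum(1 for _, is_live in first_images if is_live)
    return second_is_live and live_count / len(first_images) >= preset_ratio
```

Both conditions must hold: a high proportion of live sub-results is not enough if the best-scoring frame itself is judged not to be a living body.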

In a possible implementation, receiving the image sequence sent by the terminal in response to the action sequence includes: decrypting the image sequence sent by the terminal in response to the action sequence to obtain a decrypted image sequence; and sequentially determining the action score of at least one image after the starting image in the image sequence with respect to the current action content includes: sequentially determining the action score of at least one image after the starting image in the decrypted image sequence with respect to the current action content.

In a possible implementation, determining the matching result between the image sequence and the action sequence further includes: generating at least one of facial area coordinates and a facial number corresponding to the images in the image sequence; and determining the matching result based on the image sequence, the action sequence, and at least one of the facial area coordinates and the facial number.

In a possible implementation, determining the matching result based on the image sequence, the action sequence, and at least one of the facial area coordinates and the facial number includes: determining the matching result between the action sequence and the facial area indicated by the facial area coordinates in at least one image of the image sequence, and using it as the matching result between the image sequence and the action sequence.

In a possible implementation, determining the matching result based on the image sequence, the action sequence, and at least one of the facial area coordinates and the facial number further includes: in the case that the number of images in the image sequence corresponding to the smallest facial number is determined to be greater than a first threshold, determining that the matching result is a matching failure.

In a possible implementation, the detection method further includes at least one of the following: in the case that the detection result is determined to be a detection failure, sending a first instruction to the terminal, the first instruction being used to control the terminal to enter a page for resending a detection request; in the case that the number of times the first instruction has been sent to the terminal within a third preset time is determined to reach a second threshold, sending a second instruction to the terminal in response to a detection request sent by the terminal through the page, the second instruction being used to notify the terminal that the server rejects the detection request; and in the case that the time from sending the first instruction to receiving a new image sequence is determined to be greater than a fourth preset time, sending the second instruction to the terminal.
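The retry and rejection conditions above can be sketched as a small bookkeeping class. This is only an illustration under stated assumptions: the class name `RetryPolicy`, its method names, and the use of plain timestamps in seconds are hypothetical and do not come from the disclosure.

```python
from collections import deque
from typing import Deque, Optional


class RetryPolicy:
    """Decides when the server should send the second (reject) instruction."""

    def __init__(self, window_s: float, max_sends: int, resend_timeout_s: float):
        self.window_s = window_s                  # third preset time
        self.max_sends = max_sends                # second threshold
        self.resend_timeout_s = resend_timeout_s  # fourth preset time
        self._sent: Deque[float] = deque()        # times the first instruction was sent
        self._last_sent: Optional[float] = None

    def record_first_instruction(self, now: float) -> None:
        """Call whenever the first (retry) instruction is sent to the terminal."""
        self._sent.append(now)
        self._last_sent = now

    def should_reject(self, now: float) -> bool:
        """Call when a new detection request or image sequence arrives."""
        # Forget retry instructions that fall outside the counting window.
        while self._sent and now - self._sent[0] > self.window_s:
            self._sent.popleft()
        too_many = len(self._sent) >= self.max_sends
        too_slow = (
            self._last_sent is not None
            and now - self._last_sent > self.resend_timeout_s
        )
        return too_many or too_slow
```

For example, with `RetryPolicy(60, 3, 30)`, three retry instructions within 60 seconds lead to rejection, as does a new image sequence that arrives more than 30 seconds after the last retry instruction.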

According to an aspect of the present disclosure, a detection device is provided, which is applied to a server. The detection device includes: an image sequence receiving module, configured to receive an image sequence sent by a terminal in response to an action sequence, the image sequence including multiple frames of images; an action content processing module, configured to sequentially obtain one action content in the action sequence as the current action content and perform the following operations: determining a starting image corresponding to the current action content, and sequentially determining the action score of at least one image after the starting image in the image sequence with respect to the current action content; determining the matching result corresponding to the current action content according to the action score of any image; and determining the matching result between the image sequence and the action sequence according to the matching results corresponding to all action contents in the action sequence; and a detection result generation module, configured to generate a detection result based on the matching result between the image sequence and the action sequence.

According to an aspect of the present disclosure, an electronic device is provided, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to call the instructions stored in the memory to execute the detection method described in any one of the above.

According to an aspect of the present disclosure, a computer-readable storage medium is provided, on which computer program instructions are stored. When the computer program instructions are executed by a processor, any one of the above detection methods is implemented.

According to an aspect of the present disclosure, a computer program product is provided, including computer readable code, or a non-volatile computer readable storage medium carrying the computer readable code. When the computer readable code runs in a processor of an electronic device, the processor in the electronic device executes the detection method described in any one of the above.

The present disclosure provides a detection method. The server can receive an image sequence sent by a terminal in response to an action sequence, where the image sequence includes multiple frames of images, and then sequentially obtain one action content in the action sequence as the current action content and perform the following operations: determining the starting image corresponding to the current action content; sequentially determining the action score of at least one image after the starting image in the image sequence with respect to the current action content; determining the matching result corresponding to the current action content according to the action score of any image; and determining the matching result between the image sequence and the action sequence according to the matching results corresponding to all action contents in the action sequence. Finally, a detection result is generated based on the matching result between the image sequence and the action sequence. Since the detection result is generated on the server, the possibility of a malicious program altering the detection result is reduced. Combined with the action scores, the accuracy of the matching results can be further improved, so that the security of the verification environment is detected accurately. In addition, because the computing power of the server is higher than that of the terminal, the server can reduce the time required to generate the detection result, or use a detection model that is computationally more complex but more accurate.

It is to be understood that the foregoing general description and the following detailed description are exemplary and explanatory only, and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.

Description of the drawings

The accompanying drawings herein are incorporated into and constitute a part of this specification. They illustrate embodiments consistent with the disclosure and, together with the description, serve to explain the technical solutions of the disclosure.

Figure 1 shows a flow chart of a detection method according to an embodiment of the present disclosure.

Figure 2 shows a flow chart of a detection method according to an embodiment of the present disclosure.

Figure 3 shows a block diagram of a detection device according to an embodiment of the present disclosure.

Figure 4 shows a block diagram of an electronic device according to an embodiment of the present disclosure.

Detailed description

Various exemplary embodiments, features, and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. The same reference numbers in the drawings identify functionally identical or similar elements. Although various aspects of the embodiments are illustrated in the drawings, the drawings are not necessarily drawn to scale unless otherwise indicated.

The word "exemplary" as used herein means "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as superior to or better than other embodiments.

The term "and/or" herein merely describes an association between related objects and indicates that three relationships can exist. For example, A and/or B can mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the term "at least one" herein means any one of multiple items or any combination of at least two of multiple items; for example, including at least one of A, B, and C can mean including any one or more elements selected from the set consisting of A, B, and C.

In addition, in order to better explain the present disclosure, numerous specific details are given in the following detailed description. It will be understood by those skilled in the art that the present disclosure may be practiced without certain specific details. In some instances, methods, means, components and circuits that are well known to those skilled in the art are not described in detail in order to emphasize the subject matter of the disclosure.

In the related art, detection technology is usually built into the terminal application, so the detection process is roughly as follows: the terminal receives the user's detection request, uses the built-in detection technology to identify the user's image data and generate a detection result, and then sends the detection result to the server; the server determines, based on the detection result, whether to provide further services to the terminal.

However, such an arrangement can easily cause the following problems. 1. The detection result is generated by the terminal and then transmitted to the server, so it is easily tampered with by malicious programs. For example, the terminal's detection result is a matching failure, but a malicious program changes it to a successful match before it is sent to the server. The server then believes that it can provide further services to the terminal, that is, it believes that the terminal's detection environment is safe, when in fact the environment is not safe and the malicious program can easily cause property losses to the user. 2. The detection technology is integrated into the application, and the computing power available to it is limited. To keep the user's waiting time short, the application cannot readily use computationally more complex detection technology under limited computing power, so its detection accuracy is limited.

In view of this, an embodiment of the present disclosure provides a detection method. The server can receive an image sequence sent by the terminal in response to an action sequence, where the image sequence includes multiple frames of images, and then sequentially obtain one action content in the action sequence as the current action content and perform the following operations: determining the starting image corresponding to the current action content; sequentially determining the action score of at least one image after the starting image in the image sequence with respect to the current action content; determining the matching result corresponding to the current action content according to the action score of any image; and determining the matching result between the image sequence and the action sequence according to the matching results corresponding to all action contents in the action sequence. Finally, a detection result is generated based on the matching result between the image sequence and the action sequence. Since the detection result is generated on the server, the possibility of a malicious program altering the detection result is reduced. Combined with the action scores, the accuracy of the matching results can be further improved, so that the security of the verification environment is detected accurately. In addition, since the computing power of the server is higher than that of the terminal, the server can reduce the time required to generate the detection result, or use a detection model that is computationally more complex but more accurate. The terminal can present pages such as image sequence recording, retry, and detection result in the form of H5 web pages, which keeps the terminal-side program lightweight and reduces the requirements on terminal computing power.

Illustratively, the above detection method is executed by a server. For example, the above server can be a physical server, a virtual host, a virtual private server (Virtual Private Server, VPS), a cloud server, etc. The server interacts with a terminal, which can be: a mobile device, a user terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, etc. In some possible implementations, the above detection method can also be implemented by the processor calling computer readable instructions stored in the memory.

Referring to Figure 1, Figure 1 shows a flow chart of a detection method according to an embodiment of the present disclosure. As shown in Figure 1, the above-mentioned detection method includes the following steps:

In step S100, an image sequence sent by the terminal in response to an action sequence is received, where the image sequence includes multiple frames of images. For example, before performing this step, the server may, in response to a detection request sent by the terminal, randomly select action contents to generate the action sequence. In one example, the action sequence includes multiple action contents, and each action content indicates a facial action that the user needs to complete. The server can obtain the action contents from a preset action content library that stores multiple action contents. In one example, the server can randomly select a fixed number of action contents from the preset action content library and arrange them in random order (that is, in response to each detection request sent by the terminal, the server can randomly issue to the terminal action sequences that differ in number and order). For example, if the server is set to select 3 action contents, it selects 3 action contents from candidates such as blinking, shaking the head, nodding, opening the mouth, tilting the head, and smiling, and shuffles their order to obtain a random action sequence. In an embodiment of the present disclosure, the action sequence is randomly generated, that is, each time the terminal sends a detection request, the action sequence it obtains is very likely to be different. This reduces the possibility that a malicious program records the image sequence in advance, thereby improving the security of the detection environment. Following the above example, the server can also select a random number of action contents and arrange them in random order to further improve the security of the detection environment.

For example, if the server is set to select 2 to 5 action contents, it selects a random number of action contents from candidates such as blinking, shaking the head, nodding, opening the mouth, tilting the head, and smiling, and shuffles their order to obtain a random action sequence. The server can then send the action sequence to the terminal. For example, after receiving the action sequence, the terminal prompts the user with it, for instance through voice or text. The user then performs the actions in the action sequence while the terminal records the image sequence. After the recording is completed, the terminal sends the recorded image sequence to the server. For example, the terminal can also limit the maximum recording duration of the image sequence, and prompt the user about it, to save server computing power. In one example, the terminal can interact with the user through a web page to implement a lightweight detection method. The image sequence can be a video or a sequence of continuously captured images.
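The random generation of an action sequence described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the `ACTION_LIBRARY` contents, the function name, and its parameters are assumptions chosen to mirror the examples in the text.

```python
import random
from typing import List, Optional

# Hypothetical preset action content library, named after the examples in the text.
ACTION_LIBRARY = ["blink", "shake_head", "nod", "open_mouth", "tilt_head", "smile"]


def generate_action_sequence(
    min_count: int = 2, max_count: int = 5, rng: Optional[random.Random] = None
) -> List[str]:
    """Pick a random number of distinct actions and return them in random order."""
    rng = rng or random.Random()
    count = rng.randint(min_count, max_count)
    # random.sample draws `count` distinct items in random order, so the
    # selection and the shuffling of the order happen in a single step.
    return rng.sample(ACTION_LIBRARY, count)
```

Passing `min_count=3, max_count=3` gives the fixed-count variant. Because the sequence is meant to be hard to predict, a deployment would probably pass `random.SystemRandom()` as `rng` rather than the default pseudo-random generator.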

In a possible implementation, the terminal can send an encrypted image sequence to the server to improve the security of the image sequence during transmission. In this case, step S100 can include: decrypting the image sequence sent by the terminal in response to the action sequence to obtain a decrypted image sequence. Subsequent steps are then executed based on the decrypted image sequence. By encrypting the image sequence, an embodiment of the present disclosure can reduce the risk of the image sequence being altered by malicious programs, thereby improving its security during transmission. For example, the image sequence may be encrypted frame by frame to further increase its security during transmission.

Continuing to refer to Figure 1, in step S200, one action content in the action sequence is sequentially obtained as the current action content, and the following operations are performed:

In step S210, the starting image corresponding to the current action content is determined, and the action score of at least one image after the starting image in the image sequence with respect to the current action content is determined in sequence. The action score can be positively correlated with how closely the action in the image conforms to the current action content, and can be obtained through a machine learning model. For example, if the machine learning model is a binary classification model (that is, each input frame is classified as 'is the action content' or 'is not the action content'), then in the process of classifying an image the model first generates the action score of the input image, and then classifies the input image as 'is the action content' when that action score is greater than or equal to a score threshold. In other words, the action score mentioned in an embodiment of the present disclosure may be equal to the action score used by the machine learning model in the classification process. For example, when the current action content is determined to be the first action content in the action sequence, the starting image is the starting image of the image sequence; for instance, the first frame is used as the starting image of the image sequence, or a certain frame in the image sequence is specified in advance as the starting image. If the current action content is not the first action content in the action sequence, the starting image is the frame following the image that successfully matched the previous action content. By setting a starting image, an embodiment of the present disclosure reduces the amount of computation required for action content matching. In addition, the starting image can serve as a marker of the order of the action contents in the image sequence.

For example, if the image sequence has 20 frames in total and the action sequence includes blinking, opening the mouth, and raising the head, the server uses the first frame as the starting image of the blinking action and generates action scores. If the blinking action is successfully matched at the 6th frame, the 7th frame is used as the starting image of the mouth-opening action. If the mouth-opening action is successfully matched at the 12th frame, the 13th frame is used as the starting image of the head-raising action. If the head-raising action is successfully matched at the 15th frame, frames 16 to 20 need not be examined, which saves server computing power. The action contents corresponding to the successfully matched frames 6, 12, and 15 then give the order of the action contents in the image sequence. In a possible implementation, if the image sequence is encrypted by the terminal, step S210 may include: sequentially determining the action score of at least one image after the starting image in the decrypted image sequence with respect to the current action content. By encrypting the image sequence, an embodiment of the present disclosure can reduce the risk of the image sequence being altered by malicious programs, thereby improving its security during transmission.
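The starting-image logic of step S210 can be sketched as a single pass over the frames. This is an illustration under stated assumptions: `match_actions` and its parameters are hypothetical names, the `score` callable stands in for whatever machine learning model produces the action score, and a match is taken to mean a score greater than the first score threshold.

```python
from typing import Callable, List, Optional, Sequence


def match_actions(
    frames: Sequence[object],
    actions: Sequence[str],
    score: Callable[[object, str], float],
    first_threshold: float,
) -> Optional[List[int]]:
    """Return the 0-based index of the matching frame for each action, or None."""
    matched: List[int] = []
    start = 0  # starting image for the first action: the first frame
    for action in actions:
        hit = None
        # Score frames in order, beginning at the starting image.
        for i in range(start, len(frames)):
            if score(frames[i], action) > first_threshold:
                hit = i
                break
        if hit is None:
            return None  # this action never matched: the sequence fails
        matched.append(hit)
        start = hit + 1  # next action starts at the frame after the match
    return matched
```

With the 20-frame example above (matches at frames 6, 12, and 15), the function returns the indices 5, 11, and 14 and never scores frames 16 to 20. Because each action can only match after the previous one, an image sequence in which the actions appear in the wrong order yields `None`.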

In step S220, the matching result corresponding to the current action content is determined based on the action score of any image. For example, when the action score of any image is greater than a first score threshold, the matching result corresponding to the current action content is determined to be a successful match. The server administrator can set the first score threshold according to the actual situation. For example, the higher the first score threshold, the more closely the action in the image must conform to the action content, and the more accurate the final matching result. The embodiments of the present disclosure do not limit the specific value of the first score threshold. For example, if no successful matching result for the action sequence is obtained within a first preset time from the start of determining the matching result of the action sequence, and/or no successful matching result for an action content is obtained within a second preset time from the start of determining the matching result of any action content in the action sequence, the matching result between the image sequence and the action sequence is determined to be a matching failure. For example, if the first preset time is 20 seconds and the server does not complete the matching of every action content in the action sequence within 20 seconds, the server determines that the image sequence fails to match. If the second preset time is 5 seconds and the server does not complete the matching of a certain action content within 5 seconds, that is, the matching of that action content has run for 5 seconds without success, the server determines that the image sequence fails to match.

Setting the above conditions can further increase the security of the user verification stage and improve verification efficiency. The embodiments of the present disclosure do not limit the specific values of the first preset time and the second preset time.
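The two time limits can be layered onto a frame-by-frame matching loop as sketched below. This is an assumption-laden illustration: the text does not say how the elapsed time is measured, so the sketch uses server-side elapsed time read from an injectable `clock`, and all function and parameter names are hypothetical.

```python
import time
from typing import Callable, Sequence


def match_with_time_limits(
    frames: Sequence[object],
    actions: Sequence[str],
    score: Callable[[object, str], float],
    first_threshold: float,
    sequence_limit_s: float = 20.0,  # first preset time (whole action sequence)
    action_limit_s: float = 5.0,     # second preset time (any single action)
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True only if every action matches in order before either limit expires."""
    sequence_start = clock()
    start = 0
    for action in actions:
        action_start = clock()
        hit = None
        for i in range(start, len(frames)):
            now = clock()
            if now - sequence_start > sequence_limit_s:
                return False  # whole sequence took too long
            if now - action_start > action_limit_s:
                return False  # this action took too long
            if score(frames[i], action) > first_threshold:
                hit = i
                break
        if hit is None:
            return False
        start = hit + 1
    return True
```

Injecting the clock keeps the limits testable without real waiting; the default `time.monotonic` is not affected by system clock adjustments.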

In step S230, the matching result between the image sequence and the action sequence is determined based on the matching results corresponding to all action contents in the action sequence. For example, when the matching results corresponding to all action contents in the action sequence are successful matches, the matching result between the image sequence and the action sequence is determined to be a successful match. For example, the matching result can be determined through machine learning models: each action content in the action sequence corresponds to one machine learning model, and the server performs matching detection on the image sequence by calling, in sequence, the machine learning model corresponding to each action content in the action sequence. The embodiments of the present disclosure do not limit the training method of the machine learning models, as long as each machine learning model can detect its corresponding action content. For example, the input of a machine learning model may be an image, and the output may be the matching result for the action content corresponding to that model. For example, the machine learning model can determine whether the action content is successfully matched by extracting the positional relationships between facial key points in the image (for instance, using algorithms such as the Active Shape Model, Active Appearance Models, cascaded pose regression, or temporal action detection). Illustratively, in an embodiment of the present disclosure, the machine learning model is deployed on the server, which has higher computing power, rather than on the terminal. That is, the detection method of an embodiment of the present disclosure can use a machine learning model that is computationally more complex but more accurate.

For example, a machine learning model from the related art that incorporates matching logic across consecutive images can be used to make the matching results more accurate. For example, for an image that successfully matches the 'open mouth' action content, the vertical distance between the mouth key points in the preceding image should be smaller than the vertical distance between the mouth key points in that image (that is, the user's face has gone from a 'closed mouth' state to an 'open mouth' state). For such machine learning models, reference may be made to the related art, and details are not repeated here. In other words, in an embodiment of the present disclosure, the server can perform action detection and living body detection frame by frame, drawing on information from preceding and following frames to increase the accuracy of the detection results. For example, various time limits can also be added to the detection process (described in detail later) to further increase the security of the user environment.
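The consecutive-frame condition for the 'open mouth' action can be sketched as a comparison of mouth key-point distances. This is only an illustration: the landmark dictionary layout and the key names `upper_lip` and `lower_lip` are hypothetical, and a real model would obtain the key points from a facial landmark detector.

```python
from typing import Dict, Tuple

# Hypothetical landmark format: name -> (x, y) pixel coordinates.
Landmarks = Dict[str, Tuple[float, float]]


def mouth_opening(landmarks: Landmarks) -> float:
    """Vertical distance between the upper-lip and lower-lip key points."""
    return abs(landmarks["lower_lip"][1] - landmarks["upper_lip"][1])


def opened_since_previous(previous: Landmarks, current: Landmarks) -> bool:
    """True if the mouth is more open in the current frame than in the previous one."""
    return mouth_opening(previous) < mouth_opening(current)
```

A check of this kind would complement the per-frame action score rather than replace it: a single frame showing an open mouth passes only if the preceding frame showed the mouth less open.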

In a possible implementation, step S200 may include: determining that the matching result is a successful match when it is determined that the action contents in the action sequence correspond one-to-one, and in the same order, to the action contents detected in the image sequence. For example, if the action sequence includes blinking, shaking the head, and opening the mouth, the action contents in the image sequence should follow the order of blinking, shaking the head, and opening the mouth. If the order of the action contents in the image sequence is blinking, opening the mouth, and shaking the head, the matching result is determined to be a matching failure. If the image sequence contains only blinking and shaking the head, the matching result is also determined to be a matching failure. An embodiment of the present disclosure can accurately determine whether the user's detection environment is safe by detecting the number and order of the action contents.
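The ordered matching above can be sketched as follows. This is a simplified illustration, not the disclosed implementation: `score_action` is a hypothetical stand-in for the per-action machine learning models, and the starting image of each action content is taken to be the frame following the image that matched the previous action content, as stated in the claims.

```python
from typing import Any, Callable, Sequence


def match_action_sequence(
    image_sequence: Sequence[Any],
    action_sequence: Sequence[str],
    score_action: Callable[[Any, str], float],  # hypothetical scorer
    first_score_threshold: float,
) -> bool:
    """Return True if every action content is matched in order."""
    start = 0  # starting image of the first action content
    for action in action_sequence:
        matched_index = None
        # Score images one by one, beginning at the starting image.
        for index in range(start, len(image_sequence)):
            score = score_action(image_sequence[index], action)
            if score >= first_score_threshold:
                matched_index = index
                break
        if matched_index is None:
            return False  # this action content was never matched
        # The next action content starts at the following frame.
        start = matched_index + 1
    return True
```

With a toy scorer that returns 100 when a frame shows the requested action and 0 otherwise, the sequence blinking, shaking the head, opening the mouth succeeds, while both failure cases from the paragraph above return False.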

In a possible implementation, to save computing power and improve user security, determining the matching result between the image sequence and the action sequence may include: generating at least one of facial area coordinates and a facial number corresponding to the images in the image sequence; and determining the matching result based on the at least one of the facial area coordinates and the facial number, the image sequence, and the action sequence.

For example, the facial area coordinates can be obtained through a facial area extraction model in the related art, which is not limited in the embodiments of the present disclosure. The facial area coordinates indicate the user's facial area in each image of the image sequence. In one example, determining the matching result may be: determining the matching result between the facial area indicated by the facial area coordinates in at least one image of the image sequence and the action sequence, and using it as the matching result between the image sequence and the action sequence. For example, steps S200 and S300 can be performed on the facial area instead of the "image" mentioned above to obtain the detection result. By using the facial area coordinates, local matching of images can be achieved, thereby reducing the computing power consumption of the server.

The above facial number is used to distinguish users with different facial features and can be obtained through the above facial area extraction model, so as to ensure that the facial images in the image sequence generally come from the same user. In other words, if the image sequence includes images of user A and user B, the facial area images of user A and those of user B correspond to different facial numbers. In one example, when it is determined that the number of images corresponding to the facial number with the fewest images in the image sequence is greater than a first threshold, the matching result is determined to be a matching failure. For example, the image sequence contains 15 frames of user A and 20 frames of user B. If the first threshold is 10 frames, the server determines that the matching result is a matching failure (that is, 15 frames is greater than 10 frames), reducing the probability that a situation in which user A and user B verify at the same time goes unrecognized. With this setting, the server can tolerate unexpected situations within certain limits when the terminal collects image sequences (for example, the terminal's camera capturing a face behind the user) while still ensuring the security of the user verification environment.
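The facial number rule can be sketched as below. The sketch assumes each image has already been assigned a facial number by a face extraction model, and it assumes the rule only applies when more than one facial number is present; that second assumption is an interpretation, not something the text states.

```python
from collections import Counter
from typing import Hashable, Sequence


def passes_face_number_check(face_numbers: Sequence[Hashable],
                             first_threshold: int) -> bool:
    """Return False (matching failure) when a second face appears too often.

    face_numbers holds one facial number per image in the image sequence.
    """
    counts = Counter(face_numbers)
    if len(counts) < 2:
        # Assumption: a single face in all images never triggers the rule.
        return True
    # Image count of the facial number that appears least often.
    fewest = min(counts.values())
    return fewest <= first_threshold
```

For the example above, 15 frames of user A and 20 frames of user B with a first threshold of 10 yield a failure, while a few stray frames of a bystander are tolerated.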

Continuing to refer to FIG. 1, in step S300, a detection result is generated based on the matching result between the image sequence and the action sequence.

In a possible implementation, the final detection result can be generated based on the matching result and the living body detection result.

Referring to FIG. 2, FIG. 2 shows a flow chart of a detection method according to an embodiment of the present disclosure. As shown in FIG. 2, in a possible implementation, step S300 may include:

In step S310, when it is determined that the matching result between the image sequence and the action sequence is a successful match, first images are selected from the image sequence. In one example, this step may be: selecting from the image sequence a preset number of first images whose action scores are greater than or equal to the second score threshold. An embodiment of the present disclosure can use the selected images with higher action scores as the images for subsequent living body detection, thereby saving server computing power. In addition, images with higher action scores are usually representative to a certain degree, so selecting images has little impact on the accuracy of the living body detection results.

For example, the above-mentioned second score threshold may be less than or equal to the first score threshold, and the first score thresholds and second score thresholds corresponding to different action contents may also differ. For example, suppose the image sequence includes, in order: image A (score 20), image B (score 40), image C (score 60), image D (score 80), image E (score 30), image F (score 45), image G (score 70), and image H (score 80), where images A to D belong to the same action content (whose first score threshold is 65) and images E to H belong to the same action content (whose first score threshold is 75). If the second score thresholds are both 50 and the preset number is 3, then when the preset number of images is selected in the order in which they appear in the image sequence, images C, D, and G are used as the first images, that is, image H is discarded. If the calculation time is not a concern, all images with action scores greater than or equal to the second score threshold can instead be obtained and the images with the lowest scores discarded, that is, the preset number of images with the highest action scores are retained to improve the accuracy of living body detection; in this example, image C is discarded. If the second score threshold of the first action content is 30, the second score threshold of the second action content is 40, and the preset number is 6, then images B, C, D, F, G, and H are used as the first images. For example, the preset number may represent the total number of first images whose action scores are greater than or equal to the second score threshold, or may represent the number of such images for each action content.
Following the above example, if the preset number corresponding to each action content is 2, then images B, C, D, F, G, and H are filtered down to images B, C, F, and G; if the calculation time is not a concern, the images with the highest scores can be selected instead, namely images C, D, G, and H.
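The selection variants in this example can be reproduced with a short sketch. The names `ScoredImage` and `select_first_images` and the two flags are hypothetical; the scores and thresholds are the ones given in the example above.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class ScoredImage:
    name: str
    action: str   # action content the image was scored against
    score: float  # action score


def select_first_images(images: Sequence[ScoredImage],
                        second_thresholds: Dict[str, float],
                        preset_number: int,
                        keep_highest: bool = False,
                        per_action: bool = False) -> List[str]:
    """Select first images whose score reaches the second score threshold."""
    # Keep (position, image) pairs that reach their action's threshold.
    candidates = [(i, img) for i, img in enumerate(images)
                  if img.score >= second_thresholds[img.action]]

    def pick(group):
        if keep_highest:
            # Stable sort: equal scores keep their sequence order.
            group = sorted(group, key=lambda pair: -pair[1].score)
        return group[:preset_number]

    if per_action:
        chosen = []
        for action in dict.fromkeys(img.action for _, img in candidates):
            chosen += pick([p for p in candidates if p[1].action == action])
    else:
        chosen = pick(candidates)
    # Report the selected images in their original sequence order.
    return [img.name for _, img in sorted(chosen, key=lambda pair: pair[0])]
```

Selecting in sequence order is cheaper because scoring can stop early, while `keep_highest=True` corresponds to the variant in which calculation time is not a concern.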

In step S320, a living body detection result is generated based on the first image. The filtered first image not only has a higher picture quality (that is, it is more likely to be a living body), but also has a smaller number than the image sequence, which can effectively reduce the calculation time of living body detection.

In a possible implementation, step S320 may include: generating, based on the first images, a living body detection sub-result corresponding to each first image; using the first image with the highest action score as the second image; and determining that the living body detection result is a living body when the living body detection sub-result corresponding to the second image is a living body and the ratio of the number of first images whose sub-result is a living body to the number of all first images is greater than or equal to a preset ratio. In other words, under this detection rule, the server determines that the living body detection result of the image sequence is a living body only if the second image is a living body and the proportion of first images detected as living bodies is greater than or equal to the preset ratio. In actual shooting scenarios, when users capture image sequences through the terminal, there is a certain chance of interference from external factors, such as another person's face being accidentally captured by the camera or the terminal being dropped, so the image sequence may contain non-living body images. Through the above detection rule, an embodiment of the present disclosure allows the image sequence to contain a certain number of non-living body images. However, if the proportion of non-living body images is too high, malicious detection is more likely. For example, someone else may maliciously make a mask of the account owner; if the mask fits that person's face, they could easily complete the various action detections in place of the account owner. In view of this, one embodiment of the present disclosure reduces the probability that such cases pass detection by setting up this living body detection method, thereby improving the security of user verification. The preset ratio can be set according to actual conditions and is not limited in the embodiments of the present disclosure. For example, the lower the preset ratio, the higher the proportion of non-living body images that can be tolerated, and the higher the probability that the living body detection result is a living body.
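The decision rule for step S320 can be written as a small function. This is a sketch under the assumption that each first image is represented by its action score and a boolean living body detection sub-result; the representation and the function name are hypothetical.

```python
from typing import List, Tuple


def living_body_result(first_images: List[Tuple[float, bool]],
                       preset_ratio: float) -> bool:
    """Combine per-image sub-results into one living body detection result.

    Each entry is (action_score, is_living_body) for one first image.
    """
    if not first_images:
        return False  # assumption: no first images means no pass
    # The second image is the first image with the highest action score.
    _, second_is_living = max(first_images, key=lambda item: item[0])
    if not second_is_living:
        return False
    living = sum(1 for _, is_living in first_images if is_living)
    return living / len(first_images) >= preset_ratio
```

For instance, with sub-results `[(60, True), (80, True), (70, False)]` two of three first images are living bodies, so the result is a living body for a preset ratio of 0.6 but not for 0.8.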

Illustratively, the above-mentioned living body detection sub-results can be generated by a machine learning model in the related art. The above-mentioned machine learning model can generate the living body detection sub-results based on the image or the face region image in the image. For example, the machine learning model can extract features such as color texture, non-rigid motion deformation, face material, and image distortion rate of living and non-living bodies, and generate living body detection sub-results. The embodiments of the present disclosure will not be described in detail here.

In step S330, the detection result is determined based on the living body detection result, wherein, when the living body detection result is determined to be a living body, the detection result is a detection pass. That is, when the matching result between the image sequence and the action sequence is a successful match and the living body detection result is a living body, the detection passes. The combination of action matching and living body detection further improves the accuracy of the verification. In addition, by combining action detection with living body detection, one embodiment of the present disclosure mitigates the insecurity of the silent living body detection used in the related art.

In a possible implementation, the above detection method further includes: sending the detection result to the terminal. For example, when the detection result is a detection pass, the server allows the terminal to perform further operations (for example, entering a payment password, changing an account password, or opening specific permissions). After receiving the detection result, the terminal prompts the user that the detection has passed and that further operations are possible. For example, after the detection result is generated, the service provider can obtain the detection result through an interface of the server and then determine whether to provide the corresponding service to the terminal. That is, the service provider can use its own server together with the server in an embodiment of the present disclosure to provide various services.

For example, when it is determined that the detection result is a detection failure, a first instruction is sent to the terminal, and the first instruction controls the terminal to enter a page for resending the detection request. After receiving the first instruction, the terminal can enter the page for resending the detection request, prompt the user that the detection has failed, and ask whether the detection request should be resent. This prompt can last for a certain period of time, until the user resends the detection request through the terminal. In this case, the detection method of the embodiment of the present disclosure is re-executed from step S100 or its preceding steps. During each retry, the server can generate a different action sequence to reduce the possibility that an image sequence pre-generated by malware passes detection, thereby improving the security of the user environment.
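One way to generate a different action sequence for each retry is sketched below. The contents of `ACTION_LIBRARY`, the length bounds, and the choice to use each action at most once are all assumptions made for illustration; the disclosure does not specify them.

```python
import random
from typing import List, Optional

# Hypothetical action content library; the real library is not specified.
ACTION_LIBRARY = ("blink", "shake head", "open mouth", "nod")


def generate_action_sequence(previous: Optional[List[str]] = None,
                             min_len: int = 2,
                             max_len: int = 4,
                             rng: Optional[random.Random] = None) -> List[str]:
    """Draw a random action sequence that differs from the previous one."""
    # SystemRandom draws from the operating system's entropy source.
    rng = rng or random.SystemRandom()
    while True:
        length = rng.randint(min_len, max_len)
        # sample() picks distinct actions in a random order.
        sequence = rng.sample(ACTION_LIBRARY, length)
        if sequence != previous:
            return sequence
```

The `rng` parameter exists only so that the sketch can be exercised with a seeded generator; a server would normally rely on the default.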

In one example, when it is determined that the number of times the first instruction has been sent to the terminal within a third preset time reaches a second threshold, a second instruction is sent to the terminal in response to a detection request sent by the terminal through the page, where the second instruction is used to notify the terminal that the server rejects the detection request. After receiving the second instruction, the terminal prompts the user that the detection has failed, and the server refuses to let the terminal initiate a retry through the above-mentioned retry page. Correspondingly, if the number of times the first instruction has been sent to the terminal has not reached the second threshold but the third preset time has elapsed, the terminal may no longer display the page for resending the detection request, that is, the user can no longer send a detection request through this page.

The above-mentioned third preset time can be counted from the first time the terminal makes a detection request in the overall detection process. For example, if the third preset time is 10 minutes, timing starts when the user opens the web page in the terminal and sends the first detection request. When the time exceeds 10 minutes, the user will not be able to retry on the page and submit the detection request again within 12 days. If the second threshold is 5 and the number of times the server issues the first instruction reaches 5 within 10 minutes, the user has retried 5 times and failed the detection each time, and the server will reject subsequent detection requests sent by the terminal through this page. For example, the detection request sent by the terminal when initiating a retry may carry a request identifier. The request identifier may be cumulative: each time a retry is initiated, 1 is added to the request identifier. From the request identifier in the detection request sent by the terminal, the server can determine that the detection request comes from the above-mentioned page and belongs to the retry process, and can determine the number of retries of the terminal (that is, the number of times the server has sent the first instruction).

With the above settings, the cost for an attacker to crack the detection method provided by the embodiments of the present disclosure may be increased. For example, 10 minutes after first submitting the detection request, or after 5 retries, the attacker will not be able to submit the detection request again through the same web page (such as the page used to resend the detection request above). If the attacker wants to keep trying to crack the above detection method, a new web page must be opened. If an attacker opens the web page too many times, the IP address corresponding to the terminal will have records of multiple visits to the web page. The owner of the terminal, or the organization it belongs to, can then promptly discover through security detection methods in the related art that the terminal is performing malicious operations. This raises the probability that malicious operations on the terminal are discovered, which also increases the attacker's cracking cost.

In one example, if it is determined that the time from sending the first instruction to receiving the new image sequence is greater than a fourth preset time, the second instruction is sent to the terminal. For example, the fourth preset time can be 1 minute, that is, the user needs to complete the recording of the image sequence within 1 minute. This shortens the time available for attackers to maliciously generate synthetic image sequences with video editing software, thereby reducing the possibility that an attacker uses a synthetic image sequence and further increasing the security of the user's detection environment.
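The retry rules described above (third preset time, second threshold, and fourth preset time) can be combined into one server-side check, sketched below. The default values come from the examples in the text; the names, the string return values, and the choice to reject on the server once the third preset time has elapsed are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryRules:
    third_preset_time: float = 600.0  # seconds; 10 minutes in the example
    fourth_preset_time: float = 60.0  # seconds; 1 minute in the example
    second_threshold: int = 5         # allowed number of first instructions


def decide_retry(rules: RetryRules,
                 first_request_at: float,
                 first_instruction_at: float,
                 received_at: float,
                 first_instructions_sent: int) -> str:
    """Return "reject" (send the second instruction) or "accept".

    All times are timestamps in seconds. first_instructions_sent may be
    derived by the server from the cumulative request identifier.
    """
    # Assumption: requests arriving after the third preset time are rejected.
    if received_at - first_request_at > rules.third_preset_time:
        return "reject"
    # Too many failed attempts within the third preset time.
    if first_instructions_sent >= rules.second_threshold:
        return "reject"
    # The new image sequence took longer than the fourth preset time.
    if received_at - first_instruction_at > rules.fourth_preset_time:
        return "reject"
    return "accept"
```

Passing timestamps explicitly keeps the function deterministic; a server would supply its own clock readings.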

By formulating retry rules for detection requests, the disclosed embodiments shorten the time for an attacker to prepare a synthetic image sequence, thereby increasing the security of the user's detection environment.

The embodiments of the present disclosure do not limit the specific values of the above-mentioned third preset time, fourth preset time, and second threshold, and the service provider can determine the specific data according to actual needs.

Combined with actual application scenarios, users can enter online financial scenarios (or any other scenarios that require user authentication) through the H5 interface displayed on the terminal (such as a mobile phone or computer). During authentication, the server can generate, based on an action content library (such as facial movements, head movements, etc.), an action sequence with a random number of actions and random content, and then send it to the terminal. The terminal displays it to the user through the H5 interface, and the user performs the corresponding actions according to the action sequence for the terminal to record. After the recording is completed (for example, the camera detects a specific action or the user manually clicks the corresponding button), the terminal sends the recorded image sequence to the server. The server scores the action content in the image sequence according to the action content in the action sequence. Once the action content in a certain image is rated as qualified, the server can start scoring the next action content in the action sequence, until all images of the image sequence have been scored or all action content in the action sequence has been completed (for example, if the score of the last action content in the action sequence is higher than the threshold, the action sequence can be regarded as fully completed). When the action content in the action sequence is deemed complete, living body detection can be started on the images in the image sequence whose scores are higher than a certain threshold. If the living body detection also passes, the user's usage environment can be considered safe and the user is allowed to perform sensitive operations, such as transfers and cash withdrawals. If the action sequence fails to pass the detection or the living body detection fails, a prompt window can pop up in the H5 interface to remind the user to retry. If the action sequence still fails to pass after multiple retries, the user's account will be restricted (for example, its funds are frozen). The service provider of the online financial functions can also query the account's retry count. If the number of retries is too high, the service provider will know that the account may have security risks and can send prompt information to the mobile phone bound to the account.

It can be understood that the above-mentioned method embodiments mentioned in the present disclosure can be combined with each other to form combined embodiments without violating the underlying principles and logic. Due to space limitations, the details are not described in the present disclosure. Those skilled in the art can understand that, in the above-mentioned methods of the specific embodiments, the specific execution order of the steps should be determined by their functions and possible internal logic. In addition, the method steps may be executed by hardware or by a processor running computer-executable code.

In addition, the present disclosure also provides detection devices, electronic equipment, computer-readable storage media, and programs, all of which can be used to implement any detection method provided by the present disclosure. For the corresponding technical solutions and descriptions, please refer to the corresponding records in the method section, which will not be repeated here.

Referring to FIG. 3, FIG. 3 shows a block diagram of a detection device according to an embodiment of the present disclosure. As shown in FIG. 3, in a possible implementation, an embodiment of the present disclosure also provides a detection device 100, which is applied to a server. The detection device includes: an image sequence receiving module 110, configured to receive an image sequence sent by a terminal in response to an action sequence, the image sequence including multiple frames of images; an action content processing module 120, configured to sequentially obtain one action content in the action sequence as the current action content and perform the following operations: determining the starting image corresponding to the current action content, and sequentially determining the action score of at least one image after the starting image in the image sequence with respect to the current action content; determining, based on the action score of any image, the matching result corresponding to the current action content; and determining the matching result between the image sequence and the action sequence based on the matching results corresponding to all action contents in the action sequence; and a detection result generation module 130, configured to generate a detection result based on the matching result between the image sequence and the action sequence.

In a possible implementation, the detection device is further configured to perform at least one of the following: when it is determined that the detection result is a detection failure, sending a first instruction to the terminal, the first instruction controlling the terminal to enter a page for resending a detection request; when the number of times the first instruction has been sent to the terminal within a third preset time reaches a second threshold, sending, in response to a detection request sent by the terminal through the page, a second instruction to the terminal, the second instruction being used to notify the terminal that the server rejects the detection request; and when it is determined that the time from sending the first instruction to receiving a new image sequence is greater than a fourth preset time, sending the second instruction to the terminal.

In some embodiments, the functions or modules of the device provided by the embodiments of the present disclosure can be used to execute the methods described in the above method embodiments. For specific implementation, refer to the description of the above method embodiments; for the sake of brevity, details are not repeated here.

An embodiment of the present disclosure also provides a computer-readable storage medium on which computer program instructions are stored. When the computer program instructions are executed by a processor, the above method is implemented. Computer-readable storage media may be volatile or non-volatile computer-readable storage media.

An embodiment of the present disclosure also provides an electronic device, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to call instructions stored in the memory to execute the above method.

An embodiment of the present disclosure also provides a computer program product, which includes computer-readable code, or a non-volatile computer-readable storage medium carrying the computer-readable code. When the computer-readable code runs in a processor of an electronic device, the processor in the electronic device executes the above method.

FIG. 4 shows a block diagram of an electronic device 1900 according to an embodiment of the present disclosure. For example, electronic device 1900 may be provided as a server. Referring to FIG. 4, electronic device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by memory 1932 for storing instructions, such as application programs, executable by processing component 1922. The application program stored in memory 1932 may include one or more modules, each corresponding to a set of instructions. Furthermore, the processing component 1922 is configured to execute instructions to perform the above-described method.

Electronic device 1900 may also include a power supply component 1926 configured to perform power management of electronic device 1900, a wired or wireless network interface 1950 configured to connect electronic device 1900 to a network, and an input-output (I/O) interface 1958. The electronic device 1900 can operate based on an operating system stored in the memory 1932, such as a Microsoft server operating system (Windows Server™), a graphical user interface operating system launched by Apple (Mac OS X™), a multi-user multi-process computer operating system (Unix™), a free and open source Unix-like operating system (Linux™), an open source Unix-like operating system (FreeBSD™), or similar.

In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, such as a memory 1932 including computer program instructions, which can be executed by the processing component 1922 of the electronic device 1900 to complete the above method.

The above-mentioned electronic device may be provided as a terminal, a server, or other forms of equipment.

The present disclosure may be a system, method, and/or computer program product. A computer program product may include a computer-readable storage medium having thereon computer-readable program instructions for causing a processor to implement aspects of the present disclosure.

Computer-readable storage media may be tangible devices that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), compact disc read-only memory (CD-ROM), digital versatile disks (DVD), memory sticks, floppy disks, mechanically encoded devices such as punch cards or raised structures in a groove with instructions stored thereon, and any suitable combination of the above. As used herein, computer-readable storage media are not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., light pulses through fiber optic cables), or electrical signals transmitted through wires.

Computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to various computing/processing devices, or to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage on a computer-readable storage medium in the respective computing/processing device.

Computer program instructions for performing operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In situations involving remote computers, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA), is personalized by utilizing state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions to implement various aspects of the present disclosure.

Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create an apparatus that implements the functions/actions specified in one or more blocks of the flowchart and/or block diagram. These computer-readable program instructions can also be stored in a computer-readable storage medium; they cause a computer, programmable data processing apparatus, and/or other equipment to work in a specific manner, so that the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions implementing aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.

Computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other equipment, causing a series of operating steps to be performed on the computer, other programmable data processing apparatus, or other equipment to produce a computer-implemented process, thereby causing instructions executed on the computer, other programmable data processing apparatus, or other equipment to implement the functions/actions specified in one or more blocks in the flowcharts and/or block diagrams.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions that contains one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two consecutive blocks may actually execute substantially in parallel, or they may sometimes execute in the reverse order, depending on the functionality involved. It will also be noted that each block of the block diagram and/or flowchart illustration, and combinations of blocks in the block diagram and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or can be implemented using a combination of specialized hardware and computer instructions.

The computer program product can be implemented specifically through hardware, software, or a combination thereof. In an optional embodiment, the computer program product is embodied as a computer storage medium. In another optional embodiment, the computer program product is embodied as a software product, such as a Software Development Kit (SDK).

The embodiments of the present disclosure have been described above. The above description is illustrative, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those skilled in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen to best explain the principles of the embodiments, practical applications, or improvements to the technology in the market, or to enable other persons of ordinary skill in the art to understand the embodiments disclosed herein.

Claims

A detection method applied to a server, characterized in that the detection method includes:

receiving an image sequence sent by the terminal in response to the action sequence, the image sequence including multiple frame images;

Sequentially acquiring one action content in the action sequence as the current action content, and performing the following operations: determining a starting image corresponding to the current action content, and sequentially determining an action score between at least one image after the starting image in the image sequence and the current action content; determining, based on the action score of any image, a matching result corresponding to the current action content; determining a matching result between the image sequence and the action sequence based on the matching results corresponding to all action content in the action sequence;

Based on the matching result of the image sequence and the action sequence, a detection result is generated.
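
The matching procedure recited in this claim can be illustrated with a minimal sketch. It assumes that a per-frame action score is available from some scoring model and that an action counts as matched once a frame's score reaches a threshold. The names `match_action_sequence`, `score_fn`, and `first_threshold`, and the choice to score the starting image itself, are hypothetical illustrations and are not taken from the disclosure.

```python
from typing import Callable, Optional, Sequence


def match_action_sequence(
    frames: Sequence[object],
    actions: Sequence[str],
    score_fn: Callable[[object, str], float],  # hypothetical scoring model
    first_threshold: float = 0.8,              # assumed value, for illustration
) -> bool:
    """Return True if every action is matched, in order, by some frame."""
    start = 0  # for the first action, start at the first frame of the sequence
    for action in actions:
        matched_at: Optional[int] = None
        # Score frames one by one, beginning at the starting image.
        for idx in range(start, len(frames)):
            if score_fn(frames[idx], action) >= first_threshold:
                matched_at = idx
                break
        if matched_at is None:
            return False  # the current action was never matched
        # The next action starts at the frame after the matched frame.
        start = matched_at + 1
    return True
```

Because the starting index only moves forward, each frame can match at most one action, so a terminal cannot satisfy the sequence with frames captured in the wrong order.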
The detection method according to claim 1, wherein determining the starting image corresponding to the current action content includes: when it is determined that the current action content is the first action content in the action sequence, the starting image is the starting image of the image sequence; when it is determined that the current action content is not the first action content in the action sequence, the starting image is the frame following the image with which the previous action content was successfully matched.
The detection method according to claim 1 or 2, wherein determining the matching result between the image sequence and the action sequence includes:

If no successful matching result for the action sequence is obtained within a first preset time from the start of determining the matching result of the action sequence, and/or no matching result indicating that an action content is successfully matched is obtained within a second preset time from the start of determining the matching result of any action content in the action sequence, it is determined that the matching result between the image sequence and the action sequence is a matching failure.
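
The two time limits in this claim can be tracked as sketched below. The sketch assumes all times are in seconds and are supplied by the caller. The class name `MatchDeadlines` and its methods are hypothetical, and the sketch checks both limits together, which is only one reading of the "and/or" wording.

```python
from dataclasses import dataclass


@dataclass
class MatchDeadlines:
    """Tracks the two time limits; all times are in seconds."""
    first_preset_time: float   # limit for matching the whole action sequence
    second_preset_time: float  # limit for matching any single action content
    sequence_started_at: float = 0.0
    action_started_at: float = 0.0

    def start_sequence(self, now: float) -> None:
        # Matching of the sequence and of its first action begin together.
        self.sequence_started_at = now
        self.action_started_at = now

    def start_action(self, now: float) -> None:
        # Called when matching moves on to the next action content.
        self.action_started_at = now

    def expired(self, now: float) -> bool:
        """True if either limit has elapsed, i.e. matching should fail."""
        sequence_over = now - self.sequence_started_at > self.first_preset_time
        action_over = now - self.action_started_at > self.second_preset_time
        return sequence_over or action_over
```

In a server loop the caller could pass `time.monotonic()` as `now`, so that the limits are unaffected by wall-clock adjustments.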
The detection method according to any one of claims 1 to 3, wherein generating a detection result based on the matching result of the image sequence and the action sequence includes:

When it is determined that the matching result between the image sequence and the action sequence is a successful match, select the first image from the image sequence;

Based on the first image, generate a living body detection result;

The detection result is determined based on the living body detection result, wherein, when the living body detection result is determined to be a living body, the detection result is that the detection passes.
The detection method according to claim 4, wherein selecting the first image from the image sequence includes:

Select, from the image sequence, a preset number of first images whose action scores are greater than or equal to the second score threshold.
The detection method according to claim 4 or 5, wherein generating a living body detection result based on the first image includes:

Based on the first image, generate a living body detection sub-result corresponding to the first image;

Use the first image with the highest action score as the second image;

When it is determined that the living body detection sub-result corresponding to the second image is a living body, and the ratio of the number of first images whose living body detection sub-result is a living body to the number of all first images is greater than or equal to the preset ratio, it is determined that the living body detection result is a living body.
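
The selection and aggregation steps of the three preceding claims can be combined in a minimal sketch. It assumes that each frame already carries its action score and that a per-image liveness classifier is available. The names `liveness_result` and `is_live_fn`, the default parameter values, and the choice to keep the highest-scoring candidates when more than the preset number qualify are hypothetical and not stated in the disclosure.

```python
from typing import Callable, Sequence, Tuple


def liveness_result(
    scored_frames: Sequence[Tuple[object, float]],  # (frame, action score)
    is_live_fn: Callable[[object], bool],           # hypothetical classifier
    second_score_threshold: float = 0.9,            # assumed value
    preset_number: int = 5,                         # assumed value
    preset_ratio: float = 0.6,                      # assumed value
) -> bool:
    """Return True if the selected first images indicate a living body."""
    # Keep frames whose action score reaches the second score threshold.
    candidates = [(f, s) for f, s in scored_frames if s >= second_score_threshold]
    # Assumption: if too many qualify, keep the highest-scoring ones.
    candidates.sort(key=lambda fs: fs[1], reverse=True)
    first_images = candidates[:preset_number]
    if not first_images:
        return False  # nothing to run liveness detection on
    # One liveness sub-result per first image.
    sub_results = [bool(is_live_fn(f)) for f, _ in first_images]
    # After sorting, index 0 is the second image (highest action score).
    second_is_live = sub_results[0]
    live_ratio = sum(sub_results) / len(sub_results)
    return second_is_live and live_ratio >= preset_ratio
```

Requiring both conditions means that a single high-scoring spoofed frame cannot pass on its own, and a majority of live frames cannot compensate for a failed check on the best-scoring frame.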
The detection method according to any one of claims 1 to 6, wherein receiving the image sequence sent by the terminal in response to the action sequence includes:

Decrypt the image sequence sent by the terminal in response to the action sequence to obtain the decrypted image sequence;

The step of sequentially determining the action score between at least one image after the starting image in the image sequence and the current action content includes: sequentially determining the action score between at least one image after the starting image in the decrypted image sequence and the current action content.
The detection method according to any one of claims 1 to 7, wherein determining the matching result between the image sequence and the action sequence further includes:

Generate at least one of facial area coordinates and facial number corresponding to the image in the image sequence;

The matching result is determined based on at least one of the facial area coordinates and the facial number, the image sequence, and the action sequence.
The detection method according to claim 8, wherein determining the matching result based on at least one of the facial area coordinates and the facial number, the image sequence, and the action sequence includes:

The matching result between the facial area indicated by the facial area coordinates and the action sequence in at least one image of the image sequence is determined as the matching result between the image sequence and the action sequence.
The detection method according to claim 8 or 9, wherein determining the matching result based on at least one of the facial area coordinates and the facial number, the image sequence, and the action sequence further includes:

When it is determined that the number of images corresponding to the smallest facial number in the image sequence is greater than the first threshold, the matching result is determined to be a matching failure.
The detection method according to any one of claims 1 to 10, characterized in that the detection method further includes at least one of the following:

When it is determined that the detection result is that the detection fails, a first instruction is sent to the terminal, and the first instruction controls the terminal to enter a page for resending the detection request;

When it is determined that the number of times the first instruction is sent to the terminal within the third preset time reaches the second threshold, in response to the detection request sent by the terminal through the page, a second instruction is sent to the terminal, and the second instruction is used to notify the terminal that the server rejects the detection request;

If it is determined that the time from sending the first instruction to receiving the new image sequence is greater than the fourth preset time, a second instruction is sent to the terminal.
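
The retry handling in this claim can be sketched as a small server-side policy object. The sketch assumes times in seconds are supplied by the caller. The class `RetryPolicy`, its method names, and the string constants standing in for the first and second instructions are hypothetical and not part of the disclosure.

```python
from collections import deque
from typing import Deque, Optional

FIRST_INSTRUCTION = "first_instruction"    # terminal opens the retry page
SECOND_INSTRUCTION = "second_instruction"  # server rejects the request


class RetryPolicy:
    def __init__(self, third_preset_time: float, second_threshold: int,
                 fourth_preset_time: float) -> None:
        self.third_preset_time = third_preset_time
        self.second_threshold = second_threshold
        self.fourth_preset_time = fourth_preset_time
        self._first_sent: Deque[float] = deque()  # send times of first instructions

    def on_detection_failed(self, now: float) -> str:
        """Detection failed: record and send the first instruction."""
        self._first_sent.append(now)
        return FIRST_INSTRUCTION

    def on_new_request(self, now: float) -> Optional[str]:
        """Request sent through the retry page: reject if retried too often."""
        # Forget first instructions older than the third preset time.
        while self._first_sent and now - self._first_sent[0] > self.third_preset_time:
            self._first_sent.popleft()
        if len(self._first_sent) >= self.second_threshold:
            return SECOND_INSTRUCTION
        return None  # request accepted

    def on_new_image_sequence(self, now: float) -> Optional[str]:
        """New image sequence received: reject if it arrived too late."""
        if self._first_sent and now - self._first_sent[-1] > self.fourth_preset_time:
            return SECOND_INSTRUCTION
        return None
```

Keeping only the send times inside the sliding window bounds the memory used per terminal and makes the retry count reset naturally once the third preset time has passed.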
A face detection device applied to a server, characterized in that the detection device includes:

An image sequence receiving module, configured to receive an image sequence sent by the terminal in response to the action sequence, where the image sequence includes multiple frames of images;

An action content processing module, configured to sequentially acquire one action content in the action sequence as the current action content, and perform the following operations: determining a starting image corresponding to the current action content, and sequentially determining an action score between at least one image after the starting image in the image sequence and the current action content; determining, based on the action score of any image, a matching result corresponding to the current action content; determining a matching result between the image sequence and the action sequence based on the matching results corresponding to all action content in the action sequence;

A detection result generation module is used to generate detection results based on the matching results of the image sequence and the action sequence.
An electronic device, characterized by including:

processor;

Memory used to store instructions executable by the processor;

The processor is configured to call instructions stored in the memory to execute the detection method according to any one of claims 1 to 11.
A computer-readable storage medium on which computer program instructions are stored, characterized in that when the computer program instructions are executed by a processor, the detection method described in any one of claims 1 to 11 is implemented.
A computer program product, including computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code, wherein when the computer-readable code runs in a processor of an electronic device, the processor in the electronic device executes the detection method described in any one of claims 1 to 11.