WO2023173686A1 - Detection method and apparatus, electronic device, and storage medium - Google Patents

Info

Publication number
WO2023173686A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
action
sequence
detection
image sequence
Prior art date
Application number
PCT/CN2022/114904
Other languages
French (fr)
Chinese (zh)
Inventor
张殿炎
尹瑞鹏
胡文超
Original Assignee
上海商汤智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司 filed Critical 上海商汤智能科技有限公司
Publication of WO2023173686A1 publication Critical patent/WO2023173686A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Definitions

  • the present disclosure relates to the field of detection, and in particular, to a detection method, device, electronic equipment and storage medium.
  • This disclosure proposes a detection technical solution.
  • a detection method is provided and applied to a server.
  • the detection method includes: receiving an image sequence sent by a terminal in response to an action sequence, where the image sequence includes multiple frames of images; and sequentially acquiring one action content in the action sequence as the current action content, and performing the following operations: determining the starting image corresponding to the current action content, and sequentially determining the action score of at least one image after the starting image in the image sequence with respect to the current action content;
  • determining, according to the action score of any image, the matching result corresponding to the current action content; determining, according to the matching results corresponding to all action contents in the action sequence, the matching result between the image sequence and the action sequence; and generating a detection result based on the matching result between the image sequence and the action sequence.
  • determining the starting image corresponding to the current action content includes: when it is determined that the current action content is the first action content in the action sequence, the starting image is the starting image of the image sequence; when it is determined that the current action content is not the first action content in the action sequence, the starting image is the frame following the image that successfully matched the previous action content.
  • determining the matching result of the image sequence and the action sequence includes: determining that the matching result between the image sequence and the action sequence is a matching failure when no successful match of the action sequence is obtained within a first preset time from the start of determining the matching result of the action sequence, and/or when no successful match of an action content is obtained within a second preset time from the start of determining the matching result of any action content in the action sequence.
  • generating a detection result based on a matching result between the image sequence and the action sequence includes: when it is determined that the matching result between the image sequence and the action sequence is a successful match, selecting first images from the image sequence; generating a living body detection result based on the first images; and determining the detection result based on the living body detection result, wherein, when the living body detection result is a living body, the detection result is that the detection is passed.
  • selecting the first images from the image sequence includes: selecting a preset number of first images in the image sequence whose action scores are greater than or equal to a second score threshold.
  • generating a living body detection result based on the first image includes: generating a living body detection sub-result corresponding to each first image based on the first image; using the first image with the highest action score as the second image; and determining that the living body detection result is a living body when the living body detection sub-result corresponding to the second image is a living body and the ratio of the number of first images whose sub-result is a living body to the number of all first images is greater than or equal to a preset ratio.
  • the receiving the image sequence sent by the terminal in response to the action sequence includes: decrypting the image sequence sent by the terminal in response to the action sequence to obtain a decrypted image sequence; the sequentially determining the action score of at least one image after the starting image in the image sequence with respect to the current action content includes: sequentially determining the action score of at least one image after the starting image in the decrypted image sequence with respect to the current action content.
  • determining the matching result between the image sequence and the action sequence further includes: generating at least one of facial area coordinates and facial numbers corresponding to images in the image sequence; and determining the matching result based on the image sequence, the action sequence, and at least one of the facial area coordinates and the facial number.
  • determining the matching result based on the image sequence, the action sequence, and at least one of the facial area coordinates and the facial number includes: determining, in at least one image of the image sequence, the matching result between the facial area indicated by the facial area coordinates and the action sequence, and using it as the matching result between the image sequence and the action sequence.
  • determining the matching result based on the image sequence, the action sequence, and at least one of the facial area coordinates and the facial number further includes: when it is determined that the number of images corresponding to the facial number with the fewest images in the image sequence is greater than the first threshold, determining that the matching result is a failed match.
  • the detection method further includes at least one of the following: when it is determined that the detection result is that the detection fails, sending a first instruction to the terminal, where the first instruction controls the terminal to enter the page for resending the detection request; when it is determined that the number of times the first instruction has been sent to the terminal within the third preset time reaches the second threshold, sending a second instruction to the terminal in response to the detection request sent by the terminal through the page, where the second instruction is used to notify the terminal that the server rejects the detection request; when it is determined that the time from sending the first instruction to receiving a new image sequence is greater than the fourth preset time, sending the second instruction to the terminal.
  • a detection device which is applied to a server.
  • the detection device includes: an image sequence receiving module, used to receive an image sequence sent by a terminal in response to an action sequence, where the image sequence includes multiple frames of images; an action content processing module, used to sequentially obtain one action content in the action sequence as the current action content and perform the following operations: determine the starting image corresponding to the current action content, and sequentially determine the action score of at least one image after the starting image in the image sequence with respect to the current action content; determine the matching result corresponding to the current action content according to the action score of any image; and determine the matching result of the image sequence and the action sequence according to the matching results corresponding to all action contents in the action sequence; and a detection result generation module, used to generate a detection result based on the matching result of the image sequence and the action sequence.
  • an electronic device including: a processor; a memory for storing instructions executable by the processor; wherein the processor is configured to call instructions stored in the memory to execute the detection method described in any one of the above.
  • a computer-readable storage medium on which computer program instructions are stored; when the computer program instructions are executed by a processor, any one of the above detection methods is implemented.
  • a computer program product including computer readable code, or a non-volatile computer readable storage medium carrying the computer readable code; when the computer readable code runs in an electronic device, the processor in the electronic device executes the detection method described in any one of the above.
  • the present disclosure provides a detection method.
  • the server can receive an image sequence sent by a terminal in response to an action sequence, where the image sequence includes multiple frames of images, and then sequentially obtain one action content in the action sequence as the current action content and perform the following operations: determine the starting image corresponding to the current action content, determine the action score of at least one image after the starting image in the image sequence with respect to the current action content, and determine the matching result corresponding to the current action content based on the action score of any image.
  • the matching result between the image sequence and the action sequence is determined based on the matching results corresponding to all action contents in the action sequence, and finally the detection result is generated based on the matching result between the image sequence and the action sequence.
  • since the above detection results are generated in the server, the possibility of malicious programs changing the detection results is reduced. Combined with the above action scores, the accuracy of the matching results can be further improved, thereby achieving accurate detection of the security of the verification environment. In addition, because the computing power of the server is higher than that of the terminal, the server can reduce the time required to generate detection results, or use a detection model with more complex calculations but higher detection accuracy.
  • Figure 1 shows a flow chart of a detection method according to an embodiment of the present disclosure.
  • Figure 2 shows a flow chart of a detection method according to an embodiment of the present disclosure.
  • FIG. 3 shows a block diagram of a detection device according to an embodiment of the present disclosure.
  • FIG. 4 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
  • exemplary means "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred over or superior to other embodiments.
  • A and/or B can mean three situations: A exists alone, A and B exist simultaneously, and B exists alone.
  • at least one herein means any one of a plurality or any combination of at least two of a plurality; for example, including at least one of A, B, and C can mean including any one or more elements selected from the set composed of A, B, and C.
  • detection technology is usually built into the terminal application, so the detection process is roughly as follows: the terminal receives the user's detection request, uses the built-in detection technology to identify the user's image data and generate detection results, and then sends the detection results to the server; the server determines whether to provide further services to the terminal based on the detection results.
  • the detection results are generated by the terminal and then transmitted to the server, so the detection results are easily tampered with by malicious programs.
  • the terminal detection result is a matching failure, but the malicious program modifies it to a successful matching and then sends it to the server.
  • the server will think that it can provide further services to the terminal, that is, the server thinks that the terminal's detection environment is safe, but in actual circumstances, the terminal's detection environment is not safe, and malicious programs can easily cause property losses to users.
  • the detection technology is integrated into the application, and the detection technology used has limited computing power. In order to reduce the user's waiting time, it is not easy for the application to use detection technology with more complex calculations when the computing power is limited. Therefore, its detection accuracy is limited.
  • an embodiment of the present disclosure provides a detection method.
  • the server can receive an image sequence sent by the terminal in response to an action sequence, where the image sequence includes multiple frames of images, and then sequentially obtain one action content in the action sequence as the current action content and perform the following operations: determine the starting image corresponding to the current action content, and sequentially determine the action score of at least one image after the starting image in the image sequence with respect to the current action content.
  • the matching result corresponding to the current action content is determined according to the action score of any image, and the matching result between the image sequence and the action sequence is determined based on the matching results corresponding to all action contents in the action sequence. Finally, the detection result is generated based on the matching result of the image sequence and the action sequence.
  • the server can reduce the time required to generate detection results, or use a detection model with more complex calculations but higher detection accuracy, and the terminal can display pages such as image sequence recording, retry, and detection results in the form of H5 web pages, which makes the terminal-side program lightweight and reduces the requirements on terminal computing power.
  • the above detection method is executed by a server.
  • the above server can be a physical server, a virtual host, a virtual private server (Virtual Private Server, VPS), a cloud server, etc.
  • the server interacts with a terminal, which can be: a mobile device, a user terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, etc.
  • the above detection method can also be implemented by the processor calling computer readable instructions stored in the memory.
  • Figure 1 shows a flow chart of a detection method according to an embodiment of the present disclosure. As shown in Figure 1, the above-mentioned detection method includes the following steps:
  • In step S100, an image sequence sent by the terminal in response to the action sequence is received, where the image sequence includes multiple frames of images.
  • the server may randomly select action content to generate an action sequence in response to a detection request sent by the terminal.
  • the above action sequence includes multiple action contents
  • each action content indicates a facial action that the user needs to complete.
  • the server can obtain the above action content through a preset action content library that stores multiple action contents.
  • the server can randomly select a fixed number of action contents from the preset action content library and randomly order them (that is, in response to each detection request sent by the terminal, the server can randomly issue an action sequence with different action contents in a different order to the terminal).
  • the server can also select a random number of action contents and randomly sequence each action content to further improve the security of the detection environment.
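  • the random selection described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the library contents, the function name generate_action_sequence, the count bounds, and the choice that one sequence never repeats an action content are all hypothetical.

```python
import secrets

# Hypothetical preset action content library; the names are illustrative only.
ACTION_LIBRARY = ["blink", "open_mouth", "shake_head", "raise_head", "nod"]


def generate_action_sequence(library, min_count=2, max_count=4):
    """Pick a random number of distinct action contents in a random order."""
    if not 1 <= min_count <= max_count <= len(library):
        raise ValueError("count bounds must fit inside the library size")
    rng = secrets.SystemRandom()  # OS-backed randomness, harder to predict
    count = rng.randint(min_count, max_count)  # random number of actions
    # sample() returns `count` distinct items in a random order
    return rng.sample(library, count)


sequence = generate_action_sequence(ACTION_LIBRARY)
```

  • setting min_count equal to max_count gives the fixed-number variant, where only the selection and the order are random.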
  • the server can then send the sequence of actions to the terminal.
  • the terminal prompts the user with the action sequence.
  • the terminal may prompt the user through voice or text.
  • the user follows the action sequence and starts recording the image sequence through the terminal.
  • the terminal sends the recorded image sequence to the server.
  • the terminal can also limit and prompt the user for the maximum recording duration of the image sequence to save server computing power.
  • the terminal can interact with the user through a web page to implement a lightweight detection method.
  • the image sequence can be a video, or a sequence of images taken continuously.
  • the terminal can send an encrypted image sequence to the server to improve the security of the image sequence transmission.
  • step S100 can include: decrypting the image sequence sent by the terminal in response to the action sequence to obtain a decrypted image sequence. Subsequent steps are then executed based on the decrypted image sequence.
  • An embodiment of the present disclosure can reduce the risk of the image sequence being altered by other malicious programs by encrypting the image sequence, thereby improving the security of the image sequence during transmission.
  • the image sequence may be encrypted frame by frame to further increase the security of the image sequence during transmission.
  • In step S200, one action content in the action sequence is sequentially obtained as the current action content, and the following operations are performed:
  • In step S210, the starting image corresponding to the current action content is determined, and the action score of at least one image after the starting image in the image sequence with respect to the current action content is determined in sequence.
  • the above action score can be positively correlated with how closely the action in the image conforms to the current action content, and can be obtained through a machine learning model. For example: if the machine learning model is a binary classification model (that is, each input frame is classified as 'is the action content' or 'is not the action content'), the model uses a score when classifying the action content in an image, and the action score mentioned in an embodiment of the present disclosure may be equal to the score used by the machine learning model in that classification process.
  • if the current action content is the first action content in the action sequence, the starting image is the starting image of the image sequence; for example, the first frame image is used as the starting image of the image sequence, or a certain frame in the image sequence is pre-specified as the starting image. If the current action content is not the first action content in the action sequence, the starting image is the frame following the image that successfully matched the previous action content.
  • An embodiment of the present disclosure reduces the calculation amount of action content matching by setting a starting image.
  • the starting image can be used as a marker for the order of action content in an image sequence. For example, if the image sequence has 20 frames in total, and the action sequence includes: blinking, opening the mouth, and raising the head, the server will use the first frame image as the starting image of the blinking action and generate an action score. If the blinking action is successfully matched in the 6th frame, the 7th frame will be used as the starting image of the mouth opening action. If the mouth-opening action is successfully matched at the 12th frame, the 13th frame will be used as the starting image for the head-raising action.
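  • the starting-image rule in the example above can be sketched as follows. This is a minimal sketch, assuming a hypothetical score_fn(frame_index, action) that stands in for the machine learning model and a single first score threshold shared by all action contents; frame indices are 0-based, so the 6th frame is index 5.

```python
def match_actions(num_frames, actions, score_fn, first_threshold):
    """Match each action content in order, moving the starting image forward.

    Returns (matched, frames), where frames holds the 0-based index of the
    frame that matched each action content so far.
    """
    start = 0  # starting image for the first action content
    matched_frames = []
    for action in actions:
        hit = None
        for i in range(start, num_frames):
            # An image matches when its action score exceeds the threshold.
            if score_fn(i, action) > first_threshold:
                hit = i
                break
        if hit is None:
            return False, matched_frames  # this action content never matched
        matched_frames.append(hit)
        start = hit + 1  # next starting image: the frame after the match
    return True, matched_frames


# Hypothetical scores: blink peaks at the 6th frame, mouth opening at the
# 12th, head raising at the 18th (the last position is an assumption).
PEAKS = {"blink": 5, "open_mouth": 11, "raise_head": 17}


def demo_score(i, action):
    return 0.9 if PEAKS[action] == i else 0.1


result = match_actions(20, ["blink", "open_mouth", "raise_head"], demo_score, 0.5)
```

  • because the search for each action content begins at the frame after the previous match, frames that have already been consumed are never scored again, which is where the reduction in calculation comes from.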
  • step S210 may include: sequentially determining the action score of at least one image after the starting image in the decrypted image sequence with respect to the current action content.
  • In step S220, the matching result corresponding to the current action content is determined based on the action score of any image. For example, when the action score of any image is greater than the first score threshold, the matching result corresponding to the current action content is determined to be a successful match.
  • the server administrator can set the above-mentioned first scoring threshold according to the actual situation. For example, the higher the above-mentioned first scoring threshold, the more standard the corresponding action content in the image needs to be, and the more accurate the final matching result will be.
  • the embodiment of the present disclosure does not limit the specific value of the first scoring threshold here.
  • when no successful match of the action sequence is obtained within the first preset time from the start of determining the matching result of the action sequence, and/or when no successful match of an action content is obtained within the second preset time from the start of determining the matching result of any action content in the action sequence, it is determined that the matching result between the image sequence and the action sequence is a matching failure. For example: if the first preset time is 20 seconds and the server does not complete the matching of every action content in the action sequence within 20 seconds, the server determines that the image sequence matching fails.
  • likewise, if an action content is not successfully matched within the second preset time, the server determines that the image sequence matching fails.
  • the security of the user verification phase can be further increased and the verification efficiency can be improved.
  • the specific values of the first preset time and the second preset time are not limited in this embodiment of the present disclosure.
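  • the two time limits can be sketched as follows. This is a minimal sketch, assuming the limits are checked before each frame is scored; the function name, the score_fn callback, and the injectable clock parameter are hypothetical and exist only to make the sketch testable.

```python
import time


def match_with_deadlines(num_frames, actions, score_fn, threshold,
                         sequence_limit, action_limit, clock=time.monotonic):
    """Sequential matching that fails once either preset time is exceeded.

    sequence_limit plays the role of the first preset time (whole action
    sequence); action_limit plays the role of the second preset time (one
    action content). Both are in seconds.
    """
    sequence_start = clock()
    start = 0
    for action in actions:
        action_start = clock()
        hit = None
        for i in range(start, num_frames):
            now = clock()
            if now - sequence_start > sequence_limit:
                return "failure"  # first preset time exceeded
            if now - action_start > action_limit:
                return "failure"  # second preset time exceeded
            if score_fn(i, action) > threshold:
                hit = i
                break
        if hit is None:
            return "failure"  # frames ran out before the action matched
        start = hit + 1
    return "success"
```

  • a call such as match_with_deadlines(n, actions, score_fn, 0.5, sequence_limit=20, action_limit=10) mirrors the 20-second example above; the 10-second value is only a placeholder.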
  • the matching result between the image sequence and the action sequence is determined based on the matching results corresponding to all action contents in the action sequence. For example, when the matching results corresponding to all action contents in the action sequence are successful matches, it is determined that the matching results between the image sequence and the action sequence are successful matches.
  • the matching result of the above image sequence and action sequence can be determined through a machine learning model.
  • Each action content in the action sequence corresponds to a machine learning model.
  • the server implements matching detection of the image sequence by sequentially calling the machine learning model corresponding to the action content in the action sequence.
  • the embodiments of the present disclosure do not limit the training method of the machine learning model here, as long as each machine learning model can detect the corresponding action content.
  • the input of the machine learning model may be an image
  • the output may be the matching result of the action content corresponding to the machine learning model.
  • the machine learning model can determine whether the action content matches successfully by extracting the positional feature relationship between facial key points in the image (for example, through the following algorithms: Active Shape Model, Active Appearance Models, cascaded pose regression algorithm, temporal action detection algorithm, etc.).
  • the machine learning model is integrated in a server with higher computing power rather than in a terminal. That is, the detection method of an embodiment of the present disclosure can use a machine learning model with more complex operations but higher accuracy.
  • the server can perform motion detection and living body detection frame by frame to introduce related information of previous and subsequent frames to increase the accuracy of the detection results. For example, various time limits can also be added to the detection process (which will be described in detail later) to further increase the security of the user environment.
  • step S200 may include: determining that the matching result is a successful match when the action contents in the action sequence match the action contents detected in the image sequence one by one and in the same order.
  • for example, if the action sequence includes blinking, shaking the head, and opening the mouth, the action contents in the image sequence should follow that order. If the order of the action contents in the image sequence is blinking, opening the mouth, and shaking the head, the matching result is determined to be a matching failure. If the image sequence contains only blinking and shaking the head, the matching result is also determined to be a matching failure.
  • An embodiment of the present disclosure can accurately determine whether the user's detection environment is safe by detecting the number and sequence of action content.
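  • the number-and-order check in the example above can be sketched as follows. This is a minimal sketch, assuming the detected action contents have already been extracted from the image sequence as a list of labels; the function name and the label strings are hypothetical.

```python
def sequences_match(required, detected):
    """True only when the detected actions equal the required ones,
    one by one and in the same order (same count, same sequence)."""
    return list(required) == list(detected)


REQUIRED = ["blink", "shake_head", "open_mouth"]

in_order = sequences_match(REQUIRED, ["blink", "shake_head", "open_mouth"])
wrong_order = sequences_match(REQUIRED, ["blink", "open_mouth", "shake_head"])
missing_one = sequences_match(REQUIRED, ["blink", "shake_head"])
```

  • here in_order is a successful match, while wrong_order and missing_one correspond to the two failure cases described above.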
  • determining the matching result between the image sequence and the action sequence may include: generating at least one of the facial area coordinates and the facial number corresponding to the images in the image sequence;
  • and determining the matching result based on the image sequence, the action sequence, and at least one of the facial area coordinates and the facial number.
  • the facial area coordinates can be obtained through the facial area extraction model in the related art.
  • the embodiments of the present disclosure are not limited here.
  • the above facial area coordinates are used to indicate the user's facial area in each image in the image sequence.
  • determining the matching result may be: determining the matching result between the action sequence and the facial area indicated by the facial area coordinates in at least one image of the image sequence, and using it as the matching result between the image sequence and the action sequence. For example, steps S200 and S300 can be performed on the facial area instead of the "image" mentioned above to obtain the detection result.
  • the above facial number is used to distinguish users with different facial features, and can be obtained through the above facial region extraction model to ensure that the facial images in the image sequence are generally from the same user.
  • for example, if the image sequence includes images of user A and user B, the facial area images of user A and of user B correspond to different facial numbers.
  • when the number of images corresponding to the facial number with the fewest images in the image sequence is greater than the first threshold, the matching result is determined to be a failed match.
  • for example, suppose the image sequence contains 15 frames of user A's images and 20 frames of user B's images, and the first threshold is 10 frames. The server determines that the matching result is a matching failure (that is, 15 frames is greater than 10 frames), which reduces the probability that simultaneous verification by user A and user B goes unrecognized. With this setting, the server can tolerate unexpected situations within certain limits when the terminal collects image sequences (for example, the terminal's camera captures a face behind the user) while ensuring the security of the user verification environment.
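  • the facial number check in the example above can be sketched as follows. This is a minimal sketch under two assumptions that the text does not state: the input is one facial number per detected face image, and a sequence containing only one facial number passes the check. The function name is hypothetical.

```python
from collections import Counter


def passes_face_number_check(face_numbers, first_threshold):
    """Return False when the least frequent facial number still appears
    in more images than first_threshold allows."""
    counts = Counter(face_numbers)
    if len(counts) <= 1:
        return True  # assumption: a single face is never a failure here
    return min(counts.values()) <= first_threshold


# 15 frames of user A and 20 frames of user B, first threshold of 10 frames
two_users = passes_face_number_check(["A"] * 15 + ["B"] * 20, 10)
# a background face captured in only 3 frames stays within the limit
background_face = passes_face_number_check(["A"] * 32 + ["B"] * 3, 10)
```

  • two_users is a failed check, while background_face passes, which corresponds to the tolerated case of a face briefly captured behind the user.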
  • In step S300, a detection result is generated based on the matching result of the image sequence and the action sequence.
  • the final detection result can be generated based on the matching result and the living body detection result.
  • FIG. 2 shows a flow chart of a detection method according to an embodiment of the present disclosure.
  • step S300 may include:
  • In step S310, when it is determined that the matching result between the image sequence and the action sequence is a successful match, first images are selected from the image sequence.
  • this step may be: selecting a preset number of first images from the image sequence whose action scores are greater than or equal to the second score threshold.
  • An embodiment of the present disclosure can use the selected images with higher action scores as the images for subsequent living body detection, thereby saving server computing power.
  • images with higher action scores are usually representative to a certain degree, so selecting a subset of images has little impact on the accuracy of the living body detection result.
  • the above-mentioned second scoring threshold may be less than or equal to the first scoring threshold, and the first scoring threshold and the second scoring threshold corresponding to different action contents may also be different.
  • for example, the image sequence includes, in order: image A (score 20), image B (score 40), image C (score 60), image D (score 80), image E (score 30), image F (score 45), image G (score 70), and image H (score 80). Images A to D belong to the same action content (the first scoring threshold corresponding to this action content is 65), and images E to H belong to the same action content (the first scoring threshold corresponding to this action content is 75).
  • when the second scoring thresholds are both 50 and the preset number is 3, if images are selected in sequence order until the preset number is reached, images C, D, and G are used as the above-mentioned first images, that is, image H is discarded. If the calculation time is not a concern, all images with action scores greater than or equal to the second scoring threshold can be obtained first and the lowest-scoring ones discarded, that is, the preset number of images with the highest action scores are retained to improve the accuracy of living body detection; in this example, image C is discarded. If the second scoring threshold of the first action content is 30, the second scoring threshold of the second action content is 40, and the preset number is 6, then images B, C, D, F, G, and H are used as the above-mentioned first images.
  • the above-mentioned preset number may represent the total number of first images whose action scores are greater than or equal to the second scoring threshold, or may represent the number of images in each action content whose action scores are greater than or equal to the second scoring threshold.
  • for example, if the preset number corresponding to each action content is 2, then images B, C, D, F, G, and H are narrowed to images B, C, F, and G; if the calculation time is not a concern, the images with the highest scores can be selected instead, such as images C, D, G, and H.
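  • the selection rules in the worked example above can be sketched as follows. This is a minimal sketch, assuming each image is represented as a (name, action score) pair in sequence order; the function name and the prefer_highest flag are hypothetical.

```python
def select_first_images(images, second_threshold, preset_number,
                        prefer_highest=False):
    """Select up to preset_number images scoring >= second_threshold.

    images: list of (name, action_score) pairs in sequence order.
    prefer_highest=False keeps the earliest qualifying images;
    prefer_highest=True keeps the highest-scoring qualifying images.
    """
    candidates = [(n, s) for n, s in images if s >= second_threshold]
    if prefer_highest:
        position = {n: i for i, (n, _) in enumerate(images)}
        # Stable sort by descending score, keep the top preset_number,
        # then restore the original sequence order.
        top = sorted(candidates, key=lambda item: -item[1])[:preset_number]
        top.sort(key=lambda item: position[item[0]])
        return [n for n, _ in top]
    return [n for n, _ in candidates[:preset_number]]


IMAGES = [("A", 20), ("B", 40), ("C", 60), ("D", 80),
          ("E", 30), ("F", 45), ("G", 70), ("H", 80)]

in_order = select_first_images(IMAGES, 50, 3)                   # C, D, G
by_score = select_first_images(IMAGES, 50, 3, prefer_highest=True)  # D, G, H
```

  • for the per-action variant, the same function can be called once per action content, for example on IMAGES[:4] with threshold 30 and on IMAGES[4:] with threshold 40, each with a preset number of 2.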
  • In step S320, a living body detection result is generated based on the first image.
  • the filtered first image not only has a higher picture quality (that is, it is more likely to be a living body), but also has a smaller number than the image sequence, which can effectively reduce the calculation time of living body detection.
  • step S320 may include: based on the first image, generating a living body detection sub-result corresponding to the first image.
  • the first image with the highest action score is used as the second image.
  • the above detection rule is defined as follows: when the second image is a living body and the proportion of first images whose living body detection sub-result is a living body is greater than or equal to the preset ratio, the server determines that the living body detection result of the image sequence is a living body.
  • the image sequence may contain non-living body images.
  • An embodiment of the present disclosure allows the image sequence to contain a certain number of non-living body images through the above detection rules.
  • the number of non-living images is greater than the preset ratio, it is more likely to be a malicious detection. For example, someone else maliciously makes a mask of the account owner; if the mask fits that person's face, they could easily pass verification as the account owner.
  • one embodiment of the present disclosure reduces the probability that the above situations can pass detection by setting up a living body detection method, thereby improving the security of user verification.
  • the above preset ratio can be set according to actual conditions, and is not limited in the embodiments of the present disclosure. For example, the lower the preset ratio, the larger the proportion of non-living images that can be tolerated, and the higher the probability that the living body detection result is a living body.
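The detection rule described above (the second image is a living body, and the share of first images judged to be living bodies reaches the preset ratio) can be sketched as follows. The function name `living_body_result` and the `(action_score, is_living)` pair layout are assumptions for illustration; the per-image sub-results would come from a liveness model that is not shown here.

```python
from typing import List, Tuple


def living_body_result(first_images: List[Tuple[float, bool]],
                       preset_ratio: float) -> bool:
    """first_images holds (action_score, sub_result_is_living) for each first image."""
    if not first_images:
        return False
    # The second image is the first image with the highest action score.
    second_image = max(first_images, key=lambda img: img[0])
    living_count = sum(1 for _, is_living in first_images if is_living)
    living_ratio = living_count / len(first_images)
    # Both conditions must hold for the sequence to be judged a living body.
    return second_image[1] and living_ratio >= preset_ratio
```

Under this rule a sequence may still contain some non-living sub-results, but never for the highest-scoring image.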
  • the above-mentioned living body detection sub-results can be generated by a machine learning model in the related art.
  • the above-mentioned machine learning model can generate the living body detection sub-results based on the image or the face region image in the image.
  • the machine learning model can extract features such as color texture, non-rigid motion deformation, face material, and image distortion rate of living and non-living bodies, and generate living body detection sub-results.
  • the embodiments of the present disclosure will not be described in detail here.
  • step S330, the detection result is determined based on the living body detection result, wherein, when the living body detection result is determined to be a living body, the detection result is a detection pass. That is, when the matching result between the image sequence and the action sequence is a successful match and the living body detection result is a living body, the detection result is that the detection is passed.
  • the combination of action matching and live body detection further improves the accuracy of the verification.
  • one embodiment of the present disclosure combines action matching with living body detection, reducing the security risk of the silent living body detection used in the related art.
  • the above detection method further includes: sending the detection result to the terminal.
  • the server allows the terminal to perform further operations (for example: entering payment password, changing account password, opening specific permissions, etc.).
  • the terminal prompts the user that the detection has passed and that further operations are possible.
  • the service provider can obtain the detection result through the interface of the server, and then determine whether to provide the corresponding service to the terminal. That is, the service provider can use its own server and the server in an embodiment of the present disclosure to provide various services.
  • a first instruction is sent to the terminal, and the first instruction controls the terminal to enter a page for resending the detection request.
  • the terminal can enter the page for resending the detection request, prompt the user that the detection failed, and ask whether the detection request needs to be resent. This prompt can last for a certain period of time, until the user resends the detection request through the terminal.
  • the detection method of the embodiment of the present disclosure is re-executed from step S100 or its preceding steps.
  • the server can generate different action sequences to reduce the possibility that image sequences pre-generated by malware pass detection, thereby improving the security of the user environment.
  • when it is determined that the number of times the first instruction is sent to the terminal within the third preset time reaches a second threshold, a second instruction is sent to the terminal in response to a detection request sent by the terminal through the page, where the second instruction is used to notify the terminal that the server rejects the detection request.
  • the terminal prompts the user that the detection failed, and the server refuses to let the terminal initiate a retry through the above-mentioned retry page.
  • the terminal may no longer display the page for resending the detection request, that is, the user can no longer send the detection request through this page.
  • the above-mentioned third preset time can be counted from the first time the terminal sends a detection request in the overall detection process. For example, if the third preset time is 10 minutes, timing starts when the user opens the web page on the terminal and sends the first detection request. Once the time exceeds 10 minutes, the user will not be able to submit the detection request again on the retry page within 12 days. If the second threshold is 5 and the number of times the server issues the first instruction reaches 5 within 10 minutes, the user has retried 5 times and failed the detection each time, and the server will reject subsequent detection requests sent by the terminal through this page.
  • the detection request sent by the terminal when initiating a retry may carry a request identifier.
  • the request identifier may be an accumulated request identifier. Each time a retry is initiated, 1 will be added to the request identifier.
  • the server can obtain the request identifier in the detection request sent by the terminal, determine that the detection request comes from the above-mentioned page and is a detection request in the retry process, and determine the number of retries of the terminal (that is, the number of times the server has sent the first instruction).
  • the cost for an attacker to crack the detection method provided by the embodiments of the present disclosure may be increased. For example, 10 minutes after submitting the detection request for the first time, or after 5 retries, the attacker will not be able to submit the detection request again through the same web page (such as the page used to resend the detection request above). If the attacker wants to keep trying to crack the above detection method, a new web page must be opened. If the attacker opens the web page too many times, the IP address corresponding to the terminal will have records of multiple visits to the web page, and the owner of the terminal or the organization it belongs to can discover in time, through security detection methods in the related art, that the terminal is performing malicious operations. This increases the probability that malicious operations on the terminal are discovered, which also increases the attacker's cracking cost.
  • the second instruction is sent to the terminal.
  • the fourth preset time can be 1 minute, that is, the user needs to complete the recording of the image sequence within 1 minute. This shortens the time available to attackers to maliciously generate synthetic image sequences with video editing software, thereby reducing the possibility that an attacker uses a synthetic image sequence and further increasing the security of the user's detection environment.
  • the disclosed embodiments shorten the time for an attacker to prepare a synthetic image sequence, thereby increasing the security of the user's detection environment.
  • the embodiments of the present disclosure do not limit the specific values of the above-mentioned third preset time, fourth preset time, and second threshold, and the service provider can determine the specific data according to actual needs.
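The interaction of the third preset time, the fourth preset time, and the second threshold can be sketched as a single decision function. This is a hedged illustration only: the class `RetryLimits`, the function `handle_retry`, the string return values, and the use of seconds are assumptions, and the default values simply reuse the 10 minute, 1 minute, and 5 retry figures from the examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryLimits:
    third_preset_time: float = 600.0   # seconds since the first detection request
    fourth_preset_time: float = 60.0   # seconds from first instruction to new image sequence
    second_threshold: int = 5          # allowed number of first instructions


def handle_retry(limits: RetryLimits,
                 since_first_request: float,
                 first_instructions_sent: int,
                 first_instruction_to_new_sequence: float) -> str:
    """Return "second_instruction" (reject) or "accept" for a retry from the page."""
    if since_first_request > limits.third_preset_time:
        return "second_instruction"  # overall time window exceeded
    if first_instructions_sent >= limits.second_threshold:
        return "second_instruction"  # too many failed attempts in the window
    if first_instruction_to_new_sequence > limits.fourth_preset_time:
        return "second_instruction"  # recording took too long
    return "accept"
```

In such a sketch the number of first instructions could be derived from the accumulated request identifier carried by the retry request, as described above.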
  • users can enter online financial scenarios (or any other scenarios that require users to authenticate) through the H5 interface displayed on the terminal (such as a mobile phone or computer).
  • the server can generate an action sequence with a random number of actions and random content based on an action content library (containing, for example, facial movements, head movements, etc.), and then send it to the terminal.
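Generating an action sequence with a random length and random content might look like the sketch below. The library entries, the length bounds, and the name `generate_action_sequence` are hypothetical, and sampling without repetition is an assumption that the text does not state.

```python
import random
from typing import List, Optional, Sequence

# Hypothetical action content library; the disclosure only names facial and head movements.
ACTION_LIBRARY = ["blink", "open mouth", "nod", "shake head"]


def generate_action_sequence(library: Sequence[str],
                             min_actions: int = 2,
                             max_actions: int = 3,
                             rng: Optional[random.Random] = None) -> List[str]:
    """Pick a random number of distinct actions in random order."""
    if rng is None:
        # SystemRandom draws from the OS entropy source, so sequences are hard to predict.
        rng = random.SystemRandom()
    count = rng.randint(min_actions, max_actions)
    return rng.sample(list(library), count)
```

Passing a seeded `random.Random` makes the output reproducible for testing, while the default is intended to be unpredictable to a client that tries to prepare an image sequence in advance.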
  • the terminal displays it to the user through the H5 interface, and the user takes corresponding actions based on the action sequence for the terminal to record. After the recording is completed (for example, the camera detects a specific action or the user manually clicks the corresponding button), the terminal sends the recorded image sequence to the server.
  • the server scores the action content in the image sequence according to the action content in the action sequence.
  • the action content in a certain image is rated as qualified, the server can start scoring the next action content in the action sequence, until all images in the image sequence have been scored or all action contents in the action sequence have been completed (for example, if the score of the last action content in the action sequence is higher than the threshold, the action sequence can be regarded as fully completed).
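The sequential scoring described above can be sketched as a loop that advances to the next action content once an image reaches the first scoring threshold, starting each search at the frame after the previous match. The names `match_action_sequence` and `score_fn` are hypothetical, the scoring model itself is not shown, and treating "reaches the threshold" as greater than or equal is an assumption. The sample scores reuse images A to H from the earlier example; the zero entries are placeholders for scores the example does not give.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def match_action_sequence(num_images: int,
                          actions: Sequence[str],
                          score_fn: Callable[[int, str], float],
                          first_thresholds: Dict[str, float]) -> Tuple[bool, List[int]]:
    """Return (matched, indices of the images that matched each action content)."""
    start = 0  # the first action content starts at the first image
    matched_at: List[int] = []
    for action in actions:
        hit = None
        for i in range(start, num_images):
            if score_fn(i, action) >= first_thresholds[action]:
                hit = i
                break
        if hit is None:
            return False, matched_at  # ran out of images for this action content
        matched_at.append(hit)
        start = hit + 1  # next action content starts at the following frame
    return True, matched_at


# Images A..H are indices 0..7; zeros are placeholders, not values from the text.
SCORES = {
    "action_1": [20, 40, 60, 80, 0, 0, 0, 0],
    "action_2": [0, 0, 0, 0, 30, 45, 70, 80],
}
RESULT = match_action_sequence(
    8, ["action_1", "action_2"],
    lambda i, action: SCORES[action][i],
    {"action_1": 65, "action_2": 75},
)
```

With these inputs the first action content matches at image D (index 3) and the second at image H (index 7), so `RESULT` is `(True, [3, 7])`.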
  • live body detection can be started on the images in the image sequence whose scores are higher than a certain threshold. If the liveness test also passes, the user's usage environment can be considered safe and the user is allowed to perform some sensitive operations, such as transfers, cash withdrawals, etc.
  • a prompt window can pop up in the H5 interface to remind the user to retry. If the action sequence still fails to pass after multiple retries, the user's account will be restricted (for example, funds are frozen).
  • the service provider of online financial functions can also query the account's retry count. If there are too many retries, the service provider will know that the account may have security risks, and can send prompt information to the mobile phone bound to the account.
  • the specific execution order of each step should be determined by its function and possible internal logic.
  • the method steps may be executed by hardware, or by a processor running computer-executable code.
  • the disclosure also provides detection devices, electronic equipment, computer-readable storage media, and programs, all of which can be used to implement any detection method provided by the disclosure.
  • FIG. 3 shows a block diagram of a detection device according to an embodiment of the present disclosure.
  • an embodiment of the present disclosure also provides a detection device 100, which is applied to a server.
  • the detection device includes: an image sequence receiving module 110, used to receive an image sequence sent by a terminal in response to an action sequence, where the image sequence includes multiple frames of images; an action content processing module 120, used to sequentially obtain one action content in the action sequence as the current action content and perform the following operations: determine the starting image corresponding to the current action content, and sequentially determine the action score of at least one image after the starting image in the image sequence with respect to the current action content; determine the matching result corresponding to the current action content based on the action score of any image; and determine the matching result between the image sequence and the action sequence based on the matching results corresponding to all action contents in the action sequence; and a detection result generation module 130, used to generate a detection result based on the matching result between the image sequence and the action sequence.
  • determining the starting image corresponding to the current action content includes: when it is determined that the current action content is the first action content in the action sequence, the starting image is the first image of the image sequence; when it is determined that the current action content is not the first action content in the action sequence, the starting image is the frame following the image that successfully matched the previous action content.
  • determining the matching result of the image sequence and the action sequence includes: when no successful matching result of the action sequence is obtained within a first preset time from the start of determining the matching result of the action sequence, and/or when no successful matching result of an action content is obtained within a second preset time from the start of determining the matching result of that action content in the action sequence, determining that the matching result between the image sequence and the action sequence is a matching failure.
  • generating a detection result based on the matching result between the image sequence and the action sequence includes: when it is determined that the matching result between the image sequence and the action sequence is a successful match, filtering out the first image in the image sequence; generating a living body detection result based on the first image; and determining the detection result based on the living body detection result, wherein, when it is determined that the living body detection result is a living body, the detection result is that the detection is passed.
  • filtering out the first image in the image sequence includes: filtering out a preset number of first images in the image sequence whose action scores are greater than or equal to a second scoring threshold.
  • generating a living body detection result based on the first image includes: generating a living body detection sub-result corresponding to the first image based on the first image; selecting the first image with the highest action score as the second image; and, when it is determined that the living body detection sub-result corresponding to the second image is a living body and the ratio of the number of first images whose living body detection sub-result is a living body to the number of all first images is greater than or equal to the preset ratio, determining that the living body detection result is a living body.
  • receiving the image sequence sent by the terminal in response to the action sequence includes: decrypting the image sequence sent by the terminal in response to the action sequence to obtain a decrypted image sequence; and sequentially determining the action score of at least one image after the starting image in the image sequence with respect to the current action content includes: sequentially determining the action score of at least one image after the starting image in the decrypted image sequence with respect to the current action content.
  • determining the matching result between the image sequence and the action sequence further includes: generating at least one of facial area coordinates and facial numbers corresponding to images in the image sequence; and determining the matching result based on the image sequence, the action sequence, and at least one of the facial area coordinates and the facial number.
  • determining the matching result based on the image sequence, the action sequence, and at least one of the facial area coordinates and the facial number includes: determining, in at least one image of the image sequence, the matching result between the facial area indicated by the facial area coordinates and the action sequence, and using it as the matching result between the image sequence and the action sequence.
  • determining the matching result based on the image sequence, the action sequence, and at least one of the facial area coordinates and the facial number further includes: when it is determined that the number of images corresponding to the smallest facial number in the image sequence is greater than the first threshold, determining that the matching result is a matching failure.
  • the detection device is further configured to perform at least one of the following: when it is determined that the detection result is a failed detection, sending a first instruction to the terminal, where the first instruction controls the terminal to enter a page for resending a detection request; when the number of times the first instruction is sent to the terminal within a third preset time reaches a second threshold, sending a second instruction to the terminal in response to a detection request sent by the terminal through the page, where the second instruction is used to notify the terminal that the server rejects the detection request; and when it is determined that the time from sending the first instruction to receiving a new image sequence is greater than a fourth preset time, sending the second instruction to the terminal.
  • the functions or modules provided by the device provided by the embodiments of the present disclosure can be used to execute the methods described in the above method embodiments.
  • An embodiment of the present disclosure also provides a computer-readable storage medium on which computer program instructions are stored. When the computer program instructions are executed by a processor, the above method is implemented.
  • Computer-readable storage media may be volatile or non-volatile computer-readable storage media.
  • An embodiment of the present disclosure also provides an electronic device, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to call instructions stored in the memory to execute the above method.
  • An embodiment of the present disclosure also provides a computer program product, which includes computer-readable code, or a non-volatile computer-readable storage medium carrying the computer-readable code. When the computer-readable code runs in a processor of an electronic device, the processor in the electronic device executes the above method.
  • FIG. 4 shows a block diagram of an electronic device 1900 according to an embodiment of the present disclosure.
  • electronic device 1900 may be provided as a server.
  • electronic device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by memory 1932 for storing instructions, such as application programs, executable by processing component 1922.
  • the application program stored in memory 1932 may include one or more modules, each corresponding to a set of instructions.
  • the processing component 1922 is configured to execute instructions to perform the above-described method.
  • Electronic device 1900 may also include a power supply component 1926 configured to perform power management of electronic device 1900, a wired or wireless network interface 1950 configured to connect electronic device 1900 to a network, and an input-output (I/O) interface 1958.
  • the electronic device 1900 can operate based on an operating system stored in the memory 1932, such as a Microsoft server operating system (Windows Server TM ), a graphical user interface operating system (Mac OS X TM ) launched by Apple, a multi-user multi-process computer operating system (Unix TM ), a free and open source Unix-like operating system (Linux TM ), an open source Unix-like operating system (FreeBSD TM ), or similar.
  • a non-volatile computer-readable storage medium is also provided, such as a memory 1932 including computer program instructions, which can be executed by the processing component 1922 of the electronic device 1900 to complete the above method.
  • the above-mentioned electronic device may be provided as a terminal, a server, or other forms of equipment.
  • the present disclosure may be a system, method, and/or computer program product.
  • a computer program product may include a computer-readable storage medium having thereon computer-readable program instructions for causing a processor to implement aspects of the present disclosure.
  • Computer-readable storage media may be tangible devices that can retain and store instructions for use by an instruction execution device.
  • the computer-readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), compact disc read-only memory (CD-ROM), digital versatile disc (DVD), a memory stick, a floppy disk, and a mechanical encoding device, such as a punch card with instructions stored on it.
  • Computer-readable storage media are not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., light pulses through fiber optic cables), or electrical signals transmitted through wires.
  • Computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to various computing/processing devices, or to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
  • the network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • a network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage on a computer-readable storage medium in the respective computing/processing device.
  • Computer program instructions for performing operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or instructions in one or more programming languages.
  • the computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • an electronic circuit such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA)
  • the electronic circuit can execute computer-readable program instructions to implement various aspects of the disclosure.
  • These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, produce an apparatus that implements the functions/actions specified in one or more blocks in the flowchart and/or block diagram.
  • These computer-readable program instructions can also be stored in a computer-readable storage medium. These instructions cause the computer, programmable data processing apparatus, and/or other equipment to work in a specific manner, so that the computer-readable medium storing the instructions includes an article of manufacture that includes instructions that implement aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
  • Computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other equipment, causing a series of operating steps to be performed on the computer, other programmable data processing apparatus, or other equipment to produce a computer-implemented process, thereby causing instructions executed on a computer, other programmable data processing apparatus, or other equipment to implement the functions/actions specified in one or more blocks in the flowcharts and/or block diagrams.
  • each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions that contains one or more executable instructions for implementing the specified logical function(s).
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two consecutive blocks may actually execute substantially in parallel, or they may sometimes execute in the reverse order, depending on the functionality involved.
  • each block of the block diagram and/or flowchart illustration, and combinations of blocks in the block diagram and/or flowchart illustration, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or can be implemented using a combination of special-purpose hardware and computer instructions.
  • the computer program product can be implemented specifically through hardware, software or a combination thereof.
  • the computer program product is embodied as a computer storage medium.
  • the computer program product is embodied as a software product, such as a Software Development Kit (SDK).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to a detection method and apparatus, an electronic device and a storage medium. The detection method comprises: receiving an image sequence sent by a terminal in response to an action sequence, the image sequence comprising multiple frames of images; sequentially acquiring one action content in the action sequence as current action content, and performing the following operations: determining a starting image corresponding to the current action content, and sequentially determining action scores of each image after the starting image in the image sequence and of the current action content; determining, according to the action score of any image, a matching result corresponding to the current action content; determining a matching result between the image sequence and the action sequence according to the matching results corresponding to all the action content in the action sequence; and generating a detection result on the basis of the matching result between the image sequence and the action sequence. The present disclosure can improve the security of a user detection environment and the accuracy of detection results.

Description

Detection method and apparatus, electronic device, and storage medium
This application claims priority to the Chinese patent application filed on March 17, 2022, with application number 202210265003.2 and the invention title "Detection method, device, electronic equipment and storage medium", the entire content of which is incorporated into this application by reference.
Technical field
The present disclosure relates to the field of detection, and in particular, to a detection method, apparatus, electronic device, and storage medium.
Background
In scenarios that require human-machine verification, such as online finance and account login, operators want users who pass human-machine verification to be the real account owners, not program scripts or impostors. If the user in a human-machine verification scenario is a program script or an impostor, the verification is very likely malicious; that is, the verification environment is not safe and can easily cause property losses to the user. Therefore, how to improve the security of the verification environment is a problem that urgently needs to be solved.
Summary of the invention
The present disclosure proposes a technical solution for detection.
According to an aspect of the present disclosure, a detection method applied to a server is provided. The detection method includes: receiving an image sequence sent by a terminal in response to an action sequence, where the image sequence includes multiple frames of images; sequentially acquiring one action content in the action sequence as the current action content, and performing the following operations: determining the starting image corresponding to the current action content, and sequentially determining the action score of at least one image after the starting image in the image sequence with respect to the current action content; determining, according to the action score of any image, the matching result corresponding to the current action content; determining the matching result between the image sequence and the action sequence according to the matching results corresponding to all action contents in the action sequence; and generating a detection result based on the matching result between the image sequence and the action sequence.
In a possible implementation, determining the starting image corresponding to the current action content includes: when the current action content is determined to be the first action content in the action sequence, the starting image is the starting image of the image sequence; and when the current action content is determined not to be the first action content in the action sequence, the starting image is the frame immediately following the image that was successfully matched with the previous action content.
In a possible implementation, determining the matching result between the image sequence and the action sequence includes: determining that the matching result between the image sequence and the action sequence is a matching failure when no successful matching result for the action sequence is obtained within a first preset time from the start of determining the matching result of the action sequence, and/or when no successful matching result for an action content is obtained within a second preset time from the start of determining the matching result of that action content in the action sequence.
In a possible implementation, generating the detection result based on the matching result between the image sequence and the action sequence includes: when the matching result between the image sequence and the action sequence is determined to be a successful match, selecting first images from the image sequence; generating a liveness detection result based on the first images; and determining the detection result based on the liveness detection result, where the detection result is a pass when the liveness detection result is determined to be a living body.
In a possible implementation, selecting the first images from the image sequence includes: selecting a preset number of first images in the image sequence whose action scores are greater than or equal to a second score threshold.
In a possible implementation, generating the liveness detection result based on the first images includes: generating, based on each first image, a liveness detection sub-result corresponding to that first image; taking the first image with the highest action score as a second image; and determining that the liveness detection result is a living body when the liveness detection sub-result corresponding to the second image is a living body and the ratio of the number of first images whose liveness detection sub-result is a living body to the number of all first images is greater than or equal to a preset ratio.
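The two-part liveness decision described above can be illustrated with a minimal Python sketch. The `FirstImage` record, the `liveness_result` function, and the treatment of an empty selection as "not a living body" are assumptions made for illustration; the disclosure does not prescribe a data structure or a specific preset ratio.

```python
from dataclasses import dataclass


@dataclass
class FirstImage:
    """Hypothetical record for one selected first image."""
    action_score: float  # action score obtained during action matching
    is_live: bool        # liveness detection sub-result for this image


def liveness_result(first_images, preset_ratio):
    """Return True (living body) only when both conditions hold."""
    if not first_images:
        # Assumption: with no selected images there is nothing to vote on.
        return False
    # The second image is the first image with the highest action score.
    second_image = max(first_images, key=lambda img: img.action_score)
    live_count = sum(1 for img in first_images if img.is_live)
    ratio = live_count / len(first_images)
    return second_image.is_live and ratio >= preset_ratio
```

Requiring the highest-scoring image to be live, in addition to the ratio test, means that a majority of live frames cannot outvote a non-live result on the frame that best shows the requested action.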
In a possible implementation, receiving the image sequence sent by the terminal in response to the action sequence includes: decrypting the image sequence sent by the terminal in response to the action sequence to obtain a decrypted image sequence; and sequentially determining the action score of at least one image after the starting image in the image sequence with respect to the current action content includes: sequentially determining the action score of at least one image after the starting image in the decrypted image sequence with respect to the current action content.
In a possible implementation, determining the matching result between the image sequence and the action sequence further includes: generating at least one of facial region coordinates and a face number corresponding to the images in the image sequence; and determining the matching result according to at least one of the facial region coordinates and the face number, together with the image sequence and the action sequence.
In a possible implementation, determining the matching result according to at least one of the facial region coordinates and the face number, together with the image sequence and the action sequence, includes: determining, for at least one image of the image sequence, the matching result between the facial region indicated by the facial region coordinates and the action sequence, and using it as the matching result between the image sequence and the action sequence.
In a possible implementation, determining the matching result according to at least one of the facial region coordinates and the face number, together with the image sequence and the action sequence, further includes: determining that the matching result is a matching failure when the number of images corresponding to the face number that appears in the fewest images of the image sequence is determined to be greater than a first threshold.
In a possible implementation, the detection method further includes at least one of the following: when the detection result is determined to be a failure, sending a first instruction to the terminal, the first instruction controlling the terminal to enter a page for resending a detection request; when the number of times the first instruction has been sent to the terminal within a third preset time is determined to have reached a second threshold, sending a second instruction to the terminal in response to a detection request sent by the terminal through the page, the second instruction notifying the terminal that the server rejects the detection request; and when the time from sending the first instruction to receiving a new image sequence is determined to be greater than a fourth preset time, sending the second instruction to the terminal.
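The retry handling above can be sketched in Python as a small per-terminal policy object. The class name `RetryPolicy`, the string return values, and the use of caller-supplied timestamps in seconds are assumptions for illustration only; the disclosure does not specify how the server tracks these counts or times.

```python
from collections import deque


class RetryPolicy:
    """Hypothetical per-terminal bookkeeping for first and second instructions."""

    def __init__(self, third_preset_time, second_threshold, fourth_preset_time):
        self.third_preset_time = third_preset_time
        self.second_threshold = second_threshold
        self.fourth_preset_time = fourth_preset_time
        self._sent = deque()  # timestamps at which a first instruction was sent

    def on_detection_failed(self, now):
        # Detection did not pass: direct the terminal to the retry page.
        self._sent.append(now)
        return "FIRST_INSTRUCTION"

    def on_detection_request(self, now):
        # Forget first instructions that fall outside the third preset time.
        while self._sent and now - self._sent[0] > self.third_preset_time:
            self._sent.popleft()
        if len(self._sent) >= self.second_threshold:
            return "SECOND_INSTRUCTION"  # the server rejects the request
        return "ACCEPT"

    def on_image_sequence(self, now):
        # Reject a new image sequence that arrives too long after the retry prompt.
        if self._sent and now - self._sent[-1] > self.fourth_preset_time:
            return "SECOND_INSTRUCTION"
        return "ACCEPT"
```

Passing `now` explicitly keeps the sketch deterministic; a real deployment would more likely read a server clock and store the state per terminal or per account.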
According to an aspect of the present disclosure, a detection apparatus applied to a server is provided. The detection apparatus includes: an image sequence receiving module configured to receive an image sequence sent by a terminal in response to an action sequence, the image sequence including multiple frames of images; an action content processing module configured to sequentially acquire one action content in the action sequence as the current action content and perform the following operations: determining a starting image corresponding to the current action content, and sequentially determining an action score of at least one image after the starting image in the image sequence with respect to the current action content; determining, according to the action score of any image, a matching result corresponding to the current action content; and determining, according to the matching results corresponding to all action contents in the action sequence, a matching result between the image sequence and the action sequence; and a detection result generation module configured to generate a detection result based on the matching result between the image sequence and the action sequence.
According to an aspect of the present disclosure, an electronic device is provided, including: a processor; and a memory for storing processor-executable instructions, where the processor is configured to invoke the instructions stored in the memory to perform any one of the detection methods described above.
According to an aspect of the present disclosure, a computer-readable storage medium is provided, on which computer program instructions are stored. When executed by a processor, the computer program instructions implement any one of the detection methods described above.
According to an aspect of the present disclosure, a computer program product is provided, including computer-readable code or a non-volatile computer-readable storage medium carrying computer-readable code. When the computer-readable code runs in a processor of an electronic device, the processor of the electronic device performs any one of the detection methods described above.
The present disclosure provides a detection method in which a server receives an image sequence sent by a terminal in response to an action sequence, the image sequence including multiple frames of images. The server then sequentially acquires one action content in the action sequence as the current action content and performs the following operations: determining a starting image corresponding to the current action content; sequentially determining an action score of at least one image after the starting image in the image sequence with respect to the current action content; determining, according to the action score of any image, a matching result corresponding to the current action content; and determining, according to the matching results corresponding to all action contents in the action sequence, a matching result between the image sequence and the action sequence. Finally, the server generates a detection result based on the matching result between the image sequence and the action sequence. Because the detection result is generated on the server, a malicious program is less likely to be able to alter it; combined with the action scores, this further improves the accuracy of the matching result and thus enables accurate detection of the security of the verification environment. In addition, because the server has more computing power than the terminal, the server can reduce the time needed to generate the detection result, or can use a detection model that is computationally more complex but more accurate.
It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief Description of the Drawings
The accompanying drawings are incorporated into and constitute a part of this specification. They illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the technical solutions of the present disclosure.
FIG. 1 shows a flowchart of a detection method according to an embodiment of the present disclosure.
FIG. 2 shows a flowchart of a detection method according to an embodiment of the present disclosure.
FIG. 3 shows a block diagram of a detection apparatus according to an embodiment of the present disclosure.
FIG. 4 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments, features, and aspects of the present disclosure are described in detail below with reference to the accompanying drawings. The same reference numerals in the drawings denote elements with the same or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless otherwise indicated.
The word "exemplary" as used herein means "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred over or better than other embodiments.
The term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" can mean: A alone, both A and B, or B alone. In addition, the term "at least one" herein means any one of multiple items or any combination of at least two of them. For example, "including at least one of A, B, and C" can mean including any one or more elements selected from the set consisting of A, B, and C.
In addition, numerous specific details are given in the following detailed description to better explain the present disclosure. Those skilled in the art will understand that the present disclosure can also be practiced without certain of these details. In some instances, methods, means, components, and circuits well known to those skilled in the art are not described in detail, so as to highlight the subject matter of the present disclosure.
In the related art, the detection technology is usually built into an application on the terminal, so the detection process is roughly as follows: the terminal receives a detection request from the user, recognizes the user's image data with the built-in detection technology, generates a detection result, and then sends the detection result to the server; the server decides, according to that detection result, whether to provide further services to the terminal.
However, this arrangement tends to cause the following problems:
1. The detection result is generated by the terminal and then transmitted to the server, so it is easily tampered with by malicious programs. For example, the terminal's detection result is a matching failure, but a malicious program changes it to a successful match before it is sent to the server. The server then concludes that it can provide further services to the terminal; that is, the server considers the terminal's detection environment secure when in fact it is not, and the malicious program can easily cause property loss to the user.
2. The detection technology is integrated into the application, so the computing power available to it is limited. With limited computing power, and in order to keep the user's waiting time short, the application cannot readily use a computationally more complex detection technology, so its detection accuracy is limited.
In view of this, an embodiment of the present disclosure provides a detection method in which a server receives an image sequence sent by a terminal in response to an action sequence, the image sequence including multiple frames of images. The server then sequentially acquires one action content in the action sequence as the current action content and performs the following operations: determining a starting image corresponding to the current action content; sequentially determining an action score of at least one image after the starting image in the image sequence with respect to the current action content; determining, according to the action score of any image, a matching result corresponding to the current action content; and determining, according to the matching results corresponding to all action contents in the action sequence, a matching result between the image sequence and the action sequence. Finally, the server generates a detection result based on the matching result between the image sequence and the action sequence. Because the detection result is generated on the server, a malicious program is less likely to be able to alter it; combined with the action scores, this further improves the accuracy of the matching result and thus enables accurate detection of the security of the verification environment. In addition, because the server has more computing power than the terminal, the server can reduce the time needed to generate the detection result, or can use a detection model that is computationally more complex but more accurate. The terminal can present the pages for recording the image sequence, retrying, showing the detection result, and so on in the form of H5 web pages, which keeps the terminal-side program lightweight and lowers the computing power required of the terminal.
Exemplarily, the above detection method is performed by a server, which may be, for example, a physical server, a virtual host, a virtual private server (VPS), or a cloud server. The server interacts with a terminal, which may be a mobile device, a user terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the above detection method can also be implemented by a processor invoking computer-readable instructions stored in a memory.
FIG. 1 shows a flowchart of a detection method according to an embodiment of the present disclosure. As shown in FIG. 1, the detection method includes the following steps:
In step S100, an image sequence sent by the terminal in response to an action sequence is received, the image sequence including multiple frames of images. Exemplarily, before performing this step, the server may, in response to a detection request sent by the terminal, randomly select action contents to generate the action sequence. In one example, the action sequence includes multiple action contents, each of which indicates a facial action that the user needs to complete. The server can obtain the action contents from a preset action content library that stores multiple action contents. In one example, the server randomly selects a fixed number of action contents from the preset action content library and randomly orders them (that is, in response to each detection request sent by the terminal, the server can randomly issue to the terminal action sequences that differ in number and order). For example, if the server is set to select 3 action contents, it selects 3 action contents from actions such as blinking, shaking the head, nodding, opening the mouth, tilting the head, and smiling, and shuffles their order to obtain a random action sequence.
In an embodiment of the present disclosure, the action sequence is randomly generated; that is, each time the terminal sends a detection request, the action sequence it obtains is very likely to be different. This reduces the possibility that malware records an image sequence in advance and thereby improves the security of the detection environment. Continuing the above example, the server may also select a random number of action contents and randomly order them to further improve the security of the detection environment. For example, if the server is set to select a random number, from 2 to 5, of action contents, it randomly selects that number of action contents from actions such as blinking, shaking the head, nodding, opening the mouth, tilting the head, and smiling, and shuffles their order to obtain a random action sequence.
The server can then send the action sequence to the terminal. Exemplarily, after receiving the action sequence, the terminal prompts the user with it, for example by voice or text. The user then follows the action sequence and records the image sequence with the terminal; after recording is complete, the terminal sends the recorded image sequence to the server. Exemplarily, the terminal may also limit the maximum recording duration of the image sequence and inform the user of it, to save server computing power. In one example, the terminal interacts with the user through a web page, which keeps the detection method lightweight. The image sequence may be a video or a series of continuously captured images.
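The random generation of an action sequence described in step S100 can be sketched in Python as follows. The action labels in `ACTION_LIBRARY`, the function name `generate_action_sequence`, and the choice of `random.SystemRandom` as the random source are assumptions for illustration; the disclosure only requires that the actions and their order be selected at random from a preset library.

```python
import random

# Hypothetical action content library; the labels mirror the actions named in the text.
ACTION_LIBRARY = ["blink", "shake_head", "nod", "open_mouth", "tilt_head", "smile"]


def generate_action_sequence(min_count=3, max_count=3, rng=None):
    """Pick a random number of distinct actions and return them in random order."""
    if not 1 <= min_count <= max_count <= len(ACTION_LIBRARY):
        raise ValueError("counts must fit within the action library")
    # Assumption: an OS-backed random source, so sequences are hard to predict.
    rng = rng or random.SystemRandom()
    count = rng.randint(min_count, max_count)
    # sample() draws without replacement and returns the picks in random order,
    # so selection and shuffling happen in a single call.
    return rng.sample(ACTION_LIBRARY, count)
```

Calling `generate_action_sequence()` corresponds to the fixed-count example (3 actions), and `generate_action_sequence(2, 5)` corresponds to the variable-count example (2 to 5 actions).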
In a possible implementation, the terminal sends an encrypted image sequence to the server to improve security during transmission. In this case, step S100 may include: decrypting the image sequence sent by the terminal in response to the action sequence to obtain a decrypted image sequence. The subsequent steps are then performed on the decrypted image sequence. By encrypting the image sequence, an embodiment of the present disclosure reduces the risk of the image sequence being altered by malicious programs and thereby improves its security during transmission. Exemplarily, the image sequence may be encrypted frame by frame to further increase its security during transmission.
Continuing with FIG. 1, in step S200, one action content in the action sequence is sequentially acquired as the current action content, and the following operations are performed:
In step S210, the starting image corresponding to the current action content is determined, and the action score of at least one image after the starting image in the image sequence with respect to the current action content is determined in sequence. The action score may be positively correlated with how closely the action in the image conforms to the current action content, and may be obtained through a machine learning model. For example, if the machine learning model is a binary classification model (that is, each input frame is classified as "is this action content" or "is not this action content"), then in classifying the action in an image, the model first generates an action score for the input image, and the input image is classified as "is this action content" when its action score is greater than or equal to a score threshold. In other words, the action score mentioned in an embodiment of the present disclosure may be equal to the action score used by the machine learning model in the classification process.
Exemplarily, when the current action content is determined to be the first action content in the action sequence, the starting image is the starting image of the image sequence; for example, the first frame may be used as the starting image of the image sequence, or a particular frame of the image sequence may be designated in advance as the starting image. When the current action content is not the first action content in the action sequence, the starting image is the frame immediately following the image that was successfully matched with the previous action content. By setting a starting image in this way, an embodiment of the present disclosure reduces the amount of computation needed for action content matching, and the starting image also serves as a marker of the order of action contents in the image sequence.
For example, suppose the image sequence has 20 frames and the action sequence consists, in order, of blinking, opening the mouth, and raising the head. The server uses frame 1 as the starting image for the blinking action and generates action scores. If the blinking action is successfully matched at frame 6, frame 7 is used as the starting image for the mouth-opening action. If the mouth-opening action is successfully matched at frame 12, frame 13 is used as the starting image for the head-raising action. If the head-raising action is successfully matched at frame 15, frames 16 to 20 need not be examined, which saves server computing power. The action contents corresponding to the successfully matched frames 6, 12, and 15 are then taken as the order of action contents in the image sequence.
In a possible implementation, if the image sequence has been encrypted by the terminal device, step S210 may include: sequentially determining the action score of at least one image after the starting image in the decrypted image sequence with respect to the current action content. By encrypting the image sequence, an embodiment of the present disclosure reduces the risk of the image sequence being altered by malicious programs and thereby improves its security during transmission.
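The starting-image rule and the 20-frame example in step S210 can be sketched in Python as follows. The function `match_action_sequence`, the `score_fn` callback standing in for the per-action machine learning model, and the choice to score the starting image itself (as the 20-frame example does with frame 1) are assumptions for illustration. The comparison against the first score threshold uses "greater than", following the wording of step S220.

```python
def match_action_sequence(frames, actions, score_fn, first_score_threshold):
    """Return the 1-based frame number matched by each action, or None on failure.

    score_fn(frame, action) is a hypothetical stand-in for the model that
    produces an action score for one frame and one action content.
    """
    matched_frames = []
    start = 0  # 0-based index of the starting image for the current action
    for action in actions:
        hit = None
        for i in range(start, len(frames)):
            if score_fn(frames[i], action) > first_score_threshold:
                hit = i
                break
        if hit is None:
            return None  # the current action content was never matched
        matched_frames.append(hit + 1)
        # The next action starts at the frame after the matched one,
        # so earlier frames are never re-examined.
        start = hit + 1
    return matched_frames
```

With 20 frames and a scorer that peaks at frames 6, 12, and 15 for the three actions, the function returns `[6, 12, 15]` and never scores frames 16 to 20. If the same actions were requested in the reverse order, the function would return `None`, which shows how the starting image enforces the order of the action sequence.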
In step S220, the matching result corresponding to the current action content is determined according to the action score of any image. Exemplarily, when the action score of any image is greater than a first score threshold, the matching result corresponding to the current action content is determined to be a successful match. The server administrator can set the first score threshold according to actual conditions. Exemplarily, the higher the first score threshold, the more closely the action in the image must conform to the action content, and the more accurate the final matching result; the embodiments of the present disclosure do not limit the specific value of the first score threshold.
Exemplarily, the matching result between the image sequence and the action sequence is determined to be a matching failure when no successful matching result for the action sequence is obtained within a first preset time from the start of determining the matching result of the action sequence, and/or when no successful matching result for an action content is obtained within a second preset time from the start of determining the matching result of that action content in the action sequence. For example, if the first preset time is 20 seconds and the server has not finished matching every action content in the action sequence within 20 seconds, the server determines that matching of the image sequence has failed. If the second preset time is 5 seconds and the server has not finished matching some action content in the action sequence within 5 seconds, that is, matching of that action content has been attempted for 5 seconds without success, the server determines that matching of the image sequence has failed. Setting these conditions further increases security during user verification and also improves verification efficiency. The embodiments of the present disclosure do not limit the specific values of the first preset time and the second preset time.
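The two time limits in step S220 can be sketched in Python as a small deadline tracker. The class `MatchDeadlines`, its method names, and the injectable `clock` parameter are assumptions for illustration; the sketch also assumes the limits are measured in server processing time, which the text suggests but does not state outright.

```python
import time


class MatchDeadlines:
    """Hypothetical tracker for the first and second preset times."""

    def __init__(self, first_preset_time, second_preset_time, clock=time.monotonic):
        self.first_preset_time = first_preset_time    # limit for the whole action sequence
        self.second_preset_time = second_preset_time  # limit for a single action content
        self._clock = clock
        self._sequence_start = None
        self._action_start = None

    def start_sequence(self):
        # Called once when matching of the action sequence begins.
        self._sequence_start = self._clock()
        self._action_start = self._sequence_start

    def start_action(self):
        # Called each time matching moves on to the next action content.
        self._action_start = self._clock()

    def expired(self):
        # True when either limit is exceeded, which means a matching failure.
        now = self._clock()
        return (now - self._sequence_start > self.first_preset_time
                or now - self._action_start > self.second_preset_time)
```

A matching loop would call `start_sequence()` once, call `start_action()` whenever it advances to the next action content, and check `expired()` before scoring each frame. `time.monotonic` is used by default because it is unaffected by system clock adjustments.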
In step S230, the matching result between the image sequence and the action sequence is determined according to the matching results corresponding to all action contents in the action sequence. For example, when the matching results corresponding to all action contents in the action sequence are successful matches, the matching result between the image sequence and the action sequence is determined to be a successful match. For example, the matching result between the image sequence and the action sequence may be determined by machine learning models. Each type of action content in the action sequence corresponds to one machine learning model, and the server performs matching detection on the image sequence by calling, in order, the machine learning models corresponding to the action contents in the action sequence. The embodiments of the present disclosure do not limit how the machine learning models are trained, as long as each model can detect its corresponding action content. For example, the input of a machine learning model may be an image, and the output may be the matching result for the action content corresponding to that model. For example, the machine learning model may determine whether the action content is matched by extracting the positional relationships between facial key points in the image (the key points may be extracted with, for example, an Active Shape Model, Active Appearance Models, a cascaded pose regression algorithm, or a temporal action detection algorithm). For example, in an embodiment of the present disclosure, the machine learning model is deployed on a server with higher computing power rather than on the terminal, so the detection method may use a machine learning model that is computationally more complex but more accurate. For instance, a machine learning model with consecutive-image matching logic from the related art may be used to make the matching result more accurate: for an image that successfully matches the 'open mouth' action content, the vertical distance between the mouth key points in the preceding images should be smaller than the vertical distance between the mouth key points in that image (that is, across consecutive frames the user's face goes from a 'mouth closed' state to a 'mouth open' state). For such machine learning models, reference may be made to the related art, and details are not repeated here. In other words, in an embodiment of the present disclosure, the server may perform action detection and living body detection frame by frame, so as to use the correlation between preceding and following frames and increase the accuracy of the detection result. For example, various time limits may also be added to the detection process (as described in detail later) to further increase the security of the user's environment.
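The 'open mouth' example above can be sketched as a simple check on key-point distances. This is only an illustration of the consecutive-frame logic, not the machine learning model itself; the key-point coordinates are assumed to come from some landmark extractor, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates of a facial key point


def mouth_gap(upper_lip: Point, lower_lip: Point) -> float:
    """Vertical distance between an upper-lip and a lower-lip key point."""
    return abs(lower_lip[1] - upper_lip[1])


def open_mouth_consistent(gaps: Sequence[float]) -> bool:
    """Check that every preceding frame has a smaller mouth gap than the last frame.

    `gaps` holds the mouth gap of each consecutive frame; the last entry is
    the candidate frame for the 'open mouth' action content.
    """
    if len(gaps) < 2:
        return False  # no preceding frame to compare against (an assumption)
    *previous, current = gaps
    return all(g < current for g in previous)
```

For instance, gaps of 2.0, 3.5, and 9.0 pixels over three consecutive frames are consistent with a mouth going from closed to open, whereas 9.0, 3.0, 2.0 are not.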
In a possible implementation, step S200 may include: when it is determined that the action contents in the action sequence match the action contents detected in the image sequence one to one and in the same order, determining that the matching result is a successful match. For example, if the action sequence consists of, in order, blinking, shaking the head, and opening the mouth, the action contents in the image sequence should follow that same order. If the order of the action contents in the image sequence is blinking, opening the mouth, shaking the head, the matching result is determined to be a matching failure; if the image sequence contains only blinking and shaking the head, the matching result is also determined to be a matching failure. By checking both the number and the order of the action contents, an embodiment of the present disclosure can accurately determine whether the user's detection environment is safe.
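As a minimal sketch of the one-to-one, same-order rule, assuming the action contents detected in the image sequence have already been reduced to a list of labels (the label strings and function name are hypothetical):

```python
from typing import Sequence


def sequences_match(expected: Sequence[str], detected: Sequence[str]) -> bool:
    """True only if both sequences have the same actions in the same order."""
    return list(expected) == list(detected)


expected = ["blink", "shake_head", "open_mouth"]
in_order = sequences_match(expected, ["blink", "shake_head", "open_mouth"])
wrong_order = sequences_match(expected, ["blink", "open_mouth", "shake_head"])
missing_action = sequences_match(expected, ["blink", "shake_head"])
```

Only the first comparison succeeds; the other two correspond to the two failure cases in the example above.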
In a possible implementation, in order to save computing power and improve user security, determining the matching result between the image sequence and the action sequence may include: generating at least one of facial region coordinates and a face number corresponding to the images in the image sequence; and determining the matching result according to at least one of the facial region coordinates and the face number, together with the image sequence and the action sequence.
For example, the facial region coordinates may be obtained through a facial region extraction model from the related art, which is not limited in the embodiments of the present disclosure. The facial region coordinates indicate the user's facial region in each image of the image sequence. In one example, determining the matching result may be: determining the matching result between the action sequence and the facial region indicated by the facial region coordinates in at least one image of the image sequence, and using it as the matching result between the image sequence and the action sequence. For example, steps S200 and S300 may be performed on the facial region instead of the whole "image" described above to obtain the detection result. Using facial region coordinates allows local matching of images, which reduces the computing cost on the server.
The face number is used to distinguish users with different facial features and may be obtained through the facial region extraction model described above, so as to ensure that the facial images in the image sequence come largely from the same user. In other words, if the image sequence contains images of user A and user B, the facial region images of user A and those of user B have different face numbers. In one example, the matching result is determined to be a matching failure when the number of images corresponding to the face number with the fewest images in the image sequence is greater than a first threshold. For example, if the image sequence contains 15 frames of user A and 20 frames of user B and the first threshold is 10 frames, the server determines that the matching result is a matching failure (15 frames is greater than 10 frames), which reduces the probability that user A and user B performing verification at the same time goes unrecognized. With this setting, the server can tolerate a limited number of unexpected events while the terminal collects the image sequence (for example, the terminal's camera capturing a face behind the user) while still keeping the user's verification environment secure.
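The face-number rule can be sketched as follows, assuming each frame has already been labelled with a face number. The function name is hypothetical, and treating a sequence with only one face number as passing this check is an assumption made for the sketch, since the text above only discusses the multi-user case.

```python
from collections import Counter
from typing import Hashable, Sequence


def fails_face_number_check(face_numbers: Sequence[Hashable], first_threshold: int) -> bool:
    """True if the least frequent face number still appears in too many frames."""
    counts = Counter(face_numbers)
    if len(counts) < 2:
        return False  # assumption: a single face number never fails this check
    return min(counts.values()) > first_threshold
```

With 15 frames of user A, 20 frames of user B, and a first threshold of 10, the function returns `True` (matching failure); with 35 frames of user A and 3 stray frames of user B, it returns `False`.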
Continuing to refer to FIG. 1, in step S300, a detection result is generated based on the matching result between the image sequence and the action sequence.
In a possible implementation, the final detection result may be generated based on the matching result in combination with a living body detection result.
Referring to FIG. 2, FIG. 2 shows a flowchart of a detection method according to an embodiment of the present disclosure. As shown in FIG. 2, in a possible implementation, step S300 may include:
In step S310, when the matching result between the image sequence and the action sequence is determined to be a successful match, first images are selected from the image sequence. In one example, this step may be: selecting from the image sequence a preset number of first images whose action scores are greater than or equal to a second score threshold. An embodiment of the present disclosure may use the images with higher action scores in the image sequence as the images for subsequent living body detection, which saves server computing power. In addition, images with higher action scores are usually representative, so the selection has little impact on the accuracy of the living body detection result.
For example, the second score threshold may be less than or equal to the first score threshold, and the first and second score thresholds may differ between action contents. For example, suppose the image sequence contains, in order, image A (score 20), image B (score 40), image C (score 60), image D (score 80), image E (score 30), image F (score 45), image G (score 70), and image H (score 80), where images A to D belong to one action content (whose first score threshold is 65) and images E to H belong to another action content (whose first score threshold is 75). If the second score threshold is 50 for both action contents and the preset number is 3, then under the rule of selecting in order until the preset number is reached, images C, D, and G are taken as the first images and image H is discarded. If computation time is not a concern, all images whose action scores are greater than or equal to the second score threshold may be obtained and the lowest-scoring ones discarded, that is, the preset number of images with the highest action scores are kept to improve the accuracy of living body detection; in this case image C is discarded and images D, G, and H are kept. If the second score threshold of the first action content is 30, the second score threshold of the second action content is 40, and the preset number is 6, then images B, C, D, F, G, and H are taken as the first images. For example, the preset number may denote either the total number of first images whose action scores are greater than or equal to the second score threshold, or the number of such images per action content. Continuing the example, if the preset number for each action content is 2, images B, C, D, F, G, and H are narrowed down to images B, C, F, and G; if computation time is not a concern, the highest-scoring images may be selected instead, namely images C, D, G, and H.
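The two selection rules in the worked example (selecting in order until the preset number is reached, or keeping the highest-scoring images) can be reproduced with a short Python sketch. The `Frame` type and function names are hypothetical; the scores and thresholds are the ones from the example, with the preset number interpreted as a total across all action contents.

```python
from typing import Dict, List, NamedTuple


class Frame(NamedTuple):
    name: str
    action: int   # index of the action content the frame belongs to
    score: float  # action score of the frame


def select_in_order(frames: List[Frame], second_thresholds: Dict[int, float],
                    preset_number: int) -> List[Frame]:
    """Take eligible frames in sequence order until the preset number is reached."""
    eligible = [f for f in frames if f.score >= second_thresholds[f.action]]
    return eligible[:preset_number]


def select_top(frames: List[Frame], second_thresholds: Dict[int, float],
               preset_number: int) -> List[Frame]:
    """Keep the highest-scoring eligible frames, reported in sequence order."""
    eligible = [f for f in frames if f.score >= second_thresholds[f.action]]
    best = set(sorted(eligible, key=lambda f: f.score, reverse=True)[:preset_number])
    return [f for f in eligible if f in best]


FRAMES = [
    Frame("A", 0, 20), Frame("B", 0, 40), Frame("C", 0, 60), Frame("D", 0, 80),
    Frame("E", 1, 30), Frame("F", 1, 45), Frame("G", 1, 70), Frame("H", 1, 80),
]

in_order = [f.name for f in select_in_order(FRAMES, {0: 50, 1: 50}, 3)]
top = [f.name for f in select_top(FRAMES, {0: 50, 1: 50}, 3)]
```

Here `in_order` is `["C", "D", "G"]` and `top` is `["D", "G", "H"]`, matching the two outcomes described in the example.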
In step S320, a living body detection result is generated based on the first images. The selected first images not only have higher picture quality (that is, are more likely to show a living body), but are also fewer than the images in the full image sequence, which effectively reduces the computation time of living body detection.
In a possible implementation, step S320 may include: generating, based on the first images, a living body detection sub-result corresponding to each first image; taking the first image with the highest action score as a second image; and determining that the living body detection result is a living body when the living body detection sub-result corresponding to the second image is a living body and the ratio of the number of first images whose sub-result is a living body to the number of all first images is greater than or equal to a preset ratio. An embodiment of the present disclosure thus defines the following detection rule: if the second image is a living body and the proportion of images detected as a living body is greater than or equal to the preset ratio, the server determines that the living body detection result of the image sequence is a living body. In real shooting scenarios, a user recording an image sequence through the terminal may be disturbed by external factors, such as another person's face being captured unintentionally by the camera or the terminal being dropped. In such cases the image sequence may contain non-living images, and the detection rule above allows the image sequence to contain a limited number of them. However, if the proportion of non-living images exceeds what the preset ratio allows, the detection is more likely to be malicious. For example, another person may maliciously make a mask of the account owner; if the mask fits a human face, that person could easily complete the various action detections required of the account owner. By adding living body detection, an embodiment of the present disclosure reduces the probability that such an attack passes detection, thereby improving the security of user verification. The preset ratio may be set according to actual conditions and is not limited in the embodiments of the present disclosure. For example, the higher the preset ratio, the smaller the proportion of non-living images that can be tolerated, and the more reliable a living body detection result of "living body" is.
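The aggregation rule for step S320 can be sketched as follows, assuming the action scores and the per-image living body detection sub-results of the first images are already available as parallel lists. The function name and the handling of an empty input are assumptions made for illustration.

```python
from typing import Sequence


def liveness_result(action_scores: Sequence[float],
                    sub_results: Sequence[bool],
                    preset_ratio: float) -> bool:
    """Combine per-image sub-results into one living body detection result.

    action_scores[i] and sub_results[i] belong to the i-th first image;
    sub_results[i] is True when that image was judged to be a living body.
    """
    if not action_scores:
        return False  # assumption: no first images means no "living body" result
    # The second image is the first image with the highest action score.
    second = max(range(len(action_scores)), key=lambda i: action_scores[i])
    live_ratio = sum(sub_results) / len(sub_results)
    return bool(sub_results[second]) and live_ratio >= preset_ratio
```

For three first images with scores 60, 80, 70 and a preset ratio of 0.6, sub-results (live, live, non-live) pass, whereas (live, non-live, live) fail because the highest-scoring image is non-living.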
For example, the living body detection sub-results may be generated by a machine learning model from the related art, which may generate a sub-result based on an image or on the face region within the image. For example, the machine learning model may extract features that distinguish living from non-living subjects, such as color texture, non-rigid motion deformation, face material, and image distortion rate, to generate the living body detection sub-result; details are not repeated here.
In step S330, the detection result is determined based on the living body detection result, where the detection result is a pass when the living body detection result is determined to be a living body. That is, when the matching result between the image sequence and the action sequence is a successful match and the living body detection result is a living body, the detection result is a pass; combining action matching with living body detection further improves the accuracy of verification. In addition, by combining action detection with living body detection, an embodiment of the present disclosure reduces the insecurity of the silent living body detection used in the related art.
In a possible implementation, the detection method further includes: sending the detection result to the terminal. For example, when the detection result is a pass, the server allows the terminal to perform further operations (for example, entering a payment password, changing an account password, or opening specific permissions); after receiving the detection result, the terminal notifies the user that the detection has passed and that further operations are available. For example, after the detection result is generated, a service provider may obtain it through an interface of the server and then decide whether to provide the corresponding service to the terminal. That is, the service provider may use its own server together with the server in an embodiment of the present disclosure to provide various services.
For example, when the detection result is determined to be a failure, a first instruction is sent to the terminal, where the first instruction controls the terminal to enter a page for resending the detection request. After receiving the first instruction, the terminal may enter that page, notify the user that the detection has failed, and ask whether the detection request should be resent; the prompt may remain for a certain period of time. When the user resends the detection request through the terminal, the detection method of the embodiments of the present disclosure is executed again from step S100 or its preceding steps. On each retry, the server may generate a different action sequence to reduce the possibility that malware passes detection with a pre-generated image sequence, thereby improving the security of the user's environment.
In one example, when it is determined that the number of times the first instruction has been sent to the terminal within a third preset time reaches a second threshold, a second instruction is sent to the terminal in response to a detection request sent by the terminal through the page, where the second instruction notifies the terminal that the server rejects the detection request. After receiving the second instruction, the terminal notifies the user that the detection has failed, and the server refuses further retries initiated by the terminal through the retry page. Correspondingly, if the number of times the first instruction has been sent to the terminal has not reached the second threshold but the third preset time has elapsed, the terminal may stop displaying the page for resending the detection request, so that the user can no longer send a detection request through that page.
The third preset time may be counted from the first detection request made by the terminal in the overall detection flow. For example, if the third preset time is 10 minutes, timing starts when the user opens the web page on the terminal and sends the first detection request; once more than 10 minutes have elapsed, the user can no longer submit a detection request on the retry page. If the second threshold is 5 and the server has issued the first instruction 5 times within 10 minutes, the user has retried 5 times and failed every time, and the server rejects any subsequent detection request sent by the terminal from that page. For example, the detection request sent by the terminal when initiating a retry may carry a request identifier, such as a cumulative identifier that is incremented by 1 on each retry. By reading the request identifier in the detection request sent by the terminal, the server can determine that the request comes from the retry page and belongs to the retry process, and can determine the number of retries made by the terminal (that is, the number of times the server has sent the first instruction).
With this setting, the cost for an attacker to crack the detection method provided by the embodiments of the present disclosure is increased. For example, 10 minutes after the first detection request, or after 5 retries, the attacker can no longer submit a detection request through the same web page (for example, the page for resending the detection request). To keep probing the detection method in an attempt to crack it, the attacker has to open a new web page. If the attacker opens the web page too many times, the IP address of the terminal accumulates a record of repeated visits to that page, and the owner of the terminal or the organization it belongs to can promptly discover, through security detection methods from the related art, that the terminal is performing malicious operations. This increases the probability that malicious operations on the terminal are discovered, that is, it raises the attacker's cost of cracking.
In one example, the second instruction is sent to the terminal when it is determined that the time from sending the first instruction to receiving a new image sequence is greater than a fourth preset time. For example, the fourth preset time may be 1 minute, meaning that the user must finish recording the image sequence within 1 minute. This shortens the time available for an attacker to generate a synthetic image sequence with video editing software, which reduces the possibility of the attacker using a synthetic image sequence and further increases the security of the user's detection environment.
By defining retry rules for detection requests, the embodiments of the present disclosure shorten the time available for an attacker to prepare a synthetic image sequence, which increases the security of the user's detection environment.
The embodiments of the present disclosure do not limit the specific values of the third preset time, the fourth preset time, or the second threshold; the service provider may determine these values according to actual needs.
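The retry rules can be summarized in a small server-side sketch. The class, field, and method names are hypothetical, the default values are the example values mentioned above (10 minutes, 1 minute, 5 retries), and treating all three conditions as a single accept-or-reject decision on the server is a simplification made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class RetryPolicy:
    third_preset_time: float = 600.0   # seconds since the first detection request
    fourth_preset_time: float = 60.0   # seconds from the first instruction to the new image sequence
    second_threshold: int = 5          # maximum number of first instructions sent

    def allows_retry(self, first_request_at: float, first_instruction_at: float,
                     now: float, first_instructions_sent: int) -> bool:
        """Return False when the server should answer with the second instruction."""
        if first_instructions_sent >= self.second_threshold:
            return False  # retried too many times
        if now - first_request_at > self.third_preset_time:
            return False  # overall retry window has elapsed
        if now - first_instruction_at > self.fourth_preset_time:
            return False  # the new image sequence took too long to arrive
        return True
```

All timestamps are assumed to be seconds on the same clock; the count of first instructions could be derived from the cumulative request identifier described above.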
In a practical application scenario, a user may enter an online financial scenario (or any other scenario that requires identity verification) through an H5 interface displayed on a terminal (such as a mobile phone or computer). During identity verification, the server may generate, from an action content library (for example, facial actions and head actions), an action sequence whose length and content are random, and send it to the terminal. The terminal displays the action sequence to the user through the H5 interface, and the user performs the corresponding actions for the terminal to record. When recording finishes (for example, when the camera detects a specific action or the user manually taps the corresponding button), the terminal sends the recorded image sequence to the server. The server scores the action contents in the image sequence according to the action contents in the action sequence: once the action content in some image receives a qualifying score, scoring of the next action content in the action sequence begins, and this continues until the whole image sequence has been scored or all action contents in the action sequence have been completed (for example, the action sequence may be regarded as fully completed when the score for its last action content exceeds the threshold). Once all action contents in the action sequence are regarded as completed, living body detection may begin on the images in the image sequence whose scores exceed a certain threshold. If living body detection also passes, the user's environment may be regarded as safe, and the user is allowed to perform sensitive operations such as transfers and withdrawals. If action sequence detection or living body detection fails, a prompt window may pop up in the H5 interface to remind the user to retry; if detection still fails after multiple retries, the user's account is restricted (for example, funds are frozen). The service provider of the online financial function may also query the number of retries for the account; if the number is too high, the service provider learns that the account may be at risk and may send a notification to the mobile phone bound to the account. It can be understood that the method embodiments mentioned in the present disclosure may be combined with one another to form combined embodiments without departing from their principles and logic; for brevity, details are not repeated in the present disclosure. Those skilled in the art can understand that, in the methods of the specific implementations above, the specific execution order of the steps should be determined by their functions and possible internal logic. In addition, the method steps may be executed by hardware, or by a processor running computer-executable code.
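The application scenario above has the server generate an action sequence of random length and content from an action content library. A minimal sketch of that step follows; the library entries, the length bounds, and the choice to avoid repeating an action within one sequence are all assumptions made for illustration, not details given in the disclosure.

```python
import random
from typing import List

# Hypothetical action content library.
ACTION_LIBRARY = ["blink", "shake_head", "open_mouth", "nod", "turn_left"]

# SystemRandom draws from the operating system's entropy source, so the
# sequence is harder to predict than one from a seeded pseudo-random generator.
_rng = random.SystemRandom()


def generate_action_sequence(min_len: int = 2, max_len: int = 4) -> List[str]:
    """Return a random-length list of distinct actions from the library."""
    length = _rng.randint(min_len, max_len)
    return _rng.sample(ACTION_LIBRARY, length)
```

Generating a fresh sequence for each detection request, including retries, is what makes a pre-recorded image sequence unlikely to match.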
In addition, the present disclosure further provides a detection apparatus, an electronic device, a computer-readable storage medium, and a program, all of which can be used to implement any detection method provided by the present disclosure. For the corresponding technical solutions and descriptions, refer to the corresponding records in the method section; details are not repeated here.
Referring to FIG. 3, FIG. 3 shows a block diagram of a detection apparatus according to an embodiment of the present disclosure. As shown in FIG. 3, in a possible implementation, an embodiment of the present disclosure further provides a detection apparatus 100 applied to a server. The detection apparatus includes: an image sequence receiving module 110, configured to receive an image sequence sent by a terminal in response to an action sequence, where the image sequence includes multiple frames of images; an action content processing module 120, configured to obtain, in order, one action content in the action sequence as the current action content and perform the following operations: determining a starting image corresponding to the current action content, and sequentially determining the action score between the current action content and at least one image after the starting image in the image sequence; determining the matching result corresponding to the current action content according to the action score of any image; and determining the matching result between the image sequence and the action sequence according to the matching results corresponding to all action contents in the action sequence; and a detection result generating module 130, configured to generate a detection result based on the matching result between the image sequence and the action sequence.
In a possible implementation, determining the starting image corresponding to the current action content includes: when the current action content is determined to be the first action content in the action sequence, the starting image is the starting image of the image sequence; and when the current action content is determined not to be the first action content in the action sequence, the starting image is the frame following the image that was successfully matched with the previous action content.
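The starting-image rule can be sketched as a loop over the action contents. In this illustration, `frame_scores[i][k]` is assumed to hold the action score of frame `i` for the `k`-th action content (in practice the score would come from the corresponding model), and the function name is hypothetical.

```python
from typing import List, Optional, Sequence


def match_action_sequence(frame_scores: Sequence[Sequence[float]],
                          first_thresholds: Sequence[float]) -> Optional[List[int]]:
    """Return the index of the matching frame for each action, or None on failure."""
    start = 0  # the first action content starts at the first frame
    matched_at: List[int] = []
    for k, threshold in enumerate(first_thresholds):
        hit = next(
            (i for i in range(start, len(frame_scores)) if frame_scores[i][k] > threshold),
            None,
        )
        if hit is None:
            return None  # no frame after the starting image matched this action
        matched_at.append(hit)
        start = hit + 1  # the next action content starts at the following frame
    return matched_at


SCORES = [[20, 0], [40, 0], [60, 0], [80, 10], [0, 30], [0, 45], [0, 70], [0, 80]]
result = match_action_sequence(SCORES, [65, 75])
```

With these scores and first score thresholds of 65 and 75, the first action content matches at frame index 3 and the second at frame index 7, so `result` is `[3, 7]`.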
In a possible implementation, determining the matching result between the image sequence and the action sequence includes: determining that the matching result between the image sequence and the action sequence is a matching failure if no successful matching result for the action sequence is obtained within a first preset time from the start of determining the matching result of the action sequence, and/or if no successful matching result for an action content is obtained within a second preset time from the start of determining the matching result of that action content.
In a possible implementation, generating the detection result based on the matching result between the image sequence and the action sequence includes: when the matching result between the image sequence and the action sequence is determined to be a successful match, selecting a first image from the image sequence; generating a liveness detection result based on the first image; and determining the detection result based on the liveness detection result, wherein the detection result is a pass when the liveness detection result is determined to be a living body.
In a possible implementation, selecting the first image from the image sequence includes: selecting, from the image sequence, a preset number of first images whose action scores are greater than or equal to a second score threshold.
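The selection step can be sketched as a filter followed by a cut to the preset number. The text does not say which frames to keep when more than the preset number qualify, so this sketch assumes the highest-scoring ones are kept; `select_first_images` and its parameter names are hypothetical.

```python
from typing import List, Sequence


def select_first_images(
    action_scores: Sequence[float],
    second_score_threshold: float,
    preset_count: int,
) -> List[int]:
    """Return indices of up to preset_count frames scoring at or above the threshold."""
    eligible = [i for i, s in enumerate(action_scores) if s >= second_score_threshold]
    # Assumption: prefer the highest action scores; the sort is stable,
    # so frames with equal scores keep their original order.
    eligible.sort(key=lambda i: action_scores[i], reverse=True)
    return eligible[:preset_count]
```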
In a possible implementation, generating the liveness detection result based on the first image includes: generating, based on the first images, a liveness detection sub-result corresponding to each first image; taking the first image with the highest action score as a second image; and determining that the liveness detection result is a living body when the liveness detection sub-result corresponding to the second image is a living body and the ratio of the number of first images whose liveness detection sub-result is a living body to the total number of first images is greater than or equal to a preset ratio.
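The aggregation rule combines two conditions: the highest-scoring first image (the second image) must be live, and the live fraction of all first images must reach the preset ratio. A minimal sketch follows, assuming the per-image sub-results are already available as booleans; the function and parameter names are hypothetical.

```python
from typing import Sequence


def liveness_result(
    action_scores: Sequence[float],
    sub_results: Sequence[bool],
    preset_ratio: float,
) -> bool:
    """Aggregate per-image liveness sub-results into one liveness decision."""
    if not action_scores or len(action_scores) != len(sub_results):
        raise ValueError("need one sub-result per first image, and at least one image")
    # The second image is the first image with the highest action score.
    second = max(range(len(action_scores)), key=lambda i: action_scores[i])
    # sum() over booleans counts the images judged to be a living body.
    live_ratio = sum(sub_results) / len(sub_results)
    return bool(sub_results[second]) and live_ratio >= preset_ratio
```

Requiring both conditions means that a majority of live frames cannot outvote a non-live result on the frame that best shows the requested action.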
In a possible implementation, receiving the image sequence sent by the terminal in response to the action sequence includes: decrypting the image sequence sent by the terminal in response to the action sequence to obtain a decrypted image sequence. Sequentially determining the action score of at least one image after the starting image in the image sequence with respect to the current action content includes: sequentially determining the action score of at least one image after the starting image in the decrypted image sequence with respect to the current action content.
In a possible implementation, determining the matching result between the image sequence and the action sequence further includes: generating at least one of face region coordinates and a face number corresponding to the images in the image sequence; and determining the matching result according to at least one of the face region coordinates and the face number, together with the image sequence and the action sequence.
In a possible implementation, determining the matching result according to at least one of the face region coordinates and the face number, together with the image sequence and the action sequence, includes: determining, as the matching result between the image sequence and the action sequence, the matching result between the action sequence and the face region indicated by the face region coordinates in at least one image of the image sequence.
In a possible implementation, determining the matching result according to at least one of the face region coordinates and the face number, together with the image sequence and the action sequence, further includes: determining that the matching result is a matching failure when the number of images corresponding to the face number that appears in the fewest images of the image sequence is greater than a first threshold.
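The face-number check can be sketched with a frequency count. This is an illustration only: `fails_face_number_check` is a hypothetical name, the input is assumed to be one face number per detected face across the sequence, and the sketch further assumes the rule applies only when more than one distinct face number is present, since the text does not address the single-face case.

```python
from collections import Counter
from typing import Hashable, Sequence


def fails_face_number_check(face_numbers: Sequence[Hashable], first_threshold: int) -> bool:
    """Return True if even the rarest face number appears in too many images."""
    counts = Counter(face_numbers)
    if len(counts) < 2:
        # Assumption (not stated in the text): with a single face number
        # there is no second face to object to, so the check passes.
        return False
    rarest_count = min(counts.values())
    return rarest_count > first_threshold
```

Under this reading, a brief spurious detection stays below the threshold, while a second face that persists across many frames causes a matching failure.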
In a possible implementation, the detection apparatus is further configured to perform at least one of the following: when the detection result is determined to be a failure, sending a first instruction to the terminal, the first instruction controlling the terminal to enter a page for resending a detection request; when the number of times the first instruction has been sent to the terminal within a third preset time is determined to have reached a second threshold, sending a second instruction to the terminal in response to a detection request sent by the terminal through the page, the second instruction notifying the terminal that the server rejects the detection request; and when the time from sending the first instruction to receiving a new image sequence is determined to be greater than a fourth preset time, sending the second instruction to the terminal.
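The retry and rejection rules amount to a sliding-window counter plus a resubmission deadline. The sketch below tracks one terminal; the class `RetryPolicy`, its method names, and the use of caller-supplied timestamps in seconds are all assumptions made for illustration.

```python
from collections import deque
from typing import Deque, Optional


class RetryPolicy:
    """Tracks first instructions sent to one terminal (hypothetical helper)."""

    def __init__(self, third_preset_s: float, second_threshold: int, fourth_preset_s: float):
        self.third_preset_s = third_preset_s
        self.second_threshold = second_threshold
        self.fourth_preset_s = fourth_preset_s
        self._sent: Deque[float] = deque()  # send times inside the sliding window
        self._last_sent: Optional[float] = None

    def record_first_instruction(self, now: float) -> None:
        """Call when a first instruction (retry page) is sent after a failed detection."""
        self._sent.append(now)
        self._last_sent = now

    def should_reject_request(self, now: float) -> bool:
        """True if the retry count within the third preset time reached the threshold."""
        while self._sent and now - self._sent[0] > self.third_preset_s:
            self._sent.popleft()  # drop sends that fell out of the window
        return len(self._sent) >= self.second_threshold

    def resubmission_too_late(self, now: float) -> bool:
        """True if the new image sequence arrived after the fourth preset time."""
        return self._last_sent is not None and now - self._last_sent > self.fourth_preset_s
```

When either method returns `True`, the server would send the second instruction, which tells the terminal that the detection request is rejected.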
In some embodiments, the functions of, or the modules included in, the apparatus provided by the embodiments of the present disclosure may be used to execute the methods described in the method embodiments above. For the specific implementation, refer to the description of the method embodiments above; for brevity, details are not repeated here.
An embodiment of the present disclosure further provides a computer-readable storage medium storing computer program instructions which, when executed by a processor, implement the above method. The computer-readable storage medium may be a volatile or non-volatile computer-readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to invoke the instructions stored in the memory to execute the above method.
An embodiment of the present disclosure further provides a computer program product, including computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code; when the computer-readable code runs in a processor of an electronic device, the processor in the electronic device executes the above method.
FIG. 4 shows a block diagram of an electronic device 1900 according to an embodiment of the present disclosure. For example, the electronic device 1900 may be provided as a server. Referring to FIG. 4, the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by a memory 1932 for storing instructions executable by the processing component 1922, such as application programs. The application programs stored in the memory 1932 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 1922 is configured to execute the instructions to perform the above method.
The electronic device 1900 may further include a power supply component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as Microsoft's server operating system (Windows Server™), Apple's graphical-user-interface-based operating system (Mac OS X™), a multi-user, multi-process computer operating system (Unix™), a free and open-source Unix-like operating system (Linux™), an open-source Unix-like operating system (FreeBSD™), or the like.
In an exemplary embodiment, a non-volatile computer-readable storage medium is further provided, such as the memory 1932 including computer program instructions, which are executable by the processing component 1922 of the electronic device 1900 to complete the above method.
The above electronic device may be provided as a terminal, a server, or a device in another form.
The present disclosure may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement various aspects of the present disclosure.
The computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or a raised structure in a groove on which instructions are stored, and any suitable combination of the foregoing. The computer-readable storage medium as used herein is not to be construed as a transient signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, a light pulse passing through a fiber-optic cable), or an electrical signal transmitted through a wire.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to respective computing/processing devices, or downloaded to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for performing the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on a user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), is personalized by using state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions to implement various aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowcharts and/or block diagrams of methods, apparatuses (systems), and computer program products according to embodiments of the present disclosure. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium; the instructions cause a computer, a programmable data processing apparatus, and/or other devices to operate in a specific manner, so that the computer-readable medium storing the instructions includes an article of manufacture that includes instructions implementing various aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus, or another device, so that a series of operational steps is performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus, or other device implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to multiple embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of instructions, which contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or acts, or by a combination of dedicated hardware and computer instructions.
The computer program product may be implemented by hardware, software, or a combination thereof. In one optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK).
The embodiments of the present disclosure have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical applications, or their improvements over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (15)

  1. A detection method, applied to a server, wherein the detection method comprises:
    receiving an image sequence sent by a terminal in response to an action sequence, the image sequence comprising multiple frames of images;
    sequentially obtaining one action content in the action sequence as current action content, and performing the following operations: determining a starting image corresponding to the current action content, and sequentially determining an action score of at least one image after the starting image in the image sequence with respect to the current action content; determining a matching result corresponding to the current action content according to the action score of any image; and determining a matching result between the image sequence and the action sequence according to the matching results corresponding to all action contents in the action sequence; and
    generating a detection result based on the matching result between the image sequence and the action sequence.
  2. The detection method according to claim 1, wherein determining the starting image corresponding to the current action content comprises: when the current action content is determined to be the first action content in the action sequence, the starting image is the starting image of the image sequence; and when the current action content is determined not to be the first action content in the action sequence, the starting image is the frame immediately following the image that was successfully matched with the previous action content.
  3. The detection method according to claim 1 or 2, wherein determining the matching result between the image sequence and the action sequence comprises:
    determining that the matching result between the image sequence and the action sequence is a matching failure in the case where no successful matching result for the action sequence is obtained within a first preset time from the start of determining the matching result of the action sequence, and/or in the case where no successful matching result for an action content is obtained within a second preset time from the start of determining the matching result of any action content in the action sequence.
  4. The detection method according to any one of claims 1 to 3, wherein generating the detection result based on the matching result between the image sequence and the action sequence comprises:
    when the matching result between the image sequence and the action sequence is determined to be a successful match, selecting a first image from the image sequence;
    generating a liveness detection result based on the first image; and
    determining the detection result based on the liveness detection result, wherein the detection result is a pass when the liveness detection result is determined to be a living body.
  5. The detection method according to claim 4, wherein selecting the first image from the image sequence comprises:
    selecting, from the image sequence, a preset number of first images whose action scores are greater than or equal to a second score threshold.
  6. The detection method according to claim 4 or 5, wherein generating the liveness detection result based on the first image comprises:
    generating, based on the first image, a liveness detection sub-result corresponding to the first image;
    taking the first image with the highest action score as a second image; and
    determining that the liveness detection result is a living body when the liveness detection sub-result corresponding to the second image is a living body and the ratio of the number of first images whose liveness detection sub-result is a living body to the total number of first images is greater than or equal to a preset ratio.
  7. The detection method according to any one of claims 1 to 6, wherein receiving the image sequence sent by the terminal in response to the action sequence comprises:
    decrypting the image sequence sent by the terminal in response to the action sequence to obtain a decrypted image sequence;
    and wherein sequentially determining the action score of at least one image after the starting image in the image sequence with respect to the current action content comprises: sequentially determining the action score of at least one image after the starting image in the decrypted image sequence with respect to the current action content.
  8. The detection method according to any one of claims 1 to 7, wherein determining the matching result between the image sequence and the action sequence further comprises:
    generating at least one of face region coordinates and a face number corresponding to the images in the image sequence; and
    determining the matching result according to at least one of the face region coordinates and the face number, together with the image sequence and the action sequence.
  9. The detection method according to claim 8, wherein determining the matching result according to at least one of the face region coordinates and the face number, together with the image sequence and the action sequence, comprises:
    determining, as the matching result between the image sequence and the action sequence, the matching result between the action sequence and the face region indicated by the face region coordinates in at least one image of the image sequence.
  10. The detection method according to claim 8 or 9, wherein determining the matching result according to at least one of the face region coordinates and the face number, together with the image sequence and the action sequence, further comprises:
    determining that the matching result is a matching failure when the number of images corresponding to the face number that appears in the fewest images of the image sequence is greater than a first threshold.
  11. The detection method according to any one of claims 1 to 10, wherein the detection method further comprises at least one of the following:
    when the detection result is determined to be a failure, sending a first instruction to the terminal, the first instruction controlling the terminal to enter a page for resending a detection request;
    when the number of times the first instruction has been sent to the terminal within a third preset time is determined to have reached a second threshold, sending a second instruction to the terminal in response to a detection request sent by the terminal through the page, the second instruction notifying the terminal that the server rejects the detection request; and
    when the time from sending the first instruction to receiving a new image sequence is determined to be greater than a fourth preset time, sending the second instruction to the terminal.
  12. A face detection apparatus, applied to a server, wherein the detection apparatus comprises:
    an image sequence receiving module, configured to receive an image sequence sent by a terminal in response to an action sequence, the image sequence comprising multiple frames of images;
    an action content processing module, configured to sequentially obtain one action content in the action sequence as current action content and perform the following operations: determining a starting image corresponding to the current action content, and sequentially determining an action score of at least one image after the starting image in the image sequence with respect to the current action content; determining a matching result corresponding to the current action content according to the action score of any image; and determining a matching result between the image sequence and the action sequence according to the matching results corresponding to all action contents in the action sequence; and
    a detection result generation module, configured to generate a detection result based on the matching result between the image sequence and the action sequence.
  13. An electronic device, comprising:
    a processor; and
    a memory for storing processor-executable instructions;
    wherein the processor is configured to invoke the instructions stored in the memory to execute the detection method according to any one of claims 1 to 11.
  14. A computer-readable storage medium storing computer program instructions, wherein the computer program instructions, when executed by a processor, implement the detection method according to any one of claims 1 to 11.
  15. A computer program product, comprising computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code, wherein when the computer-readable code runs in a processor of an electronic device, the processor in the electronic device executes instructions for implementing the detection method according to any one of claims 1 to 11.
PCT/CN2022/114904 2022-03-17 2022-08-25 Detection method and apparatus, electronic device, and storage medium WO2023173686A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210265003.2 2022-03-17
CN202210265003.2A CN114612986A (en) 2022-03-17 2022-03-17 Detection method, detection device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2023173686A1 true WO2023173686A1 (en) 2023-09-21

Family

ID=81865128

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/114904 WO2023173686A1 (en) 2022-03-17 2022-08-25 Detection method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN114612986A (en)
WO (1) WO2023173686A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117113260A (en) * 2023-10-19 2023-11-24 深圳市磐锋精密技术有限公司 Intelligent laminating equipment fault early warning system based on data analysis

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114612986A (en) * 2022-03-17 2022-06-10 北京市商汤科技开发有限公司 Detection method, detection device, electronic equipment and storage medium
CN116912947B (en) * 2023-08-25 2024-03-12 东莞市触美电子科技有限公司 Intelligent screen, screen control method, device, equipment and storage medium thereof

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111126214A (en) * 2019-12-13 2020-05-08 北京旷视科技有限公司 Living body detection method and apparatus, computer device, and computer-readable storage medium
CN112950801A (en) * 2021-02-08 2021-06-11 中国联合网络通信集团有限公司 Remote office attendance recording method and device
CN113971841A (en) * 2021-10-28 2022-01-25 北京市商汤科技开发有限公司 Living body detection method and device, computer equipment and storage medium
CN114612986A (en) * 2022-03-17 2022-06-10 北京市商汤科技开发有限公司 Detection method, detection device, electronic equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117113260A (en) * 2023-10-19 2023-11-24 深圳市磐锋精密技术有限公司 Intelligent laminating equipment fault early warning system based on data analysis
CN117113260B (en) * 2023-10-19 2024-01-30 深圳市磐锋精密技术有限公司 Intelligent laminating equipment fault early warning system based on data analysis

Also Published As

Publication number Publication date
CN114612986A (en) 2022-06-10

Similar Documents

Publication Publication Date Title
WO2023173686A1 (en) Detection method and apparatus, electronic device, and storage medium
US11551482B2 (en) Facial recognition-based authentication
US20200097643A1 (en) rtCaptcha: A Real-Time Captcha Based Liveness Detection System
US10339402B2 (en) Method and apparatus for liveness detection
CN108804884B (en) Identity authentication method, identity authentication device and computer storage medium
US10275672B2 (en) Method and apparatus for authenticating liveness face, and computer program product thereof
US10268910B1 (en) Authentication based on heartbeat detection and facial recognition in video data
US9684779B2 (en) Secure face authentication with liveness detection for mobile
JP2022532677A (en) Identity verification and management system
TW201907330A (en) Method, device, device and data processing method for identity authentication
US8970348B1 (en) Using sequences of facial gestures to authenticate users
WO2020019591A1 (en) Method and device used for generating information
JP7006584B2 (en) Biometric data processing device, biometric data processing system, biometric data processing method, biometric data processing program, storage medium for storing biometric data processing program
Smith-Creasey et al. Continuous face authentication scheme for mobile devices with tracking and liveness detection
US11017253B2 (en) Liveness detection method and apparatus, and storage medium
TW201504839A (en) Portable electronic apparatus and interactive human face login method
JP2022105583A (en) Face living body detection method and device, electronic equipment, storage medium, and computer program
WO2022099989A1 (en) Liveness identification and access control device control methods, apparatus, electronic device, storage medium, and computer program
CN109005104A (en) A kind of instant communicating method, device, server and storage medium
US20230306792A1 (en) Spoof Detection Based on Challenge Response Analysis
US11985128B2 (en) Device step-up authentication system
WO2020007191A1 (en) Method and apparatus for living body recognition and detection, and medium and electronic device
WO2022111688A1 (en) Face liveness detection method and apparatus, and storage medium
CN113518061B (en) Data transmission method, equipment, device, system and medium in face recognition
McQuillan Is lip-reading the secret to security?

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22931708

Country of ref document: EP

Kind code of ref document: A1