CN114220051B - Video processing method, application program testing method and electronic equipment - Google Patents


Info

Publication number
CN114220051B
CN114220051B (application CN202111507922.8A)
Authority
CN
China
Prior art keywords
video
transformation
face
target
played
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111507922.8A
Other languages
Chinese (zh)
Other versions
CN114220051A (en)
Inventor
王淳
冯晟
黎彬
赵幸福
曾定衡
周迅溢
王洪斌
蒋宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mashang Consumer Finance Co Ltd
Original Assignee
Mashang Consumer Finance Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mashang Consumer Finance Co Ltd filed Critical Mashang Consumer Finance Co Ltd
Priority to CN202111507922.8A priority Critical patent/CN114220051B/en
Publication of CN114220051A publication Critical patent/CN114220051A/en
Application granted granted Critical
Publication of CN114220051B publication Critical patent/CN114220051B/en
Current legal status: Active


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 - Geometric image transformations in the plane of the image
    • G06T 3/02 - Affine transformations
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 - Geometric image transformations in the plane of the image
    • G06T 3/40 - Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4038 - Image mosaicing, e.g. composing plane images from plane sub-images
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00 - Details of television systems
    • H04N 5/222 - Studio circuitry; Studio devices; Studio equipment
    • H04N 5/262 - Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N 5/2624 - Studio circuits for obtaining an image which is composed of whole input images, e.g. splitscreen

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The embodiment of the application discloses a video processing method, an application program testing method, and an electronic device. The video processing method comprises the following steps: acquiring multiple segments of living-body-based video material; processing the video material based on a target face to obtain a first transformation video and a second transformation video of the target face; generating a transition video between the first transformation video and the second transformation video; and splicing the first transformation video, the second transformation video, and the transition video to obtain a target video. By transforming the multiple segments of video material, the scheme generates a large number of transformation videos; joining any two transformation videos through a transition video yields richer target videos and avoids the defects that arise at the joint between two transformation videos, so that sufficiently rich video material is available for more thorough testing.

Description

Video processing method, application program testing method and electronic equipment
Technical Field
The present disclosure relates to the field of video processing technologies, and in particular, to a video processing method, an application program testing method, and an electronic device.
Background
With the continuous development of artificial intelligence, face recognition technology has been applied across many industries. While face recognition brings great convenience, many face-changing attacks (i.e., non-living faces, such as 3D prosthetic faces) have appeared, seriously threatening the security of face recognition. To improve this security, face verification programs must be strictly tested, which requires a large number of transformed face videos; however, existing transformed-face video materials are relatively monotonous, and such limited material is far from sufficient for thorough testing.
Disclosure of Invention
In view of the above, the present application proposes a video processing method, an application program testing method, and an electronic device to address these problems.
In a first aspect, the present application provides a video processing method, including: acquiring a living body-based multi-section video material; processing the multi-section video material based on a target face to obtain a first transformation video and a second transformation video of the target face; generating a transition video between the first transformed video and the second transformed video for the first transformed video and the second transformed video; and splicing the first transformation video, the second transformation video and the transition video to obtain a target video.
In a second aspect, the present application provides a method for testing an application program, including: acquiring audio and video information of test equipment; acquiring a video to be played according to the audio and video information; wherein the video to be played is generated according to the method of the first aspect; collecting a face verification result output by the test equipment; the face verification result is a verification result output by the testing equipment in response to the video to be played; and testing the application program on the testing equipment according to the face verification result.
In a third aspect, the present application provides a video processing apparatus, including: an acquisition unit configured to acquire a plurality of pieces of video material based on a living body; the processing unit is used for processing the multi-section video material based on a target face to obtain a first transformation video and a second transformation video of the target face; a generation unit configured to generate, for the first and second transformed videos, a transition video between the first and second transformed videos; and the splicing unit is used for splicing the first transformation video, the second transformation video and the transition video to obtain a target video.
In a fourth aspect, the present application provides a test apparatus comprising: the first acquisition unit is used for acquiring audio and video information of the test equipment; the second acquisition unit is used for acquiring the video to be played according to the audio and video information; wherein the video to be played is generated according to the method of the first aspect; the acquisition unit is used for acquiring a face verification result output by the test equipment; the face verification result is a verification result output by the testing equipment in response to the video to be played; and the testing unit is used for testing the application program on the testing equipment according to the face verification result.
In a fifth aspect, the present application provides an electronic device comprising a processor and a memory; one or more programs are stored in the memory and configured to be executed by the processor to implement the methods described above.
In a sixth aspect, the present application provides a computer readable storage medium having program code stored therein, wherein the program code, when executed by a processor, performs the method described above.
In a seventh aspect, the present application provides a test system comprising a main control device, a test device, and an adjustment device. The main control device is configured to perform the methods described in the first and second aspects; the test device is configured to perform face verification on the acquired video to be played and output a face verification result; the adjustment device is used to adjust the current position and the current angle of the test device.
According to the above scheme, a large number of transformation videos are generated by transforming multiple segments of video material; any two transformation videos are joined through a transition video, yielding richer target videos while avoiding the defects that arise at the joint between two transformation videos, so that sufficiently rich video material is available for more thorough testing.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 illustrates a schematic diagram of an application environment in accordance with embodiments of the present application;
FIG. 2 illustrates a schematic diagram of another application environment in accordance with embodiments of the present application;
FIG. 3 is a flow chart illustrating a video processing method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a video transformation according to an embodiment of the present application;
FIG. 5 is a schematic diagram of another video transformation according to an embodiment of the present application;
FIG. 6 is a schematic diagram of transitional video generation according to an embodiment of the present application;
FIG. 7 is a flow chart illustrating another video processing method according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a video affine transformation according to an embodiment of the present application;
FIG. 9 is a flowchart of a method for testing an application according to an embodiment of the present application;
FIG. 10 is a schematic diagram of an interface of a test apparatus according to an embodiment of the present application;
fig. 11 is a block diagram showing a configuration of a video processing apparatus according to an embodiment of the present application;
FIG. 12 is a block diagram showing a test apparatus according to an embodiment of the present application;
fig. 13 shows a block diagram of a configuration of an electronic device for executing a test method of a video processing method or an application program according to an embodiment of the present application;
fig. 14 illustrates a storage unit for storing or carrying program codes for implementing a test method of a video processing method or an application program according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
Artificial Intelligence (AI) is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in ways similar to human intelligence. Artificial intelligence research covers the design principles and implementation methods of various intelligent machines, giving machines the abilities of perception, reasoning, and decision-making.
Artificial intelligence is a comprehensive discipline spanning a wide range of fields, covering both hardware-level and software-level technologies. Basic AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. AI software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
With the development of computer vision within artificial intelligence, many methods for transforming face images or face videos have appeared, such as deepfake methods, which use machine learning, especially deep learning, to transform images or videos with strong realism. When used maliciously, they cause great harm, for example swapping a celebrity's face into videos in which they never appeared, or editing a politician's speech video into words never actually spoken. Such impersonations of face identity have troubled many people and also pose challenges to risk control.
To improve the security of face recognition, the face verification program needs to be strictly tested, which requires a large number of transformed face videos; however, existing transformed-face video materials are relatively monotonous, and such limited material is far from sufficient for thorough testing. In addition, obtaining a larger quantity of richer test video material by re-recording the whole live video and re-producing the test material for every test is too costly, and since the combination of interactions and digits used by the tested program is not known before the test, such material cannot support automated testing.
To address these problems, this application provides a video processing method and an application program testing method. In these methods, a large number of transformation videos are generated by transforming multiple segments of video material; any two transformation videos are joined through a generated transition video, yielding richer target videos, avoiding the defects that occur when two transformation videos are joined directly, and improving the fidelity of the video material, so that sufficiently rich video material is available for more thorough testing. Furthermore, diversified playing modes make it possible to flexibly cope with the interactions and digit combinations of the tested program, enabling automated testing and reducing both the cost of producing test video material and the cost of automated testing.
Before describing the embodiments of the present application in further detail, the application environment is first described.
Referring to fig. 1, fig. 1 is a schematic diagram of an application environment according to an embodiment of the present application. The test system 100 shown in fig. 1 includes a main control device 110, a test device 120, and an adjustment device 130. The test device 120 is provided with a face verification program that performs face verification on the acquired video to be played and outputs a face verification result; the adjustment device 130 is used to adjust the current position and current angle of the test device. The main control device 110 is configured to analyze the face verification result output by the test device, so as to test the face verification program installed on the test device. The video to be played acquired by the test device 120 may be a transformed face video generated by the main control device 110, a face video of a real person, or a face video of a non-living body (such as a 3D face mask).
Optionally, the test system 100 shown in fig. 1 may further include a device for presenting the video to be played, such as a display screen, a projection curtain, or an AR (Augmented Reality) device (not shown in fig. 1); the display screen presents the video to be played (i.e., the face video), which is a transformed face video generated by the main control device 110.
Optionally, the test system 100 shown in fig. 1 may further include a camera for acquiring interface information of the test device, or acquiring face video of a real person, or acquiring face video of a non-living body, and a microphone (not shown in fig. 1) for acquiring audio information of the test device.
The test system 100 may be any combination of the above devices according to actual test requirements, and may further include other devices such as a controller; the embodiments of the present application are not limited in this respect.
It should be noted that, the face-changing video generated by the main control device 110 in fig. 1 may be used for performing the face anti-counterfeit test in a mode of playing on a high-definition display screen, or may be used for performing the face anti-counterfeit test in a mode of hijacking the camera data stream to inject.
It should be noted that fig. 1 is an exemplary application environment, and the method provided in the embodiment of the present application may also be run in other application environments.
As shown in fig. 2, fig. 2 is a schematic diagram of another application environment according to an embodiment of the present application, comprising a main control device 110, a test device 120, an adjustment device 130, a display screen 140, a camera 150, and a microphone 160, where the test device 120 sits on the pan-tilt head of the adjustment device 130. This application environment is mainly used for face anti-counterfeiting tests. During a test, the camera 150 and the microphone 160 acquire the interface information and audio information of the test device 120 and transmit them to the main control device 110; the main control device 110 generates a face video meeting the requirements according to this information and puts it on the display screen 140; the test device 120 captures the face video on the display screen 140 and verifies the face in it to obtain a face verification result; the main control device 110 then tests the face verification program on the test device 120 according to the face verification result.
Optionally, the camera 150 may further collect an interface image of the test device, and determine, according to the interface image, whether the current position and the current angle of the test device meet the test requirement (for example, whether a complete face is shot); if the current position and the current angle of the test equipment do not meet the test requirements, sending an adjustment instruction to the adjustment equipment 130, wherein the adjustment equipment 130 adjusts the position and the angle of the test equipment 120 according to the adjustment instruction; or, the adjusting device 130 adjusts the display position of the video to be played on the display screen 140, so that the testing device 120 can shoot the face video meeting the requirement.
Specifically, the adjustment device 130 may move the pan-tilt head along an X-Y-Z track to adjust the distance between the test device and the target user, and may change the pitch angle and/or yaw angle and/or roll angle of the test device according to instructions from the host computer to adjust the shooting angle between the test device and the target user.
It should be noted that the main control device 110 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), big data, and artificial intelligence platforms. The electronic device on which the test device 120 runs may be, but is not limited to, a smartphone, tablet computer, notebook computer, desktop computer, smart speaker, or smart watch. The display 140, camera 150, and microphone 160 may be separate devices or integrated into other devices; the embodiment of the present application is not limited in this respect.
The test system provided in fig. 1 and 2 is capable of simulating a transformation scenario including, but not limited to:
a) The electronic screen plays the pre-shot face image and/or video;
b) Hijacking a camera data stream to inject a pre-shot face image and/or video;
c) The electronic screen plays the transformed face image and/or video;
d) Hijacking the camera data stream to inject the transformed face image and/or video.
The transformed face images and/or videos may be produced by traditional transformation methods such as PS editing, mixed cutting, and pre-shot video, or by methods based on deep transformation algorithms. The embodiments of the present application are not limited in this regard.
Embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Referring to fig. 3, fig. 3 is a flowchart illustrating a video processing method according to an embodiment of the present application, where the method is applied to a master device, and the method includes:
s110, acquiring a plurality of segments of video materials based on living bodies.
In a common face-authentication-related system, the user is often required to perform some related actions, such as nodding, shaking the head, blinking, opening the mouth, or even reading some numbers, or combinations of these; technologies such as face key point positioning and face tracking are then used to verify whether a real living body is operating. A living body, i.e., a real human body, is an entity as opposed to a non-living body such as a 3D prosthetic face.
In the embodiment of the application, these related actions can be decomposed into unit actions, and a corresponding action video is recorded for each; such videos are called video materials. Video material may include, but is not limited to: silence, nodding, head shaking, mouth opening, blinking, reading the digits 0-9, etc.
The video materials can be obtained by continuously shooting a whole video and then segmenting the whole video, or can be obtained by independently shooting each basic action, or can be obtained by combining two modes, and the embodiment of the application is not limited.
Namely, a plurality of sections of video materials can be obtained by cutting a section of continuously shot video; the method can also obtain a plurality of sections of video materials through shooting each basic action independently and shooting for a plurality of times; the multi-section video material can also be obtained by combining the two modes.
When a real person shoots the basic videos, certain requirements need to be met, such as uniform lighting, a clear picture, and high resolution; female subjects should not have bangs covering the face. The initial state of each action is the silent state, and after the action finishes the subject returns to the silent state, i.e., the head stays as still as possible and the mouth is closed. If a whole video is recorded, the silent state is kept between actions, each action is separated by at least three seconds, and the subject returns to the silent state after each action is completed.
The video material may comprise one or more sets of male material and/or one or more sets of female material; the embodiment of the application is not limited in this respect, but different sets of material are not mixed.
Further, after shooting is completed, the shot video needs to be clipped so that the length of the silent segment before and after each action is appropriate. In addition, the positions of the front and rear silent segments need to be marked on the clipped basic video to facilitate subsequent splicing.
According to the above requirements, each segment of video material obtained by the main control device consists of three sub-segments arranged in fixed temporal order: a front preset segment (i.e., the front silent segment), an action segment, and a rear preset segment (i.e., the rear silent segment). The front and rear preset segments contain no action information: the front preset segment is the static video segment before the action starts, the rear preset segment is the static video segment after the action ends, and the action segment is the video segment in motion. The length of the front preset segment is matched with that of the rear preset segment; the length-matched front and rear preset segments are used for video splicing, while the action segment is used for face authentication.
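As an illustration of this three-segment structure, the following minimal Python sketch shows one way such annotated material could be represented; the class and field names are assumptions for illustration, not structures defined by this application.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class VideoMaterial:
    frames: List            # decoded frames in temporal order
    pre_silence_end: int    # index of the last frame of the front silent segment
    action_end: int         # index of the last frame of the action segment

    def front_preset_segment(self) -> List:
        # static segment before the action starts; used for splicing
        return self.frames[:self.pre_silence_end + 1]

    def action_segment(self) -> List:
        # segment containing the motion; used for face authentication
        return self.frames[self.pre_silence_end + 1:self.action_end + 1]

    def rear_preset_segment(self) -> List:
        # static segment after the action ends; used for splicing
        return self.frames[self.action_end + 1:]
```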
And S120, processing the multi-segment video material based on the target face to obtain a first transformation video and a second transformation video of the target face.
In the prior art, since it is not known which interaction and which digit combination the tested application program will use, obtaining a large quantity of rich counterfeit videos for each test would require re-recording the whole live video and re-producing the counterfeit videos every time; produced this way, the cost is too high for automated testing.
In the application, the multi-section video material is a pre-photographed multi-section basic video material, and includes various basic actions such as shaking head, blinking, and opening mouth. And processing the multi-section video material based on the target face to obtain a multi-section transformation basic video, wherein the multi-section transformation basic video also comprises various basic actions. Therefore, a large amount of rich change videos are obtained, the manufacturing cost of the forged videos is reduced, meanwhile, various interactive videos required by testing can be obtained when random combination is carried out later, and interaction of the tested application programs is flexibly dealt with.
The multi-section transformation basic video comprises a first transformation video and a second transformation video, and the first transformation video and the second transformation video can be combinations of basic videos which are needed to be interacted by an application program in the subsequent test.
In the embodiment of the application, the generation of the transformation video can be performed in various manners, such as face-changing type depth transformation video and driving type depth transformation video.
As an optional implementation manner, processing the multiple segments of real basic videos based on the target face to obtain multiple segments of transformation basic videos of the target face includes:
replacing, for each segment of the video material, the face in the video material with the target face to obtain a transformation basic video of the target face; or,
extracting, for each segment of the video material, the motion features of the face in the video material and fusing the motion features onto the target face to obtain a transformation basic video of the target face.
The above manner of replacing the face in the video material with the target face to obtain the transformation basic video may be referred to as face-changing deep transformation video. By swapping the target user's face into another user's video material, a video of the target face is obtained, thereby passing authentication. The motion of the original basic video is still seen; only the face has become the target face, while the background and everything else remain from the original basic video. The face-changing algorithm may be FSGAN, FaceSwap, etc.; the embodiment of the present application does not limit the face-changing algorithm used.
The above manner of fusing the motion features onto the target face to obtain the transformation basic video may be referred to as driving deep transformation video. The motion features of the face in the basic video (such as mouth-opening features) are extracted and fused onto a picture of the target face to generate a motion video of the target face (i.e., the transformation basic video). In the motion video of the target face, no information from the original video material (such as its background or face) is visible.
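The two modes can be contrasted with a short sketch. Here swap_model, drive_model, and extract_motion stand in for concrete components (e.g. an FSGAN- or FaceSwap-style swapper and a motion-driven generator); they are assumptions for illustration, not APIs defined by this application.

```python
def transform_material(material_frames, target_face, mode,
                       swap_model, drive_model, extract_motion):
    if mode == "swap":
        # Face-changing deep transformation: the material's motion and
        # background are kept; only the face region becomes the target face.
        return [swap_model(frame, target_face) for frame in material_frames]
    if mode == "drive":
        # Driving deep transformation: motion features (e.g. mouth opening)
        # extracted from each frame drive a render of the target face; nothing
        # of the original material's background or face remains visible.
        return [drive_model(extract_motion(frame), target_face)
                for frame in material_frames]
    raise ValueError(f"unknown transformation mode: {mode}")
```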
For example, referring to fig. 4 and 5, fig. 4 is a schematic diagram illustrating a video transformation according to an embodiment of the present application; fig. 5 shows a schematic diagram of another video transformation according to an embodiment of the present application. Fig. 4 is a face-changing depth conversion video, and fig. 5 is a driving depth conversion video.
S130, generating transition videos between the first transformation video and the second transformation video aiming at the first transformation video and the second transformation video.
The first transformation video and the second transformation video can be any two sections of videos to be spliced in the multi-section transformation basic video, and the any two sections of videos to be spliced can be two sections of videos determined according to test requirements.
In this application, since flaws are unavoidable when segments of the multi-segment transformation basic video are spliced directly, a transition video needs to be inserted between the two spliced transformation videos (the first transformation video and the second transformation video) for more thorough testing.
The first transformation video may include a front preset segment, an action segment, and a rear preset segment, and the second transformation video may include a front preset segment, an action segment, and a rear preset segment. And performing alignment splicing according to the rear preset segment of the first transformation video and the front preset segment of the second transformation video.
Specifically, the generating, for a first transformed video and a second transformed video in the multi-segment transformed base video, a transition video between the first transformed video and the second transformed video includes:
acquiring a first image frame of a first transformation video and a second image frame of a second transformation video aiming at the first transformation video and the second transformation video in the multi-section transformation basic video;
calculating a similarity transformation matrix according to the first image frame and the second image frame;
processing the second transformed video based on the similarity transformation matrix to obtain a third transformed video;
performing frame interpolation processing on the first transformation video and the third transformation video to generate a transition video;
the splicing the first transformation video, the second transformation video and the transition video to obtain a target video includes:
and splicing the first transformation video, the transition video and the third transformation video to obtain a target video.
Specifically, the acquiring the first image frame of the first transformed video and the second image frame of the second transformed video includes:
acquiring a first image frame with a clear face from the rear preset segment of the first transformation video, and
acquiring a second image frame with a clear face from the front preset segment of the second transformation video;
said computing a similarity transformation matrix from said first image frame and said second image frame, comprising:
extracting a first key point of a face in the first image frame and extracting a second key point of the face in the second image frame;
and calculating a similarity transformation matrix between the first image frame and the second image frame according to the first key point and the second key point.
In this embodiment of the application, the first transformation video and the second transformation video may be denoted as video i and video j respectively, and an intermediate video from video i to video j is generated. Considering that video i and video j may have been shot separately, the faces in the two video segments are aligned first. Specifically, a frame with a clear face is selected from the rear silent segment of video i and denoted image i, and a frame with a clear face is selected from the front silent segment of video j and denoted image j; the key points of the faces in image i and image j are extracted, and the similarity transformation matrix T between the two is calculated. Applying the similarity transformation matrix T to image j gives image j', in which the face has a size similar to that of the face in image i and sits at a similar position in the image. Applying the similarity transformation matrix T to each frame of video j gives video j'.
A similarity transformation may be composed of scaling, translation, rotation, and the like.
A transition video ij is generated between image i and image j' using a frame interpolation algorithm. The frame interpolation algorithm may be RIFE (Real-Time Intermediate Flow Estimation) or FLAVR (Flow-Agnostic Video Representations for Fast Frame Interpolation); the embodiment of the present application does not limit the frame interpolation algorithm used.
Finally, video i, the transition video ij, and video j' are spliced together to obtain a coherent, natural video, as sketched below.
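The alignment-and-transition pipeline above can be sketched as follows, assuming OpenCV for the similarity transformation and two assumed helpers: detect_keypoints() for face key points and interpolate() wrapping a frame interpolation model such as RIFE or FLAVR. This is a simplified sketch, not the application's reference implementation.

```python
import cv2
import numpy as np

def align_and_bridge(video_i, video_j, detect_keypoints, interpolate, n_mid=8):
    # For simplicity, take the last frame of video i's rear silent segment and
    # the first frame of video j's front silent segment as the clear-face frames.
    image_i = video_i[-1]
    image_j = video_j[0]

    kp_i = detect_keypoints(image_i).astype(np.float32)  # first key points
    kp_j = detect_keypoints(image_j).astype(np.float32)  # second key points

    # Similarity transformation matrix T (scale + rotation + translation)
    # mapping the face in image j onto the face in image i.
    T, _ = cv2.estimateAffinePartial2D(kp_j, kp_i)

    h, w = image_i.shape[:2]
    # Apply T to every frame of video j, yielding video j'.
    video_j_prime = [cv2.warpAffine(f, T, (w, h)) for f in video_j]

    # Transition video ij: n_mid frames interpolated between image i and
    # the first frame of video j'.
    transition = interpolate(image_i, video_j_prime[0], n_mid)

    # Splice video i, transition ij, and video j' into one coherent video.
    return list(video_i) + list(transition) + video_j_prime
```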
As shown in fig. 6, fig. 6 is a schematic diagram illustrating a transitional video generation according to an embodiment of the present application. The process shown in fig. 6 is to transform the video j, and similarly, may also transform the video i, and may also transform the video i and the video j at the same time, so as to achieve similar effects, which are not described in detail in this application.
And S140, splicing the first transformation video, the second transformation video and the transition video to obtain a target video.
In the method, according to the interaction required by the test, a first transformation video and a second transformation video containing the basic actions of that interaction are selected, and a transition video between them is generated. Splicing the first transformation video, the second transformation video, and the transition video yields the target video, i.e., the interaction video required by the test. This flexibly copes with the interactions of the tested application program, avoids the defects that appear when two transformation videos are spliced directly, enables more thorough testing, and reduces the test cost.
Referring to fig. 7, fig. 7 is a flowchart illustrating another video processing method according to an embodiment of the present application, where the method is applied to a master device, and the method includes:
s710, acquiring a plurality of segments of video materials based on living bodies.
S720, processing the multi-segment video material based on the target face to obtain a first transformation video and a second transformation video of the target face.
S730, generating a transition video between the first transformed video and the second transformed video for the first transformed video and the second transformed video.
And S740, splicing the first transformation video, the second transformation video and the transition video to obtain a target video.
S750, determining playing parameters.
In the embodiment of the application, in order to test more behavior patterns, cover interactive face authentication scenarios, and flexibly cope with the combinations of interactive actions and digits used by the tested program, so as to achieve automated testing, the test material needs to be played in diversified ways.
The playing parameters may include, but are not limited to: basic video play sequence, motion amplitude, play transformation, and video frame rate. The basic video play sequence represents the playing order of multiple videos to be played, for example: silence, digit 3, digit 7, digit 1, silence. The motion amplitude represents the amplitude of the face's action in the video, such as how wide the mouth opens. The play transformation represents the affine transformation applied to the video to be played, such as zooming in, zooming out, or panning. The video frame rate represents the playing speed; the faster the playing speed, the faster the face in the video appears to act.
S760, processing the target video according to the playing parameters to obtain a video to be played.
The target video may be processed according to at least one playing parameter to obtain a video to be played. Such as: and processing the target video according to the basic video playing sequence to obtain a video to be played. Such as: and processing the target video according to the basic video playing sequence and the action amplitude to obtain a video to be played. And, for example: and processing the target video according to the basic video playing sequence, the action amplitude, the playing transformation and the video frame rate to obtain the video to be played.
Optionally, the playing parameters include a basic video playing sequence, and the processing the target video according to the playing parameters to obtain a video to be played includes:
determining at least one target video according to two adjacent basic videos in the basic video play sequence;
and connecting at least one target video according to the playing sequence in the basic video playing sequence to obtain a video to be played.
In this embodiment, the basic video play sequence includes a plurality of basic videos and their playing order. Any two adjacent basic videos can be determined from the sequence and processed by the method of steps S710 to S740 to obtain a target video; if the sequence includes 3 or more basic videos, 2 or more target videos are obtained. Finally, the target videos are connected according to the playing order in the basic video play sequence to obtain the video to be played.
For example, when testing digit-based liveness detection in a face algorithm, the digits 3371 need to be read out according to the interface prompt; the basic video play sequence is then: silence, digit 3, digit 3, digit 7, digit 1, silence. Each pair of adjacent basic videos is processed to generate a target video, and the target videos must be played in this order, as sketched below.
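A hedged sketch of this step: given a play sequence such as silence, 3, 3, 7, 1, silence, adjacent basic videos are bridged and connected in order. make_transition() stands for the alignment-and-transition generation of steps S710-S740 applied to one adjacent pair; it is an assumed helper, and the sketch glosses over the fact that each following clip is the aligned video j' of fig. 6.

```python
def build_playback(play_sequence, base_videos, make_transition):
    # play_sequence: names in play order, e.g. ["silence", "3", "3", "7", "1", "silence"]
    # base_videos: mapping from name to a list of frames
    out = list(base_videos[play_sequence[0]])
    for prev, cur in zip(play_sequence, play_sequence[1:]):
        # bridge each adjacent pair with a generated transition video
        out += list(make_transition(base_videos[prev], base_videos[cur]))
        out += list(base_videos[cur])
    return out
```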
Optionally, the playing parameter includes an action amplitude, and the processing the target video according to the playing parameter to obtain a video to be played includes:
and deleting at least one frame of action image with action amplitude larger than a preset amplitude threshold value from the basic video aiming at each section of basic video in the target video, and filling the deleted position by adopting the frame inserting image to obtain the video to be played.
In this embodiment, in order to enrich the test material, the motion amplitudes of different users may be simulated by extracting image frames with large motion amplitudes.
For example, for a head-shaking video among the basic videos, the face pose can be computed from face key points, one or more frames whose face pose exceeds a certain random angle can be selectively removed, and the removed positions filled by a frame interpolation algorithm to obtain the video to be played. Likewise, for the basic video of reading the digit 3, the mouth-opening width can be computed from mouth key points, one or more frames whose opening width exceeds a certain random threshold can be selectively removed, and the removed positions filled by a frame interpolation algorithm to obtain the video to be played.
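The amplitude adjustment can be sketched as below; amplitude_of() (e.g. mouth-opening width computed from mouth key points, or head pose from face key points) and interpolate() are assumed helpers, and the threshold is illustrative.

```python
def clip_amplitude(frames, amplitude_of, interpolate, threshold):
    out = []
    for idx, frame in enumerate(frames):
        if 0 < idx < len(frames) - 1 and amplitude_of(frame) > threshold:
            # deleted position: fill it with a frame interpolated from neighbours
            out.append(interpolate(frames[idx - 1], frames[idx + 1]))
        else:
            out.append(frame)
    return out
```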
Optionally, the playing parameters include playing transformation, and the processing the target video according to the playing parameters to obtain a video to be played includes:
randomly generating an affine transformation;
and processing the target video according to the affine transformation to obtain a video to be played.
In this embodiment, to enrich the test material, different shooting angles between the test device and the user can be simulated by affine transformation of the target video: an affine transformation T is randomly generated within a certain range and applied to the target video to serve as the video to be played.
Referring to fig. 8, fig. 8 shows a schematic diagram of affine transformation of video according to an embodiment of the present application.
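A minimal sketch of the random play transformation, assuming OpenCV; the parameter ranges below are illustrative assumptions, not values specified by this application.

```python
import cv2
import numpy as np

def random_affine_playback(frames):
    h, w = frames[0].shape[:2]
    angle = np.random.uniform(-10, 10)                   # rotation in degrees
    scale = np.random.uniform(0.9, 1.1)                  # zoom in / zoom out
    tx, ty = np.random.uniform(-0.05, 0.05, 2) * (w, h)  # panning
    T = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
    T[:, 2] += (tx, ty)                                  # add the random translation
    # apply the same affine transformation T to every frame of the target video
    return [cv2.warpAffine(f, T, (w, h)) for f in frames]
```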
Optionally, the playing parameters include a video frame rate, and the processing the target video according to the playing parameters to obtain a video to be played includes:
changing the video frame rate of the target video as a whole to obtain the video to be played; or
changing the video frame rate of each segment of basic video in the target video to obtain the video to be played.
In this embodiment, to enrich the test material, the action speed of different users may be simulated by varying the video frame rate. The frame rate may be changed independently and randomly for each basic video, changed for the final output video as a whole, or changed randomly at output time; the embodiment of the present application is not limited in this respect.
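One simple way to realize this, sketched below with OpenCV's VideoWriter: keep the frames unchanged and rewrite the output frame rate, so the same motion plays faster or slower. The speed-factor range is an illustrative assumption.

```python
import random
import cv2

def rewrite_fps(frames, base_fps, out_path="out.mp4"):
    # simulate faster or slower user actions by scaling the frame rate
    fps = base_fps * random.uniform(0.5, 2.0)
    h, w = frames[0].shape[:2]
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    for f in frames:
        writer.write(f)
    writer.release()
    return out_path
```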
S770, outputting the video to be played.
According to the video processing method provided by this embodiment, multiple segments of living-body-based video material are first acquired and processed based on a target face to obtain a first transformation video and a second transformation video of the target face; a transition video between the first transformation video and the second transformation video is then generated, and the first transformation video, the second transformation video, and the transition video are spliced to obtain a target video. After the playing parameters are determined, the target video can be processed according to the playing parameters to obtain the video to be played. The transformed target video can thus be combined with diverse playing modes to cover more behavior patterns and interactive face authentication scenarios, better matching test requirements, so that sufficiently rich video material is available for more thorough testing. Through multi-segment basic video material, transition video generation, and diversified playing, this embodiment reduces the cost of producing test video material, flexibly copes with the interactions of the tested program, and achieves the goal of automated testing.
Referring to fig. 9, fig. 9 is a flowchart illustrating a testing method of an application according to an embodiment of the present application. The test method of the application program shown in fig. 9 may be implemented in the application environment shown in fig. 2, and the test method may be applied to the master control device. The test method comprises the following steps:
s910, acquiring audio and video information of the test equipment.
The audio and video information may include interface information of the test device and audio information output by the test device: interface information such as a face image, and audio information such as the test device's voice prompts "blink" or "read the following numbers".
The main control equipment can collect interface information of the test equipment through the camera, and can collect audio information output by the test equipment through the microphone.
S920, acquiring a video to be played according to the audio and video information.
The main control device can determine required video materials according to the audio and video information, further generate a target video according to the video processing method described in the above embodiment, and process the target video by combining the playing parameters described in the above embodiment, further obtain a video to be played.
S930, putting the video to be played on a display screen.
The display screen may be an independent device or may be integrated on the main control device, which is not limited in the embodiment of the present application.
S940, collecting a face verification result output by the testing equipment.
The face verification result is a verification result output by the testing equipment in response to the video to be played.
After the video to be played is shown on the display screen, the test device captures the face video from it through its camera; the face verification program installed on the test device then verifies the face in the video and outputs a face verification result, such as a voice prompt that verification passed or failed.
S950, testing the application program on the testing equipment according to the face verification result.
The main control device analyzes the collected face verification result. Specifically, if the input video to be played is a face-changed video and the verification result identifies it as a face-changed video, the face verification program on the test device is accurate and has strong attack resistance; conversely, if the input video to be played is a face-changed video but the verification result treats it as a real face video, the face verification program fails the test and has weak attack resistance, as sketched below.
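The pass/fail analysis above reduces to a simple comparison; the sketch below is illustrative, with hypothetical labels.

```python
def evaluate(input_is_fake: bool, verdict_says_fake: bool) -> str:
    # the program passes when a face-changed input is recognized as fake
    if input_is_fake and verdict_says_fake:
        return "PASS: the face verification program resisted the attack"
    if input_is_fake and not verdict_says_fake:
        return "FAIL: the face-changed video was accepted as a real face"
    return "N/A: the input was not a transformed video"
```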
As an optional implementation manner, after the video to be played is launched onto a display screen and before the face verification result output by the test device is collected, the method further includes:
acquiring an interface image of the test equipment;
judging whether the current position and the current angle of the test equipment meet the test requirements or not according to the interface image;
if the current position and the current angle of the test equipment do not meet the test requirements, sending an adjustment instruction to an adjustment device, where the adjustment device adjusts the position and the angle of the test equipment according to the adjustment instruction; or,
and if the current position and the current angle of the testing equipment do not meet the testing requirements, adjusting the display position of the video to be played on the display screen.
Where the test requires, for example, taking a complete face.
In this embodiment, the camera may collect the interface image of the test device and send it to the main control device. By analyzing the interface content, the main control device judges whether the test device captures a complete face at its current position and angle. If the captured face is incomplete, an adjustment instruction is sent to adjust the position and angle of the test device, and/or the display position of the video to be played on the high-definition display screen is adjusted, so that the test device can capture a complete face image and a smooth test is ensured. A minimal sketch of this completeness check follows.
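This sketch assumes an OpenCV Haar cascade face detector; a real deployment could use any detector, and the margin value is an assumption.

```python
import cv2

def face_is_complete(interface_image, margin=10):
    gray = cv2.cvtColor(interface_image, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = cascade.detectMultiScale(gray, 1.1, 5)
    if len(faces) == 0:
        return False  # no face captured: position/angle must be adjusted
    x, y, fw, fh = faces[0]
    ih, iw = gray.shape[:2]
    # the face must lie fully inside the frame with some margin
    return x > margin and y > margin and x + fw < iw - margin and y + fh < ih - margin
```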
Referring to fig. 10, fig. 10 is an interface schematic diagram of a test apparatus according to an embodiment of the present application. In which fig. 10 shows a complete face image, a reduced face image, and an incomplete face image in this order from left to right.
The embodiment of the application provides a test method for an application program: first, the audio and video information of the test device is acquired; according to the audio and video information, a video to be played generated by the above video processing method is acquired and put on a display screen; then, the face verification result output by the test device is collected; finally, the application program on the test device is tested according to the face verification result. In this way, the face verification program on the device can be tested with richer transformed face videos combined with diverse playing modes, so that the program is tested thoroughly and the test effect is better.
Referring to fig. 11, fig. 11 is a block diagram illustrating a video processing apparatus 1100 according to an embodiment of the present application, where the apparatus 1100 includes: an acquisition unit 1110, a processing unit 1120, a generation unit 1130, and a splicing unit 1140.
Wherein, the acquiring unit 1110 is configured to acquire a plurality of segments of video materials based on a living body.
And the processing unit 1120 is used for processing the multiple segments of video materials based on the target face to obtain multiple segments of transformation basic videos of the target face. The multi-segment transform base video described above may include a first transform video and a second transform video.
A generating unit 1130, configured to generate, for a first transformed video and a second transformed video in the multi-segment transformed base video, a transition video between the first transformed video and the second transformed video.
And a stitching unit 1140, configured to stitch the first transformed video, the second transformed video, and the transition video to obtain a target video.
Optionally, the processing unit 1120 processes the multiple segments of video material based on a target face to obtain a multiple segment transformation base video of the target face, including:
replacing, for each segment of the video material, the face in the video material with the target face to obtain a transformation basic video of the target face; or,
and extracting the motion characteristics of the faces in the video materials aiming at each section of the video materials, and fusing the motion characteristics to the target faces to obtain a transformation basic video of the target faces.
Optionally, the generating unit 1130 generates, for a first transformed video and a second transformed video in the multi-segment transformed base video, a transition video between the first transformed video and the second transformed video, including:
acquiring a first image frame of a first transformation video and a second image frame of a second transformation video aiming at the first transformation video and the second transformation video in the multi-section transformation basic video;
calculating a similarity transformation matrix according to the first image frame and the second image frame;
processing the second transformed video based on the similarity transformation matrix to obtain a third transformed video;
performing frame interpolation processing on the first transformation video and the third transformation video to generate a transition video;
the stitching unit 1140 stitches the first transformed video, the second transformed video, and the transition video to obtain a target video, including:
and splicing the first transformation video, the transition video and the third transformation video to obtain a target video.
Specifically, the manner in which the generating unit 1130 acquires the first image frame of the first transformation video and the second image frame of the second transformation video may be:
acquiring a first image frame with a clear face from the rear preset segment of the first transformation video, and
acquiring a second image frame with a clear face from the front preset segment of the second transformation video;
the generating unit 1130 may calculate the similarity transformation matrix according to the first image frame and the second image frame by:
extracting a first key point of a face in the first image frame and extracting a second key point of the face in the second image frame;
and calculating a similarity transformation matrix between the first image frame and the second image frame according to the first key point and the second key point.
Optionally, each segment of the video material includes three segments, the three segments including: the device comprises a front preset segment, an action segment and a rear preset segment, wherein the length of the front preset segment is matched with that of the rear preset segment, the front preset segment and the rear preset segment after the lengths are matched are used for video splicing, and the action segment is used for face authentication.
Optionally, the apparatus 1100 may further include:
a determining unit, configured to determine a play parameter;
the processing unit 1120 is further configured to process the target video according to the playing parameter to obtain a video to be played.
The playing parameters include at least one of the following: basic video play sequence, motion amplitude, play transformation, and video frame rate.
Optionally, the playing parameters include a base video play sequence, and the processing unit 1120 processes the target video according to the playing parameters to obtain the video to be played, including:
determining at least one target video from each pair of adjacent base videos in the base video play sequence;
and concatenating the at least one target video in the order given by the base video play sequence, as shown in the sketch below, to obtain the video to be played.
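A hedged sketch of this concatenation step, assuming the target videos have already been generated and are keyed by the adjacent base-video pair they join (the dictionary layout is an assumption):

```python
from typing import Dict, List, Tuple
import numpy as np

def assemble_playback(
    play_sequence: List[str],
    target_videos: Dict[Tuple[str, str], List[np.ndarray]],
) -> List[np.ndarray]:
    frames: List[np.ndarray] = []
    # Each adjacent pair (A, B) selects the pre-stitched target video A -> B.
    for a, b in zip(play_sequence, play_sequence[1:]):
        frames.extend(target_videos[(a, b)])
    return frames
```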
Optionally, the playing parameters include an action amplitude, and the processing unit 1120 processes the target video according to the playing parameters to obtain the video to be played, including:
for each segment of base video in the target video, deleting from the base video at least one frame of action image whose action amplitude is greater than a preset amplitude threshold, and filling the deleted position with an interpolated frame to obtain the video to be played.
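The following sketch illustrates one plausible reading of this step; the amplitude measure used here (mean absolute frame difference) is an assumption, as the patent does not specify one.

```python
import cv2
import numpy as np

def limit_action_amplitude(frames, threshold=12.0):
    out = [frames[0]]
    for f in frames[1:]:
        amplitude = float(np.mean(cv2.absdiff(f, out[-1])))
        if amplitude > threshold:
            # Delete the over-amplitude action frame and fill its slot with a
            # frame interpolated between the last kept frame and the dropped
            # one, halving the apparent motion at that position.
            out.append(cv2.addWeighted(out[-1], 0.5, f, 0.5, 0))
        else:
            out.append(f)
    return out
```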
Optionally, the playing parameters include a play transformation, and the processing unit 1120 processes the target video according to the playing parameters to obtain the video to be played, including:
randomly generating an affine transformation;
and processing the target video according to the affine transformation to obtain the video to be played.
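A minimal sketch of the random play transformation, assuming OpenCV; the parameter ranges for rotation, scale, and translation are illustrative assumptions.

```python
import cv2
import numpy as np

def random_affine(frames, rng=np.random.default_rng()):
    h, w = frames[0].shape[:2]
    angle = rng.uniform(-5.0, 5.0)            # small random rotation (degrees)
    scale = rng.uniform(0.95, 1.05)           # small random zoom
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
    M[:, 2] += rng.uniform(-10, 10, size=2)   # small random translation
    # Apply the same random affine matrix to every frame of the target video.
    return [cv2.warpAffine(f, M, (w, h)) for f in frames]
```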
Optionally, the playing parameters include a video frame rate, and the processing unit 1120 processes the target video according to the playing parameters to obtain the video to be played, including:
changing the video frame rate of the target video as a whole to obtain the video to be played; or
changing the video frame rate of each segment of base video in the target video to obtain the video to be played.
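A sketch of the frame-rate variant: rewriting the same frames with a different fps value changes the playback speed seen by the camera of the test device. The codec and output path below are assumptions.

```python
import cv2

def rewrite_frame_rate(frames, out_path="to_play.mp4", fps=20.0):
    h, w = frames[0].shape[:2]
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    writer = cv2.VideoWriter(out_path, fourcc, fps, (w, h))
    for f in frames:
        writer.write(f)  # same frames, new frame rate
    writer.release()
```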
The present application provides a video processing apparatus that transforms multiple segments of video material to generate a large number of transformed videos and joins any two transformed videos through a transition video. This yields richer target videos while avoiding the artifacts that arise when two transformed videos are joined directly, so that sufficiently rich video material is available for more thorough testing.
Referring to fig. 12, fig. 12 is a block diagram illustrating a testing apparatus 1200 according to an embodiment of the present application. The testing apparatus 1200 includes a first obtaining unit 1210, a second obtaining unit 1220, a collection unit 1230, and a testing unit 1240. Wherein:
the first obtaining unit 1210 is configured to obtain audio and video information of the test device;
the second obtaining unit 1220 is configured to obtain the video to be played according to the audio and video information, where the video to be played is generated according to the video processing method described in the foregoing embodiments;
the collection unit 1230 is configured to collect a face verification result output by the test device, where the face verification result is the verification result output by the test device in response to the video to be played;
and the testing unit 1240 is configured to test the application program on the test device according to the face verification result.
Optionally, the apparatus 1200 may further include:
an acquisition unit, configured to acquire an interface image of the test device;
a judging unit, configured to judge, from the interface image, whether the current position and current angle of the test device meet the test requirements;
a sending unit, configured to send an adjustment instruction to an adjustment device if the current position and current angle of the test device do not meet the test requirements, where the adjustment device adjusts the position and angle of the test device according to the adjustment instruction; or,
an adjusting unit, configured to adjust the display position of the video to be played on the display screen if the current position and current angle of the test device do not meet the test requirements.
The testing apparatus provided by the present application can attack the face verification program on the test device with richer face-transformed videos, combined with a variety of playing modes, so that the face verification program of the test device is tested more fully and the test effect is better.
Further, the testing apparatus provided by the present application can record the position and angle of the test device, as well as the content and playing parameters of the face-transformed video played on the screen during a test. By recording this information, a test scenario can be reproduced later; reproducing the test scenario helps developers locate problems and gives testers evidence for their findings.
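As an illustration, a reproducible test-scenario record could look like the following sketch; every field name here is hypothetical, not part of the patent.

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class TestScenario:
    device_position: tuple   # recorded position of the test device
    device_angle: float      # recorded angle of the test device
    video_id: str            # which face-transformed video was played
    play_parameters: dict    # play order, amplitude, transform, frame rate

    def save(self, path: str) -> None:
        # Persist the scenario so it can be replayed to reproduce the test.
        with open(path, "w") as fh:
            json.dump(asdict(self), fh)
```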
It should be noted that, in the present application, the device embodiments correspond to the foregoing method embodiments; for the specific principles of the device embodiments, reference may be made to the foregoing method embodiments, and details are not repeated here.
An electronic device provided in the present application will be described with reference to fig. 13.
Referring to fig. 13, based on the above video processing method and application program testing method, this embodiment further provides an electronic device 200 including a processor 102 capable of executing the above video processing method and application program testing method. The electronic device 200 may be a smart phone, a tablet computer, a desktop computer, or a portable computer. The electronic device 200 further includes a memory 104, a network module 106, a screen 108, and a voice acquisition module 109. The memory 104 stores a program capable of executing the contents of the foregoing embodiments, and the processor 102 can execute the program stored in the memory 104.
The processor 102 may include one or more processing cores. The processor 102 uses various interfaces and lines to connect the various parts of the electronic device 200, and performs the functions of the electronic device 200 and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 104 and by invoking the data stored in the memory 104. Optionally, the processor 102 may be implemented in hardware in at least one of digital signal processing (Digital Signal Processing, DSP), field programmable gate array (Field-Programmable Gate Array, FPGA), and programmable logic array (Programmable Logic Array, PLA). The processor 102 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem, and the like. The CPU mainly handles the operating system, user interface, application programs, and the like; the GPU is responsible for rendering and drawing display content; and the modem handles wireless communication. It will be appreciated that the modem may also not be integrated into the processor 102 but instead implemented by a separate communication chip.
The memory 104 may include random access memory (Random Access Memory, RAM) or read-only memory (Read-Only Memory, ROM). The memory 104 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 104 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, or an image playing function), and instructions for implementing the foregoing method embodiments. The data storage area may store data created by the electronic device 200 in use (such as audio and video data), and the like.
The network module 106 is configured to receive and transmit electromagnetic waves and to convert between electromagnetic waves and electrical signals so as to communicate with a communication network or other devices; for example, the network module 106 may transmit broadcast data or parse broadcast data transmitted by other devices. The network module 106 may include various existing circuit elements for performing these functions, such as an antenna, a radio frequency transceiver, a digital signal processor, an encryption/decryption chip, a subscriber identity module (SIM) card, and memory. The network module 106 may communicate with various networks, such as the Internet, an intranet, or a wireless network, or with other devices via a wireless network. The wireless network may be a cellular telephone network, a wireless local area network, or a metropolitan area network. For example, the network module 106 may interact with base stations.
The screen 108 may display interface content; for example, it may display the video to be played described in the foregoing embodiments.
The voice acquisition module 109 is configured to collect audio information; for example, it may be used to collect the voice output by a user. It should be noted that when the electronic device 200 serves as a server, the voice acquisition module 109 may be omitted.
It should be noted that, to implement more functions, the electronic device 200 may also include more components; for example, it may further include a structured light sensor for acquiring face information, or a camera for acquiring an iris image, and the like.
Referring to fig. 14, a block diagram of a computer readable storage medium according to an embodiment of the present application is shown. The computer readable storage medium 1100 stores program code that can be invoked by a processor to perform the methods described in the foregoing method embodiments.
The computer readable storage medium 1100 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read-only memory), an EPROM, a hard disk, or a ROM. Optionally, the computer readable storage medium 1100 includes a non-transitory computer-readable storage medium. The computer readable storage medium 1100 has storage space for program code 1110 that performs any of the method steps described above. The program code can be read from or written to one or more computer program products. The program code 1110 may, for example, be compressed in a suitable form.
In summary, according to the video processing method, the application program testing method, and the electronic device provided above, multiple segments of living-body-based video material are obtained and processed based on a target face to obtain multiple segments of transformed base video of the target face. For a first transformed video and a second transformed video in the multi-segment transformed base video, a transition video between the two is generated. Finally, the first transformed video, the second transformed video, and the transition video are stitched to obtain a target video. In this way, a large number of transformed videos are generated from the multi-segment video material, and any two transformed videos are joined through a transition video, yielding richer target videos while avoiding the artifacts that arise when two transformed videos are joined directly, so that sufficiently rich video material is available for more thorough testing.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will appreciate that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents, and such modifications and substitutions do not make the essence of the corresponding technical solutions depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (19)

1. A video processing method, comprising:
acquiring multiple segments of living-body-based video material;
processing the multiple segments of video material based on a target face to obtain a first transformed video and a second transformed video of the target face, wherein processing each segment of video material comprises: replacing the face in the video material with the target face; or fusing the motion features of the face in the video material onto the target face;
generating, for the first transformed video and the second transformed video, a transition video between the first transformed video and the second transformed video;
stitching the first transformed video, the second transformed video, and the transition video to obtain a target video for testing a face verification program;
wherein the generating, for the first transformed video and the second transformed video, a transition video between the first transformed video and the second transformed video comprises:
performing a similarity transformation on the second transformed video according to a first image frame of the first transformed video and a second image frame of the second transformed video to obtain a third transformed video;
and performing frame interpolation on the first transformed video and the third transformed video to generate the transition video.
2. The method of claim 1, wherein the processing the multiple segments of video material based on a target face to obtain a first transformed video and a second transformed video of the target face comprises:
for each segment of video material, replacing the face in the video material with the target face to obtain the first transformed video and the second transformed video; or,
for each segment of video material, extracting the motion features of the face in the video material and fusing the motion features onto the target face to obtain the first transformed video and the second transformed video.
3. The method of claim 1, further comprising:
acquiring the first image frame of the first transformed video and the second image frame of the second transformed video;
wherein the performing a similarity transformation on the second transformed video according to the first image frame of the first transformed video and the second image frame of the second transformed video to obtain a third transformed video comprises:
calculating a similarity transformation matrix from the first image frame and the second image frame;
and processing the second transformed video based on the similarity transformation matrix to obtain the third transformed video.
4. The method of claim 3, wherein the acquiring the first image frame of the first transformed video and the second image frame of the second transformed video comprises:
acquiring a first image frame containing a clear face from a rear preset segment of the first transformed video, and acquiring a second image frame containing a clear face from a front preset segment of the second transformed video;
and wherein the calculating a similarity transformation matrix from the first image frame and the second image frame comprises:
extracting first key points of the face in the first image frame and second key points of the face in the second image frame;
and calculating the similarity transformation matrix between the first image frame and the second image frame from the first key points and the second key points.
5. The method of any one of claims 1 to 4, further comprising:
determining playing parameters;
and processing the target video according to the playing parameters to obtain a video to be played.
6. The method of claim 5, wherein the playing parameters include at least one of the following: base video play sequence, action amplitude, play transformation, and video frame rate.
7. The method of claim 6, wherein the playing parameters include a base video play sequence, and the processing the target video according to the playing parameters to obtain a video to be played comprises:
determining at least one target video from each pair of adjacent base videos in the base video play sequence;
and concatenating the at least one target video in the order given by the base video play sequence to obtain the video to be played.
8. The method of claim 6, wherein the playing parameters include an action amplitude, and the processing the target video according to the playing parameters to obtain a video to be played comprises:
for each segment of base video in the target video, deleting from the base video at least one frame of action image whose action amplitude is greater than a preset amplitude threshold, and filling the deleted position with an interpolated frame to obtain the video to be played.
9. The method of claim 6, wherein the playing parameters include a play transformation, and the processing the target video according to the playing parameters to obtain a video to be played comprises:
randomly generating an affine transformation;
and processing the target video according to the affine transformation to obtain the video to be played.
10. The method of claim 6, wherein the playing parameters include a video frame rate, and the processing the target video according to the playing parameters to obtain the video to be played comprises:
changing the video frame rate of the target video as a whole to obtain the video to be played;
or changing the video frame rate of each segment of base video in the target video to obtain the video to be played.
11. A method for testing an application program installed on a test device, comprising:
acquiring audio and video information output by the test device;
acquiring a video to be played according to the audio and video information, wherein the video to be played is generated according to the method of any one of claims 5 to 10;
collecting a face verification result output by the test device, wherein the face verification result is the verification result output by the test device in response to the video to be played;
and testing the application program on the test device according to the face verification result.
12. The method of claim 11, wherein after the acquiring a video to be played according to the audio and video information and before the collecting a face verification result output by the test device, the method further comprises:
acquiring an interface image of the test device;
judging, from the interface image, whether the current position and current angle of the test device meet the test requirements;
if the current position and current angle of the test device do not meet the test requirements, sending an adjustment instruction to an adjustment device, wherein the adjustment device adjusts the position and angle of the test device according to the adjustment instruction; or,
if the current position and current angle of the test device do not meet the test requirements, adjusting the display position of the delivered video to be played.
13. A video processing apparatus, comprising:
an obtaining unit, configured to acquire multiple segments of living-body-based video material;
a processing unit, configured to process the multiple segments of video material based on a target face to obtain a first transformed video and a second transformed video of the target face, wherein processing each segment of video material comprises: replacing the face in the video material with the target face; or fusing the motion features of the face in the video material onto the target face;
a generating unit, configured to generate, for the first transformed video and the second transformed video, a transition video between the first transformed video and the second transformed video;
and a stitching unit, configured to stitch the first transformed video, the second transformed video, and the transition video to obtain a target video for testing a face verification program;
wherein the generating unit is specifically configured to:
perform a similarity transformation on the second transformed video according to a first image frame of the first transformed video and a second image frame of the second transformed video to obtain a third transformed video;
and perform frame interpolation on the first transformed video and the third transformed video to generate the transition video.
14. A testing apparatus, comprising:
a first obtaining unit, configured to obtain audio and video information of the test device;
a second obtaining unit, configured to obtain a video to be played according to the audio and video information, wherein the video to be played is generated according to the method of any one of claims 5 to 10;
a collection unit, configured to collect a face verification result output by the test device, wherein the face verification result is the verification result output by the test device in response to the video to be played;
and a testing unit, configured to test the application program on the test device according to the face verification result.
15. An electronic device comprising a processor and a memory, wherein one or more programs are stored in the memory and configured to be executed by the processor to implement the method of any one of claims 1-10 or the method of any one of claims 11-12.
16. A computer readable storage medium, characterized in that program code is stored in the computer readable storage medium, wherein the program code, when executed by a processor, performs the method of any one of claims 1-10 or the method of any one of claims 11-12.
17. A test system, comprising a main control device, a test device, and an adjustment device, wherein the main control device is configured to execute the method of any one of claims 1-12; the test device is provided with a face verification program for performing face verification on the captured video to be played and outputting a face verification result; and the adjustment device is configured to adjust the current position and current angle of the test device.
18. The test system of claim 17, further comprising a display screen for presenting the video to be played, the video to be played being generated according to the method of any one of claims 5 to 10.
19. The test system of claim 18, further comprising a camera and a microphone, wherein the camera is configured to capture interface information of the test device, or to capture face video of a real person, or to capture face video of a non-living body, and the microphone is configured to collect audio information of the test device.
CN202111507922.8A 2021-12-10 2021-12-10 Video processing method, application program testing method and electronic equipment Active CN114220051B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111507922.8A CN114220051B (en) 2021-12-10 2021-12-10 Video processing method, application program testing method and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111507922.8A CN114220051B (en) 2021-12-10 2021-12-10 Video processing method, application program testing method and electronic equipment

Publications (2)

Publication Number Publication Date
CN114220051A CN114220051A (en) 2022-03-22
CN114220051B true CN114220051B (en) 2023-07-28

Family

ID=80700830

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111507922.8A Active CN114220051B (en) 2021-12-10 2021-12-10 Video processing method, application program testing method and electronic equipment

Country Status (1)

Country Link
CN (1) CN114220051B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111950497A (en) * 2020-08-20 2020-11-17 重庆邮电大学 AI face-changing video detection method based on multitask learning model
CN112488034A (en) * 2020-12-14 2021-03-12 上海交通大学 Video processing method based on lightweight face mask detection model
WO2021109678A1 (en) * 2019-12-04 2021-06-10 深圳追一科技有限公司 Video generation method and apparatus, electronic device, and storage medium
CN113095272A (en) * 2021-04-23 2021-07-09 深圳前海微众银行股份有限公司 Living body detection method, living body detection apparatus, living body detection medium, and computer program product
CN113627404A (en) * 2021-10-12 2021-11-09 中国科学院自动化研究所 High-generalization face replacement method and device based on causal inference and electronic equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8024276B2 (en) * 2005-05-24 2011-09-20 Drane Associates, Lp Method for interactive learning and training
US20210166477A1 (en) * 2019-12-03 2021-06-03 Augustus Intelligence Inc. Synthesizing images from 3d models

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Video Manipulation Detection via Recurrent Residual Feature Learning Networks; Matthew J. Howard et al.; 2019 IEEE Global Conference on Signal and Information Processing (GlobalSIP); pp. 1-5 *
User loan overdue prediction method based on smartphone application data; Zhao Hongyu et al.; Abstracts of the 31st Chinese Process Control Conference (CPCC 2020); full text *
A survey of passive forensic techniques for digital video forgery; Ding Xiangling et al.; Journal of Signal Processing; vol. 37, no. 12; pp. 2371-2389 *
Forged face video detection method fusing global temporal and local spatial features; Chen Peng, Liang Tao, Liu Jin, Dai Jiao, Han Jizhong; Journal of Cyber Security (no. 02); full text *

Also Published As

Publication number Publication date
CN114220051A (en) 2022-03-22

Similar Documents

Publication Publication Date Title
CN112562433B (en) Working method of 5G strong interaction remote delivery teaching system based on holographic terminal
CN107707931B (en) Method and device for generating interpretation data according to video data, method and device for synthesizing data and electronic equipment
CN108200446B (en) On-line multimedia interaction system and method of virtual image
KR101768980B1 (en) Virtual video call method and terminal
US8581911B2 (en) Training system and methods for dynamically injecting expression information into an animated facial mesh
CN111988658B (en) Video generation method and device
US20140139619A1 (en) Communication method and device for video simulation image
CN112637670B (en) Video generation method and device
CN107995482A (en) The treating method and apparatus of video file
Tolosana et al. An introduction to digital face manipulation
CN113515997A (en) Video data processing method and device and readable storage medium
CN112188267A (en) Video playing method, device and equipment and computer storage medium
CN112422844A (en) Method, device and equipment for adding special effect in video and readable storage medium
CN114638232A (en) Method and device for converting text into video, electronic equipment and storage medium
CN113727039B (en) Video generation method and device, electronic equipment and storage medium
US11483535B2 (en) Synchronizing secondary audiovisual content based on frame transitions in streaming content
CN109286760B (en) Entertainment video production method and terminal thereof
CN114220051B (en) Video processing method, application program testing method and electronic equipment
CN110415318B (en) Image processing method and device
KR101947553B1 (en) Apparatus and Method for video edit based on object
US20220070501A1 (en) Social video platform for generating and experiencing content
CN114500879A (en) Video data processing method, device, equipment and storage medium
JP6148785B1 (en) Information processing system, information processing apparatus, and program
CN115086730B (en) Subscription video generation method, subscription video generation system, computer equipment and subscription video generation medium
CN113766339B (en) Bullet screen display method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant