CN115471780A - Method and device for testing sound-picture time delay

Method and device for testing sound-picture time delay

Info

Publication number
CN115471780A
Authority
CN
China
Prior art keywords
image frame
audio
preset
electronic device
target
Prior art date
Legal status
Granted
Application number
CN202211412028.7A
Other languages
Chinese (zh)
Other versions
CN115471780B (en)
Inventor
李文姣 (Li Wenjiao)
Current Assignee
Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Priority date
Filing date
Publication date
Application filed by Honor Device Co Ltd
Priority to CN202211412028.7A
Publication of CN115471780A
Application granted
Publication of CN115471780B
Legal status: Active

Classifications

    • G06V 20/40: Scenes; scene-specific elements in video content
    • G06F 3/04845: Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, for image manipulation, e.g. dragging, rotation, expansion or change of colour
    • G06V 10/761: Proximity, similarity or dissimilarity measures
    • G06V 20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G10L 25/51: Speech or voice analysis techniques specially adapted for comparison or discrimination


Abstract

The application provides a method and an apparatus for testing sound-picture delay, applied to a first electronic device that can communicate with a second electronic device, where the second electronic device completes a preset action on a preset interface. The method includes: controlling the second electronic device to start screen recording; triggering the second electronic device to enter the preset interface and to complete the preset action on that interface; after the preset action is completed, controlling the second electronic device to stop screen recording; identifying, from the recorded video, a target image frame that contains pixels indicating the second electronic device has started the preset action; identifying, from the video, a target audio that contains the sound emitted when the second electronic device starts the preset action; and obtaining the sound-picture delay of the preset action based on the timestamp of the target image frame and the timestamp of the target audio.

Description

Method and device for testing sound-picture time delay
Technical Field
The present application relates to the field of testing technologies, and in particular to a method and an apparatus for testing sound-picture delay.
Background
With the continuous development of technology, electronic devices come in more and more forms, and devices of every form need their sound-picture delay tested. Sound-picture delay can be understood as the delay between the audio and the image frames of a video; it is an important parameter when an electronic device plays video, and raising the degree of automation of sound-picture delay testing is a problem that urgently needs solving.
Disclosure of Invention
The application provides a method and an apparatus for testing sound-picture delay, aiming to solve the problem of how to raise the degree of automation of sound-picture delay testing. To achieve this object, the application provides the following technical solutions:
A first aspect of the present application provides a method for testing sound-picture delay. The method is applied to a first electronic device that can communicate with a second electronic device, where the second electronic device completes a preset action based on a preset interface. The method includes: controlling the second electronic device to start screen recording; triggering the second electronic device to enter the preset interface and to complete the preset action on that interface; after the preset action is completed, controlling the second electronic device to stop screen recording; identifying, from the recorded video, a target image frame that contains pixels indicating the second electronic device has started the preset action; identifying, from the video, a target audio that contains the sound emitted when the second electronic device starts the preset action; and obtaining the sound-picture delay of the preset action based on the timestamp of the target image frame and the timestamp of the target audio. In this method, the first electronic device controls the second electronic device, simulates a user performing the preset action on it, and records a video containing that action. Once the recorded video is obtained, the sound-picture delay of the preset action is obtained by automatically analyzing the image frames and audio in the video, so no manual intervention is needed from recording through to obtaining the delay; the degree of automation is high, and so is the efficiency. Moreover, because the sound-picture delay is derived from the timestamp of the audio and the timestamp of the image frame, the adverse effect of manual intervention on its accuracy is reduced, improving both the accuracy and the objectivity of the measured delay.
Optionally, identifying the target image frame from the recorded video includes: extracting a first image frame and a second image frame from the video, the first image frame being the frame immediately preceding the second; extracting the image features of both frames; in response to the image features of the first and second image frames satisfying a first preset condition, determining the second image frame to be the target image frame; in response to the image features not satisfying the first preset condition, taking the second image frame as the new first image frame and extracting a third image frame from the video as the new second image frame, where the third image frame is the frame immediately following the second image frame. The first preset condition includes the brightness difference between the first and second image frames being within a preset difference range; alternatively, it includes the first image frame being similar to the second image frame, with the second image frame containing pixels indicating the second electronic device has started the preset action. That is, the first electronic device may extract two consecutive image frames from the video, extract their image features, and decide the target image frame according to whether those features satisfy the first preset condition: if the condition is met, the later of the two frames is determined to be the target image frame, so that using two consecutive frames improves accuracy; if not, it keeps extracting frames and keeps testing consecutive frame pairs to determine the target image frame. The image features may be brightness, similarity, and the like, which are not repeated here.
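As a minimal sketch of the brightness-difference variant of this first preset condition (the function and parameter names below are illustrative assumptions, not the patent's implementation), the consecutive-frame comparison could look like:

    # Sketch only: scan consecutive frames for a brightness difference within
    # the preset difference range; names are illustrative, not from the patent.
    def find_target_frame(brightness, low, high):
        # brightness: per-frame brightness values, in frame order
        for i in range(1, len(brightness)):
            diff = brightness[i - 1] - brightness[i]
            if low <= diff <= high:      # first preset condition
                return i                 # the later frame is the target image frame
        return None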
Optionally, identifying the target audio from the video includes: extracting the audio from the video and extracting the audio features of the audio; and, in response to the audio features satisfying a second preset condition, determining that audio to be the target audio, where the second preset condition includes the maximum decibel value of the audio being within a preset decibel range. The target audio is thus determined objectively from its maximum decibel value, improving objectivity and accuracy.
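A minimal sketch of this second preset condition, assuming pydub's AudioSegment (which the detailed description below also uses) and an illustrative decibel range:

    from pydub import AudioSegment

    def is_target_audio(wav_path, db_low, db_high):
        # Second preset condition: the maximum decibel of the audio falls
        # within a preset decibel range.
        segment = AudioSegment.from_wav(wav_path)
        return db_low <= segment.max_dBFS <= db_high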
Optionally, the method further includes: obtaining the timestamp of the target image frame based on the frame number of the target image frame; and obtaining the timestamp of the target audio based on the sample number of the target audio. Obtaining the sound-picture delay of the preset action based on these two timestamps includes: taking the difference between the timestamp of the target audio and the timestamp of the target image frame as the sound-picture delay of the preset action. Because the timestamp of the target audio is obtained from its sample number and the timestamp of the target image frame from its frame number, the process needs no human intervention, improving the objectivity and accuracy of both timestamps; on this basis the sound-picture delay likewise avoids human intervention, i.e., the influence of subjectivity, and therefore has higher accuracy and objectivity.
Optionally, obtaining the timestamp of the target image frame based on its frame number includes: obtaining the timestamp from the frame number, the frame rate of the video, and a preset time conversion unit. Obtaining the timestamp of the target audio based on its sample number includes: determining the sample number of the target audio to be its timestamp, since the sample number points to the sampling time of the target audio. For example, timestamp of the target image frame = frame number of the target image frame × (preset time conversion unit / frame rate of the video), where the frame rate of the video may also be called the video frame rate. Both timestamps can be computed automatically by the first electronic device, avoiding the influence of subjectivity and achieving higher accuracy and objectivity.
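For instance, with an assumed frame rate of 60 fps and an assumed target frame number of 120, a sketch of both timestamp computations (in milliseconds) would be:

    # Sketch: preset time conversion unit = 1000, i.e. timestamps in milliseconds.
    def image_timestamp(frame_no, fps, unit=1000):
        return frame_no * (unit / fps)   # e.g. 120 * (1000 / 60) = 2000 ms

    def audio_timestamp(sample_no):
        return sample_no                 # the sample number points to the sampling time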
Optionally, the method further includes: identifying a first suspected target audio and a second suspected target audio from the video, both being audio whose maximum decibel value is within the preset decibel range, the second coming after the first; and obtaining the difference between the sample number of the first suspected target audio and that of the second. Identifying the target image frame and the target audio from the recorded video then includes: in response to the difference being smaller than a preset value, identifying the target image frame among image frames whose frame numbers come after the sample number of the second suspected target audio, and identifying the target audio among audio whose sample numbers come after the sample number of the second suspected target audio. A difference smaller than the preset value means the first electronic device has detected the preset action in two audio segments separated by a small sampling interval. Normally, once the audio of a preset action has been detected, it can only be detected again after a larger sampling interval; detecting the preset action in two suspected target audios separated by a small interval therefore indicates a detection error. Since the second suspected target audio is sampled after the first, the first electronic device restarts identification of the target image frame and the target audio after the second suspected target audio, which improves accuracy. The value of the preset value is not limited.
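A sketch of this plausibility check, with min_gap standing in for the unspecified preset value:

    def recheck_after(first_sample_no, second_sample_no, min_gap):
        # Two detections closer together than min_gap indicate a false positive;
        # if so, restart the search after the second suspected target audio.
        if second_sample_no - first_sample_no < min_gap:
            return second_sample_no      # search image frames and audio after this
        return None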
Optionally, controlling the second electronic device to start screen recording includes: sending a first command to the second electronic device, instructing it to enter the screen-recording interface of a screen-recording application; and sending a second command, instructing it to trigger the start control in that interface. Controlling the second electronic device to stop screen recording after it completes the preset action includes: after the preset action is completed, sending the first command to the second electronic device again; and sending a third command, instructing it to trigger the end control in the screen-recording interface. Through these several commands, the second electronic device starts and ends screen recording automatically.
Optionally, triggering the second electronic device to enter the preset interface and to complete the preset action on it includes: sending a fourth command to the second electronic device, instructing it to enter the preset interface, which contains an identifier pointing to the preset action; and sending a fifth command, instructing it to trigger that identifier so that the preset action is completed on the preset interface. Through these several commands, the second electronic device can simulate the preset action automatically.
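The commands themselves are abstract in the claims; as one hedged sketch, the first electronic device might issue them over adb (the detailed description mentions an adb channel), with the shell strings being placeholders rather than an actual command set:

    import subprocess

    def send_command(serial, command):
        # Placeholder transport: adb shell; the command strings are assumptions.
        subprocess.run(["adb", "-s", serial, "shell", command], check=True)

    # e.g. fourth command: enter the preset interface; fifth command: trigger
    # the identifier pointing to the preset action.
    # send_command(serial, "start activity ZZZ")
    # send_command(serial, "click id/text ZZZ")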
A second aspect of the present application provides an electronic device, including one or more processors and a memory for storing computer program code. The computer program code comprises computer instructions which, when executed by the one or more processors, cause the electronic device to perform the method for testing sound-picture delay provided by the first aspect of the present application.
A third aspect of the present application provides a computer storage medium for storing a computer program, where the computer program is specifically used to implement the method for testing sound-picture delay provided by the first aspect of the present application.
A fourth aspect of the present application provides a computer program product containing instructions. When the computer program product runs on a computer or a processor, the computer or the processor is caused to execute the method for testing sound-picture delay provided by the first aspect of the present application.
Drawings
FIG. 1 is a schematic diagram of a gun-shooting scene in a game interface provided by the present application;
fig. 2 is a hardware structure diagram of an electronic device provided by the present application;
FIG. 3 is an exemplary diagram of a control terminal controlling an electronic device;
fig. 4 is a timing diagram of a method for testing sound-picture delay provided by the present application;
fig. 5 is a flowchart of controlling an electronic device to start screen recording in the method for testing sound-picture delay provided by the present application;
fig. 6 is a flowchart of controlling an electronic device to simulate a game action and to end screen recording in the method for testing sound-picture delay provided by the present application;
fig. 7 is a flowchart of obtaining the sound-picture delay in the method for testing sound-picture delay provided by the present application;
fig. 8 is a diagram showing the decibels of audio collected at each sampling time, as provided by the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. The terminology used in the following examples is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in the specification of this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, such as "one or more", unless the context clearly indicates otherwise. It should also be understood that in the embodiments of the present application, "one or more" means one, two, or more than two; "and/or" describes the association relationship of associated objects and indicates that three relationships may exist; for example, A and/or B may represent: A alone, both A and B, or B alone, where A and B may be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
In the embodiments of the present application, "a plurality of" means two or more. It should also be noted that, in the description of the embodiments of the present application, terms such as "first" and "second" are used only to distinguish the descriptions and are not to be construed as indicating or implying relative importance or order.
Fig. 1 is a schematic diagram of a gun-shooting scene in a game interface. Part (1) of fig. 1 shows a user aiming at an object with a prop pistol; the user can click the shooting control, which makes the prop pistol fire at the aimed object. The prop pistol emits a sound when it fires at the object, indicated by the waveform shown in part (2) of fig. 1. As the gun-shooting scene in fig. 1 shows, the sound-picture delay is the length of time from the moment the user clicks the shooting control to the moment the pistol sounds; the shorter this duration, the better the audio-video synchronization.
At present, sound-picture delay is tested as follows: when the electronic device displays the image frame related to a specific action, the user starts a timer, and after the electronic device responds to the specific action by producing a sound, the user stops the timer and then computes the sound-picture delay. The specific action may be any action that triggers the electronic device to make a sound while displaying image frames, such as clicking the gun-shooting control described above with reference to fig. 1. Having the user test the delay in this way reduces accuracy and objectivity because of human factors, and the test efficiency is low. Some tools can assist with the timing, but they usually still need the user to trigger the start and end of timing, so the accuracy, objectivity, and efficiency of the sound-picture delay test all remain to be improved.
To raise the degree of automation of sound-picture delay testing and obtain a more accurate and objective measurement, the present application provides a method and an apparatus for testing sound-picture delay: the electronic device responds to a specific action, a video file containing that action is obtained, the image frames in the video file are analyzed to find the image frame corresponding to the specific action and obtain its timestamp, the audio in the video file is analyzed to find the audio corresponding to the specific action and obtain its timestamp, and the difference between the audio timestamp and the image-frame timestamp is determined to be the sound-picture delay. Because this test process needs no user intervention, a more accurate and objective sound-picture delay can be obtained, together with higher test efficiency.
The method for testing sound-picture delay is used to obtain the sound-picture delay when the electronic device simulates a specific action. In some embodiments, the electronic device may be a cell phone, a tablet, a desktop computer, a laptop, a notebook, an Ultra-mobile Personal Computer (UMPC), a handheld computer, a netbook, a Personal Digital Assistant (PDA), a wearable electronic device, a smart watch, or the like. The specific form of the electronic device is not particularly limited in the present application.
As shown in fig. 2, the electronic device may include: a processor, an external memory interface, an internal memory, a Universal Serial Bus (USB) interface, a charging management module, a power management module, a battery, antenna 1, antenna 2, a mobile communication module, a wireless communication module, an audio module, a sensor module, keys, a motor, an indicator, a camera, a display screen, a Subscriber Identity Module (SIM) card interface, and the like. The audio module may include a speaker, a receiver, a microphone, an earphone interface, etc., and the sensor module may include a pressure sensor, a gyroscope sensor, an air pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity light sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, etc.
It is to be understood that the illustrated structure of the present embodiment does not constitute a specific limitation to the electronic device. In other embodiments, an electronic device may include more or fewer components than shown, or some components may be combined, some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
A processor may include one or more processing units; for example, the processor may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a Neural-network Processing Unit (NPU), among others. The different processing units may be separate devices or may be integrated into one or more processors. The processor is the nerve center and command center of the electronic device; the controller can generate an operation control signal according to the instruction operation code and the timing signal, completing the control of instruction fetching and instruction execution.
The external memory interface can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device. The external memory card communicates with the processor through the external memory interface to implement data storage; for example, video files, image frames, audio, and the like are saved in the external memory card. The internal memory may be used to store computer-executable program code, which includes instructions. The processor executes the instructions stored in the internal memory to run the various functional applications and data processing of the electronic device. For example, in the present application, the processor executes the instructions stored in the internal memory so that the electronic device performs the method for testing sound-picture delay provided herein.
The electronic device realizes the display function through the GPU, the display screen, the application processor and the like. The GPU is a microprocessor for image processing and is connected with a display screen and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor may include one or more GPUs that execute program instructions to generate or alter display information.
The display screen is used for displaying images, videos, and the like. The display screen includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the electronic device may include 1 or N display screens, N being a positive integer greater than 1.
A series of graphical user interfaces (GUIs), such as the home screen, may be displayed on the display screen of the electronic device. Generally, the size of the display screen is fixed, and only a limited number of controls can be shown on it. A control is a GUI element: a software component contained in an application program that governs all the data the application processes and the interactive operations on that data; the user can interact with a control through direct manipulation to read or edit the application's information. Generally, controls may include visual interface elements such as icons, buttons, menus, tabs, text boxes, dialog boxes, status bars, navigation bars, and widgets. For example, in the present embodiment, the display screen may display a video file containing a specific action, and the like.
In some embodiments, the processor may include one or more interfaces. The interface may include an integrated circuit (I2C) interface, an integrated circuit built-in audio (I2S) interface, a Pulse Code Modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a Mobile Industry Processor Interface (MIPI), a general-purpose input/output (GPIO) interface, a Subscriber Identity Module (SIM) interface, and/or a Universal Serial Bus (USB) interface, etc.
The USB interface conforms to USB standard specifications and may specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like. The USB interface can be used to connect a charger to charge the electronic device, to transfer data between the electronic device and peripheral devices, or to connect earphones and play audio through them. The interface can also be used to connect other electronic devices, such as computers.
In this embodiment, a computer may serve as the control terminal, with an application program for testing sound-picture delay configured in the control terminal in advance; in fig. 3, icon A represents the icon of this application program. After the user starts the application via icon A, interface B of the application is displayed. The control terminal (first electronic device) presets a plurality of test cases for the electronic device (second electronic device), and these test cases can be displayed through interface B. Take as an example interface B displaying the following test cases:
1. Simulate a game action and test the sound-picture delay while the electronic device simulates it. For example, simulate a gun-shooting action and test the sound-picture delay from the start of shooting to the gunshot being played; or simulate a monster-attacking action and test the sound-picture delay from the attack to the sound being made.
2. Simulate video playback and test the sound-picture delay when the electronic device starts playing a video. For example, the electronic device starts a video-playing program, and the test measures the sound-picture delay from the video play button being clicked to the audio in the video being played.
3. Simulate audio playback and test the sound-picture delay when the electronic device starts playing audio. For example, the electronic device starts an audio-playing program, and the test measures the sound-picture delay from the audio play button being clicked to music being played.
Assuming the user selects test case 1 in interface B, i.e., testing the sound-picture delay when a game action is simulated on the electronic device, the application for testing sound-picture delay running on the control terminal executes the following flow, with reference to fig. 4:
S1. The control terminal creates a folder for storing the video obtained by screen recording, and controls the electronic device to start screen recording.
S2. The control terminal controls the electronic device to simulate the game action.
S3. When it detects that the electronic device has completed the game action, the control terminal controls the electronic device to end screen recording. In some implementations, the recording is ended a period of time after the game action is detected to have completed, to ensure the image frames and audio are fully captured.
S4. The control terminal stores the video obtained by screen recording into a folder, yielding the recorded video file. The folder may be the one created in step S1.
S5. The control terminal reads the video file.
S6. The control terminal extracts the audio from the video file.
S7. The control terminal analyzes the audio, finds the audio corresponding to the game action, and obtains the timestamp of that audio.
S8. The control terminal extracts the image frames from the video file.
S9. The control terminal analyzes the image frames, finds the image frame corresponding to the game action, and obtains the timestamp of that image frame.
S10. The control terminal obtains the sound-picture delay corresponding to the game action from the audio timestamp and the image-frame timestamp.
It is understood that, after the sound-picture delay is obtained, the following steps can be performed to extend the functionality:
S11. The control terminal judges whether the sound-picture delay is within a preset error range, obtaining a judgment result.
S12. The control terminal stores the judgment result.
As the flow in fig. 4 shows, in the method for testing sound-picture delay provided by this embodiment, the control terminal controls the electronic device, simulates a user performing a game action on it, and records a video containing the game action. Once the recorded video is obtained, the image frames and audio in the video are analyzed automatically to obtain the sound-picture delay corresponding to the game action. Apart from selecting the test case, no manual intervention is needed, and because the delay is derived from the timestamp of the audio and the timestamp of the image frame, the adverse effect of manual intervention on its accuracy is reduced, improving both the accuracy and the objectivity of the result. Moreover, the flow in fig. 4 is highly automated and therefore highly efficient.
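Condensing S1-S10, an orchestration sketch (each helper is a placeholder for the corresponding step above, not a real API):

    def test_av_delay(device):
        folder = make_timestamped_folder()           # S1: folder for the recording
        start_screen_record(device)                  # S1
        simulate_game_action(device)                 # S2
        stop_screen_record(device)                   # S3: after the action completes
        video_path = export_video(device, folder)    # S4, S5
        sound_ts = find_action_audio(video_path)     # S6, S7
        click_ts = find_action_frame(video_path)     # S8, S9
        return sound_ts - click_ts                   # S10: sound-picture delay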
The steps of the above method for testing sound-picture delay are described in detail below with reference to the drawings. Fig. 5 shows the flow in which the control terminal creates a folder and controls the electronic device to record the screen, which may include the following steps:
S101. The control terminal creates a folder named after the current time; that is, the current time serves as the folder's name, distinguishing folders from one another.
S102. The control terminal obtains the interface control information of the screen-recording software.
In some implementations, the interface control information of the screen-recording software includes an identifier of the screen-recording software, used to start it, and an identifier of the start-recording control, used to start the recording. The control terminal and the electronic device can be connected via USB. After the electronic device is connected, the control terminal can first initialize the device under test; once that initialization completes, the control terminal obtains the interface control information of the screen-recording software from the electronic device over USB.
S103. The control terminal controls the electronic device to start screen recording according to the interface control information of the screen-recording software.
It can be understood that the control terminal sends command lines carrying the interface control information to the electronic device, controlling it to start the screen-recording software and begin recording through it. For example, the command line start activity XXX simulates the user entering the screen-recording software interface identified as XXX, and the command line click id/text XXX simulates the user clicking the start control identified as XXX to begin recording. For another example, the command line open_notification YYY simulates the user pulling down the notification menu, and click id/text YYY simulates clicking the start control of the screen-recording software identified as YYY.
In some implementations, the screen-recording software in the electronic device may be the device's own built-in screen-recording software; in this case the interface control information of that software needs to be sent to the control terminal. In other implementations, the screen-recording software in the electronic device is pushed by the control terminal and installed at its instruction. For example, the control terminal pushes the screen-recording software to the electronic device through an Android Debug Bridge (adb) channel and instructs the electronic device through a command line to install it. In this case, no extra interface control information for the screen-recording software needs to be configured in the control terminal.
Whether the electronic device records the screen through its own screen-recording software or through software pushed by the control terminal, the control terminal makes it start recording automatically through preset command lines, sparing the user from recording manually.
Fig. 6 shows the flow in which the control terminal controls the electronic device to simulate the game action and then end the screen recording, which may include the following steps:
S201. The control terminal obtains the interface control information of the game action.
S202. The control terminal controls the electronic device to simulate the game action according to that interface control information.
The interface control information of the game action includes an identifier of the game software, used to start the game software and enter the interface of the game action, and an identifier of the game-action start control, used to start the game action. For example, if the gun-shooting action of Honor of Kings is simulated, the identifier of the game software points to Honor of Kings, the game-action start control can be the gun-shooting control, and the electronic device is controlled to start shooting by simulating a click on that control.
It can be understood that the control terminal controls the electronic device to simulate the game action by sending command lines carrying the interface control information of the game action. For example, the command line start activity ZZZ simulates entering the game software interface identified as ZZZ, and click id/text ZZZ simulates clicking the start control identified as ZZZ to begin the game action.
S203. The control terminal controls the electronic device to end the screen recording according to the interface control information of the screen-recording software.
In some implementations, the control terminal ends the screen recording by sending command lines carrying the interface control information of the screen-recording software; for example, start activity XXX and click id/text XXX simulate entering the screen-recording software interface and clicking the end control to finish recording. The start control and the end control of the screen-recording software may be one and the same control.
The control terminal controls the electronic device to simulate the game action through preset command lines for the game action, so the electronic device can record a video containing the game action; the video includes the image frames of the simulated game action and the audio emitted while the device performs it, such as the at least two image frames shown in fig. 1 and the gunshot emitted by the electronic device. After the game action completes, the control terminal controls the electronic device to end the screen recording through the preset command lines of the screen-recording software, so that no user takes part in the whole sequence from starting the recording, through simulating the game action, to ending the recording, which raises the degree of automation. Once recording ends, the electronic device holds a video containing the game action, which can be stored in a folder such as the one created in step S1. In some implementations, the control terminal exports the video from the electronic device via the adb channel and stores it under the folder's path.
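As a sketch, the export step might use adb pull (the description names the adb channel; the remote path below is an assumption that depends on the screen-recording software):

    import os
    import subprocess

    def export_video(serial, remote_path, folder):
        # remote_path is assumed, e.g. where the recording app writes its output.
        local_path = os.path.join(folder, os.path.basename(remote_path))
        subprocess.run(["adb", "-s", serial, "pull", remote_path, local_path],
                       check=True)
        return local_path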
Fig. 7 shows the flow in which the control terminal obtains the sound-picture delay corresponding to the game action, which may include the following steps:
S301. Set a click flag (ClickFlag) to False and a sound flag (SoundFlag) to False in advance.
S302. Read the video file. The video file video_path is read from the pre-created report folder, i.e., the folder created in step S1; for example, the control terminal calls cap = cv2.VideoCapture(video_path).
S303. Read the total frame count of the video, the video frame rate, the video duration, and the test date.
For example, the control terminal calls cap.get(cv2.CAP_PROP_FRAME_COUNT) to read the total frame count frame_count, calls cap.get(cv2.CAP_PROP_FPS) to read the video frame rate fps_count, reads the video duration times_duration through VideoFileClip(video_path).duration, and reads the test date test_time through datetime. The total frame count, frame rate, duration, and test date can be stored in the folder containing the video file, for convenient review later.
S304. Traverse each image frame in the video file. The control terminal calls cap.read() to traverse the image frames.
S305. Name each image frame by its frame number and save it.
The image frames can be saved under the folder containing the video file. For example, the control terminal calls cur_img_path = os.path.join(os.path.dirname(video_path), "images", "{}.jpg".format(i)) to set the storage path of an image frame, then calls cv2.imwrite(cur_img_path, frame) to store the image frame under that path. The frame number i indicates that the image frame is the i-th frame of the video.
S306. If ClickFlag is False, the control terminal calculates the brightness difference between the i-th image frame and the (i-1)-th image frame, which are two consecutive frames.
In some implementations, for two consecutive image frames, the control terminal calculates the R-channel mean, G-channel mean, and B-channel mean of each frame. For example:
calling r_one, g_one, b_one = ImageStat.Stat(Image.open(cur_img_path)).mean computes the R-channel mean (r_one), G-channel mean (g_one), and B-channel mean (b_one) of the first of the two consecutive image frames;
calling r_two, g_two, b_two = ImageStat.Stat(Image.open(last_img_path)).mean computes the R-channel mean (r_two), G-channel mean (g_two), and B-channel mean (b_two) of the second. This function extracts the R, G, and B channel values of every pixel in the image frame and averages each channel over all pixels, yielding the frame's per-channel means.
The control terminal then obtains the brightness of each image frame from its R-, G-, and B-channel means. For example:
calling brightness_one = math.sqrt(0.241*(r_one**2) + 0.691*(g_one**2) + 0.068*(b_one**2)) computes the brightness of the first image frame; calling brightness_two = math.sqrt(0.241*(r_two**2) + 0.691*(g_two**2) + 0.068*(b_two**2)) computes the brightness of the second; then diff_image = brightness_one - brightness_two gives the brightness difference between the two frames.
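Assembled into runnable form, a sketch of the brightness computation (assuming the PIL and math imports shown; the weights follow the perceived-brightness formula above):

    import math
    from PIL import Image, ImageStat

    def frame_brightness(img_path):
        # Per-channel means over all pixels, then the weighted perceived brightness.
        r, g, b = ImageStat.Stat(Image.open(img_path).convert("RGB")).mean
        return math.sqrt(0.241 * r**2 + 0.691 * g**2 + 0.068 * b**2)

    # diff_image = frame_brightness(cur_img_path) - frame_brightness(last_img_path)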
S307. Judge whether the brightness difference is within the preset difference range; if so, execute step S308; otherwise, return to step S304 to continue traversing image frames and computing the brightness difference of each consecutive pair, so that the control terminal calculates brightness differences while traversing the image frames. In some examples, the control terminal may instead calculate the brightness differences after the traversal of the image frames is complete.
S308. Determine that the i-th image frame is the frame in which the game action first appears, obtain the timestamp of the image frame from its frame number, and set ClickFlag to True.
The frame in which the game action first appears contains the pixels with which the electronic device starts the game action. The control terminal can call Click_timestamps = ClickFrame * (1000 / fps_count) to obtain the timestamp of the image frame, where ClickFrame is the frame number of the image frame and 1000 is the preset time conversion unit converting seconds into milliseconds, i.e., the timestamp of the image frame is in milliseconds; for a different unit the conversion factor changes accordingly, e.g., 1000 × 60.
It can be understood that if the i-th frame is the first frame of the video, i.e., i is 1, then the (i-1)-th frame does not exist, so the brightness difference can be deemed not to be within the preset difference range. Besides using the brightness difference, the control terminal may determine the frame in which the game action first appears in other ways; for example, it may calculate the similarity between two consecutive image frames and, when the similarity is within a preset range, further recognize whether the game action is present in the i-th frame, determining it to be the first frame of the game action if so. Of course, such recognition of whether the game action is present may also be introduced into the brightness-difference approach.
In this embodiment, the control terminal determines the frame in which the game action first appears by comparing consecutive image frames; because every comparison uses two consecutive frames, no frame is skipped, so the control terminal can identify that frame accurately and promptly.
S309. Extract the audio file from the video and save it. The control terminal calls my_clip = mp.VideoFileClip(video_path) to open the video at video_path, and calls my_clip.audio.write_audiofile('{}.wav'.format(video_path)) to save the audio file, which can be saved to the folder containing the video.
S310. Read the audio file; for example, the control terminal calls wf = wave.open("{}.wav".format(video_path), "rb").
S311. Read the total number of audio samples, the audio frame rate, the audio duration, and the audio maximum decibel, i.e., the maximum decibel value of the audio file.
The control terminal may call nframes = wf.getnframes() to read the total number of audio samples nframes; call framerate = wf.getframerate() to read the audio frame rate; compute nframes / framerate to obtain the audio duration wave_times; and call video_sound = AudioSegment.from_wav("{}.wav".format(video_path)) followed by sound_db_avg = video_sound.max_dBFS to obtain the audio maximum decibel sound_db_avg. These values can be stored in the folder containing the video file, for convenient review later.
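In runnable form, a sketch of S310-S311 using the wave and pydub calls named above (the file path is an assumption):

    import wave
    from pydub import AudioSegment

    video_path = "record.mp4"                    # assumed; set in S302
    wav_path = "{}.wav".format(video_path)
    with wave.open(wav_path, "rb") as wf:
        nframes = wf.getnframes()                # total number of audio samples
        framerate = wf.getframerate()            # audio frame rate
    wave_times = nframes / framerate             # audio duration in seconds
    video_sound = AudioSegment.from_wav(wav_path)
    sound_db_avg = video_sound.max_dBFS          # maximum decibel of the file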
S312. Traverse the sample numbers to extract audio from the audio file. The control terminal calls video_sound[i:i+n] to traverse the sample numbers, where n is the sampling interval between two adjacent sample numbers; for example, n can equal 1 ms, which can be the minimum sampling interval set by the screen-recording software, so the audio corresponding to the game action can be found promptly.
S313. Name the audio according to its sample number and save it.
In this embodiment, the audio may be saved to the folder containing the video. The control terminal may call cur_sound_path = os.path.join(os.path.dirname(video_path), "voices", "{}.wav".format(i)) to set the storage path of the audio, then call cur_sound.export(cur_sound_path, format="wav") to save it.
S314. If SoundFlag is False, determine the maximum decibel of the current audio. That is, each time the control terminal traverses a sample number, it determines the maximum decibel of the audio sampled at that number.
For example, the control terminal calls sound_index_one = AudioSegment.from_wav(cur_sound_path) and sound_db = sound_index_one.max_dBFS to determine the maximum decibel sound_db of the currently sampled audio.
S315. Judge whether the maximum decibel of the current audio is within the preset decibel range; if so, execute step S316; otherwise, return to step S312 to keep traversing sample numbers, acquiring audio, and determining its maximum decibel, so that the control terminal determines maximum decibels while traversing the sample numbers. In some examples, the control terminal may instead determine the maximum decibel of each audio segment after the traversal of the sample numbers is complete. In this embodiment, a sample number may be a sampling time. Fig. 8 shows the decibels of the audio collected at each sampling time, with sampling time on the abscissa and decibels on the ordinate.
S316. Determine that the current audio is the audio corresponding to the game action, obtain the timestamp of the audio from the sample number of the current audio, and set SoundFlag to True.
The current audio being the audio corresponding to the game action means that it is the sound emitted when the electronic device starts the game action, the first sound emitted while the game action is simulated; that is, the current audio includes the sound emitted when the electronic device starts the game action. The control terminal may then take the sample number of the current audio as the timestamp of the audio, i.e., Sound_timestamps = SoundFrame, where SoundFrame is the sample number.
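A sketch of the S312-S316 traversal, assuming video_sound from the extraction sketch above and an illustrative preset decibel range:

    db_low, db_high = -20.0, 0.0     # illustrative preset decibel range (dBFS)
    n = 1                            # sampling interval; pydub slices are in ms
    sound_frame = None
    for i in range(0, len(video_sound), n):      # S312: traverse sample numbers
        cur_sound = video_sound[i:i + n]
        sound_db = cur_sound.max_dBFS            # S314: max decibel of this slice
        if db_low <= sound_db <= db_high:        # S315: within preset range?
            sound_frame = i                      # S316: sample number = timestamp
            break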
In some examples, the control terminal may compute the image-frame timestamp and the audio timestamp in parallel, or compute the audio timestamp first and the image-frame timestamp afterwards; the order is not limited.
A point to explain here: if the maximum decibels of multiple audio segments fall within the preset decibel range, the sampling interval between each two adjacent segments is calculated. If an interval is smaller than the preset value, indicating an error, the audio corresponding to the game action is determined anew and SoundFlag is set to False; ClickFlag is likewise set to False, and the image frame corresponding to the game action is determined anew.
For example, suppose 1000 is determined as the sampling number of the first audio corresponding to the game action and 1205 as that of the second, and the difference between 1000 and 1205 is smaller than the preset value. This indicates that both 1000 and 1205 are wrong (theoretically, once the audio corresponding to the game action has appeared, it can only reappear after a larger sampling interval). SoundFlag and ClickFlag are set to False, the audio corresponding to the game action is searched for after sampling number 1205, and the image frame corresponding to the game action is searched for after frame number 1205.
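This guard might be sketched as follows; first_number, second_number, and preset_value are hypothetical names for the two suspected sampling numbers and the preset value:

    def check_suspected_audios(first_number, second_number, preset_value=500):
        # If two suspected target audios are closer than the preset value, both are
        # treated as wrong: SoundFlag and ClickFlag are reset to False and the
        # search restarts after the second sampling number
        if second_number - first_number < preset_value:
            return False, second_number  # flags -> False, restart point
        return True, None  # detections accepted, no restart needed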
S317, calculate the sound-picture time delay.
Sound-picture time delay Duration_timestamps = Sound_timestamps - Click_timestamps.
In the flow shown in Fig. 7, each image frame and each audio in the video are traversed to search for the image frame and the audio corresponding to the game action. Because these are, respectively, the first picture generated and the first sound emitted when the electronic device executes the game action, the time delay between picture and sound (i.e., the sound-picture time delay) can be obtained from the timestamp of that image frame and the timestamp of that audio. No manual participation is needed, so the result is more accurate and objective, and the efficiency is higher.
Further, the flow shown in Fig. 7 may further include the following steps:
S318, judge whether the sound-picture time delay is within a preset error range; if so, execute step S319; if not, execute step S320.
S319, determine that the test identifier (check_flag) is True. check_flag being True indicates that the sound-picture time delay is within the preset error range and the sound-picture synchronization of the electronic device is normal.
S320, determine that check_flag is False. check_flag being False indicates that the sound-picture time delay is not within the preset error range and the sound-picture synchronization of the electronic device is abnormal.
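Steps S317 to S320 might be sketched together as follows; preset_error_ms, standing in for the preset error range, is a hypothetical parameter:

    def sound_picture_delay(sound_timestamps, click_timestamps, preset_error_ms=100):
        # S317: sound-picture time delay in milliseconds
        duration_timestamps = sound_timestamps - click_timestamps
        # S318-S320: check_flag is True when the delay is within the preset
        # error range (sound-picture synchronization normal), False otherwise
        check_flag = abs(duration_timestamps) <= preset_error_ms
        return duration_timestamps, check_flag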
In some examples, the control terminal may use openpyxl to write [frame_count, fps_count, times_duration, click_time, sound_time, duration_time, check_flag, test_time] into a spreadsheet (Excel). The control terminal may be configured with a storage template in advance and store the results according to that template.
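Saving the results with openpyxl might look as follows; the file name results.xlsx and the header row used as the storage template are assumptions:

    from openpyxl import Workbook

    def save_results(rows, xlsx_path="results.xlsx"):
        # One header row (the assumed storage template), then one row per test run
        wb = Workbook()
        ws = wb.active
        ws.append(["frame_count", "fps_count", "times_duration", "click_time",
                   "sound_time", "duration_time", "check_flag", "test_time"])
        for row in rows:
            ws.append(row)
        wb.save(xlsx_path)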
The above embodiment takes the control terminal connected to the electronic device through USB as an example of obtaining the sound-picture time delay of the electronic device simulating the game action. It can be understood that, in another implementation, an application for testing the sound-picture time delay may be installed and run on the electronic device itself. The application transmits instructions to the corresponding modules of the electronic device to control it to execute the flow corresponding to the game action, for example the flow shown in Fig. 4, so as to obtain the video, analyze it automatically, and obtain the sound-picture time delay.
In some implementations, the above processes of simulating the game action and calculating the sound-picture time delay may be packaged as interfaces: for example, game_control controls the electronic device to simulate the game action, and game_time_delay calculates the sound-picture time delay of the game action. For other specific actions, corresponding interfaces may also be packaged to control the electronic device to simulate those actions and calculate their sound-picture time delays; for example, video_app_control controls the electronic device to simulate an audio playing action, and video_app_time_delay calculates the sound-picture time delay of the audio playing action. The sound-picture time delays of different specific actions may differ, which is not described one by one here.
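One possible shape for such packaging is sketched below; the class, its injected callables, and the method bodies are hypothetical and only illustrate exposing the control step and the calculation step as two interfaces:

    class DelayTestInterface:
        # controller: runs the simulated action on the device and returns the video path
        # analyzer: computes the sound-picture time delay from the video
        # Both callables are assumptions; a game action and a video-app playing action
        # would each get their own pair (game_control / game_time_delay,
        # video_app_control / video_app_time_delay).
        def __init__(self, controller, analyzer):
            self._controller = controller
            self._analyzer = analyzer

        def run(self):
            video_path = self._controller()
            return self._analyzer(video_path)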
In addition, the present application also provides an electronic device, comprising one or more processors and a memory. The memory is used for storing computer program code, and the computer program code comprises computer instructions; when the one or more processors execute the computer instructions, the electronic device executes the above method for testing sound-picture time delay.
The application also provides a computer storage medium for storing a computer program; when executed, the computer program is specifically used for implementing the above method for testing sound-picture time delay.
The present application also provides a computer program product containing instructions. When the computer program product runs on a computer or a processor, the computer or the processor is caused to execute the above method for testing sound-picture time delay.

Claims (10)

1. A method for testing sound-picture time delay, applied to a first electronic device, wherein the first electronic device is capable of communicating with a second electronic device, and the second electronic device completes a preset action based on a preset interface, the method comprising:
controlling the second electronic equipment to start screen recording;
triggering the second electronic device to enter the preset interface, and triggering the second electronic device to complete the preset action on the preset interface;
after the second electronic equipment finishes the preset action, controlling the second electronic equipment to stop screen recording;
identifying a target image frame from the screen-recorded video, wherein the target image frame comprises pixels representing that the second electronic device starts the preset action;
identifying a target audio from the video, wherein the target audio comprises a sound emitted when the second electronic device starts the preset action;
and acquiring the sound-picture time delay of the preset action based on the time stamp of the target image frame and the time stamp of the target audio.
2. The method of claim 1, wherein the identifying a target image frame from the screen-recorded video comprises:
extracting a first image frame and a second image frame from the video, wherein the first image frame is a previous frame image of the second image frame;
extracting image features of the first image frame and image features of the second image frame;
determining the second image frame as the target image frame in response to the image features of the first image frame and the image features of the second image frame satisfying a first preset condition;
in response to the image features of the first image frame and the image features of the second image frame not satisfying the first preset condition, taking the second image frame as the first image frame, extracting a third image frame from the video, and taking the third image frame as the second image frame, wherein the third image frame is a next frame image of the second image frame;
wherein the first preset condition comprises that the difference between the brightness of the first image frame and the brightness of the second image frame is within a preset difference range; or, the first preset condition comprises that the first image frame is similar to the second image frame, and the second image frame comprises a pixel representing that the second electronic device starts the preset action.
3. The method of claim 1, wherein the identifying target audio from the video comprises:
extracting audio from the video and extracting audio features of the audio;
and determining the audio as the target audio in response to an audio feature of the audio satisfying a second preset condition, wherein the second preset condition comprises that the maximum decibel of the audio is within a preset decibel range.
4. The method according to any one of claims 1 to 3, further comprising: obtaining a time stamp of the target image frame based on the frame number of the target image frame; obtaining a time stamp of the target audio based on the sampling number of the target audio;
the acquiring the sound-picture time delay of the preset action based on the timestamp of the target image frame and the timestamp of the target audio comprises: taking the difference between the timestamp of the target audio and the timestamp of the target image frame as the sound-picture time delay of the preset action.
5. The method of claim 4, wherein the deriving the timestamp of the target image frame based on the number of frames of the target image frame comprises: obtaining a timestamp of the target image frame based on the frame number of the target image frame, the frame rate of the video and a preset time conversion unit;
the obtaining the timestamp of the target audio based on the sampling number of the target audio comprises: determining the sampling number of the target audio as the timestamp of the target audio, the sampling number of the target audio pointing to the sampling time of the target audio.
6. The method according to any one of claims 1 to 3, further comprising: identifying a first suspected target audio and a second suspected target audio from the video, wherein the first suspected target audio and the second suspected target audio are audio of which the maximum decibel identified from the video is within a preset decibel range, and the second suspected target audio is behind the first suspected target audio;
obtaining a difference value between the sampling number of the first suspected target audio and the sampling number of the second suspected target audio;
the identifying a target image frame from the screen-recorded video and a target audio from the video comprises: in response to the difference being smaller than a preset value, identifying the target image frame from image frames whose frame numbers are after the sampling number of the second suspected target audio, and identifying the target audio from audio whose sampling numbers are after the sampling number of the second suspected target audio.
7. The method according to any one of claims 1 to 3, wherein the controlling the second electronic device to start screen recording comprises:
sending a first command to the second electronic device, wherein the first command is used for indicating the second electronic device to enter a screen recording interface of a screen recording application;
sending a second command to the second electronic device, wherein the second command is used for indicating the second electronic device to trigger a starting control in the screen recording interface;
after the second electronic device completes the preset action, controlling the second electronic device to stop screen recording comprises: after the second electronic device completes the preset action, sending the first command to the second electronic device; and sending a third command to the second electronic equipment, wherein the third command is used for indicating the second electronic equipment to trigger a finishing control in the screen recording interface.
8. The method according to any one of claims 1 to 3, wherein the triggering the second electronic device to enter the preset interface and the triggering the second electronic device to complete the preset action on the preset interface comprises:
sending a fourth command to the second electronic device, where the fourth command is used to instruct the second electronic device to enter a preset interface, and the preset interface includes an identifier pointing to the preset action;
and sending a fifth command to the second electronic device, wherein the fifth command is used for indicating to trigger the identifier so as to enable the second electronic device to complete the preset action on the preset interface.
9. An electronic device, comprising:
one or more processors and memory;
the memory is configured to store computer program code, the computer program code comprising computer instructions which, when executed by the one or more processors, cause the electronic device to perform the method for testing sound-picture time delay according to any one of claims 1 to 8.
10. A computer storage medium storing a computer program, wherein the computer program, when executed, is configured to implement the method for testing sound-picture time delay according to any one of claims 1 to 8.
CN202211412028.7A 2022-11-11 2022-11-11 Sound-picture time delay testing method and device Active CN115471780B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211412028.7A CN115471780B (en) 2022-11-11 2022-11-11 Sound-picture time delay testing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211412028.7A CN115471780B (en) 2022-11-11 2022-11-11 Sound-picture time delay testing method and device

Publications (2)

Publication Number Publication Date
CN115471780A (en) 2022-12-13
CN115471780B CN115471780B (en) 2023-06-06

Family

ID=84338195

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211412028.7A Active CN115471780B (en) 2022-11-11 2022-11-11 Sound-picture time delay testing method and device

Country Status (1)

Country Link
CN (1) CN115471780B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1998002827A1 (en) * 1996-07-12 1998-01-22 Lava, Inc. Digital video system having a data base of coded data for digital audio and video information
CN111669636A (en) * 2020-06-19 2020-09-15 海信视像科技股份有限公司 Audio-video synchronous video recording method and display equipment
US20200401938A1 (en) * 2019-05-29 2020-12-24 The Board Of Trustees Of The Leland Stanford Junior University Machine learning based generation of ontology for structural and functional mapping
CN112423075A (en) * 2020-11-11 2021-02-26 广州华多网络科技有限公司 Audio and video timestamp processing method and device, electronic equipment and storage medium
US20220013131A1 (en) * 2020-07-10 2022-01-13 Sagemcom Broadband Sas Method, a system, and a program for playing back audio/video signals with automatic adjustment of latency
CN114071134A (en) * 2022-01-13 2022-02-18 荣耀终端有限公司 Sound-picture synchronization detection method, sound-picture synchronization generation method, electronic equipment and storage medium
WO2022111712A1 (en) * 2020-11-30 2022-06-02 华为技术有限公司 Audio and video synchronization method and device
CN114745537A (en) * 2022-03-09 2022-07-12 深圳Tcl新技术有限公司 Sound and picture delay testing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN115471780B (en) 2023-06-06

Similar Documents

Publication Publication Date Title
US9754621B2 (en) Appending information to an audio recording
CN113163272B (en) Video editing method, computer device and storage medium
CN109359056B (en) Application program testing method and device
CN110556127B (en) Method, device, equipment and medium for detecting voice recognition result
KR20200097444A (en) Electronic device for providing graphic data based on voice and operating method thereof
US20210392276A1 (en) Method and apparatus for triggering special image effects and hardware device
CN108763475B (en) Recording method, recording device and terminal equipment
CN111370025A (en) Audio recognition method and device and computer storage medium
CN111614990B (en) Method and device for acquiring loading duration and electronic equipment
US20220215839A1 (en) Method for determining voice response speed, related device and computer program product
CN113936699A (en) Audio processing method, device, equipment and storage medium
CN111416996A (en) Multimedia file detection method, multimedia file playing device, multimedia file equipment and storage medium
CN113726600B (en) Transmission delay determining method, device, terminal and storage medium
CN111428079A (en) Text content processing method and device, computer equipment and storage medium
CN107886975B (en) Audio processing method and device, storage medium and electronic equipment
CN113220590A (en) Automatic testing method, device, equipment and medium for voice interaction application
CN115471780B (en) Sound-picture time delay testing method and device
CN112614507A (en) Method and apparatus for detecting noise
CN113936697A (en) Voice processing method and device for voice processing
CN108763521B (en) Method and device for storing lyric phonetic notation
CN111161710A (en) Simultaneous interpretation method and device, electronic equipment and storage medium
EP4276827A1 (en) Speech similarity determination method, device and program product
CN105373585B (en) Song collection method and apparatus
CN113497970B (en) Video processing method and device, electronic equipment and storage medium
CN112596846A (en) Method and device for determining interface display content, terminal equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant