CN114666636A - Sound and picture synchronous detection method and computer readable storage medium

Info

Publication number: CN114666636A
Application number: CN202210199631.5A
Authority: CN
Inventors: 李瑞佳; 周冰心; 周妍; 曹慧芳
Original assignee: Alibaba China Co Ltd
Current assignee: Alibaba China Co Ltd
Priority date: 2022-03-01
Filing date: 2022-03-01
Publication date: 2022-06-24

Abstract

The invention discloses a sound and picture synchronous detection method and a computer readable storage medium. The method comprises the following steps: acquiring a target video, wherein the target video comprises a target image frame and target audio data at a target position of a playing time axis, and the target image frame and the target audio data carry mark information; acquiring, according to the mark information, a first time at which the target image frame is played and a second time at which the target audio data is played when the target video is played; and determining a sound and picture synchronization result for the played target video according to the first time and the second time. The invention solves the technical problem that sound and picture synchronous detection in the related art is difficult to apply to varied scenes and is subject to various limitations.

Description

Sound and picture synchronous detection method and computer readable storage medium

Technical Field

The invention relates to the field of computers, in particular to a sound and picture synchronous detection method and a computer readable storage medium.

Background

Sound and picture desynchronization means that the sound and the displayed picture are out of sync when a user watches a video. In audio and video development, sound and picture desynchronization was initially tested by having testers judge it by eye. Because human perception carries an error of roughly -100 ms to 25 ms, this method is subjective, so an objective evaluation scheme is needed. The objective evaluation schemes in the related art mainly analyze feature values of the video and the audio, and they are heavily limited.

In view of the above problems, no effective solution has been proposed.

Disclosure of Invention

The embodiment of the invention provides a sound and picture synchronous detection method and a computer readable storage medium, which at least solve the technical problems that the sound and picture synchronous detection in the related technology is difficult to be suitable for various scenes and has various limitations in detection.

According to an aspect of the embodiments of the present invention, there is provided a method for detecting synchronization between sound and picture, including: acquiring a target video, wherein the target video comprises a target image frame and target audio data at a target position of a playing time axis, and the target image frame and the target audio data carry mark information; according to the mark information, acquiring first time for playing the target image frame and second time for playing the target audio data when the target video is played; and determining a sound and picture synchronization result when the target video is played according to the first time and the second time.

Optionally, determining the sound and picture synchronization result when the target video is played according to the first time and the second time includes: determining a time difference between the first time and the second time; and, when the time difference is greater than a first preset threshold or less than a second preset threshold, determining that the sound and picture synchronization result of the target video is that the sound and picture are not synchronized.

Optionally, acquiring, according to the mark information, the first time at which the target image frame is played when the target video is played includes: identifying a plurality of image frames of the played target video, and determining the image frame among them that displays the mark information as the target image frame; and determining the play time of the target image frame as the first time.

Optionally, acquiring, according to the mark information, the second time at which the target audio data is played when the target video is played includes: identifying the audio data of the played target video, and determining the part of the audio data in which the mark information is announced as the target audio data; and determining the play time of the target audio data as the second time.

Optionally, after determining the play time of the target audio data as the second time, the method further includes: determining the duration for which the target audio data is played; selecting a target volume value from a plurality of volume values corresponding to the duration; and calibrating the second time according to the target volume value.

Optionally, the method further comprises: adjusting the image frames and the audio data of the target video according to the time difference between the first time and the second time.

Optionally, after determining that the sound and picture synchronization result of the target video is that the sound and picture are not synchronized, the method further includes: displaying prompt information, wherein the prompt information is used for prompting that the sound and picture of the target video are not synchronized.

According to another aspect of the embodiments of the present invention, there is provided a method for detecting synchronization between sound and picture, including: displaying a target video on a display interface, wherein the target video comprises a target image frame and target audio data at a target position of a playing time axis, and the target image frame and the target audio data carry mark information; receiving a video playing instruction on the display interface; and, in response to the video playing instruction, playing the target video on the display interface, and displaying prompt information on the display interface when the sound and picture synchronization result of the target video is that the sound and picture are not synchronized, wherein the prompt information is used for prompting that the sound and picture of the target video are not synchronized, the sound and picture synchronization result is determined according to a first time and a second time, the first time is the time, acquired according to the mark information, at which the target image frame is played when the target video is played, and the second time is the time, acquired according to the mark information, at which the target audio data is played.

According to an aspect of the embodiments of the present invention, there is provided a sound-picture synchronization detecting apparatus, including: the first acquisition module is used for acquiring a target video, wherein the target video comprises a target image frame and target audio data at a target position of a playing time axis, and the target image frame and the target audio data carry mark information; the second acquisition module is used for acquiring first time for playing the target image frame and second time for playing the target audio data when the target video is played according to the mark information; and the determining module is used for determining the sound-picture synchronization result when the target video is played according to the first time and the second time.

According to an aspect of the embodiments of the present invention, there is provided a sound-picture synchronization detecting apparatus, including: the display module is used for displaying a target video on a display interface, wherein the target video comprises a target image frame and target audio data at a target position of a playing time axis, and the target image frame and the target audio data carry mark information; the receiving module is used for receiving a video playing instruction on a display interface; and the playing module is used for responding to a video playing instruction, playing the target video on the display interface, and displaying prompt information on the display interface under the condition that the sound and picture synchronization result of the target video is that the sound and picture are not synchronized, wherein the prompt information is used for prompting that the sound and picture of the target video are not synchronized, the sound and picture synchronization result is determined according to first time and second time, the first time is the time for playing the target image frame when the target video is played according to the mark information, and the second time is the time for playing the target audio data according to the mark information.

According to an aspect of the embodiments of the present invention, there is provided a computer-readable storage medium, where the computer-readable storage medium includes a stored program, and when the program runs, the apparatus where the computer-readable storage medium is located is controlled to execute any one of the sound-picture synchronization detection methods described above.

According to an aspect of an embodiment of the present invention, there is provided a computer apparatus including: a memory and a processor, the memory storing a computer program; and a processor for executing the computer program stored in the memory, wherein the computer program causes the processor to execute any one of the sound-picture synchronization detection methods.

In the embodiments of the invention, a target video is acquired whose target position on the playing time axis contains a target image frame and target audio data that carry mark information, where the mark information marks the times at which the target image frame and the target audio data appear. The first time at which the target image frame is played and the second time at which the target audio data is played are thereby obtained, and the sound and picture synchronization result for the played target video is determined according to the first time and the second time. Because the first time and the second time are determined from the play times of the target image frame and the target audio data, the synchronization detection result can be determined effectively. Because the target image frame and the target audio data are marked with mark information, the detection result can be obtained simply by detecting the mark information, free of various condition limitations. This solves the technical problem that sound and picture synchronous detection in the related art is difficult to apply to varied scenes and is subject to various limitations.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention and do not constitute a limitation of the invention. In the drawings:

FIG. 1 is a block diagram showing a hardware structure of a computer terminal for implementing a sound-picture synchronization detection method;

FIG. 2 is a flowchart of a first method for detecting synchronization of sound and picture according to embodiment 1 of the present invention;

FIG. 3 is a flowchart of a second method for detecting synchronization between sound and picture according to embodiment 1 of the present invention;

FIG. 4 is a flow chart of a method for detecting synchronization of sound and picture provided in accordance with an alternative embodiment of the present invention;

FIG. 5 is a block diagram of a first sound-picture synchronization detecting apparatus provided in embodiment 2 of the present invention;

FIG. 6 is a block diagram of a second sound-picture synchronization detection apparatus provided in embodiment 3 of the present invention;

FIG. 7 is a block diagram of a computer terminal according to an embodiment of the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Example 1

According to an embodiment of the present invention, an embodiment of a sound and picture synchronous detection method is provided. It should be noted that the steps illustrated in the flowcharts of the accompanying drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps may be executed in an order different from the one shown or described here.

The method provided by embodiment 1 of the present application can be executed in a mobile terminal, a computer terminal, or a similar computing device. FIG. 1 shows a hardware structure block diagram of a computer terminal (or mobile device) for implementing the sound-picture synchronization detection method. As shown in FIG. 1, the computer terminal 10 (or mobile device) may include one or more processors (shown as 102a, 102b, ..., 102n; the processors may include, but are not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission device for communication functions. In addition, the computer terminal may also include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the bus), a network interface, a power source, and/or a camera. It will be understood by those skilled in the art that the structure shown in FIG. 1 is only illustrative and does not limit the structure of the electronic device. For example, the computer terminal 10 may include more or fewer components than shown in FIG. 1, or have a different configuration from that shown in FIG. 1.

It should be noted that the one or more processors and/or other data processing circuitry described above may be referred to generally herein as "data processing circuitry". The data processing circuitry may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Further, the data processing circuit may be a single stand-alone processing module, or incorporated in whole or in part into any of the other elements in the computer terminal 10 (or mobile device). As referred to in the embodiments of the application, the data processing circuit acts as a processor control (e.g. selection of a variable resistance termination path connected to the interface).

The memory 104 may be used to store software programs and modules of application software, such as program instructions/data storage devices corresponding to the sound-picture synchronization detection method in the embodiment of the present invention, and the processor executes various functional applications and data processing by running the software programs and modules stored in the memory 104, that is, implements the sound-picture synchronization detection method of the application program. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission device is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission device includes a Network Interface Controller (NIC) that can be connected to other network devices through a base station to communicate with the internet. In one example, the transmission device may be a Radio Frequency (RF) module, which is used for communicating with the internet wirelessly.

The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computer terminal 10 (or mobile device).

Under the above operating environment, the present application provides a sound-picture synchronization detection method as shown in FIG. 2. FIG. 2 is a flowchart of a first sound-picture synchronization detection method according to embodiment 1 of the present invention. As shown in FIG. 2, the method includes the following steps:

Step S202, acquiring a target video, wherein the target video comprises a target image frame and target audio data at a target position of a playing time axis, and the target image frame and the target audio data carry mark information;

Step S204, acquiring, according to the mark information, a first time at which the target image frame is played and a second time at which the target audio data is played when the target video is played;

Step S206, determining the sound and picture synchronization result for the played target video according to the first time and the second time.

Through the above steps, a target video is acquired whose target position on the playing time axis contains the target image frame and the target audio data that carry the mark information, where the mark information marks the times at which the target image frame and the target audio data appear. The first time at which the target image frame is played and the second time at which the target audio data is played are thereby obtained, and the sound and picture synchronization result for the played target video is determined according to the first time and the second time. Because the first time and the second time are determined from the play times of the target image frame and the target audio data, the synchronization detection result can be determined effectively. Because the target image frame and the target audio data are marked with the mark information, the detection result can be obtained simply by detecting the mark information, free of various condition limitations. This solves the technical problem that sound and picture synchronous detection in the related art is difficult to apply to varied scenes and is subject to various limitations.

As an alternative embodiment, a target video is acquired. The target video may be a video used for sound and picture synchronous detection, and it includes, at a target position of the playing time axis, a target image frame and target audio data that carry mark information. That is, the image frame and the audio data played at a preset time point of the target video carry mark information. The mark information marks the target image frame and the target audio data, which should appear simultaneously, so that the times at which they are actually played can be obtained and compared. Marking the target video with mark information allows the synchronization detection result to be obtained later, which makes detection faster and more convenient.

Alternatively, the mark information may be presented in various forms, such as words, Arabic numerals, capital letters, or lowercase letters, or in a combination of forms. The number of target positions is not limited; when there are a plurality of target positions, it is sufficient that the mark information shows the continuity of the marked target positions. Because the mark information can take various forms, this optional embodiment is widely applicable: mark information that better suits the requirements or preferences at hand can be chosen according to the actual situation, which improves the experience during detection.

Optionally, there may be a single target position or a plurality of target positions. Using a plurality of target positions accounts for frame loss or audio stuttering under a weak network, where the target image frames or target audio data at some target positions cannot be obtained, and so avoids inaccurate test results. With a plurality of positions, the target positions may be set in various ways. For example, when the mark information is an Arabic numeral, sequence numbers may be added at a predetermined time interval: for image frames, an image frame showing the sequence number appears at every predetermined interval (for example, every 2 s), numbered 1, 2, 3, and so on; for audio data, a Mandarin voice announces the count at every predetermined interval (for example, every 2 s), as 1, 2, 3, and so on. As another example, because the target video may be long, sequence numbers may be added at increasing time intervals: for image frames, an image frame showing the sequence number appears after each successively longer interval (for example, with an interval increment of 2 s, at 2 s, 6 s, 12 s, ...), numbered 1, 2, 3, and so on; for audio data, a Mandarin voice announces the count at the same increasing intervals (for example, at 2 s, 6 s, 12 s, ...), as 1, 2, 3, and so on. The arrangement is not limited to these examples, as long as the times at which the target image frames and the target audio data appear follow a regular pattern. Setting the target positions in this way allows a choice suited to different network conditions or other practical conditions and requirements, making detection more reasonable and efficient.
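The two placement schemes described above can be illustrated with a minimal sketch. The function names and the (number, position) return shape are assumptions made for illustration and do not come from the application; only the example positions (every 2 s, and 2 s, 6 s, 12 s) are taken from the text.

```python
from typing import List, Tuple


def fixed_interval_schedule(count: int, interval_s: float) -> List[Tuple[int, float]]:
    """Return (sequence number, timeline position in seconds) at a fixed spacing."""
    return [(n, n * interval_s) for n in range(1, count + 1)]


def increasing_interval_schedule(count: int, step_s: float) -> List[Tuple[int, float]]:
    """Return (sequence number, position) where the n-th gap is n * step_s.

    Hypothetical reading of the "increasing interval" example: with a 2 s
    step the gaps are 2 s, 4 s, 6 s, giving positions 2 s, 6 s, 12 s.
    """
    schedule = []
    position = 0.0
    for n in range(1, count + 1):
        position += n * step_s  # gaps grow: step, 2 * step, 3 * step, ...
        schedule.append((n, position))
    return schedule


if __name__ == "__main__":
    print(fixed_interval_schedule(3, 2.0))
    print(increasing_interval_schedule(3, 2.0))
```

The same schedule would be used for both the marked image frames and the announced numbers, so that each sequence number identifies one target position on the playing time axis.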

As an alternative embodiment, acquiring, according to the mark information, the first time at which the target image frame is played when the target video is played includes: identifying a plurality of image frames of the played target video, and determining the image frame among them that displays the mark information as the target image frame; and determining the play time of the target image frame as the first time. The mark information displayed in the target image frame and the first time at which the target image frame is displayed are recorded for subsequent processing. When the mark information is a number, that is, an Arabic numeral, the image frame among the plurality of image frames that contains the number is determined to be the target image frame. When there are a plurality of target positions and the mark information is a sequentially increasing sequence number, the image frames that contain sequence numbers are determined to be the plurality of target image frames, namely the image frame containing the number 1, the image frame containing the number 2, and so on, and each item of mark information is recorded together with its corresponding first time. The plurality of first times recorded for the target image frames are then compared with the plurality of second times of the corresponding target audio data: the time difference for each pair is calculated and used for detection.
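As a rough sketch of the recording step, assume that some recognizer (not specified in the application, and not shown here) has already labeled each captured frame with the sequence number it displays, or with `None` when no mark is visible. The function name and input shape below are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple


def record_first_times(
    frames: Iterable[Tuple[float, Optional[int]]]
) -> Dict[int, float]:
    """Map each sequence number to the play time of the first frame showing it.

    `frames` yields (play_time_s, recognized_number) pairs in playback order;
    frames without mark information carry None and are skipped.
    """
    first_times: Dict[int, float] = {}
    for play_time, number in frames:
        if number is not None and number not in first_times:
            first_times[number] = play_time
    return first_times


if __name__ == "__main__":
    captured = [(1.96, None), (2.00, 1), (2.04, 1), (4.00, 2)]
    print(record_first_times(captured))
```

Keeping only the earliest time per number is a design choice for the case where a mark stays visible across several captured frames; the application itself only says that the play time of the target image frame is recorded.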

As an alternative embodiment, acquiring, according to the mark information, the second time at which the target audio data is played when the target video is played includes: identifying the audio data of the played target video, and determining the part of the audio data in which the mark information is announced as the target audio data; and determining the play time of the target audio data as the second time. The mark information announced in the target audio data and the second time at which the target audio data is played are recorded for subsequent processing. When the mark information is a sequence number, the part of the audio data that announces the sequence number is determined to be the target audio data. When there are a plurality of target positions and the mark information is a sequentially increasing sequence number, the parts of the audio data that announce sequence numbers are determined to be the plurality of target audio data, namely the part announcing the number 1, the part announcing the number 2, and so on, and each item of mark information is recorded together with its corresponding second time. The plurality of second times recorded for the target audio data are then compared with the plurality of first times of the corresponding target image frames: the time difference for each pair is calculated and used for detection.

It should be noted that, when the mark information is announced by voice, the announcement may last a long time; for example, when the number 1 is announced, the sound may be drawn out, so the time at which the target audio data is played cannot be determined precisely. In this case, the duration for which the target audio data is played may be determined, a target volume value may be selected from the plurality of volume values corresponding to that duration, and the second time may be calibrated according to the target volume value; the volume values may be acquired and marked in the form of volume pulses. The target volume value may be the volume value at the loudest point, or another volume value defined as needed. The time at which the target audio data appears can thus be calibrated according to the duration and the target volume value, so that the second time is determined more accurately.
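One possible reading of this calibration, taking the loudest point as the target volume value, is sketched below. The function name and the (time, volume) sample format are assumptions for illustration; the application leaves the choice of target volume value open.

```python
from typing import Sequence, Tuple


def calibrate_second_time(samples: Sequence[Tuple[float, float]]) -> float:
    """Return the play time of the loudest sample within one announcement.

    `samples` holds (play_time_s, volume) pairs covering the duration of one
    announced sequence number. Using the peak is only one option; any other
    rule for picking the target volume value could be substituted.
    """
    if not samples:
        raise ValueError("announcement has no volume samples")
    # max() returns the earliest sample when several share the peak volume.
    peak_time, _peak_volume = max(samples, key=lambda sample: sample[1])
    return peak_time


if __name__ == "__main__":
    announcement = [(2.00, 0.1), (2.05, 0.8), (2.10, 0.4)]
    print(calibrate_second_time(announcement))
```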

As an alternative embodiment, the sound and picture synchronization result when the target video is played is determined according to the first time and the second time. The result can be obtained by determining the time difference between the first time and the second time and comparing it with a sound and picture synchronization standard. When the time difference is less than or equal to a first preset threshold and greater than or equal to a second preset threshold, the sound and picture synchronization result of the target video is determined to be that the sound and picture are synchronized. When the time difference is greater than the first preset threshold or less than the second preset threshold, the result is determined to be that the sound and picture are not synchronized. In this way, whether the sound and picture are synchronized is determined accurately. After a not-synchronized result is obtained, prompt information indicating that the sound and picture of the target video are not synchronized may be displayed, and the image frames and the audio data of the target video may be adjusted according to the time difference between the first time and the second time.
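The threshold comparison can be sketched as follows. The sign convention (first time minus second time), the use of milliseconds, and the example threshold values are assumptions for illustration; the application does not fix any of them.

```python
def is_in_sync(
    first_time_ms: float,
    second_time_ms: float,
    first_threshold_ms: float,
    second_threshold_ms: float,
) -> bool:
    """Return True when the time difference lies within both thresholds.

    Assumed convention: difference = first time - second time, with the
    first preset threshold as the upper bound and the second preset
    threshold as the lower bound.
    """
    difference = first_time_ms - second_time_ms
    return second_threshold_ms <= difference <= first_threshold_ms


if __name__ == "__main__":
    # Illustrative thresholds only: upper bound 25 ms, lower bound -100 ms.
    print(is_in_sync(2000, 2050, 25, -100))  # difference of -50 ms
    print(is_in_sync(2000, 2200, 25, -100))  # difference of -200 ms
```

A result of `False` corresponds to the case in which prompt information would be displayed and the image frames and audio data adjusted.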

It should be noted that, when there are a plurality of target positions, that is, when there are a plurality of target image frames and a plurality of target audio data carrying mark information, there are also a plurality of first times and a plurality of second times. The number of target image frames equals the number of target audio data, so the number of first times acquired should equal the number of second times. In this case, a time difference is acquired for each pair of corresponding target image frame and target audio data. After the plurality of time differences are obtained, how they are used to determine the sound and picture synchronization result may be set according to the actual application and scene. For example, the average of the time differences within a predetermined time period may be calculated and used to calibrate and adjust the image frames and audio data of the target video. Alternatively, adjustment may be made whenever a time difference is greater than the first preset threshold or less than the second preset threshold, which suits scenes with stricter requirements. Adjustment can thus be made according to actual needs to improve the viewing and listening experience.
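The averaging option can be sketched as below, assuming the first and second times have been recorded as dictionaries keyed by sequence number. The function name, the dictionary layout, and the choice to select pairs by their first time are illustrative assumptions.

```python
from statistics import mean
from typing import Dict, Optional


def mean_offset_ms(
    first_ms: Dict[int, float],
    second_ms: Dict[int, float],
    window_start_ms: float,
    window_end_ms: float,
) -> Optional[float]:
    """Average (first time - second time) over marks inside a time window.

    Only sequence numbers present on both sides are used; returns None when
    no pair falls inside the window.
    """
    differences = [
        first_ms[number] - second_ms[number]
        for number in first_ms
        if number in second_ms
        and window_start_ms <= first_ms[number] <= window_end_ms
    ]
    return mean(differences) if differences else None


if __name__ == "__main__":
    first = {1: 2000, 2: 4000, 3: 6000}
    second = {1: 2040, 2: 4060, 3: 6500}
    print(mean_offset_ms(first, second, 0, 5000))
```

The resulting average could then serve as the offset by which the image frames or audio data are shifted; how the adjustment itself is applied is outside this sketch.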

It should be noted that, when sound and picture synchronous detection is performed in the present application, frame freezes or audio dropouts under a weak network may cause target image frames and/or target audio data carrying mark information to go undetected. That is, when there are a plurality of target positions, the numbers of target image frames and target audio data carrying mark information are both plural and equal, and the numbers of first times and second times acquired should also be equal. In some environments, however, such as a weak network, some target image frames and/or target audio data carrying mark information cannot be detected, so some first times or second times may be missing, or the number of first times may differ from the number of second times.

In this case, the correspondence between the items of mark information recorded for the detected target image frames and the items of mark information recorded for the detected target audio data may be determined. When the correspondence shows that mark information is missing from the mark information of the detected target image frames and/or from the mark information of the detected target audio data, the missing mark information and its target position are determined. The mark information of any target image frame or target audio data that was detected at that target position is then deleted from the record, together with the first time at which that target image frame appeared or the second time at which that target audio data appeared. In other words, when the sound and picture synchronization result is determined according to the first times and the second times, target positions where both sides are missing or only one side is missing are not considered. Invalid values are thus filtered out accurately, and detection relies on the mark information remaining after the deletion and on the time differences between the remaining first times and second times. The method is therefore suitable for testing sound and picture desynchronization under both normal and weak networks for various audio and video applications, and it distinguishes valid (complete) data from invalid (missing) data, which improves the accuracy of the detection result.
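The filtering described above amounts to keeping only the sequence numbers detected on both the image side and the audio side. A minimal sketch follows, again assuming the recorded times are dictionaries keyed by sequence number; the function name and return shape are hypothetical.

```python
from typing import Dict, List, Tuple


def matched_differences(
    first_ms: Dict[int, float],
    second_ms: Dict[int, float],
) -> Tuple[Dict[int, float], List[int]]:
    """Drop marks detected on only one side and diff the remaining pairs.

    Returns (differences, dropped): `differences` maps each sequence number
    found on both sides to first time - second time, and `dropped` lists the
    sequence numbers whose counterpart was missing.
    """
    matched = sorted(first_ms.keys() & second_ms.keys())
    dropped = sorted(first_ms.keys() ^ second_ms.keys())
    differences = {n: first_ms[n] - second_ms[n] for n in matched}
    return differences, dropped


if __name__ == "__main__":
    # Mark 2 lost its audio and mark 3 lost its image frame.
    first = {1: 2000, 2: 4000, 4: 8000}
    second = {1: 2030, 3: 6010, 4: 8020}
    print(matched_differences(first, second))
```

Marks missing on both sides never appear in either dictionary, so they are excluded without any extra handling.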

Fig. 3 is a flowchart of a second sound-picture synchronization detection method according to embodiment 1 of the present invention, and as shown in fig. 3, the method includes the following steps:

step S302, displaying a target video on a display interface, wherein the target video comprises a target image frame and target audio data at a target position of a playing time axis, and the target image frame and the target audio data carry mark information;

step S304, receiving a video playing instruction on a display interface;

step S306, responding to the video playing instruction, playing the target video on the display interface, and displaying prompt information on the display interface under the condition that the sound and picture synchronization result of the target video is that the sound and picture are not synchronized, wherein the prompt information is used for prompting that the sound and picture of the target video are not synchronized, the sound and picture synchronization result is determined according to a first time and a second time, the first time is the time for playing the target image frame when the target video is played according to the mark information, and the second time is the time for playing the target audio data according to the mark information.

Through the above steps, the target video is displayed on the display interface, where the target video includes, at a target position of the playing time axis, a target image frame and target audio data carrying mark information. The mark information marks the times at which the target image frame and the target audio data appear, so that the first time at which the target image frame is played and the second time at which the target audio data is played can be obtained, and the sound and picture synchronization result when the target video is played is then determined according to the first time and the second time. The display interface then receives and responds to the video playing instruction, plays the target video, and, when the sound and picture synchronization result of the target video is that sound and picture are not synchronized, displays prompt information so that the synchronization can be adjusted. Because the first time and the second time are the times at which the target image frame and the target audio data are played, the sound and picture synchronization result can be determined effectively. Because the target image frame and the target audio data are marked with the mark information, the detection result can be obtained merely by detecting the mark information. This removes various restrictive conditions and solves the technical problems in the related art that sound and picture synchronization detection is difficult to adapt to various scenes and is subject to various limitations.
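
As an illustration of the prompt decision in step S306, the following Python sketch compares the time difference with two thresholds and returns a prompt text only when the difference falls outside them. The function name sync_prompt, the wording of the prompt, and the default thresholds of 25 ms and -100 ms are assumptions chosen for the example; the application only requires a first and a second preset threshold.

```python
from typing import Optional


def sync_prompt(
    first_time_s: float,
    second_time_s: float,
    first_threshold_ms: float = 25.0,     # assumed upper bound
    second_threshold_ms: float = -100.0,  # assumed lower bound
) -> Optional[str]:
    """Return a prompt text if sound and picture are out of sync, else None.

    first_time_s is the time at which the target image frame is played,
    second_time_s the time at which the target audio data is played.
    """
    diff_ms = (first_time_s - second_time_s) * 1000.0
    if diff_ms > first_threshold_ms or diff_ms < second_threshold_ms:
        return f"Sound and picture are out of sync ({diff_ms:+.0f} ms)"
    return None


# Hypothetical values: the image frame appears 200 ms after the audio.
message = sync_prompt(first_time_s=1.20, second_time_s=1.00)
if message is not None:
    print(message)
```

Returning None for the synchronized case keeps the display logic simple: the interface shows a prompt only when a text is returned.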

Based on the above embodiments and alternative embodiments, an alternative implementation is provided, which is described in detail below.

In the related art, objective evaluation schemes mainly analyze the characteristic values of the video and the audio, which is highly limiting.

Based on this, in the alternative embodiment of the present invention, a sound-picture synchronization detecting method is provided, and fig. 4 is a flowchart of the sound-picture synchronization detecting method according to the alternative embodiment of the present invention, as shown in fig. 4, and the following describes in detail the alternative embodiment of the present invention.

S1, adding features to the material:

A preset video is obtained. Mark information is added in sequence, at a preset time interval, to the original image frames of the preset video, where the mark information is a serial number. Mark information is also added in sequence, at the same preset time interval, to the original audio data of the preset video, where the mark information is a characteristic value and the serial number. The target video is thereby obtained.

For example, the audio contains a number spoken in Mandarin at each interval (1, 2, 3, and so on), and is silent at all other times. The video contains, at each interval, a frame of text combining the characteristic value and the number (target video 1, target video 2, target video 3 …), and all other pictures are blank.
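
The marker positions used in S1 can be sketched in Python as follows. The sketch only computes where markers would be placed and does not edit any media. The function name marker_schedule is hypothetical, and placing marker n at n times the interval is an assumption, since the application does not state where the first marker sits.

```python
from typing import List, Tuple


def marker_schedule(duration_s: float, interval_s: float) -> List[Tuple[int, float]]:
    """Return (serial number, position in seconds) for each marker.

    Assumption: marker n is placed at n * interval_s. The same schedule
    is meant to be used for both the image frames and the audio data so
    that equal serial numbers share one target position.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    schedule = []
    serial = 1
    # Multiply instead of accumulating to avoid floating-point drift.
    while serial * interval_s <= duration_s:
        schedule.append((serial, serial * interval_s))
        serial += 1
    return schedule


print(marker_schedule(10.0, 3.0))  # [(1, 3.0), (2, 6.0), (3, 9.0)]
```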

S2, inputting the target video, respectively extracting the video and the audio of the target video:

S2.1, extracting the video of the target video;

S2.1.1, splitting the target video into frames to obtain a plurality of image frames and the timestamp of each frame;

S2.1.2, traversing each image frame and detecting whether a characteristic value appears in the image frame, where the characteristic value can be displayed in text form, and the text can mark not only the image frame but also the target video;

S2.1.3, when a characteristic value appears in an image frame, extracting the image frame carrying the feature, and recording the corresponding number and the relative timestamp T2 at which the image frame appears.
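
The traversal in S2.1.2 and S2.1.3 can be sketched in Python as below. The recognition of the text in a frame is deliberately left to a caller-supplied function, because the application does not name a recognition technique; collect_video_times, read_marker_stub, and the string stand-ins for frames are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple, TypeVar

Frame = TypeVar("Frame")


def collect_video_times(
    frames: Iterable[Tuple[float, Frame]],
    read_marker: Callable[[Frame], Optional[int]],
) -> Dict[int, float]:
    """Map each marker number to the timestamp T2 of its first frame.

    frames yields (timestamp in seconds, frame) in playing order.
    read_marker returns the number shown in a frame, or None when the
    frame carries no characteristic value.
    """
    times: Dict[int, float] = {}
    for timestamp, frame in frames:
        number = read_marker(frame)
        if number is not None and number not in times:
            times[number] = timestamp  # keep the first appearance only
    return times


def read_marker_stub(frame: str) -> Optional[int]:
    """Stand-in detector: a marked frame is the text 'mark <number>'."""
    parts = frame.split()
    if len(parts) == 2 and parts[0] == "mark" and parts[1].isdigit():
        return int(parts[1])
    return None


# Frames are modelled as strings; an empty string is a blank picture.
frames = [(0.00, ""), (0.04, "mark 1"), (0.08, "mark 1"), (0.12, ""), (0.16, "mark 2")]
video_times = collect_video_times(frames, read_marker_stub)
print(video_times)  # {1: 0.04, 2: 0.16}
```

Keeping only the first appearance means that a marker shown for several consecutive frames still yields a single T2.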

S2.2, extracting the audio of the target video;

S2.2.1, performing speech recognition on the audio data of the target video to recognize each spoken number, converting each number from text form into Arabic numerals, and recording the number together with the timestamp at which it is read out; in addition, detecting the volume pulses produced by the spoken numbers, and recording the relative timestamp at which the peak value of each detected pulse occurs;

S2.2.2, calibrating the time at which each number is read out using the two timestamps, that is, calibrating the speech recognition result against the volume peak detection result, to obtain an accurate timestamp T1 for each recognized number.
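
One possible reading of the calibration in S2.2.2 is sketched below in Python: within the interval in which speech recognition reports a number, the time of the highest volume value is taken as T1. The function name calibrate_audio_time, the representation of the volume as (time, value) samples, and the fallback to the interval start are assumptions made for the example.

```python
from typing import Iterable, Tuple


def calibrate_audio_time(
    start_s: float,
    end_s: float,
    volume: Iterable[Tuple[float, float]],
) -> float:
    """Return a calibrated timestamp T1 for one recognized number.

    start_s and end_s bound the interval reported by speech recognition.
    volume yields (time in seconds, volume value) samples. The time of
    the highest volume inside the interval is returned; if no sample
    falls inside the interval, start_s is returned unchanged.
    """
    window = [(t, v) for t, v in volume if start_s <= t <= end_s]
    if not window:
        return start_s
    peak_time, _ = max(window, key=lambda sample: sample[1])
    return peak_time


# Hypothetical volume samples around a number recognized in 1.0 s to 1.3 s.
samples = [(0.9, 0.01), (1.0, 0.20), (1.1, 0.90), (1.2, 0.40), (1.5, 0.95)]
t1 = calibrate_audio_time(1.0, 1.3, samples)
print(t1)  # 1.1
```

The louder sample at 1.5 s is ignored because it lies outside the recognized interval.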

It should be noted that, in the above process, it may also be checked whether the numbers recognized by speech are continuous. If the numbers are not continuous, each number in text form is converted into Arabic numerals, the number is recorded, and the time at which it is read out is obtained. If the numbers are continuous, they are first merged, and the merged number in text form is then converted into Arabic numerals, recorded, and the time at which it is read out is obtained.

S3, matching the numbers of the audio and the video: if a number is missing from either the audio or the video, the same number on the other side is discarded. When both the audio time and the video time of the same number are obtained, the sound and picture offset (T2 - T1) is obtained, and the offsets of the plurality of tested features are averaged. The average is then compared with the sound and picture synchronization standard for television broadcasting, namely imperceptible: -100 ms to 25 ms; perceptible: -125 ms to 45 ms; unacceptable: less than -185 ms or more than 90 ms, to obtain the sound and picture synchronization detection result.
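
The averaging and grading in S3 can be sketched in Python as follows. The sketch uses only the imperceptible range of -100 ms to 25 ms and the unacceptable range below -185 ms or above 90 ms quoted above, and labels everything in between as perceptible, which is a simplification assumed for the example. The function names and sample times are hypothetical.

```python
from statistics import mean
from typing import Iterable, Tuple


def average_offset_ms(pairs: Iterable[Tuple[float, float]]) -> float:
    """Average T2 - T1 in milliseconds over (T1 audio, T2 video) pairs."""
    offsets = [(t2 - t1) * 1000.0 for t1, t2 in pairs]
    if not offsets:
        raise ValueError("no valid audio/video pairs to average")
    return mean(offsets)


def grade_offset(offset_ms: float) -> str:
    """Grade an offset against the ranges quoted in S3."""
    if -100.0 <= offset_ms <= 25.0:
        return "imperceptible"
    if offset_ms < -185.0 or offset_ms > 90.0:
        return "unacceptable"
    return "perceptible"  # assumed label for everything in between


# Hypothetical (T1, T2) pairs in seconds for three matched numbers.
matched = [(1.00, 1.02), (2.00, 2.03), (5.00, 5.04)]
avg = average_offset_ms(matched)
print(round(avg), grade_offset(avg))  # 30 perceptible
```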

Through the above optional embodiment, the following beneficial effects can be achieved:

(1) speech recognition is used to obtain a time list of the target audio data, and a time list of the target image frames is also obtained. When some features are lost under a weak network (mark information is not detected), they are treated as invalid features during comparison, and only valid feature data are used to calculate the sound and picture synchronization result. This avoids data errors caused by audio and video features that cannot be aligned, filters invalid values accurately, and solves the problem of inaccurate tests caused by frame loss due to network problems;

(2) the method places low requirements on environmental noise and only obtains characteristic data for analysis, so it is widely applicable and highly accurate;

(3) the method is suitable for testing sound and picture asynchrony of various audio and video applications under both normal and weak networks, and distinguishing valid data from invalid data improves the accuracy of the detection result.

It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.

Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a computer-readable storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

Example 2

According to an embodiment of the present invention, there is further provided a first device for implementing the sound-picture synchronization detection method, and fig. 5 is a block diagram of a first structure of the sound-picture synchronization detection device provided in embodiment 2 of the present invention, as shown in fig. 5, the device includes: a first obtaining module 502, a second obtaining module 504, and a determining module 506, which are described below.

A first obtaining module 502, configured to obtain a target video, where the target video includes a target image frame and target audio data at a target position of a playing time axis, and the target image frame and the target audio data carry tag information; a second obtaining module 504, connected to the first obtaining module 502, for obtaining a first time for playing the target image frame and a second time for playing the target audio data when the target video is played according to the mark information; the determining module 506 is connected to the second obtaining module 504, and configured to determine a sound-picture synchronization result when the target video is played according to the first time and the second time.

It should be noted here that the first obtaining module 502, the second obtaining module 504 and the determining module 506 correspond to steps S202 to S206 in embodiment 1. The examples and application scenarios implemented by these modules are the same as those of the corresponding steps, but are not limited to the disclosure of embodiment 1. It should also be noted that the above modules, as part of the apparatus, may run in the computer terminal 10 provided in embodiment 1.

Example 3

According to an embodiment of the present invention, there is further provided a second device for implementing the sound-picture synchronization detection method, and fig. 6 is a block diagram of a second device for detecting sound-picture synchronization provided in embodiment 3 of the present invention, and as shown in fig. 6, the second device includes: a display module 602, a receiving module 604 and a playing module 606, which will be described below.

The display module 602 is configured to display a target video on a display interface, where the target video includes a target image frame and target audio data at a target position of a play time axis, and the target image frame and the target audio data carry tag information; a receiving module 604, connected to the display module 602, for receiving a video playing instruction on a display interface; the playing module 606 is connected to the receiving module 604, and configured to respond to the video playing instruction, play the target video on the display interface, and display a prompt message on the display interface when the audio-visual synchronization result of the target video is that the audio and visual synchronization is not synchronous, where the prompt message is used to prompt that the audio and visual synchronization of the target video is not synchronous, the audio-visual synchronization result is determined according to a first time and a second time, the first time is a time for playing the target image frame when the target video is played according to the tag information, and the second time is a time for playing the target audio data according to the tag information.

It should be noted that the display module 602, the receiving module 604 and the playing module 606 correspond to steps S302 to S306 in embodiment 1. The examples and application scenarios implemented by these modules are the same as those of the corresponding steps, but are not limited to the disclosure of embodiment 1. It should also be noted that the above modules, as part of the apparatus, may run in the computer terminal 10 provided in embodiment 1.

Example 4

The embodiment of the invention can provide a computer terminal which can be any computer terminal device in a computer terminal group. Optionally, in this embodiment, the computer terminal may also be replaced with a terminal device such as a mobile terminal.

Optionally, in this embodiment, the computer terminal may be located in at least one network device of a plurality of network devices of a computer network.

In this embodiment, the computer terminal may execute the program code of the following steps in the sound-picture synchronization detection method of the application program: acquiring a target video, wherein the target video comprises a target image frame and target audio data at a target position of a playing time axis, and the target image frame and the target audio data carry mark information; according to the mark information, acquiring first time for playing the target image frame and second time for playing the target audio data when the target video is played; and determining a sound and picture synchronization result when the target video is played according to the first time and the second time.

Alternatively, fig. 7 is a block diagram of a computer terminal according to an embodiment of the present invention. As shown in fig. 7, the computer terminal may include: one or more (only one shown) processors 71, memory 72, and the like.

The memory may be configured to store software programs and modules, such as program instructions/modules corresponding to the sound and picture synchronization detection method and apparatus in the embodiments of the present invention, and the processor executes various functional applications and data processing by operating the software programs and modules stored in the memory, that is, implements the sound and picture synchronization detection method. The memory may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory located remotely from the processor, and these remote memories may be connected to the computer terminal through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: acquiring a target video, wherein the target video comprises a target image frame and target audio data at a target position of a playing time axis, and the target image frame and the target audio data carry mark information; according to the mark information, acquiring first time for playing the target image frame and second time for playing the target audio data when the target video is played; and determining a sound and picture synchronization result when the target video is played according to the first time and the second time.

Optionally, the processor may further execute the program code of the following steps: determining a sound and picture synchronization result when the target video is played according to the first time and the second time, wherein the sound and picture synchronization result comprises the following steps: determining a time difference between the first time and the second time; and under the condition that the time difference value is larger than a first preset threshold value or smaller than a second preset threshold value, determining that the sound and picture synchronization result of the target video is sound and picture asynchronization.

Optionally, the processor may further execute the program code of the following steps: acquiring a first time for playing the target image frame when the target video is played according to the mark information, wherein the method comprises the following steps: identifying a plurality of image frames of a played target video, and determining the image frame with mark information displayed in the plurality of image frames as a target image frame; and determining the playing time of the playing target image frame as a first time.

Optionally, the processor may further execute the program code of the following steps: acquiring a second time for playing the target audio data when the target video is played according to the mark information, comprising: identifying audio data of a played target video, and determining part of audio data played with the marking information in the audio data as target audio data; and determining the playing time of the playing target audio data as a second time.

Optionally, the processor may further execute the program code of the following steps: after determining the playing time of the target audio data as the second time, the method further comprises: determining the duration of playing the target audio data; selecting a target volume value from a plurality of volume values corresponding to the duration; and calibrating the second time according to the target volume value.

Optionally, the processor may further execute the program code of the following step: adjusting the image frames and the audio data of the target video according to the time difference between the first time and the second time.

Optionally, the processor may further execute the program code of the following steps: after determining that the sound and picture synchronization result of the target video is that sound and picture are not synchronized, the method further comprises: displaying prompt information, wherein the prompt information is used for prompting that the sound and picture of the target video are not synchronized.

Optionally, the processor may further execute the program code of the following steps: displaying a target video on a display interface, wherein the target video comprises a target image frame and target audio data at a target position of a playing time axis, and the target image frame and the target audio data carry mark information; receiving a video playing instruction on a display interface; responding to a video playing instruction, playing a target video on a display interface, and displaying prompt information on the display interface under the condition that a sound picture synchronization result of the target video is that sound pictures are not synchronized, wherein the prompt information is used for prompting that the sound pictures of the target video are not synchronized, the sound picture synchronization result is determined according to first time and second time, the first time is the time for playing a target image frame when the target video is played according to the mark information, and the second time is the time for playing target audio data according to the mark information.

It can be understood by those skilled in the art that the structure shown in fig. 7 is only illustrative, and the computer terminal may also be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), a PAD, and the like. Fig. 7 does not limit the structure of the electronic device. For example, the computer terminal 7 may also include more or fewer components (e.g., network interfaces, display devices, etc.) than shown in fig. 7, or have a different configuration than shown in fig. 7.

Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the computer-readable storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.

Example 5

Embodiments of the present invention also provide a computer-readable storage medium. Optionally, in this embodiment, the computer-readable storage medium may be configured to store program codes executed by the sound-picture synchronization detection method provided in the first embodiment.

Optionally, in this embodiment, the computer-readable storage medium may be located in any one of a group of computer terminals in a computer network, or in any one of a group of mobile terminals.

Optionally, in this embodiment, a computer-readable storage medium is configured to store program code for performing the steps of: acquiring a target video, wherein the target video comprises a target image frame and target audio data at a target position of a playing time axis, and the target image frame and the target audio data carry mark information; according to the mark information, acquiring first time for playing the target image frame and second time for playing the target audio data when the target video is played; and determining a sound and picture synchronization result when the target video is played according to the first time and the second time.

Optionally, in this embodiment, the computer readable storage medium is configured to store program code for performing the following steps: determining a sound and picture synchronization result when the target video is played according to the first time and the second time, wherein the sound and picture synchronization result comprises the following steps: determining a time difference between the first time and the second time; and under the condition that the time difference value is larger than a first preset threshold value or smaller than a second preset threshold value, determining that the sound and picture synchronization result of the target video is sound and picture asynchronization.

Optionally, in this embodiment, the computer readable storage medium is configured to store program code for performing the following steps: acquiring a first time for playing the target image frame when the target video is played according to the mark information, wherein the first time comprises the following steps: identifying a plurality of image frames of a played target video, and determining the image frame with mark information displayed in the plurality of image frames as a target image frame; and determining the playing time of the playing target image frame as a first time.

Optionally, in this embodiment, the computer readable storage medium is configured to store program code for performing the following steps: acquiring a second time for playing the target audio data when the target video is played according to the mark information, comprising: identifying audio data of a played target video, and determining part of audio data played with the mark information in the audio data as target audio data; and determining the playing time of the playing target audio data as a second time.

Optionally, in this embodiment, the computer readable storage medium is configured to store program code for performing the following steps: after determining the playing time of the target audio data as the second time, the method further comprises: determining the duration of playing the target audio data; selecting a target volume value from a plurality of volume values corresponding to the duration; and calibrating the second time according to the target volume value.

Optionally, in this embodiment, the computer readable storage medium is configured to store program code for performing the following step: adjusting the image frames and the audio data of the target video according to the time difference between the first time and the second time.

Optionally, in this embodiment, the computer readable storage medium is configured to store program code for performing the following steps: after determining that the sound and picture synchronization result of the target video is that sound and picture are not synchronized, the method further comprises: displaying prompt information, wherein the prompt information is used for prompting that the sound and picture of the target video are not synchronized.

Optionally, in this embodiment, the computer readable storage medium is configured to store program code for performing the following steps: displaying a target video on a display interface, wherein the target video comprises a target image frame and target audio data at a target position of a playing time axis, and the target image frame and the target audio data carry mark information; receiving a video playing instruction on a display interface; responding to a video playing instruction, playing a target video on a display interface, and displaying prompt information on the display interface under the condition that a sound picture synchronization result of the target video is that sound pictures are not synchronized, wherein the prompt information is used for prompting that the sound pictures of the target video are not synchronized, the sound picture synchronization result is determined according to first time and second time, the first time is the time for playing a target image frame when the target video is played according to the mark information, and the second time is the time for playing target audio data according to the mark information.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

In the above embodiments of the present invention, the description of each embodiment has its own emphasis, and reference may be made to the related description of other embodiments for parts that are not described in detail in a certain embodiment.

In the embodiments provided in the present application, it should be understood that the disclosed technical content can be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a computer-readable storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned computer-readable storage media comprise: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.

The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims

1. A sound and picture synchronous detection method is characterized by comprising the following steps:

acquiring a target video, wherein the target video comprises a target image frame and target audio data at a target position of a playing time axis, and the target image frame and the target audio data carry mark information;

according to the marking information, acquiring first time for playing the target image frame and second time for playing the target audio data when the target video is played;

and determining a sound and picture synchronization result when the target video is played according to the first time and the second time.

2. The method of claim 1, wherein determining the sound-picture synchronization result when playing the target video according to the first time and the second time comprises:

determining a time difference between the first time and the second time;

and under the condition that the time difference value is larger than a first preset threshold value or smaller than a second preset threshold value, determining that the sound and picture synchronization result of the target video is sound and picture asynchronization.

3. The method of claim 1, wherein obtaining a first time to play the target image frame when playing the target video according to the mark information comprises:

identifying a plurality of image frames for playing the target video, and determining the image frame with the mark information displayed in the plurality of image frames as a target image frame;

and determining the playing time of playing the target image frame as the first time.

4. The method of claim 1, wherein acquiring, according to the mark information, the second time at which the target audio data is played when the target video is played comprises:

identifying audio data played when the target video is played, and determining the portion of the audio data in which the mark information is played as the target audio data;

and determining the time at which the target audio data is played as the second time.
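As a non-limiting sketch, the identification steps of claims 3 and 4 can both be viewed as scanning a timeline of played items and returning the play time of the first item that carries the mark information. The helper `first_marked_time` and the `carries_mark` predicate are hypothetical; how a mark is actually recognized in an image frame or in audio data is not specified here and is left to the caller.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

T = TypeVar("T")


def first_marked_time(timeline: Iterable[Tuple[float, T]],
                      carries_mark: Callable[[T], bool]) -> Optional[float]:
    """Return the play time of the first item that carries the mark.

    `timeline` yields (play_time_ms, item) pairs in playing order; `item`
    may be an image frame or a chunk of audio data. `carries_mark` is a
    caller-supplied (hypothetical) detector for the mark information.
    """
    for play_time_ms, item in timeline:
        if carries_mark(item):
            return play_time_ms
    return None  # no marked item was observed


# Toy data: strings stand in for frames, floats for audio volume levels.
frames = [(0.0, "plain"), (40.0, "MARK"), (80.0, "plain")]
audio = [(0.0, 0.0), (50.0, 0.8), (100.0, 0.1)]

first_time = first_marked_time(frames, lambda f: f == "MARK")
second_time = first_marked_time(audio, lambda v: v > 0.5)
```

In this toy example the marked frame is found at 40 ms and the marked audio at 50 ms; a real detector would replace the two lambda expressions.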

5. The method of claim 4, wherein after determining the time at which the target audio data is played as the second time, the method further comprises:

determining a duration for which the target audio data is played;

selecting a target volume value from a plurality of volume values corresponding to the duration;

and calibrating the second time according to the target volume value.
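One possible reading of the calibration in claim 5, offered only as an assumption, is that the target volume value is the loudest value sampled during the duration and that the second time is moved to the moment at which that value occurs. The function `calibrate_second_time` and the (time, volume) sample layout are hypothetical and are not taken from the claims.

```python
from typing import Sequence, Tuple


def calibrate_second_time(volume_samples: Sequence[Tuple[float, float]]) -> float:
    """Return a calibrated second time from (play_time_ms, volume) samples.

    The samples are assumed to cover the duration for which the target
    audio data is played. Assumption: the target volume value is the
    maximum volume, and the calibrated time is when that maximum occurs.
    """
    if not volume_samples:
        raise ValueError("at least one volume sample is required")
    # max() returns the earliest sample when several share the peak volume.
    peak_time_ms, _peak_volume = max(volume_samples, key=lambda s: s[1])
    return peak_time_ms
```

Other selection rules, such as the first value above a noise floor, would fit the same claim wording; the peak rule is chosen here only because it is simple to state.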

6. The method of claim 2, further comprising:

adjusting the image frames and the audio data of the target video according to the time difference between the first time and the second time.
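As a hedged illustration of the adjustment in claim 6, one simple option is to shift the audio presentation timestamps by the measured time difference so that the marked audio coincides with the marked image frame. The function `shift_audio_timestamps`, the millisecond unit, and the sign convention (first time minus second time) are assumptions of this sketch; the claim does not say which stream is adjusted or how.

```python
from typing import List, Sequence


def shift_audio_timestamps(audio_pts_ms: Sequence[float],
                           time_diff_ms: float) -> List[float]:
    """Shift every audio timestamp by the measured time difference.

    Assumes time_diff_ms = first time (image) - second time (audio), so a
    positive value delays the audio and a negative value advances it.
    """
    return [pts + time_diff_ms for pts in audio_pts_ms]


# Toy example: image mark played at 1000 ms, audio mark played at 900 ms.
time_diff_ms = 1000.0 - 900.0
adjusted = shift_audio_timestamps([900.0, 920.0, 940.0], time_diff_ms)
```

After the shift the marked audio sample moves from 900 ms to 1000 ms, matching the time at which the marked image frame is played.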

7. The method according to claim 2, wherein after determining that the sound and picture synchronization result of the target video is that the sound and picture are out of synchronization, the method further comprises:

displaying prompt information, wherein the prompt information is used for prompting that the sound and picture of the target video are out of synchronization.

8. A sound and picture synchronous detection method, characterized by comprising the following steps:

displaying a target video on a display interface, wherein the target video comprises a target image frame and target audio data at a target position of a playing time axis, and the target image frame and the target audio data carry mark information;

receiving a video playing instruction on the display interface;

in response to the video playing instruction, playing the target video on the display interface, and displaying prompt information on the display interface in a case where the sound and picture synchronization result of the target video is that the sound and picture are out of synchronization, wherein the prompt information is used for prompting that the sound and picture of the target video are out of synchronization, the sound and picture synchronization result is determined according to a first time and a second time, the first time is the time, acquired according to the mark information, at which the target image frame is played when the target video is played, and the second time is the time, acquired according to the mark information, at which the target audio data is played when the target video is played.

9. A computer-readable storage medium, comprising a stored program, wherein, when the program runs, a device in which the computer-readable storage medium is located is controlled to execute the sound-picture synchronization detection method according to any one of claims 1 to 7.

10. A computer device, comprising a memory and a processor, wherein:

the memory stores a computer program;

the processor is configured to execute the computer program stored in the memory, and the computer program, when run, causes the processor to execute the sound-picture synchronization detection method according to any one of claims 1 to 7.