CN113810680A - Audio synchronization detection method and device, computer readable medium and electronic equipment - Google Patents


Info

Publication number
CN113810680A
Authority
CN
China
Prior art keywords
character string, sample, audio, audio data, detected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111089134.1A
Other languages
Chinese (zh)
Inventor
于雪松
熊磊
文锐烽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Huantai Technology Co Ltd
Original Assignee
Shenzhen Huantai Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Shenzhen Huantai Technology Co Ltd filed Critical Shenzhen Huantai Technology Co Ltd
Priority to CN202111089134.1A priority Critical patent/CN113810680A/en
Publication of CN113810680A publication Critical patent/CN113810680A/en
Pending legal-status Critical Current

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N17/00 - Diagnosis, testing or measuring for television systems or their details
    • H04N2017/006 - Diagnosis, testing or measuring for television systems or their details for television sound

Abstract

The disclosure provides an audio synchronization detection method, an audio synchronization detection device, a computer-readable medium and an electronic device, and relates to the technical field of audio and video processing. The method comprises the following steps: decoding a test audio based on a decoder under test to obtain first audio data; generating a corresponding sample character string to be tested based on the first audio data, and acquiring a standard sample character string corresponding to the test audio; and determining, in the standard sample character string, a matching character string that matches the sample character string to be tested, and determining the audio synchronization detection result of the decoder under test according to the matching character string. Compared with the related art, this audio synchronization detection method, implemented purely in software, supports large-scale automated testing, requires no hardware signal acquisition equipment, and avoids the high test cost caused by hardware-based testing.

Description

Audio synchronization detection method and device, computer readable medium and electronic equipment
Technical Field
The present disclosure relates to the field of audio and video processing technologies, and in particular, to an audio synchronization detection method, an audio synchronization detection apparatus, a computer-readable medium, and an electronic device.
Background
With the continuous development of multimedia technology, real-time multimedia transmission is applied in more and more scenarios. Audio delay may occur in these scenarios and can severely impact the user experience. For example, in a multi-player voice scenario of a game, audio delay may cause a critical in-game opportunity to be missed and thus lead to losing the game. Audio delay is typically caused by synchronization errors introduced as the audio data passes through the audio-video processing system, particularly during decoding.
Related multimedia decoder synchronization detection techniques generally fall into two categories: manual testing, in which multiple people evaluate synchronization by subjective scoring; and automated testing, in which a hardware signal acquisition device separately collects the audio and video hardware signals generated by an auxiliary device and the device under test, and audio-video synchronization parameters are calculated to judge the synchronization condition. However, manual testing is inefficient and cannot be carried out in large batches; and although automated testing can be carried out in large batches, it requires two sets of hardware equipment for hardware signal acquisition, so the implementation cost is high.
Disclosure of Invention
The present disclosure aims to provide a new audio synchronization detection method, an audio synchronization detection apparatus, a computer-readable medium and an electronic device, which can implement large-scale automated testing in a software manner and avoid both the low efficiency of manual testing and the high test cost of hardware-based testing.
According to a first aspect of the present disclosure, there is provided an audio synchronization detection method, including: decoding the test audio based on a tested decoder to obtain first audio data; generating a corresponding sample character string to be tested based on the first audio data, and acquiring a standard sample character string corresponding to the test audio; the standard sample character string is generated based on second audio data obtained by decoding the test audio by a standard decoder; the length of the standard sample character string is more than or equal to that of the sample character string to be detected; and determining a matching character string matched with the sample character string to be detected in the standard character string, and determining the audio synchronous detection result of the decoder to be detected according to the matching character string.
According to a second aspect of the present disclosure, there is provided an audio synchronization detecting apparatus comprising: the audio decoding module is used for decoding the test audio based on the tested decoder to obtain first audio data; the character string generating module is used for generating a corresponding sample character string to be tested based on the first audio data and acquiring a standard sample character string corresponding to the test audio; the standard sample character string is generated based on second audio data obtained by decoding the test audio by a standard decoder; the length of the standard sample character string is more than or equal to that of the sample character string to be detected; and the result determining module is used for determining a matching character string matched with the sample character string to be detected in the standard character string and determining the audio synchronous detection result of the decoder to be detected according to the matching character string.
According to a third aspect of the present disclosure, a computer-readable medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, is adapted to carry out the above-mentioned method.
According to a fourth aspect of the present disclosure, there is provided an electronic apparatus, comprising: one or more processors; and a memory storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the above-described method.
According to the audio synchronization detection method provided by the embodiments of the present disclosure, a sample character string to be tested is generated based on first audio data obtained by decoding a test audio with the decoder under test, a standard sample character string generated from second audio data obtained by decoding the test audio with a standard decoder is acquired, a matching character string corresponding to the sample character string to be tested is then determined in the standard sample character string, and the synchronization detection result of the decoder under test is determined based on the matching character string. On one hand, the present disclosure provides a new audio synchronization detection method that realizes the audio synchronization test process purely in software by converting audio data into character strings and matching them; on the other hand, compared with the related art, this character-string-based approach supports large-scale automated testing, requires no hardware signal acquisition equipment and can be implemented through software alone, which avoids the high test cost caused by hardware-based testing.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty. In the drawings:
FIG. 1 illustrates a schematic diagram of an exemplary system architecture to which embodiments of the present disclosure may be applied;
FIG. 2 shows a schematic diagram of an electronic device to which embodiments of the present disclosure may be applied;
FIG. 3 schematically illustrates a flow chart of a method of audio synchronization detection in an exemplary embodiment of the present disclosure;
FIG. 4 is a diagram schematically illustrating corresponding values of a characteristic of audio data;
FIG. 5 schematically illustrates 4 cases where a matching sample string is determined in a standard sample string in exemplary embodiments of the present disclosure;
FIG. 6 schematically illustrates a flow chart of another audio synchronization detection method in an exemplary embodiment of the present disclosure;
fig. 7 schematically shows a composition diagram of an audio synchronization detection apparatus in an exemplary embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
Fig. 1 is a schematic diagram illustrating a system architecture of an exemplary application environment to which an audio synchronization detection method and apparatus according to an embodiment of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include one or more of terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few. The terminal devices 101, 102, 103 may be various terminal devices having multimedia processing functions, including but not limited to desktop computers, portable computers, smart phones, tablet computers, and the like. It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, server 105 may be a server cluster comprised of multiple servers, or the like.
The audio synchronization detection method provided by the embodiment of the present disclosure is generally executed by the terminal devices 101, 102, and 103, and accordingly, the audio synchronization detection apparatus is generally disposed in the terminal devices 101, 102, and 103. However, it is easily understood by those skilled in the art that the audio synchronization detecting method provided in the embodiment of the present disclosure may also be executed by the server 105, and accordingly, the audio synchronization detecting apparatus may also be disposed in the server 105, which is not particularly limited in the exemplary embodiment. For example, in an exemplary embodiment, the server 105 may obtain, through the network 104, first audio data and second audio data obtained by decoding by decoders in the terminal devices 101, 102, and 103, then process the first audio data and the second audio data respectively to generate a sample character string to be detected and a standard sample character string, further determine, in the standard character string, a matching character string corresponding to the sample character string to be detected, and determine, according to the matching character string, a synchronous detection result of the decoder to be detected.
The exemplary embodiment of the present disclosure provides an electronic device for implementing an audio synchronization detection method, which may be the terminal device 101, 102, 103 or the server 105 in fig. 1. The electronic device comprises at least a processor and a memory for storing executable instructions of the processor, the processor being configured to perform the audio synchronization detection method via execution of the executable instructions.
The following takes the mobile terminal 200 in fig. 2 as an example, and exemplifies the configuration of the electronic device. It will be appreciated by those skilled in the art that the configuration of figure 2 can also be applied to fixed type devices, in addition to components specifically intended for mobile purposes. In other embodiments, mobile terminal 200 may include more or fewer components than shown, or some components may be combined, some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware. The interfacing relationship between the components is only schematically illustrated and does not constitute a structural limitation of the mobile terminal 200. In other embodiments, the mobile terminal 200 may also interface differently than shown in fig. 2, or a combination of multiple interfaces.
As shown in fig. 2, the mobile terminal 200 may specifically include: a processor 210, an internal memory 221, an external memory interface 222, a Universal Serial Bus (USB) interface 230, a charging management module 240, a power management module 241, a battery 242, an antenna 1, an antenna 2, a mobile communication module 250, a wireless communication module 260, an audio module 270, a speaker 271, a receiver 272, a microphone 273, an earphone interface 274, a sensor module 280, a display 290, a camera module 291, an indicator 292, a motor 293, keys 294, and a Subscriber Identity Module (SIM) card interface 295. Wherein the sensor module 280 may include a depth sensor 2801, a pressure sensor 2802, a gyroscope sensor 2803, and the like.
Processor 210 may include one or more processing units, such as: the Processor 210 may include an Application Processor (AP), a modem Processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a video codec, a Digital Signal Processor (DSP), a baseband Processor, and/or a Neural-Network Processing Unit (NPU), and the like. The different processing units may be separate devices or may be integrated into one or more processors.
A memory is provided in the processor 210. The memory may store instructions for implementing six modular functions: detection instructions, connection instructions, information management instructions, analysis instructions, data transmission instructions, and notification instructions, and execution is controlled by processor 210.
The mobile terminal 200 may implement an audio function through the audio module 270, the speaker 271, the receiver 272, the microphone 273, the earphone interface 274, the application processor, and the like. Such as music playing, recording, etc.
The depth sensor 2801 is used to obtain depth information of the scene; the pressure sensor 2802 is used for sensing a pressure signal and converting the pressure signal into an electrical signal; the gyro sensor 2803 may be used to determine a motion gesture of the mobile terminal 200.
In addition, other functional sensors, such as an air pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity light sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, etc., may be provided in the sensor module 280 according to actual needs.
Other devices for providing auxiliary functions may also be included in mobile terminal 200. For example, the keys 294 include a power-on key, a volume key, and the like, and a user can generate key signal inputs related to user settings and function control of the mobile terminal 200 through key inputs. Further examples include indicator 292, motor 293, SIM card interface 295, etc.
The present exemplary embodiment provides an audio synchronization detection method. The audio synchronization detection method may be applied to the server 105, and may also be applied to one or more of the terminal devices 101, 102, and 103, which is not particularly limited in this exemplary embodiment. The audio synchronous detection method can be used for not only carrying out factory detection on the detected decoder, but also detecting whether the detected decoder is damaged or not.
Referring to fig. 3, the audio synchronization detecting method may include the following steps S310 to S330:
in step S310, the test audio is decoded based on the decoder under test to obtain first audio data.
In an exemplary embodiment, after the test audio for synchronization detection is obtained, the test audio may be decoded by the decoder under test to obtain the first audio data decoded by the decoder under test. The test audio may be directly acquired audio data, or audio data contained in video data, which is not particularly limited in the present disclosure; correspondingly, the decoder under test may be a decoder for decoding audio only, or a decoder for decoding video data. In addition, when the test audio is audio data contained in video data, the synchronization between the picture and the audio after video decoding can also be determined based on the audio synchronization detection result.
In step S320, a corresponding sample string to be tested is generated based on the first audio data, and a standard sample string corresponding to the test audio is obtained.
Wherein the standard sample character string is generated based on second audio data obtained by decoding the test audio with a standard decoder. Specifically, there are various ways to obtain the standard sample character string. For example, the second audio data obtained by decoding the test audio with the standard decoder may be stored in advance, and the standard sample character string is generated by sampling and converting the second audio data when audio synchronization detection is performed; alternatively, fixed sampling content may be set in advance and the standard sample character string corresponding to the test audio stored in advance, to be read and used directly when audio synchronization detection is performed; alternatively, when audio synchronization detection is performed, the test audio may be processed by the standard decoder and the decoder under test at the same time, so that the sample character string to be tested and the standard sample character string are obtained simultaneously. It should be noted that, as mentioned above, the test audio may be directly acquired audio data or audio data contained in video data; correspondingly, like the decoder under test, the standard decoder may be a decoder for decoding audio only, or a decoder for decoding video data.
Because the decoded data of the decoders is acquired for synchronization detection, rather than analog audio signals being acquired directly, the synchronization detection does not depend on hardware devices that capture audio and video hardware signals, which reduces the test cost. In addition, the obtained synchronization detection result can also assist in repairing or updating the decoder under test.
It should be added that a standard decoder generally refers to a decoder that can achieve fully synchronous decoding. However, in some scenarios, the standard decoder may also be an index for defining the requirement of the current scenario for the decoder, that is, the decoding result of the standard decoder is the target decoding result required in the current scenario, so that the decoding result of the standard decoder is compared with the decoding result of the decoder under test.
The standard sample character string is a character string generated correspondingly by a sample collected in second audio data obtained by decoding the test audio by a standard decoder; the character string of the sample to be detected is a character string generated correspondingly by the sample collected in the first audio data decoded by the decoder to be detected; the length of the standard sample character string is more than or equal to that of the sample character string to be detected. Specifically, in the subsequent process, a matching character string matching the sample character string to be detected needs to be determined in the standard sample character string. Therefore, the length of the standard sample character string can be larger than or equal to the length of the sample character string to be detected, so that at least one candidate character string exists in the standard sample character string, and the matching character string matched with the sample character string to be detected is further determined.
In an exemplary embodiment, when the corresponding sample character string to be tested is generated based on the first audio data and the standard sample character string corresponding to the test audio is acquired, N consecutive pieces of audio data of a preset length may be collected in the second audio data, based on a first sampling timestamp, as the standard sample unit; for the sample unit to be tested, N consecutive pieces of audio data of the preset length are located in the first audio data based on the same first sampling timestamp, and the M-th of these N pieces is taken as the sample unit to be tested; then, the sample character string to be tested corresponding to the sample unit to be tested and the standard sample character string corresponding to the standard sample unit are calculated according to a feature-to-string mapping rule. Here, N may be a positive integer, and M may be a positive integer less than or equal to N. By limiting the values of M and N in this way, the length of the standard sample unit is greater than or equal to that of the sample unit to be tested, so that the length of the resulting standard sample character string is greater than or equal to that of the sample character string to be tested; in addition, since the M-th preset-length piece is one of the N pieces, the standard sample unit can serve as a reference.
It should be noted that, in the case of synchronization, the matching string should be the mth preset length in the standard sample unit. Therefore, in order to make the difference between the positions of the matching character string and the mth preset length on the timeline visible under the asynchronous condition, the mth preset length can be located at the middle position of the N preset lengths as much as possible. For example, when N is 5, M may be 3; when N is 10, M may be 5 or 6.
Furthermore, since audio data is streamed, there is no explicit concept of units or frames. However, for the convenience of audio algorithm processing, the multimedia resource is usually packaged as a frame according to a certain time length, for example, 2.5ms to 60ms of audio data may be taken as a frame, so that the preset length may be set according to the time length, for example, a frame may be taken. Correspondingly, different application scenarios have different requirements for synchronization of the decoder, so different preset lengths can be set according to the application scenarios of the decoder.
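To make this first-timestamp sampling scheme concrete, the following Python sketch shows how the standard sample unit and the sample unit to be tested could be cut out of the two decoders' decoded PCM output. It is only an illustration: the sample rate, the 25 ms frame length, the mono one-value-per-sample PCM layout and all function names are assumptions, not definitions taken from this disclosure.

```python
# Illustrative sketch only: frame-based sampling around a first sampling timestamp.
# SAMPLE_RATE and the PCM layout (mono, one integer per sample) are assumptions.
SAMPLE_RATE = 48000          # samples per second (assumed)
FRAME_MS = 25                # one "preset length" = one frame of 25 ms (assumed)
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000

def standard_sample_unit(second_pcm, first_ts_ms, n=5):
    """Take N consecutive frames of the standard decoder's output, starting at the timestamp."""
    start = SAMPLE_RATE * first_ts_ms // 1000
    return second_pcm[start:start + n * FRAME_LEN]

def sample_unit_under_test(first_pcm, first_ts_ms, n=5, m=3):
    """Take the M-th of the same N frames from the output of the decoder under test."""
    start = SAMPLE_RATE * first_ts_ms // 1000 + (m - 1) * FRAME_LEN
    return first_pcm[start:start + FRAME_LEN]
```

With N = 5 and M = 3, the unit under test is the middle frame of the five-frame standard unit, in line with the guidance above on placing M near the middle of N.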
In an exemplary embodiment, when the corresponding sample character string to be tested is generated based on the first audio data and the standard sample character string corresponding to the test audio is acquired, sampling and conversion may be performed based on the second sampling time stamp, respectively. Specifically, the first audio data may be sampled based on the second sampling timestamp to obtain a sample unit to be detected, then the second audio data may be sampled based on the second sampling timestamp and the unit length of the sample unit to be detected to obtain a standard sample unit with a length greater than or equal to that of the sample unit to be detected, and then the sample unit to be detected and the standard sample unit may be converted according to the same feature-character string mapping rule to obtain a sample character string to be detected after conversion of the sample unit to be detected and a standard sample character string after conversion of the standard sample unit.
The feature-to-string mapping rule can be customized in different ways, as long as the rule marks the characteristic of each numerical value. For example, the mapping may be performed according to the sign of the numerical value corresponding to the audio data: when the value is positive, the corresponding character is 1; when the value is negative or zero, the corresponding character is 0. The sample character string to be tested and the standard sample character string are then calculated based on this rule. The numerical value corresponding to the audio data refers to the sampling value acquired by the sound card when sampling at a specific frequency.
For example, referring to the audio segment shown in fig. 4, the segment may be mapped to the character string "111111111100010100000000111110 … …" based on the above feature-to-string mapping rule, in which positive values map to 1 and negative or zero values map to 0.
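A minimal sketch of this particular feature-to-string rule (the sign of each sampled value) might look as follows; the function name and the plain list-of-integers input are illustrative assumptions.

```python
def to_sample_string(pcm_samples):
    """Map each sampled value to '1' if positive and '0' if zero or negative,
    following the feature-to-string rule described above."""
    return "".join("1" if v > 0 else "0" for v in pcm_samples)

# Example: to_sample_string([312, 87, -5, 0, 44]) -> "11001"
```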
Note that, when different decoders decode the same audio, the decoding is synchronized if the audio data corresponding to the same timestamp is the same. Therefore, when the standard sample unit is collected, whether the decoding is synchronized can be determined more quickly if the timestamps covered by the collected standard sample unit cover the timestamps corresponding to the sample unit to be tested. For this reason, the standard sample unit may also be collected based on the second sampling timestamp.
In an exemplary embodiment, when the sample unit to be tested is acquired based on the second sampling timestamp, the first audio data with a preset length may be intercepted as the sample unit to be tested, starting from the second sampling timestamp. As with the sampling based on the first sampling timestamp, the preset length may also be set in terms of time duration, for example, one frame may be taken.
When the second sampling time stamp is used as a starting point and the first audio data with the preset length is intercepted as the sample unit to be measured, in order to enable the standard sample unit to have a reference function, the standard sample unit can be acquired on the basis of the second sampling time stamp when being acquired.
Specifically, the second sampling timestamp of the sample unit to be tested may be taken as a starting point, and second audio data with the same unit length as the sample unit to be tested may be collected as a sampling reference, that is, the sampling reference covers the same timestamp range as the sample unit to be tested; then, taking the unit length of the sample unit to be tested as the length standard, the forward length to be intercepted forwards and the backward length to be intercepted backwards for the standard sample unit on the timeline are respectively calculated; then, for the forward length, taking the start timestamp of the sampling reference as a starting point, second audio data of the forward length is intercepted forwards along the timeline as a forward sample; for the backward length, taking the end timestamp of the sampling reference as a starting point, second audio data of the backward length is intercepted backwards along the timeline as a backward sample; and the sampling reference, the forward sample and the backward sample are then spliced in timeline order to obtain a standard sample unit that is continuous in time.
It should be noted that the lengths of the forward length and the backward length may be the same or different.
For example, if the second sampling timestamp is 75ms and the preset time duration is 25ms, the corresponding timestamp range of the sample unit to be detected in the first audio data is 75ms-100 ms; correspondingly, the corresponding time stamp range of the sampling reference in the second audio data is 75ms-100 ms; then, on the basis of the unit length of a sample unit to be detected being 25ms, the forward length and the backward length are both set to be 2 times of the unit length, namely 50 ms; at this time, the corresponding timestamp range of the forward samples intercepted based on the forward length in the second audio data is 25ms-75ms, and the corresponding timestamp range of the backward samples intercepted based on the backward length in the second audio data is 100ms-150 ms; and then splicing the forward sample, the sampling reference and the backward sample to obtain a corresponding time stamp range of the standard sample unit in the second audio data, wherein the time stamp range is 25ms-150 ms.
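The timestamp arithmetic of this example can be sketched as follows; the helper name and the choice of expressing everything in milliseconds are assumptions made for illustration only.

```python
def standard_unit_range_ms(second_ts_ms, unit_ms, fwd_mult=2, back_mult=2):
    """Return the (start, end) timestamp range, in ms, covered by the standard sample unit.
    The sampling reference spans [second_ts, second_ts + unit_ms]; forward and backward
    samples of fwd_mult / back_mult unit lengths are spliced around it."""
    ref_start, ref_end = second_ts_ms, second_ts_ms + unit_ms
    return ref_start - fwd_mult * unit_ms, ref_end + back_mult * unit_ms

# With the worked example above: second_ts = 75 ms, unit = 25 ms, both multiples = 2
# standard_unit_range_ms(75, 25) -> (25, 150), matching the 25 ms - 150 ms range described.
```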
By the method for collecting the sample unit to be detected and the standard sample unit, the standard sample unit can keep samples with a certain length before and after the time stamp range on the premise that the time stamp range corresponding to the second audio data of the standard sample unit covers the time stamp range corresponding to the first audio data of the sample unit to be detected, so that whether the decoding result of the decoder to be detected deviates backwards or forwards can be determined conveniently.
It should be noted that, when testing, a plurality of first sampling time stamps or a plurality of second sampling time stamps can be designed to perform multi-point sampling, and the above-mentioned sampling process is performed on the plurality of first sampling time stamps or the plurality of second sampling time stamps respectively, so that the decoding result of the tested decoder is evaluated integrally, and a more accurate synchronous detection result is obtained.
In step S330, a matching character string matching the sample character string to be tested is determined in the standard character string, and an audio synchronization detection result of the decoder under test is determined according to the matching character string.
The matching character string refers to the character string that is most similar to the sample character string to be tested, among the character strings of the same length as the sample character string to be tested that can be determined in the standard sample character string. For example, assume that a sample unit to be tested is mapped to a 10-bit character string 0100110100, and the corresponding standard sample unit is mapped to a 50-bit character string. A total of 41 consecutive 10-bit character strings can be determined in the standard sample character string, and the one with the highest similarity to 0100110100 among these 41 character strings is determined to be the matching character string.
In an exemplary embodiment, when determining the matching character string in the standard sample character string, comparison character strings with the same length as the sample character string to be tested may be obtained bit by bit in the standard sample character string; then, for each comparison character string, the matching degree between the comparison character string and the sample character string to be tested is calculated, and the comparison character string with the largest matching degree is determined as the matching character string corresponding to the sample character string to be tested.
It should be noted that the matching degree between character strings can be determined as the ratio of the number of positions at which the characters are identical to the total number of characters in the string. For example, if a character string A has 10 characters, and the 2nd to 9th characters of another character string B are the same as the 2nd to 9th characters of A, the matching degree between A and B is 80%.
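A simple sketch of the bit-by-bit comparison and matching-degree calculation described above might look like this; the function names are illustrative assumptions, and the similarity measure is the per-position character-match ratio defined in the preceding paragraph.

```python
def matching_degree(a, b):
    """Fraction of positions at which two equal-length strings carry the same character."""
    assert len(a) == len(b)
    return sum(x == y for x, y in zip(a, b)) / len(a)

def best_match(standard_str, test_str):
    """Slide a window of len(test_str) over the standard sample string bit by bit and
    return (offset, degree) of the comparison string with the highest matching degree."""
    best_off, best_deg = 0, -1.0
    for off in range(len(standard_str) - len(test_str) + 1):
        deg = matching_degree(standard_str[off:off + len(test_str)], test_str)
        if deg > best_deg:
            best_off, best_deg = off, deg
    return best_off, best_deg

# A 10-character test string against a 50-character standard string yields 41 candidate
# comparison strings, as in the example above.
```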
In an exemplary embodiment, when determining the synchronization detection result of the decoder under test according to the matching string, the synchronization detection result may be determined according to a relationship between a matching degree of the matching string and a threshold of the matching degree.
Specifically, in one case, when the matching degree of the matching character string is smaller than the matching degree threshold, no character string within the standard sample character string matches the sample character string to be tested. This may be because the decoding result of the decoder under test is severely distorted, or because the decoding result of the decoder under test has a very large delay or advance relative to the decoding result of the standard decoder; that is, it can be determined that the decoding result of the decoder under test is out of synchronization or distorted. Out of synchronization here means that the decoding results of the decoder under test and the standard decoder are completely unsynchronized.
In addition, on the premise that the decoding result is not distorted, the synchronization between the decoding results of the tested decoder and the standard decoder in an ideal case means that the corresponding characters of the two decoding results on the same timestamp should be consistent. In this case, the matching character string should have the same timestamp as the corresponding sample character string to be tested, and the matching degree of the matching character string and the sample character string should be 100%. However, in a non-ideal case, individual characters in originally identical character strings may be different due to various errors and the like, and therefore, the tolerance of different application scenarios to the errors and the like may be limited by setting a matching degree threshold. For example, in some application scenarios, a higher accuracy synchronization is required, and therefore the matching degree threshold may be set to a larger value; in other scenarios where synchronization accuracy is less critical, the threshold value of the degree of match may be set to a smaller value.
It should be noted that when the matching degree threshold is set too small, the synchronization detection may become ineffective, so a prior test may be performed to determine a reasonable value range for the matching degree threshold. Within that range, different values may then be chosen when the matching degree threshold is determined.
In another case, when the matching degree of the matching character string is greater than or equal to the matching degree threshold, a character string matching the sample character string to be tested is found within the standard sample character string. In this case, the audio data corresponding to the matching character string may be determined in the second audio data, and a first time period is determined according to the position of that audio data in the second audio data; the audio data corresponding to the sample character string to be tested is determined in the first audio data, and a second time period is determined according to the position of that audio data in the first audio data; then, the synchronization condition of the decoding result of the decoder under test is further determined based on the first time period and the second time period.
In an exemplary embodiment, when determining the synchronization of the decoding results of the decoder under test according to the first time period and the second time period, it may be determined whether the positions of the first time period and the second time period on the timeline overlap, and different results may be determined according to whether the positions of the first time period and the second time period overlap. Specifically, when the positions of the first time segment and the second time segment on the time line are not overlapped, it is determined that a deviation error exists in the decoding result of the decoder under test (refer to case 1 in fig. 5, and the dark color mark in fig. 5 is an overlapped part); when the positions of the first time segment and the second time segment on the time line are overlapped, the synchronization condition can be further determined according to the overlapping state.
In a non-ideal situation, some error may be allowed in the position of the string under test and the matching string on the timeline. For example, the first time period is 75ms-100ms, and the second time period is 76ms-101ms, and this small error has little influence on the synchronization and can be allowed in some application scenarios. Based on this, when the unit length of the sample unit to be measured is one frame, the one frame may be used as the size of the error, that is, when there is an overlap between the time period corresponding to the character string to be measured and the matching character string, the error may be used as an offset error, specifically, the offset error may include a pre-offset error and a post-offset error; when the character string to be measured and the matching character string do not correspond to each other in the same time period, the error can be regarded as a deviation error.
By defining the deviation error, the pre-offset error and the post-offset error, the errors in the decoding result of the decoder under test can be classified by degree: a deviation error is larger than an offset error, that is, it indicates worse synchronization.
In an exemplary embodiment, when the positions of the first time period and the second time period on the timeline overlap, the synchronization detection result for the decoding result of the decoder under test can be determined from the overlapping condition. Specifically, when the position of the first time period on the timeline completely overlaps the second time period, it is determined that the decoding result of the decoder under test is fully synchronized (refer to case 2 in fig. 5); when the first time period overlaps the second time period and lies before it on the timeline, it is determined that the decoding result of the decoder under test is synchronized but has a pre-offset error (refer to case 3 in fig. 5); when the first time period overlaps the second time period and lies behind it on the timeline, it is determined that the decoding result of the decoder under test is synchronized but has a post-offset error (refer to case 4 in fig. 5).
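The four cases can be summarized in a small classification helper such as the following sketch; the tuple representation of the time periods and the returned labels are illustrative assumptions.

```python
def classify_overlap(first_period, second_period):
    """Classify the synchronization result from the first time period (matching string,
    standard audio) and the second time period (sample string under test, tested audio),
    both given as (start_ms, end_ms) of equal length, per the four cases above."""
    (s1, e1), (s2, e2) = first_period, second_period
    if e1 <= s2 or e2 <= s1:          # case 1: no overlap on the timeline
        return "deviation error"
    if (s1, e1) == (s2, e2):          # case 2: complete overlap
        return "fully synchronized"
    if s1 < s2:                       # case 3: first period precedes the second
        return "synchronized, pre-offset error"
    return "synchronized, post-offset error"   # case 4
```

For instance, classify_overlap((75, 100), (76, 101)) returns "synchronized, pre-offset error", matching the small-error example discussed earlier.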
In an exemplary embodiment, based on the above process of determining the synchronization detection result, it can be seen that distinguishing the pre-offset error and the post-offset error from the deviation error depends mainly on the length of the sample character string to be tested (i.e., whether the first time period and the second time period overlap). Therefore, when the length of the sample character string to be tested is set, the synchronization requirement can be set according to different application scenarios. For example, when the length of the sample character string to be tested is set longer, the range defined as an offset error (pre-offset or post-offset error) in the current scenario is larger; when the length is set shorter, that range is smaller.
As shown in fig. 6, the technical solution of the embodiment of the present disclosure is described in detail below by taking an example where N is 5, M is 3, the preset length is one frame, and the matching degree threshold is 95%:
in step S601, 5 frames of second audio data are acquired as standard sample units.
The test audio is decoded with the standard decoder to obtain the corresponding second audio frame data, sampling points are designated based on the first sampling timestamps, and 5 consecutive frames of audio data are collected at each sampling point to obtain the standard sample unit corresponding to that sampling point.
Step S603, acquiring 1 frame of first audio data as a sample unit to be measured.
The test audio is decoded with the decoder under test to obtain the corresponding first audio frame data, and sampling points are designated based on the same first sampling timestamps, with 5 consecutive frames of audio data located at each sampling point; for each sampling point, the 3rd of the 5 consecutive frames is taken as the sample unit to be tested corresponding to that sampling point.
Step S605, calculating a sample character string to be detected corresponding to the sample unit to be detected and a standard sample character string corresponding to the standard sample unit according to the feature-character string mapping rule.
A character string is generated according to the sign of the value corresponding to each feature point in the audio data. Specifically, a positive value is recorded as "1" and a negative or zero value as "0", so that the sample unit to be tested and the standard sample unit can each be converted into a character string of the form "111111111100000100000000111110 … …".
Step S607, determining a matching character string matching the sample character string to be detected in the standard character string.
And for each sampling point, sequentially intercepting comparison character strings with the same length as the character strings of the sample to be detected bit by bit in the standard character strings, calculating the matching degree of each comparison character string and the character strings of the sample to be detected, and determining the comparison character string with the maximum matching degree as the matching character string corresponding to the character string of the sample to be detected.
Step S609, judging whether the matching degree of the matching character string reaches the matching degree threshold of 95%.
Step S611, when the matching degree of the matching character string is less than 95%, it is determined that the decoding result of the tested decoder is not synchronous or has distortion.
Step S613, when the matching degree of the matching character string is greater than or equal to 95%, reading a first time period corresponding to the matching character string and a second time period corresponding to the sample character string to be detected.
Step S615, determining the overlapping condition of the first time period and the second time period.
In step S617, when the first time period does not overlap with the second time period, it is determined that a deviation error exists in the decoding result of the tested decoder.
Step S619, when the first time period overlaps the second time period and lies before it on the timeline, it is determined that the decoding result of the decoder under test is synchronized but has a pre-offset error.
Step S621, when the first time period and the second time period completely overlap, it is determined that the decoding results of the decoder under test are completely synchronized.
And step S623, when the first time period overlaps the second time period and lies behind it on the timeline, it is determined that the decoding result of the decoder under test is synchronized but has a post-offset error.
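The steps S601 to S623 above can be strung together in a single sketch for one sampling point. This is only an illustrative outline under the stated example parameters (N = 5, M = 3, one 25 ms frame per unit, 95% threshold); the sample rate, the PCM layout and all identifiers are assumptions rather than part of this disclosure.

```python
# Minimal end-to-end sketch of steps S601-S623 for one sampling point (assumptions noted above).
SAMPLE_RATE = 48000
FRAME_MS = 25
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000   # samples per frame

def to_string(pcm):
    # Feature-to-string rule: positive -> '1', zero or negative -> '0'
    return "".join("1" if v > 0 else "0" for v in pcm)

def detect_at(first_pcm, second_pcm, ts_ms, n=5, m=3, threshold=0.95):
    start = SAMPLE_RATE * ts_ms // 1000
    std_str = to_string(second_pcm[start:start + n * FRAME_LEN])          # S601 + S605
    test_str = to_string(first_pcm[start + (m - 1) * FRAME_LEN:
                                   start + m * FRAME_LEN])                # S603 + S605
    # S607: bit-by-bit sliding window over the standard sample string
    best_off, best_deg = max(
        ((off, sum(a == b for a, b in zip(std_str[off:off + len(test_str)], test_str))
                 / len(test_str))
         for off in range(len(std_str) - len(test_str) + 1)),
        key=lambda item: item[1])
    if best_deg < threshold:                                              # S609 / S611
        return "out of sync or distorted"
    # S613: time periods (ms) of the matching string and the sample string under test
    first_period = (ts_ms + best_off * 1000 // SAMPLE_RATE,
                    ts_ms + (best_off + len(test_str)) * 1000 // SAMPLE_RATE)
    second_period = (ts_ms + (m - 1) * FRAME_MS, ts_ms + m * FRAME_MS)
    # S615-S623: overlap classification
    if first_period[1] <= second_period[0] or second_period[1] <= first_period[0]:
        return "deviation error"                                          # S617
    if first_period == second_period:
        return "fully synchronized"                                       # S621
    if first_period[0] < second_period[0]:
        return "synchronized, pre-offset error"                           # S619
    return "synchronized, post-offset error"                              # S623
```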
In summary, in the present exemplary embodiment, the synchronization detection process is converted into judging whether character strings match and whether their corresponding time periods overlap, so that synchronization detection of a multimedia decoder can be achieved purely in software, which effectively saves manpower and removes the dependence on a hardware signal acquisition device. In addition, through the classification of errors, the synchronization deviation and offset of the multimedia decoder can be quantitatively calculated, which facilitates batch automated testing for multimedia synchronization detection. Moreover, when the audio data is contained in a video, if the audio and the picture in the decoding result of the standard decoder (a video decoder) are fully synchronized or meet the synchronization requirement, and the frame rates of the pictures decoded by the two decoders are the same, then performing synchronization detection on the decoding result of the decoder under test (a video decoder) also yields the synchronization condition between the audio and the picture when the decoder under test performs video decoding.
It is noted that the above-mentioned figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
Further, referring to fig. 7, an audio synchronization detecting apparatus 700 is further provided in the present exemplary embodiment, and includes an audio decoding module 710, a character string generating module 720, and a result determining module 730. Wherein:
the audio decoding module 710 may be configured to decode the test audio based on the decoder under test to obtain first audio data.
The character string generating module 720 may be configured to generate a corresponding sample character string to be tested based on the first audio data, and obtain a standard sample character string corresponding to the test audio; the standard sample character string is generated based on second audio data obtained by decoding the test audio by a standard decoder; the length of the standard sample character string is greater than or equal to that of the sample character string to be detected.
The result determining module 730 may be configured to determine a matching character string matching the sample character string to be tested in the standard character string, and determine an audio synchronization detection result of the decoder under test according to the matching character string.
In an exemplary embodiment, the character string generating module 720 may be configured to acquire N consecutive audio data of a preset length in the second audio data as standard sample units based on the first sampling time stamp; wherein N is a positive integer; acquiring continuous N audio data with preset lengths in the first audio data based on the first sampling timestamp, and taking the Mth audio data with the preset length as a sample unit to be detected; wherein M is a positive integer less than or equal to N; and calculating a to-be-detected sample character string corresponding to the to-be-detected sample unit and a standard sample character string corresponding to the standard sample unit according to the feature-character string mapping rule.
In an exemplary embodiment, the character string generating module 720 may be configured to sample the first audio data based on the second sampling timestamp to obtain a sample unit to be tested; sampling the second audio data based on the second sampling time stamp and the unit length of the sample unit to be detected to obtain a standard sample unit; the length of the standard sample unit is greater than that of the sample unit to be detected; and calculating a to-be-detected sample character string corresponding to the to-be-detected sample unit and a standard sample character string corresponding to the standard sample unit according to the feature-character string mapping rule.
In an exemplary embodiment, the character string generating module 720 may be configured to intercept the first audio data with a preset length as a sample unit to be tested, with the second sampling timestamp as a starting point.
In an exemplary embodiment, the character string generating module 720 may be configured to intercept, as a sampling reference, second audio data with a length equal to the unit length of the sample unit to be tested, starting from the second sampling timestamp; calculate the interception lengths of the standard sample unit based on the unit length of the sample unit to be tested, the interception lengths including a forward length and a backward length; taking the start timestamp of the sampling reference as a starting point, intercept second audio data of the forward length forwards along the timeline as a forward sample; taking the end timestamp of the sampling reference as a starting point, intercept second audio data of the backward length backwards along the timeline as a backward sample; and splice the forward sample, the sampling reference and the backward sample in timeline order to obtain the standard sample unit.
In an exemplary embodiment, the result determining module 730 may be configured to sequentially obtain comparison character strings with the same length as the sample character string to be detected bit by bit in the standard character string, and calculate a matching degree between each comparison character string and the sample character string to be detected; and determining the comparison character string with the maximum matching degree as a matching character string corresponding to the sample character string to be detected.
In an exemplary embodiment, the result determining module 730 may be configured to determine that the decoding result of the decoder under test is not synchronous or has distortion when the matching degree of the matching character string is smaller than the threshold matching degree; when the matching degree of the matching character string is larger than or equal to the threshold value of the matching degree, reading a first time period corresponding to the matching character string in the second audio data and a second time period corresponding to the sample character string to be detected in the first audio data; and determining the synchronization condition of the decoding result of the decoder to be tested according to the first time period and the second time period.
In an exemplary embodiment, the result determining module 730 may be configured to determine a synchronization condition according to an overlapping status when there is an overlap between positions of the first time period and the second time period on the timeline; and when the positions of the first time segment and the second time segment on the time line are not overlapped, determining that the decoding result of the tested decoder has deviation error.
In an exemplary embodiment, the result determining module 730 may be configured to determine that the decoding result of the decoder under test is fully synchronized when the position of the first time period on the timeline completely overlaps the second time period; determine that the decoding result of the decoder under test is synchronized but has a pre-offset error when the first time period overlaps the second time period and lies before it on the timeline; and determine that the decoding result of the decoder under test is synchronized but has a post-offset error when the first time period overlaps the second time period and lies behind it on the timeline.
The specific details of each module in the above apparatus have been described in detail in the method section, and details that are not disclosed may refer to the method section, and thus are not described again.
As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, method or program product. Accordingly, various aspects of the present disclosure may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," "module" or "system."
Exemplary embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, various aspects of the disclosure may also be implemented in the form of a program product including program code for causing a terminal device to perform the steps according to various exemplary embodiments of the disclosure described in the above-mentioned "exemplary methods" section of this specification, when the program product is run on the terminal device, for example, any one or more of the steps in fig. 3 or fig. 6 may be performed.
It should be noted that the computer readable media shown in the present disclosure may be computer readable signal media or computer readable storage media or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Furthermore, program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the terms of the appended claims.

Claims (12)

1. An audio synchronization detection method, comprising:
decoding test audio based on a decoder under test to obtain first audio data;
generating a corresponding sample character string to be detected based on the first audio data, and acquiring a standard sample character string corresponding to the test audio;
wherein the standard sample character string is generated based on second audio data obtained by decoding the test audio with a standard decoder, and the length of the standard sample character string is greater than or equal to that of the sample character string to be detected;
and determining, in the standard sample character string, a matching character string that matches the sample character string to be detected, and determining an audio synchronization detection result of the decoder under test according to the matching character string.
2. The method of claim 1, wherein the generating a corresponding sample character string to be detected based on the first audio data and acquiring a standard sample character string corresponding to the test audio comprises:
acquiring, from the second audio data and based on a first sampling time stamp, N consecutive pieces of audio data of a preset length as standard sample units, wherein N is a positive integer;
acquiring, from the first audio data and based on the first sampling time stamp, N consecutive pieces of audio data of the preset length, and taking the Mth piece of audio data of the preset length as a sample unit to be detected, wherein M is a positive integer less than or equal to N;
and calculating, according to a feature-to-character-string mapping rule, a sample character string to be detected corresponding to the sample unit to be detected and a standard sample character string corresponding to the standard sample unit.
3. The method of claim 1, wherein the generating a corresponding sample character string to be detected based on the first audio data and acquiring a standard sample character string corresponding to the test audio comprises:
sampling the first audio data based on a second sampling time stamp to obtain a sample unit to be detected;
sampling the second audio data based on the second sampling time stamp and a unit length of the sample unit to be detected to obtain a standard sample unit, wherein the length of the standard sample unit is greater than that of the sample unit to be detected;
and calculating, according to a feature-to-character-string mapping rule, a sample character string to be detected corresponding to the sample unit to be detected and a standard sample character string corresponding to the standard sample unit.
4. The method of claim 3, wherein the sampling the first audio data based on the second sampling time stamp to obtain a sample unit to be detected comprises:
intercepting, with the second sampling time stamp as a starting point, first audio data of a preset length as the sample unit to be detected.
5. The method of claim 4, wherein the sampling the second audio data based on the second sampling time stamp and the unit length of the sample unit to be detected to obtain a standard sample unit comprises:
taking the second sampling time stamp as a starting point, and intercepting second audio data of a length equal to the unit length of the sample unit to be detected as a sampling reference;
calculating an interception length of the standard sample unit based on the unit length of the sample unit to be detected, wherein the interception length comprises a forward length and a backward length;
taking the start time stamp of the sampling reference as a starting point, and intercepting, forward along the timeline, second audio data of a length equal to the forward length as a forward sample;
taking the end time stamp of the sampling reference as a starting point, and intercepting, backward along the timeline, second audio data of a length equal to the backward length as a backward sample;
and splicing the forward sample, the sampling reference and the backward sample in timeline order to obtain the standard sample unit.
6. The method according to claim 1, wherein the determining, in the standard sample character string, a matching character string that matches the sample character string to be detected comprises:
sequentially acquiring, bit by bit in the standard sample character string, comparison character strings of the same length as the sample character string to be detected, and calculating a matching degree between each comparison character string and the sample character string to be detected;
and determining the comparison character string with the highest matching degree as the matching character string corresponding to the sample character string to be detected.
7. The method of claim 6, wherein the determining an audio synchronization detection result of the decoder under test according to the matching character string comprises:
when the matching degree of the matching character string is smaller than a matching degree threshold, determining that the decoding result of the decoder under test is out of synchronization or distorted;
when the matching degree of the matching character string is greater than or equal to the matching degree threshold, determining a first time period according to the position, in the second audio data, of the audio data corresponding to the matching character string, and determining a second time period according to the position, in the first audio data, of the audio data corresponding to the sample character string to be detected;
and determining the synchronization condition of the decoding result of the decoder under test according to the first time period and the second time period.
8. The method of claim 6, wherein the determining the synchronization condition of the decoding result of the decoder under test according to the first time period and the second time period comprises:
when the positions of the first time period and the second time period overlap on the timeline, determining the synchronization condition according to the overlap state;
and when the positions of the first time period and the second time period do not overlap on the timeline, determining that the decoding result of the decoder under test has an offset error.
9. The method of claim 8, wherein the determining the synchronization condition according to the overlap state comprises:
when the position of the first time period on the timeline completely overlaps the second time period, determining that the decoding result of the decoder under test is completely synchronous;
when the first time period overlaps the second time period and lies before it on the timeline, determining that the decoding result of the decoder under test is synchronous with a forward error;
and when the first time period overlaps the second time period and lies after it on the timeline, determining that the decoding result of the decoder under test is synchronous with a backward error.
10. An audio synchronization detecting apparatus, comprising:
an audio decoding module, configured to decode test audio based on a decoder under test to obtain first audio data;
a character string generating module, configured to generate a corresponding sample character string to be detected based on the first audio data and acquire a standard sample character string corresponding to the test audio, wherein the standard sample character string is generated based on second audio data obtained by decoding the test audio with a standard decoder, and the length of the standard sample character string is greater than or equal to that of the sample character string to be detected;
and a result determining module, configured to determine, in the standard sample character string, a matching character string that matches the sample character string to be detected, and determine an audio synchronization detection result of the decoder under test according to the matching character string.
11. A computer-readable medium, on which a computer program is stored which, when executed by a processor, implements the method according to any one of claims 1 to 9.
12. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method of any of claims 1-9 via execution of the executable instructions.
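As an illustration of the sampling scheme of claims 3 to 5, the sketch below cuts a sample unit to be detected from the first audio data and splices a forward sample, the sampling reference and a backward sample from the second audio data into a longer standard sample unit. It is only a sketch under stated assumptions: the decoded audio is modelled as NumPy arrays of PCM samples, and the forward and backward lengths default to half a unit each, since the claims leave the concrete interception rule open.

```python
import numpy as np

def cut_sample_units(first_audio, second_audio, sample_rate, t2, unit_seconds,
                     forward_seconds=None, backward_seconds=None):
    """Cut the sample unit to be detected and the (longer) standard sample unit.

    first_audio, second_audio: 1-D arrays of decoded PCM samples
    t2:           second sampling time stamp, in seconds
    unit_seconds: preset unit length, in seconds
    """
    unit_len = int(unit_seconds * sample_rate)
    start = int(t2 * sample_rate)

    # Sample unit to be detected: preset length starting at the time stamp.
    unit_to_detect = first_audio[start:start + unit_len]

    # Interception lengths for the standard unit (assumption: half a unit each way).
    fwd = int((unit_seconds / 2 if forward_seconds is None else forward_seconds) * sample_rate)
    bwd = int((unit_seconds / 2 if backward_seconds is None else backward_seconds) * sample_rate)

    # Sampling reference, forward sample and backward sample from the standard decode.
    reference = second_audio[start:start + unit_len]
    forward_sample = second_audio[max(0, start - fwd):start]
    backward_sample = second_audio[start + unit_len:start + unit_len + bwd]

    # Standard sample unit: forward sample + reference + backward sample in timeline order.
    standard_unit = np.concatenate([forward_sample, reference, backward_sample])
    return unit_to_detect, standard_unit
```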
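The feature-to-character-string mapping and the bit-by-bit window matching of claims 6 and 7 might be sketched as follows. The mapping chosen here (quantising per-frame energy to one hexadecimal character) and the matching-degree formula (fraction of equal positions) are placeholder assumptions; the disclosure does not fix a concrete mapping rule, matching measure, or threshold.

```python
import numpy as np

def to_sample_string(unit, sample_rate, frame_seconds=0.01):
    """Map an audio unit to a character string: one hex digit per frame (assumed rule)."""
    frame_len = max(1, int(frame_seconds * sample_rate))
    chars = []
    for i in range(len(unit) // frame_len):
        frame = unit[i * frame_len:(i + 1) * frame_len].astype(np.float64)
        energy = float(np.mean(frame ** 2))          # assumes samples normalised to [-1, 1]
        chars.append(format(min(15, int(energy * 16)), "x"))
    return "".join(chars)

def matching_degree(a, b):
    """Fraction of positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def best_match(sample_str, standard_str):
    """Slide a window of len(sample_str) over standard_str and keep the best-matching one."""
    best_pos, best_deg = 0, -1.0
    for pos in range(len(standard_str) - len(sample_str) + 1):
        deg = matching_degree(sample_str, standard_str[pos:pos + len(sample_str)])
        if deg > best_deg:
            best_pos, best_deg = pos, deg
    return best_pos, best_deg
```

If the best matching degree stays below the chosen threshold, the result would be reported as out of synchronization or distorted; otherwise the offset best_pos, together with the frame length, gives the first time period inside the second audio data, which can then be compared with the second time period as in the overlap classification sketched earlier.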
CN202111089134.1A 2021-09-16 2021-09-16 Audio synchronization detection method and device, computer readable medium and electronic equipment Pending CN113810680A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111089134.1A CN113810680A (en) 2021-09-16 2021-09-16 Audio synchronization detection method and device, computer readable medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111089134.1A CN113810680A (en) 2021-09-16 2021-09-16 Audio synchronization detection method and device, computer readable medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN113810680A true CN113810680A (en) 2021-12-17

Family

ID=78941420

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111089134.1A Pending CN113810680A (en) 2021-09-16 2021-09-16 Audio synchronization detection method and device, computer readable medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN113810680A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109842795A (en) * 2019-02-28 2019-06-04 苏州科达科技股份有限公司 Audio-visual synchronization performance test methods, device, electronic equipment, storage medium
CN112733636A (en) * 2020-12-29 2021-04-30 北京旷视科技有限公司 Living body detection method, living body detection device, living body detection apparatus, and storage medium
CN113163317A (en) * 2021-03-02 2021-07-23 广州朗国电子科技有限公司 Method, device and medium for testing audio playing performance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination