WO2020238777A1 - Audio segment matching method, device, computer-readable medium, and electronic equipment - Google Patents

Audio segment matching method, device, computer-readable medium, and electronic equipment

Info

Publication number
WO2020238777A1
Authority
WO
WIPO (PCT)
Prior art keywords
distance
candidate
cumulative
audio segment
target position
Prior art date
Application number
PCT/CN2020/091698
Other languages
English (en)
French (fr)
Inventor
林方超
云伟标
曾鹏
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to JP2021535923A (patent JP7337169B2, ja)
Priority to EP20815214.0A (patent EP3979241B1, en)
Publication of WO2020238777A1 (patent WO2020238777A1, zh)
Priority to US17/336,562 (patent US11929090B2, en)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/54 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for retrieval
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Definitions

  • This application relates to the field of computer and communication technologies, and in particular to a method, device, computer readable medium and electronic equipment for matching audio segments.
  • the embodiments of the present application provide an audio segment matching method, device, computer readable medium, and electronic equipment, which can improve the matching accuracy of audio segments.
  • an audio segment matching method including:
  • the degree of matching between the first audio segment and the second audio segment is determined according to the minimum distance.
  • an audio segment matching method including:
  • the server obtains the first feature sequence corresponding to the first audio segment and the second feature sequence corresponding to the second audio segment;
  • the server constructs a distance matrix between the first feature sequence and the second feature sequence, where elements in the distance matrix represent the distance between a first position point and a second position point, the first position point is in the first feature sequence, and the second position point is in the second feature sequence;
  • the server determines the degree of matching between the first audio segment and the second audio segment according to the minimum distance.
  • an audio segment matching device which includes:
  • An acquiring unit configured to acquire a first feature sequence corresponding to the first audio segment and a second feature sequence corresponding to the second audio segment;
  • the constructing unit is configured to obtain the first feature sequence and the second feature sequence from the acquiring unit, and construct a distance matrix between the first feature sequence and the second feature sequence, where elements in the distance matrix represent the distance between a first position point and a second position point, the first position point is in the first feature sequence, and the second position point is in the second feature sequence;
  • the processing unit is configured to obtain the distance matrix from the constructing unit; determine the first cumulative distance from the starting position to the target position in the distance matrix, and the second cumulative distance from the end position in the distance matrix to the target position; determine the minimum distance between the first feature sequence and the second feature sequence based on the first cumulative distance and the second cumulative distance; and determine the degree of matching between the first audio segment and the second audio segment according to the minimum distance.
  • the processing unit includes:
  • a determining subunit configured to determine an accumulated distance from the starting position to a first candidate position, the first candidate position being located between the starting position and the target position;
  • the determining subunit is further configured to determine the first candidate cumulative distance between the starting position and the target position according to the accumulated distance between the starting position and the first candidate position and the distance value represented by the first candidate position, and to determine the smallest value among the first candidate cumulative distances as the first cumulative distance.
  • the determining subunit is further configured to sum the accumulated distance and the distance value represented by the first candidate position to obtain the distance sum value corresponding to the first candidate position, where the accumulated distance is the distance between the starting position and the first candidate position;
  • the determining subunit is further configured to determine the distance sum value as the first candidate cumulative distance corresponding to the first candidate position.
  • the determining subunit is further configured to perform a weighted calculation on the distance value represented by the first candidate position, according to that distance value and the weight value corresponding to the first candidate position, to obtain the weighted distance value corresponding to the first candidate position;
  • the determining subunit is further configured to sum the cumulative distance between the starting position and the first candidate position and the weighted distance value corresponding to the first candidate position to obtain the distance sum value corresponding to the first candidate position, and to determine the distance sum value as the first candidate cumulative distance corresponding to the first candidate position.
  • the determining subunit is further configured to determine the distance between the first candidate position and the diagonal of the distance matrix, the diagonal being the straight line connecting the starting position and the end position; and to determine the weight value corresponding to each first candidate position according to the distance between that first candidate position and the diagonal.
  • there is an association relationship between the first candidate position and the target position, and the association relationship indicates that the first candidate position is located within a preset distance range around the target position.
  • the processing unit includes:
  • a determining subunit configured to determine the cumulative distance from the end position to a second candidate position, where the second candidate position is located between the target position and the end position;
  • the determining subunit is further configured to determine the second candidate cumulative distance between the end position and the target position according to the accumulated distance between the end position and the second candidate position and the distance value represented by the second candidate position, and to determine the minimum value of the second candidate cumulative distances as the second cumulative distance.
  • there is an association relationship between the second candidate position and the target position, and the association relationship indicates that the second candidate position is located within a preset distance range around the target position.
  • the processing unit includes:
  • the determining subunit is configured to determine the minimum cumulative distance corresponding to the target position based on the distance value represented by the target position, the first cumulative distance, and the second cumulative distance; and to select the minimum value from the minimum cumulative distances corresponding to the target positions and determine that minimum value as the minimum distance between the first feature sequence and the second feature sequence.
  • the determining subunit is further configured to sum the distance value represented by the target position, the first cumulative distance, and the second cumulative distance to obtain the minimum cumulative distance corresponding to the target position;
  • the determining subunit is further configured to perform a weighted calculation on the distance value represented by the target position and the weight value corresponding to the target position to obtain the weighted distance value corresponding to the target position, and to sum the weighted distance value corresponding to the target position, the first cumulative distance, and the second cumulative distance to obtain the minimum cumulative distance corresponding to the target position.
  • the first audio segment corresponds to n first feature sequences
  • the second audio segment corresponds to n second feature sequences
  • n is a positive integer
  • the acquiring unit is further configured to acquire n minimum distances between the n first feature sequences and the n second feature sequences;
  • the processing unit is further configured to perform a weighted sum calculation on the n minimum distances to obtain a weighted distance value between the first audio segment and the second audio segment, and to determine the degree of matching between the first audio segment and the second audio segment according to the weighted distance value.
  • the multiple characteristics include: pitch characteristics of the audio segment, musical tone energy, frequency cepstral coefficients, and root mean square energy value of each frame.
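  • The multi-feature combination described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names, the example weights, and in particular the mapping `1 / (1 + distance)` from weighted distance to matching degree are assumptions, since the text only states that the matching degree is determined according to the weighted distance value.

```python
def weighted_distance(min_distances, weights):
    """Weighted sum of the n per-feature minimum distances."""
    if len(min_distances) != len(weights):
        raise ValueError("one weight per feature sequence is required")
    return sum(w * dist for w, dist in zip(weights, min_distances))


def matching_degree(distance):
    """Hypothetical mapping (an assumption): 1.0 at distance 0,
    decreasing toward 0 as the weighted distance grows."""
    return 1.0 / (1.0 + distance)


# Illustrative values only: minimum distances for two features
# (for example pitch and energy) and their assumed weights.
min_distances = [2.0, 4.0]
weights = [0.5, 0.25]

total = weighted_distance(min_distances, weights)
score = matching_degree(total)
```

  • A smaller weighted distance yields a higher score under this assumed mapping; any other monotonically decreasing mapping would serve the same purpose.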
  • a computer-readable medium having a computer program stored thereon, and the computer program, when executed by a processor, implements the audio segment matching method described in the foregoing embodiment.
  • an electronic device including: one or more processors; and a storage device configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the audio segment matching method described in the foregoing embodiment.
  • a computer program product is provided.
  • when the computer program product runs on a computer, the computer executes the audio segment matching method provided in the above-mentioned embodiments of the present application.
  • the minimum distance between the first feature sequence and the second feature sequence is obtained from the first cumulative distance and the second cumulative distance, so that the minimum distance between the two feature sequences is calculated comprehensively from two directions (i.e., from the starting position of the distance matrix to the target position, and from the end position of the distance matrix to the target position). The matching relationship of the feature sequences in both directions is thereby taken into account, so the calculated minimum distance between the two feature sequences is more accurate, which helps improve the accuracy of audio segment matching.
  • Fig. 1 shows a schematic diagram of a system architecture provided by an exemplary embodiment of the present application
  • Fig. 2 shows a flowchart of an audio segment matching method provided by an exemplary embodiment of the present application
  • Fig. 3 shows a schematic diagram of a distance matrix provided by an exemplary embodiment of the present application
  • FIG. 4 shows a flowchart of calculating the first cumulative distance provided by an exemplary embodiment of the present application
  • Fig. 5 shows a schematic diagram of the calculation direction of calculating the accumulated distance from the starting position of the distance matrix provided by an exemplary embodiment of the present application
  • Fig. 6 shows a flowchart of calculating a second cumulative distance provided by an exemplary embodiment of the present application
  • FIG. 7 shows a schematic diagram of the calculation direction of calculating the accumulated distance from the end position of the distance matrix provided by an exemplary embodiment of the present application
  • FIG. 8 shows a flowchart of calculating the minimum distance between the first feature sequence and the second feature sequence provided by an exemplary embodiment of the present application
  • FIG. 9 shows a flowchart for determining the degree of matching between a first audio segment and a second audio segment provided by an exemplary embodiment of the present application
  • Fig. 10 shows a flowchart of a humming scoring method provided by an exemplary embodiment of the present application
  • FIG. 11 shows a schematic diagram of a calculated global optimal path provided by an exemplary embodiment of the present application.
  • Fig. 12 shows a block diagram of an audio segment matching device provided by an exemplary embodiment of the present application
  • Fig. 13 shows a schematic structural diagram of a computer system of an electronic device provided by an exemplary embodiment of the present application.
  • FIG. 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of the present application can be applied.
  • the system architecture may include terminal devices (one or more of the smart phone 101, the tablet computer 102, and the portable computer 103 shown in FIG. 1; a desktop computer or the like is also possible), a network 104, and a server 105.
  • the network 104 is used as a medium for providing a communication link between the terminal device and the server 105.
  • the network 104 may include various connection types, such as wired communication links, wireless communication links, and so on.
  • the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. According to implementation needs, there can be any number of terminal devices, networks and servers.
  • the server 105 may be a server cluster composed of multiple servers.
  • the user can use the terminal device to upload the first audio segment (for example, a piece of audio sung by the user) to the server 105. After obtaining the first audio segment uploaded by the terminal device, the server 105 can extract the first feature sequence corresponding to the first audio segment, obtain the second audio segment that needs to be matched with the first audio segment (such as an audio segment pre-stored in the server 105), and extract the second feature sequence corresponding to the second audio segment.
  • The server 105 then constructs a distance matrix between the first feature sequence and the second feature sequence. The elements in the distance matrix represent the distance between a first position point and a second position point, where the first position point is in the first feature sequence and the second position point is in the second feature sequence.
  • the server 105 may calculate the first cumulative distance from the starting position in the distance matrix to the target position in the distance matrix, and the second cumulative distance from the end position in the distance matrix to the target position, then calculate the minimum distance between the first feature sequence and the second feature sequence based on the first cumulative distance and the second cumulative distance, and determine the degree of matching between the first audio segment and the second audio segment based on the minimum distance.
  • the technical solution of the embodiment of the present application calculates the minimum distance between the feature sequences of the audio segments comprehensively from two directions (i.e., from the starting position of the distance matrix to the target position, and from the end position of the distance matrix to the target position). The matching relationship of the feature sequences in both directions is thereby taken into account, so the calculated minimum distance between the two feature sequences is more accurate, which helps improve the accuracy of audio segment matching.
  • Fig. 2 shows a flow chart of a method for matching audio segments provided according to an exemplary embodiment of the present application.
  • the method for matching audio segments can be executed by a device with a computing and processing function, such as the server 105 shown in Fig. 1.
  • the method for matching audio segments includes at least the following steps:
  • In step S210, the server obtains the first feature sequence corresponding to the first audio segment and the second feature sequence corresponding to the second audio segment.
  • The first audio segment and the second audio segment are two audio segments that need to be compared to determine the degree of matching.
  • For example, the first audio segment is an audio segment input by the user (such as an audio segment sung by the user or an audio segment recorded by the user), and the second audio segment is an audio segment already stored in the database.
  • The first feature sequence and the second feature sequence are sequences of the same audio feature, and the audio feature includes at least one of pitch feature, music energy, frequency cepstral coefficient, and root mean square energy value of each frame.
  • The feature sequences of the first audio segment and the second audio segment are extracted using at least one of an autocorrelation function algorithm, the Yin algorithm, and the PYin algorithm.
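  • As an illustration of the simplest of the extraction options named above, the following is a minimal sketch of autocorrelation-based pitch estimation for a single frame. The function name, the search range `fmin`/`fmax`, and the synthetic test tone are assumptions made for the example; the text does not give the extraction parameters, and a practical system would add framing, windowing, and voicing decisions.

```python
import math


def autocorr_pitch(frame, sample_rate, fmin=80.0, fmax=500.0):
    """Estimate the fundamental frequency (Hz) of one frame by picking the
    lag with the largest autocorrelation within the allowed pitch range."""
    lo = max(1, int(sample_rate / fmax))                 # shortest period in samples
    hi = min(int(sample_rate / fmin), len(frame) - 1)    # longest period in samples
    best_lag, best_val = lo, -math.inf
    for lag in range(lo, hi + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        val = sum(frame[t] * frame[t + lag] for t in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag


# Synthetic 200 Hz tone sampled at 8 kHz (illustrative input only).
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200 * t / sample_rate) for t in range(800)]
f0 = autocorr_pitch(frame, sample_rate)
```

  • Applying such an estimator frame by frame would yield a pitch feature sequence of the kind the method compares.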
  • In step S220, the server constructs a distance matrix between the first feature sequence and the second feature sequence.
  • The elements in the distance matrix indicate the distance between a first position point and a second position point, where the first position point is in the first feature sequence and the second position point is in the second feature sequence.
  • The size of the distance matrix is related to the lengths of the first feature sequence and the second feature sequence.
  • For example, if the first feature sequence has length m and the second feature sequence has length n, the size of the distance matrix is m × n.
  • If the first feature sequence is 301 and the second feature sequence is 302, a distance matrix 303 of size m × n is constructed, and each position in the distance matrix represents the distance between a point on the first feature sequence 301 and a point on the second feature sequence 302. For example, (i, j) in the distance matrix represents the distance between the i-th point on the first feature sequence 301 and the j-th point on the second feature sequence 302, where the distance includes the Euclidean distance.
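  • The construction in step S220 can be sketched as below, assuming the Euclidean distance mentioned in the text. The function name and the sample pitch values are hypothetical; for scalar features the Euclidean distance reduces to the absolute difference.

```python
import math


def distance_matrix(seq_a, seq_b):
    """Return an m x n matrix whose (i, j) entry is the Euclidean distance
    between seq_a[i] and seq_b[j]. Entries may be scalars or vectors."""
    def as_vec(x):
        # Wrap scalars so math.dist can treat every entry as a point.
        return x if isinstance(x, (list, tuple)) else (x,)

    return [[math.dist(as_vec(a), as_vec(b)) for b in seq_b] for a in seq_a]


# Illustrative pitch sequences of length m = 3 and n = 4.
pitch_a = [60.0, 62.0, 64.0]
pitch_b = [60.0, 61.0, 64.0, 65.0]
d = distance_matrix(pitch_a, pitch_b)
```

  • Here `d` has 3 rows and 4 columns, matching the m × n size described above.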
  • In step S230, the server determines the first cumulative distance from the starting position to the target position in the distance matrix, and the second cumulative distance from the end position in the distance matrix to the target position.
  • The starting position in the distance matrix is the position in the distance matrix corresponding to the first feature point on the first feature sequence and the first feature point on the second feature sequence;
  • the end position in the distance matrix is the position in the distance matrix corresponding to the last feature point on the first feature sequence and the last feature point on the second feature sequence;
  • the target position may be any position in the distance matrix other than the starting position and the end position.
  • the calculation process of the first cumulative distance between the starting position and the target position includes the following steps S410 to S430:
  • In step S410, the server determines the cumulative distance between the starting position and the first candidate position, where the first candidate position is located between the starting position and the target position.
  • When accumulating from the starting position, the cumulative distance is calculated position by position along three directions on the matrix, namely upward, rightward, and diagonally toward the upper right, as shown in FIG. 5.
  • That is, there is an association relationship between the first candidate position and the target position, and the association relationship indicates that the first candidate position is located within a preset distance range around the target position.
  • For example, if the coordinates of the target position are (i, j), the coordinates of the multiple first candidate positions include: (i-1, j-1), (i-1, j-x), (i-x, j-1), where x is a natural number smaller than i or j, for example, x is 0, 2, 3, etc.
  • The value of x in the above-mentioned embodiment may be greater than 1.
  • In that case, the accumulation is performed at intervals of (x-1) positions when calculating the cumulative distance, which speeds up the calculation.
  • The value of x should not be too large, however, and is set according to actual needs, for example, to 2.
  • the process of calculating the cumulative distance from the starting position to the first candidate position is similar to the process of calculating the cumulative distance from the starting position to the target position.
  • In step S420, the server determines the first candidate cumulative distance between the starting position and the target position according to the cumulative distance between the starting position and the first candidate position and the distance value represented by the first candidate position.
  • The process of calculating a plurality of first candidate cumulative distances between the starting position and the target position includes: summing the cumulative distance between the starting position and the first candidate position and the distance value represented by the first candidate position to obtain the distance sum value corresponding to the first candidate position, and determining the distance sum value as the first candidate cumulative distance corresponding to the first candidate position.
  • For example, if the first candidate position is (i-1, j-1), the distance sum value corresponding to the first candidate position can be expressed as: D_forward(i-1,j-1)+d(i-1,j-1), where D_forward(i-1,j-1) represents the cumulative distance from the starting position in the distance matrix to the first candidate position, and d(i-1,j-1) is the distance value represented by the first candidate position.
  • When the weight value w corresponding to the first candidate position is taken into account, the distance sum value corresponding to the first candidate position can be expressed as: D_forward(i-1,j-1)+d(i-1,j-1)×w, where D_forward(i-1,j-1) represents the cumulative distance from the starting position in the distance matrix to the first candidate position.
  • The weight value corresponding to the first candidate position is determined according to the distance between the first candidate position and the diagonal of the distance matrix. That is, the distance between each first candidate position and the diagonal of the distance matrix is determined first,
  • and the weight value corresponding to each first candidate position is then determined according to the distance between that first candidate position and the diagonal.
  • The diagonal is the straight line connecting the starting position and the end position. The above embodiment takes the weight value corresponding to each first candidate position into consideration because the distance between each first candidate position and the diagonal of the distance matrix may be different.
  • To avoid selecting positions that are too far away from the diagonal of the distance matrix, the weight corresponding to each first candidate position is set according to the distance between that first candidate position and the diagonal of the distance matrix. For example, if a position is closer to the diagonal, the weight corresponding to the position is closer to 1; if a position is farther from the diagonal, the weight corresponding to the position is greater.
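  • One possible weighting that behaves as described (equal to 1 on the diagonal and growing with distance from it) is sketched below. The linear form and the `alpha` parameter are assumptions chosen for illustration; the text only states the qualitative relationship, not a formula.

```python
import math


def diagonal_weight(i, j, m, n, alpha=0.1):
    """Hypothetical weight for position (i, j) in an m x n distance matrix:
    1.0 on the diagonal from (0, 0) to (m-1, n-1), larger farther away."""
    a, b = m - 1, n - 1
    norm = math.hypot(a, b)
    if norm == 0:
        return 1.0  # 1 x 1 matrix: the single cell is on the diagonal
    # Perpendicular distance from (i, j) to the line through (0, 0) and (a, b).
    dist = abs(b * i - a * j) / norm
    return 1.0 + alpha * dist


w_on = diagonal_weight(2, 2, 5, 5)     # on the diagonal
w_near = diagonal_weight(1, 3, 5, 5)   # slightly off the diagonal
w_far = diagonal_weight(0, 4, 5, 5)    # corner farthest from the diagonal
```

  • Multiplying d(i-1,j-1) by such a weight penalizes candidate positions that drift away from the diagonal, which is the stated purpose of the weighting.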
  • In step S430, the server determines the smallest value among the first candidate cumulative distances as the first cumulative distance.
  • the technical solution of the embodiment shown in FIG. 4 makes it possible to calculate the first cumulative distance from the starting position of the distance matrix to the target position in the distance matrix.
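  • Steps S410 to S430 can be sketched as a dynamic-programming pass over the distance matrix. This is a minimal sketch under stated assumptions: the function name and the optional `weight` hook are hypothetical, the cumulative distance at the starting position is taken to be 0, and positions that cannot be reached with the chosen x are left at infinity.

```python
import math


def forward_cumulative(d, x=0, weight=None):
    """D[i][j] = min over candidates c of D[c] + d[c] * w(c), where the
    candidates of (i, j) are (i-1, j-1), (i-1, j-x) and (i-x, j-1).
    D[i][j] excludes d[i][j] itself, as in the D_forward expressions."""
    if x < 0:
        raise ValueError("x must be a natural number")
    m, n = len(d), len(d[0])
    w = weight or (lambda i, j: 1.0)   # unweighted by default
    D = [[math.inf] * n for _ in range(m)]
    D[0][0] = 0.0                      # assumption: nothing accumulated at the start
    for i in range(m):
        for j in range(n):
            if i == 0 and j == 0:
                continue
            for ci, cj in {(i - 1, j - 1), (i - 1, j - x), (i - x, j - 1)}:
                if ci >= 0 and cj >= 0:
                    D[i][j] = min(D[i][j], D[ci][cj] + d[ci][cj] * w(ci, cj))
    return D


d = [[0, 1, 2],
     [1, 0, 1],
     [2, 1, 0]]
D = forward_cumulative(d)            # x = 0: unit steps in three directions
D_skip = forward_cumulative(d, x=2)  # x = 2: skipping steps, fewer reachable cells
```

  • With x = 2 some cells, such as (0, 1), have no valid candidate and stay at infinity, which reflects the trade-off between speed and coverage noted for larger x.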
  • In step S610, the server determines the cumulative distance between the end position and the second candidate position, where the second candidate position is located between the target position and the end position.
  • When accumulating from the end position, the cumulative distance is calculated position by position along three directions on the matrix, namely downward, leftward, and diagonally toward the lower left, as shown in FIG. 7. That is, there is an association relationship between the second candidate position and the target position, and the association relationship indicates that the second candidate position is located within a preset distance range around the target position. For example, if the coordinates of the target position are (i, j), the coordinates of the multiple second candidate positions include: (i+1, j+1), (i+1, j+y), (i+y, j+1), where y is a natural number, for example, y is 0, 2, 3, etc.
  • The value of y in the foregoing embodiment may be greater than 1.
  • In that case, the accumulation is performed at intervals of (y-1) positions when calculating the cumulative distance, which speeds up the calculation.
  • The value of y should not be too large, however, and is set according to actual needs, for example, to 2.
  • the process of calculating the cumulative distance from the end position to the second candidate position is similar to the process of calculating the cumulative distance from the end position to the target position.
  • step S620 the server determines the second candidate cumulative distance between the end position and the target position according to the cumulative distance between the end position and the second candidate position and the distance value represented by the second candidate position.
  • the process of calculating the second candidate cumulative distance from the end position to the target position includes: calculating the cumulative distance between the end position and the second candidate position and the distance value represented by the second candidate position The sum is performed to obtain the distance sum value corresponding to each second candidate position, and the distance sum value is determined as the second candidate cumulative distance corresponding to the second candidate position.
  • the second candidate position corresponds to The distance and value can be expressed as: D_backward(i+1,j+1)+d(i+1,j+1), where D_backward(i+1,j+1) represents the end position in the distance matrix to this The cumulative distance between the second candidate positions.
  • the process of calculating the cumulative distances of multiple second candidates between the end position and the target position includes: according to the distance value represented by the second candidate position and the weight value corresponding to the second candidate position, Perform weighted calculation on the distance value represented by the second candidate position to obtain the weighted distance value corresponding to the second candidate position; calculate the cumulative distance between the end position and the second candidate position and the weighted distance value corresponding to the second candidate position And to obtain the distance sum value corresponding to each second candidate position, and determine the distance sum value as the second candidate cumulative distance corresponding to the second candidate position.
  • the distance and value corresponding to the second candidate position can be expressed as: D_backward(i+1,j+1)+d(i+1,j+1) ⁇ w, where D_backward(i+1 ,j+1) represents the cumulative distance from the end position in the distance matrix to the second candidate position.
  • the weight value corresponding to the above-mentioned second candidate position is determined according to the distance between the second candidate position and the diagonal of the distance matrix, that is, the difference between the second candidate position and the diagonal of the distance matrix is determined.
  • the weight value corresponding to each second candidate position is determined according to the distance between each second candidate position and the diagonal line.
  • the diagonal is a straight line connecting the start position and the end position. That is, in the above-mentioned embodiment, the weight value corresponding to each second candidate position is taken into consideration. This is because the distance between each second candidate position and the diagonal of the distance matrix may be different.
  • the position selected in is too far away from the diagonal line of the distance matrix, and the weight corresponding to each second candidate position is set according to the distance between each second candidate position and the diagonal line of the distance matrix. For example, if a position is closer to the diagonal, the weight corresponding to the position is closer to 1; if a position is farther from the diagonal, the weight corresponding to the position is greater.
  • in step S630, the server determines the minimum of the second candidate cumulative distances as the second cumulative distance.
  • the technical solution of the embodiment shown in FIG. 6 makes it possible to calculate the second cumulative distance from the end position of the distance matrix to the target position in the distance matrix.
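  • the candidate step described above can be sketched as follows. This is a minimal illustration, assuming the three backward candidates (i+1,j+1), (i+1,j+2) and (i+2,j+1) mentioned later in this description; the function name and the dictionary-based tables are hypothetical choices for the example, not part of the application:

```python
import math


def second_cumulative_distance(D_backward, d, w, i, j):
    """Return the second cumulative distance for the target position (i, j).

    D_backward, d and w are dicts keyed by (row, col). A position missing
    from D_backward is treated as outside the matrix or not yet reachable.
    """
    candidates = [(i + 1, j + 1), (i + 1, j + 2), (i + 2, j + 1)]
    best = math.inf
    for pos in candidates:
        if pos in D_backward:
            # cumulative distance up to the candidate plus its weighted distance
            best = min(best, D_backward[pos] + d[pos] * w[pos])
    return best


# Hand-made values for the three candidates of target position (1, 1).
D_backward = {(2, 2): 3.0, (2, 3): 1.0, (3, 2): 2.0}
d = {(2, 2): 1.0, (2, 3): 4.0, (3, 2): 0.5}
w = {(2, 2): 1.0, (2, 3): 1.5, (3, 2): 2.0}
result = second_cumulative_distance(D_backward, d, w, 1, 1)
```

  • in this example the candidate (3,2) gives the smallest sum, so it determines the second cumulative distance of (1,1).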
  • in step S240, the server calculates the minimum distance between the first feature sequence and the second feature sequence based on the first cumulative distance and the second cumulative distance.
  • the process in step S240 of calculating the minimum distance between the first feature sequence and the second feature sequence based on the first cumulative distance and the second cumulative distance includes the following steps S810 to S820:
  • in step S810, the server determines the minimum cumulative distance corresponding to the target position based on the first cumulative distance, the second cumulative distance, and the distance value represented by the target position.
  • a weighted calculation is performed on the distance value represented by the target position using the weight value corresponding to the target position, to obtain the weighted distance value corresponding to the target position; the weighted distance value, the first cumulative distance, and the second cumulative distance are then summed to obtain the minimum cumulative distance corresponding to the target position.
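  • suppose the target position is (i,j), the distance value represented by the target position is d(i,j), the weight value corresponding to the target position is w, the first cumulative distance is D_forward(i,j), and the second cumulative distance is D_backward(i,j); the minimum cumulative distance corresponding to the target position is then D_total(i,j) = D_forward(i,j) + D_backward(i,j) + d(i,j) × w.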
  • the weight value corresponding to the target position can also be determined according to the distance between the target position and the diagonal of the distance matrix.
  • in step S820, the server selects the minimum value from the minimum cumulative distances corresponding to the target positions as the minimum distance between the first feature sequence and the second feature sequence.
  • the minimum distance between the first feature sequence and the second feature sequence is the smallest minimum cumulative distance taken over all target positions (i,j).
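  • steps S810 and S820 can be sketched as below, assuming the forward and backward cumulative distances are already available as nested lists and that unreachable positions hold infinity. The function name, the 0-based indexing and the sample tables are illustrative assumptions:

```python
INF = float("inf")


def minimum_distance(D_forward, D_backward, d, w):
    """Combine both directions at every target position and take the minimum.

    For each (i, j): total = D_forward + D_backward + d * w (step S810);
    the smallest total over all positions is the minimum distance (step S820).
    """
    best, best_pos = INF, None
    for i in range(len(d)):
        for j in range(len(d[0])):
            total = D_forward[i][j] + D_backward[i][j] + d[i][j] * w[i][j]
            if total < best:
                best, best_pos = total, (i, j)
    return best, best_pos


# Hand-made 2x2 tables; off-diagonal positions are unreachable here.
d = [[1.0, 5.0], [5.0, 2.0]]
w = [[1.0, 1.0], [1.0, 1.0]]
D_forward = [[0.0, INF], [INF, 1.0]]
D_backward = [[2.0, INF], [INF, 0.0]]
min_dist, target = minimum_distance(D_forward, D_backward, d, w)
```

  • both reachable positions give the same total here, which is expected when they lie on the same path from the start position to the end position.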
  • in step S250, the server determines the degree of matching between the first audio segment and the second audio segment according to the minimum distance.
  • the process in step S250 of determining the degree of matching between the first audio segment and the second audio segment according to the minimum distance between the first feature sequence and the second feature sequence includes the following steps S910 to S930:
  • in step S910, the server obtains n minimum distances between n first feature sequences and n second feature sequences, where n is a positive integer.
  • the first audio segment corresponds to n first feature sequences and the second audio segment corresponds to n second feature sequences, where the i-th first feature sequence and the i-th second feature sequence correspond to the same feature; when calculating the minimum distances, the minimum distance between the i-th first feature sequence and the i-th second feature sequence is calculated, where i is a positive integer and i ≤ n.
  • the features of the audio segment include at least one of: the pitch feature of the audio segment, musical tone energy, frequency cepstral coefficients, and the root mean square energy value of each frame.
  • for each feature, the first feature sequence of the first audio segment and the second feature sequence of the second audio segment are obtained, and the minimum distance between the two feature sequences is calculated, giving the minimum distance corresponding to that feature.
  • in step S920, the server performs a weighted sum calculation on the n minimum distances to obtain the weighted distance value between the first audio segment and the second audio segment.
  • the weight corresponding to each feature is set according to the importance of the feature. For example, a more important feature is given a larger weight and a less important feature is given a smaller weight, which highlights the impact of important features on the weighted distance value and weakens the impact of unimportant ones.
  • in step S930, the server determines the degree of matching between the first audio segment and the second audio segment according to the weighted distance value.
  • after the weighted distance value between the first audio segment and the second audio segment is calculated, it is divided by a reference value (such as the length of the first feature sequence or the length of the second feature sequence) to obtain a matching score, and the degree of matching between the first audio segment and the second audio segment is then determined according to the matching score. For example, if the matching score is large, the degree of matching between the first audio segment and the second audio segment is determined to be strong; conversely, if the matching score is small, the degree of matching is determined to be weak.
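  • steps S920 and S930 amount to a weighted sum followed by a normalization. The sketch below assumes two features with hand-picked weights and uses the sequence length as the reference value; the function names and all numbers are illustrative:

```python
def weighted_distance(min_distances, feature_weights):
    """Weighted sum of the n per-feature minimum distances (step S920)."""
    if len(min_distances) != len(feature_weights):
        raise ValueError("one weight is required per feature")
    return sum(dist * wt for dist, wt in zip(min_distances, feature_weights))


def matching_score(weighted, reference_value):
    """Divide the weighted distance by a reference value such as a sequence length."""
    return weighted / reference_value


# Two features, e.g. pitch (weight 0.75) and RMS energy (weight 0.25).
weighted = weighted_distance([12.0, 4.0], [0.75, 0.25])
score = matching_score(weighted, reference_value=50)
```

  • how the resulting score is mapped to a degree of matching is left to the caller in this sketch.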
  • alternatively, the minimum distance between the first feature sequence and the second feature sequence corresponding to a single feature is used directly as the distance between the first audio segment and the second audio segment, to determine the degree of matching between the first audio segment and the second audio segment.
  • the technical solution of the above-mentioned embodiment of the present application comprehensively calculates the minimum distance between the feature sequences of audio clips from two directions (ie, the starting position of the distance matrix to the target position, and the ending position of the distance matrix to the target position). Taking into account the matching relationship of the feature sequences in two directions, it can ensure that the calculated minimum distance between the two feature sequences is more accurate, which is beneficial to improve the accuracy of audio segment matching.
  • the humming scoring method includes the following steps:
  • in step S1001, the server collects the audio segment sung by the user.
  • the user sings a short segment of a specified song a cappella; the terminal collects the user's audio segment and records its start and end times, from which the audio duration is obtained.
  • if the audio duration of the audio segment collected by the terminal is less than a preset duration, the audio segment is filtered out and scoring-failure information is returned.
  • in step S1002, the server extracts the pitch sequence of the audio segment.
  • the autocorrelation function method, the YIN algorithm, or the pYIN algorithm is used to extract the pitch sequence of the audio segment at the specified sampling rate.
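  • as a rough illustration of the autocorrelation function method only (not YIN or pYIN, and not necessarily the variant used in the application), the sketch below estimates the pitch of a single frame by picking the lag with the largest autocorrelation. The search range of 80 to 500 Hz, the function name and the synthetic test tone are assumptions:

```python
import math


def autocorrelation_pitch(frame, sample_rate, f_min=80.0, f_max=500.0):
    """Estimate the pitch (Hz) of one frame from its autocorrelation peak."""
    min_lag = int(sample_rate / f_max)
    max_lag = min(int(sample_rate / f_min), len(frame) - 1)
    if max_lag < min_lag:
        raise ValueError("frame is too short for the requested pitch range")
    best_lag, best_value = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        # unnormalized autocorrelation at this lag
        value = sum(frame[k] * frame[k + lag] for k in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag


# Synthetic 200 Hz sine sampled at 8 kHz (period of exactly 40 samples).
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200.0 * k / sample_rate) for k in range(1040)]
pitch = autocorrelation_pitch(frame, sample_rate)
```

  • applying the function to successive frames of the recording would give a pitch sequence; practical extractors add voicing detection and octave-error handling, which this sketch omits.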
  • in step S1003, the server extracts the pitch sequence from the Musical Instrument Digital Interface (MIDI) data of the target song segment.
  • the bottom layer of the humming score depends on the music MIDI library, which is the source of the scoring standard.
  • the audio segment sung by the user has a start timestamp and an end timestamp corresponding to it, which can accurately correspond to a note sequence in the MIDI library, and then the pitch sequence is obtained according to the conversion formula of MIDI notes and pitch.
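  • the conversion formula between MIDI notes and pitch is not spelled out in this text. A common choice, assumed here, is twelve-tone equal temperament with note 69 (A4) tuned to 440 Hz; the function name is illustrative:

```python
def midi_note_to_hz(note, a4_hz=440.0):
    """Convert a MIDI note number to a frequency in Hz (equal temperament)."""
    return a4_hz * 2.0 ** ((note - 69) / 12.0)


# A3, A4 and A5 are one octave apart, so each frequency doubles.
pitch_sequence = [midi_note_to_hz(n) for n in (57, 69, 81)]
```

  • expanding each note over its duration at the same frame rate as the sung pitch sequence would make the two sequences directly comparable.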
  • the pitch sequence of the target song segment MIDI is generated in the MIDI library in advance.
  • there is no strict execution order between step S1003 and steps S1001 and S1002.
  • in step S1004, the server uses the audio information matching algorithm of the embodiment of the present application to calculate the minimum distance between the two pitch sequences.
  • in step S1005, the server converts the minimum distance into a standard score through a conversion formula.
  • since the minimum distance in step S1004 is obtained by accumulation, the longer the pitch sequence, the greater the calculated minimum distance. To eliminate this effect, the minimum distance calculated in step S1004 is divided by the length of the pitch sequence of the user's audio segment to obtain the standard score, which is then fed back to the user.
  • the solution for calculating the minimum distance between two pitch sequences using the audio information matching algorithm of the embodiment of the present application is as follows, which mainly includes the following steps:
  • in step (1), the distance matrix and weight matrix of sequence p and sequence q are calculated.
  • the weight matrix takes into account the distance between the position (i, j) of an element in the distance matrix and the diagonal of the distance matrix (that is, the straight line through the point (1, 1) and the point (m, n)). The more similar sequence p is to sequence q, the closer the optimal path from the start position of the distance matrix (point (1, 1)) to the end position (point (m, n)) lies to the diagonal. A penalty weight is therefore set for element positions far from the diagonal: the closer an element position is to the diagonal, the closer its weight is to 1, and the farther it is from the diagonal, the greater its weight.
  • the distance t(i, j) between the position (i, j) in the distance matrix and the diagonal of the distance matrix can be approximated as:
  • the weight for the position (i, j) in the weight matrix is a smooth correction of t(i, j); that is, the weight w(i, j) corresponding to the position (i, j) in the distance matrix can be calculated by the following formula 1:
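  • the approximation of t(i, j) and formula 1 are not reproduced in this text, so the sketch below is not the application's formula. It uses the exact perpendicular distance to the diagonal and a placeholder linear penalty that only satisfies the stated property (weight 1 on the diagonal, larger weights farther away). The penalty factor `alpha`, the function names and the 0-based indexing are assumptions:

```python
import math


def diagonal_distance(i, j, m, n):
    """Perpendicular distance from (i, j) to the line through (0, 0) and (m-1, n-1)."""
    length = math.hypot(m - 1, n - 1)
    if length == 0:
        return 0.0  # a 1x1 matrix has no diagonal direction
    return abs((n - 1) * i - (m - 1) * j) / length


def weight_matrix(m, n, alpha=0.1):
    """Placeholder penalty: 1 on the diagonal, growing linearly with distance."""
    return [[1.0 + alpha * diagonal_distance(i, j, m, n) for j in range(n)]
            for i in range(m)]


w = weight_matrix(3, 3)
```

  • any monotonically increasing smoothing of the distance could be substituted for the linear term without changing the rest of the procedure.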
  • in step (2), the forward cumulative distance matrix, the forward source node matrix, the backward cumulative distance matrix and the backward source node matrix are calculated from the distance matrix and weight matrix obtained in step (1).
  • the shortest distance is found by backtracking from the start position and the end position of the distance matrix to the middle position. That is, an improved Dynamic Time Warping (DTW) algorithm is proposed in the embodiments of the present application, which performs the calculation in both directions so that matching at the head and at the tail of the sequences are both taken into account, making the matching more comprehensive.
  • taking the weight corresponding to each element position into account, the cumulative distance of position (i,j) can start from the three positions (i-1,j-1), (i-1,j-2) and (i-2,j-1). The forward local decision function D_forward(i,j) shown in the following formula 2 is defined to represent the cumulative distance from the start position of the distance matrix to the position (i,j) in the distance matrix, which yields the forward cumulative distance matrix.
  • the forward calculation starts from the lower left corner (1,1) of the distance matrix and proceeds row by row, from left to right within each row, accumulating the forward cumulative distance D_forward(i,j).
  • the subscript of the source node of D_forward(i,j), which is one of (i-1,j-1), (i-1,j-2) and (i-2,j-1), is stored at position (i,j) of the forward source node matrix.
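  • formula 2 is not reproduced in this text. The sketch below assumes the recurrence implied by the candidate description: D_forward(i,j) is the minimum, over the three source positions, of the source's cumulative distance plus the source's weighted distance value, with D_forward set to 0 at the start position. Indices are 0-based and the names are illustrative:

```python
import math

# Offsets back to the sources (i-1, j-1), (i-1, j-2) and (i-2, j-1).
FORWARD_STEPS = ((1, 1), (1, 2), (2, 1))


def forward_accumulate(d, w):
    """Return the forward cumulative distance matrix and source node matrix."""
    m, n = len(d), len(d[0])
    D = [[math.inf] * n for _ in range(m)]
    src = [[None] * n for _ in range(m)]
    D[0][0] = 0.0  # assumption: nothing has been accumulated at the start
    for i in range(m):          # row by row
        for j in range(n):      # left to right within a row
            for di, dj in FORWARD_STEPS:
                pi, pj = i - di, j - dj
                if pi < 0 or pj < 0:
                    continue
                cand = D[pi][pj] + d[pi][pj] * w[pi][pj]
                if cand < D[i][j]:
                    D[i][j], src[i][j] = cand, (pi, pj)
    return D, src


p, q = [1, 3, 4], [1, 2, 4]
d = [[abs(a - b) for b in q] for a in p]   # assumed element distance
w = [[1.0] * len(q) for _ in p]            # uniform weights for the example
D_forward, forward_src = forward_accumulate(d, w)
```

  • positions that cannot be reached with the three allowed steps keep an infinite cumulative distance and no source node.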
  • the process of calculating backward from the end position of the distance matrix is similar to the forward calculation in the foregoing embodiment, except that the calculation starts from the end position of the distance matrix.
  • in the backward calculation, the cumulative distance of position (i,j) can start from the three positions (i+1,j+1), (i+1,j+2) and (i+2,j+1).
  • the backward local decision function D_backward(i,j) shown in the following formula 4 is defined to represent the cumulative distance from the end position of the distance matrix to the position (i,j) in the distance matrix, which yields the backward cumulative distance matrix.
  • w represents the weight value
  • d represents the distance value
  • the backward calculation starts from the upper right corner (m, n) of the distance matrix and proceeds row by row, from right to left within each row, accumulating the backward cumulative distance D_backward(i,j).
  • the subscript of the source node of D_backward(i,j), namely one of (i+1,j+1), (i+1,j+2) and (i+2,j+1), is stored at position (i,j) of the backward source node matrix.
  • in step (3), the minimum distance and the shortest path are obtained from the forward cumulative distance matrix and the backward cumulative distance matrix.
  • for a position (i, j), the shortest paths connecting it to the lower left corner and to the upper right corner can be found.
  • the shortest distance is calculated as shown in the following formula 6:
  • the minimum distance min_dist is calculated by the following formula 7:
  • the minimum value of D_total(i,j) is the minimum distance, and the path that attains it is the global best path.
  • the two pitch sequences are 1101 and 1102 respectively, and the global best path finally obtained according to the technical solution of the foregoing embodiment of the present application is 1103.
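  • steps (1) to (3) can be put together in one end-to-end sketch. Formulas 2 to 7 are not reproduced in this text, so the recurrences below are assumptions derived from the prose: each direction accumulates the weighted distance of the source position, and D_total(i,j) adds both directions to the weighted distance at (i,j). The absolute difference as element distance, uniform default weights, 0-based indices and all names are illustrative:

```python
import math

STEPS = ((1, 1), (1, 2), (2, 1))


def accumulate(d, w, sign):
    """sign=+1: forward from (0, 0); sign=-1: backward from (m-1, n-1)."""
    m, n = len(d), len(d[0])
    D = [[math.inf] * n for _ in range(m)]
    src = [[None] * n for _ in range(m)]
    rows = range(m) if sign > 0 else range(m - 1, -1, -1)
    cols = range(n) if sign > 0 else range(n - 1, -1, -1)
    oi, oj = (0, 0) if sign > 0 else (m - 1, n - 1)
    D[oi][oj] = 0.0
    for i in rows:
        for j in cols:
            for di, dj in STEPS:
                pi, pj = i - sign * di, j - sign * dj
                if 0 <= pi < m and 0 <= pj < n:
                    cand = D[pi][pj] + d[pi][pj] * w[pi][pj]
                    if cand < D[i][j]:
                        D[i][j], src[i][j] = cand, (pi, pj)
    return D, src


def follow(src, pos):
    """Collect the chain of source nodes starting from pos (pos excluded)."""
    chain, node = [], src[pos[0]][pos[1]]
    while node is not None:
        chain.append(node)
        node = src[node[0]][node[1]]
    return chain


def bidirectional_dtw(p, q, w=None):
    """Return (minimum distance, global best path) for sequences p and q."""
    d = [[abs(a - b) for b in q] for a in p]
    w = w or [[1.0] * len(q) for _ in p]
    Df, sf = accumulate(d, w, +1)
    Db, sb = accumulate(d, w, -1)
    best, pos = math.inf, None
    for i in range(len(p)):
        for j in range(len(q)):
            total = Df[i][j] + Db[i][j] + d[i][j] * w[i][j]
            if total < best:
                best, pos = total, (i, j)
    if pos is None:
        return math.inf, []  # the end position is unreachable from the start
    return best, follow(sf, pos)[::-1] + [pos] + follow(sb, pos)


min_dist, path = bidirectional_dtw([1, 3, 4], [1, 2, 4])
```

  • the path is rebuilt by following the forward source nodes back to the start position and the backward source nodes on to the end position, which is the role the two source node matrices play in the description.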
  • Fig. 12 shows a block diagram of an audio segment matching device according to an embodiment of the present application.
  • an audio segment matching device 1200 includes: an acquisition unit 1202, a construction unit 1204, and a processing unit 1206.
  • the obtaining unit 1202 is configured to obtain a first feature sequence corresponding to the first audio segment and a second feature sequence corresponding to the second audio segment;
  • the construction unit 1204 is configured to obtain the first feature sequence and the second feature sequence from the obtaining unit 1202, and to construct a distance matrix between the first feature sequence and the second feature sequence, where the elements in the distance matrix represent the distance between a first location point and a second location point, the first location point being in the first feature sequence and the second location point being in the second feature sequence.
  • the processing unit 1206 is configured to obtain the distance matrix from the construction unit 1204; determine the first cumulative distance from the start position to the target position in the distance matrix, and the second cumulative distance from the end position in the distance matrix to the target position; determine the minimum distance between the first feature sequence and the second feature sequence based on the first cumulative distance and the second cumulative distance; and determine the degree of matching between the first audio segment and the second audio segment according to the minimum distance.
  • the processing unit 1206 includes:
  • a determining subunit configured to determine an accumulated distance from the starting position to a first candidate position, the first candidate position being located between the starting position and the target position;
  • the determining subunit is further configured to determine the first candidate cumulative distance from the start position to the target position according to the cumulative distance between the start position and the first candidate position and the distance value represented by the first candidate position, and to determine the smallest of the first candidate cumulative distances as the first cumulative distance.
  • the determining subunit is further configured to sum the cumulative distance and the distance value represented by the first candidate position to obtain the distance sum corresponding to the first candidate position, where the cumulative distance is the distance between the start position and the first candidate position;
  • the determining subunit is further configured to determine the distance sum as the first candidate cumulative distance corresponding to the first candidate position.
  • the determining subunit is further configured to perform a weighted calculation on the distance value represented by each first candidate position, according to that distance value and the weight value corresponding to the first candidate position, to obtain the weighted distance value corresponding to the first candidate position;
  • the determining subunit is further configured to sum the cumulative distance between the start position and the first candidate position with the weighted distance value corresponding to the first candidate position to obtain the distance sum corresponding to the first candidate position, and to determine the distance sum as the first candidate cumulative distance corresponding to the first candidate position.
  • the determining subunit is further configured to determine the distance between the first candidate position and the diagonal of the distance matrix, the diagonal being the straight line connecting the start position and the end position, and to determine the weight value corresponding to each first candidate position according to the distance between that first candidate position and the diagonal.
  • there is an association relationship between the first candidate position and the target position, and the association relationship indicates that the first candidate position is located within a preset distance range around the target position.
  • the processing unit 1206 includes:
  • a determining subunit configured to determine the cumulative distance from the end position to a second candidate position, where the second candidate position is located between the target position and the end position;
  • the determining subunit is further configured to determine the second candidate cumulative distance from the end position to the target position according to the cumulative distance between the end position and the second candidate position and the distance value represented by the second candidate position, and to determine the smallest of the second candidate cumulative distances as the second cumulative distance.
  • there is an association relationship between the second candidate position and the target position, and the association relationship indicates that the second candidate position is located within a preset distance range around the target position.
  • the processing unit 1206 includes:
  • the determining subunit is configured to determine the minimum cumulative distance corresponding to the target position based on the distance value represented by the target position, the first cumulative distance and the second cumulative distance; and to select the minimum value from the minimum cumulative distances corresponding to the target positions and determine it as the minimum distance between the first feature sequence and the second feature sequence.
  • the determining subunit is further configured to sum the distance value represented by the target position, the first cumulative distance, and the second cumulative distance to obtain the minimum cumulative distance corresponding to the target position; or
  • the determining subunit is further configured to perform a weighted calculation on the distance value represented by the target position using the weight value corresponding to the target position to obtain the weighted distance value corresponding to the target position, and to sum the weighted distance value, the first cumulative distance and the second cumulative distance to obtain the minimum cumulative distance corresponding to the target position.
  • the first audio segment corresponds to n first feature sequences, the second audio segment corresponds to n second feature sequences, and n is a positive integer;
  • the obtaining unit 1202 is further configured to obtain n minimum distances between the n first feature sequences and the n second feature sequences;
  • the processing unit 1206 is further configured to perform a weighted sum calculation on the n minimum distances to obtain a weighted distance value between the first audio segment and the second audio segment, and to determine the degree of matching between the first audio segment and the second audio segment according to the weighted distance value.
  • the multiple characteristics include: pitch characteristics of the audio segment, musical tone energy, frequency cepstral coefficients, and root mean square energy value of each frame.
  • the aforementioned acquisition unit 1202 can be implemented by a memory in a computer device, or can be implemented by a processor in a computer device, or can be implemented by both the memory and the processor; the aforementioned construction unit 1204 and processing unit 1206 are implemented by the computer device.
  • FIG. 13 shows a schematic structural diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present application.
  • the computer system 1300 includes a central processing unit (CPU) 1301, which can perform various appropriate actions and processing, for example the methods described in the foregoing embodiments, according to a program stored in a read-only memory (ROM) 1302 or a program loaded from the storage part 1308 into a random access memory (RAM) 1303.
  • the RAM 1303 also stores various programs and data required for system operation.
  • the CPU 1301, the ROM 1302, and the RAM 1303 are connected to each other through a bus 1304.
  • An Input/Output (I/O) interface 1305 is also connected to the bus 1304.
  • the following components are connected to the I/O interface 1305: an input part 1306 including a keyboard, a mouse, etc.; an output part 1307 including a cathode ray tube (CRT), a liquid crystal display (LCD), speakers, etc.; a storage part 1308 including a hard disk, etc.; and a communication part 1309 including a network interface card such as a local area network (LAN) card or a modem.
  • the communication part 1309 performs communication processing via a network such as the Internet.
  • a drive 1310 is also connected to the I/O interface 1305 as needed.
  • a removable medium 1311, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1310 as needed, so that a computer program read from it can be installed into the storage part 1308 as needed.
  • the process described above with reference to the flowchart can be implemented as a computer software program.
  • the embodiments of the present application include a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program contains program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from the network through the communication part 1309, and/or installed from the removable medium 1311.
  • the computer-readable medium shown in the embodiments of the present application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two.
  • the computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above.
  • Computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), flash memory, optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • the computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, and a computer-readable program code is carried therein.
  • This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium.
  • the computer-readable medium may send, propagate, or transmit the program for use by or in combination with the instruction execution system, apparatus, or device.
  • the program code contained on the computer-readable medium can be transmitted by any suitable medium, including but not limited to: wireless, wired, etc., or any suitable combination of the above.
  • each block in the flowchart or block diagram may represent a module, program segment, or part of code, which contains one or more executable instructions for realizing the specified logical function.
  • the functions marked in the block may also occur in a different order from the order marked in the drawings. For example, two blocks shown in succession can actually be executed substantially in parallel, or they can sometimes be executed in the reverse order, depending on the functions involved.
  • each block in the block diagram or flowchart, and the combination of blocks in the block diagram or flowchart can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or can be It is realized by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present application can be implemented in software or hardware, and the described units can also be provided in a processor. Among them, the names of these units do not constitute a limitation on the unit itself under certain circumstances.
  • this application also provides a computer-readable medium.
  • the computer-readable medium may be included in the electronic device described in the above-mentioned embodiments, or it may exist alone without being assembled into the electronic device.
  • the foregoing computer-readable medium carries one or more programs, and when the foregoing one or more programs are executed by an electronic device, the electronic device realizes the method described in the foregoing embodiment.
  • although several modules or units of the device for action execution are mentioned in the above detailed description, this division is not mandatory.
  • the features and functions of two or more modules or units described above may be embodied in one module or unit.
  • the features and functions of a module or unit described above can be further divided into multiple modules or units to be embodied.
  • the exemplary embodiments described herein can be implemented by software, or by software combined with the necessary hardware. The technical solution according to the embodiments of the present application can therefore be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, or removable hard disk) or on a network, and which includes several instructions that cause a computing device (such as a personal computer, a server, a touch terminal, or a network device) to execute the method according to the embodiment of the present application.

Abstract

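A method and apparatus (1200) for matching audio segments, a computer-readable medium, and an electronic device. The matching method includes: obtaining a first feature sequence corresponding to a first audio segment and a second feature sequence corresponding to a second audio segment (S210); constructing a distance matrix, where an element of the distance matrix represents the distance between a first location point on the first feature sequence and a second location point on the second feature sequence (S220); calculating a first cumulative distance from the start position in the distance matrix to a target position, and calculating a second cumulative distance from the end position in the distance matrix to the target position (S230); calculating the minimum distance between the first feature sequence and the second feature sequence based on the first cumulative distance and the second cumulative distance (S240); and determining the degree of matching between the first audio segment and the second audio segment according to the minimum distance (S250). The accuracy of audio segment matching can thereby be improved.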

Description

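Audio segment matching method and apparatus, computer-readable medium, and electronic device
This application claims priority to Chinese patent application No. 201910441366.5, filed on May 24, 2019 and entitled "Audio segment matching method and apparatus, computer-readable medium, and electronic device", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of computer and communication technologies, and in particular to an audio segment matching method and apparatus, a computer-readable medium, and an electronic device.
Background
In audio segment matching scenarios, such as query-by-humming music retrieval or humming scoring, the similarity between two audio segments is usually measured by comparing the degree of difference between the audio feature sequence of the hummed melody and the feature sequence of the audio to be retrieved. How to improve the accuracy of audio segment matching, however, is a technical problem that urgently needs to be solved.
Summary
The embodiments of this application provide an audio segment matching method and apparatus, a computer-readable medium, and an electronic device that can improve the accuracy of audio segment matching.
Other features and advantages of this application will become apparent from the following detailed description, or will be learned in part through practice of this application.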
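According to one aspect of the embodiments of this application, an audio segment matching method is provided, the method including:
obtaining a first feature sequence corresponding to a first audio segment and a second feature sequence corresponding to a second audio segment;
constructing a distance matrix between the first feature sequence and the second feature sequence, where the elements in the distance matrix represent the distance between a first location point and a second location point, the first location point being in the first feature sequence and the second location point being in the second feature sequence;
determining a first cumulative distance from the start position in the distance matrix to a target position, and a second cumulative distance from the end position in the distance matrix to the target position;
determining the minimum distance between the first feature sequence and the second feature sequence based on the first cumulative distance and the second cumulative distance;
determining the degree of matching between the first audio segment and the second audio segment according to the minimum distance.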
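According to another aspect of the embodiments of this application, an audio segment matching method is provided, the method including:
a server obtaining a first feature sequence corresponding to a first audio segment and a second feature sequence corresponding to a second audio segment;
the server constructing a distance matrix between the first feature sequence and the second feature sequence, where the elements in the distance matrix represent the distance between a first location point and a second location point, the first location point being in the first feature sequence and the second location point being in the second feature sequence;
the server determining a first cumulative distance from the start position in the distance matrix to a target position, and a second cumulative distance from the end position in the distance matrix to the target position;
the server determining the minimum distance between the first feature sequence and the second feature sequence based on the first cumulative distance and the second cumulative distance;
the server determining the degree of matching between the first audio segment and the second audio segment according to the minimum distance.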
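According to another aspect of the embodiments of this application, an audio segment matching apparatus is provided, the apparatus including:
an obtaining unit, configured to obtain a first feature sequence corresponding to a first audio segment and a second feature sequence corresponding to a second audio segment;
a construction unit, configured to obtain the first feature sequence and the second feature sequence from the obtaining unit, and to construct a distance matrix between the first feature sequence and the second feature sequence, where the elements in the distance matrix represent the distance between a first location point and a second location point, the first location point being in the first feature sequence and the second location point being in the second feature sequence;
a processing unit, configured to obtain the distance matrix from the construction unit; determine a first cumulative distance from the start position in the distance matrix to a target position, and a second cumulative distance from the end position in the distance matrix to the target position; determine the minimum distance between the first feature sequence and the second feature sequence based on the first cumulative distance and the second cumulative distance; and determine the degree of matching between the first audio segment and the second audio segment according to the minimum distance.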
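In an optional embodiment, the processing unit includes:
a determining subunit, configured to determine the cumulative distance from the start position to a first candidate position, the first candidate position being located between the start position and the target position;
the determining subunit being further configured to determine the first candidate cumulative distance from the start position to the target position according to the cumulative distance from the start position to the first candidate position and the distance value represented by the first candidate position, and to determine the smallest of the first candidate cumulative distances as the first cumulative distance.
In an optional embodiment, the determining subunit is further configured to sum the cumulative distance and the distance value represented by the first candidate position to obtain the distance sum corresponding to the first candidate position, where the cumulative distance is the distance from the start position to the first candidate position;
the determining subunit being further configured to determine the distance sum as the first candidate cumulative distance corresponding to the first candidate position.
In an optional embodiment, the determining subunit is further configured to perform a weighted calculation on the distance value represented by each first candidate position, according to that distance value and the weight value corresponding to the first candidate position, to obtain the weighted distance value corresponding to the first candidate position;
the determining subunit being further configured to sum the cumulative distance from the start position to the first candidate position with the weighted distance value corresponding to the first candidate position to obtain the distance sum corresponding to the first candidate position, and to determine the distance sum as the first candidate cumulative distance corresponding to the first candidate position.
In an optional embodiment, the determining subunit is further configured to determine the distance between the first candidate position and the diagonal of the distance matrix, the diagonal being the straight line connecting the start position and the end position, and to determine the weight value corresponding to each first candidate position according to the distance between that first candidate position and the diagonal.
In an optional embodiment, there is an association relationship between the first candidate position and the target position, and the association relationship indicates that the first candidate position is located within a preset distance range around the target position.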
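In an optional embodiment, the processing unit includes:
a determining subunit, configured to determine the cumulative distance from the end position to a second candidate position, the second candidate position being located between the target position and the end position;
the determining subunit being further configured to determine the second candidate cumulative distance from the end position to the target position according to the cumulative distance from the end position to the second candidate position and the distance value represented by the second candidate position, and to determine the smallest of the second candidate cumulative distances as the second cumulative distance.
In an optional embodiment, there is an association relationship between the second candidate position and the target position, and the association relationship indicates that the second candidate position is located within a preset distance range around the target position.
In an optional embodiment, the processing unit includes:
a determining subunit, configured to determine the minimum cumulative distance corresponding to the target position based on the distance value represented by the target position, the first cumulative distance and the second cumulative distance; and to select the minimum value from the minimum cumulative distances corresponding to the target positions and determine it as the minimum distance between the first feature sequence and the second feature sequence.
In an optional embodiment, the determining subunit is further configured to sum the distance value represented by the target position, the first cumulative distance and the second cumulative distance to obtain the minimum cumulative distance corresponding to the target position;
or,
the determining subunit is further configured to perform a weighted calculation on the distance value represented by the target position using the weight value corresponding to the target position to obtain the weighted distance value corresponding to the target position, and to sum the weighted distance value, the first cumulative distance and the second cumulative distance to obtain the minimum cumulative distance corresponding to the target position.
In an optional embodiment, the first audio segment corresponds to n first feature sequences, the second audio segment corresponds to n second feature sequences, and n is a positive integer;
the obtaining unit is further configured to obtain n minimum distances between the n first feature sequences and the n second feature sequences;
the processing unit is further configured to perform a weighted sum calculation on the n minimum distances to obtain a weighted distance value between the first audio segment and the second audio segment, and to determine the degree of matching between the first audio segment and the second audio segment according to the weighted distance value.
In an optional embodiment, based on the foregoing solution, the multiple features include: the pitch feature of the audio segment, musical tone energy, frequency cepstral coefficients, and the root mean square energy value of each frame.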
根据本申请实施例的另一方面,提供了一种计算机可读介质,其上存储有计算机程序,所述计算机程序被处理器执行时实现如上述实施例中所述的音频片段的匹配方法。
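According to another aspect of the embodiments of this application, an electronic device is provided, including: one or more processors; and a storage apparatus configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the audio clip matching method described in the foregoing embodiments.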
根据本申请实施例的另一方面,提供了一种计算机程序产品,当所述计算机程序产品在计算机上运行时,使得计算机执行如上述本申请实施例中提供的音频片段的匹配方法。
本申请实施例提供的技术方案带来的有益效果至少包括:
通过计算距离矩阵中的起始位置到距离矩阵中的目标位置之间的第一累加距离,并计算距离矩阵中的终止位置到该目标位置之间的第二累加距离,以基于该第一累加距离和该第二累加距离得到第一特征序列与第二特征序列之间的最小距离,使得在计算两个特征序列之间的最小距离时,可以从两个方向(即距离矩阵的起始位置到目标位置、距离矩阵的终止位置到目标位置)来综合计算,进而能够兼顾特征序列在两个方向上的匹配关系,因此能够保证计算得到的两个特征序列之间的最小距离更加准确,从而有利于提高音频片段匹配的准确性。
应当理解的是,以上的一般描述和后文的细节描述仅是示例性和解释性的,并不能限制本申请。
附图说明
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1示出了本申请一个示例性实施例提供的系统架构的示意图;
图2示出了本申请的一个示例性实施例提供的音频片段的匹配方法的流程图;
图3示出了本申请的一个示例性实施例提供的距离矩阵的示意图;
图4示出了本申请的一个示例性实施例提供的计算第一累加距离的流程图;
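FIG. 5 is a schematic diagram of the calculation directions used when accumulated distances are calculated starting from the start position of the distance matrix, according to an exemplary embodiment of this application;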
图6示出了本申请的一个示例性实施例提供的计算第二累加距离的流程图;
图7示出了本申请的一个示例性实施例提供的从距离矩阵的终止位置开始计算累加距离的计算方向示意图;
图8示出了本申请的一个示例性实施例提供的计算第一特征序列与第二特征序列之间的最小距离的流程图;
图9示出了本申请的一个示例性实施例提供的确定第一音频片段与第二音频片段之间的匹配度的流程图;
图10示出了本申请的一个示例性实施例提供的哼唱评分方法的流程图;
图11示出了本申请的一个示例性实施例提供的计算得到的全局最佳路径的示意图;
图12示出了本申请的一个示例性实施例提供的音频片段的匹配装置的框图;
图13示出了本申请的一个示例性实施例提供的电子设备的计算机系统的结构示意图。
具体实施方式
为使本申请的目的、技术方案和优点更加清楚,下面将结合附图对本申请实施方式作进一步地详细描述。
图1示出了可以应用本申请实施例的技术方案的示例性系统架构的示意图。
如图1所示,系统架构可以包括终端设备(如图1中所示智能手机101、平板电脑102和便携式计算机103中的一种或多种,当然也可以是台式计算机等等)、网络104和服务器105。网络104用以在终端设备和服务器105之间提供通信链路的介质。网络104可以包括各种连接类型,例如有线通信链路、无线通信链路等等。
应该理解,图1中的终端设备、网络和服务器的数目仅仅是示意性的。根据实现需要,可以具有任意数目的终端设备、网络和服务器。比如服务器105可以是多个服务器组成的服务器集群等。
在本申请的一个实施例中,用户可以利用终端设备向服务器105上传第一音频片段(比如可以是用户清唱的一段音频),服务器105在获取到终端设备上传的第一音频片段之后,可以提取第一音频片段对应的第一特征序列,并且获取到需要与第一音频片段进行匹配的第二音频片段(比如服务器105中预先存储的音频片段),并提取第二音频片段对应的第二特征序列。然后构建第一特征序列与第二特征序列之间的距离矩阵,该距离矩阵中的元素用于表示第一位置点与第二位置点之间的距离,其中,第一位置点在第一特征序列中,第二位置点在第二特征序列中。
在本申请的一个实施例中,当构建距离矩阵之后,服务器105可以计算该距离矩阵中的起始位置到距离矩阵中的目标位置之间的第一累加距离,并计算距离矩阵中的终止位置到目标位置之间的第二累加距离,然后基于第一累加距离和第二累加距离计算第一特征序列与第二特征序列之间的最小距离,并根据该最小距离确定第一音频片段与第二音频片段之间的匹配度。可见,由于本申请实施例的技术方案可以从两个方向(即距离矩阵的起始位置到目标位置、距离矩阵的终止位置到目标位置)来综合计算音频片段的特征序列之间的最小距离,进而能够兼顾特征序列在两个方向上的匹配关系,因此能够保证计算得到的两个特征序列之间的最小距离更加准确,从而有利于提高音频片段匹配的准确性。
以下对本申请实施例的技术方案的实现细节进行详细阐述:
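FIG. 2 is a flowchart of an audio clip matching method according to an exemplary embodiment of this application. The method may be performed by a device with computing and processing capability, for example the server 105 shown in FIG. 1. Referring to FIG. 2, the method includes at least the following steps:
In step S210, the server obtains a first feature sequence corresponding to a first audio clip and a second feature sequence corresponding to a second audio clip.
In one embodiment of this application, the first audio clip and the second audio clip are two audio clips that are compared in order to determine their matching degree. For example, the first audio clip is an audio clip input by a user (such as a clip the user sings a cappella or a clip the user records), and the second audio clip is an audio clip already stored in a database.
In one embodiment of this application, the first feature sequence and the second feature sequence are sequences of the same audio feature. The audio feature includes at least one of: a pitch feature, musical tone energy, frequency cepstral coefficients, and the per-frame root-mean-square energy value.
In one embodiment of this application, the feature sequences of the first audio clip and the second audio clip are extracted using at least one of: an autocorrelation function algorithm, the Yin algorithm, and the PYin algorithm.
In step S220, the server constructs a distance matrix between the first feature sequence and the second feature sequence. Each element of the distance matrix represents the distance between a first position point and a second position point, where the first position point is in the first feature sequence and the second position point is in the second feature sequence.
In some embodiments, the size of the distance matrix depends on the lengths of the first feature sequence and the second feature sequence. For example, when the first feature sequence has length m and the second feature sequence has length n, the distance matrix has size m×n. As shown in FIG. 3, assuming the first feature sequence is 301 and the second feature sequence is 302, a distance matrix 303 of size m×n is constructed. Each position in the distance matrix represents the distance between one point on the first feature sequence 301 and one point on the second feature sequence 302; for example, (i,j) in the distance matrix represents the distance between the i-th point on the first feature sequence 301 and the j-th point on the second feature sequence 302. The distance includes the Euclidean distance.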
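The construction in step S220 can be sketched as follows. This is a minimal illustration, assuming scalar features (for example one pitch value per frame) and the squared difference as the per-element distance; the function name `distance_matrix` and the sample values are hypothetical and do not come from the application.

```python
def distance_matrix(p, q):
    """Return the m x n matrix whose (i, j) entry is (p[i] - q[j]) ** 2.

    p and q are sequences of scalar feature values; indices are 0-based here,
    whereas the text counts positions from 1.
    """
    return [[(pi - qj) ** 2 for qj in q] for pi in p]


# Hypothetical pitch values: m = 3 points in p, n = 2 points in q.
d = distance_matrix([60.0, 62.0, 64.0], [60.0, 61.0])
```

The result has one row per point of the first sequence and one column per point of the second, matching the m×n layout described above.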
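In step S230, the server determines a first accumulated distance from the start position of the distance matrix to a target position, and a second accumulated distance from the end position of the distance matrix to the target position.
In one embodiment of this application, the start position of the distance matrix is the position in the matrix that corresponds to the first feature point of the first feature sequence and the first feature point of the second feature sequence. The end position is the position that corresponds to the last feature point of the first feature sequence and the last feature point of the second feature sequence. The target position is any position in the distance matrix other than the start position and the end position.
In one embodiment of this application, as shown in FIG. 4, calculating the first accumulated distance from the start position to the target position includes the following steps S410 to S430:
In step S410, the server determines the accumulated distance from the start position to a first candidate position, where the first candidate position lies between the start position and the target position.
In one embodiment of this application, when accumulated distances are calculated starting from the start position of the distance matrix, the distances are accumulated position by position along three directions in the matrix, namely up, right, and diagonally up-right, as shown in FIG. 5.
That is, there is an association between the first candidate position and the target position: the first candidate position lies within a preset distance range around the target position. For example, if the coordinates of the target position are (i,j), the coordinates of the first candidate positions include (i-1,j-1), (i-1,j-x), and (i-x,j-1), where x is a natural number less than i or j, for example 0, 2, or 3.
In one embodiment of this application, x in the above embodiment may be greater than 1. In that case the accumulation skips (x-1) positions at a time, which speeds up the calculation of the accumulated distance. However, to keep the result accurate, x should not be too large, so it is set according to actual needs, for example to 2.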
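The candidate set described above can be sketched as a small helper. This is only an illustration under stated assumptions: the name `first_candidates` is hypothetical, coordinates are 1-based as in the text, and candidates that fall outside the matrix are simply dropped.

```python
def first_candidates(i, j, x=2):
    """Candidate predecessors of the target position (i, j), 1-based.

    Returns (i-1, j-1), (i-1, j-x) and (i-x, j-1), keeping only those
    whose coordinates are still inside the matrix (>= 1).
    """
    cands = [(i - 1, j - 1), (i - 1, j - x), (i - x, j - 1)]
    return [(a, b) for a, b in cands if a >= 1 and b >= 1]
```

With x = 0 the helper yields the three immediate neighbours used by classic DTW; with x = 2 it yields the skip-one pattern that the text suggests as a practical setting.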
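In one embodiment of this application, the process of calculating the accumulated distance from the start position to a first candidate position is similar to the process of calculating the accumulated distance from the start position to the target position.
In step S420, the server determines, according to the accumulated distance from the start position to the first candidate position and the distance value represented by the first candidate position, first candidate accumulated distances from the start position to the target position.
In one embodiment of this application, calculating the first candidate accumulated distances from the start position to the target position includes: summing the accumulated distance from the start position to a first candidate position and the distance value represented by that first candidate position to obtain the distance sum corresponding to the first candidate position, and taking the distance sum as the first candidate accumulated distance corresponding to that first candidate position.
For example, if a first candidate position is (i-1,j-1) and the distance value it represents is d(i-1,j-1), the distance sum corresponding to this first candidate position can be expressed as D_forward(i-1,j-1)+d(i-1,j-1), where D_forward(i-1,j-1) denotes the accumulated distance from the start position of the distance matrix to this first candidate position.
In one embodiment of this application, when the first candidate accumulated distances from the start position to the target position are calculated, the distance value represented by each first candidate position is first weighted by the weight value corresponding to that first candidate position to obtain a weighted distance value. The accumulated distance from the start position to the first candidate position is then added to the weighted distance value to obtain the distance sum corresponding to the first candidate position, and the distance sum is taken as the first candidate accumulated distance corresponding to that first candidate position. For example, if a first candidate position is (i-1,j-1), the distance value it represents is d(i-1,j-1), and its weight value is w, the distance sum corresponding to this first candidate position can be expressed as D_forward(i-1,j-1)+d(i-1,j-1)×w, where D_forward(i-1,j-1) denotes the accumulated distance from the start position of the distance matrix to this first candidate position.
In some embodiments, the weight value corresponding to a first candidate position is determined from the distance between that first candidate position and the diagonal of the distance matrix. That is, the distance between each first candidate position and the diagonal is determined, and the weight value of each first candidate position is determined from that distance. The diagonal is the straight line connecting the start position and the end position. The weights are taken into account because the first candidate positions may lie at different distances from the diagonal; to prevent the position finally selected from among the first candidate positions from deviating too far from the diagonal, the weight of each first candidate position is set according to its distance from the diagonal. For example, the closer a position is to the diagonal, the closer its weight is to 1; the farther a position is from the diagonal, the larger its weight.
In step S430, the server determines the minimum of the first candidate accumulated distances as the first accumulated distance.
The technical solution of the embodiment shown in FIG. 4 makes it possible to calculate the first accumulated distance from the start position of the distance matrix to the target position in the distance matrix.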
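Steps S410 to S430 can be summarized in one recurrence. The block below is a compact restatement of the prose above, not a reproduction of the numbered formulas that appear as images elsewhere in the application; the set symbol C(i,j) is introduced here only for illustration, and w(c) = 1 recovers the unweighted variant.

```latex
D_{\text{forward}}(i,j) \;=\; \min_{c \,\in\, C(i,j)}
  \bigl[\, D_{\text{forward}}(c) + w(c)\, d(c) \,\bigr],
\qquad
C(i,j) = \{\, (i-1,\,j-1),\ (i-1,\,j-x),\ (i-x,\,j-1) \,\}
```

Each term inside the minimum is one first candidate accumulated distance (step S420), and taking the minimum is step S430.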
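The following describes, with reference to FIG. 6, how the second accumulated distance from the end position of the distance matrix to the target position is calculated in the embodiments of this application. The calculation includes the following steps S610 to S630:
In step S610, the server determines the accumulated distance from the end position to a second candidate position, where the second candidate position lies between the target position and the end position.
In one embodiment of this application, when accumulated distances are calculated starting from the end position of the distance matrix, the distances are accumulated position by position along three directions in the matrix, namely down, left, and diagonally down-left, as shown in FIG. 7. That is, there is an association between the second candidate position and the target position: the second candidate position lies within a preset range around the target position. For example, if the coordinates of the target position are (i,j), the coordinates of the second candidate positions include (i+1,j+1), (i+1,j+y), and (i+y,j+1), where y is a natural number, for example 0, 2, or 3.
In one embodiment of this application, y in the above embodiment may be greater than 1. In that case the accumulation skips (y-1) positions at a time, which speeds up the calculation of the accumulated distance. However, to keep the result accurate, y should not be too large, so it is set according to actual needs, for example to 2.
In one embodiment of this application, the process of calculating the accumulated distance from the end position to a second candidate position is similar to the process of calculating the accumulated distance from the end position to the target position.
In step S620, the server determines, according to the accumulated distance from the end position to the second candidate position and the distance value represented by the second candidate position, second candidate accumulated distances from the end position to the target position.
In one embodiment of this application, calculating the second candidate accumulated distances from the end position to the target position includes: summing the accumulated distance from the end position to a second candidate position and the distance value represented by that second candidate position to obtain the distance sum corresponding to each second candidate position, and taking the distance sum as the second candidate accumulated distance corresponding to that second candidate position.
For example, if a second candidate position is (i+1,j+1) and the distance value it represents is d(i+1,j+1), the distance sum corresponding to this second candidate position can be expressed as D_backward(i+1,j+1)+d(i+1,j+1), where D_backward(i+1,j+1) denotes the accumulated distance from the end position of the distance matrix to this second candidate position.
In one embodiment of this application, calculating the second candidate accumulated distances from the end position to the target position includes: weighting the distance value represented by each second candidate position by the weight value corresponding to that second candidate position to obtain a weighted distance value; adding the accumulated distance from the end position to the second candidate position to the weighted distance value to obtain the distance sum corresponding to each second candidate position; and taking the distance sum as the second candidate accumulated distance corresponding to that second candidate position. For example, if a second candidate position is (i+1,j+1), the distance value it represents is d(i+1,j+1), and its weight value is w, the distance sum corresponding to this second candidate position can be expressed as D_backward(i+1,j+1)+d(i+1,j+1)×w, where D_backward(i+1,j+1) denotes the accumulated distance from the end position of the distance matrix to this second candidate position.
In some embodiments, the weight value corresponding to a second candidate position is determined from the distance between that second candidate position and the diagonal of the distance matrix. That is, the distance between each second candidate position and the diagonal is determined, and the weight value of each second candidate position is determined from that distance. The diagonal is the straight line connecting the start position and the end position. The weights are taken into account because the second candidate positions may lie at different distances from the diagonal; to prevent the position finally selected from among the second candidate positions from deviating too far from the diagonal, the weight of each second candidate position is set according to its distance from the diagonal. For example, the closer a position is to the diagonal, the closer its weight is to 1; the farther a position is from the diagonal, the larger its weight.
In step S630, the server determines the minimum of the second candidate accumulated distances as the second accumulated distance.
The technical solution of the embodiment shown in FIG. 6 makes it possible to calculate the second accumulated distance from the end position of the distance matrix to the target position in the distance matrix.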
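In step S240, the server calculates the minimum distance between the first feature sequence and the second feature sequence based on the first accumulated distance and the second accumulated distance.
In one embodiment of this application, as shown in FIG. 8, calculating the minimum distance between the first feature sequence and the second feature sequence in step S240 includes the following steps S810 to S820:
In step S810, the server determines the minimum accumulated distance corresponding to the target position based on the first accumulated distance, the second accumulated distance, and the distance value represented by the target position.
In one embodiment of this application, the distance value represented by the target position, the first accumulated distance, and the second accumulated distance are summed to obtain the minimum accumulated distance corresponding to the target position. For example, if the target position is (i,j), the distance value it represents is d(i,j), the first accumulated distance is D_forward(i,j), and the second accumulated distance is D_backward(i,j), then the minimum accumulated distance corresponding to the target position is D_total(i,j)=D_forward(i,j)+D_backward(i,j)+d(i,j).
In one embodiment of this application, the distance value represented by the target position is weighted by the weight value corresponding to the target position to obtain a weighted distance value, and the weighted distance value, the first accumulated distance, and the second accumulated distance are then summed to obtain the minimum accumulated distance corresponding to the target position. For example, if the target position is (i,j), the distance value it represents is d(i,j), its weight value is w, the first accumulated distance is D_forward(i,j), and the second accumulated distance is D_backward(i,j), then the minimum accumulated distance corresponding to the target position is D_total(i,j)=D_forward(i,j)+D_backward(i,j)+d(i,j)×w. The weight value corresponding to the target position may also be determined from the distance between the target position and the diagonal of the distance matrix.
Still referring to FIG. 8, in step S820 the server selects the smallest of the minimum accumulated distances corresponding to the target positions as the minimum distance between the first feature sequence and the second feature sequence.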
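In one embodiment of this application, the minimum distance between the first feature sequence and the second feature sequence is
$$\min_{(i,j)} D\_total(i,j)$$
where the minimum is taken over the target positions (i,j) of the distance matrix.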
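Steps S810 and S820 can be sketched as a single scan over the matrix. This is a minimal illustration, assuming that the forward and backward accumulated distances are already available as matrices and that unreachable positions hold infinity; the name `min_total_distance` and the 2×2 sample values are hypothetical.

```python
import math


def min_total_distance(d, fwd, bwd, w=None):
    """Return (minimum of D_total, position) over all matrix positions.

    D_total(i, j) = d(i, j) * w(i, j) + D_forward(i, j) + D_backward(i, j);
    when w is None every weight is taken as 1 (the unweighted variant).
    """
    best, best_pos = math.inf, None
    for i, row in enumerate(d):
        for j, dist in enumerate(row):
            weight = 1.0 if w is None else w[i][j]
            total = dist * weight + fwd[i][j] + bwd[i][j]
            if total < best:
                best, best_pos = total, (i, j)
    return best, best_pos


inf = math.inf
d = [[1, 2], [3, 4]]
fwd = [[0, inf], [inf, 1]]   # accumulated distance from the start position
bwd = [[4, inf], [inf, 0]]   # accumulated distance from the end position
result = min_total_distance(d, fwd, bwd)
```

Positions that cannot be reached from both ends keep an infinite total and therefore never win the minimum.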
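In step S250, the server determines the matching degree between the first audio clip and the second audio clip according to the minimum distance.
In one embodiment of this application, as shown in FIG. 9, determining the matching degree between the first audio clip and the second audio clip in step S250 according to the minimum distance between the first feature sequence and the second feature sequence includes the following steps S910 to S930:
In step S910, the server obtains n minimum distances between n first feature sequences and n second feature sequences, where n is a positive integer.
In some embodiments, the first audio clip corresponds to n first feature sequences and the second audio clip corresponds to n second feature sequences. The i-th first feature sequence and the i-th second feature sequence correspond to the same feature, and the minimum distance is calculated between the i-th first feature sequence and the i-th second feature sequence, where i is a positive integer and i≤n. The features of an audio clip include at least one of: a pitch feature, musical tone energy, frequency cepstral coefficients, and the per-frame root-mean-square energy value. That is, for each feature, the first feature sequence of the first audio clip and the second feature sequence of the second audio clip are obtained and the minimum distance between them is calculated, which yields one minimum distance per feature.
In step S920, the server computes a weighted sum of the n minimum distances to obtain a weighted distance value between the first audio clip and the second audio clip.
In one embodiment of this application, the weight of a feature is set according to its importance. If a feature is relatively important, its weight is set to a larger value; if a feature is relatively unimportant, its weight is set to a smaller value. This emphasizes the influence of important features on the weighted distance value and reduces the influence of unimportant ones.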
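The weighted sum in step S920 can be sketched as follows. This is only an illustration: the function name `weighted_distance`, the three per-feature distances, and the weights are hypothetical values chosen for the example, not values from the application.

```python
def weighted_distance(min_dists, weights):
    """Weighted sum of the n per-feature minimum distances."""
    if len(min_dists) != len(weights):
        raise ValueError("need exactly one weight per feature")
    return sum(dist * wt for dist, wt in zip(min_dists, weights))


# Hypothetical minimum distances for three features (e.g. pitch, tone
# energy, cepstral coefficients), with pitch treated as the most important.
total = weighted_distance([12.0, 30.0, 8.0], [0.5, 0.25, 0.25])
```

Here the pitch distance contributes 6.0, and the other two features contribute 7.5 and 2.0, giving 15.5 in total.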
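In step S930, the server determines the matching degree between the first audio clip and the second audio clip according to the weighted distance value.
In one embodiment of this application, after the weighted distance value between the first audio clip and the second audio clip is calculated, the weighted distance value is divided by a reference value (such as the length of the first feature sequence or the length of the second feature sequence) to calculate a matching score, and the matching degree between the first audio clip and the second audio clip is then determined from the matching score. For example, if the matching score is large, the matching degree between the first audio clip and the second audio clip is determined to be strong; conversely, if the matching score is small, the matching degree is determined to be weak.
In one embodiment of this application, the minimum distance between the first feature sequence and the second feature sequence of a single feature is used directly as the distance between the first audio clip and the second audio clip, and the matching degree between the two clips is determined from it.
Because the technical solutions of the above embodiments calculate the minimum distance between the feature sequences of the audio clips from two directions (from the start position of the distance matrix to the target position, and from the end position of the distance matrix to the target position), they take the matching relationship of the feature sequences in both directions into account. This makes the calculated minimum distance between the two feature sequences more accurate and helps improve the accuracy of audio clip matching.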
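As an illustration, the technical solutions of the embodiments of this application are described in detail below using a humming scoring scenario as an example.
As shown in FIG. 10, the humming scoring method according to an embodiment of this application includes the following steps:
Step S1001: the server collects an audio clip sung a cappella by the user.
In one embodiment of this application, the user sings a short passage of a specified song a cappella, and the terminal collects the user's audio clip and records its start time and end time to obtain the audio duration. Optionally, when the duration of the audio clip collected by the terminal is less than a preset length, the clip is filtered out and a scoring-failure message is returned.
Step S1002: the server extracts the pitch sequence of the audio clip.
In one embodiment of this application, the pitch sequence of the audio clip is extracted at a specified sampling rate using the autocorrelation function method, the Yin algorithm, or the PYin algorithm.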
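Of the three extraction methods named above, the autocorrelation function method is the simplest to illustrate. The sketch below is a bare-bones version under stated assumptions: it handles one frame at a time, searches a fixed pitch range, and omits the refinements that Yin and PYin add; the name `autocorr_pitch`, the 80 to 500 Hz search range, and the synthetic test tone are all hypothetical.

```python
import math


def autocorr_pitch(frame, sample_rate, fmin=80.0, fmax=500.0):
    """Estimate the fundamental frequency of one frame in Hz.

    The estimate is sample_rate / lag for the lag (within the search
    range) at which the frame's autocorrelation is largest.
    """
    lag_min = max(1, int(sample_rate / fmax))
    lag_max = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_val = lag_min, -math.inf
    for lag in range(lag_min, lag_max + 1):
        val = sum(frame[t] * frame[t + lag] for t in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag


# Synthetic 200 Hz sine sampled at 8 kHz, so one period is 40 samples.
sr = 8000
frame = [math.sin(2 * math.pi * 200.0 * t / sr) for t in range(1024)]
f0 = autocorr_pitch(frame, sr)
```

Applying the function to consecutive frames of a clip would give one pitch value per frame, that is, a pitch sequence of the kind compared in the following steps.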
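Step S1003: the server extracts the pitch sequence of the Musical Instrument Digital Interface (MIDI) data of the target song segment.
In one embodiment of this application, humming scoring relies on a music MIDI library, which is the source of the scoring standard. The audio clip sung by the user has a start timestamp and an end timestamp, which map precisely to a sequence of notes in the MIDI library; the pitch sequence is then obtained from the conversion formula between MIDI notes and pitch. Optionally, the pitch sequence of the MIDI data of the target song segment is generated in advance in the MIDI library.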
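The application does not spell out the conversion formula between MIDI notes and pitch. A common choice, shown here as an assumption, is the equal-temperament mapping in which MIDI note 69 is A4 at 440 Hz and each semitone multiplies the frequency by 2^(1/12). The name `midi_to_hz` is hypothetical.

```python
def midi_to_hz(note, a4_hz=440.0):
    """Convert a MIDI note number to a frequency in Hz (equal temperament).

    Assumes note 69 is A4 at a4_hz and 12 semitones per octave.
    """
    return a4_hz * 2.0 ** ((note - 69) / 12.0)


# Pitch sequence for a hypothetical note sequence C4, D4, E4.
pitches = [midi_to_hz(n) for n in (60, 62, 64)]
```

If the sung pitch sequence is kept in Hz, converting the reference notes this way puts both sequences on the same scale before the distance matrix is built.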
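It should be noted that there is no strict ordering between step S1003 and steps S1001 and S1002.
Step S1004: the server uses the audio information matching algorithm of the embodiments of this application to calculate the minimum distance between the two pitch sequences.
Step S1005: the server converts the minimum distance into a standard score through a conversion formula.
In one embodiment of this application, because the minimum distance in step S1004 is obtained by accumulation, a longer pitch sequence yields a larger minimum distance. To remove this effect, the minimum distance calculated in step S1004 is divided by the length of the pitch sequence of the user's audio clip to obtain the standard score, and the standard score is then fed back to the user.
In step S1004, assume the two pitch sequences are sequence p and sequence q, where sequence p has length m and sequence q has length n, that is, $p=(p_1,p_2,\ldots,p_i,\ldots,p_m)$ and $q=(q_1,q_2,\ldots,q_j,\ldots,q_n)$. The audio information matching algorithm of the embodiments of this application then calculates the minimum distance between the two pitch sequences as follows, mainly through the steps below: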
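In step (1), the distance matrix and the weight matrix of sequence p and sequence q are calculated.
In one embodiment of this application, position (i,j) in the distance matrix represents the distance d(i,j) between $p_i$ and $q_j$. If this distance is the Euclidean distance, then $d(i,j)=(p_i-q_j)^2$.
In one embodiment of this application, the weight matrix reflects the distance between an element position (i,j) of the distance matrix and the diagonal of the distance matrix (the straight line through point (1,1) and point (m,n)). The closer sequence p and sequence q are to each other, the closer the optimal path finally calculated from the start position of the distance matrix (point (1,1)) to the end position (point (m,n)) lies to the diagonal. A penalty weight is therefore set for element positions far from the diagonal: the closer an element position is to the diagonal, the closer its weight is to 1, and the farther it is from the diagonal, the larger its weight.
In one embodiment of this application, the distance t(i,j) between position (i,j) in the distance matrix and the diagonal of the distance matrix can be approximated as:
Figure PCTCN2020091698-appb-000002
In one embodiment of this application, the weight at position (i,j) of the weight matrix is a smoothed correction of t(i,j); that is, the weight w(i,j) corresponding to position (i,j) in the distance matrix can be calculated by Formula 1:
w(i,j)=[1+t(i,j)×0.025]×[1+log(1+t(i,j)×0.025)]     Formula 1
The numerical values in Formula 1 are only examples.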
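Formula 1 can be sketched directly in code. Two points are assumptions made for this illustration: the logarithm is taken as the natural logarithm, and because the application's approximation of t(i,j) is given only as an image, the helper `diagonal_distance` uses the exact perpendicular distance from (i,j) to the line through (1,1) and (m,n) as a stand-in. Both function names are hypothetical.

```python
import math


def diagonal_distance(i, j, m, n):
    """Perpendicular distance from (i, j) to the line through (1, 1) and (m, n).

    Stand-in for t(i, j); assumes 1-based positions and m, n not both 1.
    """
    cross = abs((n - 1) * (i - 1) - (m - 1) * (j - 1))
    return cross / math.hypot(m - 1, n - 1)


def weight(t, scale=0.025):
    """Formula 1: w = (1 + t*scale) * (1 + log(1 + t*scale)), natural log assumed."""
    s = 1.0 + t * scale
    return s * (1.0 + math.log(s))


# Weight of the off-diagonal corner (1, 5) in a hypothetical 5 x 5 matrix.
corner_weight = weight(diagonal_distance(1, 5, 5, 5))
```

A position on the diagonal has t = 0 and therefore weight exactly 1, and the weight grows monotonically as the position moves away from the diagonal, which is the penalty behaviour described above.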
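In step (2), based on the distance matrix and the weight matrix calculated in step (1), a forward accumulated distance matrix, a forward source node matrix, a backward accumulated distance matrix, and a backward source node matrix are calculated.
In one embodiment of this application, the shortest distance is found by tracing from the start position and from the end position of the distance matrix toward the middle positions. In other words, the embodiments of this application propose an improved dynamic time warping (DTW) algorithm that computes in both directions, so that both head matching and tail matching of the sequences are taken into account and the matching is more comprehensive.
In one embodiment of this application, during the forward calculation from the start position of the distance matrix, in order to speed up the accumulation and to take into account how far an element position deviates from the diagonal (that is, the weight of the element position), the accumulated distance at position (i,j) is computed from the three positions (i-1,j-1), (i-1,j-2), and (i-2,j-1). A forward local decision function D_forward(i,j), shown in Formula 2 below, is defined to represent the accumulated distance from the start position of the distance matrix to position (i,j), and the forward accumulated distance matrix is obtained from it.
Figure PCTCN2020091698-appb-000003
Adjusting Formula 2 gives Formula 3:
Figure PCTCN2020091698-appb-000004
In the above embodiment, the forward calculation starts from the lower-left corner (1,1) of the distance matrix and proceeds through each row from left to right. While the forward accumulated distance D_forward(i,j) is calculated, the index of its source node, which is one of (i-1,j-1), (i-1,j-2), and (i-2,j-1), is stored at position (i,j) of the forward source node matrix.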
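The forward pass can be sketched as follows. This is a minimal illustration based on the prose description (three predecessors, weighted candidate sums, minimum selection, source node bookkeeping), not a transcription of Formulas 2 and 3, which are available only as images. The name `forward_accumulate` is hypothetical, indices are 0-based, and unreachable positions keep an infinite distance.

```python
import math


def forward_accumulate(d, w):
    """Return (forward accumulated distances, forward source nodes).

    acc[i][j] is the minimum over predecessors c of acc[c] + d[c] * w[c],
    with acc[0][0] = 0; src[i][j] stores the chosen predecessor or None.
    """
    m, n = len(d), len(d[0])
    acc = [[math.inf] * n for _ in range(m)]
    src = [[None] * n for _ in range(m)]
    acc[0][0] = 0.0
    for i in range(m):                      # row by row from the start corner
        for j in range(n):                  # left to right within a row
            for pi, pj in ((i - 1, j - 1), (i - 1, j - 2), (i - 2, j - 1)):
                if pi < 0 or pj < 0:
                    continue                # predecessor outside the matrix
                cand = acc[pi][pj] + d[pi][pj] * w[pi][pj]
                if cand < acc[i][j]:
                    acc[i][j] = cand
                    src[i][j] = (pi, pj)
    return acc, src


d = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]       # hypothetical distance matrix
w = [[1.0] * 3 for _ in range(3)]           # all weights 1 for simplicity
fwd, fwd_src = forward_accumulate(d, w)
```

Because every predecessor lies in an earlier row, the row-by-row order guarantees that each predecessor is final before it is read.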
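In one embodiment of this application, the backward calculation from the end position of the distance matrix is similar to the forward calculation in the preceding embodiment, except that it starts from the end position of the distance matrix, that is, from the upper-right corner (m,n), and the accumulated distance at position (i,j) is computed from the three positions (i+1,j+1), (i+1,j+2), and (i+2,j+1). A backward local decision function D_backward(i,j), shown in Formula 4 below, is defined to represent the accumulated distance from the end position of the distance matrix to position (i,j), and the backward accumulated distance matrix is obtained from it.
Figure PCTCN2020091698-appb-000005
Adjusting Formula 4 gives Formula 5:
Figure PCTCN2020091698-appb-000006
Here w denotes the weight value and d denotes the distance value.
In the above embodiment, the backward calculation starts from the upper-right corner (m,n) of the distance matrix and proceeds through each row from right to left. While the backward accumulated distance D_backward(i,j) is calculated, the index of its source node, which is one of (i+1,j+1), (i+1,j+2), and (i+2,j+1), is stored at position (i,j) of the backward source node matrix.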
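The backward pass mirrors the forward one. The sketch below follows the prose description rather than Formulas 4 and 5, which are available only as images; the name `backward_accumulate` is hypothetical, indices are 0-based, and unreachable positions keep an infinite distance.

```python
import math


def backward_accumulate(d, w):
    """Return (backward accumulated distances, backward source nodes).

    acc[i][j] is the minimum over successors c of acc[c] + d[c] * w[c],
    with acc[m-1][n-1] = 0; src[i][j] stores the chosen successor or None.
    """
    m, n = len(d), len(d[0])
    acc = [[math.inf] * n for _ in range(m)]
    src = [[None] * n for _ in range(m)]
    acc[m - 1][n - 1] = 0.0
    for i in range(m - 1, -1, -1):          # row by row from the end corner
        for j in range(n - 1, -1, -1):      # right to left within a row
            for pi, pj in ((i + 1, j + 1), (i + 1, j + 2), (i + 2, j + 1)):
                if pi >= m or pj >= n:
                    continue                # successor outside the matrix
                cand = acc[pi][pj] + d[pi][pj] * w[pi][pj]
                if cand < acc[i][j]:
                    acc[i][j] = cand
                    src[i][j] = (pi, pj)
    return acc, src


d = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]       # hypothetical distance matrix
w = [[1.0] * 3 for _ in range(3)]           # all weights 1 for simplicity
bwd, bwd_src = backward_accumulate(d, w)
```

Running both passes on the same matrices gives, for every position, the two partial costs that the next step combines.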
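In step (3), the minimum distance and the shortest path are obtained from the forward accumulated distance matrix and the backward accumulated distance matrix.
In one embodiment of this application, for any position (i,j) in the distance matrix, a shortest path connecting to (i,j) can be found starting from the lower-left corner and from the upper-right corner. The shortest distance is calculated by Formula 6:
D_total(i,j)=d(i,j)×w(i,j)+D_forward(i,j)+D_backward(i,j)     Formula 6
Based on Formula 6, the minimum distance min_dist is calculated by Formula 7:
$$min\_dist=\min_{1\le i\le m,\ 1\le j\le n} D\_total(i,j)$$     Formula 7
At the position corresponding to the minimum distance, the forward source node matrix and the backward source node matrix are looked up to obtain the index of the previous node, and the nodes are traversed in turn to obtain two paths: a forward path (from (1,1) to (i,j)) and a backward path (from (m,n) to (i,j)). Joined together, the two paths form the global best path corresponding to the minimum distance, that is, to the minimum value of D_total(i,j). Specifically, as shown in FIG. 11, the two pitch sequences are 1101 and 1102, and the global best path finally obtained by the technical solutions of the above embodiments is 1103.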
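The path recovery described above can be sketched as a walk through the two source node matrices. This is a minimal illustration assuming that each matrix stores either a (row, column) pair or None where a position has no source; the name `trace_path` and the 3×3 sample matrices are hypothetical, and indices are 0-based.

```python
def trace_path(best_pos, fwd_src, bwd_src):
    """Join the forward and backward paths that meet at best_pos."""
    path = [best_pos]
    node = fwd_src[best_pos[0]][best_pos[1]]
    while node is not None:                 # walk back toward the start
        path.insert(0, node)
        node = fwd_src[node[0]][node[1]]
    node = bwd_src[best_pos[0]][best_pos[1]]
    while node is not None:                 # walk on toward the end
        path.append(node)
        node = bwd_src[node[0]][node[1]]
    return path


# Hypothetical source node matrices for a 3 x 3 distance matrix.
fwd_src = [[None, None, None],
           [None, (0, 0), (0, 0)],
           [None, (0, 0), (1, 1)]]
bwd_src = [[(1, 1), (2, 2), None],
           [(2, 2), (2, 2), None],
           [None, None, None]]
path = trace_path((1, 1), fwd_src, bwd_src)
```

In this example the walk from (1, 1) reaches the start corner in one step and the end corner in one step, so the joined path runs along the diagonal.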
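The technical solutions of the above embodiments of this application take into account both head-first matching and tail-first matching of the pitch sequences, so the matching is more comprehensive. In addition, the offset of each position from the diagonal of the distance matrix is considered when the accumulated distances are calculated, which prevents the final best path from deviating substantially from the diagonal and makes the sequence matching more robust.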
以下介绍本申请的装置实施例,可以用于执行本申请上述实施例中的音频片段的匹配方法。对于本申请装置实施例中未披露的细节,请参照本申请上述的音频片段的匹配方法的实施例。
图12示出了根据本申请的一个实施例的音频片段的匹配装置的框图。
参照图12所示,根据本申请的一个实施例的音频片段的匹配装置1200,包括:获取单元1202、构建单元1204、处理单元1206。
获取单元1202,用于获取第一音频片段对应的第一特征序列和第二音频片段对应的第二特征序列;
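The construction unit 1204 is configured to obtain the first feature sequence and the second feature sequence from the obtaining unit 1202 and to construct a distance matrix between the first feature sequence and the second feature sequence, where each element of the distance matrix represents the distance between a first position point and a second position point, the first position point being in the first feature sequence and the second position point being in the second feature sequence;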
处理单元1206,用于从所述构建单元1204中获取所述距离矩阵,并确定所述距离矩阵中的起始位置到目标位置之间的第一累加距离,以及所述距离矩阵中的终止位置到所述目标位置之间的第二累加距离;基于所述第一累加距离和所述第二累加距离确定所述第一特征序列与所述第二特征序列之间的最小距离;根据所述最小距离确定所述第一音频片段与所述第二音频片段之间的匹配度。
在一个可选的实施例中,所述处理单元1206,包括:
确定子单元,用于确定所述起始位置到第一候选位置之间的累加距离,所述第一候选位置位于所述起始位置与所述目标位置之间;
确定子单元,还用于根据所述起始位置到所述第一候选位置之间的累加距离,以及所述第一候选位置所表示的距离值,确定所述起始位置到所述目标位置之间的第一候选累加距离;将所述第一候选累加距离中的最小值确定为所述第一累加距离。
在一个可选的实施例中,所述确定子单元,还用于对所述累加距离与所述第一候选位置所表示的所述距离值进行求和,得到所述第一候选位置对应的距离和值,所述累加距离为所述起始位置到所述第一候选位置之间的距离;
确定子单元,还用于将所述距离和值确定为所述第一候选位置对应的所述第一候选累加距离。
在一个可选的实施例中,所述确定子单元,还用于根据所述第一候选位置所表示的距离值,以及所述第一候选位置对应的权重值,对各个所述第一候选位置所表示的距离值进行加权计算,得到所述第一候选位置对应的加权距离值;
确定子单元,还用于对所述起始位置到所述第一候选位置之间的累加距离与所述第一候选位置对应的所述加权距离值进行求和,得到所述第一候选位置对应的距离和值;将所述距离和值确定为所述第一候选位置对应的所述第一候选累加距离。
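In an optional embodiment, the determining subunit is further configured to determine the distance between the first candidate position and the diagonal of the distance matrix, the diagonal being the straight line connecting the start position and the end position, and to determine the weight value corresponding to each first candidate position according to the distance between that first candidate position and the diagonal.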
在一个可选的实施例中,所述第一候选位置与所述目标位置之间存在关联关系,所述关联关系用于表示所述第一候选位置位于所述目标位置周侧预设距离范围内。
在一个可选的实施例中,所述处理单元1206,包括:
确定子单元,用于确定所述终止位置到第二候选位置之间的累加距离,所述第二候选位置位于所述目标位置与所述终止位置之间;
确定子单元,还用于根据所述终止位置到所述第二候选位置之间的累加距离,以及所述第二候选位置所表示的距离值,确定所述终止位置到所述目标位置之间的第二候选累加距离;将所述第二候选累加距离中的最小值确定为所述第二累加距离。
在一个可选的实施例中,所述第二候选位置与所述目标位置之间存在关联关系,所述关联关系用于表示所述第二候选位置位于所述目标位置周侧预设距离范围内。
在一个可选的实施例中,所述处理单元1206,包括:
确定子单元,用于基于所述目标位置所表示的距离值、所述第一累加距离和所述第二累加距离,确定所述目标位置对应的最小累加距离;从所述目标位置对应的最小累加距离中选择最小值,将所述最小值确定作为所述第一特征序列与所述第二特征序列之间的最小距离。
在一个可选的实施例中,所述确定子单元,还用于对所述目标位置所表示的距离值、所述第一累加距离和所述第二累加距离进行求和,得到所述目标位置对应的所述最小累加距离;
或,
所述确定子单元,还用于对所述目标位置所表示的距离值和所述目标位置对应的权重值进行加权计算,得到所述目标位置对应的加权距离值;对所述加权距离值、所述第一累加距离和所述第二累加距离进行求和,得到所述目标位置对应的所述最小累加距离。
在一个可选的实施例中,所述第一音频片段对应n个所述第一特征序列,所述第二音频片段对应n个所述第二特征序列,n为正整数;
所述获取单元1202,还用于获取n个所述第一特征序列和n个所述第二特征序列之间的n个最小距离;
处理单元1206,还用于对n个所述最小距离进行加权求和计算,得到所述第一音频片段和所述第二音频片段之间的加权距离值;根据所述加权距离值确定所述第一音频片段和所述第二音频片段之间的匹配度。
在一个可选的实施例中,基于前述方案,所述多种特征包括:音频片段的音高特征、乐音能量、频率倒谱系数、每帧的均方根能量值。
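It is worth noting that the obtaining unit 1202 may be implemented by a memory in a computer device, by a processor in the computer device, or by the memory and the processor together; the construction unit 1204 and the processing unit 1206 are implemented by the processor in the computer device.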
图13示出了适于用来实现本申请实施例的电子设备的计算机系统的结构示意图。
需要说明的是,图13示出的电子设备的计算机系统1300仅是一个示例,不应对本申请实施例的功能和使用范围带来任何限制。
如图13所示,计算机系统1300包括中央处理单元(Central Processing Unit,CPU)1301,其可以根据存储在只读存储器(Read-Only Memory,ROM)1302中的程序或者从存储部分1308加载到随机访问存储器(Random Access Memory,RAM)1303中的程序而执行各种适当的动作和处理,例如执行上述实施例中所述的方法。在RAM 1303中,还存储有系统操作所需的各种程序和数据。CPU 1301、ROM 1302以及RAM 1303通过总线1304彼此相连。输入/输出(Input/Output,I/O)接口1305也连接至总线1304。
以下部件连接至I/O接口1305:包括键盘、鼠标等的输入部分1306;包括诸如阴极射线管(Cathode Ray Tube,CRT)、液晶显示器(Liquid Crystal Display,LCD)等以及扬声器等的输出部分1307;包括硬盘等的存储部分1308;以及包括诸如LAN(Local Area Network,局域网)卡、调制解调器等的网络接口卡的通信部分1309。通信部分1309经由诸如因特网的网络执行通信处理。驱动器1310也根据需要连接至I/O接口1305。可拆卸介质1311,诸如磁盘、光盘、磁光盘、半导体存储器等等,根据需要安装在驱动器1310上,以便于从其上读出的计算机程序根据需要被安装入存储部分1308。
特别地,根据本申请的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本申请的实施例包括一种计算机程序产品,其包括承载在计算机可读介质上的计算机程序,该计算机程序包含用于执行流程图所示的方法的程序代码。在这样的实施例中,该计算机程序可以通过通信部分1309从网络上被下载和安装,和/或从可拆卸介质1311被安装。在该计算机程序被中央处理单元(CPU)1301执行时,执行本申请的系统中限定的各种功能。
需要说明的是,本申请实施例所示的计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质或者是上述两者的任意组合。计算机可读存储介质例如可以是——但不限于——电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。计算机可读存储介质的更具体的例子可以包括但不限于:具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机访问存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(Erasable Programmable Read Only Memory,EPROM)、闪存、光纤、便携式紧凑磁盘只读存储器(Compact Disc Read-Only Memory,CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本申请中,计算机可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。而在本申请中,计算机可读的信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。计算机可读的信号介质还可以是计算机可读存储介质以外的任何计算机可读介质,该计算机可读介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。计算机可读介质上包含的程序代码可以用任何适当的介质传输,包括但不限于:无线、有线等等,或者上述的任意合适的组合。
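The flowcharts and block diagrams in the drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of this application. Each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing a specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should further be noted that each block in a block diagram or flowchart, and any combination of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.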
描述于本申请实施例中所涉及到的单元可以通过软件的方式实现,也可以通过硬件的方式来实现,所描述的单元也可以设置在处理器中。其中,这些单元的名称在某种情况下并不构成对该单元本身的限定。
作为另一方面,本申请还提供了一种计算机可读介质,该计算机可读介质可以是上述实施例中描述的电子设备中所包含的;也可以是单独存在,而未装配入该电子设备中。上述计算机可读介质承载有一个或者多个程序,当上述一个或者多个程序被一个该电子设备执行时,使得该电子设备实现上述实施例中所述的方法。
应当注意,尽管在上文详细描述中提及了用于动作执行的设备的若干模块或者单元,但是这种划分并非强制性的。实际上,根据本申请的实施方式,上文描述的两个或更多模块或者单元的特征和功能可以在一个模块或者单元中具体化。反之,上文描述的一个模块或者单元的特征和功能可以进一步划分为由多个模块或者单元来具体化。
通过以上的实施方式的描述,本领域的技术人员易于理解,这里描述的示例实施方式可以通过软件实现,也可以通过软件结合必要的硬件的方式来实现。因此,根据本申请实施方式的技术方案可以以软件产品的形式体现出来,该软件产品可以存储在一个非易失性存储介质(可以是CD-ROM,U盘,移动硬盘等)中或网络上,包括若干指令以使得一台计算设备(可以是个人计算机、服务器、触控终端、或者网络设备等)执行根据本申请实施方式的方法。
本领域技术人员在考虑说明书及实践这里公开的实施方式后,将容易想到本申请的其它实施方案。本申请旨在涵盖本申请的任何变型、用途或者适应性变化,这些变型、用途或者适应性变化遵循本申请的一般性原理并包括本申请未公开的本技术领域中的公知常识或惯用技术手段。
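It should be understood that this application is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of this application is limited only by the appended claims.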

Claims (15)

  1. 一种音频片段的匹配方法,其特征在于,应用于计算机设备中,所述方法包括:
    获取第一音频片段对应的第一特征序列和第二音频片段对应的第二特征序列;
    构建所述第一特征序列与所述第二特征序列之间的距离矩阵,所述距离矩阵中的元素用于表示第一位置点与第二位置点之间的距离,所述第一位置点在所述第一特征序列中,所述第二位置点在所述第二特征序列中;
    确定所述距离矩阵中的起始位置到目标位置之间的第一累加距离,以及所述距离矩阵中的终止位置到所述目标位置之间的第二累加距离;
    基于所述第一累加距离和所述第二累加距离确定所述第一特征序列与所述第二特征序列之间的最小距离;
    根据所述最小距离确定所述第一音频片段与所述第二音频片段之间的匹配度。
  2. 根据权利要求1所述的方法,其特征在于,所述确定所述距离矩阵中的起始位置到目标位置之间的第一累加距离,包括:
    确定所述起始位置到第一候选位置之间的累加距离,所述第一候选位置位于所述起始位置与所述目标位置之间;
    根据所述起始位置到所述第一候选位置之间的累加距离,以及所述第一候选位置所表示的距离值,确定所述起始位置到所述目标位置之间的第一候选累加距离;
    将所述第一候选累加距离中的最小值确定为所述第一累加距离。
  3. 根据权利要求2所述的方法,其特征在于,所述根据所述起始位置到所述第一候选位置之间的累加距离,以及所述第一候选位置所表示的距离值,确定所述起始位置到所述目标位置之间的第一候选累加距离,包括:
    对所述累加距离与所述第一候选位置所表示的所述距离值进行求和,得到所述第一候选位置对应的距离和值,所述累加距离为所述起始位置到所述第一候选位置之间的距离;
    将所述距离和值确定为所述第一候选位置对应的所述第一候选累加距离。
  4. 根据权利要求2所述的方法,其特征在于,所述根据所述起始位置到所述第一候选位置之间的累加距离,以及所述第一候选位置所表示的距离值,确定所述起始位置到所述目标位置之间的第一候选累加距离,包括:
    根据所述第一候选位置所表示的距离值,以及所述第一候选位置对应的权重值,对各个所述第一候选位置所表示的距离值进行加权计算,得到所述第一候选位置对应的加权距离值;
    对所述累加距离与所述第一候选位置对应的所述加权距离值进行求和,得到所述第一候选位置对应的距离和值,所述累加距离为所述起始位置到所述第一候选位置之间的距离;
    将所述距离和值确定为所述第一候选位置对应的所述第一候选累加距离。
  5. 根据权利要求4所述的方法,其特征在于,所述将所述距离和值确定为所述第一候选位置对应的所述第一候选累加距离之前,还包括:
    确定所述第一候选位置与所述距离矩阵的对角线之间的距离,所述对角线是连接所述起始位置与所述终止位置的直线;
    根据各个所述第一候选位置与所述对角线之间的距离,确定各个所述第一候选位置对应的权重值。
  6. 根据权利要求2所述的方法,其特征在于,
    所述第一候选位置与所述目标位置之间存在关联关系,所述关联关系用于表示所述第一候选位置位于所述目标位置周侧预设距离范围内。
  7. 根据权利要求1所述的方法,其特征在于,所述计算所述距离矩阵中的终止位置到所述目标位置之间的第二累加距离,包括:
    确定所述终止位置到第二候选位置之间的累加距离,所述第二候选位置位于所述目标位置与所述终止位置之间;
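    determining, according to the accumulated distance from the end position to the second candidate position and the distance value represented by the second candidate position, second candidate accumulated distances from the end position to the target position;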
    将所述第二候选累加距离中的最小值确定为所述第二累加距离。
  8. 根据权利要求7所述的方法,其特征在于,
    所述第二候选位置与所述目标位置之间存在关联关系,所述关联关系用于表示所述第二候选位置位于所述目标位置周侧预设距离范围内。
  9. 根据权利要求1至8中任一所述的方法,其特征在于,所述基于所述第一累加距离和所述第二累加距离确定所述第一特征序列与所述第二特征序列之间的最小距离,包括:
    基于所述目标位置所表示的距离值、所述第一累加距离和所述第二累加距离,确定所述目标位置对应的最小累加距离;
    从所述目标位置对应的最小累加距离中选择最小值,将所述最小值确定为所述第一特征序列与所述第二特征序列之间的最小距离。
  10. 根据权利要求9所述的方法,其特征在于,所述基于所述第一累加距离、所述第二累加距离和所述目标位置所表示的距离值,确定所述目标位置对应的最小累加距离,包括:
    对所述目标位置所表示的距离值、所述第一累加距离和所述第二累加距离进行求和,得到所述目标位置对应的所述最小累加距离;
    或,
    对所述目标位置所表示的距离值和所述目标位置对应的权重值进行加权计算,得到所述目标位置对应的加权距离值;对所述加权距离值、所述第一累加距离和所述第二累加距离进行求和,得到所述目标位置对应的所述最小累加距离。
  11. 根据权利要求1至8中任一所述的方法,其特征在于,所述第一音频片段对应n个所述第一特征序列,所述第二音频片段对应n个所述第二特征序列,n为正整数;
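    wherein the determining the matching degree between the first audio clip and the second audio clip according to the minimum distance comprises: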
    获取n个所述第一特征序列和n个所述第二特征序列之间的n个最小距离;
    对n个所述最小距离进行加权求和计算,得到所述第一音频片段和所述第二音频片段之间的加权距离值;
    根据所述加权距离值确定所述第一音频片段和所述第二音频片段之间的匹配度。
  12. 一种音频片段的匹配方法,其特征在于,所述方法包括:
    服务器获取第一音频片段对应的第一特征序列和第二音频片段对应的第二特征序列;
    所述服务器构建所述第一特征序列与所述第二特征序列之间的距离矩阵,所述距离矩阵中的元素用于表示第一位置点与第二位置点之间的距离,所述第一位置点在所述第一特征序列中,所述第二位置点在所述第二特征序列中;
    所述服务器确定所述距离矩阵中的起始位置到目标位置之间的第一累加距离,以及所述距离矩阵中的终止位置到所述目标位置之间的第二累加距离;
    所述服务器基于所述第一累加距离和所述第二累加距离确定所述第一特征序列与所述第二特征序列之间的最小距离;
    所述服务器根据所述最小距离确定所述第一音频片段与所述第二音频片段之间的匹配度。
  13. 一种音频片段的匹配装置,其特征在于,所述装置包括:
    获取单元,用于获取第一音频片段对应的第一特征序列和第二音频片段对应的第二特征序列;
    构建单元,用于从所述获取单元中获取所述第一特征序列和所述第二特征序列,并构建所述第一特征序列与所述第二特征序列之间的距离矩阵,所述距离矩阵中的元素用于表示第一位置点与第二位置点之间的距离,所述第一位置点在所述第一特征序列中,所述第二位置点在所述第二特征序列中;
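    a processing unit, configured to obtain the distance matrix from the construction unit, and to determine a first accumulated distance from a start position of the distance matrix to a target position and a second accumulated distance from an end position of the distance matrix to the target position; to determine a minimum distance between the first feature sequence and the second feature sequence based on the first accumulated distance and the second accumulated distance; and to determine a matching degree between the first audio clip and the second audio clip according to the minimum distance.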
  14. 一种计算机可读介质,其上存储有计算机程序,其特征在于,所述计算机程序被处理器执行时实现如权利要求1至12中任一项所述的音频片段的匹配方法。
  15. 一种电子设备,其特征在于,包括:
    一个或多个处理器;
    存储装置,用于存储一个或多个程序,当所述一个或多个程序被所述一个或多个处理器执行时,使得所述一个或多个处理器实现如权利要求1至12中任一项所述的音频片段的匹配方法。
PCT/CN2020/091698 2019-05-24 2020-05-22 音频片段的匹配方法、装置、计算机可读介质及电子设备 WO2020238777A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2021535923A JP7337169B2 (ja) 2019-05-24 2020-05-22 オーディオクリップのマッチング方法及びその装置、コンピュータプログラム並びに電子機器
EP20815214.0A EP3979241B1 (en) 2019-05-24 2020-05-22 Audio clip matching method and apparatus, computer-readable medium and electronic device
US17/336,562 US11929090B2 (en) 2019-05-24 2021-06-02 Method and apparatus for matching audio clips, computer-readable medium, and electronic device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910441366.5 2019-05-24
CN201910441366.5A CN111986698B (zh) 2019-05-24 2019-05-24 音频片段的匹配方法、装置、计算机可读介质及电子设备

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/336,562 Continuation US11929090B2 (en) 2019-05-24 2021-06-02 Method and apparatus for matching audio clips, computer-readable medium, and electronic device

Publications (1)

Publication Number Publication Date
WO2020238777A1 true WO2020238777A1 (zh) 2020-12-03

Family

ID=73437134

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/091698 WO2020238777A1 (zh) 2019-05-24 2020-05-22 音频片段的匹配方法、装置、计算机可读介质及电子设备

Country Status (5)

Country Link
US (1) US11929090B2 (zh)
EP (1) EP3979241B1 (zh)
JP (1) JP7337169B2 (zh)
CN (1) CN111986698B (zh)
WO (1) WO2020238777A1 (zh)

Also Published As

Publication number Publication date
JP2022515173A (ja) 2022-02-17
EP3979241A1 (en) 2022-04-06
US11929090B2 (en) 2024-03-12
CN111986698A (zh) 2020-11-24
US20210287696A1 (en) 2021-09-16
EP3979241B1 (en) 2024-05-15
EP3979241A4 (en) 2022-08-10
CN111986698B (zh) 2023-06-30
JP7337169B2 (ja) 2023-09-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20815214

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021535923

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2020815214

Country of ref document: EP

Effective date: 20220103