CN113688283A - Method and device for determining matching degree of video subtitles and electronic equipment


Publication number
CN113688283A
Authority
CN
China
Prior art keywords
foreign language
video
determining
chinese
lines
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110997692.1A
Other languages
Chinese (zh)
Other versions
CN113688283B (en)
Inventor
牟晋勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202110997692.1A
Publication of CN113688283A
Application granted
Publication of CN113688283B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F 16/7844 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F 16/63 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F 16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/683 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F 16/685 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using automatically derived transcript of audio data, e.g. lyrics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/73 Querying
    • G06F 16/735 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/40 Processing or translation of natural language
    • G06F 40/42 Data-driven translation
    • G06F 40/44 Statistical methods, e.g. probability models

Abstract

An embodiment of the invention provides a method, an apparatus, and an electronic device for determining the matching degree of video subtitles. The method includes: acquiring the foreign language lines of a target foreign language video and the corresponding Chinese subtitles; querying a preset word phonetic symbol library for the phonetic symbols corresponding to the words included in the foreign language lines, and determining the number of pronunciation changes corresponding to each word based on the phonetic symbols; and determining the matching degree between the foreign language lines of the target foreign language video and the Chinese subtitles according to the number of pronunciation changes and the number of characters in the Chinese subtitles. With this method, foreign language videos that users prefer, i.e., those whose foreign language lines closely match their Chinese subtitles, can be recommended to users according to the determined matching degree, which improves the effect of recommending foreign language videos to users.

Description

Method and device for determining matching degree of video subtitles and electronic equipment
Technical Field
The present invention relates to the field of computer technology, and in particular to a method and an apparatus for determining the matching degree of video subtitles, and an electronic device.
Background
In daily viewing, users watch a large number of foreign language movies, television series, variety shows, and other videos, for example English videos such as Hollywood movies and Disney animations. Generally, the characters in a foreign language video speak in the foreign language, and the corresponding translated Chinese subtitles are displayed at the bottom of the video, so that viewers can understand what is being expressed.
However, during actual playback of a foreign language video, differences between the mouth shape changes of the characters' foreign language pronunciation and the translated Chinese subtitles can make the characters' expression seem unnatural to the user. This affects the user's viewing experience and in turn reduces the user's preference for the foreign language video.
Therefore, the degree of correspondence between the mouth shape changes of characters speaking foreign language lines in a foreign language video and the Chinese subtitles influences the user's preference for the video: the higher the correspondence, the higher the preference. If video software recommends foreign language videos with a low degree of correspondence between mouth shape changes and Chinese subtitles, the user experience suffers. How to determine this degree of correspondence, so as to improve the recommendation effect for foreign language videos, has therefore become an urgent problem to be solved.
Disclosure of Invention
Embodiments of the invention aim to provide a method and an apparatus for determining the matching degree of video subtitles, and an electronic device, so as to determine the degree of correspondence between the mouth shape changes of characters speaking foreign language lines in a foreign language video and the Chinese subtitles.
In a first aspect of the present invention, a method for determining a degree of matching between video subtitles is provided, including:
acquiring foreign language lines and corresponding Chinese subtitles of a target foreign language video;
inquiring phonetic symbols corresponding to words included in the foreign language lines from a preset word phonetic symbol library, and determining pronunciation change times corresponding to each word based on the phonetic symbols;
and determining the matching degree of the foreign language lines of the target foreign language video and the Chinese subtitles according to the pronunciation change times and the word number of the Chinese subtitles.
Optionally, the determining the pronunciation change times corresponding to each word based on the phonetic symbol includes:
determining the number of vowels in the phonetic symbol corresponding to each word;
and determining the number of the vowels as the pronunciation change times corresponding to the word.
Optionally, the determining the matching degree between the foreign language lines of the target foreign language video and the chinese subtitles according to the pronunciation change times and the number of words of the chinese subtitles includes:
determining the sum of pronunciation change times corresponding to words included in the foreign language lines as the pronunciation change times of the foreign language lines aiming at each foreign language line of the target foreign language video;
determining the word number of the Chinese subtitles corresponding to each foreign language speech of the target foreign language video;
traversing each foreign language speech of the target foreign language video, and determining the difference rate between the foreign language speech and the corresponding Chinese caption based on the pronunciation change times of the foreign language speech and the word number of the Chinese caption corresponding to the foreign language speech;
and determining the matching degree of the foreign language lines and the Chinese subtitles of the target foreign language video according to the difference rate between each foreign language line of the target foreign language video and the corresponding Chinese subtitles.
Optionally, the determining a difference rate between the foreign language lines and the corresponding chinese subtitles based on the pronunciation change times of the foreign language lines and the number of words of the chinese subtitles corresponding to the foreign language lines includes:
if the pronunciation change times of the foreign language lines are larger than the word number of the Chinese subtitles corresponding to the foreign language lines, determining the ratio of the word number of the Chinese subtitles corresponding to the foreign language lines to the pronunciation change times of the foreign language lines as the difference rate between the foreign language lines and the corresponding Chinese subtitles;
and if the pronunciation change times of the foreign language lines are less than or equal to the word number of the Chinese subtitles corresponding to the foreign language lines, determining the ratio of the pronunciation change times of the foreign language lines to the word number of the Chinese subtitles corresponding to the foreign language lines as the difference rate between the foreign language lines and the corresponding Chinese subtitles.
Optionally, the determining the matching degree between the foreign language lines of the target foreign language video and the chinese subtitles according to the difference rate between each foreign language line of the target foreign language video and the corresponding chinese subtitles includes:
calculating the average value of the difference rate between the foreign language lines of the target foreign language video and the corresponding Chinese subtitles as the average difference rate;
calculating a standard deviation of a difference rate between foreign language lines of the target foreign language video and corresponding Chinese subtitles based on the average difference rate and the difference rate between each foreign language line of the target foreign language video and the corresponding Chinese subtitles;
and if the average difference rate is greater than or equal to a preset difference rate threshold value and the standard deviation is less than or equal to a preset fluctuation threshold value, determining that the matching degree of the foreign language lines of the target foreign language video and the Chinese subtitles is a first matching degree; otherwise, determining that the matching degree of the foreign language lines of the target foreign language video and the Chinese subtitles is a second matching degree, wherein the first matching degree is greater than the second matching degree.
Optionally, after determining the matching degree between the foreign language lines of the target foreign language video and the chinese subtitles, the method further includes:
and if the target foreign language video is the video to be recommended, recommending the target foreign language video to a user based on the matching degree of the foreign language lines and the Chinese subtitles of the target foreign language video.
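Under the optional steps above, the per-line difference rate is the ratio of the smaller of the two counts to the larger, and the video-level decision compares the mean and standard deviation of these rates against thresholds. A minimal Python sketch (function names and threshold values are assumptions, not part of the claims):

```python
import statistics

def difference_rate(pronunciation_changes: int, subtitle_chars: int) -> float:
    """Ratio of the smaller count to the larger one, per the claimed rule."""
    if pronunciation_changes > subtitle_chars:
        return subtitle_chars / pronunciation_changes
    return pronunciation_changes / subtitle_chars

def matching_degree(lines, rate_threshold=0.8, fluctuation_threshold=0.1):
    """lines: (pronunciation_changes, subtitle_char_count) pairs for one video.

    Returns 1 (the first, higher matching degree) when the average difference
    rate reaches the threshold and its standard deviation stays below the
    fluctuation threshold; otherwise returns 2. Both thresholds are
    illustrative assumptions, not values given in the claims.
    """
    rates = [difference_rate(p, c) for p, c in lines]
    avg = statistics.mean(rates)
    std = statistics.pstdev(rates)  # population standard deviation of the rates
    return 1 if avg >= rate_threshold and std <= fluctuation_threshold else 2

# A line with 6 pronunciation changes and a 7-character subtitle:
print(round(difference_rate(6, 7), 3))  # 0.857
```

Note that this "difference rate" grows toward 1 as the counts converge, which is why the claim treats a high average (with low fluctuation) as the higher matching degree.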
In a second aspect of the present invention, there is also provided an apparatus for determining a degree of matching between video subtitles, including:
the speech acquisition module is used for acquiring foreign language speech of the target foreign language video and corresponding Chinese subtitles;
the change frequency determining module is used for inquiring the phonetic symbols corresponding to the words included in the foreign language lines from a preset word phonetic symbol library and determining the pronunciation change frequency corresponding to each word based on the phonetic symbols;
and the matching degree determining module is used for determining the matching degree of the foreign language lines of the target foreign language video and the Chinese subtitles according to the pronunciation change times and the word number of the Chinese subtitles.
Optionally, the change number determining module is specifically configured to determine the number of vowels in the phonetic symbol corresponding to each word; and determining the number of the vowels as the pronunciation change times corresponding to the word.
Optionally, the matching degree determining module includes:
the frequency determining submodule is used for determining the sum of pronunciation change frequencies corresponding to words included in each foreign language speech of the target foreign language video as the pronunciation change frequency of the foreign language speech;
the word number determining submodule is used for determining the word number of the Chinese subtitle corresponding to each foreign language speech of the target foreign language video;
the difference rate determining submodule is used for traversing each foreign language speech of the target foreign language video and determining the difference rate between the foreign language speech and the corresponding Chinese caption based on the pronunciation change times of the foreign language speech and the word number of the Chinese caption corresponding to the foreign language speech;
and the matching degree determining submodule is used for determining the matching degree of the foreign language lines and the Chinese subtitles of the target foreign language video according to the difference rate between each foreign language line of the target foreign language video and the corresponding Chinese subtitles.
Optionally, the difference rate determining sub-module is specifically configured to: if the pronunciation change times of a foreign language line are greater than the word number of the Chinese subtitle corresponding to the foreign language line, determine the ratio of the word number of the Chinese subtitle corresponding to the foreign language line to the pronunciation change times of the foreign language line as the difference rate between the foreign language line and the corresponding Chinese subtitle; and if the pronunciation change times of the foreign language line are less than or equal to the word number of the Chinese subtitle corresponding to the foreign language line, determine the ratio of the pronunciation change times of the foreign language line to the word number of the Chinese subtitle corresponding to the foreign language line as the difference rate between the foreign language line and the corresponding Chinese subtitle.
Optionally, the matching degree determining sub-module is specifically configured to: calculate an average value of the difference rates between the foreign language lines of the target foreign language video and the corresponding Chinese subtitles as the average difference rate; calculate a standard deviation of the difference rates based on the average difference rate and the difference rate between each foreign language line of the target foreign language video and its corresponding Chinese subtitle; and if the average difference rate is greater than or equal to a preset difference rate threshold value and the standard deviation is less than or equal to a preset fluctuation threshold value, determine that the matching degree of the foreign language lines of the target foreign language video and the Chinese subtitles is a first matching degree, and otherwise determine that it is a second matching degree, wherein the first matching degree is greater than the second matching degree.
Optionally, the device further includes a video recommendation module, configured to recommend the target foreign language video to a user based on a matching degree between foreign language lines of the target foreign language video and the chinese subtitles if the target foreign language video is a video to be recommended.
In another aspect of the present invention, there is also provided an electronic device, including a processor, a communication interface, a memory and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
and the processor is used for realizing the steps of the method for determining the matching degree of the video subtitles when executing the program stored in the memory.
In yet another aspect of the present invention, there is also provided a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements any one of the above-mentioned methods for determining a degree of matching between video subtitles.
In yet another aspect of the present invention, there is also provided a computer program product containing instructions, which when run on a computer, causes the computer to execute any of the above-mentioned methods for determining a degree of matching of video subtitles.
By adopting the method provided by the embodiment of the invention, the foreign language lines of the target foreign language video and the corresponding Chinese subtitles can be obtained; a preset word phonetic symbol library can be queried for the phonetic symbols corresponding to the words included in the foreign language lines, and the number of pronunciation changes corresponding to each word can be determined based on the phonetic symbols; and the matching degree between the foreign language lines of the target foreign language video and the Chinese subtitles can be determined according to the number of pronunciation changes and the number of characters in the Chinese subtitles. Furthermore, according to the determined matching degree, foreign language videos that users prefer, i.e., foreign language videos whose lines closely match their Chinese subtitles, can be recommended to users, improving the effect of recommending foreign language videos.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Fig. 1 is a flowchart of a method for determining a degree of matching between video subtitles according to an embodiment of the present invention;
fig. 2 is a flowchart for determining the matching degree between the foreign language speech and the chinese subtitles of the target foreign language video according to the embodiment of the present invention;
fig. 3 is a schematic structural diagram of an apparatus for determining a degree of matching between video subtitles according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an apparatus for determining a degree of matching between video subtitles according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an apparatus for determining a degree of matching between video subtitles according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.
The degree of correspondence between the mouth shape changes of characters speaking foreign language lines in a foreign language video and the Chinese subtitles affects the user experience, yet the prior art offers no effective method for determining this degree of correspondence. Therefore, in order to determine the degree of correspondence between the mouth shape changes of characters speaking foreign language lines in a foreign language video and the Chinese subtitles, and thereby improve the recommendation effect for foreign language videos, embodiments of the invention provide a method, an apparatus, and an electronic device for determining the matching degree of video subtitles.
Fig. 1 is a flowchart of a method for determining a degree of matching between video subtitles according to an embodiment of the present invention, as shown in fig. 1, the method includes:
step 101, obtaining foreign language speech words and corresponding Chinese subtitles of a target foreign language video.
Step 102, querying a preset word phonetic symbol library for the phonetic symbols corresponding to the words included in the foreign language lines, and determining the number of pronunciation changes corresponding to each word based on the phonetic symbols.
And 103, determining the matching degree of the foreign language lines of the target foreign language video and the Chinese subtitles according to the pronunciation change times and the word number of the Chinese subtitles.
By adopting the method provided by the embodiment of the invention, the foreign language lines of the target foreign language video and the corresponding Chinese subtitles can be obtained; a preset word phonetic symbol library can be queried for the phonetic symbols corresponding to the words included in the foreign language lines, and the number of pronunciation changes corresponding to each word can be determined based on the phonetic symbols; and the matching degree between the foreign language lines of the target foreign language video and the Chinese subtitles can be determined according to the number of pronunciation changes and the number of characters in the Chinese subtitles. Furthermore, according to the determined matching degree, foreign language videos that users prefer, i.e., foreign language videos whose lines closely match their Chinese subtitles, can be recommended to users, improving the effect of recommending foreign language videos.
In the embodiment of the invention, the foreign language video may be an English video, a French video, a German video, etc. Specifically, the foreign language video may be a foreign language movie, a foreign language variety show, a foreign language news video, etc. The target foreign language video may be a complete foreign language movie, a complete episode of a foreign language variety show, or a complete foreign language news broadcast; alternatively, it may be a partial clip of any of these, which is not specifically limited herein.
In the embodiment of the invention, the foreign language video can be stored in a database of the server. The server can directly determine the matching degree between foreign language lines and Chinese subtitles for any target foreign language video whose video subtitle matching degree needs to be determined. In the embodiment of the invention, the matching degree between foreign language lines and Chinese subtitles can also be determined for a target foreign language video played at a client: when the matching degree of the video subtitles needs to be determined, the client can send the foreign language lines of the target foreign language video and the corresponding Chinese subtitles to the server, and the server then determines the matching degree between the foreign language lines and the Chinese subtitles by adopting the method provided by the embodiment of the invention.
In the embodiment of the invention, a correspondence library of foreign language words and their phonetic symbols, i.e., a preset word phonetic symbol library, can be maintained at the server. Each foreign language word and its phonetic symbol are stored in the preset word phonetic symbol library in correspondence with each other. For example, the server may maintain a preset word phonetic symbol library storing each English word and its phonetic symbols, in which the phonetic symbols of each English word can be looked up; for example, for the English word banana, the British phonetic symbol /bəˈnɑːnə/ and the American phonetic symbol /bəˈnænə/ can be found.
In a possible implementation manner, determining the number of pronunciation changes corresponding to each word based on the phonetic symbols may specifically include steps A1-A2:
step A1: the number of vowels in the phonetic symbol corresponding to each word is determined.
Step A2: the number of vowels is determined as the number of pronunciation changes corresponding to the word.
If the target foreign language video is an English video, the server can maintain a preset English word phonetic symbol library storing English words and their phonetic symbols. The preset English word phonetic symbol library stores the vowels, consonants, etc. of the English phonetic symbols; for example, the stored vowel inventory may include the following monophthongs and diphthongs:
Monophthongs: [iː], [i], [ə], [ɜː], [uː], [u], [ɔ], [ɔː], [ɑː], [ʌ], [e], [æ]
Diphthongs: [ei], [ai], [ɔi], [au], [əu], [iə], [ɛə], [uə]
In this embodiment, for each word in the lines of the target foreign language video, its phonetic symbol can be looked up in the preset English word phonetic symbol library, the number of vowels in that phonetic symbol can be determined, and that number of vowels can be determined as the word's number of pronunciation changes. For example, if the target foreign language video includes the English word banana, the phonetic symbols of banana can be looked up in the preset English word phonetic symbol library: the British phonetic symbol /bəˈnɑːnə/ and the American phonetic symbol /bəˈnænə/. It can then be determined that the vowels in the British phonetic symbol /bəˈnɑːnə/ are [ə], [ɑː], and [ə], and that the vowels in the American phonetic symbol /bəˈnænə/ are [ə], [æ], and [ə]. Therefore, whether the British or the American phonetic symbol is used, the number of vowels in the phonetic symbol of banana is 3, and the number of pronunciation changes corresponding to the English word banana can be determined to be 3. The number of pronunciation changes of a word reflects the number of times a character changes mouth shape when saying the word, and the two are consistent: for example, a character changes mouth shape 3 times when saying the word banana, which is consistent with the 3 pronunciation changes corresponding to banana.
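Steps A1-A2 can be illustrated with a short Python sketch. Everything below is an assumption for illustration: the vowel inventory, the tiny phonetic-symbol dictionary standing in for the preset word phonetic symbol library, and the greedy symbol matching.

```python
# Minimal sketch of steps A1-A2: count the vowels in a word's phonetic
# symbol and use that count as the word's number of pronunciation changes.
DIPHTHONGS = ["ei", "ai", "ɔi", "au", "əu", "iə", "ɛə", "uə"]
MONOPHTHONGS = ["iː", "i", "ə", "ɜː", "uː", "u", "ɔː", "ɔ", "ɑː", "ʌ", "e", "æ"]
# Match longer symbols first so [ɑː] is not misread as a bare [ɑ], etc.
VOWELS = sorted(DIPHTHONGS + MONOPHTHONGS, key=len, reverse=True)

PHONETIC_LIBRARY = {  # hypothetical entries (word -> British phonetic symbol)
    "a": "ə",
    "yellow": "ˈjeləu",
    "banana": "bəˈnɑːnə",
}

def count_pronunciation_changes(word: str) -> int:
    """Steps A1 and A2: number of vowels in the word's phonetic symbol."""
    symbols = PHONETIC_LIBRARY[word].replace("ˈ", "").replace("ˌ", "")
    count, i = 0, 0
    while i < len(symbols):
        for v in VOWELS:
            if symbols.startswith(v, i):
                count += 1
                i += len(v)
                break
        else:
            i += 1  # consonant: skip one symbol
    return count

print(count_pronunciation_changes("banana"))  # 3, as in the example above
```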
In a possible implementation manner, fig. 2 is a flowchart of determining a matching degree between foreign language lines of a target foreign language video and chinese subtitles according to an embodiment of the present invention, and as shown in fig. 2, the step of determining the matching degree between the foreign language lines of the target foreign language video and the chinese subtitles according to the number of pronunciation changes and the number of words of the chinese subtitles may specifically include:
step 201, aiming at each foreign language speech of the target foreign language video, determining the sum of the pronunciation change times corresponding to the words included in the foreign language speech as the pronunciation change time of the foreign language speech.
In the embodiment of the invention, each foreign language line of the target foreign language video corresponds to one Chinese subtitle, and each foreign language line and its corresponding Chinese subtitle are stored together with the identifier of the target foreign language video. Each foreign language line and each corresponding Chinese subtitle also has its own identifier, which may be its sequential number among all the lines of the target video. For example, if the identifier of the target foreign language video A is V_id, the identifier of the third foreign language line in the target foreign language video A is its sequential number E_3, the identifier of the Chinese subtitle corresponding to the third foreign language line is likewise its sequential number C_3 among all the subtitles of the target foreign language video A, and both the third foreign language line of the target foreign language video A and its corresponding Chinese subtitle are stored together with the identifier V_id of the target foreign language video A.
Each foreign language line may include at least one word. In this step, the sum of the pronunciation change times corresponding to all the words included in each foreign language line may be determined as the pronunciation change times of that line.
For example, if the target foreign language video is an English video S and the first line in the English video S is "a yellow banana", the line includes the words a, yellow, and banana. The British phonetic symbol /ə/ and the American phonetic symbol /ə/ of the word a can be looked up in the preset word phonetic symbol library, so the number of vowels in the phonetic symbol of the word a can be determined to be 1, i.e., the pronunciation change times of the word a is 1. The British phonetic symbol /ˈjeləʊ/ and the American phonetic symbol /ˈjeloʊ/ of the word yellow can be looked up, so the number of vowels in the phonetic symbol of the word yellow can be determined to be 2, i.e., the pronunciation change times of the word yellow is 2. The British phonetic symbol /bəˈnɑːnə/ and the American phonetic symbol /bəˈnænə/ of the word banana can be looked up, so the number of vowels in the phonetic symbol of the word banana can be determined to be 3, i.e., the pronunciation change times of the word banana is 3. Therefore, the sum 1 + 2 + 3 = 6 of the pronunciation change times of the words "a", "yellow" and "banana" can be determined as the pronunciation change times of the first line "a yellow banana" in the English video S.
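The per-word vowel counting of step 201 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the vowel inventory and the three IPA transcriptions stand in for the preset word phonetic symbol library, and a maximal run of vowel symbols (e.g. the diphthong in "yellow") is assumed to count as a single vowel, i.e., one pronunciation change.

```python
# Illustrative vowel symbols, including the length mark so /ɑː/ stays one group.
IPA_VOWELS = set("əɑæeɪiʊuɔɒʌɜaoː")

# Hypothetical preset word phonetic-symbol library (British transcriptions).
PHONETIC_LIBRARY = {
    "a": "ə",
    "yellow": "ˈjeləʊ",
    "banana": "bəˈnɑːnə",
}

def pronunciation_changes(word: str) -> int:
    """Count vowel groups in the word's phonetic symbol."""
    transcription = PHONETIC_LIBRARY[word.lower()]
    count, in_vowel = 0, False
    for ch in transcription:
        is_vowel = ch in IPA_VOWELS
        if is_vowel and not in_vowel:
            count += 1  # a new vowel group (one pronunciation change) starts
        in_vowel = is_vowel
    return count

def line_pronunciation_changes(line: str) -> int:
    """Sum the per-word counts over all words of one foreign-language line."""
    return sum(pronunciation_changes(w) for w in line.split())

print(line_pronunciation_changes("a yellow banana"))  # → 1 + 2 + 3 = 6
```

A real system would back `PHONETIC_LIBRARY` with a full pronunciation dictionary; only the counting logic is the point here.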
Step 202, determining the word number of the Chinese subtitle corresponding to each foreign language speech of the target foreign language video.
For example, if the Chinese subtitle corresponding to the first line in the English video S is "一个黄色的香蕉" ("a yellow banana"), it may be determined that the number of characters of the Chinese subtitle corresponding to the first line in the English video S is 7.
Step 203, traversing each foreign language speech of the target foreign language video, and determining the difference rate between the foreign language speech and the corresponding Chinese caption based on the pronunciation change times of the foreign language speech and the word number of the Chinese caption corresponding to the foreign language speech.
In this step, if the pronunciation change times of the foreign language lines are greater than the number of words of the Chinese subtitles corresponding to the foreign language lines, the ratio of the number of words of the Chinese subtitles corresponding to the foreign language lines to the pronunciation change times of the foreign language lines is determined as the difference rate between the foreign language lines and the corresponding Chinese subtitles; and if the pronunciation change times of the foreign language lines are less than or equal to the number of words of the Chinese subtitles corresponding to the foreign language lines, the ratio of the pronunciation change times of the foreign language lines to the number of words of the Chinese subtitles corresponding to the foreign language lines is determined as the difference rate between the foreign language lines and the corresponding Chinese subtitles.
For example, if the English video S includes N English lines, the set of all English lines is S_English: [E_1, E_2, …, E_N]. Each English line E_i (1 ≤ i ≤ N) can be traversed, and for each word Word_X of the line E_i, the corresponding phonetic symbol SoundMark_X is looked up in the preset word phonetic symbol library and its number of vowels Vcount_X is calculated. In this way, for each line E_i of the English video S, the set Vc of the numbers of vowels corresponding to its words can be obtained: [Vcount_1, Vcount_2, …, Vcount_X]. Further, for the set S_English of all English lines in the English video S, the set SVc of the total numbers of vowels corresponding to each line can be obtained: [Vc_all_1, Vc_all_2, …, Vc_all_N], i.e., the pronunciation change times corresponding to each English line in S_English are SVc: [Vc_all_1, Vc_all_2, …, Vc_all_N].
If the target foreign language video is the English video S, the set of Chinese subtitles corresponding to the English lines in the English video S is S_Chinese: [C_1, C_2, …, C_N]. Each Chinese subtitle C_i (1 ≤ i ≤ N) is traversed and its number of characters Ccount_i is counted, so the set Cc of the numbers of characters of the Chinese subtitles corresponding to the English lines in the English video S can be obtained: [Ccount_1, Ccount_2, …, Ccount_N]. Each Chinese subtitle C_i corresponds to the English line E_i.
Take the first English line E_1 in the English video S and its corresponding Chinese subtitle C_1 as an example. If the pronunciation change times Vc_all_1 of the line E_1 are greater than the number of characters Ccount_1 of the corresponding Chinese subtitle, the ratio of Ccount_1 to Vc_all_1 is determined as the difference rate between the line and the corresponding Chinese subtitle: P_1 = Ccount_1 / Vc_all_1. If the pronunciation change times Vc_all_1 of the line E_1 are less than or equal to the number of characters Ccount_1 of the corresponding Chinese subtitle, the ratio of Vc_all_1 to Ccount_1 is determined as the difference rate between the line and the corresponding Chinese subtitle: P_1 = Vc_all_1 / Ccount_1.
Traversing each line in the English video S yields the difference rate between each line and its corresponding Chinese subtitle; these difference rates form the set SP: [P_1, …, P_N], where P_i (1 ≤ i ≤ N) is the difference rate between the i-th line in the English video S and its corresponding Chinese subtitle.
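The smaller-over-larger ratio of step 203 can be sketched as below; the zero-count guard is an assumption of this sketch, since the text does not specify behavior for empty lines, and the sample numbers are illustrative.

```python
def difference_rate(vc_all: int, ccount: int) -> float:
    """Difference rate between one foreign-language line and its Chinese
    subtitle: the smaller of (pronunciation change times, subtitle character
    count) divided by the larger, so the result always lies in [0, 1]."""
    if vc_all == 0 and ccount == 0:
        return 1.0  # guard for empty inputs; not specified in the text
    if vc_all > ccount:
        return ccount / vc_all
    return vc_all / ccount

# Per-line pronunciation change times SVc and subtitle character counts Cc
# (illustrative numbers) yield the difference-rate set SP.
SVc = [6, 10, 4]
Cc = [7, 5, 4]
SP = [difference_rate(v, c) for v, c in zip(SVc, Cc)]
print(SP)  # first entry 6/7, second 5/10, third 4/4
```

Because the smaller count is always the numerator, a rate near 1 means the mouth-shape changes and subtitle length are close, regardless of which side is larger.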
And 204, determining the matching degree of the foreign language lines and the Chinese subtitles of the target foreign language video according to the difference rate between each foreign language line of the target foreign language video and the corresponding Chinese subtitles.
In this step, determining the matching degree between the foreign language lines and the Chinese subtitles may specifically include the following steps B1-B3:
step B1: and calculating the average value of the difference rate between the foreign language speech of the target foreign language video and the corresponding Chinese subtitle as the average difference rate.
For example, if the target foreign language video is the English video S with the difference-rate set SP: [P_1, …, P_N] between each line in S and the corresponding Chinese subtitles, the average value P_avr = (P_1 + … + P_N) / N of the difference rates between all the lines of the English video S and the corresponding Chinese subtitles can be calculated, and P_avr can be taken as the average difference rate.
Step B2: and calculating the standard variance of the difference rate between the foreign language lines of the target foreign language video and the corresponding Chinese subtitles based on the average difference rate and the difference rate between each foreign language line of the target foreign language video and the corresponding Chinese subtitles.
For example, if the target foreign language video is the English video S with the difference-rate set SP: [P_1, …, P_N] and the average difference rate P_avr = (P_1 + … + P_N) / N, the standard deviation σ of the difference rates between all the lines of the English video S and the corresponding Chinese subtitles can be calculated as:

σ = √(((P_1 − P_avr)² + (P_2 − P_avr)² + … + (P_N − P_avr)²) / N)
step B3: and if the average difference rate is greater than or equal to a preset difference rate threshold value and the standard deviation is less than or equal to a preset fluctuation threshold value, determining the matching degree of the foreign language speech of the target foreign language video and the Chinese caption as a first matching degree, otherwise, determining the matching degree of the foreign language speech of the target foreign language video and the Chinese caption as a second matching degree.
Wherein the first matching degree is greater than the second matching degree.
In the embodiment of the invention, the preset difference rate threshold P_valid is less than 1 and, on that premise, may be adjusted according to the actual application; it is not specifically limited herein. The preset fluctuation threshold may be set to σ_valid = P_avr / M, where M is an empirical reference value that may likewise be adjusted according to the actual application and is not specifically limited herein.
In the embodiment of the invention, the practical significance of the standard deviation of the difference rates between the foreign language lines of the target foreign language video and the corresponding Chinese subtitles is as follows: when the data are widely dispersed (i.e., fluctuate greatly around the average difference rate), the sum of the squared differences between each data point and the average difference rate is large, and so is the standard deviation obtained after taking the arithmetic square root; when the data are concentrated, that sum of squares is small, and so is the standard deviation. Therefore, the larger the standard deviation σ, the larger the fluctuation of the data; the smaller σ, the smaller the fluctuation.
In the embodiment of the invention, an average difference rate greater than or equal to the preset difference rate threshold indicates that the difference between the actual mouth-shape changes of the characters' foreign language pronunciation in the target foreign language video and the word counts of the Chinese subtitles is small, i.e., the foreign language lines of the target foreign language video match the Chinese subtitles well and the video is relatively acceptable to users. A standard deviation less than or equal to the preset fluctuation threshold indicates that this difference fluctuates little from line to line, i.e., the difference rate is relatively stable and consistent. Therefore, if the average difference rate is greater than or equal to the preset difference rate threshold and the standard deviation is less than or equal to the preset fluctuation threshold, the matching degree of the foreign language lines of the target foreign language video and the Chinese subtitles is determined to be the first (higher) matching degree; otherwise, it is determined to be the second (lower) matching degree.
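Steps B1-B3 can be sketched as follows; the threshold values `p_valid` and `m` are illustrative placeholders, since the patent leaves both to be tuned for the actual application.

```python
import statistics

def matching_degree(sp: list[float], p_valid: float = 0.8, m: float = 4.0) -> str:
    """Classify a video's line/subtitle match from its difference-rate set SP."""
    p_avr = sum(sp) / len(sp)        # step B1: average difference rate
    sigma = statistics.pstdev(sp)    # step B2: sqrt(mean((P_i - P_avr)^2))
    sigma_valid = p_avr / m          # preset fluctuation threshold sigma_valid
    if p_avr >= p_valid and sigma <= sigma_valid:
        return "first"               # step B3: high matching degree
    return "second"                  # step B3: low matching degree

print(matching_degree([0.9, 0.88, 0.92]))  # concentrated, high average → "first"
print(matching_degree([0.2, 0.9, 0.4]))    # low average, large fluctuation → "second"
```

`statistics.pstdev` is the population standard deviation, which matches the σ formula above (division by N, not N − 1).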
By adopting the method provided by the embodiment of the invention, the difference rate between each foreign language line and its corresponding Chinese subtitle can be determined from the pronunciation change times of each foreign language line of the target foreign language video and the number of words of the corresponding Chinese subtitle. Then, according to the matching degree between the foreign language lines of the target foreign language video and the corresponding Chinese subtitles, foreign language videos that users prefer, namely those whose foreign language lines match their Chinese subtitles well, can be recommended to the user, improving the effect of recommending foreign language videos to the user.
In a possible embodiment, the step of determining the matching degree between the foreign language lines of the target foreign language video and the Chinese subtitles according to the pronunciation change times and the number of words of the Chinese subtitles may further include the following steps C1-C3:
step C1: and determining the sum of the pronunciation change times corresponding to all the words of the target foreign language video as the total pronunciation change time.
Step C2: and determining the total word number of the Chinese subtitles corresponding to all foreign language lines of the target foreign language video.
Step C3: and calculating the absolute value of the ratio of the total word number to the total pronunciation change times as the matching degree of the foreign language lines of the target foreign language video and the Chinese subtitles.
If the matching degree is larger than the preset matching degree threshold value, the difference rate between the actual foreign language pronunciation mouth shape change of the character in the target foreign language video and the word number of the Chinese caption is small, namely the matching degree between the foreign language lines of the target foreign language video and the Chinese caption is high. If the matching degree is not larger than the preset matching degree threshold value, the difference rate between the actual foreign language pronunciation mouth shape change of the character in the target foreign language video and the word number of the Chinese caption is larger, namely the matching degree between the foreign language lines of the target foreign language video and the Chinese caption is lower. The preset matching degree threshold may be specifically set according to the actual application, and is not specifically limited herein.
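The coarser video-level variant of steps C1-C3 can be sketched as follows; the sample counts and the comparison threshold are illustrative placeholders, not values from the patent.

```python
def overall_matching_degree(line_vcs: list[int], subtitle_ccounts: list[int]) -> float:
    """Steps C1-C3: total subtitle characters over total pronunciation changes."""
    total_vc = sum(line_vcs)          # step C1: total pronunciation change times
    total_cc = sum(subtitle_ccounts)  # step C2: total Chinese subtitle characters
    return abs(total_cc / total_vc)   # step C3: absolute value of the ratio

degree = overall_matching_degree([6, 10], [7, 5])
print(degree)        # 12 / 16 = 0.75
print(degree > 0.7)  # compare with a preset matching degree threshold (0.7 is illustrative)
```

Unlike the per-line difference rate, this single ratio can exceed 1 when subtitles are longer overall than the pronunciation changes, so the threshold must be chosen with that in mind.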
In one possible implementation, after the matching degree of the foreign language lines of the target foreign language video and the Chinese subtitles is determined, if the target foreign language video is a video to be recommended, the target foreign language video may be recommended to the user based on that matching degree. Specifically, if the target foreign language video is a video to be recommended and the determined matching degree is the first matching degree, the foreign language lines of the target foreign language video match the Chinese subtitles well; that is, the difference between the mouth-shape changes of the characters speaking the foreign language and the Chinese subtitles is small, so the video looks more natural and users find it more interesting and appealing. Therefore, the matching degree between the foreign language lines and the Chinese subtitles of the target foreign language video can be used as a characteristic dimension for recommending videos to the user, i.e., it can be considered when determining whether to recommend the target foreign language video to the user.
By adopting the method provided by the embodiment of the invention, the matching degree of the foreign language lines and the Chinese subtitles of the target foreign language video can be determined by analyzing the difference rate between the mouth shape change of the figure speaking the foreign language in the target foreign language video and the number of the Chinese subtitles, and the matching degree can be extracted as one characteristic dimension of the target foreign language video to be used as one dimension of a video recommendation system, so that the recommendation accuracy of the video recommendation system is improved, and the recommendation effect of recommending the foreign language video to a user is improved.
Based on the same inventive concept, according to the method for determining the degree of matching between video subtitles provided in the foregoing embodiment of the present invention, correspondingly, another embodiment of the present invention further provides an apparatus for determining the degree of matching between video subtitles, a schematic structural diagram of which is shown in fig. 3, specifically including:
a speech acquiring module 301, configured to acquire foreign language lines of a target foreign language video and the corresponding Chinese subtitles;
a change number determining module 302, configured to query a phonetic symbol corresponding to a word included in the foreign language speech from a preset word phonetic symbol library, and determine a pronunciation change number corresponding to each word based on the phonetic symbol;
and a matching degree determining module 303, configured to determine a matching degree between the foreign language lines of the target foreign language video and the Chinese subtitles according to the pronunciation change times and the number of words of the Chinese subtitles.
By adopting the device provided by the embodiment of the invention, foreign language lines and corresponding Chinese subtitles of the target foreign language video can be obtained; inquiring phonetic symbols corresponding to words included in foreign language lines from a preset word phonetic symbol library, and determining pronunciation change times corresponding to each word based on the phonetic symbols; and determining the matching degree of the foreign language lines of the target foreign language video and the Chinese subtitles according to the pronunciation change times and the number of the Chinese subtitles. Furthermore, according to the determined matching degree between the foreign language lines and the Chinese subtitles of the target foreign language video, the foreign language video with high user preference degree, namely the foreign language video with high matching degree between the foreign language lines and the Chinese subtitles, can be recommended to the user, and the recommendation effect of recommending the foreign language video to the user is improved.
Optionally, the change number determining module 302 is specifically configured to determine the number of vowels in the phonetic symbol corresponding to each word; and determining the number of the vowels as the pronunciation change times corresponding to the word.
Optionally, referring to fig. 4, the matching degree determining module 303 includes:
the frequency determining submodule 401 is configured to determine, for each foreign language speech of the target foreign language video, the sum of the pronunciation change frequencies corresponding to the words included in the foreign language speech as the pronunciation change frequency of the foreign language speech;
a word number determining submodule 402, configured to determine the number of words of the Chinese subtitle corresponding to each foreign language line of the target foreign language video;

a difference rate determining sub-module 403, configured to traverse each foreign language line of the target foreign language video, and determine a difference rate between the line and the corresponding Chinese subtitle based on the pronunciation change times of the line and the number of words of the Chinese subtitle corresponding to it;

and a matching degree determining sub-module 404, configured to determine a matching degree between the foreign language lines of the target foreign language video and the Chinese subtitles according to the difference rate between each foreign language line and its corresponding Chinese subtitle.
Optionally, the difference rate determining sub-module 403 is specifically configured to: if the pronunciation change times of the foreign language line are greater than the number of words of the Chinese subtitle corresponding to the line, determine the ratio of the number of words of the Chinese subtitle to the pronunciation change times of the line as the difference rate between the line and the corresponding Chinese subtitle; and if the pronunciation change times of the foreign language line are less than or equal to the number of words of the Chinese subtitle corresponding to the line, determine the ratio of the pronunciation change times of the line to the number of words of the Chinese subtitle as the difference rate between the line and the corresponding Chinese subtitle.
Optionally, the matching degree determining sub-module 404 is specifically configured to calculate the average value of the difference rates between the foreign language lines of the target foreign language video and the corresponding Chinese subtitles as the average difference rate; calculate the standard deviation of the difference rates based on the average difference rate and the difference rate between each foreign language line of the target foreign language video and its corresponding Chinese subtitle; and if the average difference rate is greater than or equal to a preset difference rate threshold and the standard deviation is less than or equal to a preset fluctuation threshold, determine the matching degree of the foreign language lines of the target foreign language video and the Chinese subtitles as a first matching degree, and otherwise as a second matching degree, wherein the first matching degree is greater than the second matching degree.
Optionally, referring to fig. 5, the apparatus further includes a video recommending module 501, configured to recommend the target foreign language video to the user based on the matching degree between the foreign language lines of the target foreign language video and the Chinese subtitles if the target foreign language video is a video to be recommended.
By adopting the device provided by the embodiment of the invention, the matching degree of foreign language lines and Chinese subtitles of the target foreign language video can be determined by analyzing the difference rate between the mouth shape change of the figure speaking the foreign language in the target foreign language video and the number of the Chinese subtitles, and the matching degree can be extracted as one characteristic dimension of the target foreign language video to be used as one dimension of a video recommendation system, so that the recommendation accuracy of the video recommendation system is improved, and the recommendation effect of recommending the foreign language video to a user is improved.
An embodiment of the present invention further provides an electronic device, as shown in fig. 6, including a processor 601, a communication interface 602, a memory 603, and a communication bus 604, where the processor 601, the communication interface 602, and the memory 603 complete mutual communication through the communication bus 604,
a memory 603 for storing a computer program;
the processor 601 is configured to implement the following steps when executing the program stored in the memory 603:
acquiring foreign language lines and corresponding Chinese subtitles of a target foreign language video;
inquiring phonetic symbols corresponding to words included in the foreign language lines from a preset word phonetic symbol library, and determining pronunciation change times corresponding to each word based on the phonetic symbols;
and determining the matching degree of the foreign language lines of the target foreign language video and the Chinese subtitles according to the pronunciation change times and the word number of the Chinese subtitles.
The communication bus mentioned in the above terminal may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the terminal and other equipment.
The Memory may include a Random Access Memory (RAM) or a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), and the like; the device can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, or a discrete hardware component.
In another embodiment of the present invention, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the method for determining the degree of matching between video subtitles according to any one of the above embodiments.
In another embodiment of the present invention, there is also provided a computer program product containing instructions, which when run on a computer, causes the computer to execute the method for determining the degree of matching of video subtitles according to any one of the above embodiments.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the invention to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus, the electronic device and the storage medium, since they are substantially similar to the method embodiments, the description is relatively simple, and the relevant points can be referred to the partial description of the method embodiments.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A method for determining matching degree of video subtitles is characterized by comprising the following steps:
acquiring foreign language lines and corresponding Chinese subtitles of a target foreign language video;
inquiring phonetic symbols corresponding to words included in the foreign language lines from a preset word phonetic symbol library, and determining pronunciation change times corresponding to each word based on the phonetic symbols;
and determining the matching degree of the foreign language lines of the target foreign language video and the Chinese subtitles according to the pronunciation change times and the word number of the Chinese subtitles.
2. The method of claim 1, wherein determining the number of pronunciation variations for each of the words based on the phonetic symbol comprises:
determining the number of vowels in the phonetic symbol corresponding to each word;
and determining the number of the vowels as the pronunciation change times corresponding to the word.
3. The method of claim 1, wherein determining the matching degree between the foreign language lines of the target foreign language video and the Chinese subtitles according to the pronunciation change times and the number of words of the Chinese subtitles comprises:
determining the sum of pronunciation change times corresponding to words included in the foreign language lines as the pronunciation change times of the foreign language lines aiming at each foreign language line of the target foreign language video;
determining the word number of the Chinese subtitles corresponding to each foreign language speech of the target foreign language video;
traversing each foreign language speech of the target foreign language video, and determining the difference rate between the foreign language speech and the corresponding Chinese caption based on the pronunciation change times of the foreign language speech and the word number of the Chinese caption corresponding to the foreign language speech;
and determining the matching degree of the foreign language lines and the Chinese subtitles of the target foreign language video according to the difference rate between each foreign language line of the target foreign language video and the corresponding Chinese subtitles.
4. The method of claim 3, wherein determining the difference rate between the foreign language line and the corresponding Chinese subtitle based on the pronunciation change times of the foreign language line and the word number of the Chinese subtitle corresponding to the foreign language line comprises:
if the pronunciation change times of the foreign language line are greater than the word number of the Chinese subtitle corresponding to the foreign language line, determining the ratio of the word number of the Chinese subtitle to the pronunciation change times of the foreign language line as the difference rate between the foreign language line and the corresponding Chinese subtitle;
and if the pronunciation change times of the foreign language line are less than or equal to the word number of the Chinese subtitle corresponding to the foreign language line, determining the ratio of the pronunciation change times of the foreign language line to the word number of the Chinese subtitle as the difference rate between the foreign language line and the corresponding Chinese subtitle.
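The two branches of claim 4 reduce to dividing the smaller of the two counts by the larger. A minimal sketch (the function and parameter names are assumptions for illustration):

```python
def difference_rate(changes: int, word_count: int) -> float:
    # Claim 4: if the pronunciation changes exceed the subtitle word count,
    # divide the word count by the changes; otherwise divide the changes by
    # the word count. Either way the result is the smaller count over the
    # larger, a value in (0, 1] that approaches 1 when the counts are close.
    if changes > word_count:
        return word_count / changes
    return changes / word_count

print(difference_rate(10, 8))  # 8/10 -> 0.8
```

Despite being called a "difference rate", a larger value therefore indicates a closer match between a line's pronunciation changes and its subtitle's word count.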
5. The method of claim 3, wherein determining the matching degree of the foreign language lines of the target foreign language video with the Chinese subtitles according to the difference rate between each foreign language line of the target foreign language video and the corresponding Chinese subtitle comprises:
calculating the average value of the difference rates between the foreign language lines of the target foreign language video and the corresponding Chinese subtitles as an average difference rate;
calculating the standard deviation of the difference rates between the foreign language lines of the target foreign language video and the corresponding Chinese subtitles based on the average difference rate and the difference rate between each foreign language line and the corresponding Chinese subtitle;
and if the average difference rate is greater than or equal to a preset difference rate threshold and the standard deviation is less than or equal to a preset fluctuation threshold, determining the matching degree of the foreign language lines of the target foreign language video with the Chinese subtitles to be a first matching degree; otherwise, determining the matching degree to be a second matching degree, wherein the first matching degree is greater than the second matching degree.
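The aggregation step of claim 5 can be sketched as follows. The numeric thresholds and the "high"/"low" labels standing in for the first and second matching degrees are illustrative assumptions; the patent leaves both thresholds as preset values.

```python
from statistics import mean, pstdev

def matching_degree(rates, rate_threshold=0.6, fluctuation_threshold=0.2):
    # Claim 5: average the per-line difference rates, then check that the
    # average clears the preset difference rate threshold while the standard
    # deviation stays under the preset fluctuation threshold.
    avg = mean(rates)
    std = pstdev(rates)  # population standard deviation over all lines
    # "high" stands in for the first (greater) matching degree, "low" for the second.
    return "high" if avg >= rate_threshold and std <= fluctuation_threshold else "low"

print(matching_degree([0.8, 0.7, 0.75]))  # high, stable rates -> "high"
```

The standard-deviation check rejects videos whose subtitles match well only on average, i.e. where some lines fit closely while others diverge sharply.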
6. The method of any one of claims 1-5, wherein after determining the matching degree of the foreign language lines of the target foreign language video with the Chinese subtitles, the method further comprises:
if the target foreign language video is a video to be recommended, recommending the target foreign language video to a user based on the matching degree of the foreign language lines of the target foreign language video with the Chinese subtitles.
7. An apparatus for determining a matching degree of video subtitles, characterized by comprising:
a line acquisition module, configured to acquire foreign language lines of a target foreign language video and the corresponding Chinese subtitles;
a change times determining module, configured to query, from a preset word phonetic symbol library, the phonetic symbols corresponding to the words included in the foreign language lines, and determine the pronunciation change times corresponding to each word based on the phonetic symbols;
and a matching degree determining module, configured to determine the matching degree of the foreign language lines of the target foreign language video with the Chinese subtitles according to the pronunciation change times and the word number of the Chinese subtitles.
8. The apparatus according to claim 7, wherein the change times determining module is specifically configured to determine the number of vowels in the phonetic symbol corresponding to each of the words, and determine the number of the vowels as the pronunciation change times corresponding to the word.
9. An electronic device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1-6 when executing a program stored in the memory.
10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, and the computer program, when executed by a processor, implements the method steps of any one of claims 1-6.
CN202110997692.1A 2021-08-27 2021-08-27 Method and device for determining video subtitle matching degree and electronic equipment Active CN113688283B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110997692.1A CN113688283B (en) 2021-08-27 2021-08-27 Method and device for determining video subtitle matching degree and electronic equipment


Publications (2)

Publication Number Publication Date
CN113688283A true CN113688283A (en) 2021-11-23
CN113688283B CN113688283B (en) 2023-09-05

Family

ID=78583504

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110997692.1A Active CN113688283B (en) 2021-08-27 2021-08-27 Method and device for determining video subtitle matching degree and electronic equipment

Country Status (1)

Country Link
CN (1) CN113688283B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070124133A1 (en) * 2005-10-09 2007-05-31 Kabushiki Kaisha Toshiba Method and apparatus for training transliteration model and parsing statistic model, method and apparatus for transliteration
KR20110110539A (en) * 2010-04-01 2011-10-07 티제이커뮤니케이션 주식회사 Caption data structure and caption player for synchronizing syllables between a sound source and caption data
WO2013097429A1 (en) * 2011-12-30 2013-07-04 Lg Electronics (China) R & D Center Co., Ltd Method and apparatus for recognizing video captions
CN103854648A (en) * 2012-12-08 2014-06-11 上海能感物联网有限公司 Chinese and foreign language voiced image data bidirectional reversible voice converting and subtitle labeling method
CN103984772A (en) * 2014-06-04 2014-08-13 百度在线网络技术(北京)有限公司 Method and device for generating text retrieval subtitle library and video retrieval method and device
CN106126619A (en) * 2016-06-20 2016-11-16 中山大学 A kind of video retrieval method based on video content and system
CN110096715A (en) * 2019-05-06 2019-08-06 北京理工大学 A kind of fusion pronunciation character Chinese-Vietnamese statistical machine translation method
CN110430448A (en) * 2019-07-31 2019-11-08 北京奇艺世纪科技有限公司 A kind of barrage processing method, device and electronic equipment
CN110798635A (en) * 2019-10-16 2020-02-14 重庆爱奇艺智能科技有限公司 Method and device for matching subtitle files for video
US20210160582A1 (en) * 2019-11-21 2021-05-27 Shanghai Hode Information Technology Co., Ltd. Method and system of displaying subtitles, computing device, and readable storage medium



Similar Documents

Publication Publication Date Title
CN106331778B (en) Video recommendation method and device
JP4580885B2 (en) Scene information extraction method, scene extraction method, and extraction apparatus
CN104731960B (en) Method, apparatus and system based on ecommerce webpage content generation video frequency abstract
US7761437B2 (en) Named entity extracting apparatus, method, and program
US10394886B2 (en) Electronic device, computer-implemented method and computer program
EP2405369A1 (en) Automatic Segmentation Of Video
CN111145756B (en) Voice recognition method and device for voice recognition
CN104731959A (en) Video abstraction generating method, device and system based on text webpage content
JP2013529331A (en) Automatic image discovery and recommendation for viewing television content
US9563704B1 (en) Methods, systems, and media for presenting suggestions of related media content
CN110019948B (en) Method and apparatus for outputting information
CN112487300A (en) Video recommendation method and device, electronic equipment and storage medium
CN110430448B (en) Bullet screen processing method and device and electronic equipment
CN113407775B (en) Video searching method and device and electronic equipment
CN113688283A (en) Method and device for determining matching degree of video subtitles and electronic equipment
CN116567351A (en) Video processing method, device, equipment and medium
CN108882024B (en) Video playing method and device and electronic equipment
JP2016177690A (en) Service recommendation device, service recommendation method, and service recommendation program
CN115410188A (en) Bullet screen processing method, equipment and storage medium
CN110942070B (en) Content display method, device, electronic equipment and computer readable storage medium
CN114254634A (en) Multimedia data mining method, device, storage medium and equipment
CN114065785B (en) Real-time online communication translation method and system
CN106815288A (en) A kind of video related information generation method and its device
CN110659419B (en) Method and related device for determining target user
CN110727854B (en) Data processing method and device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant