CN113688283B - Method and device for determining video subtitle matching degree and electronic equipment


Info

Publication number
CN113688283B
CN113688283B
Authority
CN
China
Prior art keywords
foreign language
video
chinese
speech
language speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110997692.1A
Other languages
Chinese (zh)
Other versions
CN113688283A (en)
Inventor
牟晋勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202110997692.1A
Publication of CN113688283A
Application granted
Publication of CN113688283B


Classifications

    • G06F16/7844 — Retrieval of video data characterised by metadata automatically derived from the content, using original textual content or text extracted from visual content or transcript of audio data
    • G06F16/63 — Querying of audio data
    • G06F16/685 — Retrieval of audio data characterised by metadata automatically derived from the content, using automatically derived transcript of audio data, e.g. lyrics
    • G06F16/735 — Querying of video data; filtering based on additional data, e.g. user or group profiles
    • G06F40/44 — Data-driven translation of natural language using statistical methods, e.g. probability models

Abstract

The embodiments of the invention provide a method and a device for determining the matching degree of video subtitles, and an electronic device. The method includes: obtaining the foreign language lines of a target foreign language video and the corresponding Chinese subtitles; querying a preset word phonetic symbol library for the phonetic symbols of the words included in the foreign language lines, and determining the number of pronunciation changes of each word based on its phonetic symbols; and determining the matching degree between the foreign language lines of the target foreign language video and the Chinese subtitles according to the numbers of pronunciation changes and the character counts of the Chinese subtitles. With this method, foreign language videos that users are likely to prefer, namely those whose foreign language lines match their Chinese subtitles well, can be recommended to users according to the determined matching degree, improving the effect of recommending foreign language videos to users.

Description

Method and device for determining video subtitle matching degree and electronic equipment
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for determining a video subtitle matching degree, and an electronic device.
Background
In everyday viewing, users watch a large number of foreign language videos such as movies, television series and variety shows, for example English videos such as Hollywood movies and Disney cartoons. In general, the characters in a foreign language video speak in the foreign language, while a translated Chinese subtitle is displayed at the bottom of the picture so that viewers can understand what is being expressed.
However, during actual playback of a foreign language video, a mismatch between the mouth-shape changes of a character's foreign-language pronunciation and the translated Chinese subtitle can make the character's expression feel unnatural, which harms the viewing experience and in turn reduces the user's preference for the foreign language video.
The degree of correspondence between the mouth-shape changes of characters speaking foreign-language words in a foreign language video and the Chinese subtitle therefore influences how much users like the video: the higher the correspondence, the higher the user's preference for the foreign language video. Accordingly, if video software recommends a foreign language video in which this correspondence is very low, the user experience is likely to suffer. How to determine the correspondence between the mouth-shape changes of characters speaking foreign-language words in a foreign language video and the Chinese subtitle, so as to improve the recommendation of foreign language videos, has thus become an urgent problem.
Disclosure of Invention
The embodiments of the invention aim to provide a method and a device for determining the matching degree of video subtitles, and an electronic device, so as to determine the degree of correspondence between the mouth-shape changes of characters speaking foreign-language words in a foreign language video and the Chinese subtitles.
In a first aspect of the present invention, there is provided a method for determining a matching degree of video subtitles, including:
obtaining a foreign language speech of a target foreign language video and a corresponding Chinese caption;
inquiring phonetic symbols corresponding to words included in the foreign language speech from a preset word phonetic symbol library, and determining pronunciation change times corresponding to each word based on the phonetic symbols;
and determining the matching degree of the foreign language speech of the target foreign language video and the Chinese caption according to the pronunciation change times and the word number of the Chinese caption.
Optionally, the determining, based on the phonetic symbols, the number of pronunciation changes corresponding to each word includes:
determining the number of vowels in the phonetic symbol corresponding to each word;
and determining the number of the vowels as the pronunciation change times corresponding to the word.
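As a sketch of this vowel-counting step: the following Python function counts the vowels in a phonetic transcription and returns that count as the word's number of pronunciation changes. The vowel inventory (the standard British English monophthongs and diphthongs) and the greedy longest-match scan are assumptions, since the patent only specifies counting the vowels in the phonetic symbols.

```python
# Sketch of the vowel-counting rule; the inventory below is the standard
# British English vowel set (an assumption), and the greedy longest-match
# scan is an implementation choice, not taken from the patent.
MONOPHTHONGS = ["iː", "ɪ", "uː", "ʊ", "e", "æ", "ɑː", "ɒ", "ɔː", "ʌ", "ɜː", "ə"]
DIPHTHONGS = ["eɪ", "aɪ", "aʊ", "əʊ", "ɔɪ", "ɪə", "eə", "ʊə"]
# Try longer symbols first so a diphthong such as əʊ counts as one change, not two.
VOWELS = sorted(DIPHTHONGS + MONOPHTHONGS, key=len, reverse=True)

def pronunciation_changes(transcription: str) -> int:
    """Count the vowels in a phonetic transcription; each vowel is taken
    as one pronunciation (mouth-shape) change."""
    count, i = 0, 0
    while i < len(transcription):
        for v in VOWELS:
            if transcription.startswith(v, i):
                count += 1
                i += len(v)
                break
        else:
            i += 1  # skip consonants, stress marks, etc.
    return count
```

For example, `pronunciation_changes("bəˈnɑːnə")` gives 3, in line with the banana example in the description.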
Optionally, the determining, according to the number of pronunciation changes and the number of words of the chinese subtitle, the matching degree of the foreign language speech of the target foreign language video and the chinese subtitle includes:
for each foreign language line of the target foreign language video, determining the sum of the pronunciation change counts of the words included in the line as the pronunciation change count of that line;
determining the word number of Chinese subtitles corresponding to each foreign language line of the target foreign language video;
traversing each foreign language speech of the target foreign language video, and determining the difference rate between the foreign language speech and the corresponding Chinese subtitle based on the pronunciation change times of the foreign language speech and the number of Chinese subtitle characters corresponding to the foreign language speech;
and determining the matching degree of the foreign language speech lines of the target foreign language video and the Chinese subtitles according to the difference rate between each sentence of foreign language speech lines of the target foreign language video and the corresponding Chinese subtitles.
Optionally, the determining the difference rate between the foreign language line and the corresponding chinese subtitle based on the pronunciation change times of the foreign language line and the number of words of the chinese subtitle corresponding to the foreign language line includes:
if the number of pronunciation changes of the foreign language line is greater than the character count of its corresponding Chinese subtitle, determining the ratio of the subtitle's character count to the line's number of pronunciation changes as the difference rate between the line and the subtitle;
if the number of pronunciation changes of the foreign language line is less than or equal to the character count of its corresponding Chinese subtitle, determining the ratio of the line's number of pronunciation changes to the subtitle's character count as the difference rate between the line and the subtitle.
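The two branches above amount to dividing the smaller of the two counts by the larger. A minimal sketch, assuming both counts are positive so no division by zero can occur:

```python
def difference_rate(pron_changes: int, subtitle_chars: int) -> float:
    """Ratio of the smaller count to the larger one. Despite the name,
    a HIGHER value means the line and its subtitle agree more closely
    (1.0 when the two counts coincide)."""
    if pron_changes > subtitle_chars:
        return subtitle_chars / pron_changes
    return pron_changes / subtitle_chars
```

For the "a yellow banana" example later in the description (6 pronunciation changes, 7 subtitle characters), the rate is 6/7 ≈ 0.857.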
Optionally, the determining, according to the difference rate between each foreign language speech of the target foreign language video and the corresponding chinese subtitle, the matching degree of the foreign language speech of the target foreign language video and the chinese subtitle includes:
calculating an average value of the difference rates between the foreign language speech of the target foreign language video and the corresponding Chinese subtitles, and taking the average value as an average difference rate;
calculating a standard deviation of the difference rate between the foreign language speech of the target foreign language video and the corresponding Chinese caption based on the average difference rate and the difference rate between each foreign language speech of the target foreign language video and the corresponding Chinese caption;
if the average difference rate is greater than or equal to a preset difference rate threshold value and the standard deviation is less than or equal to a preset fluctuation threshold value, determining that the matching degree of the foreign language speech lines of the target foreign language video and the Chinese subtitles is a first matching degree, otherwise, determining that the matching degree of the foreign language speech lines of the target foreign language video and the Chinese subtitles is a second matching degree, wherein the first matching degree is greater than the second matching degree.
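A sketch of this classification rule using Python's statistics module. The two threshold values are illustrative placeholders, since the patent leaves both as preset parameters, and the population standard deviation is an assumption (the patent does not say which form of standard deviation is meant):

```python
from statistics import mean, pstdev

def subtitle_match_degree(rates, rate_threshold=0.8, fluctuation_threshold=0.1):
    """Classify a video's per-line difference rates into the "first" (higher)
    or "second" (lower) matching degree. Thresholds are illustrative."""
    avg = mean(rates)
    std = pstdev(rates)  # population standard deviation (assumed form)
    if avg >= rate_threshold and std <= fluctuation_threshold:
        return "first"   # uniformly high agreement across lines
    return "second"
```

A video whose lines agree uniformly well (e.g. rates of 0.9, 0.85, 0.95) lands in the first degree; a low or strongly fluctuating average lands in the second.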
Optionally, after the determining the matching degree of the foreign language speech line of the target foreign language video and the chinese subtitle, the method further includes:
if the target foreign language video is the video to be recommended, recommending the target foreign language video to a user based on the matching degree of the foreign language speech of the target foreign language video and the Chinese subtitle.
In a second aspect of the present invention, there is also provided a device for determining a matching degree of video subtitles, including:
the speech acquisition module is used for acquiring the foreign speech of the target foreign language video and the corresponding Chinese captions;
the change frequency determining module is used for inquiring phonetic symbols corresponding to words included in the foreign language speech from a preset word phonetic symbol library and determining the pronunciation change frequency corresponding to each word based on the phonetic symbols;
and the matching degree determining module is used for determining the matching degree of the foreign language speech of the target foreign language video and the Chinese subtitle according to the pronunciation change times and the word number of the Chinese subtitle.
Optionally, the change number determining module is specifically configured to determine the number of vowels in a phonetic symbol corresponding to each word; and determining the number of the vowels as the pronunciation change times corresponding to the word.
Optionally, the matching degree determining module includes:
the frequency determining submodule is used for determining, for each foreign language line of the target foreign language video, the sum of the pronunciation change counts of the words included in the line as the pronunciation change count of that line;
the word number determining submodule is used for determining the word number of the Chinese subtitle corresponding to each foreign language line of the target foreign language video;
the difference rate determination submodule is used for traversing each foreign language speech of the target foreign language video and determining the difference rate between the foreign language speech and the corresponding Chinese subtitle based on the pronunciation change times of the foreign language speech and the number of Chinese subtitle words corresponding to the foreign language speech;
and the matching degree determining sub-module is used for determining the matching degree of the foreign language speech lines of the target foreign language video and the Chinese subtitles according to the difference rate between each sentence of the foreign language speech lines of the target foreign language video and the corresponding Chinese subtitles.
Optionally, the difference rate determining submodule is specifically configured to: if the number of pronunciation changes of a foreign language line is greater than the character count of its corresponding Chinese subtitle, determine the ratio of the subtitle's character count to the line's number of pronunciation changes as the difference rate between the line and the subtitle; and if the number of pronunciation changes of the line is less than or equal to the subtitle's character count, determine the ratio of the line's number of pronunciation changes to the subtitle's character count as the difference rate between the line and the subtitle.
Optionally, the matching degree determining submodule is specifically configured to calculate an average value of difference rates between the foreign language speech of the target foreign language video and the corresponding chinese subtitle, as an average difference rate; calculating a standard deviation of the difference rate between the foreign language speech of the target foreign language video and the corresponding Chinese caption based on the average difference rate and the difference rate between each foreign language speech of the target foreign language video and the corresponding Chinese caption; if the average difference rate is greater than or equal to a preset difference rate threshold value and the standard deviation is less than or equal to a preset fluctuation threshold value, determining that the matching degree of the foreign language speech lines of the target foreign language video and the Chinese subtitles is a first matching degree, otherwise, determining that the matching degree of the foreign language speech lines of the target foreign language video and the Chinese subtitles is a second matching degree, wherein the first matching degree is greater than the second matching degree.
Optionally, the device further includes a video recommendation module, configured to recommend the target foreign language video to a user based on a matching degree of a foreign language speech of the target foreign language video and a chinese subtitle if the target foreign language video is a video to be recommended.
In yet another aspect of the present invention, there is also provided an electronic device including a processor, a communication interface, a memory and a communication bus, where the processor, the communication interface and the memory communicate with each other through the communication bus;
the memory is used for storing a computer program;
the processor is used for implementing the steps of any of the above methods for determining the video subtitle matching degree when executing the program stored in the memory.
In still another aspect of the present invention, there is further provided a computer-readable storage medium storing a computer program which, when executed by a processor, implements any of the above methods for determining the matching degree of video subtitles.
In yet another aspect of the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the above methods for determining the matching degree of video subtitles.
By adopting the method provided by the embodiments of the invention, the foreign language lines of a target foreign language video and the corresponding Chinese subtitles can be obtained; the phonetic symbols of the words included in the lines are queried from a preset word phonetic symbol library, and the number of pronunciation changes of each word is determined based on its phonetic symbols; and the matching degree between the foreign language lines of the target foreign language video and the Chinese subtitles is determined according to the numbers of pronunciation changes and the character counts of the Chinese subtitles. Furthermore, according to the determined matching degree, foreign language videos that users are likely to prefer, namely those whose foreign language lines match their Chinese subtitles well, can be recommended to users, improving the effect of recommending foreign language videos to users.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Fig. 1 is a flowchart of a method for determining a matching degree of a video subtitle according to an embodiment of the present invention;
FIG. 2 is a flowchart for determining matching degree of a foreign language speech line and a Chinese caption of a target foreign language video according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a device for determining matching degree of video subtitles according to an embodiment of the present invention;
fig. 4 is a schematic diagram of another structure of a device for determining matching degree of video subtitles according to an embodiment of the present invention;
fig. 5 is a schematic diagram of still another structure of a device for determining matching degree of video subtitles according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the accompanying drawings in the embodiments of the present invention.
An embodiment of the invention provides a method for determining the matching degree of video subtitles, so as to determine the degree of correspondence between the mouth-shape changes of characters speaking foreign-language words in a foreign language video and the Chinese subtitles, and thereby improve the recommendation of foreign language videos.
Fig. 1 is a flowchart of a method for determining a matching degree of a video subtitle according to an embodiment of the present invention, where, as shown in fig. 1, the method includes:
step 101, obtaining foreign language speech of the target foreign language video and corresponding Chinese captions.
Step 102, inquiring phonetic symbols corresponding to words included in the foreign language speech from a preset word phonetic symbol library, and determining the pronunciation change times corresponding to each word based on the phonetic symbols.
And step 103, determining the matching degree of the foreign language speech of the target foreign language video and the Chinese caption according to the pronunciation change times and the word number of the Chinese caption.
By adopting the method provided by the embodiments of the invention, the foreign language lines of a target foreign language video and the corresponding Chinese subtitles can be obtained; the phonetic symbols of the words included in the lines are queried from a preset word phonetic symbol library, and the number of pronunciation changes of each word is determined based on its phonetic symbols; and the matching degree between the foreign language lines of the target foreign language video and the Chinese subtitles is determined according to the numbers of pronunciation changes and the character counts of the Chinese subtitles. Furthermore, according to the determined matching degree, foreign language videos that users are likely to prefer, namely those whose foreign language lines match their Chinese subtitles well, can be recommended to users, improving the effect of recommending foreign language videos to users.
In the embodiment of the invention, the foreign language video can be English video, french video or German video, and the like. Specifically, the foreign language video may be a foreign language movie video, a foreign language variety video, a foreign language news video, and the like. The target foreign language video may be a complete foreign language movie video, a complete foreign language variety video or a complete foreign language news video, and the target foreign language video may also be a partial segment of a complete foreign language movie video, a partial segment of a complete foreign language variety video or a partial segment of a complete foreign language news video, which is not specifically limited herein.
In the embodiment of the invention, the foreign language video can be stored in the database of the server. The server side can directly determine the matching degree of the foreign language lines and the Chinese subtitles aiming at the target foreign language video needing to determine the matching degree of the video subtitles. In the embodiment of the invention, the matching degree of the foreign language speech and the Chinese caption can be determined according to the target foreign language video played at the client: when the matching degree of the video subtitles needs to be determined, the client can send the foreign language speech of the target foreign language video and the corresponding Chinese subtitles to the server, and then the server determines the matching degree of the foreign language speech and the Chinese subtitles by adopting the method provided by the embodiment of the invention.
In the embodiment of the invention, a correspondence library of foreign language words and their phonetic symbols, that is, a preset word phonetic symbol library, can be maintained at the server side. Each foreign language word is stored in the preset word phonetic symbol library together with its phonetic symbols. For example, a preset word phonetic symbol library storing each English word and its phonetic symbols may be maintained at the server, and the phonetic symbols of each English word can then be looked up in it; for example, for the English word banana, the British transcription /bəˈnɑːnə/ and the American transcription /bəˈnænə/ may be found.
In one possible implementation manner, the determining the number of pronunciation change times corresponding to each word based on the phonetic symbols may specifically include steps A1-A2:
step A1: the number of vowels in the phonetic symbol corresponding to each word is determined.
Step A2: the number of vowels is determined as the number of pronunciation changes corresponding to the word.
If the target foreign language video is an English video, the server can maintain English words and their phonetic symbols in a preset English word phonetic symbol library. The vowels, consonants and so on of the English phonetic symbols are stored in the preset English word phonetic symbol library; for example, the stored vowel inventory may include the following single vowels and diphthongs:
Single vowels: [i:], [i], [u:], [u], [ə], [ə:], [ɑ:], [ʌ], [e], [æ], [ɔ:], [ɔ];
Diphthongs: [ei], [ai], [au], [əu], [ɔi], [iə], [eə], [uə].
In this embodiment, for each word of the target foreign language video, its phonetic symbols are looked up in the preset English word phonetic symbol library, the number of vowels in the phonetic symbols is determined, and that number is taken as the word's number of pronunciation changes. For example, if the target foreign language video includes the English word banana, its transcriptions can be found in the preset English word phonetic symbol library: the British transcription /bəˈnɑːnə/ and the American transcription /bəˈnænə/. The vowels of the British form are [ə], [ɑː] and [ə]; the vowels of the American form are [ə], [æ] and [ə]. In either transcription the number of vowels is 3, so the number of pronunciation changes of the English word banana is determined to be 3. The number of pronunciation changes of a word reflects how many times a character's mouth shape changes when reading the word, and the two are consistent: for example, a character's mouth shape changes 3 times when reading banana, matching the 3 pronunciation changes of the word.
In a possible implementation manner, fig. 2 is a flowchart of determining a matching degree of a foreign language speech line of a target foreign language video and a chinese subtitle according to an embodiment of the present invention, as shown in fig. 2, the step of determining the matching degree of the foreign language speech line of the target foreign language video and the chinese subtitle according to the number of pronunciation change times and the number of words of the chinese subtitle may specifically include:
Step 201, for each foreign language speech of the target foreign language video, determining the sum of pronunciation change times corresponding to the words included in the foreign language speech as the pronunciation change times of the foreign language speech.
In the embodiment of the invention, each foreign language line of the target foreign language video corresponds to a Chinese subtitle, and each foreign language line and its corresponding Chinese subtitle are stored together with the identifier of the target foreign language video. Each foreign language line and its corresponding Chinese subtitle also have identifiers of their own, which can be the serial number of the line (and of its subtitle) within all lines of the target video. For example, the identifier of target foreign language video A is V_id; the identifier of the third foreign language line in video A is E_3, the serial number of that line among all lines of video A; and the identifier of the Chinese subtitle corresponding to the third foreign language line is likewise C_3, its serial number among all subtitles of video A. In addition, the third foreign language line of video A and its corresponding Chinese subtitle are stored together with the identifier V_id of video A.
Each foreign language word may include at least one word. In this step, the sum of the number of pronunciation changes corresponding to all the words included in each foreign language speech line may be determined as the number of pronunciation changes of the foreign language speech line.
For example, suppose the target foreign language video is an English video S whose first speech is "a yellow banana"; this speech includes the words "a", "yellow" and "banana". The British phonetic symbol /ə/ and American phonetic symbol /ə/ of the word "a" can be looked up in the preset word phonetic-symbol library, and the number of vowels in the phonetic symbol of "a" can be determined to be 1, i.e., the number of pronunciation changes of "a" is 1. Likewise, the British phonetic symbol /ˈjeləʊ/ and American phonetic symbol /ˈjeloʊ/ of the word "yellow" can be looked up, and the number of vowels determined to be 2, i.e., the number of pronunciation changes of "yellow" is 2; and the British phonetic symbol /bəˈnɑːnə/ and American phonetic symbol /bəˈnænə/ of the word "banana" can be looked up, and the number of vowels determined to be 3, i.e., the number of pronunciation changes of "banana" is 3. Therefore, the sum 1 + 2 + 3 = 6 of the pronunciation changes of the words "a", "yellow" and "banana" can be determined as the number of pronunciation changes of the first speech "a yellow banana" in the English video S.
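The per-word vowel counting described above can be sketched as follows. The phonetic-symbol library and its lookup shape are hypothetical (the source does not specify the library's API), and the vowel/diphthong sets are a simplified assumption; a diphthong is collapsed to a single placeholder so it counts as one pronunciation change.

```python
# Hypothetical phonetic-symbol library mapping each word to one IPA
# transcription (an assumption; the real library is not specified).
PHONETIC_LIBRARY = {
    "a": "ə",
    "yellow": "ˈjeləʊ",
    "banana": "bəˈnɑːnə",
}

# Single IPA vowel symbols (simplified set); diphthongs are collapsed
# first so that each counts as one vowel sound.
IPA_VOWELS = set("aeiouæɑɒəɜɪʊʌɔ")
DIPHTHONGS = ["eɪ", "aɪ", "ɔɪ", "əʊ", "aʊ", "ɪə", "eə", "ʊə"]

def pronunciation_changes(word: str) -> int:
    """Number of vowel sounds in the word's phonetic transcription."""
    phonetic = PHONETIC_LIBRARY[word.lower()]
    for d in DIPHTHONGS:
        phonetic = phonetic.replace(d, "V")  # one diphthong = one vowel sound
    return sum(1 for ch in phonetic if ch in IPA_VOWELS or ch == "V")

def line_pronunciation_changes(line: str) -> int:
    """Step 201: sum of per-word counts for one spoken line."""
    return sum(pronunciation_changes(w) for w in line.split())

print(line_pronunciation_changes("a yellow banana"))  # 1 + 2 + 3 = 6
```

For "a yellow banana" this yields 1, 2 and 3 vowels for the three words, matching the example above.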
Step 202, determining the number of characters of the Chinese subtitle corresponding to each foreign language speech of the target foreign language video.
For example, if the first speech in the English video S is "a yellow banana" and the corresponding Chinese subtitle is "一个黄色的香蕉", the character count of that Chinese subtitle can be determined to be 7.
Step 203, traversing each foreign language speech of the target foreign language video, and determining the difference rate between the foreign language speech and the corresponding Chinese subtitle based on the pronunciation change times of the foreign language speech and the number of Chinese subtitle words corresponding to the foreign language speech.
In this step, if the number of pronunciation changes of the foreign language speech is greater than the character count of the Chinese subtitle corresponding to the foreign language speech, the ratio of the subtitle character count to the number of pronunciation changes is determined as the difference rate between the foreign language speech and the corresponding Chinese subtitle; if the number of pronunciation changes is less than or equal to the subtitle character count, the ratio of the number of pronunciation changes to the subtitle character count is determined as the difference rate. In either case the smaller quantity is divided by the larger one, so the difference rate lies in (0, 1].
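The two-branch rule in this step can be sketched as below; the zero guard for empty lines is an added assumption, not taken from the source.

```python
def difference_rate(pronunciation_changes: int, subtitle_chars: int) -> float:
    """Difference rate between one foreign language speech and its
    Chinese subtitle: the smaller count divided by the larger count."""
    if pronunciation_changes == 0 or subtitle_chars == 0:
        return 0.0  # guard for empty input (an assumption, not in the source)
    return (min(pronunciation_changes, subtitle_chars)
            / max(pronunciation_changes, subtitle_chars))

# "a yellow banana": 6 pronunciation changes vs. 7 subtitle characters.
print(round(difference_rate(6, 7), 3))  # 0.857
```

A value close to 1 means the mouth-shape change count and the subtitle length are close, which the method treats as a good match.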
For example, if the English video S includes N English speech lines, the set of all English speech is S_English = [E_1, E_2, …, E_N]. Each English speech E_i (1 ≤ i ≤ N) can be traversed; for each word u_x in E_i, the corresponding phonetic symbol SoundMark_u_x is looked up in the preset word phonetic-symbol library, and the number of vowels Vcount_u_x in SoundMark_u_x is calculated. Thus, for each speech E_i of the English video S, the set of per-word vowel counts Vc = [Vcount_1, Vcount_2, …, Vcount_X] can be obtained. Furthermore, the set of total vowel counts SVc = [Vc_all_1, Vc_all_2, …, Vc_all_N] corresponding to the speech in S_English can be obtained, i.e., the numbers of pronunciation changes of all English speech in the English video S.
If the target foreign language video is the English video S, the set of Chinese subtitles corresponding to the English speech in S is S_Chinese = [C_1, C_2, …, C_N]. Each Chinese subtitle C_i (1 ≤ i ≤ N) can be traversed and its character count Ccount_i calculated, yielding the set of per-subtitle character counts Cc = [Ccount_1, Ccount_2, …, Ccount_N] for S_Chinese, where each Chinese subtitle C_i corresponds to the English speech E_i.
Taking the first English speech E_1 in the English video S and its corresponding Chinese subtitle C_1 as an example: if the number of pronunciation changes Vc_all_1 of E_1 is greater than the character count Ccount_1 of C_1, the ratio of Ccount_1 to Vc_all_1 is determined as the difference rate between the speech and its subtitle, P_1 = Ccount_1 / Vc_all_1; if Vc_all_1 is less than or equal to Ccount_1, the ratio of Vc_all_1 to Ccount_1 is determined as the difference rate, P_1 = Vc_all_1 / Ccount_1.
By traversing each speech of the English video S, the difference rate between each speech and its corresponding Chinese subtitle can be obtained; these difference rates form the set SP = [P_1, …, P_N], where P_i (1 ≤ i ≤ N) is the difference rate between the i-th speech of the English video S and its corresponding Chinese subtitle.
Step 204, determining the matching degree of the foreign language speech lines of the target foreign language video and the Chinese subtitles according to the difference rate between each sentence of foreign language speech lines of the target foreign language video and the corresponding Chinese subtitles.
In this step, determining the matching degree of the foreign language speech and the Chinese subtitle may specifically include the following steps B1 to B3:
step B1: and calculating an average value of the difference rates between the foreign language speech of the target foreign language video and the corresponding Chinese subtitles as an average difference rate.
For example, if the target foreign language video is the English video S with the difference-rate set SP = [P_1, …, P_N], the average value P_avr = (P_1 + … + P_N) / N of the difference rates between all the speech of S and the corresponding Chinese subtitles can be calculated and taken as the average difference rate.
Step B2: based on the average difference rate and the difference rate between each foreign language speech of the target foreign language video and the corresponding Chinese caption, calculating the standard deviation of the difference rate between the foreign language speech of the target foreign language video and the corresponding Chinese caption.
For example, given the difference-rate set SP = [P_1, …, P_N] of the English video S and the average difference rate P_avr = (P_1 + … + P_N) / N, the standard deviation σ of the difference rates between all the speech of S and the corresponding Chinese subtitles can be calculated as σ = sqrt( (1/N) · Σ_{i=1}^{N} (P_i − P_avr)² ).
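Steps B1 and B2 can be sketched as follows. `statistics.pstdev` computes the population standard deviation, i.e. it divides by N rather than N − 1, matching the σ defined for the difference-rate set.

```python
import statistics

def average_and_std(rates: list[float]) -> tuple[float, float]:
    """Step B1: average difference rate P_avr.
    Step B2: population standard deviation sigma of the rates."""
    p_avr = sum(rates) / len(rates)
    sigma = statistics.pstdev(rates, mu=p_avr)  # divides by N
    return p_avr, sigma

p_avr, sigma = average_and_std([0.8, 0.9, 0.85])
print(round(p_avr, 3), round(sigma, 3))  # 0.85 0.041
```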
step B3: if the average difference rate is greater than or equal to a preset difference rate threshold value and the standard deviation is less than or equal to a preset fluctuation threshold value, determining that the matching degree of the foreign language speech of the target foreign language video and the Chinese subtitle is a first matching degree, otherwise, determining that the matching degree of the foreign language speech of the target foreign language video and the Chinese subtitle is a second matching degree.
Wherein the first degree of matching is greater than the second degree of matching.
In the embodiment of the invention, the preset difference-rate threshold P_valid is smaller than 1 and, on that premise, can be adjusted according to practical application conditions; it is not specifically limited herein. The preset fluctuation threshold may be set to σ_valid = P_avr / M, where M may be an empirical reference value that can be appropriately adjusted according to practical application conditions; it is likewise not specifically limited herein.
In the embodiment of the invention, the practical meaning reflected by the standard deviation of the difference rates between the foreign language speech of the target foreign language video and the corresponding Chinese subtitles is as follows: when the data are relatively scattered (i.e., the difference rates fluctuate more around the average difference rate), the sum of squared deviations from the average is larger, and so is the standard deviation after taking the arithmetic square root; when the data are relatively concentrated, the sum of squared deviations is small. Thus, the larger the standard deviation σ, the larger the fluctuation of the data; the smaller σ, the smaller the fluctuation.
In the embodiment of the invention, an average difference rate greater than or equal to the preset difference-rate threshold indicates that the difference between the actual foreign-language mouth-shape changes of the characters in the target foreign language video and the character counts of the Chinese subtitles is small, i.e., the matching degree of the foreign language speech and the Chinese subtitles is high and acceptable to users. A standard deviation less than or equal to the preset fluctuation threshold indicates that the difference rate between each speech's actual mouth-shape changes and its Chinese subtitle does not fluctuate much, i.e., the difference rate is relatively stable and continuous. Therefore, if the average difference rate is greater than or equal to the preset difference-rate threshold and the standard deviation is less than or equal to the preset fluctuation threshold, the matching degree of the foreign language speech of the target foreign language video and the Chinese subtitles is determined to be the first matching degree, meaning a higher matching degree; otherwise, the matching degree is lower.
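The decision rule of step B3 can be sketched as below. The concrete values of P_valid and M are illustrative placeholders, since the source leaves both to be tuned in practice.

```python
def matching_degree(p_avr: float, sigma: float,
                    p_valid: float = 0.6, m: float = 10.0) -> str:
    """Step B3: first matching degree when the average difference rate
    is high enough and its fluctuation small enough; otherwise second.
    p_valid (< 1) and m are illustrative values, not fixed by the source."""
    sigma_valid = p_avr / m  # preset fluctuation threshold
    if p_avr >= p_valid and sigma <= sigma_valid:
        return "first"
    return "second"

print(matching_degree(0.85, 0.041))  # first: 0.85 >= 0.6 and 0.041 <= 0.085
print(matching_degree(0.40, 0.041))  # second: average below the threshold
```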
By adopting the method provided by the embodiment of the invention, the difference rate between each foreign language speech and its corresponding Chinese subtitle can be determined from the number of pronunciation changes of the speech and the character count of the subtitle. Foreign language videos that users prefer, namely those whose foreign language speech matches the Chinese subtitles well, can then be recommended to users according to this matching degree, improving the effect of recommending foreign language videos to users.
In one possible implementation manner, the step of determining the matching degree of the foreign language speech of the target foreign language video and the Chinese subtitle according to the number of pronunciation changes and the character count of the Chinese subtitle may specifically further include the following steps C1-C3:
step C1: and determining the sum of pronunciation change times corresponding to all words of the target foreign language video as the total pronunciation change times.
Step C2: and determining the total word number of Chinese subtitles corresponding to all the foreign language lines of the target foreign language video.
Step C3: and calculating the absolute value of the ratio of the total word number to the total pronunciation change number, and taking the absolute value as the matching degree of the foreign language speech of the target foreign language video and the Chinese subtitle.
If the matching degree is greater than a preset matching-degree threshold, the difference between the actual foreign-language mouth-shape changes of the characters in the target foreign language video and the character counts of the Chinese subtitles is small, i.e., the matching degree of the foreign language speech and the Chinese subtitles is high. If the matching degree is not greater than the threshold, that difference is large, i.e., the matching degree is low. The preset matching-degree threshold may be set according to practical applications and is not specifically limited herein.
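Steps C1-C3 can be sketched as follows; the per-line counts are assumed to be precomputed, and the matching-degree threshold is an illustrative value only.

```python
def overall_matching_degree(change_counts: list[int],
                            subtitle_char_counts: list[int],
                            threshold: float = 0.8) -> tuple[float, bool]:
    """Steps C1-C3: total subtitle characters divided by total
    pronunciation changes; `threshold` is an illustrative value."""
    total_changes = sum(change_counts)           # Step C1
    total_chars = sum(subtitle_char_counts)      # Step C2
    degree = abs(total_chars / total_changes)    # Step C3
    return degree, degree > threshold

print(overall_matching_degree([6, 4], [7, 5]))  # (1.2, True)
```

Both totals are non-negative, so the absolute value is defensive only; the boolean indicates whether the matching degree exceeds the preset threshold.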
In one possible implementation manner, after the matching degree of the foreign language speech of the target foreign language video and the Chinese subtitles is determined, if the target foreign language video is a video to be recommended, it may be recommended to the user based on this matching degree. Specifically, if the determined matching degree is the first matching degree, the matching degree is high: the difference between the mouth-shape changes of the foreign-language-speaking characters in the video and the Chinese subtitles is small, the video looks more natural, and users find it more interesting and likable. Therefore, the matching degree of the foreign language speech and the Chinese subtitles can be used as one feature dimension for recommending videos to users, i.e., it can be considered when determining whether to recommend the target foreign language video to a user.
By adopting the method provided by the embodiment of the invention, the matching degree of the foreign language speech of the target foreign language video and the Chinese subtitles can be determined by analyzing the difference rate between the mouth-shape changes of the foreign-language-speaking characters in the video and the Chinese subtitle character counts. This matching degree can be extracted as one feature dimension of the target foreign language video and used as one dimension of a video recommendation system, improving the recommendation accuracy of the system and the effect of recommending foreign language videos to users.
Based on the same inventive concept, according to the method for determining the matching degree of video subtitles provided in the foregoing embodiment of the present invention, correspondingly, another embodiment of the present invention further provides a device for determining the matching degree of video subtitles, where a schematic structural diagram of the device is shown in fig. 3, and the device specifically includes:
the speech acquisition module 301 is configured to acquire the foreign language speech of a target foreign language video and the corresponding Chinese subtitle;
the change number determining module 302 is configured to query a preset word phonetic-symbol library for the phonetic symbols corresponding to the words included in the foreign language speech, and determine the number of pronunciation changes corresponding to each word based on the phonetic symbols;
and the matching degree determining module 303 is configured to determine the matching degree of the foreign language speech of the target foreign language video and the Chinese subtitle according to the number of pronunciation changes and the character count of the Chinese subtitle.
By adopting the device provided by the embodiment of the invention, the foreign language speech of the target foreign language video and the corresponding Chinese subtitles can be obtained; the phonetic symbols corresponding to the words included in the foreign language speech can be queried from a preset word phonetic-symbol library, and the number of pronunciation changes corresponding to each word determined based on the phonetic symbols; and the matching degree of the foreign language speech of the target foreign language video and the Chinese subtitles can be determined according to the number of pronunciation changes and the character count of the Chinese subtitles. Furthermore, foreign language videos that users prefer, namely those whose foreign language speech matches the Chinese subtitles well, can be recommended to users according to the determined matching degree, improving the effect of recommending foreign language videos to users.
Optionally, the number of changes determining module 302 is specifically configured to determine the number of vowels in the phonetic symbol corresponding to each word; and determining the number of the vowels as the pronunciation change times corresponding to the word.
Optionally, referring to fig. 4, the matching degree determining module 303 includes:
a number determining submodule 401, configured to determine, for each foreign language speech of the target foreign language video, a sum of pronunciation change numbers corresponding to words included in the foreign language speech as a pronunciation change number of the foreign language speech;
a word number determining sub-module 402, configured to determine the number of words of the chinese subtitle corresponding to each foreign language line of the target foreign language video;
a difference rate determining submodule 403, configured to traverse each foreign language speech of the target foreign language video, and determine a difference rate between the foreign language speech and the corresponding chinese subtitle based on the number of pronunciation changes of the foreign language speech and the number of words of the chinese subtitle corresponding to the foreign language speech;
and the matching degree determining sub-module 404 is configured to determine the matching degree of the foreign language speech lines of the target foreign language video and the chinese subtitles according to the difference rate between each sentence of the foreign language speech lines of the target foreign language video and the corresponding chinese subtitles.
Optionally, the difference rate determining submodule 403 is specifically configured to: if the number of pronunciation changes of the foreign language speech is greater than the character count of the Chinese subtitle corresponding to the foreign language speech, determine the ratio of the subtitle character count to the number of pronunciation changes as the difference rate between the foreign language speech and the corresponding Chinese subtitle; if the number of pronunciation changes is less than or equal to the subtitle character count, determine the ratio of the number of pronunciation changes to the subtitle character count as the difference rate between the foreign language speech and the corresponding Chinese subtitle.
Optionally, the matching degree determining submodule 404 is specifically configured to: calculate the average value of the difference rates between the foreign language speech of the target foreign language video and the corresponding Chinese subtitles as the average difference rate; calculate, based on the average difference rate and the difference rate between each foreign language speech and its corresponding Chinese subtitle, the standard deviation of the difference rates; and if the average difference rate is greater than or equal to a preset difference-rate threshold and the standard deviation is less than or equal to a preset fluctuation threshold, determine that the matching degree of the foreign language speech of the target foreign language video and the Chinese subtitles is a first matching degree, and otherwise a second matching degree, wherein the first matching degree is greater than the second matching degree.
Optionally, referring to fig. 5, the apparatus further includes a video recommendation module 501, configured to recommend the target foreign language video to the user based on the matching degree of the foreign language speech of the target foreign language video and the chinese subtitle if the target foreign language video is the video to be recommended.
By adopting the device provided by the embodiment of the invention, the matching degree of the foreign language speech of the target foreign language video and the Chinese subtitles can be determined by analyzing the difference rate between the mouth-shape changes of the foreign-language-speaking characters in the video and the Chinese subtitle character counts. This matching degree can be extracted as one feature dimension of the target foreign language video and used as one dimension of a video recommendation system, improving the recommendation accuracy of the system and the effect of recommending foreign language videos to users.
The embodiment of the invention also provides an electronic device, as shown in fig. 6, which comprises a processor 601, a communication interface 602, a memory 603 and a communication bus 604, wherein the processor 601, the communication interface 602 and the memory 603 complete communication with each other through the communication bus 604,
a memory 603 for storing a computer program;
the processor 601 is configured to execute the program stored in the memory 603, and implement the following steps:
Obtaining a foreign language speech of a target foreign language video and a corresponding Chinese caption;
inquiring phonetic symbols corresponding to words included in the foreign language speech from a preset word phonetic symbol library, and determining pronunciation change times corresponding to each word based on the phonetic symbols;
and determining the matching degree of the foreign language speech of the target foreign language video and the Chinese caption according to the pronunciation change times and the word number of the Chinese caption.
The communication bus mentioned for the above terminal may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. The communication bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one bold line is shown in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the terminal and other devices.
The memory may include random access memory (Random Access Memory, RAM) or non-volatile memory (non-volatile memory), such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP for short), etc.; but also digital signal processors (Digital Signal Processor, DSP for short), application specific integrated circuits (Application Specific Integrated Circuit, ASIC for short), field-programmable gate arrays (Field-Programmable Gate Array, FPGA for short) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
In yet another embodiment of the present invention, a computer readable storage medium is provided, where a computer program is stored, where the computer program is executed by a processor to implement the method for determining a matching degree of video subtitles according to any of the foregoing embodiments.
In yet another embodiment of the present invention, a computer program product containing instructions that, when executed on a computer, cause the computer to perform the method for determining a matching degree of video subtitles according to any of the above embodiments is also provided.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, produces a flow or function in accordance with embodiments of the present invention, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in or transmitted from one computer-readable storage medium to another, for example, by wired (e.g., coaxial cable, optical fiber, digital Subscriber Line (DSL)), or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid State Disk (SSD)), etc.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In this specification, each embodiment is described in a related manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the apparatus, the electronic device and the storage medium, since they are substantially similar to the method embodiments, the description is relatively simple, and the relevant points are referred to in the description of the method embodiments.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims (7)

1. A method for determining a matching degree of video subtitles, comprising:
obtaining a foreign language speech of a target foreign language video and a corresponding Chinese caption;
inquiring phonetic symbols corresponding to words included in the foreign language speech lines from a preset word phonetic symbol library, determining the number of vowels in the phonetic symbols corresponding to each word, and determining the number of vowels as the pronunciation change times corresponding to the word;
aiming at each foreign language speech of the target foreign language video, determining the sum of pronunciation change times corresponding to words included in the foreign language speech as the pronunciation change times of the foreign language speech;
determining the word number of Chinese subtitles corresponding to each foreign language line of the target foreign language video;
traversing each foreign language speech of the target foreign language video, and determining the difference rate between the foreign language speech and the corresponding Chinese subtitle based on the pronunciation change times of the foreign language speech and the number of Chinese subtitle characters corresponding to the foreign language speech;
And determining the matching degree of the foreign language speech lines of the target foreign language video and the Chinese subtitles according to the difference rate between each sentence of foreign language speech lines of the target foreign language video and the corresponding Chinese subtitles.
2. The method of claim 1, wherein determining the difference rate between the foreign language line and the corresponding Chinese subtitle based on the line's pronunciation change count and the character count of the corresponding Chinese subtitle comprises:
if the line's pronunciation change count is greater than the character count of the corresponding Chinese subtitle, determining the ratio of the subtitle's character count to the line's pronunciation change count as the difference rate between the line and the subtitle;
if the line's pronunciation change count is less than or equal to the character count of the corresponding Chinese subtitle, determining the ratio of the line's pronunciation change count to the subtitle's character count as the difference rate between the line and the subtitle.
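Claim 2 divides the smaller of the two counts by the larger, so the difference rate always lies in [0, 1] and equals 1 when the counts coincide. A minimal sketch (the function name and the zero-count guard are my additions; the claim does not address empty inputs):

```python
def difference_rate(pronunciation_changes: int, subtitle_chars: int) -> float:
    """Claim 2: ratio of the smaller count to the larger, in [0, 1]."""
    if pronunciation_changes == 0 and subtitle_chars == 0:
        return 1.0  # edge case not covered by the claim; empty vs empty
    if pronunciation_changes > subtitle_chars:
        return subtitle_chars / pronunciation_changes
    return pronunciation_changes / subtitle_chars
```

A rate near 1 means the line's syllable count and the subtitle's character count are close, which the method treats as a sign of a faithful subtitle.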
3. The method according to claim 1, wherein determining the matching degree between the foreign language lines of the target foreign language video and the Chinese subtitles according to the difference rate between each foreign language line and its corresponding Chinese subtitle comprises:
calculating the average of the difference rates between the foreign language lines of the target foreign language video and their corresponding Chinese subtitles as the average difference rate;
calculating the standard deviation of the difference rates based on the average difference rate and the difference rate between each foreign language line and its corresponding Chinese subtitle;
if the average difference rate is greater than or equal to a preset difference rate threshold and the standard deviation is less than or equal to a preset fluctuation threshold, determining that the matching degree between the foreign language lines of the target foreign language video and the Chinese subtitles is a first matching degree; otherwise, determining that the matching degree is a second matching degree, wherein the first matching degree is greater than the second matching degree.
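The decision rule of claim 3 can be sketched with the standard library. The threshold values below are hypothetical placeholders; the patent leaves the "preset difference rate threshold" and "preset fluctuation threshold" unspecified:

```python
from statistics import mean, pstdev

def matching_degree(rates, rate_threshold=0.6, fluctuation_threshold=0.15):
    """Claim 3 sketch: 'first' (high) match iff the average difference rate
    is high enough AND the rates fluctuate little across lines."""
    avg = mean(rates)               # average difference rate
    spread = pstdev(rates, mu=avg)  # standard deviation around that average
    if avg >= rate_threshold and spread <= fluctuation_threshold:
        return "first"
    return "second"
```

Requiring a small standard deviation in addition to a high mean rejects videos whose subtitles match well on average but badly on individual lines.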
4. The method according to any one of claims 1-3, wherein after determining the matching degree between the foreign language lines of the target foreign language video and the Chinese subtitles, the method further comprises:
if the target foreign language video is a video to be recommended, recommending the target foreign language video to a user based on the matching degree between its foreign language lines and the Chinese subtitles.
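One way to use the matching degree in recommendation is to rank candidate videos by it. This ranking policy is my assumption; claim 4 only says the recommendation is "based on" the matching degree:

```python
def recommend(video_ids, matching_degrees):
    """Rank candidate videos so that 'first' (high) matches come first.

    `video_ids` is a list of candidate ids; `matching_degrees` maps each id
    to "first" or "second" as produced by the method of claim 3.
    """
    order = {"first": 0, "second": 1}
    return sorted(video_ids, key=lambda vid: order[matching_degrees[vid]])
```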
5. A video subtitle matching degree determining apparatus, comprising:
a line acquisition module for acquiring the foreign language lines of a target foreign language video and the corresponding Chinese subtitles;
a change count determining module for querying a preset word phonetic symbol library for the phonetic symbols of the words included in each foreign language line, determining the number of vowels in each word's phonetic symbols, and taking that number as the word's pronunciation change count;
a matching degree determining module for: determining, for each foreign language line of the target foreign language video, the sum of the pronunciation change counts of the words included in the line as the line's pronunciation change count; determining the number of Chinese characters in the Chinese subtitle corresponding to each foreign language line; traversing each foreign language line of the target foreign language video and determining the difference rate between the line and its corresponding Chinese subtitle based on the line's pronunciation change count and the character count of the corresponding Chinese subtitle; and determining the matching degree between the foreign language lines of the target foreign language video and the Chinese subtitles according to the difference rate between each foreign language line and its corresponding Chinese subtitle.
6. An electronic device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another via the communication bus;
the memory is configured to store a computer program;
the processor is configured to carry out the method steps of any one of claims 1-4 when executing the program stored in the memory.
7. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the method steps of any one of claims 1-4.
CN202110997692.1A 2021-08-27 2021-08-27 Method and device for determining video subtitle matching degree and electronic equipment Active CN113688283B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110997692.1A CN113688283B (en) 2021-08-27 2021-08-27 Method and device for determining video subtitle matching degree and electronic equipment


Publications (2)

Publication Number Publication Date
CN113688283A CN113688283A (en) 2021-11-23
CN113688283B true CN113688283B (en) 2023-09-05

Family

ID=78583504

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110997692.1A Active CN113688283B (en) 2021-08-27 2021-08-27 Method and device for determining video subtitle matching degree and electronic equipment

Country Status (1)

Country Link
CN (1) CN113688283B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20110110539A * 2010-04-01 2011-10-07 TJ Communication Co., Ltd. Caption data structure and caption player for synchronizing syllables between a sound source and caption data
WO2013097429A1 (en) * 2011-12-30 2013-07-04 Lg Electronics (China) R & D Center Co., Ltd Method and apparatus for recognizing video captions
CN103854648A (en) * 2012-12-08 2014-06-11 上海能感物联网有限公司 Chinese and foreign language voiced image data bidirectional reversible voice converting and subtitle labeling method
CN103984772A (en) * 2014-06-04 2014-08-13 百度在线网络技术(北京)有限公司 Method and device for generating text retrieval subtitle library and video retrieval method and device
CN106126619A (en) * 2016-06-20 2016-11-16 中山大学 A kind of video retrieval method based on video content and system
CN110096715A (en) * 2019-05-06 2019-08-06 北京理工大学 A kind of fusion pronunciation character Chinese-Vietnamese statistical machine translation method
CN110430448A (en) * 2019-07-31 2019-11-08 北京奇艺世纪科技有限公司 A kind of barrage processing method, device and electronic equipment
CN110798635A (en) * 2019-10-16 2020-02-14 重庆爱奇艺智能科技有限公司 Method and device for matching subtitle files for video

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100483399C * 2005-10-09 2009-04-29 Toshiba Corp Training transliteration model, segmentation statistic model and automatic transliterating method and device
CN112825561A (en) * 2019-11-21 2021-05-21 上海幻电信息科技有限公司 Subtitle display method, system, computer device and readable storage medium



Similar Documents

Publication Publication Date Title
CN106331778B (en) Video recommendation method and device
US11017178B2 (en) Methods, devices, and systems for constructing intelligent knowledge base
JP4580885B2 (en) Scene information extraction method, scene extraction method, and extraction apparatus
CN111767461B (en) Data processing method and device
CN107885852B (en) APP recommendation method and system based on APP usage record
CN112328906A (en) Content item recommendation method, device, equipment and storage medium
CN111372141B (en) Expression image generation method and device and electronic equipment
CN108021619B (en) Event description object recommendation method and device
EP4310695A1 (en) Data processing method and apparatus, computer device, and storage medium
CN111914564B (en) Text keyword determination method and device
CN116227474B (en) Method and device for generating countermeasure text, storage medium and electronic equipment
CN110019948B (en) Method and apparatus for outputting information
CN107506459A (en) A kind of film recommendation method based on film similarity
CN112487300A (en) Video recommendation method and device, electronic equipment and storage medium
CN109190116B (en) Semantic analysis method, system, electronic device and storage medium
CN113688283B (en) Method and device for determining video subtitle matching degree and electronic equipment
CN113672793A (en) Information recall method and device, electronic equipment and storage medium
US11475080B2 (en) Natural language-based search and discovery of content service
CN111787409A (en) Movie and television comment data processing method and device
CN110750708A (en) Keyword recommendation method and device and electronic equipment
US11531811B2 (en) Method and system for extracting keywords from text
CN114254634A (en) Multimedia data mining method, device, storage medium and equipment
CN110659419B (en) Method and related device for determining target user
CN106815288A (en) A kind of video related information generation method and its device
CN117252215A (en) Translation model training and translating method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant