CN113127701A - Subtitle language identification method and device, computer equipment and computer readable medium - Google Patents


Info

Publication number
CN113127701A
CN113127701A (application CN201911416584.XA)
Authority
CN
China
Prior art keywords
language
caption
determining
character
closed caption
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911416584.XA
Other languages
Chinese (zh)
Inventor
洪冲
王伟
Current Assignee
ZTE Corp
Original Assignee
ZTE Corp
Priority date
Filing date
Publication date
Application filed by ZTE Corp filed Critical ZTE Corp
Priority to CN201911416584.XA priority Critical patent/CN113127701A/en
Priority to PCT/CN2020/139479 priority patent/WO2021136096A1/en
Publication of CN113127701A publication Critical patent/CN113127701A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • G06F16/90335Query processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/126Character encoding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/435Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream

Abstract

The invention provides a caption language identification method, which includes: acquiring a video stream in a preset coding format from a code stream; acquiring network abstraction layer data of the supplemental enhancement information type from the video stream; if a closed caption is obtained from that network abstraction layer data, determining the character codes of the closed caption; and determining the language of the closed caption according to the character codes and a preset mapping relationship between character codes and languages. Embodiments of the disclosure can quickly and accurately identify the language of a closed caption when the original code stream lacks caption language information. The disclosure also provides a caption language identification device, a computer device, and a computer-readable medium.

Description

Subtitle language identification method and device, computer equipment and computer readable medium
Technical Field
The disclosure relates to the technical field of multimedia, in particular to a method and a device for recognizing caption languages, computer equipment and a computer readable medium.
Background
Many broadcast and multicast video signals now include text that can be displayed on a television or other display device, and CC (Closed Caption) text is a typical example. CC text is a transcript of the spoken portion of a video and sometimes also describes background sounds in the soundtrack. Initially, CC text was provided as a convenience for hearing-impaired viewers; later it also came to be used in environments where the audio portion of the signal is hard to hear because the ambient noise level is too high or too low, such as bars, restaurants, airports, and medical facilities. There are two types of CC text: EIA (Electronic Industries Association) 608, which conforms to the NTSC (National Television Standards Committee) standard, and EIA 708, which conforms to the ATSC (Advanced Television Systems Committee) standard. EIA 608 text supports six languages: English, French, Spanish, Danish, German, and Portuguese. EIA 608 supports these six languages through three types of character sets: a standard character set, a special character set, and an extended character set.
A video signal may also carry other text services relating to programs, electronic program guides, news, sports, emergency announcements, and many other types of information, including teletext services such as TeleText, Ceefax, and Oracle. Most text services are currently encoded in the VBI (vertical blanking interval) of a video signal; a small portion are instead carried alongside the audio and video portions of digital video signals, such as MPEG-2 (Moving Picture Experts Group) and MPEG-4 encoded signals.
The amount of text that any video signal can carry is limited by the encoding system, and video signals using VBI encoding have only limited capacity for carrying text. Because CC text must be carried entirely on line 21 of the VBI, the number of characters that can be encoded into each frame is limited. In particular, the original code stream lacks caption language information, so the language of the caption cannot be read directly from the code stream data.
Disclosure of Invention
In view of the above-mentioned shortcomings in the prior art, the present disclosure provides a method, an apparatus, a computer device and a computer readable medium for recognizing a caption language.
In a first aspect, an embodiment of the present disclosure provides a method for recognizing a caption language, where the method includes:
acquiring a video stream with a preset coding format in a code stream;
acquiring network abstraction layer data of a supplementary enhancement information type from the video stream;
if the closed caption is obtained from the network abstraction layer data of the supplementary enhancement information type, determining the character code of the closed caption;
and determining the language of the closed caption according to the character code and a preset mapping relation between the character code and the language.
Further, the determining the character encoding of the closed caption includes:
and converting every preset number of bytes of the closed caption into one character code in a preset base.
Further, the determining the language of the closed caption according to the character code and a preset mapping relationship between the character code and the language includes:
selecting a character code;
and if a first determination result comprising a language is obtained according to the character codes and the mapping relation, determining the language as the language of the closed caption.
Further, the determining the language of the closed caption according to the character code and a preset mapping relationship between the character code and the language further includes:
and if a second determination result comprising a plurality of languages is obtained according to the character codes and the mapping relation, processing the second determination result and a previous processing result to determine the same language in the second determination result and the previous processing result, and if the same language is one, determining the same language as the language of the closed caption.
Further, the determining the language of the closed caption according to the character code and a preset mapping relationship between the character code and the language further includes:
if the same language is multiple, selecting other character codes, and determining the language of the closed caption according to the selected character code and the mapping relation.
Further, after acquiring the closed caption from the network abstraction layer data of the supplemental enhancement information type, and before determining the language of the closed caption according to the character code and the preset mapping relationship between the character code and the language, the method further includes: determining a transmission channel of the closed captions;
the method further comprises the following steps:
and if the second determination result corresponding to the last character code of the closed caption and the previous processing result share multiple languages, and the closed caption has only one transmission channel, determining that the language of the closed caption is English.
In another aspect, an embodiment of the present disclosure further provides a device for recognizing a caption language, including: the device comprises a first acquisition module, a second acquisition module, a first determination module and a second determination module;
the first acquisition module is used for acquiring a video stream with a preset coding format in a code stream;
the second obtaining module is configured to obtain, from the video stream, network abstraction layer data of a supplemental enhancement information type;
the first determining module is configured to determine a character code of the closed caption if the closed caption is acquired from the network abstraction layer data of the supplemental enhancement information type;
and the second determining module is used for determining the language of the closed caption according to the character code and the mapping relation between the preset character code and the language.
In another aspect, an embodiment of the present disclosure further provides a computer device, including: one or more processors and storage; the storage device stores one or more programs, and when the one or more programs are executed by the one or more processors, the one or more processors implement the method for recognizing the caption language according to the foregoing embodiments.
The embodiments of the present disclosure also provide a computer readable medium, on which a computer program is stored, where the computer program is executed to implement the method for recognizing the caption language according to the foregoing embodiments.
The caption language identification method provided by the embodiment of the present disclosure acquires a video stream in a preset coding format from a code stream, acquires network abstraction layer data of the supplemental enhancement information type from the video stream, determines the character codes of a closed caption if the closed caption is obtained from that network abstraction layer data, and determines the language of the closed caption according to the character codes and a preset mapping relationship between character codes and languages. Embodiments of the disclosure can quickly and accurately identify the language of a closed caption when the original code stream lacks caption language information.
Drawings
Fig. 1 is a flowchart of a caption language identification method according to an embodiment of the present disclosure;
FIG. 2 is a flowchart illustrating the closed caption language determination according to character encoding and mapping according to another embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of a caption language identification device according to yet another embodiment of the present disclosure.
Detailed Description
Example embodiments will be described more fully hereinafter with reference to the accompanying drawings, but they may be embodied in different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Embodiments described herein may be described with reference to plan and/or cross-sectional views in light of idealized schematic illustrations of the disclosure. Accordingly, the example illustrations can be modified in accordance with manufacturing techniques and/or tolerances. Accordingly, the embodiments are not limited to the embodiments shown in the drawings, but include modifications of configurations formed based on a manufacturing process. Thus, the regions illustrated in the figures have schematic properties, and the shapes of the regions shown in the figures illustrate specific shapes of regions of elements, but are not intended to be limiting.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the present disclosure, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
An embodiment of the present disclosure provides a method for recognizing a caption language, as shown in fig. 1, the method may include the following steps:
and step 11, acquiring a video stream with a preset coding format in the code stream.
In the embodiment of the present disclosure, the preset encoding format may be the AVC/H.264 compression format (Advanced Video Coding, MPEG-4 Part 10). The code stream may be a channel code stream containing encoded text received on any broadcast or multicast channel, for example a UDP (User Datagram Protocol) MPEG (Moving Picture Experts Group) TS (Transport Stream) code stream, which may be obtained by joining a UDP multicast group or by binding a UDP unicast address.
In this step, taking a UDP MPEG TS code stream as an example, the subtitle language identification apparatus may first find the TS packets (the basic unit of a UDP MPEG TS code stream) whose PID (Packet Identifier) is 0. A TS packet with PID 0 generally carries the PAT (Program Association Table); the PAT lists the PID of each PMT (Program Map Table), from which the PMT can be found, and after parsing the PMT the apparatus can locate the TS packets in the code stream whose PIDs match the elementary streams listed in the PMT. The caption language identification device can then parse those TS packets, determine from each packet's stream_type value whether it belongs to a video stream, and if so further determine whether the video stream uses the AVC/H.264 compression format; if it does, the video stream in the preset coding format has been obtained.
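The PAT-to-PMT-to-stream_type chain above can be sketched as follows. This is an illustrative outline, not the patent's implementation: the helper names and the injected `parse_pat`/`parse_pmt` callbacks are assumptions, and a real demuxer must also handle adaptation fields, section syntax, and tables spanning multiple packets. The stream_type value 0x1B for AVC/H.264 comes from the MPEG-2 systems specification.

```python
# Sketch of locating the H.264 video stream in a UDP MPEG TS code stream
# by following PID 0 (PAT) -> PMT PIDs -> stream_type, as described above.

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47
STREAM_TYPE_H264 = 0x1B  # AVC/H.264 per ISO/IEC 13818-1

def packet_pid(packet: bytes) -> int:
    """Extract the 13-bit PID from a 188-byte TS packet header."""
    return ((packet[1] & 0x1F) << 8) | packet[2]

def split_ts_packets(stream: bytes):
    """Yield aligned 188-byte TS packets from a captured byte stream."""
    for off in range(0, len(stream) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        pkt = stream[off:off + TS_PACKET_SIZE]
        if pkt[0] == SYNC_BYTE:
            yield pkt

def find_h264_pid(packets, parse_pat, parse_pmt):
    """PID 0 carries the PAT; the PAT lists PMT PIDs; each PMT lists
    elementary-stream PIDs with their stream_type values."""
    pmt_pids, video_pid = set(), None
    for pkt in packets:
        pid = packet_pid(pkt)
        if pid == 0:
            pmt_pids.update(parse_pat(pkt))          # PMT PIDs from the PAT
        elif pid in pmt_pids:
            for es_pid, stream_type in parse_pmt(pkt):
                if stream_type == STREAM_TYPE_H264:  # AVC/H.264 video stream
                    video_pid = es_pid
    return video_pid
```

The table-parsing callbacks are left abstract because the patent only describes the lookup order, not the section parsing itself.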
And step 12, acquiring the network abstraction layer data of the type of the supplementary enhancement information from the video stream.
In this step, a video stream in the AVC/H.264 compression format generally consists of NAL (Network Abstraction Layer) units. The subtitle language identification apparatus may first obtain the NAL units from the video stream and then determine whether each unit's type is SEI (Supplemental Enhancement Information); if so, the SEI NAL data is obtained.
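The SEI filtering step can be sketched as below, assuming an Annex-B byte stream where NAL units are separated by 0x000001 start codes and the low five bits of the first payload byte give the NAL type (6 = SEI in H.264). This is a simplified sketch: a complete parser must also strip emulation-prevention bytes (0x03) from the payload.

```python
# Sketch: split an H.264 Annex-B byte stream on start codes and keep
# only SEI NAL units (nal_unit_type == 6), as described above.

NAL_TYPE_SEI = 6

def split_nal_units(data: bytes):
    """Split on 0x000001 start codes, yielding raw NAL unit bytes."""
    units, i = [], 0
    while True:
        start = data.find(b"\x00\x00\x01", i)
        if start == -1:
            break
        start += 3
        end = data.find(b"\x00\x00\x01", start)
        payload = data[start:end if end != -1 else len(data)]
        # drop the extra zero left by a 4-byte start code on the next unit
        units.append(payload.rstrip(b"\x00") if end != -1 else payload)
        if end == -1:
            break
        i = end
    return units

def sei_nal_units(data: bytes):
    """Keep units whose header's low 5 bits say nal_unit_type == SEI."""
    return [u for u in split_nal_units(data) if u and (u[0] & 0x1F) == NAL_TYPE_SEI]
```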
And step 13, if the closed caption is obtained from the network abstraction layer data of the supplementary enhancement information type, determining the character coding of the closed caption.
In this step, the caption language identification device may first parse the SEI NAL data to find the data area whose payloadType value is 4 (i.e., the user_data_registered_itu_t_t35 data area), then parse that data area and determine whether it contains the four consecutive bytes whose ASCII value is "GA94". If so, it may be determined that the SEI NAL data contains CC captions of the EIA (Electronic Industries Association) 608 type, and the caption language identification device may obtain the CC captions from the SEI NAL data. After acquiring the CC captions in the SEI NAL data, the caption language identification apparatus may determine the character encoding of the CC captions.
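The payloadType-4 / "GA94" check above can be sketched as follows. The payloadType and payloadSize fields in an SEI body use 0xFF run-length extension coding, and 0x80 marks the RBSP trailing bits; the sample payload bytes in the test are illustrative, not taken from a real stream.

```python
# Sketch of scanning an SEI body for the registered ITU-T T.35 user data
# that carries EIA-608 CC captions: payloadType 4 plus the four ASCII
# bytes "GA94" (0x47 0x41 0x39 0x34), as described above.

SEI_USER_DATA_REGISTERED = 4
ATSC_IDENTIFIER = b"GA94"

def parse_sei_payloads(sei_body: bytes):
    """Yield (payloadType, payload) pairs; type and size use 0xFF coding."""
    i = 0
    while i < len(sei_body) and sei_body[i] != 0x80:  # 0x80 = trailing bits
        ptype = 0
        while sei_body[i] == 0xFF:
            ptype += 255
            i += 1
        ptype += sei_body[i]; i += 1
        size = 0
        while sei_body[i] == 0xFF:
            size += 255
            i += 1
        size += sei_body[i]; i += 1
        yield ptype, sei_body[i:i + size]
        i += size

def contains_cc_caption(sei_body: bytes) -> bool:
    """True if a payloadType-4 payload contains the ATSC 'GA94' marker."""
    return any(ptype == SEI_USER_DATA_REGISTERED and ATSC_IDENTIFIER in payload
               for ptype, payload in parse_sei_payloads(sei_body))
```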
And step 14, determining the language of the closed caption according to the character code and the preset mapping relation between the character code and the language.
The mapping relationship between character codes and languages is shown in table 1.
TABLE 1
Character encoding | Language 1 | Language 2 | Language 3 | Language 4 | Language 5
…… | …… | …… | …… | …… | ……
The mapping relationship between the character codes and the languages (i.e. table 1) reflects the correspondence relationship between the character codes and the languages supported by the CC caption text, and indicates in which languages the characters corresponding to the character codes may appear. In this step, the language of the CC caption corresponding to the character code can be determined by querying the mapping relationship using the character code as an index.
In embodiments of the present disclosure, the languages include French, Spanish, Danish, German, and Portuguese.
It should be noted that Table 1 may further include columns for the character set (Character Sets), the closed caption (Closed Caption), the display form (Display), and the character annotation (Unicode Name).
As can be seen from steps 11 to 14, the caption language identification method provided in the embodiment of the present disclosure acquires a video stream in a preset coding format from a code stream, acquires network abstraction layer data of the supplemental enhancement information type from the video stream, determines the character codes of a closed caption if the closed caption is obtained from that data, and determines the language of the closed caption according to the character codes and the preset mapping relationship between character codes and languages. The embodiment of the disclosure can quickly and accurately identify the language of a closed caption when the original code stream lacks caption language information.
In some embodiments, the determining the character encoding of the closed caption may include: converting every preset number of bytes of the closed caption into one character code in a preset base. In this step, the caption language identification device may determine, according to the character encoding scheme used by the CC captions, how many bytes are converted into one character code; for example, if the CC captions use UTF-16 (a 16-bit Unicode encoding), every two bytes of CC caption data may be converted into one UTF-16 character code. In this step, the caption language identification device may convert all the CC captions in the SEI NAL data into character codes.
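The grouping step above can be sketched in a few lines; two bytes per code matches the worked example later in the document, where codes such as c3a7 appear as four-hex-digit strings. The function name is illustrative.

```python
# Sketch: group the CC caption bytes into a preset number of bytes
# (two here) and render each group as one hexadecimal character code.
# Trailing bytes that do not fill a whole group are ignored.

def caption_to_char_codes(caption: bytes, group_size: int = 2):
    """Convert every `group_size` bytes of caption data into one hex code."""
    return [caption[i:i + group_size].hex()
            for i in range(0, len(caption) - group_size + 1, group_size)]
```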
In some embodiments, as shown in fig. 2, the determining the language of the closed caption according to the character encoding and the preset mapping relationship between the character encoding and the language may include the following steps:
step 21, selecting a character code.
In this step, the caption language identification device selects a character code as an index to inquire the mapping relation.
Step 22, determining the language of the closed caption according to the character coding and mapping relationship, if a first determination result including one language is obtained, executing step 23, and if a second determination result including a plurality of languages is obtained, executing step 24.
In this step, if the query returns only one language, the character represented by the character code is specific to that language and does not appear in any other language; the result of such a query is the first determination result. If the query returns multiple languages, the character represented by the character code exists in several languages and is not specific to any one of them; the result of such a query is the second determination result. The caption language identification device may determine whether the language in the determination result is unique, execute step 23 if it is unique, and execute step 24 if it is not.
The determination result obtained by querying the mapping relationship with a character code may be represented as a binary string whose number of digits equals the number of languages in Table 1. Taking 5 languages as an example, a 5-digit binary determination result is obtained, ordered as the languages are ordered in Table 1. For example, in the determination result 10001, a 1 indicates that the character corresponding to the character code exists in the corresponding language (languages 1 and 5), and a 0 indicates that it does not (languages 2, 3, and 4). The caption language identification device may judge whether the result is the first or the second determination result by counting the 1s in it: when there is exactly one 1, step 22 has obtained a first determination result including one language; when there are multiple 1s, step 22 has obtained a second determination result including multiple languages.
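The binary representation above can be sketched with a toy mapping. The two entries below mirror the codes used in the document's worked example, but the mapping itself is hypothetical, not the patent's Table 1; the language order follows the list given earlier (French, Spanish, Danish, German, Portuguese).

```python
# Sketch of the 5-bit determination result: one bit per language in the
# table's order, 1 meaning the character can occur in that language.

LANGUAGES = ["French", "Spanish", "Danish", "German", "Portuguese"]

# Hypothetical mapping: character code -> 5-digit binary determination result.
CHAR_LANGUAGE_MAP = {
    "c3a7": "10000",  # character appears only in language 1 (French)
    "c3a9": "11101",  # character appears in languages 1, 2, 3 and 5
}

def lookup(char_code: str) -> str:
    """Query the mapping; an unknown code matches no language."""
    return CHAR_LANGUAGE_MAP.get(char_code, "00000")

def languages_in(result: str):
    """Names of the languages whose bit is set in a determination result."""
    return [name for bit, name in zip(result, LANGUAGES) if bit == "1"]

def is_first_determination(result: str) -> bool:
    """A first determination result contains exactly one set bit."""
    return result.count("1") == 1
```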
And step 23, determining the language as the language of the closed caption.
In this step, the caption language identification means may directly determine the only language in the first determination result as the language of the CC caption.
And 24, processing the second determination result and the previous processing result to determine the same language in the second determination result and the previous processing result.
The caption language identification device performs the current processing according to the second determination result and the previous processing result: it determines the languages common to the multiple languages in the second determination result and the multiple languages in the previous processing result. If common languages exist, it judges whether the common language is unique; if it is, that unique common language is determined to be the language of the CC caption.
The processing operation in this step may be a bitwise AND operation. For example, when the second determination result is 10001 and the previous processing result is 11100, ANDing 10001 and 11100 gives the current processing result 10000, in which a 1 appears in the first position, indicating that the language shared by the current second determination result and the previous processing result is language 1. When the second determination result is 10001 and the previous processing result is 11001, ANDing them gives 10001, with 1s in the first and fifth positions, indicating that the shared languages are language 1 and language 5.
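The AND step can be sketched directly on the binary strings the text uses; the function name is illustrative.

```python
# Sketch of the AND step: intersect the current second determination
# result with the previous processing result, using the same 5-digit
# binary strings as the text (e.g. 10001 AND 11100 -> 10000).

def and_results(second: str, previous: str) -> str:
    """Bitwise AND of two binary determination-result strings."""
    width = len(second)
    value = int(second, 2) & int(previous, 2)
    return format(value, f"0{width}b")
```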
And 25, judging whether the same language is one or not, if so, executing a step 26, otherwise, executing a step 27.
In this step, whether the same language is unique can be judged by counting the 1s in the current processing result. For example, when the current processing result is 10000, there is only one 1, indicating that the second determination result and the previous processing result share exactly one language, and step 26 may be executed. When the current processing result is 10011, there are three 1s, indicating that the second determination result and the previous processing result share multiple languages, and step 27 may be executed.
And step 26, determining the same language as the language of the closed caption.
In this step, when the same language is unique (i.e. only one of the second determination result obtained by the processing in step 24 and the previous processing result is the same language), the caption language identification device may directly determine the same language as the language of the CC caption. That is, the caption language identification device may determine the language corresponding to "1" in the current processing result according to table 1, and determine the language as the language of the closed caption.
And 27, selecting other character codes, and determining the language of the closed caption according to the selected character code and the mapping relation.
In this step, when the same language is not unique (i.e. the same language in the second determination result obtained by the processing in step 24 and the previous processing result is multiple), the caption language identification apparatus needs to select another character code as an index to query the mapping relationship to obtain a determination result, and continue to determine the language of the CC caption according to the determination result, i.e. return to step 22.
It should be noted that, the first time the language of the CC caption is determined from a character code and the mapping relationship, the previous processing result is empty; if a second determination result including multiple languages is obtained, another character code is selected directly, and the language of the CC caption is determined from the newly selected character code and the mapping relationship. The second time the language is determined, the previous processing result is still empty, so if a second determination result including multiple languages is again obtained, it is processed together with the second determination result of the previous character code to determine the shared languages.
It should be noted that, when the same language does not exist in the processing result (i.e. when 0 "1" exists in the processing result), the caption language identification device may directly select another character code and determine the language of the CC caption according to the currently selected character code and the mapping relationship.
Since English has no special characters, and the non-special English characters also appear in the other languages, English is not included in the mapping relationship between character codes and languages (i.e., Table 1). In the embodiment of the present disclosure, whether a CC caption is English can instead be determined from the CC caption's transmission channel.
In some embodiments, after acquiring the closed caption from the network abstraction layer data of the supplemental enhancement information type and before determining the language of the closed caption according to the character code and the preset mapping relationship between the character code and the language, the caption language identification method provided by the embodiment of the present disclosure may further include: a transmission channel for closed captions is determined.
The method for recognizing the language of the subtitle provided by the embodiment of the present disclosure may further include: if the second determination result corresponding to the last character code of the closed caption and the previous processing result share multiple languages, and the closed caption has only one transmission channel, determining that the language of the closed caption is English.
That is to say, in the caption language identification method provided by the embodiment of the present disclosure, after the CC captions in the SEI NAL data are acquired, the transmission channel of the CC captions may also be determined. If the language of the CC caption still cannot be determined after the last character code has been processed, the apparatus may judge whether the CC caption's transmission channel is unique; if it is, the language of the CC caption may be directly determined to be English.
It should be noted that if the channel of the CC caption is not unique, the language of the CC caption cannot be determined, and the channel information is output instead. Considering that the language within one channel may change over time, the caption language identification device may, in real time, take the most recently determined unique language as the final language of the CC captions in that channel.
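The overall narrowing procedure, including the English fallback, can be tied together in one sketch. This is a reading of the steps above, not the patent's code: the `lookup` callback is any character-code-to-binary-result function, and the choice to keep the previous candidates when an AND yields no shared language is an assumption (the text only says another character code is selected in that case). Returning `None` stands in for "output the channel information".

```python
# Sketch: narrow candidate languages by AND-ing successive determination
# results; if the last character code still leaves several candidates and
# the caption has a single transmission channel, fall back to English.

def identify_language(char_codes, lookup, languages, channel_count):
    candidates = None  # previous processing result; empty at the start
    for code in char_codes:
        result = lookup(code)
        ones = result.count("1")
        if ones == 1:                             # first determination result
            return languages[result.index("1")]
        if ones == 0:                             # no language matched;
            continue                              # try the next character code
        if candidates is None:                    # nothing to AND against yet
            candidates = result
            continue
        merged = format(int(candidates, 2) & int(result, 2), f"0{len(result)}b")
        if merged.count("1") == 1:                # a unique shared language
            return languages[merged.index("1")]
        if merged.count("1") > 1:                 # still ambiguous; keep going
            candidates = merged
        # assumption: if no shared language, keep the previous candidates
    # exhausted all character codes without finding a unique language
    return "English" if channel_count == 1 else None
```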
The following briefly describes the caption language identification method provided by the embodiment of the present disclosure with reference to a specific embodiment. The caption language type identification device acquires a video stream with a preset coding format in a code stream, acquires SEI NAL data from the video stream, acquires CC captions in the SEI NAL data, determines character codes of the CC captions, and determines the language type of the CC captions according to the character codes and the mapping relation. The mapping relationship between the character code (i.e. hexadecimal converted value) of CC caption and the language is shown in table 2:
TABLE 2
[Table 2 is reproduced as images in the original publication; it maps each hexadecimal character code to a determination-result bitmap with one bit per candidate language.]
If the selected character code is c3a7, querying the mapping relationship yields the determination result 10000, in which "1" appears only in the first position, indicating that the character corresponding to c3a7 exists only in French. A first determination result containing one language (French) is therefore obtained, and French is determined to be the language of the CC caption. If the character code is c3a9, querying the mapping relationship yields the determination result 11101, in which "1" appears in the first, second, third, and fifth positions, indicating that the character corresponding to c3a9 exists in four languages: French, Spanish, Danish, and Portuguese. A second determination result containing these four languages is obtained. The caption language identification device then processes the second determination result together with the previous processing result. If the previous processing result is 11011, an AND operation on the second determination result 11101 and the previous processing result 11011 yields 11001, in which "1" appears in the first, second, and fifth positions, so the shared languages are French, Spanish, and Portuguese. Since more than one shared language remains, the caption language identification device selects another character code and determines the language of the CC caption from the currently selected character code and the mapping relationship.
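The narrowing step in this example reduces to a bitwise AND over per-character language bitmaps. Since Table 2 is reproduced only as images, the two map entries below are taken from the worked example in the text, and the "German" label for the fourth bit position is an assumption (the excerpt never names that language):

```python
# Leftmost bit first; the fourth position's language is a guessed placeholder.
LANGS = ["French", "Spanish", "Danish", "German", "Portuguese"]

# Illustrative excerpt of the code-to-bitmap mapping (Table 2).
CHAR_LANG_MAP = {
    "c3a7": 0b10000,  # 'ç': French only
    "c3a9": 0b11101,  # 'é': French, Spanish, Danish, Portuguese
}

def narrow(prev_mask, char_code):
    """AND the bitmap for char_code into the running mask; return the new
    mask and the names of the languages still possible."""
    mask = prev_mask & CHAR_LANG_MAP.get(char_code, 0b11111)
    langs = [name for i, name in enumerate(LANGS) if mask & (1 << (4 - i))]
    return mask, langs

mask, langs = narrow(0b11011, "c3a9")
# 11011 AND 11101 -> 11001: French, Spanish, and Portuguese remain,
# so another character code must be selected.
```

Scanning stops as soon as a call returns a single language; if every character code is exhausted with several languages left, the single-channel English fallback described later applies.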
Based on the same technical concept, an embodiment of the present disclosure further provides a caption language identification device. As shown in fig. 3, the device may include: a first obtaining module 301, a second obtaining module 302, a first determining module 303, and a second determining module 304.
The first obtaining module 301 is configured to obtain a video stream in a preset encoding format in a code stream.
The second obtaining module 302 is configured to obtain network abstraction layer data of the supplemental enhancement information type from the video stream.
The first determining module 303 is configured to determine character encoding of the closed caption if the closed caption is acquired from the network abstraction layer data of the supplemental enhancement information type.
The second determining module 304 is configured to determine the language of the closed caption according to the character code and a preset mapping relationship between the character code and the language.
In some embodiments, the first determining module 303 is configured to convert a preset number of bytes of the closed caption into a character code in a preset radix.
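As a rough illustration of that conversion (a sketch, not the patent's implementation: the byte count and radix are configurable, and two bytes with hexadecimal output are assumed here to match the c3a7/c3a9 values used elsewhere in the text):

```python
def to_char_code(caption_bytes: bytes) -> str:
    """Convert a preset number of caption bytes (assumed: 2, e.g. the UTF-8
    sequence 0xC3 0xA7 for 'ç') into the hexadecimal character code used to
    query the mapping table."""
    return caption_bytes.hex()

code = to_char_code(bytes([0xC3, 0xA7]))  # -> "c3a7"
```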
In some embodiments, the second determining module 304 is configured to select a character code and, if a first determination result containing one language is obtained from the character code and the mapping relationship, determine that language to be the language of the closed caption.
In some embodiments, the second determining module 304 is configured to, if a second determination result containing multiple languages is obtained from the character code and the mapping relationship, process the second determination result together with the previous processing result to determine the languages common to both, and, if only one common language remains, determine that language to be the language of the closed caption.
In some embodiments, the second determining module 304 is configured to, if multiple common languages remain, select another character code and determine the language of the closed caption from the currently selected character code and the mapping relationship.
In some embodiments, the first determining module 303 is further configured to determine the transmission channel of the closed caption.
The second determining module 304 is further configured to determine that the language of the closed caption is English if the second determination result corresponding to the last character code of the closed caption and the previous processing result still share multiple languages and the closed caption has only one transmission channel.
An embodiment of the present disclosure further provides a computer device, including: one or more processors and a storage device. The storage device stores one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the caption language identification method of the foregoing embodiments.
The embodiments of the present disclosure also provide a computer-readable medium on which a computer program is stored, where the computer program, when executed, implements the caption language identification method of the foregoing embodiments.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods disclosed above, functional modules/units in the apparatus, may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer.
In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purposes of limitation. In some instances, features, characteristics and/or elements described in connection with a particular embodiment may be used alone or in combination with features, characteristics and/or elements described in connection with other embodiments, unless expressly stated otherwise, as would be apparent to one skilled in the art. It will, therefore, be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention encompassed by the appended claims.

Claims (9)

1. A caption language identification method comprises the following steps:
acquiring a video stream with a preset coding format in a code stream;
acquiring network abstraction layer data of a supplementary enhancement information type from the video stream;
if the closed caption is obtained from the network abstraction layer data of the supplementary enhancement information type, determining the character code of the closed caption;
and determining the language of the closed caption according to the character code and a preset mapping relation between the character code and the language.
2. The method of claim 1, wherein said determining the character code of the closed caption comprises:
converting a preset number of bytes of the closed caption into a character code in a preset radix.
3. The method according to claim 1 or 2, wherein said determining the language of the closed caption according to the character code and the preset mapping relationship between the character code and the language comprises:
selecting a character code;
and if a first determination result comprising one language is obtained according to the character code and the mapping relationship, determining the language as the language of the closed caption.
4. The method of claim 3, wherein said determining the language of the closed caption according to the character encoding and a preset mapping relationship between the character encoding and the language further comprises:
and if a second determination result comprising a plurality of languages is obtained according to the character code and the mapping relationship, processing the second determination result and a previous processing result to determine the languages that are the same in the second determination result and the previous processing result, and if there is one same language, determining the same language as the language of the closed caption.
5. The method of claim 4, wherein said determining the language of the closed caption according to the character encoding and a preset mapping relationship between the character encoding and the language further comprises:
if there are a plurality of same languages, selecting another character code, and determining the language of the closed caption according to the currently selected character code and the mapping relationship.
6. The method of claim 5, wherein after acquiring the closed caption from the network abstraction layer data of the supplemental enhancement information type, before determining the language of the closed caption according to the character code and a preset mapping relationship between the character code and the language, the method further comprises: determining a transmission channel of the closed captions;
the method further comprises the following steps:
and if the second determination result corresponding to the last character code of the closed caption and the previous processing result have a plurality of same languages, and the closed caption has one transmission channel, determining that the language of the closed caption is English.
7. A caption language identification device, comprising: a first acquisition module, a second acquisition module, a first determination module, and a second determination module;
the first acquisition module is used for acquiring a video stream with a preset coding format in a code stream;
the second obtaining module is configured to obtain, from the video stream, network abstraction layer data of a supplemental enhancement information type;
the first determining module is configured to determine a character code of the closed caption if the closed caption is acquired from the network abstraction layer data of the supplemental enhancement information type;
and the second determining module is configured to determine the language of the closed caption according to the character code and a preset mapping relationship between the character code and the language.
8. A computer device, comprising:
one or more processors;
a storage device having one or more programs stored thereon;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the caption language identification method of any one of claims 1-6.
9. A computer-readable medium on which a computer program is stored, wherein the program, when executed, implements the caption language identification method according to any one of claims 1 to 6.
CN201911416584.XA 2019-12-31 2019-12-31 Subtitle language identification method and device, computer equipment and computer readable medium Pending CN113127701A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911416584.XA CN113127701A (en) 2019-12-31 2019-12-31 Subtitle language identification method and device, computer equipment and computer readable medium
PCT/CN2020/139479 WO2021136096A1 (en) 2019-12-31 2020-12-25 Caption language identification method and apparatus, computer device, and computer-readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911416584.XA CN113127701A (en) 2019-12-31 2019-12-31 Subtitle language identification method and device, computer equipment and computer readable medium

Publications (1)

Publication Number Publication Date
CN113127701A true CN113127701A (en) 2021-07-16

Family

ID=76686486

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911416584.XA Pending CN113127701A (en) 2019-12-31 2019-12-31 Subtitle language identification method and device, computer equipment and computer readable medium

Country Status (2)

Country Link
CN (1) CN113127701A (en)
WO (1) WO2021136096A1 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102333242A (en) * 2011-09-29 2012-01-25 深圳市万兴软件有限公司 Device and method for matching streaming media language information
US10944572B2 (en) * 2017-01-02 2021-03-09 Western Digital Technologies, Inc. Decryption and variant processing
KR101894889B1 (en) * 2017-04-06 2018-09-04 에스케이브로드밴드주식회사 Method and apparatus for providing video on demand service
CN108924600A (en) * 2018-06-28 2018-11-30 乐蜜有限公司 Sending and receiving methods, device and the electronic equipment of live data
CN108989876B (en) * 2018-07-27 2021-07-30 青岛海信传媒网络技术有限公司 Subtitle display method and device

Also Published As

Publication number Publication date
WO2021136096A1 (en) 2021-07-08

Similar Documents

Publication Publication Date Title
CN104137555B (en) Non-concealed caption data transmission in standard caption service
US11516495B2 (en) Broadcast system with a watermark payload
CN1599436B (en) Digital broadcast receiver and method for processing captionthereof
US8645983B2 (en) System and method for audible channel announce
US20120144447A1 (en) Digital television signal, digital television receiver, and method of processing digital television signal
CN107852526B (en) Method and receiver for processing a data stream
US10887669B2 (en) Broadcast system with a URI message watermark payload
US10341631B2 (en) Controlling modes of sub-title presentation
CA2795191A1 (en) Method and apparatus for processing non-real-time broadcast service and content transmitted by broadcast signal
US20040237123A1 (en) Apparatus and method for operating closed caption of digital TV
US8788693B2 (en) Apparatus and method for generating a data stream and apparatus and method for reading a data stream
US10972808B2 (en) Extensible watermark associated information retrieval
CN113127701A (en) Subtitle language identification method and device, computer equipment and computer readable medium
EP2555540A1 (en) Method for auto-detecting audio language name and television using the same
KR20070052169A (en) A broadcasting signal for use in a digital television receiver and method and apparatus of decoding psip table
EP2552135A1 (en) Method, system and terminal for processing text information in mobile multimedia broadcast
KR100755839B1 (en) Broadcasting system and method for supporting sound multiplex
CN106454547B (en) real-time caption broadcasting method and system
KR20110022015A (en) Digital television transmitter, digital television receiver and method for processing a broadcast signal
US11102499B2 (en) Emergency messages in watermarks
KR101227497B1 (en) Digital broadcast signal and apparatus and method of processing the signal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination