CN114363536A - AI subtitle correction method and device and cloud video server - Google Patents


Info

Publication number
CN114363536A
Authority
CN
China
Prior art keywords
phrase
target phrase
embedded
target
srt
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202111628785.3A
Other languages
Chinese (zh)
Inventor
时凤珍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Titou Electronic Information Technology Co Ltd
Original Assignee
Suzhou Titou Electronic Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Titou Electronic Information Technology Co Ltd filed Critical Suzhou Titou Electronic Information Technology Co Ltd
Priority to CN202111628785.3A priority Critical patent/CN114363536A/en
Publication of CN114363536A publication Critical patent/CN114363536A/en
Withdrawn legal-status Critical Current

Landscapes

  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

Embodiments of the present disclosure provide an AI subtitle correction method, an AI subtitle correction device, and a cloud video server. The method comprises: acquiring an externally embedded SRT subtitle of a current video, the externally embedded SRT subtitle being a self-made subtitle published by a user for the current video; detecting a target phrase in the externally embedded SRT subtitle, the target phrase being a phrase in the externally embedded SRT subtitle whose fitness is below a first threshold, where the fitness of the target phrase is obtained by inputting the target phrase into a hot word library; acquiring the video frame of the current video corresponding to the target phrase, and recognizing content information in the video frame; and, if the content information does not match the target phrase, correcting the target phrase. The method improves the recognition and repair of wrong phrases in self-made subtitles and thereby improves the viewing experience.

Description

AI subtitle correction method and device and cloud video server
Technical Field
The present disclosure relates to the technical field of video subtitles, and in particular to an AI subtitle correction method, an AI subtitle correction device, and a cloud video server.
Background
With the rise of video websites, more and more people share their lives online in video form. To give viewers a better experience, some video producers add subtitles, either hard-embedded in the video or externally embedded as a separate track, so that viewers can follow along. Producing subtitles, however, is labor-intensive for a video producer, so video websites currently support AI-generated subtitles or externally embedded subtitles uploaded by viewers. In either case, a low AI recognition rate or careless mistakes by the viewer who made the subtitles can introduce errors into the subtitles, which reduces the viewing experience.
Disclosure of Invention
To overcome at least the above disadvantages of the prior art, an object of the present disclosure is to provide an AI subtitle correction method, an AI subtitle correction device, and a cloud video server.
In a first aspect, the present disclosure provides an AI subtitle correction method, comprising:
acquiring an externally embedded SRT subtitle of a current video, the externally embedded SRT subtitle being a self-made subtitle published by a user for the current video;
detecting a target phrase in the externally embedded SRT subtitle, the target phrase being a phrase in the externally embedded SRT subtitle whose fitness is below a first threshold, where the fitness of the target phrase is obtained by inputting the target phrase into a hot word library;
acquiring a video frame of the current video corresponding to the target phrase, and recognizing content information in the video frame; and
if the content information does not match the target phrase, correcting the target phrase.
In a second aspect, the present disclosure provides an AI subtitle correction device, comprising:
an external embedding unit, configured to acquire the externally embedded SRT subtitle of a current video, the externally embedded SRT subtitle being a self-made subtitle published by a user for the current video;
a checking unit, configured to detect a target phrase in the externally embedded SRT subtitle, the target phrase being a phrase whose fitness is below a first threshold, where the fitness of the target phrase is obtained by inputting the target phrase into a hot word library;
a matching unit, configured to acquire the video frame of the current video corresponding to the target phrase and recognize content information in the video frame; and
a correcting unit, configured to correct the target phrase if the content information does not match the target phrase.
In a third aspect, an embodiment of the present disclosure provides a computer-readable storage medium storing instructions that, when executed, cause a computer to perform the AI subtitle correction method of the first aspect or any possible design of the first aspect.
In a fourth aspect, an embodiment of the present disclosure further provides a cloud video server. The cloud video server comprises a processor, a machine-readable storage medium, and a network interface connected through a bus system. The network interface is configured to connect communicatively with at least one client; the machine-readable storage medium is configured to store a program, instructions, or code; and the processor is configured to execute the program, instructions, or code in the machine-readable storage medium to perform the AI subtitle correction method of the first aspect or any possible design of the first aspect.
Based on any of the above aspects, the present disclosure provides an AI subtitle correction method, an AI subtitle correction device, and a cloud video server. The method comprises: acquiring an externally embedded SRT subtitle of a current video, the externally embedded SRT subtitle being a self-made subtitle published by a user for the current video; detecting a target phrase in the externally embedded SRT subtitle, the target phrase being a phrase whose fitness is below a first threshold, where the fitness is obtained by inputting the target phrase into a hot word library; acquiring the video frame of the current video corresponding to the target phrase and recognizing content information in the video frame; and, if the content information does not match the target phrase, correcting the target phrase. With this method, wrong phrases in self-made subtitles can be recognized and effectively repaired, and the system can self-learn from the errors, thereby improving the viewing experience.
Drawings
To illustrate the technical solutions of the embodiments of the present disclosure more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present disclosure and therefore should not be considered limiting of its scope; those skilled in the art may derive other related drawings from them without inventive effort.
Fig. 1 is a schematic view of an application scenario of an AI subtitle modification system according to an embodiment of the present disclosure;
fig. 2 is a schematic flow chart of an AI subtitle correction method according to an embodiment of the present disclosure;
fig. 3 is a schematic functional block diagram of an AI subtitle modification apparatus according to an embodiment of the present disclosure;
fig. 4 is a schematic structural block diagram of a cloud video server for implementing the AI subtitle correction method according to the present disclosure.
Detailed Description
The present disclosure is described in detail below with reference to the drawings; the specific operations in the method embodiments can also be applied to the device embodiments or the system embodiments.
Fig. 1 is an interactive schematic diagram of an AI subtitle modification system 10 according to an embodiment of the present disclosure. The AI subtitle modification system 10 may include a cloud video server 100 and a client 200 communicatively connected to the cloud video server 100. The AI caption correcting system 10 shown in fig. 1 is only one possible example, and in other possible embodiments, the AI caption correcting system 10 may include only a part of the components shown in fig. 1 or may also include other components.
In this embodiment, the client 200 may comprise a mobile device, a tablet computer, a laptop computer, or the like, or any combination thereof. In some embodiments, the mobile device may include a smart home device, a wearable device, a smart mobile device, a virtual reality device, an augmented reality device, or the like, or any combination thereof. In some embodiments, the smart home devices may include control devices of smart electrical appliances, smart monitoring devices, smart televisions, smart cameras, and the like, or any combination thereof. In some embodiments, the wearable device may include a smart bracelet, smart footwear, smart glasses, a smart helmet, a smart watch, smart clothing, a smart backpack, a smart accessory, or the like, or any combination thereof. In some embodiments, the smart mobile device may include a smartphone, a personal digital assistant, a gaming device, or the like, or any combination thereof. In some embodiments, the virtual reality device and/or the augmented reality device may include a virtual reality helmet, virtual reality glasses, a virtual reality patch, an augmented reality helmet, augmented reality glasses, an augmented reality patch, or the like, or any combination thereof. For example, the virtual reality device and/or augmented reality device may include various virtual reality products.
In this embodiment, the cloud video server 100 and the client 200 in the AI subtitle correction system 10 may cooperatively perform the AI subtitle correction method described in the following method embodiment; refer to the method embodiment below for the specific steps performed by the cloud video server 100 and the client 200.
To address the technical problem noted in the background above, fig. 2 is a flowchart of an AI subtitle correction method according to an embodiment of the present disclosure. The method may be executed by the cloud video server 100 shown in fig. 1 and is described in detail below.
Step S110: acquire an externally embedded SRT subtitle of a current video; the externally embedded SRT subtitle is a self-made subtitle published by a user for the current video.
Step S120: detect a target phrase in the externally embedded SRT subtitle; the target phrase is a phrase in the externally embedded SRT subtitle whose fitness is below a first threshold, and the fitness of the target phrase is obtained by inputting the target phrase into a hot word library.
Step S130: acquire a video frame of the current video corresponding to the target phrase, and recognize content information in the video frame.
Step S140: if the content information does not match the target phrase, correct the target phrase.
In a possible embodiment, the AI subtitle correction method further includes: acquiring a candidate phrase corresponding to the target phrase, where the fitness of the candidate phrase is greater than that of the target phrase and is obtained by inputting the candidate phrase into the hot word library; and replacing the target phrase with the candidate phrase.
In a possible embodiment, detecting the target phrase in the externally embedded SRT subtitle specifically includes: inputting the externally embedded SRT subtitle into the hot word library to obtain the fitness of a first phrase in the externally embedded SRT subtitle; and taking the first phrase as the target phrase when the fitness of the first phrase is below a second threshold.
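This detection step can be sketched as a threshold test over per-phrase fitness scores. The toy lexicon, scoring rule, and threshold value below are illustrative assumptions; the disclosure does not specify how the hot word library scores a phrase:

```python
def find_target_phrases(phrases, fitness, second_threshold=0.5):
    """Return the phrases whose fitness falls below the second threshold;
    these become the target phrases suspected of being subtitle errors."""
    return [p for p in phrases if fitness(p) < second_threshold]

# Toy hot word library: known hot words get fitness 1.0, unknown ones 0.0.
hot_words = {"goal", "penalty", "offside"}
toy_fitness = lambda phrase: 1.0 if phrase in hot_words else 0.0

targets = find_target_phrases(["goal", "pennalty", "offside"], toy_fitness)
# targets == ["pennalty"] — the misspelled phrase falls below the threshold
```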
In a possible embodiment, the hot word library includes a first hot word library and a second hot word library; the first hot word library is built from the video tags of a training video, and the second hot word library is built from the self-made subtitles of the training video.
In a possible embodiment, inputting the externally embedded SRT subtitle into the hot word library to obtain the fitness of a first phrase in the externally embedded SRT subtitle specifically includes: inputting the externally embedded SRT subtitle into the first hot word library to obtain a first estimate for the first phrase; inputting the externally embedded SRT subtitle into the second hot word library to obtain a second estimate for the first phrase; and taking the weighted sum of the first estimate and the second estimate as the fitness of the first phrase.
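The weighted summation in this embodiment can be written directly. The weight values below are illustrative assumptions, since the disclosure does not fix them:

```python
def combined_fitness(first_estimate: float, second_estimate: float,
                     tag_weight: float = 0.4, subtitle_weight: float = 0.6) -> float:
    """Fitness of the first phrase as the weighted sum of the estimate from
    the tag-based first hot word library and the estimate from the
    subtitle-based second hot word library."""
    return tag_weight * first_estimate + subtitle_weight * second_estimate

fitness = combined_fitness(0.8, 0.5)  # 0.4 * 0.8 + 0.6 * 0.5 = 0.62
```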
In one possible embodiment, step S130 further includes:
step S131, when the target phrase appears, intercepting a video frame of the current video corresponding to the target phrase timestamp;
step S132, identifying the video frame to obtain a target object in the video frame;
step S133, determining whether the target object matches the target phrase.
In one possible embodiment, step S133 further includes:
step S1331, if the target object is matched with the target phrase, the target phrase is considered to be correct;
step S1332, if the target object is not matched with the target phrase, the target phrase is considered to be wrong.
In one possible embodiment, step S140 further includes:
step S141, if the content information is not matched with the target phrase, acquiring a correct phrase corresponding to the content information;
and step S142, replacing the target phrase with the correct phrase.
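Steps S141 and S142 amount to a text replacement inside the subtitle cue. A minimal sketch (the function name and example strings are ours, not from the disclosure):

```python
def correct_cue_text(cue_text: str, target_phrase: str, correct_phrase: str) -> str:
    """Replace the mismatched target phrase with the correct phrase
    recovered from the frame's content information (steps S141-S142)."""
    return cue_text.replace(target_phrase, correct_phrase)

fixed = correct_cue_text("the player scored a pennalty", "pennalty", "penalty")
# fixed == "the player scored a penalty"
```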
Fig. 3 is a schematic functional module diagram of an AI subtitle correction device 300 according to an embodiment of the present disclosure. In this embodiment, the functional modules of the AI subtitle correction device 300 may be divided according to the method embodiment executed by the cloud video server 100; that is, the following functional modules of the AI subtitle correction device 300 may be used to execute the method embodiments executed by the cloud video server 100. The AI subtitle correction device 300 may include an external embedding unit 310, a checking unit 320, a matching unit 330, and a correcting unit 340; the functions of these modules are described in detail below.
The external embedding unit 310 may be configured to perform the step S110, that is, to obtain the external embedded SRT subtitle of the current video; the externally-embedded SRT subtitle is a self-made subtitle issued by a user to the current video.
The checking unit 320 may be configured to perform step S120 described above, that is, to detect and obtain a target phrase in the externally-embedded SRT subtitle; the target phrase is a phrase with the fitness lower than a first threshold in the externally-embedded SRT caption, and the fitness of the target phrase is obtained by inputting the target phrase into a hot word library.
The matching unit 330 may be configured to perform the step S130 described above, that is, to acquire a video frame of the current video corresponding to the target phrase and identify content information in the video frame.
The correcting unit 340 may be configured to perform the step S140, that is, if the content information does not match the target phrase, the target phrase is corrected.
It should be noted that the division of the above device into modules is only a logical division; in an actual implementation, the modules may be wholly or partially integrated into one physical entity or physically separated. The modules may all be implemented as software invoked by a processing element, all in hardware, or partly in each form. For example, the external embedding unit 310 may be a separately installed processing element, may be integrated into a chip of the device, or may be stored in the memory of the device as program code that a processing element of the device calls to execute the unit's functions; the other modules are implemented similarly. All or some of the modules may be integrated together or implemented independently. The processing element described here may be an integrated circuit with signal-processing capability. In implementation, each step of the above method, or each of the above modules, may be completed by an integrated logic circuit of hardware in the processor element or by instructions in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above methods, such as one or more Application Specific Integrated Circuits (ASICs), one or more Digital Signal Processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs). For another example, when one of the modules is implemented as program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor that can call program code. As another example, these modules may be integrated together and implemented as a system-on-a-chip (SoC).
Fig. 4 shows the hardware structure of the cloud video server 100 that implements the above AI subtitle correction method according to an embodiment of the present disclosure. As shown in fig. 4, the cloud video server 100 may include a processor 110, a machine-readable storage medium 120, a bus 130, and a transceiver 140.
In a specific implementation, at least one processor 110 executes computer-executable instructions stored in the machine-readable storage medium 120 (for example, those of the AI subtitle correction device 300 shown in fig. 3), so that the processor 110 can perform the AI subtitle correction method of the above method embodiment. The processor 110, the machine-readable storage medium 120, and the transceiver 140 are connected via the bus 130, and the processor 110 may control the transceiving actions of the transceiver 140 to exchange data with the aforementioned client 200.
For a specific implementation process of the processor 110, reference may be made to the above-mentioned method embodiments executed by the cloud video server 100, and implementation principles and technical effects are similar, which are not described herein again.
In the embodiment shown in fig. 4, it should be understood that the processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), or the like. A general-purpose processor may be a microprocessor or any conventional processor. The steps of a method disclosed in connection with the present disclosure may be embodied directly in a hardware processor, or performed by a combination of hardware and software modules within the processor.
The machine-readable storage medium 120 may comprise high-speed RAM memory and may also include non-volatile storage NVM, such as at least one disk memory.
The bus 130 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus 130 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.
In addition, an embodiment of the present disclosure also provides a readable storage medium storing computer-executable instructions; when a processor executes the instructions, the AI subtitle correction method described above is implemented.
The readable storage medium described above may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk. Readable storage media can be any available media that can be accessed by a general purpose or special purpose computer.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present disclosure, and not for limiting the same; while the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present disclosure.

Claims (10)

1. An AI subtitle correction method, comprising:
acquiring an externally embedded SRT subtitle of a current video, the externally embedded SRT subtitle being a self-made subtitle published by a user for the current video;
detecting a target phrase in the externally embedded SRT subtitle, the target phrase being a phrase in the externally embedded SRT subtitle whose fitness is below a first threshold, where the fitness of the target phrase is obtained by inputting the target phrase into a hot word library;
acquiring a video frame of the current video corresponding to the target phrase, and recognizing content information in the video frame; and
if the content information does not match the target phrase, correcting the target phrase.
2. The method of claim 1, further comprising:
acquiring a candidate phrase corresponding to the target phrase, where the fitness of the candidate phrase is greater than that of the target phrase and is obtained by inputting the candidate phrase into the hot word library; and
replacing the target phrase with the candidate phrase.
3. The method according to claim 1 or 2, wherein detecting the target phrase in the externally embedded SRT subtitle comprises:
inputting the externally embedded SRT subtitle into the hot word library to obtain the fitness of a first phrase in the externally embedded SRT subtitle; and
taking the first phrase as the target phrase when the fitness of the first phrase is below a second threshold.
4. The method of claim 3, wherein the hot word library comprises a first hot word library and a second hot word library, the first hot word library being obtained from video tags of a training video and the second hot word library being obtained from self-made subtitles of the training video; and
wherein inputting the externally embedded SRT subtitle into the hot word library to obtain the fitness of the first phrase comprises:
inputting the externally embedded SRT subtitle into the first hot word library to obtain a first estimate for the first phrase;
inputting the externally embedded SRT subtitle into the second hot word library to obtain a second estimate for the first phrase; and
taking the weighted sum of the first estimate and the second estimate as the fitness of the first phrase.
5. The method according to claim 3, wherein acquiring the video frame of the current video corresponding to the target phrase and recognizing content information in the video frame comprises:
when the target phrase appears, capturing the video frame of the current video corresponding to the timestamp of the target phrase;
recognizing the video frame to obtain a target object in the video frame; and
judging whether the target object matches the target phrase.
6. The method of claim 5, wherein judging whether the target object matches the target phrase comprises:
if the target object matches the target phrase, considering the target phrase correct; and
if the target object does not match the target phrase, considering the target phrase wrong.
7. The method of claim 1, wherein correcting the target phrase if the content information does not match the target phrase comprises:
if the content information does not match the target phrase, acquiring a correct phrase corresponding to the content information; and
replacing the target phrase with the correct phrase.
8. An AI subtitle correction device, comprising:
an external embedding unit, configured to acquire the externally embedded SRT subtitle of a current video, the externally embedded SRT subtitle being a self-made subtitle published by a user for the current video;
a checking unit, configured to detect a target phrase in the externally embedded SRT subtitle, the target phrase being a phrase whose fitness is below a first threshold, where the fitness of the target phrase is obtained by inputting the target phrase into a hot word library;
a matching unit, configured to acquire the video frame of the current video corresponding to the target phrase and recognize content information in the video frame; and
a correcting unit, configured to correct the target phrase if the content information does not match the target phrase.
9. A computer-readable storage medium storing instructions or executable code that, when executed by a processor of an electronic device, cause the electronic device to implement the method of any one of claims 1-7.
10. A cloud video server, comprising a processor, a machine-readable storage medium, and a network interface connected through a bus system, wherein the network interface is configured to connect communicatively with at least one client, the machine-readable storage medium is configured to store a program, instructions, or code, and the processor is configured to execute the program, instructions, or code in the machine-readable storage medium to perform the AI subtitle correction method of any one of claims 1-7.
CN202111628785.3A 2021-12-29 2021-12-29 AI subtitle correction method and device and cloud video server Withdrawn CN114363536A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111628785.3A CN114363536A (en) 2021-12-29 2021-12-29 AI subtitle correction method and device and cloud video server


Publications (1)

Publication Number Publication Date
CN114363536A true CN114363536A (en) 2022-04-15

Family

ID=81103582

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111628785.3A Withdrawn CN114363536A (en) 2021-12-29 2021-12-29 AI subtitle correction method and device and cloud video server

Country Status (1)

Country Link
CN (1) CN114363536A (en)


Legal Events

Date Code Title Description
PB01 Publication
WW01 Invention patent application withdrawn after publication

Application publication date: 20220415