CN108495185B

CN108495185B - Video title generation method and device

Info

Publication number: CN108495185B
Application number: CN201810210914.9A
Authority: CN
Inventors: 杨振坤
Original assignee: Beijing QIYI Century Science and Technology Co Ltd
Current assignee: Beijing QIYI Century Science and Technology Co Ltd
Priority date: 2018-03-14
Filing date: 2018-03-14
Publication date: 2021-04-16
Anticipated expiration: 2038-03-14
Also published as: CN108495185A

Abstract

An embodiment of the invention provides a video title generation method and device. The method comprises: playing a target video through a video playing device; acquiring a target image when an image acquisition instruction input by a user is received; performing character recognition on the target image to obtain a target text corresponding to the target characters; and generating a target video title according to the target text. With this processing, the target characters in the target image can be recognized to obtain the target text, from which the target video title is generated, so the efficiency of video title generation can be improved.

Description

Video title generation method and device

Technical Field

The present invention relates to the field of image processing technologies, and in particular, to a method and an apparatus for generating a video title.

Background

A video title is representative of its video: it helps users quickly understand the content of the video and can also increase the number of plays to a certain extent. In general, when a user produces a program that includes a video, a title needs to be entered for the video. In the prior art, after previewing a video, the user determines a video title according to the characters appearing in the video and then manually inputs the title.

However, the inventor finds that the prior art has at least the following problems in the process of implementing the invention:

The user first needs to browse the characters in the video content, then determine the video title according to those characters and manually input it, and finally check whether the title was input correctly. If the title was input incorrectly, the user needs to input it again. It can be seen that video title generation in the prior art is inefficient.

Disclosure of Invention

The embodiment of the invention aims to provide a video title generation method and a video title generation device so as to improve the efficiency of video title generation. The specific technical scheme is as follows:

In a first aspect, to achieve the above object, an embodiment of the present invention discloses a method for generating a video title, where the method includes:

playing the target video through video playing equipment;

when an image acquisition instruction input by a user is received, acquiring a target image, wherein the target image is an image containing target characters in the target video;

performing character recognition on the target image to obtain a target text corresponding to the target character;

and generating a target video title according to the target text.

Optionally, the image acquisition instruction carries an image identifier of a target frame image where the target image is located and coordinate information of the target image in the target frame image;

the acquiring of the target image comprises:

acquiring the target frame image according to the image identifier;

and extracting the target image from the target frame image according to the coordinate information.

Optionally, the generating a target video title according to the target text includes:

receiving a deletion instruction, input by the user, corresponding to redundant characters in the target text;

and deleting the redundant characters from the target text to obtain a target video title.

Optionally, the generating a target video title according to the target text includes:

receiving an adding instruction input by the user, wherein the adding instruction carries the characters required to be added by the user and the position information of the characters required to be added in the target text;

and adding the characters which need to be added by the user to the corresponding position in the target text to obtain a target video title.

Optionally, when a plurality of image acquisition instructions are received, the generating a target video title according to the target text includes:

receiving a selection instruction input by the user;

acquiring target title texts selected by the user from the acquired target texts;

and generating a target video title according to the target title text.

Optionally, after generating the target video title according to the target text, the method further includes:

and sending the target video title to a preset user terminal so that the user terminal outputs the target video title.

In a second aspect, to achieve the above object, an embodiment of the present invention discloses a video title generating apparatus, where the apparatus includes:

the playing module is used for playing the target video through the video playing equipment;

the acquisition module is used for acquiring a target image when an image acquisition instruction input by a user is received, wherein the target image is an image containing target characters in the target video;

the recognition module is used for carrying out character recognition on the target image to obtain a target text corresponding to the target character;

and the generating module is used for generating a target video title according to the target text.

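Optionally, the image acquisition instruction carries an image identifier of a target frame image where the target image is located and coordinate information of the target image in the target frame image;

the obtaining module is specifically configured to obtain the target frame image according to the image identifier, and to extract the target image from the target frame image according to the coordinate information.

Optionally, the generating module is specifically configured to receive a deletion instruction, input by the user, corresponding to redundant characters in the target text, and to delete the redundant characters from the target text to obtain a target video title.

Optionally, the generating module is specifically configured to receive an adding instruction input by the user, where the adding instruction carries the characters that the user needs to add and the position information of the characters that need to be added in the target text; and to add the characters that the user needs to add to the corresponding position in the target text to obtain a target video title.

Optionally, when a plurality of image acquisition instructions are received, the generating module is specifically configured to receive a selection instruction input by the user;

acquire the target title text selected by the user from the acquired target texts;

and generate a target video title according to the target title text.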

Optionally, the apparatus further comprises:

and the sending module is used for sending the target video title to a preset user terminal so as to enable the user terminal to output the target video title.

In another aspect of the present invention, in order to achieve the above object, an embodiment of the present invention further discloses an electronic device, where the electronic device includes a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;

the memory is used for storing a computer program;

the processor is configured to implement the video title generating method according to the first aspect when executing the program stored in the memory.

In yet another aspect of the present invention, there is also provided a computer-readable storage medium having stored therein instructions which, when run on a computer, implement the video title generation method according to the first aspect described above.

In another aspect of the present invention, there is also provided a computer program product including instructions, which when run on a computer, causes the computer to execute the video title generating method according to the first aspect.

According to the method and the device for generating the video title, the target video can be played through the video playing equipment, when an image acquisition instruction input by a user is received, the target image is acquired, character recognition is carried out on the target image to obtain a target text corresponding to the target character, and the target video title is generated according to the target text. Based on the processing, the target characters in the target image can be identified, the target text can be obtained, the target video title can be further generated, and the video title generation efficiency can be improved.

Of course, it is not necessary for any product or method of practicing the invention to achieve all of the above-described advantages at the same time.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.

Fig. 1 is a flowchart of a video title generating method according to an embodiment of the present invention;

Fig. 2 is a flowchart of a method for obtaining a target image according to an embodiment of the present invention;

Fig. 3 is a flowchart of a method for modifying a target text according to an embodiment of the present invention;

Fig. 4 is a flowchart of a second method for modifying a target text according to an embodiment of the present invention;

Fig. 5 is an application scene diagram of a video title generating method according to an embodiment of the present invention;

Fig. 6 is a block diagram of a video title generation apparatus according to an embodiment of the present invention;

Fig. 7 is a structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.

In the prior art, in a process of generating a video title, a user needs to browse characters in video content first, then, the user determines the video title according to the browsed characters and manually inputs the determined video title, and finally, the user needs to check whether the video title is correctly input. If the video title is input incorrectly, the user needs to re-input the video title. It can be seen that the video title generation in the prior art is inefficient.

Based on the above consideration, the present invention provides a method and an apparatus for generating a video title, which can be applied to an electronic device, where the electronic device can be a terminal or a server. The electronic equipment can play the target video through the video playing equipment. When an image acquisition instruction input by a user is received, the electronic equipment can acquire a target image and perform character recognition on the target image to obtain a target text corresponding to a target character. The electronic device may then generate a target video title from the target text. Based on the processing, the electronic equipment can identify the target characters in the target image to obtain the target text, and further generate the target video title, so that the video title generation efficiency can be improved.

Referring to fig. 1, fig. 1 is a flowchart of a video title generating method according to an embodiment of the present invention, including:

S101: playing the target video through the video playing device.

In practice, when a user produces a television program, a video title needs to be created for the target video used in the program. In general, the target video may be a video whose content includes the text needed to produce the title. For example, the producer of a news video usually overlays the video with characters that represent its content: an image with the characters "Celebrating Six Decades of Construction" may appear in a segment of the news video, and an image with the characters "Twenty Years Since Hong Kong's Return" may also appear in that segment.

The electronic device can play the target video through the video playing device, and specifically, when the electronic device is a terminal, the electronic device can send the target video to a display (video playing device) of the electronic device so as to play the target video; when the electronic device is a server, the electronic device may send the target video to a display terminal (video playing device) with a display, so that the display terminal plays the target video. Specifically, the electronic device may first compress the target video, and then send the compressed target video to the display terminal, so that the transmission rate of the target video can be increased, and the efficiency of generating the video title can be further increased.

S102: when an image acquisition instruction input by a user is received, a target image is acquired.

The target image may be an image in the target video that contains target characters, and the target characters may be the characters the user uses to generate the target video title; for example, the target characters may contain a subject or a keyword of the target video.

In implementation, the user may view the target video through the video playing device. When a frame of the target video being watched contains the target characters the user needs, the user may input an image acquisition instruction to the electronic device. The electronic device receives the image acquisition instruction and acquires the video picture being played by the video playing device at the current moment as the target image. Specifically, the electronic device may determine the frame being played at the current moment according to the length of time for which the target video has been played, and take that frame as the target image. The target image may also be an image that includes only the target character portion of a frame. The process of acquiring the target image in that case is described in detail in the following embodiments.
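The lookup of the frame being played from the elapsed playing time can be sketched as follows. This is a minimal illustration, not the method fixed by the embodiment: it assumes a constant frame rate, and the names `frame_index_at`, `fps`, and `total_frames` are hypothetical.

```python
import math


def frame_index_at(elapsed_seconds: float, fps: float, total_frames: int) -> int:
    """Return the zero-based index of the frame on screen after elapsed_seconds.

    Assumes a constant frame rate (an assumption of this sketch).
    """
    if fps <= 0 or total_frames <= 0:
        raise ValueError("fps and total_frames must be positive")
    index = math.floor(elapsed_seconds * fps)
    # Clamp so that a request before the start or past the end still maps
    # to a valid frame of the video.
    return max(0, min(index, total_frames - 1))


# A video at 25 frames per second that has played for 2 seconds is on frame 50.
current = frame_index_at(2.0, 25.0, 1000)
```

A variable-frame-rate video would instead need a lookup against per-frame timestamps.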

S103: performing character recognition on the target image to obtain a target text corresponding to the target characters.

In implementation, a character recognition algorithm may be stored in the electronic device in advance, and the characters (target characters) in the target image are recognized according to that algorithm to obtain the target text. The character recognition algorithm may be an OCR (Optical Character Recognition) algorithm, which examines the characters in an image, determines their shapes by detecting dark and light patterns, and then translates the shapes into characters by a character recognition method.
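The "dark and light pattern" step can be illustrated with a toy pre-processing sketch. It is not the OCR algorithm of the embodiment, which the text does not specify; it only shows thresholding a grayscale image and splitting it into per-character column spans with a projection profile. The function names and the threshold of 128 are assumptions made for illustration.

```python
from typing import List, Tuple

GrayImage = List[List[int]]  # rows of grayscale values, 0 (dark) to 255 (light)


def binarize(gray: GrayImage, threshold: int = 128) -> List[List[int]]:
    """Mark dark pixels (assumed to be ink) as 1 and light pixels as 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def character_column_spans(binary: List[List[int]]) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) column ranges that contain ink.

    Columns with no ink are treated as gaps between characters.
    """
    if not binary:
        return []
    width = len(binary[0])
    spans = []
    start = None
    for x in range(width):
        has_ink = any(row[x] for row in binary)
        if has_ink and start is None:
            start = x
        elif not has_ink and start is not None:
            spans.append((start, x - 1))
            start = None
    if start is not None:
        spans.append((start, width - 1))
    return spans


gray = [
    [255, 0, 0, 255, 0, 255],
    [255, 0, 255, 255, 0, 0],
]
spans = character_column_spans(binarize(gray))
```

Each span would then be handed to a shape classifier, which is the part a real OCR engine provides.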

S104: generating a target video title according to the target text.

In implementation, the electronic device can generate a target video title from the target text. While generating the target video title, the electronic device can also receive a modification instruction input by the user and modify the target text according to that instruction to generate the target video title. The methods by which the electronic device modifies the target text are described in detail in the following embodiments.

Therefore, based on the video title generation method provided by the embodiment of the invention, the electronic equipment can identify the target characters in the target image to obtain the target text, so as to generate the target video title, and the video title generation efficiency can be improved.

Optionally, the target image may be an image that includes only the target character portion of a frame. Specifically, referring to fig. 2, fig. 2 is a flowchart of a method for acquiring a target image according to an embodiment of the present invention. The image acquisition instruction may carry an image identifier of the target frame image where the target image is located and coordinate information of the target image in the target frame image, and the method includes:

S201: acquiring a target frame image according to the image identifier.

The image identifier may be a timestamp of the target frame image in the target video, or may be a serial number of the target frame image arranged in all frame images of the target video.

In implementation, when a frame of the target video being watched contains the target characters the user needs, the user may input an image acquisition instruction to the electronic device. If the electronic device is a terminal (which may be a computer), the user may, for example, press a preset screenshot shortcut key, such as the "R" key on the computer keyboard; the shortcut key may be set by the user. The electronic device can then acquire the video picture being played by the display at the current moment as the target frame image. If the electronic device is a server, the electronic device can determine the frame being played by the video playing device at the current moment, acquire the corresponding high-definition frame image from the locally stored target video as the target frame image, and send the target frame image to the video playing device.

S202: and extracting the target image from the target frame image according to the coordinate information.

When the target image is a rectangle, the coordinate information may include coordinates of four vertex pixels of the target image in the target frame image; if the target image is of another shape, the coordinate information may include coordinates of pixels of the edge of the target image in the target frame image.

In implementation, when a frame of the target video being watched contains the target characters the user needs, the user can directly frame-select the target image on the video playing device. The electronic device can determine the position of the target image in the target frame image according to the coordinate information and then obtain the target image.
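For the rectangular case, extracting the target image from the target frame image can be sketched as below. The frame is modeled as a list of pixel rows and the coordinate information as four (x, y) vertex pixels; both representations, and the name `crop_target_image`, are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Frame = List[List[int]]  # rows of pixel values; frame[y][x]


def crop_target_image(frame: Frame, vertices: Sequence[Tuple[int, int]]) -> Frame:
    """Cut the axis-aligned rectangle spanned by four vertex pixels out of a frame."""
    if len(vertices) != 4:
        raise ValueError("a rectangular target image needs four vertices")
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    left, right = min(xs), max(xs)
    top, bottom = min(ys), max(ys)
    height = len(frame)
    width = len(frame[0]) if frame else 0
    if left < 0 or top < 0 or right >= width or bottom >= height:
        raise ValueError("selection lies outside the target frame image")
    # The vertex pixels themselves belong to the target image, so both ends
    # of each range are inclusive.
    return [row[left:right + 1] for row in frame[top:bottom + 1]]


# A 4-row by 5-column frame whose pixel value encodes its position (row * 10 + column).
frame = [[r * 10 + c for c in range(5)] for r in range(4)]
target = crop_target_image(frame, [(1, 1), (3, 1), (3, 2), (1, 2)])
```

A non-rectangular selection would need a mask built from the edge pixel coordinates instead of a simple slice.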

As can be seen from the above, based on the video title generation method of the embodiment of the present invention, the target image acquired by the electronic device may be an image only containing the target text portion in the target frame image, which can improve the accuracy of text recognition, thereby improving the efficiency of video title generation.

Optionally, referring to fig. 3, fig. 3 is a flowchart of a method for modifying a target text according to an embodiment of the present invention, where the method includes:

S301: receiving a deletion instruction, input by the user, corresponding to redundant characters in the target text.

The number of the redundant characters may be one or more.

In implementation, when the user browses the determined target text, if the user needs to delete redundant characters in the target text, a deletion instruction can be input to the electronic device, and the electronic device can receive the deletion instruction.

S302: deleting the redundant characters from the target text to obtain a target video title.

In implementation, after receiving the deletion instruction, the electronic device may delete the redundant characters from the target text.
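Steps S301 and S302 can be sketched as follows. The text does not say how the deletion instruction identifies the redundant characters; this sketch assumes it carries their zero-based positions in the target text, and the function name is hypothetical.

```python
from typing import Iterable


def delete_redundant_characters(target_text: str, positions: Iterable[int]) -> str:
    """Remove the characters at the given zero-based positions from target_text."""
    to_delete = set(positions)
    for position in to_delete:
        if not 0 <= position < len(target_text):
            raise IndexError(f"position {position} is outside the target text")
    return "".join(ch for i, ch in enumerate(target_text) if i not in to_delete)


# Drop the first ten characters ("Breaking: ") recognized along with the headline.
title = delete_redundant_characters("Breaking: City Marathon", range(10))
```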

Therefore, based on the video title generation method provided by the embodiment of the invention, the electronic equipment can delete redundant characters in the target text, so that the personalized requirements of the user are met, and the user experience is improved.

Optionally, referring to fig. 4, fig. 4 is a flowchart of a second method for modifying a target text according to an embodiment of the present invention, where the method includes:

S401: receiving an adding instruction input by the user.

The adding instruction may carry the characters that the user needs to add and the position information of the characters that need to add in the target text. One or more characters may be added.

In implementation, when the user browses the determined target text, if the user needs to add characters in the target text, an adding instruction can be input to the electronic device, and the electronic device can receive the adding instruction.

S402: adding the characters that the user needs to add to the corresponding position in the target text to obtain the target video title.

In implementation, after the electronic device receives the adding instruction, the characters which the user needs to add can be added to the corresponding positions in the target text.
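Steps S401 and S402 can be sketched as follows, assuming the position information is a zero-based insertion index into the target text. That encoding, and the name `add_characters`, are assumptions; the text only says that the instruction carries the characters and their position.

```python
def add_characters(target_text: str, characters: str, position: int) -> str:
    """Insert characters into target_text so that they start at position."""
    # position == len(target_text) is allowed and appends at the end.
    if not 0 <= position <= len(target_text):
        raise IndexError(f"position {position} is outside the target text")
    return target_text[:position] + characters + target_text[position:]


# Insert a year in front of the second word of the recognized text.
title = add_characters("City Marathon", "2018 ", 5)
```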

Therefore, based on the video title generation method provided by the embodiment of the invention, the electronic equipment can add the characters required to be added by the user into the target text, so that the personalized requirements of the user are met, and the user experience is improved.

Optionally, the electronic device may also pick out, from the determined target texts, the target text selected by the user. Specifically, when the electronic device receives a plurality of image acquisition instructions, the processing may further include: receiving a selection instruction input by the user; acquiring the target title text selected by the user from the acquired target texts; and generating a target video title according to the target title text.

In implementation, the user may input an image acquisition instruction multiple times, and the electronic device may determine a corresponding target text for each image acquisition instruction it receives. The electronic device can display the determined target texts to the user through the video playing device. The user can choose among them as needed and input a selection instruction to the electronic device. The electronic device can receive the selection instruction, acquire the target title text selected by the user from the target texts, and generate the target video title according to the target title text. For the step of generating the target video title according to the target title text, refer to the method of S104, which is not described here again.
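The selection among several target texts can be sketched as below. The class `TitleCandidates` and the idea that the selection instruction carries the index of the chosen text are assumptions made for illustration; the embodiment does not fix either.

```python
from typing import List


class TitleCandidates:
    """Collects one target text per image acquisition instruction."""

    def __init__(self) -> None:
        self._texts: List[str] = []

    def add(self, target_text: str) -> None:
        """Record the target text recognized for one image acquisition instruction."""
        self._texts.append(target_text)

    def options(self) -> List[str]:
        """Return the target texts to display to the user, in capture order."""
        return list(self._texts)

    def select(self, index: int) -> str:
        """Return the target title text named by a selection instruction."""
        if not 0 <= index < len(self._texts):
            raise IndexError(f"no target text at index {index}")
        return self._texts[index]


candidates = TitleCandidates()
candidates.add("City Marathon Opens")
candidates.add("Ten Thousand Runners Take Part")
chosen = candidates.select(1)
```

The chosen text can then go through the same optional deletion and addition steps before it becomes the target video title.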

As can be seen from the above, based on the video title generation method provided by the embodiment of the present invention, the electronic device can determine a plurality of target texts, and generate a target video title according to the target title text selected by the user, which can improve user experience.

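Optionally, the method may further include:

sending the target video title to a preset user terminal so that the user terminal outputs the target video title.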

The user terminal may be the video playing device or another terminal.

Therefore, based on the video title generation method provided by the embodiment of the invention, the electronic equipment can send the target video title to the user terminal so that the user terminal can output the target video title, the user can browse the target video title conveniently, and the user experience is improved.

Referring to fig. 5, fig. 5 is an application scene diagram of a video title generation method according to an embodiment of the present invention, including:

In this application scenario, the electronic device 501 may be a server and the video playing device 502 may be a computer. A user may send a video preview request to the electronic device 501 through the video playing device 502. The electronic device 501 may compress the target video and send the compressed target video to the video playing device 502, so that the video playing device 502 plays the target video. Specifically, the video playing device 502 may have a solid-state memory and an animation editor Flash player installed therein for playing the target video. The user can watch the target video played by the video playing device 502 and, when the target image is needed, use the video playing device 502 to send an image acquisition request to the electronic device 501. The electronic device 501 may obtain the target frame image and send it to the video playing device 502. When viewing the target frame image through the video playing device 502, the user can frame-select the target image. The user can then send an OCR recognition request, together with the coordinate information of the target image, to the electronic device 501 through the video playing device 502. The electronic device 501 may perform character recognition on the framed target image according to an OCR algorithm to obtain a target text, generate a target video title, and send the target video title to the video playing device 502, which may display it to the user.

Corresponding to the embodiment of the method in fig. 1, referring to fig. 6, fig. 6 is a structural diagram of a video title generating device according to an embodiment of the present invention, including:

the playing module 601 is configured to play the target video through a video playing device;

an obtaining module 602, configured to obtain a target image when an image obtaining instruction input by a user is received, where the target image is an image containing target characters in the target video;

the recognition module 603 is configured to perform character recognition on the target image to obtain a target text corresponding to the target character;

and a generating module 604, configured to generate a target video title according to the target text.

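Optionally, the image acquisition instruction carries an image identifier of a target frame image where the target image is located and coordinate information of the target image in the target frame image;

the obtaining module 602 is specifically configured to obtain the target frame image according to the image identifier, and to extract the target image from the target frame image according to the coordinate information.

Optionally, the generating module 604 is specifically configured to receive a deletion instruction, input by the user, corresponding to redundant characters in the target text, and to delete the redundant characters from the target text to obtain a target video title.

Optionally, the generating module 604 is specifically configured to receive an adding instruction input by the user, where the adding instruction carries the characters that the user needs to add and the position information of the characters that the user needs to add in the target text; and to add the characters that the user needs to add to the corresponding position in the target text to obtain a target video title.

Optionally, when a plurality of image acquisition instructions are received, the generating module 604 is specifically configured to receive a selection instruction input by the user;

acquire the target title text selected by the user from the acquired target texts;

and generate a target video title according to the target title text.

Optionally, the apparatus further comprises:

a sending module, configured to send the target video title to a preset user terminal so that the user terminal outputs the target video title.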

As can be seen from the above, the video title generation apparatus according to the embodiment of the present invention can identify the target characters in the target image, obtain the target text, and further generate the target video title, thereby improving the efficiency of video title generation.

An embodiment of the present invention further provides an electronic device, as shown in fig. 7, including a processor 701, a communication interface 702, a memory 703 and a communication bus 704, where the processor 701, the communication interface 702, and the memory 703 communicate with each other through the communication bus 704;

the memory 703 is configured to store a computer program;

the processor 701 is configured to implement the following steps when executing the program stored in the memory 703:

playing the target video through a video playing device;

when an image acquisition instruction input by a user is received, acquiring a target image, wherein the target image is an image containing target characters in the target video;

performing character recognition on the target image to obtain a target text corresponding to the target characters;

and generating a target video title according to the target text.

The communication bus 704 mentioned in the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 704 may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.

The communication interface 702 is used for communication between the above-described electronic apparatus and other apparatuses.

The Memory 703 may include a Random Access Memory (RAM) or a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. Optionally, the memory 703 may also be at least one memory device located remotely from the aforementioned processor.

The processor 701 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

The electronic equipment provided by the embodiment of the invention can identify the target characters in the target image to obtain the target text when generating the video title, further generate the target video title and improve the efficiency of generating the video title.

An embodiment of the present invention further provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are run on a computer, the computer is caused to execute the video title generation method provided in the embodiment of the present invention.

Specifically, the video title generating method includes:

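playing the target video through a video playing device;

when an image acquisition instruction input by a user is received, acquiring a target image, wherein the target image is an image containing target characters in the target video;

performing character recognition on the target image to obtain a target text corresponding to the target characters;

and generating a target video title according to the target text.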

It should be noted that other implementation manners of the video title generation method are the same as those of the foregoing method embodiment, and are not described herein again.

By operating the instructions stored in the computer-readable storage medium provided by the embodiment of the invention, when the video title is generated, the target characters in the target image can be identified to obtain the target text, so that the target video title is generated, and the video title generation efficiency can be improved.

Embodiments of the present invention also provide a computer program product including instructions, which when run on a computer, cause the computer to execute the video title generation method provided by the embodiments of the present invention.

Specifically, the video title generating method includes:

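playing the target video through a video playing device;

when an image acquisition instruction input by a user is received, acquiring a target image, wherein the target image is an image containing target characters in the target video;

performing character recognition on the target image to obtain a target text corresponding to the target characters;

and generating a target video title according to the target text.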

By operating the computer program product provided by the embodiment of the invention, when the video title is generated, the target characters in the target image can be identified to obtain the target text, so that the target video title is generated, and the video title generation efficiency can be improved.

The above embodiments may be implemented wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, wireless, microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.

It should be noted that, in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.

The embodiments in this specification are described in an interrelated manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus, electronic device, computer-readable storage medium, and computer program product embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the corresponding description of the method embodiments.

Claims

1. A video title generation method, comprising:

playing a target video through a video playing device;

generating a target video title according to the target text;

the image acquisition instruction carries an image identifier of a target frame image where the target image is located and coordinate information of the target image in the target frame image;

the acquiring of the target image comprises:

acquiring the target frame image according to the image identifier; the target frame image is a high-definition frame image, in the locally stored target video, that corresponds to the frame image being played by the video playing device when the image acquisition instruction is received;
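For illustration only, the lookup by image identifier and the use of the coordinate information might be sketched as follows. The sketch assumes the coordinate information is a rectangle given as left, top, width, and height in pixels, that the locally stored high-definition frames are available in a dictionary keyed by image identifier, and that the target image is obtained by cropping the target frame image. The names `ImageAcquisitionInstruction`, `hd_frames`, and `acquire_target_image` are hypothetical and do not appear in the claims.

```python
from dataclasses import dataclass
from typing import Dict, List

Frame = List[List[int]]  # rows of pixel values (illustrative model)


@dataclass(frozen=True)
class ImageAcquisitionInstruction:
    image_id: str  # image identifier of the target frame image
    left: int      # assumed form of the coordinate information:
    top: int       # a rectangle inside the target frame image
    width: int
    height: int


def acquire_target_image(
    instr: ImageAcquisitionInstruction,
    hd_frames: Dict[str, Frame],
) -> Frame:
    """Look up the high-definition frame by identifier and crop it."""
    # hd_frames stands in for the locally stored target video.
    frame = hd_frames[instr.image_id]
    rows = frame[instr.top : instr.top + instr.height]
    return [row[instr.left : instr.left + instr.width] for row in rows]
```

For a 3x3 frame `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]` stored under `"f1"`, an instruction with `left=1, top=0, width=2, height=2` returns `[[2, 3], [5, 6]]`.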

2. The method of claim 1, wherein generating a target video title from the target text comprises:

3. The method of claim 1, wherein generating a target video title from the target text comprises:

4. The method of claim 1, wherein when a plurality of image acquisition instructions are received, the generating a target video title from the target text comprises:

receiving a selection instruction input by the user;

and generating a target video title according to the target title text.

5. The method of claim 1, wherein after the generating a target video title from the target text, the method further comprises:

6. A video title generation apparatus, comprising:

the generating module is used for generating a target video title according to the target text;

the obtaining module is specifically configured to obtain the target frame image according to the image identifier; the target frame image is a high-definition frame image, in the locally stored target video, that corresponds to the frame image being played by the video playing device when the image acquisition instruction is received;

7. The apparatus of claim 6,

the generating module is specifically configured to receive a deletion instruction corresponding to redundant characters in the target text, the deletion instruction being input by the user,

8. The apparatus of claim 6,

the generating module is specifically configured to receive an adding instruction input by the user, where the adding instruction carries the characters that the user needs to add and the position information of the characters that the user needs to add in the target text;
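For illustration only, applying such an adding instruction to the target text might be sketched as follows. The sketch assumes the position information is a zero-based character index at which the new characters are inserted; the function name `apply_add_instruction` and its parameters are hypothetical.

```python
def apply_add_instruction(target_text: str, characters: str, position: int) -> str:
    """Insert the user's characters into the target text at the given index."""
    # Assumption: position is a zero-based index; len(target_text) appends.
    if not 0 <= position <= len(target_text):
        raise ValueError("position lies outside the target text")
    return target_text[:position] + characters + target_text[position:]
```

For example, `apply_add_instruction("Episode 3", "Season 1 ", 0)` returns `"Season 1 Episode 3"`.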

9. The apparatus according to claim 6, wherein the generating module is configured to, when a plurality of image acquisition instructions are received, receive a selection instruction input by the user;

and generating a target video title according to the target title text.

10. The apparatus of claim 6, further comprising:

11. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other via the communication bus;

the memory is used for storing a computer program;

the processor is configured to implement the method steps of any one of claims 1-5 when executing the program stored in the memory.