CN112822539A - Information display method, device, server and storage medium - Google Patents

Information display method, device, server and storage medium

Info

Publication number
CN112822539A
CN112822539A (application CN202011642906.5A; granted as CN112822539B)
Authority
CN
China
Prior art keywords
recognition result
video
character recognition
text
display
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011642906.5A
Other languages
Chinese (zh)
Other versions
CN112822539B (en)
Inventor
陈妙
钟宜峰
吴耀华
李琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and MIGU Culture Technology Co Ltd
Priority to CN202011642906.5A
Publication of CN112822539A
Application granted
Publication of CN112822539B
Legal status: Active
Anticipated expiration

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/485End-user interface for client configuration
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The embodiment of the invention relates to the field of video and discloses an information display method, an information display device, a server and a storage medium. The information display method comprises the following steps: acquiring the appearance duration of a preset target object displaying text in a video; judging whether the appearance duration meets a preset duration condition; if so, acquiring a character recognition result of the text; and storing the character recognition result in a display file, where the display file is read by a video player so that the character recognition result is displayed when the video is played. Text information in the video can thus be supplemented automatically, saving human resources.

Description

Information display method, device, server and storage medium
Technical Field
The embodiment of the invention relates to the field of videos, in particular to an information display method, an information display device, a server and a storage medium.
Background
With the development of the internet, watching videos has become an indispensable part of people's daily entertainment. When video playing software or a video website plays a video, some pictures in the video display text for only a short time, and users find it difficult to take in the complete text within that time. As a result, users cannot obtain enough information from the picture content, which hinders their understanding of the video.
In a related information display method, a person watches the video for a long time, identifies the text content that needs to be additionally displayed, and adds that content to the video picture.
The related information display method therefore has the following problem: the video content must be identified manually and the supplementary text added to the video, which requires high labor cost.
Disclosure of Invention
Embodiments of the invention aim to provide an information display method, an information display device, a server and a storage medium, which can supplement video text information automatically and save human resources.
In order to solve the above technical problem, an embodiment of the present invention provides an information display method, comprising: acquiring the appearance duration of a preset target object displaying text in a video; judging whether the appearance duration meets a preset duration condition; if so, acquiring a character recognition result of the text; and storing the character recognition result in a display file, where the display file is read by a video player so that the character recognition result is displayed when the video is played.
An embodiment of the present invention also provides an information display device, comprising: an acquisition module for acquiring the appearance duration of a preset target object displaying text in a video; an identification module for judging whether the appearance duration meets a preset duration condition and, if so, acquiring a character recognition result of the text; and a display module for storing the character recognition result in a display file, where the display file is read by a video player so that the character recognition result is displayed when the video is played.
An embodiment of the present invention further provides a server, comprising: at least one processor; and a memory communicatively coupled to the at least one processor, the memory storing instructions executable by the at least one processor to enable the at least one processor to perform the information display method described above.
Embodiments of the present invention also provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the information display method described above.
Compared with the prior art, the embodiment of the invention acquires the appearance duration of a preset target object displaying text in a video and judges whether that duration meets a preset duration condition. If so, the character recognition result of the text is acquired and stored in a display file, which can be read by a video player so that the character recognition result is displayed when the video is played. In this way, text information in the video is supplemented automatically, without manual identification, saving human resources.
In addition, before storing the character recognition result in the display file, the method further comprises: judging whether the character recognition result is related to the subsequent plot of the video; and if so, storing the character recognition result in the display file. Screening the supplementary text in this way avoids displaying text unrelated to the plot of the video and saves system resources.
In addition, judging whether the character recognition result is related to the subsequent plot of the video comprises: judging whether the image in which the preset target object appears is an advertisement; if not, judging whether the character recognition result is consistent with the subsequent plot; and if consistent, determining that the character recognition result is related to the subsequent plot of the video. Advertisement text is thereby filtered out before the plot-consistency comparison is made.
In addition, the preset duration condition includes: the appearance duration is less than the understanding time of the text, where the understanding time is derived from the number of words contained in the text. Performing supplementary display only for text whose appearance duration is less than its understanding time makes the display duration of the character recognition result better match users' needs and improves the accuracy of judging whether a text needs supplementary display.
Drawings
One or more embodiments are illustrated by way of example in the accompanying drawings, in which like reference numerals refer to similar elements. The figures are not to scale unless otherwise specified.
Fig. 1 is a flowchart of an information display method provided according to a first embodiment of the present invention;
Fig. 2 is a flowchart of an information display method provided according to a second embodiment of the present invention;
Fig. 3 is a schematic view of an information display apparatus provided according to a third embodiment of the present invention;
Fig. 4 is a schematic diagram of a server structure provided according to a fourth embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments are described in detail below with reference to the accompanying drawings. As those of ordinary skill in the art will appreciate, numerous technical details are set forth in the embodiments to provide a better understanding of the present application; however, the technical solution claimed in the present application can also be implemented without these details, and with various changes and modifications based on the following embodiments. The embodiments are divided for convenience of description only, should not limit the specific implementation of the invention, and may be combined and cross-referenced where no contradiction arises.
A first embodiment of the present invention relates to an information display method. The specific flow is shown in figure 1.
Step 101, acquiring the appearance duration of a preset target object displaying text in a video;
Step 102, judging whether the appearance duration meets a preset duration condition;
Step 103, if yes, acquiring a character recognition result of the text;
Step 104, storing the character recognition result in a display file; the display file is read by a video player, and the character recognition result is displayed when the video is played.
The information display method of this embodiment is applied to a server of a video platform and provides supplementary information display for videos. The video platform may provide video-on-demand service for users in the form of web pages, client software and the like. Videos, in particular film and television dramas, are stored on the server of the video platform and provided for platform users on demand. In such a drama, a leading character often looks at text on an object such as a letter or a mobile phone, but the scene usually does not stay on screen long enough for viewers, i.e. platform users, to read the text in it. Viewers therefore struggle to obtain enough plot information from the text alone, which hinders their understanding of the plot. By processing the video, the video platform server generates a display file for supplementary character display; when a platform user requests a video, the video file and its corresponding display file are loaded into the video player together, so that the user sees the supplementary text content while watching the video.
The implementation details of the information display method according to this embodiment are described below. These details are provided for ease of understanding and are not necessary for implementing the embodiment.
In step 101, the video platform server acquires the appearance duration of a preset target object displaying text in the video. The preset target object may be a prop that commonly conveys text information in a film or television plot, such as a letter, a mobile phone, a signpost or a note, and serves to prompt the video's storyline.
Specifically, after acquiring the video, the video platform server samples frames from it at regular intervals to obtain images of the video. The server may run a target detection algorithm on each image to identify whether it contains a preset target object displaying text and, if so, the text display area of that object. If a preset target object is found in an image, the server continues to examine subsequent images until the object can no longer be recognized in them, and calculates the appearance duration of the object from the number of video frames between the first image containing the object and the first subsequent image in which it is no longer recognized. Sampling frames at regular time intervals can be implemented as extracting frames at a fixed frame interval, since the number of frames displayed per second in a video is fixed; preferably, every 5th frame is extracted. For detection, each image may be input into a pre-trained image recognition model, which performs feature extraction, recognition and detection as learned from its training data, judges whether the image contains a target object, and outputs the result. Preferably, the image recognition model may be a YOLOv4 (You Only Look Once, version 4) model, from which the video platform server obtains whether the image contains the preset target object and the object's position in the image.
For example, suppose the video platform server recognizes a mobile phone in the 5th frame of the video with text information displayed on its screen. The server then examines the subsequent sampled frames, i.e. the 10th, 15th, 20th and so on. If a phone screen displaying the text is detected in the 10th and 15th frames but not in the 20th, the appearance span of the text-displaying phone screen is from the 5th to the 20th frame; at a frame rate of 25 frames per second, its appearance duration is 0.6 seconds.
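The frame-sampling logic above can be sketched as follows. This is a minimal illustration, not the patent's implementation: `detections_by_frame` stands in for the output of a real detector such as YOLOv4, and the sample step and frame rate match the worked example.

```python
SAMPLE_STEP = 5   # sample every 5th frame, as in the example
FPS = 25          # frame rate of the example video

def appearance_duration(detections_by_frame, sample_step=SAMPLE_STEP, fps=FPS):
    """detections_by_frame maps sampled frame index -> True if the target
    object (e.g. a phone screen showing text) was detected in that frame.
    Returns the appearance duration in seconds, or None if never detected."""
    frames = sorted(detections_by_frame)
    first = None
    for f in frames:
        if detections_by_frame[f] and first is None:
            first = f                      # first frame where the object appears
        elif first is not None and not detections_by_frame[f]:
            # object no longer detected: the span runs from `first` to `f`
            return (f - first) / fps
    return None if first is None else (frames[-1] - first + sample_step) / fps

# The worked example: detected in frames 5, 10 and 15 but not in frame 20,
# giving (20 - 5) / 25 = 0.6 seconds.
duration = appearance_duration({5: True, 10: True, 15: True, 20: False})
```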
In step 102, the video platform server judges whether the appearance duration of the object displaying the text meets a preset duration condition. The preset duration condition may be that the appearance duration is less than the understanding time of the text: when the appearance duration lies between a first preset duration and a second preset duration, the server judges whether it is less than the understanding time of the text in the image, where the understanding time is derived from the number of words the text contains. When the appearance duration is shorter than the understanding time, the text in the image is supplementarily displayed; when it is equal to or longer than the understanding time, it is not. Specifically, the server may segment the text display area of the image into words (any one of the images containing the target object may be chosen for this), count the total number of words, and compute the understanding time from the statistical ratio of words understood by the human eye per unit time, i.e. the time the human eye needs to understand a given number of words. For example, if the text display area contains n words and big-data statistics give a words-to-time ratio r, the understanding time is S = n / r.
In this embodiment, the understanding time is derived from the number of words in the text, and supplementary display is performed for text whose appearance duration is less than its understanding time, so the display duration of the character recognition result better matches users' needs and the accuracy of judging whether a text needs supplementary display is improved.
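The duration check described above reduces to S = n / r compared against the appearance duration. In this sketch the reading rate `words_per_second` is a hypothetical value standing in for the statistically derived human-eye understanding ratio mentioned in the text.

```python
def understanding_time(word_count, words_per_second):
    """S = n / r: seconds a viewer needs to read `word_count` words."""
    return word_count / words_per_second

def needs_supplement(appearance_seconds, word_count, words_per_second=4.0):
    """Supplementary display is required only when the object is on screen
    for less time than the viewer needs to read its text."""
    return appearance_seconds < understanding_time(word_count, words_per_second)

# A phone screen showing 12 words that stays visible for only 0.6 s:
# the viewer would need 12 / 4 = 3 s, so the text is supplemented.
supplement = needs_supplement(0.6, 12)
```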
Further, before judging whether the appearance duration is less than the understanding time, the video platform server may first judge whether the appearance duration is less than a preset threshold, and only if so proceed to judge whether it meets the preset duration condition. The preset threshold may be a preset maximum display duration, derived from statistics of human reading experience: if the appearance duration is equal to or longer than this maximum, the preset target object stays on screen long enough for viewers to understand the text, no supplementary display is needed, and the remaining steps of the information display method are not executed. The preset threshold may instead be a preset minimum display duration, likewise derived from statistics of human reading experience: if the appearance duration is shorter than this minimum, the object does not stay on screen long enough for viewers to understand the text, and supplementary display is required.
In this embodiment, judging whether the appearance duration is less than a preset threshold before checking the preset duration condition filters out cases where supplementary display cannot be needed, avoiding unnecessary computation.
In step 103, the video platform server acquires the character recognition result of the text displayed by a preset target object whose appearance duration meets the preset duration condition. Specifically, the server may detect the text display area of the preset target object and perform character recognition on the text in it: character features can be extracted and recognized using convolutional layers (CONV), Long Short-Term Memory networks (LSTM) and Optical Character Recognition (OCR) algorithms, and the recognition result optimized with a post-filtering algorithm to obtain the final character recognition result.
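The CONV + LSTM + OCR pipeline itself requires a trained model, so the sketch below only illustrates the post-filtering step mentioned above: raw per-line OCR hypotheses with confidence scores are cleaned and merged into a final character recognition result. The confidence threshold and the whitespace normalization are assumptions for illustration.

```python
import re

def post_filter(ocr_lines, min_confidence=0.5):
    """ocr_lines: list of (text, confidence) pairs from the recognizer.
    Drops low-confidence hypotheses, normalizes whitespace, joins the rest."""
    kept = []
    for text, conf in ocr_lines:
        if conf < min_confidence:
            continue                                  # drop noisy hypotheses
        text = re.sub(r"\s+", " ", text).strip()      # normalize whitespace
        if text:
            kept.append(text)
    return " ".join(kept)

result = post_filter([("Meet me  at", 0.92), ("##", 0.12), ("the station", 0.88)])
# result == "Meet me at the station"
```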
In step 104, the video platform server stores the character recognition result in a display file, which is provided for the video player to read so that the character recognition result is displayed when the video is played. The display file is a file that a video player can recognize, that runs while the video plays, and that displays content over the video; a subtitle file or a bullet-screen (danmaku) file, for example, can serve as the display file. The server may generate the display file when the first character recognition result is obtained. The display file contains the text to be displayed, i.e. the character recognition result, together with its display time information, display position information and display form information. The display form information includes the font, size, color and the like of the displayed text; the server may set it according to factors such as the picture size of the video and the word count and pixel position of the character recognition result. The server may also generate a display box for the text in the display file, so that when the display file is executed by the video player the text appears inside the box.
In one example, the video platform server generates the display time information for the character recognition result from the understanding time, and stores the result and its display time information in the display file. For example, taking the understanding time as the display duration, the server uses the moment at which the preset target object displaying the text appears in the video as the start display time, and that moment plus the understanding time as the end display time. While the preset target object is on screen, the character recognition result is displayed at a position around the text display area, for example on its left side by default. Further, the server may derive, from the appearance duration, the moment at which the shot containing the object ends: between the start display time and the end of the shot, the character recognition result is displayed around the text display area; between the end of the shot and the end display time, it may be displayed in any area of the video picture, for example at the upper right by default.
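Writing the character recognition result and its display time into a player-readable file can be sketched as below. The patent only requires a file the player can read (a subtitle file is one of its examples); SubRip (SRT) is used here as one concrete, widely supported format, which is an assumption rather than the patent's prescribed choice.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def display_entry(index, start_s, understanding_s, text):
    """One display-file entry: start showing when the object appears, stop
    after the understanding time has elapsed, as in the example above."""
    end_s = start_s + understanding_s
    return f"{index}\n{srt_timestamp(start_s)} --> {srt_timestamp(end_s)}\n{text}\n"

entry = display_entry(1, 12.2, 3.0, "Meet me at the station")
```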
In this embodiment, the display time information of the character recognition result is generated by the video platform server, with the display duration derived from the understanding time of the displayed text. This avoids the mismatch between display duration and the time users actually need to understand the text that arises when the duration is set subjectively by hand.
Further, before setting the display position of the character recognition result, the video platform server performs face recognition and segmentation on the video images within the display time of the result, and places the result so as to avoid face areas. If a face area cannot be avoided, the result can be displayed at the subtitle position.
In this embodiment, performing face recognition and segmentation on the video images within the display time and keeping the character recognition result out of face areas prevents the result from covering faces in the video, improving the user's viewing experience.
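The face-avoidance placement above is a geometric test: pick a candidate position whose box does not overlap any detected face box, falling back to the subtitle position otherwise. Boxes here are (x, y, width, height) tuples, and the candidate list is an illustrative assumption; the face boxes would come from a real face detector.

```python
def overlaps(a, b):
    """True if axis-aligned boxes a and b intersect; boxes are (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def place_text(candidates, face_boxes, subtitle_box):
    """Return the first candidate position clear of all faces,
    or the subtitle position if every candidate overlaps a face."""
    for box in candidates:
        if not any(overlaps(box, face) for face in face_boxes):
            return box
    return subtitle_box

faces = [(100, 50, 80, 80)]
spot = place_text([(90, 40, 120, 40), (400, 20, 120, 40)], faces, (0, 650, 1280, 60))
# spot == (400, 20, 120, 40): the first candidate overlaps the face box
```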
In one example, before generating the display file, the video platform server also examines the surroundings of the preset target object to detect whether a supplementary display of the text already exists there. Specifically, text detection may be performed on the video image and the detected text compared with the character recognition result: if identical text content is detected, a supplementary display already exists in the video and need not be duplicated via a display file; if no identical text is detected, the display file may be generated and the text displayed on the object supplementarily displayed.
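The duplicate check above is a comparison between the recognition result and text already detected in the frame. A minimal sketch, in which case-insensitive and whitespace-insensitive matching is an assumption (the patent only requires that the detected text content be "the same"):

```python
def already_displayed(recognition_result, detected_texts):
    """True if text equal to the recognition result is already detected
    near the object, so no display-file entry needs to be generated."""
    target = " ".join(recognition_result.split()).lower()
    return any(" ".join(t.split()).lower() == target for t in detected_texts)
```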
In one example, after acquiring the video, the video platform server also obtains its genre tag, such as a republican-era drama, a costume drama or an urban drama, and generates display boxes of different styles for the character recognition result according to the tag. Specifically, correspondences between video genre tags and the setting information of different preset display boxes may be imported in advance, so that different display box styles are selected for the character recognition results of different videos.
In one example, the video platform server also obtains the user's viewing history and the subtitle information of the videos watched, derives a first language type from them, and takes the language type of the character recognition result as a second language type. If the first language type differs from the second, the character recognition result can be translated into the first language type before the display file is generated.
In this embodiment, the appearance duration of a preset target object displaying text in a video is acquired and checked against a preset duration condition; if the condition is met, the character recognition result of the text is acquired and stored in a display file, which can be read by a video player so that the result is displayed when the video is played. Video text information is thus supplemented automatically, saving human resources.
The steps of the above methods are divided for clarity of description. In implementation, steps may be combined into one or a single step may be split into several; all such divisions fall within the protection scope of this patent as long as the same logical relationship is preserved. Adding insignificant modifications to an algorithm or flow, or introducing insignificant design changes, without altering its core design likewise falls within the protection scope of the patent.
A second embodiment of the present invention relates to an information display method. The second embodiment is substantially the same as the first, the main difference being: in the first embodiment, a display file is generated from the character recognition result of the text displayed on any preset target object satisfying the preset duration condition; in the second embodiment, only character recognition results related to the subsequent plot of the video are written into the display file.
Fig. 2 shows a specific flow of an information display method according to a second embodiment of the present invention.
Step 201, acquiring the appearance duration of a preset target object displaying text in a video;
Step 202, judging whether the appearance duration meets a preset duration condition;
Step 203, if yes, acquiring a character recognition result of the text;
Step 204, judging whether the character recognition result is related to the subsequent plot of the video;
Step 205, if related, storing the character recognition result in a display file; the display file is read by a video player, and the character recognition result is displayed when the video is played.
Step 201, step 202, step 203, and step 205 are substantially the same as step 101, step 102, step 103, and step 104 in the first embodiment, and are not described again.
In step 204, the video platform server judges whether the character recognition result is related to a subsequent episode of the video. The server can judge whether the image of the preset target object displaying the text is an advertisement: if so, the character recognition result is unrelated to the subsequent plot and no display file need be generated for supplementary display of the result; if not, the result is judged related to the subsequent plot and a display file containing it is generated.
Specifically, the video platform server may input the image into a pre-trained advertisement classification model to judge whether it is an advertisement image. Such a model can be built with the I3D (Two-Stream Inflated 3D ConvNet) algorithm, performing binary classification between advertisement images and actual film images to judge whether the image is an advertisement and hence whether the character recognition result is related to the subsequent plot.
Further, after judging that the image of the preset target object is not an advertisement, the video platform server also judges whether the character recognition result is consistent with the subsequent plot; if so, the result is judged to be related to the subsequent plot of the video. The server acquires the video images within a subsequent preset period and extracts subsequent plot information from them in order to make this consistency judgment.
In this embodiment, the server first judges whether the image of the article is an advertisement; if not, it judges whether the character recognition result is consistent with the subsequent plot, and if consistent, the character recognition result is related to the subsequent plot of the video.
In one example, the video platform server performs language understanding and content extraction on the character recognition result, and also performs OCR on the video images within the subsequent preset time period to extract the characters in those images, particularly subtitle characters. The video platform server extracts keywords such as person names, events, and place names using a bag-of-words model and a natural language processing model, vectorizes the two sets of extraction results, and compares them: the similarity between the two is calculated, and when the similarity is greater than a preset threshold, the character recognition result is judged to be consistent with the subsequent plot of the video. Specifically, the cosine similarity between the word vectors of the keywords of the character recognition result and the word vectors of the subsequent video images may be calculated; when it is greater than the preset threshold, the character recognition result is judged to be consistent with the subsequent plot of the video. When the video is one that the user plays on demand on the video platform, the video platform server can also obtain the bullet-screen (danmaku) information of the video and extract keywords from the bullet screens.
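The keyword-comparison step above can be sketched with a simple bag-of-words cosine similarity. This is a minimal illustration, not the patent's implementation: a production system would use learned word vectors, and the 0.3 threshold below is an assumed value (the patent only says "preset threshold"):

```python
from collections import Counter
import math

def cosine_similarity(keywords_a, keywords_b):
    """Cosine similarity between two bag-of-words keyword vectors."""
    va, vb = Counter(keywords_a), Counter(keywords_b)
    dot = sum(va[t] * vb[t] for t in set(va) & set(vb))
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

THRESHOLD = 0.3  # assumed value; the patent only specifies "a preset threshold"

def is_consistent(recognized_keywords, plot_keywords, threshold=THRESHOLD):
    """True when the OCR text's keywords are judged consistent with the
    keywords extracted from the subsequent plot (frames, subtitles, danmaku)."""
    return cosine_similarity(recognized_keywords, plot_keywords) >= threshold

# Keywords sharing two of three terms score 2/3 and pass the threshold.
print(is_consistent(["alice", "paris", "trial"], ["alice", "trial", "verdict"]))  # True
```

With real word embeddings the same decision rule applies; only the vectorization step changes.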
In one example, the video platform server also performs action extraction and classification on the subsequent plot of the video so as to perform scene recognition on it, and after obtaining the keywords of the character recognition result, judges whether those keywords are consistent with the recognized plot scene. For example, when the subsequent plot of the video is a scene announcing examination results (an honor-roll scene), and the keywords of the character recognition result are ranking-related terms, the character recognition result is judged to be consistent with the subsequent plot of the video. Specifically, the action extraction, classification, and scene recognition may be performed using the I3D model.
In one example, the video platform server performs language understanding and content extraction on the character recognition result to extract the emotion type expressed by the text, performs emotion analysis on the faces of the characters in the video images within the subsequent preset time period to obtain the characters' emotion types, and judges whether the two emotion types are the same; if they are, the character recognition result is judged to be consistent with the subsequent plot of the video. The facial emotion analysis may be performed using a CNN (Convolutional Neural Network) facial emotion analysis algorithm.
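The emotion-consistency check reduces to comparing two labels. In the patent the text emotion comes from language understanding of the OCR result and the face emotion from a CNN facial-expression analyzer; both are stubbed here as plain labels, and the mapping table is an assumption for illustration:

```python
# Assumed mapping from text sentiment categories to facial emotion classes;
# the patent does not specify the label sets.
TEXT_TO_FACE = {
    "positive": {"happy", "surprised"},
    "negative": {"sad", "angry", "fearful"},
    "neutral":  {"neutral"},
}

def emotions_match(text_emotion: str, face_emotion: str) -> bool:
    """True when the OCR text's emotion type agrees with the character's
    facial emotion detected in the subsequent frames."""
    return face_emotion in TEXT_TO_FACE.get(text_emotion, set())

print(emotions_match("positive", "happy"))  # True
```

In practice the upstream classifiers would emit these labels with confidences, and a real system might also require a minimum confidence before treating the match as evidence of plot relevance.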
In one example, the video platform server further obtains, as candidate images, the character-display-area images of preset target articles that meet the preset duration condition, performs character recognition on the text in the candidate images, and judges whether each character recognition result is related to the subsequent plot of the video. If so, the target images whose character recognition results are related to the subsequent plot are screened out from the candidate images, and a display file is generated from the text information in the character display areas of the preset target articles in the target images, i.e., from the character recognition results in the target images. The video platform server may generate one display file for the supplementary display of the text from the character recognition results of all target images in the video.
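The candidate-to-target screening described above can be sketched as a simple filter that emits display-file entries. Field names (`timestamp`, `ocr_text`) and the relevance predicate are illustrative, not from the patent:

```python
def build_display_file(candidates, is_plot_related):
    """Filter candidate frames and emit display-file entries.

    `candidates` is a list of dicts with 'timestamp' and 'ocr_text';
    `is_plot_related` is the relevance predicate built from the preceding
    steps (ad gating, keyword/scene/emotion consistency).
    """
    entries = []
    for frame in candidates:
        if is_plot_related(frame["ocr_text"]):
            # Target image: its recognized text goes into the display file.
            entries.append({"time": frame["timestamp"], "text": frame["ocr_text"]})
    return entries

candidates = [
    {"timestamp": 12.0, "ocr_text": "Detective's case notes"},
    {"timestamp": 47.5, "ocr_text": "50% off this weekend"},
]
display = build_display_file(candidates, lambda t: "case" in t)
print(display)  # only the plot-related frame at t=12.0 survives
```

A single pass like this over all candidate frames yields the one display file per video that the embodiment describes.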
In this embodiment, it is judged whether the character recognition result is related to the subsequent plot of the video, and if so, the display file is generated. The supplementarily displayed text can thus be screened, preventing text unrelated to the video's plot from being displayed and thereby saving system resources.
The steps of the above methods are divided only for clarity of description; in implementation they may be combined into one step, or a step may be split into multiple steps, and all such divisions fall within the protection scope of this patent as long as the same logical relationship is included. Likewise, adding insignificant modifications to an algorithm or process, or introducing insignificant design changes, without altering the core design of the algorithm or process, falls within the protection scope of this patent.
A third embodiment of the present invention relates to an information display device, as shown in fig. 3, including:
an obtaining module 301, configured to obtain the appearance duration of a preset target item displaying text in a video;
the identification module 302 is configured to determine whether the occurrence duration meets a preset duration condition; if yes, acquiring a character recognition result of the text;
the display module 303 is configured to store the character recognition result in a display file; the display file is read by a video player, and the character recognition result is displayed when the video is played.
In one example, the display module 303 is further configured to determine whether the text recognition result is related to a subsequent episode of the video before storing the text recognition result in the display file; and if so, storing the character recognition result in a display file.
In an example, the display module 303 is further configured to determine whether the image appearing on the preset target object is an advertisement; if not, to determine whether the character recognition result is consistent with the subsequent plot; and if consistent, the character recognition result is related to the subsequent plot of the video.
In one example, determining whether the text recognition result is consistent with the subsequent plot includes: the character recognition result is consistent with the emotion types of the character expressions in the subsequent plots, and/or the character recognition result is consistent with the keywords of the subsequent plots.
In one example, the preset duration condition includes: the appearance duration is less than the understanding time of the text, where the understanding time is derived from the number of words contained in the text.
In an example, the identifying module 302 is further configured to determine, before determining whether the appearance duration satisfies the preset duration condition, whether the appearance duration is less than a preset threshold; if it is less than the preset threshold, whether the appearance duration satisfies the preset duration condition is then determined.
In an example, the display module 303 is specifically configured to generate display time information corresponding to the character recognition result according to the understanding time; and storing the display time information and the character recognition result in a display file.
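Deriving the display time information from the understanding time can be sketched as follows. The words-per-second rate is an assumed constant; the patent only states that the understanding time is derived from the number of words in the text:

```python
# Sketch of the display module's timing logic: derive the display duration
# ("understanding time") from the word count of the recognized text, then
# attach it to the display-file entry the video player will read.

WORDS_PER_SECOND = 3.0  # assumed average reading speed, not from the patent

def understanding_time(text: str) -> float:
    """Seconds needed to read `text`, estimated from its word count."""
    return len(text.split()) / WORDS_PER_SECOND

def make_display_entry(text: str, appear_at: float) -> dict:
    """Display entry: show `text` from `appear_at` for long enough to be read."""
    return {"start": appear_at,
            "end": appear_at + understanding_time(text),
            "text": text}

entry = make_display_entry("The will names a second heir", 30.0)
print(entry["end"] - entry["start"])  # 2.0 seconds for 6 words
```

The resulting `start`/`end`/`text` triple is the same shape as a subtitle cue, which is one natural encoding for the display file.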
In one example, the information display device of the present application further includes: the intercepting module is used for intercepting images of the video at regular time; and identifying whether a preset target object of the displayed text exists in the image or not through a pre-trained image identification model.
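The intercepting module's periodic frame capture can be sketched as generating the timestamps at which frames are grabbed for the image recognition model. The 2-second interval is an assumption; the patent only says frames are captured "at regular time":

```python
def sample_frame_times(video_duration_s: float, interval_s: float = 2.0):
    """Timestamps (seconds) at which the intercepting module grabs frames
    to feed the pre-trained image recognition model. The default interval
    is an assumed value, not specified in the patent."""
    n = int(video_duration_s // interval_s) + 1
    return [i * interval_s for i in range(n)]

times = sample_frame_times(10.0, 2.0)
print(times)  # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
```

Each sampled frame would then be passed to the detector that decides whether a preset target object displaying text is present.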
It should be understood that this embodiment is an apparatus embodiment corresponding to the method embodiment above, and the two can be implemented in cooperation. The related technical details mentioned in the above embodiment remain valid in this embodiment and, to reduce repetition, are not described again here; correspondingly, the related technical details mentioned in this embodiment can also be applied to the above embodiment.
It should be noted that each module referred to in this embodiment is a logical module. In practical applications, one logical unit may be one physical unit, a part of one physical unit, or a combination of multiple physical units. In addition, in order to highlight the innovative part of the present invention, elements not closely related to solving the technical problem proposed by the present invention are not introduced in this embodiment, but this does not indicate that no other elements are present in this embodiment.
A fourth embodiment of the present invention relates to a server, as shown in fig. 4, including: at least one processor 401; a memory 402 communicatively coupled to the at least one processor; the memory 402 stores instructions executable by the at least one processor 401, and the instructions are executed by the at least one processor 401 to perform the above-mentioned information display method.
The memory 402 and the processor 401 are connected by a bus, which may include any number of interconnected buses and bridges linking the various circuits of the processor 401 and the memory 402 together. The bus may also connect various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and therefore not described further here. A bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or a plurality of elements, such as multiple receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. Information processed by the processor 401 may be transmitted over a wireless medium through an antenna, and the antenna may also receive information and forward it to the processor 401.
The processor 401 is responsible for managing the bus and general processing, and may provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. The memory 402 may be used to store data used by the processor 401 in performing operations.
A fifth embodiment of the present invention relates to a computer-readable storage medium storing a computer program. The computer program realizes the above-described method embodiments when executed by a processor.
That is, as those skilled in the art can understand, all or part of the steps of the methods in the above embodiments may be implemented by a program instructing related hardware. The program is stored in a storage medium and includes several instructions to enable a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples for carrying out the invention, and that various changes in form and details may be made therein without departing from the spirit and scope of the invention in practice.

Claims (10)

1. An information display method, comprising:
acquiring the appearance duration of a preset target object displaying text in a video;
judging whether the occurrence time length meets a preset time length condition or not;
if so, acquiring a character recognition result of the text;
storing the character recognition result in a display file; and the display file is read by a video player, and the character recognition result is displayed when the video is played.
2. The information display method according to claim 1, further comprising, before the storing the character recognition result in a display file:
judging whether the character recognition result is related to the subsequent plot of the video or not;
and if so, storing the character recognition result in the display file.
3. The information display method of claim 2, wherein the determining whether the text recognition result is related to a subsequent episode of the video comprises:
judging whether the image appearing on the preset target object is an advertisement or not;
if not, judging whether the character recognition result is consistent with the subsequent plot;
and if consistent, the character recognition result is related to the subsequent plot of the video.
4. The information display method of claim 3, wherein the determining whether the text recognition result is consistent with the subsequent episode comprises:
the character recognition result is consistent with the emotion type of the character expression in the subsequent plot, and/or the character recognition result is consistent with the key word of the subsequent plot.
5. The information display method according to claim 1, wherein the preset duration condition includes:
the occurrence duration is less than the understanding time of the text; the understanding time is derived from the number of words contained in the text.
6. The information display method according to claim 5, before said determining whether the occurrence duration satisfies a preset duration condition, further comprising:
judging whether the occurrence time length is smaller than a preset threshold value or not;
and if the occurrence time length is smaller than the preset threshold, judging whether the occurrence time length meets a preset time length condition or not.
7. The information display method of claim 6, wherein the storing the text recognition result in a display file comprises:
generating display time information corresponding to the character recognition result according to the understanding time;
and storing the display time information and the character recognition result in the display file.
8. An information display device characterized by comprising:
the acquisition module is used for acquiring the appearance duration of a preset target object displaying text in a video;
the identification module is used for judging whether the occurrence time meets a preset time condition or not; if so, acquiring a character recognition result of the text;
the display module is used for storing the character recognition result in a display file; and the display file is read by a video player, and the character recognition result is displayed when the video is played.
9. A server, comprising:
at least one processor;
a memory communicatively coupled to the at least one processor;
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the information display method of any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the information display method according to any one of claims 1 to 7.
CN202011642906.5A 2020-12-30 2020-12-30 Information display method, device, server and storage medium Active CN112822539B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011642906.5A CN112822539B (en) 2020-12-30 2020-12-30 Information display method, device, server and storage medium

Publications (2)

Publication Number Publication Date
CN112822539A true CN112822539A (en) 2021-05-18
CN112822539B CN112822539B (en) 2023-07-14

Family

ID=75856501

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011642906.5A Active CN112822539B (en) 2020-12-30 2020-12-30 Information display method, device, server and storage medium

Country Status (1)

Country Link
CN (1) CN112822539B (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1268076A (en) * 1997-09-27 2000-09-27 西门子公司 Method and device for recognition of delivery data on mail matter
CN109474846A (en) * 2018-12-07 2019-03-15 百度在线网络技术(北京)有限公司 Video ads playback method, device, equipment and computer-readable medium
US20190222798A1 (en) * 2016-05-30 2019-07-18 Sony Corporation Apparatus and method for video-audio processing, and program
CN110225367A (en) * 2019-06-27 2019-09-10 北京奇艺世纪科技有限公司 It has been shown that, recognition methods and the device of object information in a kind of video
CN111343496A (en) * 2020-02-21 2020-06-26 北京字节跳动网络技术有限公司 Video processing method and device
CN111526382A (en) * 2020-04-20 2020-08-11 广东小天才科技有限公司 Live video text generation method, device, equipment and storage medium
CN111683285A (en) * 2020-08-11 2020-09-18 腾讯科技(深圳)有限公司 File content identification method and device, computer equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101771752A (en) * 2009-12-29 2010-07-07 中兴通讯股份有限公司 Mobile phone TV text information extraction method and mobile terminal with same
CN109002759A (en) * 2018-06-07 2018-12-14 Oppo广东移动通信有限公司 text recognition method, device, mobile terminal and storage medium
CN109933275A (en) * 2019-02-12 2019-06-25 努比亚技术有限公司 A kind of knowledge screen method, terminal and computer readable storage medium


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113347478A (en) * 2021-05-28 2021-09-03 维沃移动通信(杭州)有限公司 Display method and display device
CN113347478B (en) * 2021-05-28 2022-11-04 维沃移动通信(杭州)有限公司 Display method and display device
CN113490049A (en) * 2021-08-10 2021-10-08 深圳市前海动竞体育科技有限公司 Sports event video editing method and system based on artificial intelligence
CN113490049B (en) * 2021-08-10 2023-04-21 深圳市前海动竞体育科技有限公司 Sports event video editing method and system based on artificial intelligence

Also Published As

Publication number Publication date
CN112822539B (en) 2023-07-14

Similar Documents

Publication Publication Date Title
CN108322788B (en) Advertisement display method and device in live video
CN109145784B (en) Method and apparatus for processing video
US8605958B2 (en) Method and apparatus for generating meta data of content
US9043860B2 (en) Method and apparatus for extracting advertisement keywords in association with situations of video scenes
CN109218629B (en) Video generation method, storage medium and device
CN110557678B (en) Video processing method, device and equipment
CN113542777B (en) Live video editing method and device and computer equipment
CN110460899B (en) Bullet screen content display method, terminal equipment and computer readable storage medium
US8340498B1 (en) Extraction of text elements from video content
CN110740387A (en) bullet screen editing method, intelligent terminal and storage medium
CN111241340A (en) Video tag determination method, device, terminal and storage medium
CN110072140B (en) Video information prompting method, device, equipment and storage medium
CN112822539B (en) Information display method, device, server and storage medium
CN111314732A (en) Method for determining video label, server and storage medium
CN105657514A (en) Method and apparatus for playing video key information on mobile device browser
CN112291589B (en) Method and device for detecting structure of video file
WO2023045635A1 (en) Multimedia file subtitle processing method and apparatus, electronic device, computer-readable storage medium, and computer program product
CN114550070A (en) Video clip identification method, device, equipment and storage medium
CN111263183A (en) Singing state identification method and singing state identification device
CN110099298B (en) Multimedia content processing method and terminal equipment
CN112866577B (en) Image processing method and device, computer readable medium and electronic equipment
CN113011254A (en) Video data processing method, computer equipment and readable storage medium
US9807453B2 (en) Mobile search-ready smart display technology utilizing optimized content fingerprint coding and delivery
KR102534270B1 (en) Apparatus and method for providing meta-data
CN110163043B (en) Face detection method, device, storage medium and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant