CN113552984A - Text extraction method, device, equipment and medium - Google Patents

Text extraction method, device, equipment and medium

Info

Publication number
CN113552984A
Authority
CN
China
Prior art keywords
text
target
video
structured
target video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110910385.5A
Other languages
Chinese (zh)
Inventor
曹宽怡
邹应
张宁静
王彦杰
赵绚
冯杨兰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zitiao Network Technology Co Ltd
Original Assignee
Beijing Zitiao Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zitiao Network Technology Co Ltd
Priority to CN202110910385.5A
Publication of CN113552984A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048 - Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0481 - Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 - Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/73 - Querying
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 - Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/73 - Querying
    • G06F 16/738 - Presentation of query results
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048 - Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0484 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F 3/04842 - Selection of displayed objects or displayed text elements

Abstract

The present disclosure relates to a text extraction method, apparatus, device, and medium. The text extraction method includes: displaying a target video, wherein the target video includes text information; in response to a first trigger operation on the target video, displaying structured text on a text display interface, wherein the structured text is obtained based on the text information; and in response to a selection operation on the structured text, extracting the target text selected by the selection operation from the structured text, so as to publish the target text. Embodiments of the present disclosure provide users with a fast and simple way to extract text from video, improving the efficiency with which users extract video text and, in turn, the efficiency with which they publish it.

Description

Text extraction method, device, equipment and medium
Technical Field
The present disclosure relates to the field of electronic devices, and in particular, to a method, an apparatus, a device, and a medium for text extraction.
Background
With the continuous development of Internet technology, video clients installed on electronic devices such as mobile phones and computers have become important tools for watching videos. While watching a video, if viewers see text content they like or find notable, they often want to save it so that it can be published later when needed.
At present, users generally have to transcribe the text content they want to save manually, which reduces both the efficiency of extracting the text content and the efficiency of publishing it.
Disclosure of Invention
To solve, or at least partially solve, the technical problems described above, the present disclosure provides a text extraction method, apparatus, device, and medium.
In a first aspect, the present disclosure provides a text extraction method, including:
displaying a target video, wherein the target video comprises text information;
in response to a first trigger operation on the target video, displaying structured text on a text display interface, wherein the structured text is obtained based on the text information;
and in response to a selection operation on the structured text, extracting the target text selected by the selection operation from the structured text, so as to publish the target text.
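By way of illustration only, the following Python sketch models these three steps as plain functions; the names (Video, on_first_trigger, on_selection) and the console output are assumptions made for this sketch and are not part of the disclosed implementation.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Video:
    video_id: str
    structured_text: List[str] = field(default_factory=list)  # paragraphs derived from the text information

def display_target_video(video: Video) -> None:
    # Step 1: present the target video containing text information.
    print(f"Playing video {video.video_id}")

def on_first_trigger(video: Video) -> List[str]:
    # Step 2: in response to the first trigger operation, show the structured
    # text that was obtained based on the video's text information.
    for i, paragraph in enumerate(video.structured_text):
        print(f"[{i}] {paragraph}")
    return video.structured_text

def on_selection(structured_text: List[str], selected: List[int]) -> str:
    # Step 3: extract the paragraphs chosen by the selection operation so that
    # the resulting target text can be published.
    return "\n".join(structured_text[i] for i in selected)

if __name__ == "__main__":
    video = Video("v1", ["First subtitle line.", "Second subtitle line."])
    display_target_video(video)
    paragraphs = on_first_trigger(video)
    print("Target text to publish:\n" + on_selection(paragraphs, [0]))
```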
In a second aspect, the present disclosure provides a text extraction apparatus, including:
the video display unit is configured to display a target video, and the target video comprises text information;
the first display unit is configured to, in response to a first trigger operation on the target video, display structured text on a text display interface, wherein the structured text is obtained based on the text information;
and the text extraction unit is configured to, in response to a selection operation on the structured text, extract the target text selected by the selection operation from the structured text, so as to publish the target text.
In a third aspect, the present disclosure provides a text extraction apparatus, comprising:
a processor;
a memory for storing executable instructions;
the processor is configured to read the executable instructions from the memory and execute the executable instructions to implement the text extraction method according to the first aspect.
In a fourth aspect, the present disclosure provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the text extraction method of the first aspect.
Compared with the prior art, the technical scheme provided by the embodiment of the disclosure has the following advantages:
according to the text extraction method, the text extraction device, the text extraction equipment and the text extraction medium, in the process of displaying the target video including the text information, the first trigger operation on the target video can be detected in real time, when the first trigger operation is detected, the first trigger operation is responded, the structured text obtained based on the text information is displayed on the text display interface, the selection operation on the structured text is detected in real time, and when the selection operation is detected, the selection operation is responded, the target text selected by the selection operation is extracted from the structured text to be published on the target text.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
Fig. 1 is a schematic flow chart of a text extraction method according to an embodiment of the present disclosure;
fig. 2 is a schematic diagram of a video recommendation stream interface provided by an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a video search results interface provided by an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a text display interface provided by an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of another text presentation interface provided by embodiments of the present disclosure;
FIG. 6 is a schematic diagram of yet another text presentation interface provided by an embodiment of the present disclosure;
fig. 7 is a flowchart illustrating a method for generating a structured text according to an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of irrelevant text information provided by an embodiment of the disclosure;
FIG. 9 is a schematic diagram of yet another text display interface provided by an embodiment of the present disclosure;
FIG. 10 is a schematic diagram of yet another text presentation interface provided by an embodiment of the present disclosure;
fig. 11 is a schematic structural diagram of a text extraction apparatus according to an embodiment of the present disclosure;
fig. 12 is a schematic structural diagram of a text extraction device according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It should be noted that the modifiers "a", "an", and "the" in this disclosure are intended to be illustrative rather than limiting; those skilled in the art will understand that they mean "one or more" unless the context clearly indicates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
The embodiment of the disclosure provides a text extraction method, a text extraction device, text extraction equipment and a text extraction medium, which can provide a quick and simple video text extraction mode for a user.
The text extraction method provided by the embodiment of the disclosure is first described with reference to fig. 1 to 10.
Fig. 1 shows a schematic flow chart of a text extraction method provided by an embodiment of the present disclosure.
In an embodiment of the present disclosure, the text extraction method may be performed by an electronic device. Among them, the electronic devices may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., car navigation terminals), wearable devices, and the like, and fixed terminals such as digital TVs, desktop computers, smart home devices, and the like.
As shown in fig. 1, the text extraction method may include the following steps.
S110: displaying a target video, wherein the target video includes text information.
In embodiments of the present disclosure, the electronic device may present to the user a target video that contains text information, where the text information may include the text shown in individual video frames of the target video. Optionally, this text may be any text appearing in a video frame, such as a subtitle, which is not limited herein.
In some embodiments, the target video may belong to any video category.
In other embodiments, the target video may belong to a target video category, which may include a video category that satisfies the user's text extraction needs.
The text extraction requirement refers to a user's need to extract video text from a video; that is, the target video category may include the video categories whose videos users frequently extract text from.
For example, the target video category may include categories whose videos tend to contain a large amount of video text, such as the emotion, movie review, short music video, television series, movie, and recipe categories.
Therefore, in the embodiments of the present disclosure, when a user is interested in the video text of a target video that belongs to the target video category and therefore contains a large amount of video text, the user can record the text of interest quickly and simply, without having to memorize or manually type large amounts of text, which improves the user experience.
In the embodiment of the present disclosure, the electronic device may display the target video in various scenes, such as a video recommendation stream, a video search result, and the like, which is not limited herein.
In some embodiments, the video recommendation stream scenario refers to playing, in sequence, the videos in a video recommendation stream received by the user through the electronic device. The videos in the recommendation stream may be recommended to the user by a server based on the user's viewing preferences and trending social events.
Optionally, S110 may specifically include:
and displaying the video content of the target video in the playing interface of the target video.
Specifically, when the electronic device plays the target video in the video recommendation stream scene, the electronic device may display the video content of the target video in a playing interface of the target video, so as to display the target video.
Fig. 2 is a schematic diagram illustrating a video recommendation stream interface provided by an embodiment of the present disclosure.
As shown in fig. 2, the electronic device 201 may display a playing interface 202 and play, in the playing interface 202, a video 203 recommended to the user in the video recommendation stream, where the video 203 contains the text information "video copy: XXXXXXXX".
In other embodiments, S110 may further specifically include:
and responding to the received search keywords, and displaying video preview information of the target video on a search result interface, wherein the target video belongs to videos obtained through searching based on the search keywords.
Specifically, in the video search result scenario, a user may enter a search keyword on the electronic device to trigger a video search based on that keyword. After receiving the search keyword, the electronic device may send it to a server, so that the server searches its pre-stored videos and treats the videos associated with the search keyword as the videos found by the search. These videos may include the target video; in other words, the target video is one of the videos found by searching based on the search keyword.
The server may feed the found videos, including the target video, back to the electronic device. After receiving them, the electronic device may display video preview information for the found videos on a search result interface; when the user browses the search result interface, the video preview information of the target video is displayed, thereby presenting the target video.
Further, the video preview information of the target video may include a preview window of the target video. When the complete preview window is displayed in the search result interface, the target video may be played automatically in the preview window. When at least part of the preview window moves out of the search result interface, playback may be paused in the preview window until the preview window completely re-enters the search result interface, at which point playback of the target video continues.
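A minimal sketch of this visibility rule is given below, assuming the client can report what fraction of the preview window currently lies inside the search result interface; the function name and the way visibility is expressed are illustrative assumptions.

```python
def preview_should_play(visible_fraction: float) -> bool:
    """Decide the playback state of the preview window from its visibility.

    The target video plays automatically only while its preview window is fully
    inside the search result interface; if any part of the window moves out,
    playback pauses until the window fully re-enters.
    """
    return visible_fraction >= 1.0

# Example: scrolling the preview window halfway out pauses it; scrolling it fully back resumes it.
assert preview_should_play(1.0) is True
assert preview_should_play(0.5) is False
```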
Fig. 3 shows a schematic diagram of a video search result interface provided by an embodiment of the present disclosure.
As shown in fig. 3, the electronic device 301 may display video preview information for the video 303 in the search result interface 302 for the search keyword "XX copy", where the video 303 contains the text information "video copy: XXXXXXXX".
Returning to fig. 1, S120 is described next: in response to a first trigger operation on the target video, structured text is displayed on a text display interface, where the structured text is obtained based on the text information.
In the embodiments of the present disclosure, the electronic device may detect a first trigger operation on the target video in real time. When the user wants to extract video text from the target video, the user may input the first trigger operation on the target video to the electronic device, so that the electronic device displays the structured text obtained based on the text information on the text display interface in response to the first trigger operation.
In some embodiments, the first trigger operation may include, without limitation, a gesture control operation such as double-click, long-press, or the like, a voice control operation, or an expression control operation, or the like, on the target video for triggering display of the structured text corresponding to the target video.
In other embodiments, a first control may be further displayed on the target video, and the first control may be an icon, a button, or the like for triggering display of the structured text corresponding to the target video, which is not limited herein.
Wherein the first trigger operation may include a trigger operation on the first control. Specifically, the first trigger operation may include, without limitation, a gesture control operation such as a click and a long press on the first control, a voice control operation, or an expression control operation, which is used to trigger a control function corresponding to the first control.
Optionally, the first control may be displayed on the target video in various scenarios such as a video recommendation stream, a video search result, and the like.
With continued reference to fig. 2, an "extract text" button 204 may be displayed on the video 203, and the user may click the "extract text" button 204 to trigger display of the structured text obtained based on the text information "video copy: XXXXXXXX" of the video 203.
With continued reference to fig. 3, an "extract text" button 304 may be displayed on the video 303, and the user may click the "extract text" button 304 to trigger display of the structured text obtained based on the text information "video copy: XXXXXXXX" of the video 303.
Returning to fig. 1, in an example, after S110, the text extraction method may further include: timing the display duration of the target video; and displaying the first control on the target video under the condition that the display duration reaches the preset duration.
Specifically, while displaying the target video containing text information to the user, the electronic device may time how long the target video has been displayed and determine whether the display duration has reached the preset duration. If it has, the first control is displayed on the target video; otherwise, it is not, which reduces disturbance to users who have no need to extract video text from the target video.
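A minimal sketch of this timing check follows; the threshold value and the function name are assumptions, since the disclosure only speaks of a preset duration.

```python
import time

PRESET_DURATION_S = 3.0  # illustrative value; the disclosure only speaks of a "preset duration"

def should_show_first_control(display_started_at: float) -> bool:
    """Return True once the target video has been displayed for the preset duration,
    so the "extract text" control is shown only after sustained viewing."""
    return time.monotonic() - display_started_at >= PRESET_DURATION_S

# Example: start timing when the target video is first displayed, then poll.
started = time.monotonic()
print(should_show_first_control(started))  # False immediately after display starts
```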
In another example, after S110, the text extraction method may further include: in the event that the target video has pre-generated structured text associated therewith, a first control is displayed on the target video.
Specifically, after presenting the target video containing text information to the user, the electronic device may detect whether the target video has pre-generated structured text associated with it. If so, the first control is displayed on the target video; otherwise, it is not, to avoid falsely presenting the user with a video text extraction capability that does not exist.
Alternatively, the electronic device may detect whether the server has sent structured text concurrently with sending the target video thereto, and if so, may determine that the target video has pre-generated structured text associated therewith, otherwise, may determine that the target video does not have pre-generated structured text associated therewith.
In this case, the electronic device may display the received structured text directly within the text presentation interface in response to the first trigger operation on the target video.
Optionally, the electronic device may detect whether the received target video carries an identifier indicating that structured text exists; if so, it may determine that the target video has pre-generated structured text associated with it, and otherwise that it does not.
In this case, in response to the first trigger operation on the target video, the electronic device may acquire from the server the structured text that the server generated in advance for the target video, and then display the acquired structured text in the text display interface.
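The two delivery modes described above (structured text sent with the video, or an identifier plus an on-demand fetch) can be sketched as follows; the dictionary keys and function names are assumptions made for illustration.

```python
from typing import Callable, List, Optional

def resolve_structured_text(video: dict,
                            fetch_from_server: Callable[[str], List[str]]) -> Optional[List[str]]:
    """Return the structured text associated with a target video, if any."""
    if "structured_text" in video:              # text was sent alongside the video
        return video["structured_text"]
    if video.get("has_structured_text"):        # identifier only: fetch on demand after the first trigger
        return fetch_from_server(video["video_id"])
    return None                                  # none associated: the first control is not shown

# Example with a stubbed server call.
video = {"video_id": "v1", "has_structured_text": True}
print(resolve_structured_text(video, lambda vid: ["paragraph 1", "paragraph 2"]))
```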
In the embodiment of the disclosure, the electronic device may open and display a text display interface in response to the first trigger operation, and a user may extract the video text of the target video in the text display interface.
The text presentation interface can be used to display the structured text. The structured text is text that has been structured in a preset manner and therefore has the structured format corresponding to that manner. The structured text can present each text component as a separate text paragraph, which makes the organization of the text clearer, lets the user quickly grasp the important information in the text, and allows the video text to be extracted quickly, component by component.
Alternatively, the structured text may be obtained by performing a preset structured processing on the target information.
The preset mode may include a line-based processing mode, a punctuation-based processing mode and a text element-based processing mode.
In some embodiments, the line-based processing mode may treat each line of video text displayed in the target video as one text paragraph in the structured text.
For example, when the target information is subtitles displayed line by line in the target video, the target information may be split into a plurality of text paragraphs using the line-based processing mode, so that each text paragraph corresponds to one line of subtitles in the target video.
Fig. 4 is a schematic diagram illustrating a text presentation interface provided by an embodiment of the present disclosure.
As shown in fig. 4, the electronic device 401 may display a text presentation interface 402, where six sections of text are displayed in the text presentation interface 402, each section of text may correspond to one text passage in the structured text, and each text passage may correspond to one line of subtitles in the target video.
In other embodiments, the punctuation-based processing mode may treat each span of video text preceding a punctuation mark displayed in the target video as one text paragraph in the structured text.
For example, when the target information is an article displayed in the target video, the target information may be split into a plurality of text paragraphs using the punctuation-based processing mode, so that each text paragraph corresponds to the span of video text before one punctuation mark in the article.
With continued reference to fig. 4, each of the six text segments displayed within the text presentation interface 402 may correspond to a text passage in the structured text, and each text passage may correspond to a segment of video text preceding a punctuation in an article displayed by the target video.
In still other embodiments, the text element-based processing mode may treat each span of video text displayed in the target video that represents a text element as one text paragraph in the structured text.
In one example, the text elements may be event elements such as time, place, people, and the like.
Fig. 5 is a schematic diagram illustrating another text presentation interface provided by an embodiment of the present disclosure.
As shown in fig. 5, the electronic device 501 may display a text presentation interface 502, where multiple lines of text are displayed within the text presentation interface 502, each line of text may correspond to a text passage in the structured text, and each text passage may correspond to an event element in a news event displayed by the target video.
In another example, the text elements may also be recipe elements such as ingredients, steps, and notes.
Fig. 6 is a schematic diagram illustrating yet another text presentation interface provided by an embodiment of the present disclosure.
As shown in fig. 6, the electronic device 601 may display a text presentation interface 602 in which multiple pieces of text are displayed; each piece of text may correspond to one text paragraph in the structured text, and each text paragraph may correspond to one recipe element of a recipe displayed in the target video.
In the embodiment of the present disclosure, the target information may be information extracted from the text information, which satisfies a preset quality evaluation condition and is used to characterize the target content.
The preset quality evaluation condition may be a preset condition for quality screening of the text, for example, the quality evaluation condition may include a security quality evaluation condition and a content quality evaluation condition.
Target information that meets the safety quality evaluation condition contains no sensitive words or other information unsuitable for dissemination; target information that meets the content quality evaluation condition is complete in content, fluent in wording, and easy for users to understand and use.
That the target information is used to represent the target content means that the target information contains no information irrelevant to the target content.
The target content may include at least one of a target emotion, video content and audio content corresponding to the target video.
The target emotion corresponding to the target video may include comparatively positive emotions such as happiness or well-wishing. Accordingly, when the target content is the target emotion, the target information is information that expresses such positive emotions, for example fine sentences or words of blessing. The video content corresponding to the target video may include the content presented by the video frames; accordingly, when the target content is the video content, the target information is information that describes what the video frames show, such as video subtitles related to the displayed content.
The audio content corresponding to the target video may include the content carried by the video's background audio. Accordingly, when the target content is the audio content, the target information is information that reflects what the background audio plays, such as video subtitles related to the content played by the background audio.
In the disclosed embodiment, before the electronic device presents the target video, the server may generate a structured text for the target video in advance.
Fig. 7 shows a flowchart of a method for generating a structured text according to an embodiment of the present disclosure.
As shown in fig. 7, the server may use Optical Character Recognition (OCR) to identify the video text in each video frame of the published target video. If no video text is recognized in any video frame, the server determines that the target video contains only image content and does not process it further; that is, the target video does not get the fast video text extraction capability. If video text is recognized, all of the recognized video text is taken as the text information of the target video. After obtaining the text information of the target video, the server may apply a preset text filtering method, for example feeding the text information into a pre-trained text filtering model, to filter out the information that is irrelevant to the target content and retain the information that represents the target content.
Fig. 8 is a schematic diagram illustrating irrelevant text information provided by an embodiment of the disclosure.
As shown in fig. 8, the target video may include a video frame 801 whose content is a screenshot of a social platform. The text information displayed in the video frame 801 may include target information 802, such as "video copy: XXXXXX", and irrelevant information 803, such as "remind who to see" and "who can see it". When processing the text information, the server can filter out the irrelevant information 803, which the user does not care about, using the preset text filtering method, so as to obtain the target information 802.
Returning to fig. 7, after the information representing the target content is obtained, it is scanned again and matched against a sensitive-word vocabulary. If the information representing the target content is found to contain sensitive words, the target video is not processed further; that is, it does not get the fast video text extraction capability. If no sensitive words are found, the information representing the target content is fed into a pre-trained content quality evaluation model. If the predicted value output by the model is below the preset prediction threshold, the target video is not processed further; otherwise, the information representing the target content is taken as the target information corresponding to the target video, which completes the quality evaluation of the target information. Since OCR can recognize the video text in a video frame line by line, the structured text produced by the line-based processing mode can be obtained from the above steps.
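The server-side processing of fig. 7 described so far can be summarized with the following sketch. The OCR output, text filtering model, and content quality evaluation model are represented by stand-in parameters, and the sensitive-word vocabulary and threshold values are illustrative assumptions.

```python
from typing import Callable, List, Optional

SENSITIVE_WORDS = {"forbidden_word"}   # illustrative placeholder vocabulary
QUALITY_THRESHOLD = 0.5                # illustrative; the disclosure only says "preset threshold"

def build_structured_text(frame_lines: List[List[str]],
                          is_relevant: Callable[[str], bool],
                          quality_score: Callable[[List[str]], float]) -> Optional[List[str]]:
    """Server-side pipeline sketch: OCR output -> filtering -> safety check -> quality check.

    frame_lines stands in for per-frame OCR results; is_relevant stands in for the text
    filtering model and quality_score for the content quality evaluation model.
    Returning None means the video does not get the fast text-extraction capability.
    """
    # 1. Collect the video text recognized in every frame; no text means image content only.
    lines = [line for frame in frame_lines for line in frame]
    if not lines:
        return None
    # 2. Filter out information unrelated to the target content.
    lines = [line for line in lines if is_relevant(line)]
    # 3. Safety check: stop if any sensitive word is matched.
    if any(word in line for line in lines for word in SENSITIVE_WORDS):
        return None
    # 4. Content quality check against the preset threshold.
    if quality_score(lines) < QUALITY_THRESHOLD:
        return None
    # Each OCR line becomes one paragraph: line-based structured text.
    return lines

# Example with trivial stand-ins for the models.
frames = [["video copy: XXXXXXXX"], ["remind who to see"]]
print(build_structured_text(frames, lambda s: s.startswith("video copy"), lambda ls: 1.0))
```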
Further, the server may segment this structured text by punctuation to obtain the structured text of the punctuation-based processing mode, and may segment it by text element to obtain the structured text of the text element-based processing mode.
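A rough sketch of these two further segmentation passes, under the assumption that simple keyword matching stands in for whatever element detection the server actually uses:

```python
import re
from typing import Dict, List

def split_by_punctuation(line_paragraphs: List[str]) -> List[str]:
    """Punctuation mode: each span of video text before a punctuation mark becomes a paragraph."""
    pieces: List[str] = []
    for paragraph in line_paragraphs:
        pieces.extend(p.strip() for p in re.split(r"[,.;!?，。；！？]", paragraph) if p.strip())
    return pieces

def split_by_elements(line_paragraphs: List[str],
                      element_keywords: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Text-element mode: group paragraphs under elements such as time/place/person
    for a news event, or ingredients/steps/notes for a recipe."""
    grouped: Dict[str, List[str]] = {name: [] for name in element_keywords}
    for paragraph in line_paragraphs:
        for name, keywords in element_keywords.items():
            if any(k in paragraph for k in keywords):
                grouped[name].append(paragraph)
    return grouped

# Illustrative usage with assumed keyword lists for a recipe.
lines = ["Ingredients: flour, two eggs.", "Step 1: mix well.", "Note: rest for 10 minutes."]
print(split_by_punctuation(lines))
print(split_by_elements(lines, {"ingredients": ["Ingredients"], "steps": ["Step"], "notes": ["Note"]}))
```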
Therefore, in the embodiment of the disclosure, the OCR technology can be utilized to identify the video texts with different fonts, different types and different positions in the target video, and filter out other character information irrelevant to the target content in the target video, so as to provide the information to the user in the form of the structured text.
In the embodiments of the present disclosure, after generating the structured text, the server may store it in association with the target video. Then, when the electronic device needs to present the target video to a user, the server either sends the structured text to the electronic device together with the target video, or sends the electronic device the target video carrying an identifier indicating that structured text exists.
Returning to fig. 1, S130 is described next: in response to a selection operation on the structured text, the target text selected by the selection operation is extracted from the structured text, so as to publish the target text.
In the embodiments of the present disclosure, the electronic device may detect a selection operation on the structured text in real time. When the user wants to extract a target text of interest from the structured text, the user may input a selection operation selecting the target text, so that the electronic device extracts the target text selected by the selection operation from the structured text in response, in order to publish the target text.
Forms of publishing the target text include, but are not limited to, pasting the target text into a target display area, forwarding the target text with one tap, and generating a picture from the target text.
Optionally, the structured text displayed on the text presentation interface is in a selectable state, and the selection operation may include a gesture control operation such as a click, a double click, a long press, etc., a voice control operation, or an expression control operation, etc., for selecting the target text, which is not limited herein.
In the embodiments of the present disclosure, while a target video including text information is displayed, a first trigger operation on the target video can be detected in real time. When the first trigger operation is detected, structured text obtained based on the text information is displayed on the text display interface in response to it, and a selection operation on the structured text is then detected in real time. When the selection operation is detected, the target text selected by it is extracted from the structured text in response, so that the target text can be published.
In another embodiment of the present disclosure, after S120, the text extraction method may further include: in response to a third trigger operation on the structured text, updating the text content displayed on the text display interface according to the operation direction corresponding to the third trigger operation and the text display order of the structured text.
Optionally, the third trigger operation may include, without limitation, a gesture control operation such as a slide on the structured text, a voice control operation, or an expression control operation.
In the embodiments of the present disclosure, when the structured text is long, the text display interface may not be able to show all of its text content. In that case, the electronic device may, in response to the first trigger operation on the target video, display part of the text content of the structured text in the text display interface, following the text display order from first to last. While the text display interface is displayed, the electronic device may also detect a third trigger operation on the structured text in real time. When the user wants to view the text content that has not yet been displayed, the user may input the third trigger operation on the structured text, so that the electronic device, in response, obtains and displays the text content adjacent to the currently displayed portion in the operation direction corresponding to the third trigger operation. The displayed text content is thereby updated, allowing the user to view all of the text content of the structured text.
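A minimal sketch of this paging behavior, assuming the displayed window is described by an offset into the paragraph list; the function name and parameters are illustrative assumptions.

```python
from typing import List, Tuple

def page_structured_text(structured_text: List[str], offset: int,
                         window: int, direction: int) -> Tuple[int, List[str]]:
    """Return the new offset and the paragraphs to show after a third trigger operation.

    direction is +1 when the operation scrolls toward later paragraphs in the text
    display order and -1 when it scrolls back toward earlier ones.
    """
    max_offset = max(len(structured_text) - window, 0)
    new_offset = min(max(offset + direction * window, 0), max_offset)
    return new_offset, structured_text[new_offset:new_offset + window]

# Example: a window of 3 paragraphs advancing through a 7-paragraph structured text.
offset, shown = page_structured_text([f"paragraph {i}" for i in range(7)], offset=0, window=3, direction=+1)
print(offset, shown)  # 3 ['paragraph 3', 'paragraph 4', 'paragraph 5']
```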
In yet another embodiment of the present disclosure, the structured text may include a plurality of text paragraphs, and the plurality of text paragraphs may be segmented according to the preset manner.
Accordingly, S120 may specifically include: and displaying a plurality of text paragraphs on the text display interface in a segmented manner according to a preset text format.
In the embodiment of the present disclosure, the electronic device may determine a preset text format of each text passage of the structured text, and display each text passage in a segmented manner according to the preset text format of each text passage in the text presentation interface.
The preset text formats of the text paragraphs of the structured text may be the same or different, and are not limited herein.
Optionally, the preset text format may include a text paragraph format, a text font format, and the like preset as needed, and may also include a text paragraph format, a text font format, and the like consistent with that in the target video, which is not limited herein.
With continued reference to fig. 4, the electronic device 401 may display a text presentation interface 402, where the text presentation interface 402 may be used to display text contained in a video, for example, six pieces of text may be displayed in a left-justified manner in segments within the text presentation interface 402, and each piece of text may correspond to a text paragraph in the structured text.
In some embodiments of the present disclosure, S120 may further include: and under the condition that a first text paragraph meeting a preset highlighting condition exists in the plurality of text paragraphs, displaying the first text paragraph on the text display interface in a segmented manner according to a target display mode corresponding to the highlighting condition met by the first text paragraph.
In this disclosure, the server may preset different highlighting conditions and detect whether each text paragraph meets any of them. If a paragraph meets a highlighting condition, the server adds to that paragraph the highlight identifier corresponding to the condition it meets; if the paragraph meets none of the highlighting conditions, no highlight identifier is added to it.
Specifically, the electronic device may detect whether each text paragraph carries a highlight identifier. If a text paragraph carries one, it is a first text paragraph that satisfies a preset highlighting condition; that is, a first text paragraph satisfying a preset highlighting condition exists among the text paragraphs. Display modes corresponding to the different highlight identifiers are stored in the electronic device in advance, so after detecting the highlight identifier carried by the first text paragraph, the electronic device can determine the target display mode corresponding to that identifier and display the first text paragraph on the text display interface, in a segmented manner, according to the target display mode.
The highlighting conditions may include, without limitation, that a text paragraph contains the search keyword, that the text paragraph has the highest popularity, or that the text paragraph has the highest repetition rate within the structured text.
Optionally, the popularity of a text paragraph may be determined from its historical interaction data, such as the number of times it has historically been extracted or commented on.
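A sketch of how highlight identifiers might be assigned under these conditions; the identifier strings, the popularity formula, and the interaction fields are assumptions for illustration, not the disclosed scheme.

```python
from typing import Dict, List, Optional

def popularity(interactions: Dict[str, int]) -> int:
    """Illustrative popularity score from historical interaction data."""
    return interactions.get("extractions", 0) + interactions.get("comments", 0)

def highlight_identifier(paragraph: str, interactions: Dict[str, int],
                         search_keyword: Optional[str], max_popularity: int) -> Optional[str]:
    """Return the highlight identifier carried by a paragraph, or None for a
    second text paragraph that is shown in the default (non-highlighted) way."""
    if search_keyword and search_keyword in paragraph:
        return "contains_keyword"
    if max_popularity > 0 and popularity(interactions) == max_popularity:
        return "most_popular"        # e.g. rendered on the client as a "hit" label
    return None

# Example: the paragraph extracted and commented on most often gets the "most_popular" identifier.
stats = [{"extractions": 10, "comments": 5}, {"extractions": 1, "comments": 0}]
top = max(popularity(s) for s in stats)
print([highlight_identifier(p, s, None, top) for p, s in zip(["good sentence", "plain line"], stats)])
```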
Further, the display modes corresponding to the respective highlighting identifications are respectively used for prompting the user that the text passage satisfies the highlighting condition corresponding to the highlighting identification.
In one example, the display mode corresponding to a highlight identifier may be displaying the text paragraph in a text format associated with that identifier.
In another example, the display mode corresponding to a highlight identifier may be adding a label corresponding to that identifier to the text paragraph.
With continued reference to FIG. 4, a "hit" label 403 may be added to the most frequently extracted text paragraph in the structured text, that is, the paragraph selected by the largest number of users.
In other embodiments of the present disclosure, S120 may further include: and under the condition that a second text paragraph which does not meet the highlighting condition exists in the text paragraphs, the second text paragraph is displayed on the text display interface in a segmented mode according to a first preset display mode.
Specifically, the electronic device may detect whether each text paragraph carries a highlight identifier. If a text paragraph carries none, it is a second text paragraph that satisfies no preset highlighting condition; that is, a second text paragraph that does not satisfy the highlighting conditions exists among the text paragraphs. The electronic device may display the second text paragraph on the text display interface, in a segmented manner, according to the first preset display mode.
The first preset display mode may be a non-highlighted display mode.
Therefore, in the embodiment of the present disclosure, the text paragraphs can be displayed according to the display mode corresponding to the highlight condition that the text paragraphs satisfy, so as to improve the screening efficiency of the user.
In the disclosed embodiments, the selection operation may optionally be used to select a target passage of text among a plurality of passages of text, such that the target passage of text may form the target text.
Further, the selection operation may include, without limitation, a gesture control operation such as a click, a double click, a long press, etc., a voice control operation, or an expression control operation, etc., for selecting the target text.
Optionally, in these embodiments, after responding to the selection operation on the structured text, the text extraction method may further include: and highlighting the target text paragraph according to a second preset display mode.
The second preset display mode may be any display mode for representing that the text passage is in the selected state, and is not limited herein. For example, the second preset display mode may add a selected mark to a text paragraph.
Fig. 9 is a schematic diagram illustrating a further text presentation interface provided by an embodiment of the present disclosure.
As shown in fig. 9, the electronic device 901 may display a text presentation interface 902, which may be used to display the text contained in a video; for example, six pieces of text may be displayed in a left-justified, segmented manner within the text presentation interface 902, and each piece of text may correspond to one text paragraph in the structured text. When the user wants to extract target text paragraphs of interest from the structured text, such as the first and second pieces of text, the user can click on each of them. After detecting that the user has clicked a piece of text, the electronic device 901 may add a check mark 903, such as a "√" symbol, to it; this applies to both the first and the second piece of text.
Optionally, in these embodiments, a second control may be further displayed in the text presentation interface, and the second control may be an icon, a button, or the like for triggering the copying of the target text, which is not limited herein.
Further, after the target text selected by the selection operation is extracted from the structured text, the text extraction method may further include: in response to a second trigger operation on the second control, copying the target text in a segmented manner according to the preset text format; and in response to a paste operation on the target text, pasting the target text in a segmented manner in a target display area according to the preset text format, so as to publish the target text.
In the embodiment of the disclosure, after the user finishes selecting the target text, a second trigger operation on the second control may be input, so that the electronic device may respond to the second trigger operation and copy the target text in a segmented manner according to the preset text format, so as to implement storage of the target text.
Specifically, the second trigger operation may include, without limitation, a gesture control operation such as a click and a long press on the second control, a voice control operation, or an expression control operation, which is used to trigger a control function corresponding to the second control.
Therefore, in the embodiment of the disclosure, when the electronic device copies the target text, the text format of the target text can be kept unchanged.
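A minimal sketch of segment-by-segment copy and paste that preserves each paragraph's preset text format; the Paragraph structure and the alignment field are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Paragraph:
    text: str
    alignment: str = "left"   # stand-in for the preset text format carried with each paragraph

def copy_target_text(paragraphs: List[Paragraph], selected: List[int]) -> List[Paragraph]:
    """Second trigger operation: copy the selected paragraphs segment by segment,
    keeping each paragraph's preset text format unchanged."""
    return [paragraphs[i] for i in selected]

def paste_target_text(clipboard: List[Paragraph]) -> str:
    """Paste operation: reproduce the copied paragraphs segment by segment in the
    target display area (for example, a social platform's text input box)."""
    return "\n".join(p.text for p in clipboard)

# Example: the first two paragraphs are copied and later pasted with their segmentation intact.
doc = [Paragraph("First selected segment."), Paragraph("Second selected segment."), Paragraph("Not selected.")]
print(paste_target_text(copy_target_text(doc, [0, 1])))
```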
With continued reference to fig. 9, a "copy document" button 904 may also be displayed in the text display interface 902, and the user may click the "copy document" button 904 to cause the electronic device 901 to copy the first and second text segments in a left-justified manner.
Fig. 10 is a schematic diagram illustrating a further text presentation interface provided by an embodiment of the present disclosure.
As shown in fig. 10, after the electronic device 1001 completes the copying of the first text segment and the second text segment, a prompt message 1003 such as "copy successful" may be displayed in the text presentation interface 1002.
In the embodiment of the disclosure, after the electronic device finishes copying the target text, if the user wants to publish the target text, the user may input a paste operation on the target text in the target display area, so that the electronic device responds to the paste operation and pastes the target text in the target display area in a segmented manner according to a preset text format, so as to publish the target text.
Therefore, in the embodiment of the disclosure, when the electronic device pastes the target text, the text format of the target text can be kept unchanged.
The target display area can be a text input box of a social platform or a chat interface, so that the user can publish the target text on the social platform or to a friend.
In summary, the text extraction method provided by the embodiment of the present disclosure may generate a structured video text corresponding to text information included in a video based on an OCR technology, so as to provide an entry for a user to quickly extract the video text, and support the user to perform personalized selection and one-key copy in the video text, so that the user may quickly obtain a required text.
The embodiment of the present disclosure further provides a text extraction apparatus, which is described below with reference to fig. 11.
Fig. 11 shows a schematic structural diagram of a text extraction apparatus provided in an embodiment of the present disclosure.
In the embodiment of the present disclosure, the text extraction device may be an electronic device. The electronic devices may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs, PADs, PMPs, in-vehicle terminals (e.g., car navigation terminals), wearable devices, and the like, and fixed terminals such as digital TVs, desktop computers, smart home devices, and the like, among others.
As shown in fig. 11, the text extraction apparatus 1100 may include a video presentation unit 1110, a first display unit 1120, and a text extraction unit 1130.
The video presentation unit 1110 may be configured to present a target video including text information.
The first display unit 1120 may be configured to display a structured text on the text presentation interface in response to a first trigger operation on the target video, where the structured text is obtained based on text information.
The text extraction unit 1130 may be configured to, in response to a selection operation on the structured text, extract the target text selected by the selection operation from the structured text, so as to publish the target text.
In the embodiments of the present disclosure, while a target video including text information is displayed, a first trigger operation on the target video can be detected in real time. When the first trigger operation is detected, structured text obtained based on the text information is displayed on the text display interface in response to it, and a selection operation on the structured text is then detected in real time. When the selection operation is detected, the target text selected by it is extracted from the structured text in response, so that the target text can be published.
In some embodiments of the present disclosure, the structured text may be obtained by performing a structured processing in a preset manner on target information, where the target information may be information extracted from the text information, meeting a preset quality assessment condition, and used for representing target content, and the target content may include at least one of a target emotion, video content, and audio content corresponding to a target video.
In some embodiments of the present disclosure, the structured text may include a plurality of text paragraphs.
The first display unit 1120 may be further configured to display a plurality of text paragraphs on the text presentation interface in a segmented manner according to a preset text format.
In some embodiments of the present disclosure, the first display unit 1120 may include a first sub display unit and a second sub display unit.
The first sub-display unit may be configured to, when a first text paragraph satisfying a preset highlighting condition exists in the plurality of text paragraphs, display the first text paragraph on the text presentation interface in a segmented manner according to a target display manner corresponding to the highlighting condition satisfied by the first text paragraph.
The second sub-display unit may be configured to, in a case where there is a second text paragraph that does not satisfy the highlight condition among the plurality of text paragraphs, segmentally display the second text paragraph on the text presentation interface in a first preset display manner.
In some embodiments of the present disclosure, the selection operation may be used to select a target text paragraph among a plurality of text paragraphs, the target text paragraph may form a target text, and a second control may be displayed within the text presentation interface.
Accordingly, the text extraction apparatus 1100 may further include a text copy unit and a text paste unit.
The text copying unit can be configured to respond to a second trigger operation on the second control after extracting the target text selected by the selection operation from the structured text, and copy the target text in a segmented manner according to the preset text format.
The text pasting unit may be configured to paste the target text in a segmented manner in the target display area according to a preset text format in response to a pasting operation of the target text, so as to publish the target text.
In some embodiments of the present disclosure, the text extraction apparatus 1100 may further include a second display unit, and the second display unit may be configured to highlight the target text paragraph in a second preset display manner after responding to the selection operation on the structured text.
In some embodiments of the present disclosure, the target video may belong to a target video category, which may include a video category that satisfies the user's text extraction needs.
In some embodiments of the present disclosure, a first control may be displayed on the target video, and the first trigger operation may include a trigger operation on the first control.
Accordingly, the text extraction apparatus 1100 may further include a display timing unit and a third display unit.
The display timing unit may be configured to time a display duration of the target video after the target video is presented.
The third display unit may be configured to display the first control on the target video when the display duration reaches a preset duration.
In some embodiments of the present disclosure, the video presenting unit 1110 may be further configured to display the video content of the target video in the playing interface of the target video; or in response to receiving the search keyword, displaying video preview information of a target video on a search result interface, wherein the target video belongs to videos obtained through searching based on the search keyword.
In some embodiments of the present disclosure, the text extraction apparatus 1100 may further include a fourth display unit, and the fourth display unit may be configured to, after the structured text is displayed, respond to a third trigger operation on the structured text, and update the text content displayed on the text display interface according to the operation direction corresponding to the third trigger operation and the text display order of the structured text.
It should be noted that the text extraction apparatus 1100 shown in fig. 11 may perform each step in the method embodiments shown in fig. 1 to 10, and implement each process and effect in the method embodiments shown in fig. 1 to 10, which are not described herein again.
Embodiments of the present disclosure also provide a text extraction device that may include a processor and a memory, where the memory may be used to store executable instructions. The processor may be configured to read the executable instructions from the memory and execute them to implement the text extraction method in the above embodiments.
Fig. 12 shows a schematic structural diagram of a text extraction device 1200 suitable for implementing embodiments of the present disclosure.
The text extraction device 1200 in the embodiments of the present disclosure may be an electronic device. The electronic device may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs, PADs, PMPs, in-vehicle terminals (e.g., car navigation terminals), wearable devices, and the like, and fixed terminals such as digital TVs, desktop computers, smart home devices, and the like.
It should be noted that the text extraction device 1200 shown in fig. 12 is only an example, and should not bring any limitation to the functions and the use range of the embodiment of the present disclosure.
As shown in fig. 12, the text extraction device 1200 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 1201, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1202 or a program loaded from a storage device 1208 into a Random Access Memory (RAM) 1203. In the RAM 1203, various programs and data necessary for the operation of the text extraction device 1200 are also stored. The processing device 1201, the ROM 1202, and the RAM 1203 are connected to each other by a bus 1204. An input/output (I/O) interface 1205 is also connected to the bus 1204.
Generally, the following devices may be connected to the I/O interface 1205: input devices 1206 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, or the like; output devices 1207 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, or the like; storage devices 1208 including, for example, magnetic tape, hard disk, etc.; and a communication device 1209. The communication device 1209 may allow the text extraction device 1200 to perform wireless or wired communication with other devices to exchange data. While fig. 12 illustrates a text extraction device 1200 having various devices, it is to be understood that not all illustrated devices are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
The embodiment of the present disclosure also provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the processor is enabled to implement the text extraction method in the above embodiment.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 1209, or installed from the storage device 1208, or installed from the ROM 1202. The computer program, when executed by the processing device 1201, performs the above-described functions defined in the text extraction method of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the client and the server may communicate using any currently known or future developed network protocol, such as HTTP, and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be included in the text extraction device; or may exist separately without being assembled into the text extraction device.
The computer-readable medium carries one or more programs which, when executed by the text extraction device, cause the text extraction device to perform:
displaying a target video, wherein the target video comprises text information; responding to a first trigger operation of the target video, and displaying a structured text on a text display interface, wherein the structured text is obtained based on text information; and responding to the selection operation of the structured text, and extracting the target text selected by the selection operation from the structured text so as to publish the target text.
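For orientation only, these three programmed steps can be condensed into the short Python sketch below; the function, its parameters, and the sample data are hypothetical stand-ins for the video presentation, structured-text display, and selection operations described above.

def run_text_extraction_flow(target_video: dict,
                             structured_text: list[str],
                             selected_indexes: list[int]) -> str:
    """Illustrative walk through the three steps: display the target video,
    display the structured text on a first trigger operation, and extract the
    target text selected by a selection operation so it can be published."""
    # Step 1: display the target video (stubbed as a log line here).
    print(f"displaying video {target_video['id']}")

    # Step 2: first trigger operation -> display the structured text.
    for paragraph in structured_text:
        print("paragraph:", paragraph)

    # Step 3: selection operation -> extract the target text for publishing.
    return "\n\n".join(structured_text[i] for i in selected_indexes)

print(run_text_extraction_flow({"id": "v1"},
                               ["Ingredients: flour, eggs.", "Step 1: mix."],
                               [1]))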
In embodiments of the present disclosure, computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or by hardware. The name of a unit does not, in some cases, constitute a limitation on the unit itself.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other embodiments formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure. For example, technical solutions formed by replacing the above features with (but not limited to) features disclosed in this disclosure that have similar functions are also encompassed.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (13)

1. A text extraction method, comprising:
displaying a target video, wherein the target video comprises text information;
responding to a first trigger operation of the target video, and displaying a structured text on a text display interface, wherein the structured text is obtained based on the text information;
and responding to the selection operation of the structured text, and extracting the target text selected by the selection operation from the structured text so as to publish the target text.
2. The method according to claim 1, wherein the structured text is obtained by performing structured processing on target information in a preset manner, the target information being information that is extracted from the text information, satisfies a preset quality assessment condition, and is used for representing target content, and the target content including at least one of a target emotion, video content, and audio content corresponding to the target video.
3. The method of claim 1, wherein the structured text comprises a plurality of text paragraphs;
wherein the displaying the structured text on the text display interface comprises:
and displaying the plurality of text paragraphs on the text display interface in a segmented manner according to a preset text format.
4. The method of claim 3, wherein the displaying the plurality of text paragraphs on the text display interface in a segmented manner comprises:
under the condition that a first text paragraph meeting a preset highlighting condition exists in the text paragraphs, displaying the first text paragraph on the text display interface in a segmented manner according to a target display mode corresponding to the highlighting condition met by the first text paragraph;
and under the condition that a second text paragraph which does not meet the highlight condition exists in the text paragraphs, displaying the second text paragraph on the text display interface in a segmented manner according to a first preset display mode.
5. The method of claim 3, wherein the selecting operation is used to select a target text paragraph among the plurality of text paragraphs, wherein the target text paragraph forms the target text, and wherein a second control is further displayed within the text display interface;
wherein after the extracting the target text selected by the selection operation from the structured text, the method further comprises:
responding to a second trigger operation of the second control, and copying the target text in a segmented manner according to the preset text format;
and responding to the pasting operation of the target text, and pasting the target text in a target display area in a segmentation mode according to the preset text format so as to publish the target text.
6. The method of claim 5, wherein after the responding to the selection operation on the structured text, the method further comprises:
and highlighting the target text paragraph according to a second preset display mode.
7. The method of claim 1, wherein the target video belongs to a target video category, and wherein the target video category comprises a video category that satisfies a user's text extraction requirements.
8. The method according to claim 1, wherein a first control is displayed on the target video, and the first trigger operation comprises a trigger operation on the first control;
wherein after the presenting the target video, the method further comprises:
timing the display duration of the target video;
and displaying the first control on the target video under the condition that the display duration reaches a preset duration.
9. The method of claim 1, wherein said presenting the target video comprises:
displaying the video content of the target video in a playing interface of the target video;
or,
in response to receiving a search keyword, displaying the video preview information of the target video on a search result interface, wherein the target video belongs to the videos obtained by searching based on the search keyword.
10. The method of claim 1, wherein after the displaying the structured text on the text presentation interface, the method further comprises:
and responding to a third trigger operation on the structured text, and updating the text content displayed on the text display interface according to the operation direction corresponding to the third trigger operation and the text display sequence of the structured text.
11. A text extraction apparatus characterized by comprising:
the video display unit is configured to display a target video, and the target video comprises text information;
the first display unit is configured to respond to a first trigger operation on the target video and display a structured text on a text display interface, wherein the structured text is obtained based on the text information;
and the text extraction unit is configured to respond to the selection operation of the structured text, and extract the target text selected by the selection operation from the structured text so as to publish the target text.
12. A text extraction device characterized by comprising:
a processor;
a memory for storing executable instructions;
wherein the processor is configured to read the executable instructions from the memory and execute the executable instructions to implement the text extraction method of any of claims 1-10.
13. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, causes the processor to carry out the text extraction method of any of the preceding claims 1-10.
CN202110910385.5A 2021-08-09 2021-08-09 Text extraction method, device, equipment and medium Pending CN113552984A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110910385.5A CN113552984A (en) 2021-08-09 2021-08-09 Text extraction method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN113552984A true CN113552984A (en) 2021-10-26

Family

ID=78133741

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110910385.5A Pending CN113552984A (en) 2021-08-09 2021-08-09 Text extraction method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN113552984A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103257997A (en) * 2012-01-09 2013-08-21 微软公司 Enabling copy and paste functionality for videos and other media content
WO2019020061A1 (en) * 2017-07-26 2019-01-31 腾讯科技(深圳)有限公司 Video dialogue processing method, video client, video server, and computer readable storage medium
US20200242296A1 (en) * 2019-04-11 2020-07-30 Beijing Dajia Internet Information Technology Co., Ltd. Text description generating method and device, mobile terminal and storage medium
US20210160582A1 (en) * 2019-11-21 2021-05-27 Shanghai Hode Information Technology Co., Ltd. Method and system of displaying subtitles, computing device, and readable storage medium
CN111970257A (en) * 2020-08-04 2020-11-20 腾讯科技(深圳)有限公司 Manuscript display control method and device, electronic equipment and storage medium
CN112287916A (en) * 2020-12-28 2021-01-29 平安国际智慧城市科技股份有限公司 Video image text courseware text extraction method, device, equipment and medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117235244A (en) * 2023-11-16 2023-12-15 江西师范大学 Online course learning emotion experience evaluation system based on barrage emotion word classification
CN117235244B (en) * 2023-11-16 2024-02-20 江西师范大学 Online course learning emotion experience evaluation system based on barrage emotion word classification

Similar Documents

Publication Publication Date Title
CN111970577B (en) Subtitle editing method and device and electronic equipment
CN104298429B (en) A kind of information displaying method and input method system based on input
CN108847214B (en) Voice processing method, client, device, terminal, server and storage medium
WO2016192509A1 (en) Information processing method and device
CN108227950B (en) Input method and device
CN113010698B (en) Multimedia interaction method, information interaction method, device, equipment and medium
CN113259740A (en) Multimedia processing method, device, equipment and medium
CN113010704B (en) Interaction method, device, equipment and medium for conference summary
CN113783997B (en) Video publishing method and device, electronic equipment and storage medium
CN107515870B (en) Searching method and device and searching device
CN112163102B (en) Search content matching method and device, electronic equipment and storage medium
CN108470057B (en) Generating and pushing method, device, terminal, server and medium of integrated information
CN112291614A (en) Video generation method and device
CN113886612A (en) Multimedia browsing method, device, equipment and medium
CN112929746A (en) Video generation method and device, storage medium and electronic equipment
CN112380365A (en) Multimedia subtitle interaction method, device, equipment and medium
EP4124025A1 (en) Interaction information processing method and apparatus, electronic device and storage medium
CN113014854B (en) Method, device, equipment and medium for generating interactive record
CN113552984A (en) Text extraction method, device, equipment and medium
CN110381356B (en) Audio and video generation method and device, electronic equipment and readable medium
CN111767259A (en) Content sharing method and device, readable medium and electronic equipment
CN114880458A (en) Book recommendation information generation method, device, equipment and medium
CN115379136A (en) Special effect prop processing method and device, electronic equipment and storage medium
CN114501042A (en) Cross-border live broadcast processing method and electronic equipment
CN113486212A (en) Search recommendation information generation and display method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination