CN113536037A - Video-based information query method, device, equipment and storage medium - Google Patents
- Publication number
- CN113536037A CN113536037A CN202010324145.2A CN202010324145A CN113536037A CN 113536037 A CN113536037 A CN 113536037A CN 202010324145 A CN202010324145 A CN 202010324145A CN 113536037 A CN113536037 A CN 113536037A
- Authority
- CN
- China
- Prior art keywords
- text
- information
- target video
- video picture
- recognition result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7844—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
Abstract
The application discloses a video-based information query method, device, equipment and storage medium, and relates to the technical field of intelligent search. The specific implementation scheme is as follows: a target video picture is acquired and a text recognition result of the target video picture is determined, the text recognition result comprising the text information in the target video picture; an information query result corresponding to the text information is then acquired based on the text recognition result, and finally the information query result is displayed. In this technical scheme, the information query result corresponding to the text information is queried based on the text recognition result of the target video picture; the user neither needs to switch the terminal interface nor manually input the text to be queried, so that the query efficiency and the user experience of video text search are improved.
Description
Technical Field
The embodiment of the application relates to the technical field of image processing, in particular to a video-based information query method, device, equipment and storage medium, which can be used in the technical field of intelligent search.
Background
In daily life, a user may be interested in some characters (e.g., signposts, billboards, book names, subtitles, etc.) in a video being played during the process of watching the video through a terminal device such as a mobile phone, a tablet computer, etc., and at this time, in order to know the related knowledge of the interested characters, it is usually necessary to query the related information of the characters by means of a search engine.
In the prior art, the main method for querying characters in a video is as follows: the user memorizes the characters of interest to be searched, controls the terminal device to switch from the video player page to a search page, inputs the characters to be searched into the search page to obtain their related information, and finally switches back to the video player page.
However, the above scheme requires the user to control the terminal device to perform page switching and jumping, the operation process is cumbersome, and the characters to be searched must be manually input by the user, resulting in low search efficiency and poor user experience.
Disclosure of Invention
The embodiment of the application provides a video-based information query method, device, equipment and storage medium, which are used for solving the problem of low user search efficiency in the existing video text search query.
In a first aspect, an embodiment of the present application provides a method for querying information of a video, including:
acquiring a target video picture;
determining a text recognition result of the target video picture, wherein the text recognition result comprises: text information in the target video picture;
acquiring an information query result corresponding to the text information based on the text recognition result;
and displaying the information query result.
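The four steps of the first aspect can be sketched as a single terminal-side function. This is an illustrative skeleton only; the callback names (`recognize_text`, `search`, `display`) are hypothetical and stand in for the recognition, retrieval, and display components described later:

```python
def query_video_text(frame, recognize_text, search, display):
    """Terminal-side flow: determine the text recognition result of the
    target video picture, query the recognized text, show the result."""
    recognition_result = recognize_text(frame)     # determine text recognition result
    text_information = recognition_result["text"]  # text information in the picture
    query_result = search(text_information)        # acquire information query result
    display(query_result)                          # display the result to the user
    return query_result
```

The user never leaves the player: the frame goes in, the query result comes back and is shown in place.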
In a second aspect, an embodiment of the present application provides a method for querying information of a video, including:
receiving a text query request from a terminal device, the text query request including text information, wherein the text information is text obtained by performing text recognition on a target video picture;
acquiring an information query result corresponding to the text information;
and sending the information query result to the terminal device.
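The second aspect is the server-side counterpart, which can be sketched as a request handler. Again purely illustrative; the request field name and the `search_engine` callback are assumptions, not part of the patent:

```python
def handle_text_query(request, search_engine):
    """Server-side flow: receive a text query request carrying text
    recognized from a target video picture, look the text up, and
    return the information query result to the terminal device."""
    text_information = request["text_information"]
    return {"query_result": search_engine(text_information)}
```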
In a third aspect, an embodiment of the present application provides an information query apparatus based on a video, including: the device comprises an acquisition module, a processing module and a display module;
the acquisition module is used for acquiring a target video picture;
the processing module is configured to determine a text recognition result of the target video picture, wherein the text recognition result includes text information in the target video picture, and to acquire an information query result corresponding to the text information based on the text recognition result;
and the display module is used for displaying the information query result.
In a fourth aspect, an embodiment of the present application provides a video-based information query apparatus, including: the device comprises a receiving module, a processing module and a sending module;
the receiving module is configured to receive a text query request from a terminal device, where the text query request includes text information, and the text information is text obtained by performing text recognition on a target video picture;
the processing module is used for acquiring an information query result corresponding to the text information;
and the sending module is used for sending the information query result to the terminal device.
In a fifth aspect, an embodiment of the present application provides a terminal device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect.
In a sixth aspect, an embodiment of the present application provides a server, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the second aspect.
In a seventh aspect, the present application provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of the first aspect.
In an eighth aspect, embodiments of the present application provide a non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of the second aspect.
In a ninth aspect, an embodiment of the present application provides a video-based information query method, including:
acquiring a text recognition result of a target video picture, wherein the text recognition result comprises: text information in the target video picture;
and determining an information query result corresponding to the text information based on the text recognition result.
According to the video-based information query method, device, equipment and storage medium provided above, a target video picture is acquired and its text recognition result is determined, the text recognition result comprising the text information in the target video picture; an information query result corresponding to the text information is then acquired based on the text recognition result, and finally the information query result is displayed. In this technical scheme, the information query result corresponding to the text to be queried is obtained based on the text recognition result of the target video picture; the user neither needs to switch the terminal interface nor manually input the text to be queried, so that the query efficiency and the user experience of video text search are improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
fig. 1 is a schematic view of an application scenario of a video-based information query method provided in the present application;
fig. 2 is a schematic flowchart of a video-based information query method according to a first embodiment of the present application;
FIG. 3 is an interactive schematic diagram of a video-based information query method according to a second embodiment of the present application;
fig. 4 is a schematic diagram illustrating that a terminal device presents a target video frame through a current playing interface in an embodiment of the present application;
fig. 5 is an interaction diagram of a video-based information query method according to a third embodiment of the present application;
fig. 6 is a schematic diagram illustrating a processed target video frame presented through a display interface of a terminal device in an embodiment of the present application;
FIG. 7 is a diagram illustrating the terminal device selecting a text to be queried based on a text selection instruction of a user;
fig. 8 is a schematic flowchart of a video-based information query method according to a fourth embodiment of the present application;
FIG. 9 is a schematic interface diagram illustrating an information query result in a pop-up window form in the embodiment of the present application;
FIG. 10 is a block diagram of an information interaction of a video-based information query method according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of a video-based information query apparatus according to a first embodiment of the present application;
fig. 12 is a schematic structural diagram of a video-based information query apparatus according to a second embodiment of the present application;
fig. 13 is a block diagram of a terminal device for implementing a video-based information query method provided by an embodiment of the present application;
fig. 14 is a block diagram of a server for implementing the video-based information query method provided by the embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Before the technical solution of the present application is introduced, the terms referred to in the embodiments of the present application are explained first:
OCR character recognition
Optical Character Recognition (OCR) refers to the process of analyzing and recognizing an image file containing text to obtain the text and layout information; that is, the characters in the image are recognized and returned in text form. A typical OCR solution can be divided into two parts: character detection and character recognition. Character detection locates the position, range and layout of the text in the image, and usually includes layout analysis, text-line detection, and the like; it mainly determines which positions of the image contain text and how large the range of each text region is. Character recognition then recognizes the text content on the basis of the detection result, converting the text in the image into text information; it mainly determines what each detected character is.
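The two-part pipeline described above can be sketched structurally in Python. This is a purely illustrative skeleton — `TextRegion`, `detect`, and `recognize` are hypothetical names, and real OCR engines implement both stages with trained models:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class TextRegion:
    box: Tuple[int, int, int, int]  # x, y, width, height within the image
    text: str = ""                  # filled in by the recognition stage

def run_ocr(image,
            detect: Callable[[object], List[TextRegion]],
            recognize: Callable[[object, TextRegion], str]) -> List[TextRegion]:
    """Two-stage OCR: detection finds where the text is and how large its
    range is; recognition reads the content of each detected region."""
    regions = detect(image)                      # stage 1: character detection
    for region in regions:
        region.text = recognize(image, region)   # stage 2: character recognition
    return regions
```

The separation matters for this patent: the detection stage is what yields the position information later used to overlay selectable text on the video picture.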
Search engine
A search engine is a retrieval technology that, according to user requirements and a certain algorithm, retrieves specified information from the internet using specific strategies and feeds it back to the user. Search engines rely on various technologies, such as web crawling, retrieval ranking, web page processing, big data processing, and natural language processing, to provide fast, highly relevant information services to users. The core modules of search engine technology generally include crawling, indexing, retrieval, and ranking, to which a series of auxiliary modules can be added to create a better experience for users.
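The indexing and retrieval modules mentioned above can be illustrated with a toy inverted index. This is a teaching sketch only — a production engine adds tokenization, ranking, and distributed storage far beyond this:

```python
from collections import defaultdict

def build_index(documents):
    """Indexing module: map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def retrieve(index, query):
    """Retrieval module: return ids of documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = index.get(terms[0], set()).copy()
    for term in terms[1:]:
        results &= index.get(term, set())
    return results
```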
In the embodiments of the present application, OCR character recognition and search engine technologies are mainly used to accomplish the task of querying characters in the target video; their specific implementation principles are not limited.
Before the technical scheme of the application is introduced, the application scenarios of the embodiment of the application are briefly introduced:
with the development of terminal equipment, the application field of videos is more and more extensive. When a user watches videos on a terminal device, characters (signposts, billboards, book names, some characters in subtitles, and the like) appearing in the videos are sometimes interested, and if the user wants to know related information deeply, the user needs to search a search engine for content related to the characters.
In the prior-art video text query method, the user memorizes the text to be searched, controls the terminal device to switch from the video player page to a search app or a search engine web page, and inputs the text to be searched there to obtain the related information. In this method, however, the user needs to operate the terminal device to jump from the video playing page to the search app or web page and manually input the related characters; the operation process is cumbersome, and when the phrase to be searched is uncommon, input is inconvenient, which affects the user's search efficiency and experience. Therefore, for the convenience of the user, the embodiment of the application provides a scheme that recognizes characters appearing in the video and then jumps to a search app or web page to display the search results, so that the user can conveniently acquire information.
A typical application scenario of the video-based information query method of the present application is searching related content based on characters in a video playing picture. The overall idea is as follows: when the characters in the video need to be recognized, the terminal device sends the video image carrying the characters to the server for OCR character recognition, and information is queried based on the recognized text included in the OCR result. The terminal device can also present the text at the corresponding position of the video picture, obtain the selected phrase to be searched, send it to a search engine after receiving the user's search instruction, obtain the search result, and display the search result in a popup superimposed on the video interface; finally, the popup is closed based on the user's popup-closing instruction, and playback resumes on the video playing page.
Illustratively, fig. 1 is a schematic view of an application scenario of the video-based information query method provided in the present application. As shown in fig. 1, the application scenario may include: at least one terminal device (fig. 1 shows three terminal devices, respectively terminal device 111, terminal device 112, terminal device 113), network 12 and server 13. Wherein each terminal device and the server 13 can communicate through the network 12.
For example, in the application scenario shown in fig. 1, the server 13 may receive a text recognition request sent by a user through a terminal device via the network 12, process a target video frame included in the text recognition request, and return a resulting text recognition result to the terminal device via the network 12.
The server 13 may also receive a text query request sent by a user through the terminal device through the network 12, search the text to be queried carried in the text query request to obtain an information query result corresponding to the text to be queried, and return the information query result to the terminal device through the network 12.
It should be noted that fig. 1 is only a schematic diagram of an application scenario provided in the embodiment of the present application, and the embodiment of the present application does not limit the devices included in fig. 1, nor does it limit the position relationship between the devices in fig. 1, which may be set according to actual requirements.
In practical applications, a terminal device is a device that inputs programs and data to a computer, or receives processing results output by the computer, via a communication facility. The terminal device of the embodiment of the present application is a terminal that has a display screen and can play video through that display, for example, a mobile phone, a tablet computer, or a smart television.
The server is a cloud server that provides computation or application services for clients in the network (terminals such as PCs, smart phones and ATMs, and even large-scale equipment such as train systems). In the embodiment of the application, the server has an image character recognition function and can query related content through an existing search engine.
It can be understood that, in the embodiments of the present application, specific implementations and functions of the terminal device and the server are not limited, and may be determined according to actual needs, which is not described herein again.
The technical solution of the present application will be described in detail below with reference to specific examples. It should be noted that the following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments.
Fig. 2 is a schematic flowchart of a video-based information query method according to a first embodiment of the present application. For example, the embodiment of the present application is explained by taking an execution subject as a terminal device. As shown in fig. 2, the method may include the steps of:
s201, acquiring a target video picture.
A video is composed of images; each video picture is one frame of image. In practical application, when a user watches a target video through a terminal device and text of interest appears on the current video interface, the user can operate the terminal device so that it captures the current video interface, namely the target video picture.
In the embodiment of the present application, there are various ways for a terminal device to acquire a target video picture, for example, one way is: after the video is paused, the background acquires the current video interface on the paused interface and takes the current video interface as a target video picture; the other mode is as follows: and acquiring a target video picture corresponding to the current video interface based on the screenshot instruction of the user. The embodiment of the application does not limit the manner of obtaining the target video picture, and the method can be determined according to actual requirements, and is not described herein again.
As an example, the S201 may be specifically implemented by the following steps:
pausing playing of the target video according to a video pause request of a user; and acquiring a target video picture corresponding to the target video displayed on the current playing interface.
As another example, the S201 may be specifically implemented by the following steps:
and acquiring a target video picture according to the video screenshot instruction of the user, wherein the target video picture is the picture of the video currently played by the terminal equipment.
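Treating each video picture as one frame, the pause-based acquisition above can be sketched as indexing into a frame sequence at the pause timestamp. This is a simplification with hypothetical names (`frames`, `fps`); a real player would read the current frame from its decoder:

```python
def frame_at_pause(frames, fps, pause_time_s):
    """Return the frame displayed at the moment playback was paused,
    clamped to the last frame if the timestamp overruns the video."""
    index = min(int(pause_time_s * fps), len(frames) - 1)
    return frames[index]
```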
S202, determining a text recognition result of the target video picture, wherein the text recognition result comprises: text information in the target video picture.
In the embodiment of the application, after the terminal device acquires the target video picture, character recognition needs to be performed on the target video picture to determine a text recognition result of the target video picture.
In one possible design of the present application, if the terminal device has a text recognition capability, the terminal device may recognize the text in the target video frame, so as to obtain the text information in the target video frame.
In another possible design of the present application, if the terminal device does not have the capability of character recognition, the terminal device may send the target video picture to the server, and the server recognizes characters in the target video picture, and after obtaining text information in the target video picture, transmits the text information to the terminal device.
Optionally, in the above possible designs of the present application, the terminal device or the server usually performs character recognition on the target video image by using an OCR character recognition technology, and may also use other modes.
And S203, acquiring an information query result corresponding to the text information based on the text recognition result.
In the embodiment of the application, after the terminal device obtains the text recognition result, in one possible design it may directly perform information retrieval according to the text information in the text recognition result. In another possible design, the terminal device may first display the text recognition result on its display interface for the user to operate on: the terminal device receives a text selection instruction issued by the user according to the text recognition result, selects the text to be queried from the text information based on that instruction, and finally performs information retrieval on the text to be queried.
Optionally, the information retrieval may be understood as a process of querying the text information or the text to be queried by a search engine, and further obtaining an information query result corresponding to the text information or the text to be queried.
In general, to obtain comprehensive query information, the terminal device sends the text information or the text to be queried to the server; the server queries the related information across its whole-network index and, after obtaining the corresponding information query result, sends it to the terminal device.
And S204, displaying the information query result.
In the embodiment of the application, after the terminal device obtains the information query result, the information query result is displayed on a display interface of the terminal device, for example, on the basis of not switching a video playing interface, the terminal device may display the information query result in a popup page form, so that a user can know relevant query information corresponding to a text in a target video picture.
In the video-based information query method provided by the embodiment of the application, a target video picture is acquired and its text recognition result is determined, the text recognition result including the text information in the target video picture; an information query result corresponding to the text information is then acquired based on the text recognition result and displayed. In this technical scheme, the information query result corresponding to the text information is queried based on the text recognition result of the target video picture; the user neither needs to switch the terminal interface nor manually input the text to be queried, so that the query efficiency and the user experience of video text search are improved.
On the basis of the above embodiments, fig. 3 is an interaction schematic diagram of a video-based information query method according to a second embodiment of the present application. The embodiment of the application is explained by information interaction between the terminal equipment and the server. Referring to fig. 3, in this embodiment, the step S202 may be implemented by:
s301, the terminal device sends a text recognition request to the server, and the text recognition request carries a target video picture.
In the embodiment of the application, after the terminal device acquires the target video picture, the target video picture can be transmitted to the server for text recognition. For example, an identifier for performing text recognition is set on a current playing interface of the terminal device, and when the terminal device receives a click operation of a user on the identifier, the terminal device may transmit a text recognition request carrying a target video picture to the server for performing text recognition.
Illustratively, fig. 4 is a schematic diagram of a terminal device presenting a target video picture through the current playing interface in an embodiment of the present application. Suppose the subtitle of the target video currently played by the terminal device is "beautiful mountain river" and the video background is "lake A". Referring to fig. 4, the target video picture then contains the text information "beautiful mountain river" and "lake A", and the playing interface of the terminal device has a "text recognition" identifier (or button) and a search identifier (or button).
Illustratively, the terminal device pauses the playing of the video in its video APP according to a pause instruction of the user; the playing interface then stops on the picture containing the text the user wants to search, i.e. the target video picture, as shown in fig. 4. If the user now clicks the "text recognition" button on the current playing interface, the terminal device sends the screenshot of the target video picture to the server through the video APP, specifically to the OCR character recognition cloud service.
S302, the server performs text recognition on the received target video picture to obtain a text recognition result, wherein the text recognition result comprises: text information in the target video picture.
In this embodiment, after receiving a text recognition request carrying a target video picture from a terminal device, a server performs character recognition on the target video picture by an OCR character recognition method to determine text information in the target video picture, and obtains a text recognition result.
Alternatively, the text recognition result may include position information of the text information in addition to the text information in the target video picture.
For example, when receiving a target video picture, an OCR character recognition cloud service on a server may recognize characters in the target video picture and determine position information of the characters, that is, coordinate information of a certain recognized character in the target video picture.
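One possible shape of such a recognition result, pairing each recognized text with its coordinate information in the picture; the structure and field names are illustrative assumptions, since a real OCR service defines its own response format:

```python
def recognize_text(frame):
    """Stand-in for the OCR character recognition cloud service; a real
    service would run text detection and recognition on the image.
    Returns each piece of text with its bounding box in frame-pixel
    coordinates (origin at the upper-left corner of the picture)."""
    # Hard-coded to mirror the Fig. 4 example; a real OCR engine
    # produces these entries per frame.
    return [
        {"text": "beautiful mountain river",
         "box": {"x": 120, "y": 40, "w": 400, "h": 60}},
        {"text": "A lake",
         "box": {"x": 300, "y": 520, "w": 160, "h": 50}},
    ]

result = recognize_text(frame=None)
```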
S303, the server sends the text recognition result of the target video picture to the terminal equipment.
Optionally, after obtaining the text recognition result of the target video picture, the server may transmit the text recognition result to the terminal device, so that the terminal device displays the text recognition result.
In the embodiment of the application, the server performs character recognition on the target video picture and feeds the text recognition result back to the terminal device. In this way, the recognition of image characters is realized, the text recognition efficiency is improved, and a foundation is laid for the subsequent automatic query of the text to be queried in the target video picture.
The specific implementation of the recognition of the text in the target video picture by the server can be determined according to actual requirements, and details are not described here.
According to the video-based information query method provided by this embodiment, the terminal device sends a text recognition request carrying the target video picture to the server, and the server performs text recognition on the received target video picture to obtain a text recognition result, which it feeds back to the terminal device. That is, the server recognizes the image characters, so the recognition efficiency is high, and a foundation is laid for the automatic query of the text to be queried in the target video picture.
Exemplarily, on the basis of the foregoing embodiments, fig. 5 is an interaction schematic diagram of a video-based information query method according to a third embodiment of the present application. The embodiment of the application is explained by information interaction between the terminal equipment and the server. In an embodiment of the present application, the text recognition result further includes: the position information of the text information, therefore, the terminal equipment can also display the text recognition result of the target video picture after determining the text recognition result. Illustratively, referring to fig. 5, the method may further include the steps of:
and S501, the terminal equipment superposes the text information on the corresponding position of the target video picture according to the position information of the text information to obtain the processed target video picture.
In the embodiment of the application, after the terminal device receives the text recognition result of the target video picture from the server, in order to improve the visual experience of the user, the terminal device may superimpose the text information on the corresponding position of the target video picture according to the position information in the text recognition result. That is, the text information at each position in the target video picture is superimposed at that position, so as to obtain the processed target video picture.
And S502, presenting the processed target video picture on a display interface of the terminal equipment.
For example, the terminal device may process the text recognition result to obtain the processed target video picture, and then present the processed target video picture on the display interface of the terminal device, so that the user can view and operate on it.
By superimposing the text information at the corresponding position of the target video picture and displaying it on the display interface, the user can accurately and clearly distinguish the different pieces of text information included in the text recognition result, which makes it possible for the user to subsequently select text information at different positions.
Exemplarily, fig. 6 is a schematic diagram of presenting the processed target video picture through the display interface of the terminal device in an embodiment of the present application. Fig. 6 is implemented on the basis of fig. 4; referring to fig. 6, in the present embodiment, the text information "beautiful mountain river" and "A lake" are respectively overlaid at the corresponding positions on the target video picture, and both "beautiful mountain river" and "A lake" are selectable texts.
For example, after acquiring the text information at each position in the target video picture, that is, after determining the text information in the target video picture and the coordinate information corresponding to the text information, the video APP of the terminal device displays the text information at the corresponding coordinates on its interface, as shown in fig. 6.
It is understood that, in the embodiment of the present application, the coordinate information of the text information is determined in units of pixels of the target video picture, with the upper-left corner of the picture as the origin of the coordinate system. When displayed, the text recognition result can thus correspond to the approximate location of the text in the target video picture.
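Under the coordinate convention above (pixel units, origin at the upper-left corner), the superimposing step of S501 can be sketched as follows; the data layout is an illustrative assumption:

```python
def overlay_text(frame_size, recognition_result):
    """Superimpose each recognized text at its reported position
    (step S501). Coordinates are in frame pixels with the origin at
    the upper-left corner; entries whose anchor falls outside the
    frame are dropped."""
    width, height = frame_size
    overlays = []
    for entry in recognition_result:
        box = entry["box"]
        if 0 <= box["x"] < width and 0 <= box["y"] < height:
            overlays.append({"text": entry["text"],
                             "anchor": (box["x"], box["y"]),
                             "selectable": True})  # user can select it later
    return overlays

overlays = overlay_text((1280, 720), [
    {"text": "beautiful mountain river", "box": {"x": 120, "y": 40}},
    {"text": "A lake", "box": {"x": 300, "y": 520}},
])
```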
The terminal device superimposes the text information in the text recognition result at the corresponding position of the target video picture to obtain the processed target video picture, and presents it on the display interface of the terminal device, so that the user can accurately and clearly distinguish the different pieces of text information included in the text recognition result, which makes it possible for the user to subsequently select text information at different positions.
Further, in an embodiment of the present application, as shown in fig. 5, the method may further include the following steps:
and S503, the terminal equipment acquires a text selection instruction sent by the user according to the text recognition result.
The text selection indication is used for indicating the text to be queried selected from the text information.
In the embodiment of the application, after the terminal device presents the obtained text recognition result on the display interface, the user can determine, according to the text recognition result, the text he or she is interested in (called the text to be queried in this embodiment), and send a text selection instruction to instruct the terminal device to select this text.
It will be appreciated that the user can select the text to be queried in a number of ways, for example by touching or clicking certain text displayed on the display interface, or by speaking a text selection instruction. The embodiment of the application does not limit the specific way in which the user sends the text selection instruction to select the text to be queried, which can be determined according to the actual scene and is not described herein again.
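As one illustrative realization of a touch-based text selection indication (the application deliberately leaves the mechanism open), a tap point can be hit-tested against the bounding boxes in the recognition result:

```python
def select_text_at(touch, recognition_result):
    """Map a touch point on the display interface to the text entry
    whose bounding box contains it; returns None when the tap misses
    all text. A sketch of one possible selection mechanism only."""
    tx, ty = touch
    for entry in recognition_result:
        b = entry["box"]
        if b["x"] <= tx <= b["x"] + b["w"] and b["y"] <= ty <= b["y"] + b["h"]:
            return entry["text"]
    return None

entries = [
    {"text": "beautiful mountain river", "box": {"x": 120, "y": 40, "w": 400, "h": 60}},
    {"text": "A lake", "box": {"x": 300, "y": 520, "w": 160, "h": 50}},
]
selected = select_text_at((350, 540), entries)  # tap lands on "A lake"
```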
Further, in the embodiment of the present application, referring to fig. 5, the step S203 may be implemented by:
and S504, the terminal equipment selects the text to be inquired in the text recognition result according to the text selection instruction.
In the embodiment of the application, after the terminal device obtains the text selection instruction of the user, it can execute the selection operation on the text to be queried, that is, select the text to be queried indicated by the user based on the text selection instruction.
Illustratively, fig. 7 is a schematic diagram of the terminal device selecting the text to be queried based on the text selection instruction of the user. Referring to fig. 7, assume that the user is interested in "A lake" in the target video picture; at this point "A lake" is selected and indicated by dark shading.
And S505, the terminal equipment acquires a text search instruction of the user and generates a text query request according to the text search instruction, wherein the text query request carries the selected text to be queried.
In the embodiment of the application, the terminal device can detect the operation of the user in real time and execute the corresponding operation based on the operation of the user. For example, after the terminal device selects the text to be queried, when the user sends a text search instruction by clicking a search identifier on a display interface or in a voice manner, the terminal device may obtain the text search instruction of the user, and then generate a text query request according to the selected text to be queried.
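A minimal sketch of generating the text query request from the selected text to be queried (steps S505/S506); the JSON field names are illustrative assumptions:

```python
import json

def build_text_query_request(text_to_query: str) -> str:
    """Build the text query request sent to the server after the user
    issues a text search instruction (e.g. by clicking the search
    identifier). Field names are illustrative, not from this application."""
    if not text_to_query:
        raise ValueError("no text selected to query")
    return json.dumps({"action": "text_query", "query": text_to_query})

query_request = build_text_query_request("A lake")
```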
S506, the terminal device sends a text query request to the server.
In the embodiment of the application, because the content recorded on the network is comprehensive, the server can provide the query service through a search engine on the network, with strong processing capability and a wide query range. Therefore, the terminal device sends the text query request to the server, and the server executes the query operation.
For example, as shown in fig. 7, if the user wants to retrieve a certain piece of text, the user needs to select the relevant text, that is, select the text to be queried. After selecting the text to be queried, the user can click the search identifier on the current display interface to send a text query request to the server, that is, the video APP sends the text to be queried to the interface of a search engine.
And S507, the server acquires an information query result corresponding to the text information according to the received text query request.
In the embodiment of the application, after receiving the text query request, the server can perform a query on the network according to the text information in the text query request to obtain the information query result corresponding to the text information. It can be understood that the text information is text obtained by performing text recognition on the target video picture, and may also be the text selected by the terminal device according to the user's selection.
Illustratively, the server queries and retrieves information of the text to be queried through a search engine to obtain an information query result. Specifically, the search engine retrieves the related content according to the text to be queried in the text query request to obtain the information query result.
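A sketch of the server-side query step (S507), with the search engine injected as a callable, since this application does not fix a particular engine or retrieval interface:

```python
def query_information(text_query_request, search_engine):
    """Extract the text to be queried from the request and retrieve
    related content through a search engine; the engine is passed in
    as a callable for illustration."""
    query = text_query_request["query"]
    results = search_engine(query)  # retrieval delegated to the engine
    return {"query": query, "results": results}

# A stand-in search engine used only to make the sketch runnable.
fake_engine = lambda q: [f"page about {q} #{i}" for i in range(1, 3)]
info_result = query_information({"query": "A lake"}, fake_engine)
```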
And S508, the server sends the information inquiry result to the terminal equipment.
In this embodiment, after acquiring the information query result corresponding to the text to be queried, the server may send the information query result to the terminal device, so that the terminal device displays the acquired information query result, and the like.
According to the video-based information query method provided by the embodiment of the application, the terminal device acquires the text selection instruction sent by the user according to the text recognition result, selects the text to be queried in the text recognition result according to the text selection instruction, acquires the text search instruction of the user, generates the text query request according to the text search instruction, and sends the text query request to the server; the server acquires the information query result corresponding to the text to be queried according to the received text query request and feeds the information query result back to the terminal device.
Exemplarily, on the basis of the foregoing embodiments, fig. 8 is a schematic flowchart of a video-based information query method according to a fourth embodiment of the present application. The embodiment of the present application is described with a terminal device as an execution subject. Referring to fig. 8, in this embodiment, the step S204 may be implemented by:
S801, displaying the information query result on the display interface of the terminal device in the form of a pop-up window.
In the embodiment of the application, one implementation of displaying the information query result by the terminal device is to display it in the form of a pop-up window. This presentation is similar to the way results are presented when the user performs a query on a search page, which makes it convenient for the user to operate on the information query result.
Further, in the embodiment of the present application, as shown in fig. 8, the method may further include the following steps:
and S802, processing the information query result according to the popup operation instruction of the user.
Wherein the pop-up operation instruction comprises at least one of the following: a pop-up closing indication, a pop-up page-turning indication and a page scroll-down indication.
For example, when the terminal device displays the information query result on its display interface in the form of a pop-up window, the user may perform some processing operations on the information query result in the pop-up page; for example, the terminal device performs operations such as scrolling down and page turning on the pop-up page according to the user's operations, so that the user can browse the information and click on a search result. After the user has obtained the information, the user can also send a pop-up closing instruction to dismiss the pop-up page and return to the video playing page to continue playing the video.
For example, when the pop-up operation instruction is a pop-up closing instruction, the terminal device may close the pop-up displaying the information query result according to the pop-up closing instruction of the user, so as to control the terminal device to continue playing the paused target video on the display interface.
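The handling of pop-up operation instructions in S802 might be sketched as a simple dispatch over the three indication types; the state fields are illustrative assumptions, not from this application:

```python
def handle_popup_instruction(state, instruction):
    """Apply one pop-up operation instruction to the pop-up state:
    close the pop-up (and resume the paused video), turn to the next
    page, or scroll down. A sketch of the dispatch only, not a real
    UI implementation."""
    if instruction == "close":
        state["open"] = False
        state["video_playing"] = True  # continue playing the paused video
    elif instruction == "next_page":
        state["page"] += 1
    elif instruction == "scroll_down":
        state["offset"] += 1
    return state

state = {"open": True, "video_playing": False, "page": 1, "offset": 0}
state = handle_popup_instruction(state, "next_page")
state = handle_popup_instruction(state, "close")
```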
For example, fig. 9 is an interface schematic diagram showing an information query result in a pop-up window form in the embodiment of the present application. Referring to fig. 9, after receiving the information query result, the video APP of the terminal device displays the result in a pop-up window manner, and the user may slide down or turn over the page of the video APP, and may further interact with the search engine to obtain the content of the next page.
The specific operation of the user for processing the information query result may be determined according to actual requirements, and is not described herein again.
According to the video-based information query method provided by this embodiment, after the terminal device obtains the information query result of the text to be queried, the information query result can be displayed on the display interface of the terminal device in the form of a pop-up window, and then processed according to the pop-up operation instruction of the user. With this technical scheme, the user does not need to switch the display page of the terminal device to query text in the target video, which simplifies the query operation process and improves the video text query efficiency.
As can be seen from the foregoing embodiments, fig. 10 is an information interaction block diagram of a video-based information query method according to an embodiment of the present application. As shown in fig. 10, the technical solution of the present application is mainly as follows: for a video played in a mobile application of a terminal device, when characters (including subtitles) that interest the user appear in a video picture, the user can pause the playing of the video (or take a screenshot) and click the quick-search button. At this moment, the video APP sends the obtained target video picture to the cloud OCR character recognition service of the server, which recognizes the characters in the video and their positions, obtains the text recognition result (the character information at each position), and feeds the text recognition result back to the terminal device, so that the terminal device displays the text at the corresponding position of the target video picture. While the text recognition result is displayed on the display interface of the terminal device, the user long-presses the characters, a cursor for selecting text appears, and the user can move its start and end positions left and right. The user then clicks the search button on the display interface, the terminal device sends the text to be queried (the selected text) to the search engine of the server through the video APP for information query, and the information query result is fed back to the terminal device and displayed through a pop-up window.
In the embodiment of the application, the operation of initiating a search by the user is faster and more convenient: the user does not need to memorize the keywords to be searched or manually input characters, and during the search the user stays on the current interface of the video APP to obtain the search result, without switching to another APP or opening a browser. Furthermore, after obtaining the search result, the user can close the pop-up page and continue watching the video, so the experience is smoother.
In the above, a specific implementation of the video-based information query method mentioned in the embodiment of the present application is introduced; the following is an embodiment of the apparatus of the present application, which can be used to execute the embodiment of the method of the present application. For details not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Fig. 11 is a schematic structural diagram of a video-based information query apparatus according to a first embodiment of the present application. The device can be integrated in the terminal equipment and can also be realized by the terminal equipment. As shown in fig. 11, in this embodiment, the video-based information query apparatus 110 may include: an acquisition module 1101, a processing module 1102 and a display module 1103.
The acquiring module 1101 is configured to acquire a target video frame;
the processing module 1102 is configured to determine a text recognition result of the target video picture, where the text recognition result includes text information in the target video picture, and to obtain an information query result corresponding to the text information based on the text recognition result;
the display module 1103 is configured to display the information query result.
Optionally, as shown in fig. 11, in an embodiment of the present application, the apparatus further includes: a transmitting module 1104 and a receiving module 1105.
Optionally, in a possible design of the present application, the processing module 1102 is configured to determine a text recognition result of the target video picture, specifically:
the processing module 1102 is specifically configured to send a text recognition request to a server through the sending module 1104, where the text recognition request carries the target video picture, and receive a text recognition result of the target video picture from the server through the receiving module 1105.
In another possible design of the present application, the text recognition result further includes: location information of the text information; the processing module 1102 is further configured to superimpose the text information on a corresponding position of the target video picture according to the position information of the text information, so as to obtain a processed target video picture;
the display module 1103 is further configured to present the processed target video picture on a display interface of the terminal device.
Optionally, in another possible design of the present application, the obtaining module 1101 is further configured to obtain a text selection instruction sent by a user according to the text recognition result, where the text selection instruction is used to instruct to select a text to be queried in the text information;
correspondingly, the processing module 1102 is configured to obtain an information query result corresponding to the text information based on the text recognition result, specifically:
the processing module 1102 is specifically configured to:
according to the text selection instruction, selecting the text to be inquired in the text recognition result;
acquiring a text search instruction of a user, and generating a text query request according to the text search instruction, wherein the text query request carries the selected text to be queried;
the text query request is sent to the server through the sending module 1104, and the information query result corresponding to the text to be queried is received from the server through the receiving module 1105.
Optionally, in another possible design of the present application, the display module 1103 is specifically configured to display the information query result on a display interface of the terminal device in a pop-up window manner.
For example, in an embodiment of the present application, the processing module 1102 is further configured to process the information query result according to a pop-up operation instruction of a user, where the pop-up operation instruction includes at least one of the following operations: a pop-up window closing indication, a pop-up window page turning indication and a page gliding indication.
Optionally, in another possible design of the present application, the obtaining module 1101 is specifically configured to pause playing of a target video according to a video pause request of a user, and obtain a target video picture corresponding to the target video displayed on a current playing interface.
Optionally, in another possible design of the present application, the obtaining module 1101 is specifically configured to obtain the target video picture according to a video screenshot instruction of a user, where the target video picture is a picture of a video currently played by a terminal device.
The apparatus provided in the embodiment of the present application may be configured to implement the scheme of the terminal device in any one of the foregoing method embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
Fig. 12 is a schematic structural diagram of a video-based information query apparatus according to a second embodiment of the present application. The device can be integrated in a server or realized by the server. As shown in fig. 12, in this embodiment, the video-based information query device 120 may include: a receiving module 1201, a processing module 1202 and a sending module 1203.
The receiving module 1201 is configured to receive a text query request from a terminal device, where the text query request includes a text to be queried, the text to be queried being text obtained by performing text recognition on a target video picture;
the processing module 1202 is configured to obtain an information query result corresponding to the text to be queried;
the sending module 1203 is configured to send the information query result to the terminal device.
In an embodiment of the present application, the receiving module 1201 is further configured to receive a text recognition request from a terminal device before receiving a text query request from the terminal device, where the text recognition request carries the target video picture;
the processing module 1202 is further configured to perform text recognition on the target video picture to obtain a text recognition result, where the text recognition result includes: text information in the target video picture;
the sending module 1203 is further configured to send the text recognition result to the terminal device.
The apparatus provided in the embodiment of the present application may be configured to execute the scheme of the server in any one of the foregoing method embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
It should be noted that the division of the modules of the above apparatus is only a logical division, and the actual implementation may be wholly or partially integrated into one physical entity, or may be physically separated. And these modules can be realized in the form of software called by processing element; or may be implemented entirely in hardware; and part of the modules can be realized in the form of calling software by the processing element, and part of the modules can be realized in the form of hardware. For example, the processing module may be a processing element separately set up, or may be implemented by being integrated in a chip of the apparatus, or may be stored in a memory of the apparatus in the form of program code, and a function of the processing module may be called and executed by a processing element of the apparatus. Other modules are implemented similarly. In addition, all or part of the modules can be integrated together or can be independently realized. The processing element described herein may be an integrated circuit having signal processing capabilities. In implementation, each step of the above method or each module above may be implemented by an integrated logic circuit of hardware in a processor element or an instruction in the form of software.
Further, according to the embodiment of the application, the application also provides a terminal device, a server and a computer readable storage medium.
Fig. 13 is a block diagram of a terminal device for implementing a video-based information query method according to an embodiment of the present application. In embodiments of the present application, the terminal device may represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices, each being a terminal with video playing capability.
Illustratively, as shown in fig. 13, the terminal device may include: at least one processor 1301, memory 1302 communicatively coupled to the at least one processor; the memory 1302 stores instructions executable by the at least one processor 1301, and the instructions are executed by the at least one processor 1301, so that the at least one processor 1301 can execute the scheme of the terminal device in the embodiments shown in fig. 2 to 10.
Optionally, in an embodiment of the present application, the terminal device may further include: an input device 1303 and an output device 1304. The processor 1301, the memory 1302, the input device 1303 and the output device 1304 may be connected by a bus or other means, and fig. 13 illustrates the bus connection.
Further, in the embodiment of the present application, the terminal device further includes a display 1305, where the display 1305 is configured to display the played video and display an information query result corresponding to the text information.
Fig. 14 is a block diagram of a server for implementing the video-based information query method provided by the embodiment of the present application. In embodiments of the present application, the server is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers.
Illustratively, as shown in fig. 14, the server may include: at least one processor 1401, memory 1402 communicatively coupled to the at least one processor; the memory 1402 stores instructions executable by the at least one processor 1401, and the instructions are executed by the at least one processor 1401, so that the at least one processor 1401 can execute the server scheme in the embodiments shown in fig. 2 to 10.
Optionally, in an embodiment of the present application, the server may further include: an input device 1403 and an output device 1404. The processor 1401, the memory 1402, the input device 1403, and the output device 1404 may be connected by a bus or other means, as exemplified by the bus connection in fig. 14.
It will be appreciated that the components illustrated in fig. 13 and 14, their connections and relationships, and their functions, described above, are meant to be examples only, and are not intended to limit implementations of the present application as described and/or claimed herein.
In the schematic diagrams shown in fig. 13 and fig. 14, the terminal device and the server may further include interfaces for connecting the components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system).
In the schematic diagrams shown in fig. 13 and 14, the memory is a non-transitory computer readable storage medium provided by the present application. Wherein the memory stores instructions executable by at least one processor to cause the at least one processor to perform the methods provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the methods provided herein.
The memory, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the video-based information query method in the embodiments of the present application (for example, the memory 1302 corresponds to the acquisition module 1101, the processing module 1102, the display module 1103, the sending module 1104, and the receiving module 1105 shown in fig. 11, and the memory 1402 corresponds to the receiving module 1201, the processing module 1202, and the sending module 1203 shown in fig. 12). The processor executes various functional applications of the server and data processing by executing non-transitory software programs, instructions, and modules stored in the memory, that is, implements the method in the above-described method embodiments.
The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal device and/or the server, and the like. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and these remote memories may be connected to the terminal device and/or the server via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device may receive input numeric or character information and generate key signal inputs related to user settings and function control of the terminal device and/or the server; examples include a touch screen, a keypad, a mouse, a trackpad, a touchpad, a pointing stick, one or more mouse buttons, a trackball, a joystick, or other input devices. The output devices may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibration motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special- or general-purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
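The client-server exchange described above — and in claims 2, 4, 9, and 10 below — can be sketched as two cooperating roles: the terminal device sends a text recognition request carrying the video picture, receives a recognition result containing text strings and their positions, and later sends a text query request for the text the user selected. The following is a minimal in-process sketch under stated assumptions: the class and method names (`Server`, `Client`, `handle_recognition_request`, `handle_query_request`, `query_from_frame`) and the canned OCR output are illustrative, not part of the patent.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class TextItem:
    """One recognized text string and its position in the video picture."""
    text: str
    position: Tuple[int, int, int, int]  # assumed shape: (x, y, width, height)

@dataclass
class RecognitionResult:
    items: List[TextItem] = field(default_factory=list)

class Server:
    """Stands in for the server side (claims 9-10): OCR plus text query."""

    def handle_recognition_request(self, frame: bytes) -> RecognitionResult:
        # A real server would run OCR on the frame; a canned result lets
        # the message flow be exercised end to end.
        return RecognitionResult([TextItem("example text", (10, 20, 120, 16))])

    def handle_query_request(self, text: str) -> str:
        # Stand-in for a search-backend lookup of the selected text.
        return f"search results for: {text}"

class Client:
    """Stands in for the terminal-device side (claims 1-4)."""

    def __init__(self, server: Server):
        self.server = server

    def query_from_frame(self, frame: bytes) -> str:
        result = self.server.handle_recognition_request(frame)  # claim 2
        selected = result.items[0].text                         # user selection, claim 4
        return self.server.handle_query_request(selected)       # claims 4 and 9
```

In a deployment, the two method calls would of course travel over a network (claim 2's "sending a text recognition request to a server"); the in-process version only illustrates the request/response shapes.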
Further, an embodiment of the present application further provides a video-based information query method, including:
acquiring a text recognition result of a target video picture, wherein the text recognition result comprises: text information in the target video picture;
and determining an information query result corresponding to the text information based on the text recognition result.
According to the technical solution of the embodiment of the present application, a text recognition result of the target video picture is determined, where the text recognition result includes the text information in the target video picture, and an information query result corresponding to the text information is acquired based on the text recognition result. In this technical solution, the information query result corresponding to the text to be queried is obtained based on the text recognition result of the target video picture; the user neither needs to switch the terminal interface nor to manually enter the text to be queried, which improves both the efficiency of querying video text and the user experience.
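One concrete step in this flow (detailed in claim 3 below) is superimposing each recognized text string at its reported position on the paused picture so the user can select it. A minimal sketch, assuming the recognition result is a list of `(text, (x, y, w, h))` tuples — that shape, and the function name `overlay_text`, are assumptions for illustration; actual drawing APIs are out of scope:

```python
def overlay_text(frame_size, recognition_result):
    """Return (text, (x, y)) draw commands, clamped to the frame bounds.

    frame_size: (width, height) of the target video picture.
    recognition_result: list of (text, (x, y, w, h)) tuples, the shape
    assumed here for the server's text recognition result.
    """
    fw, fh = frame_size
    commands = []
    for text, (x, y, w, h) in recognition_result:
        # Clamp so an overlay detected near the edge stays inside the picture.
        x = max(0, min(x, fw - w))
        y = max(0, min(y, fh - h))
        commands.append((text, (x, y)))
    return commands
```

The clamping is a design choice, not something the patent specifies: it keeps selectable text fully visible even when the OCR bounding box touches the picture border.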
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present application can be achieved, and the present application is not limited herein.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Claims (25)
1. A video-based information query method is characterized by comprising the following steps:
acquiring a target video picture;
determining a text recognition result of the target video picture, wherein the text recognition result comprises: text information in the target video picture;
acquiring an information query result corresponding to the text information based on the text recognition result;
and displaying the information query result.
2. The method of claim 1, wherein determining the text recognition result of the target video picture comprises:
sending a text recognition request to a server, wherein the text recognition request carries the target video picture;
and receiving a text recognition result of the target video picture from the server.
3. The method of claim 1, wherein the text recognition result further comprises: location information of the text information;
the method further comprises the following steps: according to the position information of the text information, the text information is superposed on the corresponding position of the target video picture to obtain a processed target video picture;
and presenting the processed target video picture on a display interface of the terminal equipment.
4. The method of claim 3, further comprising:
acquiring a text selection instruction sent by a user according to the text recognition result, wherein the text selection instruction is used for indicating selection of a text to be queried in the text information;
the obtaining of the information query result corresponding to the text information based on the text recognition result includes:
according to the text selection instruction, selecting the text to be queried in the text recognition result;
acquiring a text search instruction of a user, and generating a text query request according to the text search instruction, wherein the text query request carries the selected text to be queried;
sending the text query request to a server;
and receiving an information query result corresponding to the text to be queried from the server.
5. The method according to any one of claims 1 to 4, wherein the displaying of the information query result corresponding to the text to be queried includes:
and displaying the information query result on a display interface of the terminal equipment in a popup window mode.
6. The method of claim 5, further comprising:
processing the information query result according to a popup operation instruction of a user, wherein the popup operation instruction comprises at least one of the following: a popup closing indication, a popup page-turning indication and a page swipe-down indication.
7. The method according to any one of claims 1-4, wherein the obtaining the target video picture comprises:
pausing playing of the target video according to a video pause request of a user;
and acquiring a target video picture corresponding to the target video displayed on the current playing interface.
8. The method according to any one of claims 1-4, wherein the obtaining the target video picture comprises:
and acquiring the target video picture according to the video screenshot instruction of the user, wherein the target video picture is the picture of the video currently played by the terminal equipment.
9. A video-based information query method is characterized by comprising the following steps:
receiving a text query request from a terminal device, the text query request including: the text information is a text obtained by performing text recognition on a target video picture;
acquiring an information query result corresponding to the text information;
and sending the information query result to the terminal equipment.
10. The method of claim 9, wherein prior to said receiving a text query request from a terminal device, the method further comprises:
receiving a text recognition request from a terminal device, wherein the text recognition request carries the target video picture;
performing text recognition on the target video picture to obtain a text recognition result, wherein the text recognition result comprises: text information in the target video picture;
and sending the text recognition result to the terminal equipment.
11. An apparatus for querying information based on video, comprising: the device comprises an acquisition module, a processing module and a display module;
the acquisition module is used for acquiring a target video picture;
the processing module is configured to determine a text recognition result of the target video picture, where the text recognition result includes: text information in the target video picture and an information query result corresponding to the text information are obtained based on the text recognition result;
and the display module is used for displaying the information query result.
12. The apparatus of claim 11, further comprising: a transmitting module and a receiving module;
the processing module is configured to determine a text recognition result of the target video picture, and specifically includes:
the processing module is specifically configured to send a text recognition request to a server through the sending module, where the text recognition request carries the target video picture, and receive a text recognition result of the target video picture from the server through the receiving module.
13. The apparatus of claim 11, wherein the text recognition result further comprises: location information of the text information;
the processing module is further configured to superimpose the text information on a corresponding position of the target video picture according to the position information of the text information to obtain a processed target video picture;
and the display module is also used for presenting the processed target video picture on a display interface of the terminal equipment.
14. The apparatus of claim 13, further comprising: a transmitting module and a receiving module;
the acquisition module is further used for acquiring a text selection instruction sent by a user according to the text recognition result, wherein the text selection instruction is used for indicating selection of a text to be queried in the text information;
the processing module is configured to obtain an information query result corresponding to the text information based on the text recognition result, and specifically includes:
the processing module is specifically configured to:
according to the text selection instruction, select the text to be queried in the text recognition result;
acquiring a text search instruction of a user, and generating a text query request according to the text search instruction, wherein the text query request carries the selected text to be queried;
and sending the text query request to a server through the sending module, and receiving an information query result corresponding to the text to be queried from the server through the receiving module.
15. The apparatus according to any one of claims 11 to 14, wherein the display module is specifically configured to display the information query result on a display interface of a terminal device in a pop-up window form.
16. The apparatus of claim 15, wherein the processing module is further configured to process the information query result according to a popup operation instruction of a user, where the popup operation instruction includes at least one of: a popup closing indication, a popup page-turning indication and a page swipe-down indication.
17. The apparatus according to any one of claims 11 to 14, wherein the obtaining module is specifically configured to pause playing of a target video according to a video pause request of a user, and obtain a target video picture corresponding to the target video displayed on a currently playing interface.
18. The apparatus according to any one of claims 11 to 14, wherein the obtaining module is specifically configured to obtain the target video frame according to a video screenshot instruction of a user, where the target video frame is a frame of a video currently played by the terminal device.
19. An apparatus for querying information based on video, comprising: the device comprises a receiving module, a processing module and a sending module;
the receiving module is configured to receive a text query request from a terminal device, where the text query request includes: the text information is a text obtained by performing text recognition on a target video picture;
the processing module is used for acquiring an information query result corresponding to the text information;
and the sending module is used for sending the information query result to the terminal equipment.
20. The apparatus of claim 19, wherein the receiving module is further configured to receive a text recognition request from a terminal device before receiving a text query request from the terminal device, where the text recognition request carries the target video frame;
the processing module is further configured to perform text recognition on the target video picture to obtain a text recognition result, where the text recognition result includes: text information in the target video picture;
the sending module is further configured to send the text recognition result to the terminal device.
21. A terminal device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
22. A server, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of claim 9 or 10.
23. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-8.
24. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of claim 9 or 10.
25. A video-based information query method is characterized by comprising the following steps:
acquiring a text recognition result of a target video picture, wherein the text recognition result comprises: text information in the target video picture;
and determining an information query result corresponding to the text information based on the text recognition result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010324145.2A | 2020-04-22 | 2020-04-22 | Video-based information query method, device, equipment and storage medium
Publications (1)
Publication Number | Publication Date |
---|---|
CN113536037A | 2021-10-22
Family
ID=78094123
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010324145.2A | Video-based information query method, device, equipment and storage medium | 2020-04-22 | 2020-04-22
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113536037A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114202756A (en) * | 2021-12-13 | 2022-03-18 | 广东魅视科技股份有限公司 | Method, device and readable medium for cross-network-segment data transmission |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100205203A1 (en) * | 2009-02-09 | 2010-08-12 | Vitamin D, Inc. | Systems and methods for video analysis |
US20120059810A1 (en) * | 2010-09-08 | 2012-03-08 | Nuance Communications, Inc. | Method and apparatus for processing spoken search queries |
CN105354288A (en) * | 2015-10-30 | 2016-02-24 | 百度在线网络技术(北京)有限公司 | Image searching method and apparatus based on video contents |
CN108388397A (en) * | 2018-02-13 | 2018-08-10 | 维沃移动通信有限公司 | A kind of information processing method and terminal |
CN109543102A (en) * | 2018-11-12 | 2019-03-29 | 百度在线网络技术(北京)有限公司 | Information recommendation method, device and storage medium based on video playing |
Non-Patent Citations (1)
Title |
---|
YANG JIE; XU YUE; YU JIANQIAO; JIANG JIANHUA: "Classification of User Query Intent Based on Search Engine Logs", Command Information System and Technology, no. 02, 22 May 2019 (2019-05-22) * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11157577B2 (en) | Method for searching and device thereof | |
CN111163367B (en) | Information searching method, device, equipment and medium based on playing video | |
US20210049354A1 (en) | Human object recognition method, device, electronic apparatus and storage medium | |
US11734370B2 (en) | Method for searching and device thereof | |
CN111309200B (en) | Method, device, equipment and storage medium for determining extended reading content | |
CN112114926A (en) | Page operation method, device, equipment and medium based on voice recognition | |
CN110825928A (en) | Searching method and device | |
US20170315703A1 (en) | Projector playing control method, device, and computer storage medium | |
EP3413219A1 (en) | Search method and device | |
CN103631526A (en) | Device and method for displaying search information | |
CN110427138A (en) | Translation information processing method, device, electronic equipment and storage medium | |
CN112383825B (en) | Video recommendation method and device, electronic equipment and medium | |
CN112163143B (en) | Page switching method, device, equipment and storage medium | |
CN113986083A (en) | File processing method and electronic equipment | |
KR20230061519A (en) | Screen capture methods, devices and electronics | |
CN113536037A (en) | Video-based information query method, device, equipment and storage medium | |
CN111638787A (en) | Method and device for displaying information | |
CN113315691B (en) | Video processing method and device and electronic equipment | |
CN113905125B (en) | Video display method and device, electronic equipment and storage medium | |
EP3896987A1 (en) | Video playback method and apparatus, electronic device, and storage medium | |
KR20150097250A (en) | Sketch retrieval system using tag information, user equipment, service equipment, service method and computer readable medium having computer program recorded therefor | |
CN111352685B (en) | Display method, device, equipment and storage medium of input method keyboard | |
CN112783543B (en) | Method, device, equipment and medium for generating small program distribution materials | |
CN112445983B (en) | Method, device and equipment for processing search results and computer readable storage medium | |
CN113485621A (en) | Image capturing method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||