WO2023124874A1

WO2023124874A1 - Question and answer result searching method and apparatus, device, and storage medium

Info

Publication number: WO2023124874A1
Application number: PCT/CN2022/137552
Authority: WO
Inventors: 汪忠超; 王艳丽
Original assignee: 北京字节跳动网络技术有限公司
Priority date: 2021-12-28
Filing date: 2022-12-08
Publication date: 2023-07-06
Also published as: CN114297433A; CN114297433B

Abstract

The present invention relates to the technical field of video processing, and provides a question and answer result searching method and apparatus, a device, and a storage medium. The method comprises: obtaining video information corresponding to a search question, wherein the video information comprises a video cover, the video cover is composed of a plurality of key frame images of a video, each key frame image comprises text content and image content which solve the search question, and the number of the key frame images is matched with the number of answer key points for solving the search question; displaying the video cover in a search result page; and playing back the video in response to detection of a trigger operation for the video cover. According to the present invention, video searching efficiency can be improved.

Description

Search method, device, equipment and storage medium for question answering results

Cross References to Related Applications

This application is based on, and claims priority to, Chinese application No. 202111620699.8, filed on December 28, 2021. The disclosure of that Chinese application is hereby incorporated into this application in its entirety.

Technical Field

Embodiments of the present disclosure relate to the technical field of video processing, and in particular to a search method, apparatus, device, and storage medium for question and answer results.

Background

The video application provided by the related technology can provide a video search function and a video playback function, through which a user can search for related videos and play them.

Summary

The present disclosure provides a search method, apparatus, device, and storage medium for question and answer results.

The present disclosure provides a search method for question and answer results, including:

Acquiring video information corresponding to a search question, wherein the video information includes a video cover, the video cover is composed of a plurality of key frame images of a video, each of the key frame images includes text content and image content that answer the search question, and the number of key frame images matches the number of answer points for answering the search question; displaying the video cover on a search results page; and playing the video in response to detecting a trigger operation on the video cover.

Optionally, the arrangement order of the plurality of key frame images in the video cover matches the order of the answer points corresponding to the search question.

Optionally, each key frame image is an image, cropped from a video frame of the video, that contains the text content and the image content of an answer point.

Optionally, the text content of the answer point contained in the key frame image is recognized from the video frames of the video, and the image content of the answer point contained in the key frame image is matched from the video frames of the video based on the text content of the answer point.

Optionally, playing the video in response to detecting a trigger operation on the video cover includes: in response to detecting a trigger operation on a first key frame image in the video cover, determining the time mark corresponding to the first key frame image on the playback time axis of the video; and jumping to the play page of the video and starting to play the video from the time mark.

The present disclosure also provides a search method for question and answer results, including:

receiving a search request from a terminal device, where the search request includes a search question;

Based on the search question, searching for videos containing answers to the search question;

determining a plurality of answer points of the answer based on the audio signal or the text content of the video;

For each answer point, obtaining from the video, by processing, a key frame image containing the text content and image content of the answer point;

Generating a video cover of the video based on the plurality of key frame images corresponding to the plurality of answer points;

Feeding back video information including the video cover to the terminal device.

Optionally, obtaining from the video, by processing, the key frame image containing the text content and image content of the answer point includes:

Performing character recognition on the video to obtain the text content of the answer point; based on the text content of the answer point, identifying image content matching the text content from the video frames of the video; capturing the image content from the video frame; and generating, based on the text content and the image content of the answer point, a key frame image containing the answer point.

Optionally, generating the video cover of the video based on the multiple key frame images corresponding to the multiple answer points includes:

Determining, based on the order of the plurality of answer points in the answer, the arrangement order in the video cover of the plurality of key frame images corresponding to the plurality of answer points; determining a corresponding splicing template based on the sizes of the plurality of key frame images and the arrangement order among them; performing proportional scaling on the key frame image corresponding to each region based on the size of that region in the splicing template; and inserting the scaled key frame images into the corresponding regions of the template to obtain the video cover of the video.

Optionally, before feeding back the video information including the video cover to the terminal device, the method further includes: determining the time mark corresponding to each key frame image on the playback time axis of the video; and adding the correspondence between each key frame image and its time mark to the video information.

Optionally, determining the time mark corresponding to each key frame image on the playback time axis of the video includes:

For each key frame image, matching the key frame image against the video frames of the video to determine a target video frame that matches the key frame image; and determining the time mark corresponding to the target video frame as the time mark corresponding to the key frame image on the playback time axis of the video.

Optionally, determining the time mark corresponding to each key frame image on the playback time axis of the video includes: determining, according to the mapping relationship between each video frame and the playing time on the playback time axis of the video, the time mark corresponding to each key frame image on the playback time axis of the video.

The present disclosure also provides a search device for question and answer results, including:

An information acquisition module, configured to acquire video information corresponding to a search question, wherein the video information includes a video cover, the video cover is composed of a plurality of key frame images of a video, each of the key frame images includes text content and image content that answer the search question, and the number of key frame images matches the number of answer points for answering the search question;

A cover display module, configured to display the cover of the video on the search result page;

A video playing module, configured to play the video in response to detecting a trigger operation on the video cover.

Optionally, the video playing module is configured to: in response to detecting a trigger operation on a first key frame image in the video cover, determine the time mark corresponding to the first key frame image on the playback time axis of the video; jump to the play page of the video, and start playing the video from the time mark.

The present disclosure also provides a search device for question and answer results, including:

A request receiving module, configured to receive a search request from a terminal device, where the search request includes a search question;

A video search module, configured to search for a video containing an answer to the search question based on the search question;

A point determination module, configured to determine multiple answer points of the answer based on the audio signal or text content of the video;

An image obtaining module, configured to obtain from the video, by processing, for each answer point, a key frame image containing the text content and image content of the answer point;

A cover generation module, configured to generate the video cover of the video based on the plurality of key frame images corresponding to the plurality of answer points;

An information feedback module, configured to feed back video information including the video cover to the terminal device.

Optionally, the image obtaining module includes:

The text content obtaining unit is used to perform character recognition processing on the video to obtain the text content of the answer points;

an image content identification unit, configured to identify image content matching the text content from the video frame of the video based on the text content of the answer points;

an image content intercepting unit, configured to intercept the image content from the video frame;

An image generation unit, configured to generate a key frame image containing the answer points based on the text content of the answer points and the image content.

Optionally, the cover generation module includes:

A sequence determination unit, configured to determine, based on the order of the plurality of answer points in the answer, the arrangement order in the video cover of the plurality of key frame images corresponding to the plurality of answer points;

A template determination unit, configured to determine a corresponding splicing template based on the sizes of the plurality of key frame images and the arrangement order among the plurality of key frame images;

A scaling processing unit, configured to perform proportional scaling on the key frame image corresponding to each region based on the size of that region in the splicing template;

An image inserting unit, configured to insert the scaled key frame images into the corresponding regions of the template to obtain the video cover of the video.

Optionally, the device also includes:

A time mark determination module, configured to determine the time mark corresponding to each key frame image on the playback time axis of the video;

A relationship adding module, configured to add the correspondence between each key frame image and its time mark to the video information.

Optionally, the time mark determination module includes:

A matching unit, configured to, for each key frame image, match the key frame image against the video frames of the video and determine a target video frame that matches the key frame image;

A first mark determining unit, configured to determine the time mark corresponding to the target video frame as the time mark corresponding to the key frame image on the playback time axis of the video.

Optionally, the time mark determination module includes: a second mark determining unit, configured to determine, according to the mapping relationship between each video frame and the playing time on the playback time axis of the video, the time mark corresponding to each key frame image on the playback time axis of the video.

The present disclosure also provides a terminal device, including: a memory and a processor, wherein a computer program is stored in the memory, and when the computer program is executed by the processor, the processor executes the above-mentioned method.

The present disclosure also provides a computer-readable storage medium, where a computer program is stored in the storage medium, and when the computer program is executed by a processor, the processor executes the method as described above.

Brief Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description serve to explain the principles of the disclosure.

In order to more clearly illustrate the technical solutions in the embodiments of the present disclosure or the related technologies, the drawings needed for describing the embodiments or the related technologies are briefly introduced below. Obviously, those of ordinary skill in the art can obtain other drawings from these drawings without any creative effort.

FIG. 1 is a flowchart of a search method for question and answer results according to Embodiment 1 of the present disclosure;

FIG. 2 is a flowchart of a method for generating a video cover according to Embodiment 1 of the present disclosure;

FIG. 3 is a schematic diagram of a video cover according to Embodiment 1 of the present disclosure;

FIG. 4 is a flowchart of a search method for question and answer results described in Embodiment 2 of the present disclosure;

FIG. 5 is a structural block diagram of a search device for question and answer results described in Embodiment 4 of the present disclosure;

FIG. 6 is a structural block diagram of a search device for question and answer results according to Embodiment 5 of the present disclosure.

Detailed Description

In order to understand the above features and advantages of the present disclosure more clearly, the solutions of the present disclosure will be further described below. It should be noted that, in the case of no conflict, the embodiments of the present disclosure and the features in the embodiments can be combined with each other.

In the following description, many specific details are set forth in order to provide a full understanding of the present disclosure, but the present disclosure can also be implemented in ways other than those described here; obviously, the embodiments in the description are only some of the embodiments of the present disclosure, not all of them.

In the video application provided by the related technology, the user can search for related videos and play them. However, within the video search results, users still need to spend time finding the content they need to watch, which reduces search efficiency. Based on this, embodiments of the present disclosure provide a search method, apparatus, device, and storage medium for question and answer results, which are described in detail below.

Embodiment one:

Referring to the flow chart of the search method for question and answer results shown in FIG. 1 , the method can be applied to terminal devices with video applications, such as mobile phones and tablet computers. The search method for the question and answer results includes the following steps:

Step S102, obtaining video information corresponding to the search question, wherein the video information includes a video cover, the video cover is composed of a plurality of key frame images of the video, each key frame image includes text content and image content for answering the search question, and the number of key frame images matches the number of answer points for answering the search question.

In an embodiment, the user sends a search request including a search question to the server through the terminal device; wherein, the search question may contain at least one keyword.

For ease of understanding, an implementation manner in which the server feeds back video information corresponding to the search question is given below.

First, based on the search question, the server searches for a video containing the answer to the search question. In implementation, the server may first search, based on the search question, for an answer that matches the keywords in the search question, and then search a preset video resource library for a video containing that answer.

Then, based on the audio signal or the text content of the video, a plurality of answer points of the answer are determined. The server may recognize the content of the video, such as its text content, audio signal, image content and playing time, so as to determine multiple answer points of the answer based on the recognized content. For each answer point, a key frame image containing the text content and image content of the answer point is obtained from the video by processing; for example, the recognized text content and image content are used to determine, from the video, the key frame image that matches the answer point.

Thereafter, based on the multiple key frame images corresponding to the multiple answer points, the video cover of the video is generated, and the video information including the video cover is fed back to the terminal device. The video cover may be the result of splicing the key frame images together; the key frame images can be spliced in a variety of flexible ways, and a video cover formed in this way can present a better page display effect.

Step S104, displaying the video cover on the search result page.

The terminal device receives the video information and displays the video cover contained in the video information on the search result page.

Step S106, playing the video in response to detecting a trigger operation on the video cover.

In this embodiment, the terminal device judges whether a trigger operation on the video cover is detected; if a trigger operation is detected, the video is played in response to the trigger operation.

In the search method for question and answer results provided by the embodiments of the present disclosure, video information corresponding to a search question is obtained; the video information includes a video cover composed of a plurality of key frame images of the video, each key frame image includes text content and image content that answer the search question, and the number of key frame images matches the number of answer points for answering the search question; the video cover is displayed on the search results page; and the video is played in response to detecting a trigger operation on the video cover. In this technical solution, the key frame images that make up the video cover include text content and image content that answer the search question. The text content is in line with users' everyday reading habits and makes the search result more intuitive, while the image content is visually expressive and makes the search result more vivid. In addition, because the number of key frame images matches the number of answer points, the cover can represent the answer to the search question more completely. As a result, the video cover displayed on the search results page can comprehensively, intuitively and accurately reflect what the user is searching for, so that the user can quickly find the content they need to watch from the video cover. Playing the video in response to detecting a trigger operation on the video cover therefore effectively improves video search efficiency.

The arrangement of the key frame images has a great influence on the display effect of the resulting video cover. Based on this, this embodiment provides a display manner for the video cover: the arrangement order of the plurality of key frame images in the video cover matches the order of the answer points corresponding to the search question. By using key frame images arranged in order, the video cover highlights the order of the answer points and increases the degree of matching between the video cover and the answer to the search question.

In practical applications, a search based on the search question generally returns multiple videos, and the aspect ratios of these videos are not necessarily the same; for example, videos may be landscape or portrait. A video cover is usually taken from a video frame of the video, in which case the covers of different videos have different sizes. When the covers of multiple videos are displayed on the same search results page at the same time, the layout of the display interface becomes rather cluttered. However, if covers with different aspect ratios are simply forced to a uniform size, for example by displaying a portrait video with a landscape cover, the resulting landscape cover looks poor and it is difficult to guarantee image quality.

Therefore, in order to improve how the video cover is displayed on the search results page and make the interface more user-friendly, in this embodiment the key frame image is an image, cropped from a video frame of the video, that contains the text content and image content of an answer point. The text content of the answer point contained in the key frame image is recognized from the video frames of the video, and the image content of the answer point contained in the key frame image is matched from the video frames of the video based on the text content of the answer point.

In practical applications, the key frame images in this embodiment are generated by the server. For better understanding, the following describes how the key frame image for each answer point is obtained. Referring to FIG. 2, the process of obtaining from the video the key frame image containing the text content and image content of an answer point includes the following steps S202 to S208:

Step S202, performing character recognition on the video to obtain the text content of the answer point.

Most videos have subtitles, and subtitles reflect the video content fairly accurately. Therefore, in a specific embodiment, character recognition may be performed on each video frame of the video using OCR (Optical Character Recognition) technology or a detection model, to obtain candidate character recognition results for each video frame. It is then judged whether each candidate character recognition result matches any of the answer points. If the match succeeds, the candidate character recognition result matches the search question and can express an answer point of the search question; the successfully matched candidate character recognition result is therefore determined as a target character recognition result, and the target character recognition result, or the keywords in it, is used as the text content of the answer point.
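The matching of candidate character recognition results against answer points can be sketched as follows. This is only an illustrative sketch: the disclosure does not specify a matching criterion, so the word-overlap score, the 0.8 threshold, and the function names `word_overlap` and `match_answer_points` are all assumptions, and the OCR step itself is represented by a pre-computed list of (frame index, text) pairs.

```python
def word_overlap(candidate, answer_point):
    """Fraction of the answer point's words that also appear in the candidate text."""
    point_words = set(answer_point.lower().split())
    if not point_words:
        return 0.0
    candidate_words = set(candidate.lower().split())
    return len(point_words & candidate_words) / len(point_words)


def match_answer_points(candidates, answer_points, threshold=0.8):
    """Map each answer point to the first (frame_index, text) candidate that matches it.

    The threshold is an assumed value; candidates that match no answer point are dropped.
    """
    targets = {}
    for frame_index, text in candidates:
        for point in answer_points:
            if point not in targets and word_overlap(text, point) >= threshold:
                targets[point] = (frame_index, text)
    return targets


# Hypothetical OCR output: (frame index, recognized subtitle text)
candidates = [
    (0, "Welcome to my channel"),
    (48, "Step 1: open phone settings"),
    (120, "Step 2: clear app cache"),
]
answer_points = ["open phone settings", "clear app cache"]
targets = match_answer_points(candidates, answer_points)
```

Here the first subtitle matches no answer point and is discarded, while the other two become target character recognition results for their respective answer points.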

Generally speaking, a video shows the complete process of something being done, and the answer determined based on the video should also include multiple logically connected answer points. For example, in tutorial videos on topics such as cooking, handicrafts, and fitness, the displayed content usually includes multiple answer points, each corresponding to a key step. Accordingly, there are usually multiple pieces of text content obtained according to the above embodiment, one for each answer point.

Step S204, based on the text content of the answer points, identify the image content matching the text content from the video frame of the video.

When the total number of frames of the video is small, the image content matching the text content can be identified one by one from the video frames of the video.

When the total number of frames of the video is large, the information contained in the video is richer and more diverse. In order to improve recognition efficiency, this embodiment may first segment the video to obtain a plurality of video clips, and then identify the image content matching the text content from the video frames of at least some of the video clips. For a specific implementation, reference may be made to steps (1) and (2) below.

(1) Based on the candidate character recognition results obtained by performing character recognition on the video, calculate the correlation between the characters of each two consecutive video frames. When the calculated correlation is less than a predetermined correlation threshold, segment the video at the position of those two consecutive video frames, to obtain multiple video clips containing characters.

(2) Based on the text content of the answer point, identify the image content matching the text content from the video frames of the video clips.
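Step (1) can be illustrated with a minimal sketch. The disclosure does not define how the correlation between the characters of two frames is computed, so the word-level Jaccard similarity used here, the 0.5 threshold, and the names `text_correlation` and `segment_by_text` are assumptions made purely for illustration.

```python
def text_correlation(text_a, text_b):
    """Jaccard similarity between the word sets of two recognized texts (assumed measure)."""
    words_a, words_b = set(text_a.lower().split()), set(text_b.lower().split())
    if not words_a and not words_b:
        return 1.0  # two frames without text are treated as fully correlated
    return len(words_a & words_b) / len(words_a | words_b)


def segment_by_text(frame_texts, threshold=0.5):
    """Cut the frame sequence wherever consecutive texts correlate below the threshold.

    Returns a list of (first_frame_index, last_frame_index) pairs, one per clip.
    """
    if not frame_texts:
        return []
    clips = [[0, 0]]
    for index in range(1, len(frame_texts)):
        if text_correlation(frame_texts[index - 1], frame_texts[index]) < threshold:
            clips.append([index, index])  # low correlation: start a new clip here
        else:
            clips[-1][1] = index          # same text region: extend the current clip
    return [tuple(clip) for clip in clips]


# Hypothetical per-frame OCR results
frame_texts = [
    "mince the meat",
    "mince the meat",
    "beat the eggs",
    "beat the eggs",
    "steam for ten minutes",
]
clips = segment_by_text(frame_texts)
```

With these inputs the subtitle changes after frame 1 and after frame 3, so the sketch yields three clips.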

Considering that the chronological order of video playback is consistent with the order of the answer points, the chronological order of the video clips is also related to the order of the answer points. In this case, at least one video clip may be assigned to each answer point; for the current answer point, based on the text content of the current answer point, the image content matching the text content is identified from the video frames of the video clip assigned to the current answer point. A specific implementation may include: recognizing the image content in each video frame of the video clip one by one, and judging whether the image recognition result matches the text content of the current answer point; if so, the image content matching the text content is obtained.

Of course, if image content matching the text content of the current answer point cannot be identified from the assigned video clip, the image content matching the text content of the current answer point is identified from the other video clips. This approach does not need to compare all the video frames of the video with the text content of every answer point: for each answer point, only the video frames in some of the video clips need to be compared, and since each answer point is assigned its own video clips, image content can be identified for multiple answer points at the same time. Therefore, this approach can effectively improve the efficiency of image content identification.
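The assign-then-fall-back search described above might look like the following sketch. Everything concrete in it is assumed: image recognition is stood in for by a hypothetical list of one label per frame, a label "matches" when it occurs in the answer point's text, and clips are distributed over answer points evenly in playback order.

```python
def assign_clips(num_points, num_clips):
    """Assign clip indices to answer points in playback order, as evenly as possible."""
    assignment = {point: [] for point in range(num_points)}
    for clip in range(num_clips):
        point = min(clip * num_points // num_clips, num_points - 1)
        assignment[point].append(clip)
    return assignment


def find_matching_frame(point_text, assigned, clips, frame_labels):
    """Search the assigned clips first, then the remaining clips; return a frame index or None."""
    others = [clip for clip in range(len(clips)) if clip not in assigned]
    for clip in list(assigned) + others:
        first, last = clips[clip]
        for frame in range(first, last + 1):
            label = frame_labels[frame]
            if label and label in point_text:  # assumed matching rule
                return frame
    return None


clips = [(0, 1), (2, 3), (4, 5)]
# Hypothetical image-recognition label per frame ("" means nothing recognized)
frame_labels = ["", "eggs", "meat", "", "steamer", ""]
point_texts = ["mince the meat", "beat the eggs", "put it in the steamer"]

assignment = assign_clips(len(point_texts), len(clips))
frames = [
    find_matching_frame(text, assignment[i], clips, frame_labels)
    for i, text in enumerate(point_texts)
]
```

In this example the first two answer points find nothing in their own clips and fall back to the other clips, while the third is resolved within its assigned clip. Because each answer point searches independently, the per-point calls could also be run concurrently.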

Step S206, capturing image content from the video frame.

In this embodiment, after the image content matching the text content is identified from a video frame, the corresponding video frame is captured to obtain a captured image including the image content.

Step S208, generating, based on the text content and image content of the answer point, a key frame image containing the answer point.

In order to preserve the text content and image content of the answer point more completely, this embodiment can generate a key frame image containing the answer point in the following manner:

Perform target detection on the captured image to obtain a bounding box that encloses the text content and image content of the answer point in the captured image. Expand the bounding box according to a preset aspect ratio. The aspect ratio is preset in order to facilitate splicing of the key frame images and to reduce size mismatches between images during splicing. Expanding means that, when one dimension of the bounding box is too small relative to the preset aspect ratio, that smaller dimension is increased so that the aspect ratio of the adjusted bounding box equals the preset aspect ratio. For example, when the bounding box is narrower than the preset aspect ratio requires, the bounding box is adjusted by increasing its width. Because the box is only ever enlarged, no part of the text content or image content is cut off, which ensures the integrity of the text content and image content. Based on the position parameters of the adjusted bounding box in the captured image, the image region determined by the position parameters is cropped from the captured image to obtain a key frame image containing the answer point.
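The expand-only adjustment of the bounding box can be sketched as below. The function name `expand_box`, the (left, top, right, bottom) box convention, and the choice to grow the box around its center and then shift it back inside the image are assumptions; the disclosure only requires that the smaller dimension be increased until the preset aspect ratio is reached.

```python
def expand_box(box, target_ratio, image_size):
    """Grow a (left, top, right, bottom) box to target_ratio (width / height).

    Only the too-small dimension is increased, so nothing inside the original
    box is cut off. The box is then shifted to stay inside the image where
    possible; a box larger than the image would need padding, which this
    sketch does not handle.
    """
    left, top, right, bottom = box
    image_width, image_height = image_size
    width, height = right - left, bottom - top

    if width / height < target_ratio:
        new_width, new_height = height * target_ratio, height  # too narrow: widen
    else:
        new_width, new_height = width, width / target_ratio    # too flat: heighten

    center_x, center_y = (left + right) / 2, (top + bottom) / 2
    new_left = center_x - new_width / 2
    new_top = center_y - new_height / 2

    # Shift the expanded box back inside the image bounds.
    new_left = min(max(new_left, 0), max(image_width - new_width, 0))
    new_top = min(max(new_top, 0), max(image_height - new_height, 0))
    return (new_left, new_top, new_left + new_width, new_top + new_height)


# A tall 100x200 box expanded to a square inside a 640x360 captured image
square = expand_box((100, 50, 200, 250), target_ratio=1.0, image_size=(640, 360))
# The same shape touching the left edge is shifted rather than clipped
edge = expand_box((0, 0, 100, 200), target_ratio=1.0, image_size=(640, 360))
```

In both cases the result is a 200x200 box that still contains the whole original box.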

According to any one of the above embodiments, after the multiple key frame images corresponding to the multiple answer points are extracted from the video, the video cover of the video is generated based on the multiple key frame images corresponding to the multiple answer points. The implementation of this step can refer to the following:

Based on the order of the multiple answer points in the answer, the arrangement sequence of the multiple key frame images corresponding to the multiple answer points in the video cover is determined. Specifically, the image content corresponding to the text content of each answer point is sorted according to the order of the answer points in the answer, and the sorting result of the image content is used as the arrangement order between the key frame images.

Based on the sizes of the multiple key frame images and the arrangement order among the multiple key frame images, a corresponding splicing template is determined. In a specific implementation, the splicing template may be determined based on the number, arrangement order and sizes of the key frame images. For example, if there are two key frame images arranged one after the other, a splicing template is selected that can splice two images and that reflects the arrangement order, for example top-to-bottom or left-to-right. The aspect ratio of each key frame image is determined from the size of the key frame image, and a splicing template matching that aspect ratio can be selected. Those skilled in the art may also determine the splicing template based on other image parameters according to actual needs, which is not specifically limited here. The splicing template in this embodiment may be preset in a splicing template library for calling.
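As a rough illustration of ordering the key frame images and picking a template, consider the sketch below. The template library contents, the 600-pixel-wide canvas, and the rule "portrait images go side by side, landscape images are stacked" are all hypothetical; the disclosure leaves the selection criteria open.

```python
# Hypothetical template library: (image count, layout) -> regions as (x, y, width, height),
# listed in reading order so that the i-th ordered key frame goes into the i-th region.
TEMPLATE_LIBRARY = {
    (2, "side_by_side"): [(0, 0, 300, 400), (300, 0, 300, 400)],
    (2, "stacked"): [(0, 0, 600, 200), (0, 200, 600, 200)],
    (3, "side_by_side"): [(0, 0, 200, 400), (200, 0, 200, 400), (400, 0, 200, 400)],
    (3, "stacked"): [(0, 0, 600, 133), (0, 133, 600, 133), (0, 266, 600, 134)],
}


def order_key_frames(key_frames):
    """Sort (answer_point_index, (width, height)) pairs by answer point order."""
    return sorted(key_frames, key=lambda key_frame: key_frame[0])


def choose_template(ordered_key_frames):
    """Pick a template by image count and mean aspect ratio (assumed selection rule)."""
    sizes = [size for _, size in ordered_key_frames]
    mean_ratio = sum(width / height for width, height in sizes) / len(sizes)
    layout = "side_by_side" if mean_ratio < 1 else "stacked"
    return TEMPLATE_LIBRARY[(len(sizes), layout)]


# Key frames arrive out of order; each is tagged with its answer point index.
key_frames = [(1, (300, 400)), (0, (300, 400))]
ordered = order_key_frames(key_frames)
template = choose_template(ordered)
```

A production template library would presumably cover more counts and layouts; an unsupported combination raises `KeyError` in this sketch.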

Based on the size of each region in the splicing template, the key frame image corresponding to each region is scaled proportionally, and the scaled key frame image is inserted into the corresponding region of the template to obtain the video cover of the video. For the current region, this embodiment calculates a scaling ratio from the size of the current region and the size of the key frame image corresponding to the current region, scales the key frame image proportionally by the calculated scaling ratio, and inserts the scaled key frame image into the current region.
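The scaling-ratio calculation can be sketched as follows. The disclosure does not say whether the image should fit inside the region or fill it, so this sketch assumes "fit inside" (the smaller of the two per-axis ratios) and centers the result; the name `fit_into_region` and the (x, y, width, height) region convention are likewise assumptions.

```python
def fit_into_region(image_size, region):
    """Compute the proportional scale and paste position for one template region.

    Returns (scale, (scaled_width, scaled_height), (paste_x, paste_y)).
    Using min() keeps the whole key frame visible inside the region.
    """
    image_width, image_height = image_size
    region_x, region_y, region_width, region_height = region

    scale = min(region_width / image_width, region_height / image_height)
    scaled_width = round(image_width * scale)
    scaled_height = round(image_height * scale)

    # Center the scaled image inside the region.
    paste_x = region_x + (region_width - scaled_width) // 2
    paste_y = region_y + (region_height - scaled_height) // 2
    return scale, (scaled_width, scaled_height), (paste_x, paste_y)


# A 400x400 key frame placed into the second 300x300 region of a template
exact = fit_into_region((400, 400), (300, 0, 300, 300))
# An 800x400 key frame placed into a 300x300 region leaves vertical margins
letterboxed = fit_into_region((800, 400), (0, 0, 300, 300))
```

When the key frame images have already been cropped to the preset aspect ratio of the regions, both per-axis ratios coincide and no margin remains.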

According to the above embodiment, Fig. 3 provides several examples of displaying video cover on the search result page. The left picture in Figure 3 shows the video cover obtained after the key frame image extraction method based on the text content extracts three key frame images, and then intercepts and stitches the key frame images. It can be seen that each key frame image of the video cover shown in the left picture of Figure 3 includes the key points of the answer to the search question in the search box (how to delete mobile phone garbage) and the content of this article corresponding to the key points of the answer (such as the subtitle: "Correctly clean up Mobile phone memory") and image content; at the same time, the order of arrangement of multiple key frame images from left to right and from top to bottom in the video cover matches the order of the answer points corresponding to the search question. Each key frame image of the video cover shown in the right figure of Figure 3 contains the image content associated with the key points of the answer to the search question, which are: minced meat, egg liquid, and steamed egg with minced meat.

In this embodiment, the video information corresponding to the search question acquired by the terminal device may also include a time stamp, which is used to indicate the time on the video playing time axis of the key frame image in the video cover.

Based on this, this embodiment provides a method for playing a video in response to detecting a trigger operation on the video cover, including:

In response to detecting a trigger operation on a first key frame image on the video cover, determining the time mark corresponding to the first key frame image on the playback time axis of the video; jumping to the playback page of the video, and starting to play the video from the time mark.

The terminal device determines whether a trigger operation is detected; the trigger operation is an operation on the first key frame image of the video cover on the search result page. For example, when the display screen of the terminal device is a touch screen, the trigger operation may be a touch operation by an operating body such as a stylus; when the input device of the terminal device is a mouse, the trigger operation may be a click performed by the user on the search result page with the mouse.

If a trigger operation is detected, then in response to the trigger operation, the time mark corresponding to the first key frame image on the playback time axis of the video is determined; the terminal device jumps to the video playback page and starts playing the video from the time mark, that is, based on the time mark, playback starts with the first key frame image as the starting frame.
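
As a rough illustration of how a terminal device might resolve a trigger operation to a time mark, the sketch below hit-tests the touch or click position against the areas of the video cover. The `CoverRegion` structure, its field names, and the use of seconds for the time mark are assumptions made for this example; the disclosure only requires that each key frame image correspond to a time mark.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CoverRegion:
    """Hypothetical area of the video cover occupied by one key frame image."""
    x: int
    y: int
    width: int
    height: int
    time_mark: float  # assumed unit: seconds on the playback time axis

    def contains(self, px: int, py: int) -> bool:
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def playback_start(regions: List[CoverRegion], px: int, py: int) -> Optional[float]:
    """Return the time mark of the key frame image under (px, py), if any."""
    for region in regions:
        if region.contains(px, py):
            return region.time_mark
    return None  # the trigger operation did not land on a key frame image
```

A player would then open the playback page and seek to the returned value; when `None` is returned, falling back to playing from the beginning is one possible design choice.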

For ease of understanding, an embodiment in which the server generates the time stamp is given below. First, the time stamp corresponding to each key frame image on the playback time axis of the video is determined.

In this embodiment, various implementation manners may be adopted to determine the time identifier corresponding to each key frame image, and the following two manners are taken as examples.

Implementation mode 1: For each key frame image, match the key frame image against the video frames of the video and determine the target video frame that matches the key frame image; determine the time identifier corresponding to the target video frame as the time identifier corresponding to the key frame image on the playback time axis of the video.

Implementation mode 2: According to the mapping relationship between each video frame and the playing time on the playing time axis of the video, the corresponding time identifier of each key frame image on the playing time axis of the video is determined.

After the time stamp corresponding to each key frame image is determined according to the above method, the corresponding relationship between each key frame image and the time stamp is added to the video information.
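
The two implementation modes can be sketched as follows. The disclosure does not specify the matching criterion or the form of the mapping, so this example assumes that frames are flattened grayscale pixel sequences of equal length, that matching picks the video frame with the smallest sum of absolute differences, and that the mapping is frame index divided by a constant frame rate. All function names are hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[int]  # assumed: flattened grayscale pixels, same length for all frames


def time_mark_from_index(frame_index: int, fps: float) -> float:
    """Mode 2 (assumed mapping): frame index to seconds at a constant frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frame_index / fps


def best_matching_frame(key_frame: Frame, video_frames: List[Frame]) -> int:
    """Mode 1 (assumed criterion): index of the frame closest to the key frame."""
    if not video_frames:
        raise ValueError("video_frames must not be empty")

    def distance(a: Frame, b: Frame) -> int:
        return sum(abs(p - q) for p, q in zip(a, b))

    return min(range(len(video_frames)),
               key=lambda i: distance(key_frame, video_frames[i]))


def time_mark_by_matching(key_frame: Frame, video_frames: List[Frame], fps: float) -> float:
    """Find the target video frame, then reuse its time mark for the key frame."""
    return time_mark_from_index(best_matching_frame(key_frame, video_frames), fps)
```

Because a key frame image may have been intercepted from a video frame and combined with text, a real system would likely need a more tolerant comparison than raw pixel differences; the sketch only shows the flow from matched frame to time mark.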

To sum up, in the search method for question and answer results provided by the above embodiments, the terminal device obtains video information corresponding to the search question from the server; the video information includes a video cover that can be displayed on the search result page and a time stamp that can specify the video playback position. The video cover is composed of multiple key frame images, and the composition can be flexible, which enriches the ways in which the video cover can be displayed; the arrangement order of the key frame images matches the answer points corresponding to the search question, which reflects a certain logic. Because each key frame image that makes up the video cover includes text content and image content that answer the search question, the text content suits the user's everyday reading habits and makes the search results more intuitive, while the image content has strong visual impact and makes the search results more vivid. The video cover displayed on the search result page therefore makes the page friendlier and expresses what the user is searching the video for comprehensively, intuitively and accurately, so that the user can quickly find the content they need from the video cover; playing the video in response to the trigger operation thus effectively improves video search efficiency. The time stamp in the video information lets the user jump quickly to the desired position in the video, which improves the user experience.

Embodiment two:

Building on Embodiment one above, this embodiment also provides a search method for question and answer results, which is applied to a server. As shown in Fig. 4, the method comprises:

Step S402, receiving a search request from the terminal device, where the search request includes a search question;

Step S404, based on the search question, search to obtain a video containing an answer to the search question;

Step S406, based on the audio signal or text content of the video, determine multiple answer points of the answer;

Step S408, for each answer point, processing the video to obtain a key frame image containing text content and image content of the answer point;

Step S410, generating a video cover of the video based on multiple key frame images corresponding to multiple answer points;

Step S412, feeding back the video information including the video cover to the terminal device.

In one embodiment, the step of processing the video to obtain a key frame image containing text content and image content of the answer point includes:

Performing character recognition processing on the video to obtain the text content of the answer point; identifying, from the video frames of the video, image content that matches the text content of the answer point; intercepting the image content from the video frame; and generating, based on the text content and the image content of the answer point, a key frame image containing the answer point.

In one embodiment, the step of generating the video cover of the video based on the multiple key frame images corresponding to the multiple answer points includes:

Determining, based on the order of the multiple answer points in the answer, the arrangement order in the video cover of the multiple key frame images corresponding to the multiple answer points; determining a corresponding splicing template based on the sizes of the multiple key frame images and the arrangement order among them; scaling the key frame image corresponding to each area proportionally based on the size of each area in the splicing template; and inserting the scaled key frame images into the corresponding areas of the template to obtain the video cover of the video.
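
The ordering and template steps can be sketched as below. The disclosure does not describe how a splicing template is represented or selected, so this example makes a simplifying assumption: the template is a grid of equal cells filled from left to right and from top to bottom, and image sizes are ignored when choosing it. `grid_template` and `assign_key_frames` are hypothetical names.

```python
from typing import Dict, List, Tuple

Region = Tuple[int, int, int, int]  # (x, y, width, height)


def grid_template(count: int, cover_w: int, cover_h: int, columns: int) -> List[Region]:
    """Hypothetical splicing template: equal cells listed in row-major order."""
    if count <= 0 or columns <= 0:
        raise ValueError("count and columns must be positive")
    rows = -(-count // columns)  # ceiling division
    cell_w, cell_h = cover_w // columns, cover_h // rows
    return [((i % columns) * cell_w, (i // columns) * cell_h, cell_w, cell_h)
            for i in range(count)]


def assign_key_frames(key_frames: Dict[int, str],
                      template: List[Region]) -> List[Tuple[str, Region]]:
    """Pair key frame images with template areas in answer-point order.

    key_frames maps the position of an answer point in the answer to the
    identifier of its key frame image.
    """
    ordered = [key_frames[position] for position in sorted(key_frames)]
    if len(ordered) != len(template):
        raise ValueError("template must have one area per key frame image")
    return list(zip(ordered, template))
```

With three answer points and a two-column grid on a 600x400 cover, the first two key frame images fill the top row and the third starts the second row, which matches the left-to-right, top-to-bottom arrangement described for Fig. 3.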

In one embodiment, before the step of feeding back the video information including the video cover to the terminal device, the above method further includes:

Determining the corresponding time marks of each key frame image on the playing time axis of the video; adding the corresponding relationship between each key frame image and the time mark to the video information.
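
One possible shape for the resulting video information is sketched below. The field names (`video_id`, `cover`, `key_frames`, `time_mark`) and the use of JSON are assumptions for illustration only; the disclosure requires only that the correspondence between each key frame image and its time mark be carried in the video information.

```python
import json
from typing import Dict, List


def build_video_info(video_id: str, cover_url: str,
                     key_frame_time_marks: List[float]) -> Dict:
    """Attach the key-frame-to-time-mark correspondence to the video information."""
    return {
        "video_id": video_id,   # hypothetical identifier of the video
        "cover": cover_url,     # hypothetical location of the video cover
        "key_frames": [
            {"index": index, "time_mark": time_mark}
            for index, time_mark in enumerate(key_frame_time_marks)
        ],
    }


def serialize_video_info(info: Dict) -> str:
    """Serialize the video information for feedback to the terminal device."""
    return json.dumps(info)
```

Here the position of a key frame image in the cover doubles as its index, so the terminal device can look up the time mark of whichever key frame image receives the trigger operation.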

In one embodiment, the step of determining the time mark corresponding to each key frame image on the playback time axis of the video includes:

For each key frame image, matching the key frame image against the video frames of the video and determining the target video frame that matches the key frame image; and determining the time identifier corresponding to the target video frame as the time mark corresponding to the key frame image on the playback time axis of the video.

Alternatively, determining the time identifier corresponding to each key frame image on the playback time axis of the video according to the mapping relationship between each video frame and the playback time on the playback time axis of the video.

In the search method for question and answer results provided by the embodiments of the present disclosure, the server searches for a video containing the answer to the search question, so the obtained video closely matches the answer, and the video information generated from that video likewise closely matches the answer. The key frame images determined from the video contain the text content and image content of the answer points, and can therefore reflect the answer points intuitively and vividly. Furthermore, the video cover generated from the multiple key frame images can fully, intuitively and accurately express what the user is searching the video for. Feeding back the video information including the video cover to the terminal device lets users quickly find the content they need through the video cover, which significantly improves the user experience of the video search function.

The implementation principle and technical effects of the method provided in this embodiment are the same as those in the first embodiment. For brevity, for parts not mentioned in this embodiment, refer to the corresponding content in the first embodiment.

Embodiment three:

According to the first and second embodiments above, this embodiment can also provide a search method for question and answer results, the method comprising:

Step 1, the terminal device sends a search request to the server, and the search request includes a search question.

Step 2, the server receives the search request from the terminal device.

Step 3, based on the search question, the server searches to obtain a video containing an answer to the search question.

Step 4, the server determines multiple answer points of the answer based on the audio signal or the text content of the video.

Step 5, for each answer point, the server processes the video to obtain a key frame image containing text content and image content of the answer point.

Step 6, the server generates a video cover of the video based on the multiple key frame images corresponding to the multiple answer points; the video cover is composed of multiple key frame images of the video, each key frame image includes text content and image content that answer the search question, and the number of key frame images matches the number of answer points that answer the search question.

Step 7, the server determines the time mark corresponding to each key frame image on the playback time axis of the video, and adds the corresponding relationship between each key frame image and the time mark to the video information.

Step 8, the server feeds back the video information including the video cover to the terminal device.

Step 9, the terminal device acquires video information corresponding to the search question.

Step 10, the terminal device displays the video cover on the search result page.

Step 11, the terminal device determines the time mark corresponding to the first key frame image on the playback time axis of the video in response to detecting a trigger operation on the first key frame image on the video cover;

Step 12, the terminal device jumps to the video playing page, and starts playing the video from the time mark.

The implementation principle and technical effects of the method provided in this embodiment are the same as those of the first and second embodiments above. For brevity, for parts not mentioned in this embodiment, refer to the corresponding content in the first and second embodiments above.

Embodiment four:

According to the first embodiment above, the present disclosure also provides a search device for question and answer results, which can be applied to a terminal device. As shown in Fig. 5, the device includes:

An information obtaining module 502, configured to obtain video information corresponding to the search question; wherein the video information includes a video cover, the video cover is composed of a plurality of key frame images of the video, each key frame image includes text content and image content that answer the search question, and the number of key frame images matches the number of answer points that answer the search question;

A cover display module 504, configured to display the video cover on the search result page;

The video playing module 506 is configured to play the video in response to detecting a trigger operation on the video cover.

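In one embodiment, the above-mentioned video playing module 506 is specifically configured to: in response to detecting a trigger operation on a first key frame image on the video cover, determine the time mark corresponding to the first key frame image on the playback time axis of the video; and jump to the playback page of the video and start playing the video from the time mark.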

Embodiment five:

According to the second embodiment above, the present disclosure also provides a search device for question and answer results, which can be applied to a server. As shown in Fig. 6, the device includes:

A request receiving module 602, configured to receive a search request from a terminal device, where the search request includes a search question;

A video search module 604, configured to search for a video containing an answer to the search question based on the search question;

A key point determination module 606, configured to determine multiple answer points of the answer based on the audio signal or text content of the video;

An image obtaining module 608, configured to, for each answer point, process the video to obtain a key frame image containing the text content and image content of the answer point;

A cover generation module 610, configured to generate the video cover of the video based on multiple key frame images corresponding to the multiple answer points;

The information feedback module 612 is configured to feed back the video information including the video cover to the terminal device.

In one embodiment, the image obtaining module 608 includes:

The text content obtaining unit is used to perform character recognition processing on the video to obtain the text content of the key points of the answer;

An image content recognition unit, configured to identify image content matching the text content from the video frame of the video based on the text content of the answer points;

An image content intercepting unit, configured to intercept image content from a video frame;

The image generating unit is configured to generate a key frame image containing the key points of the answer based on the text content and the image content of the key points of the answer.

In one embodiment, the image obtaining module 608 includes:

A sequence determination unit is used to determine the arrangement order of multiple key frame images corresponding to multiple answer points in the video cover based on the order of multiple answer points in the answer;

A template determination unit, configured to determine a corresponding splicing template based on the size of the multiple key frame images and the arrangement sequence between the multiple key frame images;

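A scaling processing unit, configured to perform proportional scaling processing on the key frame images corresponding to each area based on the size of each area in the splicing template;

An image inserting unit, configured to insert the scaled key frame images into the corresponding areas of the template to obtain the video cover of the video.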

In one embodiment, the above-mentioned device also includes:

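A time stamp determining module, configured to determine the time stamp corresponding to each key frame image on the playback time axis of the video;

A relationship adding module, configured to add the corresponding relationship between each key frame image and the time stamp to the video information.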

In one embodiment, the time stamp determination module includes:

A matching unit, configured to, for each key frame image, match the key frame image with the video frames in the video and determine the target video frame that matches the key frame image;

A first identifier determining unit, configured to determine the time identifier corresponding to the target video frame as the time identifier corresponding to the key frame image on the playback time axis of the video.

In one embodiment, the time stamp determination module includes: a second identifier determination unit, configured to determine the time identifier corresponding to each key frame image on the playback time axis of the video according to the mapping relationship between each video frame and the playback time on the playback time axis of the video.

The implementation principle and technical effects of the device provided in this embodiment are the same as those of the first to third embodiments above. For brevity, for parts not mentioned in this embodiment, refer to the corresponding content in the first to third embodiments above.

Based on the foregoing embodiments, this embodiment provides a terminal device, including a memory and a processor, wherein a computer program is stored in the memory, and when the computer program is executed by the processor, the processor executes the methods in the first to third embodiments above.

This embodiment also provides a computer-readable storage medium, in which a computer program is stored. When the computer program is executed by a processor, the processor executes the methods in the first to third embodiments above.

It should be noted that, in this document, relational terms such as "first" and "second" are only used to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Furthermore, the terms "comprises", "includes" or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus comprising a set of elements includes not only those elements but also other elements not expressly listed, or elements inherent in such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus comprising said element.

The above descriptions are only specific implementation manners of the present disclosure, so that those skilled in the art can understand or implement the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present disclosure. Therefore, the present disclosure will not be limited to the embodiments described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

A search method for question and answer results, comprising:

Acquiring video information corresponding to a search question; wherein the video information includes a video cover, the video cover is composed of a plurality of key frame images of a video, each of the key frame images includes text content and image content that answer the search question, and the number of the key frame images matches the number of answer points that answer the search question;

Displaying the video cover on the search result page;

Playing the video in response to detecting a trigger operation on the video cover.
The method according to claim 1, wherein the arrangement order of the plurality of key frame images in the video cover matches the order of the answer points corresponding to the search question.
The method according to any one of claims 1-2, wherein the key frame image is an image, obtained by intercepting from a video frame of the video, that includes the text content and the image content of the answer point.
The method according to any one of claims 1-3, wherein the text content of the answer point contained in the key frame image is recognized from the video frames of the video, and the image content of the answer point contained in the key frame image is obtained by matching from the video frames of the video based on the text content of the answer point.
The method according to any one of claims 1-4, wherein playing the video in response to detecting a trigger operation on the video cover comprises:

In response to detecting a trigger operation on a first key frame image on the video cover, determining the time mark corresponding to the first key frame image on the playback time axis of the video;

Jumping to the playback page of the video, and starting to play the video from the time mark.
A search method for question and answer results, comprising:

receiving a search request from a terminal device, where the search request includes a search question;

Searching, based on the search question, to obtain a video containing an answer to the search question;

determining a plurality of answer points of the answer based on the audio signal or the text content of the video;

For each answer point, processing the video to obtain a key frame image containing text content and image content of the answer point;

Generating a video cover of the video based on a plurality of key frame images corresponding to the plurality of answer points;

Feeding back video information including the video cover to the terminal device.
The method according to claim 6, wherein processing the video to obtain the key frame image containing the text content and image content of the answer point comprises:

Performing character recognition processing on the video to obtain the text content of the answer point;

identifying, from video frames of the video, image content matching the text content based on the text content of the answer points;

intercepting the image content from the video frame;

Generating, based on the text content of the answer point and the image content, a key frame image containing the answer point.
The method according to claim 6 or 7, wherein generating the video cover of the video based on the plurality of key frame images corresponding to the plurality of answer points comprises:

Determining, based on the order of the plurality of answer points in the answer, the arrangement order in the video cover of the plurality of key frame images corresponding to the plurality of answer points;

Determining a corresponding splicing template based on the sizes of the plurality of key frame images and the arrangement order among the plurality of key frame images;

Scaling the key frame image corresponding to each area proportionally based on the size of each area in the splicing template;

Inserting the scaled key frame images into the corresponding areas of the template to obtain the video cover of the video.
The method according to any one of claims 6-8, further comprising, before feeding back the video information including the video cover to the terminal device:

Determining the time mark corresponding to each key frame image on the playback time axis of the video;

Adding the corresponding relationship between each key frame image and the time mark to the video information.
The method according to claim 9, wherein determining the time mark corresponding to each key frame image on the playback time axis of the video comprises:

For each key frame image, matching the key frame image with video frames in the video, and determining a target video frame in the video frames that matches the key frame image;

Determining the time mark corresponding to the target video frame as the time mark corresponding to the key frame image on the playback time axis of the video.
The method according to any one of claims 9-10, wherein determining the time mark corresponding to each key frame image on the playback time axis of the video comprises:

Determining, according to the mapping relationship between each video frame and the playback time on the playback time axis of the video, the time mark corresponding to each key frame image on the playback time axis of the video.
A search device for question and answer results, comprising:

An information acquisition module configured to acquire video information corresponding to a search question; wherein the video information includes a video cover, the video cover is composed of a plurality of key frame images of a video, each of the key frame images includes text content and image content that answer the search question, and the number of the key frame images matches the number of answer points that answer the search question;

A cover display module configured to display the video cover on the search result page;

The video playing module is configured to play the video in response to detecting a trigger operation on the video cover.
The device according to claim 12, wherein the arrangement order of the plurality of key frame images in the video cover matches the order of the answer points corresponding to the search question.
The device according to any one of claims 12-13, wherein the key frame image is an image, obtained by intercepting from a video frame of the video, that includes the text content and the image content of the answer point.
The device according to any one of claims 12-14, wherein the text content of the answer point contained in the key frame image is recognized from the video frames of the video, and the image content of the answer point contained in the key frame image is obtained by matching from the video frames of the video based on the text content of the answer point.
The device according to any one of claims 12-15, wherein the video playing module is configured to:

In response to detecting a trigger operation on a first key frame image on the video cover, determine the time mark corresponding to the first key frame image on the playback time axis of the video;

Jump to the playing page of the video, and start playing the video from the time mark.
A search device for question and answer results, comprising:

A request receiving module configured to receive a search request from a terminal device, where the search request includes a search question;

A video search module configured to search, based on the search question, to obtain a video containing an answer to the search question;

A point determining module configured to determine a plurality of answer points of the answer based on the audio signal or the text content of the video;

An image obtaining module configured to, for each answer point, process the video to obtain a key frame image containing text content and image content of the answer point;

The cover generation module is configured to generate a video cover of the video based on a plurality of key frame images corresponding to the plurality of answer points;

An information feedback module configured to feed back video information including the video cover to the terminal device.
The device according to claim 17, wherein the image obtaining module comprises:

The text content obtaining unit is configured to perform character recognition processing on the video to obtain the text content of the answer points;

an image content identification unit configured to identify, from video frames of the video, image content matching the text content based on the text content of the answer points;

an image content intercepting unit configured to intercept the image content from the video frame;

An image generation unit configured to generate a key frame image containing the answer points based on the text content of the answer points and the image content.
The device according to claim 17 or 18, wherein the image obtaining module comprises:

An order determination unit configured to determine the arrangement order of the plurality of key frame images corresponding to the plurality of answer points in the video cover based on the order of the plurality of answer points in the answer;

The template determination unit is configured to determine a corresponding splicing template based on the size of the plurality of key frame images and the arrangement sequence among the plurality of key frame images;

A scaling processing unit configured to perform proportional scaling processing on the key frame images corresponding to each area based on the size of each area in the splicing template;

The image inserting unit is configured to insert the scaled key frame image into the corresponding area of the template to obtain the video cover of the video.
The device according to any one of claims 17-19, further comprising:

A time stamp determination module configured to determine the time stamp corresponding to each key frame image on the playing time axis of the video;

The relationship adding module is configured to add the corresponding relationship between each key frame image and the time stamp to the video information.
The device according to claim 20, wherein the time stamp determining module comprises:

A matching unit configured to, for each key frame image, match the key frame image with a video frame in the video, and determine a target video frame in the video frame that matches the key frame image;

The first identifier determining unit is configured to determine the time identifier corresponding to the target video frame as the time identifier corresponding to the key frame image on the playing time axis of the video.
The device according to any one of claims 20-21, wherein the time stamp determining module comprises:

The second identifier determination unit is configured to determine the time identifier corresponding to each key frame image on the playback time axis of the video according to the mapping relationship between each video frame and the playback time on the playback time axis of the video.
A terminal device comprising:

A memory and a processor, wherein a computer program is stored in the memory, and when the computer program is executed by the processor, the processor performs the method according to any one of claims 1-5, or performs the method according to any one of claims 6-11.
A computer-readable storage medium, in which a computer program is stored, wherein when the computer program is executed by a processor, the processor performs the method according to any one of claims 1-5, or performs the method according to any one of claims 6-11.
A computer program product, comprising a computer program/instructions, wherein when the computer program/instructions are executed by a processor, the method according to any one of claims 1-5 is performed, or the method according to any one of claims 6-11 is performed.