Disclosure of Invention
In order to solve the technical problems described above or at least partially solve the technical problems, the present disclosure provides a method, an apparatus, a device, and a storage medium for searching question and answer results.
The present disclosure provides a method for searching question and answer results, including:
acquiring video information corresponding to the search question; the video information comprises a video cover, the video cover is composed of a plurality of key frame images of a video, each key frame image comprises text content and image content for answering the search question, and the number of the key frame images is matched with the number of answer key points for answering the search question; displaying the video cover in a search result page; and playing the video in response to detecting a triggering operation on the video cover.
Optionally, the arrangement order of the plurality of key frame images in the video cover is matched with the order of answer key points corresponding to the search question.
Optionally, the key frame image is an image that is obtained by capturing a video frame of the video and contains text content and image content of the answer key point.
Optionally, the text content of the answer key point included in the key frame image is identified from the video frame of the video, and the image content of the answer key point included in the key frame image is matched from the video frame of the video based on the text content of the answer key point.
Optionally, the playing the video in response to detecting the triggering operation on the video cover includes: in response to detecting a trigger operation on a first key frame image on the video cover, determining a corresponding time identifier of the first key frame image on a playing time axis of the video; and jumping to a playing page of the video, and starting to play the video from the time identifier.
The present disclosure also provides a method for searching question and answer results, including:
receiving a search request of a terminal device, wherein the search request comprises a search question;
searching and obtaining a video containing an answer to the search question based on the search question;
determining a plurality of answer key points of the answer based on the audio signal or text content of the video;
processing the video for each answer key point to obtain a key frame image containing the text content and image content of the answer key point;
generating a video cover of the video based on a plurality of key frame images corresponding to the plurality of answer key points;
and feeding back video information containing the video cover to the terminal device.
Optionally, processing the video to obtain a key frame image including text content and image content of the answer key point, including:
performing character recognition processing on the video to obtain text content of the key points of the answer; identifying image content matched with the text content from video frames of the video based on the text content of the answer key points; intercepting the image content from the video frame; and generating a key frame image containing the answer key points based on the text content and the image content of the answer key points.
Optionally, the generating a video cover of the video based on a plurality of key frame images corresponding to the plurality of answer key points includes:
determining the arrangement order of the plurality of key frame images corresponding to the plurality of answer key points in the video cover based on the order of the answer key points in the answer; determining a corresponding splicing template based on the sizes of the plurality of key frame images and the arrangement order among the plurality of key frame images; based on the size of each region in the splicing template, performing equal-scale scaling processing on the key frame image corresponding to each region; and inserting the scaled key frame image into the corresponding region of the splicing template to obtain the video cover of the video.
Optionally, before feeding back the video information including the video cover to the terminal device, the method further includes: determining a corresponding time identifier of each key frame image on the playing time axis of the video; and adding the corresponding relation between each key frame image and the time identifier into the video information.
Optionally, the determining the corresponding time identifier of each key frame image on the playing time axis of the video includes:
for each key frame image, matching the key frame image with a video frame in the video, and determining a target video frame matched with the key frame image in the video frame; and determining the time identifier corresponding to the target video frame as the time identifier corresponding to the key frame image on the playing time axis of the video.
Optionally, the determining the corresponding time identifier of each key frame image on the playing time axis of the video includes: and determining the corresponding time identifier of each key frame image on the playing time axis of the video according to the mapping relation between each video frame and the playing time on the playing time axis of the video.
The present disclosure also provides a device for searching question and answer results, including:
the information acquisition module is used for acquiring video information corresponding to the search question; the video information comprises a video cover, the video cover is composed of a plurality of key frame images of a video, each key frame image comprises text content and image content for answering the search question, and the number of the key frame images is matched with the number of answer key points for answering the search question;
the cover display module is used for displaying the video cover in a search result page;
and the video playing module is used for responding to the detection of the triggering operation of the video cover to play the video.
Optionally, the arrangement order of the plurality of key frame images in the video cover is matched with the order of answer key points corresponding to the search question.
Optionally, the key frame image is an image that is obtained by capturing a video frame of the video and contains text content and image content of the answer key point.
Optionally, the text content of the answer key point included in the key frame image is identified from the video frame of the video, and the image content of the answer key point included in the key frame image is matched from the video frame of the video based on the text content of the answer key point.
Optionally, the video playing module is configured to: in response to detecting a trigger operation on a first key frame image on the video cover, determining a corresponding time identifier of the first key frame image on a playing time axis of the video; jumping to a playing page of the video, and starting to play the video from the time identifier.
The present disclosure also provides a device for searching question and answer results, including:
the request receiving module is used for receiving a search request of a terminal device, wherein the search request comprises a search question;
the video searching module is used for searching and obtaining a video containing an answer to the search question based on the search question;
the key point determining module is used for determining a plurality of answer key points of the answer based on the audio signal or text content of the video;
the image obtaining module is used for processing the video for each answer key point to obtain a key frame image containing the text content and image content of the answer key point;
the cover generation module is used for generating a video cover of the video based on a plurality of key frame images corresponding to the plurality of answer key points;
and the information feedback module is used for feeding back video information containing the video cover to the terminal device.
Optionally, the image obtaining module includes:
a text content obtaining unit, configured to perform character recognition processing on the video to obtain the text content of the answer key point;
the image content identification unit is used for identifying image content matched with the text content from video frames of the video based on the text content of the answer key point;
an image content intercepting unit for intercepting the image content from the video frame;
and the image generating unit is used for generating a key frame image containing the answer key points based on the text content and the image content of the answer key points.
Optionally, the cover generation module includes:
the order determining unit is used for determining the arrangement order of a plurality of key frame images corresponding to a plurality of answer key points in the video cover based on the order of the answer key points in the answer;
the template determining unit is used for determining a corresponding splicing template based on the sizes of the plurality of key frame images and the arrangement sequence among the plurality of key frame images;
the zooming processing unit is used for carrying out equal-scale zooming processing on the key frame images corresponding to the regions based on the sizes of the regions in the splicing template;
and the image inserting unit is used for inserting the scaled key frame image into the corresponding region of the splicing template to obtain the video cover of the video.
Optionally, the apparatus further comprises:
the time identifier determining module is used for determining the corresponding time identifier of each key frame image on the playing time axis of the video;
and the relationship adding module is used for adding the corresponding relationship between each key frame image and the time identifier into the video information.
Optionally, the time identifier determining module includes:
the matching unit is used for matching, for each key frame image, the key frame image with the video frames in the video, and determining a target video frame matched with the key frame image among the video frames;
and the first identification determining unit is used for determining the time identification corresponding to the target video frame as the time identification corresponding to the key frame image on the playing time axis of the video.
Optionally, the time identifier determining module includes: and the second identifier determining unit is used for determining the corresponding time identifier of each key frame image on the playing time axis of the video according to the mapping relation between each video frame and the playing time on the playing time axis of the video.
The present disclosure also provides a terminal device, including a memory and a processor, wherein the memory stores a computer program, and the processor performs the method described above when executing the computer program.
The present disclosure also provides a computer-readable storage medium having stored therein a computer program which, when executed by a processor, performs the method as described above.
Compared with the prior art, the technical scheme provided by the embodiment of the disclosure has the following advantages:
the present disclosure provides a method, apparatus, device and storage medium for searching question and answer results, the method obtains video information corresponding to a search question; the video information comprises a video cover, the video cover is composed of a plurality of key frame images of a video, each key frame image comprises text content and image content for answering a search question, and the number of the key frame images is matched with the number of key points for answering the search question; displaying a video cover page in a search result page; and playing the video in response to detecting the triggering operation on the video cover. The method for searching the question and answer results can be exemplarily applied to video search scenes and the like of the terminal equipment. In the technical scheme, each key frame image forming the video cover comprises text content and image content for solving the search problem, wherein the text content is more in line with the daily reading habit of a user, the intuition of the search result can be improved, the image content has stronger picture feeling, and the vividness of the search result can be improved; the number of the key frame images is matched with the number of the key points of the answers, so that the answers under the search questions can be more comprehensively covered; based on this, the video cover displayed in the search result page can comprehensively, intuitively and accurately express the video searching desire of the user, so that the user can quickly find the content needing to be watched by the user from the video cover, and then the video is played in response to the detection of the triggering operation of the video cover, thereby effectively improving the video searching efficiency.
The method first receives a search request of a terminal device, wherein the search request comprises a search question; then searches and obtains a video containing an answer to the search question based on the search question; determines a plurality of answer key points of the answer based on the audio signal or text content of the video; processes the video for each answer key point to obtain a key frame image containing the text content and image content of the answer key point; generates a video cover of the video based on the plurality of key frame images corresponding to the plurality of answer key points; and finally feeds back video information containing the video cover to the terminal device. This method for searching question and answer results can be applied, for example, to video search scenarios on a server. In this technical scheme, because the video containing the answer to the search question is obtained by searching, the obtained video matches the answer to the search question closely, and the key frame images determined based on the video contain the text content and image content of the answer key points, so the answer key points can be represented intuitively and vividly. Furthermore, the video cover generated based on the plurality of key frame images can comprehensively, intuitively and accurately express the user's video search intent, and the video information containing the video cover is fed back to the terminal device, so that the user can quickly find the content to watch through the video cover, significantly improving the user experience of the video search function.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure will be further described below. It should be noted that the embodiments and features of the embodiments of the present disclosure may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced in other ways than those described herein; it is to be understood that the embodiments disclosed in the specification are only a few embodiments of the present disclosure, and not all embodiments.
In a video application provided by the related art, a user can search for a related video to play. However, among the video search results, the user needs to spend time finding the content they need to watch, which reduces search efficiency. Based on this, the embodiments of the present disclosure provide a method, an apparatus, a device, and a storage medium for searching question and answer results, described in detail below.
The first embodiment is as follows:
referring to a flowchart of a method for searching question and answer results shown in fig. 1, the method is applicable to terminal devices with video applications, such as mobile phones, tablet computers and the like. The method for searching the question and answer results comprises the following steps:
step S102, video information corresponding to the search question is obtained; the video information comprises a video cover, the video cover is composed of a plurality of key frame images of the video, each key frame image comprises text content and image content of the answering search question, and the number of the key frame images is matched with the number of answer key points of the answering search question.
In one embodiment, a user sends a search request including a search question to a server through a terminal device; wherein, the search question may contain at least one keyword.
For ease of understanding, one embodiment of the server feeding back video information corresponding to the search problem is given below.
Firstly, the server searches and obtains a video containing an answer to the search question based on the search question; in practice, the server may search for an answer matching the keyword in the search question, and then search for a video containing the answer in a preset video resource library.
Then, a plurality of answer key points of the answer are determined based on the audio signal or text content of the video. The server may recognize the content of the video, such as its text content, audio signal, image content, and playing time, so as to determine the plurality of answer key points of the answer based on the recognized content. The video is then processed for each answer key point to obtain a key frame image containing the text content and image content of the answer key point; specifically, for example, a key frame image matching the answer key point is determined from the video using the recognized text content and image content.
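The server-side flow just described can be sketched end to end. The sketch below is illustrative only: `build_video_info`, the keyword-containment search, and splitting the answer on semicolons are all hypothetical stand-ins for the search, recognition, and stitching steps detailed in later embodiments.

```python
def build_video_info(search_question, video_library):
    """Hypothetical server-side pipeline: find a video whose answer text
    mentions the search question, split the answer into key points, and
    produce one placeholder key frame image id per answer key point."""
    # Search: pick the first video whose answer matches the question.
    video = next(v for v in video_library if search_question in v["answer"])
    # Determine answer key points from the video's text content.
    key_points = [p.strip() for p in video["answer"].split(";")]
    # One key frame image per answer key point (placeholder ids here),
    # so the number of key frame images matches the number of key points.
    key_frames = [f"kf_{i}" for i in range(len(key_points))]
    return {"cover": key_frames, "key_points": key_points}

library = [{"answer": "clear phone memory; uninstall unused apps"}]
info = build_video_info("clear phone memory", library)
print(info["key_points"])
```

Note that the one-to-one coupling between key points and key frame images is the property the disclosure relies on; everything else here is scaffolding.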
Then, based on the plurality of key frame images corresponding to the plurality of answer key points, a video cover of the video is generated, and video information containing the video cover is fed back to the terminal device. The video cover may be the result of stitching the key frame images; the stitching modes are flexible and variable, and a video cover composed of key frame images can present a better page display effect.
And step S104, displaying the video cover in the search result page.
The terminal device receives the video information and displays the video cover included in the video information in a search result page.
And step S106, playing the video in response to the detection of the triggering operation on the video cover.
In this embodiment, the terminal device determines whether a trigger operation for the video cover is detected; and if the trigger operation is detected, playing the video in response to the trigger operation.
According to the method for searching question and answer results described above, video information corresponding to the search question is acquired; the video information comprises a video cover composed of a plurality of key frame images of the video, each key frame image comprises text content and image content for answering the search question, and the number of the key frame images is matched with the number of answer key points for answering the search question; the video cover is displayed in a search result page; and the video is played in response to detecting a triggering operation on the video cover. In this technical scheme, each key frame image composing the video cover comprises text content and image content for answering the search question: the text content better fits the user's daily reading habits and improves the intuitiveness of the search result, while the image content is more visually expressive and improves the vividness of the search result. The number of the key frame images is matched with the number of answer key points, so the answer to the search question is covered more comprehensively. On this basis, the video cover displayed in the search result page can comprehensively, intuitively and accurately express the user's video search intent, so that the user can quickly find the content to watch from the video cover, and the video is played in response to detecting a triggering operation on the video cover, effectively improving video search efficiency.
Considering that the arrangement of the key frame images has a great influence on the display effect of the resulting video cover, this embodiment provides a display mode of the video cover: the arrangement order of the plurality of key frame images in the video cover is matched with the order of the answer key points corresponding to the search question. The video cover highlights the order of the answer key points through the orderly arranged key frame images, improving the match between the video cover and the answer to the search question.
In practical applications, a plurality of videos are generally obtained by searching based on a search question, and the aspect ratios of these videos are not necessarily the same; for example, videos are divided into landscape videos and portrait videos. Video covers are usually taken from video frames of the videos, so the sizes of the video covers corresponding to different videos differ, and when the video covers of multiple videos are displayed simultaneously in the same search result page, the layout of the display interface becomes disordered. If the sizes of video covers with different aspect ratios are simply unified, for example displaying a portrait video with a landscape cover, the display effect is poor and the image quality is difficult to guarantee.
Therefore, in order to improve the display mode of the video cover in the search result page and the friendliness of the interface display, in this embodiment the key frame image is an image obtained by intercepting a video frame of the video and contains the text content and image content of the answer key point. The text content of the answer key point contained in the key frame image is obtained by recognition from the video frames of the video, and the image content of the answer key point contained in the key frame image is obtained by matching from the video frames of the video based on the text content of the answer key point.
In practical applications, the key frame image in this embodiment is generated by a server, and for better understanding, a description is given here on a manner of obtaining the key frame image for each answer point. Referring to fig. 2, the process of processing a key frame image containing text content and image content of an answer key point from a video includes the following steps S202 to S208:
step S202, character recognition processing is carried out on the video to obtain text content of key points of the answer.
Most videos have subtitles, and subtitles can reflect the video content accurately. Therefore, in a specific embodiment, character recognition processing can be performed on each video frame of the video through an OCR (Optical Character Recognition) technology or a detection model to obtain a candidate character recognition result for each video frame. It is then judged whether the candidate character recognition result can be matched with any answer key point. If so, that is, the current candidate character recognition result can be successfully matched with some answer key point, the candidate character recognition result matches the search question and can express an answer key point of the search question; the candidate character recognition result that can be successfully matched with the answer key point is therefore determined as the target character recognition result, and the target character recognition result, or a keyword in it, is used as the text content of the answer key point.
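A toy version of this candidate-to-key-point matching is sketched below; simple keyword containment stands in for whatever matching model is actually used, and the `(frame_index, text)` pair shape is an illustrative assumption about the OCR output.

```python
def text_content_for_key_points(frame_texts, answer_key_points):
    """For each answer key point, pick the first candidate character
    recognition result (e.g. an OCR'd subtitle line) that mentions it,
    and use that line as the key point's text content."""
    matched = {}
    for point in answer_key_points:
        for frame_index, text in frame_texts:
            if point in text:
                matched[point] = text  # target character recognition result
                break
    return matched

frames = [(10, "first, back up your photos"),
          (120, "correctly clear the phone memory"),
          (300, "finally, restart the device")]
print(text_content_for_key_points(frames, ["clear the phone memory",
                                           "restart the device"]))
```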
Generally, a video shows a complete process of how something is done, and an answer determined based on the video should also include multiple answer key points with an inherent logic. For example, for tutorial videos such as cooking videos, handicraft videos, and fitness videos, the displayed content usually includes a plurality of answer key points, and each answer key point corresponds to a key step. Thus, the text content obtained according to the above embodiment is generally also plural, in correspondence with the answer key points.
And step S204, based on the text content of the key points of the answer, identifying the image content matched with the text content from the video frames of the video.
When the total number of frames of the video is small, image contents matching the text contents can be identified one by one from the video frames of the video.
When the total frame number of the video is large, the information contained in the video is more abundant and diversified, and in order to improve the identification efficiency, the embodiment may perform segmentation processing on the video to obtain a plurality of video segments, and then identify the image content matched with the text content from the video frames of at least part of the video segments. Specific embodiments can refer to the following examples of steps (1) and (2).
(1) Calculating the correlation degree between every two continuous video frames of the video; and when the calculated correlation degree is smaller than a preset correlation degree threshold value, performing segmentation processing on the video at the position of the two continuous video frames to obtain a plurality of video segments containing text.
(2) And identifying image content matched with the text content from the video frames of the video clip based on the text content of the key points of the answer.
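Steps (1) and (2) hinge on a frame-to-frame correlation test. A toy version is sketched below, with frames represented as grayscale pixel lists and a tolerance-based pixel overlap standing in for a real correlation measure; the threshold value is an illustrative assumption.

```python
def segment_by_correlation(frames, threshold=0.8):
    """Split a frame sequence into segments wherever two continuous
    frames correlate poorly (a crude shot-boundary test)."""
    def correlation(a, b):
        # Fraction of pixel positions that are close in value.
        same = sum(1 for x, y in zip(a, b) if abs(x - y) <= 10)
        return same / max(len(a), 1)

    segments, current = [], [frames[0]]
    for prev, cur in zip(frames, frames[1:]):
        if correlation(prev, cur) < threshold:
            segments.append(current)   # boundary: close current segment
            current = []
        current.append(cur)
    segments.append(current)
    return segments

# Two flat "shots": dark frames followed by bright frames.
clips = segment_by_correlation([[0, 0, 0], [0, 0, 0],
                                [200, 200, 200], [200, 200, 200]])
print(len(clips))  # 2
```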
Considering that the playing order of the video is consistent with the order of the answer key points, the time order of the video segments is correspondingly related to the order of the answer key points. In this case, at least one video segment may be allocated to each answer key point, and the image content matched with the text content is identified from the video frames of the video segments allocated to the current answer key point, according to the text content of the current answer key point. A specific implementation may include: recognizing the image content in each video frame of the video segment one by one, judging whether the image recognition result matches the text content of the current answer key point, and if so, obtaining the image content matched with the text content.
Of course, if no image content matching the text content of the current answer key point can be identified from the allocated video segments, the image content matching the text content of the current answer key point is identified from the other video segments. According to this method, not all video frames of the video need to be compared with the text content of every answer key point; each answer key point only needs to be compared with the video frames in part of the video segments, and because each answer key point is allocated its own video segments, image content can be identified for multiple answer key points simultaneously, which effectively improves the identification efficiency of the image content.
Step S206, intercepting image content from the video frame.
In this embodiment, after the image content matched with the text content is identified from the video frame, the corresponding video frame is intercepted to obtain an intercepted image containing the image content.
In step S208, a key frame image including the answer key is generated based on the text content and the image content of the answer key.
In order to more completely preserve the text content and the image content of the answer key points, the embodiment may generate the key frame image containing the answer key points by the following method, including:
and carrying out target detection on the intercepted image to obtain a text content containing the key points of the answer and an enclosure of the image content in the intercepted image. Expanding and adjusting the bounding box according to a preset length-width ratio; the length-width ratio is set for facilitating splicing processing of the key frame images, and size mismatching between the images in the splicing process is avoided; and, the expanding adjustment means that, when a certain size parameter of the bounding box is smaller than the preset length-width ratio, the size of the smaller size parameter is increased so that the length-width ratio of the adjusted bounding box is the same as the preset length-width ratio. For example, when the width of the enclosure frame is smaller than the preset length-width ratio, the enclosure frame is adjusted in size by increasing the width of the enclosure frame. The problem of cutting off the local text content or the local image content can not occur in the expanded adjusting mode, and the completeness of the text content and the image content is ensured. . And intercepting the image determined by the position parameter from the intercepted image based on the position parameter of the adjusted surrounding frame in the intercepted image to obtain the key frame image containing the answer key point.
According to any one of the above embodiments, after a plurality of key frame images corresponding to a plurality of answer key points are extracted from a video, a video cover of the video is generated based on the plurality of key frame images corresponding to the plurality of answer key points. Embodiments of this step can be referred to as follows:
and determining the arrangement sequence of a plurality of key frame images corresponding to the plurality of answer key points in the video cover based on the sequence of the plurality of answer key points in the answers. Specifically, according to the sequence of the answer key points in the answers, the image contents corresponding to the text contents of the answer key points are sequenced, and the sequencing result of the image contents is used as the arrangement sequence among the key frame images.
And determining a corresponding splicing template based on the sizes of the plurality of key frame images and the arrangement order among the plurality of key frame images. In a specific implementation, the splicing template may be determined based on the number, arrangement order, and sizes of the key frame images. For example, when there are two key frame images with a front-to-back arrangement order, a splicing template that represents this order in an up-down, left-right, or other manner and can stitch the two images is selected. The width-to-height ratio of each key frame image is determined according to its size, and a splicing template matched with the width-to-height ratio can be selected. A person skilled in the art may also determine the splicing template based on other image parameters according to actual needs, without limitation here. The splicing templates in this embodiment may be preset in a splicing template library to be called.
Finally, based on the size of each region in the stitching template, the key frame image corresponding to each region is scaled proportionally, and the scaled key frame image is inserted into its region to obtain the video cover of the video. For the current region, a scaling factor is computed from the size of the region and the size of the corresponding key frame image; the key frame image is scaled proportionally by that factor and then inserted into the current region.
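The proportional-scaling step can be sketched as follows; a real implementation would resize with an image library and paste onto the cover canvas, whereas this sketch only computes the geometry:

```python
def fit_scale(region_size, image_size):
    """Return the uniform scaling factor that fits the key frame image
    inside the template region while preserving its aspect ratio."""
    rw, rh = region_size
    iw, ih = image_size
    return min(rw / iw, rh / ih)

def place_images(template, image_sizes):
    """For each template region, scale its key frame image proportionally
    and record where it lands on the cover: (x, y, scaled_w, scaled_h)."""
    placed = []
    for (x, y, rw, rh), (iw, ih) in zip(template, image_sizes):
        s = fit_scale((rw, rh), (iw, ih))
        placed.append((x, y, round(iw * s), round(ih * s)))
    return placed
```

Using `min` of the two per-axis ratios guarantees the scaled image never overflows its region in either dimension.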
In accordance with the above embodiments, FIG. 3 provides several examples of how video covers are presented in a search results page. The left image in FIG. 3 shows a video cover obtained by cropping and stitching three key frame images extracted in the text-content-based extraction mode. As can be seen, each key frame image of that cover contains an answer key point for the search question in the search box (how to delete mobile phone spam), together with the text content (e.g., the subtitle "correctly clear the mobile phone memory") and the image content corresponding to that key point; meanwhile, the left-to-right, top-to-bottom arrangement of the key frame images in the cover matches the order of the answer key points for the search question. Each key frame image of the video cover shown in the right image of FIG. 3 contains image content associated with an answer key point of its search question, in order: minced meat, beaten egg, and steamed egg with minced meat.
In this embodiment, the video information corresponding to the search question acquired by the terminal device may further include a time identifier, where the time identifier is used to indicate the time of the key frame image in the video cover on the playing time axis of the video.
Based on this, the present embodiment provides a method for playing a video in response to detecting a trigger operation on a video cover, including:
in response to detecting the triggering operation of the first key frame image on the video cover, determining a corresponding time identifier of the first key frame image on the playing time axis of the video; and jumping to a playing page of the video, and starting to play the video from the time identifier.
The terminal device determines whether a triggering operation is detected. The triggering operation is an operation on a first key frame image of a video cover in the search results page. For example, when the display screen of the terminal device is a touch screen, the triggering operation may be a touch by an operating body such as a finger or a stylus; when the input device of the terminal device is a mouse, the triggering operation may be a click by the user on the search results page via the mouse.
If the triggering operation is detected, then in response, the time identifier of the first key frame image on the playing time axis of the video is determined; the terminal device jumps to the playing page of the video and starts playing from that time identifier, i.e., playback begins with the first key frame image as the starting frame, based on the time identifier.
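On the terminal side, this jump-to-time behaviour amounts to a lookup plus a seek. The `video_info` dictionary layout and the `player.play(url, start_seconds)` method below are hypothetical stand-ins for whatever player API the device provides:

```python
def on_cover_tap(tapped_index, video_info, player):
    """Handle a triggering operation on key frame image `tapped_index`
    of the video cover: look up its time identifier in the video
    information and start playback from that position.

    Assumes video_info carries a "time_identifiers" mapping from
    key-frame index to seconds; both the mapping name and the player
    interface are illustrative, not part of this disclosure."""
    start = video_info["time_identifiers"][tapped_index]
    player.play(video_info["url"], start_seconds=start)
```

The lookup is possible because the server has already added the correspondence between each key frame image and its time identifier to the video information before feeding it back.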
To facilitate understanding, an embodiment of how the server generates the time identifiers is given below. First, the time identifier of each key frame image on the playing time axis of the video is determined.
This embodiment may determine the time identifier of each key frame image in various ways; two of them are given as examples below.
Implementation one: for each key frame image, the key frame image is matched against the video frames of the video, and the target video frame matching the key frame image is determined among them; the time identifier corresponding to that target video frame is then taken as the time identifier of the key frame image on the playing time axis of the video.
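Implementation one can be sketched as a nearest-match search. Here frames and the key frame image are represented as flat grayscale pixel sequences of equal length, and mean absolute difference stands in for whatever similarity measure a production system would use; all of these are illustrative assumptions:

```python
def best_matching_frame(key_image, frames):
    """Return the index of the video frame that best matches the key
    frame image, under a mean-absolute-difference similarity measure.

    `key_image` and each entry of `frames` are equal-length flat pixel
    sequences; lower difference means a better match."""
    def mad(a, b):
        return sum(abs(p - q) for p, q in zip(a, b)) / len(a)
    scores = [mad(key_image, frame) for frame in frames]
    return min(range(len(frames)), key=scores.__getitem__)
```

The returned frame index identifies the target video frame whose time identifier is then assigned to the key frame image.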
Implementation two: the time identifier of each key frame image on the playing time axis of the video is determined according to the mapping relationship between each video frame and its playing time on the playing time axis.
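For a constant-frame-rate video, the mapping in implementation two reduces to dividing the frame index by the frame rate; this sketch assumes such a video, while variable-frame-rate sources would instead consult per-frame timestamps:

```python
def frame_time_identifier(frame_index, fps):
    """Map a video frame to its playing time (in seconds) on the
    playing time axis, assuming a constant frame rate `fps`.

    The result can be stored in the video information as the time
    identifier of the key frame image extracted from that frame."""
    return frame_index / fps
```

For example, frame 750 of a 25 fps video sits 30 seconds into playback.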
After the time identifier of each key frame image is determined in the above manner, the correspondence between each key frame image and its time identifier is added to the video information.
In summary, in the method for searching question and answer results provided by the above disclosure, the terminal device obtains from the server the video information corresponding to the search question. The video information includes a video cover that can be presented in the search results page and a time identifier that can specify the playing position of the video. The video cover is composed of a plurality of key frame images; the composition can be flexible and varied, enriching the presentation of video covers. The arrangement order of the key frame images matches the answer key points for the search question, embodying a certain logic. Because each key frame image of the cover contains both the text content and the image content answering the search question, the text content better fits the user's everyday reading habits and improves the intuitiveness of the search result, while the image content is more pictorial and improves the vividness of the search result. The video cover displayed in the search results page therefore makes the page friendlier to display, and expresses the user's video search intention comprehensively, intuitively, and accurately, so that the user can quickly find the content to watch from the video cover and play the video through the triggering operation, effectively improving video search efficiency. The time identifier in the video information, in turn, lets the user jump quickly and conveniently to the desired video position, improving user experience.
Embodiment two:
according to the first embodiment, this embodiment further provides a method for searching question and answer results, the method being applied to a server; as shown in FIG. 4, the method includes:
step S402, receiving a search request from the terminal device, the search request including a search question;
step S404, searching and obtaining a video containing answers to the search questions based on the search questions;
step S406, determining a plurality of answer key points of the answers based on the audio signals or the text contents of the videos;
step S408, for each answer key point, processing the video to obtain a key frame image containing the text content and image content of that answer key point;
step S410, generating a video cover of the video based on a plurality of key frame images corresponding to a plurality of answer key points;
and step S412, feeding back the video information containing the video cover to the terminal equipment.
In one embodiment, the step of processing the video to obtain, for each answer key point, a key frame image containing the text content and image content of that answer key point includes:
performing character recognition on the video to obtain the text content of the answer key points; identifying, based on the text content of the answer key points, the image content matching that text content from the video frames of the video; cropping the image content from the video frame; and generating a key frame image containing the answer key point based on the text content and image content of the answer key point.
In one embodiment, the step of generating a video cover of the video based on a plurality of key frame images corresponding to a plurality of answer key points comprises:
determining the arrangement order of the plurality of key frame images corresponding to the plurality of answer key points in the video cover, based on the order of the answer key points in the answer; determining a corresponding stitching template based on the sizes of the plurality of key frame images and their arrangement order; scaling the key frame image corresponding to each region proportionally, based on the size of each region in the stitching template; and inserting the scaled key frame images into the corresponding regions of the template to obtain the video cover of the video.
In one embodiment, before the step of feeding back video information including the video cover to the terminal device, the method further comprises:
determining a corresponding time identifier of each key frame image on a playing time axis of the video; and adding the corresponding relation between each key frame image and the time mark into the video information.
In one embodiment, the step of determining the corresponding time identifier of each key frame image on the playing time axis of the video includes:
for each key frame image, matching the key frame image with a video frame in a video, and determining a target video frame matched with the key frame image in the video frame; and determining the time identifier corresponding to the target video frame as the time identifier corresponding to the key frame image on the playing time axis of the video.
In one embodiment, the step of determining the corresponding time identifier of each key frame image on the playing time axis of the video includes:
and determining the corresponding time identifier of each key frame image on the playing time axis of the video according to the mapping relation between each video frame and the playing time on the playing time axis of the video.
According to the method for searching question and answer results provided by this embodiment of the disclosure, the server searches for a video containing an answer to the search question, so that the obtained video matches the answer to the search question closely; the video information generated from that video therefore also matches the answer well. The key frame images determined from the video contain the text content and image content of the answer key points, so the answer key points are embodied intuitively and vividly. Furthermore, the video cover generated from the plurality of key frame images expresses the user's video search intention comprehensively, intuitively, and accurately; feeding the video information containing this cover back to the terminal device lets the user quickly find the content to watch through the video cover, significantly improving the user experience of the video search function.
The method provided by the embodiment has the same implementation principle and technical effect as the first embodiment, and for the sake of brief description, reference may be made to the corresponding contents in the first embodiment for the part of this embodiment that is not mentioned.
Embodiment three:
according to the first and second embodiments, this embodiment may further provide a method for searching question and answer results, where the method includes:
step 1, the terminal equipment sends a search request to a server, wherein the search request comprises a search problem.
Step 2, the server receives the search request from the terminal device.
Step 3, based on the search question in the search request, the server searches for and obtains a video containing an answer to the search question.
And 4, the server determines a plurality of answer key points of the answers based on the audio signals or the text contents of the videos.
Step 5, for each answer key point, the server processes the video to obtain a key frame image containing the text content and image content of that answer key point.
Step 6, the server generates a video cover of the video based on the plurality of key frame images corresponding to the plurality of answer key points; the video cover is composed of a plurality of key frame images of the video, each key frame image contains text content and image content for answering the search question, and the number of key frame images matches the number of answer key points for answering the search question.
Step 7, the server determines the time identifier of each key frame image on the playing time axis of the video, and adds the correspondence between each key frame image and its time identifier to the video information.
And 8, the server feeds back the video information containing the video cover to the terminal equipment.
Step 9, the terminal device acquires the video information corresponding to the search question.
And step 10, the terminal equipment displays the video cover in the search result page.
Step 11, the terminal device determines a corresponding time identifier of a first key frame image on a playing time axis of the video in response to detecting a trigger operation on the first key frame image on the video cover;
and step 12, the terminal equipment jumps to a playing page of the video and starts to play the video from the time identifier.
The method provided by the embodiment has the same implementation principle and technical effect as the first embodiment and the second embodiment, and for the sake of brief description, corresponding contents in the first embodiment and the second embodiment may be referred to where not mentioned in this embodiment.
Embodiment four:
according to the first embodiment, the present disclosure also provides a device for searching question and answer results, which is applicable to a terminal device, as shown in fig. 5, and includes:
an information obtaining module 502, configured to obtain video information corresponding to the search question; the video information includes a video cover composed of a plurality of key frame images of the video, each key frame image contains text content and image content for answering the search question, and the number of key frame images matches the number of answer key points for answering the search question;
a cover display module 504 for displaying a video cover in the search result page;
a video playing module 506, configured to play the video in response to detecting the triggering operation on the video cover.
In an embodiment, the video playing module 506 is specifically configured to:
in response to detecting the triggering operation of the first key frame image on the video cover, determining a corresponding time identifier of the first key frame image on the playing time axis of the video; and jumping to a playing page of the video, and starting to play the video from the time identifier.
Embodiment five:
according to the second embodiment, the present disclosure also provides a device for searching for a question and answer result, which is applicable to a server, as shown in fig. 6, and comprises:
a request receiving module 602, configured to receive a search request of a terminal device, where the search request includes a search question;
a video searching module 604, configured to search for a video that includes an answer to the search question based on the search question;
a gist determining module 606 for determining a plurality of answer gist of the answer based on the audio signal or the text content of the video;
an image obtaining module 608, configured to, for each answer key point, process a video to obtain a key frame image including text content and image content of the answer key point;
a cover generation module 610, configured to generate a video cover of the video based on a plurality of key frame images corresponding to the plurality of answer key points;
and an information feedback module 612, configured to feed back video information including the video cover to the terminal device.
In one embodiment, the image acquisition module 608 includes:
the text content obtaining unit is used for carrying out character recognition processing on the video to obtain text content of answer key points;
the image content identification unit is used for identifying image content matched with the text content from video frames of the video based on the text content of the answer key points;
the image content intercepting unit is used for intercepting image content from the video frame;
and the image generating unit is used for generating a key frame image containing the key points of the answer based on the text content and the image content of the key points of the answer.
In one embodiment, the cover generation module 610 includes:
the order determining unit is used for determining the arrangement order of a plurality of key frame images corresponding to a plurality of answer key points in the video cover based on the order of the answer key points in the answers;
the template determining unit is used for determining a corresponding splicing template based on the sizes of the key frame images and the arrangement sequence among the key frame images;
the zooming processing unit is used for carrying out equal-scale zooming processing on the key frame images corresponding to the regions based on the sizes of the regions in the splicing template;
and the image inserting unit is used for inserting the zoomed key frame image into the corresponding area of the template to obtain a video cover of the video.
In one embodiment, the apparatus further comprises:
the time identifier determining module is used for determining the corresponding time identifier of each key frame image on the playing time axis of the video;
and the relationship adding module is used for adding the corresponding relationship between each key frame image and the time identifier into the video information.
In one embodiment, the time identification determination module comprises:
the matching unit is used for matching the key frame images with video frames in the video aiming at each key frame image and determining a target video frame matched with the key frame images in the video frames;
and the first identification determining unit is used for determining the time identification corresponding to the target video frame as the time identification corresponding to the key frame image on the playing time axis of the video.
In one embodiment, the time identification determination module comprises: and the second identifier determining unit is used for determining the corresponding time identifier of each key frame image on the playing time axis of the video according to the mapping relation between each video frame and the playing time on the playing time axis of the video.
The device provided in this embodiment has the same implementation principle and technical effects as those of the first to third embodiments, and for the sake of brief description, reference may be made to corresponding contents of the first to third embodiments for a part not mentioned in this embodiment.
Based on the foregoing embodiments, this embodiment provides a terminal device, including a memory storing a computer program and a processor, wherein the processor, when executing the computer program, performs the method described in embodiments one to three above.
The present embodiment also provides a computer-readable storage medium, in which a computer program is stored, which, when executed by a processor, performs the method as in the above embodiments one to three.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes that element.
The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.