WO2023124874A1 - Question and answer result searching method and apparatus, device, and storage medium - Google Patents


Info

Publication number
WO2023124874A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
key frame
answer
image
search
Prior art date
Application number
PCT/CN2022/137552
Other languages
French (fr)
Chinese (zh)
Inventor
Wang Zhongchao (汪忠超)
Wang Yanli (王艳丽)
Original Assignee
Beijing ByteDance Network Technology Co., Ltd. (北京字节跳动网络技术有限公司)
Priority date
Filing date
Publication date
Application filed by 北京字节跳动网络技术有限公司 (Beijing ByteDance Network Technology Co., Ltd.)
Publication of WO2023124874A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73 Querying
    • G06F16/732 Query formulation
    • G06F16/735 Filtering based on additional data, e.g. user or group profiles
    • G06F16/738 Presentation of query results
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content

Definitions

  • Embodiments of the present disclosure relate to the technical field of video processing, and in particular, to a search method, apparatus, device, and storage medium for question-and-answer results.
  • video applications provided by the related art can offer a video search function and a video playback function, through which a user can search for related videos and play them.
  • the disclosure provides a search method, apparatus, device, and storage medium for question-and-answer results.
  • the present disclosure provides a search method for question-and-answer results, including:
  • acquiring video information corresponding to a search question, wherein the video information includes a video cover;
  • the video cover is composed of a plurality of key frame images of the video, each key frame image includes text content and image content answering the search question, and the number of key frame images matches the number of answer points of the answer to the search question; displaying the video cover on the search result page; and playing the video in response to detecting a trigger operation on the video cover.
  • the arrangement order of the plurality of key frame images in the video cover matches the order of the answer points corresponding to the search question.
  • the key frame image is an image, cropped from a video frame of the video, that contains the text content and image content of an answer point.
  • the text content of the answer point contained in the key frame image is recognized from the video frames of the video, and the image content of the answer point contained in the key frame image is matched from the video frames of the video based on the text content of the answer point.
  • playing the video in response to detecting a trigger operation on the video cover includes: in response to detecting a trigger operation on a first key frame image of the video cover, determining the time mark corresponding to the first key frame image on the playback time axis of the video; and jumping to the playback page of the video and starting playback from the time mark.
  • the present disclosure also provides a search method for question and answer results, including:
  • obtaining, from the video, the key frame images containing the text content and image content of the answer points includes:
  • generating the video cover of the video based on the multiple key frame images corresponding to the multiple answer points includes:
  • before feeding back the video information including the video cover to the terminal device, the method further includes: determining the time mark corresponding to each key frame image on the playback time axis of the video; and adding the correspondence between each key frame image and its time mark to the video information.
  • the determining the time mark corresponding to each key frame image on the playback time axis of the video includes:
  • for each key frame image, matching the key frame image against the video frames of the video to determine the target video frame that matches it; and determining the time mark corresponding to the target video frame as the time mark of the key frame image on the playback time axis of the video.
  • determining the time mark corresponding to each key frame image on the playback time axis of the video includes: determining the time mark of each key frame image on the playback time axis according to the mapping relationship between each video frame and its playback time on the playback time axis of the video.
  • the present disclosure also provides a search device for question and answer results, including:
  • an information acquisition module configured to acquire video information corresponding to a search question, wherein the video information includes a video cover composed of a plurality of key frame images of the video, each key frame image includes text content and image content answering the search question, and the number of key frame images matches the number of answer points of the answer to the search question;
  • a cover display module configured to display the video cover on the search result page; and
  • a video playing module configured to play the video in response to detecting a trigger operation on the video cover.
  • the arrangement order of the plurality of key frame images in the video cover matches the order of the answer points corresponding to the search question.
  • the key frame image is an image, cropped from a video frame of the video, that contains the text content and image content of an answer point.
  • the text content of the answer point contained in the key frame image is recognized from the video frames of the video, and the image content of the answer point contained in the key frame image is matched from the video frames of the video based on the text content of the answer point.
  • the video playing module is configured to: in response to detecting a trigger operation on the first key frame image of the video cover, determine the time mark corresponding to the first key frame image on the playback time axis of the video; and jump to the playback page of the video and start playing the video from the time mark.
  • the present disclosure also provides a search device for question and answer results, including:
  • a request receiving module configured to receive a search request from a terminal device, where the search request includes a search question;
  • a video search module configured to search, based on the search question, for a video containing an answer to the search question;
  • a point determination module configured to determine multiple answer points of the answer based on the audio signal or text content of the video;
  • an image obtaining module configured to obtain, for each answer point, a key frame image from the video that contains the text content and image content of that answer point;
  • a cover generation module configured to generate the video cover of the video based on the plurality of key frame images corresponding to the plurality of answer points; and
  • an information feedback module configured to feed back video information including the video cover to the terminal device.
  • the image obtaining module includes:
  • a text content obtaining unit configured to perform character recognition on the video to obtain the text content of the answer points;
  • an image content identification unit configured to identify, based on the text content of the answer points, image content matching the text content from the video frames of the video;
  • an image content intercepting unit configured to crop the image content from the video frame; and
  • an image generation unit configured to generate a key frame image containing the answer point based on the text content and the image content of the answer point.
  • the cover generation module includes:
  • a sequence determination unit configured to determine, based on the order of the plurality of answer points in the answer, the arrangement order in the video cover of the plurality of key frame images corresponding to the answer points;
  • a template determination unit configured to determine a corresponding splicing template based on the sizes of the plurality of key frame images and their arrangement order;
  • a scaling processing unit configured to scale the key frame image corresponding to each region proportionally based on the size of that region in the splicing template; and
  • an image inserting unit configured to insert the scaled key frame images into the corresponding regions of the splicing template to obtain the video cover of the video.
  • the device also includes:
  • a time mark determination module configured to determine the time mark corresponding to each key frame image on the playback time axis of the video; and
  • a relationship adding module configured to add the correspondence between each key frame image and its time mark to the video information.
  • the time mark determination module includes:
  • a matching unit configured to, for each key frame image, match the key frame image against the video frames of the video and determine the target video frame that matches it; and
  • a first mark determination unit configured to determine the time mark corresponding to the target video frame as the time mark of the key frame image on the playback time axis of the video.
  • the time mark determination module includes: a second mark determination unit configured to determine, according to the mapping relationship between each video frame and its playback time on the playback time axis of the video, the time mark corresponding to each key frame image on the playback time axis of the video.
  • the present disclosure also provides a terminal device, including: a memory and a processor, wherein a computer program is stored in the memory, and when the computer program is executed by the processor, the processor executes the above-mentioned method.
  • the present disclosure also provides a computer-readable storage medium, where a computer program is stored in the storage medium, and when the computer program is executed by a processor, the processor executes the method as described above.
  • FIG. 1 is a flowchart of a search method for question and answer results according to Embodiment 1 of the present disclosure
  • FIG. 2 is a flowchart of a method for generating a video cover according to Embodiment 1 of the present disclosure
  • FIG. 3 is a schematic diagram of a video cover according to Embodiment 1 of the present disclosure.
  • FIG. 4 is a flowchart of a search method for question and answer results described in Embodiment 2 of the present disclosure
  • FIG. 5 is a structural block diagram of a search device for question and answer results described in Embodiment 4 of the present disclosure
  • FIG. 6 is a structural block diagram of a search device for question and answer results according to Embodiment 5 of the present disclosure.
  • through these functions, the user can search for related videos and play them.
  • however, users need to spend time browsing the video search results to find the content they want to watch, which reduces search efficiency.
  • to address this, embodiments of the present disclosure provide a search method, apparatus, device, and storage medium for question-and-answer results, which are described in detail below.
  • the search method for the question and answer results includes the following steps:
  • Step S102 acquiring video information corresponding to the search question, wherein the video information includes a video cover composed of a plurality of key frame images of the video, each key frame image includes text content and image content answering the search question, and the number of key frame images matches the number of answer points of the answer to the search question.
  • the user sends a search request including a search question to the server through the terminal device; wherein, the search question may contain at least one keyword.
  • the server searches for a video containing the answer to the search question; in implementation, the server can first search, based on the search question, for an answer that matches the keywords in the search question, and then search a preset video resource library for videos containing that answer.
  • the server can recognize the content of the video, such as its text content, audio signal, image content, and playback time, and determine multiple answer points of the answer based on the recognized content. For each answer point, the server obtains from the video a key frame image containing the text content and image content of that answer point; for example, it uses the recognized text content and image content to determine, from the video, the key frame image that matches the answer point.
  • based on these key frame images, the server generates the video cover of the video and feeds back the video information including the video cover to the terminal device.
  • the video cover may be the result of splicing the key frame images; the splicing layout is flexible, so the resulting video cover can present a better page display effect.
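The two-stage lookup above (first find the answer, then find videos that contain it) can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: keywords come from naive whitespace tokenization, answers are ranked by keyword overlap, and every library entry is assumed to carry the id of the answer it contains. The names `Video`, `extract_keywords`, `search_answer`, and `search_videos` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Video:
    video_id: str
    answer_id: str  # hypothetical: id of the answer this video is known to contain


def extract_keywords(text: str) -> set:
    # Naive tokenization (assumption); Chinese text would need a word segmenter.
    return {w.strip("?,.!").lower() for w in text.split() if w.strip("?,.!")}


def search_answer(question: str, answers: dict):
    """Return the id of the answer sharing the most keywords with the question, or None."""
    keywords = extract_keywords(question)
    best_id, best_score = None, 0
    for answer_id, text in answers.items():
        score = len(keywords & extract_keywords(text))
        if score > best_score:
            best_id, best_score = answer_id, score
    return best_id


def search_videos(question: str, answers: dict, library: list) -> list:
    """Stage 1: find the matching answer. Stage 2: keep the videos containing it."""
    answer_id = search_answer(question, answers)
    if answer_id is None:
        return []
    return [v for v in library if v.answer_id == answer_id]
```

A production system would rank with a proper retrieval model; the overlap count only stands in for "an answer that matches the keywords in the search question".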
  • Step S104 displaying the video cover on the search result page.
  • the terminal device receives the video information and displays the video cover contained in the video information on the search result page.
  • Step S106 playing the video in response to detecting a trigger operation on the video cover.
  • the terminal device determines whether a trigger operation on the video cover is detected; if so, it plays the video in response to the trigger operation.
  • in summary, the search method for question-and-answer results acquires video information corresponding to the search question, where the video information includes a video cover composed of multiple key frame images of the video, each key frame image includes text content and image content answering the search question, and the number of key frame images matches the number of answer points of the answer; displays the video cover on the search result page; and plays the video in response to detecting a trigger operation on the video cover.
  • the key frame images that make up the video cover include text content and image content answering the search question. The text content fits users' everyday reading habits and makes the search results more intuitive, while the image content has a strong visual impact and makes the search results more vivid. Moreover, because the number of key frame images matches the number of answer points, the cover presents the answer to the search question more completely.
  • on this basis, the video cover displayed on the search result page reflects what the user is searching for comprehensively, intuitively, and accurately, so that the user can quickly find the content they need from the video cover; playing the video in response to a trigger operation on the video cover thus effectively improves video search efficiency.
  • this embodiment provides a display mode for the video cover: the arrangement order of the multiple key frame images in the video cover matches the order of the answer points corresponding to the search question.
  • the video cover uses key frame images arranged in order to highlight the order of the answer points, which increases how closely the video cover matches the answer to the search question.
  • the aspect ratios of videos are not necessarily the same; for example, videos are divided into landscape and portrait videos.
  • the video cover is usually taken from a video frame of the video, so cover sizes differ from video to video.
  • when covers of different sizes are displayed together, the layout of the display interface becomes cluttered.
  • if covers with different aspect ratios are simply forced to a uniform size, for example by displaying a portrait video with a landscape cover, the landscape cover looks poor and its image quality is hard to guarantee.
  • this embodiment provides a key frame image, which is an image cropped from a video frame of the video that contains the text content and image content of an answer point.
  • the text content of the answer point contained in the key frame image is recognized from the video frames of the video, and the image content of the answer point contained in the key frame image is matched from the video frames of the video based on the text content of the answer point.
  • the key frame image in this embodiment is generated by the server.
  • the process of obtaining, from the video, the key frame images containing the text content and image content of the answer points includes the following steps S202-S208:
  • Step S202 performing character recognition processing on the video to obtain the text content of the key points of the answer.
  • character recognition can be performed on each video frame of the video using OCR (Optical Character Recognition) technology or a detection model to obtain candidate character recognition results for each video frame. It is then determined whether each candidate result matches any of the answer points. A candidate result that matches an answer point matches the search question and expresses a main point of its answer, so it is determined as the target character recognition result, and the target character recognition result, or the keywords in it, are used as the text content of that answer point.
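The matching of candidate OCR results to answer points can be sketched like this. The disclosure does not state the matching criterion, so this sketch assumes a simple shared-word count with a minimum threshold. OCR itself is out of scope here: `candidates` stands for already-recognized `(frame_index, text)` pairs, and `match_text_to_points` is a hypothetical name.

```python
def words(text: str) -> set:
    return set(text.lower().split())


def match_text_to_points(candidates: list, answer_points: list, min_overlap: int = 2) -> dict:
    """For each answer point, pick the OCR candidate with the largest word overlap.

    candidates: (frame_index, recognized_text) pairs, assumed to come from an OCR step.
    answer_points: answer-point strings in answer order.
    Returns {point_index: (frame_index, text)}; points for which no candidate
    shares at least min_overlap words are left out.
    """
    matches = {}
    for i, point in enumerate(answer_points):
        best, best_overlap = None, min_overlap - 1
        for frame_index, text in candidates:
            overlap = len(words(point) & words(text))
            if overlap > best_overlap:
                best, best_overlap = (frame_index, text), overlap
        if best is not None:
            matches[i] = best
    return matches
```

The matched text (or its keywords) then serves as the text content of each answer point.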
  • when a video shows the complete course of an activity, the answer determined from the video should also include multiple logically ordered answer points.
  • examples include tutorial videos on cooking, handicrafts, and fitness.
  • such content usually includes multiple answer points, each corresponding to a key step. Accordingly, the above procedure usually yields multiple pieces of text content, one for each answer point.
  • Step S204 based on the text content of the answer points, identify the image content matching the text content from the video frame of the video.
  • the image content matching the text content can be identified one by one from the video frames of the video.
  • this embodiment can also first segment the video into a plurality of video segments and then identify the image content matching the text content from the video frames of at least some of the segments. For a specific implementation, refer to the following examples of steps (1) and (2).
  • the chronological order of the video segments is also related to the order of the answer points.
  • at least one video segment may be assigned to each answer point; for the current answer point, the image content matching its text content is identified from the video frames of the segments assigned to that answer point.
  • specifically, the image content in each video frame of the segment is recognized frame by frame and compared with the text content of the current answer point; if they match, the image content matching the text content is obtained.
  • if no image content matching the text content of the current answer point can be identified from the assigned segments, the matching image content is identified from the other video segments.
  • this method does not need to compare every video frame of the video with the text content of every answer point: each answer point is compared only with the frames of some segments, and because each answer point is assigned its own segments, multiple answer points can be processed simultaneously. The method therefore effectively improves the efficiency of image content identification.
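Steps (1) and (2) can be sketched as below, under two assumptions: the video is split into equal contiguous segments with one segment per answer point, and the image-recognition comparison is abstracted as a caller-supplied predicate `matches(frame_index, point_index)`. Both `assign_segments` and `find_image_frame` are hypothetical names. The sketch runs each answer point sequentially; since each point's search is independent, the calls could also run in parallel as the text suggests.

```python
def assign_segments(num_frames: int, num_points: int) -> list:
    """Split frame indices into num_points contiguous segments, one per answer point."""
    size = -(-num_frames // num_points)  # ceiling division
    return [range(i * size, min((i + 1) * size, num_frames)) for i in range(num_points)]


def find_image_frame(point_idx: int, segments: list, matches):
    """Return the first frame whose image content matches the answer point.

    The point's own segment is searched first; the other segments are a fallback.
    matches(frame_index, point_idx) -> bool is a hypothetical recognition callback.
    """
    order = [point_idx] + [j for j in range(len(segments)) if j != point_idx]
    for seg_idx in order:
        for frame in segments[seg_idx]:
            if matches(frame, point_idx):
                return frame
    return None
```

In the common case each point only scans its own segment, which is where the efficiency gain described above comes from.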
  • Step S206 intercepting image content from the video frame.
  • the corresponding video frame is intercepted to obtain an intercepted image including the image content.
  • Step S208 based on the text content and image content of the key points of the answer, a key frame image containing the key points of the answer is generated.
  • this embodiment can generate a key frame image containing the answer point in the following manner:
  • this enlarging adjustment does not cut off part of the text content or image content, which ensures the integrity of both.
  • the region determined by the position parameters is cropped from the intercepted image to obtain a key frame image containing the answer point.
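The concrete sub-steps for deriving the position parameters are not included in this text. One plausible reading, sketched here purely as an assumption, is to take the union of the text box and the image box, enlarge it by a margin (enlarging only, so nothing is cut off), and clamp it to the frame. `Box`, `crop_box`, and the `margin` default are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int


def crop_box(text_box: Box, image_box: Box, frame_w: int, frame_h: int, margin: int = 10) -> Box:
    """Union of the text and image boxes, enlarged by margin and clamped to the frame.

    The union is only ever enlarged, never shrunk, so the whole text and
    image content stay inside the crop.
    """
    left = max(0, min(text_box.left, image_box.left) - margin)
    top = max(0, min(text_box.top, image_box.top) - margin)
    right = min(frame_w, max(text_box.right, image_box.right) + margin)
    bottom = min(frame_h, max(text_box.bottom, image_box.bottom) + margin)
    return Box(left, top, right, bottom)
```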
  • the video cover of the video is generated based on the multiple key frame images corresponding to the multiple answer points.
  • the arrangement sequence of the multiple key frame images corresponding to the multiple answer points in the video cover is determined. Specifically, the image content corresponding to the text content of each answer point is sorted according to the order of the answer points in the answer, and the sorting result of the image content is used as the arrangement order between the key frame images.
  • a corresponding splicing template is determined.
  • the splicing template may be determined based on the number, arrangement order, and size of the key frame images. For example, if there are two key frame images arranged one after the other, a splicing template that can hold two images and reflects their order, such as a top-bottom or left-right layout, is selected.
  • the aspect ratio of each key frame image can also be determined from its size, and a splicing template that matches those aspect ratios can be selected.
  • Those skilled in the art may also determine the splicing template based on other image parameters according to actual needs, which is not specifically limited here.
  • the splicing templates in this embodiment may be preset in a splicing template library for retrieval.
  • this embodiment calculates a scaling ratio from the size of the current region and the size of the key frame image corresponding to that region, scales the key frame image proportionally by that ratio, and inserts the scaled key frame image into the current region.
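Ordering the key frame images and picking a template from the library can be sketched as follows. This assumes one region per key frame image and scores templates by how far each region's width-to-height ratio is from that of the key frame placed in it. The cost function, `Template`, `order_key_frames`, and `choose_template` are illustrative assumptions, not the disclosed selection rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    name: str
    regions: tuple  # (x, y, w, h) per region, in reading order


def order_key_frames(frames: list) -> list:
    """frames: (answer_point_index, item) pairs; returns the items in answer-point order."""
    return [item for _, item in sorted(frames, key=lambda f: f[0])]


def choose_template(sizes: list, library: list):
    """Among templates with one region per key frame, pick the one whose region
    aspect ratios (w / h) are closest to those of the ordered key frame images."""
    candidates = [t for t in library if len(t.regions) == len(sizes)]

    def cost(t: Template) -> float:
        return sum(abs(rw / rh - w / h) for (_, _, rw, rh), (w, h) in zip(t.regions, sizes))

    return min(candidates, key=cost) if candidates else None
```

With two portrait key frames, this picks a side-by-side template; with two landscape ones, a stacked template.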
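The scale-then-insert step can be sketched as a "fit inside" computation: the scaling ratio is the smaller of the width and height ratios, so the aspect ratio is preserved and the image never overflows its region. Centering the scaled image in the region is an added assumption, and `place` is a hypothetical name.

```python
def place(region: tuple, img_w: int, img_h: int) -> tuple:
    """Scale an image proportionally to fit a region (x, y, w, h) and center it there.

    Returns (x, y, new_w, new_h) of the inserted image.
    """
    x, y, rw, rh = region
    scale = min(rw / img_w, rh / img_h)  # largest ratio that still fits both sides
    new_w, new_h = round(img_w * scale), round(img_h * scale)
    return (x + (rw - new_w) // 2, y + (rh - new_h) // 2, new_w, new_h)
```

For example, a 1080x1920 portrait frame placed in a 720x405 landscape region is scaled by 405/1920 and centered horizontally, leaving side margins rather than distorting the image.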
  • Fig. 3 provides several examples of displaying video cover on the search result page.
  • the left picture in Figure 3 shows the video cover obtained by extracting three key frame images with the text-based key frame image extraction method and then cropping and splicing them.
  • each key frame image of the video cover in the left picture of Figure 3 includes an answer point of the search question in the search box ("how to delete mobile phone garbage"), namely the text content corresponding to that answer point (such as the subtitle "Correctly clean up mobile phone memory") and image content; moreover, the left-to-right, top-to-bottom arrangement order of the key frame images in the video cover matches the order of the answer points corresponding to the search question.
  • Each key frame image of the video cover shown in the right figure of Figure 3 contains the image content associated with the key points of the answer to the search question, which are: minced meat, egg liquid, and steamed egg with minced meat.
  • the video information corresponding to the search question that the terminal device acquires may also include time marks, which indicate the position of each key frame image of the video cover on the playback time axis of the video.
  • this embodiment provides a method for playing a video in response to detecting a trigger operation on the video cover, including:
  • the terminal device determines whether a trigger operation is detected; the trigger operation is an operation on the first key frame image of the video cover on the search result page.
  • the trigger operation can be a touch operation performed with an operating body such as a stylus;
  • if the input device of the terminal device is a mouse, the trigger operation may be the user's click on the search result page with the mouse.
  • if a trigger operation is detected, then in response to it the time mark corresponding to the first key frame image on the playback time axis of the video is determined, the terminal jumps to the video playback page, and the video starts playing from the time mark; that is, based on the time mark, the video is played with the first key frame image as the starting frame.
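On the terminal side, the jump-to-time-mark behavior reduces to a lookup in the key-frame-to-time-mark correspondence carried in the video information. In this sketch, `PlayerPage` is a hypothetical stand-in for the real playback page, the `video_info` dictionary layout is assumed, and falling back to the start of the video when no mark exists is an added assumption.

```python
class PlayerPage:
    """Hypothetical stand-in for the video playback page."""

    def __init__(self, video_id: str):
        self.video_id = video_id
        self.position = 0.0  # playback start position, in seconds

    def play_from(self, seconds: float) -> None:
        self.position = seconds


def handle_trigger(video_info: dict, clicked_index: int) -> PlayerPage:
    """Open the playback page and start from the clicked key frame's time mark."""
    marks = video_info["time_marks"]  # one mark per key frame, in cover order (assumed)
    start = marks[clicked_index] if 0 <= clicked_index < len(marks) else 0.0
    page = PlayerPage(video_info["video_id"])  # "jump to the playback page"
    page.play_from(start)
    return page
```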
  • in one embodiment, the server generates the time marks by first determining the time mark corresponding to each key frame image on the playback time axis of the video.
  • Implementation 1: for each key frame image, match the key frame image against the video frames of the video, determine the target video frame that matches it, and determine the time mark corresponding to the target video frame as the time mark of the key frame image on the playback time axis of the video.
  • Implementation 2: determine the time mark of each key frame image on the playback time axis of the video according to the mapping relationship between each video frame and its playback time on the playback time axis.
  • the server then adds the correspondence between each key frame image and its time mark to the video information.
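Implementation 1 can be sketched with mean absolute pixel difference as the similarity measure. This is a swapped-in technique: the disclosure does not name one. The sketch also assumes grayscale images given as nested lists, and assumes each video frame has already been brought to the key frame's size (for example, by cropping it with the same box used to produce the key frame). `time_mark_by_matching` is a hypothetical name.

```python
def mean_abs_diff(a: list, b: list) -> float:
    """Mean absolute pixel difference between two equal-sized grayscale images."""
    total = sum(abs(p - q) for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))
    return total / (len(a) * len(a[0]))


def time_mark_by_matching(key_frame: list, frames: list) -> float:
    """frames: (time_seconds, image) pairs. Returns the time of the closest frame."""
    best_time, _ = min(
        ((t, mean_abs_diff(key_frame, img)) for t, img in frames),
        key=lambda pair: pair[1],
    )
    return best_time
```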
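Implementation 2 avoids pixel matching by reading the time straight from a frame-to-time mapping. The sketch assumes the source frame index of each key frame image was recorded when it was cropped. `frame_times` is the mapping itself: a list of playback times, built here for a constant frame rate. For variable-frame-rate video it would come from the container's per-frame timestamps instead. Both function names are hypothetical.

```python
def constant_fps_times(num_frames: int, fps: float) -> list:
    """Frame-to-time mapping for constant frame rate video: frame i plays at i / fps."""
    return [i / fps for i in range(num_frames)]


def time_marks_from_mapping(key_frame_sources: list, frame_times: list) -> list:
    """key_frame_sources: source frame index of each key frame image (assumed recorded
    at crop time). Returns one time mark, in seconds, per key frame image."""
    return [frame_times[i] for i in key_frame_sources]
```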
  • the terminal device thus obtains the video information corresponding to the search question from the server; the video information includes the video cover, which can be displayed on the search result page, and the time marks, which can specify the playback position of the video.
  • the video cover is composed of multiple key frame images whose layout is flexible, which enriches how the video cover can be displayed; the arrangement order of the key frame images matches the answer points corresponding to the search question and thus reflects a certain logic. Because each key frame image in the video cover includes text content and image content answering the search question, the text content fits users' everyday reading habits and makes the search results more intuitive, while the image content has a strong visual impact and makes the search results more vivid.
  • the video cover displayed on the search result page makes the page friendlier and reflects what the user is searching for comprehensively, intuitively, and accurately, so that the user can quickly find the content they need from the video cover; playing the video in response to the trigger operation therefore effectively improves video search efficiency.
  • the time marks in the video information let the user quickly jump to the desired position in the video, which improves the user experience.
  • this embodiment can also provide a search method for question-and-answer results that is applied to a server; as shown in Figure 4, the method includes:
  • Step S402 receiving a search request from the terminal device, where the search request includes a search question;
  • Step S404 searching, based on the search question, for a video containing an answer to the search question;
  • Step S406 determining multiple answer points of the answer based on the audio signal or text content of the video;
  • Step S408 for each answer point, obtaining from the video a key frame image containing the text content and image content of that answer point;
  • Step S410 generating a video cover of the video based on the multiple key frame images corresponding to the multiple answer points;
  • Step S412 feeding back the video information including the video cover to the terminal device.
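Steps S402-S412 can be wired together as below. Every component is a caller-supplied callable in the hypothetical `steps` dictionary (search, answer points, key frame, cover, time mark), so the sketch shows only the data flow, not any real algorithm. Attaching the time marks mirrors the optional step of adding each key frame's time mark to the video information. `VideoInfo` and `handle_search_request` are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class VideoInfo:
    video_id: str
    cover: object  # spliced cover image; its format is left open here
    time_marks: dict = field(default_factory=dict)  # key frame index -> seconds


def handle_search_request(request: dict, steps: dict) -> VideoInfo:
    """Data flow of S402-S412; `steps` holds hypothetical component callables."""
    question = request["question"]                                   # S402
    video_id = steps["search_video"](question)                       # S404
    points = steps["answer_points"](video_id)                        # S406
    key_frames = [steps["key_frame"](video_id, p) for p in points]   # S408
    cover = steps["make_cover"](key_frames)                          # S410
    marks = {i: steps["time_mark"](video_id, kf) for i, kf in enumerate(key_frames)}
    return VideoInfo(video_id, cover, marks)                         # S412: sent to the terminal
```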
  • the step of obtaining, from the video, the key frame images containing the text content and image content of the answer points includes:
  • the step of generating the video cover of the video includes:
  • the arrangement order of the multiple key frame images corresponding to the multiple answer points in the video cover Based on the order of the multiple answer points in the answer, determine the arrangement order of the multiple key frame images corresponding to the multiple answer points in the video cover; based on the size of the multiple key frame images and the arrangement between the multiple key frame images According to the layout sequence, the corresponding splicing template is determined; based on the size of each area in the splicing template, the key frame images corresponding to each area are scaled proportionally; the scaled key frame images are inserted into the corresponding areas of the template to obtain the video image Video cover.
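  • Before the step of feeding back the video information including the video cover to the terminal device, the method further includes: determining the time stamp corresponding to each key frame image on the playback time axis of the video, and adding the correspondence between each key frame image and its time stamp to the video information.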
  • In one implementation, the step of determining the time stamp corresponding to each key frame image on the playback time axis of the video includes:
  • for each key frame image, matching the key frame image against the video frames of the video to determine the target video frame that matches it, and taking the time stamp of the target video frame as the time stamp corresponding to the key frame image on the playback time axis of the video.
  • In another implementation, the step of determining the time stamp corresponding to each key frame image on the playback time axis of the video includes:
  • determining the time stamp corresponding to each key frame image on the playback time axis according to the mapping relationship between each video frame and the playback time on the playback time axis of the video.
  • The server searches for a video containing the answer to the search question, so the obtained video closely matches that answer, and the video information generated from this video matches the answer better as well. Each key frame image determined from the video contains the text content and image content of an answer point, which reflects the answer points intuitively and vividly. The video cover generated from the key frame images can therefore express the user's search intent fully, intuitively, and accurately, and feeding back video information including this cover to the terminal device lets users quickly find the content they want to watch from the cover alone. This improves the efficiency of the video search function and significantly improves the user experience.
  • This embodiment further provides a search method for question and answer results, comprising:
  • Step 1: the terminal device sends a search request including a search question to the server.
  • Step 2: the server receives the search request from the terminal device.
  • Step 3: based on the search question in the search request, the server searches for a video containing an answer to the search question.
  • Step 4: the server determines a plurality of answer points of the answer based on the audio signal or text content of the video.
  • Step 5: for each answer point, the server processes the video to obtain a key frame image containing the text content and image content of the answer point.
  • Step 6: the server generates a video cover of the video based on the key frame images corresponding to the answer points. The video cover is composed of a plurality of key frame images of the video, each key frame image includes text content and image content that answer the search question, and the number of key frame images matches the number of answer points that answer the search question.
  • Step 7: the server determines the time stamp corresponding to each key frame image on the playback time axis of the video, and adds the correspondence between each key frame image and its time stamp to the video information.
  • Step 8: the server feeds back the video information including the video cover to the terminal device.
  • Step 9: the terminal device obtains the video information corresponding to the search question.
  • Step 10: the terminal device displays the video cover on the search result page.
  • Step 11: in response to detecting a trigger operation on a first key frame image on the video cover, the terminal device determines the time stamp corresponding to the first key frame image on the playback time axis of the video.
  • Step 12: the terminal device jumps to the playing page of the video and plays the video from that time stamp.
  • The present disclosure further provides a search apparatus for question and answer results, which may be applied to a terminal device. As shown in FIG. 5, the apparatus includes:
  • an information obtaining module 502 configured to obtain video information corresponding to a search question, wherein the video information includes a video cover composed of a plurality of key frame images of the video, each key frame image includes text content and image content that answer the search question, and the number of key frame images matches the number of answer points that answer the search question;
  • a cover display module 504 configured to display the video cover in the search result page; and
  • a video playing module 506 configured to play the video in response to detecting a trigger operation on the video cover.
  • The video playing module 506 is specifically configured to: in response to detecting a trigger operation on a first key frame image on the video cover, determine the time stamp corresponding to the first key frame image on the playback time axis of the video; and jump to the playing page of the video and play the video from that time stamp.
  • The present disclosure further provides a search apparatus for question and answer results, which may be applied to a server. As shown in FIG. 6, the apparatus includes:
  • a request receiving module 602 configured to receive a search request from a terminal device, the search request including a search question;
  • a video search module 604 configured to search, based on the search question, for a video containing an answer to the search question;
  • a key point determination module 606 configured to determine a plurality of answer points of the answer based on the audio signal or text content of the video;
  • an image obtaining module 608 configured to process the video, for each answer point, to obtain a key frame image containing the text content and image content of the answer point;
  • a cover generation module 610 configured to generate a video cover of the video based on the key frame images corresponding to the answer points; and
  • an information feedback module 612 configured to feed back video information including the video cover to the terminal device.
  • The image obtaining module 608 includes:
  • a text content obtaining unit configured to perform character recognition on the video to obtain the text content of the answer point;
  • an image content recognition unit configured to identify, based on the text content of the answer point, image content matching the text content in the video frames of the video;
  • an image content cropping unit configured to crop the image content from the video frame; and
  • an image generating unit configured to generate, based on the text content and image content of the answer point, a key frame image containing the answer point.
  • The cover generation module 610 includes:
  • a sequence determination unit configured to determine, based on the order of the answer points in the answer, the arrangement order in the video cover of the key frame images corresponding to the answer points;
  • a template determination unit configured to determine a corresponding splicing template based on the sizes of the key frame images and their arrangement order;
  • a scaling processing unit configured to scale the key frame image corresponding to each area proportionally based on the size of that area in the splicing template; and
  • an image inserting unit configured to insert the scaled key frame images into the corresponding areas of the template to obtain the video cover of the video.
  • The above apparatus further includes:
  • a time stamp determination module configured to determine the time stamp corresponding to each key frame image on the playback time axis of the video; and
  • a relationship adding module configured to add the correspondence between each key frame image and its time stamp to the video information.
  • The time stamp determination module includes:
  • a matching unit configured to match, for each key frame image, the key frame image against the video frames of the video and determine the target video frame that matches it; and
  • a first time stamp determination unit configured to determine the time stamp corresponding to the target video frame as the time stamp corresponding to the key frame image on the playback time axis of the video.
  • Alternatively, the time stamp determination module includes a second time stamp determination unit configured to determine the time stamp corresponding to each key frame image on the playback time axis according to the mapping relationship between each video frame and the playback time on the playback time axis of the video.
  • This embodiment provides a terminal device, including a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform the methods of Embodiments 1 to 3 above.
  • This embodiment also provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the methods of Embodiments 1 to 3 above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The present invention relates to the technical field of video processing and provides a question and answer result searching method and apparatus, a device, and a storage medium. The method comprises: obtaining video information corresponding to a search question, wherein the video information comprises a video cover, the video cover is composed of a plurality of key frame images of a video, each key frame image comprises text content and image content that answer the search question, and the number of key frame images matches the number of answer points that answer the search question; displaying the video cover in a search result page; and playing the video in response to detecting a trigger operation on the video cover. The present invention can improve video search efficiency.

Description

Question and answer result searching method and apparatus, device, and storage medium
Cross-Reference to Related Application
This application is based on and claims priority to Chinese Patent Application No. 202111620699.8, filed on December 28, 2021, the disclosure of which is hereby incorporated into this application by reference in its entirety.
Technical Field
Embodiments of the present disclosure relate to the technical field of video processing, and in particular to a search method and apparatus, a device, and a storage medium for question and answer results.
Background
Video applications in the related art may provide a video search function and a video playing function, through which a user can search for relevant videos and play them.
Summary
The present disclosure provides a search method and apparatus, a device, and a storage medium for question and answer results.
The present disclosure provides a search method for question and answer results, comprising:
obtaining video information corresponding to a search question, wherein the video information comprises a video cover, the video cover is composed of a plurality of key frame images of a video, each key frame image comprises text content and image content that answer the search question, and the number of key frame images matches the number of answer points that answer the search question; displaying the video cover in a search result page; and playing the video in response to detecting a trigger operation on the video cover.
Optionally, the arrangement order of the plurality of key frame images in the video cover matches the order of the answer points corresponding to the search question.
Optionally, each key frame image is an image, cropped from a video frame of the video, that contains the text content and image content of an answer point.
Optionally, the text content of the answer point contained in the key frame image is recognized from the video frames of the video, and the image content of the answer point contained in the key frame image is matched from the video frames of the video based on the text content of the answer point.
Optionally, playing the video in response to detecting a trigger operation on the video cover comprises: in response to detecting a trigger operation on a first key frame image on the video cover, determining the time stamp corresponding to the first key frame image on the playback time axis of the video; and jumping to a playing page of the video and playing the video from the time stamp.
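The terminal-side behaviour just described can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes the video information arrives as a structure holding the key frame images in cover order plus a mapping from each key frame image identifier to its time stamp in seconds, and the names `VideoInfo`, `keyframe_times`, `PlaybackRequest` and `on_cover_tapped` are hypothetical. Falling back to the start of the video when the cover as a whole (rather than one key frame image) is triggered is also an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VideoInfo:
    """Hypothetical shape of the video information fed back by the server."""
    video_id: str
    cover_keyframe_ids: list[str]  # key frame images, in cover (answer) order
    keyframe_times: dict[str, float] = field(default_factory=dict)  # id -> seconds


@dataclass
class PlaybackRequest:
    """What the terminal passes to its (hypothetical) video playing page."""
    video_id: str
    start_seconds: float


def on_cover_tapped(info: VideoInfo, tapped_keyframe_id: Optional[str]) -> PlaybackRequest:
    """Turn a trigger operation on the video cover into a playback request.

    If a specific key frame image was triggered and its time stamp is known,
    playback starts there; otherwise playback starts at the beginning.
    """
    start = 0.0
    if tapped_keyframe_id is not None:
        start = info.keyframe_times.get(tapped_keyframe_id, 0.0)
    return PlaybackRequest(video_id=info.video_id, start_seconds=start)
```

Tapping the second key frame image of a cover whose mapping is `{"k1": 12.5, "k2": 40.0}` yields a request to start at 40 seconds.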
The present disclosure further provides a search method for question and answer results, comprising:
receiving a search request from a terminal device, the search request comprising a search question;
searching, based on the search question, for a video containing an answer to the search question;
determining a plurality of answer points of the answer based on an audio signal or text content of the video;
for each answer point, processing the video to obtain a key frame image containing the text content and image content of the answer point;
generating a video cover of the video based on the plurality of key frame images corresponding to the plurality of answer points; and
feeding back video information including the video cover to the terminal device.
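The server-side steps above can be read as a short pipeline. The sketch below only fixes the order of the stages and the data passed between them; each stage is injected as a callable because the disclosure does not prescribe how searching, answer point detection, key frame extraction, or cover splicing are implemented. All names (`handle_search_request`, `VideoInfoResponse`, and the stage parameters) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass
class VideoInfoResponse:
    """Hypothetical payload returned to the terminal device."""
    video_id: str
    cover: Any                # spliced cover image; the format is left open
    answer_points: list[str]  # in answer order, one key frame image per point


def handle_search_request(
    question: str,
    search_video: Callable[[str], str],                  # question -> video id
    find_answer_points: Callable[[str], Sequence[str]],  # video id -> answer points
    make_keyframe: Callable[[str, str], Any],            # (video id, point) -> image
    make_cover: Callable[[list[Any]], Any],              # ordered images -> cover
) -> VideoInfoResponse:
    """Run the server-side stages in the order the method lists them."""
    video_id = search_video(question)
    points = list(find_answer_points(video_id))
    # One key frame image per answer point keeps the two counts matched.
    keyframes = [make_keyframe(video_id, point) for point in points]
    cover = make_cover(keyframes)
    return VideoInfoResponse(video_id=video_id, cover=cover, answer_points=points)
```

Injecting the stages keeps the ordering testable on its own, for example with stub lambdas, while real implementations can be swapped in later.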
Optionally, processing the video to obtain a key frame image containing the text content and image content of the answer point comprises:
performing character recognition on the video to obtain the text content of the answer point; identifying, based on the text content of the answer point, image content matching the text content in the video frames of the video; cropping the image content from the video frame; and generating, based on the text content of the answer point and the image content, a key frame image containing the answer point.
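One way to realize these four sub-steps is sketched below, under clearly stated assumptions: character recognition has already produced text lines with bounding boxes per frame (any OCR engine could supply these), candidate image regions per frame are already known, frames are small grayscale pixel grids, and "matching" is approximated by word overlap between the answer point and the recognized text, with the vertically nearest image region in the same frame merged into the crop. The names `OcrLine`, `KeyFrameImage` and `build_keyframe` are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Tuple

# (left, top, right, bottom) in pixels; right and bottom are exclusive.
Box = Tuple[int, int, int, int]


@dataclass
class OcrLine:
    """One line of recognized text (assumed OCR output)."""
    frame_index: int
    text: str
    box: Box


@dataclass
class KeyFrameImage:
    caption: str             # text content of the answer point
    pixels: list[list[int]]  # cropped image content (grayscale rows)
    frame_index: int         # frame the crop was taken from


def _words(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def _union(a: Box, b: Box) -> Box:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def _crop(frame: list[list[int]], box: Box) -> list[list[int]]:
    left, top, right, bottom = box
    return [row[left:right] for row in frame[top:bottom]]


def build_keyframe(
    point_text: str,
    ocr_lines: list[OcrLine],
    frames: list[list[list[int]]],
    image_boxes: dict[int, list[Box]],
) -> KeyFrameImage:
    """Pick the OCR line that best matches the answer point, then crop it
    together with the nearest image region found in the same frame."""
    wanted = _words(point_text)
    line = max(ocr_lines, key=lambda ln: len(wanted & _words(ln.text)))
    box = line.box
    regions = image_boxes.get(line.frame_index, [])
    if regions:
        # "Nearest" here means the smallest gap between top edges.
        nearest = min(regions, key=lambda r: abs(r[1] - line.box[1]))
        box = _union(line.box, nearest)
    return KeyFrameImage(
        caption=line.text,
        pixels=_crop(frames[line.frame_index], box),
        frame_index=line.frame_index,
    )
```

Merging the text box with an image region means the crop keeps both the wording and the picture of the answer point, which is what each key frame image is required to contain.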
Optionally, generating the video cover of the video based on the plurality of key frame images corresponding to the plurality of answer points comprises:
determining, based on the order of the plurality of answer points in the answer, the arrangement order in the video cover of the key frame images corresponding to the answer points; determining a corresponding splicing template based on the sizes of the key frame images and their arrangement order; scaling the key frame image corresponding to each area proportionally based on the size of that area in the splicing template; and inserting the scaled key frame images into the corresponding areas of the template to obtain the video cover of the video.
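The template selection and proportional scaling can be sketched as follows. This assumes a small set of predefined templates, each a list of slots in cover coordinates; the selection rule (smallest total log aspect-ratio mismatch between each image and its slot, taken in answer order) is a hypothetical choice, since the disclosure only says the template is determined from the image sizes and arrangement order. "Proportional scaling" is interpreted here as fit-inside scaling with the image centered in its slot. `Area`, `Placement` and `layout_cover` are illustrative names.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Area:
    """One slot of a splicing template, in cover pixel coordinates."""
    x: int
    y: int
    w: int
    h: int


@dataclass(frozen=True)
class Placement:
    """Where a scaled key frame image lands on the cover."""
    index: int  # position of the key frame image in answer order
    x: int
    y: int
    w: int
    h: int


def _mismatch(size: tuple[int, int], area: Area) -> float:
    w, h = size
    return abs(math.log((w / h) / (area.w / area.h)))


def choose_template(sizes: list[tuple[int, int]], templates: list[list[Area]]) -> list[Area]:
    """Among templates with one slot per image, pick the one whose slots
    (taken in answer order) best match the images' aspect ratios."""
    candidates = [t for t in templates if len(t) == len(sizes)]
    if not candidates:
        raise ValueError("no splicing template for this number of key frame images")
    return min(candidates, key=lambda t: sum(_mismatch(s, a) for s, a in zip(sizes, t)))


def fit_into_area(size: tuple[int, int], area: Area) -> tuple[int, int, int, int]:
    """Scale proportionally so the image fits inside the area, then center it."""
    w, h = size
    scale = min(area.w / w, area.h / h)
    new_w, new_h = round(w * scale), round(h * scale)
    return area.x + (area.w - new_w) // 2, area.y + (area.h - new_h) // 2, new_w, new_h


def layout_cover(sizes: list[tuple[int, int]], templates: list[list[Area]]) -> list[Placement]:
    """Placements for the key frame images, given in answer order."""
    template = choose_template(sizes, templates)
    return [Placement(i, *fit_into_area(size, area))
            for i, (size, area) in enumerate(zip(sizes, template))]
```

With a 600x400 cover and two templates (two 300x400 columns, or two 600x200 rows), two 16:9 key frame images are stacked in rows, while two portrait images go side by side.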
Optionally, before feeding back the video information including the video cover to the terminal device, the method further comprises: determining the time stamp corresponding to each key frame image on the playback time axis of the video; and adding the correspondence between each key frame image and its time stamp to the video information.
Optionally, determining the time stamp corresponding to each key frame image on the playback time axis of the video comprises:
for each key frame image, matching the key frame image against the video frames of the video to determine the target video frame that matches the key frame image; and determining the time stamp corresponding to the target video frame as the time stamp corresponding to the key frame image on the playback time axis of the video.
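A minimal sketch of the frame-matching variant follows. It assumes both the key frame image and every video frame have already been downscaled to the same small grayscale thumbnail size, and it scores similarity by mean absolute pixel difference with a hypothetical acceptance threshold `max_diff`; a production system might instead use perceptual hashes or feature matching. The function names are illustrative.

```python
from __future__ import annotations

from typing import Optional, Sequence

Thumb = Sequence[Sequence[int]]  # grayscale thumbnail, rows of 0-255 values


def mean_abs_diff(a: Thumb, b: Thumb) -> float:
    """Average per-pixel absolute difference of two equally sized thumbnails."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("thumbnails must have the same shape")
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / sum(len(row) for row in a)


def find_target_frame(keyframe: Thumb, frames: Sequence[Thumb],
                      max_diff: float = 10.0) -> Optional[int]:
    """Index of the most similar frame, or None if nothing is close enough."""
    best_index, best_score = None, float("inf")
    for index, frame in enumerate(frames):
        score = mean_abs_diff(keyframe, frame)
        if score < best_score:
            best_index, best_score = index, score
    return best_index if best_score <= max_diff else None


def keyframe_time_stamp(keyframe: Thumb, frames: Sequence[Thumb],
                        frame_times: Sequence[float],
                        max_diff: float = 10.0) -> Optional[float]:
    """Time stamp (seconds) of the target video frame matched to the key frame."""
    index = find_target_frame(keyframe, frames, max_diff)
    return None if index is None else frame_times[index]
```

Returning `None` rather than the closest frame when no frame is close enough avoids attaching a misleading time stamp to a key frame image that was heavily edited during cover generation.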
Optionally, determining the time stamp corresponding to each key frame image on the playback time axis of the video comprises: determining the time stamp corresponding to each key frame image on the playback time axis according to the mapping relationship between each video frame and the playback time on the playback time axis of the video.
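The mapping between video frames and playback time is not specified further in the text. A common convention, assumed here, is to derive it from the frame rate for constant-frame-rate video, or from each frame's presentation timestamp (PTS) multiplied by the stream's time base for variable-frame-rate video. The helper names below are hypothetical.

```python
from __future__ import annotations

from fractions import Fraction
from typing import Union

Number = Union[int, Fraction]


def frame_time_cfr(frame_index: int, fps: Number) -> float:
    """Constant frame rate: frame n is shown at n / fps seconds."""
    return float(frame_index / Fraction(fps))


def frame_time_from_pts(pts: int, time_base: Fraction) -> float:
    """Variable frame rate: presentation timestamp in time-base units -> seconds."""
    return float(pts * time_base)


def build_frame_time_map(frame_pts: list[int], time_base: Fraction) -> dict[int, float]:
    """Map each frame index to its playback time in seconds."""
    return {i: frame_time_from_pts(p, time_base) for i, p in enumerate(frame_pts)}


def format_time(seconds: float) -> str:
    """Render a time stamp as mm:ss, e.g. for display next to a key frame image."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"
```

Using `Fraction` keeps NTSC-style rates such as 30000/1001 fps exact until the final conversion to seconds.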
The present disclosure further provides a search apparatus for question and answer results, comprising:
an information obtaining module configured to obtain video information corresponding to a search question, wherein the video information comprises a video cover, the video cover is composed of a plurality of key frame images of a video, each key frame image comprises text content and image content that answer the search question, and the number of key frame images matches the number of answer points that answer the search question;
a cover display module configured to display the video cover in a search result page; and
a video playing module configured to play the video in response to detecting a trigger operation on the video cover.
Optionally, the arrangement order of the plurality of key frame images in the video cover matches the order of the answer points corresponding to the search question.
Optionally, each key frame image is an image, cropped from a video frame of the video, that contains the text content and image content of an answer point.
Optionally, the text content of the answer point contained in the key frame image is recognized from the video frames of the video, and the image content of the answer point contained in the key frame image is matched from the video frames of the video based on the text content of the answer point.
Optionally, the video playing module is configured to: in response to detecting a trigger operation on a first key frame image on the video cover, determine the time stamp corresponding to the first key frame image on the playback time axis of the video; and jump to a playing page of the video and play the video from the time stamp.
The present disclosure further provides a search apparatus for question and answer results, comprising:
a request receiving module configured to receive a search request from a terminal device, the search request comprising a search question;
a video search module configured to search, based on the search question, for a video containing an answer to the search question;
a key point determination module configured to determine a plurality of answer points of the answer based on an audio signal or text content of the video;
an image obtaining module configured to process the video, for each answer point, to obtain a key frame image containing the text content and image content of the answer point;
a cover generation module configured to generate a video cover of the video based on the plurality of key frame images corresponding to the plurality of answer points; and
an information feedback module configured to feed back video information including the video cover to the terminal device.
Optionally, the image obtaining module comprises:
a text content obtaining unit configured to perform character recognition on the video to obtain the text content of the answer point;
an image content recognition unit configured to identify, based on the text content of the answer point, image content matching the text content in the video frames of the video;
an image content cropping unit configured to crop the image content from the video frame; and
an image generation unit configured to generate, based on the text content of the answer point and the image content, a key frame image containing the answer point.
Optionally, the cover generation module comprises:
a sequence determination unit configured to determine, based on the order of the plurality of answer points in the answer, the arrangement order in the video cover of the key frame images corresponding to the answer points;
a template determination unit configured to determine a corresponding splicing template based on the sizes of the key frame images and their arrangement order;
a scaling processing unit configured to scale the key frame image corresponding to each area proportionally based on the size of that area in the splicing template; and
an image inserting unit configured to insert the scaled key frame images into the corresponding areas of the template to obtain the video cover of the video.
Optionally, the apparatus further comprises:
a time stamp determination module configured to determine the time stamp corresponding to each key frame image on the playback time axis of the video; and
a relationship adding module configured to add the correspondence between each key frame image and its time stamp to the video information.
Optionally, the time stamp determination module comprises:
a matching unit configured to match, for each key frame image, the key frame image against the video frames of the video and determine the target video frame that matches the key frame image; and
a first time stamp determination unit configured to determine the time stamp corresponding to the target video frame as the time stamp corresponding to the key frame image on the playback time axis of the video.
Optionally, the time stamp determination module comprises a second time stamp determination unit configured to determine the time stamp corresponding to each key frame image on the playback time axis according to the mapping relationship between each video frame and the playback time on the playback time axis of the video.
The present disclosure further provides a terminal device, comprising a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform the method described above.
The present disclosure further provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the method described above.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure.
To describe the technical solutions in the embodiments of the present disclosure or in the related art more clearly, the drawings used in describing the embodiments or the related art are briefly introduced below. Evidently, a person of ordinary skill in the art may derive other drawings from these drawings without creative effort.
FIG. 1 is a flowchart of a search method for question and answer results according to Embodiment 1 of the present disclosure;
FIG. 2 is a flowchart of a method for generating a video cover according to Embodiment 1 of the present disclosure;
FIG. 3 is a schematic diagram of a video cover according to Embodiment 1 of the present disclosure;
FIG. 4 is a flowchart of a search method for question and answer results according to Embodiment 2 of the present disclosure;
FIG. 5 is a structural block diagram of a search apparatus for question and answer results according to Embodiment 4 of the present disclosure;
FIG. 6 is a structural block diagram of a search apparatus for question and answer results according to Embodiment 5 of the present disclosure.
Detailed Description
To make the above features and advantages of the present disclosure easier to understand, the solutions of the present disclosure are further described below. It should be noted that the embodiments of the present disclosure and the features therein may be combined with each other provided they do not conflict.
Many specific details are set forth in the following description to facilitate a full understanding of the present disclosure, but the present disclosure may also be implemented in ways other than those described here. Evidently, the embodiments in the specification are only some, not all, of the embodiments of the present disclosure.
In video applications of the related art, a user can search for relevant videos and play them. However, the user has to spend time in the video search results finding the content they actually need to watch, which reduces search efficiency. In view of this, embodiments of the present disclosure provide a search method and apparatus, a device, and a storage medium for question and answer results, which are described in detail below.
Embodiment 1:
Referring to the flowchart of the search method for question and answer results shown in FIG. 1, the method may be applied to a terminal device with a video application, such as a mobile phone or a tablet computer. The search method comprises the following steps:
Step S102: obtain video information corresponding to a search question, wherein the video information includes a video cover composed of a plurality of key frame images of the video, each key frame image includes text content and image content that answer the search question, and the number of key frame images matches the number of answer points that answer the search question.
In one embodiment, the user sends, through the terminal device, a search request including the search question to the server; the search question may contain at least one keyword.
For ease of understanding, an implementation in which the server feeds back the video information corresponding to the search question is given below.
First, the server searches, based on the search question, for a video containing the answer to the search question. In implementation, the server may first search for an answer matching the keywords in the search question, and then search a preset video resource library for videos containing that answer.
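The two-stage lookup (answer first, then videos containing that answer) can be sketched with simple keyword overlap. Everything here is an assumption for illustration: the stop-word list, the scoring by shared keywords, representing each library video by a transcript string, and the `min_coverage` threshold are hypothetical choices; a real system would likely use a search index or semantic retrieval instead.

```python
from __future__ import annotations

import re
from typing import Optional

# Hypothetical stop-word list; a real system would use a proper one.
STOPWORDS = frozenset({"how", "to", "a", "an", "the", "it", "for", "by", "of", "and", "is", "we"})


def keywords(text: str) -> set[str]:
    return {w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS}


def best_answer(question: str, answers: list[str]) -> Optional[str]:
    """Answer sharing the most keywords with the question (None if no overlap)."""
    if not answers:
        return None
    q = keywords(question)
    score, answer = max(((len(q & keywords(a)), a) for a in answers), key=lambda s: s[0])
    return answer if score > 0 else None


def videos_containing(answer: str, library: dict[str, str],
                      min_coverage: float = 0.6) -> list[str]:
    """IDs of videos whose transcript covers enough of the answer's keywords,
    best coverage first."""
    target = keywords(answer)
    if not target:
        return []
    hits = []
    for video_id, transcript in library.items():
        coverage = len(target & keywords(transcript)) / len(target)
        if coverage >= min_coverage:
            hits.append((coverage, video_id))
    hits.sort(key=lambda h: h[0], reverse=True)  # stable: ties keep library order
    return [video_id for _, video_id in hits]
```

For the question "How to brew green tea", the answer about steeping green tea leaves wins, and only a video whose transcript covers most of that answer's keywords is returned.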
Then, the server determines a plurality of answer points of the answer based on the audio signal or text content of the video. The server may recognize the content of the video, such as its text content, audio signal, image content, and playing time, so as to determine the answer points from the recognized content. For each answer point, the server processes the video to obtain a key frame image containing the text content and image content of that answer point; for example, it uses the recognized text content and image content to determine, from the video, the key frame image that matches the answer point.
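The disclosure does not say how answer points are delimited. As one hypothetical heuristic, assuming the audio signal has already been transcribed (or the on-screen text already recognized) into sentences, a new answer point can start at every sentence that opens with an ordinal cue word, with following sentences attached to the current point. The English cue list and the name `split_answer_points` are illustrative only.

```python
from __future__ import annotations

import re

# Hypothetical cues that often open an enumerated point: "first(ly)",
# "second(ly)", "third(ly)", "next", "finally", "step 3", "1.", "2)".
_CUE = re.compile(
    r"^\s*(?:(?:(?:first|second|third)(?:ly)?|next|finally|step\s*\d+)\b|\d+[.)])",
    re.IGNORECASE,
)


def split_answer_points(sentences: list[str]) -> list[str]:
    """Group transcript sentences into answer points.

    A new point starts at each sentence beginning with a cue; later sentences
    without a cue are appended to the current point. Sentences before the
    first cue are treated as preamble and dropped.
    """
    points: list[str] = []
    for raw in sentences:
        sentence = raw.strip()
        if not sentence:
            continue
        if _CUE.match(sentence):
            points.append(sentence)
        elif points:
            points[-1] += " " + sentence
    return points
```

Each returned point would then drive one key frame image, which keeps the number of key frame images equal to the number of answer points.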
此后,基于多个答案要点对应的多个关键帧图像,生成视频的视频封面,以及,将包含视频封面的视频信息反馈给终端设备。其中,视频封面可以为各关键帧图像的拼接处理结果,各关键帧图像的拼接方式灵活多变,由此组成的视频封面能够呈现较好的页面展示效果。Thereafter, based on the multiple key frame images corresponding to the multiple answer points, the video cover of the video is generated, and the video information including the video cover is fed back to the terminal device. Wherein, the video cover may be the splicing processing result of each key frame image, and the splicing mode of each key frame image is flexible and changeable, and the video cover thus formed can present a better page display effect.
Step S104: display the video cover on the search result page.
The terminal device receives the video information and displays the video cover it contains on the search result page.
Step S106: play the video in response to detecting a trigger operation on the video cover.
In this embodiment, the terminal device determines whether a trigger operation on the video cover is detected. If one is detected, it plays the video in response.
In the question-and-answer result search method provided by the embodiments of the present disclosure, video information corresponding to the search question is obtained. The video information includes a video cover composed of multiple key frame images of the video. Each key frame image includes text content and image content that answer the search question, and the number of key frame images matches the number of answer points for the search question. The video cover is displayed on the search result page, and the video is played in response to detecting a trigger operation on the video cover. The text content in the key frame images fits users' everyday reading habits and makes the search results more intuitive. The image content gives a strong visual impression and makes the results more vivid. Because the number of key frame images matches the number of answer points, the cover also covers the answer to the search question more completely. The cover on the search result page therefore reflects the user's search intent comprehensively, intuitively, and accurately, so the user can quickly find the content they want from the cover alone. Playing the video in response to a trigger operation on the cover thus effectively improves video search efficiency.
The arrangement of the key frame images strongly affects how well the composed video cover displays. This embodiment therefore provides a way of presenting the video cover in which the order of the key frame images in the cover matches the order of the answer points for the search question. Arranging the key frame images in this order highlights the sequence of the answer points and makes the cover match the answer to the search question more closely.
In practice, a search on the search question usually returns multiple videos, and their aspect ratios are not necessarily the same. For example, some videos are landscape and others portrait. A video cover is usually taken from one of the video's frames, so covers of different videos differ in size. When covers of multiple videos are displayed together on the same search result page, the layout becomes cluttered. Simply forcing covers of different ratios to one size does not solve this either. For example, showing a portrait video with a landscape cover gives a poor cover and makes image quality hard to guarantee.
Therefore, to improve how video covers are displayed on the search result page and make the interface more user-friendly, this embodiment provides a key frame image. The key frame image is cropped from a video frame of the video and contains the text content and image content of an answer point. Its text content is recognized from the video frames of the video. Its image content is then matched from the video frames of the video based on that text content.
In practice, the key frame images in this embodiment are generated by the server. For clarity, the following describes how the key frame image is obtained for each answer point. Referring to FIG. 2, the process of obtaining from the video a key frame image containing the text content and image content of an answer point includes the following steps S202 to S208:
Step S202: perform character recognition on the video to obtain the text content of the answer point.
Most videos have subtitles, and subtitles reflect the video content fairly accurately. In a specific embodiment, character recognition is therefore performed on each video frame using OCR (Optical Character Recognition) or a detection model, which yields a candidate character recognition result for each frame. Each candidate result is then checked against the answer points. A candidate result that matches some answer point also matches the search question and can express one of its answer points. Such a result is taken as a target character recognition result. The target result itself, or keywords extracted from it, is then used as the text content of that answer point.
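The matching in Step S202 can be sketched as follows. This is a minimal illustration, not the method claimed here. The description does not say how a candidate character recognition result is judged to "match" an answer point. The character-level similarity from Python's `difflib.SequenceMatcher`, the `MATCH_THRESHOLD` value, and all function names are therefore assumptions, and per-frame OCR output is assumed to be available as plain strings.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple

# Hypothetical threshold: the description does not give a value.
MATCH_THRESHOLD = 0.5


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (difflib's ratio)."""
    return SequenceMatcher(None, a, b).ratio()


def match_answer_point(candidate: str, answer_points: List[str],
                       threshold: float = MATCH_THRESHOLD) -> Optional[int]:
    """Index of the best-matching answer point, or None if none reaches the threshold."""
    if not answer_points:
        return None
    scores = [similarity(candidate, point) for point in answer_points]
    best_idx = max(range(len(scores)), key=scores.__getitem__)  # first max on ties
    return best_idx if scores[best_idx] >= threshold else None


def select_target_results(frame_texts: Dict[int, str],
                          answer_points: List[str]) -> Dict[int, List[Tuple[int, str]]]:
    """Keep only per-frame OCR candidates that match some answer point.

    Returns {answer_point_index: [(frame_index, recognized_text), ...]}.
    """
    targets: Dict[int, List[Tuple[int, str]]] = {}
    for frame_idx, text in sorted(frame_texts.items()):
        text = text.strip()
        if not text:
            continue  # frame carries no recognizable characters
        idx = match_answer_point(text, answer_points)
        if idx is not None:
            targets.setdefault(idx, []).append((frame_idx, text))
    return targets
```

Frames whose text matches nothing, such as a "subscribe" banner, are simply dropped. Keyword extraction from the target result, which the description allows as an alternative, is not shown.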
Generally, a video shows the complete procedure of something, so the answer derived from it should likewise consist of multiple logically connected answer points. Tutorial videos on cooking, crafts, or fitness, for example, usually contain several answer points, each corresponding to a key step. The above embodiment therefore usually yields multiple pieces of text content, one per answer point.
Step S204: based on the text content of the answer point, identify image content matching that text content in the video frames of the video.
When the video has few frames in total, the frames can be examined one by one for image content matching the text content.
When the video has many frames, it carries richer and more varied information. To improve recognition efficiency, this embodiment may first segment the video into multiple video clips and then look for matching image content in the frames of at least some of the clips. For a specific implementation, see steps (1) and (2) below.
(1) Using the candidate character recognition results from character recognition on the video, compute the correlation between the characters of each pair of consecutive video frames. Wherever the correlation falls below a predetermined correlation threshold, split the video at the position of those two frames. The result is multiple video clips that contain characters.
(2) Based on the text content of the answer point, identify image content matching the text content in the video frames of the video clips.
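Step (1) can be sketched as below. This is a hedged illustration only. The description speaks of "the correlation between the characters" of consecutive frames and "a predetermined correlation threshold" without defining either. Here the correlation is assumed to be `difflib.SequenceMatcher.ratio()` and the threshold an illustrative 0.6, and the function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# Hypothetical value; the description only mentions "a predetermined correlation threshold".
CORRELATION_THRESHOLD = 0.6


def text_correlation(a: str, b: str) -> float:
    """Assumed correlation between two frames' characters: difflib's ratio in [0, 1].

    Two empty strings give 1.0; one empty string against a non-empty one gives 0.0.
    """
    return SequenceMatcher(None, a, b).ratio()


def segment_by_text(frame_texts: List[str],
                    threshold: float = CORRELATION_THRESHOLD) -> List[Tuple[int, int]]:
    """Split a video into clips wherever consecutive frames' OCR text diverges.

    frame_texts[i] is the candidate character recognition result of frame i.
    Returns half-open (start, end) frame ranges, keeping only clips that contain characters.
    """
    if not frame_texts:
        return []
    segments: List[Tuple[int, int]] = []
    start = 0
    for i in range(1, len(frame_texts)):
        if text_correlation(frame_texts[i - 1], frame_texts[i]) < threshold:
            segments.append((start, i))  # cut between frame i-1 and frame i
            start = i
    segments.append((start, len(frame_texts)))
    return [(s, e) for s, e in segments
            if any(frame_texts[k].strip() for k in range(s, e))]
```

For example, `segment_by_text(["step one wash rice", "step one wash rice", "", "step two add water", "step two add water"])` gives `[(0, 2), (3, 5)]`. The subtitle-free frame in the middle forms its own clip, which is then discarded because it contains no characters.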
The playback order of the video follows the same order as the answer points, so the chronological order of the video clips is also related to the order of the answer points. At least one video clip can therefore be assigned to each answer point. For the current answer point, image content matching its text content is looked for in the frames of the clips assigned to it. Specifically, the image content of each frame in the clip is recognized one frame at a time and compared with the text content of the current answer point. When the image recognition result matches, the matching image content has been found.
If no image content matching the text content of the current answer point is found in the assigned clips, the remaining clips are searched. With this approach, the frames of the whole video need not be compared with the text content of every answer point. Each answer point is compared only with the frames of some of the clips. Because each answer point has its own assigned clips, image content for several answer points can also be identified in parallel. This approach therefore effectively improves the efficiency of image content recognition.
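The assignment-with-fallback search described above can be sketched as follows, under stated assumptions. The description does not say how clips are divided among answer points. Here they are spread evenly in playback order, which is an assumption, and `assign_clips` guarantees at least one clip per point. The image-versus-text comparison is abstracted as a caller-supplied `matches(frame, text)` predicate, a hypothetical stand-in for whatever image recognition model is used. The thread pool reflects only the remark that several answer points can be searched at the same time.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence, TypeVar

Frame = TypeVar("Frame")


def assign_clips(num_clips: int, num_points: int) -> List[List[int]]:
    """Spread clips over answer points in playback order; each point gets at least one clip."""
    assignment = []
    for j in range(num_points):
        lo = j * num_clips // num_points
        hi = max((j + 1) * num_clips // num_points, lo + 1)
        assignment.append(list(range(lo, min(hi, num_clips))))
    return assignment


def find_image_frame(point_text: str, assigned: List[int],
                     clips: Sequence[Sequence[Frame]],
                     matches: Callable[[Frame, str], bool]) -> Optional[Frame]:
    """Search the assigned clips first, then fall back to the remaining clips."""
    others = [c for c in range(len(clips)) if c not in assigned]
    for clip_idx in assigned + others:
        for frame in clips[clip_idx]:
            if matches(frame, point_text):  # hypothetical image/text matcher
                return frame
    return None


def find_all(points: List[str], clips: Sequence[Sequence[Frame]],
             matches: Callable[[Frame, str], bool]) -> List[Optional[Frame]]:
    """Run the per-point search for every answer point concurrently, results in point order."""
    assignment = assign_clips(len(clips), len(points))
    with ThreadPoolExecutor() as pool:
        return list(pool.map(
            lambda j: find_image_frame(points[j], assignment[j], clips, matches),
            range(len(points))))
```

With frames stood in by strings, `find_all(["C", "A"], [["a1", "a2"], ["b1"], ["c1", "c2"]], lambda f, t: f.startswith(t.lower()))` returns `["c1", "a1"]`. Both points miss in their assigned clips and are found through the fallback.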
Step S206: capture the image content from the video frame.
In this embodiment, once image content matching the text content has been identified in a video frame, that frame is captured to obtain a captured image containing the image content.
Step S208: based on the text content and image content of the answer point, generate a key frame image containing the answer point.
To preserve the text content and image content of the answer point as completely as possible, this embodiment may generate the key frame image as follows:
Perform object detection on the captured image to obtain a bounding box enclosing the text content and image content of the answer point. Then enlarge the bounding box to a preset aspect ratio. The aspect ratio is preset to make the key frame images easier to stitch together and to reduce size mismatches between images during stitching. Enlarging means the following: when one dimension of the bounding box is too small for the preset aspect ratio, that dimension is increased until the adjusted box has exactly the preset aspect ratio. For example, if the box is too narrow for the preset aspect ratio, its width is increased. Because the box is only ever enlarged, no part of the text content or image content is cut off, so both stay intact. Finally, the region given by the adjusted box's position parameters is cropped from the captured image to obtain the key frame image containing the answer point.
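The enlargement rule can be expressed as a small geometry function. This is a sketch, not the patented procedure. Centering the enlarged box on the original, shifting it back inside the captured image, and clipping it when it exceeds the image are all assumptions the description does not address. The ratio is passed as two integers so that the ceiling division stays exact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x: int  # left edge, pixels
    y: int  # top edge, pixels
    w: int  # width, pixels
    h: int  # height, pixels


def expand_to_aspect(box: Box, ratio_w: int, ratio_h: int,
                     img_w: int, img_h: int) -> Box:
    """Enlarge (never shrink) a box until w:h equals ratio_w:ratio_h.

    Assumption: the enlarged box stays centred on the original where possible and is
    then shifted back inside the captured image. If it is larger than the image it is
    clipped, and the ratio then only holds approximately.
    """
    w, h = box.w, box.h
    if w * ratio_h < h * ratio_w:        # too narrow: increase the width
        w = -(-h * ratio_w // ratio_h)   # ceiling division, so the box is never too small
    elif w * ratio_h > h * ratio_w:      # too short: increase the height
        h = -(-w * ratio_h // ratio_w)
    x = box.x - (w - box.w) // 2         # grow symmetrically around the original box
    y = box.y - (h - box.h) // 2
    x = min(max(x, 0), max(img_w - w, 0))  # shift back inside the image
    y = min(max(y, 0), max(img_h - h, 0))
    return Box(x, y, min(w, img_w), min(h, img_h))
```

For a 100x90 box at (200, 100) in a 1280x720 image and a 16:9 target, the width grows to 160 and the result is `Box(170, 100, 160, 90)`, which still contains the original box.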
After the key frame images for the answer points have been extracted from the video according to any of the above embodiments, the video cover is generated from them. This step can be implemented as follows:
First, determine the order of the key frame images in the video cover from the order of the answer points in the answer. Specifically, the image content corresponding to each answer point's text content is sorted by the order of the answer points in the answer, and this sorted order becomes the arrangement order of the key frame images.
Next, determine the corresponding stitching template from the sizes of the key frame images and their arrangement order. In a specific implementation, the template may be chosen based on the number, arrangement order, and sizes of the key frame images. For example, with two key frame images in a front-to-back order, the chosen template holds two images and expresses their order top-to-bottom, left-to-right, or in a similar way. The aspect ratio of each key frame image follows from its size, and a template whose regions match those aspect ratios can be selected. Those skilled in the art may also choose the template based on other image parameters as needed, and this is not specifically limited here. The stitching templates in this embodiment may be preset in a stitching template library for retrieval.
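Template selection could look like the sketch below. It rests on labeled assumptions. The `Template` structure, the convention that regions are listed in reading order (left to right, top to bottom), and the log-ratio mismatch score are all hypothetical, since the description only says a template "matching" the count, order, and aspect ratios is chosen from a template library.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Region = Tuple[int, int, int, int]  # (x, y, w, h) inside the cover


@dataclass(frozen=True)
class Template:
    name: str
    regions: Tuple[Region, ...]  # assumed to be listed in reading order


def aspect_mismatch(template: Template, sizes: Sequence[Tuple[int, int]]) -> float:
    """Sum of |log(region ratio / image ratio)|; 0 means every region fits exactly."""
    return sum(abs(math.log((rw / rh) / (w / h)))
               for (_, _, rw, rh), (w, h) in zip(template.regions, sizes))


def choose_template(library: List[Template],
                    sizes: Sequence[Tuple[int, int]]) -> Template:
    """Pick a template with one region per key frame and the closest aspect ratios.

    sizes holds (width, height) of the key frame images already sorted in answer-point
    order, so key frame i goes into region i.
    """
    candidates = [t for t in library if len(t.regions) == len(sizes)]
    if not candidates:
        raise ValueError(f"no template with {len(sizes)} regions")
    return min(candidates, key=lambda t: aspect_mismatch(t, sizes))
```

Comparing ratios on a log scale treats "twice as wide" and "twice as tall" as equally bad, which seems a reasonable default. A real template library might weigh other image parameters, as the description allows.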
Finally, scale the key frame image for each region of the stitching template proportionally to that region's size, and insert the scaled images into the corresponding regions to obtain the video cover. For the current region, this embodiment computes a scaling factor from the region's size and the size of its key frame image. It then scales the key frame image proportionally by that factor and inserts the scaled image into the region.
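One way to compute the proportional scaling factor is shown below. This is a minimal sketch. Taking the smaller of the two axis ratios (so the image fits inside the region without distortion) and centering it in the region are assumptions. The description only says the factor is computed from the region size and the image size.

```python
from typing import Tuple

Region = Tuple[int, int, int, int]  # (x, y, w, h) of a template region inside the cover


def fit_into_region(img_w: int, img_h: int, region: Region) -> Tuple[float, Region]:
    """Scale a key frame image proportionally so it fits the region, centred.

    Returns the scale factor and the placement rectangle (x, y, w, h) on the cover.
    """
    rx, ry, rw, rh = region
    scale = min(rw / img_w, rh / img_h)  # one factor for both axes keeps the aspect ratio
    new_w, new_h = round(img_w * scale), round(img_h * scale)
    x = rx + (rw - new_w) // 2           # centre any leftover space in the region
    y = ry + (rh - new_h) // 2
    return scale, (x, y, new_w, new_h)
```

When the key frame image already has the region's aspect ratio, as the preset-ratio cropping aims for, the image fills the region exactly. With Pillow, for instance, the rectangle could then be applied with `Image.resize((w, h))` and `cover.paste(img, (x, y))`.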
Following the above embodiments, FIG. 3 shows several examples of video covers displayed on the search result page. The left part of FIG. 3 shows a cover built from three key frame images that were extracted by text-based key frame extraction and then cropped and stitched. The search question in the search box is "how to delete junk files on a phone". Each key frame image of this cover includes an answer point to that question, the text content for that answer point (for example, the subtitle "properly clean up phone memory"), and image content. The left-to-right, top-to-bottom order of the key frame images in the cover also matches the order of the answer points. In the right part of FIG. 3, each key frame image of the cover contains image content associated with one answer point to the search question. In order, these are minced meat, beaten egg, and steamed egg with minced meat.
In this embodiment, the video information that the terminal device obtains for the search question may also include time stamps. Each time stamp indicates where a key frame image in the video cover occurs on the video's playback timeline.
Accordingly, this embodiment provides a method of playing the video in response to detecting a trigger operation on the video cover, including:
In response to detecting a trigger operation on a first key frame image on the video cover, determine the time stamp of the first key frame image on the video's playback timeline. Then jump to the video's playback page and play the video from that time stamp.
The terminal device determines whether a trigger operation is detected. The trigger operation is an operation on the first key frame image of the video cover on the search result page. If the terminal device has a touchscreen, for example, the trigger operation may be a touch by a finger, stylus, or other operating body. If its input device is a mouse, the trigger operation may be a mouse click on the search result page.
If a trigger operation is detected, the terminal device responds by determining the time stamp of the first key frame image on the playback timeline. It then jumps to the video's playback page and plays the video from that time stamp, so playback starts with the first key frame as the starting frame.
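The description does not say how the terminal device decides which key frame image a touch or click landed on. One plausible approach, sketched here as an assumption, is a hit test against the layout rectangles of the key frames in the cover, with each rectangle paired with its time stamp from the video information. `CoverEntry` and `start_time_for_tap` are hypothetical names. Jumping to the playback page and seeking are platform-specific and not shown.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class CoverEntry:
    region: Tuple[int, int, int, int]  # (x, y, w, h) of one key frame inside the cover
    timestamp_s: float                 # its time stamp on the playback timeline, seconds


def start_time_for_tap(entries: Sequence[CoverEntry],
                       tap_x: float, tap_y: float) -> Optional[float]:
    """Return the time stamp of the key frame under a tap/click (cover coordinates), or None."""
    for entry in entries:
        x, y, w, h = entry.region
        if x <= tap_x < x + w and y <= tap_y < y + h:  # half-open bounds avoid double hits
            return entry.timestamp_s
    return None
```

Returning `None` for taps outside every key frame leaves the terminal free to fall back to ordinary playback from the beginning.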
For ease of understanding, an embodiment in which the server generates the time stamps is given below. First, the time stamp of each key frame image on the video's playback timeline is determined.
This embodiment may determine the time stamp of each key frame image in several ways. Two examples follow.
Implementation 1: match each key frame image against the video frames of the video to find the target video frame that matches it, and take the target video frame's time stamp as the key frame image's time stamp on the playback timeline.
Implementation 2: determine each key frame image's time stamp on the playback timeline from the mapping between each video frame and its playback time on the timeline.
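Both implementations can be sketched in a few lines, under stated assumptions. For implementation 2, the frame-to-time mapping is assumed to be a constant frame rate, `time = frame_index / fps`. Variable-frame-rate video would need the container's per-frame timestamps instead. For implementation 1, the key frame image is a crop, so it is located inside each frame by brute-force template matching with the sum of absolute differences (SAD) on small grayscale arrays. This is an illustrative choice, not one the description prescribes. All names are hypothetical.

```python
from typing import Sequence

Gray = Sequence[Sequence[int]]  # grayscale image as rows of pixel values


def frame_time(frame_index: int, fps: float) -> float:
    """Implementation 2: map a frame index to its playback time, assuming constant fps."""
    return frame_index / fps


def best_sad(frame: Gray, patch: Gray) -> int:
    """Smallest sum of absolute differences of patch over every position in frame."""
    fh, fw = len(frame), len(frame[0])
    ph, pw = len(patch), len(patch[0])
    best = None
    for top in range(fh - ph + 1):
        for left in range(fw - pw + 1):
            sad = sum(abs(frame[top + r][left + c] - patch[r][c])
                      for r in range(ph) for c in range(pw))
            if best is None or sad < best:
                best = sad
    if best is None:
        raise ValueError("key frame image is larger than the video frame")
    return best


def keyframe_timestamp(frames: Sequence[Gray], key_image: Gray, fps: float) -> float:
    """Implementation 1: find the frame that best contains the key frame image, return its time."""
    target = min(range(len(frames)), key=lambda i: best_sad(frames[i], key_image))
    return frame_time(target, fps)
```

The brute-force search is quadratic in image size and only suits tiny inputs. At real resolutions, a library routine such as OpenCV's `cv2.matchTemplate` is the usual tool. When the server still knows which frame each key frame image was captured from, implementation 2 avoids the search entirely.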
After each key frame image's time stamp has been determined in one of these ways, the correspondence between each key frame image and its time stamp is added to the video information.
In summary, in the question-and-answer result search method of the above embodiments, the terminal device obtains from the server the video information corresponding to the search question. This video information includes a video cover that can be displayed on the search result page and time stamps that specify playback positions. The cover is composed of multiple key frame images that can be combined in flexible ways, which enriches how covers are presented. The arrangement order of the key frame images matches the answer points of the search question, which gives the cover a logical structure. Each key frame image includes text content and image content answering the search question. The text content fits users' everyday reading habits and makes the results more intuitive, while the image content gives a strong visual impression and makes them more vivid. The cover on the search result page therefore makes the page more user-friendly and reflects the user's search intent comprehensively, intuitively, and accurately. The user can quickly find the content they want from the cover, so playing the video in response to a trigger operation effectively improves video search efficiency. The time stamps in the video information also let the user jump quickly to the desired position in the video, which improves the user experience.
Embodiment 2:
Building on Embodiment 1 above, this embodiment provides a question-and-answer result search method applied to a server. As shown in FIG. 4, the method includes:
Step S402: receive a search request from the terminal device, the search request including a search question;
Step S404: based on the search question, search for a video containing the answer to the search question;
Step S406: based on the audio signal or text content of the video, determine multiple answer points of the answer;
Step S408: for each answer point, process the video to obtain a key frame image containing the text content and image content of the answer point;
Step S410: generate a video cover for the video from the key frame images corresponding to the answer points;
Step S412: feed back video information including the video cover to the terminal device.
In one embodiment, the step of obtaining from the video a key frame image containing the text content and image content of the answer point includes:
performing character recognition on the video to obtain the text content of the answer point; identifying image content matching that text content in the video frames of the video; capturing the image content from the video frame; and generating, from the answer point's text content and image content, a key frame image containing the answer point.
In one embodiment, the step of generating the video cover from the key frame images corresponding to the answer points includes:
determining the order of the key frame images in the video cover from the order of the answer points in the answer; determining a stitching template from the sizes of the key frame images and their arrangement order; scaling the key frame image for each region of the template proportionally to that region's size; and inserting the scaled key frame images into the corresponding regions of the template to obtain the video cover.
In one embodiment, before the step of feeding back the video information including the video cover to the terminal device, the method further includes:
determining the time stamp of each key frame image on the video's playback timeline, and adding the correspondence between each key frame image and its time stamp to the video information.
In one embodiment, the step of determining the time stamp of each key frame image on the video's playback timeline includes:
matching each key frame image against the video frames of the video to find the matching target video frame, and taking the target video frame's time stamp as the key frame image's time stamp on the playback timeline.
In one embodiment, the step of determining the time stamp of each key frame image on the video's playback timeline includes:
determining each key frame image's time stamp on the playback timeline from the mapping between each video frame and its playback time on the timeline.
In the question-and-answer result search method provided by the embodiments of the present disclosure, the server searches for a video that contains the answer to the search question. The video obtained therefore closely matches that answer, and video information generated from it matches the answer better as well. The key frame images determined from the video contain the text content and image content of the answer points, so they convey those points intuitively and vividly. The video cover generated from multiple key frame images reflects the user's search intent comprehensively, intuitively, and accurately. Feeding back video information containing this cover to the terminal device helps the user quickly find what they want to watch, which noticeably improves the user experience of video search.
The method of this embodiment has the same implementation principle and technical effects as Embodiment 1. For brevity, anything not described in this embodiment can be found in the corresponding part of Embodiment 1.
实施例三:Embodiment three:
根据以上实施例一和实施例二,本实施例还可以提供一种问答结果的搜索方法,该方法包括:According to the first and second embodiments above, this embodiment can also provide a search method for question and answer results, the method comprising:
步骤1,终端设备向服务器发送搜索请求,搜索请求中包括搜索问题。Step 1, the terminal device sends a search request to the server, and the search request includes a search question.
步骤2,服务器接收终端设备的搜索请求,基于搜索请求中的搜索问题,搜索获得包含搜索问题的答案的视频。Step 2, the server receives the search request from the terminal device, and based on the search question in the search request, searches to obtain a video containing an answer to the search question.
步骤3,服务器基于搜索问题,搜索获得包含搜索问题的答案的视频。Step 3, based on the search question, the server searches to obtain a video containing an answer to the search question.
Step 4: The server determines multiple answer points of the answer based on the video's audio signal or text content.
Step 5: For each answer point, the server processes the video to obtain a key frame image containing the text content and image content of that answer point.
Step 6: The server generates a video cover from the key frame images corresponding to the answer points. The video cover is composed of multiple key frame images of the video, each containing text content and image content that answer the search question, and the number of key frame images matches the number of answer points.
Step 7: The server determines the time mark of each key frame image on the video's playback timeline and adds the correspondence between each key frame image and its time mark to the video information.
Step 8: The server returns the video information, including the video cover, to the terminal device.
Step 9: The terminal device obtains the video information corresponding to the search question.
Step 10: The terminal device displays the video cover on the search results page.
Step 11: In response to detecting a trigger operation on a first key frame image on the video cover, the terminal device determines the time mark of the first key frame image on the video's playback timeline.
Step 12: The terminal device jumps to the video's playback page and starts playing the video from that time mark.
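Steps 7 to 12 depend on the correspondence between key frame images and time marks travelling with the video information. The Python sketch below models that payload and the terminal-side handling of a tap on one tile of the cover. The class names, the field names (`video_id`, `cover_keyframes`, `time_mark_s`) and the returned navigation dictionary are hypothetical illustrations, not a format defined by this disclosure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KeyFrame:
    """One tile of the composite cover (hypothetical schema)."""
    answer_point: str   # text of the answer point shown in this tile
    time_mark_s: float  # time mark on the playback timeline, in seconds


@dataclass
class VideoInfo:
    """Hypothetical shape of the video information returned in step 8."""
    video_id: str
    cover_keyframes: List[KeyFrame] = field(default_factory=list)


def on_keyframe_tapped(info: VideoInfo, tile_index: int) -> dict:
    """Steps 11-12: resolve the tapped tile to its time mark and describe
    the navigation to the playback page that starts from that mark."""
    if not 0 <= tile_index < len(info.cover_keyframes):
        raise IndexError("tapped tile is not part of this cover")
    start = info.cover_keyframes[tile_index].time_mark_s
    return {"page": "player", "video_id": info.video_id, "start_s": start}


info = VideoInfo("v1", [KeyFrame("Heat the pan", 12.0),
                        KeyFrame("Crack the egg", 47.5)])
print(on_keyframe_tapped(info, 1))
# {'page': 'player', 'video_id': 'v1', 'start_s': 47.5}
```

Keeping the time mark inside each tile's record means the terminal needs no further server round trip to start playback at the right point.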
The method provided in this embodiment has the same implementation principle and technical effects as Embodiments 1 and 2. For brevity, for anything not mentioned in this embodiment, refer to the corresponding content of Embodiments 1 and 2.
Embodiment 4:
Based on Embodiment 1, the present disclosure further provides an apparatus for searching for question-and-answer results, which can be applied to a terminal device. As shown in FIG. 5, the apparatus includes:
An information obtaining module 502, configured to obtain video information corresponding to a search question, where the video information includes a video cover composed of multiple key frame images of the video, each key frame image contains text content and image content that answer the search question, and the number of key frame images matches the number of answer points to the search question;
A cover display module 504, configured to display the video cover on the search results page;
A video playing module 506, configured to play the video in response to detecting a trigger operation on the video cover.
In one embodiment, the video playing module 506 is specifically configured to:
in response to detecting a trigger operation on a first key frame image on the video cover, determine the time mark of the first key frame image on the video's playback timeline; and jump to the video's playback page and play the video starting from that time mark.
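Before it can look up a time mark, the video playing module 506 has to work out which key frame image on the cover was triggered. A minimal sketch, assuming the client knows each tile's rectangle in cover pixel coordinates and that a tap outside every tile simply plays the video from the beginning (both are assumptions, not stated in the disclosure):

```python
from typing import Optional, Sequence, Tuple

# (x, y, width, height) of one key frame tile, in cover pixel coordinates.
Rect = Tuple[int, int, int, int]


def tile_at(tiles: Sequence[Rect], x: float, y: float) -> Optional[int]:
    """Return the index of the tile containing the tap point, or None."""
    for i, (tx, ty, tw, th) in enumerate(tiles):
        if tx <= x < tx + tw and ty <= y < ty + th:
            return i
    return None


def start_time_for_tap(tiles: Sequence[Rect], time_marks: Sequence[float],
                       x: float, y: float) -> float:
    """Map a tap on the cover to a playback start time. A tap outside every
    tile falls back to 0.0 (play from the beginning), which is an assumption."""
    i = tile_at(tiles, x, y)
    return 0.0 if i is None else time_marks[i]


# A 640x360 cover laid out as a 2x2 grid for four answer points.
tiles = [(0, 0, 320, 180), (320, 0, 320, 180),
         (0, 180, 320, 180), (320, 180, 320, 180)]
marks = [5.0, 30.0, 62.0, 95.0]
print(start_time_for_tap(tiles, marks, 400, 200))  # 95.0
```

Half-open bounds (`<` on the far edge) ensure a tap on a shared border belongs to exactly one tile.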
Embodiment 5:
Based on Embodiment 2, the present disclosure further provides an apparatus for searching for question-and-answer results, which can be applied to a server. As shown in FIG. 6, the apparatus includes:
A request receiving module 602, configured to receive a search request from a terminal device, the search request containing a search question;
A video search module 604, configured to search, based on the search question, for a video that contains an answer to the search question;
An answer point determination module 606, configured to determine multiple answer points of the answer based on the video's audio signal or text content;
An image obtaining module 608, configured to process the video, for each answer point, to obtain a key frame image containing the text content and image content of that answer point;
A cover generation module 610, configured to generate a video cover for the video based on the key frame images corresponding to the answer points;
An information feedback module 612, configured to return the video information, including the video cover, to the terminal device.
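The disclosure does not say how module 606 splits the answer into answer points. As one hypothetical heuristic, assuming the video's speech has already been transcribed (or its subtitles extracted) into English text, enumeration cues such as "first", "next" and "finally" can delimit the points. The marker list below is an assumption; real transcripts often lack such cues, so this is only a baseline.

```python
import re
from typing import List

# Hypothetical enumeration cues; a production system would more likely use a
# trained segmentation or summarization model.
_MARKERS = re.compile(
    r"\b(?:first(?:ly)?|second(?:ly)?|third(?:ly)?|next|finally|step \d+)\b[,:]?\s*",
    re.IGNORECASE,
)


def split_answer_points(transcript: str) -> List[str]:
    """Split transcript or subtitle text into ordered answer points."""
    matches = list(_MARKERS.finditer(transcript))
    if not matches:
        text = transcript.strip()
        return [text] if text else []
    # Each point runs from the end of one marker to the start of the next;
    # any introduction before the first marker is dropped.
    ends = [m.start() for m in matches[1:]] + [len(transcript)]
    points = []
    for m, end in zip(matches, ends):
        point = transcript[m.end():end].strip(" .;\n")
        if point:
            points.append(point)
    return points


text = ("To fry an egg: first, heat oil in the pan. "
        "Next, crack the egg. Finally, season with salt.")
print(split_answer_points(text))
# ['heat oil in the pan', 'crack the egg', 'season with salt']
```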
In one embodiment, the image obtaining module 608 includes:
A text content obtaining unit, configured to perform character recognition on the video to obtain the text content of the answer point;
An image content recognition unit, configured to identify, in the video frames of the video, image content that matches the text content of the answer point;
An image content cropping unit, configured to crop the image content from the video frame;
An image generation unit, configured to generate a key frame image containing the answer point from the answer point's text content and image content.
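One simple way to approximate the image content recognition and cropping units is to pick, for each answer point, the frame whose recognized on-screen text overlaps most with the point, and then crop a region from that frame. The sketch below uses Jaccard token overlap as a stand-in matching criterion (the disclosure does not specify one) and treats frames as lists of pixel rows. The per-frame OCR strings and the crop box are assumed inputs, and the tokenizer assumes a space-delimited language.

```python
import re
from typing import List, Optional, Sequence, Set, Tuple


def _tokens(text: str) -> Set[str]:
    # Lower-cased word tokens; assumes a space-delimited language.
    return set(re.findall(r"\w+", text.lower()))


def best_matching_frame(point_text: str, frame_ocr: Sequence[str]) -> Optional[int]:
    """Index of the frame whose recognized text has the highest Jaccard
    overlap with the answer point, or None if no frame shares a token."""
    target = _tokens(point_text)
    best_idx, best_score = None, 0.0
    for i, text in enumerate(frame_ocr):
        toks = _tokens(text)
        union = target | toks
        score = len(target & toks) / len(union) if union else 0.0
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx


def crop(frame: List[List[int]], box: Tuple[int, int, int, int]) -> List[List[int]]:
    """Cut the (x, y, width, height) region out of a frame stored as pixel rows."""
    x, y, w, h = box
    return [row[x:x + w] for row in frame[y:y + h]]


ocr = ["Welcome", "Step 1 heat oil in the pan", "crack the egg into the pan"]
print(best_matching_frame("heat oil in the pan", ocr))        # 1
print(crop([[0, 1, 2], [3, 4, 5], [6, 7, 8]], (1, 1, 2, 2)))  # [[4, 5], [7, 8]]
```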
In one embodiment, the cover generation module 610 includes:
An order determination unit, configured to determine, based on the order of the answer points within the answer, the order in which the corresponding key frame images are arranged in the video cover;
A template determination unit, configured to determine a corresponding stitching template based on the sizes of the key frame images and their arrangement order;
A scaling unit, configured to scale the key frame image for each region proportionally, based on the size of that region in the stitching template;
An image insertion unit, configured to insert the scaled key frame images into the corresponding regions of the template to obtain the video cover.
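The stitching and scaling units can be sketched as follows. For simplicity this hypothetical template depends only on the number of key frame images (a near-square grid filled in answer-point order), whereas the disclosure also takes the image sizes into account. The scaling step applies a single factor to both dimensions, which is what keeps the scaling proportional.

```python
import math
from typing import List, Tuple

Size = Tuple[int, int]            # (width, height)
Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def grid_template(n: int, cover: Size) -> List[Rect]:
    """Hypothetical stitching template: n equal regions on a near-square grid,
    filled left to right, top to bottom (i.e. in answer-point order)."""
    if n < 1:
        raise ValueError("a cover needs at least one key frame image")
    cols = math.ceil(math.sqrt(n))
    rows = math.ceil(n / cols)
    cw, ch = cover[0] // cols, cover[1] // rows
    return [((i % cols) * cw, (i // cols) * ch, cw, ch) for i in range(n)]


def fit_scaled(img: Size, region: Rect) -> Rect:
    """Scale an image by one factor so it fits inside the region, preserving
    its aspect ratio, and centre it there."""
    w, h = img
    rx, ry, rw, rh = region
    s = min(rw / w, rh / h)
    sw, sh = round(w * s), round(h * s)
    return (rx + (rw - sw) // 2, ry + (rh - sh) // 2, sw, sh)


regions = grid_template(4, (640, 360))
print(regions[3])                           # (320, 180, 320, 180)
print(fit_scaled((720, 1280), regions[0]))  # (109, 0, 101, 180)
```

A portrait key frame placed in a landscape region is letterboxed rather than stretched, so on-screen text in the frame stays legible.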
In one embodiment, the apparatus further includes:
A time mark determination module, configured to determine the time mark of each key frame image on the video's playback timeline;
A relationship adding module, configured to add the correspondence between each key frame image and its time mark to the video information.
In one embodiment, the time mark determination module includes:
A matching unit, configured to match each key frame image against the video frames of the video and determine the target video frame that matches the key frame image;
A first mark determination unit, configured to take the time mark of the target video frame as the time mark of the key frame image on the video's playback timeline.
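A minimal sketch of the matching unit and the first mark determination unit, assuming every candidate frame and the key frame image have already been reduced to same-sized grayscale thumbnails, each frame paired with its time mark. Mean absolute pixel difference is used here as an illustrative similarity measure; the disclosure does not specify one, and a cropped key frame image would need to be compared against the same crop of each frame.

```python
from typing import List, Sequence, Tuple

Gray = List[List[int]]  # small grayscale thumbnail: rows of 0-255 values


def mean_abs_diff(a: Gray, b: Gray) -> float:
    """Average absolute pixel difference between two same-sized thumbnails."""
    total = count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else float("inf")


def time_mark_of(key: Gray, frames: Sequence[Tuple[float, Gray]]) -> float:
    """Pick the target frame most similar to the key frame image and return
    that frame's time mark in seconds. Raises ValueError if frames is empty."""
    best_time, _ = min(frames, key=lambda f: mean_abs_diff(key, f[1]))
    return best_time


key = [[10, 10], [200, 200]]
frames = [(0.0, [[0, 0], [0, 0]]),
          (1.5, [[12, 9], [198, 201]]),
          (3.0, [[255, 255], [255, 255]])]
print(time_mark_of(key, frames))  # 1.5
```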
In one embodiment, the time mark determination module includes a second mark determination unit, configured to determine the time mark of each key frame image on the video's playback timeline according to the mapping between each video frame and its playback time on that timeline.
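The second mark determination unit only needs a frame-to-time mapping. Two common ways to obtain it are sketched below: a constant-frame-rate formula, and container presentation timestamps multiplied by the stream time base (as with FFmpeg's `time_base`). Variable-frame-rate videos need the per-frame timestamps rather than the formula. Using `Fraction` avoids accumulating rounding error for rates such as 30000/1001.

```python
from fractions import Fraction


def frame_to_seconds(frame_index: int, fps: Fraction) -> float:
    """Constant frame rate: frame n is presented at n / fps seconds."""
    return float(frame_index / fps)


def pts_to_seconds(pts: int, time_base: Fraction) -> float:
    """Container timestamps: seconds = pts * time_base (e.g. 1/90000)."""
    return float(pts * time_base)


ntsc = Fraction(30000, 1001)                      # 29.97 fps
print(frame_to_seconds(300, ntsc))                # 10.01
print(pts_to_seconds(90000, Fraction(1, 90000)))  # 1.0
```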
The apparatus provided in this embodiment has the same implementation principle and technical effects as Embodiments 1 to 3. For brevity, for anything not mentioned in this embodiment, refer to the corresponding content of Embodiments 1 to 3.
Based on the foregoing embodiments, this embodiment provides a terminal device including a memory and a processor. The memory stores a computer program which, when executed by the processor, causes the processor to perform the method of any of Embodiments 1 to 3.
This embodiment further provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the method of any of Embodiments 1 to 3.
It should be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element introduced by "includes a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes that element.
The above are merely specific implementations of the present disclosure, provided to enable those skilled in the art to understand or implement it. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present disclosure. Therefore, the present disclosure is not limited to the embodiments described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (25)

  1. A method for searching for question-and-answer results, comprising:
    obtaining video information corresponding to a search question, wherein the video information comprises a video cover, the video cover is composed of a plurality of key frame images of a video, each of the key frame images contains text content and image content that answer the search question, and the number of key frame images matches the number of answer points to the search question;
    displaying the video cover on a search results page; and
    playing the video in response to detecting a trigger operation on the video cover.
  2. The method according to claim 1, wherein an arrangement order of the plurality of key frame images in the video cover matches an order of the answer points corresponding to the search question.
  3. The method according to any one of claims 1-2, wherein each key frame image is an image, cropped from a video frame of the video, that contains the text content and image content of an answer point.
  4. The method according to any one of claims 1-3, wherein the text content of the answer point contained in the key frame image is recognized from video frames of the video, and the image content of the answer point contained in the key frame image is obtained by matching video frames of the video against the text content of the answer point.
  5. The method according to any one of claims 1-4, wherein playing the video in response to detecting a trigger operation on the video cover comprises:
    in response to detecting a trigger operation on a first key frame image on the video cover, determining a time mark corresponding to the first key frame image on a playback timeline of the video; and
    jumping to a playback page of the video and playing the video starting from the time mark.
  6. A method for searching for question-and-answer results, comprising:
    receiving a search request from a terminal device, the search request comprising a search question;
    searching, based on the search question, for a video containing an answer to the search question;
    determining a plurality of answer points of the answer based on an audio signal or text content of the video;
    for each answer point, processing the video to obtain a key frame image containing text content and image content of the answer point;
    generating a video cover of the video based on a plurality of key frame images corresponding to the plurality of answer points; and
    returning video information comprising the video cover to the terminal device.
  7. The method according to claim 6, wherein processing the video to obtain the key frame image containing the text content and image content of the answer point comprises:
    performing character recognition on the video to obtain the text content of the answer point;
    identifying, from video frames of the video and based on the text content of the answer point, image content that matches the text content;
    cropping the image content from the video frame; and
    generating, based on the text content of the answer point and the image content, a key frame image containing the answer point.
  8. The method according to claim 6 or 7, wherein generating the video cover of the video based on the plurality of key frame images corresponding to the plurality of answer points comprises:
    determining, based on the order of the plurality of answer points in the answer, an arrangement order of the corresponding key frame images in the video cover;
    determining a corresponding stitching template based on sizes of the plurality of key frame images and their arrangement order;
    scaling the key frame image corresponding to each region proportionally, based on the size of that region in the stitching template; and
    inserting the scaled key frame images into the corresponding regions of the template to obtain the video cover of the video.
  9. The method according to any one of claims 6-8, further comprising, before returning the video information comprising the video cover to the terminal device:
    determining a time mark corresponding to each key frame image on a playback timeline of the video; and
    adding a correspondence between each key frame image and its time mark to the video information.
  10. The method according to claim 9, wherein determining the time mark corresponding to each key frame image on the playback timeline of the video comprises:
    for each key frame image, matching the key frame image against video frames of the video to determine a target video frame that matches the key frame image; and
    determining the time mark corresponding to the target video frame as the time mark corresponding to the key frame image on the playback timeline of the video.
  11. The method according to any one of claims 9-10, wherein determining the time mark corresponding to each key frame image on the playback timeline of the video comprises:
    determining the time mark corresponding to each key frame image on the playback timeline of the video according to a mapping between each video frame and a playback time on the playback timeline of the video.
  12. An apparatus for searching for question-and-answer results, comprising:
    an information obtaining module configured to obtain video information corresponding to a search question, wherein the video information comprises a video cover, the video cover is composed of a plurality of key frame images of a video, each of the key frame images contains text content and image content that answer the search question, and the number of key frame images matches the number of answer points to the search question;
    a cover display module configured to display the video cover on a search results page; and
    a video playing module configured to play the video in response to detecting a trigger operation on the video cover.
  13. The apparatus according to claim 12, wherein an arrangement order of the plurality of key frame images in the video cover matches an order of the answer points corresponding to the search question.
  14. The apparatus according to any one of claims 12-13, wherein each key frame image is an image, cropped from a video frame of the video, that contains the text content and image content of an answer point.
  15. The apparatus according to any one of claims 12-14, wherein the text content of the answer point contained in the key frame image is recognized from video frames of the video, and the image content of the answer point contained in the key frame image is obtained by matching video frames of the video against the text content of the answer point.
  16. The apparatus according to any one of claims 12-15, wherein the video playing module is configured to:
    in response to detecting a trigger operation on a first key frame image on the video cover, determine a time mark corresponding to the first key frame image on a playback timeline of the video; and
    jump to a playback page of the video and play the video starting from the time mark.
  17. An apparatus for searching for question-and-answer results, comprising:
    a request receiving module configured to receive a search request from a terminal device, the search request comprising a search question;
    a video search module configured to search, based on the search question, for a video containing an answer to the search question;
    an answer point determination module configured to determine a plurality of answer points of the answer based on an audio signal or text content of the video;
    an image obtaining module configured to, for each answer point, process the video to obtain a key frame image containing text content and image content of the answer point;
    a cover generation module configured to generate a video cover of the video based on a plurality of key frame images corresponding to the plurality of answer points; and
    an information feedback module configured to return video information comprising the video cover to the terminal device.
  18. The apparatus according to claim 17, wherein the image obtaining module comprises:
    a text content obtaining unit configured to perform character recognition on the video to obtain the text content of the answer point;
    an image content recognition unit configured to identify, from video frames of the video and based on the text content of the answer point, image content that matches the text content;
    an image content cropping unit configured to crop the image content from the video frame; and
    an image generation unit configured to generate, based on the text content of the answer point and the image content, a key frame image containing the answer point.
  19. The apparatus according to claim 17 or 18, wherein the cover generation module comprises:
    an order determination unit configured to determine, based on the order of the plurality of answer points in the answer, an arrangement order of the corresponding key frame images in the video cover;
    a template determination unit configured to determine a corresponding stitching template based on sizes of the plurality of key frame images and their arrangement order;
    a scaling unit configured to scale the key frame image corresponding to each region proportionally, based on the size of that region in the stitching template; and
    an image insertion unit configured to insert the scaled key frame images into the corresponding regions of the template to obtain the video cover of the video.
  20. The apparatus according to any one of claims 17-19, further comprising:
    a time mark determination module configured to determine a time mark corresponding to each key frame image on a playback timeline of the video; and
    a relationship adding module configured to add a correspondence between each key frame image and its time mark to the video information.
  21. The apparatus according to claim 20, wherein the time mark determination module comprises:
    a matching unit configured to, for each key frame image, match the key frame image against video frames of the video to determine a target video frame that matches the key frame image; and
    a first mark determination unit configured to determine the time mark corresponding to the target video frame as the time mark corresponding to the key frame image on the playback timeline of the video.
  22. The apparatus according to any one of claims 20-21, wherein the time mark determination module comprises:
    a second mark determination unit configured to determine the time mark corresponding to each key frame image on the playback timeline of the video according to a mapping between each video frame and a playback time on the playback timeline of the video.
  23. A terminal device, comprising:
    a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform the method according to any one of claims 1-5 or the method according to any one of claims 6-11.
  24. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the method according to any one of claims 1-5 or the method according to any one of claims 6-11.
  25. A computer program product comprising a computer program/instructions which, when executed by a processor, perform the method according to any one of claims 1-5 or the method according to any one of claims 6-11.
PCT/CN2022/137552 2021-12-28 2022-12-08 Question and answer result searching method and apparatus, device, and storage medium WO2023124874A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111620699.8 2021-12-28
CN202111620699.8A CN114297433B (en) 2021-12-28 2021-12-28 Method, device, equipment and storage medium for searching question and answer result

Publications (1)

Publication Number Publication Date
WO2023124874A1 true WO2023124874A1 (en) 2023-07-06

Family

ID=80969482

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/137552 WO2023124874A1 (en) 2021-12-28 2022-12-08 Question and answer result searching method and apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN114297433B (en)
WO (1) WO2023124874A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114297433B (en) * 2021-12-28 2024-04-19 抖音视界有限公司 Method, device, equipment and storage medium for searching question and answer result
CN115422398A (en) * 2022-08-30 2022-12-02 北京字跳网络技术有限公司 Comment information processing method and device and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109905782A (en) * 2019-03-31 2019-06-18 联想(北京)有限公司 A kind of control method and device
CN110019933A (en) * 2018-01-02 2019-07-16 阿里巴巴集团控股有限公司 Video data handling procedure, device, electronic equipment and storage medium
CN111694984A (en) * 2020-06-12 2020-09-22 百度在线网络技术(北京)有限公司 Video searching method and device, electronic equipment and readable storage medium
WO2021004247A1 (en) * 2019-07-11 2021-01-14 北京字节跳动网络技术有限公司 Method and apparatus for generating video cover and electronic device
CN113392288A (en) * 2020-03-11 2021-09-14 阿里巴巴集团控股有限公司 Visual question answering and model training method, device, equipment and storage medium thereof
CN114297433A (en) * 2021-12-28 2022-04-08 北京字节跳动网络技术有限公司 Method, device and equipment for searching question and answer results and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102018295B1 (en) * 2017-06-14 2019-09-05 주식회사 핀인사이트 Apparatus, method and computer-readable medium for searching and providing sectional video
CN109492087A (en) * 2018-11-27 2019-03-19 北京中熙正保远程教育技术有限公司 A kind of automatic answer system and method for online course learning
CN110337011A (en) * 2019-07-17 2019-10-15 百度在线网络技术(北京)有限公司 Method for processing video frequency, device and equipment
CN111400553A (en) * 2020-04-26 2020-07-10 Oppo广东移动通信有限公司 Video searching method, video searching device and terminal equipment
CN112447073A (en) * 2020-12-11 2021-03-05 北京有竹居网络技术有限公司 Explanation video generation method, explanation video display method and device
CN112883235A (en) * 2021-03-11 2021-06-01 深圳市一览网络股份有限公司 Video content searching method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN114297433A (en) 2022-04-08
CN114297433B (en) 2024-04-19

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22914129

Country of ref document: EP

Kind code of ref document: A1