CN111985395A - Video generation method and device - Google Patents

Video generation method and device

Info

Publication number
CN111985395A
CN111985395A
Authority
CN
China
Prior art date
Legal status (assumed; not a legal conclusion)
Pending
Application number
CN202010839622.9A
Other languages
Chinese (zh)
Inventor
彭旸
门宇雯
王承博
李广
郭常圳
Current Assignee
Beijing Ape Power Future Technology Co Ltd
Original Assignee
Beijing Ape Power Future Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Ape Power Future Technology Co Ltd filed Critical Beijing Ape Power Future Technology Co Ltd
Priority to CN202010839622.9A
Publication of CN111985395A
Priority claimed by application CN202110252283.9A (granted as CN112861784B)
Pending legal-status Critical Current

Classifications

    • G06V40/161: Human faces, e.g. facial parts, sketches or expressions; detection, localisation, normalisation
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V20/40: Scenes; scene-specific elements in video content
    • G09B7/04: Electrically-operated teaching apparatus working with questions and answers, characterised by modifying the teaching programme in response to a wrong answer
    • G10L15/26: Speech to text systems
    • G10L25/78: Detection of presence or absence of voice signals
    • H04N21/23412: Processing of video elementary streams for generating or manipulating the scene composition of objects, e.g. MPEG-4 objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • User Interface Of Digital Computer (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The application provides a video generation method and apparatus. The video generation method includes: acquiring multimedia data of a target user for a target question; displaying the acquired multimedia data and the target question; obtaining reply information of the target user for the target question based on audio data in the multimedia data; obtaining a reply result for the target question by comparing the reply information with a preset answer to the target question; and generating a target video according to the multimedia data, the target question, and the reply result.

Description

Video generation method and device
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a video generation method and apparatus, a computing device, and a computer-readable storage medium.
Background
With the development of the internet, online answering methods have become more and more diverse. However, current answering methods do not record the answerer's answering process in detail, so the answerer or others cannot accurately analyze the answerer's specific situation, and a more reliable scheme is needed.
Disclosure of Invention
In view of this, embodiments of the present application provide a video generation method and apparatus, a computing device, and a computer-readable storage medium, so as to solve the technical defects existing in the prior art.
According to a first aspect of embodiments of the present application, there is provided a video generation method, including:
acquiring multimedia data of a target user for a target question;
displaying the acquired multimedia data and the target question;
obtaining reply information of the target user for the target question based on audio data in the multimedia data;
obtaining a reply result for the target question by comparing the reply information with a preset answer to the target question;
and generating a target video according to the multimedia data, the target question, and the reply result.
Optionally, the obtaining reply information of the target user for the target question based on the audio data in the multimedia data includes:
detecting voice data in the audio data within a preset time interval;
in a case where voice data is detected, intercepting the audio data according to the voice data to obtain target audio data;
and recognizing the voice data in the target audio data to obtain text information corresponding to the target audio data, and taking the text information as the reply information.
Optionally, after the step of detecting voice data in the audio data within the preset time interval is performed, the method further includes:
in a case where no voice data is detected, determining the reply information to be incomplete.
Optionally, the displaying the acquired multimedia data and the target question includes:
carrying out face positioning and/or human body posture positioning on image frames of video data in the multimedia data to obtain position information of a face and/or a human body in the image frames;
determining a display position of the target question according to the position information and a preset display rule;
and displaying the target question in the video data according to the display position.
Optionally, the obtaining a reply result for the target question by comparing the reply information with a preset answer to the target question includes:
acquiring the preset answer of the target question according to the question identifier of the target question;
comparing the preset answer with the reply information according to a preset scoring standard to obtain a score for the reply information;
and taking the score and the reply information as the reply result.
Optionally, the acquiring multimedia data of the target user for the target question includes:
starting a shooting device to shoot in real time to obtain video data containing a target user;
recording in real time through a recording device to obtain the audio data;
and acquiring the video data and the audio data as the multimedia data.
Optionally, the recording in real time through a recording device to obtain the audio data includes:
starting a playing device to play background music, and starting the recording device to record, so as to obtain the audio data containing the background music.
Optionally, after the step of generating the target video according to the multimedia data, the target question and the reply result is performed, the method further includes:
aligning the background music contained in the audio data with the background music played by the playing device according to an audio fingerprint;
and mixing the aligned recorded background music with the played background music, and fusing the result with the target video to obtain a second target video.
Optionally, after the step of generating the target video according to the multimedia data, the target question and the reply result is performed, the method further includes:
determining a second target question in a question bank to which the target question belongs;
and taking the second target question as the target question, and returning to the step of acquiring multimedia data of the target user for the target question.
Optionally, the determining a second target question in the question bank to which the target question belongs includes:
determining the second target question in the question bank to which the target question belongs according to the question type of the target question and the difficulty value corresponding to the target question.
Optionally, the displaying the acquired multimedia data and the target question includes:
starting a timing program to time the answering of the target question, wherein a time count value corresponding to the timing program is incremented or decremented by a time unit;
and displaying the target question and the time count value in video data of the multimedia data.
According to a second aspect of embodiments of the present application, there is provided a video generation apparatus, including:
an acquisition module configured to acquire multimedia data of a target user for a target question;
a presentation module configured to present the acquired multimedia data and the target question;
an obtaining module configured to obtain reply information of the target user for the target question based on audio data in the multimedia data;
a comparison module configured to obtain a reply result for the target question by comparing the reply information with a preset answer to the target question;
a generating module configured to generate a target video according to the multimedia data, the target question, and the reply result.
According to a third aspect of embodiments herein, there is provided a computing device comprising a memory, a processor and computer instructions stored on the memory and executable on the processor, the processor implementing the steps of the video generation method when executing the instructions.
According to a fourth aspect of embodiments of the present application, there is provided a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the video generation method.
The video generation method provided by the embodiments of the application acquires multimedia data of a target user for a target question and displays the acquired multimedia data together with the target question, so that the target user or other users can observe the answering process and correct poor answering habits. Reply information of the target user for the target question is obtained based on the audio data in the multimedia data, and a reply result for the target question is obtained by comparing the reply information with a preset answer to the target question. A target video is then generated according to the multimedia data, the target question, and the reply result, so that the target user or others can immediately determine the target user's answer feedback from the target video, prompting the target user to review and reflect on the answer, and thereby improving the target user's learning effect and learning motivation.
Drawings
FIG. 1 is a block diagram of a computing device provided by an embodiment of the present application;
fig. 2 is a flowchart of a video generation method provided in an embodiment of the present application;
fig. 3 is a flowchart of a video generation method applied to an answering scene according to an embodiment of the present application;
fig. 4 is a schematic diagram of a video generation method provided in an embodiment of the present application;
fig. 5 is a schematic structural diagram of a video generating apparatus according to an embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, the application can be implemented in many ways other than those described herein, and those skilled in the art can make similar modifications without departing from the spirit of the application; the application is therefore not limited to the specific implementations disclosed below.
The terminology used in the one or more embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the present application. As used in one or more embodiments of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present application refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, etc. may be used herein in one or more embodiments of the present application to describe various information, these information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first aspect may be termed a second aspect, and, similarly, a second aspect may be termed a first aspect, without departing from the scope of one or more embodiments of the present application.
In the present application, a video generation method and apparatus, a computing device, and a computer-readable storage medium are provided, and detailed descriptions are individually provided in the following embodiments.
FIG. 1 shows a block diagram of a computing device 100 according to an embodiment of the present application. The components of the computing device 100 include, but are not limited to, memory 110 and processor 120. The processor 120 is coupled to the memory 110 via a bus 130 and a database 150 is used to store data.
Computing device 100 also includes an access device 140 that enables computing device 100 to communicate via one or more networks 160. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the internet. The access device 140 may include one or more wired or wireless network interfaces of any type (e.g., a Network Interface Card (NIC)), such as an IEEE 802.11 Wireless Local Area Network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the present application, the above-mentioned components of the computing device 100 and other components not shown in fig. 1 may also be connected to each other, for example, by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 1 is for purposes of example only and is not limiting as to the scope of the present application. Those skilled in the art may add or replace other components as desired.
Computing device 100 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), a mobile phone (e.g., smartphone), a wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 100 may also be a mobile or stationary server.
Wherein the processor 120 may perform the steps of one of the video generation methods shown in fig. 2. Fig. 2 shows a flowchart of a video generation method provided in an embodiment of the present application, where the method includes steps 202 to 210.
Step 202, multimedia data of the target user for the target question is acquired.
Specifically, the target user is the user who answers the target question, and the target question may be a question of any subject, a survey question, or the like, which is not limited herein. The multimedia data is obtained by shooting and recording, through a shooting device and a recording device, the process in which the target user answers the target question; specifically, the multimedia data includes, but is not limited to, video data, audio data, text data, picture data, and the like.
By recording the multimedia data of the target user during answering and immediately feeding back the target user's answer to the target question, the application improves the target user's answering experience, and the target user and others can view the target video and clearly understand the target user's answering situation.
In practical applications, the target question may be selected from a question bank according to a preset question selection rule, and each question in the question bank includes, but is not limited to, a question serial number, a question identifier, a question stem, a preset answer, and a corresponding scoring standard, so that the answering user can obtain relevant information about the target question during or after answering.
In a specific implementation, in an optional implementation manner provided by the embodiment of the present application, the multimedia data of the target user for the target question is acquired in the following manner:
starting a shooting device to shoot in real time to obtain video data containing a target user;
recording in real time through a recording device to obtain the audio data;
and acquiring the video data and the audio data as the multimedia data.
In a specific implementation, during answering, the shooting device shoots in real time the process in which the target user answers the target question, the recording device records the process in real time, and the shot video data and the recorded audio data are obtained in real time.
Taking user A answering question 1 as an example: when user A clicks the start-answering button, the shooting device is started to shoot and obtain video data, and the recording device is started to record and obtain audio data; during real-time shooting and recording, the video data and audio data for question 1 are obtained in real time as the multimedia data M provided by user A for question 1.
Further, in order to increase the target user's interest and enthusiasm in answering questions, background music may be played while the target user answers. In an optional implementation manner provided by the embodiment of the present application, the recording in real time through a recording device to obtain the audio data includes:
and starting the playing device to play the background music, and starting the recording device to record to obtain the audio data containing the background music.
Specifically, while the recording device records the target user's answering process in real time, the playing device plays background music, so the background music is recorded as well, and the resulting audio data contains the background music.
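As an illustrative sketch only (the patent does not specify a mixing algorithm), combining a recorded voice track with a known background-music track can be as simple as a gain-weighted sum with clamping; the function name and gain value below are assumptions:

```python
def mix_with_background(voice, music, music_gain=0.3):
    """Mix background music into the recorded voice track.

    voice, music: equal-length lists of samples in [-1.0, 1.0].
    Samples are clamped after summing to avoid clipping.
    """
    return [max(-1.0, min(1.0, v + music_gain * m))
            for v, m in zip(voice, music)]
```

In practice the two tracks would first be time-aligned (the patent later mentions alignment by audio fingerprint) before mixing sample by sample.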
Step 204, the acquired multimedia data and the target question are displayed.
In practical application, on the basis of acquiring the multimedia data provided by the target user for the target question, the acquired multimedia data and the target question are displayed, so that the target user can intuitively see and/or hear the recorded multimedia data and the target question to be answered.
It should be noted that, when the acquired multimedia data and the target question are displayed, the target question may be added at any position in the video data contained in the multimedia data, which is not limited herein.
Optionally, the target question is displayed in the form of a title.
Following the above example, on the basis of acquiring the video data and audio data in real time as the multimedia data M provided by user A for question 1, the acquired question 1 is added to the video data and displayed in real time.
In order to ensure that the target user has a good reading perspective on the target question and to improve the answering experience, the application adjusts the display position of the target question in real time according to the target user's position during answering. Specifically, in an optional implementation manner provided by the embodiment of the present application, displaying the acquired multimedia data and the target question is implemented in the following manner:
carrying out face positioning and/or human body posture positioning on image frames of the video data in the multimedia data to obtain position information of a face and/or a human body in the image frames;
determining a display position of the target question according to the position information and a preset display rule;
and displaying the target question in the video data according to the display position.
Specifically, performing face positioning and/or human body posture positioning on the image frames of the video data in the multimedia data means that the face or human body in the image frames is recognized through face recognition or posture recognition, and the position information of the face and/or human body in the image frames is then determined.
Further, the display position of the target question is determined according to the position information and a preset display rule. Specifically, the display rule refers to a preset correspondence between the position information and the display position, for example, the display position is opposite the position information, or the display position is directly above the face, which is not limited herein. After the display position is determined, the target question and the multimedia data are displayed in a fused manner.
Following the above example, while the multimedia data M and the target question are displayed in real time, the face in the image frames of the video data is positioned in real time to obtain the position information X of the face in the image frames; according to X and the preset display rule, the display position is determined to be below position X, and question 1 is added below position X for display.
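The display rule above can be pictured as a small pure function. This is a hypothetical illustration of one possible rule (place the question below the face when there is room, otherwise above it); the function name, margin, and rule itself are assumptions, not taken from the patent:

```python
def question_position(face_box, frame_size, margin=10):
    """Pick where to draw the question text, given a detected face.

    face_box:   (x, y, w, h) of the face in pixels
    frame_size: (width, height) of the video frame
    Returns the (x, y) top-left corner for the question overlay.
    """
    x, y, w, h = face_box
    frame_w, frame_h = frame_size
    below = y + h + margin
    # Preset display rule: show the question below the face when there
    # is room, otherwise above it, clamped to stay inside the frame.
    if below < frame_h - margin:
        return (x, below)
    return (x, max(margin, y - margin))
```

For a face at (100, 50) with size 80 by 80 in a 640 by 480 frame, the question would land at (100, 140), just below the face.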
In addition, in order to let the target user clearly know the elapsed answering time, so as to better control the answering time and improve answering efficiency, in an optional implementation manner provided by the embodiment of the present application, the answering time for the target question is displayed together with the acquired multimedia data and the target question, which is implemented in the following manner:
starting a timing program to time the answering of the target question, wherein a time count value corresponding to the timing program is incremented or decremented by a time unit;
and displaying the target question and the time count value in the video data of the multimedia data.
Specifically, the timing program is used for counting the answering duration. When started, the timing program has an initial time count value, and the time count value is incremented or decremented by a time unit, where the time unit is the unit for measuring the answering duration of the target question and can be set according to actual needs, such as seconds or milliseconds.
For example, if the preset answering time is 60 seconds, a timing program that counts down from 60 in seconds is started before question 1 is displayed, and the time count value corresponding to the timing program and the target question are displayed in the video data in real time.
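The countdown logic described above can be sketched as a minimal class; the class name and tick-per-unit design are illustrative assumptions:

```python
class AnswerTimer:
    """Count down the answering time in whole time units (e.g. seconds)."""

    def __init__(self, total_units=60):
        self.count = total_units

    def tick(self):
        # Decrement once per elapsed time unit; never go below zero.
        if self.count > 0:
            self.count -= 1
        return self.count

    @property
    def expired(self):
        return self.count == 0
```

A driver loop would call `tick()` once per elapsed time unit and overlay the current `count` on each video frame next to the question.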
Step 206, reply information of the target user for the target question is obtained based on the audio data in the multimedia data.
It should be noted that, in the present application, the target user answers by voice; therefore, the audio data in the acquired multimedia data is the audio data recorded in real time during shooting, which is used to collect the target user's answer to the target question.
In a specific implementation, in order to control the target user's answering time and improve answering efficiency, in an optional implementation manner provided in the embodiment of the present application, obtaining the reply information of the target user for the target question based on the audio data in the multimedia data is implemented in the following manner:
detecting voice data in the audio data within a preset time interval;
in a case where voice data is detected, intercepting the audio data according to the voice data to obtain target audio data;
and recognizing the voice data in the target audio data to obtain text information corresponding to the target audio data, and taking the text information as the reply information.
The preset time interval is the preset answering time for the target question. Specifically, detecting voice data in the audio data within the preset time interval can be implemented through a Voice Activity Detection (VAD) algorithm. Since the target user may not answer within the preset time interval, there may be no voice data. When voice data is detected, the audio within the preset time interval is intercepted to obtain the target user's answer to the target question, namely the target audio data; the voice data in the target audio data is then recognized to obtain the corresponding text information, which is taken as the target user's reply information for the target question.
In practical application, after finishing answering, the target user submits a finishing instruction for the target question; after the finishing instruction is obtained, the voice data in the audio data between the display of the target question and the finishing instruction is obtained and used as the target user's answer data for the target question.
Following the above example, with a preset time interval of 60 seconds, the voice data in the audio data of the multimedia data M within the 60 seconds is detected. When voice data is detected, the start and end points of the voice data are detected through the VAD algorithm, the audio data is intercepted according to these endpoints to obtain the target audio data V, and speech recognition is performed on the voice data in V to obtain the corresponding reply information.
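The patent names VAD only generically. As a hedged sketch of the idea, a naive energy-threshold detector over audio frames can find the start and end points and intercept the target audio data; the threshold, frame representation, and function names are assumptions:

```python
def detect_voice_span(frames, threshold=0.02):
    """Naive energy-based voice activity detection.

    frames: list of audio frames, each a list of samples in [-1.0, 1.0].
    Returns (start, end) frame indices of the voiced span, or None
    if no frame's mean energy exceeds the threshold (reply incomplete).
    """
    def energy(frame):
        return sum(s * s for s in frame) / len(frame)

    voiced = [i for i, f in enumerate(frames) if energy(f) > threshold]
    if not voiced:
        return None
    return (voiced[0], voiced[-1] + 1)


def intercept_target_audio(frames, span):
    """Cut the audio down to the detected voice span (the target audio data)."""
    start, end = span
    return frames[start:end]
```

The intercepted frames would then be fed to a speech recognizer to produce the reply text; a `None` span corresponds to the "reply information incomplete" branch described above.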
In an optional implementation manner provided by the embodiment of the present application, when no voice data is detected, the reply information is determined to be incomplete.
In practical application, the time count value displayed within the preset time interval counts down the answering time of the target question to remind the user how much time remains. If no voice data has been detected when the time count value reaches zero, it is determined that the target user has not answered, and the reply information for the target question is determined to be incomplete. This avoids ambiguity when the target user does not answer within the preset time interval, and when there are multiple questions, the unanswered questions can be counted according to the reply information marked incomplete.
Step 208, a reply result for the target question is obtained by comparing the reply information with the preset answer to the target question.
Specifically, on the basis of obtaining the reply information of the target question, the reply information is compared with the preset answer of the target question, whether the reply information is the correct answer of the target question is judged, manual correction of the reply information is not needed, and labor cost is reduced.
In a specific implementation, the target question is not necessarily a multiple-choice question. For a multiple-choice question, the target user's reply information can be judged correct or incorrect by direct comparison; for a calculation or question-and-answer question, the reply information needs deeper analysis, and the reply result is derived from the degree to which the target user has answered, so that the reply information is judged more accurately.
Acquiring the preset answer of the target question according to the question identifier of the target question;
comparing the preset answer with the reply information according to a preset scoring standard to obtain a score for the reply information;
and taking the score and the reply information as the reply result.
Specifically, the question identifier may be a character string or a code string that uniquely identifies a question. According to the question identifier, the preset answer of the target question, i.e., the standard correct answer, is obtained. The scoring points in the preset answer are then compared one by one with the scoring points in the reply information according to the scoring standard to determine the score of the reply information, and the score and the reply information together serve as the reply result for the target user to view.
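A minimal sketch of the point-by-point comparison, assuming the scoring standard is a list of key scoring points that each contribute an equal share of the total score. Real grading of calculation or essay answers would require deeper semantic analysis; the function name and weighting scheme are illustrative.

```python
def score_reply(reply, scoring_points, points_each=None):
    """Compare the scoring points of the preset answer against the reply
    one by one, accumulating the score for each point found in the reply."""
    if points_each is None:
        # assume equal weight per scoring point out of 100
        points_each = [100 // len(scoring_points)] * len(scoring_points)
    score = 0
    for point, value in zip(scoring_points, points_each):
        if point in reply:  # naive substring match as a stand-in
            score += value
    return score
```

The returned score, together with the reply text, would form the reply result described above.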
Step 210, generating a target video according to the multimedia data, the target question and the reply result.
Specifically, on the basis of the obtained reply result, the multimedia data, the target question and the reply result are combined to generate the target video: the video data in the multimedia data is combined with the target question according to the display mode of step 204, the corresponding audio data is added to the video data, and once the reply result is obtained it is added to the video data at the answer time to obtain the target video. The target user can thus see the reply result immediately, which improves the target user's learning motivation and learning efficiency.
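The assembly described above can be sketched as an overlay timeline: the question is shown for the whole video, and the reply result is layered in from the moment the answer completes. The class and field names below are illustrative, not from the patent.

```python
from dataclasses import dataclass

@dataclass
class Overlay:
    text: str
    start: float  # seconds into the video when the overlay appears
    end: float    # seconds into the video when it disappears

def build_overlay_track(question, result, answer_time, video_len):
    """Build the overlay timeline for the target video: the question is
    visible from the start, and the reply result joins it at answer time."""
    return [
        Overlay(question, 0.0, video_len),
        Overlay(result, answer_time, video_len),
    ]
```

A renderer would then burn each overlay into the video frames that fall inside its `[start, end)` window.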
In practical application, the target video can also be generated by combining, on the basis of the multimedia data, the target question and the reply result, the correct answer of the target question together with an in-depth analysis of it, so that the target user can know the correct answer in time, learn from it, and correct answering mistakes.
Further, when the multimedia data includes audio data containing background music, the generated target video correspondingly contains the background music as well, which keeps viewing the target video from feeling dull and adds a sense of relaxation and interest.
To ensure the sound quality and stability of the background music in the target video, in an optional implementation manner provided by the embodiment of the present application, after the step of generating the target video according to the multimedia data, the target question and the reply result is executed, the method further includes:
aligning background music contained in the audio data with the background music played by the playing device according to the sound fingerprint;
and fusing the target video and the background music to obtain a second target video in a mode of fusing the aligned background music and the played background music.
Specifically, a sound fingerprint is a unique feature of a piece of audio, by which the same sound can be identified. In the embodiment of the present application, the matching features of the background music contained in the audio data and the background music played by the playing device are aligned in time sequence via the sound fingerprint, and the target video is updated by fusing the aligned contained background music with the played background music, obtaining the second target video.
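A simple sketch of the alignment step, estimating the sample offset between the background music captured in the recording and the reference track via a cross-correlation peak. This is a stand-in for full sound-fingerprint matching; the function name is an assumption.

```python
import numpy as np

def align_offset(recorded, reference):
    """Estimate how many samples the recorded background music lags the
    reference track, by locating the peak of their cross-correlation."""
    corr = np.correlate(recorded, reference, mode="full")
    # index (len(reference) - 1) corresponds to zero lag
    return int(np.argmax(corr)) - (len(reference) - 1)
```

With the offset known, the two tracks can be shifted into alignment before fusing them into the second target video.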
In practical application, after the target user completes the reply to the target question, answering can continue with further questions and corresponding videos generated, which not only enriches the questions answered by the target user but also makes the resulting answer video more complete.
Determining a second target question in a question bank to which the target question belongs;
and taking the second target question as the target question, and returning to execute the step of acquiring the multimedia data of the target user aiming at the target question.
In specific implementations, there are various ways to determine the second target question in the question bank to which the target question belongs: for example, in ascending order of question numbers, or according to a preset question sequence, which is not limited herein.
After the second target question is determined, step 202 may be executed again; by repeating steps 202 to 210, an answer video for the second target question is generated, and the target video and the answer video corresponding to the second target question are then combined into one answer video covering both questions.
Further, in an optional implementation manner provided by the embodiment of the present application, the second target question is determined according to a response result of the target user to the target question, and the following implementation manner is specifically adopted:
and determining a second target problem in the question bank to which the target problem belongs according to the problem type to which the target problem belongs and the difficulty value corresponding to the target problem.
In a specific implementation, the score of the reply information contained in the reply result may be compared with a score threshold that indicates the correctness of the reply information. When the score is greater than or equal to the threshold, the target user's answer to the target question is substantially or completely correct; when the score is below the threshold, the answer is mostly incorrect.
The question type of the target question indicates its corresponding knowledge point, with different types corresponding to different knowledge points. The difficulty value of the target question indicates how deeply the question probes the knowledge point; it can be represented by a numerical value, a larger value meaning higher difficulty (alternatively a grade, a higher grade meaning higher difficulty, which is not limited herein). When the score of the reply information is greater than or equal to the score threshold, the next question, i.e., the second target question, can be selected by raising the difficulty within the question type of the target question and/or changing the question type. When the score is below the threshold, a second target question with a similar knowledge point and a similar or lower difficulty value is selected in response to the target user's mistakes, so that the target user masters the knowledge point corresponding to the target question and the learning effect is improved.
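The selection rule above can be sketched as a small function over a question bank. The dictionary field names, the threshold default, and the tie-breaking by nearest difficulty are illustrative assumptions.

```python
def pick_next_question(bank, current, score, threshold=60):
    """Choose the second target question: raise difficulty (or switch
    question type) when the score passes the threshold; otherwise stay
    on the same knowledge point at equal or lower difficulty."""
    candidates = [q for q in bank if q["id"] != current["id"]]
    if score >= threshold:
        harder = [q for q in candidates
                  if q["type"] == current["type"]
                  and q["difficulty"] > current["difficulty"]]
        # fall back to a different question type if nothing harder exists
        pool = harder or [q for q in candidates if q["type"] != current["type"]]
    else:
        pool = [q for q in candidates
                if q["type"] == current["type"]
                and q["difficulty"] <= current["difficulty"]]
    if not pool:
        return None
    # prefer the candidate closest in difficulty to the current question
    return min(pool, key=lambda q: abs(q["difficulty"] - current["difficulty"]))
```

In a real system the bank lookup would be backed by the question store keyed on knowledge point and difficulty, rather than an in-memory list.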
In summary, the video generation method provided by the embodiment of the present application acquires the multimedia data of the target user for the target question and displays the acquired multimedia data together with the target question, so that the target user or other users can observe the answering process and correct bad answering habits. Reply information of the target user for the target question is obtained based on the audio data in the multimedia data, a reply result is obtained by comparing the reply information with the preset answer of the target question, and a target video is then generated from the multimedia data, the target question and the reply result. The target user or others can thus immediately determine the target user's answering feedback from the target video, prompting the target user to review the answers and further improving the target user's learning effect and learning motivation.
The following further describes the video generation method with reference to fig. 3, taking its application in a question answering scene as an example. Fig. 3 shows a flowchart of a video generation method applied to a question answering scene according to an embodiment of the present application, which specifically includes the following steps:
step 302, based on the received answer instruction, starting a shooting device to shoot in real time, and obtaining video data containing the target user.
And step 304, recording in real time through a recording device to obtain audio data.
Step 306, obtaining the video data and the audio data as multimedia data.
Step 308, performing face positioning and/or human body posture positioning on an image frame of the video data in the multimedia data to acquire the position information of the face and/or the human body in the image frame.
Step 310, determining the display position of the target question according to the position information and a preset display rule.
Step 312, displaying the target question in the video data according to the display position.
Specifically, as shown in fig. 4, the target question is presented directly above the target user.
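The "directly above" rule of fig. 4 can be sketched as a pure placement function. The (x, y, w, h) box format, the margin, and the function name are assumptions for illustration; the patent does not specify them.

```python
def question_position(face_box, frame_w, text_w, text_h, margin=10):
    """Place the target question directly above the detected face,
    clamped so the text stays inside the frame."""
    fx, fy, fw, fh = face_box
    x = fx + fw // 2 - text_w // 2          # center horizontally over the face
    x = max(0, min(x, frame_w - text_w))    # clamp to the frame width
    y = max(0, fy - margin - text_h)        # sit just above the face box
    return x, y
```

The same function could implement other preset display rules (e.g., beside the body) by changing how the anchor point is derived from the located face or body box.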
Step 314, detecting human voice data in the audio data within a preset time interval.
Step 316, in the case that the voice data is detected, intercepting the audio data according to the voice data to obtain the target audio data.
Step 318, recognizing the voice data in the target audio data, obtaining text information corresponding to the target audio data, and using the text information as reply information.
Step 320, acquiring the preset answer of the target question according to the question identifier of the target question.
Step 322, comparing the preset answer with the reply information according to a preset scoring standard, and obtaining a score for the reply information.
Step 324, using the score and the reply information as the reply result of the target question.
Step 326, generating a target video according to the multimedia data, the target question and the reply result.
Specifically, as shown in fig. 4, after the target question is displayed directly above the target user, the target video is generated by combining the target user's reply result to the target question; once the reply result is obtained, the target question and the reply result are displayed together directly above the target user.
Step 328, determining a second target question in the question bank to which the target question belongs according to the question type of the target question and the difficulty value corresponding to the target question.
Specifically, after the second target question is determined, steps 302 to 326 are repeated to generate a target video for the second target question. Further questions can then continue to be generated and answered, and the target videos generated for the individual questions can be combined into one answer video covering multiple questions.
In summary, the video generation method provided by the embodiment of the present application acquires the multimedia data of the target user for the target question and displays the acquired multimedia data together with the target question, so that the target user or other users can observe the answering process and correct bad answering habits. Reply information of the target user for the target question is obtained based on the audio data in the multimedia data, a reply result is obtained by comparing the reply information with the preset answer of the target question, and a target video is then generated from the multimedia data, the target question and the reply result. The target user or others can thus immediately determine the target user's answering feedback from the target video, prompting the target user to review the answers and further improving the target user's learning effect and learning motivation.
Corresponding to the above video generation method embodiment, the present application further provides a video generation apparatus embodiment, and fig. 5 shows a schematic structural diagram of a video generation apparatus provided in an embodiment of the present application. As shown in fig. 5, the apparatus includes:
an obtaining module 502 configured to obtain multimedia data of a target user for a target question;
a presentation module 504 configured to present the acquired multimedia data and the target question;
an obtaining module 506 configured to obtain reply information of the target user for the target question based on audio data in the multimedia data;
a comparison module 508 configured to obtain a reply result for the target question by comparing the reply information with a preset answer to the target question;
a generating module 510 configured to generate a target video according to the multimedia data, the target question and the reply result.
Optionally, the obtaining module 506 includes:
the detection submodule is configured to detect human voice data in the audio data within a preset time interval;
the intercepting submodule is configured to intercept the audio data according to the voice data under the condition that the voice data are detected, so that target audio data are obtained;
and the recognition submodule is configured to recognize the voice data in the target audio data, obtain text information corresponding to the target audio data, and use the text information as the reply information.
Optionally, the obtaining module 506 further includes:
a determination sub-module configured to determine the reply information as incomplete if the vocal data is not detected.
Optionally, the display module 504 includes:
the positioning sub-module is configured to perform face positioning and/or human body posture positioning on an image frame of video data in the multimedia data to acquire position information of a face and/or a human body in the image frame;
the position determining sub-module is configured to determine a display position of the target problem according to the position information and a preset display rule;
and the first display sub-module is configured to display the target problem in the video data according to the display position.
Optionally, the comparing module 508 includes:
the answer obtaining sub-module is configured to obtain the preset answer of the target question according to the question identifier of the target question;
the score obtaining sub-module is configured to compare the preset answer with the reply information according to a preset scoring standard to obtain a score for the reply information, and to take the score and the reply information as the reply result.
Optionally, the obtaining module 502 includes:
the starting shooting sub-module is configured to start shooting equipment to shoot in real time to obtain video data containing a target user;
starting a recording submodule configured to record in real time through a recording device to obtain the audio data;
an acquire data submodule configured to acquire the video data and the audio data as the multimedia data.
Optionally, the sound recording start sub-module is further configured to:
and starting the playing device to play the background music, and starting the recording device to record to obtain the audio data containing the background music.
Optionally, the video generating apparatus further includes:
an alignment module configured to align the background music contained in the audio data with the background music played by the playing device according to the sound fingerprint;
and the fusion module is configured to fuse the target video and the background music to obtain a second target video in a manner of fusing the aligned background music and the played background music.
Optionally, the video generating apparatus further includes:
the question determining module is configured to determine a second target question in a question bank to which the target question belongs; and taking the second target question as the target question, and returning to execute the step of acquiring the multimedia data of the target user aiming at the target question.
Optionally, the problem determination module is further configured to:
and determining a second target problem in the question bank to which the target problem belongs according to the problem type to which the target problem belongs and the difficulty value corresponding to the target problem.
Optionally, the display module 504 includes:
the timing submodule is configured to start a timing program to perform answer timing on the target question, and a time count value corresponding to the timing program is increased or decreased according to a time unit;
a presentation timing sub-module configured to present the target issue and the time count value in video data of the multimedia data.
It should be noted that the components in the apparatus claims should be understood as the functional modules necessary to implement the steps of the program flow or the method, rather than actual functional divisions or separations. An apparatus claim defined by such a set of functional modules should be understood as a functional-module framework that implements the solution mainly by means of the computer program described in the specification, not as a physical device that implements the solution mainly in hardware.
The above is a schematic scheme of a video generating apparatus of the present embodiment. It should be noted that the technical solution of the video generation apparatus belongs to the same concept as the technical solution of the above-mentioned video generation method, and details that are not described in detail in the technical solution of the video generation apparatus can be referred to the description of the technical solution of the above-mentioned video generation method.
There is also provided in an embodiment of the present application a computing device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, the processor implementing the steps of the video generation method when executing the instructions.
An embodiment of the present application further provides a computer readable storage medium, which stores computer instructions, and when the instructions are executed by a processor, the instructions implement the steps of the video generation method as described above.
The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the video generation method, and details that are not described in detail in the technical solution of the storage medium can be referred to the description of the technical solution of the video generation method.
The foregoing description of specific embodiments of the present application has been presented. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals.
It should be noted that, for the sake of simplicity, the above-mentioned method embodiments are described as a series of acts or combinations, but those skilled in the art should understand that the present application is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present application disclosed above are intended only to aid in the explanation of the application. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the application and its practical applications, to thereby enable others skilled in the art to best understand and utilize the application. The application is limited only by the claims and their full scope and equivalents.

Claims (14)

1. A method of video generation, comprising:
acquiring multimedia data of a target user aiming at a target problem;
displaying the obtained multimedia data and the target problem;
obtaining reply information of the target user aiming at the target question based on audio data in the multimedia data;
obtaining a reply result for the target question by comparing the reply information with a preset answer of the target question;
and generating a target video according to the multimedia data, the target question and the reply result.
2. The video generation method according to claim 1, wherein the obtaining reply information of the target user to the target question based on audio data in the multimedia data comprises:
detecting voice data in the audio data within a preset time interval;
intercepting the audio data according to the voice data under the condition that the voice data are detected to obtain target audio data;
and identifying the voice data in the target audio data to obtain text information corresponding to the target audio data, and taking the text information as the reply information.
3. The video generation method according to claim 2, wherein after the step of detecting the human voice data in the audio data within the preset time interval is executed, the method further comprises:
in a case where the human voice data is not detected, the reply information is determined as incomplete.
4. The video generation method of claim 1, wherein said presenting the obtained multimedia data and the target question comprises:
carrying out face positioning and/or human body posture positioning on an image frame of video data in the multimedia data to acquire position information of a face and/or a human body in the image frame;
determining the display position of the target problem according to the position information and a preset display rule;
and displaying the target problem in the video data according to the display position.
5. The video generation method according to claim 1, wherein the obtaining of the answer result to the target question by comparing the answer information with a preset answer to the target question comprises:
acquiring the preset answer of the target question according to the question identifier of the target question;
comparing the preset answer with the reply information according to a preset grading standard to obtain a grade aiming at the reply information;
and taking the scores and the reply information as the reply result.
6. The video generation method according to claim 1, wherein the obtaining multimedia data of the target user for the target question comprises:
starting a shooting device to shoot in real time to obtain video data containing a target user;
recording in real time through a recording device to obtain the audio data;
and acquiring the video data and the audio data as the multimedia data.
7. The video generation method according to claim 6, wherein the obtaining the audio data by recording in real time by a recording device comprises:
and starting the playing device to play the background music, and starting the recording device to record to obtain the audio data containing the background music.
8. The video generation method of claim 7, wherein after the step of generating the target video according to the multimedia data, the target question and the reply result is executed, the method further comprises:
aligning the background music contained in the audio data with the background music played by the playing device according to the sound fingerprint;
and fusing the target video and the background music to obtain a second target video in a mode of fusing the aligned background music and the played background music.
9. The video generation method of claim 1, wherein after the step of generating the target video according to the multimedia data, the target question and the reply result is executed, the method further comprises:
determining a second target question in a question bank to which the target question belongs;
and taking the second target question as the target question, and returning to execute the step of acquiring the multimedia data of the target user aiming at the target question.
10. The video generation method of claim 9, wherein the determining a second target question in the question bank to which the target question belongs comprises:
and determining a second target problem in the question bank to which the target problem belongs according to the problem type to which the target problem belongs and the difficulty value corresponding to the target problem.
11. The video generation method of claim 1, wherein said presenting the obtained multimedia data and the target question comprises:
starting a timing program to perform answer timing aiming at the target question, wherein the time count value corresponding to the timing program is increased or decreased according to a time unit;
and displaying the target problem and the time count value in video data of the multimedia data.
12. A video generation apparatus, comprising:
the acquisition module is configured to acquire multimedia data of a target user aiming at a target problem;
a presentation module configured to present the acquired multimedia data and the target question;
an obtaining module configured to obtain reply information of the target user for the target question based on audio data in the multimedia data;
a comparison module configured to obtain a reply result for the target question by comparing the reply information with a preset answer to the target question;
a generating module configured to generate a target video according to the multimedia data, the target question, and the reply result.
13. A computing device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, wherein the processor implements the steps of the method of any one of claims 1-11 when executing the instructions.
14. A computer-readable storage medium storing computer instructions, which when executed by a processor, perform the steps of the method of any one of claims 1 to 11.
CN202010839622.9A 2020-08-19 2020-08-19 Video generation method and device Pending CN111985395A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010839622.9A CN111985395A (en) 2020-08-19 2020-08-19 Video generation method and device
CN202110252283.9A CN112861784B (en) 2020-08-19 2021-03-08 Answering method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010839622.9A CN111985395A (en) 2020-08-19 2020-08-19 Video generation method and device

Publications (1)

Publication Number Publication Date
CN111985395A true CN111985395A (en) 2020-11-24

Family

ID=73435117

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202010839622.9A Pending CN111985395A (en) 2020-08-19 2020-08-19 Video generation method and device
CN202110252283.9A Active CN112861784B (en) 2020-08-19 2021-03-08 Answering method and device

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202110252283.9A Active CN112861784B (en) 2020-08-19 2021-03-08 Answering method and device

Country Status (1)

Country Link
CN (2) CN111985395A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114566167A (en) * 2022-02-28 2022-05-31 安徽淘云科技股份有限公司 Voice answer method and device, electronic equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100385892B1 (en) * 1998-09-10 2003-08-14 이에스피 평가 아카데미(주) Foreign Language Speaking Assessment System
CN106485964B (en) * 2016-10-19 2019-04-02 深圳市鹰硕技术有限公司 A kind of recording of classroom instruction and the method and system of program request
CN108495194A (en) * 2018-03-21 2018-09-04 优酷网络技术(北京)有限公司 Video broadcasting method, computer storage media during answer and terminal device
CN109543011A (en) * 2018-10-16 2019-03-29 深圳壹账通智能科技有限公司 Question and answer data processing method, device, computer equipment and storage medium
CN110706536B (en) * 2019-10-25 2021-10-01 北京猿力教育科技有限公司 Voice answering method and device

Also Published As

Publication number Publication date
CN112861784A (en) 2021-05-28
CN112861784B (en) 2024-02-20

Similar Documents

Publication Publication Date Title
CN107203953B (en) Teaching system based on internet, expression recognition and voice recognition and implementation method thereof
CN109359215B (en) Video intelligent pushing method and system
US10885802B2 (en) System and method for validating honest test taking
US20210104169A1 (en) System and method for ai based skill learning
JP2018205638A (en) Concentration ratio evaluation mechanism
CN113377200B (en) Interactive training method and device based on VR technology and storage medium
EP3454328A1 (en) Computer implemented method for providing feedback of harmonic content relating to music track
CN112862639B (en) Education method of online education platform based on big data analysis
CN112614489A (en) User pronunciation accuracy evaluation method and device and electronic equipment
CN109086431B (en) Knowledge point consolidation learning method and electronic equipment
CN111427990A (en) Intelligent examination control system and method assisted by intelligent campus teaching
CN111079499B (en) Writing content identification method and system in learning environment
CN111985395A (en) Video generation method and device
CN109657099A Learning interaction method and learning client
CN112837190B Training method and device based on online interactive training classroom
CN111601061B (en) Video recording information processing method and electronic equipment
CN111046293B (en) Method and system for recommending content according to evaluation result
CN114971975B Learning abnormality prompting method and system for online education platform
CN110991943A (en) Teaching quality evaluation system based on cloud computing
CN116320534A (en) Video production method and device
CN111027536A (en) Question searching method based on electronic equipment and electronic equipment
CN112560728B (en) Target object identification method and device
CN111078992B (en) Dictation content generation method and electronic equipment
CN113837010A (en) Education assessment system and method
JP2014112269A (en) Learning support system, learning support server, learning support method and learning support program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 2020-11-24