WO2021235246A1 - Information processing device, generation method, and program - Google Patents

Information processing device, generation method, and program

Info

Publication number
WO2021235246A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
importance
lecture
video
information processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2021/017535
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
洸嘉 藤井
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Group Corp
Original Assignee
Sony Group Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Group Corp
Priority to CN202180034991.3A (patent CN115552889A)
Priority to US17/916,717 (patent US20230141178A1)
Priority to JP2022524382A (patent JP7790342B2)
Publication of WO2021235246A1

Classifications

    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00 Electrically-operated educational appliances
    • G09B5/06 Electrically-operated educational appliances with both visual and audible presentation of the material to be studied
    • G09B5/065 Combinations of audio and video presentations, e.g. videotapes, videodiscs, television systems
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00 Electrically-operated educational appliances
    • G09B5/06 Electrically-operated educational appliances with both visual and audible presentation of the material to be studied
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02 Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02 Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031 Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/19 Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
    • G11B27/28 Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233 Processing of audio elementary streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456 Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/854 Content authoring
    • H04N21/8549 Creating video summaries, e.g. movie trailer
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules

Definitions

  • The present technology relates to an information processing device, a generation method, and a program, and more particularly to an information processing device, a generation method, and a program capable of editing or playing back a recorded lecture video in an appropriate form.
  • Patent Document 1 describes a technique in which a video is divided into sections based on the speaking time of a predetermined person, the importance of each section is evaluated based on the number of remarks, the number of participants in the discussion, the discussion time, the volume, gestures, emotions, and the like, and less important sections are edited.
  • When the technique described in Patent Document 1 is applied to editing a recorded lecture video, the importance is determined based on information associated with a person, such as the teacher's speaking time, volume, gestures, and emotions.
  • If the recorded lecture video is edited according to the importance determined in this way, sections in which the teacher is writing on the board are judged to be less important, so information about the order in which the board writing was done, which is considered important for learning, may be lost from the recorded lecture video.
  • The present technology was made in view of such a situation, and enables a recorded lecture video to be edited or played back in an appropriate form.
  • An information processing device according to one aspect of the present technology includes a generation unit that generates information for playback assistance according to the importance determined, based on information about a lecture, for each of predetermined sections into which data including the video and sound of the lecture is divided.
  • A generation method according to one aspect of the present technology generates information for playback assistance according to the importance determined, based on information about a lecture, for each of predetermined sections into which data including the video and sound of the lecture is divided.
  • A program according to one aspect of the present technology causes a computer to execute processing for generating information for playback assistance according to the importance determined, based on information about a lecture, for each of predetermined sections into which data including the video and sound of the lecture is divided.
  • In one aspect of the present technology, information for playback assistance is generated according to the importance determined, based on information about a lecture, for each of predetermined sections into which data including the video and sound of the lecture is divided.
  • FIG. 1 is a diagram showing the appearance of a shooting system according to an embodiment of the present technology.
  • The shooting system is configured as a lecture capture system and is installed in a classroom, an auditorium, or the like where a teacher U1 gives a lecture to a student U2.
  • FIG. 1 shows the student (auditor) U2 listening to a lecture that the teacher (lecturer) U1 gives using a whiteboard WB in a classroom (lecture room).
  • The teacher U1 is the person giving the lecture, and explains the lecture while writing on the whiteboard WB.
  • Board writing is added or erased as the explanation of the lecture proceeds.
  • The color of the board writing is not limited to one color; multiple colors are used.
  • The characters shown by solid lines on the surface of the whiteboard WB represent characters written with a black pen (a pen with black ink), and the characters shown by dotted lines represent characters written with a red pen (a pen with red ink).
  • The student U2 is a person listening to the lecture, and may speak during the lecture or come forward and write on the board.
  • The lecture may be recorded in a place where there is no student U2, such as a dedicated studio.
  • The lecture may also be recorded while a plurality of students are listening to it in the classroom.
  • The video shooting device 1 is installed in the lecture room and shoots at an angle of view that captures the teacher U1 and the whiteboard WB.
  • Video data including a video signal representing the captured video and a sound signal is output to the arithmetic unit 2.
  • The arithmetic unit 2 receives the video data supplied from the video shooting device 1 and determines the importance based on the video signal and the sound signal.
  • The arithmetic unit 2 edits the video data based on the result of the importance determination.
  • FIG. 2 is a block diagram showing a configuration example of a shooting system.
  • The shooting system of FIG. 2 is composed of a video shooting device 1, an arithmetic unit 2, a recording device 3, and an input/output device 4.
  • The video shooting device 1 is configured as, for example, a camera that shoots at an angle of view that captures the teacher U1 and the whiteboard WB simultaneously.
  • The video data representing the captured video is output to the arithmetic unit 2.
  • The number of video shooting devices 1 is not limited to one; a plurality of video shooting devices 1 may be provided.
  • The arithmetic unit 2 is configured as an information processing device that receives the video data supplied from the video shooting device 1 and determines the importance based on the video data.
  • The arithmetic unit 2 is connected to the video shooting device 1 by wired or wireless communication.
  • The arithmetic unit 2 edits the video data based on the result of the importance determination, and outputs the edited video data to the recording device 3 and the input/output device 4.
  • The arithmetic unit 2 may be configured as dedicated hardware having each function, or as a general-purpose computer with each function realized by software. Further, the arithmetic unit 2 and the video shooting device 1 need not be independent devices, and may be integrated into one device.
  • The recording device 3 records the video data supplied from the arithmetic unit 2.
  • The recording device 3 and the arithmetic unit 2 need not be independent devices, and may be integrated into one device. Further, the recording device 3 may be connected to the arithmetic unit 2 via a network.
  • The input/output device 4 is composed of a keyboard and a mouse that accept user operations, a display having a display function, a speaker having a sound output function, and the like.
  • The display may be provided with a touch panel function.
  • The input/output device 4 receives an instruction based on the user's operation, and outputs a rule signal representing the user's instruction to the arithmetic unit 2. For example, the user specifies an importance determination rule indicating what kind of information the importance determination is based on, and an editing rule indicating what kind of editing is to be performed based on the result of the importance determination.
  • The input/output device 4 presents the data including the video signal and the sound signal supplied from the arithmetic unit 2 to the user.
  • The input/output device 4 and the arithmetic unit 2 need not be independent devices, and may be integrated into one device. Further, the input/output device 4 may be connected to the arithmetic unit 2 via a network.
  • FIG. 3 is a block diagram showing a functional configuration example of the arithmetic unit 2.
  • The arithmetic unit 2 of FIG. 3 is composed of a video input unit 101, a video analysis unit 102, a sound analysis unit 103, a control parameter input unit 104, an importance determination unit 105, an automatic editing execution unit 106, and a video output unit 107.
  • The video input unit 101 receives at least one piece of video data supplied from the video shooting device 1. As described above, the video data includes a video signal and a sound signal. The video input unit 101 outputs the video signal representing the video shot by the video shooting device 1 to the video analysis unit 102, and outputs the sound signal representing the sound collected in the lecture room to the sound analysis unit 103.
  • The video analysis unit 102 analyzes at least one type of video information (information representing video related to the lecture) based on the video signal supplied from the video input unit 101. For example, information about the behavior of the teacher, the behavior of the student, the content of the board writing, the amount of increase or decrease of characters on the board, the color of the characters on the board, materials attached to the whiteboard, and the like is analyzed by the video analysis unit 102 as video information.
  • The video analysis unit 102 outputs the analysis result of the video information and the video signal to the importance determination unit 105.
  • The sound analysis unit 103 analyzes at least one type of sound information (information representing sound related to the lecture) based on the sound signal supplied from the video input unit 101. For example, information about the teacher's voice, the student's voice, and the chime sound is analyzed by the sound analysis unit 103 as sound information. Hereinafter, when there is no need to distinguish between them, the video information and the sound information are collectively referred to as analysis information.
  • The sound analysis unit 103 outputs the analysis result of the sound information and the sound signal to the importance determination unit 105.
  • The control parameter input unit 104 receives the rule signal representing the importance determination rule and the rule signal representing the editing rule supplied from the input/output device 4.
  • FIG. 4 is a diagram showing an example of an importance determination rule.
  • As importance determination rules for video information, the user specifies, for example, the rules "when the teacher is facing forward (toward the back of the classroom), the importance is high", "when the teacher is writing on the board, the importance is low", "when the student is writing on the board, the importance is high", "when the student is writing on the board with a red pen, the importance is high", and "when the amount of board writing decreases, the importance is low".
  • As importance determination rules for sound information, the user specifies, for example, the rules "when the teacher is explaining, the importance is high", "when the student is asking a question, the importance is high", and "when the chime sounds, the importance is high".
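The example rules quoted above could be held as plain data so that a rule signal is easy to inspect and change. The following is a minimal sketch under stated assumptions, not the representation used in the publication: the condition names, the `RULES` table, and `apply_rules` are all hypothetical, and the "high"/"low" labels simply mirror the quoted rules.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical condition names; each pair mirrors one example rule quoted in the text.
RULES: List[Tuple[str, str]] = [
    ("teacher_facing_forward", "high"),
    ("teacher_writing_on_board", "low"),
    ("student_writing_on_board", "high"),
    ("student_writing_with_red_pen", "high"),
    ("board_writing_decreased", "low"),
    ("teacher_explaining", "high"),
    ("student_asking_question", "high"),
    ("chime_sounding", "high"),
]


def apply_rules(observed: Set[str]) -> Dict[str, str]:
    """Return the importance label of every rule whose condition was observed."""
    return {cond: level for cond, level in RULES if cond in observed}
```

For a section in which the teacher writes on the board while the chime sounds, `apply_rules({"teacher_writing_on_board", "chime_sounding"})` yields one "low" and one "high" label; how such labels are combined into a single value is a separate step.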
  • FIG. 5 is a diagram showing an example of an editing rule.
  • The control parameter input unit 104 of FIG. 3 outputs the rule signal representing the importance determination rule described above to the importance determination unit 105, and outputs the rule signal representing the editing rule to the automatic editing execution unit 106.
  • The importance determination unit 105 determines the importance, according to the rule signal supplied from the control parameter input unit 104, based on the analysis result of the video information supplied from the video analysis unit 102 and the analysis result of the sound information supplied from the sound analysis unit 103.
  • The importance is determined not as a single value for the entire video data, but as a value for each short section into which the video data is divided.
  • Methods of dividing the video data include dividing at predetermined time intervals (for example, every 5 seconds), dividing based on the teacher's voice (for example, sound pressure), recognizing the tip of the pen used for board writing and dividing at the timing when the pen tip has been away from the surface of the whiteboard for a predetermined time, and dividing based on the amount of increase or decrease of characters on the board.
  • The video data may also be divided by combining these division methods.
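Of the division methods listed, the fixed-interval one is the simplest to illustrate. The sketch below assumes sections are represented as `(start, end)` pairs in seconds; the function name `split_fixed` and that representation are illustrative choices, not taken from the publication.

```python
from typing import List, Tuple


def split_fixed(duration_s: float, interval_s: float = 5.0) -> List[Tuple[float, float]]:
    """Divide [0, duration_s) into consecutive sections of interval_s seconds.

    The last section is shortened if the duration is not a multiple of the interval.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    sections = []
    start = 0.0
    while start < duration_s:
        end = min(start + interval_s, duration_s)
        sections.append((start, end))
        start = end
    return sections
```

The voice-based and pen-tip-based methods would produce sections of varying length, which is one reason a duration-weighted aggregate can be useful later on.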
  • The importance determination unit 105 determines the importance of each section into which the video data is divided not as a binary value such as important or unimportant, but as a value in a range of, for example, -1.0 to 1.0.
  • The importance may further be determined for a determination section, which is a section obtained by combining a plurality of consecutive sections for which the importance has been determined.
  • The importance of the determination section is, for example, the average value, the maximum value, the minimum value, or a weighted sum according to the time length of each section, of the importance of the sections it contains.
  • One of the average value, the maximum value, the minimum value, the sum, the product, and a weighted sum according to weights represented by the rule signal, of the importance values determined based on the analysis results of the respective pieces of analysis information, is used as the final importance.
  • The number of sections combined into a determination section is, for example, a preset number of sections.
  • Alternatively, a number of sections set based on the teacher's voice, on the recognition result of the pen tip, or on the amount of increase or decrease of characters on the board may be combined into a determination section.
  • The importance determination unit 105 outputs video data combining the video signal supplied from the video analysis unit 102 and the sound signal supplied from the sound analysis unit 103, together with the result of the importance determination, to the automatic editing execution unit 106.
  • The automatic editing execution unit 106 edits the video data based on the result of the importance determination by the importance determination unit 105, according to the rule signal supplied from the control parameter input unit 104.
  • The video data edited by the automatic editing execution unit 106 is output to the video output unit 107.
  • The video output unit 107 outputs the video data supplied from the automatic editing execution unit 106 to the recording device 3 and the input/output device 4.
  • <Example of video data editing> An example of editing video data obtained by recording the classroom lecture described with reference to FIG. 1 will be described. Here, it is assumed that a 120-minute lecture was given in the classroom shown in FIG. 1.
  • FIGS. 6 to 8 are diagrams showing an example of a timeline of a recorded lecture video.
  • The video data of the recorded lecture video is divided into 12 determination sections, determination sections 1 to 12, in chronological order.
  • Each determination section is 10 minutes long.
  • FIGS. 6 to 8 show a typical screenshot for each determination section and characters representing the content of the sound.
  • The video of determination section 1 shows the teacher U1 standing in front of the whiteboard WB. There is no writing on the whiteboard WB. As a typical sound of determination section 1, a chime sound is recorded.
  • The video of determination section 2 shows the teacher U1 writing on the left side of the whiteboard WB with a black pen.
  • As a typical sound of determination section 2, the sound of writing on the board is recorded.
  • The video of determination section 3 shows the teacher U1 explaining the writing on the whiteboard WB.
  • As a typical sound of determination section 3, the voice of the teacher U1 is recorded.
  • The video of determination section 4 shows the teacher U1 writing on the upper right side of the whiteboard WB with a red pen.
  • As a typical sound of determination section 4, the sound of writing on the board is recorded.
  • The video of determination section 5 shows the teacher U1 giving an explanation in response to a question from the student U2.
  • As typical sounds of determination section 5, the voice of the teacher U1 and the voice of the student U2 asking the question are recorded.
  • The video of determination section 6 shows the teacher U1 explaining while writing on the lower right side of the whiteboard WB with a black pen.
  • A chemical formula is written on the whiteboard WB by the teacher U1.
  • As typical sounds of determination section 6, the sound of writing on the board and the voice of the teacher U1 are recorded.
  • The video of determination section 7 shows the teacher U1 erasing the writing on the left side of the whiteboard WB.
  • As a typical sound of determination section 7, the sound of erasing the board writing is recorded.
  • The video of determination section 8 shows the teacher U1 explaining the lecture.
  • As a typical sound of determination section 8, the voice of the teacher U1 is recorded.
  • The video of determination section 9 shows the student U2 chatting, together with the teacher U1 and the whiteboard WB.
  • As a typical sound of determination section 9, the voice of the student U2 chatting is recorded.
  • The video of determination section 10 shows the student U2 writing on the lower left side of the whiteboard WB with a black pen.
  • As a typical sound of determination section 10, the sound of writing on the board is recorded.
  • The video of determination section 11 shows the teacher U1 explaining the writing made on the whiteboard WB by the student U2.
  • As typical sounds of determination section 11, the voice of the teacher U1 and the voice of the student U2 chatting are recorded.
  • The video of determination section 12 shows the teacher U1 summarizing the lecture.
  • As typical sounds of determination section 12, the voice of the teacher U1 and the sound of the chime are recorded.
  • The video analysis unit 102 and the sound analysis unit 103 analyze the video information and the sound information for each of the 12 determination sections described above.
  • As video information, the movement of the teacher, the orientation of the teacher's face, the movement of the student, the color of the board writing, the increase or decrease in the amount of board writing, and the content of the board writing are analyzed.
  • As sound information, the content of the teacher's voice, the volume of the teacher's voice, the tone of the teacher's voice, the student's questions, the student's chat, the chime, the content sound, and the board writing sound are analyzed.
  • The analysis of the video information and the sound information is performed using conventional methods. For example, teachers and students can be distinguished by image-based or voiceprint-based personal identification, and the content of the board writing can be recognized by combining a board writing extraction function with OCR (Optical Character Recognition).
  • The importance determination unit 105 determines the importance of each of the 12 determination sections based on the analysis result of the video information and the analysis result of the sound information. Specifically, the importance determination unit 105 determines the importance of each piece of analysis information in each section according to the importance determination rule. For example, the video data is divided into sections of 5 seconds each.
  • The importance determination unit 105 groups 120 consecutive sections into one determination section, and determines the average value of the importance of each piece of analysis information over the 120 sections as the importance of the determination section.
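The arithmetic here is that 120 sections of 5 seconds make one 10-minute determination section, so a 120-minute lecture gives 12 determination sections. A minimal sketch of the grouping and averaging, for a single piece of analysis information, is shown below; the function name `determination_importance` is hypothetical and the input values in the usage note are made up for illustration.

```python
from typing import List, Sequence


def determination_importance(section_values: Sequence[float],
                             sections_per_group: int = 120) -> List[float]:
    """Average consecutive per-section importance values in groups of a fixed size."""
    if sections_per_group <= 0:
        raise ValueError("sections_per_group must be positive")
    groups = [
        section_values[i:i + sections_per_group]
        for i in range(0, len(section_values), sections_per_group)
    ]
    return [sum(group) / len(group) for group in groups]
```

For instance, if a rule fires (importance 1.0) in 60 of the 120 five-second sections of a determination section and is 0 elsewhere, the determination section receives 0.5 for that piece of analysis information.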
  • FIG. 9 is a diagram showing an example of an importance determination rule.
  • The importance of the student's movement is determined by the rule "when the student is within the angle of view, the importance is 1.0".
  • The importance of the volume of the teacher's voice is determined by the rule "when the volume of the teacher's voice is louder than a certain level, the importance is 1.0", and the importance of the tone of the teacher's voice is determined by the rule "when the teacher's voice is emotional, the importance is 1.0".
  • FIG. 10 is a diagram showing an example of the importance of each analysis information determined for each determination section.
  • As shown in FIG. 10, for each determination section, the importance is determined for each of the movement of the teacher, the orientation of the teacher's face, the movement of the student, the color of the board writing, the increase or decrease of the board writing, the content of the board writing, the content of the teacher's voice, the volume of the teacher's voice, the tone of the teacher's voice, the student's questions, the student's chat, the chime, the content sound, and the board writing sound.
  • For example, the importance of the teacher's movement is 0.3, the importance of the teacher's face orientation is 0.9, the importance of the student's movement is 0, the importance of the color of the board writing is 0, the importance of the increase or decrease of the board writing is 0, the importance of the content of the board writing is 0, the importance of the content of the teacher's voice is 0, the importance of the volume of the teacher's voice is 0, the importance of the tone of the teacher's voice is 0, the importance of the student's questions is 0, the importance of the student's chat is 0, the importance of the chime is 1.0, the importance of the content sound is 0, and the importance of the board writing sound is 0.
  • The importance determination unit 105 calculates the sum of the importance values determined for the respective pieces of analysis information in each determination section as the final importance.
  • The final importance values of determination sections 1 to 12 are obtained as 2.2, 0.7, 1.9, 2.1, 2.4, 2.5, -0.9, 1.7, 1.6, 2.5, 1.6, and 2.2, respectively.
  • In order of final importance, 1st place is determination sections 6 and 10, 3rd place is determination section 5, 4th place is determination sections 1 and 12, 6th place is determination section 4, 7th place is determination section 3, 8th place is determination section 8, 9th place is determination sections 9 and 11, 11th place is determination section 2, and 12th place is determination section 7.
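The sum and the ranking can be checked with a short sketch. The numbers are the ones given in the text; the function name `competition_rank` is hypothetical, and the use of standard competition ranking (tied sections share a rank and the following rank is skipped) is an assumption that matches the "1st, 1st, 3rd" pattern of the listed order.

```python
import math
from typing import Dict

# Per-information importance values from the worked example in the text
# (teacher movement 0.3, face orientation 0.9, chime 1.0, all others 0).
EXAMPLE_VALUES = [0.3, 0.9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1.0, 0, 0]

# Final importance of determination sections 1 to 12, as listed in the text.
FINAL: Dict[int, float] = {
    1: 2.2, 2: 0.7, 3: 1.9, 4: 2.1, 5: 2.4, 6: 2.5,
    7: -0.9, 8: 1.7, 9: 1.6, 10: 2.5, 11: 1.6, 12: 2.2,
}


def competition_rank(scores: Dict[int, float]) -> Dict[int, int]:
    """Rank sections by descending score; ties share the best rank of the group."""
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    ranks: Dict[int, int] = {}
    prev_score = None
    prev_rank = 0
    for position, (section, score) in enumerate(ordered, start=1):
        rank = prev_rank if score == prev_score else position
        ranks[section] = rank
        prev_score, prev_rank = score, rank
    return ranks


# The example values sum to the 2.2 listed for determination sections 1 and 12.
assert math.isclose(sum(EXAMPLE_VALUES), 2.2)
```

Applying `competition_rank(FINAL)` places determination sections 6 and 10 first and determination section 7 last.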
  • The automatic editing execution unit 106 performs editing according to the final importance of determination sections 1 to 12 in accordance with the editing rule.
  • As an editing rule, it is assumed that the rule "delete sections in ascending order of importance so that the length of the recorded lecture video becomes 2/3 of the actual lecture time" is specified.
  • In this case, the automatic editing execution unit 106 performs editing so as to delete four of determination sections 1 to 12 in ascending order of importance: determination section 7, determination section 2, determination section 9, and determination section 11.
  • FIG. 11 is a diagram showing an example of a timeline of the edited video data.
  • As shown in FIG. 11, the edited video data is video data in which determination section 1, determination section 3, determination section 4, determination section 5, determination section 6, determination section 8, determination section 10, and determination section 12 are joined together.
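The editing rule in this example can be reproduced with a short sketch. It assumes 10-minute determination sections and breaks ties between equally important sections by section number; the tie-breaking order and the function name `select_deletions` are assumptions for illustration, not details given in the text.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

# Final importance of determination sections 1 to 12, as listed in the text.
FINAL: Dict[int, float] = {
    1: 2.2, 2: 0.7, 3: 1.9, 4: 2.1, 5: 2.4, 6: 2.5,
    7: -0.9, 8: 1.7, 9: 1.6, 10: 2.5, 11: 1.6, 12: 2.2,
}


def select_deletions(final: Dict[int, float],
                     minutes_per_section: int = 10,
                     target: Fraction = Fraction(2, 3)) -> Tuple[List[int], List[int]]:
    """Delete sections in ascending order of importance until the remaining
    length is at most `target` of the original length."""
    total = minutes_per_section * len(final)
    remaining = total
    deleted: List[int] = []
    # Assumption: ties are broken by section number (earlier section first).
    for section, _ in sorted(final.items(), key=lambda kv: (kv[1], kv[0])):
        if remaining <= target * total:
            break
        deleted.append(section)
        remaining -= minutes_per_section
    kept = sorted(s for s in final if s not in deleted)
    return deleted, kept
```

With the values from the text, `select_deletions(FINAL)` deletes determination sections 7, 2, 9, and 11 and keeps sections 1, 3, 4, 5, 6, 8, 10, and 12, leaving 80 of the original 120 minutes.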
  • The video data obtained by the above editing is output to the recording device 3 and the input/output device 4 by the video output unit 107.
  • The video data obtained by the editing is recorded in the recording device 3 or presented to the user by the input/output device 4.
  • The process of FIG. 12 is started, for example, when video data is input from the video shooting device 1 to the video input unit 101.
  • The video signal is output to the video analysis unit 102, and the sound signal is output to the sound analysis unit 103.
  • In step S1, the video analysis unit 102 analyzes the video information based on the video signal.
  • In step S2, the sound analysis unit 103 analyzes the sound information based on the sound signal.
  • The process of step S2 may be performed in parallel with the process of step S1, or may be performed after the process of step S1.
  • In step S3, the importance determination unit 105 determines the importance of each section into which the video data is divided, based on the analysis result of the video information by the video analysis unit 102 and the analysis result of the sound information by the sound analysis unit 103.
  • In step S4, the automatic editing execution unit 106 generates information for playback assistance according to the importance determined by the importance determination unit 105. That is, the automatic editing execution unit 106 functions as a generation unit that generates information for playback assistance.
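The flow of steps S1 to S4 can be outlined as a small pipeline. The sketch below is only a skeleton under stated assumptions: the four callables stand in for the analysis, determination, and generation units and are supplied by the caller, the name `run_pipeline` is hypothetical, and running S1 and S2 on a thread pool is one possible way to realize the "in parallel" option mentioned in the text.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def run_pipeline(video_signal: Any,
                 sound_signal: Any,
                 analyze_video: Callable[[Any], Any],
                 analyze_sound: Callable[[Any], Any],
                 determine_importance: Callable[[Any, Any], Any],
                 generate_assist_info: Callable[[Any], Any]) -> Any:
    """Run steps S1 to S4 in order, with S1 and S2 executed concurrently."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        video_future = pool.submit(analyze_video, video_signal)  # step S1
        sound_future = pool.submit(analyze_sound, sound_signal)  # step S2
        video_result = video_future.result()
        sound_result = sound_future.result()
    importance = determine_importance(video_result, sound_result)  # step S3
    return generate_assist_info(importance)  # step S4
```

Passing the stages in as functions keeps the skeleton independent of any particular analysis method; running S2 after S1 instead only requires calling the two analyzers sequentially.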
  • the information for assisting reproduction is information used to provide the user with the recorded video of the lecture.
  • the automatic editing execution unit 106 edits, for example, by deleting the video data in the less important section, compressing the less important section by making the compression rate higher than the compression rate in the other section, and the like. , Generates video data as information for playback assistance.
  • meta information for editing according to importance and meta information for reproduction according to importance may be generated as information for assisting reproduction. These meta information will be described later.
  • the information for assisting reproduction is output to the recording device 3 and the input / output device 4 by the video output unit 107, and is used to provide the lecture recording video to the user.
  • the input / output device 4 displays the lecture recorded video obtained by reproducing the video data as the playback assisting information and provides it to the user.
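The flow of steps S1 to S4 can be sketched as below. The stage functions are supplied by the caller and are purely hypothetical stand-ins for the video analysis unit 102, the sound analysis unit 103, the importance determination unit 105, and the automatic editing execution unit 106; the sketch only illustrates that S1 and S2 are independent and may run in parallel before S3 and S4.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(video_signal, sound_signal,
                 analyze_video, analyze_sound, judge_importance, generate):
    """Run steps S1-S4 with caller-supplied (hypothetical) stage functions."""
    # S1 and S2 do not depend on each other, so they may run in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        video_future = pool.submit(analyze_video, video_signal)  # step S1
        sound_future = pool.submit(analyze_sound, sound_signal)  # step S2
        video_info = video_future.result()
        sound_info = sound_future.result()
    importance = judge_importance(video_info, sound_info)        # step S3
    return generate(importance)                                  # step S4


# Toy stand-ins: one value per section, importance is the larger of the two
# analysis results, and the output lists the sections judged important.
result = run_pipeline(
    video_signal=[0.2, 0.9, 0.4],
    sound_signal=[0.1, 0.5, 0.8],
    analyze_video=lambda v: v,
    analyze_sound=lambda s: s,
    judge_importance=lambda v, s: [max(a, b) for a, b in zip(v, s)],
    generate=lambda imp: [i for i, x in enumerate(imp) if x >= 0.5],
)
print(result)  # [1, 2]
```

Running S2 after S1 instead, as the text also allows, only requires calling the two analysis functions sequentially.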
  • The video data is edited according to the importance determined for each section of the video data based on the analysis information of the information related to the lecture.
  • Information about the lecture includes information about teachers and students, board writing, chimes, materials attached to whiteboards, and video materials.
  • When the technique described in Patent Document 1 is applied to the editing of the video recorded in the lecture, the importance is determined based on the information associated with the person.
  • If the video recorded in the lecture is edited according to the importance determined in this way, a section of the video in which the teacher is writing on the board may be judged to be less important, and information about how and in what order the board writing was done may be lost from the recorded video of the lecture.
  • Likewise, if a section of video showing board writing in red pen, which is important in learning, is judged to be less important, that section may be lost from the recorded video of the lecture.
  • Because the arithmetic unit 2 edits the video data according to the importance determined from the analysis information about the information related to the lecture, it can edit the video data without losing information that is important in the lecture recording, such as the order in which the board writing was done and the board writing in red pen.
  • The arithmetic unit 2 can thus edit the video recorded in the lecture in an appropriate form. Further, since the arithmetic unit 2 performs editing that deletes video data in sections that are not important in learning, or compresses it at a high compression rate, the video data of the lecture recorded video can be recorded with a reduced amount of data.
  • The user who watches the lecture recorded video watches a video from which the sections that are not important in learning have been deleted, so the lecture content can be learned in a shorter time than the actual lecture time.
  • The importance is determined based on the analysis information about slide switching and animation.
  • This technique can also be applied to shooting lectures that use something other than board writing. Further, the lecture may be shot while the whiteboard and the screen are simultaneously within the angle of view of the video shooting device 1.
  • The importance may be judged based on the analysis information about board writing made on a blackboard, a green board, paper such as imitation paper, and the like.
  • The sound related to the lecture may be picked up by a sound pickup device different from the sound pickup device mounted on the video shooting device 1.
  • For example, a pin microphone worn by the teacher, serving as such a separate sound pickup device, can pick up the teacher's voice.
  • The pin microphone is connected to the arithmetic unit 2 and outputs a sound signal representing the collected sound to the arithmetic unit 2.
  • Meta information for performing editing according to the importance may be generated by the automatic editing execution unit 106 as the information for reproduction assistance.
  • Meta information representing the result of importance determination by the importance determination unit 105 is generated by the automatic editing execution unit 106 as meta information for editing according to the importance.
  • The video output unit 107 outputs the video data supplied from the video shooting device 1 and the meta information generated by the automatic editing execution unit 106 to the recording device 3 and the input / output device 4.
  • The input / output device 4 uses the meta information supplied from the arithmetic unit 2 to edit the video data for each user and plays back the edited video data. By doing so, a lecture recording video having a length corresponding to the proficiency level of each user can be provided.
  • The arithmetic unit 2 may also edit the video data based on the meta information recorded in the recording device 3, in accordance with a rule signal representing an editing rule for performing editing according to the proficiency level of each user.
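Per-user editing from the meta information might look like the following sketch. The meta information layout (`start`, `end`, `importance`), the proficiency labels, and the thresholds are all assumptions made for illustration; the source does not specify the format of the meta information or of the editing rule.

```python
# Hypothetical editing rule: the importance threshold below which a section
# is dropped, keyed by the user's proficiency level.
THRESHOLD_BY_PROFICIENCY = {"beginner": 0.0, "intermediate": 0.4, "advanced": 0.7}


def edit_for_user(meta, proficiency):
    """Return (start, end) times in seconds of the sections kept for one user."""
    threshold = THRESHOLD_BY_PROFICIENCY[proficiency]
    return [(m["start"], m["end"]) for m in meta if m["importance"] >= threshold]


def total_length(segments):
    """Total playback length in seconds of the kept segments."""
    return sum(end - start for start, end in segments)


# Hypothetical meta information for three one-minute sections.
meta = [
    {"start": 0, "end": 60, "importance": 0.3},
    {"start": 60, "end": 120, "importance": 0.8},
    {"start": 120, "end": 180, "importance": 0.5},
]
for level in THRESHOLD_BY_PROFICIENCY:
    print(level, total_length(edit_for_user(meta, level)))
```

Because the original video data is kept unchanged and only the meta information drives the edit, the same recording yields a different length for each proficiency level.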
  • Meta information for performing reproduction according to the importance may be generated by the automatic editing execution unit 106 as information for assisting reproduction.
  • Meta information representing the result of importance determination by the importance determination unit 105 is generated by the automatic editing execution unit 106 as meta information for performing reproduction according to the importance.
  • The video output unit 107 outputs the video data supplied from the video shooting device 1 and the meta information generated by the automatic editing execution unit 106 to the recording device 3 and the input / output device 4.
  • The input / output device 4 displays, for example, the playback position of a section of high importance on the seek bar of the viewing screen of the lecture recorded video. By doing so, the user who watches the lecture recorded video can, for example, select a playback position displayed on the seek bar of the viewing screen and easily play back the video of a section that is important in learning. Note that, instead of having the user select the playback position, the input / output device 4 may skip the less important sections and automatically play back only the sections of high importance whose playback positions are displayed on the seek bar.
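The two playback modes described above can be sketched as follows. The meta information layout and the importance threshold are assumptions, and the sections are assumed to be sorted in timeline order; the sketch computes marker positions for the seek bar and the ranges to play when low-importance sections are skipped.

```python
def seek_bar_markers(meta, threshold, duration):
    """Start positions of high-importance sections as fractions of the seek bar."""
    return [m["start"] / duration for m in meta if m["importance"] >= threshold]


def auto_play_ranges(meta, threshold):
    """Ranges to play when low-importance sections are skipped.

    Adjacent high-importance sections are merged into one continuous range.
    """
    ranges = []
    for m in meta:
        if m["importance"] < threshold:
            continue
        if ranges and ranges[-1][1] == m["start"]:
            ranges[-1] = (ranges[-1][0], m["end"])
        else:
            ranges.append((m["start"], m["end"]))
    return ranges


# Hypothetical meta information for four one-minute sections.
meta = [
    {"start": 0, "end": 60, "importance": 0.2},
    {"start": 60, "end": 120, "importance": 0.9},
    {"start": 120, "end": 180, "importance": 0.7},
    {"start": 180, "end": 240, "importance": 0.1},
]
markers = seek_bar_markers(meta, threshold=0.5, duration=240)
ranges = auto_play_ranges(meta, threshold=0.5)
print(markers)  # [0.25, 0.5]
print(ranges)   # [(60, 180)]
```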
  • A thumbnail image representing each section whose importance is determined may be generated by the automatic editing execution unit 106.
  • The arithmetic unit 2 determines the importance of each frame constituting a section and uses the frame image of the frame having the highest importance as the thumbnail image.
  • The frame image of the first or last frame of each section may instead be used as the thumbnail image.
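Thumbnail selection for one section can be sketched as below. The function name and parameters are hypothetical; the sketch returns a frame index rather than an image, and the choice of the earliest frame when several frames share the highest importance is an assumption.

```python
def pick_thumbnail(num_frames, frame_importance=None, fallback="first"):
    """Return the index of the frame to use as the section's thumbnail.

    num_frames:       number of frames in the section.
    frame_importance: optional per-frame importance scores for the section.
    fallback:         "first" or "last", used when no scores are given.
    """
    if frame_importance is not None:
        # max() returns the earliest index when several frames tie.
        return max(range(num_frames), key=lambda i: frame_importance[i])
    return 0 if fallback == "first" else num_frames - 1


print(pick_thumbnail(4, [0.1, 0.7, 0.7, 0.2]))  # 1
print(pick_thumbnail(4))                        # 0
print(pick_thumbnail(4, fallback="last"))       # 3
```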
  • The video output unit 107 outputs the playback assist information generated by the automatic editing execution unit 106 and the thumbnail image of each section of the lecture recorded video to the recording device 3 and the input / output device 4.
  • The input / output device 4 displays the playback position of a section having high importance and the thumbnail image of that section on the seek bar of the viewing screen. By doing so, the input / output device 4 can present clearer information to the user who views the video recorded in the lecture.
  • The type of analysis information analyzed by the video analysis unit 102 and the sound analysis unit 103 can be set in advance, or can be specified by the user with a rule signal input via the input / output device 4. For example, when real-time performance is important for the user, the user instructs that only the necessary and sufficient analysis information be analyzed.
  • The importance may be determined according to the frequency with which each element serving as analysis information appears in the video shot by the video shooting device 1.
  • The importance determination unit 105 assumes that the characters written with the black pen are written for emphasis, and judges the importance of the section in which the instructor is writing on the board with the black pen to be a high value.
  • By determining the importance according to the frequency of appearance of board writing in red pen and in black pen, the importance determination unit 105 can make an importance judgment that reflects the teacher's intention, such as writing with a black pen for emphasis.
  • The importance determination unit 105 assumes that an expression that appears repeatedly is important in learning, and judges the importance of the sections in which that expression appears to be a high value. The importance of the section including the timing at which the repeated expression is first written can also be judged to be a particularly high value.
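The two frequency-based judgments above can be sketched as follows. Both scoring formulas are assumptions: the source only states that importance depends on the frequency of appearance, so the rarity weighting and the `base` and `first_bonus` values are illustrative choices, not the method of the importance determination unit 105.

```python
from collections import Counter


def importance_by_rarity(colors_per_section):
    """Score each section by how rare its pen colour is across the lecture.

    colors_per_section: pen colour seen in each section (None if no writing).
    Returns one score in [0, 1] per section; rarer colours score higher.
    """
    counts = Counter(c for c in colors_per_section if c is not None)
    total = sum(counts.values())
    return [0.0 if c is None else 1.0 - counts[c] / total
            for c in colors_per_section]


def importance_by_repetition(expressions_per_section, base=0.5, first_bonus=0.5):
    """Score sections containing expressions that appear more than once.

    The section where a repeated expression first appears gets an extra bonus.
    """
    counts = Counter(e for exprs in expressions_per_section for e in exprs)
    seen, scores = set(), []
    for exprs in expressions_per_section:
        score = 0.0
        for e in exprs:
            if counts[e] > 1:
                bonus = first_bonus if e not in seen else 0.0
                score = max(score, base + bonus)
        seen.update(exprs)
        scores.append(score)
    return scores


# The teacher mostly writes in red, so the single black section stands out.
colors = ["red", "red", "red", "black", None, "red"]
rarity = importance_by_rarity(colors)
print(rarity[3] > rarity[0])  # True

repetition = importance_by_repetition([["E=mc^2"], ["F=ma"], ["E=mc^2"]])
print(repetition)  # [1.0, 0.0, 0.5]
```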
  • The importance may be determined based on the time change of each piece of analysis information. For example, the importance may be determined based on the time change of the amount of writing on the board.
  • FIG. 13 is a diagram showing the relationship between the time change of the amount of writing on the board and the importance.
  • A in FIG. 13 shows an example of the time change of the amount of writing on the board.
  • The horizontal axis represents time and the vertical axis represents the amount of writing on the board.
  • In the period up to time t1, the amount of writing on the board is increasing (writing on the board is being performed).
  • At time t1, the increase in the amount of writing on the board stops (the writing on the board is completed), and in the period from time t1 to time t2 the amount of writing on the board does not change (the explanation continues without writing on the board).
  • At time t2, the amount of writing on the board decreases (the writing on the board is erased).
  • B in FIG. 13 shows an example of the importance determined according to the time change of the amount of writing on the board.
  • The horizontal axis represents time and the vertical axis represents importance.
  • The importance is low in the period until time t1, while the amount of writing on the board is increasing. At time t1, when the increase in the amount of writing on the board stops, the importance becomes high. In the period from time t1 to time t2, during which the amount of writing on the board does not change, the importance gradually decreases once the amount has remained unchanged for a certain period of time. At time t2, when the amount of writing on the board begins to decrease, the importance becomes low.
  • The importance determination unit 105 determines the importance as a value as shown in B of FIG. 13 according to the time change (increase / decrease) of the amount of writing on the board as shown in A of FIG. 13.
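The relationship between A and B of FIG. 13 can be sketched as a simple rule over a sampled board-writing amount. The parameter values (`hold`, `high`, `low`, `decay`) are illustrative and not taken from the source, and the sketch assumes the amount is sampled at regular time steps and already denoised, so that an unchanged amount compares exactly equal.

```python
def importance_from_board_amount(amounts, hold=3, high=1.0, low=0.1, decay=0.1):
    """Map a per-time-step board-writing amount to an importance curve.

    hold:  number of unchanged steps after which importance starts to decay.
    Other parameters are illustrative values, not taken from the source.
    """
    scores = [low]  # no change information for the very first sample
    unchanged = 0   # consecutive steps with no change in amount
    for prev, cur in zip(amounts, amounts[1:]):
        if cur != prev:
            # Writing (increase) or erasing (decrease): low importance.
            unchanged = 0
            scores.append(low)
        else:
            unchanged += 1
            if unchanged <= hold:
                # Writing has just stopped: the finished board is explained.
                scores.append(high)
            else:
                # Unchanged for a while: importance gradually decreases.
                scores.append(max(low, high - decay * (unchanged - hold)))
    return scores


# Writing until index 3 (t1), explanation until index 8, erasing at index 9 (t2).
amounts = [0, 1, 2, 3, 3, 3, 3, 3, 3, 1]
curve = importance_from_board_amount(amounts)
print(curve)
```

In this example the curve stays low while the amount grows, jumps to the high value once the amount stops changing, decays after three unchanged steps, and drops back to the low value when the board is erased.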
  • The importance determination unit 105 determines the importance of each section of the video data according to the information related to the board writing based on the video and sound.
  • The information about the board writing is, for example, information representing the state of the board writing or information representing the contents of the board writing.
  • The information representing the state of the board writing includes information indicating the amount of increase / decrease (time change) of the board writing, the position of the pen tip, the board writing sound, the color of the board writing, the frequency of appearance of the color of the board writing, and the like.
  • The information representing the contents of the board writing includes information indicating the characters and expressions on the board and the frequency of appearance of those characters and expressions.
  • The ranking of such multiple sections may be determined by random numbers or according to their order on the timeline.
  • The order of these sections may also be determined based on the importance obtained by referring to the importance of the sections before and after each of them.
  • The final importance of the determination section 9 and the determination section 11 are the same, and their order of final importance is also the same.
  • When the automatic editing execution unit 106 performs editing to delete either the determination section 9 or the determination section 11, it refers to the importance of the determination section 8 and the determination section 10, which are the determination sections before and after the determination section 9, and likewise to the importance of the determination sections before and after the determination section 11.
  • The automatic editing execution unit 106 sums the importance of the preceding and following determination sections for each candidate, and performs editing so that the determination section 9, which has the smaller sum, is deleted.
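The neighbour-based tie-breaking can be sketched as below. The importance values are hypothetical and chosen so that sections 9 and 11 are tied; treating a missing neighbour at the edge of the timeline as contributing 0 is also an assumption.

```python
def pick_section_to_delete(importance, candidates):
    """Among tied candidates, pick the one whose neighbours matter least.

    importance: dict mapping section number -> importance.
    candidates: section numbers that share the same importance.
    """
    def neighbour_sum(section):
        # Missing neighbours (timeline edges) contribute 0 in this sketch.
        return importance.get(section - 1, 0) + importance.get(section + 1, 0)

    return min(candidates, key=neighbour_sum)


# Hypothetical values: sections 9 and 11 are tied at 0.3.
importance = {8: 0.4, 9: 0.3, 10: 0.5, 11: 0.3, 12: 0.9}
to_delete = pick_section_to_delete(importance, [9, 11])
print(to_delete)  # 9
```

Here the neighbours of section 9 are less important in total than those of section 11, so section 9 is the one deleted.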
  • The series of processes described above can be executed by hardware or software.
  • When the series of processes is executed by software, the programs constituting the software are installed from a program recording medium on a computer embedded in dedicated hardware, a general-purpose personal computer, or the like.
  • FIG. 14 is a block diagram showing a configuration example of computer hardware that executes the above-mentioned series of processes by means of a program.
  • The CPU (Central Processing Unit) 301, the ROM (Read Only Memory) 302, and the RAM (Random Access Memory) 303 are connected to each other by the bus 304.
  • The input / output interface 305 is further connected to the bus 304.
  • An input unit 306 including a keyboard, a mouse, and the like, and an output unit 307 including a display, a speaker, and the like are connected to the input / output interface 305.
  • The input / output interface 305 is also connected to a storage unit 308 made of a hard disk, a non-volatile memory, etc., a communication unit 309 made of a network interface, etc., and a drive 310 for driving the removable media 311.
  • The CPU 301 loads the program stored in the storage unit 308 into the RAM 303 via the input / output interface 305 and the bus 304 and executes it, whereby the above-mentioned series of processes is performed.
  • The program executed by the CPU 301 is, for example, recorded on the removable media 311 or provided via a wired or wireless transmission medium such as a local area network, the Internet, or digital broadcasting, and installed in the storage unit 308.
  • The program executed by the computer may be a program in which processing is performed in chronological order according to the order described in the present specification, or a program in which processing is performed in parallel or at a necessary timing such as when a call is made.
  • A system means a set of a plurality of components (devices, modules (parts), etc.), and it does not matter whether all the components are in the same housing. Therefore, a plurality of devices housed in separate housings and connected via a network, and a single device in which a plurality of modules are housed in one housing, are both systems.
  • This technology can take a cloud computing configuration in which one function is shared by multiple devices via a network and processed jointly.
  • Each step described in the above flowchart can be executed by one device or shared by a plurality of devices.
  • When one step includes a plurality of processes, the plurality of processes included in that one step can be executed by one device or shared by a plurality of devices.
  • The present technology can also have the following configurations.
  • (1) An information processing device provided with a generation unit that generates information for assisting reproduction according to the importance determined, based on the information about the lecture, for each of predetermined sections into which data including the video and sound of the lecture is divided.
  • (2) The information processing device according to (1) above, wherein the information related to the lecture is information related to the board writing based on the video or the sound.
  • The information regarding the board writing is information representing the state of the board writing or the contents of the board writing.
  • The information regarding the board writing is information representing at least one of the color of the board writing, the increase / decrease of the board writing, and the formula included in the board writing.
  • The information processing device according to any one of (1) to (4) above, wherein the information about the lecture is information representing the behavior of at least one of the lecturer and the listener of the lecture shown in the video.
  • The information processing device according to any one of (1) to (5) above, wherein the information related to the lecture is information representing a sound related to the lecture.
  • The generation unit generates edited data as the information for assisting reproduction by editing the data according to the importance.
  • The information processing device according to (7) above, wherein the generation unit generates the edited data by deleting the data in the less important sections or by compressing the data in the less important sections at a higher compression rate than the data in the other sections.
  • The information processing device, wherein the generation unit generates information for assisting reproduction according to the importance determined by the determination unit for each of the determination sections.
  • The determination unit determines the importance of each of the determination sections in which a preset number of the sections are combined.
  • The determination unit determines the importance of each of the determination sections set based on the information about the lecture.
  • The generation unit generates thumbnail images representing each of the sections together with the information for assisting reproduction.
  • The information processing device according to any one of (1) to (15), wherein the generation unit generates the information for assisting reproduction of sections having the same importance according to the importance of the sections before and after each of those sections.
  • The information processing device according to (11), wherein the determination unit determines the importance according to a determination rule instructed by the user via an input device that accepts a user's operation.
  • The generation unit generates information for assisting reproduction according to an editing rule instructed by the user via an input device that accepts a user's operation.
  • 1 video shooting device, 2 arithmetic unit, 3 recording device, 4 input / output device, 101 video input unit, 102 video analysis unit, 103 sound analysis unit, 104 control parameter input unit, 105 importance determination unit, 106 automatic editing execution unit, 107 video output unit

PCT/JP2021/017535 2020-05-21 2021-05-07 情報処理装置、生成方法、およびプログラム Ceased WO2021235246A1 (ja)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202180034991.3A CN115552889A (zh) 2020-05-21 2021-05-07 信息处理设备、生成方法和程序
US17/916,717 US20230141178A1 (en) 2020-05-21 2021-05-07 Information processing device, generation method, and program
JP2022524382A JP7790342B2 (ja) 2020-05-21 2021-05-07 情報処理装置、生成方法、およびプログラム

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020-088839 2020-05-21
JP2020088839 2020-05-21

Publications (1)

Publication Number Publication Date
WO2021235246A1 true WO2021235246A1 (ja) 2021-11-25

Family

ID=78707790

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/017535 Ceased WO2021235246A1 (ja) 2020-05-21 2021-05-07 情報処理装置、生成方法、およびプログラム

Country Status (4)

Country Link
US (1) US20230141178A1
JP (1) JP7790342B2
CN (1) CN115552889A
WO (1) WO2021235246A1

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN121213309B (zh) * 2025-09-24 2026-04-03 北京尚睿通科技有限公司 应用认知大模型的ai课程辅助教学方法及系统

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009147538A (ja) * 2007-12-12 2009-07-02 Nippon Telegr & Teleph Corp <Ntt> 映像アノテーション付与・表示方法及び装置及びプログラム及びコンピュータ読取可能な記録媒体
JP2016046705A (ja) * 2014-08-25 2016-04-04 コニカミノルタ株式会社 会議録編集装置、その方法とプログラム、会議録再生装置、および会議システム

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003517786A (ja) * 1999-12-16 2003-05-27 ケント・リッジ・デジタル・ラブス ビデオ制作システムおよび方法
JP2003241630A (ja) * 2001-12-11 2003-08-29 Rikogaku Shinkokai 動画配信方法、動画表示システム、教育モデル、ユーザーインターフェース、手動操作手順
JP2004266578A (ja) * 2003-02-28 2004-09-24 Kanazawa Inst Of Technology 動画像編集方法および装置
JP2006279111A (ja) * 2005-03-25 2006-10-12 Fuji Xerox Co Ltd 情報処理装置、情報処理方法およびプログラム
JP2008017050A (ja) * 2006-07-04 2008-01-24 Fuji Xerox Co Ltd 会議システム及び会議方法
US8345990B2 (en) * 2009-08-03 2013-01-01 Indian Institute Of Technology Bombay System for creating a capsule representation of an instructional video
JP5243365B2 (ja) * 2009-08-10 2013-07-24 日本電信電話株式会社 コンテンツ生成装置,コンテンツ生成方法およびコンテンツ生成プログラム
US9502073B2 (en) * 2010-03-08 2016-11-22 Magisto Ltd. System and method for semi-automatic video editing
JP2013239797A (ja) * 2012-05-11 2013-11-28 Canon Inc 画像処理装置
IL228204A (en) * 2013-08-29 2017-04-30 Picscout (Israel) Ltd Efficiently obtaining content-based video
US10664687B2 (en) * 2014-06-12 2020-05-26 Microsoft Technology Licensing, Llc Rule-based video importance analysis
US10127824B2 (en) * 2016-04-01 2018-11-13 Yen4Ken, Inc. System and methods to create multi-faceted index instructional videos
CN110035330B (zh) * 2019-04-16 2021-11-23 上海平安智慧教育科技有限公司 基于在线教育的视频生成方法、系统、设备及存储介质
CN110602526B (zh) * 2019-09-11 2021-09-21 腾讯科技(深圳)有限公司 视频处理方法、装置、计算机设备及存储介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YOKOMAE TAKUMA, KOU JONGSIL, IGUCHI NOBUKAZU, OCHI YOUJI, MUKAI SONOYO: "Development and evaluation of a support system for composing multiple contents made of videos and documents", THE PAPERS OF TECHNICAL MEETING ON INFORMATION ORIENTED INDUSTRIAL SYSTEM, no. IIS-09-39, 23 March 2009 (2009-03-23), JP, pages 29 - 34, XP009532415 *

Also Published As

Publication number Publication date
JPWO2021235246A1 2021-11-25
JP7790342B2 (ja) 2025-12-23
US20230141178A1 (en) 2023-05-11
CN115552889A (zh) 2022-12-30

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21807548

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022524382

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21807548

Country of ref document: EP

Kind code of ref document: A1