WO2019214019A1 - Convolutional neural network-based network teaching method and apparatus - Google Patents

Convolutional neural network-based network teaching method and apparatus

Info

Publication number
WO2019214019A1
WO2019214019A1 (PCT/CN2018/092784)
Authority
WO
WIPO (PCT)
Prior art keywords
key
content
video signal
high frequency
image
Prior art date
Application number
PCT/CN2018/092784
Other languages
English (en)
French (fr)
Inventor
陈铿帆
刘善果
刘胜强
Original Assignee
深圳市鹰硕技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市鹰硕技术有限公司
Publication of WO2019214019A1

Classifications

    • G: PHYSICS
    • G09: EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B: EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B19/00: Teaching not covered by other main groups of this subclass
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Definitions

  • the present disclosure relates to the field of computer technologies, and in particular, to a network teaching method, apparatus, electronic device, and computer readable storage medium based on a convolutional neural network.
  • network teaching applies multimedia and network technology to achieve teaching objectives through multilateral, multi-directional interaction among teachers, students, and media, and through the collection, transmission, processing, and sharing of teaching information across multiple media.
  • As a teaching model, it has the advantages of openness, interactivity, and sharing, which break the time and space limitations of traditional teaching and are conducive to promoting research-oriented learning.
  • However, because network teaching content can only be displayed on the user's display device, the teaching scene cannot be fully reproduced. Constrained by the conditions of the specific display device, the user cannot select and watch the key content area of the network video picture to be learned, or can only watch the key-content display area designated by the video source signal.
  • CN201610235737 discloses a method and device for recognizing text documents: a plurality of layout elements are determined from the extracted original document content, each layout element is mapped to a corresponding preset label, and the original document content is displayed according to the preset labels.
  • This method recognizes key content in a document by establishing labels, and is not an intelligent image-processing algorithm.
  • CN201710250098 discloses a video content description method using a spatio-temporal attention model: temporal and spatial attention models identify the key regions of focus in each video frame. This method recognizes key regions from the dynamic effect of multiple superimposed images through a convolutional neural network, and cannot recognize key regions by analyzing only a single key-frame picture.
  • CN201711049706 discloses a block content classification method based on a convolutional neural network: training samples are converted to grayscale images and a last-bit convolutional neural network model is built to classify image content; it cannot recognize key content.
  • The purpose of the present disclosure is to provide a network teaching method, apparatus, electronic device, and computer readable storage medium based on a convolutional neural network, thereby overcoming, at least to some extent, one or more problems due to the limitations and defects of the related art.
  • a network teaching method based on a convolutional neural network including:
  • a framing image generating step for analyzing a network teaching video signal to generate a feature framing image;
  • a high-frequency content detecting step of performing high-frequency content detection on the feature framing image through a plurality of convolution layers of a convolutional neural network algorithm, and determining a plurality of candidate high-frequency content regions that satisfy a preset frequency condition;
  • a pooling layer processing step of processing the plurality of candidate high-frequency content regions through a pooling layer of the convolutional neural network algorithm to obtain a key content region and the position of the key content region in the feature framing image;
  • a key video signal generating step of generating, according to the position of the key content area in the image, a key network teaching video signal containing the high-frequency content, and calling the display interface of the terminal device to display and output the key network teaching video signal.
  • the high frequency content detecting step includes:
  • generating a receptive field from the feature framing image, the receptive field being a convolution layer covering a partial region of the feature framing image;
  • performing a convolution operation between the receptive field and the feature framing image to obtain a plurality of candidate high-frequency content regions.
  • the method includes:
  • the depth of the convolution layer corresponding to the receptive field is the same as the depth of the feature framing image.
  • the pooling layer processing step includes:
  • dividing the plurality of candidate high-frequency content regions into multiple sub-regions of the same size; performing an average pooling calculation on each sub-region; and, when the average pooling results indicate that a candidate high-frequency content region contains high-frequency content, determining the candidate high-frequency content region as a key content region and determining the position of the key content region in the feature framing image.
  • the determining the location of the key content area in the feature framing image includes: analyzing a grayscale distribution gradient of the key content image; performing margin recognition of the key content image according to the grayscale distribution gradient; determining a key content display area according to the image margins; and locating the key content display area in the feature framing image.
  • the method further includes:
  • when there are multiple key content areas, the key content areas are sorted by importance according to the main feature values calculated by average pooling.
  • the method further includes:
  • the key network teaching video signal or network teaching video signal currently displayed by the terminal device is switched to the network teaching video signal or the key network teaching video signal, respectively.
  • the method further includes:
  • the network teaching video signal and the key network teaching video signal are displayed according to the device priorities and user instructions.
  • the feature framing image is a switching frame.
  • a network teaching apparatus based on a convolutional neural network including:
  • a framed image generating module configured to analyze a network teaching video signal, and generate a feature framed image
  • a high-frequency content detecting module configured to perform high-frequency content detection on the feature framing image by using a plurality of convolution layers of a convolutional neural network algorithm, and determine a plurality of candidate high-frequency content regions that satisfy a preset frequency condition;
  • a pooling layer processing module configured to process the plurality of candidate high-frequency content regions through a pooling layer of a convolutional neural network algorithm to obtain a key content region and the position of the key content region in the feature framing image;
  • the key video signal generating module is configured to generate a key network teaching video signal including high frequency content according to the position of the key content area in the image, and call the display interface of the terminal device to display and output the key network teaching video signal.
  • an electronic device comprising:
  • a processor; and a memory having stored thereon computer readable instructions that, when executed by the processor, implement any of the methods described above.
  • a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements any of the methods described above.
  • A convolutional neural network-based network teaching method in an exemplary embodiment of the present disclosure analyzes a network teaching video signal to generate a feature framing image; performs high-frequency content detection on the feature framing image through a plurality of convolution layers of a convolutional neural network algorithm to determine a plurality of candidate high-frequency content regions that satisfy a preset frequency condition; processes the plurality of candidate high-frequency content regions through a pooling layer of the convolutional neural network algorithm to obtain a key content region and the position of the key content region in the feature framing image; and then generates a key network teaching video signal containing the high-frequency content and calls the display interface of the terminal device to display and output it.
  • On the one hand, the convolutional neural network automatically searches for the key content of the teaching video, which reduces the manual searching and positioning of key content in actual teaching scenarios, improves teaching quality, and saves personnel costs;
  • on the other hand, the key content of multiple network teaching videos is sorted by importance and displayed on the user's display device, so that the user can selectively view several key contents at the same time, which improves the user experience.
  • FIG. 1 illustrates a flow chart of a convolutional neural network based network teaching method according to an exemplary embodiment of the present disclosure
  • FIGS. 2A-2B are schematic diagrams showing a scenario of a network teaching method based on a convolutional neural network according to an exemplary embodiment of the present disclosure
  • FIGS. 3A-3B are schematic diagrams showing an application scenario of the convolutional neural network-based network teaching method, according to an exemplary embodiment of the present disclosure;
  • FIG. 4 shows a schematic block diagram of a convolutional neural network-based network teaching device according to an exemplary embodiment of the present disclosure
  • FIG. 5 schematically illustrates a block diagram of an electronic device in accordance with an exemplary embodiment of the present disclosure
  • FIG. 6 schematically illustrates a schematic diagram of a computer readable storage medium in accordance with an exemplary embodiment of the present disclosure.
  • In this exemplary embodiment, a network teaching method based on a convolutional neural network is first provided, which can be applied to an electronic device such as a computer; as shown in FIG. 1, the convolutional neural network-based network teaching method may include the following steps:
  • a framing image generating step S110 configured to analyze the network teaching video signal to generate a feature framing image
  • In a high-frequency content detecting step S120, high-frequency content detection is performed on the feature framing image through a plurality of convolution layers of the convolutional neural network algorithm, and a plurality of candidate high-frequency content regions satisfying the preset frequency condition are determined;
  • In a pooling layer processing step S130, the plurality of candidate high-frequency content regions are processed by the pooling layer of the convolutional neural network algorithm, and the key content regions and their positions in the feature framing image are obtained;
  • In a key video signal generating step S140, a key network teaching video signal containing the high-frequency content is generated according to the position of the key content area in the image, and the display interface of the terminal device is called to display and output the key network teaching video signal.
  • In the convolutional neural network-based network teaching method, since the key content of the teaching video is searched automatically by the convolutional neural network, the manual searching and positioning of key content in actual teaching scenarios is reduced, which improves teaching quality and saves personnel costs; in addition, the key content of multiple network teaching videos is sorted by importance and displayed on the user's display device, so that the user can selectively view several key contents at the same time, which enhances the user experience.
  • In the framing image generating step S110, the network teaching video signal may be analyzed to generate a feature framing image.
  • In this example, the network teaching video signal is first analyzed, and feature framing images are selected from the signal as the source pictures from which key content will be chosen; the convolutional neural network algorithm is then applied to realize intelligent recognition of the key content.
  • the feature framing image is a switching frame.
  • The switching frame is an important data switching point of the video signal and represents the initial picture when the content of the video signal changes; the switching frames of the network teaching video signal can be selected as the feature framing images. This reduces the amount of calculation while preserving accuracy, and speeds up selection.
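The patent does not specify how switching frames are detected. A minimal sketch, assuming switching frames are found by thresholding the mean absolute difference between consecutive frames (the function name, threshold, and synthetic video are illustrative assumptions, not taken from the patent):

```python
import numpy as np

def find_switching_frames(frames, threshold=30.0):
    """Return indices of frames whose content differs sharply from the
    previous frame (candidate switching frames).

    frames: iterable of HxWx3 uint8 arrays; threshold: mean absolute
    per-pixel difference (0-255 scale) above which a cut is assumed.
    """
    indices = []
    prev = None
    for i, frame in enumerate(frames):
        gray = frame.astype(np.float32).mean(axis=2)  # crude grayscale
        if prev is not None and np.abs(gray - prev).mean() > threshold:
            indices.append(i)
        prev = gray
    return indices

# Synthetic demo: 10 dark frames, then 10 bright frames -> one cut at index 10.
video = [np.zeros((48, 64, 3), np.uint8)] * 10 + \
        [np.full((48, 64, 3), 200, np.uint8)] * 10
print(find_switching_frames(video))  # [10]
```

In practice the threshold would be tuned to the video source; a histogram-based or compressed-domain cut detector could replace the raw pixel difference without changing the overall flow.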
  • In the high-frequency content detecting step S120, high-frequency content detection may be performed on the feature framing image through a plurality of convolution layers of the convolutional neural network algorithm, and a plurality of candidate high-frequency content regions satisfying the preset frequency condition may be determined.
  • the convolutional neural network differs from the ordinary neural network in that the convolutional neural network includes a feature extractor composed of a convolutional layer and a sub-sampling layer.
  • In a convolutional layer, a neuron is connected only to some of its neighboring neurons, and image recognition is realized by the convolutional layers and the pooling layer.
  • The pooling layer can be regarded as a special convolution process.
  • The convolution and pooling layers greatly simplify the model complexity and reduce the model's parameters. By performing high-frequency content detection on the feature framing image through a plurality of convolution layers of the convolutional neural network algorithm, one or more candidate high-frequency content regions may be determined.
  • In this example, the high-frequency content detecting step includes: generating a receptive field from the feature framing image, the receptive field being a convolution layer covering a partial region of the feature framing image (the receptive field can be regarded as the selection range, based on visual experience, of a partial region of the picture in the neural network); and performing a convolution operation between the receptive field and the feature framing image to obtain a plurality of candidate high-frequency content regions.
  • FIG. 2A is a schematic diagram of a plurality of candidate high-frequency content regions obtained by processing a feature framing image of a network teaching video through a plurality of convolution layers.
  • the method includes: the depth of the convolution layer corresponding to the receptive field is the same as the depth of the feature framing image.
  • The depth of the receptive field is the number of primary-color channels of the feature framing image, generally the three primary colors red (R), green (G), and blue (B), so the depth of the convolution layer corresponding to the receptive field is 3.
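As a concrete illustration of convolving a depth-3 receptive field with an RGB feature framing image, the following sketch slides a kernel whose depth equals the image depth and records a response at each position. The kernel values and the toy image are invented for illustration; this is not the patent's trained network:

```python
import numpy as np

def convolve_rgb(image, kernel):
    """Slide a depth-3 receptive field (kernel) over an HxWx3 image and
    return a 2-D response map; the kernel depth must match the image depth."""
    h, w, d = image.shape
    kh, kw, kd = kernel.shape
    assert d == kd, "receptive-field depth must equal image depth"
    out = np.zeros((h - kh + 1, w - kw + 1), np.float32)
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = image[y:y + kh, x:x + kw, :]
            out[y, x] = float((patch * kernel).sum())
    return out

# A 6x6 RGB image with a bright 3x3 block; a uniform averaging kernel
# responds most strongly where the kernel fully covers the block.
img = np.zeros((6, 6, 3), np.float32)
img[1:4, 1:4, :] = 1.0
k = np.full((3, 3, 3), 1.0 / 27, np.float32)
resp = convolve_rgb(img, k)
print(np.unravel_index(resp.argmax(), resp.shape))  # (1, 1)
```

The response map is what the pooling layer of the next step would consume; a real CNN would use learned kernels and a vectorized convolution rather than this explicit double loop.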
  • the plurality of candidate high-frequency content regions may be processed by the pooling layer of the convolutional neural network algorithm to obtain the positions of the focused content region and the key content region in the feature framing image.
  • The pooling layer processing is, in effect, the process of solving for the main feature values through the pooling layer of the convolutional neural network algorithm.
  • Common pooling layer algorithms take two forms: average pooling and maximum pooling.
  • Through the pooling calculation, the high-frequency content region may be selected from the plurality of candidate high-frequency content regions.
  • FIG. 2B is a schematic diagram of the high-frequency content region obtained by pooling the plurality of candidate high-frequency content regions of a feature framing image in a network teaching video.
  • In this example, the pooling layer processing step includes: dividing the plurality of candidate high-frequency content regions into multiple sub-regions of the same size; performing an average pooling calculation on each sub-region; and, according to the average pooling results, determining a candidate high-frequency content region as a key content region and determining the position of the key content region in the feature framing image.
  • the plurality of candidate high-frequency content regions are divided into a plurality of sub-regions of the same size, and the smaller the sub-regions, the more accurate the recognition of the high-frequency content regions.
  • The average pooling calculation reduces the error caused by the increased variance of the estimate due to the limited neighborhood size, and improves recognition accuracy.
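The sub-region division and average pooling described above can be sketched as follows. The sub-region size and the decision rule (a region is "key" when any pooled value exceeds a threshold) are assumptions for illustration, since the patent does not fix them:

```python
import numpy as np

def average_pool(region, sub):
    """Split a 2-D response region into sub x sub blocks and return the
    mean of each block (the 'main feature values')."""
    h, w = region.shape
    assert h % sub == 0 and w % sub == 0, "region must tile evenly"
    blocks = region.reshape(h // sub, sub, w // sub, sub)
    return blocks.mean(axis=(1, 3))

def is_key_region(region, sub=2, threshold=0.5):
    """Assumed rule: a candidate region is 'key' if any pooled block's
    mean exceeds the threshold."""
    return bool(average_pool(region, sub).max() > threshold)

# A 4x4 candidate region whose upper-right quadrant carries strong responses.
region = np.array([[0.0, 0.1, 0.9, 1.0],
                   [0.1, 0.0, 1.0, 0.8],
                   [0.0, 0.0, 0.1, 0.0],
                   [0.1, 0.1, 0.0, 0.2]])
print(average_pool(region, 2))  # [[0.05 0.925] [0.05 0.075]]
print(is_key_region(region))    # True
```

Smaller sub-regions give a finer pooled map, matching the text's observation that smaller sub-regions make the recognition of high-frequency content regions more accurate.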
  • In this example, after the key content regions and their positions in the feature framing image are obtained, the method further includes: when there are multiple key content regions, sorting the key content regions by importance according to the main feature values calculated by average pooling. Sorting on the main feature values allows the importance ranking of the key content regions to be determined quickly.
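The importance ranking then reduces to a descending sort on the pooled main feature values; the region labels and values below are invented for illustration:

```python
# Each key content region carries the main feature value produced by
# average pooling; sorting descending gives the importance ranking.
regions = [("region_A", 0.42), ("region_B", 0.91), ("region_C", 0.67)]
ranked = sorted(regions, key=lambda r: r[1], reverse=True)
print([name for name, _ in ranked])  # ['region_B', 'region_C', 'region_A']
```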
  • In this example, determining the location of the key content area in the feature framing image includes: analyzing the grayscale distribution gradient of the key content image; performing margin recognition of the key content image according to the grayscale distribution gradient; determining the key content display area according to the image margins; and searching for the key content display area in the feature framing image. Differences in the grayscale distribution gradient of the key content image allow its margins to be determined, enabling rapid positioning of the key content image within the feature framing image.
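A minimal sketch of the gradient-based margin recognition, assuming margins are taken where the finite-difference grayscale gradient magnitude exceeds a threshold and the display area is the bounding box of those margins (both the threshold and the bounding-box rule are illustrative assumptions):

```python
import numpy as np

def margin_bbox(gray, threshold=0.5):
    """Locate a content region by its grayscale gradient: compute
    finite-difference gradients, threshold their magnitude, and return
    the bounding box (top, bottom, left, right) of the strong edges."""
    gy, gx = np.gradient(gray.astype(np.float32))
    mag = np.hypot(gx, gy)
    ys, xs = np.nonzero(mag > threshold)
    if ys.size == 0:
        return None
    return int(ys.min()), int(ys.max()), int(xs.min()), int(xs.max())

# A bright 4x4 block inside a dark 8x8 frame: the edges of the block
# carry the strong gradients, so the bounding box hugs the block.
gray = np.zeros((8, 8), np.float32)
gray[2:6, 2:6] = 1.0
print(margin_bbox(gray))  # (2, 5, 2, 5)
```

On real frames the grayscale image would come from the feature framing image, and the threshold would be set relative to the frame's gradient statistics rather than fixed.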
  • a key network teaching video signal including high frequency content may be generated according to the position of the key content area in the image, and the display interface of the terminal device is called to display and output the key network teaching video signal.
  • FIG. 3A shows the teaching picture of a network teaching video signal before key content area recognition. After calculation by the convolutional neural network algorithm, the key content area in the feature framing image of the teaching picture is determined and located, and a teaching signal corresponding to the key content area is generated; FIG. 3B shows the teaching picture of the key content area after the key content area of the network teaching video signal has been recognized.
  • In this example, the method further includes: setting a switching button in the output page of the network teaching video signal and the key network teaching video signal; and, upon receiving a switching instruction sent by the user by triggering the switching button, switching the key network teaching video signal or network teaching video signal currently displayed by the terminal device to the other signal.
  • A switching button may further be disposed in the teaching screen. When a key teaching content area is recognized, the switching button is displayed to notify the user that a key teaching content area exists, so that the user can choose whether to perform a switching operation.
  • In this example, the method further includes: when it is detected that the terminal device has an associated device, acquiring the device priorities of the terminal device and the associated device; and displaying the network teaching video signal and the key network teaching video signal according to the device priorities and user instructions.
  • When the network teaching user has multiple associated teaching video devices, the plurality of network teaching pictures can be displayed simultaneously according to the importance level of the key network teaching video signals, which improves the user's teaching experience.
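The patent does not define how the signals are dispatched across associated devices; one plausible reading, with hypothetical names, pairs higher-priority devices with more important signals:

```python
def assign_signals(devices, signals):
    """Pair the highest-priority devices with the most important signals.

    devices: list of (name, priority), higher priority = preferred screen;
    signals: list of (label, importance), higher = more important.
    """
    devs = sorted(devices, key=lambda d: d[1], reverse=True)
    sigs = sorted(signals, key=lambda s: s[1], reverse=True)
    return [(d[0], s[0]) for d, s in zip(devs, sigs)]

devices = [("tablet", 1), ("smart_tv", 3), ("phone", 2)]
signals = [("full_lecture", 1), ("key_region_A", 3), ("key_region_B", 2)]
print(assign_signals(devices, signals))
# [('smart_tv', 'key_region_A'), ('phone', 'key_region_B'), ('tablet', 'full_lecture')]
```

A user instruction, as mentioned in the text, would simply override individual pairings produced by this default mapping.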
  • Referring to FIG. 4, the convolutional neural network-based network teaching apparatus 400 may include a framing image generation module 410, a high-frequency content detection module 420, a pooling layer processing module 430, and a key video signal generation module 440, wherein:
  • the framing image generating module 410 is configured to analyze the network teaching video signal to generate a feature framing image
  • the high-frequency content detecting module 420 is configured to perform high-frequency content detection on the feature framing image by using a plurality of convolution layers of the convolutional neural network algorithm, and determine a plurality of candidate high-frequency content regions that satisfy a preset frequency condition;
  • the pooling layer processing module 430 is configured to process the plurality of candidate high-frequency content regions through a pooling layer of the convolutional neural network algorithm to obtain the key content region and the position of the key content region in the feature framing image;
  • the key video signal generating module 440 is configured to generate a key network teaching video signal including high frequency content according to the position of the key content area in the image, and call the display interface of the terminal device to display and output the key network teaching video signal.
  • Although several modules or units of the convolutional neural network-based network teaching apparatus 400 are mentioned in the above detailed description, such division is not mandatory. Indeed, in accordance with embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided into multiple modules or units.
  • an electronic device capable of implementing the above method is also provided.
  • Aspects of the present invention can be implemented as a system, method, or program product. Accordingly, aspects of the present invention may be embodied in the form of a complete hardware embodiment, a complete software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may be collectively referred to herein as a "circuit," "module," or "system."
  • An electronic device 500 in accordance with such an embodiment of the present invention is described below with reference to FIG. 5. The electronic device 500 shown in FIG. 5 is merely an example and should not impose any limitation on the function and scope of use of the embodiments of the present invention.
  • electronic device 500 is embodied in the form of a general purpose computing device.
  • The components of the electronic device 500 may include, but are not limited to: at least one processing unit 510, at least one storage unit 520, a bus 530 connecting the different system components (including the storage unit 520 and the processing unit 510), and a display unit 540.
  • The storage unit stores program code, which can be executed by the processing unit 510, such that the processing unit 510 performs the steps of the various exemplary embodiments according to the present invention described in the "Exemplary Method" section of this specification.
  • the processing unit 510 can perform steps S110 to S140 as shown in FIG. 1.
  • the storage unit 520 can include a readable medium in the form of a volatile storage unit, such as a random access storage unit (RAM) 5201 and/or a cache storage unit 5202, and can further include a read only storage unit (ROM) 5203.
  • the storage unit 520 can also include a program/utility 5204 having a set (at least one) of the program modules 5205, such as but not limited to: an operating system, one or more applications, other program modules, and program data, Implementations of the network environment may be included in each or some of these examples.
  • Bus 530 may represent one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
  • The electronic device 500 can also communicate with one or more external devices 570 (e.g., a keyboard, pointing device, Bluetooth device, etc.), with one or more devices that enable the user to interact with the electronic device 500, and/or with any device (e.g., router, modem, etc.) that enables the electronic device 500 to communicate with one or more other computing devices. This communication can take place via an input/output (I/O) interface 550. The electronic device 500 can also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via the network adapter 560. As shown, the network adapter 560 communicates with the other modules of the electronic device 500 via the bus 530.
  • the exemplary embodiments described herein may be implemented by software, or may be implemented by software in combination with necessary hardware. Therefore, the technical solution according to an embodiment of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a mobile hard disk, etc.) or on a network.
  • a number of instructions are included to cause a computing device (which may be a personal computer, server, terminal device, or network device, etc.) to perform a method in accordance with an embodiment of the present disclosure.
  • a computer readable storage medium having stored thereon a program product capable of implementing the above method of the present specification.
  • Aspects of the present invention may also be embodied in the form of a program product comprising program code; when the program product runs on a terminal device, the program code causes the terminal device to perform the steps according to the various exemplary embodiments of the present invention described in the "Exemplary Method" section of this specification.
  • Referring to FIG. 6, a program product 600 for implementing the above method according to an embodiment of the present invention is illustrated; it may employ a portable compact disk read-only memory (CD-ROM), includes program code, and may run on a terminal device.
  • the program product of the present invention is not limited thereto, and in the present document, the readable storage medium may be any tangible medium containing or storing a program that can be used by or in connection with an instruction execution system, apparatus or device.
  • the program product can employ any combination of one or more readable media.
  • the readable medium can be a readable signal medium or a readable storage medium.
  • The readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • the computer readable signal medium may include a data signal that is propagated in the baseband or as part of a carrier, carrying readable program code. Such propagated data signals can take a variety of forms including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the readable signal medium can also be any readable medium other than a readable storage medium that can transmit, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a readable medium can be transmitted using any suitable medium, including but not limited to wireless, wireline, optical cable, RF, etc., or any suitable combination of the foregoing.
  • Program code for performing the operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • The program code can execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
  • The remote computing device can be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computing device (for example, connected through the Internet using an Internet service provider).
  • On the one hand, the convolutional neural network automatically searches for the key content of the teaching video, which reduces the manual searching and positioning of key content in actual teaching scenarios, improves teaching quality, and saves personnel costs;
  • on the other hand, the key content of multiple network teaching videos is sorted by importance and displayed on the user's display device, so that the user can selectively view several key contents at the same time, which improves the user experience.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Image Analysis (AREA)

Abstract

A convolutional neural network-based network teaching method, apparatus, and electronic device. The method includes: analyzing a network teaching video signal to generate a feature framing image (S110); performing high-frequency content detection on the feature framing image through a plurality of convolution layers of a convolutional neural network algorithm, and determining a plurality of candidate high-frequency content regions that satisfy a preset frequency condition (S120); processing the plurality of candidate high-frequency content regions through a pooling layer of the convolutional neural network algorithm to obtain a key content region and the position of the key content region in the feature framing image (S130); and generating, according to the position of the key content region in the image, a key network teaching video signal containing the high-frequency content, and displaying and outputting it (S140). The method can automatically analyze and recognize a network teaching video signal and generate a network teaching video signal containing the key content.

Description

Convolutional neural network-based network teaching method and apparatus

Technical Field
The present disclosure relates to the field of computer technology, and in particular to a convolutional neural network-based network teaching method, apparatus, electronic device, and computer-readable storage medium.
Background
Network teaching is a teaching model that, under the guidance of teaching theory and thought, applies multimedia and network technology to achieve teaching objectives through multilateral, multi-directional interaction among teachers, students, and media, and through the collection, transmission, processing, and sharing of teaching information across multiple media. It has the advantages of openness, interactivity, and sharing, breaks the time and space limitations of traditional teaching, and is conducive to promoting research-oriented learning.
However, because network teaching content can only be displayed on the user's display device, the teaching scene cannot be fully reproduced. Constrained by the conditions of the specific display device, the user cannot select and watch the key content area of the network video picture to be learned, or can only watch the key-content display area designated by the video source signal.
In the prior art, CN201610235737 discloses a method and device for recognizing text documents: a plurality of layout elements are determined from the extracted original document content, each layout element is mapped to a corresponding preset label, and the original document content is displayed according to the preset labels. This method recognizes key content in a document by establishing labels and is not an intelligent image-processing algorithm. CN201710250098 discloses a video content description method using a spatio-temporal attention model: temporal and spatial attention models identify the key regions of focus in each video frame. This method recognizes key regions from the dynamic effect of multiple superimposed images through a convolutional neural network and cannot recognize key regions by analyzing only a single key-frame picture. CN201711049706 discloses a block content classification method based on a convolutional neural network: training samples are converted to grayscale images and a last-bit convolutional neural network model is built to classify image content; it cannot recognize key content.
Therefore, one or more technical solutions that can at least solve the above problems are needed.
It should be noted that the information disclosed in the above Background section is only intended to enhance the understanding of the background of the present disclosure, and therefore may include information that does not constitute prior art known to those of ordinary skill in the art.
Summary

An object of the present disclosure is to provide a network teaching method, apparatus, electronic device, and computer-readable storage medium based on a convolutional neural network, thereby overcoming, at least to some extent, one or more problems caused by the limitations and defects of the related art.

According to one aspect of the present disclosure, a network teaching method based on a convolutional neural network is provided, comprising:

a frame image generation step of analyzing a network teaching video signal to generate a feature frame image;

a high-frequency content detection step of performing high-frequency content detection on the feature frame image through multiple convolutional layers of a convolutional neural network algorithm to determine multiple candidate high-frequency content regions that satisfy a preset frequency condition;

a pooling layer processing step of processing the multiple candidate high-frequency content regions through a pooling layer of the convolutional neural network algorithm to obtain a key content region and the position of the key content region in the feature frame image;

a key video signal generation step of generating, according to the position of the key content region in the image, a key network teaching video signal containing the high-frequency content, and calling a display interface of a terminal device to display and output the key network teaching video signal.
In an exemplary embodiment of the present disclosure, the high-frequency content detection step comprises:

generating a receptive field from the feature frame image, the receptive field being a convolutional layer containing a partial region of the feature frame image;

performing a convolution operation on the receptive field and the feature frame image to obtain multiple candidate high-frequency content regions.

In an exemplary embodiment of the present disclosure, the method comprises:

the depth of the convolutional layer corresponding to the receptive field being the same as the depth of the feature frame image.

In an exemplary embodiment of the present disclosure, the pooling layer processing step comprises:

dividing the multiple candidate high-frequency content regions into multiple sub-regions of equal size;

performing an average pooling calculation on each sub-region;

when it is determined from the average pooling result that a candidate high-frequency content region contains high-frequency content, determining the candidate high-frequency content region as a key content region and determining the position of the key content region in the feature frame image.

In an exemplary embodiment of the present disclosure, determining the position of the key content region in the feature frame image comprises:

analyzing the gray-level distribution gradient of the key content image and recognizing the edges of the key content image according to the gray-level distribution gradient;

determining a key content display region according to the edges of the key content image;

searching the feature frame image to locate the key content display region.

In an exemplary embodiment of the present disclosure, after the key content region and the position of the key content region in the feature frame image are obtained, the method further comprises:

when there are multiple key content regions, ranking the key content regions by importance according to the main feature values obtained from the average pooling calculation.

In an exemplary embodiment of the present disclosure, the method further comprises:

providing a switch button on the output page of the network teaching video signal and the key network teaching video signal;

upon receiving a switch instruction sent by the user by triggering the switch button, switching the terminal device from displaying the key network teaching video signal to displaying the network teaching video signal, or vice versa.

In an exemplary embodiment of the present disclosure, the method further comprises:

when it is detected that the terminal device has an associated device, acquiring the device priorities of the terminal device and the associated device;

displaying the network teaching video signal and the key network teaching video signal according to the device priorities and user instructions.

In an exemplary embodiment of the present disclosure, the feature frame image is a switch frame.
According to one aspect of the present disclosure, a network teaching apparatus based on a convolutional neural network is provided, comprising:

a frame image generation module configured to analyze a network teaching video signal and generate a feature frame image;

a high-frequency content detection module configured to perform high-frequency content detection on the feature frame image through multiple convolutional layers of a convolutional neural network algorithm to determine multiple candidate high-frequency content regions that satisfy a preset frequency condition;

a pooling layer processing module configured to process the multiple candidate high-frequency content regions through a pooling layer of the convolutional neural network algorithm to obtain a key content region and the position of the key content region in the feature frame image;

a key video signal generation module configured to generate, according to the position of the key content region in the image, a key network teaching video signal containing the high-frequency content, and to call a display interface of a terminal device to display and output the key network teaching video signal.

According to one aspect of the present disclosure, an electronic device is provided, comprising:

a processor; and

a memory storing computer-readable instructions which, when executed by the processor, implement the method according to any one of the above.

According to one aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program implements the method according to any one of the above.

In the convolutional-neural-network-based network teaching method of the exemplary embodiments of the present disclosure, a network teaching video signal is analyzed to generate a feature frame image; high-frequency content detection is performed on the feature frame image through multiple convolutional layers of a convolutional neural network algorithm to determine multiple candidate high-frequency content regions that satisfy a preset frequency condition; the multiple candidate regions are processed through a pooling layer of the algorithm to obtain a key content region and its position in the feature frame image; and, according to that position, a key network teaching video signal containing the high-frequency content is generated and output through the display interface of a terminal device. On the one hand, because the convolutional neural network automatically finds the key content of the teaching video, manual searching and locating of key content in actual teaching scenarios is reduced, improving teaching quality while saving personnel costs; on the other hand, the key content of multiple network teaching videos is ranked by importance and shown on the user's display device, so the user can selectively watch multiple pieces of key content at the same time, improving the user experience.

It should be understood that the above general description and the following detailed description are merely exemplary and explanatory and do not limit the present disclosure.
Brief Description of the Drawings

The above and other features and advantages of the present disclosure will become more apparent from the detailed description of its exemplary embodiments with reference to the accompanying drawings.

FIG. 1 shows a flowchart of a network teaching method based on a convolutional neural network according to an exemplary embodiment of the present disclosure;

FIGS. 2A-2B show schematic diagrams of an application scenario of the network teaching method based on a convolutional neural network according to an exemplary embodiment of the present disclosure;

FIGS. 3A-3B show schematic diagrams of an application scenario of the network teaching method based on a convolutional neural network according to an exemplary embodiment of the present disclosure;

FIG. 4 shows a schematic block diagram of a network teaching apparatus based on a convolutional neural network according to an exemplary embodiment of the present disclosure;

FIG. 5 schematically shows a block diagram of an electronic device according to an exemplary embodiment of the present disclosure; and

FIG. 6 schematically shows a diagram of a computer-readable storage medium according to an exemplary embodiment of the present disclosure.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. However, example embodiments can be implemented in many forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the concepts of the example embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar parts, and their repeated description will be omitted.

Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a full understanding of the embodiments of the present disclosure. However, those skilled in the art will appreciate that the technical solutions of the present disclosure may be practiced without one or more of the specific details, or that other methods, components, materials, apparatuses, steps, and so on may be employed. In other instances, well-known structures, methods, apparatuses, implementations, materials, or operations are not shown or described in detail to avoid obscuring aspects of the present disclosure.

The block diagrams shown in the drawings are merely functional entities and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in software, or in one or more modules of hardened software, or in different networks and/or processor apparatuses and/or microcontroller apparatuses.
In this example embodiment, a network teaching method based on a convolutional neural network is first provided, which can be applied to electronic devices such as computers. Referring to FIG. 1, the method may include the following steps:

a frame image generation step S110 of analyzing a network teaching video signal to generate a feature frame image;

a high-frequency content detection step S120 of performing high-frequency content detection on the feature frame image through multiple convolutional layers of a convolutional neural network algorithm to determine multiple candidate high-frequency content regions that satisfy a preset frequency condition;

a pooling layer processing step S130 of processing the multiple candidate high-frequency content regions through a pooling layer of the convolutional neural network algorithm to obtain a key content region and the position of the key content region in the feature frame image;

a key video signal generation step S140 of generating, according to the position of the key content region in the image, a key network teaching video signal containing the high-frequency content, and calling a display interface of a terminal device to display and output the key network teaching video signal.

According to the network teaching method of this example embodiment, on the one hand, because the convolutional neural network automatically finds the key content of the teaching video, manual searching and locating of key content in actual teaching scenarios is reduced, improving teaching quality while saving personnel costs; on the other hand, the key content of multiple network teaching videos is ranked by importance and shown on the user's display device, so the user can selectively watch multiple pieces of key content at the same time, improving the user experience.

The network teaching method based on a convolutional neural network in this example embodiment is further described below.
In the frame image generation step S110, a network teaching video signal may be analyzed to generate a feature frame image.

In this example embodiment, because of the limitations of the devices used to watch network teaching, the key region of the teaching content often needs to be locally enlarged for focused study. In existing practice, the key region is usually selected and designated manually and then enlarged for display to the user; such an approach cannot intelligently and automatically identify and display key teaching content. In the present method, the network teaching video signal is first analyzed and feature frame images are selected from the signal as the image source for key-content selection; a convolutional neural network algorithm is then used to intelligently identify the key content.

In this example embodiment, the feature frame image is a switch frame. In an actual video signal, a switch frame is an important data switch point, representing the initial picture at a change in the content of the video signal. Using the switch frames of the network teaching video signal as the feature frame images reduces the amount of computation and increases the selection speed while preserving selection accuracy.
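As an illustration, the selection of switch frames as feature frame images can be sketched with simple frame differencing: a frame whose mean absolute difference from the previous frame exceeds a threshold is treated as the initial picture of a content change. The function name, the threshold value, and the differencing criterion are illustrative assumptions, not part of the disclosure.

```python
import numpy as np

def select_switch_frames(frames, threshold=0.2):
    """Flag frames where the mean absolute difference from the
    previous frame exceeds `threshold` (content switch points).
    `frames` is a list of 2-D grayscale arrays scaled to [0, 1]."""
    key_frames = [0]  # the first frame always opens a segment
    for i in range(1, len(frames)):
        diff = np.mean(np.abs(frames[i] - frames[i - 1]))
        if diff > threshold:
            key_frames.append(i)
    return key_frames

# two static frames followed by an abrupt content switch
a = np.zeros((4, 4)); b = np.zeros((4, 4)); c = np.ones((4, 4))
print(select_switch_frames([a, b, c]))  # the switch at index 2 is flagged
```

In a real pipeline the grayscale frames would come from decoding the network teaching video signal, and only the flagged frames would be passed on to the convolutional layers.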
In the high-frequency content detection step S120, high-frequency content detection may be performed on the feature frame image through multiple convolutional layers of the convolutional neural network algorithm to determine multiple candidate high-frequency content regions that satisfy a preset frequency condition.

In this example embodiment, a convolutional neural network differs from an ordinary neural network in that it contains a feature extractor composed of convolutional layers and sub-sampling layers. In a convolutional layer, a neuron is connected to only some of the neurons of the neighboring layer; image recognition is achieved through convolutional and pooling layers, where a pooling layer can be regarded as a special convolution process. Convolutional and pooling layers greatly simplify the model complexity and reduce the number of model parameters. Performing high-frequency content detection on the feature frame image through multiple convolutional layers of the convolutional neural network algorithm can determine one or more candidate high-frequency content regions.

In this example embodiment, the high-frequency content detection step comprises: generating a receptive field from the feature frame image, the receptive field being a convolutional layer containing a partial region of the feature frame image. The receptive field can be regarded as the visually motivated selection range over a partial region of the image in the neural network. A convolution operation is performed on the receptive field and the feature frame image to obtain multiple candidate high-frequency content regions. FIG. 2A is a schematic diagram of the multiple candidate high-frequency content regions obtained after a feature frame image of a network teaching video is processed by multiple convolutional layers.

In this example embodiment, the depth of the convolutional layer corresponding to the receptive field is the same as the depth of the feature frame image. The depth of the receptive field is in fact the number of primary colors composing the feature frame image, generally the three primary colors red (R), green (G), and blue (B), so the depth of the convolutional layer corresponding to the receptive field is 3.
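To make the depth matching concrete, here is a sketch of a "valid" convolution in which the kernel (standing in for the receptive field) has the same depth as the RGB image, so each output value summarizes all three color channels of one local region. The shapes and the loop-based implementation are illustrative only.

```python
import numpy as np

def conv2d_rgb(image, kernel):
    """Valid convolution of an (H, W, 3) image with a (k, k, 3)
    kernel; the kernel depth equals the image depth, so every
    output value aggregates all three color channels at once."""
    H, W, D = image.shape
    k = kernel.shape[0]
    assert kernel.shape == (k, k, D), "kernel depth must match image depth"
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            out[i, j] = np.sum(image[i:i + k, j:j + k, :] * kernel)
    return out

img = np.ones((5, 5, 3))   # a tiny all-ones stand-in for an RGB frame
kern = np.ones((3, 3, 3))  # a depth-3 receptive field
print(conv2d_rgb(img, kern).shape)  # (3, 3)
```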
In the pooling layer processing step S130, the multiple candidate high-frequency content regions may be processed through a pooling layer of the convolutional neural network algorithm to obtain a key content region and the position of the key content region in the feature frame image.

In this example embodiment, pooling layer processing is essentially the process of solving for the main feature values through the pooling layer of the convolutional neural network algorithm; pooling algorithms generally take two forms, average pooling and max pooling. The high-frequency content regions among the multiple candidate regions can be selected according to the pooling operation. FIG. 2B is a schematic diagram of the high-frequency content regions obtained after pooling is applied to the multiple candidate high-frequency content regions of a feature frame image of a network teaching video.

In this example embodiment, the pooling layer processing step comprises: dividing the multiple candidate high-frequency content regions into multiple sub-regions of equal size; performing an average pooling calculation on each sub-region; and, when it is determined from the average pooling result that a candidate high-frequency content region contains high-frequency content, determining that candidate region as a key content region and determining its position in the feature frame image. The smaller the sub-regions into which the candidate regions are divided, the more accurate the identification of high-frequency content regions. Compared with max pooling, average pooling reduces the error caused by the increased variance of estimates in a restricted neighborhood, improving identification accuracy.
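A minimal sketch of the average pooling described above, assuming square candidate regions split into an equal grid of sub-regions; the grid size is an illustrative parameter, not fixed by the disclosure.

```python
import numpy as np

def average_pool(region, sub=2):
    """Split a candidate region into a `sub` x `sub` grid of
    equal sub-regions and return the mean of each one."""
    h, w = region.shape
    sh, sw = h // sub, w // sub
    blocks = region[:sh * sub, :sw * sub].reshape(sub, sh, sub, sw)
    return blocks.mean(axis=(1, 3))  # one main feature value per sub-region

region = np.arange(16, dtype=float).reshape(4, 4)
print(average_pool(region))  # 2 x 2 grid of sub-region means
```

A downstream rule would then compare these pooled values against the preset condition to decide whether the candidate region contains high-frequency content.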
In this example embodiment, after the key content regions and their positions in the feature frame image are obtained, the method further comprises: when there are multiple key content regions, ranking them by importance according to the main feature values obtained from the average pooling calculation. Sorting on the main feature values allows the importance ranking of the key content regions to be determined quickly.
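The importance ranking can be sketched as a plain sort on the pooled main feature values; the region labels and values below are made up for illustration.

```python
def rank_regions(regions):
    """Sort key content regions by their pooled main feature
    value, most important first; each region is a
    (label, main_feature_value) pair."""
    return sorted(regions, key=lambda r: r[1], reverse=True)

ranked = rank_regions([("formula", 0.4), ("title", 0.9), ("diagram", 0.7)])
print(ranked)  # [('title', 0.9), ('diagram', 0.7), ('formula', 0.4)]
```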
In this example embodiment, determining the position of the key content region in the feature frame image comprises: analyzing the gray-level distribution gradient of the key content image and recognizing the edges of the key content image according to that gradient; determining the key content display region from those edges; and searching the feature frame image to locate the key content display region. Differences in the gray-level distribution gradient of the key content image make it possible to determine its edges and thereby to quickly locate the key content image within the feature frame image.
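One way to realize the gradient-based edge location, sketched under the assumption that the key content stands out as a step in gray level: rows and columns whose peak gradient magnitude exceeds a threshold are taken to bound the display region. The threshold and the use of `np.gradient` are illustrative choices, not the disclosed algorithm itself.

```python
import numpy as np

def locate_by_gradient(gray, thresh=0.4):
    """Return (top, bottom, left, right) of a key content region,
    found from the gray-level gradient: rows/columns whose peak
    gradient magnitude exceeds `thresh` bound the region."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    rows = np.where(mag.max(axis=1) > thresh)[0]
    cols = np.where(mag.max(axis=0) > thresh)[0]
    if rows.size == 0 or cols.size == 0:
        return None  # no edge strong enough
    return rows[0], rows[-1], cols[0], cols[-1]

img = np.zeros((8, 8))
img[2:6, 3:7] = 1.0  # a bright block of key content
print(locate_by_gradient(img))
```

Note that central differences smear the edge by about one pixel, so the returned box is slightly larger than the bright block itself.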
In the key video signal generation step S140, a key network teaching video signal containing the high-frequency content may be generated according to the position of the key content region in the image, and a display interface of the terminal device is called to display and output the key network teaching video signal.

In this example embodiment, after the key content region in the feature frame image is determined and located, the key content region is cut out of the network teaching video picture to generate a network teaching video signal displaying the key content region, which is sent to the user terminal for display. FIG. 3A shows the teaching picture of a network teaching video signal before key-content-region identification; after computation by the convolutional neural network algorithm, the key content region in the feature frame image of the teaching picture is determined and located, and a corresponding teaching signal for the key content region is generated. FIG. 3B shows the teaching picture displaying the key content region after identification.
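Cutting the located region out of each frame, which is essentially what generating the key network teaching video signal amounts to on a per-frame basis, can be sketched as an array slice; the inclusive box convention below is an assumption made for illustration.

```python
import numpy as np

def crop_key_region(frame, box):
    """Cut the located key content region out of a full frame;
    `box` is (top, bottom, left, right), bounds inclusive."""
    top, bottom, left, right = box
    return frame[top:bottom + 1, left:right + 1]

frame = np.arange(36).reshape(6, 6)  # stand-in for one video frame
crop = crop_key_region(frame, (1, 3, 2, 4))
print(crop.shape)  # (3, 3)
```

Applying the same crop to every frame of a segment, then re-encoding, would yield the key video signal sent to the user terminal.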
In this example embodiment, the method further comprises: providing a switch button on the output page of the network teaching video signal and the key network teaching video signal; upon receiving a switch instruction sent by the user by triggering the switch button, switching the terminal device from displaying the key network teaching video signal to displaying the network teaching video signal, or vice versa. In addition, the switch button may be shown in the teaching picture when a key teaching content region is identified, prompting the user that a key teaching content region exists and letting the user choose whether to switch.

In this example embodiment, the method further comprises: when it is detected that the terminal device has associated devices, acquiring the device priorities of the terminal device and the associated devices; and displaying the network teaching video signal and the key network teaching video signal according to the device priorities and user instructions. When a network teaching user has multiple associated teaching video devices, key network teaching video signals can be displayed simultaneously on terminal devices of corresponding priority according to their importance levels, so the user can choose to watch multiple network teaching pictures at the same time, improving the teaching experience.
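The priority-based dispatch can be sketched as mapping importance-ordered streams onto priority-ordered devices; the device names, the priority encoding (larger means higher), and the stream labels are illustrative assumptions.

```python
def assign_streams(devices, streams):
    """Map video streams to display devices: the most important
    stream goes to the highest-priority device, and so on.
    `devices` is a list of (name, priority) pairs; `streams` is
    ordered most important first."""
    ordered = sorted(devices, key=lambda d: d[1], reverse=True)
    return {name: streams[i]
            for i, (name, _) in enumerate(ordered)
            if i < len(streams)}

plan = assign_streams([("tablet", 1), ("tv", 2)], ["key", "full"])
print(plan)  # {'tv': 'key', 'tablet': 'full'}
```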
It should be noted that although the steps of the method of the present disclosure are depicted in a particular order in the drawings, this does not require or imply that the steps must be performed in that order, or that all of the illustrated steps must be performed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one, and/or one step may be decomposed into multiple steps.

Furthermore, this example embodiment also provides a network teaching apparatus based on a convolutional neural network. Referring to FIG. 4, the network teaching apparatus 400 may include a frame image generation module 410, a high-frequency content detection module 420, a pooling layer processing module 430, and a key video signal generation module 440, wherein:

the frame image generation module 410 is configured to analyze a network teaching video signal and generate a feature frame image;

the high-frequency content detection module 420 is configured to perform high-frequency content detection on the feature frame image through multiple convolutional layers of a convolutional neural network algorithm to determine multiple candidate high-frequency content regions that satisfy a preset frequency condition;

the pooling layer processing module 430 is configured to process the multiple candidate high-frequency content regions through a pooling layer of the convolutional neural network algorithm to obtain a key content region and the position of the key content region in the feature frame image;

the key video signal generation module 440 is configured to generate, according to the position of the key content region in the image, a key network teaching video signal containing the high-frequency content, and to call a display interface of a terminal device to display and output the key network teaching video signal.

The specific details of the modules of the above network teaching apparatus have already been described in detail in the corresponding network teaching method, and are therefore not repeated here.

It should be noted that although several modules or units of the network teaching apparatus 400 are mentioned in the above detailed description, this division is not mandatory. In fact, according to embodiments of the present disclosure, the features and functions of two or more of the modules or units described above may be embodied in a single module or unit; conversely, the features and functions of one module or unit described above may be further divided among multiple modules or units.
Furthermore, in an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.

Those skilled in the art will appreciate that various aspects of the present invention may be implemented as a system, a method, or a program product. Therefore, various aspects of the present invention may take the form of a complete hardware embodiment, a complete software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software, which may collectively be referred to herein as a "circuit", "module", or "system".

An electronic device 500 according to such an embodiment of the present invention is described below with reference to FIG. 5. The electronic device 500 shown in FIG. 5 is merely an example and should not impose any limitation on the functions or scope of use of embodiments of the present invention.

As shown in FIG. 5, the electronic device 500 takes the form of a general-purpose computing device. The components of the electronic device 500 may include, but are not limited to: at least one processing unit 510, at least one storage unit 520, a bus 530 connecting different system components (including the storage unit 520 and the processing unit 510), and a display unit 540.

The storage unit stores program code that can be executed by the processing unit 510, causing the processing unit 510 to perform the steps according to various exemplary embodiments of the present invention described in the "Exemplary Method" section of this specification. For example, the processing unit 510 may perform steps S110 to S140 shown in FIG. 1.

The storage unit 520 may include readable media in the form of volatile storage units, such as a random access memory (RAM) 5201 and/or a cache 5202, and may further include a read-only memory (ROM) 5203.

The storage unit 520 may also include a program/utility 5204 having a set of (at least one) program modules 5205, such program modules 5205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment.

The bus 530 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.

The electronic device 500 may also communicate with one or more external devices 570 (such as a keyboard, pointing device, or Bluetooth device), with one or more devices that enable a user to interact with the electronic device 500, and/or with any device (such as a router or modem) that enables the electronic device 500 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 550. Moreover, the electronic device 500 may communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 560. As shown, the network adapter 560 communicates with the other modules of the electronic device 500 through the bus 530. It should be understood that, although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 500, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
From the above description of the embodiments, those skilled in the art will readily understand that the example embodiments described here may be implemented in software, or in software combined with the necessary hardware. Therefore, the technical solutions according to embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and includes several instructions that cause a computing device (which may be a personal computer, a server, a terminal apparatus, a network device, etc.) to perform the method according to embodiments of the present disclosure.

In an exemplary embodiment of the present disclosure, a computer-readable storage medium is also provided, on which a program product capable of implementing the above method of this specification is stored. In some possible embodiments, various aspects of the present invention may also be implemented in the form of a program product comprising program code; when the program product runs on a terminal device, the program code causes the terminal device to perform the steps according to various exemplary embodiments of the present invention described in the "Exemplary Method" section of this specification.

Referring to FIG. 6, a program product 600 for implementing the above method according to an embodiment of the present invention is described, which may take the form of a portable compact disc read-only memory (CD-ROM), includes program code, and can run on a terminal device such as a personal computer. However, the program product of the present invention is not limited thereto; in this document, a readable storage medium may be any tangible medium that contains or stores a program that can be used by, or in conjunction with, an instruction execution system, apparatus, or device.

The program product may employ any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.

A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transmit a program for use by, or in conjunction with, an instruction execution system, apparatus, or device.

The program code contained on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wired, optical cable, RF, and the like, or any suitable combination of the above.

Program code for carrying out the operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, via the Internet using an Internet service provider).

Furthermore, the above drawings are merely schematic illustrations of the processing included in the methods according to exemplary embodiments of the present invention and are not intended to be limiting. It is easy to understand that the processing shown in the above drawings does not indicate or limit the chronological order of these processes; it is also easy to understand that these processes may be performed, for example, synchronously or asynchronously in multiple modules.

Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed here. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common general knowledge or customary technical means in the art not disclosed by the present disclosure. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the claims.

It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Industrial Applicability

On the one hand, because the convolutional neural network automatically finds the key content of the teaching video, manual searching and locating of key content in actual teaching scenarios is reduced, improving teaching quality while saving personnel costs; on the other hand, the key content of multiple network teaching videos is ranked by importance and shown on the user's display device, so the user can selectively watch multiple pieces of key content at the same time, improving the user experience.

Claims (12)

  1. A network teaching method based on a convolutional neural network, comprising:
    a frame image generation step of analyzing a network teaching video signal to generate a feature frame image;
    a high-frequency content detection step of performing high-frequency content detection on the feature frame image through multiple convolutional layers of a convolutional neural network algorithm to determine multiple candidate high-frequency content regions that satisfy a preset frequency condition;
    a pooling layer processing step of processing the multiple candidate high-frequency content regions through a pooling layer of the convolutional neural network algorithm to obtain a key content region and the position of the key content region in the feature frame image;
    a key video signal generation step of generating, according to the position of the key content region in the image, a key network teaching video signal containing the high-frequency content, and calling a display interface of a terminal device to display and output the key network teaching video signal.
  2. The method according to claim 1, wherein the high-frequency content detection step comprises:
    generating a receptive field from the feature frame image, the receptive field being a convolutional layer containing a partial region of the feature frame image;
    performing a convolution operation on the receptive field and the feature frame image to obtain the multiple candidate high-frequency content regions.
  3. The method according to claim 2, wherein:
    the depth of the convolutional layer corresponding to the receptive field is the same as the depth of the feature frame image.
  4. The method according to claim 1, wherein the pooling layer processing step comprises:
    dividing the multiple candidate high-frequency content regions into multiple sub-regions of equal size;
    performing an average pooling calculation on each sub-region;
    when it is determined from the average pooling result that a candidate high-frequency content region contains high-frequency content, determining the candidate high-frequency content region as a key content region and determining the position of the key content region in the feature frame image.
  5. The method according to claim 4, wherein determining the position of the key content region in the feature frame image comprises:
    analyzing the gray-level distribution gradient of the key content image and recognizing the edges of the key content image according to the gray-level distribution gradient;
    determining a key content display region according to the edges of the key content image;
    searching the feature frame image to locate the key content display region.
  6. The method according to claim 4, wherein, after the key content region and the position of the key content region in the feature frame image are obtained, the method further comprises:
    when there are multiple key content regions, ranking the key content regions by importance according to the main feature values obtained from the average pooling calculation.
  7. The method according to claim 1, further comprising:
    providing a switch button on the output page of the network teaching video signal and the key network teaching video signal;
    upon receiving a switch instruction sent by the user by triggering the switch button, switching the terminal device from displaying the key network teaching video signal to displaying the network teaching video signal, or from displaying the network teaching video signal to displaying the key network teaching video signal.
  8. The method according to claim 1, further comprising:
    when it is detected that the terminal device has an associated device, acquiring the device priorities of the terminal device and the associated device;
    displaying the network teaching video signal and the key network teaching video signal according to the device priorities and user instructions.
  9. The method according to claim 1, wherein the feature frame image is a switch frame.
  10. A network teaching apparatus based on a convolutional neural network, the apparatus comprising:
    a frame image generation module configured to analyze a network teaching video signal and generate a feature frame image;
    a high-frequency content detection module configured to perform high-frequency content detection on the feature frame image through multiple convolutional layers of a convolutional neural network algorithm to determine multiple candidate high-frequency content regions that satisfy a preset frequency condition;
    a pooling layer processing module configured to process the multiple candidate high-frequency content regions through a pooling layer of the convolutional neural network algorithm to obtain a key content region and the position of the key content region in the feature frame image;
    a key video signal generation module configured to generate, according to the position of the key content region in the image, a key network teaching video signal containing the high-frequency content, and to call a display interface of a terminal device to display and output the key network teaching video signal.
  11. An electronic device, comprising:
    a processor; and
    a memory storing computer-readable instructions which, when executed by the processor, implement the method according to any one of claims 1 to 9.
  12. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 9.
PCT/CN2018/092784 2018-05-11 2018-06-26 Network teaching method and apparatus based on convolutional neural network WO2019214019A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810447977.6A CN108665769B (zh) 2018-05-11 2018-05-11 基于卷积神经网络的网络教学方法以及装置
CN201810447977.6 2018-05-11

Publications (1)

Publication Number Publication Date
WO2019214019A1 true WO2019214019A1 (zh) 2019-11-14

Family

ID=63779079

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/092784 WO2019214019A1 (zh) 2018-05-11 2018-06-26 基于卷积神经网络的网络教学方法以及装置

Country Status (2)

Country Link
CN (1) CN108665769B (zh)
WO (1) WO2019214019A1 (zh)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110182469A1 (en) * 2010-01-28 2011-07-28 Nec Laboratories America, Inc. 3d convolutional neural networks for automatic human action recognition
US20160221190A1 (en) * 2015-01-29 2016-08-04 Yiannis Aloimonos Learning manipulation actions from unconstrained videos
CN106897714A (zh) * 2017-03-23 2017-06-27 北京大学深圳研究生院 一种基于卷积神经网络的视频动作检测方法
CN107633238A (zh) * 2017-10-12 2018-01-26 深圳市信海通科技有限公司 一种视频分析方法以及智能分析服务器
CN107909556A (zh) * 2017-11-27 2018-04-13 天津大学 基于卷积神经网络的视频图像去雨方法

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102523536B (zh) * 2011-12-15 2014-04-02 清华大学 视频语义可视化方法
US10572735B2 (en) * 2015-03-31 2020-02-25 Beijing Shunyuan Kaihua Technology Limited Detect sports video highlights for mobile computing devices
CN106161873A (zh) * 2015-04-28 2016-11-23 天脉聚源(北京)科技有限公司 一种视频信息提取推送方法及系统
KR101777242B1 (ko) * 2015-09-08 2017-09-11 네이버 주식회사 동영상 컨텐츠의 하이라이트 영상을 추출하여 제공하는 방법과 시스템 및 기록 매체
US20170109584A1 (en) * 2015-10-20 2017-04-20 Microsoft Technology Licensing, Llc Video Highlight Detection with Pairwise Deep Ranking
CN105930402A (zh) * 2016-04-15 2016-09-07 乐视控股(北京)有限公司 基于卷积神经网络的视频检索方法及系统
CN106095804B (zh) * 2016-05-30 2019-08-20 维沃移动通信有限公司 一种视频片段的处理方法、定位方法及终端
CN106503693B (zh) * 2016-11-28 2019-03-15 北京字节跳动科技有限公司 视频封面的提供方法及装置
CN106686377B (zh) * 2016-12-30 2018-09-04 佳都新太科技股份有限公司 一种基于深层神经网络的视频重点区域确定方法
CN107066973B (zh) * 2017-04-17 2020-07-21 杭州电子科技大学 一种利用时空注意力模型的视频内容描述方法
CN107480665B (zh) * 2017-08-09 2020-08-11 北京小米移动软件有限公司 文字检测方法、装置及计算机可读存储介质
CN107562723A (zh) * 2017-08-24 2018-01-09 网易乐得科技有限公司 会议处理方法、介质、装置和计算设备


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112201116A (zh) * 2020-09-29 2021-01-08 深圳市优必选科技股份有限公司 一种逻辑板识别方法、装置及终端设备
CN114466240A (zh) * 2022-01-27 2022-05-10 北京精鸿软件科技有限公司 视频处理方法、装置、介质及电子设备
CN115641763A (zh) * 2022-09-12 2023-01-24 中南迅智科技有限公司 一种记忆背诵辅助系统
CN115641763B (zh) * 2022-09-12 2023-12-19 中南迅智科技有限公司 一种记忆背诵辅助系统

Also Published As

Publication number Publication date
CN108665769A (zh) 2018-10-16
CN108665769B (zh) 2021-04-06


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18917591

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 26/03/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18917591

Country of ref document: EP

Kind code of ref document: A1