CN115460379A - Teaching recording and broadcasting directing system and method based on HiSilicon embedded platform - Google Patents


Info

Publication number
CN115460379A
Authority
CN
China
Prior art keywords
video
teacher
teaching
real
image data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211078352.XA
Other languages
Chinese (zh)
Inventor
张俊华
黄可征
王鹏
袁庆丰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Yizhi Network Space Technology Innovation Research Institute Co ltd
Original Assignee
Nanjing Yizhi Network Space Technology Innovation Research Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Yizhi Network Space Technology Innovation Research Institute Co ltd filed Critical Nanjing Yizhi Network Space Technology Innovation Research Institute Co ltd
Priority to CN202211078352.XA priority Critical patent/CN115460379A/en
Publication of CN115460379A publication Critical patent/CN115460379A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/18Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • H04N7/181Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast for receiving images from a plurality of remote sources
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440263Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by altering the spatial resolution, e.g. for displaying on a connected PDA
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a teaching recording and broadcasting directing system based on the HiSilicon embedded platform, comprising a student camera, a teacher camera, a video input module, a video processing subsystem, a first channel, a second channel, a third channel, a neural network inference engine module, a video graphics subsystem and a video output module; the student camera and the teacher camera feed real-time image data of the student seat area and the teacher teaching area to the video input module. If a target behavior object is present in the real-time thumbnail image data of the student seat area or the teacher teaching area, the video graphics subsystem computes the minimum common circumscribed rectangle of all target frames, crops that rectangle from the second channel's full-resolution real-time image data of the corresponding area, enlarges the crop to the original image resolution, and sends it to the video output module. The invention has the advantages of strong robustness, wide application range, high detection speed, high precision, simple installation and deployment, and low system cost.

Description

Teaching recording and broadcasting directing system and method based on HiSilicon embedded platform
Technical Field
The invention belongs to the technical field of teaching recording and broadcasting, and particularly relates to a teaching recording and broadcasting directing system and method based on the HiSilicon embedded platform.
Background
With the development of science and technology, and supported by a variety of hardware technologies, mobile learning formats such as flipped classrooms, micro-lectures and MOOCs have attracted great attention, and building these resources essentially depends on teaching recording and broadcasting directing systems.
The earliest teaching recording and directing systems relied on manual direction, which required a dedicated operator for every classroom and was both costly and labor-intensive. Systems based on infrared sensors and pressure sensors followed, but they need a large number of sensors installed in each classroom, making them expensive and failure-prone. Most directing systems currently on the market track teachers and students exhibiting specific behaviors with traditional image-processing techniques; these suffer from weak robustness and low accuracy, and are strongly affected by the ages of the students and the lighting conditions of the classroom. A small number of directing systems locate teachers and students with specific behaviors using lidar and similar technologies, but these require additional lidar equipment, raising system cost, demanding high installation precision and per-classroom calibration, and failing to detect small-amplitude student motions.
The invention with publication number CN113688680A discloses an intelligent identification and tracking system that captures image information of a designated scene through a camera, receives a person selection from the user, identifies the person target selected in the frame, processes subsequent video streams frame by frame, and controls a motor to rotate the camera according to offset information so that the tracked person always stays at the center of the picture. However, that invention must also control the camera's rotation direction, and when multiple spatially dispersed targets appear in the video, multiple cameras have to be deployed.
The invention with application number 202011120026.1 relates to a method and device for transmitting streaming-media data, a storage medium and an electronic device. The method comprises the following steps: acquiring streaming-media data collected by an image acquisition device; when a candidate image containing a face region is detected in the streaming-media data, cropping the candidate image to a target size corresponding to the acquisition resolution of the device, obtaining a target image that contains the face region; and transmitting the target streaming-media data formed by the cropped target images to a background server. That invention addresses the low transmission efficiency of streaming-media data in the related art, but it cannot clearly display images from different sources under different circumstances.
The invention with publication number CN101252687A discloses a method for multi-channel joint region-of-interest video coding and transmission. In step one, the high-resolution panoramic video captured by a panoramic camera is spatially down-sampled into a low-resolution video, which is then encoded. In step two, regions of interest are detected in the high-definition video captured by a visible-light camera, and the two video paths are adaptively switched, cropped and down-sampled according to the areas and positions of the regions of interest. In step three, an infrared thermal imager detects and tracks the target of interest, the original low-resolution infrared region-of-interest video is encoded, and quantization parameters are adjusted for rate control. In step four, priorities are assigned to the three video paths, unequal-protection channel coding is applied according to priority, the streams are multiplexed into one code stream sent into the channel, and channel bandwidth is allocated according to priority. The method achieves accurate detection and high-quality coding of the region of interest while still monitoring the whole scene, but it requires three cameras for the same target (at least six cameras for a teaching scene), so the hardware cost is high.
Therefore, against the background of the wide deployment of teaching recording and directing systems, a directing system that is low in cost, high in detection precision, strong in robustness and simple to install and deploy is urgently needed.
Disclosure of Invention
The technical problem to be solved is as follows: to overcome the high cost, low detection precision and poor robustness of existing teaching recording and directing systems, the invention provides a teaching recording and broadcasting directing system and method based on the HiSilicon embedded platform.
The technical scheme is as follows:
a kind of teaching recording and broadcasting directing system based on Haesi embedded platform, the said teaching recording and broadcasting directing system includes student's camera, teacher's camera, video input module, video processing subsystem, first channel, second channel and third channel, neural network inference engine module, video graphic subsystem and video output module;
the student camera and the teacher camera respectively acquire real-time image data of the student seat area and the teacher teaching area and send it to the video processing subsystem through the video input module;
the video processing subsystem preprocesses the real-time image data of the student seat area and the teacher teaching area, scales the preprocessed data, sends the generated real-time thumbnail image data of the two areas to the neural network inference engine module through the first channel, and sends the preprocessed full-resolution real-time image data to the video graphics subsystem through the second and third channels;
the neural network inference engine module processes the real-time thumbnail image data of the student seat area and the teacher teaching area, identifies whether a target behavior object exists in the image together with its target frame information, and outputs the identification result to the video graphics subsystem, which outputs the corresponding video image to the video output module according to that result. Specifically, if no target behavior object exists in the thumbnail data of either area, the video graphics subsystem outputs the third channel's real-time image data of the student seat area or the teacher teaching area to the video output module according to a preset playing rule; if a target behavior object exists in the thumbnail data of the student seat area or the teacher teaching area, the video graphics subsystem computes the minimum common circumscribed rectangle of all target frames, crops that rectangle from the corresponding second-channel real-time image data, enlarges the crop to the original image resolution and sends it to the video output module.
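The minimum common circumscribed rectangle step can be illustrated with a short sketch. The patent gives no code, so the function names below are hypothetical, and the aspect-ratio expansion before upscaling is an added assumption so that the enlarged crop is not distorted.

```python
def union_rect(boxes):
    """Smallest axis-aligned rectangle enclosing all target frames.

    Each box is (x1, y1, x2, y2) in pixel coordinates.
    """
    if not boxes:
        raise ValueError("at least one target frame is required")
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def scale_rect_to_frame(rect, frame_w, frame_h):
    """Expand the crop rectangle to the frame's aspect ratio so the
    subsequent upscale to full resolution keeps proportions."""
    x1, y1, x2, y2 = rect
    w, h = x2 - x1, y2 - y1
    target = frame_w / frame_h
    if w / h < target:            # too narrow: widen symmetrically
        new_w = h * target
        cx = (x1 + x2) / 2
        x1, x2 = cx - new_w / 2, cx + new_w / 2
    else:                         # too flat: heighten symmetrically
        new_h = w / target
        cy = (y1 + y2) / 2
        y1, y2 = cy - new_h / 2, cy + new_h / 2
    # clamp to the frame bounds
    return (max(0, x1), max(0, y1), min(frame_w, x2), min(frame_h, y2))
```

In the real pipeline the clamped rectangle would be handed to the video graphics subsystem for hardware crop and zoom; here it is only a geometric sketch.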
Further, the video output module is connected with a display, and displays the received image through the display.
Furthermore, the teaching recording and broadcasting director system further comprises a video coding module, wherein the input end of the video coding module is connected with the output end of the video graphics subsystem, and the video coding module is used for coding the output result of the video graphics subsystem and then storing the coded output result to a specified disk.
Furthermore, the video input module comprises two physical channels that respectively receive real-time image data of the student seat area and the teacher teaching area; the video input module enables and configures the two physical channels through the hi_mpi_vi_enable_chn and hi_mpi_set_chn_attr media processing interfaces, and the hi_mpi_sys_bind media processing interface is used to bind the two physical channels to the two processing groups of the video processing subsystem; the two processing groups respectively process the real-time image data of the student seat area and the teacher teaching area.
Furthermore, the video processing subsystem preprocesses the real-time image data of the student seat area and the teacher teaching area, including denoising, de-interlacing, cropping and frame-rate control.
Further, the neural network inference engine module loads a .wk model file through the hi_mpi_svp_nnie_load_model interface, acquires an input frame through the hi_mpi_vpss_get_chn_frame interface, feeds the acquired frame into the model through the hi_mpi_svp_nnie_forward interface to obtain the forward-inference result for the corresponding image, and then completes filtering, sorting and non-maximum suppression through the hi_mpi_svp_nnie_nms and hi_mpi_svp_nnie_filter interfaces to obtain the positions of teachers and students with specific behaviors in the image.
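The filtering, sorting and non-maximum suppression that the hi_mpi_svp_nnie_filter and hi_mpi_svp_nnie_nms interfaces perform can be approximated in plain Python. This is an illustrative sketch of standard greedy NMS, not the SDK implementation; the thresholds are assumed values.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, score_thr=0.5, iou_thr=0.45):
    """Score filtering, sorting and greedy non-maximum suppression.

    detections: list of (box, score) pairs; returns the kept pairs.
    """
    kept = []
    # drop low-confidence boxes, then visit the rest best-first
    candidates = sorted((d for d in detections if d[1] >= score_thr),
                        key=lambda d: d[1], reverse=True)
    for box, score in candidates:
        # keep a box only if it does not overlap a better one too much
        if all(iou(box, k[0]) < iou_thr for k in kept):
            kept.append((box, score))
    return kept
```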
Further, the model file is obtained by improving the end-to-end PP-YOLOE deep learning model; the network comprises a backbone network, a neck network and a head prediction network connected in sequence;
the backbone network consists of 3 stacked convolutional layers and 4 CSPRepResStage and is used for extracting a deep feature map of an input image;
the neck network outputs the extracted image features in different sizes so as to detect targets in different sizes; the neck network consists of 5 CSPRepResStackes, upsampling is carried out from top to bottom firstly to enable the bottom layer feature map to contain stronger target semantic information, then downsampling is carried out from bottom to top to enable the top layer feature map to contain stronger position information, and finally the two features are transversely connected and fused to enable the finally output feature map to contain strong semantic information and strong position information;
the Head prediction network is used for matching two tasks of classification and bounding box regression of image features of different sizes output by the neck network by adopting an efficient task alignment Head algorithm to generate a target bounding box and prediction category information.
Further, the loss function of the model file is:

$$\mathrm{loss} = \frac{\alpha\,\mathrm{loss}_{\mathrm{VFL}} + \beta\,\mathrm{loss}_{\mathrm{GIoU}} + \gamma\,\mathrm{loss}_{\mathrm{DFL}}}{\sum \hat{t}}$$

where α, β and γ are the weights of the classification-branch, target-frame intersection-over-union and regression-branch loss terms respectively, each between 0 and 1; \(\hat{t}\) is the true label of the positive sample, normalized so that its maximum equals the maximum IoU within each instance; loss_VFL, loss_GIoU and loss_DFL are the optimization objectives of the classification branch, the bounding-box IoU and the regression branch respectively. loss_VFL and loss_GIoU are:

$$\mathrm{loss}_{\mathrm{VFL}}(p,q) = \begin{cases} -q\left(q\log p + (1-q)\log(1-p)\right), & q > 0 \\ -\theta\, p^{\mu} \log(1-p), & q = 0 \end{cases}$$

$$\mathrm{loss}_{\mathrm{GIoU}} = 1 - \mathrm{IoU}(A,B) + \frac{|C \setminus (A \cup B)|}{|C|}$$

where p is the predicted IoU-aware classification probability, q is the target IoU score (q = 0 for negative samples), θ is the weight used to balance positive and negative samples, and \(p^{\mu}\) is the weight used to modulate each sample; A and B are the predicted and ground-truth boxes, and C is the smallest bounding box containing both A and B.
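Under the usual varifocal-loss and GIoU formulations that the symbols above describe, the two branch losses can be sketched in Python. This is a sketch consistent with the text's definitions of p, q, θ, μ and C, not the patent's exact implementation.

```python
import math

def varifocal_loss(p, q, theta=0.75, mu=2.0):
    """Varifocal loss for one prediction.

    p: predicted IoU-aware classification probability; q: target IoU
    score, with q == 0 for negative samples; theta balances positives
    and negatives, and p**mu modulates each negative sample.
    Default theta/mu values are assumptions, not from the patent.
    """
    if q > 0:   # positive sample, weighted by its target score q
        return -q * (q * math.log(p) + (1 - q) * math.log(1 - p))
    return -theta * p ** mu * math.log(1 - p)


def giou_loss(a, b):
    """1 - GIoU for boxes a, b given as (x1, y1, x2, y2); C is the
    smallest box enclosing both a and b."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    # enclosing box C
    c_area = (max(a[2], b[2]) - min(a[0], b[0])) * \
             (max(a[3], b[3]) - min(a[1], b[1]))
    giou = inter / union - (c_area - union) / c_area
    return 1 - giou
```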
Further, the teacher camera is fixed at the center of the rear wall of the classroom; the student camera is fixed at the center of the front wall of the classroom or centered above the blackboard; the display is fixed on the lecture platform.
The invention also discloses a teaching recording and broadcasting directing method based on the HiSilicon embedded platform, implemented on the teaching recording and broadcasting directing system described above;
the teaching recording and broadcasting directing method comprises the following steps:
s1, respectively acquiring real-time image data of a student seat area and a teacher area by adopting a student camera and a teacher camera, and sending the acquired real-time image data of the student seat area and the teacher area to a video processing subsystem through a video input module;
s2, preprocessing the real-time image data of the student seat area and the teacher teaching area, zooming the preprocessed real-time image data, sending the generated real-time thumbnail image data of the student seat area and the teacher teaching area to a neural network reasoning engine module through a first channel, and sending the preprocessed real-time image data of the student seat area and the teacher teaching area to a video graphic subsystem through a second channel and a third channel;
s3, processing real-time thumbnail image data of the student seat area and the teacher and teaching area, identifying whether a target behavior object and target frame information of the target behavior object exist in the image, outputting an identification result to a video graphic subsystem, and outputting a corresponding video image to a video output module by the video graphic subsystem according to the identification result;
specifically, if no target behavior object exists in the real-time thumbnail image data of either the student seat area or the teacher teaching area, the video graphics subsystem outputs the third channel's real-time image data of the student seat area or the teacher teaching area to the video output module according to a preset playing rule; if a target behavior object exists in the thumbnail data of the student seat area or the teacher teaching area, the video graphics subsystem computes the minimum common circumscribed rectangle of all target frames, crops that rectangle from the corresponding second-channel real-time image data, enlarges the crop to the original image resolution and sends it to the video output module.
Beneficial effects:
first, the teaching, recording and broadcasting guide system and method based on the Haisi embedded platform provided by the invention have the advantages of strong robustness, wide application range, high detection speed, high precision, simple installation and deployment, low system cost and the like.
Second, the system and method apply deep learning to teaching recording and directing: the deep learning model learns the deep features of teachers and students with specific behaviors, so with sufficient training samples the robustness of the detection method improves greatly and the system applies to a wide range of scenes. Detecting specific behaviors with a deep learning model is fast, accurate and simple, and is not easily affected by ambient light or interfering objects.
Third, the system and method require only two cameras, an embedded directing host based on HiSilicon Hi35-series processors, and a few connecting cables; installation and deployment are simple and the cost is low, which favors the popularization of teaching recording and directing systems.
Drawings
Fig. 1 is a flow chart of the teaching recording and broadcasting directing method based on the HiSilicon embedded platform;
Fig. 2 is a schematic structural diagram of the teaching recording and broadcasting directing system based on the HiSilicon embedded platform;
Fig. 3 is a schematic diagram of one mounting arrangement of the student camera and the teacher camera;
Fig. 4 is a schematic structural diagram of the neural network inference engine module.
Detailed Description
The following examples are presented to enable one of ordinary skill in the art to more fully understand the present invention and are not intended to limit the invention in any way.
Referring to fig. 2, this embodiment provides a teaching recording and broadcasting directing system based on the HiSilicon embedded platform, which includes a student camera, a teacher camera, a video input module, a video processing subsystem, a first channel, a second channel, a third channel, a neural network inference engine module, a video graphics subsystem, and a video output module.
The student camera and the teacher camera respectively acquire real-time image data of the student seat area and the teacher teaching area and send it to the video processing subsystem through the video input module.
The video processing subsystem preprocesses the real-time image data of the student seat area and the teacher teaching area, scales the preprocessed data, sends the generated real-time thumbnail image data to the neural network inference engine module through the first channel, and sends the preprocessed full-resolution data to the video graphics subsystem through the second and third channels.
The neural network inference engine module processes the real-time thumbnail image data of the student seat area and the teacher teaching area, identifies whether a target behavior object exists in the image together with its target frame information, and outputs the identification result to the video graphics subsystem, which outputs the corresponding video image to the video output module according to that result. Specifically, if no target behavior object exists in the thumbnail data of either area, the video graphics subsystem outputs the third channel's real-time image data of the student seat area or the teacher teaching area to the video output module according to a preset playing rule; if a target behavior object exists in the thumbnail data of the student seat area or the teacher teaching area, the video graphics subsystem computes the minimum common circumscribed rectangle of all target frames, crops that rectangle from the corresponding second-channel real-time image data, enlarges the crop to the original image resolution and sends it to the video output module.
Referring to fig. 1, an embodiment of the invention provides a teaching recording and broadcasting directing method based on the HiSilicon embedded platform, comprising:
Step 1: acquire real-time image data in the classroom through the student camera and the teacher camera, and transmit it to the embedded directing host.
Step 2: the VI module (Video Input module) of the embedded directing host parses and processes the real-time image data and outputs it to the VPSS module (Video Process Sub-System, the video processing subsystem) of the host.
Step 3: the VPSS module of the embedded directing host processes the input from the VI module and outputs the result to the NNIE module (Neural Network Inference Engine module) of the host.
Step 4: the NNIE module of the embedded directing host loads the model according to the input from the VPSS module, performs forward inference and post-processes the inference result, determining the positions of teachers and students with specific behaviors in the picture.
Step 5: the VGS module (Video Graphics Sub-System, the video graphics subsystem) of the embedded directing host processes the picture output by the VPSS module according to the output of the NNIE module, and the host automatically switches pictures according to predefined control logic. The processed video is delivered to the VO module (Video Output module) for output to the display, or encoded by the VENC module (Video Encode module) and stored on the host's disk.
In one implementation, the step 1 includes:
teacher's camera and student's picture through installation in the classroom are recorded to teacher's camera and student's camera to install respectively at the classroom both ends with student's camera, one of them mounted position is as shown in figure 3.
The teacher camera and the student cameras send real-time image data to a VI module of the embedded recording and broadcasting director host in a data packet mode; the embedded recording and broadcasting director host is respectively connected with the teacher camera, the student camera and the display through data interfaces.
Preferably, the processor of the embedded recording and broadcasting director host is a Haisi Hi35 series chip.
In one implementation, the step 2 includes:
the method comprises the steps of receiving the embedded recording and broadcasting-directing host, configuring a sensor driver corresponding to a camera by a VI module, receiving and analyzing real-time image data, specifically configuring Haisis MPP (Media Process Platform) parameters, configuring and initializing a video cache pool and an MPP system by using Media processing interfaces such as hi _ mpi _ vb _ init, hi _ mpi _ sys _ init and the like, loading the corresponding sensor driver, realizing the acquisition of the real-time image data, inputting the video, and processing the data transmission of channel data streams.
The real-time image data is output to the VI module as two data-packet streams; the VI module receives and parses them through two physical channels, enables and configures the physical channels with media processing interfaces such as hi_mpi_vi_enable_chn and hi_mpi_set_chn_attr, and binds each group of the VPSS module with media processing interfaces such as hi_mpi_sys_bind.
In one implementation, the step 3 includes:
after the two channels of data processed by the VI module of the embedded recording and broadcasting director host are bound and output to the two groups of the VPSS module, the VPSS module processes the input. This specifically includes starting and initializing the VPSS module and configuring its parameters: denoising, de-interlacing, cropping and frame-rate control are applied to the input pictures, after which operations such as scaling and pixel-format conversion are performed on each channel of each group.
Specifically, in this embodiment, each group of the VPSS module of the embedded recording and broadcasting director host is configured with three channels, numbered 0, 1 and 2. The resolution of channel 0 is set to 416 × 416, while the other two channels keep the original input resolution. Channels 1 and 2 of each group are bound to the VGS module of the embedded recording and broadcasting director host. All the above operations are realized using media processing interfaces such as hi_mpi_vpss_create_grp, hi_mpi_vpss_start_grp, hi_mpi_vpss_set_chn_attr and hi_mpi_sys_bind.
In one implementation, step 4 comprises:
the model file is a deep learning model for detecting teachers and students with specific behaviors, obtained by training on a self-built training set and then optimizing and converting the trained model. The NNIE module of the embedded recording and broadcasting director host calculates and allocates auxiliary space according to the size of the model file and the resolution output by channel 0 of the VPSS module, loads the model file and configures its parameters. After loading, each frame output by channel 0 of the VPSS module is acquired frame by frame through the specified media processing interface and fed into the model file for forward inference. The inference result is then post-processed, which specifically includes operations such as non-maximum suppression, target filtering and sorting.
Specifically, in this embodiment, the NNIE module of the embedded recording and broadcasting director host loads the applicable .wk model file through the hi_mpi_svp_nnie_load_model interface, acquires an input frame through the hi_mpi_vpss_get_chn_frame interface, feeds the captured frame into the model file through the hi_mpi_svp_nnie_forward interface to obtain the forward-inference result for the corresponding image, and then completes operations such as filtering, sorting and non-maximum suppression through interfaces such as hi_mpi_svp_nnie_nms and hi_mpi_svp_nnie_filter, so as to obtain the positions of teachers and students with specific behaviors in the picture.
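The confidence filtering, sorting and non-maximum suppression performed here through the NNIE interfaces can be sketched in plain Python. This is an illustrative sketch only, not the host's implementation: the box format (x1, y1, x2, y2, score) and the threshold values are assumptions.

```python
def iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def filter_sort_nms(dets, conf_thresh=0.5, iou_thresh=0.45):
    # dets: list of (x1, y1, x2, y2, score). Drop low-confidence
    # detections, sort by descending score, then keep each box only
    # if it does not overlap an already-kept box too much (greedy NMS).
    dets = sorted((d for d in dets if d[4] >= conf_thresh),
                  key=lambda d: d[4], reverse=True)
    kept = []
    for d in dets:
        if all(iou(d[:4], k[:4]) < iou_thresh for k in kept):
            kept.append(d)
    return kept
```

The same three operations are what the hi_mpi_svp_nnie_filter / hi_mpi_svp_nnie_nms interfaces carry out on-chip.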
Specifically, the deep learning model is an improved end-to-end PP-YOLOE deep learning model. As shown in Fig. 4, the PP-YOLOE network comprises a backbone network (backbone), a neck network (neck) and a head prediction network (head);
Backbone part: a RepResBlock structure combining the advantages of residual connection networks (ResNet) and dense connection networks (DenseNet) is designed, and the RepResBlock is combined with a cross-stage partial (CSP) structure into a CSPRepResStage; an Effective Squeeze-and-Excitation (ESE) structure is also introduced into the CSPRepResStage to exert channel attention and further improve its feature-generation capability. The backbone consists of 3 stacked convolution layers and 4 CSPRepResStage blocks, and processes the input image to generate deep feature maps;
Neck part: the neck is responsible for outputting the extracted image features at different sizes for detecting targets of different sizes. The neck follows the Feature Pyramid Network (FPN) + Path Aggregation Network (PAN) structure of YOLOv5 and consists of 5 CSPRepResStage blocks. Unlike the backbone, the residual connections in ESE and RepResBlock are removed in the neck. On the left side of the neck, top-down upsampling makes the bottom-level feature maps contain stronger target semantic information; on the right side, the PAN structure performs bottom-up downsampling so that the top-level feature maps contain stronger position information. The two kinds of features are laterally connected and fused, so that the finally output feature maps contain both strong semantic information and strong position information;
Head part: the head is responsible for processing image features of different sizes and generating target bounding boxes and predicted category information. PP-YOLOE provides an Efficient Task-aligned Head (ET-head) to better match the two tasks of classification and bounding-box regression. The ET-head is an improvement based on Task-aligned One-stage Object Detection (TOOD); specifically, Effective Squeeze-and-Excitation (ESE) replaces layer attention, the alignment of the classification branch is simplified to a shortcut connection, and a Distribution Focal Loss layer replaces the alignment of the regression branch. For the classification branch and the bounding-box regression branch, PP-YOLOE uses the Varifocal Loss (VFL) and the Distribution Focal Loss (DFL) respectively.
$$loss = \frac{\alpha \cdot loss_{VFL} + \beta \cdot loss_{GIoU} + \gamma \cdot loss_{DFL}}{\sum \hat{t}} \tag{1}$$

The loss function for PP-YOLOE model training is shown in formula (1), where α, β and γ are respectively the weights of the classification-branch, target-box intersection-over-union (IoU) and regression-branch loss functions, each between 0 and 1.
In formula (1), $\hat{t}$ is the true label of a positive sample: a normalized value whose maximum is the maximum IoU in each instance. $loss_{VFL}$, $loss_{GIoU}$ and $loss_{DFL}$ are the optimization targets of the classification branch, the bounding-box IoU and the regression branch respectively; $loss_{VFL}$ and $loss_{GIoU}$ are given by the following formulas (2) and (3). In formula (2), p is the predicted IoU-aware classification probability, q is the target IoU score (q = 0 for negative samples), θ is the weight balancing positive and negative samples, and $p^{\mu}$ is the weight used to modulate each sample. In formula (3), C is the smallest bounding box containing both A and B.
$$loss_{VFL} = \begin{cases} -q\left(q\log p + (1-q)\log(1-p)\right), & q > 0 \\ -\theta\, p^{\mu}\log(1-p), & q = 0 \end{cases} \tag{2}$$
$$loss_{GIoU} = 1 - GIoU = 1 - \left(\frac{|A \cap B|}{|A \cup B|} - \frac{|C \setminus (A \cup B)|}{|C|}\right) \tag{3}$$
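A minimal Python sketch of formulas (2) and (3) may clarify the per-sample computation. The default values of θ and μ are illustrative assumptions, not values given in this embodiment, and the boxes are assumed non-degenerate.

```python
import math

def varifocal_loss(p, q, theta=0.75, mu=2.0):
    # Formula (2): p is the predicted IoU-aware classification
    # probability, q the target IoU score (q = 0 for negatives).
    if q > 0:  # positive sample, weighted by its target score q
        return -q * (q * math.log(p) + (1 - q) * math.log(1 - p))
    # negative sample: theta balances positives/negatives and
    # p**mu down-weights easy negatives
    return -theta * (p ** mu) * math.log(1 - p)

def giou_loss(a, b):
    # Formula (3): 1 - GIoU, where C is the smallest box
    # enclosing both A and B, boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    iou = inter / union
    cx1, cy1 = min(a[0], b[0]), min(a[1], b[1])
    cx2, cy2 = max(a[2], b[2]), max(a[3], b[3])
    area_c = (cx2 - cx1) * (cy2 - cy1)
    giou = iou - (area_c - union) / area_c
    return 1 - giou
```

Identical boxes give a GIoU loss of 0; fully disjoint boxes give a loss above 1, which is what lets GIoU penalize non-overlapping predictions where plain IoU saturates at 0.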
In one implementation, step 5 comprises:
the embedded recording and broadcasting director host uses the target information obtained in step 4, which specifically comprises the target center-point coordinates, target width, target height, target category and target confidence. A target whose confidence exceeds the preset threshold in 10 frames of pictures is considered a real target in the application scene; otherwise the result is discarded. After the qualified targets are obtained, the minimum common external rectangle of all target frames in each picture is calculated; if the rectangle is smaller than 1920 × 1080, its height and width are enlarged to 1920 × 1080, and if it is larger than 1920 × 1080 it is left unmodified.
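The minimum-common-external-rectangle step can be sketched as follows. Growing the rectangle around its center when it is smaller than 1920 × 1080 is an assumption about how the resize is anchored; the embodiment only states that the width and height are enlarged.

```python
def enclosing_rect(boxes, min_w=1920, min_h=1080):
    # boxes: list of (x1, y1, x2, y2) target frames in one picture.
    # Returns the smallest rectangle containing all of them, grown
    # around its center to at least min_w x min_h when too small
    # (using max() so a dimension already large enough is kept).
    x1 = min(b[0] for b in boxes)
    y1 = min(b[1] for b in boxes)
    x2 = max(b[2] for b in boxes)
    y2 = max(b[3] for b in boxes)
    w, h = x2 - x1, y2 - y1
    if w < min_w or h < min_h:
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        w, h = max(w, min_w), max(h, min_h)
        x1, y1 = cx - w / 2, cy - h / 2
        x2, y2 = cx + w / 2, cy + h / 2
    return (x1, y1, x2, y2)
```

The resulting rectangle is what the VGS crop-and-scale task receives, so the tracked targets stay centered in the 1920 × 1080 output.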
Specifically, in this embodiment, after the VGS module of the embedded recording and broadcasting director host completes initialization, auxiliary-space allocation and parameter configuration, it processes the image data of channel 1 of the two VPSS groups. This specifically comprises cropping and scaling the picture according to the rectangle data using interfaces such as hi_mpi_vgs_add_scale_task and hi_mpi_vgs_osd_task, ensuring that the output resolution is 1920 × 1080 and that the detected target always stays at the center of the picture.
When target detection is performed in step 4 and no target can be detected in the student camera picture, it is judged that no student is standing up or raising a hand, and the embedded recording and broadcasting director host issues a command to switch the student main picture to channel 2, i.e., the student panoramic picture.
Likewise, when target detection is performed in step 4 and no target can be detected in the teacher camera picture, it is judged that there is currently no teacher target to track, and the embedded recording and broadcasting director host issues a command to switch the teacher main picture to channel 2, i.e., the teacher panoramic picture. The priority between the teacher panoramic picture and the student panoramic picture may be set autonomously by the user.
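The panorama-fallback decision of the two preceding paragraphs can be sketched as a small selection function. This is a heavily simplified assumption about the switching policy: the function name, the (view, channel) return format and the `prefer` tie-break flag are illustrative, with channel 1 standing for the tracked close-up and channel 2 for the panorama as in this embodiment.

```python
def choose_channel(teacher_targets, student_targets, prefer="teacher"):
    # Each *_targets argument is the (possibly empty) list of detected
    # targets for that camera. A detected target keeps the tracked
    # close-up (channel 1); with no target the host falls back to the
    # panoramic picture (channel 2). The user-settable `prefer` flag
    # breaks ties when both views qualify.
    teacher = ("teacher", 1 if teacher_targets else 2)
    student = ("student", 1 if student_targets else 2)
    if teacher_targets and student_targets:
        return teacher if prefer == "teacher" else student
    if student_targets:
        return student
    if teacher_targets:
        return teacher
    # Nothing detected anywhere: show the preferred panorama.
    return teacher if prefer == "teacher" else student
```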
Specifically, in this embodiment, after the VGS module of the embedded recording and broadcasting director host completes the picture processing and the host determines the directing order and picture switching, the resulting picture may be output to the display through the VO module or written to the host's disk through the VENC module, according to actual requirements. This is realized by calling media processing interfaces such as hi_mpi_vo_send_frame and hi_mpi_venc_start.
The above is only a preferred embodiment of the present invention; the protection scope of the present invention is not limited to the above embodiment, and all technical solutions within the idea of the present invention belong to its protection scope. It should be noted that modifications and refinements made by those skilled in the art without departing from the principle of the invention are also within its protection scope.

Claims (10)

1. A teaching recording and broadcasting guide system based on a Haisi embedded platform is characterized in that the teaching recording and broadcasting guide system comprises a student camera, a teacher camera, a video input module, a video processing subsystem, a first channel, a second channel, a third channel, a neural network reasoning engine module, a video graphics subsystem and a video output module;
the student camera and the teacher camera respectively acquire real-time image data of the student seat area and the teacher teaching area, and send the acquired real-time image data of the student seat area and the teacher teaching area to the video processing subsystem through the video input module;
the video processing subsystem is used for preprocessing real-time image data of the student seat area and the teacher teaching area, carrying out zooming processing on the preprocessed real-time image data, sending the generated real-time thumbnail image data of the student seat area and the teacher teaching area to the neural network reasoning engine module through a first channel, and sending the preprocessed real-time image data of the student seat area and the teacher teaching area to the video graphics subsystem through a second channel and a third channel;
the neural network reasoning engine module processes the real-time thumbnail image data of the student seat area and the teacher teaching area, identifies whether a target behavior object and its target frame information exist in the image, and outputs the identification result to the video graphics subsystem, which outputs the corresponding video image to the video output module according to the identification result; specifically, if no target behavior object exists in the real-time thumbnail image data of the student seat area and the teacher teaching area, the video graphics subsystem outputs the real-time image data of the student seat area or the teacher teaching area from the third channel to the video output module according to a preset playing rule; if a target behavior object exists in the real-time thumbnail image data of the student seat area or the teacher teaching area, the video graphics subsystem calculates the minimum common external rectangle of all target frames, crops the image within that rectangle from the corresponding second-channel real-time image data of the student seat area or the teacher teaching area, enlarges it to the original image resolution and sends it to the video output module.
2. The teaching recording and broadcasting guide system based on the Haisi embedded platform according to claim 1, wherein the video output module is connected to a display, and the received image is displayed through the display.
3. The teaching recording and broadcasting guide system based on the Haisi embedded platform according to claim 1, further comprising a video coding module, wherein an input end of the video coding module is connected with an output end of the video graphics subsystem, and the video coding module is used for coding the output result of the video graphics subsystem and storing it to a designated disk.
4. The teaching recording and broadcasting guide system based on the Haisi embedded platform according to claim 1, wherein the video input module comprises two physical channels for respectively receiving the real-time image data of the student seat area and the teacher teaching area; the video input module enables and configures the two physical channels through the hi_mpi_vi_enable_chn and hi_mpi_set_chn_attr media processing interfaces, and the hi_mpi_sys_bind media processing interface is adopted to bind the two physical channels to the two processing groups of the video processing subsystem; the two processing groups of the video processing subsystem respectively process the real-time image data of the student seat area and the teacher teaching area.
5. The teaching recording and broadcasting guide system based on the Haisi embedded platform according to claim 1, wherein the video processing subsystem performs preprocessing, including denoising, de-interlacing, cropping and frame-rate control, on the real-time image data of the student seat area and the teacher teaching area.
6. The teaching recording and broadcasting guide system based on the Haisi embedded platform according to claim 1, wherein the neural network reasoning engine module loads the .wk model file through the hi_mpi_svp_nnie_load_model interface, acquires an input frame through the hi_mpi_vpss_get_chn_frame interface, feeds the captured frame into the model file using the hi_mpi_svp_nnie_forward interface to obtain the forward-inference result of the corresponding image, and then completes target filtering, sorting and non-maximum suppression operations using the hi_mpi_svp_nnie_nms and hi_mpi_svp_nnie_filter interfaces, obtaining the positions of teachers and students with specific behaviors in the picture.
7. The teaching recording and broadcasting guide system based on the Haisi embedded platform according to claim 6, wherein the model file is obtained by improving an end-to-end PP-YOLOE deep learning model; the neural network reasoning engine module comprises a backbone network, a neck network and a head prediction network which are connected in sequence;
the backbone network consists of 3 stacked convolutional layers and 4 CSPRepResStage and is used for extracting a deep feature map of an input image;
the neck network outputs the extracted image features at different sizes to detect targets of different sizes; the neck network consists of 5 CSPRepResStage blocks, first performing top-down upsampling so that the bottom-level feature maps contain stronger target semantic information, then performing bottom-up downsampling so that the top-level feature maps contain stronger position information, and finally laterally connecting and fusing the two kinds of features, so that the finally output feature maps contain both strong semantic information and strong position information;
the Head prediction network is used for matching two tasks of classification and bounding box regression of image features of different sizes output by the neck network by adopting an efficient task alignment Head algorithm to generate a target bounding box and prediction category information.
8. The Haisi embedded platform-based teaching recording and broadcasting director system of claim 7, wherein the loss function of the model file is:
$$loss = \frac{\alpha \cdot loss_{VFL} + \beta \cdot loss_{GIoU} + \gamma \cdot loss_{DFL}}{\sum \hat{t}}$$

in the formula, α, β and γ are respectively the weights of the classification-branch, target-box intersection-over-union (IoU) and regression-branch loss functions, each between 0 and 1;
$\hat{t}$ is the true label of a positive sample, a normalized value whose maximum is the maximum IoU in each instance; $loss_{VFL}$, $loss_{GIoU}$ and $loss_{DFL}$ are the optimization targets of the classification branch, the bounding-box IoU and the regression branch, with $loss_{VFL}$ and $loss_{GIoU}$ respectively:
$$loss_{VFL} = \begin{cases} -q\left(q\log p + (1-q)\log(1-p)\right), & q > 0 \\ -\theta\, p^{\mu}\log(1-p), & q = 0 \end{cases}$$
$$loss_{GIoU} = 1 - GIoU = 1 - \left(\frac{|A \cap B|}{|A \cup B|} - \frac{|C \setminus (A \cup B)|}{|C|}\right)$$
where p is the predicted IoU-aware classification probability, q is the target IoU score (q = 0 for negative samples), θ is the weight used to balance positive and negative samples, and $p^{\mu}$ is the weight used to modulate each sample; C is the smallest bounding box containing both A and B.
9. The teaching recording and broadcasting guide system based on the Haisi embedded platform according to claim 1, wherein the teacher camera is fixed at the center of the rear wall of the classroom; the student camera is fixed at the center of the front wall of the classroom or at the center above the blackboard; the display is fixed on the podium.
10. A teaching recording and broadcasting directing method based on the Haisi embedded platform, characterized in that the teaching recording and broadcasting directing method is implemented based on the teaching recording and broadcasting guide system of any one of claims 1-9;
the teaching recording and broadcasting directing method comprises the following steps:
s1, respectively acquiring real-time image data of a student seat area and a teacher area by adopting a student camera and a teacher camera, and sending the acquired real-time image data of the student seat area and the teacher area to a video processing subsystem through a video input module;
s2, preprocessing the real-time image data of the student seat area and the teacher teaching area, zooming the preprocessed real-time image data, sending the generated real-time thumbnail image data of the student seat area and the teacher teaching area to a neural network reasoning engine module through a first channel, and sending the preprocessed real-time image data of the student seat area and the teacher teaching area to a video graphic subsystem through a second channel and a third channel;
s3, processing real-time thumbnail image data of the student seat area and the teacher and teaching area, identifying whether a target behavior object and target frame information of the target behavior object exist in the image, outputting an identification result to a video graphic subsystem, and outputting a corresponding video image to a video output module by the video graphic subsystem according to the identification result;
specifically, if no target behavior object exists in the real-time thumbnail image data of the student seat area and the teacher teaching area, the video graphics subsystem outputs the real-time image data of the student seat area or the teacher teaching area from the third channel to the video output module according to a preset playing rule; if a target behavior object exists in the real-time thumbnail image data of the student seat area or the teacher teaching area, the video graphics subsystem calculates the minimum common external rectangle of all target frames, crops the image within that rectangle from the corresponding second-channel real-time image data of the student seat area or the teacher teaching area, enlarges it to the original image resolution and sends it to the video output module.
CN202211078352.XA 2022-09-05 2022-09-05 Teaching recording and broadcasting guide system and method based on Haesi embedded platform Pending CN115460379A (en)


Publications (1)

Publication Number Publication Date
CN115460379A true CN115460379A (en) 2022-12-09



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination