CN113191403A - Generation and display system of theater dynamic poster - Google Patents

Generation and display system of theater dynamic poster

Info

Publication number
CN113191403A
CN113191403A (application CN202110408710.8A)
Authority
CN
China
Prior art keywords
target
poster
yolo
user target
algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110408710.8A
Other languages
Chinese (zh)
Inventor
朱云
陈晔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI THEATRE ACADEMY
Original Assignee
SHANGHAI THEATRE ACADEMY
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHANGHAI THEATRE ACADEMY filed Critical SHANGHAI THEATRE ACADEMY
Priority to CN202110408710.8A priority Critical patent/CN113191403A/en
Publication of CN113191403A publication Critical patent/CN113191403A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a generation and display system for a theater dynamic poster, belonging to the technical field of theater dynamic poster equipment. The system comprises a system platform including an image capture module, a function menu module, a dynamic poster display module and an infrared sensor area; the actual distance between a user target and the system is judged by the infrared sensor. When the actual distance is greater than a standard distance, the system operates in dynamic poster mode, that is, the user target is relatively far away, and the system captures the gestures and image features of the user target through the image capture module as input for controlling the system. During the display of the dynamic poster, multimedia technology, artificial-intelligence image recognition technology and motion-sensing technology are widely applied, which enhances the diversity and interactivity of the theater poster. An exhibition mode that combines the poster with the performed drama can mobilize the enthusiasm of the audience, giving them greater interest in and a deeper understanding of the drama, which is of very important significance.

Description

Generation and display system of theater dynamic poster
Technical Field
The invention relates to the technical field of theater dynamic poster equipment, in particular to a generation and display system of a theater dynamic poster.
Background
In poster production and display, theater posters are generally conceived as printed posters to be posted on columns, walls and similar surfaces; they are displayed in public places and usually convey information in a two-dimensional, static form. Theater posters have developed along with society, science and technology, and the media, and their forms of expression have evolved from hand drawing to present-day computer synthesis. However, domestic theater posters have obvious shortcomings: since there is no dedicated theater-poster design company in China, only some advertising or graphic design companies take on theater-poster design work. Most of these companies do little more than add text, process graphics and perform simple typesetting, and cannot take charge of the overall planning and design of theater posters. The visual treatment of the subject remains on a two-dimensional plane, with special effects and layout arrangements applied to visual elements such as text, star photographs or patterns using computer design software. At present, large LED screens are gradually replacing walls and columns as the display platform for theater posters. The poster takes the form of a two-dimensional picture played on the screen in scrolling mode: staff copy the pictures to a USB drive or to the LED screen's storage, and the picture or video player provided by the screen performs the scrolling playback.
In terms of human-computer interaction technology, interaction is realized through a certain dialogue language between the user and the computer and through information exchange between them in a certain interaction mode. At the current stage, multi-channel, multimedia intelligent human-computer interaction uses multiple human sensory and action channels, such as voice, gestures and expressions, to interact with the computer in a parallel and imprecise manner, mainly including vision-based gesture recognition technology and augmented reality technology.
The vision-based gesture recognition method collects a color image through a camera, analyzes and processes the image with a computer, and finally achieves gesture recognition. The recognition process is generally divided into five stages: gesture image acquisition, gesture image segmentation, gesture tracking, feature extraction and gesture recognition.
Augmented reality technology combines real-world and virtual-world information: entity information that is difficult to experience within a certain space-time range of the real world is simulated and then superimposed by computer technology, so that virtual information is applied to the real world and reality is enhanced. The audience views information through augmented-reality glasses or a head-mounted display and can take pictures, send messages and perform similar functions through voice commands.
In terms of artificial-intelligence image recognition technology, as computer hardware performance improves, the application of deep learning in computer vision has been increasingly favored. Conventional image recognition generally comprises image preprocessing, feature extraction, classifier design, and inference and recognition; in this conventional approach the classification accuracy is limited by the extracted features, the process is complex and tedious, the workload is large, and the accuracy cannot reach an ideal level. Compared with traditional recognition methods, deep-learning-based convolutional neural networks (CNNs) learn end to end directly from the original training data, with the final classification result output directly by the output layer. This approach does not extract explicit image features; instead, the unstructured original image is used directly as the input of the network without preprocessing, and the features of the image are automatically extracted by the feature-extraction layers of the convolutional neural network, so the classification performance is superior.
At present, image analysis based on deep learning has reached a mature, industrial level of application in fields such as medicine, industry and national defense, but is less applied in fields such as art, exhibition and display.
Traditional image recognition methods usually perform feature extraction manually, for example using HOG (histogram of oriented gradients) or SIFT (scale-invariant feature transform) features with a support vector machine (SVM) to detect and recognize targets. Features must be hand-selected for each class of target, so the workload is very large, and such methods cannot meet the requirements of detection tasks involving many target classes and real-time constraints.
Disclosure of Invention
In view of the problems identified in the background, the present invention provides a system for generating and displaying a dynamic poster in a theater.
The technical scheme of the invention is realized as follows:
a generation and display system of a theater dynamic poster comprises a system platform, and is characterized in that: the system platform comprises an image capturing module, a function menu module, a dynamic poster display module and an infrared sensor area, wherein the actual distance between a user target and the system is judged through the infrared sensor, and is compared with a standard distance, wherein the standard distance is set as a fixed value;
when the actual distance is larger than the standard distance, the system is in a dynamic poster mode, namely the user target is at a longer distance, and the system captures gestures and image characteristics of the user target through an image capturing module to be used as input to control the system;
when the actual distance is smaller than the standard distance, the system is in a manual operation mode, that is, the user target selects and browses the theater poster content through the function menu module to view detailed information about the theater's dramas.
The invention is further configured to: the system platform is in communication connection with the user terminal.
The invention is further configured to: the image capturing module is used for realizing real-time detection and recognition of a user target by adopting a rapid recognition and positioning method based on a YOLO v2 algorithm model, and is used for detecting and recognizing gestures and image characteristics of the user target;
the rapid identification and positioning method based on the YOLO v2 algorithm model comprises the following steps:
s1, designing a user target real-time detection and identification system based on a YOLO v2 algorithm;
s2, designing user target real-time detection and identification system software, and applying the software to a system platform;
the invention is further configured to: in said step S1, including,
establishing a YOLO v2 algorithm model, determining main parameters of the YOLO v2 algorithm model, and determining the width and height dimensions of a prior frame based on a K-means clustering algorithm;
analyzing a model loss function and a network training method, deducing batch normalization forward propagation and backward propagation, and solving the gradient of a weighted value, a bias and a linear scaling parameter of a YOLO v2 algorithm;
The method takes the information of the whole image as direct input and converts the target recognition problem into a regression problem. The YOLO v2 algorithm divides the input image into s × s cells using an s × s grid and learns and extracts the features of each cell through the network training method; if a target is detected to fall into a cell, that cell is responsible for detecting and recognizing the target. For the output feature map, each cell directly predicts positions; assuming there are N bounding boxes, the center coordinates (x_a, y_a) of the prior box, the width and height (w_a, h_a) of the prior box, and the target object confidence C can be obtained first.
The invention is further configured to: in said step S2, including,
designing user target real-time detection and recognition system software based on a UI graphical interface of Qt, wherein the software is used for testing the performance of a system, and the software interface comprises an interface display control, a camera calibration control and a parameter setting control;
the software runs under the environment of the Ubuntu system and comprises the following steps,
s201, initializing a system;
s202, calibrating a camera;
S203, read in a frame of image and determine whether it is empty;
S204, if it is empty, end; if not, perform target detection, output the marked box, and return to step S203.
In conclusion, the beneficial effects of the invention are as follows:
1. in the display process of the dynamic poster, the multimedia technology, the artificial intelligent image recognition technology and the motion sensing technology are widely applied, so that the diversity and the interactivity of the theater poster are enhanced; the exhibition mode of combining the poster and the drama of the performance can mobilize the enthusiasm of the audience, so that the audience has greater interest and deeper understanding on the drama, and the exhibition mode has very important significance.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a diagram of a Darknet-19 network architecture according to the present invention;
FIG. 2 is a block diagram of a prediction edge of the present invention;
FIG. 3 is a process diagram of the target detection based on the YOLO v2 algorithm of the present invention;
FIG. 4 is a code diagram of other key parameter settings of the present invention;
FIG. 5 is a functional flow diagram of the real-time monitoring and identification system of the present invention;
FIG. 6 is a graphical real-time monitoring and identification software interface diagram of the Qt-based UI of the present invention;
fig. 7 is a schematic structural diagram of a system application product of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention is described below with reference to fig. 1-7:
A generation and display system of a theater dynamic poster comprises a system platform, and the system platform comprises an image capture module, a function menu module, a dynamic poster display module and an infrared sensor area, as shown in FIG. 7. Before a user target reaches the system, the actual distance between the user target and the system is judged by the infrared sensor and compared with a standard distance, which is set as a fixed value;
when the actual distance is larger than the standard distance, the system is in a dynamic poster mode, namely the user target is at a longer distance, and the system captures gestures and image characteristics of the user target through an image capturing module to be used as input to control the system;
when the actual distance is smaller than the standard distance, the system is in a manual operation mode, that is, the user target selects and browses the theater poster content through the function menu module to view detailed information about the theater's dramas.
In addition, the system platform is in communication connection with the user terminal, and the user target can operate the system through the mobile phone.
Preferably, when the system is in dynamic poster mode, the system detects gesture manipulation or other image features that identify the user's target, thereby manipulating the interface of the system. The image capturing module is used for realizing real-time detection and recognition of the user target by adopting a rapid recognition and positioning method based on a YOLO v2 algorithm model, and is used for detecting and recognizing the gesture and the image characteristics of the user target.
The rapid identification and positioning method based on the YOLO v2 algorithm model comprises the following steps:
s1, designing a user target real-time detection and identification system based on a YOLO v2 algorithm.
And S2, designing user target real-time detection and identification system software, and applying the software to a system platform.
In the design of a user target real-time detection and recognition system based on a YOLO v2 algorithm, a YOLO v2 algorithm model is established and main parameters of the YOLO v2 algorithm model are determined by explaining the network structure of the YOLO v2 algorithm model, the principle and implementation of the algorithm, and the specific detection principle and flow, and the prior frame width and height dimension is determined based on a K-means clustering algorithm.
Principle description of the YOLO v2 algorithm:
the basic model of Yolov 2 is Darknet-19, which is shown in FIG. 1, and is a Darknet-19 network structure diagram, wherein the network structure includes 19 convolutional layers and 5 maximum pooling layers.
The YOLO directly inputs the information of the whole image, and evaluates the position coordinates of the output target at the output layer and the confidence coefficient of the target at the position; a sliding window is not used in the process of extracting the image features; the deviation caused by human factors is reduced in the network training process; the whole training and detection process of the YOLO algorithm adopts a complete end-to-end concept and can be independently used as a target object block detection and identification framework.
The specific network structure of YOLO v2 is shown in table 1.
Table 1: yolo v2 model network structure table
[The layer-by-layer structure of Table 1 appears as an image in the original publication.]
The YOLO algorithm first divides the input image into s × s cells using an s × s grid, and the features of each cell are extracted through training and learning. If a target is detected to fall in a cell, that cell is responsible for detecting and identifying the target. For the output feature map, each cell directly predicts positions; assuming there are N bounding boxes, the center coordinates (x_a, y_a) of the prior box, the width and height (w_a, h_a) of the prior box, and the target object confidence C can be obtained first.
FIG. 2 shows the bounding-box prediction diagram.
In the early stage of YOLO training, the bounding-box prediction is unstable, and the prediction expressions are:

b_x = (t_x · w_a) + x_a
b_y = (t_y · h_a) + y_a

where t_x and t_y are parameters obtained by training, b_x is the horizontal coordinate of the center of the predicted bounding box, and b_y is the vertical coordinate of the center of the predicted bounding box. If the offset of the cell from the upper-left corner of the input image is (c_x, c_y), then the predicted box is:

b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = w_a · e^(t_w)
b_h = h_a · e^(t_h)

where t_w and t_h are parameters obtained by training, b_w is the width of the predicted bounding box, and b_h is the height of the predicted bounding box.
The confidence represents the prediction precision of the bounding box through the probability that the box contains the target object to be detected and identified and the degree of overlap between the box and the true position. The calculation expression is:

C = P_1 · I

where P_1 represents the probability that the target object exists in the predicted bounding box under the current network model; when no target object exists in the predicted bounding box, P_1 = 0. I denotes the accuracy (intersection over union) of the predicted bounding box with respect to the target object. In addition, in the YOLO algorithm each cell generates a group of probability sets, and each set comprises A conditional probabilities P_2, which are used to determine the optimal position for detecting and identifying the target object. P_3 is P_2 multiplied by P_1. The specific detection and identification process is as follows:
(1) divide the input image into 13 × 13 cells;
(2) each cell is responsible for predicting 5 bounding boxes, and for the detection and identification task of the invention A = 1, so the prediction is a vector of length s² × (5B + C);
(3) obtain the optimal position of the target object to be detected and identified, and apply non-maximum suppression to the s² × (5B + C) predictions to decide whether each detected bounding box is retained, where

P_2 · P_1 · I = P_3 · I.
The detection principle and process based on the YOLO v2 algorithm are shown in fig. 3, which is a target detection process based on the YOLO v2 algorithm.
Determination of the primary parameters of the YOLO v2 model:
and in YOLO v2, extracting the prior frame scale which is more matched with the size of the sample target object by using a K-means clustering method, and taking the clustering K value as the number of the prior frame candidate frames. The width and height value of the cluster center is used as the width and height of the prior box. The pseudo code of the K-means clustering method is as follows:
Figure BDA0003023313640000101
Figure BDA0003023313640000111
For the clustering of box scales, the invention measures the distance between each box and the cluster-center box using the intersection-over-union (IOU) metric:

d_ji = 1 − IOU(x_j, μ_i)

where μ_i denotes the cluster-center box and x_j denotes the other boxes; the larger the IOU of two boxes, the smaller the distance between them. Choosing too few prior boxes gives insufficient accuracy, while choosing too many prolongs the computation time for the candidate regions; weighing these factors, K = 5 is adopted. The cluster centers obtained by the K-means clustering algorithm, i.e. the prior-box width and height dimensions, are shown in Table 2, and an illustrative clustering sketch follows the table.
Table 2: clustering result table
Cluster center 1: (1.41, 1.47)
Cluster center 2: (1.33, 1.35)
Cluster center 3: (1.45, 1.35)
Cluster center 4: (1.57, 1.60)
Cluster center 5: (1.25, 1.26)
FIG. 4 is a code diagram of other key parameter settings of the YOLO v2 model part.
Network training:
A deep convolutional neural network is in fact a multi-layer perceptron containing several hidden layers. Generally, in a multi-layer neural network, a given neuron in a hidden layer is connected to a local set of neurons in the previous layer, and each connection corresponds to a weight. Thus each pair of connected neurons corresponds to a weight parameter value W_i, and the combination of all connected neurons' weight values W_i constitutes the parameter set W of the whole network. Training the convolutional neural network means searching, by continuously optimizing over the training data, for a suitable set of parameters W such that the fit between the training data and the convolutional neural network is optimal. The degree of fit between the model's predicted values and the true values is typically evaluated with a loss function (cost function). The essence of the loss function is to represent the distance between the estimated value and the true value; given a series of training sample inputs, the parameters W can be taken as the independent variable, and the loss function can take many forms.
A common method for finding the minimum of the loss function is the gradient descent strategy, in which the parameters of the neural network are adjusted in the direction of the negative gradient of the objective. Let J(W) be the loss function with W as its argument, where W denotes the set of weight values of the neural network. The specific procedure for minimizing the loss function by gradient descent is: first set an initial value W_0 for W, substitute W_0 to compute the value of the loss function, obtain an updated value of W according to the corresponding update strategy, evaluate the loss function again, and repeat, each time obtaining a new W from the previously computed loss value. These steps are executed in a loop until the loss function converges. The update strategy for the network weights W is to move a certain distance along the direction of the negative gradient; the step size is the learning rate, generally denoted η. The update of the weight values W is an iterative process, and the update formula can be expressed as:

W ← W − η · ∇_W J(W)

where ∇_W J(W) denotes the gradient of the loss function with respect to the neural network weight parameters W.
In the process of training the convolutional neural network, gradient descent can be divided into mini-batch gradient descent, batch gradient descent and stochastic gradient descent, according to the amount of data used to evaluate the loss function.
In fact, a neural network can be viewed as a high-dimensional complex function, and such a function may have saddle points, which can prevent the gradient descent strategy from finding the global minimum of the loss function. As can be seen from the weight update formula, the learning rate η and the gradient of the weight parameters W are the main factors in the update process and therefore need close attention during neural network training.
If the learning rate is too small, the convergence of the neural network objective function is slow; if it is too large, the objective function may oscillate or diverge, so that its minimum cannot be found. The selection or update strategy for the learning rate is therefore important. Common learning-rate update strategies include Momentum Optimization, Nesterov Accelerated Gradient (NAG), AdaGrad, RMSprop and so on.
The gradient ∂J/∂W_i of each weight parameter W_i of the neural network is usually obtained with the error back-propagation algorithm (BP algorithm), which computes the partial derivatives with respect to all variables; the procedure consists mainly of forward signal propagation and backward error propagation. Forward signal propagation computes the error between the output value, obtained through a series of computations in the hidden layers, and the expected value; backward error propagation distributes the total error between the network output and the expected value to every neuron of every layer and then updates each weight parameter associated with that neuron. Backward error propagation runs in the direction opposite to forward signal propagation: it starts at the output layer, passes the error back along the reverse of the forward path, and ends at the input layer. Applying the gradient descent strategy, the forward signal and backward error are propagated in a continuous loop until the total error is smaller than a specified value, at which point training ends.
For the loss function:
the loss function of YOLO v2 uses the mean square and error index, and consists of three parts, namely coordinate error, IOU error and classification error. When calculating the model IOU error, the IOU error of the target-containing cell and the IOU error of the non-target-containing cell have different influences on the network loss function value. If the adopted weights are the same, the confidence coefficient of the cell not containing the target is approximate to 0, and the influence of the confidence coefficient error of the cell containing the target object in the calculation of the parameter gradient of the neural network is indirectly amplified. For identification of target objects of equal error value, the effect of the error of the larger target object on the detection should be less than the effect of the error of the smaller target object on the detection. This is because the same positional deviation accounts for a much smaller proportion of larger targets than the same deviation. The loss function of YOLO v2 during training is shown as follows:
Figure BDA0003023313640000132
wherein (x)i,yi,wi,hi),Ci,pi(c) The values of the training labels are represented,
Figure BDA0003023313640000133
indicating the predicted value. First item
Figure BDA0003023313640000141
Judging whether the jth bounding box in the ith grid is in charge of the target: the largest bounding box of the IOU with the target's annotation data bounding box is responsible for the target,
Figure BDA0003023313640000142
representing a confidence prediction of the bounding box containing the object,
Figure BDA0003023313640000143
confidence prediction representing bounding box without object, fifth term
Figure BDA0003023313640000144
In (1)
Figure BDA0003023313640000145
Judging whether the center of the target falls into a grid i: the center of the mesh containing the object is responsible for predicting the class probability of the object.
Deriving batch-normalization forward and backward propagation.

In Batch Normalization (BN) forward propagation, batch normalization accelerates the convergence of neural network training and effectively improves model accuracy; the batch-normalization forward-propagation process in the YOLO v2 algorithm is derived and analyzed below.

Assume the output of the l-th layer of the YOLO v2 algorithm model is α^l. Since the input of a layer is the output of the previous layer, the input of the l-th layer is α^(l−1). Let i be the index within a batch; then the input to the l-th layer can be written as α_i^(l−1) (i = 1, …, m, where m is the batch size).

The input α^(l−1) of the l-th layer is transformed by convolution, or multiplied by the weights if the current layer is a fully connected layer, to give the output

z_i^l = W^l · α_i^(l−1).

The mean of the batch of z^l values is

μ_B = (1/m) Σ_{i=1}^{m} z_i^l,

and the variance of the batch of z^l values is

σ_B² = (1/m) Σ_{i=1}^{m} (z_i^l − μ_B)².

The batch of z^l values is then normalized:

ẑ_i^l = (z_i^l − μ_B) / √(σ_B² + ε).

Because the normalized ẑ_i^l is constrained to zero mean and unit variance, the nonlinear expressive capability of the network is weakened; ẑ_i^l is therefore given a scaling and shifting transform so that the data are distributed over the nonlinear region:

z̃_i^l = γ · ẑ_i^l + β.

Finally, z̃_i^l is passed through the activation function, and the output of the forward propagation of the l-th layer is:

α_i^l = f(z̃_i^l).

The relationship between the input and the output of the l-th layer forward-propagation process of the YOLO v2 algorithm can therefore be written as:

α_i^l = f( γ · (W^l α_i^(l−1) − μ_B) / √(σ_B² + ε) + β ).
In Batch Normalization (BN) back-propagation, the purpose of back-propagation is to obtain the gradients needed for network training. Solving for the weight and bias gradients requires chain-rule differentiation, so the derivatives of the loss function f(loss) with respect to the variance and mean of z^l are computed first.

Derivative of the loss function f(loss) with respect to the variance σ_B² of z^l — applying the chain rule through ẑ_i^l gives:

∂f/∂σ_B² = Σ_{i=1}^{m} (∂f/∂ẑ_i^l) · (z_i^l − μ_B) · (−1/2) · (σ_B² + ε)^(−3/2).

Derivative of the loss function f(loss) with respect to the mean μ_B of z^l — differentiating the intermediate variable ẑ_i^l with respect to μ_B and expanding gives:

∂f/∂μ_B = Σ_{i=1}^{m} (∂f/∂ẑ_i^l) · (−1/√(σ_B² + ε)) + (∂f/∂σ_B²) · (−2/m) · Σ_{i=1}^{m} (z_i^l − μ_B).

Derivative of the loss function f(loss) with respect to z_i^l — applying the chain-rule derivation through ẑ_i^l, σ_B² and μ_B gives:

∂f/∂z_i^l = (∂f/∂ẑ_i^l) · 1/√(σ_B² + ε) + (∂f/∂σ_B²) · 2(z_i^l − μ_B)/m + (∂f/∂μ_B) · 1/m,

with ∂f/∂ẑ_i^l = (∂f/∂z̃_i^l) · γ.

The gradient of the back-propagated weight values of the YOLO v2 model is:

∂f/∂W^l = Σ_{i=1}^{m} (∂f/∂z_i^l) · (α_i^(l−1))ᵀ.

The gradients of the back-propagated bias (shift parameter β) and linear scaling parameter γ of the YOLO v2 model are:

∂f/∂β = Σ_{i=1}^{m} ∂f/∂z̃_i^l,    ∂f/∂γ = Σ_{i=1}^{m} (∂f/∂z̃_i^l) · ẑ_i^l.

The gradient of the back-propagated error term δ^l of the YOLO v2 model, passed on to the previous layer, is:

δ^(l−1) = (W^l)ᵀ · δ^l,  where δ^l = ∂f/∂z^l.
the YOLO v2 algorithm model is pre-trained on an ImageNet data set to obtain a classification network and initial weight parameters, and then the training set of the HUSTC605 data set for a specific project is trained for the second time on the basis of the classification network and the initial weight parameters to obtain a new network model. The key parameter settings during model training of the invention are shown in table 3:
table 3: key parameter setting table during training of YOLO algorithm model
Batch size: 64
Number of channels: 3
Momentum: 0.9
Weight decay coefficient: 0.0005
Learning rate: 0.001
In the design of the user-target real-time detection and recognition system software, the interface is designed with a Qt-based graphical UI and the software runs in an Ubuntu environment. FIG. 5 shows the function flow chart of the real-time monitoring and recognition system; the specific design comprises the following steps:
s201, initializing a system;
s202, calibrating a camera;
S203, read in a frame of image and determine whether it is empty;
S204, if it is empty, end; if not, perform target detection, output the marked box, and return to step S203.
The real-time detection and identification software also comprises a camera calibration module, whose function is to calibrate an effective area for coordinate positioning of the target object block. The specific interface is shown in FIG. 6, the Qt-based graphical UI of the real-time monitoring and recognition software.
When detection and identification are started, the parameters are set and the "open camera" module is clicked; the pictures captured by the camera are then transmitted in real time to two windows arranged on the left and right sides. The left picture window displays the original real-time video captured by the camera, and the right picture window displays the system's real-time detection and identification results. Clicking the "start detection" module starts real-time detection and identification, and the coordinates of the target position are displayed in the interface in real time.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (5)

1. A generation and display system of a theater dynamic poster comprises a system platform, and is characterized in that: the system platform comprises an image capturing module, a function menu module, a dynamic poster display module and an infrared sensor area, wherein the actual distance between a user target and the system is judged through the infrared sensor, and is compared with a standard distance, wherein the standard distance is set as a fixed value;
when the actual distance is larger than the standard distance, the system is in a dynamic poster mode, namely the user target is at a longer distance, and the system captures gestures and image characteristics of the user target through an image capturing module to be used as input to control the system;
when the actual distance is smaller than the standard distance, the system is in a manual operation mode, that is, the user target selects and browses the theater poster content through the function menu module to view detailed information about the theater's dramas.
2. A theatre dynamic poster generation and display system as claimed in claim 1 wherein: the system platform is in communication connection with the user terminal.
3. A theatre dynamic poster generation and display system as claimed in claim 1 wherein: the image capturing module is used for realizing real-time detection and recognition of a user target by adopting a rapid recognition and positioning method based on a YOLO v2 algorithm model, and is used for detecting and recognizing gestures and image characteristics of the user target;
the rapid identification and positioning method based on the YOLO v2 algorithm model comprises the following steps:
s1, designing a user target real-time detection and identification system based on a YOLO v2 algorithm;
s2, designing user target real-time detection and identification system software, and applying the software to a system platform;
4. a theatre dynamic poster generation and display system as claimed in claim 3 wherein: in said step S1, including,
establishing a YOLO v2 algorithm model, determining main parameters of the YOLO v2 algorithm model, and determining the width and height dimensions of a prior frame based on a K-means clustering algorithm;
analyzing a model loss function and a network training method, deducing batch normalization forward propagation and backward propagation, and solving the gradient of a weighted value, a bias and a linear scaling parameter of a YOLO v2 algorithm;
The method takes the information of the whole image as direct input and converts the target recognition problem into a regression problem. The YOLO v2 algorithm divides the input image into s × s cells using an s × s grid and learns and extracts the features of each cell through the network training method; if a target is detected to fall into a cell, that cell is responsible for detecting and recognizing the target. For the output feature map, each cell directly predicts positions; assuming there are N bounding boxes, the center coordinates (x_a, y_a) of the prior box, the width and height (w_a, h_a) of the prior box, and the target object confidence C can be obtained first.
5. A theatre dynamic poster generation and display system as claimed in claim 3 wherein: in said step S2, including,
designing user target real-time detection and recognition system software based on a UI graphical interface of Qt, wherein the software is used for testing the performance of a system, and the software interface comprises an interface display control, a camera calibration control and a parameter setting control;
the software runs under the environment of the Ubuntu system and comprises the following steps,
s201, initializing a system;
s202, calibrating a camera;
S203, read in a frame of image and determine whether it is empty;
S204, if it is empty, end; if not, perform target detection, output the marked box, and return to step S203.
CN202110408710.8A 2021-04-16 2021-04-16 Generation and display system of theater dynamic poster Pending CN113191403A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110408710.8A CN113191403A (en) 2021-04-16 2021-04-16 Generation and display system of theater dynamic poster

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110408710.8A CN113191403A (en) 2021-04-16 2021-04-16 Generation and display system of theater dynamic poster

Publications (1)

Publication Number Publication Date
CN113191403A true CN113191403A (en) 2021-07-30

Family

ID=76977038

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110408710.8A Pending CN113191403A (en) 2021-04-16 2021-04-16 Generation and display system of theater dynamic poster

Country Status (1)

Country Link
CN (1) CN113191403A (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102830801A (en) * 2012-08-03 2012-12-19 中国科学技术大学 Method and system for controlling digital signage by utilizing gesture recognition
CN103873696A (en) * 2014-03-27 2014-06-18 惠州Tcl移动通信有限公司 Method and system for operating mobile phone through gestures in different scene modes
US20140317576A1 (en) * 2011-12-06 2014-10-23 Thomson Licensing Method and system for responding to user's selection gesture of object displayed in three dimensions
CN107340852A (en) * 2016-08-19 2017-11-10 北京市商汤科技开发有限公司 Gestural control method, device and terminal device
CN107590449A (en) * 2017-08-31 2018-01-16 电子科技大学 A kind of gesture detecting method based on weighted feature spectrum fusion
CN107835359A (en) * 2017-10-25 2018-03-23 捷开通讯(深圳)有限公司 Triggering method of taking pictures, mobile terminal and the storage device of a kind of mobile terminal
CN109710071A (en) * 2018-12-26 2019-05-03 青岛小鸟看看科技有限公司 A kind of screen control method and device
KR20190066660A (en) * 2017-12-06 2019-06-14 서울과학기술대학교 산학협력단 The system of conference live streaming broadcasting
CN110443208A (en) * 2019-08-08 2019-11-12 南京工业大学 A kind of vehicle target detection method, system and equipment based on YOLOv2
CN114360047A (en) * 2021-11-29 2022-04-15 深圳市鸿合创新信息技术有限责任公司 Hand-lifting gesture recognition method and device, electronic equipment and storage medium
CN116092113A (en) * 2021-10-29 2023-05-09 Tcl科技集团股份有限公司 Gesture recognition method, gesture recognition device, electronic equipment and computer readable storage medium

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140317576A1 (en) * 2011-12-06 2014-10-23 Thomson Licensing Method and system for responding to user's selection gesture of object displayed in three dimensions
CN102830801A (en) * 2012-08-03 2012-12-19 中国科学技术大学 Method and system for controlling digital signage by utilizing gesture recognition
CN103873696A (en) * 2014-03-27 2014-06-18 惠州Tcl移动通信有限公司 Method and system for operating mobile phone through gestures in different scene modes
CN107340852A (en) * 2016-08-19 2017-11-10 北京市商汤科技开发有限公司 Gestural control method, device and terminal device
CN107590449A (en) * 2017-08-31 2018-01-16 电子科技大学 A kind of gesture detecting method based on weighted feature spectrum fusion
CN107835359A (en) * 2017-10-25 2018-03-23 捷开通讯(深圳)有限公司 Triggering method of taking pictures, mobile terminal and the storage device of a kind of mobile terminal
KR20190066660A (en) * 2017-12-06 2019-06-14 서울과학기술대학교 산학협력단 The system of conference live streaming broadcasting
CN109710071A (en) * 2018-12-26 2019-05-03 青岛小鸟看看科技有限公司 A kind of screen control method and device
CN110443208A (en) * 2019-08-08 2019-11-12 南京工业大学 A kind of vehicle target detection method, system and equipment based on YOLOv2
CN116092113A (en) * 2021-10-29 2023-05-09 Tcl科技集团股份有限公司 Gesture recognition method, gesture recognition device, electronic equipment and computer readable storage medium
CN114360047A (en) * 2021-11-29 2022-04-15 深圳市鸿合创新信息技术有限责任公司 Hand-lifting gesture recognition method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US20210326597A1 (en) Video processing method and apparatus, and electronic device and storage medium
CN110532900B (en) Facial expression recognition method based on U-Net and LS-CNN
CN106407889B (en) Method for recognizing human body interaction in video based on optical flow graph deep learning model
CN111709310B (en) Gesture tracking and recognition method based on deep learning
CN114220035A (en) Rapid pest detection method based on improved YOLO V4
CN105243154B (en) Remote sensing image retrieval method based on notable point feature and sparse own coding and system
CN104573706A (en) Object identification method and system thereof
CN106408030A (en) SAR image classification method based on middle lamella semantic attribute and convolution neural network
CN111738344A (en) Rapid target detection method based on multi-scale fusion
CN110796018A (en) Hand motion recognition method based on depth image and color image
CN111401293A (en) Gesture recognition method based on Head lightweight Mask scanning R-CNN
KR20200010672A (en) Smart merchandise searching method and system using deep learning
CN113378770A (en) Gesture recognition method, device, equipment, storage medium and program product
CN110096991A (en) A kind of sign Language Recognition Method based on convolutional neural networks
CN115223239B (en) Gesture recognition method, gesture recognition system, computer equipment and readable storage medium
Zhang et al. A Gaussian mixture based hidden Markov model for motion recognition with 3D vision device
Fei et al. Flow-pose Net: An effective two-stream network for fall detection
CN113963333B (en) Traffic sign board detection method based on improved YOLOF model
CN114764941A (en) Expression recognition method and device and electronic equipment
Cao et al. Effective action recognition with embedded key point shifts
Abdulhamied et al. Real-time recognition of American sign language using long-short term memory neural network and hand detection
Cao Face recognition robot system based on intelligent machine vision image recognition
CN116311518A (en) Hierarchical character interaction detection method based on human interaction intention information
CN112926681B (en) Target detection method and device based on deep convolutional neural network
CN113191403A (en) Generation and display system of theater dynamic poster

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination