CN108470392B - Video data processing method - Google Patents

Video data processing method

Info

Publication number
CN108470392B
CN108470392B (granted from application CN201810254204.6A)
Authority
CN
China
Prior art keywords
user
target
face
cloud
control unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810254204.6A
Other languages
Chinese (zh)
Other versions
CN108470392A (en)
Inventor
李仁超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Jinhong network media Co.,Ltd.
Original Assignee
Guangzhou Jinhong Network Media Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Jinhong Network Media Co ltd filed Critical Guangzhou Jinhong Network Media Co ltd
Priority to CN201810254204.6A
Publication of CN108470392A
Application granted
Publication of CN108470392B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G07 - CHECKING-DEVICES
    • G07C - TIME OR ATTENDANCE REGISTERS; REGISTERING OR INDICATING THE WORKING OF MACHINES; GENERATING RANDOM NUMBERS; VOTING OR LOTTERY APPARATUS; ARRANGEMENTS, SYSTEMS OR APPARATUS FOR CHECKING NOT PROVIDED FOR ELSEWHERE
    • G07C9/00 - Individual registration on entry or exit
    • G07C9/30 - Individual registration on entry or exit not involving the use of a pass
    • G07C9/32 - Individual registration on entry or exit not involving the use of a pass in combination with an identity check
    • G07C9/37 - Individual registration on entry or exit not involving the use of a pass in combination with an identity check using biometric data, e.g. fingerprints, iris scans or voice recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 - Payment architectures, schemes or protocols
    • G06Q20/38 - Payment protocols; Details thereof
    • G06Q20/40 - Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401 - Transaction verification
    • G06Q20/4014 - Identity check for transactions
    • G06Q20/40145 - Biometric identity checks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 - Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Business, Economics & Management (AREA)
  • Human Computer Interaction (AREA)
  • Accounting & Taxation (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Health & Medical Sciences (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a video data processing method comprising the following steps: recognizing that a user has arrived at a card-swiping site and sending user identification information to the control unit of a settlement client; receiving the user identification information and sending a face video frame acquisition control instruction to the face recognition unit; capturing the user's facial frame with the face recognition unit and transmitting the facial video frame to the control unit; acquiring the facial frame captured by the face recognition unit and recognizing the user identifier; transmitting the user ID recorded at the card-swiping site, together with settlement data on the time and place of the user's payment, to the transaction cloud of the subway ticket-card settlement system; the transaction cloud then retrieves the user record by user ID, calculates the fare owed for entering and leaving the station according to a preset ticket-purchasing mode and the settlement data, and sends the fare value and ticket-purchasing mode to the passenger terminal. The method requires no additional IC card equipment for users, saving substantial equipment cost while improving both computational efficiency and passenger throughput.

Description

Video data processing method
Technical Field
The present invention relates to video recognition, and in particular, to a method for processing video data.
Background
In modern cities the subway, as a convenient, fast, stable and high-capacity mode of transport, is used ever more widely. With large numbers of passengers entering and leaving subway stations, keeping stations running efficiently and preventing crowding incidents is critical. Swiping a card in and out of a station, for example, requires the passenger to align a hand-held card with the sensing area, and long queues often form at the gates during rush hours, giving a poor user experience. Ticket-purchasing systems based on face recognition have been developed in the prior art, using pre-installed cameras to capture and recognize passengers. However, in indoor multi-target scenes the complex background, low image quality and variable appearance make it difficult to separate a user from the surrounding crowd with simple hand-selected features, so segmentation and recognition accuracy is low.
Disclosure of Invention
In order to solve the problems in the prior art, the present invention provides a method for processing video data, including:
Step 1: a settlement client of the subway ticket-card settlement system recognizes that a user has arrived at the card-swiping site and sends user identification information to the control unit of the settlement client;
Step 2: the control unit receives the user identification information sent by the settlement client's trigger unit and sends a face video frame acquisition control instruction to the face recognition unit;
Step 3: the face recognition unit, according to the face video frame acquisition control instruction sent by the control unit, captures the user's facial frame and transmits the facial video frame to the control unit;
Step 4: the control unit acquires the facial frame captured by the face recognition unit and recognizes the user identifier, then transmits the user ID recorded at the card-swiping site and settlement data on the time and place of the user's payment to the transaction cloud of the subway ticket-card settlement system;
Step 5: the transaction cloud retrieves the user record by user ID, calculates the fare the passenger owes for entering and leaving the station according to a preset ticket-purchasing mode and the settlement data, and sends the fare value and ticket-purchasing mode to the passenger terminal;
Step 6: the passenger terminal automatically pays the fare and uploads payment-completion information to the transaction cloud;
Step 7: the transaction cloud sends the payment-completion information to the control unit, and the control unit opens the barrier gate mechanism to let the passenger who has completed payment pass.
Compared with the prior art, the invention has the following advantages:
the invention provides a video data processing method, which does not need to increase IC equipment for users, saves a large amount of equipment cost, and improves the calculation efficiency and the passenger flow passing efficiency.
Drawings
Fig. 1 is a flowchart of a method for processing video data according to an embodiment of the present invention.
Detailed Description
A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details.
An aspect of the present invention provides a method for processing video data. Fig. 1 is a flowchart of a method for processing video data according to an embodiment of the present invention.
The subway ticket-card settlement system comprises settlement clients installed at all subway gates, an identity authentication cloud, a transaction cloud and passenger terminals. The passenger terminals and settlement clients are each in communication with the transaction cloud. A settlement client includes a trigger unit, a face recognition unit, an access-control (gate) system that controls the passage of users, and a control unit. The trigger unit recognizes that a user has arrived at the card-swiping site and sends user identification information to the control unit. The face recognition unit, according to the face video frame acquisition control instruction sent by the control unit, captures the user's facial frame and transmits the facial video frame to the control unit. The control unit receives the user identification information sent by the trigger unit and sends the face video frame acquisition control instruction to the face recognition unit, so as to control the face recognition unit to capture the user's facial frame and recognize the user identifier; it then transmits the user ID recorded at the card-swiping site, the time and place of the user's payment, and the settlement data relevant to the user's ticket purchase to the transaction cloud. The identity authentication cloud is where users register their personal IDs. The settlement data may include the mileage fare calculated from the user's entry time and exit time.
Before a passenger settles through a gate based on the system, the passenger registers personal information in the identity authentication cloud in advance, submits the facial image information of the user and associates the personal account. After the registration is successful, the user obtains the unique ID. The unique ID and the corresponding user information are stored in a database of the identity authentication cloud. When the user registers the personal information, the user can log in the identity authentication cloud through the passenger terminal to register the personal information. The transaction cloud is used for acquiring the user ID and the user information through the identity authentication cloud, the transaction cloud is provided with a ticket purchasing module, the ticket purchasing module calculates the charging required by the passengers entering and leaving the station according to a preset ticket purchasing mode, and sends the charging value and the ticket purchasing mode to the passenger terminal.
The passenger terminal enables a user to obtain the charging information and the ticket buying mode information, the passenger terminal automatically pays, and the payment completion information is uploaded to the transaction cloud. The control unit is also used for receiving confirmation information and authorization information sent by the transaction cloud, controlling the gate to be opened according to the confirmation information and the authorization information, and releasing passengers.
In this embodiment, data transmission between the transaction cloud and the identity authentication cloud works as follows: the transaction cloud may access and obtain the user IDs and user information stored in the identity authentication cloud, and only that data, while the identity authentication cloud cannot access the data stored in the transaction cloud. The transaction cloud periodically pulls user IDs and user facial frame information from the identity authentication cloud and updates its own database.
The transaction cloud can be used as an independent server of a service provider, is connected with the identity authentication cloud through the Internet, and synchronously updates the user ID and the user information in the database. The transaction cloud includes a ticketing module. And the ticket buying module calculates the amount according to the requirements of the ticket buying mode and the access site.
Correspondingly, the invention also provides a subway ticket card settlement method based on the passenger terminal, which comprises the following steps:
Step 1: the trigger unit recognizes that the user has arrived at the card-swiping site and sends the user identification information to the control unit;
Step 2: the control unit receives the user identification information sent by the trigger unit and sends a face video frame acquisition control instruction to the face recognition unit;
Step 3: the face recognition unit, according to the face video frame acquisition control instruction sent by the control unit, captures the user's facial frame and transmits the facial video frame to the control unit;
Step 4: the control unit acquires the facial frame captured by the face recognition unit and recognizes the user identifier, then transmits the user ID recorded at the card-swiping site and settlement data on the time and place of the user's payment to the transaction cloud;
Step 5: the transaction cloud retrieves the user record by user ID, calculates the fare the passenger owes for entering and leaving the station according to a preset ticket-purchasing mode and the settlement data, and sends the fare value and ticket-purchasing mode to the passenger terminal;
Step 6: the passenger terminal automatically pays the fare and uploads payment-completion information to the transaction cloud;
Step 7: the transaction cloud sends the payment-completion information to the control unit, and the control unit opens the barrier gate mechanism to let the passenger who has completed payment pass.
Prior to step 1, the method further comprises: the user accesses the identity authentication cloud, registers personal information and user information, and associates a personal account; after the registration is successful, the user obtains a unique identity ID; the identity ID, the personal information submitted by the user and the corresponding user information are stored in a database of the identity authentication cloud.
After the camera is started and scans the video frame to be identified, the facial frame located in the scanning area of that frame is acquired and its characteristic pixel points are extracted to generate the feature set to be identified. Specifically, a corresponding scale space is generated from the facial frame, local extreme points in the scale space are detected, and the extreme points are then precisely located by removing points whose contrast falls below a threshold as well as edge-response points, finally yielding characteristic pixel points that reflect the features of the facial frame.
When describing the characteristic pixel points, the principal direction of each extreme point is computed, a gradient-orientation histogram is accumulated over the region centred on the extreme point, and a feature descriptor is generated. The characteristic pixel points then form the feature set to be identified.
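The scale-space extrema and gradient-orientation-histogram descriptors described above correspond to a SIFT-style pipeline. A minimal sketch, assuming OpenCV's SIFT implementation as a stand-in (the patent names no specific library):

```python
import cv2
import numpy as np

def extract_feature_set(face_frame_gray: np.ndarray):
    """Detect scale-space extrema and build gradient-orientation descriptors.

    Assumes a SIFT-like detector plays the role of the patent's characteristic
    pixel point extraction; low-contrast and edge-response points are rejected
    via the contrastThreshold / edgeThreshold settings.
    """
    sift = cv2.SIFT_create(contrastThreshold=0.04, edgeThreshold=10)
    keypoints, descriptors = sift.detectAndCompute(face_frame_gray, None)
    # The "feature set to be identified": one 128-d descriptor per keypoint.
    return keypoints, descriptors
```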
A sample feature set is obtained from the identity authentication cloud and the feature set to be identified is matched against it. Matching may proceed as follows: count the number of characteristic pixel points successfully matched between the feature set to be identified and the sample feature set as the first matching-pair count, take the number of target characteristic pixel points in the sample feature set as the first count, and compute the ratio of the first matching-pair count to the first count as the similarity. The similarity is then compared with a second threshold; if it exceeds the second threshold, the sample feature set is judged to have matched successfully.
If the match succeeds, the feature set to be identified is further matched against the verification feature set associated with the matched sample feature set to compute the identification similarity. The number of successfully matched characteristic pixel points is counted as the second matching-pair count, the number of characteristic pixel points in the feature set to be identified is taken as the second count, and the number of verification characteristic pixel points in the verification feature set is taken as the third count. The identification similarity is the ratio of the second matching-pair count to the smaller of the second and third counts.
Finally, if the identification similarity exceeds a first threshold, the video frame to be identified is determined to contain the target recognition user corresponding to the sample feature set. Concretely: first check whether the identification similarity exceeds the first threshold; if so, count how many verification feature sets exceed the first threshold. If more than one does, take the sample feature set associated with the verification feature set of highest identification similarity. If no verification feature set exceeds the first threshold, the target recognition user corresponding to the sample feature set is judged absent from the video sequence.
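A minimal sketch of the two-stage decision just described; the brute-force matcher, ratio test and threshold values are assumptions not specified in the patent:

```python
import cv2

def match_count(desc_a, desc_b, ratio=0.75):
    """Count descriptor pairs passing Lowe's ratio test (an assumed matching rule)."""
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(desc_a, desc_b, k=2)
    return sum(1 for pair in knn
               if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance)

def identify(query_desc, sample_desc, verify_desc,
             second_threshold=0.4, first_threshold=0.5):
    # Stage 1: similarity = matched pairs / number of target feature points.
    first_matches = match_count(query_desc, sample_desc)
    similarity = first_matches / len(sample_desc)
    if similarity <= second_threshold:
        return False, 0.0
    # Stage 2: identification similarity = matched pairs / min(query size, verification size).
    second_matches = match_count(query_desc, verify_desc)
    ident_sim = second_matches / min(len(query_desc), len(verify_desc))
    return ident_sim > first_threshold, ident_sim
```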
In the above video identification process, a sample feature set for feature matching needs to be generated in advance. Firstly, a to-be-processed facial frame is obtained, the to-be-processed facial frame comprises a target identification user, the target identification user comprises a target feature object and at least one verification feature object, a sample feature set is formed by target feature pixel points, feature pixel points of the verification feature object in a to-be-processed picture are extracted to serve as verification feature pixel points, the verification feature pixel points form a verification feature set, and the sample feature set of the verification feature object is obtained. And finally, associating the sample feature set with the verification feature set to form a sample feature set, wherein the sample feature set corresponds to the target recognition user. After all the facial frames to be processed are preprocessed to generate corresponding sample feature sets, all the sample feature sets are stored in the identity authentication cloud.
In the process of capturing the facial frame of the user by the face recognition unit, in order to reconstruct the background in a motion scene and effectively avoid the mixing phenomenon of a target and the background, the following method is adopted in the target positioning process:
(1) establish the video gray-level two-dimensional vector;
(2) determine current-frame and background pixel points using the symmetric adjacent-frame difference;
(3) count and update the two-dimensional vector according to the determined background pixel points;
(4) construct the entire initial background.
Let the size of the input video frame be M x N. A two-dimensional vector LM is created, where each element LM(p, l) records the total number of times the pixel at position p takes the gray value l (0 <= l <= 255) over the video frames. Let the video sequence be (I_0, I_1, I_2, ..., I_{T+1}), and let I(p, t-1), I(p, t), I(p, t+1) denote the pixel values at point p in frames t-1, t and t+1 of the T+2 frames. The backward and forward mask maps of frame t are then:

D_{-1}(p, t) = 1 if |I(p, t) - I(p, t-1)| > Th_{-1}(t), and 0 otherwise

D_{+1}(p, t) = 1 if |I(p, t+1) - I(p, t)| > Th_{+1}(t), and 0 otherwise

where t = 1, 2, ..., T, and Th_{-1}(t), Th_{+1}(t) are the thresholds for deciding whether the pixel value at point p has changed.
A logical AND of D_{+1}(p, t) and D_{-1}(p, t) gives the mask map of moving pixels:

OB(p, t) = D_{+1}(p, t) AND D_{-1}(p, t)

For any point p, if OB(p, t) = 1, i.e. D_{+1}(p, t) and D_{-1}(p, t) are both 1, then p is a foreground (recognition-target) pixel; otherwise p is a background pixel.
The two-dimensional vector LM is then counted and updated: if OB(p, t) = 0 at point p, the occurrence count of the gray value l observed at p is incremented by 1; otherwise no update is made.
Steps (2) and (3) are repeated over the selected T+2 frames. LM is then tallied per gray value, and for each pixel p the gray value with the most occurrences is taken as its initial background value, completing the initial background B(p):

B(p) = argmax_l LM(p, l)
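A numpy sketch of the initialization above: symmetric frame differencing marks moving pixels, the per-pixel histogram LM is updated only at background pixels, and the most frequent gray value becomes the initial background (the difference threshold is an assumed value):

```python
import numpy as np

def build_initial_background(frames: np.ndarray, th: int = 15) -> np.ndarray:
    """frames: (T+2, M, N) uint8 grayscale sequence; returns the (M, N) background B(p)."""
    T = frames.shape[0] - 2
    M, N = frames.shape[1:]
    LM = np.zeros((M, N, 256), dtype=np.int32)        # per-pixel gray-level histogram
    for t in range(1, T + 1):
        prev = frames[t - 1].astype(int)
        cur = frames[t].astype(int)
        nxt = frames[t + 1].astype(int)
        d_back = np.abs(cur - prev) > th              # D_{-1}(p, t)
        d_fwd = np.abs(nxt - cur) > th                # D_{+1}(p, t)
        ob = d_back & d_fwd                           # OB(p, t): moving-pixel mask
        ys, xs = np.nonzero(~ob)                      # background pixels of this frame
        LM[ys, xs, frames[t][ys, xs]] += 1            # count occurrences of value l at p
    return np.argmax(LM, axis=2).astype(np.uint8)     # B(p): most frequent gray value
```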
After the initial background has been built, the background is adaptively updated as each new frame arrives, using the target detection and tracking information and the following three labels:
(a) background pixel label gs(p): the number of times pixel p was a background pixel in the previous N frames;
(b) recognition-target label ms(p): the number of times pixel p was classified as a moving pixel;
(c) change-history label hs(p): the number of frames that have elapsed since pixel p was last marked as a foreground pixel.
Let I_t^M(p) denote the pixels of the recognition target in frame t, I_t^B(p) the background pixels, I_c^BK(p) the background pixel currently in use, and I^BK(p) the new background pixel. The update criterion is:

if gs(p) > k * N, then I^BK(p) = I_t^B(p);

if gs(p) < k * N and ms(p) < r * N, then I^BK(p) = I_t^M(p);

otherwise, I^BK(p) = I_c^BK(p).
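A sketch of this three-label update applied pixel-wise; the values of k, r and the window length N, and the use of the current frame as the source of both I_t^B and I_t^M, are assumptions:

```python
import numpy as np

def update_background(bg_current, frame, ob_mask, gs, ms, k=0.8, r=0.2, N=50):
    """bg_current: current background I_c^BK; frame: current frame; ob_mask: moving-pixel mask.

    gs, ms are per-pixel counts of background / moving classifications; a proper
    sliding window over the last N frames is omitted to keep the sketch short.
    """
    new_bg = bg_current.copy()
    stable_bg = gs > k * N                       # reliably background: take the frame's value
    adopt_target = (gs < k * N) & (ms < r * N)   # rarely background or moving: adopt target value
    new_bg[stable_bg] = frame[stable_bg]         # I^BK(p) = I_t^B(p)
    new_bg[adopt_target] = frame[adopt_target]   # I^BK(p) = I_t^M(p)
    gs = gs + (~ob_mask).astype(gs.dtype)        # update counters for the next frame
    ms = ms + ob_mask.astype(ms.dtype)
    return new_bg, gs, ms
```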
Extraction of the recognition-target region is carried out on the target mask image B. A search matrix DB, a connected-domain matrix DF and a label matrix flag_{W x H}, all of the same size as B, are created; DB and DF are initialized to 0 and the connected-domain label value L is initialized to 1. Each row and column of B is scanned, and each scanned pixel is marked DB = 1. When a seed point p1 with B = 1 and DB = 0 is found, flag_{W x H}(p) is set to L (L = 1, 2, ..., the connected-domain label value). An eight-neighbourhood search is then carried out from that point, marking every point satisfying B = 1 and DB = 0, until the whole region has been labelled. The qualifying points are marked in the connected-domain matrix DF by setting DF = 1; points inside the connected region keep the label L, and L is finally incremented to L + 1.
Once the first region has been labelled, scanning of the image continues, looking for the next point with B = 1 and DB = 0; if that point is not the last point, scanning of every row and column of B continues.
When connected-domain labelling is complete, the position and area information of each region is obtained at the same time, to support subsequent feature extraction and motion-region computation.
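A sketch of the seed-fill labelling: scan for an unvisited foreground pixel, flood-fill its eight-neighbourhood, record the connected-domain label and bounding box, then move to the next seed:

```python
import numpy as np
from collections import deque

def label_regions(B: np.ndarray):
    """8-connected labelling of the binary target mask B; returns (label map, region info)."""
    H, W = B.shape
    DB = np.zeros_like(B, dtype=bool)           # visited ("search") map
    flag = np.zeros_like(B, dtype=np.int32)     # connected-domain labels
    regions, L = [], 1
    for y in range(H):
        for x in range(W):
            if B[y, x] == 1 and not DB[y, x]:   # seed point p1
                queue, pixels = deque([(y, x)]), []
                DB[y, x] = True
                while queue:
                    cy, cx = queue.popleft()
                    flag[cy, cx] = L
                    pixels.append((cy, cx))
                    for dy in (-1, 0, 1):       # eight-neighbourhood search
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if 0 <= ny < H and 0 <= nx < W and B[ny, nx] == 1 and not DB[ny, nx]:
                                DB[ny, nx] = True
                                queue.append((ny, nx))
                ys, xs = zip(*pixels)
                regions.append({"label": L, "area": len(pixels),
                                "bbox": (min(ys), min(xs), max(ys), max(xs))})
                L += 1
    return flag, regions
```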
For object recognition in complex scenes, a video-frame preprocessing step is preferably performed before recognition. It mainly consists of detecting the target edges, as follows:
inputting a video frame subjected to gray processing, presetting an integral attenuation parameter and an attenuation coefficient, presetting a short-time FFT filter group of a plurality of direction parameters uniformly distributed along the circumference, and performing short-time FFT filtering on each pixel point in the video frame according to each direction parameter to obtain a short-time FFT energy value of each pixel point in each direction; selecting the maximum value in the short-time FFT energy values of all directions of each pixel point;
for each pixel point, carrying out segmentation processing on the maximum value in the short-time FFT energy values of each direction of each pixel point;
constructing a group of temporary windows by using a Gaussian difference template, wherein each temporary window has different deviation angles relative to a video picture window; for each pixel point, integrating and regularizing the temporary window response and a Gaussian difference template to obtain a group of regularized weight functions;
for each pixel point, under different deflection angles, multiplying the regularized weight function by the maximum value in the segmented short-time FFT energy values in each direction in the Gaussian difference template, and then summing to obtain the short-time FFT energy maximum value approximation result of each pixel point under each deflection angle; solving a standard deviation of a short-time FFT energy maximum value approximation result of each pixel point at each deflection angle;
for each pixel point, calculating by combining the standard deviation of the short-time FFT energy maximum value approximation result under each deflection angle and the integral attenuation parameter to obtain a standard deviation weight; multiplying the standard deviation weight value with the minimum value of the short-time FFT energy maximum value approximation result under each deflection angle to obtain the final result of the short-time FFT energy maximum value of the pixel point;
for each pixel, the final short-time FFT energy maximum is computed from the per-direction energy maxima and the attenuation coefficient, giving the edge-identification value of that pixel; non-maximum suppression and binarization are then applied to the edge-identification values of all pixels of the video frame to obtain the edge-identification image of the video frame.
The calculation of the maximum value in the short-time FFT energy values of each direction specifically includes:
The two-dimensional short-time FFT filter function is defined as:

f(x, y) = exp( -(x'^2 + gamma^2 * y'^2) / (2 * sigma^2) ) * cos( 2 * pi * x' / lambda + phi )

where

x' = x * cos(theta) + y * sin(theta), y' = -x * sin(theta) + y * cos(theta)

gamma is a constant giving the ratio of the long axis to the short axis of the elliptical field, lambda is the wavelength, sigma is the standard deviation of the short-time FFT function and the bandwidth of the Gaussian-difference template window, 1/lambda is the spatial frequency of the cosine term, sigma/lambda is the spatial-frequency bandwidth, phi is the phase-angle parameter, and theta is the orientation parameter of the short-time FFT filtering.
For each orientation the filter response is computed as

e_i(x, y) = I(x, y) * f_i(x, y)

where I(x, y) is the video frame and * denotes convolution. Then

E(x, y; sigma) = max{ e_i(x, y) | i ∈ [1, N_theta] }

where E(x, y; sigma) is the maximum short-time FFT filtering energy over all orientations at pixel (x, y), and N_theta is the number of orientations theta.
The maximum of the segmented short-time FFT energies over all orientations is computed as follows: E(x, y; sigma) is segmented using an upper-limit proportion and a lower-limit proportion. The E(x, y; sigma) values of all pixels are sorted from small to large; the largest value within the fraction given by the upper-limit proportion is taken as Q_H, and the largest value within the fraction given by the lower-limit proportion is taken as Q_L. The segmented per-direction energy maximum, written E_hat(x, y; sigma) below, is then derived from E(x, y; sigma) using the thresholds Q_L and Q_H (the exact expression appears in the original only as an image formula).
The Gaussian-difference template is parameterized by k, which controls the size of the template. The temporary-window response is derived from the Gaussian-difference template, where d denotes the distance from the centre of the video picture to the temporary window; for each pixel, integrating and regularizing the temporary-window responses with the Gaussian-difference template gives a group of regularized weight functions w_phi(x', y'), one per deflection angle phi. (The explicit expressions for the Gaussian-difference template, the temporary-window response and the regularized weight function are given in the original only as image formulas.)
The short-time FFT energy-maximum approximation of each pixel at deflection angle phi_i is obtained by multiplying the regularized weight function with the segmented per-direction energy maximum over the Gaussian-difference template region and summing:

S(x, y; phi_i) = sum over (x', y') of w_phi_i(x', y') * E_hat(x + x', y + y'; sigma)

where -3*k*sigma < x' < 3*k*sigma and -3*k*sigma < y' < 3*k*sigma give the range of the Gaussian-difference template.
The mean Ave(x, y) and standard deviation STD(x, y) of the energy-maximum approximations over the N_phi deflection angles are

Ave(x, y) = (1 / N_phi) * sum over i of S(x, y; phi_i)

STD(x, y) = sqrt( (1 / N_phi) * sum over i of ( S(x, y; phi_i) - Ave(x, y) )^2 )
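The oriented filter defined above has the form of a Gabor filter. A minimal sketch of computing the per-pixel maximum energy E(x, y; sigma) over N_theta orientations, using OpenCV's Gabor kernels as a stand-in (all parameter values are assumptions):

```python
import cv2
import numpy as np

def max_orientation_energy(gray: np.ndarray, n_theta: int = 8,
                           sigma: float = 2.0, lambd: float = 6.0,
                           gamma: float = 0.5, psi: float = 0.0) -> np.ndarray:
    """E(x, y; sigma) = max over orientations of |I * f_theta|."""
    gray = gray.astype(np.float32)
    ksize = int(6 * sigma) | 1                       # odd kernel size covering about 3 sigma
    energy = np.zeros_like(gray)
    for i in range(n_theta):
        theta = i * np.pi / n_theta                  # orientations evenly spread over the circle
        kern = cv2.getGaborKernel((ksize, ksize), sigma, theta, lambd, gamma, psi)
        resp = cv2.filter2D(gray, cv2.CV_32F, kern)  # e_i(x, y) = I(x, y) * f_i(x, y)
        energy = np.maximum(energy, np.abs(resp))
    return energy
```

Segmentation with Q_L / Q_H, the weighted-sum approximation S and the final suppression step would follow this map pixel-wise.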
when the collected video frame information is analyzed based on the content, the method adopts the deep neural network to extract the crowd characteristics in the scene in real time, associates the crowd characteristics with the corresponding time information labels, and calculates the projection vector according to the position and the angle of the shooting equipment calibrated in advance so as to realize the conversion from a plurality of pixel coordinates to a uniform three-dimensional coordinate and associate the pixel coordinates with the three-dimensional coordinate labels. The method comprises two training steps: firstly, training a human body detector, then carrying out network compression to reduce the number of layers and channels and weight aggregation, and retraining according to the previous detection result to obtain a detector suitable for the current visual angle; specific feature detection is added on the basis of a crowd detection algorithm, and local features are described to serve as supplementary features of the overall features. Then, for each photographing device, a lightweight DNN based on the perspective is trained. And calibrating corresponding time information according to each target detection result, and calculating a projection vector by means of the position and the angle of the shooting equipment calibrated in advance, so that mapping from pixel coordinates to a three-dimensional position is realized, and the mapping is related to a three-dimensional coordinate label. Then, the mapping of the target from the pixel space to the three-dimensional space is realized through the three-dimensional position and the projection vector of the photographing device, and the conversion from a plurality of pixel coordinates to unified three-dimensional coordinates is realized.
And according to the crowd characteristics, carrying out single-lens tracking on the corresponding human body target to generate a human body target tracking path, and converting the human body target tracking path into a coordinate path of a three-dimensional space through coordinate mapping.
The identity authentication cloud receives a human body target tracking path returned by the settlement client, and aggregates the human body target tracking path to obtain an aggregated path, wherein the aggregated path specifically comprises the following steps:
(1) target-path discontinuities caused by occlusion and illumination problems are handled, and a continuous path is obtained through feature comparison;
(2) according to the motion-direction information of the target projection, the coverage of surrounding cameras is searched in the three-dimensional space, weights are assigned to the cameras according to maximum likelihood, and targets are aggregated based on these weights.
And the identity authentication cloud respectively samples the human body target tracking path under each single lens according to the aggregation path obtained in the last step to serve as a characteristic basic library of the human body target, and corresponds the multi-lens aggregated target to the same library ID.
Sampling the human-body target tracking path under each single lens comprises sampling the sequence along the target path; a unified library-ID management scheme is also established for multi-lens aggregated targets.
The identity authentication cloud receives the crowd image to be retrieved, the features of the crowd image are extracted through DNN to serve as retrieval features, the retrieval features are compared with the stored feature base libraries, the successfully compared human body target paths are searched, the human body target paths are ranked according to the matching degree, and the retrieval result is returned.
Preferably, searching the successfully compared human body target paths, and sorting according to the matching degree comprises: according to the input crowd image to be retrieved, a two-stage retrieval mechanism is adopted, firstly, the target position with the highest matching degree is obtained, and then, retrieval is preferentially carried out on the basis of the periphery of the target.
In the process of constructing DNN, the whole DNN network is divided into a convolutional layer, a positioning layer and a matching layer, and the concrete analysis is as follows:
the convolution layer adopts a structure of 5 layers of convolution layers, Relu activating functions are used between the layers, and a maximum value cache layer is added after the first two layers of convolution layers. A series of image feature maps can be extracted through the convolutional layer, and a cache layer next to the last layer of the image is changed into the following mode, so that the finally obtained feature maps are uniform in size: if the final feature size requirement is W0,H0And when the size of the current feature map is { W, h }, defining the size of the current feature map as { W0/w,H0The sliding window of/h performs maximum value buffering processing.
The positioning layer slides a window over each feature map obtained above, and a low-dimensional feature is extracted for each sliding window. The feature map is sampled at multiple scales so that objects of different sizes can be captured: K candidate sliding windows are extracted around the centre point of each sliding window, so at most w * h * K candidate windows are extracted from a feature map of size w * h. The K possibilities combine a area scales and b aspect ratios, i.e. K = a * b. The extracted low-dimensional features are then fed into a sliding-window regression layer and a sliding-window scoring layer, which respectively output the position corrections of the K candidate windows extracted at each centre point and the scores indicating whether each candidate window belongs to a foreground target; both can be implemented as parallel 1x1 fully convolutional layers. The sliding-window regression layer further corrects the position of each candidate window, outputting corrections to its top-left corner and to its width and height; a separate regressor is built for each of the K candidate window types, i.e. the K regressors do not share weights, so candidate regions of different sizes can be predicted for each 3x3 sliding window. The sliding-window scoring layer judges whether each candidate window belongs to the target detection region and outputs its foreground and background scores. Finally, non-maximum suppression is applied to all extracted candidate windows, regions with high overlap are removed, and the N highest-scoring candidate windows are kept as candidate region proposals for the final target classification.
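A sketch of generating the K = a x b candidate sliding windows (anchors) at each sliding-window centre of a w x h feature map; the stride, scales and aspect ratios are illustrative assumptions:

```python
import numpy as np

def generate_anchors(feat_w, feat_h, stride=16,
                     scales=(64, 128, 256), ratios=(0.5, 1.0, 2.0)):
    """Return (feat_w * feat_h * K, 4) boxes as (x1, y1, x2, y2); K = len(scales) * len(ratios)."""
    anchors = []
    for fy in range(feat_h):
        for fx in range(feat_w):
            cx, cy = (fx + 0.5) * stride, (fy + 0.5) * stride   # window centre in image coordinates
            for s in scales:                                    # a area scales
                for r in ratios:                                # b aspect ratios
                    w, h = s * np.sqrt(r), s / np.sqrt(r)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(anchors, dtype=np.float32)
```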
The matching layer classifies the candidate regions produced by the positioning layer and further refines their positions. The features of each candidate region are obtained by locating the region within the shared feature map, so the network only needs to compute the feature map of the whole facial frame once and the positioning layer and matching layer share the feature maps extracted by the convolutional layers. After two fully connected layers, the features are fed into a classification layer and a position-adjustment layer, which output the class scores and the position corrections of the candidate region, respectively.
After the whole DNN framework is constructed, the regression loss function of the positioning layer and the classification loss function of the matching layer are defined, giving an overall objective function for the network and enabling global end-to-end training. For supervised training the training set must be annotated with the object class and the object position. For the K candidate sliding windows extracted at each 3x3 sliding-window position, a candidate whose overlap with an annotated box exceeds 0.8 is defined as a positive sample, one whose overlap is below 0.3 as a negative sample, and the rest are discarded.
The overlap (intersection-over-union) is defined as:

Cm = (ML ∩ CD) / (ML ∪ CD)

where ML is the annotated (label) box and CD is the candidate sliding window. Cm is the ratio of the area of their overlap to the area of their union: Cm = 1 when the candidate window and the label coincide completely, and Cm = 0 when they do not overlap at all.
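A sketch of the overlap measure Cm (IoU) between a labelled box ML and a candidate window CD:

```python
def cm_iou(ml, cd):
    """ml, cd: boxes as (x1, y1, x2, y2). Returns intersection area / union area."""
    ix1, iy1 = max(ml[0], cd[0]), max(ml[1], cd[1])
    ix2, iy2 = min(ml[2], cd[2]), min(ml[3], cd[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_ml = (ml[2] - ml[0]) * (ml[3] - ml[1])
    area_cd = (cd[2] - cd[0]) * (cd[3] - cd[1])
    union = area_ml + area_cd - inter
    return inter / union if union > 0 else 0.0
```

Candidates with Cm above 0.8 against an annotated box become positive samples, those below 0.3 negative samples, as stated above.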
The classification loss function is defined as:

L_p(p_i, p_i*) = -log[ p_i* p_i + (1 - p_i*)(1 - p_i) ]

where p_i is the predicted score of the i-th candidate sliding window being the target, i.e. the probability that it belongs to the target, and p_i* is the training label, equal to 1 when the candidate window is a positive sample and 0 when it is a negative sample.
The regression loss function of the sliding-window regression network is defined as:

L_r(t_i, t_i*) = p_i* R(t_i - t_i*)

where t_i = {t_x, t_y, t_w, t_h} denotes the regressed position-coordinate information of the i-th candidate sliding window and t_i* = {t_x*, t_y*, t_w*, t_h*} denotes the position-coordinate information of the positive-sample window.
During training, the factor p_i* in the loss ensures that the regression loss is computed only when the sliding window is a positive sample.
The function R is a robust distance function; its explicit form appears in the original only as an image formula.
Given the classification loss and the regression loss, the loss function of the positioning layer is defined as:

L({p_i}, {t_i}) = sum over i of L_p(p_i, p_i*) + lambda * sum over i of L_r(t_i, t_i*)

where p ∈ {p_i}, t ∈ {t_i}, and lambda is a weighting parameter balancing the two sub-losses.
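A sketch of this positioning-layer loss. Since R is given only as an image, the robust smooth-L1 form used below is an assumption, as is the value of lambda:

```python
import numpy as np

def smooth_l1(x):
    """Assumed robust form of R; the patent shows R only as an image formula."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x * x, ax - 0.5)

def localization_loss(p, p_star, t, t_star, lam=1.0, eps=1e-7):
    """p, p_star: (n,) predicted objectness scores / 0-1 labels; t, t_star: (n, 4) box offsets."""
    p = np.clip(p, eps, 1.0 - eps)
    l_cls = -np.log(p_star * p + (1.0 - p_star) * (1.0 - p))   # L_p per candidate window
    l_reg = p_star * smooth_l1(t - t_star).sum(axis=1)         # L_r gated by p_i*, positives only
    return l_cls.sum() + lam * l_reg.sum()
```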
The matching layer also comprises two parts: candidate-region scoring and region regression. If the network is to build a classifier distinguishing M classes, then after each candidate region passes through the matching layer it receives a score for each of the M classes and a score for the background, i.e. M + 1 score values that sum to 1; each score value represents the probability that the candidate region belongs to that class, i.e. c = {c_0, c_1, ..., c_M}.
And training the network by adopting a training set of the calibrated facial feature categories and the position information, thereby obtaining a network model for positioning and identifying the facial features. In training, if the candidate sliding windows are from the same face frame, the results of the previous convolutional layer computation may be shared. Because the network mainly comprises three parts of networks, a layer-by-layer progressive training mode is adopted, and the method specifically comprises the following steps:
1) The convolutional layers are trained first, initialized by transfer from a pre-trained model. 2) The positioning layer is then added on top of the trained convolutional layers; the convolutional-layer parameters are fixed, the positioning-layer parameters are randomly initialized, and they are adjusted according to the positioning-layer loss defined above. 3) The matching layer is then added; the convolutional-layer and positioning-layer parameters are fixed, the matching-layer parameters are randomly initialized, and they are learned according to the matching-layer loss defined above. 4) Finally, the whole network is fine-tuned end to end according to the global network loss, giving the final training result.
After the network has been trained on the annotated training set of facial-feature classes and positions, a network model is obtained that contains the weight values of every layer in the DNN. In practical use, a captured facial-feature image is fed through the network in a forward pass, and the network outputs the N position-corrected candidate regions together with their class scores.
The N candidate regions are post-processed to obtain the final recognition result: 1) each candidate region is scored over the M + 1 classes and assigned the class with the highest score; 2) candidate regions of the same class are de-duplicated: the overlap Cm is computed pairwise, and whenever Cm exceeds 0.7 only the higher-scoring region is kept; 3) since the facial features do not overlap one another, the remaining candidate regions are de-duplicated across all classes, giving the network's final localization and recognition result.
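A sketch of this post-processing, reusing the cm_iou helper sketched earlier; the assumption that column 0 of the score matrix is the background class is illustrative:

```python
def postprocess(boxes, class_scores, cm_iou, overlap_thresh=0.7):
    """boxes: (N, 4); class_scores: (N, M+1) with column 0 = background (assumed layout)."""
    best_cls = class_scores.argmax(axis=1)
    best_score = class_scores.max(axis=1)
    keep = [i for i in range(len(boxes)) if best_cls[i] != 0]   # drop background picks
    keep.sort(key=lambda i: best_score[i], reverse=True)        # high score first
    final = []
    for i in keep:
        # Reject i if it overlaps an already-kept region too much; facial features
        # are assumed non-overlapping, so de-duplication is applied across all classes.
        if all(cm_iou(boxes[i], boxes[j]) <= overlap_thresh for j in final):
            final.append(i)
    return [(int(i), int(best_cls[i]), float(best_score[i])) for i in final]
```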
In summary, the present invention provides a method for processing video data, which does not require an additional IC device for a user, saves a lot of device costs, and improves the computation efficiency and the passenger traffic efficiency.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented in a general purpose computing system, centralized on a single computing system, or distributed across a network of computing systems, and optionally implemented in program code that is executable by the computing system, such that the program code is stored in a storage system and executed by the computing system. Thus, the present invention is not limited to any specific combination of hardware and software.
It is to be understood that the above-described embodiments of the present invention are merely illustrative of or explaining the principles of the invention and are not to be construed as limiting the invention. Therefore, any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the present invention should be included in the protection scope of the present invention. Further, it is intended that the appended claims cover all such variations and modifications as fall within the scope and boundaries of the appended claims or the equivalents of such scope and boundaries.

Claims (1)

1. A method for processing video data, comprising:
step 1: the user accesses the identity authentication cloud, registers personal information and user information, and associates a personal account; after the registration is successful, the user obtains a unique identity ID; the identity ID, the personal information submitted by the user and the corresponding user information are stored in a database of the identity authentication cloud; a settlement client of the subway ticket card settlement system identifies that a user arrives at a card swiping station and sends user identification information to a control unit of the settlement client;
step 2: the control unit receives the user identification information sent by the settlement client triggering unit and sends a face video frame acquisition control instruction to the face identification unit;
step 3: the face recognition unit, according to the face video frame acquisition control instruction sent by the control unit, captures the facial frame of the user and transmits the facial frame of the user to the control unit;
step 4: the control unit acquires the facial frame of the user captured by the face recognition unit and recognizes the user identifier; the user ID recorded at the card-swiping site and settlement data on the time and place of the user's payment are transmitted to the transaction cloud of the subway ticket-card settlement system;
step 5: the transaction cloud calculates, according to the user ID, a preset ticket-purchasing mode and the settlement data, the fare the passenger must pay for entering and leaving the station, and sends the fare value and the ticket-purchasing mode to the passenger terminal;
step 6: the passenger terminal automatically pays the fare and uploads payment-completion information to the transaction cloud;
step 7: the transaction cloud sends the payment-completion information to the control unit, and the control unit opens the barrier mechanism to let passengers who have completed payment pass;
the method further comprises the steps of obtaining a sample feature set from the identity authentication cloud, and carrying out feature matching on the feature set to be identified and the sample feature set; the sample feature set for feature matching needs to be generated in advance: firstly, acquiring a to-be-processed facial frame, wherein the to-be-processed facial frame comprises a target identification user, the target identification user comprises a target characteristic object and at least one verification characteristic object, forming target characteristic pixel points into a sample characteristic set, extracting characteristic pixel points of the verification characteristic object in a to-be-processed picture as verification characteristic pixel points, and forming the verification characteristic pixel points into verification characteristic sets to obtain the sample characteristic set of the verification characteristic object; finally, the sample feature set and the verification feature set are associated to form an associated sample feature set, and the associated sample feature set corresponds to the target recognition user; after all the facial frames to be processed are preprocessed to generate corresponding associated sample feature sets, all the sample feature sets are stored in an identity authentication cloud;
the identity authentication cloud receives a human body target tracking path returned by the settlement client, and aggregates the human body target tracking path to obtain an aggregated path, wherein the aggregated path specifically comprises the following steps:
(1) processing target path discontinuity caused by shielding and illumination problems, and realizing continuous path depiction through feature comparison;
(2) according to the motion direction information of the target projection, searching surrounding photographing equipment coverage in a three-dimensional space, endowing weight values to the photographing equipment according to the maximum possibility, and performing target aggregation based on the weight values;
the identity authentication cloud respectively samples a human body target tracking path under each single lens according to the aggregation path obtained in the last step to serve as a characteristic basic library of the human body target, and the multiple lens aggregated targets correspond to the same library ID;
wherein, sampling is carried out to human target tracking path under every single-lens, includes: performing sequence sampling through a target path; and a multi-shot target unified library ID management method is set;
the identity authentication cloud receives a crowd image to be retrieved, the characteristics of the crowd image are extracted through DNN to serve as retrieval characteristics, the retrieval characteristics are compared with a plurality of stored characteristic base libraries, human body target paths which are successfully compared are searched, the human body target paths are ranked according to the matching degree, and a retrieval result is returned;
wherein, searching and comparing successfully human target paths, and sequencing according to the matching degree comprises: according to the input crowd image to be retrieved, a two-stage retrieval mechanism is adopted, firstly, the target position with the highest matching degree is obtained, and then, retrieval is preferentially carried out on the basis of the periphery of the target.
CN201810254204.6A 2018-03-26 2018-03-26 Video data processing method Active CN108470392B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810254204.6A CN108470392B (en) 2018-03-26 2018-03-26 Video data processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810254204.6A CN108470392B (en) 2018-03-26 2018-03-26 Video data processing method

Publications (2)

Publication Number Publication Date
CN108470392A CN108470392A (en) 2018-08-31
CN108470392B true CN108470392B (en) 2021-03-02

Family

ID=63265872

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810254204.6A Active CN108470392B (en) 2018-03-26 2018-03-26 Video data processing method

Country Status (1)

Country Link
CN (1) CN108470392B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111223220B (en) * 2018-11-26 2022-07-12 阿里巴巴集团控股有限公司 Image correlation method and device and server
CN111027929B (en) * 2019-12-03 2023-08-29 交控科技股份有限公司 Subway ticket sorting method and device

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090232365A1 (en) * 2008-03-11 2009-09-17 Cognimatics Ab Method and device for face recognition
KR101222100B1 (en) * 2011-06-28 2013-01-15 고려대학교 산학협력단 Apparatus for detecting frontal face
CN106056403B (en) * 2016-05-23 2019-09-24 青岛博宁福田智能交通科技发展有限公司 A kind of rail traffic expense of riding determines method and system
CN206209937U (en) * 2016-12-01 2017-05-31 长春理工大学光电信息学院 A kind of entrance guard device with recognition of face
CN106887058A (en) * 2017-01-09 2017-06-23 北京微影时代科技有限公司 Face identification method, device, access management system and gate

Also Published As

Publication number Publication date
CN108470392A (en) 2018-08-31

Similar Documents

Publication Publication Date Title
CN108256459B (en) Security check door face recognition and face automatic library building algorithm based on multi-camera fusion
WO2021170030A1 (en) Method, device, and system for target tracking
CN112862702B (en) Image enhancement method, device, equipment and storage medium
CN104813339B (en) Methods, devices and systems for detecting objects in a video
US8345921B1 (en) Object detection with false positive filtering
US20210027081A1 (en) Method and device for liveness detection, and storage medium
Farley et al. Real time IP camera parking occupancy detection using deep learning
US20140313345A1 (en) Flying object visual identification system
CN108416632B (en) Dynamic video identification method
KR102261880B1 (en) Method, appratus and system for providing deep learning based facial recognition service
JP6789876B2 (en) Devices, programs and methods for tracking objects using pixel change processed images
KR102333143B1 (en) System for providing people counting service
US20090010499A1 (en) Advertising impact measuring system and method
CN109657580B (en) Urban rail transit gate traffic control method
CN111241932A (en) Automobile exhibition room passenger flow detection and analysis system, method and storage medium
CN112633222B (en) Gait recognition method, device, equipment and medium based on countermeasure network
CN109902681B (en) User group relation determining method, device, equipment and storage medium
CN108470392B (en) Video data processing method
CN108416880B (en) Video-based identification method
AU2024201525B2 (en) Gate system, gate apparatus, image processing method therefor, program, and arrangement method for gate apparatus
CN110751226A (en) Crowd counting model training method and device and storage medium
CN113570530A (en) Image fusion method and device, computer readable storage medium and electronic equipment
CN111310751A (en) License plate recognition method and device, electronic equipment and storage medium
Voronov et al. Software Complex of Biometric Identification Based on Neural Network Face Recognition
CN115767424A (en) Video positioning method based on RSS and CSI fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210209

Address after: 511400 3011, 79 Wanbo 2nd Road, Nancun Town, Panyu District, Guangzhou City, Guangdong Province

Applicant after: Guangzhou Jinhong network media Co.,Ltd.

Address before: No.11, 10th floor, building 1, NO.666, Jitai Road, high tech Zone, Chengdu, Sichuan 610000

Applicant before: CHENGDU CINDA OUTWIT TECHNOLOGY Co.,Ltd.

GR01 Patent grant