CN108833928B - Traffic monitoring video coding method - Google Patents

Traffic monitoring video coding method

Info

Publication number
CN108833928B
CN108833928B (application CN201810720989.1A)
Authority
CN
China
Prior art keywords
vehicle
coded
background
current
block
Prior art date
Legal status
Active
Application number
CN201810720989.1A
Other languages
Chinese (zh)
Other versions
CN108833928A (en)
Inventor
刘东
马常月
吴枫
彭秀莲
Current Assignee
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date
Filing date
Publication date
Application filed by University of Science and Technology of China (USTC)
Priority to CN201810720989.1A
Publication of CN108833928A
Application granted
Publication of CN108833928B
Legal status: Active

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/54Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08Detecting or categorising vehicles

Abstract

The invention discloses a traffic monitoring video coding method that realizes traffic monitoring video coding based on a vehicle database and a background database. At the cost of a certain amount of storage space, the method effectively removes the global redundancy of traffic monitoring video in the time dimension; the overall effect is that the coding performance of traffic monitoring video is effectively improved without obviously increasing the complexity of the encoder or the decoder.

Description

Traffic monitoring video coding method
Technical Field
The invention relates to the technical field of video coding, in particular to a traffic monitoring video coding method.
Background
In recent years, with the rapid development of intelligent transportation, the volume of surveillance video data has grown explosively. To store and transmit these data effectively, surveillance video coding is a problem that must be solved.
Currently, surveillance video is usually compressed with the general-purpose video coding standards H.264/AVC or H.265/HEVC. However, surveillance video has particular characteristics, such as the camera being stationary; applying generic video coding directly to surveillance video cannot fully exploit these inherent characteristics. To further improve surveillance video coding performance, researchers have proposed a series of coding techniques dedicated to surveillance video.
Generally, the content of surveillance video can be roughly divided into background content and foreground content. Accordingly, surveillance video coding can be designed from two aspects: optimizing background coding and optimizing foreground coding. Exploiting the stationary camera, optimized background coding usually generates a high-quality background frame first and then improves the coding efficiency of the whole surveillance video through quality propagation from that frame. For optimized foreground coding, researchers have successively proposed foreground coding techniques based on models and on object segmentation.
Other surveillance video coding techniques have also been proposed, for example:
Adaptive prediction based on background modeling (Xianguo Zhang, Tiejun Huang, Yonghong Tian, and Wen Gao, "Background-modeling-based adaptive prediction for surveillance video coding," IEEE Transactions on Image Processing, vol. 23, no. 2, pp. 769-784, 2014.)
Global vehicle coding based on a vehicle 3D model database (Jing Xiao, Ruimin Hu, Liang Liao, Yu Chen, Zhongyuan Wang, and Zixiang Xiong, "Knowledge-based coding of objects for multisource surveillance video data," IEEE Transactions on Multimedia, vol. 18, no. 9, pp. 1691-1706, 2016.)
The above methods have the following disadvantages:
1. Background coding based on a high-quality background frame causes a surge in the code stream when the high-quality background frame is generated, which adversely affects network transmission, and its coding performance still leaves room for improvement.
2. Foreground coding based on models and object segmentation has difficulty segmenting the foreground finely at the pixel level, and because the segmented foreground may have an irregular shape, the code rate needed to represent the foreground is very large.
3. Adaptive prediction based on background modeling subtracts the reconstructed background frame from the current frame and the reference frame simultaneously, and then, when coding the foreground, directly uses the resulting current-frame foreground pixels for inter-frame prediction from the reference-frame foreground pixels. When the foreground pixels are poorly segmented, the improvement in foreground coding efficiency is easily compromised.
4. Global vehicle coding based on a vehicle 3D model database cannot improve the reconstruction quality of vehicles because no vehicle texture information is stored. In addition, the vehicle 3D models, the intrinsic and extrinsic parameters of the surveillance camera, and the position and attitude of each vehicle on the road required by this technique are difficult to obtain or estimate, which hinders its practical use.
Disclosure of Invention
The invention aims to provide a traffic monitoring video coding method which can improve the coding performance of traffic monitoring videos.
The purpose of the invention is realized by the following technical scheme:
a traffic monitoring video coding method mainly comprises the following steps:
step 1, processing an original traffic monitoring video sequence by adopting a foreground and background segmentation method, separating a vehicle and a background, and respectively removing redundancy existing between the separated vehicle and the background and then putting the vehicle and the background into a database.
Step 2, for the traffic monitoring video to be coded, a foreground and background segmentation method is likewise adopted to separate the vehicle to be coded and the background to be coded; for the vehicle to be coded, a matched vehicle is selected from the database by feature matching and fast motion estimation; for the background to be coded, a matched background is selected from the database based on the sum of absolute differences.
Step 3, when the inter-frame prediction mode or the intra-frame prediction mode is adopted, a preset criterion is used to judge whether rate-distortion optimization processing should be performed on the matched vehicle or the matched background for the vehicle or background to be coded; corresponding processing is carried out according to the judgment result, and coding is performed with the corresponding prediction mode.
According to the technical scheme provided by the invention, traffic monitoring video coding is realized based on a vehicle database and a background database. At the cost of a certain amount of storage space, the global redundancy of the traffic monitoring video in the time dimension can be effectively removed; the overall effect is that the coding performance of the traffic monitoring video is effectively improved without obviously increasing the complexity of the encoder or the decoder.
Drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a traffic monitoring video encoding method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a traffic monitoring video coding framework according to an embodiment of the present invention;
fig. 3 is a flowchart of removing background SIFT features from a vehicle region according to an embodiment of the present invention;
FIG. 4 is a flow chart of vehicle and background similarity analysis provided by an embodiment of the present invention;
fig. 5 is a schematic diagram of reference index bit change information according to an embodiment of the present invention;
fig. 6 is a screenshot of a test sequence provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a traffic monitoring video coding method, which mainly comprises the following steps as shown in figure 1:
step 1, processing an original traffic monitoring video sequence by adopting a foreground and background segmentation method, separating a vehicle and a background, and respectively removing redundancy existing between the separated vehicle and the background and then putting the vehicle and the background into a database.
Step 2, for the traffic monitoring video to be coded, a foreground and background segmentation method is likewise adopted to separate the vehicle to be coded and the background to be coded; for the vehicle to be coded, a matched vehicle is selected from the database by feature matching and fast motion estimation; for the background to be coded, a matched background is selected from the database based on the sum of absolute differences.
Step 3, when the inter-frame prediction mode or the intra-frame prediction mode is adopted, a preset criterion is used to judge whether rate-distortion optimization processing should be performed on the matched vehicle or the matched background for the vehicle or background to be coded; corresponding processing is carried out according to the judgment result, and coding is performed with the corresponding prediction mode.
The overall encoding framework is shown schematically in fig. 2, where the part below the dividing line corresponds to step 1 and the part above it corresponds to steps 2 and 3.
For ease of understanding, the following description will be made in detail with respect to the above three steps.
Firstly, establishing a vehicle and background database.
In the embodiment of the invention, for the original traffic monitoring video sequence, a foreground segmentation method (for example, the SuBSENSE method) is used to separate the vehicles, the background is extracted from the background model generated during foreground separation, and the vehicles and backgrounds belonging to the front section of the video sequence are used to build the database. The main implementation can proceed as follows:
1. and establishing a vehicle database.
The preferred implementation of the vehicle database establishment is as follows:
after the vehicles are separated from the front section of the original traffic monitoring video sequence and redundancy is removed, the vehicles are numbered from 1 to N, where N is the number of separated vehicles.
Initially, the database contains no vehicles. For a certain redundancy-removed vehicle v_i, similar vehicles {v_i1, v_i2, ..., v_im} are retrieved from all other vehicles by an inverted-list-based method, where m is the number of similar vehicles.
To determine the size of m, the number of SIFT features matched between vehicles v_i and v_j is considered; whether two SIFT features match may be determined by conventional techniques, or by the method described below in connection with vehicle matching.
When retrieving similar vehicles, the number of SIFT features matched between vehicle v_i and any one of the remaining vehicles v_j is compared; vehicle v_j is put into {v_i1, v_i2, ..., v_im} when the number of matched SIFT features satisfies the following formulas:

N_ij ≥ β × N_i;

N_ij ≥ min(N_0, N_i);

in the above formulas, N_ij is the number of SIFT features matched between vehicle v_i and vehicle v_j, N_i is the number of SIFT features of vehicle v_i, and β and N_0 are constants; for example, β and N_0 may be set to 0.1 and 4, respectively. After this processing, the similar vehicles {v_i1, v_i2, ..., v_im} of vehicle v_i are obtained.
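As a concrete illustration of this filter, the following is a minimal Python sketch (not from the patent; the descriptor-matching routine, the distance-ratio test, and all parameter values are illustrative assumptions):

```python
import numpy as np

def count_matched_sift(desc_i, desc_j, ratio=0.8):
    # Count descriptor matches from v_i to v_j with a classic
    # nearest/second-nearest distance-ratio test (assumed matcher).
    matches = 0
    for d in desc_i:
        dists = np.linalg.norm(desc_j - d, axis=1)
        if dists.size < 2:
            continue
        d1, d2 = np.partition(dists, 1)[:2]
        if d2 > 0 and d1 / d2 <= ratio:
            matches += 1
    return matches

def similar_vehicles(i, descriptors, beta=0.1, n0=4):
    # Return indices j with N_ij >= beta * N_i and N_ij >= min(N0, N_i).
    n_i = len(descriptors[i])
    similar = []
    for j, desc_j in enumerate(descriptors):
        if j == i:
            continue
        n_ij = count_matched_sift(descriptors[i], desc_j)
        if n_ij >= beta * n_i and n_ij >= min(n0, n_i):
            similar.append(j)
    return similar
```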
Then, pixel-level similarity of the vehicles is compared: for vehicle v_i, if the database contains no vehicles, v_i is put into the database; otherwise, v_i is compared at the pixel level with the vehicles of {v_i1, v_i2, ..., v_im} that have already been put into the database. The similarity comparison uses fast motion estimation, with the sum of absolute differences (SAD) as the loss function.

The fast motion estimation mentioned here can be implemented by conventional techniques; the specific fast motion estimation used in the vehicle matching described later can also be used.

If the computed mean SAD is smaller than a set value (for example, 5), the two vehicles are judged to be similar at the pixel level. Note that in each similarity computation, the target is vehicle v_i versus one vehicle of {v_i1, v_i2, ..., v_im} already in the database: when the SAD is computed, v_i is divided into blocks of a fixed size, and each block of v_i performs fast motion estimation over the whole image of the database vehicle; with the 16x16 blocks mentioned later, one SAD value is obtained per 16x16 block, and the mean SAD considered here is the average of the SADs of all 16x16 blocks of v_i.
If several (for example, 10) consecutive vehicles of {v_i1, v_i2, ..., v_im} that have been put into the database are not similar to vehicle v_i at the pixel level, vehicle v_i is put into the database; otherwise, vehicle v_i is not put into the database.

If it is finally decided to put vehicle v_i into the database, the vehicles of {v_i1, v_i2, ..., v_im} already in the database are compared with vehicle v_i for pixel-level similarity; any of them that is similar to v_i at the pixel level is removed from the database; once a vehicle not similar to v_i at the pixel level is encountered, this checking process stops.

Each vehicle is processed in the above manner; the vehicles finally put into the database are determined, coded, and put into the database.
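A sketch of this insertion and eviction logic follows (one plausible reading of the rule above; `mean_block_sad` is a hypothetical helper returning the mean per-16x16-block SAD between two vehicle images, vehicles are assumed to be opaque hashable objects, and the thresholds are the example values):

```python
def maybe_insert_vehicle(db, v_i, similar_in_db, mean_block_sad,
                         sad_thresh=5.0, run_len=10):
    # db: list of vehicles already stored.
    # similar_in_db: retrieved similar vehicles of v_i that are in db,
    # in retrieval order.
    if not db:
        db.append(v_i)
        return True
    dissimilar_run = 0
    for cand in similar_in_db:
        if mean_block_sad(v_i, cand) < sad_thresh:
            return False          # a pixel-level-similar vehicle is stored
        dissimilar_run += 1
        if dissimilar_run >= run_len:
            break                 # enough consecutive dissimilar vehicles
    db.append(v_i)
    # Evict stored similar vehicles, stopping at the first dissimilar one.
    for cand in similar_in_db:
        if cand in db and mean_block_sad(v_i, cand) < sad_thresh:
            db.remove(cand)
        else:
            break
    return True
```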
2. Background database establishment
For the redundancy-removed background, one frame of background is taken at a fixed interval (for example, every 20 s), encoded, and then put into the database.
In practical applications, after the surveillance camera is installed, the encoder first builds the vehicle and background databases. For vehicles, the encoder encodes at high quality the vehicles to be placed in the database, following the vehicle database establishment steps, and puts the coded vehicles into the database; information identifying these vehicles is also coded into the bitstream, so that after decoding the reconstructed images, the decoding end performs the same vehicle database establishment according to the decoded vehicle identification information. For the background, the encoder encodes a generated background frame at high quality at fixed intervals, following the background database establishment steps, and puts the coded background into the database; the high-quality coded background and the information identifying it are also coded into the bitstream, and the decoder decodes the high-quality background frame from this information and puts it into its database. In this way, identical vehicle and background databases are built at the encoder and decoder.
In the embodiment of the invention, the original traffic monitoring video sequence can be split: the front part of the data is used to build the vehicle and background databases, and the rear part serves as the traffic monitoring video to be coded. Alternatively, the first day's traffic monitoring video can be used to build the databases, and the data from the second day onward serve as the traffic monitoring video to be coded. The codec then performs traffic monitoring video encoding and decoding according to the method of the invention. Traffic surveillance video is typically stored for a period of several months; when the stored data are cleared, the above process is repeated.
Secondly, vehicle and background retrieval.
1. Vehicle retrieval.
1) Separation of the vehicle from the background and redundancy removal operations.
In the embodiment of the invention, the separation of vehicles from the background and the redundancy removal are also required for the traffic monitoring video to be coded; this operation is similar to that used when building the vehicle and background databases. It preferably proceeds as follows:
after the vehicles in the monitoring video sequence (the original traffic monitoring video sequence or the traffic monitoring video to be coded) are separated with the SuBSENSE method, because the vehicle shapes may be irregular, the pixels within the rectangular region spanning from the upper-left corner to the lower-right corner of each separated vehicle are taken as the vehicle and the remaining part as the background. The SIFT features of the vehicle are then extracted and the background SIFT features among them are removed; the flow of removing background SIFT features is shown in figure 3.
When the SuBSENSE method is used to separate the vehicles, a relatively clean background frame is generated step by step. When a vehicle is extracted from the monitoring video sequence, the background at the corresponding position is extracted from this background frame.
Taking the traffic monitoring video to be coded as an example, SIFT features are extracted from the separated current vehicle to be coded and from the corresponding background; each SIFT feature extracted from the current vehicle to be coded is retrieved within a position neighborhood on the corresponding background defined by the following formula:

(xs_c − xs_b)^2 + (ys_c − ys_b)^2 ≤ d^2;

where xs_c and ys_c are the coordinates of a SIFT feature extracted from the current vehicle to be coded, xs_b and ys_b are the coordinates of a SIFT feature extracted from the corresponding background, and d defines the range of the position neighborhood; for example, d = 5 may be set.

If the retrieved background SIFT feature with the minimum normalized Euclidean distance to a SIFT feature of the current vehicle to be coded satisfies D_min ≤ D_1, where D_min is that minimum normalized Euclidean distance and D_1 is a threshold (for example, D_1 = 1.1), a SIFT feature similar to the vehicle's SIFT feature exists in the background region; the corresponding SIFT feature of the current vehicle to be coded is a background SIFT feature and is removed from the vehicle's SIFT features.
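A numpy sketch of this background-feature removal (assumptions: keypoints given as (x, y) arrays, descriptors L2-normalized so that their Euclidean distances play the role of the normalized distance above; d and D_1 use the example values):

```python
import numpy as np

def remove_background_sift(veh_kp, veh_desc, bg_kp, bg_desc, d=5, d1=1.1):
    # veh_kp/bg_kp: (N, 2) arrays of (x, y); veh_desc/bg_desc: (N, 128).
    keep = []
    for k in range(len(veh_kp)):
        xc, yc = veh_kp[k]
        # background features inside the position neighbourhood
        near = (bg_kp[:, 0] - xc) ** 2 + (bg_kp[:, 1] - yc) ** 2 <= d * d
        if near.any():
            dists = np.linalg.norm(bg_desc[near] - veh_desc[k], axis=1)
            if dists.min() <= d1:
                continue          # similar feature in the background: drop
        keep.append(k)
    return veh_kp[keep], veh_desc[keep]
```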
2) Coarse retrieval by feature matching.
In the embodiment of the invention, SIFT features of vehicles (including vehicles in a database and vehicles to be coded) are extracted, vehicles in the database establish inverted list indexes based on the SIFT features, and for the vehicles to be coded, a plurality of candidate vehicles are roughly retrieved from the database based on SIFT feature matching. The preferred implementation of this process is as follows:
A number of candidate vehicles are first coarsely selected from the database by feature matching: the SIFT features of every vehicle in the database are quantized into visual words with the k-means algorithm, and for each visual word a corresponding mapped mean vector is computed. Each SIFT feature of each database vehicle is mapped to its nearest visual word, and the mapped SIFT feature vector is compared with the mean vector of that visual word to obtain a binarized representation of the feature vector. At the same time, each database vehicle is represented by the frequency histogram of the visual words of its SIFT features, and the histograms of all database vehicles are organized as an inverted list.

For the current vehicle to be coded, each of its SIFT features is assigned to the nearest visual word in the same way as for the database vehicles, yielding the frequency histogram of the current vehicle to be coded; the binarized representation of each SIFT feature is computed at the same time.

When comparing the similarity between the current vehicle to be coded and a database vehicle, for SIFT features mapped to the same visual word whose binarized representations differ by a Hamming distance below a threshold, the distance between frequency histograms weighted by tf-idf (term frequency-inverse document frequency) terms is used as the similarity measure; this yields a similarity comparison between the current vehicle and every database vehicle. The vehicles are sorted by the computed similarity, and the top-ranked vehicles are selected as candidate vehicles.
For example, in a particular implementation, 10 candidate vehicles may be retrieved.
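A simplified sketch of this coarse retrieval stage (illustrative only: it keeps the visual-word histograms and tf-idf weighting but omits the inverted list and the per-feature Hamming-distance check for brevity; all function names are assumptions):

```python
import numpy as np

def assign_words(desc, centers):
    # Map each SIFT descriptor to its nearest k-means center (visual word).
    d2 = ((desc[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d2, axis=1)

def word_histogram(desc, centers):
    words = assign_words(desc, centers)
    return np.bincount(words, minlength=len(centers)).astype(float)

def coarse_retrieve(query_desc, db_descs, centers, top=10):
    db_hists = np.array([word_histogram(d, centers) for d in db_descs])
    q_hist = word_histogram(query_desc, centers)
    idf = np.log(len(db_hists) / ((db_hists > 0).sum(axis=0) + 1.0))
    q = q_hist * idf
    q /= max(q.sum(), 1e-9)
    db = db_hists * idf
    db /= np.maximum(db.sum(axis=1, keepdims=True), 1e-9)
    l1 = np.abs(db - q).sum(axis=1)     # smaller distance = more similar
    return np.argsort(l1)[:top]         # indices of top candidate vehicles
```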
3) Selecting the matching vehicle using fast motion estimation.
In the embodiment of the invention, a matched vehicle is selected from a plurality of candidate vehicles by using a rapid motion estimation mode; the preferred implementation of this process is as follows:
a. and aligning the current vehicle to be coded with each candidate vehicle.
The preferred embodiment of the alignment is as follows:
for each SIFT feature of the current vehicle to be coded, its distances to all SIFT features of a candidate vehicle are computed and sorted in ascending order; if the following conditions are met, the SIFT feature of the current vehicle is judged to have found a matching SIFT feature in that candidate vehicle:

d_1 ≤ D_2;

d_1 / d_2 ≤ α;

where d_1 and d_2 are the smallest and second-smallest distances, respectively, and D_2 and α are constants;
each SIFT feature of the current vehicle to be coded is processed in this way, yielding the SIFT matching pairs between the current vehicle and each candidate vehicle. From the matched pairs, the position offset between the current vehicle to be coded and each candidate vehicle is computed as:

MV_x = (1/n) Σ_{i=1..n} (xc_i − xv_i);

MV_y = (1/n) Σ_{i=1..n} (yc_i − yv_i);

where MV_x and MV_y are the horizontal and vertical components of the offset, n is the number of matched SIFT feature pairs, xc_i and yc_i are the coordinates of the SIFT feature of the current vehicle to be coded, xv_i and yv_i are the coordinates of the matched SIFT feature of the candidate vehicle, and i indexes the SIFT matching pairs;
outliers are removed iteratively to obtain the final position offset, and the current vehicle to be coded is aligned with the corresponding candidate vehicle according to the computed offset.

An outlier can be determined as follows: if the motion vector computed from a SIFT matching pair deviates from the mean motion vector by more than a set value, that matching pair is an outlier.
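A sketch of the offset estimation with iterative outlier rejection (the deviation threshold and iteration count are illustrative assumptions):

```python
import numpy as np

def estimate_offset(cur_pts, cand_pts, max_dev=8.0, max_iters=5):
    # cur_pts/cand_pts: (n, 2) coordinates of matched SIFT pairs.
    mv = cur_pts - cand_pts               # per-pair motion vectors
    mask = np.ones(len(mv), dtype=bool)
    for _ in range(max_iters):
        mean = mv[mask].mean(axis=0)
        dev = np.linalg.norm(mv - mean, axis=1)
        new_mask = dev <= max_dev         # keep pairs close to the mean
        if new_mask.sum() == 0 or np.array_equal(new_mask, mask):
            break
        mask = new_mask
    return mv[mask].mean(axis=0)          # (MV_x, MV_y)
```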
b. The current vehicle to be coded is divided into fixed-size 16x16 blocks, and each 16x16 block searches the candidate vehicle for the block minimizing a loss function consisting of the sum of absolute differences plus the coding rate of the motion vector. The search takes the position of the current 16x16 block as its starting point and performs eight-point diamond search within 64 pixels up, down, left, and right of the starting point. The losses of all 16x16 blocks are accumulated as the overall loss of the whole current vehicle on that candidate vehicle; finally, the candidate vehicle with the minimum overall loss is kept as the matching vehicle.
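A sketch of the per-block eight-point diamond search (the λ-weighted motion-vector magnitude is a crude stand-in for the MV coding rate; the pattern and parameters are illustrative assumptions):

```python
import numpy as np

DIAMOND = [(0, -2), (-1, -1), (1, -1), (-2, 0),
           (2, 0), (-1, 1), (1, 1), (0, 2)]      # eight-point pattern

def block_loss(block, ref, x, y, x0, y0, lam=4.0):
    h, w = ref.shape
    if x < 0 or y < 0 or x + 16 > w or y + 16 > h:
        return np.inf
    sad = np.abs(block.astype(np.int32)
                 - ref[y:y + 16, x:x + 16].astype(np.int32)).sum()
    return sad + lam * (abs(x - x0) + abs(y - y0))  # SAD + rough MV rate

def diamond_search(block, ref, x0, y0, rng=64):
    bx, by = x0, y0
    best = block_loss(block, ref, bx, by, x0, y0)
    improved = True
    while improved:
        improved = False
        for dx, dy in DIAMOND:
            x, y = bx + dx, by + dy
            if abs(x - x0) > rng or abs(y - y0) > rng:
                continue
            loss = block_loss(block, ref, x, y, x0, y0)
            if loss < best:
                best, bx, by, improved = loss, x, y, True
    return best, (bx - x0, by - y0)       # minimum loss and its MV
```

Summing `diamond_search` results over all 16x16 blocks of the vehicle gives the overall loss on one candidate.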
2. Background retrieval.
In the embodiment of the present invention, for the background to be coded, the matching background is selected from the database based on the sum of absolute differences (SAD). A preferred implementation is as follows:

with the SAD between co-located pixels of the current background to be coded and a database background as the similarity criterion, the SAD between the current background to be coded and every background in the database is computed as:

SAD = Σ_{k∈B} |pc_k − pl_k|;

where pc_k and pl_k are the k-th pixel values of the current background to be coded and of the database background, respectively, and B is the set of pixels of the current background to be coded;

the results are sorted in ascending order, and the background with the minimum SAD is taken as the matching background of the current background to be coded.
Thirdly, encoding.
1. Similarity analysis.
In the embodiment of the invention, after the matched vehicle and matched background of the current vehicle and background to be coded are determined, it is decided whether rate-distortion optimization (RDO) is performed on them. When the current vehicle and background use the inter-frame prediction mode, the matched vehicle and background are compared by RDO against the existing reference-frame information of the current vehicle and background; when the intra-frame prediction mode is used, the matched vehicle and background are compared by RDO against a rough intra-frame prediction of the current vehicle and background. The detailed flow of the vehicle and background similarity analysis is shown in fig. 4. The RDO comparisons in the inter and intra prediction modes are described in detail below.
1) Comparison of RDO in inter prediction mode.
The comparison criterion for rate-distortion optimization in the inter prediction mode is:

J = D + λ × R;

where J is the Lagrangian loss function, D is the sum of absolute differences between the prediction block and the matching block, R is the number of bits used to represent the mode information, and λ is the Lagrange multiplier;
to compare the matched vehicle and background against the existing reference frames, the Lagrangian loss of the current vehicle and background to be coded over the existing reference frames is computed first; an updated Lagrangian loss that also takes the retrieved matched vehicle and background into account is then computed; the Lagrangian losses before and after updating are compared to decide whether RDO is performed on the matched vehicle and background. A preferred implementation is as follows:
a. Compute the Lagrangian losses of the current vehicle to be coded and the current background to be coded against the existing reference frames:

For each existing reference frame of the current vehicle to be coded, the displacement of the current vehicle on that reference frame is estimated first; the optimal RDO result of the current vehicle on the existing reference frames is then obtained, and this result is finally compared with the optimal RDO result of the current vehicle on the candidate matching vehicle to decide whether RDO is performed on the matching vehicle. The related process is as follows:

In units of 4x4 blocks, the motion vectors (MV) of the inter-predicted 4x4 blocks at the position corresponding to the current vehicle, together with the picture order count (POC) information of their reference frames, are obtained; from these, the motion vector of the corresponding 4x4 block of the current vehicle is estimated by temporal scaling:

MVX_cur = MVX_ref × (POC_cur − POC_ref) / (POC_ref − POC_colref);

MVY_cur = MVY_ref × (POC_cur − POC_ref) / (POC_ref − POC_colref);

where MVX_ref and MVY_ref are the horizontal and vertical components of the motion vector of an inter-predicted 4x4 block on the existing reference frame; POC_cur, POC_ref, and POC_colref are the POC of the frame containing the current vehicle to be coded, the POC of the existing reference frame, and the POC of the reference frame of that inter-predicted 4x4 block, respectively; and MVX_cur and MVY_cur are the estimated components for the corresponding 4x4 block of the current vehicle. Every 4x4 block within the current vehicle is traversed, the number of inter-predicted 4x4 blocks and the motion vectors of the corresponding blocks of the current vehicle are recorded, and the horizontal and vertical components of the finally estimated displacement of the current vehicle are the averages over all inter-predicted 4x4 block motion vectors.

After the displacement of the current vehicle on each existing reference frame is obtained, the current vehicle is divided into fixed-size 16x16 blocks, and each 16x16 block searches all existing reference frames in turn for the block with the minimum loss, the loss consisting of the sum of absolute differences plus the coding rate of the motion vector. The search takes as its starting point the position of the current 16x16 block translated by the estimated displacement and performs eight-point diamond search within 64 pixels up, down, left, and right of the starting point. In units of 16x16 blocks, the minimum loss between every block of the current vehicle and its matching blocks in all existing reference frames is recorded; traversing each 16x16 block of the current vehicle and accumulating these minimum losses gives the Lagrangian loss of the current vehicle against the existing reference frames, denoted here J_veh_ref.

For the current background to be coded, it is divided into 16x16 blocks, and for each 16x16 block the matching block with the minimum loss is searched over all existing reference frames: the SAD between the current 16x16 block and the co-located 16x16 block of each existing reference frame is compared, and the minimum SAD is taken as the loss of that block. Traversing all 16x16 blocks of the current background and accumulating their losses gives the Lagrangian loss of the current background against the existing reference frames, denoted here J_bg_ref.
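A sketch of the POC-based motion-vector scaling and displacement estimate (a reading of the reconstruction above; the guard against a zero POC distance is an added assumption):

```python
def scale_mv(mvx_ref, mvy_ref, poc_cur, poc_ref, poc_colref):
    # Temporal scaling of a 4x4 block MV on the existing reference frame
    # to the current frame by the ratio of POC distances.
    if poc_ref == poc_colref:
        return 0.0, 0.0
    s = (poc_cur - poc_ref) / (poc_ref - poc_colref)
    return mvx_ref * s, mvy_ref * s

def estimate_displacement(inter_blocks, poc_cur, poc_ref):
    # inter_blocks: (mvx_ref, mvy_ref, poc_colref) for every inter-coded
    # 4x4 block covering the vehicle's position on the reference frame.
    mvs = [scale_mv(mx, my, poc_cur, poc_ref, pc)
           for mx, my, pc in inter_blocks]
    if not mvs:
        return 0.0, 0.0
    n = float(len(mvs))
    return sum(m[0] for m in mvs) / n, sum(m[1] for m in mvs) / n
```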
b. Taking the matched vehicle and background into account, compute the updated Lagrangian losses:

For each 16x16 block of the current vehicle to be coded, on the basis of the per-block results computed for J_veh_ref, the loss of the block against the matched vehicle is computed by fast motion estimation; this loss is compared with the block's minimum loss against the existing reference frames, and the smaller of the two is taken as the minimum loss of that 16x16 block. Every 16x16 block of the current vehicle is traversed and the per-block minimum losses are accumulated. Meanwhile, for the current vehicle to be coded, the bit-count changes comprise the position index of the matched vehicle in the database, the position at which the matched vehicle is attached in the reference frame, the reference-index (reference frame index) bit-change information, and the CTU-level indication information; combining these bit-count changes with the accumulated loss gives the updated Lagrangian loss of the current vehicle, denoted here J_veh_upd.

For each 16x16 block of the current background to be coded, on the basis of the per-block results computed for J_bg_ref, the loss of the block against the matched background is computed; this loss is compared with the block's minimum loss against the existing reference frames, and the smaller of the two is taken as the minimum loss of that 16x16 block. Every 16x16 block of the current background is traversed and the per-block minimum losses are accumulated. Meanwhile, for the current background to be coded, the bit-count changes comprise the position index of the matched background in the database and the reference-index bit-change information; combining these bit-count changes with the accumulated loss gives the updated Lagrangian loss of the current background, denoted here J_bg_upd.
The bit-count calculation of the reference-index bit-change information is described as an example:

As shown in fig. 5, for each 16x16 block of the current vehicle or background to be coded, when computing its minimum loss over the existing reference frames and the matched vehicle or background: if the matching-block index corresponding to the minimum loss is n−1, the bit count increases by 1, where n is the number of existing reference frames; otherwise, if the matching block corresponding to the minimum loss lies on the matched vehicle or background, the bit count increases by n−1−idx, where idx is the index of the matching block of that 16x16 block when the matched vehicle or background is not considered; in all other cases the bit count is unchanged. Traversing every 16x16 block of the current vehicle or background, the final reference-index bit-change information is the sum of the per-block bit changes. Combining this bit change with the previously computed Lagrangian loss gives the updated Lagrangian loss of the current vehicle or background to be coded.
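A sketch of this bit accounting (one reading of the rule above; existing reference frames are indexed 0..n−1 and the matched picture is treated as index n, an assumed convention):

```python
def ref_index_bit_change(best_idx, old_idx, n):
    # best_idx[b]: best matching-block index per 16x16 block once the
    # matched vehicle/background is available; old_idx[b]: best index
    # without it; n: number of existing reference frames.
    bits = 0
    for new, old in zip(best_idx, old_idx):
        if new == n - 1:
            bits += 1               # last existing frame: one extra bit
        elif new == n:              # block now refers to the matched picture
            bits += n - 1 - old
        # otherwise the bit count is unchanged
    return bits
```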
Finally, the Lagrangian loss J_veh_ref is compared with the updated Lagrangian loss J_veh_upd: if J_veh_upd < J_veh_ref, rate-distortion optimization is performed on the matched vehicle. Likewise, J_bg_ref is compared with the updated J_bg_upd: if J_bg_upd < J_bg_ref, rate-distortion optimization is performed on the matched background.
2) Comparison of RDO in intra prediction mode.
Similar to the inter prediction mode, the comparison criterion for rate-distortion optimization in the intra prediction mode is also:

J = D + λ × R;

where J is the Lagrangian loss function, D is the sum of absolute differences between the prediction block and the matching block, R is the number of bits used to represent the mode information, and λ is the Lagrange multiplier.
a. For the current background to be coded, rate-distortion optimization is always performed on the matched background in the intra prediction mode.
b. For the current vehicle to be coded, the loss when intra prediction is used is first roughly estimated: the current vehicle is divided into fixed-size 16x16 blocks, and for each 16x16 block the mean (DC) mode, the smoothing (planar) mode, the horizontal intra prediction mode, and the vertical intra prediction mode are estimated in turn, yielding the sum of absolute differences of the block under each mode; in this intra prediction mode estimation, the reference pixel values of the current 16x16 block are derived from the original values of the neighboring 16x16 blocks. For each 16x16 block, the SADs estimated under all modes are sorted in ascending order, and the result with the smallest SAD is taken as the best matching result of that block. Traversing all 16x16 blocks of the current vehicle and accumulating the best matching results gives the intra Lagrangian loss of the current vehicle, denoted here J_veh_intra.

Taking the matched vehicle into account, the updated Lagrangian loss is computed: for each 16x16 block of the current vehicle to be coded, on the basis of the per-block results computed for J_veh_intra, the loss (sum of absolute differences) of the block against the matched vehicle is computed by fast motion estimation; this loss is compared with the block's minimum intra-estimated SAD, and the smaller of the two is taken as the minimum loss of that block. Every 16x16 block of the current vehicle is traversed and the per-block minimum losses are accumulated. Meanwhile, for the current vehicle to be coded, the bit-count changes comprise the position index of the matched vehicle in the database, the position at which the matched vehicle is attached in the reference frame, and the CTU-level indication information; combining these bit-count changes with the accumulated loss gives the updated Lagrangian loss, denoted here J_veh_intra_upd.
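A sketch of the rough per-block intra estimation (the planar prediction here is a simplified average of the horizontal and vertical predictions, an assumption rather than the HEVC planar formula; reference pixels come from the original values of the neighboring blocks, as stated above):

```python
import numpy as np

def rough_intra_sad(block, left_col, top_row):
    # block: 16x16 original samples; left_col/top_row: 16 reference
    # pixels taken from the neighbouring blocks' ORIGINAL values.
    b = block.astype(np.int32)
    left = left_col.astype(np.int32)
    top = top_row.astype(np.int32)
    horiz = np.tile(left.reshape(16, 1), (1, 16))    # horizontal mode
    vert = np.tile(top.reshape(1, 16), (16, 1))      # vertical mode
    dc = np.full((16, 16), int(np.concatenate([left, top]).mean()))
    planar = (horiz + vert + 1) // 2                 # simplified planar
    return min(np.abs(b - p).sum() for p in (dc, horiz, vert, planar))
```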
The intra Lagrangian loss J_veh_intra is compared with the updated Lagrangian loss J_veh_intra_upd: if J_veh_intra_upd < J_veh_intra, rate-distortion optimization is performed on the matched vehicle.
2. Coding of the vehicle and the background.
1) When the inter-frame prediction mode is adopted, if rate-distortion optimization processing needs to be performed on the matched vehicle or the matched background, a new reference frame buffer is allocated, and the matched vehicle or background is attached to this newly allocated reference frame, which is used together with the existing reference frames for inter prediction of the current vehicle or background to be coded; after inter prediction, every 4x4 block covered by the current vehicle or background to be coded is traversed, and if a certain 4x4 block references information of the matched vehicle or matched background, the corresponding syntax elements are coded into the bitstream;
2) When the intra-frame prediction mode is adopted, if rate-distortion optimization processing needs to be performed on the matched vehicle or the matched background, a new reference frame buffer is likewise allocated, and the matched vehicle or background is attached to it for intra prediction of the current vehicle or background to be coded.
In both cases above, the position at which the matched vehicle is attached to the newly allocated reference frame is determined by the following formulas:
x_0 = x_c + MV_x;

y_0 = y_c + MV_y;
where x_0 and y_0 denote the position at which the matched vehicle is attached to the newly allocated reference frame, x_c and y_c denote the position of the current vehicle to be coded in the current frame, and MV_x and MV_y are the horizontal and vertical components of the offset of the current vehicle relative to the matched vehicle (obtained by the fast motion estimation described earlier);

when the matched background is attached to the reference frame, it is aligned with the position of the reference frame.
3. Coding code stream structure
In the embodiment of the invention, the coded bitstream is structured in two layers: the slice layer and the coding tree unit (CTU) layer; wherein:
slice layer: for the current vehicle to be coded, the slice layer comprises a flag (flag) which indicates whether a matching vehicle is referred to in the current slice layer; traversing 4x4 blocks covered by all vehicles in the current slice layer, and judging whether the blocks refer to matched vehicles, if a certain 4x4 block refers to a matched vehicle, marking the block as true, otherwise, marking the block as false; if the mark is true, the slice layer also comprises a syntax element which represents the number of the referenced matched vehicles in the current slice layer; for each matched vehicle, the position index of the matched vehicle in the database and the position of the matched vehicle attached to the reference frame of the new application are coded into a code stream, and the number of the referenced matched vehicles, the index of each matched vehicle and the position of each matched vehicle attached to the reference frame of the new application are coded in a fixed-length coding mode;
for the current background to be coded, the slice layer comprises a mark for indicating whether the matching background is referred in the current slice layer; traversing all 4x4 blocks covered by the background in the current slice layer, and judging whether the blocks refer to a matching background, if a certain 4x4 block refers to the matching background, marking the block as true, otherwise, marking the block as false; if the mark is true, the slice layer also contains a position index syntax element of the referenced matching background in the database, and the syntax element is coded by adopting a fixed-length coding mode;
and (3) CTU layer: for the current vehicle to be coded, the CTU layer comprises a mark for indicating whether the current CTU layer refers to the matched vehicle pixel or not; traversing each 4x4 block in the current CTU layer, if there is some 4x4 block that references a matching vehicle pixel, then marking as true, otherwise marking as false; when the flag is true, the CTU layer further includes a syntax element indicating a matching vehicle index (index);
for the current background to be encoded, the CTU layer contains a flag indicating whether the current CTU layer references matching background pixels.
In addition, tests were performed to illustrate the coding performance of the above scheme of the present invention.
The test conditions include: 1) inter configurations: Random Access (RA), Low-delay B (LDB), and Low-delay P (LDP); 2) the base quantization parameter (QP) set to {27, 32, 37, 42}. The implementation is based on HM16.7, and the test set consists of 14 self-captured test sequences, screenshots of which are shown in fig. 6. The results are shown in Tables 1 and 2.
Table 1 gives the performance comparison under the RA, LDB, and LDP settings, and Table 2 gives the encoder and decoder complexity comparison under the same settings.
[Table 1: Performance comparison results under the RA, LDB, and LDP settings — table provided as an image in the original]
[Table 2: Complexity comparison results at the encoding and decoding ends under the RA, LDB, and LDP settings — table provided as an image in the original]
As can be seen from Tables 1 and 2, relative to HM16.7, the above scheme of the embodiment of the present invention achieves code-rate savings of 35.1%, 31.3%, and 28.8% in the RA, LDB, and LDP modes, respectively, and the complexity increase at the encoding and decoding ends remains within a reasonable range.
Through the above description of the embodiments, it is clear to those skilled in the art that the above embodiments can be implemented by software, and can also be implemented by software plus a necessary general hardware platform. With this understanding, the technical solutions of the embodiments can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.), and includes several instructions for enabling a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods according to the embodiments of the present invention.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (9)

1. A traffic monitoring video coding method is characterized by comprising the following steps:
processing an original traffic monitoring video sequence by adopting a foreground and background segmentation method, separating out vehicles and a background, and respectively removing redundancy existing between the separated vehicles and the background and then putting the vehicles and the background into a database;
for the traffic monitoring video to be coded, a foreground and background segmentation method is also adopted to separate the vehicle to be coded and the background to be coded; selecting matched vehicles from a database by adopting a characteristic matching and rapid motion estimation mode for the vehicles to be coded; selecting a matched background from a database on the basis of the sum of absolute differences for the background to be coded;
when an inter-frame prediction mode or an intra-frame prediction mode is adopted, judging whether the vehicle to be coded or the background to be coded needs to perform rate distortion optimization processing on a matched vehicle or a matched background by using a preset mode; performing corresponding processing according to the judgment result, and encoding by using a corresponding prediction mode;
wherein processing the original traffic monitoring video sequence by the foreground and background segmentation method, separating the vehicles and the background, removing the redundancy existing among the separated vehicles and among the backgrounds respectively, and putting them into the database comprises the following steps:
numbering the vehicles from 1 to N after the redundancy is removed, where N is the number of separated vehicles;
initially, the database contains no vehicles; for a certain redundancy-removed vehicle v_i, similar vehicles {v_i1, v_i2, …, v_im} are retrieved from all other vehicles by an inverted-list-based method, where m is the number of similar vehicles;
when retrieving similar vehicles, the number of SIFT features matched between vehicle v_i and any one of the remaining vehicles v_j is compared; vehicle v_j is put into {v_i1, v_i2, …, v_im} when the number of matched SIFT features satisfies the following formulas:

N_ij ≥ β × N_i;

N_ij ≥ min(N_0, N_i);

in the above formulas, N_ij is the number of SIFT features matched between vehicle v_i and vehicle v_j, N_i is the number of SIFT features of vehicle v_i, and β and N_0 are constants;
then, pixel-level similarity comparison is performed for the vehicle: for vehicle v_i, if the database contains no vehicles, v_i is put into the database; otherwise, v_i is compared at the pixel level with the vehicles of {v_i1, v_i2, …, v_im} already put into the database, using fast motion estimation with the sum of absolute differences as the loss function; if the computed mean of the sums of absolute differences is smaller than a set value, the two vehicles are judged to be similar at the pixel level; if consecutive vehicles of {v_i1, v_i2, …, v_im} already put into the database are not similar to vehicle v_i at the pixel level, vehicle v_i is put into the database; otherwise, vehicle v_i is not put into the database; if it is finally decided to put vehicle v_i into the database, the vehicles of {v_i1, v_i2, …, v_im} already put into the database are compared with vehicle v_i for pixel-level similarity, and any of them similar to vehicle v_i at the pixel level is removed from the database; when a vehicle not similar to vehicle v_i at the pixel level is encountered, the pixel-level similarity comparison stops;
each vehicle is processed in the above manner, the vehicles finally put into the database are determined, and those vehicles are coded and put into the database;
and for the redundancy-removed background, taking one frame of background at intervals, coding it, and putting it into the database.
2. The traffic monitoring video coding method according to claim 1, wherein, when the foreground and background segmentation method is adopted to separate the vehicle from the background, the pixels within the rectangular region from the upper-left corner to the lower-right corner of the separated vehicle are taken as the vehicle, and the remaining part is taken as the background;
for the separated current vehicle to be coded and the corresponding background, SIFT features of the vehicle to be coded and the corresponding background are respectively extracted, and for each SIFT feature extracted from the current vehicle to be coded, the following formula is adopted to search in a certain position neighborhood range on the corresponding background:
(xs_c − xs_b)^2 + (ys_c − ys_b)^2 ≤ d^2;
wherein xs_c and ys_c denote the coordinates of a SIFT feature extracted from the current vehicle to be coded, xs_b and ys_b denote the coordinates of a SIFT feature extracted from the corresponding background, and d defines the range of the position neighborhood;
if the background SIFT feature found with the minimum Euclidean distance has a normalized distance to the given SIFT feature of the current vehicle to be coded smaller than a threshold, a SIFT feature similar to that vehicle SIFT feature exists in the background region; the corresponding SIFT feature of the current vehicle to be coded is then treated as a background SIFT feature and is removed from the vehicle's SIFT features.
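A minimal sketch of this background-feature removal, assuming L2-normalized SIFT descriptors and illustrative values for d and the distance threshold:

import numpy as np

def remove_background_sift(veh_feats, bg_feats, d=16.0, dist_thresh=0.4):
    # veh_feats / bg_feats: lists of (x, y, descriptor) tuples.
    kept = []
    for (xc, yc, desc_c) in veh_feats:
        # Background features inside the position neighborhood of radius d.
        nearby = [desc_b for (xb, yb, desc_b) in bg_feats
                  if (xc - xb) ** 2 + (yc - yb) ** 2 <= d ** 2]
        if nearby:
            min_dist = min(np.linalg.norm(desc_c - desc_b) for desc_b in nearby)
            if min_dist < dist_thresh:
                continue  # background-like feature: drop it from the vehicle
        kept.append((xc, yc, desc_c))
    return kept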
3. The traffic monitoring video coding method according to claim 1, wherein the selecting matched vehicles from the database by using the methods of feature matching and fast motion estimation for the vehicles to be coded comprises:
firstly, several candidate vehicles are roughly selected from the database by feature matching: the SIFT features of each vehicle in the database are quantized into visual words with the k-means algorithm, and a mapping mean vector is computed for each visual word; each SIFT feature of each vehicle in the database is mapped to its nearest visual word, and the mapped SIFT feature vector is compared with the mapping mean vector of that visual word to obtain a binarized representation of the feature vector; at the same time, each vehicle in the database is represented by the frequency histogram of the visual words of its SIFT features, and the histograms of all database vehicles are organized as an inverted list; for the current vehicle to be coded, each SIFT feature is assigned to its nearest visual word in the same way as for the database vehicles, giving the frequency histogram of the current vehicle to be coded, and the binarized representation of each of its SIFT features is computed as well; when comparing the similarity between the current vehicle to be coded and a vehicle in the database, only SIFT features mapped to the same visual word whose binarized representations differ by a Hamming distance below a threshold are counted, and the distance between tf-idf-weighted frequency histograms is used as the similarity measure; the similarity between the current vehicle to be coded and every vehicle in the database is computed in this way, the results are sorted, and the several vehicles ranking highest in similarity are selected as candidate vehicles;
then, a matching vehicle is selected from the candidate vehicles by fast motion estimation: the current vehicle to be coded is aligned with each candidate vehicle and divided into blocks of fixed size 16x16, and each 16x16 block searches a given candidate vehicle for the block with the minimum loss function, the loss function consisting of the sum of absolute differences and the coding rate of the motion vector; the search takes the position of the current 16x16 block as the starting point and performs an eight-point diamond search within a range of 64 pixels up, down, left, and right around it; the loss functions of all 16x16 blocks are accumulated as the overall loss function of the current vehicle to be coded on that candidate vehicle; finally, the candidate vehicle with the minimum overall loss function is retained as the matching vehicle.
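A simplified Python sketch of the per-block search (the eight-neighbour pattern below is a simplified stand-in for the eight-point diamond pattern, and the motion-vector coding rate is approximated by the vector magnitude):

import numpy as np

def sad(a: np.ndarray, b: np.ndarray) -> int:
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def search_16x16(block, ref, x0, y0, lam=4.0, search_range=64):
    # Searches around (x0, y0) in the candidate-vehicle image `ref` for the
    # 16x16 block minimizing SAD plus an approximate motion-vector rate.
    h, w = ref.shape
    neighbours = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                  (0, 1), (1, -1), (1, 0), (1, 1)]

    def cost(dy, dx):
        y, x = y0 + dy, x0 + dx
        if not (0 <= y <= h - 16 and 0 <= x <= w - 16):
            return float('inf')
        return sad(block, ref[y:y + 16, x:x + 16]) + lam * (abs(dx) + abs(dy))

    best, best_cost, improved = (0, 0), cost(0, 0), True
    while improved:
        improved = False
        for oy, ox in neighbours:
            dy, dx = best[0] + oy, best[1] + ox
            if max(abs(dy), abs(dx)) > search_range:
                continue
            c = cost(dy, dx)
            if c < best_cost:
                best, best_cost, improved = (dy, dx), c, True
    return best, best_cost

The overall loss of a candidate vehicle would then be the sum of best_cost over all its 16x16 blocks, and the candidate with the smallest sum is kept as the matching vehicle.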
4. The traffic monitoring video coding method according to claim 3, wherein the alignment of the current vehicle to be coded with each candidate vehicle is performed as follows:
for a given SIFT feature of the current vehicle to be coded, its distances to all SIFT features of each candidate vehicle are calculated and sorted in ascending order; if the following formulas are satisfied, the SIFT feature of the current vehicle to be coded is judged to have found a matching SIFT feature in the corresponding candidate vehicle:
d_1 ≤ D_2;
d_1/d_2 ≤ α;
wherein d_1 and d_2 are respectively the minimum and second-minimum distances, and D_2 and α are constants;
each SIFT feature of the current vehicle to be coded is processed in this manner to obtain the SIFT matching pairs between the current vehicle to be coded and each candidate vehicle; from the matching results, the position offset between the current vehicle to be coded and each candidate vehicle is calculated as shown in the following formulas:
MV_x = (1/n) Σ_{i=1..n} (xc_i − xv_i);
MV_y = (1/n) Σ_{i=1..n} (yc_i − yv_i);
wherein MV_x and MV_y are the horizontal and vertical components of the offset, n is the number of matched SIFT feature pairs, xc_i and yc_i are the coordinates of a SIFT feature of the current vehicle to be coded, xv_i and yv_i are the coordinates of the matched SIFT feature of the candidate vehicle, and i is the index of a SIFT feature matching pair;
abnormal points are removed iteratively to obtain the final position offset; the current vehicle to be coded is then aligned with the corresponding candidate vehicle according to the calculated position offset.
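A minimal sketch of the offset estimation, where the k-sigma rejection is one plausible reading of the iterative removal of abnormal points:

import numpy as np

def estimate_offset(matches, iters=3, k=2.0):
    # matches: list of ((xc, yc), (xv, yv)) coordinate pairs that already
    # passed the d_1 <= D_2 and d_1/d_2 <= alpha ratio test above.
    diffs = np.array([[xc - xv, yc - yv] for (xc, yc), (xv, yv) in matches],
                     dtype=np.float64)
    for _ in range(iters):
        mv = diffs.mean(axis=0)
        dev = np.linalg.norm(diffs - mv, axis=1)
        keep = dev <= k * dev.std() + 1e-9
        if keep.all() or not keep.any():
            break
        diffs = diffs[keep]
    mv_x, mv_y = diffs.mean(axis=0)  # MV_x, MV_y of the claim
    return mv_x, mv_y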
5. The traffic monitoring video coding method according to claim 1, wherein selecting the matching background from the database based on the sum of absolute differences for the background to be coded comprises:
the sum of absolute differences between the current background to be coded and the pixels at corresponding positions of a background in the database is taken as the similarity criterion, and is calculated for the current background to be coded against each background in the database as shown in the following formula:
SAD = Σ_{k∈B} |pc_k − pl_k|;
wherein pc_k and pl_k respectively denote the k-th pixel value of the current background to be coded and of a background in the database, and B is the set of pixels of the current background to be coded;
the calculated results are sorted in ascending order, and the background with the minimum sum of absolute differences is taken as the matching background of the current background to be coded.
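This selection is a plain argmin over SAD values; a minimal sketch:

import numpy as np

def select_matching_background(current_bg, db_backgrounds):
    # Returns the index of the database background with minimum SAD.
    sads = [int(np.abs(current_bg.astype(np.int32) - bg.astype(np.int32)).sum())
            for bg in db_backgrounds]
    best = int(np.argmin(sads))
    return best, sads[best]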
6. The traffic monitoring video coding method according to claim 1, wherein when the inter-frame prediction mode is adopted, the determining whether the vehicle to be coded or the background to be coded needs to perform rate distortion optimization processing on the matching vehicle or the matching background by using a predetermined mode comprises:
the comparison criterion for rate-distortion optimization in the inter-prediction mode is as follows:
J = D + λ × R;
wherein J is the Lagrangian loss function, D is the sum of absolute differences between the prediction block and the matching block, R is the number of bits used to represent the mode information, and λ is the Lagrange multiplier;
first, the Lagrangian loss functions of the current vehicle to be coded and of the current background to be coded with respect to the existing reference frames are calculated:
for each existing reference frame of the current vehicle to be coded, the motion vectors of the inter-predicted 4x4 blocks at the positions corresponding to the current vehicle to be coded, together with the picture numbers of their reference frames, are collected in units of 4x4 blocks; on this basis, the motion vector of the corresponding 4x4 block of the current vehicle to be coded is estimated as:

MVX_cur = MVX_ref × (POC_cur − POC_ref) / (POC_ref − POC_colref);
MVY_cur = MVY_ref × (POC_cur − POC_ref) / (POC_ref − POC_colref);

wherein MVX_ref and MVY_ref are respectively the horizontal and vertical components of the motion vector of an inter-predicted 4x4 block on the existing reference frame; POC_cur, POC_ref and POC_colref are respectively the picture number of the frame containing the current vehicle to be coded, the picture number of the existing reference frame, and the picture number of the reference frame of that inter-predicted 4x4 block; MVX_cur and MVY_cur are the estimated horizontal and vertical components for the corresponding 4x4 block of the current vehicle to be coded; each 4x4 block of the current vehicle to be coded is traversed, the number of inter-predicted 4x4 blocks and the estimated motion vectors of the corresponding 4x4 blocks are recorded, and the horizontal and vertical components of the finally estimated displacement of the current vehicle to be coded are taken as the average over all inter-predicted 4x4 block motion vectors;
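A minimal sketch of this scaling (our reading of the estimation formulas as standard POC-distance scaling; the degenerate-POC fallback is an assumption):

def scale_mv(mvx_ref, mvy_ref, poc_cur, poc_ref, poc_colref):
    # Scales a 4x4 block MV from the existing reference frame to the
    # current frame by the ratio of picture-number distances.
    if poc_ref == poc_colref:
        return mvx_ref, mvy_ref  # degenerate case: keep the reference MV
    s = (poc_cur - poc_ref) / (poc_ref - poc_colref)
    return mvx_ref * s, mvy_ref * s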
after the displacement of the current vehicle to be coded on each existing reference frame is obtained, the current vehicle to be coded is divided into blocks of fixed size 16x16, and each 16x16 block searches all existing reference frames in turn for the block with the minimum loss function, the loss function consisting of the sum of absolute differences and the coding rate of the motion vector; the search takes the position of the current 16x16 block, translated by the estimated displacement, as the starting point and performs an eight-point diamond search within a range of 64 pixels around it; in units of 16x16 blocks, the minimum loss function of every block of the current vehicle to be coded and its matching block over all existing reference frames are recorded; each 16x16 block of the current vehicle to be coded is traversed in turn and the per-block minima are accumulated, giving the Lagrangian loss function J_veh_ref of the current vehicle to be coded with respect to the existing reference frames;
the current background to be coded is divided into 16x16 blocks; for each current 16x16 block, the matching block with the minimum loss function is searched among all existing reference frames by comparing the sums of absolute differences between the 16x16 blocks at corresponding positions of all reference frames and the current 16x16 block, the minimum sum of absolute differences being selected as the loss function of that block; all 16x16 blocks of the current background to be coded are traversed and their loss functions accumulated, giving the Lagrangian loss function J_bg_ref of the current background to be coded;
then, taking the matching vehicle and matching background into account, updated Lagrangian loss functions are calculated:

for each 16x16 block of the current vehicle to be coded, on the basis of the per-block results computed above, the loss function of the block against the matching vehicle is calculated with fast motion estimation; the loss function of each 16x16 block against the matching vehicle is compared with that block's minimum loss function over the existing reference frames, and the smaller one is taken as the minimum loss function of the corresponding 16x16 block; each 16x16 block of the current vehicle to be coded is traversed and the per-block minima are accumulated; meanwhile, for the current vehicle to be coded, the change in bit count comprises the position index of the matching vehicle in the database, the position of the matching vehicle in the reference frame, the change of reference-index bits, and the CTU-level indication information; combining this bit-count change with the accumulated loss gives the updated Lagrangian loss function J_veh_lib;
for each 16x16 block of the current background to be coded, on the basis of the per-block results computed above, the loss function of the block against the matching background is calculated; the loss function of each 16x16 block against the matching background is compared with that block's minimum loss function over the existing reference frames, and the smaller one is taken as the minimum loss function of the corresponding 16x16 block; each 16x16 block of the current background to be coded is traversed and the per-block minima are accumulated; meanwhile, for the current background to be coded, the change in bit count comprises the position index of the matching background in the database and the change of reference-index bits; combining this bit-count change with the accumulated loss gives the updated Lagrangian loss function J_bg_lib;
finally, the Lagrangian loss function J_veh_ref is compared with the updated Lagrangian loss function J_veh_lib, and if J_veh_lib < J_veh_ref, rate-distortion optimization processing is performed on the matching vehicle; likewise, the Lagrangian loss function J_bg_ref is compared with the updated Lagrangian loss function J_bg_lib, and if J_bg_lib < J_bg_ref, rate-distortion optimization processing is performed on the matching background.
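A non-normative Python sketch of this decision (names are ours; extra_bits stands for the index/position/flag signalling overhead described above):

def updated_cost(block_costs_ref, block_costs_lib, extra_bits, lam):
    # Per-16x16-block minimum over existing references vs. the library
    # match, plus the signalling overhead weighted by lambda.
    j = sum(min(r, l) for r, l in zip(block_costs_ref, block_costs_lib))
    return j + lam * extra_bits

def use_library_match(j_ref, block_costs_ref, block_costs_lib,
                      extra_bits, lam):
    # RDO considers the matching vehicle/background only if the updated
    # Lagrangian cost beats the existing-reference-only cost.
    return updated_cost(block_costs_ref, block_costs_lib,
                        extra_bits, lam) < j_ref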
7. The traffic monitoring video coding method according to claim 1, wherein when the intra prediction mode is adopted, the determining whether the vehicle to be coded or the background to be coded needs to perform rate distortion optimization processing on the matching vehicle or the matching background by using a predetermined mode comprises:
the comparison criteria for rate-distortion optimization in intra prediction mode are:
J = D + λ × R;
wherein J is the Lagrangian loss function, D is the sum of absolute differences between the prediction block and the matching block, R is the number of bits used to represent the mode information, and λ is the Lagrange multiplier;
for the current background to be coded, rate-distortion optimization processing is always performed on the matching background in the intra prediction mode;
for the current vehicle to be coded, the loss function when intra prediction is used is first roughly estimated: the current vehicle to be coded is divided into blocks of fixed size 16x16, and for each 16x16 block the DC, planar, horizontal and vertical intra prediction modes are estimated in turn, giving the sum of absolute differences of each 16x16 block under each mode; during the intra prediction mode estimation, the reference pixel values of the current 16x16 block are derived from the original values of the neighbouring 16x16 blocks; for each 16x16 block, the sums of absolute differences estimated under all modes are sorted in ascending order, and the result with the smallest sum of absolute differences is taken as the best matching result of the current 16x16 block; all 16x16 blocks of the current vehicle to be coded are traversed and their best matching results accumulated, giving the Lagrangian loss function J_veh_intra of the current vehicle to be coded;
then, taking the matching vehicle into account, an updated Lagrangian loss function is calculated: for each 16x16 block of the current vehicle to be coded, on the basis of the per-block estimates computed above, the loss function of the block against the matching vehicle is calculated with fast motion estimation; the loss function of each 16x16 block against the matching vehicle is compared with that block's minimum intra-estimated sum of absolute differences, and the smaller one is taken as the minimum loss function of the corresponding 16x16 block; each 16x16 block of the current vehicle to be coded is traversed and the per-block minima are accumulated; meanwhile, for the current vehicle to be coded, the change in bit count comprises the position index of the matching vehicle in the database, the position of the matching vehicle in the reference frame, and the CTU-level indication information; combining this bit-count change with the accumulated loss gives the updated Lagrangian loss function J_veh_lib_intra;
finally, the Lagrangian loss function J_veh_intra is compared with the updated Lagrangian loss function J_veh_lib_intra, and if J_veh_lib_intra < J_veh_intra, rate-distortion optimization processing is performed on the matching vehicle.
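A minimal Python sketch of the rough intra estimate above (the planar predictor is a simplified average of the two directional ramps rather than the exact one, and the boundary fallback value 128 is illustrative):

import numpy as np

def rough_intra_sad_16x16(img, y, x):
    # Estimates DC, planar, horizontal and vertical modes for the 16x16
    # block at (y, x); reference pixels come from the ORIGINAL values of
    # the neighbouring blocks, as the claim specifies.
    n = 16
    blk = img[y:y + n, x:x + n].astype(np.float64)
    top = img[y - 1, x:x + n].astype(np.float64) if y > 0 else np.full(n, 128.0)
    left = img[y:y + n, x - 1].astype(np.float64) if x > 0 else np.full(n, 128.0)

    preds = {
        'dc': np.full((n, n), (top.mean() + left.mean()) / 2.0),
        'horizontal': np.repeat(left[:, None], n, axis=1),
        'vertical': np.repeat(top[None, :], n, axis=0),
    }
    preds['planar'] = (preds['horizontal'] + preds['vertical']) / 2.0
    sads = {mode: float(np.abs(blk - p).sum()) for mode, p in preds.items()}
    best = min(sads, key=sads.get)
    return best, sads[best]  # best mode and its SAD for this block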
8. The traffic monitoring video coding method according to claim 1, 6 or 7, wherein performing the corresponding processing according to the judgment result, coding with the corresponding prediction mode, and coding the information of the matching vehicle or matching background referred to during coding into the code stream comprises:
when the inter prediction mode is used, if rate-distortion optimization processing needs to be performed on the matching vehicle or matching background, a new reference frame space is allocated, and the matching vehicle or matching background is pasted onto the newly allocated reference frame, which is used together with the existing reference frames for inter prediction of the current vehicle to be coded or background to be coded; after inter prediction, each 4x4 block covered by the current vehicle to be coded or current background to be coded is traversed, and if a 4x4 block refers to the information of the matching vehicle or matching background, the corresponding syntax element is coded into the code stream;
when the intra prediction mode is used, if rate-distortion optimization processing needs to be performed on the matching vehicle or matching background, a new reference frame space is allocated, and the matching vehicle or matching background is pasted onto the newly allocated reference frame for intra prediction of the current vehicle to be coded or background to be coded;
the position at which the matching vehicle is pasted onto the newly allocated reference frame is determined as follows:

x_0 = x_c + MV_x;
y_0 = y_c + MV_y;

wherein x_0 and y_0 denote the position at which the matching vehicle is pasted onto the newly allocated reference frame, x_c and y_c denote the position of the current vehicle to be coded in the current frame, and MV_x and MV_y are the horizontal and vertical components of the offset of the current vehicle to be coded relative to the matching vehicle;
when the matched background is pasted on the reference frame, the matched background is aligned with the position of the reference frame.
9. The traffic monitoring video coding method according to claim 8, wherein the structure of the coded stream is divided into two layers, slice and coding tree unit (CTU); wherein:
slice layer: for the current vehicle to be coded, the slice layer comprises a flag indicating whether a matched vehicle is referenced in the current slice; all 4x4 blocks covered by vehicles in the current slice are traversed to judge whether they reference matched vehicles: if some 4x4 block references a matched vehicle, the flag is set to true, otherwise to false; if the flag is true, the slice layer further comprises a syntax element indicating the number of matched vehicles referenced in the current slice; for each matched vehicle, its position index in the database and the position at which it is pasted onto the newly allocated reference frame are coded into the code stream; the number of referenced matched vehicles, the index of each matched vehicle, and each paste position are coded with fixed-length coding;
for the current background to be coded, the slice layer comprises a flag indicating whether the matching background is referenced in the current slice; all 4x4 blocks covered by the background in the current slice are traversed to judge whether they reference the matching background: if some 4x4 block references the matching background, the flag is set to true, otherwise to false; if the flag is true, the slice layer further contains a syntax element for the position index of the referenced matching background in the database, coded with fixed-length coding;
CTU layer: for the current vehicle to be coded, the CTU layer comprises a flag indicating whether the current CTU references matched vehicle pixels; each 4x4 block in the current CTU is traversed, and if some 4x4 block references matched vehicle pixels, the flag is set to true, otherwise to false; when the flag is true, the CTU layer further includes a syntax element indicating the matched vehicle index;
for the current background to be encoded, the CTU layer contains a flag indicating whether the current CTU layer references matching background pixels.
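As an illustration of the slice-layer syntax for matched vehicles (the bit writer and all field widths below are hypothetical stand-ins; the claim only requires fixed-length coding):

class BitWriter:
    # Minimal illustrative bit writer; a real encoder has its own.
    def __init__(self):
        self.bits = []

    def write(self, value: int, nbits: int) -> None:
        self.bits.extend((value >> (nbits - 1 - i)) & 1 for i in range(nbits))

def write_vehicle_slice_syntax(bw, matched_vehicles, idx_bits=8, pos_bits=16):
    flag = len(matched_vehicles) > 0
    bw.write(int(flag), 1)                         # any block refers a match?
    if flag:
        bw.write(len(matched_vehicles), idx_bits)  # number of matched vehicles
        for v in matched_vehicles:
            bw.write(v['db_index'], idx_bits)      # position index in database
            bw.write(v['paste_x'], pos_bits)       # paste position on the
            bw.write(v['paste_y'], pos_bits)       # newly allocated ref frame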
CN201810720989.1A 2018-07-03 2018-07-03 Traffic monitoring video coding method Active CN108833928B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810720989.1A CN108833928B (en) 2018-07-03 2018-07-03 Traffic monitoring video coding method

Publications (2)

Publication Number Publication Date
CN108833928A CN108833928A (en) 2018-11-16
CN108833928B true CN108833928B (en) 2020-06-26

Family

ID=64135268

Country Status (1)

Country Link
CN (1) CN108833928B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109871024A (en) * 2019-01-04 2019-06-11 中国计量大学 A kind of UAV position and orientation estimation method based on lightweight visual odometry
CN111582251B (en) * 2020-06-15 2021-04-02 江苏航天大为科技股份有限公司 Method for detecting passenger crowding degree of urban rail transit based on convolutional neural network
CN112714322B (en) * 2020-12-28 2023-08-01 福州大学 Inter-frame reference optimization method for game video

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009111498A3 (en) * 2008-03-03 2009-12-03 Videoiq, Inc. Object matching for tracking, indexing, and search
CN105849771A (en) * 2013-12-19 2016-08-10 Metaio GmbH SLAM on a mobile device
CN104301735A (en) * 2014-10-31 2015-01-21 武汉大学 Method and system for global encoding of urban traffic surveillance video
CN104539962A (en) * 2015-01-20 2015-04-22 北京工业大学 Layered video coding method fused with visual perception features

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Background-modeling-based adaptive prediction for surveillance video coding; Xianguo Zhang; IEEE Transactions on Image Processing; 2014-12-31; full text *
Global coding of multi-source surveillance video data; Jing Xiao; Data Compression Conference (DCC); 2015-12-31; full text *
Knowledge-based coding of objects for multisource surveillance video data; Jing Xiao; IEEE Transactions on Multimedia; 2016-12-31; full text *
Surveillance video coding with vehicle library; Changyue Ma; 2017 IEEE International Conference on Image Processing (ICIP); 2017-09-20; Sections 1-4 *

Similar Documents

Publication Publication Date Title
CN110087087B (en) VVC inter-frame coding unit prediction mode early decision and block division early termination method
JP6073404B2 (en) Video decoding method and apparatus
WO2018192235A1 (en) Coding unit depth determination method and device
CN108833928B (en) Traffic monitoring video coding method
CN107657228B (en) Video scene similarity analysis method and system, and video encoding and decoding method and system
EP1022667A2 (en) Methods of feature extraction of video sequences
US11070803B2 (en) Method and apparatus for determining coding cost of coding unit and computer-readable storage medium
EP3405904B1 (en) Method for processing keypoint trajectories in video
JP2015536092A5 (en)
CN103873861A (en) Coding mode selection method for HEVC (high efficiency video coding)
CN109040764B (en) HEVC screen content intra-frame rapid coding algorithm based on decision tree
CN112437310B (en) VVC intra-frame coding rapid CU partition decision method based on random forest
CN103020138A (en) Method and device for video retrieval
Ma et al. Traffic surveillance video coding with libraries of vehicles and background
CN113112519A (en) Key frame screening method based on interested target distribution
KR102261669B1 (en) Artificial Neural Network Based Object Region Detection Method, Device and Computer Program Thereof
Ma et al. Surveillance video coding with vehicle library
WO2013163197A1 (en) Macroblock partitioning and motion estimation using object analysis for video compression
CN113422959A (en) Video encoding and decoding method and device, electronic equipment and storage medium
Ma et al. An adaptive lagrange multiplier determination method for dynamic texture in HEVC
CN112770116B (en) Method for extracting video key frame by using video compression coding information
CN110519597B (en) HEVC-based encoding method and device, computing equipment and medium
CN109618152A (en) Depth divides coding method, device and electronic equipment
CN116962708B (en) Intelligent service cloud terminal data optimization transmission method and system
CN114095736B (en) Fast motion estimation video coding method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant