TW202301874A - Video data transmission method, electronic device, and storage medium

Info

Publication number: TW202301874A
Application number: TW110129862A
Authority: TW
Inventors: 蔡福發; 張玉勇
Original assignee: 鴻海精密工業股份有限公司
Priority date: 2021-06-15
Filing date: 2021-08-12
Publication date: 2023-01-01
Also published as: CN115484376A

Abstract

A video data transmission method, an electronic device, and a storage medium are provided. The method includes: capturing video of a scene and recognizing at least one moving target and the background; recognizing moving target information and background information in each video segment; comparing the moving target information in the currently captured segment with that in the previous segment to determine an attribute change value and a motion trajectory change value of the moving target; comparing the background information in the currently captured segment with that in the previous segment to determine a pixel value change value of the background; if at least one of the attribute change value, the motion trajectory change value, and the pixel value change value is greater than or equal to its corresponding threshold, taking the attribute of the moving target, the motion trajectory of the moving target, or the background corresponding to that value as the video data to be transmitted; and transmitting the video data to a server.

Description

Video data transmission method, electronic device and storage medium

The present application relates to the technical field of video processing, and in particular to a video data transmission method, an electronic device, and a storage medium.

Video provides users with intuitive and vivid information, so video-related streaming media technology and the self-media industry are developing rapidly. Video data transmission usually involves a capture end shooting video, encoding it, and transmitting it to a server, which then distributes the video data to terminals for playback. However, video data is large, which places high demands on the processing capability of terminals. Its transmission also demands very high network bandwidth, which easily causes transmission delay or even network congestion, and storing large amounts of video data puts considerable storage pressure on the server.

In view of this, it is necessary to provide a video data transmission method, an electronic device, and a storage medium that transmit only the video data that has changed significantly in the captured video.

The present application provides a video data transmission method, including:

capturing video of a scene in real time with a camera device, and identifying a moving target and a background in the scene from at least one segment of the captured video;

identifying moving target information and background information in each captured video segment, wherein the moving target information includes at least an attribute and a motion trajectory of the moving target, and the background information includes at least a pixel value of the background;

comparing the moving target information in the currently captured video segment with the moving target information in the previously captured video segment to determine an attribute change value and a motion trajectory change value of the moving target;

comparing the background information in the currently captured video segment with the background information in the previously captured video segment to determine a pixel value change value of the background;

if at least one of the attribute change value, the motion trajectory change value, and the pixel value change value is greater than or equal to its corresponding threshold, taking the attribute of the moving target, the motion trajectory of the moving target, or the background corresponding to the at least one value as video data to be transmitted; and

encoding the video data to be transmitted and transmitting it to a server, so that the server fuses the received video data with the previous video segment and sends the result to a receiving end for playback.

Optionally, the method further includes:

if the attribute change value, the motion trajectory change value, and the pixel value change value are all smaller than their corresponding thresholds, sending a control instruction to the server, so that the server sends the previous video segment to the receiving end for playback.

Optionally, the method further includes:

dividing the video captured in real time into a plurality of segments; and

when no moving target is identified in the scene, encoding the first captured video segment and transmitting it to the server.

Optionally, identifying the moving target and the background in the scene from at least one segment of the captured video includes:

inputting each captured video segment into a target tracking model, and identifying the moving target in the scene with the target tracking model; and

taking the area of the scene outside the moving target as the background.

Optionally, identifying the moving target information and the background information in each captured video segment includes:

performing image recognition on the video frame images in each video segment to obtain the attribute of the moving target and the pixel value of the background.

Optionally, identifying the moving target information and the background information in each captured video segment further includes:

determining by image recognition whether the moving target is a human or an animal;

if the moving target is determined to be a human or an animal, identifying the joints of the human or animal with a pose recognition model, determining the coordinates of the joints, and determining the motion trajectory of the moving target from the changes in the joint coordinates; and

if the moving target is determined not to be a human or an animal, determining a rectangular frame of the moving target with the target tracking model, determining the center point of the rectangular frame, and determining the motion trajectory of the moving target from the changes in the coordinates of the center point.

Optionally, comparing the moving target information in the currently captured video segment with the moving target information in the previously captured video segment includes:

calculating the difference between the attribute of the moving target in the currently captured video segment and the attribute of the moving target in the previously captured video segment as the attribute change value of the moving target; and

calculating the difference between the motion trajectory of the moving target in the currently captured video segment and the motion trajectory of the moving target in the previously captured video segment as the motion trajectory change value of the moving target.

Comparing the background information in the currently captured video segment with the background information in the previously captured video segment includes:

calculating the difference between the pixel value of the background in the currently captured video segment and the pixel value of the background in the previously captured video segment as the pixel value change value of the background.

Optionally, the method further includes:

determining whether the attribute change value is greater than or equal to a first threshold, determining whether the motion trajectory change value is greater than or equal to a second threshold, and determining whether the pixel value change value is greater than or equal to a third threshold.

The present application also provides an electronic device, including:

a processor; and

a memory storing a plurality of program modules, the program modules being loaded by the processor to execute the video data transmission method described above.

The present application also provides a computer-readable storage medium storing at least one computer instruction, the instruction being loaded and executed by a processor to carry out the video data transmission method described above.

The video data transmission method, electronic device, and storage medium described above transmit only the video content that has changed significantly in the captured video. This reduces the amount of video data transmitted, which eases the data processing load and the network transmission load and also reduces the storage pressure on the server.

To make the above objects, features, and advantages of the present application easier to understand, the application is described in detail below with reference to the accompanying drawings and specific embodiments. Where no conflict arises, the embodiments of the present application and the features in those embodiments may be combined with one another.

Many specific details are set forth in the following description to aid a full understanding of the application. The described embodiments are only some of the embodiments of the application, not all of them. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the scope of protection of this application.

Unless otherwise defined, all technical and scientific terms used herein have the meanings commonly understood by those skilled in the technical field to which this application belongs. The terms used in this specification serve only to describe specific embodiments and are not intended to limit the application.

FIG. 1 is a schematic diagram of the application environment architecture of the video data transmission method provided by a preferred embodiment of the present application.

The video data transmission method of this application is applied in an electronic device 1, which establishes a communication connection with a server 2 and a receiving end 3 over a network. The network may be a wired network or a wireless network such as radio, Wi-Fi (Wireless Fidelity), cellular, satellite, or broadcast. The cellular network may be a 4G network or a 5G network.

The electronic device 1 may be an electronic device on which a video data transmission program is installed, such as a smartphone, a personal computer, or a server, where the server may be a single server, a server cluster, a cloud server, or the like.

The server 2 may be a single server, a server cluster, a cloud server, or the like. The receiving end 3 may be a smartphone, a personal computer, a smart TV, or the like.

FIG. 2 is a flowchart of the video data transmission method provided by a preferred embodiment of the present application. Depending on requirements, the order of the steps in the flowchart may be changed and some steps may be omitted.

S201: capture video of a scene in real time with a camera device, and identify a moving target and a background in the scene from at least one segment of the captured video.

In one embodiment, the scene may be a classroom, a TV program, or the like. S201 includes: dividing the video captured in real time into a plurality of segments, inputting each captured segment into a target tracking model, identifying the moving target in the scene with the target tracking model, and taking the area of the scene outside the moving target as the background. In one embodiment, the video captured in real time is divided evenly into segments whose duration equals a preset time; optionally, the preset time is one second.
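The segmentation in S201 can be sketched as follows. This is only an illustration: the function name `split_into_segments`, the `fps` parameter, and the choice to keep a shorter final segment are assumptions, not details given in the application.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def split_into_segments(
    frames: Sequence[Frame], fps: int, segment_seconds: float = 1.0
) -> List[List[Frame]]:
    """Group consecutive frames into segments of a preset duration.

    Assumption: a trailing segment shorter than the preset duration is kept.
    """
    frames_per_segment = max(1, round(fps * segment_seconds))
    return [
        list(frames[start:start + frames_per_segment])
        for start in range(0, len(frames), frames_per_segment)
    ]


# 75 placeholder frames at 30 frames per second, one-second segments.
segments = split_into_segments(list(range(75)), fps=30)
print([len(segment) for segment in segments])  # [30, 30, 15]
```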

In one embodiment, the target tracking model is the SA-Siam model; FIG. 3 is a schematic diagram of its structure. In the SA-Siam model, S stands for the semantic branch, shown as the boxes connected by solid lines in FIG. 3, and A stands for the appearance branch, shown as the boxes connected by dotted lines in FIG. 3. A-Net is a SiamFC network. S-Net uses a network trained on the ImageNet image classification task with all of its parameters frozen, and transfer learning is achieved by training the fusion module (fuse) in the semantic branch. The semantic branch also introduces an attention module, which highlights the target object while suppressing non-target objects. Each branch produces a response map h by correlating, in feature space, with the target in the first video frame image. The two response maps are combined by weighted averaging into a final response map, and the point of maximum response in the final response map is located and interpolated to a position in the current video frame image. This gives the target position in the current video frame and completes tracking of the target in that frame. Comparing the target's position in the first video frame with its position in the current video frame determines whether the target has moved; if it has, the target is determined to be a moving target.
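The last steps of this description (weighted averaging of the two response maps, locating the maximum response, and checking for motion) can be sketched in plain Python. The sketch does not implement SA-Siam itself. The response maps are hand-written stand-ins, and the names `fuse_response_maps`, `peak_position`, `has_moved`, the `weight` parameter, and the `min_shift` tolerance are all hypothetical.

```python
from typing import List, Tuple

ResponseMap = List[List[float]]


def fuse_response_maps(
    appearance: ResponseMap, semantic: ResponseMap, weight: float = 0.5
) -> ResponseMap:
    """Weighted average of the appearance and semantic response maps."""
    return [
        [weight * a + (1.0 - weight) * s for a, s in zip(row_a, row_s)]
        for row_a, row_s in zip(appearance, semantic)
    ]


def peak_position(response: ResponseMap) -> Tuple[int, int]:
    """Return (row, column) of the first maximum response."""
    best_row, best_col = 0, 0
    for r, row in enumerate(response):
        for c, value in enumerate(row):
            if value > response[best_row][best_col]:
                best_row, best_col = r, c
    return best_row, best_col


def has_moved(
    first: Tuple[int, int], current: Tuple[int, int], min_shift: int = 1
) -> bool:
    """Assumed rule: the target moved if either coordinate shifted by min_shift or more."""
    return (
        abs(current[0] - first[0]) >= min_shift
        or abs(current[1] - first[1]) >= min_shift
    )


appearance_map = [[0.1, 0.2], [0.3, 0.9]]
semantic_map = [[0.0, 0.8], [0.1, 0.7]]
peak = peak_position(fuse_response_maps(appearance_map, semantic_map))
print(peak)                     # (1, 1)
print(has_moved((0, 0), peak))  # True
```

Mapping the peak back to pixel coordinates in the frame (the interpolation step) depends on the network's stride and upsampling and is left out here.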

In one embodiment, S201 further includes: when no moving target is identified in the scene, encoding the first captured video segment and transmitting it to the server 2. When no moving target is identified in the scene, the video picture is treated as unchanged, so only the first captured video segment needs to be encoded and transmitted to the server 2. The server 2 decodes this first segment and keeps distributing it to the receiving end 3 for playback.

S202: identify the moving target information and background information in each captured video segment.

In one embodiment, the moving target information includes at least the attribute and the motion trajectory of the moving target. The attribute of the moving target may be at least one of quantity, pixel value, and shape area, where the shape area is the number of pixels contained in the shape of the moving target. The background information includes at least the pixel value of the background.

In one embodiment, S202 includes: performing image recognition on the video frame images in each video segment to obtain the attribute of the moving target and the pixel value of the background. Optionally, the attribute of the moving target includes the pixel value and the shape area. The pixel value and shape area of the moving target and the pixel value of the background can be obtained with an image recognition method, for example image recognition based on OpenCV.
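A minimal sketch of how these quantities could be computed, assuming a grayscale frame and a boolean mask marking the moving target are already available. Plain nested lists stand in for image arrays, no OpenCV call is used, and treating "pixel value" as the mean intensity of a region is an assumption made for the illustration. The name `region_statistics` is hypothetical.

```python
from typing import Dict, List


def region_statistics(
    gray: List[List[float]], target_mask: List[List[bool]]
) -> Dict[str, float]:
    """Compute target and background statistics for one frame.

    shape_area is the number of pixels inside the target mask; the pixel
    values are taken as mean intensities (an assumption of this sketch).
    """
    target_pixels: List[float] = []
    background_pixels: List[float] = []
    for gray_row, mask_row in zip(gray, target_mask):
        for value, is_target in zip(gray_row, mask_row):
            (target_pixels if is_target else background_pixels).append(value)

    def mean(values: List[float]) -> float:
        return sum(values) / len(values) if values else 0.0

    return {
        "shape_area": float(len(target_pixels)),
        "target_pixel_value": mean(target_pixels),
        "background_pixel_value": mean(background_pixels),
    }


frame = [[10, 20], [30, 40]]
mask = [[True, False], [True, False]]
print(region_statistics(frame, mask))
# {'shape_area': 2.0, 'target_pixel_value': 20.0, 'background_pixel_value': 30.0}
```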

In one embodiment, S202 further includes: determining by image recognition whether the moving target is a human or an animal. If the moving target is determined to be a human or an animal, the joints of the human or animal are identified with a pose recognition model, the coordinates of the joints are determined, and the motion trajectory of the moving target is determined from the changes in the joint coordinates. If the moving target is determined not to be a human or an animal, a rectangular frame of the moving target is determined with the target tracking model, the center point of the rectangular frame is determined, and the motion trajectory of the moving target is determined from the changes in the coordinates of the center point.

Referring to FIG. 4, in one embodiment the pose recognition model is the OpenPose model. The video frame images of a video segment are input into the pose recognition model, which uses VGG-19 to preprocess the frame images and generate a feature map F. The feature map F is input into the first stage (Stage 1), which generates part affinity fields (PAFs), that is, the orientation of pixels along the human or animal skeleton. Each subsequent stage (Stage t) combines the prediction of the previous stage with the feature map F of the original video frame image to produce a refined prediction, and this iteration yields the feature points on the human or animal skeleton, namely the joint points. The coordinates of a joint point are pixel coordinates, that is, the position of the pixel corresponding to the joint point. A subset of key joint points may be selected from the identified joint points; optionally, the key joint points include the head, elbow, wrist, and knee joint points. The pose recognition model tracks the key joint points, and the distance each key joint point moves within the video segment is calculated from the changes in its coordinates and taken as the motion trajectory of the moving target. Note that the shooting range of the camera device is fixed, so the pixel coordinates of the key joint points change when the human or animal moves.
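The distance calculation for key joint points can be sketched as follows, assuming the pose model has already produced per-frame pixel coordinates. The data layout (one dictionary of joint name to `(x, y)` per frame), the joint names, and summing straight-line steps between consecutive frames are assumptions made for the illustration.

```python
import math
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float]


def joint_travel_distances(
    frames: List[Dict[str, Point]], key_joints: Iterable[str]
) -> Dict[str, float]:
    """Distance each key joint moves across a segment, in pixels.

    Frames in which a joint was not detected are skipped for that joint.
    """
    distances: Dict[str, float] = {}
    for joint in key_joints:
        positions = [frame[joint] for frame in frames if joint in frame]
        distances[joint] = sum(
            math.dist(p, q) for p, q in zip(positions, positions[1:])
        )
    return distances


segment = [
    {"head": (0.0, 0.0), "wrist": (10.0, 10.0)},
    {"head": (3.0, 4.0), "wrist": (10.0, 10.0)},
    {"head": (6.0, 8.0), "wrist": (10.0, 13.0)},
]
print(joint_travel_distances(segment, ["head", "wrist"]))
# {'head': 10.0, 'wrist': 3.0}
```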

In one embodiment, if the moving target is determined not to be a human or an animal, the moving target is identified with the SA-Siam model described above. The smallest frame that encloses the moving target is taken as its recognition frame, and the recognition frame is used as the rectangular frame of the moving target. The center point of the rectangular frame is determined, and the distance the moving target moves within the video segment is calculated from the changes in the coordinates of the center point, which gives the motion trajectory of the moving target. The coordinates of the center point are pixel coordinates, that is, the position of the pixel corresponding to the center point.
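A small sketch of the rectangular-frame variant, assuming the tracker yields the set of pixels belonging to the target in each frame. The helper names `bounding_box_center` and `center_travel_distance` are hypothetical, and an axis-aligned box is assumed.

```python
import math
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int]


def bounding_box_center(pixels: Sequence[Pixel]) -> Tuple[float, float]:
    """Center of the smallest axis-aligned box enclosing the target pixels."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return (min(xs) + max(xs)) / 2.0, (min(ys) + max(ys)) / 2.0


def center_travel_distance(per_frame_pixels: List[Sequence[Pixel]]) -> float:
    """Distance the box center moves across the frames of a segment, in pixels."""
    centers = [bounding_box_center(pixels) for pixels in per_frame_pixels]
    return sum(math.dist(p, q) for p, q in zip(centers, centers[1:]))


frame_a = [(2, 3), (4, 7), (6, 5)]    # box center (4.0, 5.0)
frame_b = [(5, 7), (7, 11), (9, 9)]   # box center (7.0, 9.0)
print(bounding_box_center(frame_a))                # (4.0, 5.0)
print(center_travel_distance([frame_a, frame_b]))  # 5.0
```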

S203: compare the moving target information in the currently captured video segment with the moving target information in the previously captured video segment to determine the attribute change value and the motion trajectory change value of the moving target.

In one embodiment, the attribute of the moving target in the currently captured video segment is compared with the attribute of the moving target in the previously captured video segment, and the difference between the two is taken as the attribute change value of the moving target. For example, the difference between the pixel value of the moving target in the currently captured segment and its pixel value in the previously captured segment is calculated as the attribute change value.

In one embodiment, the motion trajectory of the moving target in the currently captured video segment is compared with its motion trajectory in the previously captured video segment, and the difference between the two is taken as the motion trajectory change value of the moving target.

S204: compare the background information in the currently captured video segment with the background information in the previously captured video segment to determine the pixel value change value of the background.

In one embodiment, the pixel value of the background in the currently captured video segment is compared with the pixel value of the background in the previously captured video segment, and the difference between the two is taken as the pixel value change value of the background. In one embodiment, the pixel value of the background is the average pixel value of all objects in the background.
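The three change values of S203 and S204 can be sketched as below. The `SegmentInfo` container and its field names are hypothetical, each quantity is reduced to a single number per segment for simplicity, and the absolute difference is used even though the text only says "difference".

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SegmentInfo:
    """Per-segment summary (hypothetical layout for this sketch)."""
    target_pixel_value: float      # attribute of the moving target
    trajectory: float              # distance moved within the segment
    background_pixel_value: float  # average pixel value of the background


def change_values(current: SegmentInfo, previous: SegmentInfo) -> Dict[str, float]:
    """Absolute differences between the current and previous segments."""
    return {
        "attribute": abs(current.target_pixel_value - previous.target_pixel_value),
        "trajectory": abs(current.trajectory - previous.trajectory),
        "background": abs(
            current.background_pixel_value - previous.background_pixel_value
        ),
    }


previous = SegmentInfo(target_pixel_value=120.0, trajectory=15.0, background_pixel_value=80.0)
current = SegmentInfo(target_pixel_value=135.0, trajectory=40.0, background_pixel_value=82.0)
print(change_values(current, previous))
# {'attribute': 15.0, 'trajectory': 25.0, 'background': 2.0}
```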

S205: determine whether the attribute change value is greater than or equal to a first threshold, whether the motion trajectory change value is greater than or equal to a second threshold, and whether the pixel value change value is greater than or equal to a third threshold.

S206: if at least one of the attribute change value, the motion trajectory change value, and the pixel value change value is greater than or equal to its corresponding threshold, take the attribute of the moving target, the motion trajectory of the moving target, or the background corresponding to the at least one value as the video data to be transmitted.

In one embodiment, if the attribute change value is determined to be greater than or equal to the first threshold, the attribute of the moving target is taken as the video data to be transmitted. For example, if the pixel value change value of the moving target is greater than or equal to the first threshold, the pixel value of the moving target is taken as the video data to be transmitted. If the motion trajectory change value is greater than or equal to the second threshold, the motion trajectory of the moving target is taken as the video data to be transmitted. If the pixel value change value of the background is greater than or equal to the third threshold, the pixel value of the background is taken as the video data to be transmitted.

In one embodiment, if the attribute change value is greater than or equal to the first threshold and the motion trajectory change value is greater than or equal to the second threshold, the attribute and the motion trajectory of the moving target are taken as the video data to be transmitted. If the attribute change value is greater than or equal to the first threshold and the pixel value change value of the background is greater than or equal to the third threshold, the attribute of the moving target and the pixel value of the background are taken as the video data to be transmitted. If the motion trajectory change value is greater than or equal to the second threshold and the pixel value change value of the background is greater than or equal to the third threshold, the motion trajectory of the moving target and the pixel value of the background are taken as the video data to be transmitted.

In one embodiment, if the attribute change value is greater than or equal to the first threshold, the motion trajectory change value is greater than or equal to the second threshold, and the pixel value change value of the background is greater than or equal to the third threshold, the attribute of the moving target, the motion trajectory of the moving target, and the pixel value of the background are all taken as the video data to be transmitted.
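The case analysis in S205 and S206 reduces to an independent test per quantity: each item is transmitted exactly when its own change value reaches its own threshold. A minimal sketch, in which the dictionary keys, the threshold numbers, and the function name `select_data_to_transmit` are illustrative assumptions:

```python
from typing import Any, Dict


def select_data_to_transmit(
    changes: Dict[str, float],
    thresholds: Dict[str, float],
    current_data: Dict[str, Any],
) -> Dict[str, Any]:
    """Keep each item whose change value is >= its corresponding threshold.

    An empty result means nothing changed enough to be transmitted.
    """
    return {
        name: current_data[name]
        for name, change in changes.items()
        if change >= thresholds[name]
    }


changes = {"attribute": 15.0, "trajectory": 25.0, "background": 2.0}
# First, second, and third thresholds (values chosen only for the example).
thresholds = {"attribute": 10.0, "trajectory": 20.0, "background": 5.0}
current_data = {"attribute": 135.0, "trajectory": 40.0, "background": 82.0}

print(select_data_to_transmit(changes, thresholds, current_data))
# {'attribute': 135.0, 'trajectory': 40.0}
```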

S207: encode the video data to be transmitted and transmit it to the server 2, so that the server 2 fuses the received video data with the previous video segment and sends the result to the receiving end 3 for playback.

In one embodiment, the video data to be transmitted is encoded before being transmitted to the server 2. The server 2 decodes the received video data and fuses the decoded video data with the video data of the previous video segment to form a complete video. During fusion, the received video data is written into the video data of the previous segment; that is, the video data that changed significantly is updated, while video data with little or no change still comes from the previous segment. For example, if the video data to be transmitted is the motion trajectory of the moving target, the server 2 fuses the received motion trajectory with the previous video segment by replacing the motion trajectory of the moving target in the previous segment with its current motion trajectory, thereby forming a complete video.

In one embodiment, the server 2 distributes the fused video to the receiving end 3 for playback through a content delivery network.

If at least one of the attribute change value, the motion trajectory change value, and the pixel value change value is greater than or equal to its corresponding threshold, the attribute of the moving target, its motion trajectory, or the pixel value of the background has changed significantly. Only the video data that changed significantly is encoded and transmitted to the server 2. This eases the processing and transmission load for video data without affecting the real-time nature of the video, and also reduces the storage pressure on the server 2.

S208: if the attribute change value, the motion trajectory change value, and the pixel value change value are all smaller than their corresponding thresholds, send a control instruction to the server 2, so that the server 2 sends the previous video segment to the receiving end 3 for playback.

If all three change values are smaller than their corresponding thresholds, the attribute of the moving target, its motion trajectory, and the pixel value of the background have changed little or not at all, and the currently captured video data does not need to be transmitted to the server 2. This eases the processing and transmission load for video data without affecting the real-time nature of the video, and also reduces the storage pressure on the server 2.

In one embodiment, the control instruction instructs the server 2 to skip video fusion and send the previous video segment directly to the receiving end 3 for playback.
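The server-side behaviour in S207 and S208 can be sketched as a single handler. The message format (a dictionary with a `type` field and an optional `payload`), the type names `update` and `reuse_previous`, and the representation of a segment as a dictionary of components are all hypothetical; the application does not specify a wire format, and real fusion would operate on decoded video data.

```python
from typing import Any, Dict


def handle_message(
    previous_segment: Dict[str, Any], message: Dict[str, Any]
) -> Dict[str, Any]:
    """Return the segment the server should send to the receiving end."""
    if message["type"] == "reuse_previous":
        # Control instruction: skip fusion and replay the previous segment.
        return previous_segment
    # Fusion: overwrite only the components that were transmitted;
    # everything else still comes from the previous segment.
    fused = dict(previous_segment)
    fused.update(message["payload"])
    return fused


previous = {"attribute": 120.0, "trajectory": 15.0, "background": 80.0}

update = {"type": "update", "payload": {"trajectory": 40.0}}
print(handle_message(previous, update))
# {'attribute': 120.0, 'trajectory': 40.0, 'background': 80.0}

control = {"type": "reuse_previous"}
print(handle_message(previous, control))
# {'attribute': 120.0, 'trajectory': 15.0, 'background': 80.0}
```

Copying the previous segment before updating keeps the stored segment unchanged until the caller decides to replace it with the fused result.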

需要說明的是，本申請的視頻資料傳輸方法對視頻的處理僅包括對視頻圖像的處理，視頻的聲音仍然即時、同步地進行傳輸及播放，如此雖然用戶藉由接收端3看到的視頻是經過融合的，但聲音仍同步播放，從而不會影響用戶觀看視頻。It should be noted that the video data transmission method of the present application only includes the processing of video images, and the sound of the video is still transmitted and played in real time and synchronously, so although the video that the user sees through the receiving end 3 is blended, but the sound is still played synchronously so that it does not interfere with the user watching the video.

The video data transmission method provided by the present application can transmit only the video content that has changed significantly in the captured video. This reduces the amount of video data transmitted, thereby reducing the data processing load and the network transmission load, and also reducing the storage load on the server.

Please refer to FIG. 5, which is a schematic structural diagram of an electronic device provided by a preferred embodiment of the present application.

The electronic device 1 includes, but is not limited to, a processor 10, a memory 20, a computer program 30 stored in the memory 20 and executable on the processor 10, and a camera device 40. For example, the computer program 30 is a video data transmission program. When the processor 10 executes the computer program 30, the steps of the video data transmission method are implemented, such as steps S201~S208 shown in FIG. 2.

By way of example, the computer program 30 may be divided into one or more modules/units, which are stored in the memory 20 and executed by the processor 10 to carry out the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments describe the execution process of the computer program 30 in the electronic device 1.

Those skilled in the art will understand that the schematic diagram is merely an example of the electronic device 1 and does not limit the electronic device 1, which may include more or fewer components than shown, combine certain components, or use different components. For example, the electronic device 1 may also include input/output devices, network access devices, buses, and the like.

The processor 10 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor, or the processor 10 may be any conventional processor. The processor 10 is the control center of the electronic device 1 and connects the various parts of the entire electronic device 1 through various interfaces and lines.

The memory 20 may be used to store the computer program 30 and/or the modules/units. The processor 10 implements the various functions of the electronic device 1 by running or executing the computer program and/or modules/units stored in the memory 20 and by calling the data stored in the memory 20. The memory 20 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and the application programs required by at least one function (such as an audio playback function or an image playback function). The data storage area may store data created through use of the electronic device 1 (such as audio data or a phone book). In addition, the memory 20 may include volatile and non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another storage device. The camera device 40 is a CCD camera.

If the modules/units integrated in the electronic device 1 are implemented as software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. On this understanding, all or part of the processes in the methods of the above embodiments may also be carried out by a computer program that instructs the relevant hardware. The computer program may be stored in a computer-readable storage medium, and when executed by a processor it implements the steps of the method embodiments described above. The computer program includes computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, computer memory, read-only memory (ROM), or random access memory (RAM).

The video data transmission method, electronic device, and storage medium provided by the present application can transmit only the video content that has changed significantly in the captured video. This reduces the amount of video data transmitted, thereby reducing the data processing load and the network transmission load, and also reducing the storage load on the server.

It will be apparent to those skilled in the art that the present application is not limited to the details of the exemplary embodiments described above, and that it can be implemented in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in every respect as exemplary and non-limiting. The scope of the present application is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalents of the claims are intended to be embraced in the present application. No reference sign in a claim should be construed as limiting that claim. Furthermore, the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. A plurality of units or devices recited in a device claim may also be implemented by a single unit or device through software or hardware. Words such as "first" and "second" are used as names and do not indicate any particular order.

In summary, the present invention meets the requirements for an invention patent, and a patent application is filed in accordance with the law. However, the above describes only preferred embodiments of the present invention; all equivalent modifications or variations made by those skilled in the art in accordance with the spirit of the present invention shall fall within the scope of the following claims.

1: electronic device
10: processor
20: memory
30: computer program
40: camera device
2: server
3: receiving end
S201~S208: steps

FIG. 1 is a schematic diagram of the application environment architecture of a video data transmission method provided by a preferred embodiment of the present application.
FIG. 2 is a flowchart of the video data transmission method provided by a preferred embodiment of the present application.
FIG. 3 is a schematic structural diagram of a target tracking model provided by a preferred embodiment of the present application.
FIG. 4 is a schematic structural diagram of a pose recognition model provided by a preferred embodiment of the present application.
FIG. 5 is a schematic structural diagram of an electronic device provided by a preferred embodiment of the present application.


Claims

A video data transmission method, comprising:
capturing video of a scene in real time with a camera device, and identifying a moving target and a background in the scene from at least one captured video segment;
identifying moving target information and background information in each captured video segment, wherein the moving target information includes at least an attribute and a motion trajectory of the moving target, and the background information includes at least pixel values of the background;
comparing the moving target information in the currently captured video segment with the moving target information in the previously captured video segment to determine an attribute change value and a motion trajectory change value of the moving target;
comparing the background information in the currently captured video segment with the background information in the previously captured video segment to determine a pixel value change value of the background;
if at least one of the attribute change value, the motion trajectory change value, and the pixel value change value is greater than or equal to its corresponding threshold, taking the attribute of the moving target, the motion trajectory of the moving target, or the background corresponding to the at least one value as video data to be transmitted; and
encoding the video data to be transmitted and transmitting it to a server, the server fusing the received video data with the previous video segment and sending the result to a receiving end for playback.

The video data transmission method as described in claim 1, wherein the method further comprises:
if the attribute change value, the motion trajectory change value, and the pixel value change value are all less than their corresponding thresholds, sending a control instruction to the server, so that the server sends the previous video segment to the receiving end for playback.

The video data transmission method as described in claim 2, wherein the method further comprises:
dividing the video captured in real time into a plurality of segments; and
when no moving target is recognized in the scene, encoding the first captured video segment and transmitting it to the server.

The video data transmission method as described in claim 1, wherein identifying the moving target and the background in the scene from at least one captured video segment comprises:
inputting each captured video segment into a target tracking model, and identifying the moving target in the scene with the target tracking model; and
taking the area of the scene outside the moving target as the background.

The video data transmission method as described in claim 1, wherein identifying the moving target information and background information in each captured video segment comprises:
performing image recognition on the video frame images in each video segment to obtain the attribute of the moving target and the pixel values of the background.

The video data transmission method as described in claim 5, wherein identifying the moving target information and background information in each captured video segment further comprises:
determining by image recognition whether the moving target is a person or an animal;
if the moving target is determined to be a person or an animal, identifying the joints of the person or animal with a pose recognition model, determining the coordinates of the joints, and determining the motion trajectory of the moving target from the change in the coordinates of the joints; and
if the moving target is determined not to be a person or an animal, determining a rectangular frame of the moving target with the target tracking model, determining the center point of the rectangular frame, and determining the motion trajectory of the moving target from the change in the coordinates of the center point.

The video data transmission method as described in claim 1, wherein comparing the moving target information in the currently captured video segment with the moving target information in the previously captured video segment comprises:
calculating the difference between the attribute of the moving target in the currently captured video segment and the attribute of the moving target in the previously captured video segment as the attribute change value of the moving target; and
calculating the difference between the motion trajectory of the moving target in the currently captured video segment and the motion trajectory of the moving target in the previously captured video segment as the motion trajectory change value of the moving target;
and wherein comparing the background information in the currently captured video segment with the background information in the previously captured video segment comprises:
calculating the difference between the pixel values of the background in the currently captured video segment and the pixel values of the background in the previously captured video segment as the pixel value change value of the background.

The video data transmission method as described in claim 1, wherein the method further comprises:
determining whether the attribute change value is greater than or equal to a first threshold, determining whether the motion trajectory change value is greater than or equal to a second threshold, and determining whether the pixel value change value is greater than or equal to a third threshold.

An electronic device, comprising:
a processor; and
a memory in which a plurality of program modules are stored, the plurality of program modules being loaded by the processor to execute the video data transmission method as described in any one of claims 1 to 8.

A computer-readable storage medium on which at least one computer instruction is stored, wherein the instruction is loaded by a processor to execute the video data transmission method as described in any one of claims 1 to 8.