WO2021051912A1

WO2021051912A1 - Media data transmission method and related device

Info

Publication number: WO2021051912A1
Application number: PCT/CN2020/097302
Authority: WO
Inventors: Liu Jun (刘俊); Yang Shengkai (杨胜凯)
Original assignee: Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date: 2019-09-19
Filing date: 2020-06-20
Publication date: 2021-03-25
Also published as: CN111263097B; CN111263097A

Abstract

Disclosed are a media data transmission method and a related device. The media data transmission method comprises: a camera generating a plurality of original video frames; the camera generating a video by using the plurality of original video frames; the camera acquiring information of the position of a target image in the video, wherein the target image is a video frame in the video or part of the video frame in the video; and the camera sending the video and the position information to a storage device. The video and the position information of the target image can be transmitted, such that the consumption of a transmission bandwidth is effectively reduced.

Description

Media data transmission method and related equipment

Technical field

This application relates to the field of data processing, and in particular to a media data transmission method and related equipment.

Background

A camera captures a large amount of video data and transmits the video to a storage device for storage.

As cameras become more intelligent, the camera needs to transmit pictures to the storage device in addition to video, so that the storage device can perform operations such as image recognition. This additional image transmission increases both network bandwidth usage and the space occupied on the storage device.

Summary of the invention

The embodiments of the present application provide a media data transmission method and related equipment, which transmit a video together with the position information of a target image in that video, thereby effectively reducing the consumption of transmission bandwidth.

In the first aspect, an embodiment of the present application provides a media data transmission method, the method including:

The camera generates multiple original video frames;

The camera generates a video from the multiple original video frames;

The camera obtains the position information of the target image in the video, where the target image is a video frame in the video or a part of the video frame in the video;

The camera sends the video and location information to the storage device.

In this example, the video and the position information of the target image are transmitted. Because the position information is much smaller than the image, the transmission bandwidth it occupies is also much smaller than the bandwidth the image would occupy, which reduces the consumption of bandwidth resources.

With reference to the first aspect, in a possible embodiment of the first aspect, when the category of the target image is a large image, the position information includes a first absolute position and/or a first relative position, where the first absolute position includes one or more of the frame number and the timestamp of the target image in the video, and the first relative position includes the offset of the target image relative to a specific video frame.

When the category of the target image is a small image, the position information includes a second absolute position and/or a second relative position, where the second absolute position includes one or more of the frame number and the timestamp, in the video, of the video frame corresponding to the target image, together with the position of the target image in that video frame; the second relative position includes the offset of the target image relative to a specific video frame, together with the position of the target image in the corresponding video frame.

In this example, when the position information includes both the absolute position and the relative position, the two can be transmitted at the same time and used to verify each other, which avoids errors caused by video frames lost during transmission.

With reference to the first aspect, in a possible embodiment of the first aspect, the camera does not generate the target image, and does not send the target image to the storage device.

In this example, the camera does not generate the target image and does not send the target image to the storage device, so there is no need to transmit the target image separately, which reduces the consumption of bandwidth resources.

With reference to the first aspect, in a possible embodiment of the first aspect, the method further includes:

The camera selects the target frame where the target image is located from the video;

The camera obtains the position information of the target image in the video according to the target frame and the video.

The target image is located in an I frame of a GOP (group of pictures) of the video.

The camera obtains the target image from the multiple original video frames, where the target image is an image, selected according to image quality, that includes the target feature in the multiple original video frames.

With reference to the first aspect, in a possible embodiment of the first aspect, the method further includes: the storage device receives the video sent by the camera and the location information of the target image;

The storage device obtains the target image from the corresponding video frame of the video according to the location information.

The storage device stores the video sent by the camera and the location information of the target image;

At the end of the storage life cycle of the video, the storage device obtains the target image from the corresponding video frame of the video according to the location information;

The storage device saves the target image;

The storage device deletes the video.

In this example, at the end of the storage life cycle of the video, the target image is obtained from the corresponding video frame of the video according to the location information and saved, so that the target image remains available after the video is subsequently deleted and can be provided for later image searches.

The storage device receives the video stream sent by the camera and the location information of the target image, where the target image is a video frame in the video stream or a part of the video frame in the video stream;

The storage device obtains the target image from the corresponding video frame of the video stream according to the location information.

In a second aspect, an embodiment of the present application provides a media data transmission method, the method including:

The storage device receives the video sent by the camera and the location information of the target image, where the target image is a video frame in the video or a part of the video frame in the video;

The storage device obtains the target image from the corresponding video frame of the video according to the location information.

With reference to the second aspect, in a possible embodiment of the second aspect, during the storage life cycle of the video, when the storage device receives a request to read the target image, the storage device obtains the corresponding video frame from the video according to the location information, so as to obtain the target image.

With reference to the second aspect, in a possible embodiment of the second aspect, the method further includes: at the end of the storage life cycle of the video, the storage device obtains the target image from the corresponding video frame of the video according to the location information, saves the target image, and deletes the video.

In this example, during the storage life cycle of the video, the target image is obtained from the corresponding video frame in the video when needed, so the target image does not need to be stored during that period; the storage device only needs to store the video and the location information. Because the location information occupies far less storage space than the target image, storing the location information instead of the target image reduces the consumption of storage resources.

With reference to the second aspect, in a possible embodiment of the second aspect, the storage device does not store the target image during the storage life cycle of the video; after the storage life cycle of the video ends, the storage device stores the target image.

In this example, during the storage life cycle of the video, there is no need to store the target image, which can reduce the consumption of storage resources.

In a third aspect, an embodiment of the present application provides a media data transmission device, and the device includes:

The first generating unit is used to generate multiple original video frames;

The second generating unit is used to generate a video using multiple original video frames;

An acquiring unit for acquiring position information of the target image in the video, where the target image is a video frame in the video or a part of the video frame in the video;

The sending unit is used to send the video and location information to the storage device.


With reference to the third aspect, in a possible embodiment of the third aspect, the media data transmission apparatus does not generate the target image, and does not send the target image to the storage device.

With reference to the third aspect, in a possible embodiment of the third aspect, the target image is located in an I frame of a GOP of the video.

With reference to the third aspect, in a possible embodiment of the third aspect, the apparatus is further configured to obtain the target image from the multiple original video frames, where the target image is an image, selected according to image quality, that includes the target feature in the multiple original video frames.

In a fourth aspect, an embodiment of the present application provides a media data transmission device, and the device includes:

The receiving unit is used to receive the video sent by the camera and the location information of the target image, where the target image is a video frame in the video or a part of the video frame in the video;

The acquiring unit is used to acquire the target image from the corresponding video frame of the video according to the location information.


With reference to the fourth aspect, in a possible embodiment of the fourth aspect, during the storage life cycle of the video, when the storage device receives a request to read the target image, the storage device obtains the corresponding video frame from the video according to the location information, so as to obtain the target image.

With reference to the fourth aspect, in a possible embodiment of the fourth aspect, the apparatus is further configured so that, at the end of the storage life cycle of the video, the storage device obtains the target image from the corresponding video frame of the video according to the location information, saves the target image, and deletes the video.

With reference to the fourth aspect, in a possible embodiment of the fourth aspect, it is also used to:

Receive the video stream sent by the camera and the location information of the target image, where the target image is a video frame in the video stream or a part of the video frame in the video stream;

Obtain the target image from the corresponding video frame of the video stream according to the location information.

In a fifth aspect, an embodiment of the present application provides a camera, which includes:

A processor, configured to generate multiple original video frames, generate a video from the multiple original video frames, and acquire position information of the target image in the video, where the target image is a video frame in the video or a part of the video frame in the video;

The transceiver module is used to send the video and location information to the storage device.


With reference to the fifth aspect, in a possible embodiment of the fifth aspect, the camera does not generate the target image, and the transceiver module does not send the target image to the storage device.

With reference to the fifth aspect, in a possible embodiment of the fifth aspect, it is also used to:

Select the target frame where the target image is located from the video;

According to the target frame and the video, the position information of the target image in the video is obtained.

The target image is located in an I frame of a GOP of the video.

The target image is obtained from the multiple original video frames, where the target image is an image, selected according to image quality, that includes the target feature in the multiple original video frames.

In a sixth aspect, an embodiment of the present application provides a storage device, which includes:

The transceiver module is used to receive the video sent by the camera and the position information of the target image, where the target image is a video frame in the video or a part of the video frame in the video;

The processor is used to obtain the target image from the corresponding video frame of the video according to the position information.


With reference to the sixth aspect, in a possible embodiment of the sixth aspect, during the storage life cycle of the video, when the storage device receives a request to read the target image, the storage device obtains the corresponding video frame from the video according to the location information, so as to obtain the target image.

With reference to the sixth aspect, in a possible embodiment of the sixth aspect, the processor is further configured to: at the end of the storage life cycle of the video, obtain the target image from the corresponding video frame of the video according to the location information, save the target image, and delete the video.


In a seventh aspect, an embodiment of the present application provides a camera, which includes a processor, a transceiver, and a memory, and the processor executes the code in the memory to execute the method as in the first aspect.

In an eighth aspect, an embodiment of the present application provides a storage device. The storage device includes a processor, a transceiver, and a memory, and the processor executes code in the memory to execute the method in the second aspect.

In a ninth aspect, an embodiment of the present application provides a computer-readable storage medium that stores a computer program. The computer program includes program instructions which, when executed by a processor, cause the processor to execute any of the methods of the first aspect or the second aspect.

A tenth aspect provides a computer program product. When the computer program product is read and executed by a computer, the method of any one of the first aspect and the second aspect will be executed.

These and other aspects of the present application will be more concise and understandable in the description of the following embodiments.

Description of the drawings

To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the following briefly introduces the drawings needed for the embodiments. Obviously, the drawings in the following description show only some embodiments of the present application.

FIG. 1 is a schematic diagram of a camera collecting images according to an embodiment of the application;

FIG. 2 is a schematic diagram of a large image and a small image according to an embodiment of this application;

FIG. 3A is a schematic diagram of video frames being transmitted according to an embodiment of this application;

FIG. 3B is a schematic diagram of a camera transmitting video and location information according to an embodiment of this application;

FIG. 4 is a schematic diagram of a storage device storing video and location information according to an embodiment of this application;

FIG. 5A is a schematic diagram of an index table of a large image according to an embodiment of this application;

FIG. 5B is a schematic diagram of an index table of a small image according to an embodiment of this application;

FIG. 6A is a schematic diagram of a camera extracting a target image according to an embodiment of this application;

FIG. 6B is a schematic diagram of an index table for multiplexed storage of small images according to an embodiment of this application;

FIG. 6C is a schematic diagram of another index table for multiplexed storage of small images according to an embodiment of this application;

FIG. 6D is a schematic diagram of another index table of a large image according to an embodiment of this application;

FIG. 7 is an interactive schematic diagram of a media data transmission method according to an embodiment of the application;

FIG. 8 is a schematic structural diagram of a media data transmission device according to an embodiment of the application;

FIG. 9 is a schematic structural diagram of a camera provided in an embodiment of the application;

FIG. 10 is a schematic structural diagram of another camera provided in an embodiment of this application;

FIG. 11 is a schematic structural diagram of a media data transmission device according to an embodiment of the application;

FIG. 12 is a schematic structural diagram of a storage device provided in an embodiment of this application;

FIG. 13 is a schematic structural diagram of a server provided in an embodiment of this application.

Detailed description

The embodiments of the present application will be described below in conjunction with the drawings.

First, the video transmission process involved in this application will be introduced.

As shown in FIG. 1, the camera collects video along the time axis t, where the video includes n frames of images I₁, I₂, ..., Iₙ: image I₁ is collected by the camera at time t₁, image I₂ at time t₂, ..., and image Iₙ at time tₙ. The time intervals between t₁, t₂, ..., tₙ may be equal or unequal; that is, tₙ − tₙ₋₁, tₙ₋₁ − tₙ₋₂, ..., t₂ − t₁ may or may not be equal, which is not specifically limited here.
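Because the intervals may be unequal, a frame number generally cannot be computed from a timestamp by simple division. The following Python sketch, which assumes that the capture time of every frame is known and uses a hypothetical helper `frame_number_at`, maps a timestamp to the number of the most recent frame captured at or before it by binary search over the sorted capture times:

```python
from bisect import bisect_right

def frame_number_at(capture_times, t):
    """Return the 1-based number of the latest frame captured at or before t.

    capture_times holds t_1 <= t_2 <= ... <= t_n; the intervals between them
    may be unequal. Returns None if t precedes the first frame.
    """
    count = bisect_right(capture_times, t)  # number of frames with time <= t
    return count if count > 0 else None

# Hypothetical capture times in seconds, with unequal intervals.
times = [0.00, 0.04, 0.08, 0.15, 0.19]
print(frame_number_at(times, 0.15))  # 4
print(frame_number_at(times, 0.17))  # 4
```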

In addition to sending the video to the storage device, the camera also needs to select a target image from the video and send it to the storage device. The target image can be a large image or a small image, and the storage device can be a storage server, a video stream management platform, or the like. As shown in FIG. 2, a large image can be the complete image of a video frame, or an image that occupies an area of a video frame exceeding a preset threshold, and so on. In a specific embodiment, a large image can contain multiple target subjects (a subject can be understood as a target feature). For example, a large image can show a scene in which a vehicle hits a pedestrian, so large images can be used to analyze the relationships and behaviors between different target subjects. A small image can be a partial area of a video frame. In a specific embodiment, a small image may include only a single target subject, or only a partial area of a single target subject; for example, a small image may include the face of a pedestrian, so small images can be used to analyze the details and structure of a single target subject. Here, the target subject can be a pedestrian, an animal, a vehicle, a license plate, a road sign, a traffic light, and so on, which is not specifically limited here. A small image can be obtained by extracting, from a large image, the regional image block that contains the target subject. The extraction method can be an image feature extraction algorithm, such as HOG (Histogram of Oriented Gradients) or SIFT (Scale-Invariant Feature Transform), which is not specifically limited here.

To reduce the bandwidth required for the camera to send the video and target images to the storage device, the camera can also compress the video with a video compression algorithm and compress the target image with an image compression algorithm. The video compression algorithm can be H.264, H.265, H.266, or the like, and the image compression algorithm can be JPEG, HEIF, or the like; neither is specifically limited here.

Compressing the video and the target image reduces the bandwidth used when the camera sends data to the storage device. However, transmission still consumes bandwidth, and when a large number of target images are transmitted, the bandwidth consumption remains considerable. In addition, once the target images reach the storage device, the storage device must consume storage resources to store them, which leads to large consumption of storage resources. Therefore, how to reduce the transmission of target images is a problem that needs to be solved.

The embodiments of this application aim to solve the above problems of large bandwidth consumption when transmitting target images and large consumption of memory and hard disk resources when the storage device stores them. Instead of the target image, the camera transmits the video (which can be transmitted in streaming mode, in which case it is called a video stream) together with the position information of the target image. Because the position information is much smaller than the image, the bandwidth it occupies is also much smaller than the bandwidth the image would occupy, which reduces the consumption of bandwidth resources. After receiving the video and the position information of the target image, the storage device stores the video and the position information; storing the position information instead of the target image also reduces the consumption of storage resources.
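To make the size argument concrete, the Python sketch below packs one small-image position record into a fixed binary layout using the standard `struct` module. The layout (a 32-bit frame number, a 64-bit millisecond timestamp, and 16-bit x, y, m and n fields) and the helper `pack_position` are hypothetical choices for illustration, not a format defined by this application.

```python
import struct

# Hypothetical wire layout for one position record (little-endian, no padding):
# frame number (uint32), timestamp in ms (uint64), x, y, m, n (uint16 each).
POSITION_RECORD = struct.Struct("<IQHHHH")

def pack_position(frame_no, timestamp_ms, x, y, m, n):
    """Serialize one small-image position into the fixed-size layout above."""
    return POSITION_RECORD.pack(frame_no, timestamp_ms, x, y, m, n)

record = pack_position(5, 1568851200000, 640, 360, 80, 80)
print(len(record))   # 20 bytes for the position information
print(80 * 80 * 3)   # 19200 bytes for the same 80x80 region as raw 8-bit RGB
```

Image compression such as JPEG shrinks the second figure, but the position record stays at a fixed 20 bytes regardless of the image content.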

This application provides a media data transmission method and related equipment, which can effectively reduce the consumption of bandwidth resources and storage resources.

When the camera sends a video including images I₁, I₂, ..., Iₙ to the storage device, it needs to send not only the video but also the position information of the target image, which marks the position of the target image in the video. The video and the position information can be transmitted at the same time or separately, and on the same channel or on different channels (for example, both use a data channel, or the video stream uses a data channel and the position information uses a management channel), which is not specifically limited here. The position information for the cases where the target image is a large image and a small image is described in detail below.

When the category of the target image is a large image, the position information of the target image may be the first absolute position or the first relative position, or may include both. The first absolute position may include one or more of the frame number and the timestamp of the target image in the video. For example, if the video includes n frames of images I₁, I₂, ..., Iₙ and the target image is the 5th frame, the first absolute position of the target image may be frame number 5 in the video. The first relative position may be an offset relative to a specific video frame. For example, with the same video and the target image again being the 5th frame, the offset relative to the first frame is 5 − 1 = 4, so the first relative position can be the offset 4 of the target image relative to the first frame; alternatively, when the specific video frame is an I frame obtained by compressing the video, the offset can be the offset of the target image relative to that I frame, and so on. In particular, when the video frame corresponding to the target image is an I frame, the position information can be described by the first absolute position, specifically the frame number of the I frame; when the video frame corresponding to the target image is a non-I frame, the position information can be described by the first relative position, specifically the offset of the target image relative to the I frame. Note that when the large image is not a complete video frame, the position information of the target image may also include the position of the target image in the video frame, where the position includes coordinates and size (described in detail below).
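Following the rule just described (the absolute frame number for an I frame, the offset from the preceding I frame otherwise), a minimal Python sketch might look as follows. It assumes the type of every frame is known and that frame numbers start at 1; the function name `describe_large_image_position` and the tuple format it returns are hypothetical. With frame 1 as the reference I frame, the 5th frame yields the offset 5 − 1 = 4 from the example above.

```python
def describe_large_image_position(frame_no, frame_types):
    """Describe where a large image (a whole frame) sits in the video.

    frame_types[i] is "I" or "P" for frame number i + 1. An I frame is
    described by its absolute frame number; any other frame by its offset
    from the nearest preceding I frame.
    """
    if frame_types[frame_no - 1] == "I":
        return ("absolute", frame_no)
    # Walk backwards to the nearest preceding I frame.
    for ref in range(frame_no - 1, 0, -1):
        if frame_types[ref - 1] == "I":
            return ("relative", ref, frame_no - ref)
    raise ValueError(f"no I frame precedes frame {frame_no}")

types = "IPPPPIPP"  # hypothetical GOP structure for frames 1..8
print(describe_large_image_position(5, types))  # ('relative', 1, 4)
print(describe_large_image_position(6, types))  # ('absolute', 6)
```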

When the category of the target image is a small image, the position information of the target image may be the second absolute position or the second relative position, or may include both. The second absolute position may include one or more of the frame number and the timestamp, in the video, of the video frame corresponding to the target image, together with the position of the target image in that video frame. The position includes coordinates and size: the coordinates of the target image in the video frame can be expressed as (x, y), where x is the horizontal coordinate and y is the vertical coordinate, and the size of the target image in the video frame can be expressed as m×n, where m is the horizontal size and n is the vertical size. The second relative position can take any of three forms: the absolute position of the video frame corresponding to the target image together with the relative position of the target image in that frame; the relative position of the corresponding video frame together with the absolute position of the target image in that frame; or the relative position of the corresponding video frame together with the relative position of the target image in that frame. The relative position of the target image in the corresponding video frame can be an offset relative to the position of a specific landmark. For example, if the video frame shows a target pedestrian visiting Tiananmen Square and the target image is the target pedestrian, the relative position of the target image in the video frame may be the direction and distance of the target pedestrian relative to Tiananmen Square; a target pedestrian 100 meters east of Tiananmen Square may be expressed as (East, 100).
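The coordinate-and-size form of the position can be illustrated with a short Python sketch that cuts a small image out of a decoded frame. It assumes the frame is a 2-D list of pixel values indexed as frame[row][column], with (x, y) the top-left corner of the small image and m×n its width and height; the helper name `crop_small_image` is hypothetical.

```python
def crop_small_image(frame, x, y, m, n):
    """Return the m x n region (m columns, n rows) whose top-left corner is (x, y)."""
    height, width = len(frame), len(frame[0])
    if x < 0 or y < 0 or x + m > width or y + n > height:
        raise ValueError("the region lies outside the video frame")
    # Take rows y..y+n-1, then columns x..x+m-1 from each of those rows.
    return [row[x:x + m] for row in frame[y:y + n]]

# A toy 6x4 frame whose pixel values encode their own row and column.
frame = [[10 * r + c for c in range(6)] for r in range(4)]
print(crop_small_image(frame, 2, 1, 3, 2))  # [[12, 13, 14], [22, 23, 24]]
```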

The position information of the target image may also include the frame category, after compression for transmission, of the video frame in which the target image is located. The frame categories include I frame and P frame. When the camera sends a video including images I₁, I₂, ..., Iₙ to the storage device, the video frames need to be encoded before transmission, as shown in FIG. 3A, which is a schematic diagram of video frames being transmitted. Encoding the video frames yields I frames and P frames. An I frame can be a complete video frame and can be understood as a key frame, while a P frame is the difference between the current frame and the preceding key frame. For example, if an I frame is an image of a car at a certain moment while driving, the P frame can be the position offset of the car at the next moment relative to the previous moment. The video frame in which the target image is located can be an I frame or a P frame.

When sending the video and the position information to the storage device, the camera can also send associated information of the target image. The associated information includes the acquisition time of the video, the acquisition time of the video frame corresponding to the target image, the camera identifier, the category of the target image, the sequence number of the target image, the frame number of the video frame corresponding to the target image, and the offset of the video frame corresponding to the target image. The categories of the target image include large image and small image, and the acquisition time of the video can be, for example, the start time of the video.

It can be understood that, in practical applications, the absolute position and the relative position can be transmitted at the same time and used to verify each other, which avoids errors caused by video frames lost during transmission.
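Why a target image in a P frame depends on its key frame can be shown with a toy Python model that follows the simplified description above, in which each P frame stores a per-pixel difference from the preceding I frame. This is an assumption for illustration only: real H.264/H.265 decoding is far more involved, and P frames in those codecs may also reference earlier P frames. The function name `decode_toy_stream` is hypothetical.

```python
def decode_toy_stream(encoded):
    """Rebuild full frames from (kind, data) pairs in a toy I/P model.

    For "I", data is the complete frame; for "P", data is the per-pixel
    difference from the most recent I frame.
    """
    frames, key = [], None
    for kind, data in encoded:
        if kind == "I":
            key = data
            frames.append(list(data))
        elif kind == "P":
            if key is None:
                raise ValueError("P frame received before any I frame")
            frames.append([k + d for k, d in zip(key, data)])
        else:
            raise ValueError(f"unknown frame kind {kind!r}")
    return frames

stream = [("I", [10, 20, 30]), ("P", [1, 0, -1]), ("P", [2, 0, -2]),
          ("I", [5, 5, 5]), ("P", [0, 1, 0])]
print(decode_toy_stream(stream)[2])  # [12, 20, 28]
```

A target image located in a P frame therefore cannot be read from that frame alone; the preceding I frame is needed as well.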
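One way such a cross-check could work is sketched below in Python. It assumes that each received frame carries its own frame number and that the relative position is resolved by counting received frames from the reference frame; both the function `resolve_target_frame` and this resolution rule are assumptions for illustration, not the procedure defined by this application. If a frame between the reference frame and the target has been lost, counting lands on the wrong frame, and the mismatch with the absolute position exposes the loss.

```python
def resolve_target_frame(received, abs_frame_no, ref_frame_no, offset):
    """Cross-check an absolute and a relative position of one target image.

    received: frame numbers in arrival order; lost frames are simply absent.
    The relative position is resolved by counting `offset` received frames
    after the reference frame. Returns the frame number if both agree.
    """
    if ref_frame_no not in received:
        raise ValueError(f"reference frame {ref_frame_no} was lost")
    idx = received.index(ref_frame_no) + offset
    by_offset = received[idx] if idx < len(received) else None
    if by_offset != abs_frame_no:
        raise ValueError(
            f"positions disagree (offset points at {by_offset}, "
            f"absolute position is {abs_frame_no}); frames may have been lost")
    return abs_frame_no

print(resolve_target_frame([1, 2, 3, 4, 5, 6], 5, 1, 4))  # 5
```

If frame 3 is lost, the received list becomes [1, 2, 4, 5, 6]; counting four frames past frame 1 lands on frame 6, and the call raises `ValueError` instead of silently returning the wrong frame.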

In a specific embodiment, as shown in FIG. 3B, the position information transmitted between the camera and the storage device is compressed. Specifically, the camera may compress the original position information and send the compressed position information to the storage device. The original position information may be part or all of the information described above; when it is only part of that information, it includes at least the position information of the target image.
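The application does not specify how the position information is compressed. As a minimal sketch, assuming compact JSON serialization and DEFLATE (Python's standard `zlib` module) as stand-ins, and with hypothetical record fields and helper names, the camera-side compression and storage-side decompression could look like this:

```python
import json
import zlib

def compress_positions(records):
    """Serialize position records as compact JSON and compress them with DEFLATE."""
    raw = json.dumps(records, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw, 9)

def decompress_positions(blob):
    """Inverse of compress_positions, as the storage device would apply it."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))

# Hypothetical position records for 100 small images from one camera.
records = [{"camera_id": "cam-01", "frame_no": i, "x": 640, "y": 360,
            "m": 80, "n": 80} for i in range(1, 101)]
blob = compress_positions(records)
raw_size = len(json.dumps(records, separators=(",", ":")))
print(raw_size, len(blob))  # the compressed form is smaller
```

Records from one camera share most of their fields, which is what a general-purpose compressor like DEFLATE exploits here.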

FIG. 4 shows a specific example of the storage device storing the received video and position information. After receiving the video and the position information, the storage device stores the video in the storage space corresponding to videos and stores the position information. To facilitate storage, the position information can be further processed into an index table of the target image, and the index table is stored. The position information (or the index table) represents the position of the target image in the video. Index tables include index tables of large images and index tables of small images, which can be stored separately: all large-image index tables are stored in one storage space, and all small-image index tables in another.

Processing the position information to obtain the index table of the target image may specifically be: extracting the position information of the target image, and generating the index table of the target image according to a preset index table generation template. The position information of the target image can be extracted from a cache or from memory. Of course, the index table of the target image can also be obtained in other ways.

In a specific embodiment in which the index table is generated according to an index table template, FIG. 5A is a schematic diagram of an index table of a large image and FIG. 5B is a schematic diagram of an index table of a small image. As shown in FIG. 5A, the index table of a large image includes the camera ID, the acquisition time of the video frame corresponding to the large image, the frame number of the video frame, the picture type, the picture sequence number, the video frame type, the video frame offset, and so on. As shown in FIG. 5B, the index table of a small image includes the camera ID, the acquisition time of the video frame in which the small image is located, the frame number of the video frame, the picture type, the picture sequence number, the video frame type, the video frame offset, the offset of the small image in its video frame, the size of the small image in the video frame, and so on. The offset of the small image in the video frame is expressed as coordinates, and its size is expressed by the horizontal size and the vertical size; for example, 80×80 means that both the horizontal size and the vertical size are 80. The content of both index tables can be extracted directly from the received position information.
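Template-based index generation with separate large-image and small-image tables can be sketched in Python as follows. The templates mirror the field lists given for FIG. 5A and FIG. 5B, but the field names, the helper functions and the sample values are hypothetical; a real storage device would persist the tables rather than keep them in memory.

```python
# Hypothetical templates listing the fields copied into each index table,
# following the field lists of FIG. 5A (large image) and FIG. 5B (small image).
LARGE_TEMPLATE = ("camera_id", "capture_time", "frame_no", "image_type",
                  "image_seq", "frame_type", "frame_offset")
SMALL_TEMPLATE = LARGE_TEMPLATE + ("x", "y", "width", "height")

def make_index_entry(position_info):
    """Fill the template for the image's category from received position information."""
    template = LARGE_TEMPLATE if position_info["image_type"] == "large" else SMALL_TEMPLATE
    return {name: position_info[name] for name in template}

def build_index_tables(position_infos):
    """Keep large-image and small-image index entries in separate tables."""
    tables = {"large": [], "small": []}
    for info in position_infos:
        tables[info["image_type"]].append(make_index_entry(info))
    return tables

base = {"camera_id": "cam-01", "capture_time": "2019-09-19T08:00:00",
        "frame_no": 5, "frame_type": "P", "frame_offset": 4}
infos = [dict(base, image_type="large", image_seq=1),
         dict(base, image_type="small", image_seq=2, x=640, y=360, width=80, height=80)]
tables = build_index_tables(infos)
print(tables["small"][0]["width"], tables["small"][0]["height"])  # 80 80
```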

The storage device does not need to store the target image separately. After the storage device receives the request from the host to read the target image, it uses the index table to obtain the corresponding target image from the video and sends it to the host.

After storing the video, the storage device sets a storage life cycle for it and deletes the video when the storage life cycle ends. The storage life cycle can be understood as a fixed duration. So that the target image can still be read after the video is deleted, the storage device can extract the target image from the video when the storage life cycle of the video ends, store the target image in the corresponding storage space, and update the picture index table (the updated index table describes the storage location of the target image in the storage device). The end of the storage life cycle is the condition that triggers extraction of the target image from the video; once extraction is complete, the storage device deletes the video. In terms of timing, "the end of the life cycle" covers both the moment when the life cycle is about to reach its end time point and a short time after that end time point. The storage device can also extract the target image before the storage life cycle of the video ends, for example, by performing the extraction and storing the result within the 10 minutes before the end of the storage life cycle, so that the video can be deleted immediately once the storage life cycle ends.
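The life-cycle handling above can be sketched as a periodic sweep. This is a simplified in-memory model under stated assumptions: frames are already decoded and addressed by frame number, and `StoredVideo`, `LifecycleStore`, `sweep` and the `image-store:` location string are hypothetical names. The `margin` parameter models early extraction, for example 600 seconds for the 10-minute window mentioned in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StoredVideo:
    frames: List[bytes]   # decoded frames, simplified to byte strings
    expires_at: float     # end of the storage life cycle (seconds)
    targets: List[int]    # frame numbers that hold target images


@dataclass
class LifecycleStore:
    videos: Dict[str, StoredVideo] = field(default_factory=dict)
    images: Dict[str, bytes] = field(default_factory=dict)      # image storage space
    image_index: Dict[str, str] = field(default_factory=dict)   # image id -> location

    def sweep(self, now: float, margin: float = 0.0) -> None:
        """Extract target images from videos whose life cycle ends within
        `margin` seconds, repoint the index at image storage, delete the video."""
        for video_id in list(self.videos):
            video = self.videos[video_id]
            if now < video.expires_at - margin:
                continue  # still inside the life cycle: keep serving from video
            for frame_number in video.targets:
                image_id = f"{video_id}/{frame_number}"
                self.images[image_id] = video.frames[frame_number]
                self.image_index[image_id] = f"image-store:{image_id}"
            del self.videos[video_id]  # only after every target image is saved
```

Deleting the video only after all target images have been copied matches the ordering in the text: extraction first, deletion second.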

Extracting the target image from the video according to the index table of the target image may specifically include: acquiring, according to the camera identifier, at least one video corresponding to that camera identifier; determining, from the at least one video and according to the video acquisition time, the target video that contains the target image; extracting the video frame from the target video according to the time of the video frame, the video frame type, the frame number of the video frame in the video, and the video frame offset recorded in the index table; and obtaining the target image from that video frame according to the position information of the target image. If the target image is a large image, the video frame itself can be taken as the target image (this example assumes that the large image is a complete video frame). If the target image is a small image, the target image is obtained according to the offset and the size of the small image in the video frame recorded in the index table. In a specific example, FIG. 6A is a schematic diagram of extracting a target image. From the n videos in the storage space for videos (video 1, video 2, ..., video n-1, video n), the storage device selects, according to the camera ID, the m videos (video k, ..., video j) that correspond to that camera, and then determines the target video from these m videos according to the video acquisition time. It then determines the target image from the video frames of the target video according to the remaining information in the index table, which includes the time of the video frame, the video frame type, the frame number of the video frame, the video frame offset, and the position information of the target image (the first absolute position and/or first relative position, or the second absolute position and/or second relative position).
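The extraction steps (filter by camera ID, pick the target video by acquisition time, take the frame by frame number, crop if it is a small image) can be sketched as follows. The sketch assumes frames are already decoded into pixel rows; a real implementation would decode from the preceding I frame, which is omitted here. `Video`, `find_target_video` and `extract_target` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Frame = List[List[int]]  # a decoded frame as rows of pixel values (simplified)


@dataclass
class Video:
    camera_id: str
    start_time: float          # acquisition start time (seconds)
    end_time: float            # acquisition end time (seconds)
    frames: Dict[int, Frame]   # frame number -> decoded frame


def find_target_video(videos: List[Video], camera_id: str,
                      capture_time: float) -> Optional[Video]:
    # Step 1: keep only videos from the camera named in the index table.
    candidates = [v for v in videos if v.camera_id == camera_id]
    # Step 2: pick the video whose acquisition period covers the frame time.
    for video in candidates:
        if video.start_time <= capture_time <= video.end_time:
            return video
    return None


def extract_target(video: Video, frame_number: int,
                   offset: Optional[Tuple[int, int]] = None,
                   size: Optional[Tuple[int, int]] = None) -> Frame:
    frame = video.frames[frame_number]
    if offset is None or size is None:
        return frame  # large image: the whole video frame
    x, y = offset
    width, height = size
    # Small image: crop the rectangle at (x, y) with the given size.
    return [row[x:x + width] for row in frame[y:y + height]]
```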

When extracting a target image from a video, the storage device first determines whether the video contains the target image. If it does, the storage device extracts the target image from the video according to the index table of the target image and stores it in the image storage space. Whether the video contains the target image can be determined from the index table: specifically, the frame number and the acquisition time of the video frame recorded in the index table are used to check whether that video frame exists in the video. If the video frame exists, the video contains the target image; otherwise, it does not.
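The existence check can be sketched as a lookup keyed by frame number followed by a time comparison. This assumes each video keeps a map from frame number to acquisition time; the `tolerance` parameter is a hypothetical allowance for timestamp rounding, not something the source specifies.

```python
from typing import Dict


def has_target_image(frame_times: Dict[int, float], frame_number: int,
                     capture_time: float, tolerance: float = 0.001) -> bool:
    """Return True if the video has a frame with this number and acquisition time.

    frame_times maps frame number -> acquisition time (seconds) for one video.
    """
    recorded = frame_times.get(frame_number)
    return recorded is not None and abs(recorded - capture_time) <= tolerance
```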

When storing the target image, the large image and the small image can be stored either separately or in multiplexed form. Multiplexed storage means that, when the target image is a small image, the large image or video frame in which the small image is located is stored together with the position of the small image within that large image or video frame, so that a single stored copy serves both the small image and the large image/video frame. The target image may also be encoded in a particular encoding format, for example, the HEIF format, before it is stored.
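The difference between separate and multiplexed storage can be sketched as two record types: a separately stored small image carries its own pixels, while a multiplexed one only references its parent large image or video frame plus an offset and size. Images are simplified to pixel rows, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

Image = List[List[int]]  # rows of pixel values (simplified)


@dataclass
class SeparateSmallImage:
    pixels: Image  # the small image stored on its own


@dataclass
class MultiplexedSmallImage:
    parent_id: str               # stored large image / video frame it lies in
    offset: Tuple[int, int]      # (x, y) of the small image inside the parent
    size: Tuple[int, int]        # (width, height)


def load_small_image(entry: Union[SeparateSmallImage, MultiplexedSmallImage],
                     parents: Dict[str, Image]) -> Image:
    if isinstance(entry, SeparateSmallImage):
        return entry.pixels
    # Multiplexed: cut the small image out of the stored parent on demand.
    x, y = entry.offset
    width, height = entry.size
    parent = parents[entry.parent_id]
    return [row[x:x + width] for row in parent[y:y + height]]
```

Multiplexing trades a crop at read time for not storing the small image's pixels twice.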

After the video is deleted, the positions recorded in the current index table of the target image refer to locations in the video, so the current index table can no longer describe where the target image is. The index table therefore needs to be updated; the updated index table describes the storage location of the target image, the picture type of the target image, and the encoding format used when the target image is stored.

The updated picture index table differs depending on the storage method. The storage location and the file name of the target image are added to the updated picture index table, and the original video-related location information, such as the video frame type and the video frame offset, is deleted. FIG. 6B is a schematic diagram of the index table of a small image stored in multiplexed form, where the small image is stored by reusing its corresponding video frame. FIG. 6C shows a small image stored in multiplexed form by reusing its corresponding large image. If the large image is a complete video frame, the offset of the small image in the large image does not need to be re-acquired; if the large image is only part of the video frame, the offset of the small image in the large image must be obtained again, in the same way as the position information of the small image in the video frame described above, which is not repeated here. FIG. 6D is a schematic diagram of the updated index table of a large image. The small-image storage type takes the value 0 or 1: 1 means the small image is stored in multiplexed form, and 0 means it is stored separately.
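The index update can be sketched under two assumptions: offsets are expressed in video-frame pixel coordinates, and the index is a dictionary whose key names (`location`, `file_name`, `storage_type`) are hypothetical. When the large image is itself a sub-region of the frame, the small image's offset inside it is the difference of the two frame offsets.

```python
from typing import Any, Dict, Tuple

SEPARATE, MULTIPLEXED = 0, 1  # small-image storage types from the text


def offset_in_large_image(small_offset_in_frame: Tuple[int, int],
                          large_offset_in_frame: Tuple[int, int]) -> Tuple[int, int]:
    """Re-derive the small image's offset relative to a large image that is
    itself a sub-region of the video frame (both offsets in frame coordinates)."""
    return (small_offset_in_frame[0] - large_offset_in_frame[0],
            small_offset_in_frame[1] - large_offset_in_frame[1])


def updated_small_index(old: Dict[str, Any], location: str, file_name: str,
                        storage_type: int) -> Dict[str, Any]:
    # Drop video-related fields that no longer apply once the video is deleted.
    new = {k: v for k, v in old.items() if k not in ("frame_type", "frame_offset")}
    new.update(location=location, file_name=file_name, storage_type=storage_type)
    return new
```

When the large image is a complete video frame its offset is (0, 0), so the small image's offset comes back unchanged, which is why the text says it need not be re-acquired in that case.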

After the storage device stores the video, if it receives a request to read the target image, it can extract the target image either from the video or from the storage space of the target image, and return the target image to the requester. Specifically, a read request received during the storage life cycle of the video is served by extracting the target image from the video according to the index table, and a read request received after the storage life cycle of the video ends is served by extracting the target image from the target image storage space according to the index table. When returning the target image, the storage device can convert its picture format to the format corresponding to the requester; for example, if the requester asks for the JPEG format, the target image is converted to JPEG. For extracting the target image from the video, refer to the image extraction method shown in FIG. 6A of the foregoing embodiment, which is not repeated here.
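The read path can be sketched as a choice between two sources based on the life cycle, followed by format conversion. The extraction and conversion steps are passed in as hypothetical callables, since the source does not specify them; the function name is likewise illustrative.

```python
from typing import Callable


def read_target_image(now: float, lifecycle_end: float,
                      from_video: Callable[[], bytes],
                      from_image_store: Callable[[], bytes],
                      convert: Callable[[bytes, str], bytes],
                      requested_format: str) -> bytes:
    """Serve a read request: use the video while it is still retained,
    otherwise the image storage space, then convert to the requester's format."""
    if now < lifecycle_end:
        image = from_video()        # extracted from the video via the index table
    else:
        image = from_image_store()  # copy saved when the life cycle ended
    return convert(image, requested_format)
```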

The storage device can also receive a video stream and the location information of the target image sent by the camera, and obtain the target image from the corresponding video frame in the video stream according to the location information. The location information of the target image and the manner of obtaining the target image from the video are as described above and are not repeated here.

FIG. 7 is an interaction diagram of a media data transmission method provided in an embodiment of this application. The method of this embodiment includes the following steps:

S101. The camera acquires position information of a target image in the video, where the target image is a video frame in the video or a part of the video frame in the video.

The target image includes a large image and/or a small image. A large image can be a complete video frame, or an image that occupies more than a preset threshold of the area of a video frame; a small image can be a partial area of a video frame. In a specific embodiment, a small image may contain only a single target subject, or only part of a single target subject.

The position information includes an absolute position and a relative position. The absolute position may be, for example, the frame number of a video frame, a time stamp, etc., and the relative position may be, for example, an offset relative to a specific video frame.
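To illustrate how a relative position relates to an absolute one, the sketch below resolves an offset relative to a specific video frame into a frame number and timestamp. It assumes the offset is counted in frames, frames are numbered from 0, and the frame rate is constant; none of these details are stated in the source, and the function name is hypothetical.

```python
from typing import Tuple


def resolve_relative_position(reference_frame: int, offset_frames: int,
                              fps: float, first_timestamp: float = 0.0
                              ) -> Tuple[int, float]:
    """Turn 'offset relative to a specific video frame' into an absolute
    (frame number, timestamp) pair, assuming a constant frame rate."""
    frame_number = reference_frame + offset_frames
    timestamp = first_timestamp + frame_number / fps
    return frame_number, timestamp
```

For example, frame 100 plus an offset of 5 frames at 25 frames per second resolves to frame 105 at 4.2 seconds.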

Before acquiring the position information of the target image in the video, the camera generates multiple original video frames, and the camera uses the multiple original video frames to generate the video.

S102. The camera sends the video and location information to the storage device.

The camera can send the video and the location information to the storage device either at the same time or at different times.

S103: The storage device receives the video sent by the camera and the location information of the target image, where the target image is a video frame in the video or a part of the video frame in the video.

S104. The storage device obtains a target image from a corresponding video frame of the video according to the location information.

When obtaining the target image according to the location information, the storage device can use the index table that carries the location information: it can either obtain the target image from the corresponding video frame in the video according to the index table, or obtain the target image from the target image storage space according to the index table.

In one possible implementation, when the target image is a small image, the position information includes a second absolute position and/or a second relative position. The second absolute position includes the frame number and the timestamp of the video frame in the video that corresponds to the target image, and also includes the position of the target image in that video frame, expressed as coordinates and a size. The second relative position includes the offset of the target image relative to the specific video frame, together with the coordinates and size of the target image in the corresponding video frame.

In a possible embodiment, the camera does not generate the target image, and does not send the target image to the storage device.

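In a possible embodiment, the method further includes: the camera selects, from the video, the target frame in which the target image is located, and acquires the position information of the target image in the video according to the target frame and the video.

The target frame can be understood as the video frame in which the target image is located in the foregoing embodiment.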

In one possible embodiment, the target image is located in an I frame of a group of pictures (GOP) of the video.

The I frame in the GOP of the video is a key frame.
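Because an I frame is intra-coded, it can be decoded without reference frames, which is one likely reason to place the target image there. As an illustration, the sketch below finds the I frame that opens the GOP containing a given frame by scanning backwards over a list of frame types; the list representation and the function name are assumptions.

```python
from typing import List, Optional


def gop_key_frame(frame_types: List[str], frame_number: int) -> Optional[int]:
    """Return the number of the I frame (key frame) that opens the GOP
    containing `frame_number`, scanning backwards through the frame types."""
    for n in range(frame_number, -1, -1):
        if frame_types[n] == "I":
            return n
    return None  # no I frame before this point (e.g. a truncated stream)
```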

In a possible embodiment, the camera obtains a target image from the plurality of original video frames, where the target image is an image, among the plurality of original video frames, that includes the target feature.

The target feature can be understood as a specific feature, for example, a behavior involving multiple subjects.
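One possible reading of the selection step, sketched under assumptions: among the original frames that contain the target feature, the camera keeps the one with the highest image-quality score. Both the feature detector and the quality score are hypothetical callables standing in for whatever analysis the camera actually performs.

```python
from typing import Callable, Optional, Sequence, TypeVar

Frame = TypeVar("Frame")


def select_target_image(frames: Sequence[Frame],
                        has_feature: Callable[[Frame], bool],
                        quality: Callable[[Frame], float]) -> Optional[Frame]:
    """Among frames that contain the target feature, return the one with the
    highest quality score, or None if no frame contains the feature."""
    candidates = [f for f in frames if has_feature(f)]
    return max(candidates, key=quality, default=None)
```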

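In a possible embodiment, the method further includes: the storage device receives the video sent by the camera and the location information of the target image, and obtains the target image from the corresponding video frame of the video according to the location information.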

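In a possible embodiment, the method further includes: the storage device stores the video sent by the camera and the location information of the target image; when the storage life cycle of the video ends, the storage device obtains the target image from the corresponding video frame of the video according to the location information;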

The storage device saves the target image;

The storage device deletes the video.

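In a possible embodiment, the method further includes: the storage device receives a video stream sent by the camera and the location information of a target image, where the target image is a video frame in the video stream or part of a video frame in the video stream; and the storage device obtains the target image from the corresponding video frame of the video stream according to the location information.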

For brevity, this embodiment does not expand on the definitions of large images, small images, location information, index tables, and so on. For details, refer to FIG. 2, FIG. 3A, FIG. 3B, FIG. 5A, FIG. 5B and the related descriptions of large images, small images, location information, index tables, specific video frames, and so on. This embodiment also does not describe how the camera collects and transmits video; for details, refer to FIG. 1, FIG. 3A, FIG. 3B and the related descriptions. For other terms and definitions, refer to the foregoing embodiments.

Referring to FIG. 8, FIG. 8 is a schematic structural diagram of a media data transmission device provided in this application. The media data transmission device 800 of the embodiment of the present application includes:

The first generating unit 810 is configured to generate multiple original video frames;

The second generating unit 820 is configured to generate a video using multiple original video frames;

The obtaining unit 830 is configured to obtain location information of the target image in the video, where the target image is a video frame in the video or a part of the video frame in the video;

The sending unit 840 is used to send the video and location information to the storage device.

In one possible embodiment, when the category of the target image is a large image, the position information includes the first absolute position and/or the first relative position, where the first absolute position includes one or more of the frame number and the timestamp of the target image in the video, and the first relative position includes the offset of the target image relative to the specific video frame.

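In one possible embodiment, when the category of the target image is a small image, the position information includes the second absolute position and/or the second relative position, where the second absolute position includes one or more of the frame number and the timestamp of the video frame corresponding to the target image in the video, as well as the position of the target image in that video frame, and the second relative position includes the offset of the target image relative to the specific video frame and the position of the target image in the corresponding video frame.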

In a possible embodiment, the media data transmission apparatus does not generate the target image, and does not send the target image to the storage device.

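In a possible embodiment, the apparatus is further configured to: select, from the video, the target frame in which the target image is located; and acquire the position information of the target image in the video according to the target frame and the video.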

In one possible embodiment, the target image is located in an I frame of a group of pictures (GOP) of the video.

In a possible embodiment, the apparatus is further configured to obtain a target image from the multiple original video frames, where the target image is an image, among the multiple original video frames, that includes the target feature.

Refer to FIG. 9, which is a schematic structural diagram of a camera provided in this application. The camera 900 in the embodiment of the present application includes a processor 910 and a transceiver module 920, where:

The processor 910 is configured to generate multiple original video frames, and use the multiple original video frames to generate a video; obtain position information of the target image in the video, where the target image is a video frame in the video or a part of the video frame in the video;

The transceiver module 920 is used to send the video and location information to the storage device.

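In one possible embodiment, when the category of the target image is a large image, the position information includes the first absolute position and/or the first relative position; when the category of the target image is a small image, the position information includes the second absolute position and/or the second relative position, both as defined above.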

In a possible embodiment, the processor 910 does not generate the target image, and the transceiver module 920 does not send the target image to the storage device.

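In a possible embodiment, the processor 910 is further configured to: select, from the video, the target frame in which the target image is located; and acquire the position information of the target image in the video according to the target frame and the video.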

In one possible embodiment, the target image is located in an I frame of a group of pictures (GOP) of the video.

As shown in FIG. 10, an embodiment of the present application further provides a camera 1000. The camera 1000 includes a processor 1010, a memory 1020, and a transceiver 1030. The memory 1020 stores instructions or programs, and the processor 1010 is configured to execute the instructions or programs stored in the memory 1020. When the instructions or programs stored in the memory 1020 are executed, the processor 1010 performs the operations performed by the processor 910 in the foregoing embodiment, and the transceiver 1030 performs the operations performed by the transceiver module 920 in the foregoing embodiment.

Refer to FIG. 11, which is a schematic structural diagram of a media data transmission device provided in this application. The media data transmission device 1100 provided in the embodiment of the present application includes:

The receiving unit 1110 is configured to receive the video sent by the camera and the position information of the target image, where the target image is a video frame in the video or a part of the video frame in the video;

The obtaining unit 1120 is configured to obtain a target image from a corresponding video frame of the video according to the location information.

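In one possible embodiment, when the category of the target image is a large image, the position information includes the first absolute position and/or the first relative position; when the category of the target image is a small image, the position information includes the second absolute position and/or the second relative position, both as defined above.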

In a possible embodiment, during the storage life cycle of the video, when a request to read the target image is received, the target image is obtained from the corresponding video frame in the video according to the location information.

In a possible embodiment, it is also used to: at the end of the storage life cycle of the video, obtain the target image from the corresponding video frame of the video according to the location information; save the target image, and delete the video.

In a possible embodiment, it is also used for:

Refer to FIG. 12, which is a schematic structural diagram of a storage device provided in this application. The storage device 1200 provided in the embodiment of the present application includes a transceiver module 1210 and a processor 1220:

The transceiver module 1210 is used to receive the video sent by the camera and the location information of the target image, where the target image is a video frame in the video or a part of the video frame in the video;

The processor 1220 is configured to obtain a target image from a corresponding video frame of the video according to the location information.

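In one possible embodiment, when the category of the target image is a large image, the position information includes the first absolute position and/or the first relative position; when the category of the target image is a small image, the position information includes the second absolute position and/or the second relative position, both as defined above.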

In a possible embodiment, during the storage life cycle of the video, when the storage device 1200 receives a request to read the target image, the storage device 1200 obtains the target image from the corresponding video frame in the video according to the location information.

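In a possible embodiment, the storage device 1200 is further configured to: at the end of the storage life cycle of the video, obtain the target image from the corresponding video frame of the video according to the location information, save the target image, and delete the video.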

As shown in FIG. 13, an embodiment of the present application further provides a server 1300. The server 1300 includes a processor 1310, a memory 1320, and a transceiver 1330. The memory 1320 stores instructions or programs, and the processor 1310 is configured to execute the instructions or programs stored in the memory 1320. When the instructions or programs stored in the memory 1320 are executed, the processor 1310 performs the operations performed by the processor 1220 in the foregoing embodiment, and the transceiver 1330 performs the operations performed by the transceiver module 1210 in the foregoing embodiment.

The embodiments of the present application also provide a computer-readable storage medium. The computer-readable storage medium may store a program, and when the program is executed, some or all of the steps of any media data transmission method described in the foregoing method embodiments are performed.

The embodiments of the present application also provide a computer program product. When the computer program product is read and executed by a computer, some or all of the steps of any media data transmission method described in the foregoing method embodiments are performed.

All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used, the embodiments may be implemented completely or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present application are all or partially generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, over coaxial cable, optical fiber, or a digital subscriber line) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a server or a data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).

The embodiments of the application are described in detail above, and specific examples are used herein to explain the principles and implementations of the application. The descriptions of the foregoing embodiments are merely intended to help understand the method and core ideas of the application. In addition, a person of ordinary skill in the art may make changes to the specific implementations and the scope of application based on the ideas of the application. In conclusion, the content of this specification shall not be construed as a limitation on the application.

Claims

A media data transmission method, characterized in that the method includes:

The camera generates multiple original video frames;

The camera generates a video using the multiple original video frames;

The camera acquires position information of a target image in the video, wherein the target image is a video frame in the video or a part of a video frame in the video;

The camera sends the video and the location information to a storage device.
The method of claim 1, wherein:

In the case that the category of the target image is a large image, the position information includes a first absolute position and/or a first relative position, wherein the first absolute position includes one or more of a frame number and a time stamp of the target image in the video, and the first relative position includes an offset of the target image relative to a specific video frame.
The method of claim 1, wherein:

In the case that the category of the target image is a small image, the position information includes a second absolute position and/or a second relative position, wherein the second absolute position includes one or more of a frame number and a time stamp of the video frame corresponding to the target image in the video, and further includes the position of the target image in the corresponding video frame; and the second relative position includes an offset of the target image relative to a specific video frame and the position of the target image in the corresponding video frame.
The method according to any one of claims 1 to 3, wherein the camera does not generate the target image, and does not send the target image to the storage device.
The method according to any one of claims 1 to 4, further comprising:

The camera selects the target frame where the target image is located from the video;

The camera acquires position information of the target image in the video according to the target frame and the video.
The method according to any one of claims 1 to 5, wherein:

The target image is located in an I frame of a group of pictures (GOP) of the video.
The method according to any one of claims 1 to 5, further comprising:

The camera acquires a target image from the plurality of original video frames, and the target image is an image including a target feature in the plurality of original video frames.
The method according to any one of claims 1 to 5, further comprising: the storage device receiving the video sent by the camera and the location information of the target image;

The storage device obtains the target image from the corresponding video frame of the video according to the location information.
The method according to any one of claims 1 to 5, further comprising:

The storage device stores the video sent by the camera and the location information of the target image;

When the storage life cycle of the video ends, the storage device obtains the target image from a corresponding video frame of the video according to the location information;

The storage device saves the target image;

The storage device deletes the video.
The method according to any one of claims 1 to 3, further comprising:

The storage device receives a video stream sent by the camera and location information of a target image, where the target image is a video frame in the video stream or a part of a video frame in the video stream;

The storage device obtains the target image from the corresponding video frame of the video stream according to the location information.
A media data transmission method, characterized in that the method includes:

The storage device receives the video sent by the camera and the location information of the target image, where the target image is a video frame in the video or a part of a video frame in the video;

The storage device obtains the target image from the corresponding video frame of the video according to the location information.
The method of claim 11, wherein:

In the case that the category of the target image is a large image, the position information includes a first absolute position and/or a first relative position, wherein the first absolute position includes one or more of a frame number and a time stamp of the target image in the video, and the first relative position includes an offset of the target image relative to a specific video frame.
The method of claim 11, wherein:

In the case that the category of the target image is a small image, the position information includes a second absolute position and/or a second relative position, wherein the second absolute position includes one or more of a frame number and a time stamp of the video frame corresponding to the target image in the video, and further includes the position of the target image in the corresponding video frame; and the second relative position includes an offset of the target image relative to a specific video frame and the position of the target image in the corresponding video frame.
The method according to any one of claims 11 to 13, characterized in that, during the storage life cycle of the video, when the storage device receives a request to read the target image, the storage device obtains the target image from the corresponding video frame in the video according to the location information.
The method according to any one of claims 11 to 14, further comprising: at the end of the storage life cycle of the video, the storage device obtains the target image from the corresponding video frame of the video according to the location information; and the storage device saves the target image and deletes the video.
A media data transmission device, characterized in that the device includes:

The first generating unit is used to generate multiple original video frames;

The second generating unit is configured to generate a video using the multiple original video frames;

An acquiring unit, configured to acquire position information of a target image in a video, wherein the target image is a video frame in the video or a part of a video frame in the video;

The sending unit is used to send the video and the location information to the storage device.
The device of claim 16, wherein:

In the case that the category of the target image is a large image, the position information includes a first absolute position and/or a first relative position, wherein the first absolute position includes one or more of a frame number and a time stamp of the target image in the video, and the first relative position includes an offset of the target image relative to a specific video frame.
The device of claim 16, wherein:

In the case that the category of the target image is a small image, the position information includes a second absolute position and/or a second relative position, wherein the second absolute position includes one or more of a frame number and a time stamp of the video frame corresponding to the target image in the video, and further includes the position of the target image in the corresponding video frame; and the second relative position includes an offset of the target image relative to a specific video frame and the position of the target image in the corresponding video frame.
The apparatus according to any one of claims 16 to 18, wherein the media data transmission apparatus does not generate the target image, and does not send the target image to the storage device.
A camera, characterized in that the camera comprises:

The processor is configured to generate a plurality of original video frames, use the plurality of original video frames to generate a video, and obtain position information of a target image in the video, wherein the target image is a video frame in the video or a part of a video frame in the video;

The transceiver module is used to send the video and the location information to the storage device.
The camera of claim 20, wherein:

In the case that the category of the target image is a large image, the position information includes a first absolute position and/or a first relative position, wherein the first absolute position includes one or more of a frame number and a time stamp of the target image in the video, and the first relative position includes an offset of the target image relative to a specific video frame.
The camera of claim 20, wherein:

In the case that the category of the target image is a small image, the position information includes a second absolute position and/or a second relative position, wherein the second absolute position includes one or more of a frame number and a time stamp of the video frame corresponding to the target image in the video, and further includes the position of the target image in the corresponding video frame; and the second relative position includes an offset of the target image relative to a specific video frame and the position of the target image in the corresponding video frame.
The camera according to any one of claims 20 to 22, wherein the camera does not generate the target image, and does not send the target image to the storage device.
A storage device, characterized in that the device includes:

The transceiver module is used to receive the video sent by the camera and the position information of the target image, wherein the target image is a video frame in the video or a part of a video frame in the video;

The processor is configured to obtain the target image from the corresponding video frame of the video according to the location information.
A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, the computer program includes program instructions, and the program instructions, when executed by a processor, cause the processor to execute the method according to any one of claims 1 to 15.