CN113573065A - Multimedia data coding method and device - Google Patents

Multimedia data coding method and device

Info

Publication number
CN113573065A
Authority
CN
China
Prior art keywords
data segment
original data
original
value
format
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010352449.XA
Other languages
Chinese (zh)
Inventor
宣曼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202010352449.XA
Publication of CN113573065A
Pending legal status

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/156Availability of hardware or computational resources, e.g. encoding based on power-saving criteria
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation

Abstract

The method and device for encoding multimedia data can encode different data segments of the multimedia data captured by a camera in different ways. Specifically, original data segments with different value grades can be encoded with encoding modes of different compression rates. Because encoded multimedia data produced at different compression rates occupy different amounts of storage space, the pressure of storing the multimedia data is effectively relieved.

Description

Multimedia data coding method and device
Technical Field
The embodiment of the application relates to the technical field of information, in particular to a multimedia data encoding method and device.
Background
Video surveillance systems generally consist of a camera at the front end, a transmission system, and a storage system at the back end. The camera is mainly responsible for acquiring the original multimedia data. The transmission system is responsible for transmitting the multimedia data, captured and encoded by the camera, to the storage system. The storage system is responsible for displaying the received encoded multimedia data on a display device in real time or storing it in a storage medium. The storage system may also perform big data analysis or artificial intelligence (AI) analysis on the encoded multimedia data to obtain valuable information. The original multimedia data may be an image frame, a video segment, or the like.
At present, the workflow of a video surveillance system is as follows: the camera transmits the raw multimedia data captured in real time back to the storage system for storage and analysis. In essence, the camera captures the raw multimedia data, encodes it, and transmits the encoded data through the transmission system to the storage system for persistent storage. However, in an actual scene, valuable content accounts for only a small fraction of the image frames or video segments captured by the camera, while the large remainder consumes substantial transmission bandwidth, storage cost, and intelligent-analysis cost. In the prior art, to save return bandwidth, only the characteristic values of the image frames or video segments are returned in real time during busy hours; the live image frames or video segments are temporarily stored locally in the camera and returned asynchronously during idle hours. Although this relieves the pressure of returning image frames or video segments during busy hours, it cannot reduce the data volume, and because the image frames or video segments are not returned together with their characteristic values, the evidence data is incomplete and real-time decisions at the back end are affected.
Disclosure of Invention
The embodiments of the present application provide a multimedia data encoding method and device that can identify the value grade of each original data segment in the original multimedia data, so that original data segments with different value grades can be encoded with encoding modes of different compression rates. When the encoded multimedia data is transmitted back, data encoded at different compression rates occupies different transmission bandwidth and different storage space, so both the bandwidth pressure and the storage pressure can be effectively relieved.
In order to achieve the above purpose, the embodiment of the present application adopts the following technical solutions:
in a first aspect, a multimedia data encoding method is provided, including: shooting by a camera to obtain original multimedia data; determining the value grade of each original data segment in the original multimedia data according to a value grade rule, wherein the original multimedia data comprises a plurality of original data segments, the plurality of original data segments comprise a first original data segment and a second original data segment, and the value grade of the first original data segment is higher than that of the second original data segment; selecting an encoding mode corresponding to the value grade of each of the plurality of original data segments, wherein compression rates corresponding to different encoding modes are different, and the compression rate of the encoding mode corresponding to the first original data segment is lower than that of the encoding mode corresponding to the second original data segment; and coding each original data segment in the original multimedia data according to the selected coding mode to generate coded multimedia data corresponding to the original multimedia data.
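The steps of the first aspect can be pictured with a minimal, hypothetical Python sketch. Names such as ENCODING_MODES, select_encoding_mode, and the rate values are illustrative assumptions, not from the patent; here "compression_rate" denotes the fraction of data discarded, so a higher number means stronger compression.

```python
# Hedged sketch: a higher value grade maps to a lower-compression-rate
# (higher-fidelity) encoding mode, matching the first aspect's rule that
# the first (high-grade) segment's compression rate is lower than the
# second (low-grade) segment's.
ENCODING_MODES = {
    "high": {"codec": "H.265", "compression_rate": 0.3},  # low compression, high fidelity
    "low":  {"codec": "H.265", "compression_rate": 0.9},  # high compression, low fidelity
}

def select_encoding_mode(value_grade):
    """Select the encoding mode corresponding to a segment's value grade."""
    return ENCODING_MODES[value_grade]

def encode_multimedia(segments):
    """Encode each original data segment with its grade's encoding mode."""
    return [{"id": s["id"], "mode": select_encoding_mode(s["value_grade"])}
            for s in segments]

segments = [
    {"id": "first", "value_grade": "high"},   # e.g. records a value event
    {"id": "second", "value_grade": "low"},   # no value event
]
result = encode_multimedia(segments)
```

The sketch only captures the mode-selection logic; a real encoder would carry per-mode codec parameters rather than a single rate number.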
Therefore, in the embodiments of the present application, by analyzing the value grades of different original data segments in the original multimedia data captured by the camera, the encoding mode corresponding to each value grade can be determined, and the compression rates corresponding to different encoding modes differ. The original data segment with a high value grade is encoded at a low compression rate and the segment with a low value grade at a high compression rate, so when the encoded multimedia data corresponding to these segments is sent to the media management device, the data encoded at the high compression rate occupies less storage space than the data encoded at the low compression rate. In contrast, the prior art encodes everything with the same low-compression-rate encoding mode, so the compressed encoded multimedia data occupies a large amount of storage space.
Moreover, in the prior art a single encoding mode is used for data compression, and when its compression rate is low the encoded multimedia data is large and the bandwidth pressure during transmission is high; encoding the low-value segments at a higher compression rate therefore also relieves the bandwidth pressure during transmission.
In one possible design, the original data segment includes one or more of the following: video segments, full image frames, portions of image frames.
When the original data segment is a video segment, the encoding modes corresponding to different video segments may differ, i.e., different compression rates are adopted. A video segment with a high value grade has a low compression rate and high fidelity, while a video segment with a low value grade has a high compression rate and low fidelity. Similarly, when the original data segment is a complete image frame, a complete image frame with a low value grade has a high compression rate and low fidelity, and one with a high value grade has a low compression rate and high fidelity. In this way, video segments and complete image frames with a high value grade remain clearly visible to the user; in a security scene, for example, a person who needs attention can still be identified.
When the original data segment is a part of an image frame, one part of the same image frame may have a high value grade while another part has a low value grade, and the same image frame can be encoded with different encoding modes: the high-value-grade part is encoded at a low compression rate with high fidelity, and the low-value-grade part at a high compression rate with low fidelity. For example, in a security scene, if several pedestrians are captured in the same image frame, the camera can encode only the portrait area of the pedestrian matching a value event in the value event library with a low-compression-rate encoding mode, and encode the other pedestrians with a high-compression-rate encoding mode.
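A per-region scheme like this could, purely for illustration, be expressed as a block-level quantization map: blocks inside the matched portrait area get a low quantization parameter (high fidelity), everything else a high one. The block size, QP values, and region representation below are assumptions, not from the patent.

```python
def encode_frame_regions(frame_w, frame_h, value_regions):
    """Assign a quantization parameter (QP) per 16x16 block: a low QP
    (high fidelity) inside value regions, a high QP (low fidelity) elsewhere."""
    BLOCK = 16
    qp_map = {}
    for bx in range(0, frame_w, BLOCK):
        for by in range(0, frame_h, BLOCK):
            # A block is "valuable" if its top-left corner lies inside any
            # (x, y, width, height) value region.
            inside = any(x <= bx < x + w and y <= by < y + h
                         for (x, y, w, h) in value_regions)
            qp_map[(bx, by)] = 20 if inside else 40
    return qp_map

# A 64x64 frame with one 32x32 value region (e.g. a matched pedestrian).
qp = encode_frame_regions(64, 64, [(0, 0, 32, 32)])
```

Real codecs expose similar per-region control (e.g. ROI or adaptive-quantization maps), but the mapping here is only a sketch of the idea.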
In one possible design, determining the value level of each original data segment in the original multimedia data according to a value level rule specifically includes: identifying a value event in original multimedia data through semantic analysis, and determining a value level of an original data segment where the value event is located according to the value event, wherein a value level rule is used for describing a corresponding relation between the value event and the value level.
The semantic analysis may be performed in various ways, for example with AI algorithms such as computer vision analysis, video semantic analysis, or machine learning. An object that needs attention may appear in an image frame of the original multimedia data; its features can be identified through semantic analysis and compared with the features of the value events in a value event library. If the comparison succeeds, the value grade of the object can be determined according to the value grade rule, and this grade is the value grade of the original data segment where the object is located. Each segment can then be encoded with the encoding mode corresponding to its determined value grade, and because the compression rates of different encoding modes differ, the storage space of the multimedia management device is further saved.
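One hedged way to picture the matching step: extract a feature vector for the object, compare it with each value event's feature in the library, and take the grade of the first sufficiently similar event. The library contents, feature vectors, and similarity threshold below are invented for illustration; a real system would use an AI model for feature extraction.

```python
# Hypothetical value event library: event name -> reference feature and grade.
VALUE_EVENT_LIBRARY = {
    "suspect_portrait": {"feature": (0.9, 0.1, 0.4), "value_grade": "high"},
}

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

def classify_segment(feature, threshold=0.95):
    """Return the value grade of the segment containing this feature:
    the grade of a matching value event, or "low" if nothing matches."""
    for event in VALUE_EVENT_LIBRARY.values():
        if cosine_similarity(feature, event["feature"]) >= threshold:
            return event["value_grade"]
    return "low"
```

A segment whose extracted feature matches a library entry inherits that event's grade; unmatched segments default to the low grade, consistent with the second-raw-data-segment case below.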
In one possible design, a valuable event is recorded in a first raw data segment; no value events are recorded in the second raw data segment.
That is, when a value event is recorded in a first raw data segment and a value event is not recorded in a second raw data segment, the value rating of the first raw data segment is higher than the value rating of the second raw data segment. Therefore, the first original data segment with high value grade can adopt an encoding mode with low compression rate, so that the visual fidelity of the first original data segment with high value grade is high.
In one possible design, the method further includes: immediately sending the first original data segment to the multimedia management equipment; the second original data segment is stored in a non-volatile storage medium local to the camera, after which the second original data segment in the non-volatile storage medium is sent to the multimedia management device.
Since the value grade of the first original data segment is higher than that of the second, i.e., a value event is recorded in the first original data segment but not in the second, the first original data segment can be sent immediately to the multimedia management device through the transmission system while the second is temporarily stored locally in the camera. The multimedia management device comprises a storage device, a display device, a control and analysis device, and the like, so that a user on the multimedia management device side can learn of the value event in time. Storing the second original data segment, which records no value event, in a non-volatile storage medium local to the camera and sending it to the multimedia management device at idle time further relieves the storage pressure on the multimedia management device side and the bandwidth pressure of data transmission, while the value event is still obtained in time. The second original data segment may also be kept locally in the camera for a retention period and deleted when that period expires.
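The send-immediately/cache-locally policy with a retention period might be sketched as follows. The class name, timestamp handling, and retention logic are illustrative assumptions.

```python
class CameraStore:
    """Sketch of the camera-side policy: value-event segments are sent at
    once; others are cached locally and flushed at idle time, but dropped
    if their retention period has expired."""

    def __init__(self, retention_seconds):
        self.retention = retention_seconds
        self.sent = []    # segments delivered to the multimedia management device
        self.cache = []   # local non-volatile cache: (timestamp, segment)

    def handle(self, segment, has_value_event, now):
        if has_value_event:
            self.sent.append(segment)          # immediate send
        else:
            self.cache.append((now, segment))  # temporary local storage

    def flush_idle(self, now):
        """At idle time, send cached segments still within retention."""
        for ts, seg in self.cache:
            if now - ts <= self.retention:
                self.sent.append(seg)
        self.cache = []

store = CameraStore(retention_seconds=3600)
store.handle("first", True, now=0)    # records a value event
store.handle("second", False, now=0)  # no value event
store.flush_idle(now=100)
```

The expiry branch mirrors the storage-period deletion described above; a real camera would also need the reclamation-threshold behavior discussed later in the description.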
In one possible design, the method further includes: sending the first original data segment to the multimedia management equipment by using a high-speed network; and sending the second original data segment to the multimedia management device by using the low-speed network.
Because the value grade of the first original data segment is higher than that of the second, for example when a value event is recorded in the first original data segment and not in the second, the encoded multimedia data obtained by encoding the first original data segment can be sent to the multimedia management device over the high-speed network, so that a user on the multimedia management device side can learn of the value event in time. For example, in a security scene, if the first original data segment records the portrait features of a criminal suspect, the user can quickly learn the suspect's whereabouts. In addition, the location information of the value event may be included in the encoded multimedia data when it is transmitted.
The camera sends the second original data segment to the multimedia management device over a low-speed network when it is idle. In this way, when the first original data segment recording the value event is sent, the second original data segment, which records no value event, does not occupy the bandwidth of the high-speed network, relieving the bandwidth pressure when the value event is sent. The high-speed network may be, for example, 5G wireless transmission, and the low-speed network may be, for example, an eMBB high-capacity network slice.
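The network selection above reduces to a simple per-grade routing rule; the function and network labels below are illustrative, not part of the patent.

```python
def choose_network(value_grade):
    """Route high-value-grade data over the high-speed network (e.g. 5G
    wireless) and everything else over the low-speed network (e.g. a
    high-capacity network slice sent at idle time)."""
    return "high_speed" if value_grade == "high" else "low_speed"

routes = {seg_id: choose_network(grade)
          for seg_id, grade in [("first", "high"), ("second", "low")]}
```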
In one possible design, the format of the original multimedia data includes: one of RAW format, RGB format or YUV format, HSV format, Lab format, CMY format or YCbCr format; the format of the encoded multimedia data includes: one of an image media format or a video media format.
The image frames directly generated by the sensor in the camera may be in a RAW format, which differs depending on the sensor design. For example, the RAW format may be Bayer RGGB, RYYB, RCCC, RCCB, RGBW, CMYW, or another format. In some embodiments, the camera may also use an ISP to convert RAW images of various formats to RGB format. The ISP may also convert the RAW format to one of the following formats: YUV, HSV, Lab, CMY, YCbCr. Formats such as RGGB, YUV, HSV, Lab, CMY, and YCbCr may be referred to as raw media data formats.
If encoded by an encoder in the camera, the original image can be converted into a media data format that is easily recognized by the human eye and occupies less storage space. The media data format of an image may be, for example: jpeg, bmp, tga, png, gif, and the like. The media format of a video may be, for example: MPEG, AVI, nAVI, ASF, MOV, WMV, 3GP, RM, RMVB, FLV/F4V, H.264, H.265, and the like.
In one possible design, the camera may receive configuration information sent by the multimedia management device, which may include value events and value rating rules. The value grade rule is used for describing the corresponding relation between the value event and the value grade.
The configuration information may further include a storage manner corresponding to the value class, and may further include a transmission manner corresponding to the value class.
For example, when the value grade of the first original data segment is higher than that of the second, the first original data segment may be sent immediately and stored on the multimedia management device side, while the second is temporarily stored locally in the camera. The first original data segment may be sent to the multimedia management device over a high-speed network and the second over a low-speed network. This relieves the storage pressure on the multimedia management device side, saves storage space, and also relieves the bandwidth pressure when transmitting data.
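The configuration information could take a shape like the following, mapping value events to grades and grades to storage and transmission policies. All field names are assumptions for illustration; the patent only specifies what the configuration must convey, not its encoding.

```python
# Hypothetical configuration sent by the multimedia management device.
CONFIG = {
    "value_events": ["suspect_portrait", "intrusion"],
    "value_grade_rule": {"suspect_portrait": "high", "intrusion": "high"},
    "storage_policy": {"high": "send_immediately", "low": "cache_locally"},
    "transmission_policy": {"high": "high_speed", "low": "low_speed"},
}

def policies_for(event):
    """Look up the storage and transmission policy for an observed event;
    events absent from the rule default to the low grade."""
    grade = CONFIG["value_grade_rule"].get(event, "low")
    return (CONFIG["storage_policy"][grade],
            CONFIG["transmission_policy"][grade])
```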
In a second aspect, an electronic device is provided, comprising: a shooting module, configured to capture original multimedia data;
a processing module, configured to determine the value grade of each original data segment in the original multimedia data according to a value grade rule, wherein the original multimedia data comprises a plurality of original data segments, the plurality of original data segments comprise a first original data segment and a second original data segment, and the value grade of the first original data segment is higher than that of the second original data segment; the processing module is further configured to select an encoding mode corresponding to the value grade of each of the plurality of original data segments, wherein the compression rates corresponding to different encoding modes differ, and the compression rate of the encoding mode corresponding to the first original data segment is lower than that of the encoding mode corresponding to the second original data segment; and an encoding module, configured to encode each original data segment in the original multimedia data with the selected encoding mode to generate the encoded multimedia data corresponding to the original multimedia data.
In one possible design, the original data segment includes one or more of the following: video segments, full image frames, portions of image frames.
In one possible design, the processing module is to: identifying a value event in original multimedia data through semantic analysis, and determining a value level of an original data segment where the value event is located according to the value event, wherein a value level rule is used for describing a corresponding relation between the value event and the value level.
In one possible design, a valuable event is recorded in a first raw data segment; no value events are recorded in the second raw data segment.
In one possible design, the electronic device further includes a transceiver module to: immediately sending the first original data segment to the multimedia management equipment; the electronic device further comprises a storage module for storing the second original data segment in a non-volatile storage medium local to the camera, and then transmitting the second original data segment in the non-volatile storage medium to the multimedia management device.
In one possible design, the transceiver module is further configured to: sending the first original data segment to the multimedia management equipment by using a high-speed network; and sending the second original data segment to the multimedia management device by using the low-speed network.
In one possible design, the format of the original multimedia data includes: one of RAW format, RGB format or YUV format, HSV format, Lab format, CMY format or YCbCr format; the format of the encoded multimedia data includes: one of an image media format or a video media format.
In a third aspect, a computer-readable storage medium is provided, comprising computer instructions which, when executed on an electronic device, cause the electronic device to perform the method of the first aspect or any one of its possible designs.
In a fourth aspect, a computer program product is provided which, when run on a computer, causes an electronic device to perform the method of the first aspect or any one of its possible designs.
In a fifth aspect, a camera is provided, which includes: the lens assembly is used for receiving light rays; a sensor for converting light received by the lens assembly into original multimedia data; a processor to: determining the value grade of each original data segment in the original multimedia data according to a value grade rule, wherein the original multimedia data comprises a plurality of original data segments, the plurality of original data segments comprise a first original data segment and a second original data segment, and the value grade of the first original data segment is higher than that of the second original data segment; selecting an encoding mode corresponding to the value grade of each of the plurality of original data segments, wherein compression rates corresponding to different encoding modes are different, and the compression rate of the encoding mode corresponding to the first original data segment is lower than that of the encoding mode corresponding to the second original data segment; and coding each original data segment in the original multimedia data according to the selected coding mode to generate coded multimedia data corresponding to the original multimedia data.
In one possible design, further comprising: and an image signal processor ISP between the sensor and the processor for performing signal processing on the raw multimedia data before transmitting the raw multimedia data to the processor.
Drawings
Fig. 1 is a schematic diagram of a network architecture according to an embodiment of the present application;
fig. 2 is a flowchart illustrating a multimedia data encoding method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of identifying a value event in an image frame according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings. In the description of the embodiments, "/" means "or" unless otherwise specified; for example, A/B may mean A or B. "And/or" herein merely describes an association between objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, in the description of the embodiments of the present application, "a plurality" means two or more.
In the following, the terms "first", "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present embodiment, "a plurality" means two or more unless otherwise specified.
The embodiments of the present application can be applied to a video surveillance system. When multimedia data such as video segments or image frames captured by a camera is transmitted to the multimedia management device, the application can identify multimedia data of different value grades, so that the time at which different multimedia data is sent to the multimedia management device is determined according to its value grade, saving bandwidth during multimedia data transmission.
Referring to fig. 1, fig. 1 is a network architecture provided by the present application, which may include a camera 10, a transmission system 11, and a storage system 12. The camera 10 may be an analog camera, a web camera, a high-definition camera, or the like, and may comprise a sound pickup (also referred to as a listening head, or a combination of a microphone and an amplifier), sensors, encoders, acquisition cards, and the like. The transmission system 11 may include transmission devices such as video lines, audio lines, wireless networks, network cables, optical fibers, optical transceivers, switches, and routers. The storage system 12 includes a display device, a multimedia management device (storage device), and a control device. The display device may be a monitor or a large-screen television wall; the storage device may be a network video recorder (NVR), an internet protocol storage area network (IP SAN), or a cloud storage device; and the control device may be a video distributor, a matrix, a control keyboard, a comprehensive management platform, or the like.
The camera 10 is mainly used for collecting audio and video data. The transmission system 11 is used for transmitting the encoded audio and video signals collected by the camera to the storage system. The storage system 12 is used for displaying the audio and video signals received from the transmission system on the display device in real time or storing them in a hard disk medium; it can also perform big data analysis or AI analysis on the signals to obtain valuable information, and can further optimize or classify the audio and video data for convenient viewing by the user.
For example, for a security scene, the camera may transmit data of captured image frames or video segments to the storage system through the transmission system, store the data of the image frames or video segments in the storage device of the storage system, and the control device may read the data in the storage device, control the data to be distributed to a plurality of display devices, or select a picture of any one display device to be displayed on a large-screen television wall. In the data transmission process, a large bandwidth needs to be occupied, and in the data storage process, a large storage cost is also needed.
The camera can be in a wired connection mode or a wireless connection mode with the transmission system and the storage system.
It should be noted that in some scenarios the acquisition end of the original multimedia data and the receiving end, the multimedia management device (storage device), may be different devices. For example, in a security scenario, the acquisition end may include a camera, and the multimedia management device may include an NVR, the store-and-forward part of the video surveillance system that cooperates with the network camera. In other scenarios, the acquisition end and the receiving end may be the same device: for example, a smart phone, tablet computer, or vehicle event data recorder has both a shooting function and a storage function, and in such a scene the transmission system may be omitted.
In a video surveillance system, the camera captures video segments or image frames in real time, and generally whatever multimedia content is captured is stored in the storage device. Valuable data accounts for only a small proportion, while the bulk of less valuable or valueless data consumes a large amount of transmission bandwidth and storage cost. Currently, to save bandwidth, a camera may transmit the feature values of video segments or image frames in real time while caching the captured live pictures locally and transmitting them asynchronously to the storage device when idle. For example, when an image frame captured by the camera contains a person, only sensory feature values such as the person's face, nose, eyes, mouth, and ears may be sent to the storage device in real time, while the image frame itself is temporarily stored locally in the camera and transmitted to the storage device when the network is idle.
Although this scheme relieves the pressure of transmitting video segments or image frames when the network is busy, it cannot reduce the volume of transmitted data, so the storage space occupied in the storage system is not reduced; and because the image frames are not transmitted to the storage system together with the feature values, the evidence data is easily left incomplete, affecting real-time decisions at the back end. For example, when a feature value matches the features of a person pre-stored in the storage system and that person is a marked dangerous person, but the image of the person captured in real time has not yet been transmitted to the storage system, the environment the person is in at that moment cannot be known in time, which is inconvenient for the user. Moreover, when image frames are temporarily stored locally in the camera, the camera's local storage space is limited in a security scene, so space reclamation takes place. For example, when the capacity of the storage disk reaches a reclamation threshold, the camera may delete files starting with the oldest so that the most recently captured image or video frames can still be cached locally. There is therefore a risk that an image frame or video frame is deleted locally in the camera before being transmitted to the storage system, which can cause a serious accident of losing key data.
In the method of the present application, the camera still sends all multimedia data content to the storage device, but the camera can identify multimedia data of different value levels, so that data of different value levels can be encoded in different encoding modes with different compression rates. For example, the encoding mode corresponding to multimedia data of a high value level has a low compression rate — a visually lossless encoding mode with high fidelity — while the encoding mode corresponding to multimedia data of a low value level has a high compression rate — a visually lossy encoding mode with low fidelity. During transmission, the low-value-level multimedia data, being heavily compressed, occupies less bandwidth, and when stored in the storage system it requires less storage space; because most of the captured data is of low value, a certain amount of transmission bandwidth can be saved on one hand, and a certain amount of storage space on the other, while the high-value-level data retains high fidelity.
The embodiments of the present application are further described below.
An embodiment of the present application provides a multimedia data encoding method, as shown in fig. 2, the method includes:
201. The camera shoots to acquire original multimedia data.
The camera can shoot the environment within the visual angle range of the lens in real time to obtain original multimedia data, and the original multimedia data can be a video file obtained when the camera shoots. The video file may be composed of a plurality of image frames. An image frame may also be understood as a video frame.
The original multimedia data in this application can also be understood as video data captured by a camera before being encoded.
202. The camera determines the value grade of each original data segment in the original multimedia data according to a value grade rule, wherein the original multimedia data comprises a plurality of original data segments, the plurality of original data segments comprise a first original data segment and a second original data segment, and the value grade of the first original data segment is higher than the value grade of the second original data segment.
Determining the value level of each original data segment in the original multimedia data according to the value level rule may include: identifying a value event in the original multimedia data through semantic analysis, and determining the value level of the original data segment where the value event is located according to the value event, wherein the value level rule describes the correspondence between value events and value levels. Regions of interest (ROIs) and non-ROI regions may also be mapped to different value levels, for example: the foreground part of a video/image is treated as a high value level and the background part as a low value level.
The semantic analysis may be an AI algorithm such as computer vision, video semantic analysis, or machine learning. Computer vision can be understood as using a camera in place of human eyes to perform machine-vision tasks such as recognition, tracking, and measurement on a target, with further image processing so that the result is more suitable for human observation or for transmission to an instrument for detection. Video semantic analysis can be understood as establishing, through computer-vision processing, a mapping between the low-level physical feature space of a video and its high-level semantic space, extracting semantic information that reflects human subjective concepts from the video content, and automatically labeling the semantic content of the video. Machine learning can be understood as a branch of artificial intelligence in which computer algorithms are improved automatically using data or past experience so as to optimize the performance of a computer program.
That is, when the camera obtains the original multimedia data, the value analysis is performed on the original multimedia data to determine the value event in the original multimedia data, and then the value level corresponding to the value event is determined according to the corresponding relationship between the value event and the value level in the value level rule. The original multimedia data comprises a plurality of original data segments, and the value grade corresponding to the value event is the value grade corresponding to the original data segment where the value event is located.
The original data segment may include one or more of: a video segment, a complete image frame, or a portion of an image frame (an incomplete image frame). Whether an original data segment is a video segment, an image frame, or a portion of an image frame is determined by the value events in the original multimedia data.
In some embodiments, the camera captures a video file consisting of a plurality of image frames, and the raw multimedia data may be understood as that video file. The camera identifying the value events of the original multimedia data can be understood as the camera identifying the value events of the video file. Since the video file is made up of multiple image frames, each time the camera takes an image frame it may analyze that frame to determine its value events. If the camera identifies through semantic analysis that value events exist in the image frame, it can count them. If at least one value event exists and the number of value events in the frame is less than or equal to a first threshold, the camera determines that the value events occupy only parts of the image frame and determines the value level corresponding to each value event according to the correspondence; in this case, the region of each value event in the image frame is one original data segment. If the number of value events in the image frame is greater than the first threshold, the camera treats the entire area of the image frame as one original data segment. Once the original data segments are determined, the camera can determine the value level of the original data segment where each value event is located according to the value level rule.
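The per-frame segmentation rule just described can be sketched as follows. This is an illustrative Python sketch, not the patent's implementation; the names `FIRST_THRESHOLD` and `segment_frame`, and the representation of regions as coordinate tuples, are assumptions.

```python
FIRST_THRESHOLD = 3  # example value; the patent leaves the threshold unspecified

def segment_frame(value_event_regions, frame_region):
    """Return the original data segments for one image frame.

    value_event_regions: list of (x, y, w, h) regions where value events
        were detected by semantic analysis.
    frame_region: the (x, y, w, h) region covering the whole frame.
    """
    if not value_event_regions:
        # No value event: the whole frame is one (non-value) segment.
        return [frame_region]
    if len(value_event_regions) > FIRST_THRESHOLD:
        # Many value events: treat the entire frame as one segment.
        return [frame_region]
    # Few value events: each event region is its own original data segment.
    return list(value_event_regions)
```

A value level would then be assigned to each returned segment according to the value level rule.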
Illustratively, when the camera captures a complete image frame, each object in the frame may be identified through semantic analysis. Referring to fig. 3, the objects may include one or more of the following: at least one person A, at least one animal B, at least one building C, at least one plant D, and so on. When the camera identifies multiple objects, it may compare them with the value events in a preset value event library to determine whether any object's similarity to a preset value event is higher than a second threshold. If such an object exists — for example, if the similarity of a certain person A to a preset value event is higher than the second threshold — the camera determines that a value event involving person A exists in the image frame, and the data of the portrait area of person A in the frame is one original data segment. Suppose the first threshold is 3. If 3 or more objects in the image frame are determined to have a similarity to preset value events higher than the second threshold, the camera takes the whole image frame as one original data segment and, when determining value levels, determines the value level of the image frame. If fewer than 3 objects — say 2 — have a similarity higher than the second threshold, the camera takes the image area of each of those 2 objects as one original data segment, yielding 2 original data segments, each recording a value event; when determining value levels, the value level of each object's original data segment is determined.
For example, in a security scenario the camera needs to determine in real time whether a specific tracked person appears among the objects within its viewing angle. When the only object in the currently captured image frame is person A, and the similarity between the image area of person A and value event E in the value event library is higher than the second threshold, the camera may treat the image area of the frame other than person A as the image area of a non-value event. If value event E describes the portrait of a criminal suspect, the camera may determine that person A is a criminal suspect. Because the only value event in the image frame is person A, and the number of value-event objects is fewer than 3, the camera takes the portrait area of person A in the frame as one original data segment and can then determine the value level of that segment according to the correspondence. Meanwhile, the camera determines the image area other than the portrait area of person A to be a non-value event. Similarly, if multiple persons in the image frame have a similarity to the specific tracked person higher than the second threshold and their number is 3 or more, the camera may treat the entire area of the image frame as one original data segment and determine its value level.
When determining similarity, the facial features of the persons in the image frame can be identified through semantic analysis and compared with the facial features of persons stored in the value event library; a person whose post-comparison similarity is higher than the second threshold is determined to be the tracked person.
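The similarity comparison could, for instance, be a cosine similarity between facial feature vectors. The sketch below is hypothetical — the patent does not specify a similarity metric, and the function names and the 0.9 example threshold are assumptions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def find_tracked_persons(frame_features, library_features, second_threshold=0.9):
    """Return indices of frame faces whose best library match exceeds the threshold."""
    tracked = []
    for i, feat in enumerate(frame_features):
        best = max(cosine_similarity(feat, lib) for lib in library_features)
        if best > second_threshold:
            tracked.append(i)
    return tracked
```

In practice the feature vectors would come from a face-recognition model rather than being handcrafted.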
In some embodiments, the original data segment may also be a video segment, i.e., the camera may take a video segment composed of a plurality of image frames as one original data segment. In this case, if the camera determines that the similarity between an object in a first image frame and a preset value event is greater than or equal to a third threshold but less than the second threshold, it may determine that the object is a suspected object bearing some similarity to a value event in the value event library, without being able to confirm it as that value event. The camera may then continue to track the multiple image frames captured consecutively after the first image frame, and take the video segment composed of the first image frame and those frames as one original data segment, determining the value level of that video segment. In this case, when a suspected object exists in the first image frame, the video segment containing the first image frame is one original data segment, and a suspected value event is recorded in it.
In some embodiments, if the thresholds for similarity to a value event include the second threshold and the third threshold, then when every object in an image frame has a similarity to the value events in the value event library lower than the third threshold, the camera may determine the image frame to be one original data segment in which no value event is recorded; the image frame may then be treated as a non-value event.
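Taken together, the second and third thresholds partition objects into the three event categories described above. A minimal sketch, with assumed threshold values (the patent fixes neither):

```python
SECOND_THRESHOLD = 0.90  # example values; the patent does not specify them
THIRD_THRESHOLD = 0.60

def classify_object(similarity):
    """Map a similarity score to the three event categories of the patent."""
    if similarity > SECOND_THRESHOLD:
        return "value_event"            # confirmed: image region/frame segment
    if similarity >= THIRD_THRESHOLD:
        return "suspected_value_event"  # tracked further: segment becomes a video clip
    return "non_value_event"
```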
In summary, the original multimedia data includes a plurality of original data segments including a first original data segment and a second original data segment; when a value event is recorded in the first original data segment and no value event is recorded in the second, the value level of the first original data segment is higher than that of the second.
For example, in a security scenario, if a client requests the video surveillance system to track several classes of key personnel, such as criminal suspects, the camera side needs to monitor for those key personnel online in real time to determine whether persons highly similar to them appear within the camera's viewing angle, so as to inform the user of the storage system in time. The key personnel may be configured on the camera in advance at the camera side, or configured over the network by the storage system or another system — that is, by sending configuration information to the camera — so that the camera can determine, according to the configuration information, the original data segments of the multimedia data and the value events and value levels corresponding to them.
In some embodiments, referring to table 1, the configuration information may include a value event, an event tag, a value level, and event coordinates. The value event may be an image of an object that needs to be tracked, for example a portrait photograph of one of several classes of key personnel in a security scenario. According to the configured value events, the camera can determine whether a value event or a suspected value event appears in the image frames shot in real time; events other than value events and suspected value events are determined to be non-value events.
The event tag indicates how the camera should mark the original data segment, for example marking which specific type of key personnel the tracked object belongs to. For example, in a security scenario, if the camera determines that a value event with extremely high similarity to a criminal suspect exists in the image frame, it can add the event tag "criminal suspect" to the original data segment of that value event; when the camera determines that a suspected value event resembling a criminal suspect exists in the image frame, it can set the event tag of the suspected value event to "suspected"; events other than key personnel and suspected events can be tagged "safe".
The value level is the level of the event: for example, the value level of the several classes of key personnel is high, the value level of suspected key personnel (suspected value events) is medium, and the value level of other persons (non-value events) is low.
The event coordinates indicate that the camera needs to mark the target coordinates of the value event in the image frame; the target coordinates can be understood as the image area of the value event within the frame. For example, the camera may use the event coordinates of a key person to indicate the portrait area of that key person in the image frame, and the event coordinates of a suspected value event to indicate the portrait area of the suspected key person; the event coordinates of a non-value event are marked as default.
TABLE 1
Event                  Value level   Event tag          Event coordinates
Value event            High          Criminal suspect   Target coordinates
Suspected value event  Medium        Suspected          Target coordinates
Non-value event        Low           Safe               Default
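The configuration information of table 1 might be represented in memory roughly as follows; the dictionary keys and string values are illustrative, not from the patent.

```python
# Hypothetical in-memory form of the configuration information of table 1.
CONFIG = [
    {"event": "value_event", "value_level": "high",
     "event_tag": "criminal suspect", "event_coordinates": "target"},
    {"event": "suspected_value_event", "value_level": "medium",
     "event_tag": "suspected", "event_coordinates": "target"},
    {"event": "non_value_event", "value_level": "low",
     "event_tag": "safe", "event_coordinates": "default"},
]

def value_level_for(event_type):
    """Look up the value level configured for an event type."""
    for entry in CONFIG:
        if entry["event"] == event_type:
            return entry["value_level"]
    raise KeyError(event_type)
```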
203. The camera selects an encoding mode corresponding to the value level of each of the plurality of original data segments. Because encoding reduces the amount of data, encoding may be regarded as compressing the data; the compression rates of different encoding modes may differ, and the compression rate of the encoding mode corresponding to the first original data segment may be lower than that of the encoding mode corresponding to the second original data segment.
It should be noted that different coding algorithms often correspond to different coding modes, but a coding mode is not equivalent to a coding algorithm: the same coding algorithm may include a plurality of different levels with different compression rates, and each compression rate corresponds to one coding mode.
When the value level of the original data segment is determined, the camera may determine the corresponding encoding mode according to the value level. When the value level of the first original data segment is higher than the value level of the second original data segment, the camera may determine that the compression rate of the encoding manner corresponding to the first original data segment is lower than the compression rate of the encoding manner corresponding to the second original data segment.
In some embodiments, when a value event or a suspected value event is recorded in the first original data segment, the camera may compress the data at a low compression rate, that is, a compression rate that ensures visually lossless quality or high fidelity. When no value event is recorded in the second original data segment, or all events in it are non-value events, the camera may compress the data at a high compression rate, that is, a visually lossy, low-fidelity compression rate.
For example, referring to table 2, which implements table 1, different compression rates may be used for different value levels. For a value event with a high value level, the original data segment corresponding to the value event may be encoded at a low compression rate. The original data segment may correspond to the value event region of an image frame — a partial region or the whole region of the frame — or to a video segment. As for the corresponding encoding modes: high-definition compression or lossless compression is applied to the value event region of the image frame, and standard compression to the area outside it; or, when the value event is a video segment, high-definition compression or lossless compression is applied to the video segment.
For a suspected value event with a medium value level, the encoding of its original data segment can be similar to that of a high-value-level value event: high-definition or lossless compression for the suspected value event region in the image frame, and standard compression for the area outside it. When the suspected value event is a video segment, high-definition or lossless compression can be applied to the video segment.
For events with a low value level, i.e., non-value events, low-definition (lossy) compression with a high compression rate can be applied to the original data segment, with no definition requirement on the image frame or video segment.
As for the encoding modes, for example, the high-definition compression may be High Efficiency Video Coding (HEVC), and the lossless compression may follow a Moving Picture Experts Group (MPEG) standard. The standard compression may be an encoding mode such as H.261, H.263, or H.264 proposed by the International Telecommunication Union. The low-definition compression may be, for example, Joint Photographic Experts Group (JPEG) compression.
TABLE 2
Value level   Data content                                  Encoding mode
High          Value event region of image frame             High-definition or lossless compression
Medium        Suspected value event region of image frame   Standard compression
Low           Non-value region of image frame               Low-definition compression
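The level-to-encoding-mode selection of step 203 and table 2 amounts to a simple lookup. A hedged sketch — the codec names are the patent's examples, and in practice an actual encoder would be invoked rather than a string returned:

```python
# Mapping from value level to encoding mode, following table 2.
ENCODING_BY_LEVEL = {
    "high":   "HEVC (high-definition / lossless compression)",
    "medium": "standard compression (e.g. H.264)",
    "low":    "JPEG (low-definition, lossy compression)",
}

def select_encoding(value_level):
    """Select the encoding mode for an original data segment."""
    return ENCODING_BY_LEVEL[value_level]
```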
In the above embodiment, when the value event lies within an image frame, the value event region may be encoded at a low compression rate and the non-value-event region at a high compression rate; or the video segment of a value event may be encoded at a low compression rate. In other embodiments, the same image frame may be encoded differently according to other criteria. For example, the same image frame may contain multiple pedestrians. According to the embodiment of the present application, through semantic recognition, when a pedestrian requiring attention (for example, one of the key personnel above) is detected among multiple pedestrians, that pedestrian can be determined to be a value event when the value level is determined; the image area of that pedestrian in the frame is one original data segment and may be encoded at a low compression rate, for example with HEVC. A pedestrian not requiring attention may be determined to be a non-value event, but since that non-value event is still a pedestrian, the original data segment of its image area may be encoded at a medium compression rate, for example with an MPEG standard. The background region of the frame outside the pedestrians may be encoded at a high compression rate, for example with JPEG.
When the camera shoots multiple pedestrians, it can extract the facial features of each pedestrian and compare them with the facial features of persons stored in the value event library; a pedestrian whose similarity is higher than a certain threshold is determined to require attention, a pedestrian whose similarity is lower than the threshold is determined not to require attention, and the image areas outside the pedestrians are determined to be background regions.
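The three-tier per-region scheme for a single frame (attention pedestrian / other pedestrian / background) could be expressed as a region-to-codec assignment. The category and codec names below mirror the examples above, while the function shape itself is an assumption.

```python
def encoding_plan(regions):
    """Assign a codec to each region of one frame per the three-tier scheme.

    `regions` maps a region id to its category: 'attention_pedestrian',
    'other_pedestrian', or 'background'.
    """
    codec_for = {
        "attention_pedestrian": "HEVC",  # low compression rate, high fidelity
        "other_pedestrian": "MPEG",      # medium compression rate
        "background": "JPEG",            # high compression rate
    }
    return {region_id: codec_for[cat] for region_id, cat in regions.items()}
```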
The encoding mode corresponding to each value level may be pre-configured in the camera, or configured in the camera by the storage system via a network; for example, the configuration information may include the encoding mode corresponding to each value level.
204. The camera encodes each original data segment in the original multimedia data according to the selected encoding mode to generate encoded multimedia data corresponding to the original multimedia data.
The camera can acquire raw multimedia data within a view angle range through a sensor, and the format of the raw multimedia data generated by the sensor, or the format of each generated image frame, can be different according to different sensor designs.
In some embodiments, the format of the RAW multimedia data generated by the sensor may be a RAW format, which can differ with the sensor design. For example, the RAW format may be one of Bayer RGGB, RYYB, RCCC, RCCB, RGBW, CMYW, and other formats. The camera may also use image signal processing (ISP) to convert the RAW format acquired by the sensor into one of the following formats: YUV, HSV, Lab, CMY, or YCbCr. In the embodiments of the present application, after ISP processing the data is still referred to as original multimedia data; that is, the format of the original multimedia data may also be one of the YUV, HSV, Lab, CMY, or YCbCr formats.
When each original data segment in the original multimedia data is encoded by an encoder in the camera, the image frames can be converted into encoded multimedia data that is easily recognized by the human eye and occupies less storage space. The encoded multimedia data may be in an image media format or a video media format. The image media format may be, for example, one of the jpeg, bmp, tga, png, or gif formats; the video media format may be one of the MPEG, AVI, nAVI, ASF, MOV, WMV, 3GP, RM, RMVB, FLV/F4V, H.264, or H.265 formats.
Therefore, when the camera obtains the encoded multimedia data, the encoded multimedia data can be sent to the storage device for storage through the transmission system.
In summary, according to the present application, when the original multimedia data is divided into original data segments of different value levels, the segments can be encoded in the encoding modes corresponding to those levels: high-value-level segments in a low-compression-rate mode and low-value-level segments in a high-compression-rate mode. The encoded multimedia data produced from the high-value-level segments can thus be returned to the back-end device in time, so that a user can promptly learn the movements of key or attention-worthy personnel. In addition, because data at different compression rates differs in size and thus in the storage space it occupies, the storage system saves storage space when storing the encoded multimedia data.
In some embodiments, the encoded multimedia data may be sent to the storage device differently depending on the case. In the embodiment of the present application, the original multimedia data may include a first original data segment of high value level and a second original data segment of low value level. The first original data segment includes a value event — that is, a person requiring attention or a key person — and may therefore be sent immediately to the multimedia management device; in other words, once the first original data segment is encoded, the resulting encoded multimedia data may be sent immediately to the multimedia management device, which may be a storage device of the storage system, so that an administrator of the storage system can learn in time the movements of the person requiring attention or key person. The second original data segment has a lower value level and includes no value event, so it can be stored temporarily in a local nonvolatile storage medium of the camera (such as a magnetic disk, a solid state disk, or an erasable optical disc) and read from that medium and sent to the back-end multimedia management device when the camera is idle; alternatively, the encoded multimedia data obtained by encoding the second original data segment is stored temporarily in the camera's local nonvolatile storage medium and read and sent to the back-end multimedia management device when the camera is idle.
In some embodiments, for the encoded multimedia data corresponding to the high-value-level first original data segment, a high-speed network may be used when it is sent immediately to the multimedia management device of the storage system, so that an administrator of the storage system can promptly obtain the important information recorded by the first original data segment, such as criminal suspect information. For the encoded multimedia data corresponding to the low-value-level second original data segment, which is read from the camera's local storage and sent to the multimedia management device when the camera is idle, a low-speed network may be used. In this way the timely transmission of high-value-level data segments is ensured while the bandwidth pressure of network transmission is reduced.
Illustratively, referring to table 3, assume the value level has three grades: high, medium, and low. When an original data segment with a high value level is identified, its encoded multimedia data may be transmitted immediately to the multimedia management device over a high-speed network. The high-speed network may use a highly reliable transmission protocol, such as 5G wireless transmission or enhanced machine-type communication (eMTC) network-slice transmission. When an original data segment with a medium value level is identified, low-speed network transmission may be used, for example transmitting the encoded multimedia data over an enhanced mobile broadband (eMBB) high-capacity network slice. When an original data segment with a low value level is identified, it can be stored temporarily in the camera, and when the camera's network is idle, the encoded multimedia data obtained by encoding it can be transmitted to the multimedia management device over a low-speed network, for example an eMBB high-capacity network slice.
TABLE 3
Value level   Transmission policy
High          Sent immediately over a high-speed network (e.g., 5G or eMTC network slice)
Medium        Sent over a low-speed network (eMBB high-capacity network slice)
Low           Cached locally in the camera; sent over a low-speed network (eMBB slice) when idle
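The transmission dispatch described for table 3 can be sketched as a small policy function; the return shape (a cache flag plus a network label) is an illustrative assumption.

```python
def transmission_policy(value_level):
    """Return (cache_locally, network) for a segment's encoded data,
    per the dispatch rules described for table 3."""
    if value_level == "high":
        return (False, "high-speed (e.g. 5G / eMTC slice)")
    if value_level == "medium":
        return (False, "low-speed (eMBB slice)")
    # Low value level: cache in the camera, send when the network is idle.
    return (True, "low-speed (eMBB slice)")
```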
When the multimedia management device stores the encoded multimedia data, the storage media and the storage periods which can be used by the encoded multimedia data corresponding to different value levels can be different.
In some embodiments, a long-term storage medium can be used for the high-value-level first original data segment, and a storage medium suited to caching for the low-value-level second original data segment. Thus important data is kept on a medium with a long storage period, convenient for an administrator to view when needed, while unimportant data is cached so that newly acquired data can be written to the cacheable medium in time.
For example, referring to table 4, assuming that the value classes include three classes, i.e., a high class, a medium class and a low class, when the multimedia management apparatus determines that encoded multimedia data corresponding to an original data segment with a high value class is received, the encoded multimedia data may be stored in a storage medium that can be stored for a long time, for example, the storage medium may be a Storage Class Memory (SCM) or a blu-ray disc storage medium, or may also be a Solid State Disk (SSD), or a disk array that constitutes a disk array (RAID) 1. The SCM not only enjoys performance of a Dynamic Random Access Memory (DRAM), but also enjoys capacity advantages of NAND Flash, and has memory level persistence and memory fast byte level access commonality. The storage period may be 90 days or permanent.
When the multimedia management device determines that the encoded multimedia data corresponding to the original data segment with the value grade of middle is received, the encoded multimedia data can adopt a storage medium of a common video or image. For example, the general video or image storage medium may be a common Serial Advanced Technology Attachment (SATA) hard disk, which is also called a serial hard disk, or may also be a disk array constituting RAID 5. The storage period may be 30 days, or other time periods.
For encoded multimedia data corresponding to an original data segment with a low value level, a storage medium suitable for temporary storage that can be frequently erased can be adopted for local caching on the camera. The storage period may be 14 days, or another time period.
TABLE 4

Value level | Storage medium                               | Storage period
High        | SCM, Blu-ray disc, SSD, or RAID 1 disk array | 90 days or permanent
Medium      | SATA hard disk or RAID 5 disk array          | 30 days
Low         | Camera-local cache (frequently erasable)     | 14 days
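The tiered storage policy of Table 4 can be sketched as a simple lookup from value level to storage medium and retention period. The tier names and retention values below mirror the examples in the text; the function name and data layout are illustrative assumptions, not part of the patented method.

```python
# Hypothetical sketch of the Table 4 policy: map a value level to a storage
# tier and a retention period. Tier names and periods follow the examples in
# the text (SCM/Blu-ray/SSD/RAID 1 for high, SATA/RAID 5 for medium,
# camera-local cache for low).
STORAGE_POLICY = {
    "high":   {"medium": "SCM / Blu-ray / SSD / RAID 1", "retention_days": 90},
    "medium": {"medium": "SATA disk / RAID 5",           "retention_days": 30},
    "low":    {"medium": "camera local cache",           "retention_days": 14},
}

def storage_for(value_level: str) -> dict:
    """Return the storage medium and retention period for a value level."""
    try:
        return STORAGE_POLICY[value_level]
    except KeyError:
        raise ValueError(f"unknown value level: {value_level}")

assert storage_for("high")["retention_days"] == 90
```

A real multimedia management device would also attach a deletion or migration job when the retention period expires; that housekeeping is omitted here.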
Therefore, in the multimedia data processing method of this embodiment of the application, original data segments of different value levels can be identified within the multimedia data, and the encoding modes applied to them can differ: the compression rate for an original data segment with a high value level may be lower than that for an original data segment with a low value level. In this way, a high-fidelity encoding mode can be applied to high-value segments, so that high-value multimedia data can be played back on the back-end multimedia management device in a visually lossless manner. Meanwhile, an original data segment with a high value level can be transmitted to the back-end multimedia management device immediately over a high-speed network, while an original data segment with a low value level can be temporarily stored locally on the camera and transmitted over a low-speed network when the camera's network is idle. High-value data thus reaches the multimedia management device in time, and the bandwidth pressure of multimedia transmission is relieved. In addition, caching low-value data locally on the camera saves storage cost on the multimedia management device side.
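The core relationship described above, a lower compression rate for higher-value segments, can be illustrated with a small sketch. The mode names and numeric rates are assumptions chosen for the example; the patent does not prescribe specific codecs or rates.

```python
# Illustrative sketch (not the patented algorithm): pick an encoding mode
# whose compression rate is lower for higher-value segments, so high-value
# data retains higher fidelity.
ENCODING_MODES = {
    "high":   {"mode": "visually-lossless", "compression_rate": 0.2},
    "medium": {"mode": "standard",          "compression_rate": 0.5},
    "low":    {"mode": "high-compression",  "compression_rate": 0.8},
}

def encode_segments(segments):
    """Attach the encoding mode matching each segment's value level."""
    return [{**seg, **ENCODING_MODES[seg["value_level"]]} for seg in segments]

encoded = encode_segments([
    {"id": 1, "value_level": "high"},
    {"id": 2, "value_level": "low"},
])
# The high-value segment gets the lower compression rate:
assert encoded[0]["compression_rate"] < encoded[1]["compression_rate"]
```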
Fig. 4 is a schematic block diagram of an electronic device 40 provided in an embodiment of the present application, which can be used to execute the above-mentioned method for encoding multimedia data. The electronic device 40 may be a camera as in the above-described embodiments. The electronic device 40 may include:
a shooting module 401, configured to perform shooting to obtain original multimedia data;
a processing module 402, configured to determine a value level of each original data segment in the original multimedia data according to a value level rule, where the original multimedia data includes a plurality of original data segments, the plurality of original data segments includes a first original data segment and a second original data segment, and the value level of the first original data segment is higher than the value level of the second original data segment;
the processing module 402 is further configured to select an encoding mode corresponding to the value level of each of the plurality of original data segments, where the compression rates corresponding to different encoding modes differ, and the compression rate of the encoding mode corresponding to the first original data segment is lower than the compression rate of the encoding mode corresponding to the second original data segment;
and an encoding module 403, configured to encode each original data segment in the original multimedia data according to the selected encoding mode, to generate encoded multimedia data corresponding to the original multimedia data.
In an embodiment of the present application, the original data segment includes one or more of the following: video segments, full image frames, portions of image frames.
In this embodiment, the processing module 402 may be configured to: identifying a value event in original multimedia data through semantic analysis, and determining a value level of an original data segment where the value event is located according to the value event, wherein a value level rule is used for describing a corresponding relation between the value event and the value level.
In the embodiment of the application, a value event is recorded in the first original data segment; no value event is recorded in the second original data segment.
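The value level rule just described, a correspondence between value events and value levels, can be sketched as a lookup table driven by the events that semantic analysis finds in a segment. The event names and the default-to-low behavior below are hypothetical choices for illustration; real semantic analysis would run a vision model over the segment.

```python
# A minimal sketch of the value level rule: a mapping from recognized value
# events (produced by semantic analysis) to value levels. Event names are
# hypothetical examples.
VALUE_LEVEL_RULE = {
    "intrusion": "high",
    "person_detected": "medium",
}

def value_level_of(segment_events):
    """Return the highest value level implied by a segment's events."""
    order = {"low": 0, "medium": 1, "high": 2}
    levels = [VALUE_LEVEL_RULE[e] for e in segment_events if e in VALUE_LEVEL_RULE]
    # A segment with no recognized value event defaults to the low level.
    return max(levels, key=order.__getitem__, default="low")

assert value_level_of(["intrusion", "person_detected"]) == "high"
assert value_level_of([]) == "low"
```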
In the embodiment of the present application, the electronic device 40 further includes a transceiver module 404, configured to send the first original data segment to the multimedia management device immediately; the electronic device 40 further includes a storage module 405, configured to store the second original data segment in a non-volatile storage medium local to the camera, and then transmit the second original data segment in the non-volatile storage medium to the multimedia management device.
In the embodiment of the present application, the transceiver module 404 is further configured to: send the first original data segment to the multimedia management device using a high-speed network; and send the second original data segment to the multimedia management device using a low-speed network.
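The two transmission paths, immediate transmission of high-value segments over the high-speed link versus local caching and deferred transmission of low-value segments over the low-speed link, can be sketched as follows. The `send` function and the in-memory `local_cache` list are stand-ins for real network I/O and the camera's non-volatile medium.

```python
# Hedged sketch of the transmission policy: high-value segments are sent
# immediately over the high-speed link; low-value segments are written to
# the camera's local cache and flushed over the low-speed link when the
# network is idle.
local_cache = []   # stands in for the camera's non-volatile storage medium
sent = []          # records (segment_id, link) pairs actually transmitted

def send(segment_id, link):
    sent.append((segment_id, link))

def on_segment(segment_id, value_level):
    if value_level == "high":
        send(segment_id, "high-speed")   # transmit immediately
    else:
        local_cache.append(segment_id)   # defer until the network is idle

def on_network_idle():
    while local_cache:
        send(local_cache.pop(0), "low-speed")

on_segment(1, "high")
on_segment(2, "low")
on_network_idle()
assert sent == [(1, "high-speed"), (2, "low-speed")]
```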
In the embodiment of the present application, the format of the original multimedia data includes one of the RAW, RGB, YUV, HSV, Lab, CMY, or YCbCr formats; the format of the encoded multimedia data includes one of an image media format or a video media format.
It should be understood that the electronic device 40 according to the embodiment of the present application may correspond to the camera in the method embodiment of the present application, and the operations and/or functions of the modules in the electronic device 40 are respectively for implementing the corresponding flows of the method of fig. 2, and are not described herein again for brevity.
Fig. 5 is a schematic diagram of a hardware architecture of an exemplary image processing apparatus according to an embodiment of the present disclosure. The hardware architecture of the image processing apparatus 50 is applicable to the camera in fig. 2.
For example, the image processing apparatus 50 may include a lens assembly, at least one central processing unit (CPU), at least one memory, a microphone, a sensor, a GPU, an encoder, a receiving interface, and a transmitting interface, connected through a memory bus. Optionally, the image processing apparatus 50 may further include a decoder (not shown), a dedicated video or graphics processor, a microprocessor, a microcontroller (MCU), and the like. In an optional case, the above parts of the image processing apparatus 50 are coupled through connectors. It should be understood that in the embodiments of this application, coupling refers to interconnection in a specific manner, including direct connection or indirect connection through other devices, for example through various interfaces, transmission lines, or buses; these are usually electrical communication interfaces, but mechanical or other interfaces are not excluded, and this embodiment is not limited thereto. In an optional case, the above parts are integrated on the same chip; in another optional case, the CPU, the GPU, the encoder, the receiving interface, and the transmitting interface are integrated on one chip, and the parts inside the chip access an external memory through a bus. The dedicated video/graphics processor may be integrated on the same chip as the CPU, or may exist as a separate processor chip; for example, it may be a dedicated ISP. The ISP may be located between the sensor and the processor, to perform signal processing on the raw multimedia data before sending it to the CPU.
The chips referred to in the embodiments of this application are systems manufactured on the same semiconductor substrate by an integrated circuit process, also called semiconductor chips. A chip may be a collection of integrated circuits formed on the substrate (typically a semiconductor material such as silicon) by an integrated circuit process, whose outer layer is typically encapsulated by a semiconductor packaging material. The integrated circuit may include various types of functional devices; each functional device includes logic gates, metal-oxide-semiconductor (MOS) transistors, bipolar transistors, diodes, or similar devices, and may also include capacitors, resistors, inductors, and other components. Each functional device can work independently or under the action of necessary driver software, and can implement functions such as communication, computation, and storage.
In some embodiments, the CPU may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor; alternatively, the CPU may be a processor group including a plurality of processors, and the plurality of processors are coupled to each other via one or more buses. In an alternative case, the processing of the image or video signal is performed partly by the GPU, partly by a dedicated video/graphics processor, and possibly by software code running on a general purpose CPU or GPU.
Corresponding to fig. 4, the CPU may perform the same function as the processing module 402 in the electronic device of fig. 4.
The memory may be used to store computer program instructions, including an operating system (OS), various user application programs, and program code for executing the solutions of this application; the memory may also be used to store video data, image signal data, and the like. The CPU may be configured to execute the computer program code stored in the memory to implement the methods of the embodiments of this application. The memory may be a non-volatile memory, such as an embedded multimedia card (eMMC), a universal flash storage (UFS), or a read-only memory (ROM), or another type of static storage device capable of storing static information and instructions; or a volatile memory, such as a random access memory (RAM) or another type of dynamic storage device capable of storing information and instructions; or an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, digital versatile discs, and the like), a magnetic disk storage medium or other magnetic storage device, or any other computer-readable storage medium that can be used to carry or store program code in the form of instructions or data structures and that can be accessed by a computer; but is not limited thereto.
Corresponding to fig. 4, the memory may perform the same function as the storage module 405 in fig. 4.
The receiving interface may be an interface for data input to the processor chip; in an optional case, the receiving interface may be a high definition multimedia interface (HDMI). The transmitting interface may be an interface for data output from the processor chip.
The receiving interface and the transmitting interface may perform the same function as the transceiver module 404 in fig. 4.
The encoder may perform the same function as the encoding module 403 in fig. 4.
The sensor may perform the same function as the shooting module 401 of fig. 4, converting light received by the lens assembly into raw multimedia data.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application. Through the description of the above embodiments, those skilled in the art will understand that, for convenience and simplicity of description, only the division of the above functional modules is used as an example, and in practical applications, the above function distribution may be completed by different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the above described functions.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the modules or units is only one logical functional division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another device, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may be one physical unit or a plurality of physical units, that is, may be located in one place, or may be distributed in a plurality of different places. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a readable storage medium. Based on such an understanding, the technical solutions of the embodiments of this application essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions to enable a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific embodiments of this application, but the protection scope of this application is not limited thereto. Any change or substitution readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (18)

1. A method for encoding multimedia data, comprising:
shooting by a camera to obtain original multimedia data;
determining the value grade of each original data segment in the original multimedia data according to a value grade rule, wherein the original multimedia data comprises a plurality of original data segments, the plurality of original data segments comprise a first original data segment and a second original data segment, and the value grade of the first original data segment is higher than the value grade of the second original data segment;
selecting an encoding mode corresponding to the value grade of each of the plurality of original data segments, wherein compression rates corresponding to different encoding modes are different, and the compression rate of the encoding mode corresponding to the first original data segment is lower than that of the encoding mode corresponding to the second original data segment;
and coding each original data segment in the original multimedia data according to the selected coding mode to generate coded multimedia data corresponding to the original multimedia data.
2. The method of claim 1, wherein the original data segment comprises one or more of:
video segments, full image frames, portions of image frames.
3. The method of claim 1, wherein determining a value rating for each original data segment in the original multimedia data according to a value rating rule comprises:
identifying a value event in the original multimedia data through semantic analysis, and determining a value grade of an original data segment where the value event is located according to the value event, wherein the value grade rule is used for describing a corresponding relation between the value event and the value grade.
4. The method of claim 3, wherein:
a value event is recorded in the first original data segment; and
no value event is recorded in the second original data segment.
5. The method according to any one of claims 1-4, further comprising:
immediately sending the first original data segment to a multimedia management device;
storing the second raw data segment in a non-volatile storage medium local to the camera, and then transmitting the second raw data segment in the non-volatile storage medium to the multimedia management device.
6. The method according to any one of claims 1-4, further comprising:
sending the first original data segment to a multimedia management device by using a high-speed network;
and sending the second original data segment to the multimedia management device by using a low-speed network.
7. The method according to any one of claims 1-6, wherein:
the format of the original multimedia data includes: one of RAW format, RGB format or YUV format, HSV format, Lab format, CMY format or YCbCr format;
the format of the encoded multimedia data includes: one of an image media format or a video media format.
8. An electronic device, comprising:
the shooting module is used for shooting to obtain original multimedia data;
the processing module is used for determining the value grade of each original data segment in the original multimedia data according to a value grade rule, wherein the original multimedia data comprises a plurality of original data segments, the plurality of original data segments comprise a first original data segment and a second original data segment, and the value grade of the first original data segment is higher than the value grade of the second original data segment;
the processing module is further configured to select an encoding mode corresponding to the value level of each of the plurality of original data segments, where compression rates corresponding to different encoding modes are different, and the compression rate of the encoding mode corresponding to the first original data segment is lower than that of the encoding mode corresponding to the second original data segment;
and the encoding module is used for encoding each original data segment in the original multimedia data according to the selected encoding mode to generate encoded multimedia data corresponding to the original multimedia data.
9. The electronic device of claim 8, wherein the original data segment comprises one or more of:
video segments, full image frames, portions of image frames.
10. The electronic device of claim 8, wherein the processing module is configured to:
identifying a value event in the original multimedia data through semantic analysis, and determining a value grade of an original data segment where the value event is located according to the value event, wherein the value grade rule is used for describing a corresponding relation between the value event and the value grade.
11. The electronic device of claim 10,
a value event is recorded in the first original data segment; and
no value event is recorded in the second original data segment.
12. The electronic device of any of claims 8-11, further comprising a transceiver module configured to: immediately sending the first original data segment to a multimedia management device;
the electronic device further comprises a storage module for storing the second original data segment in a non-volatile storage medium local to the camera, and then transmitting the second original data segment in the non-volatile storage medium to the multimedia management device.
13. The electronic device of any of claims 8-11, further comprising a transceiver module to:
sending the first original data segment to a multimedia management device by using a high-speed network;
and sending the second original data segment to the multimedia management device by using a low-speed network.
14. The electronic device of any of claims 8-13, wherein the format of the raw multimedia data comprises: one of RAW format, RGB format or YUV format, HSV format, Lab format, CMY format or YCbCr format;
the format of the encoded multimedia data includes: one of an image media format or a video media format.
15. A computer-readable storage medium comprising computer instructions that, when executed on an electronic device, cause the electronic device to perform the method of any of claims 1-7.
16. A computer program product, wherein when the computer program product runs on a computer, the computer program product causes an electronic device to perform the method of any one of claims 1-7.
17. A camera, comprising:
the lens assembly is used for receiving light rays;
a sensor for converting light received by the lens assembly into raw multimedia data;
a processor to:
determining the value grade of each original data segment in the original multimedia data according to a value grade rule, wherein the original multimedia data comprises a plurality of original data segments, the plurality of original data segments comprise a first original data segment and a second original data segment, and the value grade of the first original data segment is higher than the value grade of the second original data segment;
selecting an encoding mode corresponding to the value grade of each of the plurality of original data segments, wherein compression rates corresponding to different encoding modes are different, and the compression rate of the encoding mode corresponding to the first original data segment is lower than that of the encoding mode corresponding to the second original data segment;
and coding each original data segment in the original multimedia data according to the selected coding mode to generate coded multimedia data corresponding to the original multimedia data.
18. The camera of claim 17, further comprising:
an image signal processor ISP, located between the sensor and the processor, for performing signal processing on the raw multimedia data before sending the raw multimedia data to the processor.
CN202010352449.XA 2020-04-28 2020-04-28 Multimedia data coding method and device Pending CN113573065A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010352449.XA CN113573065A (en) 2020-04-28 2020-04-28 Multimedia data coding method and device


Publications (1)

Publication Number Publication Date
CN113573065A true CN113573065A (en) 2021-10-29

Family

ID=78158210

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010352449.XA Pending CN113573065A (en) 2020-04-28 2020-04-28 Multimedia data coding method and device

Country Status (1)

Country Link
CN (1) CN113573065A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101742294A (en) * 2008-11-14 2010-06-16 北京中星微电子有限公司 Method and device for enhancing monitoring video compression ratio
CN102932935A (en) * 2011-08-12 2013-02-13 中国移动通信集团公司 Data transmission method and mobile terminal
CN103428483A (en) * 2012-05-16 2013-12-04 华为技术有限公司 Media data processing method and device
CN107770280A (en) * 2017-10-30 2018-03-06 珠海格力电器股份有限公司 Multimedia-data procession and acquisition methods, acquisition terminal and processing unit


Similar Documents

Publication Publication Date Title
CN111898416A (en) Video stream processing method and device, computer equipment and storage medium
US8675065B2 (en) Video monitoring system
AU2009243442B2 (en) Detection of abnormal behaviour in video objects
US20190007678A1 (en) Generating heat maps using dynamic vision sensor events
Korshunov et al. Video quality for face detection, recognition, and tracking
CN108063914B (en) Method and device for generating and playing monitoring video file and terminal equipment
JP6016332B2 (en) Image processing apparatus and image processing method
US20150016747A1 (en) Image processor and image combination method thereof
CN110856035B (en) Processing image data to perform object detection
CN112135140B (en) Video definition identification method, electronic device and storage medium
KR101668930B1 (en) Video analytic encoding
US11551449B2 (en) Person-of-interest centric timelapse video with AI input on home security camera to protect privacy
CN103826109A (en) Video monitoring image data processing method and system
CN114079820A (en) Interval shooting video generation centered on an event/object of interest input on a camera device by means of a neural network
CN105979189A (en) Video signal processing and storing method and video signal processing and storing system
CN102726042B (en) Processing system for video and video decoding system
CN111277800A (en) Monitoring video coding and playing method and device, electronic equipment and storage medium
US20220070453A1 (en) Smart timelapse video to conserve bandwidth by reducing bit rate of video on a camera device with the assistance of neural network input
WO2022048129A1 (en) Object recognition method, apparatus, and system
CN113111823A (en) Abnormal behavior detection method and related device for building construction site
CN113573065A (en) Multimedia data coding method and device
CN110688510B (en) Face background image acquisition method and system
CN111542858B (en) Dynamic image analysis device, system, method, and storage medium
KR102015082B1 (en) syntax-based method of providing object tracking in compressed video
CN201947397U (en) Digital video monitoring system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination