CN101547351B - Method for generating and processing video data stream and equipment thereof

Info

Publication number: CN101547351B
Application number: CN 200810035030
Authority: CN
Inventors: 武晓阳; 丁亚强; 林福辉
Original assignee: Spreadtrum Communications Shanghai Co Ltd
Current assignee: Spreadtrum Communications Shanghai Co Ltd
Priority date: 2008-03-24
Filing date: 2008-03-24
Publication date: 2013-05-15
Anticipated expiration: 2028-03-24
Also published as: CN101547351A

Abstract

The invention relates to the field of video coding and processing and discloses a method and equipment for generating and processing a video data stream, aiming to solve the poor extensibility of existing region-of-interest (ROI) marking schemes. By adding the total length of the ROI attribute information to each data stream segmenting unit, the ROI marking can be defined differently for different application fields, which gives it great flexibility. In addition, image recognition is performed on the video data in a data stream segmenting unit and the recognition result is stored as part of that unit's ROI attributes, so that the ROI attributes can be read directly when the video data is searched, which greatly improves the efficiency of video retrieval.

Description

Method for generating and processing a video data stream, and equipment thereof

Technical field

The present invention relates to the field of video coding and processing, and in particular to region-based video coding and processing techniques.

Background technology

As digital video capture and processing equipment keeps falling in price, digital video technology has become part of daily life. Digital video serves not only entertainment but also has important applications in production and security. For example, video surveillance systems are commonly installed in residential areas, office buildings, shopping malls and on roads. These systems are tireless observers that can take over round-the-clock monitoring from security personnel. Video surveillance produces a large amount of video data, which is usually compressed before being stored. Different compression ratios can be used: with a high compression ratio the compressed video takes less storage space but the image quality is poorer; with a low compression ratio the compressed video takes more storage space but the image quality is better.

To strike a better balance between storage space and image quality, region-based video coding and processing has been proposed. In brief, part of the video image is designated as a region of interest (Region Of Interest, "ROI"); the ROI is coded at high quality with a lower compression ratio, while the other regions are coded at low quality with a higher compression ratio.

Region-based video coding and processing is of great value in particular applications. In video surveillance, for instance, the image regions containing people or moving vehicles are the key regions and the rest of the image is background. The content of the key regions is what the monitoring staff are interested in, whereas the background can often be ignored. The image regions containing people or moving vehicles can then be set as ROIs and coded at high quality so that more detail is preserved.

An ROI can be set manually by the user or generated automatically from the image content, for example by defining a region as an ROI once a face has been recognized in it.

For a system to process video selectively according to ROIs, the ROIs must first be marked effectively. Marking a region of interest means specifying a set of ROI attributes and defining the corresponding parameters for each attribute. These ROI attribute parameters are included in the video bitstream and are transmitted or stored together with the video data. Their presence makes monitoring more effective: they provide an interface to the layers above the video codec, which lets the surveillance system offer richer functions.

In current industry standards, ROI marking is "rigid": which attributes an ROI has, how many parameters each attribute has, and how many bits each parameter occupies are all fixed. The benefit is that the ROI attributes can be read with a fixed data structure, which keeps processing simple.

The main problem with this "rigid" ROI marking method is its lack of extensibility. Some applications may need ROIs with more attributes and parameters than the current standards support; others may not need so many parameters, so bits are wasted unnecessarily.

Summary of the invention

The object of the present invention is to provide a method for generating and processing a video data stream, and equipment thereof, so as to solve the poor extensibility of current ROI marking schemes.

To solve the above technical problem, embodiments of the present invention provide a video data stream generating method comprising the following steps:

generating region-of-interest attribute information related to the video data of one unit;

calculating the total length of the region-of-interest attribute information;

writing the total length, the region-of-interest attribute information and the video data into a data stream segmenting unit.

Embodiments of the present invention also provide a video data stream processing method comprising the following steps:

reading the total length of the region-of-interest attribute information from a data stream segmenting unit;

reading the region-of-interest attribute information from the data stream segmenting unit according to the total length.

Embodiments of the present invention also provide a video data stream generating device, comprising:

an attribute generation unit, configured to generate region-of-interest attribute information related to the video data of one unit;

a computing unit, configured to calculate the total length of the region-of-interest attribute information generated by the attribute generation unit;

a writing unit, configured to write the total length obtained by the computing unit, the region-of-interest attribute information generated by the attribute generation unit, and the video data into a data stream segmenting unit.

Embodiments of the present invention also provide a video data stream processing device, comprising:

a length reading unit, configured to read the total length of the region-of-interest attribute information from a data stream segmenting unit;

an attribute reading unit, configured to read the region-of-interest attribute information from the data stream segmenting unit according to the total length.

Embodiments of the present invention also provide an attribute-extensible region-of-interest marking method: in a data stream segmenting unit that carries a region-of-interest mark, a syntax element is present that indicates the total length, within that data stream segmenting unit, of the parameters of all region-of-interest attributes and of any necessary additional syntax elements.

Compared with the prior art, the main differences and effects of the embodiments of the present invention are as follows:

Because the total length of the ROI attribute information is added to the data stream segmenting unit, the ROI marking can be defined differently for different applications, which gives it great flexibility.

Further, image recognition is performed on the video data in a data stream segmenting unit and the recognition result is stored as part of the ROI attributes of that unit. When the video data is retrieved, the ROI attributes can be read directly, and image recognition no longer has to be run separately on the video data of each data stream segmenting unit, which greatly improves the efficiency of video retrieval. The recognition result can be information such as a vehicle model or number, or an event, for example that a collision has occurred.

Further, the ROI attribute information can also include necessary additional syntax elements so that the extended ROI attribute information remains syntactically compatible with the prior art. For example, stuffing bits can byte-align the ROI attribute information, and marker bits can prevent pseudo start codes from appearing in it.

Description of drawings

Fig. 1 is a flowchart of video data stream generation in the first embodiment of the invention;

Fig. 2 is a flowchart of video data stream processing in the third embodiment of the invention;

Fig. 3 is a flowchart of video data stream processing in the fourth embodiment of the invention.

Embodiments

To make the purpose, technical solutions and advantages of the present invention clearer, embodiments of the invention are described in further detail below with reference to the accompanying drawings.

The first embodiment of the present invention relates to a video data stream generating method, whose flow is shown in Fig. 1.

In step 110, ROI attribute information related to the video data of one unit is generated.

In the embodiments of the present invention, the ROI attribute information comprises the parameters of the ROI attributes and any necessary additional syntax elements. The necessary additional syntax elements include stuffing bits, marker bits and the like. Stuffing bits ensure that the ROI attribute information is byte-aligned, which makes it easier for the application layer to process; marker bits prevent pseudo start codes from appearing. The purpose of these additional syntax elements is to make the data that records the parameters of all ROI attributes satisfy the requirements of the relevant standard or application. If that data already satisfies the requirements, no additional syntax elements are needed. For example, if the ROI attribute information is already byte-aligned, no stuffing bits are needed, and if no pseudo start code appears in the ROI attribute information, no marker bits are needed.
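The role of the two additional syntax elements can be illustrated with a minimal Python sketch. It is not taken from the patent or from any standard: the rule of inserting a marker bit after every 22 consecutive zero bits, the use of '0' as the stuffing value, and the function names are all assumptions chosen only to show the idea on a bit string.

```python
# Assumption: a start code prefix is 0x000001, i.e. 23 zero bits followed by a one.
START_CODE_ZERO_RUN = 23


def insert_marker_bits(bits: str, max_zero_run: int = START_CODE_ZERO_RUN - 1) -> str:
    """Insert a '1' marker bit after every run of max_zero_run zero bits.

    This is a hypothetical insertion rule; a real standard may instead
    place marker bits at fixed positions in the syntax.
    """
    out = []
    zeros = 0
    for b in bits:
        out.append(b)
        zeros = zeros + 1 if b == "0" else 0
        if zeros == max_zero_run:
            out.append("1")  # marker bit breaks the zero run
            zeros = 0
    return "".join(out)


def add_stuffing_bits(bits: str) -> str:
    """Append '0' stuffing bits only if the data is not already byte-aligned."""
    remainder = len(bits) % 8
    return bits if remainder == 0 else bits + "0" * (8 - remainder)


def pack_roi_bits(bits: str) -> bytes:
    """Apply marker bits, then stuffing bits, and return whole bytes."""
    aligned = add_stuffing_bits(insert_marker_bits(bits))
    if not aligned:
        return b""
    return int(aligned, 2).to_bytes(len(aligned) // 8, "big")
```

When the parameter bits are already byte-aligned and contain no long zero run, both helpers return their input unchanged, which matches the statement that the additional syntax elements are only needed when the requirements are not already met.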

The ROI attribute information can contain one or more ROIs, and each ROI can in turn contain one or more parameters.

In the embodiments of the present invention, the video data of one unit refers to the video unit encapsulated in one data stream segmenting unit. The data stream segmenting unit is a logical unit that can be defined differently in different contexts; it can be, for example, a frame, a field or a layer. Frame, field and layer are all prior-art terms whose meaning can be found in the relevant published literature, so they are not explained in detail here.

The ROI attribute information comprises intrinsic attributes and custom attributes of the ROI. In video coding and decoding, an ROI is a part of the video image whose content is more important than the rest of the image. Because the ROI is the region the user is interested in, the user needs to be able to assign different attributes to it. The attributes used to mark an ROI therefore fall into two groups: first, the intrinsic attributes of the ROI; second, the custom attributes of the ROI, that is, attributes tailored to the application. The intrinsic attributes describe the existence of the ROI and include the number of ROIs in the image and the shape, size, position and basic unit (for example macroblock or pixel) of each ROI. The custom attributes are hard to standardize, because different application environments may need different attributes: traffic monitoring may need to know the events occurring in the monitored region, the vehicle model, the vehicle color and so on; counter monitoring may need to know whether anyone is in the monitored region, how many people there are, their sex and so on.
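One possible in-memory representation of the two attribute groups is sketched below in Python. The class names, field names and the rectangle-only geometry are illustrative assumptions; the patent does not prescribe any particular data structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Roi:
    # Intrinsic attributes: describe where the ROI is and how it is measured.
    x: int
    y: int
    width: int
    height: int
    shape: str = "rectangle"
    unit: str = "macroblock"  # basic unit: "macroblock" or "pixel"
    # Custom attributes: application-specific, e.g. event or vehicle color.
    custom: Dict[str, str] = field(default_factory=dict)


@dataclass
class RoiAttributeInfo:
    rois: List[Roi] = field(default_factory=list)

    @property
    def roi_count(self) -> int:
        """Number of ROIs in the image (itself an intrinsic attribute)."""
        return len(self.rois)


# Hypothetical traffic-monitoring example with one ROI and two custom attributes.
traffic = RoiAttributeInfo(
    rois=[Roi(x=12, y=8, width=4, height=3,
              custom={"event": "collision", "vehicle_color": "red"})]
)
```

Keeping the custom attributes in an open-ended mapping reflects the point that they cannot be fixed in advance, while the intrinsic attributes have a stable shape.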

In the embodiments of the present invention, the ROI attribute information in a data stream segmenting unit can contain both intrinsic and custom attributes, only custom attributes, or only intrinsic attributes.

Step 120 follows, in which the total length of the ROI attribute information is calculated. The total length is normally counted in bytes, but in some applications it can also be counted in other units such as bits or words.

Step 130 follows, in which the total length, the ROI attribute information and the video data are written into the data stream segmenting unit.

A typical approach is to first write a start code marking the beginning of the ROI data and then write the total length followed by the ROI attribute information. The benefit is that the total length and the ROI attribute information are stored contiguously in the data stream segmenting unit, which makes them more convenient to read.
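The contiguous layout just described (start code, then total length, then ROI attribute information) can be sketched as follows. The start code value, the 2-byte big-endian length field and the placement of the video data after the ROI data are assumptions made for illustration only; the text leaves all three open.

```python
import struct

# Hypothetical 4-byte start code marking the beginning of the ROI data.
ROI_START_CODE = b"\x00\x00\x01\xb5"


def write_segment(roi_info: bytes, video_data: bytes) -> bytes:
    """Build one data stream segmenting unit (steps 120 and 130).

    Layout assumed here: start code | total length (2 bytes) | ROI info | video.
    """
    total_length = len(roi_info)  # step 120: total length counted in bytes
    if total_length > 0xFFFF:
        raise ValueError("ROI attribute information too long for a 2-byte length field")
    return ROI_START_CODE + struct.pack(">H", total_length) + roi_info + video_data


segment = write_segment(b"\x02\x10", b"VIDEO")
```

Because the length field travels with the data, a writer can add or drop attributes per application without changing how a reader locates the end of the ROI attribute information.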

The total length and the ROI attribute information can of course also be stored non-contiguously; for example, the total length can be written in the frame header and the ROI attribute information at the end of the frame.

As long as the total length, the ROI attribute information and the video data are all written into the data stream segmenting unit, their relative order is not critical.

Besides the total length, the ROI attribute information and the video data, other content can also be written into the data stream segmenting unit; this content varies with the protocol and the application scenario.

The second embodiment of the present invention is an improvement on the first embodiment. The main improvement is that an image recognition operation is added to step 110, the step of generating the ROI attribute information. The following description takes a frame as the data stream segmenting unit.

Specifically, when the ROI attribute information is generated, image recognition can first be performed on the video data of a frame. The recognition result is then combined with ROI attribute information obtained by other means to form the ROI attribute information to be written into the frame. The ROI attribute information obtained by other means can be the intrinsic attributes of the ROI, custom ROI attributes defined manually by the user, and so on. In some specific applications, the ROI attribute information can also contain only the recognition result, without any ROI attribute information obtained by other means.

Taking traffic monitoring as an example, a frame of original video data is obtained from the surveillance camera and image recognition is performed on it to determine whether a collision has occurred. If so, an event flag indicating a collision is added to the ROI attribute information of the frame. Afterwards, the video pictures associated with a collision can be found quickly just by searching the ROI attribute information, without analyzing the video stream frame by frame at search time.

Taking counter monitoring as an example, a frame of original video data is obtained from the surveillance camera and image recognition is performed on it to determine whether anyone is in the monitored region. If someone is present, a flag indicating this is written into the ROI attribute information of the frame, and the face region is also treated as an ROI and coded at relatively high quality. During later retrieval, the system can quickly skip video segments with nobody in them, and when it reaches pictures in which someone enters the monitored region, the person can be identified relatively easily.
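The combination of a recognition result with attributes obtained by other means can be sketched as below. The recognizer is a stand-in that only looks for a byte pattern; a real system would run an actual image recognition algorithm, and all names here are hypothetical.

```python
from typing import Callable, Dict, Optional

Frame = bytes
Attrs = Dict[str, object]


def generate_roi_attributes(frame: Frame,
                            recognize: Callable[[Frame], Attrs],
                            other_attrs: Optional[Attrs] = None) -> Attrs:
    """Step 110 with recognition: merge recognition results into the ROI attributes."""
    attrs: Attrs = dict(other_attrs or {})  # attributes obtained by other means
    attrs.update(recognize(frame))          # recognition result becomes part of them
    return attrs


def fake_counter_recognizer(frame: Frame) -> Attrs:
    """Stand-in recognizer: pretends a person is present if a marker string occurs."""
    return {"person_present": b"person" in frame}


attrs = generate_roi_attributes(b"..person..", fake_counter_recognizer, {"roi_count": 1})
```

Passing no other attributes yields ROI attribute information that contains only the recognition result, which corresponds to the last case described for the second embodiment.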

The third embodiment of the present invention relates to a video data stream processing method, whose flowchart is shown in Fig. 2.

In step 210, a data stream segmenting unit is obtained from the data stream and placed in a buffer. The data stream segmenting unit can be a frame, a field, a layer or the like.

Step 220 follows, in which the total length of the ROI attribute information is read from the data stream segmenting unit.

Step 230 follows, in which the ROI attribute information is read from the data stream segmenting unit according to the total length.

In a typical example, the total length and the ROI attribute information are stored one after the other in the data stream segmenting unit, and a specific start code marks where they begin. In step 220, once that start code has been read, the total length L can be read next (the total length is a field whose own size in bytes is fixed). The L bytes that follow the total length field are then read; these are the ROI attribute information of the data stream segmenting unit.
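The typical read path can be sketched as follows, assuming the same hypothetical layout as a writer that emits a 4-byte start code followed by a 2-byte big-endian length field. Both the start code value and the field width are assumptions; the text only requires that the length field have a fixed size.

```python
import struct
from typing import Optional

# Hypothetical 4-byte start code marking the beginning of the ROI data.
ROI_START_CODE = b"\x00\x00\x01\xb5"
LENGTH_FIELD_SIZE = 2  # assumed fixed size of the total length field, in bytes


def read_roi_info(segment: bytes) -> Optional[bytes]:
    """Return the ROI attribute information of a segment, or None if it has none."""
    pos = segment.find(ROI_START_CODE)
    if pos < 0:
        return None  # no ROI mark in this data stream segmenting unit
    length_pos = pos + len(ROI_START_CODE)
    if length_pos + LENGTH_FIELD_SIZE > len(segment):
        raise ValueError("truncated total length field")
    (total_length,) = struct.unpack_from(">H", segment, length_pos)  # step 220
    start = length_pos + LENGTH_FIELD_SIZE
    if start + total_length > len(segment):
        raise ValueError("truncated ROI attribute information")
    return segment[start:start + total_length]  # step 230: the next L bytes
```

The reader never needs to know which attributes are present; it only needs the length to isolate the ROI attribute information before any syntax processing.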

Step 240 follows, in which syntax processing is performed on the ROI attribute information that has been read.

Common syntax processing includes removing the marker bits that prevent pseudo start codes, parsing each ROI and its parameters out of the ROI attribute information with a predefined data structure, discarding the stuffing bits, and so on.
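A minimal sketch of such syntax processing is given below. It assumes a hypothetical scheme in which the writer inserted a '1' marker bit after every 22 consecutive zero bits and padded with '0' stuffing bits, and it takes the predefined data structure as a list of (name, bit width) pairs. None of these details come from the patent or from a specific standard.

```python
from typing import Dict, List, Tuple


def remove_marker_bits(bits: str, max_zero_run: int = 22) -> str:
    """Drop the bit that follows every run of max_zero_run zero bits (assumed rule)."""
    out = []
    zeros = 0
    skip_next = False
    for b in bits:
        if skip_next:  # this bit is a marker bit, not payload
            skip_next = False
            zeros = 0
            continue
        out.append(b)
        zeros = zeros + 1 if b == "0" else 0
        if zeros == max_zero_run:
            skip_next = True
    return "".join(out)


def parse_roi_info(data: bytes, layout: List[Tuple[str, int]]) -> Dict[str, int]:
    """Parse fields according to a predefined layout; leftover bits are stuffing."""
    bits = remove_marker_bits("".join(f"{byte:08b}" for byte in data))
    fields: Dict[str, int] = {}
    pos = 0
    for name, width in layout:
        if pos + width > len(bits):
            raise ValueError(f"not enough data for field {name!r}")
        fields[name] = int(bits[pos:pos + width], 2)
        pos += width
    return fields  # bits after pos are stuffing bits and are discarded


# Hypothetical layout: 19 payload bits, so 5 stuffing bits complete the third byte.
LAYOUT = [("roi_count", 4), ("x", 6), ("y", 6), ("event", 3)]
parsed = parse_roi_info(b"\x12\x94\xa0", LAYOUT)
```

The layout list plays the role of the predefined data structure, so a different application can supply a different list without changing the parsing routine.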

The fourth embodiment of the present invention also relates to a video data stream processing method, specifically a video retrieval method that applies the technical solution of the third embodiment. The basic flow of the fourth embodiment is shown in Fig. 3.

Steps 310 to 340 are identical to steps 210 to 240 of the third embodiment, respectively, and are not described again here.

After step 340 comes step 350, in which it is determined whether the ROI attribute information that has been read meets the search criterion. The search criterion is user-defined, for example whether a collision has occurred, whether someone has entered the monitored area, or whether more than two people are in the monitored area at the same time.

If the ROI attribute information meets the search criterion, the flow proceeds to step 360; otherwise it proceeds to step 370.

In step 360, because the ROI attribute information of the current data stream segmenting unit meets the search criterion, the identifier of this data stream segmenting unit is added to the search result set, and the flow then proceeds to step 370.

In step 370, it is determined whether the condition for ending the search is satisfied. If so, the flow ends; otherwise it returns to step 310 to process the next data stream segmenting unit. The end condition is also user-defined: the search can end, for example, once the first unit meeting the search criterion has been found, or once all data stream segmenting units in the data stream have been searched.
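The retrieval loop of steps 350 to 370 can be sketched as follows. The sketch assumes that steps 310 to 340 have already turned each data stream segmenting unit into an identifier plus a dictionary of parsed ROI attributes; that representation, the attribute names and the two end conditions are illustrative assumptions.

```python
from typing import Callable, Dict, Iterable, List, Tuple

RoiAttrs = Dict[str, object]


def search_stream(segments: Iterable[Tuple[int, RoiAttrs]],
                  matches: Callable[[RoiAttrs], bool],
                  stop_at_first: bool = False) -> List[int]:
    """Collect identifiers of segmenting units whose ROI attributes meet the criterion."""
    results: List[int] = []
    for segment_id, attrs in segments:   # steps 310 to 340 assumed done upstream
        if matches(attrs):               # step 350: check the search criterion
            results.append(segment_id)   # step 360: add to the search result set
            if stop_at_first:            # step 370: user-defined end condition
                break
    return results                       # the loop also ends when the stream is exhausted


stream = [
    (0, {"collision": False, "people": 0}),
    (1, {"collision": True, "people": 1}),
    (2, {"collision": True, "people": 3}),
]
collisions = search_stream(stream, lambda a: a.get("collision") is True)
crowded = search_stream(stream, lambda a: int(a.get("people", 0)) > 2)
```

No image recognition runs inside the loop: the criterion is evaluated only on attributes stored at generation time, which is where the retrieval speed-up comes from.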

The fifth embodiment of the present invention relates to a video data stream generating device, comprising:

an attribute generation unit, configured to generate ROI attribute information related to the video data of one unit;

a computing unit, configured to calculate the total length of the ROI attribute information generated by the attribute generation unit;

a writing unit, configured to write the total length obtained by the computing unit, the ROI attribute information generated by the attribute generation unit, and the video data into a data stream segmenting unit.

The device of this embodiment can be used to carry out the method flow of the first embodiment. All technical details mentioned in the first embodiment therefore remain valid in this embodiment and, to avoid repetition, are not described again here.

The sixth embodiment of the present invention also relates to a video data stream generating device. It improves on the fifth embodiment by adding an image recognition unit, configured to perform image recognition on the video data of one unit;

the attribute generation unit is further configured to generate the ROI attribute information with the recognition result of the image recognition unit as part of it.

The device of this embodiment can be used to carry out the method flow of the second embodiment. All technical details mentioned in the second embodiment therefore remain valid in this embodiment and, to avoid repetition, are not described again here.

The seventh embodiment of the present invention relates to a video data stream processing device, comprising:

a length reading unit, configured to read the total length of the ROI attribute information from a data stream segmenting unit;

an attribute reading unit, configured to read the ROI attribute information from the data stream segmenting unit according to the total length.

The device of this embodiment can be used to carry out the method flow of the third embodiment. All technical details mentioned in the third embodiment therefore remain valid in this embodiment and, to avoid repetition, are not described again here.

The eighth embodiment of the present invention relates to a video data stream processing device that adds a video search function to the seventh embodiment. Specifically, it adds a search unit, configured to determine whether the ROI attribute information read by the attribute reading unit meets the search criterion and, if so, to add the data stream segmenting unit corresponding to that ROI attribute information to the search result set.

The device of this embodiment can be used to carry out the method flow of the fourth embodiment. All technical details mentioned in the fourth embodiment therefore remain valid in this embodiment and, to avoid repetition, are not described again here.

It should be noted that the units mentioned in the device embodiments of the present invention (the fifth to eighth embodiments) are all logical units. Physically, a logical unit can be one physical unit, part of a physical unit, or a combination of several physical units. The physical implementation of these logical units is not what matters most; the combination of functions they implement is the key to solving the technical problem addressed by the invention. For example, the writing unit of the fifth embodiment can write other information in addition to the total length, the ROI attribute information and the video data; as another example, the image recognition unit and the computing unit of the sixth embodiment can be physically implemented by the same digital signal processor (Digital Signal Processor, "DSP"), and so on.

In addition, to highlight the innovative part of the invention, the above device embodiments (the fifth to eighth embodiments) do not introduce units that are less closely related to solving the technical problem addressed by the invention. This does not mean that the devices contain no other units. For example, the video data stream processing device of the seventh embodiment can also have a buffer unit for buffering the data being processed, and the video data stream processing device of the eighth embodiment can also have a display screen for showing search results to the user.

The ninth embodiment of the present invention relates to an attribute-extensible ROI marking method: in a data stream segmenting unit that carries an ROI mark, a syntax element is present that indicates the total length, within that data stream segmenting unit, of the parameters of all ROI attributes and of any necessary additional syntax elements.

In a video surveillance system, ROIs can make monitoring more effective, but the relationship between ROIs and video coding and decoding is less direct. Take the current AVS codec framework as an example. For the encoder, the ROI information tells it how many ROIs the picture frame has, where each ROI is, what each ROI contains, how important each ROI is, and so on. Guided by this information, the encoder chooses a suitable coding strategy, such as quantizing the ROI with a small QP (quantization with a small QP is high-quality quantization; different protocols express this differently, for example as a small quantization step). For the decoder, the ROI is just a string of syntax elements in the bitstream. The decoder does not need to know how many ROIs there are, where they are, what events they contain, or any other ROI attribute; it only needs to decode according to the strategy "conveyed" by the encoder.

The ROI mark is placed in the video layer and appears frequently in the video bitstream because its presence enables richer applications in the upper layers. Even when only the video stream is available (for example a stored sequence of surveillance video), an application can use the ROI syntax elements to process the surveillance bitstream efficiently, for example for retrieval, event detection and importance assessment, or even for subsequent processing such as enhancing the ROI.

ROI marking, that is, the specification of the ROI attribute parameters, is the foundation of ROI applications. Different application environments have different requirements for how ROIs are represented: some need only a simple ROI mark, such as the number and position of the ROIs; others need a more detailed ROI mark, such as what event is occurring in the ROI. ROI marking therefore needs an extensible method.

In video, ROIs are generally associated with picture frames. If a picture frame contains an ROI, the data stream of that frame needs to carry an ROI mark including attribute parameters such as the number, position and size of the ROIs. The ROI mark can be placed in the picture header, in the extension data, or elsewhere in the frame-level unit. Taking placement in the extension data as an example, the ROI mark can be defined as follows:

roi_parameters_extension() {
    extension_id
    roi_para_len
    // list of ROI attribute parameters
    // marker_bit, where needed
    stuffing_bits
    next_start_code()
}

Here extension_id is the label of the extension data and is not part of the ROI attribute parameters. roi_para_len is the syntax element giving the length of all ROI attribute parameters of the frame. The ROI attribute parameters are recorded after it; the attributes include intrinsic attributes such as the number, position and size of the ROIs as well as custom attributes such as the ROI event type. marker_bit (marker bit) is inserted wherever a pseudo start code might otherwise occur. stuffing_bits are stuffing bits whose purpose is to ensure that the roi_parameters_extension data is byte-aligned. marker_bit and stuffing_bits are the necessary additional syntax elements: marker_bit prevents pseudo start codes, and stuffing_bits make it easier for the application layer to process the ROI attribute parameters. In this case, the value of roi_para_len is the total length of the ROI attribute parameter data, the marker_bit elements and the stuffing_bits.
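How roi_para_len relates to the data it covers can be illustrated with a short Python sketch. The 4-bit extension_id value, the 12-bit width of roi_para_len and the '0' stuffing value are assumptions introduced for the example; marker bits are omitted for brevity, so the sketch assumes the parameter bits contain no long run of zeros. The start code before the extension and next_start_code() are also left out.

```python
EXTENSION_ID = 0b1100        # hypothetical 4-bit label of the ROI extension
ROI_PARA_LEN_BITS = 12       # hypothetical width of the roi_para_len field


def build_roi_extension(param_bits: str) -> bytes:
    """Serialize extension_id, roi_para_len, ROI parameters and stuffing bits."""
    remainder = len(param_bits) % 8
    stuffing_bits = "" if remainder == 0 else "0" * (8 - remainder)
    payload = param_bits + stuffing_bits
    # roi_para_len counts the parameter data plus the additional syntax
    # elements (only stuffing bits in this sketch), here in bytes.
    roi_para_len = len(payload) // 8
    if roi_para_len >= 1 << ROI_PARA_LEN_BITS:
        raise ValueError("ROI parameters too long for the roi_para_len field")
    header = f"{EXTENSION_ID:04b}{roi_para_len:0{ROI_PARA_LEN_BITS}b}"
    bits = header + payload  # header is 16 bits, so the result stays byte-aligned
    return int(bits, 2).to_bytes(len(bits) // 8, "big")


extension = build_roi_extension("101")
```

Because the length covers the stuffing bits as well, a reader can skip the whole ROI payload using roi_para_len alone, even if it does not understand some of the custom attributes.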

The method embodiments of the present invention can be implemented in software, hardware, firmware and so on. Regardless of whether the invention is implemented in software, hardware or firmware, the instruction code can be stored in any type of computer-accessible memory (for example permanent or modifiable, volatile or non-volatile, solid-state or non-solid-state, fixed or removable media, and so on). Likewise, the memory can be, for example, programmable array logic (Programmable Array Logic, "PAL"), random access memory (Random Access Memory, "RAM"), programmable read-only memory (Programmable Read Only Memory, "PROM"), read-only memory (Read-Only Memory, "ROM"), electrically erasable programmable read-only memory (Electrically Erasable Programmable ROM, "EEPROM"), a magnetic disk, an optical disc, a digital versatile disc (Digital Versatile Disc, "DVD") and so on.

Although the present invention has been illustrated and described with reference to certain preferred embodiments, those of ordinary skill in the art should understand that various changes in form and detail can be made without departing from the spirit and scope of the invention.

Claims

1. A video data stream generating method, characterized by comprising the following steps:

generating region-of-interest attribute information related to the video data of one unit;

calculating the total length of said region-of-interest attribute information;

writing said total length, said region-of-interest attribute information and said video data into a data stream segmenting unit;

wherein the step of generating the region-of-interest attribute information related to the video data of one unit further comprises the following sub-step:

performing image recognition on the video data of said unit, and using the recognition result as part of said region-of-interest attribute information for retrieval.

2. The video data stream generating method according to claim 1, characterized in that the step of "performing image recognition on the video data of said unit, and using the recognition result as part of said region-of-interest attribute information for retrieval" comprises the following sub-step:

obtaining a frame of original video data from a surveillance camera and performing image recognition on the video data in the frame to determine whether anyone is in the monitored region; if someone is present, writing a flag indicating that someone is in the monitored region into the region-of-interest attribute information of the frame, and also treating the face region as a region of interest and coding it at relatively high quality.

3. The video data stream generating method according to claim 1, characterized in that said region-of-interest attribute information comprises the parameters of the region-of-interest attributes and necessary additional syntax elements;

said data stream segmenting unit can be a frame, a field or a layer;

said necessary additional syntax elements comprise stuffing bits and/or marker bits for preventing pseudo start codes.

4. The video data stream generating method according to claim 1, characterized in that said region-of-interest attribute information comprises customized region-of-interest attributes and their parameters.

5. A video data stream processing method, characterized by comprising the following steps:

reading the total length of region-of-interest attribute information from a data stream segmenting unit;

reading the region-of-interest attribute information from said data stream segmenting unit according to said total length, the region-of-interest attribute information comprising the recognition result of image recognition performed on the video data of the data stream segmenting unit;

if said region-of-interest attribute information meets a search criterion, adding the data stream segmenting unit corresponding to the region-of-interest attribute information to a search result set, wherein the search criterion comprises a condition on the recognition result of the image recognition performed on the video data of the data stream segmenting unit.

6. A video data stream generating device, characterized by comprising:

an attribute generation unit, configured to generate region-of-interest attribute information related to the video data of one unit;

a computing unit, configured to calculate the total length of the region-of-interest attribute information generated by said attribute generation unit;

a writing unit, configured to write the total length obtained by said computing unit, the region-of-interest attribute information generated by said attribute generation unit, and said video data into a data stream segmenting unit;

and further comprising an image recognition unit, configured to perform image recognition on the video data of said unit; said attribute generation unit being further configured to generate said region-of-interest attribute information with the recognition result of said image recognition unit as part of it.

7. A video data stream processing device, characterized by comprising:

a length reading unit, configured to read the total length of region-of-interest attribute information from a data stream segmenting unit;

an attribute reading unit, configured to read the region-of-interest attribute information from said data stream segmenting unit according to said total length, the region-of-interest attribute information comprising the recognition result of image recognition performed on the video data of the data stream segmenting unit;

a search unit, configured to determine whether the region-of-interest attribute information read by said attribute reading unit meets a search criterion and, if so, to add the data stream segmenting unit corresponding to the region-of-interest attribute information to a search result set, wherein the search criterion comprises a condition on the recognition result of the image recognition performed on the video data of the data stream segmenting unit.