CN101167365A - Region-of-interest processing for video telephony - Google Patents

Publication number
CN101167365A
Authority
CN
China
Prior art keywords
roi
video
end video
information
far
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA2006800145199A
Other languages
Chinese (zh)
Inventor
李彦辑
哈立德·希勒米·厄勒-马列
蔡明章
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qualcomm Inc
Original Assignee
Qualcomm Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qualcomm Inc
Publication of CN101167365A
Legal status: Pending

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The disclosure is directed to techniques for region-of-interest (ROI) processing for video telephony (VT) applications. According to the disclosed techniques, a recipient device defines ROI information for video information transmitted by a sender device, i.e., far-end video information. The recipient device transmits the ROI information to the sender device. Using the ROI information transmitted by the recipient device, the sender device applies preferential encoding to an ROI within a video scene. In this manner, the recipient device is able to remotely control ROI encoding of far-end video information by the sender device.

Description

Region-of-interest processing for video telephony
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of U.S. Provisional Application No. 60/660,200, filed March 9, 2005, and of co-pending U.S. Patent Application No. 11/182,432, filed July 15, 2005, entitled REGION-OF-INTEREST EXTRACTION FOR VIDEOTELEPHONY.
Technical field
The invention relates to digital video encoding and decoding and, more particularly, to techniques for processing region-of-interest (ROI) information for video telephony (VT) applications.
Background
A number of different video encoding standards have been established for encoding digital video sequences. The Moving Picture Experts Group (MPEG), for example, has developed a number of standards, including MPEG-1, MPEG-2 and MPEG-4. Other examples include the International Telecommunication Union (ITU) H.263 standard and the emerging ITU H.264 standard. These video encoding standards generally support improved transmission efficiency by encoding data in a compressed manner.
Video telephony (VT) permits users to share video and audio information to support applications such as videoconferencing. Exemplary video telephony standards include those defined by the Session Initiation Protocol (SIP), the ITU H.323 standard and the ITU H.324 standard. In a VT system, users may send and receive video information, only receive video information, or only send video information. A recipient generally views the received video information in the form in which it is transmitted by the sender.
Preferential encoding of a selected portion of the video information has been proposed. For example, a sender may specify a region of interest (ROI) to be encoded with higher quality for transmission to a recipient. The sender may wish to emphasize the ROI to a remote recipient. A typical example of an ROI is a human face, although a sender may wish to focus attention on other objects within a video scene. With preferential encoding of the ROI, a recipient is able to view the ROI more clearly than non-ROI regions.
Summary of the invention
The invention is directed to techniques for region-of-interest (ROI) processing for video telephony (VT). According to the disclosed techniques, a local recipient device defines ROI information for video that is encoded and transmitted by a remote sender device, i.e., far-end video. The local recipient device transmits the ROI information to the remote sender device. Using the ROI information transmitted by the recipient device, the sender device applies preferential encoding, such as higher-quality encoding or error protection, to an ROI within a video scene. In this manner, the recipient device is able to remotely control ROI encoding of the far-end video encoded by the sender device.
In addition to receiving far-end video, a recipient may be equipped to send video, i.e., near-end video. Hence, devices participating in VT communication may act symmetrically as both a sender and a recipient of video information. When acting as a recipient, each device may define far-end ROI information for video encoded by a remote device acting as sender. Likewise, when acting as a sender, each device may define near-end ROI information for video information to be transmitted to another device acting as recipient. A sender or recipient device may be referred to as "ROI-aware" in the sense that it is able to process ROI information provided by another device to support remote control of ROI video encoding.
Far-end ROI information permits a recipient to control remote ROI encoding by a sender device so as to view objects or regions within a received video scene more clearly. Near-end ROI information permits a sender to control local ROI encoding so as to emphasize objects or regions within a transmitted video scene. Hence, preferential ROI encoding by a sender may be based on ROI information generated by the recipient or by the sender. In addition, a recipient device may preferentially decode an ROI based on the ROI information, e.g., by applying higher-quality post-processing such as error concealment, deblocking or deringing techniques.
To facilitate ROI processing, the invention further contemplates techniques for ROI selection, ROI mapping, ROI extraction, ROI signaling, ROI tracking, and access authentication of recipient devices to permit remote control of ROI encoding by a sender device. ROI selection may rely on predefined ROI patterns, verbal or textual ROI descriptions, or ROIs drawn by a user. ROI mapping involves translating a selected ROI pattern into an ROI map, which may take the form of a macroblock (MB) map suitable for use by a video encoder.
ROI signaling may involve in-band or out-of-band signaling of ROI information from the recipient to the sender device. ROI tracking involves dynamically adjusting the ROI map in response to ROI motion. Access authentication may involve granting access rights and access levels to recipient devices for purposes of remote ROI control, and resolving ROI control conflicts between local and remote users or among multiple remote users.
ROI extraction involves processing a user description of a region of interest (ROI) to generate information specifying the ROI based on the description. Near-end video is encoded according to the specified ROI to enhance the image quality of the ROI relative to non-ROI regions of the near-end video. The user description may be textual, graphical or speech-based. An extraction module applies appropriate processing to generate the ROI information from the user description. The extraction module may reside locally on a video communication device, or on a distinct intermediate server configured for ROI extraction.
In one embodiment, the invention provides a method comprising: receiving, from a remote device, information specifying a region of interest (ROI) within near-end video encoded by a local device and received by the remote device; and encoding the near-end video according to the ROI to enhance the image quality of the ROI relative to non-ROI regions of the video.
In another embodiment, the invention provides a video encoding device comprising: a region-of-interest (ROI) engine that receives, from a remote video communication device, information specifying an ROI within near-end video to be transmitted to the remote device; and a video encoder that encodes the near-end video to enhance the image quality of the ROI relative to non-ROI regions of the video.
In an additional embodiment, the invention provides a method comprising: generating information specifying a region of interest (ROI) within far-end video transmitted by a remote device and received by a local device; and transmitting the information to the remote device for use in encoding the far-end video according to the ROI to enhance the image quality of the ROI relative to non-ROI regions of the video.
In another embodiment, the invention provides a video encoding device comprising: a region-of-interest (ROI) engine that generates information specifying an ROI within far-end video received from a remote device; and a video encoder that encodes near-end video and transmits the information specifying the ROI together with the encoded near-end video, for use by the remote device in encoding the far-end video according to the ROI to enhance the image quality of the ROI relative to non-ROI regions of the far-end video.
In another embodiment, the invention provides a method comprising: receiving, from a user, a description of a region of interest (ROI) within near-end video generated by a local device; generating information specifying the ROI based on the description; and encoding the near-end video according to the information specifying the ROI to enhance the image quality of the ROI relative to non-ROI regions of the near-end video.
In an additional embodiment, the invention provides a video encoding device comprising: a region-of-interest (ROI) engine that receives a description of an ROI within near-end video encoded by the device and generates information specifying the ROI based on the description; and a video encoder that encodes the near-end video to enhance the image quality of the ROI relative to non-ROI regions of the video.
In another embodiment, the invention provides a video encoding system comprising: a first video communication device that encodes near-end video; a second video communication device that receives the near-end video from the first video communication device, wherein the second video communication device generates a user description of a region of interest (ROI) within the near-end video generated by the first video communication device; and an intermediate server, structurally distinct from the first and second video communication devices, that generates information specifying the ROI based on the description, wherein the first video communication device encodes the near-end video according to the information specifying the ROI to enhance the image quality of the ROI relative to non-ROI regions of the near-end video.
The techniques described herein may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the techniques may be realized in part by a computer-readable medium comprising program code containing instructions that, when executed, perform one or more of the methods described herein.
The details of one or more embodiments are set forth in the accompanying drawings and the description below. Other features, objects and advantages of the invention will be apparent from the description and drawings, and from the claims.
Brief description of the drawings
Fig. 1 is a block diagram illustrating a video encoding and decoding system incorporating ROI-aware video encoder-decoders (CODECs).
Fig. 2 is a diagram illustrating definition of an ROI within a video scene presented on a display associated with a wireless communication device.
Fig. 3 is a block diagram illustrating a communication device incorporating an ROI-aware CODEC.
Fig. 4 is a block diagram illustrating another communication device incorporating an ROI-aware CODEC and further incorporating an ROI extraction module.
Fig. 5 is a block diagram illustrating distribution of ROI extraction via an intermediate extraction server.
Fig. 6 is a block diagram illustrating distribution of ROI extraction for multiple video telephony sessions.
Figs. 7A-7D are diagrams illustrating predefined ROI patterns for selection by a user.
Fig. 8 is a flow diagram illustrating generation of ROI information at a recipient device to control preferential ROI encoding of near-end video at a remote sender device.
Fig. 9 is a flow diagram illustrating processing of ROI information from a recipient device for preferential ROI encoding of near-end video at a sender device in combination with ROI tracking.
Fig. 10 is a flow diagram illustrating processing of ROI information from a recipient device for preferential ROI encoding of near-end video at a sender device in combination with user authentication.
Fig. 11 is a flow diagram illustrating selection of predefined ROI patterns.
Fig. 12 is a diagram illustrating definition of an ROI pattern within a displayed video scene by expanding and contracting an ROI template.
Fig. 13 is a diagram illustrating definition of an ROI pattern within a displayed video scene by dragging an ROI template.
Fig. 14 is a diagram illustrating definition of an ROI pattern within a displayed video scene by drawing an ROI region on a touchscreen with a stylus.
Fig. 15 is a diagram illustrating definition of an ROI pattern within a displayed video scene using a drop-down menu and a defined ROI object that is dynamically extracted and tracked.
Fig. 16 is a diagram illustrating definition of an ROI pattern within a displayed video scene using a drop-down menu and a defined ROI object that is mapped to a predefined ROI pattern as in Figs. 7A-7D.
Fig. 17 is a flow diagram illustrating definition of an ROI pattern within a displayed video scene using an ROI description interface.
Fig. 18 is a flow diagram illustrating resolution of ROI conflicts between sender and recipient devices.
Fig. 19 is a flow diagram illustrating preferential encoding of ROI macroblocks within near-end video.
Detailed description
Fig. 1 is a block diagram illustrating a video encoding and decoding system 10 incorporating ROI-aware video encoder-decoders (CODECs). As shown in Fig. 1, system 10 includes a first video communication device 12 and a second video communication device 14. Communication devices 12, 14 are connected by a transmission channel 16. Transmission channel 16 may be a wired or wireless medium. System 10 supports two-way video transmission between video communication devices 12, 14 for video telephony. Devices 12, 14 may operate in a substantially symmetrical manner. In some embodiments, however, one or both of video communication devices 12, 14 may be configured for one-way communication only, to support ROI-aware video streaming.
For two-way applications, reciprocal encoding, decoding, multiplexing (MUX) and demultiplexing (DEMUX) components may be provided on opposite ends of channel 16. In the example of Fig. 1, video communication device 12 includes MUX/DEMUX component 18, ROI-aware video CODEC 20 and audio CODEC 22. Similarly, video communication device 14 includes MUX/DEMUX component 26, ROI-aware video CODEC 28 and audio CODEC 30. Each CODEC 20, 28 is "ROI-aware" in the sense that it is able to process ROI information provided remotely by the other video communication device 12, 14 or provided locally by its own video communication device.
Video communication devices 12, 14 may be implemented as wireless mobile terminals or wired terminals equipped for video streaming, video telephony, or both. To that end, video communication devices 12, 14 may further include appropriate transmit, receive, modulation and processing electronics to support wireless communication. Examples of wireless mobile terminals include mobile radio telephones, mobile personal digital assistants (PDAs), mobile computers, and other mobile devices equipped with wireless communication capabilities and video encoding and/or decoding capabilities. Examples of wired terminals include laptop computers, video telephones, network appliances, set-top boxes, interactive televisions and the like. Either of video communication devices 12, 14 may be configured to send video information, receive video information, or both send and receive video information.
For video telephony applications, it is generally desirable for device 12 to support both video send and video receive capabilities. However, streaming video applications are also contemplated. In video telephony, and particularly in mobile video telephony over wireless communication, bandwidth is a significant concern. Accordingly, selective allocation of additional encoding bits to an ROI, or other preferential encoding steps, can improve the image quality of a portion of the video while maintaining overall encoding efficiency. For preferential encoding, additional bits may be allocated to the ROI, while a reduced number of bits may be allocated to non-ROI regions, such as the background in a video scene.
In general, system 10 employs techniques for region-of-interest (ROI) processing for video telephony (VT) applications. As noted above, however, these techniques are also applicable to video streaming applications. For purposes of illustration, it is assumed that each video communication device 12, 14 is capable of operating as both a sender and a recipient of video information, and thereby operating as a full participant in a VT session. For video information transmitted from video communication device 12 to video communication device 14, video communication device 12 is the sender device and video communication device 14 is the recipient device. Conversely, for video information transmitted from video communication device 14 to video communication device 12, video communication device 12 is the recipient device and video communication device 14 is the sender device. When discussing video information to be encoded and transmitted by a local video communication device 12, 14, the video information may be referred to as "near-end" video. When discussing video information to be encoded by and received from a remote video communication device 12, 14, the video information may be referred to as "far-end" video.
According to the disclosed techniques, when operating as a recipient device, video communication device 12 or 14 defines ROI information for far-end video information received from a sender device. Again, video information received from a sender device is referred to as "far-end" video information in the sense that it is received from another (sender) device located at the far end. Likewise, ROI information defined for video information received from a sender device is referred to as "far-end" ROI information. A far-end ROI generally refers to the region within the far-end video that is of greatest interest to the recipient of that video. The recipient device decodes the far-end video information and presents the decoded far-end video to a user via a display device. The user selects an ROI within the video scene presented by the far-end video.
The recipient device generates far-end ROI information based on the ROI selected by the user, and sends the far-end ROI information to the sender device. The far-end ROI information may take the form of an ROI macroblock (MB) map that defines the ROI in terms of the macroblocks residing within the ROI. The ROI MB map may flag MBs inside the ROI with a 1 and MBs outside the ROI with a 0, to readily identify MBs that are included in (1) or excluded from (0) the ROI. An MB is a video block that forms part of a frame. The size of an MB may be 16x16 pixels, although other MB sizes are possible. Accordingly, an MB may refer to any video block, including but not limited to a macroblock as defined in a particular video coding standard, such as MPEG-1, MPEG-2, MPEG-4, ITU H.263 or ITU H.264, or any other standard.
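The ROI MB map described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of the disclosure: the function name roi_mb_map, the rectangle format (x, y, width, height) and the choice to flag every macroblock that overlaps the rectangle are assumptions made for the example.

```python
MB_SIZE = 16  # macroblock edge in pixels; 16x16 is the size mentioned in the text


def roi_mb_map(frame_w, frame_h, roi, mb_size=MB_SIZE):
    """Return a 2-D list holding 1 for macroblocks in the ROI and 0 otherwise.

    roi is a hypothetical (x, y, width, height) rectangle in pixels.
    Any macroblock that overlaps the rectangle is flagged (an assumption).
    """
    x, y, w, h = roi
    cols = (frame_w + mb_size - 1) // mb_size  # round up to cover the frame
    rows = (frame_h + mb_size - 1) // mb_size
    mb_map = [[0] * cols for _ in range(rows)]

    # Clamp the rectangle to the frame before converting to macroblock units.
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w, frame_w), min(y + h, frame_h)
    if x1 <= x0 or y1 <= y0:
        return mb_map  # ROI lies entirely outside the frame

    first_col, last_col = x0 // mb_size, (x1 - 1) // mb_size
    first_row, last_row = y0 // mb_size, (y1 - 1) // mb_size
    for r in range(first_row, last_row + 1):
        for c in range(first_col, last_col + 1):
            mb_map[r][c] = 1
    return mb_map


# Example: a QCIF frame (176x144 pixels) yields an 11x9 macroblock grid.
qcif_map = roi_mb_map(176, 144, (40, 30, 50, 40))
```

A map in this form is compact enough to be signaled to the sender device and can be consumed directly by an encoder that works macroblock by macroblock.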
Using the far-end ROI information transmitted by the recipient device, the sender device applies preferential encoding to the corresponding ROI within the video scene. In particular, additional encoding bits may be allocated to the ROI, while a reduced number of encoding bits is allocated to non-ROI regions, thereby improving the image quality of the ROI. In this manner, the recipient device is able to remotely control ROI encoding of far-end video information by the sender device. Preferential encoding applies higher-quality encoding to the ROI region of the video scene than to non-ROI regions, e.g., by preferential bit allocation or preferential quantization in the ROI region. The preferentially encoded ROI permits the user of the recipient device to view an object or region more clearly. For example, the user of the recipient device may wish to view a face or some other object more clearly than the background regions of the video scene.
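One simple way to picture preferential bit allocation is a weighted split of a fixed per-frame bit budget. The sketch below is a hypothetical illustration only: the function name allocate_bits and the roi_weight parameter are assumptions, and the text does not prescribe this particular weighting scheme.

```python
def allocate_bits(mb_map, frame_budget, roi_weight=4.0):
    """Split frame_budget across macroblocks, favouring ROI macroblocks.

    mb_map is a 2-D list of 0/1 flags (1 = macroblock inside the ROI).
    roi_weight is a hypothetical tuning knob: each ROI macroblock receives
    roi_weight times the share of a non-ROI macroblock.
    """
    cols = len(mb_map[0])
    weights = [roi_weight if flag else 1.0 for row in mb_map for flag in row]
    total = sum(weights)
    shares = [frame_budget * w / total for w in weights]
    # Restore the 2-D layout so the result lines up with mb_map.
    return [shares[i:i + cols] for i in range(0, len(shares), cols)]


# One ROI macroblock out of four: it gets four times the bits of each other block.
bits = allocate_bits([[0, 1], [0, 0]], frame_budget=700)
```

Because the shares always sum to the frame budget, the ROI gains bits only at the expense of non-ROI regions, which matches the goal of improving ROI quality while maintaining overall encoding efficiency.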
When operating as a sender device, video communication device 12, 14 may also define ROI information for the video information transmitted by the sender device. Again, video information generated in the sender device is referred to as "near-end" video in the sense that it is generated at the near end of the communication channel. ROI information generated by the sender device is referred to as "near-end" ROI information. A near-end ROI generally refers to a region of the near-end video that the sender wants to emphasize to the recipient. Hence, an ROI may be defined by a recipient device user as far-end ROI information, or by a sender device user as near-end ROI information. The sender device presents the near-end video to a user via a display device. The user associated with the sender device selects an ROI within the video scene presented by the near-end video. The sender device encodes the near-end video using the user-selected ROI such that the ROI in the near-end video is preferentially encoded, e.g., with higher-quality encoding relative to non-ROI regions.
Selection of a near-end ROI by the local user at the sender device allows the user of the sender device to emphasize regions or objects within the video scene, and thereby direct such regions or objects to the attention of the recipient device user. Notably, the near-end ROI selected by the sender device user need not be transmitted to the recipient device. Instead, the sender device uses the selected near-end ROI information to locally encode the near-end video before it is transmitted to the recipient device. In some embodiments, however, the sender device may send the ROI information to the recipient device to permit application of preferential decoding techniques, such as higher-quality error correction (e.g., error concealment) or post-processing (e.g., deblocking and deringing filters).
If ROI information is provided by both the sender device and the recipient device, the sender device applies either the far-end ROI information received from the recipient device or the locally generated near-end ROI information to encode the near-end video. ROI conflicts may arise between the near-end and far-end ROI selections provided by the sender device and the recipient device. Such conflicts may require resolution, such as active resolution by the local user or resolution according to specified access rights and levels, as described elsewhere in this disclosure. In either case, the sender device preferentially encodes the ROI based on ROI information provided locally by the sender device or remotely by the recipient device.
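The conflict handling outlined above (active resolution by the local user, or resolution by access rights and levels) might be sketched as follows. Everything here is hypothetical: the RoiRequest type, the numeric access_level field, and the tie-breaking rule in favour of the local near-end ROI are assumptions chosen for illustration, not rules stated in the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RoiRequest:
    source: str                # "near-end" (local sender) or "far-end" (remote recipient)
    mb_map: List[List[int]]    # ROI macroblock map, 1 = inside the ROI
    access_level: int = 0      # hypothetical rank; a higher level wins a conflict


def resolve_roi(near: Optional[RoiRequest],
                far: Optional[RoiRequest],
                local_override: Optional[str] = None) -> Optional[RoiRequest]:
    """Pick the ROI request the sender device should encode with."""
    # No conflict when at most one side supplied an ROI.
    if near is None or far is None:
        return near if near is not None else far
    # Active resolution: the local user explicitly picks a side.
    if local_override is not None:
        return near if local_override == "near-end" else far
    # Otherwise compare access levels; ties go to the local user (assumption).
    return far if far.access_level > near.access_level else near


near = RoiRequest("near-end", [[1, 0]], access_level=1)
far = RoiRequest("far-end", [[0, 1]], access_level=2)
chosen = resolve_roi(near, far)  # the far-end request has the higher access level
```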
To facilitate ROI processing, the invention further contemplates techniques for ROI selection, ROI mapping, ROI signaling, ROI tracking, and access authentication of recipient devices to permit remote control of ROI encoding by a sender device. As will be described, the different ROI selection techniques applied by a recipient device or sender device may involve selection of predefined ROI patterns, verbal or textual ROI descriptions, or ROIs drawn by a user. In the recipient device, ROI mapping involves translating a selected far-end or near-end ROI pattern into an ROI map, which may take the form of a macroblock (MB) map. ROI signaling may involve in-band or out-of-band signaling of far-end ROI information from the recipient device to the sender device. ROI tracking involves dynamically adjusting, in response to ROI motion, the far-end ROI map generated by the recipient device or the near-end ROI map generated locally by the sender itself. Access authentication may involve granting access rights and access levels to recipient devices for purposes of remote ROI control, and resolving ROI control conflicts between recipient and sender devices.
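As a rough illustration of ROI tracking, the sketch below moves an existing ROI MB map by a displacement expressed in whole macroblocks. How the displacement is estimated is outside the sketch; the function name shift_roi_map and the whole-macroblock granularity are assumptions made for the example.

```python
def shift_roi_map(mb_map, d_cols, d_rows):
    """Return a copy of mb_map with the ROI moved by (d_cols, d_rows) macroblocks.

    The displacement is assumed to come from some motion estimate of the ROI.
    ROI macroblocks that would leave the frame are simply dropped.
    """
    rows, cols = len(mb_map), len(mb_map[0])
    shifted = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mb_map[r][c]:
                nr, nc = r + d_rows, c + d_cols
                if 0 <= nr < rows and 0 <= nc < cols:
                    shifted[nr][nc] = 1
    return shifted


# The ROI moves one macroblock right and one macroblock down.
moved = shift_roi_map([[1, 0, 0], [0, 0, 0]], d_cols=1, d_rows=1)
```

Updating the map this way lets the preferentially encoded region follow a moving object without requiring the user to reselect the ROI for every frame.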
System 10 can according to session initiation protocol (SIP), ITU H.323 standard, ITU H.324 standard or other standard are supported visual telephone.Each video CODEC 20,28 according to video compression standard (for example MPEG-2, MPEG-4, ITU H.263 or ITU H.264) produce encoded video data.Show further that as so-called among Fig. 1 video CODEC 20,28 can integrate with corresponding audio frequency CODEC 22,30, and comprises the Voice ﹠ Video part of suitable MUX/ DEMUX assembly 18,26 with data streams.MUX- DEMUX unit 18,26 can be observed ITUH.223 multiplexer agreement or be waited other agreement such as User Datagram Protoco (UDP) (UDP).
Fig. 2 be illustrated in ROI 32 in the video scenery that presents on the display 36 that is associated with radio communication device 38 34 define graphic.In the example of Fig. 2, ROI 32 is rectangular areas of containing the face 39 of being presented in the people in the video scenery 34, though ROI can contain any needs are improved or strengthen image encoded or object.In VT uses, be presented in the user that people in the video scenery 34 is generally long-range sending method, device, described user is and running is a side of the user's of the radio communication device 38 of recipient's device video conference.ROI 32 constitutes far-end ROI, and this is because it defines the ROI from the video scenery of long-range sending method, device transmission.According to the present invention, far-end ROI 32 is transferred to sending method, device and with regulation respectively distinguishing of the video scenery in the described ROI is carried out priority encoding.By this way, the picture quality that the local user of recipient's device 38 can Long-distance Control far-end ROI 32.As describing, the size of far-end ROI 32, shape and position can be fixed or scalable, and can define in every way, describe or regulate.
ROI 32 allowance recipient device users are more clearly watched the individual objects in the video scenery 34, for example, and people's face 39.Face 39 in the ROI 32 is to be encoded with respect to the higher picture quality in non-ROI district (for example, the background area of video scenery 34).By this way, the user can more clearly watch countenance, lip action, eye motion etc.Yet another is chosen as, and ROI 32 can be used for stipulating any object except that facial.By and large, the ROI during VT uses can be very subjective, and can change because of the user is different.Required ROI depends on also how VT uses.In some cases, different with video conference, VT can be used to watch and estimate object.
For example, the husband can use VT should be used for showing that he wants the gift of buying in the gift shop, airport.The husband may expect to obtain second option with timely and alternant way there from wife.By doing like this, the husband can will take the leaving the time of flight and make decision at once according to it.In this case, ROI covers the zone that the husband is just considering gift.By allowing wife (or husband) to select ROI, just may obtain the better coding of specific ROI or better service quality and permit wife thus and more clearly watch gift.
As another example, two or more engineers can relate to the VT that presents and discuss various equations or program and call out on blank.In this case, the long-distance user may expect to watch with the better pictures quality part of blank, for example, and to be more clearly visible equational details.For this reason, the long-distance user selects to surround described equational ROI.In addition, when the engineer added content to blank, the long-distance user can expect that mobile ROI is to follow the tracks of the subject matter that newly adds blank to.The long-distance user stipulates that the ability of ROI can improve the information exchange in the technical discussion significantly.
The ROI techniques described herein can improve not only the video quality of the ROI but also the video interaction between two users. In general, a conventional VT application merely combines two one-way video transmissions, with any interaction carried out over audio. In a conventional VT application, there is ordinarily no interaction on the video side. Giving the user of the recipient device at least limited control over the video content received from the sender device during a VT call can permit increased video interaction.
In this manner, a VT application can be designed so that the user of the recipient device selects an ROI and the ROI information is sent back to the sender device, so that the ROI receives preferential treatment, e.g., higher-quality encoding (e.g., allocation of more coding bits) or stronger error protection (e.g., intra-MB refresh). In effect, by specifying a far-end ROI, the user of the recipient device remotely controls the encoder of the sender device. In addition, this far-end ROI information can be used by the ROI-aware video decoder in the device that receives the far-end video to perform better post-processing, e.g., error concealment, deblocking, or deringing. Remote control of the video encoder by a recipient of the encoded video differs from merely controlling the pan, tilt, zoom, or focus of a remote camera. Instead, with remote ROI processing, the user can influence the encoding quality applied to a particular region or regions. In some embodiments, however, remote camera control may be provided in conjunction with remote video encoder control.
Fig. 3 is a block diagram illustrating a video communication device 12 that incorporates an ROI-aware CODEC. Although Fig. 3 depicts video communication device 12 of Fig. 1, video communication device 14 may be constructed similarly. Again, video communication device 12 or 14 may act as a recipient device, a sender device, or preferably both. As shown in Fig. 3, video communication device 12 includes ROI-aware CODEC 20, capture device 40, and user interface 42. Although channel 16 is shown in Fig. 3, the MUX-DEMUX and audio components are omitted for ease of illustration. Capture device 40 may be a video camera that is integrated with, or operably coupled to, video communication device 12. In some embodiments, for example, capture device 40 may be integrated with a mobile telephone to form a so-called camera phone. In this manner, capture device 40 can support mobile VT applications.
User interface 42 may include a display device, such as a liquid crystal display (LCD), a plasma screen, a projector display, or any other display apparatus that can be integrated with, or operably coupled to, video communication device 12. The display device presents video images to the user of video communication device 12. The video images may include near-end video obtained locally by capture device 40 and far-end video transmitted remotely from a sender device. In addition, user interface 42 may include any of a variety of user input media, including hard keys, soft keys, various pointing devices, styli, and the like, by which the user of video communication device 12 enters information. In some embodiments, the display device and the user input media of user interface 42 may be integrated with a mobile telephone. The user of video communication device 12 relies on user interface 42 to view far-end video and, optionally, near-end video. In addition, the user relies on user interface 42 to enter information that defines or selects a far-end ROI and, optionally, a near-end ROI.
As further shown in Fig. 3, ROI-aware CODEC 20 includes ROI engine 44, ROI-aware video encoder 46, and ROI-aware video decoder 48. ROI-aware video encoder 46 encodes near-end video ("NEAR-END VIDEO") obtained from capture device 40 for transmission to a recipient device. Again, the term "near-end" refers to video generated locally within video communication device 12, in contrast to "far-end" video received from a remote video communication device (e.g., video communication device 14). In the example of Fig. 3, ROI-aware video encoder 46 uses near-end ROI information ("REMOTE NEAR-END ROI") obtained from the remote recipient to preferentially encode the near-end ROI. The remote recipient is the user associated with remote video communication device 14.
From the remote user's point of view, the remote near-end ROI is a far-end ROI when it is transmitted by remote device 14, whereas from the point of view of the local user of device 12, which receives it, it is referred to as a remote near-end ROI. That is, the point of view of device 12 or 14, as sender or recipient, determines whether video and ROI pertain to near-end or far-end video. Likewise, the user of local device 12 specifies a far-end ROI to remotely control the video encoding performed at remote device 14. When the user of remote device 14 receives that far-end ROI, however, it is treated as a near-end ROI pertaining to the near-end video encoded by device 14. In general, point of view is important for purposes of the labels used in this disclosure.
Optionally, ROI-aware video encoder 46 may use near-end ROI information ("LOCAL NEAR-END ROI") obtained from the local user of video communication device 12. The local near-end ROI may also be referred to as a sender-driven ROI, because it is generated by the sender of the encoded near-end video. Local near-end ROI information is used by local encoder 46 and ordinarily is not sent to the other video communication device 14, unless the video decoder in remote device 14 is designed to apply preferential decoding to the near-end ROI defined by the user of sender device 12. The remote near-end ROI may also be referred to as a receiver-driven ROI, because it is generated by the remote recipient of the encoded near-end video. The remote near-end ROI permits recipient control of the ROI encoding that ROI-aware encoder 46 applies to video generated by video communication device 12, whereas the local near-end ROI permits sender control of that encoding. In some cases, as will be described, the remote and local ROI definitions may conflict, and the conflict needs to be resolved.
Local and remote near-end ROI information may be provided to ROI-aware encoder 46 in the form of a near-end ROI macroblock (MB) map ("NEAR-END ROI MB MAP"). The near-end ROI MB map identifies the particular MBs that reside within the recipient's near-end ROI or the sender's near-end ROI. ROI-aware encoder 46 applies higher-quality encoding, stronger error protection, or both to preferentially encode the ROI in the near-end video, so as to improve the image quality of the ROI when it is viewed by the remote user, e.g., at remote video communication device 14. Better error protection for the ROI is especially desirable in wireless telephony applications. The resulting encoded near-end video ("ENCODED NEAR-END VIDEO") is then transmitted to remote device 14.
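As an illustration only, one way an encoder could act on a near-end ROI MB map is to assign a finer quantizer to ROI MBs and a coarser one elsewhere. The disclosure does not specify a bit-allocation rule, so the function name `qp_map_from_roi`, the parameters `base_qp` and `roi_delta`, and the 1 to 31 quantizer range are all assumptions made for this sketch.

```python
def qp_map_from_roi(roi_map, base_qp=16, roi_delta=4, qp_min=1, qp_max=31):
    """Return a per-MB quantizer map derived from a boolean ROI MB map.

    ROI MBs get a lower QP (finer quantization, more bits); non-ROI MBs
    get a higher QP. All parameter values are hypothetical defaults.
    """
    def clamp(qp):
        return max(qp_min, min(qp_max, qp))

    return [
        [clamp(base_qp - roi_delta) if in_roi else clamp(base_qp + roi_delta)
         for in_roi in row]
        for row in roi_map
    ]
```

Raising the quantizer outside the ROI is one plausible way to keep the overall bit budget roughly constant; a real rate controller would adjust these values per frame.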
As will be explained, ROI-aware video encoder 46 also transmits far-end ROI information ("FAR-END ROI") generated by the local user of video communication device 12 for the far-end video received from remote video communication device 14. The far-end ROI serves as a receiver-driven ROI for the video encoded by remote video communication device 14. In effect, the far-end ROI information transmitted by video communication device 12 permits at least partial control over the encoder that produces the near-end video of remote video communication device 14, just as the remote near-end ROI received by ROI-aware decoder 48 is used to control ROI-aware video encoder 46 of video communication device 12. In this manner, each video communication device 12, 14 can influence the ROI encoding of the far-end video generated by the other device.
The far-end ROI information transmitted by video communication device 12 may be conveyed as in-band or out-of-band signaling information. With in-band signaling, the far-end ROI information may be embedded in the encoded near-end video bitstream that is transmitted to remote video communication device 14. In the MPEG-4 bitstream format, for example, there is a field called "user_data" that can be used to embed information describing the bitstream. The "user_data" field, or a similar field in other bitstream formats, may be used to embed far-end ROI information without violating bitstream compliance. Alternatively, ROI information may be embedded in the video bitstream by so-called data hiding techniques, e.g., steganography.
ROI-aware video decoder 48 is configured to look for ROI information in the user_data field, or elsewhere, in the incoming far-end video from the remote device. With out-of-band signaling, a signaling protocol, such as ITU H.245 or SIP, may be used to convey the far-end ROI information. In either case, the far-end ROI information may take the form of an ROI MB map or of physical coordinates that define the position and/or size of the far-end ROI. Once decoder 48 receives the far-end video bitstream, it retrieves the ROI information according to the format agreed upon with the remote sender device and passes the ROI information to authentication module 58 to obtain access rights for near-end ROI control before the remote near-end ROI is provided to video encoder 46.
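A minimal sketch of how ROI coordinates might be serialized into a small payload that both devices agree on, whether it is carried in a user_data-style field or in an out-of-band message. The byte layout here (a 4-byte tag followed by four unsigned 16-bit big-endian values) and the names `ROI_TAG`, `pack_roi`, and `unpack_roi` are hypothetical; the disclosure only says that the format is agreed upon between the devices. A real in-band embedding would also have to keep the payload from being mistaken for a bitstream start code, which this sketch does not handle.

```python
import struct

ROI_TAG = b"ROI0"      # hypothetical marker identifying an ROI payload
_FMT = ">4sHHHH"       # tag, x, y, width, height (big-endian, 12 bytes)


def pack_roi(x, y, width, height):
    """Serialize a rectangular ROI (pixel units) into a 12-byte payload."""
    return struct.pack(_FMT, ROI_TAG, x, y, width, height)


def unpack_roi(payload):
    """Return (x, y, width, height), or None if the payload is not an ROI."""
    if len(payload) != struct.calcsize(_FMT):
        return None
    tag, x, y, width, height = struct.unpack(_FMT, payload)
    if tag != ROI_TAG:
        return None
    return (x, y, width, height)
```

The receiving side would call `unpack_roi` on each candidate payload and hand any recovered rectangle to its access-control step before using it.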
In addition to controlling the remote video encoder to preferentially encode the ROI in the far-end video, the far-end ROI information may also be applied to the local video decoder to preferentially decode the MBs within the ROI of the far-end video. For example, as further shown in Fig. 3, ROI mapper 54 may provide to ROI-aware video decoder 48 the same far-end ROI MB map that is generated for transmission to the remote encoder. ROI-aware video decoder 48 uses the ROI MB map to preferentially decode MBs in the far-end video received from remote video communication device 14. For example, ROI-aware video decoder 48 may apply better post-processing to ROI MBs than to non-ROI MBs. In addition, or alternatively, ROI-aware video decoder 48 may apply more robust error concealment techniques to ROI MBs than to non-ROI MBs. In this manner, ROI-aware video decoder 48 relies on the far-end ROI information generated by the local user to preferentially decode the ROI portion of the incoming far-end video and thereby obtain enhanced image quality.
ROI-aware video decoder 48 receives incoming far-end video from a remote video communication device (e.g., video communication device 14 of Fig. 1). ROI-aware video decoder 48 decodes the far-end video and provides the decoded video to user interface 42 for presentation to the local user on the display device. In addition, as discussed above, ROI-aware video decoder 48 receives remote near-end ROI information ("REMOTE NEAR-END ROI") from remote video communication device 14. The near-end ROI information received by ROI-aware video decoder 48 is generated by the user of remote video communication device 14 to specify an ROI in the video transmitted by video communication device 12. As described above, the remote near-end ROI information received by ROI-aware video decoder 48 is used to remotely control ROI-aware video encoder 46 to preferentially encode the ROI in the near-end video generated by video communication device 12. As discussed above, the remote near-end ROI is conveyed by in-band or out-of-band signaling techniques.
With further reference to Fig. 3, ROI-aware video encoder 46 and ROI-aware video decoder 48 interact with ROI engine 44. ROI engine 44 processes local and remote near-end ROI information for encoding and transmission of the near-end video bitstream from capture device 40. In addition, ROI engine 44 processes far-end ROI information provided via user interface 42 for encoding and transmission to remote video communication device 14. ROI engine 44 includes ROI controller 52, ROI mapper 54, ROI tracking module 56, and authentication module 58. In some embodiments, ROI tracking module 56 and authentication module 58 may be optional.
ROI-aware video encoder 46, ROI-aware video decoder 48, ROI controller 52, ROI mapper 54, ROI tracking module 56, and authentication module 58 may be formed in a variety of ways, as discrete functional modules or as a monolithic module that encompasses the functionality ascribed to each module. In any event, the various components of ROI-aware CODEC 20, including ROI engine 44, video encoder 46, and video decoder 48, may be realized in hardware, software, firmware, or a combination thereof. For example, these components may operate as software processes executing on one or more microprocessors or digital signal processors (DSPs), one or more application-specific integrated circuits (ASICs), one or more field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. If implemented in software, the techniques may be realized in part by a computer-readable medium comprising program code containing instructions that, when executed in a processor or DSP, perform one or more of the methods described above.
In operation, the user of video communication device 12 selects either the near-end video generated by capture device 40 or the far-end video decoded by ROI-aware video decoder 48 for viewing on the display device associated with user interface 42. In some embodiments, picture-in-picture (PIP) functionality permits the user to view the near-end video and the far-end video simultaneously. While viewing the near-end or far-end video for purposes of ROI definition, the user may manipulate user interface 42 to invoke an ROI definition mode. By default, video communication device 12 may handle video encoding and decoding without regard to ROI. By entering the ROI definition mode, the user activates the ROI-aware encoding and decoding aspects of video communication device 12. Alternatively, ROI-aware encoding and decoding may be the default mode.
Once the far-end video is presented, the user indicates an ROI in the far-end video using any of a variety of techniques, as will be described in greater detail. The far-end ROI highlights a region or object within the video scene that is of interest to the user and for which the user desires higher image quality. User interface 42 generates a far-end ROI indication based on the user input. The ROI indication may be further processed by ROI engine 44 to produce far-end ROI information for transmission to video communication device 14.
Alternatively, the user may select the near-end video obtained from capture device 40 for ROI definition. Once the near-end video is presented, the user may optionally indicate an ROI in the near-end video using techniques similar or identical to those used for ROI indication in the far-end video. The near-end ROI or far-end ROI may be specified initially at the start of a VT call, or at any time during the course of the call. In some embodiments, the initial ROI may be updated by the local user or the remote user, or updated automatically by ROI tracking module 56. If the ROI is updated automatically, the user does not need to keep entering ROI information. Instead, the ROI is maintained on the basis of the user's initial input until the user changes or discontinues the ROI.
User interface 42 generates a local near-end ROI indication based on the indication provided by the user. Like the far-end ROI indication, the near-end ROI indication may be further processed by ROI engine 44. The near-end ROI indication highlights a region or object within the video scene that the user wishes to emphasize to the remote user, i.e., by means of enhanced image quality. The local user may select a near-end ROI or far-end ROI via user interface 42 either by choosing a predefined ROI pattern or by drawing an ROI pattern. Drawing an ROI pattern may involve free-hand drawing with a stylus, or resizing or repositioning a default ROI pattern.
In the example of Fig. 3, user interface 42 provides both the local near-end ROI indication (if provided) and the far-end ROI indication to ROI controller 52 within ROI engine 44. In addition, ROI controller 52 receives the remote near-end ROI from ROI-aware video decoder 48 via authentication module 58. In particular, ROI-aware video decoder 48 detects the presence of remote near-end ROI information in the received far-end video stream or via out-of-band signaling, and provides the remote near-end ROI information to authentication module 58. The local near-end ROI and far-end ROI indications may be expressed in terms of coordinates within a video frame of the respective near-end or far-end video. The ROI coordinates may be x-y coordinates within the video frame. As will be explained, however, the x-y coordinates are processed to produce an ROI MB map for use by encoder 46 or decoder 48.
ROI controller 52 processes the local near-end ROI, remote near-end ROI, and far-end ROI, and applies them to ROI mapper 54. ROI mapper 54 converts the respective ROI coordinates into macroblock (MB) maps. More particularly, ROI mapper 54 produces a far-end ROI MB map that specifies the MBs in the far-end video corresponding to the far-end ROI indicated by the local user. In addition, ROI mapper 54 produces a near-end ROI MB map that specifies the MBs in the near-end video corresponding to the local near-end ROI, the remote near-end ROI, or a combination of both.
For predefined ROI patterns, ROI mapping is straightforward. Each predefined ROI pattern may have a predefined MB map assigned to it. For ROI patterns that are drawn, repositioned, or resized, however, ROI mapper 54 selects the MB boundaries that most closely conform to the ROI pattern coordinates specified by the user. For example, if the specified ROI passes through an MB, ROI mapper 54 places the ROI boundary at either the outer edge or the inner edge of that MB. In other words, ROI mapper 54 may be configured to include in the ROI MB map only MBs that lie entirely within the ROI, or to include MBs that lie partially within the ROI as well. In either case, the ROI comprises a set of whole MBs that most closely approximates the specified ROI. Again, video encoder 46 or video decoder 48 operates at the MB level, and will ordinarily require translation of the ROI to an MB map. By designating individual MBs as included in or excluded from the ROI, the ROI MB map permits definition of ROIs having irregular or non-rectangular shapes.
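The coordinate-to-MB translation described above can be sketched as follows for a rectangular ROI. This is an illustrative sketch, not the mapper of the disclosure: it assumes 16x16-pixel macroblocks, and the function name `roi_rect_to_mb_map` and the `include_partial` flag (outer edge versus inner edge) are hypothetical.

```python
MB_SIZE = 16  # assumed macroblock size in pixels


def roi_rect_to_mb_map(x0, y0, x1, y1, frame_w, frame_h, include_partial=True):
    """Convert a pixel rectangle into a boolean ROI MB map.

    (x0, y0) is the inclusive top-left corner and (x1, y1) the exclusive
    bottom-right corner. With include_partial=True the ROI boundary snaps
    to the outer edge of any MB it crosses; with False it snaps to the
    inner edge, keeping only MBs that lie entirely inside the rectangle.
    """
    cols = (frame_w + MB_SIZE - 1) // MB_SIZE
    rows = (frame_h + MB_SIZE - 1) // MB_SIZE
    mb_map = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            left, top = c * MB_SIZE, r * MB_SIZE
            right, bottom = left + MB_SIZE, top + MB_SIZE
            if include_partial:
                inside = left < x1 and right > x0 and top < y1 and bottom > y0
            else:
                inside = left >= x0 and right <= x1 and top >= y0 and bottom <= y1
            mb_map[r][c] = inside
    return mb_map
```

For a 176x144 frame, for instance, the map has 9 rows of 11 MBs. Because the result is a per-MB boolean grid, irregular shapes can be represented by setting individual entries directly rather than deriving them from a rectangle.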
ROI-aware video encoder 46 transmits the far-end ROI MB map to remote video communication device 14, either within the encoded near-end video or via out-of-band signaling. The near-end ROI MB map is not transmitted to remote video communication device 14. Instead, the near-end ROI MB map is used by ROI-aware video encoder 46 to preferentially encode the specified MBs in the near-end video, with higher-quality encoding or stronger error protection, before transmission to remote video communication device 14. Hence, ROI-aware video encoder 46 transmits to remote video communication device 14 the encoded near-end video, with its preferentially encoded ROI, together with the far-end ROI information.
ROI tracking module 56 tracks changes in the ROI region of the near-end video. For example, if the VT application resides in a mobile video communication device, the user may move from time to time, changing the user's position relative to the previously specified ROI. In addition, even if the user's position is stable, other objects within the ROI may move out of the ROI region. For example, a boat on a lake may bob up and down or move from side to side with wave motion. To avoid the need for the user to redefine the ROI whenever movement occurs, ROI tracking module 56 may be provided to automatically track objects within the ROI region.
In the example of Fig. 3, ROI tracking module 56 receives motion information from the encoded near-end video generated by ROI-aware video encoder 46. The motion information may take the form of motion vectors for MBs in the encoded near-end video, permitting closed-loop control of the ROI MB map defined by ROI mapper 54. Based on the motion information, ROI tracking module 56 generates incremental position adjustments for the near-end ROI MB map and provides the adjustments to ROI mapper 54. A position adjustment may take the form of a change in the status of an MB to included in, or excluded from, the ROI.
If the motion information indicates substantial movement of the ROI, the status of MBs in the ROI MB map changes. Ordinarily, the status changes will occur for MBs at the outer boundary of the ROI. In response to the position adjustments, ROI mapper 54 shifts the ROI specified by the near-end ROI MB map, so that the ROI position adapts to movement in the encoded near-end video on a frame-by-frame basis. Acting together, ROI tracking module 56 and ROI mapper 54 automatically adjust the ROI position as motion is detected in the video scene. In this manner, ROI engine 44 adjusts the ROI to track moving objects within the ROI.
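A simplified sketch of motion-driven ROI adjustment: average the displacement of the MBs currently in the ROI, accumulate it across frames, and shift the whole map by one MB whenever the accumulated displacement reaches a full MB. This is an assumption-laden simplification; the disclosure describes per-MB status changes, typically at the ROI boundary, rather than a rigid shift. The names `shift_map` and `track_roi` are hypothetical, and `motion[r][c]` is assumed to hold the (dx, dy) pixel displacement of the content of each MB since the previous frame, with any sign conversion from codec motion vectors already done by the caller.

```python
MB_SIZE = 16  # assumed macroblock size in pixels


def shift_map(mb_map, d_cols, d_rows):
    """Shift every ROI MB by (d_cols, d_rows), dropping MBs that leave the frame."""
    rows, cols = len(mb_map), len(mb_map[0])
    out = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mb_map[r][c]:
                nr, nc = r + d_rows, c + d_cols
                if 0 <= nr < rows and 0 <= nc < cols:
                    out[nr][nc] = True
    return out


def track_roi(mb_map, motion, residual=(0.0, 0.0)):
    """Return (new_map, new_residual) after one frame of motion information."""
    vecs = [motion[r][c]
            for r in range(len(mb_map))
            for c in range(len(mb_map[0]))
            if mb_map[r][c]]
    if not vecs:
        return mb_map, residual
    # Accumulate the mean ROI displacement with the carry-over from earlier frames.
    acc_x = residual[0] + sum(v[0] for v in vecs) / len(vecs)
    acc_y = residual[1] + sum(v[1] for v in vecs) / len(vecs)
    d_cols = int(acc_x / MB_SIZE)  # whole MBs, truncated toward zero
    d_rows = int(acc_y / MB_SIZE)
    new_map = shift_map(mb_map, d_cols, d_rows) if (d_cols or d_rows) else mb_map
    return new_map, (acc_x - d_cols * MB_SIZE, acc_y - d_rows * MB_SIZE)
```

Carrying the residual forward lets slow motion of a few pixels per frame eventually move the ROI, instead of being rounded away on every frame.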
Authentication module 58 serves to resolve the ROI rights of remote users, including the rights of individual users and the priority of rights among multiple users. When ROI-aware video decoder 48 receives a remote near-end ROI from remote video communication device 14, it provides the remote near-end ROI to ROI engine 44. In some instances, however, the remote near-end ROI specified by the remote user may conflict with the local near-end ROI specified by the local user. For example, the local and remote users may specify overlapping ROIs, or entirely different ROIs, within the video scene. In this case, authentication module 58 may be provided to resolve the ROI conflict.
In some embodiments, authentication module 58 may apply a so-called "master-slave" mechanism to coordinate which near-end ROI information (local or remote) should be used at a given time. In particular, before the sender receives receiver-driven ROI information, the sender is the master of the near-end ROI and controls its own near-end ROI. In other words, before a remote near-end ROI is received at video communication device 12, the local user controls the near-end ROI. The remote user is then the "slave" with respect to the near-end ROI and does not control it unless the master (i.e., the local user) grants access rights to control the near-end ROI.
Once the local user grants access rights to the remote user, the local user no longer controls its near-end ROI. Instead, the remote user associated with video communication device 14 gains control of the near-end ROI for the near-end video generated by video communication device 12 and becomes the master of the near-end ROI. The remote user retains control until the local user explicitly revokes the access rights or otherwise denies access to the remote user, or until the remote user discontinues near-end ROI selection, in which case master ROI control reverts to the local user.
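The master-slave hand-over described above can be pictured as a small state machine. The following is a sketch under stated assumptions: the class name `NearEndRoiControl`, its method names, and the fallback to the local ROI when the current master has supplied no ROI are all hypothetical and not taken from the disclosure.

```python
class NearEndRoiControl:
    """Tracks which party is currently master of the near-end ROI."""

    LOCAL = "local"

    def __init__(self):
        # Before any grant, the sender (local user) is the master.
        self.master = self.LOCAL

    def grant(self, remote_id):
        """Local user hands near-end ROI control to a remote user."""
        self.master = remote_id

    def revoke(self):
        """Local user explicitly takes control back."""
        self.master = self.LOCAL

    def remote_stopped(self, remote_id):
        """Remote user discontinued ROI selection; control reverts if it was master."""
        if self.master == remote_id:
            self.master = self.LOCAL

    def select_roi(self, local_roi, remote_rois):
        """Pick the near-end ROI to apply, given a dict of remote ROIs by user."""
        if self.master == self.LOCAL:
            return local_roi
        # Assumed fallback: use the local ROI if the master sent nothing.
        return remote_rois.get(self.master, local_roi)
```

In use, the encoder side would call `select_roi` each time a new local or remote ROI arrives, so that a slave's ROI is ignored until access has been granted.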
Once ROI-aware video decoder 48 receives the encoded far-end video (if any), it retrieves the remote near-end ROI information from the video bitstream according to the format negotiated with the sender. Again, the near-end ROI information may be embedded in the encoded far-end video or sent by out-of-band signaling. In either case, ROI-aware video decoder 48 passes the remote near-end ROI to authentication module 58 to obtain access rights before the remote near-end ROI is sent to ROI-aware video encoder 46 via ROI controller 52 and ROI mapper 54. Authentication module 58 restricts the access rights given to particular users so that a user cannot control the encoding process without authorization from the local user.
Authentication module 58 may be configured to grant and manage access rights and access levels among one or more remote users. For example, the local user may grant access rights to selected remote users. Hence, the local user may permit some remote users to control the near-end ROI while prohibiting other remote users from doing so. Likewise, the local user may assign relative access levels or priorities to remote users. In this manner, the local user can specify a hierarchy of access levels among remote users, so that some remote users take priority over others in controlling the near-end ROI when multiple remote users request ROI control at the same time. For example, multiple remote users may request ROI control simultaneously during a multi-party video conference. In such cases, ROI control ordinarily is granted exclusively to one user, either the local user or a selected one of the remote users authorized by the local user.
In some embodiments, authentication module 58 may also be responsible for resource monitoring, to determine whether local video communication device 12 is capable of enabling ROI-aware video processing. If the local device does not have sufficient processing resources at a given time to support remote ROI control, or to serve a particular type of ROI request, authentication module 58 revokes the remote ROI control access rights or rejects the ROI request. As an example, bandwidth limitations imposed by the communication channel, or the local processing load, may result in rejection of remote ROI control. As a further example, such limitations may permit the use of preconfigured ROI patterns but not of drawn or described ROI patterns. Authentication module 58 may notify the remote device of the ROI decision by embedding a status message in the outgoing encoded near-end video to be sent to the remote device.
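As a rough sketch of the resource check, the decision might depend on the current processing load, the available channel bandwidth, and the kind of ROI request. Every threshold below, the request kinds `"preset"`, `"drawn"`, and `"described"`, and the function name `decide_roi_request` are hypothetical values chosen for illustration; the disclosure does not give concrete limits.

```python
def decide_roi_request(kind, cpu_load, bandwidth_kbps,
                       max_load=0.9, heavy_load=0.7, min_kbps=64):
    """Return "accepted" or "rejected" for a remote ROI request.

    kind is "preset", "drawn", or "described" (hypothetical labels).
    cpu_load is a fraction in [0, 1]; bandwidth_kbps is the channel rate.
    """
    # No ROI-aware processing at all when resources are clearly insufficient.
    if cpu_load > max_load or bandwidth_kbps < min_kbps:
        return "rejected"
    # Under moderate load, allow only the cheaper preconfigured patterns.
    if kind in ("drawn", "described") and cpu_load > heavy_load:
        return "rejected"
    return "accepted"
```

The returned string stands in for the status message that would be reported back to the remote device.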
In addition, individual remote users may be granted different access levels that determine the degree to which a remote user can control the near-end ROI. For example, a remote user may be limited to selecting from a set of predefined ROI patterns, to particular ROI positions or sizes, or to specifying an ROI only upon approval by the local user. Hence, authentication module 58 may resolve remote user control of the near-end ROI automatically, or may actively approve a remote user's near-end ROI control through interaction with the local user. For example, when a remote user requests access to control the near-end ROI, authentication module 58 may present a query to the local user via user interface 42, requesting approval of ROI control by the remote user.
Authentication module 58 may track the access levels of remote users in any of a variety of ways. As mentioned above, the local user may actively approve requests from remote users to control the near-end ROI, and thereby actively control the access levels granted to remote users. Alternatively, the local user may maintain an address book in memory within video communication device 12 that stores information associated with remote users, including access rights or levels. The address book may take the form of a database containing a list of remote users and their associated access levels. When a remote user requests near-end ROI control, authentication module 58 retrieves the pertinent access rights information from the address book and applies the authentication process automatically to resolve ROI control between the local user, the remote user, and possibly several remote users. If the remote user is not listed in the address book, the local user may elect to add the remote user to the address book with the applicable access rights.
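A minimal sketch of address-book arbitration among simultaneous requesters. The dictionary layout, the convention that level 0 means no access, the earliest-request tie-break, and the function name `arbitrate` are assumptions made for this example; the disclosure only states that the address book associates remote users with access rights or levels.

```python
def arbitrate(requesters, address_book, default_level=0):
    """Pick the remote user who gets near-end ROI control, or None.

    requesters is a list of user IDs in order of request arrival.
    address_book maps user ID -> integer access level (higher wins).
    Users missing from the address book get default_level (no access).
    """
    candidates = []
    for order, user in enumerate(requesters):
        level = address_book.get(user, default_level)
        if level > 0:
            # Negative order makes the earliest request win a tie on level.
            candidates.append((level, -order, user))
    if not candidates:
        return None
    return max(candidates)[2]
```

For example, with an address book of `{"alice": 3, "bob": 1}`, simultaneous requests from `bob`, `alice`, and an unlisted `carol` would resolve to `alice`, while a request from `carol` alone would be left for the local user to approve or reject.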
In some cases, the local user may override the default access levels specified for particular remote users in the address book. For example, authentication module 58 may permit the local user to actively rearrange ROI control priorities among different remote users during the course of a VT call, or to intervene to regain exclusive control of the near-end ROI as the local user. In maintaining the address book or actively managing ROI control requests, the interaction between the local user and authentication module 58 is represented by ACCESS CONTROL INFO in Fig. 3.
Upon automatic or active approval of near-end ROI control by the remote user, authentication module 58 passes the remote near-end ROI to ROI controller 52 for processing and for mapping by ROI mapper 54. Alternatively, if no remote near-end ROI is provided, or if the local user elects to exclude the remote user from controlling the near-end ROI, ROI controller 52 processes the local near-end ROI provided by the local user via user interface 42.
Authentication module 58 serves to resolve ROI conflicts between the local and remote users. By default, authentication module 58 applies the master-slave concept, whereby the local user has near-end ROI control. Upon a grant of access rights at the highest level, the remote user gains full control of near-end ROI selection for ROI-aware video encoder 46 of video communication device 12. Otherwise, the local user retains near-end ROI control and can override any near-end ROI selection made by the remote user.
Even if access rights are granted to a remote user, the local user may still prevail in near-end ROI control, because the remote user's access rights ordinarily are at a lower level than those of the local user. Hence, if the local user elects to specify a near-end ROI, any near-end ROI selection made by the remote user may be ignored. On the other hand, if the local user does not specify a near-end ROI, the access level assigned to the remote user takes effect, and the remote user can control the near-end ROI. Nevertheless, as described above, the local user may elect to override the default master-slave relationship and give the remote user the highest level of access rights.
Fig. 4 is a block diagram illustrating another video communication device 12' having an ROI-aware codec and further incorporating an ROI extraction module 60. Video communication device 12' of Fig. 4 is nearly identical to video communication device 12 of Fig. 3. However, video communication device 12' further includes ROI extraction module 60 to form the local near-end ROI and the far-end ROI based on input from the user. In addition to handling selection of pre-configured ROI patterns, or permitting the user to draw, reposition, or resize a default ROI, ROI extraction module 60 also permits the local user to specify an ROI by a spoken or textual ROI description. In particular, ROI extraction module 60 generates the local near-end ROI or the far-end ROI based on the ROI description provided by the local user.
Examples of ROI descriptions include textual or spoken input of terms such as "face," "moving object," "lips," "human," "background," and the like. Preferential encoding of such objects may be highly desirable. For example, preferential encoding of lips or faces can better convey facial expressions, mouth shapes during speech, and the like. Textual input may be typed or selected from a menu presented by user interface 42. Spoken input may be provided by speaking into a microphone associated with video communication device 12'. In each case, the local user "describes" the ROI rather than selecting or drawing it. ROI extraction module 60 converts the description into a set of coordinates within the applicable near-end or far-end video scene. For spoken ROI descriptions, user interface 42 or ROI extraction module 60 may incorporate conventional speech recognition capabilities. In particular, ROI extraction module 60 may generate the information specifying the ROI based on one or more recognized terms.
ROI extraction module 60 automatically selects the ROI coordinates by applying conventional pre-encoding processing algorithms configured to detect the desired ROI. In particular, ROI extraction module 60 may apply algorithms to perform face detection, feature extraction, object segmentation, or tracking according to conventional techniques known to those skilled in the art of video ROI processing. For example, ROI extraction module 60 may apply conventional techniques that rely on ROI identification based on the luminance or chrominance values of the pixels of the video input data.
Conventional face detection schemes typically use skin tone as a guide to distinguish face pixels from non-face pixels. Examples of conventional face detection schemes are described in C.-W. Lin, Y.-J. Chang, and Y.-C. Chen, "A low-complexity face-assisted coding scheme for low bit-rate video telephony," IEICE Trans. Inf. & Syst., vol. E86-D, no. 1, January 2003, pp. 101-108, and in D. Chai and K. N. Ngan, "Face segmentation using skin-color map in videophone applications," IEEE Trans. on Circuits and Systems for Video Technology, vol. 9, no. 4, June 1999, pp. 551-564.
When the local user describes the ROI as a "face," ROI extraction module 60 analyzes the near-end or far-end video, as applicable, to automatically identify a face, and designates the coordinates associated with the identified face as the ROI. ROI extraction module 60 then passes the coordinates to ROI controller 52 for processing and for mapping by ROI mapper 54. Notably, ROI extraction module 60 processes the local near-end ROI description or the far-end ROI description, as applicable, maps the description to an appropriate extraction algorithm, and automatically analyzes the applicable pre-encoded near-end video or decoded far-end video to extract the appropriate ROI.
To support automatic ROI detection, ROI extraction module 60 receives the near-end video from capture device 40 and the far-end video from ROI-aware video decoder 48. Using the local near-end ROI description or far-end ROI description from user interface 42 together with automatic detection algorithms, ROI extraction module 60 generates the local near-end ROI and the far-end ROI, as applicable, for application to ROI controller 52. In each case, ROI extraction module 60 may convert the local near-end ROI description or far-end ROI description into the coordinates that best fit the applicable description. In this case, the user does not need to draw the ROI. Nor is the user limited to a set of predefined ROI patterns. Instead, ROI controller 52 actively detects the appropriate region within the near-end video that matches the ROI description.
ROI mapper 54 maps the ROI coordinates to the relevant macroblocks (MBs) of a video frame and generates a near-end or far-end ROI MB map. In effect, ROI mapper 54 translates the ROI coordinates from ROI controller 52 into a form that video encoder 46 can understand. In particular, video encoder 46 is equipped to handle encoding at the MB level, i.e., on an MB-by-MB basis. To that end, ROI mapper 54 generates an ROI MB map for the near-end or far-end video. The ROI MB map identifies the MBs that fall within the specified ROI so that video encoder 46 can apply preferential encoding to those MBs.
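The mapping from ROI coordinates to an ROI MB map can be illustrated with a minimal sketch. It assumes a rectangular ROI given as pixel coordinates and a 16x16-pixel macroblock; the function name `roi_to_mb_map` and the boolean-grid representation are hypothetical and are not taken from the disclosure.

```python
MB_SIZE = 16  # assumed macroblock edge in pixels; the disclosure does not fix a size


def roi_to_mb_map(roi, frame_width, frame_height, mb_size=MB_SIZE):
    """Return a grid of booleans, one per MB, True where the MB overlaps the ROI.

    roi is a hypothetical (x, y, width, height) rectangle in pixels.
    """
    x, y, w, h = roi
    mb_cols = (frame_width + mb_size - 1) // mb_size
    mb_rows = (frame_height + mb_size - 1) // mb_size
    mb_map = [[False] * mb_cols for _ in range(mb_rows)]

    # Clamp the rectangle to the frame so out-of-range coordinates are ignored.
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(frame_width, x + w), min(frame_height, y + h)
    if x1 <= x0 or y1 <= y0:
        return mb_map  # empty ROI: no MB is marked

    # Every MB touched by the rectangle is marked as an ROI MB.
    for row in range(y0 // mb_size, (y1 - 1) // mb_size + 1):
        for col in range(x0 // mb_size, (x1 - 1) // mb_size + 1):
            mb_map[row][col] = True
    return mb_map


if __name__ == "__main__":
    # QCIF frame (176x144) gives an 11x9 MB grid.
    grid = roi_to_mb_map((40, 30, 60, 50), 176, 144)
    for row in grid:
        print("".join("#" if flag else "." for flag in row))
```

Marking every MB that the rectangle touches errs on the side of a slightly larger ROI; an encoder could equally mark only MBs that lie entirely inside the rectangle.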
In addition to processing ROI descriptions, ROI extraction module 60 may also be equipped to handle local user selection of an ROI pattern from a set of predefined patterns, or an ROI pattern that the user has drawn, repositioned, or resized. Hence, video communication device 12' can generate ROI information substantially as described with respect to video communication device 12 of Fig. 3, but further incorporates ROI extraction module 60 to process ROI descriptions entered by the local user in textual or spoken form. ROI extraction module 60 may be desirable for ease of use by the local user. However, some video communication devices may not have sufficient processing power to support ROI extraction module 60. Accordingly, ROI extraction module 60 represents a desirable but optional component of a video communication device according to this disclosure.
In some embodiments, ROI extraction module 60 may process ROI descriptions generated not only by the local user but also by a remote user. In this manner, in some devices, the extraction functionality may be implemented remotely as well as locally. For example, a particular video communication device 14 may not have sufficient local resources or capabilities to support ROI extraction by device 14 from a user-provided ROI description. However, another video communication device 12 may be better equipped for ROI extraction. In this case, the disclosure contemplates that local ROI extraction may be offloaded or assigned to a remote video communication device.
To support remote extraction, the ROI description may be provided to the remote device in various ways. For example, a spoken description may be included in the audio stream transmitted to the remote device. Similarly, textual ROI descriptions, predefined ROI patterns, or drawn ROI patterns may be transmitted to the remote device, e.g., by embedding such information in the encoded video stream. Hence, the ROI information sent from one device to another may take the form of a preprocessed ROI MB map, or any other form of ROI indication or description, including indications or descriptions that require processing at the remote device before being applied to the remote encoder.
Fig. 5 is a block diagram illustrating distribution of ROI extraction via an intermediate extraction server 61. As shown in Fig. 5, video communication devices 12, 14 may provide intermediate extraction server 61 with sufficient information to enable ROI extraction. For example, each device 12, 14 may provide a corresponding local near-end ROI description, far-end ROI description, encoded or raw near-end video, and encoded far-end video. As an alternative to providing the encoded far-end video from the near-end device, ROI extraction server 61 may receive the far-end video directly from the far-end device. Using this information, extraction server 61 generates one or both of the far-end ROI and the local near-end ROI and provides them to the respective devices 12, 14. Extraction server 61 may be a server located anywhere within the communication network, and may be coupled to devices 12, 14 by wired media, wireless media, or a combination of both. Extraction server 61 may be located remotely from video communication devices 12, 14 or co-located with devices 12, 14. In many instances, however, extraction server 61 may be a remote server. In general, extraction server 61 is structurally distinct from video communication devices 12, 14.
Extraction server 61 functions much like extraction module 60, but operates on a remotely distributed basis so that local ROI extraction need not be implemented in devices 12, 14. In this manner, the processing cost of ROI extraction can be assigned to a different device, which may have greater processing power. Like ROI extraction module 60, extraction server 61 can process different types of ROI descriptions, e.g., spoken, textual, or graphical descriptions provided by the user. To that end, ROI extraction server 61 may incorporate capabilities suited to processing such descriptions, e.g., speech recognition capabilities. In addition, ROI extraction server 61 may be equipped with video decoding capability to permit analysis of the video and extraction of the ROI, and with encoding capability to re-encode the video and embed the ROI information, if desired.
Fig. 6 is a block diagram illustrating distribution of ROI extraction for multiple video telephony sessions. In the example of Fig. 6, ROI extraction server 61 operates to handle ROI extraction for VT sessions between multiple video communication devices 12A-14A, 12B-14B, 12C-14C through 12N-14N. In this manner, ROI extraction server 61 performs multiple ROI extraction tasks to support various VT sessions carried out in parallel over a given communication network.
Figs. 7A-7D are diagrams illustrating predefined ROI patterns for selection by a local or remote user. The ROI patterns of Figs. 7A-7D are provided for purposes of example and should not be considered limiting. Fig. 7A shows an ROI 62 within video scene 34 presented on display 36 associated with wireless communication device 38. ROI 62 is a substantially rectangular region centered approximately within video scene 34. The major length of rectangular ROI 62 extends vertically within video scene 34. In many cases, the predefined, centered rectangular ROI 62 will effectively capture a person's face, i.e., the face of the remote user participating in the VT call.
Fig. 7B shows another rectangular ROI 64, which has a major length extending horizontally within video scene 34. ROI 64 is centered approximately within video scene 34 and can effectively capture objects such as vehicles, boats, products, gifts, and the like.
Fig. 7C shows another ROI 66, which has a shape designed to capture the face and shoulders of a remote user participating in a VT call. Alternatively, ROI 66 may capture the face and shoulders of a reporter presenting a news broadcast, a party host, or a conference speaker (e.g., in a one-way video streaming application). In either case, predefined ROI 66 focuses on a human VT participant or presenter and enables preferential encoding of that person's essential features.
Fig. 7D shows two ROIs 68, 70 presented side by side within video scene 34. In the example of Fig. 7D, ROIs 68, 70 can effectively capture the faces of two people sitting or standing side by side. In this manner, the faces of both participants can be preferentially encoded to support higher image quality for facial expressions and movement.
The predefined ROI patterns depicted in Figs. 7A-7D are for purposes of illustration. Other predefined ROI patterns with alternative positions or shapes may be provided. For example, some ROI patterns may have circular or irregular shapes, provided the patterns can be mapped to MB boundaries.
In some embodiments, the user may be permitted to resize or reposition a selected ROI pattern. Resizing and repositioning may be accomplished using conventional cursor and corner-drag techniques. In addition, the ROI may be rescaled either by corner dragging or by explicitly specifying a zoom percentage. Of course, as the ROI becomes larger, the degree of preferential encoding is reduced because of bandwidth constraints. Therefore, in some cases, a maximum ROI size may be enforced within video communication device 12.
Fig. 8 is a flow chart illustrating generation of far-end ROI information at a recipient device to control preferential ROI encoding of near-end video at a sender device. The process depicted in Fig. 8 may be implemented in video communication device 12 of Fig. 3 or in video communication device 12' of Fig. 4. In operation, ROI-aware video decoder 48 in video communication device 12 decodes far-end video from a remote sender device, e.g., video communication device 14 (Fig. 1) (72). After the far-end video is decoded, user interface 42 of recipient device 12 displays the far-end video for viewing by the local user (74).
If the local user does not request near-end ROI selection (76), no action is taken and the next frame of the far-end video is decoded (72). If near-end ROI selection is requested (76), however, user interface 42 accepts far-end ROI information from the local user (78). ROI controller 52 and ROI mapper 54 then cooperate to generate a far-end ROI MB map (80). ROI-aware encoder 46 embeds the far-end ROI MB map in the encoded near-end video and thereby transmits the far-end ROI map to remote sender device 14, the source of the far-end video (82). The far-end ROI MB map specifies that the encoder associated with remote video communication device 14 should apply preferential encoding to the MBs within the relevant ROI of the far-end video to be sent to video communication device 12.
Fig. 9 is a flow chart illustrating processing of ROI information from a recipient device for preferential ROI encoding of near-end video at a sender device, in combination with ROI tracking. In the example of Fig. 9, user interface 42 receives the near-end video stream generated by capture device 40 and presents the near-end video to the local user (84). If neither the local user nor the remote user requests near-end ROI selection (86), all MBs in each video frame are encoded normally (88), i.e., without any preferential encoding of MBs within an ROI. The encoded near-end video is then sent to remote recipient device 14 (89).
If either the local user or the remote user requests near-end ROI selection (86), however, ROI controller 52 and ROI mapper 54 process the relevant near-end ROI information to generate a near-end ROI MB map (90). If a near-end ROI is specified by both the local user and the remote user, authentication module 58 may intervene to resolve the conflict in favor of one of the ROIs. Upon receiving the near-end ROI MB map (90), ROI-aware video encoder 46 encodes the MBs within the ROI (92) by applying higher-quality encoding, stronger error protection, or both.
Tracking module 56 tracks the position of the ROI within the near-end video (94) by monitoring the motion information generated by ROI-aware video encoder 46. If no displacement of the ROI is detected (96), the ROI MBs in the near-end video are encoded using the existing ROI map (100), and the encoded near-end video is sent to the remote recipient device (102). If displacement of the ROI is detected (96), tracking module 56 adjusts the ROI MB map according to the motion information (98) before the near-end video is encoded (100).
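The tracking step can be sketched as follows. The disclosure says only that the ROI MB map is adjusted "according to the motion information"; averaging the motion vectors of the ROI MBs and shifting a rectangular ROI by that average is an assumption made here for illustration, as are the names `average_motion`, `track_roi`, and the `threshold` parameter.

```python
def average_motion(motion_vectors):
    """Mean (dx, dy) of the per-MB motion vectors, in pixels."""
    if not motion_vectors:
        return 0.0, 0.0
    count = len(motion_vectors)
    mean_dx = sum(dx for dx, _ in motion_vectors) / count
    mean_dy = sum(dy for _, dy in motion_vectors) / count
    return mean_dx, mean_dy


def track_roi(roi, motion_vectors, frame_width, frame_height, threshold=1.0):
    """Shift a rectangular ROI by the average motion of its MBs.

    Returns (roi, moved). When the average motion stays below `threshold`
    pixels in both directions, the existing ROI is reused unchanged.
    """
    x, y, w, h = roi
    dx, dy = average_motion(motion_vectors)
    if abs(dx) < threshold and abs(dy) < threshold:
        return roi, False  # no displacement detected: keep the existing map

    # Shift the rectangle and keep it inside the frame.
    new_x = max(0, min(x + round(dx), frame_width - w))
    new_y = max(0, min(y + round(dy), frame_height - h))
    return (new_x, new_y, w, h), True


if __name__ == "__main__":
    roi = (40, 30, 60, 50)
    vectors = [(4, 0), (6, 2), (5, 1)]  # hypothetical motion vectors of ROI MBs
    print(track_roi(roi, vectors, 176, 144))
```

After a shift, the MB map would be regenerated from the new rectangle before the frame is encoded.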
Fig. 10 is a flow chart illustrating processing of ROI information from a recipient device for preferential ROI encoding of near-end video at a sender device, in combination with user authentication. Fig. 10 depicts operation of authentication module 58 of Fig. 3 or Fig. 4 to permit a remote user to control the near-end ROI and, for simplicity, assumes that no local near-end ROI is specified. As shown in Fig. 10, for the near-end video stream generated by capture device 40 in video communication device 12 (104), authentication module 58 determines whether the remote user of video communication device 14 has requested a remote near-end ROI (106).
If no remote near-end ROI is requested (106), and no local near-end ROI is requested, all MBs in the near-end video are encoded normally (110). If a remote near-end ROI is requested (106), however, authentication module 58 next determines whether the remote user requesting the near-end ROI is authenticated (108). In particular, authentication module 58 may automatically determine the remote user's access rights by reference to an address book stored locally in video communication device 12. Alternatively, authentication module 58 may actively query the local user via user interface 42 to obtain approval or denial of access rights for near-end ROI control by the remote user.
If the remote user is not authenticated (108), all MBs in the near-end video are encoded normally (110). If the remote user is authenticated (108), however, near-end ROI control is granted to the remote user. In this case, ROI controller 52 and ROI mapper 54 process the near-end ROI information from the remote user and generate a near-end MB map (112). Using the near-end MB map, ROI-aware encoder 46 preferentially encodes the MBs identified by the near-end MB map (114). Video communication device 12 then sends the encoded near-end video to remote video communication device 14 (116).
Fig. 11 is a flow chart illustrating selection of a predefined ROI pattern. Once ROI-aware video decoder 48 decodes the far-end video received from remote video communication device 14 (118), the far-end video is displayed to the local user via user interface 42 (120). If the local user requests ROI selection (122), user interface 42 displays a menu of predefined ROI patterns (124), such as the ROI patterns shown in Figs. 7A-7D. Alternatively, the user may provide an ROI description, or may draw, reposition, or resize an ROI pattern. In the example of Fig. 11, however, operation focuses on the presentation of predefined ROI patterns. After the local user selects a predefined ROI pattern (126), ROI controller 52 and ROI mapper 54 define an ROI MB map based on the selected pattern (128). ROI-aware video encoder 46 embeds the ROI MB map in the encoded near-end video and transmits the ROI MB map to remote video communication device 14 (130) for use in preferential encoding of the ROI within the far-end video.
Fig. 12 is a diagram illustrating definition of an ROI pattern within displayed video scene 34 by expanding and contracting an ROI template 132. Fig. 12 corresponds substantially to Fig. 2, but illustrates presentation of an ROI template 132 that can be resized by the user. In the example of Fig. 12, ROI template 132 can be resized by dragging one of its corners to expand or contract the template. The result of a corner drag that expands ROI template 132 is represented by expanded ROI template 134. A corner drag increases or decreases the size of ROI template 132 while maintaining its relative length-to-width ratio. In some embodiments, however, the user may also be permitted to drag a side of ROI template 132 to increase or decrease the size of the template while also changing its length-to-width ratio. Dragging may be accomplished using a stylus with a touchscreen, or another pointing device associated with user interface 42 of video communication device 12. Other pointing devices may include joysticks, touchpads, scroll wheels, trackballs, and the like.
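The two resizing behaviors described for Fig. 12 can be illustrated with a short sketch. It assumes a rectangular template whose top-left corner stays fixed while the bottom-right corner or the right side is dragged; the function names and the `min_scale` guard are hypothetical.

```python
def resize_by_corner_drag(template, drag_dx, drag_dy, min_scale=0.1):
    """Scale an (x, y, width, height) template, keeping its width-to-height ratio.

    The bottom-right corner is dragged by (drag_dx, drag_dy) pixels; the larger
    of the two implied scale factors is applied to both dimensions.
    """
    x, y, w, h = template
    scale = max((w + drag_dx) / w, (h + drag_dy) / h, min_scale)
    return x, y, round(w * scale), round(h * scale)


def resize_by_side_drag(template, drag_dx):
    """Drag the right side only: the width changes, so the ratio changes too."""
    x, y, w, h = template
    return x, y, max(1, w + drag_dx), h


if __name__ == "__main__":
    template = (40, 30, 60, 80)
    print(resize_by_corner_drag(template, 30, 10))  # ratio 60:80 is preserved
    print(resize_by_side_drag(template, 30))        # ratio becomes 90:80
```

A device that enforces a maximum ROI size could clamp the returned width and height before generating the MB map.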
Fig. 13 is a diagram illustrating definition of an ROI pattern within a displayed video scene by dragging ROI template 132. In particular, Fig. 13 shows repositioning of ROI template 132 by dragging the ROI template to another position 135 within video scene 34. Dragging may be accomplished with a stylus and touchscreen or with a pointing device associated with user interface 42.
Fig. 14 is a diagram illustrating definition of an ROI pattern within a displayed video scene by drawing an ROI pattern 136 on a touchscreen with a stylus 138. In the example of Fig. 14, ROI pattern 136 is produced by freehand drawing. ROI controller 52 and ROI mapper 54 cooperate to convert the coordinates associated with the drawn ROI pattern into an MB map that identifies the MBs within video scene 34 that fall approximately within ROI pattern 136. The ROI pattern definitions shown in Figs. 12, 13, and 14 may be applied to an ROI within the near-end video or the far-end video.
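One way to approximate a freehand outline at MB granularity is sketched below. It assumes the drawn pattern is available as a closed polygon of pixel coordinates and marks each 16x16 MB whose center lies inside the polygon, using a standard ray-casting test. The polygon representation, the MB size, and the function names are assumptions, not details from the disclosure.

```python
def point_in_polygon(px, py, polygon):
    """Ray-casting test: True when (px, py) lies inside the closed polygon."""
    inside = False
    count = len(polygon)
    for i in range(count):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % count]
        # Only edges that straddle the horizontal line through py can be crossed.
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def drawn_roi_to_mb_map(polygon, frame_width, frame_height, mb_size=16):
    """Mark each MB whose center falls inside the drawn outline."""
    mb_cols = (frame_width + mb_size - 1) // mb_size
    mb_rows = (frame_height + mb_size - 1) // mb_size
    half = mb_size / 2
    return [
        [point_in_polygon(col * mb_size + half, row * mb_size + half, polygon)
         for col in range(mb_cols)]
        for row in range(mb_rows)
    ]


if __name__ == "__main__":
    outline = [(32, 32), (96, 32), (96, 96), (32, 96)]  # hypothetical stylus path
    for row in drawn_roi_to_mb_map(outline, 176, 144):
        print("".join("#" if flag else "." for flag in row))
```

Testing MB centers keeps the map close to the drawn shape; a more generous rule would mark any MB that the outline touches.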
Fig. 15 is a diagram illustrating definition of an ROI pattern within a displayed video scene using a drop-down menu 140 and a defined ROI object to be dynamically tracked. As shown in Fig. 15, user interface 42 presents drop-down menu 140, which presents ROI descriptions such as "FACE," "LIP," "BACKGROUND," and "MOVEMENT." The local user selects one entry in the drop-down menu as the desired ROI description. In response, ROI extraction module 60 (Fig. 4) analyzes the near-end or far-end video, as applicable, to detect an ROI pattern corresponding to the description. As an alternative to drop-down menu 140, the user may enter the description as text via user interface 42 or speak it into a microphone. In each case, the selected ROI description is matched to an appropriate ROI pattern using conventional feature detection algorithms, e.g., skin-tone detection, object segmentation, or the like. After the ROI pattern is selected, ROI controller 52 and ROI mapper 54 generate the appropriate ROI MB map. The process of Fig. 15 may be called "dynamic" in the sense that each ROI description must be dynamically matched to an ROI pattern within the particular video scene.
Fig. 16 is a diagram illustrating definition of an ROI pattern within a displayed video scene using a drop-down menu 142 and defined ROI objects that map to predefined ROI patterns such as those of Figs. 7A-7D. As shown in Fig. 16, user interface 42 presents drop-down menu 142, which presents ROI descriptions such as "SINGLE FACE," "DUAL FACE," "HEAD/SHOULDERS," and "OBJECT." The local user selects one entry in the drop-down menu as the desired ROI pattern. In response, ROI controller 52 matches the selected ROI pattern to a corresponding predefined ROI pattern, e.g., one of the patterns depicted in Figs. 7A-7D. Hence, unlike the ROI descriptions shown in Fig. 15, the static ROI patterns require no video analysis. Instead, ROI controller 52 and ROI mapper 54 generate a pre-configured ROI MB map corresponding to the selection in drop-down menu 142. Again, as an alternative to drop-down menu 142, the user may enter the selection as text via user interface 42 or speak it into a microphone. The process of Fig. 16 may be called "static" in the sense that each ROI pattern corresponds to a predefined ROI pattern and MB map.
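Because the static case needs no video analysis, it reduces to a table lookup. The sketch below assumes each menu entry maps to one or more rectangles expressed as percentages of the frame; the percentage values are invented for illustration and only loosely follow the shapes of Figs. 7A-7D, and `select_static_roi` is a hypothetical name.

```python
# Each rectangle is (x%, y%, width%, height%) of the frame. Values are illustrative.
PREDEFINED_PATTERNS = {
    "SINGLE FACE": [(30, 15, 40, 70)],                        # tall centered box
    "OBJECT": [(15, 30, 70, 40)],                             # wide centered box
    "HEAD/SHOULDERS": [(35, 10, 30, 40), (15, 50, 70, 50)],   # head over shoulders
    "DUAL FACE": [(10, 20, 35, 60), (55, 20, 35, 60)],        # two side-by-side boxes
}


def select_static_roi(menu_entry, frame_width, frame_height):
    """Return the pixel rectangles of the predefined pattern for a menu entry."""
    pattern = PREDEFINED_PATTERNS[menu_entry.strip().upper()]
    return [
        (frame_width * px // 100, frame_height * py // 100,
         frame_width * pw // 100, frame_height * ph // 100)
        for px, py, pw, ph in pattern
    ]


if __name__ == "__main__":
    print(select_static_roi("Single Face", 176, 144))
    print(select_static_roi("DUAL FACE", 176, 144))
```

Since the patterns are fixed, the corresponding MB maps could also be precomputed once per frame size rather than derived on each selection.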
Fig. 17 is a flow chart illustrating definition of an ROI pattern within a displayed video scene using an ROI description interface. The process shown in Fig. 17 may be used in combination with the drop-down menu of Fig. 15 or other input media. As shown in Fig. 17, ROI-aware video decoder 48 decodes the far-end video received from remote sender device 14 (144). User interface 42 then displays the far-end video to the local user (146). If the local user does not request near-end ROI selection for the far-end video (148), no ROI information is sent to remote video communication device 14. If near-end ROI selection is requested (148), however, user interface 42 presents an ROI description interface (150), e.g., drop-down menu 140 of Fig. 15.
Upon receiving the local user's ROI description (152), ROI controller 52 and ROI mapper 54 select an ROI pattern based on the description (154) and define an ROI MB map based on the selected ROI pattern (156). Again, the selected ROI pattern may be determined by analyzing the far-end video using conventional detection techniques and matching the ROI description to particular MBs within the far-end video. After the far-end ROI MB map is generated, ROI-aware video encoder 46 embeds the far-end ROI MB map in the encoded near-end video and transmits it to remote video communication device 14 for use in preferential encoding of the far-end ROI.
Fig. 18 is a flow chart illustrating resolution of ROI conflicts between sender and recipient devices 12, 14. In particular, Fig. 18 illustrates operation of authentication module 58 (Fig. 3 or Fig. 4) in resolving a conflict between a near-end ROI specified by the local user and a near-end ROI specified by the remote user. After near-end video is generated at the sender device (160), authentication module 58 determines whether a near-end ROI has been requested by the local user or the remote user (162). If not, all MBs are encoded normally (164), without preferential encoding of an ROI, and the resulting encoded video is sent to recipient video communication device 14 (166).
If a near-end ROI is requested (162), authentication module 58 determines whether there is a conflict (168) between the near-end ROI specified by the local user and the near-end ROI specified by the remote user. If no remote near-end ROI is specified, or if the local and remote near-end ROIs are consistent, authentication module 58 may pass the selected near-end ROI to ROI controller 52 for processing.
If there is no local near-end ROI but a remote near-end ROI has been selected, authentication module 58 may permit use of the remote near-end ROI. Alternatively, in some embodiments, authentication module 58 may permit use of the remote near-end ROI only if the remote user has been explicitly granted access, either through interaction with the local user or by an access level recorded in the address book. If there is no ROI conflict, ROI mapper 54 generates a near-end MB map based on the applicable near-end ROI and applies it to ROI-aware video encoder 46. ROI-aware video encoder 46 then preferentially encodes the MBs within the ROI of the near-end video (172).
If there is a conflict (168) between the local and remote near-end ROIs, authentication module 58 determines whether an access level has been assigned (174), e.g., in an address book stored locally in video communication device 12. If an access level has been assigned (174), authentication module 58 resolves the ROI conflict according to the access level (176). For example, the stored access level of the remote user may indicate that the remote user should be granted ROI control in preference to the local user. If no access level has been assigned (174), authentication module 58 seeks approval of remote ROI control from the local user (178). In particular, authentication module 58 may present a query via user interface 42 requesting permission for the remote user to exercise near-end ROI control.
If the local user approves, authentication module 58 passes the remote near-end ROI to ROI controller 52 for processing. If the local user does not approve, ROI controller 52 processes the local near-end ROI. In either case, ROI-aware video encoder 46 uses the selected ROI to preferentially encode the MBs that fall within the ROI of the near-end video (172), and the encoded near-end video is sent to remote recipient device 14 (166). In some cases, authentication module 58 may resolve ROI conflicts not only between the local user and one remote user but also among several possible remote users. The local user may actively grant one of the remote users the right to control the near-end ROI, or may assign access levels that prioritize ROI control among the different remote users. Typically, the right to control the ROI is granted exclusively to one user, e.g., the local user or one of the remote users.
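The decision flow of Fig. 18 can be summarized in a short sketch. Modeling access levels as integers, with the remote user prevailing only when the stored level exceeds the local user's level, is an assumption made here; the disclosure does not define how levels are encoded. The function name, the `ask_local_user` callback, and the default level are likewise hypothetical.

```python
def resolve_near_end_roi(local_roi, remote_roi, remote_access_level=None,
                         local_level=2, ask_local_user=lambda: False):
    """Pick the near-end ROI to encode. Returns (roi, source).

    source is "local", "remote", or None when no ROI was requested.
    remote_access_level is None when the address book holds no entry.
    """
    if local_roi is None and remote_roi is None:
        return None, None            # (162) no request: normal encoding
    if remote_roi is None or remote_roi == local_roi:
        return local_roi, "local"    # (168) no conflict
    if local_roi is None:
        return remote_roi, "remote"  # (168) only the remote ROI exists

    # Conflict between two different ROIs.
    if remote_access_level is not None:
        # (174)/(176) an assigned access level decides.
        if remote_access_level > local_level:
            return remote_roi, "remote"
        return local_roi, "local"

    # (178) no access level assigned: query the local user.
    if ask_local_user():
        return remote_roi, "remote"
    return local_roi, "local"


if __name__ == "__main__":
    local, remote = (40, 30, 60, 50), (10, 10, 32, 32)
    print(resolve_near_end_roi(local, remote))                          # local wins
    print(resolve_near_end_roi(local, remote, remote_access_level=3))   # remote wins
    print(resolve_near_end_roi(local, remote, ask_local_user=lambda: True))
```

In the stricter embodiments mentioned above, the remote-only branch would also check for an explicit grant before the remote ROI is used.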
Fig. 19 is a flow chart illustrating preferential decoding of ROI macroblocks in far-end video. As shown in Fig. 19, upon receiving far-end video from remote sender device 14 (180), ROI-aware video decoder 48 in local recipient device 12 determines whether a far-end ROI has been specified by the local user (182). If not, ROI-aware video decoder 48 decodes all MBs in the far-end video normally (184). If far-end ROI information has been specified by the local user, however, ROI-aware video decoder 48 preferentially decodes the ROI MBs in the received far-end video (186). The ROI MBs may be preferentially decoded by applying higher-quality interpolation equations or more robust error concealment techniques, relative to the interpolation equations and error concealment techniques applied to non-ROI MBs. Preferential decoding may also include preferential post-processing, such as higher-quality deblocking or deringing filters.
The techniques described herein may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the techniques may be realized in part by a computer-readable medium comprising program code containing instructions that, when executed, perform one or more of the methods described above. In this case, the computer-readable medium may comprise random access memory (RAM) such as synchronous dynamic random access memory (SDRAM), read-only memory (ROM), non-volatile random access memory (NVRAM), electrically erasable programmable read-only memory (EEPROM), flash memory, magnetic or optical data storage media, and the like.
The program code may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application-specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. In some embodiments, the functionality described herein may be provided within dedicated software modules or hardware units configured for encoding and decoding, or incorporated in a combined video encoder-decoder (CODEC).
Various embodiments have been described. These and other embodiments are within the scope of the following claims.

Claims (54)

1. A method comprising:
receiving, from a remote device, information specifying a region of interest (ROI) within near-end video encoded by a local device and received by the remote device; and
encoding the near-end video based on the ROI to enhance image quality of the ROI relative to non-ROI areas of the video.
2. The method of claim 1, further comprising transmitting the encoded near-end video to the remote device, and receiving, in the local device, far-end video encoded by the remote device.
3. The method of claim 1, further comprising receiving the information specifying the ROI with encoded far-end video received from the remote device, wherein the information specifying the ROI is embedded within the encoded far-end video.
4. The method of claim 1, further comprising receiving the information specifying the ROI by out-of-band signaling from the remote device.
5. The method of claim 1, further comprising:
receiving, in the local device, far-end video encoded by the remote device;
generating information specifying an ROI within the encoded far-end video; and
transmitting the ROI information to the remote device with the encoded near-end video.
6. The method of claim 1, further comprising:
receiving, in the local device, far-end video encoded by the remote device; and
decoding the encoded far-end video received from the remote device to enhance image quality of the ROI in the far-end video relative to non-ROI areas of the far-end video.
7. The method of claim 6, wherein decoding the encoded far-end video comprises applying higher-quality post-processing or error concealment techniques to the ROI in the far-end video relative to non-ROI areas of the far-end video.
8. The method of claim 1, further comprising generating, based on the information specifying the ROI, an MB map that identifies macroblocks (MBs) residing within the ROI.
9. The method of claim 1, wherein encoding the near-end video comprises applying higher-quality encoding or error protection techniques to the ROI in the near-end video relative to non-ROI areas of the near-end video.
10. The method of claim 1, further comprising authenticating a remote user associated with the remote device before encoding the near-end video based on the ROI.
11. The method of claim 10, wherein authenticating comprises determining whether the remote user is authorized to control encoding of the near-end video based on the ROI.
12. The method of claim 10, wherein authenticating comprises seeking, from a local user associated with the local device, authorization for the remote user to control encoding of the far-end video based on the ROI.
13. The method of claim 1, wherein receiving information from a remote device comprises receiving, from a plurality of remote devices, information specifying a plurality of ROIs within the near-end video, the method further comprising authenticating remote users associated with the remote devices to select one of the remote users to control encoding of the near-end video based on the corresponding ROI.
14. The method of claim 1, further comprising:
monitoring motion information associated with the encoded near-end video;
adjusting the ROI based on the motion information; and
encoding the near-end video based on the adjusted ROI.
15. The method of claim 14, further comprising generating, based on the information specifying the ROI, an MB map that identifies macroblocks (MBs) residing within the ROI, wherein adjusting the ROI comprises changing, based on the motion information, the status of an MB to be included within or excluded from the ROI.
16. The method of claim 1, wherein the information specifying the ROI comprises textual or verbal information, the method further comprising defining the ROI based on the textual or verbal information.
17. The method of claim 16, wherein defining the ROI comprises defining the ROI at an intermediate server in communication with at least one of the local device and the remote device.
18. A video encoding device comprising:
a region-of-interest (ROI) engine that receives, from a remote video communication device, information specifying a region of interest (ROI) within near-end video transmitted to the remote device; and
a video encoder that encodes the near-end video to enhance image quality of the ROI relative to non-ROI areas of the video.
19. The device of claim 18, wherein the video encoder transmits the encoded near-end video to the remote device, the device further comprising a video decoder that receives far-end video encoded by the remote device.
20. The device of claim 19, wherein the video decoder receives the information specifying the ROI with encoded far-end video received from the remote device, and wherein the information specifying the ROI is embedded within the encoded far-end video received from the remote device.
21. The device of claim 18, further comprising a video decoder that receives far-end video encoded by the remote device, wherein the video decoder receives the information specifying the ROI by out-of-band signaling from the remote device.
22. The device of claim 21, wherein the ROI engine generates information specifying an ROI within the encoded far-end video, and the video encoder transmits the ROI information to the remote device with the encoded near-end video.
23. The device of claim 18, further comprising a video decoder that decodes the encoded far-end video received from the remote device to enhance image quality of the ROI in the far-end video relative to non-ROI areas of the far-end video.
24. The device of claim 23, wherein the video decoder applies higher-quality post-processing or error concealment techniques to the ROI in the far-end video relative to non-ROI areas of the far-end video.
25. The device of claim 18, further comprising: an ROI mapper module that generates, based on the information specifying the ROI, an MB map identifying macroblocks (MBs) residing within the ROI; and an ROI controller that processes the information specifying the ROI for use by the ROI mapper module.
26. The device of claim 18, wherein the video encoder applies higher-quality encoding or error protection techniques to the ROI in the near-end video relative to non-ROI areas of the near-end video.
27. The device of claim 18, further comprising an authentication module that authenticates a remote user associated with the remote device before the near-end video is encoded based on the ROI, wherein the authentication module determines whether the remote user is authorized to control encoding of the near-end video based on the ROI.
28. The device of claim 27, wherein the authentication module seeks, from a local user associated with the device, authorization for the remote user to control encoding of the near-end video based on the ROI.
29. The device of claim 18, wherein the information received from the remote device comprises information, from a plurality of remote devices, specifying a plurality of ROIs within the near-end video, the system further comprising an authentication module that authenticates remote users associated with the remote devices to select one of the remote users to control encoding of the near-end video based on the corresponding ROI.
30. The device of claim 18, further comprising a tracking module that monitors motion information associated with the near-end video and adjusts the ROI based on the motion information, wherein the encoder encodes the near-end video based on the adjusted ROI.
31. The device of claim 30, further comprising an ROI mapper module that generates, based on the information specifying the ROI, an MB map identifying macroblocks (MBs) residing within the ROI, wherein adjustment of the ROI by the tracking module comprises changing, based on the motion information, the status of an MB to be included within or excluded from the ROI.
32. The device of claim 18, wherein the information specifying the ROI comprises textual or verbal information, the system further comprising an extraction module that defines the ROI based on the textual or verbal information.
33. The device of claim 18, wherein the information specifying the ROI comprises textual or verbal information, the system further comprising an intermediate extraction server that defines the ROI based on the textual or verbal information and is located remotely from the video communication device and the remote video communication device.
34. A computer-readable medium comprising instructions that cause a processor to:
receive, from a remote device, information specifying a region of interest (ROI) within near-end video encoded by a local device and received by the remote device; and
encode the near-end video to enhance image quality of the ROI relative to non-ROI areas of the video.
35. The computer-readable medium of claim 34, wherein the instructions cause the processor to transmit the encoded near-end video to the remote device, and receive, in the local device, far-end video encoded by the remote device.
36. The computer-readable medium of claim 34, wherein the instructions cause the processor to receive the information specifying the ROI with encoded far-end video received from the remote device, and wherein the information specifying the ROI is embedded within the encoded far-end video.
37. The computer-readable medium of claim 34, wherein the instructions cause the processor to receive the information specifying the ROI by out-of-band signaling from the remote device.
38. The computer-readable medium of claim 34, wherein the instructions cause the processor to generate information specifying an ROI within encoded far-end video received from the remote device, and transmit the ROI information to the remote device with the encoded near-end video.
39. The computer-readable medium of claim 34, wherein the instructions cause the processor to decode the encoded far-end video received from the remote device to enhance image quality of the ROI in the far-end video relative to non-ROI areas of the far-end video.
40. The computer-readable medium of claim 39, wherein the instructions cause the processor to decode the encoded far-end video by applying higher-quality post-processing or error concealment techniques to the ROI in the far-end video relative to non-ROI areas of the far-end video.
41. The computer-readable medium of claim 34, wherein the information specifying the ROI comprises an MB map identifying macroblocks (MBs) residing within the ROI.
42. The computer-readable medium of claim 34, wherein the instructions cause the processor to encode the near-end video by applying higher-quality encoding or error protection techniques to the ROI in the near-end video relative to non-ROI areas of the near-end video.
43. The computer-readable medium of claim 34, wherein the instructions cause the processor to determine, before encoding the near-end video based on the ROI, whether the remote user is authorized to control encoding of the near-end video based on the ROI, and wherein the instructions cause the processor to seek, from a local user associated with the local device, authorization for the remote user to control encoding of the near-end video based on the ROI.
44. The computer-readable medium of claim 34, wherein the information specifying a plurality of ROIs within the near-end video is received from a plurality of remote devices, and the instructions cause the processor to authenticate remote users associated with the remote devices to select one of the remote users to control encoding of the near-end video based on the corresponding ROI.
45. The computer-readable medium of claim 34, wherein the instructions cause the processor to:
monitor motion information associated with the encoded near-end video;
adjust the ROI based on the motion information; and
encode the near-end video based on the adjusted ROI.
46. The computer-readable medium of claim 45, wherein the information specifying the ROI comprises an MB map identifying macroblocks (MBs) residing within the ROI, and wherein the instructions cause the processor to adjust the ROI by changing, based on the motion information, the status of an MB to be included within or excluded from the ROI.
47. A method comprising:
generating information specifying a region of interest (ROI) within far-end video transmitted by a remote device and received by a local device; and
transmitting the information to the remote device for use in encoding the far-end video based on the ROI to enhance image quality of the ROI relative to non-ROI areas of the video.
48. The method of claim 47, wherein the information specifying the ROI is embedded within near-end video encoded by the local device and transmitted to the remote device.
49. The method of claim 47, wherein the information specifying the ROI is received by out-of-band signaling from the remote device.
50. The method of claim 47, wherein the information specifying the ROI comprises an MB map identifying macroblocks (MBs) residing within the ROI.
51. A video encoding device comprising:
a region-of-interest (ROI) engine that generates information specifying a region of interest (ROI) within far-end video received from a remote device; and
a video encoder that encodes near-end video and transmits the information specifying the ROI with the encoded near-end video to the remote device for use in encoding the far-end video based on the ROI, thereby enhancing image quality of the ROI relative to non-ROI areas of the far-end video.
52. The device of claim 51, wherein the information specifying the ROI is embedded within the near-end video transmitted to the remote device.
53. The device of claim 51, wherein the information specifying the ROI is transmitted to the remote device by out-of-band signaling.
54. The device of claim 51, wherein the information specifying the ROI comprises an MB map identifying macroblocks (MBs) residing within the ROI.
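
By way of illustration only, the MB map recited in several of the claims above, and its adjustment based on motion information, might be sketched as follows. This sketch is not part of the claims or of the disclosed embodiments. It assumes a rectangular ROI given in pixels, 16x16 macroblocks, and a single motion vector for the whole ROI; the names `build_mb_map` and `track_roi` are hypothetical.

```python
MB_SIZE = 16  # macroblock edge length in pixels (assumed: 16x16 macroblocks)


def build_mb_map(roi, frame_w, frame_h, mb=MB_SIZE):
    """Return rows of booleans, True where an MB overlaps the ROI rectangle.

    roi is (x, y, width, height) in pixels; this representation is an assumption.
    """
    x, y, w, h = roi
    cols = (frame_w + mb - 1) // mb   # number of MB columns, rounded up
    rows = (frame_h + mb - 1) // mb   # number of MB rows, rounded up
    return [
        [
            # Standard rectangle overlap test between the MB and the ROI.
            (c * mb < x + w and (c + 1) * mb > x
             and r * mb < y + h and (r + 1) * mb > y)
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def track_roi(roi, motion, frame_w, frame_h):
    """Shift the ROI by one motion vector (dx, dy) and keep it inside the frame.

    Assumes the ROI is no larger than the frame.
    """
    x, y, w, h = roi
    dx, dy = motion
    new_x = min(max(x + dx, 0), frame_w - w)
    new_y = min(max(y + dy, 0), frame_h - h)
    return (new_x, new_y, w, h)
```

For a QCIF frame (176x144 pixels, i.e. 11 by 9 macroblocks), a 32x32 ROI at (16, 16) marks four MBs. Shifting it by (16, 0) with `track_roi` and rebuilding the map changes the status of the MBs in column 1 to excluded and those in column 3 to included.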
CNA2006800145199A 2005-03-09 2006-03-08 Region-of-interest processing for video telephony Pending CN101167365A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US66020005P 2005-03-09 2005-03-09
US60/660,200 2005-03-09
US11/182,432 2005-07-15

Publications (1)

Publication Number Publication Date
CN101167365A true CN101167365A (en) 2008-04-23

Family

ID=39334927

Family Applications (2)

Application Number Title Priority Date Filing Date
CN200680014872.7A Expired - Fee Related CN101171841B (en) 2005-03-09 2006-03-08 Region-of-interest extraction for video telephony
CNA2006800145199A Pending CN101167365A (en) 2005-03-09 2006-03-08 Region-of-interest processing for video telephony

Country Status (1)

Country Link
CN (2) CN101171841B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6178204B1 (en) * 1998-03-30 2001-01-23 Intel Corporation Adaptive control of video encoder's bit allocation based on user-selected region-of-interest indication feedback from video decoder
US20040257432A1 (en) * 2003-06-20 2004-12-23 Apple Computer, Inc. Video conferencing system having focus control
US20050024487A1 (en) * 2003-07-31 2005-02-03 William Chen Video codec system with real-time complexity adaptation and region-of-interest coding

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3597780B2 (en) * 1998-03-20 2004-12-08 ユニヴァーシティ オブ メリーランド Lossy / lossless image coding for regions of interest

Also Published As

Publication number Publication date
CN101171841B (en) 2012-06-27
CN101171841A (en) 2008-04-30

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1117686

Country of ref document: HK

REG Reference to a national code

Ref country code: HK

Ref legal event code: WD

Ref document number: 1117686

Country of ref document: HK

C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20080423