CN101171841A - Region-of-interest extraction for video telephony - Google Patents

Region-of-interest extraction for video telephony

Info

Publication number
CN101171841A
Authority
CN
China
Prior art keywords
roi
video
information
description
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN200680014872.7A
Other languages
Chinese (zh)
Other versions
CN101171841B (en)
Inventor
李彦辑
哈立德·希勒米·厄勒-马列
蔡明章
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qualcomm Inc
Original Assignee
Qualcomm Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US11/183,072 external-priority patent/US8019175B2/en
Application filed by Qualcomm Inc filed Critical Qualcomm Inc
Publication of CN101171841A publication Critical patent/CN101171841A/en
Application granted granted Critical
Publication of CN101171841B publication Critical patent/CN101171841B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The disclosure is directed to techniques for region-of-interest (ROI) processing for video telephony (VT) applications. According to the disclosed techniques, a recipient device defines ROI information for video information transmitted by a sender device, i.e., far-end video information. The recipient device transmits the ROI information to the sender device. Using the ROI information transmitted by the recipient device, the sender device applies preferential encoding to an ROI within a video scene. ROI extraction may be applied to process a user description of an ROI to generate information specifying the ROI based on the description. The user description may be textual, graphical, or speech-based. An extraction module applies appropriate processing to generate the ROI information from the user description. The extraction module may reside locally with a video communication device, or reside in a distinct intermediate server configured for ROI extraction.

Description

Region-of-interest extraction for video telephony
This application claims the benefit of U.S. Provisional Application No. 60/660,200, filed March 9, 2005, and of co-pending U.S. Patent Application No. 11/183,072, entitled REGION-OF-INTEREST PROCESSING FOR VIDEO TELEPHONY, filed July 15, 2005.
Technical field
This disclosure relates to digital video encoding and decoding and, more particularly, to techniques for processing region-of-interest (ROI) information for video telephony (VT) applications.
Background
A number of different video encoding standards have been established for encoding digital video sequences. The Moving Picture Experts Group (MPEG), for example, has developed a number of standards, including MPEG-1, MPEG-2 and MPEG-4. Other examples include the International Telecommunication Union (ITU) H.263 standard and the emerging ITU H.264 standard. These video encoding standards generally improve the transmission efficiency of video sequences by encoding data in a compressed manner.
Video telephony (VT) permits users to share video and audio information to support applications such as videoconferencing. Exemplary video telephony standards include those defined by the Session Initiation Protocol (SIP), the ITU H.323 standard, and the ITU H.324 standard. In a VT system, users may send and receive video information, only receive video information, or only send video information. A recipient generally views received video information in the form in which it is transmitted from the sender.
Preferential encoding of a selected portion of the video information has been proposed. For example, a sender may specify a region of interest (ROI) to be encoded with higher quality for transmission to a recipient. The sender may wish to emphasize the ROI to a remote recipient. A typical example of an ROI is a human face, although a sender may wish to focus attention on other objects within a video scene. With preferential encoding of the ROI relative to non-ROI regions, a recipient is able to view the ROI more clearly.
Summary of the invention
This disclosure is directed to techniques for region-of-interest (ROI) processing for video telephony (VT). According to the disclosed techniques, a local recipient device defines ROI information for video that is encoded and transmitted by a remote sender device, i.e., far-end video. The local recipient device transmits the ROI information to the remote sender device. Using the ROI information transmitted by the recipient device, the sender device applies preferential encoding, such as higher-quality encoding or error protection, to the ROI within the video scene. In this manner, the recipient device is able to remotely control ROI encoding of the far-end video encoded by the sender device.
In addition to receiving far-end video, a recipient may also be equipped to send video, i.e., near-end video. Hence, devices participating in VT communication may act symmetrically as both sender and recipient of video information. Acting as a recipient, each device may define far-end ROI information for video encoded by the remote device acting as sender. Acting as a sender, each device may define near-end ROI information for video information transmitted to the other device acting as recipient. A sender or recipient device may be referred to as "ROI-aware," meaning that it is able to process ROI information provided by another device to support remote control of ROI video encoding.
Far-end ROI information permits a recipient to control remote ROI encoding by the sender device so as to view objects or regions within a received video scene more clearly. Near-end ROI information permits a sender to control local ROI encoding so as to emphasize objects or regions within a transmitted video scene. Hence, the sender's preferential encoding of an ROI may be based on ROI information generated by either the recipient or the sender. In addition, a recipient device may preferentially decode the ROI based on the ROI information, e.g., by applying higher-quality post-processing such as error concealment, deblocking, or deringing techniques.
To facilitate ROI processing, this disclosure further contemplates techniques for ROI selection, ROI mapping, ROI extraction, ROI signaling, ROI tracking, and access authentication of recipient devices to permit remote control of ROI encoding by a sender device. ROI selection may rely on predefined ROI patterns, verbal or textual ROI descriptions, or ROI drawing by the user. ROI mapping involves translating a selected ROI pattern into an ROI map, which may take the form of a macroblock (MB) map suitable for use by a video encoder.
ROI signaling may involve in-band or out-of-band signaling of ROI information from a recipient device to a sender device. ROI tracking involves dynamically adjusting the ROI map in response to ROI motion. Access authentication may involve granting access rights and levels to recipient devices for purposes of remote ROI control and of resolving ROI control conflicts between local and remote users or among multiple remote users.
ROI extraction may involve processing a user description of a region of interest (ROI) to generate information specifying the ROI based on the description. The near-end video may be encoded based on the information specifying the ROI so as to enhance the image quality of the ROI relative to non-ROI areas of the near-end video. The user description may be textual, graphical, or speech-based. An extraction module applies appropriate processing to generate the ROI information from the user description. The extraction module may reside locally with a video communication device, or reside in a distinct intermediate server configured for ROI extraction.
In one embodiment, this disclosure provides a method comprising receiving, from a remote device, information specifying a region of interest (ROI) within near-end video that is encoded by a local device and received by the remote device, and encoding the near-end video based on the ROI to enhance the image quality of the ROI relative to non-ROI areas of the video.
In another embodiment, this disclosure provides a video encoding device comprising: a region-of-interest (ROI) engine that receives, from a remote video communication device, information specifying an ROI within near-end video transmitted to the remote device; and a video encoder that encodes the near-end video to enhance the image quality of the ROI relative to non-ROI areas of the video.
In an additional embodiment, this disclosure provides a method comprising generating information specifying a region of interest (ROI) within far-end video that is transmitted by a remote device and received by a local device, and transmitting the information to the remote device for use in encoding the far-end video based on the ROI to enhance the image quality of the ROI relative to non-ROI areas of the video.
In another embodiment, this disclosure provides a video encoding device comprising: a region-of-interest (ROI) engine that generates information specifying an ROI within far-end video received from a remote device; and a video encoder that encodes near-end video and transmits the information specifying the ROI together with the encoded near-end video, for use by the remote device in encoding the far-end video based on the ROI to enhance the image quality of the ROI relative to non-ROI areas of the far-end video.
In another embodiment, this disclosure provides a method comprising receiving, from a user, a description of a region of interest (ROI) within near-end video generated by a local device, generating information specifying the ROI based on the description, and encoding the near-end video based on the information specifying the ROI to enhance the image quality of the ROI relative to non-ROI areas of the near-end video.
In an additional embodiment, this disclosure provides a video encoding device comprising: a region-of-interest (ROI) engine that receives a description of an ROI within near-end video encoded by the device and generates information specifying the ROI based on the description; and a video encoder that encodes the near-end video to enhance the image quality of the ROI relative to non-ROI areas of the video.
In another embodiment, this disclosure provides a video encoding system comprising: a first video communication device that encodes near-end video; a second video communication device that receives the near-end video from the first video communication device, wherein the second video communication device generates a user description of a region of interest (ROI) within the near-end video generated by the first video communication device; and an intermediate server, structurally distinct from the first and second video communication devices, that generates information specifying the ROI based on the description, wherein the first video communication device encodes the near-end video based on the information specifying the ROI to enhance the image quality of the ROI relative to non-ROI areas of the near-end video.
The techniques described herein may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the techniques may be realized in part by a computer-readable medium comprising program code containing instructions that, when executed, perform one or more of the methods described herein.
The details of one or more embodiments are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
Brief description of the drawings
Fig. 1 is a block diagram illustrating a video encoding and decoding system incorporating ROI-aware video encoder-decoders (CODECs).
Fig. 2 is a diagram illustrating definition of an ROI within a video scene presented on a display associated with a wireless communication device.
Fig. 3 is a block diagram illustrating a communication device incorporating an ROI-aware CODEC.
Fig. 4 is a block diagram illustrating another communication device that has an ROI-aware CODEC and further incorporates an ROI extraction module.
Fig. 5 is a block diagram illustrating distributed ROI extraction via an intermediate extraction server.
Fig. 6 is a block diagram illustrating distributed ROI extraction for multiple video telephony sessions.
Figs. 7A-7D are diagrams illustrating predefined ROI patterns for selection by a user.
Fig. 8 is a flow diagram illustrating generation of ROI information at a recipient device to control preferential ROI encoding of near-end video at a remote sender device.
Fig. 9 is a flow diagram illustrating processing of ROI information from a recipient device for preferential ROI encoding of near-end video at a sender device, in combination with ROI tracking.
Fig. 10 is a flow diagram illustrating processing of ROI information from a recipient device for preferential ROI encoding of near-end video at a sender device, in combination with user authentication.
Fig. 11 is a flow diagram illustrating selection of predefined ROI patterns.
Fig. 12 is a diagram illustrating definition of an ROI pattern in a displayed video scene by expansion and contraction of an ROI template.
Fig. 13 is a diagram illustrating definition of an ROI pattern in a displayed video scene by dragging an ROI template.
Fig. 14 is a diagram illustrating definition of an ROI pattern in a displayed video scene by drawing an ROI area with a stylus on a touchscreen.
Fig. 15 is a diagram illustrating definition of an ROI pattern in a displayed video scene using a drop-down menu of specified ROI objects that are dynamically extracted and tracked.
Fig. 16 is a diagram illustrating definition of an ROI pattern in a displayed video scene using a drop-down menu of specified ROI objects that are mapped to predefined ROI patterns such as those shown in Figs. 7A-7D.
Fig. 17 is a flow diagram illustrating definition of an ROI pattern in a displayed video scene using an ROI description interface.
Fig. 18 is a flow diagram illustrating resolution of ROI conflicts between sender and recipient devices.
Fig. 19 is a flow diagram illustrating preferential decoding of ROI macroblocks in far-end video.
Detailed description
Fig. 1 is a block diagram illustrating a video encoding and decoding system 10 incorporating ROI-aware video encoder-decoders (CODECs). As shown in Fig. 1, system 10 includes a first video communication device 12 and a second video communication device 14. Communication devices 12, 14 are connected by a transmission channel 16. Transmission channel 16 may be a wired or wireless medium. System 10 supports two-way video transmission between video communication devices 12, 14 for video telephony. Devices 12, 14 may operate in a substantially symmetrical manner. In some embodiments, however, one or both of video communication devices 12, 14 may be configured for only one-way communication to support ROI-aware video streaming.
For two-way applications, reciprocal encoding, decoding, multiplexing (MUX) and demultiplexing (DEMUX) components may be provided on opposite ends of channel 16. In the example of Fig. 1, video communication device 12 includes MUX/DEMUX component 18, ROI-aware video CODEC 20 and audio CODEC 22. Similarly, video communication device 14 includes MUX/DEMUX component 26, ROI-aware video CODEC 28 and audio CODEC 30. Each CODEC 20, 28 is "ROI-aware" in the sense that it is able to process ROI information provided remotely by the other video communication device 12, 14 or locally by its own video communication device.
Video communication devices 12, 14 may be implemented as wireless mobile terminals or wired terminals equipped for video streaming, video telephony, or both. To that end, video communication devices 12, 14 may further include appropriate wireless transmitter, receiver, modem, and processing electronics to support wireless communication. Examples of wireless mobile terminals include mobile radio telephones, mobile personal digital assistants (PDAs), mobile computers, or other mobile devices equipped with wireless communication capabilities and video encoding and/or decoding capabilities. Examples of wired terminals include desktop computers, video telephones, network appliances, set-top boxes, interactive televisions, and the like. Either of video communication devices 12, 14 may be configured to send video information, receive video information, or both send and receive video information.
For video telephony applications, it is generally desirable that device 12 support both video send and video receive capability. However, streaming video applications are also contemplated. In video telephony, and particularly mobile video telephony over wireless communication, bandwidth is a significant concern. Accordingly, selective allocation of additional encoding bits to an ROI, or other preferential encoding steps, can improve the image quality of a portion of the video while maintaining overall encoding efficiency. For preferential encoding, additional bits may be allocated to the ROI, while a reduced number of bits may be allocated to non-ROI regions, such as the background in a video scene.
In general, system 10 employs techniques for region-of-interest (ROI) processing for video telephony (VT) applications. As mentioned above, however, such techniques also may be applicable to video streaming applications. For purposes of illustration, it will be assumed that each video communication device 12, 14 is capable of operating as both a sender and a recipient of video information, and thereby operating as a full participant in a VT session. For video information transmitted from video communication device 12 to video communication device 14, video communication device 12 is the sender device and video communication device 14 is the recipient device. Conversely, for video information transmitted from video communication device 14 to video communication device 12, video communication device 12 is the recipient device and video communication device 14 is the sender device. When discussing video information to be encoded and transmitted by a local video communication device 12, 14, the video information will be referred to as "near-end" video. When discussing video information to be encoded by, and received from, a remote video communication device 12, 14, the video information will be referred to as "far-end" video.
According to the disclosed techniques, when operating as a recipient device, video communication device 12 or 14 defines ROI information for far-end video information received from a sender device. Again, video information received from a sender device is referred to as "far-end" video information in the sense that it is received from the other (sender) device situated at the far end of the communication channel. Likewise, ROI information defined for video information received from a sender device is referred to as "far-end" ROI information. A far-end ROI generally refers to the region within the far-end video that is of greatest interest to the recipient of the far-end video. The recipient device decodes the far-end video information and presents the decoded far-end video to a user via a display device. The user selects an ROI within a video scene presented by the far-end video.
The recipient device generates far-end ROI information based on the ROI selected by the user, and sends the far-end ROI information to the sender device. The far-end ROI information may take the form of an ROI macroblock (MB) map that defines the ROI in terms of the MBs that reside within the ROI. The ROI MB map may flag MBs within the ROI with a 1 and MBs outside the ROI with a 0, making it easy to identify the MBs included in (1) and excluded from (0) the ROI. An MB is a video block that forms part of a frame. The size of an MB may be 16 by 16 pixels, although other MB sizes are possible. Accordingly, an MB may refer to any video block, including but not limited to a macroblock as defined within a particular video coding standard such as MPEG-1, MPEG-2, MPEG-4, ITU H.263, or ITU H.264, or within any other standard.
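The 1/0 macroblock map described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `roi_macroblock_map`, the rectangular ROI given as pixel coordinates, and the rule that a macroblock is flagged when it overlaps the rectangle at all are hypothetical choices, not details taken from the disclosure.

```python
def roi_macroblock_map(frame_w, frame_h, roi, mb_size=16):
    """Return a 2D list of 0/1 flags, one flag per macroblock.

    roi is (x, y, width, height) in pixels (an assumed representation).
    A macroblock is flagged 1 when any of its pixels falls inside the
    ROI rectangle, and 0 otherwise.
    """
    x, y, w, h = roi
    cols = (frame_w + mb_size - 1) // mb_size  # ceiling division
    rows = (frame_h + mb_size - 1) // mb_size
    first_col, last_col = x // mb_size, (x + w - 1) // mb_size
    first_row, last_row = y // mb_size, (y + h - 1) // mb_size
    return [
        [1 if first_row <= r <= last_row and first_col <= c <= last_col else 0
         for c in range(cols)]
        for r in range(rows)
    ]


# A 176x144 frame holds 11 x 9 macroblocks of 16 x 16 pixels.
mb_map = roi_macroblock_map(176, 144, roi=(48, 32, 64, 48))
```

Flagging every macroblock that overlaps the rectangle errs on the side of including too much of the scene in the ROI; an implementation could equally require full containment.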
Using the far-end ROI information transmitted by the recipient device, the sender device applies preferential encoding to the corresponding ROI within the video scene. In particular, additional encoding bits may be allocated to the ROI, while a reduced number of encoding bits may be allocated to non-ROI regions, thereby improving the image quality of the ROI. In this manner, the recipient device is able to remotely control the ROI encoding applied to the far-end video information by the sender device. Preferential encoding, e.g., by preferential bit allocation or preferential quantization in the ROI area, applies higher-quality encoding to the ROI area than to non-ROI areas of the video scene. The preferentially encoded ROI permits the user of the recipient device to view an object or region more clearly. For example, the user of the recipient device may wish to view a face or some other object more clearly than background regions of the video scene.
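One simple way to picture preferential bit allocation is a weighted split of a per-frame bit budget over a 1/0 macroblock map. The sketch below is an assumption-laden illustration, not the allocation scheme of the disclosure: the function name `allocate_bits`, the `roi_weight` parameter, and the proportional rule are all hypothetical.

```python
def allocate_bits(mb_map, frame_budget, roi_weight=4.0):
    """Split a per-frame bit budget across macroblocks.

    mb_map is a 2D list of 0/1 flags (1 = inside the ROI). Each ROI
    macroblock receives roi_weight times the share of a non-ROI
    macroblock, and all shares sum to frame_budget. roi_weight is a
    hypothetical tuning knob, not a value from the source.
    """
    flags = [flag for row in mb_map for flag in row]
    total_weight = sum(roi_weight if flag else 1.0 for flag in flags)
    unit = frame_budget / total_weight
    return [[unit * (roi_weight if flag else 1.0) for flag in row]
            for row in mb_map]


# One ROI macroblock and three background macroblocks sharing 700 bits.
shares = allocate_bits([[1, 0], [0, 0]], frame_budget=700)
```

Because the total budget is fixed, any gain for the ROI comes directly out of the non-ROI share, which matches the trade-off described in the paragraph above.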
When operating as a sender device, video communication device 12 or 14 may also define ROI information for video information that is transmitted by the sender device. Again, video information generated in the sender device is referred to as "near-end" video in the sense that it is generated at the near end of the communication channel. ROI information generated by the sender device is referred to as "near-end" ROI information. A near-end ROI generally refers to a region of the near-end video that the sender wants to emphasize to a recipient. Hence, an ROI may be specified by a recipient device user as far-end ROI information, or by a sender device user as near-end ROI information. The sender device presents the near-end video to a user via a display device. The user associated with the sender device selects an ROI within a video scene presented by the near-end video. The sender device encodes the near-end video using the user-selected ROI such that the ROI in the near-end video is preferentially encoded, e.g., with higher-quality encoding, relative to non-ROI areas.
The near-end ROI selected by a local user at the sender device allows the user of the sender device to emphasize regions or objects within the video scene, and thereby direct such regions or objects to the attention of the recipient device user. Notably, the near-end ROI selected by the sender device user need not be transmitted to the recipient device. Instead, the sender device uses the selected near-end ROI information to locally encode the near-end video before it is transmitted to the recipient device. In some embodiments, however, the sender device may send ROI information to the recipient device to permit application of preferential decoding techniques, such as higher-quality error correction (e.g., error concealment) or post-processing (e.g., deblocking and deringing filters).
If ROI information is provided by both the sender device and the recipient device, the sender device applies either the far-end ROI information received from the recipient device or the locally generated near-end ROI information to encode the near-end video. ROI conflicts may arise between the near-end and far-end ROI selections provided by the sender device and the recipient device. Such conflicts may require resolution, e.g., active resolution by a local user or resolution according to specified access rights and levels, as will be described elsewhere in this disclosure. In either case, the sender device preferentially encodes the ROI based on near-end ROI information provided locally by the sender device or on ROI information provided remotely by the recipient device.
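The choice between near-end and far-end ROI information can be sketched as a small decision function. Everything specific here is an assumption for illustration: the function name `select_roi`, the integer access levels, the `user_choice` values, and the tie-breaking rule in favor of the local side are hypothetical and are not specified in this part of the disclosure.

```python
def select_roi(near_end, far_end, near_end_level=1, far_end_level=0,
               user_choice=None):
    """Pick the ROI information the sender's encoder should honor.

    near_end / far_end: ROI information, or None if that side supplied none.
    user_choice: "near" or "far" when the local user actively resolves
    the conflict; None otherwise.
    Levels are hypothetical integer access levels; the higher level wins
    and ties go to the local (near-end) selection.
    """
    # No conflict: use whichever side supplied ROI information, if any.
    if near_end is None or far_end is None:
        return near_end if far_end is None else far_end
    # Active resolution by the local user takes precedence.
    if user_choice == "near":
        return near_end
    if user_choice == "far":
        return far_end
    # Otherwise fall back to the configured access levels.
    return far_end if far_end_level > near_end_level else near_end


chosen = select_roi("face", "whiteboard", near_end_level=1, far_end_level=2)
```

In this sketch the remote selection wins only when it has been granted a strictly higher level, so a sender that grants no rights keeps full local control.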
To facilitate ROI processing, this disclosure further contemplates techniques for ROI selection, ROI mapping, ROI signaling, ROI tracking, and access authentication of recipient devices to permit remote control of ROI encoding by a sender device. As will be described, the different ROI selection techniques applied by a recipient device or a sender device may involve selection of predefined ROI patterns, verbal or textual ROI description, or ROI drawing by a user. In a recipient device, ROI mapping involves translating a selected far-end or near-end ROI pattern into an ROI map, which may take the form of a macroblock (MB) map. ROI signaling may involve in-band or out-of-band signaling of far-end ROI information from a recipient device to a sender device. ROI tracking involves dynamically adjusting, in response to ROI motion, the far-end ROI map generated by the recipient device or the local near-end ROI map generated by the sender device itself. Access authentication may involve granting access rights and levels to recipient devices for purposes of remote control of the far-end ROI and of resolving ROI control conflicts between recipient and sender devices.
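As a rough picture of the ROI tracking step mentioned above, the sketch below shifts a 1/0 macroblock map by an estimated displacement. It assumes the motion estimate is already available and expressed in whole macroblocks; the function name `shift_roi_map` and the clipping behavior at the frame edge are hypothetical choices for illustration only.

```python
def shift_roi_map(mb_map, dx, dy):
    """Shift a 0/1 macroblock map by (dx, dy) macroblocks.

    Positive dx moves the ROI right and positive dy moves it down.
    ROI macroblocks that would leave the frame are dropped.
    """
    rows, cols = len(mb_map), len(mb_map[0])
    shifted = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mb_map[r][c]:
                nr, nc = r + dy, c + dx
                if 0 <= nr < rows and 0 <= nc < cols:
                    shifted[nr][nc] = 1
    return shifted


# The ROI moves one macroblock right and one down between frames.
tracked = shift_roi_map([[1, 0, 0], [0, 0, 0]], dx=1, dy=1)
```

A real tracker would also have to estimate the displacement and handle changes in ROI size or shape, which this sketch leaves out.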
System 10 can support standard, the ITU visual telephone of standard or other standard H.324 H.323 according to session initiation protocol (SIP), ITU.Each video CODEC 20,28 according to for example MPEG-2, MPEG-4, ITU H.263 or the video compression standard of ITUH.264 produce encoded video data.As further showing among Fig. 1, video CODEC20,28 can with audio frequency CODEC22 separately, 30 integrated, and comprise the Voice ﹠ Video part of suitable MUX/DEMUX assembly 18,26 with data streams.MUX/DEMUX unit 18,26 can meet ITU H.223 multiplexer agreement or other agreement of User Datagram Protoco (UDP) (UDP) for example.
Fig. 2 be explanation with display 36 that radio communication device 38 is associated on the figure of definition of the interior ROI32 of the video scene that presents 34.In the example of Fig. 2, ROI32 is a rectangle region, and it contains the people's who presents in the video scene 34 face 39, and needs improve or any image or the object of the coding of enhancing but ROI can contain.In VT uses, the people who presents in the video scene 34 will be the user of long-range sender's device usually, and it is a side of the video conference carried out with user as the radio communication device 38 of recipient's device operation.ROI32 constitutes far-end ROI, because the ROI of its definition from the video scene of long-range sender's device transmission.According to this disclosure, far-end ROI32 is transferred to sender's device to specify the priority encoding to the video scene zone in the ROI.In this way, the picture quality that the local user of recipient's device 38 can Long-distance Control far-end ROI32.As describing, the size of far-end ROI32, shape and position can be fixing or adjustable, and can be defined in many ways, describe or regulate.
ROI 32 permits the recipient device user to view individual objects within video scene 34, such as a person's face 39, more clearly. The face 39 within ROI 32 is encoded with higher image quality relative to the non-ROI areas (e.g., background regions) of video scene 34. In this way, the user is able to view facial expressions, lip movement, eye movement, and so forth more clearly. However, ROI 32 may alternatively be used to specify objects other than the face. In general, the ROI in VT applications can be highly subjective and may differ from user to user. The desired ROI also depends on how VT is used. In some cases, VT may be used to view and evaluate objects, in contrast to videoconferencing.
For instance, a husband may use a VT application to show a gift he intends to buy in an airport gift shop. He may wish to obtain a second opinion from his wife in a timely and interactive manner. In doing so, he can make a decision immediately, since his flight is about to depart. In this case, the ROI is the region covering the gift that the husband is considering. By allowing the wife (or the husband) to select the ROI, better encoding or better quality of service can be achieved for that specific ROI, thereby allowing the wife to view the gift more clearly.
As another example, two or more engineers may be engaged in a VT call that involves presenting and discussing various equations or diagrams on a whiteboard. In this case, the remote user may wish to view a region of the whiteboard with better image quality, e.g., to see the details of an equation more clearly. To that end, the remote user selects an ROI that encompasses the equation. In addition, as an engineer adds content to the whiteboard, the remote user may wish to move the ROI to follow the newly added subject matter. The ability of the remote user to specify the ROI can significantly improve the exchange of information in a technical discussion.
The ROI techniques described herein improve not only the video quality of the ROI but also the video interaction between two users. In general, conventional VT applications merely combine two one-way video transmissions, and any interaction takes place verbally. In conventional VT applications, there is generally no interaction on the video side. Permitting the recipient device user to have at least limited control over the video content received from the sender device during a VT call allows greater video interaction.
In this way, a VT application can be designed to enable the recipient device user to select an ROI and send ROI information back to the sender device so that the ROI receives preferential treatment, such as higher-quality encoding (e.g., by allocating more coding bits) or stronger error protection (e.g., intra-MB refresh). In effect, by specifying the far-end ROI, the recipient device user remotely controls the encoder of the sender device. In addition, this far-end ROI information can be used by the ROI-aware video decoder in the device that receives the far-end video to perform better post-processing, such as error concealment, deblocking, or deringing. Remote control of the video encoder by the recipient of the encoded video differs from merely controlling the pan, tilt, zoom, or focus of a remote camera. By contrast, with remote ROI processing, the user can influence the encoding quality applied to a given region. In some embodiments, however, remote camera control may be provided in combination with remote video encoder control.
Fig. 3 is a block diagram illustrating a video communication device 12 incorporating an ROI-aware CODEC. Although Fig. 3 depicts video communication device 12 of Fig. 1, video communication device 14 may be similarly constructed. Again, video communication device 12 or 14 may serve as a recipient device, a sender device, and preferably both a recipient and a sender device. As shown in Fig. 3, video communication device 12 includes ROI-aware CODEC 20, video capture device 40, and user interface 42. Although channel 16 is shown in Fig. 3, the MUX/DEMUX and audio components are omitted for ease of illustration. Video capture device 40 may be a video camera that is integrated with, or operably coupled to, video communication device 12. In some embodiments, for instance, video capture device 40 may be integrated with a mobile telephone to form a so-called camera phone. In this way, video capture device 40 can support mobile VT applications.
User interface 42 may include a display device, such as a liquid crystal display (LCD), a plasma screen, a projector display, or any other display device that can be integrated with, or operably coupled to, video communication device 12. The display device presents video images to the user of video communication device 12. The video images may include near-end video obtained locally by video capture device 40 and far-end video transmitted remotely from a sender device. In addition, user interface 42 may include any of a variety of user input media, including hard keys, soft keys, various pointing devices, a stylus, and the like, for entry of information by the user of video communication device 12. In some embodiments, the display device and user input media of user interface 42 may be integrated with a mobile telephone. The user of video communication device 12 relies on user interface 42 to view the far-end video and, optionally, the near-end video. In addition, the user relies on user interface 42 to enter information that defines or selects the far-end ROI and, optionally, the near-end ROI.
As further shown in Fig. 3, ROI-aware CODEC 20 includes ROI engine 44, ROI-aware video encoder 46, and ROI-aware video decoder 48. ROI-aware video encoder 46 encodes the near-end video ("near-end video") obtained from video capture device 40 for transmission to a remote recipient device. Again, the term "near-end" denotes video generated locally within video communication device 12, in contrast to "far-end" video received from a remote video communication device (e.g., video communication device 14). In the example of Fig. 3, ROI-aware video encoder 46 preferentially encodes the near-end ROI using near-end ROI information obtained from a remote recipient ("remote near-end ROI"). The remote recipient is the user associated with remote video communication device 14.
From the perspective of the remote user, the remote near-end ROI is far-end ROI when it is transmitted by remote device 14; from the perspective of the local user of device 12, it is referred to as remote near-end ROI when it is received. That is, the perspective of device 12 or 14 as sender or recipient determines whether video and ROI are considered to pertain to near-end or far-end video. Again, the user of local device 12 specifies the far-end ROI in order to remotely control video encoding at remote device 14. When the user of remote device 14 receives that far-end ROI, however, it is considered remote near-end ROI, because it pertains to the near-end video being encoded by device 14, which is the local device from that user's perspective. In general, perspective matters for the terminology used in this disclosure.
Optionally, ROI-aware video encoder 46 may use near-end ROI information obtained from the local user of video communication device 12 ("local near-end ROI"). The local near-end ROI may also be referred to as sender-driven ROI, because it is generated by the sender of the encoded near-end video. Local near-end ROI information is used by local encoder 46 and is generally not sent to the other video communication device 14, unless the video decoder in remote device 14 is designed to apply preferential decoding to the near-end ROI specified by the user of sender device 12. The remote near-end ROI may also be referred to as receiver-driven ROI, because it is generated by the remote recipient of the encoded near-end video. The remote near-end ROI permits the recipient of the video produced by video communication device 12 to control the ROI encoding performed by ROI-aware encoder 46, whereas the local near-end ROI permits the sender of that video to control the ROI encoding performed by ROI-aware encoder 46. In some cases, as will be described, the remote and local ROI definitions may conflict, so that conflict resolution is required.
Local and remote near-end ROI information may be provided to ROI-aware encoder 46 as a near-end ROI macroblock (MB) map ("near-end ROI MB map"). The near-end ROI MB map identifies the particular MBs that reside within the receiver-driven near-end ROI or the sender-driven near-end ROI. ROI-aware encoder 46 preferentially encodes the ROI in the near-end video with higher-quality encoding, stronger error protection, or both, to improve the image quality of the ROI when viewed, for example, by the remote user at remote video communication device 14. Better error protection for the ROI may be especially desirable in wireless telephony applications. The resulting encoded near-end video ("encoded near-end video") is then transmitted to remote device 14.
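As an illustration of how an ROI MB map might steer higher-quality encoding, the following minimal sketch assigns a lower quantization parameter (QP) to MBs inside the ROI. The function name `qp_map`, the base QP, the ROI offset, and the 1 to 31 QP range are assumptions made for this example and are not specified in the disclosure.

```python
def qp_map(roi_mb_map, base_qp=20, roi_delta=-6, min_qp=1, max_qp=31):
    """Return a per-macroblock QP grid; ROI MBs get a lower QP (finer quantization).

    roi_mb_map is a list of rows, each a list of booleans (True = MB in ROI).
    All numeric defaults are illustrative assumptions.
    """
    def clamp(qp):
        return max(min_qp, min(max_qp, qp))

    return [
        [clamp(base_qp + roi_delta) if in_roi else clamp(base_qp) for in_roi in row]
        for row in roi_mb_map
    ]
```

A lower QP spends more coding bits on the ROI, which corresponds to the higher-quality encoding mentioned above; stronger error protection would need a separate mechanism.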
As will be explained, ROI-aware video encoder 46 also transmits far-end ROI information ("far-end ROI") generated by the local user of video communication device 12 for the far-end video received from remote video communication device 14. The far-end ROI serves as receiver-driven ROI for the video encoded by remote video communication device 14. In effect, the far-end ROI information transmitted by video communication device 12 permits at least partial control of the encoder of the far-end video produced by remote video communication device 14, just as the remote near-end ROI received by ROI-aware decoder 48 is used by video communication device 12 to control ROI-aware video encoder 46. In this way, each video communication device 12, 14 can influence the ROI encoding in the far-end video produced by the other device.
The far-end ROI information transmitted by video communication device 12 may be conveyed as in-band or out-of-band signaling information. In the case of in-band signaling, the far-end ROI information may be embedded within the encoded near-end video bitstream that is transmitted to remote video communication device 14. For instance, the MPEG-4 bitstream format includes a field called "user_data", which can be used to embed information that describes the bitstream. The "user_data" field, or a similar field in other bitstream formats, can be used to embed far-end ROI information without violating bitstream compliance. Alternatively, ROI information may be embedded within the video bitstream by so-called data hiding techniques such as steganography.
ROI-aware video decoder 48 is configured to look for ROI information in the user_data field, or elsewhere, in the far-end video incoming from the remote device. In the case of out-of-band signaling, the far-end ROI information may be conveyed using a signaling protocol such as ITU H.245 or SIP. In either case, the far-end ROI information may take the form of an ROI MB map or of physical coordinates that define the position and/or size of the far-end ROI. Once decoder 48 receives the far-end video bitstream, it retrieves the ROI information according to the format agreed upon with the remote sender device and passes the ROI information to authentication module 58 to obtain access permission for near-end ROI control before the remote near-end ROI is provided to video encoder 46.
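The in-band signaling path described above can be sketched as follows. The start code value `0x000001B2` is the MPEG-4 Visual user_data start code; everything else, including the `ROI1` tag, the four 16-bit fields, and the function names, is a hypothetical payload layout chosen for illustration, since the disclosure leaves the format to agreement between the devices.

```python
import struct

USER_DATA_START_CODE = b"\x00\x00\x01\xb2"  # MPEG-4 Visual user_data start code
ROI_TAG = b"ROI1"  # hypothetical marker; not defined by any standard


def pack_roi_user_data(x, y, width, height):
    """Build a user_data chunk carrying one rectangular ROI in pixel coordinates."""
    return USER_DATA_START_CODE + ROI_TAG + struct.pack(">4H", x, y, width, height)


def find_roi_user_data(bitstream):
    """Return (x, y, width, height) from the first ROI chunk, or None if absent."""
    marker = USER_DATA_START_CODE + ROI_TAG
    pos = bitstream.find(marker)
    if pos < 0:
        return None
    start = pos + len(marker)
    payload = bitstream[start:start + 8]
    if len(payload) < 8:
        return None  # chunk was truncated
    return struct.unpack(">4H", payload)
```

This sketch ignores start-code emulation: a payload containing the byte sequence 00 00 01 could be mistaken for a start code, so a real format would need to escape or restrict field values.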
In addition to controlling the remote video encoder to preferentially encode the ROI in the far-end video, the far-end ROI information can also be applied to the local video decoder to preferentially decode the MBs within the ROI of the far-end video. For instance, as further shown in Fig. 3, the same far-end ROI MB map that ROI mapper 54 generates for transmission to the remote encoder may be provided to ROI-aware video decoder 48. ROI-aware video decoder 48 uses the ROI MB map to preferentially decode MBs in the far-end video received from remote video communication device 14. For instance, ROI-aware video decoder 48 may apply better post-processing to ROI MBs than to non-ROI MBs. Additionally or alternatively, ROI-aware video decoder 48 may apply more robust error concealment techniques to ROI MBs than to non-ROI MBs. In this way, ROI-aware video decoder 48 relies on the far-end ROI information generated by the local user to preferentially decode the ROI portion of the incoming far-end video and achieve enhanced image quality.
ROI-aware video decoder 48 receives incoming far-end video from a remote video communication device (e.g., video communication device 14 of Fig. 1). ROI-aware video decoder 48 decodes the far-end video and provides the decoded video to user interface 42 for presentation to the local user on the display device. In addition, as mentioned above, ROI-aware video decoder 48 receives remote near-end ROI information ("remote near-end ROI") from remote video communication device 14. The near-end ROI information received by ROI-aware video decoder 48 is generated by the user of remote video communication device 14 to specify an ROI in the video transmitted by video communication device 12. As mentioned above, the remote near-end ROI information received by ROI-aware video decoder 48 is used to remotely control ROI-aware video encoder 46 so that it preferentially encodes the ROI of the near-end video produced by video communication device 12. As also mentioned above, the remote near-end ROI is conveyed by in-band or out-of-band signaling techniques.
With further reference to Fig. 3, ROI-aware video encoder 46 and ROI-aware video decoder 48 interact with ROI engine 44. ROI engine 44 processes local and remote near-end ROI information for use in encoding and transmitting the near-end video bitstream from video capture device 40. In addition, ROI engine 44 processes the far-end ROI information provided via user interface 42 for encoding and transmission to remote video communication device 14. ROI engine 44 includes ROI controller 52, ROI mapper 54, ROI tracking module 56, and authentication module 58. In some embodiments, ROI tracking module 56 and authentication module 58 may be optional.
ROI-aware video encoder 46, ROI-aware video decoder 48, ROI controller 52, ROI mapper 54, ROI tracking module 56, and authentication module 58 may be formed in a variety of ways, as discrete functional modules or as a monolithic module that encompasses the functionality ascribed to each module. In either case, the components of ROI-aware CODEC 20 (including ROI engine 44, video encoder 46, and video decoder 48) may be implemented in hardware, software, firmware, or a combination thereof. For instance, such components may operate as software processes executing on one or more microprocessors or digital signal processors (DSPs), one or more application-specific integrated circuits (ASICs), one or more field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. If implemented in software, the techniques may be realized in part by a computer-readable medium comprising program code containing instructions that, when executed in a processor or DSP, perform one or more of the methods described herein.
In operation, the user of video communication device 12 selects either the near-end video produced by video capture device 40 or the far-end video decoded by ROI-aware video decoder 48 for viewing on the display device associated with user interface 42. In some embodiments, picture-in-picture (PIP) functionality permits the user to view the near-end video and the far-end video simultaneously. To view the near-end or far-end video for purposes of ROI definition, the user may manipulate user interface 42 to invoke an ROI definition mode. By default, video communication device 12 may handle video encoding and decoding without regard to ROI. By entering the ROI definition mode, the user activates the ROI-aware encoding and decoding aspects of video communication device 12. Alternatively, ROI-aware encoding and decoding may be the default mode.
When the far-end video is presented, the user indicates an ROI in the far-end video using any of a variety of techniques, which will be described in greater detail. The far-end ROI highlights a region or object within the video scene that is of interest to the user or that requires higher image quality. User interface 42 generates a far-end ROI indication based on the user input. The ROI indication may be further processed by ROI engine 44 to produce far-end ROI information for transmission to video communication device 14.
Alternatively, the user may select the near-end video obtained from video capture device 40 for ROI definition. When the near-end video is presented, the user may optionally indicate an ROI in the near-end video using techniques similar or identical to those used for ROI indication in the far-end video. The near-end ROI or far-end ROI may be specified initially at the start of a VT call, or at any time during the VT call. In some embodiments, the initial ROI may be updated by the local user or the remote user, or updated automatically by ROI tracking module 56. If the ROI is updated automatically, the user need not continue to enter ROI information. Instead, the ROI is maintained based on the user's initial input until the user changes or discontinues the ROI.
User interface 42 generates a local near-end ROI indication based on the indication provided by the user. Like the far-end ROI indication, the near-end ROI indication may be further processed by ROI engine 44. The near-end ROI indication highlights (i.e., by increasing image quality) a region or object within the video scene that the user wishes to emphasize to the remote user. The local user may select the near-end ROI or far-end ROI by selecting a predefined ROI pattern via user interface 42 or by drawing an ROI pattern. Drawing an ROI pattern may involve free-hand drawing with a stylus, or resizing or repositioning a default ROI pattern.
In the example of Fig. 3, user interface 42 provides the local near-end ROI indication (if provided) and the far-end ROI indication to ROI controller 52 within ROI engine 44. In addition, ROI controller 52 receives the remote near-end ROI from ROI-aware video decoder 48 via authentication module 58. In particular, ROI-aware video decoder 48 detects the presence of remote near-end ROI information in the received far-end video stream, or the presence of remote near-end ROI information conveyed by out-of-band signaling, and provides the remote near-end ROI information to authentication module 58. The local near-end ROI and far-end ROI indications may be expressed in terms of coordinates within the video frames of the near-end video or far-end video, respectively. The ROI coordinates may be x-y coordinates within the video frame. However, the x-y coordinates are processed to produce an ROI MB map for use by encoder 46 or decoder 48, as will be explained.
ROI controller 52 processes the local near-end ROI, remote near-end ROI, and far-end ROI, and applies them to ROI mapper 54. ROI mapper 54 converts each set of ROI coordinates into a macroblock (MB) map. More particularly, ROI mapper 54 generates a far-end ROI MB map, which specifies the MBs in the far-end video that correspond to the far-end ROI indicated by the local user. In addition, ROI mapper 54 generates a near-end ROI MB map, which specifies the MBs in the near-end video that correspond to the local near-end ROI, the remote near-end ROI, or a combination of both.
For predefined ROI patterns, ROI mapping is straightforward. Each predefined ROI pattern may have a likewise predefined MB map. For ROI patterns that are drawn, repositioned, or resized, however, ROI mapper 54 selects the MB boundaries that most closely conform to the coordinates of the ROI pattern specified by the user. For instance, if the specified ROI crosses an MB, ROI mapper 54 places the ROI boundary at the outer edge or the inner edge of the pertinent MB. In other words, ROI mapper 54 may be configured to include in the ROI MB map only those MBs that lie entirely within the ROI, or to also include MBs that lie partially within the ROI. In either case, the ROI comprises a set of whole MBs that closely approximates the specified ROI. Again, video encoder 46 or video decoder 48 operates at the MB level and will generally require the ROI to be translated into an MB map. By designating individual MBs as included in or excluded from the ROI, the ROI MB map permits an ROI to be defined with an irregular or non-rectangular shape.
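The conversion from x-y coordinates to an MB map can be sketched as below for a rectangular ROI. The function name `roi_to_mb_map`, the `include_partial` flag, and the 16x16 macroblock size are assumptions made for this example; the flag selects between snapping the ROI boundary to the outer edge (partially covered MBs included) or the inner edge (only fully covered MBs included).

```python
MB_SIZE = 16  # assumed macroblock edge length in pixels


def roi_to_mb_map(roi, frame_w, frame_h, include_partial=True):
    """Convert a pixel rectangle (x, y, w, h) into a grid of booleans, one per MB."""
    x, y, w, h = roi
    cols = (frame_w + MB_SIZE - 1) // MB_SIZE
    rows = (frame_h + MB_SIZE - 1) // MB_SIZE
    mb_map = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            left, top = c * MB_SIZE, r * MB_SIZE
            right, bottom = left + MB_SIZE, top + MB_SIZE
            if include_partial:
                # Outer edge: keep any MB that overlaps the rectangle.
                hit = left < x + w and right > x and top < y + h and bottom > y
            else:
                # Inner edge: keep only MBs that lie entirely inside the rectangle.
                hit = left >= x and right <= x + w and top >= y and bottom <= y + h
            mb_map[r][c] = hit
    return mb_map
```

Because the result is a per-MB boolean grid, the same representation can also hold irregular ROIs, for example the union of several rectangles or a free-hand outline rasterized at MB resolution.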
ROI-aware video encoder 46 transmits the far-end ROI MB map to remote video communication device 14, either within the encoded near-end video or by out-of-band signaling. The near-end ROI MB map is not transmitted to remote video communication device 14. Instead, the near-end ROI MB map is used by ROI-aware video encoder 46 to preferentially encode the specified MBs in the near-end video, with higher-quality encoding or stronger error protection, before transmission to remote video communication device 14. Hence, ROI-aware video encoder 46 transmits to remote video communication device 14 the encoded near-end video, with its preferentially encoded ROI, together with the far-end ROI information.
ROI tracking module 56 tracks changes in the ROI region of the near-end video. If the VT application resides within a mobile video communication device, for instance, the user may move from time to time, causing the user's position to change relative to the previously specified ROI. In addition, even when the user's position is stable, other objects within the ROI may move out of the ROI region. For instance, a boat on a lake may pitch or move from side to side with the motion of the waves. To avoid the need for the user to redefine the ROI whenever movement occurs, ROI tracking module 56 may be provided to automatically track objects within the ROI region.
In the example of Fig. 3, ROI tracking module 56 receives motion information from the encoded near-end video produced by ROI-aware video encoder 46. The motion information may take the form of motion vectors for MBs in the encoded near-end video, permitting closed-loop control of the ROI MB map definition by ROI mapper 54. Based on the motion information, ROI tracking module 56 generates incremental position adjustments to the near-end ROI MB map and provides the adjustments to ROI mapper 54. A position adjustment may take the form of a change in the status of MBs as either included in or excluded from the ROI.
If the motion information indicates substantial movement of the ROI, the status of MBs in the ROI MB map may change. Typically, the status of MBs at the outer boundary of the ROI will change. In response to the position adjustment, ROI mapper 54 shifts the ROI specified by the near-end ROI MB map so that the ROI position adapts, on a frame-by-frame basis, to the motion in the encoded near-end video. ROI tracking module 56 and ROI mapper 54 cooperate to adjust the ROI position automatically within the video scene as motion is detected. In this way, ROI engine 44 adjusts the ROI to track objects moving within the ROI.
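One simple way to realize this kind of motion-driven adjustment is sketched below. It is not the tracking method of the disclosure, only an illustration: the function name `track_roi` is hypothetical, motion vectors are assumed to give the displacement of content from the previous frame to the current one in pixels as (dx, dy) per MB, and the ROI is shifted rigidly by the mean displacement of its MBs.

```python
def track_roi(mb_map, motion_vectors, mb_size=16):
    """Shift an ROI MB map by the mean motion of its MBs, in whole macroblocks.

    mb_map: list of rows of booleans (True = MB in ROI).
    motion_vectors: same shape, each entry an assumed (dx, dy) pixel displacement.
    """
    rows, cols = len(mb_map), len(mb_map[0])
    roi = [(r, c) for r in range(rows) for c in range(cols) if mb_map[r][c]]
    if not roi:
        return [row[:] for row in mb_map]
    mean_dx = sum(motion_vectors[r][c][0] for r, c in roi) / len(roi)
    mean_dy = sum(motion_vectors[r][c][1] for r, c in roi) / len(roi)
    # Truncate toward zero: the map moves only once the mean displacement
    # in this frame reaches a full macroblock.
    shift_c = int(mean_dx / mb_size)
    shift_r = int(mean_dy / mb_size)
    shifted = [[False] * cols for _ in range(rows)]
    for r, c in roi:
        nr, nc = r + shift_r, c + shift_c
        if 0 <= nr < rows and 0 <= nc < cols:
            shifted[nr][nc] = True  # MBs pushed outside the frame are dropped
    return shifted
```

This version looks at one frame at a time; slow motion smaller than one MB per frame would need the displacement to be accumulated across frames before it triggers a shift.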
Authentication module 58 serves to resolve the ROI rights of remote users, including the rights of individual users and the priority of rights among multiple users. When ROI-aware video decoder 48 receives the remote near-end ROI from remote video communication device 14, it provides the remote near-end ROI to ROI engine 44. In some cases, however, the remote near-end ROI specified by the remote user may conflict with the local near-end ROI specified by the local user. For instance, the local and remote users may specify overlapping ROIs or entirely different ROIs within the scene. In this case, authentication module 58 may be provided to resolve the ROI conflict.
In some embodiments, authentication module 58 may apply a so-called "master-slave" mechanism to coordinate which near-end ROI information (local or remote) should be used at a given time. In particular, before the sender receives receiver-driven ROI information, the sender is the near-end ROI master and controls its near-end ROI. In other words, before remote near-end ROI is received at video communication device 12, the local user controls the near-end ROI. Hence, the remote user is the near-end ROI "slave" and does not control the near-end ROI unless the master (i.e., the local user) grants access rights to control the near-end ROI.
Once the local user grants access rights to the remote user, the local user no longer controls its near-end ROI. Instead, the remote user associated with video communication device 14 gains control of the near-end ROI for the near-end video produced by video communication device 12 and becomes the near-end ROI master. The remote user may retain control until the local user explicitly revokes the access privilege or otherwise denies the remote user access, or until the remote user discontinues the ROI selection, in which case primary ROI control may revert to the local user.
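The hand-over of near-end ROI control described above can be pictured as a small state holder. The class and method names below are hypothetical and chosen only to mirror the grant, revoke, and release steps in the text.

```python
class NearEndRoiControl:
    """Tracks which party is currently the near-end ROI master (illustrative only)."""

    LOCAL = "local"

    def __init__(self):
        # Before any remote ROI is received, the local user is the master.
        self.master = self.LOCAL

    def grant(self, remote_id):
        """Local user grants control to a remote user, who becomes the master."""
        self.master = remote_id

    def revoke(self):
        """Local user explicitly revokes access; control returns to the local user."""
        self.master = self.LOCAL

    def release(self, remote_id):
        """Remote user discontinues its ROI selection; control reverts if it held it."""
        if self.master == remote_id:
            self.master = self.LOCAL

    def accepts(self, source_id):
        """True if ROI input from source_id should currently be applied."""
        return source_id == self.master
```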
Once ROI-aware video decoder 48 receives encoded far-end video (if any), it retrieves the remote near-end ROI information from the video bitstream according to the format agreed upon with the sender. Again, the near-end ROI information may be embedded within the encoded far-end video or sent by out-of-band signaling. In either case, ROI-aware video decoder 48 passes the remote near-end ROI to authentication module 58 to obtain access permission before the remote near-end ROI is sent to ROI-aware video encoder 46 via ROI controller 52 and ROI mapper 54. Authentication module 58 restricts access rights to particular users, so that a user cannot control the encoding process without authorization from the local user.
Authentication module 58 may be configured to grant and manage access rights and to arbitrate among one or more remote users. For instance, the local user may grant access rights to selected remote users. Hence, the local user may permit some remote users to control the near-end ROI and prohibit other remote users from doing so. Moreover, the local user may assign relative access levels or priorities to remote users. In this way, the local user can assign a hierarchy of access levels among remote users, so that when multiple remote users request ROI control simultaneously, some remote users have priority over others in controlling the near-end ROI. For instance, multiple remote users may request ROI control simultaneously in the course of a multi-party video conference. In such cases, ROI control will generally be granted exclusively to one user, which is either the local user or, if control is granted by the local user, a selected one of the remote users.
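A minimal sketch of arbitration among simultaneous requests follows. The function name `select_controller`, the use of integer priorities, and the rule that a priority of zero means no access are assumptions for illustration; the disclosure does not prescribe a particular scheme.

```python
def select_controller(requests, priorities):
    """Pick the one remote user who receives near-end ROI control, or None.

    requests: user ids in the order their requests arrived.
    priorities: dict mapping user id to an integer priority assigned by the
                local user; missing users or priority 0 are not allowed.
    """
    allowed = [user for user in requests if priorities.get(user, 0) > 0]
    if not allowed:
        return None  # control stays with the local user
    # max() returns the first maximal element, so the earliest request wins ties.
    return max(allowed, key=lambda user: priorities[user])
```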
In some embodiments, authentication module 58 may also be responsible for resource monitoring to determine whether local video communication device 12 has the capability to enable ROI-aware video processing. If the local device does not have sufficient processing resources at a given time to support remote ROI control or to satisfy a particular type of ROI request, authentication module 58 revokes the remote ROI control access right or denies the ROI request. As one example, bandwidth limitations imposed by the communication channel, or the local processing load, may result in denial of remote ROI control. As another example, such limitations may permit the use of preconfigured ROI patterns but not of drawn or described ROI patterns. Authentication module 58 may notify the remote device of the ROI decision by embedding a status message in the outgoing encoded near-end video sent to the remote device.
In addition, different access levels may be granted to individual remote users to control the degree to which a remote user can control the near-end ROI. For instance, a remote user may be limited to selecting only from a set of predefined ROI patterns, to specific ROI positions or sizes, or to specifying an ROI only with the approval of the local user. Hence, authentication module 58 may resolve remote user control of the near-end ROI automatically, or may obtain active approval of remote user near-end ROI control through interaction with the local user. For instance, when a remote user requests access rights to control the near-end ROI, authentication module 58 may present a query to the local user via user interface 42 requesting approval of ROI control by the remote user.
Authentication module 58 may keep track of remote user access levels in any of a variety of ways. As mentioned above, the local user may actively approve requests from remote users to control the near-end ROI, and actively control the access levels granted to remote users. Alternatively, the local user may maintain, in memory within video communication device 12, an address book that stores information associated with remote users, including access rights or levels. The address book may take the form of a database with a list of remote users and associated access levels. When a remote user requests near-end ROI control, authentication module 58 retrieves the pertinent access right information from the address book and automatically applies an authentication process to resolve ROI control among the local user, the remote user, and possibly several remote users. If the remote user is not listed in the address book, the local user may elect to add the remote user to the address book with appropriate access rights.
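The address-book lookup can be sketched as below. The three access levels, the request kinds, and the `authorize` function are hypothetical and only illustrate how stored levels might gate different kinds of ROI request, with unknown users referred to the local user for active approval.

```python
from enum import IntEnum


class Access(IntEnum):
    """Hypothetical access levels stored per remote user."""
    NONE = 0             # may not control the near-end ROI
    PREDEFINED_ONLY = 1  # may only pick from predefined ROI patterns
    FULL = 2             # may draw, describe, or pick any ROI


def authorize(address_book, user, request_kind):
    """Return 'allow', 'deny', or 'ask' for a remote near-end ROI request.

    address_book: dict mapping user id to Access.
    request_kind: 'predefined' or 'custom' (assumed categories).
    """
    level = address_book.get(user)
    if level is None:
        return "ask"  # not listed: query the local user via the user interface
    if level == Access.FULL:
        return "allow"
    if level == Access.PREDEFINED_ONLY and request_kind == "predefined":
        return "allow"
    return "deny"
```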
In some cases, the local user may override the default access level specified for a particular remote user in the address book. For instance, authentication module 58 may permit the local user to actively reconfigure ROI control priorities among different remote users during a VT call, or to intervene to regain exclusive control of the near-end ROI as the local user. The interaction between the local user and authentication module 58 in maintaining the address book or actively managing ROI control requests is represented by the access control information (ACCESS CONTROL INFO) in Fig. 3.
When near-end ROI control by a remote user is approved, either automatically or actively, authentication module 58 passes the remote near-end ROI to ROI controller 52 for processing and for mapping by ROI mapper 54. Alternatively, if no remote near-end ROI is provided, or if the local user elects to exclude the remote user from controlling the near-end ROI, ROI controller 52 uses the local near-end ROI provided by the local user via user interface 42.
Authentication module 58 serves to resolve ROI conflicts between the local and remote users. By default, authentication module 58 applies the master-slave concept, according to which the local user has near-end ROI control. When a remote user is granted access rights at the highest level, the remote user fully controls the near-end ROI selection of ROI-aware video encoder 46 of video communication device 12. Otherwise, the local user has near-end ROI control, which overrides any near-end ROI selection made by the remote user.
Although access rights may be granted to a remote user, the local user will prevail in near-end ROI control, because the remote user's access rights generally have a lower level than those of the local user. Hence, if the local user elects to specify a near-end ROI, any near-end ROI selection made by the remote user is ignored. On the other hand, if the local user does not specify a near-end ROI, the access level assigned to the remote user takes effect, and the remote user can control the near-end ROI. As mentioned above, however, the local user may still elect to override the default master-slave relationship and relinquish the highest-level access right that it gives the local user.
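The default conflict-resolution rules of the last two paragraphs can be summarized in a short sketch. The function name and the three access labels ("none", "granted", "highest") are assumptions for illustration; ROIs may be any objects, with None meaning that no ROI was specified.

```python
def resolve_near_end_roi(local_roi, remote_roi, remote_access):
    """Choose which near-end ROI the encoder should apply, or None.

    remote_access: 'none', 'granted', or 'highest' (hypothetical labels).
    """
    # A remote user given the highest access level fully controls the ROI.
    if remote_roi is not None and remote_access == "highest":
        return remote_roi
    # Otherwise the local user prevails whenever a local ROI is specified.
    if local_roi is not None:
        return local_roi
    # With no local ROI, a remote user with granted access takes effect.
    if remote_roi is not None and remote_access == "granted":
        return remote_roi
    return None
```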
Fig. 4 is a block diagram illustrating another video communication device 12' that has an ROI-aware CODEC and further incorporates an ROI extraction module 60. Video communication device 12' of Fig. 4 is nearly identical to video communication device 12 of Fig. 3. However, video communication device 12' further includes ROI extraction module 60 to form the local near-end ROI and the far-end ROI based on input from the user. In addition to simply handling selection of a preset ROI pattern, or allowing the user to draw, reposition, or resize a default ROI, ROI extraction module 60 also allows the local user to specify an ROI by a verbal or textual ROI description. In particular, ROI extraction module 60 generates the local near-end ROI or the far-end ROI based on the ROI description provided by the local user.
Examples of ROI descriptions include textual or verbal input of terms such as "face," "moving object," "lips," "human body," "background," and the like. Preferential encoding of such objects may be highly desirable. For example, preferential encoding of the lips or face can better convey facial expressions, spoken words, and so on. Textual input may be typed or selected from a menu presented by user interface 42. Verbal input may be provided by speaking into a microphone associated with video communication device 12'. In each case, the local user "describes" the ROI rather than selecting or drawing it. ROI extraction module 60 converts the description into a set of coordinates within the applicable near-end or far-end video scene. When a verbal ROI description is used, user interface 42 or ROI extraction module 60 may include conventional speech recognition capabilities. In particular, ROI extraction module 60 may generate the information specifying the ROI based on one or more recognized terms.
ROI extraction module 60 selects the ROI coordinates automatically by applying conventional pre-encoding processing algorithms configured to detect the desired ROI. In particular, ROI extraction module 60 may apply algorithms that perform face detection, feature extraction, object segmentation, or tracking according to conventional techniques known to those skilled in the art of video ROI processing. For example, ROI extraction module 60 may apply conventional techniques that rely on the luma or chroma values of pixels in the video input data to identify the ROI.
Conventional face detection schemes typically use skin tone as a guide to distinguish face pixels from non-face pixels. Examples of conventional face detection schemes are described in C.-W. Lin, Y.-J. Chang and Y.-C. Chen, "A low-complexity face-assisted coding scheme for low bit-rate video telephony," IEICE Trans. Inf. & Syst., vol. E86-D, no. 1, pp. 101-108, January 2003, and in D. Chai and K. N. Ngan, "Face segmentation using skin-color map in videophone applications," IEEE Trans. on Circuits and Systems for Video Technology, vol. 9, no. 4, pp. 551-564, June 1999.
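By way of illustration only, skin-tone-guided detection of a candidate face ROI might be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not the method of either reference cited above: the function names, the BT.601 RGB-to-CbCr conversion, and the chroma bounds are illustrative choices that would need tuning for a real camera.

```python
# Assumed chroma bounds for skin pixels (illustrative values, not taken from
# this disclosure; they would need tuning in practice).
CB_RANGE = (77, 127)
CR_RANGE = (133, 173)


def rgb_to_cbcr(r, g, b):
    """Convert 8-bit RGB to (Cb, Cr) with the BT.601 studio-range equations."""
    cb = 128 - 0.148 * r - 0.291 * g + 0.439 * b
    cr = 128 + 0.439 * r - 0.368 * g - 0.071 * b
    return cb, cr


def is_skin(r, g, b):
    """Return True if the pixel's chroma falls inside the assumed skin bounds."""
    cb, cr = rgb_to_cbcr(r, g, b)
    return CB_RANGE[0] <= cb <= CB_RANGE[1] and CR_RANGE[0] <= cr <= CR_RANGE[1]


def skin_bounding_box(frame):
    """frame: list of rows, each a list of (r, g, b) tuples.

    Returns (x0, y0, x1, y1), inclusive, around all skin-colored pixels,
    or None when no pixel is classified as skin.
    """
    xs, ys = [], []
    for y, row in enumerate(frame):
        for x, (r, g, b) in enumerate(row):
            if is_skin(r, g, b):
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return min(xs), min(ys), max(xs), max(ys)
```

The bounding box returned here would stand in for the ROI coordinates that an extraction module hands on for mapping. A practical detector would add further steps, such as noise removal and shape checks.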
When the local user describes the ROI as "face," ROI extraction module 60 analyzes the near-end or far-end video, as applicable, to automatically identify the face and designate the coordinates associated with the identified face as the ROI. ROI extraction module 60 then passes the coordinates to ROI controller 52 for processing and for mapping by ROI mapper 54. Notably, ROI extraction module 60 processes a local near-end ROI description or a far-end ROI description, as applicable, maps the description to an appropriate extraction algorithm, and automatically analyzes the applicable pre-encoded near-end video or decoded far-end video to extract the appropriate ROI.
To support automatic ROI detection, ROI extraction module 60 receives the near-end video from video capture device 40 and the far-end video from ROI-aware video decoder 48. Using the local near-end ROI description or far-end ROI description from user interface 42, together with an automated detection algorithm, ROI extraction module 60 generates the local near-end ROI or the far-end ROI, as applicable, for application to ROI controller 52. In each case, ROI extraction module 60 converts the local near-end ROI description or the far-end ROI description into the coordinates that best fit the description. In this case, the user does not need to draw the ROI, and is not limited to a set of predefined ROI patterns. Instead, ROI controller 52 proactively detects the region within the near-end video that matches the ROI description.
ROI mapper 54 maps the ROI coordinates to the relevant macroblocks (MBs) within a video frame and generates a near-end or far-end ROI MB map. In effect, ROI mapper 54 translates the ROI coordinates from ROI controller 52 into a form that video encoder 46 can understand. In particular, video encoder 46 is equipped to handle encoding at the MB level, i.e., on an MB-by-MB basis. To that end, ROI mapper 54 generates an ROI MB map for the near-end or far-end video. The ROI MB map identifies the MBs that fall within the specified ROI, so that video encoder 46 can apply preferential encoding to those MBs.
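By way of illustration only, the translation of rectangular ROI coordinates into an ROI MB map might be sketched in Python as follows. The function name `roi_to_mb_map`, the (x, y, width, height) ROI representation, and the 16x16 MB size are assumptions made for this sketch; the disclosure does not prescribe a particular data layout.

```python
MB_SIZE = 16  # assumed macroblock size in pixels


def roi_to_mb_map(roi, frame_w, frame_h, mb_size=MB_SIZE):
    """Mark every MB that overlaps the rectangular ROI.

    roi is (x, y, width, height) in pixels. The result is a list of rows of
    booleans, one entry per MB, True where the MB intersects the ROI.
    """
    x, y, w, h = roi
    cols = (frame_w + mb_size - 1) // mb_size
    rows = (frame_h + mb_size - 1) // mb_size
    mb_map = [[False] * cols for _ in range(rows)]

    # Clip the ROI to the frame before converting to MB indices.
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w, frame_w), min(y + h, frame_h)
    if x1 <= x0 or y1 <= y0:
        return mb_map  # ROI lies entirely outside the frame

    for r in range(y0 // mb_size, (y1 - 1) // mb_size + 1):
        for c in range(x0 // mb_size, (x1 - 1) // mb_size + 1):
            mb_map[r][c] = True
    return mb_map
```

For a QCIF frame (176x144 pixels) this produces a map of 9 rows by 11 columns. An encoder could then, for example, assign a finer quantizer to the MBs flagged True.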
In addition to processing ROI descriptions, ROI extraction module 60 may also be equipped to handle ROI patterns that the local user selects from a set of predefined patterns, or that the local user draws, repositions, or resizes. Hence, video communication device 12' may generate ROI information substantially as described with respect to video communication device 12 of Fig. 3, but further incorporates ROI extraction module 60 to process ROI descriptions entered by the local user in textual or verbal form. ROI extraction module 60 may be desirable for ease of use by the local user. However, some video communication devices may not have sufficient processing power to support ROI extraction module 60. Accordingly, ROI extraction module 60 represents a desirable but optional component of a video communication device according to this disclosure.
In some embodiments, ROI extraction module 60 may process ROI descriptions produced not only by the local user but also by the remote user. In this way, for some devices the extraction function can be performed remotely rather than locally. For example, a particular video communication device 14 may not have sufficient local resources or capability to support ROI extraction for an ROI description provided by the user of device 14. Another video communication device 12, however, may be better equipped to perform ROI extraction. In that case, the desired local ROI extraction can be offloaded or assigned to the remote video communication device.
To support remote extraction, the ROI description can be provided to the remote device in a variety of ways. For example, a verbal description may be included in the audio stream transmitted to the remote device. Textual ROI descriptions, as well as predefined or drawn ROI patterns, may likewise be transmitted to the remote device, e.g., by embedding the information in the encoded video stream. Hence, the ROI information sent from one device to another may take the form of a preprocessed ROI MB map or any other indication or description of the ROI, including indications or descriptions that need to be processed at the remote device before being applied to the remote encoder.
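By way of illustration only, one way to serialize an ROI MB map into a compact payload suitable for embedding in a stream is sketched below in Python. The wire format (one byte each for the row and column counts, followed by one bit per MB, most significant bit first) is purely hypothetical; the disclosure does not define how the map is carried, and the names `pack_mb_map` and `unpack_mb_map` are illustrative.

```python
def pack_mb_map(mb_map):
    """Pack a boolean MB map into bytes (hypothetical format).

    Layout: [rows][cols][bits...], one bit per MB in raster order, MSB first.
    Assumes fewer than 256 rows and columns.
    """
    rows = len(mb_map)
    cols = len(mb_map[0]) if rows else 0
    bits = [cell for row in mb_map for cell in row]
    payload = bytearray([rows, cols])
    for i in range(0, len(bits), 8):
        byte = 0
        for j, bit in enumerate(bits[i:i + 8]):
            if bit:
                byte |= 0x80 >> j
        payload.append(byte)
    return bytes(payload)


def unpack_mb_map(payload):
    """Inverse of pack_mb_map: rebuild the list-of-rows boolean map."""
    rows, cols = payload[0], payload[1]
    flat = []
    for k in range(rows * cols):
        byte = payload[2 + k // 8]
        flat.append(bool(byte & (0x80 >> (k % 8))))
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]
```

Under this assumed layout, the map for a QCIF frame (9 x 11 MBs) occupies 15 bytes, which suggests that per-frame signaling overhead could be kept small.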
Fig. 5 is a block diagram illustrating distributed ROI extraction via an intermediate extraction server 61. As shown in Fig. 5, video communication devices 12, 14 may provide intermediate extraction server 61 with sufficient information to permit extraction of the ROI. For example, each device 12, 14 may provide its own local near-end ROI description, far-end ROI description, encoded or raw near-end video, and encoded far-end video. As an alternative to providing the encoded far-end video from the near-end device, ROI extraction server 61 may receive the far-end video directly from the far-end device. Using this information, extraction server 61 generates one or both of the far-end ROI and the local near-end ROI and provides them to the respective devices 12, 14. Extraction server 61 may be a server located anywhere within the communication network, and may be coupled to devices 12, 14 by wired media, wireless media, or a combination of both. Extraction server 61 may be located remotely from video communication devices 12, 14, or co-located with one of devices 12, 14. In many cases, however, extraction server 61 will be a remote server. In general, extraction server 61 will be structurally distinct from video communication devices 12, 14.
Extraction server 61 may work much like extraction module 60, but operates remotely and in a distributed manner, so that ROI extraction need not be performed locally within devices 12, 14. In this way, the processing cost of ROI extraction can be shifted to a different device that may have greater processing power. Like ROI extraction module 60, extraction server 61 may process different types of ROI descriptions from the user, such as verbal, textual, or graphical descriptions. To that end, ROI extraction server 61 may include appropriate capabilities (e.g., speech recognition) to process the descriptions. In addition, ROI extraction server 61 may be equipped with video decoding capability to permit analysis of the video and extraction of the ROI, and with encoding capability to re-encode the video and embed the ROI information, if desired.
Fig. 6 is a block diagram illustrating distributed ROI extraction for multiple video telephony sessions. In the example of Fig. 6, ROI extraction server 61 operates to handle ROI extraction for VT sessions between multiple video communication devices 12A-14A, 12B-14B, 12C-14D through 12N-14N. In this way, ROI extraction server 61 performs multiple ROI extraction tasks in parallel to support the various VT sessions in progress on a given network at a given time.
Figs. 7A-7D are diagrams illustrating predefined ROI patterns for selection by a local or remote user. The ROI patterns of Figs. 7A-7D are provided for purposes of example and should not be considered limiting. Fig. 7A shows an ROI 62 within video scene 34 presented on display 36 associated with wireless communication device 38. ROI 62 is a substantially rectangular region that is generally centered in video scene 34. The major dimension of rectangular ROI 62 extends vertically within video scene 34. In many cases, the predefined centered rectangular ROI 62 will effectively capture a person's face, i.e., the face of the remote user participating in the VT call.
Fig. 7B shows another ROI 64, which takes the form of a rectangle whose major dimension extends horizontally within video scene 34. ROI 64 is generally centered in video scene 34 and can effectively capture objects such as a vehicle, a boat, a product, a presentation, and the like.
Fig. 7C shows another ROI 66, which is shaped to capture the face and shoulders of a remote user participating in the VT call. Alternatively, ROI 66 may capture the face and shoulders of, e.g., a reporter delivering a news broadcast in a one-way video streaming application, or the moderator of or a speaker at a rally or meeting. In any case, the predefined ROI 66 focuses on a human VT participant or presenter and permits preferential encoding of that person's physical features.
Fig. 7D shows a set of two ROIs 68, 70 presented side by side within video scene 34. In the example of Fig. 7D, ROIs 68, 70 can effectively capture the faces of two people sitting or standing side by side. In this way, the faces of both participants can be preferentially encoded to support higher image quality for facial expressions and movement.
The predefined ROI patterns depicted in Figs. 7A-7D are for purposes of illustration. Other predefined ROI patterns with alternative positions or shapes may be provided. For example, some ROI patterns may have circular or irregular shapes, provided they can be mapped to MB boundaries.
In some embodiments, the user may be permitted to resize or reposition a selected ROI pattern. Conventional pointer and corner-dragging techniques may be used for resizing and repositioning. In addition, the ROI may be rescaled by corner dragging or by explicitly specifying a zoom percentage. Of course, as the ROI becomes larger, the degree of preferential encoding decreases due to bandwidth constraints. Hence, in some cases, a maximum ROI size may be enforced within video communication device 12.
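By way of illustration only, rescaling an ROI by a zoom percentage while enforcing a maximum ROI size might be sketched in Python as follows. The function name `scale_roi`, the cap expressed as a fraction of the frame area, and the default value of 0.5 are assumptions made for this sketch; the disclosure does not state how the maximum size is defined.

```python
import math


def scale_roi(roi, percent, frame_w, frame_h, max_area_fraction=0.5):
    """Scale roi = (x, y, w, h) about its centre by `percent`.

    The aspect ratio is preserved. The scale factor is capped so that the ROI
    neither exceeds max_area_fraction of the frame area (an assumed policy)
    nor grows beyond the frame itself. Assumes w > 0 and h > 0.
    """
    x, y, w, h = roi
    factor = percent / 100.0

    # Cap the factor by the assumed maximum ROI area and by the frame size.
    max_area = max_area_fraction * frame_w * frame_h
    factor = min(factor, math.sqrt(max_area / (w * h)), frame_w / w, frame_h / h)

    new_w = max(1, int(w * factor))
    new_h = max(1, int(h * factor))

    # Keep the centre fixed where possible, then clamp to the frame.
    cx, cy = x + w / 2, y + h / 2
    new_x = min(max(cx - new_w / 2, 0), frame_w - new_w)
    new_y = min(max(cy - new_h / 2, 0), frame_h - new_h)
    return (round(new_x), round(new_y), new_w, new_h)
```

Capping the scale factor rather than the width and height separately keeps the aspect ratio intact, which matches the behavior described for corner dragging.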
Fig. 8 is a flow diagram illustrating generation of far-end ROI information at a recipient device to control preferential ROI encoding of near-end video at a sender device. The process depicted in Fig. 8 may be implemented in video communication device 12 of Fig. 3 or video communication device 12' of Fig. 4. In operation, ROI-aware video decoder 48 in video communication device 12 decodes far-end video from a remote sender device, e.g., video communication device 14 (Fig. 1) (72). Once the far-end video is decoded, user interface 42 of recipient device 12 displays the far-end video for viewing by the local user (74).
If the local user does not request ROI selection (76), no action is taken and the next frame of far-end video is decoded (72). If ROI selection is requested (76), however, user interface 42 accepts far-end ROI information from the local user (78). ROI controller 52 and ROI mapper 54 then cooperate to generate a far-end ROI MB map (80). ROI-aware video encoder 46 embeds the far-end ROI MB map in the encoded near-end video, and thereby transmits the far-end ROI map to the remote sender device 14 that encodes the far-end video (82). The far-end ROI MB map specifies the MBs within the applicable ROI to which the encoder associated with remote video communication device 14 should apply preferential encoding in the far-end video to be sent to video communication device 12.
Fig. 9 is a flow diagram illustrating processing of near-end ROI information from a recipient device for preferential ROI encoding of near-end video at a sender device, in conjunction with ROI tracking. In the example of Fig. 9, user interface 42 receives the near-end video stream produced by video capture device 40 and presents the near-end video to the local user (84). If neither the local user nor the remote user requests near-end ROI selection (86), all MBs in each video frame are encoded normally (88), i.e., without any preferential encoding of MBs within an ROI. The encoded near-end video is then sent to remote recipient device 14 (89).
If the local user or the remote user does request near-end ROI selection (86), however, ROI controller 52 and ROI mapper 54 process the applicable near-end ROI information to generate a near-end ROI MB map (90). If near-end ROIs are specified by both the local user and the remote user, authentication module 58 may intervene to help resolve the ROI conflict. Upon receiving the near-end ROI MB map (90), ROI-aware video encoder 46 preferentially encodes the MBs within the ROI (92) by applying higher-quality encoding, stronger error protection, or both.
Tracking module 56 tracks the position of the ROI within the near-end video (94) by monitoring motion information produced by ROI-aware video encoder 46. If no displacement of the ROI is detected (96), the existing ROI map is used to encode the ROI MBs within the near-end video (100), and the encoded near-end video is sent to the remote recipient device (102). If displacement of the ROI is detected (96), tracking module 56 adjusts the ROI MB map based on the motion information (98) before the near-end video is encoded (100).
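By way of illustration only, adjusting the ROI position from encoder motion information might be sketched in Python as follows. The sketch assumes that each ROI MB reports a displacement (dx, dy) in pixels describing how the content moved from the previous frame to the current one, and it uses the median displacement as the ROI shift; the function name `track_roi`, the sign convention, and the threshold are all assumptions rather than details taken from this disclosure.

```python
from statistics import median


def track_roi(roi, motion_vectors, frame_w, frame_h, min_shift=1):
    """Shift roi = (x, y, w, h) by the median motion of its MBs.

    motion_vectors: list of (dx, dy) content displacements for the MBs that
    currently lie inside the ROI (assumed convention). Returns the ROI
    unchanged when no displacement of at least min_shift pixels is detected.
    """
    if not motion_vectors:
        return roi
    dx = median([v[0] for v in motion_vectors])
    dy = median([v[1] for v in motion_vectors])
    if abs(dx) < min_shift and abs(dy) < min_shift:
        return roi  # no displacement detected: keep the existing ROI

    x, y, w, h = roi
    # Move the ROI with the content and keep it inside the frame.
    new_x = min(max(x + round(dx), 0), frame_w - w)
    new_y = min(max(y + round(dy), 0), frame_h - h)
    return (new_x, new_y, w, h)
```

The shifted rectangle would then be converted back into an updated ROI MB map before the next frame is encoded. The median is used here so that a few outlier vectors do not drag the ROI off its subject.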
Fig. 10 is a flow diagram illustrating processing of ROI information from a recipient device for preferential ROI encoding of near-end video at a sender device, in conjunction with user authentication. Fig. 10 depicts operation of authentication module 58 of Fig. 3 or 4 in permitting a remote user to control the near-end ROI and, for simplicity, assumes that no local near-end ROI is specified. As shown in Fig. 10, for the near-end video stream produced by video capture device 40 in video communication device 12 (104), authentication module 58 determines whether the remote user of video communication device 14 has requested a remote near-end ROI (106).
If no remote near-end ROI is requested (106), and no local near-end ROI is specified, all MBs in the near-end video are encoded normally (110). If a remote near-end ROI has been requested (106), however, authentication module 58 determines whether the remote user requesting the near-end ROI is authenticated (108). In particular, authentication module 58 may determine the remote user's access rights automatically by reference to an address book stored locally on video communication device 12. Alternatively, authentication module 58 may actively query the local user via user interface 42 to obtain approval or denial of access rights for near-end ROI control by the remote user.
If the remote user is not authenticated (108), all MBs in the near-end video are encoded normally (110). If the remote user is authenticated (108), however, near-end ROI control is granted to the remote user. In that case, ROI controller 52 and ROI mapper 54 process the near-end ROI information from the remote user and generate a near-end MB map (112). Using the near-end MB map, ROI-aware video encoder 46 preferentially encodes the MBs identified by the near-end MB map (114). Video communication device 12 then sends the encoded near-end video to remote video communication device 14 (116).
Fig. 11 is a flow diagram illustrating selection of a predefined ROI pattern. Once ROI-aware video decoder 48 decodes the far-end video received from remote video communication device 14 (118), the far-end video is displayed to the local user via user interface 42 (120). If the local user requests ROI selection (122), user interface 42 displays a menu of predefined ROI patterns (124), such as the ROI patterns shown in Figs. 7A-7D. Alternatively, the user could provide an ROI description, or draw, reposition, or resize an ROI pattern. In the example of Fig. 11, however, operation focuses on presentation of predefined ROI patterns. When the local user selects a predefined ROI pattern (126), ROI controller 52 and ROI mapper 54 define an ROI MB map based on the selected pattern (128). ROI-aware video encoder 46 embeds the ROI MB map in the encoded near-end video and transmits the ROI MB map to remote video communication device 14 (130) for use in preferentially encoding the ROI of the far-end video.
Fig. 12 is a diagram illustrating definition of an ROI pattern within displayed video scene 34 by expanding and contracting an ROI template 132. Fig. 12 corresponds generally to Fig. 2, but illustrates presentation of an ROI template 132 that can be resized by the user. In the example of Fig. 12, ROI template 132 can be resized by dragging one of its corners to expand or contract the template. The result of corner dragging to expand ROI template 132 is represented by expanded ROI template 134. Corner dragging increases or decreases the size of ROI template 132 while preserving its aspect ratio. In some embodiments, however, the user may also be permitted to drag a side of ROI template 132 to increase or decrease the size of the template while also changing its aspect ratio. Dragging may be accomplished using a stylus in combination with a touchscreen, or using another pointing device associated with user interface 42 of video communication device 12. Other pointing devices may include a joystick, a touchpad, a scroll wheel, a trackball, and the like.
Fig. 13 is a diagram illustrating definition of an ROI pattern within a displayed video scene by dragging ROI template 132. In particular, Fig. 13 shows repositioning of ROI template 132 by dragging the template to another position 135 within video scene 34. Dragging may be accomplished with a stylus and touchscreen, or with another pointing device associated with user interface 42.
Fig. 14 is a diagram illustrating definition of an ROI pattern within a displayed video scene by drawing ROI pattern 136 on a touchscreen with a stylus 138. In the example of Fig. 14, ROI pattern 136 is produced by freehand drawing. ROI controller 52 and ROI mapper 54 cooperate to convert the coordinates associated with the drawn ROI pattern into an MB map that identifies the MBs within video scene 34 that fall approximately within ROI pattern 136. The ROI pattern definitions shown in Figs. 12, 13, and 14 are applicable to an ROI in either near-end video or far-end video.
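By way of illustration only, conversion of a freehand outline into an MB map might be sketched in Python as follows. The sketch treats the drawn outline as a closed polygon of (x, y) points and flags an MB when its centre lies inside the polygon, which is one possible reading of "fall approximately within"; the function names, the centre-point rule, and the 16x16 MB size are assumptions made for this sketch.

```python
def point_in_polygon(px, py, polygon):
    """Even-odd ray-casting test for a point against a closed polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def outline_to_mb_map(polygon, frame_w, frame_h, mb_size=16):
    """Flag each MB whose centre lies inside the drawn outline."""
    cols = (frame_w + mb_size - 1) // mb_size
    rows = (frame_h + mb_size - 1) // mb_size
    return [
        [
            point_in_polygon(c * mb_size + mb_size / 2,
                             r * mb_size + mb_size / 2,
                             polygon)
            for c in range(cols)
        ]
        for r in range(rows)
    ]
```

Snapping to whole MBs in this way is also what would allow circular or irregular patterns to be expressed on MB boundaries.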
Fig. 15 is a diagram illustrating definition of an ROI pattern within a displayed video scene using a pull-down menu 140 of specified ROI objects to be tracked dynamically. As shown in Fig. 15, user interface 42 presents pull-down menu 140, which offers ROI descriptions such as "face," "lips," "background," and "moving." The local user selects one of the entries in the pull-down menu as the desired ROI description. In response, ROI extraction module 60 (Fig. 4) analyzes the near-end video or far-end video, as applicable, to detect an ROI pattern corresponding to the description. As an alternative to pull-down menu 140, the user may enter text via user interface 42 or speak the text into a microphone. In each case, conventional feature detection algorithms, such as skin tone detection, object segmentation, or similar techniques, are used to match the selected ROI to an appropriate ROI pattern. When the ROI pattern is selected, ROI controller 52 and ROI mapper 54 generate the appropriate ROI MB map. The process of Fig. 15 is referred to as "dynamic" in the sense that each ROI description must be dynamically matched to an ROI pattern within the particular video scene under consideration.
Fig. 16 is a diagram illustrating definition of an ROI pattern within a displayed video scene using a pull-down menu 142 of specified ROI objects that are mapped to predefined ROI patterns such as those shown in Figs. 7A-7D. As shown in Fig. 16, user interface 42 presents pull-down menu 142, which offers ROI descriptions such as "single face," "two faces," "head/shoulders," and "object." The local user selects one of the entries in the pull-down menu as the desired ROI pattern. In response, ROI controller 52 matches the selection to the corresponding predefined ROI pattern, such as one of the ROI patterns depicted in Figs. 7A-7D. Hence, unlike the ROI descriptions of Fig. 15, the static ROI patterns require no video analysis. Instead, ROI controller 52 and ROI mapper 54 generate a preconfigured ROI MB map corresponding to the selection in pull-down menu 142. Again, as an alternative to pull-down menu 142, the user may enter text via user interface 42 or speak the text into a microphone. The process of Fig. 16 is referred to as "static" in the sense that each ROI pattern corresponds to a predefined ROI pattern and MB map.
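By way of illustration only, the static matching of a menu entry to a predefined ROI pattern might be sketched in Python as a simple table lookup. The pattern geometry below (rectangles given as fractions of the frame width and height) is invented for this sketch and is not taken from Figs. 7A-7D; the names `PREDEFINED_PATTERNS` and `select_pattern` are likewise hypothetical.

```python
# Hypothetical pattern table: each entry is a list of rectangles expressed as
# (x, y, width, height) fractions of the frame. Values are illustrative only.
PREDEFINED_PATTERNS = {
    "single face": [(0.25, 0.125, 0.5, 0.75)],
    "two faces": [(0.0625, 0.25, 0.375, 0.5), (0.5625, 0.25, 0.375, 0.5)],
    "head/shoulders": [(0.375, 0.125, 0.25, 0.375), (0.125, 0.5, 0.75, 0.5)],
    "object": [(0.125, 0.25, 0.75, 0.5)],
}


def select_pattern(description, frame_w, frame_h):
    """Return the pixel rectangles for a menu entry, typed text, or
    recognized speech. No video analysis is involved.

    Unknown descriptions yield an empty list.
    """
    key = description.strip().lower()
    rects = PREDEFINED_PATTERNS.get(key, [])
    return [
        (round(fx * frame_w), round(fy * frame_h),
         round(fw * frame_w), round(fh * frame_h))
        for fx, fy, fw, fh in rects
    ]
```

Because the table is fixed, the corresponding MB maps could equally be precomputed once per frame size rather than derived on each selection.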
Fig. 17 is a flow diagram illustrating definition of an ROI pattern within a displayed video scene using an ROI description interface. The process shown in Fig. 17 may be used in combination with the pull-down menu of Fig. 15 or other input media. As shown in Fig. 17, ROI-aware video decoder 48 decodes the far-end video received from remote sender device 14 (144). User interface 42 then displays the far-end video to the local user (146). If the local user does not request ROI selection for the far-end video (148), no ROI information is sent to remote video communication device 14. If ROI selection is requested (148), however, user interface 42 presents an ROI description interface (150), such as pull-down menu 140 of Fig. 15.
Upon receiving the local user's ROI description (152), ROI controller 52 and ROI mapper 54 select an ROI pattern based on the description (154) and define an ROI MB map based on the selected ROI pattern (156). Again, the selected ROI pattern may be determined by analyzing the far-end video with conventional detection techniques and matching the ROI description to particular MBs within the far-end video. Once the far-end ROI MB map is generated, ROI-aware video encoder 46 embeds the far-end ROI MB map in the encoded near-end video and transmits it to remote video communication device 14 for use in preferentially encoding the far-end ROI.
Fig. 18 is a flow diagram illustrating resolution of ROI conflicts between sender and recipient devices 12, 14. In particular, Fig. 18 illustrates operation of authentication module 58 (Fig. 3 or Fig. 4) in resolving a conflict between a near-end ROI specified by the local user and a near-end ROI specified by the remote user. When near-end video is produced at the sender device (160), authentication module 58 determines whether a near-end ROI has been requested by the local user or the remote user (162). If not, all MBs are encoded normally (164), without preferential ROI encoding, and the resulting encoded video is sent to recipient video communication device 14 (166).
If a near-end ROI has been requested (162), authentication module 58 determines whether there is a conflict (168) between the near-end ROI specified by the local user and the near-end ROI specified by the remote user. If no remote near-end ROI is specified, or if the local and remote near-end ROIs are consistent, authentication module 58 can pass the selected near-end ROI to ROI controller 52 for processing.
If there is no local near-end ROI but a remote near-end ROI has been selected, authentication module 58 may permit use of the remote near-end ROI. Alternatively, in some embodiments, authentication module 58 may permit use of the remote near-end ROI only when explicit access rights have been granted to the remote user, either through interaction with the local user or by an access level recorded in an address book. If there is no ROI conflict, ROI mapper 54 generates a near-end MB map based on the applicable near-end ROI and applies it to ROI-aware video encoder 46. ROI-aware video encoder 46 then preferentially encodes the MBs within the ROI of the near-end video (172).
If there is a conflict between the local and remote near-end ROIs (168), authentication module 58 determines whether an access level has been assigned (174), e.g., in an address book stored locally on video communication device 12. If an access level has been assigned (174), authentication module 58 resolves the ROI conflict according to the access level (176). For example, the access level stored for the remote user may indicate that ROI control should be granted to the remote user in preference to the local user. If no access level has been assigned (174), authentication module 58 seeks permission for remote ROI control from the local user (178). In particular, authentication module 58 may present a query via user interface 42 requesting permission for the remote user to exercise near-end ROI control.
If the local user approves, authentication module 58 passes the remote near-end ROI to ROI controller 52 for processing. If the local user does not approve, ROI controller 52 processes the local near-end ROI. In either case, ROI-aware video encoder 46 uses the selected ROI to preferentially encode the MBs of the near-end video that fall within that ROI (172), and the encoded near-end video is sent to remote recipient device 14 (166). In some cases, authentication module 58 may resolve not only ROI conflicts between the local user and a remote user, but also ROI conflicts among several remote users. The local user may actively grant one of the remote users access rights to control the near-end ROI, or may assign relative access levels that prioritize ROI control among the remote users. Typically, access rights to control the ROI are granted exclusively to one user, e.g., the local user or one of the remote users.
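By way of illustration only, the conflict resolution flow of Fig. 18 might be sketched in Python as follows. The sketch reduces the stored access level to a single boolean per remote user (True when the remote user may override the local user), and models the query to the local user as a callback; the function name `resolve_near_end_roi` and both of these representations are assumptions made for the sketch.

```python
def resolve_near_end_roi(local_roi, remote_roi, remote_user,
                         address_book, ask_local_user):
    """Pick the near-end ROI to encode, or None for normal encoding.

    address_book: dict mapping a remote user to True/False, an assumed
        stand-in for a stored access level that does or does not let the
        remote user override the local user.
    ask_local_user: callback taking the remote user and returning True if
        the local user approves remote ROI control.
    """
    # No remote request, or both sides agree: no conflict.
    if remote_roi is None or remote_roi == local_roi:
        return local_roi
    # Only the remote side specified an ROI: allow it (some embodiments
    # would additionally require explicit access rights here).
    if local_roi is None:
        return remote_roi

    # Conflict: consult the stored access level, else query the local user.
    granted = address_book.get(remote_user)
    if granted is None:
        granted = ask_local_user(remote_user)
    return remote_roi if granted else local_roi
```

In this sketch the local user is queried only when there is a genuine conflict and no stored access level, which mirrors the default master-slave arrangement in which the local selection prevails unless the remote user has been given priority.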
Fig. 19 is a flow diagram illustrating preferential decoding of ROI macroblocks within far-end video. As shown in Fig. 19, when far-end video is received from remote sender device 14 (180), ROI-aware video decoder 48 in local recipient device 12 determines whether a far-end ROI has been specified by the local user (182). If not, ROI-aware video decoder 48 decodes all MBs in the far-end video normally (184). If far-end ROI information has been specified by the local user, however, ROI-aware video decoder 48 preferentially decodes the ROI MBs within the received far-end video (186). The ROI MBs may be preferentially decoded by applying higher-quality interpolation equations or more robust error concealment techniques than those applied to non-ROI MBs. Preferential decoding may also include preferential post-processing, such as higher-quality deblocking or deringing filters.
The techniques described herein may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the techniques may be realized in part by a computer-readable medium comprising program code containing instructions that, when executed, perform one or more of the methods described above. In this case, the computer-readable medium may comprise random access memory (RAM) such as synchronous dynamic random access memory (SDRAM), read-only memory (ROM), non-volatile random access memory (NVRAM), electrically erasable programmable read-only memory (EEPROM), FLASH memory, magnetic or optical data storage media, and the like.
The program code may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application-specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. In some embodiments, the functionality described herein may be provided within dedicated software modules or hardware units configured for encoding and decoding, or incorporated in a combined video encoder-decoder (CODEC).
Various embodiments have been described. These and other embodiments are within the scope of the following claims.

Claims (47)

1. A method comprising:
receiving, from a user, a description of a region of interest (ROI) within near-end video generated by a local device;
generating information specifying the ROI based on the description; and
encoding the near-end video based on the information specifying the ROI to enhance image quality of the ROI of the near-end video relative to non-ROI areas.
2. The method according to claim 1, wherein the description is a textual description.
3. The method according to claim 1, wherein the description is a verbal description.
4. The method according to claim 3, further comprising processing the verbal description by speech recognition and generating the information specifying the ROI based on one or more recognized terms.
5. The method according to claim 1, wherein the description is a graphical description.
6. The method according to claim 5, wherein the graphical description is received as an area drawn by the user on a user interface screen.
7. The method according to claim 1, further comprising receiving the description from a user of the local device and processing the description within the local device to generate the information specifying the ROI.
8. The method according to claim 1, further comprising receiving the description from a user of the local device and processing the description within an intermediate server distinct from the local device to generate the information specifying the ROI.
9. method according to claim 1, it further comprises from the user of remote-control device and receives described description, the described ROI about the near-end video of being encoded by described local device is defined in wherein said description, and the information of the described ROI of wherein said appointment is embedded in the encoded far-end video of described remote-control device reception.
10. method according to claim 1, it further comprises from the user of remote-control device and receives described description, the described ROI about the near-end video of being encoded by described local device is defined in wherein said description, and wherein receives the information of the described ROI of described appointment from described remote-control device by out-of-band signalling.
11. method according to claim 1, it further comprises the information of produce specifying the ROI in the encoded far-end video that receives from described remote-control device and described ROI information and described encoded near-end video is transferred to described remote-control device together.
12. method according to claim 1, it comprises that further encoded far-end video that decoding receives from described remote-control device is to strengthen ROI zone the described far-end video with respect to the picture quality in the non-ROI zone of described far-end video.
13. method according to claim 1, it comprises that further the information based on the described ROI of described appointment produces macro zone block (MB) mapping, and described MB mapping identification is in the MB in the described ROI.
14. method according to claim 1, it further comprises:
Receive described description from the user of local device, the described ROI about the near-end video of being encoded by described local device is defined in wherein said description;
Monitor the movable information that is associated with described encoded near-end video;
Regulate described ROI based on described movable information; And
Based on the described ROI described near-end video of encoding through regulating.
15. method according to claim 14, it comprises that further the information based on the described ROI of described appointment produces macro zone block (MB) mapping, described MB mapping identification is in the MB in the described ROI, and wherein regulate described ROI comprise based on described movable information with the status modifier of MB for be included among the described ROI or eliminating outside described ROI.
16. A video encoding device comprising:
a region-of-interest (ROI) engine that receives a description of an ROI within near-end video encoded by the device, and generates information specifying the ROI based on the description; and
a video encoder that encodes the near-end video to enhance the image quality of the ROI of the video relative to non-ROI areas.
17. The device according to claim 16, wherein the description is a textual description.
18. The device according to claim 16, wherein the description is a verbal description.
19. The device according to claim 18, further comprising an extraction module that processes the verbal description by speech recognition and generates the information specifying the ROI based on one or more recognized terms.
20. The device according to claim 16, wherein the description is a graphical description.
21. The device according to claim 20, wherein the graphical description is received as an area demarcated by the user on a user interface screen.
22. The device according to claim 16, wherein the ROI engine receives the description from a user of the device, and wherein the description defines the ROI for the near-end video.
23. The device according to claim 16, wherein the ROI engine transmits the description to an intermediate server for generation of the information specifying the ROI.
24. The device according to claim 16, wherein the ROI engine receives the description from a user of a remote video communication device, the description defines the ROI for the near-end video encoded by the video communication device, and the information specifying the ROI is embedded in encoded far-end video received from the remote device.
25. The device according to claim 16, wherein the ROI engine receives the description from a user of a remote video communication device, the description defines the ROI for the near-end video encoded by the video communication device, and the information specifying the ROI is received from the remote device by out-of-band signaling.
26. The device according to claim 25, wherein the ROI engine generates information specifying an ROI within encoded far-end video received from the remote device, and transmits the ROI information to the remote device together with the encoded near-end video.
27. The device according to claim 16, further comprising a video decoder that decodes encoded far-end video received from the remote device to enhance the image quality of an ROI area within the far-end video relative to non-ROI areas of the far-end video.
28. The device according to claim 16, further comprising generating a macroblock (MB) map based on the information specifying the ROI, the MB map identifying the MBs that fall within the ROI.
29. The device according to claim 16, further comprising a tracking module that monitors motion information associated with the encoded near-end video and adjusts the ROI based on the motion information, wherein the encoder encodes the near-end video based on the adjusted ROI.
30. The device according to claim 29, further comprising a mapper module that generates a macroblock (MB) map based on the information specifying the ROI, the MB map identifying the MBs that fall within the ROI, wherein the tracking module adjusts the ROI by changing the status of an MB, based on the motion information, to either included in the ROI or excluded from the ROI.
31. A computer-readable medium comprising instructions that cause a processor to receive, from a user, a description of a region of interest (ROI) within near-end video generated by a local device, generate information specifying the ROI based on the description, and encode the near-end video, based on the information specifying the ROI, to enhance the image quality of the ROI of the near-end video relative to non-ROI areas.
32. The computer-readable medium according to claim 31, wherein the description is a textual description.
33. The computer-readable medium according to claim 31, wherein the description is a verbal description.
34. The computer-readable medium according to claim 33, wherein the instructions cause the processor to process the verbal description by speech recognition and generate the information specifying the ROI based on one or more recognized terms.
35. The computer-readable medium according to claim 31, wherein the description is a graphical description.
36. The computer-readable medium according to claim 35, wherein the graphical description is received as an area demarcated by the user on a user interface screen.
37. The computer-readable medium according to claim 31, wherein the instructions cause the processor to receive the description from a user of the local device.
38. The computer-readable medium according to claim 31, wherein the instructions cause the processor to generate the information specifying the ROI within the local device.
39. The computer-readable medium according to claim 31, wherein the instructions cause the processor to receive the description from a user of a remote device, and wherein the description defines the ROI for the near-end video encoded by the local device.
40. The computer-readable medium according to claim 31, wherein the description is embedded in far-end video received from the remote device.
41. The computer-readable medium according to claim 31, wherein the description is received from the remote device by out-of-band signaling.
42. The computer-readable medium according to claim 31, wherein the instructions cause the processor to generate information specifying an ROI within encoded far-end video received from the remote device, and to transmit the ROI information to the remote device together with the encoded near-end video.
43. The computer-readable medium according to claim 42, wherein the instructions cause the processor to decode the encoded far-end video received from the remote device to enhance the image quality of an ROI area within the far-end video relative to non-ROI areas of the far-end video.
44. The computer-readable medium according to claim 31, wherein the instructions cause the processor to generate a macroblock (MB) map based on the information specifying the ROI, the MB map identifying the MBs that fall within the ROI.
45. The computer-readable medium according to claim 31, wherein the instructions cause the processor to receive the description from a user of the local device, the description defining the ROI for the near-end video encoded by the local device, and wherein the instructions cause the processor to monitor motion information associated with the encoded near-end video, adjust the ROI based on the motion information, and encode the near-end video based on the adjusted ROI.
46. The computer-readable medium according to claim 45, wherein the instructions cause the processor to generate a macroblock (MB) map based on the information specifying the ROI, the MB map identifying the MBs that fall within the ROI, and wherein the instructions cause the processor to adjust the ROI by changing the status of an MB, based on the motion information, to either included in the ROI or excluded from the ROI.
47. A video encoding system comprising:
a first video communication device that encodes near-end video;
a second video communication device that receives the near-end video from the first video communication device, wherein the second video communication device generates a user description of a region of interest (ROI) within the near-end video generated by the first video communication device; and
an intermediate server, structurally distinct from the first and second video communication devices, that generates information specifying the ROI based on the description,
wherein the first video communication device encodes the near-end video, based on the information specifying the ROI, to enhance the image quality of the ROI of the near-end video relative to non-ROI areas.
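
By way of illustration only, the macroblock (MB) map and the motion-based ROI adjustment recited in the claims might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the disclosure: it assumes the information specifying the ROI is a pixel rectangle, that MBs are 16x16 pixels, and that motion information is available as a per-MB displacement in pixels. The names RoiRect, build_mb_map, adjust_mb_map and assign_qp, and the quantization parameter (QP) values, are hypothetical. Assigning a lower QP to ROI MBs is one common way to favor the ROI during encoding and is shown only as an example.

```python
from dataclasses import dataclass

MB_SIZE = 16  # assumed macroblock size in pixels (16x16)


@dataclass(frozen=True)
class RoiRect:
    """Hypothetical ROI information: a pixel rectangle within the frame."""
    x: int
    y: int
    width: int
    height: int


def build_mb_map(roi, frame_width, frame_height):
    """Return rows of booleans; True marks an MB that overlaps the ROI."""
    cols = (frame_width + MB_SIZE - 1) // MB_SIZE
    rows = (frame_height + MB_SIZE - 1) // MB_SIZE
    mb_map = [[False] * cols for _ in range(rows)]
    # Clamp the rectangle to the frame; an empty intersection marks no MBs.
    x0, y0 = max(roi.x, 0), max(roi.y, 0)
    x1 = min(roi.x + roi.width, frame_width)
    y1 = min(roi.y + roi.height, frame_height)
    if x0 >= x1 or y0 >= y1:
        return mb_map
    for r in range(y0 // MB_SIZE, (y1 - 1) // MB_SIZE + 1):
        for c in range(x0 // MB_SIZE, (x1 - 1) // MB_SIZE + 1):
            mb_map[r][c] = True
    return mb_map


def adjust_mb_map(mb_map, motion):
    """Shift each ROI MB by its displacement, rounded to whole MBs.

    motion maps (row, col) to an assumed (dx, dy) displacement in pixels;
    MBs without an entry are treated as stationary. MBs the ROI moves onto
    become included, and MBs it leaves become excluded.
    """
    rows, cols = len(mb_map), len(mb_map[0])
    adjusted = [[False] * cols for _ in range(rows)]
    half = MB_SIZE // 2
    for r in range(rows):
        for c in range(cols):
            if not mb_map[r][c]:
                continue
            dx, dy = motion.get((r, c), (0, 0))
            # Round to the nearest MB and keep the result inside the frame.
            nr = min(max(r + (dy + half) // MB_SIZE, 0), rows - 1)
            nc = min(max(c + (dx + half) // MB_SIZE, 0), cols - 1)
            adjusted[nr][nc] = True
    return adjusted


def assign_qp(mb_map, roi_qp, non_roi_qp):
    """Give ROI MBs the (lower) roi_qp so they are quantized more finely."""
    return [[roi_qp if in_roi else non_roi_qp for in_roi in row]
            for row in mb_map]
```

For example, for a 64x48 frame, `build_mb_map(RoiRect(16, 16, 32, 16), 64, 48)` marks the two middle MBs of the second MB row. Passing that map to `adjust_mb_map` with a displacement of 16 pixels to the right for both MBs moves the ROI one MB column to the right, and `assign_qp` then yields a per-MB QP table that an encoder could consult.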
CN200680014872.7A 2005-03-09 2006-03-08 Region-of-interest extraction for video telephony Expired - Fee Related CN101171841B (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US66020005P 2005-03-09 2005-03-09
US60/660,200 2005-03-09
US11/183,072 US8019175B2 (en) 2005-03-09 2005-07-15 Region-of-interest processing for video telephony
US11/183,072 2005-07-15
PCT/US2006/008457 WO2006130198A1 (en) 2005-03-09 2006-03-08 Region-of-interest extraction for video telephony

Publications (2)

Publication Number Publication Date
CN101171841A true CN101171841A (en) 2008-04-30
CN101171841B CN101171841B (en) 2012-06-27

Family

ID=39334927

Family Applications (2)

Application Number Title Priority Date Filing Date
CNA2006800145199A Pending CN101167365A (en) 2005-03-09 2006-03-08 Region-of-interest processing for video telephony
CN200680014872.7A Expired - Fee Related CN101171841B (en) 2005-03-09 2006-03-08 Region-of-interest extraction for video telephony

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CNA2006800145199A Pending CN101167365A (en) 2005-03-09 2006-03-08 Region-of-interest processing for video telephony

Country Status (1)

Country Link
CN (2) CN101167365A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102025965A (en) * 2010-12-07 2011-04-20 华为终端有限公司 Video talking method and visual telephone
CN102170552A (en) * 2010-02-25 2011-08-31 株式会社理光 Video conference system and processing method used therein
CN103428488A (en) * 2012-04-18 2013-12-04 Vixs系统公司 Video processing system with pattern detection and method for use thereof
WO2013185699A1 (en) * 2012-09-25 2013-12-19 中兴通讯股份有限公司 Local image enhancing method and apparatus
CN103518210A (en) * 2011-05-11 2014-01-15 阿尔卡特朗讯公司 Method for dynamically adapting video image parameters for facilitating subsequent applications
CN103581603A (en) * 2012-07-24 2014-02-12 联想(北京)有限公司 Multimedia data transmission method and electronic equipment
CN103634564A (en) * 2012-08-22 2014-03-12 一二三视股份有限公司 Method for assigning image monitoring area
CN104782121A (en) * 2012-12-18 2015-07-15 英特尔公司 Multiple region video conference encoding
CN111416939A (en) * 2020-03-30 2020-07-14 咪咕视讯科技有限公司 Video processing method, video processing equipment and computer readable storage medium
CN113330735A (en) * 2018-11-06 2021-08-31 索尼集团公司 Information processing apparatus, information processing method, and computer program
CN114157870A (en) * 2021-12-01 2022-03-08 安谋科技(中国)有限公司 Encoding method, medium, and electronic device

Families Citing this family (8)

Publication number Priority date Publication date Assignee Title
EP2621167A4 (en) * 2010-09-24 2015-04-29 Gnzo Inc Video bit stream transmission system
CN103024334B (en) * 2011-09-28 2015-11-25 中国移动通信集团公司 A kind of method, system and equipment realizing visual telephone service
CN102438144B (en) * 2011-11-22 2013-09-25 苏州科雷芯电子科技有限公司 Video transmission method
CN102750122B (en) * 2012-06-05 2015-10-21 华为技术有限公司 Picture display control, Apparatus and system
US9386275B2 (en) 2014-01-06 2016-07-05 Intel IP Corporation Interactive video conferencing
US9516220B2 (en) 2014-10-02 2016-12-06 Intel Corporation Interactive video conferencing
US10021346B2 (en) 2014-12-05 2018-07-10 Intel IP Corporation Interactive video conferencing
CN105120366A (en) * 2015-08-17 2015-12-02 宁波菊风系统软件有限公司 A presentation method for an image local enlarging function in video call

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
KR100550105B1 (en) * 1998-03-20 2006-02-08 미쓰비시텐키 가부시키가이샤 Method and apparatus for compressing and decompressing image
US6178204B1 (en) * 1998-03-30 2001-01-23 Intel Corporation Adaptive control of video encoder's bit allocation based on user-selected region-of-interest indication feedback from video decoder
US7559026B2 (en) * 2003-06-20 2009-07-07 Apple Inc. Video conferencing system having focus control
US20050024487A1 (en) * 2003-07-31 2005-02-03 William Chen Video codec system with real-time complexity adaptation and region-of-interest coding

Cited By (14)

Publication number Priority date Publication date Assignee Title
CN102170552A (en) * 2010-02-25 2011-08-31 株式会社理光 Video conference system and processing method used therein
CN102025965A (en) * 2010-12-07 2011-04-20 华为终端有限公司 Video talking method and visual telephone
CN102025965B (en) * 2010-12-07 2014-01-01 华为终端有限公司 Video talking method and visual telephone
CN103518210A (en) * 2011-05-11 2014-01-15 阿尔卡特朗讯公司 Method for dynamically adapting video image parameters for facilitating subsequent applications
CN103428488A (en) * 2012-04-18 2013-12-04 Vixs系统公司 Video processing system with pattern detection and method for use thereof
CN103581603B (en) * 2012-07-24 2017-06-27 联想(北京)有限公司 The transmission method and electronic equipment of a kind of multi-medium data
CN103581603A (en) * 2012-07-24 2014-02-12 联想(北京)有限公司 Multimedia data transmission method and electronic equipment
CN103634564A (en) * 2012-08-22 2014-03-12 一二三视股份有限公司 Method for assigning image monitoring area
WO2013185699A1 (en) * 2012-09-25 2013-12-19 中兴通讯股份有限公司 Local image enhancing method and apparatus
US11330262B2 (en) 2012-09-25 2022-05-10 Zte Corporation Local image enhancing method and apparatus
CN104782121A (en) * 2012-12-18 2015-07-15 英特尔公司 Multiple region video conference encoding
CN113330735A (en) * 2018-11-06 2021-08-31 索尼集团公司 Information processing apparatus, information processing method, and computer program
CN111416939A (en) * 2020-03-30 2020-07-14 咪咕视讯科技有限公司 Video processing method, video processing equipment and computer readable storage medium
CN114157870A (en) * 2021-12-01 2022-03-08 安谋科技(中国)有限公司 Encoding method, medium, and electronic device

Also Published As

Publication number Publication date
CN101171841B (en) 2012-06-27
CN101167365A (en) 2008-04-23

Similar Documents

Publication Publication Date Title
CN101171841B (en) Region-of-interest extraction for video telephony
JP6022618B2 (en) Region of interest extraction for video telephony
US8977063B2 (en) Region-of-interest extraction for video telephony
CN102215217B (en) Establishing a video conference during a phone call
CN102215373B (en) In conference display adjustments
US7966005B2 (en) Data processing system and method, communication system and method, and charging apparatus and method
CN108337465B (en) Video processing method and device
KR20110087025A (en) Video communication method and digital television thereof
EP2936802A1 (en) Multiple region video conference encoding
CN103155548A (en) Control of user interface to display call participants auto focus
CN104012086A (en) System and method for depth-guided image filtering in a video conference environment
KR20120133006A (en) System and method for providing a service to streaming IPTV panorama image
CN111193892A (en) Remote linkage system and method based on virtual intelligent medical platform
CN103269445A (en) Smart television system and control method thereof
KR101939130B1 (en) Methods for broadcasting media contents, methods for providing media contents and apparatus using the same
CN104994405A (en) Instant-video transmission method and electronic equipment
US11877084B2 (en) Video conference user interface layout based on face detection
JP2004343175A (en) Video relaying apparatus
KR20090026467A (en) Fractal scalable video coding system using multi-processor and processing method thereof
CN104935861A (en) Multi-party multimedia communication method
JP2002209197A (en) Multiple place video conference system
KR20120004148A (en) Method for transmitting and receiving of video telephony having function of adjusting quality of resolution
KR20090015673A (en) Method for transmitting and receiving of video telephony having function of adjusting transmission environment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 1117688; Country of ref document: HK)

C14 Grant of patent or utility model
GR01 Patent grant
REG Reference to a national code (Ref country code: HK; Ref legal event code: WD; Ref document number: 1117688; Country of ref document: HK)

CF01 Termination of patent right due to non-payment of annual fee (Granted publication date: 20120627; Termination date: 20190308)