CN101171841B - Region-of-interest extraction for video telephony - Google Patents

Region-of-interest extraction for video telephony

Publication number
CN101171841B
Authority
CN
China
Prior art keywords
roi
video
information
far
coding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN200680014872.7A
Other languages
Chinese (zh)
Other versions
CN101171841A (en)
Inventor
李彦辑
哈立德·希勒米·厄勒-马列
蔡明章
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qualcomm Inc
Original Assignee
Qualcomm Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US11/183,072 (US8019175B2)
Application filed by Qualcomm Inc
Publication of CN101171841A
Application granted
Publication of CN101171841B
Legal status: Expired - Fee Related
Anticipated expiration

Abstract

The disclosure is directed to techniques for region-of-interest (ROI) processing for video telephony (VT) applications. According to the disclosed techniques, a recipient device defines ROI information for video information transmitted by a sender device, i.e., far-end video information. The recipient device transmits the ROI information to the sender device. Using the ROI information transmitted by the recipient device, the sender device applies preferential encoding to an ROI within a video scene. ROI extraction may be applied to process a user description of a region of interest (ROI) to generate information specifying the ROI based on the description. The user description may be textual, graphical, or speech-based. An extraction module applies appropriate processing to generate the ROI information from the user description. The extraction module may reside locally with a video communication device, or reside in a distinct intermediate server configured for ROI extraction.

Description

Region-of-interest extraction for video telephony
This application claims the benefit of U.S. Provisional Application No. 60/660,200, filed March 9, 2005, and of co-pending U.S. Patent Application No. 11/183,072, entitled REGION-OF-INTEREST PROCESSING FOR VIDEO TELEPHONY, filed July 15, 2005.
Technical field
This disclosure relates to digital video encoding and decoding and, more particularly, to techniques for processing region-of-interest (ROI) information for video telephony (VT) applications.
Background
A number of different video encoding standards have been established for encoding digital video sequences. The Moving Picture Experts Group (MPEG), for example, has developed a number of standards, including MPEG-1, MPEG-2, and MPEG-4. Other examples include the International Telecommunication Union (ITU) H.263 standard and the emerging ITU H.264 standard. These video encoding standards generally improve the transmission efficiency of video sequences by encoding data in a compressed manner.
Video telephony (VT) permits users to share video and audio information to support applications such as videoconferencing. Exemplary video telephony standards include those defined by the Session Initiation Protocol (SIP), the ITU H.323 standard, and the ITU H.324 standard. In a VT system, users may send and receive video information, only receive video information, or only send video information. A recipient generally views received video information in the form in which it is transmitted from a sender.
Preferential encoding of a selected portion of the video information has been proposed. For example, a sender may specify a region of interest (ROI) to be encoded with higher quality for transmission to a recipient. The sender may wish to emphasize the ROI to a remote recipient. A typical example of an ROI is a human face, although a sender may wish to focus attention on other objects within a video scene. With preferential encoding of the ROI relative to non-ROI regions, a recipient is able to view the ROI more clearly.
Summary of the invention
This disclosure is directed to techniques for region-of-interest (ROI) processing for video telephony (VT). According to the disclosed techniques, a local recipient device defines ROI information for video that is encoded and transmitted by a remote sender device, i.e., far-end video. The local recipient device transmits the ROI information to the remote sender device. Using the ROI information transmitted by the recipient device, the sender device applies preferential encoding, such as higher-quality encoding or error protection, to an ROI within a video scene. In this manner, the recipient device is able to remotely control ROI encoding of far-end video encoded by the sender device.
In addition to receiving far-end video, a recipient may also be equipped to send video, i.e., near-end video. Hence, devices participating in VT communication may act symmetrically as both sender and recipient of video information. Acting as a recipient, each device may define far-end ROI information for video encoded by the remote device acting as sender. Acting as a sender, each device may define near-end ROI information for video information transmitted to the other device acting as recipient. A sender or recipient device may be referred to as "ROI-aware," in the sense that it is able to process ROI information provided by another device to support remote control of ROI video encoding.
Far-end ROI information permits a recipient to control remote ROI encoding by a sender device in order to view objects or regions within a received video scene more clearly. Near-end ROI information permits a sender to control local ROI encoding to emphasize objects or regions within a transmitted video scene. Hence, preferential encoding of an ROI by the sender may be based on ROI information generated by either the recipient or the sender. In addition, a recipient device may preferentially decode the ROI based on the ROI information, e.g., by applying higher-quality post-processing such as error concealment, deblocking, or deringing techniques.
To facilitate ROI processing, this disclosure further contemplates techniques for ROI selection, ROI mapping, ROI extraction, ROI signaling, ROI tracking, and access authentication of recipient devices to permit remote control of ROI encoding by a sender device. ROI selection may rely on pre-defined ROI patterns, verbal or textual ROI descriptions, or user-drawn ROIs. ROI mapping involves translating a selected ROI pattern into an ROI map, which may take the form of a macroblock (MB) map suitable for use by a video encoder.
ROI signaling may involve in-band or out-of-band signaling of ROI information from the recipient device to the sender device. ROI tracking involves dynamically adjusting the ROI map in response to ROI motion. Access authentication may involve granting access rights and levels to recipient devices for purposes of remote ROI control, and resolving ROI control conflicts between local and remote users or among multiple remote users.
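The ROI signaling step can be illustrated with a minimal Python sketch. The payload layout used here (one byte for the row count, one byte for the column count, then the 0/1 macroblock flags packed eight per byte) is purely an assumption for illustration; the disclosure does not specify a wire format, and the function names `pack_roi_map` and `unpack_roi_map` are hypothetical.

```python
def pack_roi_map(mb_map):
    """Pack a 2-D list of 0/1 macroblock flags into a compact byte string.

    Hypothetical layout: [rows][cols][flags packed MSB-first, 8 per byte].
    Assumes rows and cols each fit in one byte (at most 255).
    """
    rows, cols = len(mb_map), len(mb_map[0])
    bits = [flag for row in mb_map for flag in row]
    payload = bytearray([rows, cols])
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | bit
        byte <<= 8 - len(chunk)  # left-align a final partial byte
        payload.append(byte)
    return bytes(payload)


def unpack_roi_map(payload):
    """Recover the 2-D flag list from a payload built by pack_roi_map."""
    rows, cols = payload[0], payload[1]
    bits = []
    for byte in payload[2:]:
        bits.extend((byte >> shift) & 1 for shift in range(7, -1, -1))
    bits = bits[:rows * cols]  # drop padding bits
    return [bits[r * cols:(r + 1) * cols] for r in range(rows)]
```

A payload of this kind could be carried either alongside the encoded video (in-band) or over a separate control channel (out-of-band); the choice of transport is outside the scope of the sketch.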
ROI extraction may involve processing a user description of a region of interest (ROI) to generate information specifying the ROI based on the description. Near-end video may be encoded based on the information specifying the ROI to enhance the image quality of the ROI relative to non-ROI areas of the near-end video. The user description may be textual, graphical, or speech-based. An extraction module applies appropriate processing to generate the ROI information from the user description. The extraction module may reside locally with a video communication device, or reside in a distinct intermediate server configured for ROI extraction.
In one embodiment, this disclosure provides a method comprising receiving, from a remote device, information specifying a region of interest (ROI) within near-end video encoded by a local device and received by the remote device, and encoding the near-end video based on the ROI to enhance image quality of the ROI relative to non-ROI areas of the video.
In another embodiment, this disclosure provides a video encoding device comprising a region-of-interest (ROI) engine that receives, from a remote video communication device, information specifying a region of interest (ROI) within near-end video transmitted to the remote device, and a video encoder that encodes the near-end video to enhance image quality of the ROI relative to non-ROI areas of the video.
In an additional embodiment, this disclosure provides a method comprising generating information specifying a region of interest (ROI) within far-end video transmitted by a remote device and received by a local device, and transmitting the information to the remote device for use in encoding the far-end video based on the ROI to enhance image quality of the ROI relative to non-ROI areas of the video.
In another embodiment, this disclosure provides a video encoding device comprising a region-of-interest (ROI) engine that generates information specifying a region of interest (ROI) within far-end video received from a remote device, and a video encoder that encodes near-end video and transmits the information specifying the ROI together with the encoded near-end video, for use by the remote device in encoding the far-end video based on the ROI to enhance image quality of the ROI relative to non-ROI areas of the far-end video.
In another embodiment, this disclosure provides a method comprising receiving, from a user, a description of a region of interest (ROI) within near-end video generated by a local device, generating information specifying the ROI based on the description, and encoding the near-end video based on the information specifying the ROI to enhance image quality of the ROI relative to non-ROI areas of the near-end video.
In an additional embodiment, this disclosure provides a video encoding device comprising a region-of-interest (ROI) engine that receives a description of a region of interest (ROI) within near-end video encoded by the device and generates information specifying the ROI based on the description, and a video encoder that encodes the near-end video to enhance image quality of the ROI relative to non-ROI areas of the video.
In another embodiment, this disclosure provides a video encoding system comprising: a first video communication device that encodes near-end video; a second video communication device that receives the near-end video from the first video communication device, wherein the second video communication device generates a user description of a region of interest (ROI) within the near-end video generated by the first video communication device; and an intermediate server, structurally distinct from the first and second video communication devices, that generates information specifying the ROI based on the description, wherein the first video communication device encodes the near-end video based on the information specifying the ROI to enhance image quality of the ROI relative to non-ROI areas of the near-end video.
The techniques described herein may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the techniques may be realized in part by a computer-readable medium comprising program code containing instructions that, when executed, perform one or more of the methods described herein.
The details of one or more embodiments are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
Brief description of the drawings
Fig. 1 is a block diagram illustrating a video encoding and decoding system incorporating ROI-aware video encoder-decoders (CODECs).
Fig. 2 is a diagram illustrating definition of an ROI within a video scene presented on a display associated with a wireless communication device.
Fig. 3 is a block diagram illustrating a communication device incorporating an ROI-aware CODEC.
Fig. 4 is a block diagram illustrating another communication device having an ROI-aware CODEC and further incorporating an ROI extraction module.
Fig. 5 is a block diagram illustrating distributed ROI extraction via an intermediate extraction server.
Fig. 6 is a block diagram illustrating distributed ROI extraction for multiple video telephony sessions.
Figs. 7A-7D are diagrams illustrating pre-defined ROI patterns for selection by a user.
Fig. 8 is a flow diagram illustrating generation of ROI information at a recipient device to control preferential ROI encoding of near-end video at a remote sender device.
Fig. 9 is a flow diagram illustrating processing of ROI information from a recipient device for preferential ROI encoding of near-end video at a sender device in combination with ROI tracking.
Fig. 10 is a flow diagram illustrating processing of ROI information from a recipient device for preferential ROI encoding of near-end video at a sender device in combination with user authentication.
Fig. 11 is a flow diagram illustrating selection of a pre-defined ROI pattern.
Fig. 12 is a diagram illustrating definition of an ROI pattern within a displayed video scene by expanding and contracting an ROI template.
Fig. 13 is a diagram illustrating definition of an ROI pattern within a displayed video scene by dragging an ROI template.
Fig. 14 is a diagram illustrating definition of an ROI pattern within a displayed video scene by drawing an ROI area on a touchscreen with a stylus.
Fig. 15 is a diagram illustrating definition of an ROI pattern within a displayed video scene using a drop-down menu of specified ROI objects to be dynamically extracted and tracked.
Fig. 16 is a diagram illustrating definition of an ROI pattern within a displayed video scene using a drop-down menu of specified ROI objects that are mapped to pre-defined ROI patterns as in Figs. 7A-7D.
Fig. 17 is a flow diagram illustrating definition of an ROI pattern within a displayed video scene using an ROI description interface.
Fig. 18 is a flow diagram illustrating resolution of ROI conflicts between sender and recipient devices.
Fig. 19 is a flow diagram illustrating preferential decoding of ROI macroblocks in far-end video.
Detailed description
Fig. 1 is a block diagram illustrating a video encoding and decoding system 10 incorporating ROI-aware video encoder-decoders (CODECs). As shown in Fig. 1, system 10 includes a first video communication device 12 and a second video communication device 14. Communication devices 12, 14 are connected by a transmission channel 16. Transmission channel 16 may be a wired or wireless medium. System 10 supports two-way video transmission between video communication devices 12, 14 for video telephony. Devices 12, 14 may operate in a substantially symmetrical manner. In some embodiments, however, one or both of video communication devices 12, 14 may be configured for only one-way communication to support ROI-aware video streaming.
For two-way applications, reciprocal encoding, decoding, multiplexing (MUX), and demultiplexing (DEMUX) components may be provided at opposite ends of channel 16. In the example of Fig. 1, video communication device 12 includes MUX/DEMUX component 18, ROI-aware video CODEC 20, and audio CODEC 22. Similarly, video communication device 14 includes MUX/DEMUX component 26, ROI-aware video CODEC 28, and audio CODEC 30. Each CODEC 20, 28 is "ROI-aware" in the sense that it is able to process ROI information provided remotely by the other video communication device 12, 14 or locally by its own video communication device.
Video communication devices 12, 14 may be implemented as wireless mobile terminals or wired terminals equipped for video streaming, video telephony, or both. To that end, video communication devices 12, 14 may further include appropriate wireless transmitter, receiver, modem, and processing electronics to support wireless communication. Examples of wireless mobile terminals include mobile radio telephones, mobile personal digital assistants (PDAs), mobile computers, and other mobile devices equipped with wireless communication capabilities and video encoding and/or decoding capabilities. Examples of wired terminals include desktop computers, video telephones, network appliances, set-top boxes, interactive televisions, and the like. Either of video communication devices 12, 14 may be configured to send video information, receive video information, or both send and receive video information.
For video telephony applications, it is generally desirable that device 12 support both video send and video receive capabilities. However, streaming video applications are also contemplated. In video telephony, and particularly in mobile video telephony over wireless links, bandwidth is a significant concern. Accordingly, selectively allocating additional encoding bits to an ROI, or applying other preferential encoding steps, can improve the image quality of a portion of the video while maintaining overall encoding efficiency. For preferential encoding, additional bits may be allocated to the ROI, while a reduced number of bits may be allocated to the non-ROI regions, such as the background in a video scene.
In general, system 10 employs techniques for region-of-interest (ROI) processing for video telephony (VT) applications. However, such techniques may also be applicable to video streaming applications, as mentioned above. For purposes of illustration, it will be assumed that each video communication device 12, 14 is capable of operating as both a sender and a recipient of video information, and thereby operating as a full participant in a VT session. For video information transmitted from video communication device 12 to video communication device 14, video communication device 12 is the sender device and video communication device 14 is the recipient device. Conversely, for video information transmitted from video communication device 14 to video communication device 12, video communication device 12 is the recipient device and video communication device 14 is the sender device. When discussing video information to be encoded and transmitted by a local video communication device 12, 14, the video information will be referred to as "near-end" video. When discussing video information to be encoded by and received from a remote video communication device 12, 14, the video information will be referred to as "far-end" video.
According to the disclosed techniques, when operating as a recipient device, video communication device 12 or 14 defines ROI information for far-end video information that is received from a sender device. Again, video information received from a sender device is referred to as "far-end" video information in the sense that it is received from the other (sender) device situated at the far end of the communication channel. Likewise, ROI information defined for video information received from a sender device is referred to as "far-end" ROI information. Far-end ROI generally refers to the region within the far-end video that most interests the recipient of the far-end video. The recipient device decodes the far-end video information and presents the decoded far-end video to a user via a display device. The user selects an ROI within the video scene presented by the far-end video.
The recipient device generates far-end ROI information based on the ROI selected by the user, and sends the far-end ROI information to the sender device. The far-end ROI information may take the form of an ROI macroblock (MB) map that defines the ROI in terms of the macroblocks residing within the ROI. The ROI MB map may flag MBs inside the ROI with a 1 and MBs outside the ROI with a 0, to readily identify MBs included in (1) and excluded from (0) the ROI. An MB is a video block that forms part of a frame. The size of an MB may be 16 by 16 pixels; however, other MB sizes are possible. Accordingly, an MB may refer to any video block, including but not limited to a macroblock as defined in a particular video coding standard such as MPEG-1, MPEG-2, MPEG-4, ITU H.263, ITU H.264, or any other standard.
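The ROI MB map described above can be sketched in Python as follows. This is a minimal illustration, assuming a rectangular ROI given in pixels as (x, y, width, height) and assuming that any MB overlapping the rectangle is flagged as inside the ROI; the function name `roi_mb_map` and both of these choices are assumptions made for the example, not details taken from the disclosure.

```python
def roi_mb_map(frame_w, frame_h, roi, mb_size=16):
    """Build a row-major 2-D list of 0/1 flags, one per macroblock.

    roi is (x, y, w, h) in pixels (an assumed representation).
    An MB is flagged 1 if it overlaps the ROI rectangle at all
    (an assumed policy), and 0 otherwise.
    """
    x, y, w, h = roi
    cols = (frame_w + mb_size - 1) // mb_size  # MBs per row, rounded up
    rows = (frame_h + mb_size - 1) // mb_size  # MB rows, rounded up
    mb_map = []
    for r in range(rows):
        row = []
        for c in range(cols):
            mb_x0, mb_y0 = c * mb_size, r * mb_size
            overlaps = (mb_x0 < x + w and mb_x0 + mb_size > x and
                        mb_y0 < y + h and mb_y0 + mb_size > y)
            row.append(1 if overlaps else 0)
        mb_map.append(row)
    return mb_map
```

For a QCIF frame (176 by 144 pixels) with 16 by 16 MBs, the map has 9 rows of 11 flags, and a 64 by 48 pixel ROI aligned to MB boundaries sets 12 of them to 1.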
Using the far-end ROI information transmitted by the recipient device, the sender device applies preferential encoding to the corresponding ROI within the video scene. In particular, additional encoding bits may be allocated to the ROI, while a reduced number of encoding bits may be allocated to non-ROI regions, thereby improving the image quality of the ROI. In this manner, the recipient device is able to remotely control ROI encoding of far-end video information by the sender device. Preferential encoding, e.g., by preferential bit allocation or preferential quantization in the ROI area, applies higher-quality encoding to the ROI area than to non-ROI areas of the video scene. The preferentially encoded ROI permits the user of the recipient device to view an object or region more clearly. For example, the user of the recipient device may wish to view a face or some other object more clearly than the background regions of the video scene.
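Preferential bit allocation can be illustrated with a simple weighted split of a per-frame bit budget. The sketch below is only one possible scheme, assumed for illustration: each ROI macroblock receives `roi_weight` times the share of a non-ROI macroblock. The function name `allocate_bits`, the weighting rule, and the default weight are hypothetical and are not taken from the disclosure or from any particular encoder.

```python
def allocate_bits(mb_map, frame_budget, roi_weight=3.0):
    """Split a frame's bit budget across macroblocks, favoring the ROI.

    mb_map is a 2-D list of 0/1 flags (1 = MB inside the ROI).
    Returns a flat, row-major list of per-MB bit targets. Targets are
    truncated to integers, so their sum never exceeds frame_budget.
    """
    flags = [flag for row in mb_map for flag in row]
    weights = [roi_weight if flag else 1.0 for flag in flags]
    total = sum(weights)
    return [int(frame_budget * w / total) for w in weights]
```

With one ROI macroblock out of four and a 600-bit budget, the ROI macroblock is targeted at 300 bits and each background macroblock at 100 bits. A real encoder would more likely express the same preference through its rate control or per-MB quantization parameters.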
When operating as a sender device, video communication device 12 or 14 may also define ROI information for video information that is transmitted by the sender device. Again, video information generated in the sender device is referred to as "near-end" video in the sense that it is generated at the near end of the communication channel. ROI information generated by the sender device is referred to as "near-end" ROI information. Near-end ROI generally refers to a region of the near-end video that the sender wants to emphasize to the recipient. Hence, an ROI may be specified by a recipient device user as far-end ROI information, or by a sender device user as near-end ROI information. The sender device presents the near-end video to a user via a display device. The user associated with the sender device selects an ROI within the video scene presented by the near-end video. The sender device encodes the near-end video using the user-selected ROI such that the ROI in the near-end video is preferentially encoded, e.g., with higher-quality encoding, relative to non-ROI areas.
The near-end ROI selected by a local user at the sender device allows the user of the sender device to emphasize regions or objects within the video scene, and thereby direct such regions or objects to the attention of the recipient device user. Notably, the near-end ROI selected by the sender device user need not be transmitted to the recipient device. Instead, the sender device uses the selected near-end ROI information to locally encode the near-end video before it is transmitted to the recipient device. In some embodiments, however, the sender device may send ROI information to the recipient device to permit application of preferential decoding techniques, such as higher-quality error correction (e.g., error concealment) or post-processing (e.g., deblocking and deringing filters).
If ROI information is provided by both the sender device and the recipient device, the sender device applies either the far-end ROI information received from the recipient device or the locally generated near-end ROI information to encode the near-end video. ROI conflicts may arise between the near-end and far-end ROI selections provided by the sender device and the recipient device. Such conflicts may require resolution, such as active resolution by a local user or resolution according to specified access rights and levels, as will be described elsewhere in this disclosure. In either case, the sender device preferentially encodes the ROI based on near-end ROI information provided locally by the sender device or on ROI information provided remotely by the recipient device.
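One way the sender device might choose between competing ROI selections is sketched below. The policy shown is an assumption made for illustration only: a far-end ROI is ignored when the recipient lacks control rights, an explicit choice by the local user wins when both ROIs are present, and the near-end ROI is used as the default. The function name `select_roi` and its parameters are hypothetical; the disclosure describes its own conflict-resolution flow elsewhere.

```python
def select_roi(near_roi, far_roi, remote_may_control, local_choice=None):
    """Pick the ROI the sender's encoder should use.

    near_roi / far_roi: ROI objects or None when not provided.
    remote_may_control: whether the recipient has been granted
        access rights for remote ROI control (assumed boolean).
    local_choice: optional "near" or "far" picked by the local user.
    """
    if far_roi is not None and not remote_may_control:
        far_roi = None  # recipient is not authorized to steer the encoder
    if near_roi is None or far_roi is None:
        # No conflict: use whichever ROI exists (possibly None).
        return near_roi if near_roi is not None else far_roi
    if local_choice == "far":
        return far_roi
    # Assumed default: the local user's own ROI takes precedence.
    return near_roi
```

A richer policy could replace the boolean with graded access levels or arbitrate among several remote users, as the surrounding text suggests.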
To facilitate ROI processing, this disclosure further contemplates techniques for ROI selection, ROI mapping, ROI signaling, ROI tracking, and access authentication of recipient devices to permit remote control of ROI encoding by a sender device. As will be described, different ROI selection techniques applied by a recipient device or sender device may involve selection of pre-defined ROI patterns, verbal or textual ROI description, or ROI drawing by a user. In a recipient device, ROI mapping involves translating a selected far-end or near-end ROI pattern into an ROI map, which may take the form of a macroblock (MB) map. ROI signaling may involve in-band or out-of-band signaling of far-end ROI information from a recipient device to a sender device. ROI tracking involves dynamic adjustment, in response to ROI motion, of the far-end ROI map produced by the recipient device or of the near-end ROI produced locally by the sender itself. Access authentication may involve granting access rights and levels to recipient devices for purposes of remote control of the far-end ROI, and resolving ROI control conflicts between recipient and sender devices.
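The ROI tracking idea, adjusting the ROI as its content moves, can be sketched as a rectangle update. The example assumes that some motion estimate (dx, dy) in pixels for the ROI content is already available, and it simply shifts the rectangle and clamps it to the frame; how the motion is estimated, and the function name `shift_roi`, are assumptions outside what the text specifies.

```python
def shift_roi(roi, dx, dy, frame_w, frame_h):
    """Move an (x, y, w, h) ROI by (dx, dy) pixels, kept inside the frame.

    Assumes the ROI is no larger than the frame. The returned rectangle
    would then be converted to a fresh MB map for the encoder.
    """
    x, y, w, h = roi
    new_x = min(max(x + dx, 0), frame_w - w)  # clamp horizontally
    new_y = min(max(y + dy, 0), frame_h - h)  # clamp vertically
    return (new_x, new_y, w, h)
```

Calling this once per frame, or once per group of frames, with the latest motion estimate keeps the preferentially encoded area aligned with the object the user selected.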
System 10 may support video telephony according to the Session Initiation Protocol (SIP), the ITU H.323 standard, the ITU H.324 standard, or other standards. Each video CODEC 20, 28 generates encoded video data according to a video compression standard, such as MPEG-2, MPEG-4, ITU H.263, or ITU H.264. As further shown in Fig. 1, video CODECs 20, 28 may be integrated with respective audio CODECs 22, 30, and include appropriate MUX/DEMUX components 18, 26 to handle the audio and video portions of a data stream. The MUX/DEMUX units 18, 26 may conform to the ITU H.223 multiplexer protocol, or to other protocols such as the user datagram protocol (UDP).
Fig. 2 is a diagram illustrating definition of an ROI 32 within a video scene 34 presented on a display 36 associated with a wireless communication device 38. In the example of Fig. 2, ROI 32 is a rectangular region that contains the face 39 of a person presented in video scene 34, although an ROI may contain any image or object for which improved or enhanced encoding is desired. In a VT application, the person presented in video scene 34 typically will be a user of the remote sender device who is a party to a videoconference with the user of wireless communication device 38, which operates as a recipient device. ROI 32 constitutes a far-end ROI in the sense that it defines an ROI within a video scene transmitted from the remote sender device. In accordance with this disclosure, far-end ROI 32 is transmitted to the sender device to specify preferential encoding of the area of the video scene within the ROI. In this manner, the local user of recipient device 38 is able to remotely control the image quality of far-end ROI 32. As will be described, the size, shape, and position of far-end ROI 32 may be fixed or adjustable, and may be defined, described, or adjusted in a variety of ways.
ROI 32 permits the recipient device user to more clearly view individual objects within video scene 34, such as the face 39 of a person. The face 39 within ROI 32 is encoded with higher image quality relative to non-ROI areas, such as background regions, of video scene 34. In this way, the user is able to more clearly view facial expressions, lip movement, eye movement, and so forth. However, ROI 32 may alternatively be used to specify any object other than a face. Generally speaking, the ROI in VT applications can be very subjective and may differ from user to user. The desired ROI also depends on how VT is used. In some cases, VT may be used to view and evaluate objects, in contrast to videoconferencing.
For example, a husband may use a VT application to show a gift that he wants to buy at an airport gift shop. The husband may wish to get a second opinion from his wife in a timely and interactive manner. In doing so, he can make a decision immediately, because the flight he is taking is about to depart. In this case, the ROI is the region covering the gift that the husband is considering. By allowing the wife (or the husband) to select the ROI, it is possible to achieve better encoding or better quality of service for that particular ROI, and thereby allow the wife to view the gift more clearly.
As another example, two or more engineers may be involved in a VT call in which various equations or diagrams are presented and discussed on a whiteboard. In this case, the remote user may wish to view a region of the whiteboard with better image quality, e.g., to see the details of the equations more clearly. To that end, the remote user selects an ROI that contains the equations. In addition, as one of the engineers adds to the whiteboard, the remote user may wish to move the ROI to track the subject matter newly added to the whiteboard. The ability of the remote user to specify the ROI can significantly improve the exchange of information in the course of a technical discussion.
The ROI techniques described herein not only improve the video quality of the ROI, but also enhance video interaction between the two users. In general, a conventional VT application simply combines two one-way video transmissions, and any interaction is carried out verbally. In conventional VT applications, there is typically no interaction on the video side. Permitting a recipient device user to have at least limited control over the video content received from a sender device during a VT call allows for greater video interaction.
In this way, a VT application can be designed so that the recipient device user can select an ROI and send ROI information back to the sender device for preferential treatment of the ROI, such as higher-quality encoding (for example, by allocating more coding bits) or stronger error protection (for example, intra-MB refresh). In effect, by specifying a far-end ROI, the recipient device user remotely controls the encoder of the sender device. In addition, the far-end ROI information can be used by the ROI-aware video decoder in the device that receives the far-end video to perform better post-processing, such as error concealment, deblocking, or deringing. Remote control of the video encoder by the recipient of the encoded video differs from merely controlling the pan, tilt, zoom, or focus of a remote camera. With remote ROI processing, by contrast, the user influences the encoding quality applied to a given region. In some embodiments, however, remote camera control may be provided in combination with remote video encoder control.
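By way of illustration only, preferential encoding by allocating more coding bits to the ROI may be sketched as a weighted split of a frame's bit budget across macroblocks. The function name allocate_bits and the parameter roi_weight are hypothetical and are not taken from this disclosure; an actual rate controller would be considerably more elaborate.

```python
def allocate_bits(frame_bits, roi_flags, roi_weight=3.0):
    """Split a frame's bit budget across macroblocks, favoring ROI MBs.

    roi_flags  -- flat list of booleans, one per macroblock (True = in ROI)
    roi_weight -- hypothetical tuning knob: each ROI MB receives this many
                  times the share of a non-ROI MB
    """
    weights = [roi_weight if in_roi else 1.0 for in_roi in roi_flags]
    total = sum(weights)
    # Each MB gets a share of the budget proportional to its weight.
    return [frame_bits * w / total for w in weights]


# Four macroblocks, the middle two inside the ROI.
budget = allocate_bits(1200, [False, True, True, False], roi_weight=2.0)
```

With a weight of 2.0, each ROI macroblock receives twice the bits of a non-ROI macroblock while the total frame budget is unchanged.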
Fig. 3 is a block diagram illustrating a video communication device 12 that incorporates an ROI-aware CODEC. Although Fig. 3 depicts video communication device 12 of Fig. 1, video communication device 14 may be constructed similarly. Again, video communication device 12 or 14 may act as a recipient device, a sender device, or preferably both. As shown in Fig. 3, video communication device 12 includes ROI-aware CODEC 20, video capture device 40, and user interface 42. Although channel 16 is shown in Fig. 3, the MUX/DEMUX and audio components are omitted for ease of illustration. Video capture device 40 may be a video camera integrated with, or operably coupled to, video communication device 12. In some embodiments, for example, video capture device 40 may be integrated with a mobile telephone to form a so-called camera phone. In this way, video capture device 40 can support mobile VT applications.
User interface 42 may include a display device, such as a liquid crystal display (LCD), a plasma screen, a projector display, or any other display apparatus that may be integrated with, or operably coupled to, video communication device 12. The display device presents video images to the user of video communication device 12. The video images may include near-end video obtained locally by video capture device 40 and far-end video transmitted remotely from a sender device. In addition, user interface 42 may include any of a variety of user input media, including hard keys, soft keys, various pointing devices, a stylus, and the like, for entry of information by the user of video communication device 12. In some embodiments, the display device and user input media of user interface 42 may be integrated with a mobile telephone. The user of video communication device 12 relies on user interface 42 to view far-end video and, optionally, near-end video. In addition, the user relies on user interface 42 to enter information for defining or selecting a far-end ROI and, optionally, a near-end ROI.
As further shown in Fig. 3, ROI-aware CODEC 20 includes ROI engine 44, ROI-aware video encoder 46, and ROI-aware video decoder 48. ROI-aware video encoder 46 encodes the near-end video ("near-end video") obtained from video capture device 40 for transmission to a remote recipient device. Again, the term "near-end" denotes video generated locally within video communication device 12, in contrast to "far-end" video received from a remote video communication device (for example, video communication device 14). In the example of Fig. 3, ROI-aware video encoder 46 uses near-end ROI information obtained from a remote recipient ("remote near-end ROI") to preferentially encode the near-end ROI. The remote recipient is the user associated with remote video communication device 14.
From the perspective of the remote user, the remote near-end ROI is a far-end ROI when it is transmitted by remote device 14; from the perspective of the local user of device 12, it is referred to as remote near-end ROI when it is received. That is, the perspective of device 12 or 14 as sender or recipient determines whether video and ROI are considered to pertain to near-end or far-end video. Again, a user of local device 12 who remotely controls video encoding at remote device 14 specifies a far-end ROI. When the user of remote device 14 receives that far-end ROI, however, it is considered remote near-end ROI, because it pertains to the near-end video being encoded by device 14, which is the local device from that user's perspective. In general, perspective is important for purposes of the labels used in this disclosure.
Optionally, ROI-aware video encoder 46 may use near-end ROI information obtained from the local user of video communication device 12 ("local near-end ROI"). Local near-end ROI may also be referred to as sender-driven ROI, because it is generated by the sender of the encoded near-end video. Local near-end ROI information is used by local encoder 46 and ordinarily is not sent to the other video communication device 14, unless the video decoder in remote device 14 is designed to apply preferential decoding to the near-end ROI specified by the user of sender device 12. Remote near-end ROI may also be referred to as receiver-driven ROI, because it is generated by the remote receiver of the encoded near-end video. Remote near-end ROI permits the recipient of the video generated by video communication device 12 to control the ROI encoding performed by ROI-aware encoder 46, whereas local near-end ROI permits the sender of that video to control the ROI encoding performed by ROI-aware encoder 46. In some cases, as will be described, the remote and local ROI definitions may conflict, requiring conflict resolution.
Local and remote near-end ROI information may be provided to ROI-aware encoder 46 as a near-end ROI macroblock (MB) map ("near-end ROI MB map"). The near-end ROI MB map identifies the particular MBs that reside within the receiver near-end ROI or the sender near-end ROI. ROI-aware encoder 46 preferentially encodes the ROI in the near-end video with higher-quality encoding, stronger error protection, or both, to improve the image quality of the ROI when viewed, for example, by the remote user at remote video communication device 14. Better error protection for the ROI may be especially desirable in wireless telephony applications. The resulting encoded near-end video ("encoded near-end video") is then transmitted to remote device 14.
As will be explained, ROI-aware video encoder 46 also transmits far-end ROI information ("far-end ROI") generated by the local user of video communication device 12 for the far-end video received from remote video communication device 14. The far-end ROI serves as receiver-driven ROI for the video encoded by remote video communication device 14. In effect, the far-end ROI information transmitted by video communication device 12 permits at least partial control of the encoder of the far-end video generated by remote video communication device 14, just as the remote near-end ROI received by ROI-aware decoder 48 is used by video communication device 12 to control ROI-aware video encoder 46. In this way, each video communication device 12, 14 can influence the ROI encoding in the far-end video generated by the other device.
The far-end ROI information transmitted by video communication device 12 may be sent as in-band or out-of-band signaling information. In the case of in-band signaling, the far-end ROI information may be embedded within the encoded near-end video bitstream that is transmitted to remote video communication device 14. For example, the MPEG-4 bitstream format includes a field called "user_data," which can be used to embed information describing the bitstream. The "user_data" field, or a similar field in another bitstream format, may be used to embed far-end ROI information without violating bitstream compliance. Alternatively, the ROI information may be embedded within the video bitstream by so-called data hiding techniques such as steganography.
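As a rough, non-authoritative sketch of the in-band option, the example below wraps a rectangular ROI in an MPEG-4 Visual user_data chunk and recovers it on the receiving side. The start code value 0x000001B2 is the user_data start code of MPEG-4 Visual; the ASCII payload layout ("ROI:x,y,width,height") and the function names are assumptions made here for illustration and are not defined by this disclosure or by the standard. An ASCII payload is chosen because it contains no zero bytes and therefore cannot emulate a start code.

```python
USER_DATA_START_CODE = b"\x00\x00\x01\xb2"  # MPEG-4 Visual user_data start code
START_CODE_PREFIX = b"\x00\x00\x01"         # prefix shared by all start codes


def pack_roi_user_data(x, y, width, height):
    """Build a user_data chunk carrying a rectangular ROI.

    The "ROI:x,y,w,h" text layout is a hypothetical format that sender and
    receiver would have to agree on in advance.
    """
    payload = "ROI:{},{},{},{}".format(x, y, width, height).encode("ascii")
    return USER_DATA_START_CODE + payload


def find_roi_user_data(bitstream):
    """Return (x, y, width, height) from the first ROI chunk, or None."""
    marker = USER_DATA_START_CODE + b"ROI:"
    start = bitstream.find(marker)
    if start < 0:
        return None
    start += len(marker)
    # user_data runs until the next start code (or the end of the buffer).
    end = bitstream.find(START_CODE_PREFIX, start)
    if end < 0:
        end = len(bitstream)
    fields = bitstream[start:end].rstrip(b"\x00").decode("ascii").split(",")
    return tuple(int(f) for f in fields)


# A user_data chunk followed by the start of a VOP (start code 0x000001B6).
stream = pack_roi_user_data(16, 32, 64, 48) + b"\x00\x00\x01\xb6\x12\x34"
roi = find_roi_user_data(stream)
```

A decoder that does not recognize the payload simply skips the user_data chunk, which is why this kind of embedding does not affect bitstream compliance.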
ROI-aware video decoder 48 is configured to look for ROI information in the user_data field or elsewhere in the incoming far-end video from the remote device. In the case of out-of-band signaling, a signaling protocol such as ITU H.245 or SIP may be used to convey the far-end ROI information. In either case, the far-end ROI information may take the form of an ROI MB map or of physical coordinates that define the position and/or size of the far-end ROI. Once decoder 48 receives the far-end video bitstream, it retrieves the ROI information according to the format agreed upon with the remote sender device, and passes the ROI information to authentication module 58 to obtain access permission for near-end ROI control before the remote near-end ROI is provided to video encoder 46.
In addition to controlling the remote video encoder to preferentially encode the ROI in the far-end video, the far-end ROI information may also be applied to the local video decoder to preferentially decode the MBs within the ROI in the far-end video. For example, as further shown in Fig. 3, the same far-end ROI MB map that ROI mapper 54 generates for transmission to the remote encoder may be provided to ROI-aware video decoder 48. ROI-aware video decoder 48 uses the ROI MB map to preferentially decode MBs in the far-end video received from remote video communication device 14. For example, ROI-aware video decoder 48 may apply better post-processing to ROI MBs than to non-ROI MBs. Additionally or alternatively, ROI-aware video decoder 48 may apply more robust error concealment techniques to ROI MBs than to non-ROI MBs. In this way, ROI-aware video decoder 48 relies on the far-end ROI information generated by the local user to preferentially decode the ROI portion of the incoming far-end video and achieve enhanced image quality.
ROI-aware video decoder 48 receives incoming far-end video from a remote video communication device (for example, video communication device 14 of Fig. 1). ROI-aware video decoder 48 decodes the far-end video and provides the decoded video to user interface 42 for presentation to the local user on the display device. In addition, as described above, ROI-aware video decoder 48 receives remote near-end ROI information ("remote near-end ROI") from remote video communication device 14. The near-end ROI information received by ROI-aware video decoder 48 is generated by the user of remote video communication device 14 to specify an ROI in the video transmitted by video communication device 12. As described above, the remote near-end ROI information received by ROI-aware video decoder 48 is used to remotely control ROI-aware video encoder 46 so that it preferentially encodes the ROI in the near-end video generated by video communication device 12. As described above, the remote near-end ROI is transmitted by in-band or out-of-band signaling techniques.
With further reference to Fig. 3, ROI-aware video encoder 46 interacts with ROI-aware video decoder 48 and ROI engine 44. ROI engine 44 processes local and remote near-end ROI information for use in encoding and transmitting the near-end video bitstream from video capture device 40. In addition, ROI engine 44 processes the far-end ROI information provided via user interface 42 for encoding and transmission to remote video communication device 14. ROI engine 44 includes ROI controller 52, ROI mapper 54, ROI tracking module 56, and authentication module 58. In some embodiments, ROI tracking module 56 and authentication module 58 may be optional.
ROI-aware video encoder 46, ROI-aware video decoder 48, ROI controller 52, ROI mapper 54, ROI tracking module 56, and authentication module 58 may be formed in a variety of ways, as discrete functional modules or as a monolithic module that encompasses the functionality ascribed to each module. In either case, the components of ROI-aware CODEC 20 (including ROI engine 44, video encoder 46, and video decoder 48) may be implemented in hardware, software, firmware, or a combination thereof. For example, such components may operate as software processes executing on one or more microprocessors or digital signal processors (DSPs), one or more application-specific integrated circuits (ASICs), one or more field programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. If implemented in software, the techniques may be realized in part by a computer-readable medium comprising program code containing instructions that, when executed in a processor or DSP, perform one or more of the methods described herein.
In operation, the user of video communication device 12 selects either the near-end video generated by video capture module 40 or the far-end video decoded by ROI-aware video decoder 48 for viewing on the display device associated with user interface 42. In some embodiments, picture-in-picture (PIP) functionality permits the user to view the near-end video and the far-end video simultaneously. To view the near-end or far-end video for purposes of ROI definition, the user may manipulate user interface 42 to invoke an ROI definition mode. By default, video communication device 12 may handle video encoding and decoding without regard to ROI. By entering the ROI definition mode, the user activates the ROI-aware encoding and decoding aspects of video communication device 12. Alternatively, ROI-aware encoding and decoding may be the default mode.
When the far-end video is presented, the user indicates the ROI in the far-end video using any of a variety of techniques, which are described in greater detail below. The far-end ROI highlights a region or object within the video scene that is of interest to the user or that requires higher image quality. User interface 42 generates a far-end ROI indication based on the user input. The ROI indication may be further processed by ROI engine 44 to produce far-end ROI information for transmission to video communication device 14.
Alternatively, the user may select the near-end video obtained from video capture module 40 for ROI definition. When the near-end video is presented, the user may optionally indicate the ROI in the near-end video using techniques similar or identical to those used for ROI indication in the far-end video. The near-end ROI or far-end ROI may be specified initially at the start of a VT call or at any time during the VT call. In some embodiments, the initial ROI may be updated by the local user or the remote user, or updated automatically by ROI tracking module 56. If the ROI is updated automatically, the user need not continue to enter ROI information. Instead, the ROI is maintained based on the user's initial input until the user changes or discontinues the ROI.
User interface 42 generates a local near-end ROI indication based on the indication provided by the user. Like the far-end ROI indication, the near-end ROI indication may be further processed by ROI engine 44. The near-end ROI indication highlights a region or object within the video scene that the user wants to emphasize to the remote user (that is, by increasing its image quality). The local user may select a near-end ROI or far-end ROI by choosing a predefined ROI pattern via user interface 42 or by drawing an ROI pattern. Drawing an ROI pattern may involve free-hand drawing with a stylus, or resizing or repositioning a default ROI pattern.
In the example of Fig. 3, user interface 42 provides the local near-end ROI indication (if any) and the far-end ROI indication to ROI controller 52 within ROI engine 44. In addition, ROI controller 52 receives the remote near-end ROI from ROI-aware video decoder 48 via authentication module 58. In particular, ROI-aware video decoder 48 detects the presence of remote near-end ROI information in the received far-end video stream, or the presence of remote near-end ROI information conveyed by out-of-band signaling, and provides the remote near-end ROI information to authentication module 58. The local near-end ROI and far-end ROI indications may be expressed in terms of coordinates within a video frame of the respective near-end or far-end video. The coordinates of the ROI may be x-y coordinates within the video frame. The x-y coordinates, however, are processed to produce an ROI MB map for use by encoder 46 or decoder 48, as will be explained.
ROI controller 52 processes the local near-end ROI, the remote near-end ROI, and the far-end ROI, and applies them to ROI mapper 54. ROI mapper 54 converts the respective ROI coordinates into macroblock (MB) maps. More particularly, ROI mapper 54 produces a far-end ROI MB map, which specifies the MBs in the far-end video that correspond to the far-end ROI indicated by the local user. In addition, ROI mapper 54 produces a near-end ROI MB map, which specifies the MBs in the near-end video that correspond to the local near-end ROI, the remote near-end ROI, or a combination of both.
For predefined ROI patterns, ROI mapping is straightforward. Each predefined ROI pattern may have an assigned MB map that is likewise predefined. For a drawn, repositioned, or resized ROI pattern, however, ROI mapper 54 selects the MB boundaries that most closely conform to the coordinates of the ROI pattern specified by the user. For example, if the specified ROI crosses an MB, ROI mapper 54 places the ROI boundary at either the outer edge or the inner edge of the MB concerned. In other words, ROI mapper 54 may be configured to include in the ROI MB map only those MBs that lie entirely within the ROI, or to include MBs that lie partially within the ROI as well. In either case, the ROI comprises a set of complete MBs that closely approximates the specified ROI. Again, video encoder 46 and video decoder 48 operate at the MB level, and the ROI ordinarily needs to be translated into an MB map. By designating individual MBs as included in or excluded from the ROI, the ROI MB map permits the ROI to be defined with an irregular or non-rectangular shape.
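The conversion of ROI coordinates into an MB map may be sketched as follows, purely as a minimal illustration. The sketch assumes a rectangular ROI given in pixel coordinates and 16x16 macroblocks; the function name roi_to_mb_map and the flag include_partial are hypothetical. Setting include_partial to True places the ROI boundary at the outer MB edge, and setting it to False places the boundary at the inner MB edge.

```python
MB_SIZE = 16  # assumed macroblock size in luma pixels


def roi_to_mb_map(x, y, width, height, frame_w, frame_h, include_partial=True):
    """Convert a rectangular ROI (pixels) into a 2-D boolean MB map.

    include_partial=True  -> MBs partially inside the ROI are included
    include_partial=False -> only MBs entirely inside the ROI are included
    """
    mb_cols = (frame_w + MB_SIZE - 1) // MB_SIZE
    mb_rows = (frame_h + MB_SIZE - 1) // MB_SIZE
    x_end, y_end = x + width, y + height  # exclusive ROI bounds
    mb_map = []
    for row in range(mb_rows):
        top, bottom = row * MB_SIZE, (row + 1) * MB_SIZE
        line = []
        for col in range(mb_cols):
            left, right = col * MB_SIZE, (col + 1) * MB_SIZE
            overlaps = left < x_end and right > x and top < y_end and bottom > y
            inside = left >= x and right <= x_end and top >= y and bottom <= y_end
            line.append(overlaps if include_partial else inside)
        mb_map.append(line)
    return mb_map


# QCIF frame (176x144) with a 40x40 ROI whose corner is at (20, 20).
outer = roi_to_mb_map(20, 20, 40, 40, 176, 144, include_partial=True)
inner = roi_to_mb_map(20, 20, 40, 40, 176, 144, include_partial=False)
```

Because the map stores one flag per MB, the same structure can also represent irregular or non-rectangular ROIs; only the rule that sets each flag would change.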
ROI-aware video encoder 46 transmits the far-end ROI MB map to remote video communication device 14, either within the encoded near-end video or by out-of-band signaling. The near-end ROI MB map is not transmitted to remote video communication device 14. Instead, the near-end ROI MB map is used by ROI-aware video encoder 46 to preferentially encode the specified MBs in the near-end video, with higher-quality encoding or stronger error protection, before transmission to remote video communication device 14. Hence, ROI-aware video encoder 46 transmits to remote video communication device 14 the encoded near-end video, with its preferentially encoded ROI, together with the far-end ROI information.
ROI tracking module 56 tracks changes in the ROI region of the near-end video. If the VT application resides in a mobile video communication device, for example, the user may move from time to time, causing the user's position to change relative to the previously specified ROI. In addition, even when the user's position is stable, other objects within the ROI may move out of the ROI region. For example, a boat on a lake may bob up and down or sway from side to side with the motion of the waves. To avoid requiring the user to redefine the ROI whenever movement occurs, ROI tracking module 56 may be provided to track objects within the ROI region automatically.
In the example of Fig. 3, ROI tracking module 56 receives motion information from the encoded near-end video produced by ROI-aware video encoder 46. The motion information may take the form of motion vectors for the MBs in the encoded near-end video, permitting closed-loop control of the ROI MB map definition produced by ROI mapper 54. Based on the motion information, ROI tracking module 56 generates incremental position adjustments to the near-end ROI MB map and provides the adjustments to ROI mapper 54. A position adjustment may take the form of a change in the status of an MB as included in, or excluded from, the ROI.
If the motion information indicates substantial movement of the ROI, the status of MBs within the ROI MB map may change. Ordinarily, it is the status of MBs at the outer boundary of the ROI that changes. In response to the position adjustments, ROI mapper 54 shifts the ROI specified by the near-end ROI MB map, so that the ROI position adapts to motion in the encoded near-end video on a frame-by-frame basis. ROI tracking module 56 cooperates with ROI mapper 54 to adjust the ROI position automatically when movement is detected within the video scene. In this way, ROI engine 44 adjusts the ROI to track moving objects within the ROI.
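One simple way to picture such a position adjustment, offered only as a sketch under stated assumptions, is to average the per-MB displacement over the current ROI and shift the MB map by a whole number of macroblocks. The function name track_roi is hypothetical. The sketch further assumes that each motion vector has already been converted to the object's displacement in pixels (dx, dy), since codec motion vectors commonly point from the current block toward the reference frame. The disclosure itself does not prescribe this particular averaging rule.

```python
def track_roi(mb_map, displacements, mb_size=16):
    """Shift a boolean ROI MB map to follow the mean motion of its MBs.

    mb_map        -- 2-D list of booleans (True = MB is in the ROI)
    displacements -- 2-D list of (dx, dy) pixel displacements, one per MB
                     (assumed to describe object motion, not raw codec MVs)
    """
    rows, cols = len(mb_map), len(mb_map[0])
    roi = [(r, c) for r in range(rows) for c in range(cols) if mb_map[r][c]]
    if not roi:
        return [list(line) for line in mb_map]
    mean_dx = sum(displacements[r][c][0] for r, c in roi) / len(roi)
    mean_dy = sum(displacements[r][c][1] for r, c in roi) / len(roi)
    # Small motion rounds to zero, so only substantial movement changes MB status.
    shift_c = int(round(mean_dx / mb_size))
    shift_r = int(round(mean_dy / mb_size))
    shifted = [[False] * cols for _ in range(rows)]
    for r, c in roi:
        nr, nc = r + shift_r, c + shift_c
        if 0 <= nr < rows and 0 <= nc < cols:  # drop MBs that leave the frame
            shifted[nr][nc] = True
    return shifted


roi_map = [[False] * 4, [False, True, False, False], [False] * 4]
moved = track_roi(roi_map, [[(16, 0)] * 4 for _ in range(3)])
still = track_roi(roi_map, [[(4, 0)] * 4 for _ in range(3)])
```

Calling the function once per encoded frame gives a frame-by-frame adjustment of the near-end ROI MB map.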
Authentication module 58 resolves the ROI rights of remote users, including the rights of individual users and the priority of rights among multiple users. When ROI-aware video decoder 48 receives a remote near-end ROI from remote video communication device 14, it provides the remote near-end ROI to ROI engine 44. In some cases, however, the remote near-end ROI specified by the remote user may conflict with the local near-end ROI specified by the local user. For example, the local and remote users may specify overlapping ROIs within the scene, or entirely different ROIs. In such cases, authentication module 58 may be provided to resolve the ROI conflict.
In some embodiments, authentication module 58 may apply a so-called "master-slave" mechanism to coordinate which near-end ROI information (local or remote) should be used at a given time. In particular, before the sender receives receiver-driven ROI information, the sender is the near-end ROI master and controls its near-end ROI. In other words, before a remote near-end ROI is received at video communication device 12, the local user controls the near-end ROI. The remote user is therefore the near-end ROI "slave" and does not control the near-end ROI unless the master (that is, the local user) grants access rights to control the near-end ROI.
Once the local user grants access rights to the remote user, the local user no longer controls its near-end ROI. Instead, the remote user associated with video communication device 14 gains control of the near-end ROI of the near-end video generated by video communication device 12, and becomes the near-end ROI master. The remote user may retain control until the local user explicitly revokes the access rights or otherwise denies the remote user access, or until the remote user discontinues ROI selection, in which case master ROI control may revert to the local user.
Once ROI-aware video decoder 48 receives the encoded far-end video (if any), it retrieves the remote near-end ROI information from the video bitstream according to the format agreed upon with the sender. Again, the near-end ROI information may be embedded within the encoded far-end video or sent by out-of-band signaling. In either case, ROI-aware video decoder 48 passes the remote near-end ROI to authentication module 58 to obtain access permission before the remote near-end ROI is sent to ROI-aware video encoder 46 via ROI controller 52 and ROI mapper 54. Authentication module 58 restricts access rights to particular users, so that a user cannot control the encoding process without authorization from the local user.
Authentication module 58 may be configured to grant and manage access rights and to arbitrate among one or more remote users. For example, the local user may grant access rights to selected remote users. Hence, the local user may permit some remote users to control the near-end ROI while prohibiting other remote users from doing so. In addition, the local user may assign relative access levels or priorities to remote users. In this way, the local user can assign a hierarchy of access levels among remote users, so that some remote users have priority over others in controlling the near-end ROI when multiple remote users request ROI control at the same time. For example, multiple remote users may request ROI control simultaneously during a multi-party video conference. In such cases, ROI control is typically granted exclusively to one user, which is either the local user or, if the local user grants control, a selected one of the remote users.
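The arbitration among simultaneous requests may be pictured, as a minimal sketch only, by a lookup that grants control to the requesting remote user with the highest assigned access level. The function name select_roi_controller, the integer levels, and the convention that level 0 means "no access" are assumptions made here for illustration.

```python
def select_roi_controller(requests, access_levels):
    """Pick the remote user who is granted near-end ROI control.

    requests      -- list of remote user identifiers requesting control
    access_levels -- dict mapping user identifier to an integer level
                     (hypothetical convention: 0 or missing = no access)
    Returns the chosen user, or None if no requester has access rights.
    """
    allowed = [user for user in requests if access_levels.get(user, 0) > 0]
    if not allowed:
        return None  # control stays with the local user
    # Highest level wins; on a tie, the earliest request in the list wins.
    return max(allowed, key=lambda user: access_levels[user])


levels = {"alice": 2, "bob": 1, "carol": 0}
winner = select_roi_controller(["bob", "carol", "alice"], levels)
nobody = select_roi_controller(["carol", "dave"], levels)
```

When the function returns None, no remote user qualifies and control remains with the local user, consistent with exclusive control by a single user at a time.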
In some embodiments, authentication module 58 may also be responsible for resource monitoring to determine whether local video communication device 12 has the capability to enable ROI-aware video processing. If the local device does not have sufficient processing resources at a given time to support remote ROI control or to satisfy a particular type of ROI request, authentication module 58 revokes the remote ROI control access rights or denies the ROI request. As one example, bandwidth limitations imposed by the communication channel, or local processing load, may result in denial of remote ROI control. As another example, such limitations may permit the use of pre-configured ROI patterns, but not drawn or described ROI patterns. Authentication module 58 may notify the remote device of the ROI decision by embedding a status message in the outgoing encoded near-end video sent to the remote device.
In addition, different access levels may be granted to individual remote users to govern the degree to which each remote user can control the near-end ROI. For example, a remote user may be limited to selection of a set of predefined ROI patterns, to particular ROI positions or sizes, or to specification of an ROI only upon approval by the local user. Hence, authentication module 58 may resolve a remote user's control of the near-end ROI automatically, or may seek active approval of the remote user's near-end ROI control through interaction with the local user. For example, when a remote user requests access rights to control the near-end ROI, authentication module 58 may present a query to the local user via user interface 42, asking the local user to approve the remote user's ROI control.
Authentication module 58 may keep track of remote user access levels in any of a variety of ways. As described above, the local user may actively approve requests from remote users to control the near-end ROI, and actively control the access levels granted to remote users. Alternatively, the local user may maintain, in memory within video communication device 12, an address book that stores information associated with remote users, including access rights or levels. The address book may take the form of a database containing a list of remote users and associated access levels. When a remote user requests near-end ROI control, authentication module 58 retrieves the pertinent access rights information from the address book and applies an authentication process to resolve ROI control automatically among the local user, the remote user, and possibly several remote users. If a remote user is not listed in the address book, the local user may choose to add the remote user to the address book with appropriate access rights.
In some cases, the local user may override the default access level specified for a particular remote user in the address book. For example, authentication module 58 may permit the local user to actively reconfigure ROI control priorities among different remote users during a VT call, or to intervene and reclaim exclusive control of the near-end ROI as the local user. Interaction between the local user and authentication module 58, whether for maintaining the address book or for actively managing ROI control requests, is represented by the access control information (ACCESS CONTROL INFO) in Fig. 3.
When near-end ROI control by a remote user is approved, automatically or actively, authentication module 58 passes the remote near-end ROI to ROI controller 52 for processing and for mapping by ROI mapper 54. Alternatively, if no remote near-end ROI is provided, or if the local user chooses to exclude the remote user from controlling the near-end ROI, ROI controller 52 processes the local near-end ROI provided by the local user via user interface 42.
Authentication module 58 resolves ROI conflicts between the local user and remote users. By default, authentication module 58 applies the master-slave concept, according to which the local user has control of the near-end ROI. When a remote user is granted access rights at the highest level, that remote user has full control of near-end ROI selection for ROI-aware video encoder 46 of video communication device 12. Otherwise, the local user has control of the near-end ROI and overrides any near-end ROI selection made by the remote user.
Although access rights may be granted to a remote user, the local user prevails in near-end ROI control, because the remote user's access rights ordinarily have a lower level than those of the local user. Hence, if the local user chooses to specify a near-end ROI, any near-end ROI selection made by the remote user is ignored. If the local user does not specify a near-end ROI, on the other hand, the level of access rights assigned to the remote user takes effect, and the remote user can control the near-end ROI. As described above, however, the local user may still choose to override the default master-slave relationship and relinquish the highest level of access rights, which the local user otherwise holds, to the remote user.
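The default master-slave rule described above can be summarized, by way of a hedged sketch, as a short decision function. The function name resolve_near_end_roi and its parameters are hypothetical; ROIs are represented here by arbitrary values, with None meaning that no ROI was specified.

```python
def resolve_near_end_roi(local_roi, remote_roi, remote_has_access,
                         remote_is_master=False):
    """Decide which near-end ROI the encoder should use.

    local_roi / remote_roi -- ROI descriptions, or None if not specified
    remote_has_access      -- True if the local user granted access rights
    remote_is_master       -- True if the local user relinquished the highest
                              access level to the remote user
    """
    # Remote user was made master: the remote selection controls fully.
    if remote_is_master and remote_roi is not None:
        return remote_roi
    # Default: the local user is master, so a local selection prevails.
    if local_roi is not None:
        return local_roi
    # No local selection: a remote user with access rights takes effect.
    if remote_has_access and remote_roi is not None:
        return remote_roi
    return None  # no ROI; encode without preferential treatment


local_wins = resolve_near_end_roi("local", "remote", remote_has_access=True)
remote_fills_in = resolve_near_end_roi(None, "remote", remote_has_access=True)
remote_master = resolve_near_end_roi("local", "remote", True, remote_is_master=True)
```

The order of the checks encodes the priority: a relinquished master role first, then the local selection, then a permitted remote selection.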
Fig. 4 is a block diagram illustrating another video communication device 12' that has an ROI-aware CODEC and further incorporates an ROI extraction module 60. Video communication device 12' of Fig. 4 is almost identical to video communication device 12 of Fig. 3. However, video communication device 12' further includes ROI extraction module 60 to form the local near-end ROI and the far-end ROI based on input from the user. In addition to simply handling selection of a preset ROI pattern, or permitting the user to draw, reposition, or resize a default ROI, ROI extraction module 60 also permits the local user to specify an ROI by a verbal or textual ROI description. In particular, ROI extraction module 60 generates the local near-end ROI or the far-end ROI based on the ROI description provided by the local user.
Examples of ROI descriptions include textual or verbal input of terms such as "face," "moving object," "lips," "human body," and "background." Preferential encoding of such objects may be highly desirable. For example, preferential encoding of the lips or face can better convey facial expressions, spoken words, and the like. Textual input can be typed or selected from a menu presented by user interface 42. Verbal input can be provided by speaking into a microphone associated with video communication device 12'. In each case, the local user "describes" the ROI rather than selecting or drawing it. ROI extraction module 60 converts the description into a set of coordinates within the appropriate near-end or far-end video scene. When a verbal ROI description is used, user interface 42 or ROI extraction module 60 may include conventional speech recognition capabilities. In particular, ROI extraction module 60 may generate the information specifying the ROI based on one or more recognized terms.
ROI extraction module 60 selects the ROI coordinates automatically by applying conventional pre-encoding processing algorithms configured to detect the desired ROI. In particular, ROI extraction module 60 may apply an algorithm that performs face detection, feature extraction, object segmentation, or tracking according to conventional techniques known to those skilled in the art of video ROI processing. For example, ROI extraction module 60 may apply conventional techniques that rely on the luminance or chrominance values of pixels in the video input data to identify the ROI.
Conventional face detection schemes typically use skin color as a guide to distinguish face pixels from non-face pixels. Examples of conventional face detection schemes are described in C.-W. Lin, Y.-J. Chang, and Y.-C. Chen, "A low-complexity face-assisted coding scheme for low bit-rate video telephony," IEICE Trans. Inf. & Syst., vol. E86-D, no. 1, pp. 101-108, January 2003, and in D. Chai and K. N. Ngan, "Face segmentation using skin-color map in videophone applications," IEEE Trans. on Circuits and Systems for Video Technology, vol. 9, no. 4, pp. 551-564, June 1999.
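As an illustration only, skin-color-guided detection of the kind referred to above can be sketched as a chrominance threshold test followed by a bounding box. The function names and the plane layout (lists of rows of Cb and Cr samples) are hypothetical, and the threshold ranges are the values commonly attributed to the Chai and Ngan skin-color map; they are assumptions here, not values stated in this disclosure.

```python
def is_skin_pixel(cb, cr, cb_range=(77, 127), cr_range=(133, 173)):
    """Classify one pixel as skin from its chrominance (assumed threshold ranges)."""
    return cb_range[0] <= cb <= cb_range[1] and cr_range[0] <= cr <= cr_range[1]


def skin_bounding_box(cb_plane, cr_plane):
    """Return (x, y, width, height) enclosing all skin pixels, or None if none are found.

    cb_plane and cr_plane are equally sized lists of rows of chroma samples
    (a hypothetical input format chosen for this sketch).
    """
    xs, ys = [], []
    for y, (cb_row, cr_row) in enumerate(zip(cb_plane, cr_plane)):
        for x, (cb, cr) in enumerate(zip(cb_row, cr_row)):
            if is_skin_pixel(cb, cr):
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)
```

A practical detector would add further stages (for example, noise removal and shape checks) before designating the result as a face ROI; this sketch covers only the color test.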
When the local user describes the ROI as "face," ROI extraction module 60 analyzes the near-end or far-end video, as applicable, to identify a face automatically and designates the coordinates associated with the identified face as the ROI. ROI extraction module 60 then passes the coordinates to ROI controller 52 for processing and for mapping by ROI mapper 54. Notably, ROI extraction module 60 processes the local near-end ROI description or the far-end ROI description, as applicable, maps the description to an appropriate extraction algorithm, and automatically analyzes the appropriate pre-encoded near-end video or decoded far-end video to extract the appropriate ROI.
To support automatic ROI detection, ROI extraction module 60 receives the near-end video from video capture device 40 and the far-end video from ROI-aware video decoder 48. Using the local near-end ROI description or far-end ROI description from user interface 42, together with automated detection algorithms, ROI extraction module 60 generates the local near-end ROI and the far-end ROI, as applicable, for application to ROI controller 52. In each case, ROI extraction module 60 converts the local near-end ROI description or far-end ROI description into the coordinates that best match the description. In this case, the user does not need to draw the ROI. Nor is the user limited to a set of predefined ROI patterns. Instead, ROI controller 52 automatically detects the region within the near-end video that matches the ROI description.
ROI mapper 54 maps the ROI coordinates to the relevant macroblocks (MBs) within the video frame and generates a near-end or far-end ROI MB map. In effect, ROI mapper 54 translates the ROI coordinates from ROI controller 52 into a form that video encoder 46 can understand. In particular, video encoder 46 is equipped to handle encoding at the MB level, i.e., on an MB-by-MB basis. To that end, ROI mapper 54 generates an ROI MB map for the near-end or far-end video. The ROI MB map identifies the MBs that fall within the specified ROI so that video encoder 46 can apply preferential encoding to those MBs.
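The coordinate-to-MB translation described above can be sketched as follows. This is a minimal illustration, assuming a rectangular ROI given as (x, y, width, height) in pixels and 16x16-pixel macroblocks; the function name, the coordinate format, and the 0/1 map layout are hypothetical and are not specified by this disclosure.

```python
MB_SIZE = 16  # macroblock edge in pixels (assumed; typical of block-based video codecs)


def roi_rect_to_mb_map(frame_w, frame_h, roi, mb_size=MB_SIZE):
    """Return a 2-D list of 0/1 flags, one per macroblock.

    A flag of 1 marks a macroblock that overlaps the rectangular ROI,
    given as (x, y, width, height) in pixels.
    """
    x, y, w, h = roi
    mb_cols = (frame_w + mb_size - 1) // mb_size
    mb_rows = (frame_h + mb_size - 1) // mb_size
    mb_map = [[0] * mb_cols for _ in range(mb_rows)]

    # Clamp the ROI to the frame before converting to macroblock indices.
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(frame_w, x + w), min(frame_h, y + h)
    if x1 <= x0 or y1 <= y0:
        return mb_map  # ROI lies entirely outside the frame

    for row in range(y0 // mb_size, (y1 - 1) // mb_size + 1):
        for col in range(x0 // mb_size, (x1 - 1) // mb_size + 1):
            mb_map[row][col] = 1
    return mb_map
```

For a QCIF frame (176x144 pixels, i.e., 11x9 macroblocks), an ROI of (48, 32, 80, 96) marks macroblock columns 3 through 7 and rows 2 through 7.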
In addition to handling ROI descriptions, ROI extraction module 60 may also be equipped to handle ROI patterns that the local user selects from a set of predefined patterns, or that the local user draws, repositions, or resizes. Hence, video communication device 12' can generate ROI information substantially as described with respect to video communication device 12 of Fig. 3, but further incorporates ROI extraction module 60 to process ROI descriptions entered by the local user in textual or verbal form. ROI extraction module 60 may be desirable for ease of use by the local user. However, some video communication devices may lack sufficient processing power to support ROI extraction module 60. Accordingly, ROI extraction module 60 represents a desirable but optional component of a video communication device according to this disclosure.
In some embodiments, ROI extraction module 60 may process ROI descriptions generated not only by the local user but also by the remote user. In this manner, the extraction function can be performed remotely in some devices rather than locally. For example, a particular video communication device 14 may lack sufficient local resources or capability to support ROI extraction for an ROI description provided by the user of device 14, whereas another video communication device 12 may be better equipped to perform ROI extraction. In that case, it is contemplated that local ROI extraction can be offloaded or assigned to the remote video communication device.
To support remote extraction, the ROI description can be provided to the remote device in a variety of ways. For example, a verbal description can be included in the audio stream transmitted to the remote device. A textual ROI description, a predefined ROI pattern, or a drawn ROI pattern can likewise be transmitted to the remote device, for example by embedding this information in the encoded video stream. Thus, the ROI information sent from one device to another may take the form of a preprocessed ROI MB map or any other indication or description of the ROI, including indications or descriptions that require processing at the remote device before being applied to the remote encoder.
Fig. 5 is a block diagram illustrating distributed ROI extraction via an intermediate extraction server 61. As shown in Fig. 5, video communication devices 12, 14 can provide intermediate extraction server 61 with sufficient information to permit ROI extraction. For example, each device 12, 14 can provide its respective local near-end ROI description, far-end ROI description, encoded or raw near-end video, and encoded far-end video. As an alternative to receiving the encoded far-end video from the near-end device, ROI extraction server 61 can receive the far-end video directly from the far-end device. Using this information, extraction server 61 generates one or both of the far-end ROI and the local near-end ROI and provides them to the respective devices 12, 14. Extraction server 61 may be a server located anywhere within the communication network and may be coupled to devices 12, 14 by wired media, wireless media, or a combination of both. Extraction server 61 may be located remotely from video communication devices 12, 14, or within one of devices 12, 14. In many cases, however, extraction server 61 will be a remote server. In general, extraction server 61 will be structurally distinct from video communication devices 12, 14.
Extraction server 61 can operate much like extraction module 60, but operates remotely and in a distributed manner, so that local ROI extraction need not be performed within devices 12, 14. In this manner, the processing cost of ROI extraction can be shifted to a different device that may have greater processing power. Like ROI extraction module 60, extraction server 61 can process different types of ROI descriptions from the user, such as verbal, textual, or graphical descriptions. To that end, ROI extraction server 61 may include appropriate capabilities (e.g., speech recognition) to process the descriptions. In addition, ROI extraction server 61 may be equipped with video decoding capability to permit analysis of the video and extraction of the ROI, and with encoding capability to re-encode the video and embed the ROI information, as needed.
Fig. 6 is a block diagram illustrating distributed ROI extraction for multiple video telephony sessions. In the example of Fig. 6, ROI extraction server 61 operates to handle ROI extraction for VT sessions between multiple pairs of video communication devices 12A-14A, 12B-14B, 12C-14C through 12N-14N. In this manner, ROI extraction server 61 performs multiple ROI extraction tasks in parallel to support the various VT sessions in progress on a given network at a given time.
Figs. 7A-7D are diagrams illustrating predefined ROI patterns for selection by a local or remote user. The ROI patterns of Figs. 7A-7D are shown for purposes of example and should not be considered limiting. Fig. 7A shows ROI 62 within video scene 34 presented on display 36 associated with wireless communication device 38. ROI 62 is a substantially rectangular region that is substantially centered in video scene 34. The major length of rectangular ROI 62 extends vertically within video scene 34. In many cases, the predefined centered rectangular ROI 62 will effectively capture a person's face, i.e., the face of the remote user participating in the VT call.
Fig. 7B shows another ROI 64, which takes the form of a rectangle whose major length extends horizontally within video scene 34. ROI 64 is substantially centered in video scene 34 and can effectively capture objects such as a vehicle, a boat, a product, or a presentation.
Fig. 7C shows another ROI 66, which is shaped to capture the face and shoulders of the remote user participating in the VT call. Alternatively, ROI 66 can capture the face and shoulders of a presenter in a one-way video streaming application, such as the anchor or reporter of a news broadcast, or a speaker at a meeting or rally. In either case, predefined ROI 66 focuses on the human VT participant or presenter and enables preferential encoding of that person's physical features.
Fig. 7D shows a set of two ROIs 68, 70 presented side by side within video scene 34. In the example of Fig. 7D, ROIs 68, 70 can effectively capture the faces of two people seated or standing side by side. In this manner, the faces of both participants can be preferentially encoded to support higher image quality for facial expressions and movement.
The predefined ROI patterns depicted in Figs. 7A-7D are shown for purposes of illustration. Other predefined ROI patterns with alternative positions or shapes may be provided. For example, some ROI patterns may have circular or irregular shapes, provided that they can be mapped to MB boundaries.
In some embodiments, the user may be allowed to resize or reposition a selected ROI pattern. Conventional pointer and corner-dragging techniques can be used to resize and reposition the pattern. In addition, the ROI can be rescaled either by dragging a corner or by explicitly specifying a zoom percentage. Of course, as the ROI becomes larger, the degree of preferential encoding decreases because of bandwidth limitations. Therefore, in some cases, a maximum ROI size may be enforced within video communication device 12.
Fig. 8 is a flow chart illustrating generation of far-end ROI information at a recipient device to control preferential ROI encoding of near-end video at a sender device. The process depicted in Fig. 8 may be implemented in video communication device 12 of Fig. 3 or video communication device 12' of Fig. 4. In operation, ROI-aware video decoder 48 in video communication device 12 decodes far-end video from a remote sender device, e.g., video communication device 14 (Fig. 1) (72). Once the far-end video is decoded, user interface 42 of recipient device 12 displays the far-end video for viewing by the local user (74).
If the local user does not request ROI selection (76), no action is taken and the next frame of the far-end video is decoded (72). If ROI selection is requested (76), however, user interface 42 accepts far-end ROI information from the local user (78). ROI controller 52 then cooperates with ROI mapper 54 to generate a far-end ROI MB map (80). ROI-aware video encoder 46 embeds the far-end ROI MB map in the encoded near-end video and thereby transmits the far-end ROI map to remote sender device 14 (82). The far-end ROI MB map specifies that the encoder associated with remote video communication device 14 should apply preferential encoding to the MBs within the relevant ROI of the far-end video sent to video communication device 12.
Fig. 9 is a flow chart illustrating processing of near-end ROI information from a recipient device for preferential ROI encoding of near-end video at a sender device in combination with ROI tracking. In the example of Fig. 9, user interface 42 receives the near-end video stream generated by video capture device 40 and presents the near-end video to the local user (84). If neither the local user nor the remote user requests near-end ROI selection (86), all MBs in each video frame are encoded normally (88), i.e., without any preferential encoding of MBs within an ROI. The encoded near-end video is then sent to remote recipient device 14 (89).
If the local user or the remote user requests near-end ROI selection (86), however, ROI controller 52 and ROI mapper 54 process the relevant near-end ROI information to generate a near-end ROI MB map (90). If a near-end ROI is specified by both the local user and the remote user, authentication module 58 can intervene to help resolve the ROI conflict. Upon receiving the near-end ROI MB map (90), ROI-aware video encoder 46 preferentially encodes the MBs within the ROI by applying higher-quality encoding, stronger error protection, or both (92).
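This disclosure does not prescribe how higher-quality encoding is applied to ROI MBs. One common approach, shown here purely as an assumed illustration, is to lower the quantization parameter (QP) for ROI MBs and raise it for the remaining MBs. The function name, the default offsets, and the 0 to 51 QP range (the range used by H.264) are assumptions of this sketch.

```python
def build_qp_map(mb_map, base_qp=30, roi_delta=6, non_roi_delta=4, qp_min=0, qp_max=51):
    """Derive a per-macroblock QP map from a 0/1 ROI MB map.

    ROI macroblocks receive a lower QP (finer quantization, higher quality);
    non-ROI macroblocks receive a higher QP so the overall bit budget stays bounded.
    All numeric defaults are illustrative assumptions.
    """
    def clamp(qp):
        return max(qp_min, min(qp_max, qp))

    roi_qp = clamp(base_qp - roi_delta)
    non_roi_qp = clamp(base_qp + non_roi_delta)
    return [[roi_qp if flag else non_roi_qp for flag in row] for row in mb_map]
```

A real rate controller would adapt the offsets to the available bandwidth and to the ROI size, since a larger ROI leaves fewer bits to redistribute.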
Tracking module 56 tracks the ROI position within the near-end video (94) by monitoring the motion information generated by ROI-aware video encoder 46. If no displacement of the ROI is detected (96), the existing ROI map is applied to encode the ROI MBs in the near-end video (100), and the encoded near-end video is sent to the remote recipient device (102). If a displacement of the ROI is detected (96), tracking module 56 adjusts the ROI MB map based on the motion information (98) before the near-end video is encoded (100).
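The adjustment of the ROI MB map from motion information can be sketched as follows. This is a simplified illustration, not the tracking method of this disclosure: it assumes one motion vector (dx, dy) in pixels per MB describing how the content moved into the current frame, averages the vectors of the ROI MBs, and shifts the map by whole macroblocks. The function name and the sign convention are hypothetical.

```python
def track_roi(mb_map, motion_vectors, mb_size=16):
    """Shift a 0/1 ROI MB map by the mean motion of its ROI macroblocks.

    motion_vectors has the same shape as mb_map and holds (dx, dy) pixel
    displacements (assumed convention: positive dx is rightward, positive dy
    is downward). Returns the original map object if no whole-macroblock
    displacement is detected.
    """
    rows, cols = len(mb_map), len(mb_map[0])
    roi = [(r, c) for r in range(rows) for c in range(cols) if mb_map[r][c]]
    if not roi:
        return mb_map

    mean_dx = sum(motion_vectors[r][c][0] for r, c in roi) / len(roi)
    mean_dy = sum(motion_vectors[r][c][1] for r, c in roi) / len(roi)
    shift_c = round(mean_dx / mb_size)
    shift_r = round(mean_dy / mb_size)
    if shift_c == 0 and shift_r == 0:
        return mb_map  # no displacement: keep the existing ROI map

    new_map = [[0] * cols for _ in range(rows)]
    for r, c in roi:
        nr, nc = r + shift_r, c + shift_c
        if 0 <= nr < rows and 0 <= nc < cols:
            new_map[nr][nc] = 1
    return new_map
```

This sketch ignores motion smaller than one macroblock per frame; a practical tracker would accumulate sub-macroblock displacement across frames.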
Figure 10 is a flow chart illustrating processing of ROI information from a recipient device for preferential ROI encoding of near-end video at a sender device in combination with user authentication. Figure 10 depicts the operation of authentication module 58 of Fig. 3 or 4 in allowing a remote user to control the near-end ROI and, for simplicity, assumes that no local near-end ROI has been specified. As shown in Figure 10, for the near-end video stream generated by video capture device 40 in video communication device 12 (104), authentication module 58 determines whether the remote user of video communication device 14 has requested a remote near-end ROI (106).
If no remote near-end ROI has been requested (106), and no local near-end ROI has been specified, all MBs in the near-end video are encoded normally (110). If a remote near-end ROI has been requested (106), however, authentication module 58 determines whether the remote user requesting the near-end ROI is authenticated (108). In particular, authentication module 58 can determine the remote user's access rights automatically by reference to an address book stored locally in video communication device 12. Alternatively, authentication module 58 can actively query the local user via user interface 42 to obtain approval or denial of access rights for near-end ROI control by the remote user.
If the remote user is not authenticated (108), all MBs in the near-end video are encoded normally (110). If the remote user is authenticated (108), however, near-end ROI control is granted to the remote user. In that case, ROI controller 52 and ROI mapper 54 process the near-end ROI information from the remote user and generate a near-end MB map (112). Using the near-end MB map, ROI-aware video encoder 46 preferentially encodes the MBs identified by the near-end MB map (114). Video communication device 12 then sends the encoded near-end video to remote video communication device 14 (116).
Figure 11 is a flow chart illustrating selection of a predefined ROI pattern. Once ROI-aware video decoder 48 decodes the far-end video received from remote video communication device 14 (118), the far-end video is displayed to the local user via user interface 42 (120). If the local user requests ROI selection (122), user interface 42 displays a menu of predefined ROI patterns, such as the ROI patterns shown in Figs. 7A-7D (124). Alternatively, the user could provide an ROI description or draw, reposition, or resize an ROI pattern. In the example of Figure 11, however, the operation focuses on presenting predefined ROI patterns. When the local user selects a predefined ROI pattern (126), ROI controller 52 and ROI mapper 54 define an ROI MB map based on the selected pattern (128). ROI-aware video encoder 46 embeds the ROI MB map in the encoded near-end video and transmits it to remote video communication device 14 (130) for use in preferentially encoding the ROI of the far-end video.
Figure 12 is a diagram illustrating definition of an ROI pattern within displayed video scene 34 by expanding and contracting ROI template 132. Figure 12 corresponds substantially to Fig. 2, but illustrates presentation of an ROI template 132 that can be resized by the user. In the example of Figure 12, ROI template 132 can be resized by dragging one of its corners to expand or contract the template. The result of dragging a corner to expand ROI template 132 is represented by expanded ROI template 134. Corner dragging increases or decreases the size of ROI template 132 while preserving its aspect ratio. In some embodiments, however, the user may also be allowed to drag one side of ROI template 132 to increase or decrease the size of the template while also changing its aspect ratio. Dragging can be accomplished using a stylus in combination with a touch screen, or using another pointing device associated with user interface 42 of video communication device 12. Other pointing devices may include a joystick, touch pad, scroll wheel, trackball, and the like.
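Aspect-preserving resizing by corner drag, combined with an enforced maximum ROI size, might be sketched as follows. The sketch assumes the bottom-right corner is dragged while the top-left corner stays fixed, and it caps the ROI at a fraction of the frame area; the function name, the anchor choice, and the 50% default cap are hypothetical illustration values.

```python
def resize_roi_by_corner(roi, scale, frame_w, frame_h, max_area_fraction=0.5):
    """Scale an (x, y, width, height) ROI about its top-left corner.

    The aspect ratio is preserved. The scale factor is limited so that the ROI
    stays inside the frame and does not exceed max_area_fraction of the frame
    area (an assumed policy for enforcing a maximum ROI size).
    """
    x, y, w, h = roi
    limit_area = (max_area_fraction * frame_w * frame_h / (w * h)) ** 0.5
    limit_frame = min((frame_w - x) / w, (frame_h - y) / h)
    s = max(0.0, min(scale, limit_area, limit_frame))
    # Truncate to whole pixels so the result never exceeds the limits.
    return (x, y, int(w * s), int(h * s))
```

For example, scaling the ROI (40, 20, 40, 60) by 1.5 in a 176x144 frame yields (40, 20, 60, 90), while a requested scale of 10 is reduced so that the ROI remains within the frame.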
Figure 13 is a diagram illustrating definition of an ROI pattern within the displayed video scene by dragging ROI template 132. In particular, Figure 13 shows ROI template 132 being repositioned by dragging it to another position 135 within video scene 34. Dragging can be accomplished with a stylus and touch screen or with another pointing device associated with user interface 42.
Figure 14 is a diagram illustrating definition of an ROI pattern within the displayed video scene by drawing ROI pattern 136 on a touch screen with stylus 138. In the example of Figure 14, ROI pattern 136 is produced by freehand drawing. ROI controller 52 and ROI mapper 54 cooperate to convert the coordinates associated with the drawn ROI pattern into an MB map that identifies the MBs in video scene 34 that fall approximately within ROI pattern 136. The ROI pattern definitions shown in Figures 12, 13, and 14 are applicable to ROIs in either the near-end video or the far-end video.
Figure 15 is a diagram illustrating definition of an ROI pattern within the displayed video scene using a pull-down menu 140 of specified ROI objects that are to be tracked dynamically. As shown in Figure 15, user interface 42 presents pull-down menu 140, which offers ROI descriptions such as "face," "lips," "background," and "moving." The local user selects one of the entries in the pull-down menu as the desired ROI description. In response, ROI extraction module 60 (Fig. 4) analyzes the near-end video or far-end video, as applicable, to detect an ROI pattern corresponding to the description. As an alternative to pull-down menu 140, the user can enter text via user interface 42 or speak the text into a microphone. In each case, conventional feature detection algorithms, such as skin-tone detection, object segmentation, or similar techniques, are applied to match the selected ROI with an appropriate ROI pattern. Once the ROI pattern is selected, ROI controller 52 and ROI mapper 54 generate the appropriate ROI MB map. The process of Figure 15 is termed "dynamic" in the sense that each ROI description must be dynamically matched to an ROI pattern within the particular video scene under consideration.
Figure 16 is a diagram illustrating definition of an ROI pattern within the displayed video scene using a pull-down menu 142 of specified ROI objects that are mapped to predefined ROI patterns such as those of Figs. 7A-7D. As shown in Figure 16, user interface 42 presents pull-down menu 142, which offers ROI descriptions such as "single face," "two faces," "head/shoulder," and "object." The local user selects one of the entries in the pull-down menu as the desired ROI pattern. In response, ROI controller 52 matches the selected ROI pattern with the corresponding predefined ROI pattern, such as one of the ROI patterns depicted in Figs. 7A-7D. Therefore, unlike the ROI descriptions shown in Figure 15, the static ROI patterns do not require video analysis. Instead, ROI controller 52 and ROI mapper 54 generate a preconfigured ROI MB map corresponding to the selection in pull-down menu 142. Again, as an alternative to pull-down menu 142, the user can enter text via user interface 42 or speak the text into a microphone. The process of Figure 16 is termed "static" in the sense that each ROI description corresponds to a predefined ROI pattern and MB map.
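The static lookup can be sketched as a table from menu entries to predefined rectangles expressed as fractions of the frame, so that no video analysis is needed. The table name, the function name, and every fractional value below are hypothetical placeholders; the disclosure does not give pattern dimensions.

```python
# Predefined patterns as (x, y, width, height) fractions of the frame.
# All values are illustrative assumptions, loosely modeled on Figs. 7A-7D.
PREDEFINED_PATTERNS = {
    "single face": [(0.30, 0.15, 0.40, 0.60)],
    "object": [(0.15, 0.30, 0.70, 0.40)],
    "head/shoulder": [(0.30, 0.10, 0.40, 0.45), (0.15, 0.55, 0.70, 0.40)],
    "two faces": [(0.10, 0.20, 0.35, 0.50), (0.55, 0.20, 0.35, 0.50)],
}


def static_roi_rects(description, frame_w, frame_h):
    """Return pixel rectangles for a menu entry, or [] if the entry is unknown."""
    fractions = PREDEFINED_PATTERNS.get(description.strip().lower(), [])
    return [
        (round(fx * frame_w), round(fy * frame_h),
         round(fw * frame_w), round(fh * frame_h))
        for fx, fy, fw, fh in fractions
    ]
```

An empty result could be used as a signal to fall back to the dynamic, analysis-based matching of Figure 15.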
Figure 17 is a flow chart illustrating definition of an ROI pattern within the displayed video scene using an ROI description interface. The process shown in Figure 17 can be used in combination with the pull-down menu of Figure 15 or with other input media. As shown in Figure 17, ROI-aware video decoder 48 decodes the far-end video received from remote sender device 14 (144). User interface 42 then displays the far-end video to the local user (146). If the local user does not request ROI selection for the far-end video (148), no ROI information is sent to remote video communication device 14. If ROI selection is requested (148), however, user interface 42 presents an ROI description interface, such as pull-down menu 140 of Figure 15 (150).
Upon receiving the local user's ROI description (152), ROI controller 52 and ROI mapper 54 select an ROI pattern based on the description (154) and define an ROI MB map based on the selected ROI pattern (156). Again, the selected ROI pattern can be determined by analyzing the far-end video using conventional detection techniques and matching the ROI description with particular MBs in the far-end video. Once the far-end ROI MB map is generated, ROI-aware video encoder 46 embeds the far-end ROI MB map in the encoded near-end video and transmits it to remote video communication device 14 for use in preferentially encoding the far-end ROI.
Figure 18 is a flow chart illustrating resolution of ROI conflicts between sender and recipient devices 12, 14. In particular, Figure 18 illustrates the operation of authentication module 58 (Fig. 3 or Fig. 4) in resolving a conflict between a near-end ROI specified by the local user and a near-end ROI specified by the remote user. When near-end video is generated at the sender device (160), authentication module 58 determines whether a near-end ROI has been requested by the local user or the remote user (162). If not, all MBs are encoded normally (164), without preferential ROI encoding, and the resulting encoded video is sent to recipient video communication device 14 (166).
If a near-end ROI has been requested (162), authentication module 58 determines whether there is a conflict between the near-end ROI specified by the local user and the near-end ROI specified by the remote user (168). If no remote near-end ROI has been specified, or if the local and remote near-end ROIs agree, authentication module 58 can pass the selected near-end ROI to ROI controller 52 for processing.
If there is no local near-end ROI but a remote near-end ROI has been selected, authentication module 58 may allow the remote near-end ROI to be applied. Alternatively, in some embodiments, authentication module 58 may allow the remote near-end ROI to be applied only when explicit access rights have been granted to the remote user, either through interaction with the local user or through an access level recorded in the address book. If there is no ROI conflict, ROI mapper 54 generates a near-end MB map based on the applicable near-end ROI and applies it to ROI-aware video encoder 46. ROI-aware video encoder 46 then preferentially encodes the MBs within the ROI of the near-end video (172).
If there is a conflict between the local and remote near-end ROIs (168), authentication module 58 determines whether an access level has been assigned, e.g., in an address book stored locally in video communication device 12 (174). If an access level has been assigned (174), authentication module 58 resolves the ROI conflict according to the access level (176). For example, the access level stored for the remote user may indicate that the remote user should be granted ROI control in preference to the local user. If no access level has been assigned (174), authentication module 58 seeks permission for remote ROI control from the local user (178). In particular, authentication module 58 can present a query via user interface 42 requesting approval for the remote user to exercise near-end ROI control.
If the local user approves, authentication module 58 passes the remote near-end ROI to ROI controller 52 for processing. If the local user does not approve, ROI controller 52 processes the local near-end ROI. In either case, ROI-aware video encoder 46 uses the selected ROI to preferentially encode the MBs of the near-end video that fall within the ROI (172), and the encoded near-end video is sent to remote recipient device 14 (166). In some cases, authentication module 58 may resolve not only ROI conflicts between the local user and a remote user, but also ROI conflicts among several remote users. The local user may actively grant the right to control the near-end ROI to one of the remote users, or may assign relative access levels that prioritize ROI control among the remote users. Typically, the right to control the ROI is granted exclusively to one user (e.g., the local user or one of the remote users).
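The conflict-resolution flow of Figure 18 can be summarized in a short sketch. The function and parameter names, the numeric access levels, and the convention that only the highest level lets the remote user override the local user are assumptions made for illustration; ROIs are treated as opaque values.

```python
HIGHEST_ACCESS = 3  # hypothetical numeric level that grants the remote user control


def resolve_near_end_roi(local_roi, remote_roi, remote_access_level=None,
                         ask_local_user=None):
    """Return the near-end ROI to encode, or None for normal encoding.

    remote_access_level models an address-book entry (None if not assigned).
    ask_local_user is an optional callable returning True if the local user
    approves remote ROI control.
    """
    if local_roi is None and remote_roi is None:
        return None                      # no ROI requested: encode all MBs normally
    if remote_roi is None or local_roi == remote_roi:
        return local_roi                 # no conflict
    if local_roi is None:
        return remote_roi                # only the remote user specified an ROI
    # Conflict: consult the stored access level first, then the local user.
    if remote_access_level is not None:
        return remote_roi if remote_access_level >= HIGHEST_ACCESS else local_roi
    if ask_local_user is not None and ask_local_user():
        return remote_roi
    return local_roi                     # default master-slave rule: local user wins
```

This sketch takes the permissive branch in which a remote near-end ROI is applied whenever no local ROI exists; an embodiment requiring explicit access rights would add a check at that point.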
Figure 19 is a flow chart illustrating preferential decoding of ROI macroblocks in far-end video. As shown in Figure 19, when far-end video is received from remote sender device 14 (180), ROI-aware video decoder 48 in local recipient device 12 determines whether a far-end ROI has been specified by the local user (182). If not, ROI-aware video decoder 48 decodes all MBs in the far-end video normally (184). If far-end ROI information has been specified by the local user, however, ROI-aware video decoder 48 preferentially decodes the ROI MBs in the received far-end video (186). The ROI MBs can be preferentially decoded by applying higher-quality interpolation equations or more robust error concealment techniques relative to those applied to non-ROI MBs. Preferential decoding may also include preferential post-processing, such as higher-quality deblocking or deringing filters.
The techniques described herein may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the techniques may be realized in part by a computer-readable medium comprising program code containing instructions that, when executed, perform one or more of the methods described above. In that case, the computer-readable medium may comprise random access memory (RAM) such as synchronous dynamic random access memory (SDRAM), read-only memory (ROM), non-volatile random access memory (NVRAM), electrically erasable programmable read-only memory (EEPROM), FLASH memory, magnetic or optical data storage media, and the like.
The program code may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. In some embodiments, the functionality described herein may be provided within dedicated software modules or hardware units configured for encoding and decoding, or incorporated in a combined video encoder-decoder (CODEC).
Various embodiments have been described. These and other embodiments are within the scope of the following claims.

Claims (26)

1. A video communication method, comprising:
receiving, from a local user of a local device, a first description of a first region of interest (ROI) in near-end video encoded by the local device, wherein the first description defines the first ROI of the near-end video to be encoded by the local device;
receiving, from a remote user of a far-end device, a second description of a second ROI in the near-end video encoded by the local device, wherein the second description defines the second ROI of the near-end video to be encoded by the local device;
selecting one of the first ROI and the second ROI;
generating information specifying the selected ROI based on the corresponding description of the selected ROI;
encoding the near-end video on the local device, based on the information specifying the selected ROI, to enhance image quality of the selected ROI of the near-end video relative to non-ROI areas; and
transmitting the encoded near-end video and the information specifying the selected ROI from the local device to the far-end device.
2. The method of claim 1, wherein the corresponding description of the selected ROI is a textual description.
3. The method of claim 1, wherein the corresponding description of the selected ROI is a speech description.
4. The method of claim 3, further comprising processing the speech description by speech recognition and generating the information specifying the selected ROI based on one or more recognized terms.
5. The method of claim 1, wherein the corresponding description of the selected ROI is a graphical description.
6. The method of claim 5, wherein the graphical description is received as a region delineated on a user interface screen by at least one of the local user and the remote user.
7. The method of claim 1, further comprising processing the corresponding description of the selected ROI in an intermediate server distinct from the local device to generate the information specifying the selected ROI.
8. The method of claim 1, wherein the information specifying the selected ROI is embedded in the encoded near-end video.
9. The method of claim 1, further comprising communicating the information specifying the selected ROI from the local device to the far-end device by out-of-band signaling.
10. The method of claim 1, further comprising generating information specifying a third ROI in encoded far-end video received from the far-end device, and transmitting the third ROI information to the far-end device together with the encoded near-end video.
11. The method of claim 1, further comprising decoding far-end video received from the far-end device, the far-end video being encoded to enhance image quality of a third ROI in the far-end video relative to non-ROI areas of the far-end video.
12. The method of claim 1, further comprising generating a macroblock (MB) map based on the information specifying the selected ROI, the MB map identifying MBs that fall within the selected ROI.
13. The method of claim 1, further comprising:
monitoring motion information associated with the encoded near-end video;
adjusting the selected ROI based on the motion information; and
encoding the near-end video based on the adjusted selected ROI.
14. The method of claim 13, further comprising generating a macroblock (MB) map based on the information specifying the selected ROI, the MB map identifying MBs that fall within the selected ROI, wherein adjusting the selected ROI comprises changing the status of an MB, based on the motion information, to be included in or excluded from the selected ROI.
15. A video communication device, comprising:
a region-of-interest (ROI) engine that receives, from a local user of the device, a first description of a first ROI in near-end video encoded by the device, wherein the first description defines the first ROI of the near-end video to be encoded by the device; receives, from a remote user of a far-end device, a second description of a second ROI in the near-end video encoded by the device, wherein the second description defines the second ROI of the near-end video to be encoded by the device; selects one of the first ROI and the second ROI; and generates information specifying the selected ROI based on the corresponding description of the selected ROI; and
a video encoder that encodes the near-end video on the device, based on the information specifying the selected ROI, to enhance image quality of the selected ROI of the near-end video relative to non-ROI areas, and that further transmits the encoded near-end video and the information specifying the selected ROI to the far-end device.
16. The device of claim 15, wherein the corresponding description of the selected ROI is a textual description.
17. The device of claim 15, wherein the corresponding description of the selected ROI is a speech description.
18. The device of claim 17, further comprising an extraction module that processes the speech description by speech recognition and generates the information specifying the selected ROI based on one or more recognized terms.
19. The device of claim 15, wherein the corresponding description of the selected ROI is a graphical description.
20. The device of claim 19, wherein the graphical description is received as a region delineated on a user interface screen by at least one of the local user and the remote user.
21. The device of claim 15, wherein the ROI engine generates information specifying a third ROI in encoded far-end video received from the far-end device, and wherein the device transmits the information specifying the third ROI to the far-end device together with the encoded near-end video.
22. The device of claim 15, further comprising a video decoder that decodes far-end video received from the far-end device, the far-end video being encoded to enhance image quality of a third ROI in the far-end video relative to non-ROI areas of the far-end video.
23. The device of claim 15, wherein the device further generates a macroblock (MB) map based on the information specifying the selected ROI, the MB map identifying MBs that fall within the selected ROI.
24. The device of claim 15, further comprising a tracking module that monitors motion information associated with the encoded near-end video and adjusts the selected ROI based on the motion information, wherein the encoder encodes the near-end video based on the adjusted selected ROI.
25. The device of claim 24, further comprising a mapper module that generates a macroblock (MB) map based on the information specifying the selected ROI, the MB map identifying MBs that fall within the selected ROI, wherein the tracking module adjusts the selected ROI by changing the status of an MB, based on the motion information, to be included in or excluded from the selected ROI.
26. A video coding system, comprising:
a first video communication device that encodes near-end video;
a second video communication device that receives the near-end video from the first video communication device,
wherein the first video communication device receives, from a local user of the first video communication device, a first description of a first region of interest (ROI) in the near-end video encoded by the first video communication device,
wherein the first description defines the first ROI of the near-end video to be encoded by the first video communication device,
wherein the first video communication device receives, from a remote user of the second video communication device, a second description of a second ROI in the near-end video encoded by the first video communication device,
wherein the second description defines the second ROI of the near-end video to be encoded by the first video communication device, and
wherein the first video communication device selects one of the first ROI and the second ROI; and
an intermediate server that is structurally distinct from the first and second video communication devices and that generates information specifying the selected ROI based on the corresponding description of the selected ROI,
wherein the first video communication device encodes the near-end video, based on the information specifying the selected ROI, to enhance image quality of the selected ROI of the near-end video relative to non-ROI areas, and transmits the encoded near-end video and the information specifying the selected ROI to the second video communication device.
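
The macroblock (MB) map of claims 12 and 23 and the motion-based ROI adjustment of claims 13, 14, 24 and 25 can be illustrated with a minimal sketch. The claims do not specify an algorithm, so everything concrete here is an assumption made for illustration: a rectangular ROI in pixel coordinates, 16x16 macroblocks, a per-MB motion magnitude supplied by the encoder, the hypothetical function names `build_mb_map` and `adjust_roi`, and the toy grow/shrink thresholds.

```python
MB_SIZE = 16  # assumed macroblock size in pixels


def build_mb_map(width, height, roi):
    """Return a 2-D list of booleans; True marks an MB that overlaps the ROI.

    roi is (left, top, right, bottom) in pixels, right/bottom exclusive.
    """
    cols = (width + MB_SIZE - 1) // MB_SIZE
    rows = (height + MB_SIZE - 1) // MB_SIZE
    left, top, right, bottom = roi
    mb_map = []
    for r in range(rows):
        row = []
        for c in range(cols):
            x0, y0 = c * MB_SIZE, r * MB_SIZE
            # Standard rectangle-overlap test between the MB and the ROI.
            overlaps = (x0 < right and x0 + MB_SIZE > left
                        and y0 < bottom and y0 + MB_SIZE > top)
            row.append(overlaps)
        mb_map.append(row)
    return mb_map


def adjust_roi(mb_map, motion, grow=4.0, shrink=1.0):
    """Toy adjustment rule (hypothetical, not from the patent).

    A non-ROI MB next to the ROI is included when its motion >= grow.
    An ROI MB on the ROI border is excluded when its motion < shrink.
    Decisions are based on the original map, so the result does not
    depend on scan order.
    """
    rows, cols = len(mb_map), len(mb_map[0])

    def neighbors(r, c):
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                yield mb_map[nr][nc]

    adjusted = [row[:] for row in mb_map]
    for r in range(rows):
        for c in range(cols):
            near = list(neighbors(r, c))
            if not mb_map[r][c] and any(near) and motion[r][c] >= grow:
                adjusted[r][c] = True   # status changed to "included"
            elif mb_map[r][c] and not all(near) and motion[r][c] < shrink:
                adjusted[r][c] = False  # status changed to "excluded"
    return adjusted
```

An encoder following this sketch could, for example, assign a finer quantizer to MBs flagged True and a coarser one to the rest; that choice of preferential encoding is likewise an assumption and not something the claims prescribe.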
CN200680014872.7A 2005-03-09 2006-03-08 Region-of-interest extraction for video telephony Expired - Fee Related CN101171841B (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US66020005P 2005-03-09 2005-03-09
US60/660,200 2005-03-09
US11/183,072 US8019175B2 (en) 2005-03-09 2005-07-15 Region-of-interest processing for video telephony
US11/183,072 2005-07-15
PCT/US2006/008457 WO2006130198A1 (en) 2005-03-09 2006-03-08 Region-of-interest extraction for video telephony

Publications (2)

Publication Number Publication Date
CN101171841A CN101171841A (en) 2008-04-30
CN101171841B true CN101171841B (en) 2012-06-27

Family

ID=39334927

Family Applications (2)

Application Number Title Priority Date Filing Date
CNA2006800145199A Pending CN101167365A (en) 2005-03-09 2006-03-08 Region-of-interest processing for video telephony
CN200680014872.7A Expired - Fee Related CN101171841B (en) 2005-03-09 2006-03-08 Region-of-interest extraction for video telephony

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CNA2006800145199A Pending CN101167365A (en) 2005-03-09 2006-03-08 Region-of-interest processing for video telephony

Country Status (1)

Country Link
CN (2) CN101167365A (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102170552A (en) * 2010-02-25 2011-08-31 株式会社理光 Video conference system and processing method used therein
US20130223537A1 (en) * 2010-09-24 2013-08-29 Gnzo Inc. Video Bit Stream Transmission System
CN102025965B (en) * 2010-12-07 2014-01-01 华为终端有限公司 Video talking method and visual telephone
EP2523145A1 (en) * 2011-05-11 2012-11-14 Alcatel Lucent Method for dynamically adapting video image parameters for facilitating subsequent applications
CN103024334B (en) * 2011-09-28 2015-11-25 中国移动通信集团公司 A kind of method, system and equipment realizing visual telephone service
CN102438144B (en) * 2011-11-22 2013-09-25 苏州科雷芯电子科技有限公司 Video transmission method
US20130279570A1 (en) * 2012-04-18 2013-10-24 Vixs Systems, Inc. Video processing system with pattern detection and methods for use therewith
CN102750122B (en) * 2012-06-05 2015-10-21 华为技术有限公司 Picture display control, Apparatus and system
CN103581603B (en) * 2012-07-24 2017-06-27 联想(北京)有限公司 The transmission method and electronic equipment of a kind of multi-medium data
TW201410014A (en) * 2012-08-22 2014-03-01 Triple Domain Vision Co Ltd A method for defining a monitored area for an image
CN103310411B (en) 2012-09-25 2017-04-12 中兴通讯股份有限公司 Image local reinforcement method and device
CN104782121A (en) * 2012-12-18 2015-07-15 英特尔公司 Multiple region video conference encoding
US9386275B2 (en) * 2014-01-06 2016-07-05 Intel IP Corporation Interactive video conferencing
US9516220B2 (en) 2014-10-02 2016-12-06 Intel Corporation Interactive video conferencing
US10021346B2 (en) 2014-12-05 2018-07-10 Intel IP Corporation Interactive video conferencing
CN105120366A (en) * 2015-08-17 2015-12-02 宁波菊风系统软件有限公司 A presentation method for an image local enlarging function in video call
WO2020095728A1 (en) * 2018-11-06 2020-05-14 ソニー株式会社 Information processing device and information processing method
CN111416939A (en) * 2020-03-30 2020-07-14 咪咕视讯科技有限公司 Video processing method, video processing equipment and computer readable storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1291315A (en) * 1998-03-20 2001-04-11 三菱电机株式会社 Lossy/lossless region of interest image coding

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6178204B1 (en) * 1998-03-30 2001-01-23 Intel Corporation Adaptive control of video encoder's bit allocation based on user-selected region-of-interest indication feedback from video decoder
US7559026B2 (en) * 2003-06-20 2009-07-07 Apple Inc. Video conferencing system having focus control
US20050024487A1 (en) * 2003-07-31 2005-02-03 William Chen Video codec system with real-time complexity adaptation and region-of-interest coding

Also Published As

Publication number Publication date
CN101171841A (en) 2008-04-30
CN101167365A (en) 2008-04-23

Similar Documents

Publication Publication Date Title
CN101171841B (en) Region-of-interest extraction for video telephony
JP6022618B2 (en) Region of interest extraction for video telephony
EP1856914B1 (en) Region-of-interest processing for video telephony
CN101288303B (en) Picture-in-picture processing method and device for video telephony
US9154737B2 (en) User-defined content magnification and multi-point video conference system, method and logic
CN102215217B (en) Establishing a video conference during a phone call
CN102215373B (en) In conference display adjustments
WO2005104552A1 (en) Moving picture data encoding method, decoding method, terminal device for executing them, and bi-directional interactive system
EP2936802A1 (en) Multiple region video conference encoding
KR101577986B1 (en) System for generating two way virtual reality
CN104270597A (en) Establishing A Video Conference During A Phone Call
CN101123702A (en) Apparatus for image display and control method thereof
CN105704424A (en) Multi-image processing method, multi-point control unit, and video system
CN111193892A (en) Remote linkage system and method based on virtual intelligent medical platform
KR20120078649A (en) Camera-equipped portable video conferencing device and control method thereof
KR20090026467A (en) Fractal scalable video coding system using multi-porcessor and processing method thereof
CN101018316A (en) Video conference system based on IPTV and its implementation method
CN104935861A (en) Multi-party multimedia communication method
JP2002209197A (en) Multiple place video conference system
JP2009055292A (en) Set top box and bidirectional communication system using the same
KR101628071B1 (en) Method and system for large scaled group communication
CN103139531A (en) Method and device of video conference terminal display frame processing
KR20090015673A (en) Method for transmitting and receiving of video telephony having function of adjusting transmission environment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 1117688; Country of ref document: HK)
C14 Grant of patent or utility model
GR01 Patent grant
REG Reference to a national code (Ref country code: HK; Ref legal event code: WD; Ref document number: 1117688; Country of ref document: HK)
CF01 Termination of patent right due to non-payment of annual fee (Granted publication date: 20120627; Termination date: 20190308)
