CN104935860A - Method and device for realizing video calling


Publication number: CN104935860A
Authority: CN (China)
Prior art keywords: model, identification information, peer user, subscriber, end device
Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN201410100207.6A
Other languages: Chinese (zh)
Inventors: Wang Qiang (王强), Qin Wenyu (秦文煜), Zuo Li (左力), Huang Ying (黄英), Xiong Junjun (熊君君)
Current assignee (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list): Beijing Samsung Telecom R&D Center, Beijing Samsung Telecommunications Technology Research Co Ltd, Samsung Electronics Co Ltd
Original assignee: Beijing Samsung Telecommunications Technology Research Co Ltd, Samsung Electronics Co Ltd
Application filed by Beijing Samsung Telecommunications Technology Research Co Ltd and Samsung Electronics Co Ltd
Priority application: CN201410100207.6A
Publication: CN104935860A
Current legal status: Pending


Abstract

The invention discloses a method and a device for implementing a video call. The method comprises: establishing a video call connection with a peer device; acquiring a current 2D image of the local end; generating, according to the current image, driving parameters corresponding to the local user; and sending the generated driving parameters to the peer device. With the method and device of the invention, only the small data volume of the driving parameters needs to be transmitted during the video call, so the amount of transmitted data is greatly reduced.

Description

Method and device for implementing a video call
Technical field
The present application relates to the technical field of video communication, and in particular to a method and a device for implementing a video call.
Background art
Video call technology is a widely researched problem in technical fields such as image processing, computer vision and computer graphics, and it has important applications in numerous areas including daily life, industrial production, distance education, conferencing, and even the military.
A video call, also known as a videophone call, may be carried over an IP network or over an ordinary telephone line. A video call usually refers to a communication mode, based on the Internet or the mobile Internet (3G), in which voice and images (a bust of the user, photos, objects, etc.) are transmitted in real time between devices such as mobile phones. 3D (three-dimensional) video calls have also been realized: in a 3D video call, the local end displays a 3D model of the peer image obtained through 3D modeling.
The prior art proposes a 3D stereoscopic video call method based on a 3D television set, which mainly comprises the following steps:
A. The calling party's 3D television set receives a call operation instruction from the calling user and calls, through a video call management unit, the 3D television set used by the called party;
B. The called party's 3D television set receives an operation instruction from the called user and responds to the calling user's video call request through the video call management unit; the 3D television sets of the calling and called parties establish a P2P (peer-to-peer) connection through the video call management unit;
C. The 2D image acquisition units of both parties' 3D television sets start acquiring 2D (two-dimensional) video image information of the users, and the depth image sensing units simultaneously start sensing depth image information of the users' environments;
D. The depth image processing units of both parties' 3D television sets analyze the depth information of the scene sensed by the depth image sensing unit and, in combination with the 2D video image information acquired by the 2D image acquisition unit, generate in real time a 3D stereoscopic call video carrying depth information;
E. The 3D television sets of both parties pass the 3D stereoscopic call video each has generated through compression and decompression units and send it in P2P mode to the peer's 3D television set for 3D stereoscopic video display.
However, the above method has the following problem:
After the calling and called parties generate the 3D stereoscopic call video with depth information in real time, the video is sent to the other party. Because the three-dimensional video data must be transmitted in real time, the amount of transmitted data is large; once affected by network bandwidth limits and network fluctuation, the three-dimensional video data are easily lost during the call, producing phenomena such as mosaic artifacts.
Summary of the invention
The present application provides a method and a device for implementing a video call, intended to solve the problem of the large amount of transmitted data in the prior art.
The technical solution of the present application is as follows:
In one aspect, a method for implementing a video call is provided, comprising:
establishing a video call connection with a peer device;
acquiring a current 2D image of the local end;
generating, according to the current image, driving parameters corresponding to the local user, and sending the generated driving parameters to the peer device.
In another aspect, a method for implementing a video call is provided, comprising:
establishing a video call connection with a peer device;
obtaining, through 3D modeling, a 3D model corresponding to the peer user;
receiving driving parameters corresponding to the peer user sent by the peer device, and using these driving parameters to drive the 3D model corresponding to the peer user to make corresponding actions.
In yet another aspect, a method for implementing a video call is provided, comprising:
establishing a video call connection with a peer device;
acquiring a current 2D image of the local end;
generating, according to the current image, driving parameters corresponding to the local user, and sending the generated driving parameters to the peer device;
obtaining, through 3D modeling, a 3D model corresponding to the peer user;
receiving driving parameters corresponding to the peer user sent by the peer device, and using these driving parameters to drive the 3D model corresponding to the peer user to make corresponding actions.
In yet another aspect, a device for implementing a video call is provided, comprising:
a connection establishment module, for establishing a video call connection with a peer device;
an image acquisition module, for acquiring a current 2D image of the local end;
a generation module, for generating, according to the current image acquired by the image acquisition module, driving parameters corresponding to the local user;
a sending module, for sending the driving parameters generated by the generation module to the peer device.
In yet another aspect, a device for implementing a video call is provided, comprising:
a connection establishment module, for establishing a video call connection with a peer device;
a modeling module, for obtaining, through 3D modeling, a 3D model corresponding to the peer user;
a receiving module, for receiving driving parameters corresponding to the peer user sent by the peer device;
a driving module, for using the driving parameters corresponding to the peer user received by the receiving module to drive the 3D model corresponding to the peer user, obtained by the modeling module, to make corresponding actions.
In yet another aspect, a device for implementing a video call is provided, comprising:
a connection establishment module, for establishing a video call connection with a peer device;
an image acquisition module, for acquiring a current 2D image of the local end;
a generation module, for generating, according to the current image acquired by the image acquisition module, driving parameters corresponding to the local user;
a sending module, for sending the driving parameters generated by the generation module to the peer device;
a modeling module, for obtaining, through 3D modeling, a 3D model corresponding to the peer user;
a receiving module, for receiving driving parameters corresponding to the peer user sent by the peer device;
a driving module, for using the driving parameters corresponding to the peer user received by the receiving module to drive the 3D model corresponding to the peer user, obtained by the modeling module, to make corresponding actions.
In the above technical solutions of the present application, the calling device and the called device each acquire a 2D video image of their local end, generate driving parameters corresponding to the local user from the acquired current image, and send the generated driving parameters to the peer device; in addition, each can obtain, through 3D modeling, a 3D model corresponding to the peer user and, after receiving the driving parameters sent by the peer device, use them to drive that 3D model to make corresponding actions. It can be seen that, during a video call, the calling and called devices only need to exchange the driving parameters corresponding to their local users: each side can drive the 3D model corresponding to the peer user according to the driving parameters sent by the peer device, forming a lifelike 3D animation effect and realizing a 3D video call. Because only the small data volume of the driving parameters is transmitted during the call, the amount of transmitted data is greatly reduced and the impact of network bandwidth and network fluctuation is correspondingly smaller, avoiding the loss of three-dimensional video data, and the resulting mosaic phenomena, that easily arise in prior-art video calls.
Brief description of the drawings
Fig. 1 is a flowchart of the video call implementation method of Embodiment 1 of the present application;
Fig. 2 is a schematic diagram of the 3D models displayed on the display screens of the calling device and the called device in Embodiment 1;
Figs. 3, 4 and 5 show, at times t1, t2 and t3 respectively, the 2D video images, true 3D human models and emotion icons of the calling user and the called user in Embodiment 1;
Fig. 6 shows the 2D video images, true 3D human models and emotion icons of the calling and called users in Embodiment 1 when the calling side has two users and the called side has one user;
Fig. 7 shows, when the called user of Embodiment 1 uses facial privacy protection, the 2D video image of the called user, the 3D scene composed of a preset 3D human model and a virtual background, and the emotion icon;
Fig. 8 shows, when the called user of Embodiment 1 uses facial privacy protection and has replaced the hair of the preset 3D human model, the 2D video image of the called user, the 3D scene composed of the hair-replaced preset 3D human model and a virtual scene, and the emotion icon;
Fig. 9 shows the 3D scenes before and after replacement when the calling user of Embodiment 1 replaces the virtual background;
Fig. 10 shows the 2D video images of the calling and called users, and the 3D scene composed of a true 3D human model and a virtual scene, when the calling user of Embodiment 1 throws a punch through the air at the true 3D human model corresponding to the called user;
Fig. 11 is a schematic structural diagram of the video call implementation device of Embodiment 2 of the present application.
Detailed description of the embodiments
In order to solve the problem of the large amount of transmitted data in the prior art, the following embodiments of the present application provide a video call implementation method and a device capable of applying the method.
The video call method of the embodiments of the present application covers the following aspects:
1. Privacy protection not used
As shown in Fig. 1, the video call implementation method of this embodiment of the present application comprises the following steps:
Step S100: the calling device establishes a video call connection with the called device;
Step S102: the calling device and the called device each acquire a 2D (two-dimensional) video image of their local end;
In step S102, the 2D video image of the local end can be acquired by a 2D camera. While acquiring the 2D video image of the local end, the local audio can also be acquired; after the video image and audio are obtained, enhancement processing can first be applied to them so that they meet quality and transmission requirements.
Step S104: the calling device sends the acquired first frame image containing the frontal face of the local user (that is, the calling user) to the called device;
Here, "frontal" means that the rotation angle up/down, left/right or in the image plane is less than or equal to 20 degrees.
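As an illustration of this frontal-face rule, the following minimal Python sketch screens frames before one is used as the first frame. The pose estimator (`estimate_pose`) and its angle conventions are assumptions for illustration, not part of the patent.

```python
# Sketch (not from the patent text): selecting the first frame whose face
# pose satisfies the "frontal" rule above. The pose-estimation step is
# assumed to be supplied by an external face tracker.

FRONTAL_LIMIT_DEG = 20.0  # max rotation about any axis, per the rule above

def is_frontal(yaw_deg: float, pitch_deg: float, roll_deg: float) -> bool:
    """A face counts as frontal if every rotation angle is <= 20 degrees."""
    return all(abs(a) <= FRONTAL_LIMIT_DEG for a in (yaw_deg, pitch_deg, roll_deg))

def first_frontal_frame(frames, estimate_pose):
    """Return the first frame containing a frontal face, for use as the
    'first frame' sent to the peer device. estimate_pose is a hypothetical
    callable returning (yaw, pitch, roll) in degrees, or None if no face."""
    for frame in frames:
        pose = estimate_pose(frame)
        if pose is not None and is_frontal(*pose):
            return frame
    return None
```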
Step S106: after receiving the above first frame image sent by the calling device, the called device sends (or feeds back) to the calling device the first frame image, acquired at its local end, containing the frontal face of its local user (that is, the called user);
Thereafter, neither side needs to transmit video image frames again;
Step S108: the calling device and the called device each perform 3D modeling according to the first frame image they acquired, obtaining a 3D model corresponding to their local user;
After the 3D model corresponding to the local user is obtained, it is displayed.
In actual implementation, the display screens of the calling device and the called device are provided with two 3D display windows, which may be called the local 3D display window and the peer 3D display window, used respectively for displaying the local party's and the peer party's 3D models. Thus, in step S108, the 3D model corresponding to the local user is displayed in the local 3D display window.
Step S110: the calling device and the called device each receive the first frame image sent by the peer device and perform 3D modeling according to the received first frame image, obtaining a 3D model corresponding to the peer user;
After the 3D model corresponding to the peer user is obtained, it is displayed in the peer 3D display window.
Step S112: the calling device and the called device each generate, according to every frame image they acquire, driving parameters corresponding to the local user, and send the generated driving parameters to the peer device;
Specifically, when generating the driving parameters, facial feature data corresponding to the local user are first generated from each frame image, and the driving parameters are then generated from the facial feature data.
The facial feature data are the data of the key points corresponding to each organ (or part) of the person's face.
A driving parameter is a parameter for driving the face of a 3D model to make a corresponding action. For example, each facial organ corresponds to one driving parameter γ with 0 ≤ γ ≤ 1, and different values of γ represent different action states of that organ.
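To make the relationship between facial feature data and driving parameters concrete, here is a minimal sketch assuming 2D key points and treating γ as a normalized openness measure; the landmark names and the normalization are illustrative choices, not taken from the patent.

```python
# Sketch of step S112 under stated assumptions: facial feature data are 2D
# key points per organ, and each organ's driving parameter gamma in [0, 1]
# is a normalized openness measure.

import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def mouth_driving_parameter(landmarks):
    """gamma = 0 for a closed mouth, 1 for a fully open mouth.
    landmarks is a dict of named 2D points (hypothetical names)."""
    opening = dist(landmarks["upper_lip"], landmarks["lower_lip"])
    width = dist(landmarks["mouth_left"], landmarks["mouth_right"])
    if width == 0:
        return 0.0
    # Normalize by mouth width so the value is scale-invariant,
    # then clamp into the [0, 1] range required for gamma.
    return min(1.0, max(0.0, opening / width))

def driving_parameters(landmarks):
    """One gamma per driven organ; only the mouth is shown here."""
    return {"mouth": mouth_driving_parameter(landmarks)}
```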
Step S114: the calling device and the called device each use the driving parameters generated in step S112 to drive the 3D model corresponding to their local user to make corresponding actions, forming a lifelike 3D animation effect;
Step S116: the calling device and the called device each receive the driving parameters sent by the peer device and use them to drive the 3D model corresponding to the peer user to make corresponding actions, likewise forming a lifelike 3D animation effect.
In steps S108 to S116, single-image 3D modeling is performed according to the first frame image and the facial feature data generated from it, yielding the 3D model, and expression base data are generated at the same time; subsequently, the driving parameters and the expression base data act jointly on this 3D model, making the face of the 3D model perform the corresponding actions.
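The joint action of driving parameters and expression base data resembles standard blendshape animation, which the patent does not spell out; the sketch below is one plausible reading of that combination, with numpy arrays standing in for the model's vertex data.

```python
# Sketch of how driving parameters and expression base data can jointly
# deform the 3D model (steps S114/S116), following a conventional
# blendshape formulation; treat this as an assumption, not the patent's
# specified algorithm.

import numpy as np

def apply_driving_parameters(neutral, expression_bases, gammas):
    """neutral: (V, 3) float vertex array of the reconstructed 3D model.
    expression_bases: dict name -> (V, 3) array holding each expression
    extreme (e.g. 'mouth_open'), generated at modeling time.
    gammas: dict name -> float in [0, 1], received from the peer device.
    Returns the deformed vertex positions for the current frame."""
    deformed = neutral.copy()
    for name, gamma in gammas.items():
        base = expression_bases[name]
        # Interpolate between the neutral shape and the expression extreme.
        deformed += gamma * (base - neutral)
    return deformed
```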
In addition, in step S102 above, the local audio can be acquired at the same time as the 2D video image; subsequently, the calling device and the called device can each send the locally acquired audio to the peer device in real time, receive the audio sent by the peer device, and then play the received audio.
Obviously, in actual implementation, when the calling device and the called device send data, the data to be sent are first compression-encoded; when receiving data, the received data are first demodulated and then processed. The transmission and reception of data can be carried out according to the 3G-324M protocol.
In actual implementation, steps S104, S108 and S112 of the above method can be carried out simultaneously.
In the method for the embodiment of the present application, originating devices and called device obtain the 2D video image of local terminal separately, then, the first two field picture in this video is sent to end device, according to each two field picture got, generate the driving parameter that this end subscriber is corresponding, the driving parameter of generation is sent to end device; In addition, also can receive and the first two field picture is sent to end device, the first two field picture according to receiving carries out 3D modeling, obtain the 3D model corresponding to peer user, receive the driving parameter that end device is sent, use the driving parameter received, drive the 3D model corresponding to peer user to make corresponding actions.
Can find out, originating devices and called device both sides are in the process of carrying out video calling, only need the first two field picture mutually sending local terminal, 3D modeling can be carried out based on the first two field picture sent end device, obtain the 3D model corresponding to peer user, and, follow-up both sides are without the need to transmission of video frame again, only need transmit driving parameter, can according to the driving parameter sent end device, drive above-mentioned 3D model to make corresponding actions, thus form 3D animation effect true to nature, achieve 3D video calling; Due in video call process, only transmit the driving parameter of a frame 2D video data and small data quantity, significantly reduce transmitted data amount, affect also just less by the network bandwidth and network fluctuation, thus avoid the loss easily producing three dimensional video data in video call process, thus there is the phenomenons such as mosaic.
In addition, in the above method only the 3D models corresponding to the local user and to the peer user are displayed, which is not very attractive on its own. Therefore, in steps S108 and S110, when performing 3D modeling according to the first frame image (whether the locally acquired first frame image or the received one), the 3D modeling can be performed on the human region image in the first frame image to obtain the 3D model, and a virtual background from the local preset model library can then be added behind the 3D model, obtaining a 3D scene that is displayed instead. In this way, the 3D model has a virtual background behind it; the local and peer 3D display windows show 3D scenes composed of a 3D model and a virtual background, giving a more attractive display effect and improving the user experience.
In order to convey the expression information of the callers, the calling device and the called device can each also recognize the expression of their local user and send it to the peer device, so that corresponding emotion icons are displayed on both the local device and the peer device. Specifically, in step S102 the local audio is acquired at the same time as the 2D video image; then, after acquiring the local 2D video image and audio, the calling device and the called device also need to perform the following steps:
Step S202: recognize the expression of the local user according to the acquired local video image and/or audio; then perform step S204;
Specifically, the expression may be, for example, happy, sad, careless, embarrassed, and so on.
Step S204: find, in the local emotion icon library, the emotion icon that matches the recognized expression of the local user, and display this emotion icon; send the identification information of the found emotion icon to the peer device; here, each emotion icon in the local emotion icon library corresponds to one piece of identification information;
In actual implementation, this emotion icon can be displayed in the upper left corner of the local 3D display window.
Step S206: receive the identification information of the emotion icon sent by the peer device, find the corresponding emotion icon in the local emotion icon library according to the received identification information, and display it.
In actual implementation, this emotion icon can be displayed in the upper left corner of the peer 3D display window.
No order of execution is imposed between steps S202 and S206.
In the above method, the expression of the local user can be recognized in real time during the call from images and/or sound; the emotion icon matching the recognized expression is found in the local emotion icon library and displayed, and its identification information is sent to the peer device. After the peer device receives this identification information, it likewise finds the icon indicated by this ID in its own local emotion icon library and displays it. Thus the same emotion icon is displayed at both ends, reflecting the current expression (or mood) of the respective user. This realizes an expression recognition function within the 3D video call, so that the user's expression and emotional changes are reflected in real time by the combination of the 3D model and the emotion icons.
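A minimal sketch of this icon-ID exchange follows; the icon library contents, the message format and the `show_icon`/`send_to_peer` callables are placeholders, since the patent only fixes the principle that icons are looked up locally by ID on both ends.

```python
# Sketch of the emotion-icon exchange (steps S202-S206), assuming a local
# icon library keyed by identification information.

LOCAL_ICON_LIBRARY = {
    1: "happy.png",
    2: "sad.png",
    3: "surprised.png",
    4: "arrogant.png",
}

def icon_id_for_expression(expression: str) -> int:
    """Map a recognized expression label to the matching icon's ID."""
    mapping = {"happy": 1, "sad": 2, "surprised": 3, "arrogant": 4}
    return mapping[expression]

def on_local_frame(expression, show_icon, send_to_peer):
    """Step S204: display the matching icon locally and send only its ID."""
    icon_id = icon_id_for_expression(expression)
    show_icon("local_window", LOCAL_ICON_LIBRARY[icon_id])
    send_to_peer({"type": "emotion_icon", "id": icon_id})

def on_peer_message(msg, show_icon):
    """Step S206: look the ID up in the local library and display it."""
    if msg.get("type") == "emotion_icon":
        show_icon("peer_window", LOCAL_ICON_LIBRARY[msg["id"]])
```

Sending only the icon's ID, rather than the icon image, keeps this feature consistent with the low-bandwidth design of the overall scheme.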
For example, as shown in Fig. 2, on the display screen of the calling device, the upper (peer) 3D display window shows the 3D model corresponding to the called user (i.e., the user at the called device end) together with an emotion icon for the expression "surprised", and the lower (local) 3D display window shows the 3D model corresponding to the calling user (i.e., the user at the calling device end) together with an emotion icon for the expression "arrogant". In addition, a 2D display window can be provided for displaying the 2D video image of the calling user.
On the display screen of the called device, the upper (peer) 3D display window shows the 3D model corresponding to the calling user and the "arrogant" emotion icon, and the lower (local) 3D display window shows the 3D model corresponding to the called user. Likewise, a 2D display window can be provided for displaying the 2D video image of the called user.
Moreover, during the 3D video call, the 3D model corresponding to the calling user makes corresponding actions following the facial changes of the calling user, and the 3D model corresponding to the called user likewise follows the facial changes of the called user, as shown in Figs. 3 to 5.
Figs. 3, 4 and 5 show, at times t1, t2 and t3 respectively, the 2D video images, 3D models and emotion icons of the calling user and the called user. In each figure, the upper left is the 2D video image of the calling user and the upper right is the 3D model corresponding to the calling user, with the corresponding emotion icon shown in its upper left corner; the lower left is the 2D video image of the called user and the lower right is the 3D model corresponding to the called user, again with the corresponding emotion icon in its upper left corner.
As can be seen from Figs. 3 to 5, at the different moments the 3D models corresponding to the calling and called users form a lifelike animation effect.
In the prior art, three-dimensional video data are transmitted. Because the size of the three-dimensional video data is proportional to the number of users (the more people in the video, the larger the data), the prior art is usually only suitable for a 3D video call between two people in total, that is, with one person on each side. In the embodiments of the present application, because the amount of transmitted data is small, a multi-person 3D video call with several people on each side is feasible.
Specifically, in the above method there can be at least one local user and at least one peer user. In that case, after the first frame image of the local end is acquired, the number of local users in the first frame image can be counted and each local user in the first frame image can be assigned a piece of identification information; then, while the first frame image of the local end is sent to the peer device, the correspondence between each local user in the first frame image and his or her identification information can also be sent to the peer device. Further, while receiving the first frame image sent by the peer device, the correspondence between each peer user in that first frame image and his or her identification information can also be received; then, after the 3D model corresponding to each peer user is obtained, the correspondence among each peer user, the peer user's identification information, and the 3D model corresponding to that peer user can be established.
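The following sketch illustrates this per-user bookkeeping; the data structures and the `build_model` callable are illustrative assumptions, since the patent only fixes that IDs, users and models are kept in correspondence.

```python
# Sketch of the per-user identification bookkeeping for multi-party calls,
# assuming a face detector yields one human region per user in the first
# frame.

from itertools import count

_next_id = count(1)

def register_local_users(face_regions):
    """Assign an identification number to each local user detected in the
    first frame; the returned ID -> region correspondence is sent to the
    peer device alongside the frame."""
    return {next(_next_id): region for region in face_regions}

def register_peer_models(peer_id_map, build_model):
    """Receiving side: build one 3D model per peer user and keep the
    ID -> model correspondence for later driving and deletion."""
    return {user_id: build_model(region) for user_id, region in peer_id_map.items()}
```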
In steps S108 and S110, the method of performing 3D modeling according to the first frame image, obtaining the 3D model and displaying it comprises steps 11 and 12:
Step 11: perform 3D modeling according to each human region image in the first frame image, obtaining at least one 3D model, where one human region image corresponds to one local user or one peer user;
Step 12: combine a virtual background from the local preset model library with this at least one 3D model, obtaining a 3D scene, and display this 3D scene.
That is, when the calling user and/or the called user comprise several people, the virtual background is added behind the 3D models corresponding to these people, and the 3D scene combining these 3D models and the virtual background is displayed in the corresponding 3D display window.
In addition, when there is at least one calling user and/or called user, corresponding driving parameters can be generated in step S112 for each human region image in every frame image, and in steps S114 and S116 these driving parameters can be used to drive the corresponding 3D models to make corresponding actions.
Moreover, when there is at least one calling user and/or called user, the number of people may change during the call, so the calling device and the called device need to detect changes in the number of local users in real time. Thus, when there is at least one local user, after step S102 the calling device and/or the called device also need to perform the following steps:
Step S302: for the current frame image acquired at the local end, count the number of people in this current image and judge whether this number has changed compared with the previous frame image; if it is judged that at least one local user has left, perform step S304; if it is judged that at least one local user has been added, perform step S306;
Specifically, face detection and recognition technology can be used to count the number of people in each frame image and, when it is judged that at least one local user has left or been added, to identify who that user is.
Step S304: send the identification (ID) information of the departed local user to the peer device, and delete the 3D model corresponding to that user;
That is, the 3D model corresponding to the departed local user is deleted from the 3D scene displayed in the local 3D display window.
Step S306: segment, from the current image, the human region image corresponding to the newly added local user, and assign a piece of identification information to each newly added local user; send this human region image, together with the correspondence between each newly added local user and his or her identification information, to the peer device; perform 3D modeling according to this human region image, obtaining the 3D model corresponding to each newly added local user, and display it.
Specifically, head-and-shoulder segmentation techniques and the like can be used to segment the human region image corresponding to the newly added local user from the current image.
That is, when it is detected that a user has been added at the local end, the human region image of the newly added local user is segmented from the first frame image in which that user appears and sent to the peer device; at the same time, 3D modeling is performed on this human region image to obtain the 3D model corresponding to the newly added local user, and this 3D model is added to the 3D scene displayed in the local 3D display window.
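The core of steps S302 to S306 is a per-frame comparison of who is present; a minimal sketch follows, assuming the face recognizer returns a stable label per person so that set differences reveal departures and arrivals. The callbacks are placeholders for the actions the steps describe.

```python
# Sketch of the per-frame head-count check (steps S302-S306). The stable
# per-person labels are the part the patent attributes to face detection
# and recognition technology.

def detect_user_changes(prev_users: set, curr_users: set):
    """Compare the users recognized in the previous and current frames.
    Returns (departed, joined) as sets of user labels."""
    departed = prev_users - curr_users
    joined = curr_users - prev_users
    return departed, joined

def handle_changes(prev_users, curr_users, on_departed, on_joined):
    departed, joined = detect_user_changes(prev_users, curr_users)
    for user in departed:
        on_departed(user)   # step S304: send the ID, delete the 3D model
    for user in joined:
        on_joined(user)     # step S306: segment, assign ID, model, send
    return curr_users
```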
In addition, when there is at least one peer user, the calling device and/or the called device also need to perform the following operations:
Step S402: receive the identification information of the departed peer user sent by the peer device; then perform step S404;
Step S404: according to the received identification information, find and delete the 3D model corresponding to the departed peer user;
That is, the 3D model corresponding to the departed peer user is deleted from the 3D scene displayed in the peer 3D display window.
Step S406: receive the human region image corresponding to the newly added peer user sent by the peer device, together with the correspondence between each newly added peer user and his or her identification information; then perform step S408;
Step S408: perform 3D modeling according to the received human region image, obtaining the 3D model corresponding to the newly added peer user, and display it; establish the correspondence among each newly added peer user, that peer user's identification information, and the 3D model corresponding to that peer user.
That is, the 3D model corresponding to the newly added peer user is added to the 3D scene displayed in the peer 3D display window.
In actual implementation, there is no fixed order between steps S402 and S406.
In the above method of this embodiment, the calling device and the called device each detect changes in the number of local users in real time. When a decrease is detected, the ID information of the departed local user is sent to the peer device and the corresponding 3D model is deleted; when an increase is detected, each newly added local user is assigned identification information, the human region image of the newly added user (taken from the first frame image in which that user appears) and the user-to-ID correspondence are sent to the peer device, and at the same time 3D modeling is performed on this human region image to obtain and display the 3D model corresponding to the newly added user. The peer device, after receiving the ID information, finds and deletes the 3D model it indicates; when it receives the human region image and the user-to-ID correspondence, it performs 3D modeling on the image to obtain the corresponding 3D model. Furthermore, when a local user leaves, that user no longer appears in subsequent video images, so driving parameters for that user are no longer generated; when a local user is added, that user appears in subsequent video images, so driving parameters corresponding to the new user can be generated and used to drive the corresponding 3D model to make corresponding actions. It can be seen that the above method realizes a multi-person 3D video call with animation of multiple 3D models.
In addition, when there is at least one calling user and/or called user, the emotion icon solution described above can be extended: in step S202 the expression of each local user is recognized; in step S204 the emotion icon matching each local user's expression is found in the local emotion icon library and displayed, and the ID information of the found icons is sent to the peer device; and in step S206, after the ID information of at least one emotion icon sent by the peer device is received, the corresponding emotion icon is found in the local emotion icon library for each piece of ID information and displayed. In this way, each of the calling or called users has a corresponding emotion icon.
For example, suppose there are two calling users and one called user. The 2D video image containing the two calling users acquired by the calling device, the corresponding 3D models and emotion icons, and the 2D video image containing the one called user acquired by the called device, with the corresponding 3D model and emotion icon, are shown in Fig. 6. In Fig. 6, the upper left is the 2D video image containing the two calling users and the upper right shows the 3D models corresponding to these two calling users, with the corresponding emotion icons in the upper left corners; the lower left is the 2D video image of the called user and the lower right is the 3D model corresponding to the called user, with the corresponding emotion icon in its upper left corner.
It can be seen that each of the two calling users has a corresponding 3D model and a corresponding emotion icon.
2. Privacy protection used
During a video call, each party can see the other party's real video and hear the other party's real sound. In some situations, therefore, personal privacy protection is needed and certain protective measures must be taken. Personal privacy can be divided into absolute privacy and indirect privacy. Prior-art video calls mainly adopt absolute privacy protection, for example not allowing the other party to see and/or hear one's own side. However, such an approach is inflexible and degrades the user experience.
In order to protect personal privacy better and adopt protective measures more flexibly according to user demand, the embodiments of the present application combine absolute privacy and indirect privacy. The user can choose to protect the following three kinds of content: 1) the real face; 2) the background sound; 3) both the real face and the background sound.
The specific methods for protecting the real face and for protecting the background sound are introduced below.
(1) Protecting the real face
In this method, the side using facial privacy protection is taken to be the calling device for the purpose of description.
The calling device then needs to perform the following steps:
Step S502: after acquiring the 2D video image of the local end, recognize the personal information of the local user according to the current image acquired at the local end, and send this personal information to the peer device; according to the personal information of the local user, select a corresponding 3D model from the local preset model library as the 3D model corresponding to the local user, and display the chosen 3D model in the local 3D display window;
Here, the above personal information comprises age and gender.
Step S504: generate, according to the current image acquired at the local end, driving parameters corresponding to the local user and send them to the peer device; use these driving parameters to drive the 3D model corresponding to the local user to make corresponding actions.
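A sketch of the model selection in step S502 follows, assuming an external age/gender estimator and a preset model library indexed by coarse age band and gender; the library contents and the banding are hypothetical.

```python
# Sketch of step S502 under stated assumptions: the estimator and the
# library keys are illustrative, not specified by the patent.

PRESET_MODEL_LIBRARY = {
    ("child", "female"): "cartoon_girl",
    ("child", "male"): "cartoon_boy",
    ("adult", "female"): "cartoon_woman",
    ("adult", "male"): "cartoon_man",
}

def age_band(age: int) -> str:
    return "child" if age < 18 else "adult"

def select_preset_model(age: int, gender: str) -> str:
    """Pick the preset 3D model matching the recognized personal
    information, to be shown in place of the user's real face."""
    return PRESET_MODEL_LIBRARY[(age_band(age), gender)]

# The same personal information (age, gender) is sent to the peer device,
# which performs the identical lookup in its own preset model library.
```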
In addition, when the called device also uses facial privacy protection, it likewise performs steps S502 to S504, and the calling device also performs the following steps:
Step S602: receive the personal information of the peer user sent by the peer device; according to the received personal information, select a corresponding 3D model from the local preset model library as the 3D model corresponding to the peer user, and display the chosen 3D model in the peer 3D display window;
For example, the chosen 3D model can be a 3D cartoon model.
Step S604: receive the driving parameters sent by the peer device and use them to drive the 3D model corresponding to the peer user to make corresponding actions.
In this technical solution of the embodiment of the present application, the side using facial privacy protection recognizes the personal information of its local user from the acquired 2D video image, sends this personal information to the peer device, selects a matching 3D model from the local preset model library as the 3D model corresponding to the local user, and displays it; it also generates driving parameters corresponding to the local user from the locally acquired video image, sends them to the peer device, and uses them to drive the displayed 3D model. When the peer also uses facial privacy protection, the local side likewise receives the peer user's personal information, selects a matching 3D model from its own preset model library as the 3D model corresponding to the peer user, displays it, and drives it with the driving parameters received from the peer device. Thus, when facial privacy protection is used, the age and gender of a user are recognized from the image, and a 3D model matching that age and gender is displayed at both ends from the local preset model libraries. This protects the human face and achieves indirect privacy protection: it avoids leaking the privacy of one's own face and avoids embarrassment, while still allowing the video call to proceed normally.
In addition, since only the 3D models corresponding to the local and peer users are displayed, which is not very attractive on its own, when the chosen 3D model is displayed in steps S502 and S602, a virtual background from the local preset model library can be added behind it, obtaining a 3D scene that is displayed instead. In this way the 3D model has a virtual background; the local and peer 3D display windows show 3D scenes composed of a 3D model and a virtual background, the display effect is more attractive, and the user experience is improved.
Specifically, in the above method there can be at least one local user and at least one peer user. In that case, in steps S502 and S602, the method of displaying the chosen 3D models comprises: adding a virtual background from the local preset model library behind the at least one chosen 3D model, obtaining a 3D scene, and displaying this 3D scene.
That is, when the calling user and/or the called user comprise several people, the virtual background is added behind the 3D models corresponding to these people, and the 3D scene combining these 3D models and the virtual background is displayed in the corresponding 3D display window.
In addition, when there is at least one calling user and/or called user, corresponding driving parameters can be generated in step S504 for each human region image in every frame image, and in steps S504 and S604 these driving parameters can be used to drive the corresponding 3D models to make corresponding actions.
Moreover, when there is at least one calling user and/or called user, the number of people may change during the call, so the calling device and the called device need to detect changes in the number of local users in real time. Thus, when there is at least one local user, after acquiring the 2D video image of the local end, the calling device and the called device also need to perform the following steps:
Step S702: for the current image acquired at the local end, count the number of people in this current image and judge whether this number has changed compared with the previous frame image; if it is judged that at least one local user has left, perform step S704; if it is judged that at least one local user has been added, perform step S706;
Specifically, face detection and recognition technology can be used to count the number of people in each frame image and, when it is judged that at least one local user has left or been added, to identify who that user is.
Step S704: send the ID information of the departed local user to the peer device, and delete the 3D model corresponding to that user;
That is, the 3D model corresponding to the departed local user is deleted from the 3D scene displayed in the local 3D display window.
Step S706: recognize the personal information of the newly added local user according to the current image and subsequent images (from this frame onward, the video images contain the newly added local user), and send this personal information to the peer device; according to the personal information of the newly added local user, select a corresponding 3D model from the local preset model library as the 3D model corresponding to the newly added local user, and display it;
That is, the 3D model corresponding to the newly added local user is added to the 3D scene displayed in the local 3D display window.
In addition, when there is at least one peer user, the calling device and/or the called device also need to perform the following operations:
Step S802: receive the identification information of the departed peer user sent by the peer device; then perform step S804;
Step S804: according to the received identification information, find and delete the 3D model corresponding to the departed peer user;
That is, the 3D model corresponding to the departed peer user is deleted from the 3D scene displayed in the peer 3D display window.
Step S806: receive the personal information of the newly added peer user sent by the peer device; according to the received personal information, select a corresponding 3D model from the local preset model library as the 3D model corresponding to the newly added peer user, and display it.
That is, the 3D model corresponding to the newly added peer user is added to the 3D scene displayed in the peer 3D display window.
For example, as shown in Fig. 7, when the called user chooses facial privacy protection, the 3D model corresponding to the called user displayed by both the calling device and the called device is a cartoon 3D model, so the called user's face is protected. In Fig. 7, the top is the 2D video image of the called user and the bottom is the 3D scene corresponding to the called user, combined from the cartoon 3D model corresponding to the called user and a virtual background. As can be seen from Fig. 7, the cartoon 3D model corresponding to the called user makes corresponding actions following the facial changes of the called user.
(2) Protecting the background sound
In this method, the side using background-sound privacy protection is taken to be the calling device for the purpose of description.
The calling device then needs to perform the following steps:
Step S902: acquire the local audio;
Step S904: filter the background sound out of the acquired audio, and send the filtered audio to the peer device.
Similarly, when the called device also uses background-sound privacy protection, it also performs steps S902 to S904; the calling device then also receives the audio sent by the peer device and plays it, this audio being audio from which the background sound has been filtered.
Filtering out the background sound achieves absolute privacy of the personal environment: the other party cannot tell what kind of environment one is in.
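The patent does not specify the filtering algorithm; the sketch below shows one conventional possibility, a simple spectral-subtraction pass over short frames with numpy, purely as an example of what step S904 could look like.

```python
# Sketch of one way to filter background sound (steps S902-S904); the
# algorithm choice is an assumption, not the patent's method.

import numpy as np

def suppress_background(audio, noise_profile, frame_len=512):
    """audio: 1-D float array of samples; noise_profile: magnitude spectrum
    of a background-only segment (same frame length). Subtracts the noise
    magnitude from each frame's spectrum and resynthesizes the result."""
    out = np.zeros_like(audio)
    for start in range(0, len(audio) - frame_len + 1, frame_len):
        frame = audio[start:start + frame_len]
        spectrum = np.fft.rfft(frame)
        magnitude = np.abs(spectrum)
        phase = np.angle(spectrum)
        # Keep whatever energy exceeds the background estimate.
        cleaned = np.maximum(magnitude - noise_profile, 0.0)
        out[start:start + frame_len] = np.fft.irfft(cleaned * np.exp(1j * phase), n=frame_len)
    return out
```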
In actual implementation, when both facial privacy protection and background-sound privacy protection are used, the above solutions for protecting the real face and for protecting the background sound are simply combined, which is not repeated here.
In addition, when facial and/or background-sound privacy protection is used, the emotion icon solution of steps S202 to S206 above can also be applied, which is likewise not repeated here.
3. 3D model editing through gestures
In the embodiments above, with or without privacy protection, the display screens of the calling device and the called device ultimately show the 3D models corresponding to the local user and the peer user, or 3D scenes (composed of the 3D model corresponding to the local or peer user plus a virtual background). Afterwards, the user can replace the whole or a part of the above 3D model or 3D scene according to his or her own preferences and needs.
To this end, in the method of this embodiment, the calling device and/or the called device also need to perform the following steps:
Step S1002: perform gesture recognition on the local user's arm according to the current image acquired at the local end, and obtain motion information of the arm; the motion information comprises information related to motion such as the direction of motion and the motion trajectory;
Specifically, focus-of-attention motion analysis can be used to segment the arm region from each frame image and to extract the skeleton and skeleton points (or control points) of the arm; gesture recognition is then performed on the arm according to the extracted skeleton and skeleton points, that is, they are matched against locally stored gesture models. At the same time, online training of gesture models can also be carried out, that is, a gesture model is trained online from the skeleton and skeleton points extracted from each frame image and saved.
Step S1004: if the recognized arm gesture matches a locally stored selection gesture model, and the direction of motion of the arm first points at a preset model from the preset model library displayed on the screen and afterwards points at a model to be edited displayed in the local or peer 3D display window, then select this preset model (i.e., the 3D model pointed at first) and use it to edit the model to be edited (i.e., the model pointed at afterwards).
In actual implementation, the calling device and/or the called device can also display the models in the local preset model library on the screen, in response to a user operation, for the user to choose from. Since the display screen has two 3D display windows, the local and the peer 3D display windows, the calling or called user can use specific gestures to edit the model shown in either window (this model is then called the model to be edited).
The preset models comprise: preset 3D human models, preset 3D human part models, virtual backgrounds, preset 3D ornament models, and the like; here the human body comprises the head and the body.
For example, a preset 3D human model can be a 3D cartoon model; a preset 3D human part model can be a 3D model of a facial part such as hair, eyebrows, a beard or eyes; and a preset 3D ornament model can be a 3D model of an ornament such as glasses or a hair flower.
The above model to be edited can be one of the following: the 3D model corresponding to the local user or to the peer user, or the virtual background in the 3D scene. The 3D model corresponding to the local or peer user can be the 3D model obtained from the first frame image, called for convenience the true 3D human model, or the 3D model found in the local preset model library according to the personal information, called the preset 3D human model.
Specifically, in step S1004, if the recognized arm gesture matches the locally stored selection gesture model and the direction of motion of the arm points at a preset model displayed on the screen for a predetermined time, for example 1 second, this preset model is selected; the selected preset model is then moved along the trajectory of the arm after selection; and if the direction of motion of the arm subsequently points at a model to be edited displayed on the screen for the predetermined time, for example 1 second, the selected preset model is used to edit this model to be edited. The whole process can be shown on the display screen.
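The dwell-based selection just described can be sketched as follows; the hit-testing that determines what the arm points at, and the timing source, are assumptions outside the patent text.

```python
# Sketch of the dwell-based selection in step S1004: a model is chosen
# when the arm keeps pointing at it for a predetermined time (1 second in
# the example above).

import time

class DwellSelector:
    """Tracks what the arm points at and fires when the same target has
    been pointed at continuously for `dwell_s` seconds."""

    def __init__(self, dwell_s: float = 1.0):
        self.dwell_s = dwell_s
        self.target = None
        self.since = None

    def update(self, pointed_target):
        """pointed_target: the model currently under the arm's pointing
        direction (from a hypothetical hit-test), or None.
        Returns the target once the dwell time is reached, else None."""
        now = time.monotonic()
        if pointed_target != self.target:
            self.target, self.since = pointed_target, now
            return None
        if self.target is not None and now - self.since >= self.dwell_s:
            self.since = now  # avoid re-firing on every subsequent frame
            return self.target
        return None
```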
Step S1006: send first, second and third identification information to the peer device; the first identification information identifies the 3D display window (local or peer) in which the model to be edited is shown, the second identification information identifies the model to be edited, and the third identification information identifies the selected preset model;
Step S1008: receive the first, second and third identification information sent by the peer device; then perform step S1010;
Step S1010: determine, according to the received first and second identification information, the model to be edited in the local or peer 3D display window; find, according to the received third identification information, the preset model in the local preset model library;
Step S1012: use the found preset model to edit the model to be edited.
The above editing comprises replacement and addition.
When the editing is a replacement, the method of editing the model to be edited with the preset model comprises: determining, according to the preset model, the part to be replaced in the model to be edited, and replacing that part with the preset model.
When the editing is an addition, the method comprises: determining, according to the preset model, the position in the model to be edited at which to add, and adding the preset model at that position.
The method of this embodiment can therefore realize editing of the background, of the human body as a whole, and of local parts of the body, improving entertainment value and user experience.
When the preset model selected in step S1004 is a virtual background, background editing can be realized, comprising: replacing the virtual background of the 3D scene shown in the local or peer 3D display window with the selected virtual background; or adding the selected virtual background behind the true/preset 3D human model shown in the local or peer 3D display window.
When the preset model selected in step S1004 is a preset 3D human model, editing of the human body as a whole can be realized, comprising: replacing the true/preset 3D human model shown in the local or peer 3D display window with the selected preset 3D human model.
When the preset model selected in step S1004 is a preset 3D human part model or a preset 3D ornament model, editing of local parts of the body can be realized, comprising: replacing the corresponding part of the true/preset 3D human model shown in the local or peer 3D display window with the selected preset 3D human part model; replacing the corresponding ornament of that model with the selected preset 3D ornament model; or adding the selected preset 3D ornament model at the corresponding position on the true/preset 3D human model shown in the local or peer 3D display window.
For example, as shown in Figure 8, after enabling face privacy protection, the called user can use a gesture to select a favorite preset 3D hair model from the preset model library and replace the hair of his or her preset 3D human model with it. In Figure 8, the top shows the called user's 2D video image, and the bottom shows the 3D scene corresponding to the called user, composed of the preset 3D human model with the replaced hair and a virtual background.
As shown in Figure 9, a calling user who is unsatisfied with the current virtual background can likewise use a gesture to select a favorite virtual background from the preset model library and replace the current one with it. In Figure 9, the top shows the 3D scene corresponding to the calling user before the replacement, and the bottom shows it after.
Four, text interaction realized by gestures
Using a certain gesture, the user can input characters by writing in the air and edit the input characters. Specifically, the calling device and/or the called device perform the following steps:
Step S1102: according to the current image acquired at the local end, perform gesture recognition on the hand of the local-end user and obtain the hand's motion information.
Specifically, a motion analysis method can be used to segment the arm region from each frame image and, from it, the hand region; the skeleton and skeleton points (or control points) of the hand are extracted and matched against the locally stored gesture models to recognize the hand gesture. At the same time, online training of gesture models can be performed: a gesture model is trained online from the skeletons and skeleton points extracted from successive frames and then stored.
Step S1104: if the recognized hand gesture matches a locally stored handwriting gesture model, recognize the acquired hand trajectory, display the recognition result, and send the hand trajectory to the opposite-end device.
Step S1106: receive the hand trajectory sent by the opposite-end device, recognize it, and display the recognition result.
The recognition result is at least one character or an edit command, edit commands including a delete command. When the recognition result is a delete command, after it is recognized the method further comprises: determining the character to be deleted in the text shown on the screen and deleting that character from the text.
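A compact sketch of this dispatch, with a placeholder `classify_trajectory` standing in for the actual handwriting recognizer (which the application does not specify):

```python
# Sketch: dispatching a recognized hand trajectory to text input or an
# edit command. classify_trajectory() is a placeholder for the real
# handwriting recognizer, which the application does not specify.

def classify_trajectory(trajectory):
    # Placeholder: a real system would run handwriting recognition here.
    return {"kind": "chars", "value": "hi"} if trajectory else {"kind": "delete"}

def apply_recognition(text: str, trajectory) -> str:
    result = classify_trajectory(trajectory)
    if result["kind"] == "chars":
        return text + result["value"]  # append the recognized characters
    if result["kind"] == "delete":
        return text[:-1]               # drop the character to be deleted
    return text

assert apply_recognition("hi", None) == "h"  # a delete command removes one character
```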
For example, when the calling or called user wants to input text, he or she can write characters in the air with a pen-holding gesture; by the above method, both the local device and the opposite-end device recognize the corresponding characters from the hand trajectory and display them in the upper-left corner of the corresponding 3D display window. Likewise, when the calling or called user wants to delete a character, a waving gesture deletes the corresponding character on both the local device and the opposite-end device.
Five, virtual human-machine interaction realized by gestures
Using a certain gesture, the user can perform actions such as hitting or touching his or her own, or the other party's, real or preset 3D human model, causing the face of that model to make the corresponding expression.
Specifically, the calling device and/or the called device perform the following steps:
Step S1202: according to the current image acquired at the local end, perform gesture recognition on the arm of the local-end user and obtain the arm's motion information.
Specifically, a motion analysis method can be used to segment the arm region from each frame image; the skeleton and skeleton points (or control points) of the arm are extracted and matched against the locally stored gesture models to recognize the arm gesture. At the same time, online training of gesture models can be performed: a gesture model is trained online from the skeletons and skeleton points extracted from successive frames and then stored.
Step S1204: if the recognized arm gesture matches a locally stored interaction gesture model, and the arm's direction of motion points at a 3D model on the display screen corresponding to the local-end user or a peer user, drive the pointed-at 3D model, according to the matched interaction gesture model, to make the corresponding expression.
The 3D models corresponding to the local-end user or a peer user include the real 3D human model and the preset 3D human model; the interaction gesture models include a hit gesture model, a touch gesture model, and so on. For example, a hit gesture may be a punch with a fist or a slap on the face, and a touch gesture may be a stroke.
Step S1206: send the first identification information and the second identification information to the opposite-end device.
The first identification information is the identification information of the matched interaction gesture model, and the second identification information is the identification information corresponding to the pointed-at 3D model: when the pointed-at model corresponds to the local-end user, it is the local-end user's identification information; when it corresponds to a peer user, it is that peer user's identification information.
Step S1208: receive the first and second identification information sent by the opposite-end device, then perform step S1210.
Step S1210: find, according to the received first identification information, the corresponding interaction gesture model; drive, according to the found model, the 3D model indicated by the second identification information to make the corresponding expression.
With the above method, a user can make a gesture toward the real or preset 3D human model shown on the local device's display screen, causing both the local device and the opposite-end device to drive the face of the corresponding model to make the matching expression, achieving a good interactive effect.
Specifically, if the interaction gesture model matched in step S1204 is the fist-hit gesture model, virtual collision detection is performed on the arm gesture to obtain a gesture force value (denoted T); the facial deformation produced by a punch of force T is computed from T, the corresponding facial-expression driving parameter is found from that deformation value, and the face of the corresponding real or preset 3D human model is driven accordingly.
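The force-to-expression chain can be pictured as below; the linear clamp and the weight names are invented stand-ins, since the application only states that a deformation value is computed from T and mapped to a driving parameter:

```python
# Sketch of the force-to-expression chain: gesture force T -> deformation
# value -> facial driving parameter. The linear clamp and the weight names
# are invented; the text only says deformation is computed from T.

MAX_FORCE = 10.0  # illustrative scale for the gesture force value T

def deformation_from_force(t: float) -> float:
    """Map the gesture force T to a deformation value in [0, 1]."""
    return max(0.0, min(t / MAX_FORCE, 1.0))

def expression_driving_parameter(t: float) -> dict:
    d = deformation_from_force(t)
    # One blendshape-style weight per affected facial region (illustrative).
    return {"cheek_compress": d, "eye_squint": 0.5 * d, "mouth_open": 0.3 * d}

print(expression_driving_parameter(7.0))  # a harder punch deforms the face more
```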
For example, as shown in Figure 10, when the calling user punches in the air at the real 3D human model of the called user shown on the calling device's display screen, the face of that model makes a muscle-deformation expression; at the same time, the corresponding real 3D human model shown on the called device makes the same expression. In Figure 10, the upper left is the 2D video image of the calling user punching in the air; the upper right is the 3D scene corresponding to the calling user, composed of the calling user's real 3D human model and a virtual scene; the lower left is the 2D video image of the called user; and the lower right is the 3D scene corresponding to the called user, composed of the called user's real 3D human model and a virtual scene. The called user's real 3D human model can be seen reacting to the punch.
Alternatively, when the interaction gesture model matched in step S1204 is the hand-stroke gesture model, the corresponding real or preset 3D human model can be driven to make a blushing expression.
In addition, in the above method, after the recognized arm gesture matches a locally stored interaction gesture model and the arm's direction of motion points at a 3D model on the display screen corresponding to the local-end user or a peer user, the following steps are also performed:
Step S1302: determine, according to the matched interaction gesture model, the corresponding expression, then perform step S1304.
Step S1304: find, in the local emoticon library, the emoticon matching the determined expression, display it, and send its identification information to the opposite-end device.
Step S1306: receive the emoticon identification information sent by the opposite-end device, find the corresponding emoticon in the local emoticon library according to it, and display that emoticon.
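Steps S1302 to S1306 amount to exchanging a small identifier instead of image data. A sketch under invented gesture, expression, and icon-id tables:

```python
# Sketch of the emoticon exchange in steps S1302-S1306: only an identifier
# crosses the network; both ends resolve it in their local emoticon library.
# The gesture/expression tables and icon ids below are illustrative.

EXPRESSION_FOR_GESTURE = {"fist_hit": "pain", "stroke": "blush"}
LOCAL_EMOTICON_LIBRARY = {"pain": 101, "blush": 102}  # expression -> icon id
ICON_BY_ID = {v: k for k, v in LOCAL_EMOTICON_LIBRARY.items()}

def on_interaction_gesture(gesture: str, send) -> int:
    expression = EXPRESSION_FOR_GESTURE[gesture]   # step S1302
    icon_id = LOCAL_EMOTICON_LIBRARY[expression]   # step S1304: find and show
    send(icon_id)                                  # only the id is transmitted
    return icon_id

def on_icon_id_received(icon_id: int) -> str:
    return ICON_BY_ID[icon_id]                     # step S1306: show locally

sent = on_interaction_gesture("fist_hit", send=lambda _id: None)
assert on_icon_id_received(sent) == "pain"
```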
Through steps S1302 to S1306, a gesture made toward the real or preset 3D human model shown on the local device causes both the local device and the opposite-end device to drive the face of the corresponding model to make the matching expression, while both devices also display the emoticon matching that expression, making the interaction more entertaining and improving the user experience.
Obviously, in actual implementation any one of the methods in sections One to Five above may be implemented alone, or several of them may be combined arbitrarily; the present application places no limit on this.
Embodiment two
For the methods of embodiment one above, the embodiments of the present application further provide a video calling implementation device capable of applying those methods.
As shown in Figure 11, the video calling implementation device comprises the following modules: a connection establishment module 00, a video acquisition module 10, a generation module 20, a sending module 30, a receiving module 40, a modeling module 50, a display module 60, and a driving module 70.
One, without privacy protection
In this case, the connection establishment module 00 is used to establish a video call connection with the opposite-end device;
the video acquisition module 10 is used to acquire the current 2D video image of the local end;
the generation module 20 is used to generate, according to the current image acquired by the video acquisition module 10, the driving parameter corresponding to the local-end user;
the sending module 30 is used to send the first frame image acquired by the video acquisition module 10 to the opposite-end device, and also to send the driving parameter generated by the generation module 20 to the opposite-end device;
the receiving module 40 is used to receive the first frame image sent by the opposite-end device, and also to receive the driving parameter sent by the opposite-end device;
the modeling module 50 is used to perform 3D modeling according to the first frame image received by the receiving module 40, obtaining the 3D model corresponding to the peer user;
the display module 60 is used to display the 3D model obtained by the modeling module 50;
the driving module 70 is used to drive, with the driving parameter received by the receiving module 40, the 3D model corresponding to the peer user to make the corresponding action.
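One way to picture the wiring of these modules is the following sketch, with each module reduced to a method and the heavy algorithms left as placeholders; the message format and class shape are assumptions, not part of the application:

```python
# Sketch of the Figure 11 pipeline without privacy protection: each module
# reduced to a method, the real algorithms left as placeholders. The
# message format and class shape are assumptions.

class VideoCallDevice:
    def __init__(self, transport):
        self.transport = transport    # connection to the opposite-end device
        self.peer_model = None        # 3D model corresponding to the peer user
        self.sent_first_frame = False

    def on_local_frame(self, frame):
        if not self.sent_first_frame:                      # sending module
            self.transport.send(("first_frame", frame))
            self.sent_first_frame = True
        params = self.generate_driving_params(frame)       # generation module
        self.transport.send(("params", params))

    def on_message(self, kind, payload):
        if kind == "first_frame":                          # modeling module
            self.peer_model = self.build_3d_model(payload)
        elif kind == "params" and self.peer_model is not None:
            self.drive(self.peer_model, payload)           # driving module

    # Placeholders for the real algorithms:
    def generate_driving_params(self, frame): return {}
    def build_3d_model(self, frame): return {"mesh": "peer"}
    def drive(self, model, params): pass
```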
The modeling module 50 also performs 3D modeling according to the first frame image acquired by the video acquisition module 10, obtaining the 3D model corresponding to the local-end user;
the driving module 70 also drives, with the driving parameter generated by the generation module 20, the 3D model corresponding to the local-end user to make the corresponding action.
The video acquisition module 10 also acquires the local end's audio while acquiring its current 2D video. The device then further comprises an expression recognition module and a lookup module, wherein:
the expression recognition module recognizes, according to the current image and/or audio acquired by the video acquisition module 10, the expression of the local-end user;
the lookup module finds, in the local emoticon library, the emoticon matching the expression recognized by the expression recognition module;
the sending module also sends the identification information of the emoticon found by the lookup module to the opposite-end device;
the display module also displays the emoticon found by the lookup module.
In addition, the receiving module also receives the emoticon identification information sent by the opposite-end device, and the lookup module finds, according to that identification information, the corresponding emoticon in the local emoticon library, which is displayed by the display module.
In addition, when there is at least one local-end user, the device further comprises a people counting module, a deletion module, and a segmentation module, wherein:
the people counting module counts, after the video acquisition module acquires the first frame image, the local-end users in that image and assigns each of them an identification information; the sending module, while sending the acquired first frame image to the opposite-end device, also sends the correspondence between each local-end user in the first frame image and his or her identification information; the receiving module, while receiving the first frame image sent by the opposite-end device, also receives the correspondence between each peer user in that image and his or her identification information; and the modeling module, after obtaining the 3D models corresponding to the peer users, establishes the correspondence between each peer user, that user's identification information, and the 3D model corresponding to that user.
In addition, for any current image after the first frame image acquired by the video acquisition module 10, the people counting module counts the local-end users in that image and judges whether their number has changed compared with the previous frame image; if at least one local-end user has been added, it assigns each new local-end user an identification information.
The sending module, if the people counting module judges that at least one local-end user has left, sends the identification information of each departed local-end user to the opposite-end device;
the deletion module deletes the real 3D human model corresponding to each departed local-end user;
the segmentation module, if the people counting module judges that at least one local-end user has been added, segments from the current frame the human region image corresponding to each new local-end user;
the sending module also sends the human region images obtained by the segmentation module, together with the correspondence between each new local-end user and his or her identification information, to the opposite-end device;
the modeling module also performs 3D modeling according to the human region images obtained by the segmentation module, obtaining the 3D model corresponding to each new local-end user, which is displayed by the display module.
In addition, when there is at least one peer user, the receiving module also receives the identification information of departed peer users sent by the opposite-end device, as well as the human region images corresponding to newly added peer users and the correspondence between each new peer user and his or her identification information;
the lookup module finds, according to the identification information of departed peer users received by the receiving module, the 3D models corresponding to those users;
the deletion module deletes the 3D models found by the lookup module;
the modeling module also performs 3D modeling according to the human region images received by the receiving module, obtaining the 3D models corresponding to the new peer users, establishes the correspondence between each new peer user, that user's identification information, and the corresponding 3D model, and displays the models through the display module.
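The add/remove bookkeeping above reduces to diffing per-frame rosters of identification information. A sketch under that assumption, with `send` and `build` as stand-ins for the sending and modeling modules:

```python
# Sketch: the multi-user bookkeeping as a per-frame roster diff. Users are
# keyed by identification information; `models` maps id -> 3D model, and
# `send`/`build` stand in for the sending and modeling modules.

def update_roster(prev_ids: set, curr_ids: set, models: dict, send, build) -> set:
    for uid in prev_ids - curr_ids:           # departed users
        send(("user_left", uid))              # opposite end deletes the same model
        models.pop(uid, None)                 # deletion module drops it locally
    for uid in curr_ids - prev_ids:           # newly added users
        region = f"human_region_of_{uid}"     # placeholder for the segmented image
        send(("user_joined", uid, region))
        models[uid] = build(region)           # modeling module builds a new model
    return curr_ids
```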
Two, with privacy protection
In this case, the device may further comprise a personal information recognition module and a selection module, wherein:
the video acquisition module is used to acquire the current 2D video image of the local end;
the generation module is used to generate, according to the current image acquired by the video acquisition module, the driving parameter corresponding to the local-end user;
the personal information recognition module is used to recognize, according to the current image acquired by the video acquisition module, the personal information of the local-end user;
the sending module is used to send the local-end user's personal information recognized by the personal information recognition module to the opposite-end device, the personal information comprising age and gender, and also to send the driving parameter generated by the generation module;
the receiving module also receives the peer user's personal information sent by the opposite-end device;
the selection module is used to select, according to the local-end user's personal information recognized by the personal information recognition module, the corresponding 3D model from the local preset model library as the 3D model corresponding to the local-end user, and likewise to select, according to the peer user's personal information received by the receiving module, the corresponding 3D model from the local preset model library as the 3D model corresponding to the peer user;
the display module also displays the 3D models chosen by the selection module;
the driving module also drives, with the driving parameter generated by the generation module, the 3D model corresponding to the local-end user, and, with the driving parameter received by the receiving module, the 3D model corresponding to the peer user, to make the corresponding actions.
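Selecting a preset model from age and gender is essentially a keyed lookup with a fallback; a sketch with an invented age bucketing and library keys:

```python
# Sketch: choosing a preset 3D human model from recognized age and gender.
# The age buckets and library keys are purely illustrative.

PRESET_LIBRARY = {
    ("female", "child"): "girl_cartoon", ("male", "child"): "boy_cartoon",
    ("female", "adult"): "woman_avatar", ("male", "adult"): "man_avatar",
}

def select_preset_model(gender: str, age: int) -> str:
    bucket = "child" if age < 18 else "adult"
    return PRESET_LIBRARY.get((gender, bucket), "generic_avatar")

print(select_preset_model("female", 30))  # -> "woman_avatar"
```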
In addition, the device may further comprise a filtering module and a playback module, wherein:
the video acquisition module also acquires the local end's audio while acquiring its current 2D video image;
the filtering module is used to filter the background sound out of the audio acquired by the video acquisition module;
the sending module also sends the filtered audio to the opposite-end device;
the receiving module also receives audio sent by the opposite-end device, that audio having had its background sound filtered out;
the playback module is used to play the audio received by the receiving module.
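The application does not specify how background sound is removed; one common, minimal choice is a spectral gate, sketched here with NumPy purely as an illustration:

```python
# Sketch: background-sound suppression by spectral gating, one common and
# minimal choice. The application does not specify a filtering method, so
# none of this is the claimed technique.

import numpy as np

def spectral_gate(audio: np.ndarray, noise_sample: np.ndarray) -> np.ndarray:
    """Zero frequency bins that are not well above the background level."""
    spec = np.fft.rfft(audio)
    noise_level = np.abs(np.fft.rfft(noise_sample, n=len(audio)))
    mask = np.abs(spec) > 2.0 * noise_level  # keep bins above background
    return np.fft.irfft(spec * mask, n=len(audio))
```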
Three, 3D model editing realized by gestures
In this case, on top of the contents of sections One and Two, the device may further comprise a gesture recognition module, a motion information acquisition module, a selection module, a lookup module, and an editing module, wherein:
the video acquisition module is used to acquire the current 2D video image of the local end;
the gesture recognition module is used to perform, according to the current image acquired by the video acquisition module, gesture recognition on the arm of the local-end user;
the motion information acquisition module is used to obtain, according to that image, the motion information of the local-end user's arm;
the selection module selects a preset model if the arm gesture recognized by the gesture recognition module matches a locally stored selection gesture model and the arm's direction of motion first points at that preset model, shown from the preset model library on the display screen, and then at a model to be edited shown in the local-side or peer-side 3D display window; the local-side 3D display window shows the 3D scene composed of the 3D model corresponding to the local-end user and a virtual background, and the peer-side 3D display window shows the 3D scene composed of the 3D model corresponding to the peer user and a virtual background;
the sending module is used to send the first, second, and third identification information to the opposite-end device, the first identification information identifying whether the model to be edited belongs to the local-side or the peer-side 3D display window, the second identifying the model to be edited, and the third identifying the preset model;
the receiving module is used to receive the first, second, and third identification information sent by the opposite-end device;
the lookup module determines, according to the received first and second identification information, the model to be edited in the local-side or peer-side 3D display window, and finds, according to the received third identification information, the preset model in the local preset model library;
the editing module edits the model to be edited using the preset model chosen by the selection module, and also edits the determined model to be edited using the preset model found by the lookup module.
The preset model is a 3D model or a virtual background, the 3D model comprising a preset 3D human model, a preset 3D human part model, or a preset 3D ornament model.
When the preset model is a 3D model, the model to be edited is the 3D model (a real or a preset 3D human model) corresponding to the local-end user or peer user shown in the local-side or peer-side 3D display window, and the second identification information is the identification information of that user;
when the preset model is a virtual background, the model to be edited is the virtual background shown in the local-side or peer-side 3D display window, and the second identification information is the identification information of that virtual background.
The editing operations include replacement and addition.
Four, text interaction realized by gestures
In this case, on top of the contents of sections One and Two, the device may further comprise a gesture recognition module, a motion information acquisition module, and a trajectory recognition module, wherein:
the video acquisition module is used to acquire the current 2D video image of the local end;
the gesture recognition module is used to perform, according to the current image acquired by the video acquisition module, gesture recognition on the hand of the local-end user;
the motion information acquisition module is used to obtain, according to that image, the motion information of the local-end user's hand;
the sending module is used to send the acquired hand trajectory to the opposite-end device;
the receiving module is used to receive the hand trajectory sent by the opposite-end device;
the trajectory recognition module recognizes, if the hand gesture recognized by the gesture recognition module matches a locally stored handwriting gesture model, the hand trajectory obtained by the motion information acquisition module, and also recognizes the hand trajectory received by the receiving module, the recognition result being at least one character or an edit command;
the display module is used to display the recognition results obtained by the trajectory recognition module.
Five, virtual human-machine interaction realized by gestures
In this case, on top of the contents of sections One and Two, the device may further comprise a gesture recognition module, a motion information acquisition module, and a lookup module, wherein:
the video acquisition module is used to acquire the current 2D video image of the local end;
the gesture recognition module is used to perform, according to the current image acquired by the video acquisition module, gesture recognition on the arm of the local-end user;
the motion information acquisition module is used to obtain, according to that image, the motion information of the local-end user's arm;
the sending module sends the first and second identification information to the opposite-end device if the arm gesture recognized by the gesture recognition module matches a locally stored interaction gesture model and the arm's direction of motion points at a 3D model (real or preset) on the display screen corresponding to the local-end user or a peer user; the first identification information identifies the matched interaction gesture model, and the second identifies the pointed-at 3D model;
the receiving module is used to receive the first and second identification information sent by the opposite-end device;
the lookup module finds, according to the first identification information received by the receiving module, the corresponding interaction gesture model;
the driving module, under the same matching and pointing conditions, drives the pointed-at 3D model to make the expression corresponding to the matched interaction gesture model, and also drives, according to the interaction gesture model found by the lookup module, the 3D model indicated by the second identification information to make the corresponding expression.
In addition, the device may further comprise an expression determination module, wherein:
the expression determination module determines, if the arm gesture recognized by the gesture recognition module matches a locally stored interaction gesture model and the arm's direction of motion points at a 3D model on the display screen corresponding to the local-end user or a peer user, the expression corresponding to the matched interaction gesture model;
the sending module also sends the identification information of the emoticon found by the lookup module to the opposite-end device;
the receiving module also receives the emoticon identification information sent by the opposite-end device;
the lookup module also finds, in the local emoticon library, the emoticon matching the expression determined by the expression determination module, and finds, according to the identification information received by the receiving module, the corresponding emoticon in the local emoticon library;
the display module also displays the found emoticons.
The 3D models corresponding to the local-end user or a peer user include the real 3D human model and the preset 3D human model;
the interaction gesture models include a hit gesture model and a touch gesture model.
The device can be applied to equipment such as mobile phones, tablet computers, and personal computers (PCs).
In summary, the above embodiments of the present application achieve the following technical effects:
(1) The calling device and the called device each acquire the local end's 2D video image. When privacy protection is not used, each sends the first frame image to the opposite-end device, generates from each acquired frame the driving parameter corresponding to the local-end user, and sends the generated driving parameter to the opposite-end device. Each also receives the first frame image sent by the opposite-end device, performs 3D modeling on it to obtain the 3D model corresponding to the peer user, displays that model, receives the driving parameters sent by the opposite-end device, and uses them to drive the peer user's 3D model to make the corresponding actions.
It can be seen that during a video call the two sides only need to exchange their first frame images; each side performs 3D modeling on the received first frame image to obtain the 3D model corresponding to the peer user. Afterwards neither side needs to transmit video frames, only driving parameters, and each side drives the above 3D model according to the parameters received, forming a lifelike 3D animation and thus a 3D video call. Because only one frame of 2D video data plus a small volume of driving parameters is transmitted, the amount of transmitted data drops sharply and the call is far less affected by network bandwidth and fluctuation, avoiding the loss of 3D video data that would otherwise produce artifacts such as mosaics.
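To make the data-volume claim concrete, a back-of-the-envelope comparison under assumed numbers (the application itself gives no figures):

```python
# Back-of-the-envelope comparison of transmitted data volume. Every number
# below is assumed for illustration; the application itself gives no figures.

FPS = 30
CALL_SECONDS = 60
COMPRESSED_FRAME_BYTES = 30_000  # assumed ~30 KB per compressed 2D frame
PARAMS_BYTES_PER_FRAME = 200     # assumed: a few dozen floats of driving data

video_total = FPS * CALL_SECONDS * COMPRESSED_FRAME_BYTES
model_total = COMPRESSED_FRAME_BYTES + FPS * CALL_SECONDS * PARAMS_BYTES_PER_FRAME

print(f"full video stream   : {video_total / 1e6:.1f} MB")   # 54.0 MB
print(f"first frame + params: {model_total / 1e6:.2f} MB")   # 0.39 MB
```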
(2) The calling device and the called device each detect changes in the number of local-end users in real time. When the number decreases, each sends the identification information of the departed users to the opposite-end device and deletes the 3D models corresponding to them; when the number increases, each sends the human region images of the new users, segmented from the frame in which they first appear, to the opposite-end device and performs 3D modeling on those images to obtain the corresponding 3D models. On receiving such identification information, the opposite-end device looks up and deletes the indicated 3D models; on receiving such human region images, it performs 3D modeling on them to obtain the corresponding models. Moreover, once a local-end user has left, he or she no longer appears in subsequent video, so no facial information is generated for that user; once a user has been added, subsequent video frames contain that user, so facial information corresponding to him or her can be generated and used to drive the corresponding 3D model.
It can be seen that the above method achieves multi-person 3D video calling, with animated 3D models for every participant.
(3) During a video call, the user can choose to protect personal privacy according to the environment and the relationship with the other party. This includes background sound protection, i.e. filtering out the local end's background sound, and face privacy protection, i.e. substituting a 3D model from the local preset model library (a preset 3D human model) for the real 3D human model (the one obtained from the first frame image). Furthermore, when face privacy protection is used, the local device can recognize the user's age and gender from the acquired video image, find a matching preset 3D human model in the local preset model library, and use it as that user's 3D model, improving both entertainment value and privacy.
(4) The user can edit the real or preset 3D human model or the virtual background with gestures, input characters with gestures, and interact with the other party through gestures, making the video call natural and entertaining and improving the user experience.
(5) The devices can recognize the local-end user's expression in real time from image and sound during the call, find the matching emoticon in the local emoticon library, display it, and send its identification information to the opposite-end device. On receiving that identification information, the opposite-end device also finds the indicated emoticon in its local emoticon library and displays it, so the emoticon representing the respective user's current expression (or mood) is shown at both ends.
This achieves expression recognition during a 3D video call, so that the combination of real or preset 3D human models and emoticons reflects the users' expressions and emotional changes in real time.
The above are only preferred embodiments of the present application and are not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within its scope of protection.

Claims (28)

1. A video calling implementation method, characterized by comprising:
establishing a video call connection with an opposite-end device;
acquiring the current two-dimensional (2D) image of the local end;
and generating, according to the current image, the driving parameter corresponding to the local-end user, and sending the generated driving parameter to the opposite-end device.
2. The method according to claim 1, characterized in that, when the acquired current image is a first frame image containing the front face of the local-end user, the method further comprises, after the first frame image is acquired:
sending the first frame image to the opposite-end device.
3. The method according to claim 2, characterized in that, when the current image is the first frame image, the method further comprises, after it is acquired: performing 3D modeling according to the first frame image to obtain the 3D model corresponding to the local-end user;
and when the current image is any frame image, the method further comprises, after the driving parameter corresponding to the local-end user is generated: using the generated driving parameter to drive the 3D model corresponding to the local-end user to make the corresponding action.
4. The method according to claim 3, characterized in that, when the current image is the first frame image, the method further comprises, after it is acquired: counting the local-end users in the first frame image and assigning each local-end user in the first frame image an identification information;
and, while the first frame image is sent to the opposite-end device, sending the correspondence between each local-end user in the first frame image and his or her identification information to the opposite-end device.
5. The method according to claim 4, characterized in that, when the current image is any frame image after the first frame image, the method further comprises, after the current image is acquired:
counting the local-end users in the current image and judging whether their number has changed compared with the previous frame image;
and, if it is judged that at least one local-end user has left, sending the identification information of each departed local-end user to the opposite-end device and deleting the 3D model corresponding to that user.
6. The method according to claim 5, characterized in that, after judging whether the number of local-end users in the current image has changed compared with the previous frame image, the method further comprises:
if it is judged that at least one local-end user has been added, segmenting from the current image the human region image corresponding to each new local-end user and assigning each new local-end user an identification information; sending the human region images, together with the correspondence between each new local-end user and his or her identification information, to the opposite-end device; and performing 3D modeling according to the human region images to obtain the 3D model corresponding to each new local-end user.
7. The method according to claim 1, characterized in that, after the current 2D image of the local end is acquired, the method further comprises:
recognizing, according to the current image, the personal information of the local-end user and sending it to the opposite-end device, wherein the personal information comprises age and gender.
8. The method according to claim 7, characterized in that, after the personal information of the local-end user is recognized, the method further comprises: selecting, according to that personal information, the corresponding 3D model from the local preset model library as the 3D model corresponding to the local-end user;
and, after the driving parameter corresponding to the local-end user is generated: using the driving parameter to drive the 3D model corresponding to the local-end user to make the corresponding action.
9. The method according to any one of claims 1 to 8, characterized in that the current audio of the local end is acquired while the current 2D image of the local end is acquired, and, after the current 2D image and audio are acquired, the method further comprises:
recognizing, according to the current image and/or current audio, the expression of the local-end user;
finding, in the local emoticon library, the emoticon matching the expression of the local-end user and displaying it, wherein each emoticon in the emoticon library corresponds to an identification information;
and sending the identification information of the found emoticon to the opposite-end device.
10. The method according to claim 3 or 8, characterized in that, after the current 2D image of the local end is acquired, the method further comprises:
performing, according to the current image, gesture recognition on the arm of the local-end user and obtaining the motion information of the arm;
if the recognized arm gesture matches a locally stored selection gesture model, and the direction of motion of the arm first points at a preset model from the preset model library shown on the display screen and then at a model to be edited shown in the local-side or peer-side 3D display window, selecting that preset model, wherein the local-side 3D display window shows the 3D scene composed of the 3D model corresponding to the local-end user and a virtual background, and the peer-side 3D display window shows the 3D scene composed of the 3D model corresponding to the peer user and a virtual background;
and sending first, second, and third identification information to the opposite-end device, wherein the first identification information identifies whether the model to be edited belongs to the local-side or the peer-side 3D display window, the second identification information is the identification information corresponding to the model to be edited, and the third identification information is the identification information of the preset model.
11. The method according to claim 10, characterized in that, after the preset model is selected, the method further comprises: editing the model to be edited using the preset model.
12. The method according to claim 3 or 8, characterized by further comprising:
performing, according to the current image, gesture recognition on the hand of the local-end user and obtaining the motion information of the hand;
and, if the recognized hand gesture matches a locally stored handwriting gesture model, recognizing the obtained trajectory of the hand, displaying the recognition result, and sending the acquired hand trajectory to the opposite-end device.
13. The method according to claim 3 or 8, characterized by further comprising:
performing, according to the current image, gesture recognition on the arm of the local-end user and obtaining the motion information of the arm;
and, if the recognized arm gesture matches a locally stored interaction gesture model, and the direction of motion of the arm points at a 3D model on the display screen corresponding to the local-end user or a peer user, sending the identification information of the matched interaction gesture model and the identification information corresponding to that 3D model to the opposite-end device.
14. The method according to claim 13, characterized in that, while the identification information of the matched interaction gesture model and the identification information corresponding to the 3D model are sent to the opposite-end device, the 3D model is driven, according to the matched interaction gesture model, to make the corresponding expression.
15. The method according to claim 13, characterized in that, after the recognized arm gesture matches a locally stored interaction gesture model and the direction of motion of the arm points at a 3D model on the display screen corresponding to the local-end user or a peer user, the method further comprises:
determining, according to the matched interaction gesture model, the corresponding expression;
finding, in the local emoticon library, the emoticon matching the determined expression and displaying it, wherein each emoticon in the emoticon library corresponds to an identification information;
and sending the identification information of the found emoticon to the opposite-end device.
16. A video calling implementation method, characterized by comprising:
establishing a video call connection with an opposite-end device;
obtaining, through three-dimensional (3D) modeling, the 3D model corresponding to a peer user;
and receiving the driving parameter corresponding to the peer user sent by the opposite-end device, and using that driving parameter to drive the 3D model corresponding to the peer user to make the corresponding action.
17. The method according to claim 16, characterized in that obtaining the 3D model corresponding to the peer user through 3D modeling comprises:
receiving the first frame image, containing the front face of the peer user, sent by the opposite-end device;
and performing 3D modeling according to that first frame image to obtain the 3D model corresponding to the peer user.
18. The method according to claim 17, characterized in that, while the first frame image sent by the opposite-end device is received, the correspondence between each peer user in the first frame image and his or her identification information is received;
and, after the 3D models corresponding to the peer users are obtained, the method further comprises: establishing the correspondence between each peer user, the identification information of that peer user, and the 3D model corresponding to that peer user.
19. The method according to claim 18, characterized by further comprising:
receiving the identification information of departed peer users sent by the opposite-end device, and, according to the received identification information, finding and deleting the 3D models corresponding to the departed peer users;
and receiving the human region images corresponding to newly added peer users sent by the opposite-end device, together with the correspondence between each new peer user and his or her identification information; performing 3D modeling according to the received human region images to obtain the 3D models corresponding to the new peer users; and establishing the correspondence between each new peer user, the identification information of that peer user, and the 3D model corresponding to that peer user.
20. The method according to claim 16, characterized in that obtaining the 3D model corresponding to the peer user through 3D modeling comprises:
receiving the personal information of the peer user sent by the opposite-end device, wherein the personal information comprises age and gender;
and selecting, according to the received personal information, the corresponding 3D model from the local preset model library as the 3D model corresponding to the peer user.
21. The method according to claim 17 or 20, characterized by further comprising:
receiving first, second, and third identification information sent by the opposite-end device, wherein the first identification information identifies whether a model to be edited belongs to the local-side or the peer-side 3D display window, the second identification information is the identification information of the model to be edited, and the third identification information is the identification information of the preset model chosen by the peer user from the preset model library; the local-side 3D display window shows the 3D scene composed of the 3D model corresponding to the local-end user and a virtual background, and the peer-side 3D display window shows the 3D scene composed of the 3D model corresponding to the peer user and a virtual background;
determining, according to the received first and second identification information, the model to be edited in the local-side or peer-side 3D display window, and finding, according to the received third identification information, the preset model in the local preset model library;
and editing the model to be edited using the preset model.
22. The method according to claim 17 or 20, characterized by further comprising:
receiving the hand trajectory sent by the opposite-end device, recognizing the received trajectory, and displaying the recognition result.
23. The method according to claim 17 or 20, characterized by further comprising:
receiving first and second identification information sent by the opposite-end device, wherein the first identification information is the identification information of an interaction gesture model and the second identification information is the identification information corresponding to the 3D model of the local-end user or a peer user;
and finding, according to the received first identification information, the corresponding interaction gesture model, and driving, according to the found interaction gesture model, the 3D model corresponding to the second identification information to make the corresponding expression.
24. The method according to any one of claims 16 to 20 and claim 23, characterized by further comprising:
receiving emoticon identification information sent by the opposite-end device, and, according to the received identification information, finding the corresponding emoticon in the local emoticon library and displaying that emoticon.
25. A video calling implementation method, characterized by comprising:
establishing a video call connection with an opposite-end device;
acquiring the current two-dimensional (2D) image of the local end;
generating, according to the current image, the driving parameter corresponding to the local-end user, and sending the generated driving parameter to the opposite-end device;
obtaining, through three-dimensional (3D) modeling, the 3D model corresponding to a peer user;
and receiving the driving parameter corresponding to the peer user sent by the opposite-end device, and using that driving parameter to drive the 3D model corresponding to the peer user to make the corresponding action.
26. A video calling implementation device, characterized by comprising:
a connection establishment module, for establishing a video call connection with an opposite-end device;
a video acquisition module, for acquiring the current two-dimensional (2D) image of the local end;
a generation module, for generating, according to the current image acquired by the video acquisition module, the driving parameter corresponding to the local-end user;
and a sending module, for sending the driving parameter generated by the generation module to the opposite-end device.
27. A video calling implementation device, characterized by comprising:
a connection establishment module, for establishing a video call connection with an opposite-end device;
a modeling module, for obtaining, through three-dimensional (3D) modeling, the 3D model corresponding to a peer user;
a receiving module, for receiving the driving parameter corresponding to the peer user sent by the opposite-end device;
and a driving module, for driving, with the driving parameter received by the receiving module, the 3D model obtained by the modeling module to make the corresponding action.
28. A video calling implementation device, characterized by comprising:
a connection establishment module, for establishing a video call connection with an opposite-end device;
a video acquisition module, for acquiring the current two-dimensional (2D) image of the local end;
a generation module, for generating, according to the current image acquired by the video acquisition module, the driving parameter corresponding to the local-end user;
a sending module, for sending the driving parameter generated by the generation module to the opposite-end device;
a modeling module, for obtaining, through three-dimensional (3D) modeling, the 3D model corresponding to a peer user;
a receiving module, for receiving the driving parameter corresponding to the peer user sent by the opposite-end device;
and a driving module, for driving, with the driving parameter received by the receiving module, the 3D model obtained by the modeling module to make the corresponding action.
CN201410100207.6A 2014-03-18 2014-03-18 Method and device for realizing video calling Pending CN104935860A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410100207.6A CN104935860A (en) 2014-03-18 2014-03-18 Method and device for realizing video calling

Publications (1)

Publication Number Publication Date
CN104935860A true CN104935860A (en) 2015-09-23

Family

ID=54122798

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410100207.6A Pending CN104935860A (en) 2014-03-18 2014-03-18 Method and device for realizing video calling

Country Status (1)

Country Link
CN (1) CN104935860A (en)

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105872838A (en) * 2016-04-28 2016-08-17 徐文波 Sending method and device of special media effects of real-time videos
CN106683194A (en) * 2016-12-13 2017-05-17 安徽乐年健康养老产业有限公司 Augmented reality medical communication system
CN106803921A (en) * 2017-03-20 2017-06-06 深圳市丰巨泰科电子有限公司 Instant audio/video communication means and device based on AR technologies
CN107025679A (en) * 2016-01-29 2017-08-08 掌赢信息科技(上海)有限公司 The driving method and equipment of a kind of 2D dummy models
CN107025678A (en) * 2016-01-29 2017-08-08 掌赢信息科技(上海)有限公司 A kind of driving method and device of 3D dummy models
CN107105310A (en) * 2017-05-05 2017-08-29 广州盈可视电子科技有限公司 Figure image replacement method, device and a kind of recording and broadcasting system in a kind of net cast
CN107145238A (en) * 2017-05-24 2017-09-08 维沃移动通信有限公司 A kind of method for sending information and electronic equipment
CN107465887A (en) * 2017-09-14 2017-12-12 潍坊学院 Video call system and video call method
WO2017211139A1 (en) * 2016-06-06 2017-12-14 中兴通讯股份有限公司 Method and apparatus for implementing video communication
CN107483872A (en) * 2017-08-27 2017-12-15 张红彬 Video call system and video call method
WO2018023577A1 (en) * 2016-08-04 2018-02-08 薄冰 Method for stopping using gesture matching emoticon sending instruction, and emoticon system
CN107728779A (en) * 2017-09-15 2018-02-23 周易 A kind of virtual device of accompanying and attending to based on 3D display
WO2018121699A1 (en) * 2016-12-29 2018-07-05 中兴通讯股份有限公司 Video communication method, device and terminal
CN108696712A (en) * 2017-03-03 2018-10-23 展讯通信(上海)有限公司 3D video call methods, device and terminal based on IMS
CN108874114A (en) * 2017-05-08 2018-11-23 腾讯科技(深圳)有限公司 Realize method, apparatus, computer equipment and the storage medium of virtual objects emotion expression service
CN109479116A (en) * 2016-07-28 2019-03-15 索尼公司 Information processing equipment, information processing method and program
CN110809172A (en) * 2019-11-19 2020-02-18 广州虎牙科技有限公司 Interactive special effect display method and device and electronic equipment
CN110809090A (en) * 2019-10-31 2020-02-18 Oppo广东移动通信有限公司 Call control method and related product
CN110959286A (en) * 2017-07-31 2020-04-03 索尼公司 Image processing apparatus, image processing method, program, and remote communication system
CN111614925A (en) * 2020-05-20 2020-09-01 广州视源电子科技股份有限公司 Figure image processing method and device, corresponding terminal and storage medium
CN111641798A (en) * 2020-06-15 2020-09-08 黑龙江科技大学 Video communication method and device
WO2021063012A1 (en) * 2019-09-30 2021-04-08 华为技术有限公司 Method for presenting face in video call, video call apparatus and vehicle
CN112684889A (en) * 2020-12-29 2021-04-20 上海掌门科技有限公司 User interaction method and device
WO2023025020A1 (en) * 2021-08-24 2023-03-02 中兴通讯股份有限公司 Video call method, user terminal, data server, computer device, and computer readable storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1167276C (en) * 1996-12-30 2004-09-15 株式会社大宇电子 Method and apparatus for producing lip movement parameters in a 3-dimension model-based coding system
JP2013197740A (en) * 2012-03-16 2013-09-30 Toshiba Corp Electronic apparatus, electronic apparatus control method, and electronic apparatus control program
CN103634554A (en) * 2012-08-20 2014-03-12 联想(北京)有限公司 A data transmission method, a data reception method and electronic devices
CN103218843A (en) * 2013-03-15 2013-07-24 苏州跨界软件科技有限公司 Virtual character communication system and method
CN103269423A (en) * 2013-05-13 2013-08-28 浙江大学 Expandable three-dimensional display remote video communication method

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107025679A (en) * 2016-01-29 2017-08-08 掌赢信息科技(上海)有限公司 Driving method and device for a 2D virtual model
CN107025678A (en) * 2016-01-29 2017-08-08 掌赢信息科技(上海)有限公司 Driving method and device for a 3D virtual model
CN105872838A (en) * 2016-04-28 2016-08-17 徐文波 Method and device for sending media special effects in real-time video
WO2017211139A1 (en) * 2016-06-06 2017-12-14 中兴通讯股份有限公司 Method and apparatus for implementing video communication
CN109479116A (en) * 2016-07-28 2019-03-15 索尼公司 Information processing equipment, information processing method and program
US11343471B2 (en) 2016-07-28 2022-05-24 Sony Corporation Information processing device and information processing method for communication using three-dimensional space
WO2018023577A1 (en) * 2016-08-04 2018-02-08 薄冰 Method for stopping use of a gesture-matched emoticon sending instruction, and emoticon system
CN106683194A (en) * 2016-12-13 2017-05-17 安徽乐年健康养老产业有限公司 Augmented reality medical communication system
WO2018121699A1 (en) * 2016-12-29 2018-07-05 中兴通讯股份有限公司 Video communication method, device and terminal
CN108259806A (en) * 2016-12-29 2018-07-06 中兴通讯股份有限公司 Video communication method, device and terminal
CN108696712A (en) * 2017-03-03 2018-10-23 展讯通信(上海)有限公司 IMS-based 3D video call method, device and terminal
CN106803921A (en) * 2017-03-20 2017-06-06 深圳市丰巨泰科电子有限公司 Instant audio/video communication method and device based on AR technology
CN107105310A (en) * 2017-05-05 2017-08-29 广州盈可视电子科技有限公司 Human image replacement method and device in video live broadcast, and recording and broadcasting system
CN107105310B (en) * 2017-05-05 2020-07-10 广州盈可视电子科技有限公司 Human image replacing method and device in video live broadcast and recording and broadcasting system
CN108874114A (en) * 2017-05-08 2018-11-23 腾讯科技(深圳)有限公司 Method, apparatus, computer device and storage medium for realizing emotion expression of a virtual object
CN107145238A (en) * 2017-05-24 2017-09-08 维沃移动通信有限公司 Information sending method and electronic device
CN110959286A (en) * 2017-07-31 2020-04-03 索尼公司 Image processing apparatus, image processing method, program, and remote communication system
CN107483872A (en) * 2017-08-27 2017-12-15 张红彬 Video call system and video call method
CN107465887A (en) * 2017-09-14 2017-12-12 潍坊学院 Video call system and video call method
CN107728779A (en) * 2017-09-15 2018-02-23 周易 Virtual companion care device based on 3D display
WO2021063012A1 (en) * 2019-09-30 2021-04-08 华为技术有限公司 Method for presenting face in video call, video call apparatus and vehicle
CN110809090A (en) * 2019-10-31 2020-02-18 Oppo广东移动通信有限公司 Call control method and related product
WO2021083125A1 (en) * 2019-10-31 2021-05-06 Oppo广东移动通信有限公司 Call control method and related product
CN110809172A (en) * 2019-11-19 2020-02-18 广州虎牙科技有限公司 Interactive special effect display method and device, and electronic device
CN111614925A (en) * 2020-05-20 2020-09-01 广州视源电子科技股份有限公司 Person image processing method and device, corresponding terminal and storage medium
CN111641798A (en) * 2020-06-15 2020-09-08 黑龙江科技大学 Video communication method and device
CN112684889A (en) * 2020-12-29 2021-04-20 上海掌门科技有限公司 User interaction method and device
WO2023025020A1 (en) * 2021-08-24 2023-03-02 中兴通讯股份有限公司 Video call method, user terminal, data server, computer device, and computer readable storage medium

Similar Documents

Publication Title
CN104935860A (en) Method and device for realizing video calling
US11798222B2 (en) Virtual scene switching method and apparatus, terminal device, and storage medium
CN104184760B (en) Information interaction method in communication process, client and server
CN102789313B (en) User interaction system and method
CN111556278B (en) Video processing method, video display device and storage medium
EP3324606A1 (en) Mobile terminal
CN103368816A (en) Instant communication method and system based on a virtual character
CN110446000B (en) Method and device for generating a dialogue character image
CN105468142A (en) Interaction method and system based on augmented reality technology, and terminal
US8441514B2 (en) Method and apparatus for transmitting and receiving data using mobile terminal
US9531841B2 (en) Communications method, client, and terminal
EP1221673A2 (en) Method and device for generating a person's portrait and communications terminal using the same
KR101851356B1 (en) Method for providing intelligent user interface by 3D digital actor
US10896544B2 (en) System and method for providing simulated environment
CN104780093A (en) Method and device for processing expression information in instant messaging process
CN105659325B (en) Visual media item modification based on correlation
CN111045511B (en) Gesture-based control method and terminal equipment
WO2021098338A1 (en) Model training method, media information synthesizing method, and related apparatus
US20120327091A1 (en) Gestural Messages in Social Phonebook
EP3198559A2 (en) Modifying video call data
CN109426343B (en) Collaborative training method and system based on virtual reality
CN112991381A (en) Image processing method and device, electronic equipment and storage medium
CN110544287B (en) Picture allocation processing method and electronic equipment
KR20110099414A (en) Apparatus and method for providing animation effect in portable terminal
CN109087644B (en) Electronic device, voice assistant interaction method thereof, and apparatus with storage function

Legal Events

Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned (effective date of abandoning: 20191101)
