CN108174123A - Data processing method, apparatus and system - Google Patents
- Publication number
- CN108174123A CN108174123A CN201711443989.3A CN201711443989A CN108174123A CN 108174123 A CN108174123 A CN 108174123A CN 201711443989 A CN201711443989 A CN 201711443989A CN 108174123 A CN108174123 A CN 108174123A
- Authority
- CN
- China
- Prior art keywords
- lip
- user
- data
- image
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
- H04N5/265—Mixing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
- G06V40/165—Detection; Localisation; Normalisation using facial parts and geometric relationships
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
Abstract
This application provides a data processing method, apparatus and system, including: obtaining user voice data and user text data, where the user voice data corresponds to the user text data; determining a lip image set corresponding to the user text data; adjusting the lip image set to obtain a lip image set corresponding to a facial image, and synthesizing lip video data corresponding to the facial image; and synthesizing the user voice data and the lip video data to obtain user video data. Based on the user voice data and combined with a facial image, the application can display the voice data on the facial image, achieving the effect of presenting user voice data with a facial image. This enriches the exchange modes of instant messaging applications.
Description
Technical field
This application relates to the field of communication technology, and in particular to a data processing method, apparatus and system.
Background technology
On today's increasingly developed Internet, some social applications can send messages by voice. However, a speech message has a relatively single presentation form and a poor interaction effect.
Summary of the invention
In view of this, the application provides a data processing method, apparatus and system that can enrich the exchange modes of instant messaging applications.
To achieve these goals, this application provides following technical characteristics:
A data processing method, including:
Obtaining user voice data and user text data, where the user voice data corresponds to the user text data;
Determining a lip image set corresponding to the user text data;
Adjusting the lip image set to obtain a lip image set corresponding to a facial image, and synthesizing lip video data corresponding to the facial image; and
Synthesizing the user voice data and the lip video data to obtain user video data.
Optionally, obtaining the user voice data and the user text data includes:
Obtaining the user text data in response to text data input by the user, and converting the text data into voice data to obtain the user voice data; or
Obtaining the user voice data in response to voice data input by the user, and converting the voice data into text data to obtain the user text data.
Optionally, determining the lip image set corresponding to the user text data includes:
Performing semantic analysis and word segmentation on the user text data to obtain multiple segments and multiple corresponding segment attribute information items;
Determining multiple lip images corresponding to the multiple segments respectively;
Adjusting the corresponding lip images based on the segment attribute information; and
Composing the adjusted lip images into the lip image set.
Optionally, determining the multiple lip images corresponding to the multiple segments respectively includes any of the following:
Among multiple lip images classified by final (yunmu), determining the lip image corresponding to the final of a segment;
Among multiple lip images classified by initial (shengmu) and final, determining the lip image corresponding to the initial and final of a segment; or
Inputting the initial and final into a lip image model, and obtaining the lip image output by the lip image model.
Optionally, adjusting the lip image set to obtain the lip image set corresponding to the facial image includes:
Adjusting the lip features in the facial image so that they match the lip features in the lip images; and
Determining the several adjusted facial images as the lip image set corresponding to the facial image.
Optionally, synthesizing the user voice data and the lip video data to obtain the user video data includes:
Determining coding parameters of the user voice data to obtain an encoded audio file;
Determining coding parameters of the lip video data to obtain an encoded video file; and
Performing audio-video synchronization on the encoded audio file and the encoded video file to obtain the user video data.
A data processing apparatus, including:
a data obtaining unit, configured to obtain user voice data and user text data, where the user voice data corresponds to the user text data;
an image set determining unit, configured to determine a lip image set corresponding to the user text data;
an adjustment unit, configured to adjust the lip image set to obtain a lip image set corresponding to a facial image, and synthesize lip video data corresponding to the facial image; and
a synthesis unit, configured to synthesize the user voice data and the lip video data to obtain user video data.
Optionally, the image set determining unit includes:
a segmentation unit, configured to perform semantic analysis and word segmentation on the user text data to obtain multiple segments and multiple corresponding segment attribute information items;
a lip image determining unit, configured to determine multiple lip images corresponding to the multiple segments respectively;
a lip image adjusting unit, configured to adjust the corresponding lip images based on the segment attribute information; and
a composing unit, configured to compose the adjusted lip images into the lip image set.
Optionally, the adjustment unit includes:
an adjusting subunit, configured to adjust the lip features in the facial image so that they match the lip features in the lip images; and
a determining subunit, configured to determine the several adjusted facial images as the lip image set corresponding to the facial image.
A data processing system, including:
a sending terminal, configured to determine the facial image to use and send the facial image to a server, and to send user voice data or user text data to the server;
the server, configured to receive and store the facial image; obtain the user voice data and the user text data, where the user voice data corresponds to the user text data; determine a lip image set corresponding to the user text data; adjust the lip image set to obtain a lip image set corresponding to the facial image, and synthesize lip video data corresponding to the facial image; synthesize the user voice data and the lip video data to obtain user video data; and send the user video data to a receiving terminal; and
the receiving terminal, configured to receive and display the user video data.
Through the above technical means, the following advantageous effects can be achieved:
Based on the user voice data and combined with a facial image, the application can display the voice data on the facial image, achieving the effect of presenting user voice data with a facial image. This enriches the exchange modes of instant messaging applications.
Description of the drawings
To describe the technical solutions in the embodiments of this application or in the prior art more clearly, the following briefly introduces the accompanying drawings needed for describing the embodiments or the prior art. Apparently, the accompanying drawings in the following description show merely some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from these accompanying drawings without creative efforts.
Fig. 1a is a structural diagram of a data processing system disclosed in an embodiment of this application;
Fig. 1b is a flow chart of a data processing method disclosed in an embodiment of this application;
Fig. 2 is a flow chart of a data processing method disclosed in an embodiment of this application;
Fig. 3 is a schematic diagram of some lip shapes classified by final, disclosed in an embodiment of this application;
Figs. 4a-4c are schematic diagrams of some lip shapes disclosed in an embodiment of this application;
Fig. 5 is a schematic diagram of some lip feature points disclosed in an embodiment of this application;
Fig. 6 is a structural diagram of a data processing apparatus disclosed in an embodiment of this application.
Specific embodiment
The technical solutions in the embodiments of this application are described below clearly and completely with reference to the accompanying drawings in the embodiments. Apparently, the described embodiments are merely some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative efforts shall fall within the protection scope of this application.
At present, to diversify the exchange modes in instant messaging applications, this application provides a scheme for presenting voice data as video and a scheme for presenting text data as video.
According to an embodiment of this application, a data processing method is provided. Referring to Fig. 1a, the system includes: a sending terminal 100, a server 200 and a receiving terminal 300.
The specific implementation of the data processing method is described below. Referring to Fig. 1b, it includes the following steps:
Step S101: The sending terminal 100 determines the facial image to use, and sends the facial image to the server 200.
Step S102: The sending terminal 100 sends user voice data or user text data to the server 200.
Step S103: The server 200 receives the user voice data or the user text data, and obtains both the user voice data and the user text data; the user voice data corresponds to the user text data.
When what the sending terminal 100 sends is user text data, the server 200 obtains the user text data in response to the text data input by the user, and then converts the text data into voice data to obtain the user voice data. Converting text data into voice data is a mature technology, and details are not described here.
When what the sending terminal 100 sends is user voice data, the server 200 obtains the user voice data in response to the voice data input by the user, and then converts the voice data into text data to obtain the user text data. Converting voice data into text data is also a mature technology, and details are not described here.
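The server-side dispatch of step S103 can be sketched as follows. This is a minimal illustration only: `text_to_speech` and `speech_to_text` are hypothetical placeholders standing in for whatever mature TTS/ASR engine the server actually uses.

```python
def text_to_speech(text: str) -> bytes:
    # Placeholder: a real implementation would call a TTS engine.
    return f"<audio:{text}>".encode("utf-8")

def speech_to_text(audio: bytes) -> str:
    # Placeholder: a real implementation would call an ASR engine.
    return audio.decode("utf-8").removeprefix("<audio:").removesuffix(">")

def obtain_both(payload, kind: str):
    """Given either user text data or user voice data, return the pair
    (voice_data, text_data) so that the two always correspond."""
    if kind == "text":
        return text_to_speech(payload), payload
    elif kind == "voice":
        return payload, speech_to_text(payload)
    raise ValueError(f"unknown payload kind: {kind}")
```

Whichever form the sending terminal uploads, the server ends up holding both corresponding representations.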
Step S104: The server 200 determines a lip image set corresponding to the user text data.
Referring to Fig. 2, this step specifically includes:
Step S201: Perform semantic analysis and word segmentation on the user text data to obtain multiple segments and multiple corresponding segment attribute information items.
The text data is segmented into multiple segments according to its language category. For example, take the user text data "Hello大家好": it is first determined that the text data contains two language categories, English and Chinese. The English part is segmented in the English manner, e.g., each word is one segment; the Chinese part is segmented in the Chinese manner, e.g., each Chinese character is one segment. Segmenting the user text data then yields: "Hello", "大", "家", "好".
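The bilingual segmentation described above can be sketched minimally: runs of ASCII letters form one English token, and each CJK character becomes its own token. This is a simplification for illustration; real Chinese word segmentation is more involved.

```python
def segment(text: str) -> list[str]:
    tokens, word = [], ""
    for ch in text:
        if ch.isascii() and ch.isalpha():
            word += ch                      # extend the current English word
        else:
            if word:
                tokens.append(word)         # flush the pending English word
                word = ""
            if "\u4e00" <= ch <= "\u9fff":  # CJK unified ideograph range
                tokens.append(ch)           # one Chinese character per segment
            # other characters (punctuation, spaces) are dropped
    if word:
        tokens.append(word)
    return tokens
```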
Step S202: Determine multiple lip images corresponding to the multiple segments respectively.
This step can be realized in three ways:
First realization: classification by final (yunmu).
Analysis of a large amount of lip-shape data shows that the lip shape depends mainly on the final of the segment (for example, a, ang, ao). Therefore, multiple lip-shape classes, and the lip image corresponding to each class, can be defined based on the final. Fig. 3 illustrates some lip shapes classified by final.
Thus, after segmentation, the lip image corresponding to a segment's final can be looked up. For example, for "大" (da), the final is "a", so the lip image corresponding to the final "a" is looked up.
Second realization: classification by initial (shengmu) and final.
The lip shape depends mainly on the final of a segment, but the initial also makes some difference, so the lip image can be determined jointly by the segment's initial and final.
Thus, after segmentation, the lip image corresponding to the combination of initial and final can be looked up. For example, for "大" (da), the initial is "d" and the final is "a", so the lip image corresponding to "da" is looked up.
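The initial/final split used by the first two lookup approaches can be sketched as below. The `INITIALS` list is the standard pinyin initial inventory; the lip-image table is a hypothetical mapping from final to an image identifier, since the patent does not specify storage.

```python
# Two-letter initials must be tried before their one-letter prefixes.
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]

def split_pinyin(syllable: str) -> tuple[str, str]:
    """Split a pinyin syllable into (initial, final); initial may be ''."""
    for ini in INITIALS:
        if syllable.startswith(ini):
            return ini, syllable[len(ini):]
    return "", syllable          # the syllable begins with its final

# Hypothetical final -> lip image mapping (first realization).
LIP_BY_FINAL = {"a": "lip_a.png", "ang": "lip_ang.png", "ao": "lip_ao.png"}

def lip_for(syllable: str) -> str:
    _, final = split_pinyin(syllable)
    return LIP_BY_FINAL[final]
```

The second realization would key the table on the full `(initial, final)` pair instead of the final alone.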
Third realization: determining the lip image with a lip image model.
A lip image model is trained in advance on initials and finals: an existing trainable model is trained on the initials, finals and lip-shape data of a large number of words, and the lip image model is obtained once training completes.
Therefore, the initial and final of a segment can be obtained by analysis and input into the lip image model; after the model computes, the lip image corresponding to the segment is obtained.
Figs. 4a-4c show the lip images of "大", "家" and "好" respectively.
Step S203: Adjust the corresponding lip images based on the segment attribute information.
The attribute information of a segment can include attributes such as its emotion information and volume. Taking emotion as an example, different emotions correspond to different lip images: the lip shape of saying "大家好" when the emotion is happy differs from that when the emotion is furious.
In addition, the opening of the lips grows as the volume rises and shrinks as the volume falls, so the degree of lip opening can also be adjusted based on the volume of the speech.
A large number of lip-shape samples and their attribute information can be collected in advance, and a model trained with the attribute information of the samples as input and the lip images as output. After training, the model can output the lip image corresponding to given attribute information.
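The volume-driven part of this adjustment can be sketched geometrically: scale the vertical spread of the inner-lip landmarks (n1-n8 in Fig. 5) around their centroid. The landmark layout and the linear scaling rule are illustrative assumptions, not the patent's prescribed formula.

```python
def adjust_opening(inner_lip: list[tuple[float, float]],
                   volume: float, base_volume: float = 1.0):
    """Widen or narrow the mouth opening proportionally to volume."""
    scale = volume / base_volume
    # Vertical centroid of the inner-lip landmarks.
    cy = sum(y for _, y in inner_lip) / len(inner_lip)
    # Move each point away from (or toward) the centroid vertically.
    return [(x, cy + (y - cy) * scale) for x, y in inner_lip]
```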
Step S204: Compose the adjusted lip images into the lip image set.
The above process is performed for each segment, yielding a lip image for each of the multiple segments. Segmenting the user text data produces the segments in their order of appearance in the text; following that order, the sequence of the corresponding lip images is determined, and the multiple ordered lip images are taken as the lip image set.
Returning to Fig. 1b, the method proceeds to step S105: adjust the lip image set to obtain a lip image set corresponding to the facial image, and synthesize the lip video data corresponding to the facial image.
The sending terminal uploads the facial image to the server in advance, so the server obtains the facial image corresponding to the sending terminal 100. The facial image contains a lip region.
This step is illustrated below with "大家好" as an example.
First process: obtain the lip image for "大".
Step A: Recognize the facial image and determine lip feature matrix 1.
Referring to Fig. 5, the lips have many feature points: outer lip feature points m1-m10 and inner lip feature points n1-n8. The feature points can be composed into a feature matrix in a definite way; the specific matrix construction depends on the actual algorithm and is not detailed here.
Step B: Recognize the lip image corresponding to "大" in the lip image set and determine lip feature matrix 2.
Step C: Determine transformation matrix 1 between lip feature matrix 1 and lip feature matrix 2.
Step D: Determine the product of lip feature matrix 1 and transformation matrix 1 (product 1) as facial image 1 bearing lip image 1.
Second process: obtain the lip image for "家" on the basis of the lip shape of "大".
Step A: Take the product of lip feature matrix 1 and transformation matrix 1 (product 1) as lip feature matrix 3.
Step B: Recognize the lip image corresponding to "家" in the lip image set and determine lip feature matrix 4.
Step C: Determine transformation matrix 2 between lip feature matrix 3 and lip feature matrix 4.
Step D: Determine the product of lip feature matrix 3 and transformation matrix 2 (product 2) as facial image 2 bearing lip image 2.
Third process: obtain the lip image for "好" on the basis of the lip shape of "家".
Step A: Take the product of lip feature matrix 3 and transformation matrix 2 (product 2) as lip feature matrix 5.
Step B: Recognize the lip image corresponding to "好" in the lip image set and determine lip feature matrix 6.
Step C: Determine transformation matrix 3 between lip feature matrix 5 and lip feature matrix 6.
Step D: Determine the product of lip feature matrix 5 and transformation matrix 3 (product 3) as facial image 3 bearing lip image 3.
Facial image 1 bearing lip image 1, facial image 2 bearing lip image 2 and facial image 3 bearing lip image 3 are determined as the lip image set corresponding to the facial image. The several facial images are then composed into a video, yielding the lip video data corresponding to the facial image.
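The three processes above can be sketched numerically. Assuming each lip feature matrix is an N×2 array of landmark coordinates (Fig. 5) and the transform between two matrices is a 2×2 linear map recovered by least squares so that A @ T ≈ B, the per-word chaining looks like this. The 2×2 linear model is an illustrative assumption; the patent does not fix the exact matrix construction.

```python
import numpy as np

def transform_between(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Least-squares transform T such that a @ T approximates b."""
    t, *_ = np.linalg.lstsq(a, b, rcond=None)
    return t

def chain(face_lip: np.ndarray, targets: list[np.ndarray]) -> list[np.ndarray]:
    """Walk the per-word lip matrices: each product (the adjusted lips)
    becomes the starting matrix for the next word, as in processes 1-3."""
    frames, current = [], face_lip
    for target in targets:
        current = current @ transform_between(current, target)
        frames.append(current)
    return frames
```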
Step S106: The server 200 synthesizes the user voice data and the lip video data to obtain the user video data.
The server 200 determines the coding parameters of the user voice data to obtain an encoded audio file; determines the coding parameters of the lip video data to obtain an encoded video file; and performs audio-video synchronization on the encoded audio file and the encoded video file to obtain the user video data.
For example, the lip video data can be encoded with H.264 at a frame rate of 30 frames per second, and the audio with AAC using 1 channel and a sample rate of 44100 Hz, finally synthesized into MP4 format.
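The encode-and-mux step with the parameters named above could be expressed as an ffmpeg invocation, sketched here as command construction only. The file names are placeholders, and real code would execute the list with `subprocess.run`; the patent itself does not mandate ffmpeg.

```python
def build_mux_command(audio_in: str, frames_in: str, out_mp4: str) -> list[str]:
    return [
        "ffmpeg",
        "-i", frames_in,     # lip video stream
        "-i", audio_in,      # user voice stream
        "-c:v", "libx264",   # H.264 video codec
        "-r", "30",          # 30 frames per second
        "-c:a", "aac",       # AAC audio codec
        "-ac", "1",          # 1 audio channel
        "-ar", "44100",      # 44100 Hz sample rate
        "-shortest",         # trim so audio and video lengths stay aligned
        out_mp4,             # MP4 container output
    ]
```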
Step S107: The server 200 sends the user video data to the receiving terminal 300.
Through the above technical means, the application can achieve the following advantageous effects:
Based on the user voice data and combined with a facial image, the application can display the voice data on the facial image, achieving the effect of presenting user voice data with a facial image. This enriches the exchange modes of instant messaging applications.
Referring to Fig. 6, this application provides a data processing apparatus, including:
a data obtaining unit 31, configured to obtain user voice data and user text data, where the user voice data corresponds to the user text data;
an image set determining unit 32, configured to determine a lip image set corresponding to the user text data;
an adjustment unit 33, configured to adjust the lip image set to obtain a lip image set corresponding to a facial image, and synthesize lip video data corresponding to the facial image; and
a synthesis unit 34, configured to synthesize the user voice data and the lip video data to obtain user video data.
The image set determining unit 32 includes:
a segmentation unit 321, configured to perform semantic analysis and word segmentation on the user text data to obtain multiple segments and multiple corresponding segment attribute information items;
a lip image determining unit 322, configured to determine multiple lip images corresponding to the multiple segments respectively;
a lip image adjusting unit 323, configured to adjust the corresponding lip images based on the segment attribute information; and
a composing unit 324, configured to compose the adjusted lip images into the lip image set.
The adjustment unit 33 includes:
a lip adjusting unit 331, configured to adjust the lip features in the facial image so that they match the lip features in the lip images; and
a determining unit 332, configured to determine the several adjusted facial images as the lip image set corresponding to the facial image.
For the specific content of the above solution, refer to the embodiment shown in Fig. 1b; details are not repeated here.
Through the above technical means, the application can achieve the following advantageous effects:
Based on the user voice data and combined with a facial image, the application can display the voice data on the facial image, achieving the effect of presenting user voice data with a facial image. This enriches the exchange modes of instant messaging applications.
If the functions described in the method of this embodiment are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computing-device-readable storage medium. Based on such an understanding, the part of this embodiment that contributes to the prior art, or part of the technical solution, can be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions that cause a computing device (which may be a personal computer, a server, a mobile computing device, a network device, or the like) to perform all or some of the steps of the methods in the embodiments of this application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and for identical or similar parts the embodiments may refer to one another.
The foregoing description of the disclosed embodiments enables a person skilled in the art to implement or use this application. Various modifications to these embodiments will be apparent to a person skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of this application. Therefore, this application is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A data processing method, characterized by including:
obtaining user voice data and user text data, where the user voice data corresponds to the user text data;
determining a lip image set corresponding to the user text data;
adjusting the lip image set to obtain a lip image set corresponding to a facial image, and synthesizing lip video data corresponding to the facial image; and
synthesizing the user voice data and the lip video data to obtain user video data.
2. The method according to claim 1, characterized in that obtaining the user voice data and the user text data includes:
obtaining the user text data in response to text data input by a user, and converting the text data into voice data to obtain the user voice data; or
obtaining the user voice data in response to voice data input by a user, and converting the voice data into text data to obtain the user text data.
3. The method according to claim 1, characterized in that determining the lip image set corresponding to the user text data includes:
performing semantic analysis and word segmentation on the user text data to obtain multiple segments and multiple corresponding segment attribute information items;
determining multiple lip images corresponding to the multiple segments respectively;
adjusting the corresponding lip images based on the segment attribute information; and
composing the adjusted lip images into the lip image set.
4. The method according to claim 1, characterized in that determining the multiple lip images corresponding to the multiple segments respectively includes any of the following:
among multiple lip images classified by final, determining the lip image corresponding to the final of a segment;
among multiple lip images classified by initial and final, determining the lip image corresponding to the initial and final of a segment; or
inputting the initial and final into a lip image model, and obtaining the lip image output by the lip image model.
5. The method according to claim 1, characterized in that adjusting the lip image set to obtain the lip image set corresponding to the facial image includes:
adjusting the lip features in the facial image so that they match the lip features in the lip images; and
determining the several adjusted facial images as the lip image set corresponding to the facial image.
6. The method according to claim 1, characterized in that synthesizing the user voice data and the lip video data to obtain the user video data includes:
determining coding parameters of the user voice data to obtain an encoded audio file;
determining coding parameters of the lip video data to obtain an encoded video file; and
performing audio-video synchronization on the encoded audio file and the encoded video file to obtain the user video data.
7. A data processing apparatus, characterized by including:
a data obtaining unit, configured to obtain user voice data and user text data, where the user voice data corresponds to the user text data;
an image set determining unit, configured to determine a lip image set corresponding to the user text data;
an adjustment unit, configured to adjust the lip image set to obtain a lip image set corresponding to a facial image, and synthesize lip video data corresponding to the facial image; and
a synthesis unit, configured to synthesize the user voice data and the lip video data to obtain user video data.
8. The apparatus according to claim 7, characterized in that the image set determining unit includes:
a segmentation unit, configured to perform semantic analysis and word segmentation on the user text data to obtain multiple segments and multiple corresponding segment attribute information items;
a lip image determining unit, configured to determine multiple lip images corresponding to the multiple segments respectively;
a lip image adjusting unit, configured to adjust the corresponding lip images based on the segment attribute information; and
a composing unit, configured to compose the adjusted lip images into the lip image set.
9. The apparatus according to claim 7, characterized in that the adjustment unit includes:
an adjusting subunit, configured to adjust the lip features in the facial image so that they match the lip features in the lip images; and
a determining subunit, configured to determine the several adjusted facial images as the lip image set corresponding to the facial image.
10. A data processing system, characterized by including:
a sending terminal, configured to determine the facial image to use and send the facial image to a server, and to send user voice data or user text data to the server;
the server, configured to receive and store the facial image; obtain the user voice data and the user text data, where the user voice data corresponds to the user text data; determine a lip image set corresponding to the user text data; adjust the lip image set to obtain a lip image set corresponding to the facial image, and synthesize lip video data corresponding to the facial image; synthesize the user voice data and the lip video data to obtain user video data; and send the user video data to a receiving terminal; and
the receiving terminal, configured to receive and display the user video data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711443989.3A CN108174123A (en) | 2017-12-27 | 2017-12-27 | Data processing method, apparatus and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711443989.3A CN108174123A (en) | 2017-12-27 | 2017-12-27 | Data processing method, apparatus and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108174123A true CN108174123A (en) | 2018-06-15 |
Family
ID=62518236
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711443989.3A Pending CN108174123A (en) | 2017-12-27 | 2017-12-27 | Data processing method, apparatus and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108174123A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112218080A (en) * | 2019-07-12 | 2021-01-12 | 北京新唐思创教育科技有限公司 | Image processing method, device, equipment and storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2005128177A (en) * | 2003-10-22 | 2005-05-19 | Ace:Kk | Pronunciation learning support method, learner's terminal, processing program, and recording medium with the program stored thereto |
CN101482975A (en) * | 2008-01-07 | 2009-07-15 | 丰达软件(苏州)有限公司 | Method and apparatus for converting words into animation |
CN101751692A (en) * | 2009-12-24 | 2010-06-23 | 四川大学 | Method for voice-driven lip animation |
CN104756188A (en) * | 2012-09-18 | 2015-07-01 | 金详哲 | Device and method for changing shape of lips on basis of automatic word translation |
CN106875947A (en) * | 2016-12-28 | 2017-06-20 | 北京光年无限科技有限公司 | For the speech output method and device of intelligent robot |
CN107204027A (en) * | 2016-03-16 | 2017-09-26 | 卡西欧计算机株式会社 | Image processing apparatus, display device, animation producing method and cartoon display method |
CN107330961A (en) * | 2017-07-10 | 2017-11-07 | 湖北燿影科技有限公司 | A kind of audio-visual conversion method of word and system |
- 2017-12-27: CN CN201711443989.3A patent/CN108174123A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Czyzewski et al. | An audio-visual corpus for multimodal automatic speech recognition | |
Nguyen et al. | Generative spoken dialogue language modeling | |
Cao et al. | Expressive speech-driven facial animation | |
US9060095B2 (en) | Modifying an appearance of a participant during a video conference | |
CN104732593B (en) | A kind of 3D animation editing methods based on mobile terminal | |
WO2018108013A1 (en) | Medium displaying method and terminal | |
CN109218629B (en) | Video generation method, storage medium and device | |
CN111161739B (en) | Speech recognition method and related product | |
US20110222782A1 (en) | Information processing apparatus, information processing method, and program | |
CN103024530A (en) | Intelligent television voice response system and method | |
CN104598644A (en) | User fond label mining method and device | |
US20170270701A1 (en) | Image processing device, animation display method and computer readable medium | |
CN107274903A (en) | Text handling method and device, the device for text-processing | |
CN110784662A (en) | Method, system, device and storage medium for replacing video background | |
CN111524045A (en) | Dictation method and device | |
CN106708789A (en) | Text processing method and device | |
CN108174123A (en) | Data processing method, apparatus and system | |
US20230326369A1 (en) | Method and apparatus for generating sign language video, computer device, and storage medium | |
Eyben et al. | Audiovisual vocal outburst classification in noisy acoustic conditions | |
US20220375223A1 (en) | Information generation method and apparatus | |
Liz-Lopez et al. | Generation and detection of manipulated multimodal audiovisual content: Advances, trends and open challenges | |
Deng et al. | Unsupervised audiovisual synthesis via exemplar autoencoders | |
CN111160051B (en) | Data processing method, device, electronic equipment and storage medium | |
Stappen et al. | MuSe 2020--The First International Multimodal Sentiment Analysis in Real-life Media Challenge and Workshop | |
CN114359450A (en) | Method and device for simulating virtual character speaking |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 2018-06-15