CN106777204B

CN106777204B - Picture data processing method and device and mobile terminal

Info

Publication number: CN106777204B
Application number: CN201611209247.XA
Authority: CN
Inventors: 车继红
Original assignee: Beijing Anyun Century Technology Co Ltd
Current assignee: Beijing Anyun Century Technology Co Ltd
Priority date: 2016-12-23
Filing date: 2016-12-23
Publication date: 2020-08-07
Anticipated expiration: 2036-12-23
Also published as: CN106777204A

Abstract

The invention provides a picture data processing method and device and a mobile terminal. The method comprises the following steps: acquiring first picture data to be processed, and identifying character facial feature data in the first picture data; acquiring an audio file corresponding to the character facial feature data, wherein the audio file is voice data of a character corresponding to the character facial feature data; and synthesizing second picture data according to the first picture data and the audio file, wherein the second picture data carries audio identification information of the audio file. This technical scheme allows the audio file to be synthesized into the picture data, thereby realizing the application of sound pictures in the mobile terminal. Compared with video, synthesizing an audio file into picture data is simpler and more convenient and occupies less storage space.

Description

Picture data processing method and device and mobile terminal

Technical Field

The present invention relates to the field of communications technologies, and in particular, to a method and an apparatus for processing picture data, and a mobile terminal.

Background

At present, people frequently record moments of their lives with photos or videos. Mobile phones are especially popular for this purpose because they are small, thin, and easy to carry and operate.

Photos are mainly taken to preserve memories, but because a photo carries no record of its surroundings, the memory attached to it fades over time; videos, in turn, are not simple or convenient enough to record, store, browse, or share. On the one hand, users' expectations of the mobile phone's photographing function and of varied artistic expression in photos keep rising; on the other hand, operations such as recording, storing, browsing, and sharing must remain simple and convenient. Ordinary photographing and video recording functions therefore cannot meet the diverse needs of users.

Disclosure of Invention

In view of the above problems, the present invention is proposed to provide a picture data processing method, apparatus and mobile terminal that overcome or at least partially solve the above problems.

According to an aspect of the present invention, there is provided a method for processing picture data, applied to a mobile terminal, the method including:

acquiring first picture data to be processed, and identifying character facial feature data in the first picture data;

acquiring an audio file corresponding to the character facial feature data, wherein the audio file is voice data of a character corresponding to the character facial feature data;

and synthesizing second picture data according to the first picture data and the audio file, wherein the second picture data carries audio identification information of the audio file.

Optionally, the acquiring of the first picture data to be processed includes at least one of:

acquiring the first picture data to be processed from a system gallery of the mobile terminal;

shooting the first picture data to be processed by utilizing the shooting function of the mobile terminal;

and calling a camera client installed on the mobile terminal, and shooting the first picture data to be processed by utilizing the shooting function of the camera client.

Optionally, acquiring the audio file corresponding to the character facial feature data includes at least one of:

locally selecting an audio file corresponding to the character facial feature data;

downloading an audio file corresponding to the character facial feature data from a network side;

and calling a camera client with a recording function installed on the mobile terminal, and executing recording operation by using the recording function of the camera client to obtain an audio file corresponding to the character facial feature data.

Optionally, synthesizing second picture data according to the first picture data and the audio file, including:

calling a camera client with a sound picture synthesizing function installed on the mobile terminal;

transmitting the first picture data and the audio file to the camera client;

and triggering the camera client to start the sound picture synthesis function, and synthesizing the first picture data and the audio file by using the sound picture synthesis function of the camera client to obtain the second picture data.

Optionally, synthesizing second picture data according to the first picture data and the audio file includes:

generating unique audio identification information for the audio file; or sending the audio file to a server, generating unique audio identification information for the audio file by the server, and sending the generated audio identification information to the mobile terminal;

and writing the audio identification information into the specified data of the first picture data to obtain the second picture data.

Optionally, when the first picture data includes a plurality of pieces, writing the audio identification information into the specified data of the first picture data includes:

synthesizing the plurality of pieces of first picture data according to a specified rule to obtain dynamic picture data conforming to the Graphics Interchange Format (GIF);

and writing the audio identification information into the specified data of the dynamic picture data.

Optionally, the audio identification information includes a uniform resource locator (URL) of the audio file.

Optionally, the specified data includes format data of the first picture data or the dynamic picture data.

Optionally, the method further comprises:

establishing a mapping relation between the audio file and the audio identification information;

and sending the audio file and/or the mapping relation to the server for storage, so that the mobile terminal analyzes and acquires the audio identification information after receiving the second picture data, and acquires the audio file from the server for playing according to the audio identification information.

Optionally, when a plurality of pieces of the character facial feature data are recognized, synthesizing second picture data from the first picture data and the audio file includes:

extracting sub-audio files respectively corresponding to the face characteristic data of each person from the audio files;

synthesizing the sub audio file corresponding to the face characteristic data of each person with the first picture data to obtain a plurality of sub picture data;

and combining the plurality of sub-picture data to obtain the second picture data.

Optionally, extracting, from the audio file, sub audio files respectively corresponding to the facial feature data of each person includes:

determining the character information of the characters respectively corresponding to the face feature data of each character, wherein the character information comprises at least one of the head portrait, the communication number, the name and the nickname of each corresponding character;

acquiring a corresponding relation between preset character information and audio data, and determining audio data corresponding to each character information according to the corresponding relation;

and extracting sub-audio files matched with the audio data from the audio files according to the audio data respectively corresponding to the character information to serve as the sub-audio files respectively corresponding to the face characteristic data of the characters.

Optionally, the method further comprises:

determining communication information of a person corresponding to the person face feature data, wherein the communication information comprises at least one of the communication number and an instant communication account number bound with the communication number;

and sending the second picture data to the corresponding person according to the communication information.

Optionally, the method further comprises:

outputting prompt information for prompting the user that the second picture data to be sent is a sound picture.

Optionally, outputting the prompt information for prompting the user that the second picture data to be sent is a sound picture includes:

adding, on the second picture data, an identifier indicating that the second picture data carries an audio file;

or,

outputting the prompt information by voice.

According to another aspect of the present invention, there is provided an apparatus for processing picture data, which is disposed in a mobile terminal, the apparatus including:

the first acquisition module is suitable for acquiring first picture data to be processed and identifying character facial feature data in the first picture data;

the second acquisition module is suitable for acquiring an audio file corresponding to the character facial feature data, and the audio file is voice data of a character corresponding to the character facial feature data;

and the synthesis module is suitable for synthesizing second picture data according to the first picture data and the audio file, wherein the second picture data carries the audio identification information of the audio file.

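Optionally, the first acquisition module is further adapted to perform at least one of:

acquiring the first picture data to be processed from a system gallery of the mobile terminal;

shooting the first picture data to be processed by utilizing the shooting function of the mobile terminal;

and calling a camera client installed on the mobile terminal, and shooting the first picture data to be processed by utilizing the shooting function of the camera client.

Optionally, the second acquisition module is further adapted to perform at least one of:

locally selecting an audio file corresponding to the character facial feature data;

downloading an audio file corresponding to the character facial feature data from a network side;

and calling a camera client with a recording function installed on the mobile terminal, and executing recording operation by using the recording function of the camera client to obtain an audio file corresponding to the character facial feature data.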

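Optionally, the synthesis module is further adapted to:

calling a camera client with a sound picture synthesizing function installed on the mobile terminal;

transmitting the first picture data and the audio file to the camera client;

and triggering the camera client to start the sound picture synthesis function, and synthesizing the first picture data and the audio file by using the sound picture synthesis function of the camera client to obtain the second picture data.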

Optionally, the synthesis module is further adapted to:

when the first picture data comprises a plurality of pieces of first picture data, synthesizing the plurality of pieces of first picture data according to a specified rule to obtain dynamic picture data conforming to the Graphics Interchange Format (GIF);

and writing the audio identification information into the specified data of the dynamic picture data.

Optionally, the apparatus further comprises:

the establishing module is suitable for establishing a mapping relation between the audio file and the audio identification information;

and the first sending module is suitable for sending the audio file and/or the mapping relation to the server for storage, so that the mobile terminal analyzes and obtains the audio identification information after receiving the second picture data, and obtains the audio file from the server for playing according to the audio identification information.

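Optionally, the synthesis module is further adapted to:

when a plurality of pieces of the character facial feature data are recognized, extracting sub audio files corresponding to the respective character facial feature data from the audio file;

synthesizing the sub audio file corresponding to each piece of character facial feature data with the first picture data to obtain a plurality of sub picture data;

and combining the plurality of sub picture data to obtain the second picture data.

Optionally, the synthesis module is further adapted to:

determining the character information of the characters respectively corresponding to each piece of character facial feature data, wherein the character information comprises at least one of the head portrait, the communication number, the name and the nickname of each corresponding character;

acquiring a corresponding relation between preset character information and audio data, and determining audio data corresponding to each character information according to the corresponding relation;

and extracting sub audio files matched with the audio data from the audio file according to the audio data respectively corresponding to the character information, to serve as the sub audio files respectively corresponding to each piece of character facial feature data.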

Optionally, the apparatus further comprises:

the determining module is suitable for determining communication information of a person corresponding to the person facial feature data, and the communication information comprises at least one of the communication number and an instant communication account number bound with the communication number;

and the second sending module is suitable for sending the second picture data to the corresponding person according to the communication information.

Optionally, the apparatus further comprises:

and the output module is suitable for outputting prompt information for prompting that the second picture data to be sent by the user is an audio picture.

Optionally, the output module is further adapted to:

adding, on the second picture data, an identifier indicating that the second picture data carries an audio file;

or,

outputting the prompt information by voice.

According to another aspect of the present invention, there is provided a mobile terminal including a processor and a memory, the memory storing a program for executing the above-described picture data processing method, the processor being configured to execute the program stored in the memory.

By adopting the technical scheme provided by the embodiment of the invention, the character facial feature data in the first picture data to be processed can be identified, the audio file corresponding to the character facial feature data (namely, the voice data of the corresponding character) is obtained, and the second picture data is synthesized according to the first picture data and the audio file. The audio file can thus be synthesized into the picture data, which realizes the application of sound pictures in the mobile terminal. In particular, when the picture data is a photo, the user can synthesize the audio file into the photo, so that, while storing the photo, the user can also record through the audio file information that the photo itself cannot capture, which makes taking photos more interesting. In addition, compared with video, synthesizing the audio file into the picture data is simpler and more convenient and occupies less storage space.

The foregoing description is only an overview of the technical solutions of the present invention. So that the technical means of the present invention can be understood more clearly, and so that the above and other objects, features, and advantages of the present invention are more readily apparent, specific embodiments of the present invention are described below.

The above and other objects, advantages and features of the present invention will become more apparent to those skilled in the art from the following detailed description of specific embodiments thereof, taken in conjunction with the accompanying drawings.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:

Fig. 1 is a schematic flow chart of a picture data processing method according to an embodiment of the present invention;

Fig. 2 is a schematic flow chart of a picture data processing method according to a first embodiment of the present invention;

Fig. 3 is a schematic block diagram of a picture data processing apparatus according to an embodiment of the present invention;

Fig. 4 is a schematic block diagram of a picture data processing apparatus according to another embodiment of the present invention;

Fig. 5 is a block diagram illustrating a partial structure of a mobile phone related to a mobile terminal according to an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

Fig. 1 is a schematic flow chart of a picture data processing method according to an embodiment of the present invention. As shown in fig. 1, the method is applied to a mobile terminal, which may generally include the following steps S101-S103:

Step S101, obtaining first picture data to be processed, and identifying character facial feature data in the first picture data.

Step S102, obtaining an audio file corresponding to the character facial feature data, wherein the audio file is the voice data of the character corresponding to the character facial feature data.

Step S103, synthesizing second picture data according to the first picture data and the audio file, wherein the second picture data carries audio identification information of the audio file.
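Steps S101 to S103 can be pictured as a three-stage pipeline. The following is a minimal Python sketch under stated assumptions, not the implementation of the invention: the names `SoundPicture`, `process_picture`, `detect_faces`, `fetch_audio`, and `make_audio_id` are hypothetical, and the face recognizer, the audio source, and the identifier generator are passed in as plain callables because the text leaves their internals open.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SoundPicture:
    """Second picture data: the original picture plus audio identification info."""
    picture: bytes
    audio_ids: List[str]


def process_picture(
    picture: bytes,
    detect_faces: Callable[[bytes], List[str]],
    fetch_audio: Callable[[str], bytes],
    make_audio_id: Callable[[bytes], str],
) -> SoundPicture:
    # S101: identify character facial feature data in the first picture data.
    faces = detect_faces(picture)
    # S102: acquire the audio file (voice data) corresponding to each face.
    audio_files = [fetch_audio(face) for face in faces]
    # S103: synthesize second picture data carrying the audio identification info.
    return SoundPicture(picture, [make_audio_id(audio) for audio in audio_files])
```

Keeping the three stages as injected callables mirrors the way the description allows each step to be carried out in several alternative modes.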

In this embodiment, the first picture data may be a photograph.

The following describes the above steps S101 to S103 in detail.

Step S101 is executed first: the first picture data to be processed is acquired, and the character facial feature data in the first picture data is identified. The first picture data to be processed can be acquired in any one of the following modes:

Mode one: the first picture data to be processed is acquired from a system gallery of the mobile terminal. When the system gallery includes a system photo album, any one or more photos can be taken from the system photo album of the mobile terminal as the first picture data to be processed.

Mode two: the first picture data to be processed is shot by using the shooting function of the mobile terminal. The shooting function may be that of the system camera of the mobile terminal, or that of a third-party application with a shooting function installed on the mobile terminal, such as "beautiful show" or "free-hand shooting".

Mode three: a camera client installed on the mobile terminal is called up, and the first picture data to be processed is shot by using the shooting function of the camera client.

In addition, in this step, image recognition technology can be used to recognize the character facial feature data in the first picture data. Image recognition is prior art and is not described further here. The first picture data may include one or more pieces of character facial feature data.

After the character facial feature data in the first picture data is identified, step S102 is executed to acquire an audio file corresponding to the character facial feature data, where the audio file is voice data of the character corresponding to the character facial feature data. The audio file can be acquired in any one of the following modes:

Mode one: an audio file corresponding to the character facial feature data is selected locally. For example, a song, an episode, or a pre-recorded audio file is selected locally as the audio file corresponding to the character facial feature data.

Mode two: an audio file corresponding to the character facial feature data is downloaded from the network side. For example, a song or an episode is downloaded from the network side as the audio file corresponding to the character facial feature data.

Mode three: a camera client with a recording function installed on the mobile terminal is called up, and a recording operation is executed by using the recording function of the camera client to obtain the audio file corresponding to the character facial feature data.

Mode four: the recording function of the mobile terminal is started, and a recording operation is executed by using the recording function of the mobile terminal to obtain the audio file corresponding to the character facial feature data.

In an embodiment, when a plurality of pieces of character facial feature data are recognized, an audio file corresponding to each piece of character facial feature data may also be acquired, in any one of the following modes:

Mode one: the audio files corresponding to the respective pieces of character facial feature data are selected locally. For example, one audio file is selected locally as the audio file corresponding to the first character facial feature data, and another audio file is selected locally as the audio file corresponding to the second character facial feature data.

Mode two: the audio files corresponding to the respective pieces of character facial feature data are downloaded from the network side. For example, one audio file is downloaded from the network side as the audio file corresponding to the first character facial feature data, and another audio file is downloaded from the network side as the audio file corresponding to the second character facial feature data.

Mode three: a camera client with a recording function installed on the mobile terminal is called up, and multiple recording operations are executed by using the recording function of the camera client to obtain the audio files corresponding to the respective pieces of character facial feature data. For example, a first recording file is recorded with the recording function of the camera client as the audio file corresponding to the first character facial feature data, and a second recording file is then recorded with the same function as the audio file corresponding to the second character facial feature data.

Mode four: the recording function of the mobile terminal is started, and recording operations are executed by using the recording function of the mobile terminal to obtain the audio files corresponding to the respective pieces of character facial feature data. For example, a third recording file is recorded with the recording function of the mobile terminal as the audio file corresponding to the first character facial feature data, and a fourth recording file is then recorded with the same function as the audio file corresponding to the second character facial feature data.

It should be noted that, when a plurality of pieces of character facial feature data are recognized, the audio files corresponding to the respective pieces may be acquired in the same mode or in different modes. For example, an audio file is selected locally as the audio file corresponding to the first character facial feature data, and the recording function of the mobile terminal is then started to record a recording file as the audio file corresponding to the second character facial feature data.
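The point that each face may obtain its audio file through a different mode can be illustrated with a small dispatch table. This is only a sketch under stated assumptions: the `AudioSource` enumeration, the `acquire_audio_per_face` function, and the provider callables are hypothetical names introduced here, and real selection, download, or recording logic is replaced by whatever callables the caller supplies.

```python
from enum import Enum
from typing import Callable, Dict


class AudioSource(Enum):
    LOCAL = "local"                  # mode one: select a local file
    NETWORK = "network"              # mode two: download from the network side
    CAMERA_CLIENT = "camera_client"  # mode three: record with the camera client
    TERMINAL = "terminal"            # mode four: record with the terminal itself


def acquire_audio_per_face(
    plan: Dict[str, AudioSource],
    providers: Dict[AudioSource, Callable[[str], bytes]],
) -> Dict[str, bytes]:
    """Return one audio file per face, each obtained through its own mode."""
    return {face: providers[source](face) for face, source in plan.items()}
```

A plan such as `{"first": AudioSource.LOCAL, "second": AudioSource.TERMINAL}` corresponds to the mixed example in the preceding paragraph.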

After the audio file corresponding to the character facial feature data is obtained, step S103 is executed to synthesize second picture data according to the first picture data and the audio file. The second picture data carries audio identification information of the audio file, and the audio identification information may include at least one of a uniform resource locator (URL) and number information of the audio file.

In one embodiment, the second picture data may be synthesized as follows. First, a camera client with a sound picture synthesizing function installed on the mobile terminal is called up, for example a talking camera client. Second, the first picture data and the audio file are transmitted to the camera client. Third, the camera client is triggered to start the sound picture synthesizing function, and the first picture data and the audio file are synthesized by using that function to obtain the second picture data. For example, the talking camera client is triggered to start its sound picture synthesizing function and uses it to synthesize the first picture data and the audio file, obtaining the second picture data.

In another embodiment, when the first picture data is acquired by using the camera client having the sound picture synthesizing function, and the audio file corresponding to the character facial feature data in the first picture data is recorded by using the recording function of that camera client, the first picture data and the audio file do not need to be transmitted to the camera client; it is only necessary to start the sound picture synthesizing function of the camera client to synthesize the first picture data and the audio file directly.

In another embodiment, the second picture data can be synthesized by firstly generating unique audio identification information for the audio file, and secondly writing the audio identification information into the specified data of the first picture data to obtain the second picture data.

For example, if the format data of a certain first picture data is "123.jpg" and the audio identification information of the corresponding audio file is the URL "http://www.11111", the URL of the audio file can be written into the last position of the format data of the first picture data when the first picture data and the audio file are synthesized, yielding the second picture data "123.jpg.http://www.11111".

When the first picture data includes a plurality of pieces, the audio identification information may be written into the specified data of the first picture data as follows. First, the plurality of pieces of first picture data are synthesized according to a specified rule to obtain dynamic picture data conforming to the Graphics Interchange Format (GIF). Second, the audio identification information is written into the specified data of the dynamic picture data. The specified rule may be, for example, synthesizing the pieces in the order in which they were acquired, in the order in which they are arranged, or in order of data size from large to small (or from small to large).
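The specified rules listed for ordering several pictures before GIF synthesis can be sketched as a sort step. The `Frame` type, its `acquired_at` and `size` fields, the rule names, and the `order_frames` function are all assumptions made for illustration; the GIF encoding itself is deliberately left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    name: str
    acquired_at: float  # acquisition timestamp in seconds (assumed field)
    size: int           # data size in bytes (assumed field)


def order_frames(frames: List[Frame], rule: str) -> List[Frame]:
    """Order frames for GIF synthesis according to a specified rule."""
    if rule == "acquisition":
        return sorted(frames, key=lambda f: f.acquired_at)
    if rule == "arrangement":
        return list(frames)  # keep the order in which the pictures are arranged
    if rule == "size_desc":
        return sorted(frames, key=lambda f: f.size, reverse=True)
    if rule == "size_asc":
        return sorted(frames, key=lambda f: f.size)
    raise ValueError(f"unknown rule: {rule}")
```

The ordered list would then be handed to whatever GIF encoder the terminal uses, after which the audio identification information is written into the resulting dynamic picture data.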

Of course, the audio identification information may also be written at positions in the specified data other than the last position described above, such as an initial position or a preset middle position.
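The "123.jpg" example and the alternative write positions can be sketched as a single string operation. This is an illustration only: the function name `write_audio_id`, the position labels, the "." separator, and in particular the choice of "just before the extension" as the preset middle position are assumptions, since the text does not define where the middle position lies.

```python
def write_audio_id(specified_data: str, audio_id: str, position: str = "last") -> str:
    """Write audio identification info into the specified data at a chosen position."""
    if position == "last":
        return f"{specified_data}.{audio_id}"
    if position == "initial":
        return f"{audio_id}.{specified_data}"
    if position == "middle":
        # Assumed convention: the preset middle position is just before the extension.
        stem, dot, ext = specified_data.rpartition(".")
        return f"{stem}.{audio_id}{dot}{ext}" if dot else f"{specified_data}.{audio_id}"
    raise ValueError(f"unknown position: {position}")


print(write_audio_id("123.jpg", "http://www.11111"))  # 123.jpg.http://www.11111
```

A receiving terminal would need to know which position was used in order to parse the identifier back out, so in practice the position would be fixed by agreement between sender and receiver.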

As described above, the mobile terminal can identify one or more pieces of character facial feature data from the first picture data, and when a plurality of pieces of character facial feature data are identified, the second picture data can be synthesized by using the above-mentioned synthesis method, that is, the plurality of pieces of character facial feature data correspond to the same audio file, and the first picture data and the same audio file are synthesized to obtain the second picture data.

Further, the second picture data may be synthesized by: firstly, extracting sub-audio files respectively corresponding to face characteristic data of each person from the audio files; secondly, synthesizing the sub audio file and the first picture data respectively corresponding to the face characteristic data of each person to obtain a plurality of sub picture data; and combining the plurality of sub-picture data to obtain second picture data.

Specifically, the sub-audio files corresponding to the facial feature data of each person can be extracted from the audio file according to the following steps: firstly, determining the character information of the characters corresponding to the face characteristic data of each character respectively, wherein the character information comprises at least one of the head portrait, the communication number, the name and the nickname of each corresponding character; secondly, acquiring a corresponding relation between preset character information and audio data, and determining audio data corresponding to each character information according to the corresponding relation; and thirdly, extracting sub-audio files matched with the audio data from the audio files according to the audio data respectively corresponding to the information of each person, and taking the sub-audio files as the sub-audio files respectively corresponding to the face characteristic data of each person.

For example, the first picture data includes person A and person B, so two pieces of character facial feature data (that of person A and that of person B) can be recognized. The audio file contains sub audio files corresponding to person A and person B, and the correspondence between character information and audio data has been stored in advance. The audio data corresponding to person A and the audio data corresponding to person B can therefore be determined from that correspondence, and the sub audio files matching each of them are then extracted from the audio file.
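The chain from face, to character information, to audio data, to matching sub audio files can be sketched with plain dictionaries. Everything named here is hypothetical: `extract_sub_audio`, the `Segment` tuple, and the voice labels are illustrative, and the sketch assumes that some upstream step has already labelled each segment of the audio file with the voice it matches, which the text does not describe.

```python
from typing import Dict, List, Tuple

# A segment is (voice_label, start_second, end_second); the labels are assumed to
# come from an upstream voice-matching step that this sketch does not implement.
Segment = Tuple[str, float, float]


def extract_sub_audio(
    faces_to_person: Dict[str, str],
    person_to_voice: Dict[str, str],
    segments: List[Segment],
) -> Dict[str, List[Segment]]:
    """Map each recognized face to the audio segments that match its voice."""
    result: Dict[str, List[Segment]] = {}
    for face, person in faces_to_person.items():
        voice = person_to_voice[person]  # preset character info -> audio data
        result[face] = [seg for seg in segments if seg[0] == voice]
    return result
```

Each entry of the returned dictionary plays the role of one sub audio file, which would then be synthesized with the first picture data to form one piece of sub picture data.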

In one embodiment, the above method further comprises the steps of: firstly, establishing a mapping relation between an audio file and audio identification information; and secondly, sending the audio file and/or the mapping relation between the audio file and the audio identification information to a server for storage, so that the mobile terminal analyzes and obtains the audio identification information after receiving the second picture data, and obtains the audio file from the server according to the audio identification information for playing. For example, the server stores audio files and mapping relationships between the audio files and the audio identification information, when the mobile terminal receives second picture data (i.e., a sound picture), the audio identification information carried in the second picture data is analyzed, the audio identification information is sent to the server, the server queries and obtains the audio files corresponding to the audio identification information according to the mapping relationships between the audio files and the audio identification information, and the obtained audio files are sent to the mobile terminal, so that the mobile terminal can play the audio files when viewing the second picture data.
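The server-side mapping between audio identification information and audio files can be sketched as a small in-memory store. This is a stand-in under stated assumptions, not the server described by the invention: the class name `AudioServer`, its `register` and `fetch` methods, and the use of a random UUID as the unique identifier are all choices made here for illustration.

```python
import uuid
from typing import Dict


class AudioServer:
    """In-memory stand-in for the server that stores audio files and the mapping."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def register(self, audio: bytes) -> str:
        # Generate unique audio identification info and record the mapping.
        audio_id = uuid.uuid4().hex
        self._store[audio_id] = audio
        return audio_id

    def fetch(self, audio_id: str) -> bytes:
        # Look up the audio file for an id parsed out of the second picture data.
        return self._store[audio_id]
```

The sending terminal would call `register` and write the returned identifier into the picture; the receiving terminal would parse the identifier and call `fetch` to obtain the audio file for playback.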

In one embodiment, the above method further comprises the steps of: firstly, determining communication information of a person corresponding to the face characteristic data of the person, wherein the communication information comprises at least one of a communication number and an instant communication account number bound with the communication number; and secondly, sending the second picture data to the corresponding person according to the communication information. When the communication information of the person corresponding to the person face feature data is determined, the person face feature data can be matched with the contact person head portrait in the address book of the mobile terminal and/or the contact person head portrait in the third-party communication application installed in the mobile terminal, and the corresponding communication information is determined according to the matched contact person head portrait.

For example, the first picture data includes person A. After the character facial feature data of person A is recognized, it is matched against the contact head portraits in the address book of the mobile terminal and/or the contact head portraits in a third-party communication application installed on the mobile terminal, and the communication information of person A is determined from the matched contact head portrait. The second picture data is then sent to person A, so that person A can also share the sound picture in which he or she appears, which increases the enjoyment of interaction between users.
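Matching facial feature data against contact head portraits can be sketched as a nearest-match search. The text does not say how the match is computed, so the representation of features as numeric vectors, the cosine similarity measure, the 0.9 threshold, and the names `cosine` and `find_contact` are all assumptions introduced for this example.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_contact(
    face: Vector,
    contacts: Dict[str, Tuple[Vector, str]],
    threshold: float = 0.9,
) -> Optional[str]:
    """Return the communication number of the best-matching head portrait, if any."""
    best_number, best_score = None, threshold
    for _name, (portrait, number) in contacts.items():
        score = cosine(face, portrait)
        if score >= best_score:
            best_number, best_score = number, score
    return best_number
```

Returning `None` when no portrait clears the threshold leaves room for the fallback described next, in which the user picks any contact or friend as the recipient.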

Of course, the audio picture (i.e., the second picture data) is not limited to be sent to the person included in the picture, and may also be sent to any other user, for example, after the audio picture is synthesized, the audio picture is sent to any contact in the address list of the mobile terminal, or sent to any friend in a third-party communication application installed on the mobile terminal.

When sending the second picture data, the method further comprises the following step: outputting prompt information for prompting the user that the second picture data to be sent is a sound picture. Specifically, the prompt information may be output in either of the following modes. Mode one: adding, on the second picture data, an identifier indicating that the second picture data carries an audio file; for example, a musical note identifier is added at any position on the second picture data to indicate that it carries an audio file. Mode two: outputting the prompt information by voice.

The following describes a method for processing picture data according to an embodiment of the present invention.

Example one

Fig. 2 is a schematic flowchart of a method for processing picture data according to a first embodiment of the present invention. In the first embodiment, the first picture data to be processed is a photo, and the method is applied to a mobile phone. As shown in fig. 2, the method comprises the following steps S201-S207:

Step S201, starting the shooting function of the mobile phone to take a photo to be processed.

Step S202, recognizing the face feature data of the person in the photo.

Step S203, the recording function of the mobile phone is started to record a corresponding audio file for the face feature data of the person.

Step S204, sending the audio file to a server, where the server generates unique audio identification information for the audio file; the audio identification information may be at least one of a uniform resource locator (URL) and number information of the audio file.

Step S205, acquiring audio identification information corresponding to the audio file from the server.

Step S206, writing the audio identification information into the preset position of the format data of the photo to synthesize the photo and the audio file to obtain the audio photo. The preset position may be any position in the format data, such as an initial position, a specified middle position, or an end position in the format data.

Step S207, selecting at least one contact from the mobile phone address book and sending the sound photo to the selected contact or contacts.

The technical solution of the first embodiment is described below using a specific scene as an example. First, the user takes a photo of the child "wangming" using the shooting function of the mobile phone, the format data of the photo being "123.gif"; the mobile phone recognizes the face feature data of "wangming" in the photo and starts its recording function. Second, the user records an audio file using the recording function of the mobile phone, for example a remark that the child behaved well today and played on a wooden horse, and the number information of the audio file is "0000". Then, the mobile phone writes the number information "0000" of the audio file at the end position of the format data "123.gif" of the photo, obtaining a sound photo named "123.gif.0000" whose format is still a picture format. Finally, the user can send the sound photo to family members, so that they can also view the sound photo taken of "wangming".
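The end-position case from this scene can be sketched as follows, working on the raw bytes of the picture. The `MARKER` delimiter and both function names are assumptions introduced only so that the identifier can be found again; the embodiment does not specify how the written information is delimited.

```python
from typing import Optional

# Hypothetical delimiter placed before the identifier; not specified in the text.
MARKER = b"\x00AUDIOID:"


def embed_audio_id(picture: bytes, audio_id: str) -> bytes:
    """Write the audio identification information at the end position."""
    return picture + MARKER + audio_id.encode("utf-8")


def extract_audio_id(sound_picture: bytes) -> Optional[str]:
    """Parse the identifier back out, or return None for an ordinary picture."""
    _, found, tail = sound_picture.rpartition(MARKER)
    if not found:
        return None
    return tail.decode("utf-8")


# Stand-in bytes for the photo; a real photo would be read from a file.
photo = b"GIF89a" + b"\x01\x02\x03"
sound_photo = embed_audio_id(photo, "0000")
parsed_id = extract_audio_id(sound_photo)
```

Because the original picture bytes are left untouched at the start of the file, the result can still be handled as a picture by software that ignores trailing data, although that behaviour depends on the decoder and is not guaranteed.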

Therefore, by adopting the technical scheme in the first embodiment of the invention, the face feature data of the person in the photo to be processed can be identified, the audio file corresponding to the face feature data of the person (namely the voice data of the person corresponding to the face feature data of the person) can be obtained, and the audio photo can be synthesized according to the photo and the audio file, so that the audio file can be synthesized into the photo, the application of the audio photo in a mobile phone is realized, a user can record some information which cannot be recorded in the photo by synthesizing the audio file in the photo while storing the photo, and the interest of the user in taking the photo is increased. In addition, the mode of synthesizing the audio file in the photo is simpler and more convenient compared with the video, and the occupied storage space is small.

Fig. 3 is a schematic block diagram of a picture data processing apparatus according to an embodiment of the present invention. As shown in fig. 3, the apparatus is provided in a mobile terminal, and includes:

the first obtaining module 310 is adapted to obtain first picture data to be processed, and identify character facial feature data in the first picture data;

the second obtaining module 320, coupled to the first obtaining module 310, is adapted to obtain an audio file corresponding to the facial feature data of the person, where the audio file is voice data of the person corresponding to the facial feature data of the person;

the synthesizing module 330 is coupled to the second obtaining module 320 and adapted to synthesize second picture data according to the first picture data and the audio file, where the second picture data carries audio identification information of the audio file.

In one embodiment, the first obtaining module is further adapted to at least one of:

acquiring first picture data to be processed from a system gallery of the mobile terminal;

shooting first picture data to be processed by utilizing a shooting function of the mobile terminal;

In one embodiment, the second obtaining module is further adapted to at least one of:

selecting an audio file corresponding to the character facial feature data from the local;

In one embodiment, the synthesis module 330 is further adapted to:

transmitting the first picture data and the audio file to a camera client;

and triggering the camera client to start the sound picture synthesis function, and synthesizing the first picture data and the audio file by using the sound picture synthesis function of the camera client to obtain second picture data.

In one embodiment, the synthesis module 330 is further adapted to:

generating unique audio identification information for the audio file; or, the audio file is sent to a server, the server generates unique audio identification information for the audio file, and the generated audio identification information is sent to the mobile terminal;

and writing the audio identification information into the specified data of the first picture data to obtain second picture data.

In one embodiment, the synthesis module 330 is further adapted to:

when the first picture data comprises a plurality of pieces, synthesizing the plurality of pieces of first picture data according to a specified rule to obtain dynamic picture data conforming to the image interchange format GIF;

In one embodiment, the audio identification information includes a uniform resource locator (URL) of the audio file.

In one embodiment, the designation data includes format data of the first picture data or the moving picture data.

In one embodiment, as shown in fig. 4, the apparatus further comprises:

an establishing module 340, coupled to the synthesizing module 330, adapted to establish a mapping relationship between the audio file and the audio identification information;

the first sending module 350 is coupled to the establishing module 340, and is adapted to send the audio file and/or the mapping relationship to the server for storage, so that the mobile terminal obtains the audio identification information after receiving the second picture data, and obtains the audio file from the server according to the audio identification information for playing.

In one embodiment, the synthesis module 330 is further adapted to:

when a plurality of character facial feature data are identified, extracting sub-audio files corresponding to the character facial feature data from the audio files;

synthesizing the sub audio file and the first picture data respectively corresponding to the face characteristic data of each person to obtain a plurality of sub picture data;

and combining the plurality of sub-picture data to obtain second picture data.

In one embodiment, the synthesis module 330 is further adapted to:

determining the character information of the characters corresponding to the face feature data of each character respectively, wherein the character information comprises at least one item of head portrait, communication number, name and nickname of each corresponding character;

and extracting sub-audio files matched with the audio data from the audio files according to the audio data respectively corresponding to the information of each person, and taking the sub-audio files as the sub-audio files respectively corresponding to the face characteristic data of each person.

In one embodiment, the above apparatus further comprises:

the determining module is coupled with the synthesizing module 330 and is adapted to determine communication information of a person corresponding to the person facial feature data, wherein the communication information includes at least one of a communication number and an instant communication account number bound with the communication number;

and the second sending module is coupled with the determining module and is suitable for sending the second picture data to the corresponding person according to the communication information.

In one embodiment, the above apparatus further comprises:

and the output module is coupled with the second sending module and is suitable for outputting prompt information for prompting the user that the second picture data to be sent is a sound picture.

In one embodiment, the output module is further adapted to:

or,

and outputting prompt information by voice.

By adopting the device provided by the embodiment of the invention, the face characteristic data of the person in the first picture data to be processed can be identified, the audio file corresponding to the face characteristic data of the person (namely the voice data of the person corresponding to the face characteristic data of the person) is obtained, and the second picture data is synthesized according to the first picture data and the audio file, so that the audio file can be synthesized into the picture data, the application of the sound picture in the mobile terminal is realized, particularly when the picture data is a picture, the technical scheme enables a user to synthesize the audio file into the picture, further enables the user to record some information which cannot be recorded in the picture through the audio file synthesized into the picture while storing the picture, and increases the pleasure of the user in taking the picture. In addition, compared with video, the method for synthesizing the audio file in the picture data is simpler and more convenient, and the occupied storage space is small.

It should be understood by those skilled in the art that the apparatus for processing picture data in fig. 3 and fig. 4 can be used to implement the above-mentioned scheme for processing picture data, and the detailed description thereof should be similar to the above-mentioned method, and is not repeated herein to avoid complexity.

For convenience of description, Fig. 5 shows only the portion related to the embodiment of the present invention; for technical details that are not disclosed, please refer to the method portion of the embodiments of the present invention. The mobile terminal may be any terminal device, including a mobile phone, a tablet computer, a PDA (Personal Digital Assistant), a POS (Point of Sales) terminal, a vehicle-mounted computer, and the like. The following takes a mobile phone as an example:

Fig. 5 is a block diagram illustrating a partial structure of a mobile phone related to the mobile terminal according to an embodiment of the present invention. Referring to fig. 5, the mobile phone includes: a radio frequency (RF) circuit 510, a memory 520, an input unit 530, a display unit 540, a sensor 550, an audio circuit 560, a wireless fidelity (WiFi) module 570, a processor 580, and a power supply 590. Those skilled in the art will appreciate that the mobile phone structure shown in fig. 5 is not limiting; the mobile phone may include more or fewer components than shown, combine some components, or arrange the components differently.

The following describes each component of the mobile phone in detail with reference to fig. 5:

The RF circuit 510 may be used for receiving and transmitting signals during message transmission or a call; in particular, it receives downlink information from a base station and processes it, and transmits uplink data to the base station. The RF circuit 510 may include, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like. Furthermore, the RF circuit 510 may communicate with networks and other devices through wireless communication, which may use any communication standard or protocol, including but not limited to Global System for Mobile communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), Short Messaging Service (SMS), and the like.

The memory 520 may be used to store software programs and modules, and the processor 580 executes various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 520. The memory 520 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to the use of the mobile phone (such as audio data, a phonebook, etc.), and the like. Further, the memory 520 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.

The input unit 530 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the mobile phone. Specifically, the input unit 530 may include a touch panel 531 and other input devices 532. The touch panel 531, also called a touch screen, can collect touch operations of a user on or near it (for example, operations performed on or near the touch panel 531 with any suitable object or accessory such as a finger or a stylus) and drive the corresponding connection device according to a preset program. Optionally, the touch panel 531 may include two parts, a touch detection device and a touch controller. The touch detection device detects the touch position of the user, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device, converts it into touch point coordinates, sends the coordinates to the processor 580, and can receive and execute commands sent by the processor 580. In addition, the touch panel 531 may be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch panel 531, the input unit 530 may include other input devices 532. In particular, the other input devices 532 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, a joystick, and the like.

The display unit 540 may include a display panel 541. Optionally, the display panel 541 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, and the like. Further, the touch panel 531 may cover the display panel 541; when the touch panel 531 detects a touch operation on or near it, it transfers the touch operation to the processor 580 to determine the type of touch event, and the processor 580 then provides a corresponding visual output on the display panel 541 according to the type of touch event.

The handset may also include at least one sensor 550, such as a light sensor, motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor, wherein the ambient light sensor may adjust the brightness of the display panel 541 according to the brightness of ambient light, and the proximity sensor may turn off the display panel 541 and/or the backlight when the mobile phone is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally, three axes), can detect the magnitude and direction of gravity when stationary, and can be used for applications of recognizing the posture of a mobile phone (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), vibration recognition related functions (such as pedometer and tapping), and the like; as for other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which can be configured on the mobile phone, further description is omitted here.

The audio circuit 560, the speaker 561, and the microphone 562 may provide an audio interface between the user and the mobile phone. On the one hand, the audio circuit 560 may convert received audio data into an electrical signal and transmit it to the speaker 561, which converts it into a sound signal for output; on the other hand, the microphone 562 converts collected sound signals into electrical signals, which the audio circuit 560 receives and converts into audio data. The audio data is then output to the processor 580 for processing and afterwards sent through the RF circuit 510 to, for example, another mobile phone, or output to the memory 520 for further processing.

WiFi is a short-range wireless transmission technology. Through the WiFi module 570, the mobile phone can help the user send and receive e-mail, browse web pages, access streaming media, and so on, providing the user with wireless broadband Internet access. Although fig. 5 shows the WiFi module 570, it is understood that the module is not an essential part of the mobile phone and can be omitted as needed without changing the essence of the invention.

The processor 580 is a control center of the mobile phone, connects various parts of the entire mobile phone by using various interfaces and lines, and performs various functions of the mobile phone and processes data by operating or executing software programs and/or modules stored in the memory 520 and calling data stored in the memory 520, thereby performing overall monitoring of the mobile phone. Alternatively, processor 580 may include one or more processing units; preferably, the processor 580 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into processor 580.

The handset also includes a power supply 590 (e.g., a battery) for powering the various components, which may preferably be logically coupled to the processor 580 via a power management system, such that the power management system may be used to manage charging, discharging, and power consumption.

Although not shown, the mobile phone may further include a camera, a bluetooth module, etc., which are not described herein.

In the embodiment of the present invention, the processor 580 included in the terminal further has the following functions:

Optionally, the processor 580 also has the following functions:

transmitting the first picture data and the audio file to the camera client;

Optionally, the processor 580 also has the following functions:

or,

and outputting the prompt message by voice.

In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.

Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.

Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.

The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. It will be appreciated by those skilled in the art that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in a picture data processing apparatus according to an embodiment of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.

It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any ordering; these words may be interpreted as names.

Thus, it should be appreciated by those skilled in the art that while a number of exemplary embodiments of the invention have been illustrated and described in detail herein, many other variations or modifications consistent with the principles of the invention may be directly determined or derived from the disclosure of the present invention without departing from the spirit and scope of the invention. Accordingly, the scope of the invention should be understood and interpreted to cover all such other variations or modifications.

Claims

1. A picture data processing method is applied to a mobile terminal, and comprises the following steps:

synthesizing second picture data according to the first picture data and the audio file, wherein the second picture data carries audio identification information of the audio file;

wherein synthesizing second picture data from the first picture data and the audio file comprises:

writing the audio identification information into the specified data of the first picture data to obtain the second picture data;

when a plurality of pieces of character facial feature data are identified, the audio files corresponding to the character facial feature data are obtained in the same or different modes; acquiring an audio file corresponding to the character facial feature data, wherein the audio file comprises at least one of the following items:

calling a camera client with a recording function installed on the mobile terminal, and executing recording operation by using the recording function of the camera client to obtain an audio file corresponding to the character facial feature data;

and starting the recording function of the mobile terminal, executing recording operation by utilizing the recording function of the mobile terminal, and obtaining an audio file corresponding to the character facial feature data.

2. The method of claim 1, wherein the obtaining of the first picture data to be processed comprises at least one of:

3. The method of claim 1 or 2, wherein synthesizing second picture data from the first picture data and the audio file comprises:

transmitting the first picture data and the audio file to the camera client;

4. The method according to claim 1, wherein, when the first picture data includes a plurality of pieces, writing the audio identification information in the designation data of the first picture data includes:

5. The method of claim 1, wherein the audio identification information comprises a uniform resource locator (URL) of the audio file.

6. The method according to claim 4, wherein the designation data includes format data of the first picture data or the dynamic picture data.

7. The method of claim 4, wherein the method further comprises:

8. The method according to claim 1 or 2, wherein when a plurality of the character facial feature data are recognized, synthesizing second picture data from the first picture data and the audio file comprises:

9. The method of claim 8, wherein extracting sub-audio files from the audio file corresponding to respective facial feature data of the person comprises:

10. The method according to claim 1 or 2, wherein the method further comprises:

determining communication information of a person corresponding to the person face feature data, wherein the communication information comprises at least one of a communication number and an instant communication account number bound with the communication number;

11. The method of claim 10, wherein the method further comprises:

12. The method according to claim 11, wherein the outputting prompt information for prompting a user that the second picture data to be sent is a sound picture comprises:

or,

and outputting the prompt message by voice.

13. A picture data processing device is arranged in a mobile terminal, and the device comprises:

the synthesis module is suitable for synthesizing second picture data according to the first picture data and the audio file, and the second picture data carries audio identification information of the audio file;

the synthesis module is further adapted to: generating unique audio identification information for the audio file; or sending the audio file to a server, generating unique audio identification information for the audio file by the server, and sending the generated audio identification information to the mobile terminal;

the second obtaining module is further adapted to obtain, when a plurality of pieces of character facial feature data are identified, the audio files corresponding to the respective character facial feature data in the same or different manners, where the specific obtaining manner includes at least one of:

14. The apparatus of claim 13, wherein the first acquisition module is further adapted to at least one of:

15. The apparatus of claim 13 or 14, wherein the synthesis module is further adapted to:

transmitting the first picture data and the audio file to the camera client;

16. The apparatus of claim 13, wherein the synthesis module is further adapted to:

17. The apparatus of claim 13, wherein the audio identification information comprises a uniform resource locator (URL) of the audio file.

18. The apparatus according to claim 16, wherein the designation data includes format data of the first picture data or the moving picture data.

19. The apparatus of claim 13 or 14, wherein the apparatus further comprises:

20. The apparatus of claim 19, wherein the synthesis module is further adapted to:

21. The apparatus of claim 20, wherein the synthesis module is further adapted to:

22. The apparatus of claim 13 or 14, wherein the apparatus further comprises:

the determining module is suitable for determining communication information of a person corresponding to the person facial feature data, and the communication information comprises at least one of a communication number and an instant communication account number bound with the communication number;

23. The apparatus of claim 22, wherein the apparatus further comprises:

24. The apparatus of claim 23, wherein the output module is further adapted to:

or,

and outputting the prompt message by voice.

25. A mobile terminal comprising a processor and a memory:

the memory for storing a program for performing the method of any one of claims 1 to 12,

the processor is configured to execute programs stored in the memory.