CN110442867A - Image processing method, device, terminal and computer storage medium - Google Patents

Image processing method, device, terminal and computer storage medium

Info

Publication number
CN110442867A
CN110442867A (application CN201910693744.9A)
Authority
CN
China
Prior art keywords
mood
data
target
image
processed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910693744.9A
Other languages
Chinese (zh)
Inventor
王伟航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201910693744.9A
Publication of CN110442867A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Evolutionary Computation (AREA)
  • Human Resources & Organizations (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Primary Health Care (AREA)
  • Psychiatry (AREA)
  • Hospice & Palliative Care (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Child & Adolescent Psychology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention provide an image processing method, an apparatus, a terminal and computer storage media. The method includes: obtaining mood data and an image to be processed; identifying the target emotion reflected by the mood data; matching the image to be processed with a corresponding target filter mode according to the target emotion; and finally applying the target filter mode to the image to be processed to obtain a target image. Embodiments of the invention address problems in conventional techniques such as poor image enhancement effects and the inability to accurately express the user's true intention.

Description

Image processing method, device, terminal and computer storage medium
Technical field
The present invention relates to the technical field of image processing, and more particularly to an image processing method, apparatus, terminal and computer storage medium.
Background art
Social interaction refers to dealings between people in society: the transfer of information and exchange of ideas through certain means (tools) in order to achieve some purpose. With the development of science and technology and the application of Internet resources in daily life, interpersonal contact is increasingly conducted over the Internet, and strangers can also socialize online to further expand and develop their own circles.
At present, strangers socializing with each other often rely on intelligent terminals. In a stranger-oriented social application on a terminal, users present themselves by publishing updates such as text, voice and images to attract like-minded people to interact with, and images are the most common choice for publishing personal updates. In practice, however, the image filter modes provided by a terminal are relatively limited, so image enhancement effects are poor, the visual effect of published images is constrained, and the user's true intention cannot be accurately expressed. This dampens users' enthusiasm for interacting with strangers, reduces the utilization rate of stranger-oriented social applications, and hinders the development of such social interaction.
Summary of the invention
Embodiments of the invention provide an image processing method, apparatus, terminal and computer storage medium that can improve image effects and thereby increase users' enthusiasm for interaction and the utilization rate of social applications.
In one aspect, an embodiment of the invention provides an image processing method, the method comprising:
obtaining mood data and an image to be processed, the mood data comprising mood voice data, mood image data or mood text data;
identifying the target emotion reflected by the mood data;
matching the image to be processed with a corresponding target filter mode according to the target emotion; and
applying the target filter mode to the image to be processed to obtain a target image.
In another aspect, an embodiment of the invention provides an image processing apparatus, the apparatus comprising:
an acquiring unit, configured to obtain mood data and an image to be processed, the mood data comprising mood text data, mood voice data or mood image data;
a recognition unit, configured to identify the target emotion reflected by the mood data;
a matching unit, configured to match the image to be processed with a corresponding target filter mode according to the target emotion; and
a processing unit, configured to apply the target filter mode to the image to be processed to obtain a target image.
In a further aspect, an embodiment of the invention provides a terminal. The terminal comprises an input device and an output device, and further comprises:
a processor adapted to implement one or more instructions; and
a computer storage medium storing one or more instructions, the one or more instructions being suitable to be loaded by the processor to perform the following steps:
obtaining mood data and an image to be processed, the mood data comprising mood text data, mood voice data or mood image data;
identifying the target emotion reflected by the mood data;
matching the image to be processed with a corresponding target filter mode according to the target emotion; and
applying the target filter mode to the image to be processed to obtain a target image.
In a still further aspect, an embodiment of the invention provides a computer storage medium storing one or more instructions, the one or more instructions being suitable to be loaded by a processor to perform the following steps:
obtaining mood data and an image to be processed, the mood data comprising mood text data, mood voice data or mood image data;
identifying the target emotion reflected by the mood data;
matching the image to be processed with a corresponding target filter mode according to the target emotion; and
applying the target filter mode to the image to be processed to obtain a target image.
In embodiments of the invention, mood data and an image to be processed are obtained, the target emotion reflected by the mood data is identified, a corresponding target filter mode is matched to the image to be processed according to the target emotion, and the target filter mode is finally applied to the image to be processed to obtain a target image. Filtering images based on mood in this way addresses problems in conventional techniques such as poor image enhancement effects, inaccurate expression of the user's true intention and the resulting loss of interaction enthusiasm.
Detailed description of the invention
To explain the technical solutions in the embodiments of the invention or in the prior art more clearly, the accompanying drawings needed to describe the embodiments or the prior art are briefly introduced below. Apparently, the drawings described below show only some embodiments of the invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of an image processing method provided by an embodiment of the invention.
Fig. 2(a) and Fig. 2(b) are waveform diagrams of two kinds of mood voice data provided by an embodiment of the invention.
Fig. 3 is a schematic diagram of mood division provided by an embodiment of the invention.
Fig. 4 and Fig. 5 are schematic flowcharts of two further image processing methods provided by embodiments of the invention.
Fig. 6(a)-Fig. 6(h) are a series of scene schematics provided by an embodiment of the invention.
Fig. 7 is a schematic flowchart of another image processing method provided by an embodiment of the invention.
Fig. 8 is a schematic flowchart of still another image processing method provided by an embodiment of the invention.
Fig. 9 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the invention.
Fig. 10 is a schematic structural diagram of a terminal provided by an embodiment of the invention.
Specific embodiment
To help those skilled in the art better understand the solution of the invention, the technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. Apparently, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the invention without creative effort shall fall within the protection scope of the invention.
The terms "first", "second" and "third" (if present) in the specification, claims and drawings are used to distinguish different objects, not to describe a particular order. Moreover, the term "comprise" and any variants thereof are intended to cover non-exclusive inclusion: a process, method, system, product or device that contains a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or other steps or units inherent to the process, method, product or device.
Refer to Fig. 1, which is a schematic flowchart of an image processing method provided by an embodiment of the invention. The image processing method may be executed by a terminal. The method shown in Fig. 1 includes the following steps S101-S104.
S101: obtain mood data and an image to be processed.
When the terminal detects a dynamic publishing instruction in a social application, it may respond to the instruction by obtaining mood data and an image to be processed. The dynamic publishing instruction may be received from another device (such as a server), or may be generated when the terminal detects a publishing operation by the user, that is, an operation the user performs in the social application to publish an update, such as a sliding gesture along a preset trajectory or a series of taps on a designated button in the social application.
Mood data is data that describes the user's mood. Mood here is a collective term for a range of subjective experiences: the comprehensive psychological and physiological state produced by multiple sensations, thoughts and behaviors. For example, a mood may include, but is not limited to, anger, happiness, excitement, hope or any other term describing the user's psychological and physiological state.
In practical applications, the concrete form of the mood data is not limited and may include, but is not limited to, at least one of the following: mood voice data, mood image data, mood video data and mood text data. Since video normally consists of individual image frames, mood video data can be regarded as a sequence of frames of mood image data, and the terminal's analysis of mood video data is in essence an analysis of those frames. The remainder of this description therefore discusses mood image data in place of mood video data.
S102: identify the target emotion reflected by the mood data.
In one embodiment, if the mood data includes mood voice data, step S102 may specifically include the following steps S11-S13:
S11: convert the mood voice data into mood text data and extract text features from the mood text data.
The terminal converts the mood voice data into corresponding mood text data through a speech recognition program, which may be a system program deployed on the terminal or a third-party application implementing speech-to-text conversion. The terminal then extracts the text features contained in the mood text data using a text feature extraction algorithm; these features reflect the mood exhibited by the text of the mood voice data. The text feature extraction algorithm is custom-defined by the system, for example according to actual needs, and may include, but is not limited to, a text feature vector algorithm, principal component analysis or any other algorithm for extracting text features.
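As a concrete illustration of S11, the sketch below assumes the speech has already been transcribed and uses TF-IDF, one of the text-feature options named above, over a tiny corpus; the sentences and the use of scikit-learn are assumptions for illustration, not the patent's implementation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical transcription of a recorded mood clip (stands in for the
# speech recognition program's output).
mood_text = "so happy today, everything went great"

# A tiny illustrative corpus to fit the TF-IDF vocabulary on.
corpus = [
    "so happy today, everything went great",
    "feeling angry and annoyed about everything",
    "quiet, tired, a little dejected",
]

vectorizer = TfidfVectorizer()
vectorizer.fit(corpus)

# The text feature vector consumed by the downstream mood model.
text_feature = vectorizer.transform([mood_text]).toarray()[0]
print(text_feature.shape)
```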
S12: extract acoustic features from the mood voice data.
The terminal extracts the acoustic features in the mood voice data using an acoustic feature extraction algorithm, which may be custom-defined by the system, such as a convolutional neural network algorithm or a recurrent neural network algorithm.
Optionally, the acoustic features include time-domain acoustic features and/or frequency-domain acoustic features. Time-domain acoustic features are features exhibited by the mood voice data in the time domain that reflect the user's mood; frequency-domain acoustic features are the features exhibited in the frequency domain that reflect the user's mood.
In practical applications, the mood voice data collected by the terminal is essentially a speech signal containing features in both the time and frequency domains. The waveform of this signal (also called a time-frequency signal) is shown in Fig. 2(a), where the abscissa indicates frequency and the ordinate indicates oscillation amplitude (amplitude for short). The terminal can extract features of the speech signal in several respects, such as time, amplitude and frequency, to obtain the time-domain acoustic features of the speech. The terminal can further convert the speech signal into a speech spectrum using a Fourier transform algorithm; Fig. 2(b) shows a schematic speech spectrogram. The spectrogram is the waveform of the speech signal in the frequency domain, that is, the representation obtained after converting the time-domain speech signal into a frequency-domain signal; its abscissa indicates time and its ordinate indicates frequency. By analyzing how the frequency-domain signal changes over different periods in the spectrogram, the terminal obtains the frequency-domain acoustic features the speech signal exhibits.
Specifically, the terminal may analyze the spectrogram with a frequency-domain feature extraction algorithm to obtain the frequency-domain acoustic features. This algorithm may be custom-defined by the system and may include, but is not limited to, convolutional neural network algorithms and recurrent neural network algorithms. For example, the terminal may use a convolutional neural network to extract local features from the spectrogram, applying image processing that is invariant to shifts, scaling or other distortions of the spectrogram, to obtain the frequency-domain acoustic features.
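A minimal sketch of the time-domain/frequency-domain split just described, assuming a local recording named mood_clip.wav and using librosa's STFT as the Fourier-transform step; librosa is an illustrative stand-in, not the patent's pipeline.

```python
import numpy as np
import librosa

# Load the recorded mood voice data (assumed local file, 16 kHz mono).
y, sr = librosa.load("mood_clip.wav", sr=16000)

# Time-domain side: the raw waveform, from which amplitude and duration
# style descriptors can be read off directly.
duration_s = len(y) / sr
peak_amplitude = float(np.max(np.abs(y)))

# Frequency-domain side: a Fourier transform turns the time signal into a
# spectrogram (time on one axis, frequency on the other, as in Fig. 2(b)).
spectrogram = np.abs(librosa.stft(y, n_fft=512, hop_length=160))
print(spectrogram.shape)  # (frequency bins, time frames)
```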
S13: call the first mood model to perform fusion recognition on the text features and the acoustic features, obtaining the target emotion.
The terminal calls the first mood model to perform unified, or fusion, recognition on the text features and acoustic features, obtaining the target emotion reflected by the mood voice data. The first mood model may be custom-defined by the system, for example according to user preference or actual demand. It is a model pre-trained to recognize user moods and may include, but is not limited to, a feed-forward network (FF), a deep feed-forward network (DFF), a recurrent neural network (RNN), a long short-term memory network (LSTM) or any other model usable for emotion recognition.
Note that if the accuracy of emotion recognition is not a concern, the terminal may consider only the text features or only the acoustic features and take the correspondingly recognized mood as the target emotion reflected by the mood data. Skipping the joint recognition over both text and acoustic features saves the terminal's computing resources and improves processing efficiency.
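For concreteness, the sketch below shows one way such fusion recognition could look: a small PyTorch classifier that concatenates a text-feature vector and an acoustic-feature vector. The dimensions, layer sizes and class count are assumptions; the patent does not specify the first mood model's architecture.

```python
import torch
import torch.nn as nn

class FusionMoodModel(nn.Module):
    """Illustrative 'first mood model': fuses text and acoustic features."""

    def __init__(self, text_dim=128, acoustic_dim=64, n_moods=8):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(text_dim + acoustic_dim, 256),
            nn.ReLU(),
            nn.Linear(256, n_moods),
        )

    def forward(self, text_feat, acoustic_feat):
        # Fusion step: concatenate the two feature vectors, then classify.
        fused = torch.cat([text_feat, acoustic_feat], dim=-1)
        return self.classifier(fused)  # logits over mood classes

model = FusionMoodModel()
logits = model(torch.randn(1, 128), torch.randn(1, 64))
target_mood_index = int(logits.argmax(dim=-1))
```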
In another embodiment, if the mood data includes mood voice data, step S102 includes the following steps S21-S25.
S21: convert the mood voice data into mood text data and call the second mood model to perform semantic analysis on the mood text data, obtaining a first mood.
After converting the mood voice data into mood text data, the terminal may call the second mood model to perform semantic analysis on the mood text data and obtain the first mood. The second mood model may likewise be a pre-trained emotion recognition model; see the introduction of the first mood model above, which is not repeated here.
In a specific implementation, the terminal first performs semantic analysis on the mood text data through the second mood model to obtain one or more candidate mood words contained in the text, each reflecting a user mood such as anger, indignation, irritation, happiness or pleasure. Specifically, the terminal may analyze the mood text data against a mood dictionary held in the model, for example by applying syntax rules and splitting vocabulary, to obtain at least one candidate mood word. The mood dictionary is custom-defined by the system; it may be, for example, the Linguistic Inquiry and Word Count (LIWC) dictionary or the EmoCD mood dictionary, and it contains at least one pre-configured reference mood word. Optionally, each reference mood word is configured with a corresponding weight indicating the intensity of the mood it reflects, mood intensity for short: the stronger the reflected mood, the larger the weight of the reference mood word; conversely, the weaker the reflected mood, the smaller its weight.
The terminal then performs similarity matching between the candidate mood words and the reference mood words in the model, computes the similarity between each candidate and reference word, and determines the mood reflected by a target mood word as the first mood. The target mood word is the word among the candidate mood words that meets the following conditions: its similarity to a reference mood word is greater than or equal to a preset threshold (specifically, a third threshold), and the weight of that reference mood word is greater than or equal to a fourth threshold. The third and fourth thresholds may be custom-defined by the system, for example according to user preference or actual demand, or derived from statistics over a series of experimental data. They may or may not be equal; the invention imposes no limitation.
The specific implementation of this similarity matching is likewise not limited. For example, the terminal may compute the similarity between words using any one or a combination of the following similarity matching algorithms (also called similarity calculation methods): term frequency (TF), term frequency-inverse document frequency (TF-IDF), word-to-vector conversion (word2vec) or any other algorithm for word similarity.
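The following sketch makes the two-threshold rule concrete with cosine similarity over toy word vectors; the lexicon entries, embeddings, weights and threshold values are invented for illustration.

```python
import numpy as np

# Toy mood dictionary: reference word -> (illustrative embedding, intensity weight).
REFERENCE_LEXICON = {
    "happy": (np.array([0.9, 0.1]), 0.8),
    "angry": (np.array([0.1, 0.9]), 0.9),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def matching_reference_words(candidate_vec, third_threshold=0.85, fourth_threshold=0.5):
    # Keep reference words whose similarity clears the third threshold and
    # whose weight clears the fourth threshold, per the rule above.
    return [
        word
        for word, (vec, weight) in REFERENCE_LEXICON.items()
        if cosine(candidate_vec, vec) >= third_threshold and weight >= fourth_threshold
    ]

print(matching_reference_words(np.array([0.88, 0.15])))  # -> ['happy']
```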
S22: call the third mood model to perform acoustic analysis on the mood voice data, obtaining a second mood.
The terminal may perform acoustic analysis on the mood voice data through the third mood model to obtain the acoustic features it contains; divided by domain, these consist of time-domain acoustic features and frequency-domain acoustic features. The third mood model can then analyze the time-domain and/or frequency-domain acoustic features contained in the mood voice data to obtain the second mood the data reflects. The following describes the specific implementation of obtaining the second mood by jointly analyzing time-domain and frequency-domain acoustic features.
Specifically, the terminal may perform feature extraction on the mood voice data in the time domain through the third mood model to obtain its time-domain acoustic features, i.e., the temporal characteristics of the mood voice data, which may include, but are not limited to, speech rate, duration, mel-frequency cepstral coefficients (MFCC), perceptual linear prediction (PLP), formants or other time-domain parameters. Correspondingly, the terminal may perform feature extraction in the frequency domain through the third mood model to obtain the frequency-domain acoustic features, i.e., the frequency-domain characteristics of the mood voice data, which may include, but are not limited to, short-time energy, short-time average amplitude, zero-crossing rate or other frequency-domain parameters.
The third mood model can then analyze the time-domain and frequency-domain acoustic features together to obtain the second mood. For example, it can determine which threshold interval each feature falls into and derive the mood corresponding to that interval. The third mood model may likewise be a pre-trained emotion recognition model; see the introduction of the first mood model above, which is not repeated here.
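A sketch of the descriptors just named, again assuming a local mood_clip.wav; librosa computes MFCCs, zero-crossing rate and short-time energy (approximated here via RMS), which a third-mood-model style classifier could then consume. This is an assumed toolchain, not the patent's.

```python
import numpy as np
import librosa

y, sr = librosa.load("mood_clip.wav", sr=16000)  # assumed local recording

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # mel cepstral coefficients
zcr = librosa.feature.zero_crossing_rate(y)          # zero-crossing rate
energy = librosa.feature.rms(y=y) ** 2               # short-time energy via RMS

# Pool each descriptor over time into one acoustic feature vector.
acoustic_feature = np.concatenate(
    [mfcc.mean(axis=1), zcr.mean(axis=1), energy.mean(axis=1)]
)
print(acoustic_feature.shape)  # (15,)
```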
S23: compute the similarity between the first mood and the second mood.
The terminal computes the similarity between the first mood and the second mood using a preset similarity calculation method, so that the target emotion reflected by the mood voice data can subsequently be determined from this similarity. For the similarity calculation method, see the description of similarity matching algorithms above, which is not repeated here.
S24: when the similarity is greater than or equal to a first threshold, determine the first mood or the second mood as the target emotion.
S25: when the similarity is less than the first threshold, determine the first mood as the target emotion.
If the terminal determines that the similarity between the first mood and the second mood is greater than or equal to the first threshold, the two moods are considered close, for example a first mood of happiness and a second mood of pleasure. The terminal may then determine either the first mood or the second mood as the target emotion.
Conversely, if the similarity between the first mood and the second mood is less than the first threshold, the two moods are considered to differ greatly or even conflict, for example a first mood of pleasure and a second mood of irritation. To guarantee the accuracy of emotion recognition, the terminal selects the target emotion from the first and second moods. For example, it may pick either of them as the target emotion; alternatively, since text semantic analysis is usually more accurate than acoustic feature analysis in emotion recognition, the terminal may determine the first mood, obtained by semantic analysis, as the target emotion.
Note that if the accuracy of emotion recognition is not a concern, the terminal may rely on text semantic analysis alone or on speech acoustic analysis alone and take the correspondingly recognized mood as the target emotion reflected by the mood data, rather than jointly considering text semantics (text features) and speech acoustics (acoustic features); this saves the terminal's computing resources and improves computational efficiency.
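Steps S23-S25 reduce to a short decision function. Note that both branches end at the first mood as written (S24 merely permits the second mood as well); the threshold value is an assumption for illustration.

```python
def resolve_target_mood(first_mood, second_mood, similarity, first_threshold=0.8):
    if similarity >= first_threshold:
        # S24: the analyses agree; either mood may serve as the target
        # (the first is kept here).
        return first_mood
    # S25: the analyses conflict; keep the semantic result, which the
    # description treats as the more accurate one.
    return first_mood

print(resolve_target_mood("happiness", "pleasure", similarity=0.92))  # -> 'happiness'
```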
The granularity at which moods (target emotions) are divided is not limited in embodiments of the invention. When the granularity is coarse, mood descriptions are vaguer; for example, moods may have only positive and negative polarity, dividing into positive moods and negative moods. Conversely, the finer the granularity, the more precise the mood description; a coarse-grained mood may contain several fine-grained moods.
For example, Fig. 3 shows a schematic mood division with three granularities, comprising a first level, a second level and a third level. The first level is divided by mood polarity and comprises positive moods and negative moods. At the second level, each polarity contains several moods with intensity: positive moods contain enjoyment and positive expectation, and negative moods contain irritation and dislike. The third level further subdivides each second-level mood into finer, discrete moods: in the figure, enjoyment contains high spirits, pleasantness and amusement; positive expectation contains hopefulness and expectancy; irritation contains perturbation, discontent and dejection; and dislike contains disdain and aversion.
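The three-level division of Fig. 3 transcribes naturally into a nested mapping; the sketch below also shows coarsening a fine mood to its polarity, which is relevant when two recognizers report at different granularities, as discussed next. The label spellings are taken from the translated figure description.

```python
# Fig. 3's mood hierarchy: polarity -> second-level mood -> fine moods.
MOOD_HIERARCHY = {
    "positive": {
        "enjoyment": ["high spirits", "pleasantness", "amusement"],
        "positive expectation": ["hopefulness", "expectancy"],
    },
    "negative": {
        "irritation": ["perturbation", "discontent", "dejection"],
        "dislike": ["disdain", "aversion"],
    },
}

def coarsen(fine_mood):
    """Map a third-level mood up to its first-level polarity."""
    for polarity, families in MOOD_HIERARCHY.items():
        for fine_moods in families.values():
            if fine_mood in fine_moods:
                return polarity
    return None

print(coarsen("dejection"))  # -> 'negative'
```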
In actual processing, since the mood granularity obtained by text semantic analysis and that obtained by speech acoustic analysis may differ, the terminal can use the finer-grained recognition result to verify or refine the coarser-grained one. For example, if text semantic analysis yields a coarser mood granularity than speech acoustic analysis (that is, the mood granularity of the second mood model is coarser than that of the third mood model), then after calling the second mood model to identify the first mood reflected by the mood data based on text semantics, the terminal can further call the third mood model to identify the second mood based on acoustic analysis, so as to verify or refine the first mood. This allows the target emotion reflected by the mood data to be obtained more precisely and improves the accuracy of emotion recognition.
As described above, if the similarity between the first mood and the second mood is greater than or equal to the first threshold, the terminal may consider the two moods to belong to the same mood type. Since the second mood has the finer granularity and describes the mood more precisely, the terminal can determine the second mood as the target emotion reflected by the mood data. Conversely, if the similarity is less than the first threshold, the terminal considers the two moods not to belong to the same mood type and may send a prompt asking the user whether to determine the first mood as the target emotion. This improves the accuracy of emotion recognition while giving the user a sense of participation, which benefits the user experience.
In another embodiment, if the mood data includes mood text data, the terminal may perform semantic analysis directly on the mood text data to obtain the target emotion it reflects; see the specific implementations of steps S11 or S21 above, which are not repeated here.
In another embodiment, if the mood data includes mood image data, step S102 includes the following steps S31-S35.
S31: extract the target facial expression in the mood image data and obtain the third mood reflected by the target facial expression.
The terminal may perform face recognition on the mood image data using a face recognition algorithm to obtain the target facial expression contained in the mood image data and the third mood it reflects. The face recognition algorithm may be custom-defined by the system and may include, but is not limited to, geometric-feature-based facial emotion recognition algorithms, local feature analysis algorithms, eigenface algorithms and neural network algorithms.
Taking a geometric-feature-based facial emotion recognition algorithm as an example, the terminal may use geometric features to perform face recognition on the mood image data, typically extracting salient facial organs such as the eyes, mouth, nose and dimples as classification features, to obtain the facial image contained in the mood image data. It can then perform expression recognition on the facial image to obtain the target facial expression it contains, and in turn the third mood that expression reflects. For example, if the target facial expression is a smile, the reflected third mood is happiness.
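As a rough illustration of the geometric route, the sketch below uses dlib's frontal face detector and its published 68-point landmark model; the smile heuristic, its threshold and the landmark-model filename are assumptions, not the patent's classifier.

```python
import dlib  # assumes dlib and the landmark model file are installed locally

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

img = dlib.load_rgb_image("mood_frame.jpg")
for face in detector(img):
    pts = predictor(img, face)
    mouth_width = pts.part(54).x - pts.part(48).x  # distance between mouth corners
    face_width = face.right() - face.left()
    # Crude illustrative rule: a wide mouth relative to the face reads as a smile.
    expression = "smile" if mouth_width / face_width > 0.45 else "neutral"
    third_mood = "happiness" if expression == "smile" else "calm"
    print(expression, third_mood)
```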
S32: extract the target limb behavior in the mood image data and obtain the fourth mood reflected by the target limb behavior.
The terminal may perform behavior recognition on the mood image data using a behavior recognition algorithm to obtain the target limb behavior contained in the mood image data and the fourth mood it reflects. The behavior recognition algorithm may be pre-trained and may include, but is not limited to, deep-learning-based human behavior algorithms and convolutional-neural-network-based human behavior algorithms.
Optionally, the output of the behavior recognition algorithm may be either the target limb behavior contained in the mood image data or the fourth mood that behavior reflects. When the output is the target limb behavior, since different limb behaviors can correspond to different moods, the terminal also needs to look up the fourth mood reflected by the target limb behavior in a limb-mood mapping table. The table contains one or more mappings between limb behaviors and moods: each limb behavior corresponds to one mood, and one mood may correspond to one or more limb behaviors. For example, Table 1 below shows a schematic limb-mood mapping table; a lookup sketch follows the table.
Table 1
Serial number | Limb behavior   | User mood
1             | Limb behavior 1 | Happiness
2             | Limb behavior 2 | Indignation
...           | ...             | ...
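Table 1 is, in effect, a dictionary lookup; the sketch below mirrors it, keeping the behavior names as the table's own generic placeholders.

```python
# Table 1 as a lookup table (behavior names are the table's placeholders).
LIMB_MOOD_TABLE = {
    "limb behavior 1": "happiness",
    "limb behavior 2": "indignation",
}

def fourth_mood(target_limb_behavior):
    return LIMB_MOOD_TABLE.get(target_limb_behavior)

print(fourth_mood("limb behavior 1"))  # -> 'happiness'
```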
S33: compute the similarity between the third mood and the fourth mood.
S34: when the similarity between the third mood and the fourth mood is greater than or equal to a second threshold, determine the third mood or the fourth mood as the target emotion.
S35: when the similarity between the third mood and the fourth mood is less than the second threshold, determine the third mood as the target emotion.
The terminal computes the similarity between the third mood and the fourth mood using a similarity calculation method. When the similarity is greater than or equal to the second threshold, the third and fourth moods are considered similar, and the terminal may take either of them as the target emotion reflected by the mood data.
Conversely, when the similarity between the third mood and the fourth mood is less than the second threshold, the two moods differ greatly, and the terminal may obtain the target emotion from the third and fourth moods according to a preset decision rule. The decision rule is custom-defined by the system, for example directly determining the third mood, reflected by the facial expression, as the target emotion.
Optionally, if the accuracy of emotion recognition is not a concern, the terminal may determine either the third mood reflected by the target facial expression or the fourth mood reflected by the target limb behavior as the target emotion, without jointly analyzing facial expression and limb behavior; this saves the terminal's computing resources and improves processing efficiency.
Optionally, in actual processing, since the granularity at which facial expressions reflect moods and the granularity at which limb behaviors reflect moods may differ, the terminal can use the finer-grained recognition result to verify or refine the coarser-grained one; see the related description in the preceding embodiment, which is not repeated here.
Note that the specific embodiments described in the invention can be used alone or in combination. For example, if the mood data includes both mood voice data and mood image data, the terminal can combine the respective emotion recognition approaches for the two and obtain the target emotion reflected by the mood data through joint analysis; see the specific implementations for obtaining the target emotion from mood voice data and mood image data above, which are not repeated here.
S103: match the image to be processed with a corresponding target filter mode according to the target emotion.
After obtaining the target emotion, the terminal can obtain a mood-filter mapping table and retrieve from it the target filter mode corresponding to the target emotion. The mood-filter mapping table may be pre-configured in the terminal's local database or configured on a remote server. It contains one or more mappings between moods and filter modes: each mood corresponds to one filter mode, and one filter mode may correspond to one or more moods. Illustratively, Table 2 below shows a schematic mood-filter mapping table; a lookup sketch follows it.
Table 2
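Table 2's entries did not survive extraction, so the mapping below is entirely hypothetical: it only illustrates the emotion-to-filter lookup that S103 describes, with invented filter names.

```python
# Hypothetical mood -> filter mapping in the spirit of Table 2.
MOOD_FILTER_TABLE = {
    "happiness": "warm_bright",
    "indignation": "high_contrast",
    "dejection": "desaturated_cool",
}

def match_target_filter(target_emotion, default="neutral"):
    return MOOD_FILTER_TABLE.get(target_emotion, default)

print(match_target_filter("happiness"))  # -> 'warm_bright'
```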
S104: apply the target filter mode to the image to be processed, obtaining a target image.
Since the image to be processed is usually an encoded image in a format such as JPG or PNG, the terminal first needs to decode it to obtain a decoded image. The terminal then uses the central processing unit (CPU) to filter and render the decoded image with the target filter mode, obtaining the target image. The terminal thus applies a mood filter to the image to be processed, which helps improve the image enhancement effect and avoids problems such as the picture failing to convey the intended meaning and the user's true intention not being accurately expressed.
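A minimal stand-in for S104 using Pillow: decode the image, apply a named filter, and save the rendered result. The filter names continue the hypothetical Table 2 sketch above, and the enhancement factors are illustrative, not the patent's filter definitions.

```python
from PIL import Image, ImageEnhance

def apply_mood_filter(path, filter_mode):
    img = Image.open(path).convert("RGB")  # decoding step
    if filter_mode == "warm_bright":
        img = ImageEnhance.Brightness(img).enhance(1.15)
        img = ImageEnhance.Color(img).enhance(1.25)
    elif filter_mode == "desaturated_cool":
        img = ImageEnhance.Color(img).enhance(0.6)
    return img  # the target image

target_image = apply_mood_filter("to_process.jpg", "warm_bright")
target_image.save("target.jpg")
```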
Refer to Fig. 4, which is a schematic flowchart of another image processing method provided by an embodiment of the invention. The method shown in Fig. 4 includes steps S401-S405.
S401: in response to a dynamic publishing instruction in a social application, collect mood data.
S402: if the mood data contains mood image data, determine the mood image data as the image to be processed.
If the terminal detects a dynamic publishing instruction for the social application, it may respond to the instruction by collecting mood data. The mood data may be that of a designated user, or that of a designated user within a designated time period. As described above, the mood data may specifically be at least one of the following: mood voice data, mood image data, mood video data and mood text data. The designated time period may be set by the user or by system default, for example 60 seconds (s). The designated user may be any user; illustratively, the terminal may record audio of the designated user to obtain mood voice data, or track and photograph the designated user to obtain mood image data.
If the mood data contains mood image data, the terminal can use the mood image data directly as the image to be processed, sparing the user from inputting an image separately; this reduces user operations and helps improve the efficiency of image processing.
S403: identify the target emotion reflected by the mood data.
S404: match the image to be processed with a corresponding target filter mode according to the target emotion.
S405: apply the target filter mode to the image to be processed, obtaining a target image. Optionally, the terminal may also publish the target image in the social application for users to view.
Refer to Fig. 5, which is a schematic flowchart of another image processing method provided by an embodiment of the invention. The method shown in Fig. 5 includes steps S501-S505.
S501: in response to a dynamic publishing instruction in a social application, collect mood data.
If the terminal detects a dynamic publishing instruction for the social application, it may respond to the instruction by collecting mood data. For the dynamic publishing instruction and the mood data, see the description above, which is not repeated here.
S502: obtain the image to be processed according to the dynamic publishing instruction.
If the dynamic publishing instruction carries the image to be processed, the terminal can obtain the image directly by parsing the instruction. Alternatively, if the instruction does not carry the image to be processed but indicates that one should be obtained, the terminal can obtain the image to be processed according to the indication in the instruction; the image may be input by the user or sent by another device (such as a server).
Optionally, after collecting the mood data, the terminal may send a prompt message asking the user whether to input an image to be processed. The form of the prompt message is not limited; for example, the user may be prompted via a pop-up (floating window), short message, subtitle or picture to choose whether to input an image to be processed.
S503: identify the target emotion reflected by the mood data.
S504: match the image to be processed with a corresponding target filter mode according to the target emotion.
S505: apply the target filter mode to the image to be processed, obtaining a target image.
For example, take an echo application as the social application. Fig. 6(a)-Fig. 6(e) show schematic scenes of obtaining mood data and an image to be processed. As in Fig. 6(a), the user launches the echo application and enters its usage interface, shown in Fig. 6(b). The user selects publishing an update in the interface and enters the audio recording interface shown in Fig. 6(c). The user long-presses the recording key to record mood data for the designated time period, here mood voice data; Fig. 6(d) shows 38 seconds of recorded mood data. The user then taps the next-step button (illustrated as an icon indicating the next operation, specifically a circular icon containing a greater-than symbol) to enter the interface shown in Fig. 6(e), where the user actively chooses the image to be processed for the update to be published. Optionally, a prompt interface (not shown) may be displayed between Fig. 6(d) and Fig. 6(e), asking the user whether to select an image to be processed; if the terminal detects that the user needs to select one, it jumps to Fig. 6(e) for the user to input the image to be processed.
For content not described for Fig. 4 and Fig. 5, see the description in the method embodiment of Fig. 1 above, which is not repeated here.
Optionally, after obtaining the target image, the terminal may also perform steps S701-S705 in Fig. 7.
S701: if the mood data contains mood voice data or mood text data, synthesize the mood voice data or mood text data into the target image, obtaining a composite image.
In one embodiment, if the mood data contains mood voice data, the terminal can convert the mood voice data into corresponding mood text data and then add the mood text data to the target image as an image caption, obtaining a composite image. The position at which the mood text data is added to the target image is not limited; it may, for example, be the upper-left corner, the upper-right corner or the center of the target image. For the conversion of mood voice data into mood text data, see the related description in the preceding embodiments, which is not repeated here.
In another embodiment, if the mood data contains mood voice data, the terminal can embed the mood voice data into the target image, obtaining a composite image.
In another embodiment, if the mood data directly contains mood text data, the terminal can add the mood text data to the target image as an image caption, obtaining a composite image.
In another embodiment, if the mood data contains mood text data, the terminal can convert the mood text data into corresponding mood voice data and embed that voice data into the target image, obtaining a composite image. The manner of converting mood text data into mood voice data is not limited; for example, the terminal may render the mood text data in a pre-configured voice style (such as a child's voice or a soprano) to form the corresponding mood voice data.
Note that the above ways of obtaining a composite image can be used alone or in combination. For example, the terminal can both embed the mood voice data into the target image and add the corresponding mood text data to it, obtaining a composite image containing both voice and text data. A caption-overlay sketch follows.
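The caption-style synthesis of S701 amounts to drawing the mood text onto the target image; the Pillow sketch below uses the upper-left placement mentioned above and the default font, both simplifications, and does not cover embedding audio into the image.

```python
from PIL import Image, ImageDraw

def synthesize_caption(image_path, mood_text, out_path="composite.jpg"):
    img = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    draw.text((10, 10), mood_text, fill=(255, 255, 255))  # upper-left corner
    img.save(out_path)
    return out_path

synthesize_caption("target.jpg", "so happy today")
```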
S702: publish the composite image in the social application.
The terminal may further respond to an image publishing instruction by publishing the composite image in the social application. The image publishing instruction may be the same instruction as the dynamic publishing instruction above, or a different one; the invention imposes no limitation. When they are different instructions, the image publishing instruction is generated when the terminal detects an image publishing operation, while the dynamic publishing instruction is generated when the terminal detects a data collection operation and is used to collect the mood data and/or the image to be processed. The image publishing operation is custom-defined by the system, such as tapping a publish button; correspondingly, the data collection operation may also be custom-defined by the system, such as tapping a voice recording button.
For example, continuing with the mood data and image to be processed collected in Fig. 6(a)-Fig. 6(e) above, assume the terminal recognizes that the target emotion reflected by the mood data is happiness. Fig. 6(f)-Fig. 6(h) show the concrete scene of publishing the composite image in the social application. Specifically, after recognizing that the target emotion is happiness, the terminal can filter the image to be processed according to the target filter mode corresponding to happiness, for example rendering a happy smiley expression on the image to be processed as shown in Fig. 6(f), obtaining the target image. The terminal can also embed the recorded mood data (mood voice data) into the target image, obtaining a composite image, as shown in Fig. 6(g). The user can then tap the publish button in the echo application to publish the composite image there, as shown in Fig. 6(h).
S703: in response to a first viewing operation directed at the composite image, display the target image in the composite image.
If the terminal detects a first viewing operation directed at the composite image, it responds by displaying the target image contained in the composite image on the display screen. The first viewing operation is custom-defined by the system, for example according to product requirements or user preference. For instance, when the terminal detects the user browsing the composite image in the social application, it can display the target image contained in the composite image, play the mood voice data, or display the mood text data contained in the composite image.
S704: in response to a second viewing operation directed at the composite image, display the target image in the composite image and play target voice data, where the target voice data may be the mood voice data contained in the mood data, or voice data converted from the mood text data contained in the mood data.
Whether the mood data contains mood voice data and/or mood text data, if the terminal detects a second viewing operation directed at the composite image, it responds by displaying the target image in the composite image on the display screen and playing target voice data. If the mood data contains mood voice data, the target voice data may simply be that mood voice data. If the mood data contains mood text data, the target voice data may be voice data converted from the mood text data. If the mood data contains both mood text data and mood voice data, then to convey the user's true intention accurately the target voice data may be the mood voice data, or, by system default, voice data converted from the mood text data.
The second viewing operation may likewise be custom-defined by the system and is not identical to the first viewing operation. For example, if the terminal detects a double-tap on the composite image in the social application, it can display the target image in the composite image full-screen and play the mood voice data in the composite image.
Optionally, after responding to the second viewing operation, the terminal may also simultaneously display target text data, which may be the mood text data contained in the mood data or text converted from the mood voice data contained in the mood data. This helps guarantee the user's viewing experience and improves the utilization rate of the social application.
S705: in response to a third viewing operation directed at the composite image, display the target image and target text data in the composite image, where the target text data may be the mood text data contained in the mood data or text data converted from the mood voice data contained in the mood data.
If the terminal detects a third viewing operation directed at the composite image, it responds by displaying the target image and the target text data in the composite image on the display screen. If the mood data contains mood text data, the target text data may simply be that mood text data. If the mood data contains mood voice data, the target text data may be text data converted from the mood voice data. If the mood data contains both mood voice data and mood text data, then to save terminal resources the target text data may be the mood text data; optionally, it may also be text converted from the mood voice data. The invention imposes no limitation.
The third viewing operation may likewise be custom-defined by the system and is identical to neither the first nor the second viewing operation. For example, when the terminal detects a single tap on the composite image in the social application, it can display the target image in the composite image and simultaneously display the mood text data.
In practical applications, the terminal may perform any one or more of steps S703-S705. When multiple steps are performed, their order is not limited; for example, the terminal may perform step S705 first and step S703 afterwards.
The terminal in embodiments of the invention may include a smartphone (such as an Android or iOS phone), a personal computer, a tablet computer, a palmtop computer, a mobile internet device (MID), a wearable smart device or other internet devices; embodiments of the invention impose no limitation.
By implementing embodiments of the invention, content can be presented through more of the senses, for example images with sound or text presented by combining audio and vision, allowing users to present published content in social applications more accurately and richly. This helps promote the interest, interactivity and utilization rate of social applications. Enhancing published content (images) based on emotion recognition also solves problems in conventional techniques such as poor image enhancement effects and the inability to express the user's true intention.
Refer to Fig. 8, which is a schematic flowchart of an image processing method based on a scene application provided by an embodiment of the present invention. The method shown in Fig. 8 includes steps S801-S804.
S801, in response to a dynamic publishing instruction in a social application, obtain mood data and an image to be processed.
If the terminal of the embodiment of the present invention detects a dynamic publishing instruction in the social application, it may respond to the dynamic publishing instruction in the social application by obtaining the mood data and the target image. Specifically, the terminal may obtain the mood data and the image to be processed, and process the image to be processed based on the mood data to obtain the target image. For the acquisition of the target image, reference may be made to the corresponding descriptions in any of the method embodiments of Fig. 1, Fig. 4, and Fig. 5 above, which are not repeated here.
The dynamic publishing instruction may be an instruction generated when the terminal detects that the user performs a dynamic publishing operation in the social application; the dynamic publishing operation may be a click operation on a designated dynamic publishing key in the social application, a sliding operation, or the like. The social application refers to software that enables users to communicate and interact over a network, and may include but is not limited to blog applications, microblog applications, forum applications, social network applications (such as Facebook), and instant messaging applications (such as WeChat and QQ).
S802, identify the target emotion reflected by the mood data, and match the image to be processed with a target filter mode corresponding to the target emotion.
S803, perform filter processing on the image to be processed using the target filter mode to obtain a target image.
S804, publish the target image in the social application.
Optionally, considering the interest and completeness of dynamic (image) publishing and the enhancement of the user's social interaction, the terminal may synthesize the mood data with the target image to obtain a composite image, so as to publish the composite image in the social application. Specifically:
In one embodiment, if the mood data only includes mood image data, the terminal responds to the dynamic publishing instruction by publishing the target image in the social application.
In another embodiment, if the mood data includes mood voice data or mood text data, the terminal may synthesize the mood voice data or the mood text data into the target image to obtain a composite image, and then publish the composite image in the social application to complete the corresponding dynamic publishing. For elaboration on the composite image, reference may be made to the related explanations in the method embodiment of Fig. 7 above; for a scene application example of publishing a dynamic in the social application, in which the user operates in the social application to complete the publishing of the composite image step by step, reference may be made to the related introduction of the embodiments of Fig. 6(a)-Fig. 6(h) above. Details are not repeated here.
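For illustration only, the following Python sketch walks the Fig. 8 flow end to end, from the dynamic publishing instruction through filter processing and optional composition to publishing. Every function, key name, and mood/filter value is a hypothetical stand-in for the terminal's actual modules.

def identify_emotion(mood_data):              # S802, part 1: emotion recognition stub
    return "happy" if "smile" in str(mood_data.get("text", "")) else "calm"

def match_filter(emotion):                    # S802, part 2: emotion-to-filter lookup
    return {"happy": "vivid", "sad": "mono"}.get(emotion, "natural")

def apply_filter(image, filter_mode):         # S803: filter processing stub
    return f"{image}+{filter_mode}"

def synthesize(target_image, mood_data):      # embed voice/text to form a composite image
    extras = {k: mood_data[k] for k in ("voice", "text") if k in mood_data}
    return {"image": target_image, **extras}

def on_dynamic_publish(mood_data, pending_image, publish):
    # S801 has already fired: mood_data and pending_image were acquired.
    emotion = identify_emotion(mood_data)
    target_image = apply_filter(pending_image, match_filter(emotion))
    has_extras = "voice" in mood_data or "text" in mood_data
    post = synthesize(target_image, mood_data) if has_extras else target_image
    publish(post)                             # S804: publish in the social application

on_dynamic_publish({"text": "big smile today"}, "photo.jpg", print)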
By implementing the embodiment of the present invention, content can be presented through multiple senses; for example, an image carrying sound or text can be presented by combining hearing and vision, allowing the user to present published content in the social application more accurately and richly, which helps promote the interest, interactivity, and utilization rate of the social application. Moreover, enhancing the published content (the image) based on emotion recognition also solves problems present in the traditional technology, such as poor image enhancement effects and failure to express the user's true intention.
Based on the description of the above image processing method embodiments, an embodiment of the present invention also discloses an image processing apparatus. The apparatus may be a computer program (including program code) running in a terminal, and may execute the content described in any of the embodiments of Fig. 1-Fig. 8 above. Refer to Fig. 9; the apparatus may run the following units:
an acquiring unit 801, configured to obtain mood data and an image to be processed, the mood data including mood voice data, mood image data, or mood text data;
a recognition unit 802, configured to identify the target emotion reflected by the mood data;
a matching unit 803, configured to match the image to be processed with a target filter mode corresponding to the target emotion according to the target emotion;
a processing unit 804, configured to perform filter processing on the image to be processed using the target filter mode to obtain a target image.
In one embodiment, the acquiring unit 801 is specifically configured to: in response to a dynamic publishing instruction in a social application, collect mood data; and if the mood data includes mood image data, determine the mood image data as the image to be processed.
In another embodiment, the acquiring unit 801 is specifically configured to: in response to a dynamic publishing instruction in a social application, collect mood data; and obtain the image to be processed according to the dynamic publishing instruction.
In another embodiment, the processing unit 804 is also configured to: if the mood data includes mood voice data or mood text data, synthesize the mood voice data or the mood text data into the target image to obtain a composite image; and publish the composite image in the social application.
In another embodiment, the processing unit 804 is also configured to: in response to a first check operation directed at the composite image, display the target image in the composite image; or, in response to a second check operation directed at the composite image, display the target image in the composite image and play target speech data, the target speech data being the mood voice data or the voice data corresponding to the mood text data; or, in response to a third check operation directed at the composite image, display the target image and target text data in the composite image, the target text data being the mood text data or the text data corresponding to the mood voice data.
In another embodiment, the matching unit 803 is specifically configured to: obtain a mood filter mapping table, in which mapping relations between moods and filter modes are recorded, each mapping relation being one filter mode corresponding to at least one mood; and obtain from the mood filter mapping table the target filter mode corresponding to the target emotion.
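For illustration only, a minimal Python sketch of such a mood filter mapping table follows; the concrete moods, filter names, and fallback value are assumptions rather than values disclosed by the embodiment.

# Each filter mode maps to at least one mood, as the mapping relation requires.
MOOD_FILTER_TABLE = {
    "vivid": ["happy", "excited"],
    "mono":  ["sad"],
    "warm":  ["calm", "content"],
}

def match_target_filter(target_emotion, table=MOOD_FILTER_TABLE):
    # Invert the relation: find the filter mode whose mood list covers the emotion.
    for filter_mode, moods in table.items():
        if target_emotion in moods:
            return filter_mode
    return "natural"   # assumed default when no mapping matches

assert match_target_filter("sad") == "mono"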
In another embodiment, if the mood data includes mood voice data, the recognition unit 802 is specifically configured to: convert the mood voice data into corresponding mood text data, and extract the text features in the corresponding mood text data; extract the acoustic features in the mood voice data; and call a first mood model to perform fusion recognition on the text features and the acoustic features to obtain the target emotion.
In another embodiment, if the mood data includes mood voice data, the recognition unit 802 is specifically configured to: convert the mood voice data into corresponding mood text data, and call a second mood model to perform semantic analysis on the corresponding mood text data to obtain a first mood; call a third mood model to perform acoustic feature analysis on the mood voice data to obtain a second mood; when the similarity between the first mood and the second mood is greater than or equal to a first threshold, determine the first mood or the second mood as the target emotion; and when the similarity between the first mood and the second mood is less than the first threshold, determine the first mood as the target emotion.
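For illustration only, the following Python sketch captures the arbitration between the semantic result (first mood) and the acoustic result (second mood) described above. The similarity function and the value of the first threshold are assumptions, not values disclosed by the embodiment.

def decide_target_emotion(first_mood, second_mood, similarity, first_threshold=0.8):
    if similarity(first_mood, second_mood) >= first_threshold:
        # Above the threshold the embodiment treats the two results as
        # interchangeable; returning either one is acceptable.
        return second_mood
    # Below the threshold, the semantic result takes precedence.
    return first_mood

def toy_similarity(a, b):
    # Hypothetical pairwise similarity lookup, for demonstration only.
    related = {frozenset(("happy", "excited")): 0.85}
    return 1.0 if a == b else related.get(frozenset((a, b)), 0.0)

print(decide_target_emotion("happy", "excited", toy_similarity))  # -> excited
print(decide_target_emotion("happy", "sad", toy_similarity))      # -> happy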
In another embodiment, if the mood data includes mood image data, the recognition unit 802 is specifically configured to: extract the target facial expression in the mood image data, and obtain a third mood reflected by the target facial expression; extract the target body behavior in the mood image data, and obtain a fourth mood reflected by the target body behavior; when the similarity between the third mood and the fourth mood is greater than or equal to a second threshold, determine the third mood or the fourth mood as the target emotion; and when the similarity between the third mood and the fourth mood is less than the second threshold, determine the third mood as the target emotion.
In another embodiment, the recognition unit 802 is specifically configured to: perform semantic analysis on the mood text data to obtain at least one candidate mood word; perform similarity matching between the candidate mood words and the reference mood words included in the first mood model to obtain the similarity between the candidate mood words and the reference mood words; and determine the mood reflected by a target mood word as the first mood. The target mood word is a word among the at least one candidate mood word whose similarity is greater than or equal to a third threshold and whose matched reference mood word has a weight greater than or equal to a fourth threshold, the weight of a reference mood word being used to indicate the intensity of the mood reflected by that reference mood word.
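For illustration only, the following Python sketch shows how the third (similarity) and fourth (weight) thresholds described above could jointly gate the vocabulary matching; the reference words, their weights, the similarity metric, and both threshold values are assumptions.

REFERENCE_VOCAB = {      # reference mood word -> weight (intensity of the mood)
    "joyful": 0.9,
    "pleased": 0.5,
    "furious": 0.95,
}

def word_similarity(a, b):
    # Crude character-overlap similarity, purely illustrative.
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def first_mood_from_candidates(candidates, third_threshold=0.6, fourth_threshold=0.7):
    for cand in candidates:
        for ref, weight in REFERENCE_VOCAB.items():
            if word_similarity(cand, ref) >= third_threshold and weight >= fourth_threshold:
                return ref   # the mood reflected by the matched target mood word
    return None

print(first_mood_from_candidates(["joyfull", "tired"]))   # -> joyful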
In another embodiment, the recognition unit 802 is specifically configured to: perform feature extraction on the mood voice data in the time domain to obtain time-domain acoustic features; perform feature extraction on the mood voice data in the frequency domain to obtain frequency-domain acoustic features; and analyze the time-domain acoustic features and the frequency-domain acoustic features to obtain the second mood.
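For illustration only, the following numpy sketch extracts one simple set of time-domain features (short-time energy and zero-crossing rate) and one frequency-domain feature (spectral centroid); a real implementation of the third mood model would use richer acoustic features such as MFCCs or pitch contours.

import numpy as np

def time_domain_features(signal):
    energy = float(np.mean(signal ** 2))                        # short-time energy
    crossings = int(np.sum(np.abs(np.diff(np.sign(signal)))) // 2)
    return {"energy": energy, "zcr": crossings / len(signal)}   # zero-crossing rate

def frequency_domain_features(signal, sample_rate=16000):
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    centroid = float(np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12))
    return {"spectral_centroid": centroid}

# Toy input: one second of a 440 Hz tone sampled at 16 kHz.
t = np.linspace(0, 1, 16000, endpoint=False)
sig = np.sin(2 * np.pi * 440 * t)
print(time_domain_features(sig))
print(frequency_domain_features(sig))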
According to another embodiment of the present invention, the units of the image processing apparatus shown in Fig. 9 may be separately or wholly combined into one or several other units, or one (or more) of them may be further split into multiple functionally smaller units; this can realize the same operations without affecting the technical effects of the embodiment of the present invention. The above units are divided based on logical functions; in practical applications, the function of one unit may be realized by multiple units, or the functions of multiple units may be realized by one unit. In other embodiments of the present invention, the image processing apparatus may also include other units; in practical applications, these functions may likewise be realized with the assistance of other units, and may be realized by multiple units in cooperation.
According to another embodiment of the present invention, the image processing apparatus shown in Fig. 9 may be constructed, and the image processing method of the embodiment of the present invention realized, by running a computer program (including program code) capable of executing the steps involved in any of the embodiments of Fig. 1-Fig. 8 on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a central processing unit (CPU), a random access storage medium (RAM), and a read-only storage medium (ROM). The computer program may be recorded on, for example, a computer-readable recording medium, loaded into the above computing device via the computer-readable recording medium, and run therein.
The embodiment of the present invention can obtain mood data and an image to be processed, identify the target emotion reflected by the mood data, match the image to be processed with a target filter mode corresponding to the target emotion according to the target emotion, and finally perform filter processing on the image to be processed using the target filter mode to obtain a target image. Performing filter processing on an image based on mood in this way can solve problems present in the traditional technology, such as poor image enhancement effects, inability to accurately express the user's true intention, and impaired interaction enthusiasm.
Based on the descriptions of the above method embodiments and apparatus embodiments, an embodiment of the present invention also provides a terminal. Referring to Fig. 10, the terminal includes at least a processor 901, an input device 902, an output device 903, and a computer storage medium 904; the processor 901, input device 902, output device 903, and computer storage medium 904 in the terminal may be connected by a bus or in other ways.
The computer storage medium 904 may be stored in the memory of the terminal and is configured to store a computer program, the computer program including program instructions; the processor 901 is configured to execute the program instructions stored by the computer storage medium 904. The processor 901 (or CPU, Central Processing Unit) is the computing core and control core of the terminal, adapted to implement one or more instructions, and in particular adapted to load and execute one or more instructions so as to realize the corresponding method flows or functions. In one embodiment, the processor 901 described in the embodiment of the present invention may be configured to perform a series of image processing operations, including: obtaining mood data and an image to be processed; identifying the target emotion reflected by the mood data; matching the image to be processed with a target filter mode corresponding to the target emotion according to the target emotion; performing filter processing on the image to be processed using the target filter mode to obtain a target image; and so on.
An embodiment of the present invention also provides a computer storage medium (memory), which is a memory device in the terminal configured to store programs and data. It can be understood that the computer storage medium here may include both a built-in storage medium of the terminal and, of course, an extended storage medium supported by the terminal. The computer storage medium provides storage space, which stores the operating system of the terminal. The storage space also stores one or more instructions suitable for being loaded and executed by the processor 901; these instructions may be one or more computer programs (including program code). It should be noted that the computer storage medium here may be a high-speed RAM memory, or a non-volatile memory such as at least one magnetic disk memory; optionally, it may also be at least one computer storage medium located remotely from the aforementioned processor.
In one embodiment, one or more instructions stored in the computer storage medium may be loaded and executed by the processor 901 to realize the corresponding steps of the methods in the above image processing embodiments. In a specific implementation, the one or more instructions in the computer storage medium are loaded by the processor 901 to execute the following steps:
obtaining mood data and an image to be processed, the mood data including mood voice data, mood image data, or mood text data;
identifying the target emotion reflected by the mood data;
matching the image to be processed with a target filter mode corresponding to the target emotion according to the target emotion;
performing filter processing on the image to be processed using the target filter mode to obtain a target image.
In a further embodiment, the one or more instructions are loaded by the processor 901 to specifically execute: in response to a dynamic publishing instruction in a social application, collecting mood data; and if the mood data includes mood image data, determining the mood image data as the image to be processed.
In a further embodiment, the one or more instructions are loaded by the processor 901 to specifically execute: in response to a dynamic publishing instruction in a social application, collecting mood data; and obtaining the image to be processed according to the dynamic publishing instruction.
In a further embodiment, the one or more instructions may also be loaded and executed by the processor 901 to: if the mood data includes mood voice data or mood text data, synthesize the mood voice data or the mood text data into the target image to obtain a composite image; and publish the composite image in the social application.
In a further embodiment, the one or more instructions may also be loaded and executed by the processor 901 to: in response to a first check operation directed at the composite image, display the target image in the composite image; or, in response to a second check operation directed at the composite image, display the target image in the composite image and play target speech data, the target speech data being the mood voice data or the voice data corresponding to the mood text data; or, in response to a third check operation directed at the composite image, display the target image and target text data in the composite image, the target text data being the mood text data or the text data corresponding to the mood voice data.
In a further embodiment, the one or more instructions are loaded by the processor 901 to specifically execute: obtaining a mood filter mapping table, in which mapping relations between moods and filter modes are recorded, each mapping relation being one filter mode corresponding to at least one mood; and obtaining from the mood filter mapping table the target filter mode corresponding to the target emotion.
In a further embodiment, the one or more instructions are loaded by the processor 901 to specifically execute: converting the mood voice data into corresponding mood text data, and extracting the text features in the corresponding mood text data; extracting the acoustic features in the mood voice data; and calling a first mood model to perform fusion recognition on the text features and the acoustic features to obtain the target emotion.
In a further embodiment, the one or more instructions are loaded by the processor 901 to specifically execute: converting the mood voice data into corresponding mood text data, and calling a second mood model to perform semantic analysis on the corresponding mood text data to obtain a first mood; calling a third mood model to perform acoustic feature analysis on the mood voice data to obtain a second mood; when the similarity between the first mood and the second mood is greater than or equal to a first threshold, determining the first mood or the second mood as the target emotion; and when the similarity between the first mood and the second mood is less than the first threshold, determining the first mood as the target emotion.
In a further embodiment, the one or more instructions are loaded by the processor 901 to specifically execute: extracting the target facial expression in the mood image data, and obtaining a third mood reflected by the target facial expression; extracting the target body behavior in the mood image data, and obtaining a fourth mood reflected by the target body behavior; when the similarity between the third mood and the fourth mood is greater than or equal to a second threshold, determining the third mood or the fourth mood as the target emotion; and when the similarity between the third mood and the fourth mood is less than the second threshold, determining the third mood as the target emotion.
In a further embodiment, the one or more instructions are loaded by the processor 901 to specifically execute: performing semantic analysis on the mood text data to obtain at least one candidate mood word; performing similarity matching between the candidate mood words and the reference mood words included in the first mood model to obtain the similarity between the candidate mood words and the reference mood words; and determining the mood reflected by a target mood word as the first mood, where the target mood word is a word among the at least one candidate mood word whose similarity is greater than or equal to a third threshold and whose matched reference mood word has a weight greater than or equal to a fourth threshold, the weight of a reference mood word indicating the intensity of the mood reflected by that reference mood word.
In a further embodiment, the one or more instructions are loaded by the processor 901 to specifically execute: performing feature extraction on the mood voice data in the time domain to obtain time-domain acoustic features; performing feature extraction on the mood voice data in the frequency domain to obtain frequency-domain acoustic features; and analyzing the time-domain acoustic features and the frequency-domain acoustic features to obtain the second mood.
The embodiment of the present invention can obtain mood data and an image to be processed, identify the target emotion reflected by the mood data, match the image to be processed with a target filter mode corresponding to the target emotion according to the target emotion, and finally perform filter processing on the image to be processed using the target filter mode to obtain a target image. Performing filter processing on an image based on mood in this way can solve problems present in the traditional technology, such as poor image enhancement effects, inability to accurately express the user's true intention, and impaired interaction enthusiasm. The above embodiments are merely intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that it is still possible to modify the technical solutions described in the foregoing embodiments or to make equivalent replacements of some of their technical features, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (15)

1. An image processing method, characterized in that the method includes:
obtaining mood data and an image to be processed, the mood data including mood voice data, mood image data, or mood text data;
identifying the target emotion reflected by the mood data;
matching the image to be processed with a target filter mode corresponding to the target emotion according to the target emotion;
performing filter processing on the image to be processed using the target filter mode to obtain a target image.
2. The method according to claim 1, characterized in that the obtaining mood data and an image to be processed includes:
in response to a dynamic publishing instruction in a social application, collecting mood data;
if the mood data includes mood image data, determining the mood image data as the image to be processed.
3. The method according to claim 1, characterized in that the obtaining mood data and an image to be processed includes:
in response to a dynamic publishing instruction in a social application, collecting mood data;
obtaining the image to be processed according to the dynamic publishing instruction.
4. The method according to claim 2 or 3, characterized in that the method further includes:
if the mood data includes mood voice data or mood text data, synthesizing the mood voice data or the mood text data into the target image to obtain a composite image;
publishing the composite image in the social application.
5. The method according to claim 4, characterized in that the method further includes:
in response to a first check operation directed at the composite image, displaying the target image in the composite image; or,
in response to a second check operation directed at the composite image, displaying the target image in the composite image and playing target speech data, the target speech data being the mood voice data or the voice data correspondingly converted from the mood text data; or,
in response to a third check operation directed at the composite image, displaying the target image and target text data in the composite image, the target text data being the mood text data or the text data correspondingly converted from the mood voice data.
6. The method according to any one of claims 1-5, characterized in that the matching the image to be processed with a target filter mode corresponding to the target emotion according to the target emotion includes:
obtaining a mood filter mapping table, in which mapping relations between moods and filter modes are recorded, each mapping relation being one filter mode corresponding to at least one mood;
obtaining from the mood filter mapping table the target filter mode corresponding to the target emotion.
7. The method according to any one of claims 1-6, characterized in that the mood data includes mood voice data, and the identifying the target emotion reflected by the mood data includes:
converting the mood voice data into corresponding mood text data, and extracting the text features in the corresponding mood text data;
extracting the acoustic features in the mood voice data;
calling a first mood model to perform fusion recognition on the text features and the acoustic features to obtain the target emotion.
8. The method according to any one of claims 1-6, characterized in that the mood data includes mood voice data, and the identifying the target emotion reflected by the mood data includes:
converting the mood voice data into corresponding mood text data, and calling a second mood model to perform semantic analysis on the corresponding mood text data to obtain a first mood;
calling a third mood model to perform acoustic feature analysis on the mood voice data to obtain a second mood;
when the similarity between the first mood and the second mood is greater than or equal to a first threshold, determining the first mood or the second mood as the target emotion;
when the similarity between the first mood and the second mood is less than the first threshold, determining the first mood as the target emotion.
9. The method according to any one of claims 1-6, characterized in that the mood data includes mood image data, and the identifying the target emotion reflected by the mood data includes:
extracting the target facial expression in the mood image data, and obtaining a third mood reflected by the target facial expression;
extracting the target body behavior in the mood image data, and obtaining a fourth mood reflected by the target body behavior;
when the similarity between the third mood and the fourth mood is greater than or equal to a second threshold, determining the third mood or the fourth mood as the target emotion;
when the similarity between the third mood and the fourth mood is less than the second threshold, determining the third mood as the target emotion.
10. The method according to claim 8, characterized in that the calling a second mood model to perform semantic analysis on the mood text data to obtain a first mood includes:
performing semantic analysis on the mood text data to obtain at least one candidate mood word;
performing similarity matching between the candidate mood words and the reference mood words included in the first mood model to obtain the similarity between the candidate mood words and the reference mood words;
determining the mood reflected by a target mood word as the first mood;
wherein the target mood word is a word among the at least one candidate mood word whose similarity is greater than or equal to a third threshold and whose matched reference mood word has a weight greater than or equal to a fourth threshold, the weight of a reference mood word being used to indicate the intensity of the mood reflected by that reference mood word.
11. The method according to claim 8, characterized in that the calling a third mood model to perform acoustic feature analysis on the mood voice data to obtain a second mood includes:
performing feature extraction on the mood voice data in the time domain to obtain time-domain acoustic features;
performing feature extraction on the mood voice data in the frequency domain to obtain frequency-domain acoustic features;
analyzing the time-domain acoustic features and the frequency-domain acoustic features to obtain the second mood.
12. An image processing method, characterized in that the method includes:
in response to a dynamic publishing instruction in a social application, obtaining mood data and an image to be processed, the mood data including mood voice data, mood image data, or mood text data;
identifying the target emotion reflected by the mood data, and matching the image to be processed with a target filter mode corresponding to the target emotion;
performing filter processing on the image to be processed using the target filter mode to obtain a target image;
publishing the target image in the social application.
13. An image processing apparatus, characterized by comprising:
an acquiring unit, configured to obtain mood data and an image to be processed, the mood data including mood voice data, mood image data, or mood text data;
a recognition unit, configured to identify the target emotion reflected by the mood data;
a matching unit, configured to match the image to be processed with a target filter mode corresponding to the target emotion according to the target emotion;
a processing unit, configured to perform filter processing on the image to be processed using the target filter mode to obtain a target image.
14. A terminal, including an input device and an output device, characterized by further including:
a processor, adapted to implement one or more instructions; and
a computer storage medium storing one or more instructions, the one or more instructions being suitable for being loaded by the processor to execute the image processing method according to any one of claims 1-11.
15. A computer storage medium, characterized in that the computer storage medium stores one or more instructions, the one or more instructions being suitable for being loaded by a processor to execute the image processing method according to any one of claims 1-11.
CN201910693744.9A 2019-07-30 2019-07-30 Image processing method, device, terminal and computer storage medium Pending CN110442867A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910693744.9A CN110442867A (en) 2019-07-30 2019-07-30 Image processing method, device, terminal and computer storage medium

Publications (1)

Publication Number Publication Date
CN110442867A true CN110442867A (en) 2019-11-12

Family

ID=68432176

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910693744.9A Pending CN110442867A (en) 2019-07-30 2019-07-30 Image processing method, device, terminal and computer storage medium

Country Status (1)

Country Link
CN (1) CN110442867A (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109254669A (en) * 2017-07-12 2019-01-22 腾讯科技(深圳)有限公司 A kind of expression picture input method, device, electronic equipment and system
CN107992824A (en) * 2017-11-30 2018-05-04 努比亚技术有限公司 Take pictures processing method, mobile terminal and computer-readable recording medium
CN108537749A (en) * 2018-03-29 2018-09-14 广东欧珀移动通信有限公司 Image processing method, device, mobile terminal and computer readable storage medium
CN108805089A (en) * 2018-06-14 2018-11-13 南京云思创智信息科技有限公司 Based on multi-modal Emotion identification method
CN109325904A (en) * 2018-08-28 2019-02-12 百度在线网络技术(北京)有限公司 Image filters treating method and apparatus
CN109766759A (en) * 2018-12-12 2019-05-17 成都云天励飞技术有限公司 Emotion identification method and Related product
CN109660728A (en) * 2018-12-29 2019-04-19 维沃移动通信有限公司 A kind of photographic method and device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110879840A (en) * 2019-11-19 2020-03-13 珠海格力电器股份有限公司 Information feedback method, device and storage medium
CN110991427A (en) * 2019-12-25 2020-04-10 北京百度网讯科技有限公司 Emotion recognition method and device for video and computer equipment
EP4174849A1 (en) * 2021-11-02 2023-05-03 Capital One Services, LLC Automatic generation of a contextual meeting summary
US11967314B2 (en) 2021-11-02 2024-04-23 Capital One Services, Llc Automatic generation of a contextual meeting summary

Similar Documents

Publication Publication Date Title
US20240168933A1 (en) Ai story platform with customizable personality for education, entertainment, and therapy
US20220366281A1 (en) Modeling characters that interact with users as part of a character-as-a-service implementation
CN106658129B (en) Terminal control method and device based on emotion and terminal
US10706873B2 (en) Real-time speaker state analytics platform
JP2022551788A (en) Generate proactive content for ancillary systems
CN116547746A (en) Dialog management for multiple users
CN114556333A (en) Smart camera enabled by assistant system
US9754585B2 (en) Crowdsourced, grounded language for intent modeling in conversational interfaces
US11562744B1 (en) Stylizing text-to-speech (TTS) voice response for assistant systems
TW202132967A (en) Interaction methods, apparatuses thereof, electronic devices and computer readable storage media
CN111081280B (en) Text-independent speech emotion recognition method and device and emotion recognition algorithm model generation method
CN110442867A (en) Image processing method, device, terminal and computer storage medium
Katayama et al. Situation-aware emotion regulation of conversational agents with kinetic earables
CN107463684A (en) Voice replying method and device, computer installation and computer-readable recording medium
CN112860213B (en) Audio processing method and device, storage medium and electronic equipment
CN114138960A (en) User intention identification method, device, equipment and medium
CN112673641A (en) Inline response to video or voice messages
KR102413860B1 (en) Voice agent system and method for generating responses based on user context
CN109961152B (en) Personalized interaction method and system of virtual idol, terminal equipment and storage medium
CN110781329A (en) Image searching method and device, terminal equipment and storage medium
US11887600B2 (en) Techniques for interpreting spoken input using non-verbal cues
CN113301352B (en) Automatic chat during video playback
Firc Applicability of Deepfakes in the Field of Cyber Security
Mornatta Steps towards the use of NLP in emotional theatre improvisation
CN110795581B (en) Image searching method and device, terminal equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination