CN108320318A - Image processing method, device, computer equipment and storage medium - Google Patents
- Publication number: CN108320318A (application CN201810036627.0A)
- Authority
- CN
- China
- Prior art keywords
- text
- image
- textual
- target
- initial position
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G06T11/60—Editing figures and text; Combining figures or text
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Abstract
This application relates to an image processing method. The method includes: obtaining a target image, the target image including a target subject; recognizing the target subject in the target image to identify the region where the target subject is located; collecting voice data in real time and recognizing the collected voice data as text; determining, according to the target subject region, a start position at which the text is presented; and displaying the text in the target image with the start position as a starting point. By converting the collected voice data into text and then displaying the text in the image, the method adds words to a captured image without any additional editing operation and is therefore simple to operate. An image processing apparatus, a computer device, and a storage medium are also provided.
Description
Technical field
This application relates to the field of computer processing technology, and in particular to an image processing method and apparatus, a computer device, and a storage medium.
Background
With the development of terminals, especially mobile terminals, taking photos or shooting videos with the camera of a mobile terminal has become commonplace. However, conventional photo or video shooting with a mobile terminal only captures the scene as-is: if a user wants to add content to a captured picture, the picture must be edited afterwards with a retouching tool, which is cumbersome.
Summary of the invention
In view of the above problem, it is necessary to provide an image processing method, apparatus, computer device, and storage medium that are simple to operate.
An image processing method, the method including:
obtaining a target image, the target image including a target subject;
recognizing the target subject in the target image to identify a target subject region;
collecting voice data in real time, and recognizing the collected voice data as text;
determining, according to the target subject region, a start position at which the text is presented; and
displaying the text in the target image with the start position as a starting point.
An image processing apparatus, the apparatus including:
an acquisition module, configured to obtain a target image, the target image including a target subject;
an image recognition module, configured to recognize the target subject in the target image to identify a target subject region;
a speech recognition module, configured to collect voice data in real time and recognize the collected voice data as text;
a position determination module, configured to determine, according to the target subject region, a start position at which the text is presented; and
a display module, configured to display the text in the target image with the start position as a starting point.
A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the following steps:
obtaining a target image, the target image including a target subject;
recognizing the target subject in the target image to identify a target subject region;
collecting voice data in real time, and recognizing the collected voice data as text;
determining, according to the target subject region, a start position at which the text is presented; and
displaying the text in the target image with the start position as a starting point.
A computer device including a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the following steps:
obtaining a target image, the target image including a target subject;
recognizing the target subject in the target image to identify a target subject region;
collecting voice data in real time, and recognizing the collected voice data as text;
determining, according to the target subject region, a start position at which the text is presented; and
displaying the text in the target image with the start position as a starting point.
With the above image processing method, apparatus, computer device, and storage medium, a target image is obtained, the target subject in the target image is recognized to identify the target subject region, voice data is collected in real time and recognized as text, the start position at which the text is presented is then determined according to the target subject region, and the text is displayed in the target image with the start position as a starting point. Because the method converts voice data into text in real time and displays the text in the target image, words can be added to a captured image without any additional editing, which is simple to operate; and because the start position of the text in the image is determined from the target subject region, the display of the text is dynamically bound to the image.
An image processing method, the method including:
obtaining a target image, the target image including a mouth;
detecting the mouth in the target image, and performing lip-reading recognition according to the mouth movement to obtain corresponding recognized text; and
displaying the recognized text in the target image in synchronization with the mouth movement.
An image processing apparatus, the apparatus including:
an image obtaining module, configured to obtain a target image, the target image including a mouth;
a lip-reading recognition module, configured to detect the mouth in the target image and perform lip-reading recognition according to the mouth movement to obtain corresponding recognized text; and
a synchronous display module, configured to display the recognized text in the target image in synchronization with the mouth movement.
A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the following steps:
obtaining a target image, the target image including a mouth;
detecting the mouth in the target image, and performing lip-reading recognition according to the mouth movement to obtain corresponding recognized text; and
displaying the recognized text in the target image in synchronization with the mouth movement.
A computer device including a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the following steps:
obtaining a target image, the target image including a mouth;
detecting the mouth in the target image, and performing lip-reading recognition according to the mouth movement to obtain corresponding recognized text; and
displaying the recognized text in the target image in synchronization with the mouth movement.
With the above image processing method, apparatus, computer device, and storage medium, a target image is obtained, the mouth in the target image is detected, lip-reading recognition is performed according to the mouth movement to obtain corresponding recognized text, and the recognized text is then displayed in the target image in synchronization. By performing lip-reading recognition on the mouth movement in the image and displaying the corresponding text in synchronization with the mouth movement, text is conveniently added to the image, and the text is kept consistent with the facial movement.
Description of the drawings
Fig. 1 is a flowchart of an image processing method in one embodiment;
Fig. 2A is a schematic interface diagram in which the first word is displayed in the target image in one embodiment;
Fig. 2B is a schematic interface diagram in which the previous word is shifted away and the next word is displayed at the start position in one embodiment;
Fig. 2C is a schematic interface diagram in which multiple words are presented in the target image in one embodiment;
Fig. 3 is a schematic interface diagram of fragment texts displayed in the target image in one embodiment;
Fig. 4 is a flowchart of an image processing method in another embodiment;
Fig. 5 is a schematic diagram of extracted facial feature points in one embodiment;
Fig. 6 is a flowchart of a method for controlling a text image to be dynamically displayed from the start display position according to display control parameters in one embodiment;
Fig. 7 is a schematic flowchart of particle rendering in a particle system in one embodiment;
Fig. 8 is a flowchart of a method for controlling a text image to be dynamically displayed from the start display position according to display control parameters in another embodiment;
Fig. 9 is a flowchart of an image processing method in yet another embodiment;
Fig. 10 is a schematic flowchart of an image processing method in one embodiment;
Fig. 11 is a schematic diagram of the effect of presenting spoken words in an image in one embodiment;
Fig. 12 is a flowchart of an image processing method in a further embodiment;
Fig. 13 is a flowchart of an image processing method in a still further embodiment;
Fig. 14 is a structural block diagram of an image processing apparatus in one embodiment;
Fig. 15 is a structural block diagram of an image processing apparatus in another embodiment;
Fig. 16 is a structural block diagram of an image processing apparatus in yet another embodiment;
Fig. 17 is a structural block diagram of a display module in one embodiment;
Fig. 18 is a structural block diagram of an image processing apparatus in a still further embodiment;
Fig. 19 is an internal structure diagram of a computer device in one embodiment.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of this application clearer, this application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only intended to explain this application and are not intended to limit it.
As shown in Fig. 1, in one embodiment, an image processing method is provided. This embodiment is described mainly by way of the method being applied to a terminal. Referring to Fig. 1, the image processing method specifically includes the following steps:
Step S102: obtain a target image, the target image including a target subject.
Here, the target image is the image to be processed. The target image may be obtained by taking a photo or by shooting a video, since a video can be regarded as a sequence of frames. The image may be captured by a front or rear camera of the terminal. The target subject is the target object to be recognized in the image. The target subject may be configured as desired: for example, it may be set to a person, to a face, or, more specifically, to facial features; it may of course also be set to an animal, a tree, and so on, depending on actual requirements. The target image may be an image or video captured in real time, or an image or video that has already been shot. In one embodiment, the obtained target image is a preview image to be captured that is obtained by invoking the camera; a preview image is an image that has not yet been saved.
Step S104: recognize the target subject in the target image to identify the target subject region.
Here, the target subject in the target image is recognized using a subject recognition method. For example, assuming the target subject is a face, the face in the target image is recognized using a face recognition method. The region where the target subject is located is identified so that the display position of the text can subsequently be determined from the target subject region.
Step S106: collect voice data in real time, and recognize the collected voice data as text.
Here, the voice data is obtained by collecting the user's speech in real time through the microphone of the terminal. After the user's speech is received, the collected voice data is recognized as text using speech recognition technology. The text is the word sequence obtained by recognizing the voice data. Speech recognition may be implemented with existing technology: for example, on iOS, speech recognition can be performed by calling the API in SpeechKit (a speech recognition tool); on Android, it can be implemented by calling other speech recognition interfaces. The way the voice data is recognized is not limited here.
Step S108: determine, according to the target subject region, the start position at which the text is presented.
Here, the positional relationship between the target subject and the text may be set in advance. During image capture, once the position of the target subject is obtained, the start display position of the text can be determined, and the corresponding text is then displayed in the image according to that start display position. For example, it may be preset that the start position of the text is at the upper left of the target subject; once the position of the target subject is determined, the start position of the text is known, and once the text is obtained it is displayed at that start position accordingly. In one embodiment, a text display box is also included: after the position of the target subject is determined, the position of the text display box is determined first, and the text is then displayed in the text display box, whose size may be adjusted automatically according to the length of the text.
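The mapping from subject region to text start position can be sketched as follows. This is a minimal illustration under the upper-left anchoring described in the example; the pixel offsets and per-character box dimensions are illustrative values, not specified by the application:

```python
def text_start_position(subject_box, dx=-10, dy=-10):
    """Return the start position for text given a subject bounding box.

    subject_box is (left, top, width, height); following the example in
    the description, the text is anchored at the upper left of the
    subject, shifted by an illustrative (dx, dy) offset.
    """
    left, top, _, _ = subject_box
    return (left + dx, top + dy)


def text_box_size(text, char_w=12, char_h=16):
    """Auto-size a text display box from the text length, using
    hypothetical per-character dimensions."""
    return (char_w * len(text), char_h)
```

A renderer would place the auto-sized box at the returned start position and redraw it as the subject moves between frames.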
Step S110: display the text in the target image with the start position as a starting point.
Here, the start position is the starting display position of the text. After the start position of the text is determined, the text is displayed in the target image with the start position as a starting point.
With the above image processing method, a target image is obtained, the target subject in the target image is recognized to identify the target subject region, voice data is collected in real time and recognized as text, the start position at which the text is presented is then determined according to the target subject region, and the text is displayed in the target image with the start position as a starting point. Because the method converts voice data into text in real time and displays the text in the target image, words can be added to a captured image without additional editing, which is simple to operate; and because the start position of the text in the image is determined from the target subject region, the display of the text is dynamically bound to the image.
In one embodiment, the step of displaying the text in the target image with the start position as a starting point includes: when the text corresponding to the voice data forms a word, displaying the word at the start position; when the text corresponding to the voice data forms a next word, moving the previously displayed words in a direction away from the start position and displaying them, and displaying the next word according to the start position; and repeating from the step performed when the text corresponding to the voice data forms a next word, so that as the voice data accumulates over time, the text corresponding to the voice data is displayed in real time in a word-by-word moving manner.
Here, voice data is collected in real time. As soon as the text corresponding to the collected voice data forms a word, the word is displayed at the start position. Then, when the text corresponding to the voice data forms a next word, the previously displayed words are moved in a direction away from the start position while the next word is displayed according to the start position. The next word may be displayed exactly at the start position, or near it. Over time, words are continuously formed from the collected voice data and continuously displayed in the image in real time in this moving manner. For example, Fig. 2A is a schematic interface diagram in one embodiment in which the first word formed is displayed at the start position in the target image; Fig. 2B shows the next word being formed, the previous word being shifted away from the start position, and the next word being displayed at the start position; and Fig. 2C shows multiple words presented on the target image as time passes.
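The word-by-word display described above can be sketched as a small state holder: the newest word sits at the start position and every earlier word drifts one step further away per new word. The unit-step drift is an illustrative choice; a renderer would translate each offset into screen coordinates relative to the start position:

```python
class WordTicker:
    """Shows the newest word at the start position (offset 0) and shifts
    the history one step away from the start position per new word."""

    def __init__(self, shift=1):
        self.shift = shift   # illustrative drift step per new word
        self.words = []
        self.offsets = []    # parallel to self.words

    def add_word(self, word):
        # move previously displayed words away from the start position
        self.offsets = [o + self.shift for o in self.offsets]
        # the new word appears at the start position
        self.words.append(word)
        self.offsets.append(0)

    def layout(self):
        """(word, offset) pairs; offset 0 is the start position."""
        return list(zip(self.words, self.offsets))
```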
In one embodiment, the step of displaying the text in the target image with the start position as a starting point includes: when the voice data collected in real time forms a speech fragment, obtaining the fragment text corresponding to the speech fragment; displaying the fragment text at the start position; obtaining the next fragment text corresponding to the next speech fragment, moving the previously displayed fragment texts in a direction away from the start position and displaying them, and displaying the next fragment text according to the start position; and repeating from the step of obtaining the next fragment text corresponding to the next speech fragment, so that as the voice data accumulates over time, the text corresponding to the voice data is displayed in real time in a fragment-by-fragment moving manner.
Here, voice data is collected in real time. When the voice data collected in real time forms a speech fragment, the fragment text corresponding to the speech fragment is obtained, and the recognized current fragment text is displayed at the start position. The voice data may be divided into fragments by silence detection: when silence occurs, the voice data before the silence is regarded as one speech fragment. Alternatively, a sentence expressing a complete meaning may be recognized as one speech fragment by semantic recognition. When the next fragment text corresponding to the next speech fragment is obtained, the previously displayed fragment texts are moved in a direction away from the start position and displayed, while the next fragment text is displayed according to the start position. In one embodiment, the next fragment text may be displayed exactly at the start position; in another embodiment, it may be displayed near the start position. In this manner, the fragment texts formed are continuously displayed in the target image in real time in a moving fashion. For example, Fig. 3 is a schematic interface diagram of fragment texts displayed in the target image in one embodiment.
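The silence-based fragmentation mentioned above can be sketched over a sequence of per-window energy values. The energy threshold and the window layout are assumptions for illustration; real audio would first be windowed and its energy computed per window:

```python
def split_on_silence(energies, threshold=0.1):
    """Split a sequence of per-window energies into speech fragments.

    A window whose energy is below `threshold` counts as silence; the
    speech before each silence becomes one fragment. Fragments are
    returned as (start_index, end_index_exclusive) pairs.
    """
    fragments, start = [], None
    for i, e in enumerate(energies):
        if e >= threshold:
            if start is None:
                start = i                   # speech begins
        elif start is not None:
            fragments.append((start, i))    # silence ends a fragment
            start = None
    if start is not None:
        fragments.append((start, len(energies)))
    return fragments
```

Each returned index range would then be passed to the speech recognizer to obtain the fragment text.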
In one embodiment, the image processing method further includes: performing word segmentation on the text to obtain multiple sub-texts. The step of displaying the text in the target image with the start position as a starting point includes: determining the start display time of each sub-text according to the speech timestamp corresponding to that sub-text; and dynamically displaying each sub-text over time along a preset trajectory, with the start position as a starting point, according to the start display time corresponding to that sub-text.
Here, word segmentation means cutting the word sequence into individual words; an individual word may be a single-unit word or a multi-unit word, a multi-unit word being one made up of two or more units. Word segmentation is performed on the text to obtain multiple sub-texts. Specifically, the voice data collected in real time is first recognized as text, word segmentation is then performed on the text to obtain multiple sub-texts, and the start display time of each sub-text is determined according to the speech timestamp corresponding to that sub-text. A speech timestamp is the collection time of the voice data corresponding to the text.
In one embodiment, the display order of the sub-texts may be determined by the chronological order of the speech timestamps of the voice data corresponding to the sub-texts. Specifically, a relationship between the speech timestamp and the start display time may be set; for example, the speech timestamp may be positively correlated with the start display time, i.e., the earlier the time represented by the speech timestamp, the earlier the start display time of the corresponding sub-text.
In another embodiment, multiple sub-texts within the same time period (for example, one second) may be displayed out of order, because multiple sub-texts in the same period jointly express one meaning: even when shuffled, the intended meaning can still be made out, achieving "order within disorder". For example, for "you are really beautiful", if three sub-texts are generated, namely "you", "really", and "beautiful", then displaying them out of order, for example as "beautiful", "really", "you", still conveys the meaning "you are really beautiful", and the shuffling further increases the fun of the display.
After the start display time of each sub-text is determined, each sub-text is dynamically displayed over time along the preset trajectory, with the start position as a starting point, according to its start display time.
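The timestamp-driven scheduling, including the optional shuffle of sub-texts that fall in the same one-second window, can be sketched as follows. The positive correlation is modeled as a fixed delay, and both the delay and the window length are illustrative assumptions:

```python
import random

def start_display_times(sub_texts, delay=0.5, window=1.0,
                        shuffle=False, seed=0):
    """Map (text, speech_timestamp) pairs to (text, start_display_time).

    Start times are positively correlated with timestamps (here simply
    timestamp + delay). If `shuffle` is set, sub-texts whose timestamps
    fall in the same `window`-second bucket are emitted out of order,
    giving the "order within disorder" effect.
    """
    rng = random.Random(seed)   # seeded for reproducibility
    buckets = {}
    for text, ts in sub_texts:
        buckets.setdefault(int(ts // window), []).append((text, ts))
    result = []
    for key in sorted(buckets):          # buckets stay chronological
        group = buckets[key]
        if shuffle:
            rng.shuffle(group)           # disorder within the bucket
        result.extend((text, ts + delay) for text, ts in group)
    return result
```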
In one embodiment, after the step of performing word segmentation on the text to obtain multiple sub-texts, the method further includes: extracting key texts from the multiple sub-texts according to semantic recognition.
The step of displaying the text in the target image with the start position as a starting point includes: determining the start display time of each key text according to the speech timestamp corresponding to that key text; and dynamically displaying each key text over time along a preset trajectory, with the start position as a starting point, according to its start display time.
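The data flow of key-text extraction can be sketched as below. The application extracts key texts by semantic recognition; this stopword filter is a deliberately simplified stand-in used only to show how sub-texts with their timestamps are reduced to key texts, not an implementation of semantic recognition itself:

```python
def extract_key_texts(sub_texts, stopwords):
    """Keep only sub-texts not in a stopword set.

    `sub_texts` is a list of (text, speech_timestamp) pairs; the
    surviving pairs keep their timestamps so their start display times
    can still be derived. The stopword set is a hypothetical stand-in
    for semantic recognition.
    """
    return [(text, ts) for text, ts in sub_texts
            if text.lower() not in stopwords]
```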
Here, a key text is a keyword obtained by semantic recognition. After the key texts are recognized, the start display time of each key text is determined according to its corresponding speech timestamp, and each key text is then dynamically displayed over time along the preset trajectory, with the start position as a starting point, according to its start display time. Because the sub-texts obtained by segmentation may be very long, not all words need to be displayed; it suffices to extract the key texts and display them. For example, for "this summer is really awfully hot!", the key text "summer really hot" is obtained by semantic extraction, and only the key text is displayed.
As shown in Fig. 4, in one embodiment, the image processing method further includes:
Step S112: obtain a shooting instruction, and obtain, according to the shooting instruction, the current image and the current text displayed in the current image.
Here, for a target image captured in real time, a shooting instruction must also be obtained in order to save the target image. The shooting instruction is an instruction to capture the current image, i.e., to take a photo, and is obtained by detecting the user's operation of triggering the shooting button. After the terminal obtains the shooting instruction, it obtains the current image and the current text displayed in the current image; the current image is the image corresponding to the current shooting moment, and the current text is the text displayed in the image at the current shooting moment.
Step S114: combine the current text and the current image into a composite image according to the current display position of the current text, and save the composite image.
Here, to obtain a composite image containing both the current text and the current image, the current text and the current image are combined according to the current display position of the current text in the current image, and the result is saved as the composite image.
In one embodiment, the image processing method further includes: obtaining a start-shooting instruction, continuously combining, according to the start-shooting instruction, the text displayed in the image with the image to form composite image frames, and saving each composite image frame; and obtaining an end-shooting instruction, and forming a composite video from the composite image frames.
Here, the start-shooting instruction is an instruction to start shooting a video. After the start-shooting instruction is obtained, the current image and the current text in the current image are continuously obtained to generate composite image frames; a composite image frame is a video frame of the video being shot, i.e., each video frame is a composite image. The current image is the image corresponding to the current moment, and the current text is the text corresponding to the current moment. As time passes, the current moment continuously changes, so the text displayed in the current image is continuously combined with the current image to obtain composite image frames, and each composite image frame is saved in real time. The end-shooting instruction is an instruction to end shooting the video. The composite video is made up of consecutive composite image frames. After the end-shooting instruction is obtained, shooting stops, and the composite video is generated from the saved composite image frames.
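The start/stop recording flow above can be sketched as follows. The frames here are (image, text) stand-ins rather than real pixel buffers, and the per-frame compositing itself is assumed to happen elsewhere:

```python
class CompositeRecorder:
    """Accumulates composite frames between a start-shooting and an
    end-shooting instruction."""

    def __init__(self):
        self.frames = []
        self.recording = False

    def start_shooting(self):
        self.frames = []
        self.recording = True

    def on_frame(self, image, current_text):
        # each saved entry stands in for one composite video frame
        if self.recording:
            self.frames.append((image, current_text))

    def end_shooting(self):
        self.recording = False
        return list(self.frames)   # the composite video, frame by frame
```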
In one embodiment, the target subject is a face, and the step of recognizing the target subject in the image includes: extracting facial feature points in the image, and determining the position of the face according to the facial feature points.
Here, with a face as the target subject, facial feature points are first extracted from the image in order to recognize the face in the image. Facial feature points, also called facial key points, are used to locate the parts of the face, including but not limited to the eyes, mouth, nose, and eyebrows. The position of the face can be determined from the extracted facial feature points. Specifically, face landmark localization technology may be used to extract the facial feature points from the face image, which can be divided into two steps: face detection and face landmarking. Face detection first obtains the rough position of the face in the image, usually a rectangular box framing the face; then, on the basis of the rectangular box, face landmarking finds more accurate positions and returns the coordinates of a series of facial feature points. Fig. 5 is a schematic diagram, in one embodiment, of the facial feature points obtained by landmark localization. Existing methods may be used for face landmark localization, for example AAM (Active Appearance Models) or ERT (Ensemble of Regression Trees); the face landmark localization method is not limited here.
In one embodiment, the step of displaying the text in the image according to the position of the target subject during image capture includes: determining the mouth position according to the feature points representing the mouth among the facial feature points, determining the display position of the text according to the mouth position, and displaying the text in the image according to the display position.
Here, the facial feature points include feature points of the mouth. The feature points representing the mouth are extracted from the facial feature points, and the mouth position is then determined from those feature points. A correspondence between the display position of the text and the mouth position is set in advance; after the mouth position is determined, the display position of the text is determined from the mouth position, and the text is then displayed at that display position. Since people speak with their mouths, displaying the corresponding text around the mouth creates the scene of the user speaking. Using this feature, by recording voice data while taking photos or shooting videos, texts that express one's mood or describe the scene can be added automatically, increasing the fun of shooting.
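Deriving the mouth position from landmark coordinates and anchoring the text beside it can be sketched as below. The landmark index layout depends on the landmark model actually used, and the (dx, dy) offset is an illustrative choice, not specified by the application:

```python
def mouth_center(landmarks, mouth_indices):
    """Average the landmark coordinates that represent the mouth.

    `landmarks` is a list of (x, y) points; `mouth_indices` selects the
    mouth points (which indices those are depends on the landmark
    model, e.g. a 68-point layout).
    """
    pts = [landmarks[i] for i in mouth_indices]
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)


def text_anchor_from_mouth(center, dx=20, dy=0):
    """Place the text beside the mouth using a hypothetical offset that
    stands in for the preset mouth-to-text correspondence."""
    return (center[0] + dx, center[1] + dy)
```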
In one embodiment, the step of displaying the text in the target image with the start position as a starting point includes: controlling, according to the display control parameters corresponding to the text, the text to be dynamically displayed with the start position as a starting point.
Here, the display control parameters are parameters used to control the dynamic display of the text. They include at least one of a speed parameter, an angle parameter, a color parameter, a size parameter, and a time parameter. The speed parameter includes parameters that control the motion, such as an initial velocity and an acceleration; both are vectors, that is, a velocity and an acceleration with a direction, so the velocity and position of the text can be calculated from the speed parameter and the initial position. The angle parameter includes a rotation angle parameter, which controls the rotational motion of the text. The color parameter controls how the color of the text changes during display, the size parameter controls how the size of the text changes, and the time parameter controls how long the text is displayed. Specifically, the text is controlled to be dynamically displayed from the initial display position according to its display control parameters, where the dynamic display includes at least one of position movement, angle change, color change, size change, and dwell time.
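As a rough illustration (not the patent's implementation), the display control parameters above can be modeled as a small record that yields the text's state at any time after its display starts; the field names (`velocity`, `acceleration`, `angle_speed`, `lifetime`) are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class DisplayControlParams:
    velocity: tuple      # initial velocity vector (vx, vy), pixels/s
    acceleration: tuple  # acceleration vector (ax, ay), pixels/s^2
    angle_speed: float   # rotation speed, degrees/s
    lifetime: float      # display duration, seconds

def text_state_at(params, start_pos, t):
    """State of the text t seconds after its display started.

    Position follows p = p0 + v0*t + a*t^2/2; once the display
    duration elapses the text is no longer shown (returns None)."""
    if t > params.lifetime:
        return None
    x = start_pos[0] + params.velocity[0] * t + params.acceleration[0] * t * t / 2
    y = start_pos[1] + params.velocity[1] * t + params.acceleration[1] * t * t / 2
    return {"pos": (x, y), "angle": params.angle_speed * t}
```

A color or size parameter would extend the returned state in the same way, interpolated over the lifetime.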
In one embodiment, the step of controlling the text to be dynamically displayed from the initial display position according to its display control parameters includes: obtaining the display location of the text in a forward frame image; calculating the target location of the text in the current frame image from the display control parameters and the display location of the text in the forward frame image; and displaying the text at the target location in the current frame image.
When the text is displayed in a moving manner, the position of the motion must be calculated in real time from the display control parameters and the display location of the text in the forward frame image, where a forward frame image is an image frame preceding the current frame. In one embodiment, the display control parameters include parameters that control the motion, such as an initial velocity and an acceleration, so the display location of the text in the current frame image can be calculated from its position in the forward frame image. In one embodiment, suppose the position of the text in some forward frame is A and that frame's time is t1, the initial time is t0 = 0, the initial velocity is v0, the acceleration is a (with the initial velocity and acceleration in the same direction), and the current frame's time is t2. The position B of the text in the current frame image can then be calculated from the distance between A and B:
S = v0 × (t2 - t1) + a × (t2 - t1) × (t2 + t1)/2
Given the position A of the forward frame, the distance S, and the known direction of motion, the location B can be calculated.
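With t0 = 0 as in the embodiment, the displacement formula can be checked with a short sketch (one axis, scalar values; the function names are illustrative):

```python
def displacement(v0, a, t1, t2):
    """Distance S moved between forward-frame time t1 and current-frame
    time t2, for initial velocity v0 (at t0 = 0) and acceleration a:
    S = v0*(t2 - t1) + a*(t2 - t1)*(t2 + t1)/2."""
    return v0 * (t2 - t1) + a * (t2 - t1) * (t2 + t1) / 2

def current_position(pos_a, v0, a, t1, t2):
    """Position B in the current frame from position A in the forward
    frame, moving along the known (here: positive) direction."""
    return pos_a + displacement(v0, a, t1, t2)
```

For example, with v0 = 10, a = 2, t1 = 1, t2 = 3 this gives S = 28, which agrees with evaluating p(t) = v0·t + a·t²/2 at both times (39 - 11 = 28), confirming the formula is consistent with uniform acceleration from t0 = 0.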
In one embodiment, after the step of collecting voice data in real time and recognizing the collected voice data as text, the method further includes: converting the text into a textual image. That is, after text is recognized from the voice data, the text is converted into a textual image. In one embodiment, the background of the textual image is transparent.
In one embodiment, the step S110 of displaying the text in the target image with the initial position as the starting point includes: taking the textual image as a particle in a particle system, and controlling the textual image to be dynamically displayed from the initial position according to particle parameters preset in the particle system.
Here, the converted textual image serves as a particle in a particle system. A particle system is a technique in three-dimensional computer graphics for simulating certain fuzzy phenomena; particles are two-dimensional images rendered in three-dimensional space and are mainly used for effects such as smoke, fire, water droplets, or leaves. A particle system consists of three parts: a particle emitter, a particle animator, and a particle renderer. The particle emitter controls the generation and initial state of particles, the particle animator controls the motion state of particles over time, and the particle renderer draws them on the screen. The emitter and animator are mainly described by a set of particle parameters. The particle parameters may include the particle generation rate (the number of particles generated per unit time), the initial velocity vector of a particle (for example, in which direction it moves and when), the particle lifetime (after how long a particle is extinguished), the particle color, changes over the particle's life cycle (for example, changes in size), and other parameters that control how particles vary.
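The three-part structure above (emitter, animator, renderer) can be sketched as a generic toy particle system; the class and parameter names are assumptions for illustration, not the patent's code:

```python
import random

class Emitter:
    """Controls particle generation and initial state."""
    def __init__(self, origin, rate, lifetime, speed_range):
        self.origin, self.rate = origin, rate
        self.lifetime, self.speed_range = lifetime, speed_range

    def emit(self, rng):
        # `rate` particles per tick, each with a random initial velocity
        return [{"pos": list(self.origin),
                 "vel": [rng.uniform(*self.speed_range),
                         rng.uniform(*self.speed_range)],
                 "age": 0.0}
                for _ in range(self.rate)]

class Animator:
    """Advances particle motion state over time."""
    def step(self, particles, lifetime, dt):
        for p in particles:
            p["pos"][0] += p["vel"][0] * dt
            p["pos"][1] += p["vel"][1] * dt
            p["age"] += dt
        # particles past their lifetime are extinguished
        return [p for p in particles if p["age"] < lifetime]

def render(particles):
    """Stand-in renderer: report what would be drawn on screen."""
    return [tuple(p["pos"]) for p in particles]
```

In the embodiment, each emitted particle would carry a textual image instead of a bare position, and the emitter origin would be placed at the initial position derived from the target subject region.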
As shown in Fig. 6, in one embodiment, the particle parameters include at least one of a speed parameter, an angle parameter, a color parameter, a size parameter, and a time parameter;
the step of controlling the textual image to be dynamically displayed from the initial position according to the particle parameters preset in the particle system includes:
Step S602: obtain the state of the textual image in a forward frame image, the textual image state including at least one of the position, size, angle, and color of the text.
Here, a forward frame image is an image frame preceding the current frame. The textual image state includes at least one of the position, size, angle, and color of the textual image. Specifically, to obtain the state of the textual image in the current frame image, it must be calculated from the particle parameters and the textual image state in the forward frame image.
Step S604: calculate the textual image state in the current frame image from the particle parameters and the textual image state in the forward frame image, and display the textual image in the current frame image according to that state.
The particle parameters include at least one of a speed parameter, an angle parameter, a color parameter, a size parameter, and a time parameter. The speed parameter controls the velocity and direction of a particle, from which the particle's current location can be calculated. The angle parameter controls the rotation angle and rotation speed of a particle, from which the particle's current angle can be calculated. The color parameter controls the display of the particle's color, the size parameter controls the particle's size and its changes, and the time parameter controls the particle's lifetime, that is, how long the particle exists. After the textual image state in the forward frame image is obtained, the textual image state in the current frame image can be calculated from it and the particle parameters; the position, size, angle, and color of the current textual image are thus obtained, and the textual image is displayed accordingly.
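A per-frame update of the textual-image state (position, size, angle, color) from the forward-frame state and the particle parameters might look like the following sketch; the linear alpha fade, multiplicative growth, and parameter names are assumptions, not the patent's formulas:

```python
def next_state(state, params, dt):
    """Current-frame textual-image state from the forward-frame state.

    state:  dict with pos, size, angle, color (RGBA), age
    params: dict with vel, rot_speed, grow, fade, lifetime
    Returns None once the particle's lifetime is exceeded."""
    if state["age"] + dt >= params["lifetime"]:
        return None
    r, g, b, alpha = state["color"]
    return {
        "pos": (state["pos"][0] + params["vel"][0] * dt,
                state["pos"][1] + params["vel"][1] * dt),
        "angle": state["angle"] + params["rot_speed"] * dt,
        "size": state["size"] * params["grow"] ** dt,
        "color": (r, g, b, max(0.0, alpha - params["fade"] * dt)),
        "age": state["age"] + dt,
    }
```

Each frame, the renderer would draw the textual image at the returned position with the returned size, angle, and color.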
In one embodiment, the particle parameters include only the speed parameter. In that case only the position of the textual image changes and everything else remains unchanged. The textual image state obtained from the forward video frame then contains only a position, and the position of the textual image in the current video frame is calculated from that position and the speed parameter.
In one embodiment, the forward frame image is the frame immediately preceding the current frame; that is, the state of the textual image in the current frame image is calculated from its state in the previous frame image, and the particle renderer is called to render according to the state in the current frame image. Fig. 7 is a schematic diagram of the particle rendering flow of the particle system in one embodiment: first the initial state of the textual image is obtained, that is, its state in the first frame image; then, following the principle of calculating the state in the current frame image from the state in the previous frame image, the state is updated frame by frame; finally, the corresponding textual image is rendered on the screen.
In one embodiment, the step of converting the text into textual images includes: performing word-cutting on the text to obtain multiple display words, and generating one textual image for each display word, obtaining multiple textual images.
Word-cutting, also called word segmentation, cuts a word sequence into individual words. An individual word may be a unary word or a multi-word unit: a unary word is a single word, while a multi-word unit is a phrase of two or more words that preserves the sequential relationship between them. Each word obtained by cutting is called a display word. Each display word generates one textual image, so multiple textual images are obtained.
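A word-cutting step like the one described can be sketched as follows. This is only illustrative: a production system would call a proper segmentation library for Chinese, so the fallback of one display word per character stands in for real segmentation, and the "bitmap" strings stand in for rendered transparent images:

```python
def cut_words(text):
    """Cut a recognized sentence into display words.

    Whitespace-delimited text (e.g. English) splits on spaces; text
    without spaces (e.g. Chinese, absent a real segmenter) falls back
    to one display word per character.  Punctuation-only tokens are
    dropped."""
    tokens = text.split() if " " in text.strip() else list(text.strip())
    return [t for t in tokens if any(c.isalnum() for c in t)]

def to_textual_images(words):
    # Stand-in for rendering each display word onto a transparent
    # image: one record per display word, to be used as a particle.
    return [{"word": w, "image": f"<bitmap of {w!r}>"} for w in words]
```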
In one embodiment, the step of converting the text into textual images includes: recognizing target keywords in the text according to semantic recognition, and converting the target keywords into textual images.
Target keywords are the words, obtained by semantic recognition, that need emphasis. After the target keywords are recognized, they are converted into textual images. For example, for "This summer is really hot!", the keywords "summer", "really", and "hot" are extracted semantically and subsequently displayed with emphasis as target keywords, while non-target keywords such as "this" and "is" can be de-emphasized or not displayed at all.
As shown in Fig. 8, in one embodiment, the step of taking the textual images as particles in the particle system and controlling the textual images to be dynamically displayed from the starting display location according to the particle parameters preset in the particle system includes:
Step S802: determine the starting display time of a textual image from the speech timestamp of the voice data corresponding to the textual image.
Because a textual image is generated from text, and the text is obtained by recognizing voice data, the speech timestamp is the time at which the voice data was obtained. The speech timestamp can therefore serve as the timestamp of the textual image, and the corresponding starting display time is determined from it. In one embodiment, the display order of the textual images can be determined by the order of the speech timestamps of their corresponding voice data. Specifically, a relationship between the speech timestamp and the starting display time can be set; for example, the two can be positively correlated, so the earlier the speech timestamp, the earlier the starting display time of the corresponding textual image. In another embodiment, multiple textual images within the same period (for example, one second) can be displayed out of order, because the textual images within one period jointly express one meaning, which remains recognizable even when they are out of order, achieving "order within disorder". For example, if "you are really pretty" generates three textual images "you", "really", and "pretty", then displaying them out of order, say "pretty", "really", "you", still conveys the meaning of "you are really pretty", and the disorder further increases the fun of shooting.
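Mapping speech timestamps to a display order, with optional out-of-order display inside each one-second window, could be sketched like this (the bucketing scheme and names are assumptions for illustration):

```python
import random

def schedule(images, shuffle_window=1.0, rng=None):
    """Order textual images for display by speech timestamp.

    images: list of (word, speech_timestamp_seconds).  Display order
    is positively correlated with the speech timestamps; within each
    shuffle_window-second bucket the words may be shown out of order,
    since they jointly express one meaning ("order within disorder")."""
    buckets = {}
    for word, ts in images:
        buckets.setdefault(int(ts // shuffle_window), []).append((word, ts))
    ordered = []
    for key in sorted(buckets):          # buckets stay in time order
        group = buckets[key]
        if rng is not None:
            rng.shuffle(group)           # optional disorder within a bucket
        ordered.extend(group)
    return ordered
```

Passing `rng=None` keeps the strict timestamp order; passing a seeded `random.Random` shuffles each one-second group while leaving later groups after earlier ones.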
Step S804: control each textual image, according to its particle parameters, to be dynamically displayed starting from its starting display location and starting display time.
Here, the starting display location is the position in the image where the textual image first appears, and the starting display time is the initial display time of the textual image. Once the starting display time and starting display location of a textual image are determined, it can be dynamically displayed according to its particle parameters. The particle parameters of different textual images may be the same or different; for example, the same particle parameters can be set for all particles with only the display time differing per particle, or different particle parameters can be set for different particles.
As shown in Fig. 9, in one embodiment, the above image processing method further includes:
Step S116: record the image timestamp of each collected frame of video image.
An image timestamp is the time corresponding to each collected video frame, that is, the time at which the video image was acquired. Specifically, while capturing video, the terminal records the acquisition time of each video frame, producing a sequence of image timestamps (time1, time2, time3, ...) for the video images, from which their order is determined.
Step S118: record the speech timestamp of the obtained voice data, and associate the speech timestamp with the recognized text.
The speech timestamp of voice data is the time at which the voice data was acquired; after text is recognized from the speech, the speech timestamp is stored in association with the recognized text.
Step S120: display the video images and the corresponding text synchronously according to the image timestamps and the speech timestamps.
To play the video images and the text synchronously, the display time of the text is determined from the image timestamps and the speech timestamps. Specifically, text and images whose speech timestamps and image timestamps agree are displayed together. Recording the speech timestamps and image timestamps thus synchronizes the displayed text with the images, that is, synchronizes the words with the mouth movements.
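One way to pair each video frame with the text whose speech timestamp agrees with the frame's image timestamp (here interpreted as the latest text spoken at or before the frame) can be sketched as:

```python
import bisect

def sync_text(image_timestamps, speech_entries):
    """For each image timestamp, pick the most recent associated text.

    speech_entries: list of (speech_timestamp, text), sorted by time.
    Returns (image_timestamp, text_or_None) pairs, so each frame is
    displayed together with the words being spoken at that moment."""
    times = [ts for ts, _ in speech_entries]
    paired = []
    for t in image_timestamps:
        i = bisect.bisect_right(times, t) - 1
        paired.append((t, speech_entries[i][1] if i >= 0 else None))
    return paired
```

Frames captured before any speech get no text; once a word is recognized it stays associated with frames until the next word's timestamp, which approximates the word-to-mouth synchronization described above.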
Fig. 10 is a flow diagram of the image processing method in one embodiment, which mainly includes three steps: 1. capture images in real time through the camera, perform face detection on the captured images, extract facial feature points, and determine the position of the face from the facial feature points; 2. while capturing images, receive voice data through the microphone, perform speech recognition on the voice data to obtain recognized text, and convert the recognized text into textual images; 3. take the textual images as particles in the particle system and set the particle emission region near the corner of the mouth, creating a real-time "spitting out words" effect. Fig. 11 is a schematic diagram of the real-time "spitting out words" effect in one embodiment. Face recognition can be implemented by calling a face recognition SDK (Software Development Kit), and speech recognition can likewise be implemented by calling a speech recognition SDK.
As shown in Fig. 12, in one embodiment an image processing method is proposed that includes the following steps:
Step S1201: obtain a target image, the target image containing a face.
Step S1202: extract the facial feature points in the target image and determine the face region from the facial feature points.
Step S1203: collect voice data in real time and recognize the collected voice data as text.
Step S1204: perform word-cutting on the text to obtain multiple display words, and generate one textual image for each display word, obtaining multiple textual images.
Step S1205: determine the mouth position from the feature points representing the mouth among the facial feature points, and determine the initial position of the textual images from the mouth position.
Step S1206: determine the starting display time of each textual image from the speech timestamp of its corresponding voice data.
Step S1207: take the textual images as particles in the particle system, and control each textual image, according to its particle parameters, to be dynamically displayed starting from its starting display location and starting display time.
Step S1208: obtain a shooting instruction, and obtain, according to the shooting instruction, the current image and the current textual images displayed in the current image.
Step S1209: composite the current text with the current image according to the current display positions of the current textual images to form a composite image, and save the composite image.
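The flow of steps S1201 to S1207 can be outlined with stand-in functions. Every body below is a deterministic stub under assumed names; a real implementation would call face-detection, speech-recognition, and rendering components in their place:

```python
def detect_mouth(frame):
    # Stub for S1201/S1202/S1205: would extract facial feature points
    # and derive the mouth position from the mouth feature points.
    return (frame["w"] // 2, int(frame["h"] * 0.7))

def recognize_speech(audio):
    # Stub for S1203: would run speech recognition on the voice data.
    return audio["transcript"]

def process_frame(frame, audio):
    mouth = detect_mouth(frame)                  # S1205
    words = recognize_speech(audio).split()      # S1203/S1204
    # S1207: each display word becomes a particle emitted at the mouth
    particles = [{"word": w, "pos": mouth} for w in words]
    # The composite (S1208/S1209) would draw these onto the frame.
    return {"frame": frame, "overlay": particles}

result = process_frame({"w": 640, "h": 480}, {"transcript": "so hot today"})
```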
As shown in Fig. 13, in one embodiment an image processing method is proposed that includes:
Step S1302: obtain a target image, the target image containing a mouth.
Here, the target image is the image to be processed. The target image can be obtained by taking a photo or by shooting a video, since a video can be regarded as a sequence of frames. The image can be captured by the front or rear camera of the terminal. The target image may be an image or video captured in real time, or an image or video captured previously. In one embodiment, the obtained target image is a to-be-captured preview image obtained by calling the camera, where a preview image is an image that has not yet been saved.
Step S1304: detect the mouth in the target image, perform lip reading recognition on the mouth movements, and obtain the corresponding recognized text.
Here, the feature points of the mouth are determined by extracting facial feature points, and the position and movements of the mouth are then determined from the mouth feature points. Lip reading recognition is a technology combining machine vision and natural language processing that can recognize the content of speech directly from images of a person talking; that is, the corresponding text can be recognized from the mouth movements. Lip reading recognition can be implemented by calling a lip reading SDK, that is, a software toolkit written for lip reading recognition.
Step S1306: display the recognized text synchronously in the target image.
Here, the mouth movements in the images are recognized as words in real time, and the recognized words are then displayed synchronously with the corresponding images containing those mouth movements.
In the above image processing method, a target image is obtained, the mouth in the target image is detected, lip reading recognition is performed on the mouth movements to obtain the corresponding recognized text, and the recognized text is then displayed synchronously in the target image. By performing lip reading recognition on the mouth movements in the images and displaying the corresponding text synchronously with the mouth movements, text is easily added to the images while remaining consistent with the facial movements.
In one embodiment, the step of displaying the recognized text synchronously in the target image includes: determining the display location of the text from the position of the mouth, and displaying the text synchronously at that display location in the target image.
Here, the correspondence between the display location and the mouth position is preset. Once the position of the mouth is determined, the display location of the text is determined, and the text is displayed synchronously at that location in the image. Displaying the text around the mouth creates a real-time "spitting out words" effect.
It should be understood that although the steps in the above flowcharts are shown in sequence as indicated by the arrows, they are not necessarily executed in that sequence. Unless explicitly stated herein, there is no strict ordering restriction on the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; the execution order of these sub-steps or stages is not necessarily sequential, and they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
As shown in Fig. 14, in one embodiment an image processing apparatus is proposed that includes:
an acquisition module 1402 for obtaining a target image, the target image containing a target subject;
an image recognition module 1404 for recognizing the target subject in the target image and identifying the target subject region;
a speech recognition module 1406 for collecting voice data in real time and recognizing the collected voice data as text;
a position determination module 1408 for determining, according to the target subject region, the initial position at which the text is presented; and
a display module 1410 for displaying the text in the target image with the initial position as the starting point.
In one embodiment, the display module is further configured to: when the text corresponding to the voice data forms a word, display the word at the initial position; when the text corresponding to the voice data forms the next word, move the previously displayed words in a direction away from the initial position and display them; display the next word at the initial position; and repeat the step performed when the text corresponding to the voice data forms the next word, so that as the speech time elapses the text corresponding to the voice data is displayed in real time in a word-by-word moving manner.
In one embodiment, the display module is further configured to: form a speech segment from the voice data segments collected in real time and obtain the segment text corresponding to the speech segment; display the segment text at the initial position; obtain the next segment text corresponding to the next speech segment, move the previously displayed segment texts in a direction away from the initial position and display them; display the next segment text at the initial position; and repeat the step of obtaining the next segment text corresponding to the next speech segment, so that as the speech time elapses the text corresponding to the voice data is displayed in real time in a segment-by-segment moving manner.
As shown in Fig. 15, in one embodiment, the apparatus further includes:
a word segmentation module 1412 for performing word segmentation on the text to obtain multiple sub-texts;
the display module is further configured to determine the starting display time corresponding to each sub-text from the speech timestamp corresponding to that sub-text, and to dynamically display each sub-text over time along a preset trajectory, starting from the initial position at its corresponding starting display time.
As shown in Fig. 16, in one embodiment, the apparatus further includes:
an extraction module 1414 for extracting key texts from the multiple sub-texts according to semantic recognition;
the display module is further configured to determine the starting display time corresponding to each key text from the speech timestamp corresponding to that key text, and to dynamically display each key text over time along a preset trajectory, starting from the initial position at its corresponding starting display time.
In one embodiment, the above image processing apparatus further includes: an image shooting module for obtaining a shooting instruction, obtaining, according to the shooting instruction, the current image and the current text displayed in the current image, compositing the current text with the current image according to the current display position of the current text to form a composite image, and saving the composite image.
In one embodiment, the above image processing apparatus further includes: a video shooting module for obtaining an instruction to start shooting, continuously compositing the text displayed in the images with the images according to that instruction to form composite image frames, saving each composite image frame, obtaining an instruction to stop shooting, and forming a composite video from the composite image frames.
In one embodiment, the target subject is a face; the image recognition module is further configured to extract the facial feature points in the image and determine the position of the face from the facial feature points; the display module is further configured to determine the mouth position from the feature points representing the mouth among the facial feature points, determine the display location of the text from the mouth position, and display the text in the image at that display location.
In one embodiment, the display module is further configured to control the text to be dynamically displayed from the initial position according to the display control parameters corresponding to the text.
In one embodiment, the display module is further configured to obtain the display location of the text in a forward frame image, calculate the target location of the text in the current frame image from the display control parameters and the display location of the text in the forward frame image, and display the text at the target location in the current frame image.
In one embodiment, the above image processing apparatus further includes: a conversion module for converting the text into a textual image; the display module is further configured to take the textual image as a particle in a particle system, determine the starting display location of the textual image from the location of the target subject, and control the textual image to be dynamically displayed from the starting display location according to the particle parameters preset in the particle system.
As shown in Fig. 17, in one embodiment, the particle parameters include at least one of a speed parameter, an angle parameter, a color parameter, a size parameter, and a time parameter;
the display module includes:
a forward textual image state acquisition module 1410A for obtaining the textual image state in a forward frame image, the textual image state including at least one of the position, size, angle, and color of the text; and
a textual image display module 1410B for calculating the textual image state in the current frame image from the particle parameters and the state of the text in the forward frame image, and displaying the textual image in the current frame image according to that state.
In one embodiment, the conversion module is further configured to perform word-cutting on the text to obtain multiple display words, and to generate one textual image for each display word, obtaining multiple textual images.
In one embodiment, the conversion module is further configured to recognize the target keywords in the text according to semantic recognition and convert the target keywords into textual images.
In one embodiment, the display module is further configured to determine the starting display time of each textual image from the speech timestamp of its corresponding voice data, and to control each textual image, according to its particle parameters, to be dynamically displayed starting from its starting display location and starting display time.
In one embodiment, the above image processing apparatus further includes: a synchronous display module for recording the image timestamp of each collected frame of video image, recording the speech timestamp of the obtained voice data, associating the speech timestamp with the recognized text, and displaying the video images and the corresponding text synchronously according to the image timestamps and the speech timestamps.
As shown in Fig. 18, in one embodiment an image processing apparatus is proposed that includes:
an image obtaining module 1802 for obtaining a target image, the target image containing a mouth;
a lip reading recognition module 1804 for detecting the mouth in the target image, performing lip reading recognition on the mouth movements, and obtaining the corresponding recognized text; and
a synchronous display module 1806 for displaying the recognized text synchronously in the target image.
In one embodiment, the synchronous display module 1806 is further configured to determine the display location of the text from the position of the mouth and display the text synchronously at that display location in the image.
Fig. 19 shows the internal structure of a computer device in one embodiment. The computer device may specifically be a server. As shown in Fig. 19, the computer device includes a processor, a memory, a network interface, an input device, an image acquisition device, a voice acquisition device, and a display screen connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program which, when executed by the processor, causes the processor to implement the image processing method. A computer program may also be stored in the internal memory which, when executed by the processor, causes the processor to perform the image processing method. The image acquisition device of the computer device is a camera for capturing images, and the voice acquisition device is a microphone for collecting voice data. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device of the computer device may be a touch layer covering the display screen, a button, trackball, or trackpad provided on the housing of the computer device, or an external keyboard, trackpad, mouse, or the like. Those skilled in the art will understand that the structure shown in Fig. 19 is only a block diagram of the part of the structure related to the solution of this application and does not constitute a limitation on the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, the image processing method provided by this application can be implemented in the form of a computer program, and the computer program can run on the computer device shown in Fig. 19. The memory of the computer device can store the program modules that make up the image processing apparatus, for example the acquisition module 1402, the image recognition module 1404, the speech recognition module 1406, the position determination module 1408, and the display module 1410 shown in Fig. 14. The computer program constituted by these program modules causes the processor to execute the steps of the image processing apparatus of each embodiment of this application described in this specification. For example, the computer device shown in Fig. 19 can obtain the target image through the acquisition module 1402 of the image processing apparatus shown in Fig. 14, the target image containing a target subject; recognize the target subject in the target image through the image recognition module 1404 and identify the target subject region; collect voice data in real time through the speech recognition module 1406 and recognize the collected voice data as text; determine, through the position determination module 1408, the initial position at which the text is presented according to the target subject region; and display the text in the target image with the initial position as the starting point through the display module 1410.
In one embodiment, a computer equipment is proposed, including a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to execute the following steps: obtaining a target image, the target image including a target subject; identifying the target subject in the target image to determine the target subject region; collecting voice data in real time, and recognizing the collected voice data as text; determining, according to the target subject region, the initial position at which the text is presented; and displaying the text in the target image with the initial position as the starting point.
In one embodiment, the step of displaying the text in the target image with the initial position as the starting point includes: when the text corresponding to the voice data forms a word, displaying the word at the initial position; when the text corresponding to the voice data forms the next word, moving the previously displayed word in a direction away from the initial position while displaying it, and displaying the next word at the initial position; and repeating from the step performed when the text corresponding to the voice data forms the next word, so that as the voice data elapses over time, the text corresponding to the voice data is displayed in real time in a word-by-word scrolling manner.
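The word-by-word scrolling described above can be sketched as follows. This is an illustrative assumption of one possible layout, not the patent's implementation: each new word lands at the initial position while earlier words are shifted away by a fixed offset.

```python
# Hypothetical sketch of the word-by-word scrolling display described above.
# Each recognized word appears at the initial position; earlier words shift
# away from it. The function names and the fixed offset are assumptions.

def update_display(displayed, new_word, initial_pos, offset=(0, -30)):
    """Shift every previously shown word away from the initial position,
    then place the new word at the initial position."""
    dx, dy = offset
    shifted = [(w, (x + dx, y + dy)) for w, (x, y) in displayed]
    shifted.append((new_word, initial_pos))
    return shifted

display = []
for word in ["hello", "world", "again"]:   # words recognized from speech
    display = update_display(display, word, (100, 200))

# the newest word always sits at the initial position
assert display[-1] == ("again", (100, 200))
```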
In one embodiment, the step of displaying the text in the target image with the initial position as the starting point includes: forming the voice data collected in real time into a voice segment, and obtaining the segment text corresponding to the voice segment; displaying the segment text at the initial position; obtaining the next segment text corresponding to the next voice segment, moving the previously displayed segment text in a direction away from the initial position while displaying it, and displaying the next segment text at the initial position; and repeating from the step of obtaining the next segment text corresponding to the next voice segment, so that as the voice data elapses over time, the text corresponding to the voice data is displayed in real time in a segment-by-segment scrolling manner.
In one embodiment, the processor is further configured to execute the following steps: performing word segmentation on the text to obtain multiple sub-texts. The step of displaying the text in the target image with the initial position as the starting point includes: determining the starting display time corresponding to each sub-text according to the voice timestamp corresponding to each sub-text; and dynamically displaying each sub-text over time along a preset track, with the initial position as the starting point, according to the starting display time corresponding to each sub-text.
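The timestamp-driven scheduling above can be illustrated with a minimal sketch. The linear track and the velocity value are assumptions for illustration; the patent only requires some preset track.

```python
# Illustrative sketch of scheduling sub-texts by voice timestamp, as in the
# embodiment above. The straight-line track and velocity are assumptions.

def schedule_subtexts(subtexts, timestamps):
    """Pair each sub-text with its starting display time (its voice timestamp)."""
    return sorted(zip(timestamps, subtexts))

def position_on_track(start_time, now, initial_pos, velocity=(0, -40)):
    """Move a sub-text along a preset straight track after its start time."""
    t = max(0.0, now - start_time)
    vx, vy = velocity
    x0, y0 = initial_pos
    return (x0 + vx * t, y0 + vy * t)

schedule = schedule_subtexts(["nice", "day"], [0.0, 0.8])
pos = position_on_track(0.0, 0.5, (100, 200))  # "nice" half a second in
assert pos == (100.0, 180.0)
```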
In one embodiment, after executing the step of performing word segmentation on the text to obtain the multiple sub-texts, the processor is further configured to execute the following steps: extracting key texts from the multiple sub-texts according to semantic recognition. The step of displaying the text in the target image with the initial position as the starting point includes: determining the starting display time corresponding to each key text according to the voice timestamp corresponding to each key text; and dynamically displaying each key text over time along a preset track, with the initial position as the starting point, according to the starting display time corresponding to each key text.
In one embodiment, the processor is further configured to execute the following steps: obtaining a shooting instruction, and obtaining, according to the shooting instruction, the current target image and the current text displayed in the current target image; and synthesizing the current text and the current target image into a composite image according to the current display position of the current text, and saving the composite image.
In one embodiment, the processor is further configured to execute the following steps: obtaining a start-shooting instruction, and according to the start-shooting instruction, continuously synthesizing the text displayed in the target image with the target image to form composite image frames, and saving each composite image frame; and obtaining an end-shooting instruction, and forming a composite video from the composite image frames.
In one embodiment, the target subject is a face. The step of identifying the target subject in the target image to determine the target subject region includes: extracting facial feature points from the image, and determining the face region according to the facial feature points. The step of determining, according to the target subject region, the initial position at which the text is presented includes: determining the mouth position according to the feature points representing the mouth among the facial feature points, and determining the initial position of the text according to the mouth position. In one embodiment, the step of displaying the text in the target image with the initial position as the starting point includes: controlling the text to be dynamically displayed with the initial position as the starting point according to the display control parameters corresponding to the text.
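Deriving the text's initial position from mouth feature points might look like the following sketch. The landmark indices and the offset beside the mouth are assumptions; real landmark models (for example 68-point schemes) number the mouth points differently.

```python
# Hedged sketch of computing the text's initial position from mouth feature
# points, as described above. Landmark indices and offset are assumptions.

def mouth_center(landmarks, mouth_indices):
    """Average the mouth landmarks to estimate the mouth position."""
    pts = [landmarks[i] for i in mouth_indices]
    xs, ys = zip(*pts)
    return (sum(xs) / len(pts), sum(ys) / len(pts))

def text_initial_position(mouth_pos, dx=20, dy=0):
    """Offset the text start beside the mouth so it does not cover it."""
    return (mouth_pos[0] + dx, mouth_pos[1] + dy)

landmarks = {48: (90, 150), 54: (110, 150)}   # two mouth-corner points (toy data)
center = mouth_center(landmarks, [48, 54])
assert center == (100.0, 150.0)
assert text_initial_position(center) == (120.0, 150.0)
```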
In one embodiment, the step of controlling the text to be dynamically displayed with the initial display position as the starting point according to the display control parameters corresponding to the text includes: obtaining the display position of the text in the previous frame image; calculating the target position of the text in the current frame image according to the display control parameters and the display position of the text in the previous frame image; and displaying the text at the target position in the current frame image.
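A minimal sketch of this frame-to-frame update: the current position is derived from the previous frame's position and a display control parameter, here modeled as a constant per-frame velocity (an illustrative assumption).

```python
# Frame-to-frame text position update, as in the embodiment above.
# The per-frame velocity stands in for the display control parameters.

def next_position(prev_pos, velocity):
    """Advance the text by the control parameter's per-frame velocity."""
    return (prev_pos[0] + velocity[0], prev_pos[1] + velocity[1])

pos = (100, 200)
for _ in range(3):                 # three frames at 5 px left, 2 px up per frame
    pos = next_position(pos, (-5, -2))
assert pos == (85, 194)
```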
In one embodiment, after executing the step of collecting the voice data in real time and recognizing the collected voice data as text, the processor is further configured to execute the following steps: converting the text into a text picture. The step of displaying the text in the target image with the initial position as the starting point includes: taking the text picture as a particle in a particle system, and controlling, according to particle parameters preset in the particle system, the text picture to be dynamically displayed with the initial position as the starting point.
In one embodiment, the particle parameters include at least one of a speed parameter, an angle parameter, a color parameter, a size parameter and a time parameter. The step of controlling, according to the particle parameters preset in the particle system, the text picture to be dynamically displayed with the initial position as the starting point includes: obtaining the text picture state in the previous frame image, the text picture state including at least one of the position, size, angle and color of the text picture; calculating the text picture state in the current frame image according to the particle parameters and the text picture state in the previous frame image; and displaying the text picture according to the text picture state in the current frame image.
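The particle-state update can be illustrated with the following sketch. The specific parameter set (speed, angle, a size-growth factor, an opacity fade standing in for the color parameter) and the state layout are assumptions, not the patent's exact scheme.

```python
# Illustrative particle-system update for a text picture, following the
# embodiment above. Parameter choices and the dataclass layout are assumptions.
import math
from dataclasses import dataclass

@dataclass
class TextParticle:
    x: float
    y: float
    size: float
    alpha: float       # opacity standing in for the color parameter

def step(p, speed, angle_deg, grow=1.05, fade=0.9, dt=1.0):
    """Compute the next-frame state from the previous-frame state."""
    a = math.radians(angle_deg)
    return TextParticle(
        x=p.x + speed * math.cos(a) * dt,
        y=p.y + speed * math.sin(a) * dt,
        size=p.size * grow,
        alpha=p.alpha * fade,
    )

p = TextParticle(x=0.0, y=0.0, size=10.0, alpha=1.0)
p = step(p, speed=4.0, angle_deg=90)     # move straight up one frame
assert round(p.y, 6) == 4.0 and round(p.x, 6) == 0.0
```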
In one embodiment, the step of taking the text picture as the particle in the particle system and controlling, according to the particle parameters preset in the particle system, the text picture to be dynamically displayed with the initial position as the starting point includes: determining the starting display time of the text picture according to the voice timestamp of the voice data corresponding to the text picture; and controlling each text picture, according to its corresponding particle parameters, to be dynamically displayed with the initial position as the starting point and its starting display time as the starting time.
In one embodiment, the processor is further configured to execute the following steps: recording the image timestamp of each collected frame of video image; recording the voice timestamp corresponding to the obtained voice data, and associating the corresponding voice timestamp with the recognized text; and synchronizing the display of the video image and the corresponding text according to the image timestamp and the voice timestamp.
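Timestamp-based synchronization as just described can be sketched as a lookup: each video frame shows the text whose voice timestamp has most recently passed. The entry format is an assumption for illustration.

```python
# Hedged sketch of synchronizing video frames with recognized text via
# timestamps, as in the embodiment above.
import bisect

def text_for_frame(frame_ts, voice_entries):
    """voice_entries: sorted list of (voice_timestamp, text). Return the text
    associated with the latest voice timestamp not after the frame timestamp."""
    times = [t for t, _ in voice_entries]
    i = bisect.bisect_right(times, frame_ts) - 1
    return voice_entries[i][1] if i >= 0 else None

entries = [(0.0, "hi"), (1.2, "there"), (2.5, "friend")]
assert text_for_frame(1.0, entries) == "hi"
assert text_for_frame(2.6, entries) == "friend"
```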
In one embodiment, a computer equipment is proposed, including a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to execute the following steps: obtaining a target image, the target image including a mouth; detecting the mouth in the target image, performing lip reading recognition according to the mouth movement, and obtaining the corresponding recognized text; and displaying the recognized text in the target image in synchronization.
In one embodiment, the step of displaying the recognized text in the target image in synchronization includes: determining the display position of the text according to the position of the mouth, and displaying the text in synchronization at the display position in the target image.
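The lip-reading flow above can be outlined as follows. The recognizer here is a stub: real lip reading requires a trained model over mouth-region image sequences, which is outside the scope of this sketch; only the anchoring of text near the mouth is shown concretely.

```python
# Very rough illustration of the lip-reading embodiment: run a (stubbed)
# recognizer over the mouth-region frames, then anchor the text by the mouth.

def recognize_lip_text(mouth_frames, recognizer):
    """Run a lip-reading recognizer over the mouth-region sequence (stubbed)."""
    return recognizer(mouth_frames)

def display_position(mouth_pos, dx=25):
    """Anchor the recognized text beside the detected mouth."""
    return (mouth_pos[0] + dx, mouth_pos[1])

stub = lambda frames: "hello" if frames else ""
assert recognize_lip_text([object()], stub) == "hello"
assert display_position((90, 150)) == (115, 150)
```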
In one embodiment, a computer readable storage medium is proposed, storing a computer program which, when executed by a processor, causes the processor to execute the following steps: obtaining a target image, the target image including a target subject; identifying the target subject in the target image to determine the target subject region; collecting voice data in real time, and recognizing the collected voice data as text; determining, according to the target subject region, the initial position at which the text is presented; and displaying the text in the target image with the initial position as the starting point.
In one embodiment, the step of displaying the text in the target image with the initial position as the starting point includes: when the text corresponding to the voice data forms a word, displaying the word at the initial position; when the text corresponding to the voice data forms the next word, moving the previously displayed word in a direction away from the initial position while displaying it, and displaying the next word at the initial position; and repeating from the step performed when the text corresponding to the voice data forms the next word, so that as the voice data elapses over time, the text corresponding to the voice data is displayed in real time in a word-by-word scrolling manner.
In one embodiment, the step of displaying the text in the target image with the initial position as the starting point includes: forming the voice data collected in real time into a voice segment, and obtaining the segment text corresponding to the voice segment; displaying the segment text at the initial position; obtaining the next segment text corresponding to the next voice segment, moving the previously displayed segment text in a direction away from the initial position while displaying it, and displaying the next segment text at the initial position; and repeating from the step of obtaining the next segment text corresponding to the next voice segment, so that as the voice data elapses over time, the text corresponding to the voice data is displayed in real time in a segment-by-segment scrolling manner.
In one embodiment, the processor is further configured to execute the following steps: performing word segmentation on the text to obtain multiple sub-texts. The step of displaying the text in the target image with the initial position as the starting point includes: determining the starting display time corresponding to each sub-text according to the voice timestamp corresponding to each sub-text; and dynamically displaying each sub-text over time along a preset track, with the initial position as the starting point, according to the starting display time corresponding to each sub-text.
In one embodiment, after executing the step of performing word segmentation on the text to obtain the multiple sub-texts, the processor is further configured to execute the following steps: extracting key texts from the multiple sub-texts according to semantic recognition. The step of displaying the text in the target image with the initial position as the starting point includes: determining the starting display time corresponding to each key text according to the voice timestamp corresponding to each key text; and dynamically displaying each key text over time along a preset track, with the initial position as the starting point, according to the starting display time corresponding to each key text.
In one embodiment, the processor is further configured to execute the following steps: obtaining a shooting instruction, and obtaining, according to the shooting instruction, the current target image and the current text displayed in the current target image; and synthesizing the current text and the current target image into a composite image according to the current display position of the current text, and saving the composite image.
In one embodiment, the processor is further configured to execute the following steps: obtaining a start-shooting instruction, and according to the start-shooting instruction, continuously synthesizing the text displayed in the target image with the target image to form composite image frames, and saving each composite image frame; and obtaining an end-shooting instruction, and forming a composite video from the composite image frames.
In one embodiment, the target subject is a face. The step of identifying the target subject in the target image to determine the target subject region includes: extracting facial feature points from the image, and determining the face region according to the facial feature points. The step of determining, according to the target subject region, the initial position at which the text is presented includes: determining the mouth position according to the feature points representing the mouth among the facial feature points, and determining the initial position of the text according to the mouth position. In one embodiment, the step of displaying the text in the target image with the initial position as the starting point includes: controlling the text to be dynamically displayed with the initial position as the starting point according to the display control parameters corresponding to the text.
In one embodiment, the step of controlling the text to be dynamically displayed with the initial display position as the starting point according to the display control parameters corresponding to the text includes: obtaining the display position of the text in the previous frame image; calculating the target position of the text in the current frame image according to the display control parameters and the display position of the text in the previous frame image; and displaying the text at the target position in the current frame image.
In one embodiment, after executing the step of collecting the voice data in real time and recognizing the collected voice data as text, the processor is further configured to execute the following steps: converting the text into a text picture. The step of displaying the text in the target image with the initial position as the starting point includes: taking the text picture as a particle in a particle system, and controlling, according to particle parameters preset in the particle system, the text picture to be dynamically displayed with the initial position as the starting point.
In one embodiment, the particle parameters include at least one of a speed parameter, an angle parameter, a color parameter, a size parameter and a time parameter. The step of controlling, according to the particle parameters preset in the particle system, the text picture to be dynamically displayed with the initial position as the starting point includes: obtaining the text picture state in the previous frame image, the text picture state including at least one of the position, size, angle and color of the text picture; calculating the text picture state in the current frame image according to the particle parameters and the text picture state in the previous frame image; and displaying the text picture according to the text picture state in the current frame image.
In one embodiment, the step of taking the text picture as the particle in the particle system and controlling, according to the particle parameters preset in the particle system, the text picture to be dynamically displayed with the initial position as the starting point includes: determining the starting display time of the text picture according to the voice timestamp of the voice data corresponding to the text picture; and controlling each text picture, according to its corresponding particle parameters, to be dynamically displayed with the initial position as the starting point and its starting display time as the starting time.
In one embodiment, the processor is further configured to execute the following steps: recording the image timestamp of each collected frame of video image; recording the voice timestamp corresponding to the obtained voice data, and associating the corresponding voice timestamp with the recognized text; and synchronizing the display of the video image and the corresponding text according to the image timestamp and the voice timestamp.
In one embodiment, a computer readable storage medium is proposed, storing a computer program which, when executed by a processor, causes the processor to execute the following steps: obtaining a target image, the target image including a mouth; detecting the mouth in the target image, performing lip reading recognition according to the mouth movement, and obtaining the corresponding recognized text; and displaying the recognized text in the target image in synchronization.
In one embodiment, the step of displaying the recognized text in the target image in synchronization includes: determining the display position of the text according to the position of the mouth, and displaying the text in synchronization at the display position in the target image.
One of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a non-volatile computer readable storage medium, and when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, database or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For conciseness of description, not all possible combinations of the technical features in the above embodiments have been described; however, as long as there is no contradiction in the combination of these technical features, they should all be considered within the scope of this specification.

The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent application. It should be pointed out that, for those of ordinary skill in the art, various modifications and improvements can be made without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of the present application patent shall be subject to the appended claims.
Claims (20)
1. An image processing method, the method comprising:
obtaining a target image, the target image including a target subject;
identifying the target subject in the target image to determine a target subject region;
collecting voice data in real time, and recognizing the collected voice data as text;
determining, according to the target subject region, an initial position at which the text is presented; and
displaying the text in the target image with the initial position as a starting point.
2. The method according to claim 1, wherein the step of displaying the text in the target image with the initial position as the starting point comprises:
when the text corresponding to the voice data forms a word, displaying the word at the initial position;
when the text corresponding to the voice data forms a next word, moving the previously displayed word in a direction away from the initial position while displaying it; and
displaying the next word at the initial position, and repeating from the step performed when the text corresponding to the voice data forms the next word, so that as the voice data elapses over time, the text corresponding to the voice data is displayed in real time in a word-by-word scrolling manner.
3. The method according to claim 1, wherein the step of displaying the text in the target image with the initial position as the starting point comprises:
forming the voice data collected in real time into a voice segment, and obtaining segment text corresponding to the voice segment;
displaying the segment text at the initial position;
obtaining next segment text corresponding to a next voice segment, and moving the previously displayed segment text in a direction away from the initial position while displaying it; and
displaying the next segment text at the initial position, and repeating from the step of obtaining the next segment text corresponding to the next voice segment, so that as the voice data elapses over time, the text corresponding to the voice data is displayed in real time in a segment-by-segment scrolling manner.
4. The method according to claim 1, wherein the method further comprises:
performing word segmentation on the text to obtain multiple sub-texts;
the step of displaying the text in the target image with the initial position as the starting point comprising:
determining a starting display time corresponding to each sub-text according to a voice timestamp corresponding to each sub-text; and
dynamically displaying each sub-text over time along a preset track, with the initial position as the starting point, according to the starting display time corresponding to each sub-text.
5. The method according to claim 4, wherein, after the step of performing word segmentation on the text to obtain the multiple sub-texts, the method further comprises:
extracting key texts from the multiple sub-texts according to semantic recognition;
the step of displaying the text in the target image with the initial position as the starting point comprising:
determining a starting display time corresponding to each key text according to a voice timestamp corresponding to each key text; and
dynamically displaying each key text over time along a preset track, with the initial position as the starting point, according to the starting display time corresponding to each key text.
6. The method according to claim 1, wherein the method further comprises:
obtaining a shooting instruction, and obtaining, according to the shooting instruction, a current target image and current text displayed in the current target image; and
synthesizing the current text and the current target image into a composite image according to a current display position of the current text, and saving the composite image.
7. The method according to claim 1, wherein the method further comprises:
obtaining a start-shooting instruction, and according to the start-shooting instruction, continuously synthesizing the text displayed in the target image with the target image to form composite image frames, and saving each composite image frame; and
obtaining an end-shooting instruction, and forming a composite video from the composite image frames.
8. The method according to claim 1, wherein the target subject is a face;
the step of identifying the target subject in the target image to determine the target subject region comprises:
extracting facial feature points from the image, and determining a face region according to the facial feature points; and
the step of determining, according to the target subject region, the initial position at which the text is presented comprises: determining a mouth position according to feature points representing a mouth among the facial feature points, and determining the initial position of the text according to the mouth position.
9. The method according to claim 1, wherein the step of displaying the text in the target image with the initial position as the starting point comprises:
controlling the text to be dynamically displayed with the initial position as the starting point according to display control parameters corresponding to the text.
10. The method according to claim 9, wherein the step of controlling the text to be dynamically displayed with the initial display position as the starting point according to the display control parameters corresponding to the text comprises:
obtaining a display position of the text in a previous frame image; and
calculating a target position of the text in a current frame image according to the display control parameters and the display position of the text in the previous frame image, and displaying the text at the target position in the current frame image.
11. The method according to claim 1, wherein, after the step of collecting the voice data in real time and recognizing the collected voice data as text, the method further comprises:
converting the text into a text picture;
the step of displaying the text in the target image with the initial position as the starting point comprising:
taking the text picture as a particle in a particle system, and controlling, according to particle parameters preset in the particle system, the text picture to be dynamically displayed with the initial position as the starting point.
12. The method according to claim 11, wherein the particle parameters comprise at least one of a speed parameter, an angle parameter, a color parameter, a size parameter and a time parameter; and
the step of controlling, according to the particle parameters preset in the particle system, the text picture to be dynamically displayed with the initial position as the starting point comprises:
obtaining a text picture state in a previous frame image, the text picture state comprising at least one of a position, a size, an angle and a color of the text picture; and
calculating a text picture state in a current frame image according to the particle parameters and the text picture state in the previous frame image, and displaying the text picture according to the text picture state in the current frame image.
13. The method according to claim 11, wherein the step of taking the text picture as the particle in the particle system and controlling, according to the particle parameters preset in the particle system, the text picture to be dynamically displayed with the initial position as the starting point comprises:
determining a starting display time of the text picture according to a voice timestamp of the voice data corresponding to the text picture; and
controlling each text picture, according to its corresponding particle parameters, to be dynamically displayed with the initial position as the starting point and its starting display time as the starting time.
14. The method according to any one of claims 1 to 13, wherein the method further comprises:
recording an image timestamp of each collected frame of video image;
recording a voice timestamp corresponding to the obtained voice data, and associating the corresponding voice timestamp with the recognized text; and
synchronizing the display of the video image and the corresponding text according to the image timestamp and the voice timestamp.
15. An image processing method, the method comprising:
obtaining a target image, the target image including a mouth;
detecting the mouth in the target image, performing lip reading recognition according to mouth movement, and obtaining corresponding recognized text; and
displaying the recognized text in the target image in synchronization.
16. The method according to claim 15, wherein the step of displaying the recognized text in the target image in synchronization comprises:
determining a display position of the text according to the position of the mouth, and displaying the text in synchronization at the display position in the target image.
17. An image processing apparatus, the apparatus comprising:
an acquisition module, configured to obtain a target image, the target image including a target subject;
an image recognition module, configured to identify the target subject in the target image to determine a target subject region;
a voice recognition module, configured to collect voice data in real time and recognize the collected voice data as text;
a position determination module, configured to determine, according to the target subject region, an initial position at which the text is presented; and
a display module, configured to display the text in the target image with the initial position as a starting point.
18. An image processing apparatus, the apparatus comprising:
an image acquisition module, configured to obtain a target image, the target image including a mouth;
a lip reading recognition module, configured to detect the mouth in the target image, perform lip reading recognition according to mouth movement, and obtain corresponding recognized text; and
a synchronous display module, configured to display the recognized text in the target image in synchronization.
19. A computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to execute the steps of the method according to any one of claims 1 to 16.
20. a kind of computer equipment, including memory and processor, the memory is stored with computer program, the calculating
When machine program is executed by the processor so that the processor is executed such as any one of claim 1 to 16 the method
Step.
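To make the data flow of the claim-17 apparatus concrete, here is a minimal mock in plain Python. Every module body is a stub of my own invention (the real apparatus would use camera capture, subject detection, and a speech recognizer); only the module boundaries and the image → subject region → text → initial position → overlay flow come from the claim.

```python
# Illustrative mock of the claim-17 module pipeline. All values returned by
# the stubs below are invented for demonstration; only the module structure
# and data flow reflect the claim.

class ImagePipeline:
    def acquire(self):
        # Acquisition module: pretend we captured a 640x480 frame.
        return {"size": (640, 480)}

    def recognize_subject(self, image):
        # Image recognition module: pretend the subject occupies this box.
        return (200, 120, 240, 260)  # (x, y, w, h), assumed format

    def recognize_speech(self, audio_chunk):
        # Speech recognition module: pretend the audio decodes to this text.
        return "hello"

    def initial_position(self, region, image):
        # Position determination module: start the text just below the
        # subject region, clamped to the frame height.
        x, y, w, h = region
        return (x, min(y + h + 5, image["size"][1] - 1))

    def display(self, image, text, start):
        # Display module: represent the overlay as a simple record.
        return {"text": text, "start": start}

def run(pipeline, audio_chunk):
    image = pipeline.acquire()
    region = pipeline.recognize_subject(image)
    text = pipeline.recognize_speech(audio_chunk)
    start = pipeline.initial_position(region, image)
    return pipeline.display(image, text, start)

if __name__ == "__main__":
    print(run(ImagePipeline(), b"\x00\x01"))  # {'text': 'hello', 'start': (200, 385)}
```

Keeping each module behind its own method mirrors the claimed apparatus structure, so any one stub (e.g. the speech recognizer) can be swapped for a real implementation without touching the others.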
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810036627.0A CN108320318B (en) | 2018-01-15 | 2018-01-15 | Image processing method, device, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810036627.0A CN108320318B (en) | 2018-01-15 | 2018-01-15 | Image processing method, device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108320318A (en) | 2018-07-24 |
CN108320318B (en) | 2023-07-28 |
Family
ID=62893260
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810036627.0A Active CN108320318B (en) | 2018-01-15 | 2018-01-15 | Image processing method, device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108320318B (en) |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004056286A (en) * | 2002-07-17 | 2004-02-19 | Fuji Photo Film Co Ltd | Image display method |
CN1822651A (en) * | 2005-11-21 | 2006-08-23 | 深圳创维-Rgb电子有限公司 | Method for dynamically forming caption image data and caption data flow |
US20090002368A1 (en) * | 2007-06-26 | 2009-01-01 | Nokia Corporation | Method, apparatus and a computer program product for utilizing a graphical processing unit to provide depth information for autostereoscopic display |
CN101539929A (en) * | 2009-04-17 | 2009-09-23 | 无锡天脉聚源传媒科技有限公司 | Method for indexing TV news by utilizing computer system |
CN101917557A (en) * | 2010-08-10 | 2010-12-15 | 浙江大学 | Method for dynamically adding subtitles based on video content |
CN101996195A (en) * | 2009-08-28 | 2011-03-30 | 中国移动通信集团公司 | Searching method and device of voice information in audio files and equipment |
CN102222101A (en) * | 2011-06-22 | 2011-10-19 | 北方工业大学 | Method for video semantic mining |
CN202352332U (en) * | 2011-11-30 | 2012-07-25 | 李扬德 | Portable type lip language identifier |
CN103716537A (en) * | 2013-12-18 | 2014-04-09 | 宇龙计算机通信科技(深圳)有限公司 | Photograph synthesizing method and terminal |
CN104408462A (en) * | 2014-09-22 | 2015-03-11 | 广东工业大学 | Quick positioning method of facial feature points |
CN105245917A (en) * | 2015-09-28 | 2016-01-13 | 徐信 | System and method for generating multimedia voice caption |
CN105654532A (en) * | 2015-12-24 | 2016-06-08 | Tcl集团股份有限公司 | Photo photographing and processing method and system |
CN105975273A (en) * | 2016-05-04 | 2016-09-28 | 腾讯科技(深圳)有限公司 | Particle animation realization method and system as well as purification process display method and system for optimization tool |
CN106384108A (en) * | 2016-08-31 | 2017-02-08 | 上海斐讯数据通信技术有限公司 | Text content retrieval method, word interpreting device and mobile terminal |
CN107220228A (en) * | 2017-06-13 | 2017-09-29 | 深圳市鹰硕技术有限公司 | One kind teaching recorded broadcast data correction device |
2018
- 2018-01-15 CN CN201810036627.0A patent/CN108320318B/en active Active
Non-Patent Citations (2)
Title |
---|
SAAD D. AL-SHAMMA ET AL.: "Arabic Braille Recognition and transcription into text and voice", 《2010 5TH CAIRO INTERNATIONAL BIOMEDICAL ENGINEERING CONFERENCE》, pages 1 - 5 * |
WU, Hui: "Natural scene text detection for a vision assistance system for the blind", China Master's Theses Full-text Database (Information Science and Technology), no. 3, pages 138-1904 *
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110782899A (en) * | 2018-07-26 | 2020-02-11 | 富士施乐株式会社 | Information processing apparatus, storage medium, and information processing method |
CN111124145A (en) * | 2018-11-01 | 2020-05-08 | 奇酷互联网络科技(深圳)有限公司 | Information input method, mobile terminal and storage device |
CN111124145B (en) * | 2018-11-01 | 2023-05-16 | 奇酷互联网络科技(深圳)有限公司 | Information input method, mobile terminal and storage device |
CN109857905B (en) * | 2018-11-29 | 2022-03-15 | 维沃移动通信有限公司 | Video editing method and terminal equipment |
CN109857905A (en) * | 2018-11-29 | 2019-06-07 | 维沃移动通信有限公司 | A kind of video editing method and terminal device |
CN111462279A (en) * | 2019-01-18 | 2020-07-28 | 阿里巴巴集团控股有限公司 | Image display method, device, equipment and readable storage medium |
CN111462279B (en) * | 2019-01-18 | 2023-06-09 | 阿里巴巴集团控股有限公司 | Image display method, device, equipment and readable storage medium |
CN112015943A (en) * | 2019-05-31 | 2020-12-01 | 华为技术有限公司 | Humming recognition method and related equipment |
WO2020239001A1 (en) * | 2019-05-31 | 2020-12-03 | 华为技术有限公司 | Humming recognition method and related device |
CN110445954A (en) * | 2019-07-26 | 2019-11-12 | 腾讯科技(深圳)有限公司 | Image-pickup method, device and electronic equipment |
CN111464827A (en) * | 2020-04-20 | 2020-07-28 | 玉环智寻信息技术有限公司 | Data processing method and device, computing equipment and storage medium |
WO2022183814A1 (en) * | 2021-03-03 | 2022-09-09 | Oppo广东移动通信有限公司 | Voice annotation and use method and device for image, electronic device, and storage medium |
CN113873165A (en) * | 2021-10-25 | 2021-12-31 | 维沃移动通信有限公司 | Photographing method and device and electronic equipment |
CN115209175A (en) * | 2022-07-18 | 2022-10-18 | 忆月启函(盐城)科技有限公司 | Voice transmission method and system |
CN115209175B (en) * | 2022-07-18 | 2023-10-24 | 深圳蓝色鲨鱼科技有限公司 | Voice transmission method and system |
Also Published As
Publication number | Publication date |
---|---|
CN108320318B (en) | 2023-07-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108320318A (en) | Image processing method, device, computer equipment and storage medium | |
Olszewski et al. | High-fidelity facial and speech animation for VR HMDs | |
CN109120866B (en) | Dynamic expression generation method and device, computer readable storage medium and computer equipment | |
US20200357180A1 (en) | Augmented reality apparatus and method | |
Fu et al. | High-fidelity face manipulation with extreme poses and expressions | |
Wang et al. | Movie2comics: Towards a lively video content presentation | |
US20210345016A1 (en) | Computer vision based extraction and overlay for instructional augmented reality | |
US20170287481A1 (en) | System and method to insert visual subtitles in videos | |
US20120130717A1 (en) | Real-time Animation for an Expressive Avatar | |
TWI255141B (en) | Method and system for real-time interactive video | |
KR20200054613A (en) | Video metadata tagging system and method thereof | |
CN111638784B (en) | Facial expression interaction method, interaction device and computer storage medium | |
US10755087B2 (en) | Automated image capture based on emotion detection | |
CN110868635A (en) | Video processing method and device, electronic equipment and storage medium | |
CN113709545A (en) | Video processing method and device, computer equipment and storage medium | |
US20040068408A1 (en) | Generating animation from visual and audio input | |
WO2018177134A1 (en) | Method for processing user-generated content, storage medium and terminal | |
CN108833964B (en) | Real-time continuous frame information implantation identification system | |
CN110176044A (en) | Information processing method, device, storage medium and computer equipment | |
Mattos et al. | Multi-view mouth renderization for assisting lip-reading | |
CN116206024A (en) | Video-based virtual human model driving method, device, equipment and storage medium | |
CN114567819B (en) | Video generation method, device, electronic equipment and storage medium | |
Bigioi et al. | Pose-aware speech driven facial landmark animation pipeline for automated dubbing | |
Doukas et al. | Dynamic neural portraits | |
KR20180082825A (en) | Method and apparatus for producing graphic effect according to motion recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||