CN108320318A - Image processing method, device, computer equipment and storage medium - Google Patents

Image processing method, device, computer equipment and storage medium

Info

Publication number
CN108320318A
Authority
CN
China
Prior art keywords
text, image, textual, target, initial position
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810036627.0A
Other languages
Chinese (zh)
Other versions
CN108320318B (en)
Inventor
程培
傅斌
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201810036627.0A
Publication of CN108320318A
Application granted
Publication of CN108320318B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00: 2D [Two Dimensional] image generation
    • G06T 11/60: Editing figures and text; Combining figures or text
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/279: Recognition of textual entities
    • G06F 40/284: Lexical analysis, e.g. tokenisation or collocates
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/30: Semantic analysis

Abstract

This application relates to an image processing method. The method includes: obtaining a target image, the target image including a target subject; recognizing the target subject in the target image to identify a target subject region; collecting voice data in real time and recognizing the collected voice data as text; determining, according to the target subject region, an initial position at which the text is presented; and displaying the text in the target image with the initial position as a starting point. Because the method converts collected voice data into text and then displays the text in the image, words can be added to the captured image without any additional editing operation, which is simple to operate. An image processing apparatus, a computer device, and a storage medium are also provided.

Description

Image processing method, device, computer equipment and storage medium
Technical field
This application relates to the field of computer processing technologies, and in particular to an image processing method and apparatus, a computer device, and a storage medium.
Background
With the development of terminals, especially mobile terminals, taking photos or shooting videos with a terminal's camera has become commonplace. However, traditional photo or video capture on a mobile terminal only records the scene as-is; a user who wants to add content to a captured picture must edit it afterwards with image-retouching tools, which is cumbersome.
Summary
In view of the above problems, it is necessary to provide an image processing method and apparatus, a computer device, and a storage medium that are simple to operate.
An image processing method, the method including:
obtaining a target image, the target image including a target subject;
recognizing the target subject in the target image to identify a target subject region;
collecting voice data in real time, and recognizing the collected voice data as text;
determining, according to the target subject region, an initial position at which the text is presented; and
displaying the text in the target image with the initial position as a starting point.
An image processing apparatus, the apparatus including:
an acquisition module, configured to obtain a target image, the target image including a target subject;
an image recognition module, configured to recognize the target subject in the target image to identify a target subject region;
a speech recognition module, configured to collect voice data in real time and recognize the collected voice data as text;
a position determination module, configured to determine, according to the target subject region, an initial position at which the text is presented; and
a display module, configured to display the text in the target image with the initial position as a starting point.
A computer-readable storage medium storing a computer program that, when executed by a processor, causes the processor to perform the following steps:
obtaining a target image, the target image including a target subject;
recognizing the target subject in the target image to identify a target subject region;
collecting voice data in real time, and recognizing the collected voice data as text;
determining, according to the target subject region, an initial position at which the text is presented; and
displaying the text in the target image with the initial position as a starting point.
A computer device, including a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the following steps:
obtaining a target image, the target image including a target subject;
recognizing the target subject in the target image to identify a target subject region;
collecting voice data in real time, and recognizing the collected voice data as text;
determining, according to the target subject region, an initial position at which the text is presented; and
displaying the text in the target image with the initial position as a starting point.
According to the above image processing method, apparatus, computer device, and storage medium, a target image is obtained; the target subject in the target image is recognized to identify the target subject region; voice data is collected in real time and recognized as text; the initial position at which the text is presented is then determined according to the target subject region; and the text is displayed in the target image with the initial position as a starting point. Because the method converts voice data into text in real time and then displays the text in the target image, words can be added to the captured image without additional editing, which is simple to operate; moreover, since the initial position of the text in the image is determined according to the target subject region, the display of the text can be dynamically bound to the image.
An image processing method, the method including:
obtaining a target image, the target image including a mouth;
detecting the mouth in the target image, performing lip reading recognition according to the mouth movement, and obtaining corresponding recognized text; and
displaying the recognized text synchronously in the target image.
An image processing apparatus, the apparatus including:
an image acquisition module, configured to obtain a target image, the target image including a mouth;
a lip reading recognition module, configured to detect the mouth in the target image, perform lip reading recognition according to the mouth movement, and obtain corresponding recognized text; and
a synchronous display module, configured to display the recognized text synchronously in the target image.
A computer-readable storage medium storing a computer program that, when executed by a processor, causes the processor to perform the following steps:
obtaining a target image, the target image including a mouth;
detecting the mouth in the target image, performing lip reading recognition according to the mouth movement, and obtaining corresponding recognized text; and
displaying the recognized text synchronously in the target image.
A computer device, including a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the following steps:
obtaining a target image, the target image including a mouth;
detecting the mouth in the target image, performing lip reading recognition according to the mouth movement, and obtaining corresponding recognized text; and
displaying the recognized text synchronously in the target image.
According to the above image processing method, apparatus, computer device, and storage medium, a target image is obtained; the mouth in the target image is detected; lip reading recognition is performed according to the mouth movement to obtain corresponding recognized text; and the recognized text is then displayed synchronously in the target image. By performing lip reading recognition on the mouth movement recognized in the image and displaying the corresponding text in synchronization with that movement, text can be conveniently added to the image while remaining consistent with the facial movement.
Description of the drawings
Fig. 1 is a flowchart of an image processing method according to an embodiment;
Fig. 2A is a schematic interface diagram in which a first word is displayed in a target image according to an embodiment;
Fig. 2B is a schematic interface diagram in which a previous word is moved away while a next word is displayed at the initial position according to an embodiment;
Fig. 2C is a schematic interface diagram in which multiple words are presented in a target image according to an embodiment;
Fig. 3 is a schematic interface diagram of segment text displayed in a target image according to an embodiment;
Fig. 4 is a flowchart of an image processing method according to another embodiment;
Fig. 5 is a schematic diagram of extracted facial feature points according to an embodiment;
Fig. 6 is a flowchart of a method for controlling a text image to be dynamically displayed from a starting display position according to display control parameters, according to an embodiment;
Fig. 7 is a schematic flowchart of particle rendering in a particle system according to an embodiment;
Fig. 8 is a flowchart of a method for controlling a text image to be dynamically displayed from a starting display position according to display control parameters, according to another embodiment;
Fig. 9 is a flowchart of an image processing method according to yet another embodiment;
Fig. 10 is a schematic flowchart of an image processing method according to an embodiment;
Fig. 11 is a schematic diagram of an effect of presenting spoken text in an image according to an embodiment;
Fig. 12 is a flowchart of an image processing method according to a further embodiment;
Fig. 13 is a flowchart of an image processing method according to a still further embodiment;
Fig. 14 is a structural block diagram of an image processing apparatus according to an embodiment;
Fig. 15 is a structural block diagram of an image processing apparatus according to another embodiment;
Fig. 16 is a structural block diagram of an image processing apparatus according to yet another embodiment;
Fig. 17 is a structural block diagram of a display module according to an embodiment;
Fig. 18 is a structural block diagram of an image processing apparatus according to a still further embodiment;
Fig. 19 is a diagram of the internal structure of a computer device according to an embodiment.
Detailed description of embodiments
To make the objectives, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are intended only to explain this application and are not intended to limit it.
As shown in Fig. 1, in one embodiment, an image processing method is provided. This embodiment is described mainly by applying the method to a terminal. Referring to Fig. 1, the image processing method specifically includes the following steps:
Step S102: obtain a target image, the target image including a target subject.
The target image is the image to be processed. It can be obtained by taking a photo or by shooting a video, since a video can be regarded as a sequence of frame pictures. The image can be captured by a front or rear camera of the terminal. The target subject is the target object to be recognized in the image, and can be set in a user-defined way: for example, it can be set as a person, a face, or, more specifically, the facial features, and of course it can also be set as an animal, a tree, or the like, according to actual requirements. The target image can be an image or a video captured in real time, or an image or a video that has already been shot. In one embodiment, the obtained target image is a to-be-captured preview image obtained by invoking the camera, where a preview image is an image that has not yet been saved.
Step S104: recognize the target subject in the target image to identify a target subject region.
The target subject in the target image is recognized using a target subject recognition method. For example, assuming the target subject is a face, the face in the target image is recognized using a face recognition method. Identifying the region where the target subject is located makes it easier to subsequently determine the display position of the text according to the target subject region.
Step S106: collect voice data in real time, and recognize the collected voice data as text.
The voice data is obtained by collecting the user's speech in real time through a microphone of the terminal. After the user's speech is received, the collected voice data is recognized as text using speech recognition technology, the text being the word sequence obtained by recognizing the voice data. Speech recognition can be implemented with existing technology; for example, on iOS, speech recognition can be performed by calling the API in SpeechKit (a speech recognition tool), and on Android it can be implemented by calling other speech recognition interfaces. The way the voice data is recognized is not limited here.
Step S108: determine, according to the target subject region, an initial position at which the text is presented.
A positional relationship between the target subject and the text can be preset. During image capture, once the position of the target subject is obtained, the starting display position of the text can be determined, and the corresponding text is then displayed in the image according to that starting display position. For example, the initial position of the text may be preset to the upper left of the target subject; after the position of the target subject is determined, the initial position of the text can be determined, and once the text is obtained, it is displayed at the corresponding initial position. In one embodiment, a text display box is also included: after the position of the target subject is determined, the position of the text display box is determined first, and the text is then displayed inside the box, whose size can be adjusted automatically according to the length of the text.
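As an illustration, such a preset positional relationship can be as simple as a fixed displacement from the subject's bounding box. A minimal sketch in Python (the function name, box format, and offset values are assumptions for illustration, not taken from the patent):

```python
from typing import Tuple

def text_start_position(subject_box: Tuple[int, int, int, int],
                        dx: int = -10, dy: int = -10) -> Tuple[int, int]:
    """Place the text anchor relative to the subject's bounding box.

    subject_box is (left, top, width, height); the default offsets put
    the anchor to the upper left of the box, mirroring the "upper left
    of the target subject" example in the text.
    """
    left, top, _w, _h = subject_box
    return (left + dx, top + dy)
```

Any other preset relationship (below the subject, beside it) is just a different pair of offsets.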
Step S110: display the text in the target image with the initial position as a starting point.
The initial position is the starting display position of the text. After the initial position of the text is determined, the text is displayed in the target image with the initial position as the starting point.
According to the above image processing method, a target image is obtained; the target subject in the target image is recognized to identify the target subject region; voice data is collected in real time and recognized as text; the initial position at which the text is presented is then determined according to the target subject region; and the text is displayed in the target image with the initial position as a starting point. Because the method converts voice data into text in real time and then displays the text in the target image, words can be added to the captured image without additional editing, which is simple to operate; moreover, since the initial position of the text in the image is determined according to the target subject region, the display of the text can be dynamically bound to the image.
In one embodiment, the step of displaying the text in the target image with the initial position as a starting point includes: when the text corresponding to the voice data forms a word, displaying the word at the initial position; when the text corresponding to the voice data forms a next word, moving the previously displayed word in a direction away from the initial position while displaying it, and displaying the next word according to the initial position; and repeatedly entering the step performed when the text corresponding to the voice data forms a next word, so that as the voice data continues over time, the text corresponding to the voice data is displayed in real time in a word-by-word moving manner.
Voice data is collected in real time. When the text corresponding to the collected voice data forms a word, the word is displayed at the initial position; when the text corresponding to the voice data forms a next word, the previously displayed words are moved in a direction away from the initial position while the next word is displayed according to the initial position. The next word can be displayed directly at the initial position, or near it. Over time, words are continuously formed from the collected voice data and continuously displayed in the image in real time in this word-moving manner. For example, Fig. 2A is a schematic interface diagram in which the first formed word is displayed at the initial position in the target image; Fig. 2B shows a next word being formed, with the previous word moved away from the initial position and the next word displayed at the initial position; and Fig. 2C shows multiple words presented on the target image over time.
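The word-by-word movement described above can be modeled as a list in which the newest word always occupies the initial-position slot and earlier words shift away from it. A toy sketch (the list-based model and all names are illustrative assumptions):

```python
from typing import List

def push_word(displayed: List[str], new_word: str) -> List[str]:
    """The newest word takes slot 0 (the initial position); words already
    on screen shift one slot further from the initial position."""
    return [new_word] + displayed

# Feeding recognized words one by one reproduces the Fig. 2A-2C behaviour:
line: List[str] = []
for w in ["you", "look", "great"]:
    line = push_word(line, w)
```

Mapping slot index to screen coordinates then gives each word its drifting display position.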
In one embodiment, the step of displaying the text in the target image with the initial position as a starting point includes: when voice data segments collected in real time form a speech segment, obtaining the segment text corresponding to the speech segment; displaying the segment text at the initial position; obtaining the next segment text corresponding to the next speech segment, moving the previously displayed segment text in a direction away from the initial position while displaying it, and displaying the next segment text according to the initial position; and repeatedly entering the step of obtaining the next segment text corresponding to the next speech segment, so that as the voice data continues over time, the text corresponding to the voice data is displayed in real time in a segment-moving manner.
Voice data is collected in real time. When the voice data segments collected in real time form a speech segment, the segment text corresponding to the speech segment is obtained, and the recognized current segment text is displayed at the initial position. Speech segments can be obtained by silence detection: when silence occurs, the voice data before the silence is regarded as one speech segment. Alternatively, semantic recognition can identify a sentence expressing a complete meaning as one speech segment. When the next segment text corresponding to the next speech segment is obtained, the previously displayed segment text is moved in a direction away from the initial position while the next segment text is displayed according to the initial position. In one embodiment, the next segment can be displayed directly at the initial position; in another embodiment, it can be displayed near the initial position; and so on. In this manner, the formed segment texts are continuously displayed in the target image in real time in a moving manner. For example, Fig. 3 is a schematic interface diagram of segment text displayed in a target image in one embodiment.
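The silence-detection idea can be sketched over a stream of per-frame energy values, where a run of quiet frames closes the current segment. The threshold, run length, and all names below are illustrative assumptions, not the patent's parameters:

```python
from typing import List

def split_on_silence(frames: List[float], threshold: float = 0.1,
                     min_silence: int = 2) -> List[List[float]]:
    """Cut a stream of per-frame energies into speech segments: a run of
    at least `min_silence` frames below `threshold` ends the segment."""
    segments, current, quiet = [], [], 0
    for e in frames:
        if e < threshold:
            quiet += 1
            if quiet >= min_silence and current:
                segments.append(current)
                current = []
        else:
            quiet = 0
            current.append(e)
    if current:
        segments.append(current)
    return segments
```

Each returned segment would then be sent to the speech recognizer to produce one segment text.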
In one embodiment, the above image processing method further includes: performing word segmentation on the text to obtain multiple sub-texts. The step of displaying the text in the target image with the initial position as a starting point includes: determining the starting display time corresponding to each sub-text according to the speech timestamp corresponding to that sub-text; and, according to the starting display time corresponding to each sub-text, dynamically displaying each sub-text over time along a preset trajectory with the initial position as the starting point.
Word segmentation means cutting a word sequence into individual words, which can be unary or multi-element words: a unary word is a single word, and a multi-element word consists of two or more words. Word segmentation on the text yields multiple sub-texts. Specifically, the voice data collected in real time is first recognized as text, the text then undergoes word segmentation to obtain multiple sub-texts, and the starting display time corresponding to each sub-text is determined according to its speech timestamp, the speech timestamp being the collection time of the voice data corresponding to the text.
In one embodiment, the display order of the sub-texts can be determined by the chronological order of the speech timestamps of the corresponding voice data. Specifically, a relationship between the speech timestamp and the starting display time can be set; for example, the speech timestamp can be positively correlated with the starting display time, so that the earlier the time represented by the speech timestamp, the earlier the starting display time of the corresponding sub-text.
In another embodiment, multiple sub-texts within the same period (for example, one second) can be displayed out of order, because multiple sub-texts within the same period jointly express one meaning: even when shuffled, the intended meaning can still be made out, achieving "order within disorder". For example, for "you are really beautiful", if three sub-texts are generated, namely "you", "really", and "beautiful", displaying them out of order, e.g. "beautiful", "really", "you", still conveys the meaning "you are really beautiful", and the shuffling further increases the fun of the display.
After the starting display time of each sub-text is determined, each sub-text is dynamically displayed over time along a preset trajectory with the initial position as the starting point, according to its corresponding starting display time.
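The timestamp-to-display-time mapping, including the optional out-of-order display within a one-second window, can be sketched as follows. The positive correlation is modeled as identity, and the function name, window size, and seeding are assumptions for illustration:

```python
import random
from typing import List, Tuple

def schedule_subtexts(subtexts: List[Tuple[str, float]],
                      window: float = 1.0,
                      shuffle: bool = False,
                      seed: int = 0) -> List[Tuple[str, float]]:
    """Order (sub-text, speech timestamp) pairs for display.

    Display time equals the timestamp (a simple positive correlation);
    with shuffle=True, sub-texts falling in the same `window`-second
    bucket are reordered at random, giving "order within disorder".
    """
    ordered = sorted(subtexts, key=lambda p: p[1])
    if not shuffle:
        return ordered
    rng = random.Random(seed)
    out: List[Tuple[str, float]] = []
    bucket: List[Tuple[str, float]] = []
    bucket_id = None
    for word, ts in ordered:
        b = int(ts // window)
        if bucket_id is None or b == bucket_id:
            bucket.append((word, ts))
            bucket_id = b
        else:
            rng.shuffle(bucket)
            out.extend(bucket)
            bucket, bucket_id = [(word, ts)], b
    rng.shuffle(bucket)
    out.extend(bucket)
    return out
```

Shuffling only within a bucket keeps sub-texts of different periods in order while mixing those that express one meaning together.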
In one embodiment, after performing word segmentation on the text to obtain multiple sub-texts, the method further includes: extracting key texts from the multiple sub-texts according to semantic recognition.
The step of displaying the text in the target image with the initial position as a starting point includes: determining the starting display time corresponding to each key text according to the speech timestamp corresponding to that key text; and, according to the starting display time corresponding to each key text, dynamically displaying each key text over time along a preset trajectory with the initial position as the starting point.
A key text is a keyword obtained through semantic recognition. After the key texts are recognized, the starting display time corresponding to each key text is determined according to its speech timestamp, and each key text is then dynamically displayed over time along a preset trajectory with the initial position as the starting point, according to that starting display time. Because the sub-texts obtained by segmentation may be very long, it is unnecessary to display every word; only the extracted key texts need to be displayed. For example, for "This summer is really awfully hot!", semantic extraction yields the key text "summer really hot", and only that key text is displayed.
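As a stand-in for the semantic extraction step, which in practice would use an NLP model, a toy filter that drops common function words reproduces the "summer really hot" example. The stop list and function name are illustrative assumptions, not the patent's method:

```python
from typing import List

# Words treated as non-key for this illustration only; a real system
# would decide by semantics, not a fixed list.
STOPWORDS = {"this", "is", "a", "the", "awfully", "!"}

def extract_key_texts(subtexts: List[str]) -> List[str]:
    """Keep only the sub-texts not in the stop list."""
    return [w for w in subtexts if w.lower() not in STOPWORDS]
```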
As shown in Fig. 4, in one embodiment, the above image processing method further includes:
Step S112: obtain a shooting instruction, and obtain, according to the shooting instruction, the current image and the current text displayed in the current image.
For a target image collected in real time, a shooting instruction also needs to be obtained if the target image is to be saved. The shooting instruction is an instruction to shoot the current image, i.e., to take a photo, and is obtained by detecting that the user has triggered the shooting button. After the terminal obtains the shooting instruction, it obtains the current image and the current text displayed in it, where the current image is the image corresponding to the current shooting moment and the current text is the text displayed in the image at the current shooting moment.
Step S114: synthesize the current text and the current image into a composite image according to the current display position of the current text, and save the composite image.
To obtain a composite image containing the current text and the current image, the current text and the current image are synthesized according to the current display position of the current text in the current image, and the result is saved as a composite image.
In one embodiment, the above image processing method further includes: obtaining a start-shooting instruction; according to the start-shooting instruction, continuously synthesizing the text displayed in the image with the image to form composite image frames, and saving each composite image frame; and obtaining an end-shooting instruction, and forming a composite video from the composite image frames.
A start-shooting instruction is an instruction to start shooting a video. After the start-shooting instruction is obtained, the current image and the current text in the current image are continuously obtained to generate composite image frames; a composite image frame is a video frame of the video being shot, i.e., each video frame is a composite image. The current image is the image corresponding to the current moment, and the current text is the text corresponding to the current moment; as time passes, the current moment keeps changing, so the text displayed in the current image and the current image are continuously synthesized into composite image frames, and each composite image frame is saved in real time. An end-shooting instruction is an instruction to end shooting the video. The composite video consists of consecutive composite image frames; after the end-shooting instruction is obtained, shooting stops and a composite video is generated from the saved composite image frames.
In one embodiment, the target subject is a face, and the step of recognizing the target subject in the image includes: extracting facial feature points in the image, and determining the position of the face according to the facial feature points.
With a face as the target subject, facial feature points in the image are extracted first in order to recognize the face. Facial feature points, also called facial landmarks, are used to locate the positions of facial parts, including but not limited to the eyes, mouth, nose, and eyebrows. The position of the face can be determined from the extracted facial feature points. Specifically, a face landmark localization technique can be used to extract the facial feature points from the face image, typically in two steps: face detection and face landmarking. Face detection first obtains the rough position of the face in the image, usually a rectangular box framing the face; then, based on that rectangular box, landmarking finds more precise positions and returns a series of coordinates of facial feature points. Fig. 5 is a schematic diagram of facial feature points obtained by landmark localization in one embodiment. Existing methods can be used for face landmark localization, for example AAM (Active Appearance Models) or ERT (Ensemble of Regression Trees); the landmark localization method is not limited here.
In one embodiment, the step of displaying the text in the image according to the position of the target subject during image capture includes: determining the mouth position according to the feature points representing the mouth among the facial feature points, determining the display position of the text according to the mouth position, and displaying the text in the image according to the display position.
The facial feature points include feature points of the mouth. The feature points representing the mouth are extracted from the facial feature points, and the mouth position can then be determined from them. A correspondence between the display position of the text and the mouth position is preset; after the mouth position is determined, the display position of the text is determined according to the mouth position, and the text is displayed at that position. Since people speak with their mouths, displaying the corresponding text around the mouth can create a scene in which the user appears to be speaking. Using this feature, by recording voice data, texts expressing one's mood or describing the scene can be added automatically while taking photos or shooting videos, which increases the fun of shooting.
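The mouth position can be taken, for example, as the centroid of the mouth landmarks, with the text anchored at a fixed offset beside it. A geometric sketch (the centroid choice, offsets, and names are assumptions for illustration):

```python
from typing import List, Tuple

Point = Tuple[float, float]

def mouth_anchor(mouth_points: List[Point],
                 dx: float = 20.0, dy: float = 0.0) -> Point:
    """Centroid of the mouth landmarks, offset sideways so the text
    appears to come out of the mouth."""
    xs = [p[0] for p in mouth_points]
    ys = [p[1] for p in mouth_points]
    cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
    return (cx + dx, cy + dy)
```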
In one embodiment, the step of displaying the text in the target image with the initial position as a starting point includes: controlling, according to display control parameters corresponding to the text, the text to be dynamically displayed with the initial position as the starting point.
Display control parameters are parameters used to control the dynamic display of the text, and include at least one of a speed parameter, an angle parameter, a color parameter, a size parameter, and a time parameter. The speed parameter includes initial velocity, acceleration, and other parameters for controlling motion speed; the initial velocity and acceleration are vectors, i.e., speeds and accelerations with directions, so the motion speed and position of the text can be calculated from the speed parameter and the initial position. The angle parameter includes a rotation angle parameter, i.e., the angular rotation of the text is controlled according to the angle parameter. The color parameter controls the display color change of the text, the size parameter controls the display change of the text's size, and the time parameter controls the display duration of the text. Specifically, the text is controlled, according to its corresponding display control parameters, to be dynamically displayed with the initial display position as the starting point, where the dynamic display includes at least one of position movement, angle change, color change, size change, and dwell time.
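One possible grouping of these display control parameters, together with a pose update driven by them, might look like the following. Field names, units, and default values are assumptions, not taken from the patent:

```python
from dataclasses import dataclass

@dataclass
class DisplayParams:
    """Speed, angle, and time parameters for one piece of text."""
    vx: float = 0.0         # initial velocity, px/s (x component)
    vy: float = 0.0         # initial velocity, px/s (y component)
    ax: float = 0.0         # acceleration, px/s^2 (x component)
    ay: float = 0.0         # acceleration, px/s^2 (y component)
    angle_rate: float = 0.0 # rotation speed, degrees/s
    duration: float = 2.0   # display duration, s

def pose_at(p: DisplayParams, x0: float, y0: float, t: float):
    """Position, rotation, and visibility t seconds after the text
    appears at (x0, y0), under constant acceleration."""
    x = x0 + p.vx * t + 0.5 * p.ax * t * t
    y = y0 + p.vy * t + 0.5 * p.ay * t * t
    return (x, y, p.angle_rate * t, t <= p.duration)
```

Color and size parameters would be further fields interpolated the same way against t.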
In one embodiment, text is controlled using initial presentation position as starting point according to the corresponding displaying control parameter of text Carry out Dynamic Display the step of include:Obtain the display location of text in forward frame image;According to displaying control parameter and forward direction The target location of text in current frame image is calculated in the display location of text in frame image, the target in current frame image Position text exhibition.
Wherein, if text is when being shown in a manner of movement in the picture, the position of movement is to need basis The display location of text is calculated in real time in displaying control parameter and forward frame image, and forward frame image refers to being in work as Picture frame before previous frame.In one embodiment, show in control parameter to include that initial velocity, acceleration etc. are transported for controlling The parameter of dynamic speed, according to the position of text in forward frame image, so that it may the displaying of text in current frame image is calculated Position.In one embodiment, it is assumed that the position of text is A in any forward frame image, the forward frame corresponding time is t1, Initial time is set as t0, it is assumed that initial velocity v0, acceleration a, and assume that initial velocity is consistent with acceleration direction, present frame figure As the corresponding time is t2.The position B, S=v0 of text in current frame image can so be calculated by following formula (t2-t1) the distance between A and B is calculated under the premise of the position A of known forward frame in+a (t2-t1) (t2+t1)/2 S, and the known direction of motion, can be calculated B location.
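Under the stated assumptions (t0 = 0, constant acceleration along a fixed direction), the position calculation can be sketched as follows; the function name and the upward drift direction are illustrative, not part of the patent:

```python
def next_position(pos_a, v0, a, direction, t1, t2):
    """Compute position B of the text in the current frame (time t2)
    from its position A in a forward frame (time t1), assuming the
    initial time t0 is 0 and motion has initial speed v0 and constant
    acceleration a along the unit vector `direction`."""
    # Distance travelled between t1 and t2:
    # S = v0*(t2 - t1) + a*(t2 - t1)*(t2 + t1)/2
    s = v0 * (t2 - t1) + a * (t2 - t1) * (t2 + t1) / 2
    return (pos_a[0] + direction[0] * s, pos_a[1] + direction[1] * s)

# Example: text at (100, 200) in the forward frame, drifting upward.
b = next_position(pos_a=(100.0, 200.0), v0=30.0, a=10.0,
                  direction=(0.0, -1.0), t1=0.5, t2=1.0)
```

With screen coordinates whose y-axis points down, a direction of (0, −1) moves the text upward away from the mouth.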
In one embodiment, after the step of collecting voice data in real time and recognizing the collected voice data as text, the method further includes: converting the text into a textual image. That is, after the text is recognized from the voice data, the text is converted into a textual image. In one embodiment, the background color of the textual image is transparent.

In one embodiment, step S110 of displaying the text in the target image with the initial position as the starting point includes: using the textual image as a particle in a particle system, and controlling the textual image, according to particle parameters preset in the particle system, to perform dynamic display with the initial position as the starting point.

Here, the converted textual image is used as a particle in the particle system. A particle system is a technique in three-dimensional computer graphics for simulating certain fuzzy visual phenomena; particles are two-dimensional images rendered in three-dimensional space and are mainly used for effects such as smoke, fire, water droplets, or leaves. A particle system consists of three parts: a particle emitter, a particle animator, and a particle renderer. The particle emitter controls the generation and initial state of the particles, the particle animator controls the motion state of the particles over time, and the particle renderer draws them on the screen. The particle emitter and particle animator are mainly described by a set of particle parameters, which may include parameters for controlling particle changes such as the particle generation rate (the number of particles generated per unit time), the initial velocity vector of the particles (e.g., in which direction they move and when), the particle lifetime (after how long a particle disappears), the particle color, and changes over the particle's life cycle (e.g., changes in size).
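A minimal sketch of the three-part structure described above (emitter, animator, renderer); all class names, parameter values, and the random velocity spread are illustrative assumptions rather than the patent's implementation:

```python
import random

class Particle:
    def __init__(self, image, pos, velocity, lifetime):
        self.image, self.pos, self.velocity = image, pos, velocity
        self.age, self.lifetime = 0.0, lifetime

class Emitter:
    """Controls particle generation and initial state."""
    def __init__(self, origin, rate, lifetime):
        self.origin, self.rate, self.lifetime = origin, rate, lifetime

    def emit(self, image, dt):
        # `rate` particles per second, each starting at the origin
        # (e.g. the mouth corner) with a randomized upward velocity
        count = max(1, int(self.rate * dt))
        return [Particle(image, self.origin,
                         (random.uniform(-20, 20), random.uniform(-60, -30)),
                         self.lifetime) for _ in range(count)]

class Animator:
    """Controls particle motion state over time."""
    def step(self, particles, dt):
        live = []
        for p in particles:
            p.pos = (p.pos[0] + p.velocity[0] * dt,
                     p.pos[1] + p.velocity[1] * dt)
            p.age += dt
            if p.age < p.lifetime:   # expired particles are discarded
                live.append(p)
        return live

def render(particles):
    """Stand-in renderer: report what would be drawn on screen."""
    return [(p.image, p.pos) for p in particles]
```

In a real renderer each `image` would be a textual picture with a transparent background, composited over the camera frame.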
As shown in Fig. 6, in one embodiment, the particle parameters include at least one of a speed parameter, an angle parameter, a color parameter, a size parameter, and a time parameter;

the step of controlling the textual image according to the particle parameters preset in the particle system to perform dynamic display with the initial position as the starting point includes:

Step S602: obtain the textual image state in a forward frame image, where the textual image state includes at least one of the position, size, angle, and color of the text.

Here, a forward frame image is an image frame preceding the current frame. The textual image state includes at least one of the position, size, angle, and color of the textual image. Specifically, to calculate the state of the textual image in the current frame image, the calculation is performed according to the particle parameters and the textual image state in the forward frame image.

Step S604: calculate the textual image state in the current frame image according to the particle parameters and the textual image state in the forward frame image, and display the textual image in the current frame image according to that state.

Here, the particle parameters include at least one of a speed parameter, an angle parameter, a color parameter, a size parameter, and a time parameter. The speed parameter controls the movement speed and direction of a particle, so the particle's current position can be calculated from it. The angle parameter controls the rotation angle and rotation speed of a particle, so the particle's current angle can be calculated from it. The color parameter controls the displayed color of a particle, the size parameter controls the size of a particle and its changes, and the time parameter controls the lifetime of a particle, i.e., how long the particle exists. After the textual image state in the forward frame image is obtained, the textual image state in the current frame image can be calculated from that state and the particle parameters; the position, size, angle, and color of the textual image in the current frame image are thus obtained, and the textual image is displayed accordingly.
In one embodiment, the particle parameters include only a speed parameter. In that case, the textual image state changes only in position, and everything else remains unchanged. The textual image state obtained from the forward video frame then contains only the position, and the position of the textual image in the current video frame is calculated from the position of the textual image in the forward video frame and the speed parameter.

In one embodiment, the forward frame image is the previous frame of the current frame; that is, the state of the textual image in the current frame image is calculated from its state in the previous frame image, and the particle renderer is called to render according to the state of the textual image in the current frame image. Fig. 7 is a schematic diagram of the particle rendering flow of the particle system in one embodiment: first, the initial state of the textual image (its state in the first frame image) is obtained; then, the state of the textual image in each current frame image is calculated from its state in the previous frame image; finally, the corresponding textual image is rendered in color on the screen.
In one embodiment, the step of converting the text into textual images includes: performing word segmentation on the text to obtain multiple display words, and generating one textual image for each display word, thereby obtaining multiple textual images.

Here, word segmentation (also called "cutting words") refers to cutting a character sequence into individual words. An individual word may be a single-character word or a multi-character word; a single-character word consists of one character, while a multi-character word consists of two or more characters and covers phrases with sequential relationships between words. The words obtained by segmentation are called "display words". Each display word generates one corresponding textual image, yielding multiple textual images.
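The segmentation-to-image step might be sketched as below; a real implementation would use a proper word segmenter (e.g. a Chinese segmentation library) and render each word with a graphics library, so the whitespace splitter and the `make_text_image` placeholder are stand-ins:

```python
def cut_words(text):
    """Split the recognized text into display words.
    A real implementation would call a word segmenter; for
    space-delimited text a plain split is a workable stand-in."""
    return [w for w in text.split() if w]

def make_text_image(word):
    """Hypothetical hook: render one display word onto a
    transparent-background picture (e.g. with an image library)."""
    return {"word": word, "background": "transparent"}

def text_to_images(text):
    # one textual image per display word
    return [make_text_image(w) for w in cut_words(text)]

images = text_to_images("you are really pretty")
```

Each resulting textual image can then be fed to the particle system as one particle.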
In one embodiment, the step of converting the text into textual images includes: identifying target keywords in the text through semantic recognition, and converting the target keywords into textual images.

Here, target keywords are the words obtained through semantic recognition that require emphasis. After the target keywords are identified, they are converted into textual images. For example, for "It is really hot this summer!", semantic extraction yields the keywords "summer", "really", and "hot", which are subsequently displayed with emphasis as target keywords, while non-target keywords such as "it is" and "this" may be displayed faded or not displayed at all.
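A sketch of the emphasis split, under the assumption that semantic recognition is approximated by a simple stop-word heuristic; the stop-word list and the split into emphasized versus faded words are illustrative:

```python
STOP_WORDS = {"this", "is", "it", "a", "so", "the"}

def split_keywords(words):
    """Separate target keywords (to be emphasized) from the rest
    (to be faded or hidden). Real semantic recognition would replace
    the stop-word heuristic used here."""
    targets = [w for w in words if w.lower() not in STOP_WORDS]
    others = [w for w in words if w.lower() in STOP_WORDS]
    return targets, others

targets, others = split_keywords(["this", "summer", "is", "really", "hot"])
```

Only the `targets` list would then be converted into textual images for emphasized display.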
As shown in Fig. 8, in one embodiment, the step of using the textual images as particles in the particle system and controlling them according to the particle parameters preset in the particle system to perform dynamic display with the starting display position as the starting point includes:

Step S802: determine the starting display time of each textual image according to the speech timestamp of the voice data corresponding to the textual image.

Here, since the textual image is generated from text, and the text is obtained by recognizing the voice data, the speech timestamp is the time at which the voice data was obtained. The speech timestamp can therefore serve as the timestamp of the textual image, and the corresponding starting display time is determined from it. In one embodiment, the display order of the textual images can be determined by the order of the speech timestamps of their corresponding voice data. Specifically, a relationship between the speech timestamp and the starting display time can be set; for example, they can be positively correlated, so that the earlier the time represented by the speech timestamp, the earlier the starting display time of the corresponding textual image. In another embodiment, multiple textual images within the same time period (for example, one second) can be displayed out of order, because the multiple textual images within the same period together express one meaning, which remains discernible even when out of order, achieving "order within disorder". For example, for "You are really pretty", suppose three textual images are generated, for "you", "really", and "pretty"; displaying them out of order, say as "pretty", "really", "you", still conveys the meaning "you are really pretty", and the disorder further increases the fun of shooting.
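The timestamp-to-start-time mapping with out-of-order display inside a one-second window might look like the following sketch; the choice of window size and of a shared per-window start time are assumptions:

```python
import random

def schedule(images_with_ts, window=1.0, shuffle_within_window=True, seed=None):
    """Assign each textual image a starting display time that is
    positively correlated with its speech timestamp; images whose
    timestamps fall in the same `window` may be shown out of order."""
    rng = random.Random(seed)
    ordered = sorted(images_with_ts, key=lambda it: it[1])
    buckets = {}
    for image, ts in ordered:
        buckets.setdefault(int(ts // window), []).append((image, ts))
    scheduled = []
    for key in sorted(buckets):
        group = buckets[key][:]
        if shuffle_within_window:
            rng.shuffle(group)   # "order within disorder"
        for image, ts in group:
            # images in the same window share a window-aligned start time
            scheduled.append((image, key * window))
    return scheduled

timeline = schedule([("you", 0.1), ("really", 0.4), ("pretty", 0.7),
                     ("today", 1.2)], seed=7)
```

The first three words land in the 0.0 s window (possibly shuffled among themselves), while "today" starts in the next window.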
Step S804: control each textual image, according to its corresponding particle parameters, to perform dynamic display with its starting display position and starting display time as the starting state.

Here, the starting display position is the position in the image where the textual image is first displayed, and the starting display time is the time at which the textual image first appears. Once the starting display time and starting display position of a textual image are determined, dynamic display can be performed according to the particle parameters corresponding to the textual image. The particle parameters corresponding to different textual images may be the same or different; for example, the same particle parameters can be set for all particles with only the display time differing for each particle, or different particle parameters can be set for different particles.
As shown in Fig. 9, in one embodiment, the above image processing method further includes:

Step S116: record the image timestamp of each collected frame of video image.

Here, the image timestamp is the time corresponding to each collected video frame, i.e., the time at which that video image was captured. Specifically, when capturing video, the terminal records the capture time of each frame of video image, obtaining the image timestamp sequence (time1, time2, time3, ...) corresponding to the series of video images, so that the order of the frames can be determined from their capture times.
Step S118: record the speech timestamp corresponding to the obtained voice data, and associate the speech timestamp with the recognized text.

Here, the speech timestamp of the voice data is the time at which the voice data was collected. After the text is recognized from the speech, the speech timestamp is stored in association with the recognized text.

Step S120: synchronize the display of the video images and the corresponding text according to the image timestamps and the speech timestamps.

Here, to play the video images and the text synchronously, the display time of the text is determined from the image timestamps and the speech timestamps. Specifically, text and images whose speech timestamps and image timestamps coincide are displayed synchronously. By recording the speech timestamps together with the image timestamps, synchronized display of text and image is achieved, i.e., the words are synchronized with the mouth movements.
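Matching each video frame to the most recent recognized text by timestamp can be sketched as follows; the nearest-preceding-timestamp policy is one reasonable reading of the synchronization step, not necessarily the patent's exact rule:

```python
import bisect

def frame_text(image_timestamps, speech_entries):
    """For each video frame timestamp, pick the text whose speech
    timestamp most recently precedes it, so words appear in sync
    with the mouth movements that produced them."""
    speech_entries = sorted(speech_entries)          # (speech_ts, text)
    ts_list = [ts for ts, _ in speech_entries]
    result = []
    for image_ts in image_timestamps:
        i = bisect.bisect_right(ts_list, image_ts) - 1
        result.append((image_ts, speech_entries[i][1] if i >= 0 else None))
    return result

paired = frame_text([0.0, 0.5, 1.1], [(0.4, "hello"), (1.0, "world")])
```

Here the frame at 0.0 s has no text yet, the frame at 0.5 s shows "hello", and the frame at 1.1 s shows "world".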
Fig. 10 is a schematic flowchart of the image processing method in one embodiment, which mainly includes three steps: 1. capture images in real time through the camera, perform face detection on the captured images, extract the facial feature points, and determine the position of the face from the facial feature points; 2. while capturing images, receive voice data through the microphone, then perform speech recognition on the voice data to obtain the recognized text, and convert the recognized text into textual images; 3. use the textual images as particles in the particle system and set the particle emission region near the corner of the mouth, building a real-time "speaking words aloud" effect. Fig. 11 is a schematic diagram of the real-time "speaking words aloud" effect in one embodiment. Face recognition can be implemented by calling a face recognition SDK (Software Development Kit), and speech recognition of the voice data can be implemented by calling a speech recognition SDK.
As shown in Fig. 12, in one embodiment, an image processing method is proposed, which includes the following steps:

Step S1201: obtain a target image, where the target image includes a face.

Step S1202: extract the facial feature points in the target image, and determine the face region according to the facial feature points.

Step S1203: collect voice data in real time, and recognize the collected voice data as text.

Step S1204: perform word segmentation on the text to obtain multiple display words, and generate one textual image for each display word, obtaining multiple textual images.

Step S1205: determine the mouth position according to the feature points representing the mouth among the facial feature points, and determine the initial position of the textual images according to the mouth position.

Step S1206: determine the starting display time of each textual image according to the speech timestamp of the voice data corresponding to the textual image.

Step S1207: use the textual images as particles in the particle system, and control each textual image, according to its corresponding particle parameters, to perform dynamic display with its starting display position and starting display time as the starting state.

Step S1208: obtain a shooting instruction, and obtain, according to the shooting instruction, the current image and the current textual images displayed in the current image.

Step S1209: synthesize the current textual images and the current image into a composite image according to the current display positions of the current textual images, and save the composite image.
As shown in Fig. 13, in one embodiment, an image processing method is proposed, which includes:

Step S1302: obtain a target image, where the target image includes a mouth.

Here, the target image is the image to be processed. The target image can be obtained by taking a photo or by shooting a video, since a video can be regarded as a composition of frame-by-frame pictures. Images can be captured by the front or rear camera of the terminal. The target image can be an image or video captured in real time, or an image or video that has already been shot. In one embodiment, the obtained target image is a to-be-captured preview image obtained by calling the camera, where a preview image is an image that has not yet been saved.
Step S1304: detect the mouth in the target image, perform lip-reading recognition according to the mouth movements, and obtain the corresponding recognized text.

Here, the feature points of the mouth are determined by extracting the facial feature points, and the position and movements of the mouth are then determined from the mouth feature points. Lip-reading recognition is a technology that combines machine vision and natural language processing: the content of speech can be recognized directly from images of a person talking, i.e., the corresponding text can be recognized from the mouth movements. Lip-reading recognition can be implemented by calling a lip-reading SDK, i.e., a software toolkit written for lip-reading recognition.

Step S1306: display the recognized text synchronously in the target image.

Here, the mouth movements in the images are recognized in real time to obtain words, and the recognized words are then displayed synchronously with the corresponding images containing the corresponding mouth movements.

In the above image processing method, a target image is obtained, the mouth in the target image is detected, lip-reading recognition is performed according to the mouth movements to obtain the corresponding recognized text, and the recognized text is then displayed synchronously in the target image. By performing lip-reading recognition on the mouth movements in the image and displaying the corresponding text synchronously with the mouth movements, the method conveniently adds text to the image while keeping the text consistent with the facial movements.
In one embodiment, the step of displaying the recognized text synchronously in the target image includes: determining the display position of the text according to the position of the mouth, and displaying the text synchronously at the display position in the target image.

Here, a correspondence between the display position and the mouth position is preset, so that once the position of the mouth is determined, the display position of the text is determined, and the text is then displayed synchronously at that position in the image. Displaying the text around the mouth builds an effect of speaking words aloud in real time.
It should be understood that although the steps in the above flowcharts are shown in sequence as indicated by the arrows, these steps are not necessarily executed in that order. Unless expressly stated otherwise herein, there is no strict order restriction on the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps may include multiple sub-steps or multiple stages, which are not necessarily completed at the same moment but may be executed at different moments; their execution order is not necessarily sequential, and they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
As shown in Fig. 14, in one embodiment, an image processing apparatus is proposed, which includes:

an acquisition module 1402, configured to obtain a target image, where the target image includes a target subject;

an image recognition module 1404, configured to recognize the target subject in the target image and identify the target subject region;

a speech recognition module 1406, configured to collect voice data in real time and recognize the collected voice data as text;

a position determination module 1408, configured to determine, according to the target subject region, the initial position at which the text is presented;

a display module 1410, configured to display the text in the target image with the initial position as the starting point.

In one embodiment, the display module is further configured to: when the text corresponding to the voice data forms a word, display the word at the initial position; when the text corresponding to the voice data forms the next word, move the previously displayed words in a direction away from the initial position and display them; display the next word at the initial position; and repeat the step entered when the text corresponding to the voice data forms the next word, so that the text corresponding to the voice data is displayed in real time, with the words moving as the voice data time elapses.

In one embodiment, the display module is further configured to: form a speech segment from the voice data segment collected in real time, and obtain the segment text corresponding to the speech segment; display the segment text at the initial position; obtain the next segment text corresponding to the next speech segment, move the previously displayed segment texts in a direction away from the initial position and display them; display the next segment text at the initial position; and repeat the step of obtaining the next segment text corresponding to the next speech segment, so that the text corresponding to the voice data is displayed in real time, with the segment texts moving as the voice data time elapses.
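The word-by-word behavior described for the display module (each new word at the initial position, older words pushed away from it) can be sketched as a small layout helper; the pixel step and coordinates are assumed values:

```python
class WordTicker:
    """Each new word appears at the initial position; previously
    shown words shift away from it by `step` pixels per new word."""
    def __init__(self, initial_pos, step=(0, -24)):
        self.initial_pos, self.step, self.words = initial_pos, step, []

    def show(self, word):
        self.words.append(word)
        x0, y0 = self.initial_pos
        dx, dy = self.step
        n = len(self.words)
        # most recent word sits at the initial position; older
        # words are offset further away the older they are
        return [(w, (x0 + dx * (n - 1 - i), y0 + dy * (n - 1 - i)))
                for i, w in enumerate(self.words)]

layout = WordTicker((160, 300))
layout.show("you")
layout.show("are")
frames = layout.show("pretty")
```

After the third word, "pretty" sits at the initial position while "are" and "you" have been pushed one and two steps away, matching the scrolling effect described above.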
As shown in Fig. 15, in one embodiment, the apparatus further includes:

a word segmentation module 1412, configured to perform word segmentation on the text to obtain multiple sub-texts;

the display module is further configured to determine the starting display time corresponding to each sub-text according to the speech timestamp corresponding to the sub-text, and, according to the starting display time corresponding to each sub-text, dynamically display each sub-text over time along a preset track with the initial position as the starting point.

As shown in Fig. 16, in one embodiment, the apparatus further includes:

an extraction module 1414, configured to extract key texts from the multiple sub-texts according to semantic recognition;

the display module is further configured to determine the starting display time corresponding to each key text according to the speech timestamp corresponding to the key text, and, according to the starting display time corresponding to each key text, dynamically display each key text over time along a preset track with the initial position as the starting point.

In one embodiment, the above image processing apparatus further includes: an image shooting module, configured to obtain a shooting instruction, obtain, according to the shooting instruction, the current image and the current text displayed in the current image, synthesize the current text and the current image into a composite image according to the current display position of the current text, and save the composite image.

In one embodiment, the above image processing apparatus further includes: a video shooting module, configured to obtain a start-shooting instruction, continuously synthesize the text displayed in each image with that image to form composite image frames according to the start-shooting instruction, save each composite image frame, obtain an end-shooting instruction, and form a composite video from the composite image frames.
In one embodiment, the target subject is a face; the image recognition module is further configured to extract the facial feature points in the image and determine the position of the face according to the facial feature points; the display module is further configured to determine the mouth position according to the feature points representing the mouth among the facial feature points, determine the display position of the text according to the mouth position, and display the text in the image according to the display position.

In one embodiment, the display module is further configured to control the text, according to the display control parameters corresponding to the text, to perform dynamic display with the initial position as the starting point.

In one embodiment, the display module is further configured to obtain the display position of the text in a forward frame image, calculate the target position of the text in the current frame image according to the display control parameters and the display position of the text in the forward frame image, and display the text at the target position in the current frame image.

In one embodiment, the above image processing apparatus further includes: a conversion module, configured to convert the text into a textual image; the display module is further configured to use the textual image as a particle in a particle system, determine the starting display position of the textual image according to the position of the target subject, and control the textual image, according to the particle parameters preset in the particle system, to perform dynamic display with the starting display position as the starting point.
As shown in Fig. 17, in one embodiment, the particle parameters include at least one of a speed parameter, an angle parameter, a color parameter, a size parameter, and a time parameter;

the display module includes:

a forward textual image state acquisition module 1410A, configured to obtain the textual image state in a forward frame image, where the textual image state includes at least one of the position, size, angle, and color of the text;

a textual image display module 1410B, configured to calculate the textual image state in the current frame image according to the particle parameters and the textual image state in the forward frame image, and display the textual image according to the textual image state in the current frame image.

In one embodiment, the conversion module is further configured to perform word segmentation on the text to obtain multiple display words, and generate one textual image for each display word, obtaining multiple textual images.

In one embodiment, the conversion module is further configured to identify the target keywords in the text according to semantic recognition and convert the target keywords into textual images.

In one embodiment, the display module is further configured to determine the starting display time of each textual image according to the speech timestamp of the voice data corresponding to the textual image, and control each textual image, according to its corresponding particle parameters, to perform dynamic display with its starting display position and starting display time as the starting state.

In one embodiment, the above image processing apparatus further includes: a synchronous display module, configured to record the image timestamp of each collected frame of video image, record the speech timestamp corresponding to the obtained voice data, associate the speech timestamp with the recognized text, and synchronize the display of the video images and the corresponding text according to the image timestamps and the speech timestamps.
As shown in Fig. 18, in one embodiment, an image processing apparatus is proposed, which includes:

an image acquisition module 1802, configured to obtain a target image, where the target image includes a mouth;

a lip-reading recognition module 1804, configured to detect the mouth in the target image, perform lip-reading recognition according to the mouth movements, and obtain the corresponding recognized text;

a synchronous display module 1806, configured to display the recognized text synchronously in the target image.

In one embodiment, the synchronous display module 1806 is further configured to determine the display position of the text according to the position of the mouth, and display the text synchronously at the display position in the image.
Fig. 19 shows an internal structure diagram of a computer device in one embodiment. The computer device may specifically be a server. As shown in Fig. 19, the computer device includes a processor, a memory, a network interface, an input device, an image capture device, a voice capture device, and a display screen connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program which, when executed by the processor, causes the processor to implement the image processing method. The internal memory may also store a computer program which, when executed by the processor, causes the processor to execute the image processing method. The image capture device of the computer device is a camera for capturing images, and the voice capture device is a microphone for collecting voice data. The display screen of the computer device may be a liquid crystal display or an electronic ink display; the input device of the computer device may be a touch layer covering the display screen, a button, trackball, or trackpad provided on the housing of the computer device, or an external keyboard, trackpad, or mouse. Those skilled in the art can understand that the structure shown in Fig. 19 is only a block diagram of part of the structure related to the solution of this application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, image processing method provided by the present application can be implemented as a kind of shape of computer program Formula, computer program can be run on computer equipment as shown in figure 19.Composition can be stored in the memory of computer equipment Each program module of the image processing apparatus, for example, acquisition module 1402, picture recognition module 1404, language shown in Figure 14 Sound identification module 1406, position determination module 1408 and display module 1410.The computer program that each program module is constituted makes Processor executes the step in the image processing apparatus of each embodiment of the application described in this specification.For example, Figure 19 Shown in computer equipment can by the acquisition module 1402 in image processing apparatus as shown in figure 14 obtain target image, The target image includes target subject;The target subject in the target image is carried out by picture recognition module 1404 Identification, identifies target subject region;By 1406 collecting voice data in real time of sound identification module, by collected institute's predicate Sound data are identified as text;By position determination module 1408 according to the target subject region, determine what the text was presented Initial position;By display module 1410 using the initial position as starting point by the textual presentation in the target image.
In one embodiment, a computer device is proposed, including a memory and a processor. The memory stores a computer program which, when executed by the processor, causes the processor to perform the following steps: obtaining a target image, the target image containing a target subject; recognizing the target subject in the target image to identify a target subject region; collecting voice data in real time and recognizing the collected voice data as text; determining, according to the target subject region, an initial position at which the text is presented; and displaying the text in the target image starting from the initial position.
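The steps above can be sketched end to end as follows. This is a minimal illustration, not the patented implementation: the helper names (`detect_subject_region`, `anchor_point`, `present_text`) and the stand-in detector are assumptions, since the patent does not prescribe concrete APIs.

```python
# A minimal sketch of the claimed pipeline, with stand-in implementations.
# The helper names (detect_subject_region, anchor_point, present_text)
# are hypothetical; the patent does not prescribe concrete APIs.

def detect_subject_region(image):
    """Stand-in subject detector: here, the bounding box of nonzero pixels."""
    rows = [r for r, row in enumerate(image) if any(row)]
    cols = [c for c in range(len(image[0])) if any(row[c] for row in image)]
    return (min(rows), min(cols), max(rows), max(cols))  # top, left, bottom, right

def anchor_point(region):
    """Initial presentation position: bottom-center of the subject region."""
    top, left, bottom, right = region
    return (bottom, (left + right) // 2)

def present_text(image, text):
    region = detect_subject_region(image)
    start = anchor_point(region)
    # In a real system the text would be rendered into the frame; here we
    # just return the (position, text) pair that a renderer would use.
    return start, text

# Toy 5x5 "image" with a subject occupying rows 1-3, cols 1-3.
img = [[0] * 5 for _ in range(5)]
for r in range(1, 4):
    for c in range(1, 4):
        img[r][c] = 1

print(present_text(img, "hello"))  # → ((3, 2), 'hello')
```

In practice the subject detector would be a face detector and the text would come from a streaming speech recognizer; only the data flow is shown here.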
In one embodiment, the step of displaying the text in the target image starting from the initial position includes: when the text corresponding to the voice data forms a word, displaying the word at the initial position; when the text corresponding to the voice data forms the next word, moving the previously displayed words away from the initial position and displaying them; displaying the next word at the initial position; and returning to the step performed when the text corresponding to the voice data forms the next word, so that as the voice time elapses, the text corresponding to the voice data is displayed in real time in a word-scrolling manner.
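The word-by-word scrolling described above can be sketched with a small buffer: each new word occupies the anchor slot and earlier words shift away from it. The fixed-size buffer is an assumption for illustration; the patent does not bound how much history stays visible.

```python
# A sketch of word-by-word scrolling: slot 0 is the initial position,
# higher indices are further away from it. The maxlen bound is an
# illustrative assumption, not part of the claimed method.

from collections import deque

class WordTicker:
    def __init__(self, slots=4):
        self.slots = deque(maxlen=slots)

    def push(self, word):
        # New word takes the anchor slot; history shifts away automatically.
        self.slots.appendleft(word)
        return list(self.slots)

t = WordTicker(slots=3)
t.push("nice")
t.push("to")
print(t.push("meet"))  # → ['meet', 'to', 'nice']
```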
In one embodiment, the step of displaying the text in the target image starting from the initial position includes: forming a speech segment from the voice data segments collected in real time and obtaining the segment text corresponding to the speech segment; displaying the segment text at the initial position; obtaining the next segment text corresponding to the next speech segment, and moving the previously displayed segment text away from the initial position and displaying it; displaying the next segment text at the initial position; and returning to the step of obtaining the next segment text corresponding to the next speech segment, so that as the voice time elapses, the text corresponding to the voice data is displayed in real time in a segment-scrolling manner.
In one embodiment, the processor is further configured to perform the following step: performing word segmentation on the text to obtain multiple sub-texts;
The step of displaying the text in the target image starting from the initial position includes: determining the start display time of each sub-text according to the voice timestamp corresponding to the sub-text; and, according to the start display time of each sub-text, dynamically displaying the sub-text over time along a preset trajectory starting from the initial position.
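Scheduling sub-texts by voice timestamp can be sketched as below: each sub-text becomes visible at its own timestamp and then moves along a preset trajectory. The straight-line trajectory, the pixel units, and the data layout are illustrative assumptions.

```python
# A sketch of timestamped sub-text animation: a sub-text appears at its
# voice timestamp and then follows a preset (here linear) trajectory.
# Trajectory shape and units are assumptions for illustration.

def position_at(t_now, start_time, origin, velocity):
    """Point on a straight preset trajectory, dt seconds after start."""
    dt = t_now - start_time
    if dt < 0:
        return None  # this sub-text is not yet visible
    return (origin[0] + velocity[0] * dt, origin[1] + velocity[1] * dt)

subtexts = [("hello", 0.0), ("world", 0.5)]   # (sub-text, voice timestamp)
origin, velocity = (100.0, 200.0), (0.0, -40.0)  # px and px/s, assumed values

# Positions of all sub-texts at t = 1.0 s.
frame = {w: position_at(1.0, ts, origin, velocity) for w, ts in subtexts}
print(frame)  # → {'hello': (100.0, 160.0), 'world': (100.0, 180.0)}
```

The earlier a sub-text was spoken, the farther it has traveled along the trajectory by any given frame time, which is exactly the behavior the embodiment describes.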
In one embodiment, after performing the step of segmenting the text to obtain multiple sub-texts, the processor is further configured to perform the following step: extracting key texts from the multiple sub-texts according to semantic recognition. The step of displaying the text in the target image starting from the initial position includes: determining the start display time of each key text according to the voice timestamp corresponding to the key text; and, according to the start display time of each key text, dynamically displaying the key text over time along a preset trajectory starting from the initial position.
In one embodiment, the processor is further configured to perform the following steps: obtaining a shooting instruction; obtaining, according to the shooting instruction, the current target image and the current text displayed in the current target image; and, according to the current display position of the current text, compositing the current text with the current target image to form a composite image and saving the composite image.
In one embodiment, the processor is further configured to perform the following steps: obtaining a start-shooting instruction; according to the start-shooting instruction, continuously compositing the text displayed in the target image with the target image to form composite image frames, and saving each composite image frame; and obtaining an end-shooting instruction and forming a composite video from the composite image frames.
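The start/stop recording flow can be sketched as a small state machine: between the start- and end-shooting instructions, each composited frame is collected, and the end instruction packages the collected frames. Representing the "video" as a plain list is an assumption; a real implementation would hand frames to a video encoder.

```python
# A sketch of the start-shooting / end-shooting flow. Frames arriving
# outside a recording session are ignored; stop() returns the recorded
# sequence (a stand-in for encoding it into a video file).

class Recorder:
    def __init__(self):
        self.frames = None  # None means "not recording"

    def start(self):
        self.frames = []

    def on_frame(self, composited):
        if self.frames is not None:
            self.frames.append(composited)

    def stop(self):
        video, self.frames = self.frames, None
        return video

r = Recorder()
r.on_frame("f0")        # ignored: recording has not started yet
r.start()
r.on_frame("f1")
r.on_frame("f2")
print(r.stop())  # → ['f1', 'f2']
```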
In one embodiment, the target subject is a face. The step of recognizing the target subject in the target image to identify a target subject region includes: extracting facial feature points from the image, and determining the face region according to the facial feature points. The step of determining, according to the target subject region, the initial position at which the text is presented includes: determining the mouth position according to the feature points representing the mouth among the facial feature points, and determining the initial position of the text according to the mouth position. In one embodiment, the step of displaying the text in the target image starting from the initial position includes: controlling the text to be dynamically displayed starting from the initial position according to the display control parameters corresponding to the text.
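Deriving the text anchor from facial feature points can be sketched as the centroid of the mouth landmarks. The 68-point, dlib-style indexing (mouth = points 48-67) is an assumption for illustration; the patent only requires that the mouth position come from the feature points representing the mouth.

```python
# A sketch of locating the text's initial position at the mouth: take the
# centroid of the mouth landmarks. The 68-point layout with mouth indices
# 48-67 is a common convention assumed here, not specified by the patent.

def mouth_anchor(landmarks):
    """Centroid of the mouth landmarks (indices 48-67 in a 68-point model)."""
    mouth = landmarks[48:68]
    n = len(mouth)
    return (sum(x for x, _ in mouth) / n, sum(y for _, y in mouth) / n)

# Fake 68-point landmark set: all zeros except the 20 mouth points.
pts = [(0.0, 0.0)] * 68
pts[48:68] = [(10.0, 20.0)] * 20
print(mouth_anchor(pts))  # → (10.0, 20.0)
```

The returned anchor would then seed the word/segment scrolling or particle animation described in the other embodiments.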
In one embodiment, the step of controlling the text to be dynamically displayed starting from the initial position according to the display control parameters corresponding to the text includes: obtaining the display position of the text in the preceding frame image; computing the target position of the text in the current frame image according to the display control parameters and the display position of the text in the preceding frame image; and displaying the text at the target position in the current frame image.
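The frame-to-frame update above can be sketched as: next position = previous position advanced by the display control parameters. Treating the control parameter as a constant velocity scaled by the frame interval is an illustrative assumption.

```python
# A sketch of the per-frame target-position computation: the current-frame
# position is derived from the preceding-frame position and the display
# control parameters (here a constant velocity, an assumed choice).

def next_position(prev_pos, params, dt):
    vx, vy = params["velocity"]
    return (prev_pos[0] + vx * dt, prev_pos[1] + vy * dt)

params = {"velocity": (0.0, -4.0)}   # drift upward at 4 px/s (assumed)
pos = (50.0, 100.0)
for _ in range(4):                   # four frames, 0.25 s apart
    pos = next_position(pos, params, dt=0.25)
print(pos)  # → (50.0, 96.0)
```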
In one embodiment, after performing the step of collecting voice data in real time and recognizing the collected voice data as text, the processor is further configured to perform the following step: converting the text into a text image. The step of displaying the text in the target image starting from the initial position includes: treating the text image as a particle in a particle system, and controlling the text image to be dynamically displayed starting from the initial position according to particle parameters preset in the particle system.
In one embodiment, the particle parameters include at least one of a speed parameter, an angle parameter, a color parameter, a size parameter, and a time parameter. The step of controlling the text image to be dynamically displayed starting from the initial position according to the particle parameters preset in the particle system includes: obtaining the text-image state in the preceding frame image, the text-image state including at least one of the position, size, angle, and color of the text image; computing the text-image state in the current frame image according to the particle parameters and the text-image state in the preceding frame image; and displaying the text image according to the text-image state in the current frame image.
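A text-image particle update can be sketched as below: the current-frame state (position, size, angle) is computed from the preceding-frame state and the preset particle parameters. The specific parameter set and update rules are illustrative assumptions, not the patented formulas.

```python
# A sketch of a text-image particle step: each frame, position, size, and
# angle evolve from the preceding state under preset particle parameters.
# The parameter names and update rules are assumptions for illustration.

from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ParticleState:
    x: float
    y: float
    size: float
    angle: float

def step(state, params, dt):
    return replace(
        state,
        x=state.x + params["vx"] * dt,
        y=state.y + params["vy"] * dt,
        size=state.size * params["growth"] ** dt,   # gentle scale-up over time
        angle=state.angle + params["spin"] * dt,
    )

params = {"vx": 0.0, "vy": -20.0, "growth": 2.0, "spin": 90.0}
s = ParticleState(x=0.0, y=0.0, size=1.0, angle=0.0)
s = step(s, params, dt=1.0)
print(s)  # → ParticleState(x=0.0, y=-20.0, size=2.0, angle=90.0)
```

A color parameter would be handled the same way, interpolated per frame alongside the geometric state.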
In one embodiment, the step of treating the text image as a particle in the particle system and controlling the text image to be dynamically displayed starting from the initial position according to the particle parameters preset in the particle system includes: determining the start display time of the text image according to the voice timestamp of the voice data corresponding to the text image; and controlling each text image to be dynamically displayed according to its corresponding particle parameters, with the initial position and the start display time as its initial state.
In one embodiment, the processor is further configured to perform the following steps: recording the image timestamp of each collected video frame; recording the voice timestamp corresponding to the obtained voice data, and associating the voice timestamp with the recognized text; and synchronizing the display of the video image and the corresponding text according to the image timestamp and the voice timestamp.
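The timestamp association can be sketched as a lookup: for each video frame, show the text whose voice timestamp most recently precedes the frame's timestamp. Representing the association as a sorted list of (timestamp, text) pairs is an assumption; the patent only requires that frames and text be linked through their timestamps.

```python
# A sketch of image/voice timestamp synchronization: pick, for a frame
# timestamp, the recognized text whose voice timestamp is the latest one
# not after the frame. The list-of-pairs layout is an assumed encoding.

import bisect

def text_for_frame(frame_ts, entries):
    """entries: list of (voice_ts, text), sorted by voice_ts ascending."""
    starts = [ts for ts, _ in entries]
    i = bisect.bisect_right(starts, frame_ts) - 1
    return entries[i][1] if i >= 0 else None

entries = [(0.0, "hello"), (1.2, "there"), (2.5, "world")]
print(text_for_frame(1.9, entries))  # → there
print(text_for_frame(0.5, entries))  # → hello
```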
In one embodiment, a computer device is proposed, including a memory and a processor. The memory stores a computer program which, when executed by the processor, causes the processor to perform the following steps: obtaining a target image, the target image containing a mouth; detecting the mouth in the target image, and performing lip-reading recognition according to the mouth movement to obtain the corresponding recognized text; and synchronously displaying the recognized text in the target image.
In one embodiment, the step of synchronously displaying the recognized text in the target image includes: determining the display position of the text according to the position of the mouth, and synchronously displaying the text at the display position in the target image.
In one embodiment, a computer-readable storage medium is proposed, storing a computer program which, when executed by a processor, causes the processor to perform the following steps: obtaining a target image, the target image containing a target subject; recognizing the target subject in the target image to identify a target subject region; collecting voice data in real time and recognizing the collected voice data as text; determining, according to the target subject region, an initial position at which the text is presented; and displaying the text in the target image starting from the initial position.
In one embodiment, the step of displaying the text in the target image starting from the initial position includes: when the text corresponding to the voice data forms a word, displaying the word at the initial position; when the text corresponding to the voice data forms the next word, moving the previously displayed words away from the initial position and displaying them; displaying the next word at the initial position; and returning to the step performed when the text corresponding to the voice data forms the next word, so that as the voice time elapses, the text corresponding to the voice data is displayed in real time in a word-scrolling manner.
In one embodiment, the step of displaying the text in the target image starting from the initial position includes: forming a speech segment from the voice data segments collected in real time and obtaining the segment text corresponding to the speech segment; displaying the segment text at the initial position; obtaining the next segment text corresponding to the next speech segment, and moving the previously displayed segment text away from the initial position and displaying it; displaying the next segment text at the initial position; and returning to the step of obtaining the next segment text corresponding to the next speech segment, so that as the voice time elapses, the text corresponding to the voice data is displayed in real time in a segment-scrolling manner.
In one embodiment, the processor is further configured to perform the following step: performing word segmentation on the text to obtain multiple sub-texts;
The step of displaying the text in the target image starting from the initial position includes: determining the start display time of each sub-text according to the voice timestamp corresponding to the sub-text; and, according to the start display time of each sub-text, dynamically displaying the sub-text over time along a preset trajectory starting from the initial position.
In one embodiment, after performing the step of segmenting the text to obtain multiple sub-texts, the processor is further configured to perform the following step: extracting key texts from the multiple sub-texts according to semantic recognition. The step of displaying the text in the target image starting from the initial position includes: determining the start display time of each key text according to the voice timestamp corresponding to the key text; and, according to the start display time of each key text, dynamically displaying the key text over time along a preset trajectory starting from the initial position.
In one embodiment, the processor is further configured to perform the following steps: obtaining a shooting instruction; obtaining, according to the shooting instruction, the current target image and the current text displayed in the current target image; and, according to the current display position of the current text, compositing the current text with the current target image to form a composite image and saving the composite image.
In one embodiment, the processor is further configured to perform the following steps: obtaining a start-shooting instruction; according to the start-shooting instruction, continuously compositing the text displayed in the target image with the target image to form composite image frames, and saving each composite image frame; and obtaining an end-shooting instruction and forming a composite video from the composite image frames.
In one embodiment, the target subject is a face. The step of recognizing the target subject in the target image to identify a target subject region includes: extracting facial feature points from the image, and determining the face region according to the facial feature points. The step of determining, according to the target subject region, the initial position at which the text is presented includes: determining the mouth position according to the feature points representing the mouth among the facial feature points, and determining the initial position of the text according to the mouth position. In one embodiment, the step of displaying the text in the target image starting from the initial position includes: controlling the text to be dynamically displayed starting from the initial position according to the display control parameters corresponding to the text.
In one embodiment, the step of controlling the text to be dynamically displayed starting from the initial position according to the display control parameters corresponding to the text includes: obtaining the display position of the text in the preceding frame image; computing the target position of the text in the current frame image according to the display control parameters and the display position of the text in the preceding frame image; and displaying the text at the target position in the current frame image.
In one embodiment, after performing the step of collecting voice data in real time and recognizing the collected voice data as text, the processor is further configured to perform the following step: converting the text into a text image. The step of displaying the text in the target image starting from the initial position includes: treating the text image as a particle in a particle system, and controlling the text image to be dynamically displayed starting from the initial position according to particle parameters preset in the particle system.
In one embodiment, the particle parameters include at least one of a speed parameter, an angle parameter, a color parameter, a size parameter, and a time parameter. The step of controlling the text image to be dynamically displayed starting from the initial position according to the particle parameters preset in the particle system includes: obtaining the text-image state in the preceding frame image, the text-image state including at least one of the position, size, angle, and color of the text image; computing the text-image state in the current frame image according to the particle parameters and the text-image state in the preceding frame image; and displaying the text image according to the text-image state in the current frame image.
In one embodiment, the step of treating the text image as a particle in the particle system and controlling the text image to be dynamically displayed starting from the initial position according to the particle parameters preset in the particle system includes: determining the start display time of the text image according to the voice timestamp of the voice data corresponding to the text image; and controlling each text image to be dynamically displayed according to its corresponding particle parameters, with the initial position and the start display time as its initial state.
In one embodiment, the processor is further configured to perform the following steps: recording the image timestamp of each collected video frame; recording the voice timestamp corresponding to the obtained voice data, and associating the voice timestamp with the recognized text; and synchronizing the display of the video image and the corresponding text according to the image timestamp and the voice timestamp.
In one embodiment, a computer-readable storage medium is proposed, storing a computer program which, when executed by a processor, causes the processor to perform the following steps: obtaining a target image, the target image containing a mouth; detecting the mouth in the target image, and performing lip-reading recognition according to the mouth movement to obtain the corresponding recognized text; and synchronously displaying the recognized text in the target image.
In one embodiment, the step of synchronously displaying the recognized text in the target image includes: determining the display position of the text according to the position of the mouth, and synchronously displaying the text at the display position in the target image.
Those of ordinary skill in the art will understand that all or part of the flows in the above embodiment methods may be implemented by a computer program instructing relevant hardware. The program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the flows of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double-data-rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. To keep the description concise, not all possible combinations of the technical features of the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of this application, and their descriptions are relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent claims. It should be pointed out that those of ordinary skill in the art may make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this patent application shall be determined by the appended claims.

Claims (20)

1. An image processing method, the method comprising:
obtaining a target image, the target image containing a target subject;
recognizing the target subject in the target image to identify a target subject region;
collecting voice data in real time and recognizing the collected voice data as text;
determining, according to the target subject region, an initial position at which the text is presented; and
displaying the text in the target image starting from the initial position.
2. The method according to claim 1, wherein the step of displaying the text in the target image starting from the initial position comprises:
when the text corresponding to the voice data forms a word, displaying the word at the initial position;
when the text corresponding to the voice data forms the next word, moving the previously displayed words away from the initial position and displaying them; and
displaying the next word at the initial position, and returning to the step performed when the text corresponding to the voice data forms the next word, so that as the voice time elapses, the text corresponding to the voice data is displayed in real time in a word-scrolling manner.
3. The method according to claim 1, wherein the step of displaying the text in the target image starting from the initial position comprises:
forming a speech segment from the voice data segments collected in real time, and obtaining the segment text corresponding to the speech segment;
displaying the segment text at the initial position;
obtaining the next segment text corresponding to the next speech segment, and moving the previously displayed segment text away from the initial position and displaying it; and
displaying the next segment text at the initial position, and returning to the step of obtaining the next segment text corresponding to the next speech segment, so that as the voice time elapses, the text corresponding to the voice data is displayed in real time in a segment-scrolling manner.
4. The method according to claim 1, wherein the method further comprises:
performing word segmentation on the text to obtain multiple sub-texts;
wherein the step of displaying the text in the target image starting from the initial position comprises:
determining the start display time of each sub-text according to the voice timestamp corresponding to the sub-text; and
according to the start display time of each sub-text, dynamically displaying the sub-text over time along a preset trajectory starting from the initial position.
5. The method according to claim 4, wherein after the step of performing word segmentation on the text to obtain multiple sub-texts, the method further comprises:
extracting key texts from the multiple sub-texts according to semantic recognition;
wherein the step of displaying the text in the target image starting from the initial position comprises:
determining the start display time of each key text according to the voice timestamp corresponding to the key text; and
according to the start display time of each key text, dynamically displaying the key text over time along a preset trajectory starting from the initial position.
6. The method according to claim 1, wherein the method further comprises:
obtaining a shooting instruction, and obtaining, according to the shooting instruction, the current target image and the current text displayed in the current target image; and
according to the current display position of the current text, compositing the current text with the current target image to form a composite image, and saving the composite image.
7. The method according to claim 1, wherein the method further comprises:
obtaining a start-shooting instruction, continuously compositing the text displayed in the target image with the target image according to the start-shooting instruction to form composite image frames, and saving each composite image frame; and
obtaining an end-shooting instruction, and forming a composite video from the composite image frames.
8. The method according to claim 1, wherein the target subject is a face;
the step of recognizing the target subject in the target image to identify a target subject region comprises:
extracting facial feature points from the image, and determining the face region according to the facial feature points; and
the step of determining, according to the target subject region, the initial position at which the text is presented comprises: determining the mouth position according to the feature points representing the mouth among the facial feature points, and determining the initial position of the text according to the mouth position.
9. The method according to claim 1, wherein the step of displaying the text in the target image starting from the initial position comprises:
controlling the text to be dynamically displayed starting from the initial position according to the display control parameters corresponding to the text.
10. The method according to claim 9, wherein the step of controlling the text to be dynamically displayed starting from the initial position according to the display control parameters corresponding to the text comprises:
obtaining the display position of the text in the preceding frame image; and
computing the target position of the text in the current frame image according to the display control parameters and the display position of the text in the preceding frame image, and displaying the text at the target position in the current frame image.
11. The method according to claim 1, wherein after the step of collecting voice data in real time and recognizing the collected voice data as text, the method further comprises:
converting the text into a text image;
wherein the step of displaying the text in the target image starting from the initial position comprises:
treating the text image as a particle in a particle system, and controlling the text image to be dynamically displayed starting from the initial position according to particle parameters preset in the particle system.
12. The method according to claim 11, wherein the particle parameters comprise at least one of a speed parameter, an angle parameter, a color parameter, a size parameter, and a time parameter; and
the step of controlling the text image to be dynamically displayed starting from the initial position according to the particle parameters preset in the particle system comprises:
obtaining the text-image state in the preceding frame image, the text-image state comprising at least one of the position, size, angle, and color of the text image; and
computing the text-image state in the current frame image according to the particle parameters and the text-image state in the preceding frame image, and displaying the text image according to the text-image state in the current frame image.
13. The method according to claim 11, wherein the step of treating the text image as a particle in the particle system and controlling the text image to be dynamically displayed starting from the initial position according to the particle parameters preset in the particle system comprises:
determining the start display time of the text image according to the voice timestamp of the voice data corresponding to the text image; and
controlling each text image to be dynamically displayed according to its corresponding particle parameters, with the initial position and the start display time as its initial state.
14. The method according to any one of claims 1 to 13, wherein the method further comprises:
recording the image timestamp of each collected video frame;
recording the voice timestamp corresponding to the obtained voice data, and associating the voice timestamp with the recognized text; and
synchronizing the display of the video image and the corresponding text according to the image timestamp and the voice timestamp.
15. An image processing method, the method comprising:
obtaining a target image, the target image containing a mouth;
detecting the mouth in the target image, and performing lip-reading recognition according to the mouth movement to obtain the corresponding recognized text; and
synchronously displaying the recognized text in the target image.
16. The method according to claim 15, wherein the step of synchronously displaying the recognized text in the target image comprises:
determining the display position of the text according to the position of the mouth, and synchronously displaying the text at the display position in the target image.
17. An image processing apparatus, the apparatus comprising:
an acquisition module, configured to obtain a target image, the target image containing a target subject;
an image recognition module, configured to recognize the target subject in the target image to identify a target subject region;
a voice recognition module, configured to collect voice data in real time and recognize the collected voice data as text;
a position determination module, configured to determine, according to the target subject region, an initial position at which the text is presented; and
a display module, configured to display the text in the target image starting from the initial position.
18. An image processing apparatus, the apparatus comprising:
an image obtaining module, configured to obtain a target image, the target image containing a mouth;
a lip-reading recognition module, configured to detect the mouth in the target image and perform lip-reading recognition according to the mouth movement to obtain the corresponding recognized text; and
a synchronous display module, configured to synchronously display the recognized text in the target image.
19. A computer-readable storage medium, storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method according to any one of claims 1 to 16.
20. A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 16.
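Claims 16 and 17 describe turning a detected region (a mouth, or more generally a target subject region) into an initial position at which recognized text is presented. As an illustrative sketch only — the claims do not fix a concrete placement policy, and the names `Box` and `text_start_position` as well as the below-then-above placement rule are assumptions of this sketch, not part of the patent — the position-determination step might look like:

```python
from dataclasses import dataclass

@dataclass
class Box:
    """Axis-aligned bounding box in pixel coordinates."""
    x: int
    y: int
    w: int
    h: int

def text_start_position(subject: Box, img_w: int, img_h: int,
                        text_w: int, text_h: int, margin: int = 8):
    """Pick an initial position for overlay text near the detected
    subject region, clamped to the image bounds.

    Policy (an assumption of this sketch): centre the text horizontally
    on the region and place it just below; if that falls off the bottom
    of the image, place it above the region instead.
    """
    x = subject.x + subject.w // 2 - text_w // 2
    x = max(0, min(x, img_w - text_w))      # keep inside left/right edges

    y = subject.y + subject.h + margin      # preferred: below the region
    if y + text_h > img_h:                  # would fall off the bottom
        y = subject.y - margin - text_h     # place above instead
    y = max(0, y)
    return x, y

# A mouth detected near the bottom of a 640x480 frame: the text flips above it.
mouth = Box(x=300, y=440, w=80, h=30)
pos = text_start_position(mouth, 640, 480, text_w=200, text_h=24)
```

In a real implementation the display module would then render the recognized text frame by frame starting at this position (for example with a graphics or video-overlay API), so the text tracks the detected region as it moves.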
CN201810036627.0A 2018-01-15 2018-01-15 Image processing method, device, computer equipment and storage medium Active CN108320318B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810036627.0A CN108320318B (en) 2018-01-15 2018-01-15 Image processing method, device, computer equipment and storage medium


Publications (2)

Publication Number Publication Date
CN108320318A true CN108320318A (en) 2018-07-24
CN108320318B CN108320318B (en) 2023-07-28

Family

ID=62893260

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810036627.0A Active CN108320318B (en) 2018-01-15 2018-01-15 Image processing method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN108320318B (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004056286A (en) * 2002-07-17 2004-02-19 Fuji Photo Film Co Ltd Image display method
CN1822651A (en) * 2005-11-21 2006-08-23 深圳创维-Rgb电子有限公司 Method for dynamically forming caption image data and caption data flow
US20090002368A1 (en) * 2007-06-26 2009-01-01 Nokia Corporation Method, apparatus and a computer program product for utilizing a graphical processing unit to provide depth information for autostereoscopic display
CN101539929A (en) * 2009-04-17 2009-09-23 无锡天脉聚源传媒科技有限公司 Method for indexing TV news by utilizing computer system
CN101917557A (en) * 2010-08-10 2010-12-15 浙江大学 Method for dynamically adding subtitles based on video content
CN101996195A (en) * 2009-08-28 2011-03-30 中国移动通信集团公司 Searching method and device of voice information in audio files and equipment
CN102222101A (en) * 2011-06-22 2011-10-19 北方工业大学 Method for video semantic mining
CN202352332U (en) * 2011-11-30 2012-07-25 李扬德 Portable type lip language identifier
CN103716537A (en) * 2013-12-18 2014-04-09 宇龙计算机通信科技(深圳)有限公司 Photograph synthesizing method and terminal
CN104408462A (en) * 2014-09-22 2015-03-11 广东工业大学 Quick positioning method of facial feature points
CN105245917A (en) * 2015-09-28 2016-01-13 徐信 System and method for generating multimedia voice caption
CN105654532A (en) * 2015-12-24 2016-06-08 Tcl集团股份有限公司 Photo photographing and processing method and system
CN105975273A (en) * 2016-05-04 2016-09-28 腾讯科技(深圳)有限公司 Particle animation realization method and system as well as purification process display method and system for optimization tool
CN106384108A (en) * 2016-08-31 2017-02-08 上海斐讯数据通信技术有限公司 Text content retrieval method, word interpreting device and mobile terminal
CN107220228A (en) * 2017-06-13 2017-09-29 深圳市鹰硕技术有限公司 Teaching recording and broadcasting data correction device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SAAD D. AL-SHAMMA ET AL.: "Arabic Braille Recognition and transcription into text and voice", 《2010 5TH CAIRO INTERNATIONAL BIOMEDICAL ENGINEERING CONFERENCE》, pages 1 - 5 *
WU, HUI: "Natural scene text detection for a visual assistance system for the blind", China Master's Theses Full-text Database (Information Science and Technology), no. 3, pages 138 - 1904 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110782899A (en) * 2018-07-26 2020-02-11 富士施乐株式会社 Information processing apparatus, storage medium, and information processing method
CN111124145A (en) * 2018-11-01 2020-05-08 奇酷互联网络科技(深圳)有限公司 Information input method, mobile terminal and storage device
CN111124145B (en) * 2018-11-01 2023-05-16 奇酷互联网络科技(深圳)有限公司 Information input method, mobile terminal and storage device
CN109857905B (en) * 2018-11-29 2022-03-15 维沃移动通信有限公司 Video editing method and terminal equipment
CN109857905A (en) * 2018-11-29 2019-06-07 维沃移动通信有限公司 A kind of video editing method and terminal device
CN111462279A (en) * 2019-01-18 2020-07-28 阿里巴巴集团控股有限公司 Image display method, device, equipment and readable storage medium
CN111462279B (en) * 2019-01-18 2023-06-09 阿里巴巴集团控股有限公司 Image display method, device, equipment and readable storage medium
CN112015943A (en) * 2019-05-31 2020-12-01 华为技术有限公司 Humming recognition method and related equipment
WO2020239001A1 (en) * 2019-05-31 2020-12-03 华为技术有限公司 Humming recognition method and related device
CN110445954A (en) * 2019-07-26 2019-11-12 腾讯科技(深圳)有限公司 Image-pickup method, device and electronic equipment
CN111464827A (en) * 2020-04-20 2020-07-28 玉环智寻信息技术有限公司 Data processing method and device, computing equipment and storage medium
WO2022183814A1 (en) * 2021-03-03 2022-09-09 Oppo广东移动通信有限公司 Voice annotation and use method and device for image, electronic device, and storage medium
CN113873165A (en) * 2021-10-25 2021-12-31 维沃移动通信有限公司 Photographing method and device and electronic equipment
CN115209175A (en) * 2022-07-18 2022-10-18 忆月启函(盐城)科技有限公司 Voice transmission method and system
CN115209175B (en) * 2022-07-18 2023-10-24 深圳蓝色鲨鱼科技有限公司 Voice transmission method and system

Also Published As

Publication number Publication date
CN108320318B (en) 2023-07-28

Similar Documents

Publication Publication Date Title
CN108320318A (en) Image processing method, device, computer equipment and storage medium
Olszewski et al. High-fidelity facial and speech animation for VR HMDs
CN109120866B (en) Dynamic expression generation method and device, computer readable storage medium and computer equipment
US20200357180A1 (en) Augmented reality apparatus and method
Fu et al. High-fidelity face manipulation with extreme poses and expressions
Wang et al. Movie2comics: Towards a lively video content presentation
US20210345016A1 (en) Computer vision based extraction and overlay for instructional augmented reality
US20170287481A1 (en) System and method to insert visual subtitles in videos
US20120130717A1 (en) Real-time Animation for an Expressive Avatar
TWI255141B (en) Method and system for real-time interactive video
KR20200054613A (en) Video metadata tagging system and method thereof
CN111638784B (en) Facial expression interaction method, interaction device and computer storage medium
US10755087B2 (en) Automated image capture based on emotion detection
CN110868635A (en) Video processing method and device, electronic equipment and storage medium
CN113709545A (en) Video processing method and device, computer equipment and storage medium
US20040068408A1 (en) Generating animation from visual and audio input
WO2018177134A1 (en) Method for processing user-generated content, storage medium and terminal
CN108833964B (en) Real-time continuous frame information implantation identification system
CN110176044A (en) Information processing method, device, storage medium and computer equipment
Mattos et al. Multi-view mouth renderization for assisting lip-reading
CN116206024A (en) Video-based virtual human model driving method, device, equipment and storage medium
CN114567819B (en) Video generation method, device, electronic equipment and storage medium
Bigioi et al. Pose-aware speech driven facial landmark animation pipeline for automated dubbing
Doukas et al. Dynamic neural portraits
KR20180082825A (en) Method and apparatus for producing graphic effect according to motion recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant