CN109429084A

CN109429084A - Video processing method and apparatus, and apparatus for video processing

Info

Publication number: CN109429084A
Application number: CN201710737845.2A
Authority: CN
Inventors: 张�杰; 卜海亮; 靳笑; 靳一笑; 邢真臻; 蒋品; 冯新强
Original assignee: Beijing Sogou Technology Development Co Ltd
Current assignee: Beijing Sogou Technology Development Co Ltd
Priority date: 2017-08-24
Filing date: 2017-08-24
Publication date: 2019-03-05
Anticipated expiration: 2037-08-24
Also published as: CN109429084B

Abstract

Embodiments of the invention provide a video processing method and apparatus, and an apparatus for video processing. The method specifically includes: obtaining text information in a video frame; obtaining, from a preset item library, a target item that matches the text information; and adding target information corresponding to the target item to the video frame. Embodiments of the invention can shorten video processing time, improve video processing efficiency, and increase the video coverage rate of the target information.

Description

Video processing method and apparatus, and apparatus for video processing

Technical field

The present invention relates to the field of video technology, and in particular to a video processing method and apparatus, and an apparatus for video processing.

Background technique

With the development of Internet technology, more and more users are accustomed to watching videos on terminals such as computers and mobile phones. Specifically, a user can watch videos of interest through the player of a locally installed client or through a player embedded in a web page.

At present, information can be added to a video through video processing. Existing solutions add information to a video manually: an operator first watches the video and extracts from it the video frames suitable for adding information, then obtains the information corresponding to each video frame, and finally uses an editing system to insert the obtained information into the video frame.

However, because existing solutions add information to a video manually, they incur considerable time and labor costs, which results in low video processing efficiency.

Summary of the invention

In view of the above problems, embodiments of the present invention are proposed to provide a video processing method, a video processing apparatus, and an apparatus for video processing that overcome the above problems or at least partially solve them. Embodiments of the present invention can shorten video processing time, improve video processing efficiency, and increase the video coverage rate of target information.

To solve the above problems, the present invention discloses a video processing method, comprising:

obtaining text information in a video frame;

obtaining, from a preset item library, a target item that matches the text information; and

adding target information corresponding to the target item to the video frame.

In another aspect, the present invention discloses a video processing apparatus, comprising:

a text information obtaining module, configured to obtain text information in a video frame;

a target item obtaining module, configured to obtain, from a preset item library, a target item that matches the text information; and

a target information adding module, configured to add target information corresponding to the target item to the video frame.

Optionally, the text information obtaining module includes:

a recognition submodule, configured to perform text recognition and/or subtitle recognition on the video frames included in a video, to obtain the text information in the video frames.

Optionally, the target item obtaining module includes:

a judging submodule, configured to judge whether the text information includes information that matches the characteristic information of a first item in the preset item library or of an item of the same category as the first item, and if so, to use the first item as the target item that matches the text information.

Optionally, the target information adding module includes:

a target position determining submodule, configured to determine a target position in the video frame at which the target information is to be added; and

an adding submodule, configured to add the target information at the target position in the video frame.

Optionally, the target position determining submodule includes:

a first target position determining unit, configured to determine the degree of conformity between existing items in the video frame and the target item, and to take, as the target position, the position of an existing item in the video frame whose degree of conformity meets a preset condition; and/or

a second target position determining unit, configured to identify, in the video frame, a preset image target region suitable for adding the target information, and to use the preset image target region as the target position.

Optionally, the target position is a subtitle-related position;

the adding submodule includes:

a subtitle adding unit, configured to modify the subtitle included in the video frame according to the target information, so as to add the target information to the subtitle included in the video frame; and/or

a subtitle appending unit, configured to add the target information around the subtitle as additional information of the subtitle in the video frame, so as to add the target information to the video frame.

Optionally, the target information adding module includes:

a video frame information modifying submodule, configured to modify, according to the target information, the information corresponding to the target position in the video frame, to obtain a modified video frame that includes the target information; or

an appending submodule, configured to add the target information to the video frame as additional information corresponding to the target position in the video frame.

Optionally, the video frame information modifying submodule includes:

a pixel value modifying unit, configured to replace first pixel values corresponding to the target position in the video frame with second pixel values corresponding to the target information, where the second pixel values are determined according to the target information in picture format and/or the color values of the target information in text format; and/or

a subtitle text modifying unit, configured to modify the text information corresponding to the subtitle position in the video frame into the target information in text format.

Optionally, the apparatus further includes:

an image tracking module, configured to perform image tracking on image targets in the consecutive video frames included in a video; and

a target information reuse module, configured to reuse, according to the image tracking result, for an image target in a subsequent video frame, the target information corresponding to the same image target in a preceding video frame.

In yet another aspect, the present invention discloses an apparatus for video processing, including a memory and one or more programs, where the one or more programs are stored in the memory and are configured to be executed by one or more processors, and the one or more programs include instructions for performing the following operations:

obtaining text information in a video frame;

In still another aspect, the present invention discloses a machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause an apparatus to perform one or more of the foregoing video processing methods.

Embodiments of the present invention have the following advantages:

Embodiments of the present invention use a machine to automatically obtain the text information in a video frame, obtain from the preset item library the target item that matches that text information, and add the target information corresponding to the target item to the corresponding video frame. Because the target item that matches the text information in the video frame can be obtained quickly without manual intervention, video processing time can be shortened and video processing efficiency improved.

Moreover, once video processing time is shortened, the number of videos that can be processed per unit time can grow geometrically, and the number of machines that process videos can be expanded without limit by means of a computing cluster; in this way, the video coverage rate of the target information can be increased.

Further, embodiments of the present invention process video by obtaining text information and matching it against the preset item library. When the information in the preset item library changes, the latest target item and its corresponding target information can therefore be obtained through matching against the library. This improves the timeliness of the target information added to the video frame and, to a certain extent, can even achieve real-time updating of the target information.

Brief description of the drawings

Fig. 1 is a flow chart of the steps of embodiment one of a video processing method of the present invention;

Fig. 2 is a flow chart of the steps of embodiment two of a video processing method of the present invention;

Fig. 3 is a structural block diagram of an embodiment of a video processing apparatus of the present invention;

Fig. 4 is a structural block diagram of an apparatus 900 for video processing of the present invention when implemented as a terminal; and

Fig. 5 is a schematic structural diagram of a server in some embodiments of the present invention.

Detailed description of the embodiments

To make the above objects, features, and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

Embodiments of the present invention provide a video processing scheme that can obtain the text information in a video frame, obtain from a preset item library the target item that matches the text information, and add the target information corresponding to the target item to the video frame.

Embodiments of the present invention use a machine to automatically obtain the text information in a video frame, obtain from the preset item library the target item that matches that text information, and add the target information corresponding to the target item to the video frame. Because the target item that matches the text information in the video frame can be obtained quickly without manual intervention, video processing time can be shortened and video processing efficiency improved.

Moreover, once video processing time is shortened, the number of videos that can be processed per unit time can grow geometrically, and the number of machines that process videos can be expanded without limit by means of a computing cluster; in this way, video processing efficiency can be further improved.

Further, embodiments of the present invention process video by obtaining text information and matching it against the preset item library. When the information in the preset item library changes, the latest target item and its corresponding target information can therefore be obtained through matching against the library. The update cycle of the target information can thus be shortened; for example, real-time updating of the target information can be achieved to a certain extent.

The video processing scheme provided by embodiments of the present invention can process video from any video platform, and can process either offline video or real-time playback video. Here, a video platform is a network platform that provides video; in practical applications, examples of video platforms include video websites and/or video apps (applications).

Referring to Fig. 1, an exemplary structural diagram of a video processing system of an embodiment of the present invention is shown. The system may include a video server 101, a video client 102, and a video processing apparatus 103. The video server 101 and the video client 102 may be located in a wired or wireless network, through which they exchange data; the video server 101 may also exchange data with the video processing apparatus 103 through a wired or wireless network.

In practical applications, the video server 101 can provide a first video to the video client 102 so that the video client 102 plays the first video; for example, the video server 101 can provide the corresponding first video in response to a playback request or download request from the video client 102.

In addition, the video server 101 can provide the video processing apparatus 103 with a second video to which information needs to be added. The video processing apparatus 103 can then process the second video using the video processing scheme of embodiments of the present invention to obtain the second video with target information added, and send that video to the video server 101.

In practical applications, the second video may be an offline video or a real-time playback video.

When the second video is an offline video, it may be, for example, a currently popular video. The video server 101 can send the offline video to the video processing apparatus 103, obtain from the video processing apparatus 103 the offline video with target information added, and store it. Then, upon receiving a playback request or download request from the video client 102, the first video provided to the video client 102 can be the second video, with target information added, that corresponds to that request.

When the second video is a real-time playback video, the video server 101 can receive a playback request sent by the video client 102; the playback request may carry information such as the URL (Uniform Resource Locator) of the real-time playback video. The video server 101 can then obtain the real-time playback video according to the URL, send it to the video processing apparatus 103, and obtain from the video processing apparatus 103 the real-time playback video with target information added. The first video provided to the video client 102 can then be the real-time playback video with target information added.

It can be understood that the video processing system shown in Fig. 1 is only an example of an application environment for the video processing method of embodiments of the present invention; the method can be applied in any application environment. For example, it can also be applied in a client environment, in which the video client 102 uses the video processing method to process the first video provided by the video server 101 so as to add target information to the first video. Embodiments of the present invention place no restriction on the specific application environment.

Method embodiment

Referring to Fig. 2, a flow chart of the steps of an embodiment of a video processing method of the present invention is shown, which may specifically include the following steps:

Step 201: obtain the text information in a video frame;

Step 202: obtain, from a preset item library, the target item that matches the text information;

Step 203: add the target information corresponding to the target item to the video frame.
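The three steps above can be sketched as a small pipeline. This is only an illustrative sketch: the `Item` class, the `process_frame` function, the dictionary representation of a frame, and the three toy helper functions are all hypothetical names introduced here, not part of the described method.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Item:
    """Hypothetical library entry: an item plus the target information to show."""
    name: str
    target_info: str


def process_frame(
    frame: dict,
    get_text: Callable[[dict], str],
    match_item: Callable[[str], Optional[Item]],
    add_info: Callable[[dict, str], dict],
) -> dict:
    """Run steps 201-203 on one frame; return the frame unchanged if nothing matches."""
    text = get_text(frame)                      # step 201: text information
    item = match_item(text)                     # step 202: target item
    if item is None:
        return frame
    return add_info(frame, item.target_info)    # step 203: add target information


# Toy stand-ins (assumptions) so that the sketch runs end to end.
library: List[Item] = [Item("Three Squirrels", "logo:three-squirrels")]


def get_text(frame: dict) -> str:
    return frame.get("subtitle", "")


def match_item(text: str) -> Optional[Item]:
    lowered = text.lower()
    return next((i for i in library if i.name.lower() in lowered), None)


def add_info(frame: dict, info: str) -> dict:
    return {**frame, "overlays": frame.get("overlays", []) + [info]}


result = process_frame(
    {"subtitle": "My favorite is Three Squirrels"}, get_text, match_item, add_info
)
```

Passing the three steps in as functions keeps the sketch independent of any particular recognition, matching, or rendering technique, which the text leaves open.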

Embodiments of the present invention place no restriction on the source of the video in step 201. For example, the video may come from a video server or from a user. When the video comes from a video server, it may be an offline video or a real-time playback video. When the video comes from a user, an upload interface can, for example, be provided to the user through a website or app, and the video that the user uploads through that interface is used as the video in step 201.

A video usually consists of still pictures, which are called video frames. In practical applications, several video frames can be extracted from the video at a preset time interval, and the extracted video frames serve as the input data of step 201. It can be understood that those skilled in the art can determine the preset time interval according to practical application requirements; for example, the preset time interval may be the playback duration of N video frames, where N is a positive integer. Embodiments of the present invention place no restriction on the specific N or the preset time interval.
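As a minimal sketch of this sampling, assuming frames are addressed by index and that sampling starts at the first frame (both assumptions; the function names are hypothetical):

```python
from typing import List


def sample_frame_indices(total_frames: int, n: int) -> List[int]:
    """Return the indices of the frames taken every n frames, starting at frame 0."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return list(range(0, total_frames, n))


def interval_seconds(n: int, fps: float) -> float:
    """Playback duration of n frames at the given frame rate."""
    return n / fps
```

For example, with N = 50 and a 25 fps video, one frame is extracted for every 2 seconds of playback.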

In embodiments of the present invention, the text information in a video frame may include text information contained in the image and/or text information in the subtitle. In practical applications, the process of obtaining the text information in a video frame may include: performing text recognition and/or subtitle recognition on the video frames included in the video, to obtain the text information in the video frames.

In practical applications, text recognition technology can be used to perform text recognition on the video frames included in the video. Such technology may include OCR (Optical Character Recognition). After preprocessing the image, for example by noise reduction, OCR can segment the characters in the image to obtain single-character images and recognize the character corresponding to each single-character image. It can be understood that embodiments of the present invention place no restriction on the specific text recognition technology.

In practical applications, the subtitle file corresponding to the subtitle of a video frame can be obtained, and the text information in the subtitle can be obtained from that subtitle file. Alternatively, a screenshot of the picture corresponding to the video frame can be taken and text recognition performed on the screenshot image to obtain the text information in the subtitle. It can be understood that embodiments of the present invention place no restriction on the specific way of obtaining the text information in the subtitle.
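The subtitle-file route can be illustrated with a short sketch. It assumes the subtitle file uses the SubRip (SRT) layout, which the text does not specify, and the names `parse_srt` and `subtitle_at` are hypothetical.

```python
import re
from typing import List, Tuple

# One SRT timestamp, e.g. 00:00:01,000 (hours:minutes:seconds,milliseconds).
_TIME = r"(\d+):(\d+):(\d+)[,.](\d+)"
_CUE = re.compile(_TIME + r"\s*-->\s*" + _TIME)


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0


def parse_srt(srt_text: str) -> List[Tuple[float, float, str]]:
    """Parse SRT text into (start_seconds, end_seconds, subtitle_text) cues."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = _CUE.search(line)
            if match:
                g = match.groups()
                text = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
                cues.append((_seconds(*g[:4]), _seconds(*g[4:]), text))
                break
    return cues


def subtitle_at(cues: List[Tuple[float, float, str]], t: float) -> str:
    """Return the subtitle text shown at playback time t, or '' if there is none."""
    for start, end, text in cues:
        if start <= t <= end:
            return text
    return ""
```

Given the playback time of an extracted video frame, `subtitle_at` returns the subtitle text that can then be matched against the item library.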

After step 201 obtains the text information in the video frame, step 202 can obtain from the preset item library the target item that matches the text information.

The preset item library can be used to store first items, and each first item can also have corresponding characteristic information and target information. In practical applications, the first items and their corresponding characteristic information and target information can be obtained through cooperation with operators.

The characteristic information of a first item characterizes the item features of the first item and can serve as the basis for matching against the text information.

The target information is the information to be added to the video frame. For example, the target information may be information that attracts users, such as the logo or a picture of the first item; as another example, the target information may be an access entry such as a link, through which the user enters the page corresponding to the first item.

Examples of first items include commodities such as clothes, shoes, beverages, and accessories, and the target information may include target information in picture format, such as a logo, display picture, or poster, and/or target information in text format. It can be understood that the operator can determine the first items to be recommended and their corresponding target information according to practical application requirements; embodiments of the present invention place no restriction on the specific first items or their corresponding target information.

In addition, it can be understood that having the operator provide the first items and their corresponding characteristic information and target information is only an optional embodiment. In fact, those skilled in the art can obtain them in other ways according to practical application requirements, for example by obtaining the first items from the user's historical behavior data. Specifically, the user's features of interest can be obtained from the user's historical behavior data, and the first items corresponding to those features can then be obtained; for example, a feature of interest may be a feature of a commodity the user has bought, or a feature of the same kind as such a commodity feature. It can be understood that embodiments of the present invention place no restriction on the specific way of obtaining the first items and their corresponding target information.

In an optional embodiment of the present invention, the process in step 202 of obtaining, from the preset item library, the target item that matches the text information may include: judging whether the text information includes information that matches the characteristic information of a first item in the preset item library or of an item of the same category as the first item, and if so, using the first item as the target item that matches the text information. By matching the text information against the characteristic information of the first item or of items in its category, embodiments of the present invention can widen the matching range of target items.

Optionally, the characteristic information may include at least one of a name, a brand, a category, and an advertising slogan. The text information matches the characteristic information when all or part of the text information is identical in characters, identical in meaning, similar in meaning, or related in meaning to the characteristic information. Optionally, the text vectors corresponding to the text information and the characteristic information can be determined separately, and semantic similarity can be judged according to the similarity between the two text vectors. It can be understood that embodiments of the present invention place no restriction on what counts as a match between text information and characteristic information or on the corresponding matching method.
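The vector comparison can be sketched with cosine similarity. The text does not say how the text vectors are built, so this sketch uses character-bigram counts purely as a runnable stand-in; a real system would presumably use vectors that capture meaning, such as learned embeddings. The function names and the 0.5 threshold are assumptions.

```python
import math
from collections import Counter


def text_vector(text: str) -> Counter:
    """Bag of character bigrams: a stand-in for whatever text vector is actually used."""
    s = text.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def is_similar(text: str, feature: str, threshold: float = 0.5) -> bool:
    """Treat the two texts as matching when their vector similarity reaches the threshold."""
    return cosine_similarity(text_vector(text), text_vector(feature)) >= threshold
```

Only `text_vector` would need to change to swap in a different vector representation; the similarity test itself stays the same.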

In application example 1 of the present invention, suppose the subtitle corresponding to a video frame includes the text information "there's my favorite Three Squirrels". The text information can be matched against characteristic information such as the name, brand, and category of the first items in the preset item library. Because the text information includes information that matches the characteristic information of a first item, a target item whose brand is "Three Squirrels" can be obtained; a target item whose brand is "Liangpin Puzi (Bestore)" can also be obtained, where "Liangpin Puzi" belongs to the same category as "Three Squirrels".

In application example 2 of the present invention, suppose the subtitle corresponding to a video frame includes the text information "I want a wonderful life". The text information can be matched against the advertising slogan information of the first items in the preset item library. Suppose the matching result shows that the text information matches the advertising slogan of a certain beverage, "When you are young, stay awake and strive"; that beverage can then be used as the target item.

In application example 3 of the present invention, suppose the image corresponding to a video frame contains the text information "GAP", that is, a person in the image is wearing or carrying an item (such as clothing, a cap, or a school bag) bearing the "GAP" logo. The text information can be matched against characteristic information such as the name, brand, and category of the first items in the preset item library. Because the text information includes information that matches the characteristic information of a first item, a target item whose brand is "GAP" can be obtained; a target item whose brand is "Uniqlo" can also be obtained, where "Uniqlo" belongs to the same or a similar category as "GAP".
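The pattern shared by examples 1 and 3 (a direct hit on a brand, widened to other items in the same category) can be sketched as follows. The `LibraryItem` fields, the use of plain substring matching, and the sample library contents are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LibraryItem:
    """Hypothetical library record holding an item's characteristic information."""
    name: str
    brand: str
    category: str
    slogan: str = ""


def match_target_items(text: str, library: List[LibraryItem]) -> List[LibraryItem]:
    """Return items whose name, brand, or slogan occurs in the text,
    followed by the other library items that share a category with them."""
    lowered = text.lower()
    direct = [
        item for item in library
        if any(f and f.lower() in lowered for f in (item.name, item.brand, item.slogan))
    ]
    categories = {item.category for item in direct}
    same_category = [
        item for item in library
        if item.category in categories and item not in direct
    ]
    return direct + same_category


library = [
    LibraryItem("nut mix", "Three Squirrels", "snacks"),
    LibraryItem("dried fruit", "Brand B", "snacks"),
    LibraryItem("t-shirt", "GAP", "clothing"),
]
matches = match_target_items("My favorite is Three Squirrels", library)
```

Here the direct match on "Three Squirrels" also brings in the hypothetical "Brand B", because both sit in the "snacks" category, while the clothing item is left out.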

After step 202 obtains from the preset item library the target item that matches the text information, step 203 can add the target information corresponding to the target item to the video frame, so that when a user later watches the video and playback reaches that video frame, the target information is shown to the user.

In an optional embodiment of the present invention, the process in step 203 of adding the target information corresponding to the target item to the video frame may include: determining a target position in the video frame at which the target information is to be added; and adding the target information at the target position in the video frame.

In practical applications, the video frame can be analyzed to obtain, from the image of the video frame, a target position suitable for adding the target information.

In an optional embodiment of the present invention, the target position may be a subtitle-related position, which may be the subtitle position itself or a position around the subtitle. When the target position is the subtitle position, the subtitle included in the video frame can be modified according to the target information so as to add the target information to the subtitle. Alternatively, when the target position is a position around the subtitle, the target information can be added around the subtitle as additional information of the subtitle in the video frame.
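The two subtitle-related options can be sketched as follows. The `SubtitleOverlay` structure, the choice to place the additional information to the right of the subtitle box, and the 8-pixel gap are assumptions; the text does not fix how the subtitle is represented or where around it the information goes.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SubtitleOverlay:
    """Hypothetical subtitle record: text, bounding box (x, y, w, h), and extras."""
    text: str
    box: Tuple[int, int, int, int]
    extras: List[Tuple[str, Tuple[int, int]]] = field(default_factory=list)


def modify_subtitle(sub: SubtitleOverlay, matched: str, target_text: str) -> SubtitleOverlay:
    """Subtitle position: rewrite the matched span of the subtitle with the target text."""
    return SubtitleOverlay(sub.text.replace(matched, target_text), sub.box, list(sub.extras))


def attach_to_subtitle(sub: SubtitleOverlay, target_info: str, gap: int = 8) -> SubtitleOverlay:
    """Position around the subtitle: anchor the target info just right of the subtitle box."""
    x, y, w, _h = sub.box
    return SubtitleOverlay(sub.text, sub.box, sub.extras + [(target_info, (x + w + gap, y))])


sub = SubtitleOverlay("My favorite is Three Squirrels", (100, 600, 400, 40))
modified = modify_subtitle(sub, "Three Squirrels", "Three Squirrels [shop link]")
attached = attach_to_subtitle(sub, "logo")
```

Both functions return a new record and leave the original subtitle untouched, so either variant can be tried on the same frame.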

In an optional embodiment of the present invention, the target position may be one that is consistent with the target item, which makes the video look more natural. Correspondingly, the process of determining the target position in the video frame at which the target information is to be added may include: determining the degree of conformity between existing items in the video frame and the target item; and taking, as the target position, the position of an existing item in the video frame whose degree of conformity meets a preset condition.

Here, an existing item is an item contained in the video frame. In practical applications, the characteristic information of an existing item in the video frame (such as shape, color, name, and category) can be matched against the characteristic information of the target item (such as shape, color, name, category, brand, and target information) to obtain the degree of conformity between the two. If the degree of conformity meets the preset condition, the position of that existing item in the video frame can be used as the target position. Optionally, the degree of conformity meets the preset condition when it exceeds a preset threshold. For example, if the target item "cola" is a beverage in a pull-tab can, image analysis can be used to obtain the position of an item in the video frame that is can-shaped or bottle-shaped, as the target position. As another example, if the target information of the target item is the logo of a certain brand (such as "GAP"), the position of clothing, shoes, or hats in the video frame that are consistent with the logo can be obtained as the target position; for instance, clothing, shoes, or hats consistent with the "GAP" logo may be in the casual style associated with "GAP". It can be understood that any target position that is the position of an item in the video frame consistent with the logo falls within the protection scope of the target position of embodiments of the present invention, where an item's position being consistent with the logo means that the position is suitable for adding the logo.
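A minimal sketch of this selection follows. The text does not define how the degree of conformity is computed, so the sketch uses the Jaccard overlap of feature sets as an assumed stand-in; `DetectedObject`, `conformity`, and `pick_target_position` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

Box = Tuple[int, int, int, int]  # x, y, width, height


@dataclass
class DetectedObject:
    """Hypothetical detection result: feature labels plus the object's box in the frame."""
    features: Set[str]
    box: Box


def conformity(obj_features: Set[str], item_features: Set[str]) -> float:
    """Assumed measure: Jaccard overlap between the two feature sets."""
    union = obj_features | item_features
    return len(obj_features & item_features) / len(union) if union else 0.0


def pick_target_position(
    objects: List[DetectedObject], item_features: Set[str], threshold: float = 0.5
) -> Optional[Box]:
    """Return the box of the best-matching object if its score exceeds the threshold."""
    best = max(objects, key=lambda o: conformity(o.features, item_features), default=None)
    if best is None or conformity(best.features, item_features) <= threshold:
        return None
    return best.box


objects = [
    DetectedObject({"can", "red", "beverage"}, (10, 20, 30, 60)),
    DetectedObject({"sofa", "grey", "furniture"}, (200, 50, 300, 150)),
]
cola = {"can", "beverage", "cola"}
```

With these toy inputs the can-shaped object scores 2/4 = 0.5 against the cola item, so it is selected only when the threshold is set below 0.5, which shows how the preset threshold gates the choice.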

In another optional embodiment of the present invention, the target position may be the position corresponding to a preset image target region. A preset image target is an image target that does not affect the user's viewing. Preset image targets may include image targets other than persons and the items persons are wearing: for example, spaces such as walls, the ground, elevators, or blue sky, or items such as furniture. Correspondingly, the process of determining the target position in the video frame at which the target information is to be added may include: identifying, in the video frame, a preset image target region suitable for adding the target information, and using the preset image target region as the target position.

In an application example of the present invention, suppose a video frame contains a large preset image target region (such as a wall region, ground region, elevator region, or wardrobe region). That region can be identified by image recognition technology, and the target information (such as poster information or a display picture) can be inserted into it. A user watching the video will generally hardly perceive the content of the preset image target region as content extraneous to the video, so the target information can be recommended while its impact on the video, and the user's aversion to it, are reduced.
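Choosing such a region can be sketched as a filter over labeled regions. The sketch assumes an upstream recognizer has already produced `(label, box)` pairs; the label set, the minimum-area cutoff, and the function name are assumptions.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]       # x, y, width, height
Region = Tuple[str, Box]              # (label from image recognition, box)

# Assumed labels for regions that do not disturb viewing.
PRESET_LABELS = {"wall", "ground", "elevator", "sky", "wardrobe"}


def pick_preset_region(regions: List[Region], min_area: int = 10_000) -> Optional[Box]:
    """Return the largest region with a preset label and at least min_area pixels."""
    best: Optional[Box] = None
    best_area = 0
    for label, box in regions:
        if label not in PRESET_LABELS:
            continue                  # skip persons, worn items, and other targets
        area = box[2] * box[3]
        if area >= min_area and area > best_area:
            best, best_area = box, area
    return best


regions = [
    ("person", (0, 0, 500, 500)),
    ("wall", (600, 0, 400, 300)),
    ("ground", (0, 500, 200, 100)),
]
```

In this toy frame the person region is the largest overall but is ignored, and the wall region is chosen over the smaller ground region.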

Image recognition refers to and is handled image, analyzed and understood using machine, to identify the figure of various different modes As the technology of target.Specific to the embodiment of the present invention, it can use machine and video frame handled, analyzed and is understood, to know The technology of the image object of not various different modes, wherein the image object in usual video frame can correspond in the video frame There is certain image-region, the image object in video frame may include: article, personage, space etc., for example, personage can be Personage in video frame, article can be the article of personage's wearing in video frame, and space can be ring locating for personage in video frame Border space, such as outdoor environment, indoor environment can be with for example, indoor environment may include the information such as indoor wall, ground Understand, the embodiment of the present invention is without restriction for the specific image object in video frame.

In an optional embodiment of the invention, the process of performing image recognition on the video frames of a video may include: detecting the image targets in a video frame, and analyzing the detected image targets with a deep learning method to obtain the corresponding image target information. The image recognition result of the embodiments of the invention may therefore include the image target information corresponding to the video frame. This image target information may include: the image of the image target (that is, the image of the image target within the video frame, where the image target usually corresponds to a certain closed region of the frame), and the recognition result for the image target (such as the name and category of the identified image target). For example, face detection technology may be used to detect a face in the video frame, and the face may be analyzed with a deep learning method to obtain information such as the person's gender and age, the source of the person (for example, which film or television drama the person comes from), or even which celebrity the person is. Further, the articles worn by the person may be detected, such as clothes, shoes, a watch, or jewelry. Alternatively, information about the space in which the person is located may be detected.

In practical applications, the addition manner used in step 203 above, in which the target information corresponding to the target item is added to the video frame, may include:

Addition manner 1: according to the target information, modify the information at the target position in the video frame, to obtain a modified video frame that includes the target information; or

Addition manner 2: add the target information to the video frame as additional information corresponding to the target position in the video frame.

Addition manner 1 adds the target information to the video frame by modifying the information at the target position in the video frame, so the information in the video frame itself changes.

According to one embodiment, the above process of modifying the information at the target position in the video frame may include modifying the pixel values at the target position in the video frame. Specifically, the first pixel values at the target position in the video frame may be replaced with the second pixel values corresponding to the target information, where the second pixel values may be determined from target information in picture format and/or from the color values (such as RGB (Red, Green, Blue) values) of target information in text format.
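The pixel replacement described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the frame is assumed to be a plain list of rows of RGB tuples, and the names `Pixel`, `Frame`, and `replace_region` are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B); assumed pixel representation
Frame = List[List[Pixel]]      # rows of pixels; assumed frame representation


def replace_region(frame: Frame, patch: Frame, top: int, left: int) -> Frame:
    """Return a copy of `frame` whose pixels at the target position are
    replaced by the pixels of `patch` (the second pixel values)."""
    out = [row[:] for row in frame]          # do not mutate the input frame
    for dy, patch_row in enumerate(patch):
        for dx, pixel in enumerate(patch_row):
            y, x = top + dy, left + dx
            # Ignore patch pixels that would fall outside the frame.
            if 0 <= y < len(out) and 0 <= x < len(out[y]):
                out[y][x] = pixel
    return out
```

Returning a copy keeps the original frame available, which is convenient if the same frame is later processed with a different target item.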

According to another embodiment, the above process of modifying the information at the target position in the video frame may include modifying the text information at the subtitle position in the video frame, that is, changing the text information at the subtitle position into the target information in text format.

Addition manner 2 adds the target information to the video frame as additional information corresponding to the target position in the video frame, where the additional information may include subtitle information or mask information.

Target information in text format may be used as subtitle information corresponding to the target position in the video frame. For example, if a person in the video frame is wearing clothes, the target information corresponding to the target item (such as apparel brand A) may be used as subtitle information at the position of the clothes, thereby recommending apparel brand A. It should be noted that if the clothes worn by the person in the video frame already carry a brand, that brand may be removed by image processing techniques to avoid duplicate brands.

A mask is a layer with a certain transparency value. The parameters of a mask may include its size, display position, and transparency value. The mask in the embodiments of the invention may be overlaid on the video frame, so that the mask and the video frame can be displayed at the same time according to the mask's parameters. For example, while the video frame is displayed, the target information may be shown through a mask at the target position in the video frame. In addition, to reduce the impact of the mask on the video frame, the mask may be located in the region where the aforementioned preset image target is located.
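Displaying a mask together with the video frame amounts to blending the two layers. The sketch below assumes standard linear alpha blending and expresses the transparency value as an `opacity` between 0.0 (fully transparent) and 1.0 (fully opaque). The `Mask` class and the function names are hypothetical, and the size parameter is implied by the dimensions of `pixels`.

```python
from dataclasses import dataclass
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B); assumed pixel representation


@dataclass
class Mask:
    pixels: List[List[Pixel]]  # mask content; its rows/columns give the size
    top: int                   # display position (row of the top-left corner)
    left: int                  # display position (column of the top-left corner)
    opacity: float             # 0.0 = fully transparent, 1.0 = fully opaque


def blend(frame_px: Pixel, mask_px: Pixel, opacity: float) -> Pixel:
    """Linear alpha blend of one mask pixel over one frame pixel."""
    return tuple(round((1 - opacity) * f + opacity * m)
                 for f, m in zip(frame_px, mask_px))


def show_with_mask(frame: List[List[Pixel]], mask: Mask) -> List[List[Pixel]]:
    """Return a copy of `frame` with `mask` overlaid at its display position."""
    out = [row[:] for row in frame]
    for dy, mask_row in enumerate(mask.pixels):
        for dx, mask_px in enumerate(mask_row):
            y, x = mask.top + dy, mask.left + dx
            if 0 <= y < len(out) and 0 <= x < len(out[y]):
                out[y][x] = blend(out[y][x], mask_px, mask.opacity)
    return out
```

Unlike pixel replacement, the underlying frame content remains partly visible wherever the opacity is below 1.0.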

Application examples in which an embodiment of the invention adds the target information corresponding to the target item to the video frame may include:

Application example 1: suppose the subtitle corresponding to a video frame includes the text information "have my favorite Three Squirrels", and a target item of the brand "Bestore" is obtained by matching. Then "Three Squirrels" in the subtitle text "have my favorite Three Squirrels" may be replaced with "Bestore", giving the modified subtitle "have my favorite Bestore", which is presented in the video frame after the addition.

Application example 2: suppose the subtitle corresponding to a video frame includes the text information "I thought an excellent life", and this text information matches the advertising slogan "youth will wake spelling" of a certain beverage. The beverage may then be used as the target item, a mask may be placed in a region around the subtitle (such as the area above it), the target information corresponding to the target item (the beverage), such as the beverage's logo and advertising slogan, may be loaded through the mask, and the mask may be presented in the video frame after the addition.

Application example 3: suppose a person in the image corresponding to a video frame is wearing an article bearing the "GAP" logo (such as clothes, a cap, or a school bag), and a target item of the brand UNIQLO is obtained by matching. The logo of the target item (such as the UNIQLO logo) may then be added at the target position in the image of the video frame, or the logo of the second article in the video frame may be replaced with the logo of the target item (for example, replacing the "GAP" logo on the garment in the video frame with "UNIQLO"). The logo of the target item may be added or substituted either by modifying pixel values or by using a mask. In addition, the target position may be consistent with the logo of the target item. Specifically, the logo may cover the position of an article of any type; for example, the article types covered by the logo "UNIQLO" may include clothes, caps, and so on.

In some embodiments of the invention, the text information in the consecutive video frames of a video may also be tracked. According to the tracking result, the target item obtained for a given text information in an earlier video frame can then be reused for the same text information in later video frames. This not only reduces the computation needed to obtain target items; the repeated appearance of the target item can also deepen the user's memory of it. For example, suppose video frame i (where i is the frame number, an integer greater than or equal to 0) contains the text information "GAP", and the target item corresponding to the text information "GAP" is an article of the brand "UNIQLO". Image tracking may then be performed on the text information "GAP". If the text information "GAP" is still present in the subsequent video frames i+1, i+2, ..., i+M (where M is a positive integer), the article of the brand "UNIQLO" may be reused for the text information "GAP" in those frames, until it is recognized that the text information "GAP" has disappeared in video frame i+M+1. Thus, once playback reaches the video frames in which the target information has been embedded, the user sees the target information of the article of the brand "UNIQLO" until the text information "GAP" is no longer displayed.
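The reuse of a target item across consecutive frames can be sketched as below. The sketch assumes that text detection has already produced, for each frame, the set of text strings visible in it, and that `match_item` is some (hypothetical) function that looks a text string up in the preset article library. Only texts that were not present in the previous frame trigger a new lookup.

```python
from typing import Callable, Dict, List, Optional, Set


def assign_items(
    frame_texts: List[Set[str]],
    match_item: Callable[[str], Optional[str]],
) -> List[Dict[str, str]]:
    """For each frame, map every visible text to its target item, reusing
    the item found in the previous frame while the text stays on screen."""
    active: Dict[str, str] = {}      # text -> item carried over from the last frame
    results: List[Dict[str, str]] = []
    for texts in frame_texts:
        current: Dict[str, str] = {}
        for text in texts:
            if text in active:
                current[text] = active[text]   # text persisted: reuse the item
            else:
                item = match_item(text)        # new text: look it up once
                if item is not None:
                    current[text] = item
        results.append(current)
        active = current                       # texts that disappeared are dropped
    return results
```

With frames i to i+M all containing "GAP", the lookup runs once for frame i and the result is carried forward until the text disappears.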

In some embodiments of the invention, a video that is being played in real time may be processed. Accordingly, a first target item may be obtained for the first video frame, which corresponds to the current playback time, and the target information corresponding to the first target item may be added to the second video frame, which corresponds to the next playback time, where the text information in the second video frame may match the first target item.

It should be noted that, when consecutive video frames contain the same text information, the target item corresponding to that text information may have multiple pieces of target information. Different target information for the target item can then be added to different frames among the consecutive video frames, which gives variety to the target information shown for the target item. For example, the different target information corresponding to the target item may include a logo, a display image, a poster, or even text information for the same target item.

It should be noted that, after the target item matching a text information has been obtained, the mapping between the text information and the target item may be recorded. The target item matching a text information in a video frame can then be obtained through this mapping. This not only reduces the computation needed to obtain target items; the repeated appearance of the target item can also deepen the user's memory of it. For example, if "Three Squirrels" appears several times in the lines (subtitles) corresponding to the video frames, then after the target item "Bestore" corresponding to "Three Squirrels" is obtained for the first time, a mapping between "Three Squirrels" and "Bestore" may be established. For later occurrences of "Three Squirrels", the matching target item "Bestore" can then be obtained through this mapping.
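Recording the mapping is essentially memoization of the library lookup. A minimal sketch follows; the class name `ItemMatcher` and the `lookup` callable (standing in for the match against the preset article library) are hypothetical.

```python
from typing import Callable, Dict, Optional


class ItemMatcher:
    """Caches the text -> target item mapping so that each distinct text
    is matched against the preset article library only once."""

    def __init__(self, lookup: Callable[[str], Optional[str]]) -> None:
        self._lookup = lookup                     # the (possibly expensive) library match
        self._mapping: Dict[str, Optional[str]] = {}

    def match(self, text: str) -> Optional[str]:
        if text not in self._mapping:             # first occurrence: do the real match
            self._mapping[text] = self._lookup(text)
        return self._mapping[text]                # later occurrences: reuse the mapping
```

Unlike frame-to-frame tracking, this mapping also helps when the same text reappears after a gap, since the recorded entry does not depend on the text staying on screen.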

In other embodiments of the invention, image recognition may also be performed on the video stream corresponding to the video to obtain the corresponding image target information, and/or speech recognition may be performed on the audio stream corresponding to the video to obtain the corresponding text information. A target item matching the image target information and/or the text information may then be obtained from the preset article library, and the target information corresponding to the target item may be added to the video frames corresponding to the video stream.

The process of performing image recognition on the video stream corresponding to the video may include: detecting the image targets in a video frame, and analyzing the detected image targets with a deep learning method to obtain the corresponding image target information. The recognition result of the embodiments of the invention may therefore include the image target information corresponding to the video frame. This image target information may include: the image of the image target (that is, the image of the image target within the video frame, where the image target usually corresponds to a certain closed region of the frame), and the recognition result for the image target (such as the name and category of the identified image target). For example, face detection technology may be used to detect a face in the video frame, and the face may be analyzed with a deep learning method to obtain information such as the person's gender and age, the source of the person (for example, which film or television drama the person comes from), or even which celebrity the person is. Further, the articles worn by the person may be detected, such as clothes, shoes, a watch, or jewelry. Alternatively, information about the space in which the person is located may be detected.

A video usually consists of still pictures, which are called video frames. The audio stream corresponding to a video represents a continuous audio signal. The audio stream and its corresponding video frames may be synchronized, so that the video picture and the audio are played in sync.

In practical applications, the audio stream corresponding to a video may correspond to video content such as the lines and the soundtrack of the video, where the soundtrack may include the theme song, interlude songs, the ending song, the background music accompanying the lines, and so on. It can be understood that the embodiments of the invention place no restriction on the specific video content to which the audio stream corresponds.

In practical applications, the video stream and the audio stream corresponding to a video may be located in the same file. In this case, the audio may be extracted from the video file; specifically, the video file may be converted into an audio file, for example by converting a video file in MP4 (MPEG-4 Part 14) format into an audio file in MP3 (MPEG-1 Audio Layer III) format. Alternatively, the video stream and the audio stream corresponding to a video may be located in separate files, that is, the video file and the audio file may be independent of each other, in which case the audio file can be obtained directly. The audio file may contain the audio stream corresponding to the video, so that audio stream can be read from the audio file.

An embodiment of the invention may use speech recognition technology to convert the audio stream corresponding to the video into text information. Let the audio stream corresponding to the video be denoted $S$. After a series of processing steps on $S$, the corresponding speech feature sequence is obtained, denoted $O = \{O_1, O_2, \ldots, O_i, \ldots, O_T\}$, where $O_i$ is the $i$-th speech feature and $T$ is the total number of speech features. The sentence corresponding to the audio stream $S$ can be regarded as a word string composed of many words, denoted $W = \{w_1, w_2, \ldots, w_n\}$. Speech recognition is the process of finding the most probable word string $W$ given the known speech feature sequence $O$.
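Finding the most probable word string given the feature sequence is commonly written as a maximum a posteriori decision. The formulation below is the standard statistical one and is offered only as an illustration; the source does not state it, and the symbol $\hat{W}$ and the split into an acoustic model $P(O \mid W)$ and a language model $P(W)$ are assumptions introduced here.

```latex
\hat{W} = \arg\max_{W} P(W \mid O)
        = \arg\max_{W} \frac{P(O \mid W)\, P(W)}{P(O)}
        = \arg\max_{W} P(O \mid W)\, P(W)
```

The last step holds because $P(O)$ does not depend on the candidate word string $W$ and so does not affect which $W$ attains the maximum.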

Specifically, speech recognition is a process of pattern matching. In this process, a speech model may first be established according to the characteristics of human speech; the input speech signal is analyzed and the required features are extracted to build the templates needed for speech recognition. Recognizing the speech input by a user is then the process of comparing the features of that speech with the templates and finally determining the template that best matches the user's speech, thereby obtaining the speech recognition result. The specific speech recognition algorithm may be a training and recognition algorithm based on statistical hidden Markov models, a training and recognition algorithm based on neural networks, a recognition algorithm based on dynamic time warping, or another algorithm. The embodiments of the invention place no restriction on the specific speech recognition process.

Where the recognition result includes image target information, it may be judged whether the image target information includes a second article that is identical to, similar to, or of the same category as a first article in the preset article library; if so, the first article is used as the target item matching the recognition result. And/or, where the recognition result includes text information, it may be judged whether the text information includes information matching the feature information of a first article in the preset article library, or of an article similar to that first article; if so, the first article is used as the target item matching the text information.

The embodiments of the invention may use, as the target item, a first article that is identical to, similar to, or of the same category as a second article contained in the image target information, which can increase the proportion of videos to which target information can be added. For example, "cap 1" contained in the image target information is identical to "cap 2" contained in the preset article library. As another example, "suit 1" contained in the image target information is similar to "suit 2" contained in the preset article library. As a further example, the article contained in the preset article library is "cola", the article in the image target information is "Sprite", and both "cola" and "Sprite" belong to the category of canned beverages.

Specifically, the above process of judging whether the image target information includes a second article that is identical to, similar to, or of the same category as a first article in the preset article library may include: matching the feature information of the second article contained in the image target information against the feature information of the first article in the preset article library to obtain a matching result; and, if the matching result is a successful match, determining that the image target information includes a second article that is identical to, similar to, or of the same category as the first article in the preset article library. The feature information may include at least one of shape, color, and category.

In practical applications, the shape of the second article may be determined from the contour of the second article contained in the image target information; and/or the color of the second article may be determined from its color values (such as RGB (Red, Green, Blue) values); and/or the second article may be analyzed with a deep learning method to obtain its category.

Optionally, the process of matching the feature information of the second article contained in the image target information against the feature information of the first article in the preset article library may include: determining the similarity between the feature information of the second article and the feature information of the first article, and judging whether the similarity satisfies a preset similarity condition; if so, the matching result may be a successful match.

For example, the shape and color of the second article contained in the image target information may be matched against the shape and color of the first article in the preset article library; if the match succeeds, the first article can be considered to match the second article. Suppose the shape and color of a garment contained in the image target information of a video frame from a certain TV series are "suit shape 1" and "claret", and the shape and color of a first article in a certain preset article library are "suit shape 2" and "jujube red". The garment contained in the image target information can then be considered to match the first article. It can be understood that the embodiments of the invention place no restriction on the specific preset similarity condition. For example, the preset similarity condition may be that the similarity exceeds a similarity threshold, where the threshold may be a positive number not greater than 1, such as 0.8.
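One way to turn the feature comparison into a single similarity score is sketched below. The scoring scheme is an assumption made for illustration: shape and category are compared as labels (a real system would more likely compare contours), color similarity is derived from Euclidean distance in RGB space, and the three scores are averaged. Only the threshold of 0.8 and the "exceeds the threshold" condition come from the text; the names `Features`, `similarity`, and `is_match` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Tuple

# Largest possible distance between two RGB colors (black vs. white).
MAX_RGB_DISTANCE = math.sqrt(3 * 255 ** 2)


@dataclass(frozen=True)
class Features:
    shape: str                    # shape label (simplification of a contour)
    color: Tuple[int, int, int]   # RGB color value
    category: str                 # category label


def color_similarity(a: Tuple[int, int, int], b: Tuple[int, int, int]) -> float:
    """1.0 for identical colors, 0.0 for black versus white."""
    return 1.0 - math.dist(a, b) / MAX_RGB_DISTANCE


def similarity(second: Features, first: Features) -> float:
    """Average of the shape, color, and category scores (assumed weighting)."""
    scores = [
        1.0 if second.shape == first.shape else 0.0,
        color_similarity(second.color, first.color),
        1.0 if second.category == first.category else 0.0,
    ]
    return sum(scores) / len(scores)


def is_match(second: Features, first: Features, threshold: float = 0.8) -> bool:
    """Preset similarity condition: the similarity exceeds the threshold."""
    return similarity(second, first) > threshold
```

With this scheme, two suits with the same shape label and two close shades of dark red score well above 0.8, while articles that differ in both shape and category cannot reach the threshold on color alone.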

In one example of the embodiments of the invention, the first pixel values at the target position in the video frame may be replaced with the second pixel values corresponding to the target information. For example, the first pixel values of the first image corresponding to a certain second article in the video frame may be replaced with the second pixel values of the second image corresponding to a target item of the same category as the second article. Examples of the second article may include a first beverage in a can or a bottle, and the target item of the same category may include a second beverage in a can or a bottle, so that the picture of the first beverage in the video frame is replaced with the picture of the second beverage.

In another example of the embodiments of the invention, the logo of the target item may be added at the target position in the video frame, or the logo of the second article in the video frame may be replaced with the logo of the target item. The logo of the target item may be added or substituted either by modifying pixel values or by using a mask. In addition, the target position may be consistent with the logo of the target item. For example, if the logo of the target item is the logo of a certain brand, the target position may be a position suitable for adding that logo. Specifically, the logo may cover the position of an article of any type; for example, the article types covered by the logo "GAP" may include clothes, caps, and so on, and the article types covered by the logo "NIKE" may include clothes, shoes, hats, bags, and so on.

In yet another example of the embodiments of the invention, the target information corresponding to the target item may be displayed through a mask at the target position in the video frame. Such target information may be in picture format (such as a logo, a display image, or a poster) and/or in text format. The target information displayed through the mask may carry a link, so that the user can reach the page corresponding to the target item through the link.

In the embodiments of the invention, the audio stream may correspond to one or more video frames. In practical applications, the target information corresponding to the target item may be added to all of the video frames corresponding to the audio stream, or only to some of them. Optionally, a target video frame suitable for adding the target information may first be selected from the video frames corresponding to the audio stream, and the target information corresponding to the target item may then be added to that target video frame. Optionally, the video frame corresponding to the text information that matches the target item may be used as the target video frame, so that the video picture and the target information are synchronized. For example, if the text information matching the target item is a certain line in the video, the video frames corresponding to that line may be used as the target video frames suitable for adding the target information. Of course, the embodiments of the invention place no restriction on the specific target video frame. For example, it may also be a video frame after the one corresponding to the text information that matches the target item: if the matching text information is located at the end of a certain line in the video, the video frame following that line may be used as the target video frame.

In an optional embodiment of the invention, the above process of adding the target information corresponding to the target item to the video frames corresponding to the audio stream may include: selecting, from the video frames corresponding to the audio stream, a target video frame suitable for adding the target information; determining the target position in the target video frame for adding the target information; and adding the target information at the target position in the target video frame.

The target video frame may include the video frame corresponding to the text information that matches the target item. Specifically, selecting, from the video frames corresponding to the audio stream, a target video frame suitable for adding the target information may include: obtaining, as the target recognition result, the information in the recognition result that matches the feature information of the target item; extracting, as the target audio, the part of the audio stream corresponding to the target recognition result; and using the video frames corresponding to the target audio as the target video frames. Here the recognition result is the text information obtained from the audio stream by speech recognition. In practical applications, the audio stream has a certain length, and so does the text information obtained as the recognition result. The target recognition result, such as the target text within the text information, can therefore first be obtained according to the feature information of the target item; the target audio can then be extracted from the audio stream; and the target video frames corresponding to the target audio can finally be located, based on the synchronization between the video stream and the audio stream.
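Locating the target video frames from the target audio can be sketched as a conversion from a time span to frame indices. The sketch assumes a constant frame rate, assumes that frame k is displayed during the interval [k / fps, (k + 1) / fps), and assumes that the speech recognizer supplies start and end times in seconds for the target audio. The function name is hypothetical.

```python
import math
from typing import List


def frames_for_audio_span(start_s: float, end_s: float, fps: float) -> List[int]:
    """Return the indices of the video frames displayed during the audio
    span [start_s, end_s), given a constant frame rate `fps`."""
    if start_s < 0 or end_s < start_s or fps <= 0:
        raise ValueError("invalid span or frame rate")
    first = math.floor(start_s * fps)        # frame on screen when the audio starts
    last = math.ceil(end_s * fps) - 1        # last frame that begins before the audio ends
    # A zero-length span still maps to the single frame on screen at that instant.
    return list(range(first, max(first, last) + 1))
```

For a line spoken from 2.0 s to 3.0 s in a 25 fps video, this gives frames 50 through 74, which can then each be processed as target video frames.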

It should be noted that, when there are multiple target video frames, the target position for adding the target information may be determined separately for each of them. This avoids, to some extent, the problem of the user missing the target information because a single target video frame is displayed only briefly.

In an optional embodiment of the invention, the method of the embodiments of the invention may further include: modifying the audio stream according to the target information, to obtain a modified audio stream that matches the target information. The modified audio stream may contain audio that matches the target information. For example, suppose a certain line in the video is "have my favorite Three Squirrels" and the target item is "Bestore"; the audio corresponding to that line may then be changed to "have my favorite Bestore".

According to one embodiment, speech synthesis may be performed on the target information to obtain target audio, and the target audio may be used to replace the audio in the audio stream that matches the target item; the audio stream after replacement serves as the modified audio stream.

Speech synthesis technology, also known as text-to-speech (TTS) technology, is technology for converting text into speech. Examples of speech synthesis technology include speech synthesis based on hidden Markov models (HMM), known as HTS (HMM-based Speech Synthesis System). The basic idea of HTS is as follows: the speech signal is decomposed into parameters, and an HMM is built for each acoustic parameter; at synthesis time, the trained HMMs are used to predict the acoustic parameters of the text to be synthesized; and these acoustic parameters are fed into a parametric synthesizer, which finally produces the synthesized speech. The acoustic parameters may include at least one of spectral parameters and fundamental frequency parameters.

According to another embodiment, the above process of modifying the audio stream may include: obtaining the speech features corresponding to the audio stream; performing speech synthesis on the target information using those speech features, to obtain target audio; and using the target audio to replace the audio in the audio stream that matches the target item, with the audio stream after replacement serving as the modified audio stream. In this embodiment, the speech features may be used to determine the acoustic parameters for speech synthesis, so that the audio in the audio stream that was not replaced and the replacement audio are consistent in terms of speech features.

Optionally, the speech features may include voiceprint features. A voiceprint is the sound wave spectrum, displayed by an electro-acoustic instrument, that carries speech information; voiceprints are both specific to the speaker and relatively stable. The embodiments of the invention use the voiceprint features corresponding to the audio stream to synthesize speech for the target information, so that the synthesized target audio matches the original voice of the audio stream and the integrity of the video content is preserved.

In an optional embodiment of the invention, time axis alignment may be performed between the modified audio stream and the audio stream before modification (referred to as the original audio stream). Time axis alignment makes the modified audio stream consistent with the original audio stream along the time axis, which prevents the modification of the audio stream from affecting the synchronization of video and audio. Suppose the first audio is the audio in the original audio stream corresponding to the text information "have my favorite Three Squirrels", and the second audio is the audio in the modified audio stream corresponding to the modified text information "have my favorite Bestore". The time information of the first audio in the original audio stream is then consistent with the time information of the second audio in the modified audio stream. Specifically, the first audio and the second audio may have the same duration, and the start time and end time of the first audio in the original audio stream may be the same as the start time and end time of the second audio in the modified audio stream.
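The constraint that the replacement audio occupies exactly the same time span as the audio it replaces can be sketched as follows. This is a deliberately simple illustration: audio is modeled as a list of samples, and the synthesized audio is fitted by trimming or by padding with silence, whereas a production system would more plausibly time-stretch it. The function names are hypothetical.

```python
from typing import List


def fit_to_length(samples: List[float], target_len: int) -> List[float]:
    """Trim, or pad with silence (0.0), so that exactly `target_len` samples remain."""
    if len(samples) >= target_len:
        return samples[:target_len]
    return samples + [0.0] * (target_len - len(samples))


def replace_segment(
    original: List[float], start: int, end: int, replacement: List[float]
) -> List[float]:
    """Replace original[start:end] with `replacement`, keeping the start
    position, end position, and total length of the stream unchanged."""
    if not 0 <= start <= end <= len(original):
        raise ValueError("segment outside the audio stream")
    fitted = fit_to_length(replacement, end - start)
    return original[:start] + fitted + original[end:]
```

Because the output has the same number of samples as the input and nothing outside the replaced segment moves, every video frame stays aligned with the same audio instant as before.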

In an optional embodiment of the invention, the method of the embodiments of the invention may further include: obtaining the geographic area in which a device is located and the target language corresponding to that geographic area; translating the text information corresponding to the audio stream into target text information in the target language; and adding the target text information to the video frames corresponding to the audio stream. The device may be the device used by the user. The embodiments of the invention may machine-translate the text information corresponding to the audio stream (such as lines or lyrics) according to the geographic area in which the user is located, so that users of different languages can understand the video content. The granularity of the geographic area may be a country or the like; for example, for users in Europe and America, the text information corresponding to the audio stream may be translated from one language (such as Chinese) into English. The granularity of the geographic area may also be a province or city; in that case, the text information corresponding to the audio stream may be translated from one language (such as Chinese) into the dialect of a particular region (such as Northeastern, Sichuan, or Guangdong dialect).

In summary, the video processing method of the embodiment of the invention uses a machine to automatically obtain the text information in a video frame, obtains from the preset item library a target item matching the text information, and adds the target information corresponding to the target item to the corresponding video frame. Because the embodiment can quickly obtain the target item matching the text information in the video frame without manual intervention, it can shorten video processing time and improve video processing efficiency.

It should be noted that, for simplicity of description, the method embodiments are expressed as a series of action combinations. Those skilled in the art should understand, however, that the embodiments of the invention are not limited by the described order of actions, because according to the embodiments of the invention some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and that the actions involved are not necessarily required by the embodiments of the invention.

Device embodiment

Referring to Fig. 3, a structural block diagram of an embodiment of a video processing device of the invention is shown. The device may specifically include: a text information acquisition module 301, a target item acquisition module 302, and a target information adding module 303.

The text information acquisition module 301 is configured to obtain the text information in a video frame;

The target item acquisition module 302 is configured to obtain, from the preset item library, a target item matching the text information;

The target information adding module 303 is configured to add the target information corresponding to the target item to the video frame.

Optionally, the text information acquisition module 301 may include:

A recognition submodule, configured to perform text recognition and/or subtitle recognition on the video frames included in the video, so as to obtain the text information in the video frames.

Optionally, the target item acquisition module 302 may include:

A judging submodule, configured to judge whether the text information includes information matching the characteristic information corresponding to a first item in the preset item library or to an item of the same category as the first item; if so, the first item is taken as the target item matching the text information.

Optionally, the target information adding module 303 may include:

Optionally, the target position determining submodule may include:

Optionally, the target position is a subtitle-related position;

The adding submodule may include:

A subtitle adding unit, configured to modify, according to the target information, the subtitle included in the video frame, so as to add the target information to the subtitle included in the video frame; and/or

Optionally, the target information adding module 303 may include:

A video frame information modification submodule, configured to modify, according to the target information, the information corresponding to the target position in the video frame, so as to obtain a modified video frame that includes the target information; or

Optionally, the video frame information modification submodule may include:

Optionally, the device may further include:

An image tracking module, configured to perform image tracking on image targets in consecutive video frames included in the video;

Since the device embodiment is basically similar to the method embodiment, it is described relatively briefly; for relevant details, refer to the corresponding description of the method embodiment.

The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another.

With regard to the device in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method, and is not explained in detail here.

An embodiment of the invention provides a device for video processing. The device may include a memory and one or more programs, where the one or more programs are stored in the memory and are configured to be executed by one or more processors, the one or more programs including instructions for performing the following operations: obtaining the text information in a video frame; obtaining, from a preset item library, a target item matching the text information; and adding the target information corresponding to the target item to the video frame.

Optionally, the obtaining of the text information in a video frame includes:

Performing text recognition and/or subtitle recognition on the video frames included in the video, so as to obtain the text information in the video frames.

Optionally, the obtaining, from the preset item library, of a target item matching the text information includes:

Judging whether the text information includes information matching the characteristic information corresponding to a first item in the preset item library or to an item of the same category as the first item; if so, taking the first item as the target item matching the text information.
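
The matching judgment above can be sketched in Python. This is only an illustration under stated assumptions: the `Item` structure, its field names, and the case-insensitive substring test are all hypothetical, since the specification does not define how characteristic information is stored or compared.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Item:
    """A hypothetical entry of the preset item library."""
    name: str
    # Characteristic information of the item itself (e.g. brand keywords).
    features: List[str]
    # Characteristic information of same-category items.
    similar_features: List[str] = field(default_factory=list)


def find_target_item(text: str, library: List[Item]) -> Optional[Item]:
    """Return the first library item whose own features, or whose
    same-category features, appear in the recognized text."""
    lowered = text.lower()
    for item in library:
        for feature in item.features + item.similar_features:
            # Assumption: plain substring matching stands in for whatever
            # matching rule an implementation actually uses.
            if feature.lower() in lowered:
                return item
    return None
```

With this sketch, text that mentions only a same-category item still selects the first item as the target item, which mirrors the "first item or an item of the same category" wording.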

Optionally, the adding of the target information corresponding to the target item to the video frame includes:

Determining a target position in the video frame for adding the target information;

Adding the target information at the target position in the video frame.

Optionally, the determining of the target position in the video frame for adding the target information includes:

Determining the degree of conformity between the existing items in the video frame and the target item; and obtaining, from the existing items in the video frame, the position of an item whose degree of conformity meets a preset condition, as the target position;

And/or

Identifying, in the video frame, a predicted image target region suitable for adding the target information, and taking the predicted image target region as the target position.
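
The two ways of determining the target position can be combined in a short sketch. The `DetectedObject` structure, the numeric threshold used as the "preset condition", and the injected `conformity` scoring function are assumptions made for illustration; the specification does not define how the degree of conformity is computed.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), hypothetical layout


@dataclass
class DetectedObject:
    """An item already present in the video frame (hypothetical structure)."""
    label: str
    box: Box


def choose_target_position(objects: List[DetectedObject],
                           target_label: str,
                           conformity: Callable[[str, str], float],
                           threshold: float = 0.5,
                           fallback_region: Optional[Box] = None) -> Optional[Box]:
    """Pick the box of the existing item that best conforms to the target item.

    If no item reaches the threshold, fall back to a region that was
    identified separately as suitable for adding the target information.
    """
    best: Optional[DetectedObject] = None
    best_score = threshold
    for obj in objects:
        score = conformity(obj.label, target_label)
        if score >= best_score:
            best, best_score = obj, score
    if best is not None:
        return best.box
    return fallback_region
```

A trivial scoring function such as exact label equality is enough to exercise the sketch; a real system would likely compare categories or visual features.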

Optionally, the target position is a subtitle-related position;

The adding of the target information at the target position in the video frame includes:

Modifying, according to the target information, the subtitle included in the video frame, so as to add the target information to the subtitle included in the video frame;

And/or

Adding the target information around the subtitle in the video frame as additional information of the subtitle, so as to add the target information to the video frame.
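
The two subtitle-related options can be illustrated with a small sketch. The `Subtitle` structure and both function names are hypothetical, and replacing a matched phrase is only one assumed way of "modifying the subtitle according to the target information".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Subtitle:
    """A hypothetical subtitle record for one video frame."""
    text: str
    extra: Optional[str] = None  # additional information shown around the subtitle


def modify_subtitle(sub: Subtitle, matched: str, target_info: str) -> Subtitle:
    """Option 1: rewrite the subtitle so that it contains the target information."""
    return Subtitle(sub.text.replace(matched, target_info), sub.extra)


def attach_to_subtitle(sub: Subtitle, target_info: str) -> Subtitle:
    """Option 2: keep the subtitle text and attach the target information
    as additional information displayed around it."""
    return Subtitle(sub.text, target_info)
```

Both functions return a new record, so the original subtitle remains available if the modification has to be undone.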

Modifying, according to the target information, the information corresponding to the target position in the video frame, so as to obtain a modified video frame that includes the target information; or

Adding the target information to the video frame as additional information corresponding to the target position in the video frame.

Optionally, the modifying of the information corresponding to the target position in the video frame includes:

Replacing the first pixel values corresponding to the target position in the video frame with second pixel values corresponding to the target information, where the second pixel values corresponding to the target information are determined according to the target information in picture format and/or the color values of the target information in text format;

And/or

Modifying the text information corresponding to the subtitle position in the video frame, so that the text information corresponding to the subtitle position is changed to the target information in text format.
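
The pixel replacement can be sketched with plain nested lists. The frame layout (rows of RGB tuples), the function name `replace_pixels`, and the clipping at frame borders are assumptions for illustration; a real implementation would more likely operate on image arrays.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B)
Frame = List[List[Pixel]]      # rows of pixels


def replace_pixels(frame: Frame, patch: Frame, top: int, left: int) -> Frame:
    """Return a copy of `frame` in which the pixels at the target position
    (starting at row `top`, column `left`) are replaced by the pixels of
    `patch`, i.e. the rendered target information.

    Patch pixels that would fall outside the frame are ignored.
    """
    out = [row[:] for row in frame]  # copy rows; the input frame is not modified
    for dy, patch_row in enumerate(patch):
        for dx, pixel in enumerate(patch_row):
            y, x = top + dy, left + dx
            if 0 <= y < len(out) and 0 <= x < len(out[y]):
                out[y][x] = pixel  # first pixel value -> second pixel value
    return out
```

For target information in text format, the patch would first have to be rendered from the text and its color values; that rendering step is outside this sketch.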

Optionally, the device is further configured such that the one or more programs executed by the one or more processors include instructions for performing the following operations: performing image tracking on image targets in consecutive video frames included in the video; and, according to the image tracking result, reusing, for an image target in a subsequent video frame, the target information corresponding to the same image target in a preceding video frame.
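
The reuse of target information across tracked frames can be sketched as a simple cache keyed by track identifier. It is assumed here that an image tracker has already assigned a stable integer ID to each image target; the function name `assign_target_info` and the `lookup` callable (standing in for the text matching and item library retrieval) are hypothetical.

```python
from typing import Callable, Dict, List


def assign_target_info(frames_track_ids: List[List[int]],
                       lookup: Callable[[int], str]) -> List[Dict[int, str]]:
    """For each frame, map every tracked image target to its target information.

    `lookup` is called only the first time a track ID is seen; later frames
    reuse the cached result for the same image target.
    """
    cache: Dict[int, str] = {}
    results: List[Dict[int, str]] = []
    for track_ids in frames_track_ids:
        frame_info: Dict[int, str] = {}
        for track_id in track_ids:
            if track_id not in cache:
                cache[track_id] = lookup(track_id)  # first appearance only
            frame_info[track_id] = cache[track_id]
        results.append(frame_info)
    return results
```

Reusing the cached result means the matching step does not have to be repeated for every frame in which the same image target appears.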

Fig. 4 is a block diagram of a device 900 for video processing, implemented as a terminal, according to an exemplary embodiment. For example, the device 900 may be a mobile phone, a computer, a digital broadcasting terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.

Referring to Fig. 4, the device 900 may include one or more of the following components: a processing component 902, a memory 904, a power component 906, a multimedia component 908, an audio component 910, an input/output (I/O) interface 912, a sensor component 914, and a communication component 916.

The processing component 902 generally controls the overall operation of the device 900, such as operations associated with display, telephone calls, data communication, camera operations, and recording operations. The processing component 902 may include one or more processors 920 to execute instructions, so as to perform all or part of the steps of the methods described above. In addition, the processing component 902 may include one or more modules that facilitate interaction between the processing component 902 and other components. For example, the processing component 902 may include a multimedia module to facilitate interaction between the multimedia component 908 and the processing component 902.

The memory 904 is configured to store various types of data to support operation of the device 900. Examples of such data include instructions for any application or method operating on the device 900, contact data, phone book data, messages, pictures, videos, and so on. The memory 904 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.

The power component 906 supplies power to the various components of the device 900. The power component 906 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 900.

The multimedia component 908 includes a screen that provides an output interface between the device 900 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 908 includes a front camera and/or a rear camera. When the device 900 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or may have focusing and optical zoom capabilities.

The audio component 910 is configured to output and/or input audio signals. For example, the audio component 910 includes a microphone (MIC). When the device 900 is in an operation mode, such as a call mode, a recording mode, or a voice recognition mode, the microphone is configured to receive external audio signals. The received audio signals may be further stored in the memory 904 or sent via the communication component 916. In some embodiments, the audio component 910 further includes a loudspeaker for outputting audio signals.

The I/O interface 912 provides an interface between the processing component 902 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and so on. The buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor component 914 includes one or more sensors for providing the device 900 with status assessments of various aspects. For example, the sensor component 914 may detect the open/closed state of the device 900 and the relative positioning of components, such as the display and keypad of the device 900. The sensor component 914 may also detect a change in position of the device 900 or of a component of the device 900, the presence or absence of user contact with the device 900, the orientation or acceleration/deceleration of the device 900, and a change in temperature of the device 900. The sensor component 914 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 914 may also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 914 may further include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 916 is configured to facilitate wired or wireless communication between the device 900 and other devices. The device 900 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 916 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 916 further includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the device 900 may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.

In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example the memory 904 including instructions, where the instructions can be executed by the processor 920 of the device 900 to complete the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.

Fig. 5 is a schematic structural diagram of a server in some embodiments of the invention. The server 1900 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPU) 1922 (for example, one or more processors), a memory 1932, and one or more storage media 1930 (for example, one or more mass storage devices) storing application programs 1942 or data 1944. The memory 1932 and the storage medium 1930 may provide transient or persistent storage. The program stored in the storage medium 1930 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the server. Further, the central processing unit 1922 may be configured to communicate with the storage medium 1930 and to execute, on the server 1900, the series of instruction operations in the storage medium 1930.

The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.

A non-transitory computer-readable storage medium, where instructions in the storage medium, when executed by a processor of a device (a terminal or a server), enable the device to perform a video processing method, the method including: obtaining the text information in a video frame; obtaining, from a preset item library, a target item matching the text information; and adding the target information corresponding to the target item to the video frame.

Other embodiments of the invention will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed here. The invention is intended to cover any variations, uses, or adaptations of the invention that follow its general principles and include common knowledge or conventional technical means in the art not disclosed in this disclosure. The description and examples are to be regarded as illustrative only, and the true scope and spirit of the invention are indicated by the following claims.

It should be understood that the invention is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the invention is limited only by the appended claims.

The foregoing is merely the preferred embodiments of the invention and is not intended to limit the invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within the protection scope of the invention.

A video processing method, a video processing device, and a device for video processing provided by the invention have been described in detail above. Specific examples are used herein to explain the principles and embodiments of the invention, and the description of the above embodiments is intended only to help in understanding the method of the invention and its core idea. Meanwhile, for those of ordinary skill in the art, the specific embodiments and the scope of application may change according to the idea of the invention. In summary, the contents of this specification should not be construed as limiting the invention.

Claims

1. A video processing method, characterized by comprising:

obtaining text information in a video frame;

2. The method according to claim 1, characterized in that the obtaining of text information in a video frame comprises:

performing text recognition and/or subtitle recognition on the video frames included in a video, to obtain the text information in the video frames.

3. The method according to claim 1, characterized in that the obtaining, from the preset item library, of a target item matching the text information comprises:

judging whether the text information includes information matching the characteristic information corresponding to a first item in the preset item library or to an item of the same category as the first item; and if so, taking the first item as the target item matching the text information.

4. The method according to claim 1, characterized in that the adding of the target information corresponding to the target item to the video frame comprises:

adding the target information at the target position in the video frame.

5. The method according to claim 4, characterized in that the determining of the target position in the video frame for adding the target information comprises:

determining the degree of conformity between the existing items in the video frame and the target item, and obtaining, from the existing items in the video frame, the position of an item whose degree of conformity meets a preset condition, as the target position; and/or

identifying, in the video frame, a predicted image target region suitable for adding the target information, and taking the predicted image target region as the target position.

6. The method according to claim 4, characterized in that the target position is a subtitle-related position;

the adding of the target information at the target position in the video frame comprises:

modifying, according to the target information, the subtitle included in the video frame, so as to add the target information to the subtitle included in the video frame; and/or

adding the target information around the subtitle in the video frame as additional information of the subtitle, so as to add the target information to the video frame.

7. The method according to claim 1, characterized in that the adding of the target information corresponding to the target item to the video frame comprises:

modifying, according to the target information, the information corresponding to the target position in the video frame, to obtain a modified video frame that includes the target information; or

8. The method according to claim 7, characterized in that the modifying of the information corresponding to the target position in the video frame comprises:

replacing the first pixel values corresponding to the target position in the video frame with second pixel values corresponding to the target information, wherein the second pixel values corresponding to the target information are determined according to the target information in picture format and/or the color values of the target information in text format;

and/or

modifying the text information corresponding to the subtitle position in the video frame, so that the text information corresponding to the subtitle position is changed to the target information in text format.

9. A video processing device, characterized by comprising:

a target item acquisition module, configured to obtain, from a preset item library, a target item matching the text information; and

10. A device for video processing, characterized by comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors, the one or more programs comprising instructions for performing the following operations:

obtaining text information in a video frame;

11. A machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause a device to perform the video processing method according to one or more of claims 1 to 8.